diff --git "a/train.log" "b/train.log" new file mode 100644--- /dev/null +++ "b/train.log" @@ -0,0 +1,118810 @@ +2025-02-05 09:49:29 - INFO - llana.model.llana - Using nf2vec. +2025-02-05 09:49:29 - INFO - stdout - Loading nf2vec config from /leonardo_scratch/fast/IscrC_V2Text/dev/LLaNA_objanerf/llana/model/nf2vec/nf2vec_2layer.yaml. +2025-02-05 09:49:30 - INFO - llana.model.llana - nerf2vec output dim: 1024. +2025-02-05 09:49:30 - INFO - llana.model.llana - Use 2 projection hiddent layers. +2025-02-05 09:49:30 - INFO - llana.model.llana - Each layer with [1024, 2048] hidden units. +2025-02-05 09:49:30 - INFO - stdout - Each layer with [1024, 2048] hidden units. +2025-02-05 09:49:30 - INFO - stdout - Vec projector output dim: 5120. +2025-02-05 09:49:30 - INFO - llana.model.llana - =========== VEC PROJ PARAMETERS ============= +2025-02-05 09:49:30 - INFO - llana.model.llana - Vec projector architecture: Sequential( + (0): Linear(in_features=1024, out_features=1024, bias=True) + (1): GELU(approximate='none') + (2): Linear(in_features=1024, out_features=2048, bias=True) + (3): GELU(approximate='none') + (4): Linear(in_features=2048, out_features=5120, bias=True) +) +2025-02-05 09:49:30 - INFO - llana.model.llana - Total number of parameters: 13,639,680 +2025-02-05 09:49:30 - INFO - llana.model.llana - ============================================= +2025-02-05 09:49:30 - INFO - llana.model.llana - =========== LLANA TOKENIZER ============= +2025-02-05 09:49:30 - INFO - llana.model.llana - Tokenizer: Embedding(32003, 5120, padding_idx=0) +2025-02-05 09:49:30 - INFO - llana.model.llana - Total number of parameters: 163,855,360 +2025-02-05 09:49:30 - INFO - llana.model.llana - ============================================= +2025-02-05 09:49:30 - INFO - llana.model.llana - =========== LM HEAD PARAMETERS ============= +2025-02-05 09:49:30 - INFO - llana.model.llana - lm_head architecture: Linear(in_features=5120, out_features=32003, bias=False) +2025-02-05 09:49:30 - INFO - 
llana.model.llana - Total number of parameters: 163,855,360 +2025-02-05 09:49:30 - INFO - llana.model.llana - ============================================= +2025-02-05 09:49:30 - ERROR - stderr - Loading checkpoint shards: 0%| | 0/11 [00:00= 1.5 and < 2.0 but detected 2.2 +2025-02-05 09:49:38 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 09:49:38 - INFO - stdout -  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +2025-02-05 09:49:38 - INFO - stdout -  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +2025-02-05 09:49:38 - INFO - stdout -  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +2025-02-05 09:49:38 - INFO - stdout -  [WARNING]  async_io: please install the libaio-devel package with yum +2025-02-05 09:49:38 - INFO - stdout -  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +2025-02-05 09:49:38 - INFO - stdout -  [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH +2025-02-05 09:49:38 - INFO - stdout -  [WARNING]  async_io: please install the libaio-devel package with yum +2025-02-05 09:49:38 - INFO - stdout -  [WARNING]  async_io: please install the libaio-devel package with yum +2025-02-05 09:49:38 - INFO - stdout -  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +2025-02-05 09:49:38 - INFO - stdout -  [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH +2025-02-05 09:49:38 - INFO - stdout -  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. 
+2025-02-05 09:49:38 - INFO - stdout -  [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH +2025-02-05 09:49:38 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 09:49:38 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 09:49:38 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 09:49:38 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 09:49:38 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 09:49:38 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 09:49:39 - INFO - stdout - [2025-02-05 09:49:39,006] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +2025-02-05 09:49:39 - INFO - stdout - [2025-02-05 09:49:39,006] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +2025-02-05 09:49:39 - INFO - stdout - [2025-02-05 09:49:39,006] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +2025-02-05 09:49:39 - INFO - stdout - [2025-02-05 09:49:39,006] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +2025-02-05 09:49:39 - INFO - stdout - [2025-02-05 09:49:39,211] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +2025-02-05 09:49:39 - INFO - stdout - [2025-02-05 09:49:39,211] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +2025-02-05 09:49:39 - INFO - stdout - [2025-02-05 09:49:39,211] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) 
+2025-02-05 09:49:39 - INFO - stdout - [2025-02-05 09:49:39,211] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  async_io: please install the libaio-devel package with yum +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  async_io: please install the libaio-devel package with yum +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  async_io: please install the libaio-devel package with yum +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. 
+2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  async_io: please install the libaio-devel package with yum +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  async_io: please install the libaio-devel package with yum +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  async_io: please install the libaio-devel package with yum +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  async_io: please install the libaio-devel package with yum +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. 
+2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  async_io: please install the libaio-devel package with yum +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 09:49:39 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 09:49:40 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 
+2025-02-05 09:49:40 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 09:49:40 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 09:49:40 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 09:49:40 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 09:49:40 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 09:49:40 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 09:49:40 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 09:50:54 - INFO - transformers.trainer - ***** Running training ***** +2025-02-05 09:50:54 - INFO - transformers.trainer - ***** Running training ***** +2025-02-05 09:50:54 - INFO - transformers.trainer - Num examples = 478,531 +2025-02-05 09:50:54 - INFO - transformers.trainer - Num examples = 478,531 +2025-02-05 09:50:54 - INFO - transformers.trainer - Num Epochs = 3 +2025-02-05 09:50:54 - INFO - transformers.trainer - Num Epochs = 3 +2025-02-05 09:50:54 - INFO - transformers.trainer - Instantaneous batch size per device = 4 +2025-02-05 09:50:54 - INFO - transformers.trainer - Instantaneous batch size per device = 4 +2025-02-05 09:50:54 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 48 +2025-02-05 09:50:54 - INFO - transformers.trainer - Total train batch size (w. 
parallel, distributed & accumulation) = 48 +2025-02-05 09:50:54 - INFO - transformers.trainer - Gradient Accumulation steps = 1 +2025-02-05 09:50:54 - INFO - transformers.trainer - Gradient Accumulation steps = 1 +2025-02-05 09:50:54 - INFO - transformers.trainer - Total optimization steps = 29,910 +2025-02-05 09:50:54 - INFO - transformers.trainer - Total optimization steps = 29,910 +2025-02-05 09:50:54 - INFO - transformers.trainer - Number of trainable parameters = 1,085,794,574 +2025-02-05 09:50:54 - INFO - transformers.trainer - Number of trainable parameters = 1,085,794,574 +2025-02-05 09:50:55 - INFO - transformers.trainer - ***** Running training ***** +2025-02-05 09:50:55 - INFO - transformers.trainer - ***** Running training ***** +2025-02-05 09:50:55 - INFO - transformers.trainer - Num examples = 478,531 +2025-02-05 09:50:55 - INFO - transformers.trainer - ***** Running training ***** +2025-02-05 09:50:55 - INFO - transformers.trainer - Num examples = 478,531 +2025-02-05 09:50:55 - INFO - transformers.trainer - ***** Running training ***** +2025-02-05 09:50:55 - INFO - transformers.trainer - Num Epochs = 3 +2025-02-05 09:50:55 - INFO - transformers.trainer - Num examples = 478,531 +2025-02-05 09:50:55 - INFO - transformers.trainer - Num Epochs = 3 +2025-02-05 09:50:55 - INFO - transformers.trainer - Num examples = 478,531 +2025-02-05 09:50:55 - INFO - transformers.trainer - Instantaneous batch size per device = 4 +2025-02-05 09:50:55 - INFO - transformers.trainer - Num Epochs = 3 +2025-02-05 09:50:55 - INFO - transformers.trainer - Instantaneous batch size per device = 4 +2025-02-05 09:50:55 - INFO - transformers.trainer - Num Epochs = 3 +2025-02-05 09:50:55 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 48 +2025-02-05 09:50:55 - INFO - transformers.trainer - Instantaneous batch size per device = 4 +2025-02-05 09:50:55 - INFO - transformers.trainer - Total train batch size (w. 
parallel, distributed & accumulation) = 48 +2025-02-05 09:50:55 - INFO - transformers.trainer - Instantaneous batch size per device = 4 +2025-02-05 09:50:55 - INFO - transformers.trainer - Gradient Accumulation steps = 1 +2025-02-05 09:50:55 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 48 +2025-02-05 09:50:55 - INFO - transformers.trainer - Gradient Accumulation steps = 1 +2025-02-05 09:50:55 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 48 +2025-02-05 09:50:55 - INFO - transformers.trainer - Total optimization steps = 29,910 +2025-02-05 09:50:55 - INFO - transformers.trainer - Gradient Accumulation steps = 1 +2025-02-05 09:50:55 - INFO - transformers.trainer - Total optimization steps = 29,910 +2025-02-05 09:50:55 - INFO - transformers.trainer - Gradient Accumulation steps = 1 +2025-02-05 09:50:55 - INFO - transformers.trainer - Total optimization steps = 29,910 +2025-02-05 09:50:55 - INFO - transformers.trainer - Total optimization steps = 29,910 +2025-02-05 09:50:55 - INFO - transformers.trainer - Number of trainable parameters = 1,085,794,574 +2025-02-05 09:50:55 - INFO - transformers.trainer - Number of trainable parameters = 1,085,794,574 +2025-02-05 09:50:55 - INFO - transformers.trainer - Number of trainable parameters = 1,085,794,574 +2025-02-05 09:50:55 - INFO - transformers.trainer - Number of trainable parameters = 1,085,794,574 +2025-02-05 09:50:55 - INFO - transformers.integrations.integration_utils - Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true" +2025-02-05 09:50:55 - INFO - transformers.integrations.integration_utils - Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true" +2025-02-05 09:50:55 - ERROR - stderr - wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information. 
+2025-02-05 09:50:55 - INFO - wandb - Current SDK version is 0.18.3 +2025-02-05 09:50:55 - INFO - wandb - Configure stats pid to 425929 +2025-02-05 09:50:55 - INFO - wandb - Loading settings from /leonardo/home/userexternal/aamaduzz/.config/wandb/settings +2025-02-05 09:50:55 - INFO - wandb - Loading settings from /leonardo_scratch/fast/IscrC_V2Text/dev/LLaNA_objanerf/wandb/settings +2025-02-05 09:50:55 - INFO - wandb - Loading settings from environment variables: {'mode': 'offline'} +2025-02-05 09:50:55 - INFO - wandb - Applying setup settings: {'mode': 'offline', '_disable_service': None} +2025-02-05 09:50:55 - INFO - wandb - Inferring run settings from compute environment: {'program_relpath': 'llana/train/train_mem_llana.py', 'program_abspath': '/leonardo_scratch/fast/IscrC_V2Text/dev/LLaNA_objanerf/llana/train/train_mem_llana.py', 'program': '/leonardo_scratch/fast/IscrC_V2Text/dev/LLaNA_objanerf/llana/train/train_mem_llana.py'} +2025-02-05 09:50:55 - INFO - wandb - Logging user logs to /leonardo_scratch/fast/IscrC_V2Text/dev/LLaNA_objanerf/wandb/offline-run-20250205_095055-niozp2gt/logs/debug.log +2025-02-05 09:50:55 - INFO - wandb - Logging internal logs to /leonardo_scratch/fast/IscrC_V2Text/dev/LLaNA_objanerf/wandb/offline-run-20250205_095055-niozp2gt/logs/debug-internal.log +2025-02-05 09:50:55 - INFO - wandb - calling init triggers +2025-02-05 09:50:55 - INFO - wandb - wandb.init called with sweep_config: {} +config: {} +2025-02-05 09:50:55 - INFO - wandb - starting backend +2025-02-05 09:50:55 - INFO - wandb - sending inform_init request +2025-02-05 09:50:55 - INFO - wandb - multiprocessing start_methods=fork,spawn,forkserver, using: spawn +2025-02-05 09:50:55 - INFO - wandb - backend started and connected +2025-02-05 09:50:55 - DEBUG - wandb - no default config file found in config-defaults.yaml +2025-02-05 09:50:55 - INFO - wandb - updated telemetry +2025-02-05 09:50:55 - INFO - wandb - communicating run to backend with 90.0 second timeout +2025-02-05 
09:50:55 - INFO - wandb - starting run threads in backend +2025-02-05 09:50:55 - ERROR - stderr - wandb: Tracking run with wandb version 0.18.3 +2025-02-05 09:50:55 - ERROR - stderr - wandb: W&B syncing is set to `offline` in this directory. +2025-02-05 09:50:55 - ERROR - stderr - wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing. +2025-02-05 09:50:55 - DEBUG - wandb - Saving list of pip packages installed into the current environment +2025-02-05 09:50:56 - INFO - wandb - atexit reg +2025-02-05 09:50:56 - INFO - wandb - redirect: wrap_raw +2025-02-05 09:50:56 - INFO - wandb - Wrapping output streams. +2025-02-05 09:50:56 - INFO - wandb - Redirects installed. +2025-02-05 09:50:56 - INFO - wandb - run started, returning control to user process +2025-02-05 09:50:56 - INFO - wandb - config_cb None None {'vocab_size': 32003, 'max_position_embeddings': 2048, 'hidden_size': 5120, 'intermediate_size': 13824, 'num_hidden_layers': 40, 'num_attention_heads': 40, 'num_key_value_heads': 40, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-06, 'pretraining_tp': 1, 'use_cache': False, 'rope_theta': 10000.0, 'rope_scaling': None, 'attention_bias': False, 'attention_dropout': 0.0, 'mlp_bias': False, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float32', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': False, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 
'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['LLaNA'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 1, 'pad_token_id': 0, 'eos_token_id': 2, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'outputs/LLaNA_13B_train_stage1_shapenerf_objanerf_AUGMENTED/slurm_script_31-01-2025_19:29', 'transformers_version': '4.44.0', 'DEFAULT_POINT_END_TOKEN': '', 'DEFAULT_POINT_PATCH_TOKEN': '', 'DEFAULT_POINT_START_TOKEN': '', 'mm_use_point_start_end': True, 'model_type': 'llana', 'nf2vec_config_name': 'nf2vec_2layer', 'point_backbone': 'nf2vec', 'point_backbone_ckpt': '', 'use_color': True, 'output_dir': 'outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': False, 'do_predict': False, 'eval_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 4, 'per_device_eval_batch_size': 1, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 2e-05, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 3.0, 'max_steps': -1, 'lr_scheduler_type': 'cosine', 'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.03, 'warmup_steps': 0, 'log_level': 'info', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': 
'outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/runs/Feb05_09-49-29_lrdn1697.leonardo.local', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 1.0, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 32860, 'save_total_limit': 1, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': True, 'fp16': False, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': 100.0, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': 'llana_objanerf_13b_stage2_recipe3_shapenerf_objanerf_AUGMENTED', 'disable_tqdm': False, 'remove_unused_columns': False, 'label_names': None, 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': ['full_shard', 'auto_wrap'], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'transformer_layer_cls_to_wrap': ['LlamaDecoderLayer'], 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': 'LlamaDecoderLayer', 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 
'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': False, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '', 'hub_private_repo': False, 'hub_always_push': False, 'gradient_checkpointing': True, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': 'no', 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'eval_use_gather_object': False, 'cache_dir': None, 'model_max_length': 2048, 'model_debug': False, 'fix_llm': False, 'force_fsdp': False, 'tune_mm_mlp_adapter': True, 'stage_2': True, 'pretrained_mm_mlp_adapter': None, 'detatch_point_token': ''} +2025-02-05 09:50:56 - INFO - wandb - config set model/num_parameters = 1085794574 - > +2025-02-05 09:50:56 - INFO - wandb - config_cb model/num_parameters 1085794574 None +2025-02-05 09:50:56 - ERROR - stderr - 0%| | 0/29910 [00:00= 1.5 and < 2.0 but detected 2.2 +2025-02-05 10:06:03 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 10:06:03 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 10:06:03 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 10:06:03 - INFO - stdout -  [WARNING]  
sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 10:06:03 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 10:06:03 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 10:06:03 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 10:06:03 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 10:06:03 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 10:06:03 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 10:06:03 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 10:06:03 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 10:06:03 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 10:06:03 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 10:06:03 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 10:06:03 - INFO - transformers.trainer - Using auto half precision backend +2025-02-05 10:06:03 - INFO - llana.train.train_llana - =========== LLANA PARAMETERS ============= +2025-02-05 10:06:03 - INFO - llana.train.train_llana - =========== LLANA PARAMETERS ============= +2025-02-05 10:06:03 - INFO - llana.train.train_llana - =========== LLANA PARAMETERS ============= +2025-02-05 10:06:03 - INFO - transformers.trainer - Using auto half precision backend +2025-02-05 10:06:03 - INFO - llana.train.train_llana - =========== LLANA PARAMETERS 
============= +2025-02-05 10:06:03 - INFO - llana.train.train_llana - lm_head architecture: LLaNA( + (model): LLaNAModel( + (embed_tokens): Embedding(32003, 5120, padding_idx=0) + (layers): ModuleList( + (0-39): 40 x LlamaDecoderLayer( + (self_attn): LlamaSdpaAttention( + (q_proj): Linear(in_features=5120, out_features=5120, bias=False) + (k_proj): Linear(in_features=5120, out_features=5120, bias=False) + (v_proj): Linear(in_features=5120, out_features=5120, bias=False) + (o_proj): Linear(in_features=5120, out_features=5120, bias=False) + (rotary_emb): LlamaRotaryEmbedding() + ) + (mlp): LlamaMLP( + (gate_proj): Linear(in_features=5120, out_features=13824, bias=False) + (up_proj): Linear(in_features=5120, out_features=13824, bias=False) + (down_proj): Linear(in_features=13824, out_features=5120, bias=False) + (act_fn): SiLU() + ) + (input_layernorm): LlamaRMSNorm((5120,), eps=1e-06) + (post_attention_layernorm): LlamaRMSNorm((5120,), eps=1e-06) + ) + ) + (norm): LlamaRMSNorm((5120,), eps=1e-06) + (rotary_emb): LlamaRotaryEmbedding() + (vec_proj): Sequential( + (0): Linear(in_features=1024, out_features=1024, bias=True) + (1): GELU(approximate='none') + (2): Linear(in_features=1024, out_features=2048, bias=True) + (3): GELU(approximate='none') + (4): Linear(in_features=2048, out_features=5120, bias=True) + ) + ) + (lm_head): Linear(in_features=5120, out_features=32003, bias=False) +) +2025-02-05 10:06:03 - INFO - llana.train.train_llana - lm_head architecture: LLaNA( + (model): LLaNAModel( + (embed_tokens): Embedding(32003, 5120, padding_idx=0) + (layers): ModuleList( + (0-39): 40 x LlamaDecoderLayer( + (self_attn): LlamaSdpaAttention( + (q_proj): Linear(in_features=5120, out_features=5120, bias=False) + (k_proj): Linear(in_features=5120, out_features=5120, bias=False) + (v_proj): Linear(in_features=5120, out_features=5120, bias=False) + (o_proj): Linear(in_features=5120, out_features=5120, bias=False) + (rotary_emb): LlamaRotaryEmbedding() + ) + (mlp): LlamaMLP( + 
(gate_proj): Linear(in_features=5120, out_features=13824, bias=False) + (up_proj): Linear(in_features=5120, out_features=13824, bias=False) + (down_proj): Linear(in_features=13824, out_features=5120, bias=False) + (act_fn): SiLU() + ) + (input_layernorm): LlamaRMSNorm((5120,), eps=1e-06) + (post_attention_layernorm): LlamaRMSNorm((5120,), eps=1e-06) + ) + ) + (norm): LlamaRMSNorm((5120,), eps=1e-06) + (rotary_emb): LlamaRotaryEmbedding() + (vec_proj): Sequential( + (0): Linear(in_features=1024, out_features=1024, bias=True) + (1): GELU(approximate='none') + (2): Linear(in_features=1024, out_features=2048, bias=True) + (3): GELU(approximate='none') + (4): Linear(in_features=2048, out_features=5120, bias=True) + ) + ) + (lm_head): Linear(in_features=5120, out_features=32003, bias=False) +) +2025-02-05 10:06:03 - INFO - llana.train.train_llana - lm_head architecture: LLaNA( + (model): LLaNAModel( + (embed_tokens): Embedding(32003, 5120, padding_idx=0) + (layers): ModuleList( + (0-39): 40 x LlamaDecoderLayer( + (self_attn): LlamaSdpaAttention( + (q_proj): Linear(in_features=5120, out_features=5120, bias=False) + (k_proj): Linear(in_features=5120, out_features=5120, bias=False) + (v_proj): Linear(in_features=5120, out_features=5120, bias=False) + (o_proj): Linear(in_features=5120, out_features=5120, bias=False) + (rotary_emb): LlamaRotaryEmbedding() + ) + (mlp): LlamaMLP( + (gate_proj): Linear(in_features=5120, out_features=13824, bias=False) + (up_proj): Linear(in_features=5120, out_features=13824, bias=False) + (down_proj): Linear(in_features=13824, out_features=5120, bias=False) + (act_fn): SiLU() + ) + (input_layernorm): LlamaRMSNorm((5120,), eps=1e-06) + (post_attention_layernorm): LlamaRMSNorm((5120,), eps=1e-06) + ) + ) + (norm): LlamaRMSNorm((5120,), eps=1e-06) + (rotary_emb): LlamaRotaryEmbedding() + (vec_proj): Sequential( + (0): Linear(in_features=1024, out_features=1024, bias=True) + (1): GELU(approximate='none') + (2): Linear(in_features=1024, 
out_features=2048, bias=True) + (3): GELU(approximate='none') + (4): Linear(in_features=2048, out_features=5120, bias=True) + ) + ) + (lm_head): Linear(in_features=5120, out_features=32003, bias=False) +) +2025-02-05 10:06:03 - INFO - llana.train.train_llana - lm_head architecture: LLaNA( + (model): LLaNAModel( + (embed_tokens): Embedding(32003, 5120, padding_idx=0) + (layers): ModuleList( + (0-39): 40 x LlamaDecoderLayer( + (self_attn): LlamaSdpaAttention( + (q_proj): Linear(in_features=5120, out_features=5120, bias=False) + (k_proj): Linear(in_features=5120, out_features=5120, bias=False) + (v_proj): Linear(in_features=5120, out_features=5120, bias=False) + (o_proj): Linear(in_features=5120, out_features=5120, bias=False) + (rotary_emb): LlamaRotaryEmbedding() + ) + (mlp): LlamaMLP( + (gate_proj): Linear(in_features=5120, out_features=13824, bias=False) + (up_proj): Linear(in_features=5120, out_features=13824, bias=False) + (down_proj): Linear(in_features=13824, out_features=5120, bias=False) + (act_fn): SiLU() + ) + (input_layernorm): LlamaRMSNorm((5120,), eps=1e-06) + (post_attention_layernorm): LlamaRMSNorm((5120,), eps=1e-06) + ) + ) + (norm): LlamaRMSNorm((5120,), eps=1e-06) + (rotary_emb): LlamaRotaryEmbedding() + (vec_proj): Sequential( + (0): Linear(in_features=1024, out_features=1024, bias=True) + (1): GELU(approximate='none') + (2): Linear(in_features=1024, out_features=2048, bias=True) + (3): GELU(approximate='none') + (4): Linear(in_features=2048, out_features=5120, bias=True) + ) + ) + (lm_head): Linear(in_features=5120, out_features=32003, bias=False) +) +2025-02-05 10:06:03 - INFO - llana.train.train_llana - Total number of parameters: 13,029,534,720 +2025-02-05 10:06:03 - INFO - llana.train.train_llana - Total number of parameters: 13,029,534,720 +2025-02-05 10:06:03 - INFO - llana.train.train_llana - Total number of parameters: 13,029,534,720 +2025-02-05 10:06:03 - INFO - llana.train.train_llana - Total number of parameters: 13,029,534,720 
+2025-02-05 10:06:03 - INFO - llana.train.train_llana - Total number of trainable parameters: 13,029,534,720 +2025-02-05 10:06:03 - INFO - llana.train.train_llana - Total number of trainable parameters: 13,029,534,720 +2025-02-05 10:06:03 - INFO - llana.train.train_llana - Total number of trainable parameters: 13,029,534,720 +2025-02-05 10:06:03 - INFO - llana.train.train_llana - ============================================= +2025-02-05 10:06:03 - INFO - llana.train.train_llana - ============================================= +2025-02-05 10:06:03 - INFO - llana.train.train_llana - ============================================= +2025-02-05 10:06:03 - INFO - stdout - ***** output_dir: outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script +2025-02-05 10:06:03 - INFO - stdout - ***** output_dir: outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script +2025-02-05 10:06:03 - INFO - stdout - ***** output_dir: outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script +2025-02-05 10:06:03 - INFO - llana.train.train_llana - Total number of trainable parameters: 13,029,534,720 +2025-02-05 10:06:03 - INFO - llana.train.train_llana - ============================================= +2025-02-05 10:06:03 - INFO - stdout - ***** output_dir: outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script +2025-02-05 10:06:03 - INFO - stdout - **** training from scratch **** +2025-02-05 10:06:03 - INFO - stdout - **** training from scratch **** +2025-02-05 10:06:03 - INFO - stdout - **** training from scratch **** +2025-02-05 10:06:03 - INFO - stdout - **** training from scratch **** +2025-02-05 10:06:03 - WARNING - accelerate.utils.other - Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher. 
+2025-02-05 10:06:03 - INFO - llana.train.train_llana - =========== LLANA PARAMETERS ============= +2025-02-05 10:06:03 - INFO - llana.train.train_llana - =========== LLANA PARAMETERS ============= +2025-02-05 10:06:03 - INFO - llana.train.train_llana - =========== LLANA PARAMETERS ============= +2025-02-05 10:06:03 - INFO - transformers.trainer - Using auto half precision backend +2025-02-05 10:06:03 - INFO - transformers.trainer - Using auto half precision backend +2025-02-05 10:06:03 - INFO - llana.train.train_llana - =========== LLANA PARAMETERS ============= +2025-02-05 10:06:03 - INFO - llana.train.train_llana - lm_head architecture: LLaNA( + (model): LLaNAModel( + (embed_tokens): Embedding(32003, 5120, padding_idx=0) + (layers): ModuleList( + (0-39): 40 x LlamaDecoderLayer( + (self_attn): LlamaSdpaAttention( + (q_proj): Linear(in_features=5120, out_features=5120, bias=False) + (k_proj): Linear(in_features=5120, out_features=5120, bias=False) + (v_proj): Linear(in_features=5120, out_features=5120, bias=False) + (o_proj): Linear(in_features=5120, out_features=5120, bias=False) + (rotary_emb): LlamaRotaryEmbedding() + ) + (mlp): LlamaMLP( + (gate_proj): Linear(in_features=5120, out_features=13824, bias=False) + (up_proj): Linear(in_features=5120, out_features=13824, bias=False) + (down_proj): Linear(in_features=13824, out_features=5120, bias=False) + (act_fn): SiLU() + ) + (input_layernorm): LlamaRMSNorm((5120,), eps=1e-06) + (post_attention_layernorm): LlamaRMSNorm((5120,), eps=1e-06) + ) + ) + (norm): LlamaRMSNorm((5120,), eps=1e-06) + (rotary_emb): LlamaRotaryEmbedding() + (vec_proj): Sequential( + (0): Linear(in_features=1024, out_features=1024, bias=True) + (1): GELU(approximate='none') + (2): Linear(in_features=1024, out_features=2048, bias=True) + (3): GELU(approximate='none') + (4): Linear(in_features=2048, out_features=5120, bias=True) + ) + ) + (lm_head): Linear(in_features=5120, out_features=32003, bias=False) +) +2025-02-05 10:06:03 - INFO - 
llana.train.train_llana - lm_head architecture: LLaNA( + (model): LLaNAModel( + (embed_tokens): Embedding(32003, 5120, padding_idx=0) + (layers): ModuleList( + (0-39): 40 x LlamaDecoderLayer( + (self_attn): LlamaSdpaAttention( + (q_proj): Linear(in_features=5120, out_features=5120, bias=False) + (k_proj): Linear(in_features=5120, out_features=5120, bias=False) + (v_proj): Linear(in_features=5120, out_features=5120, bias=False) + (o_proj): Linear(in_features=5120, out_features=5120, bias=False) + (rotary_emb): LlamaRotaryEmbedding() + ) + (mlp): LlamaMLP( + (gate_proj): Linear(in_features=5120, out_features=13824, bias=False) + (up_proj): Linear(in_features=5120, out_features=13824, bias=False) + (down_proj): Linear(in_features=13824, out_features=5120, bias=False) + (act_fn): SiLU() + ) + (input_layernorm): LlamaRMSNorm((5120,), eps=1e-06) + (post_attention_layernorm): LlamaRMSNorm((5120,), eps=1e-06) + ) + ) + (norm): LlamaRMSNorm((5120,), eps=1e-06) + (rotary_emb): LlamaRotaryEmbedding() + (vec_proj): Sequential( + (0): Linear(in_features=1024, out_features=1024, bias=True) + (1): GELU(approximate='none') + (2): Linear(in_features=1024, out_features=2048, bias=True) + (3): GELU(approximate='none') + (4): Linear(in_features=2048, out_features=5120, bias=True) + ) + ) + (lm_head): Linear(in_features=5120, out_features=32003, bias=False) +) +2025-02-05 10:06:03 - INFO - llana.train.train_llana - lm_head architecture: LLaNA( + (model): LLaNAModel( + (embed_tokens): Embedding(32003, 5120, padding_idx=0) + (layers): ModuleList( + (0-39): 40 x LlamaDecoderLayer( + (self_attn): LlamaSdpaAttention( + (q_proj): Linear(in_features=5120, out_features=5120, bias=False) + (k_proj): Linear(in_features=5120, out_features=5120, bias=False) + (v_proj): Linear(in_features=5120, out_features=5120, bias=False) + (o_proj): Linear(in_features=5120, out_features=5120, bias=False) + (rotary_emb): LlamaRotaryEmbedding() + ) + (mlp): LlamaMLP( + (gate_proj): Linear(in_features=5120, 
out_features=13824, bias=False) + (up_proj): Linear(in_features=5120, out_features=13824, bias=False) + (down_proj): Linear(in_features=13824, out_features=5120, bias=False) + (act_fn): SiLU() + ) + (input_layernorm): LlamaRMSNorm((5120,), eps=1e-06) + (post_attention_layernorm): LlamaRMSNorm((5120,), eps=1e-06) + ) + ) + (norm): LlamaRMSNorm((5120,), eps=1e-06) + (rotary_emb): LlamaRotaryEmbedding() + (vec_proj): Sequential( + (0): Linear(in_features=1024, out_features=1024, bias=True) + (1): GELU(approximate='none') + (2): Linear(in_features=1024, out_features=2048, bias=True) + (3): GELU(approximate='none') + (4): Linear(in_features=2048, out_features=5120, bias=True) + ) + ) + (lm_head): Linear(in_features=5120, out_features=32003, bias=False) +) +2025-02-05 10:06:03 - INFO - llana.train.train_llana - lm_head architecture: LLaNA( + (model): LLaNAModel( + (embed_tokens): Embedding(32003, 5120, padding_idx=0) + (layers): ModuleList( + (0-39): 40 x LlamaDecoderLayer( + (self_attn): LlamaSdpaAttention( + (q_proj): Linear(in_features=5120, out_features=5120, bias=False) + (k_proj): Linear(in_features=5120, out_features=5120, bias=False) + (v_proj): Linear(in_features=5120, out_features=5120, bias=False) + (o_proj): Linear(in_features=5120, out_features=5120, bias=False) + (rotary_emb): LlamaRotaryEmbedding() + ) + (mlp): LlamaMLP( + (gate_proj): Linear(in_features=5120, out_features=13824, bias=False) + (up_proj): Linear(in_features=5120, out_features=13824, bias=False) + (down_proj): Linear(in_features=13824, out_features=5120, bias=False) + (act_fn): SiLU() + ) + (input_layernorm): LlamaRMSNorm((5120,), eps=1e-06) + (post_attention_layernorm): LlamaRMSNorm((5120,), eps=1e-06) + ) + ) + (norm): LlamaRMSNorm((5120,), eps=1e-06) + (rotary_emb): LlamaRotaryEmbedding() + (vec_proj): Sequential( + (0): Linear(in_features=1024, out_features=1024, bias=True) + (1): GELU(approximate='none') + (2): Linear(in_features=1024, out_features=2048, bias=True) + (3): 
GELU(approximate='none') + (4): Linear(in_features=2048, out_features=5120, bias=True) + ) + ) + (lm_head): Linear(in_features=5120, out_features=32003, bias=False) +) +2025-02-05 10:06:03 - INFO - llana.train.train_llana - Total number of parameters: 13,029,534,720 +2025-02-05 10:06:03 - INFO - llana.train.train_llana - Total number of parameters: 13,029,534,720 +2025-02-05 10:06:03 - INFO - llana.train.train_llana - Total number of parameters: 13,029,534,720 +2025-02-05 10:06:03 - INFO - llana.train.train_llana - Total number of parameters: 13,029,534,720 +2025-02-05 10:06:03 - INFO - llana.train.train_llana - Total number of trainable parameters: 13,029,534,720 +2025-02-05 10:06:03 - INFO - llana.train.train_llana - Total number of trainable parameters: 13,029,534,720 +2025-02-05 10:06:03 - INFO - llana.train.train_llana - ============================================= +2025-02-05 10:06:03 - INFO - llana.train.train_llana - ============================================= +2025-02-05 10:06:03 - INFO - llana.train.train_llana - Total number of trainable parameters: 13,029,534,720 +2025-02-05 10:06:03 - INFO - stdout - ***** output_dir: outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script +2025-02-05 10:06:03 - INFO - stdout - ***** output_dir: outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script +2025-02-05 10:06:03 - INFO - llana.train.train_llana - ============================================= +2025-02-05 10:06:03 - INFO - stdout - ***** output_dir: outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script +2025-02-05 10:06:03 - INFO - llana.train.train_llana - Total number of trainable parameters: 13,029,534,720 +2025-02-05 10:06:03 - INFO - stdout - **** training from scratch **** +2025-02-05 10:06:03 - INFO - llana.train.train_llana - ============================================= +2025-02-05 10:06:03 - INFO - stdout - **** training from scratch **** +2025-02-05 10:06:03 - INFO - 
stdout - ***** output_dir: outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script +2025-02-05 10:06:03 - INFO - stdout - **** training from scratch **** +2025-02-05 10:06:03 - INFO - stdout - **** training from scratch **** +2025-02-05 10:06:04 - INFO - stdout - [2025-02-05 10:06:04,043] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +2025-02-05 10:06:04 - INFO - stdout - [2025-02-05 10:06:04,043] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +2025-02-05 10:06:04 - INFO - stdout - [2025-02-05 10:06:04,043] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +2025-02-05 10:06:04 - INFO - stdout - [2025-02-05 10:06:04,043] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +2025-02-05 10:06:04 - INFO - stdout - [2025-02-05 10:06:04,107] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +2025-02-05 10:06:04 - INFO - stdout - [2025-02-05 10:06:04,107] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +2025-02-05 10:06:04 - INFO - stdout - [2025-02-05 10:06:04,107] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +2025-02-05 10:06:04 - INFO - stdout - [2025-02-05 10:06:04,107] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. 
+2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  async_io: please install the libaio-devel package with yum +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  async_io: please install the libaio-devel package with yum +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  async_io: please install the libaio-devel package with yum +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  async_io: please install the libaio-devel package with yum +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. 
+2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  async_io: please install the libaio-devel package with yum +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  async_io: please install the libaio-devel package with yum +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  async_io: please install the libaio-devel package with yum +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  async_io: please install the libaio-devel package with yum +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. 
+2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 10:06:04 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 10:06:05 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 10:06:05 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 10:06:05 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 10:06:05 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 10:06:05 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be 
compatible +2025-02-05 10:06:05 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 10:06:05 - INFO - stdout -  [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2 +2025-02-05 10:06:05 - INFO - stdout -  [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible +2025-02-05 10:07:39 - INFO - transformers.trainer - ***** Running training ***** +2025-02-05 10:07:39 - INFO - transformers.trainer - ***** Running training ***** +2025-02-05 10:07:39 - INFO - transformers.trainer - Num examples = 478,531 +2025-02-05 10:07:39 - INFO - transformers.trainer - Num examples = 478,531 +2025-02-05 10:07:39 - INFO - transformers.trainer - Num Epochs = 3 +2025-02-05 10:07:39 - INFO - transformers.trainer - Num Epochs = 3 +2025-02-05 10:07:39 - INFO - transformers.trainer - Instantaneous batch size per device = 4 +2025-02-05 10:07:39 - INFO - transformers.trainer - Instantaneous batch size per device = 4 +2025-02-05 10:07:39 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 64 +2025-02-05 10:07:39 - INFO - transformers.trainer - Total train batch size (w. 
parallel, distributed & accumulation) = 64 +2025-02-05 10:07:39 - INFO - transformers.trainer - Gradient Accumulation steps = 1 +2025-02-05 10:07:39 - INFO - transformers.trainer - Gradient Accumulation steps = 1 +2025-02-05 10:07:39 - INFO - transformers.trainer - Total optimization steps = 22,434 +2025-02-05 10:07:39 - INFO - transformers.trainer - Total optimization steps = 22,434 +2025-02-05 10:07:39 - INFO - transformers.trainer - Number of trainable parameters = 814,345,920 +2025-02-05 10:07:39 - INFO - transformers.trainer - Number of trainable parameters = 814,345,920 +2025-02-05 10:07:39 - INFO - transformers.trainer - ***** Running training ***** +2025-02-05 10:07:39 - INFO - transformers.trainer - ***** Running training ***** +2025-02-05 10:07:39 - INFO - transformers.trainer - Num examples = 478,531 +2025-02-05 10:07:39 - INFO - transformers.trainer - Num examples = 478,531 +2025-02-05 10:07:39 - INFO - transformers.trainer - Num Epochs = 3 +2025-02-05 10:07:39 - INFO - transformers.trainer - Num Epochs = 3 +2025-02-05 10:07:39 - INFO - transformers.trainer - Instantaneous batch size per device = 4 +2025-02-05 10:07:39 - INFO - transformers.trainer - Instantaneous batch size per device = 4 +2025-02-05 10:07:39 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 64 +2025-02-05 10:07:39 - INFO - transformers.trainer - Total train batch size (w. 
parallel, distributed & accumulation) = 64 +2025-02-05 10:07:39 - INFO - transformers.trainer - Gradient Accumulation steps = 1 +2025-02-05 10:07:39 - INFO - transformers.trainer - Gradient Accumulation steps = 1 +2025-02-05 10:07:39 - INFO - transformers.trainer - Total optimization steps = 22,434 +2025-02-05 10:07:39 - INFO - transformers.trainer - Total optimization steps = 22,434 +2025-02-05 10:07:39 - INFO - transformers.trainer - Number of trainable parameters = 814,345,920 +2025-02-05 10:07:39 - INFO - transformers.trainer - Number of trainable parameters = 814,345,920 +2025-02-05 10:07:39 - INFO - transformers.trainer - ***** Running training ***** +2025-02-05 10:07:39 - INFO - transformers.trainer - ***** Running training ***** +2025-02-05 10:07:39 - INFO - transformers.trainer - Num examples = 478,531 +2025-02-05 10:07:39 - INFO - transformers.trainer - Num examples = 478,531 +2025-02-05 10:07:39 - INFO - transformers.trainer - Num Epochs = 3 +2025-02-05 10:07:39 - INFO - transformers.trainer - Num Epochs = 3 +2025-02-05 10:07:39 - INFO - transformers.trainer - Instantaneous batch size per device = 4 +2025-02-05 10:07:39 - INFO - transformers.trainer - Instantaneous batch size per device = 4 +2025-02-05 10:07:39 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 64 +2025-02-05 10:07:39 - INFO - transformers.trainer - Total train batch size (w. 
parallel, distributed & accumulation) = 64 +2025-02-05 10:07:39 - INFO - transformers.trainer - Gradient Accumulation steps = 1 +2025-02-05 10:07:39 - INFO - transformers.trainer - Gradient Accumulation steps = 1 +2025-02-05 10:07:39 - INFO - transformers.trainer - Total optimization steps = 22,434 +2025-02-05 10:07:39 - INFO - transformers.trainer - Total optimization steps = 22,434 +2025-02-05 10:07:39 - INFO - transformers.trainer - Number of trainable parameters = 814,345,920 +2025-02-05 10:07:39 - INFO - transformers.trainer - Number of trainable parameters = 814,345,920 +2025-02-05 10:07:39 - INFO - transformers.trainer - ***** Running training ***** +2025-02-05 10:07:39 - INFO - transformers.trainer - ***** Running training ***** +2025-02-05 10:07:39 - INFO - transformers.trainer - Num examples = 478,531 +2025-02-05 10:07:39 - INFO - transformers.trainer - Num examples = 478,531 +2025-02-05 10:07:39 - INFO - transformers.trainer - Num Epochs = 3 +2025-02-05 10:07:39 - INFO - transformers.trainer - Num Epochs = 3 +2025-02-05 10:07:39 - INFO - transformers.trainer - Instantaneous batch size per device = 4 +2025-02-05 10:07:39 - INFO - transformers.trainer - Instantaneous batch size per device = 4 +2025-02-05 10:07:39 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 64 +2025-02-05 10:07:39 - INFO - transformers.trainer - Total train batch size (w. 
parallel, distributed & accumulation) = 64 +2025-02-05 10:07:39 - INFO - transformers.trainer - Gradient Accumulation steps = 1 +2025-02-05 10:07:39 - INFO - transformers.trainer - Gradient Accumulation steps = 1 +2025-02-05 10:07:39 - INFO - transformers.trainer - Total optimization steps = 22,434 +2025-02-05 10:07:39 - INFO - transformers.trainer - Total optimization steps = 22,434 +2025-02-05 10:07:39 - INFO - transformers.trainer - Number of trainable parameters = 814,345,920 +2025-02-05 10:07:39 - INFO - transformers.trainer - Number of trainable parameters = 814,345,920 +2025-02-05 10:07:39 - INFO - transformers.integrations.integration_utils - Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true" +2025-02-05 10:07:39 - INFO - transformers.integrations.integration_utils - Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true" +2025-02-05 10:07:39 - ERROR - stderr - wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information. 
+2025-02-05 10:07:39 - INFO - wandb - Current SDK version is 0.18.3 +2025-02-05 10:07:39 - INFO - wandb - Configure stats pid to 1377777 +2025-02-05 10:07:39 - INFO - wandb - Loading settings from /leonardo/home/userexternal/aamaduzz/.config/wandb/settings +2025-02-05 10:07:39 - INFO - wandb - Loading settings from /leonardo_scratch/fast/IscrC_V2Text/dev/LLaNA_objanerf/wandb/settings +2025-02-05 10:07:39 - INFO - wandb - Loading settings from environment variables: {'mode': 'offline'} +2025-02-05 10:07:39 - INFO - wandb - Applying setup settings: {'mode': 'offline', '_disable_service': None} +2025-02-05 10:07:39 - INFO - wandb - Inferring run settings from compute environment: {'program_relpath': 'llana/train/train_mem_llana.py', 'program_abspath': '/leonardo_scratch/fast/IscrC_V2Text/dev/LLaNA_objanerf/llana/train/train_mem_llana.py', 'program': '/leonardo_scratch/fast/IscrC_V2Text/dev/LLaNA_objanerf/llana/train/train_mem_llana.py'} +2025-02-05 10:07:39 - INFO - wandb - Logging user logs to /leonardo_scratch/fast/IscrC_V2Text/dev/LLaNA_objanerf/wandb/offline-run-20250205_100739-go8ons0a/logs/debug.log +2025-02-05 10:07:39 - INFO - wandb - Logging internal logs to /leonardo_scratch/fast/IscrC_V2Text/dev/LLaNA_objanerf/wandb/offline-run-20250205_100739-go8ons0a/logs/debug-internal.log +2025-02-05 10:07:39 - INFO - wandb - calling init triggers +2025-02-05 10:07:39 - INFO - wandb - wandb.init called with sweep_config: {} +config: {} +2025-02-05 10:07:39 - INFO - wandb - starting backend +2025-02-05 10:07:39 - INFO - wandb - sending inform_init request +2025-02-05 10:07:39 - INFO - wandb - multiprocessing start_methods=fork,spawn,forkserver, using: spawn +2025-02-05 10:07:39 - INFO - wandb - backend started and connected +2025-02-05 10:07:39 - DEBUG - wandb - no default config file found in config-defaults.yaml +2025-02-05 10:07:39 - INFO - wandb - updated telemetry +2025-02-05 10:07:39 - INFO - wandb - communicating run to backend with 90.0 second timeout +2025-02-05 
10:07:39 - INFO - wandb - starting run threads in backend +2025-02-05 10:07:39 - ERROR - stderr - wandb: Tracking run with wandb version 0.18.3 +2025-02-05 10:07:39 - ERROR - stderr - wandb: W&B syncing is set to `offline` in this directory. +2025-02-05 10:07:39 - ERROR - stderr - wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing. +2025-02-05 10:07:39 - DEBUG - wandb - Saving list of pip packages installed into the current environment +2025-02-05 10:07:40 - INFO - wandb - atexit reg +2025-02-05 10:07:40 - INFO - wandb - redirect: wrap_raw +2025-02-05 10:07:40 - INFO - wandb - Wrapping output streams. +2025-02-05 10:07:40 - INFO - wandb - Redirects installed. +2025-02-05 10:07:40 - INFO - wandb - run started, returning control to user process +2025-02-05 10:07:40 - INFO - wandb - config_cb None None {'vocab_size': 32003, 'max_position_embeddings': 2048, 'hidden_size': 5120, 'intermediate_size': 13824, 'num_hidden_layers': 40, 'num_attention_heads': 40, 'num_key_value_heads': 40, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-06, 'pretraining_tp': 1, 'use_cache': False, 'rope_theta': 10000.0, 'rope_scaling': None, 'attention_bias': False, 'attention_dropout': 0.0, 'mlp_bias': False, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float32', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': False, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 
'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['LLaNA'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 1, 'pad_token_id': 0, 'eos_token_id': 2, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'outputs/LLaNA_13B_train_stage1_shapenerf_objanerf_AUGMENTED/slurm_script_31-01-2025_19:29', 'transformers_version': '4.44.0', 'DEFAULT_POINT_END_TOKEN': '', 'DEFAULT_POINT_PATCH_TOKEN': '', 'DEFAULT_POINT_START_TOKEN': '', 'mm_use_point_start_end': True, 'model_type': 'llana', 'nf2vec_config_name': 'nf2vec_2layer', 'point_backbone': 'nf2vec', 'point_backbone_ckpt': '', 'use_color': True, 'output_dir': 'outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': False, 'do_predict': False, 'eval_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 4, 'per_device_eval_batch_size': 1, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 2e-05, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 3.0, 'max_steps': -1, 'lr_scheduler_type': 'cosine', 'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.03, 'warmup_steps': 0, 'log_level': 'info', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': 
'outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/runs/Feb05_10-05-54_lrdn2016.leonardo.local', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 1.0, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 32860, 'save_total_limit': 1, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': True, 'fp16': False, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': 100.0, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': 'llana_objanerf_13b_stage2_recipe3_shapenerf_objanerf_AUGMENTED', 'disable_tqdm': False, 'remove_unused_columns': False, 'label_names': None, 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': ['full_shard', 'auto_wrap'], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'transformer_layer_cls_to_wrap': ['LlamaDecoderLayer'], 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': 'LlamaDecoderLayer', 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 
'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': False, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '', 'hub_private_repo': False, 'hub_always_push': False, 'gradient_checkpointing': True, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': 'no', 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'eval_use_gather_object': False, 'cache_dir': None, 'model_max_length': 2048, 'model_debug': False, 'fix_llm': False, 'force_fsdp': False, 'tune_mm_mlp_adapter': True, 'stage_2': True, 'pretrained_mm_mlp_adapter': None, 'detatch_point_token': ''} +2025-02-05 10:07:40 - INFO - wandb - config set model/num_parameters = 814345920 - > +2025-02-05 10:07:40 - INFO - wandb - config_cb model/num_parameters 814345920 None +2025-02-05 10:07:40 - ERROR - stderr - 0%| | 0/22434 [00:00 2048). Running this sequence through the model will result in indexing errors +2025-02-05 11:42:16 - WARNING - transformers.tokenization_utils_base - Token indices sequence length is longer than the specified maximum sequence length for this model (2915 > 2048). 
Running this sequence through the model will result in indexing errors +2025-02-05 11:42:18 - ERROR - stderr - 10%|█ | 2253/22434 [1:34:38<14:01:56, 2.50s/it] +2025-02-05 11:42:18 - ERROR - stderr - +2025-02-05 11:42:18 - ERROR - stderr - +2025-02-05 11:42:18 - INFO - stdout - {'loss': 1.1228, 'grad_norm': 1.15614914894104, 'learning_rate': 1.9741277270962225e-05, 'epoch': 0.3} +2025-02-05 11:42:18 - ERROR - stderr - 10%|█ | 2253/22434 [1:34:38<14:01:56, 2.50s/it] +2025-02-05 11:42:24 - ERROR - stderr - 10%|█ | 2254/22434 [1:34:44<19:32:52, 3.49s/it] +2025-02-05 11:42:24 - ERROR - stderr - +2025-02-05 11:42:24 - ERROR - stderr - +2025-02-05 11:42:24 - INFO - stdout - {'loss': 1.0237, 'grad_norm': 1.1618907451629639, 'learning_rate': 1.9740950885349536e-05, 'epoch': 0.3} +2025-02-05 11:42:24 - ERROR - stderr - 10%|█ | 2254/22434 [1:34:44<19:32:52, 3.49s/it] +2025-02-05 11:42:26 - ERROR - stderr - 10%|█ | 2255/22434 [1:34:46<17:52:36, 3.19s/it] +2025-02-05 11:42:26 - ERROR - stderr - +2025-02-05 11:42:26 - ERROR - stderr - +2025-02-05 11:42:26 - INFO - stdout - {'loss': 1.022, 'grad_norm': 1.3347283601760864, 'learning_rate': 1.974062429669605e-05, 'epoch': 0.3} +2025-02-05 11:42:26 - ERROR - stderr - 10%|█ | 2255/22434 [1:34:46<17:52:36, 3.19s/it] +2025-02-05 11:42:29 - ERROR - stderr - 10%|█ | 2256/22434 [1:34:49<16:46:05, 2.99s/it] +2025-02-05 11:42:29 - ERROR - stderr - +2025-02-05 11:42:29 - ERROR - stderr - +2025-02-05 11:42:29 - INFO - stdout - {'loss': 0.9313, 'grad_norm': 0.9908042550086975, 'learning_rate': 1.9740297505008565e-05, 'epoch': 0.3} +2025-02-05 11:42:29 - ERROR - stderr - 10%|█ | 2256/22434 [1:34:49<16:46:05, 2.99s/it] +2025-02-05 11:42:31 - ERROR - stderr - 10%|█ | 2257/22434 [1:34:51<16:09:34, 2.88s/it] +2025-02-05 11:42:32 - ERROR - stderr - +2025-02-05 11:42:32 - ERROR - stderr - +2025-02-05 11:42:32 - INFO - stdout - {'loss': 1.0773, 'grad_norm': 1.2134469747543335, 'learning_rate': 1.9739970510293903e-05, 'epoch': 0.3} +2025-02-05 11:42:32 
- ERROR - stderr - 10%|█ | 2257/22434 [1:34:51<16:09:34, 2.88s/it] +2025-02-05 11:42:34 - ERROR - stderr - 10%|█ | 2258/22434 [1:34:54<15:32:45, 2.77s/it] +2025-02-05 11:42:34 - ERROR - stderr - +2025-02-05 11:42:34 - ERROR - stderr - +2025-02-05 11:42:34 - INFO - stdout - {'loss': 0.8971, 'grad_norm': 1.1534372568130493, 'learning_rate': 1.9739643312558875e-05, 'epoch': 0.3} +2025-02-05 11:42:34 - ERROR - stderr - 10%|█ | 2258/22434 [1:34:54<15:32:45, 2.77s/it] +2025-02-05 11:42:36 - ERROR - stderr - 10%|█ | 2259/22434 [1:34:56<14:59:07, 2.67s/it] +2025-02-05 11:42:36 - ERROR - stderr - +2025-02-05 11:42:36 - ERROR - stderr - +2025-02-05 11:42:36 - INFO - stdout - {'loss': 0.9809, 'grad_norm': 1.1988004446029663, 'learning_rate': 1.97393159118103e-05, 'epoch': 0.3} +2025-02-05 11:42:36 - ERROR - stderr - 10%|█ | 2259/22434 [1:34:56<14:59:07, 2.67s/it] +2025-02-05 11:42:39 - ERROR - stderr - 10%|█ | 2260/22434 [1:34:59<14:35:24, 2.60s/it] +2025-02-05 11:42:39 - ERROR - stderr - +2025-02-05 11:42:39 - ERROR - stderr - +2025-02-05 11:42:39 - INFO - stdout - {'loss': 1.0303, 'grad_norm': 1.278108835220337, 'learning_rate': 1.9738988308055006e-05, 'epoch': 0.3} +2025-02-05 11:42:39 - ERROR - stderr - 10%|█ | 2260/22434 [1:34:59<14:35:24, 2.60s/it] +2025-02-05 11:42:41 - ERROR - stderr - 10%|█ | 2261/22434 [1:35:01<14:25:26, 2.57s/it] +2025-02-05 11:42:41 - ERROR - stderr - +2025-02-05 11:42:41 - ERROR - stderr - +2025-02-05 11:42:41 - INFO - stdout - {'loss': 1.0356, 'grad_norm': 1.265594720840454, 'learning_rate': 1.9738660501299823e-05, 'epoch': 0.3} +2025-02-05 11:42:41 - ERROR - stderr - 10%|█ | 2261/22434 [1:35:01<14:25:26, 2.57s/it] +2025-02-05 11:42:44 - ERROR - stderr - 10%|█ | 2262/22434 [1:35:04<14:16:22, 2.55s/it] +2025-02-05 11:42:44 - ERROR - stderr - +2025-02-05 11:42:44 - ERROR - stderr - +2025-02-05 11:42:44 - INFO - stdout - {'loss': 1.0335, 'grad_norm': 1.1128697395324707, 'learning_rate': 1.9738332491551574e-05, 'epoch': 0.3} +2025-02-05 11:42:44 - 
ERROR - stderr - 10%|█ | 2262/22434 [1:35:04<14:16:22, 2.55s/it] +2025-02-05 11:42:46 - ERROR - stderr - 10%|█ | 2263/22434 [1:35:06<14:21:00, 2.56s/it] +2025-02-05 11:42:47 - ERROR - stderr - +2025-02-05 11:42:47 - ERROR - stderr - +2025-02-05 11:42:47 - INFO - stdout - {'loss': 0.9489, 'grad_norm': 1.1275129318237305, 'learning_rate': 1.9738004278817107e-05, 'epoch': 0.3} +2025-02-05 11:42:47 - ERROR - stderr - 10%|█ | 2263/22434 [1:35:06<14:21:00, 2.56s/it] +2025-02-05 11:42:49 - ERROR - stderr - 10%|█ | 2264/22434 [1:35:09<14:06:48, 2.52s/it] +2025-02-05 11:42:49 - ERROR - stderr - +2025-02-05 11:42:49 - ERROR - stderr - +2025-02-05 11:42:49 - INFO - stdout - {'loss': 1.1969, 'grad_norm': 1.2506957054138184, 'learning_rate': 1.9737675863103257e-05, 'epoch': 0.3} +2025-02-05 11:42:49 - ERROR - stderr - 10%|█ | 2264/22434 [1:35:09<14:06:48, 2.52s/it] +2025-02-05 11:42:51 - ERROR - stderr - 10%|█ | 2265/22434 [1:35:11<14:16:34, 2.55s/it] +2025-02-05 11:42:52 - ERROR - stderr - +2025-02-05 11:42:52 - ERROR - stderr - +2025-02-05 11:42:52 - INFO - stdout - {'loss': 0.8943, 'grad_norm': 1.1707462072372437, 'learning_rate': 1.9737347244416876e-05, 'epoch': 0.3} +2025-02-05 11:42:52 - ERROR - stderr - 10%|█ | 2265/22434 [1:35:11<14:16:34, 2.55s/it] +2025-02-05 11:42:54 - ERROR - stderr - 10%|█ | 2266/22434 [1:35:14<14:13:55, 2.54s/it] +2025-02-05 11:42:54 - ERROR - stderr - +2025-02-05 11:42:54 - ERROR - stderr - +2025-02-05 11:42:54 - INFO - stdout - {'loss': 0.957, 'grad_norm': 1.3129950761795044, 'learning_rate': 1.9737018422764803e-05, 'epoch': 0.3} +2025-02-05 11:42:54 - ERROR - stderr - 10%|█ | 2266/22434 [1:35:14<14:13:55, 2.54s/it] +2025-02-05 11:42:57 - ERROR - stderr - 10%|█ | 2267/22434 [1:35:16<14:20:26, 2.56s/it] +2025-02-05 11:42:57 - ERROR - stderr - +2025-02-05 11:42:57 - ERROR - stderr - +2025-02-05 11:42:57 - INFO - stdout - {'loss': 1.0951, 'grad_norm': 1.3260670900344849, 'learning_rate': 1.9736689398153905e-05, 'epoch': 0.3} +2025-02-05 11:42:57 - 
ERROR - stderr - 10%|█ | 2267/22434 [1:35:16<14:20:26, 2.56s/it] +2025-02-05 11:42:59 - ERROR - stderr - 10%|█ | 2268/22434 [1:35:19<14:05:29, 2.52s/it] +2025-02-05 11:42:59 - ERROR - stderr - +2025-02-05 11:42:59 - ERROR - stderr - +2025-02-05 11:42:59 - INFO - stdout - {'loss': 1.0037, 'grad_norm': 1.1444002389907837, 'learning_rate': 1.973636017059103e-05, 'epoch': 0.3} +2025-02-05 11:42:59 - ERROR - stderr - 10%|█ | 2268/22434 [1:35:19<14:05:29, 2.52s/it] +2025-02-05 11:43:02 - ERROR - stderr - 10%|█ | 2269/22434 [1:35:21<14:05:10, 2.51s/it] +2025-02-05 11:43:02 - ERROR - stderr - +2025-02-05 11:43:02 - ERROR - stderr - +2025-02-05 11:43:02 - INFO - stdout - {'loss': 0.9369, 'grad_norm': 1.2704200744628906, 'learning_rate': 1.9736030740083045e-05, 'epoch': 0.3} +2025-02-05 11:43:02 - ERROR - stderr - 10%|█ | 2269/22434 [1:35:21<14:05:10, 2.51s/it] +2025-02-05 11:43:04 - ERROR - stderr - 10%|█ | 2270/22434 [1:35:24<14:03:51, 2.51s/it] +2025-02-05 11:43:04 - ERROR - stderr - +2025-02-05 11:43:04 - ERROR - stderr - +2025-02-05 11:43:04 - INFO - stdout - {'loss': 1.093, 'grad_norm': 1.2839304208755493, 'learning_rate': 1.9735701106636814e-05, 'epoch': 0.3} +2025-02-05 11:43:04 - ERROR - stderr - 10%|█ | 2270/22434 [1:35:24<14:03:51, 2.51s/it] +2025-02-05 11:43:07 - ERROR - stderr - 10%|█ | 2271/22434 [1:35:26<13:59:52, 2.50s/it] +2025-02-05 11:43:07 - ERROR - stderr - +2025-02-05 11:43:07 - ERROR - stderr - +2025-02-05 11:43:07 - INFO - stdout - {'loss': 0.9884, 'grad_norm': 1.1554511785507202, 'learning_rate': 1.973537127025921e-05, 'epoch': 0.3} +2025-02-05 11:43:07 - ERROR - stderr - 10%|█ | 2271/22434 [1:35:26<13:59:52, 2.50s/it] +2025-02-05 11:43:09 - ERROR - stderr - 10%|█ | 2272/22434 [1:35:29<13:54:15, 2.48s/it] +2025-02-05 11:43:09 - ERROR - stderr - +2025-02-05 11:43:09 - ERROR - stderr - +2025-02-05 11:43:09 - INFO - stdout - {'loss': 0.9654, 'grad_norm': 1.139846682548523, 'learning_rate': 1.9735041230957108e-05, 'epoch': 0.3} +2025-02-05 11:43:09 - 
ERROR - stderr - 10%|█ | 2272/22434 [1:35:29<13:54:15, 2.48s/it] +2025-02-05 11:43:11 - ERROR - stderr - 10%|█ | 2273/22434 [1:35:31<13:57:37, 2.49s/it] +2025-02-05 11:43:12 - ERROR - stderr - +2025-02-05 11:43:12 - ERROR - stderr - +2025-02-05 11:43:12 - INFO - stdout - {'loss': 0.9497, 'grad_norm': 1.1091350317001343, 'learning_rate': 1.9734710988737385e-05, 'epoch': 0.3} +2025-02-05 11:43:12 - ERROR - stderr - 10%|█ | 2273/22434 [1:35:31<13:57:37, 2.49s/it] +2025-02-05 11:43:14 - ERROR - stderr - 10%|█ | 2274/22434 [1:35:34<14:04:48, 2.51s/it] +2025-02-05 11:43:14 - ERROR - stderr - +2025-02-05 11:43:14 - ERROR - stderr - +2025-02-05 11:43:14 - INFO - stdout - {'loss': 0.9337, 'grad_norm': 1.1415122747421265, 'learning_rate': 1.9734380543606932e-05, 'epoch': 0.3} +2025-02-05 11:43:14 - ERROR - stderr - 10%|█ | 2274/22434 [1:35:34<14:04:48, 2.51s/it] +2025-02-05 11:43:16 - ERROR - stderr - 10%|█ | 2275/22434 [1:35:36<13:54:29, 2.48s/it] +2025-02-05 11:43:17 - ERROR - stderr - +2025-02-05 11:43:17 - ERROR - stderr - +2025-02-05 11:43:17 - INFO - stdout - {'loss': 0.9536, 'grad_norm': 1.2031067609786987, 'learning_rate': 1.9734049895572626e-05, 'epoch': 0.3} +2025-02-05 11:43:17 - ERROR - stderr - 10%|█ | 2275/22434 [1:35:36<13:54:29, 2.48s/it] +2025-02-05 11:43:19 - ERROR - stderr - 10%|█ | 2276/22434 [1:35:39<13:49:23, 2.47s/it] +2025-02-05 11:43:19 - ERROR - stderr - +2025-02-05 11:43:19 - ERROR - stderr - +2025-02-05 11:43:19 - INFO - stdout - {'loss': 1.0396, 'grad_norm': 1.1437950134277344, 'learning_rate': 1.9733719044641366e-05, 'epoch': 0.3} +2025-02-05 11:43:19 - ERROR - stderr - 10%|█ | 2276/22434 [1:35:39<13:49:23, 2.47s/it] +2025-02-05 11:43:21 - ERROR - stderr - 10%|█ | 2277/22434 [1:35:41<13:46:20, 2.46s/it] +2025-02-05 11:43:21 - ERROR - stderr - +2025-02-05 11:43:21 - ERROR - stderr - +2025-02-05 11:43:21 - INFO - stdout - {'loss': 0.9652, 'grad_norm': 1.2494534254074097, 'learning_rate': 1.9733387990820047e-05, 'epoch': 0.3} +2025-02-05 11:43:21 - 
ERROR - stderr - 10%|█ | 2277/22434 [1:35:41<13:46:20, 2.46s/it] +2025-02-05 11:43:24 - ERROR - stderr - 10%|█ | 2278/22434 [1:35:44<13:49:51, 2.47s/it] +2025-02-05 11:43:24 - ERROR - stderr - +2025-02-05 11:43:24 - ERROR - stderr - +2025-02-05 11:43:24 - INFO - stdout - {'loss': 1.0456, 'grad_norm': 1.2008118629455566, 'learning_rate': 1.9733056734115567e-05, 'epoch': 0.3} +2025-02-05 11:43:24 - ERROR - stderr - 10%|█ | 2278/22434 [1:35:44<13:49:51, 2.47s/it] +2025-02-05 11:43:26 - ERROR - stderr - 10%|█ | 2279/22434 [1:35:46<13:51:23, 2.47s/it] +2025-02-05 11:43:26 - ERROR - stderr - +2025-02-05 11:43:26 - ERROR - stderr - +2025-02-05 11:43:26 - INFO - stdout - {'loss': 0.9644, 'grad_norm': 1.1481235027313232, 'learning_rate': 1.9732725274534837e-05, 'epoch': 0.3} +2025-02-05 11:43:26 - ERROR - stderr - 10%|█ | 2279/22434 [1:35:46<13:51:23, 2.47s/it] +2025-02-05 11:43:29 - ERROR - stderr - 10%|█ | 2280/22434 [1:35:49<13:48:13, 2.47s/it] +2025-02-05 11:43:29 - ERROR - stderr - +2025-02-05 11:43:29 - ERROR - stderr - +2025-02-05 11:43:29 - INFO - stdout - {'loss': 0.9663, 'grad_norm': 1.2427749633789062, 'learning_rate': 1.973239361208476e-05, 'epoch': 0.3} +2025-02-05 11:43:29 - ERROR - stderr - 10%|█ | 2280/22434 [1:35:49<13:48:13, 2.47s/it] +2025-02-05 11:43:31 - ERROR - stderr - 10%|█ | 2281/22434 [1:35:51<13:51:31, 2.48s/it] +2025-02-05 11:43:31 - ERROR - stderr - +2025-02-05 11:43:31 - ERROR - stderr - +2025-02-05 11:43:31 - INFO - stdout - {'loss': 1.0344, 'grad_norm': 1.0829435586929321, 'learning_rate': 1.973206174677225e-05, 'epoch': 0.31} +2025-02-05 11:43:31 - ERROR - stderr - 10%|█ | 2281/22434 [1:35:51<13:51:31, 2.48s/it] +2025-02-05 11:43:34 - ERROR - stderr - 10%|█ | 2282/22434 [1:35:54<13:55:18, 2.49s/it] +2025-02-05 11:43:34 - ERROR - stderr - +2025-02-05 11:43:34 - ERROR - stderr - +2025-02-05 11:43:34 - INFO - stdout - {'loss': 1.0859, 'grad_norm': 1.0766233205795288, 'learning_rate': 1.9731729678604226e-05, 'epoch': 0.31} +2025-02-05 11:43:34 - 
ERROR - stderr - 10%|█ | 2282/22434 [1:35:54<13:55:18, 2.49s/it] +2025-02-05 11:43:36 - ERROR - stderr - 10%|█ | 2283/22434 [1:35:56<13:54:07, 2.48s/it] +2025-02-05 11:43:36 - ERROR - stderr - +2025-02-05 11:43:36 - ERROR - stderr - +2025-02-05 11:43:36 - INFO - stdout - {'loss': 0.9386, 'grad_norm': 1.216006875038147, 'learning_rate': 1.973139740758761e-05, 'epoch': 0.31} +2025-02-05 11:43:36 - ERROR - stderr - 10%|█ | 2283/22434 [1:35:56<13:54:07, 2.48s/it] +2025-02-05 11:43:39 - ERROR - stderr - 10%|█ | 2284/22434 [1:35:58<13:52:03, 2.48s/it] +2025-02-05 11:43:39 - ERROR - stderr - +2025-02-05 11:43:39 - ERROR - stderr - +2025-02-05 11:43:39 - INFO - stdout - {'loss': 1.0004, 'grad_norm': 1.1430648565292358, 'learning_rate': 1.9731064933729324e-05, 'epoch': 0.31} +2025-02-05 11:43:39 - ERROR - stderr - 10%|█ | 2284/22434 [1:35:59<13:52:03, 2.48s/it] +2025-02-05 11:43:41 - ERROR - stderr - 10%|█ | 2285/22434 [1:36:01<13:51:18, 2.48s/it] +2025-02-05 11:43:41 - ERROR - stderr - +2025-02-05 11:43:41 - ERROR - stderr - +2025-02-05 11:43:41 - INFO - stdout - {'loss': 0.9436, 'grad_norm': 1.1239149570465088, 'learning_rate': 1.9730732257036303e-05, 'epoch': 0.31} +2025-02-05 11:43:41 - ERROR - stderr - 10%|█ | 2285/22434 [1:36:01<13:51:18, 2.48s/it] +2025-02-05 11:43:44 - ERROR - stderr - 10%|█ | 2286/22434 [1:36:03<13:49:05, 2.47s/it] +2025-02-05 11:43:44 - ERROR - stderr - +2025-02-05 11:43:44 - ERROR - stderr - +2025-02-05 11:43:44 - INFO - stdout - {'loss': 0.9276, 'grad_norm': 1.2132402658462524, 'learning_rate': 1.973039937751548e-05, 'epoch': 0.31} +2025-02-05 11:43:44 - ERROR - stderr - 10%|█ | 2286/22434 [1:36:03<13:49:05, 2.47s/it] +2025-02-05 11:43:46 - ERROR - stderr - 10%|█ | 2287/22434 [1:36:06<13:44:25, 2.46s/it] +2025-02-05 11:43:46 - ERROR - stderr - +2025-02-05 11:43:46 - ERROR - stderr - +2025-02-05 11:43:46 - INFO - stdout - {'loss': 1.0626, 'grad_norm': 1.2516506910324097, 'learning_rate': 1.9730066295173794e-05, 'epoch': 0.31} +2025-02-05 11:43:46 
- ERROR - stderr - 10%|█ | 2287/22434 [1:36:06<13:44:25, 2.46s/it] +2025-02-05 11:43:49 - ERROR - stderr - 10%|█ | 2288/22434 [1:36:08<13:59:57, 2.50s/it] +2025-02-05 11:43:49 - ERROR - stderr - +2025-02-05 11:43:49 - ERROR - stderr - +2025-02-05 11:43:49 - INFO - stdout - {'loss': 0.9528, 'grad_norm': 1.1240605115890503, 'learning_rate': 1.9729733010018186e-05, 'epoch': 0.31} +2025-02-05 11:43:49 - ERROR - stderr - 10%|█ | 2288/22434 [1:36:08<13:59:57, 2.50s/it] +2025-02-05 11:43:51 - ERROR - stderr - 10%|█ | 2289/22434 [1:36:11<14:29:53, 2.59s/it] +2025-02-05 11:43:52 - ERROR - stderr - +2025-02-05 11:43:52 - ERROR - stderr - +2025-02-05 11:43:52 - INFO - stdout - {'loss': 1.0005, 'grad_norm': 1.1625926494598389, 'learning_rate': 1.9729399522055603e-05, 'epoch': 0.31} +2025-02-05 11:43:52 - ERROR - stderr - 10%|█ | 2289/22434 [1:36:11<14:29:53, 2.59s/it] +2025-02-05 11:43:54 - ERROR - stderr - 10%|█ | 2290/22434 [1:36:14<14:21:14, 2.57s/it] +2025-02-05 11:43:54 - ERROR - stderr - +2025-02-05 11:43:54 - ERROR - stderr - +2025-02-05 11:43:54 - INFO - stdout - {'loss': 1.2219, 'grad_norm': 1.4405685663223267, 'learning_rate': 1.9729065831292996e-05, 'epoch': 0.31} +2025-02-05 11:43:54 - ERROR - stderr - 10%|█ | 2290/22434 [1:36:14<14:21:14, 2.57s/it] +2025-02-05 11:43:56 - ERROR - stderr - 10%|█ | 2291/22434 [1:36:16<14:08:48, 2.53s/it] +2025-02-05 11:43:56 - ERROR - stderr - +2025-02-05 11:43:56 - ERROR - stderr - +2025-02-05 11:43:56 - INFO - stdout - {'loss': 0.9635, 'grad_norm': 1.1085350513458252, 'learning_rate': 1.9728731937737326e-05, 'epoch': 0.31} +2025-02-05 11:43:56 - ERROR - stderr - 10%|█ | 2291/22434 [1:36:16<14:08:48, 2.53s/it] +2025-02-05 11:43:59 - ERROR - stderr - 10%|█ | 2292/22434 [1:36:19<13:57:23, 2.49s/it] +2025-02-05 11:43:59 - ERROR - stderr - +2025-02-05 11:43:59 - ERROR - stderr - +2025-02-05 11:43:59 - INFO - stdout - {'loss': 1.0476, 'grad_norm': 1.276872158050537, 'learning_rate': 1.9728397841395544e-05, 'epoch': 0.31} +2025-02-05 
11:43:59 - ERROR - stderr - 10%|█ | 2292/22434 [1:36:19<13:57:23, 2.49s/it] +2025-02-05 11:44:02 - ERROR - stderr - 10%|█ | 2293/22434 [1:36:21<14:16:19, 2.55s/it] +2025-02-05 11:44:02 - ERROR - stderr - +2025-02-05 11:44:02 - ERROR - stderr - +2025-02-05 11:44:02 - INFO - stdout - {'loss': 0.9807, 'grad_norm': 1.1097266674041748, 'learning_rate': 1.9728063542274617e-05, 'epoch': 0.31} +2025-02-05 11:44:02 - ERROR - stderr - 10%|█ | 2293/22434 [1:36:21<14:16:19, 2.55s/it] +2025-02-05 11:44:04 - ERROR - stderr - 10%|█ | 2294/22434 [1:36:24<14:11:00, 2.54s/it] +2025-02-05 11:44:04 - ERROR - stderr - +2025-02-05 11:44:04 - ERROR - stderr - +2025-02-05 11:44:04 - INFO - stdout - {'loss': 1.0409, 'grad_norm': 1.1223084926605225, 'learning_rate': 1.9727729040381517e-05, 'epoch': 0.31} +2025-02-05 11:44:04 - ERROR - stderr - 10%|█ | 2294/22434 [1:36:24<14:11:00, 2.54s/it] +2025-02-05 11:44:07 - ERROR - stderr - 10%|█ | 2295/22434 [1:36:27<14:39:04, 2.62s/it] +2025-02-05 11:44:07 - ERROR - stderr - +2025-02-05 11:44:07 - ERROR - stderr - +2025-02-05 11:44:07 - INFO - stdout - {'loss': 0.9245, 'grad_norm': 1.1173349618911743, 'learning_rate': 1.972739433572321e-05, 'epoch': 0.31} +2025-02-05 11:44:07 - ERROR - stderr - 10%|█ | 2295/22434 [1:36:27<14:39:04, 2.62s/it] +2025-02-05 11:44:09 - ERROR - stderr - 10%|█ | 2296/22434 [1:36:29<14:33:01, 2.60s/it] +2025-02-05 11:44:09 - ERROR - stderr - +2025-02-05 11:44:09 - ERROR - stderr - +2025-02-05 11:44:09 - INFO - stdout - {'loss': 1.0129, 'grad_norm': 1.2066407203674316, 'learning_rate': 1.972705942830668e-05, 'epoch': 0.31} +2025-02-05 11:44:09 - ERROR - stderr - 10%|█ | 2296/22434 [1:36:29<14:33:01, 2.60s/it] +2025-02-05 11:44:12 - ERROR - stderr - 10%|█ | 2297/22434 [1:36:32<14:41:08, 2.63s/it] +2025-02-05 11:44:12 - ERROR - stderr - +2025-02-05 11:44:12 - ERROR - stderr - +2025-02-05 11:44:12 - INFO - stdout - {'loss': 0.969, 'grad_norm': 1.1713558435440063, 'learning_rate': 1.9726724318138905e-05, 'epoch': 0.31} 
+2025-02-05 11:44:12 - ERROR - stderr - 10%|█ | 2297/22434 [1:36:32<14:41:08, 2.63s/it] +2025-02-05 11:44:15 - ERROR - stderr - 10%|█ | 2298/22434 [1:36:34<14:26:52, 2.58s/it] +2025-02-05 11:44:15 - ERROR - stderr - +2025-02-05 11:44:15 - ERROR - stderr - +2025-02-05 11:44:15 - INFO - stdout - {'loss': 0.9516, 'grad_norm': 1.2510708570480347, 'learning_rate': 1.9726389005226865e-05, 'epoch': 0.31} +2025-02-05 11:44:15 - ERROR - stderr - 10%|█ | 2298/22434 [1:36:34<14:26:52, 2.58s/it] +2025-02-05 11:44:17 - ERROR - stderr - 10%|█ | 2299/22434 [1:36:37<14:16:00, 2.55s/it] +2025-02-05 11:44:17 - ERROR - stderr - +2025-02-05 11:44:17 - ERROR - stderr - +2025-02-05 11:44:17 - INFO - stdout - {'loss': 0.8915, 'grad_norm': 1.0327454805374146, 'learning_rate': 1.9726053489577555e-05, 'epoch': 0.31} +2025-02-05 11:44:17 - ERROR - stderr - 10%|█ | 2299/22434 [1:36:37<14:16:00, 2.55s/it] +2025-02-05 11:44:19 - ERROR - stderr - 10%|█ | 2300/22434 [1:36:39<14:08:25, 2.53s/it] +2025-02-05 11:44:20 - ERROR - stderr - +2025-02-05 11:44:20 - ERROR - stderr - +2025-02-05 11:44:20 - INFO - stdout - {'loss': 1.0306, 'grad_norm': 1.1685277223587036, 'learning_rate': 1.972571777119797e-05, 'epoch': 0.31} +2025-02-05 11:44:20 - ERROR - stderr - 10%|█ | 2300/22434 [1:36:39<14:08:25, 2.53s/it] +2025-02-05 11:44:22 - ERROR - stderr - 10%|█ | 2301/22434 [1:36:42<14:00:44, 2.51s/it] +2025-02-05 11:44:22 - ERROR - stderr - +2025-02-05 11:44:22 - ERROR - stderr - +2025-02-05 11:44:22 - INFO - stdout - {'loss': 1.0247, 'grad_norm': 1.1453516483306885, 'learning_rate': 1.97253818500951e-05, 'epoch': 0.31} +2025-02-05 11:44:22 - ERROR - stderr - 10%|█ | 2301/22434 [1:36:42<14:00:44, 2.51s/it] +2025-02-05 11:44:24 - ERROR - stderr - 10%|█ | 2302/22434 [1:36:44<14:01:46, 2.51s/it] +2025-02-05 11:44:25 - ERROR - stderr - +2025-02-05 11:44:25 - ERROR - stderr - +2025-02-05 11:44:25 - INFO - stdout - {'loss': 0.943, 'grad_norm': 1.2007924318313599, 'learning_rate': 1.9725045726275954e-05, 'epoch': 
0.31} +2025-02-05 11:44:25 - ERROR - stderr - 10%|█ | 2302/22434 [1:36:44<14:01:46, 2.51s/it] +2025-02-05 11:44:27 - ERROR - stderr - 10%|█ | 2303/22434 [1:36:47<13:58:10, 2.50s/it] +2025-02-05 11:44:27 - ERROR - stderr - +2025-02-05 11:44:27 - ERROR - stderr - +2025-02-05 11:44:27 - INFO - stdout - {'loss': 1.0375, 'grad_norm': 1.3448234796524048, 'learning_rate': 1.9724709399747532e-05, 'epoch': 0.31} +2025-02-05 11:44:27 - ERROR - stderr - 10%|█ | 2303/22434 [1:36:47<13:58:10, 2.50s/it] +2025-02-05 11:44:29 - ERROR - stderr - 10%|█ | 2304/22434 [1:36:49<13:59:07, 2.50s/it] +2025-02-05 11:44:29 - ERROR - stderr - +2025-02-05 11:44:29 - ERROR - stderr - +2025-02-05 11:44:29 - INFO - stdout - {'loss': 1.0505, 'grad_norm': 1.2502800226211548, 'learning_rate': 1.972437287051685e-05, 'epoch': 0.31} +2025-02-05 11:44:29 - ERROR - stderr - 10%|█ | 2304/22434 [1:36:49<13:59:07, 2.50s/it] +2025-02-05 11:44:32 - ERROR - stderr - 10%|█ | 2305/22434 [1:36:52<13:54:07, 2.49s/it] +2025-02-05 11:44:32 - ERROR - stderr - +2025-02-05 11:44:32 - ERROR - stderr - +2025-02-05 11:44:32 - INFO - stdout - {'loss': 1.0447, 'grad_norm': 1.3218092918395996, 'learning_rate': 1.9724036138590926e-05, 'epoch': 0.31} +2025-02-05 11:44:32 - ERROR - stderr - 10%|█ | 2305/22434 [1:36:52<13:54:07, 2.49s/it] +2025-02-05 11:44:34 - ERROR - stderr - 10%|█ | 2306/22434 [1:36:54<13:52:38, 2.48s/it] +2025-02-05 11:44:34 - ERROR - stderr - +2025-02-05 11:44:34 - ERROR - stderr - +2025-02-05 11:44:34 - INFO - stdout - {'loss': 0.9997, 'grad_norm': 1.1700409650802612, 'learning_rate': 1.9723699203976768e-05, 'epoch': 0.31} +2025-02-05 11:44:34 - ERROR - stderr - 10%|█ | 2306/22434 [1:36:54<13:52:38, 2.48s/it] +2025-02-05 11:44:37 - ERROR - stderr - 10%|█ | 2307/22434 [1:36:57<13:48:47, 2.47s/it] +2025-02-05 11:44:37 - ERROR - stderr - +2025-02-05 11:44:37 - ERROR - stderr - +2025-02-05 11:44:37 - INFO - stdout - {'loss': 1.0042, 'grad_norm': 1.1452702283859253, 'learning_rate': 1.9723362066681403e-05, 
'epoch': 0.31} +2025-02-05 11:44:37 - ERROR - stderr - 10%|█ | 2307/22434 [1:36:57<13:48:47, 2.47s/it] +2025-02-05 11:44:40 - ERROR - stderr - 10%|█ | 2308/22434 [1:36:59<14:10:36, 2.54s/it] +2025-02-05 11:44:40 - ERROR - stderr - +2025-02-05 11:44:40 - ERROR - stderr - +2025-02-05 11:44:40 - INFO - stdout - {'loss': 1.1546, 'grad_norm': 1.1776940822601318, 'learning_rate': 1.9723024726711866e-05, 'epoch': 0.31} +2025-02-05 11:44:40 - ERROR - stderr - 10%|█ | 2308/22434 [1:36:59<14:10:36, 2.54s/it] +2025-02-05 11:44:42 - ERROR - stderr - 10%|█ | 2309/22434 [1:37:02<14:04:07, 2.52s/it] +2025-02-05 11:44:42 - ERROR - stderr - +2025-02-05 11:44:42 - ERROR - stderr - +2025-02-05 11:44:42 - INFO - stdout - {'loss': 0.9703, 'grad_norm': 1.200139045715332, 'learning_rate': 1.972268718407518e-05, 'epoch': 0.31} +2025-02-05 11:44:42 - ERROR - stderr - 10%|█ | 2309/22434 [1:37:02<14:04:07, 2.52s/it] +2025-02-05 11:44:44 - ERROR - stderr - 10%|█ | 2310/22434 [1:37:04<14:02:04, 2.51s/it] +2025-02-05 11:44:45 - ERROR - stderr - +2025-02-05 11:44:45 - ERROR - stderr - +2025-02-05 11:44:45 - INFO - stdout - {'loss': 1.0634, 'grad_norm': 1.2481821775436401, 'learning_rate': 1.972234943877838e-05, 'epoch': 0.31} +2025-02-05 11:44:45 - ERROR - stderr - 10%|█ | 2310/22434 [1:37:04<14:02:04, 2.51s/it] +2025-02-05 11:44:47 - ERROR - stderr - 10%|█ | 2311/22434 [1:37:07<14:16:55, 2.56s/it] +2025-02-05 11:44:47 - ERROR - stderr - +2025-02-05 11:44:47 - ERROR - stderr - +2025-02-05 11:44:47 - INFO - stdout - {'loss': 0.9114, 'grad_norm': 1.1221435070037842, 'learning_rate': 1.9722011490828514e-05, 'epoch': 0.31} +2025-02-05 11:44:47 - ERROR - stderr - 10%|█ | 2311/22434 [1:37:07<14:16:55, 2.56s/it] +2025-02-05 11:44:50 - ERROR - stderr - 10%|█ | 2312/22434 [1:37:09<14:11:46, 2.54s/it] +2025-02-05 11:44:50 - ERROR - stderr - +2025-02-05 11:44:50 - ERROR - stderr - +2025-02-05 11:44:50 - INFO - stdout - {'loss': 1.0682, 'grad_norm': 1.2518023252487183, 'learning_rate': 
1.9721673340232617e-05, 'epoch': 0.31} +2025-02-05 11:44:50 - ERROR - stderr - 10%|█ | 2312/22434 [1:37:09<14:11:46, 2.54s/it] +2025-02-05 11:44:52 - ERROR - stderr - 10%|█ | 2313/22434 [1:37:12<14:14:59, 2.55s/it] +2025-02-05 11:44:52 - ERROR - stderr - +2025-02-05 11:44:52 - ERROR - stderr - +2025-02-05 11:44:52 - INFO - stdout - {'loss': 0.9866, 'grad_norm': 1.2186800241470337, 'learning_rate': 1.9721334986997746e-05, 'epoch': 0.31} +2025-02-05 11:44:52 - ERROR - stderr - 10%|█ | 2313/22434 [1:37:12<14:14:59, 2.55s/it] +2025-02-05 11:44:55 - ERROR - stderr - 10%|█ | 2314/22434 [1:37:14<14:07:38, 2.53s/it] +2025-02-05 11:44:55 - ERROR - stderr - +2025-02-05 11:44:55 - ERROR - stderr - +2025-02-05 11:44:55 - INFO - stdout - {'loss': 0.9743, 'grad_norm': 1.3271985054016113, 'learning_rate': 1.9720996431130946e-05, 'epoch': 0.31} +2025-02-05 11:44:55 - ERROR - stderr - 10%|█ | 2314/22434 [1:37:15<14:07:38, 2.53s/it] +2025-02-05 11:44:57 - ERROR - stderr - 10%|█ | 2315/22434 [1:37:17<14:02:10, 2.51s/it] +2025-02-05 11:44:57 - ERROR - stderr - +2025-02-05 11:44:57 - ERROR - stderr - +2025-02-05 11:44:57 - INFO - stdout - {'loss': 1.0378, 'grad_norm': 1.083513617515564, 'learning_rate': 1.972065767263928e-05, 'epoch': 0.31} +2025-02-05 11:44:57 - ERROR - stderr - 10%|█ | 2315/22434 [1:37:17<14:02:10, 2.51s/it] +2025-02-05 11:45:00 - ERROR - stderr - 10%|█ | 2316/22434 [1:37:19<13:57:50, 2.50s/it] +2025-02-05 11:45:00 - ERROR - stderr - +2025-02-05 11:45:00 - ERROR - stderr - +2025-02-05 11:45:00 - INFO - stdout - {'loss': 1.0245, 'grad_norm': 1.1666163206100464, 'learning_rate': 1.9720318711529804e-05, 'epoch': 0.31} +2025-02-05 11:45:00 - ERROR - stderr - 10%|█ | 2316/22434 [1:37:19<13:57:50, 2.50s/it] +2025-02-05 11:45:02 - ERROR - stderr - 10%|█ | 2317/22434 [1:37:22<13:49:19, 2.47s/it] +2025-02-05 11:45:02 - ERROR - stderr - +2025-02-05 11:45:02 - ERROR - stderr - +2025-02-05 11:45:02 - INFO - stdout - {'loss': 1.0461, 'grad_norm': 1.1062474250793457, 
'learning_rate': 1.971997954780959e-05, 'epoch': 0.31} +2025-02-05 11:45:02 - ERROR - stderr - 10%|█ | 2317/22434 [1:37:22<13:49:19, 2.47s/it] +2025-02-05 11:45:05 - ERROR - stderr - 10%|█ | 2318/22434 [1:37:24<14:00:43, 2.51s/it] +2025-02-05 11:45:05 - ERROR - stderr - +2025-02-05 11:45:05 - ERROR - stderr - +2025-02-05 11:45:05 - INFO - stdout - {'loss': 1.054, 'grad_norm': 1.1604300737380981, 'learning_rate': 1.97196401814857e-05, 'epoch': 0.31} +2025-02-05 11:45:05 - ERROR - stderr - 10%|█ | 2318/22434 [1:37:24<14:00:43, 2.51s/it] +2025-02-05 11:45:07 - ERROR - stderr - 10%|█ | 2319/22434 [1:37:27<13:54:46, 2.49s/it] +2025-02-05 11:45:07 - ERROR - stderr - +2025-02-05 11:45:07 - ERROR - stderr - +2025-02-05 11:45:07 - INFO - stdout - {'loss': 1.0135, 'grad_norm': 1.190425157546997, 'learning_rate': 1.9719300612565214e-05, 'epoch': 0.31} +2025-02-05 11:45:07 - ERROR - stderr - 10%|█ | 2319/22434 [1:37:27<13:54:46, 2.49s/it] +2025-02-05 11:45:10 - ERROR - stderr - 10%|█ | 2320/22434 [1:37:29<13:59:53, 2.51s/it] +2025-02-05 11:45:10 - ERROR - stderr - +2025-02-05 11:45:10 - ERROR - stderr - +2025-02-05 11:45:10 - INFO - stdout - {'loss': 1.144, 'grad_norm': 1.3250095844268799, 'learning_rate': 1.97189608410552e-05, 'epoch': 0.31} +2025-02-05 11:45:10 - ERROR - stderr - 10%|█ | 2320/22434 [1:37:29<13:59:53, 2.51s/it] +2025-02-05 11:45:12 - ERROR - stderr - 10%|█ | 2321/22434 [1:37:32<13:58:20, 2.50s/it] +2025-02-05 11:45:12 - ERROR - stderr - +2025-02-05 11:45:12 - ERROR - stderr - +2025-02-05 11:45:12 - INFO - stdout - {'loss': 0.9722, 'grad_norm': 1.1069201231002808, 'learning_rate': 1.9718620866962754e-05, 'epoch': 0.31} +2025-02-05 11:45:12 - ERROR - stderr - 10%|█ | 2321/22434 [1:37:32<13:58:20, 2.50s/it] +2025-02-05 11:45:15 - ERROR - stderr - 10%|█ | 2322/22434 [1:37:34<13:52:04, 2.48s/it] +2025-02-05 11:45:15 - ERROR - stderr - +2025-02-05 11:45:15 - ERROR - stderr - +2025-02-05 11:45:15 - INFO - stdout - {'loss': 1.0689, 'grad_norm': 1.240280270576477, 
'learning_rate': 1.9718280690294954e-05, 'epoch': 0.31} +2025-02-05 11:45:15 - ERROR - stderr - 10%|█ | 2322/22434 [1:37:34<13:52:04, 2.48s/it] +2025-02-05 11:45:17 - ERROR - stderr - 10%|█ | 2323/22434 [1:37:37<13:55:00, 2.49s/it] +2025-02-05 11:45:17 - ERROR - stderr - +2025-02-05 11:45:17 - ERROR - stderr - +2025-02-05 11:45:17 - INFO - stdout - {'loss': 0.9788, 'grad_norm': 1.2191075086593628, 'learning_rate': 1.9717940311058893e-05, 'epoch': 0.31} +2025-02-05 11:45:17 - ERROR - stderr - 10%|█ | 2323/22434 [1:37:37<13:55:00, 2.49s/it] +2025-02-05 11:45:20 - ERROR - stderr - 10%|█ | 2324/22434 [1:37:39<13:50:05, 2.48s/it] +2025-02-05 11:45:20 - ERROR - stderr - +2025-02-05 11:45:20 - ERROR - stderr - +2025-02-05 11:45:20 - INFO - stdout - {'loss': 0.9691, 'grad_norm': 1.1158037185668945, 'learning_rate': 1.9717599729261666e-05, 'epoch': 0.31} +2025-02-05 11:45:20 - ERROR - stderr - 10%|█ | 2324/22434 [1:37:39<13:50:05, 2.48s/it] +2025-02-05 11:45:22 - ERROR - stderr - 10%|█ | 2325/22434 [1:37:42<13:54:36, 2.49s/it] +2025-02-05 11:45:22 - ERROR - stderr - +2025-02-05 11:45:22 - ERROR - stderr - +2025-02-05 11:45:22 - INFO - stdout - {'loss': 0.9183, 'grad_norm': 1.203933835029602, 'learning_rate': 1.9717258944910366e-05, 'epoch': 0.31} +2025-02-05 11:45:22 - ERROR - stderr - 10%|█ | 2325/22434 [1:37:42<13:54:36, 2.49s/it] +2025-02-05 11:45:25 - ERROR - stderr - 10%|█ | 2326/22434 [1:37:44<14:06:31, 2.53s/it] +2025-02-05 11:45:25 - ERROR - stderr - +2025-02-05 11:45:25 - ERROR - stderr - +2025-02-05 11:45:25 - INFO - stdout - {'loss': 1.0965, 'grad_norm': 1.279782772064209, 'learning_rate': 1.9716917958012106e-05, 'epoch': 0.31} +2025-02-05 11:45:25 - ERROR - stderr - 10%|█ | 2326/22434 [1:37:44<14:06:31, 2.53s/it] +2025-02-05 11:45:27 - ERROR - stderr - 10%|█ | 2327/22434 [1:37:47<13:58:23, 2.50s/it] +2025-02-05 11:45:27 - ERROR - stderr - +2025-02-05 11:45:27 - ERROR - stderr - +2025-02-05 11:45:27 - INFO - stdout - {'loss': 1.0505, 'grad_norm': 
1.1461976766586304, 'learning_rate': 1.971657676857399e-05, 'epoch': 0.31} +2025-02-05 11:45:27 - ERROR - stderr - 10%|█ | 2327/22434 [1:37:47<13:58:23, 2.50s/it] +2025-02-05 11:45:30 - ERROR - stderr - 10%|█ | 2328/22434 [1:37:49<13:55:46, 2.49s/it] +2025-02-05 11:45:30 - ERROR - stderr - +2025-02-05 11:45:30 - ERROR - stderr - +2025-02-05 11:45:30 - INFO - stdout - {'loss': 1.008, 'grad_norm': 1.2181671857833862, 'learning_rate': 1.971623537660313e-05, 'epoch': 0.31} +2025-02-05 11:45:30 - ERROR - stderr - 10%|█ | 2328/22434 [1:37:49<13:55:46, 2.49s/it] +2025-02-05 11:45:32 - ERROR - stderr - 10%|█ | 2329/22434 [1:37:52<13:48:11, 2.47s/it] +2025-02-05 11:45:32 - ERROR - stderr - +2025-02-05 11:45:32 - ERROR - stderr - +2025-02-05 11:45:32 - INFO - stdout - {'loss': 0.8455, 'grad_norm': 1.1614128351211548, 'learning_rate': 1.9715893782106638e-05, 'epoch': 0.31} +2025-02-05 11:45:32 - ERROR - stderr - 10%|█ | 2329/22434 [1:37:52<13:48:11, 2.47s/it] +2025-02-05 11:45:34 - ERROR - stderr - 10%|█ | 2330/22434 [1:37:54<13:49:39, 2.48s/it] +2025-02-05 11:45:35 - ERROR - stderr - +2025-02-05 11:45:35 - ERROR - stderr - +2025-02-05 11:45:35 - INFO - stdout - {'loss': 1.1784, 'grad_norm': 1.2468427419662476, 'learning_rate': 1.9715551985091637e-05, 'epoch': 0.31} +2025-02-05 11:45:35 - ERROR - stderr - 10%|█ | 2330/22434 [1:37:54<13:49:39, 2.48s/it] +2025-02-05 11:45:37 - ERROR - stderr - 10%|█ | 2331/22434 [1:37:57<13:52:57, 2.49s/it] +2025-02-05 11:45:37 - ERROR - stderr - +2025-02-05 11:45:37 - ERROR - stderr - +2025-02-05 11:45:37 - INFO - stdout - {'loss': 1.1879, 'grad_norm': 1.2512284517288208, 'learning_rate': 1.9715209985565252e-05, 'epoch': 0.31} +2025-02-05 11:45:37 - ERROR - stderr - 10%|█ | 2331/22434 [1:37:57<13:52:57, 2.49s/it] +2025-02-05 11:45:40 - ERROR - stderr - 10%|█ | 2332/22434 [1:37:59<14:00:49, 2.51s/it] +2025-02-05 11:45:40 - ERROR - stderr - +2025-02-05 11:45:40 - ERROR - stderr - +2025-02-05 11:45:40 - INFO - stdout - {'loss': 0.9189, 
'grad_norm': 1.11635422706604, 'learning_rate': 1.9714867783534614e-05, 'epoch': 0.31} +2025-02-05 11:45:40 - ERROR - stderr - 10%|█ | 2332/22434 [1:37:59<14:00:49, 2.51s/it] +2025-02-05 11:45:42 - ERROR - stderr - 10%|█ | 2333/22434 [1:38:02<13:58:42, 2.50s/it] +2025-02-05 11:45:42 - ERROR - stderr - +2025-02-05 11:45:42 - ERROR - stderr - +2025-02-05 11:45:42 - INFO - stdout - {'loss': 1.1133, 'grad_norm': 1.2817426919937134, 'learning_rate': 1.971452537900685e-05, 'epoch': 0.31} +2025-02-05 11:45:42 - ERROR - stderr - 10%|█ | 2333/22434 [1:38:02<13:58:42, 2.50s/it] +2025-02-05 11:45:45 - ERROR - stderr - 10%|█ | 2334/22434 [1:38:04<14:06:40, 2.53s/it] +2025-02-05 11:45:45 - ERROR - stderr - +2025-02-05 11:45:45 - ERROR - stderr - +2025-02-05 11:45:45 - INFO - stdout - {'loss': 0.9566, 'grad_norm': 1.0824156999588013, 'learning_rate': 1.97141827719891e-05, 'epoch': 0.31} +2025-02-05 11:45:45 - ERROR - stderr - 10%|█ | 2334/22434 [1:38:04<14:06:40, 2.53s/it] +2025-02-05 11:45:47 - ERROR - stderr - 10%|█ | 2335/22434 [1:38:07<14:13:31, 2.55s/it] +2025-02-05 11:45:47 - ERROR - stderr - +2025-02-05 11:45:47 - ERROR - stderr - +2025-02-05 11:45:47 - INFO - stdout - {'loss': 0.996, 'grad_norm': 1.1821709871292114, 'learning_rate': 1.971383996248851e-05, 'epoch': 0.31} +2025-02-05 11:45:47 - ERROR - stderr - 10%|█ | 2335/22434 [1:38:07<14:13:31, 2.55s/it] +2025-02-05 11:45:50 - ERROR - stderr - 10%|█ | 2336/22434 [1:38:09<14:10:15, 2.54s/it] +2025-02-05 11:45:50 - ERROR - stderr - +2025-02-05 11:45:50 - ERROR - stderr - +2025-02-05 11:45:50 - INFO - stdout - {'loss': 0.9692, 'grad_norm': 1.177978515625, 'learning_rate': 1.9713496950512217e-05, 'epoch': 0.31} +2025-02-05 11:45:50 - ERROR - stderr - 10%|█ | 2336/22434 [1:38:10<14:10:15, 2.54s/it] +2025-02-05 11:45:52 - ERROR - stderr - 10%|█ | 2337/22434 [1:38:12<14:05:17, 2.52s/it] +2025-02-05 11:45:52 - ERROR - stderr - +2025-02-05 11:45:52 - ERROR - stderr - +2025-02-05 11:45:52 - INFO - stdout - {'loss': 1.0135, 
'grad_norm': 1.3418282270431519, 'learning_rate': 1.9713153736067377e-05, 'epoch': 0.31} +2025-02-05 11:45:52 - ERROR - stderr - 10%|█ | 2337/22434 [1:38:12<14:05:17, 2.52s/it] +2025-02-05 11:45:55 - ERROR - stderr - 10%|█ | 2338/22434 [1:38:14<13:59:11, 2.51s/it] +2025-02-05 11:45:55 - ERROR - stderr - +2025-02-05 11:45:55 - ERROR - stderr - +2025-02-05 11:45:55 - INFO - stdout - {'loss': 1.0505, 'grad_norm': 1.167195439338684, 'learning_rate': 1.971281031916114e-05, 'epoch': 0.31} +2025-02-05 11:45:55 - ERROR - stderr - 10%|█ | 2338/22434 [1:38:14<13:59:11, 2.51s/it] +2025-02-05 11:45:57 - ERROR - stderr - 10%|█ | 2339/22434 [1:38:17<13:59:58, 2.51s/it] +2025-02-05 11:45:57 - ERROR - stderr - +2025-02-05 11:45:57 - ERROR - stderr - +2025-02-05 11:45:57 - INFO - stdout - {'loss': 1.1362, 'grad_norm': 1.3744072914123535, 'learning_rate': 1.971246669980067e-05, 'epoch': 0.31} +2025-02-05 11:45:57 - ERROR - stderr - 10%|█ | 2339/22434 [1:38:17<13:59:58, 2.51s/it] +2025-02-05 11:45:57 - WARNING - transformers.tokenization_utils_base - Token indices sequence length is longer than the specified maximum sequence length for this model (2783 > 2048). Running this sequence through the model will result in indexing errors +2025-02-05 11:45:57 - WARNING - transformers.tokenization_utils_base - Token indices sequence length is longer than the specified maximum sequence length for this model (2783 > 2048). 
Running this sequence through the model will result in indexing errors +2025-02-05 11:46:00 - ERROR - stderr - 10%|█ | 2340/22434 [1:38:20<14:07:59, 2.53s/it] +2025-02-05 11:46:00 - ERROR - stderr - +2025-02-05 11:46:00 - ERROR - stderr - +2025-02-05 11:46:00 - INFO - stdout - {'loss': 0.9365, 'grad_norm': 1.084775447845459, 'learning_rate': 1.971212287799312e-05, 'epoch': 0.31} +2025-02-05 11:46:00 - ERROR - stderr - 10%|█ | 2340/22434 [1:38:20<14:07:59, 2.53s/it] +2025-02-05 11:46:06 - ERROR - stderr - 10%|█ | 2341/22434 [1:38:25<19:31:20, 3.50s/it] +2025-02-05 11:46:06 - ERROR - stderr - +2025-02-05 11:46:06 - ERROR - stderr - +2025-02-05 11:46:06 - INFO - stdout - {'loss': 0.8748, 'grad_norm': 1.0068764686584473, 'learning_rate': 1.9711778853745663e-05, 'epoch': 0.31} +2025-02-05 11:46:06 - ERROR - stderr - 10%|█ | 2341/22434 [1:38:25<19:31:20, 3.50s/it] +2025-02-05 11:46:08 - ERROR - stderr - 10%|█ | 2342/22434 [1:38:28<17:50:46, 3.20s/it] +2025-02-05 11:46:08 - ERROR - stderr - +2025-02-05 11:46:08 - ERROR - stderr - +2025-02-05 11:46:08 - INFO - stdout - {'loss': 0.996, 'grad_norm': 1.0703853368759155, 'learning_rate': 1.9711434627065472e-05, 'epoch': 0.31} +2025-02-05 11:46:08 - ERROR - stderr - 10%|█ | 2342/22434 [1:38:28<17:50:46, 3.20s/it] +2025-02-05 11:46:11 - ERROR - stderr - 10%|█ | 2343/22434 [1:38:30<16:41:25, 2.99s/it] +2025-02-05 11:46:11 - ERROR - stderr - +2025-02-05 11:46:11 - ERROR - stderr - +2025-02-05 11:46:11 - INFO - stdout - {'loss': 1.0193, 'grad_norm': 1.1515754461288452, 'learning_rate': 1.9711090197959715e-05, 'epoch': 0.31} +2025-02-05 11:46:11 - ERROR - stderr - 10%|█ | 2343/22434 [1:38:30<16:41:25, 2.99s/it] +2025-02-05 11:46:13 - ERROR - stderr - 10%|█ | 2344/22434 [1:38:33<15:43:21, 2.82s/it] +2025-02-05 11:46:13 - ERROR - stderr - +2025-02-05 11:46:13 - ERROR - stderr - +2025-02-05 11:46:13 - INFO - stdout - {'loss': 1.0018, 'grad_norm': 1.1545875072479248, 'learning_rate': 1.9710745566435578e-05, 'epoch': 0.31} +2025-02-05 
11:46:13 - ERROR - stderr - 10%|█ | 2344/22434 [1:38:33<15:43:21, 2.82s/it] +2025-02-05 11:46:15 - ERROR - stderr - 10%|█ | 2345/22434 [1:38:35<15:01:07, 2.69s/it] +2025-02-05 11:46:15 - ERROR - stderr - +2025-02-05 11:46:15 - ERROR - stderr - +2025-02-05 11:46:15 - INFO - stdout - {'loss': 0.9701, 'grad_norm': 1.1640560626983643, 'learning_rate': 1.9710400732500242e-05, 'epoch': 0.31} +2025-02-05 11:46:15 - ERROR - stderr - 10%|█ | 2345/22434 [1:38:35<15:01:07, 2.69s/it] +2025-02-05 11:46:18 - ERROR - stderr - 10%|█ | 2346/22434 [1:38:38<14:38:52, 2.63s/it] +2025-02-05 11:46:18 - ERROR - stderr - +2025-02-05 11:46:18 - ERROR - stderr - +2025-02-05 11:46:18 - INFO - stdout - {'loss': 0.8721, 'grad_norm': 1.0496866703033447, 'learning_rate': 1.9710055696160895e-05, 'epoch': 0.31} +2025-02-05 11:46:18 - ERROR - stderr - 10%|█ | 2346/22434 [1:38:38<14:38:52, 2.63s/it] +2025-02-05 11:46:20 - ERROR - stderr - 10%|█ | 2347/22434 [1:38:40<14:18:16, 2.56s/it] +2025-02-05 11:46:20 - ERROR - stderr - +2025-02-05 11:46:20 - ERROR - stderr - +2025-02-05 11:46:20 - INFO - stdout - {'loss': 1.0636, 'grad_norm': 1.3501888513565063, 'learning_rate': 1.970971045742473e-05, 'epoch': 0.31} +2025-02-05 11:46:20 - ERROR - stderr - 10%|█ | 2347/22434 [1:38:40<14:18:16, 2.56s/it] +2025-02-05 11:46:23 - ERROR - stderr - 10%|█ | 2348/22434 [1:38:42<14:04:54, 2.52s/it] +2025-02-05 11:46:23 - ERROR - stderr - +2025-02-05 11:46:23 - ERROR - stderr - +2025-02-05 11:46:23 - INFO - stdout - {'loss': 1.0052, 'grad_norm': 1.1900006532669067, 'learning_rate': 1.970936501629894e-05, 'epoch': 0.31} +2025-02-05 11:46:23 - ERROR - stderr - 10%|█ | 2348/22434 [1:38:42<14:04:54, 2.52s/it] +2025-02-05 11:46:25 - ERROR - stderr - 10%|█ | 2349/22434 [1:38:45<14:01:57, 2.52s/it] +2025-02-05 11:46:25 - ERROR - stderr - +2025-02-05 11:46:25 - ERROR - stderr - +2025-02-05 11:46:25 - INFO - stdout - {'loss': 0.9268, 'grad_norm': 1.0351680517196655, 'learning_rate': 1.9709019372790722e-05, 'epoch': 0.31} 
+2025-02-05 11:46:25 - ERROR - stderr - 10%|█ | 2349/22434 [1:38:45<14:01:57, 2.52s/it] +2025-02-05 11:46:28 - ERROR - stderr - 10%|█ | 2350/22434 [1:38:47<14:05:03, 2.52s/it] +2025-02-05 11:46:28 - ERROR - stderr - +2025-02-05 11:46:28 - ERROR - stderr - +2025-02-05 11:46:28 - INFO - stdout - {'loss': 0.9444, 'grad_norm': 1.0425423383712769, 'learning_rate': 1.9708673526907293e-05, 'epoch': 0.31} +2025-02-05 11:46:28 - ERROR - stderr - 10%|█ | 2350/22434 [1:38:48<14:05:03, 2.52s/it] +2025-02-05 11:46:30 - ERROR - stderr - 10%|█ | 2351/22434 [1:38:50<14:04:06, 2.52s/it] +2025-02-05 11:46:30 - ERROR - stderr - +2025-02-05 11:46:30 - ERROR - stderr - +2025-02-05 11:46:30 - INFO - stdout - {'loss': 1.1876, 'grad_norm': 1.2512761354446411, 'learning_rate': 1.9708327478655855e-05, 'epoch': 0.31} +2025-02-05 11:46:30 - ERROR - stderr - 10%|█ | 2351/22434 [1:38:50<14:04:06, 2.52s/it] +2025-02-05 11:46:33 - ERROR - stderr - 10%|█ | 2352/22434 [1:38:52<14:02:20, 2.52s/it] +2025-02-05 11:46:33 - ERROR - stderr - +2025-02-05 11:46:33 - ERROR - stderr - +2025-02-05 11:46:33 - INFO - stdout - {'loss': 1.0344, 'grad_norm': 1.3124017715454102, 'learning_rate': 1.9707981228043614e-05, 'epoch': 0.31} +2025-02-05 11:46:33 - ERROR - stderr - 10%|█ | 2352/22434 [1:38:53<14:02:20, 2.52s/it] +2025-02-05 11:46:35 - ERROR - stderr - 10%|█ | 2353/22434 [1:38:55<13:58:18, 2.50s/it] +2025-02-05 11:46:35 - ERROR - stderr - +2025-02-05 11:46:35 - ERROR - stderr - +2025-02-05 11:46:35 - INFO - stdout - {'loss': 0.8714, 'grad_norm': 1.0086859464645386, 'learning_rate': 1.9707634775077797e-05, 'epoch': 0.31} +2025-02-05 11:46:35 - ERROR - stderr - 10%|█ | 2353/22434 [1:38:55<13:58:18, 2.50s/it] +2025-02-05 11:46:38 - ERROR - stderr - 10%|█ | 2354/22434 [1:38:57<13:51:33, 2.48s/it] +2025-02-05 11:46:38 - ERROR - stderr - +2025-02-05 11:46:38 - ERROR - stderr - +2025-02-05 11:46:38 - INFO - stdout - {'loss': 1.0197, 'grad_norm': 1.2256962060928345, 'learning_rate': 1.9707288119765625e-05, 'epoch': 
0.31} +2025-02-05 11:46:38 - ERROR - stderr - 10%|█ | 2354/22434 [1:38:57<13:51:33, 2.48s/it] +2025-02-05 11:46:40 - ERROR - stderr - 10%|█ | 2355/22434 [1:39:00<13:49:38, 2.48s/it] +2025-02-05 11:46:40 - ERROR - stderr - +2025-02-05 11:46:40 - ERROR - stderr - +2025-02-05 11:46:40 - INFO - stdout - {'loss': 1.0212, 'grad_norm': 1.1449984312057495, 'learning_rate': 1.9706941262114317e-05, 'epoch': 0.31} +2025-02-05 11:46:40 - ERROR - stderr - 10%|█ | 2355/22434 [1:39:00<13:49:38, 2.48s/it] +2025-02-05 11:46:43 - ERROR - stderr - 11%|█ | 2356/22434 [1:39:02<13:54:54, 2.49s/it] +2025-02-05 11:46:43 - ERROR - stderr - +2025-02-05 11:46:43 - ERROR - stderr - +2025-02-05 11:46:43 - INFO - stdout - {'loss': 1.1267, 'grad_norm': 1.1958261728286743, 'learning_rate': 1.9706594202131107e-05, 'epoch': 0.32} +2025-02-05 11:46:43 - ERROR - stderr - 11%|█ | 2356/22434 [1:39:02<13:54:54, 2.49s/it] +2025-02-05 11:46:45 - ERROR - stderr - 11%|█ | 2357/22434 [1:39:05<14:11:45, 2.55s/it] +2025-02-05 11:46:45 - ERROR - stderr - +2025-02-05 11:46:45 - ERROR - stderr - +2025-02-05 11:46:45 - INFO - stdout - {'loss': 1.0381, 'grad_norm': 1.1744670867919922, 'learning_rate': 1.9706246939823232e-05, 'epoch': 0.32} +2025-02-05 11:46:45 - ERROR - stderr - 11%|█ | 2357/22434 [1:39:05<14:11:45, 2.55s/it] +2025-02-05 11:46:48 - ERROR - stderr - 11%|█ | 2358/22434 [1:39:08<14:08:29, 2.54s/it] +2025-02-05 11:46:48 - ERROR - stderr - +2025-02-05 11:46:48 - ERROR - stderr - +2025-02-05 11:46:48 - INFO - stdout - {'loss': 1.1104, 'grad_norm': 1.2177389860153198, 'learning_rate': 1.9705899475197926e-05, 'epoch': 0.32} +2025-02-05 11:46:48 - ERROR - stderr - 11%|█ | 2358/22434 [1:39:08<14:08:29, 2.54s/it] +2025-02-05 11:46:50 - ERROR - stderr - 11%|█ | 2359/22434 [1:39:10<14:05:23, 2.53s/it] +2025-02-05 11:46:50 - ERROR - stderr - +2025-02-05 11:46:50 - ERROR - stderr - +2025-02-05 11:46:50 - INFO - stdout - {'loss': 0.929, 'grad_norm': 1.1450207233428955, 'learning_rate': 1.9705551808262432e-05, 
'epoch': 0.32} +2025-02-05 11:46:50 - ERROR - stderr - 11%|█ | 2359/22434 [1:39:10<14:05:23, 2.53s/it] +2025-02-05 11:46:53 - ERROR - stderr - 11%|█ | 2360/22434 [1:39:13<13:59:45, 2.51s/it] +2025-02-05 11:46:53 - ERROR - stderr - +2025-02-05 11:46:53 - ERROR - stderr - +2025-02-05 11:46:53 - INFO - stdout - {'loss': 0.9712, 'grad_norm': 1.0978758335113525, 'learning_rate': 1.9705203939024e-05, 'epoch': 0.32} +2025-02-05 11:46:53 - ERROR - stderr - 11%|█ | 2360/22434 [1:39:13<13:59:45, 2.51s/it] +2025-02-05 11:46:55 - ERROR - stderr - 11%|█ | 2361/22434 [1:39:15<13:57:34, 2.50s/it] +2025-02-05 11:46:55 - ERROR - stderr - +2025-02-05 11:46:55 - ERROR - stderr - +2025-02-05 11:46:55 - INFO - stdout - {'loss': 1.0172, 'grad_norm': 1.1750407218933105, 'learning_rate': 1.9704855867489876e-05, 'epoch': 0.32} +2025-02-05 11:46:55 - ERROR - stderr - 11%|█ | 2361/22434 [1:39:15<13:57:34, 2.50s/it] +2025-02-05 11:46:58 - ERROR - stderr - 11%|█ | 2362/22434 [1:39:17<13:52:14, 2.49s/it] +2025-02-05 11:46:58 - ERROR - stderr - +2025-02-05 11:46:58 - ERROR - stderr - +2025-02-05 11:46:58 - INFO - stdout - {'loss': 0.9547, 'grad_norm': 1.1775720119476318, 'learning_rate': 1.970450759366732e-05, 'epoch': 0.32} +2025-02-05 11:46:58 - ERROR - stderr - 11%|█ | 2362/22434 [1:39:18<13:52:14, 2.49s/it] +2025-02-05 11:47:00 - ERROR - stderr - 11%|█ | 2363/22434 [1:39:20<13:57:40, 2.50s/it] +2025-02-05 11:47:00 - ERROR - stderr - +2025-02-05 11:47:00 - ERROR - stderr - +2025-02-05 11:47:00 - INFO - stdout - {'loss': 0.8894, 'grad_norm': 1.0643346309661865, 'learning_rate': 1.9704159117563587e-05, 'epoch': 0.32} +2025-02-05 11:47:00 - ERROR - stderr - 11%|█ | 2363/22434 [1:39:20<13:57:40, 2.50s/it] +2025-02-05 11:47:03 - ERROR - stderr - 11%|█ | 2364/22434 [1:39:23<13:58:52, 2.51s/it] +2025-02-05 11:47:03 - ERROR - stderr - +2025-02-05 11:47:03 - ERROR - stderr - +2025-02-05 11:47:03 - INFO - stdout - {'loss': 1.0195, 'grad_norm': 1.1409015655517578, 'learning_rate': 
1.9703810439185946e-05, 'epoch': 0.32} +2025-02-05 11:47:03 - ERROR - stderr - 11%|█ | 2364/22434 [1:39:23<13:58:52, 2.51s/it] +2025-02-05 11:47:05 - ERROR - stderr - 11%|█ | 2365/22434 [1:39:25<14:01:51, 2.52s/it] +2025-02-05 11:47:05 - ERROR - stderr - +2025-02-05 11:47:05 - ERROR - stderr - +2025-02-05 11:47:05 - INFO - stdout - {'loss': 1.0843, 'grad_norm': 1.1865304708480835, 'learning_rate': 1.9703461558541662e-05, 'epoch': 0.32} +2025-02-05 11:47:05 - ERROR - stderr - 11%|█ | 2365/22434 [1:39:25<14:01:51, 2.52s/it] +2025-02-05 11:47:08 - ERROR - stderr - 11%|█ | 2366/22434 [1:39:28<13:54:45, 2.50s/it] +2025-02-05 11:47:08 - ERROR - stderr - +2025-02-05 11:47:08 - ERROR - stderr - +2025-02-05 11:47:08 - INFO - stdout - {'loss': 1.11, 'grad_norm': 1.1936390399932861, 'learning_rate': 1.9703112475638003e-05, 'epoch': 0.32} +2025-02-05 11:47:08 - ERROR - stderr - 11%|█ | 2366/22434 [1:39:28<13:54:45, 2.50s/it] +2025-02-05 11:47:10 - ERROR - stderr - 11%|█ | 2367/22434 [1:39:30<13:58:54, 2.51s/it] +2025-02-05 11:47:10 - ERROR - stderr - +2025-02-05 11:47:10 - ERROR - stderr - +2025-02-05 11:47:10 - INFO - stdout - {'loss': 0.9617, 'grad_norm': 1.2269513607025146, 'learning_rate': 1.9702763190482256e-05, 'epoch': 0.32} +2025-02-05 11:47:10 - ERROR - stderr - 11%|█ | 2367/22434 [1:39:30<13:58:54, 2.51s/it] +2025-02-05 11:47:13 - ERROR - stderr - 11%|█ | 2368/22434 [1:39:33<14:02:36, 2.52s/it] +2025-02-05 11:47:13 - ERROR - stderr - +2025-02-05 11:47:13 - ERROR - stderr - +2025-02-05 11:47:13 - INFO - stdout - {'loss': 1.0168, 'grad_norm': 1.1934444904327393, 'learning_rate': 1.970241370308169e-05, 'epoch': 0.32} +2025-02-05 11:47:13 - ERROR - stderr - 11%|█ | 2368/22434 [1:39:33<14:02:36, 2.52s/it] +2025-02-05 11:47:15 - ERROR - stderr - 11%|█ | 2369/22434 [1:39:35<14:01:48, 2.52s/it] +2025-02-05 11:47:15 - ERROR - stderr - +2025-02-05 11:47:15 - ERROR - stderr - +2025-02-05 11:47:15 - INFO - stdout - {'loss': 1.111, 'grad_norm': 1.138992190361023, 'learning_rate': 
1.9702064013443592e-05, 'epoch': 0.32} +2025-02-05 11:47:15 - ERROR - stderr - 11%|█ | 2369/22434 [1:39:35<14:01:48, 2.52s/it] +2025-02-05 11:47:18 - ERROR - stderr - 11%|█ | 2370/22434 [1:39:38<13:57:06, 2.50s/it] +2025-02-05 11:47:18 - ERROR - stderr - +2025-02-05 11:47:18 - ERROR - stderr - +2025-02-05 11:47:18 - INFO - stdout - {'loss': 1.1461, 'grad_norm': 1.2658984661102295, 'learning_rate': 1.970171412157526e-05, 'epoch': 0.32} +2025-02-05 11:47:18 - ERROR - stderr - 11%|█ | 2370/22434 [1:39:38<13:57:06, 2.50s/it] +2025-02-05 11:47:20 - ERROR - stderr - 11%|█ | 2371/22434 [1:39:40<13:55:12, 2.50s/it] +2025-02-05 11:47:20 - ERROR - stderr - +2025-02-05 11:47:20 - ERROR - stderr - +2025-02-05 11:47:20 - INFO - stdout - {'loss': 0.9606, 'grad_norm': 1.1690174341201782, 'learning_rate': 1.970136402748398e-05, 'epoch': 0.32} +2025-02-05 11:47:20 - ERROR - stderr - 11%|█ | 2371/22434 [1:39:40<13:55:12, 2.50s/it] +2025-02-05 11:47:23 - ERROR - stderr - 11%|█ | 2372/22434 [1:39:43<13:51:18, 2.49s/it] +2025-02-05 11:47:23 - ERROR - stderr - +2025-02-05 11:47:23 - ERROR - stderr - +2025-02-05 11:47:23 - INFO - stdout - {'loss': 1.0524, 'grad_norm': 1.230116844177246, 'learning_rate': 1.9701013731177047e-05, 'epoch': 0.32} +2025-02-05 11:47:23 - ERROR - stderr - 11%|█ | 2372/22434 [1:39:43<13:51:18, 2.49s/it] +2025-02-05 11:47:25 - ERROR - stderr - 11%|█ | 2373/22434 [1:39:45<13:52:21, 2.49s/it] +2025-02-05 11:47:25 - ERROR - stderr - +2025-02-05 11:47:25 - ERROR - stderr - +2025-02-05 11:47:25 - INFO - stdout - {'loss': 0.9436, 'grad_norm': 1.0801745653152466, 'learning_rate': 1.9700663232661765e-05, 'epoch': 0.32} +2025-02-05 11:47:25 - ERROR - stderr - 11%|█ | 2373/22434 [1:39:45<13:52:21, 2.49s/it] +2025-02-05 11:47:28 - ERROR - stderr - 11%|█ | 2374/22434 [1:39:48<13:50:43, 2.48s/it] +2025-02-05 11:47:28 - ERROR - stderr - +2025-02-05 11:47:28 - ERROR - stderr - +2025-02-05 11:47:28 - INFO - stdout - {'loss': 1.0195, 'grad_norm': 1.3025161027908325, 
'learning_rate': 1.9700312531945444e-05, 'epoch': 0.32} +2025-02-05 11:47:28 - ERROR - stderr - 11%|█ | 2374/22434 [1:39:48<13:50:43, 2.48s/it] +2025-02-05 11:47:30 - ERROR - stderr - 11%|█ | 2375/22434 [1:39:50<13:55:32, 2.50s/it] +2025-02-05 11:47:30 - ERROR - stderr - +2025-02-05 11:47:30 - ERROR - stderr - +2025-02-05 11:47:30 - INFO - stdout - {'loss': 1.0454, 'grad_norm': 1.1869661808013916, 'learning_rate': 1.9699961629035386e-05, 'epoch': 0.32} +2025-02-05 11:47:30 - ERROR - stderr - 11%|█ | 2375/22434 [1:39:50<13:55:32, 2.50s/it] +2025-02-05 11:47:33 - ERROR - stderr - 11%|█ | 2376/22434 [1:39:52<13:47:53, 2.48s/it] +2025-02-05 11:47:33 - ERROR - stderr - +2025-02-05 11:47:33 - ERROR - stderr - +2025-02-05 11:47:33 - INFO - stdout - {'loss': 0.9587, 'grad_norm': 1.2095932960510254, 'learning_rate': 1.9699610523938912e-05, 'epoch': 0.32} +2025-02-05 11:47:33 - ERROR - stderr - 11%|█ | 2376/22434 [1:39:53<13:47:53, 2.48s/it] +2025-02-05 11:47:35 - ERROR - stderr - 11%|█ | 2377/22434 [1:39:55<13:48:42, 2.48s/it] +2025-02-05 11:47:35 - ERROR - stderr - +2025-02-05 11:47:35 - ERROR - stderr - +2025-02-05 11:47:35 - INFO - stdout - {'loss': 0.9343, 'grad_norm': 1.0450241565704346, 'learning_rate': 1.9699259216663338e-05, 'epoch': 0.32} +2025-02-05 11:47:35 - ERROR - stderr - 11%|█ | 2377/22434 [1:39:55<13:48:42, 2.48s/it] +2025-02-05 11:47:38 - ERROR - stderr - 11%|█ | 2378/22434 [1:39:58<14:00:49, 2.52s/it] +2025-02-05 11:47:38 - ERROR - stderr - +2025-02-05 11:47:38 - ERROR - stderr - +2025-02-05 11:47:38 - INFO - stdout - {'loss': 0.9498, 'grad_norm': 1.1858789920806885, 'learning_rate': 1.9698907707215985e-05, 'epoch': 0.32} +2025-02-05 11:47:38 - ERROR - stderr - 11%|█ | 2378/22434 [1:39:58<14:00:49, 2.52s/it] +2025-02-05 11:47:41 - ERROR - stderr - 11%|█ | 2379/22434 [1:40:01<14:43:57, 2.64s/it] +2025-02-05 11:47:41 - ERROR - stderr - +2025-02-05 11:47:41 - ERROR - stderr - +2025-02-05 11:47:41 - INFO - stdout - {'loss': 1.0616, 'grad_norm': 
1.230066180229187, 'learning_rate': 1.9698555995604188e-05, 'epoch': 0.32} +2025-02-05 11:47:41 - ERROR - stderr - 11%|█ | 2379/22434 [1:40:01<14:43:57, 2.64s/it] +2025-02-05 11:47:43 - ERROR - stderr - 11%|█ | 2380/22434 [1:40:03<14:32:19, 2.61s/it] +2025-02-05 11:47:43 - ERROR - stderr - +2025-02-05 11:47:43 - ERROR - stderr - +2025-02-05 11:47:43 - INFO - stdout - {'loss': 1.0992, 'grad_norm': 1.2173399925231934, 'learning_rate': 1.9698204081835266e-05, 'epoch': 0.32} +2025-02-05 11:47:43 - ERROR - stderr - 11%|█ | 2380/22434 [1:40:03<14:32:19, 2.61s/it] +2025-02-05 11:47:46 - ERROR - stderr - 11%|█ | 2381/22434 [1:40:06<14:36:20, 2.62s/it] +2025-02-05 11:47:46 - ERROR - stderr - +2025-02-05 11:47:46 - ERROR - stderr - +2025-02-05 11:47:46 - INFO - stdout - {'loss': 0.9741, 'grad_norm': 1.163827896118164, 'learning_rate': 1.969785196591656e-05, 'epoch': 0.32} +2025-02-05 11:47:46 - ERROR - stderr - 11%|█ | 2381/22434 [1:40:06<14:36:20, 2.62s/it] +2025-02-05 11:47:48 - ERROR - stderr - 11%|█ | 2382/22434 [1:40:08<14:23:15, 2.58s/it] +2025-02-05 11:47:48 - ERROR - stderr - +2025-02-05 11:47:48 - ERROR - stderr - +2025-02-05 11:47:48 - INFO - stdout - {'loss': 0.9972, 'grad_norm': 1.1509188413619995, 'learning_rate': 1.9697499647855413e-05, 'epoch': 0.32} +2025-02-05 11:47:48 - ERROR - stderr - 11%|█ | 2382/22434 [1:40:08<14:23:15, 2.58s/it] +2025-02-05 11:47:51 - ERROR - stderr - 11%|█ | 2383/22434 [1:40:11<14:13:23, 2.55s/it] +2025-02-05 11:47:51 - ERROR - stderr - +2025-02-05 11:47:51 - ERROR - stderr - +2025-02-05 11:47:51 - INFO - stdout - {'loss': 1.0734, 'grad_norm': 1.130071997642517, 'learning_rate': 1.969714712765916e-05, 'epoch': 0.32} +2025-02-05 11:47:51 - ERROR - stderr - 11%|█ | 2383/22434 [1:40:11<14:13:23, 2.55s/it] +2025-02-05 11:47:53 - ERROR - stderr - 11%|█ | 2384/22434 [1:40:13<14:12:41, 2.55s/it] +2025-02-05 11:47:53 - ERROR - stderr - +2025-02-05 11:47:53 - ERROR - stderr - +2025-02-05 11:47:53 - INFO - stdout - {'loss': 1.0602, 'grad_norm': 
1.1836953163146973, 'learning_rate': 1.969679440533516e-05, 'epoch': 0.32} +2025-02-05 11:47:53 - ERROR - stderr - 11%|█ | 2384/22434 [1:40:13<14:12:41, 2.55s/it] +2025-02-05 11:47:56 - ERROR - stderr - 11%|█ | 2385/22434 [1:40:16<14:06:06, 2.53s/it] +2025-02-05 11:47:56 - ERROR - stderr - +2025-02-05 11:47:56 - ERROR - stderr - +2025-02-05 11:47:56 - INFO - stdout - {'loss': 1.0698, 'grad_norm': 1.176977276802063, 'learning_rate': 1.9696441480890757e-05, 'epoch': 0.32} +2025-02-05 11:47:56 - ERROR - stderr - 11%|█ | 2385/22434 [1:40:16<14:06:06, 2.53s/it] +2025-02-05 11:47:59 - ERROR - stderr - 11%|█ | 2386/22434 [1:40:18<14:10:28, 2.55s/it] +2025-02-05 11:47:59 - ERROR - stderr - +2025-02-05 11:47:59 - ERROR - stderr - +2025-02-05 11:47:59 - INFO - stdout - {'loss': 1.0044, 'grad_norm': 1.1579760313034058, 'learning_rate': 1.9696088354333313e-05, 'epoch': 0.32} +2025-02-05 11:47:59 - ERROR - stderr - 11%|█ | 2386/22434 [1:40:18<14:10:28, 2.55s/it] +2025-02-05 11:48:01 - ERROR - stderr - 11%|█ | 2387/22434 [1:40:21<14:09:59, 2.54s/it] +2025-02-05 11:48:01 - ERROR - stderr - +2025-02-05 11:48:01 - ERROR - stderr - +2025-02-05 11:48:01 - INFO - stdout - {'loss': 0.9129, 'grad_norm': 1.1565437316894531, 'learning_rate': 1.9695735025670178e-05, 'epoch': 0.32} +2025-02-05 11:48:01 - ERROR - stderr - 11%|█ | 2387/22434 [1:40:21<14:09:59, 2.54s/it] +2025-02-05 11:48:04 - ERROR - stderr - 11%|█ | 2388/22434 [1:40:23<14:04:00, 2.53s/it] +2025-02-05 11:48:04 - ERROR - stderr - +2025-02-05 11:48:04 - ERROR - stderr - +2025-02-05 11:48:04 - INFO - stdout - {'loss': 1.0412, 'grad_norm': 1.2603696584701538, 'learning_rate': 1.9695381494908733e-05, 'epoch': 0.32} +2025-02-05 11:48:04 - ERROR - stderr - 11%|█ | 2388/22434 [1:40:23<14:04:00, 2.53s/it] +2025-02-05 11:48:06 - ERROR - stderr - 11%|█ | 2389/22434 [1:40:26<14:01:47, 2.52s/it] +2025-02-05 11:48:06 - ERROR - stderr - +2025-02-05 11:48:06 - ERROR - stderr - +2025-02-05 11:48:06 - INFO - stdout - {'loss': 1.0563, 
'grad_norm': 1.1807267665863037, 'learning_rate': 1.9695027762056333e-05, 'epoch': 0.32} +2025-02-05 11:48:06 - ERROR - stderr - 11%|█ | 2389/22434 [1:40:26<14:01:47, 2.52s/it] +2025-02-05 11:48:08 - ERROR - stderr - 11%|█ | 2390/22434 [1:40:28<13:51:27, 2.49s/it] +2025-02-05 11:48:09 - ERROR - stderr - +2025-02-05 11:48:09 - ERROR - stderr - +2025-02-05 11:48:09 - INFO - stdout - {'loss': 1.0307, 'grad_norm': 1.134080171585083, 'learning_rate': 1.9694673827120354e-05, 'epoch': 0.32} +2025-02-05 11:48:09 - ERROR - stderr - 11%|█ | 2390/22434 [1:40:28<13:51:27, 2.49s/it] +2025-02-05 11:48:11 - ERROR - stderr - 11%|█ | 2391/22434 [1:40:31<13:56:26, 2.50s/it] +2025-02-05 11:48:11 - ERROR - stderr - +2025-02-05 11:48:11 - ERROR - stderr - +2025-02-05 11:48:11 - INFO - stdout - {'loss': 1.0194, 'grad_norm': 1.1582611799240112, 'learning_rate': 1.9694319690108182e-05, 'epoch': 0.32} +2025-02-05 11:48:11 - ERROR - stderr - 11%|█ | 2391/22434 [1:40:31<13:56:26, 2.50s/it] +2025-02-05 11:48:13 - ERROR - stderr - 11%|█ | 2392/22434 [1:40:33<13:53:33, 2.50s/it] +2025-02-05 11:48:14 - ERROR - stderr - +2025-02-05 11:48:14 - ERROR - stderr - +2025-02-05 11:48:14 - INFO - stdout - {'loss': 1.042, 'grad_norm': 1.3401755094528198, 'learning_rate': 1.969396535102719e-05, 'epoch': 0.32} +2025-02-05 11:48:14 - ERROR - stderr - 11%|█ | 2392/22434 [1:40:33<13:53:33, 2.50s/it] +2025-02-05 11:48:16 - ERROR - stderr - 11%|█ | 2393/22434 [1:40:36<13:52:13, 2.49s/it] +2025-02-05 11:48:16 - ERROR - stderr - +2025-02-05 11:48:16 - ERROR - stderr - +2025-02-05 11:48:16 - INFO - stdout - {'loss': 1.0981, 'grad_norm': 1.2861007452011108, 'learning_rate': 1.9693610809884764e-05, 'epoch': 0.32} +2025-02-05 11:48:16 - ERROR - stderr - 11%|█ | 2393/22434 [1:40:36<13:52:13, 2.49s/it] +2025-02-05 11:48:18 - ERROR - stderr - 11%|█ | 2394/22434 [1:40:38<13:47:51, 2.48s/it] +2025-02-05 11:48:18 - ERROR - stderr - +2025-02-05 11:48:18 - ERROR - stderr - +2025-02-05 11:48:18 - INFO - stdout - {'loss': 
0.9389, 'grad_norm': 1.0507349967956543, 'learning_rate': 1.96932560666883e-05, 'epoch': 0.32} +2025-02-05 11:48:18 - ERROR - stderr - 11%|█ | 2394/22434 [1:40:38<13:47:51, 2.48s/it] +2025-02-05 11:48:21 - ERROR - stderr - 11%|█ | 2395/22434 [1:40:41<13:52:46, 2.49s/it] +2025-02-05 11:48:21 - ERROR - stderr - +2025-02-05 11:48:21 - ERROR - stderr - +2025-02-05 11:48:21 - INFO - stdout - {'loss': 1.0343, 'grad_norm': 1.3202192783355713, 'learning_rate': 1.9692901121445187e-05, 'epoch': 0.32} +2025-02-05 11:48:21 - ERROR - stderr - 11%|█ | 2395/22434 [1:40:41<13:52:46, 2.49s/it] +2025-02-05 11:48:24 - ERROR - stderr - 11%|█ | 2396/22434 [1:40:43<14:02:29, 2.52s/it] +2025-02-05 11:48:24 - ERROR - stderr - +2025-02-05 11:48:24 - ERROR - stderr - +2025-02-05 11:48:24 - INFO - stdout - {'loss': 1.0231, 'grad_norm': 1.1251357793807983, 'learning_rate': 1.9692545974162826e-05, 'epoch': 0.32} +2025-02-05 11:48:24 - ERROR - stderr - 11%|█ | 2396/22434 [1:40:43<14:02:29, 2.52s/it] +2025-02-05 11:48:26 - ERROR - stderr - 11%|█ | 2397/22434 [1:40:46<14:01:23, 2.52s/it] +2025-02-05 11:48:26 - ERROR - stderr - +2025-02-05 11:48:26 - ERROR - stderr - +2025-02-05 11:48:26 - INFO - stdout - {'loss': 0.9627, 'grad_norm': 1.2302676439285278, 'learning_rate': 1.9692190624848616e-05, 'epoch': 0.32} +2025-02-05 11:48:26 - ERROR - stderr - 11%|█ | 2397/22434 [1:40:46<14:01:23, 2.52s/it] +2025-02-05 11:48:28 - ERROR - stderr - 11%|█ | 2398/22434 [1:40:48<13:50:59, 2.49s/it] +2025-02-05 11:48:28 - ERROR - stderr - +2025-02-05 11:48:28 - ERROR - stderr - +2025-02-05 11:48:28 - INFO - stdout - {'loss': 0.8752, 'grad_norm': 1.1330833435058594, 'learning_rate': 1.969183507350997e-05, 'epoch': 0.32} +2025-02-05 11:48:28 - ERROR - stderr - 11%|█ | 2398/22434 [1:40:48<13:50:59, 2.49s/it] +2025-02-05 11:48:31 - ERROR - stderr - 11%|█ | 2399/22434 [1:40:51<13:53:16, 2.50s/it] +2025-02-05 11:48:31 - ERROR - stderr - +2025-02-05 11:48:31 - ERROR - stderr - +2025-02-05 11:48:31 - INFO - stdout - 
{'loss': 1.0059, 'grad_norm': 1.0865366458892822, 'learning_rate': 1.9691479320154295e-05, 'epoch': 0.32} +2025-02-05 11:48:31 - ERROR - stderr - 11%|█ | 2399/22434 [1:40:51<13:53:16, 2.50s/it] +2025-02-05 11:48:34 - ERROR - stderr - 11%|█ | 2400/22434 [1:40:53<14:10:57, 2.55s/it] +2025-02-05 11:48:34 - ERROR - stderr - +2025-02-05 11:48:34 - ERROR - stderr - +2025-02-05 11:48:34 - INFO - stdout - {'loss': 1.0611, 'grad_norm': 1.3230291604995728, 'learning_rate': 1.9691123364789008e-05, 'epoch': 0.32} +2025-02-05 11:48:34 - ERROR - stderr - 11%|█ | 2400/22434 [1:40:53<14:10:57, 2.55s/it] +2025-02-05 11:48:36 - ERROR - stderr - 11%|█ | 2401/22434 [1:40:56<13:57:39, 2.51s/it] +2025-02-05 11:48:36 - ERROR - stderr - +2025-02-05 11:48:36 - ERROR - stderr - +2025-02-05 11:48:36 - INFO - stdout - {'loss': 1.069, 'grad_norm': 1.3397996425628662, 'learning_rate': 1.9690767207421527e-05, 'epoch': 0.32} +2025-02-05 11:48:36 - ERROR - stderr - 11%|█ | 2401/22434 [1:40:56<13:57:39, 2.51s/it] +2025-02-05 11:48:39 - ERROR - stderr - 11%|█ | 2402/22434 [1:40:58<13:57:10, 2.51s/it] +2025-02-05 11:48:39 - ERROR - stderr - +2025-02-05 11:48:39 - ERROR - stderr - +2025-02-05 11:48:39 - INFO - stdout - {'loss': 0.9918, 'grad_norm': 1.2741390466690063, 'learning_rate': 1.9690410848059278e-05, 'epoch': 0.32} +2025-02-05 11:48:39 - ERROR - stderr - 11%|█ | 2402/22434 [1:40:58<13:57:10, 2.51s/it] +2025-02-05 11:48:41 - ERROR - stderr - 11%|█ | 2403/22434 [1:41:01<14:00:12, 2.52s/it] +2025-02-05 11:48:41 - ERROR - stderr - +2025-02-05 11:48:41 - ERROR - stderr - +2025-02-05 11:48:41 - INFO - stdout - {'loss': 1.1248, 'grad_norm': 1.1550568342208862, 'learning_rate': 1.969005428670969e-05, 'epoch': 0.32} +2025-02-05 11:48:41 - ERROR - stderr - 11%|█ | 2403/22434 [1:41:01<14:00:12, 2.52s/it] +2025-02-05 11:48:44 - ERROR - stderr - 11%|█ | 2404/22434 [1:41:03<13:51:50, 2.49s/it] +2025-02-05 11:48:44 - ERROR - stderr - +2025-02-05 11:48:44 - ERROR - stderr - +2025-02-05 11:48:44 - INFO - 
stdout - {'loss': 0.9138, 'grad_norm': 1.2188063859939575, 'learning_rate': 1.968969752338019e-05, 'epoch': 0.32} +2025-02-05 11:48:44 - ERROR - stderr - 11%|█ | 2404/22434 [1:41:03<13:51:50, 2.49s/it] +2025-02-05 11:48:46 - ERROR - stderr - 11%|█ | 2405/22434 [1:41:06<13:46:26, 2.48s/it] +2025-02-05 11:48:46 - ERROR - stderr - +2025-02-05 11:48:46 - ERROR - stderr - +2025-02-05 11:48:46 - INFO - stdout - {'loss': 0.975, 'grad_norm': 1.1910172700881958, 'learning_rate': 1.9689340558078212e-05, 'epoch': 0.32} +2025-02-05 11:48:46 - ERROR - stderr - 11%|█ | 2405/22434 [1:41:06<13:46:26, 2.48s/it] +2025-02-05 11:48:48 - ERROR - stderr - 11%|█ | 2406/22434 [1:41:08<13:51:06, 2.49s/it] +2025-02-05 11:48:49 - ERROR - stderr - +2025-02-05 11:48:49 - ERROR - stderr - +2025-02-05 11:48:49 - INFO - stdout - {'loss': 0.8924, 'grad_norm': 1.1405029296875, 'learning_rate': 1.9688983390811204e-05, 'epoch': 0.32} +2025-02-05 11:48:49 - ERROR - stderr - 11%|█ | 2406/22434 [1:41:08<13:51:06, 2.49s/it] +2025-02-05 11:48:51 - ERROR - stderr - 11%|█ | 2407/22434 [1:41:11<13:46:18, 2.48s/it] +2025-02-05 11:48:51 - ERROR - stderr - +2025-02-05 11:48:51 - ERROR - stderr - +2025-02-05 11:48:51 - INFO - stdout - {'loss': 0.913, 'grad_norm': 1.1406763792037964, 'learning_rate': 1.9688626021586615e-05, 'epoch': 0.32} +2025-02-05 11:48:51 - ERROR - stderr - 11%|█ | 2407/22434 [1:41:11<13:46:18, 2.48s/it] +2025-02-05 11:48:53 - ERROR - stderr - 11%|█ | 2408/22434 [1:41:13<13:50:41, 2.49s/it] +2025-02-05 11:48:53 - ERROR - stderr - +2025-02-05 11:48:53 - ERROR - stderr - +2025-02-05 11:48:53 - INFO - stdout - {'loss': 0.968, 'grad_norm': 1.1816368103027344, 'learning_rate': 1.9688268450411882e-05, 'epoch': 0.32} +2025-02-05 11:48:53 - ERROR - stderr - 11%|█ | 2408/22434 [1:41:13<13:50:41, 2.49s/it] +2025-02-05 11:48:56 - ERROR - stderr - 11%|█ | 2409/22434 [1:41:16<13:48:05, 2.48s/it] +2025-02-05 11:48:56 - ERROR - stderr - +2025-02-05 11:48:56 - ERROR - stderr - +2025-02-05 11:48:56 - INFO - 
stdout - {'loss': 1.0293, 'grad_norm': 1.2005079984664917, 'learning_rate': 1.9687910677294466e-05, 'epoch': 0.32} +2025-02-05 11:48:56 - ERROR - stderr - 11%|█ | 2409/22434 [1:41:16<13:48:05, 2.48s/it] +2025-02-05 11:48:59 - ERROR - stderr - 11%|█ | 2410/22434 [1:41:18<14:10:27, 2.55s/it] +2025-02-05 11:48:59 - ERROR - stderr - +2025-02-05 11:48:59 - ERROR - stderr - +2025-02-05 11:48:59 - INFO - stdout - {'loss': 1.0668, 'grad_norm': 1.2041183710098267, 'learning_rate': 1.9687552702241823e-05, 'epoch': 0.32} +2025-02-05 11:48:59 - ERROR - stderr - 11%|█ | 2410/22434 [1:41:18<14:10:27, 2.55s/it] +2025-02-05 11:49:01 - ERROR - stderr - 11%|█ | 2411/22434 [1:41:21<14:08:55, 2.54s/it] +2025-02-05 11:49:01 - ERROR - stderr - +2025-02-05 11:49:01 - ERROR - stderr - +2025-02-05 11:49:01 - INFO - stdout - {'loss': 0.9578, 'grad_norm': 1.1517561674118042, 'learning_rate': 1.9687194525261408e-05, 'epoch': 0.32} +2025-02-05 11:49:01 - ERROR - stderr - 11%|█ | 2411/22434 [1:41:21<14:08:55, 2.54s/it] +2025-02-05 11:49:04 - ERROR - stderr - 11%|█ | 2412/22434 [1:41:23<14:06:21, 2.54s/it] +2025-02-05 11:49:04 - ERROR - stderr - +2025-02-05 11:49:04 - ERROR - stderr - +2025-02-05 11:49:04 - INFO - stdout - {'loss': 1.1175, 'grad_norm': 1.372638463973999, 'learning_rate': 1.9686836146360698e-05, 'epoch': 0.32} +2025-02-05 11:49:04 - ERROR - stderr - 11%|█ | 2412/22434 [1:41:23<14:06:21, 2.54s/it] +2025-02-05 11:49:06 - ERROR - stderr - 11%|█ | 2413/22434 [1:41:26<14:02:27, 2.52s/it] +2025-02-05 11:49:06 - ERROR - stderr - +2025-02-05 11:49:06 - ERROR - stderr - +2025-02-05 11:49:06 - INFO - stdout - {'loss': 0.9554, 'grad_norm': 1.1384968757629395, 'learning_rate': 1.9686477565547157e-05, 'epoch': 0.32} +2025-02-05 11:49:06 - ERROR - stderr - 11%|█ | 2413/22434 [1:41:26<14:02:27, 2.52s/it] +2025-02-05 11:49:09 - ERROR - stderr - 11%|█ | 2414/22434 [1:41:28<13:55:39, 2.50s/it] +2025-02-05 11:49:09 - ERROR - stderr - +2025-02-05 11:49:09 - ERROR - stderr - +2025-02-05 11:49:09 - 
INFO - stdout - {'loss': 0.9901, 'grad_norm': 1.1989781856536865, 'learning_rate': 1.968611878282826e-05, 'epoch': 0.32} +2025-02-05 11:49:09 - ERROR - stderr - 11%|█ | 2414/22434 [1:41:28<13:55:39, 2.50s/it] +2025-02-05 11:49:11 - ERROR - stderr - 11%|█ | 2415/22434 [1:41:31<13:59:22, 2.52s/it] +2025-02-05 11:49:11 - ERROR - stderr - +2025-02-05 11:49:11 - ERROR - stderr - +2025-02-05 11:49:11 - INFO - stdout - {'loss': 1.0625, 'grad_norm': 1.212981939315796, 'learning_rate': 1.9685759798211488e-05, 'epoch': 0.32} +2025-02-05 11:49:11 - ERROR - stderr - 11%|█ | 2415/22434 [1:41:31<13:59:22, 2.52s/it] +2025-02-05 11:49:14 - ERROR - stderr - 11%|█ | 2416/22434 [1:41:33<13:52:05, 2.49s/it] +2025-02-05 11:49:14 - ERROR - stderr - +2025-02-05 11:49:14 - ERROR - stderr - +2025-02-05 11:49:14 - INFO - stdout - {'loss': 1.0136, 'grad_norm': 1.1808278560638428, 'learning_rate': 1.968540061170432e-05, 'epoch': 0.32} +2025-02-05 11:49:14 - ERROR - stderr - 11%|█ | 2416/22434 [1:41:33<13:52:05, 2.49s/it] +2025-02-05 11:49:16 - ERROR - stderr - 11%|█ | 2417/22434 [1:41:36<13:48:50, 2.48s/it] +2025-02-05 11:49:16 - ERROR - stderr - +2025-02-05 11:49:16 - ERROR - stderr - +2025-02-05 11:49:16 - INFO - stdout - {'loss': 0.9276, 'grad_norm': 1.2867447137832642, 'learning_rate': 1.968504122331424e-05, 'epoch': 0.32} +2025-02-05 11:49:16 - ERROR - stderr - 11%|█ | 2417/22434 [1:41:36<13:48:50, 2.48s/it] +2025-02-05 11:49:19 - ERROR - stderr - 11%|█ | 2418/22434 [1:41:38<13:46:58, 2.48s/it] +2025-02-05 11:49:19 - ERROR - stderr - +2025-02-05 11:49:19 - ERROR - stderr - +2025-02-05 11:49:19 - INFO - stdout - {'loss': 1.0046, 'grad_norm': 1.2021349668502808, 'learning_rate': 1.9684681633048748e-05, 'epoch': 0.32} +2025-02-05 11:49:19 - ERROR - stderr - 11%|█ | 2418/22434 [1:41:38<13:46:58, 2.48s/it] +2025-02-05 11:49:21 - ERROR - stderr - 11%|█ | 2419/22434 [1:41:41<13:48:51, 2.48s/it] +2025-02-05 11:49:21 - ERROR - stderr - +2025-02-05 11:49:21 - ERROR - stderr - +2025-02-05 11:49:21 
- INFO - stdout - {'loss': 1.049, 'grad_norm': 1.2332921028137207, 'learning_rate': 1.968432184091533e-05, 'epoch': 0.32} +2025-02-05 11:49:21 - ERROR - stderr - 11%|█ | 2419/22434 [1:41:41<13:48:51, 2.48s/it] +2025-02-05 11:49:24 - ERROR - stderr - 11%|█ | 2420/22434 [1:41:43<13:51:53, 2.49s/it] +2025-02-05 11:49:24 - ERROR - stderr - +2025-02-05 11:49:24 - ERROR - stderr - +2025-02-05 11:49:24 - INFO - stdout - {'loss': 0.9516, 'grad_norm': 1.0550178289413452, 'learning_rate': 1.9683961846921495e-05, 'epoch': 0.32} +2025-02-05 11:49:24 - ERROR - stderr - 11%|█ | 2420/22434 [1:41:43<13:51:53, 2.49s/it] +2025-02-05 11:49:26 - ERROR - stderr - 11%|█ | 2421/22434 [1:41:46<13:51:29, 2.49s/it] +2025-02-05 11:49:26 - ERROR - stderr - +2025-02-05 11:49:26 - ERROR - stderr - +2025-02-05 11:49:26 - INFO - stdout - {'loss': 1.045, 'grad_norm': 1.1444095373153687, 'learning_rate': 1.9683601651074743e-05, 'epoch': 0.32} +2025-02-05 11:49:26 - ERROR - stderr - 11%|█ | 2421/22434 [1:41:46<13:51:29, 2.49s/it] +2025-02-05 11:49:29 - ERROR - stderr - 11%|█ | 2422/22434 [1:41:48<13:57:54, 2.51s/it] +2025-02-05 11:49:29 - ERROR - stderr - +2025-02-05 11:49:29 - ERROR - stderr - +2025-02-05 11:49:29 - INFO - stdout - {'loss': 1.1205, 'grad_norm': 1.3518046140670776, 'learning_rate': 1.9683241253382578e-05, 'epoch': 0.32} +2025-02-05 11:49:29 - ERROR - stderr - 11%|█ | 2422/22434 [1:41:48<13:57:54, 2.51s/it] +2025-02-05 11:49:31 - ERROR - stderr - 11%|█ | 2423/22434 [1:41:51<14:04:39, 2.53s/it] +2025-02-05 11:49:31 - ERROR - stderr - +2025-02-05 11:49:31 - ERROR - stderr - +2025-02-05 11:49:31 - INFO - stdout - {'loss': 1.0283, 'grad_norm': 1.0830843448638916, 'learning_rate': 1.968288065385251e-05, 'epoch': 0.32} +2025-02-05 11:49:31 - ERROR - stderr - 11%|█ | 2423/22434 [1:41:51<14:04:39, 2.53s/it] +2025-02-05 11:49:34 - ERROR - stderr - 11%|█ | 2424/22434 [1:41:53<14:02:46, 2.53s/it] +2025-02-05 11:49:34 - ERROR - stderr - +2025-02-05 11:49:34 - ERROR - stderr - +2025-02-05 
11:49:34 - INFO - stdout - {'loss': 0.938, 'grad_norm': 1.2061160802841187, 'learning_rate': 1.9682519852492066e-05, 'epoch': 0.32} +2025-02-05 11:49:34 - ERROR - stderr - 11%|█ | 2424/22434 [1:41:54<14:02:46, 2.53s/it] +2025-02-05 11:49:36 - ERROR - stderr - 11%|█ | 2425/22434 [1:41:56<14:26:38, 2.60s/it] +2025-02-05 11:49:37 - ERROR - stderr - +2025-02-05 11:49:37 - ERROR - stderr - +2025-02-05 11:49:37 - INFO - stdout - {'loss': 0.8912, 'grad_norm': 1.095863699913025, 'learning_rate': 1.968215884930876e-05, 'epoch': 0.32} +2025-02-05 11:49:37 - ERROR - stderr - 11%|█ | 2425/22434 [1:41:56<14:26:38, 2.60s/it] +2025-02-05 11:49:39 - ERROR - stderr - 11%|█ | 2426/22434 [1:41:59<14:19:34, 2.58s/it] +2025-02-05 11:49:39 - ERROR - stderr - +2025-02-05 11:49:39 - ERROR - stderr - +2025-02-05 11:49:39 - INFO - stdout - {'loss': 0.9213, 'grad_norm': 1.141638159751892, 'learning_rate': 1.9681797644310116e-05, 'epoch': 0.32} +2025-02-05 11:49:39 - ERROR - stderr - 11%|█ | 2426/22434 [1:41:59<14:19:34, 2.58s/it] +2025-02-05 11:49:41 - ERROR - stderr - 11%|█ | 2427/22434 [1:42:01<14:06:46, 2.54s/it] +2025-02-05 11:49:41 - ERROR - stderr - +2025-02-05 11:49:41 - ERROR - stderr - +2025-02-05 11:49:41 - INFO - stdout - {'loss': 0.872, 'grad_norm': 1.1542855501174927, 'learning_rate': 1.9681436237503667e-05, 'epoch': 0.32} +2025-02-05 11:49:41 - ERROR - stderr - 11%|█ | 2427/22434 [1:42:01<14:06:46, 2.54s/it] +2025-02-05 11:49:44 - ERROR - stderr - 11%|█ | 2428/22434 [1:42:04<14:02:22, 2.53s/it] +2025-02-05 11:49:44 - ERROR - stderr - +2025-02-05 11:49:44 - ERROR - stderr - +2025-02-05 11:49:44 - INFO - stdout - {'loss': 1.0385, 'grad_norm': 1.1600905656814575, 'learning_rate': 1.9681074628896945e-05, 'epoch': 0.32} +2025-02-05 11:49:44 - ERROR - stderr - 11%|█ | 2428/22434 [1:42:04<14:02:22, 2.53s/it] +2025-02-05 11:49:46 - ERROR - stderr - 11%|█ | 2429/22434 [1:42:06<14:00:39, 2.52s/it] +2025-02-05 11:49:46 - ERROR - stderr - +2025-02-05 11:49:46 - ERROR - stderr - +2025-02-05 
11:49:46 - INFO - stdout - {'loss': 0.9619, 'grad_norm': 1.1197657585144043, 'learning_rate': 1.9680712818497484e-05, 'epoch': 0.32} +2025-02-05 11:49:46 - ERROR - stderr - 11%|█ | 2429/22434 [1:42:06<14:00:39, 2.52s/it] +2025-02-05 11:49:49 - ERROR - stderr - 11%|█ | 2430/22434 [1:42:09<13:52:48, 2.50s/it] +2025-02-05 11:49:49 - ERROR - stderr - +2025-02-05 11:49:49 - ERROR - stderr - +2025-02-05 11:49:49 - INFO - stdout - {'loss': 1.0009, 'grad_norm': 1.2793159484863281, 'learning_rate': 1.9680350806312826e-05, 'epoch': 0.32} +2025-02-05 11:49:49 - ERROR - stderr - 11%|█ | 2430/22434 [1:42:09<13:52:48, 2.50s/it] +2025-02-05 11:49:51 - ERROR - stderr - 11%|█ | 2431/22434 [1:42:11<13:52:57, 2.50s/it] +2025-02-05 11:49:51 - ERROR - stderr - +2025-02-05 11:49:51 - ERROR - stderr - +2025-02-05 11:49:51 - INFO - stdout - {'loss': 1.0387, 'grad_norm': 1.1968907117843628, 'learning_rate': 1.967998859235052e-05, 'epoch': 0.33} +2025-02-05 11:49:51 - ERROR - stderr - 11%|█ | 2431/22434 [1:42:11<13:52:57, 2.50s/it] +2025-02-05 11:49:54 - ERROR - stderr - 11%|█ | 2432/22434 [1:42:14<13:48:36, 2.49s/it] +2025-02-05 11:49:54 - ERROR - stderr - +2025-02-05 11:49:54 - ERROR - stderr - +2025-02-05 11:49:54 - INFO - stdout - {'loss': 0.9038, 'grad_norm': 1.0916651487350464, 'learning_rate': 1.9679626176618118e-05, 'epoch': 0.33} +2025-02-05 11:49:54 - ERROR - stderr - 11%|█ | 2432/22434 [1:42:14<13:48:36, 2.49s/it] +2025-02-05 11:49:56 - ERROR - stderr - 11%|█ | 2433/22434 [1:42:16<13:42:35, 2.47s/it] +2025-02-05 11:49:56 - ERROR - stderr - +2025-02-05 11:49:56 - ERROR - stderr - +2025-02-05 11:49:56 - INFO - stdout - {'loss': 1.0481, 'grad_norm': 1.24396550655365, 'learning_rate': 1.9679263559123164e-05, 'epoch': 0.33} +2025-02-05 11:49:56 - ERROR - stderr - 11%|█ | 2433/22434 [1:42:16<13:42:35, 2.47s/it] +2025-02-05 11:49:59 - ERROR - stderr - 11%|█ | 2434/22434 [1:42:18<13:42:04, 2.47s/it] +2025-02-05 11:49:59 - ERROR - stderr - +2025-02-05 11:49:59 - ERROR - stderr - 
+2025-02-05 11:49:59 - INFO - stdout - {'loss': 0.9202, 'grad_norm': 1.1350520849227905, 'learning_rate': 1.967890073987323e-05, 'epoch': 0.33} +2025-02-05 11:49:59 - ERROR - stderr - 11%|█ | 2434/22434 [1:42:19<13:42:04, 2.47s/it] +2025-02-05 11:50:01 - ERROR - stderr - 11%|█ | 2435/22434 [1:42:21<13:52:39, 2.50s/it] +2025-02-05 11:50:01 - ERROR - stderr - +2025-02-05 11:50:01 - ERROR - stderr - +2025-02-05 11:50:01 - INFO - stdout - {'loss': 1.0256, 'grad_norm': 1.1618800163269043, 'learning_rate': 1.9678537718875865e-05, 'epoch': 0.33} +2025-02-05 11:50:01 - ERROR - stderr - 11%|█ | 2435/22434 [1:42:21<13:52:39, 2.50s/it] +2025-02-05 11:50:04 - ERROR - stderr - 11%|█ | 2436/22434 [1:42:24<13:51:52, 2.50s/it] +2025-02-05 11:50:04 - ERROR - stderr - +2025-02-05 11:50:04 - ERROR - stderr - +2025-02-05 11:50:04 - INFO - stdout - {'loss': 1.0106, 'grad_norm': 1.1498866081237793, 'learning_rate': 1.9678174496138645e-05, 'epoch': 0.33} +2025-02-05 11:50:04 - ERROR - stderr - 11%|█ | 2436/22434 [1:42:24<13:51:52, 2.50s/it] +2025-02-05 11:50:06 - ERROR - stderr - 11%|█ | 2437/22434 [1:42:26<13:53:49, 2.50s/it] +2025-02-05 11:50:06 - ERROR - stderr - +2025-02-05 11:50:06 - ERROR - stderr - +2025-02-05 11:50:06 - INFO - stdout - {'loss': 0.9647, 'grad_norm': 1.1057292222976685, 'learning_rate': 1.967781107166914e-05, 'epoch': 0.33} +2025-02-05 11:50:06 - ERROR - stderr - 11%|█ | 2437/22434 [1:42:26<13:53:49, 2.50s/it] +2025-02-05 11:50:09 - ERROR - stderr - 11%|█ | 2438/22434 [1:42:29<13:50:20, 2.49s/it] +2025-02-05 11:50:09 - ERROR - stderr - +2025-02-05 11:50:09 - ERROR - stderr - +2025-02-05 11:50:09 - INFO - stdout - {'loss': 1.2417, 'grad_norm': 1.298959732055664, 'learning_rate': 1.9677447445474923e-05, 'epoch': 0.33} +2025-02-05 11:50:09 - ERROR - stderr - 11%|█ | 2438/22434 [1:42:29<13:50:20, 2.49s/it] +2025-02-05 11:50:11 - ERROR - stderr - 11%|█ | 2439/22434 [1:42:31<13:47:46, 2.48s/it] +2025-02-05 11:50:11 - ERROR - stderr - +2025-02-05 11:50:11 - ERROR - stderr 
- +2025-02-05 11:50:11 - INFO - stdout - {'loss': 0.9229, 'grad_norm': 1.2618036270141602, 'learning_rate': 1.967708361756358e-05, 'epoch': 0.33} +2025-02-05 11:50:11 - ERROR - stderr - 11%|█ | 2439/22434 [1:42:31<13:47:46, 2.48s/it] +2025-02-05 11:50:14 - ERROR - stderr - 11%|█ | 2440/22434 [1:42:34<13:51:17, 2.49s/it] +2025-02-05 11:50:14 - ERROR - stderr - +2025-02-05 11:50:14 - ERROR - stderr - +2025-02-05 11:50:14 - INFO - stdout - {'loss': 0.9972, 'grad_norm': 1.0797914266586304, 'learning_rate': 1.967671958794268e-05, 'epoch': 0.33} +2025-02-05 11:50:14 - ERROR - stderr - 11%|█ | 2440/22434 [1:42:34<13:51:17, 2.49s/it] +2025-02-05 11:50:16 - ERROR - stderr - 11%|█ | 2441/22434 [1:42:36<13:49:32, 2.49s/it] +2025-02-05 11:50:16 - ERROR - stderr - +2025-02-05 11:50:16 - ERROR - stderr - +2025-02-05 11:50:16 - INFO - stdout - {'loss': 1.0198, 'grad_norm': 1.0676758289337158, 'learning_rate': 1.9676355356619824e-05, 'epoch': 0.33} +2025-02-05 11:50:16 - ERROR - stderr - 11%|█ | 2441/22434 [1:42:36<13:49:32, 2.49s/it] +2025-02-05 11:50:19 - ERROR - stderr - 11%|█ | 2442/22434 [1:42:39<13:59:04, 2.52s/it] +2025-02-05 11:50:19 - ERROR - stderr - +2025-02-05 11:50:19 - ERROR - stderr - +2025-02-05 11:50:19 - INFO - stdout - {'loss': 0.941, 'grad_norm': 1.1564595699310303, 'learning_rate': 1.96759909236026e-05, 'epoch': 0.33} +2025-02-05 11:50:19 - ERROR - stderr - 11%|█ | 2442/22434 [1:42:39<13:59:04, 2.52s/it] +2025-02-05 11:50:21 - ERROR - stderr - 11%|█ | 2443/22434 [1:42:41<14:01:41, 2.53s/it] +2025-02-05 11:50:21 - ERROR - stderr - +2025-02-05 11:50:21 - ERROR - stderr - +2025-02-05 11:50:21 - INFO - stdout - {'loss': 0.9186, 'grad_norm': 1.1766642332077026, 'learning_rate': 1.9675626288898604e-05, 'epoch': 0.33} +2025-02-05 11:50:21 - ERROR - stderr - 11%|█ | 2443/22434 [1:42:41<14:01:41, 2.53s/it] +2025-02-05 11:50:24 - ERROR - stderr - 11%|█ | 2444/22434 [1:42:44<13:52:30, 2.50s/it] +2025-02-05 11:50:24 - ERROR - stderr - +2025-02-05 11:50:24 - ERROR - stderr 
- +2025-02-05 11:50:24 - INFO - stdout - {'loss': 1.0513, 'grad_norm': 1.1774756908416748, 'learning_rate': 1.9675261452515434e-05, 'epoch': 0.33} +2025-02-05 11:50:24 - ERROR - stderr - 11%|█ | 2444/22434 [1:42:44<13:52:30, 2.50s/it] +2025-02-05 11:50:26 - ERROR - stderr - 11%|█ | 2445/22434 [1:42:46<13:50:03, 2.49s/it] +2025-02-05 11:50:26 - ERROR - stderr - +2025-02-05 11:50:26 - ERROR - stderr - +2025-02-05 11:50:26 - INFO - stdout - {'loss': 1.0382, 'grad_norm': 1.0944279432296753, 'learning_rate': 1.96748964144607e-05, 'epoch': 0.33} +2025-02-05 11:50:26 - ERROR - stderr - 11%|█ | 2445/22434 [1:42:46<13:50:03, 2.49s/it] +2025-02-05 11:50:29 - ERROR - stderr - 11%|█ | 2446/22434 [1:42:49<13:50:59, 2.49s/it] +2025-02-05 11:50:29 - ERROR - stderr - +2025-02-05 11:50:29 - ERROR - stderr - +2025-02-05 11:50:29 - INFO - stdout - {'loss': 1.098, 'grad_norm': 1.17111074924469, 'learning_rate': 1.9674531174742007e-05, 'epoch': 0.33} +2025-02-05 11:50:29 - ERROR - stderr - 11%|█ | 2446/22434 [1:42:49<13:50:59, 2.49s/it] +2025-02-05 11:50:31 - ERROR - stderr - 11%|█ | 2447/22434 [1:42:51<13:53:24, 2.50s/it] +2025-02-05 11:50:31 - ERROR - stderr - +2025-02-05 11:50:31 - ERROR - stderr - +2025-02-05 11:50:31 - INFO - stdout - {'loss': 0.9892, 'grad_norm': 1.1919169425964355, 'learning_rate': 1.967416573336697e-05, 'epoch': 0.33} +2025-02-05 11:50:31 - ERROR - stderr - 11%|█ | 2447/22434 [1:42:51<13:53:24, 2.50s/it] +2025-02-05 11:50:34 - ERROR - stderr - 11%|█ | 2448/22434 [1:42:53<13:45:43, 2.48s/it] +2025-02-05 11:50:34 - ERROR - stderr - +2025-02-05 11:50:34 - ERROR - stderr - +2025-02-05 11:50:34 - INFO - stdout - {'loss': 0.9587, 'grad_norm': 1.2979373931884766, 'learning_rate': 1.9673800090343204e-05, 'epoch': 0.33} +2025-02-05 11:50:34 - ERROR - stderr - 11%|█ | 2448/22434 [1:42:54<13:45:43, 2.48s/it] +2025-02-05 11:50:36 - ERROR - stderr - 11%|█ | 2449/22434 [1:42:56<14:02:36, 2.53s/it] +2025-02-05 11:50:36 - ERROR - stderr - +2025-02-05 11:50:36 - ERROR - stderr 
- +2025-02-05 11:50:36 - INFO - stdout - {'loss': 1.0121, 'grad_norm': 1.210742712020874, 'learning_rate': 1.9673434245678335e-05, 'epoch': 0.33} +2025-02-05 11:50:36 - ERROR - stderr - 11%|█ | 2449/22434 [1:42:56<14:02:36, 2.53s/it] +2025-02-05 11:50:39 - ERROR - stderr - 11%|█ | 2450/22434 [1:42:59<13:51:32, 2.50s/it] +2025-02-05 11:50:39 - ERROR - stderr - +2025-02-05 11:50:39 - ERROR - stderr - +2025-02-05 11:50:39 - INFO - stdout - {'loss': 1.1142, 'grad_norm': 1.227232813835144, 'learning_rate': 1.9673068199379984e-05, 'epoch': 0.33} +2025-02-05 11:50:39 - ERROR - stderr - 11%|█ | 2450/22434 [1:42:59<13:51:32, 2.50s/it] +2025-02-05 11:50:41 - ERROR - stderr - 11%|█ | 2451/22434 [1:43:01<13:44:00, 2.47s/it] +2025-02-05 11:50:41 - ERROR - stderr - +2025-02-05 11:50:41 - ERROR - stderr - +2025-02-05 11:50:41 - INFO - stdout - {'loss': 1.0396, 'grad_norm': 1.1151500940322876, 'learning_rate': 1.967270195145578e-05, 'epoch': 0.33} +2025-02-05 11:50:41 - ERROR - stderr - 11%|█ | 2451/22434 [1:43:01<13:44:00, 2.47s/it] +2025-02-05 11:50:44 - ERROR - stderr - 11%|█ | 2452/22434 [1:43:03<13:40:59, 2.47s/it] +2025-02-05 11:50:44 - ERROR - stderr - +2025-02-05 11:50:44 - ERROR - stderr - +2025-02-05 11:50:44 - INFO - stdout - {'loss': 1.0332, 'grad_norm': 1.2713627815246582, 'learning_rate': 1.9672335501913365e-05, 'epoch': 0.33} +2025-02-05 11:50:44 - ERROR - stderr - 11%|█ | 2452/22434 [1:43:03<13:40:59, 2.47s/it] +2025-02-05 11:50:46 - ERROR - stderr - 11%|█ | 2453/22434 [1:43:06<13:49:52, 2.49s/it] +2025-02-05 11:50:46 - ERROR - stderr - +2025-02-05 11:50:46 - ERROR - stderr - +2025-02-05 11:50:46 - INFO - stdout - {'loss': 1.1004, 'grad_norm': 1.1099375486373901, 'learning_rate': 1.9671968850760366e-05, 'epoch': 0.33} +2025-02-05 11:50:46 - ERROR - stderr - 11%|█ | 2453/22434 [1:43:06<13:49:52, 2.49s/it] +2025-02-05 11:50:49 - ERROR - stderr - 11%|█ | 2454/22434 [1:43:09<13:56:41, 2.51s/it] +2025-02-05 11:50:49 - ERROR - stderr - +2025-02-05 11:50:49 - ERROR - 
stderr - +2025-02-05 11:50:49 - INFO - stdout - {'loss': 1.1221, 'grad_norm': 1.2335171699523926, 'learning_rate': 1.9671601998004436e-05, 'epoch': 0.33} +2025-02-05 11:50:49 - ERROR - stderr - 11%|█ | 2454/22434 [1:43:09<13:56:41, 2.51s/it] +2025-02-05 11:50:51 - ERROR - stderr - 11%|█ | 2455/22434 [1:43:11<14:04:48, 2.54s/it] +2025-02-05 11:50:51 - ERROR - stderr - +2025-02-05 11:50:51 - ERROR - stderr - +2025-02-05 11:50:51 - INFO - stdout - {'loss': 1.1262, 'grad_norm': 1.2816839218139648, 'learning_rate': 1.9671234943653215e-05, 'epoch': 0.33} +2025-02-05 11:50:51 - ERROR - stderr - 11%|█ | 2455/22434 [1:43:11<14:04:48, 2.54s/it] +2025-02-05 11:50:54 - ERROR - stderr - 11%|█ | 2456/22434 [1:43:14<13:59:19, 2.52s/it] +2025-02-05 11:50:54 - ERROR - stderr - +2025-02-05 11:50:54 - ERROR - stderr - +2025-02-05 11:50:54 - INFO - stdout - {'loss': 0.9708, 'grad_norm': 1.1292667388916016, 'learning_rate': 1.9670867687714356e-05, 'epoch': 0.33} +2025-02-05 11:50:54 - ERROR - stderr - 11%|█ | 2456/22434 [1:43:14<13:59:19, 2.52s/it] +2025-02-05 11:50:56 - ERROR - stderr - 11%|█ | 2457/22434 [1:43:16<13:54:38, 2.51s/it] +2025-02-05 11:50:56 - ERROR - stderr - +2025-02-05 11:50:56 - ERROR - stderr - +2025-02-05 11:50:56 - INFO - stdout - {'loss': 0.8945, 'grad_norm': 1.2714191675186157, 'learning_rate': 1.9670500230195512e-05, 'epoch': 0.33} +2025-02-05 11:50:56 - ERROR - stderr - 11%|█ | 2457/22434 [1:43:16<13:54:38, 2.51s/it] +2025-02-05 11:50:59 - ERROR - stderr - 11%|█ | 2458/22434 [1:43:19<13:50:10, 2.49s/it] +2025-02-05 11:50:59 - ERROR - stderr - +2025-02-05 11:50:59 - ERROR - stderr - +2025-02-05 11:50:59 - INFO - stdout - {'loss': 0.9633, 'grad_norm': 1.2258857488632202, 'learning_rate': 1.967013257110435e-05, 'epoch': 0.33} +2025-02-05 11:50:59 - ERROR - stderr - 11%|█ | 2458/22434 [1:43:19<13:50:10, 2.49s/it] +2025-02-05 11:51:01 - ERROR - stderr - 11%|█ | 2459/22434 [1:43:21<13:48:39, 2.49s/it] +2025-02-05 11:51:01 - ERROR - stderr - +2025-02-05 11:51:01 - 
ERROR - stderr - +2025-02-05 11:51:01 - INFO - stdout - {'loss': 0.9807, 'grad_norm': 1.1638267040252686, 'learning_rate': 1.9669764710448523e-05, 'epoch': 0.33} +2025-02-05 11:51:01 - ERROR - stderr - 11%|█ | 2459/22434 [1:43:21<13:48:39, 2.49s/it] +2025-02-05 11:51:04 - ERROR - stderr - 11%|█ | 2460/22434 [1:43:24<14:07:37, 2.55s/it] +2025-02-05 11:51:04 - ERROR - stderr - +2025-02-05 11:51:04 - ERROR - stderr - +2025-02-05 11:51:04 - INFO - stdout - {'loss': 1.1655, 'grad_norm': 1.1591609716415405, 'learning_rate': 1.9669396648235704e-05, 'epoch': 0.33} +2025-02-05 11:51:04 - ERROR - stderr - 11%|█ | 2460/22434 [1:43:24<14:07:37, 2.55s/it] +2025-02-05 11:51:06 - ERROR - stderr - 11%|█ | 2461/22434 [1:43:26<14:04:00, 2.54s/it] +2025-02-05 11:51:07 - ERROR - stderr - +2025-02-05 11:51:07 - ERROR - stderr - +2025-02-05 11:51:07 - INFO - stdout - {'loss': 0.9203, 'grad_norm': 1.1324043273925781, 'learning_rate': 1.9669028384473568e-05, 'epoch': 0.33} +2025-02-05 11:51:07 - ERROR - stderr - 11%|█ | 2461/22434 [1:43:26<14:04:00, 2.54s/it] +2025-02-05 11:51:07 - WARNING - transformers.tokenization_utils_base - Token indices sequence length is longer than the specified maximum sequence length for this model (2878 > 2048). Running this sequence through the model will result in indexing errors +2025-02-05 11:51:07 - WARNING - transformers.tokenization_utils_base - Token indices sequence length is longer than the specified maximum sequence length for this model (2878 > 2048). 
Running this sequence through the model will result in indexing errors +2025-02-05 11:51:09 - ERROR - stderr - 11%|█ | 2462/22434 [1:43:29<14:01:59, 2.53s/it] +2025-02-05 11:51:09 - ERROR - stderr - +2025-02-05 11:51:09 - ERROR - stderr - +2025-02-05 11:51:09 - INFO - stdout - {'loss': 1.0153, 'grad_norm': 1.1558243036270142, 'learning_rate': 1.9668659919169785e-05, 'epoch': 0.33} +2025-02-05 11:51:09 - ERROR - stderr - 11%|█ | 2462/22434 [1:43:29<14:01:59, 2.53s/it] +2025-02-05 11:51:15 - ERROR - stderr - 11%|█ | 2463/22434 [1:43:35<19:28:38, 3.51s/it] +2025-02-05 11:51:15 - ERROR - stderr - +2025-02-05 11:51:15 - ERROR - stderr - +2025-02-05 11:51:15 - INFO - stdout - {'loss': 0.8862, 'grad_norm': 1.1760532855987549, 'learning_rate': 1.9668291252332038e-05, 'epoch': 0.33} +2025-02-05 11:51:15 - ERROR - stderr - 11%|█ | 2463/22434 [1:43:35<19:28:38, 3.51s/it] +2025-02-05 11:51:18 - ERROR - stderr - 11%|█ | 2464/22434 [1:43:37<18:15:16, 3.29s/it] +2025-02-05 11:51:18 - ERROR - stderr - +2025-02-05 11:51:18 - ERROR - stderr - +2025-02-05 11:51:18 - INFO - stdout - {'loss': 1.0399, 'grad_norm': 1.3157655000686646, 'learning_rate': 1.966792238396801e-05, 'epoch': 0.33} +2025-02-05 11:51:18 - ERROR - stderr - 11%|█ | 2464/22434 [1:43:37<18:15:16, 3.29s/it] +2025-02-05 11:51:20 - ERROR - stderr - 11%|█ | 2465/22434 [1:43:40<16:55:28, 3.05s/it] +2025-02-05 11:51:20 - ERROR - stderr - +2025-02-05 11:51:20 - ERROR - stderr - +2025-02-05 11:51:20 - INFO - stdout - {'loss': 1.06, 'grad_norm': 1.1519900560379028, 'learning_rate': 1.966755331408539e-05, 'epoch': 0.33} +2025-02-05 11:51:20 - ERROR - stderr - 11%|█ | 2465/22434 [1:43:40<16:55:28, 3.05s/it] +2025-02-05 11:51:22 - ERROR - stderr - 11%|█ | 2466/22434 [1:43:42<15:51:11, 2.86s/it] +2025-02-05 11:51:22 - ERROR - stderr - +2025-02-05 11:51:22 - ERROR - stderr - +2025-02-05 11:51:22 - INFO - stdout - {'loss': 0.9835, 'grad_norm': 1.1726974248886108, 'learning_rate': 1.9667184042691877e-05, 'epoch': 0.33} +2025-02-05 
11:51:22 - ERROR - stderr - 11%|█ | 2466/22434 [1:43:42<15:51:11, 2.86s/it] +2025-02-05 11:51:25 - ERROR - stderr - 11%|█ | 2467/22434 [1:43:45<15:33:39, 2.81s/it] +2025-02-05 11:51:25 - ERROR - stderr - +2025-02-05 11:51:25 - ERROR - stderr - +2025-02-05 11:51:25 - INFO - stdout - {'loss': 0.9068, 'grad_norm': 1.2968918085098267, 'learning_rate': 1.966681456979516e-05, 'epoch': 0.33} +2025-02-05 11:51:25 - ERROR - stderr - 11%|█ | 2467/22434 [1:43:45<15:33:39, 2.81s/it] +2025-02-05 11:51:28 - ERROR - stderr - 11%|█ | 2468/22434 [1:43:47<14:55:51, 2.69s/it] +2025-02-05 11:51:28 - ERROR - stderr - +2025-02-05 11:51:28 - ERROR - stderr - +2025-02-05 11:51:28 - INFO - stdout - {'loss': 0.9437, 'grad_norm': 1.1878401041030884, 'learning_rate': 1.9666444895402942e-05, 'epoch': 0.33} +2025-02-05 11:51:28 - ERROR - stderr - 11%|█ | 2468/22434 [1:43:47<14:55:51, 2.69s/it] +2025-02-05 11:51:30 - ERROR - stderr - 11%|█ | 2469/22434 [1:43:50<14:34:55, 2.63s/it] +2025-02-05 11:51:30 - ERROR - stderr - +2025-02-05 11:51:30 - ERROR - stderr - +2025-02-05 11:51:30 - INFO - stdout - {'loss': 1.0268, 'grad_norm': 1.1700770854949951, 'learning_rate': 1.9666075019522933e-05, 'epoch': 0.33} +2025-02-05 11:51:30 - ERROR - stderr - 11%|█ | 2469/22434 [1:43:50<14:34:55, 2.63s/it] +2025-02-05 11:51:33 - ERROR - stderr - 11%|█ | 2470/22434 [1:43:52<14:35:10, 2.63s/it] +2025-02-05 11:51:33 - ERROR - stderr - +2025-02-05 11:51:33 - ERROR - stderr - +2025-02-05 11:51:33 - INFO - stdout - {'loss': 1.0115, 'grad_norm': 1.2303813695907593, 'learning_rate': 1.966570494216284e-05, 'epoch': 0.33} +2025-02-05 11:51:33 - ERROR - stderr - 11%|█ | 2470/22434 [1:43:52<14:35:10, 2.63s/it] +2025-02-05 11:51:35 - ERROR - stderr - 11%|█ | 2471/22434 [1:43:55<14:24:52, 2.60s/it] +2025-02-05 11:51:35 - ERROR - stderr - +2025-02-05 11:51:35 - ERROR - stderr - +2025-02-05 11:51:35 - INFO - stdout - {'loss': 1.0371, 'grad_norm': 1.2742059230804443, 'learning_rate': 1.9665334663330372e-05, 'epoch': 0.33} 
+2025-02-05 11:51:35 - ERROR - stderr - 11%|█ | 2471/22434 [1:43:55<14:24:52, 2.60s/it] +2025-02-05 11:51:38 - ERROR - stderr - 11%|█ | 2472/22434 [1:43:57<14:17:27, 2.58s/it] +2025-02-05 11:51:38 - ERROR - stderr - +2025-02-05 11:51:38 - ERROR - stderr - +2025-02-05 11:51:38 - INFO - stdout - {'loss': 1.0544, 'grad_norm': 1.163232684135437, 'learning_rate': 1.9664964183033256e-05, 'epoch': 0.33} +2025-02-05 11:51:38 - ERROR - stderr - 11%|█ | 2472/22434 [1:43:58<14:17:27, 2.58s/it] +2025-02-05 11:51:40 - ERROR - stderr - 11%|█ | 2473/22434 [1:44:00<14:12:09, 2.56s/it] +2025-02-05 11:51:40 - ERROR - stderr - +2025-02-05 11:51:40 - ERROR - stderr - +2025-02-05 11:51:40 - INFO - stdout - {'loss': 1.1024, 'grad_norm': 1.1946009397506714, 'learning_rate': 1.966459350127921e-05, 'epoch': 0.33} +2025-02-05 11:51:40 - ERROR - stderr - 11%|█ | 2473/22434 [1:44:00<14:12:09, 2.56s/it] +2025-02-05 11:51:43 - ERROR - stderr - 11%|█ | 2474/22434 [1:44:02<13:59:36, 2.52s/it] +2025-02-05 11:51:43 - ERROR - stderr - +2025-02-05 11:51:43 - ERROR - stderr - +2025-02-05 11:51:43 - INFO - stdout - {'loss': 0.9295, 'grad_norm': 1.2083193063735962, 'learning_rate': 1.9664222618075958e-05, 'epoch': 0.33} +2025-02-05 11:51:43 - ERROR - stderr - 11%|█ | 2474/22434 [1:44:03<13:59:36, 2.52s/it] +2025-02-05 11:51:45 - ERROR - stderr - 11%|█ | 2475/22434 [1:44:05<14:00:22, 2.53s/it] +2025-02-05 11:51:45 - ERROR - stderr - +2025-02-05 11:51:45 - ERROR - stderr - +2025-02-05 11:51:45 - INFO - stdout - {'loss': 1.1697, 'grad_norm': 1.2728837728500366, 'learning_rate': 1.9663851533431236e-05, 'epoch': 0.33} +2025-02-05 11:51:45 - ERROR - stderr - 11%|█ | 2475/22434 [1:44:05<14:00:22, 2.53s/it] +2025-02-05 11:51:48 - ERROR - stderr - 11%|█ | 2476/22434 [1:44:08<14:27:38, 2.61s/it] +2025-02-05 11:51:48 - ERROR - stderr - +2025-02-05 11:51:48 - ERROR - stderr - +2025-02-05 11:51:48 - INFO - stdout - {'loss': 0.9949, 'grad_norm': 1.3240692615509033, 'learning_rate': 1.9663480247352775e-05, 'epoch': 
0.33} +2025-02-05 11:51:48 - ERROR - stderr - 11%|█ | 2476/22434 [1:44:08<14:27:38, 2.61s/it] +2025-02-05 11:51:51 - ERROR - stderr - 11%|█ | 2477/22434 [1:44:10<14:18:04, 2.58s/it] +2025-02-05 11:51:51 - ERROR - stderr - +2025-02-05 11:51:51 - ERROR - stderr - +2025-02-05 11:51:51 - INFO - stdout - {'loss': 0.956, 'grad_norm': 1.1284722089767456, 'learning_rate': 1.9663108759848314e-05, 'epoch': 0.33} +2025-02-05 11:51:51 - ERROR - stderr - 11%|█ | 2477/22434 [1:44:10<14:18:04, 2.58s/it] +2025-02-05 11:51:53 - ERROR - stderr - 11%|█ | 2478/22434 [1:44:13<14:14:16, 2.57s/it] +2025-02-05 11:51:53 - ERROR - stderr - +2025-02-05 11:51:53 - ERROR - stderr - +2025-02-05 11:51:53 - INFO - stdout - {'loss': 0.9896, 'grad_norm': 1.1202340126037598, 'learning_rate': 1.96627370709256e-05, 'epoch': 0.33} +2025-02-05 11:51:53 - ERROR - stderr - 11%|█ | 2478/22434 [1:44:13<14:14:16, 2.57s/it] +2025-02-05 11:51:56 - ERROR - stderr - 11%|█ | 2479/22434 [1:44:15<14:04:01, 2.54s/it] +2025-02-05 11:51:56 - ERROR - stderr - +2025-02-05 11:51:56 - ERROR - stderr - +2025-02-05 11:51:56 - INFO - stdout - {'loss': 1.0591, 'grad_norm': 1.4902220964431763, 'learning_rate': 1.9662365180592372e-05, 'epoch': 0.33} +2025-02-05 11:51:56 - ERROR - stderr - 11%|█ | 2479/22434 [1:44:15<14:04:01, 2.54s/it] +2025-02-05 11:51:58 - ERROR - stderr - 11%|█ | 2480/22434 [1:44:18<14:01:17, 2.53s/it] +2025-02-05 11:51:58 - ERROR - stderr - +2025-02-05 11:51:58 - ERROR - stderr - +2025-02-05 11:51:58 - INFO - stdout - {'loss': 0.9949, 'grad_norm': 1.1558111906051636, 'learning_rate': 1.9661993088856395e-05, 'epoch': 0.33} +2025-02-05 11:51:58 - ERROR - stderr - 11%|█ | 2480/22434 [1:44:18<14:01:17, 2.53s/it] +2025-02-05 11:52:01 - ERROR - stderr - 11%|█ | 2481/22434 [1:44:20<13:59:02, 2.52s/it] +2025-02-05 11:52:01 - ERROR - stderr - +2025-02-05 11:52:01 - ERROR - stderr - +2025-02-05 11:52:01 - INFO - stdout - {'loss': 1.0558, 'grad_norm': 1.227022647857666, 'learning_rate': 1.9661620795725413e-05, 
'epoch': 0.33} +2025-02-05 11:52:01 - ERROR - stderr - 11%|█ | 2481/22434 [1:44:20<13:59:02, 2.52s/it] +2025-02-05 11:52:03 - ERROR - stderr - 11%|█ | 2482/22434 [1:44:23<13:47:29, 2.49s/it] +2025-02-05 11:52:03 - ERROR - stderr - +2025-02-05 11:52:03 - ERROR - stderr - +2025-02-05 11:52:03 - INFO - stdout - {'loss': 1.125, 'grad_norm': 1.1978788375854492, 'learning_rate': 1.966124830120719e-05, 'epoch': 0.33} +2025-02-05 11:52:03 - ERROR - stderr - 11%|█ | 2482/22434 [1:44:23<13:47:29, 2.49s/it] +2025-02-05 11:52:05 - ERROR - stderr - 11%|█ | 2483/22434 [1:44:25<13:44:19, 2.48s/it] +2025-02-05 11:52:05 - ERROR - stderr - +2025-02-05 11:52:05 - ERROR - stderr - +2025-02-05 11:52:05 - INFO - stdout - {'loss': 1.1443, 'grad_norm': 1.2000869512557983, 'learning_rate': 1.96608756053095e-05, 'epoch': 0.33} +2025-02-05 11:52:05 - ERROR - stderr - 11%|█ | 2483/22434 [1:44:25<13:44:19, 2.48s/it] +2025-02-05 11:52:08 - ERROR - stderr - 11%|█ | 2484/22434 [1:44:28<13:41:42, 2.47s/it] +2025-02-05 11:52:08 - ERROR - stderr - +2025-02-05 11:52:08 - ERROR - stderr - +2025-02-05 11:52:08 - INFO - stdout - {'loss': 1.0164, 'grad_norm': 1.0709697008132935, 'learning_rate': 1.9660502708040094e-05, 'epoch': 0.33} +2025-02-05 11:52:08 - ERROR - stderr - 11%|█ | 2484/22434 [1:44:28<13:41:42, 2.47s/it] +2025-02-05 11:52:10 - ERROR - stderr - 11%|█ | 2485/22434 [1:44:30<13:43:26, 2.48s/it] +2025-02-05 11:52:10 - ERROR - stderr - +2025-02-05 11:52:10 - ERROR - stderr - +2025-02-05 11:52:10 - INFO - stdout - {'loss': 1.0067, 'grad_norm': 1.1124541759490967, 'learning_rate': 1.9660129609406752e-05, 'epoch': 0.33} +2025-02-05 11:52:10 - ERROR - stderr - 11%|█ | 2485/22434 [1:44:30<13:43:26, 2.48s/it] +2025-02-05 11:52:13 - ERROR - stderr - 11%|█ | 2486/22434 [1:44:33<13:41:47, 2.47s/it] +2025-02-05 11:52:13 - ERROR - stderr - +2025-02-05 11:52:13 - ERROR - stderr - +2025-02-05 11:52:13 - INFO - stdout - {'loss': 0.9994, 'grad_norm': 1.237353801727295, 'learning_rate': 1.9659756309417254e-05, 
'epoch': 0.33} +2025-02-05 11:52:13 - ERROR - stderr - 11%|█ | 2486/22434 [1:44:33<13:41:47, 2.47s/it] +2025-02-05 11:52:15 - ERROR - stderr - 11%|█ | 2487/22434 [1:44:35<13:45:01, 2.48s/it] +2025-02-05 11:52:15 - ERROR - stderr - +2025-02-05 11:52:15 - ERROR - stderr - +2025-02-05 11:52:15 - INFO - stdout - {'loss': 0.8429, 'grad_norm': 1.1384249925613403, 'learning_rate': 1.965938280807938e-05, 'epoch': 0.33} +2025-02-05 11:52:15 - ERROR - stderr - 11%|█ | 2487/22434 [1:44:35<13:45:01, 2.48s/it] +2025-02-05 11:52:18 - ERROR - stderr - 11%|█ | 2488/22434 [1:44:38<14:04:40, 2.54s/it] +2025-02-05 11:52:18 - ERROR - stderr - +2025-02-05 11:52:18 - ERROR - stderr - +2025-02-05 11:52:18 - INFO - stdout - {'loss': 0.9322, 'grad_norm': 1.0440430641174316, 'learning_rate': 1.9659009105400915e-05, 'epoch': 0.33} +2025-02-05 11:52:18 - ERROR - stderr - 11%|█ | 2488/22434 [1:44:38<14:04:40, 2.54s/it] +2025-02-05 11:52:21 - ERROR - stderr - 11%|█ | 2489/22434 [1:44:40<13:59:48, 2.53s/it] +2025-02-05 11:52:21 - ERROR - stderr - +2025-02-05 11:52:21 - ERROR - stderr - +2025-02-05 11:52:21 - INFO - stdout - {'loss': 0.9499, 'grad_norm': 1.0262411832809448, 'learning_rate': 1.9658635201389646e-05, 'epoch': 0.33} +2025-02-05 11:52:21 - ERROR - stderr - 11%|█ | 2489/22434 [1:44:40<13:59:48, 2.53s/it] +2025-02-05 11:52:23 - ERROR - stderr - 11%|█ | 2490/22434 [1:44:43<13:51:02, 2.50s/it] +2025-02-05 11:52:23 - ERROR - stderr - +2025-02-05 11:52:23 - ERROR - stderr - +2025-02-05 11:52:23 - INFO - stdout - {'loss': 1.1036, 'grad_norm': 1.113940954208374, 'learning_rate': 1.965826109605337e-05, 'epoch': 0.33} +2025-02-05 11:52:23 - ERROR - stderr - 11%|█ | 2490/22434 [1:44:43<13:51:02, 2.50s/it] +2025-02-05 11:52:25 - ERROR - stderr - 11%|█ | 2491/22434 [1:44:45<13:51:50, 2.50s/it] +2025-02-05 11:52:25 - ERROR - stderr - +2025-02-05 11:52:25 - ERROR - stderr - +2025-02-05 11:52:25 - INFO - stdout - {'loss': 1.0036, 'grad_norm': 1.0630565881729126, 'learning_rate': 
1.9657886789399882e-05, 'epoch': 0.33} +2025-02-05 11:52:25 - ERROR - stderr - 11%|█ | 2491/22434 [1:44:45<13:51:50, 2.50s/it] +2025-02-05 11:52:28 - ERROR - stderr - 11%|█ | 2492/22434 [1:44:48<13:47:00, 2.49s/it] +2025-02-05 11:52:28 - ERROR - stderr - +2025-02-05 11:52:28 - ERROR - stderr - +2025-02-05 11:52:28 - INFO - stdout - {'loss': 0.979, 'grad_norm': 1.3706883192062378, 'learning_rate': 1.965751228143699e-05, 'epoch': 0.33} +2025-02-05 11:52:28 - ERROR - stderr - 11%|█ | 2492/22434 [1:44:48<13:47:00, 2.49s/it] +2025-02-05 11:52:30 - ERROR - stderr - 11%|█ | 2493/22434 [1:44:50<13:46:30, 2.49s/it] +2025-02-05 11:52:30 - ERROR - stderr - +2025-02-05 11:52:30 - ERROR - stderr - +2025-02-05 11:52:30 - INFO - stdout - {'loss': 0.9954, 'grad_norm': 1.0768769979476929, 'learning_rate': 1.965713757217249e-05, 'epoch': 0.33} +2025-02-05 11:52:30 - ERROR - stderr - 11%|█ | 2493/22434 [1:44:50<13:46:30, 2.49s/it] +2025-02-05 11:52:33 - ERROR - stderr - 11%|█ | 2494/22434 [1:44:53<13:52:36, 2.51s/it] +2025-02-05 11:52:33 - ERROR - stderr - +2025-02-05 11:52:33 - ERROR - stderr - +2025-02-05 11:52:33 - INFO - stdout - {'loss': 0.9375, 'grad_norm': 1.0911844968795776, 'learning_rate': 1.96567626616142e-05, 'epoch': 0.33} +2025-02-05 11:52:33 - ERROR - stderr - 11%|█ | 2494/22434 [1:44:53<13:52:36, 2.51s/it] +2025-02-05 11:52:35 - ERROR - stderr - 11%|█ | 2495/22434 [1:44:55<13:44:42, 2.48s/it] +2025-02-05 11:52:35 - ERROR - stderr - +2025-02-05 11:52:35 - ERROR - stderr - +2025-02-05 11:52:35 - INFO - stdout - {'loss': 0.9625, 'grad_norm': 1.1118284463882446, 'learning_rate': 1.9656387549769934e-05, 'epoch': 0.33} +2025-02-05 11:52:35 - ERROR - stderr - 11%|█ | 2495/22434 [1:44:55<13:44:42, 2.48s/it] +2025-02-05 11:52:38 - ERROR - stderr - 11%|█ | 2496/22434 [1:44:58<13:37:45, 2.46s/it] +2025-02-05 11:52:38 - ERROR - stderr - +2025-02-05 11:52:38 - ERROR - stderr - +2025-02-05 11:52:38 - INFO - stdout - {'loss': 0.9775, 'grad_norm': 1.3816057443618774, 'learning_rate': 
1.965601223664751e-05, 'epoch': 0.33} +2025-02-05 11:52:38 - ERROR - stderr - 11%|█ | 2496/22434 [1:44:58<13:37:45, 2.46s/it] +2025-02-05 11:52:40 - ERROR - stderr - 11%|█ | 2497/22434 [1:45:00<13:54:38, 2.51s/it] +2025-02-05 11:52:40 - ERROR - stderr - +2025-02-05 11:52:40 - ERROR - stderr - +2025-02-05 11:52:40 - INFO - stdout - {'loss': 1.1202, 'grad_norm': 1.3033983707427979, 'learning_rate': 1.965563672225475e-05, 'epoch': 0.33} +2025-02-05 11:52:40 - ERROR - stderr - 11%|█ | 2497/22434 [1:45:00<13:54:38, 2.51s/it] +2025-02-05 11:52:43 - ERROR - stderr - 11%|█ | 2498/22434 [1:45:03<13:50:32, 2.50s/it] +2025-02-05 11:52:43 - ERROR - stderr - +2025-02-05 11:52:43 - ERROR - stderr - +2025-02-05 11:52:43 - INFO - stdout - {'loss': 0.9068, 'grad_norm': 1.0809283256530762, 'learning_rate': 1.9655261006599482e-05, 'epoch': 0.33} +2025-02-05 11:52:43 - ERROR - stderr - 11%|█ | 2498/22434 [1:45:03<13:50:32, 2.50s/it] +2025-02-05 11:52:45 - ERROR - stderr - 11%|█ | 2499/22434 [1:45:05<13:43:10, 2.48s/it] +2025-02-05 11:52:45 - ERROR - stderr - +2025-02-05 11:52:45 - ERROR - stderr - +2025-02-05 11:52:45 - INFO - stdout - {'loss': 0.9733, 'grad_norm': 1.182268500328064, 'learning_rate': 1.9654885089689537e-05, 'epoch': 0.33} +2025-02-05 11:52:45 - ERROR - stderr - 11%|█ | 2499/22434 [1:45:05<13:43:10, 2.48s/it] +2025-02-05 11:52:48 - ERROR - stderr - 11%|█ | 2500/22434 [1:45:07<13:37:18, 2.46s/it] +2025-02-05 11:52:48 - ERROR - stderr - +2025-02-05 11:52:48 - ERROR - stderr - +2025-02-05 11:52:48 - INFO - stdout - {'loss': 1.0003, 'grad_norm': 1.1254799365997314, 'learning_rate': 1.965450897153275e-05, 'epoch': 0.33} +2025-02-05 11:52:48 - ERROR - stderr - 11%|█ | 2500/22434 [1:45:08<13:37:18, 2.46s/it] +2025-02-05 11:52:50 - ERROR - stderr - 11%|█ | 2501/22434 [1:45:10<13:34:43, 2.45s/it] +2025-02-05 11:52:50 - ERROR - stderr - +2025-02-05 11:52:50 - ERROR - stderr - +2025-02-05 11:52:50 - INFO - stdout - {'loss': 1.1529, 'grad_norm': 1.1354291439056396, 
'learning_rate': 1.9654132652136964e-05, 'epoch': 0.33} +2025-02-05 11:52:50 - ERROR - stderr - 11%|█ | 2501/22434 [1:45:10<13:34:43, 2.45s/it] +2025-02-05 11:52:53 - ERROR - stderr - 11%|█ | 2502/22434 [1:45:12<13:37:27, 2.46s/it] +2025-02-05 11:52:53 - ERROR - stderr - +2025-02-05 11:52:53 - ERROR - stderr - +2025-02-05 11:52:53 - INFO - stdout - {'loss': 0.9225, 'grad_norm': 1.1071571111679077, 'learning_rate': 1.965375613151002e-05, 'epoch': 0.33} +2025-02-05 11:52:53 - ERROR - stderr - 11%|█ | 2502/22434 [1:45:12<13:37:27, 2.46s/it] +2025-02-05 11:52:55 - ERROR - stderr - 11%|█ | 2503/22434 [1:45:15<13:48:23, 2.49s/it] +2025-02-05 11:52:55 - ERROR - stderr - +2025-02-05 11:52:55 - ERROR - stderr - +2025-02-05 11:52:55 - INFO - stdout - {'loss': 1.06, 'grad_norm': 1.3543483018875122, 'learning_rate': 1.9653379409659767e-05, 'epoch': 0.33} +2025-02-05 11:52:55 - ERROR - stderr - 11%|█ | 2503/22434 [1:45:15<13:48:23, 2.49s/it] +2025-02-05 11:52:58 - ERROR - stderr - 11%|█ | 2504/22434 [1:45:17<13:42:00, 2.47s/it] +2025-02-05 11:52:58 - ERROR - stderr - +2025-02-05 11:52:58 - ERROR - stderr - +2025-02-05 11:52:58 - INFO - stdout - {'loss': 0.8874, 'grad_norm': 1.1036163568496704, 'learning_rate': 1.9653002486594057e-05, 'epoch': 0.33} +2025-02-05 11:52:58 - ERROR - stderr - 11%|█ | 2504/22434 [1:45:17<13:42:00, 2.47s/it] +2025-02-05 11:53:00 - ERROR - stderr - 11%|█ | 2505/22434 [1:45:20<13:47:03, 2.49s/it] +2025-02-05 11:53:00 - ERROR - stderr - +2025-02-05 11:53:00 - ERROR - stderr - +2025-02-05 11:53:00 - INFO - stdout - {'loss': 0.8567, 'grad_norm': 1.0290050506591797, 'learning_rate': 1.9652625362320746e-05, 'epoch': 0.33} +2025-02-05 11:53:00 - ERROR - stderr - 11%|█ | 2505/22434 [1:45:20<13:47:03, 2.49s/it] +2025-02-05 11:53:03 - ERROR - stderr - 11%|█ | 2506/22434 [1:45:22<13:51:38, 2.50s/it] +2025-02-05 11:53:03 - ERROR - stderr - +2025-02-05 11:53:03 - ERROR - stderr - +2025-02-05 11:53:03 - INFO - stdout - {'loss': 0.922, 'grad_norm': 
1.1527010202407837, 'learning_rate': 1.9652248036847698e-05, 'epoch': 0.34} +2025-02-05 11:53:03 - ERROR - stderr - 11%|█ | 2506/22434 [1:45:23<13:51:38, 2.50s/it] +2025-02-05 11:53:05 - ERROR - stderr - 11%|█ | 2507/22434 [1:45:25<13:49:14, 2.50s/it] +2025-02-05 11:53:05 - ERROR - stderr - +2025-02-05 11:53:05 - ERROR - stderr - +2025-02-05 11:53:05 - INFO - stdout - {'loss': 1.1178, 'grad_norm': 1.2125111818313599, 'learning_rate': 1.9651870510182776e-05, 'epoch': 0.34} +2025-02-05 11:53:05 - ERROR - stderr - 11%|█ | 2507/22434 [1:45:25<13:49:14, 2.50s/it] +2025-02-05 11:53:08 - ERROR - stderr - 11%|█ | 2508/22434 [1:45:27<13:43:19, 2.48s/it] +2025-02-05 11:53:08 - ERROR - stderr - +2025-02-05 11:53:08 - ERROR - stderr - +2025-02-05 11:53:08 - INFO - stdout - {'loss': 1.0948, 'grad_norm': 1.2517215013504028, 'learning_rate': 1.9651492782333848e-05, 'epoch': 0.34} +2025-02-05 11:53:08 - ERROR - stderr - 11%|█ | 2508/22434 [1:45:27<13:43:19, 2.48s/it] +2025-02-05 11:53:10 - ERROR - stderr - 11%|█ | 2509/22434 [1:45:30<13:54:05, 2.51s/it] +2025-02-05 11:53:10 - ERROR - stderr - +2025-02-05 11:53:10 - ERROR - stderr - +2025-02-05 11:53:10 - INFO - stdout - {'loss': 0.9732, 'grad_norm': 1.2690868377685547, 'learning_rate': 1.9651114853308788e-05, 'epoch': 0.34} +2025-02-05 11:53:10 - ERROR - stderr - 11%|█ | 2509/22434 [1:45:30<13:54:05, 2.51s/it] +2025-02-05 11:53:13 - ERROR - stderr - 11%|█ | 2510/22434 [1:45:32<13:47:10, 2.49s/it] +2025-02-05 11:53:13 - ERROR - stderr - +2025-02-05 11:53:13 - ERROR - stderr - +2025-02-05 11:53:13 - INFO - stdout - {'loss': 1.0289, 'grad_norm': 1.1586898565292358, 'learning_rate': 1.9650736723115476e-05, 'epoch': 0.34} +2025-02-05 11:53:13 - ERROR - stderr - 11%|█ | 2510/22434 [1:45:32<13:47:10, 2.49s/it] +2025-02-05 11:53:15 - ERROR - stderr - 11%|█ | 2511/22434 [1:45:35<13:44:45, 2.48s/it] +2025-02-05 11:53:15 - ERROR - stderr - +2025-02-05 11:53:15 - ERROR - stderr - +2025-02-05 11:53:15 - INFO - stdout - {'loss': 0.9628, 
'grad_norm': 1.2338892221450806, 'learning_rate': 1.965035839176179e-05, 'epoch': 0.34} +2025-02-05 11:53:15 - ERROR - stderr - 11%|█ | 2511/22434 [1:45:35<13:44:45, 2.48s/it] +2025-02-05 11:53:18 - ERROR - stderr - 11%|█ | 2512/22434 [1:45:37<13:38:48, 2.47s/it] +2025-02-05 11:53:18 - ERROR - stderr - +2025-02-05 11:53:18 - ERROR - stderr - +2025-02-05 11:53:18 - INFO - stdout - {'loss': 0.9847, 'grad_norm': 1.228184700012207, 'learning_rate': 1.9649979859255618e-05, 'epoch': 0.34} +2025-02-05 11:53:18 - ERROR - stderr - 11%|█ | 2512/22434 [1:45:37<13:38:48, 2.47s/it] +2025-02-05 11:53:20 - ERROR - stderr - 11%|█ | 2513/22434 [1:45:40<13:49:17, 2.50s/it] +2025-02-05 11:53:20 - ERROR - stderr - +2025-02-05 11:53:20 - ERROR - stderr - +2025-02-05 11:53:20 - INFO - stdout - {'loss': 1.0722, 'grad_norm': 1.3086342811584473, 'learning_rate': 1.964960112560485e-05, 'epoch': 0.34} +2025-02-05 11:53:20 - ERROR - stderr - 11%|█ | 2513/22434 [1:45:40<13:49:17, 2.50s/it] +2025-02-05 11:53:23 - ERROR - stderr - 11%|█ | 2514/22434 [1:45:43<14:19:49, 2.59s/it] +2025-02-05 11:53:23 - ERROR - stderr - +2025-02-05 11:53:23 - ERROR - stderr - +2025-02-05 11:53:23 - INFO - stdout - {'loss': 1.0829, 'grad_norm': 1.1865824460983276, 'learning_rate': 1.9649222190817382e-05, 'epoch': 0.34} +2025-02-05 11:53:23 - ERROR - stderr - 11%|█ | 2514/22434 [1:45:43<14:19:49, 2.59s/it] +2025-02-05 11:53:25 - ERROR - stderr - 11%|█ | 2515/22434 [1:45:45<14:03:29, 2.54s/it] +2025-02-05 11:53:25 - ERROR - stderr - +2025-02-05 11:53:25 - ERROR - stderr - +2025-02-05 11:53:25 - INFO - stdout - {'loss': 0.9169, 'grad_norm': 1.2394098043441772, 'learning_rate': 1.9648843054901106e-05, 'epoch': 0.34} +2025-02-05 11:53:25 - ERROR - stderr - 11%|█ | 2515/22434 [1:45:45<14:03:29, 2.54s/it] +2025-02-05 11:53:28 - ERROR - stderr - 11%|█ | 2516/22434 [1:45:48<14:07:01, 2.55s/it] +2025-02-05 11:53:28 - ERROR - stderr - +2025-02-05 11:53:28 - ERROR - stderr - +2025-02-05 11:53:28 - INFO - stdout - {'loss': 
0.9327, 'grad_norm': 1.1646184921264648, 'learning_rate': 1.9648463717863935e-05, 'epoch': 0.34} +2025-02-05 11:53:28 - ERROR - stderr - 11%|█ | 2516/22434 [1:45:48<14:07:01, 2.55s/it] +2025-02-05 11:53:31 - ERROR - stderr - 11%|█ | 2517/22434 [1:45:50<14:09:53, 2.56s/it] +2025-02-05 11:53:31 - ERROR - stderr - +2025-02-05 11:53:31 - ERROR - stderr - +2025-02-05 11:53:31 - INFO - stdout - {'loss': 1.012, 'grad_norm': 1.1969743967056274, 'learning_rate': 1.9648084179713766e-05, 'epoch': 0.34} +2025-02-05 11:53:31 - ERROR - stderr - 11%|█ | 2517/22434 [1:45:50<14:09:53, 2.56s/it] +2025-02-05 11:53:33 - ERROR - stderr - 11%|█ | 2518/22434 [1:45:53<14:01:16, 2.53s/it] +2025-02-05 11:53:33 - ERROR - stderr - +2025-02-05 11:53:33 - ERROR - stderr - +2025-02-05 11:53:33 - INFO - stdout - {'loss': 0.995, 'grad_norm': 1.1722489595413208, 'learning_rate': 1.9647704440458518e-05, 'epoch': 0.34} +2025-02-05 11:53:33 - ERROR - stderr - 11%|█ | 2518/22434 [1:45:53<14:01:16, 2.53s/it] +2025-02-05 11:53:35 - ERROR - stderr - 11%|█ | 2519/22434 [1:45:55<13:55:18, 2.52s/it] +2025-02-05 11:53:35 - ERROR - stderr - +2025-02-05 11:53:35 - ERROR - stderr - +2025-02-05 11:53:35 - INFO - stdout - {'loss': 1.0475, 'grad_norm': 1.1746480464935303, 'learning_rate': 1.96473245001061e-05, 'epoch': 0.34} +2025-02-05 11:53:35 - ERROR - stderr - 11%|█ | 2519/22434 [1:45:55<13:55:18, 2.52s/it] +2025-02-05 11:53:38 - ERROR - stderr - 11%|█ | 2520/22434 [1:45:58<13:53:45, 2.51s/it] +2025-02-05 11:53:38 - ERROR - stderr - +2025-02-05 11:53:38 - ERROR - stderr - +2025-02-05 11:53:38 - INFO - stdout - {'loss': 1.099, 'grad_norm': 1.1708028316497803, 'learning_rate': 1.9646944358664436e-05, 'epoch': 0.34} +2025-02-05 11:53:38 - ERROR - stderr - 11%|█ | 2520/22434 [1:45:58<13:53:45, 2.51s/it] +2025-02-05 11:53:40 - ERROR - stderr - 11%|█ | 2521/22434 [1:46:00<13:55:47, 2.52s/it] +2025-02-05 11:53:41 - ERROR - stderr - +2025-02-05 11:53:41 - ERROR - stderr - +2025-02-05 11:53:41 - INFO - stdout - {'loss': 
0.9723, 'grad_norm': 1.0921833515167236, 'learning_rate': 1.9646564016141447e-05, 'epoch': 0.34} +2025-02-05 11:53:41 - ERROR - stderr - 11%|█ | 2521/22434 [1:46:00<13:55:47, 2.52s/it] +2025-02-05 11:53:43 - ERROR - stderr - 11%|█ | 2522/22434 [1:46:03<13:51:25, 2.51s/it] +2025-02-05 11:53:43 - ERROR - stderr - +2025-02-05 11:53:43 - ERROR - stderr - +2025-02-05 11:53:43 - INFO - stdout - {'loss': 1.0105, 'grad_norm': 1.1508148908615112, 'learning_rate': 1.9646183472545063e-05, 'epoch': 0.34} +2025-02-05 11:53:43 - ERROR - stderr - 11%|█ | 2522/22434 [1:46:03<13:51:25, 2.51s/it] +2025-02-05 11:53:45 - ERROR - stderr - 11%|█ | 2523/22434 [1:46:05<13:52:47, 2.51s/it] +2025-02-05 11:53:46 - ERROR - stderr - +2025-02-05 11:53:46 - ERROR - stderr - +2025-02-05 11:53:46 - INFO - stdout - {'loss': 0.9449, 'grad_norm': 1.2986013889312744, 'learning_rate': 1.964580272788321e-05, 'epoch': 0.34} +2025-02-05 11:53:46 - ERROR - stderr - 11%|█ | 2523/22434 [1:46:05<13:52:47, 2.51s/it] +2025-02-05 11:53:48 - ERROR - stderr - 11%|█▏ | 2524/22434 [1:46:08<14:07:23, 2.55s/it] +2025-02-05 11:53:48 - ERROR - stderr - +2025-02-05 11:53:48 - ERROR - stderr - +2025-02-05 11:53:48 - INFO - stdout - {'loss': 1.005, 'grad_norm': 1.2493939399719238, 'learning_rate': 1.9645421782163838e-05, 'epoch': 0.34} +2025-02-05 11:53:48 - ERROR - stderr - 11%|█▏ | 2524/22434 [1:46:08<14:07:23, 2.55s/it] +2025-02-05 11:53:51 - ERROR - stderr - 11%|█▏ | 2525/22434 [1:46:11<14:21:13, 2.60s/it] +2025-02-05 11:53:51 - ERROR - stderr - +2025-02-05 11:53:51 - ERROR - stderr - +2025-02-05 11:53:51 - INFO - stdout - {'loss': 0.8448, 'grad_norm': 1.093065857887268, 'learning_rate': 1.9645040635394876e-05, 'epoch': 0.34} +2025-02-05 11:53:51 - ERROR - stderr - 11%|█▏ | 2525/22434 [1:46:11<14:21:13, 2.60s/it] +2025-02-05 11:53:53 - ERROR - stderr - 11%|█▏ | 2526/22434 [1:46:13<14:23:23, 2.60s/it] +2025-02-05 11:53:53 - ERROR - stderr - +2025-02-05 11:53:53 - ERROR - stderr - +2025-02-05 11:53:53 - INFO - stdout - 
{'loss': 1.1083, 'grad_norm': 1.2449997663497925, 'learning_rate': 1.9644659287584263e-05, 'epoch': 0.34} +2025-02-05 11:53:53 - ERROR - stderr - 11%|█▏ | 2526/22434 [1:46:13<14:23:23, 2.60s/it] +2025-02-05 11:53:56 - ERROR - stderr - 11%|█▏ | 2527/22434 [1:46:16<14:13:07, 2.57s/it] +2025-02-05 11:53:56 - ERROR - stderr - +2025-02-05 11:53:56 - ERROR - stderr - +2025-02-05 11:53:56 - INFO - stdout - {'loss': 0.977, 'grad_norm': 1.1653188467025757, 'learning_rate': 1.9644277738739966e-05, 'epoch': 0.34} +2025-02-05 11:53:56 - ERROR - stderr - 11%|█▏ | 2527/22434 [1:46:16<14:13:07, 2.57s/it] +2025-02-05 11:53:58 - ERROR - stderr - 11%|█▏ | 2528/22434 [1:46:18<14:04:13, 2.54s/it] +2025-02-05 11:53:58 - ERROR - stderr - +2025-02-05 11:53:58 - ERROR - stderr - +2025-02-05 11:53:58 - INFO - stdout - {'loss': 1.032, 'grad_norm': 1.2044494152069092, 'learning_rate': 1.9643895988869922e-05, 'epoch': 0.34} +2025-02-05 11:53:58 - ERROR - stderr - 11%|█▏ | 2528/22434 [1:46:18<14:04:13, 2.54s/it] +2025-02-05 11:54:01 - ERROR - stderr - 11%|█▏ | 2529/22434 [1:46:21<13:56:24, 2.52s/it] +2025-02-05 11:54:01 - ERROR - stderr - +2025-02-05 11:54:01 - ERROR - stderr - +2025-02-05 11:54:01 - INFO - stdout - {'loss': 0.9607, 'grad_norm': 1.1300307512283325, 'learning_rate': 1.96435140379821e-05, 'epoch': 0.34} +2025-02-05 11:54:01 - ERROR - stderr - 11%|█▏ | 2529/22434 [1:46:21<13:56:24, 2.52s/it] +2025-02-05 11:54:03 - ERROR - stderr - 11%|█▏ | 2530/22434 [1:46:23<13:56:09, 2.52s/it] +2025-02-05 11:54:03 - ERROR - stderr - +2025-02-05 11:54:03 - ERROR - stderr - +2025-02-05 11:54:03 - INFO - stdout - {'loss': 0.9449, 'grad_norm': 1.1526036262512207, 'learning_rate': 1.964313188608445e-05, 'epoch': 0.34} +2025-02-05 11:54:03 - ERROR - stderr - 11%|█▏ | 2530/22434 [1:46:23<13:56:09, 2.52s/it] +2025-02-05 11:54:06 - ERROR - stderr - 11%|█▏ | 2531/22434 [1:46:26<13:48:49, 2.50s/it] +2025-02-05 11:54:06 - ERROR - stderr - +2025-02-05 11:54:06 - ERROR - stderr - +2025-02-05 11:54:06 - INFO 
- stdout - {'loss': 0.9135, 'grad_norm': 1.13448166847229, 'learning_rate': 1.9642749533184945e-05, 'epoch': 0.34} +2025-02-05 11:54:06 - ERROR - stderr - 11%|█▏ | 2531/22434 [1:46:26<13:48:49, 2.50s/it] +2025-02-05 11:54:09 - ERROR - stderr - 11%|█▏ | 2532/22434 [1:46:28<14:03:15, 2.54s/it] +2025-02-05 11:54:09 - ERROR - stderr - +2025-02-05 11:54:09 - ERROR - stderr - +2025-02-05 11:54:09 - INFO - stdout - {'loss': 1.1695, 'grad_norm': 1.1744157075881958, 'learning_rate': 1.9642366979291555e-05, 'epoch': 0.34} +2025-02-05 11:54:09 - ERROR - stderr - 11%|█▏ | 2532/22434 [1:46:28<14:03:15, 2.54s/it] +2025-02-05 11:54:11 - ERROR - stderr - 11%|█▏ | 2533/22434 [1:46:31<14:12:19, 2.57s/it] +2025-02-05 11:54:11 - ERROR - stderr - +2025-02-05 11:54:11 - ERROR - stderr - +2025-02-05 11:54:11 - INFO - stdout - {'loss': 0.9579, 'grad_norm': 1.0801098346710205, 'learning_rate': 1.964198422441225e-05, 'epoch': 0.34} +2025-02-05 11:54:11 - ERROR - stderr - 11%|█▏ | 2533/22434 [1:46:31<14:12:19, 2.57s/it] +2025-02-05 11:54:14 - ERROR - stderr - 11%|█▏ | 2534/22434 [1:46:33<14:07:14, 2.55s/it] +2025-02-05 11:54:14 - ERROR - stderr - +2025-02-05 11:54:14 - ERROR - stderr - +2025-02-05 11:54:14 - INFO - stdout - {'loss': 1.1205, 'grad_norm': 1.310989260673523, 'learning_rate': 1.964160126855501e-05, 'epoch': 0.34} +2025-02-05 11:54:14 - ERROR - stderr - 11%|█▏ | 2534/22434 [1:46:33<14:07:14, 2.55s/it] +2025-02-05 11:54:16 - ERROR - stderr - 11%|█▏ | 2535/22434 [1:46:36<13:58:29, 2.53s/it] +2025-02-05 11:54:16 - ERROR - stderr - +2025-02-05 11:54:16 - ERROR - stderr - +2025-02-05 11:54:16 - INFO - stdout - {'loss': 1.0463, 'grad_norm': 1.3216352462768555, 'learning_rate': 1.964121811172782e-05, 'epoch': 0.34} +2025-02-05 11:54:16 - ERROR - stderr - 11%|█▏ | 2535/22434 [1:46:36<13:58:29, 2.53s/it] +2025-02-05 11:54:19 - ERROR - stderr - 11%|█▏ | 2536/22434 [1:46:38<13:51:27, 2.51s/it] +2025-02-05 11:54:19 - ERROR - stderr - +2025-02-05 11:54:19 - ERROR - stderr - +2025-02-05 
11:54:19 - INFO - stdout - {'loss': 0.9809, 'grad_norm': 1.2654401063919067, 'learning_rate': 1.9640834753938663e-05, 'epoch': 0.34} +2025-02-05 11:54:19 - ERROR - stderr - 11%|█▏ | 2536/22434 [1:46:38<13:51:27, 2.51s/it] +2025-02-05 11:54:21 - ERROR - stderr - 11%|█▏ | 2537/22434 [1:46:41<13:51:30, 2.51s/it] +2025-02-05 11:54:21 - ERROR - stderr - +2025-02-05 11:54:21 - ERROR - stderr - +2025-02-05 11:54:21 - INFO - stdout - {'loss': 0.9372, 'grad_norm': 1.1328372955322266, 'learning_rate': 1.9640451195195533e-05, 'epoch': 0.34} +2025-02-05 11:54:21 - ERROR - stderr - 11%|█▏ | 2537/22434 [1:46:41<13:51:30, 2.51s/it] +2025-02-05 11:54:24 - ERROR - stderr - 11%|█▏ | 2538/22434 [1:46:43<13:44:49, 2.49s/it] +2025-02-05 11:54:24 - ERROR - stderr - +2025-02-05 11:54:24 - ERROR - stderr - +2025-02-05 11:54:24 - INFO - stdout - {'loss': 1.036, 'grad_norm': 1.2147736549377441, 'learning_rate': 1.9640067435506416e-05, 'epoch': 0.34} +2025-02-05 11:54:24 - ERROR - stderr - 11%|█▏ | 2538/22434 [1:46:43<13:44:49, 2.49s/it] +2025-02-05 11:54:26 - ERROR - stderr - 11%|█▏ | 2539/22434 [1:46:46<13:43:59, 2.49s/it] +2025-02-05 11:54:26 - ERROR - stderr - +2025-02-05 11:54:26 - ERROR - stderr - +2025-02-05 11:54:26 - INFO - stdout - {'loss': 1.0111, 'grad_norm': 1.2760734558105469, 'learning_rate': 1.9639683474879326e-05, 'epoch': 0.34} +2025-02-05 11:54:26 - ERROR - stderr - 11%|█▏ | 2539/22434 [1:46:46<13:43:59, 2.49s/it] +2025-02-05 11:54:29 - ERROR - stderr - 11%|█▏ | 2540/22434 [1:46:48<13:46:17, 2.49s/it] +2025-02-05 11:54:29 - ERROR - stderr - +2025-02-05 11:54:29 - ERROR - stderr - +2025-02-05 11:54:29 - INFO - stdout - {'loss': 1.0139, 'grad_norm': 1.22752046585083, 'learning_rate': 1.963929931332225e-05, 'epoch': 0.34} +2025-02-05 11:54:29 - ERROR - stderr - 11%|█▏ | 2540/22434 [1:46:48<13:46:17, 2.49s/it] +2025-02-05 11:54:31 - ERROR - stderr - 11%|█▏ | 2541/22434 [1:46:51<13:48:17, 2.50s/it] +2025-02-05 11:54:31 - ERROR - stderr - +2025-02-05 11:54:31 - ERROR - stderr - 
+2025-02-05 11:54:31 - INFO - stdout - {'loss': 0.956, 'grad_norm': 1.0937491655349731, 'learning_rate': 1.9638914950843212e-05, 'epoch': 0.34} +2025-02-05 11:54:31 - ERROR - stderr - 11%|█▏ | 2541/22434 [1:46:51<13:48:17, 2.50s/it] +2025-02-05 11:54:33 - ERROR - stderr - 11%|█▏ | 2542/22434 [1:46:53<13:40:47, 2.48s/it] +2025-02-05 11:54:34 - ERROR - stderr - +2025-02-05 11:54:34 - ERROR - stderr - +2025-02-05 11:54:34 - INFO - stdout - {'loss': 1.042, 'grad_norm': 1.2286529541015625, 'learning_rate': 1.963853038745021e-05, 'epoch': 0.34} +2025-02-05 11:54:34 - ERROR - stderr - 11%|█▏ | 2542/22434 [1:46:53<13:40:47, 2.48s/it] +2025-02-05 11:54:36 - ERROR - stderr - 11%|█▏ | 2543/22434 [1:46:56<13:36:24, 2.46s/it] +2025-02-05 11:54:36 - ERROR - stderr - +2025-02-05 11:54:36 - ERROR - stderr - +2025-02-05 11:54:36 - INFO - stdout - {'loss': 1.0048, 'grad_norm': 1.168082594871521, 'learning_rate': 1.9638145623151267e-05, 'epoch': 0.34} +2025-02-05 11:54:36 - ERROR - stderr - 11%|█▏ | 2543/22434 [1:46:56<13:36:24, 2.46s/it] +2025-02-05 11:54:38 - ERROR - stderr - 11%|█▏ | 2544/22434 [1:46:58<13:40:49, 2.48s/it] +2025-02-05 11:54:38 - ERROR - stderr - +2025-02-05 11:54:38 - ERROR - stderr - +2025-02-05 11:54:38 - INFO - stdout - {'loss': 1.097, 'grad_norm': 1.2270926237106323, 'learning_rate': 1.96377606579544e-05, 'epoch': 0.34} +2025-02-05 11:54:38 - ERROR - stderr - 11%|█▏ | 2544/22434 [1:46:58<13:40:49, 2.48s/it] +2025-02-05 11:54:41 - ERROR - stderr - 11%|█▏ | 2545/22434 [1:47:01<13:41:51, 2.48s/it] +2025-02-05 11:54:41 - ERROR - stderr - +2025-02-05 11:54:41 - ERROR - stderr - +2025-02-05 11:54:41 - INFO - stdout - {'loss': 1.0339, 'grad_norm': 1.1742442846298218, 'learning_rate': 1.9637375491867636e-05, 'epoch': 0.34} +2025-02-05 11:54:41 - ERROR - stderr - 11%|█▏ | 2545/22434 [1:47:01<13:41:51, 2.48s/it] +2025-02-05 11:54:44 - ERROR - stderr - 11%|█▏ | 2546/22434 [1:47:03<13:59:50, 2.53s/it] +2025-02-05 11:54:44 - ERROR - stderr - +2025-02-05 11:54:44 - ERROR - 
stderr - +2025-02-05 11:54:44 - INFO - stdout - {'loss': 0.9833, 'grad_norm': 1.164702296257019, 'learning_rate': 1.9636990124899e-05, 'epoch': 0.34} +2025-02-05 11:54:44 - ERROR - stderr - 11%|█▏ | 2546/22434 [1:47:03<13:59:50, 2.53s/it] +2025-02-05 11:54:46 - ERROR - stderr - 11%|█▏ | 2547/22434 [1:47:06<13:55:14, 2.52s/it] +2025-02-05 11:54:46 - ERROR - stderr - +2025-02-05 11:54:46 - ERROR - stderr - +2025-02-05 11:54:46 - INFO - stdout - {'loss': 0.9439, 'grad_norm': 1.129084825515747, 'learning_rate': 1.963660455705653e-05, 'epoch': 0.34} +2025-02-05 11:54:46 - ERROR - stderr - 11%|█▏ | 2547/22434 [1:47:06<13:55:14, 2.52s/it] +2025-02-05 11:54:49 - ERROR - stderr - 11%|█▏ | 2548/22434 [1:47:08<13:51:58, 2.51s/it] +2025-02-05 11:54:49 - ERROR - stderr - +2025-02-05 11:54:49 - ERROR - stderr - +2025-02-05 11:54:49 - INFO - stdout - {'loss': 0.9155, 'grad_norm': 1.0737391710281372, 'learning_rate': 1.9636218788348254e-05, 'epoch': 0.34} +2025-02-05 11:54:49 - ERROR - stderr - 11%|█▏ | 2548/22434 [1:47:08<13:51:58, 2.51s/it] +2025-02-05 11:54:51 - ERROR - stderr - 11%|█▏ | 2549/22434 [1:47:11<13:51:00, 2.51s/it] +2025-02-05 11:54:51 - ERROR - stderr - +2025-02-05 11:54:51 - ERROR - stderr - +2025-02-05 11:54:51 - INFO - stdout - {'loss': 1.0652, 'grad_norm': 1.1754376888275146, 'learning_rate': 1.963583281878222e-05, 'epoch': 0.34} +2025-02-05 11:54:51 - ERROR - stderr - 11%|█▏ | 2549/22434 [1:47:11<13:51:00, 2.51s/it] +2025-02-05 11:54:53 - ERROR - stderr - 11%|█▏ | 2550/22434 [1:47:13<13:43:33, 2.49s/it] +2025-02-05 11:54:54 - ERROR - stderr - +2025-02-05 11:54:54 - ERROR - stderr - +2025-02-05 11:54:54 - INFO - stdout - {'loss': 0.988, 'grad_norm': 1.1493417024612427, 'learning_rate': 1.9635446648366473e-05, 'epoch': 0.34} +2025-02-05 11:54:54 - ERROR - stderr - 11%|█▏ | 2550/22434 [1:47:13<13:43:33, 2.49s/it] +2025-02-05 11:54:56 - ERROR - stderr - 11%|█▏ | 2551/22434 [1:47:16<13:48:38, 2.50s/it] +2025-02-05 11:54:56 - ERROR - stderr - +2025-02-05 11:54:56 - 
ERROR - stderr - +2025-02-05 11:54:56 - INFO - stdout - {'loss': 0.9255, 'grad_norm': 1.085188388824463, 'learning_rate': 1.963506027710906e-05, 'epoch': 0.34} +2025-02-05 11:54:56 - ERROR - stderr - 11%|█▏ | 2551/22434 [1:47:16<13:48:38, 2.50s/it] +2025-02-05 11:54:58 - ERROR - stderr - 11%|█▏ | 2552/22434 [1:47:18<13:47:59, 2.50s/it] +2025-02-05 11:54:59 - ERROR - stderr - +2025-02-05 11:54:59 - ERROR - stderr - +2025-02-05 11:54:59 - INFO - stdout - {'loss': 1.0145, 'grad_norm': 1.1129672527313232, 'learning_rate': 1.9634673705018034e-05, 'epoch': 0.34} +2025-02-05 11:54:59 - ERROR - stderr - 11%|█▏ | 2552/22434 [1:47:18<13:47:59, 2.50s/it] +2025-02-05 11:55:01 - ERROR - stderr - 11%|█▏ | 2553/22434 [1:47:21<13:41:01, 2.48s/it] +2025-02-05 11:55:01 - ERROR - stderr - +2025-02-05 11:55:01 - ERROR - stderr - +2025-02-05 11:55:01 - INFO - stdout - {'loss': 0.9954, 'grad_norm': 1.2364767789840698, 'learning_rate': 1.9634286932101457e-05, 'epoch': 0.34} +2025-02-05 11:55:01 - ERROR - stderr - 11%|█▏ | 2553/22434 [1:47:21<13:41:01, 2.48s/it] +2025-02-05 11:55:03 - ERROR - stderr - 11%|█▏ | 2554/22434 [1:47:23<13:42:48, 2.48s/it] +2025-02-05 11:55:03 - ERROR - stderr - +2025-02-05 11:55:03 - ERROR - stderr - +2025-02-05 11:55:03 - INFO - stdout - {'loss': 0.8759, 'grad_norm': 1.079734444618225, 'learning_rate': 1.9633899958367384e-05, 'epoch': 0.34} +2025-02-05 11:55:03 - ERROR - stderr - 11%|█▏ | 2554/22434 [1:47:23<13:42:48, 2.48s/it] +2025-02-05 11:55:06 - ERROR - stderr - 11%|█▏ | 2555/22434 [1:47:26<14:08:59, 2.56s/it] +2025-02-05 11:55:06 - ERROR - stderr - +2025-02-05 11:55:06 - ERROR - stderr - +2025-02-05 11:55:06 - INFO - stdout - {'loss': 0.9272, 'grad_norm': 1.1786879301071167, 'learning_rate': 1.9633512783823887e-05, 'epoch': 0.34} +2025-02-05 11:55:06 - ERROR - stderr - 11%|█▏ | 2555/22434 [1:47:26<14:08:59, 2.56s/it] +2025-02-05 11:55:09 - ERROR - stderr - 11%|█▏ | 2556/22434 [1:47:28<14:10:00, 2.57s/it] +2025-02-05 11:55:09 - ERROR - stderr - 
+2025-02-05 11:55:09 - ERROR - stderr - +2025-02-05 11:55:09 - INFO - stdout - {'loss': 0.9312, 'grad_norm': 1.183010220527649, 'learning_rate': 1.9633125408479035e-05, 'epoch': 0.34} +2025-02-05 11:55:09 - ERROR - stderr - 11%|█▏ | 2556/22434 [1:47:29<14:10:00, 2.57s/it] +2025-02-05 11:55:11 - ERROR - stderr - 11%|█▏ | 2557/22434 [1:47:31<14:06:21, 2.55s/it] +2025-02-05 11:55:11 - ERROR - stderr - +2025-02-05 11:55:11 - ERROR - stderr - +2025-02-05 11:55:11 - INFO - stdout - {'loss': 0.9726, 'grad_norm': 1.05107843875885, 'learning_rate': 1.9632737832340904e-05, 'epoch': 0.34} +2025-02-05 11:55:11 - ERROR - stderr - 11%|█▏ | 2557/22434 [1:47:31<14:06:21, 2.55s/it] +2025-02-05 11:55:14 - ERROR - stderr - 11%|█▏ | 2558/22434 [1:47:34<14:01:48, 2.54s/it] +2025-02-05 11:55:14 - ERROR - stderr - +2025-02-05 11:55:14 - ERROR - stderr - +2025-02-05 11:55:14 - INFO - stdout - {'loss': 1.0098, 'grad_norm': 1.1555575132369995, 'learning_rate': 1.9632350055417566e-05, 'epoch': 0.34} +2025-02-05 11:55:14 - ERROR - stderr - 11%|█▏ | 2558/22434 [1:47:34<14:01:48, 2.54s/it] +2025-02-05 11:55:16 - ERROR - stderr - 11%|█▏ | 2559/22434 [1:47:36<14:02:37, 2.54s/it] +2025-02-05 11:55:16 - ERROR - stderr - +2025-02-05 11:55:16 - ERROR - stderr - +2025-02-05 11:55:16 - INFO - stdout - {'loss': 0.9987, 'grad_norm': 1.201690912246704, 'learning_rate': 1.963196207771711e-05, 'epoch': 0.34} +2025-02-05 11:55:16 - ERROR - stderr - 11%|█▏ | 2559/22434 [1:47:36<14:02:37, 2.54s/it] +2025-02-05 11:55:19 - ERROR - stderr - 11%|█▏ | 2560/22434 [1:47:39<13:53:53, 2.52s/it] +2025-02-05 11:55:19 - ERROR - stderr - +2025-02-05 11:55:19 - ERROR - stderr - +2025-02-05 11:55:19 - INFO - stdout - {'loss': 1.1288, 'grad_norm': 1.2961421012878418, 'learning_rate': 1.963157389924762e-05, 'epoch': 0.34} +2025-02-05 11:55:19 - ERROR - stderr - 11%|█▏ | 2560/22434 [1:47:39<13:53:53, 2.52s/it] +2025-02-05 11:55:21 - ERROR - stderr - 11%|█▏ | 2561/22434 [1:47:41<13:59:21, 2.53s/it] +2025-02-05 11:55:21 - ERROR - 
stderr - +2025-02-05 11:55:21 - ERROR - stderr - +2025-02-05 11:55:21 - INFO - stdout - {'loss': 1.0613, 'grad_norm': 1.1089577674865723, 'learning_rate': 1.9631185520017187e-05, 'epoch': 0.34} +2025-02-05 11:55:21 - ERROR - stderr - 11%|█▏ | 2561/22434 [1:47:41<13:59:21, 2.53s/it] +2025-02-05 11:55:24 - ERROR - stderr - 11%|█▏ | 2562/22434 [1:47:44<13:55:44, 2.52s/it] +2025-02-05 11:55:24 - ERROR - stderr - +2025-02-05 11:55:24 - ERROR - stderr - +2025-02-05 11:55:24 - INFO - stdout - {'loss': 1.0191, 'grad_norm': 1.1423362493515015, 'learning_rate': 1.9630796940033913e-05, 'epoch': 0.34} +2025-02-05 11:55:24 - ERROR - stderr - 11%|█▏ | 2562/22434 [1:47:44<13:55:44, 2.52s/it] +2025-02-05 11:55:26 - ERROR - stderr - 11%|█▏ | 2563/22434 [1:47:46<13:47:14, 2.50s/it] +2025-02-05 11:55:26 - ERROR - stderr - +2025-02-05 11:55:26 - ERROR - stderr - +2025-02-05 11:55:26 - INFO - stdout - {'loss': 0.9508, 'grad_norm': 1.1997482776641846, 'learning_rate': 1.963040815930589e-05, 'epoch': 0.34} +2025-02-05 11:55:26 - ERROR - stderr - 11%|█▏ | 2563/22434 [1:47:46<13:47:14, 2.50s/it] +2025-02-05 11:55:29 - ERROR - stderr - 11%|█▏ | 2564/22434 [1:47:49<13:49:58, 2.51s/it] +2025-02-05 11:55:29 - ERROR - stderr - +2025-02-05 11:55:29 - ERROR - stderr - +2025-02-05 11:55:29 - INFO - stdout - {'loss': 0.9615, 'grad_norm': 1.1286191940307617, 'learning_rate': 1.9630019177841224e-05, 'epoch': 0.34} +2025-02-05 11:55:29 - ERROR - stderr - 11%|█▏ | 2564/22434 [1:47:49<13:49:58, 2.51s/it] +2025-02-05 11:55:31 - ERROR - stderr - 11%|█▏ | 2565/22434 [1:47:51<13:48:06, 2.50s/it] +2025-02-05 11:55:31 - ERROR - stderr - +2025-02-05 11:55:31 - ERROR - stderr - +2025-02-05 11:55:31 - INFO - stdout - {'loss': 1.0301, 'grad_norm': 1.072165608406067, 'learning_rate': 1.9629629995648024e-05, 'epoch': 0.34} +2025-02-05 11:55:31 - ERROR - stderr - 11%|█▏ | 2565/22434 [1:47:51<13:48:06, 2.50s/it] +2025-02-05 11:55:34 - ERROR - stderr - 11%|█▏ | 2566/22434 [1:47:53<13:39:36, 2.48s/it] +2025-02-05 
11:55:34 - ERROR - stderr - +2025-02-05 11:55:34 - ERROR - stderr - +2025-02-05 11:55:34 - INFO - stdout - {'loss': 1.0236, 'grad_norm': 1.2226704359054565, 'learning_rate': 1.96292406127344e-05, 'epoch': 0.34} +2025-02-05 11:55:34 - ERROR - stderr - 11%|█▏ | 2566/22434 [1:47:54<13:39:36, 2.48s/it] +2025-02-05 11:55:36 - ERROR - stderr - 11%|█▏ | 2567/22434 [1:47:56<13:43:56, 2.49s/it] +2025-02-05 11:55:36 - ERROR - stderr - +2025-02-05 11:55:36 - ERROR - stderr - +2025-02-05 11:55:36 - INFO - stdout - {'loss': 0.9152, 'grad_norm': 1.1634501218795776, 'learning_rate': 1.962885102910847e-05, 'epoch': 0.34} +2025-02-05 11:55:36 - ERROR - stderr - 11%|█▏ | 2567/22434 [1:47:56<13:43:56, 2.49s/it] +2025-02-05 11:55:39 - ERROR - stderr - 11%|█▏ | 2568/22434 [1:47:59<13:57:21, 2.53s/it] +2025-02-05 11:55:39 - ERROR - stderr - +2025-02-05 11:55:39 - ERROR - stderr - +2025-02-05 11:55:39 - INFO - stdout - {'loss': 0.9922, 'grad_norm': 1.1952215433120728, 'learning_rate': 1.9628461244778356e-05, 'epoch': 0.34} +2025-02-05 11:55:39 - ERROR - stderr - 11%|█▏ | 2568/22434 [1:47:59<13:57:21, 2.53s/it] +2025-02-05 11:55:41 - ERROR - stderr - 11%|█▏ | 2569/22434 [1:48:01<13:59:30, 2.54s/it] +2025-02-05 11:55:41 - ERROR - stderr - +2025-02-05 11:55:41 - ERROR - stderr - +2025-02-05 11:55:41 - INFO - stdout - {'loss': 0.9343, 'grad_norm': 1.2677711248397827, 'learning_rate': 1.9628071259752177e-05, 'epoch': 0.34} +2025-02-05 11:55:41 - ERROR - stderr - 11%|█▏ | 2569/22434 [1:48:01<13:59:30, 2.54s/it] +2025-02-05 11:55:44 - ERROR - stderr - 11%|█▏ | 2570/22434 [1:48:04<14:03:01, 2.55s/it] +2025-02-05 11:55:44 - ERROR - stderr - +2025-02-05 11:55:44 - ERROR - stderr - +2025-02-05 11:55:44 - INFO - stdout - {'loss': 0.9223, 'grad_norm': 1.1028345823287964, 'learning_rate': 1.962768107403807e-05, 'epoch': 0.34} +2025-02-05 11:55:44 - ERROR - stderr - 11%|█▏ | 2570/22434 [1:48:04<14:03:01, 2.55s/it] +2025-02-05 11:55:46 - ERROR - stderr - 11%|█▏ | 2571/22434 [1:48:06<13:49:38, 2.51s/it] 
+2025-02-05 11:55:46 - ERROR - stderr - +2025-02-05 11:55:46 - ERROR - stderr - +2025-02-05 11:55:46 - INFO - stdout - {'loss': 1.0954, 'grad_norm': 1.1565215587615967, 'learning_rate': 1.962729068764416e-05, 'epoch': 0.34} +2025-02-05 11:55:46 - ERROR - stderr - 11%|█▏ | 2571/22434 [1:48:06<13:49:38, 2.51s/it] +2025-02-05 11:55:49 - ERROR - stderr - 11%|█▏ | 2572/22434 [1:48:09<13:42:49, 2.49s/it] +2025-02-05 11:55:49 - ERROR - stderr - +2025-02-05 11:55:49 - ERROR - stderr - +2025-02-05 11:55:49 - INFO - stdout - {'loss': 1.1138, 'grad_norm': 1.2226780652999878, 'learning_rate': 1.962690010057859e-05, 'epoch': 0.34} +2025-02-05 11:55:49 - ERROR - stderr - 11%|█▏ | 2572/22434 [1:48:09<13:42:49, 2.49s/it] +2025-02-05 11:55:51 - ERROR - stderr - 11%|█▏ | 2573/22434 [1:48:11<13:49:44, 2.51s/it] +2025-02-05 11:55:51 - ERROR - stderr - +2025-02-05 11:55:51 - ERROR - stderr - +2025-02-05 11:55:51 - INFO - stdout - {'loss': 1.0102, 'grad_norm': 1.1678746938705444, 'learning_rate': 1.96265093128495e-05, 'epoch': 0.34} +2025-02-05 11:55:51 - ERROR - stderr - 11%|█▏ | 2573/22434 [1:48:11<13:49:44, 2.51s/it] +2025-02-05 11:55:54 - ERROR - stderr - 11%|█▏ | 2574/22434 [1:48:14<13:42:14, 2.48s/it] +2025-02-05 11:55:54 - ERROR - stderr - +2025-02-05 11:55:54 - ERROR - stderr - +2025-02-05 11:55:54 - INFO - stdout - {'loss': 1.0013, 'grad_norm': 1.349263072013855, 'learning_rate': 1.9626118324465035e-05, 'epoch': 0.34} +2025-02-05 11:55:54 - ERROR - stderr - 11%|█▏ | 2574/22434 [1:48:14<13:42:14, 2.48s/it] +2025-02-05 11:55:56 - ERROR - stderr - 11%|█▏ | 2575/22434 [1:48:16<13:42:12, 2.48s/it] +2025-02-05 11:55:56 - ERROR - stderr - +2025-02-05 11:55:56 - ERROR - stderr - +2025-02-05 11:55:56 - INFO - stdout - {'loss': 0.9626, 'grad_norm': 1.0769171714782715, 'learning_rate': 1.9625727135433343e-05, 'epoch': 0.34} +2025-02-05 11:55:56 - ERROR - stderr - 11%|█▏ | 2575/22434 [1:48:16<13:42:12, 2.48s/it] +2025-02-05 11:55:59 - ERROR - stderr - 11%|█▏ | 2576/22434 [1:48:19<13:40:10, 
2.48s/it] +2025-02-05 11:55:59 - ERROR - stderr - +2025-02-05 11:55:59 - ERROR - stderr - +2025-02-05 11:55:59 - INFO - stdout - {'loss': 1.0471, 'grad_norm': 1.0992207527160645, 'learning_rate': 1.9625335745762578e-05, 'epoch': 0.34} +2025-02-05 11:55:59 - ERROR - stderr - 11%|█▏ | 2576/22434 [1:48:19<13:40:10, 2.48s/it] +2025-02-05 11:56:01 - ERROR - stderr - 11%|█▏ | 2577/22434 [1:48:21<13:40:17, 2.48s/it] +2025-02-05 11:56:01 - ERROR - stderr - +2025-02-05 11:56:01 - ERROR - stderr - +2025-02-05 11:56:01 - INFO - stdout - {'loss': 1.0813, 'grad_norm': 1.2378076314926147, 'learning_rate': 1.96249441554609e-05, 'epoch': 0.34} +2025-02-05 11:56:01 - ERROR - stderr - 11%|█▏ | 2577/22434 [1:48:21<13:40:17, 2.48s/it] +2025-02-05 11:56:04 - ERROR - stderr - 11%|█▏ | 2578/22434 [1:48:23<13:39:36, 2.48s/it] +2025-02-05 11:56:04 - ERROR - stderr - +2025-02-05 11:56:04 - ERROR - stderr - +2025-02-05 11:56:04 - INFO - stdout - {'loss': 0.9162, 'grad_norm': 1.1264938116073608, 'learning_rate': 1.9624552364536472e-05, 'epoch': 0.34} +2025-02-05 11:56:04 - ERROR - stderr - 11%|█▏ | 2578/22434 [1:48:24<13:39:36, 2.48s/it] +2025-02-05 11:56:06 - ERROR - stderr - 11%|█▏ | 2579/22434 [1:48:26<13:54:54, 2.52s/it] +2025-02-05 11:56:06 - ERROR - stderr - +2025-02-05 11:56:06 - ERROR - stderr - +2025-02-05 11:56:06 - INFO - stdout - {'loss': 1.1321, 'grad_norm': 1.243513822555542, 'learning_rate': 1.962416037299746e-05, 'epoch': 0.34} +2025-02-05 11:56:06 - ERROR - stderr - 11%|█▏ | 2579/22434 [1:48:26<13:54:54, 2.52s/it] +2025-02-05 11:56:09 - ERROR - stderr - 12%|█▏ | 2580/22434 [1:48:29<14:05:53, 2.56s/it] +2025-02-05 11:56:09 - ERROR - stderr - +2025-02-05 11:56:09 - ERROR - stderr - +2025-02-05 11:56:09 - INFO - stdout - {'loss': 0.9682, 'grad_norm': 1.0973551273345947, 'learning_rate': 1.962376818085204e-05, 'epoch': 0.35} +2025-02-05 11:56:09 - ERROR - stderr - 12%|█▏ | 2580/22434 [1:48:29<14:05:53, 2.56s/it] +2025-02-05 11:56:11 - ERROR - stderr - 12%|█▏ | 2581/22434 
[1:48:31<13:59:51, 2.54s/it] +2025-02-05 11:56:12 - ERROR - stderr - +2025-02-05 11:56:12 - ERROR - stderr - +2025-02-05 11:56:12 - INFO - stdout - {'loss': 0.9831, 'grad_norm': 1.0493675470352173, 'learning_rate': 1.9623375788108373e-05, 'epoch': 0.35} +2025-02-05 11:56:12 - ERROR - stderr - 12%|█▏ | 2581/22434 [1:48:31<13:59:51, 2.54s/it] +2025-02-05 11:56:14 - ERROR - stderr - 12%|█▏ | 2582/22434 [1:48:34<14:02:47, 2.55s/it] +2025-02-05 11:56:14 - ERROR - stderr - +2025-02-05 11:56:14 - ERROR - stderr - +2025-02-05 11:56:14 - INFO - stdout - {'loss': 0.9248, 'grad_norm': 1.1050320863723755, 'learning_rate': 1.9622983194774652e-05, 'epoch': 0.35} +2025-02-05 11:56:14 - ERROR - stderr - 12%|█▏ | 2582/22434 [1:48:34<14:02:47, 2.55s/it] +2025-02-05 11:56:17 - ERROR - stderr - 12%|█▏ | 2583/22434 [1:48:36<13:58:05, 2.53s/it] +2025-02-05 11:56:17 - ERROR - stderr - +2025-02-05 11:56:17 - ERROR - stderr - +2025-02-05 11:56:17 - INFO - stdout - {'loss': 0.9449, 'grad_norm': 1.0662256479263306, 'learning_rate': 1.962259040085905e-05, 'epoch': 0.35} +2025-02-05 11:56:17 - ERROR - stderr - 12%|█▏ | 2583/22434 [1:48:36<13:58:05, 2.53s/it] +2025-02-05 11:56:19 - ERROR - stderr - 12%|█▏ | 2584/22434 [1:48:39<13:48:26, 2.50s/it] +2025-02-05 11:56:19 - ERROR - stderr - +2025-02-05 11:56:19 - ERROR - stderr - +2025-02-05 11:56:19 - INFO - stdout - {'loss': 1.0101, 'grad_norm': 1.118995189666748, 'learning_rate': 1.9622197406369764e-05, 'epoch': 0.35} +2025-02-05 11:56:19 - ERROR - stderr - 12%|█▏ | 2584/22434 [1:48:39<13:48:26, 2.50s/it] +2025-02-05 11:56:21 - ERROR - stderr - 12%|█▏ | 2585/22434 [1:48:41<13:40:42, 2.48s/it] +2025-02-05 11:56:21 - ERROR - stderr - +2025-02-05 11:56:21 - ERROR - stderr - +2025-02-05 11:56:21 - INFO - stdout - {'loss': 1.0218, 'grad_norm': 1.1912171840667725, 'learning_rate': 1.9621804211314974e-05, 'epoch': 0.35} +2025-02-05 11:56:21 - ERROR - stderr - 12%|█▏ | 2585/22434 [1:48:41<13:40:42, 2.48s/it] +2025-02-05 11:56:24 - ERROR - stderr - 12%|█▏ 
| 2586/22434 [1:48:44<13:35:30, 2.47s/it] +2025-02-05 11:56:24 - ERROR - stderr - +2025-02-05 11:56:24 - ERROR - stderr - +2025-02-05 11:56:24 - INFO - stdout - {'loss': 1.0849, 'grad_norm': 1.166723370552063, 'learning_rate': 1.9621410815702888e-05, 'epoch': 0.35} +2025-02-05 11:56:24 - ERROR - stderr - 12%|█▏ | 2586/22434 [1:48:44<13:35:30, 2.47s/it] +2025-02-05 11:56:26 - ERROR - stderr - 12%|█▏ | 2587/22434 [1:48:46<13:37:21, 2.47s/it] +2025-02-05 11:56:26 - ERROR - stderr - +2025-02-05 11:56:26 - ERROR - stderr - +2025-02-05 11:56:26 - INFO - stdout - {'loss': 1.0346, 'grad_norm': 1.1717168092727661, 'learning_rate': 1.9621017219541694e-05, 'epoch': 0.35} +2025-02-05 11:56:26 - ERROR - stderr - 12%|█▏ | 2587/22434 [1:48:46<13:37:21, 2.47s/it] +2025-02-05 11:56:29 - ERROR - stderr - 12%|█▏ | 2588/22434 [1:48:49<13:34:10, 2.46s/it] +2025-02-05 11:56:29 - ERROR - stderr - +2025-02-05 11:56:29 - ERROR - stderr - +2025-02-05 11:56:29 - INFO - stdout - {'loss': 0.9908, 'grad_norm': 1.158998727798462, 'learning_rate': 1.962062342283961e-05, 'epoch': 0.35} +2025-02-05 11:56:29 - ERROR - stderr - 12%|█▏ | 2588/22434 [1:48:49<13:34:10, 2.46s/it] +2025-02-05 11:56:31 - ERROR - stderr - 12%|█▏ | 2589/22434 [1:48:51<13:33:53, 2.46s/it] +2025-02-05 11:56:31 - ERROR - stderr - +2025-02-05 11:56:31 - ERROR - stderr - +2025-02-05 11:56:31 - INFO - stdout - {'loss': 1.0466, 'grad_norm': 1.2118558883666992, 'learning_rate': 1.962022942560483e-05, 'epoch': 0.35} +2025-02-05 11:56:31 - ERROR - stderr - 12%|█▏ | 2589/22434 [1:48:51<13:33:53, 2.46s/it] +2025-02-05 11:56:34 - ERROR - stderr - 12%|█▏ | 2590/22434 [1:48:53<13:34:29, 2.46s/it] +2025-02-05 11:56:34 - ERROR - stderr - +2025-02-05 11:56:34 - ERROR - stderr - +2025-02-05 11:56:34 - INFO - stdout - {'loss': 0.9992, 'grad_norm': 1.2053078413009644, 'learning_rate': 1.9619835227845582e-05, 'epoch': 0.35} +2025-02-05 11:56:34 - ERROR - stderr - 12%|█▏ | 2590/22434 [1:48:54<13:34:29, 2.46s/it] +2025-02-05 11:56:36 - ERROR - 
stderr - 12%|█▏ | 2591/22434 [1:48:56<13:40:38, 2.48s/it] +2025-02-05 11:56:36 - ERROR - stderr - +2025-02-05 11:56:36 - ERROR - stderr - +2025-02-05 11:56:36 - INFO - stdout - {'loss': 0.9243, 'grad_norm': 1.1855584383010864, 'learning_rate': 1.9619440829570065e-05, 'epoch': 0.35} +2025-02-05 11:56:36 - ERROR - stderr - 12%|█▏ | 2591/22434 [1:48:56<13:40:38, 2.48s/it] +2025-02-05 11:56:39 - ERROR - stderr - 12%|█▏ | 2592/22434 [1:48:58<13:39:47, 2.48s/it] +2025-02-05 11:56:39 - ERROR - stderr - +2025-02-05 11:56:39 - ERROR - stderr - +2025-02-05 11:56:39 - INFO - stdout - {'loss': 0.8814, 'grad_norm': 1.1357593536376953, 'learning_rate': 1.9619046230786512e-05, 'epoch': 0.35} +2025-02-05 11:56:39 - ERROR - stderr - 12%|█▏ | 2592/22434 [1:48:59<13:39:47, 2.48s/it] +2025-02-05 11:56:41 - ERROR - stderr - 12%|█▏ | 2593/22434 [1:49:01<13:39:52, 2.48s/it] +2025-02-05 11:56:41 - ERROR - stderr - +2025-02-05 11:56:41 - ERROR - stderr - +2025-02-05 11:56:41 - INFO - stdout - {'loss': 1.0791, 'grad_norm': 1.271559715270996, 'learning_rate': 1.9618651431503146e-05, 'epoch': 0.35} +2025-02-05 11:56:41 - ERROR - stderr - 12%|█▏ | 2593/22434 [1:49:01<13:39:52, 2.48s/it] +2025-02-05 11:56:44 - ERROR - stderr - 12%|█▏ | 2594/22434 [1:49:03<13:38:00, 2.47s/it] +2025-02-05 11:56:44 - ERROR - stderr - +2025-02-05 11:56:44 - ERROR - stderr - +2025-02-05 11:56:44 - INFO - stdout - {'loss': 1.0079, 'grad_norm': 1.1946696043014526, 'learning_rate': 1.961825643172819e-05, 'epoch': 0.35} +2025-02-05 11:56:44 - ERROR - stderr - 12%|█▏ | 2594/22434 [1:49:03<13:38:00, 2.47s/it] +2025-02-05 11:56:46 - ERROR - stderr - 12%|█▏ | 2595/22434 [1:49:06<14:06:38, 2.56s/it] +2025-02-05 11:56:46 - ERROR - stderr - +2025-02-05 11:56:46 - ERROR - stderr - +2025-02-05 11:56:46 - INFO - stdout - {'loss': 0.9431, 'grad_norm': 1.1071274280548096, 'learning_rate': 1.9617861231469887e-05, 'epoch': 0.35} +2025-02-05 11:56:46 - ERROR - stderr - 12%|█▏ | 2595/22434 [1:49:06<14:06:38, 2.56s/it] +2025-02-05 
11:56:49 - ERROR - stderr - 12%|█▏ | 2596/22434 [1:49:09<13:50:41, 2.51s/it] +2025-02-05 11:56:49 - ERROR - stderr - +2025-02-05 11:56:49 - ERROR - stderr - +2025-02-05 11:56:49 - INFO - stdout - {'loss': 1.0815, 'grad_norm': 1.2470589876174927, 'learning_rate': 1.961746583073647e-05, 'epoch': 0.35} +2025-02-05 11:56:49 - ERROR - stderr - 12%|█▏ | 2596/22434 [1:49:09<13:50:41, 2.51s/it] +2025-02-05 11:56:51 - ERROR - stderr - 12%|█▏ | 2597/22434 [1:49:11<13:49:43, 2.51s/it] +2025-02-05 11:56:51 - ERROR - stderr - +2025-02-05 11:56:51 - ERROR - stderr - +2025-02-05 11:56:51 - INFO - stdout - {'loss': 1.0213, 'grad_norm': 1.1656633615493774, 'learning_rate': 1.9617070229536178e-05, 'epoch': 0.35} +2025-02-05 11:56:51 - ERROR - stderr - 12%|█▏ | 2597/22434 [1:49:11<13:49:43, 2.51s/it] +2025-02-05 11:56:54 - ERROR - stderr - 12%|█▏ | 2598/22434 [1:49:14<13:54:15, 2.52s/it] +2025-02-05 11:56:54 - ERROR - stderr - +2025-02-05 11:56:54 - ERROR - stderr - +2025-02-05 11:56:54 - INFO - stdout - {'loss': 0.9887, 'grad_norm': 1.1932566165924072, 'learning_rate': 1.9616674427877264e-05, 'epoch': 0.35} +2025-02-05 11:56:54 - ERROR - stderr - 12%|█▏ | 2598/22434 [1:49:14<13:54:15, 2.52s/it] +2025-02-05 11:56:57 - ERROR - stderr - 12%|█▏ | 2599/22434 [1:49:16<14:23:36, 2.61s/it] +2025-02-05 11:56:57 - ERROR - stderr - +2025-02-05 11:56:57 - ERROR - stderr - +2025-02-05 11:56:57 - INFO - stdout - {'loss': 0.9304, 'grad_norm': 1.1705557107925415, 'learning_rate': 1.961627842576797e-05, 'epoch': 0.35} +2025-02-05 11:56:57 - ERROR - stderr - 12%|█▏ | 2599/22434 [1:49:16<14:23:36, 2.61s/it] +2025-02-05 11:56:59 - ERROR - stderr - 12%|█▏ | 2600/22434 [1:49:19<14:04:51, 2.56s/it] +2025-02-05 11:56:59 - ERROR - stderr - +2025-02-05 11:56:59 - ERROR - stderr - +2025-02-05 11:56:59 - INFO - stdout - {'loss': 1.0532, 'grad_norm': 1.2132103443145752, 'learning_rate': 1.9615882223216553e-05, 'epoch': 0.35} +2025-02-05 11:56:59 - ERROR - stderr - 12%|█▏ | 2600/22434 [1:49:19<14:04:51, 
2.56s/it] +2025-02-05 11:57:02 - ERROR - stderr - 12%|█▏ | 2601/22434 [1:49:21<14:04:29, 2.55s/it] +2025-02-05 11:57:02 - ERROR - stderr - +2025-02-05 11:57:02 - ERROR - stderr - +2025-02-05 11:57:02 - INFO - stdout - {'loss': 0.9883, 'grad_norm': 1.261538028717041, 'learning_rate': 1.9615485820231278e-05, 'epoch': 0.35} +2025-02-05 11:57:02 - ERROR - stderr - 12%|█▏ | 2601/22434 [1:49:21<14:04:29, 2.55s/it] +2025-02-05 11:57:04 - ERROR - stderr - 12%|█▏ | 2602/22434 [1:49:24<13:54:11, 2.52s/it] +2025-02-05 11:57:04 - ERROR - stderr - +2025-02-05 11:57:04 - ERROR - stderr - +2025-02-05 11:57:04 - INFO - stdout - {'loss': 1.0481, 'grad_norm': 1.2422410249710083, 'learning_rate': 1.9615089216820395e-05, 'epoch': 0.35} +2025-02-05 11:57:04 - ERROR - stderr - 12%|█▏ | 2602/22434 [1:49:24<13:54:11, 2.52s/it] +2025-02-05 11:57:07 - ERROR - stderr - 12%|█▏ | 2603/22434 [1:49:26<13:49:17, 2.51s/it] +2025-02-05 11:57:07 - ERROR - stderr - +2025-02-05 11:57:07 - ERROR - stderr - +2025-02-05 11:57:07 - INFO - stdout - {'loss': 1.0819, 'grad_norm': 1.1227924823760986, 'learning_rate': 1.9614692412992183e-05, 'epoch': 0.35} +2025-02-05 11:57:07 - ERROR - stderr - 12%|█▏ | 2603/22434 [1:49:26<13:49:17, 2.51s/it] +2025-02-05 11:57:09 - ERROR - stderr - 12%|█▏ | 2604/22434 [1:49:29<13:51:31, 2.52s/it] +2025-02-05 11:57:09 - ERROR - stderr - +2025-02-05 11:57:09 - ERROR - stderr - +2025-02-05 11:57:09 - INFO - stdout - {'loss': 1.1976, 'grad_norm': 1.2238742113113403, 'learning_rate': 1.9614295408754908e-05, 'epoch': 0.35} +2025-02-05 11:57:09 - ERROR - stderr - 12%|█▏ | 2604/22434 [1:49:29<13:51:31, 2.52s/it] +2025-02-05 11:57:12 - ERROR - stderr - 12%|█▏ | 2605/22434 [1:49:31<13:50:21, 2.51s/it] +2025-02-05 11:57:12 - ERROR - stderr - +2025-02-05 11:57:12 - ERROR - stderr - +2025-02-05 11:57:12 - INFO - stdout - {'loss': 1.0073, 'grad_norm': 1.1077107191085815, 'learning_rate': 1.961389820411684e-05, 'epoch': 0.35} +2025-02-05 11:57:12 - ERROR - stderr - 12%|█▏ | 2605/22434 
[1:49:31<13:50:21, 2.51s/it] +2025-02-05 11:57:14 - ERROR - stderr - 12%|█▏ | 2606/22434 [1:49:34<13:42:45, 2.49s/it] +2025-02-05 11:57:14 - ERROR - stderr - +2025-02-05 11:57:14 - ERROR - stderr - +2025-02-05 11:57:14 - INFO - stdout - {'loss': 1.0746, 'grad_norm': 1.2013999223709106, 'learning_rate': 1.9613500799086266e-05, 'epoch': 0.35} +2025-02-05 11:57:14 - ERROR - stderr - 12%|█▏ | 2606/22434 [1:49:34<13:42:45, 2.49s/it] +2025-02-05 11:57:17 - ERROR - stderr - 12%|█▏ | 2607/22434 [1:49:36<13:42:49, 2.49s/it] +2025-02-05 11:57:17 - ERROR - stderr - +2025-02-05 11:57:17 - ERROR - stderr - +2025-02-05 11:57:17 - INFO - stdout - {'loss': 0.9325, 'grad_norm': 1.076201319694519, 'learning_rate': 1.9613103193671466e-05, 'epoch': 0.35} +2025-02-05 11:57:17 - ERROR - stderr - 12%|█▏ | 2607/22434 [1:49:36<13:42:49, 2.49s/it] +2025-02-05 11:57:19 - ERROR - stderr - 12%|█▏ | 2608/22434 [1:49:39<13:46:08, 2.50s/it] +2025-02-05 11:57:19 - ERROR - stderr - +2025-02-05 11:57:19 - ERROR - stderr - +2025-02-05 11:57:19 - INFO - stdout - {'loss': 1.0074, 'grad_norm': 1.078354001045227, 'learning_rate': 1.9612705387880733e-05, 'epoch': 0.35} +2025-02-05 11:57:19 - ERROR - stderr - 12%|█▏ | 2608/22434 [1:49:39<13:46:08, 2.50s/it] +2025-02-05 11:57:22 - ERROR - stderr - 12%|█▏ | 2609/22434 [1:49:41<13:42:03, 2.49s/it] +2025-02-05 11:57:22 - ERROR - stderr - +2025-02-05 11:57:22 - ERROR - stderr - +2025-02-05 11:57:22 - INFO - stdout - {'loss': 0.9253, 'grad_norm': 1.1448390483856201, 'learning_rate': 1.961230738172235e-05, 'epoch': 0.35} +2025-02-05 11:57:22 - ERROR - stderr - 12%|█▏ | 2609/22434 [1:49:41<13:42:03, 2.49s/it] +2025-02-05 11:57:24 - ERROR - stderr - 12%|█▏ | 2610/22434 [1:49:44<13:45:11, 2.50s/it] +2025-02-05 11:57:24 - ERROR - stderr - +2025-02-05 11:57:24 - ERROR - stderr - +2025-02-05 11:57:24 - INFO - stdout - {'loss': 1.0108, 'grad_norm': 1.0853244066238403, 'learning_rate': 1.961190917520462e-05, 'epoch': 0.35} +2025-02-05 11:57:24 - ERROR - stderr - 12%|█▏ | 
2610/22434 [1:49:44<13:45:11, 2.50s/it] +2025-02-05 11:57:27 - ERROR - stderr - 12%|█▏ | 2611/22434 [1:49:46<13:45:30, 2.50s/it] +2025-02-05 11:57:27 - ERROR - stderr - +2025-02-05 11:57:27 - ERROR - stderr - +2025-02-05 11:57:27 - INFO - stdout - {'loss': 1.0537, 'grad_norm': 1.1311365365982056, 'learning_rate': 1.9611510768335842e-05, 'epoch': 0.35} +2025-02-05 11:57:27 - ERROR - stderr - 12%|█▏ | 2611/22434 [1:49:46<13:45:30, 2.50s/it] +2025-02-05 11:57:29 - ERROR - stderr - 12%|█▏ | 2612/22434 [1:49:49<13:45:58, 2.50s/it] +2025-02-05 11:57:29 - ERROR - stderr - +2025-02-05 11:57:29 - ERROR - stderr - +2025-02-05 11:57:29 - INFO - stdout - {'loss': 0.877, 'grad_norm': 1.0610649585723877, 'learning_rate': 1.961111216112432e-05, 'epoch': 0.35} +2025-02-05 11:57:29 - ERROR - stderr - 12%|█▏ | 2612/22434 [1:49:49<13:45:58, 2.50s/it] +2025-02-05 11:57:32 - ERROR - stderr - 12%|█▏ | 2613/22434 [1:49:51<13:55:42, 2.53s/it] +2025-02-05 11:57:32 - ERROR - stderr - +2025-02-05 11:57:32 - ERROR - stderr - +2025-02-05 11:57:32 - INFO - stdout - {'loss': 0.9543, 'grad_norm': 1.1435920000076294, 'learning_rate': 1.9610713353578356e-05, 'epoch': 0.35} +2025-02-05 11:57:32 - ERROR - stderr - 12%|█▏ | 2613/22434 [1:49:51<13:55:42, 2.53s/it] +2025-02-05 11:57:34 - ERROR - stderr - 12%|█▏ | 2614/22434 [1:49:54<13:50:06, 2.51s/it] +2025-02-05 11:57:34 - ERROR - stderr - +2025-02-05 11:57:34 - ERROR - stderr - +2025-02-05 11:57:34 - INFO - stdout - {'loss': 0.9889, 'grad_norm': 3.3594019412994385, 'learning_rate': 1.9610314345706275e-05, 'epoch': 0.35} +2025-02-05 11:57:34 - ERROR - stderr - 12%|█▏ | 2614/22434 [1:49:54<13:50:06, 2.51s/it] +2025-02-05 11:57:37 - ERROR - stderr - 12%|█▏ | 2615/22434 [1:49:56<13:55:40, 2.53s/it] +2025-02-05 11:57:37 - ERROR - stderr - +2025-02-05 11:57:37 - ERROR - stderr - +2025-02-05 11:57:37 - INFO - stdout - {'loss': 1.0147, 'grad_norm': 1.2156792879104614, 'learning_rate': 1.9609915137516383e-05, 'epoch': 0.35} +2025-02-05 11:57:37 - ERROR - 
stderr - 12%|█▏ | 2615/22434 [1:49:57<13:55:40, 2.53s/it] +2025-02-05 11:57:39 - ERROR - stderr - 12%|█▏ | 2616/22434 [1:49:59<13:58:47, 2.54s/it] +2025-02-05 11:57:39 - ERROR - stderr - +2025-02-05 11:57:39 - ERROR - stderr - +2025-02-05 11:57:39 - INFO - stdout - {'loss': 1.1006, 'grad_norm': 1.363714575767517, 'learning_rate': 1.9609515729017006e-05, 'epoch': 0.35} +2025-02-05 11:57:39 - ERROR - stderr - 12%|█▏ | 2616/22434 [1:49:59<13:58:47, 2.54s/it] +2025-02-05 11:57:42 - ERROR - stderr - 12%|█▏ | 2617/22434 [1:50:02<14:04:31, 2.56s/it] +2025-02-05 11:57:42 - ERROR - stderr - +2025-02-05 11:57:42 - ERROR - stderr - +2025-02-05 11:57:42 - INFO - stdout - {'loss': 1.0501, 'grad_norm': 1.108022689819336, 'learning_rate': 1.960911612021647e-05, 'epoch': 0.35} +2025-02-05 11:57:42 - ERROR - stderr - 12%|█▏ | 2617/22434 [1:50:02<14:04:31, 2.56s/it] +2025-02-05 11:57:44 - ERROR - stderr - 12%|█▏ | 2618/22434 [1:50:04<13:59:46, 2.54s/it] +2025-02-05 11:57:44 - ERROR - stderr - +2025-02-05 11:57:44 - ERROR - stderr - +2025-02-05 11:57:44 - INFO - stdout - {'loss': 1.0165, 'grad_norm': 1.1953414678573608, 'learning_rate': 1.9608716311123107e-05, 'epoch': 0.35} +2025-02-05 11:57:44 - ERROR - stderr - 12%|█▏ | 2618/22434 [1:50:04<13:59:46, 2.54s/it] +2025-02-05 11:57:47 - ERROR - stderr - 12%|█▏ | 2619/22434 [1:50:07<13:58:11, 2.54s/it] +2025-02-05 11:57:47 - ERROR - stderr - +2025-02-05 11:57:47 - ERROR - stderr - +2025-02-05 11:57:47 - INFO - stdout - {'loss': 0.9524, 'grad_norm': 1.0880476236343384, 'learning_rate': 1.9608316301745242e-05, 'epoch': 0.35} +2025-02-05 11:57:47 - ERROR - stderr - 12%|█▏ | 2619/22434 [1:50:07<13:58:11, 2.54s/it] +2025-02-05 11:57:49 - ERROR - stderr - 12%|█▏ | 2620/22434 [1:50:09<13:57:53, 2.54s/it] +2025-02-05 11:57:49 - ERROR - stderr - +2025-02-05 11:57:49 - ERROR - stderr - +2025-02-05 11:57:49 - INFO - stdout - {'loss': 1.0349, 'grad_norm': 1.113537073135376, 'learning_rate': 1.960791609209122e-05, 'epoch': 0.35} +2025-02-05 11:57:49 
- ERROR - stderr - 12%|█▏ | 2620/22434 [1:50:09<13:57:53, 2.54s/it] +2025-02-05 11:57:52 - ERROR - stderr - 12%|█▏ | 2621/22434 [1:50:12<14:07:01, 2.57s/it] +2025-02-05 11:57:52 - ERROR - stderr - +2025-02-05 11:57:52 - ERROR - stderr - +2025-02-05 11:57:52 - INFO - stdout - {'loss': 0.9616, 'grad_norm': 1.159740924835205, 'learning_rate': 1.9607515682169378e-05, 'epoch': 0.35} +2025-02-05 11:57:52 - ERROR - stderr - 12%|█▏ | 2621/22434 [1:50:12<14:07:01, 2.57s/it] +2025-02-05 11:57:55 - ERROR - stderr - 12%|█▏ | 2622/22434 [1:50:14<14:15:24, 2.59s/it] +2025-02-05 11:57:55 - ERROR - stderr - +2025-02-05 11:57:55 - ERROR - stderr - +2025-02-05 11:57:55 - INFO - stdout - {'loss': 0.7935, 'grad_norm': 1.044344425201416, 'learning_rate': 1.9607115071988068e-05, 'epoch': 0.35} +2025-02-05 11:57:55 - ERROR - stderr - 12%|█▏ | 2622/22434 [1:50:15<14:15:24, 2.59s/it] +2025-02-05 11:57:57 - ERROR - stderr - 12%|█▏ | 2623/22434 [1:50:17<14:06:52, 2.56s/it] +2025-02-05 11:57:57 - ERROR - stderr - +2025-02-05 11:57:57 - ERROR - stderr - +2025-02-05 11:57:57 - INFO - stdout - {'loss': 1.098, 'grad_norm': 1.2492702007293701, 'learning_rate': 1.9606714261555637e-05, 'epoch': 0.35} +2025-02-05 11:57:57 - ERROR - stderr - 12%|█▏ | 2623/22434 [1:50:17<14:06:52, 2.56s/it] +2025-02-05 11:58:00 - ERROR - stderr - 12%|█▏ | 2624/22434 [1:50:19<13:59:23, 2.54s/it] +2025-02-05 11:58:00 - ERROR - stderr - +2025-02-05 11:58:00 - ERROR - stderr - +2025-02-05 11:58:00 - INFO - stdout - {'loss': 0.9106, 'grad_norm': 1.1514935493469238, 'learning_rate': 1.960631325088044e-05, 'epoch': 0.35} +2025-02-05 11:58:00 - ERROR - stderr - 12%|█▏ | 2624/22434 [1:50:20<13:59:23, 2.54s/it] +2025-02-05 11:58:02 - ERROR - stderr - 12%|█▏ | 2625/22434 [1:50:22<13:55:00, 2.53s/it] +2025-02-05 11:58:02 - ERROR - stderr - +2025-02-05 11:58:02 - ERROR - stderr - +2025-02-05 11:58:02 - INFO - stdout - {'loss': 0.9279, 'grad_norm': 1.0382087230682373, 'learning_rate': 1.9605912039970835e-05, 'epoch': 0.35} 
+2025-02-05 11:58:02 - ERROR - stderr - 12%|█▏ | 2625/22434 [1:50:22<13:55:00, 2.53s/it] +2025-02-05 11:58:05 - ERROR - stderr - 12%|█▏ | 2626/22434 [1:50:24<13:55:00, 2.53s/it] +2025-02-05 11:58:05 - ERROR - stderr - +2025-02-05 11:58:05 - ERROR - stderr - +2025-02-05 11:58:05 - INFO - stdout - {'loss': 1.1021, 'grad_norm': 1.158911943435669, 'learning_rate': 1.9605510628835184e-05, 'epoch': 0.35} +2025-02-05 11:58:05 - ERROR - stderr - 12%|█▏ | 2626/22434 [1:50:25<13:55:00, 2.53s/it] +2025-02-05 11:58:07 - ERROR - stderr - 12%|█▏ | 2627/22434 [1:50:27<13:51:03, 2.52s/it] +2025-02-05 11:58:07 - ERROR - stderr - +2025-02-05 11:58:07 - ERROR - stderr - +2025-02-05 11:58:07 - INFO - stdout - {'loss': 0.9501, 'grad_norm': 1.0473262071609497, 'learning_rate': 1.960510901748186e-05, 'epoch': 0.35} +2025-02-05 11:58:07 - ERROR - stderr - 12%|█▏ | 2627/22434 [1:50:27<13:51:03, 2.52s/it] +2025-02-05 11:58:10 - ERROR - stderr - 12%|█▏ | 2628/22434 [1:50:29<13:47:19, 2.51s/it] +2025-02-05 11:58:10 - ERROR - stderr - +2025-02-05 11:58:10 - ERROR - stderr - +2025-02-05 11:58:10 - INFO - stdout - {'loss': 1.0231, 'grad_norm': 1.1491297483444214, 'learning_rate': 1.9604707205919223e-05, 'epoch': 0.35} +2025-02-05 11:58:10 - ERROR - stderr - 12%|█▏ | 2628/22434 [1:50:30<13:47:19, 2.51s/it] +2025-02-05 11:58:12 - ERROR - stderr - 12%|█▏ | 2629/22434 [1:50:32<13:41:03, 2.49s/it] +2025-02-05 11:58:12 - ERROR - stderr - +2025-02-05 11:58:12 - ERROR - stderr - +2025-02-05 11:58:12 - INFO - stdout - {'loss': 0.9688, 'grad_norm': 1.1306887865066528, 'learning_rate': 1.960430519415566e-05, 'epoch': 0.35} +2025-02-05 11:58:12 - ERROR - stderr - 12%|█▏ | 2629/22434 [1:50:32<13:41:03, 2.49s/it] +2025-02-05 11:58:15 - ERROR - stderr - 12%|█▏ | 2630/22434 [1:50:34<13:35:45, 2.47s/it] +2025-02-05 11:58:15 - ERROR - stderr - +2025-02-05 11:58:15 - ERROR - stderr - +2025-02-05 11:58:15 - INFO - stdout - {'loss': 0.9622, 'grad_norm': 1.2194674015045166, 'learning_rate': 1.9603902982199544e-05, 
'epoch': 0.35} +2025-02-05 11:58:15 - ERROR - stderr - 12%|█▏ | 2630/22434 [1:50:34<13:35:45, 2.47s/it] +2025-02-05 11:58:17 - ERROR - stderr - 12%|█▏ | 2631/22434 [1:50:37<13:31:00, 2.46s/it] +2025-02-05 11:58:17 - ERROR - stderr - +2025-02-05 11:58:17 - ERROR - stderr - +2025-02-05 11:58:17 - INFO - stdout - {'loss': 1.1039, 'grad_norm': 1.2383387088775635, 'learning_rate': 1.9603500570059258e-05, 'epoch': 0.35} +2025-02-05 11:58:17 - ERROR - stderr - 12%|█▏ | 2631/22434 [1:50:37<13:31:00, 2.46s/it] +2025-02-05 11:58:20 - ERROR - stderr - 12%|█▏ | 2632/22434 [1:50:39<13:37:05, 2.48s/it] +2025-02-05 11:58:20 - ERROR - stderr - +2025-02-05 11:58:20 - ERROR - stderr - +2025-02-05 11:58:20 - INFO - stdout - {'loss': 0.9986, 'grad_norm': 1.1345744132995605, 'learning_rate': 1.9603097957743197e-05, 'epoch': 0.35} +2025-02-05 11:58:20 - ERROR - stderr - 12%|█▏ | 2632/22434 [1:50:39<13:37:05, 2.48s/it] +2025-02-05 11:58:22 - ERROR - stderr - 12%|█▏ | 2633/22434 [1:50:42<13:40:52, 2.49s/it] +2025-02-05 11:58:22 - ERROR - stderr - +2025-02-05 11:58:22 - ERROR - stderr - +2025-02-05 11:58:22 - INFO - stdout - {'loss': 0.86, 'grad_norm': 1.085554599761963, 'learning_rate': 1.9602695145259744e-05, 'epoch': 0.35} +2025-02-05 11:58:22 - ERROR - stderr - 12%|█▏ | 2633/22434 [1:50:42<13:40:52, 2.49s/it] +2025-02-05 11:58:25 - ERROR - stderr - 12%|█▏ | 2634/22434 [1:50:44<13:43:57, 2.50s/it] +2025-02-05 11:58:25 - ERROR - stderr - +2025-02-05 11:58:25 - ERROR - stderr - +2025-02-05 11:58:25 - INFO - stdout - {'loss': 1.0867, 'grad_norm': 1.1948943138122559, 'learning_rate': 1.96022921326173e-05, 'epoch': 0.35} +2025-02-05 11:58:25 - ERROR - stderr - 12%|█▏ | 2634/22434 [1:50:44<13:43:57, 2.50s/it] +2025-02-05 11:58:27 - ERROR - stderr - 12%|█▏ | 2635/22434 [1:50:47<13:38:46, 2.48s/it] +2025-02-05 11:58:27 - ERROR - stderr - +2025-02-05 11:58:27 - ERROR - stderr - +2025-02-05 11:58:27 - INFO - stdout - {'loss': 0.9797, 'grad_norm': 1.3336191177368164, 'learning_rate': 
1.960188891982427e-05, 'epoch': 0.35} +2025-02-05 11:58:27 - ERROR - stderr - 12%|█▏ | 2635/22434 [1:50:47<13:38:46, 2.48s/it] +2025-02-05 11:58:29 - ERROR - stderr - 12%|█▏ | 2636/22434 [1:50:49<13:33:44, 2.47s/it] +2025-02-05 11:58:29 - ERROR - stderr - +2025-02-05 11:58:29 - ERROR - stderr - +2025-02-05 11:58:29 - INFO - stdout - {'loss': 0.8849, 'grad_norm': 1.1102896928787231, 'learning_rate': 1.9601485506889047e-05, 'epoch': 0.35} +2025-02-05 11:58:29 - ERROR - stderr - 12%|█▏ | 2636/22434 [1:50:49<13:33:44, 2.47s/it] +2025-02-05 11:58:32 - ERROR - stderr - 12%|█▏ | 2637/22434 [1:50:52<13:33:02, 2.46s/it] +2025-02-05 11:58:32 - ERROR - stderr - +2025-02-05 11:58:32 - ERROR - stderr - +2025-02-05 11:58:32 - INFO - stdout - {'loss': 0.9583, 'grad_norm': 1.0755975246429443, 'learning_rate': 1.9601081893820048e-05, 'epoch': 0.35} +2025-02-05 11:58:32 - ERROR - stderr - 12%|█▏ | 2637/22434 [1:50:52<13:33:02, 2.46s/it] +2025-02-05 11:58:35 - ERROR - stderr - 12%|█▏ | 2638/22434 [1:50:55<14:12:26, 2.58s/it] +2025-02-05 11:58:35 - ERROR - stderr - +2025-02-05 11:58:35 - ERROR - stderr - +2025-02-05 11:58:35 - INFO - stdout - {'loss': 0.9901, 'grad_norm': 1.2134389877319336, 'learning_rate': 1.9600678080625685e-05, 'epoch': 0.35} +2025-02-05 11:58:35 - ERROR - stderr - 12%|█▏ | 2638/22434 [1:50:55<14:12:26, 2.58s/it] +2025-02-05 11:58:37 - ERROR - stderr - 12%|█▏ | 2639/22434 [1:50:57<14:05:06, 2.56s/it] +2025-02-05 11:58:37 - ERROR - stderr - +2025-02-05 11:58:37 - ERROR - stderr - +2025-02-05 11:58:37 - INFO - stdout - {'loss': 1.0353, 'grad_norm': 1.1847506761550903, 'learning_rate': 1.9600274067314374e-05, 'epoch': 0.35} +2025-02-05 11:58:37 - ERROR - stderr - 12%|█▏ | 2639/22434 [1:50:57<14:05:06, 2.56s/it] +2025-02-05 11:58:40 - ERROR - stderr - 12%|█▏ | 2640/22434 [1:50:59<13:56:19, 2.54s/it] +2025-02-05 11:58:40 - ERROR - stderr - +2025-02-05 11:58:40 - ERROR - stderr - +2025-02-05 11:58:40 - INFO - stdout - {'loss': 0.9557, 'grad_norm': 1.3278470039367676, 
'learning_rate': 1.959986985389454e-05, 'epoch': 0.35} +2025-02-05 11:58:40 - ERROR - stderr - 12%|█▏ | 2640/22434 [1:51:00<13:56:19, 2.54s/it] +2025-02-05 11:58:42 - ERROR - stderr - 12%|█▏ | 2641/22434 [1:51:02<13:55:21, 2.53s/it] +2025-02-05 11:58:42 - ERROR - stderr - +2025-02-05 11:58:42 - ERROR - stderr - +2025-02-05 11:58:42 - INFO - stdout - {'loss': 1.133, 'grad_norm': 1.1818082332611084, 'learning_rate': 1.95994654403746e-05, 'epoch': 0.35} +2025-02-05 11:58:42 - ERROR - stderr - 12%|█▏ | 2641/22434 [1:51:02<13:55:21, 2.53s/it] +2025-02-05 11:58:45 - ERROR - stderr - 12%|█▏ | 2642/22434 [1:51:05<13:51:27, 2.52s/it] +2025-02-05 11:58:45 - ERROR - stderr - +2025-02-05 11:58:45 - ERROR - stderr - +2025-02-05 11:58:45 - INFO - stdout - {'loss': 0.9336, 'grad_norm': 1.100904107093811, 'learning_rate': 1.959906082676299e-05, 'epoch': 0.35} +2025-02-05 11:58:45 - ERROR - stderr - 12%|█▏ | 2642/22434 [1:51:05<13:51:27, 2.52s/it] +2025-02-05 11:58:47 - ERROR - stderr - 12%|█▏ | 2643/22434 [1:51:07<13:53:31, 2.53s/it] +2025-02-05 11:58:47 - ERROR - stderr - +2025-02-05 11:58:47 - ERROR - stderr - +2025-02-05 11:58:47 - INFO - stdout - {'loss': 0.8484, 'grad_norm': 1.0586740970611572, 'learning_rate': 1.9598656013068145e-05, 'epoch': 0.35} +2025-02-05 11:58:47 - ERROR - stderr - 12%|█▏ | 2643/22434 [1:51:07<13:53:31, 2.53s/it] +2025-02-05 11:58:50 - ERROR - stderr - 12%|█▏ | 2644/22434 [1:51:10<13:45:47, 2.50s/it] +2025-02-05 11:58:50 - ERROR - stderr - +2025-02-05 11:58:50 - ERROR - stderr - +2025-02-05 11:58:50 - INFO - stdout - {'loss': 0.9348, 'grad_norm': 1.056347131729126, 'learning_rate': 1.9598250999298495e-05, 'epoch': 0.35} +2025-02-05 11:58:50 - ERROR - stderr - 12%|█▏ | 2644/22434 [1:51:10<13:45:47, 2.50s/it] +2025-02-05 11:58:53 - ERROR - stderr - 12%|█▏ | 2645/22434 [1:51:12<14:17:34, 2.60s/it] +2025-02-05 11:58:53 - ERROR - stderr - +2025-02-05 11:58:53 - ERROR - stderr - +2025-02-05 11:58:53 - INFO - stdout - {'loss': 0.9324, 'grad_norm': 
1.1483207941055298, 'learning_rate': 1.9597845785462492e-05, 'epoch': 0.35} +2025-02-05 11:58:53 - ERROR - stderr - 12%|█▏ | 2645/22434 [1:51:12<14:17:34, 2.60s/it] +2025-02-05 11:58:55 - ERROR - stderr - 12%|█▏ | 2646/22434 [1:51:15<14:13:45, 2.59s/it] +2025-02-05 11:58:55 - ERROR - stderr - +2025-02-05 11:58:55 - ERROR - stderr - +2025-02-05 11:58:55 - INFO - stdout - {'loss': 1.0206, 'grad_norm': 1.149651288986206, 'learning_rate': 1.9597440371568576e-05, 'epoch': 0.35} +2025-02-05 11:58:55 - ERROR - stderr - 12%|█▏ | 2646/22434 [1:51:15<14:13:45, 2.59s/it] +2025-02-05 11:58:58 - ERROR - stderr - 12%|█▏ | 2647/22434 [1:51:17<13:59:29, 2.55s/it] +2025-02-05 11:58:58 - ERROR - stderr - +2025-02-05 11:58:58 - ERROR - stderr - +2025-02-05 11:58:58 - INFO - stdout - {'loss': 0.9694, 'grad_norm': 1.1656427383422852, 'learning_rate': 1.95970347576252e-05, 'epoch': 0.35} +2025-02-05 11:58:58 - ERROR - stderr - 12%|█▏ | 2647/22434 [1:51:17<13:59:29, 2.55s/it] +2025-02-05 11:59:00 - ERROR - stderr - 12%|█▏ | 2648/22434 [1:51:20<13:51:57, 2.52s/it] +2025-02-05 11:59:00 - ERROR - stderr - +2025-02-05 11:59:00 - ERROR - stderr - +2025-02-05 11:59:00 - INFO - stdout - {'loss': 0.999, 'grad_norm': 1.1961395740509033, 'learning_rate': 1.9596628943640817e-05, 'epoch': 0.35} +2025-02-05 11:59:00 - ERROR - stderr - 12%|█▏ | 2648/22434 [1:51:20<13:51:57, 2.52s/it] +2025-02-05 11:59:03 - ERROR - stderr - 12%|█▏ | 2649/22434 [1:51:22<13:47:47, 2.51s/it] +2025-02-05 11:59:03 - ERROR - stderr - +2025-02-05 11:59:03 - ERROR - stderr - +2025-02-05 11:59:03 - INFO - stdout - {'loss': 1.0927, 'grad_norm': 1.1476325988769531, 'learning_rate': 1.9596222929623888e-05, 'epoch': 0.35} +2025-02-05 11:59:03 - ERROR - stderr - 12%|█▏ | 2649/22434 [1:51:22<13:47:47, 2.51s/it] +2025-02-05 11:59:05 - ERROR - stderr - 12%|█▏ | 2650/22434 [1:51:25<13:58:19, 2.54s/it] +2025-02-05 11:59:05 - ERROR - stderr - +2025-02-05 11:59:05 - ERROR - stderr - +2025-02-05 11:59:05 - INFO - stdout - {'loss': 0.9684, 
'grad_norm': 1.179354190826416, 'learning_rate': 1.9595816715582873e-05, 'epoch': 0.35} +2025-02-05 11:59:05 - ERROR - stderr - 12%|█▏ | 2650/22434 [1:51:25<13:58:19, 2.54s/it] +2025-02-05 11:59:08 - ERROR - stderr - 12%|█▏ | 2651/22434 [1:51:27<13:56:41, 2.54s/it] +2025-02-05 11:59:08 - ERROR - stderr - +2025-02-05 11:59:08 - ERROR - stderr - +2025-02-05 11:59:08 - INFO - stdout - {'loss': 1.045, 'grad_norm': 1.2051736116409302, 'learning_rate': 1.959541030152624e-05, 'epoch': 0.35} +2025-02-05 11:59:08 - ERROR - stderr - 12%|█▏ | 2651/22434 [1:51:27<13:56:41, 2.54s/it] +2025-02-05 11:59:10 - ERROR - stderr - 12%|█▏ | 2652/22434 [1:51:30<13:45:46, 2.50s/it] +2025-02-05 11:59:10 - ERROR - stderr - +2025-02-05 11:59:10 - ERROR - stderr - +2025-02-05 11:59:10 - INFO - stdout - {'loss': 1.0269, 'grad_norm': 1.286818504333496, 'learning_rate': 1.9595003687462463e-05, 'epoch': 0.35} +2025-02-05 11:59:10 - ERROR - stderr - 12%|█▏ | 2652/22434 [1:51:30<13:45:46, 2.50s/it] +2025-02-05 11:59:13 - ERROR - stderr - 12%|█▏ | 2653/22434 [1:51:32<13:44:18, 2.50s/it] +2025-02-05 11:59:13 - ERROR - stderr - +2025-02-05 11:59:13 - ERROR - stderr - +2025-02-05 11:59:13 - INFO - stdout - {'loss': 1.0408, 'grad_norm': 1.108031988143921, 'learning_rate': 1.9594596873400015e-05, 'epoch': 0.35} +2025-02-05 11:59:13 - ERROR - stderr - 12%|█▏ | 2653/22434 [1:51:32<13:44:18, 2.50s/it] +2025-02-05 11:59:15 - ERROR - stderr - 12%|█▏ | 2654/22434 [1:51:35<13:48:35, 2.51s/it] +2025-02-05 11:59:15 - ERROR - stderr - +2025-02-05 11:59:15 - ERROR - stderr - +2025-02-05 11:59:15 - INFO - stdout - {'loss': 0.9333, 'grad_norm': 1.1158322095870972, 'learning_rate': 1.9594189859347376e-05, 'epoch': 0.35} +2025-02-05 11:59:15 - ERROR - stderr - 12%|█▏ | 2654/22434 [1:51:35<13:48:35, 2.51s/it] +2025-02-05 11:59:18 - ERROR - stderr - 12%|█▏ | 2655/22434 [1:51:37<13:50:16, 2.52s/it] +2025-02-05 11:59:18 - ERROR - stderr - +2025-02-05 11:59:18 - ERROR - stderr - +2025-02-05 11:59:18 - INFO - stdout - 
{'loss': 0.8743, 'grad_norm': 1.1174850463867188, 'learning_rate': 1.959378264531303e-05, 'epoch': 0.36} +2025-02-05 11:59:18 - ERROR - stderr - 12%|█▏ | 2655/22434 [1:51:37<13:50:16, 2.52s/it] +2025-02-05 11:59:20 - ERROR - stderr - 12%|█▏ | 2656/22434 [1:51:40<14:03:30, 2.56s/it] +2025-02-05 11:59:20 - ERROR - stderr - +2025-02-05 11:59:20 - ERROR - stderr - +2025-02-05 11:59:20 - INFO - stdout - {'loss': 0.8946, 'grad_norm': 1.0827534198760986, 'learning_rate': 1.9593375231305466e-05, 'epoch': 0.36} +2025-02-05 11:59:20 - ERROR - stderr - 12%|█▏ | 2656/22434 [1:51:40<14:03:30, 2.56s/it] +2025-02-05 11:59:23 - ERROR - stderr - 12%|█▏ | 2657/22434 [1:51:43<14:03:04, 2.56s/it] +2025-02-05 11:59:23 - ERROR - stderr - +2025-02-05 11:59:23 - ERROR - stderr - +2025-02-05 11:59:23 - INFO - stdout - {'loss': 0.893, 'grad_norm': 1.028654932975769, 'learning_rate': 1.959296761733317e-05, 'epoch': 0.36} +2025-02-05 11:59:23 - ERROR - stderr - 12%|█▏ | 2657/22434 [1:51:43<14:03:04, 2.56s/it] +2025-02-05 11:59:25 - ERROR - stderr - 12%|█▏ | 2658/22434 [1:51:45<13:50:24, 2.52s/it] +2025-02-05 11:59:25 - ERROR - stderr - +2025-02-05 11:59:25 - ERROR - stderr - +2025-02-05 11:59:25 - INFO - stdout - {'loss': 0.9932, 'grad_norm': 1.186279296875, 'learning_rate': 1.9592559803404652e-05, 'epoch': 0.36} +2025-02-05 11:59:25 - ERROR - stderr - 12%|█▏ | 2658/22434 [1:51:45<13:50:24, 2.52s/it] +2025-02-05 11:59:28 - ERROR - stderr - 12%|█▏ | 2659/22434 [1:51:48<13:45:52, 2.51s/it] +2025-02-05 11:59:28 - ERROR - stderr - +2025-02-05 11:59:28 - ERROR - stderr - +2025-02-05 11:59:28 - INFO - stdout - {'loss': 1.0447, 'grad_norm': 1.1797289848327637, 'learning_rate': 1.9592151789528397e-05, 'epoch': 0.36} +2025-02-05 11:59:28 - ERROR - stderr - 12%|█▏ | 2659/22434 [1:51:48<13:45:52, 2.51s/it] +2025-02-05 11:59:30 - ERROR - stderr - 12%|█▏ | 2660/22434 [1:51:50<13:49:22, 2.52s/it] +2025-02-05 11:59:30 - ERROR - stderr - +2025-02-05 11:59:30 - ERROR - stderr - +2025-02-05 11:59:30 - INFO - 
stdout - {'loss': 1.0292, 'grad_norm': 1.1956654787063599, 'learning_rate': 1.959174357571292e-05, 'epoch': 0.36} +2025-02-05 11:59:30 - ERROR - stderr - 12%|█▏ | 2660/22434 [1:51:50<13:49:22, 2.52s/it] +2025-02-05 11:59:33 - ERROR - stderr - 12%|█▏ | 2661/22434 [1:51:53<14:34:36, 2.65s/it] +2025-02-05 11:59:33 - ERROR - stderr - +2025-02-05 11:59:33 - ERROR - stderr - +2025-02-05 11:59:33 - INFO - stdout - {'loss': 1.0862, 'grad_norm': 1.1413626670837402, 'learning_rate': 1.9591335161966725e-05, 'epoch': 0.36} +2025-02-05 11:59:33 - ERROR - stderr - 12%|█▏ | 2661/22434 [1:51:53<14:34:36, 2.65s/it] +2025-02-05 11:59:36 - ERROR - stderr - 12%|█▏ | 2662/22434 [1:51:56<14:20:19, 2.61s/it] +2025-02-05 11:59:36 - ERROR - stderr - +2025-02-05 11:59:36 - ERROR - stderr - +2025-02-05 11:59:36 - INFO - stdout - {'loss': 0.9734, 'grad_norm': 1.0182641744613647, 'learning_rate': 1.959092654829833e-05, 'epoch': 0.36} +2025-02-05 11:59:36 - ERROR - stderr - 12%|█▏ | 2662/22434 [1:51:56<14:20:19, 2.61s/it] +2025-02-05 11:59:38 - ERROR - stderr - 12%|█▏ | 2663/22434 [1:51:58<14:09:50, 2.58s/it] +2025-02-05 11:59:38 - ERROR - stderr - +2025-02-05 11:59:38 - ERROR - stderr - +2025-02-05 11:59:38 - INFO - stdout - {'loss': 1.0722, 'grad_norm': 1.2872415781021118, 'learning_rate': 1.9590517734716244e-05, 'epoch': 0.36} +2025-02-05 11:59:38 - ERROR - stderr - 12%|█▏ | 2663/22434 [1:51:58<14:09:50, 2.58s/it] +2025-02-05 11:59:41 - ERROR - stderr - 12%|█▏ | 2664/22434 [1:52:01<13:59:46, 2.55s/it] +2025-02-05 11:59:41 - ERROR - stderr - +2025-02-05 11:59:41 - ERROR - stderr - +2025-02-05 11:59:41 - INFO - stdout - {'loss': 1.0597, 'grad_norm': 1.2341710329055786, 'learning_rate': 1.9590108721228994e-05, 'epoch': 0.36} +2025-02-05 11:59:41 - ERROR - stderr - 12%|█▏ | 2664/22434 [1:52:01<13:59:46, 2.55s/it] +2025-02-05 11:59:43 - ERROR - stderr - 12%|█▏ | 2665/22434 [1:52:03<13:59:40, 2.55s/it] +2025-02-05 11:59:43 - ERROR - stderr - +2025-02-05 11:59:43 - ERROR - stderr - +2025-02-05 
11:59:43 - INFO - stdout - {'loss': 1.1569, 'grad_norm': 1.129207968711853, 'learning_rate': 1.9589699507845106e-05, 'epoch': 0.36} +2025-02-05 11:59:43 - ERROR - stderr - 12%|█▏ | 2665/22434 [1:52:03<13:59:40, 2.55s/it] +2025-02-05 11:59:46 - ERROR - stderr - 12%|█▏ | 2666/22434 [1:52:06<13:48:02, 2.51s/it] +2025-02-05 11:59:46 - ERROR - stderr - +2025-02-05 11:59:46 - ERROR - stderr - +2025-02-05 11:59:46 - INFO - stdout - {'loss': 1.0741, 'grad_norm': 1.3279445171356201, 'learning_rate': 1.958929009457311e-05, 'epoch': 0.36} +2025-02-05 11:59:46 - ERROR - stderr - 12%|█▏ | 2666/22434 [1:52:06<13:48:02, 2.51s/it] +2025-02-05 11:59:48 - ERROR - stderr - 12%|█▏ | 2667/22434 [1:52:08<13:42:12, 2.50s/it] +2025-02-05 11:59:48 - ERROR - stderr - +2025-02-05 11:59:48 - ERROR - stderr - +2025-02-05 11:59:48 - INFO - stdout - {'loss': 0.9052, 'grad_norm': 1.0417251586914062, 'learning_rate': 1.9588880481421537e-05, 'epoch': 0.36} +2025-02-05 11:59:48 - ERROR - stderr - 12%|█▏ | 2667/22434 [1:52:08<13:42:12, 2.50s/it] +2025-02-05 11:59:51 - ERROR - stderr - 12%|█▏ | 2668/22434 [1:52:10<13:35:30, 2.48s/it] +2025-02-05 11:59:51 - ERROR - stderr - +2025-02-05 11:59:51 - ERROR - stderr - +2025-02-05 11:59:51 - INFO - stdout - {'loss': 0.9439, 'grad_norm': 1.086450457572937, 'learning_rate': 1.958847066839892e-05, 'epoch': 0.36} +2025-02-05 11:59:51 - ERROR - stderr - 12%|█▏ | 2668/22434 [1:52:10<13:35:30, 2.48s/it] +2025-02-05 11:59:53 - ERROR - stderr - 12%|█▏ | 2669/22434 [1:52:13<13:44:30, 2.50s/it] +2025-02-05 11:59:53 - ERROR - stderr - +2025-02-05 11:59:53 - ERROR - stderr - +2025-02-05 11:59:53 - INFO - stdout - {'loss': 1.0193, 'grad_norm': 1.0320312976837158, 'learning_rate': 1.9588060655513814e-05, 'epoch': 0.36} +2025-02-05 11:59:53 - ERROR - stderr - 12%|█▏ | 2669/22434 [1:52:13<13:44:30, 2.50s/it] +2025-02-05 11:59:56 - ERROR - stderr - 12%|█▏ | 2670/22434 [1:52:16<13:56:54, 2.54s/it] +2025-02-05 11:59:56 - ERROR - stderr - +2025-02-05 11:59:56 - ERROR - stderr - 
+2025-02-05 11:59:56 - INFO - stdout - {'loss': 1.041, 'grad_norm': 1.278128981590271, 'learning_rate': 1.9587650442774756e-05, 'epoch': 0.36} +2025-02-05 11:59:56 - ERROR - stderr - 12%|█▏ | 2670/22434 [1:52:16<13:56:54, 2.54s/it] +2025-02-05 11:59:58 - ERROR - stderr - 12%|█▏ | 2671/22434 [1:52:18<13:46:02, 2.51s/it] +2025-02-05 11:59:58 - ERROR - stderr - +2025-02-05 11:59:58 - ERROR - stderr - +2025-02-05 11:59:58 - INFO - stdout - {'loss': 0.9989, 'grad_norm': 1.1445777416229248, 'learning_rate': 1.9587240030190298e-05, 'epoch': 0.36} +2025-02-05 11:59:58 - ERROR - stderr - 12%|█▏ | 2671/22434 [1:52:18<13:46:02, 2.51s/it] +2025-02-05 12:00:01 - ERROR - stderr - 12%|█▏ | 2672/22434 [1:52:21<14:05:50, 2.57s/it] +2025-02-05 12:00:01 - ERROR - stderr - +2025-02-05 12:00:01 - ERROR - stderr - +2025-02-05 12:00:01 - INFO - stdout - {'loss': 0.9621, 'grad_norm': 1.210056185722351, 'learning_rate': 1.9586829417768995e-05, 'epoch': 0.36} +2025-02-05 12:00:01 - ERROR - stderr - 12%|█▏ | 2672/22434 [1:52:21<14:05:50, 2.57s/it] +2025-02-05 12:00:03 - ERROR - stderr - 12%|█▏ | 2673/22434 [1:52:23<13:55:15, 2.54s/it] +2025-02-05 12:00:03 - ERROR - stderr - +2025-02-05 12:00:03 - ERROR - stderr - +2025-02-05 12:00:03 - INFO - stdout - {'loss': 0.9983, 'grad_norm': 1.16221284866333, 'learning_rate': 1.9586418605519407e-05, 'epoch': 0.36} +2025-02-05 12:00:03 - ERROR - stderr - 12%|█▏ | 2673/22434 [1:52:23<13:55:15, 2.54s/it] +2025-02-05 12:00:06 - ERROR - stderr - 12%|█▏ | 2674/22434 [1:52:26<13:50:38, 2.52s/it] +2025-02-05 12:00:06 - ERROR - stderr - +2025-02-05 12:00:06 - ERROR - stderr - +2025-02-05 12:00:06 - INFO - stdout - {'loss': 0.9709, 'grad_norm': 1.1673552989959717, 'learning_rate': 1.9586007593450098e-05, 'epoch': 0.36} +2025-02-05 12:00:06 - ERROR - stderr - 12%|█▏ | 2674/22434 [1:52:26<13:50:38, 2.52s/it] +2025-02-05 12:00:09 - ERROR - stderr - 12%|█▏ | 2675/22434 [1:52:28<14:00:03, 2.55s/it] +2025-02-05 12:00:09 - ERROR - stderr - +2025-02-05 12:00:09 - ERROR 
- stderr - +2025-02-05 12:00:09 - INFO - stdout - {'loss': 0.9639, 'grad_norm': 1.076001763343811, 'learning_rate': 1.958559638156963e-05, 'epoch': 0.36} +2025-02-05 12:00:09 - ERROR - stderr - 12%|█▏ | 2675/22434 [1:52:28<14:00:03, 2.55s/it] +2025-02-05 12:00:11 - ERROR - stderr - 12%|█▏ | 2676/22434 [1:52:31<13:54:29, 2.53s/it] +2025-02-05 12:00:11 - ERROR - stderr - +2025-02-05 12:00:11 - ERROR - stderr - +2025-02-05 12:00:11 - INFO - stdout - {'loss': 0.993, 'grad_norm': 1.1253043413162231, 'learning_rate': 1.9585184969886585e-05, 'epoch': 0.36} +2025-02-05 12:00:11 - ERROR - stderr - 12%|█▏ | 2676/22434 [1:52:31<13:54:29, 2.53s/it] +2025-02-05 12:00:14 - ERROR - stderr - 12%|█▏ | 2677/22434 [1:52:33<13:48:15, 2.52s/it] +2025-02-05 12:00:14 - ERROR - stderr - +2025-02-05 12:00:14 - ERROR - stderr - +2025-02-05 12:00:14 - INFO - stdout - {'loss': 1.0439, 'grad_norm': 1.1168324947357178, 'learning_rate': 1.9584773358409525e-05, 'epoch': 0.36} +2025-02-05 12:00:14 - ERROR - stderr - 12%|█▏ | 2677/22434 [1:52:33<13:48:15, 2.52s/it] +2025-02-05 12:00:16 - ERROR - stderr - 12%|█▏ | 2678/22434 [1:52:36<13:46:12, 2.51s/it] +2025-02-05 12:00:16 - ERROR - stderr - +2025-02-05 12:00:16 - ERROR - stderr - +2025-02-05 12:00:16 - INFO - stdout - {'loss': 1.0543, 'grad_norm': 1.2184659242630005, 'learning_rate': 1.9584361547147036e-05, 'epoch': 0.36} +2025-02-05 12:00:16 - ERROR - stderr - 12%|█▏ | 2678/22434 [1:52:36<13:46:12, 2.51s/it] +2025-02-05 12:00:18 - ERROR - stderr - 12%|█▏ | 2679/22434 [1:52:38<13:43:37, 2.50s/it] +2025-02-05 12:00:19 - ERROR - stderr - +2025-02-05 12:00:19 - ERROR - stderr - +2025-02-05 12:00:19 - INFO - stdout - {'loss': 1.0282, 'grad_norm': 1.2141841650009155, 'learning_rate': 1.9583949536107706e-05, 'epoch': 0.36} +2025-02-05 12:00:19 - ERROR - stderr - 12%|█▏ | 2679/22434 [1:52:38<13:43:37, 2.50s/it] +2025-02-05 12:00:21 - ERROR - stderr - 12%|█▏ | 2680/22434 [1:52:41<13:42:46, 2.50s/it] +2025-02-05 12:00:21 - ERROR - stderr - +2025-02-05 
12:00:21 - ERROR - stderr - +2025-02-05 12:00:21 - INFO - stdout - {'loss': 0.9203, 'grad_norm': 1.1196305751800537, 'learning_rate': 1.9583537325300118e-05, 'epoch': 0.36} +2025-02-05 12:00:21 - ERROR - stderr - 12%|█▏ | 2680/22434 [1:52:41<13:42:46, 2.50s/it] +2025-02-05 12:00:23 - ERROR - stderr - 12%|█▏ | 2681/22434 [1:52:43<13:38:04, 2.48s/it] +2025-02-05 12:00:23 - ERROR - stderr - +2025-02-05 12:00:23 - ERROR - stderr - +2025-02-05 12:00:23 - INFO - stdout - {'loss': 0.9521, 'grad_norm': 1.2981374263763428, 'learning_rate': 1.958312491473286e-05, 'epoch': 0.36} +2025-02-05 12:00:23 - ERROR - stderr - 12%|█▏ | 2681/22434 [1:52:43<13:38:04, 2.48s/it] +2025-02-05 12:00:26 - ERROR - stderr - 12%|█▏ | 2682/22434 [1:52:46<13:34:19, 2.47s/it] +2025-02-05 12:00:26 - ERROR - stderr - +2025-02-05 12:00:26 - ERROR - stderr - +2025-02-05 12:00:26 - INFO - stdout - {'loss': 0.9953, 'grad_norm': 1.1890085935592651, 'learning_rate': 1.9582712304414538e-05, 'epoch': 0.36} +2025-02-05 12:00:26 - ERROR - stderr - 12%|█▏ | 2682/22434 [1:52:46<13:34:19, 2.47s/it] +2025-02-05 12:00:28 - ERROR - stderr - 12%|█▏ | 2683/22434 [1:52:48<13:34:01, 2.47s/it] +2025-02-05 12:00:28 - ERROR - stderr - +2025-02-05 12:00:28 - ERROR - stderr - +2025-02-05 12:00:28 - INFO - stdout - {'loss': 1.0206, 'grad_norm': 1.1681146621704102, 'learning_rate': 1.958229949435375e-05, 'epoch': 0.36} +2025-02-05 12:00:28 - ERROR - stderr - 12%|█▏ | 2683/22434 [1:52:48<13:34:01, 2.47s/it] +2025-02-05 12:00:31 - ERROR - stderr - 12%|█▏ | 2684/22434 [1:52:51<13:45:20, 2.51s/it] +2025-02-05 12:00:31 - ERROR - stderr - +2025-02-05 12:00:31 - ERROR - stderr - +2025-02-05 12:00:31 - INFO - stdout - {'loss': 1.0129, 'grad_norm': 1.2158714532852173, 'learning_rate': 1.958188648455909e-05, 'epoch': 0.36} +2025-02-05 12:00:31 - ERROR - stderr - 12%|█▏ | 2684/22434 [1:52:51<13:45:20, 2.51s/it] +2025-02-05 12:00:34 - ERROR - stderr - 12%|█▏ | 2685/22434 [1:52:53<13:52:04, 2.53s/it] +2025-02-05 12:00:34 - ERROR - stderr - 
+2025-02-05 12:00:34 - ERROR - stderr - +2025-02-05 12:00:34 - INFO - stdout - {'loss': 0.8771, 'grad_norm': 1.080311894416809, 'learning_rate': 1.958147327503918e-05, 'epoch': 0.36} +2025-02-05 12:00:34 - ERROR - stderr - 12%|█▏ | 2685/22434 [1:52:53<13:52:04, 2.53s/it] +2025-02-05 12:00:36 - ERROR - stderr - 12%|█▏ | 2686/22434 [1:52:56<13:50:27, 2.52s/it] +2025-02-05 12:00:36 - ERROR - stderr - +2025-02-05 12:00:36 - ERROR - stderr - +2025-02-05 12:00:36 - INFO - stdout - {'loss': 1.0446, 'grad_norm': 1.1317156553268433, 'learning_rate': 1.9581059865802627e-05, 'epoch': 0.36} +2025-02-05 12:00:36 - ERROR - stderr - 12%|█▏ | 2686/22434 [1:52:56<13:50:27, 2.52s/it] +2025-02-05 12:00:38 - ERROR - stderr - 12%|█▏ | 2687/22434 [1:52:58<13:40:24, 2.49s/it] +2025-02-05 12:00:38 - ERROR - stderr - +2025-02-05 12:00:38 - ERROR - stderr - +2025-02-05 12:00:38 - INFO - stdout - {'loss': 0.9078, 'grad_norm': 1.175309419631958, 'learning_rate': 1.9580646256858048e-05, 'epoch': 0.36} +2025-02-05 12:00:38 - ERROR - stderr - 12%|█▏ | 2687/22434 [1:52:58<13:40:24, 2.49s/it] +2025-02-05 12:00:41 - ERROR - stderr - 12%|█▏ | 2688/22434 [1:53:01<13:36:54, 2.48s/it] +2025-02-05 12:00:41 - ERROR - stderr - +2025-02-05 12:00:41 - ERROR - stderr - +2025-02-05 12:00:41 - INFO - stdout - {'loss': 0.946, 'grad_norm': 1.1587311029434204, 'learning_rate': 1.9580232448214067e-05, 'epoch': 0.36} +2025-02-05 12:00:41 - ERROR - stderr - 12%|█▏ | 2688/22434 [1:53:01<13:36:54, 2.48s/it] +2025-02-05 12:00:43 - ERROR - stderr - 12%|█▏ | 2689/22434 [1:53:03<13:40:19, 2.49s/it] +2025-02-05 12:00:43 - ERROR - stderr - +2025-02-05 12:00:43 - ERROR - stderr - +2025-02-05 12:00:43 - INFO - stdout - {'loss': 1.1262, 'grad_norm': 1.2271808385849, 'learning_rate': 1.957981843987931e-05, 'epoch': 0.36} +2025-02-05 12:00:43 - ERROR - stderr - 12%|█▏ | 2689/22434 [1:53:03<13:40:19, 2.49s/it] +2025-02-05 12:00:46 - ERROR - stderr - 12%|█▏ | 2690/22434 [1:53:06<13:37:38, 2.48s/it] +2025-02-05 12:00:46 - ERROR - 
stderr - +2025-02-05 12:00:46 - ERROR - stderr - +2025-02-05 12:00:46 - INFO - stdout - {'loss': 0.9796, 'grad_norm': 1.142259120941162, 'learning_rate': 1.9579404231862403e-05, 'epoch': 0.36} +2025-02-05 12:00:46 - ERROR - stderr - 12%|█▏ | 2690/22434 [1:53:06<13:37:38, 2.48s/it] +2025-02-05 12:00:48 - ERROR - stderr - 12%|█▏ | 2691/22434 [1:53:08<13:46:10, 2.51s/it] +2025-02-05 12:00:49 - ERROR - stderr - +2025-02-05 12:00:49 - ERROR - stderr - +2025-02-05 12:00:49 - INFO - stdout - {'loss': 0.9748, 'grad_norm': 1.247002124786377, 'learning_rate': 1.9578989824171982e-05, 'epoch': 0.36} +2025-02-05 12:00:49 - ERROR - stderr - 12%|█▏ | 2691/22434 [1:53:08<13:46:10, 2.51s/it] +2025-02-05 12:00:51 - ERROR - stderr - 12%|█▏ | 2692/22434 [1:53:11<13:43:50, 2.50s/it] +2025-02-05 12:00:51 - ERROR - stderr - +2025-02-05 12:00:51 - ERROR - stderr - +2025-02-05 12:00:51 - INFO - stdout - {'loss': 0.9902, 'grad_norm': 1.1332519054412842, 'learning_rate': 1.957857521681668e-05, 'epoch': 0.36} +2025-02-05 12:00:51 - ERROR - stderr - 12%|█▏ | 2692/22434 [1:53:11<13:43:50, 2.50s/it] +2025-02-05 12:00:53 - ERROR - stderr - 12%|█▏ | 2693/22434 [1:53:13<13:46:53, 2.51s/it] +2025-02-05 12:00:54 - ERROR - stderr - +2025-02-05 12:00:54 - ERROR - stderr - +2025-02-05 12:00:54 - INFO - stdout - {'loss': 0.9782, 'grad_norm': 1.2020732164382935, 'learning_rate': 1.957816040980515e-05, 'epoch': 0.36} +2025-02-05 12:00:54 - ERROR - stderr - 12%|█▏ | 2693/22434 [1:53:13<13:46:53, 2.51s/it] +2025-02-05 12:00:56 - ERROR - stderr - 12%|█▏ | 2694/22434 [1:53:16<13:42:13, 2.50s/it] +2025-02-05 12:00:56 - ERROR - stderr - +2025-02-05 12:00:56 - ERROR - stderr - +2025-02-05 12:00:56 - INFO - stdout - {'loss': 0.9771, 'grad_norm': 1.2204875946044922, 'learning_rate': 1.9577745403146026e-05, 'epoch': 0.36} +2025-02-05 12:00:56 - ERROR - stderr - 12%|█▏ | 2694/22434 [1:53:16<13:42:13, 2.50s/it] +2025-02-05 12:00:58 - ERROR - stderr - 12%|█▏ | 2695/22434 [1:53:18<13:39:25, 2.49s/it] +2025-02-05 
12:00:58 - ERROR - stderr - +2025-02-05 12:00:58 - ERROR - stderr - +2025-02-05 12:00:58 - INFO - stdout - {'loss': 1.038, 'grad_norm': 1.0782824754714966, 'learning_rate': 1.9577330196847965e-05, 'epoch': 0.36} +2025-02-05 12:00:58 - ERROR - stderr - 12%|█▏ | 2695/22434 [1:53:18<13:39:25, 2.49s/it] +2025-02-05 12:01:01 - ERROR - stderr - 12%|█▏ | 2696/22434 [1:53:21<14:21:17, 2.62s/it] +2025-02-05 12:01:01 - ERROR - stderr - +2025-02-05 12:01:01 - ERROR - stderr - +2025-02-05 12:01:01 - INFO - stdout - {'loss': 1.0298, 'grad_norm': 1.1685690879821777, 'learning_rate': 1.9576914790919624e-05, 'epoch': 0.36} +2025-02-05 12:01:01 - ERROR - stderr - 12%|█▏ | 2696/22434 [1:53:21<14:21:17, 2.62s/it] +2025-02-05 12:01:04 - ERROR - stderr - 12%|█▏ | 2697/22434 [1:53:24<14:11:28, 2.59s/it] +2025-02-05 12:01:04 - ERROR - stderr - +2025-02-05 12:01:04 - ERROR - stderr - +2025-02-05 12:01:04 - INFO - stdout - {'loss': 0.9098, 'grad_norm': 1.0536532402038574, 'learning_rate': 1.9576499185369652e-05, 'epoch': 0.36} +2025-02-05 12:01:04 - ERROR - stderr - 12%|█▏ | 2697/22434 [1:53:24<14:11:28, 2.59s/it] +2025-02-05 12:01:06 - ERROR - stderr - 12%|█▏ | 2698/22434 [1:53:26<14:12:12, 2.59s/it] +2025-02-05 12:01:06 - ERROR - stderr - +2025-02-05 12:01:06 - ERROR - stderr - +2025-02-05 12:01:06 - INFO - stdout - {'loss': 0.9821, 'grad_norm': 1.263819932937622, 'learning_rate': 1.9576083380206724e-05, 'epoch': 0.36} +2025-02-05 12:01:06 - ERROR - stderr - 12%|█▏ | 2698/22434 [1:53:26<14:12:12, 2.59s/it] +2025-02-05 12:01:09 - ERROR - stderr - 12%|█▏ | 2699/22434 [1:53:29<14:02:22, 2.56s/it] +2025-02-05 12:01:09 - ERROR - stderr - +2025-02-05 12:01:09 - ERROR - stderr - +2025-02-05 12:01:09 - INFO - stdout - {'loss': 1.0891, 'grad_norm': 1.3008877038955688, 'learning_rate': 1.95756673754395e-05, 'epoch': 0.36} +2025-02-05 12:01:09 - ERROR - stderr - 12%|█▏ | 2699/22434 [1:53:29<14:02:22, 2.56s/it] +2025-02-05 12:01:11 - ERROR - stderr - 12%|█▏ | 2700/22434 [1:53:31<13:52:07, 2.53s/it] 
+2025-02-05 12:01:11 - ERROR - stderr - +2025-02-05 12:01:11 - ERROR - stderr - +2025-02-05 12:01:11 - INFO - stdout - {'loss': 0.926, 'grad_norm': 1.2156957387924194, 'learning_rate': 1.9575251171076652e-05, 'epoch': 0.36} +2025-02-05 12:01:11 - ERROR - stderr - 12%|█▏ | 2700/22434 [1:53:31<13:52:07, 2.53s/it] +2025-02-05 12:01:14 - ERROR - stderr - 12%|█▏ | 2701/22434 [1:53:34<13:43:43, 2.50s/it] +2025-02-05 12:01:14 - ERROR - stderr - +2025-02-05 12:01:14 - ERROR - stderr - +2025-02-05 12:01:14 - INFO - stdout - {'loss': 1.0309, 'grad_norm': 1.1306465864181519, 'learning_rate': 1.9574834767126855e-05, 'epoch': 0.36} +2025-02-05 12:01:14 - ERROR - stderr - 12%|█▏ | 2701/22434 [1:53:34<13:43:43, 2.50s/it] +2025-02-05 12:01:16 - ERROR - stderr - 12%|█▏ | 2702/22434 [1:53:36<13:47:48, 2.52s/it] +2025-02-05 12:01:16 - ERROR - stderr - +2025-02-05 12:01:16 - ERROR - stderr - +2025-02-05 12:01:16 - INFO - stdout - {'loss': 0.9483, 'grad_norm': 1.0821365118026733, 'learning_rate': 1.957441816359879e-05, 'epoch': 0.36} +2025-02-05 12:01:16 - ERROR - stderr - 12%|█▏ | 2702/22434 [1:53:36<13:47:48, 2.52s/it] +2025-02-05 12:01:19 - ERROR - stderr - 12%|█▏ | 2703/22434 [1:53:39<13:41:20, 2.50s/it] +2025-02-05 12:01:19 - ERROR - stderr - +2025-02-05 12:01:19 - ERROR - stderr - +2025-02-05 12:01:19 - INFO - stdout - {'loss': 1.0879, 'grad_norm': 1.6093029975891113, 'learning_rate': 1.957400136050114e-05, 'epoch': 0.36} +2025-02-05 12:01:19 - ERROR - stderr - 12%|█▏ | 2703/22434 [1:53:39<13:41:20, 2.50s/it] +2025-02-05 12:01:22 - ERROR - stderr - 12%|█▏ | 2704/22434 [1:53:41<14:07:39, 2.58s/it] +2025-02-05 12:01:22 - ERROR - stderr - +2025-02-05 12:01:22 - ERROR - stderr - +2025-02-05 12:01:22 - INFO - stdout - {'loss': 1.092, 'grad_norm': 1.121168851852417, 'learning_rate': 1.9573584357842592e-05, 'epoch': 0.36} +2025-02-05 12:01:22 - ERROR - stderr - 12%|█▏ | 2704/22434 [1:53:41<14:07:39, 2.58s/it] +2025-02-05 12:01:24 - ERROR - stderr - 12%|█▏ | 2705/22434 [1:53:44<14:25:55, 
2.63s/it] +2025-02-05 12:01:24 - ERROR - stderr - +2025-02-05 12:01:24 - ERROR - stderr - +2025-02-05 12:01:24 - INFO - stdout - {'loss': 1.0473, 'grad_norm': 1.1248654127120972, 'learning_rate': 1.957316715563184e-05, 'epoch': 0.36} +2025-02-05 12:01:24 - ERROR - stderr - 12%|█▏ | 2705/22434 [1:53:44<14:25:55, 2.63s/it] +2025-02-05 12:01:27 - ERROR - stderr - 12%|█▏ | 2706/22434 [1:53:47<14:05:37, 2.57s/it] +2025-02-05 12:01:27 - ERROR - stderr - +2025-02-05 12:01:27 - ERROR - stderr - +2025-02-05 12:01:27 - INFO - stdout - {'loss': 1.0133, 'grad_norm': 1.2085644006729126, 'learning_rate': 1.957274975387758e-05, 'epoch': 0.36} +2025-02-05 12:01:27 - ERROR - stderr - 12%|█▏ | 2706/22434 [1:53:47<14:05:37, 2.57s/it] +2025-02-05 12:01:29 - ERROR - stderr - 12%|█▏ | 2707/22434 [1:53:49<13:52:03, 2.53s/it] +2025-02-05 12:01:29 - ERROR - stderr - +2025-02-05 12:01:29 - ERROR - stderr - +2025-02-05 12:01:29 - INFO - stdout - {'loss': 0.9706, 'grad_norm': 1.1050665378570557, 'learning_rate': 1.9572332152588513e-05, 'epoch': 0.36} +2025-02-05 12:01:29 - ERROR - stderr - 12%|█▏ | 2707/22434 [1:53:49<13:52:03, 2.53s/it] +2025-02-05 12:01:32 - ERROR - stderr - 12%|█▏ | 2708/22434 [1:53:52<13:57:20, 2.55s/it] +2025-02-05 12:01:32 - ERROR - stderr - +2025-02-05 12:01:32 - ERROR - stderr - +2025-02-05 12:01:32 - INFO - stdout - {'loss': 0.9016, 'grad_norm': 1.1249905824661255, 'learning_rate': 1.957191435177334e-05, 'epoch': 0.36} +2025-02-05 12:01:32 - ERROR - stderr - 12%|█▏ | 2708/22434 [1:53:52<13:57:20, 2.55s/it] +2025-02-05 12:01:34 - ERROR - stderr - 12%|█▏ | 2709/22434 [1:53:54<13:58:07, 2.55s/it] +2025-02-05 12:01:34 - ERROR - stderr - +2025-02-05 12:01:34 - ERROR - stderr - +2025-02-05 12:01:34 - INFO - stdout - {'loss': 0.9942, 'grad_norm': 1.1558479070663452, 'learning_rate': 1.957149635144077e-05, 'epoch': 0.36} +2025-02-05 12:01:34 - ERROR - stderr - 12%|█▏ | 2709/22434 [1:53:54<13:58:07, 2.55s/it] +2025-02-05 12:01:37 - ERROR - stderr - 12%|█▏ | 2710/22434 
[1:53:57<13:55:49, 2.54s/it] +2025-02-05 12:01:37 - ERROR - stderr - +2025-02-05 12:01:37 - ERROR - stderr - +2025-02-05 12:01:37 - INFO - stdout - {'loss': 1.0187, 'grad_norm': 1.2220560312271118, 'learning_rate': 1.9571078151599517e-05, 'epoch': 0.36} +2025-02-05 12:01:37 - ERROR - stderr - 12%|█▏ | 2710/22434 [1:53:57<13:55:49, 2.54s/it] +2025-02-05 12:01:39 - ERROR - stderr - 12%|█▏ | 2711/22434 [1:53:59<13:56:06, 2.54s/it] +2025-02-05 12:01:39 - ERROR - stderr - +2025-02-05 12:01:39 - ERROR - stderr - +2025-02-05 12:01:39 - INFO - stdout - {'loss': 0.922, 'grad_norm': 1.073351263999939, 'learning_rate': 1.9570659752258302e-05, 'epoch': 0.36} +2025-02-05 12:01:39 - ERROR - stderr - 12%|█▏ | 2711/22434 [1:53:59<13:56:06, 2.54s/it] +2025-02-05 12:01:42 - ERROR - stderr - 12%|█▏ | 2712/22434 [1:54:02<14:09:18, 2.58s/it] +2025-02-05 12:01:42 - ERROR - stderr - +2025-02-05 12:01:42 - ERROR - stderr - +2025-02-05 12:01:42 - INFO - stdout - {'loss': 1.0319, 'grad_norm': 1.1340545415878296, 'learning_rate': 1.9570241153425842e-05, 'epoch': 0.36} +2025-02-05 12:01:42 - ERROR - stderr - 12%|█▏ | 2712/22434 [1:54:02<14:09:18, 2.58s/it] +2025-02-05 12:01:45 - ERROR - stderr - 12%|█▏ | 2713/22434 [1:54:04<13:53:50, 2.54s/it] +2025-02-05 12:01:45 - ERROR - stderr - +2025-02-05 12:01:45 - ERROR - stderr - +2025-02-05 12:01:45 - INFO - stdout - {'loss': 0.9037, 'grad_norm': 1.2663789987564087, 'learning_rate': 1.956982235511086e-05, 'epoch': 0.36} +2025-02-05 12:01:45 - ERROR - stderr - 12%|█▏ | 2713/22434 [1:54:04<13:53:50, 2.54s/it] +2025-02-05 12:01:47 - ERROR - stderr - 12%|█▏ | 2714/22434 [1:54:07<13:41:36, 2.50s/it] +2025-02-05 12:01:47 - ERROR - stderr - +2025-02-05 12:01:47 - ERROR - stderr - +2025-02-05 12:01:47 - INFO - stdout - {'loss': 1.0822, 'grad_norm': 1.3487099409103394, 'learning_rate': 1.956940335732209e-05, 'epoch': 0.36} +2025-02-05 12:01:47 - ERROR - stderr - 12%|█▏ | 2714/22434 [1:54:07<13:41:36, 2.50s/it] +2025-02-05 12:01:50 - ERROR - stderr - 12%|█▏ | 
2715/22434 [1:54:09<13:48:31, 2.52s/it] +2025-02-05 12:01:50 - ERROR - stderr - +2025-02-05 12:01:50 - ERROR - stderr - +2025-02-05 12:01:50 - INFO - stdout - {'loss': 0.9797, 'grad_norm': 1.1533018350601196, 'learning_rate': 1.9568984160068263e-05, 'epoch': 0.36} +2025-02-05 12:01:50 - ERROR - stderr - 12%|█▏ | 2715/22434 [1:54:09<13:48:31, 2.52s/it] +2025-02-05 12:01:52 - ERROR - stderr - 12%|█▏ | 2716/22434 [1:54:12<13:58:46, 2.55s/it] +2025-02-05 12:01:52 - ERROR - stderr - +2025-02-05 12:01:52 - ERROR - stderr - +2025-02-05 12:01:52 - INFO - stdout - {'loss': 0.9819, 'grad_norm': 1.0488159656524658, 'learning_rate': 1.956856476335812e-05, 'epoch': 0.36} +2025-02-05 12:01:52 - ERROR - stderr - 12%|█▏ | 2716/22434 [1:54:12<13:58:46, 2.55s/it] +2025-02-05 12:01:55 - ERROR - stderr - 12%|█▏ | 2717/22434 [1:54:14<13:47:16, 2.52s/it] +2025-02-05 12:01:55 - ERROR - stderr - +2025-02-05 12:01:55 - ERROR - stderr - +2025-02-05 12:01:55 - INFO - stdout - {'loss': 1.0159, 'grad_norm': 1.123511552810669, 'learning_rate': 1.9568145167200397e-05, 'epoch': 0.36} +2025-02-05 12:01:55 - ERROR - stderr - 12%|█▏ | 2717/22434 [1:54:14<13:47:16, 2.52s/it] +2025-02-05 12:01:57 - ERROR - stderr - 12%|█▏ | 2718/22434 [1:54:17<13:45:22, 2.51s/it] +2025-02-05 12:01:57 - ERROR - stderr - +2025-02-05 12:01:57 - ERROR - stderr - +2025-02-05 12:01:57 - INFO - stdout - {'loss': 0.9759, 'grad_norm': 1.1428534984588623, 'learning_rate': 1.9567725371603848e-05, 'epoch': 0.36} +2025-02-05 12:01:57 - ERROR - stderr - 12%|█▏ | 2718/22434 [1:54:17<13:45:22, 2.51s/it] +2025-02-05 12:02:00 - ERROR - stderr - 12%|█▏ | 2719/22434 [1:54:19<13:46:50, 2.52s/it] +2025-02-05 12:02:00 - ERROR - stderr - +2025-02-05 12:02:00 - ERROR - stderr - +2025-02-05 12:02:00 - INFO - stdout - {'loss': 0.9382, 'grad_norm': 1.1470234394073486, 'learning_rate': 1.956730537657722e-05, 'epoch': 0.36} +2025-02-05 12:02:00 - ERROR - stderr - 12%|█▏ | 2719/22434 [1:54:19<13:46:50, 2.52s/it] +2025-02-05 12:02:02 - ERROR - 
stderr - 12%|█▏ | 2720/22434 [1:54:22<13:42:35, 2.50s/it] +2025-02-05 12:02:02 - ERROR - stderr - +2025-02-05 12:02:02 - ERROR - stderr - +2025-02-05 12:02:02 - INFO - stdout - {'loss': 1.0164, 'grad_norm': 1.197984218597412, 'learning_rate': 1.956688518212926e-05, 'epoch': 0.36} +2025-02-05 12:02:02 - ERROR - stderr - 12%|█▏ | 2720/22434 [1:54:22<13:42:35, 2.50s/it] +2025-02-05 12:02:05 - ERROR - stderr - 12%|█▏ | 2721/22434 [1:54:24<13:38:09, 2.49s/it] +2025-02-05 12:02:05 - ERROR - stderr - +2025-02-05 12:02:05 - ERROR - stderr - +2025-02-05 12:02:05 - INFO - stdout - {'loss': 0.9922, 'grad_norm': 1.1778687238693237, 'learning_rate': 1.9566464788268737e-05, 'epoch': 0.36} +2025-02-05 12:02:05 - ERROR - stderr - 12%|█▏ | 2721/22434 [1:54:24<13:38:09, 2.49s/it] +2025-02-05 12:02:07 - ERROR - stderr - 12%|█▏ | 2722/22434 [1:54:27<13:38:21, 2.49s/it] +2025-02-05 12:02:07 - ERROR - stderr - +2025-02-05 12:02:07 - ERROR - stderr - +2025-02-05 12:02:07 - INFO - stdout - {'loss': 0.9117, 'grad_norm': 1.0675179958343506, 'learning_rate': 1.956604419500441e-05, 'epoch': 0.36} +2025-02-05 12:02:07 - ERROR - stderr - 12%|█▏ | 2722/22434 [1:54:27<13:38:21, 2.49s/it] +2025-02-05 12:02:10 - ERROR - stderr - 12%|█▏ | 2723/22434 [1:54:29<13:36:20, 2.48s/it] +2025-02-05 12:02:10 - ERROR - stderr - +2025-02-05 12:02:10 - ERROR - stderr - +2025-02-05 12:02:10 - INFO - stdout - {'loss': 0.9949, 'grad_norm': 1.2712956666946411, 'learning_rate': 1.9565623402345045e-05, 'epoch': 0.36} +2025-02-05 12:02:10 - ERROR - stderr - 12%|█▏ | 2723/22434 [1:54:29<13:36:20, 2.48s/it] +2025-02-05 12:02:12 - ERROR - stderr - 12%|█▏ | 2724/22434 [1:54:32<13:38:44, 2.49s/it] +2025-02-05 12:02:12 - ERROR - stderr - +2025-02-05 12:02:12 - ERROR - stderr - +2025-02-05 12:02:12 - INFO - stdout - {'loss': 0.9704, 'grad_norm': 1.1661655902862549, 'learning_rate': 1.9565202410299415e-05, 'epoch': 0.36} +2025-02-05 12:02:12 - ERROR - stderr - 12%|█▏ | 2724/22434 [1:54:32<13:38:44, 2.49s/it] +2025-02-05 
12:02:15 - ERROR - stderr - 12%|█▏ | 2725/22434 [1:54:34<13:44:26, 2.51s/it] +2025-02-05 12:02:15 - ERROR - stderr - +2025-02-05 12:02:15 - ERROR - stderr - +2025-02-05 12:02:15 - INFO - stdout - {'loss': 1.0302, 'grad_norm': 1.1678333282470703, 'learning_rate': 1.956478121887629e-05, 'epoch': 0.36} +2025-02-05 12:02:15 - ERROR - stderr - 12%|█▏ | 2725/22434 [1:54:34<13:44:26, 2.51s/it] +2025-02-05 12:02:17 - ERROR - stderr - 12%|█▏ | 2726/22434 [1:54:37<13:56:44, 2.55s/it] +2025-02-05 12:02:17 - ERROR - stderr - +2025-02-05 12:02:17 - ERROR - stderr - +2025-02-05 12:02:17 - INFO - stdout - {'loss': 0.9866, 'grad_norm': 1.1373299360275269, 'learning_rate': 1.9564359828084454e-05, 'epoch': 0.36} +2025-02-05 12:02:17 - ERROR - stderr - 12%|█▏ | 2726/22434 [1:54:37<13:56:44, 2.55s/it] +2025-02-05 12:02:20 - ERROR - stderr - 12%|█▏ | 2727/22434 [1:54:39<13:43:10, 2.51s/it] +2025-02-05 12:02:20 - ERROR - stderr - +2025-02-05 12:02:20 - ERROR - stderr - +2025-02-05 12:02:20 - INFO - stdout - {'loss': 1.0216, 'grad_norm': 1.1422022581100464, 'learning_rate': 1.9563938237932688e-05, 'epoch': 0.36} +2025-02-05 12:02:20 - ERROR - stderr - 12%|█▏ | 2727/22434 [1:54:39<13:43:10, 2.51s/it] +2025-02-05 12:02:22 - ERROR - stderr - 12%|█▏ | 2728/22434 [1:54:42<14:00:39, 2.56s/it] +2025-02-05 12:02:22 - ERROR - stderr - +2025-02-05 12:02:22 - ERROR - stderr - +2025-02-05 12:02:22 - INFO - stdout - {'loss': 1.0579, 'grad_norm': 1.2675966024398804, 'learning_rate': 1.9563516448429783e-05, 'epoch': 0.36} +2025-02-05 12:02:22 - ERROR - stderr - 12%|█▏ | 2728/22434 [1:54:42<14:00:39, 2.56s/it] +2025-02-05 12:02:25 - ERROR - stderr - 12%|█▏ | 2729/22434 [1:54:45<13:52:41, 2.54s/it] +2025-02-05 12:02:25 - ERROR - stderr - +2025-02-05 12:02:25 - ERROR - stderr - +2025-02-05 12:02:25 - INFO - stdout - {'loss': 0.9668, 'grad_norm': 1.1172945499420166, 'learning_rate': 1.9563094459584532e-05, 'epoch': 0.36} +2025-02-05 12:02:25 - ERROR - stderr - 12%|█▏ | 2729/22434 [1:54:45<13:52:41, 
2.54s/it] +2025-02-05 12:02:27 - ERROR - stderr - 12%|█▏ | 2730/22434 [1:54:47<13:46:19, 2.52s/it] +2025-02-05 12:02:27 - ERROR - stderr - +2025-02-05 12:02:27 - ERROR - stderr - +2025-02-05 12:02:27 - INFO - stdout - {'loss': 1.0171, 'grad_norm': 1.0568033456802368, 'learning_rate': 1.9562672271405723e-05, 'epoch': 0.37} +2025-02-05 12:02:27 - ERROR - stderr - 12%|█▏ | 2730/22434 [1:54:47<13:46:19, 2.52s/it] +2025-02-05 12:02:30 - ERROR - stderr - 12%|█▏ | 2731/22434 [1:54:49<13:40:08, 2.50s/it] +2025-02-05 12:02:30 - ERROR - stderr - +2025-02-05 12:02:30 - ERROR - stderr - +2025-02-05 12:02:30 - INFO - stdout - {'loss': 1.1311, 'grad_norm': 1.3010711669921875, 'learning_rate': 1.956224988390216e-05, 'epoch': 0.37} +2025-02-05 12:02:30 - ERROR - stderr - 12%|█▏ | 2731/22434 [1:54:50<13:40:08, 2.50s/it] +2025-02-05 12:02:32 - ERROR - stderr - 12%|█▏ | 2732/22434 [1:54:52<13:34:29, 2.48s/it] +2025-02-05 12:02:32 - ERROR - stderr - +2025-02-05 12:02:32 - ERROR - stderr - +2025-02-05 12:02:32 - INFO - stdout - {'loss': 0.9701, 'grad_norm': 1.1235120296478271, 'learning_rate': 1.9561827297082658e-05, 'epoch': 0.37} +2025-02-05 12:02:32 - ERROR - stderr - 12%|█▏ | 2732/22434 [1:54:52<13:34:29, 2.48s/it] +2025-02-05 12:02:35 - ERROR - stderr - 12%|█▏ | 2733/22434 [1:54:55<13:50:52, 2.53s/it] +2025-02-05 12:02:35 - ERROR - stderr - +2025-02-05 12:02:35 - ERROR - stderr - +2025-02-05 12:02:35 - INFO - stdout - {'loss': 1.0043, 'grad_norm': 1.2145131826400757, 'learning_rate': 1.9561404510956006e-05, 'epoch': 0.37} +2025-02-05 12:02:35 - ERROR - stderr - 12%|█▏ | 2733/22434 [1:54:55<13:50:52, 2.53s/it] +2025-02-05 12:02:37 - ERROR - stderr - 12%|█▏ | 2734/22434 [1:54:57<13:42:55, 2.51s/it] +2025-02-05 12:02:37 - ERROR - stderr - +2025-02-05 12:02:37 - ERROR - stderr - +2025-02-05 12:02:37 - INFO - stdout - {'loss': 1.0026, 'grad_norm': 1.1204999685287476, 'learning_rate': 1.9560981525531027e-05, 'epoch': 0.37} +2025-02-05 12:02:37 - ERROR - stderr - 12%|█▏ | 2734/22434 
[1:54:57<13:42:55, 2.51s/it] +2025-02-05 12:02:40 - ERROR - stderr - 12%|█▏ | 2735/22434 [1:55:00<13:58:48, 2.55s/it] +2025-02-05 12:02:40 - ERROR - stderr - +2025-02-05 12:02:40 - ERROR - stderr - +2025-02-05 12:02:40 - INFO - stdout - {'loss': 0.9443, 'grad_norm': 1.080398440361023, 'learning_rate': 1.956055834081654e-05, 'epoch': 0.37} +2025-02-05 12:02:40 - ERROR - stderr - 12%|█▏ | 2735/22434 [1:55:00<13:58:48, 2.55s/it] +2025-02-05 12:02:43 - ERROR - stderr - 12%|█▏ | 2736/22434 [1:55:02<14:06:25, 2.58s/it] +2025-02-05 12:02:43 - ERROR - stderr - +2025-02-05 12:02:43 - ERROR - stderr - +2025-02-05 12:02:43 - INFO - stdout - {'loss': 0.9752, 'grad_norm': 1.1875252723693848, 'learning_rate': 1.9560134956821362e-05, 'epoch': 0.37} +2025-02-05 12:02:43 - ERROR - stderr - 12%|█▏ | 2736/22434 [1:55:02<14:06:25, 2.58s/it] +2025-02-05 12:02:45 - ERROR - stderr - 12%|█▏ | 2737/22434 [1:55:05<13:54:29, 2.54s/it] +2025-02-05 12:02:45 - ERROR - stderr - +2025-02-05 12:02:45 - ERROR - stderr - +2025-02-05 12:02:45 - INFO - stdout - {'loss': 1.0312, 'grad_norm': 1.1252415180206299, 'learning_rate': 1.955971137355432e-05, 'epoch': 0.37} +2025-02-05 12:02:45 - ERROR - stderr - 12%|█▏ | 2737/22434 [1:55:05<13:54:29, 2.54s/it] +2025-02-05 12:02:48 - ERROR - stderr - 12%|█▏ | 2738/22434 [1:55:07<13:49:37, 2.53s/it] +2025-02-05 12:02:48 - ERROR - stderr - +2025-02-05 12:02:48 - ERROR - stderr - +2025-02-05 12:02:48 - INFO - stdout - {'loss': 0.9836, 'grad_norm': 1.080429196357727, 'learning_rate': 1.9559287591024237e-05, 'epoch': 0.37} +2025-02-05 12:02:48 - ERROR - stderr - 12%|█▏ | 2738/22434 [1:55:07<13:49:37, 2.53s/it] +2025-02-05 12:02:50 - ERROR - stderr - 12%|█▏ | 2739/22434 [1:55:10<13:46:12, 2.52s/it] +2025-02-05 12:02:50 - ERROR - stderr - +2025-02-05 12:02:50 - ERROR - stderr - +2025-02-05 12:02:50 - INFO - stdout - {'loss': 0.9282, 'grad_norm': 1.1133793592453003, 'learning_rate': 1.955886360923996e-05, 'epoch': 0.37} +2025-02-05 12:02:50 - ERROR - stderr - 12%|█▏ | 
2739/22434 [1:55:10<13:46:12, 2.52s/it] +2025-02-05 12:02:52 - ERROR - stderr - 12%|█▏ | 2740/22434 [1:55:12<13:37:04, 2.49s/it] +2025-02-05 12:02:52 - ERROR - stderr - +2025-02-05 12:02:52 - ERROR - stderr - +2025-02-05 12:02:52 - INFO - stdout - {'loss': 0.9629, 'grad_norm': 1.1525962352752686, 'learning_rate': 1.9558439428210312e-05, 'epoch': 0.37} +2025-02-05 12:02:52 - ERROR - stderr - 12%|█▏ | 2740/22434 [1:55:12<13:37:04, 2.49s/it] +2025-02-05 12:02:55 - ERROR - stderr - 12%|█▏ | 2741/22434 [1:55:15<13:30:53, 2.47s/it] +2025-02-05 12:02:55 - ERROR - stderr - +2025-02-05 12:02:55 - ERROR - stderr - +2025-02-05 12:02:55 - INFO - stdout - {'loss': 1.0303, 'grad_norm': 1.160780906677246, 'learning_rate': 1.955801504794414e-05, 'epoch': 0.37} +2025-02-05 12:02:55 - ERROR - stderr - 12%|█▏ | 2741/22434 [1:55:15<13:30:53, 2.47s/it] +2025-02-05 12:02:58 - ERROR - stderr - 12%|█▏ | 2742/22434 [1:55:18<14:20:13, 2.62s/it] +2025-02-05 12:02:58 - ERROR - stderr - +2025-02-05 12:02:58 - ERROR - stderr - +2025-02-05 12:02:58 - INFO - stdout - {'loss': 0.9728, 'grad_norm': 1.1687520742416382, 'learning_rate': 1.9557590468450294e-05, 'epoch': 0.37} +2025-02-05 12:02:58 - ERROR - stderr - 12%|█▏ | 2742/22434 [1:55:18<14:20:13, 2.62s/it] +2025-02-05 12:03:00 - ERROR - stderr - 12%|█▏ | 2743/22434 [1:55:20<14:03:38, 2.57s/it] +2025-02-05 12:03:00 - ERROR - stderr - +2025-02-05 12:03:00 - ERROR - stderr - +2025-02-05 12:03:00 - INFO - stdout - {'loss': 0.8744, 'grad_norm': 1.0965487957000732, 'learning_rate': 1.955716568973762e-05, 'epoch': 0.37} +2025-02-05 12:03:00 - ERROR - stderr - 12%|█▏ | 2743/22434 [1:55:20<14:03:38, 2.57s/it] +2025-02-05 12:03:03 - ERROR - stderr - 12%|█▏ | 2744/22434 [1:55:22<13:50:56, 2.53s/it] +2025-02-05 12:03:03 - ERROR - stderr - +2025-02-05 12:03:03 - ERROR - stderr - +2025-02-05 12:03:03 - INFO - stdout - {'loss': 0.9731, 'grad_norm': 1.1608115434646606, 'learning_rate': 1.955674071181497e-05, 'epoch': 0.37} +2025-02-05 12:03:03 - ERROR - stderr 
- 12%|█▏ | 2744/22434 [1:55:23<13:50:56, 2.53s/it] +2025-02-05 12:03:05 - ERROR - stderr - 12%|█▏ | 2745/22434 [1:55:25<13:49:42, 2.53s/it] +2025-02-05 12:03:05 - ERROR - stderr - +2025-02-05 12:03:05 - ERROR - stderr - +2025-02-05 12:03:05 - INFO - stdout - {'loss': 0.9334, 'grad_norm': 0.9959310293197632, 'learning_rate': 1.9556315534691204e-05, 'epoch': 0.37} +2025-02-05 12:03:05 - ERROR - stderr - 12%|█▏ | 2745/22434 [1:55:25<13:49:42, 2.53s/it] +2025-02-05 12:03:08 - ERROR - stderr - 12%|█▏ | 2746/22434 [1:55:28<13:48:03, 2.52s/it] +2025-02-05 12:03:08 - ERROR - stderr - +2025-02-05 12:03:08 - ERROR - stderr - +2025-02-05 12:03:08 - INFO - stdout - {'loss': 0.9926, 'grad_norm': 0.9976779818534851, 'learning_rate': 1.9555890158375188e-05, 'epoch': 0.37} +2025-02-05 12:03:08 - ERROR - stderr - 12%|█▏ | 2746/22434 [1:55:28<13:48:03, 2.52s/it] +2025-02-05 12:03:10 - ERROR - stderr - 12%|█▏ | 2747/22434 [1:55:30<13:37:09, 2.49s/it] +2025-02-05 12:03:10 - ERROR - stderr - +2025-02-05 12:03:10 - ERROR - stderr - +2025-02-05 12:03:10 - INFO - stdout - {'loss': 1.0225, 'grad_norm': 1.0713155269622803, 'learning_rate': 1.9555464582875783e-05, 'epoch': 0.37} +2025-02-05 12:03:10 - ERROR - stderr - 12%|█▏ | 2747/22434 [1:55:30<13:37:09, 2.49s/it] +2025-02-05 12:03:13 - ERROR - stderr - 12%|█▏ | 2748/22434 [1:55:32<13:37:32, 2.49s/it] +2025-02-05 12:03:13 - ERROR - stderr - +2025-02-05 12:03:13 - ERROR - stderr - +2025-02-05 12:03:13 - INFO - stdout - {'loss': 0.9535, 'grad_norm': 1.1484497785568237, 'learning_rate': 1.9555038808201866e-05, 'epoch': 0.37} +2025-02-05 12:03:13 - ERROR - stderr - 12%|█▏ | 2748/22434 [1:55:32<13:37:32, 2.49s/it] +2025-02-05 12:03:15 - ERROR - stderr - 12%|█▏ | 2749/22434 [1:55:35<13:35:18, 2.49s/it] +2025-02-05 12:03:15 - ERROR - stderr - +2025-02-05 12:03:15 - ERROR - stderr - +2025-02-05 12:03:15 - INFO - stdout - {'loss': 0.979, 'grad_norm': 1.1695374250411987, 'learning_rate': 1.9554612834362304e-05, 'epoch': 0.37} +2025-02-05 12:03:15 - 
ERROR - stderr - 12%|█▏ | 2749/22434 [1:55:35<13:35:18, 2.49s/it] +2025-02-05 12:03:18 - ERROR - stderr - 12%|█▏ | 2750/22434 [1:55:37<13:35:46, 2.49s/it] +2025-02-05 12:03:18 - ERROR - stderr - +2025-02-05 12:03:18 - ERROR - stderr - +2025-02-05 12:03:18 - INFO - stdout - {'loss': 1.0258, 'grad_norm': 1.320141077041626, 'learning_rate': 1.955418666136598e-05, 'epoch': 0.37} +2025-02-05 12:03:18 - ERROR - stderr - 12%|█▏ | 2750/22434 [1:55:37<13:35:46, 2.49s/it] +2025-02-05 12:03:20 - ERROR - stderr - 12%|█▏ | 2751/22434 [1:55:40<13:28:47, 2.47s/it] +2025-02-05 12:03:20 - ERROR - stderr - +2025-02-05 12:03:20 - ERROR - stderr - +2025-02-05 12:03:20 - INFO - stdout - {'loss': 0.9083, 'grad_norm': 1.0721712112426758, 'learning_rate': 1.955376028922178e-05, 'epoch': 0.37} +2025-02-05 12:03:20 - ERROR - stderr - 12%|█▏ | 2751/22434 [1:55:40<13:28:47, 2.47s/it] +2025-02-05 12:03:23 - ERROR - stderr - 12%|█▏ | 2752/22434 [1:55:42<13:32:37, 2.48s/it] +2025-02-05 12:03:23 - ERROR - stderr - +2025-02-05 12:03:23 - ERROR - stderr - +2025-02-05 12:03:23 - INFO - stdout - {'loss': 0.9897, 'grad_norm': 1.1393400430679321, 'learning_rate': 1.955333371793859e-05, 'epoch': 0.37} +2025-02-05 12:03:23 - ERROR - stderr - 12%|█▏ | 2752/22434 [1:55:42<13:32:37, 2.48s/it] +2025-02-05 12:03:25 - ERROR - stderr - 12%|█▏ | 2753/22434 [1:55:45<13:28:35, 2.47s/it] +2025-02-05 12:03:25 - ERROR - stderr - +2025-02-05 12:03:25 - ERROR - stderr - +2025-02-05 12:03:25 - INFO - stdout - {'loss': 0.9625, 'grad_norm': 1.088148593902588, 'learning_rate': 1.9552906947525295e-05, 'epoch': 0.37} +2025-02-05 12:03:25 - ERROR - stderr - 12%|█▏ | 2753/22434 [1:55:45<13:28:35, 2.47s/it] +2025-02-05 12:03:27 - ERROR - stderr - 12%|█▏ | 2754/22434 [1:55:47<13:29:32, 2.47s/it] +2025-02-05 12:03:27 - ERROR - stderr - +2025-02-05 12:03:27 - ERROR - stderr - +2025-02-05 12:03:27 - INFO - stdout - {'loss': 1.0234, 'grad_norm': 1.153430461883545, 'learning_rate': 1.9552479977990802e-05, 'epoch': 0.37} +2025-02-05 
12:03:27 - ERROR - stderr - 12%|█▏ | 2754/22434 [1:55:47<13:29:32, 2.47s/it] +2025-02-05 12:03:30 - ERROR - stderr - 12%|█▏ | 2755/22434 [1:55:50<13:28:22, 2.46s/it] +2025-02-05 12:03:30 - ERROR - stderr - +2025-02-05 12:03:30 - ERROR - stderr - +2025-02-05 12:03:30 - INFO - stdout - {'loss': 0.9467, 'grad_norm': 1.0644422769546509, 'learning_rate': 1.9552052809344004e-05, 'epoch': 0.37} +2025-02-05 12:03:30 - ERROR - stderr - 12%|█▏ | 2755/22434 [1:55:50<13:28:22, 2.46s/it] +2025-02-05 12:03:32 - ERROR - stderr - 12%|█▏ | 2756/22434 [1:55:52<13:26:37, 2.46s/it] +2025-02-05 12:03:32 - ERROR - stderr - +2025-02-05 12:03:32 - ERROR - stderr - +2025-02-05 12:03:32 - INFO - stdout - {'loss': 1.0651, 'grad_norm': 1.189340353012085, 'learning_rate': 1.95516254415938e-05, 'epoch': 0.37} +2025-02-05 12:03:32 - ERROR - stderr - 12%|█▏ | 2756/22434 [1:55:52<13:26:37, 2.46s/it] +2025-02-05 12:03:35 - ERROR - stderr - 12%|█▏ | 2757/22434 [1:55:55<13:31:09, 2.47s/it] +2025-02-05 12:03:35 - ERROR - stderr - +2025-02-05 12:03:35 - ERROR - stderr - +2025-02-05 12:03:35 - INFO - stdout - {'loss': 1.0251, 'grad_norm': 1.1223186254501343, 'learning_rate': 1.9551197874749107e-05, 'epoch': 0.37} +2025-02-05 12:03:35 - ERROR - stderr - 12%|█▏ | 2757/22434 [1:55:55<13:31:09, 2.47s/it] +2025-02-05 12:03:37 - ERROR - stderr - 12%|█▏ | 2758/22434 [1:55:57<13:28:28, 2.47s/it] +2025-02-05 12:03:37 - ERROR - stderr - +2025-02-05 12:03:37 - ERROR - stderr - +2025-02-05 12:03:37 - INFO - stdout - {'loss': 0.8895, 'grad_norm': 1.1488653421401978, 'learning_rate': 1.955077010881883e-05, 'epoch': 0.37} +2025-02-05 12:03:37 - ERROR - stderr - 12%|█▏ | 2758/22434 [1:55:57<13:28:28, 2.47s/it] +2025-02-05 12:03:40 - ERROR - stderr - 12%|█▏ | 2759/22434 [1:56:00<13:34:50, 2.48s/it] +2025-02-05 12:03:40 - ERROR - stderr - +2025-02-05 12:03:40 - ERROR - stderr - +2025-02-05 12:03:40 - INFO - stdout - {'loss': 1.029, 'grad_norm': 1.2532360553741455, 'learning_rate': 1.9550342143811896e-05, 'epoch': 0.37} 
+2025-02-05 12:03:40 - ERROR - stderr - 12%|█▏ | 2759/22434 [1:56:00<13:34:50, 2.48s/it] +2025-02-05 12:03:42 - ERROR - stderr - 12%|█▏ | 2760/22434 [1:56:02<13:29:55, 2.47s/it] +2025-02-05 12:03:42 - ERROR - stderr - +2025-02-05 12:03:42 - ERROR - stderr - +2025-02-05 12:03:42 - INFO - stdout - {'loss': 1.0284, 'grad_norm': 1.2013119459152222, 'learning_rate': 1.954991397973722e-05, 'epoch': 0.37} +2025-02-05 12:03:42 - ERROR - stderr - 12%|█▏ | 2760/22434 [1:56:02<13:29:55, 2.47s/it] +2025-02-05 12:03:45 - ERROR - stderr - 12%|█▏ | 2761/22434 [1:56:04<13:26:15, 2.46s/it] +2025-02-05 12:03:45 - ERROR - stderr - +2025-02-05 12:03:45 - ERROR - stderr - +2025-02-05 12:03:45 - INFO - stdout - {'loss': 1.1279, 'grad_norm': 1.2756202220916748, 'learning_rate': 1.9549485616603718e-05, 'epoch': 0.37} +2025-02-05 12:03:45 - ERROR - stderr - 12%|█▏ | 2761/22434 [1:56:05<13:26:15, 2.46s/it] +2025-02-05 12:03:47 - ERROR - stderr - 12%|█▏ | 2762/22434 [1:56:07<13:29:01, 2.47s/it] +2025-02-05 12:03:47 - ERROR - stderr - +2025-02-05 12:03:47 - ERROR - stderr - +2025-02-05 12:03:47 - INFO - stdout - {'loss': 1.0546, 'grad_norm': 1.0860332250595093, 'learning_rate': 1.954905705442033e-05, 'epoch': 0.37} +2025-02-05 12:03:47 - ERROR - stderr - 12%|█▏ | 2762/22434 [1:56:07<13:29:01, 2.47s/it] +2025-02-05 12:03:50 - ERROR - stderr - 12%|█▏ | 2763/22434 [1:56:09<13:29:16, 2.47s/it] +2025-02-05 12:03:50 - ERROR - stderr - +2025-02-05 12:03:50 - ERROR - stderr - +2025-02-05 12:03:50 - INFO - stdout - {'loss': 0.8869, 'grad_norm': 1.2069071531295776, 'learning_rate': 1.9548628293195983e-05, 'epoch': 0.37} +2025-02-05 12:03:50 - ERROR - stderr - 12%|█▏ | 2763/22434 [1:56:09<13:29:16, 2.47s/it] +2025-02-05 12:03:52 - ERROR - stderr - 12%|█▏ | 2764/22434 [1:56:12<13:31:29, 2.48s/it] +2025-02-05 12:03:52 - ERROR - stderr - +2025-02-05 12:03:52 - ERROR - stderr - +2025-02-05 12:03:52 - INFO - stdout - {'loss': 0.9653, 'grad_norm': 1.208526611328125, 'learning_rate': 1.954819933293962e-05, 
'epoch': 0.37} +2025-02-05 12:03:52 - ERROR - stderr - 12%|█▏ | 2764/22434 [1:56:12<13:31:29, 2.48s/it] +2025-02-05 12:03:55 - ERROR - stderr - 12%|█▏ | 2765/22434 [1:56:14<13:29:43, 2.47s/it] +2025-02-05 12:03:55 - ERROR - stderr - +2025-02-05 12:03:55 - ERROR - stderr - +2025-02-05 12:03:55 - INFO - stdout - {'loss': 0.9589, 'grad_norm': 1.1659077405929565, 'learning_rate': 1.9547770173660173e-05, 'epoch': 0.37} +2025-02-05 12:03:55 - ERROR - stderr - 12%|█▏ | 2765/22434 [1:56:14<13:29:43, 2.47s/it] +2025-02-05 12:03:57 - ERROR - stderr - 12%|█▏ | 2766/22434 [1:56:17<13:49:15, 2.53s/it] +2025-02-05 12:03:57 - ERROR - stderr - +2025-02-05 12:03:57 - ERROR - stderr - +2025-02-05 12:03:57 - INFO - stdout - {'loss': 0.9502, 'grad_norm': 1.0698506832122803, 'learning_rate': 1.9547340815366595e-05, 'epoch': 0.37} +2025-02-05 12:03:57 - ERROR - stderr - 12%|█▏ | 2766/22434 [1:56:17<13:49:15, 2.53s/it] +2025-02-05 12:04:00 - ERROR - stderr - 12%|█▏ | 2767/22434 [1:56:20<13:42:11, 2.51s/it] +2025-02-05 12:04:00 - ERROR - stderr - +2025-02-05 12:04:00 - ERROR - stderr - +2025-02-05 12:04:00 - INFO - stdout - {'loss': 1.0648, 'grad_norm': 1.1477301120758057, 'learning_rate': 1.9546911258067836e-05, 'epoch': 0.37} +2025-02-05 12:04:00 - ERROR - stderr - 12%|█▏ | 2767/22434 [1:56:20<13:42:11, 2.51s/it] +2025-02-05 12:04:02 - ERROR - stderr - 12%|█▏ | 2768/22434 [1:56:22<13:41:23, 2.51s/it] +2025-02-05 12:04:02 - ERROR - stderr - +2025-02-05 12:04:02 - ERROR - stderr - +2025-02-05 12:04:02 - INFO - stdout - {'loss': 1.0517, 'grad_norm': 1.1000534296035767, 'learning_rate': 1.9546481501772846e-05, 'epoch': 0.37} +2025-02-05 12:04:02 - ERROR - stderr - 12%|█▏ | 2768/22434 [1:56:22<13:41:23, 2.51s/it] +2025-02-05 12:04:05 - ERROR - stderr - 12%|█▏ | 2769/22434 [1:56:24<13:35:44, 2.49s/it] +2025-02-05 12:04:05 - ERROR - stderr - +2025-02-05 12:04:05 - ERROR - stderr - +2025-02-05 12:04:05 - INFO - stdout - {'loss': 0.9205, 'grad_norm': 1.1689552068710327, 'learning_rate': 
1.9546051546490586e-05, 'epoch': 0.37} +2025-02-05 12:04:05 - ERROR - stderr - 12%|█▏ | 2769/22434 [1:56:25<13:35:44, 2.49s/it] +2025-02-05 12:04:07 - ERROR - stderr - 12%|█▏ | 2770/22434 [1:56:27<13:39:46, 2.50s/it] +2025-02-05 12:04:07 - ERROR - stderr - +2025-02-05 12:04:07 - ERROR - stderr - +2025-02-05 12:04:07 - INFO - stdout - {'loss': 0.9347, 'grad_norm': 1.1020498275756836, 'learning_rate': 1.9545621392230013e-05, 'epoch': 0.37} +2025-02-05 12:04:07 - ERROR - stderr - 12%|█▏ | 2770/22434 [1:56:27<13:39:46, 2.50s/it] +2025-02-05 12:04:10 - ERROR - stderr - 12%|█▏ | 2771/22434 [1:56:29<13:33:43, 2.48s/it] +2025-02-05 12:04:10 - ERROR - stderr - +2025-02-05 12:04:10 - ERROR - stderr - +2025-02-05 12:04:10 - INFO - stdout - {'loss': 0.97, 'grad_norm': 1.0748441219329834, 'learning_rate': 1.9545191039000096e-05, 'epoch': 0.37} +2025-02-05 12:04:10 - ERROR - stderr - 12%|█▏ | 2771/22434 [1:56:29<13:33:43, 2.48s/it] +2025-02-05 12:04:12 - ERROR - stderr - 12%|█▏ | 2772/22434 [1:56:32<13:55:00, 2.55s/it] +2025-02-05 12:04:12 - ERROR - stderr - +2025-02-05 12:04:12 - ERROR - stderr - +2025-02-05 12:04:12 - INFO - stdout - {'loss': 0.9091, 'grad_norm': 1.2570973634719849, 'learning_rate': 1.9544760486809808e-05, 'epoch': 0.37} +2025-02-05 12:04:12 - ERROR - stderr - 12%|█▏ | 2772/22434 [1:56:32<13:55:00, 2.55s/it] +2025-02-05 12:04:15 - ERROR - stderr - 12%|█▏ | 2773/22434 [1:56:35<13:41:48, 2.51s/it] +2025-02-05 12:04:15 - ERROR - stderr - +2025-02-05 12:04:15 - ERROR - stderr - +2025-02-05 12:04:15 - INFO - stdout - {'loss': 1.0013, 'grad_norm': 1.1747227907180786, 'learning_rate': 1.954432973566812e-05, 'epoch': 0.37} +2025-02-05 12:04:15 - ERROR - stderr - 12%|█▏ | 2773/22434 [1:56:35<13:41:48, 2.51s/it] +2025-02-05 12:04:17 - ERROR - stderr - 12%|█▏ | 2774/22434 [1:56:37<13:37:37, 2.50s/it] +2025-02-05 12:04:17 - ERROR - stderr - +2025-02-05 12:04:17 - ERROR - stderr - +2025-02-05 12:04:17 - INFO - stdout - {'loss': 0.9705, 'grad_norm': 1.249719262123108, 
'learning_rate': 1.954389878558401e-05, 'epoch': 0.37} +2025-02-05 12:04:17 - ERROR - stderr - 12%|█▏ | 2774/22434 [1:56:37<13:37:37, 2.50s/it] +2025-02-05 12:04:20 - ERROR - stderr - 12%|█▏ | 2775/22434 [1:56:39<13:29:59, 2.47s/it] +2025-02-05 12:04:20 - ERROR - stderr - +2025-02-05 12:04:20 - ERROR - stderr - +2025-02-05 12:04:20 - INFO - stdout - {'loss': 0.9938, 'grad_norm': 1.1774156093597412, 'learning_rate': 1.9543467636566463e-05, 'epoch': 0.37} +2025-02-05 12:04:20 - ERROR - stderr - 12%|█▏ | 2775/22434 [1:56:39<13:29:59, 2.47s/it] +2025-02-05 12:04:22 - ERROR - stderr - 12%|█▏ | 2776/22434 [1:56:42<13:30:41, 2.47s/it] +2025-02-05 12:04:22 - ERROR - stderr - +2025-02-05 12:04:22 - ERROR - stderr - +2025-02-05 12:04:22 - INFO - stdout - {'loss': 0.9119, 'grad_norm': 1.0194612741470337, 'learning_rate': 1.9543036288624465e-05, 'epoch': 0.37} +2025-02-05 12:04:22 - ERROR - stderr - 12%|█▏ | 2776/22434 [1:56:42<13:30:41, 2.47s/it] +2025-02-05 12:04:25 - ERROR - stderr - 12%|█▏ | 2777/22434 [1:56:44<13:33:45, 2.48s/it] +2025-02-05 12:04:25 - ERROR - stderr - +2025-02-05 12:04:25 - ERROR - stderr - +2025-02-05 12:04:25 - INFO - stdout - {'loss': 0.9419, 'grad_norm': 1.1757391691207886, 'learning_rate': 1.954260474176701e-05, 'epoch': 0.37} +2025-02-05 12:04:25 - ERROR - stderr - 12%|█▏ | 2777/22434 [1:56:44<13:33:45, 2.48s/it] +2025-02-05 12:04:27 - ERROR - stderr - 12%|█▏ | 2778/22434 [1:56:47<13:27:31, 2.46s/it] +2025-02-05 12:04:27 - ERROR - stderr - +2025-02-05 12:04:27 - ERROR - stderr - +2025-02-05 12:04:27 - INFO - stdout - {'loss': 1.0257, 'grad_norm': 1.1901240348815918, 'learning_rate': 1.954217299600309e-05, 'epoch': 0.37} +2025-02-05 12:04:27 - ERROR - stderr - 12%|█▏ | 2778/22434 [1:56:47<13:27:31, 2.46s/it] +2025-02-05 12:04:30 - ERROR - stderr - 12%|█▏ | 2779/22434 [1:56:49<13:24:51, 2.46s/it] +2025-02-05 12:04:30 - ERROR - stderr - +2025-02-05 12:04:30 - ERROR - stderr - +2025-02-05 12:04:30 - INFO - stdout - {'loss': 1.1054, 'grad_norm': 
1.302170991897583, 'learning_rate': 1.95417410513417e-05, 'epoch': 0.37} +2025-02-05 12:04:30 - ERROR - stderr - 12%|█▏ | 2779/22434 [1:56:49<13:24:51, 2.46s/it] +2025-02-05 12:04:32 - ERROR - stderr - 12%|█▏ | 2780/22434 [1:56:52<13:30:40, 2.47s/it] +2025-02-05 12:04:32 - ERROR - stderr - +2025-02-05 12:04:32 - ERROR - stderr - +2025-02-05 12:04:32 - INFO - stdout - {'loss': 0.9817, 'grad_norm': 1.1504501104354858, 'learning_rate': 1.9541308907791854e-05, 'epoch': 0.37} +2025-02-05 12:04:32 - ERROR - stderr - 12%|█▏ | 2780/22434 [1:56:52<13:30:40, 2.47s/it] +2025-02-05 12:04:34 - ERROR - stderr - 12%|█▏ | 2781/22434 [1:56:54<13:27:47, 2.47s/it] +2025-02-05 12:04:35 - ERROR - stderr - +2025-02-05 12:04:35 - ERROR - stderr - +2025-02-05 12:04:35 - INFO - stdout - {'loss': 1.0348, 'grad_norm': 1.1870187520980835, 'learning_rate': 1.954087656536255e-05, 'epoch': 0.37} +2025-02-05 12:04:35 - ERROR - stderr - 12%|█▏ | 2781/22434 [1:56:54<13:27:47, 2.47s/it] +2025-02-05 12:04:37 - ERROR - stderr - 12%|█▏ | 2782/22434 [1:56:57<13:30:59, 2.48s/it] +2025-02-05 12:04:37 - ERROR - stderr - +2025-02-05 12:04:37 - ERROR - stderr - +2025-02-05 12:04:37 - INFO - stdout - {'loss': 0.8195, 'grad_norm': 0.997534453868866, 'learning_rate': 1.9540444024062807e-05, 'epoch': 0.37} +2025-02-05 12:04:37 - ERROR - stderr - 12%|█▏ | 2782/22434 [1:56:57<13:30:59, 2.48s/it] +2025-02-05 12:04:39 - ERROR - stderr - 12%|█▏ | 2783/22434 [1:56:59<13:27:29, 2.47s/it] +2025-02-05 12:04:39 - ERROR - stderr - +2025-02-05 12:04:39 - ERROR - stderr - +2025-02-05 12:04:39 - INFO - stdout - {'loss': 0.8837, 'grad_norm': 1.0806421041488647, 'learning_rate': 1.9540011283901635e-05, 'epoch': 0.37} +2025-02-05 12:04:39 - ERROR - stderr - 12%|█▏ | 2783/22434 [1:56:59<13:27:29, 2.47s/it] +2025-02-05 12:04:42 - ERROR - stderr - 12%|█▏ | 2784/22434 [1:57:02<13:30:42, 2.48s/it] +2025-02-05 12:04:42 - ERROR - stderr - +2025-02-05 12:04:42 - ERROR - stderr - +2025-02-05 12:04:42 - INFO - stdout - {'loss': 0.9852, 
'grad_norm': 1.0678706169128418, 'learning_rate': 1.9539578344888057e-05, 'epoch': 0.37} +2025-02-05 12:04:42 - ERROR - stderr - 12%|█▏ | 2784/22434 [1:57:02<13:30:42, 2.48s/it] +2025-02-05 12:04:44 - ERROR - stderr - 12%|█▏ | 2785/22434 [1:57:04<13:27:29, 2.47s/it] +2025-02-05 12:04:44 - ERROR - stderr - +2025-02-05 12:04:44 - ERROR - stderr - +2025-02-05 12:04:44 - INFO - stdout - {'loss': 0.9974, 'grad_norm': 1.1064411401748657, 'learning_rate': 1.95391452070311e-05, 'epoch': 0.37} +2025-02-05 12:04:44 - ERROR - stderr - 12%|█▏ | 2785/22434 [1:57:04<13:27:29, 2.47s/it] +2025-02-05 12:04:47 - ERROR - stderr - 12%|█▏ | 2786/22434 [1:57:07<13:35:51, 2.49s/it] +2025-02-05 12:04:47 - ERROR - stderr - +2025-02-05 12:04:47 - ERROR - stderr - +2025-02-05 12:04:47 - INFO - stdout - {'loss': 0.8882, 'grad_norm': 1.1063323020935059, 'learning_rate': 1.953871187033978e-05, 'epoch': 0.37} +2025-02-05 12:04:47 - ERROR - stderr - 12%|█▏ | 2786/22434 [1:57:07<13:35:51, 2.49s/it] +2025-02-05 12:04:49 - ERROR - stderr - 12%|█▏ | 2787/22434 [1:57:09<13:30:25, 2.47s/it] +2025-02-05 12:04:49 - ERROR - stderr - +2025-02-05 12:04:49 - ERROR - stderr - +2025-02-05 12:04:49 - INFO - stdout - {'loss': 0.8442, 'grad_norm': 1.0284321308135986, 'learning_rate': 1.9538278334823148e-05, 'epoch': 0.37} +2025-02-05 12:04:49 - ERROR - stderr - 12%|█▏ | 2787/22434 [1:57:09<13:30:25, 2.47s/it] +2025-02-05 12:04:52 - ERROR - stderr - 12%|█▏ | 2788/22434 [1:57:12<13:28:16, 2.47s/it] +2025-02-05 12:04:52 - ERROR - stderr - +2025-02-05 12:04:52 - ERROR - stderr - +2025-02-05 12:04:52 - INFO - stdout - {'loss': 0.9145, 'grad_norm': 1.0987207889556885, 'learning_rate': 1.9537844600490227e-05, 'epoch': 0.37} +2025-02-05 12:04:52 - ERROR - stderr - 12%|█▏ | 2788/22434 [1:57:12<13:28:16, 2.47s/it] +2025-02-05 12:04:54 - ERROR - stderr - 12%|█▏ | 2789/22434 [1:57:14<13:36:12, 2.49s/it] +2025-02-05 12:04:54 - ERROR - stderr - +2025-02-05 12:04:54 - ERROR - stderr - +2025-02-05 12:04:54 - INFO - stdout - 
{'loss': 1.1888, 'grad_norm': 1.2421810626983643, 'learning_rate': 1.9537410667350064e-05, 'epoch': 0.37} +2025-02-05 12:04:54 - ERROR - stderr - 12%|█▏ | 2789/22434 [1:57:14<13:36:12, 2.49s/it] +2025-02-05 12:04:57 - ERROR - stderr - 12%|█▏ | 2790/22434 [1:57:17<13:37:06, 2.50s/it] +2025-02-05 12:04:57 - ERROR - stderr - +2025-02-05 12:04:57 - ERROR - stderr - +2025-02-05 12:04:57 - INFO - stdout - {'loss': 0.9538, 'grad_norm': 1.0936285257339478, 'learning_rate': 1.95369765354117e-05, 'epoch': 0.37} +2025-02-05 12:04:57 - ERROR - stderr - 12%|█▏ | 2790/22434 [1:57:17<13:37:06, 2.50s/it] +2025-02-05 12:04:59 - ERROR - stderr - 12%|█▏ | 2791/22434 [1:57:19<13:40:44, 2.51s/it] +2025-02-05 12:04:59 - ERROR - stderr - +2025-02-05 12:04:59 - ERROR - stderr - +2025-02-05 12:04:59 - INFO - stdout - {'loss': 0.9783, 'grad_norm': 1.1306225061416626, 'learning_rate': 1.9536542204684187e-05, 'epoch': 0.37} +2025-02-05 12:04:59 - ERROR - stderr - 12%|█▏ | 2791/22434 [1:57:19<13:40:44, 2.51s/it] +2025-02-05 12:05:02 - ERROR - stderr - 12%|█▏ | 2792/22434 [1:57:22<13:41:25, 2.51s/it] +2025-02-05 12:05:02 - ERROR - stderr - +2025-02-05 12:05:02 - ERROR - stderr - +2025-02-05 12:05:02 - INFO - stdout - {'loss': 0.8413, 'grad_norm': 1.174850344657898, 'learning_rate': 1.953610767517658e-05, 'epoch': 0.37} +2025-02-05 12:05:02 - ERROR - stderr - 12%|█▏ | 2792/22434 [1:57:22<13:41:25, 2.51s/it] +2025-02-05 12:05:04 - ERROR - stderr - 12%|█▏ | 2793/22434 [1:57:24<13:44:41, 2.52s/it] +2025-02-05 12:05:04 - ERROR - stderr - +2025-02-05 12:05:04 - ERROR - stderr - +2025-02-05 12:05:04 - INFO - stdout - {'loss': 1.0104, 'grad_norm': 1.167121410369873, 'learning_rate': 1.953567294689793e-05, 'epoch': 0.37} +2025-02-05 12:05:04 - ERROR - stderr - 12%|█▏ | 2793/22434 [1:57:24<13:44:41, 2.52s/it] +2025-02-05 12:05:07 - ERROR - stderr - 12%|█▏ | 2794/22434 [1:57:27<13:56:06, 2.55s/it] +2025-02-05 12:05:07 - ERROR - stderr - +2025-02-05 12:05:07 - ERROR - stderr - +2025-02-05 12:05:07 - INFO - 
stdout - {'loss': 1.1348, 'grad_norm': 1.197521686553955, 'learning_rate': 1.95352380198573e-05, 'epoch': 0.37} +2025-02-05 12:05:07 - ERROR - stderr - 12%|█▏ | 2794/22434 [1:57:27<13:56:06, 2.55s/it] +2025-02-05 12:05:10 - ERROR - stderr - 12%|█▏ | 2795/22434 [1:57:29<13:58:44, 2.56s/it] +2025-02-05 12:05:10 - ERROR - stderr - +2025-02-05 12:05:10 - ERROR - stderr - +2025-02-05 12:05:10 - INFO - stdout - {'loss': 1.0382, 'grad_norm': 1.1388661861419678, 'learning_rate': 1.9534802894063764e-05, 'epoch': 0.37} +2025-02-05 12:05:10 - ERROR - stderr - 12%|█▏ | 2795/22434 [1:57:29<13:58:44, 2.56s/it] +2025-02-05 12:05:12 - ERROR - stderr - 12%|█▏ | 2796/22434 [1:57:32<13:46:59, 2.53s/it] +2025-02-05 12:05:12 - ERROR - stderr - +2025-02-05 12:05:12 - ERROR - stderr - +2025-02-05 12:05:12 - INFO - stdout - {'loss': 0.8724, 'grad_norm': 1.040999174118042, 'learning_rate': 1.953436756952638e-05, 'epoch': 0.37} +2025-02-05 12:05:12 - ERROR - stderr - 12%|█▏ | 2796/22434 [1:57:32<13:46:59, 2.53s/it] +2025-02-05 12:05:15 - ERROR - stderr - 12%|█▏ | 2797/22434 [1:57:34<13:43:10, 2.52s/it] +2025-02-05 12:05:15 - ERROR - stderr - +2025-02-05 12:05:15 - ERROR - stderr - +2025-02-05 12:05:15 - INFO - stdout - {'loss': 1.0474, 'grad_norm': 1.170767068862915, 'learning_rate': 1.953393204625423e-05, 'epoch': 0.37} +2025-02-05 12:05:15 - ERROR - stderr - 12%|█▏ | 2797/22434 [1:57:34<13:43:10, 2.52s/it] +2025-02-05 12:05:17 - ERROR - stderr - 12%|█▏ | 2798/22434 [1:57:37<13:40:08, 2.51s/it] +2025-02-05 12:05:17 - ERROR - stderr - +2025-02-05 12:05:17 - ERROR - stderr - +2025-02-05 12:05:17 - INFO - stdout - {'loss': 1.0461, 'grad_norm': 1.1557772159576416, 'learning_rate': 1.953349632425639e-05, 'epoch': 0.37} +2025-02-05 12:05:17 - ERROR - stderr - 12%|█▏ | 2798/22434 [1:57:37<13:40:08, 2.51s/it] +2025-02-05 12:05:20 - ERROR - stderr - 12%|█▏ | 2799/22434 [1:57:39<13:34:26, 2.49s/it] +2025-02-05 12:05:20 - ERROR - stderr - +2025-02-05 12:05:20 - ERROR - stderr - +2025-02-05 12:05:20 - 
INFO - stdout - {'loss': 0.9683, 'grad_norm': 1.1081714630126953, 'learning_rate': 1.9533060403541937e-05, 'epoch': 0.37} +2025-02-05 12:05:20 - ERROR - stderr - 12%|█▏ | 2799/22434 [1:57:39<13:34:26, 2.49s/it] +2025-02-05 12:05:22 - ERROR - stderr - 12%|█▏ | 2800/22434 [1:57:42<13:34:44, 2.49s/it] +2025-02-05 12:05:22 - ERROR - stderr - +2025-02-05 12:05:22 - ERROR - stderr - +2025-02-05 12:05:22 - INFO - stdout - {'loss': 1.0143, 'grad_norm': 1.2257702350616455, 'learning_rate': 1.953262428411997e-05, 'epoch': 0.37} +2025-02-05 12:05:22 - ERROR - stderr - 12%|█▏ | 2800/22434 [1:57:42<13:34:44, 2.49s/it] +2025-02-05 12:05:25 - ERROR - stderr - 12%|█▏ | 2801/22434 [1:57:44<13:56:37, 2.56s/it] +2025-02-05 12:05:25 - ERROR - stderr - +2025-02-05 12:05:25 - ERROR - stderr - +2025-02-05 12:05:25 - INFO - stdout - {'loss': 1.0098, 'grad_norm': 1.2131214141845703, 'learning_rate': 1.9532187965999565e-05, 'epoch': 0.37} +2025-02-05 12:05:25 - ERROR - stderr - 12%|█▏ | 2801/22434 [1:57:45<13:56:37, 2.56s/it] +2025-02-05 12:05:27 - ERROR - stderr - 12%|█▏ | 2802/22434 [1:57:47<14:06:45, 2.59s/it] +2025-02-05 12:05:27 - ERROR - stderr - +2025-02-05 12:05:27 - ERROR - stderr - +2025-02-05 12:05:27 - INFO - stdout - {'loss': 1.0206, 'grad_norm': 1.0500364303588867, 'learning_rate': 1.9531751449189826e-05, 'epoch': 0.37} +2025-02-05 12:05:27 - ERROR - stderr - 12%|█▏ | 2802/22434 [1:57:47<14:06:45, 2.59s/it] +2025-02-05 12:05:30 - ERROR - stderr - 12%|█▏ | 2803/22434 [1:57:50<13:56:06, 2.56s/it] +2025-02-05 12:05:30 - ERROR - stderr - +2025-02-05 12:05:30 - ERROR - stderr - +2025-02-05 12:05:30 - INFO - stdout - {'loss': 0.916, 'grad_norm': 1.1045676469802856, 'learning_rate': 1.953131473369985e-05, 'epoch': 0.37} +2025-02-05 12:05:30 - ERROR - stderr - 12%|█▏ | 2803/22434 [1:57:50<13:56:06, 2.56s/it] +2025-02-05 12:05:32 - ERROR - stderr - 12%|█▏ | 2804/22434 [1:57:52<13:49:49, 2.54s/it] +2025-02-05 12:05:32 - ERROR - stderr - +2025-02-05 12:05:32 - ERROR - stderr - 
+2025-02-05 12:05:32 - INFO - stdout - {'loss': 0.9999, 'grad_norm': 1.1069353818893433, 'learning_rate': 1.9530877819538736e-05, 'epoch': 0.37} +2025-02-05 12:05:32 - ERROR - stderr - 12%|█▏ | 2804/22434 [1:57:52<13:49:49, 2.54s/it] +2025-02-05 12:05:35 - ERROR - stderr - 13%|█▎ | 2805/22434 [1:57:55<14:15:35, 2.62s/it] +2025-02-05 12:05:35 - ERROR - stderr - +2025-02-05 12:05:35 - ERROR - stderr - +2025-02-05 12:05:35 - INFO - stdout - {'loss': 0.8825, 'grad_norm': 0.9986951351165771, 'learning_rate': 1.9530440706715595e-05, 'epoch': 0.38} +2025-02-05 12:05:35 - ERROR - stderr - 13%|█▎ | 2805/22434 [1:57:55<14:15:35, 2.62s/it] +2025-02-05 12:05:38 - ERROR - stderr - 13%|█▎ | 2806/22434 [1:57:57<14:03:01, 2.58s/it] +2025-02-05 12:05:38 - ERROR - stderr - +2025-02-05 12:05:38 - ERROR - stderr - +2025-02-05 12:05:38 - INFO - stdout - {'loss': 1.0313, 'grad_norm': 1.0826143026351929, 'learning_rate': 1.9530003395239538e-05, 'epoch': 0.38} +2025-02-05 12:05:38 - ERROR - stderr - 13%|█▎ | 2806/22434 [1:57:57<14:03:01, 2.58s/it] +2025-02-05 12:05:40 - ERROR - stderr - 13%|█▎ | 2807/22434 [1:58:00<13:54:55, 2.55s/it] +2025-02-05 12:05:40 - ERROR - stderr - +2025-02-05 12:05:40 - ERROR - stderr - +2025-02-05 12:05:40 - INFO - stdout - {'loss': 0.997, 'grad_norm': 1.1827342510223389, 'learning_rate': 1.9529565885119676e-05, 'epoch': 0.38} +2025-02-05 12:05:40 - ERROR - stderr - 13%|█▎ | 2807/22434 [1:58:00<13:54:55, 2.55s/it] +2025-02-05 12:05:43 - ERROR - stderr - 13%|█▎ | 2808/22434 [1:58:02<13:49:55, 2.54s/it] +2025-02-05 12:05:43 - ERROR - stderr - +2025-02-05 12:05:43 - ERROR - stderr - +2025-02-05 12:05:43 - INFO - stdout - {'loss': 0.8826, 'grad_norm': 1.122597098350525, 'learning_rate': 1.9529128176365137e-05, 'epoch': 0.38} +2025-02-05 12:05:43 - ERROR - stderr - 13%|█▎ | 2808/22434 [1:58:02<13:49:55, 2.54s/it] +2025-02-05 12:05:45 - ERROR - stderr - 13%|█▎ | 2809/22434 [1:58:05<14:11:57, 2.60s/it] +2025-02-05 12:05:45 - ERROR - stderr - +2025-02-05 12:05:45 - 
ERROR - stderr - +2025-02-05 12:05:45 - INFO - stdout - {'loss': 0.8772, 'grad_norm': 1.0263030529022217, 'learning_rate': 1.9528690268985037e-05, 'epoch': 0.38} +2025-02-05 12:05:45 - ERROR - stderr - 13%|█▎ | 2809/22434 [1:58:05<14:11:57, 2.60s/it] +2025-02-05 12:05:48 - ERROR - stderr - 13%|█▎ | 2810/22434 [1:58:08<14:17:44, 2.62s/it] +2025-02-05 12:05:48 - ERROR - stderr - +2025-02-05 12:05:48 - ERROR - stderr - +2025-02-05 12:05:48 - INFO - stdout - {'loss': 0.9948, 'grad_norm': 1.1097997426986694, 'learning_rate': 1.9528252162988505e-05, 'epoch': 0.38} +2025-02-05 12:05:48 - ERROR - stderr - 13%|█▎ | 2810/22434 [1:58:08<14:17:44, 2.62s/it] +2025-02-05 12:05:50 - ERROR - stderr - 13%|█▎ | 2811/22434 [1:58:10<13:58:35, 2.56s/it] +2025-02-05 12:05:51 - ERROR - stderr - +2025-02-05 12:05:51 - ERROR - stderr - +2025-02-05 12:05:51 - INFO - stdout - {'loss': 1.1656, 'grad_norm': 1.2263058423995972, 'learning_rate': 1.9527813858384678e-05, 'epoch': 0.38} +2025-02-05 12:05:51 - ERROR - stderr - 13%|█▎ | 2811/22434 [1:58:10<13:58:35, 2.56s/it] +2025-02-05 12:05:53 - ERROR - stderr - 13%|█▎ | 2812/22434 [1:58:13<13:51:53, 2.54s/it] +2025-02-05 12:05:53 - ERROR - stderr - +2025-02-05 12:05:53 - ERROR - stderr - +2025-02-05 12:05:53 - INFO - stdout - {'loss': 1.1627, 'grad_norm': 1.264751672744751, 'learning_rate': 1.9527375355182684e-05, 'epoch': 0.38} +2025-02-05 12:05:53 - ERROR - stderr - 13%|█▎ | 2812/22434 [1:58:13<13:51:53, 2.54s/it] +2025-02-05 12:05:56 - ERROR - stderr - 13%|█▎ | 2813/22434 [1:58:15<13:57:30, 2.56s/it] +2025-02-05 12:05:56 - ERROR - stderr - +2025-02-05 12:05:56 - ERROR - stderr - +2025-02-05 12:05:56 - INFO - stdout - {'loss': 0.9163, 'grad_norm': 1.0346555709838867, 'learning_rate': 1.952693665339167e-05, 'epoch': 0.38} +2025-02-05 12:05:56 - ERROR - stderr - 13%|█▎ | 2813/22434 [1:58:15<13:57:30, 2.56s/it] +2025-02-05 12:05:58 - ERROR - stderr - 13%|█▎ | 2814/22434 [1:58:18<13:46:50, 2.53s/it] +2025-02-05 12:05:58 - ERROR - stderr - 
+2025-02-05 12:05:58 - ERROR - stderr - +2025-02-05 12:05:58 - INFO - stdout - {'loss': 0.9958, 'grad_norm': 1.1582435369491577, 'learning_rate': 1.9526497753020776e-05, 'epoch': 0.38} +2025-02-05 12:05:58 - ERROR - stderr - 13%|█▎ | 2814/22434 [1:58:18<13:46:50, 2.53s/it] +2025-02-05 12:06:01 - ERROR - stderr - 13%|█▎ | 2815/22434 [1:58:20<13:44:01, 2.52s/it] +2025-02-05 12:06:01 - ERROR - stderr - +2025-02-05 12:06:01 - ERROR - stderr - +2025-02-05 12:06:01 - INFO - stdout - {'loss': 0.9527, 'grad_norm': 1.1092829704284668, 'learning_rate': 1.9526058654079155e-05, 'epoch': 0.38} +2025-02-05 12:06:01 - ERROR - stderr - 13%|█▎ | 2815/22434 [1:58:20<13:44:01, 2.52s/it] +2025-02-05 12:06:03 - ERROR - stderr - 13%|█▎ | 2816/22434 [1:58:23<13:41:32, 2.51s/it] +2025-02-05 12:06:03 - ERROR - stderr - +2025-02-05 12:06:03 - ERROR - stderr - +2025-02-05 12:06:03 - INFO - stdout - {'loss': 1.0141, 'grad_norm': 1.1325383186340332, 'learning_rate': 1.9525619356575955e-05, 'epoch': 0.38} +2025-02-05 12:06:03 - ERROR - stderr - 13%|█▎ | 2816/22434 [1:58:23<13:41:32, 2.51s/it] +2025-02-05 12:06:05 - ERROR - stderr - 13%|█▎ | 2817/22434 [1:58:25<13:33:37, 2.49s/it] +2025-02-05 12:06:06 - ERROR - stderr - +2025-02-05 12:06:06 - ERROR - stderr - +2025-02-05 12:06:06 - INFO - stdout - {'loss': 0.9861, 'grad_norm': 1.1630785465240479, 'learning_rate': 1.9525179860520334e-05, 'epoch': 0.38} +2025-02-05 12:06:06 - ERROR - stderr - 13%|█▎ | 2817/22434 [1:58:25<13:33:37, 2.49s/it] +2025-02-05 12:06:08 - ERROR - stderr - 13%|█▎ | 2818/22434 [1:58:28<13:33:56, 2.49s/it] +2025-02-05 12:06:08 - ERROR - stderr - +2025-02-05 12:06:08 - ERROR - stderr - +2025-02-05 12:06:08 - INFO - stdout - {'loss': 0.958, 'grad_norm': 1.2063350677490234, 'learning_rate': 1.9524740165921454e-05, 'epoch': 0.38} +2025-02-05 12:06:08 - ERROR - stderr - 13%|█▎ | 2818/22434 [1:58:28<13:33:56, 2.49s/it] +2025-02-05 12:06:10 - ERROR - stderr - 13%|█▎ | 2819/22434 [1:58:30<13:36:47, 2.50s/it] +2025-02-05 12:06:11 - 
ERROR - stderr - +2025-02-05 12:06:11 - ERROR - stderr - +2025-02-05 12:06:11 - INFO - stdout - {'loss': 0.9974, 'grad_norm': 1.213502049446106, 'learning_rate': 1.9524300272788477e-05, 'epoch': 0.38} +2025-02-05 12:06:11 - ERROR - stderr - 13%|█▎ | 2819/22434 [1:58:30<13:36:47, 2.50s/it] +2025-02-05 12:06:13 - ERROR - stderr - 13%|█▎ | 2820/22434 [1:58:33<13:28:52, 2.47s/it] +2025-02-05 12:06:13 - ERROR - stderr - +2025-02-05 12:06:13 - ERROR - stderr - +2025-02-05 12:06:13 - INFO - stdout - {'loss': 1.06, 'grad_norm': 1.0902763605117798, 'learning_rate': 1.952386018113058e-05, 'epoch': 0.38} +2025-02-05 12:06:13 - ERROR - stderr - 13%|█▎ | 2820/22434 [1:58:33<13:28:52, 2.47s/it] +2025-02-05 12:06:15 - ERROR - stderr - 13%|█▎ | 2821/22434 [1:58:35<13:29:02, 2.48s/it] +2025-02-05 12:06:15 - ERROR - stderr - +2025-02-05 12:06:15 - ERROR - stderr - +2025-02-05 12:06:15 - INFO - stdout - {'loss': 0.8734, 'grad_norm': 1.064086675643921, 'learning_rate': 1.9523419890956927e-05, 'epoch': 0.38} +2025-02-05 12:06:15 - ERROR - stderr - 13%|█▎ | 2821/22434 [1:58:35<13:29:02, 2.48s/it] +2025-02-05 12:06:18 - ERROR - stderr - 13%|█▎ | 2822/22434 [1:58:38<13:30:07, 2.48s/it] +2025-02-05 12:06:18 - ERROR - stderr - +2025-02-05 12:06:18 - ERROR - stderr - +2025-02-05 12:06:18 - INFO - stdout - {'loss': 1.1146, 'grad_norm': 1.1437721252441406, 'learning_rate': 1.9522979402276704e-05, 'epoch': 0.38} +2025-02-05 12:06:18 - ERROR - stderr - 13%|█▎ | 2822/22434 [1:58:38<13:30:07, 2.48s/it] +2025-02-05 12:06:20 - ERROR - stderr - 13%|█▎ | 2823/22434 [1:58:40<13:33:31, 2.49s/it] +2025-02-05 12:06:20 - ERROR - stderr - +2025-02-05 12:06:20 - ERROR - stderr - +2025-02-05 12:06:20 - INFO - stdout - {'loss': 0.9732, 'grad_norm': 1.127562403678894, 'learning_rate': 1.952253871509908e-05, 'epoch': 0.38} +2025-02-05 12:06:20 - ERROR - stderr - 13%|█▎ | 2823/22434 [1:58:40<13:33:31, 2.49s/it] +2025-02-05 12:06:23 - ERROR - stderr - 13%|█▎ | 2824/22434 [1:58:43<13:40:08, 2.51s/it] +2025-02-05 
12:06:23 - ERROR - stderr - +2025-02-05 12:06:23 - ERROR - stderr - +2025-02-05 12:06:23 - INFO - stdout - {'loss': 1.1216, 'grad_norm': 1.2367345094680786, 'learning_rate': 1.9522097829433252e-05, 'epoch': 0.38} +2025-02-05 12:06:23 - ERROR - stderr - 13%|█▎ | 2824/22434 [1:58:43<13:40:08, 2.51s/it] +2025-02-05 12:06:26 - ERROR - stderr - 13%|█▎ | 2825/22434 [1:58:45<14:00:48, 2.57s/it] +2025-02-05 12:06:26 - ERROR - stderr - +2025-02-05 12:06:26 - ERROR - stderr - +2025-02-05 12:06:26 - INFO - stdout - {'loss': 1.0174, 'grad_norm': 1.2287040948867798, 'learning_rate': 1.952165674528841e-05, 'epoch': 0.38} +2025-02-05 12:06:26 - ERROR - stderr - 13%|█▎ | 2825/22434 [1:58:45<14:00:48, 2.57s/it] +2025-02-05 12:06:28 - ERROR - stderr - 13%|█▎ | 2826/22434 [1:58:48<13:45:58, 2.53s/it] +2025-02-05 12:06:28 - ERROR - stderr - +2025-02-05 12:06:28 - ERROR - stderr - +2025-02-05 12:06:28 - INFO - stdout - {'loss': 0.9984, 'grad_norm': 1.0770084857940674, 'learning_rate': 1.9521215462673743e-05, 'epoch': 0.38} +2025-02-05 12:06:28 - ERROR - stderr - 13%|█▎ | 2826/22434 [1:58:48<13:45:58, 2.53s/it] +2025-02-05 12:06:31 - ERROR - stderr - 13%|█▎ | 2827/22434 [1:58:50<13:43:57, 2.52s/it] +2025-02-05 12:06:31 - ERROR - stderr - +2025-02-05 12:06:31 - ERROR - stderr - +2025-02-05 12:06:31 - INFO - stdout - {'loss': 1.0212, 'grad_norm': 1.1580950021743774, 'learning_rate': 1.9520773981598446e-05, 'epoch': 0.38} +2025-02-05 12:06:31 - ERROR - stderr - 13%|█▎ | 2827/22434 [1:58:50<13:43:57, 2.52s/it] +2025-02-05 12:06:33 - ERROR - stderr - 13%|█▎ | 2828/22434 [1:58:53<13:39:38, 2.51s/it] +2025-02-05 12:06:33 - ERROR - stderr - +2025-02-05 12:06:33 - ERROR - stderr - +2025-02-05 12:06:33 - INFO - stdout - {'loss': 0.9547, 'grad_norm': 1.1108931303024292, 'learning_rate': 1.952033230207173e-05, 'epoch': 0.38} +2025-02-05 12:06:33 - ERROR - stderr - 13%|█▎ | 2828/22434 [1:58:53<13:39:38, 2.51s/it] +2025-02-05 12:06:35 - ERROR - stderr - 13%|█▎ | 2829/22434 [1:58:55<13:29:35, 
2.48s/it] +2025-02-05 12:06:36 - ERROR - stderr - +2025-02-05 12:06:36 - ERROR - stderr - +2025-02-05 12:06:36 - INFO - stdout - {'loss': 1.0631, 'grad_norm': 1.1535866260528564, 'learning_rate': 1.9519890424102795e-05, 'epoch': 0.38} +2025-02-05 12:06:36 - ERROR - stderr - 13%|█▎ | 2829/22434 [1:58:55<13:29:35, 2.48s/it] +2025-02-05 12:06:38 - ERROR - stderr - 13%|█▎ | 2830/22434 [1:58:58<13:48:45, 2.54s/it] +2025-02-05 12:06:38 - ERROR - stderr - +2025-02-05 12:06:38 - ERROR - stderr - +2025-02-05 12:06:38 - INFO - stdout - {'loss': 1.0352, 'grad_norm': 1.2534866333007812, 'learning_rate': 1.9519448347700855e-05, 'epoch': 0.38} +2025-02-05 12:06:38 - ERROR - stderr - 13%|█▎ | 2830/22434 [1:58:58<13:48:45, 2.54s/it] +2025-02-05 12:06:41 - ERROR - stderr - 13%|█▎ | 2831/22434 [1:59:00<13:40:51, 2.51s/it] +2025-02-05 12:06:41 - ERROR - stderr - +2025-02-05 12:06:41 - ERROR - stderr - +2025-02-05 12:06:41 - INFO - stdout - {'loss': 1.0495, 'grad_norm': 1.1554940938949585, 'learning_rate': 1.951900607287512e-05, 'epoch': 0.38} +2025-02-05 12:06:41 - ERROR - stderr - 13%|█▎ | 2831/22434 [1:59:00<13:40:51, 2.51s/it] +2025-02-05 12:06:43 - ERROR - stderr - 13%|█▎ | 2832/22434 [1:59:03<13:32:39, 2.49s/it] +2025-02-05 12:06:43 - ERROR - stderr - +2025-02-05 12:06:43 - ERROR - stderr - +2025-02-05 12:06:43 - INFO - stdout - {'loss': 0.9284, 'grad_norm': 1.0243728160858154, 'learning_rate': 1.9518563599634815e-05, 'epoch': 0.38} +2025-02-05 12:06:43 - ERROR - stderr - 13%|█▎ | 2832/22434 [1:59:03<13:32:39, 2.49s/it] +2025-02-05 12:06:46 - ERROR - stderr - 13%|█▎ | 2833/22434 [1:59:05<13:32:32, 2.49s/it] +2025-02-05 12:06:46 - ERROR - stderr - +2025-02-05 12:06:46 - ERROR - stderr - +2025-02-05 12:06:46 - INFO - stdout - {'loss': 0.8786, 'grad_norm': 1.187166690826416, 'learning_rate': 1.951812092798916e-05, 'epoch': 0.38} +2025-02-05 12:06:46 - ERROR - stderr - 13%|█▎ | 2833/22434 [1:59:05<13:32:32, 2.49s/it] +2025-02-05 12:06:48 - ERROR - stderr - 13%|█▎ | 2834/22434 
[1:59:08<13:32:01, 2.49s/it] +2025-02-05 12:06:48 - ERROR - stderr - +2025-02-05 12:06:48 - ERROR - stderr - +2025-02-05 12:06:48 - INFO - stdout - {'loss': 0.9292, 'grad_norm': 1.212857961654663, 'learning_rate': 1.9517678057947385e-05, 'epoch': 0.38} +2025-02-05 12:06:48 - ERROR - stderr - 13%|█▎ | 2834/22434 [1:59:08<13:32:01, 2.49s/it] +2025-02-05 12:06:50 - ERROR - stderr - 13%|█▎ | 2835/22434 [1:59:10<13:31:27, 2.48s/it] +2025-02-05 12:06:51 - ERROR - stderr - +2025-02-05 12:06:51 - ERROR - stderr - +2025-02-05 12:06:51 - INFO - stdout - {'loss': 0.8352, 'grad_norm': 1.006543755531311, 'learning_rate': 1.9517234989518715e-05, 'epoch': 0.38} +2025-02-05 12:06:51 - ERROR - stderr - 13%|█▎ | 2835/22434 [1:59:10<13:31:27, 2.48s/it] +2025-02-05 12:06:53 - ERROR - stderr - 13%|█▎ | 2836/22434 [1:59:13<13:27:53, 2.47s/it] +2025-02-05 12:06:53 - ERROR - stderr - +2025-02-05 12:06:53 - ERROR - stderr - +2025-02-05 12:06:53 - INFO - stdout - {'loss': 1.1225, 'grad_norm': 1.2194923162460327, 'learning_rate': 1.9516791722712388e-05, 'epoch': 0.38} +2025-02-05 12:06:53 - ERROR - stderr - 13%|█▎ | 2836/22434 [1:59:13<13:27:53, 2.47s/it] +2025-02-05 12:06:55 - ERROR - stderr - 13%|█▎ | 2837/22434 [1:59:15<13:24:48, 2.46s/it] +2025-02-05 12:06:55 - ERROR - stderr - +2025-02-05 12:06:55 - ERROR - stderr - +2025-02-05 12:06:55 - INFO - stdout - {'loss': 1.0428, 'grad_norm': 1.0792709589004517, 'learning_rate': 1.9516348257537646e-05, 'epoch': 0.38} +2025-02-05 12:06:55 - ERROR - stderr - 13%|█▎ | 2837/22434 [1:59:15<13:24:48, 2.46s/it] +2025-02-05 12:06:58 - ERROR - stderr - 13%|█▎ | 2838/22434 [1:59:18<13:25:13, 2.47s/it] +2025-02-05 12:06:58 - ERROR - stderr - +2025-02-05 12:06:58 - ERROR - stderr - +2025-02-05 12:06:58 - INFO - stdout - {'loss': 1.0347, 'grad_norm': 1.2383439540863037, 'learning_rate': 1.951590459400373e-05, 'epoch': 0.38} +2025-02-05 12:06:58 - ERROR - stderr - 13%|█▎ | 2838/22434 [1:59:18<13:25:13, 2.47s/it] +2025-02-05 12:07:00 - ERROR - stderr - 13%|█▎ 
| 2839/22434 [1:59:20<13:21:44, 2.45s/it] +2025-02-05 12:07:00 - ERROR - stderr - +2025-02-05 12:07:00 - ERROR - stderr - +2025-02-05 12:07:00 - INFO - stdout - {'loss': 1.0192, 'grad_norm': 1.1096854209899902, 'learning_rate': 1.9515460732119887e-05, 'epoch': 0.38} +2025-02-05 12:07:00 - ERROR - stderr - 13%|█▎ | 2839/22434 [1:59:20<13:21:44, 2.45s/it] +2025-02-05 12:07:03 - ERROR - stderr - 13%|█▎ | 2840/22434 [1:59:23<13:25:24, 2.47s/it] +2025-02-05 12:07:03 - ERROR - stderr - +2025-02-05 12:07:03 - ERROR - stderr - +2025-02-05 12:07:03 - INFO - stdout - {'loss': 1.0516, 'grad_norm': 1.0400632619857788, 'learning_rate': 1.9515016671895373e-05, 'epoch': 0.38} +2025-02-05 12:07:03 - ERROR - stderr - 13%|█▎ | 2840/22434 [1:59:23<13:25:24, 2.47s/it] +2025-02-05 12:07:05 - ERROR - stderr - 13%|█▎ | 2841/22434 [1:59:25<13:34:18, 2.49s/it] +2025-02-05 12:07:05 - ERROR - stderr - +2025-02-05 12:07:05 - ERROR - stderr - +2025-02-05 12:07:05 - INFO - stdout - {'loss': 1.1986, 'grad_norm': 1.222752332687378, 'learning_rate': 1.9514572413339442e-05, 'epoch': 0.38} +2025-02-05 12:07:05 - ERROR - stderr - 13%|█▎ | 2841/22434 [1:59:25<13:34:18, 2.49s/it] +2025-02-05 12:07:08 - ERROR - stderr - 13%|█▎ | 2842/22434 [1:59:28<13:27:25, 2.47s/it] +2025-02-05 12:07:08 - ERROR - stderr - +2025-02-05 12:07:08 - ERROR - stderr - +2025-02-05 12:07:08 - INFO - stdout - {'loss': 0.8845, 'grad_norm': 1.0761499404907227, 'learning_rate': 1.9514127956461348e-05, 'epoch': 0.38} +2025-02-05 12:07:08 - ERROR - stderr - 13%|█▎ | 2842/22434 [1:59:28<13:27:25, 2.47s/it] +2025-02-05 12:07:10 - ERROR - stderr - 13%|█▎ | 2843/22434 [1:59:30<13:25:12, 2.47s/it] +2025-02-05 12:07:10 - ERROR - stderr - +2025-02-05 12:07:10 - ERROR - stderr - +2025-02-05 12:07:10 - INFO - stdout - {'loss': 0.8417, 'grad_norm': 1.014450192451477, 'learning_rate': 1.9513683301270364e-05, 'epoch': 0.38} +2025-02-05 12:07:10 - ERROR - stderr - 13%|█▎ | 2843/22434 [1:59:30<13:25:12, 2.47s/it] +2025-02-05 12:07:13 - ERROR - 
stderr - 13%|█▎ | 2844/22434 [1:59:32<13:26:16, 2.47s/it] +2025-02-05 12:07:13 - ERROR - stderr - +2025-02-05 12:07:13 - ERROR - stderr - +2025-02-05 12:07:13 - INFO - stdout - {'loss': 0.9648, 'grad_norm': 1.1654932498931885, 'learning_rate': 1.9513238447775757e-05, 'epoch': 0.38} +2025-02-05 12:07:13 - ERROR - stderr - 13%|█▎ | 2844/22434 [1:59:32<13:26:16, 2.47s/it] +2025-02-05 12:07:15 - ERROR - stderr - 13%|█▎ | 2845/22434 [1:59:35<13:25:18, 2.47s/it] +2025-02-05 12:07:15 - ERROR - stderr - +2025-02-05 12:07:15 - ERROR - stderr - +2025-02-05 12:07:15 - INFO - stdout - {'loss': 1.104, 'grad_norm': 1.1912479400634766, 'learning_rate': 1.9512793395986796e-05, 'epoch': 0.38} +2025-02-05 12:07:15 - ERROR - stderr - 13%|█▎ | 2845/22434 [1:59:35<13:25:18, 2.47s/it] +2025-02-05 12:07:18 - ERROR - stderr - 13%|█▎ | 2846/22434 [1:59:37<13:22:27, 2.46s/it] +2025-02-05 12:07:18 - ERROR - stderr - +2025-02-05 12:07:18 - ERROR - stderr - +2025-02-05 12:07:18 - INFO - stdout - {'loss': 1.0041, 'grad_norm': 1.302092432975769, 'learning_rate': 1.951234814591276e-05, 'epoch': 0.38} +2025-02-05 12:07:18 - ERROR - stderr - 13%|█▎ | 2846/22434 [1:59:37<13:22:27, 2.46s/it] +2025-02-05 12:07:20 - ERROR - stderr - 13%|█▎ | 2847/22434 [1:59:40<13:25:42, 2.47s/it] +2025-02-05 12:07:20 - ERROR - stderr - +2025-02-05 12:07:20 - ERROR - stderr - +2025-02-05 12:07:20 - INFO - stdout - {'loss': 1.1675, 'grad_norm': 1.2056795358657837, 'learning_rate': 1.951190269756293e-05, 'epoch': 0.38} +2025-02-05 12:07:20 - ERROR - stderr - 13%|█▎ | 2847/22434 [1:59:40<13:25:42, 2.47s/it] +2025-02-05 12:07:23 - ERROR - stderr - 13%|█▎ | 2848/22434 [1:59:42<13:24:52, 2.47s/it] +2025-02-05 12:07:23 - ERROR - stderr - +2025-02-05 12:07:23 - ERROR - stderr - +2025-02-05 12:07:23 - INFO - stdout - {'loss': 0.9501, 'grad_norm': 1.1844807863235474, 'learning_rate': 1.9511457050946586e-05, 'epoch': 0.38} +2025-02-05 12:07:23 - ERROR - stderr - 13%|█▎ | 2848/22434 [1:59:42<13:24:52, 2.47s/it] +2025-02-05 
12:07:25 - ERROR - stderr - 13%|█▎ | 2849/22434 [1:59:45<13:31:48, 2.49s/it] +2025-02-05 12:07:25 - ERROR - stderr - +2025-02-05 12:07:25 - ERROR - stderr - +2025-02-05 12:07:25 - INFO - stdout - {'loss': 1.0995, 'grad_norm': 1.2032376527786255, 'learning_rate': 1.9511011206073026e-05, 'epoch': 0.38} +2025-02-05 12:07:25 - ERROR - stderr - 13%|█▎ | 2849/22434 [1:59:45<13:31:48, 2.49s/it] +2025-02-05 12:07:28 - ERROR - stderr - 13%|█▎ | 2850/22434 [1:59:47<13:35:02, 2.50s/it] +2025-02-05 12:07:28 - ERROR - stderr - +2025-02-05 12:07:28 - ERROR - stderr - +2025-02-05 12:07:28 - INFO - stdout - {'loss': 0.9985, 'grad_norm': 1.1233458518981934, 'learning_rate': 1.9510565162951538e-05, 'epoch': 0.38} +2025-02-05 12:07:28 - ERROR - stderr - 13%|█▎ | 2850/22434 [1:59:47<13:35:02, 2.50s/it] +2025-02-05 12:07:30 - ERROR - stderr - 13%|█▎ | 2851/22434 [1:59:50<13:34:29, 2.50s/it] +2025-02-05 12:07:30 - ERROR - stderr - +2025-02-05 12:07:30 - ERROR - stderr - +2025-02-05 12:07:30 - INFO - stdout - {'loss': 1.0025, 'grad_norm': 1.1801958084106445, 'learning_rate': 1.9510118921591417e-05, 'epoch': 0.38} +2025-02-05 12:07:30 - ERROR - stderr - 13%|█▎ | 2851/22434 [1:59:50<13:34:29, 2.50s/it] +2025-02-05 12:07:33 - ERROR - stderr - 13%|█▎ | 2852/22434 [1:59:52<13:46:00, 2.53s/it] +2025-02-05 12:07:33 - ERROR - stderr - +2025-02-05 12:07:33 - ERROR - stderr - +2025-02-05 12:07:33 - INFO - stdout - {'loss': 1.0633, 'grad_norm': 0.9823777675628662, 'learning_rate': 1.9509672482001968e-05, 'epoch': 0.38} +2025-02-05 12:07:33 - ERROR - stderr - 13%|█▎ | 2852/22434 [1:59:52<13:46:00, 2.53s/it] +2025-02-05 12:07:35 - ERROR - stderr - 13%|█▎ | 2853/22434 [1:59:55<13:47:19, 2.54s/it] +2025-02-05 12:07:35 - ERROR - stderr - +2025-02-05 12:07:35 - ERROR - stderr - +2025-02-05 12:07:35 - INFO - stdout - {'loss': 0.9752, 'grad_norm': 1.221772313117981, 'learning_rate': 1.9509225844192498e-05, 'epoch': 0.38} +2025-02-05 12:07:35 - ERROR - stderr - 13%|█▎ | 2853/22434 [1:59:55<13:47:19, 
2.54s/it] +2025-02-05 12:07:38 - ERROR - stderr - 13%|█▎ | 2854/22434 [1:59:57<13:35:53, 2.50s/it] +2025-02-05 12:07:38 - ERROR - stderr - +2025-02-05 12:07:38 - ERROR - stderr - +2025-02-05 12:07:38 - INFO - stdout - {'loss': 1.0377, 'grad_norm': 1.2179268598556519, 'learning_rate': 1.9508779008172314e-05, 'epoch': 0.38} +2025-02-05 12:07:38 - ERROR - stderr - 13%|█▎ | 2854/22434 [1:59:57<13:35:53, 2.50s/it] +2025-02-05 12:07:40 - ERROR - stderr - 13%|█▎ | 2855/22434 [2:00:00<13:31:08, 2.49s/it] +2025-02-05 12:07:40 - ERROR - stderr - +2025-02-05 12:07:40 - ERROR - stderr - +2025-02-05 12:07:40 - INFO - stdout - {'loss': 0.8815, 'grad_norm': 1.0407724380493164, 'learning_rate': 1.950833197395073e-05, 'epoch': 0.38} +2025-02-05 12:07:40 - ERROR - stderr - 13%|█▎ | 2855/22434 [2:00:00<13:31:08, 2.49s/it] +2025-02-05 12:07:43 - ERROR - stderr - 13%|█▎ | 2856/22434 [2:00:02<13:34:03, 2.49s/it] +2025-02-05 12:07:43 - ERROR - stderr - +2025-02-05 12:07:43 - ERROR - stderr - +2025-02-05 12:07:43 - INFO - stdout - {'loss': 0.9635, 'grad_norm': 1.1152023077011108, 'learning_rate': 1.9507884741537063e-05, 'epoch': 0.38} +2025-02-05 12:07:43 - ERROR - stderr - 13%|█▎ | 2856/22434 [2:00:02<13:34:03, 2.49s/it] +2025-02-05 12:07:45 - ERROR - stderr - 13%|█▎ | 2857/22434 [2:00:05<13:30:00, 2.48s/it] +2025-02-05 12:07:45 - ERROR - stderr - +2025-02-05 12:07:45 - ERROR - stderr - +2025-02-05 12:07:45 - INFO - stdout - {'loss': 0.9984, 'grad_norm': 1.1226112842559814, 'learning_rate': 1.950743731094064e-05, 'epoch': 0.38} +2025-02-05 12:07:45 - ERROR - stderr - 13%|█▎ | 2857/22434 [2:00:05<13:30:00, 2.48s/it] +2025-02-05 12:07:47 - ERROR - stderr - 13%|█▎ | 2858/22434 [2:00:07<13:25:40, 2.47s/it] +2025-02-05 12:07:48 - ERROR - stderr - +2025-02-05 12:07:48 - ERROR - stderr - +2025-02-05 12:07:48 - INFO - stdout - {'loss': 0.8843, 'grad_norm': 1.1583904027938843, 'learning_rate': 1.9506989682170782e-05, 'epoch': 0.38} +2025-02-05 12:07:48 - ERROR - stderr - 13%|█▎ | 2858/22434 
[2:00:07<13:25:40, 2.47s/it] +2025-02-05 12:07:50 - ERROR - stderr - 13%|█▎ | 2859/22434 [2:00:10<13:24:06, 2.46s/it] +2025-02-05 12:07:50 - ERROR - stderr - +2025-02-05 12:07:50 - ERROR - stderr - +2025-02-05 12:07:50 - INFO - stdout - {'loss': 0.9417, 'grad_norm': 1.120026707649231, 'learning_rate': 1.950654185523682e-05, 'epoch': 0.38} +2025-02-05 12:07:50 - ERROR - stderr - 13%|█▎ | 2859/22434 [2:00:10<13:24:06, 2.46s/it] +2025-02-05 12:07:52 - ERROR - stderr - 13%|█▎ | 2860/22434 [2:00:12<13:22:31, 2.46s/it] +2025-02-05 12:07:52 - ERROR - stderr - +2025-02-05 12:07:52 - ERROR - stderr - +2025-02-05 12:07:52 - INFO - stdout - {'loss': 1.0384, 'grad_norm': 1.2614694833755493, 'learning_rate': 1.950609383014809e-05, 'epoch': 0.38} +2025-02-05 12:07:52 - ERROR - stderr - 13%|█▎ | 2860/22434 [2:00:12<13:22:31, 2.46s/it] +2025-02-05 12:07:55 - ERROR - stderr - 13%|█▎ | 2861/22434 [2:00:15<13:28:34, 2.48s/it] +2025-02-05 12:07:55 - ERROR - stderr - +2025-02-05 12:07:55 - ERROR - stderr - +2025-02-05 12:07:55 - INFO - stdout - {'loss': 0.9409, 'grad_norm': 1.121084451675415, 'learning_rate': 1.950564560691393e-05, 'epoch': 0.38} +2025-02-05 12:07:55 - ERROR - stderr - 13%|█▎ | 2861/22434 [2:00:15<13:28:34, 2.48s/it] +2025-02-05 12:07:57 - ERROR - stderr - 13%|█▎ | 2862/22434 [2:00:17<13:33:39, 2.49s/it] +2025-02-05 12:07:58 - ERROR - stderr - +2025-02-05 12:07:58 - ERROR - stderr - +2025-02-05 12:07:58 - INFO - stdout - {'loss': 1.0272, 'grad_norm': 1.1301448345184326, 'learning_rate': 1.9505197185543688e-05, 'epoch': 0.38} +2025-02-05 12:07:58 - ERROR - stderr - 13%|█▎ | 2862/22434 [2:00:17<13:33:39, 2.49s/it] +2025-02-05 12:08:00 - ERROR - stderr - 13%|█▎ | 2863/22434 [2:00:20<13:28:34, 2.48s/it] +2025-02-05 12:08:00 - ERROR - stderr - +2025-02-05 12:08:00 - ERROR - stderr - +2025-02-05 12:08:00 - INFO - stdout - {'loss': 0.9279, 'grad_norm': 1.1735868453979492, 'learning_rate': 1.9504748566046702e-05, 'epoch': 0.38} +2025-02-05 12:08:00 - ERROR - stderr - 13%|█▎ | 
2863/22434 [2:00:20<13:28:34, 2.48s/it] +2025-02-05 12:08:02 - ERROR - stderr - 13%|█▎ | 2864/22434 [2:00:22<13:34:29, 2.50s/it] +2025-02-05 12:08:02 - ERROR - stderr - +2025-02-05 12:08:02 - ERROR - stderr - +2025-02-05 12:08:02 - INFO - stdout - {'loss': 1.0019, 'grad_norm': 1.1865742206573486, 'learning_rate': 1.9504299748432328e-05, 'epoch': 0.38} +2025-02-05 12:08:02 - ERROR - stderr - 13%|█▎ | 2864/22434 [2:00:22<13:34:29, 2.50s/it] +2025-02-05 12:08:05 - ERROR - stderr - 13%|█▎ | 2865/22434 [2:00:25<13:33:27, 2.49s/it] +2025-02-05 12:08:05 - ERROR - stderr - +2025-02-05 12:08:05 - ERROR - stderr - +2025-02-05 12:08:05 - INFO - stdout - {'loss': 1.0377, 'grad_norm': 1.2304835319519043, 'learning_rate': 1.9503850732709918e-05, 'epoch': 0.38} +2025-02-05 12:08:05 - ERROR - stderr - 13%|█▎ | 2865/22434 [2:00:25<13:33:27, 2.49s/it] +2025-02-05 12:08:07 - ERROR - stderr - 13%|█▎ | 2866/22434 [2:00:27<13:30:24, 2.48s/it] +2025-02-05 12:08:07 - ERROR - stderr - +2025-02-05 12:08:07 - ERROR - stderr - +2025-02-05 12:08:07 - INFO - stdout - {'loss': 1.0499, 'grad_norm': 1.2325650453567505, 'learning_rate': 1.950340151888884e-05, 'epoch': 0.38} +2025-02-05 12:08:07 - ERROR - stderr - 13%|█▎ | 2866/22434 [2:00:27<13:30:24, 2.48s/it] +2025-02-05 12:08:10 - ERROR - stderr - 13%|█▎ | 2867/22434 [2:00:30<13:27:35, 2.48s/it] +2025-02-05 12:08:10 - ERROR - stderr - +2025-02-05 12:08:10 - ERROR - stderr - +2025-02-05 12:08:10 - INFO - stdout - {'loss': 0.9706, 'grad_norm': 1.0876178741455078, 'learning_rate': 1.9502952106978447e-05, 'epoch': 0.38} +2025-02-05 12:08:10 - ERROR - stderr - 13%|█▎ | 2867/22434 [2:00:30<13:27:35, 2.48s/it] +2025-02-05 12:08:12 - ERROR - stderr - 13%|█▎ | 2868/22434 [2:00:32<13:19:50, 2.45s/it] +2025-02-05 12:08:12 - ERROR - stderr - +2025-02-05 12:08:12 - ERROR - stderr - +2025-02-05 12:08:12 - INFO - stdout - {'loss': 0.9256, 'grad_norm': 1.2775626182556152, 'learning_rate': 1.950250249698811e-05, 'epoch': 0.38} +2025-02-05 12:08:12 - ERROR - 
stderr - 13%|█▎ | 2868/22434 [2:00:32<13:19:50, 2.45s/it] +2025-02-05 12:08:15 - ERROR - stderr - 13%|█▎ | 2869/22434 [2:00:34<13:21:09, 2.46s/it] +2025-02-05 12:08:15 - ERROR - stderr - +2025-02-05 12:08:15 - ERROR - stderr - +2025-02-05 12:08:15 - INFO - stdout - {'loss': 1.0165, 'grad_norm': 1.1436847448349, 'learning_rate': 1.9502052688927203e-05, 'epoch': 0.38} +2025-02-05 12:08:15 - ERROR - stderr - 13%|█▎ | 2869/22434 [2:00:35<13:21:09, 2.46s/it] +2025-02-05 12:08:17 - ERROR - stderr - 13%|█▎ | 2870/22434 [2:00:37<13:32:25, 2.49s/it] +2025-02-05 12:08:17 - ERROR - stderr - +2025-02-05 12:08:17 - ERROR - stderr - +2025-02-05 12:08:17 - INFO - stdout - {'loss': 1.0759, 'grad_norm': 1.235756754875183, 'learning_rate': 1.95016026828051e-05, 'epoch': 0.38} +2025-02-05 12:08:17 - ERROR - stderr - 13%|█▎ | 2870/22434 [2:00:37<13:32:25, 2.49s/it] +2025-02-05 12:08:20 - ERROR - stderr - 13%|█▎ | 2871/22434 [2:00:40<13:35:42, 2.50s/it] +2025-02-05 12:08:20 - ERROR - stderr - +2025-02-05 12:08:20 - ERROR - stderr - +2025-02-05 12:08:20 - INFO - stdout - {'loss': 0.9744, 'grad_norm': 1.212841272354126, 'learning_rate': 1.9501152478631177e-05, 'epoch': 0.38} +2025-02-05 12:08:20 - ERROR - stderr - 13%|█▎ | 2871/22434 [2:00:40<13:35:42, 2.50s/it] +2025-02-05 12:08:22 - ERROR - stderr - 13%|█▎ | 2872/22434 [2:00:42<13:28:53, 2.48s/it] +2025-02-05 12:08:22 - ERROR - stderr - +2025-02-05 12:08:22 - ERROR - stderr - +2025-02-05 12:08:22 - INFO - stdout - {'loss': 1.0691, 'grad_norm': 1.2590534687042236, 'learning_rate': 1.9500702076414827e-05, 'epoch': 0.38} +2025-02-05 12:08:22 - ERROR - stderr - 13%|█▎ | 2872/22434 [2:00:42<13:28:53, 2.48s/it] +2025-02-05 12:08:25 - ERROR - stderr - 13%|█▎ | 2873/22434 [2:00:45<13:30:28, 2.49s/it] +2025-02-05 12:08:25 - ERROR - stderr - +2025-02-05 12:08:25 - ERROR - stderr - +2025-02-05 12:08:25 - INFO - stdout - {'loss': 0.8867, 'grad_norm': 0.9985617995262146, 'learning_rate': 1.9500251476165432e-05, 'epoch': 0.38} +2025-02-05 12:08:25 - 
ERROR - stderr - 13%|█▎ | 2873/22434 [2:00:45<13:30:28, 2.49s/it] +2025-02-05 12:08:27 - ERROR - stderr - 13%|█▎ | 2874/22434 [2:00:47<13:33:24, 2.50s/it] +2025-02-05 12:08:27 - ERROR - stderr - +2025-02-05 12:08:27 - ERROR - stderr - +2025-02-05 12:08:27 - INFO - stdout - {'loss': 0.9833, 'grad_norm': 1.1148408651351929, 'learning_rate': 1.9499800677892386e-05, 'epoch': 0.38} +2025-02-05 12:08:27 - ERROR - stderr - 13%|█▎ | 2874/22434 [2:00:47<13:33:24, 2.50s/it] +2025-02-05 12:08:30 - ERROR - stderr - 13%|█▎ | 2875/22434 [2:00:50<13:41:02, 2.52s/it] +2025-02-05 12:08:30 - ERROR - stderr - +2025-02-05 12:08:30 - ERROR - stderr - +2025-02-05 12:08:30 - INFO - stdout - {'loss': 0.9318, 'grad_norm': 1.1358232498168945, 'learning_rate': 1.9499349681605087e-05, 'epoch': 0.38} +2025-02-05 12:08:30 - ERROR - stderr - 13%|█▎ | 2875/22434 [2:00:50<13:41:02, 2.52s/it] +2025-02-05 12:08:32 - ERROR - stderr - 13%|█▎ | 2876/22434 [2:00:52<13:34:49, 2.50s/it] +2025-02-05 12:08:32 - ERROR - stderr - +2025-02-05 12:08:32 - ERROR - stderr - +2025-02-05 12:08:32 - INFO - stdout - {'loss': 0.9853, 'grad_norm': 1.1911143064498901, 'learning_rate': 1.949889848731293e-05, 'epoch': 0.38} +2025-02-05 12:08:32 - ERROR - stderr - 13%|█▎ | 2876/22434 [2:00:52<13:34:49, 2.50s/it] +2025-02-05 12:08:35 - ERROR - stderr - 13%|█▎ | 2877/22434 [2:00:55<13:37:27, 2.51s/it] +2025-02-05 12:08:35 - ERROR - stderr - +2025-02-05 12:08:35 - ERROR - stderr - +2025-02-05 12:08:35 - INFO - stdout - {'loss': 0.9325, 'grad_norm': 1.1217687129974365, 'learning_rate': 1.9498447095025324e-05, 'epoch': 0.38} +2025-02-05 12:08:35 - ERROR - stderr - 13%|█▎ | 2877/22434 [2:00:55<13:37:27, 2.51s/it] +2025-02-05 12:08:37 - ERROR - stderr - 13%|█▎ | 2878/22434 [2:00:57<13:31:38, 2.49s/it] +2025-02-05 12:08:37 - ERROR - stderr - +2025-02-05 12:08:37 - ERROR - stderr - +2025-02-05 12:08:37 - INFO - stdout - {'loss': 0.9799, 'grad_norm': 1.1208195686340332, 'learning_rate': 1.949799550475168e-05, 'epoch': 0.38} 
+2025-02-05 12:08:37 - ERROR - stderr - 13%|█▎ | 2878/22434 [2:00:57<13:31:38, 2.49s/it] +2025-02-05 12:08:40 - ERROR - stderr - 13%|█▎ | 2879/22434 [2:01:00<13:33:59, 2.50s/it] +2025-02-05 12:08:40 - ERROR - stderr - +2025-02-05 12:08:40 - ERROR - stderr - +2025-02-05 12:08:40 - INFO - stdout - {'loss': 0.8813, 'grad_norm': 1.069865345954895, 'learning_rate': 1.9497543716501404e-05, 'epoch': 0.38} +2025-02-05 12:08:40 - ERROR - stderr - 13%|█▎ | 2879/22434 [2:01:00<13:33:59, 2.50s/it] +2025-02-05 12:08:42 - ERROR - stderr - 13%|█▎ | 2880/22434 [2:01:02<13:29:04, 2.48s/it] +2025-02-05 12:08:42 - ERROR - stderr - +2025-02-05 12:08:42 - ERROR - stderr - +2025-02-05 12:08:42 - INFO - stdout - {'loss': 0.8909, 'grad_norm': 1.076357126235962, 'learning_rate': 1.949709173028392e-05, 'epoch': 0.39} +2025-02-05 12:08:42 - ERROR - stderr - 13%|█▎ | 2880/22434 [2:01:02<13:29:04, 2.48s/it] +2025-02-05 12:08:45 - ERROR - stderr - 13%|█▎ | 2881/22434 [2:01:04<13:31:01, 2.49s/it] +2025-02-05 12:08:45 - ERROR - stderr - +2025-02-05 12:08:45 - ERROR - stderr - +2025-02-05 12:08:45 - INFO - stdout - {'loss': 0.9698, 'grad_norm': 1.1292835474014282, 'learning_rate': 1.949663954610865e-05, 'epoch': 0.39} +2025-02-05 12:08:45 - ERROR - stderr - 13%|█▎ | 2881/22434 [2:01:05<13:31:01, 2.49s/it] +2025-02-05 12:08:47 - ERROR - stderr - 13%|█▎ | 2882/22434 [2:01:07<13:32:38, 2.49s/it] +2025-02-05 12:08:47 - ERROR - stderr - +2025-02-05 12:08:47 - ERROR - stderr - +2025-02-05 12:08:47 - INFO - stdout - {'loss': 0.964, 'grad_norm': 1.1143873929977417, 'learning_rate': 1.9496187163985012e-05, 'epoch': 0.39} +2025-02-05 12:08:47 - ERROR - stderr - 13%|█▎ | 2882/22434 [2:01:07<13:32:38, 2.49s/it] +2025-02-05 12:08:50 - ERROR - stderr - 13%|█▎ | 2883/22434 [2:01:10<13:33:37, 2.50s/it] +2025-02-05 12:08:50 - ERROR - stderr - +2025-02-05 12:08:50 - ERROR - stderr - +2025-02-05 12:08:50 - INFO - stdout - {'loss': 0.9965, 'grad_norm': 1.1518305540084839, 'learning_rate': 1.949573458392244e-05, 
'epoch': 0.39} +2025-02-05 12:08:50 - ERROR - stderr - 13%|█▎ | 2883/22434 [2:01:10<13:33:37, 2.50s/it] +2025-02-05 12:08:52 - ERROR - stderr - 13%|█▎ | 2884/22434 [2:01:12<13:34:24, 2.50s/it] +2025-02-05 12:08:52 - ERROR - stderr - +2025-02-05 12:08:52 - ERROR - stderr - +2025-02-05 12:08:52 - INFO - stdout - {'loss': 0.9174, 'grad_norm': 1.1327941417694092, 'learning_rate': 1.949528180593037e-05, 'epoch': 0.39} +2025-02-05 12:08:52 - ERROR - stderr - 13%|█▎ | 2884/22434 [2:01:12<13:34:24, 2.50s/it] +2025-02-05 12:08:55 - ERROR - stderr - 13%|█▎ | 2885/22434 [2:01:15<13:42:19, 2.52s/it] +2025-02-05 12:08:55 - ERROR - stderr - +2025-02-05 12:08:55 - ERROR - stderr - +2025-02-05 12:08:55 - INFO - stdout - {'loss': 0.9655, 'grad_norm': 1.0774791240692139, 'learning_rate': 1.9494828830018232e-05, 'epoch': 0.39} +2025-02-05 12:08:55 - ERROR - stderr - 13%|█▎ | 2885/22434 [2:01:15<13:42:19, 2.52s/it] +2025-02-05 12:08:57 - ERROR - stderr - 13%|█▎ | 2886/22434 [2:01:17<13:43:52, 2.53s/it] +2025-02-05 12:08:57 - ERROR - stderr - +2025-02-05 12:08:57 - ERROR - stderr - +2025-02-05 12:08:57 - INFO - stdout - {'loss': 1.0465, 'grad_norm': 1.1973756551742554, 'learning_rate': 1.9494375656195475e-05, 'epoch': 0.39} +2025-02-05 12:08:57 - ERROR - stderr - 13%|█▎ | 2886/22434 [2:01:17<13:43:52, 2.53s/it] +2025-02-05 12:09:00 - ERROR - stderr - 13%|█▎ | 2887/22434 [2:01:20<13:41:55, 2.52s/it] +2025-02-05 12:09:00 - ERROR - stderr - +2025-02-05 12:09:00 - ERROR - stderr - +2025-02-05 12:09:00 - INFO - stdout - {'loss': 0.9981, 'grad_norm': 1.221407413482666, 'learning_rate': 1.9493922284471543e-05, 'epoch': 0.39} +2025-02-05 12:09:00 - ERROR - stderr - 13%|█▎ | 2887/22434 [2:01:20<13:41:55, 2.52s/it] +2025-02-05 12:09:02 - ERROR - stderr - 13%|█▎ | 2888/22434 [2:01:22<13:40:21, 2.52s/it] +2025-02-05 12:09:02 - ERROR - stderr - +2025-02-05 12:09:02 - ERROR - stderr - +2025-02-05 12:09:02 - INFO - stdout - {'loss': 1.028, 'grad_norm': 1.1910771131515503, 'learning_rate': 
1.9493468714855887e-05, 'epoch': 0.39} +2025-02-05 12:09:02 - ERROR - stderr - 13%|█▎ | 2888/22434 [2:01:22<13:40:21, 2.52s/it] +2025-02-05 12:09:05 - ERROR - stderr - 13%|█▎ | 2889/22434 [2:01:25<13:30:53, 2.49s/it] +2025-02-05 12:09:05 - ERROR - stderr - +2025-02-05 12:09:05 - ERROR - stderr - +2025-02-05 12:09:05 - INFO - stdout - {'loss': 0.9901, 'grad_norm': 1.173493504524231, 'learning_rate': 1.9493014947357955e-05, 'epoch': 0.39} +2025-02-05 12:09:05 - ERROR - stderr - 13%|█▎ | 2889/22434 [2:01:25<13:30:53, 2.49s/it] +2025-02-05 12:09:07 - ERROR - stderr - 13%|█▎ | 2890/22434 [2:01:27<13:23:19, 2.47s/it] +2025-02-05 12:09:07 - ERROR - stderr - +2025-02-05 12:09:07 - ERROR - stderr - +2025-02-05 12:09:07 - INFO - stdout - {'loss': 1.0734, 'grad_norm': 1.1722590923309326, 'learning_rate': 1.9492560981987215e-05, 'epoch': 0.39} +2025-02-05 12:09:07 - ERROR - stderr - 13%|█▎ | 2890/22434 [2:01:27<13:23:19, 2.47s/it] +2025-02-05 12:09:10 - ERROR - stderr - 13%|█▎ | 2891/22434 [2:01:29<13:24:58, 2.47s/it] +2025-02-05 12:09:10 - ERROR - stderr - +2025-02-05 12:09:10 - ERROR - stderr - +2025-02-05 12:09:10 - INFO - stdout - {'loss': 0.9698, 'grad_norm': 1.0097779035568237, 'learning_rate': 1.949210681875312e-05, 'epoch': 0.39} +2025-02-05 12:09:10 - ERROR - stderr - 13%|█▎ | 2891/22434 [2:01:30<13:24:58, 2.47s/it] +2025-02-05 12:09:12 - ERROR - stderr - 13%|█▎ | 2892/22434 [2:01:32<13:28:49, 2.48s/it] +2025-02-05 12:09:12 - ERROR - stderr - +2025-02-05 12:09:12 - ERROR - stderr - +2025-02-05 12:09:12 - INFO - stdout - {'loss': 1.0029, 'grad_norm': 1.1475749015808105, 'learning_rate': 1.9491652457665146e-05, 'epoch': 0.39} +2025-02-05 12:09:12 - ERROR - stderr - 13%|█▎ | 2892/22434 [2:01:32<13:28:49, 2.48s/it] +2025-02-05 12:09:15 - ERROR - stderr - 13%|█▎ | 2893/22434 [2:01:35<13:34:44, 2.50s/it] +2025-02-05 12:09:15 - ERROR - stderr - +2025-02-05 12:09:15 - ERROR - stderr - +2025-02-05 12:09:15 - INFO - stdout - {'loss': 1.16, 'grad_norm': 1.246366024017334, 
'learning_rate': 1.9491197898732758e-05, 'epoch': 0.39} +2025-02-05 12:09:15 - ERROR - stderr - 13%|█▎ | 2893/22434 [2:01:35<13:34:44, 2.50s/it] +2025-02-05 12:09:17 - ERROR - stderr - 13%|█▎ | 2894/22434 [2:01:37<13:27:58, 2.48s/it] +2025-02-05 12:09:17 - ERROR - stderr - +2025-02-05 12:09:17 - ERROR - stderr - +2025-02-05 12:09:17 - INFO - stdout - {'loss': 1.0476, 'grad_norm': 1.1189351081848145, 'learning_rate': 1.949074314196543e-05, 'epoch': 0.39} +2025-02-05 12:09:17 - ERROR - stderr - 13%|█▎ | 2894/22434 [2:01:37<13:27:58, 2.48s/it] +2025-02-05 12:09:20 - ERROR - stderr - 13%|█▎ | 2895/22434 [2:01:39<13:25:19, 2.47s/it] +2025-02-05 12:09:20 - ERROR - stderr - +2025-02-05 12:09:20 - ERROR - stderr - +2025-02-05 12:09:20 - INFO - stdout - {'loss': 1.0936, 'grad_norm': 1.1763771772384644, 'learning_rate': 1.9490288187372642e-05, 'epoch': 0.39} +2025-02-05 12:09:20 - ERROR - stderr - 13%|█▎ | 2895/22434 [2:01:39<13:25:19, 2.47s/it] +2025-02-05 12:09:22 - ERROR - stderr - 13%|█▎ | 2896/22434 [2:01:42<13:26:44, 2.48s/it] +2025-02-05 12:09:22 - ERROR - stderr - +2025-02-05 12:09:22 - ERROR - stderr - +2025-02-05 12:09:22 - INFO - stdout - {'loss': 0.9663, 'grad_norm': 1.1193116903305054, 'learning_rate': 1.948983303496388e-05, 'epoch': 0.39} +2025-02-05 12:09:22 - ERROR - stderr - 13%|█▎ | 2896/22434 [2:01:42<13:26:44, 2.48s/it] +2025-02-05 12:09:25 - ERROR - stderr - 13%|█▎ | 2897/22434 [2:01:44<13:27:01, 2.48s/it] +2025-02-05 12:09:25 - ERROR - stderr - +2025-02-05 12:09:25 - ERROR - stderr - +2025-02-05 12:09:25 - INFO - stdout - {'loss': 1.0633, 'grad_norm': 1.2144430875778198, 'learning_rate': 1.9489377684748628e-05, 'epoch': 0.39} +2025-02-05 12:09:25 - ERROR - stderr - 13%|█▎ | 2897/22434 [2:01:44<13:27:01, 2.48s/it] +2025-02-05 12:09:27 - ERROR - stderr - 13%|█▎ | 2898/22434 [2:01:47<13:34:37, 2.50s/it] +2025-02-05 12:09:27 - ERROR - stderr - +2025-02-05 12:09:27 - ERROR - stderr - +2025-02-05 12:09:27 - INFO - stdout - {'loss': 1.0913, 'grad_norm': 
1.2183281183242798, 'learning_rate': 1.9488922136736382e-05, 'epoch': 0.39} +2025-02-05 12:09:27 - ERROR - stderr - 13%|█▎ | 2898/22434 [2:01:47<13:34:37, 2.50s/it] +2025-02-05 12:09:30 - ERROR - stderr - 13%|█▎ | 2899/22434 [2:01:50<13:45:19, 2.53s/it] +2025-02-05 12:09:30 - ERROR - stderr - +2025-02-05 12:09:30 - ERROR - stderr - +2025-02-05 12:09:30 - INFO - stdout - {'loss': 1.0226, 'grad_norm': 1.0372185707092285, 'learning_rate': 1.948846639093663e-05, 'epoch': 0.39} +2025-02-05 12:09:30 - ERROR - stderr - 13%|█▎ | 2899/22434 [2:01:50<13:45:19, 2.53s/it] +2025-02-05 12:09:32 - ERROR - stderr - 13%|█▎ | 2900/22434 [2:01:52<13:41:16, 2.52s/it] +2025-02-05 12:09:32 - ERROR - stderr - +2025-02-05 12:09:32 - ERROR - stderr - +2025-02-05 12:09:32 - INFO - stdout - {'loss': 0.967, 'grad_norm': 1.1767842769622803, 'learning_rate': 1.948801044735888e-05, 'epoch': 0.39} +2025-02-05 12:09:32 - ERROR - stderr - 13%|█▎ | 2900/22434 [2:01:52<13:41:16, 2.52s/it] +2025-02-05 12:09:35 - ERROR - stderr - 13%|█▎ | 2901/22434 [2:01:55<14:18:04, 2.64s/it] +2025-02-05 12:09:35 - ERROR - stderr - +2025-02-05 12:09:35 - ERROR - stderr - +2025-02-05 12:09:35 - INFO - stdout - {'loss': 1.0494, 'grad_norm': 1.1482349634170532, 'learning_rate': 1.9487554306012625e-05, 'epoch': 0.39} +2025-02-05 12:09:35 - ERROR - stderr - 13%|█▎ | 2901/22434 [2:01:55<14:18:04, 2.64s/it] +2025-02-05 12:09:38 - ERROR - stderr - 13%|█▎ | 2902/22434 [2:01:57<13:57:54, 2.57s/it] +2025-02-05 12:09:38 - ERROR - stderr - +2025-02-05 12:09:38 - ERROR - stderr - +2025-02-05 12:09:38 - INFO - stdout - {'loss': 1.0827, 'grad_norm': 1.1677573919296265, 'learning_rate': 1.9487097966907385e-05, 'epoch': 0.39} +2025-02-05 12:09:38 - ERROR - stderr - 13%|█▎ | 2902/22434 [2:01:57<13:57:54, 2.57s/it] +2025-02-05 12:09:40 - ERROR - stderr - 13%|█▎ | 2903/22434 [2:02:00<13:53:54, 2.56s/it] +2025-02-05 12:09:40 - ERROR - stderr - +2025-02-05 12:09:40 - ERROR - stderr - +2025-02-05 12:09:40 - INFO - stdout - {'loss': 0.9769, 
'grad_norm': 1.0432132482528687, 'learning_rate': 1.9486641430052664e-05, 'epoch': 0.39} +2025-02-05 12:09:40 - ERROR - stderr - 13%|█▎ | 2903/22434 [2:02:00<13:53:54, 2.56s/it] +2025-02-05 12:09:43 - ERROR - stderr - 13%|█▎ | 2904/22434 [2:02:02<13:53:49, 2.56s/it] +2025-02-05 12:09:43 - ERROR - stderr - +2025-02-05 12:09:43 - ERROR - stderr - +2025-02-05 12:09:43 - INFO - stdout - {'loss': 1.02, 'grad_norm': 1.225866675376892, 'learning_rate': 1.948618469545798e-05, 'epoch': 0.39} +2025-02-05 12:09:43 - ERROR - stderr - 13%|█▎ | 2904/22434 [2:02:03<13:53:49, 2.56s/it] +2025-02-05 12:09:45 - ERROR - stderr - 13%|█▎ | 2905/22434 [2:02:05<13:42:19, 2.53s/it] +2025-02-05 12:09:45 - ERROR - stderr - +2025-02-05 12:09:45 - ERROR - stderr - +2025-02-05 12:09:45 - INFO - stdout - {'loss': 1.0114, 'grad_norm': 1.1266239881515503, 'learning_rate': 1.9485727763132853e-05, 'epoch': 0.39} +2025-02-05 12:09:45 - ERROR - stderr - 13%|█▎ | 2905/22434 [2:02:05<13:42:19, 2.53s/it] +2025-02-05 12:09:48 - ERROR - stderr - 13%|█▎ | 2906/22434 [2:02:07<13:35:49, 2.51s/it] +2025-02-05 12:09:48 - ERROR - stderr - +2025-02-05 12:09:48 - ERROR - stderr - +2025-02-05 12:09:48 - INFO - stdout - {'loss': 1.0389, 'grad_norm': 1.1278605461120605, 'learning_rate': 1.9485270633086807e-05, 'epoch': 0.39} +2025-02-05 12:09:48 - ERROR - stderr - 13%|█▎ | 2906/22434 [2:02:07<13:35:49, 2.51s/it] +2025-02-05 12:09:50 - ERROR - stderr - 13%|█▎ | 2907/22434 [2:02:10<13:35:48, 2.51s/it] +2025-02-05 12:09:50 - ERROR - stderr - +2025-02-05 12:09:50 - ERROR - stderr - +2025-02-05 12:09:50 - INFO - stdout - {'loss': 1.098, 'grad_norm': 1.2346082925796509, 'learning_rate': 1.948481330532937e-05, 'epoch': 0.39} +2025-02-05 12:09:50 - ERROR - stderr - 13%|█▎ | 2907/22434 [2:02:10<13:35:48, 2.51s/it] +2025-02-05 12:09:53 - ERROR - stderr - 13%|█▎ | 2908/22434 [2:02:12<13:27:15, 2.48s/it] +2025-02-05 12:09:53 - ERROR - stderr - +2025-02-05 12:09:53 - ERROR - stderr - +2025-02-05 12:09:53 - INFO - stdout - 
{'loss': 0.9568, 'grad_norm': 1.0492706298828125, 'learning_rate': 1.9484355779870078e-05, 'epoch': 0.39} +2025-02-05 12:09:53 - ERROR - stderr - 13%|█▎ | 2908/22434 [2:02:12<13:27:15, 2.48s/it] +2025-02-05 12:09:55 - ERROR - stderr - 13%|█▎ | 2909/22434 [2:02:15<13:27:41, 2.48s/it] +2025-02-05 12:09:55 - ERROR - stderr - +2025-02-05 12:09:55 - ERROR - stderr - +2025-02-05 12:09:55 - INFO - stdout - {'loss': 0.9138, 'grad_norm': 1.157475233078003, 'learning_rate': 1.9483898056718464e-05, 'epoch': 0.39} +2025-02-05 12:09:55 - ERROR - stderr - 13%|█▎ | 2909/22434 [2:02:15<13:27:41, 2.48s/it] +2025-02-05 12:09:57 - ERROR - stderr - 13%|█▎ | 2910/22434 [2:02:17<13:27:28, 2.48s/it] +2025-02-05 12:09:58 - ERROR - stderr - +2025-02-05 12:09:58 - ERROR - stderr - +2025-02-05 12:09:58 - INFO - stdout - {'loss': 0.9448, 'grad_norm': 1.0891684293746948, 'learning_rate': 1.948344013588407e-05, 'epoch': 0.39} +2025-02-05 12:09:58 - ERROR - stderr - 13%|█▎ | 2910/22434 [2:02:17<13:27:28, 2.48s/it] +2025-02-05 12:10:00 - ERROR - stderr - 13%|█▎ | 2911/22434 [2:02:20<13:28:34, 2.48s/it] +2025-02-05 12:10:00 - ERROR - stderr - +2025-02-05 12:10:00 - ERROR - stderr - +2025-02-05 12:10:00 - INFO - stdout - {'loss': 0.9558, 'grad_norm': 1.0841211080551147, 'learning_rate': 1.9482982017376444e-05, 'epoch': 0.39} +2025-02-05 12:10:00 - ERROR - stderr - 13%|█▎ | 2911/22434 [2:02:20<13:28:34, 2.48s/it] +2025-02-05 12:10:02 - ERROR - stderr - 13%|█▎ | 2912/22434 [2:02:22<13:26:18, 2.48s/it] +2025-02-05 12:10:02 - ERROR - stderr - +2025-02-05 12:10:02 - ERROR - stderr - +2025-02-05 12:10:02 - INFO - stdout - {'loss': 0.9462, 'grad_norm': 1.029944658279419, 'learning_rate': 1.948252370120513e-05, 'epoch': 0.39} +2025-02-05 12:10:02 - ERROR - stderr - 13%|█▎ | 2912/22434 [2:02:22<13:26:18, 2.48s/it] +2025-02-05 12:10:05 - ERROR - stderr - 13%|█▎ | 2913/22434 [2:02:25<13:27:01, 2.48s/it] +2025-02-05 12:10:05 - ERROR - stderr - +2025-02-05 12:10:05 - ERROR - stderr - +2025-02-05 12:10:05 - INFO 
- stdout - {'loss': 1.002, 'grad_norm': 1.0911540985107422, 'learning_rate': 1.9482065187379682e-05, 'epoch': 0.39} +2025-02-05 12:10:05 - ERROR - stderr - 13%|█▎ | 2913/22434 [2:02:25<13:27:01, 2.48s/it] +2025-02-05 12:10:07 - ERROR - stderr - 13%|█▎ | 2914/22434 [2:02:27<13:27:46, 2.48s/it] +2025-02-05 12:10:07 - ERROR - stderr - +2025-02-05 12:10:07 - ERROR - stderr - +2025-02-05 12:10:07 - INFO - stdout - {'loss': 0.9342, 'grad_norm': 1.0668919086456299, 'learning_rate': 1.948160647590966e-05, 'epoch': 0.39} +2025-02-05 12:10:07 - ERROR - stderr - 13%|█▎ | 2914/22434 [2:02:27<13:27:46, 2.48s/it] +2025-02-05 12:10:10 - ERROR - stderr - 13%|█▎ | 2915/22434 [2:02:30<13:25:39, 2.48s/it] +2025-02-05 12:10:10 - ERROR - stderr - +2025-02-05 12:10:10 - ERROR - stderr - +2025-02-05 12:10:10 - INFO - stdout - {'loss': 1.0074, 'grad_norm': 1.1697531938552856, 'learning_rate': 1.9481147566804623e-05, 'epoch': 0.39} +2025-02-05 12:10:10 - ERROR - stderr - 13%|█▎ | 2915/22434 [2:02:30<13:25:39, 2.48s/it] +2025-02-05 12:10:12 - ERROR - stderr - 13%|█▎ | 2916/22434 [2:02:32<13:31:54, 2.50s/it] +2025-02-05 12:10:12 - ERROR - stderr - +2025-02-05 12:10:12 - ERROR - stderr - +2025-02-05 12:10:12 - INFO - stdout - {'loss': 0.9349, 'grad_norm': 1.1550745964050293, 'learning_rate': 1.9480688460074136e-05, 'epoch': 0.39} +2025-02-05 12:10:12 - ERROR - stderr - 13%|█▎ | 2916/22434 [2:02:32<13:31:54, 2.50s/it] +2025-02-05 12:10:15 - ERROR - stderr - 13%|█▎ | 2917/22434 [2:02:35<13:29:35, 2.49s/it] +2025-02-05 12:10:15 - ERROR - stderr - +2025-02-05 12:10:15 - ERROR - stderr - +2025-02-05 12:10:15 - INFO - stdout - {'loss': 0.9116, 'grad_norm': 1.1940799951553345, 'learning_rate': 1.9480229155727776e-05, 'epoch': 0.39} +2025-02-05 12:10:15 - ERROR - stderr - 13%|█▎ | 2917/22434 [2:02:35<13:29:35, 2.49s/it] +2025-02-05 12:10:17 - ERROR - stderr - 13%|█▎ | 2918/22434 [2:02:37<13:26:44, 2.48s/it] +2025-02-05 12:10:17 - ERROR - stderr - +2025-02-05 12:10:17 - ERROR - stderr - +2025-02-05 
12:10:17 - INFO - stdout - {'loss': 0.8927, 'grad_norm': 1.0912806987762451, 'learning_rate': 1.9479769653775107e-05, 'epoch': 0.39} +2025-02-05 12:10:17 - ERROR - stderr - 13%|█▎ | 2918/22434 [2:02:37<13:26:44, 2.48s/it] +2025-02-05 12:10:20 - ERROR - stderr - 13%|█▎ | 2919/22434 [2:02:40<13:25:00, 2.48s/it] +2025-02-05 12:10:20 - ERROR - stderr - +2025-02-05 12:10:20 - ERROR - stderr - +2025-02-05 12:10:20 - INFO - stdout - {'loss': 0.9929, 'grad_norm': 1.2202863693237305, 'learning_rate': 1.947930995422571e-05, 'epoch': 0.39} +2025-02-05 12:10:20 - ERROR - stderr - 13%|█▎ | 2919/22434 [2:02:40<13:25:00, 2.48s/it] +2025-02-05 12:10:22 - ERROR - stderr - 13%|█▎ | 2920/22434 [2:02:42<13:21:29, 2.46s/it] +2025-02-05 12:10:22 - ERROR - stderr - +2025-02-05 12:10:22 - ERROR - stderr - +2025-02-05 12:10:22 - INFO - stdout - {'loss': 1.157, 'grad_norm': 1.2713217735290527, 'learning_rate': 1.9478850057089168e-05, 'epoch': 0.39} +2025-02-05 12:10:22 - ERROR - stderr - 13%|█▎ | 2920/22434 [2:02:42<13:21:29, 2.46s/it] +2025-02-05 12:10:25 - ERROR - stderr - 13%|█▎ | 2921/22434 [2:02:44<13:20:20, 2.46s/it] +2025-02-05 12:10:25 - ERROR - stderr - +2025-02-05 12:10:25 - ERROR - stderr - +2025-02-05 12:10:25 - INFO - stdout - {'loss': 0.895, 'grad_norm': 1.1349575519561768, 'learning_rate': 1.947838996237507e-05, 'epoch': 0.39} +2025-02-05 12:10:25 - ERROR - stderr - 13%|█▎ | 2921/22434 [2:02:45<13:20:20, 2.46s/it] +2025-02-05 12:10:27 - ERROR - stderr - 13%|█▎ | 2922/22434 [2:02:47<13:29:05, 2.49s/it] +2025-02-05 12:10:27 - ERROR - stderr - +2025-02-05 12:10:27 - ERROR - stderr - +2025-02-05 12:10:27 - INFO - stdout - {'loss': 1.0364, 'grad_norm': 1.2043986320495605, 'learning_rate': 1.9477929670092997e-05, 'epoch': 0.39} +2025-02-05 12:10:27 - ERROR - stderr - 13%|█▎ | 2922/22434 [2:02:47<13:29:05, 2.49s/it] +2025-02-05 12:10:30 - ERROR - stderr - 13%|█▎ | 2923/22434 [2:02:49<13:26:07, 2.48s/it] +2025-02-05 12:10:30 - ERROR - stderr - +2025-02-05 12:10:30 - ERROR - stderr - 
+2025-02-05 12:10:30 - INFO - stdout - {'loss': 0.9952, 'grad_norm': 1.1637495756149292, 'learning_rate': 1.947746918025255e-05, 'epoch': 0.39} +2025-02-05 12:10:30 - ERROR - stderr - 13%|█▎ | 2923/22434 [2:02:50<13:26:07, 2.48s/it] +2025-02-05 12:10:32 - ERROR - stderr - 13%|█▎ | 2924/22434 [2:02:52<13:20:11, 2.46s/it] +2025-02-05 12:10:32 - ERROR - stderr - +2025-02-05 12:10:32 - ERROR - stderr - +2025-02-05 12:10:32 - INFO - stdout - {'loss': 0.9592, 'grad_norm': 1.1131538152694702, 'learning_rate': 1.947700849286333e-05, 'epoch': 0.39} +2025-02-05 12:10:32 - ERROR - stderr - 13%|█▎ | 2924/22434 [2:02:52<13:20:11, 2.46s/it] +2025-02-05 12:10:35 - ERROR - stderr - 13%|█▎ | 2925/22434 [2:02:54<13:20:12, 2.46s/it] +2025-02-05 12:10:35 - ERROR - stderr - +2025-02-05 12:10:35 - ERROR - stderr - +2025-02-05 12:10:35 - INFO - stdout - {'loss': 0.9818, 'grad_norm': 1.221379280090332, 'learning_rate': 1.9476547607934937e-05, 'epoch': 0.39} +2025-02-05 12:10:35 - ERROR - stderr - 13%|█▎ | 2925/22434 [2:02:54<13:20:12, 2.46s/it] +2025-02-05 12:10:37 - ERROR - stderr - 13%|█▎ | 2926/22434 [2:02:57<13:16:40, 2.45s/it] +2025-02-05 12:10:37 - ERROR - stderr - +2025-02-05 12:10:37 - ERROR - stderr - +2025-02-05 12:10:37 - INFO - stdout - {'loss': 0.9342, 'grad_norm': 1.0862956047058105, 'learning_rate': 1.9476086525476977e-05, 'epoch': 0.39} +2025-02-05 12:10:37 - ERROR - stderr - 13%|█▎ | 2926/22434 [2:02:57<13:16:40, 2.45s/it] +2025-02-05 12:10:40 - ERROR - stderr - 13%|█▎ | 2927/22434 [2:02:59<13:20:55, 2.46s/it] +2025-02-05 12:10:40 - ERROR - stderr - +2025-02-05 12:10:40 - ERROR - stderr - +2025-02-05 12:10:40 - INFO - stdout - {'loss': 0.9713, 'grad_norm': 1.2069255113601685, 'learning_rate': 1.947562524549906e-05, 'epoch': 0.39} +2025-02-05 12:10:40 - ERROR - stderr - 13%|█▎ | 2927/22434 [2:02:59<13:20:55, 2.46s/it] +2025-02-05 12:10:42 - ERROR - stderr - 13%|█▎ | 2928/22434 [2:03:02<13:23:07, 2.47s/it] +2025-02-05 12:10:42 - ERROR - stderr - +2025-02-05 12:10:42 - ERROR 
- stderr - +2025-02-05 12:10:42 - INFO - stdout - {'loss': 0.9268, 'grad_norm': 1.136892318725586, 'learning_rate': 1.9475163768010802e-05, 'epoch': 0.39} +2025-02-05 12:10:42 - ERROR - stderr - 13%|█▎ | 2928/22434 [2:03:02<13:23:07, 2.47s/it] +2025-02-05 12:10:44 - ERROR - stderr - 13%|█▎ | 2929/22434 [2:03:04<13:22:22, 2.47s/it] +2025-02-05 12:10:45 - ERROR - stderr - +2025-02-05 12:10:45 - ERROR - stderr - +2025-02-05 12:10:45 - INFO - stdout - {'loss': 1.1889, 'grad_norm': 1.2015352249145508, 'learning_rate': 1.9474702093021823e-05, 'epoch': 0.39} +2025-02-05 12:10:45 - ERROR - stderr - 13%|█▎ | 2929/22434 [2:03:04<13:22:22, 2.47s/it] +2025-02-05 12:10:47 - ERROR - stderr - 13%|█▎ | 2930/22434 [2:03:07<13:35:50, 2.51s/it] +2025-02-05 12:10:47 - ERROR - stderr - +2025-02-05 12:10:47 - ERROR - stderr - +2025-02-05 12:10:47 - INFO - stdout - {'loss': 0.9225, 'grad_norm': 1.1238012313842773, 'learning_rate': 1.9474240220541745e-05, 'epoch': 0.39} +2025-02-05 12:10:47 - ERROR - stderr - 13%|█▎ | 2930/22434 [2:03:07<13:35:50, 2.51s/it] +2025-02-05 12:10:50 - ERROR - stderr - 13%|█▎ | 2931/22434 [2:03:09<13:34:59, 2.51s/it] +2025-02-05 12:10:50 - ERROR - stderr - +2025-02-05 12:10:50 - ERROR - stderr - +2025-02-05 12:10:50 - INFO - stdout - {'loss': 0.9251, 'grad_norm': 1.0850963592529297, 'learning_rate': 1.9473778150580194e-05, 'epoch': 0.39} +2025-02-05 12:10:50 - ERROR - stderr - 13%|█▎ | 2931/22434 [2:03:09<13:34:59, 2.51s/it] +2025-02-05 12:10:52 - ERROR - stderr - 13%|█▎ | 2932/22434 [2:03:12<13:27:06, 2.48s/it] +2025-02-05 12:10:52 - ERROR - stderr - +2025-02-05 12:10:52 - ERROR - stderr - +2025-02-05 12:10:52 - INFO - stdout - {'loss': 1.033, 'grad_norm': 1.1071081161499023, 'learning_rate': 1.9473315883146803e-05, 'epoch': 0.39} +2025-02-05 12:10:52 - ERROR - stderr - 13%|█▎ | 2932/22434 [2:03:12<13:27:06, 2.48s/it] +2025-02-05 12:10:54 - ERROR - stderr - 13%|█▎ | 2933/22434 [2:03:14<13:28:00, 2.49s/it] +2025-02-05 12:10:55 - ERROR - stderr - +2025-02-05 
12:10:55 - ERROR - stderr - +2025-02-05 12:10:55 - INFO - stdout - {'loss': 1.2082, 'grad_norm': 1.2282966375350952, 'learning_rate': 1.947285341825121e-05, 'epoch': 0.39} +2025-02-05 12:10:55 - ERROR - stderr - 13%|█▎ | 2933/22434 [2:03:14<13:28:00, 2.49s/it] +2025-02-05 12:10:57 - ERROR - stderr - 13%|█▎ | 2934/22434 [2:03:17<13:24:59, 2.48s/it] +2025-02-05 12:10:57 - ERROR - stderr - +2025-02-05 12:10:57 - ERROR - stderr - +2025-02-05 12:10:57 - INFO - stdout - {'loss': 0.9242, 'grad_norm': 1.1162861585617065, 'learning_rate': 1.947239075590305e-05, 'epoch': 0.39} +2025-02-05 12:10:57 - ERROR - stderr - 13%|█▎ | 2934/22434 [2:03:17<13:24:59, 2.48s/it] +2025-02-05 12:10:59 - ERROR - stderr - 13%|█▎ | 2935/22434 [2:03:19<13:26:59, 2.48s/it] +2025-02-05 12:10:59 - ERROR - stderr - +2025-02-05 12:10:59 - ERROR - stderr - +2025-02-05 12:10:59 - INFO - stdout - {'loss': 1.0234, 'grad_norm': 1.0945684909820557, 'learning_rate': 1.9471927896111967e-05, 'epoch': 0.39} +2025-02-05 12:10:59 - ERROR - stderr - 13%|█▎ | 2935/22434 [2:03:19<13:26:59, 2.48s/it] +2025-02-05 12:11:02 - ERROR - stderr - 13%|█▎ | 2936/22434 [2:03:22<13:22:17, 2.47s/it] +2025-02-05 12:11:02 - ERROR - stderr - +2025-02-05 12:11:02 - ERROR - stderr - +2025-02-05 12:11:02 - INFO - stdout - {'loss': 0.9675, 'grad_norm': 1.1530365943908691, 'learning_rate': 1.9471464838887614e-05, 'epoch': 0.39} +2025-02-05 12:11:02 - ERROR - stderr - 13%|█▎ | 2936/22434 [2:03:22<13:22:17, 2.47s/it] +2025-02-05 12:11:04 - ERROR - stderr - 13%|█▎ | 2937/22434 [2:03:24<13:21:24, 2.47s/it] +2025-02-05 12:11:04 - ERROR - stderr - +2025-02-05 12:11:04 - ERROR - stderr - +2025-02-05 12:11:04 - INFO - stdout - {'loss': 0.9735, 'grad_norm': 1.1861568689346313, 'learning_rate': 1.9471001584239637e-05, 'epoch': 0.39} +2025-02-05 12:11:04 - ERROR - stderr - 13%|█▎ | 2937/22434 [2:03:24<13:21:24, 2.47s/it] +2025-02-05 12:11:07 - ERROR - stderr - 13%|█▎ | 2938/22434 [2:03:27<13:27:36, 2.49s/it] +2025-02-05 12:11:07 - ERROR - stderr 
- +2025-02-05 12:11:07 - ERROR - stderr - +2025-02-05 12:11:07 - INFO - stdout - {'loss': 0.8628, 'grad_norm': 1.0933645963668823, 'learning_rate': 1.9470538132177696e-05, 'epoch': 0.39} +2025-02-05 12:11:07 - ERROR - stderr - 13%|█▎ | 2938/22434 [2:03:27<13:27:36, 2.49s/it] +2025-02-05 12:11:09 - ERROR - stderr - 13%|█▎ | 2939/22434 [2:03:29<13:33:11, 2.50s/it] +2025-02-05 12:11:09 - ERROR - stderr - +2025-02-05 12:11:09 - ERROR - stderr - +2025-02-05 12:11:09 - INFO - stdout - {'loss': 1.0275, 'grad_norm': 1.1958930492401123, 'learning_rate': 1.947007448271145e-05, 'epoch': 0.39} +2025-02-05 12:11:09 - ERROR - stderr - 13%|█▎ | 2939/22434 [2:03:29<13:33:11, 2.50s/it] +2025-02-05 12:11:12 - ERROR - stderr - 13%|█▎ | 2940/22434 [2:03:32<13:34:34, 2.51s/it] +2025-02-05 12:11:12 - ERROR - stderr - +2025-02-05 12:11:12 - ERROR - stderr - +2025-02-05 12:11:12 - INFO - stdout - {'loss': 0.9463, 'grad_norm': 1.1497830152511597, 'learning_rate': 1.9469610635850566e-05, 'epoch': 0.39} +2025-02-05 12:11:12 - ERROR - stderr - 13%|█▎ | 2940/22434 [2:03:32<13:34:34, 2.51s/it] +2025-02-05 12:11:14 - ERROR - stderr - 13%|█▎ | 2941/22434 [2:03:34<13:25:39, 2.48s/it] +2025-02-05 12:11:14 - ERROR - stderr - +2025-02-05 12:11:14 - ERROR - stderr - +2025-02-05 12:11:14 - INFO - stdout - {'loss': 1.0117, 'grad_norm': 1.118687629699707, 'learning_rate': 1.9469146591604703e-05, 'epoch': 0.39} +2025-02-05 12:11:14 - ERROR - stderr - 13%|█▎ | 2941/22434 [2:03:34<13:25:39, 2.48s/it] +2025-02-05 12:11:17 - ERROR - stderr - 13%|█▎ | 2942/22434 [2:03:37<13:27:58, 2.49s/it] +2025-02-05 12:11:17 - ERROR - stderr - +2025-02-05 12:11:17 - ERROR - stderr - +2025-02-05 12:11:17 - INFO - stdout - {'loss': 0.9626, 'grad_norm': 1.1976524591445923, 'learning_rate': 1.9468682349983544e-05, 'epoch': 0.39} +2025-02-05 12:11:17 - ERROR - stderr - 13%|█▎ | 2942/22434 [2:03:37<13:27:58, 2.49s/it] +2025-02-05 12:11:19 - ERROR - stderr - 13%|█▎ | 2943/22434 [2:03:39<13:27:17, 2.49s/it] +2025-02-05 12:11:19 - 
ERROR - stderr - +2025-02-05 12:11:19 - ERROR - stderr - +2025-02-05 12:11:19 - INFO - stdout - {'loss': 1.0688, 'grad_norm': 1.1083488464355469, 'learning_rate': 1.9468217910996767e-05, 'epoch': 0.39} +2025-02-05 12:11:19 - ERROR - stderr - 13%|█▎ | 2943/22434 [2:03:39<13:27:17, 2.49s/it] +2025-02-05 12:11:22 - ERROR - stderr - 13%|█▎ | 2944/22434 [2:03:42<13:31:50, 2.50s/it] +2025-02-05 12:11:22 - ERROR - stderr - +2025-02-05 12:11:22 - ERROR - stderr - +2025-02-05 12:11:22 - INFO - stdout - {'loss': 0.9823, 'grad_norm': 1.0521825551986694, 'learning_rate': 1.946775327465404e-05, 'epoch': 0.39} +2025-02-05 12:11:22 - ERROR - stderr - 13%|█▎ | 2944/22434 [2:03:42<13:31:50, 2.50s/it] +2025-02-05 12:11:24 - ERROR - stderr - 13%|█▎ | 2945/22434 [2:03:44<13:38:55, 2.52s/it] +2025-02-05 12:11:24 - ERROR - stderr - +2025-02-05 12:11:24 - ERROR - stderr - +2025-02-05 12:11:24 - INFO - stdout - {'loss': 0.9873, 'grad_norm': 1.0379694700241089, 'learning_rate': 1.946728844096506e-05, 'epoch': 0.39} +2025-02-05 12:11:24 - ERROR - stderr - 13%|█▎ | 2945/22434 [2:03:44<13:38:55, 2.52s/it] +2025-02-05 12:11:27 - ERROR - stderr - 13%|█▎ | 2946/22434 [2:03:47<13:31:06, 2.50s/it] +2025-02-05 12:11:27 - ERROR - stderr - +2025-02-05 12:11:27 - ERROR - stderr - +2025-02-05 12:11:27 - INFO - stdout - {'loss': 0.9242, 'grad_norm': 1.135482668876648, 'learning_rate': 1.946682340993951e-05, 'epoch': 0.39} +2025-02-05 12:11:27 - ERROR - stderr - 13%|█▎ | 2946/22434 [2:03:47<13:31:06, 2.50s/it] +2025-02-05 12:11:29 - ERROR - stderr - 13%|█▎ | 2947/22434 [2:03:49<13:27:12, 2.49s/it] +2025-02-05 12:11:29 - ERROR - stderr - +2025-02-05 12:11:29 - ERROR - stderr - +2025-02-05 12:11:29 - INFO - stdout - {'loss': 1.1414, 'grad_norm': 1.2415028810501099, 'learning_rate': 1.9466358181587085e-05, 'epoch': 0.39} +2025-02-05 12:11:29 - ERROR - stderr - 13%|█▎ | 2947/22434 [2:03:49<13:27:12, 2.49s/it] +2025-02-05 12:11:32 - ERROR - stderr - 13%|█▎ | 2948/22434 [2:03:52<13:36:43, 2.51s/it] +2025-02-05 
12:11:32 - ERROR - stderr - +2025-02-05 12:11:32 - ERROR - stderr - +2025-02-05 12:11:32 - INFO - stdout - {'loss': 1.0327, 'grad_norm': 1.1224685907363892, 'learning_rate': 1.9465892755917482e-05, 'epoch': 0.39} +2025-02-05 12:11:32 - ERROR - stderr - 13%|█▎ | 2948/22434 [2:03:52<13:36:43, 2.51s/it] +2025-02-05 12:11:34 - ERROR - stderr - 13%|█▎ | 2949/22434 [2:03:54<13:35:44, 2.51s/it] +2025-02-05 12:11:34 - ERROR - stderr - +2025-02-05 12:11:34 - ERROR - stderr - +2025-02-05 12:11:34 - INFO - stdout - {'loss': 1.0562, 'grad_norm': 1.1045244932174683, 'learning_rate': 1.9465427132940404e-05, 'epoch': 0.39} +2025-02-05 12:11:34 - ERROR - stderr - 13%|█▎ | 2949/22434 [2:03:54<13:35:44, 2.51s/it] +2025-02-05 12:11:37 - ERROR - stderr - 13%|█▎ | 2950/22434 [2:03:57<14:04:45, 2.60s/it] +2025-02-05 12:11:37 - ERROR - stderr - +2025-02-05 12:11:37 - ERROR - stderr - +2025-02-05 12:11:37 - INFO - stdout - {'loss': 0.941, 'grad_norm': 1.1819961071014404, 'learning_rate': 1.946496131266555e-05, 'epoch': 0.39} +2025-02-05 12:11:37 - ERROR - stderr - 13%|█▎ | 2950/22434 [2:03:57<14:04:45, 2.60s/it] +2025-02-05 12:11:40 - ERROR - stderr - 13%|█▎ | 2951/22434 [2:03:59<13:49:22, 2.55s/it] +2025-02-05 12:11:40 - ERROR - stderr - +2025-02-05 12:11:40 - ERROR - stderr - +2025-02-05 12:11:40 - INFO - stdout - {'loss': 1.0656, 'grad_norm': 1.1414289474487305, 'learning_rate': 1.946449529510264e-05, 'epoch': 0.39} +2025-02-05 12:11:40 - ERROR - stderr - 13%|█▎ | 2951/22434 [2:03:59<13:49:22, 2.55s/it] +2025-02-05 12:11:42 - ERROR - stderr - 13%|█▎ | 2952/22434 [2:04:02<13:38:33, 2.52s/it] +2025-02-05 12:11:42 - ERROR - stderr - +2025-02-05 12:11:42 - ERROR - stderr - +2025-02-05 12:11:42 - INFO - stdout - {'loss': 1.0048, 'grad_norm': 1.454622745513916, 'learning_rate': 1.946402908026138e-05, 'epoch': 0.39} +2025-02-05 12:11:42 - ERROR - stderr - 13%|█▎ | 2952/22434 [2:04:02<13:38:33, 2.52s/it] +2025-02-05 12:11:45 - ERROR - stderr - 13%|█▎ | 2953/22434 [2:04:04<13:39:54, 2.53s/it] 
+2025-02-05 12:11:45 - ERROR - stderr - +2025-02-05 12:11:45 - ERROR - stderr - +2025-02-05 12:11:45 - INFO - stdout - {'loss': 0.925, 'grad_norm': 1.0038478374481201, 'learning_rate': 1.946356266815149e-05, 'epoch': 0.39} +2025-02-05 12:11:45 - ERROR - stderr - 13%|█▎ | 2953/22434 [2:04:04<13:39:54, 2.53s/it] +2025-02-05 12:11:47 - ERROR - stderr - 13%|█▎ | 2954/22434 [2:04:07<13:38:56, 2.52s/it] +2025-02-05 12:11:47 - ERROR - stderr - +2025-02-05 12:11:47 - ERROR - stderr - +2025-02-05 12:11:47 - INFO - stdout - {'loss': 0.9707, 'grad_norm': 1.0276093482971191, 'learning_rate': 1.946309605878269e-05, 'epoch': 0.4} +2025-02-05 12:11:47 - ERROR - stderr - 13%|█▎ | 2954/22434 [2:04:07<13:38:56, 2.52s/it] +2025-02-05 12:11:50 - ERROR - stderr - 13%|█▎ | 2955/22434 [2:04:09<13:38:35, 2.52s/it] +2025-02-05 12:11:50 - ERROR - stderr - +2025-02-05 12:11:50 - ERROR - stderr - +2025-02-05 12:11:50 - INFO - stdout - {'loss': 1.0376, 'grad_norm': 1.1300634145736694, 'learning_rate': 1.9462629252164712e-05, 'epoch': 0.4} +2025-02-05 12:11:50 - ERROR - stderr - 13%|█▎ | 2955/22434 [2:04:10<13:38:35, 2.52s/it] +2025-02-05 12:11:52 - ERROR - stderr - 13%|█▎ | 2956/22434 [2:04:12<13:35:45, 2.51s/it] +2025-02-05 12:11:52 - ERROR - stderr - +2025-02-05 12:11:52 - ERROR - stderr - +2025-02-05 12:11:52 - INFO - stdout - {'loss': 0.9123, 'grad_norm': 1.034170389175415, 'learning_rate': 1.9462162248307276e-05, 'epoch': 0.4} +2025-02-05 12:11:52 - ERROR - stderr - 13%|█▎ | 2956/22434 [2:04:12<13:35:45, 2.51s/it] +2025-02-05 12:11:55 - ERROR - stderr - 13%|█▎ | 2957/22434 [2:04:14<13:30:48, 2.50s/it] +2025-02-05 12:11:55 - ERROR - stderr - +2025-02-05 12:11:55 - ERROR - stderr - +2025-02-05 12:11:55 - INFO - stdout - {'loss': 0.8317, 'grad_norm': 1.1481757164001465, 'learning_rate': 1.9461695047220125e-05, 'epoch': 0.4} +2025-02-05 12:11:55 - ERROR - stderr - 13%|█▎ | 2957/22434 [2:04:14<13:30:48, 2.50s/it] +2025-02-05 12:11:57 - ERROR - stderr - 13%|█▎ | 2958/22434 [2:04:17<13:26:52, 
2.49s/it] +2025-02-05 12:11:57 - ERROR - stderr - +2025-02-05 12:11:57 - ERROR - stderr - +2025-02-05 12:11:57 - INFO - stdout - {'loss': 0.8334, 'grad_norm': 1.1233325004577637, 'learning_rate': 1.9461227648912998e-05, 'epoch': 0.4} +2025-02-05 12:11:57 - ERROR - stderr - 13%|█▎ | 2958/22434 [2:04:17<13:26:52, 2.49s/it] +2025-02-05 12:12:00 - ERROR - stderr - 13%|█▎ | 2959/22434 [2:04:19<13:31:12, 2.50s/it] +2025-02-05 12:12:00 - ERROR - stderr - +2025-02-05 12:12:00 - ERROR - stderr - +2025-02-05 12:12:00 - INFO - stdout - {'loss': 1.028, 'grad_norm': 1.3017017841339111, 'learning_rate': 1.9460760053395628e-05, 'epoch': 0.4} +2025-02-05 12:12:00 - ERROR - stderr - 13%|█▎ | 2959/22434 [2:04:19<13:31:12, 2.50s/it] +2025-02-05 12:12:02 - ERROR - stderr - 13%|█▎ | 2960/22434 [2:04:22<13:29:25, 2.49s/it] +2025-02-05 12:12:02 - ERROR - stderr - +2025-02-05 12:12:02 - ERROR - stderr - +2025-02-05 12:12:02 - INFO - stdout - {'loss': 1.0041, 'grad_norm': 1.1644599437713623, 'learning_rate': 1.9460292260677773e-05, 'epoch': 0.4} +2025-02-05 12:12:02 - ERROR - stderr - 13%|█▎ | 2960/22434 [2:04:22<13:29:25, 2.49s/it] +2025-02-05 12:12:05 - ERROR - stderr - 13%|█▎ | 2961/22434 [2:04:24<13:34:36, 2.51s/it] +2025-02-05 12:12:05 - ERROR - stderr - +2025-02-05 12:12:05 - ERROR - stderr - +2025-02-05 12:12:05 - INFO - stdout - {'loss': 0.9709, 'grad_norm': 1.05825674533844, 'learning_rate': 1.9459824270769178e-05, 'epoch': 0.4} +2025-02-05 12:12:05 - ERROR - stderr - 13%|█▎ | 2961/22434 [2:04:24<13:34:36, 2.51s/it] +2025-02-05 12:12:07 - ERROR - stderr - 13%|█▎ | 2962/22434 [2:04:27<13:33:58, 2.51s/it] +2025-02-05 12:12:07 - ERROR - stderr - +2025-02-05 12:12:07 - ERROR - stderr - +2025-02-05 12:12:07 - INFO - stdout - {'loss': 0.9406, 'grad_norm': 1.1265883445739746, 'learning_rate': 1.9459356083679596e-05, 'epoch': 0.4} +2025-02-05 12:12:07 - ERROR - stderr - 13%|█▎ | 2962/22434 [2:04:27<13:33:58, 2.51s/it] +2025-02-05 12:12:10 - ERROR - stderr - 13%|█▎ | 2963/22434 
[2:04:29<13:35:13, 2.51s/it] +2025-02-05 12:12:10 - ERROR - stderr - +2025-02-05 12:12:10 - ERROR - stderr - +2025-02-05 12:12:10 - INFO - stdout - {'loss': 0.959, 'grad_norm': 1.0600440502166748, 'learning_rate': 1.9458887699418786e-05, 'epoch': 0.4} +2025-02-05 12:12:10 - ERROR - stderr - 13%|█▎ | 2963/22434 [2:04:30<13:35:13, 2.51s/it] +2025-02-05 12:12:12 - ERROR - stderr - 13%|█▎ | 2964/22434 [2:04:32<13:36:20, 2.52s/it] +2025-02-05 12:12:12 - ERROR - stderr - +2025-02-05 12:12:12 - ERROR - stderr - +2025-02-05 12:12:12 - INFO - stdout - {'loss': 0.9721, 'grad_norm': 1.0856757164001465, 'learning_rate': 1.9458419117996516e-05, 'epoch': 0.4} +2025-02-05 12:12:12 - ERROR - stderr - 13%|█▎ | 2964/22434 [2:04:32<13:36:20, 2.52s/it] +2025-02-05 12:12:15 - ERROR - stderr - 13%|█▎ | 2965/22434 [2:04:34<13:35:58, 2.51s/it] +2025-02-05 12:12:15 - ERROR - stderr - +2025-02-05 12:12:15 - ERROR - stderr - +2025-02-05 12:12:15 - INFO - stdout - {'loss': 0.954, 'grad_norm': 1.1213642358779907, 'learning_rate': 1.945795033942255e-05, 'epoch': 0.4} +2025-02-05 12:12:15 - ERROR - stderr - 13%|█▎ | 2965/22434 [2:04:35<13:35:58, 2.51s/it] +2025-02-05 12:12:17 - ERROR - stderr - 13%|█▎ | 2966/22434 [2:04:37<13:42:53, 2.54s/it] +2025-02-05 12:12:17 - ERROR - stderr - +2025-02-05 12:12:17 - ERROR - stderr - +2025-02-05 12:12:17 - INFO - stdout - {'loss': 0.9499, 'grad_norm': 0.9919081330299377, 'learning_rate': 1.945748136370666e-05, 'epoch': 0.4} +2025-02-05 12:12:17 - ERROR - stderr - 13%|█▎ | 2966/22434 [2:04:37<13:42:53, 2.54s/it] +2025-02-05 12:12:20 - ERROR - stderr - 13%|█▎ | 2967/22434 [2:04:40<13:45:48, 2.55s/it] +2025-02-05 12:12:20 - ERROR - stderr - +2025-02-05 12:12:20 - ERROR - stderr - +2025-02-05 12:12:20 - INFO - stdout - {'loss': 0.9882, 'grad_norm': 1.1107960939407349, 'learning_rate': 1.945701219085862e-05, 'epoch': 0.4} +2025-02-05 12:12:20 - ERROR - stderr - 13%|█▎ | 2967/22434 [2:04:40<13:45:48, 2.55s/it] +2025-02-05 12:12:22 - ERROR - stderr - 13%|█▎ | 
2968/22434 [2:04:42<13:44:29, 2.54s/it] +2025-02-05 12:12:22 - ERROR - stderr - +2025-02-05 12:12:22 - ERROR - stderr - +2025-02-05 12:12:22 - INFO - stdout - {'loss': 0.9671, 'grad_norm': 1.348785400390625, 'learning_rate': 1.9456542820888212e-05, 'epoch': 0.4} +2025-02-05 12:12:22 - ERROR - stderr - 13%|█▎ | 2968/22434 [2:04:42<13:44:29, 2.54s/it] +2025-02-05 12:12:25 - ERROR - stderr - 13%|█▎ | 2969/22434 [2:04:45<13:44:46, 2.54s/it] +2025-02-05 12:12:25 - ERROR - stderr - +2025-02-05 12:12:25 - ERROR - stderr - +2025-02-05 12:12:25 - INFO - stdout - {'loss': 0.9435, 'grad_norm': 1.0729196071624756, 'learning_rate': 1.9456073253805214e-05, 'epoch': 0.4} +2025-02-05 12:12:25 - ERROR - stderr - 13%|█▎ | 2969/22434 [2:04:45<13:44:46, 2.54s/it] +2025-02-05 12:12:28 - ERROR - stderr - 13%|█▎ | 2970/22434 [2:04:47<13:47:11, 2.55s/it] +2025-02-05 12:12:28 - ERROR - stderr - +2025-02-05 12:12:28 - ERROR - stderr - +2025-02-05 12:12:28 - INFO - stdout - {'loss': 0.9303, 'grad_norm': 1.0821197032928467, 'learning_rate': 1.945560348961942e-05, 'epoch': 0.4} +2025-02-05 12:12:28 - ERROR - stderr - 13%|█▎ | 2970/22434 [2:04:47<13:47:11, 2.55s/it] +2025-02-05 12:12:30 - ERROR - stderr - 13%|█▎ | 2971/22434 [2:04:50<13:39:14, 2.53s/it] +2025-02-05 12:12:30 - ERROR - stderr - +2025-02-05 12:12:30 - ERROR - stderr - +2025-02-05 12:12:30 - INFO - stdout - {'loss': 0.997, 'grad_norm': 1.1943069696426392, 'learning_rate': 1.945513352834062e-05, 'epoch': 0.4} +2025-02-05 12:12:30 - ERROR - stderr - 13%|█▎ | 2971/22434 [2:04:50<13:39:14, 2.53s/it] +2025-02-05 12:12:32 - ERROR - stderr - 13%|█▎ | 2972/22434 [2:04:52<13:35:48, 2.52s/it] +2025-02-05 12:12:33 - ERROR - stderr - +2025-02-05 12:12:33 - ERROR - stderr - +2025-02-05 12:12:33 - INFO - stdout - {'loss': 1.0456, 'grad_norm': 1.0914510488510132, 'learning_rate': 1.945466336997861e-05, 'epoch': 0.4} +2025-02-05 12:12:33 - ERROR - stderr - 13%|█▎ | 2972/22434 [2:04:52<13:35:48, 2.52s/it] +2025-02-05 12:12:35 - ERROR - stderr - 
13%|█▎ | 2973/22434 [2:04:55<13:29:29, 2.50s/it] +2025-02-05 12:12:35 - ERROR - stderr - +2025-02-05 12:12:35 - ERROR - stderr - +2025-02-05 12:12:35 - INFO - stdout - {'loss': 0.9323, 'grad_norm': 1.1857821941375732, 'learning_rate': 1.9454193014543185e-05, 'epoch': 0.4} +2025-02-05 12:12:35 - ERROR - stderr - 13%|█▎ | 2973/22434 [2:04:55<13:29:29, 2.50s/it] +2025-02-05 12:12:37 - ERROR - stderr - 13%|█▎ | 2974/22434 [2:04:57<13:33:04, 2.51s/it] +2025-02-05 12:12:38 - ERROR - stderr - +2025-02-05 12:12:38 - ERROR - stderr - +2025-02-05 12:12:38 - INFO - stdout - {'loss': 1.0902, 'grad_norm': 1.286543369293213, 'learning_rate': 1.9453722462044157e-05, 'epoch': 0.4} +2025-02-05 12:12:38 - ERROR - stderr - 13%|█▎ | 2974/22434 [2:04:57<13:33:04, 2.51s/it] +2025-02-05 12:12:40 - ERROR - stderr - 13%|█▎ | 2975/22434 [2:05:00<13:32:34, 2.51s/it] +2025-02-05 12:12:40 - ERROR - stderr - +2025-02-05 12:12:40 - ERROR - stderr - +2025-02-05 12:12:40 - INFO - stdout - {'loss': 0.9273, 'grad_norm': 1.052204966545105, 'learning_rate': 1.9453251712491326e-05, 'epoch': 0.4} +2025-02-05 12:12:40 - ERROR - stderr - 13%|█▎ | 2975/22434 [2:05:00<13:32:34, 2.51s/it] +2025-02-05 12:12:42 - ERROR - stderr - 13%|█▎ | 2976/22434 [2:05:02<13:27:59, 2.49s/it] +2025-02-05 12:12:42 - ERROR - stderr - +2025-02-05 12:12:42 - ERROR - stderr - +2025-02-05 12:12:42 - INFO - stdout - {'loss': 1.0412, 'grad_norm': 1.0948431491851807, 'learning_rate': 1.9452780765894516e-05, 'epoch': 0.4} +2025-02-05 12:12:42 - ERROR - stderr - 13%|█▎ | 2976/22434 [2:05:02<13:27:59, 2.49s/it] +2025-02-05 12:12:45 - ERROR - stderr - 13%|█▎ | 2977/22434 [2:05:05<13:54:45, 2.57s/it] +2025-02-05 12:12:45 - ERROR - stderr - +2025-02-05 12:12:45 - ERROR - stderr - +2025-02-05 12:12:45 - INFO - stdout - {'loss': 0.8893, 'grad_norm': 1.1378690004348755, 'learning_rate': 1.945230962226353e-05, 'epoch': 0.4} +2025-02-05 12:12:45 - ERROR - stderr - 13%|█▎ | 2977/22434 [2:05:05<13:54:45, 2.57s/it] +2025-02-05 12:12:48 - ERROR - 
stderr - 13%|█▎ | 2978/22434 [2:05:07<13:44:52, 2.54s/it] +2025-02-05 12:12:48 - ERROR - stderr - +2025-02-05 12:12:48 - ERROR - stderr - +2025-02-05 12:12:48 - INFO - stdout - {'loss': 0.9549, 'grad_norm': 1.2577379941940308, 'learning_rate': 1.94518382816082e-05, 'epoch': 0.4} +2025-02-05 12:12:48 - ERROR - stderr - 13%|█▎ | 2978/22434 [2:05:07<13:44:52, 2.54s/it] +2025-02-05 12:12:50 - ERROR - stderr - 13%|█▎ | 2979/22434 [2:05:10<13:38:58, 2.53s/it] +2025-02-05 12:12:50 - ERROR - stderr - +2025-02-05 12:12:50 - ERROR - stderr - +2025-02-05 12:12:50 - INFO - stdout - {'loss': 0.955, 'grad_norm': 1.0572412014007568, 'learning_rate': 1.945136674393834e-05, 'epoch': 0.4} +2025-02-05 12:12:50 - ERROR - stderr - 13%|█▎ | 2979/22434 [2:05:10<13:38:58, 2.53s/it] +2025-02-05 12:12:53 - ERROR - stderr - 13%|█▎ | 2980/22434 [2:05:12<13:28:38, 2.49s/it] +2025-02-05 12:12:53 - ERROR - stderr - +2025-02-05 12:12:53 - ERROR - stderr - +2025-02-05 12:12:53 - INFO - stdout - {'loss': 0.933, 'grad_norm': 1.176315188407898, 'learning_rate': 1.9450895009263786e-05, 'epoch': 0.4} +2025-02-05 12:12:53 - ERROR - stderr - 13%|█▎ | 2980/22434 [2:05:12<13:28:38, 2.49s/it] +2025-02-05 12:12:55 - ERROR - stderr - 13%|█▎ | 2981/22434 [2:05:15<13:27:16, 2.49s/it] +2025-02-05 12:12:55 - ERROR - stderr - +2025-02-05 12:12:55 - ERROR - stderr - +2025-02-05 12:12:55 - INFO - stdout - {'loss': 0.955, 'grad_norm': 1.030555009841919, 'learning_rate': 1.9450423077594373e-05, 'epoch': 0.4} +2025-02-05 12:12:55 - ERROR - stderr - 13%|█▎ | 2981/22434 [2:05:15<13:27:16, 2.49s/it] +2025-02-05 12:12:58 - ERROR - stderr - 13%|█▎ | 2982/22434 [2:05:17<13:28:28, 2.49s/it] +2025-02-05 12:12:58 - ERROR - stderr - +2025-02-05 12:12:58 - ERROR - stderr - +2025-02-05 12:12:58 - INFO - stdout - {'loss': 0.9533, 'grad_norm': 1.1320264339447021, 'learning_rate': 1.944995094893993e-05, 'epoch': 0.4} +2025-02-05 12:12:58 - ERROR - stderr - 13%|█▎ | 2982/22434 [2:05:17<13:28:28, 2.49s/it] +2025-02-05 12:13:00 - ERROR 
- stderr - 13%|█▎ | 2983/22434 [2:05:20<13:23:33, 2.48s/it] +2025-02-05 12:13:00 - ERROR - stderr - +2025-02-05 12:13:00 - ERROR - stderr - +2025-02-05 12:13:00 - INFO - stdout - {'loss': 1.1805, 'grad_norm': 1.2765610218048096, 'learning_rate': 1.94494786233103e-05, 'epoch': 0.4} +2025-02-05 12:13:00 - ERROR - stderr - 13%|█▎ | 2983/22434 [2:05:20<13:23:33, 2.48s/it] +2025-02-05 12:13:03 - ERROR - stderr - 13%|█▎ | 2984/22434 [2:05:22<13:32:26, 2.51s/it] +2025-02-05 12:13:03 - ERROR - stderr - +2025-02-05 12:13:03 - ERROR - stderr - +2025-02-05 12:13:03 - INFO - stdout - {'loss': 1.1222, 'grad_norm': 1.199271321296692, 'learning_rate': 1.9449006100715334e-05, 'epoch': 0.4} +2025-02-05 12:13:03 - ERROR - stderr - 13%|█▎ | 2984/22434 [2:05:22<13:32:26, 2.51s/it] +2025-02-05 12:13:05 - ERROR - stderr - 13%|█▎ | 2985/22434 [2:05:25<14:10:55, 2.63s/it] +2025-02-05 12:13:06 - ERROR - stderr - +2025-02-05 12:13:06 - ERROR - stderr - +2025-02-05 12:13:06 - INFO - stdout - {'loss': 0.9553, 'grad_norm': 1.1603643894195557, 'learning_rate': 1.9448533381164876e-05, 'epoch': 0.4} +2025-02-05 12:13:06 - ERROR - stderr - 13%|█▎ | 2985/22434 [2:05:25<14:10:55, 2.63s/it] +2025-02-05 12:13:08 - ERROR - stderr - 13%|█▎ | 2986/22434 [2:05:28<13:55:27, 2.58s/it] +2025-02-05 12:13:08 - ERROR - stderr - +2025-02-05 12:13:08 - ERROR - stderr - +2025-02-05 12:13:08 - INFO - stdout - {'loss': 0.9784, 'grad_norm': 1.1358752250671387, 'learning_rate': 1.944806046466878e-05, 'epoch': 0.4} +2025-02-05 12:13:08 - ERROR - stderr - 13%|█▎ | 2986/22434 [2:05:28<13:55:27, 2.58s/it] +2025-02-05 12:13:10 - ERROR - stderr - 13%|█▎ | 2987/22434 [2:05:30<13:48:33, 2.56s/it] +2025-02-05 12:13:10 - ERROR - stderr - +2025-02-05 12:13:10 - ERROR - stderr - +2025-02-05 12:13:10 - INFO - stdout - {'loss': 0.8616, 'grad_norm': 1.0535459518432617, 'learning_rate': 1.9447587351236907e-05, 'epoch': 0.4} +2025-02-05 12:13:10 - ERROR - stderr - 13%|█▎ | 2987/22434 [2:05:30<13:48:33, 2.56s/it] +2025-02-05 12:13:13 - 
ERROR - stderr - 13%|█▎ | 2988/22434 [2:05:33<13:45:38, 2.55s/it] +2025-02-05 12:13:13 - ERROR - stderr - +2025-02-05 12:13:13 - ERROR - stderr - +2025-02-05 12:13:13 - INFO - stdout - {'loss': 0.8675, 'grad_norm': 1.1692144870758057, 'learning_rate': 1.9447114040879115e-05, 'epoch': 0.4} +2025-02-05 12:13:13 - ERROR - stderr - 13%|█▎ | 2988/22434 [2:05:33<13:45:38, 2.55s/it] +2025-02-05 12:13:16 - ERROR - stderr - 13%|█▎ | 2989/22434 [2:05:35<14:00:33, 2.59s/it] +2025-02-05 12:13:16 - ERROR - stderr - +2025-02-05 12:13:16 - ERROR - stderr - +2025-02-05 12:13:16 - INFO - stdout - {'loss': 0.9998, 'grad_norm': 1.0296725034713745, 'learning_rate': 1.9446640533605272e-05, 'epoch': 0.4} +2025-02-05 12:13:16 - ERROR - stderr - 13%|█▎ | 2989/22434 [2:05:35<14:00:33, 2.59s/it] +2025-02-05 12:13:18 - ERROR - stderr - 13%|█▎ | 2990/22434 [2:05:38<14:01:26, 2.60s/it] +2025-02-05 12:13:18 - ERROR - stderr - +2025-02-05 12:13:18 - ERROR - stderr - +2025-02-05 12:13:18 - INFO - stdout - {'loss': 1.0327, 'grad_norm': 1.220070242881775, 'learning_rate': 1.9446166829425244e-05, 'epoch': 0.4} +2025-02-05 12:13:18 - ERROR - stderr - 13%|█▎ | 2990/22434 [2:05:38<14:01:26, 2.60s/it] +2025-02-05 12:13:21 - ERROR - stderr - 13%|█▎ | 2991/22434 [2:05:41<14:10:58, 2.63s/it] +2025-02-05 12:13:21 - ERROR - stderr - +2025-02-05 12:13:21 - ERROR - stderr - +2025-02-05 12:13:21 - INFO - stdout - {'loss': 1.1087, 'grad_norm': 1.226694941520691, 'learning_rate': 1.944569292834891e-05, 'epoch': 0.4} +2025-02-05 12:13:21 - ERROR - stderr - 13%|█▎ | 2991/22434 [2:05:41<14:10:58, 2.63s/it] +2025-02-05 12:13:24 - ERROR - stderr - 13%|█▎ | 2992/22434 [2:05:43<14:10:24, 2.62s/it] +2025-02-05 12:13:24 - ERROR - stderr - +2025-02-05 12:13:24 - ERROR - stderr - +2025-02-05 12:13:24 - INFO - stdout - {'loss': 0.9626, 'grad_norm': 1.043750286102295, 'learning_rate': 1.944521883038614e-05, 'epoch': 0.4} +2025-02-05 12:13:24 - ERROR - stderr - 13%|█▎ | 2992/22434 [2:05:43<14:10:24, 2.62s/it] +2025-02-05 
12:13:26 - ERROR - stderr - 13%|█▎ | 2993/22434 [2:05:46<14:03:27, 2.60s/it] +2025-02-05 12:13:26 - ERROR - stderr - +2025-02-05 12:13:26 - ERROR - stderr - +2025-02-05 12:13:26 - INFO - stdout - {'loss': 0.9701, 'grad_norm': 1.0843777656555176, 'learning_rate': 1.9444744535546827e-05, 'epoch': 0.4} +2025-02-05 12:13:26 - ERROR - stderr - 13%|█▎ | 2993/22434 [2:05:46<14:03:27, 2.60s/it] +2025-02-05 12:13:29 - ERROR - stderr - 13%|█▎ | 2994/22434 [2:05:48<13:59:34, 2.59s/it] +2025-02-05 12:13:29 - ERROR - stderr - +2025-02-05 12:13:29 - ERROR - stderr - +2025-02-05 12:13:29 - INFO - stdout - {'loss': 0.8868, 'grad_norm': 1.1172428131103516, 'learning_rate': 1.9444270043840854e-05, 'epoch': 0.4} +2025-02-05 12:13:29 - ERROR - stderr - 13%|█▎ | 2994/22434 [2:05:49<13:59:34, 2.59s/it] +2025-02-05 12:13:31 - ERROR - stderr - 13%|█▎ | 2995/22434 [2:05:51<13:56:51, 2.58s/it] +2025-02-05 12:13:31 - ERROR - stderr - +2025-02-05 12:13:31 - ERROR - stderr - +2025-02-05 12:13:31 - INFO - stdout - {'loss': 1.0663, 'grad_norm': 1.0918567180633545, 'learning_rate': 1.9443795355278105e-05, 'epoch': 0.4} +2025-02-05 12:13:31 - ERROR - stderr - 13%|█▎ | 2995/22434 [2:05:51<13:56:51, 2.58s/it] +2025-02-05 12:13:34 - ERROR - stderr - 13%|█▎ | 2996/22434 [2:05:53<13:43:59, 2.54s/it] +2025-02-05 12:13:34 - ERROR - stderr - +2025-02-05 12:13:34 - ERROR - stderr - +2025-02-05 12:13:34 - INFO - stdout - {'loss': 0.9964, 'grad_norm': 1.0752836465835571, 'learning_rate': 1.944332046986848e-05, 'epoch': 0.4} +2025-02-05 12:13:34 - ERROR - stderr - 13%|█▎ | 2996/22434 [2:05:54<13:43:59, 2.54s/it] +2025-02-05 12:13:36 - ERROR - stderr - 13%|█▎ | 2997/22434 [2:05:56<13:33:08, 2.51s/it] +2025-02-05 12:13:36 - ERROR - stderr - +2025-02-05 12:13:36 - ERROR - stderr - +2025-02-05 12:13:36 - INFO - stdout - {'loss': 0.9478, 'grad_norm': 1.24544095993042, 'learning_rate': 1.9442845387621876e-05, 'epoch': 0.4} +2025-02-05 12:13:36 - ERROR - stderr - 13%|█▎ | 2997/22434 [2:05:56<13:33:08, 2.51s/it] 
+2025-02-05 12:13:39 - ERROR - stderr - 13%|█▎ | 2998/22434 [2:05:58<13:29:06, 2.50s/it] +2025-02-05 12:13:39 - ERROR - stderr - +2025-02-05 12:13:39 - ERROR - stderr - +2025-02-05 12:13:39 - INFO - stdout - {'loss': 0.8961, 'grad_norm': 0.9910604357719421, 'learning_rate': 1.9442370108548194e-05, 'epoch': 0.4} +2025-02-05 12:13:39 - ERROR - stderr - 13%|█▎ | 2998/22434 [2:05:58<13:29:06, 2.50s/it] +2025-02-05 12:13:41 - ERROR - stderr - 13%|█▎ | 2999/22434 [2:06:01<13:24:57, 2.49s/it] +2025-02-05 12:13:41 - ERROR - stderr - +2025-02-05 12:13:41 - ERROR - stderr - +2025-02-05 12:13:41 - INFO - stdout - {'loss': 1.0771, 'grad_norm': 1.1637219190597534, 'learning_rate': 1.9441894632657343e-05, 'epoch': 0.4} +2025-02-05 12:13:41 - ERROR - stderr - 13%|█▎ | 2999/22434 [2:06:01<13:24:57, 2.49s/it] +2025-02-05 12:13:44 - ERROR - stderr - 13%|█▎ | 3000/22434 [2:06:03<13:27:47, 2.49s/it] +2025-02-05 12:13:44 - ERROR - stderr - +2025-02-05 12:13:44 - ERROR - stderr - +2025-02-05 12:13:44 - INFO - stdout - {'loss': 1.1962, 'grad_norm': 1.2301656007766724, 'learning_rate': 1.9441418959959237e-05, 'epoch': 0.4} +2025-02-05 12:13:44 - ERROR - stderr - 13%|█▎ | 3000/22434 [2:06:03<13:27:47, 2.49s/it] +2025-02-05 12:13:46 - ERROR - stderr - 13%|█▎ | 3001/22434 [2:06:06<13:28:00, 2.49s/it] +2025-02-05 12:13:46 - ERROR - stderr - +2025-02-05 12:13:46 - ERROR - stderr - +2025-02-05 12:13:46 - INFO - stdout - {'loss': 1.1201, 'grad_norm': 1.0858397483825684, 'learning_rate': 1.9440943090463783e-05, 'epoch': 0.4} +2025-02-05 12:13:46 - ERROR - stderr - 13%|█▎ | 3001/22434 [2:06:06<13:28:00, 2.49s/it] +2025-02-05 12:13:49 - ERROR - stderr - 13%|█▎ | 3002/22434 [2:06:08<13:21:48, 2.48s/it] +2025-02-05 12:13:49 - ERROR - stderr - +2025-02-05 12:13:49 - ERROR - stderr - +2025-02-05 12:13:49 - INFO - stdout - {'loss': 1.0366, 'grad_norm': 1.1521450281143188, 'learning_rate': 1.94404670241809e-05, 'epoch': 0.4} +2025-02-05 12:13:49 - ERROR - stderr - 13%|█▎ | 3002/22434 [2:06:08<13:21:48, 
2.48s/it] +2025-02-05 12:13:51 - ERROR - stderr - 13%|█▎ | 3003/22434 [2:06:11<13:25:03, 2.49s/it] +2025-02-05 12:13:51 - ERROR - stderr - +2025-02-05 12:13:51 - ERROR - stderr - +2025-02-05 12:13:51 - INFO - stdout - {'loss': 1.0114, 'grad_norm': 1.0844823122024536, 'learning_rate': 1.9439990761120523e-05, 'epoch': 0.4} +2025-02-05 12:13:51 - ERROR - stderr - 13%|█▎ | 3003/22434 [2:06:11<13:25:03, 2.49s/it] +2025-02-05 12:13:53 - ERROR - stderr - 13%|█▎ | 3004/22434 [2:06:13<13:19:57, 2.47s/it] +2025-02-05 12:13:54 - ERROR - stderr - +2025-02-05 12:13:54 - ERROR - stderr - +2025-02-05 12:13:54 - INFO - stdout - {'loss': 1.0517, 'grad_norm': 1.101545810699463, 'learning_rate': 1.943951430129257e-05, 'epoch': 0.4} +2025-02-05 12:13:54 - ERROR - stderr - 13%|█▎ | 3004/22434 [2:06:13<13:19:57, 2.47s/it] +2025-02-05 12:13:56 - ERROR - stderr - 13%|█▎ | 3005/22434 [2:06:16<13:19:41, 2.47s/it] +2025-02-05 12:13:56 - ERROR - stderr - +2025-02-05 12:13:56 - ERROR - stderr - +2025-02-05 12:13:56 - INFO - stdout - {'loss': 1.0221, 'grad_norm': 1.1841049194335938, 'learning_rate': 1.9439037644706974e-05, 'epoch': 0.4} +2025-02-05 12:13:56 - ERROR - stderr - 13%|█▎ | 3005/22434 [2:06:16<13:19:41, 2.47s/it] +2025-02-05 12:13:58 - ERROR - stderr - 13%|█▎ | 3006/22434 [2:06:18<13:24:12, 2.48s/it] +2025-02-05 12:13:59 - ERROR - stderr - +2025-02-05 12:13:59 - ERROR - stderr - +2025-02-05 12:13:59 - INFO - stdout - {'loss': 1.0512, 'grad_norm': 1.16738760471344, 'learning_rate': 1.9438560791373668e-05, 'epoch': 0.4} +2025-02-05 12:13:59 - ERROR - stderr - 13%|█▎ | 3006/22434 [2:06:18<13:24:12, 2.48s/it] +2025-02-05 12:14:01 - ERROR - stderr - 13%|█▎ | 3007/22434 [2:06:21<13:32:53, 2.51s/it] +2025-02-05 12:14:01 - ERROR - stderr - +2025-02-05 12:14:01 - ERROR - stderr - +2025-02-05 12:14:01 - INFO - stdout - {'loss': 1.0459, 'grad_norm': 1.2542275190353394, 'learning_rate': 1.9438083741302598e-05, 'epoch': 0.4} +2025-02-05 12:14:01 - ERROR - stderr - 13%|█▎ | 3007/22434 
[2:06:21<13:32:53, 2.51s/it] +2025-02-05 12:14:04 - ERROR - stderr - 13%|█▎ | 3008/22434 [2:06:23<13:47:59, 2.56s/it] +2025-02-05 12:14:04 - ERROR - stderr - +2025-02-05 12:14:04 - ERROR - stderr - +2025-02-05 12:14:04 - INFO - stdout - {'loss': 1.0963, 'grad_norm': 1.157622218132019, 'learning_rate': 1.94376064945037e-05, 'epoch': 0.4} +2025-02-05 12:14:04 - ERROR - stderr - 13%|█▎ | 3008/22434 [2:06:24<13:47:59, 2.56s/it] +2025-02-05 12:14:06 - ERROR - stderr - 13%|█▎ | 3009/22434 [2:06:26<13:45:45, 2.55s/it] +2025-02-05 12:14:06 - ERROR - stderr - +2025-02-05 12:14:06 - ERROR - stderr - +2025-02-05 12:14:06 - INFO - stdout - {'loss': 1.0438, 'grad_norm': 1.1552412509918213, 'learning_rate': 1.9437129050986928e-05, 'epoch': 0.4} +2025-02-05 12:14:06 - ERROR - stderr - 13%|█▎ | 3009/22434 [2:06:26<13:45:45, 2.55s/it] +2025-02-05 12:14:09 - ERROR - stderr - 13%|█▎ | 3010/22434 [2:06:29<13:45:23, 2.55s/it] +2025-02-05 12:14:09 - ERROR - stderr - +2025-02-05 12:14:09 - ERROR - stderr - +2025-02-05 12:14:09 - INFO - stdout - {'loss': 0.9397, 'grad_norm': 1.0435905456542969, 'learning_rate': 1.943665141076223e-05, 'epoch': 0.4} +2025-02-05 12:14:09 - ERROR - stderr - 13%|█▎ | 3010/22434 [2:06:29<13:45:23, 2.55s/it] +2025-02-05 12:14:11 - ERROR - stderr - 13%|█▎ | 3011/22434 [2:06:31<13:42:48, 2.54s/it] +2025-02-05 12:14:11 - ERROR - stderr - +2025-02-05 12:14:11 - ERROR - stderr - +2025-02-05 12:14:11 - INFO - stdout - {'loss': 1.0182, 'grad_norm': 1.1731706857681274, 'learning_rate': 1.9436173573839565e-05, 'epoch': 0.4} +2025-02-05 12:14:11 - ERROR - stderr - 13%|█▎ | 3011/22434 [2:06:31<13:42:48, 2.54s/it] +2025-02-05 12:14:14 - ERROR - stderr - 13%|█▎ | 3012/22434 [2:06:33<13:30:59, 2.51s/it] +2025-02-05 12:14:14 - ERROR - stderr - +2025-02-05 12:14:14 - ERROR - stderr - +2025-02-05 12:14:14 - INFO - stdout - {'loss': 1.0548, 'grad_norm': 1.1348472833633423, 'learning_rate': 1.943569554022889e-05, 'epoch': 0.4} +2025-02-05 12:14:14 - ERROR - stderr - 13%|█▎ | 
3012/22434 [2:06:34<13:30:59, 2.51s/it] +2025-02-05 12:14:16 - ERROR - stderr - 13%|█▎ | 3013/22434 [2:06:36<13:30:27, 2.50s/it] +2025-02-05 12:14:16 - ERROR - stderr - +2025-02-05 12:14:16 - ERROR - stderr - +2025-02-05 12:14:16 - INFO - stdout - {'loss': 1.0425, 'grad_norm': 1.1312835216522217, 'learning_rate': 1.943521730994017e-05, 'epoch': 0.4} +2025-02-05 12:14:16 - ERROR - stderr - 13%|█▎ | 3013/22434 [2:06:36<13:30:27, 2.50s/it] +2025-02-05 12:14:19 - ERROR - stderr - 13%|█▎ | 3014/22434 [2:06:39<13:32:00, 2.51s/it] +2025-02-05 12:14:19 - ERROR - stderr - +2025-02-05 12:14:19 - ERROR - stderr - +2025-02-05 12:14:19 - INFO - stdout - {'loss': 1.0935, 'grad_norm': 1.1038933992385864, 'learning_rate': 1.9434738882983373e-05, 'epoch': 0.4} +2025-02-05 12:14:19 - ERROR - stderr - 13%|█▎ | 3014/22434 [2:06:39<13:32:00, 2.51s/it] +2025-02-05 12:14:22 - ERROR - stderr - 13%|█▎ | 3015/22434 [2:06:41<14:08:21, 2.62s/it] +2025-02-05 12:14:22 - ERROR - stderr - +2025-02-05 12:14:22 - ERROR - stderr - +2025-02-05 12:14:22 - INFO - stdout - {'loss': 0.9917, 'grad_norm': 1.1634535789489746, 'learning_rate': 1.9434260259368473e-05, 'epoch': 0.4} +2025-02-05 12:14:22 - ERROR - stderr - 13%|█▎ | 3015/22434 [2:06:41<14:08:21, 2.62s/it] +2025-02-05 12:14:24 - ERROR - stderr - 13%|█▎ | 3016/22434 [2:06:44<14:05:00, 2.61s/it] +2025-02-05 12:14:24 - ERROR - stderr - +2025-02-05 12:14:24 - ERROR - stderr - +2025-02-05 12:14:24 - INFO - stdout - {'loss': 0.8737, 'grad_norm': 1.065834879875183, 'learning_rate': 1.9433781439105446e-05, 'epoch': 0.4} +2025-02-05 12:14:24 - ERROR - stderr - 13%|█▎ | 3016/22434 [2:06:44<14:05:00, 2.61s/it] +2025-02-05 12:14:27 - ERROR - stderr - 13%|█▎ | 3017/22434 [2:06:46<13:52:42, 2.57s/it] +2025-02-05 12:14:27 - ERROR - stderr - +2025-02-05 12:14:27 - ERROR - stderr - +2025-02-05 12:14:27 - INFO - stdout - {'loss': 1.0413, 'grad_norm': 1.089040994644165, 'learning_rate': 1.9433302422204272e-05, 'epoch': 0.4} +2025-02-05 12:14:27 - ERROR - stderr - 
13%|█▎ | 3017/22434 [2:06:47<13:52:42, 2.57s/it] +2025-02-05 12:14:29 - ERROR - stderr - 13%|█▎ | 3018/22434 [2:06:49<14:01:56, 2.60s/it] +2025-02-05 12:14:29 - ERROR - stderr - +2025-02-05 12:14:29 - ERROR - stderr - +2025-02-05 12:14:29 - INFO - stdout - {'loss': 1.0662, 'grad_norm': 1.160011649131775, 'learning_rate': 1.9432823208674936e-05, 'epoch': 0.4} +2025-02-05 12:14:29 - ERROR - stderr - 13%|█▎ | 3018/22434 [2:06:49<14:01:56, 2.60s/it] +2025-02-05 12:14:32 - ERROR - stderr - 13%|█▎ | 3019/22434 [2:06:52<13:53:15, 2.58s/it] +2025-02-05 12:14:32 - ERROR - stderr - +2025-02-05 12:14:32 - ERROR - stderr - +2025-02-05 12:14:32 - INFO - stdout - {'loss': 0.9126, 'grad_norm': 1.0672634840011597, 'learning_rate': 1.9432343798527427e-05, 'epoch': 0.4} +2025-02-05 12:14:32 - ERROR - stderr - 13%|█▎ | 3019/22434 [2:06:52<13:53:15, 2.58s/it] +2025-02-05 12:14:34 - ERROR - stderr - 13%|█▎ | 3020/22434 [2:06:54<13:40:31, 2.54s/it] +2025-02-05 12:14:34 - ERROR - stderr - +2025-02-05 12:14:34 - ERROR - stderr - +2025-02-05 12:14:34 - INFO - stdout - {'loss': 0.8837, 'grad_norm': 1.1072423458099365, 'learning_rate': 1.9431864191771733e-05, 'epoch': 0.4} +2025-02-05 12:14:34 - ERROR - stderr - 13%|█▎ | 3020/22434 [2:06:54<13:40:31, 2.54s/it] +2025-02-05 12:14:37 - ERROR - stderr - 13%|█▎ | 3021/22434 [2:06:57<13:51:34, 2.57s/it] +2025-02-05 12:14:37 - ERROR - stderr - +2025-02-05 12:14:37 - ERROR - stderr - +2025-02-05 12:14:37 - INFO - stdout - {'loss': 0.9467, 'grad_norm': 1.2003124952316284, 'learning_rate': 1.943138438841786e-05, 'epoch': 0.4} +2025-02-05 12:14:37 - ERROR - stderr - 13%|█▎ | 3021/22434 [2:06:57<13:51:34, 2.57s/it] +2025-02-05 12:14:40 - ERROR - stderr - 13%|█▎ | 3022/22434 [2:07:00<14:39:12, 2.72s/it] +2025-02-05 12:14:40 - ERROR - stderr - +2025-02-05 12:14:40 - ERROR - stderr - +2025-02-05 12:14:40 - INFO - stdout - {'loss': 1.1152, 'grad_norm': 1.2278048992156982, 'learning_rate': 1.9430904388475803e-05, 'epoch': 0.4} +2025-02-05 12:14:40 - ERROR - 
stderr - 13%|█▎ | 3022/22434 [2:07:00<14:39:12, 2.72s/it] +2025-02-05 12:14:43 - ERROR - stderr - 13%|█▎ | 3023/22434 [2:07:02<14:26:19, 2.68s/it] +2025-02-05 12:14:43 - ERROR - stderr - +2025-02-05 12:14:43 - ERROR - stderr - +2025-02-05 12:14:43 - INFO - stdout - {'loss': 1.0251, 'grad_norm': 1.2416614294052124, 'learning_rate': 1.9430424191955567e-05, 'epoch': 0.4} +2025-02-05 12:14:43 - ERROR - stderr - 13%|█▎ | 3023/22434 [2:07:02<14:26:19, 2.68s/it] +2025-02-05 12:14:45 - ERROR - stderr - 13%|█▎ | 3024/22434 [2:07:05<14:05:34, 2.61s/it] +2025-02-05 12:14:45 - ERROR - stderr - +2025-02-05 12:14:45 - ERROR - stderr - +2025-02-05 12:14:45 - INFO - stdout - {'loss': 0.9551, 'grad_norm': 1.1391545534133911, 'learning_rate': 1.9429943798867163e-05, 'epoch': 0.4} +2025-02-05 12:14:45 - ERROR - stderr - 13%|█▎ | 3024/22434 [2:07:05<14:05:34, 2.61s/it] +2025-02-05 12:14:48 - ERROR - stderr - 13%|█▎ | 3025/22434 [2:07:07<13:52:26, 2.57s/it] +2025-02-05 12:14:48 - ERROR - stderr - +2025-02-05 12:14:48 - ERROR - stderr - +2025-02-05 12:14:48 - INFO - stdout - {'loss': 0.9185, 'grad_norm': 0.9949235320091248, 'learning_rate': 1.9429463209220604e-05, 'epoch': 0.4} +2025-02-05 12:14:48 - ERROR - stderr - 13%|█▎ | 3025/22434 [2:07:07<13:52:26, 2.57s/it] +2025-02-05 12:14:50 - ERROR - stderr - 13%|█▎ | 3026/22434 [2:07:10<13:46:33, 2.56s/it] +2025-02-05 12:14:50 - ERROR - stderr - +2025-02-05 12:14:50 - ERROR - stderr - +2025-02-05 12:14:50 - INFO - stdout - {'loss': 0.954, 'grad_norm': 1.070574164390564, 'learning_rate': 1.942898242302591e-05, 'epoch': 0.4} +2025-02-05 12:14:50 - ERROR - stderr - 13%|█▎ | 3026/22434 [2:07:10<13:46:33, 2.56s/it] +2025-02-05 12:14:53 - ERROR - stderr - 13%|█▎ | 3027/22434 [2:07:12<13:41:19, 2.54s/it] +2025-02-05 12:14:53 - ERROR - stderr - +2025-02-05 12:14:53 - ERROR - stderr - +2025-02-05 12:14:53 - INFO - stdout - {'loss': 1.0681, 'grad_norm': 1.1673306226730347, 'learning_rate': 1.9428501440293098e-05, 'epoch': 0.4} +2025-02-05 12:14:53 - 
ERROR - stderr - 13%|█▎ | 3027/22434 [2:07:12<13:41:19, 2.54s/it] +2025-02-05 12:14:55 - ERROR - stderr - 13%|█▎ | 3028/22434 [2:07:15<13:50:45, 2.57s/it] +2025-02-05 12:14:55 - ERROR - stderr - +2025-02-05 12:14:55 - ERROR - stderr - +2025-02-05 12:14:55 - INFO - stdout - {'loss': 0.9421, 'grad_norm': 1.080773949623108, 'learning_rate': 1.9428020261032196e-05, 'epoch': 0.4} +2025-02-05 12:14:55 - ERROR - stderr - 13%|█▎ | 3028/22434 [2:07:15<13:50:45, 2.57s/it] +2025-02-05 12:14:58 - ERROR - stderr - 14%|█▎ | 3029/22434 [2:07:18<13:58:49, 2.59s/it] +2025-02-05 12:14:58 - ERROR - stderr - +2025-02-05 12:14:58 - ERROR - stderr - +2025-02-05 12:14:58 - INFO - stdout - {'loss': 0.9367, 'grad_norm': 1.1092828512191772, 'learning_rate': 1.9427538885253233e-05, 'epoch': 0.41} +2025-02-05 12:14:58 - ERROR - stderr - 14%|█▎ | 3029/22434 [2:07:18<13:58:49, 2.59s/it] +2025-02-05 12:15:00 - ERROR - stderr - 14%|█▎ | 3030/22434 [2:07:20<13:43:43, 2.55s/it] +2025-02-05 12:15:00 - ERROR - stderr - +2025-02-05 12:15:00 - ERROR - stderr - +2025-02-05 12:15:00 - INFO - stdout - {'loss': 0.8834, 'grad_norm': 1.0429340600967407, 'learning_rate': 1.942705731296624e-05, 'epoch': 0.41} +2025-02-05 12:15:00 - ERROR - stderr - 14%|█▎ | 3030/22434 [2:07:20<13:43:43, 2.55s/it] +2025-02-05 12:15:03 - ERROR - stderr - 14%|█▎ | 3031/22434 [2:07:23<13:33:37, 2.52s/it] +2025-02-05 12:15:03 - ERROR - stderr - +2025-02-05 12:15:03 - ERROR - stderr - +2025-02-05 12:15:03 - INFO - stdout - {'loss': 1.0274, 'grad_norm': 1.245124340057373, 'learning_rate': 1.9426575544181263e-05, 'epoch': 0.41} +2025-02-05 12:15:03 - ERROR - stderr - 14%|█▎ | 3031/22434 [2:07:23<13:33:37, 2.52s/it] +2025-02-05 12:15:05 - ERROR - stderr - 14%|█▎ | 3032/22434 [2:07:25<13:23:55, 2.49s/it] +2025-02-05 12:15:05 - ERROR - stderr - +2025-02-05 12:15:05 - ERROR - stderr - +2025-02-05 12:15:05 - INFO - stdout - {'loss': 0.9729, 'grad_norm': 1.1455271244049072, 'learning_rate': 1.9426093578908335e-05, 'epoch': 0.41} +2025-02-05 
12:15:05 - ERROR - stderr - 14%|█▎ | 3032/22434 [2:07:25<13:23:55, 2.49s/it] +2025-02-05 12:15:08 - ERROR - stderr - 14%|█▎ | 3033/22434 [2:07:27<13:24:23, 2.49s/it] +2025-02-05 12:15:08 - ERROR - stderr - +2025-02-05 12:15:08 - ERROR - stderr - +2025-02-05 12:15:08 - INFO - stdout - {'loss': 0.9866, 'grad_norm': 1.1643540859222412, 'learning_rate': 1.9425611417157512e-05, 'epoch': 0.41} +2025-02-05 12:15:08 - ERROR - stderr - 14%|█▎ | 3033/22434 [2:07:27<13:24:23, 2.49s/it] +2025-02-05 12:15:10 - ERROR - stderr - 14%|█▎ | 3034/22434 [2:07:30<13:28:24, 2.50s/it] +2025-02-05 12:15:10 - ERROR - stderr - +2025-02-05 12:15:10 - ERROR - stderr - +2025-02-05 12:15:10 - INFO - stdout - {'loss': 0.9572, 'grad_norm': 1.2185564041137695, 'learning_rate': 1.9425129058938833e-05, 'epoch': 0.41} +2025-02-05 12:15:10 - ERROR - stderr - 14%|█▎ | 3034/22434 [2:07:30<13:28:24, 2.50s/it] +2025-02-05 12:15:13 - ERROR - stderr - 14%|█▎ | 3035/22434 [2:07:32<13:30:10, 2.51s/it] +2025-02-05 12:15:13 - ERROR - stderr - +2025-02-05 12:15:13 - ERROR - stderr - +2025-02-05 12:15:13 - INFO - stdout - {'loss': 0.9759, 'grad_norm': 1.0965440273284912, 'learning_rate': 1.942464650426236e-05, 'epoch': 0.41} +2025-02-05 12:15:13 - ERROR - stderr - 14%|█▎ | 3035/22434 [2:07:33<13:30:10, 2.51s/it] +2025-02-05 12:15:15 - ERROR - stderr - 14%|█▎ | 3036/22434 [2:07:35<13:21:47, 2.48s/it] +2025-02-05 12:15:15 - ERROR - stderr - +2025-02-05 12:15:15 - ERROR - stderr - +2025-02-05 12:15:15 - INFO - stdout - {'loss': 1.0272, 'grad_norm': 1.1587576866149902, 'learning_rate': 1.9424163753138144e-05, 'epoch': 0.41} +2025-02-05 12:15:15 - ERROR - stderr - 14%|█▎ | 3036/22434 [2:07:35<13:21:47, 2.48s/it] +2025-02-05 12:15:18 - ERROR - stderr - 14%|█▎ | 3037/22434 [2:07:37<13:15:41, 2.46s/it] +2025-02-05 12:15:18 - ERROR - stderr - +2025-02-05 12:15:18 - ERROR - stderr - +2025-02-05 12:15:18 - INFO - stdout - {'loss': 0.9277, 'grad_norm': 1.0783741474151611, 'learning_rate': 1.942368080557626e-05, 'epoch': 
0.41} +2025-02-05 12:15:18 - ERROR - stderr - 14%|█▎ | 3037/22434 [2:07:37<13:15:41, 2.46s/it] +2025-02-05 12:15:20 - ERROR - stderr - 14%|█▎ | 3038/22434 [2:07:40<13:19:45, 2.47s/it] +2025-02-05 12:15:20 - ERROR - stderr - +2025-02-05 12:15:20 - ERROR - stderr - +2025-02-05 12:15:20 - INFO - stdout - {'loss': 1.0057, 'grad_norm': 1.0751574039459229, 'learning_rate': 1.9423197661586765e-05, 'epoch': 0.41} +2025-02-05 12:15:20 - ERROR - stderr - 14%|█▎ | 3038/22434 [2:07:40<13:19:45, 2.47s/it] +2025-02-05 12:15:22 - ERROR - stderr - 14%|█▎ | 3039/22434 [2:07:42<13:17:18, 2.47s/it] +2025-02-05 12:15:23 - ERROR - stderr - +2025-02-05 12:15:23 - ERROR - stderr - +2025-02-05 12:15:23 - INFO - stdout - {'loss': 0.9691, 'grad_norm': 1.169594407081604, 'learning_rate': 1.942271432117973e-05, 'epoch': 0.41} +2025-02-05 12:15:23 - ERROR - stderr - 14%|█▎ | 3039/22434 [2:07:42<13:17:18, 2.47s/it] +2025-02-05 12:15:25 - ERROR - stderr - 14%|█▎ | 3040/22434 [2:07:45<13:20:38, 2.48s/it] +2025-02-05 12:15:25 - ERROR - stderr - +2025-02-05 12:15:25 - ERROR - stderr - +2025-02-05 12:15:25 - INFO - stdout - {'loss': 1.0368, 'grad_norm': 1.227099061012268, 'learning_rate': 1.942223078436523e-05, 'epoch': 0.41} +2025-02-05 12:15:25 - ERROR - stderr - 14%|█▎ | 3040/22434 [2:07:45<13:20:38, 2.48s/it] +2025-02-05 12:15:27 - ERROR - stderr - 14%|█▎ | 3041/22434 [2:07:47<13:20:36, 2.48s/it] +2025-02-05 12:15:28 - ERROR - stderr - +2025-02-05 12:15:28 - ERROR - stderr - +2025-02-05 12:15:28 - INFO - stdout - {'loss': 1.0428, 'grad_norm': 1.2535454034805298, 'learning_rate': 1.942174705115335e-05, 'epoch': 0.41} +2025-02-05 12:15:28 - ERROR - stderr - 14%|█▎ | 3041/22434 [2:07:47<13:20:36, 2.48s/it] +2025-02-05 12:15:30 - ERROR - stderr - 14%|█▎ | 3042/22434 [2:07:50<13:27:08, 2.50s/it] +2025-02-05 12:15:30 - ERROR - stderr - +2025-02-05 12:15:30 - ERROR - stderr - +2025-02-05 12:15:30 - INFO - stdout - {'loss': 1.0246, 'grad_norm': 1.1885732412338257, 'learning_rate': 
1.9421263121554163e-05, 'epoch': 0.41} +2025-02-05 12:15:30 - ERROR - stderr - 14%|█▎ | 3042/22434 [2:07:50<13:27:08, 2.50s/it] +2025-02-05 12:15:32 - ERROR - stderr - 14%|█▎ | 3043/22434 [2:07:52<13:21:39, 2.48s/it] +2025-02-05 12:15:33 - ERROR - stderr - +2025-02-05 12:15:33 - ERROR - stderr - +2025-02-05 12:15:33 - INFO - stdout - {'loss': 1.0452, 'grad_norm': 1.1729487180709839, 'learning_rate': 1.9420778995577768e-05, 'epoch': 0.41} +2025-02-05 12:15:33 - ERROR - stderr - 14%|█▎ | 3043/22434 [2:07:52<13:21:39, 2.48s/it] +2025-02-05 12:15:35 - ERROR - stderr - 14%|█▎ | 3044/22434 [2:07:55<13:25:15, 2.49s/it] +2025-02-05 12:15:35 - ERROR - stderr - +2025-02-05 12:15:35 - ERROR - stderr - +2025-02-05 12:15:35 - INFO - stdout - {'loss': 1.1481, 'grad_norm': 1.1860896348953247, 'learning_rate': 1.9420294673234243e-05, 'epoch': 0.41} +2025-02-05 12:15:35 - ERROR - stderr - 14%|█▎ | 3044/22434 [2:07:55<13:25:15, 2.49s/it] +2025-02-05 12:15:37 - ERROR - stderr - 14%|█▎ | 3045/22434 [2:07:57<13:19:33, 2.47s/it] +2025-02-05 12:15:37 - ERROR - stderr - +2025-02-05 12:15:37 - ERROR - stderr - +2025-02-05 12:15:37 - INFO - stdout - {'loss': 1.0033, 'grad_norm': 1.3106626272201538, 'learning_rate': 1.9419810154533694e-05, 'epoch': 0.41} +2025-02-05 12:15:37 - ERROR - stderr - 14%|█▎ | 3045/22434 [2:07:57<13:19:33, 2.47s/it] +2025-02-05 12:15:40 - ERROR - stderr - 14%|█▎ | 3046/22434 [2:08:00<13:25:20, 2.49s/it] +2025-02-05 12:15:40 - ERROR - stderr - +2025-02-05 12:15:40 - ERROR - stderr - +2025-02-05 12:15:40 - INFO - stdout - {'loss': 1.0379, 'grad_norm': 1.2986791133880615, 'learning_rate': 1.9419325439486213e-05, 'epoch': 0.41} +2025-02-05 12:15:40 - ERROR - stderr - 14%|█▎ | 3046/22434 [2:08:00<13:25:20, 2.49s/it] +2025-02-05 12:15:42 - ERROR - stderr - 14%|█▎ | 3047/22434 [2:08:02<13:28:09, 2.50s/it] +2025-02-05 12:15:43 - ERROR - stderr - +2025-02-05 12:15:43 - ERROR - stderr - +2025-02-05 12:15:43 - INFO - stdout - {'loss': 1.1168, 'grad_norm': 1.1774249076843262, 
'learning_rate': 1.941884052810191e-05, 'epoch': 0.41} +2025-02-05 12:15:43 - ERROR - stderr - 14%|█▎ | 3047/22434 [2:08:02<13:28:09, 2.50s/it] +2025-02-05 12:15:45 - ERROR - stderr - 14%|█▎ | 3048/22434 [2:08:05<13:27:15, 2.50s/it] +2025-02-05 12:15:45 - ERROR - stderr - +2025-02-05 12:15:45 - ERROR - stderr - +2025-02-05 12:15:45 - INFO - stdout - {'loss': 0.9946, 'grad_norm': 1.2179279327392578, 'learning_rate': 1.9418355420390885e-05, 'epoch': 0.41} +2025-02-05 12:15:45 - ERROR - stderr - 14%|█▎ | 3048/22434 [2:08:05<13:27:15, 2.50s/it] +2025-02-05 12:15:47 - ERROR - stderr - 14%|█▎ | 3049/22434 [2:08:07<13:28:07, 2.50s/it] +2025-02-05 12:15:48 - ERROR - stderr - +2025-02-05 12:15:48 - ERROR - stderr - +2025-02-05 12:15:48 - INFO - stdout - {'loss': 1.0165, 'grad_norm': 1.1613017320632935, 'learning_rate': 1.941787011636326e-05, 'epoch': 0.41} +2025-02-05 12:15:48 - ERROR - stderr - 14%|█▎ | 3049/22434 [2:08:07<13:28:07, 2.50s/it] +2025-02-05 12:15:50 - ERROR - stderr - 14%|█▎ | 3050/22434 [2:08:10<13:29:10, 2.50s/it] +2025-02-05 12:15:50 - ERROR - stderr - +2025-02-05 12:15:50 - ERROR - stderr - +2025-02-05 12:15:50 - INFO - stdout - {'loss': 0.9082, 'grad_norm': 1.1061269044876099, 'learning_rate': 1.9417384616029137e-05, 'epoch': 0.41} +2025-02-05 12:15:50 - ERROR - stderr - 14%|█▎ | 3050/22434 [2:08:10<13:29:10, 2.50s/it] +2025-02-05 12:15:52 - ERROR - stderr - 14%|█▎ | 3051/22434 [2:08:12<13:26:49, 2.50s/it] +2025-02-05 12:15:53 - ERROR - stderr - +2025-02-05 12:15:53 - ERROR - stderr - +2025-02-05 12:15:53 - INFO - stdout - {'loss': 0.9004, 'grad_norm': 1.0965896844863892, 'learning_rate': 1.9416898919398646e-05, 'epoch': 0.41} +2025-02-05 12:15:53 - ERROR - stderr - 14%|█▎ | 3051/22434 [2:08:12<13:26:49, 2.50s/it] +2025-02-05 12:15:55 - ERROR - stderr - 14%|█▎ | 3052/22434 [2:08:15<13:18:56, 2.47s/it] +2025-02-05 12:15:55 - ERROR - stderr - +2025-02-05 12:15:55 - ERROR - stderr - +2025-02-05 12:15:55 - INFO - stdout - {'loss': 0.8921, 'grad_norm': 
1.0301753282546997, 'learning_rate': 1.9416413026481907e-05, 'epoch': 0.41} +2025-02-05 12:15:55 - ERROR - stderr - 14%|█▎ | 3052/22434 [2:08:15<13:18:56, 2.47s/it] +2025-02-05 12:15:57 - ERROR - stderr - 14%|█▎ | 3053/22434 [2:08:17<13:15:19, 2.46s/it] +2025-02-05 12:15:57 - ERROR - stderr - +2025-02-05 12:15:57 - ERROR - stderr - +2025-02-05 12:15:57 - INFO - stdout - {'loss': 0.9608, 'grad_norm': 1.2491261959075928, 'learning_rate': 1.9415926937289054e-05, 'epoch': 0.41} +2025-02-05 12:15:57 - ERROR - stderr - 14%|█▎ | 3053/22434 [2:08:17<13:15:19, 2.46s/it] +2025-02-05 12:16:00 - ERROR - stderr - 14%|█▎ | 3054/22434 [2:08:20<13:38:09, 2.53s/it] +2025-02-05 12:16:00 - ERROR - stderr - +2025-02-05 12:16:00 - ERROR - stderr - +2025-02-05 12:16:00 - INFO - stdout - {'loss': 1.0465, 'grad_norm': 1.0893139839172363, 'learning_rate': 1.941544065183021e-05, 'epoch': 0.41} +2025-02-05 12:16:00 - ERROR - stderr - 14%|█▎ | 3054/22434 [2:08:20<13:38:09, 2.53s/it] +2025-02-05 12:16:02 - ERROR - stderr - 14%|█▎ | 3055/22434 [2:08:22<13:29:25, 2.51s/it] +2025-02-05 12:16:03 - ERROR - stderr - +2025-02-05 12:16:03 - ERROR - stderr - +2025-02-05 12:16:03 - INFO - stdout - {'loss': 0.9938, 'grad_norm': 1.0968921184539795, 'learning_rate': 1.9414954170115516e-05, 'epoch': 0.41} +2025-02-05 12:16:03 - ERROR - stderr - 14%|█▎ | 3055/22434 [2:08:22<13:29:25, 2.51s/it] +2025-02-05 12:16:05 - ERROR - stderr - 14%|█▎ | 3056/22434 [2:08:25<13:28:38, 2.50s/it] +2025-02-05 12:16:05 - ERROR - stderr - +2025-02-05 12:16:05 - ERROR - stderr - +2025-02-05 12:16:05 - INFO - stdout - {'loss': 0.9408, 'grad_norm': 1.203365445137024, 'learning_rate': 1.9414467492155113e-05, 'epoch': 0.41} +2025-02-05 12:16:05 - ERROR - stderr - 14%|█▎ | 3056/22434 [2:08:25<13:28:38, 2.50s/it] +2025-02-05 12:16:07 - ERROR - stderr - 14%|█▎ | 3057/22434 [2:08:27<13:27:14, 2.50s/it] +2025-02-05 12:16:07 - ERROR - stderr - +2025-02-05 12:16:07 - ERROR - stderr - +2025-02-05 12:16:07 - INFO - stdout - {'loss': 1.0231, 
'grad_norm': 1.0484868288040161, 'learning_rate': 1.9413980617959137e-05, 'epoch': 0.41} +2025-02-05 12:16:07 - ERROR - stderr - 14%|█▎ | 3057/22434 [2:08:27<13:27:14, 2.50s/it] +2025-02-05 12:16:10 - ERROR - stderr - 14%|█▎ | 3058/22434 [2:08:30<13:18:41, 2.47s/it] +2025-02-05 12:16:10 - ERROR - stderr - +2025-02-05 12:16:10 - ERROR - stderr - +2025-02-05 12:16:10 - INFO - stdout - {'loss': 0.962, 'grad_norm': 1.158787488937378, 'learning_rate': 1.941349354753775e-05, 'epoch': 0.41} +2025-02-05 12:16:10 - ERROR - stderr - 14%|█▎ | 3058/22434 [2:08:30<13:18:41, 2.47s/it] +2025-02-05 12:16:12 - ERROR - stderr - 14%|█▎ | 3059/22434 [2:08:32<13:16:12, 2.47s/it] +2025-02-05 12:16:12 - ERROR - stderr - +2025-02-05 12:16:12 - ERROR - stderr - +2025-02-05 12:16:12 - INFO - stdout - {'loss': 0.9851, 'grad_norm': 1.2331900596618652, 'learning_rate': 1.9413006280901098e-05, 'epoch': 0.41} +2025-02-05 12:16:12 - ERROR - stderr - 14%|█▎ | 3059/22434 [2:08:32<13:16:12, 2.47s/it] +2025-02-05 12:16:15 - ERROR - stderr - 14%|█▎ | 3060/22434 [2:08:35<13:14:58, 2.46s/it] +2025-02-05 12:16:15 - ERROR - stderr - +2025-02-05 12:16:15 - ERROR - stderr - +2025-02-05 12:16:15 - INFO - stdout - {'loss': 0.9297, 'grad_norm': 1.1405267715454102, 'learning_rate': 1.9412518818059335e-05, 'epoch': 0.41} +2025-02-05 12:16:15 - ERROR - stderr - 14%|█▎ | 3060/22434 [2:08:35<13:14:58, 2.46s/it] +2025-02-05 12:16:17 - ERROR - stderr - 14%|█▎ | 3061/22434 [2:08:37<13:11:28, 2.45s/it] +2025-02-05 12:16:17 - ERROR - stderr - +2025-02-05 12:16:17 - ERROR - stderr - +2025-02-05 12:16:17 - INFO - stdout - {'loss': 1.0723, 'grad_norm': 1.1454408168792725, 'learning_rate': 1.9412031159022624e-05, 'epoch': 0.41} +2025-02-05 12:16:17 - ERROR - stderr - 14%|█▎ | 3061/22434 [2:08:37<13:11:28, 2.45s/it] +2025-02-05 12:16:20 - ERROR - stderr - 14%|█▎ | 3062/22434 [2:08:39<13:12:41, 2.46s/it] +2025-02-05 12:16:20 - ERROR - stderr - +2025-02-05 12:16:20 - ERROR - stderr - +2025-02-05 12:16:20 - INFO - stdout - 
{'loss': 0.891, 'grad_norm': 1.1048020124435425, 'learning_rate': 1.941154330380113e-05, 'epoch': 0.41} +2025-02-05 12:16:20 - ERROR - stderr - 14%|█▎ | 3062/22434 [2:08:39<13:12:41, 2.46s/it] +2025-02-05 12:16:22 - ERROR - stderr - 14%|█▎ | 3063/22434 [2:08:42<13:19:23, 2.48s/it] +2025-02-05 12:16:22 - ERROR - stderr - +2025-02-05 12:16:22 - ERROR - stderr - +2025-02-05 12:16:22 - INFO - stdout - {'loss': 0.9157, 'grad_norm': 1.266876220703125, 'learning_rate': 1.9411055252405022e-05, 'epoch': 0.41} +2025-02-05 12:16:22 - ERROR - stderr - 14%|█▎ | 3063/22434 [2:08:42<13:19:23, 2.48s/it] +2025-02-05 12:16:25 - ERROR - stderr - 14%|█▎ | 3064/22434 [2:08:45<13:40:17, 2.54s/it] +2025-02-05 12:16:25 - ERROR - stderr - +2025-02-05 12:16:25 - ERROR - stderr - +2025-02-05 12:16:25 - INFO - stdout - {'loss': 1.0723, 'grad_norm': 1.1568734645843506, 'learning_rate': 1.9410567004844473e-05, 'epoch': 0.41} +2025-02-05 12:16:25 - ERROR - stderr - 14%|█▎ | 3064/22434 [2:08:45<13:40:17, 2.54s/it] +2025-02-05 12:16:27 - ERROR - stderr - 14%|█▎ | 3065/22434 [2:08:47<13:29:43, 2.51s/it] +2025-02-05 12:16:27 - ERROR - stderr - +2025-02-05 12:16:27 - ERROR - stderr - +2025-02-05 12:16:27 - INFO - stdout - {'loss': 1.0037, 'grad_norm': 1.2634341716766357, 'learning_rate': 1.9410078561129657e-05, 'epoch': 0.41} +2025-02-05 12:16:27 - ERROR - stderr - 14%|█▎ | 3065/22434 [2:08:47<13:29:43, 2.51s/it] +2025-02-05 12:16:30 - ERROR - stderr - 14%|█▎ | 3066/22434 [2:08:50<13:35:20, 2.53s/it] +2025-02-05 12:16:30 - ERROR - stderr - +2025-02-05 12:16:30 - ERROR - stderr - +2025-02-05 12:16:30 - INFO - stdout - {'loss': 0.9384, 'grad_norm': 1.0296622514724731, 'learning_rate': 1.9409589921270758e-05, 'epoch': 0.41} +2025-02-05 12:16:30 - ERROR - stderr - 14%|█▎ | 3066/22434 [2:08:50<13:35:20, 2.53s/it] +2025-02-05 12:16:32 - ERROR - stderr - 14%|█▎ | 3067/22434 [2:08:52<13:24:31, 2.49s/it] +2025-02-05 12:16:32 - ERROR - stderr - +2025-02-05 12:16:32 - ERROR - stderr - +2025-02-05 12:16:32 - 
INFO - stdout - {'loss': 0.9314, 'grad_norm': 1.115875005722046, 'learning_rate': 1.9409101085277966e-05, 'epoch': 0.41} +2025-02-05 12:16:32 - ERROR - stderr - 14%|█▎ | 3067/22434 [2:08:52<13:24:31, 2.49s/it] +2025-02-05 12:16:35 - ERROR - stderr - 14%|█▎ | 3068/22434 [2:08:55<13:26:22, 2.50s/it] +2025-02-05 12:16:35 - ERROR - stderr - +2025-02-05 12:16:35 - ERROR - stderr - +2025-02-05 12:16:35 - INFO - stdout - {'loss': 1.0656, 'grad_norm': 1.1686104536056519, 'learning_rate': 1.9408612053161464e-05, 'epoch': 0.41} +2025-02-05 12:16:35 - ERROR - stderr - 14%|█▎ | 3068/22434 [2:08:55<13:26:22, 2.50s/it] +2025-02-05 12:16:37 - ERROR - stderr - 14%|█▎ | 3069/22434 [2:08:57<13:27:18, 2.50s/it] +2025-02-05 12:16:37 - ERROR - stderr - +2025-02-05 12:16:37 - ERROR - stderr - +2025-02-05 12:16:37 - INFO - stdout - {'loss': 0.8682, 'grad_norm': 1.030661940574646, 'learning_rate': 1.9408122824931444e-05, 'epoch': 0.41} +2025-02-05 12:16:37 - ERROR - stderr - 14%|█▎ | 3069/22434 [2:08:57<13:27:18, 2.50s/it] +2025-02-05 12:16:40 - ERROR - stderr - 14%|█▎ | 3070/22434 [2:09:00<13:34:01, 2.52s/it] +2025-02-05 12:16:40 - ERROR - stderr - +2025-02-05 12:16:40 - ERROR - stderr - +2025-02-05 12:16:40 - INFO - stdout - {'loss': 1.061, 'grad_norm': 1.24689519405365, 'learning_rate': 1.9407633400598107e-05, 'epoch': 0.41} +2025-02-05 12:16:40 - ERROR - stderr - 14%|█▎ | 3070/22434 [2:09:00<13:34:01, 2.52s/it] +2025-02-05 12:16:42 - ERROR - stderr - 14%|█▎ | 3071/22434 [2:09:02<13:29:01, 2.51s/it] +2025-02-05 12:16:42 - ERROR - stderr - +2025-02-05 12:16:42 - ERROR - stderr - +2025-02-05 12:16:42 - INFO - stdout - {'loss': 1.0071, 'grad_norm': 1.0940386056900024, 'learning_rate': 1.9407143780171656e-05, 'epoch': 0.41} +2025-02-05 12:16:42 - ERROR - stderr - 14%|█▎ | 3071/22434 [2:09:02<13:29:01, 2.51s/it] +2025-02-05 12:16:45 - ERROR - stderr - 14%|█▎ | 3072/22434 [2:09:05<13:26:56, 2.50s/it] +2025-02-05 12:16:45 - ERROR - stderr - +2025-02-05 12:16:45 - ERROR - stderr - +2025-02-05 
12:16:45 - INFO - stdout - {'loss': 0.897, 'grad_norm': 1.0001624822616577, 'learning_rate': 1.9406653963662293e-05, 'epoch': 0.41} +2025-02-05 12:16:45 - ERROR - stderr - 14%|█▎ | 3072/22434 [2:09:05<13:26:56, 2.50s/it] +2025-02-05 12:16:47 - ERROR - stderr - 14%|█▎ | 3073/22434 [2:09:07<13:29:04, 2.51s/it] +2025-02-05 12:16:47 - ERROR - stderr - +2025-02-05 12:16:47 - ERROR - stderr - +2025-02-05 12:16:47 - INFO - stdout - {'loss': 1.0116, 'grad_norm': 1.158817172050476, 'learning_rate': 1.9406163951080228e-05, 'epoch': 0.41} +2025-02-05 12:16:47 - ERROR - stderr - 14%|█▎ | 3073/22434 [2:09:07<13:29:04, 2.51s/it] +2025-02-05 12:16:50 - ERROR - stderr - 14%|█▎ | 3074/22434 [2:09:10<13:34:51, 2.53s/it] +2025-02-05 12:16:50 - ERROR - stderr - +2025-02-05 12:16:50 - ERROR - stderr - +2025-02-05 12:16:50 - INFO - stdout - {'loss': 0.976, 'grad_norm': 1.0831011533737183, 'learning_rate': 1.9405673742435677e-05, 'epoch': 0.41} +2025-02-05 12:16:50 - ERROR - stderr - 14%|█▎ | 3074/22434 [2:09:10<13:34:51, 2.53s/it] +2025-02-05 12:16:52 - ERROR - stderr - 14%|█▎ | 3075/22434 [2:09:12<13:29:05, 2.51s/it] +2025-02-05 12:16:52 - ERROR - stderr - +2025-02-05 12:16:52 - ERROR - stderr - +2025-02-05 12:16:52 - INFO - stdout - {'loss': 1.0039, 'grad_norm': 1.1321932077407837, 'learning_rate': 1.940518333773886e-05, 'epoch': 0.41} +2025-02-05 12:16:52 - ERROR - stderr - 14%|█▎ | 3075/22434 [2:09:12<13:29:05, 2.51s/it] +2025-02-05 12:16:55 - ERROR - stderr - 14%|█▎ | 3076/22434 [2:09:15<13:26:32, 2.50s/it] +2025-02-05 12:16:55 - ERROR - stderr - +2025-02-05 12:16:55 - ERROR - stderr - +2025-02-05 12:16:55 - INFO - stdout - {'loss': 1.0027, 'grad_norm': 1.2162421941757202, 'learning_rate': 1.940469273699999e-05, 'epoch': 0.41} +2025-02-05 12:16:55 - ERROR - stderr - 14%|█▎ | 3076/22434 [2:09:15<13:26:32, 2.50s/it] +2025-02-05 12:16:57 - ERROR - stderr - 14%|█▎ | 3077/22434 [2:09:17<13:29:08, 2.51s/it] +2025-02-05 12:16:57 - ERROR - stderr - +2025-02-05 12:16:57 - ERROR - stderr - 
+2025-02-05 12:16:57 - INFO - stdout - {'loss': 1.044, 'grad_norm': 1.2145994901657104, 'learning_rate': 1.9404201940229305e-05, 'epoch': 0.41} +2025-02-05 12:16:57 - ERROR - stderr - 14%|█▎ | 3077/22434 [2:09:17<13:29:08, 2.51s/it] +2025-02-05 12:17:00 - ERROR - stderr - 14%|█▎ | 3078/22434 [2:09:20<13:32:59, 2.52s/it] +2025-02-05 12:17:00 - ERROR - stderr - +2025-02-05 12:17:00 - ERROR - stderr - +2025-02-05 12:17:00 - INFO - stdout - {'loss': 1.1144, 'grad_norm': 1.2762770652770996, 'learning_rate': 1.9403710947437027e-05, 'epoch': 0.41} +2025-02-05 12:17:00 - ERROR - stderr - 14%|█▎ | 3078/22434 [2:09:20<13:32:59, 2.52s/it] +2025-02-05 12:17:02 - ERROR - stderr - 14%|█▎ | 3079/22434 [2:09:22<13:35:03, 2.53s/it] +2025-02-05 12:17:03 - ERROR - stderr - +2025-02-05 12:17:03 - ERROR - stderr - +2025-02-05 12:17:03 - INFO - stdout - {'loss': 0.9767, 'grad_norm': 1.1265672445297241, 'learning_rate': 1.9403219758633397e-05, 'epoch': 0.41} +2025-02-05 12:17:03 - ERROR - stderr - 14%|█▎ | 3079/22434 [2:09:22<13:35:03, 2.53s/it] +2025-02-05 12:17:05 - ERROR - stderr - 14%|█▎ | 3080/22434 [2:09:25<13:28:03, 2.51s/it] +2025-02-05 12:17:05 - ERROR - stderr - +2025-02-05 12:17:05 - ERROR - stderr - +2025-02-05 12:17:05 - INFO - stdout - {'loss': 1.0313, 'grad_norm': 1.2573039531707764, 'learning_rate': 1.9402728373828643e-05, 'epoch': 0.41} +2025-02-05 12:17:05 - ERROR - stderr - 14%|█▎ | 3080/22434 [2:09:25<13:28:03, 2.51s/it] +2025-02-05 12:17:07 - ERROR - stderr - 14%|█▎ | 3081/22434 [2:09:27<13:32:39, 2.52s/it] +2025-02-05 12:17:08 - ERROR - stderr - +2025-02-05 12:17:08 - ERROR - stderr - +2025-02-05 12:17:08 - INFO - stdout - {'loss': 1.0282, 'grad_norm': 1.22186279296875, 'learning_rate': 1.9402236793033015e-05, 'epoch': 0.41} +2025-02-05 12:17:08 - ERROR - stderr - 14%|█▎ | 3081/22434 [2:09:27<13:32:39, 2.52s/it] +2025-02-05 12:17:10 - ERROR - stderr - 14%|█▎ | 3082/22434 [2:09:30<13:24:25, 2.49s/it] +2025-02-05 12:17:10 - ERROR - stderr - +2025-02-05 12:17:10 - 
ERROR - stderr - +2025-02-05 12:17:10 - INFO - stdout - {'loss': 1.0852, 'grad_norm': 1.14968740940094, 'learning_rate': 1.940174501625676e-05, 'epoch': 0.41} +2025-02-05 12:17:10 - ERROR - stderr - 14%|█▎ | 3082/22434 [2:09:30<13:24:25, 2.49s/it] +2025-02-05 12:17:13 - ERROR - stderr - 14%|█▎ | 3083/22434 [2:09:33<13:55:51, 2.59s/it] +2025-02-05 12:17:13 - ERROR - stderr - +2025-02-05 12:17:13 - ERROR - stderr - +2025-02-05 12:17:13 - INFO - stdout - {'loss': 1.0022, 'grad_norm': 1.2463802099227905, 'learning_rate': 1.9401253043510126e-05, 'epoch': 0.41} +2025-02-05 12:17:13 - ERROR - stderr - 14%|█▎ | 3083/22434 [2:09:33<13:55:51, 2.59s/it] +2025-02-05 12:17:16 - ERROR - stderr - 14%|█▎ | 3084/22434 [2:09:35<14:13:47, 2.65s/it] +2025-02-05 12:17:16 - ERROR - stderr - +2025-02-05 12:17:16 - ERROR - stderr - +2025-02-05 12:17:16 - INFO - stdout - {'loss': 0.9332, 'grad_norm': 0.998227596282959, 'learning_rate': 1.9400760874803366e-05, 'epoch': 0.41} +2025-02-05 12:17:16 - ERROR - stderr - 14%|█▎ | 3084/22434 [2:09:35<14:13:47, 2.65s/it] +2025-02-05 12:17:18 - ERROR - stderr - 14%|█▍ | 3085/22434 [2:09:38<14:08:08, 2.63s/it] +2025-02-05 12:17:18 - ERROR - stderr - +2025-02-05 12:17:18 - ERROR - stderr - +2025-02-05 12:17:18 - INFO - stdout - {'loss': 0.9516, 'grad_norm': 1.1748766899108887, 'learning_rate': 1.940026851014674e-05, 'epoch': 0.41} +2025-02-05 12:17:18 - ERROR - stderr - 14%|█▍ | 3085/22434 [2:09:38<14:08:08, 2.63s/it] +2025-02-05 12:17:21 - ERROR - stderr - 14%|█▍ | 3086/22434 [2:09:40<13:57:19, 2.60s/it] +2025-02-05 12:17:21 - ERROR - stderr - +2025-02-05 12:17:21 - ERROR - stderr - +2025-02-05 12:17:21 - INFO - stdout - {'loss': 0.9656, 'grad_norm': 1.0752381086349487, 'learning_rate': 1.9399775949550516e-05, 'epoch': 0.41} +2025-02-05 12:17:21 - ERROR - stderr - 14%|█▍ | 3086/22434 [2:09:40<13:57:19, 2.60s/it] +2025-02-05 12:17:23 - ERROR - stderr - 14%|█▍ | 3087/22434 [2:09:43<13:43:14, 2.55s/it] +2025-02-05 12:17:23 - ERROR - stderr - +2025-02-05 
12:17:23 - ERROR - stderr - +2025-02-05 12:17:23 - INFO - stdout - {'loss': 1.1041, 'grad_norm': 1.2725883722305298, 'learning_rate': 1.9399283193024957e-05, 'epoch': 0.41} +2025-02-05 12:17:23 - ERROR - stderr - 14%|█▍ | 3087/22434 [2:09:43<13:43:14, 2.55s/it] +2025-02-05 12:17:26 - ERROR - stderr - 14%|█▍ | 3088/22434 [2:09:45<13:39:55, 2.54s/it] +2025-02-05 12:17:26 - ERROR - stderr - +2025-02-05 12:17:26 - ERROR - stderr - +2025-02-05 12:17:26 - INFO - stdout - {'loss': 0.9853, 'grad_norm': 1.0904481410980225, 'learning_rate': 1.9398790240580333e-05, 'epoch': 0.41} +2025-02-05 12:17:26 - ERROR - stderr - 14%|█▍ | 3088/22434 [2:09:45<13:39:55, 2.54s/it] +2025-02-05 12:17:28 - ERROR - stderr - 14%|█▍ | 3089/22434 [2:09:48<13:32:49, 2.52s/it] +2025-02-05 12:17:28 - ERROR - stderr - +2025-02-05 12:17:28 - ERROR - stderr - +2025-02-05 12:17:28 - INFO - stdout - {'loss': 0.977, 'grad_norm': 0.9968140125274658, 'learning_rate': 1.9398297092226918e-05, 'epoch': 0.41} +2025-02-05 12:17:28 - ERROR - stderr - 14%|█▍ | 3089/22434 [2:09:48<13:32:49, 2.52s/it] +2025-02-05 12:17:31 - ERROR - stderr - 14%|█▍ | 3090/22434 [2:09:50<13:36:43, 2.53s/it] +2025-02-05 12:17:31 - ERROR - stderr - +2025-02-05 12:17:31 - ERROR - stderr - +2025-02-05 12:17:31 - INFO - stdout - {'loss': 1.0732, 'grad_norm': 1.0959019660949707, 'learning_rate': 1.9397803747974996e-05, 'epoch': 0.41} +2025-02-05 12:17:31 - ERROR - stderr - 14%|█▍ | 3090/22434 [2:09:50<13:36:43, 2.53s/it] +2025-02-05 12:17:33 - ERROR - stderr - 14%|█▍ | 3091/22434 [2:09:53<13:36:32, 2.53s/it] +2025-02-05 12:17:33 - ERROR - stderr - +2025-02-05 12:17:33 - ERROR - stderr - +2025-02-05 12:17:33 - INFO - stdout - {'loss': 1.0574, 'grad_norm': 1.2900850772857666, 'learning_rate': 1.9397310207834847e-05, 'epoch': 0.41} +2025-02-05 12:17:33 - ERROR - stderr - 14%|█▍ | 3091/22434 [2:09:53<13:36:32, 2.53s/it] +2025-02-05 12:17:36 - ERROR - stderr - 14%|█▍ | 3092/22434 [2:09:55<13:35:34, 2.53s/it] +2025-02-05 12:17:36 - ERROR - stderr 
- +2025-02-05 12:17:36 - ERROR - stderr - +2025-02-05 12:17:36 - INFO - stdout - {'loss': 0.9684, 'grad_norm': 1.2283388376235962, 'learning_rate': 1.9396816471816756e-05, 'epoch': 0.41} +2025-02-05 12:17:36 - ERROR - stderr - 14%|█▍ | 3092/22434 [2:09:56<13:35:34, 2.53s/it] +2025-02-05 12:17:38 - ERROR - stderr - 14%|█▍ | 3093/22434 [2:09:58<13:25:29, 2.50s/it] +2025-02-05 12:17:38 - ERROR - stderr - +2025-02-05 12:17:38 - ERROR - stderr - +2025-02-05 12:17:38 - INFO - stdout - {'loss': 0.9693, 'grad_norm': 1.1191096305847168, 'learning_rate': 1.9396322539931025e-05, 'epoch': 0.41} +2025-02-05 12:17:38 - ERROR - stderr - 14%|█▍ | 3093/22434 [2:09:58<13:25:29, 2.50s/it] +2025-02-05 12:17:41 - ERROR - stderr - 14%|█▍ | 3094/22434 [2:10:00<13:29:01, 2.51s/it] +2025-02-05 12:17:41 - ERROR - stderr - +2025-02-05 12:17:41 - ERROR - stderr - +2025-02-05 12:17:41 - INFO - stdout - {'loss': 0.9633, 'grad_norm': 1.1841137409210205, 'learning_rate': 1.9395828412187935e-05, 'epoch': 0.41} +2025-02-05 12:17:41 - ERROR - stderr - 14%|█▍ | 3094/22434 [2:10:00<13:29:01, 2.51s/it] +2025-02-05 12:17:43 - ERROR - stderr - 14%|█▍ | 3095/22434 [2:10:03<13:26:02, 2.50s/it] +2025-02-05 12:17:43 - ERROR - stderr - +2025-02-05 12:17:43 - ERROR - stderr - +2025-02-05 12:17:43 - INFO - stdout - {'loss': 1.0126, 'grad_norm': 1.0935860872268677, 'learning_rate': 1.9395334088597793e-05, 'epoch': 0.41} +2025-02-05 12:17:43 - ERROR - stderr - 14%|█▍ | 3095/22434 [2:10:03<13:26:02, 2.50s/it] +2025-02-05 12:17:46 - ERROR - stderr - 14%|█▍ | 3096/22434 [2:10:05<13:19:01, 2.48s/it] +2025-02-05 12:17:46 - ERROR - stderr - +2025-02-05 12:17:46 - ERROR - stderr - +2025-02-05 12:17:46 - INFO - stdout - {'loss': 1.1544, 'grad_norm': 1.212520718574524, 'learning_rate': 1.9394839569170907e-05, 'epoch': 0.41} +2025-02-05 12:17:46 - ERROR - stderr - 14%|█▍ | 3096/22434 [2:10:05<13:19:01, 2.48s/it] +2025-02-05 12:17:48 - ERROR - stderr - 14%|█▍ | 3097/22434 [2:10:08<13:22:27, 2.49s/it] +2025-02-05 12:17:48 - 
ERROR - stderr - +2025-02-05 12:17:48 - ERROR - stderr - +2025-02-05 12:17:48 - INFO - stdout - {'loss': 1.0052, 'grad_norm': 1.0677695274353027, 'learning_rate': 1.9394344853917575e-05, 'epoch': 0.41} +2025-02-05 12:17:48 - ERROR - stderr - 14%|█▍ | 3097/22434 [2:10:08<13:22:27, 2.49s/it] +2025-02-05 12:17:51 - ERROR - stderr - 14%|█▍ | 3098/22434 [2:10:10<13:16:39, 2.47s/it] +2025-02-05 12:17:51 - ERROR - stderr - +2025-02-05 12:17:51 - ERROR - stderr - +2025-02-05 12:17:51 - INFO - stdout - {'loss': 1.0152, 'grad_norm': 1.2430074214935303, 'learning_rate': 1.9393849942848116e-05, 'epoch': 0.41} +2025-02-05 12:17:51 - ERROR - stderr - 14%|█▍ | 3098/22434 [2:10:10<13:16:39, 2.47s/it] +2025-02-05 12:17:53 - ERROR - stderr - 14%|█▍ | 3099/22434 [2:10:13<13:16:36, 2.47s/it] +2025-02-05 12:17:53 - ERROR - stderr - +2025-02-05 12:17:53 - ERROR - stderr - +2025-02-05 12:17:53 - INFO - stdout - {'loss': 1.0846, 'grad_norm': 1.2330094575881958, 'learning_rate': 1.9393354835972846e-05, 'epoch': 0.41} +2025-02-05 12:17:53 - ERROR - stderr - 14%|█▍ | 3099/22434 [2:10:13<13:16:36, 2.47s/it] +2025-02-05 12:17:55 - ERROR - stderr - 14%|█▍ | 3100/22434 [2:10:15<13:16:24, 2.47s/it] +2025-02-05 12:17:55 - ERROR - stderr - +2025-02-05 12:17:55 - ERROR - stderr - +2025-02-05 12:17:55 - INFO - stdout - {'loss': 1.0088, 'grad_norm': 1.1950565576553345, 'learning_rate': 1.9392859533302077e-05, 'epoch': 0.41} +2025-02-05 12:17:55 - ERROR - stderr - 14%|█▍ | 3100/22434 [2:10:15<13:16:24, 2.47s/it] +2025-02-05 12:17:58 - ERROR - stderr - 14%|█▍ | 3101/22434 [2:10:18<13:11:00, 2.45s/it] +2025-02-05 12:17:58 - ERROR - stderr - +2025-02-05 12:17:58 - ERROR - stderr - +2025-02-05 12:17:58 - INFO - stdout - {'loss': 0.9647, 'grad_norm': 1.3253813982009888, 'learning_rate': 1.9392364034846145e-05, 'epoch': 0.41} +2025-02-05 12:17:58 - ERROR - stderr - 14%|█▍ | 3101/22434 [2:10:18<13:11:00, 2.45s/it] +2025-02-05 12:18:00 - ERROR - stderr - 14%|█▍ | 3102/22434 [2:10:20<13:13:28, 2.46s/it] 
+2025-02-05 12:18:00 - ERROR - stderr - +2025-02-05 12:18:00 - ERROR - stderr - +2025-02-05 12:18:00 - INFO - stdout - {'loss': 0.9103, 'grad_norm': 1.0363998413085938, 'learning_rate': 1.9391868340615366e-05, 'epoch': 0.41} +2025-02-05 12:18:00 - ERROR - stderr - 14%|█▍ | 3102/22434 [2:10:20<13:13:28, 2.46s/it] +2025-02-05 12:18:03 - ERROR - stderr - 14%|█▍ | 3103/22434 [2:10:23<13:14:37, 2.47s/it] +2025-02-05 12:18:03 - ERROR - stderr - +2025-02-05 12:18:03 - ERROR - stderr - +2025-02-05 12:18:03 - INFO - stdout - {'loss': 0.8239, 'grad_norm': 1.0540871620178223, 'learning_rate': 1.9391372450620087e-05, 'epoch': 0.41} +2025-02-05 12:18:03 - ERROR - stderr - 14%|█▍ | 3103/22434 [2:10:23<13:14:37, 2.47s/it] +2025-02-05 12:18:05 - ERROR - stderr - 14%|█▍ | 3104/22434 [2:10:25<13:17:08, 2.47s/it] +2025-02-05 12:18:05 - ERROR - stderr - +2025-02-05 12:18:05 - ERROR - stderr - +2025-02-05 12:18:05 - INFO - stdout - {'loss': 1.1811, 'grad_norm': 1.19146728515625, 'learning_rate': 1.939087636487063e-05, 'epoch': 0.42} +2025-02-05 12:18:05 - ERROR - stderr - 14%|█▍ | 3104/22434 [2:10:25<13:17:08, 2.47s/it] +2025-02-05 12:18:08 - ERROR - stderr - 14%|█▍ | 3105/22434 [2:10:28<13:18:31, 2.48s/it] +2025-02-05 12:18:08 - ERROR - stderr - +2025-02-05 12:18:08 - ERROR - stderr - +2025-02-05 12:18:08 - INFO - stdout - {'loss': 1.2203, 'grad_norm': 1.251397967338562, 'learning_rate': 1.939038008337734e-05, 'epoch': 0.42} +2025-02-05 12:18:08 - ERROR - stderr - 14%|█▍ | 3105/22434 [2:10:28<13:18:31, 2.48s/it] +2025-02-05 12:18:10 - ERROR - stderr - 14%|█▍ | 3106/22434 [2:10:30<13:14:21, 2.47s/it] +2025-02-05 12:18:10 - ERROR - stderr - +2025-02-05 12:18:10 - ERROR - stderr - +2025-02-05 12:18:10 - INFO - stdout - {'loss': 0.9466, 'grad_norm': 1.1168112754821777, 'learning_rate': 1.938988360615057e-05, 'epoch': 0.42} +2025-02-05 12:18:10 - ERROR - stderr - 14%|█▍ | 3106/22434 [2:10:30<13:14:21, 2.47s/it] +2025-02-05 12:18:13 - ERROR - stderr - 14%|█▍ | 3107/22434 [2:10:33<13:19:53, 
2.48s/it] +2025-02-05 12:18:13 - ERROR - stderr - +2025-02-05 12:18:13 - ERROR - stderr - +2025-02-05 12:18:13 - INFO - stdout - {'loss': 1.0691, 'grad_norm': 1.0601052045822144, 'learning_rate': 1.9389386933200653e-05, 'epoch': 0.42} +2025-02-05 12:18:13 - ERROR - stderr - 14%|█▍ | 3107/22434 [2:10:33<13:19:53, 2.48s/it] +2025-02-05 12:18:15 - ERROR - stderr - 14%|█▍ | 3108/22434 [2:10:35<13:12:57, 2.46s/it] +2025-02-05 12:18:15 - ERROR - stderr - +2025-02-05 12:18:15 - ERROR - stderr - +2025-02-05 12:18:15 - INFO - stdout - {'loss': 0.9285, 'grad_norm': 1.2435392141342163, 'learning_rate': 1.9388890064537954e-05, 'epoch': 0.42} +2025-02-05 12:18:15 - ERROR - stderr - 14%|█▍ | 3108/22434 [2:10:35<13:12:57, 2.46s/it] +2025-02-05 12:18:18 - ERROR - stderr - 14%|█▍ | 3109/22434 [2:10:37<13:09:59, 2.45s/it] +2025-02-05 12:18:18 - ERROR - stderr - +2025-02-05 12:18:18 - ERROR - stderr - +2025-02-05 12:18:18 - INFO - stdout - {'loss': 0.9492, 'grad_norm': 1.3495535850524902, 'learning_rate': 1.9388393000172825e-05, 'epoch': 0.42} +2025-02-05 12:18:18 - ERROR - stderr - 14%|█▍ | 3109/22434 [2:10:37<13:09:59, 2.45s/it] +2025-02-05 12:18:20 - ERROR - stderr - 14%|█▍ | 3110/22434 [2:10:40<13:09:13, 2.45s/it] +2025-02-05 12:18:20 - ERROR - stderr - +2025-02-05 12:18:20 - ERROR - stderr - +2025-02-05 12:18:20 - INFO - stdout - {'loss': 0.9502, 'grad_norm': 1.1146191358566284, 'learning_rate': 1.9387895740115628e-05, 'epoch': 0.42} +2025-02-05 12:18:20 - ERROR - stderr - 14%|█▍ | 3110/22434 [2:10:40<13:09:13, 2.45s/it] +2025-02-05 12:18:22 - ERROR - stderr - 14%|█▍ | 3111/22434 [2:10:42<13:08:30, 2.45s/it] +2025-02-05 12:18:23 - ERROR - stderr - +2025-02-05 12:18:23 - ERROR - stderr - +2025-02-05 12:18:23 - INFO - stdout - {'loss': 1.0556, 'grad_norm': 1.157058835029602, 'learning_rate': 1.9387398284376727e-05, 'epoch': 0.42} +2025-02-05 12:18:23 - ERROR - stderr - 14%|█▍ | 3111/22434 [2:10:42<13:08:30, 2.45s/it] +2025-02-05 12:18:25 - ERROR - stderr - 14%|█▍ | 3112/22434 
[2:10:45<13:04:41, 2.44s/it] +2025-02-05 12:18:25 - ERROR - stderr - +2025-02-05 12:18:25 - ERROR - stderr - +2025-02-05 12:18:25 - INFO - stdout - {'loss': 1.009, 'grad_norm': 1.2075316905975342, 'learning_rate': 1.9386900632966494e-05, 'epoch': 0.42} +2025-02-05 12:18:25 - ERROR - stderr - 14%|█▍ | 3112/22434 [2:10:45<13:04:41, 2.44s/it] +2025-02-05 12:18:27 - ERROR - stderr - 14%|█▍ | 3113/22434 [2:10:47<13:03:29, 2.43s/it] +2025-02-05 12:18:27 - ERROR - stderr - +2025-02-05 12:18:27 - ERROR - stderr - +2025-02-05 12:18:27 - INFO - stdout - {'loss': 1.0024, 'grad_norm': 1.1105562448501587, 'learning_rate': 1.93864027858953e-05, 'epoch': 0.42} +2025-02-05 12:18:27 - ERROR - stderr - 14%|█▍ | 3113/22434 [2:10:47<13:03:29, 2.43s/it] +2025-02-05 12:18:30 - ERROR - stderr - 14%|█▍ | 3114/22434 [2:10:50<13:08:40, 2.45s/it] +2025-02-05 12:18:30 - ERROR - stderr - +2025-02-05 12:18:30 - ERROR - stderr - +2025-02-05 12:18:30 - INFO - stdout - {'loss': 0.9076, 'grad_norm': 1.0195220708847046, 'learning_rate': 1.938590474317352e-05, 'epoch': 0.42} +2025-02-05 12:18:30 - ERROR - stderr - 14%|█▍ | 3114/22434 [2:10:50<13:08:40, 2.45s/it] +2025-02-05 12:18:32 - ERROR - stderr - 14%|█▍ | 3115/22434 [2:10:52<13:13:12, 2.46s/it] +2025-02-05 12:18:32 - ERROR - stderr - +2025-02-05 12:18:32 - ERROR - stderr - +2025-02-05 12:18:32 - INFO - stdout - {'loss': 0.9827, 'grad_norm': 1.2019658088684082, 'learning_rate': 1.9385406504811534e-05, 'epoch': 0.42} +2025-02-05 12:18:32 - ERROR - stderr - 14%|█▍ | 3115/22434 [2:10:52<13:13:12, 2.46s/it] +2025-02-05 12:18:35 - ERROR - stderr - 14%|█▍ | 3116/22434 [2:10:55<13:21:53, 2.49s/it] +2025-02-05 12:18:35 - ERROR - stderr - +2025-02-05 12:18:35 - ERROR - stderr - +2025-02-05 12:18:35 - INFO - stdout - {'loss': 1.0408, 'grad_norm': 1.2836029529571533, 'learning_rate': 1.9384908070819733e-05, 'epoch': 0.42} +2025-02-05 12:18:35 - ERROR - stderr - 14%|█▍ | 3116/22434 [2:10:55<13:21:53, 2.49s/it] +2025-02-05 12:18:37 - ERROR - stderr - 14%|█▍ | 
3117/22434 [2:10:57<13:31:35, 2.52s/it] +2025-02-05 12:18:38 - ERROR - stderr - +2025-02-05 12:18:38 - ERROR - stderr - +2025-02-05 12:18:38 - INFO - stdout - {'loss': 1.0515, 'grad_norm': 1.196293592453003, 'learning_rate': 1.9384409441208503e-05, 'epoch': 0.42} +2025-02-05 12:18:38 - ERROR - stderr - 14%|█▍ | 3117/22434 [2:10:57<13:31:35, 2.52s/it] +2025-02-05 12:18:40 - ERROR - stderr - 14%|█▍ | 3118/22434 [2:11:00<13:27:11, 2.51s/it] +2025-02-05 12:18:40 - ERROR - stderr - +2025-02-05 12:18:40 - ERROR - stderr - +2025-02-05 12:18:40 - INFO - stdout - {'loss': 1.1003, 'grad_norm': 1.4096437692642212, 'learning_rate': 1.9383910615988238e-05, 'epoch': 0.42} +2025-02-05 12:18:40 - ERROR - stderr - 14%|█▍ | 3118/22434 [2:11:00<13:27:11, 2.51s/it] +2025-02-05 12:18:43 - ERROR - stderr - 14%|█▍ | 3119/22434 [2:11:02<13:41:06, 2.55s/it] +2025-02-05 12:18:43 - ERROR - stderr - +2025-02-05 12:18:43 - ERROR - stderr - +2025-02-05 12:18:43 - INFO - stdout - {'loss': 0.9615, 'grad_norm': 1.0785411596298218, 'learning_rate': 1.9383411595169335e-05, 'epoch': 0.42} +2025-02-05 12:18:43 - ERROR - stderr - 14%|█▍ | 3119/22434 [2:11:02<13:41:06, 2.55s/it] +2025-02-05 12:18:45 - ERROR - stderr - 14%|█▍ | 3120/22434 [2:11:05<13:57:52, 2.60s/it] +2025-02-05 12:18:45 - ERROR - stderr - +2025-02-05 12:18:45 - ERROR - stderr - +2025-02-05 12:18:45 - INFO - stdout - {'loss': 1.0025, 'grad_norm': 1.1911101341247559, 'learning_rate': 1.9382912378762197e-05, 'epoch': 0.42} +2025-02-05 12:18:45 - ERROR - stderr - 14%|█▍ | 3120/22434 [2:11:05<13:57:52, 2.60s/it] +2025-02-05 12:18:48 - ERROR - stderr - 14%|█▍ | 3121/22434 [2:11:08<13:41:03, 2.55s/it] +2025-02-05 12:18:48 - ERROR - stderr - +2025-02-05 12:18:48 - ERROR - stderr - +2025-02-05 12:18:48 - INFO - stdout - {'loss': 0.9557, 'grad_norm': 1.0992122888565063, 'learning_rate': 1.938241296677723e-05, 'epoch': 0.42} +2025-02-05 12:18:48 - ERROR - stderr - 14%|█▍ | 3121/22434 [2:11:08<13:41:03, 2.55s/it] +2025-02-05 12:18:50 - ERROR - 
stderr - 14%|█▍ | 3122/22434 [2:11:10<13:35:31, 2.53s/it] +2025-02-05 12:18:50 - ERROR - stderr - +2025-02-05 12:18:50 - ERROR - stderr - +2025-02-05 12:18:50 - INFO - stdout - {'loss': 1.1261, 'grad_norm': 1.1893155574798584, 'learning_rate': 1.9381913359224844e-05, 'epoch': 0.42} +2025-02-05 12:18:50 - ERROR - stderr - 14%|█▍ | 3122/22434 [2:11:10<13:35:31, 2.53s/it] +2025-02-05 12:18:53 - ERROR - stderr - 14%|█▍ | 3123/22434 [2:11:12<13:26:02, 2.50s/it] +2025-02-05 12:18:53 - ERROR - stderr - +2025-02-05 12:18:53 - ERROR - stderr - +2025-02-05 12:18:53 - INFO - stdout - {'loss': 0.919, 'grad_norm': 1.04340398311615, 'learning_rate': 1.9381413556115446e-05, 'epoch': 0.42} +2025-02-05 12:18:53 - ERROR - stderr - 14%|█▍ | 3123/22434 [2:11:12<13:26:02, 2.50s/it] +2025-02-05 12:18:55 - ERROR - stderr - 14%|█▍ | 3124/22434 [2:11:15<13:25:29, 2.50s/it] +2025-02-05 12:18:55 - ERROR - stderr - +2025-02-05 12:18:55 - ERROR - stderr - +2025-02-05 12:18:55 - INFO - stdout - {'loss': 1.0603, 'grad_norm': 1.1195671558380127, 'learning_rate': 1.9380913557459466e-05, 'epoch': 0.42} +2025-02-05 12:18:55 - ERROR - stderr - 14%|█▍ | 3124/22434 [2:11:15<13:25:29, 2.50s/it] +2025-02-05 12:18:58 - ERROR - stderr - 14%|█▍ | 3125/22434 [2:11:17<13:26:48, 2.51s/it] +2025-02-05 12:18:58 - ERROR - stderr - +2025-02-05 12:18:58 - ERROR - stderr - +2025-02-05 12:18:58 - INFO - stdout - {'loss': 0.9804, 'grad_norm': 1.0873807668685913, 'learning_rate': 1.9380413363267315e-05, 'epoch': 0.42} +2025-02-05 12:18:58 - ERROR - stderr - 14%|█▍ | 3125/22434 [2:11:17<13:26:48, 2.51s/it] +2025-02-05 12:19:00 - ERROR - stderr - 14%|█▍ | 3126/22434 [2:11:20<13:24:39, 2.50s/it] +2025-02-05 12:19:00 - ERROR - stderr - +2025-02-05 12:19:00 - ERROR - stderr - +2025-02-05 12:19:00 - INFO - stdout - {'loss': 0.9814, 'grad_norm': 1.2355629205703735, 'learning_rate': 1.9379912973549427e-05, 'epoch': 0.42} +2025-02-05 12:19:00 - ERROR - stderr - 14%|█▍ | 3126/22434 [2:11:20<13:24:39, 2.50s/it] +2025-02-05 
12:19:03 - ERROR - stderr - 14%|█▍ | 3127/22434 [2:11:22<13:18:40, 2.48s/it] +2025-02-05 12:19:03 - ERROR - stderr - +2025-02-05 12:19:03 - ERROR - stderr - +2025-02-05 12:19:03 - INFO - stdout - {'loss': 0.9747, 'grad_norm': 1.0799921751022339, 'learning_rate': 1.9379412388316226e-05, 'epoch': 0.42} +2025-02-05 12:19:03 - ERROR - stderr - 14%|█▍ | 3127/22434 [2:11:22<13:18:40, 2.48s/it] +2025-02-05 12:19:05 - ERROR - stderr - 14%|█▍ | 3128/22434 [2:11:25<13:14:26, 2.47s/it] +2025-02-05 12:19:05 - ERROR - stderr - +2025-02-05 12:19:05 - ERROR - stderr - +2025-02-05 12:19:05 - INFO - stdout - {'loss': 0.963, 'grad_norm': 1.3345062732696533, 'learning_rate': 1.9378911607578148e-05, 'epoch': 0.42} +2025-02-05 12:19:05 - ERROR - stderr - 14%|█▍ | 3128/22434 [2:11:25<13:14:26, 2.47s/it] +2025-02-05 12:19:08 - ERROR - stderr - 14%|█▍ | 3129/22434 [2:11:27<13:17:52, 2.48s/it] +2025-02-05 12:19:08 - ERROR - stderr - +2025-02-05 12:19:08 - ERROR - stderr - +2025-02-05 12:19:08 - INFO - stdout - {'loss': 0.9439, 'grad_norm': 1.0292000770568848, 'learning_rate': 1.9378410631345634e-05, 'epoch': 0.42} +2025-02-05 12:19:08 - ERROR - stderr - 14%|█▍ | 3129/22434 [2:11:27<13:17:52, 2.48s/it] +2025-02-05 12:19:10 - ERROR - stderr - 14%|█▍ | 3130/22434 [2:11:30<13:14:59, 2.47s/it] +2025-02-05 12:19:10 - ERROR - stderr - +2025-02-05 12:19:10 - ERROR - stderr - +2025-02-05 12:19:10 - INFO - stdout - {'loss': 0.9696, 'grad_norm': 1.104035496711731, 'learning_rate': 1.9377909459629125e-05, 'epoch': 0.42} +2025-02-05 12:19:10 - ERROR - stderr - 14%|█▍ | 3130/22434 [2:11:30<13:14:59, 2.47s/it] +2025-02-05 12:19:13 - ERROR - stderr - 14%|█▍ | 3131/22434 [2:11:32<13:23:01, 2.50s/it] +2025-02-05 12:19:13 - ERROR - stderr - +2025-02-05 12:19:13 - ERROR - stderr - +2025-02-05 12:19:13 - INFO - stdout - {'loss': 1.0003, 'grad_norm': 1.1421610116958618, 'learning_rate': 1.9377408092439064e-05, 'epoch': 0.42} +2025-02-05 12:19:13 - ERROR - stderr - 14%|█▍ | 3131/22434 [2:11:32<13:23:01, 
2.50s/it] +2025-02-05 12:19:15 - ERROR - stderr - 14%|█▍ | 3132/22434 [2:11:35<13:22:06, 2.49s/it] +2025-02-05 12:19:15 - ERROR - stderr - +2025-02-05 12:19:15 - ERROR - stderr - +2025-02-05 12:19:15 - INFO - stdout - {'loss': 0.9891, 'grad_norm': 1.242640733718872, 'learning_rate': 1.937690652978591e-05, 'epoch': 0.42} +2025-02-05 12:19:15 - ERROR - stderr - 14%|█▍ | 3132/22434 [2:11:35<13:22:06, 2.49s/it] +2025-02-05 12:19:18 - ERROR - stderr - 14%|█▍ | 3133/22434 [2:11:38<13:58:56, 2.61s/it] +2025-02-05 12:19:18 - ERROR - stderr - +2025-02-05 12:19:18 - ERROR - stderr - +2025-02-05 12:19:18 - INFO - stdout - {'loss': 1.0013, 'grad_norm': 1.0270787477493286, 'learning_rate': 1.9376404771680107e-05, 'epoch': 0.42} +2025-02-05 12:19:18 - ERROR - stderr - 14%|█▍ | 3133/22434 [2:11:38<13:58:56, 2.61s/it] +2025-02-05 12:19:20 - ERROR - stderr - 14%|█▍ | 3134/22434 [2:11:40<13:55:19, 2.60s/it] +2025-02-05 12:19:21 - ERROR - stderr - +2025-02-05 12:19:21 - ERROR - stderr - +2025-02-05 12:19:21 - INFO - stdout - {'loss': 1.0552, 'grad_norm': 1.0702391862869263, 'learning_rate': 1.9375902818132123e-05, 'epoch': 0.42} +2025-02-05 12:19:21 - ERROR - stderr - 14%|█▍ | 3134/22434 [2:11:40<13:55:19, 2.60s/it] +2025-02-05 12:19:23 - ERROR - stderr - 14%|█▍ | 3135/22434 [2:11:43<13:46:37, 2.57s/it] +2025-02-05 12:19:23 - ERROR - stderr - +2025-02-05 12:19:23 - ERROR - stderr - +2025-02-05 12:19:23 - INFO - stdout - {'loss': 0.933, 'grad_norm': 1.0596153736114502, 'learning_rate': 1.9375400669152414e-05, 'epoch': 0.42} +2025-02-05 12:19:23 - ERROR - stderr - 14%|█▍ | 3135/22434 [2:11:43<13:46:37, 2.57s/it] +2025-02-05 12:19:26 - ERROR - stderr - 14%|█▍ | 3136/22434 [2:11:45<13:42:17, 2.56s/it] +2025-02-05 12:19:26 - ERROR - stderr - +2025-02-05 12:19:26 - ERROR - stderr - +2025-02-05 12:19:26 - INFO - stdout - {'loss': 1.0294, 'grad_norm': 1.139905333518982, 'learning_rate': 1.9374898324751447e-05, 'epoch': 0.42} +2025-02-05 12:19:26 - ERROR - stderr - 14%|█▍ | 3136/22434 
[2:11:45<13:42:17, 2.56s/it] +2025-02-05 12:19:28 - ERROR - stderr - 14%|█▍ | 3137/22434 [2:11:48<13:35:07, 2.53s/it] +2025-02-05 12:19:28 - ERROR - stderr - +2025-02-05 12:19:28 - ERROR - stderr - +2025-02-05 12:19:28 - INFO - stdout - {'loss': 1.1142, 'grad_norm': 1.2888685464859009, 'learning_rate': 1.9374395784939698e-05, 'epoch': 0.42} +2025-02-05 12:19:28 - ERROR - stderr - 14%|█▍ | 3137/22434 [2:11:48<13:35:07, 2.53s/it] +2025-02-05 12:19:30 - ERROR - stderr - 14%|█▍ | 3138/22434 [2:11:50<13:31:04, 2.52s/it] +2025-02-05 12:19:31 - ERROR - stderr - +2025-02-05 12:19:31 - ERROR - stderr - +2025-02-05 12:19:31 - INFO - stdout - {'loss': 1.0417, 'grad_norm': 1.2379025220870972, 'learning_rate': 1.9373893049727643e-05, 'epoch': 0.42} +2025-02-05 12:19:31 - ERROR - stderr - 14%|█▍ | 3138/22434 [2:11:50<13:31:04, 2.52s/it] +2025-02-05 12:19:33 - ERROR - stderr - 14%|█▍ | 3139/22434 [2:11:53<14:02:56, 2.62s/it] +2025-02-05 12:19:33 - ERROR - stderr - +2025-02-05 12:19:33 - ERROR - stderr - +2025-02-05 12:19:33 - INFO - stdout - {'loss': 1.0129, 'grad_norm': 1.2016184329986572, 'learning_rate': 1.937339011912575e-05, 'epoch': 0.42} +2025-02-05 12:19:33 - ERROR - stderr - 14%|█▍ | 3139/22434 [2:11:53<14:02:56, 2.62s/it] +2025-02-05 12:19:36 - ERROR - stderr - 14%|█▍ | 3140/22434 [2:11:56<13:47:33, 2.57s/it] +2025-02-05 12:19:36 - ERROR - stderr - +2025-02-05 12:19:36 - ERROR - stderr - +2025-02-05 12:19:36 - INFO - stdout - {'loss': 1.0602, 'grad_norm': 1.1542670726776123, 'learning_rate': 1.937288699314451e-05, 'epoch': 0.42} +2025-02-05 12:19:36 - ERROR - stderr - 14%|█▍ | 3140/22434 [2:11:56<13:47:33, 2.57s/it] +2025-02-05 12:19:38 - ERROR - stderr - 14%|█▍ | 3141/22434 [2:11:58<13:41:21, 2.55s/it] +2025-02-05 12:19:38 - ERROR - stderr - +2025-02-05 12:19:38 - ERROR - stderr - +2025-02-05 12:19:38 - INFO - stdout - {'loss': 1.0738, 'grad_norm': 1.1953070163726807, 'learning_rate': 1.9372383671794415e-05, 'epoch': 0.42} +2025-02-05 12:19:38 - ERROR - stderr - 14%|█▍ 
| 3141/22434 [2:11:58<13:41:21, 2.55s/it] +2025-02-05 12:19:41 - ERROR - stderr - 14%|█▍ | 3142/22434 [2:12:01<13:34:58, 2.53s/it] +2025-02-05 12:19:41 - ERROR - stderr - +2025-02-05 12:19:41 - ERROR - stderr - +2025-02-05 12:19:41 - INFO - stdout - {'loss': 0.9126, 'grad_norm': 1.0075047016143799, 'learning_rate': 1.9371880155085948e-05, 'epoch': 0.42} +2025-02-05 12:19:41 - ERROR - stderr - 14%|█▍ | 3142/22434 [2:12:01<13:34:58, 2.53s/it] +2025-02-05 12:19:43 - ERROR - stderr - 14%|█▍ | 3143/22434 [2:12:03<13:38:28, 2.55s/it] +2025-02-05 12:19:43 - ERROR - stderr - +2025-02-05 12:19:43 - ERROR - stderr - +2025-02-05 12:19:43 - INFO - stdout - {'loss': 1.0463, 'grad_norm': 1.151087999343872, 'learning_rate': 1.937137644302961e-05, 'epoch': 0.42} +2025-02-05 12:19:43 - ERROR - stderr - 14%|█▍ | 3143/22434 [2:12:03<13:38:28, 2.55s/it] +2025-02-05 12:19:46 - ERROR - stderr - 14%|█▍ | 3144/22434 [2:12:06<13:34:04, 2.53s/it] +2025-02-05 12:19:46 - ERROR - stderr - +2025-02-05 12:19:46 - ERROR - stderr - +2025-02-05 12:19:46 - INFO - stdout - {'loss': 0.8486, 'grad_norm': 1.0070656538009644, 'learning_rate': 1.937087253563589e-05, 'epoch': 0.42} +2025-02-05 12:19:46 - ERROR - stderr - 14%|█▍ | 3144/22434 [2:12:06<13:34:04, 2.53s/it] +2025-02-05 12:19:48 - ERROR - stderr - 14%|█▍ | 3145/22434 [2:12:08<13:24:09, 2.50s/it] +2025-02-05 12:19:48 - ERROR - stderr - +2025-02-05 12:19:48 - ERROR - stderr - +2025-02-05 12:19:48 - INFO - stdout - {'loss': 0.9569, 'grad_norm': 1.0248143672943115, 'learning_rate': 1.9370368432915306e-05, 'epoch': 0.42} +2025-02-05 12:19:48 - ERROR - stderr - 14%|█▍ | 3145/22434 [2:12:08<13:24:09, 2.50s/it] +2025-02-05 12:19:51 - ERROR - stderr - 14%|█▍ | 3146/22434 [2:12:11<13:19:04, 2.49s/it] +2025-02-05 12:19:51 - ERROR - stderr - +2025-02-05 12:19:51 - ERROR - stderr - +2025-02-05 12:19:51 - INFO - stdout - {'loss': 0.9551, 'grad_norm': 1.0660024881362915, 'learning_rate': 1.9369864134878352e-05, 'epoch': 0.42} +2025-02-05 12:19:51 - ERROR - 
stderr - 14%|█▍ | 3146/22434 [2:12:11<13:19:04, 2.49s/it] +2025-02-05 12:19:53 - ERROR - stderr - 14%|█▍ | 3147/22434 [2:12:13<13:18:21, 2.48s/it] +2025-02-05 12:19:53 - ERROR - stderr - +2025-02-05 12:19:53 - ERROR - stderr - +2025-02-05 12:19:53 - INFO - stdout - {'loss': 1.1584, 'grad_norm': 1.2577528953552246, 'learning_rate': 1.9369359641535554e-05, 'epoch': 0.42} +2025-02-05 12:19:53 - ERROR - stderr - 14%|█▍ | 3147/22434 [2:12:13<13:18:21, 2.48s/it] +2025-02-05 12:19:56 - ERROR - stderr - 14%|█▍ | 3148/22434 [2:12:15<13:13:00, 2.47s/it] +2025-02-05 12:19:56 - ERROR - stderr - +2025-02-05 12:19:56 - ERROR - stderr - +2025-02-05 12:19:56 - INFO - stdout - {'loss': 0.9724, 'grad_norm': 1.084814190864563, 'learning_rate': 1.9368854952897416e-05, 'epoch': 0.42} +2025-02-05 12:19:56 - ERROR - stderr - 14%|█▍ | 3148/22434 [2:12:15<13:13:00, 2.47s/it] +2025-02-05 12:19:58 - ERROR - stderr - 14%|█▍ | 3149/22434 [2:12:18<13:13:46, 2.47s/it] +2025-02-05 12:19:58 - ERROR - stderr - +2025-02-05 12:19:58 - ERROR - stderr - +2025-02-05 12:19:58 - INFO - stdout - {'loss': 1.0177, 'grad_norm': 1.2028355598449707, 'learning_rate': 1.936835006897446e-05, 'epoch': 0.42} +2025-02-05 12:19:58 - ERROR - stderr - 14%|█▍ | 3149/22434 [2:12:18<13:13:46, 2.47s/it] +2025-02-05 12:20:01 - ERROR - stderr - 14%|█▍ | 3150/22434 [2:12:20<13:22:09, 2.50s/it] +2025-02-05 12:20:01 - ERROR - stderr - +2025-02-05 12:20:01 - ERROR - stderr - +2025-02-05 12:20:01 - INFO - stdout - {'loss': 1.0347, 'grad_norm': 1.0317140817642212, 'learning_rate': 1.936784498977721e-05, 'epoch': 0.42} +2025-02-05 12:20:01 - ERROR - stderr - 14%|█▍ | 3150/22434 [2:12:21<13:22:09, 2.50s/it] +2025-02-05 12:20:03 - ERROR - stderr - 14%|█▍ | 3151/22434 [2:12:23<13:17:21, 2.48s/it] +2025-02-05 12:20:03 - ERROR - stderr - +2025-02-05 12:20:03 - ERROR - stderr - +2025-02-05 12:20:03 - INFO - stdout - {'loss': 0.8617, 'grad_norm': 1.0233596563339233, 'learning_rate': 1.93673397153162e-05, 'epoch': 0.42} +2025-02-05 12:20:03 
- ERROR - stderr - 14%|█▍ | 3151/22434 [2:12:23<13:17:21, 2.48s/it] +2025-02-05 12:20:06 - ERROR - stderr - 14%|█▍ | 3152/22434 [2:12:25<13:19:09, 2.49s/it] +2025-02-05 12:20:06 - ERROR - stderr - +2025-02-05 12:20:06 - ERROR - stderr - +2025-02-05 12:20:06 - INFO - stdout - {'loss': 1.0344, 'grad_norm': 1.1906932592391968, 'learning_rate': 1.9366834245601955e-05, 'epoch': 0.42} +2025-02-05 12:20:06 - ERROR - stderr - 14%|█▍ | 3152/22434 [2:12:25<13:19:09, 2.49s/it] +2025-02-05 12:20:08 - ERROR - stderr - 14%|█▍ | 3153/22434 [2:12:28<13:18:21, 2.48s/it] +2025-02-05 12:20:08 - ERROR - stderr - +2025-02-05 12:20:08 - ERROR - stderr - +2025-02-05 12:20:08 - INFO - stdout - {'loss': 1.1191, 'grad_norm': 1.1631810665130615, 'learning_rate': 1.9366328580645013e-05, 'epoch': 0.42} +2025-02-05 12:20:08 - ERROR - stderr - 14%|█▍ | 3153/22434 [2:12:28<13:18:21, 2.48s/it] +2025-02-05 12:20:11 - ERROR - stderr - 14%|█▍ | 3154/22434 [2:12:30<13:17:23, 2.48s/it] +2025-02-05 12:20:11 - ERROR - stderr - +2025-02-05 12:20:11 - ERROR - stderr - +2025-02-05 12:20:11 - INFO - stdout - {'loss': 1.0361, 'grad_norm': 1.1617748737335205, 'learning_rate': 1.9365822720455915e-05, 'epoch': 0.42} +2025-02-05 12:20:11 - ERROR - stderr - 14%|█▍ | 3154/22434 [2:12:30<13:17:23, 2.48s/it] +2025-02-05 12:20:13 - ERROR - stderr - 14%|█▍ | 3155/22434 [2:12:33<13:18:18, 2.48s/it] +2025-02-05 12:20:13 - ERROR - stderr - +2025-02-05 12:20:13 - ERROR - stderr - +2025-02-05 12:20:13 - INFO - stdout - {'loss': 0.9957, 'grad_norm': 1.077890396118164, 'learning_rate': 1.9365316665045204e-05, 'epoch': 0.42} +2025-02-05 12:20:13 - ERROR - stderr - 14%|█▍ | 3155/22434 [2:12:33<13:18:18, 2.48s/it] +2025-02-05 12:20:16 - ERROR - stderr - 14%|█▍ | 3156/22434 [2:12:35<13:25:15, 2.51s/it] +2025-02-05 12:20:16 - ERROR - stderr - +2025-02-05 12:20:16 - ERROR - stderr - +2025-02-05 12:20:16 - INFO - stdout - {'loss': 1.0546, 'grad_norm': 1.1680024862289429, 'learning_rate': 1.9364810414423428e-05, 'epoch': 0.42} 
+2025-02-05 12:20:16 - ERROR - stderr - 14%|█▍ | 3156/22434 [2:12:35<13:25:15, 2.51s/it] +2025-02-05 12:20:18 - ERROR - stderr - 14%|█▍ | 3157/22434 [2:12:38<13:21:13, 2.49s/it] +2025-02-05 12:20:18 - ERROR - stderr - +2025-02-05 12:20:18 - ERROR - stderr - +2025-02-05 12:20:18 - INFO - stdout - {'loss': 0.9844, 'grad_norm': 1.2040678262710571, 'learning_rate': 1.936430396860114e-05, 'epoch': 0.42} +2025-02-05 12:20:18 - ERROR - stderr - 14%|█▍ | 3157/22434 [2:12:38<13:21:13, 2.49s/it] +2025-02-05 12:20:21 - ERROR - stderr - 14%|█▍ | 3158/22434 [2:12:40<13:20:29, 2.49s/it] +2025-02-05 12:20:21 - ERROR - stderr - +2025-02-05 12:20:21 - ERROR - stderr - +2025-02-05 12:20:21 - INFO - stdout - {'loss': 0.8955, 'grad_norm': 1.217994213104248, 'learning_rate': 1.93637973275889e-05, 'epoch': 0.42} +2025-02-05 12:20:21 - ERROR - stderr - 14%|█▍ | 3158/22434 [2:12:40<13:20:29, 2.49s/it] +2025-02-05 12:20:23 - ERROR - stderr - 14%|█▍ | 3159/22434 [2:12:43<13:18:39, 2.49s/it] +2025-02-05 12:20:23 - ERROR - stderr - +2025-02-05 12:20:23 - ERROR - stderr - +2025-02-05 12:20:23 - INFO - stdout - {'loss': 1.0135, 'grad_norm': 1.1274604797363281, 'learning_rate': 1.936329049139726e-05, 'epoch': 0.42} +2025-02-05 12:20:23 - ERROR - stderr - 14%|█▍ | 3159/22434 [2:12:43<13:18:39, 2.49s/it] +2025-02-05 12:20:26 - ERROR - stderr - 14%|█▍ | 3160/22434 [2:12:45<13:19:00, 2.49s/it] +2025-02-05 12:20:26 - ERROR - stderr - +2025-02-05 12:20:26 - ERROR - stderr - +2025-02-05 12:20:26 - INFO - stdout - {'loss': 0.9295, 'grad_norm': 1.092786431312561, 'learning_rate': 1.9362783460036794e-05, 'epoch': 0.42} +2025-02-05 12:20:26 - ERROR - stderr - 14%|█▍ | 3160/22434 [2:12:45<13:19:00, 2.49s/it] +2025-02-05 12:20:28 - ERROR - stderr - 14%|█▍ | 3161/22434 [2:12:48<13:14:16, 2.47s/it] +2025-02-05 12:20:28 - ERROR - stderr - +2025-02-05 12:20:28 - ERROR - stderr - +2025-02-05 12:20:28 - INFO - stdout - {'loss': 0.9725, 'grad_norm': 1.092376947402954, 'learning_rate': 1.9362276233518063e-05, 
'epoch': 0.42} +2025-02-05 12:20:28 - ERROR - stderr - 14%|█▍ | 3161/22434 [2:12:48<13:14:16, 2.47s/it] +2025-02-05 12:20:31 - ERROR - stderr - 14%|█▍ | 3162/22434 [2:12:51<13:44:09, 2.57s/it] +2025-02-05 12:20:31 - ERROR - stderr - +2025-02-05 12:20:31 - ERROR - stderr - +2025-02-05 12:20:31 - INFO - stdout - {'loss': 0.9192, 'grad_norm': 1.1029256582260132, 'learning_rate': 1.936176881185164e-05, 'epoch': 0.42} +2025-02-05 12:20:31 - ERROR - stderr - 14%|█▍ | 3162/22434 [2:12:51<13:44:09, 2.57s/it] +2025-02-05 12:20:33 - ERROR - stderr - 14%|█▍ | 3163/22434 [2:12:53<13:39:19, 2.55s/it] +2025-02-05 12:20:33 - ERROR - stderr - +2025-02-05 12:20:33 - ERROR - stderr - +2025-02-05 12:20:33 - INFO - stdout - {'loss': 1.0509, 'grad_norm': 1.1618587970733643, 'learning_rate': 1.936126119504811e-05, 'epoch': 0.42} +2025-02-05 12:20:33 - ERROR - stderr - 14%|█▍ | 3163/22434 [2:12:53<13:39:19, 2.55s/it] +2025-02-05 12:20:36 - ERROR - stderr - 14%|█▍ | 3164/22434 [2:12:56<13:54:39, 2.60s/it] +2025-02-05 12:20:36 - ERROR - stderr - +2025-02-05 12:20:36 - ERROR - stderr - +2025-02-05 12:20:36 - INFO - stdout - {'loss': 0.9596, 'grad_norm': 1.178165078163147, 'learning_rate': 1.9360753383118048e-05, 'epoch': 0.42} +2025-02-05 12:20:36 - ERROR - stderr - 14%|█▍ | 3164/22434 [2:12:56<13:54:39, 2.60s/it] +2025-02-05 12:20:39 - ERROR - stderr - 14%|█▍ | 3165/22434 [2:12:58<13:47:22, 2.58s/it] +2025-02-05 12:20:39 - ERROR - stderr - +2025-02-05 12:20:39 - ERROR - stderr - +2025-02-05 12:20:39 - INFO - stdout - {'loss': 1.0452, 'grad_norm': 1.1823660135269165, 'learning_rate': 1.9360245376072035e-05, 'epoch': 0.42} +2025-02-05 12:20:39 - ERROR - stderr - 14%|█▍ | 3165/22434 [2:12:58<13:47:22, 2.58s/it] +2025-02-05 12:20:41 - ERROR - stderr - 14%|█▍ | 3166/22434 [2:13:01<13:36:53, 2.54s/it] +2025-02-05 12:20:41 - ERROR - stderr - +2025-02-05 12:20:41 - ERROR - stderr - +2025-02-05 12:20:41 - INFO - stdout - {'loss': 1.0456, 'grad_norm': 1.1500324010849, 'learning_rate': 
1.9359737173920667e-05, 'epoch': 0.42} +2025-02-05 12:20:41 - ERROR - stderr - 14%|█▍ | 3166/22434 [2:13:01<13:36:53, 2.54s/it] +2025-02-05 12:20:43 - ERROR - stderr - 14%|█▍ | 3167/22434 [2:13:03<13:26:38, 2.51s/it] +2025-02-05 12:20:43 - ERROR - stderr - +2025-02-05 12:20:43 - ERROR - stderr - +2025-02-05 12:20:43 - INFO - stdout - {'loss': 0.9361, 'grad_norm': 1.0814740657806396, 'learning_rate': 1.935922877667453e-05, 'epoch': 0.42} +2025-02-05 12:20:43 - ERROR - stderr - 14%|█▍ | 3167/22434 [2:13:03<13:26:38, 2.51s/it] +2025-02-05 12:20:46 - ERROR - stderr - 14%|█▍ | 3168/22434 [2:13:06<13:23:15, 2.50s/it] +2025-02-05 12:20:46 - ERROR - stderr - +2025-02-05 12:20:46 - ERROR - stderr - +2025-02-05 12:20:46 - INFO - stdout - {'loss': 0.9499, 'grad_norm': 1.0281052589416504, 'learning_rate': 1.935872018434423e-05, 'epoch': 0.42} +2025-02-05 12:20:46 - ERROR - stderr - 14%|█▍ | 3168/22434 [2:13:06<13:23:15, 2.50s/it] +2025-02-05 12:20:48 - ERROR - stderr - 14%|█▍ | 3169/22434 [2:13:08<13:26:02, 2.51s/it] +2025-02-05 12:20:48 - ERROR - stderr - +2025-02-05 12:20:48 - ERROR - stderr - +2025-02-05 12:20:48 - INFO - stdout - {'loss': 0.8612, 'grad_norm': 1.0019768476486206, 'learning_rate': 1.9358211396940358e-05, 'epoch': 0.42} +2025-02-05 12:20:48 - ERROR - stderr - 14%|█▍ | 3169/22434 [2:13:08<13:26:02, 2.51s/it] +2025-02-05 12:20:51 - ERROR - stderr - 14%|█▍ | 3170/22434 [2:13:11<13:33:29, 2.53s/it] +2025-02-05 12:20:51 - ERROR - stderr - +2025-02-05 12:20:51 - ERROR - stderr - +2025-02-05 12:20:51 - INFO - stdout - {'loss': 1.0215, 'grad_norm': 1.1525962352752686, 'learning_rate': 1.9357702414473528e-05, 'epoch': 0.42} +2025-02-05 12:20:51 - ERROR - stderr - 14%|█▍ | 3170/22434 [2:13:11<13:33:29, 2.53s/it] +2025-02-05 12:20:53 - ERROR - stderr - 14%|█▍ | 3171/22434 [2:13:13<13:23:36, 2.50s/it] +2025-02-05 12:20:54 - ERROR - stderr - +2025-02-05 12:20:54 - ERROR - stderr - +2025-02-05 12:20:54 - INFO - stdout - {'loss': 1.0456, 'grad_norm': 1.1244524717330933, 
'learning_rate': 1.9357193236954342e-05, 'epoch': 0.42} +2025-02-05 12:20:54 - ERROR - stderr - 14%|█▍ | 3171/22434 [2:13:13<13:23:36, 2.50s/it] +2025-02-05 12:20:56 - ERROR - stderr - 14%|█▍ | 3172/22434 [2:13:16<13:20:24, 2.49s/it] +2025-02-05 12:20:56 - ERROR - stderr - +2025-02-05 12:20:56 - ERROR - stderr - +2025-02-05 12:20:56 - INFO - stdout - {'loss': 0.8903, 'grad_norm': 1.05103600025177, 'learning_rate': 1.9356683864393424e-05, 'epoch': 0.42} +2025-02-05 12:20:56 - ERROR - stderr - 14%|█▍ | 3172/22434 [2:13:16<13:20:24, 2.49s/it] +2025-02-05 12:20:58 - ERROR - stderr - 14%|█▍ | 3173/22434 [2:13:18<13:21:13, 2.50s/it] +2025-02-05 12:20:58 - ERROR - stderr - +2025-02-05 12:20:58 - ERROR - stderr - +2025-02-05 12:20:58 - INFO - stdout - {'loss': 1.0763, 'grad_norm': 1.1584497690200806, 'learning_rate': 1.9356174296801376e-05, 'epoch': 0.42} +2025-02-05 12:20:58 - ERROR - stderr - 14%|█▍ | 3173/22434 [2:13:18<13:21:13, 2.50s/it] +2025-02-05 12:21:01 - ERROR - stderr - 14%|█▍ | 3174/22434 [2:13:21<13:50:23, 2.59s/it] +2025-02-05 12:21:01 - ERROR - stderr - +2025-02-05 12:21:01 - ERROR - stderr - +2025-02-05 12:21:01 - INFO - stdout - {'loss': 0.9392, 'grad_norm': 1.1762516498565674, 'learning_rate': 1.9355664534188833e-05, 'epoch': 0.42} +2025-02-05 12:21:01 - ERROR - stderr - 14%|█▍ | 3174/22434 [2:13:21<13:50:23, 2.59s/it] +2025-02-05 12:21:04 - ERROR - stderr - 14%|█▍ | 3175/22434 [2:13:23<13:37:20, 2.55s/it] +2025-02-05 12:21:04 - ERROR - stderr - +2025-02-05 12:21:04 - ERROR - stderr - +2025-02-05 12:21:04 - INFO - stdout - {'loss': 0.9417, 'grad_norm': 1.168289303779602, 'learning_rate': 1.9355154576566414e-05, 'epoch': 0.42} +2025-02-05 12:21:04 - ERROR - stderr - 14%|█▍ | 3175/22434 [2:13:24<13:37:20, 2.55s/it] +2025-02-05 12:21:06 - ERROR - stderr - 14%|█▍ | 3176/22434 [2:13:26<13:27:20, 2.52s/it] +2025-02-05 12:21:06 - ERROR - stderr - +2025-02-05 12:21:06 - ERROR - stderr - +2025-02-05 12:21:06 - INFO - stdout - {'loss': 0.9293, 'grad_norm': 
1.1297686100006104, 'learning_rate': 1.9354644423944747e-05, 'epoch': 0.42} +2025-02-05 12:21:06 - ERROR - stderr - 14%|█▍ | 3176/22434 [2:13:26<13:27:20, 2.52s/it] +2025-02-05 12:21:09 - ERROR - stderr - 14%|█▍ | 3177/22434 [2:13:29<13:36:41, 2.54s/it] +2025-02-05 12:21:09 - ERROR - stderr - +2025-02-05 12:21:09 - ERROR - stderr - +2025-02-05 12:21:09 - INFO - stdout - {'loss': 1.0181, 'grad_norm': 1.2507750988006592, 'learning_rate': 1.935413407633447e-05, 'epoch': 0.42} +2025-02-05 12:21:09 - ERROR - stderr - 14%|█▍ | 3177/22434 [2:13:29<13:36:41, 2.54s/it] +2025-02-05 12:21:11 - ERROR - stderr - 14%|█▍ | 3178/22434 [2:13:31<13:38:43, 2.55s/it] +2025-02-05 12:21:11 - ERROR - stderr - +2025-02-05 12:21:11 - ERROR - stderr - +2025-02-05 12:21:11 - INFO - stdout - {'loss': 1.042, 'grad_norm': 1.1753677129745483, 'learning_rate': 1.935362353374622e-05, 'epoch': 0.42} +2025-02-05 12:21:11 - ERROR - stderr - 14%|█▍ | 3178/22434 [2:13:31<13:38:43, 2.55s/it] +2025-02-05 12:21:14 - ERROR - stderr - 14%|█▍ | 3179/22434 [2:13:34<13:30:03, 2.52s/it] +2025-02-05 12:21:14 - ERROR - stderr - +2025-02-05 12:21:14 - ERROR - stderr - +2025-02-05 12:21:14 - INFO - stdout - {'loss': 0.9168, 'grad_norm': 1.0900845527648926, 'learning_rate': 1.9353112796190637e-05, 'epoch': 0.43} +2025-02-05 12:21:14 - ERROR - stderr - 14%|█▍ | 3179/22434 [2:13:34<13:30:03, 2.52s/it] +2025-02-05 12:21:17 - ERROR - stderr - 14%|█▍ | 3180/22434 [2:13:37<14:11:38, 2.65s/it] +2025-02-05 12:21:17 - ERROR - stderr - +2025-02-05 12:21:17 - ERROR - stderr - +2025-02-05 12:21:17 - INFO - stdout - {'loss': 1.0905, 'grad_norm': 1.1463525295257568, 'learning_rate': 1.935260186367837e-05, 'epoch': 0.43} +2025-02-05 12:21:17 - ERROR - stderr - 14%|█▍ | 3180/22434 [2:13:37<14:11:38, 2.65s/it] +2025-02-05 12:21:19 - ERROR - stderr - 14%|█▍ | 3181/22434 [2:13:39<14:00:21, 2.62s/it] +2025-02-05 12:21:19 - ERROR - stderr - +2025-02-05 12:21:19 - ERROR - stderr - +2025-02-05 12:21:19 - INFO - stdout - {'loss': 1.1016, 
'grad_norm': 1.1809686422348022, 'learning_rate': 1.9352090736220065e-05, 'epoch': 0.43} +2025-02-05 12:21:19 - ERROR - stderr - 14%|█▍ | 3181/22434 [2:13:39<14:00:21, 2.62s/it] +2025-02-05 12:21:22 - ERROR - stderr - 14%|█▍ | 3182/22434 [2:13:42<13:52:09, 2.59s/it] +2025-02-05 12:21:22 - ERROR - stderr - +2025-02-05 12:21:22 - ERROR - stderr - +2025-02-05 12:21:22 - INFO - stdout - {'loss': 1.0922, 'grad_norm': 1.1723607778549194, 'learning_rate': 1.9351579413826375e-05, 'epoch': 0.43} +2025-02-05 12:21:22 - ERROR - stderr - 14%|█▍ | 3182/22434 [2:13:42<13:52:09, 2.59s/it] +2025-02-05 12:21:24 - ERROR - stderr - 14%|█▍ | 3183/22434 [2:13:44<13:58:09, 2.61s/it] +2025-02-05 12:21:25 - ERROR - stderr - +2025-02-05 12:21:25 - ERROR - stderr - +2025-02-05 12:21:25 - INFO - stdout - {'loss': 1.0483, 'grad_norm': 1.196715235710144, 'learning_rate': 1.9351067896507964e-05, 'epoch': 0.43} +2025-02-05 12:21:25 - ERROR - stderr - 14%|█▍ | 3183/22434 [2:13:44<13:58:09, 2.61s/it] +2025-02-05 12:21:27 - ERROR - stderr - 14%|█▍ | 3184/22434 [2:13:47<13:45:45, 2.57s/it] +2025-02-05 12:21:27 - ERROR - stderr - +2025-02-05 12:21:27 - ERROR - stderr - +2025-02-05 12:21:27 - INFO - stdout - {'loss': 0.839, 'grad_norm': 0.9938531517982483, 'learning_rate': 1.935055618427549e-05, 'epoch': 0.43} +2025-02-05 12:21:27 - ERROR - stderr - 14%|█▍ | 3184/22434 [2:13:47<13:45:45, 2.57s/it] +2025-02-05 12:21:29 - ERROR - stderr - 14%|█▍ | 3185/22434 [2:13:49<13:34:35, 2.54s/it] +2025-02-05 12:21:29 - ERROR - stderr - +2025-02-05 12:21:29 - ERROR - stderr - +2025-02-05 12:21:29 - INFO - stdout - {'loss': 0.91, 'grad_norm': 1.0874520540237427, 'learning_rate': 1.935004427713962e-05, 'epoch': 0.43} +2025-02-05 12:21:29 - ERROR - stderr - 14%|█▍ | 3185/22434 [2:13:49<13:34:35, 2.54s/it] +2025-02-05 12:21:32 - ERROR - stderr - 14%|█▍ | 3186/22434 [2:13:52<13:25:48, 2.51s/it] +2025-02-05 12:21:32 - ERROR - stderr - +2025-02-05 12:21:32 - ERROR - stderr - +2025-02-05 12:21:32 - INFO - stdout - 
{'loss': 1.0289, 'grad_norm': 1.1492681503295898, 'learning_rate': 1.9349532175111023e-05, 'epoch': 0.43} +2025-02-05 12:21:32 - ERROR - stderr - 14%|█▍ | 3186/22434 [2:13:52<13:25:48, 2.51s/it] +2025-02-05 12:21:34 - ERROR - stderr - 14%|█▍ | 3187/22434 [2:13:54<13:25:24, 2.51s/it] +2025-02-05 12:21:34 - ERROR - stderr - +2025-02-05 12:21:34 - ERROR - stderr - +2025-02-05 12:21:34 - INFO - stdout - {'loss': 1.0175, 'grad_norm': 1.1197785139083862, 'learning_rate': 1.9349019878200374e-05, 'epoch': 0.43} +2025-02-05 12:21:34 - ERROR - stderr - 14%|█▍ | 3187/22434 [2:13:54<13:25:24, 2.51s/it] +2025-02-05 12:21:37 - ERROR - stderr - 14%|█▍ | 3188/22434 [2:13:57<13:15:47, 2.48s/it] +2025-02-05 12:21:37 - ERROR - stderr - +2025-02-05 12:21:37 - ERROR - stderr - +2025-02-05 12:21:37 - INFO - stdout - {'loss': 0.9558, 'grad_norm': 1.2450969219207764, 'learning_rate': 1.9348507386418354e-05, 'epoch': 0.43} +2025-02-05 12:21:37 - ERROR - stderr - 14%|█▍ | 3188/22434 [2:13:57<13:15:47, 2.48s/it] +2025-02-05 12:21:39 - ERROR - stderr - 14%|█▍ | 3189/22434 [2:13:59<13:16:04, 2.48s/it] +2025-02-05 12:21:39 - ERROR - stderr - +2025-02-05 12:21:39 - ERROR - stderr - +2025-02-05 12:21:39 - INFO - stdout - {'loss': 0.9513, 'grad_norm': 1.167239785194397, 'learning_rate': 1.934799469977564e-05, 'epoch': 0.43} +2025-02-05 12:21:39 - ERROR - stderr - 14%|█▍ | 3189/22434 [2:13:59<13:16:04, 2.48s/it] +2025-02-05 12:21:42 - ERROR - stderr - 14%|█▍ | 3190/22434 [2:14:01<13:11:35, 2.47s/it] +2025-02-05 12:21:42 - ERROR - stderr - +2025-02-05 12:21:42 - ERROR - stderr - +2025-02-05 12:21:42 - INFO - stdout - {'loss': 0.9621, 'grad_norm': 0.9940695762634277, 'learning_rate': 1.9347481818282927e-05, 'epoch': 0.43} +2025-02-05 12:21:42 - ERROR - stderr - 14%|█▍ | 3190/22434 [2:14:02<13:11:35, 2.47s/it] +2025-02-05 12:21:44 - ERROR - stderr - 14%|█▍ | 3191/22434 [2:14:04<13:06:48, 2.45s/it] +2025-02-05 12:21:44 - ERROR - stderr - +2025-02-05 12:21:44 - ERROR - stderr - +2025-02-05 12:21:44 - 
INFO - stdout - {'loss': 1.0492, 'grad_norm': 1.1731464862823486, 'learning_rate': 1.9346968741950896e-05, 'epoch': 0.43} +2025-02-05 12:21:44 - ERROR - stderr - 14%|█▍ | 3191/22434 [2:14:04<13:06:48, 2.45s/it] +2025-02-05 12:21:47 - ERROR - stderr - 14%|█▍ | 3192/22434 [2:14:06<13:16:36, 2.48s/it] +2025-02-05 12:21:47 - ERROR - stderr - +2025-02-05 12:21:47 - ERROR - stderr - +2025-02-05 12:21:47 - INFO - stdout - {'loss': 0.9194, 'grad_norm': 1.1179336309432983, 'learning_rate': 1.9346455470790245e-05, 'epoch': 0.43} +2025-02-05 12:21:47 - ERROR - stderr - 14%|█▍ | 3192/22434 [2:14:06<13:16:36, 2.48s/it] +2025-02-05 12:21:49 - ERROR - stderr - 14%|█▍ | 3193/22434 [2:14:09<13:21:15, 2.50s/it] +2025-02-05 12:21:49 - ERROR - stderr - +2025-02-05 12:21:49 - ERROR - stderr - +2025-02-05 12:21:49 - INFO - stdout - {'loss': 0.8502, 'grad_norm': 1.0085368156433105, 'learning_rate': 1.9345942004811674e-05, 'epoch': 0.43} +2025-02-05 12:21:49 - ERROR - stderr - 14%|█▍ | 3193/22434 [2:14:09<13:21:15, 2.50s/it] +2025-02-05 12:21:52 - ERROR - stderr - 14%|█▍ | 3194/22434 [2:14:11<13:16:32, 2.48s/it] +2025-02-05 12:21:52 - ERROR - stderr - +2025-02-05 12:21:52 - ERROR - stderr - +2025-02-05 12:21:52 - INFO - stdout - {'loss': 0.9507, 'grad_norm': 1.1526203155517578, 'learning_rate': 1.9345428344025883e-05, 'epoch': 0.43} +2025-02-05 12:21:52 - ERROR - stderr - 14%|█▍ | 3194/22434 [2:14:11<13:16:32, 2.48s/it] +2025-02-05 12:21:54 - ERROR - stderr - 14%|█▍ | 3195/22434 [2:14:14<13:15:20, 2.48s/it] +2025-02-05 12:21:54 - ERROR - stderr - +2025-02-05 12:21:54 - ERROR - stderr - +2025-02-05 12:21:54 - INFO - stdout - {'loss': 0.981, 'grad_norm': 1.0069924592971802, 'learning_rate': 1.9344914488443585e-05, 'epoch': 0.43} +2025-02-05 12:21:54 - ERROR - stderr - 14%|█▍ | 3195/22434 [2:14:14<13:15:20, 2.48s/it] +2025-02-05 12:21:57 - ERROR - stderr - 14%|█▍ | 3196/22434 [2:14:16<13:16:33, 2.48s/it] +2025-02-05 12:21:57 - ERROR - stderr - +2025-02-05 12:21:57 - ERROR - stderr - 
+2025-02-05 12:21:57 - INFO - stdout - {'loss': 1.0073, 'grad_norm': 1.0836031436920166, 'learning_rate': 1.9344400438075487e-05, 'epoch': 0.43} +2025-02-05 12:21:57 - ERROR - stderr - 14%|█▍ | 3196/22434 [2:14:16<13:16:33, 2.48s/it] +2025-02-05 12:21:59 - ERROR - stderr - 14%|█▍ | 3197/22434 [2:14:19<13:18:34, 2.49s/it] +2025-02-05 12:21:59 - ERROR - stderr - +2025-02-05 12:21:59 - ERROR - stderr - +2025-02-05 12:21:59 - INFO - stdout - {'loss': 1.0621, 'grad_norm': 1.18271803855896, 'learning_rate': 1.93438861929323e-05, 'epoch': 0.43} +2025-02-05 12:21:59 - ERROR - stderr - 14%|█▍ | 3197/22434 [2:14:19<13:18:34, 2.49s/it] +2025-02-05 12:22:02 - ERROR - stderr - 14%|█▍ | 3198/22434 [2:14:21<13:14:35, 2.48s/it] +2025-02-05 12:22:02 - ERROR - stderr - +2025-02-05 12:22:02 - ERROR - stderr - +2025-02-05 12:22:02 - INFO - stdout - {'loss': 0.9822, 'grad_norm': 1.1367918252944946, 'learning_rate': 1.9343371753024747e-05, 'epoch': 0.43} +2025-02-05 12:22:02 - ERROR - stderr - 14%|█▍ | 3198/22434 [2:14:21<13:14:35, 2.48s/it] +2025-02-05 12:22:04 - ERROR - stderr - 14%|█▍ | 3199/22434 [2:14:24<13:18:44, 2.49s/it] +2025-02-05 12:22:04 - ERROR - stderr - +2025-02-05 12:22:04 - ERROR - stderr - +2025-02-05 12:22:04 - INFO - stdout - {'loss': 1.0373, 'grad_norm': 1.1420409679412842, 'learning_rate': 1.934285711836355e-05, 'epoch': 0.43} +2025-02-05 12:22:04 - ERROR - stderr - 14%|█▍ | 3199/22434 [2:14:24<13:18:44, 2.49s/it] +2025-02-05 12:22:07 - ERROR - stderr - 14%|█▍ | 3200/22434 [2:14:26<13:19:23, 2.49s/it] +2025-02-05 12:22:07 - ERROR - stderr - +2025-02-05 12:22:07 - ERROR - stderr - +2025-02-05 12:22:07 - INFO - stdout - {'loss': 0.9689, 'grad_norm': 1.1827529668807983, 'learning_rate': 1.934234228895944e-05, 'epoch': 0.43} +2025-02-05 12:22:07 - ERROR - stderr - 14%|█▍ | 3200/22434 [2:14:26<13:19:23, 2.49s/it] +2025-02-05 12:22:09 - ERROR - stderr - 14%|█▍ | 3201/22434 [2:14:29<13:18:30, 2.49s/it] +2025-02-05 12:22:09 - ERROR - stderr - +2025-02-05 12:22:09 - ERROR - 
stderr - +2025-02-05 12:22:09 - INFO - stdout - {'loss': 1.023, 'grad_norm': 1.2171211242675781, 'learning_rate': 1.9341827264823142e-05, 'epoch': 0.43} +2025-02-05 12:22:09 - ERROR - stderr - 14%|█▍ | 3201/22434 [2:14:29<13:18:30, 2.49s/it] +2025-02-05 12:22:12 - ERROR - stderr - 14%|█▍ | 3202/22434 [2:14:31<13:13:33, 2.48s/it] +2025-02-05 12:22:12 - ERROR - stderr - +2025-02-05 12:22:12 - ERROR - stderr - +2025-02-05 12:22:12 - INFO - stdout - {'loss': 1.1413, 'grad_norm': 1.1928479671478271, 'learning_rate': 1.934131204596539e-05, 'epoch': 0.43} +2025-02-05 12:22:12 - ERROR - stderr - 14%|█▍ | 3202/22434 [2:14:31<13:13:33, 2.48s/it] +2025-02-05 12:22:14 - ERROR - stderr - 14%|█▍ | 3203/22434 [2:14:34<13:28:59, 2.52s/it] +2025-02-05 12:22:14 - ERROR - stderr - +2025-02-05 12:22:14 - ERROR - stderr - +2025-02-05 12:22:14 - INFO - stdout - {'loss': 1.0534, 'grad_norm': 1.2026665210723877, 'learning_rate': 1.9340796632396935e-05, 'epoch': 0.43} +2025-02-05 12:22:14 - ERROR - stderr - 14%|█▍ | 3203/22434 [2:14:34<13:28:59, 2.52s/it] +2025-02-05 12:22:17 - ERROR - stderr - 14%|█▍ | 3204/22434 [2:14:36<13:33:45, 2.54s/it] +2025-02-05 12:22:17 - ERROR - stderr - +2025-02-05 12:22:17 - ERROR - stderr - +2025-02-05 12:22:17 - INFO - stdout - {'loss': 0.8741, 'grad_norm': 1.0762709379196167, 'learning_rate': 1.934028102412851e-05, 'epoch': 0.43} +2025-02-05 12:22:17 - ERROR - stderr - 14%|█▍ | 3204/22434 [2:14:37<13:33:45, 2.54s/it] +2025-02-05 12:22:19 - ERROR - stderr - 14%|█▍ | 3205/22434 [2:14:39<13:26:42, 2.52s/it] +2025-02-05 12:22:19 - ERROR - stderr - +2025-02-05 12:22:19 - ERROR - stderr - +2025-02-05 12:22:19 - INFO - stdout - {'loss': 1.042, 'grad_norm': 1.2451571226119995, 'learning_rate': 1.933976522117086e-05, 'epoch': 0.43} +2025-02-05 12:22:19 - ERROR - stderr - 14%|█▍ | 3205/22434 [2:14:39<13:26:42, 2.52s/it] +2025-02-05 12:22:22 - ERROR - stderr - 14%|█▍ | 3206/22434 [2:14:41<13:24:20, 2.51s/it] +2025-02-05 12:22:22 - ERROR - stderr - +2025-02-05 12:22:22 
- ERROR - stderr - +2025-02-05 12:22:22 - INFO - stdout - {'loss': 1.0747, 'grad_norm': 1.1522217988967896, 'learning_rate': 1.9339249223534743e-05, 'epoch': 0.43} +2025-02-05 12:22:22 - ERROR - stderr - 14%|█▍ | 3206/22434 [2:14:42<13:24:20, 2.51s/it] +2025-02-05 12:22:24 - ERROR - stderr - 14%|█▍ | 3207/22434 [2:14:44<13:23:34, 2.51s/it] +2025-02-05 12:22:24 - ERROR - stderr - +2025-02-05 12:22:24 - ERROR - stderr - +2025-02-05 12:22:24 - INFO - stdout - {'loss': 0.9487, 'grad_norm': 1.00810706615448, 'learning_rate': 1.9338733031230917e-05, 'epoch': 0.43} +2025-02-05 12:22:24 - ERROR - stderr - 14%|█▍ | 3207/22434 [2:14:44<13:23:34, 2.51s/it] +2025-02-05 12:22:27 - ERROR - stderr - 14%|█▍ | 3208/22434 [2:14:47<13:30:05, 2.53s/it] +2025-02-05 12:22:27 - ERROR - stderr - +2025-02-05 12:22:27 - ERROR - stderr - +2025-02-05 12:22:27 - INFO - stdout - {'loss': 0.9177, 'grad_norm': 1.1863644123077393, 'learning_rate': 1.9338216644270134e-05, 'epoch': 0.43} +2025-02-05 12:22:27 - ERROR - stderr - 14%|█▍ | 3208/22434 [2:14:47<13:30:05, 2.53s/it] +2025-02-05 12:22:29 - ERROR - stderr - 14%|█▍ | 3209/22434 [2:14:49<13:29:08, 2.53s/it] +2025-02-05 12:22:29 - ERROR - stderr - +2025-02-05 12:22:29 - ERROR - stderr - +2025-02-05 12:22:29 - INFO - stdout - {'loss': 1.0805, 'grad_norm': 1.1721563339233398, 'learning_rate': 1.933770006266316e-05, 'epoch': 0.43} +2025-02-05 12:22:29 - ERROR - stderr - 14%|█▍ | 3209/22434 [2:14:49<13:29:08, 2.53s/it] +2025-02-05 12:22:32 - ERROR - stderr - 14%|█▍ | 3210/22434 [2:14:52<13:26:02, 2.52s/it] +2025-02-05 12:22:32 - ERROR - stderr - +2025-02-05 12:22:32 - ERROR - stderr - +2025-02-05 12:22:32 - INFO - stdout - {'loss': 0.9603, 'grad_norm': 1.0565379858016968, 'learning_rate': 1.9337183286420764e-05, 'epoch': 0.43} +2025-02-05 12:22:32 - ERROR - stderr - 14%|█▍ | 3210/22434 [2:14:52<13:26:02, 2.52s/it] +2025-02-05 12:22:34 - ERROR - stderr - 14%|█▍ | 3211/22434 [2:14:54<13:15:55, 2.48s/it] +2025-02-05 12:22:34 - ERROR - stderr - 
+2025-02-05 12:22:34 - ERROR - stderr - +2025-02-05 12:22:34 - INFO - stdout - {'loss': 0.958, 'grad_norm': 1.0838137865066528, 'learning_rate': 1.933666631555372e-05, 'epoch': 0.43} +2025-02-05 12:22:34 - ERROR - stderr - 14%|█▍ | 3211/22434 [2:14:54<13:15:55, 2.48s/it] +2025-02-05 12:22:37 - ERROR - stderr - 14%|█▍ | 3212/22434 [2:14:56<13:11:44, 2.47s/it] +2025-02-05 12:22:37 - ERROR - stderr - +2025-02-05 12:22:37 - ERROR - stderr - +2025-02-05 12:22:37 - INFO - stdout - {'loss': 0.8762, 'grad_norm': 1.149985432624817, 'learning_rate': 1.9336149150072795e-05, 'epoch': 0.43} +2025-02-05 12:22:37 - ERROR - stderr - 14%|█▍ | 3212/22434 [2:14:56<13:11:44, 2.47s/it] +2025-02-05 12:22:39 - ERROR - stderr - 14%|█▍ | 3213/22434 [2:14:59<13:09:53, 2.47s/it] +2025-02-05 12:22:39 - ERROR - stderr - +2025-02-05 12:22:39 - ERROR - stderr - +2025-02-05 12:22:39 - INFO - stdout - {'loss': 1.0185, 'grad_norm': 1.2659294605255127, 'learning_rate': 1.933563178998878e-05, 'epoch': 0.43} +2025-02-05 12:22:39 - ERROR - stderr - 14%|█▍ | 3213/22434 [2:14:59<13:09:53, 2.47s/it] +2025-02-05 12:22:42 - ERROR - stderr - 14%|█▍ | 3214/22434 [2:15:01<13:10:49, 2.47s/it] +2025-02-05 12:22:42 - ERROR - stderr - +2025-02-05 12:22:42 - ERROR - stderr - +2025-02-05 12:22:42 - INFO - stdout - {'loss': 1.0157, 'grad_norm': 1.0330573320388794, 'learning_rate': 1.933511423531245e-05, 'epoch': 0.43} +2025-02-05 12:22:42 - ERROR - stderr - 14%|█▍ | 3214/22434 [2:15:01<13:10:49, 2.47s/it] +2025-02-05 12:22:44 - ERROR - stderr - 14%|█▍ | 3215/22434 [2:15:04<13:22:57, 2.51s/it] +2025-02-05 12:22:44 - ERROR - stderr - +2025-02-05 12:22:44 - ERROR - stderr - +2025-02-05 12:22:44 - INFO - stdout - {'loss': 0.9575, 'grad_norm': 1.2409868240356445, 'learning_rate': 1.93345964860546e-05, 'epoch': 0.43} +2025-02-05 12:22:44 - ERROR - stderr - 14%|█▍ | 3215/22434 [2:15:04<13:22:57, 2.51s/it] +2025-02-05 12:22:47 - ERROR - stderr - 14%|█▍ | 3216/22434 [2:15:06<13:24:32, 2.51s/it] +2025-02-05 12:22:47 - ERROR - 
stderr - +2025-02-05 12:22:47 - ERROR - stderr - +2025-02-05 12:22:47 - INFO - stdout - {'loss': 1.0152, 'grad_norm': 1.226502776145935, 'learning_rate': 1.9334078542226015e-05, 'epoch': 0.43} +2025-02-05 12:22:47 - ERROR - stderr - 14%|█▍ | 3216/22434 [2:15:06<13:24:32, 2.51s/it] +2025-02-05 12:22:50 - ERROR - stderr - 14%|█▍ | 3217/22434 [2:15:09<14:04:18, 2.64s/it] +2025-02-05 12:22:50 - ERROR - stderr - +2025-02-05 12:22:50 - ERROR - stderr - +2025-02-05 12:22:50 - INFO - stdout - {'loss': 1.0068, 'grad_norm': 1.1387345790863037, 'learning_rate': 1.9333560403837497e-05, 'epoch': 0.43} +2025-02-05 12:22:50 - ERROR - stderr - 14%|█▍ | 3217/22434 [2:15:09<14:04:18, 2.64s/it] +2025-02-05 12:22:52 - ERROR - stderr - 14%|█▍ | 3218/22434 [2:15:12<13:52:03, 2.60s/it] +2025-02-05 12:22:52 - ERROR - stderr - +2025-02-05 12:22:52 - ERROR - stderr - +2025-02-05 12:22:52 - INFO - stdout - {'loss': 1.0246, 'grad_norm': 1.123658299446106, 'learning_rate': 1.933304207089984e-05, 'epoch': 0.43} +2025-02-05 12:22:52 - ERROR - stderr - 14%|█▍ | 3218/22434 [2:15:12<13:52:03, 2.60s/it] +2025-02-05 12:22:55 - ERROR - stderr - 14%|█▍ | 3219/22434 [2:15:14<13:36:39, 2.55s/it] +2025-02-05 12:22:55 - ERROR - stderr - +2025-02-05 12:22:55 - ERROR - stderr - +2025-02-05 12:22:55 - INFO - stdout - {'loss': 0.9516, 'grad_norm': 1.132745623588562, 'learning_rate': 1.9332523543423858e-05, 'epoch': 0.43} +2025-02-05 12:22:55 - ERROR - stderr - 14%|█▍ | 3219/22434 [2:15:14<13:36:39, 2.55s/it] +2025-02-05 12:22:57 - ERROR - stderr - 14%|█▍ | 3220/22434 [2:15:17<13:27:10, 2.52s/it] +2025-02-05 12:22:57 - ERROR - stderr - +2025-02-05 12:22:57 - ERROR - stderr - +2025-02-05 12:22:57 - INFO - stdout - {'loss': 0.9207, 'grad_norm': 1.17822265625, 'learning_rate': 1.9332004821420346e-05, 'epoch': 0.43} +2025-02-05 12:22:57 - ERROR - stderr - 14%|█▍ | 3220/22434 [2:15:17<13:27:10, 2.52s/it] +2025-02-05 12:23:00 - ERROR - stderr - 14%|█▍ | 3221/22434 [2:15:19<13:32:54, 2.54s/it] +2025-02-05 12:23:00 - 
ERROR - stderr - +2025-02-05 12:23:00 - ERROR - stderr - +2025-02-05 12:23:00 - INFO - stdout - {'loss': 0.9799, 'grad_norm': 1.1625081300735474, 'learning_rate': 1.933148590490013e-05, 'epoch': 0.43} +2025-02-05 12:23:00 - ERROR - stderr - 14%|█▍ | 3221/22434 [2:15:19<13:32:54, 2.54s/it] +2025-02-05 12:23:02 - ERROR - stderr - 14%|█▍ | 3222/22434 [2:15:22<13:21:24, 2.50s/it] +2025-02-05 12:23:02 - ERROR - stderr - +2025-02-05 12:23:02 - ERROR - stderr - +2025-02-05 12:23:02 - INFO - stdout - {'loss': 0.9598, 'grad_norm': 1.1458196640014648, 'learning_rate': 1.9330966793874015e-05, 'epoch': 0.43} +2025-02-05 12:23:02 - ERROR - stderr - 14%|█▍ | 3222/22434 [2:15:22<13:21:24, 2.50s/it] +2025-02-05 12:23:05 - ERROR - stderr - 14%|█▍ | 3223/22434 [2:15:24<13:24:36, 2.51s/it] +2025-02-05 12:23:05 - ERROR - stderr - +2025-02-05 12:23:05 - ERROR - stderr - +2025-02-05 12:23:05 - INFO - stdout - {'loss': 0.9867, 'grad_norm': 1.0614173412322998, 'learning_rate': 1.933044748835283e-05, 'epoch': 0.43} +2025-02-05 12:23:05 - ERROR - stderr - 14%|█▍ | 3223/22434 [2:15:24<13:24:36, 2.51s/it] +2025-02-05 12:23:07 - ERROR - stderr - 14%|█▍ | 3224/22434 [2:15:27<13:23:12, 2.51s/it] +2025-02-05 12:23:07 - ERROR - stderr - +2025-02-05 12:23:07 - ERROR - stderr - +2025-02-05 12:23:07 - INFO - stdout - {'loss': 0.9199, 'grad_norm': 1.1612673997879028, 'learning_rate': 1.932992798834739e-05, 'epoch': 0.43} +2025-02-05 12:23:07 - ERROR - stderr - 14%|█▍ | 3224/22434 [2:15:27<13:23:12, 2.51s/it] +2025-02-05 12:23:09 - ERROR - stderr - 14%|█▍ | 3225/22434 [2:15:29<13:17:38, 2.49s/it] +2025-02-05 12:23:10 - ERROR - stderr - +2025-02-05 12:23:10 - ERROR - stderr - +2025-02-05 12:23:10 - INFO - stdout - {'loss': 0.9475, 'grad_norm': 1.1997700929641724, 'learning_rate': 1.9329408293868533e-05, 'epoch': 0.43} +2025-02-05 12:23:10 - ERROR - stderr - 14%|█▍ | 3225/22434 [2:15:29<13:17:38, 2.49s/it] +2025-02-05 12:23:12 - ERROR - stderr - 14%|█▍ | 3226/22434 [2:15:32<13:10:22, 2.47s/it] 
+2025-02-05 12:23:12 - ERROR - stderr - +2025-02-05 12:23:12 - ERROR - stderr - +2025-02-05 12:23:12 - INFO - stdout - {'loss': 1.0813, 'grad_norm': 1.2088536024093628, 'learning_rate': 1.9328888404927086e-05, 'epoch': 0.43} +2025-02-05 12:23:12 - ERROR - stderr - 14%|█▍ | 3226/22434 [2:15:32<13:10:22, 2.47s/it] +2025-02-05 12:23:14 - ERROR - stderr - 14%|█▍ | 3227/22434 [2:15:34<13:13:00, 2.48s/it] +2025-02-05 12:23:14 - ERROR - stderr - +2025-02-05 12:23:14 - ERROR - stderr - +2025-02-05 12:23:14 - INFO - stdout - {'loss': 1.0938, 'grad_norm': 1.263952612876892, 'learning_rate': 1.9328368321533885e-05, 'epoch': 0.43} +2025-02-05 12:23:14 - ERROR - stderr - 14%|█▍ | 3227/22434 [2:15:34<13:13:00, 2.48s/it] +2025-02-05 12:23:17 - ERROR - stderr - 14%|█▍ | 3228/22434 [2:15:37<13:49:56, 2.59s/it] +2025-02-05 12:23:17 - ERROR - stderr - +2025-02-05 12:23:17 - ERROR - stderr - +2025-02-05 12:23:17 - INFO - stdout - {'loss': 0.9589, 'grad_norm': 1.1620732545852661, 'learning_rate': 1.9327848043699774e-05, 'epoch': 0.43} +2025-02-05 12:23:17 - ERROR - stderr - 14%|█▍ | 3228/22434 [2:15:37<13:49:56, 2.59s/it] +2025-02-05 12:23:20 - ERROR - stderr - 14%|█▍ | 3229/22434 [2:15:39<13:37:00, 2.55s/it] +2025-02-05 12:23:20 - ERROR - stderr - +2025-02-05 12:23:20 - ERROR - stderr - +2025-02-05 12:23:20 - INFO - stdout - {'loss': 1.0728, 'grad_norm': 1.4293076992034912, 'learning_rate': 1.9327327571435597e-05, 'epoch': 0.43} +2025-02-05 12:23:20 - ERROR - stderr - 14%|█▍ | 3229/22434 [2:15:40<13:37:00, 2.55s/it] +2025-02-05 12:23:22 - ERROR - stderr - 14%|█▍ | 3230/22434 [2:15:42<13:26:44, 2.52s/it] +2025-02-05 12:23:22 - ERROR - stderr - +2025-02-05 12:23:22 - ERROR - stderr - +2025-02-05 12:23:22 - INFO - stdout - {'loss': 0.9247, 'grad_norm': 1.212754487991333, 'learning_rate': 1.93268069047522e-05, 'epoch': 0.43} +2025-02-05 12:23:22 - ERROR - stderr - 14%|█▍ | 3230/22434 [2:15:42<13:26:44, 2.52s/it] +2025-02-05 12:23:25 - ERROR - stderr - 14%|█▍ | 3231/22434 
[2:15:44<13:22:53, 2.51s/it] +2025-02-05 12:23:25 - ERROR - stderr - +2025-02-05 12:23:25 - ERROR - stderr - +2025-02-05 12:23:25 - INFO - stdout - {'loss': 0.9545, 'grad_norm': 1.1887993812561035, 'learning_rate': 1.9326286043660442e-05, 'epoch': 0.43} +2025-02-05 12:23:25 - ERROR - stderr - 14%|█▍ | 3231/22434 [2:15:44<13:22:53, 2.51s/it] +2025-02-05 12:23:27 - ERROR - stderr - 14%|█▍ | 3232/22434 [2:15:47<13:18:21, 2.49s/it] +2025-02-05 12:23:27 - ERROR - stderr - +2025-02-05 12:23:27 - ERROR - stderr - +2025-02-05 12:23:27 - INFO - stdout - {'loss': 0.9823, 'grad_norm': 1.0981377363204956, 'learning_rate': 1.9325764988171173e-05, 'epoch': 0.43} +2025-02-05 12:23:27 - ERROR - stderr - 14%|█▍ | 3232/22434 [2:15:47<13:18:21, 2.49s/it] +2025-02-05 12:23:30 - ERROR - stderr - 14%|█▍ | 3233/22434 [2:15:49<13:13:15, 2.48s/it] +2025-02-05 12:23:30 - ERROR - stderr - +2025-02-05 12:23:30 - ERROR - stderr - +2025-02-05 12:23:30 - INFO - stdout - {'loss': 0.899, 'grad_norm': 1.1061948537826538, 'learning_rate': 1.932524373829526e-05, 'epoch': 0.43} +2025-02-05 12:23:30 - ERROR - stderr - 14%|█▍ | 3233/22434 [2:15:49<13:13:15, 2.48s/it] +2025-02-05 12:23:32 - ERROR - stderr - 14%|█▍ | 3234/22434 [2:15:52<13:18:34, 2.50s/it] +2025-02-05 12:23:32 - ERROR - stderr - +2025-02-05 12:23:32 - ERROR - stderr - +2025-02-05 12:23:32 - INFO - stdout - {'loss': 1.0584, 'grad_norm': 1.14080011844635, 'learning_rate': 1.932472229404356e-05, 'epoch': 0.43} +2025-02-05 12:23:32 - ERROR - stderr - 14%|█▍ | 3234/22434 [2:15:52<13:18:34, 2.50s/it] +2025-02-05 12:23:35 - ERROR - stderr - 14%|█▍ | 3235/22434 [2:15:54<13:20:54, 2.50s/it] +2025-02-05 12:23:35 - ERROR - stderr - +2025-02-05 12:23:35 - ERROR - stderr - +2025-02-05 12:23:35 - INFO - stdout - {'loss': 1.0231, 'grad_norm': 1.0690444707870483, 'learning_rate': 1.932420065542695e-05, 'epoch': 0.43} +2025-02-05 12:23:35 - ERROR - stderr - 14%|█▍ | 3235/22434 [2:15:54<13:20:54, 2.50s/it] +2025-02-05 12:23:37 - ERROR - stderr - 14%|█▍ | 
3236/22434 [2:15:57<13:17:15, 2.49s/it] +2025-02-05 12:23:37 - ERROR - stderr - +2025-02-05 12:23:37 - ERROR - stderr - +2025-02-05 12:23:37 - INFO - stdout - {'loss': 1.0213, 'grad_norm': 1.1371108293533325, 'learning_rate': 1.9323678822456296e-05, 'epoch': 0.43} +2025-02-05 12:23:37 - ERROR - stderr - 14%|█▍ | 3236/22434 [2:15:57<13:17:15, 2.49s/it] +2025-02-05 12:23:40 - ERROR - stderr - 14%|█▍ | 3237/22434 [2:15:59<13:16:53, 2.49s/it] +2025-02-05 12:23:40 - ERROR - stderr - +2025-02-05 12:23:40 - ERROR - stderr - +2025-02-05 12:23:40 - INFO - stdout - {'loss': 0.9224, 'grad_norm': 1.081855297088623, 'learning_rate': 1.932315679514248e-05, 'epoch': 0.43} +2025-02-05 12:23:40 - ERROR - stderr - 14%|█▍ | 3237/22434 [2:15:59<13:16:53, 2.49s/it] +2025-02-05 12:23:42 - ERROR - stderr - 14%|█▍ | 3238/22434 [2:16:02<13:17:18, 2.49s/it] +2025-02-05 12:23:42 - ERROR - stderr - +2025-02-05 12:23:42 - ERROR - stderr - +2025-02-05 12:23:42 - INFO - stdout - {'loss': 0.8645, 'grad_norm': 1.0324413776397705, 'learning_rate': 1.9322634573496383e-05, 'epoch': 0.43} +2025-02-05 12:23:42 - ERROR - stderr - 14%|█▍ | 3238/22434 [2:16:02<13:17:18, 2.49s/it] +2025-02-05 12:23:45 - ERROR - stderr - 14%|█▍ | 3239/22434 [2:16:05<13:51:27, 2.60s/it] +2025-02-05 12:23:45 - ERROR - stderr - +2025-02-05 12:23:45 - ERROR - stderr - +2025-02-05 12:23:45 - INFO - stdout - {'loss': 1.0141, 'grad_norm': 1.0652774572372437, 'learning_rate': 1.9322112157528886e-05, 'epoch': 0.43} +2025-02-05 12:23:45 - ERROR - stderr - 14%|█▍ | 3239/22434 [2:16:05<13:51:27, 2.60s/it] +2025-02-05 12:23:47 - ERROR - stderr - 14%|█▍ | 3240/22434 [2:16:07<13:35:50, 2.55s/it] +2025-02-05 12:23:47 - ERROR - stderr - +2025-02-05 12:23:47 - ERROR - stderr - +2025-02-05 12:23:47 - INFO - stdout - {'loss': 1.0183, 'grad_norm': 1.16048002243042, 'learning_rate': 1.932158954725089e-05, 'epoch': 0.43} +2025-02-05 12:23:47 - ERROR - stderr - 14%|█▍ | 3240/22434 [2:16:07<13:35:50, 2.55s/it] +2025-02-05 12:23:50 - ERROR - stderr 
- 14%|█▍ | 3241/22434 [2:16:10<13:35:01, 2.55s/it] +2025-02-05 12:23:50 - ERROR - stderr - +2025-02-05 12:23:50 - ERROR - stderr - +2025-02-05 12:23:50 - INFO - stdout - {'loss': 1.0345, 'grad_norm': 1.1730974912643433, 'learning_rate': 1.932106674267327e-05, 'epoch': 0.43} +2025-02-05 12:23:50 - ERROR - stderr - 14%|█▍ | 3241/22434 [2:16:10<13:35:01, 2.55s/it] +2025-02-05 12:23:52 - ERROR - stderr - 14%|█▍ | 3242/22434 [2:16:12<13:26:32, 2.52s/it] +2025-02-05 12:23:52 - ERROR - stderr - +2025-02-05 12:23:52 - ERROR - stderr - +2025-02-05 12:23:52 - INFO - stdout - {'loss': 0.9948, 'grad_norm': 1.2496461868286133, 'learning_rate': 1.9320543743806936e-05, 'epoch': 0.43} +2025-02-05 12:23:52 - ERROR - stderr - 14%|█▍ | 3242/22434 [2:16:12<13:26:32, 2.52s/it] +2025-02-05 12:23:55 - ERROR - stderr - 14%|█▍ | 3243/22434 [2:16:15<13:36:27, 2.55s/it] +2025-02-05 12:23:55 - ERROR - stderr - +2025-02-05 12:23:55 - ERROR - stderr - +2025-02-05 12:23:55 - INFO - stdout - {'loss': 0.883, 'grad_norm': 1.0477992296218872, 'learning_rate': 1.932002055066279e-05, 'epoch': 0.43} +2025-02-05 12:23:55 - ERROR - stderr - 14%|█▍ | 3243/22434 [2:16:15<13:36:27, 2.55s/it] +2025-02-05 12:23:57 - ERROR - stderr - 14%|█▍ | 3244/22434 [2:16:17<13:26:33, 2.52s/it] +2025-02-05 12:23:57 - ERROR - stderr - +2025-02-05 12:23:57 - ERROR - stderr - +2025-02-05 12:23:57 - INFO - stdout - {'loss': 1.0263, 'grad_norm': 1.1289085149765015, 'learning_rate': 1.9319497163251728e-05, 'epoch': 0.43} +2025-02-05 12:23:57 - ERROR - stderr - 14%|█▍ | 3244/22434 [2:16:17<13:26:33, 2.52s/it] +2025-02-05 12:24:00 - ERROR - stderr - 14%|█▍ | 3245/22434 [2:16:20<13:27:00, 2.52s/it] +2025-02-05 12:24:00 - ERROR - stderr - +2025-02-05 12:24:00 - ERROR - stderr - +2025-02-05 12:24:00 - INFO - stdout - {'loss': 0.9508, 'grad_norm': 1.1679236888885498, 'learning_rate': 1.931897358158467e-05, 'epoch': 0.43} +2025-02-05 12:24:00 - ERROR - stderr - 14%|█▍ | 3245/22434 [2:16:20<13:27:00, 2.52s/it] +2025-02-05 12:24:02 - 
ERROR - stderr - 14%|█▍ | 3246/22434 [2:16:22<13:18:35, 2.50s/it] +2025-02-05 12:24:02 - ERROR - stderr - +2025-02-05 12:24:02 - ERROR - stderr - +2025-02-05 12:24:02 - INFO - stdout - {'loss': 0.9105, 'grad_norm': 1.1121903657913208, 'learning_rate': 1.9318449805672524e-05, 'epoch': 0.43} +2025-02-05 12:24:02 - ERROR - stderr - 14%|█▍ | 3246/22434 [2:16:22<13:18:35, 2.50s/it] +2025-02-05 12:24:05 - ERROR - stderr - 14%|█▍ | 3247/22434 [2:16:25<13:21:44, 2.51s/it] +2025-02-05 12:24:05 - ERROR - stderr - +2025-02-05 12:24:05 - ERROR - stderr - +2025-02-05 12:24:05 - INFO - stdout - {'loss': 1.154, 'grad_norm': 1.1972988843917847, 'learning_rate': 1.9317925835526206e-05, 'epoch': 0.43} +2025-02-05 12:24:05 - ERROR - stderr - 14%|█▍ | 3247/22434 [2:16:25<13:21:44, 2.51s/it] +2025-02-05 12:24:07 - ERROR - stderr - 14%|█▍ | 3248/22434 [2:16:27<13:13:11, 2.48s/it] +2025-02-05 12:24:07 - ERROR - stderr - +2025-02-05 12:24:07 - ERROR - stderr - +2025-02-05 12:24:07 - INFO - stdout - {'loss': 1.0495, 'grad_norm': 1.1384882926940918, 'learning_rate': 1.931740167115664e-05, 'epoch': 0.43} +2025-02-05 12:24:07 - ERROR - stderr - 14%|█▍ | 3248/22434 [2:16:27<13:13:11, 2.48s/it] +2025-02-05 12:24:10 - ERROR - stderr - 14%|█▍ | 3249/22434 [2:16:30<13:20:14, 2.50s/it] +2025-02-05 12:24:10 - ERROR - stderr - +2025-02-05 12:24:10 - ERROR - stderr - +2025-02-05 12:24:10 - INFO - stdout - {'loss': 1.1052, 'grad_norm': 1.253135085105896, 'learning_rate': 1.9316877312574756e-05, 'epoch': 0.43} +2025-02-05 12:24:10 - ERROR - stderr - 14%|█▍ | 3249/22434 [2:16:30<13:20:14, 2.50s/it] +2025-02-05 12:24:12 - ERROR - stderr - 14%|█▍ | 3250/22434 [2:16:32<13:14:56, 2.49s/it] +2025-02-05 12:24:12 - ERROR - stderr - +2025-02-05 12:24:12 - ERROR - stderr - +2025-02-05 12:24:12 - INFO - stdout - {'loss': 0.9202, 'grad_norm': 1.188148021697998, 'learning_rate': 1.931635275979148e-05, 'epoch': 0.43} +2025-02-05 12:24:12 - ERROR - stderr - 14%|█▍ | 3250/22434 [2:16:32<13:14:56, 2.49s/it] +2025-02-05 
12:24:15 - ERROR - stderr - 14%|█▍ | 3251/22434 [2:16:35<13:17:10, 2.49s/it] +2025-02-05 12:24:15 - ERROR - stderr - +2025-02-05 12:24:15 - ERROR - stderr - +2025-02-05 12:24:15 - INFO - stdout - {'loss': 1.0526, 'grad_norm': 1.1647379398345947, 'learning_rate': 1.9315828012817742e-05, 'epoch': 0.43} +2025-02-05 12:24:15 - ERROR - stderr - 14%|█▍ | 3251/22434 [2:16:35<13:17:10, 2.49s/it] +2025-02-05 12:24:17 - ERROR - stderr - 14%|█▍ | 3252/22434 [2:16:37<13:13:01, 2.48s/it] +2025-02-05 12:24:17 - ERROR - stderr - +2025-02-05 12:24:17 - ERROR - stderr - +2025-02-05 12:24:17 - INFO - stdout - {'loss': 1.0161, 'grad_norm': 1.0021169185638428, 'learning_rate': 1.9315303071664486e-05, 'epoch': 0.43} +2025-02-05 12:24:17 - ERROR - stderr - 14%|█▍ | 3252/22434 [2:16:37<13:13:01, 2.48s/it] +2025-02-05 12:24:20 - ERROR - stderr - 15%|█▍ | 3253/22434 [2:16:40<13:22:04, 2.51s/it] +2025-02-05 12:24:20 - ERROR - stderr - +2025-02-05 12:24:20 - ERROR - stderr - +2025-02-05 12:24:20 - INFO - stdout - {'loss': 0.9394, 'grad_norm': 1.1428781747817993, 'learning_rate': 1.9314777936342648e-05, 'epoch': 0.44} +2025-02-05 12:24:20 - ERROR - stderr - 15%|█▍ | 3253/22434 [2:16:40<13:22:04, 2.51s/it] +2025-02-05 12:24:22 - ERROR - stderr - 15%|█▍ | 3254/22434 [2:16:42<13:24:48, 2.52s/it] +2025-02-05 12:24:22 - ERROR - stderr - +2025-02-05 12:24:22 - ERROR - stderr - +2025-02-05 12:24:22 - INFO - stdout - {'loss': 1.0201, 'grad_norm': 0.9778270721435547, 'learning_rate': 1.931425260686318e-05, 'epoch': 0.44} +2025-02-05 12:24:22 - ERROR - stderr - 15%|█▍ | 3254/22434 [2:16:42<13:24:48, 2.52s/it] +2025-02-05 12:24:25 - ERROR - stderr - 15%|█▍ | 3255/22434 [2:16:45<13:24:01, 2.52s/it] +2025-02-05 12:24:25 - ERROR - stderr - +2025-02-05 12:24:25 - ERROR - stderr - +2025-02-05 12:24:25 - INFO - stdout - {'loss': 1.0505, 'grad_norm': 1.0943289995193481, 'learning_rate': 1.9313727083237028e-05, 'epoch': 0.44} +2025-02-05 12:24:25 - ERROR - stderr - 15%|█▍ | 3255/22434 [2:16:45<13:24:01, 
2.52s/it] +2025-02-05 12:24:27 - ERROR - stderr - 15%|█▍ | 3256/22434 [2:16:47<13:20:13, 2.50s/it] +2025-02-05 12:24:27 - ERROR - stderr - +2025-02-05 12:24:27 - ERROR - stderr - +2025-02-05 12:24:27 - INFO - stdout - {'loss': 0.9662, 'grad_norm': 1.1592936515808105, 'learning_rate': 1.9313201365475146e-05, 'epoch': 0.44} +2025-02-05 12:24:27 - ERROR - stderr - 15%|█▍ | 3256/22434 [2:16:47<13:20:13, 2.50s/it] +2025-02-05 12:24:30 - ERROR - stderr - 15%|█▍ | 3257/22434 [2:16:50<13:18:25, 2.50s/it] +2025-02-05 12:24:30 - ERROR - stderr - +2025-02-05 12:24:30 - ERROR - stderr - +2025-02-05 12:24:30 - INFO - stdout - {'loss': 0.9783, 'grad_norm': 1.143384575843811, 'learning_rate': 1.93126754535885e-05, 'epoch': 0.44} +2025-02-05 12:24:30 - ERROR - stderr - 15%|█▍ | 3257/22434 [2:16:50<13:18:25, 2.50s/it] +2025-02-05 12:24:32 - ERROR - stderr - 15%|█▍ | 3258/22434 [2:16:52<13:15:19, 2.49s/it] +2025-02-05 12:24:32 - ERROR - stderr - +2025-02-05 12:24:32 - ERROR - stderr - +2025-02-05 12:24:32 - INFO - stdout - {'loss': 0.9328, 'grad_norm': 1.1319254636764526, 'learning_rate': 1.9312149347588035e-05, 'epoch': 0.44} +2025-02-05 12:24:32 - ERROR - stderr - 15%|█▍ | 3258/22434 [2:16:52<13:15:19, 2.49s/it] +2025-02-05 12:24:35 - ERROR - stderr - 15%|█▍ | 3259/22434 [2:16:55<13:18:25, 2.50s/it] +2025-02-05 12:24:35 - ERROR - stderr - +2025-02-05 12:24:35 - ERROR - stderr - +2025-02-05 12:24:35 - INFO - stdout - {'loss': 0.9043, 'grad_norm': 0.9889384508132935, 'learning_rate': 1.9311623047484734e-05, 'epoch': 0.44} +2025-02-05 12:24:35 - ERROR - stderr - 15%|█▍ | 3259/22434 [2:16:55<13:18:25, 2.50s/it] +2025-02-05 12:24:37 - ERROR - stderr - 15%|█▍ | 3260/22434 [2:16:57<13:27:35, 2.53s/it] +2025-02-05 12:24:38 - ERROR - stderr - +2025-02-05 12:24:38 - ERROR - stderr - +2025-02-05 12:24:38 - INFO - stdout - {'loss': 1.0451, 'grad_norm': 1.17056143283844, 'learning_rate': 1.9311096553289563e-05, 'epoch': 0.44} +2025-02-05 12:24:38 - ERROR - stderr - 15%|█▍ | 3260/22434 
[2:16:57<13:27:35, 2.53s/it] +2025-02-05 12:24:40 - ERROR - stderr - 15%|█▍ | 3261/22434 [2:17:00<13:28:31, 2.53s/it] +2025-02-05 12:24:40 - ERROR - stderr - +2025-02-05 12:24:40 - ERROR - stderr - +2025-02-05 12:24:40 - INFO - stdout - {'loss': 0.8717, 'grad_norm': 0.9741213917732239, 'learning_rate': 1.9310569865013488e-05, 'epoch': 0.44} +2025-02-05 12:24:40 - ERROR - stderr - 15%|█▍ | 3261/22434 [2:17:00<13:28:31, 2.53s/it] +2025-02-05 12:24:43 - ERROR - stderr - 15%|█▍ | 3262/22434 [2:17:02<13:46:05, 2.59s/it] +2025-02-05 12:24:43 - ERROR - stderr - +2025-02-05 12:24:43 - ERROR - stderr - +2025-02-05 12:24:43 - INFO - stdout - {'loss': 0.9078, 'grad_norm': 1.2895677089691162, 'learning_rate': 1.9310042982667498e-05, 'epoch': 0.44} +2025-02-05 12:24:43 - ERROR - stderr - 15%|█▍ | 3262/22434 [2:17:03<13:46:05, 2.59s/it] +2025-02-05 12:24:45 - ERROR - stderr - 15%|█▍ | 3263/22434 [2:17:05<13:38:09, 2.56s/it] +2025-02-05 12:24:45 - ERROR - stderr - +2025-02-05 12:24:45 - ERROR - stderr - +2025-02-05 12:24:45 - INFO - stdout - {'loss': 0.9789, 'grad_norm': 1.124538540840149, 'learning_rate': 1.930951590626257e-05, 'epoch': 0.44} +2025-02-05 12:24:45 - ERROR - stderr - 15%|█▍ | 3263/22434 [2:17:05<13:38:09, 2.56s/it] +2025-02-05 12:24:48 - ERROR - stderr - 15%|█▍ | 3264/22434 [2:17:08<13:34:27, 2.55s/it] +2025-02-05 12:24:48 - ERROR - stderr - +2025-02-05 12:24:48 - ERROR - stderr - +2025-02-05 12:24:48 - INFO - stdout - {'loss': 0.917, 'grad_norm': 1.1540082693099976, 'learning_rate': 1.9308988635809688e-05, 'epoch': 0.44} +2025-02-05 12:24:48 - ERROR - stderr - 15%|█▍ | 3264/22434 [2:17:08<13:34:27, 2.55s/it] +2025-02-05 12:24:50 - ERROR - stderr - 15%|█▍ | 3265/22434 [2:17:10<13:20:28, 2.51s/it] +2025-02-05 12:24:50 - ERROR - stderr - +2025-02-05 12:24:50 - ERROR - stderr - +2025-02-05 12:24:50 - INFO - stdout - {'loss': 1.1223, 'grad_norm': 1.3382575511932373, 'learning_rate': 1.930846117131985e-05, 'epoch': 0.44} +2025-02-05 12:24:50 - ERROR - stderr - 15%|█▍ | 
3265/22434 [2:17:10<13:20:28, 2.51s/it] +2025-02-05 12:24:53 - ERROR - stderr - 15%|█▍ | 3266/22434 [2:17:12<13:23:27, 2.52s/it] +2025-02-05 12:24:53 - ERROR - stderr - +2025-02-05 12:24:53 - ERROR - stderr - +2025-02-05 12:24:53 - INFO - stdout - {'loss': 1.0398, 'grad_norm': 1.3084815740585327, 'learning_rate': 1.930793351280404e-05, 'epoch': 0.44} +2025-02-05 12:24:53 - ERROR - stderr - 15%|█▍ | 3266/22434 [2:17:12<13:23:27, 2.52s/it] +2025-02-05 12:24:55 - ERROR - stderr - 15%|█▍ | 3267/22434 [2:17:15<13:21:59, 2.51s/it] +2025-02-05 12:24:55 - ERROR - stderr - +2025-02-05 12:24:55 - ERROR - stderr - +2025-02-05 12:24:55 - INFO - stdout - {'loss': 0.9553, 'grad_norm': 1.111212968826294, 'learning_rate': 1.930740566027327e-05, 'epoch': 0.44} +2025-02-05 12:24:55 - ERROR - stderr - 15%|█▍ | 3267/22434 [2:17:15<13:21:59, 2.51s/it] +2025-02-05 12:24:58 - ERROR - stderr - 15%|█▍ | 3268/22434 [2:17:18<13:29:54, 2.54s/it] +2025-02-05 12:24:58 - ERROR - stderr - +2025-02-05 12:24:58 - ERROR - stderr - +2025-02-05 12:24:58 - INFO - stdout - {'loss': 0.9113, 'grad_norm': 1.0764597654342651, 'learning_rate': 1.9306877613738532e-05, 'epoch': 0.44} +2025-02-05 12:24:58 - ERROR - stderr - 15%|█▍ | 3268/22434 [2:17:18<13:29:54, 2.54s/it] +2025-02-05 12:25:00 - ERROR - stderr - 15%|█▍ | 3269/22434 [2:17:20<13:29:46, 2.54s/it] +2025-02-05 12:25:00 - ERROR - stderr - +2025-02-05 12:25:00 - ERROR - stderr - +2025-02-05 12:25:00 - INFO - stdout - {'loss': 1.0067, 'grad_norm': 1.0475043058395386, 'learning_rate': 1.9306349373210834e-05, 'epoch': 0.44} +2025-02-05 12:25:00 - ERROR - stderr - 15%|█▍ | 3269/22434 [2:17:20<13:29:46, 2.54s/it] +2025-02-05 12:25:03 - ERROR - stderr - 15%|█▍ | 3270/22434 [2:17:23<13:38:46, 2.56s/it] +2025-02-05 12:25:03 - ERROR - stderr - +2025-02-05 12:25:03 - ERROR - stderr - +2025-02-05 12:25:03 - INFO - stdout - {'loss': 1.0595, 'grad_norm': 1.1076101064682007, 'learning_rate': 1.9305820938701193e-05, 'epoch': 0.44} +2025-02-05 12:25:03 - ERROR - 
stderr - 15%|█▍ | 3270/22434 [2:17:23<13:38:46, 2.56s/it] +2025-02-05 12:25:06 - ERROR - stderr - 15%|█▍ | 3271/22434 [2:17:25<13:55:32, 2.62s/it] +2025-02-05 12:25:06 - ERROR - stderr - +2025-02-05 12:25:06 - ERROR - stderr - +2025-02-05 12:25:06 - INFO - stdout - {'loss': 0.8583, 'grad_norm': 1.059186577796936, 'learning_rate': 1.9305292310220614e-05, 'epoch': 0.44} +2025-02-05 12:25:06 - ERROR - stderr - 15%|█▍ | 3271/22434 [2:17:25<13:55:32, 2.62s/it] +2025-02-05 12:25:08 - ERROR - stderr - 15%|█▍ | 3272/22434 [2:17:28<13:39:54, 2.57s/it] +2025-02-05 12:25:08 - ERROR - stderr - +2025-02-05 12:25:08 - ERROR - stderr - +2025-02-05 12:25:08 - INFO - stdout - {'loss': 1.0495, 'grad_norm': 1.1533136367797852, 'learning_rate': 1.9304763487780125e-05, 'epoch': 0.44} +2025-02-05 12:25:08 - ERROR - stderr - 15%|█▍ | 3272/22434 [2:17:28<13:39:54, 2.57s/it] +2025-02-05 12:25:11 - ERROR - stderr - 15%|█▍ | 3273/22434 [2:17:30<13:32:39, 2.54s/it] +2025-02-05 12:25:11 - ERROR - stderr - +2025-02-05 12:25:11 - ERROR - stderr - +2025-02-05 12:25:11 - INFO - stdout - {'loss': 0.9711, 'grad_norm': 1.1041620969772339, 'learning_rate': 1.9304234471390742e-05, 'epoch': 0.44} +2025-02-05 12:25:11 - ERROR - stderr - 15%|█▍ | 3273/22434 [2:17:30<13:32:39, 2.54s/it] +2025-02-05 12:25:13 - ERROR - stderr - 15%|█▍ | 3274/22434 [2:17:33<13:32:21, 2.54s/it] +2025-02-05 12:25:13 - ERROR - stderr - +2025-02-05 12:25:13 - ERROR - stderr - +2025-02-05 12:25:13 - INFO - stdout - {'loss': 1.0145, 'grad_norm': 1.0773154497146606, 'learning_rate': 1.9303705261063496e-05, 'epoch': 0.44} +2025-02-05 12:25:13 - ERROR - stderr - 15%|█▍ | 3274/22434 [2:17:33<13:32:21, 2.54s/it] +2025-02-05 12:25:16 - ERROR - stderr - 15%|█▍ | 3275/22434 [2:17:35<13:21:27, 2.51s/it] +2025-02-05 12:25:16 - ERROR - stderr - +2025-02-05 12:25:16 - ERROR - stderr - +2025-02-05 12:25:16 - INFO - stdout - {'loss': 1.0048, 'grad_norm': 1.1122961044311523, 'learning_rate': 1.930317585680942e-05, 'epoch': 0.44} +2025-02-05 
12:25:16 - ERROR - stderr - 15%|█▍ | 3275/22434 [2:17:35<13:21:27, 2.51s/it] +2025-02-05 12:25:18 - ERROR - stderr - 15%|█▍ | 3276/22434 [2:17:38<13:21:46, 2.51s/it] +2025-02-05 12:25:18 - ERROR - stderr - +2025-02-05 12:25:18 - ERROR - stderr - +2025-02-05 12:25:18 - INFO - stdout - {'loss': 0.9541, 'grad_norm': 1.1464418172836304, 'learning_rate': 1.9302646258639538e-05, 'epoch': 0.44} +2025-02-05 12:25:18 - ERROR - stderr - 15%|█▍ | 3276/22434 [2:17:38<13:21:46, 2.51s/it] +2025-02-05 12:25:21 - ERROR - stderr - 15%|█▍ | 3277/22434 [2:17:40<13:28:54, 2.53s/it] +2025-02-05 12:25:21 - ERROR - stderr - +2025-02-05 12:25:21 - ERROR - stderr - +2025-02-05 12:25:21 - INFO - stdout - {'loss': 1.0563, 'grad_norm': 1.2643078565597534, 'learning_rate': 1.93021164665649e-05, 'epoch': 0.44} +2025-02-05 12:25:21 - ERROR - stderr - 15%|█▍ | 3277/22434 [2:17:41<13:28:54, 2.53s/it] +2025-02-05 12:25:23 - ERROR - stderr - 15%|█▍ | 3278/22434 [2:17:43<13:37:23, 2.56s/it] +2025-02-05 12:25:23 - ERROR - stderr - +2025-02-05 12:25:23 - ERROR - stderr - +2025-02-05 12:25:23 - INFO - stdout - {'loss': 0.9657, 'grad_norm': 1.1109564304351807, 'learning_rate': 1.9301586480596547e-05, 'epoch': 0.44} +2025-02-05 12:25:23 - ERROR - stderr - 15%|█▍ | 3278/22434 [2:17:43<13:37:23, 2.56s/it] +2025-02-05 12:25:26 - ERROR - stderr - 15%|█▍ | 3279/22434 [2:17:46<13:34:53, 2.55s/it] +2025-02-05 12:25:26 - ERROR - stderr - +2025-02-05 12:25:26 - ERROR - stderr - +2025-02-05 12:25:26 - INFO - stdout - {'loss': 0.8434, 'grad_norm': 1.0269380807876587, 'learning_rate': 1.9301056300745523e-05, 'epoch': 0.44} +2025-02-05 12:25:26 - ERROR - stderr - 15%|█▍ | 3279/22434 [2:17:46<13:34:53, 2.55s/it] +2025-02-05 12:25:28 - ERROR - stderr - 15%|█▍ | 3280/22434 [2:17:48<13:31:25, 2.54s/it] +2025-02-05 12:25:28 - ERROR - stderr - +2025-02-05 12:25:28 - ERROR - stderr - +2025-02-05 12:25:28 - INFO - stdout - {'loss': 0.9783, 'grad_norm': 1.0329316854476929, 'learning_rate': 1.930052592702288e-05, 'epoch': 0.44} 
+2025-02-05 12:25:28 - ERROR - stderr - 15%|█▍ | 3280/22434 [2:17:48<13:31:25, 2.54s/it] +2025-02-05 12:25:31 - ERROR - stderr - 15%|█▍ | 3281/22434 [2:17:51<13:27:23, 2.53s/it] +2025-02-05 12:25:31 - ERROR - stderr - +2025-02-05 12:25:31 - ERROR - stderr - +2025-02-05 12:25:31 - INFO - stdout - {'loss': 0.9532, 'grad_norm': 1.1038655042648315, 'learning_rate': 1.9299995359439672e-05, 'epoch': 0.44} +2025-02-05 12:25:31 - ERROR - stderr - 15%|█▍ | 3281/22434 [2:17:51<13:27:23, 2.53s/it] +2025-02-05 12:25:33 - ERROR - stderr - 15%|█▍ | 3282/22434 [2:17:53<13:18:29, 2.50s/it] +2025-02-05 12:25:33 - ERROR - stderr - +2025-02-05 12:25:33 - ERROR - stderr - +2025-02-05 12:25:33 - INFO - stdout - {'loss': 0.9495, 'grad_norm': 1.0996888875961304, 'learning_rate': 1.9299464598006964e-05, 'epoch': 0.44} +2025-02-05 12:25:33 - ERROR - stderr - 15%|█▍ | 3282/22434 [2:17:53<13:18:29, 2.50s/it] +2025-02-05 12:25:36 - ERROR - stderr - 15%|█▍ | 3283/22434 [2:17:56<13:13:06, 2.48s/it] +2025-02-05 12:25:36 - ERROR - stderr - +2025-02-05 12:25:36 - ERROR - stderr - +2025-02-05 12:25:36 - INFO - stdout - {'loss': 1.044, 'grad_norm': 1.1291358470916748, 'learning_rate': 1.9298933642735817e-05, 'epoch': 0.44} +2025-02-05 12:25:36 - ERROR - stderr - 15%|█▍ | 3283/22434 [2:17:56<13:13:06, 2.48s/it] +2025-02-05 12:25:38 - ERROR - stderr - 15%|█▍ | 3284/22434 [2:17:58<13:09:24, 2.47s/it] +2025-02-05 12:25:38 - ERROR - stderr - +2025-02-05 12:25:38 - ERROR - stderr - +2025-02-05 12:25:38 - INFO - stdout - {'loss': 1.0118, 'grad_norm': 1.1658300161361694, 'learning_rate': 1.929840249363729e-05, 'epoch': 0.44} +2025-02-05 12:25:38 - ERROR - stderr - 15%|█▍ | 3284/22434 [2:17:58<13:09:24, 2.47s/it] +2025-02-05 12:25:41 - ERROR - stderr - 15%|█▍ | 3285/22434 [2:18:00<13:10:46, 2.48s/it] +2025-02-05 12:25:41 - ERROR - stderr - +2025-02-05 12:25:41 - ERROR - stderr - +2025-02-05 12:25:41 - INFO - stdout - {'loss': 1.1532, 'grad_norm': 1.310865879058838, 'learning_rate': 1.9297871150722463e-05, 
'epoch': 0.44} +2025-02-05 12:25:41 - ERROR - stderr - 15%|█▍ | 3285/22434 [2:18:00<13:10:46, 2.48s/it] +2025-02-05 12:25:43 - ERROR - stderr - 15%|█▍ | 3286/22434 [2:18:03<13:07:10, 2.47s/it] +2025-02-05 12:25:43 - ERROR - stderr - +2025-02-05 12:25:43 - ERROR - stderr - +2025-02-05 12:25:43 - INFO - stdout - {'loss': 0.9064, 'grad_norm': 1.1090534925460815, 'learning_rate': 1.9297339614002412e-05, 'epoch': 0.44} +2025-02-05 12:25:43 - ERROR - stderr - 15%|█▍ | 3286/22434 [2:18:03<13:07:10, 2.47s/it] +2025-02-05 12:25:46 - ERROR - stderr - 15%|█▍ | 3287/22434 [2:18:05<13:03:33, 2.46s/it] +2025-02-05 12:25:46 - ERROR - stderr - +2025-02-05 12:25:46 - ERROR - stderr - +2025-02-05 12:25:46 - INFO - stdout - {'loss': 0.9505, 'grad_norm': 1.0887482166290283, 'learning_rate': 1.929680788348821e-05, 'epoch': 0.44} +2025-02-05 12:25:46 - ERROR - stderr - 15%|█▍ | 3287/22434 [2:18:05<13:03:33, 2.46s/it] +2025-02-05 12:25:48 - ERROR - stderr - 15%|█▍ | 3288/22434 [2:18:08<13:13:32, 2.49s/it] +2025-02-05 12:25:48 - ERROR - stderr - +2025-02-05 12:25:48 - ERROR - stderr - +2025-02-05 12:25:48 - INFO - stdout - {'loss': 1.0883, 'grad_norm': 1.1095621585845947, 'learning_rate': 1.9296275959190943e-05, 'epoch': 0.44} +2025-02-05 12:25:48 - ERROR - stderr - 15%|█▍ | 3288/22434 [2:18:08<13:13:32, 2.49s/it] +2025-02-05 12:25:51 - ERROR - stderr - 15%|█▍ | 3289/22434 [2:18:10<13:17:16, 2.50s/it] +2025-02-05 12:25:51 - ERROR - stderr - +2025-02-05 12:25:51 - ERROR - stderr - +2025-02-05 12:25:51 - INFO - stdout - {'loss': 1.0304, 'grad_norm': 1.1983813047409058, 'learning_rate': 1.92957438411217e-05, 'epoch': 0.44} +2025-02-05 12:25:51 - ERROR - stderr - 15%|█▍ | 3289/22434 [2:18:10<13:17:16, 2.50s/it] +2025-02-05 12:25:53 - ERROR - stderr - 15%|█▍ | 3290/22434 [2:18:13<13:27:52, 2.53s/it] +2025-02-05 12:25:53 - ERROR - stderr - +2025-02-05 12:25:53 - ERROR - stderr - +2025-02-05 12:25:53 - INFO - stdout - {'loss': 0.9279, 'grad_norm': 0.9838898777961731, 'learning_rate': 
1.9295211529291574e-05, 'epoch': 0.44} +2025-02-05 12:25:53 - ERROR - stderr - 15%|█▍ | 3290/22434 [2:18:13<13:27:52, 2.53s/it] +2025-02-05 12:25:56 - ERROR - stderr - 15%|█▍ | 3291/22434 [2:18:16<13:45:26, 2.59s/it] +2025-02-05 12:25:56 - ERROR - stderr - +2025-02-05 12:25:56 - ERROR - stderr - +2025-02-05 12:25:56 - INFO - stdout - {'loss': 1.029, 'grad_norm': 1.1803792715072632, 'learning_rate': 1.9294679023711653e-05, 'epoch': 0.44} +2025-02-05 12:25:56 - ERROR - stderr - 15%|█▍ | 3291/22434 [2:18:16<13:45:26, 2.59s/it] +2025-02-05 12:25:58 - ERROR - stderr - 15%|█▍ | 3292/22434 [2:18:18<13:37:52, 2.56s/it] +2025-02-05 12:25:59 - ERROR - stderr - +2025-02-05 12:25:59 - ERROR - stderr - +2025-02-05 12:25:59 - INFO - stdout - {'loss': 1.0151, 'grad_norm': 1.231771469116211, 'learning_rate': 1.9294146324393047e-05, 'epoch': 0.44} +2025-02-05 12:25:59 - ERROR - stderr - 15%|█▍ | 3292/22434 [2:18:18<13:37:52, 2.56s/it] +2025-02-05 12:26:01 - ERROR - stderr - 15%|█▍ | 3293/22434 [2:18:21<13:34:37, 2.55s/it] +2025-02-05 12:26:01 - ERROR - stderr - +2025-02-05 12:26:01 - ERROR - stderr - +2025-02-05 12:26:01 - INFO - stdout - {'loss': 0.9662, 'grad_norm': 1.0567387342453003, 'learning_rate': 1.9293613431346853e-05, 'epoch': 0.44} +2025-02-05 12:26:01 - ERROR - stderr - 15%|█▍ | 3293/22434 [2:18:21<13:34:37, 2.55s/it] +2025-02-05 12:26:03 - ERROR - stderr - 15%|█▍ | 3294/22434 [2:18:23<13:28:21, 2.53s/it] +2025-02-05 12:26:04 - ERROR - stderr - +2025-02-05 12:26:04 - ERROR - stderr - +2025-02-05 12:26:04 - INFO - stdout - {'loss': 0.8648, 'grad_norm': 0.9989197254180908, 'learning_rate': 1.929308034458418e-05, 'epoch': 0.44} +2025-02-05 12:26:04 - ERROR - stderr - 15%|█▍ | 3294/22434 [2:18:23<13:28:21, 2.53s/it] +2025-02-05 12:26:06 - ERROR - stderr - 15%|█▍ | 3295/22434 [2:18:26<13:26:03, 2.53s/it] +2025-02-05 12:26:06 - ERROR - stderr - +2025-02-05 12:26:06 - ERROR - stderr - +2025-02-05 12:26:06 - INFO - stdout - {'loss': 1.0167, 'grad_norm': 1.073473334312439, 
'learning_rate': 1.929254706411614e-05, 'epoch': 0.44} +2025-02-05 12:26:06 - ERROR - stderr - 15%|█▍ | 3295/22434 [2:18:26<13:26:03, 2.53s/it] +2025-02-05 12:26:08 - ERROR - stderr - 15%|█▍ | 3296/22434 [2:18:28<13:19:30, 2.51s/it] +2025-02-05 12:26:09 - ERROR - stderr - +2025-02-05 12:26:09 - ERROR - stderr - +2025-02-05 12:26:09 - INFO - stdout - {'loss': 0.991, 'grad_norm': 1.144068956375122, 'learning_rate': 1.9292013589953847e-05, 'epoch': 0.44} +2025-02-05 12:26:09 - ERROR - stderr - 15%|█▍ | 3296/22434 [2:18:28<13:19:30, 2.51s/it] +2025-02-05 12:26:11 - ERROR - stderr - 15%|█▍ | 3297/22434 [2:18:31<13:29:23, 2.54s/it] +2025-02-05 12:26:11 - ERROR - stderr - +2025-02-05 12:26:11 - ERROR - stderr - +2025-02-05 12:26:11 - INFO - stdout - {'loss': 0.9076, 'grad_norm': 1.1017115116119385, 'learning_rate': 1.929147992210842e-05, 'epoch': 0.44} +2025-02-05 12:26:11 - ERROR - stderr - 15%|█▍ | 3297/22434 [2:18:31<13:29:23, 2.54s/it] +2025-02-05 12:26:14 - ERROR - stderr - 15%|█▍ | 3298/22434 [2:18:33<13:21:56, 2.51s/it] +2025-02-05 12:26:14 - ERROR - stderr - +2025-02-05 12:26:14 - ERROR - stderr - +2025-02-05 12:26:14 - INFO - stdout - {'loss': 1.0691, 'grad_norm': 1.3337068557739258, 'learning_rate': 1.9290946060590992e-05, 'epoch': 0.44} +2025-02-05 12:26:14 - ERROR - stderr - 15%|█▍ | 3298/22434 [2:18:33<13:21:56, 2.51s/it] +2025-02-05 12:26:16 - ERROR - stderr - 15%|█▍ | 3299/22434 [2:18:36<13:21:46, 2.51s/it] +2025-02-05 12:26:16 - ERROR - stderr - +2025-02-05 12:26:16 - ERROR - stderr - +2025-02-05 12:26:16 - INFO - stdout - {'loss': 0.9559, 'grad_norm': 1.0668264627456665, 'learning_rate': 1.9290412005412676e-05, 'epoch': 0.44} +2025-02-05 12:26:16 - ERROR - stderr - 15%|█▍ | 3299/22434 [2:18:36<13:21:46, 2.51s/it] +2025-02-05 12:26:19 - ERROR - stderr - 15%|█▍ | 3300/22434 [2:18:38<13:23:13, 2.52s/it] +2025-02-05 12:26:19 - ERROR - stderr - +2025-02-05 12:26:19 - ERROR - stderr - +2025-02-05 12:26:19 - INFO - stdout - {'loss': 0.8681, 'grad_norm': 
0.9895573258399963, 'learning_rate': 1.9289877756584618e-05, 'epoch': 0.44} +2025-02-05 12:26:19 - ERROR - stderr - 15%|█▍ | 3300/22434 [2:18:38<13:23:13, 2.52s/it] +2025-02-05 12:26:21 - ERROR - stderr - 15%|█▍ | 3301/22434 [2:18:41<13:19:26, 2.51s/it] +2025-02-05 12:26:21 - ERROR - stderr - +2025-02-05 12:26:21 - ERROR - stderr - +2025-02-05 12:26:21 - INFO - stdout - {'loss': 1.0162, 'grad_norm': 1.1195969581604004, 'learning_rate': 1.9289343314117946e-05, 'epoch': 0.44} +2025-02-05 12:26:21 - ERROR - stderr - 15%|█▍ | 3301/22434 [2:18:41<13:19:26, 2.51s/it] +2025-02-05 12:26:24 - ERROR - stderr - 15%|█▍ | 3302/22434 [2:18:43<13:15:22, 2.49s/it] +2025-02-05 12:26:24 - ERROR - stderr - +2025-02-05 12:26:24 - ERROR - stderr - +2025-02-05 12:26:24 - INFO - stdout - {'loss': 1.0334, 'grad_norm': 1.0682613849639893, 'learning_rate': 1.92888086780238e-05, 'epoch': 0.44} +2025-02-05 12:26:24 - ERROR - stderr - 15%|█▍ | 3302/22434 [2:18:43<13:15:22, 2.49s/it] +2025-02-05 12:26:26 - ERROR - stderr - 15%|█▍ | 3303/22434 [2:18:46<13:29:14, 2.54s/it] +2025-02-05 12:26:26 - ERROR - stderr - +2025-02-05 12:26:26 - ERROR - stderr - +2025-02-05 12:26:26 - INFO - stdout - {'loss': 1.0388, 'grad_norm': 1.1326122283935547, 'learning_rate': 1.9288273848313325e-05, 'epoch': 0.44} +2025-02-05 12:26:26 - ERROR - stderr - 15%|█▍ | 3303/22434 [2:18:46<13:29:14, 2.54s/it] +2025-02-05 12:26:29 - ERROR - stderr - 15%|█▍ | 3304/22434 [2:18:48<13:29:12, 2.54s/it] +2025-02-05 12:26:29 - ERROR - stderr - +2025-02-05 12:26:29 - ERROR - stderr - +2025-02-05 12:26:29 - INFO - stdout - {'loss': 1.0486, 'grad_norm': 1.0917998552322388, 'learning_rate': 1.9287738824997672e-05, 'epoch': 0.44} +2025-02-05 12:26:29 - ERROR - stderr - 15%|█▍ | 3304/22434 [2:18:49<13:29:12, 2.54s/it] +2025-02-05 12:26:31 - ERROR - stderr - 15%|█▍ | 3305/22434 [2:18:51<13:23:06, 2.52s/it] +2025-02-05 12:26:31 - ERROR - stderr - +2025-02-05 12:26:31 - ERROR - stderr - +2025-02-05 12:26:31 - INFO - stdout - {'loss': 1.0108, 
'grad_norm': 1.100752353668213, 'learning_rate': 1.9287203608087987e-05, 'epoch': 0.44} +2025-02-05 12:26:31 - ERROR - stderr - 15%|█▍ | 3305/22434 [2:18:51<13:23:06, 2.52s/it] +2025-02-05 12:26:34 - ERROR - stderr - 15%|█▍ | 3306/22434 [2:18:54<13:34:42, 2.56s/it] +2025-02-05 12:26:34 - ERROR - stderr - +2025-02-05 12:26:34 - ERROR - stderr - +2025-02-05 12:26:34 - INFO - stdout - {'loss': 0.9483, 'grad_norm': 1.1760727167129517, 'learning_rate': 1.928666819759543e-05, 'epoch': 0.44} +2025-02-05 12:26:34 - ERROR - stderr - 15%|█▍ | 3306/22434 [2:18:54<13:34:42, 2.56s/it] +2025-02-05 12:26:36 - ERROR - stderr - 15%|█▍ | 3307/22434 [2:18:56<13:32:43, 2.55s/it] +2025-02-05 12:26:36 - ERROR - stderr - +2025-02-05 12:26:36 - ERROR - stderr - +2025-02-05 12:26:36 - INFO - stdout - {'loss': 0.9873, 'grad_norm': 1.0925190448760986, 'learning_rate': 1.9286132593531167e-05, 'epoch': 0.44} +2025-02-05 12:26:36 - ERROR - stderr - 15%|█▍ | 3307/22434 [2:18:56<13:32:43, 2.55s/it] +2025-02-05 12:26:39 - ERROR - stderr - 15%|█▍ | 3308/22434 [2:18:59<13:42:14, 2.58s/it] +2025-02-05 12:26:39 - ERROR - stderr - +2025-02-05 12:26:39 - ERROR - stderr - +2025-02-05 12:26:39 - INFO - stdout - {'loss': 0.9629, 'grad_norm': 1.2291847467422485, 'learning_rate': 1.9285596795906353e-05, 'epoch': 0.44} +2025-02-05 12:26:39 - ERROR - stderr - 15%|█▍ | 3308/22434 [2:18:59<13:42:14, 2.58s/it] +2025-02-05 12:26:42 - ERROR - stderr - 15%|█▍ | 3309/22434 [2:19:02<14:14:05, 2.68s/it] +2025-02-05 12:26:42 - ERROR - stderr - +2025-02-05 12:26:42 - ERROR - stderr - +2025-02-05 12:26:42 - INFO - stdout - {'loss': 0.9286, 'grad_norm': 1.081689476966858, 'learning_rate': 1.928506080473216e-05, 'epoch': 0.44} +2025-02-05 12:26:42 - ERROR - stderr - 15%|█▍ | 3309/22434 [2:19:02<14:14:05, 2.68s/it] +2025-02-05 12:26:44 - ERROR - stderr - 15%|█▍ | 3310/22434 [2:19:04<13:52:55, 2.61s/it] +2025-02-05 12:26:44 - ERROR - stderr - +2025-02-05 12:26:44 - ERROR - stderr - +2025-02-05 12:26:44 - INFO - stdout - 
{'loss': 1.0137, 'grad_norm': 1.132133960723877, 'learning_rate': 1.9284524620019756e-05, 'epoch': 0.44} +2025-02-05 12:26:44 - ERROR - stderr - 15%|█▍ | 3310/22434 [2:19:04<13:52:55, 2.61s/it] +2025-02-05 12:26:47 - ERROR - stderr - 15%|█▍ | 3311/22434 [2:19:07<13:38:56, 2.57s/it] +2025-02-05 12:26:47 - ERROR - stderr - +2025-02-05 12:26:47 - ERROR - stderr - +2025-02-05 12:26:47 - INFO - stdout - {'loss': 0.9395, 'grad_norm': 1.086695909500122, 'learning_rate': 1.928398824178032e-05, 'epoch': 0.44} +2025-02-05 12:26:47 - ERROR - stderr - 15%|█▍ | 3311/22434 [2:19:07<13:38:56, 2.57s/it] +2025-02-05 12:26:49 - ERROR - stderr - 15%|█▍ | 3312/22434 [2:19:09<13:33:24, 2.55s/it] +2025-02-05 12:26:49 - ERROR - stderr - +2025-02-05 12:26:49 - ERROR - stderr - +2025-02-05 12:26:49 - INFO - stdout - {'loss': 1.0323, 'grad_norm': 1.1986316442489624, 'learning_rate': 1.9283451670025035e-05, 'epoch': 0.44} +2025-02-05 12:26:49 - ERROR - stderr - 15%|█▍ | 3312/22434 [2:19:09<13:33:24, 2.55s/it] +2025-02-05 12:26:52 - ERROR - stderr - 15%|█▍ | 3313/22434 [2:19:12<13:26:24, 2.53s/it] +2025-02-05 12:26:52 - ERROR - stderr - +2025-02-05 12:26:52 - ERROR - stderr - +2025-02-05 12:26:52 - INFO - stdout - {'loss': 1.0116, 'grad_norm': 1.0736405849456787, 'learning_rate': 1.9282914904765083e-05, 'epoch': 0.44} +2025-02-05 12:26:52 - ERROR - stderr - 15%|█▍ | 3313/22434 [2:19:12<13:26:24, 2.53s/it] +2025-02-05 12:26:54 - ERROR - stderr - 15%|█▍ | 3314/22434 [2:19:14<13:18:18, 2.51s/it] +2025-02-05 12:26:54 - ERROR - stderr - +2025-02-05 12:26:54 - ERROR - stderr - +2025-02-05 12:26:54 - INFO - stdout - {'loss': 0.9197, 'grad_norm': 1.133349061012268, 'learning_rate': 1.928237794601165e-05, 'epoch': 0.44} +2025-02-05 12:26:54 - ERROR - stderr - 15%|█▍ | 3314/22434 [2:19:14<13:18:18, 2.51s/it] +2025-02-05 12:26:57 - ERROR - stderr - 15%|█▍ | 3315/22434 [2:19:16<13:10:24, 2.48s/it] +2025-02-05 12:26:57 - ERROR - stderr - +2025-02-05 12:26:57 - ERROR - stderr - +2025-02-05 12:26:57 - INFO 
- stdout - {'loss': 0.8588, 'grad_norm': 1.0145351886749268, 'learning_rate': 1.928184079377594e-05, 'epoch': 0.44} +2025-02-05 12:26:57 - ERROR - stderr - 15%|█▍ | 3315/22434 [2:19:17<13:10:24, 2.48s/it] +2025-02-05 12:26:59 - ERROR - stderr - 15%|█▍ | 3316/22434 [2:19:19<13:16:47, 2.50s/it] +2025-02-05 12:26:59 - ERROR - stderr - +2025-02-05 12:26:59 - ERROR - stderr - +2025-02-05 12:26:59 - INFO - stdout - {'loss': 0.9751, 'grad_norm': 1.0097167491912842, 'learning_rate': 1.9281303448069132e-05, 'epoch': 0.44} +2025-02-05 12:26:59 - ERROR - stderr - 15%|█▍ | 3316/22434 [2:19:19<13:16:47, 2.50s/it] +2025-02-05 12:27:02 - ERROR - stderr - 15%|█▍ | 3317/22434 [2:19:22<13:16:01, 2.50s/it] +2025-02-05 12:27:02 - ERROR - stderr - +2025-02-05 12:27:02 - ERROR - stderr - +2025-02-05 12:27:02 - INFO - stdout - {'loss': 0.9229, 'grad_norm': 1.193129539489746, 'learning_rate': 1.9280765908902437e-05, 'epoch': 0.44} +2025-02-05 12:27:02 - ERROR - stderr - 15%|█▍ | 3317/22434 [2:19:22<13:16:01, 2.50s/it] +2025-02-05 12:27:04 - ERROR - stderr - 15%|█▍ | 3318/22434 [2:19:24<13:10:17, 2.48s/it] +2025-02-05 12:27:04 - ERROR - stderr - +2025-02-05 12:27:04 - ERROR - stderr - +2025-02-05 12:27:04 - INFO - stdout - {'loss': 0.9527, 'grad_norm': 1.1657564640045166, 'learning_rate': 1.9280228176287057e-05, 'epoch': 0.44} +2025-02-05 12:27:04 - ERROR - stderr - 15%|█▍ | 3318/22434 [2:19:24<13:10:17, 2.48s/it] +2025-02-05 12:27:07 - ERROR - stderr - 15%|█▍ | 3319/22434 [2:19:26<13:12:25, 2.49s/it] +2025-02-05 12:27:07 - ERROR - stderr - +2025-02-05 12:27:07 - ERROR - stderr - +2025-02-05 12:27:07 - INFO - stdout - {'loss': 1.0438, 'grad_norm': 1.0933988094329834, 'learning_rate': 1.92796902502342e-05, 'epoch': 0.44} +2025-02-05 12:27:07 - ERROR - stderr - 15%|█▍ | 3319/22434 [2:19:26<13:12:25, 2.49s/it] +2025-02-05 12:27:09 - ERROR - stderr - 15%|█▍ | 3320/22434 [2:19:29<13:14:47, 2.49s/it] +2025-02-05 12:27:09 - ERROR - stderr - +2025-02-05 12:27:09 - ERROR - stderr - +2025-02-05 
12:27:09 - INFO - stdout - {'loss': 0.9411, 'grad_norm': 1.2894423007965088, 'learning_rate': 1.9279152130755082e-05, 'epoch': 0.44} +2025-02-05 12:27:09 - ERROR - stderr - 15%|█▍ | 3320/22434 [2:19:29<13:14:47, 2.49s/it] +2025-02-05 12:27:12 - ERROR - stderr - 15%|█▍ | 3321/22434 [2:19:32<13:40:21, 2.58s/it] +2025-02-05 12:27:12 - ERROR - stderr - +2025-02-05 12:27:12 - ERROR - stderr - +2025-02-05 12:27:12 - INFO - stdout - {'loss': 0.9331, 'grad_norm': 1.0571297407150269, 'learning_rate': 1.9278613817860917e-05, 'epoch': 0.44} +2025-02-05 12:27:12 - ERROR - stderr - 15%|█▍ | 3321/22434 [2:19:32<13:40:21, 2.58s/it] +2025-02-05 12:27:14 - ERROR - stderr - 15%|█▍ | 3322/22434 [2:19:34<13:34:44, 2.56s/it] +2025-02-05 12:27:15 - ERROR - stderr - +2025-02-05 12:27:15 - ERROR - stderr - +2025-02-05 12:27:15 - INFO - stdout - {'loss': 0.88, 'grad_norm': 1.0643575191497803, 'learning_rate': 1.9278075311562922e-05, 'epoch': 0.44} +2025-02-05 12:27:15 - ERROR - stderr - 15%|█▍ | 3322/22434 [2:19:34<13:34:44, 2.56s/it] +2025-02-05 12:27:17 - ERROR - stderr - 15%|█▍ | 3323/22434 [2:19:37<13:24:29, 2.53s/it] +2025-02-05 12:27:17 - ERROR - stderr - +2025-02-05 12:27:17 - ERROR - stderr - +2025-02-05 12:27:17 - INFO - stdout - {'loss': 0.9576, 'grad_norm': 1.0989140272140503, 'learning_rate': 1.9277536611872327e-05, 'epoch': 0.44} +2025-02-05 12:27:17 - ERROR - stderr - 15%|█▍ | 3323/22434 [2:19:37<13:24:29, 2.53s/it] +2025-02-05 12:27:19 - ERROR - stderr - 15%|█▍ | 3324/22434 [2:19:39<13:21:54, 2.52s/it] +2025-02-05 12:27:19 - ERROR - stderr - +2025-02-05 12:27:19 - ERROR - stderr - +2025-02-05 12:27:19 - INFO - stdout - {'loss': 1.0726, 'grad_norm': 1.154719591140747, 'learning_rate': 1.9276997718800362e-05, 'epoch': 0.44} +2025-02-05 12:27:19 - ERROR - stderr - 15%|█▍ | 3324/22434 [2:19:39<13:21:54, 2.52s/it] +2025-02-05 12:27:22 - ERROR - stderr - 15%|█▍ | 3325/22434 [2:19:42<13:40:31, 2.58s/it] +2025-02-05 12:27:22 - ERROR - stderr - +2025-02-05 12:27:22 - ERROR - stderr - 
+2025-02-05 12:27:22 - INFO - stdout - {'loss': 1.0416, 'grad_norm': 1.1565909385681152, 'learning_rate': 1.9276458632358253e-05, 'epoch': 0.44} +2025-02-05 12:27:22 - ERROR - stderr - 15%|█▍ | 3325/22434 [2:19:42<13:40:31, 2.58s/it] +2025-02-05 12:27:25 - ERROR - stderr - 15%|█▍ | 3326/22434 [2:19:44<13:30:49, 2.55s/it] +2025-02-05 12:27:25 - ERROR - stderr - +2025-02-05 12:27:25 - ERROR - stderr - +2025-02-05 12:27:25 - INFO - stdout - {'loss': 0.9912, 'grad_norm': 1.086600422859192, 'learning_rate': 1.9275919352557242e-05, 'epoch': 0.44} +2025-02-05 12:27:25 - ERROR - stderr - 15%|█▍ | 3326/22434 [2:19:44<13:30:49, 2.55s/it] +2025-02-05 12:27:27 - ERROR - stderr - 15%|█▍ | 3327/22434 [2:19:47<13:34:31, 2.56s/it] +2025-02-05 12:27:27 - ERROR - stderr - +2025-02-05 12:27:27 - ERROR - stderr - +2025-02-05 12:27:27 - INFO - stdout - {'loss': 1.0216, 'grad_norm': 1.1150155067443848, 'learning_rate': 1.927537987940857e-05, 'epoch': 0.44} +2025-02-05 12:27:27 - ERROR - stderr - 15%|█▍ | 3327/22434 [2:19:47<13:34:31, 2.56s/it] +2025-02-05 12:27:30 - ERROR - stderr - 15%|█▍ | 3328/22434 [2:19:49<13:21:52, 2.52s/it] +2025-02-05 12:27:30 - ERROR - stderr - +2025-02-05 12:27:30 - ERROR - stderr - +2025-02-05 12:27:30 - INFO - stdout - {'loss': 0.9119, 'grad_norm': 1.0587728023529053, 'learning_rate': 1.9274840212923476e-05, 'epoch': 0.45} +2025-02-05 12:27:30 - ERROR - stderr - 15%|█▍ | 3328/22434 [2:19:49<13:21:52, 2.52s/it] +2025-02-05 12:27:32 - ERROR - stderr - 15%|█▍ | 3329/22434 [2:19:52<13:15:14, 2.50s/it] +2025-02-05 12:27:32 - ERROR - stderr - +2025-02-05 12:27:32 - ERROR - stderr - +2025-02-05 12:27:32 - INFO - stdout - {'loss': 0.9733, 'grad_norm': 1.2374671697616577, 'learning_rate': 1.9274300353113212e-05, 'epoch': 0.45} +2025-02-05 12:27:32 - ERROR - stderr - 15%|█▍ | 3329/22434 [2:19:52<13:15:14, 2.50s/it] +2025-02-05 12:27:35 - ERROR - stderr - 15%|█▍ | 3330/22434 [2:19:54<13:11:39, 2.49s/it] +2025-02-05 12:27:35 - ERROR - stderr - +2025-02-05 12:27:35 - 
ERROR - stderr - +2025-02-05 12:27:35 - INFO - stdout - {'loss': 1.0323, 'grad_norm': 1.161790132522583, 'learning_rate': 1.9273760299989036e-05, 'epoch': 0.45} +2025-02-05 12:27:35 - ERROR - stderr - 15%|█▍ | 3330/22434 [2:19:54<13:11:39, 2.49s/it] +2025-02-05 12:27:37 - ERROR - stderr - 15%|█▍ | 3331/22434 [2:19:57<13:15:23, 2.50s/it] +2025-02-05 12:27:37 - ERROR - stderr - +2025-02-05 12:27:37 - ERROR - stderr - +2025-02-05 12:27:37 - INFO - stdout - {'loss': 0.9728, 'grad_norm': 1.1512707471847534, 'learning_rate': 1.92732200535622e-05, 'epoch': 0.45} +2025-02-05 12:27:37 - ERROR - stderr - 15%|█▍ | 3331/22434 [2:19:57<13:15:23, 2.50s/it] +2025-02-05 12:27:40 - ERROR - stderr - 15%|█▍ | 3332/22434 [2:19:59<13:25:52, 2.53s/it] +2025-02-05 12:27:40 - ERROR - stderr - +2025-02-05 12:27:40 - ERROR - stderr - +2025-02-05 12:27:40 - INFO - stdout - {'loss': 0.963, 'grad_norm': 1.2696303129196167, 'learning_rate': 1.9272679613843962e-05, 'epoch': 0.45} +2025-02-05 12:27:40 - ERROR - stderr - 15%|█▍ | 3332/22434 [2:19:59<13:25:52, 2.53s/it] +2025-02-05 12:27:42 - ERROR - stderr - 15%|█▍ | 3333/22434 [2:20:02<13:26:51, 2.53s/it] +2025-02-05 12:27:42 - ERROR - stderr - +2025-02-05 12:27:42 - ERROR - stderr - +2025-02-05 12:27:42 - INFO - stdout - {'loss': 1.074, 'grad_norm': 1.1581220626831055, 'learning_rate': 1.9272138980845595e-05, 'epoch': 0.45} +2025-02-05 12:27:42 - ERROR - stderr - 15%|█▍ | 3333/22434 [2:20:02<13:26:51, 2.53s/it] +2025-02-05 12:27:45 - ERROR - stderr - 15%|█▍ | 3334/22434 [2:20:05<13:43:03, 2.59s/it] +2025-02-05 12:27:45 - ERROR - stderr - +2025-02-05 12:27:45 - ERROR - stderr - +2025-02-05 12:27:45 - INFO - stdout - {'loss': 1.125, 'grad_norm': 1.1378134489059448, 'learning_rate': 1.927159815457836e-05, 'epoch': 0.45} +2025-02-05 12:27:45 - ERROR - stderr - 15%|█▍ | 3334/22434 [2:20:05<13:43:03, 2.59s/it] +2025-02-05 12:27:47 - ERROR - stderr - 15%|█▍ | 3335/22434 [2:20:07<13:29:27, 2.54s/it] +2025-02-05 12:27:47 - ERROR - stderr - +2025-02-05 
12:27:47 - ERROR - stderr - +2025-02-05 12:27:47 - INFO - stdout - {'loss': 0.9902, 'grad_norm': 1.1823351383209229, 'learning_rate': 1.9271057135053537e-05, 'epoch': 0.45} +2025-02-05 12:27:47 - ERROR - stderr - 15%|█▍ | 3335/22434 [2:20:07<13:29:27, 2.54s/it] +2025-02-05 12:27:50 - ERROR - stderr - 15%|█▍ | 3336/22434 [2:20:10<13:22:58, 2.52s/it] +2025-02-05 12:27:50 - ERROR - stderr - +2025-02-05 12:27:50 - ERROR - stderr - +2025-02-05 12:27:50 - INFO - stdout - {'loss': 0.908, 'grad_norm': 0.9492054581642151, 'learning_rate': 1.9270515922282394e-05, 'epoch': 0.45} +2025-02-05 12:27:50 - ERROR - stderr - 15%|█▍ | 3336/22434 [2:20:10<13:22:58, 2.52s/it] +2025-02-05 12:27:52 - ERROR - stderr - 15%|█▍ | 3337/22434 [2:20:12<13:19:03, 2.51s/it] +2025-02-05 12:27:52 - ERROR - stderr - +2025-02-05 12:27:52 - ERROR - stderr - +2025-02-05 12:27:52 - INFO - stdout - {'loss': 1.0801, 'grad_norm': 1.2447816133499146, 'learning_rate': 1.9269974516276223e-05, 'epoch': 0.45} +2025-02-05 12:27:52 - ERROR - stderr - 15%|█▍ | 3337/22434 [2:20:12<13:19:03, 2.51s/it] +2025-02-05 12:27:55 - ERROR - stderr - 15%|█▍ | 3338/22434 [2:20:15<14:13:09, 2.68s/it] +2025-02-05 12:27:55 - ERROR - stderr - +2025-02-05 12:27:55 - ERROR - stderr - +2025-02-05 12:27:55 - INFO - stdout - {'loss': 0.9343, 'grad_norm': 1.0827890634536743, 'learning_rate': 1.9269432917046302e-05, 'epoch': 0.45} +2025-02-05 12:27:55 - ERROR - stderr - 15%|█▍ | 3338/22434 [2:20:15<14:13:09, 2.68s/it] +2025-02-05 12:27:58 - ERROR - stderr - 15%|█▍ | 3339/22434 [2:20:18<14:18:13, 2.70s/it] +2025-02-05 12:27:58 - ERROR - stderr - +2025-02-05 12:27:58 - ERROR - stderr - +2025-02-05 12:27:58 - INFO - stdout - {'loss': 0.9262, 'grad_norm': 1.0911729335784912, 'learning_rate': 1.926889112460392e-05, 'epoch': 0.45} +2025-02-05 12:27:58 - ERROR - stderr - 15%|█▍ | 3339/22434 [2:20:18<14:18:13, 2.70s/it] +2025-02-05 12:28:01 - ERROR - stderr - 15%|█▍ | 3340/22434 [2:20:20<14:00:22, 2.64s/it] +2025-02-05 12:28:01 - ERROR - stderr 
- +2025-02-05 12:28:01 - ERROR - stderr - +2025-02-05 12:28:01 - INFO - stdout - {'loss': 1.0089, 'grad_norm': 1.197203516960144, 'learning_rate': 1.9268349138960374e-05, 'epoch': 0.45} +2025-02-05 12:28:01 - ERROR - stderr - 15%|█▍ | 3340/22434 [2:20:20<14:00:22, 2.64s/it] +2025-02-05 12:28:03 - ERROR - stderr - 15%|█▍ | 3341/22434 [2:20:23<13:45:53, 2.60s/it] +2025-02-05 12:28:03 - ERROR - stderr - +2025-02-05 12:28:03 - ERROR - stderr - +2025-02-05 12:28:03 - INFO - stdout - {'loss': 1.0306, 'grad_norm': 1.2953457832336426, 'learning_rate': 1.926780696012696e-05, 'epoch': 0.45} +2025-02-05 12:28:03 - ERROR - stderr - 15%|█▍ | 3341/22434 [2:20:23<13:45:53, 2.60s/it] +2025-02-05 12:28:06 - ERROR - stderr - 15%|█▍ | 3342/22434 [2:20:25<13:33:45, 2.56s/it] +2025-02-05 12:28:06 - ERROR - stderr - +2025-02-05 12:28:06 - ERROR - stderr - +2025-02-05 12:28:06 - INFO - stdout - {'loss': 0.9684, 'grad_norm': 1.0786229372024536, 'learning_rate': 1.9267264588114975e-05, 'epoch': 0.45} +2025-02-05 12:28:06 - ERROR - stderr - 15%|█▍ | 3342/22434 [2:20:25<13:33:45, 2.56s/it] +2025-02-05 12:28:08 - ERROR - stderr - 15%|█▍ | 3343/22434 [2:20:28<13:34:16, 2.56s/it] +2025-02-05 12:28:08 - ERROR - stderr - +2025-02-05 12:28:08 - ERROR - stderr - +2025-02-05 12:28:08 - INFO - stdout - {'loss': 0.9538, 'grad_norm': 1.0888077020645142, 'learning_rate': 1.9266722022935728e-05, 'epoch': 0.45} +2025-02-05 12:28:08 - ERROR - stderr - 15%|█▍ | 3343/22434 [2:20:28<13:34:16, 2.56s/it] +2025-02-05 12:28:11 - ERROR - stderr - 15%|█▍ | 3344/22434 [2:20:30<13:26:06, 2.53s/it] +2025-02-05 12:28:11 - ERROR - stderr - +2025-02-05 12:28:11 - ERROR - stderr - +2025-02-05 12:28:11 - INFO - stdout - {'loss': 0.9176, 'grad_norm': 1.1159228086471558, 'learning_rate': 1.9266179264600527e-05, 'epoch': 0.45} +2025-02-05 12:28:11 - ERROR - stderr - 15%|█▍ | 3344/22434 [2:20:30<13:26:06, 2.53s/it] +2025-02-05 12:28:13 - ERROR - stderr - 15%|█▍ | 3345/22434 [2:20:33<13:23:51, 2.53s/it] +2025-02-05 12:28:13 - 
ERROR - stderr - +2025-02-05 12:28:13 - ERROR - stderr - +2025-02-05 12:28:13 - INFO - stdout - {'loss': 1.0072, 'grad_norm': 1.1443184614181519, 'learning_rate': 1.9265636313120687e-05, 'epoch': 0.45} +2025-02-05 12:28:13 - ERROR - stderr - 15%|█▍ | 3345/22434 [2:20:33<13:23:51, 2.53s/it] +2025-02-05 12:28:16 - ERROR - stderr - 15%|█▍ | 3346/22434 [2:20:35<13:16:17, 2.50s/it] +2025-02-05 12:28:16 - ERROR - stderr - +2025-02-05 12:28:16 - ERROR - stderr - +2025-02-05 12:28:16 - INFO - stdout - {'loss': 1.0627, 'grad_norm': 1.2469744682312012, 'learning_rate': 1.9265093168507525e-05, 'epoch': 0.45} +2025-02-05 12:28:16 - ERROR - stderr - 15%|█▍ | 3346/22434 [2:20:35<13:16:17, 2.50s/it] +2025-02-05 12:28:18 - ERROR - stderr - 15%|█▍ | 3347/22434 [2:20:38<13:10:55, 2.49s/it] +2025-02-05 12:28:18 - ERROR - stderr - +2025-02-05 12:28:18 - ERROR - stderr - +2025-02-05 12:28:18 - INFO - stdout - {'loss': 0.9925, 'grad_norm': 1.0613532066345215, 'learning_rate': 1.9264549830772363e-05, 'epoch': 0.45} +2025-02-05 12:28:18 - ERROR - stderr - 15%|█▍ | 3347/22434 [2:20:38<13:10:55, 2.49s/it] +2025-02-05 12:28:21 - ERROR - stderr - 15%|█▍ | 3348/22434 [2:20:40<13:18:27, 2.51s/it] +2025-02-05 12:28:21 - ERROR - stderr - +2025-02-05 12:28:21 - ERROR - stderr - +2025-02-05 12:28:21 - INFO - stdout - {'loss': 0.9961, 'grad_norm': 1.0912984609603882, 'learning_rate': 1.9264006299926523e-05, 'epoch': 0.45} +2025-02-05 12:28:21 - ERROR - stderr - 15%|█▍ | 3348/22434 [2:20:40<13:18:27, 2.51s/it] +2025-02-05 12:28:23 - ERROR - stderr - 15%|█▍ | 3349/22434 [2:20:43<13:31:15, 2.55s/it] +2025-02-05 12:28:23 - ERROR - stderr - +2025-02-05 12:28:23 - ERROR - stderr - +2025-02-05 12:28:23 - INFO - stdout - {'loss': 1.1751, 'grad_norm': 1.2709434032440186, 'learning_rate': 1.926346257598134e-05, 'epoch': 0.45} +2025-02-05 12:28:23 - ERROR - stderr - 15%|█▍ | 3349/22434 [2:20:43<13:31:15, 2.55s/it] +2025-02-05 12:28:26 - ERROR - stderr - 15%|█▍ | 3350/22434 [2:20:46<13:26:16, 2.53s/it] 
+2025-02-05 12:28:26 - ERROR - stderr - +2025-02-05 12:28:26 - ERROR - stderr - +2025-02-05 12:28:26 - INFO - stdout - {'loss': 1.0059, 'grad_norm': 1.1200724840164185, 'learning_rate': 1.9262918658948137e-05, 'epoch': 0.45} +2025-02-05 12:28:26 - ERROR - stderr - 15%|█▍ | 3350/22434 [2:20:46<13:26:16, 2.53s/it] +2025-02-05 12:28:28 - ERROR - stderr - 15%|█▍ | 3351/22434 [2:20:48<13:19:11, 2.51s/it] +2025-02-05 12:28:28 - ERROR - stderr - +2025-02-05 12:28:28 - ERROR - stderr - +2025-02-05 12:28:28 - INFO - stdout - {'loss': 0.9931, 'grad_norm': 1.1213024854660034, 'learning_rate': 1.9262374548838264e-05, 'epoch': 0.45} +2025-02-05 12:28:28 - ERROR - stderr - 15%|█▍ | 3351/22434 [2:20:48<13:19:11, 2.51s/it] +2025-02-05 12:28:31 - ERROR - stderr - 15%|█▍ | 3352/22434 [2:20:50<13:20:17, 2.52s/it] +2025-02-05 12:28:31 - ERROR - stderr - +2025-02-05 12:28:31 - ERROR - stderr - +2025-02-05 12:28:31 - INFO - stdout - {'loss': 0.9238, 'grad_norm': 1.0249545574188232, 'learning_rate': 1.9261830245663053e-05, 'epoch': 0.45} +2025-02-05 12:28:31 - ERROR - stderr - 15%|█▍ | 3352/22434 [2:20:51<13:20:17, 2.52s/it] +2025-02-05 12:28:33 - ERROR - stderr - 15%|█▍ | 3353/22434 [2:20:53<13:16:47, 2.51s/it] +2025-02-05 12:28:33 - ERROR - stderr - +2025-02-05 12:28:33 - ERROR - stderr - +2025-02-05 12:28:33 - INFO - stdout - {'loss': 1.012, 'grad_norm': 1.0901380777359009, 'learning_rate': 1.9261285749433854e-05, 'epoch': 0.45} +2025-02-05 12:28:33 - ERROR - stderr - 15%|█▍ | 3353/22434 [2:20:53<13:16:47, 2.51s/it] +2025-02-05 12:28:36 - ERROR - stderr - 15%|█▍ | 3354/22434 [2:20:55<13:10:36, 2.49s/it] +2025-02-05 12:28:36 - ERROR - stderr - +2025-02-05 12:28:36 - ERROR - stderr - +2025-02-05 12:28:36 - INFO - stdout - {'loss': 1.0555, 'grad_norm': 1.2205151319503784, 'learning_rate': 1.9260741060162015e-05, 'epoch': 0.45} +2025-02-05 12:28:36 - ERROR - stderr - 15%|█▍ | 3354/22434 [2:20:55<13:10:36, 2.49s/it] +2025-02-05 12:28:38 - ERROR - stderr - 15%|█▍ | 3355/22434 
[2:20:58<13:13:36, 2.50s/it] +2025-02-05 12:28:38 - ERROR - stderr - +2025-02-05 12:28:38 - ERROR - stderr - +2025-02-05 12:28:38 - INFO - stdout - {'loss': 1.1466, 'grad_norm': 1.1517947912216187, 'learning_rate': 1.9260196177858892e-05, 'epoch': 0.45} +2025-02-05 12:28:38 - ERROR - stderr - 15%|█▍ | 3355/22434 [2:20:58<13:13:36, 2.50s/it] +2025-02-05 12:28:41 - ERROR - stderr - 15%|█▍ | 3356/22434 [2:21:00<13:16:40, 2.51s/it] +2025-02-05 12:28:41 - ERROR - stderr - +2025-02-05 12:28:41 - ERROR - stderr - +2025-02-05 12:28:41 - INFO - stdout - {'loss': 0.9033, 'grad_norm': 1.0503699779510498, 'learning_rate': 1.925965110253584e-05, 'epoch': 0.45} +2025-02-05 12:28:41 - ERROR - stderr - 15%|█▍ | 3356/22434 [2:21:01<13:16:40, 2.51s/it] +2025-02-05 12:28:43 - ERROR - stderr - 15%|█▍ | 3357/22434 [2:21:03<13:12:43, 2.49s/it] +2025-02-05 12:28:43 - ERROR - stderr - +2025-02-05 12:28:43 - ERROR - stderr - +2025-02-05 12:28:43 - INFO - stdout - {'loss': 0.9939, 'grad_norm': 1.060001015663147, 'learning_rate': 1.925910583420422e-05, 'epoch': 0.45} +2025-02-05 12:28:43 - ERROR - stderr - 15%|█▍ | 3357/22434 [2:21:03<13:12:43, 2.49s/it] +2025-02-05 12:28:46 - ERROR - stderr - 15%|█▍ | 3358/22434 [2:21:05<13:13:22, 2.50s/it] +2025-02-05 12:28:46 - ERROR - stderr - +2025-02-05 12:28:46 - ERROR - stderr - +2025-02-05 12:28:46 - INFO - stdout - {'loss': 1.0456, 'grad_norm': 1.1567180156707764, 'learning_rate': 1.9258560372875402e-05, 'epoch': 0.45} +2025-02-05 12:28:46 - ERROR - stderr - 15%|█▍ | 3358/22434 [2:21:05<13:13:22, 2.50s/it] +2025-02-05 12:28:48 - ERROR - stderr - 15%|█▍ | 3359/22434 [2:21:08<13:12:49, 2.49s/it] +2025-02-05 12:28:48 - ERROR - stderr - +2025-02-05 12:28:48 - ERROR - stderr - +2025-02-05 12:28:48 - INFO - stdout - {'loss': 0.9523, 'grad_norm': 0.9911651611328125, 'learning_rate': 1.9258014718560752e-05, 'epoch': 0.45} +2025-02-05 12:28:48 - ERROR - stderr - 15%|█▍ | 3359/22434 [2:21:08<13:12:49, 2.49s/it] +2025-02-05 12:28:51 - ERROR - stderr - 15%|█▍ 
| 3360/22434 [2:21:10<13:20:36, 2.52s/it] +2025-02-05 12:28:51 - ERROR - stderr - +2025-02-05 12:28:51 - ERROR - stderr - +2025-02-05 12:28:51 - INFO - stdout - {'loss': 1.0788, 'grad_norm': 1.210352897644043, 'learning_rate': 1.925746887127164e-05, 'epoch': 0.45} +2025-02-05 12:28:51 - ERROR - stderr - 15%|█▍ | 3360/22434 [2:21:11<13:20:36, 2.52s/it] +2025-02-05 12:28:53 - ERROR - stderr - 15%|█▍ | 3361/22434 [2:21:13<13:31:00, 2.55s/it] +2025-02-05 12:28:53 - ERROR - stderr - +2025-02-05 12:28:53 - ERROR - stderr - +2025-02-05 12:28:53 - INFO - stdout - {'loss': 0.9591, 'grad_norm': 1.0245184898376465, 'learning_rate': 1.9256922831019453e-05, 'epoch': 0.45} +2025-02-05 12:28:53 - ERROR - stderr - 15%|█▍ | 3361/22434 [2:21:13<13:31:00, 2.55s/it] +2025-02-05 12:28:56 - ERROR - stderr - 15%|█▍ | 3362/22434 [2:21:16<13:18:58, 2.51s/it] +2025-02-05 12:28:56 - ERROR - stderr - +2025-02-05 12:28:56 - ERROR - stderr - +2025-02-05 12:28:56 - INFO - stdout - {'loss': 0.9033, 'grad_norm': 1.110620379447937, 'learning_rate': 1.9256376597815565e-05, 'epoch': 0.45} +2025-02-05 12:28:56 - ERROR - stderr - 15%|█▍ | 3362/22434 [2:21:16<13:18:58, 2.51s/it] +2025-02-05 12:28:58 - ERROR - stderr - 15%|█▍ | 3363/22434 [2:21:18<13:24:15, 2.53s/it] +2025-02-05 12:28:58 - ERROR - stderr - +2025-02-05 12:28:58 - ERROR - stderr - +2025-02-05 12:28:58 - INFO - stdout - {'loss': 0.9059, 'grad_norm': 1.0918771028518677, 'learning_rate': 1.9255830171671364e-05, 'epoch': 0.45} +2025-02-05 12:28:58 - ERROR - stderr - 15%|█▍ | 3363/22434 [2:21:18<13:24:15, 2.53s/it] +2025-02-05 12:29:01 - ERROR - stderr - 15%|█▍ | 3364/22434 [2:21:21<13:23:33, 2.53s/it] +2025-02-05 12:29:01 - ERROR - stderr - +2025-02-05 12:29:01 - ERROR - stderr - +2025-02-05 12:29:01 - INFO - stdout - {'loss': 1.026, 'grad_norm': 1.0640569925308228, 'learning_rate': 1.9255283552598242e-05, 'epoch': 0.45} +2025-02-05 12:29:01 - ERROR - stderr - 15%|█▍ | 3364/22434 [2:21:21<13:23:33, 2.53s/it] +2025-02-05 12:29:03 - ERROR - 
stderr - 15%|█▍ | 3365/22434 [2:21:23<13:20:35, 2.52s/it] +2025-02-05 12:29:03 - ERROR - stderr - +2025-02-05 12:29:03 - ERROR - stderr - +2025-02-05 12:29:03 - INFO - stdout - {'loss': 0.9007, 'grad_norm': 1.1876285076141357, 'learning_rate': 1.9254736740607586e-05, 'epoch': 0.45} +2025-02-05 12:29:03 - ERROR - stderr - 15%|█▍ | 3365/22434 [2:21:23<13:20:35, 2.52s/it] +2025-02-05 12:29:06 - ERROR - stderr - 15%|█▌ | 3366/22434 [2:21:26<13:27:03, 2.54s/it] +2025-02-05 12:29:06 - ERROR - stderr - +2025-02-05 12:29:06 - ERROR - stderr - +2025-02-05 12:29:06 - INFO - stdout - {'loss': 1.0842, 'grad_norm': 1.1135878562927246, 'learning_rate': 1.9254189735710805e-05, 'epoch': 0.45} +2025-02-05 12:29:06 - ERROR - stderr - 15%|█▌ | 3366/22434 [2:21:26<13:27:03, 2.54s/it] +2025-02-05 12:29:08 - ERROR - stderr - 15%|█▌ | 3367/22434 [2:21:28<13:21:09, 2.52s/it] +2025-02-05 12:29:08 - ERROR - stderr - +2025-02-05 12:29:08 - ERROR - stderr - +2025-02-05 12:29:08 - INFO - stdout - {'loss': 1.0785, 'grad_norm': 1.2246215343475342, 'learning_rate': 1.9253642537919288e-05, 'epoch': 0.45} +2025-02-05 12:29:08 - ERROR - stderr - 15%|█▌ | 3367/22434 [2:21:28<13:21:09, 2.52s/it] +2025-02-05 12:29:11 - ERROR - stderr - 15%|█▌ | 3368/22434 [2:21:31<13:50:53, 2.61s/it] +2025-02-05 12:29:11 - ERROR - stderr - +2025-02-05 12:29:11 - ERROR - stderr - +2025-02-05 12:29:11 - INFO - stdout - {'loss': 1.084, 'grad_norm': 1.175704002380371, 'learning_rate': 1.925309514724445e-05, 'epoch': 0.45} +2025-02-05 12:29:11 - ERROR - stderr - 15%|█▌ | 3368/22434 [2:21:31<13:50:53, 2.61s/it] +2025-02-05 12:29:14 - ERROR - stderr - 15%|█▌ | 3369/22434 [2:21:34<13:38:11, 2.57s/it] +2025-02-05 12:29:14 - ERROR - stderr - +2025-02-05 12:29:14 - ERROR - stderr - +2025-02-05 12:29:14 - INFO - stdout - {'loss': 0.9904, 'grad_norm': 1.1717102527618408, 'learning_rate': 1.92525475636977e-05, 'epoch': 0.45} +2025-02-05 12:29:14 - ERROR - stderr - 15%|█▌ | 3369/22434 [2:21:34<13:38:11, 2.57s/it] +2025-02-05 12:29:16 
- ERROR - stderr - 15%|█▌ | 3370/22434 [2:21:36<13:25:14, 2.53s/it] +2025-02-05 12:29:16 - ERROR - stderr - +2025-02-05 12:29:16 - ERROR - stderr - +2025-02-05 12:29:16 - INFO - stdout - {'loss': 0.9728, 'grad_norm': 1.1339911222457886, 'learning_rate': 1.9251999787290445e-05, 'epoch': 0.45} +2025-02-05 12:29:16 - ERROR - stderr - 15%|█▌ | 3370/22434 [2:21:36<13:25:14, 2.53s/it] +2025-02-05 12:29:19 - ERROR - stderr - 15%|█▌ | 3371/22434 [2:21:38<13:14:32, 2.50s/it] +2025-02-05 12:29:19 - ERROR - stderr - +2025-02-05 12:29:19 - ERROR - stderr - +2025-02-05 12:29:19 - INFO - stdout - {'loss': 1.0138, 'grad_norm': 1.2592439651489258, 'learning_rate': 1.925145181803411e-05, 'epoch': 0.45} +2025-02-05 12:29:19 - ERROR - stderr - 15%|█▌ | 3371/22434 [2:21:38<13:14:32, 2.50s/it] +2025-02-05 12:29:21 - ERROR - stderr - 15%|█▌ | 3372/22434 [2:21:41<13:19:43, 2.52s/it] +2025-02-05 12:29:21 - ERROR - stderr - +2025-02-05 12:29:21 - ERROR - stderr - +2025-02-05 12:29:21 - INFO - stdout - {'loss': 0.9797, 'grad_norm': 1.1844533681869507, 'learning_rate': 1.9250903655940116e-05, 'epoch': 0.45} +2025-02-05 12:29:21 - ERROR - stderr - 15%|█▌ | 3372/22434 [2:21:41<13:19:43, 2.52s/it] +2025-02-05 12:29:24 - ERROR - stderr - 15%|█▌ | 3373/22434 [2:21:43<13:12:44, 2.50s/it] +2025-02-05 12:29:24 - ERROR - stderr - +2025-02-05 12:29:24 - ERROR - stderr - +2025-02-05 12:29:24 - INFO - stdout - {'loss': 1.0744, 'grad_norm': 1.150302767753601, 'learning_rate': 1.9250355301019885e-05, 'epoch': 0.45} +2025-02-05 12:29:24 - ERROR - stderr - 15%|█▌ | 3373/22434 [2:21:43<13:12:44, 2.50s/it] +2025-02-05 12:29:26 - ERROR - stderr - 15%|█▌ | 3374/22434 [2:21:46<13:19:55, 2.52s/it] +2025-02-05 12:29:26 - ERROR - stderr - +2025-02-05 12:29:26 - ERROR - stderr - +2025-02-05 12:29:26 - INFO - stdout - {'loss': 1.093, 'grad_norm': 1.041264533996582, 'learning_rate': 1.924980675328485e-05, 'epoch': 0.45} +2025-02-05 12:29:26 - ERROR - stderr - 15%|█▌ | 3374/22434 [2:21:46<13:19:55, 2.52s/it] 
+2025-02-05 12:29:29 - ERROR - stderr - 15%|█▌ | 3375/22434 [2:21:48<13:10:03, 2.49s/it] +2025-02-05 12:29:29 - ERROR - stderr - +2025-02-05 12:29:29 - ERROR - stderr - +2025-02-05 12:29:29 - INFO - stdout - {'loss': 1.1424, 'grad_norm': 1.2654507160186768, 'learning_rate': 1.9249258012746447e-05, 'epoch': 0.45} +2025-02-05 12:29:29 - ERROR - stderr - 15%|█▌ | 3375/22434 [2:21:48<13:10:03, 2.49s/it] +2025-02-05 12:29:31 - ERROR - stderr - 15%|█▌ | 3376/22434 [2:21:51<13:20:03, 2.52s/it] +2025-02-05 12:29:31 - ERROR - stderr - +2025-02-05 12:29:31 - ERROR - stderr - +2025-02-05 12:29:31 - INFO - stdout - {'loss': 0.9184, 'grad_norm': 1.0421652793884277, 'learning_rate': 1.9248709079416107e-05, 'epoch': 0.45} +2025-02-05 12:29:31 - ERROR - stderr - 15%|█▌ | 3376/22434 [2:21:51<13:20:03, 2.52s/it] +2025-02-05 12:29:34 - ERROR - stderr - 15%|█▌ | 3377/22434 [2:21:54<13:28:09, 2.54s/it] +2025-02-05 12:29:34 - ERROR - stderr - +2025-02-05 12:29:34 - ERROR - stderr - +2025-02-05 12:29:34 - INFO - stdout - {'loss': 0.9203, 'grad_norm': 1.1006495952606201, 'learning_rate': 1.924815995330528e-05, 'epoch': 0.45} +2025-02-05 12:29:34 - ERROR - stderr - 15%|█▌ | 3377/22434 [2:21:54<13:28:09, 2.54s/it] +2025-02-05 12:29:36 - ERROR - stderr - 15%|█▌ | 3378/22434 [2:21:56<13:23:15, 2.53s/it] +2025-02-05 12:29:36 - ERROR - stderr - +2025-02-05 12:29:36 - ERROR - stderr - +2025-02-05 12:29:36 - INFO - stdout - {'loss': 1.0535, 'grad_norm': 1.4314602613449097, 'learning_rate': 1.9247610634425407e-05, 'epoch': 0.45} +2025-02-05 12:29:36 - ERROR - stderr - 15%|█▌ | 3378/22434 [2:21:56<13:23:15, 2.53s/it] +2025-02-05 12:29:39 - ERROR - stderr - 15%|█▌ | 3379/22434 [2:21:58<13:13:41, 2.50s/it] +2025-02-05 12:29:39 - ERROR - stderr - +2025-02-05 12:29:39 - ERROR - stderr - +2025-02-05 12:29:39 - INFO - stdout - {'loss': 1.006, 'grad_norm': 1.1162046194076538, 'learning_rate': 1.9247061122787936e-05, 'epoch': 0.45} +2025-02-05 12:29:39 - ERROR - stderr - 15%|█▌ | 3379/22434 
[2:21:59<13:13:41, 2.50s/it] +2025-02-05 12:29:41 - ERROR - stderr - 15%|█▌ | 3380/22434 [2:22:01<13:21:37, 2.52s/it] +2025-02-05 12:29:41 - ERROR - stderr - +2025-02-05 12:29:41 - ERROR - stderr - +2025-02-05 12:29:41 - INFO - stdout - {'loss': 0.8776, 'grad_norm': 0.9385702610015869, 'learning_rate': 1.924651141840433e-05, 'epoch': 0.45} +2025-02-05 12:29:41 - ERROR - stderr - 15%|█▌ | 3380/22434 [2:22:01<13:21:37, 2.52s/it] +2025-02-05 12:29:44 - ERROR - stderr - 15%|█▌ | 3381/22434 [2:22:03<13:12:19, 2.50s/it] +2025-02-05 12:29:44 - ERROR - stderr - +2025-02-05 12:29:44 - ERROR - stderr - +2025-02-05 12:29:44 - INFO - stdout - {'loss': 0.9883, 'grad_norm': 1.198063850402832, 'learning_rate': 1.924596152128604e-05, 'epoch': 0.45} +2025-02-05 12:29:44 - ERROR - stderr - 15%|█▌ | 3381/22434 [2:22:04<13:12:19, 2.50s/it] +2025-02-05 12:29:46 - ERROR - stderr - 15%|█▌ | 3382/22434 [2:22:06<13:16:15, 2.51s/it] +2025-02-05 12:29:46 - ERROR - stderr - +2025-02-05 12:29:46 - ERROR - stderr - +2025-02-05 12:29:46 - INFO - stdout - {'loss': 1.0444, 'grad_norm': 1.1203556060791016, 'learning_rate': 1.9245411431444526e-05, 'epoch': 0.45} +2025-02-05 12:29:46 - ERROR - stderr - 15%|█▌ | 3382/22434 [2:22:06<13:16:15, 2.51s/it] +2025-02-05 12:29:49 - ERROR - stderr - 15%|█▌ | 3383/22434 [2:22:08<13:08:04, 2.48s/it] +2025-02-05 12:29:49 - ERROR - stderr - +2025-02-05 12:29:49 - ERROR - stderr - +2025-02-05 12:29:49 - INFO - stdout - {'loss': 0.9729, 'grad_norm': 1.0597683191299438, 'learning_rate': 1.924486114889126e-05, 'epoch': 0.45} +2025-02-05 12:29:49 - ERROR - stderr - 15%|█▌ | 3383/22434 [2:22:09<13:08:04, 2.48s/it] +2025-02-05 12:29:52 - ERROR - stderr - 15%|█▌ | 3384/22434 [2:22:11<13:56:11, 2.63s/it] +2025-02-05 12:29:52 - ERROR - stderr - +2025-02-05 12:29:52 - ERROR - stderr - +2025-02-05 12:29:52 - INFO - stdout - {'loss': 0.9988, 'grad_norm': 1.0010021924972534, 'learning_rate': 1.924431067363771e-05, 'epoch': 0.45} +2025-02-05 12:29:52 - ERROR - stderr - 15%|█▌ | 
3384/22434 [2:22:11<13:56:11, 2.63s/it] +2025-02-05 12:29:54 - ERROR - stderr - 15%|█▌ | 3385/22434 [2:22:14<13:42:01, 2.59s/it] +2025-02-05 12:29:54 - ERROR - stderr - +2025-02-05 12:29:54 - ERROR - stderr - +2025-02-05 12:29:54 - INFO - stdout - {'loss': 1.0205, 'grad_norm': 1.0378679037094116, 'learning_rate': 1.924376000569535e-05, 'epoch': 0.45} +2025-02-05 12:29:54 - ERROR - stderr - 15%|█▌ | 3385/22434 [2:22:14<13:42:01, 2.59s/it] +2025-02-05 12:29:57 - ERROR - stderr - 15%|█▌ | 3386/22434 [2:22:16<13:33:09, 2.56s/it] +2025-02-05 12:29:57 - ERROR - stderr - +2025-02-05 12:29:57 - ERROR - stderr - +2025-02-05 12:29:57 - INFO - stdout - {'loss': 0.9553, 'grad_norm': 1.0878831148147583, 'learning_rate': 1.9243209145075656e-05, 'epoch': 0.45} +2025-02-05 12:29:57 - ERROR - stderr - 15%|█▌ | 3386/22434 [2:22:16<13:33:09, 2.56s/it] +2025-02-05 12:29:59 - ERROR - stderr - 15%|█▌ | 3387/22434 [2:22:19<13:18:22, 2.51s/it] +2025-02-05 12:29:59 - ERROR - stderr - +2025-02-05 12:29:59 - ERROR - stderr - +2025-02-05 12:29:59 - INFO - stdout - {'loss': 1.008, 'grad_norm': 1.3530305624008179, 'learning_rate': 1.9242658091790118e-05, 'epoch': 0.45} +2025-02-05 12:29:59 - ERROR - stderr - 15%|█▌ | 3387/22434 [2:22:19<13:18:22, 2.51s/it] +2025-02-05 12:30:02 - ERROR - stderr - 15%|█▌ | 3388/22434 [2:22:21<13:14:09, 2.50s/it] +2025-02-05 12:30:02 - ERROR - stderr - +2025-02-05 12:30:02 - ERROR - stderr - +2025-02-05 12:30:02 - INFO - stdout - {'loss': 1.0446, 'grad_norm': 1.2059698104858398, 'learning_rate': 1.9242106845850208e-05, 'epoch': 0.45} +2025-02-05 12:30:02 - ERROR - stderr - 15%|█▌ | 3388/22434 [2:22:21<13:14:09, 2.50s/it] +2025-02-05 12:30:04 - ERROR - stderr - 15%|█▌ | 3389/22434 [2:22:24<13:21:12, 2.52s/it] +2025-02-05 12:30:04 - ERROR - stderr - +2025-02-05 12:30:04 - ERROR - stderr - +2025-02-05 12:30:04 - INFO - stdout - {'loss': 0.9387, 'grad_norm': 1.076590657234192, 'learning_rate': 1.924155540726743e-05, 'epoch': 0.45} +2025-02-05 12:30:04 - ERROR - stderr 
- 15%|█▌ | 3389/22434 [2:22:24<13:21:12, 2.52s/it] +2025-02-05 12:30:07 - ERROR - stderr - 15%|█▌ | 3390/22434 [2:22:27<13:33:05, 2.56s/it] +2025-02-05 12:30:07 - ERROR - stderr - +2025-02-05 12:30:07 - ERROR - stderr - +2025-02-05 12:30:07 - INFO - stdout - {'loss': 1.0034, 'grad_norm': 1.1469002962112427, 'learning_rate': 1.9241003776053273e-05, 'epoch': 0.45} +2025-02-05 12:30:07 - ERROR - stderr - 15%|█▌ | 3390/22434 [2:22:27<13:33:05, 2.56s/it] +2025-02-05 12:30:09 - ERROR - stderr - 15%|█▌ | 3391/22434 [2:22:29<13:29:03, 2.55s/it] +2025-02-05 12:30:09 - ERROR - stderr - +2025-02-05 12:30:09 - ERROR - stderr - +2025-02-05 12:30:09 - INFO - stdout - {'loss': 1.0745, 'grad_norm': 1.130566120147705, 'learning_rate': 1.9240451952219232e-05, 'epoch': 0.45} +2025-02-05 12:30:09 - ERROR - stderr - 15%|█▌ | 3391/22434 [2:22:29<13:29:03, 2.55s/it] +2025-02-05 12:30:12 - ERROR - stderr - 15%|█▌ | 3392/22434 [2:22:32<13:33:36, 2.56s/it] +2025-02-05 12:30:12 - ERROR - stderr - +2025-02-05 12:30:12 - ERROR - stderr - +2025-02-05 12:30:12 - INFO - stdout - {'loss': 0.9338, 'grad_norm': 1.1840704679489136, 'learning_rate': 1.9239899935776812e-05, 'epoch': 0.45} +2025-02-05 12:30:12 - ERROR - stderr - 15%|█▌ | 3392/22434 [2:22:32<13:33:36, 2.56s/it] +2025-02-05 12:30:14 - ERROR - stderr - 15%|█▌ | 3393/22434 [2:22:34<13:27:54, 2.55s/it] +2025-02-05 12:30:14 - ERROR - stderr - +2025-02-05 12:30:14 - ERROR - stderr - +2025-02-05 12:30:14 - INFO - stdout - {'loss': 0.9768, 'grad_norm': 1.2050395011901855, 'learning_rate': 1.9239347726737524e-05, 'epoch': 0.45} +2025-02-05 12:30:14 - ERROR - stderr - 15%|█▌ | 3393/22434 [2:22:34<13:27:54, 2.55s/it] +2025-02-05 12:30:17 - ERROR - stderr - 15%|█▌ | 3394/22434 [2:22:37<13:25:14, 2.54s/it] +2025-02-05 12:30:17 - ERROR - stderr - +2025-02-05 12:30:17 - ERROR - stderr - +2025-02-05 12:30:17 - INFO - stdout - {'loss': 0.9451, 'grad_norm': 1.029349684715271, 'learning_rate': 1.9238795325112867e-05, 'epoch': 0.45} +2025-02-05 12:30:17 - 
ERROR - stderr - 15%|█▌ | 3394/22434 [2:22:37<13:25:14, 2.54s/it] +2025-02-05 12:30:19 - ERROR - stderr - 15%|█▌ | 3395/22434 [2:22:39<13:19:52, 2.52s/it] +2025-02-05 12:30:19 - ERROR - stderr - +2025-02-05 12:30:19 - ERROR - stderr - +2025-02-05 12:30:19 - INFO - stdout - {'loss': 1.0074, 'grad_norm': 1.068260908126831, 'learning_rate': 1.923824273091437e-05, 'epoch': 0.45} +2025-02-05 12:30:19 - ERROR - stderr - 15%|█▌ | 3395/22434 [2:22:39<13:19:52, 2.52s/it] +2025-02-05 12:30:22 - ERROR - stderr - 15%|█▌ | 3396/22434 [2:22:42<13:45:37, 2.60s/it] +2025-02-05 12:30:22 - ERROR - stderr - +2025-02-05 12:30:22 - ERROR - stderr - +2025-02-05 12:30:22 - INFO - stdout - {'loss': 1.0076, 'grad_norm': 1.1231591701507568, 'learning_rate': 1.9237689944153535e-05, 'epoch': 0.45} +2025-02-05 12:30:22 - ERROR - stderr - 15%|█▌ | 3396/22434 [2:22:42<13:45:37, 2.60s/it] +2025-02-05 12:30:25 - ERROR - stderr - 15%|█▌ | 3397/22434 [2:22:44<13:35:08, 2.57s/it] +2025-02-05 12:30:25 - ERROR - stderr - +2025-02-05 12:30:25 - ERROR - stderr - +2025-02-05 12:30:25 - INFO - stdout - {'loss': 0.9961, 'grad_norm': 1.1692661046981812, 'learning_rate': 1.92371369648419e-05, 'epoch': 0.45} +2025-02-05 12:30:25 - ERROR - stderr - 15%|█▌ | 3397/22434 [2:22:44<13:35:08, 2.57s/it] +2025-02-05 12:30:27 - ERROR - stderr - 15%|█▌ | 3398/22434 [2:22:47<13:24:16, 2.53s/it] +2025-02-05 12:30:27 - ERROR - stderr - +2025-02-05 12:30:27 - ERROR - stderr - +2025-02-05 12:30:27 - INFO - stdout - {'loss': 0.8335, 'grad_norm': 1.1960813999176025, 'learning_rate': 1.923658379299098e-05, 'epoch': 0.45} +2025-02-05 12:30:27 - ERROR - stderr - 15%|█▌ | 3398/22434 [2:22:47<13:24:16, 2.53s/it] +2025-02-05 12:30:30 - ERROR - stderr - 15%|█▌ | 3399/22434 [2:22:49<13:14:20, 2.50s/it] +2025-02-05 12:30:30 - ERROR - stderr - +2025-02-05 12:30:30 - ERROR - stderr - +2025-02-05 12:30:30 - INFO - stdout - {'loss': 1.0425, 'grad_norm': 1.298230767250061, 'learning_rate': 1.9236030428612307e-05, 'epoch': 0.45} +2025-02-05 
12:30:30 - ERROR - stderr - 15%|█▌ | 3399/22434 [2:22:49<13:14:20, 2.50s/it] +2025-02-05 12:30:32 - ERROR - stderr - 15%|█▌ | 3400/22434 [2:22:52<13:11:00, 2.49s/it] +2025-02-05 12:30:32 - ERROR - stderr - +2025-02-05 12:30:32 - ERROR - stderr - +2025-02-05 12:30:32 - INFO - stdout - {'loss': 0.7899, 'grad_norm': 1.0371997356414795, 'learning_rate': 1.9235476871717422e-05, 'epoch': 0.45} +2025-02-05 12:30:32 - ERROR - stderr - 15%|█▌ | 3400/22434 [2:22:52<13:11:00, 2.49s/it] +2025-02-05 12:30:34 - ERROR - stderr - 15%|█▌ | 3401/22434 [2:22:54<13:06:09, 2.48s/it] +2025-02-05 12:30:35 - ERROR - stderr - +2025-02-05 12:30:35 - ERROR - stderr - +2025-02-05 12:30:35 - INFO - stdout - {'loss': 0.9279, 'grad_norm': 1.0718671083450317, 'learning_rate': 1.923492312231786e-05, 'epoch': 0.45} +2025-02-05 12:30:35 - ERROR - stderr - 15%|█▌ | 3401/22434 [2:22:54<13:06:09, 2.48s/it] +2025-02-05 12:30:37 - ERROR - stderr - 15%|█▌ | 3402/22434 [2:22:57<13:20:25, 2.52s/it] +2025-02-05 12:30:37 - ERROR - stderr - +2025-02-05 12:30:37 - ERROR - stderr - +2025-02-05 12:30:37 - INFO - stdout - {'loss': 1.0515, 'grad_norm': 1.1243482828140259, 'learning_rate': 1.923436918042516e-05, 'epoch': 0.45} +2025-02-05 12:30:37 - ERROR - stderr - 15%|█▌ | 3402/22434 [2:22:57<13:20:25, 2.52s/it] +2025-02-05 12:30:40 - ERROR - stderr - 15%|█▌ | 3403/22434 [2:22:59<13:13:03, 2.50s/it] +2025-02-05 12:30:40 - ERROR - stderr - +2025-02-05 12:30:40 - ERROR - stderr - +2025-02-05 12:30:40 - INFO - stdout - {'loss': 0.9858, 'grad_norm': 1.146529197692871, 'learning_rate': 1.9233815046050867e-05, 'epoch': 0.46} +2025-02-05 12:30:40 - ERROR - stderr - 15%|█▌ | 3403/22434 [2:22:59<13:13:03, 2.50s/it] +2025-02-05 12:30:42 - ERROR - stderr - 15%|█▌ | 3404/22434 [2:23:02<13:05:43, 2.48s/it] +2025-02-05 12:30:42 - ERROR - stderr - +2025-02-05 12:30:42 - ERROR - stderr - +2025-02-05 12:30:42 - INFO - stdout - {'loss': 0.8909, 'grad_norm': 1.1278969049453735, 'learning_rate': 1.9233260719206543e-05, 'epoch': 0.46} 
+2025-02-05 12:30:42 - ERROR - stderr - 15%|█▌ | 3404/22434 [2:23:02<13:05:43, 2.48s/it] +2025-02-05 12:30:44 - ERROR - stderr - 15%|█▌ | 3405/22434 [2:23:04<13:11:06, 2.49s/it] +2025-02-05 12:30:45 - ERROR - stderr - +2025-02-05 12:30:45 - ERROR - stderr - +2025-02-05 12:30:45 - INFO - stdout - {'loss': 0.961, 'grad_norm': 0.9876331686973572, 'learning_rate': 1.923270619990373e-05, 'epoch': 0.46} +2025-02-05 12:30:45 - ERROR - stderr - 15%|█▌ | 3405/22434 [2:23:04<13:11:06, 2.49s/it] +2025-02-05 12:30:47 - ERROR - stderr - 15%|█▌ | 3406/22434 [2:23:07<13:11:14, 2.50s/it] +2025-02-05 12:30:47 - ERROR - stderr - +2025-02-05 12:30:47 - ERROR - stderr - +2025-02-05 12:30:47 - INFO - stdout - {'loss': 1.0153, 'grad_norm': 1.1827300786972046, 'learning_rate': 1.923215148815399e-05, 'epoch': 0.46} +2025-02-05 12:30:47 - ERROR - stderr - 15%|█▌ | 3406/22434 [2:23:07<13:11:14, 2.50s/it] +2025-02-05 12:30:49 - ERROR - stderr - 15%|█▌ | 3407/22434 [2:23:09<13:02:11, 2.47s/it] +2025-02-05 12:30:49 - ERROR - stderr - +2025-02-05 12:30:49 - ERROR - stderr - +2025-02-05 12:30:49 - INFO - stdout - {'loss': 0.9652, 'grad_norm': 1.1657085418701172, 'learning_rate': 1.9231596583968888e-05, 'epoch': 0.46} +2025-02-05 12:30:49 - ERROR - stderr - 15%|█▌ | 3407/22434 [2:23:09<13:02:11, 2.47s/it] +2025-02-05 12:30:52 - ERROR - stderr - 15%|█▌ | 3408/22434 [2:23:12<14:02:20, 2.66s/it] +2025-02-05 12:30:53 - ERROR - stderr - +2025-02-05 12:30:53 - ERROR - stderr - +2025-02-05 12:30:53 - INFO - stdout - {'loss': 0.9458, 'grad_norm': 1.0937281847000122, 'learning_rate': 1.9231041487359988e-05, 'epoch': 0.46} +2025-02-05 12:30:53 - ERROR - stderr - 15%|█▌ | 3408/22434 [2:23:12<14:02:20, 2.66s/it] +2025-02-05 12:30:55 - ERROR - stderr - 15%|█▌ | 3409/22434 [2:23:15<13:44:29, 2.60s/it] +2025-02-05 12:30:55 - ERROR - stderr - +2025-02-05 12:30:55 - ERROR - stderr - +2025-02-05 12:30:55 - INFO - stdout - {'loss': 1.0025, 'grad_norm': 1.1266882419586182, 'learning_rate': 1.9230486198338863e-05, 
'epoch': 0.46} +2025-02-05 12:30:55 - ERROR - stderr - 15%|█▌ | 3409/22434 [2:23:15<13:44:29, 2.60s/it] +2025-02-05 12:30:58 - ERROR - stderr - 15%|█▌ | 3410/22434 [2:23:17<13:45:56, 2.60s/it] +2025-02-05 12:30:58 - ERROR - stderr - +2025-02-05 12:30:58 - ERROR - stderr - +2025-02-05 12:30:58 - INFO - stdout - {'loss': 0.8005, 'grad_norm': 0.9347115159034729, 'learning_rate': 1.9229930716917085e-05, 'epoch': 0.46} +2025-02-05 12:30:58 - ERROR - stderr - 15%|█▌ | 3410/22434 [2:23:17<13:45:56, 2.60s/it] +2025-02-05 12:31:00 - ERROR - stderr - 15%|█▌ | 3411/22434 [2:23:20<13:34:50, 2.57s/it] +2025-02-05 12:31:00 - ERROR - stderr - +2025-02-05 12:31:00 - ERROR - stderr - +2025-02-05 12:31:00 - INFO - stdout - {'loss': 0.9616, 'grad_norm': 1.0955075025558472, 'learning_rate': 1.9229375043106233e-05, 'epoch': 0.46} +2025-02-05 12:31:00 - ERROR - stderr - 15%|█▌ | 3411/22434 [2:23:20<13:34:50, 2.57s/it] +2025-02-05 12:31:03 - ERROR - stderr - 15%|█▌ | 3412/22434 [2:23:22<13:24:10, 2.54s/it] +2025-02-05 12:31:03 - ERROR - stderr - +2025-02-05 12:31:03 - ERROR - stderr - +2025-02-05 12:31:03 - INFO - stdout - {'loss': 0.9329, 'grad_norm': 1.1396260261535645, 'learning_rate': 1.922881917691789e-05, 'epoch': 0.46} +2025-02-05 12:31:03 - ERROR - stderr - 15%|█▌ | 3412/22434 [2:23:22<13:24:10, 2.54s/it] +2025-02-05 12:31:05 - ERROR - stderr - 15%|█▌ | 3413/22434 [2:23:25<13:13:01, 2.50s/it] +2025-02-05 12:31:05 - ERROR - stderr - +2025-02-05 12:31:05 - ERROR - stderr - +2025-02-05 12:31:05 - INFO - stdout - {'loss': 1.0297, 'grad_norm': 1.0695569515228271, 'learning_rate': 1.922826311836364e-05, 'epoch': 0.46} +2025-02-05 12:31:05 - ERROR - stderr - 15%|█▌ | 3413/22434 [2:23:25<13:13:01, 2.50s/it] +2025-02-05 12:31:07 - ERROR - stderr - 15%|█▌ | 3414/22434 [2:23:27<13:10:31, 2.49s/it] +2025-02-05 12:31:07 - ERROR - stderr - +2025-02-05 12:31:07 - ERROR - stderr - +2025-02-05 12:31:07 - INFO - stdout - {'loss': 1.0094, 'grad_norm': 1.126440167427063, 'learning_rate': 
1.922770686745508e-05, 'epoch': 0.46} +2025-02-05 12:31:07 - ERROR - stderr - 15%|█▌ | 3414/22434 [2:23:27<13:10:31, 2.49s/it] +2025-02-05 12:31:10 - ERROR - stderr - 15%|█▌ | 3415/22434 [2:23:30<13:13:27, 2.50s/it] +2025-02-05 12:31:10 - ERROR - stderr - +2025-02-05 12:31:10 - ERROR - stderr - +2025-02-05 12:31:10 - INFO - stdout - {'loss': 0.9809, 'grad_norm': 1.039226770401001, 'learning_rate': 1.92271504242038e-05, 'epoch': 0.46} +2025-02-05 12:31:10 - ERROR - stderr - 15%|█▌ | 3415/22434 [2:23:30<13:13:27, 2.50s/it] +2025-02-05 12:31:12 - ERROR - stderr - 15%|█▌ | 3416/22434 [2:23:32<13:15:21, 2.51s/it] +2025-02-05 12:31:13 - ERROR - stderr - +2025-02-05 12:31:13 - ERROR - stderr - +2025-02-05 12:31:13 - INFO - stdout - {'loss': 1.0911, 'grad_norm': 1.430786371231079, 'learning_rate': 1.9226593788621393e-05, 'epoch': 0.46} +2025-02-05 12:31:13 - ERROR - stderr - 15%|█▌ | 3416/22434 [2:23:32<13:15:21, 2.51s/it] +2025-02-05 12:31:15 - ERROR - stderr - 15%|█▌ | 3417/22434 [2:23:35<13:10:46, 2.49s/it] +2025-02-05 12:31:15 - ERROR - stderr - +2025-02-05 12:31:15 - ERROR - stderr - +2025-02-05 12:31:15 - INFO - stdout - {'loss': 0.9621, 'grad_norm': 1.1412203311920166, 'learning_rate': 1.9226036960719474e-05, 'epoch': 0.46} +2025-02-05 12:31:15 - ERROR - stderr - 15%|█▌ | 3417/22434 [2:23:35<13:10:46, 2.49s/it] +2025-02-05 12:31:17 - ERROR - stderr - 15%|█▌ | 3418/22434 [2:23:37<13:10:16, 2.49s/it] +2025-02-05 12:31:17 - ERROR - stderr - +2025-02-05 12:31:17 - ERROR - stderr - +2025-02-05 12:31:17 - INFO - stdout - {'loss': 1.0723, 'grad_norm': 1.1158143281936646, 'learning_rate': 1.922547994050964e-05, 'epoch': 0.46} +2025-02-05 12:31:17 - ERROR - stderr - 15%|█▌ | 3418/22434 [2:23:37<13:10:16, 2.49s/it] +2025-02-05 12:31:20 - ERROR - stderr - 15%|█▌ | 3419/22434 [2:23:40<13:11:12, 2.50s/it] +2025-02-05 12:31:20 - ERROR - stderr - +2025-02-05 12:31:20 - ERROR - stderr - +2025-02-05 12:31:20 - INFO - stdout - {'loss': 1.0132, 'grad_norm': 1.1592521667480469, 
'learning_rate': 1.9224922728003507e-05, 'epoch': 0.46} +2025-02-05 12:31:20 - ERROR - stderr - 15%|█▌ | 3419/22434 [2:23:40<13:11:12, 2.50s/it] +2025-02-05 12:31:23 - ERROR - stderr - 15%|█▌ | 3420/22434 [2:23:43<14:05:19, 2.67s/it] +2025-02-05 12:31:23 - ERROR - stderr - +2025-02-05 12:31:23 - ERROR - stderr - +2025-02-05 12:31:23 - INFO - stdout - {'loss': 0.9764, 'grad_norm': 1.0948034524917603, 'learning_rate': 1.9224365323212685e-05, 'epoch': 0.46} +2025-02-05 12:31:23 - ERROR - stderr - 15%|█▌ | 3420/22434 [2:23:43<14:05:19, 2.67s/it] +2025-02-05 12:31:25 - ERROR - stderr - 15%|█▌ | 3421/22434 [2:23:45<13:46:07, 2.61s/it] +2025-02-05 12:31:26 - ERROR - stderr - +2025-02-05 12:31:26 - ERROR - stderr - +2025-02-05 12:31:26 - INFO - stdout - {'loss': 0.8994, 'grad_norm': 1.1486250162124634, 'learning_rate': 1.9223807726148792e-05, 'epoch': 0.46} +2025-02-05 12:31:26 - ERROR - stderr - 15%|█▌ | 3421/22434 [2:23:45<13:46:07, 2.61s/it] +2025-02-05 12:31:28 - ERROR - stderr - 15%|█▌ | 3422/22434 [2:23:48<13:58:07, 2.65s/it] +2025-02-05 12:31:28 - ERROR - stderr - +2025-02-05 12:31:28 - ERROR - stderr - +2025-02-05 12:31:28 - INFO - stdout - {'loss': 1.0021, 'grad_norm': 1.1484692096710205, 'learning_rate': 1.9223249936823457e-05, 'epoch': 0.46} +2025-02-05 12:31:28 - ERROR - stderr - 15%|█▌ | 3422/22434 [2:23:48<13:58:07, 2.65s/it] +2025-02-05 12:31:31 - ERROR - stderr - 15%|█▌ | 3423/22434 [2:23:50<13:47:39, 2.61s/it] +2025-02-05 12:31:31 - ERROR - stderr - +2025-02-05 12:31:31 - ERROR - stderr - +2025-02-05 12:31:31 - INFO - stdout - {'loss': 1.1202, 'grad_norm': 1.1607353687286377, 'learning_rate': 1.92226919552483e-05, 'epoch': 0.46} +2025-02-05 12:31:31 - ERROR - stderr - 15%|█▌ | 3423/22434 [2:23:51<13:47:39, 2.61s/it] +2025-02-05 12:31:33 - ERROR - stderr - 15%|█▌ | 3424/22434 [2:23:53<13:34:32, 2.57s/it] +2025-02-05 12:31:33 - ERROR - stderr - +2025-02-05 12:31:33 - ERROR - stderr - +2025-02-05 12:31:33 - INFO - stdout - {'loss': 0.9923, 'grad_norm': 
1.087226152420044, 'learning_rate': 1.922213378143496e-05, 'epoch': 0.46} +2025-02-05 12:31:33 - ERROR - stderr - 15%|█▌ | 3424/22434 [2:23:53<13:34:32, 2.57s/it] +2025-02-05 12:31:36 - ERROR - stderr - 15%|█▌ | 3425/22434 [2:23:56<13:39:34, 2.59s/it] +2025-02-05 12:31:36 - ERROR - stderr - +2025-02-05 12:31:36 - ERROR - stderr - +2025-02-05 12:31:36 - INFO - stdout - {'loss': 0.8913, 'grad_norm': 1.1255282163619995, 'learning_rate': 1.9221575415395058e-05, 'epoch': 0.46} +2025-02-05 12:31:36 - ERROR - stderr - 15%|█▌ | 3425/22434 [2:23:56<13:39:34, 2.59s/it] +2025-02-05 12:31:38 - ERROR - stderr - 15%|█▌ | 3426/22434 [2:23:58<13:27:46, 2.55s/it] +2025-02-05 12:31:38 - ERROR - stderr - +2025-02-05 12:31:38 - ERROR - stderr - +2025-02-05 12:31:38 - INFO - stdout - {'loss': 0.9961, 'grad_norm': 1.1019564867019653, 'learning_rate': 1.9221016857140244e-05, 'epoch': 0.46} +2025-02-05 12:31:38 - ERROR - stderr - 15%|█▌ | 3426/22434 [2:23:58<13:27:46, 2.55s/it] +2025-02-05 12:31:41 - ERROR - stderr - 15%|█▌ | 3427/22434 [2:24:00<13:15:36, 2.51s/it] +2025-02-05 12:31:41 - ERROR - stderr - +2025-02-05 12:31:41 - ERROR - stderr - +2025-02-05 12:31:41 - INFO - stdout - {'loss': 1.0114, 'grad_norm': 1.1333547830581665, 'learning_rate': 1.922045810668216e-05, 'epoch': 0.46} +2025-02-05 12:31:41 - ERROR - stderr - 15%|█▌ | 3427/22434 [2:24:01<13:15:36, 2.51s/it] +2025-02-05 12:31:43 - ERROR - stderr - 15%|█▌ | 3428/22434 [2:24:03<13:17:41, 2.52s/it] +2025-02-05 12:31:43 - ERROR - stderr - +2025-02-05 12:31:43 - ERROR - stderr - +2025-02-05 12:31:43 - INFO - stdout - {'loss': 1.1199, 'grad_norm': 1.2449817657470703, 'learning_rate': 1.9219899164032446e-05, 'epoch': 0.46} +2025-02-05 12:31:43 - ERROR - stderr - 15%|█▌ | 3428/22434 [2:24:03<13:17:41, 2.52s/it] +2025-02-05 12:31:46 - ERROR - stderr - 15%|█▌ | 3429/22434 [2:24:05<13:14:03, 2.51s/it] +2025-02-05 12:31:46 - ERROR - stderr - +2025-02-05 12:31:46 - ERROR - stderr - +2025-02-05 12:31:46 - INFO - stdout - {'loss': 0.8471, 
'grad_norm': 1.0216999053955078, 'learning_rate': 1.921934002920276e-05, 'epoch': 0.46} +2025-02-05 12:31:46 - ERROR - stderr - 15%|█▌ | 3429/22434 [2:24:06<13:14:03, 2.51s/it] +2025-02-05 12:31:48 - ERROR - stderr - 15%|█▌ | 3430/22434 [2:24:08<13:09:19, 2.49s/it] +2025-02-05 12:31:48 - ERROR - stderr - +2025-02-05 12:31:48 - ERROR - stderr - +2025-02-05 12:31:48 - INFO - stdout - {'loss': 1.0736, 'grad_norm': 1.1862415075302124, 'learning_rate': 1.921878070220475e-05, 'epoch': 0.46} +2025-02-05 12:31:48 - ERROR - stderr - 15%|█▌ | 3430/22434 [2:24:08<13:09:19, 2.49s/it] +2025-02-05 12:31:51 - ERROR - stderr - 15%|█▌ | 3431/22434 [2:24:10<13:01:32, 2.47s/it] +2025-02-05 12:31:51 - ERROR - stderr - +2025-02-05 12:31:51 - ERROR - stderr - +2025-02-05 12:31:51 - INFO - stdout - {'loss': 0.9986, 'grad_norm': 1.1326826810836792, 'learning_rate': 1.921822118305008e-05, 'epoch': 0.46} +2025-02-05 12:31:51 - ERROR - stderr - 15%|█▌ | 3431/22434 [2:24:10<13:01:32, 2.47s/it] +2025-02-05 12:31:53 - ERROR - stderr - 15%|█▌ | 3432/22434 [2:24:13<13:35:06, 2.57s/it] +2025-02-05 12:31:53 - ERROR - stderr - +2025-02-05 12:31:53 - ERROR - stderr - +2025-02-05 12:31:53 - INFO - stdout - {'loss': 1.0586, 'grad_norm': 1.077890157699585, 'learning_rate': 1.9217661471750406e-05, 'epoch': 0.46} +2025-02-05 12:31:53 - ERROR - stderr - 15%|█▌ | 3432/22434 [2:24:13<13:35:06, 2.57s/it] +2025-02-05 12:31:56 - ERROR - stderr - 15%|█▌ | 3433/22434 [2:24:16<13:23:03, 2.54s/it] +2025-02-05 12:31:56 - ERROR - stderr - +2025-02-05 12:31:56 - ERROR - stderr - +2025-02-05 12:31:56 - INFO - stdout - {'loss': 1.0043, 'grad_norm': 1.1471023559570312, 'learning_rate': 1.9217101568317402e-05, 'epoch': 0.46} +2025-02-05 12:31:56 - ERROR - stderr - 15%|█▌ | 3433/22434 [2:24:16<13:23:03, 2.54s/it] +2025-02-05 12:31:58 - ERROR - stderr - 15%|█▌ | 3434/22434 [2:24:18<13:14:56, 2.51s/it] +2025-02-05 12:31:58 - ERROR - stderr - +2025-02-05 12:31:58 - ERROR - stderr - +2025-02-05 12:31:58 - INFO - stdout - 
{'loss': 1.0345, 'grad_norm': 1.126162052154541, 'learning_rate': 1.9216541472762736e-05, 'epoch': 0.46} +2025-02-05 12:31:58 - ERROR - stderr - 15%|█▌ | 3434/22434 [2:24:18<13:14:56, 2.51s/it] +2025-02-05 12:32:01 - ERROR - stderr - 15%|█▌ | 3435/22434 [2:24:21<13:36:39, 2.58s/it] +2025-02-05 12:32:01 - ERROR - stderr - +2025-02-05 12:32:01 - ERROR - stderr - +2025-02-05 12:32:01 - INFO - stdout - {'loss': 0.9682, 'grad_norm': 1.087627649307251, 'learning_rate': 1.9215981185098083e-05, 'epoch': 0.46} +2025-02-05 12:32:01 - ERROR - stderr - 15%|█▌ | 3435/22434 [2:24:21<13:36:39, 2.58s/it] +2025-02-05 12:32:04 - ERROR - stderr - 15%|█▌ | 3436/22434 [2:24:23<13:27:17, 2.55s/it] +2025-02-05 12:32:04 - ERROR - stderr - +2025-02-05 12:32:04 - ERROR - stderr - +2025-02-05 12:32:04 - INFO - stdout - {'loss': 1.0124, 'grad_norm': 1.0893338918685913, 'learning_rate': 1.9215420705335117e-05, 'epoch': 0.46} +2025-02-05 12:32:04 - ERROR - stderr - 15%|█▌ | 3436/22434 [2:24:23<13:27:17, 2.55s/it] +2025-02-05 12:32:06 - ERROR - stderr - 15%|█▌ | 3437/22434 [2:24:26<13:27:55, 2.55s/it] +2025-02-05 12:32:06 - ERROR - stderr - +2025-02-05 12:32:06 - ERROR - stderr - +2025-02-05 12:32:06 - INFO - stdout - {'loss': 1.0602, 'grad_norm': 1.1037579774856567, 'learning_rate': 1.921486003348553e-05, 'epoch': 0.46} +2025-02-05 12:32:06 - ERROR - stderr - 15%|█▌ | 3437/22434 [2:24:26<13:27:55, 2.55s/it] +2025-02-05 12:32:09 - ERROR - stderr - 15%|█▌ | 3438/22434 [2:24:28<13:23:36, 2.54s/it] +2025-02-05 12:32:09 - ERROR - stderr - +2025-02-05 12:32:09 - ERROR - stderr - +2025-02-05 12:32:09 - INFO - stdout - {'loss': 0.9902, 'grad_norm': 1.064774990081787, 'learning_rate': 1.9214299169561e-05, 'epoch': 0.46} +2025-02-05 12:32:09 - ERROR - stderr - 15%|█▌ | 3438/22434 [2:24:28<13:23:36, 2.54s/it] +2025-02-05 12:32:11 - ERROR - stderr - 15%|█▌ | 3439/22434 [2:24:31<13:15:26, 2.51s/it] +2025-02-05 12:32:11 - ERROR - stderr - +2025-02-05 12:32:11 - ERROR - stderr - +2025-02-05 12:32:11 - INFO - 
stdout - {'loss': 1.0371, 'grad_norm': 1.067426085472107, 'learning_rate': 1.921373811357322e-05, 'epoch': 0.46} +2025-02-05 12:32:11 - ERROR - stderr - 15%|█▌ | 3439/22434 [2:24:31<13:15:26, 2.51s/it] +2025-02-05 12:32:14 - ERROR - stderr - 15%|█▌ | 3440/22434 [2:24:34<13:35:01, 2.57s/it] +2025-02-05 12:32:14 - ERROR - stderr - +2025-02-05 12:32:14 - ERROR - stderr - +2025-02-05 12:32:14 - INFO - stdout - {'loss': 1.0329, 'grad_norm': 1.1470178365707397, 'learning_rate': 1.9213176865533887e-05, 'epoch': 0.46} +2025-02-05 12:32:14 - ERROR - stderr - 15%|█▌ | 3440/22434 [2:24:34<13:35:01, 2.57s/it] +2025-02-05 12:32:16 - ERROR - stderr - 15%|█▌ | 3441/22434 [2:24:36<13:24:36, 2.54s/it] +2025-02-05 12:32:16 - ERROR - stderr - +2025-02-05 12:32:16 - ERROR - stderr - +2025-02-05 12:32:16 - INFO - stdout - {'loss': 0.9333, 'grad_norm': 1.0724917650222778, 'learning_rate': 1.92126154254547e-05, 'epoch': 0.46} +2025-02-05 12:32:16 - ERROR - stderr - 15%|█▌ | 3441/22434 [2:24:36<13:24:36, 2.54s/it] +2025-02-05 12:32:19 - ERROR - stderr - 15%|█▌ | 3442/22434 [2:24:38<13:17:40, 2.52s/it] +2025-02-05 12:32:19 - ERROR - stderr - +2025-02-05 12:32:19 - ERROR - stderr - +2025-02-05 12:32:19 - INFO - stdout - {'loss': 0.9744, 'grad_norm': 1.1104001998901367, 'learning_rate': 1.921205379334736e-05, 'epoch': 0.46} +2025-02-05 12:32:19 - ERROR - stderr - 15%|█▌ | 3442/22434 [2:24:39<13:17:40, 2.52s/it] +2025-02-05 12:32:21 - ERROR - stderr - 15%|█▌ | 3443/22434 [2:24:41<13:06:06, 2.48s/it] +2025-02-05 12:32:21 - ERROR - stderr - +2025-02-05 12:32:21 - ERROR - stderr - +2025-02-05 12:32:21 - INFO - stdout - {'loss': 1.0706, 'grad_norm': 1.1794346570968628, 'learning_rate': 1.921149196922357e-05, 'epoch': 0.46} +2025-02-05 12:32:21 - ERROR - stderr - 15%|█▌ | 3443/22434 [2:24:41<13:06:06, 2.48s/it] +2025-02-05 12:32:24 - ERROR - stderr - 15%|█▌ | 3444/22434 [2:24:43<13:01:23, 2.47s/it] +2025-02-05 12:32:24 - ERROR - stderr - +2025-02-05 12:32:24 - ERROR - stderr - +2025-02-05 12:32:24 
- INFO - stdout - {'loss': 0.8562, 'grad_norm': 1.1289557218551636, 'learning_rate': 1.9210929953095047e-05, 'epoch': 0.46} +2025-02-05 12:32:24 - ERROR - stderr - 15%|█▌ | 3444/22434 [2:24:43<13:01:23, 2.47s/it] +2025-02-05 12:32:26 - ERROR - stderr - 15%|█▌ | 3445/22434 [2:24:46<12:59:30, 2.46s/it] +2025-02-05 12:32:26 - ERROR - stderr - +2025-02-05 12:32:26 - ERROR - stderr - +2025-02-05 12:32:26 - INFO - stdout - {'loss': 0.9655, 'grad_norm': 1.0861027240753174, 'learning_rate': 1.9210367744973498e-05, 'epoch': 0.46} +2025-02-05 12:32:26 - ERROR - stderr - 15%|█▌ | 3445/22434 [2:24:46<12:59:30, 2.46s/it] +2025-02-05 12:32:28 - ERROR - stderr - 15%|█▌ | 3446/22434 [2:24:48<13:03:55, 2.48s/it] +2025-02-05 12:32:29 - ERROR - stderr - +2025-02-05 12:32:29 - ERROR - stderr - +2025-02-05 12:32:29 - INFO - stdout - {'loss': 0.9527, 'grad_norm': 1.247092843055725, 'learning_rate': 1.9209805344870654e-05, 'epoch': 0.46} +2025-02-05 12:32:29 - ERROR - stderr - 15%|█▌ | 3446/22434 [2:24:48<13:03:55, 2.48s/it] +2025-02-05 12:32:31 - ERROR - stderr - 15%|█▌ | 3447/22434 [2:24:51<13:13:57, 2.51s/it] +2025-02-05 12:32:31 - ERROR - stderr - +2025-02-05 12:32:31 - ERROR - stderr - +2025-02-05 12:32:31 - INFO - stdout - {'loss': 1.0268, 'grad_norm': 1.2231624126434326, 'learning_rate': 1.9209242752798225e-05, 'epoch': 0.46} +2025-02-05 12:32:31 - ERROR - stderr - 15%|█▌ | 3447/22434 [2:24:51<13:13:57, 2.51s/it] +2025-02-05 12:32:34 - ERROR - stderr - 15%|█▌ | 3448/22434 [2:24:53<13:10:50, 2.50s/it] +2025-02-05 12:32:34 - ERROR - stderr - +2025-02-05 12:32:34 - ERROR - stderr - +2025-02-05 12:32:34 - INFO - stdout - {'loss': 0.9184, 'grad_norm': 1.1294763088226318, 'learning_rate': 1.9208679968767947e-05, 'epoch': 0.46} +2025-02-05 12:32:34 - ERROR - stderr - 15%|█▌ | 3448/22434 [2:24:53<13:10:50, 2.50s/it] +2025-02-05 12:32:36 - ERROR - stderr - 15%|█▌ | 3449/22434 [2:24:56<13:06:19, 2.49s/it] +2025-02-05 12:32:36 - ERROR - stderr - +2025-02-05 12:32:36 - ERROR - stderr - 
+2025-02-05 12:32:36 - INFO - stdout - {'loss': 0.948, 'grad_norm': 1.0424902439117432, 'learning_rate': 1.9208116992791546e-05, 'epoch': 0.46} +2025-02-05 12:32:36 - ERROR - stderr - 15%|█▌ | 3449/22434 [2:24:56<13:06:19, 2.49s/it] +2025-02-05 12:32:39 - ERROR - stderr - 15%|█▌ | 3450/22434 [2:24:58<13:26:45, 2.55s/it] +2025-02-05 12:32:39 - ERROR - stderr - +2025-02-05 12:32:39 - ERROR - stderr - +2025-02-05 12:32:39 - INFO - stdout - {'loss': 0.9674, 'grad_norm': 1.1189802885055542, 'learning_rate': 1.920755382488076e-05, 'epoch': 0.46} +2025-02-05 12:32:39 - ERROR - stderr - 15%|█▌ | 3450/22434 [2:24:59<13:26:45, 2.55s/it] +2025-02-05 12:32:41 - ERROR - stderr - 15%|█▌ | 3451/22434 [2:25:01<13:23:21, 2.54s/it] +2025-02-05 12:32:41 - ERROR - stderr - +2025-02-05 12:32:41 - ERROR - stderr - +2025-02-05 12:32:41 - INFO - stdout - {'loss': 0.9225, 'grad_norm': 1.1505566835403442, 'learning_rate': 1.9206990465047316e-05, 'epoch': 0.46} +2025-02-05 12:32:41 - ERROR - stderr - 15%|█▌ | 3451/22434 [2:25:01<13:23:21, 2.54s/it] +2025-02-05 12:32:44 - ERROR - stderr - 15%|█▌ | 3452/22434 [2:25:03<13:15:21, 2.51s/it] +2025-02-05 12:32:44 - ERROR - stderr - +2025-02-05 12:32:44 - ERROR - stderr - +2025-02-05 12:32:44 - INFO - stdout - {'loss': 0.9448, 'grad_norm': 1.078946590423584, 'learning_rate': 1.9206426913302976e-05, 'epoch': 0.46} +2025-02-05 12:32:44 - ERROR - stderr - 15%|█▌ | 3452/22434 [2:25:03<13:15:21, 2.51s/it] +2025-02-05 12:32:46 - ERROR - stderr - 15%|█▌ | 3453/22434 [2:25:06<13:17:06, 2.52s/it] +2025-02-05 12:32:46 - ERROR - stderr - +2025-02-05 12:32:46 - ERROR - stderr - +2025-02-05 12:32:46 - INFO - stdout - {'loss': 0.9962, 'grad_norm': 1.070104956626892, 'learning_rate': 1.920586316965947e-05, 'epoch': 0.46} +2025-02-05 12:32:46 - ERROR - stderr - 15%|█▌ | 3453/22434 [2:25:06<13:17:06, 2.52s/it] +2025-02-05 12:32:49 - ERROR - stderr - 15%|█▌ | 3454/22434 [2:25:08<13:11:34, 2.50s/it] +2025-02-05 12:32:49 - ERROR - stderr - +2025-02-05 12:32:49 - ERROR 
- stderr - +2025-02-05 12:32:49 - INFO - stdout - {'loss': 0.9946, 'grad_norm': 1.1646274328231812, 'learning_rate': 1.9205299234128558e-05, 'epoch': 0.46} +2025-02-05 12:32:49 - ERROR - stderr - 15%|█▌ | 3454/22434 [2:25:08<13:11:34, 2.50s/it] +2025-02-05 12:32:51 - ERROR - stderr - 15%|█▌ | 3455/22434 [2:25:11<13:05:41, 2.48s/it] +2025-02-05 12:32:51 - ERROR - stderr - +2025-02-05 12:32:51 - ERROR - stderr - +2025-02-05 12:32:51 - INFO - stdout - {'loss': 1.0148, 'grad_norm': 1.1912081241607666, 'learning_rate': 1.9204735106721992e-05, 'epoch': 0.46} +2025-02-05 12:32:51 - ERROR - stderr - 15%|█▌ | 3455/22434 [2:25:11<13:05:41, 2.48s/it] +2025-02-05 12:32:54 - ERROR - stderr - 15%|█▌ | 3456/22434 [2:25:13<13:03:04, 2.48s/it] +2025-02-05 12:32:54 - ERROR - stderr - +2025-02-05 12:32:54 - ERROR - stderr - +2025-02-05 12:32:54 - INFO - stdout - {'loss': 1.0132, 'grad_norm': 1.134006381034851, 'learning_rate': 1.920417078745153e-05, 'epoch': 0.46} +2025-02-05 12:32:54 - ERROR - stderr - 15%|█▌ | 3456/22434 [2:25:13<13:03:04, 2.48s/it] +2025-02-05 12:32:56 - ERROR - stderr - 15%|█▌ | 3457/22434 [2:25:16<13:07:05, 2.49s/it] +2025-02-05 12:32:56 - ERROR - stderr - +2025-02-05 12:32:56 - ERROR - stderr - +2025-02-05 12:32:56 - INFO - stdout - {'loss': 0.9198, 'grad_norm': 1.0826951265335083, 'learning_rate': 1.9203606276328937e-05, 'epoch': 0.46} +2025-02-05 12:32:56 - ERROR - stderr - 15%|█▌ | 3457/22434 [2:25:16<13:07:05, 2.49s/it] +2025-02-05 12:32:59 - ERROR - stderr - 15%|█▌ | 3458/22434 [2:25:19<13:27:10, 2.55s/it] +2025-02-05 12:32:59 - ERROR - stderr - +2025-02-05 12:32:59 - ERROR - stderr - +2025-02-05 12:32:59 - INFO - stdout - {'loss': 0.8228, 'grad_norm': 0.9836342334747314, 'learning_rate': 1.9203041573365978e-05, 'epoch': 0.46} +2025-02-05 12:32:59 - ERROR - stderr - 15%|█▌ | 3458/22434 [2:25:19<13:27:10, 2.55s/it] +2025-02-05 12:33:01 - ERROR - stderr - 15%|█▌ | 3459/22434 [2:25:21<13:26:20, 2.55s/it] +2025-02-05 12:33:01 - ERROR - stderr - +2025-02-05 
12:33:01 - ERROR - stderr - +2025-02-05 12:33:01 - INFO - stdout - {'loss': 0.9493, 'grad_norm': 1.1260766983032227, 'learning_rate': 1.9202476678574424e-05, 'epoch': 0.46} +2025-02-05 12:33:01 - ERROR - stderr - 15%|█▌ | 3459/22434 [2:25:21<13:26:20, 2.55s/it] +2025-02-05 12:33:04 - ERROR - stderr - 15%|█▌ | 3460/22434 [2:25:24<13:15:03, 2.51s/it] +2025-02-05 12:33:04 - ERROR - stderr - +2025-02-05 12:33:04 - ERROR - stderr - +2025-02-05 12:33:04 - INFO - stdout - {'loss': 1.003, 'grad_norm': 1.1229695081710815, 'learning_rate': 1.9201911591966045e-05, 'epoch': 0.46} +2025-02-05 12:33:04 - ERROR - stderr - 15%|█▌ | 3460/22434 [2:25:24<13:15:03, 2.51s/it] +2025-02-05 12:33:06 - ERROR - stderr - 15%|█▌ | 3461/22434 [2:25:26<13:14:25, 2.51s/it] +2025-02-05 12:33:06 - ERROR - stderr - +2025-02-05 12:33:06 - ERROR - stderr - +2025-02-05 12:33:06 - INFO - stdout - {'loss': 1.0445, 'grad_norm': 1.2000818252563477, 'learning_rate': 1.9201346313552628e-05, 'epoch': 0.46} +2025-02-05 12:33:06 - ERROR - stderr - 15%|█▌ | 3461/22434 [2:25:26<13:14:25, 2.51s/it] +2025-02-05 12:33:09 - ERROR - stderr - 15%|█▌ | 3462/22434 [2:25:28<13:07:34, 2.49s/it] +2025-02-05 12:33:09 - ERROR - stderr - +2025-02-05 12:33:09 - ERROR - stderr - +2025-02-05 12:33:09 - INFO - stdout - {'loss': 0.8961, 'grad_norm': 1.0836660861968994, 'learning_rate': 1.920078084334595e-05, 'epoch': 0.46} +2025-02-05 12:33:09 - ERROR - stderr - 15%|█▌ | 3462/22434 [2:25:29<13:07:34, 2.49s/it] +2025-02-05 12:33:11 - ERROR - stderr - 15%|█▌ | 3463/22434 [2:25:31<13:03:43, 2.48s/it] +2025-02-05 12:33:11 - ERROR - stderr - +2025-02-05 12:33:11 - ERROR - stderr - +2025-02-05 12:33:11 - INFO - stdout - {'loss': 0.9747, 'grad_norm': 1.1743868589401245, 'learning_rate': 1.9200215181357798e-05, 'epoch': 0.46} +2025-02-05 12:33:11 - ERROR - stderr - 15%|█▌ | 3463/22434 [2:25:31<13:03:43, 2.48s/it] +2025-02-05 12:33:14 - ERROR - stderr - 15%|█▌ | 3464/22434 [2:25:33<13:07:03, 2.49s/it] +2025-02-05 12:33:14 - ERROR - stderr 
- +2025-02-05 12:33:14 - ERROR - stderr - +2025-02-05 12:33:14 - INFO - stdout - {'loss': 0.9822, 'grad_norm': 1.1364059448242188, 'learning_rate': 1.919964932759997e-05, 'epoch': 0.46} +2025-02-05 12:33:14 - ERROR - stderr - 15%|█▌ | 3464/22434 [2:25:33<13:07:03, 2.49s/it] +2025-02-05 12:33:16 - ERROR - stderr - 15%|█▌ | 3465/22434 [2:25:36<13:08:31, 2.49s/it] +2025-02-05 12:33:16 - ERROR - stderr - +2025-02-05 12:33:16 - ERROR - stderr - +2025-02-05 12:33:16 - INFO - stdout - {'loss': 0.9867, 'grad_norm': 1.0988980531692505, 'learning_rate': 1.9199083282084253e-05, 'epoch': 0.46} +2025-02-05 12:33:16 - ERROR - stderr - 15%|█▌ | 3465/22434 [2:25:36<13:08:31, 2.49s/it] +2025-02-05 12:33:19 - ERROR - stderr - 15%|█▌ | 3466/22434 [2:25:38<13:09:56, 2.50s/it] +2025-02-05 12:33:19 - ERROR - stderr - +2025-02-05 12:33:19 - ERROR - stderr - +2025-02-05 12:33:19 - INFO - stdout - {'loss': 0.9703, 'grad_norm': 1.0937973260879517, 'learning_rate': 1.9198517044822445e-05, 'epoch': 0.46} +2025-02-05 12:33:19 - ERROR - stderr - 15%|█▌ | 3466/22434 [2:25:39<13:09:56, 2.50s/it] +2025-02-05 12:33:22 - ERROR - stderr - 15%|█▌ | 3467/22434 [2:25:41<13:40:48, 2.60s/it] +2025-02-05 12:33:22 - ERROR - stderr - +2025-02-05 12:33:22 - ERROR - stderr - +2025-02-05 12:33:22 - INFO - stdout - {'loss': 0.923, 'grad_norm': 1.0952130556106567, 'learning_rate': 1.9197950615826354e-05, 'epoch': 0.46} +2025-02-05 12:33:22 - ERROR - stderr - 15%|█▌ | 3467/22434 [2:25:41<13:40:48, 2.60s/it] +2025-02-05 12:33:24 - ERROR - stderr - 15%|█▌ | 3468/22434 [2:25:44<13:27:09, 2.55s/it] +2025-02-05 12:33:24 - ERROR - stderr - +2025-02-05 12:33:24 - ERROR - stderr - +2025-02-05 12:33:24 - INFO - stdout - {'loss': 0.9464, 'grad_norm': 1.1552884578704834, 'learning_rate': 1.919738399510778e-05, 'epoch': 0.46} +2025-02-05 12:33:24 - ERROR - stderr - 15%|█▌ | 3468/22434 [2:25:44<13:27:09, 2.55s/it] +2025-02-05 12:33:26 - ERROR - stderr - 15%|█▌ | 3469/22434 [2:25:46<13:17:23, 2.52s/it] +2025-02-05 12:33:26 - 
ERROR - stderr - +2025-02-05 12:33:26 - ERROR - stderr - +2025-02-05 12:33:26 - INFO - stdout - {'loss': 0.8644, 'grad_norm': 1.0457805395126343, 'learning_rate': 1.919681718267854e-05, 'epoch': 0.46} +2025-02-05 12:33:26 - ERROR - stderr - 15%|█▌ | 3469/22434 [2:25:46<13:17:23, 2.52s/it] +2025-02-05 12:33:29 - ERROR - stderr - 15%|█▌ | 3470/22434 [2:25:49<13:15:16, 2.52s/it] +2025-02-05 12:33:29 - ERROR - stderr - +2025-02-05 12:33:29 - ERROR - stderr - +2025-02-05 12:33:29 - INFO - stdout - {'loss': 1.0576, 'grad_norm': 1.085153579711914, 'learning_rate': 1.9196250178550447e-05, 'epoch': 0.46} +2025-02-05 12:33:29 - ERROR - stderr - 15%|█▌ | 3470/22434 [2:25:49<13:15:16, 2.52s/it] +2025-02-05 12:33:32 - ERROR - stderr - 15%|█▌ | 3471/22434 [2:25:51<13:34:32, 2.58s/it] +2025-02-05 12:33:32 - ERROR - stderr - +2025-02-05 12:33:32 - ERROR - stderr - +2025-02-05 12:33:32 - INFO - stdout - {'loss': 1.0834, 'grad_norm': 1.2174235582351685, 'learning_rate': 1.9195682982735317e-05, 'epoch': 0.46} +2025-02-05 12:33:32 - ERROR - stderr - 15%|█▌ | 3471/22434 [2:25:51<13:34:32, 2.58s/it] +2025-02-05 12:33:34 - ERROR - stderr - 15%|█▌ | 3472/22434 [2:25:54<13:26:36, 2.55s/it] +2025-02-05 12:33:34 - ERROR - stderr - +2025-02-05 12:33:34 - ERROR - stderr - +2025-02-05 12:33:34 - INFO - stdout - {'loss': 0.9687, 'grad_norm': 1.2986717224121094, 'learning_rate': 1.9195115595244976e-05, 'epoch': 0.46} +2025-02-05 12:33:34 - ERROR - stderr - 15%|█▌ | 3472/22434 [2:25:54<13:26:36, 2.55s/it] +2025-02-05 12:33:37 - ERROR - stderr - 15%|█▌ | 3473/22434 [2:25:56<13:17:36, 2.52s/it] +2025-02-05 12:33:37 - ERROR - stderr - +2025-02-05 12:33:37 - ERROR - stderr - +2025-02-05 12:33:37 - INFO - stdout - {'loss': 0.8698, 'grad_norm': 1.1093624830245972, 'learning_rate': 1.919454801609125e-05, 'epoch': 0.46} +2025-02-05 12:33:37 - ERROR - stderr - 15%|█▌ | 3473/22434 [2:25:56<13:17:36, 2.52s/it] +2025-02-05 12:33:39 - ERROR - stderr - 15%|█▌ | 3474/22434 [2:25:59<13:16:49, 2.52s/it] 
+2025-02-05 12:33:39 - ERROR - stderr - +2025-02-05 12:33:39 - ERROR - stderr - +2025-02-05 12:33:39 - INFO - stdout - {'loss': 0.9729, 'grad_norm': 1.1158854961395264, 'learning_rate': 1.9193980245285967e-05, 'epoch': 0.46} +2025-02-05 12:33:39 - ERROR - stderr - 15%|█▌ | 3474/22434 [2:25:59<13:16:49, 2.52s/it] +2025-02-05 12:33:42 - ERROR - stderr - 15%|█▌ | 3475/22434 [2:26:01<13:09:58, 2.50s/it] +2025-02-05 12:33:42 - ERROR - stderr - +2025-02-05 12:33:42 - ERROR - stderr - +2025-02-05 12:33:42 - INFO - stdout - {'loss': 0.9811, 'grad_norm': 1.2086210250854492, 'learning_rate': 1.9193412282840965e-05, 'epoch': 0.46} +2025-02-05 12:33:42 - ERROR - stderr - 15%|█▌ | 3475/22434 [2:26:01<13:09:58, 2.50s/it] +2025-02-05 12:33:44 - ERROR - stderr - 15%|█▌ | 3476/22434 [2:26:04<13:05:29, 2.49s/it] +2025-02-05 12:33:44 - ERROR - stderr - +2025-02-05 12:33:44 - ERROR - stderr - +2025-02-05 12:33:44 - INFO - stdout - {'loss': 0.9405, 'grad_norm': 1.0425963401794434, 'learning_rate': 1.9192844128768077e-05, 'epoch': 0.46} +2025-02-05 12:33:44 - ERROR - stderr - 15%|█▌ | 3476/22434 [2:26:04<13:05:29, 2.49s/it] +2025-02-05 12:33:46 - ERROR - stderr - 15%|█▌ | 3477/22434 [2:26:06<13:03:24, 2.48s/it] +2025-02-05 12:33:47 - ERROR - stderr - +2025-02-05 12:33:47 - ERROR - stderr - +2025-02-05 12:33:47 - INFO - stdout - {'loss': 1.0837, 'grad_norm': 1.2545664310455322, 'learning_rate': 1.9192275783079155e-05, 'epoch': 0.46} +2025-02-05 12:33:47 - ERROR - stderr - 15%|█▌ | 3477/22434 [2:26:06<13:03:24, 2.48s/it] +2025-02-05 12:33:49 - ERROR - stderr - 16%|█▌ | 3478/22434 [2:26:09<13:09:09, 2.50s/it] +2025-02-05 12:33:49 - ERROR - stderr - +2025-02-05 12:33:49 - ERROR - stderr - +2025-02-05 12:33:49 - INFO - stdout - {'loss': 0.9364, 'grad_norm': 1.0331977605819702, 'learning_rate': 1.9191707245786038e-05, 'epoch': 0.47} +2025-02-05 12:33:49 - ERROR - stderr - 16%|█▌ | 3478/22434 [2:26:09<13:09:09, 2.50s/it] +2025-02-05 12:33:52 - ERROR - stderr - 16%|█▌ | 3479/22434 
[2:26:11<13:13:01, 2.51s/it] +2025-02-05 12:33:52 - ERROR - stderr - +2025-02-05 12:33:52 - ERROR - stderr - +2025-02-05 12:33:52 - INFO - stdout - {'loss': 0.9805, 'grad_norm': 1.200106143951416, 'learning_rate': 1.919113851690058e-05, 'epoch': 0.47} +2025-02-05 12:33:52 - ERROR - stderr - 16%|█▌ | 3479/22434 [2:26:11<13:13:01, 2.51s/it] +2025-02-05 12:33:54 - ERROR - stderr - 16%|█▌ | 3480/22434 [2:26:14<13:06:24, 2.49s/it] +2025-02-05 12:33:54 - ERROR - stderr - +2025-02-05 12:33:54 - ERROR - stderr - +2025-02-05 12:33:54 - INFO - stdout - {'loss': 1.0226, 'grad_norm': 1.121775507926941, 'learning_rate': 1.9190569596434635e-05, 'epoch': 0.47} +2025-02-05 12:33:54 - ERROR - stderr - 16%|█▌ | 3480/22434 [2:26:14<13:06:24, 2.49s/it] +2025-02-05 12:33:56 - ERROR - stderr - 16%|█▌ | 3481/22434 [2:26:16<13:02:03, 2.48s/it] +2025-02-05 12:33:56 - ERROR - stderr - +2025-02-05 12:33:56 - ERROR - stderr - +2025-02-05 12:33:56 - INFO - stdout - {'loss': 0.9592, 'grad_norm': 1.0887154340744019, 'learning_rate': 1.9190000484400058e-05, 'epoch': 0.47} +2025-02-05 12:33:56 - ERROR - stderr - 16%|█▌ | 3481/22434 [2:26:16<13:02:03, 2.48s/it] +2025-02-05 12:33:59 - ERROR - stderr - 16%|█▌ | 3482/22434 [2:26:19<13:05:38, 2.49s/it] +2025-02-05 12:33:59 - ERROR - stderr - +2025-02-05 12:33:59 - ERROR - stderr - +2025-02-05 12:33:59 - INFO - stdout - {'loss': 0.919, 'grad_norm': 1.155894160270691, 'learning_rate': 1.9189431180808715e-05, 'epoch': 0.47} +2025-02-05 12:33:59 - ERROR - stderr - 16%|█▌ | 3482/22434 [2:26:19<13:05:38, 2.49s/it] +2025-02-05 12:34:01 - ERROR - stderr - 16%|█▌ | 3483/22434 [2:26:21<13:10:26, 2.50s/it] +2025-02-05 12:34:02 - ERROR - stderr - +2025-02-05 12:34:02 - ERROR - stderr - +2025-02-05 12:34:02 - INFO - stdout - {'loss': 1.1008, 'grad_norm': 1.1092969179153442, 'learning_rate': 1.9188861685672475e-05, 'epoch': 0.47} +2025-02-05 12:34:02 - ERROR - stderr - 16%|█▌ | 3483/22434 [2:26:21<13:10:26, 2.50s/it] +2025-02-05 12:34:04 - ERROR - stderr - 16%|█▌ | 
3484/22434 [2:26:24<13:09:37, 2.50s/it] +2025-02-05 12:34:04 - ERROR - stderr - +2025-02-05 12:34:04 - ERROR - stderr - +2025-02-05 12:34:04 - INFO - stdout - {'loss': 0.8381, 'grad_norm': 1.0001921653747559, 'learning_rate': 1.9188291999003207e-05, 'epoch': 0.47} +2025-02-05 12:34:04 - ERROR - stderr - 16%|█▌ | 3484/22434 [2:26:24<13:09:37, 2.50s/it] +2025-02-05 12:34:06 - ERROR - stderr - 16%|█▌ | 3485/22434 [2:26:26<13:07:50, 2.49s/it] +2025-02-05 12:34:07 - ERROR - stderr - +2025-02-05 12:34:07 - ERROR - stderr - +2025-02-05 12:34:07 - INFO - stdout - {'loss': 0.9104, 'grad_norm': 0.9587041735649109, 'learning_rate': 1.9187722120812783e-05, 'epoch': 0.47} +2025-02-05 12:34:07 - ERROR - stderr - 16%|█▌ | 3485/22434 [2:26:26<13:07:50, 2.49s/it] +2025-02-05 12:34:09 - ERROR - stderr - 16%|█▌ | 3486/22434 [2:26:29<13:12:58, 2.51s/it] +2025-02-05 12:34:09 - ERROR - stderr - +2025-02-05 12:34:09 - ERROR - stderr - +2025-02-05 12:34:09 - INFO - stdout - {'loss': 0.9604, 'grad_norm': 1.0774333477020264, 'learning_rate': 1.9187152051113082e-05, 'epoch': 0.47} +2025-02-05 12:34:09 - ERROR - stderr - 16%|█▌ | 3486/22434 [2:26:29<13:12:58, 2.51s/it] +2025-02-05 12:34:12 - ERROR - stderr - 16%|█▌ | 3487/22434 [2:26:32<13:39:29, 2.60s/it] +2025-02-05 12:34:12 - ERROR - stderr - +2025-02-05 12:34:12 - ERROR - stderr - +2025-02-05 12:34:12 - INFO - stdout - {'loss': 0.9771, 'grad_norm': 1.054100513458252, 'learning_rate': 1.918658178991599e-05, 'epoch': 0.47} +2025-02-05 12:34:12 - ERROR - stderr - 16%|█▌ | 3487/22434 [2:26:32<13:39:29, 2.60s/it] +2025-02-05 12:34:14 - ERROR - stderr - 16%|█▌ | 3488/22434 [2:26:34<13:24:31, 2.55s/it] +2025-02-05 12:34:14 - ERROR - stderr - +2025-02-05 12:34:14 - ERROR - stderr - +2025-02-05 12:34:14 - INFO - stdout - {'loss': 0.8995, 'grad_norm': 1.1179627180099487, 'learning_rate': 1.9186011337233387e-05, 'epoch': 0.47} +2025-02-05 12:34:14 - ERROR - stderr - 16%|█▌ | 3488/22434 [2:26:34<13:24:31, 2.55s/it] +2025-02-05 12:34:17 - ERROR - 
stderr - 16%|█▌ | 3489/22434 [2:26:37<13:19:38, 2.53s/it] +2025-02-05 12:34:17 - ERROR - stderr - +2025-02-05 12:34:17 - ERROR - stderr - +2025-02-05 12:34:17 - INFO - stdout - {'loss': 1.0134, 'grad_norm': 1.2614140510559082, 'learning_rate': 1.9185440693077168e-05, 'epoch': 0.47} +2025-02-05 12:34:17 - ERROR - stderr - 16%|█▌ | 3489/22434 [2:26:37<13:19:38, 2.53s/it] +2025-02-05 12:34:19 - ERROR - stderr - 16%|█▌ | 3490/22434 [2:26:39<13:16:38, 2.52s/it] +2025-02-05 12:34:19 - ERROR - stderr - +2025-02-05 12:34:19 - ERROR - stderr - +2025-02-05 12:34:19 - INFO - stdout - {'loss': 0.9413, 'grad_norm': 1.060590386390686, 'learning_rate': 1.9184869857459233e-05, 'epoch': 0.47} +2025-02-05 12:34:19 - ERROR - stderr - 16%|█▌ | 3490/22434 [2:26:39<13:16:38, 2.52s/it] +2025-02-05 12:34:22 - ERROR - stderr - 16%|█▌ | 3491/22434 [2:26:41<13:09:04, 2.50s/it] +2025-02-05 12:34:22 - ERROR - stderr - +2025-02-05 12:34:22 - ERROR - stderr - +2025-02-05 12:34:22 - INFO - stdout - {'loss': 0.8798, 'grad_norm': 1.0428732633590698, 'learning_rate': 1.918429883039147e-05, 'epoch': 0.47} +2025-02-05 12:34:22 - ERROR - stderr - 16%|█▌ | 3491/22434 [2:26:42<13:09:04, 2.50s/it] +2025-02-05 12:34:24 - ERROR - stderr - 16%|█▌ | 3492/22434 [2:26:44<13:03:04, 2.48s/it] +2025-02-05 12:34:24 - ERROR - stderr - +2025-02-05 12:34:24 - ERROR - stderr - +2025-02-05 12:34:24 - INFO - stdout - {'loss': 1.0264, 'grad_norm': 1.1563969850540161, 'learning_rate': 1.9183727611885787e-05, 'epoch': 0.47} +2025-02-05 12:34:24 - ERROR - stderr - 16%|█▌ | 3492/22434 [2:26:44<13:03:04, 2.48s/it] +2025-02-05 12:34:27 - ERROR - stderr - 16%|█▌ | 3493/22434 [2:26:47<13:27:43, 2.56s/it] +2025-02-05 12:34:27 - ERROR - stderr - +2025-02-05 12:34:27 - ERROR - stderr - +2025-02-05 12:34:27 - INFO - stdout - {'loss': 1.027, 'grad_norm': 1.1321064233779907, 'learning_rate': 1.918315620195409e-05, 'epoch': 0.47} +2025-02-05 12:34:27 - ERROR - stderr - 16%|█▌ | 3493/22434 [2:26:47<13:27:43, 2.56s/it] +2025-02-05 
12:34:30 - ERROR - stderr - 16%|█▌ | 3494/22434 [2:26:49<13:43:39, 2.61s/it] +2025-02-05 12:34:30 - ERROR - stderr - +2025-02-05 12:34:30 - ERROR - stderr - +2025-02-05 12:34:30 - INFO - stdout - {'loss': 1.0025, 'grad_norm': 1.073287844657898, 'learning_rate': 1.918258460060829e-05, 'epoch': 0.47} +2025-02-05 12:34:30 - ERROR - stderr - 16%|█▌ | 3494/22434 [2:26:49<13:43:39, 2.61s/it] +2025-02-05 12:34:32 - ERROR - stderr - 16%|█▌ | 3495/22434 [2:26:52<13:28:16, 2.56s/it] +2025-02-05 12:34:32 - ERROR - stderr - +2025-02-05 12:34:32 - ERROR - stderr - +2025-02-05 12:34:32 - INFO - stdout - {'loss': 0.9481, 'grad_norm': 1.0472468137741089, 'learning_rate': 1.91820128078603e-05, 'epoch': 0.47} +2025-02-05 12:34:32 - ERROR - stderr - 16%|█▌ | 3495/22434 [2:26:52<13:28:16, 2.56s/it] +2025-02-05 12:34:35 - ERROR - stderr - 16%|█▌ | 3496/22434 [2:26:54<13:22:36, 2.54s/it] +2025-02-05 12:34:35 - ERROR - stderr - +2025-02-05 12:34:35 - ERROR - stderr - +2025-02-05 12:34:35 - INFO - stdout - {'loss': 1.0559, 'grad_norm': 1.2310487031936646, 'learning_rate': 1.9181440823722043e-05, 'epoch': 0.47} +2025-02-05 12:34:35 - ERROR - stderr - 16%|█▌ | 3496/22434 [2:26:54<13:22:36, 2.54s/it] +2025-02-05 12:34:37 - ERROR - stderr - 16%|█▌ | 3497/22434 [2:26:57<13:18:48, 2.53s/it] +2025-02-05 12:34:37 - ERROR - stderr - +2025-02-05 12:34:37 - ERROR - stderr - +2025-02-05 12:34:37 - INFO - stdout - {'loss': 0.9462, 'grad_norm': 1.2014554738998413, 'learning_rate': 1.9180868648205435e-05, 'epoch': 0.47} +2025-02-05 12:34:37 - ERROR - stderr - 16%|█▌ | 3497/22434 [2:26:57<13:18:48, 2.53s/it] +2025-02-05 12:34:39 - ERROR - stderr - 16%|█▌ | 3498/22434 [2:26:59<13:10:58, 2.51s/it] +2025-02-05 12:34:40 - ERROR - stderr - +2025-02-05 12:34:40 - ERROR - stderr - +2025-02-05 12:34:40 - INFO - stdout - {'loss': 1.0781, 'grad_norm': 1.1634538173675537, 'learning_rate': 1.9180296281322402e-05, 'epoch': 0.47} +2025-02-05 12:34:40 - ERROR - stderr - 16%|█▌ | 3498/22434 [2:26:59<13:10:58, 2.51s/it] 
+2025-02-05 12:34:42 - ERROR - stderr - 16%|█▌ | 3499/22434 [2:27:02<13:38:28, 2.59s/it] +2025-02-05 12:34:42 - ERROR - stderr - +2025-02-05 12:34:42 - ERROR - stderr - +2025-02-05 12:34:42 - INFO - stdout - {'loss': 1.1788, 'grad_norm': 1.16202974319458, 'learning_rate': 1.917972372308488e-05, 'epoch': 0.47} +2025-02-05 12:34:42 - ERROR - stderr - 16%|█▌ | 3499/22434 [2:27:02<13:38:28, 2.59s/it] +2025-02-05 12:34:45 - ERROR - stderr - 16%|█▌ | 3500/22434 [2:27:05<13:28:18, 2.56s/it] +2025-02-05 12:34:45 - ERROR - stderr - +2025-02-05 12:34:45 - ERROR - stderr - +2025-02-05 12:34:45 - INFO - stdout - {'loss': 0.8972, 'grad_norm': 1.0067589282989502, 'learning_rate': 1.91791509735048e-05, 'epoch': 0.47} +2025-02-05 12:34:45 - ERROR - stderr - 16%|█▌ | 3500/22434 [2:27:05<13:28:18, 2.56s/it] +2025-02-05 12:34:47 - ERROR - stderr - 16%|█▌ | 3501/22434 [2:27:07<13:33:49, 2.58s/it] +2025-02-05 12:34:47 - ERROR - stderr - +2025-02-05 12:34:47 - ERROR - stderr - +2025-02-05 12:34:47 - INFO - stdout - {'loss': 0.9096, 'grad_norm': 1.0489157438278198, 'learning_rate': 1.9178578032594105e-05, 'epoch': 0.47} +2025-02-05 12:34:47 - ERROR - stderr - 16%|█▌ | 3501/22434 [2:27:07<13:33:49, 2.58s/it] +2025-02-05 12:34:50 - ERROR - stderr - 16%|█▌ | 3502/22434 [2:27:10<13:25:40, 2.55s/it] +2025-02-05 12:34:50 - ERROR - stderr - +2025-02-05 12:34:50 - ERROR - stderr - +2025-02-05 12:34:50 - INFO - stdout - {'loss': 0.9396, 'grad_norm': 1.1619493961334229, 'learning_rate': 1.917800490036473e-05, 'epoch': 0.47} +2025-02-05 12:34:50 - ERROR - stderr - 16%|█▌ | 3502/22434 [2:27:10<13:25:40, 2.55s/it] +2025-02-05 12:34:52 - ERROR - stderr - 16%|█▌ | 3503/22434 [2:27:12<13:27:32, 2.56s/it] +2025-02-05 12:34:53 - ERROR - stderr - +2025-02-05 12:34:53 - ERROR - stderr - +2025-02-05 12:34:53 - INFO - stdout - {'loss': 0.9995, 'grad_norm': 1.153590440750122, 'learning_rate': 1.9177431576828626e-05, 'epoch': 0.47} +2025-02-05 12:34:53 - ERROR - stderr - 16%|█▌ | 3503/22434 [2:27:12<13:27:32, 
2.56s/it] +2025-02-05 12:34:55 - ERROR - stderr - 16%|█▌ | 3504/22434 [2:27:15<13:27:32, 2.56s/it] +2025-02-05 12:34:55 - ERROR - stderr - +2025-02-05 12:34:55 - ERROR - stderr - +2025-02-05 12:34:55 - INFO - stdout - {'loss': 1.1274, 'grad_norm': 1.1528078317642212, 'learning_rate': 1.9176858061997744e-05, 'epoch': 0.47} +2025-02-05 12:34:55 - ERROR - stderr - 16%|█▌ | 3504/22434 [2:27:15<13:27:32, 2.56s/it] +2025-02-05 12:34:58 - ERROR - stderr - 16%|█▌ | 3505/22434 [2:27:17<13:21:21, 2.54s/it] +2025-02-05 12:34:58 - ERROR - stderr - +2025-02-05 12:34:58 - ERROR - stderr - +2025-02-05 12:34:58 - INFO - stdout - {'loss': 1.0372, 'grad_norm': 1.1239203214645386, 'learning_rate': 1.9176284355884038e-05, 'epoch': 0.47} +2025-02-05 12:34:58 - ERROR - stderr - 16%|█▌ | 3505/22434 [2:27:17<13:21:21, 2.54s/it] +2025-02-05 12:35:00 - ERROR - stderr - 16%|█▌ | 3506/22434 [2:27:20<13:22:16, 2.54s/it] +2025-02-05 12:35:00 - ERROR - stderr - +2025-02-05 12:35:00 - ERROR - stderr - +2025-02-05 12:35:00 - INFO - stdout - {'loss': 0.962, 'grad_norm': 1.1865296363830566, 'learning_rate': 1.9175710458499464e-05, 'epoch': 0.47} +2025-02-05 12:35:00 - ERROR - stderr - 16%|█▌ | 3506/22434 [2:27:20<13:22:16, 2.54s/it] +2025-02-05 12:35:03 - ERROR - stderr - 16%|█▌ | 3507/22434 [2:27:22<13:20:46, 2.54s/it] +2025-02-05 12:35:03 - ERROR - stderr - +2025-02-05 12:35:03 - ERROR - stderr - +2025-02-05 12:35:03 - INFO - stdout - {'loss': 1.0542, 'grad_norm': 1.1185070276260376, 'learning_rate': 1.9175136369855985e-05, 'epoch': 0.47} +2025-02-05 12:35:03 - ERROR - stderr - 16%|█▌ | 3507/22434 [2:27:22<13:20:46, 2.54s/it] +2025-02-05 12:35:05 - ERROR - stderr - 16%|█▌ | 3508/22434 [2:27:25<13:16:11, 2.52s/it] +2025-02-05 12:35:05 - ERROR - stderr - +2025-02-05 12:35:05 - ERROR - stderr - +2025-02-05 12:35:05 - INFO - stdout - {'loss': 0.9607, 'grad_norm': 1.1700890064239502, 'learning_rate': 1.917456208996557e-05, 'epoch': 0.47} +2025-02-05 12:35:05 - ERROR - stderr - 16%|█▌ | 3508/22434 
[2:27:25<13:16:11, 2.52s/it] +2025-02-05 12:35:08 - ERROR - stderr - 16%|█▌ | 3509/22434 [2:27:27<13:21:40, 2.54s/it] +2025-02-05 12:35:08 - ERROR - stderr - +2025-02-05 12:35:08 - ERROR - stderr - +2025-02-05 12:35:08 - INFO - stdout - {'loss': 0.923, 'grad_norm': 1.0757564306259155, 'learning_rate': 1.9173987618840185e-05, 'epoch': 0.47} +2025-02-05 12:35:08 - ERROR - stderr - 16%|█▌ | 3509/22434 [2:27:27<13:21:40, 2.54s/it] +2025-02-05 12:35:10 - ERROR - stderr - 16%|█▌ | 3510/22434 [2:27:30<13:20:21, 2.54s/it] +2025-02-05 12:35:10 - ERROR - stderr - +2025-02-05 12:35:10 - ERROR - stderr - +2025-02-05 12:35:10 - INFO - stdout - {'loss': 0.8703, 'grad_norm': 1.0142959356307983, 'learning_rate': 1.9173412956491808e-05, 'epoch': 0.47} +2025-02-05 12:35:10 - ERROR - stderr - 16%|█▌ | 3510/22434 [2:27:30<13:20:21, 2.54s/it] +2025-02-05 12:35:13 - ERROR - stderr - 16%|█▌ | 3511/22434 [2:27:32<13:18:59, 2.53s/it] +2025-02-05 12:35:13 - ERROR - stderr - +2025-02-05 12:35:13 - ERROR - stderr - +2025-02-05 12:35:13 - INFO - stdout - {'loss': 0.8675, 'grad_norm': 1.0474114418029785, 'learning_rate': 1.9172838102932414e-05, 'epoch': 0.47} +2025-02-05 12:35:13 - ERROR - stderr - 16%|█▌ | 3511/22434 [2:27:33<13:18:59, 2.53s/it] +2025-02-05 12:35:15 - ERROR - stderr - 16%|█▌ | 3512/22434 [2:27:35<13:12:47, 2.51s/it] +2025-02-05 12:35:15 - ERROR - stderr - +2025-02-05 12:35:15 - ERROR - stderr - +2025-02-05 12:35:15 - INFO - stdout - {'loss': 0.8893, 'grad_norm': 1.0961796045303345, 'learning_rate': 1.917226305817399e-05, 'epoch': 0.47} +2025-02-05 12:35:15 - ERROR - stderr - 16%|█▌ | 3512/22434 [2:27:35<13:12:47, 2.51s/it] +2025-02-05 12:35:18 - ERROR - stderr - 16%|█▌ | 3513/22434 [2:27:37<13:05:19, 2.49s/it] +2025-02-05 12:35:18 - ERROR - stderr - +2025-02-05 12:35:18 - ERROR - stderr - +2025-02-05 12:35:18 - INFO - stdout - {'loss': 0.9378, 'grad_norm': 1.0665099620819092, 'learning_rate': 1.917168782222852e-05, 'epoch': 0.47} +2025-02-05 12:35:18 - ERROR - stderr - 16%|█▌ 
| 3513/22434 [2:27:37<13:05:19, 2.49s/it] +2025-02-05 12:35:20 - ERROR - stderr - 16%|█▌ | 3514/22434 [2:27:40<13:05:15, 2.49s/it] +2025-02-05 12:35:20 - ERROR - stderr - +2025-02-05 12:35:20 - ERROR - stderr - +2025-02-05 12:35:20 - INFO - stdout - {'loss': 0.8196, 'grad_norm': 1.0473741292953491, 'learning_rate': 1.9171112395107988e-05, 'epoch': 0.47} +2025-02-05 12:35:20 - ERROR - stderr - 16%|█▌ | 3514/22434 [2:27:40<13:05:15, 2.49s/it] +2025-02-05 12:35:23 - ERROR - stderr - 16%|█▌ | 3515/22434 [2:27:42<13:06:22, 2.49s/it] +2025-02-05 12:35:23 - ERROR - stderr - +2025-02-05 12:35:23 - ERROR - stderr - +2025-02-05 12:35:23 - INFO - stdout - {'loss': 1.1079, 'grad_norm': 1.1318310499191284, 'learning_rate': 1.9170536776824396e-05, 'epoch': 0.47} +2025-02-05 12:35:23 - ERROR - stderr - 16%|█▌ | 3515/22434 [2:27:42<13:06:22, 2.49s/it] +2025-02-05 12:35:25 - ERROR - stderr - 16%|█▌ | 3516/22434 [2:27:45<13:14:20, 2.52s/it] +2025-02-05 12:35:25 - ERROR - stderr - +2025-02-05 12:35:25 - ERROR - stderr - +2025-02-05 12:35:25 - INFO - stdout - {'loss': 0.9907, 'grad_norm': 1.0326460599899292, 'learning_rate': 1.9169960967389744e-05, 'epoch': 0.47} +2025-02-05 12:35:25 - ERROR - stderr - 16%|█▌ | 3516/22434 [2:27:45<13:14:20, 2.52s/it] +2025-02-05 12:35:28 - ERROR - stderr - 16%|█▌ | 3517/22434 [2:27:47<13:11:36, 2.51s/it] +2025-02-05 12:35:28 - ERROR - stderr - +2025-02-05 12:35:28 - ERROR - stderr - +2025-02-05 12:35:28 - INFO - stdout - {'loss': 0.9849, 'grad_norm': 1.258815050125122, 'learning_rate': 1.9169384966816026e-05, 'epoch': 0.47} +2025-02-05 12:35:28 - ERROR - stderr - 16%|█▌ | 3517/22434 [2:27:48<13:11:36, 2.51s/it] +2025-02-05 12:35:30 - ERROR - stderr - 16%|█▌ | 3518/22434 [2:27:50<13:16:55, 2.53s/it] +2025-02-05 12:35:30 - ERROR - stderr - +2025-02-05 12:35:30 - ERROR - stderr - +2025-02-05 12:35:30 - INFO - stdout - {'loss': 0.9476, 'grad_norm': 1.0982081890106201, 'learning_rate': 1.9168808775115256e-05, 'epoch': 0.47} +2025-02-05 12:35:30 - ERROR - 
stderr - 16%|█▌ | 3518/22434 [2:27:50<13:16:55, 2.53s/it] +2025-02-05 12:35:33 - ERROR - stderr - 16%|█▌ | 3519/22434 [2:27:53<13:14:41, 2.52s/it] +2025-02-05 12:35:33 - ERROR - stderr - +2025-02-05 12:35:33 - ERROR - stderr - +2025-02-05 12:35:33 - INFO - stdout - {'loss': 1.0291, 'grad_norm': 1.0521838665008545, 'learning_rate': 1.916823239229944e-05, 'epoch': 0.47} +2025-02-05 12:35:33 - ERROR - stderr - 16%|█▌ | 3519/22434 [2:27:53<13:14:41, 2.52s/it] +2025-02-05 12:35:35 - ERROR - stderr - 16%|█▌ | 3520/22434 [2:27:55<13:09:10, 2.50s/it] +2025-02-05 12:35:35 - ERROR - stderr - +2025-02-05 12:35:35 - ERROR - stderr - +2025-02-05 12:35:35 - INFO - stdout - {'loss': 0.9748, 'grad_norm': 1.1831716299057007, 'learning_rate': 1.9167655818380594e-05, 'epoch': 0.47} +2025-02-05 12:35:35 - ERROR - stderr - 16%|█▌ | 3520/22434 [2:27:55<13:09:10, 2.50s/it] +2025-02-05 12:35:38 - ERROR - stderr - 16%|█▌ | 3521/22434 [2:27:58<13:12:06, 2.51s/it] +2025-02-05 12:35:38 - ERROR - stderr - +2025-02-05 12:35:38 - ERROR - stderr - +2025-02-05 12:35:38 - INFO - stdout - {'loss': 0.9398, 'grad_norm': 1.128922462463379, 'learning_rate': 1.916707905337073e-05, 'epoch': 0.47} +2025-02-05 12:35:38 - ERROR - stderr - 16%|█▌ | 3521/22434 [2:27:58<13:12:06, 2.51s/it] +2025-02-05 12:35:40 - ERROR - stderr - 16%|█▌ | 3522/22434 [2:28:00<13:07:11, 2.50s/it] +2025-02-05 12:35:40 - ERROR - stderr - +2025-02-05 12:35:40 - ERROR - stderr - +2025-02-05 12:35:40 - INFO - stdout - {'loss': 0.9049, 'grad_norm': 1.2630934715270996, 'learning_rate': 1.9166502097281882e-05, 'epoch': 0.47} +2025-02-05 12:35:40 - ERROR - stderr - 16%|█▌ | 3522/22434 [2:28:00<13:07:11, 2.50s/it] +2025-02-05 12:35:43 - ERROR - stderr - 16%|█▌ | 3523/22434 [2:28:03<13:25:25, 2.56s/it] +2025-02-05 12:35:43 - ERROR - stderr - +2025-02-05 12:35:43 - ERROR - stderr - +2025-02-05 12:35:43 - INFO - stdout - {'loss': 1.059, 'grad_norm': 1.207610011100769, 'learning_rate': 1.9165924950126064e-05, 'epoch': 0.47} +2025-02-05 12:35:43 
- ERROR - stderr - 16%|█▌ | 3523/22434 [2:28:03<13:25:25, 2.56s/it] +2025-02-05 12:35:45 - ERROR - stderr - 16%|█▌ | 3524/22434 [2:28:05<13:17:04, 2.53s/it] +2025-02-05 12:35:45 - ERROR - stderr - +2025-02-05 12:35:45 - ERROR - stderr - +2025-02-05 12:35:45 - INFO - stdout - {'loss': 0.9014, 'grad_norm': 1.01205575466156, 'learning_rate': 1.9165347611915313e-05, 'epoch': 0.47} +2025-02-05 12:35:45 - ERROR - stderr - 16%|█▌ | 3524/22434 [2:28:05<13:17:04, 2.53s/it] +2025-02-05 12:35:48 - ERROR - stderr - 16%|█▌ | 3525/22434 [2:28:08<13:14:25, 2.52s/it] +2025-02-05 12:35:48 - ERROR - stderr - +2025-02-05 12:35:48 - ERROR - stderr - +2025-02-05 12:35:48 - INFO - stdout - {'loss': 1.0318, 'grad_norm': 1.217113971710205, 'learning_rate': 1.9164770082661662e-05, 'epoch': 0.47} +2025-02-05 12:35:48 - ERROR - stderr - 16%|█▌ | 3525/22434 [2:28:08<13:14:25, 2.52s/it] +2025-02-05 12:35:50 - ERROR - stderr - 16%|█▌ | 3526/22434 [2:28:10<13:14:18, 2.52s/it] +2025-02-05 12:35:50 - ERROR - stderr - +2025-02-05 12:35:50 - ERROR - stderr - +2025-02-05 12:35:50 - INFO - stdout - {'loss': 0.9314, 'grad_norm': 1.0785239934921265, 'learning_rate': 1.9164192362377144e-05, 'epoch': 0.47} +2025-02-05 12:35:50 - ERROR - stderr - 16%|█▌ | 3526/22434 [2:28:10<13:14:18, 2.52s/it] +2025-02-05 12:35:53 - ERROR - stderr - 16%|█▌ | 3527/22434 [2:28:13<13:08:36, 2.50s/it] +2025-02-05 12:35:53 - ERROR - stderr - +2025-02-05 12:35:53 - ERROR - stderr - +2025-02-05 12:35:53 - INFO - stdout - {'loss': 0.9629, 'grad_norm': 1.1666805744171143, 'learning_rate': 1.9163614451073812e-05, 'epoch': 0.47} +2025-02-05 12:35:53 - ERROR - stderr - 16%|█▌ | 3527/22434 [2:28:13<13:08:36, 2.50s/it] +2025-02-05 12:35:55 - ERROR - stderr - 16%|█▌ | 3528/22434 [2:28:15<13:07:42, 2.50s/it] +2025-02-05 12:35:55 - ERROR - stderr - +2025-02-05 12:35:55 - ERROR - stderr - +2025-02-05 12:35:55 - INFO - stdout - {'loss': 0.8615, 'grad_norm': 1.0410642623901367, 'learning_rate': 1.91630363487637e-05, 'epoch': 0.47} 
+2025-02-05 12:35:55 - ERROR - stderr - 16%|█▌ | 3528/22434 [2:28:15<13:07:42, 2.50s/it] +2025-02-05 12:35:58 - ERROR - stderr - 16%|█▌ | 3529/22434 [2:28:18<13:04:40, 2.49s/it] +2025-02-05 12:35:58 - ERROR - stderr - +2025-02-05 12:35:58 - ERROR - stderr - +2025-02-05 12:35:58 - INFO - stdout - {'loss': 0.887, 'grad_norm': 1.0157328844070435, 'learning_rate': 1.9162458055458866e-05, 'epoch': 0.47} +2025-02-05 12:35:58 - ERROR - stderr - 16%|█▌ | 3529/22434 [2:28:18<13:04:40, 2.49s/it] +2025-02-05 12:36:00 - ERROR - stderr - 16%|█▌ | 3530/22434 [2:28:20<13:08:01, 2.50s/it] +2025-02-05 12:36:00 - ERROR - stderr - +2025-02-05 12:36:00 - ERROR - stderr - +2025-02-05 12:36:00 - INFO - stdout - {'loss': 1.0339, 'grad_norm': 1.1191221475601196, 'learning_rate': 1.916187957117136e-05, 'epoch': 0.47} +2025-02-05 12:36:00 - ERROR - stderr - 16%|█▌ | 3530/22434 [2:28:20<13:08:01, 2.50s/it] +2025-02-05 12:36:03 - ERROR - stderr - 16%|█▌ | 3531/22434 [2:28:23<13:07:52, 2.50s/it] +2025-02-05 12:36:03 - ERROR - stderr - +2025-02-05 12:36:03 - ERROR - stderr - +2025-02-05 12:36:03 - INFO - stdout - {'loss': 0.9753, 'grad_norm': 1.1440049409866333, 'learning_rate': 1.9161300895913242e-05, 'epoch': 0.47} +2025-02-05 12:36:03 - ERROR - stderr - 16%|█▌ | 3531/22434 [2:28:23<13:07:52, 2.50s/it] +2025-02-05 12:36:05 - ERROR - stderr - 16%|█▌ | 3532/22434 [2:28:25<13:00:36, 2.48s/it] +2025-02-05 12:36:05 - ERROR - stderr - +2025-02-05 12:36:05 - ERROR - stderr - +2025-02-05 12:36:05 - INFO - stdout - {'loss': 1.0189, 'grad_norm': 1.07695734500885, 'learning_rate': 1.9160722029696573e-05, 'epoch': 0.47} +2025-02-05 12:36:05 - ERROR - stderr - 16%|█▌ | 3532/22434 [2:28:25<13:00:36, 2.48s/it] +2025-02-05 12:36:08 - ERROR - stderr - 16%|█▌ | 3533/22434 [2:28:28<13:01:23, 2.48s/it] +2025-02-05 12:36:08 - ERROR - stderr - +2025-02-05 12:36:08 - ERROR - stderr - +2025-02-05 12:36:08 - INFO - stdout - {'loss': 1.0896, 'grad_norm': 1.1916295289993286, 'learning_rate': 1.9160142972533423e-05, 
'epoch': 0.47} +2025-02-05 12:36:08 - ERROR - stderr - 16%|█▌ | 3533/22434 [2:28:28<13:01:23, 2.48s/it] +2025-02-05 12:36:10 - ERROR - stderr - 16%|█▌ | 3534/22434 [2:28:30<13:00:40, 2.48s/it] +2025-02-05 12:36:10 - ERROR - stderr - +2025-02-05 12:36:10 - ERROR - stderr - +2025-02-05 12:36:10 - INFO - stdout - {'loss': 1.0067, 'grad_norm': 1.0217418670654297, 'learning_rate': 1.9159563724435852e-05, 'epoch': 0.47} +2025-02-05 12:36:10 - ERROR - stderr - 16%|█▌ | 3534/22434 [2:28:30<13:00:40, 2.48s/it] +2025-02-05 12:36:13 - ERROR - stderr - 16%|█▌ | 3535/22434 [2:28:32<12:57:58, 2.47s/it] +2025-02-05 12:36:13 - ERROR - stderr - +2025-02-05 12:36:13 - ERROR - stderr - +2025-02-05 12:36:13 - INFO - stdout - {'loss': 0.9931, 'grad_norm': 1.1300748586654663, 'learning_rate': 1.915898428541594e-05, 'epoch': 0.47} +2025-02-05 12:36:13 - ERROR - stderr - 16%|█▌ | 3535/22434 [2:28:33<12:57:58, 2.47s/it] +2025-02-05 12:36:15 - ERROR - stderr - 16%|█▌ | 3536/22434 [2:28:35<12:58:07, 2.47s/it] +2025-02-05 12:36:15 - ERROR - stderr - +2025-02-05 12:36:15 - ERROR - stderr - +2025-02-05 12:36:15 - INFO - stdout - {'loss': 1.076, 'grad_norm': 1.1691640615463257, 'learning_rate': 1.915840465548577e-05, 'epoch': 0.47} +2025-02-05 12:36:15 - ERROR - stderr - 16%|█▌ | 3536/22434 [2:28:35<12:58:07, 2.47s/it] +2025-02-05 12:36:18 - ERROR - stderr - 16%|█▌ | 3537/22434 [2:28:37<12:58:47, 2.47s/it] +2025-02-05 12:36:18 - ERROR - stderr - +2025-02-05 12:36:18 - ERROR - stderr - +2025-02-05 12:36:18 - INFO - stdout - {'loss': 0.9771, 'grad_norm': 1.0145046710968018, 'learning_rate': 1.9157824834657413e-05, 'epoch': 0.47} +2025-02-05 12:36:18 - ERROR - stderr - 16%|█▌ | 3537/22434 [2:28:37<12:58:47, 2.47s/it] +2025-02-05 12:36:20 - ERROR - stderr - 16%|█▌ | 3538/22434 [2:28:40<13:06:30, 2.50s/it] +2025-02-05 12:36:20 - ERROR - stderr - +2025-02-05 12:36:20 - ERROR - stderr - +2025-02-05 12:36:20 - INFO - stdout - {'loss': 1.0824, 'grad_norm': 1.168656826019287, 'learning_rate': 
1.9157244822942965e-05, 'epoch': 0.47} +2025-02-05 12:36:20 - ERROR - stderr - 16%|█▌ | 3538/22434 [2:28:40<13:06:30, 2.50s/it] +2025-02-05 12:36:23 - ERROR - stderr - 16%|█▌ | 3539/22434 [2:28:42<13:04:48, 2.49s/it] +2025-02-05 12:36:23 - ERROR - stderr - +2025-02-05 12:36:23 - ERROR - stderr - +2025-02-05 12:36:23 - INFO - stdout - {'loss': 1.0226, 'grad_norm': 1.1399892568588257, 'learning_rate': 1.9156664620354514e-05, 'epoch': 0.47} +2025-02-05 12:36:23 - ERROR - stderr - 16%|█▌ | 3539/22434 [2:28:42<13:04:48, 2.49s/it] +2025-02-05 12:36:25 - ERROR - stderr - 16%|█▌ | 3540/22434 [2:28:45<12:57:28, 2.47s/it] +2025-02-05 12:36:25 - ERROR - stderr - +2025-02-05 12:36:25 - ERROR - stderr - +2025-02-05 12:36:25 - INFO - stdout - {'loss': 1.0251, 'grad_norm': 1.123217225074768, 'learning_rate': 1.9156084226904142e-05, 'epoch': 0.47} +2025-02-05 12:36:25 - ERROR - stderr - 16%|█▌ | 3540/22434 [2:28:45<12:57:28, 2.47s/it] +2025-02-05 12:36:28 - ERROR - stderr - 16%|█▌ | 3541/22434 [2:28:47<12:58:23, 2.47s/it] +2025-02-05 12:36:28 - ERROR - stderr - +2025-02-05 12:36:28 - ERROR - stderr - +2025-02-05 12:36:28 - INFO - stdout - {'loss': 1.0382, 'grad_norm': 1.085670828819275, 'learning_rate': 1.9155503642603963e-05, 'epoch': 0.47} +2025-02-05 12:36:28 - ERROR - stderr - 16%|█▌ | 3541/22434 [2:28:47<12:58:23, 2.47s/it] +2025-02-05 12:36:30 - ERROR - stderr - 16%|█▌ | 3542/22434 [2:28:50<13:01:38, 2.48s/it] +2025-02-05 12:36:30 - ERROR - stderr - +2025-02-05 12:36:30 - ERROR - stderr - +2025-02-05 12:36:30 - INFO - stdout - {'loss': 0.906, 'grad_norm': 0.9745550751686096, 'learning_rate': 1.9154922867466067e-05, 'epoch': 0.47} +2025-02-05 12:36:30 - ERROR - stderr - 16%|█▌ | 3542/22434 [2:28:50<13:01:38, 2.48s/it] +2025-02-05 12:36:33 - ERROR - stderr - 16%|█▌ | 3543/22434 [2:28:52<13:02:06, 2.48s/it] +2025-02-05 12:36:33 - ERROR - stderr - +2025-02-05 12:36:33 - ERROR - stderr - +2025-02-05 12:36:33 - INFO - stdout - {'loss': 1.0389, 'grad_norm': 1.0427231788635254, 
'learning_rate': 1.9154341901502566e-05, 'epoch': 0.47} +2025-02-05 12:36:33 - ERROR - stderr - 16%|█▌ | 3543/22434 [2:28:52<13:02:06, 2.48s/it] +2025-02-05 12:36:35 - ERROR - stderr - 16%|█▌ | 3544/22434 [2:28:55<13:05:44, 2.50s/it] +2025-02-05 12:36:35 - ERROR - stderr - +2025-02-05 12:36:35 - ERROR - stderr - +2025-02-05 12:36:35 - INFO - stdout - {'loss': 0.9746, 'grad_norm': 1.0810281038284302, 'learning_rate': 1.915376074472557e-05, 'epoch': 0.47} +2025-02-05 12:36:35 - ERROR - stderr - 16%|█▌ | 3544/22434 [2:28:55<13:05:44, 2.50s/it] +2025-02-05 12:36:38 - ERROR - stderr - 16%|█▌ | 3545/22434 [2:28:57<13:10:22, 2.51s/it] +2025-02-05 12:36:38 - ERROR - stderr - +2025-02-05 12:36:38 - ERROR - stderr - +2025-02-05 12:36:38 - INFO - stdout - {'loss': 0.8923, 'grad_norm': 1.0047287940979004, 'learning_rate': 1.9153179397147187e-05, 'epoch': 0.47} +2025-02-05 12:36:38 - ERROR - stderr - 16%|█▌ | 3545/22434 [2:28:57<13:10:22, 2.51s/it] +2025-02-05 12:36:40 - ERROR - stderr - 16%|█▌ | 3546/22434 [2:29:00<13:15:54, 2.53s/it] +2025-02-05 12:36:40 - ERROR - stderr - +2025-02-05 12:36:40 - ERROR - stderr - +2025-02-05 12:36:40 - INFO - stdout - {'loss': 0.9467, 'grad_norm': 1.0907237529754639, 'learning_rate': 1.9152597858779538e-05, 'epoch': 0.47} +2025-02-05 12:36:40 - ERROR - stderr - 16%|█▌ | 3546/22434 [2:29:00<13:15:54, 2.53s/it] +2025-02-05 12:36:43 - ERROR - stderr - 16%|█▌ | 3547/22434 [2:29:02<13:13:03, 2.52s/it] +2025-02-05 12:36:43 - ERROR - stderr - +2025-02-05 12:36:43 - ERROR - stderr - +2025-02-05 12:36:43 - INFO - stdout - {'loss': 0.9208, 'grad_norm': 1.0600823163986206, 'learning_rate': 1.9152016129634746e-05, 'epoch': 0.47} +2025-02-05 12:36:43 - ERROR - stderr - 16%|█▌ | 3547/22434 [2:29:03<13:13:03, 2.52s/it] +2025-02-05 12:36:45 - ERROR - stderr - 16%|█▌ | 3548/22434 [2:29:05<13:06:28, 2.50s/it] +2025-02-05 12:36:45 - ERROR - stderr - +2025-02-05 12:36:45 - ERROR - stderr - +2025-02-05 12:36:45 - INFO - stdout - {'loss': 0.879, 'grad_norm': 
1.0306575298309326, 'learning_rate': 1.9151434209724935e-05, 'epoch': 0.47} +2025-02-05 12:36:45 - ERROR - stderr - 16%|█▌ | 3548/22434 [2:29:05<13:06:28, 2.50s/it] +2025-02-05 12:36:48 - ERROR - stderr - 16%|█▌ | 3549/22434 [2:29:07<13:01:34, 2.48s/it] +2025-02-05 12:36:48 - ERROR - stderr - +2025-02-05 12:36:48 - ERROR - stderr - +2025-02-05 12:36:48 - INFO - stdout - {'loss': 0.9873, 'grad_norm': 1.1240202188491821, 'learning_rate': 1.9150852099062236e-05, 'epoch': 0.47} +2025-02-05 12:36:48 - ERROR - stderr - 16%|█▌ | 3549/22434 [2:29:07<13:01:34, 2.48s/it] +2025-02-05 12:36:50 - ERROR - stderr - 16%|█▌ | 3550/22434 [2:29:10<13:00:41, 2.48s/it] +2025-02-05 12:36:50 - ERROR - stderr - +2025-02-05 12:36:50 - ERROR - stderr - +2025-02-05 12:36:50 - INFO - stdout - {'loss': 0.9618, 'grad_norm': 1.038956642150879, 'learning_rate': 1.915026979765878e-05, 'epoch': 0.47} +2025-02-05 12:36:50 - ERROR - stderr - 16%|█▌ | 3550/22434 [2:29:10<13:00:41, 2.48s/it] +2025-02-05 12:36:53 - ERROR - stderr - 16%|█▌ | 3551/22434 [2:29:12<13:03:05, 2.49s/it] +2025-02-05 12:36:53 - ERROR - stderr - +2025-02-05 12:36:53 - ERROR - stderr - +2025-02-05 12:36:53 - INFO - stdout - {'loss': 1.0747, 'grad_norm': 1.1260778903961182, 'learning_rate': 1.9149687305526704e-05, 'epoch': 0.47} +2025-02-05 12:36:53 - ERROR - stderr - 16%|█▌ | 3551/22434 [2:29:12<13:03:05, 2.49s/it] +2025-02-05 12:36:55 - ERROR - stderr - 16%|█▌ | 3552/22434 [2:29:15<13:07:04, 2.50s/it] +2025-02-05 12:36:55 - ERROR - stderr - +2025-02-05 12:36:55 - ERROR - stderr - +2025-02-05 12:36:55 - INFO - stdout - {'loss': 0.9437, 'grad_norm': 1.0979074239730835, 'learning_rate': 1.9149104622678155e-05, 'epoch': 0.47} +2025-02-05 12:36:55 - ERROR - stderr - 16%|█▌ | 3552/22434 [2:29:15<13:07:04, 2.50s/it] +2025-02-05 12:36:58 - ERROR - stderr - 16%|█▌ | 3553/22434 [2:29:17<13:08:17, 2.51s/it] +2025-02-05 12:36:58 - ERROR - stderr - +2025-02-05 12:36:58 - ERROR - stderr - +2025-02-05 12:36:58 - INFO - stdout - {'loss': 0.9802, 
'grad_norm': 1.1374695301055908, 'learning_rate': 1.9148521749125275e-05, 'epoch': 0.48} +2025-02-05 12:36:58 - ERROR - stderr - 16%|█▌ | 3553/22434 [2:29:17<13:08:17, 2.51s/it] +2025-02-05 12:37:00 - ERROR - stderr - 16%|█▌ | 3554/22434 [2:29:20<13:13:32, 2.52s/it] +2025-02-05 12:37:00 - ERROR - stderr - +2025-02-05 12:37:00 - ERROR - stderr - +2025-02-05 12:37:00 - INFO - stdout - {'loss': 0.938, 'grad_norm': 1.1580686569213867, 'learning_rate': 1.9147938684880213e-05, 'epoch': 0.48} +2025-02-05 12:37:00 - ERROR - stderr - 16%|█▌ | 3554/22434 [2:29:20<13:13:32, 2.52s/it] +2025-02-05 12:37:03 - ERROR - stderr - 16%|█▌ | 3555/22434 [2:29:22<13:03:12, 2.49s/it] +2025-02-05 12:37:03 - ERROR - stderr - +2025-02-05 12:37:03 - ERROR - stderr - +2025-02-05 12:37:03 - INFO - stdout - {'loss': 0.9127, 'grad_norm': 1.1435892581939697, 'learning_rate': 1.9147355429955123e-05, 'epoch': 0.48} +2025-02-05 12:37:03 - ERROR - stderr - 16%|█▌ | 3555/22434 [2:29:22<13:03:12, 2.49s/it] +2025-02-05 12:37:05 - ERROR - stderr - 16%|█▌ | 3556/22434 [2:29:25<12:56:31, 2.47s/it] +2025-02-05 12:37:05 - ERROR - stderr - +2025-02-05 12:37:05 - ERROR - stderr - +2025-02-05 12:37:05 - INFO - stdout - {'loss': 0.8869, 'grad_norm': 1.1364918947219849, 'learning_rate': 1.9146771984362157e-05, 'epoch': 0.48} +2025-02-05 12:37:05 - ERROR - stderr - 16%|█▌ | 3556/22434 [2:29:25<12:56:31, 2.47s/it] +2025-02-05 12:37:07 - ERROR - stderr - 16%|█▌ | 3557/22434 [2:29:27<12:54:46, 2.46s/it] +2025-02-05 12:37:08 - ERROR - stderr - +2025-02-05 12:37:08 - ERROR - stderr - +2025-02-05 12:37:08 - INFO - stdout - {'loss': 0.9242, 'grad_norm': 1.1763559579849243, 'learning_rate': 1.9146188348113486e-05, 'epoch': 0.48} +2025-02-05 12:37:08 - ERROR - stderr - 16%|█▌ | 3557/22434 [2:29:27<12:54:46, 2.46s/it] +2025-02-05 12:37:10 - ERROR - stderr - 16%|█▌ | 3558/22434 [2:29:30<12:58:59, 2.48s/it] +2025-02-05 12:37:10 - ERROR - stderr - +2025-02-05 12:37:10 - ERROR - stderr - +2025-02-05 12:37:10 - INFO - stdout - 
{'loss': 0.9453, 'grad_norm': 1.1106432676315308, 'learning_rate': 1.914560452122127e-05, 'epoch': 0.48} +2025-02-05 12:37:10 - ERROR - stderr - 16%|█▌ | 3558/22434 [2:29:30<12:58:59, 2.48s/it] +2025-02-05 12:37:12 - ERROR - stderr - 16%|█▌ | 3559/22434 [2:29:32<13:02:53, 2.49s/it] +2025-02-05 12:37:13 - ERROR - stderr - +2025-02-05 12:37:13 - ERROR - stderr - +2025-02-05 12:37:13 - INFO - stdout - {'loss': 0.9085, 'grad_norm': 0.9965659976005554, 'learning_rate': 1.914502050369768e-05, 'epoch': 0.48} +2025-02-05 12:37:13 - ERROR - stderr - 16%|█▌ | 3559/22434 [2:29:32<13:02:53, 2.49s/it] +2025-02-05 12:37:15 - ERROR - stderr - 16%|█▌ | 3560/22434 [2:29:35<12:55:32, 2.47s/it] +2025-02-05 12:37:15 - ERROR - stderr - +2025-02-05 12:37:15 - ERROR - stderr - +2025-02-05 12:37:15 - INFO - stdout - {'loss': 1.0362, 'grad_norm': 1.2233017683029175, 'learning_rate': 1.9144436295554885e-05, 'epoch': 0.48} +2025-02-05 12:37:15 - ERROR - stderr - 16%|█▌ | 3560/22434 [2:29:35<12:55:32, 2.47s/it] +2025-02-05 12:37:17 - ERROR - stderr - 16%|█▌ | 3561/22434 [2:29:37<12:56:07, 2.47s/it] +2025-02-05 12:37:17 - ERROR - stderr - +2025-02-05 12:37:17 - ERROR - stderr - +2025-02-05 12:37:17 - INFO - stdout - {'loss': 0.9245, 'grad_norm': 1.0986855030059814, 'learning_rate': 1.914385189680507e-05, 'epoch': 0.48} +2025-02-05 12:37:17 - ERROR - stderr - 16%|█▌ | 3561/22434 [2:29:37<12:56:07, 2.47s/it] +2025-02-05 12:37:20 - ERROR - stderr - 16%|█▌ | 3562/22434 [2:29:40<13:07:25, 2.50s/it] +2025-02-05 12:37:20 - ERROR - stderr - +2025-02-05 12:37:20 - ERROR - stderr - +2025-02-05 12:37:20 - INFO - stdout - {'loss': 0.9465, 'grad_norm': 1.04866623878479, 'learning_rate': 1.914326730746041e-05, 'epoch': 0.48} +2025-02-05 12:37:20 - ERROR - stderr - 16%|█▌ | 3562/22434 [2:29:40<13:07:25, 2.50s/it] +2025-02-05 12:37:22 - ERROR - stderr - 16%|█▌ | 3563/22434 [2:29:42<12:59:15, 2.48s/it] +2025-02-05 12:37:22 - ERROR - stderr - +2025-02-05 12:37:22 - ERROR - stderr - +2025-02-05 12:37:22 - INFO - 
stdout - {'loss': 0.974, 'grad_norm': 1.1320934295654297, 'learning_rate': 1.9142682527533095e-05, 'epoch': 0.48} +2025-02-05 12:37:22 - ERROR - stderr - 16%|█▌ | 3563/22434 [2:29:42<12:59:15, 2.48s/it] +2025-02-05 12:37:25 - ERROR - stderr - 16%|█▌ | 3564/22434 [2:29:45<12:59:51, 2.48s/it] +2025-02-05 12:37:25 - ERROR - stderr - +2025-02-05 12:37:25 - ERROR - stderr - +2025-02-05 12:37:25 - INFO - stdout - {'loss': 1.0573, 'grad_norm': 1.139564871788025, 'learning_rate': 1.914209755703531e-05, 'epoch': 0.48} +2025-02-05 12:37:25 - ERROR - stderr - 16%|█▌ | 3564/22434 [2:29:45<12:59:51, 2.48s/it] +2025-02-05 12:37:27 - ERROR - stderr - 16%|█▌ | 3565/22434 [2:29:47<13:00:52, 2.48s/it] +2025-02-05 12:37:27 - ERROR - stderr - +2025-02-05 12:37:27 - ERROR - stderr - +2025-02-05 12:37:27 - INFO - stdout - {'loss': 1.1402, 'grad_norm': 1.2128188610076904, 'learning_rate': 1.914151239597925e-05, 'epoch': 0.48} +2025-02-05 12:37:27 - ERROR - stderr - 16%|█▌ | 3565/22434 [2:29:47<13:00:52, 2.48s/it] +2025-02-05 12:37:30 - ERROR - stderr - 16%|█▌ | 3566/22434 [2:29:50<13:00:35, 2.48s/it] +2025-02-05 12:37:30 - ERROR - stderr - +2025-02-05 12:37:30 - ERROR - stderr - +2025-02-05 12:37:30 - INFO - stdout - {'loss': 0.9737, 'grad_norm': 1.164311408996582, 'learning_rate': 1.9140927044377105e-05, 'epoch': 0.48} +2025-02-05 12:37:30 - ERROR - stderr - 16%|█▌ | 3566/22434 [2:29:50<13:00:35, 2.48s/it] +2025-02-05 12:37:32 - ERROR - stderr - 16%|█▌ | 3567/22434 [2:29:52<13:14:09, 2.53s/it] +2025-02-05 12:37:33 - ERROR - stderr - +2025-02-05 12:37:33 - ERROR - stderr - +2025-02-05 12:37:33 - INFO - stdout - {'loss': 0.8472, 'grad_norm': 1.1340677738189697, 'learning_rate': 1.9140341502241087e-05, 'epoch': 0.48} +2025-02-05 12:37:33 - ERROR - stderr - 16%|█▌ | 3567/22434 [2:29:52<13:14:09, 2.53s/it] +2025-02-05 12:37:35 - ERROR - stderr - 16%|█▌ | 3568/22434 [2:29:55<13:09:34, 2.51s/it] +2025-02-05 12:37:35 - ERROR - stderr - +2025-02-05 12:37:35 - ERROR - stderr - +2025-02-05 
12:37:35 - INFO - stdout - {'loss': 1.0217, 'grad_norm': 1.14836847782135, 'learning_rate': 1.9139755769583398e-05, 'epoch': 0.48} +2025-02-05 12:37:35 - ERROR - stderr - 16%|█▌ | 3568/22434 [2:29:55<13:09:34, 2.51s/it] +2025-02-05 12:37:37 - ERROR - stderr - 16%|█▌ | 3569/22434 [2:29:57<13:06:13, 2.50s/it] +2025-02-05 12:37:37 - ERROR - stderr - +2025-02-05 12:37:37 - ERROR - stderr - +2025-02-05 12:37:37 - INFO - stdout - {'loss': 1.0192, 'grad_norm': 1.1419048309326172, 'learning_rate': 1.913916984641625e-05, 'epoch': 0.48} +2025-02-05 12:37:37 - ERROR - stderr - 16%|█▌ | 3569/22434 [2:29:57<13:06:13, 2.50s/it] +2025-02-05 12:37:40 - ERROR - stderr - 16%|█▌ | 3570/22434 [2:30:00<12:59:55, 2.48s/it] +2025-02-05 12:37:40 - ERROR - stderr - +2025-02-05 12:37:40 - ERROR - stderr - +2025-02-05 12:37:40 - INFO - stdout - {'loss': 0.8833, 'grad_norm': 1.0963401794433594, 'learning_rate': 1.913858373275184e-05, 'epoch': 0.48} +2025-02-05 12:37:40 - ERROR - stderr - 16%|█▌ | 3570/22434 [2:30:00<12:59:55, 2.48s/it] +2025-02-05 12:37:42 - ERROR - stderr - 16%|█▌ | 3571/22434 [2:30:02<13:02:42, 2.49s/it] +2025-02-05 12:37:42 - ERROR - stderr - +2025-02-05 12:37:42 - ERROR - stderr - +2025-02-05 12:37:42 - INFO - stdout - {'loss': 0.9956, 'grad_norm': 1.1476385593414307, 'learning_rate': 1.9137997428602406e-05, 'epoch': 0.48} +2025-02-05 12:37:42 - ERROR - stderr - 16%|█▌ | 3571/22434 [2:30:02<13:02:42, 2.49s/it] +2025-02-05 12:37:45 - ERROR - stderr - 16%|█▌ | 3572/22434 [2:30:05<13:09:40, 2.51s/it] +2025-02-05 12:37:45 - ERROR - stderr - +2025-02-05 12:37:45 - ERROR - stderr - +2025-02-05 12:37:45 - INFO - stdout - {'loss': 0.9826, 'grad_norm': 1.1408270597457886, 'learning_rate': 1.913741093398016e-05, 'epoch': 0.48} +2025-02-05 12:37:45 - ERROR - stderr - 16%|█▌ | 3572/22434 [2:30:05<13:09:40, 2.51s/it] +2025-02-05 12:37:47 - ERROR - stderr - 16%|█▌ | 3573/22434 [2:30:07<13:13:30, 2.52s/it] +2025-02-05 12:37:48 - ERROR - stderr - +2025-02-05 12:37:48 - ERROR - stderr - 
+2025-02-05 12:37:48 - INFO - stdout - {'loss': 0.9487, 'grad_norm': 1.1750731468200684, 'learning_rate': 1.913682424889732e-05, 'epoch': 0.48} +2025-02-05 12:37:48 - ERROR - stderr - 16%|█▌ | 3573/22434 [2:30:07<13:13:30, 2.52s/it] +2025-02-05 12:37:50 - ERROR - stderr - 16%|█▌ | 3574/22434 [2:30:10<13:17:21, 2.54s/it] +2025-02-05 12:37:50 - ERROR - stderr - +2025-02-05 12:37:50 - ERROR - stderr - +2025-02-05 12:37:50 - INFO - stdout - {'loss': 1.0776, 'grad_norm': 1.2140734195709229, 'learning_rate': 1.9136237373366126e-05, 'epoch': 0.48} +2025-02-05 12:37:50 - ERROR - stderr - 16%|█▌ | 3574/22434 [2:30:10<13:17:21, 2.54s/it] +2025-02-05 12:37:53 - ERROR - stderr - 16%|█▌ | 3575/22434 [2:30:12<13:23:21, 2.56s/it] +2025-02-05 12:37:53 - ERROR - stderr - +2025-02-05 12:37:53 - ERROR - stderr - +2025-02-05 12:37:53 - INFO - stdout - {'loss': 0.937, 'grad_norm': 1.0570056438446045, 'learning_rate': 1.9135650307398808e-05, 'epoch': 0.48} +2025-02-05 12:37:53 - ERROR - stderr - 16%|█▌ | 3575/22434 [2:30:12<13:23:21, 2.56s/it] +2025-02-05 12:37:55 - ERROR - stderr - 16%|█▌ | 3576/22434 [2:30:15<13:29:57, 2.58s/it] +2025-02-05 12:37:55 - ERROR - stderr - +2025-02-05 12:37:55 - ERROR - stderr - +2025-02-05 12:37:55 - INFO - stdout - {'loss': 0.9467, 'grad_norm': 1.1552014350891113, 'learning_rate': 1.9135063051007597e-05, 'epoch': 0.48} +2025-02-05 12:37:55 - ERROR - stderr - 16%|█▌ | 3576/22434 [2:30:15<13:29:57, 2.58s/it] +2025-02-05 12:37:58 - ERROR - stderr - 16%|█▌ | 3577/22434 [2:30:17<13:13:54, 2.53s/it] +2025-02-05 12:37:58 - ERROR - stderr - +2025-02-05 12:37:58 - ERROR - stderr - +2025-02-05 12:37:58 - INFO - stdout - {'loss': 0.9091, 'grad_norm': 1.0553832054138184, 'learning_rate': 1.9134475604204742e-05, 'epoch': 0.48} +2025-02-05 12:37:58 - ERROR - stderr - 16%|█▌ | 3577/22434 [2:30:17<13:13:54, 2.53s/it] +2025-02-05 12:38:00 - ERROR - stderr - 16%|█▌ | 3578/22434 [2:30:20<13:09:54, 2.51s/it] +2025-02-05 12:38:00 - ERROR - stderr - +2025-02-05 12:38:00 - 
ERROR - stderr - +2025-02-05 12:38:00 - INFO - stdout - {'loss': 1.0257, 'grad_norm': 1.1314283609390259, 'learning_rate': 1.9133887967002483e-05, 'epoch': 0.48} +2025-02-05 12:38:00 - ERROR - stderr - 16%|█▌ | 3578/22434 [2:30:20<13:09:54, 2.51s/it] +2025-02-05 12:38:03 - ERROR - stderr - 16%|█▌ | 3579/22434 [2:30:22<13:03:13, 2.49s/it] +2025-02-05 12:38:03 - ERROR - stderr - +2025-02-05 12:38:03 - ERROR - stderr - +2025-02-05 12:38:03 - INFO - stdout - {'loss': 1.0577, 'grad_norm': 1.2739524841308594, 'learning_rate': 1.9133300139413067e-05, 'epoch': 0.48} +2025-02-05 12:38:03 - ERROR - stderr - 16%|█▌ | 3579/22434 [2:30:22<13:03:13, 2.49s/it] +2025-02-05 12:38:05 - ERROR - stderr - 16%|█▌ | 3580/22434 [2:30:25<12:57:37, 2.47s/it] +2025-02-05 12:38:05 - ERROR - stderr - +2025-02-05 12:38:05 - ERROR - stderr - +2025-02-05 12:38:05 - INFO - stdout - {'loss': 0.9154, 'grad_norm': 1.1427751779556274, 'learning_rate': 1.913271212144875e-05, 'epoch': 0.48} +2025-02-05 12:38:05 - ERROR - stderr - 16%|█▌ | 3580/22434 [2:30:25<12:57:37, 2.47s/it] +2025-02-05 12:38:08 - ERROR - stderr - 16%|█▌ | 3581/22434 [2:30:27<13:01:11, 2.49s/it] +2025-02-05 12:38:08 - ERROR - stderr - +2025-02-05 12:38:08 - ERROR - stderr - +2025-02-05 12:38:08 - INFO - stdout - {'loss': 1.0924, 'grad_norm': 1.1865460872650146, 'learning_rate': 1.913212391312179e-05, 'epoch': 0.48} +2025-02-05 12:38:08 - ERROR - stderr - 16%|█▌ | 3581/22434 [2:30:27<13:01:11, 2.49s/it] +2025-02-05 12:38:10 - ERROR - stderr - 16%|█▌ | 3582/22434 [2:30:30<13:05:46, 2.50s/it] +2025-02-05 12:38:10 - ERROR - stderr - +2025-02-05 12:38:10 - ERROR - stderr - +2025-02-05 12:38:10 - INFO - stdout - {'loss': 0.9781, 'grad_norm': 1.1503925323486328, 'learning_rate': 1.9131535514444445e-05, 'epoch': 0.48} +2025-02-05 12:38:10 - ERROR - stderr - 16%|█▌ | 3582/22434 [2:30:30<13:05:46, 2.50s/it] +2025-02-05 12:38:13 - ERROR - stderr - 16%|█▌ | 3583/22434 [2:30:32<13:04:51, 2.50s/it] +2025-02-05 12:38:13 - ERROR - stderr - 
+2025-02-05 12:38:13 - ERROR - stderr - +2025-02-05 12:38:13 - INFO - stdout - {'loss': 1.0568, 'grad_norm': 1.193393588066101, 'learning_rate': 1.913094692542898e-05, 'epoch': 0.48} +2025-02-05 12:38:13 - ERROR - stderr - 16%|█▌ | 3583/22434 [2:30:32<13:04:51, 2.50s/it] +2025-02-05 12:38:15 - ERROR - stderr - 16%|█▌ | 3584/22434 [2:30:35<13:06:33, 2.50s/it] +2025-02-05 12:38:15 - ERROR - stderr - +2025-02-05 12:38:15 - ERROR - stderr - +2025-02-05 12:38:15 - INFO - stdout - {'loss': 0.9596, 'grad_norm': 1.0923079252243042, 'learning_rate': 1.913035814608766e-05, 'epoch': 0.48} +2025-02-05 12:38:15 - ERROR - stderr - 16%|█▌ | 3584/22434 [2:30:35<13:06:33, 2.50s/it] +2025-02-05 12:38:18 - ERROR - stderr - 16%|█▌ | 3585/22434 [2:30:37<13:04:28, 2.50s/it] +2025-02-05 12:38:18 - ERROR - stderr - +2025-02-05 12:38:18 - ERROR - stderr - +2025-02-05 12:38:18 - INFO - stdout - {'loss': 0.983, 'grad_norm': 1.068599820137024, 'learning_rate': 1.9129769176432768e-05, 'epoch': 0.48} +2025-02-05 12:38:18 - ERROR - stderr - 16%|█▌ | 3585/22434 [2:30:37<13:04:28, 2.50s/it] +2025-02-05 12:38:20 - ERROR - stderr - 16%|█▌ | 3586/22434 [2:30:40<13:06:40, 2.50s/it] +2025-02-05 12:38:20 - ERROR - stderr - +2025-02-05 12:38:20 - ERROR - stderr - +2025-02-05 12:38:20 - INFO - stdout - {'loss': 0.9609, 'grad_norm': 1.0484039783477783, 'learning_rate': 1.9129180016476568e-05, 'epoch': 0.48} +2025-02-05 12:38:20 - ERROR - stderr - 16%|█▌ | 3586/22434 [2:30:40<13:06:40, 2.50s/it] +2025-02-05 12:38:23 - ERROR - stderr - 16%|█▌ | 3587/22434 [2:30:42<13:08:30, 2.51s/it] +2025-02-05 12:38:23 - ERROR - stderr - +2025-02-05 12:38:23 - ERROR - stderr - +2025-02-05 12:38:23 - INFO - stdout - {'loss': 0.9995, 'grad_norm': 1.0019081830978394, 'learning_rate': 1.9128590666231347e-05, 'epoch': 0.48} +2025-02-05 12:38:23 - ERROR - stderr - 16%|█▌ | 3587/22434 [2:30:42<13:08:30, 2.51s/it] +2025-02-05 12:38:25 - ERROR - stderr - 16%|█▌ | 3588/22434 [2:30:45<13:05:55, 2.50s/it] +2025-02-05 12:38:25 - ERROR 
- stderr - +2025-02-05 12:38:25 - ERROR - stderr - +2025-02-05 12:38:25 - INFO - stdout - {'loss': 0.8628, 'grad_norm': 1.1352434158325195, 'learning_rate': 1.912800112570939e-05, 'epoch': 0.48} +2025-02-05 12:38:25 - ERROR - stderr - 16%|█▌ | 3588/22434 [2:30:45<13:05:55, 2.50s/it] +2025-02-05 12:38:28 - ERROR - stderr - 16%|█▌ | 3589/22434 [2:30:47<13:04:05, 2.50s/it] +2025-02-05 12:38:28 - ERROR - stderr - +2025-02-05 12:38:28 - ERROR - stderr - +2025-02-05 12:38:28 - INFO - stdout - {'loss': 0.9074, 'grad_norm': 1.2107622623443604, 'learning_rate': 1.9127411394922982e-05, 'epoch': 0.48} +2025-02-05 12:38:28 - ERROR - stderr - 16%|█▌ | 3589/22434 [2:30:47<13:04:05, 2.50s/it] +2025-02-05 12:38:30 - ERROR - stderr - 16%|█▌ | 3590/22434 [2:30:50<13:10:11, 2.52s/it] +2025-02-05 12:38:30 - ERROR - stderr - +2025-02-05 12:38:30 - ERROR - stderr - +2025-02-05 12:38:30 - INFO - stdout - {'loss': 0.9309, 'grad_norm': 0.9772448539733887, 'learning_rate': 1.9126821473884423e-05, 'epoch': 0.48} +2025-02-05 12:38:30 - ERROR - stderr - 16%|█▌ | 3590/22434 [2:30:50<13:10:11, 2.52s/it] +2025-02-05 12:38:33 - ERROR - stderr - 16%|█▌ | 3591/22434 [2:30:52<13:07:23, 2.51s/it] +2025-02-05 12:38:33 - ERROR - stderr - +2025-02-05 12:38:33 - ERROR - stderr - +2025-02-05 12:38:33 - INFO - stdout - {'loss': 0.8697, 'grad_norm': 1.0572630167007446, 'learning_rate': 1.9126231362605997e-05, 'epoch': 0.48} +2025-02-05 12:38:33 - ERROR - stderr - 16%|█▌ | 3591/22434 [2:30:52<13:07:23, 2.51s/it] +2025-02-05 12:38:35 - ERROR - stderr - 16%|█▌ | 3592/22434 [2:30:55<13:00:33, 2.49s/it] +2025-02-05 12:38:35 - ERROR - stderr - +2025-02-05 12:38:35 - ERROR - stderr - +2025-02-05 12:38:35 - INFO - stdout - {'loss': 0.8581, 'grad_norm': 1.1788923740386963, 'learning_rate': 1.9125641061100014e-05, 'epoch': 0.48} +2025-02-05 12:38:35 - ERROR - stderr - 16%|█▌ | 3592/22434 [2:30:55<13:00:33, 2.49s/it] +2025-02-05 12:38:37 - ERROR - stderr - 16%|█▌ | 3593/22434 [2:30:57<12:55:40, 2.47s/it] +2025-02-05 
12:38:38 - ERROR - stderr - +2025-02-05 12:38:38 - ERROR - stderr - +2025-02-05 12:38:38 - INFO - stdout - {'loss': 1.0829, 'grad_norm': 1.1551872491836548, 'learning_rate': 1.9125050569378777e-05, 'epoch': 0.48} +2025-02-05 12:38:38 - ERROR - stderr - 16%|█▌ | 3593/22434 [2:30:57<12:55:40, 2.47s/it] +2025-02-05 12:38:40 - ERROR - stderr - 16%|█▌ | 3594/22434 [2:31:00<12:54:04, 2.47s/it] +2025-02-05 12:38:40 - ERROR - stderr - +2025-02-05 12:38:40 - ERROR - stderr - +2025-02-05 12:38:40 - INFO - stdout - {'loss': 0.9155, 'grad_norm': 1.131706714630127, 'learning_rate': 1.912445988745459e-05, 'epoch': 0.48} +2025-02-05 12:38:40 - ERROR - stderr - 16%|█▌ | 3594/22434 [2:31:00<12:54:04, 2.47s/it] +2025-02-05 12:38:42 - ERROR - stderr - 16%|█▌ | 3595/22434 [2:31:02<12:54:44, 2.47s/it] +2025-02-05 12:38:42 - ERROR - stderr - +2025-02-05 12:38:42 - ERROR - stderr - +2025-02-05 12:38:42 - INFO - stdout - {'loss': 0.9594, 'grad_norm': 1.1873979568481445, 'learning_rate': 1.912386901533977e-05, 'epoch': 0.48} +2025-02-05 12:38:42 - ERROR - stderr - 16%|█▌ | 3595/22434 [2:31:02<12:54:44, 2.47s/it] +2025-02-05 12:38:45 - ERROR - stderr - 16%|█▌ | 3596/22434 [2:31:05<12:57:05, 2.48s/it] +2025-02-05 12:38:45 - ERROR - stderr - +2025-02-05 12:38:45 - ERROR - stderr - +2025-02-05 12:38:45 - INFO - stdout - {'loss': 1.0303, 'grad_norm': 1.1682223081588745, 'learning_rate': 1.912327795304663e-05, 'epoch': 0.48} +2025-02-05 12:38:45 - ERROR - stderr - 16%|█▌ | 3596/22434 [2:31:05<12:57:05, 2.48s/it] +2025-02-05 12:38:47 - ERROR - stderr - 16%|█▌ | 3597/22434 [2:31:07<13:03:38, 2.50s/it] +2025-02-05 12:38:48 - ERROR - stderr - +2025-02-05 12:38:48 - ERROR - stderr - +2025-02-05 12:38:48 - INFO - stdout - {'loss': 1.1797, 'grad_norm': 1.195803165435791, 'learning_rate': 1.912268670058749e-05, 'epoch': 0.48} +2025-02-05 12:38:48 - ERROR - stderr - 16%|█▌ | 3597/22434 [2:31:07<13:03:38, 2.50s/it] +2025-02-05 12:38:50 - ERROR - stderr - 16%|█▌ | 3598/22434 [2:31:10<13:08:22, 2.51s/it] 
+2025-02-05 12:38:50 - ERROR - stderr - +2025-02-05 12:38:50 - ERROR - stderr - +2025-02-05 12:38:50 - INFO - stdout - {'loss': 0.9474, 'grad_norm': 1.2576552629470825, 'learning_rate': 1.9122095257974676e-05, 'epoch': 0.48} +2025-02-05 12:38:50 - ERROR - stderr - 16%|█▌ | 3598/22434 [2:31:10<13:08:22, 2.51s/it] +2025-02-05 12:38:53 - ERROR - stderr - 16%|█▌ | 3599/22434 [2:31:12<13:09:24, 2.51s/it] +2025-02-05 12:38:53 - ERROR - stderr - +2025-02-05 12:38:53 - ERROR - stderr - +2025-02-05 12:38:53 - INFO - stdout - {'loss': 1.0134, 'grad_norm': 1.1217975616455078, 'learning_rate': 1.9121503625220515e-05, 'epoch': 0.48} +2025-02-05 12:38:53 - ERROR - stderr - 16%|█▌ | 3599/22434 [2:31:12<13:09:24, 2.51s/it] +2025-02-05 12:38:55 - ERROR - stderr - 16%|█▌ | 3600/22434 [2:31:15<13:08:30, 2.51s/it] +2025-02-05 12:38:55 - ERROR - stderr - +2025-02-05 12:38:55 - ERROR - stderr - +2025-02-05 12:38:55 - INFO - stdout - {'loss': 0.9757, 'grad_norm': 1.1104400157928467, 'learning_rate': 1.912091180233734e-05, 'epoch': 0.48} +2025-02-05 12:38:55 - ERROR - stderr - 16%|█▌ | 3600/22434 [2:31:15<13:08:30, 2.51s/it] +2025-02-05 12:38:58 - ERROR - stderr - 16%|█▌ | 3601/22434 [2:31:17<13:07:45, 2.51s/it] +2025-02-05 12:38:58 - ERROR - stderr - +2025-02-05 12:38:58 - ERROR - stderr - +2025-02-05 12:38:58 - INFO - stdout - {'loss': 0.9546, 'grad_norm': 1.099687933921814, 'learning_rate': 1.912031978933749e-05, 'epoch': 0.48} +2025-02-05 12:38:58 - ERROR - stderr - 16%|█▌ | 3601/22434 [2:31:17<13:07:45, 2.51s/it] +2025-02-05 12:39:00 - ERROR - stderr - 16%|█▌ | 3602/22434 [2:31:20<13:02:30, 2.49s/it] +2025-02-05 12:39:00 - ERROR - stderr - +2025-02-05 12:39:00 - ERROR - stderr - +2025-02-05 12:39:00 - INFO - stdout - {'loss': 0.9478, 'grad_norm': 1.0479400157928467, 'learning_rate': 1.9119727586233295e-05, 'epoch': 0.48} +2025-02-05 12:39:00 - ERROR - stderr - 16%|█▌ | 3602/22434 [2:31:20<13:02:30, 2.49s/it] +2025-02-05 12:39:03 - ERROR - stderr - 16%|█▌ | 3603/22434 
[2:31:22<13:21:48, 2.55s/it] +2025-02-05 12:39:03 - ERROR - stderr - +2025-02-05 12:39:03 - ERROR - stderr - +2025-02-05 12:39:03 - INFO - stdout - {'loss': 1.0929, 'grad_norm': 1.0492379665374756, 'learning_rate': 1.9119135193037108e-05, 'epoch': 0.48} +2025-02-05 12:39:03 - ERROR - stderr - 16%|█▌ | 3603/22434 [2:31:23<13:21:48, 2.55s/it] +2025-02-05 12:39:05 - ERROR - stderr - 16%|█▌ | 3604/22434 [2:31:25<13:13:42, 2.53s/it] +2025-02-05 12:39:05 - ERROR - stderr - +2025-02-05 12:39:05 - ERROR - stderr - +2025-02-05 12:39:05 - INFO - stdout - {'loss': 1.037, 'grad_norm': 1.0451514720916748, 'learning_rate': 1.9118542609761273e-05, 'epoch': 0.48} +2025-02-05 12:39:05 - ERROR - stderr - 16%|█▌ | 3604/22434 [2:31:25<13:13:42, 2.53s/it] +2025-02-05 12:39:08 - ERROR - stderr - 16%|█▌ | 3605/22434 [2:31:27<13:09:51, 2.52s/it] +2025-02-05 12:39:08 - ERROR - stderr - +2025-02-05 12:39:08 - ERROR - stderr - +2025-02-05 12:39:08 - INFO - stdout - {'loss': 1.031, 'grad_norm': 1.1628445386886597, 'learning_rate': 1.9117949836418143e-05, 'epoch': 0.48} +2025-02-05 12:39:08 - ERROR - stderr - 16%|█▌ | 3605/22434 [2:31:27<13:09:51, 2.52s/it] +2025-02-05 12:39:10 - ERROR - stderr - 16%|█▌ | 3606/22434 [2:31:30<13:14:26, 2.53s/it] +2025-02-05 12:39:10 - ERROR - stderr - +2025-02-05 12:39:10 - ERROR - stderr - +2025-02-05 12:39:10 - INFO - stdout - {'loss': 1.0823, 'grad_norm': 1.1809580326080322, 'learning_rate': 1.9117356873020075e-05, 'epoch': 0.48} +2025-02-05 12:39:10 - ERROR - stderr - 16%|█▌ | 3606/22434 [2:31:30<13:14:26, 2.53s/it] +2025-02-05 12:39:13 - ERROR - stderr - 16%|█▌ | 3607/22434 [2:31:33<13:12:51, 2.53s/it] +2025-02-05 12:39:13 - ERROR - stderr - +2025-02-05 12:39:13 - ERROR - stderr - +2025-02-05 12:39:13 - INFO - stdout - {'loss': 1.0181, 'grad_norm': 1.0655548572540283, 'learning_rate': 1.9116763719579424e-05, 'epoch': 0.48} +2025-02-05 12:39:13 - ERROR - stderr - 16%|█▌ | 3607/22434 [2:31:33<13:12:51, 2.53s/it] +2025-02-05 12:39:15 - ERROR - stderr - 16%|█▌ 
| 3608/22434 [2:31:35<13:02:28, 2.49s/it] +2025-02-05 12:39:15 - ERROR - stderr - +2025-02-05 12:39:15 - ERROR - stderr - +2025-02-05 12:39:15 - INFO - stdout - {'loss': 1.086, 'grad_norm': 1.0812218189239502, 'learning_rate': 1.911617037610856e-05, 'epoch': 0.48} +2025-02-05 12:39:15 - ERROR - stderr - 16%|█▌ | 3608/22434 [2:31:35<13:02:28, 2.49s/it] +2025-02-05 12:39:18 - ERROR - stderr - 16%|█▌ | 3609/22434 [2:31:37<13:08:47, 2.51s/it] +2025-02-05 12:39:18 - ERROR - stderr - +2025-02-05 12:39:18 - ERROR - stderr - +2025-02-05 12:39:18 - INFO - stdout - {'loss': 0.895, 'grad_norm': 1.0576560497283936, 'learning_rate': 1.9115576842619846e-05, 'epoch': 0.48} +2025-02-05 12:39:18 - ERROR - stderr - 16%|█▌ | 3609/22434 [2:31:38<13:08:47, 2.51s/it] +2025-02-05 12:39:20 - ERROR - stderr - 16%|█▌ | 3610/22434 [2:31:40<13:09:35, 2.52s/it] +2025-02-05 12:39:20 - ERROR - stderr - +2025-02-05 12:39:20 - ERROR - stderr - +2025-02-05 12:39:20 - INFO - stdout - {'loss': 1.0846, 'grad_norm': 1.1840145587921143, 'learning_rate': 1.911498311912566e-05, 'epoch': 0.48} +2025-02-05 12:39:20 - ERROR - stderr - 16%|█▌ | 3610/22434 [2:31:40<13:09:35, 2.52s/it] +2025-02-05 12:39:23 - ERROR - stderr - 16%|█▌ | 3611/22434 [2:31:43<13:14:20, 2.53s/it] +2025-02-05 12:39:23 - ERROR - stderr - +2025-02-05 12:39:23 - ERROR - stderr - +2025-02-05 12:39:23 - INFO - stdout - {'loss': 1.0237, 'grad_norm': 1.1334906816482544, 'learning_rate': 1.9114389205638367e-05, 'epoch': 0.48} +2025-02-05 12:39:23 - ERROR - stderr - 16%|█▌ | 3611/22434 [2:31:43<13:14:20, 2.53s/it] +2025-02-05 12:39:25 - ERROR - stderr - 16%|█▌ | 3612/22434 [2:31:45<13:05:42, 2.50s/it] +2025-02-05 12:39:25 - ERROR - stderr - +2025-02-05 12:39:25 - ERROR - stderr - +2025-02-05 12:39:25 - INFO - stdout - {'loss': 0.9556, 'grad_norm': 1.1529263257980347, 'learning_rate': 1.9113795102170357e-05, 'epoch': 0.48} +2025-02-05 12:39:25 - ERROR - stderr - 16%|█▌ | 3612/22434 [2:31:45<13:05:42, 2.50s/it] +2025-02-05 12:39:28 - ERROR - 
stderr - 16%|█▌ | 3613/22434 [2:31:48<13:08:30, 2.51s/it] +2025-02-05 12:39:28 - ERROR - stderr - +2025-02-05 12:39:28 - ERROR - stderr - +2025-02-05 12:39:28 - INFO - stdout - {'loss': 0.8203, 'grad_norm': 1.0640449523925781, 'learning_rate': 1.9113200808734005e-05, 'epoch': 0.48} +2025-02-05 12:39:28 - ERROR - stderr - 16%|█▌ | 3613/22434 [2:31:48<13:08:30, 2.51s/it] +2025-02-05 12:39:31 - ERROR - stderr - 16%|█▌ | 3614/22434 [2:31:50<13:41:50, 2.62s/it] +2025-02-05 12:39:31 - ERROR - stderr - +2025-02-05 12:39:31 - ERROR - stderr - +2025-02-05 12:39:31 - INFO - stdout - {'loss': 0.9326, 'grad_norm': 1.05023992061615, 'learning_rate': 1.9112606325341706e-05, 'epoch': 0.48} +2025-02-05 12:39:31 - ERROR - stderr - 16%|█▌ | 3614/22434 [2:31:50<13:41:50, 2.62s/it] +2025-02-05 12:39:33 - ERROR - stderr - 16%|█▌ | 3615/22434 [2:31:53<13:31:06, 2.59s/it] +2025-02-05 12:39:33 - ERROR - stderr - +2025-02-05 12:39:33 - ERROR - stderr - +2025-02-05 12:39:33 - INFO - stdout - {'loss': 0.8566, 'grad_norm': 1.056945562362671, 'learning_rate': 1.9112011652005843e-05, 'epoch': 0.48} +2025-02-05 12:39:33 - ERROR - stderr - 16%|█▌ | 3615/22434 [2:31:53<13:31:06, 2.59s/it] +2025-02-05 12:39:36 - ERROR - stderr - 16%|█▌ | 3616/22434 [2:31:55<13:16:09, 2.54s/it] +2025-02-05 12:39:36 - ERROR - stderr - +2025-02-05 12:39:36 - ERROR - stderr - +2025-02-05 12:39:36 - INFO - stdout - {'loss': 1.0519, 'grad_norm': 1.1845916509628296, 'learning_rate': 1.911141678873882e-05, 'epoch': 0.48} +2025-02-05 12:39:36 - ERROR - stderr - 16%|█▌ | 3616/22434 [2:31:55<13:16:09, 2.54s/it] +2025-02-05 12:39:38 - ERROR - stderr - 16%|█▌ | 3617/22434 [2:31:58<13:08:04, 2.51s/it] +2025-02-05 12:39:38 - ERROR - stderr - +2025-02-05 12:39:38 - ERROR - stderr - +2025-02-05 12:39:38 - INFO - stdout - {'loss': 0.909, 'grad_norm': 1.0535677671432495, 'learning_rate': 1.9110821735553034e-05, 'epoch': 0.48} +2025-02-05 12:39:38 - ERROR - stderr - 16%|█▌ | 3617/22434 [2:31:58<13:08:04, 2.51s/it] +2025-02-05 12:39:41 
- ERROR - stderr - 16%|█▌ | 3618/22434 [2:32:00<13:11:40, 2.52s/it] +2025-02-05 12:39:41 - ERROR - stderr - +2025-02-05 12:39:41 - ERROR - stderr - +2025-02-05 12:39:41 - INFO - stdout - {'loss': 1.0694, 'grad_norm': 1.175420880317688, 'learning_rate': 1.9110226492460886e-05, 'epoch': 0.48} +2025-02-05 12:39:41 - ERROR - stderr - 16%|█▌ | 3618/22434 [2:32:00<13:11:40, 2.52s/it] +2025-02-05 12:39:43 - ERROR - stderr - 16%|█▌ | 3619/22434 [2:32:03<13:14:21, 2.53s/it] +2025-02-05 12:39:43 - ERROR - stderr - +2025-02-05 12:39:43 - ERROR - stderr - +2025-02-05 12:39:43 - INFO - stdout - {'loss': 0.8209, 'grad_norm': 1.0371235609054565, 'learning_rate': 1.9109631059474783e-05, 'epoch': 0.48} +2025-02-05 12:39:43 - ERROR - stderr - 16%|█▌ | 3619/22434 [2:32:03<13:14:21, 2.53s/it] +2025-02-05 12:39:46 - ERROR - stderr - 16%|█▌ | 3620/22434 [2:32:06<13:33:37, 2.59s/it] +2025-02-05 12:39:46 - ERROR - stderr - +2025-02-05 12:39:46 - ERROR - stderr - +2025-02-05 12:39:46 - INFO - stdout - {'loss': 1.0566, 'grad_norm': 1.1100915670394897, 'learning_rate': 1.9109035436607136e-05, 'epoch': 0.48} +2025-02-05 12:39:46 - ERROR - stderr - 16%|█▌ | 3620/22434 [2:32:06<13:33:37, 2.59s/it] +2025-02-05 12:39:49 - ERROR - stderr - 16%|█▌ | 3621/22434 [2:32:09<14:46:12, 2.83s/it] +2025-02-05 12:39:49 - ERROR - stderr - +2025-02-05 12:39:49 - ERROR - stderr - +2025-02-05 12:39:49 - INFO - stdout - {'loss': 1.0044, 'grad_norm': 1.1361162662506104, 'learning_rate': 1.910843962387037e-05, 'epoch': 0.48} +2025-02-05 12:39:49 - ERROR - stderr - 16%|█▌ | 3621/22434 [2:32:09<14:46:12, 2.83s/it] +2025-02-05 12:39:53 - ERROR - stderr - 16%|█▌ | 3622/22434 [2:32:12<15:32:28, 2.97s/it] +2025-02-05 12:39:53 - ERROR - stderr - +2025-02-05 12:39:53 - ERROR - stderr - +2025-02-05 12:39:53 - INFO - stdout - {'loss': 0.9127, 'grad_norm': 1.0493508577346802, 'learning_rate': 1.9107843621276886e-05, 'epoch': 0.48} +2025-02-05 12:39:53 - ERROR - stderr - 16%|█▌ | 3622/22434 [2:32:12<15:32:28, 2.97s/it] 
+2025-02-05 12:39:55 - ERROR - stderr - 16%|█▌ | 3623/22434 [2:32:15<14:42:51, 2.82s/it] +2025-02-05 12:39:55 - ERROR - stderr - +2025-02-05 12:39:55 - ERROR - stderr - +2025-02-05 12:39:55 - INFO - stdout - {'loss': 0.9193, 'grad_norm': 1.0796613693237305, 'learning_rate': 1.910724742883912e-05, 'epoch': 0.48} +2025-02-05 12:39:55 - ERROR - stderr - 16%|█▌ | 3623/22434 [2:32:15<14:42:51, 2.82s/it] +2025-02-05 12:39:57 - ERROR - stderr - 16%|█▌ | 3624/22434 [2:32:17<14:10:12, 2.71s/it] +2025-02-05 12:39:58 - ERROR - stderr - +2025-02-05 12:39:58 - ERROR - stderr - +2025-02-05 12:39:58 - INFO - stdout - {'loss': 0.9153, 'grad_norm': 1.1538848876953125, 'learning_rate': 1.91066510465695e-05, 'epoch': 0.48} +2025-02-05 12:39:58 - ERROR - stderr - 16%|█▌ | 3624/22434 [2:32:17<14:10:12, 2.71s/it] +2025-02-05 12:40:00 - ERROR - stderr - 16%|█▌ | 3625/22434 [2:32:20<13:46:53, 2.64s/it] +2025-02-05 12:40:00 - ERROR - stderr - +2025-02-05 12:40:00 - ERROR - stderr - +2025-02-05 12:40:00 - INFO - stdout - {'loss': 0.9949, 'grad_norm': 1.111733078956604, 'learning_rate': 1.9106054474480448e-05, 'epoch': 0.48} +2025-02-05 12:40:00 - ERROR - stderr - 16%|█▌ | 3625/22434 [2:32:20<13:46:53, 2.64s/it] +2025-02-05 12:40:02 - ERROR - stderr - 16%|█▌ | 3626/22434 [2:32:22<13:32:44, 2.59s/it] +2025-02-05 12:40:02 - ERROR - stderr - +2025-02-05 12:40:02 - ERROR - stderr - +2025-02-05 12:40:02 - INFO - stdout - {'loss': 0.9663, 'grad_norm': 1.3412001132965088, 'learning_rate': 1.9105457712584405e-05, 'epoch': 0.48} +2025-02-05 12:40:02 - ERROR - stderr - 16%|█▌ | 3626/22434 [2:32:22<13:32:44, 2.59s/it] +2025-02-05 12:40:06 - ERROR - stderr - 16%|█▌ | 3627/22434 [2:32:25<14:24:07, 2.76s/it] +2025-02-05 12:40:06 - ERROR - stderr - +2025-02-05 12:40:06 - ERROR - stderr - +2025-02-05 12:40:06 - INFO - stdout - {'loss': 0.9699, 'grad_norm': 1.0330497026443481, 'learning_rate': 1.9104860760893808e-05, 'epoch': 0.49} +2025-02-05 12:40:06 - ERROR - stderr - 16%|█▌ | 3627/22434 
[2:32:25<14:24:07, 2.76s/it] +2025-02-05 12:40:08 - ERROR - stderr - 16%|█▌ | 3628/22434 [2:32:28<13:56:41, 2.67s/it] +2025-02-05 12:40:08 - ERROR - stderr - +2025-02-05 12:40:08 - ERROR - stderr - +2025-02-05 12:40:08 - INFO - stdout - {'loss': 1.0132, 'grad_norm': 1.23576021194458, 'learning_rate': 1.9104263619421105e-05, 'epoch': 0.49} +2025-02-05 12:40:08 - ERROR - stderr - 16%|█▌ | 3628/22434 [2:32:28<13:56:41, 2.67s/it] +2025-02-05 12:40:11 - ERROR - stderr - 16%|█▌ | 3629/22434 [2:32:30<13:40:33, 2.62s/it] +2025-02-05 12:40:11 - ERROR - stderr - +2025-02-05 12:40:11 - ERROR - stderr - +2025-02-05 12:40:11 - INFO - stdout - {'loss': 0.9309, 'grad_norm': 1.037109613418579, 'learning_rate': 1.9103666288178737e-05, 'epoch': 0.49} +2025-02-05 12:40:11 - ERROR - stderr - 16%|█▌ | 3629/22434 [2:32:30<13:40:33, 2.62s/it] +2025-02-05 12:40:13 - ERROR - stderr - 16%|█▌ | 3630/22434 [2:32:33<13:27:23, 2.58s/it] +2025-02-05 12:40:13 - ERROR - stderr - +2025-02-05 12:40:13 - ERROR - stderr - +2025-02-05 12:40:13 - INFO - stdout - {'loss': 0.9313, 'grad_norm': 1.1156550645828247, 'learning_rate': 1.9103068767179156e-05, 'epoch': 0.49} +2025-02-05 12:40:13 - ERROR - stderr - 16%|█▌ | 3630/22434 [2:32:33<13:27:23, 2.58s/it] +2025-02-05 12:40:16 - ERROR - stderr - 16%|█▌ | 3631/22434 [2:32:35<13:19:46, 2.55s/it] +2025-02-05 12:40:16 - ERROR - stderr - +2025-02-05 12:40:16 - ERROR - stderr - +2025-02-05 12:40:16 - INFO - stdout - {'loss': 1.0601, 'grad_norm': 1.2146857976913452, 'learning_rate': 1.9102471056434816e-05, 'epoch': 0.49} +2025-02-05 12:40:16 - ERROR - stderr - 16%|█▌ | 3631/22434 [2:32:35<13:19:46, 2.55s/it] +2025-02-05 12:40:18 - ERROR - stderr - 16%|█▌ | 3632/22434 [2:32:38<13:14:26, 2.54s/it] +2025-02-05 12:40:18 - ERROR - stderr - +2025-02-05 12:40:18 - ERROR - stderr - +2025-02-05 12:40:18 - INFO - stdout - {'loss': 1.0331, 'grad_norm': 1.1898068189620972, 'learning_rate': 1.910187315595818e-05, 'epoch': 0.49} +2025-02-05 12:40:18 - ERROR - stderr - 16%|█▌ | 
3632/22434 [2:32:38<13:14:26, 2.54s/it] +2025-02-05 12:40:21 - ERROR - stderr - 16%|█▌ | 3633/22434 [2:32:40<13:30:21, 2.59s/it] +2025-02-05 12:40:21 - ERROR - stderr - +2025-02-05 12:40:21 - ERROR - stderr - +2025-02-05 12:40:21 - INFO - stdout - {'loss': 0.9203, 'grad_norm': 1.020881175994873, 'learning_rate': 1.9101275065761705e-05, 'epoch': 0.49} +2025-02-05 12:40:21 - ERROR - stderr - 16%|█▌ | 3633/22434 [2:32:41<13:30:21, 2.59s/it] +2025-02-05 12:40:23 - ERROR - stderr - 16%|█▌ | 3634/22434 [2:32:43<13:24:35, 2.57s/it] +2025-02-05 12:40:23 - ERROR - stderr - +2025-02-05 12:40:23 - ERROR - stderr - +2025-02-05 12:40:23 - INFO - stdout - {'loss': 1.0052, 'grad_norm': 1.0890753269195557, 'learning_rate': 1.9100676785857862e-05, 'epoch': 0.49} +2025-02-05 12:40:23 - ERROR - stderr - 16%|█▌ | 3634/22434 [2:32:43<13:24:35, 2.57s/it] +2025-02-05 12:40:26 - ERROR - stderr - 16%|█▌ | 3635/22434 [2:32:45<13:11:01, 2.52s/it] +2025-02-05 12:40:26 - ERROR - stderr - +2025-02-05 12:40:26 - ERROR - stderr - +2025-02-05 12:40:26 - INFO - stdout - {'loss': 1.0049, 'grad_norm': 1.1613149642944336, 'learning_rate': 1.9100078316259118e-05, 'epoch': 0.49} +2025-02-05 12:40:26 - ERROR - stderr - 16%|█▌ | 3635/22434 [2:32:45<13:11:01, 2.52s/it] +2025-02-05 12:40:28 - ERROR - stderr - 16%|█▌ | 3636/22434 [2:32:48<13:13:20, 2.53s/it] +2025-02-05 12:40:28 - ERROR - stderr - +2025-02-05 12:40:28 - ERROR - stderr - +2025-02-05 12:40:28 - INFO - stdout - {'loss': 1.0787, 'grad_norm': 1.2421503067016602, 'learning_rate': 1.909947965697795e-05, 'epoch': 0.49} +2025-02-05 12:40:28 - ERROR - stderr - 16%|█▌ | 3636/22434 [2:32:48<13:13:20, 2.53s/it] +2025-02-05 12:40:31 - ERROR - stderr - 16%|█▌ | 3637/22434 [2:32:51<13:19:12, 2.55s/it] +2025-02-05 12:40:31 - ERROR - stderr - +2025-02-05 12:40:31 - ERROR - stderr - +2025-02-05 12:40:31 - INFO - stdout - {'loss': 0.9664, 'grad_norm': 1.1563407182693481, 'learning_rate': 1.9098880808026832e-05, 'epoch': 0.49} +2025-02-05 12:40:31 - ERROR -
stderr - 16%|█▌ | 3637/22434 [2:32:51<13:19:12, 2.55s/it] +2025-02-05 12:40:33 - ERROR - stderr - 16%|█▌ | 3638/22434 [2:32:53<13:15:16, 2.54s/it] +2025-02-05 12:40:33 - ERROR - stderr - +2025-02-05 12:40:33 - ERROR - stderr - +2025-02-05 12:40:33 - INFO - stdout - {'loss': 1.0166, 'grad_norm': 1.1479296684265137, 'learning_rate': 1.909828176941826e-05, 'epoch': 0.49} +2025-02-05 12:40:33 - ERROR - stderr - 16%|█▌ | 3638/22434 [2:32:53<13:15:16, 2.54s/it] +2025-02-05 12:40:36 - ERROR - stderr - 16%|█▌ | 3639/22434 [2:32:56<13:14:02, 2.53s/it] +2025-02-05 12:40:36 - ERROR - stderr - +2025-02-05 12:40:36 - ERROR - stderr - +2025-02-05 12:40:36 - INFO - stdout - {'loss': 1.1338, 'grad_norm': 1.2485655546188354, 'learning_rate': 1.90976825411647e-05, 'epoch': 0.49} +2025-02-05 12:40:36 - ERROR - stderr - 16%|█▌ | 3639/22434 [2:32:56<13:14:02, 2.53s/it] +2025-02-05 12:40:38 - ERROR - stderr - 16%|█▌ | 3640/22434 [2:32:58<13:06:57, 2.51s/it] +2025-02-05 12:40:38 - ERROR - stderr - +2025-02-05 12:40:38 - ERROR - stderr - +2025-02-05 12:40:38 - INFO - stdout - {'loss': 1.048, 'grad_norm': 1.0930095911026, 'learning_rate': 1.909708312327866e-05, 'epoch': 0.49} +2025-02-05 12:40:38 - ERROR - stderr - 16%|█▌ | 3640/22434 [2:32:58<13:06:57, 2.51s/it] +2025-02-05 12:40:41 - ERROR - stderr - 16%|█▌ | 3641/22434 [2:33:01<13:00:12, 2.49s/it] +2025-02-05 12:40:41 - ERROR - stderr - +2025-02-05 12:40:41 - ERROR - stderr - +2025-02-05 12:40:41 - INFO - stdout - {'loss': 1.0228, 'grad_norm': 1.124192714691162, 'learning_rate': 1.9096483515772625e-05, 'epoch': 0.49} +2025-02-05 12:40:41 - ERROR - stderr - 16%|█▌ | 3641/22434 [2:33:01<13:00:12, 2.49s/it] +2025-02-05 12:40:43 - ERROR - stderr - 16%|█▌ | 3642/22434 [2:33:03<13:03:02, 2.50s/it] +2025-02-05 12:40:43 - ERROR - stderr - +2025-02-05 12:40:43 - ERROR - stderr - +2025-02-05 12:40:43 - INFO - stdout - {'loss': 0.9589, 'grad_norm': 1.0576257705688477, 'learning_rate': 1.9095883718659095e-05, 'epoch': 0.49} +2025-02-05 12:40:43 - 
ERROR - stderr - 16%|█▌ | 3642/22434 [2:33:03<13:03:02, 2.50s/it] +2025-02-05 12:40:46 - ERROR - stderr - 16%|█▌ | 3643/22434 [2:33:05<12:59:55, 2.49s/it] +2025-02-05 12:40:46 - ERROR - stderr - +2025-02-05 12:40:46 - ERROR - stderr - +2025-02-05 12:40:46 - INFO - stdout - {'loss': 0.9364, 'grad_norm': 1.0012192726135254, 'learning_rate': 1.9095283731950572e-05, 'epoch': 0.49} +2025-02-05 12:40:46 - ERROR - stderr - 16%|█▌ | 3643/22434 [2:33:06<12:59:55, 2.49s/it] +2025-02-05 12:40:49 - ERROR - stderr - 16%|█▌ | 3644/22434 [2:33:08<13:26:31, 2.58s/it] +2025-02-05 12:40:49 - ERROR - stderr - +2025-02-05 12:40:49 - ERROR - stderr - +2025-02-05 12:40:49 - INFO - stdout - {'loss': 0.9965, 'grad_norm': 1.049712896347046, 'learning_rate': 1.9094683555659565e-05, 'epoch': 0.49} +2025-02-05 12:40:49 - ERROR - stderr - 16%|█▌ | 3644/22434 [2:33:08<13:26:31, 2.58s/it] +2025-02-05 12:40:51 - ERROR - stderr - 16%|█▌ | 3645/22434 [2:33:11<13:14:28, 2.54s/it] +2025-02-05 12:40:51 - ERROR - stderr - +2025-02-05 12:40:51 - ERROR - stderr - +2025-02-05 12:40:51 - INFO - stdout - {'loss': 1.0536, 'grad_norm': 1.155009388923645, 'learning_rate': 1.9094083189798583e-05, 'epoch': 0.49} +2025-02-05 12:40:51 - ERROR - stderr - 16%|█▌ | 3645/22434 [2:33:11<13:14:28, 2.54s/it] +2025-02-05 12:40:54 - ERROR - stderr - 16%|█▋ | 3646/22434 [2:33:13<13:21:09, 2.56s/it] +2025-02-05 12:40:54 - ERROR - stderr - +2025-02-05 12:40:54 - ERROR - stderr - +2025-02-05 12:40:54 - INFO - stdout - {'loss': 0.9225, 'grad_norm': 1.1310431957244873, 'learning_rate': 1.9093482634380135e-05, 'epoch': 0.49} +2025-02-05 12:40:54 - ERROR - stderr - 16%|█▋ | 3646/22434 [2:33:13<13:21:09, 2.56s/it] +2025-02-05 12:40:56 - ERROR - stderr - 16%|█▋ | 3647/22434 [2:33:16<13:14:02, 2.54s/it] +2025-02-05 12:40:56 - ERROR - stderr - +2025-02-05 12:40:56 - ERROR - stderr - +2025-02-05 12:40:56 - INFO - stdout - {'loss': 1.017, 'grad_norm': 1.1689939498901367, 'learning_rate': 1.9092881889416744e-05, 'epoch': 0.49} 
+2025-02-05 12:40:56 - ERROR - stderr - 16%|█▋ | 3647/22434 [2:33:16<13:14:02, 2.54s/it] +2025-02-05 12:40:59 - ERROR - stderr - 16%|█▋ | 3648/22434 [2:33:18<13:17:53, 2.55s/it] +2025-02-05 12:40:59 - ERROR - stderr - +2025-02-05 12:40:59 - ERROR - stderr - +2025-02-05 12:40:59 - INFO - stdout - {'loss': 0.9707, 'grad_norm': 1.155153751373291, 'learning_rate': 1.9092280954920935e-05, 'epoch': 0.49} +2025-02-05 12:40:59 - ERROR - stderr - 16%|█▋ | 3648/22434 [2:33:18<13:17:53, 2.55s/it] +2025-02-05 12:41:01 - ERROR - stderr - 16%|█▋ | 3649/22434 [2:33:21<13:21:56, 2.56s/it] +2025-02-05 12:41:01 - ERROR - stderr - +2025-02-05 12:41:01 - ERROR - stderr - +2025-02-05 12:41:01 - INFO - stdout - {'loss': 0.9849, 'grad_norm': 1.1819772720336914, 'learning_rate': 1.9091679830905225e-05, 'epoch': 0.49} +2025-02-05 12:41:01 - ERROR - stderr - 16%|█▋ | 3649/22434 [2:33:21<13:21:56, 2.56s/it] +2025-02-05 12:41:04 - ERROR - stderr - 16%|█▋ | 3650/22434 [2:33:23<13:16:24, 2.54s/it] +2025-02-05 12:41:04 - ERROR - stderr - +2025-02-05 12:41:04 - ERROR - stderr - +2025-02-05 12:41:04 - INFO - stdout - {'loss': 0.9631, 'grad_norm': 1.2240902185440063, 'learning_rate': 1.909107851738215e-05, 'epoch': 0.49} +2025-02-05 12:41:04 - ERROR - stderr - 16%|█▋ | 3650/22434 [2:33:24<13:16:24, 2.54s/it] +2025-02-05 12:41:06 - ERROR - stderr - 16%|█▋ | 3651/22434 [2:33:26<13:22:15, 2.56s/it] +2025-02-05 12:41:06 - ERROR - stderr - +2025-02-05 12:41:06 - ERROR - stderr - +2025-02-05 12:41:06 - INFO - stdout - {'loss': 0.9902, 'grad_norm': 1.084999680519104, 'learning_rate': 1.9090477014364242e-05, 'epoch': 0.49} +2025-02-05 12:41:06 - ERROR - stderr - 16%|█▋ | 3651/22434 [2:33:26<13:22:15, 2.56s/it] +2025-02-05 12:41:09 - ERROR - stderr - 16%|█▋ | 3652/22434 [2:33:29<13:18:42, 2.55s/it] +2025-02-05 12:41:09 - ERROR - stderr - +2025-02-05 12:41:09 - ERROR - stderr - +2025-02-05 12:41:09 - INFO - stdout - {'loss': 0.9614, 'grad_norm': 1.1480823755264282, 'learning_rate': 1.9089875321864043e-05, 
'epoch': 0.49} +2025-02-05 12:41:09 - ERROR - stderr - 16%|█▋ | 3652/22434 [2:33:29<13:18:42, 2.55s/it] +2025-02-05 12:41:11 - ERROR - stderr - 16%|█▋ | 3653/22434 [2:33:31<13:10:24, 2.53s/it] +2025-02-05 12:41:11 - ERROR - stderr - +2025-02-05 12:41:11 - ERROR - stderr - +2025-02-05 12:41:11 - INFO - stdout - {'loss': 0.9406, 'grad_norm': 1.0436360836029053, 'learning_rate': 1.908927343989409e-05, 'epoch': 0.49} +2025-02-05 12:41:11 - ERROR - stderr - 16%|█▋ | 3653/22434 [2:33:31<13:10:24, 2.53s/it] +2025-02-05 12:41:14 - ERROR - stderr - 16%|█▋ | 3654/22434 [2:33:34<13:00:27, 2.49s/it] +2025-02-05 12:41:14 - ERROR - stderr - +2025-02-05 12:41:14 - ERROR - stderr - +2025-02-05 12:41:14 - INFO - stdout - {'loss': 0.9442, 'grad_norm': 1.0209296941757202, 'learning_rate': 1.9088671368466928e-05, 'epoch': 0.49} +2025-02-05 12:41:14 - ERROR - stderr - 16%|█▋ | 3654/22434 [2:33:34<13:00:27, 2.49s/it] +2025-02-05 12:41:16 - ERROR - stderr - 16%|█▋ | 3655/22434 [2:33:36<13:01:00, 2.50s/it] +2025-02-05 12:41:16 - ERROR - stderr - +2025-02-05 12:41:16 - ERROR - stderr - +2025-02-05 12:41:16 - INFO - stdout - {'loss': 1.0133, 'grad_norm': 1.1864526271820068, 'learning_rate': 1.9088069107595105e-05, 'epoch': 0.49} +2025-02-05 12:41:16 - ERROR - stderr - 16%|█▋ | 3655/22434 [2:33:36<13:01:00, 2.50s/it] +2025-02-05 12:41:19 - ERROR - stderr - 16%|█▋ | 3656/22434 [2:33:38<12:59:59, 2.49s/it] +2025-02-05 12:41:19 - ERROR - stderr - +2025-02-05 12:41:19 - ERROR - stderr - +2025-02-05 12:41:19 - INFO - stdout - {'loss': 0.9762, 'grad_norm': 1.2133468389511108, 'learning_rate': 1.908746665729118e-05, 'epoch': 0.49} +2025-02-05 12:41:19 - ERROR - stderr - 16%|█▋ | 3656/22434 [2:33:39<12:59:59, 2.49s/it] +2025-02-05 12:41:21 - ERROR - stderr - 16%|█▋ | 3657/22434 [2:33:41<12:58:05, 2.49s/it] +2025-02-05 12:41:21 - ERROR - stderr - +2025-02-05 12:41:21 - ERROR - stderr - +2025-02-05 12:41:21 - INFO - stdout - {'loss': 1.0346, 'grad_norm': 1.1199297904968262, 'learning_rate': 
1.908686401756771e-05, 'epoch': 0.49} +2025-02-05 12:41:21 - ERROR - stderr - 16%|█▋ | 3657/22434 [2:33:41<12:58:05, 2.49s/it] +2025-02-05 12:41:24 - ERROR - stderr - 16%|█▋ | 3658/22434 [2:33:43<12:54:01, 2.47s/it] +2025-02-05 12:41:24 - ERROR - stderr - +2025-02-05 12:41:24 - ERROR - stderr - +2025-02-05 12:41:24 - INFO - stdout - {'loss': 0.9842, 'grad_norm': 1.1451926231384277, 'learning_rate': 1.9086261188437255e-05, 'epoch': 0.49} +2025-02-05 12:41:24 - ERROR - stderr - 16%|█▋ | 3658/22434 [2:33:43<12:54:01, 2.47s/it] +2025-02-05 12:41:26 - ERROR - stderr - 16%|█▋ | 3659/22434 [2:33:46<12:54:56, 2.48s/it] +2025-02-05 12:41:26 - ERROR - stderr - +2025-02-05 12:41:26 - ERROR - stderr - +2025-02-05 12:41:26 - INFO - stdout - {'loss': 0.9413, 'grad_norm': 1.033084511756897, 'learning_rate': 1.908565816991238e-05, 'epoch': 0.49} +2025-02-05 12:41:26 - ERROR - stderr - 16%|█▋ | 3659/22434 [2:33:46<12:54:56, 2.48s/it] +2025-02-05 12:41:29 - ERROR - stderr - 16%|█▋ | 3660/22434 [2:33:48<13:02:28, 2.50s/it] +2025-02-05 12:41:29 - ERROR - stderr - +2025-02-05 12:41:29 - ERROR - stderr - +2025-02-05 12:41:29 - INFO - stdout - {'loss': 0.9432, 'grad_norm': 1.0193250179290771, 'learning_rate': 1.908505496200565e-05, 'epoch': 0.49} +2025-02-05 12:41:29 - ERROR - stderr - 16%|█▋ | 3660/22434 [2:33:48<13:02:28, 2.50s/it] +2025-02-05 12:41:31 - ERROR - stderr - 16%|█▋ | 3661/22434 [2:33:51<13:20:34, 2.56s/it] +2025-02-05 12:41:31 - ERROR - stderr - +2025-02-05 12:41:31 - ERROR - stderr - +2025-02-05 12:41:31 - INFO - stdout - {'loss': 0.9584, 'grad_norm': 1.1539981365203857, 'learning_rate': 1.908445156472965e-05, 'epoch': 0.49} +2025-02-05 12:41:31 - ERROR - stderr - 16%|█▋ | 3661/22434 [2:33:51<13:20:34, 2.56s/it] +2025-02-05 12:41:34 - ERROR - stderr - 16%|█▋ | 3662/22434 [2:33:54<13:12:30, 2.53s/it] +2025-02-05 12:41:34 - ERROR - stderr - +2025-02-05 12:41:34 - ERROR - stderr - +2025-02-05 12:41:34 - INFO - stdout - {'loss': 1.0218, 'grad_norm': 1.114689826965332, 
'learning_rate': 1.9083847978096944e-05, 'epoch': 0.49} +2025-02-05 12:41:34 - ERROR - stderr - 16%|█▋ | 3662/22434 [2:33:54<13:12:30, 2.53s/it] +2025-02-05 12:41:36 - ERROR - stderr - 16%|█▋ | 3663/22434 [2:33:56<13:07:04, 2.52s/it] +2025-02-05 12:41:36 - ERROR - stderr - +2025-02-05 12:41:36 - ERROR - stderr - +2025-02-05 12:41:36 - INFO - stdout - {'loss': 0.9672, 'grad_norm': 1.058713436126709, 'learning_rate': 1.9083244202120124e-05, 'epoch': 0.49} +2025-02-05 12:41:36 - ERROR - stderr - 16%|█▋ | 3663/22434 [2:33:56<13:07:04, 2.52s/it] +2025-02-05 12:41:39 - ERROR - stderr - 16%|█▋ | 3664/22434 [2:33:59<13:04:13, 2.51s/it] +2025-02-05 12:41:39 - ERROR - stderr - +2025-02-05 12:41:39 - ERROR - stderr - +2025-02-05 12:41:39 - INFO - stdout - {'loss': 0.9935, 'grad_norm': 1.150854468345642, 'learning_rate': 1.9082640236811766e-05, 'epoch': 0.49} +2025-02-05 12:41:39 - ERROR - stderr - 16%|█▋ | 3664/22434 [2:33:59<13:04:13, 2.51s/it] +2025-02-05 12:41:41 - ERROR - stderr - 16%|█▋ | 3665/22434 [2:34:01<13:02:42, 2.50s/it] +2025-02-05 12:41:41 - ERROR - stderr - +2025-02-05 12:41:41 - ERROR - stderr - +2025-02-05 12:41:41 - INFO - stdout - {'loss': 0.9814, 'grad_norm': 1.0963890552520752, 'learning_rate': 1.9082036082184466e-05, 'epoch': 0.49} +2025-02-05 12:41:41 - ERROR - stderr - 16%|█▋ | 3665/22434 [2:34:01<13:02:42, 2.50s/it] +2025-02-05 12:41:44 - ERROR - stderr - 16%|█▋ | 3666/22434 [2:34:04<13:02:10, 2.50s/it] +2025-02-05 12:41:44 - ERROR - stderr - +2025-02-05 12:41:44 - ERROR - stderr - +2025-02-05 12:41:44 - INFO - stdout - {'loss': 0.8903, 'grad_norm': 1.1178874969482422, 'learning_rate': 1.9081431738250815e-05, 'epoch': 0.49} +2025-02-05 12:41:44 - ERROR - stderr - 16%|█▋ | 3666/22434 [2:34:04<13:02:10, 2.50s/it] +2025-02-05 12:41:46 - ERROR - stderr - 16%|█▋ | 3667/22434 [2:34:06<13:08:25, 2.52s/it] +2025-02-05 12:41:46 - ERROR - stderr - +2025-02-05 12:41:46 - ERROR - stderr - +2025-02-05 12:41:46 - INFO - stdout - {'loss': 1.0131, 'grad_norm': 
1.0414948463439941, 'learning_rate': 1.908082720502341e-05, 'epoch': 0.49} +2025-02-05 12:41:46 - ERROR - stderr - 16%|█▋ | 3667/22434 [2:34:06<13:08:25, 2.52s/it] +2025-02-05 12:41:49 - ERROR - stderr - 16%|█▋ | 3668/22434 [2:34:09<13:04:22, 2.51s/it] +2025-02-05 12:41:49 - ERROR - stderr - +2025-02-05 12:41:49 - ERROR - stderr - +2025-02-05 12:41:49 - INFO - stdout - {'loss': 0.9576, 'grad_norm': 1.0815478563308716, 'learning_rate': 1.9080222482514847e-05, 'epoch': 0.49} +2025-02-05 12:41:49 - ERROR - stderr - 16%|█▋ | 3668/22434 [2:34:09<13:04:22, 2.51s/it] +2025-02-05 12:41:51 - ERROR - stderr - 16%|█▋ | 3669/22434 [2:34:11<13:00:05, 2.49s/it] +2025-02-05 12:41:51 - ERROR - stderr - +2025-02-05 12:41:51 - ERROR - stderr - +2025-02-05 12:41:51 - INFO - stdout - {'loss': 0.9399, 'grad_norm': 1.0705641508102417, 'learning_rate': 1.9079617570737738e-05, 'epoch': 0.49} +2025-02-05 12:41:51 - ERROR - stderr - 16%|█▋ | 3669/22434 [2:34:11<13:00:05, 2.49s/it] +2025-02-05 12:41:54 - ERROR - stderr - 16%|█▋ | 3670/22434 [2:34:14<13:01:05, 2.50s/it] +2025-02-05 12:41:54 - ERROR - stderr - +2025-02-05 12:41:54 - ERROR - stderr - +2025-02-05 12:41:54 - INFO - stdout - {'loss': 0.9902, 'grad_norm': 1.0730514526367188, 'learning_rate': 1.907901246970469e-05, 'epoch': 0.49} +2025-02-05 12:41:54 - ERROR - stderr - 16%|█▋ | 3670/22434 [2:34:14<13:01:05, 2.50s/it] +2025-02-05 12:41:56 - ERROR - stderr - 16%|█▋ | 3671/22434 [2:34:16<13:03:42, 2.51s/it] +2025-02-05 12:41:56 - ERROR - stderr - +2025-02-05 12:41:56 - ERROR - stderr - +2025-02-05 12:41:56 - INFO - stdout - {'loss': 1.1435, 'grad_norm': 1.1241930723190308, 'learning_rate': 1.9078407179428313e-05, 'epoch': 0.49} +2025-02-05 12:41:56 - ERROR - stderr - 16%|█▋ | 3671/22434 [2:34:16<13:03:42, 2.51s/it] +2025-02-05 12:41:59 - ERROR - stderr - 16%|█▋ | 3672/22434 [2:34:19<12:58:55, 2.49s/it] +2025-02-05 12:41:59 - ERROR - stderr - +2025-02-05 12:41:59 - ERROR - stderr - +2025-02-05 12:41:59 - INFO - stdout - {'loss': 0.9647, 
'grad_norm': 1.034538745880127, 'learning_rate': 1.9077801699921225e-05, 'epoch': 0.49} +2025-02-05 12:41:59 - ERROR - stderr - 16%|█▋ | 3672/22434 [2:34:19<12:58:55, 2.49s/it] +2025-02-05 12:42:01 - ERROR - stderr - 16%|█▋ | 3673/22434 [2:34:21<13:06:28, 2.52s/it] +2025-02-05 12:42:01 - ERROR - stderr - +2025-02-05 12:42:01 - ERROR - stderr - +2025-02-05 12:42:01 - INFO - stdout - {'loss': 0.9686, 'grad_norm': 1.032455563545227, 'learning_rate': 1.9077196031196047e-05, 'epoch': 0.49} +2025-02-05 12:42:01 - ERROR - stderr - 16%|█▋ | 3673/22434 [2:34:21<13:06:28, 2.52s/it] +2025-02-05 12:42:04 - ERROR - stderr - 16%|█▋ | 3674/22434 [2:34:24<13:19:18, 2.56s/it] +2025-02-05 12:42:04 - ERROR - stderr - +2025-02-05 12:42:04 - ERROR - stderr - +2025-02-05 12:42:04 - INFO - stdout - {'loss': 1.0638, 'grad_norm': 1.2188570499420166, 'learning_rate': 1.9076590173265406e-05, 'epoch': 0.49} +2025-02-05 12:42:04 - ERROR - stderr - 16%|█▋ | 3674/22434 [2:34:24<13:19:18, 2.56s/it] +2025-02-05 12:42:07 - ERROR - stderr - 16%|█▋ | 3675/22434 [2:34:26<13:13:58, 2.54s/it] +2025-02-05 12:42:07 - ERROR - stderr - +2025-02-05 12:42:07 - ERROR - stderr - +2025-02-05 12:42:07 - INFO - stdout - {'loss': 0.9833, 'grad_norm': 1.1617692708969116, 'learning_rate': 1.9075984126141927e-05, 'epoch': 0.49} +2025-02-05 12:42:07 - ERROR - stderr - 16%|█▋ | 3675/22434 [2:34:26<13:13:58, 2.54s/it] +2025-02-05 12:42:09 - ERROR - stderr - 16%|█▋ | 3676/22434 [2:34:29<13:06:01, 2.51s/it] +2025-02-05 12:42:09 - ERROR - stderr - +2025-02-05 12:42:09 - ERROR - stderr - +2025-02-05 12:42:09 - INFO - stdout - {'loss': 1.1119, 'grad_norm': 1.257301688194275, 'learning_rate': 1.9075377889838243e-05, 'epoch': 0.49} +2025-02-05 12:42:09 - ERROR - stderr - 16%|█▋ | 3676/22434 [2:34:29<13:06:01, 2.51s/it] +2025-02-05 12:42:12 - ERROR - stderr - 16%|█▋ | 3677/22434 [2:34:31<13:10:03, 2.53s/it] +2025-02-05 12:42:12 - ERROR - stderr - +2025-02-05 12:42:12 - ERROR - stderr - +2025-02-05 12:42:12 - INFO - stdout - 
{'loss': 0.9529, 'grad_norm': 1.0904678106307983, 'learning_rate': 1.907477146436699e-05, 'epoch': 0.49} +2025-02-05 12:42:12 - ERROR - stderr - 16%|█▋ | 3677/22434 [2:34:31<13:10:03, 2.53s/it] +2025-02-05 12:42:14 - ERROR - stderr - 16%|█▋ | 3678/22434 [2:34:34<13:37:41, 2.62s/it] +2025-02-05 12:42:14 - ERROR - stderr - +2025-02-05 12:42:14 - ERROR - stderr - +2025-02-05 12:42:14 - INFO - stdout - {'loss': 0.9508, 'grad_norm': 1.039637804031372, 'learning_rate': 1.9074164849740813e-05, 'epoch': 0.49} +2025-02-05 12:42:14 - ERROR - stderr - 16%|█▋ | 3678/22434 [2:34:34<13:37:41, 2.62s/it] +2025-02-05 12:42:17 - ERROR - stderr - 16%|█▋ | 3679/22434 [2:34:37<13:28:36, 2.59s/it] +2025-02-05 12:42:17 - ERROR - stderr - +2025-02-05 12:42:17 - ERROR - stderr - +2025-02-05 12:42:17 - INFO - stdout - {'loss': 0.9997, 'grad_norm': 1.1693003177642822, 'learning_rate': 1.9073558045972352e-05, 'epoch': 0.49} +2025-02-05 12:42:17 - ERROR - stderr - 16%|█▋ | 3679/22434 [2:34:37<13:28:36, 2.59s/it] +2025-02-05 12:42:19 - ERROR - stderr - 16%|█▋ | 3680/22434 [2:34:39<13:16:56, 2.55s/it] +2025-02-05 12:42:19 - ERROR - stderr - +2025-02-05 12:42:19 - ERROR - stderr - +2025-02-05 12:42:19 - INFO - stdout - {'loss': 0.9573, 'grad_norm': 1.0231982469558716, 'learning_rate': 1.9072951053074252e-05, 'epoch': 0.49} +2025-02-05 12:42:19 - ERROR - stderr - 16%|█▋ | 3680/22434 [2:34:39<13:16:56, 2.55s/it] +2025-02-05 12:42:22 - ERROR - stderr - 16%|█▋ | 3681/22434 [2:34:42<13:16:48, 2.55s/it] +2025-02-05 12:42:22 - ERROR - stderr - +2025-02-05 12:42:22 - ERROR - stderr - +2025-02-05 12:42:22 - INFO - stdout - {'loss': 1.078, 'grad_norm': 1.1048824787139893, 'learning_rate': 1.907234387105917e-05, 'epoch': 0.49} +2025-02-05 12:42:22 - ERROR - stderr - 16%|█▋ | 3681/22434 [2:34:42<13:16:48, 2.55s/it] +2025-02-05 12:42:24 - ERROR - stderr - 16%|█▋ | 3682/22434 [2:34:44<13:10:10, 2.53s/it] +2025-02-05 12:42:24 - ERROR - stderr - +2025-02-05 12:42:24 - ERROR - stderr - +2025-02-05 12:42:24 - INFO 
- stdout - {'loss': 0.9978, 'grad_norm': 1.1571928262710571, 'learning_rate': 1.9071736499939765e-05, 'epoch': 0.49} +2025-02-05 12:42:24 - ERROR - stderr - 16%|█▋ | 3682/22434 [2:34:44<13:10:10, 2.53s/it] +2025-02-05 12:42:27 - ERROR - stderr - 16%|█▋ | 3683/22434 [2:34:47<13:04:39, 2.51s/it] +2025-02-05 12:42:27 - ERROR - stderr - +2025-02-05 12:42:27 - ERROR - stderr - +2025-02-05 12:42:27 - INFO - stdout - {'loss': 1.0189, 'grad_norm': 1.0723716020584106, 'learning_rate': 1.9071128939728693e-05, 'epoch': 0.49} +2025-02-05 12:42:27 - ERROR - stderr - 16%|█▋ | 3683/22434 [2:34:47<13:04:39, 2.51s/it] +2025-02-05 12:42:29 - ERROR - stderr - 16%|█▋ | 3684/22434 [2:34:49<13:12:16, 2.54s/it] +2025-02-05 12:42:29 - ERROR - stderr - +2025-02-05 12:42:29 - ERROR - stderr - +2025-02-05 12:42:29 - INFO - stdout - {'loss': 0.886, 'grad_norm': 0.9494854211807251, 'learning_rate': 1.9070521190438618e-05, 'epoch': 0.49} +2025-02-05 12:42:29 - ERROR - stderr - 16%|█▋ | 3684/22434 [2:34:49<13:12:16, 2.54s/it] +2025-02-05 12:42:32 - ERROR - stderr - 16%|█▋ | 3685/22434 [2:34:52<13:05:56, 2.52s/it] +2025-02-05 12:42:32 - ERROR - stderr - +2025-02-05 12:42:32 - ERROR - stderr - +2025-02-05 12:42:32 - INFO - stdout - {'loss': 0.9689, 'grad_norm': 1.0872453451156616, 'learning_rate': 1.9069913252082207e-05, 'epoch': 0.49} +2025-02-05 12:42:32 - ERROR - stderr - 16%|█▋ | 3685/22434 [2:34:52<13:05:56, 2.52s/it] +2025-02-05 12:42:34 - ERROR - stderr - 16%|█▋ | 3686/22434 [2:34:54<13:02:51, 2.51s/it] +2025-02-05 12:42:34 - ERROR - stderr - +2025-02-05 12:42:34 - ERROR - stderr - +2025-02-05 12:42:34 - INFO - stdout - {'loss': 0.9871, 'grad_norm': 1.2586435079574585, 'learning_rate': 1.9069305124672134e-05, 'epoch': 0.49} +2025-02-05 12:42:34 - ERROR - stderr - 16%|█▋ | 3686/22434 [2:34:54<13:02:51, 2.51s/it] +2025-02-05 12:42:37 - ERROR - stderr - 16%|█▋ | 3687/22434 [2:34:57<13:37:32, 2.62s/it] +2025-02-05 12:42:37 - ERROR - stderr - +2025-02-05 12:42:37 - ERROR - stderr - +2025-02-05 
12:42:37 - INFO - stdout - {'loss': 0.9015, 'grad_norm': 1.0400748252868652, 'learning_rate': 1.9068696808221073e-05, 'epoch': 0.49} +2025-02-05 12:42:37 - ERROR - stderr - 16%|█▋ | 3687/22434 [2:34:57<13:37:32, 2.62s/it] +2025-02-05 12:42:40 - ERROR - stderr - 16%|█▋ | 3688/22434 [2:35:00<13:25:58, 2.58s/it] +2025-02-05 12:42:40 - ERROR - stderr - +2025-02-05 12:42:40 - ERROR - stderr - +2025-02-05 12:42:40 - INFO - stdout - {'loss': 1.0109, 'grad_norm': 1.0710338354110718, 'learning_rate': 1.9068088302741703e-05, 'epoch': 0.49} +2025-02-05 12:42:40 - ERROR - stderr - 16%|█▋ | 3688/22434 [2:35:00<13:25:58, 2.58s/it] +2025-02-05 12:42:42 - ERROR - stderr - 16%|█▋ | 3689/22434 [2:35:02<13:16:48, 2.55s/it] +2025-02-05 12:42:42 - ERROR - stderr - +2025-02-05 12:42:42 - ERROR - stderr - +2025-02-05 12:42:42 - INFO - stdout - {'loss': 1.0345, 'grad_norm': 1.1903811693191528, 'learning_rate': 1.906747960824671e-05, 'epoch': 0.49} +2025-02-05 12:42:42 - ERROR - stderr - 16%|█▋ | 3689/22434 [2:35:02<13:16:48, 2.55s/it] +2025-02-05 12:42:45 - ERROR - stderr - 16%|█▋ | 3690/22434 [2:35:04<13:10:40, 2.53s/it] +2025-02-05 12:42:45 - ERROR - stderr - +2025-02-05 12:42:45 - ERROR - stderr - +2025-02-05 12:42:45 - INFO - stdout - {'loss': 1.0551, 'grad_norm': 1.2121220827102661, 'learning_rate': 1.9066870724748786e-05, 'epoch': 0.49} +2025-02-05 12:42:45 - ERROR - stderr - 16%|█▋ | 3690/22434 [2:35:05<13:10:40, 2.53s/it] +2025-02-05 12:42:47 - ERROR - stderr - 16%|█▋ | 3691/22434 [2:35:07<13:02:02, 2.50s/it] +2025-02-05 12:42:47 - ERROR - stderr - +2025-02-05 12:42:47 - ERROR - stderr - +2025-02-05 12:42:47 - INFO - stdout - {'loss': 0.9148, 'grad_norm': 0.9684620499610901, 'learning_rate': 1.9066261652260615e-05, 'epoch': 0.49} +2025-02-05 12:42:47 - ERROR - stderr - 16%|█▋ | 3691/22434 [2:35:07<13:02:02, 2.50s/it] +2025-02-05 12:42:50 - ERROR - stderr - 16%|█▋ | 3692/22434 [2:35:09<12:56:51, 2.49s/it] +2025-02-05 12:42:50 - ERROR - stderr - +2025-02-05 12:42:50 - ERROR - stderr 
- +2025-02-05 12:42:50 - INFO - stdout - {'loss': 1.0012, 'grad_norm': 1.090959906578064, 'learning_rate': 1.9065652390794894e-05, 'epoch': 0.49} +2025-02-05 12:42:50 - ERROR - stderr - 16%|█▋ | 3692/22434 [2:35:09<12:56:51, 2.49s/it] +2025-02-05 12:42:52 - ERROR - stderr - 16%|█▋ | 3693/22434 [2:35:12<13:03:20, 2.51s/it] +2025-02-05 12:42:52 - ERROR - stderr - +2025-02-05 12:42:52 - ERROR - stderr - +2025-02-05 12:42:52 - INFO - stdout - {'loss': 0.9869, 'grad_norm': 1.0787307024002075, 'learning_rate': 1.9065042940364326e-05, 'epoch': 0.49} +2025-02-05 12:42:52 - ERROR - stderr - 16%|█▋ | 3693/22434 [2:35:12<13:03:20, 2.51s/it] +2025-02-05 12:42:55 - ERROR - stderr - 16%|█▋ | 3694/22434 [2:35:14<13:01:10, 2.50s/it] +2025-02-05 12:42:55 - ERROR - stderr - +2025-02-05 12:42:55 - ERROR - stderr - +2025-02-05 12:42:55 - INFO - stdout - {'loss': 0.8817, 'grad_norm': 1.052585482597351, 'learning_rate': 1.906443330098161e-05, 'epoch': 0.49} +2025-02-05 12:42:55 - ERROR - stderr - 16%|█▋ | 3694/22434 [2:35:14<13:01:10, 2.50s/it] +2025-02-05 12:42:57 - ERROR - stderr - 16%|█▋ | 3695/22434 [2:35:17<13:00:55, 2.50s/it] +2025-02-05 12:42:57 - ERROR - stderr - +2025-02-05 12:42:57 - ERROR - stderr - +2025-02-05 12:42:57 - INFO - stdout - {'loss': 1.0274, 'grad_norm': 1.048724889755249, 'learning_rate': 1.9063823472659457e-05, 'epoch': 0.49} +2025-02-05 12:42:57 - ERROR - stderr - 16%|█▋ | 3695/22434 [2:35:17<13:00:55, 2.50s/it] +2025-02-05 12:43:00 - ERROR - stderr - 16%|█▋ | 3696/22434 [2:35:19<13:04:19, 2.51s/it] +2025-02-05 12:43:00 - ERROR - stderr - +2025-02-05 12:43:00 - ERROR - stderr - +2025-02-05 12:43:00 - INFO - stdout - {'loss': 0.9794, 'grad_norm': 1.2308967113494873, 'learning_rate': 1.9063213455410577e-05, 'epoch': 0.49} +2025-02-05 12:43:00 - ERROR - stderr - 16%|█▋ | 3696/22434 [2:35:19<13:04:19, 2.51s/it] +2025-02-05 12:43:02 - ERROR - stderr - 16%|█▋ | 3697/22434 [2:35:22<13:20:24, 2.56s/it] +2025-02-05 12:43:02 - ERROR - stderr - +2025-02-05 12:43:02 - 
ERROR - stderr - +2025-02-05 12:43:02 - INFO - stdout - {'loss': 0.8997, 'grad_norm': 1.2070680856704712, 'learning_rate': 1.9062603249247686e-05, 'epoch': 0.49} +2025-02-05 12:43:02 - ERROR - stderr - 16%|█▋ | 3697/22434 [2:35:22<13:20:24, 2.56s/it] +2025-02-05 12:43:05 - ERROR - stderr - 16%|█▋ | 3698/22434 [2:35:25<13:15:39, 2.55s/it] +2025-02-05 12:43:05 - ERROR - stderr - +2025-02-05 12:43:05 - ERROR - stderr - +2025-02-05 12:43:05 - INFO - stdout - {'loss': 0.9202, 'grad_norm': 1.032382845878601, 'learning_rate': 1.90619928541835e-05, 'epoch': 0.49} +2025-02-05 12:43:05 - ERROR - stderr - 16%|█▋ | 3698/22434 [2:35:25<13:15:39, 2.55s/it] +2025-02-05 12:43:07 - ERROR - stderr - 16%|█▋ | 3699/22434 [2:35:27<13:09:42, 2.53s/it] +2025-02-05 12:43:07 - ERROR - stderr - +2025-02-05 12:43:07 - ERROR - stderr - +2025-02-05 12:43:07 - INFO - stdout - {'loss': 1.0274, 'grad_norm': 1.2193373441696167, 'learning_rate': 1.9061382270230745e-05, 'epoch': 0.49} +2025-02-05 12:43:07 - ERROR - stderr - 16%|█▋ | 3699/22434 [2:35:27<13:09:42, 2.53s/it] +2025-02-05 12:43:10 - ERROR - stderr - 16%|█▋ | 3700/22434 [2:35:30<13:03:25, 2.51s/it] +2025-02-05 12:43:10 - ERROR - stderr - +2025-02-05 12:43:10 - ERROR - stderr - +2025-02-05 12:43:10 - INFO - stdout - {'loss': 1.1391, 'grad_norm': 1.241326928138733, 'learning_rate': 1.9060771497402147e-05, 'epoch': 0.49} +2025-02-05 12:43:10 - ERROR - stderr - 16%|█▋ | 3700/22434 [2:35:30<13:03:25, 2.51s/it] +2025-02-05 12:43:12 - ERROR - stderr - 16%|█▋ | 3701/22434 [2:35:32<13:01:29, 2.50s/it] +2025-02-05 12:43:12 - ERROR - stderr - +2025-02-05 12:43:12 - ERROR - stderr - +2025-02-05 12:43:12 - INFO - stdout - {'loss': 0.94, 'grad_norm': 1.0512620210647583, 'learning_rate': 1.9060160535710438e-05, 'epoch': 0.49} +2025-02-05 12:43:12 - ERROR - stderr - 16%|█▋ | 3701/22434 [2:35:32<13:01:29, 2.50s/it] +2025-02-05 12:43:15 - ERROR - stderr - 17%|█▋ | 3702/22434 [2:35:35<13:06:01, 2.52s/it] +2025-02-05 12:43:15 - ERROR - stderr - +2025-02-05 
12:43:15 - ERROR - stderr - +2025-02-05 12:43:15 - INFO - stdout - {'loss': 0.9942, 'grad_norm': 1.3783785104751587, 'learning_rate': 1.9059549385168355e-05, 'epoch': 0.5} +2025-02-05 12:43:15 - ERROR - stderr - 17%|█▋ | 3702/22434 [2:35:35<13:06:01, 2.52s/it] +2025-02-05 12:43:17 - ERROR - stderr - 17%|█▋ | 3703/22434 [2:35:37<13:07:58, 2.52s/it] +2025-02-05 12:43:17 - ERROR - stderr - +2025-02-05 12:43:17 - ERROR - stderr - +2025-02-05 12:43:17 - INFO - stdout - {'loss': 0.8717, 'grad_norm': 1.0376447439193726, 'learning_rate': 1.905893804578863e-05, 'epoch': 0.5} +2025-02-05 12:43:17 - ERROR - stderr - 17%|█▋ | 3703/22434 [2:35:37<13:07:58, 2.52s/it] +2025-02-05 12:43:20 - ERROR - stderr - 17%|█▋ | 3704/22434 [2:35:40<13:05:52, 2.52s/it] +2025-02-05 12:43:20 - ERROR - stderr - +2025-02-05 12:43:20 - ERROR - stderr - +2025-02-05 12:43:20 - INFO - stdout - {'loss': 1.0457, 'grad_norm': 1.1338492631912231, 'learning_rate': 1.9058326517584014e-05, 'epoch': 0.5} +2025-02-05 12:43:20 - ERROR - stderr - 17%|█▋ | 3704/22434 [2:35:40<13:05:52, 2.52s/it] +2025-02-05 12:43:22 - ERROR - stderr - 17%|█▋ | 3705/22434 [2:35:42<13:02:30, 2.51s/it] +2025-02-05 12:43:22 - ERROR - stderr - +2025-02-05 12:43:22 - ERROR - stderr - +2025-02-05 12:43:22 - INFO - stdout - {'loss': 0.9726, 'grad_norm': 1.1192903518676758, 'learning_rate': 1.9057714800567244e-05, 'epoch': 0.5} +2025-02-05 12:43:22 - ERROR - stderr - 17%|█▋ | 3705/22434 [2:35:42<13:02:30, 2.51s/it] +2025-02-05 12:43:25 - ERROR - stderr - 17%|█▋ | 3706/22434 [2:35:45<13:02:15, 2.51s/it] +2025-02-05 12:43:25 - ERROR - stderr - +2025-02-05 12:43:25 - ERROR - stderr - +2025-02-05 12:43:25 - INFO - stdout - {'loss': 0.9052, 'grad_norm': 1.0879130363464355, 'learning_rate': 1.905710289475108e-05, 'epoch': 0.5} +2025-02-05 12:43:25 - ERROR - stderr - 17%|█▋ | 3706/22434 [2:35:45<13:02:15, 2.51s/it] +2025-02-05 12:43:28 - ERROR - stderr - 17%|█▋ | 3707/22434 [2:35:47<13:21:07, 2.57s/it] +2025-02-05 12:43:28 - ERROR - stderr - 
+2025-02-05 12:43:28 - ERROR - stderr - +2025-02-05 12:43:28 - INFO - stdout - {'loss': 0.9178, 'grad_norm': 1.03330397605896, 'learning_rate': 1.9056490800148273e-05, 'epoch': 0.5} +2025-02-05 12:43:28 - ERROR - stderr - 17%|█▋ | 3707/22434 [2:35:47<13:21:07, 2.57s/it] +2025-02-05 12:43:30 - ERROR - stderr - 17%|█▋ | 3708/22434 [2:35:50<13:18:43, 2.56s/it] +2025-02-05 12:43:30 - ERROR - stderr - +2025-02-05 12:43:30 - ERROR - stderr - +2025-02-05 12:43:30 - INFO - stdout - {'loss': 1.0012, 'grad_norm': 1.083507776260376, 'learning_rate': 1.905587851677158e-05, 'epoch': 0.5} +2025-02-05 12:43:30 - ERROR - stderr - 17%|█▋ | 3708/22434 [2:35:50<13:18:43, 2.56s/it] +2025-02-05 12:43:33 - ERROR - stderr - 17%|█▋ | 3709/22434 [2:35:52<13:14:22, 2.55s/it] +2025-02-05 12:43:33 - ERROR - stderr - +2025-02-05 12:43:33 - ERROR - stderr - +2025-02-05 12:43:33 - INFO - stdout - {'loss': 0.8887, 'grad_norm': 1.0827792882919312, 'learning_rate': 1.9055266044633765e-05, 'epoch': 0.5} +2025-02-05 12:43:33 - ERROR - stderr - 17%|█▋ | 3709/22434 [2:35:52<13:14:22, 2.55s/it] +2025-02-05 12:43:35 - ERROR - stderr - 17%|█▋ | 3710/22434 [2:35:55<13:07:43, 2.52s/it] +2025-02-05 12:43:35 - ERROR - stderr - +2025-02-05 12:43:35 - ERROR - stderr - +2025-02-05 12:43:35 - INFO - stdout - {'loss': 1.1565, 'grad_norm': 1.1803441047668457, 'learning_rate': 1.9054653383747593e-05, 'epoch': 0.5} +2025-02-05 12:43:35 - ERROR - stderr - 17%|█▋ | 3710/22434 [2:35:55<13:07:43, 2.52s/it] +2025-02-05 12:43:38 - ERROR - stderr - 17%|█▋ | 3711/22434 [2:35:57<13:04:32, 2.51s/it] +2025-02-05 12:43:38 - ERROR - stderr - +2025-02-05 12:43:38 - ERROR - stderr - +2025-02-05 12:43:38 - INFO - stdout - {'loss': 1.0697, 'grad_norm': 1.2359166145324707, 'learning_rate': 1.905404053412584e-05, 'epoch': 0.5} +2025-02-05 12:43:38 - ERROR - stderr - 17%|█▋ | 3711/22434 [2:35:57<13:04:32, 2.51s/it] +2025-02-05 12:43:40 - ERROR - stderr - 17%|█▋ | 3712/22434 [2:36:00<13:07:59, 2.53s/it] +2025-02-05 12:43:40 - ERROR - 
stderr - +2025-02-05 12:43:40 - ERROR - stderr - +2025-02-05 12:43:40 - INFO - stdout - {'loss': 1.0273, 'grad_norm': 1.18559992313385, 'learning_rate': 1.9053427495781273e-05, 'epoch': 0.5} +2025-02-05 12:43:40 - ERROR - stderr - 17%|█▋ | 3712/22434 [2:36:00<13:07:59, 2.53s/it] +2025-02-05 12:43:43 - ERROR - stderr - 17%|█▋ | 3713/22434 [2:36:02<13:01:39, 2.51s/it] +2025-02-05 12:43:43 - ERROR - stderr - +2025-02-05 12:43:43 - ERROR - stderr - +2025-02-05 12:43:43 - INFO - stdout - {'loss': 1.0001, 'grad_norm': 1.1418718099594116, 'learning_rate': 1.905281426872667e-05, 'epoch': 0.5} +2025-02-05 12:43:43 - ERROR - stderr - 17%|█▋ | 3713/22434 [2:36:02<13:01:39, 2.51s/it] +2025-02-05 12:43:45 - ERROR - stderr - 17%|█▋ | 3714/22434 [2:36:05<13:01:14, 2.50s/it] +2025-02-05 12:43:45 - ERROR - stderr - +2025-02-05 12:43:45 - ERROR - stderr - +2025-02-05 12:43:45 - INFO - stdout - {'loss': 0.8729, 'grad_norm': 1.032114863395691, 'learning_rate': 1.905220085297482e-05, 'epoch': 0.5} +2025-02-05 12:43:45 - ERROR - stderr - 17%|█▋ | 3714/22434 [2:36:05<13:01:14, 2.50s/it] +2025-02-05 12:43:48 - ERROR - stderr - 17%|█▋ | 3715/22434 [2:36:07<12:58:06, 2.49s/it] +2025-02-05 12:43:48 - ERROR - stderr - +2025-02-05 12:43:48 - ERROR - stderr - +2025-02-05 12:43:48 - INFO - stdout - {'loss': 0.9755, 'grad_norm': 1.1375812292099, 'learning_rate': 1.9051587248538505e-05, 'epoch': 0.5} +2025-02-05 12:43:48 - ERROR - stderr - 17%|█▋ | 3715/22434 [2:36:07<12:58:06, 2.49s/it] +2025-02-05 12:43:50 - ERROR - stderr - 17%|█▋ | 3716/22434 [2:36:10<12:52:46, 2.48s/it] +2025-02-05 12:43:50 - ERROR - stderr - +2025-02-05 12:43:50 - ERROR - stderr - +2025-02-05 12:43:50 - INFO - stdout - {'loss': 0.949, 'grad_norm': 1.0829858779907227, 'learning_rate': 1.9050973455430517e-05, 'epoch': 0.5} +2025-02-05 12:43:50 - ERROR - stderr - 17%|█▋ | 3716/22434 [2:36:10<12:52:46, 2.48s/it] +2025-02-05 12:43:53 - ERROR - stderr - 17%|█▋ | 3717/22434 [2:36:12<12:54:03, 2.48s/it] +2025-02-05 12:43:53 - ERROR 
- stderr - +2025-02-05 12:43:53 - ERROR - stderr - +2025-02-05 12:43:53 - INFO - stdout - {'loss': 0.8596, 'grad_norm': 1.0053060054779053, 'learning_rate': 1.9050359473663644e-05, 'epoch': 0.5} +2025-02-05 12:43:53 - ERROR - stderr - 17%|█▋ | 3717/22434 [2:36:12<12:54:03, 2.48s/it] +2025-02-05 12:43:55 - ERROR - stderr - 17%|█▋ | 3718/22434 [2:36:15<12:53:01, 2.48s/it] +2025-02-05 12:43:55 - ERROR - stderr - +2025-02-05 12:43:55 - ERROR - stderr - +2025-02-05 12:43:55 - INFO - stdout - {'loss': 0.9247, 'grad_norm': 1.0411442518234253, 'learning_rate': 1.9049745303250692e-05, 'epoch': 0.5} +2025-02-05 12:43:55 - ERROR - stderr - 17%|█▋ | 3718/22434 [2:36:15<12:53:01, 2.48s/it] +2025-02-05 12:43:58 - ERROR - stderr - 17%|█▋ | 3719/22434 [2:36:17<13:03:57, 2.51s/it] +2025-02-05 12:43:58 - ERROR - stderr - +2025-02-05 12:43:58 - ERROR - stderr - +2025-02-05 12:43:58 - INFO - stdout - {'loss': 0.9255, 'grad_norm': 1.0137289762496948, 'learning_rate': 1.9049130944204454e-05, 'epoch': 0.5} +2025-02-05 12:43:58 - ERROR - stderr - 17%|█▋ | 3719/22434 [2:36:17<13:03:57, 2.51s/it] +2025-02-05 12:44:00 - ERROR - stderr - 17%|█▋ | 3720/22434 [2:36:20<13:09:55, 2.53s/it] +2025-02-05 12:44:00 - ERROR - stderr - +2025-02-05 12:44:00 - ERROR - stderr - +2025-02-05 12:44:00 - INFO - stdout - {'loss': 1.0447, 'grad_norm': 1.199967384338379, 'learning_rate': 1.9048516396537745e-05, 'epoch': 0.5} +2025-02-05 12:44:00 - ERROR - stderr - 17%|█▋ | 3720/22434 [2:36:20<13:09:55, 2.53s/it] +2025-02-05 12:44:03 - ERROR - stderr - 17%|█▋ | 3721/22434 [2:36:23<13:14:04, 2.55s/it] +2025-02-05 12:44:03 - ERROR - stderr - +2025-02-05 12:44:03 - ERROR - stderr - +2025-02-05 12:44:03 - INFO - stdout - {'loss': 1.0075, 'grad_norm': 1.1437036991119385, 'learning_rate': 1.9047901660263372e-05, 'epoch': 0.5} +2025-02-05 12:44:03 - ERROR - stderr - 17%|█▋ | 3721/22434 [2:36:23<13:14:04, 2.55s/it] +2025-02-05 12:44:05 - ERROR - stderr - 17%|█▋ | 3722/22434 [2:36:25<13:07:54, 2.53s/it] +2025-02-05 
12:44:05 - ERROR - stderr - +2025-02-05 12:44:05 - ERROR - stderr - +2025-02-05 12:44:05 - INFO - stdout - {'loss': 0.9919, 'grad_norm': 1.3072922229766846, 'learning_rate': 1.904728673539414e-05, 'epoch': 0.5} +2025-02-05 12:44:05 - ERROR - stderr - 17%|█▋ | 3722/22434 [2:36:25<13:07:54, 2.53s/it] +2025-02-05 12:44:08 - ERROR - stderr - 17%|█▋ | 3723/22434 [2:36:28<13:13:35, 2.54s/it] +2025-02-05 12:44:08 - ERROR - stderr - +2025-02-05 12:44:08 - ERROR - stderr - +2025-02-05 12:44:08 - INFO - stdout - {'loss': 1.0676, 'grad_norm': 1.213537335395813, 'learning_rate': 1.904667162194288e-05, 'epoch': 0.5} +2025-02-05 12:44:08 - ERROR - stderr - 17%|█▋ | 3723/22434 [2:36:28<13:13:35, 2.54s/it] +2025-02-05 12:44:10 - ERROR - stderr - 17%|█▋ | 3724/22434 [2:36:30<13:12:52, 2.54s/it] +2025-02-05 12:44:10 - ERROR - stderr - +2025-02-05 12:44:10 - ERROR - stderr - +2025-02-05 12:44:10 - INFO - stdout - {'loss': 0.9715, 'grad_norm': 1.12119460105896, 'learning_rate': 1.9046056319922403e-05, 'epoch': 0.5} +2025-02-05 12:44:10 - ERROR - stderr - 17%|█▋ | 3724/22434 [2:36:30<13:12:52, 2.54s/it] +2025-02-05 12:44:13 - ERROR - stderr - 17%|█▋ | 3725/22434 [2:36:33<13:14:00, 2.55s/it] +2025-02-05 12:44:13 - ERROR - stderr - +2025-02-05 12:44:13 - ERROR - stderr - +2025-02-05 12:44:13 - INFO - stdout - {'loss': 1.0197, 'grad_norm': 1.0706086158752441, 'learning_rate': 1.9045440829345536e-05, 'epoch': 0.5} +2025-02-05 12:44:13 - ERROR - stderr - 17%|█▋ | 3725/22434 [2:36:33<13:14:00, 2.55s/it] +2025-02-05 12:44:15 - ERROR - stderr - 17%|█▋ | 3726/22434 [2:36:35<13:17:20, 2.56s/it] +2025-02-05 12:44:16 - ERROR - stderr - +2025-02-05 12:44:16 - ERROR - stderr - +2025-02-05 12:44:16 - INFO - stdout - {'loss': 0.9707, 'grad_norm': 1.054457187652588, 'learning_rate': 1.904482515022511e-05, 'epoch': 0.5} +2025-02-05 12:44:16 - ERROR - stderr - 17%|█▋ | 3726/22434 [2:36:35<13:17:20, 2.56s/it] +2025-02-05 12:44:18 - ERROR - stderr - 17%|█▋ | 3727/22434 [2:36:38<13:07:06, 2.52s/it] 
+2025-02-05 12:44:18 - ERROR - stderr - +2025-02-05 12:44:18 - ERROR - stderr - +2025-02-05 12:44:18 - INFO - stdout - {'loss': 0.9691, 'grad_norm': 1.1057053804397583, 'learning_rate': 1.9044209282573963e-05, 'epoch': 0.5} +2025-02-05 12:44:18 - ERROR - stderr - 17%|█▋ | 3727/22434 [2:36:38<13:07:06, 2.52s/it] +2025-02-05 12:44:20 - ERROR - stderr - 17%|█▋ | 3728/22434 [2:36:40<13:03:30, 2.51s/it] +2025-02-05 12:44:20 - ERROR - stderr - +2025-02-05 12:44:20 - ERROR - stderr - +2025-02-05 12:44:20 - INFO - stdout - {'loss': 0.9649, 'grad_norm': 1.1541610956192017, 'learning_rate': 1.9043593226404927e-05, 'epoch': 0.5} +2025-02-05 12:44:20 - ERROR - stderr - 17%|█▋ | 3728/22434 [2:36:40<13:03:30, 2.51s/it] +2025-02-05 12:44:23 - ERROR - stderr - 17%|█▋ | 3729/22434 [2:36:43<13:16:05, 2.55s/it] +2025-02-05 12:44:23 - ERROR - stderr - +2025-02-05 12:44:23 - ERROR - stderr - +2025-02-05 12:44:23 - INFO - stdout - {'loss': 1.0062, 'grad_norm': 1.0658810138702393, 'learning_rate': 1.9042976981730845e-05, 'epoch': 0.5} +2025-02-05 12:44:23 - ERROR - stderr - 17%|█▋ | 3729/22434 [2:36:43<13:16:05, 2.55s/it] +2025-02-05 12:44:26 - ERROR - stderr - 17%|█▋ | 3730/22434 [2:36:45<13:10:01, 2.53s/it] +2025-02-05 12:44:26 - ERROR - stderr - +2025-02-05 12:44:26 - ERROR - stderr - +2025-02-05 12:44:26 - INFO - stdout - {'loss': 1.0002, 'grad_norm': 1.13431978225708, 'learning_rate': 1.9042360548564557e-05, 'epoch': 0.5} +2025-02-05 12:44:26 - ERROR - stderr - 17%|█▋ | 3730/22434 [2:36:45<13:10:01, 2.53s/it] +2025-02-05 12:44:28 - ERROR - stderr - 17%|█▋ | 3731/22434 [2:36:48<13:04:16, 2.52s/it] +2025-02-05 12:44:28 - ERROR - stderr - +2025-02-05 12:44:28 - ERROR - stderr - +2025-02-05 12:44:28 - INFO - stdout - {'loss': 0.9908, 'grad_norm': 1.0684891939163208, 'learning_rate': 1.904174392691892e-05, 'epoch': 0.5} +2025-02-05 12:44:28 - ERROR - stderr - 17%|█▋ | 3731/22434 [2:36:48<13:04:16, 2.52s/it] +2025-02-05 12:44:30 - ERROR - stderr - 17%|█▋ | 3732/22434 [2:36:50<12:56:24, 
2.49s/it] +2025-02-05 12:44:31 - ERROR - stderr - +2025-02-05 12:44:31 - ERROR - stderr - +2025-02-05 12:44:31 - INFO - stdout - {'loss': 1.0002, 'grad_norm': 1.1629993915557861, 'learning_rate': 1.9041127116806782e-05, 'epoch': 0.5} +2025-02-05 12:44:31 - ERROR - stderr - 17%|█▋ | 3732/22434 [2:36:50<12:56:24, 2.49s/it] +2025-02-05 12:44:33 - ERROR - stderr - 17%|█▋ | 3733/22434 [2:36:53<12:56:23, 2.49s/it] +2025-02-05 12:44:33 - ERROR - stderr - +2025-02-05 12:44:33 - ERROR - stderr - +2025-02-05 12:44:33 - INFO - stdout - {'loss': 1.0335, 'grad_norm': 1.0453673601150513, 'learning_rate': 1.9040510118241e-05, 'epoch': 0.5} +2025-02-05 12:44:33 - ERROR - stderr - 17%|█▋ | 3733/22434 [2:36:53<12:56:23, 2.49s/it] +2025-02-05 12:44:35 - ERROR - stderr - 17%|█▋ | 3734/22434 [2:36:55<12:53:30, 2.48s/it] +2025-02-05 12:44:35 - ERROR - stderr - +2025-02-05 12:44:35 - ERROR - stderr - +2025-02-05 12:44:35 - INFO - stdout - {'loss': 1.0918, 'grad_norm': 1.224331259727478, 'learning_rate': 1.9039892931234434e-05, 'epoch': 0.5} +2025-02-05 12:44:35 - ERROR - stderr - 17%|█▋ | 3734/22434 [2:36:55<12:53:30, 2.48s/it] +2025-02-05 12:44:38 - ERROR - stderr - 17%|█▋ | 3735/22434 [2:36:58<12:55:28, 2.49s/it] +2025-02-05 12:44:38 - ERROR - stderr - +2025-02-05 12:44:38 - ERROR - stderr - +2025-02-05 12:44:38 - INFO - stdout - {'loss': 0.9817, 'grad_norm': 1.0447088479995728, 'learning_rate': 1.903927555579995e-05, 'epoch': 0.5} +2025-02-05 12:44:38 - ERROR - stderr - 17%|█▋ | 3735/22434 [2:36:58<12:55:28, 2.49s/it] +2025-02-05 12:44:40 - ERROR - stderr - 17%|█▋ | 3736/22434 [2:37:00<12:48:29, 2.47s/it] +2025-02-05 12:44:40 - ERROR - stderr - +2025-02-05 12:44:40 - ERROR - stderr - +2025-02-05 12:44:40 - INFO - stdout - {'loss': 1.012, 'grad_norm': 1.0892528295516968, 'learning_rate': 1.903865799195042e-05, 'epoch': 0.5} +2025-02-05 12:44:40 - ERROR - stderr - 17%|█▋ | 3736/22434 [2:37:00<12:48:29, 2.47s/it] +2025-02-05 12:44:43 - ERROR - stderr - 17%|█▋ | 3737/22434 
[2:37:03<12:52:13, 2.48s/it] +2025-02-05 12:44:43 - ERROR - stderr - +2025-02-05 12:44:43 - ERROR - stderr - +2025-02-05 12:44:43 - INFO - stdout - {'loss': 1.145, 'grad_norm': 1.1643753051757812, 'learning_rate': 1.9038040239698712e-05, 'epoch': 0.5} +2025-02-05 12:44:43 - ERROR - stderr - 17%|█▋ | 3737/22434 [2:37:03<12:52:13, 2.48s/it] +2025-02-05 12:44:46 - ERROR - stderr - 17%|█▋ | 3738/22434 [2:37:05<13:18:53, 2.56s/it] +2025-02-05 12:44:46 - ERROR - stderr - +2025-02-05 12:44:46 - ERROR - stderr - +2025-02-05 12:44:46 - INFO - stdout - {'loss': 0.9291, 'grad_norm': 1.215293288230896, 'learning_rate': 1.9037422299057703e-05, 'epoch': 0.5} +2025-02-05 12:44:46 - ERROR - stderr - 17%|█▋ | 3738/22434 [2:37:05<13:18:53, 2.56s/it] +2025-02-05 12:44:48 - ERROR - stderr - 17%|█▋ | 3739/22434 [2:37:08<13:05:48, 2.52s/it] +2025-02-05 12:44:48 - ERROR - stderr - +2025-02-05 12:44:48 - ERROR - stderr - +2025-02-05 12:44:48 - INFO - stdout - {'loss': 1.0363, 'grad_norm': 1.1376841068267822, 'learning_rate': 1.9036804170040277e-05, 'epoch': 0.5} +2025-02-05 12:44:48 - ERROR - stderr - 17%|█▋ | 3739/22434 [2:37:08<13:05:48, 2.52s/it] +2025-02-05 12:44:51 - ERROR - stderr - 17%|█▋ | 3740/22434 [2:37:10<13:06:40, 2.52s/it] +2025-02-05 12:44:51 - ERROR - stderr - +2025-02-05 12:44:51 - ERROR - stderr - +2025-02-05 12:44:51 - INFO - stdout - {'loss': 0.9304, 'grad_norm': 1.058864712715149, 'learning_rate': 1.903618585265931e-05, 'epoch': 0.5} +2025-02-05 12:44:51 - ERROR - stderr - 17%|█▋ | 3740/22434 [2:37:10<13:06:40, 2.52s/it] +2025-02-05 12:44:53 - ERROR - stderr - 17%|█▋ | 3741/22434 [2:37:13<13:05:32, 2.52s/it] +2025-02-05 12:44:53 - ERROR - stderr - +2025-02-05 12:44:53 - ERROR - stderr - +2025-02-05 12:44:53 - INFO - stdout - {'loss': 0.9755, 'grad_norm': 1.107782006263733, 'learning_rate': 1.9035567346927698e-05, 'epoch': 0.5} +2025-02-05 12:44:53 - ERROR - stderr - 17%|█▋ | 3741/22434 [2:37:13<13:05:32, 2.52s/it] +2025-02-05 12:44:56 - ERROR - stderr - 17%|█▋ | 
3742/22434 [2:37:15<13:07:35, 2.53s/it] +2025-02-05 12:44:56 - ERROR - stderr - +2025-02-05 12:44:56 - ERROR - stderr - +2025-02-05 12:44:56 - INFO - stdout - {'loss': 1.0345, 'grad_norm': 1.1619786024093628, 'learning_rate': 1.9034948652858333e-05, 'epoch': 0.5} +2025-02-05 12:44:56 - ERROR - stderr - 17%|█▋ | 3742/22434 [2:37:15<13:07:35, 2.53s/it] +2025-02-05 12:44:58 - ERROR - stderr - 17%|█▋ | 3743/22434 [2:37:18<13:02:07, 2.51s/it] +2025-02-05 12:44:58 - ERROR - stderr - +2025-02-05 12:44:58 - ERROR - stderr - +2025-02-05 12:44:58 - INFO - stdout - {'loss': 0.8764, 'grad_norm': 1.1246308088302612, 'learning_rate': 1.9034329770464107e-05, 'epoch': 0.5} +2025-02-05 12:44:58 - ERROR - stderr - 17%|█▋ | 3743/22434 [2:37:18<13:02:07, 2.51s/it] +2025-02-05 12:45:01 - ERROR - stderr - 17%|█▋ | 3744/22434 [2:37:21<13:24:03, 2.58s/it] +2025-02-05 12:45:01 - ERROR - stderr - +2025-02-05 12:45:01 - ERROR - stderr - +2025-02-05 12:45:01 - INFO - stdout - {'loss': 0.9618, 'grad_norm': 0.9970629215240479, 'learning_rate': 1.903371069975792e-05, 'epoch': 0.5} +2025-02-05 12:45:01 - ERROR - stderr - 17%|█▋ | 3744/22434 [2:37:21<13:24:03, 2.58s/it] +2025-02-05 12:45:03 - ERROR - stderr - 17%|█▋ | 3745/22434 [2:37:23<13:16:53, 2.56s/it] +2025-02-05 12:45:03 - ERROR - stderr - +2025-02-05 12:45:03 - ERROR - stderr - +2025-02-05 12:45:03 - INFO - stdout - {'loss': 0.9536, 'grad_norm': 0.9689397215843201, 'learning_rate': 1.9033091440752677e-05, 'epoch': 0.5} +2025-02-05 12:45:03 - ERROR - stderr - 17%|█▋ | 3745/22434 [2:37:23<13:16:53, 2.56s/it] +2025-02-05 12:45:06 - ERROR - stderr - 17%|█▋ | 3746/22434 [2:37:26<13:05:13, 2.52s/it] +2025-02-05 12:45:06 - ERROR - stderr - +2025-02-05 12:45:06 - ERROR - stderr - +2025-02-05 12:45:06 - INFO - stdout - {'loss': 0.859, 'grad_norm': 1.151862382888794, 'learning_rate': 1.903247199346129e-05, 'epoch': 0.5} +2025-02-05 12:45:06 - ERROR - stderr - 17%|█▋ | 3746/22434 [2:37:26<13:05:13, 2.52s/it] +2025-02-05 12:45:08 - ERROR - stderr - 
17%|█▋ | 3747/22434 [2:37:28<13:07:40, 2.53s/it] +2025-02-05 12:45:08 - ERROR - stderr - +2025-02-05 12:45:08 - ERROR - stderr - +2025-02-05 12:45:08 - INFO - stdout - {'loss': 1.0223, 'grad_norm': 1.1340296268463135, 'learning_rate': 1.9031852357896667e-05, 'epoch': 0.5} +2025-02-05 12:45:08 - ERROR - stderr - 17%|█▋ | 3747/22434 [2:37:28<13:07:40, 2.53s/it] +2025-02-05 12:45:11 - ERROR - stderr - 17%|█▋ | 3748/22434 [2:37:31<13:10:24, 2.54s/it] +2025-02-05 12:45:11 - ERROR - stderr - +2025-02-05 12:45:11 - ERROR - stderr - +2025-02-05 12:45:11 - INFO - stdout - {'loss': 0.8795, 'grad_norm': 1.0682473182678223, 'learning_rate': 1.903123253407172e-05, 'epoch': 0.5} +2025-02-05 12:45:11 - ERROR - stderr - 17%|█▋ | 3748/22434 [2:37:31<13:10:24, 2.54s/it] +2025-02-05 12:45:13 - ERROR - stderr - 17%|█▋ | 3749/22434 [2:37:33<13:08:23, 2.53s/it] +2025-02-05 12:45:13 - ERROR - stderr - +2025-02-05 12:45:13 - ERROR - stderr - +2025-02-05 12:45:13 - INFO - stdout - {'loss': 1.0473, 'grad_norm': 1.12838613986969, 'learning_rate': 1.903061252199938e-05, 'epoch': 0.5} +2025-02-05 12:45:13 - ERROR - stderr - 17%|█▋ | 3749/22434 [2:37:33<13:08:23, 2.53s/it] +2025-02-05 12:45:16 - ERROR - stderr - 17%|█▋ | 3750/22434 [2:37:36<13:09:06, 2.53s/it] +2025-02-05 12:45:16 - ERROR - stderr - +2025-02-05 12:45:16 - ERROR - stderr - +2025-02-05 12:45:16 - INFO - stdout - {'loss': 0.9686, 'grad_norm': 1.1914901733398438, 'learning_rate': 1.902999232169256e-05, 'epoch': 0.5} +2025-02-05 12:45:16 - ERROR - stderr - 17%|█▋ | 3750/22434 [2:37:36<13:09:06, 2.53s/it] +2025-02-05 12:45:19 - ERROR - stderr - 17%|█▋ | 3751/22434 [2:37:38<13:13:12, 2.55s/it] +2025-02-05 12:45:19 - ERROR - stderr - +2025-02-05 12:45:19 - ERROR - stderr - +2025-02-05 12:45:19 - INFO - stdout - {'loss': 0.9996, 'grad_norm': 1.078313946723938, 'learning_rate': 1.9029371933164192e-05, 'epoch': 0.5} +2025-02-05 12:45:19 - ERROR - stderr - 17%|█▋ | 3751/22434 [2:37:38<13:13:12, 2.55s/it] +2025-02-05 12:45:21 - ERROR - 
stderr - 17%|█▋ | 3752/22434 [2:37:41<13:11:50, 2.54s/it] +2025-02-05 12:45:21 - ERROR - stderr - +2025-02-05 12:45:21 - ERROR - stderr - +2025-02-05 12:45:21 - INFO - stdout - {'loss': 0.9314, 'grad_norm': 1.182842493057251, 'learning_rate': 1.90287513564272e-05, 'epoch': 0.5} +2025-02-05 12:45:21 - ERROR - stderr - 17%|█▋ | 3752/22434 [2:37:41<13:11:50, 2.54s/it] +2025-02-05 12:45:24 - ERROR - stderr - 17%|█▋ | 3753/22434 [2:37:43<13:11:35, 2.54s/it] +2025-02-05 12:45:24 - ERROR - stderr - +2025-02-05 12:45:24 - ERROR - stderr - +2025-02-05 12:45:24 - INFO - stdout - {'loss': 0.9839, 'grad_norm': 1.1277867555618286, 'learning_rate': 1.9028130591494532e-05, 'epoch': 0.5} +2025-02-05 12:45:24 - ERROR - stderr - 17%|█▋ | 3753/22434 [2:37:43<13:11:35, 2.54s/it] +2025-02-05 12:45:26 - ERROR - stderr - 17%|█▋ | 3754/22434 [2:37:46<13:13:06, 2.55s/it] +2025-02-05 12:45:26 - ERROR - stderr - +2025-02-05 12:45:26 - ERROR - stderr - +2025-02-05 12:45:26 - INFO - stdout - {'loss': 0.9676, 'grad_norm': 1.0247992277145386, 'learning_rate': 1.9027509638379122e-05, 'epoch': 0.5} +2025-02-05 12:45:26 - ERROR - stderr - 17%|█▋ | 3754/22434 [2:37:46<13:13:06, 2.55s/it] +2025-02-05 12:45:29 - ERROR - stderr - 17%|█▋ | 3755/22434 [2:37:48<13:01:38, 2.51s/it] +2025-02-05 12:45:29 - ERROR - stderr - +2025-02-05 12:45:29 - ERROR - stderr - +2025-02-05 12:45:29 - INFO - stdout - {'loss': 0.9449, 'grad_norm': 1.071983814239502, 'learning_rate': 1.902688849709391e-05, 'epoch': 0.5} +2025-02-05 12:45:29 - ERROR - stderr - 17%|█▋ | 3755/22434 [2:37:48<13:01:38, 2.51s/it] +2025-02-05 12:45:31 - ERROR - stderr - 17%|█▋ | 3756/22434 [2:37:51<12:55:44, 2.49s/it] +2025-02-05 12:45:31 - ERROR - stderr - +2025-02-05 12:45:31 - ERROR - stderr - +2025-02-05 12:45:31 - INFO - stdout - {'loss': 0.9601, 'grad_norm': 1.0668017864227295, 'learning_rate': 1.902626716765184e-05, 'epoch': 0.5} +2025-02-05 12:45:31 - ERROR - stderr - 17%|█▋ | 3756/22434 [2:37:51<12:55:44, 2.49s/it] +2025-02-05 12:45:34 - 
ERROR - stderr - 17%|█▋ | 3757/22434 [2:37:54<13:29:12, 2.60s/it] +2025-02-05 12:45:34 - ERROR - stderr - +2025-02-05 12:45:34 - ERROR - stderr - +2025-02-05 12:45:34 - INFO - stdout - {'loss': 0.9724, 'grad_norm': 1.1264066696166992, 'learning_rate': 1.9025645650065874e-05, 'epoch': 0.5} +2025-02-05 12:45:34 - ERROR - stderr - 17%|█▋ | 3757/22434 [2:37:54<13:29:12, 2.60s/it] +2025-02-05 12:45:36 - ERROR - stderr - 17%|█▋ | 3758/22434 [2:37:56<13:18:11, 2.56s/it] +2025-02-05 12:45:36 - ERROR - stderr - +2025-02-05 12:45:36 - ERROR - stderr - +2025-02-05 12:45:36 - INFO - stdout - {'loss': 1.0995, 'grad_norm': 1.080678105354309, 'learning_rate': 1.9025023944348957e-05, 'epoch': 0.5} +2025-02-05 12:45:36 - ERROR - stderr - 17%|█▋ | 3758/22434 [2:37:56<13:18:11, 2.56s/it] +2025-02-05 12:45:39 - ERROR - stderr - 17%|█▋ | 3759/22434 [2:37:59<13:12:54, 2.55s/it] +2025-02-05 12:45:39 - ERROR - stderr - +2025-02-05 12:45:39 - ERROR - stderr - +2025-02-05 12:45:39 - INFO - stdout - {'loss': 0.9914, 'grad_norm': 1.193969964981079, 'learning_rate': 1.9024402050514056e-05, 'epoch': 0.5} +2025-02-05 12:45:39 - ERROR - stderr - 17%|█▋ | 3759/22434 [2:37:59<13:12:54, 2.55s/it] +2025-02-05 12:45:41 - ERROR - stderr - 17%|█▋ | 3760/22434 [2:38:01<13:04:13, 2.52s/it] +2025-02-05 12:45:41 - ERROR - stderr - +2025-02-05 12:45:41 - ERROR - stderr - +2025-02-05 12:45:41 - INFO - stdout - {'loss': 0.9721, 'grad_norm': 1.1350390911102295, 'learning_rate': 1.9023779968574127e-05, 'epoch': 0.5} +2025-02-05 12:45:41 - ERROR - stderr - 17%|█▋ | 3760/22434 [2:38:01<13:04:13, 2.52s/it] +2025-02-05 12:45:44 - ERROR - stderr - 17%|█▋ | 3761/22434 [2:38:04<12:59:50, 2.51s/it] +2025-02-05 12:45:44 - ERROR - stderr - +2025-02-05 12:45:44 - ERROR - stderr - +2025-02-05 12:45:44 - INFO - stdout - {'loss': 0.8765, 'grad_norm': 1.1308635473251343, 'learning_rate': 1.902315769854214e-05, 'epoch': 0.5} +2025-02-05 12:45:44 - ERROR - stderr - 17%|█▋ | 3761/22434 [2:38:04<12:59:50, 2.51s/it] +2025-02-05 
12:45:46 - ERROR - stderr - 17%|█▋ | 3762/22434 [2:38:06<12:51:08, 2.48s/it] +2025-02-05 12:45:46 - ERROR - stderr - +2025-02-05 12:45:46 - ERROR - stderr - +2025-02-05 12:45:46 - INFO - stdout - {'loss': 0.9379, 'grad_norm': 1.0877312421798706, 'learning_rate': 1.9022535240431066e-05, 'epoch': 0.5} +2025-02-05 12:45:46 - ERROR - stderr - 17%|█▋ | 3762/22434 [2:38:06<12:51:08, 2.48s/it] +2025-02-05 12:45:49 - ERROR - stderr - 17%|█▋ | 3763/22434 [2:38:08<12:51:43, 2.48s/it] +2025-02-05 12:45:49 - ERROR - stderr - +2025-02-05 12:45:49 - ERROR - stderr - +2025-02-05 12:45:49 - INFO - stdout - {'loss': 0.9568, 'grad_norm': 1.036550521850586, 'learning_rate': 1.902191259425388e-05, 'epoch': 0.5} +2025-02-05 12:45:49 - ERROR - stderr - 17%|█▋ | 3763/22434 [2:38:09<12:51:43, 2.48s/it] +2025-02-05 12:45:51 - ERROR - stderr - 17%|█▋ | 3764/22434 [2:38:11<12:50:38, 2.48s/it] +2025-02-05 12:45:51 - ERROR - stderr - +2025-02-05 12:45:51 - ERROR - stderr - +2025-02-05 12:45:51 - INFO - stdout - {'loss': 0.8939, 'grad_norm': 1.0522043704986572, 'learning_rate': 1.9021289760023555e-05, 'epoch': 0.5} +2025-02-05 12:45:51 - ERROR - stderr - 17%|█▋ | 3764/22434 [2:38:11<12:50:38, 2.48s/it] +2025-02-05 12:45:54 - ERROR - stderr - 17%|█▋ | 3765/22434 [2:38:13<12:54:37, 2.49s/it] +2025-02-05 12:45:54 - ERROR - stderr - +2025-02-05 12:45:54 - ERROR - stderr - +2025-02-05 12:45:54 - INFO - stdout - {'loss': 0.9778, 'grad_norm': 1.1509100198745728, 'learning_rate': 1.902066673775308e-05, 'epoch': 0.5} +2025-02-05 12:45:54 - ERROR - stderr - 17%|█▋ | 3765/22434 [2:38:13<12:54:37, 2.49s/it] +2025-02-05 12:45:56 - ERROR - stderr - 17%|█▋ | 3766/22434 [2:38:16<12:49:46, 2.47s/it] +2025-02-05 12:45:56 - ERROR - stderr - +2025-02-05 12:45:56 - ERROR - stderr - +2025-02-05 12:45:56 - INFO - stdout - {'loss': 0.9107, 'grad_norm': 1.0133944749832153, 'learning_rate': 1.9020043527455438e-05, 'epoch': 0.5} +2025-02-05 12:45:56 - ERROR - stderr - 17%|█▋ | 3766/22434 [2:38:16<12:49:46, 2.47s/it] 
+2025-02-05 12:45:59 - ERROR - stderr - 17%|█▋ | 3767/22434 [2:38:18<12:51:25, 2.48s/it] +2025-02-05 12:45:59 - ERROR - stderr - +2025-02-05 12:45:59 - ERROR - stderr - +2025-02-05 12:45:59 - INFO - stdout - {'loss': 0.8588, 'grad_norm': 1.0727956295013428, 'learning_rate': 1.9019420129143618e-05, 'epoch': 0.5} +2025-02-05 12:45:59 - ERROR - stderr - 17%|█▋ | 3767/22434 [2:38:18<12:51:25, 2.48s/it] +2025-02-05 12:46:01 - ERROR - stderr - 17%|█▋ | 3768/22434 [2:38:21<13:14:29, 2.55s/it] +2025-02-05 12:46:01 - ERROR - stderr - +2025-02-05 12:46:01 - ERROR - stderr - +2025-02-05 12:46:01 - INFO - stdout - {'loss': 1.053, 'grad_norm': 1.1666569709777832, 'learning_rate': 1.9018796542830616e-05, 'epoch': 0.5} +2025-02-05 12:46:01 - ERROR - stderr - 17%|█▋ | 3768/22434 [2:38:21<13:14:29, 2.55s/it] +2025-02-05 12:46:04 - ERROR - stderr - 17%|█▋ | 3769/22434 [2:38:24<13:10:37, 2.54s/it] +2025-02-05 12:46:04 - ERROR - stderr - +2025-02-05 12:46:04 - ERROR - stderr - +2025-02-05 12:46:04 - INFO - stdout - {'loss': 1.0018, 'grad_norm': 1.1905138492584229, 'learning_rate': 1.9018172768529433e-05, 'epoch': 0.5} +2025-02-05 12:46:04 - ERROR - stderr - 17%|█▋ | 3769/22434 [2:38:24<13:10:37, 2.54s/it] +2025-02-05 12:46:06 - ERROR - stderr - 17%|█▋ | 3770/22434 [2:38:26<13:03:30, 2.52s/it] +2025-02-05 12:46:06 - ERROR - stderr - +2025-02-05 12:46:06 - ERROR - stderr - +2025-02-05 12:46:06 - INFO - stdout - {'loss': 1.0021, 'grad_norm': 1.1711221933364868, 'learning_rate': 1.9017548806253068e-05, 'epoch': 0.5} +2025-02-05 12:46:06 - ERROR - stderr - 17%|█▋ | 3770/22434 [2:38:26<13:03:30, 2.52s/it] +2025-02-05 12:46:09 - ERROR - stderr - 17%|█▋ | 3771/22434 [2:38:29<13:00:10, 2.51s/it] +2025-02-05 12:46:09 - ERROR - stderr - +2025-02-05 12:46:09 - ERROR - stderr - +2025-02-05 12:46:09 - INFO - stdout - {'loss': 0.9986, 'grad_norm': 0.9530136585235596, 'learning_rate': 1.9016924656014525e-05, 'epoch': 0.5} +2025-02-05 12:46:09 - ERROR - stderr - 17%|█▋ | 3771/22434 [2:38:29<13:00:10, 
2.51s/it] +2025-02-05 12:46:11 - ERROR - stderr - 17%|█▋ | 3772/22434 [2:38:31<13:08:30, 2.54s/it] +2025-02-05 12:46:11 - ERROR - stderr - +2025-02-05 12:46:11 - ERROR - stderr - +2025-02-05 12:46:11 - INFO - stdout - {'loss': 1.0203, 'grad_norm': 1.056572437286377, 'learning_rate': 1.901630031782682e-05, 'epoch': 0.5} +2025-02-05 12:46:11 - ERROR - stderr - 17%|█▋ | 3772/22434 [2:38:31<13:08:30, 2.54s/it] +2025-02-05 12:46:14 - ERROR - stderr - 17%|█▋ | 3773/22434 [2:38:34<13:07:01, 2.53s/it] +2025-02-05 12:46:14 - ERROR - stderr - +2025-02-05 12:46:14 - ERROR - stderr - +2025-02-05 12:46:14 - INFO - stdout - {'loss': 1.0977, 'grad_norm': 1.1881023645401, 'learning_rate': 1.9015675791702956e-05, 'epoch': 0.5} +2025-02-05 12:46:14 - ERROR - stderr - 17%|█▋ | 3773/22434 [2:38:34<13:07:01, 2.53s/it] +2025-02-05 12:46:16 - ERROR - stderr - 17%|█▋ | 3774/22434 [2:38:36<13:07:40, 2.53s/it] +2025-02-05 12:46:16 - ERROR - stderr - +2025-02-05 12:46:17 - ERROR - stderr - +2025-02-05 12:46:17 - INFO - stdout - {'loss': 1.0022, 'grad_norm': 1.1704810857772827, 'learning_rate': 1.9015051077655963e-05, 'epoch': 0.5} +2025-02-05 12:46:17 - ERROR - stderr - 17%|█▋ | 3774/22434 [2:38:36<13:07:40, 2.53s/it] +2025-02-05 12:46:19 - ERROR - stderr - 17%|█▋ | 3775/22434 [2:38:39<13:17:42, 2.57s/it] +2025-02-05 12:46:19 - ERROR - stderr - +2025-02-05 12:46:19 - ERROR - stderr - +2025-02-05 12:46:19 - INFO - stdout - {'loss': 1.0099, 'grad_norm': 1.1093477010726929, 'learning_rate': 1.901442617569885e-05, 'epoch': 0.5} +2025-02-05 12:46:19 - ERROR - stderr - 17%|█▋ | 3775/22434 [2:38:39<13:17:42, 2.57s/it] +2025-02-05 12:46:22 - ERROR - stderr - 17%|█▋ | 3776/22434 [2:38:41<13:09:21, 2.54s/it] +2025-02-05 12:46:22 - ERROR - stderr - +2025-02-05 12:46:22 - ERROR - stderr - +2025-02-05 12:46:22 - INFO - stdout - {'loss': 1.0734, 'grad_norm': 1.2047743797302246, 'learning_rate': 1.9013801085844655e-05, 'epoch': 0.5} +2025-02-05 12:46:22 - ERROR - stderr - 17%|█▋ | 3776/22434 
[2:38:41<13:09:21, 2.54s/it] +2025-02-05 12:46:24 - ERROR - stderr - 17%|█▋ | 3777/22434 [2:38:44<13:06:16, 2.53s/it] +2025-02-05 12:46:24 - ERROR - stderr - +2025-02-05 12:46:24 - ERROR - stderr - +2025-02-05 12:46:24 - INFO - stdout - {'loss': 0.9386, 'grad_norm': 0.972062349319458, 'learning_rate': 1.90131758081064e-05, 'epoch': 0.51} +2025-02-05 12:46:24 - ERROR - stderr - 17%|█▋ | 3777/22434 [2:38:44<13:06:16, 2.53s/it] +2025-02-05 12:46:27 - ERROR - stderr - 17%|█▋ | 3778/22434 [2:38:46<13:02:47, 2.52s/it] +2025-02-05 12:46:27 - ERROR - stderr - +2025-02-05 12:46:27 - ERROR - stderr - +2025-02-05 12:46:27 - INFO - stdout - {'loss': 0.9765, 'grad_norm': 1.0205680131912231, 'learning_rate': 1.901255034249712e-05, 'epoch': 0.51} +2025-02-05 12:46:27 - ERROR - stderr - 17%|█▋ | 3778/22434 [2:38:46<13:02:47, 2.52s/it] +2025-02-05 12:46:29 - ERROR - stderr - 17%|█▋ | 3779/22434 [2:38:49<13:01:40, 2.51s/it] +2025-02-05 12:46:29 - ERROR - stderr - +2025-02-05 12:46:29 - ERROR - stderr - +2025-02-05 12:46:29 - INFO - stdout - {'loss': 0.9258, 'grad_norm': 1.0622607469558716, 'learning_rate': 1.9011924689029856e-05, 'epoch': 0.51} +2025-02-05 12:46:29 - ERROR - stderr - 17%|█▋ | 3779/22434 [2:38:49<13:01:40, 2.51s/it] +2025-02-05 12:46:32 - ERROR - stderr - 17%|█▋ | 3780/22434 [2:38:51<12:55:45, 2.50s/it] +2025-02-05 12:46:32 - ERROR - stderr - +2025-02-05 12:46:32 - ERROR - stderr - +2025-02-05 12:46:32 - INFO - stdout - {'loss': 0.9289, 'grad_norm': 1.0987156629562378, 'learning_rate': 1.901129884771764e-05, 'epoch': 0.51} +2025-02-05 12:46:32 - ERROR - stderr - 17%|█▋ | 3780/22434 [2:38:51<12:55:45, 2.50s/it] +2025-02-05 12:46:34 - ERROR - stderr - 17%|█▋ | 3781/22434 [2:38:54<12:57:51, 2.50s/it] +2025-02-05 12:46:34 - ERROR - stderr - +2025-02-05 12:46:34 - ERROR - stderr - +2025-02-05 12:46:34 - INFO - stdout - {'loss': 0.9944, 'grad_norm': 1.1513290405273438, 'learning_rate': 1.9010672818573522e-05, 'epoch': 0.51} +2025-02-05 12:46:34 - ERROR - stderr - 17%|█▋ | 
3781/22434 [2:38:54<12:57:51, 2.50s/it] +2025-02-05 12:46:37 - ERROR - stderr - 17%|█▋ | 3782/22434 [2:38:57<13:21:28, 2.58s/it] +2025-02-05 12:46:37 - ERROR - stderr - +2025-02-05 12:46:37 - ERROR - stderr - +2025-02-05 12:46:37 - INFO - stdout - {'loss': 0.9449, 'grad_norm': 1.1628025770187378, 'learning_rate': 1.9010046601610557e-05, 'epoch': 0.51} +2025-02-05 12:46:37 - ERROR - stderr - 17%|█▋ | 3782/22434 [2:38:57<13:21:28, 2.58s/it] +2025-02-05 12:46:39 - ERROR - stderr - 17%|█▋ | 3783/22434 [2:38:59<13:15:59, 2.56s/it] +2025-02-05 12:46:39 - ERROR - stderr - +2025-02-05 12:46:39 - ERROR - stderr - +2025-02-05 12:46:39 - INFO - stdout - {'loss': 0.9575, 'grad_norm': 1.044317603111267, 'learning_rate': 1.9009420196841786e-05, 'epoch': 0.51} +2025-02-05 12:46:39 - ERROR - stderr - 17%|█▋ | 3783/22434 [2:38:59<13:15:59, 2.56s/it] +2025-02-05 12:46:42 - ERROR - stderr - 17%|█▋ | 3784/22434 [2:39:02<13:06:53, 2.53s/it] +2025-02-05 12:46:42 - ERROR - stderr - +2025-02-05 12:46:42 - ERROR - stderr - +2025-02-05 12:46:42 - INFO - stdout - {'loss': 0.9242, 'grad_norm': 1.1931627988815308, 'learning_rate': 1.9008793604280275e-05, 'epoch': 0.51} +2025-02-05 12:46:42 - ERROR - stderr - 17%|█▋ | 3784/22434 [2:39:02<13:06:53, 2.53s/it] +2025-02-05 12:46:44 - ERROR - stderr - 17%|█▋ | 3785/22434 [2:39:04<13:14:56, 2.56s/it] +2025-02-05 12:46:44 - ERROR - stderr - +2025-02-05 12:46:44 - ERROR - stderr - +2025-02-05 12:46:44 - INFO - stdout - {'loss': 0.9965, 'grad_norm': 1.124172568321228, 'learning_rate': 1.900816682393908e-05, 'epoch': 0.51} +2025-02-05 12:46:44 - ERROR - stderr - 17%|█▋ | 3785/22434 [2:39:04<13:14:56, 2.56s/it] +2025-02-05 12:46:47 - ERROR - stderr - 17%|█▋ | 3786/22434 [2:39:07<13:07:10, 2.53s/it] +2025-02-05 12:46:47 - ERROR - stderr - +2025-02-05 12:46:47 - ERROR - stderr - +2025-02-05 12:46:47 - INFO - stdout - {'loss': 0.9878, 'grad_norm': 1.0585620403289795, 'learning_rate': 1.9007539855831272e-05, 'epoch': 0.51} +2025-02-05 12:46:47 - ERROR - 
stderr - 17%|█▋ | 3786/22434 [2:39:07<13:07:10, 2.53s/it] +2025-02-05 12:46:49 - ERROR - stderr - 17%|█▋ | 3787/22434 [2:39:09<12:57:10, 2.50s/it] +2025-02-05 12:46:49 - ERROR - stderr - +2025-02-05 12:46:49 - ERROR - stderr - +2025-02-05 12:46:49 - INFO - stdout - {'loss': 1.0873, 'grad_norm': 1.2777968645095825, 'learning_rate': 1.900691269996991e-05, 'epoch': 0.51} +2025-02-05 12:46:49 - ERROR - stderr - 17%|█▋ | 3787/22434 [2:39:09<12:57:10, 2.50s/it] +2025-02-05 12:46:52 - ERROR - stderr - 17%|█▋ | 3788/22434 [2:39:12<13:27:47, 2.60s/it] +2025-02-05 12:46:52 - ERROR - stderr - +2025-02-05 12:46:52 - ERROR - stderr - +2025-02-05 12:46:52 - INFO - stdout - {'loss': 0.991, 'grad_norm': 1.1473510265350342, 'learning_rate': 1.9006285356368076e-05, 'epoch': 0.51} +2025-02-05 12:46:52 - ERROR - stderr - 17%|█▋ | 3788/22434 [2:39:12<13:27:47, 2.60s/it] +2025-02-05 12:46:55 - ERROR - stderr - 17%|█▋ | 3789/22434 [2:39:14<13:24:50, 2.59s/it] +2025-02-05 12:46:55 - ERROR - stderr - +2025-02-05 12:46:55 - ERROR - stderr - +2025-02-05 12:46:55 - INFO - stdout - {'loss': 1.0212, 'grad_norm': 1.0634920597076416, 'learning_rate': 1.9005657825038838e-05, 'epoch': 0.51} +2025-02-05 12:46:55 - ERROR - stderr - 17%|█▋ | 3789/22434 [2:39:15<13:24:50, 2.59s/it] +2025-02-05 12:46:57 - ERROR - stderr - 17%|█▋ | 3790/22434 [2:39:17<13:19:31, 2.57s/it] +2025-02-05 12:46:57 - ERROR - stderr - +2025-02-05 12:46:57 - ERROR - stderr - +2025-02-05 12:46:57 - INFO - stdout - {'loss': 1.0114, 'grad_norm': 1.2837328910827637, 'learning_rate': 1.900503010599528e-05, 'epoch': 0.51} +2025-02-05 12:46:57 - ERROR - stderr - 17%|█▋ | 3790/22434 [2:39:17<13:19:31, 2.57s/it] +2025-02-05 12:47:00 - ERROR - stderr - 17%|█▋ | 3791/22434 [2:39:19<13:10:28, 2.54s/it] +2025-02-05 12:47:00 - ERROR - stderr - +2025-02-05 12:47:00 - ERROR - stderr - +2025-02-05 12:47:00 - INFO - stdout - {'loss': 1.0315, 'grad_norm': 1.1228044033050537, 'learning_rate': 1.900440219925049e-05, 'epoch': 0.51} +2025-02-05 
12:47:00 - ERROR - stderr - 17%|█▋ | 3791/22434 [2:39:20<13:10:28, 2.54s/it] +2025-02-05 12:47:02 - ERROR - stderr - 17%|█▋ | 3792/22434 [2:39:22<13:04:14, 2.52s/it] +2025-02-05 12:47:02 - ERROR - stderr - +2025-02-05 12:47:02 - ERROR - stderr - +2025-02-05 12:47:02 - INFO - stdout - {'loss': 0.9859, 'grad_norm': 1.097066044807434, 'learning_rate': 1.900377410481755e-05, 'epoch': 0.51} +2025-02-05 12:47:02 - ERROR - stderr - 17%|█▋ | 3792/22434 [2:39:22<13:04:14, 2.52s/it] +2025-02-05 12:47:05 - ERROR - stderr - 17%|█▋ | 3793/22434 [2:39:24<12:57:58, 2.50s/it] +2025-02-05 12:47:05 - ERROR - stderr - +2025-02-05 12:47:05 - ERROR - stderr - +2025-02-05 12:47:05 - INFO - stdout - {'loss': 0.9559, 'grad_norm': 1.1513768434524536, 'learning_rate': 1.9003145822709553e-05, 'epoch': 0.51} +2025-02-05 12:47:05 - ERROR - stderr - 17%|█▋ | 3793/22434 [2:39:24<12:57:58, 2.50s/it] +2025-02-05 12:47:07 - ERROR - stderr - 17%|█▋ | 3794/22434 [2:39:27<12:50:52, 2.48s/it] +2025-02-05 12:47:07 - ERROR - stderr - +2025-02-05 12:47:07 - ERROR - stderr - +2025-02-05 12:47:07 - INFO - stdout - {'loss': 0.9548, 'grad_norm': 1.0075641870498657, 'learning_rate': 1.90025173529396e-05, 'epoch': 0.51} +2025-02-05 12:47:07 - ERROR - stderr - 17%|█▋ | 3794/22434 [2:39:27<12:50:52, 2.48s/it] +2025-02-05 12:47:10 - ERROR - stderr - 17%|█▋ | 3795/22434 [2:39:29<12:49:33, 2.48s/it] +2025-02-05 12:47:10 - ERROR - stderr - +2025-02-05 12:47:10 - ERROR - stderr - +2025-02-05 12:47:10 - INFO - stdout - {'loss': 0.9136, 'grad_norm': 1.0988287925720215, 'learning_rate': 1.9001888695520785e-05, 'epoch': 0.51} +2025-02-05 12:47:10 - ERROR - stderr - 17%|█▋ | 3795/22434 [2:39:29<12:49:33, 2.48s/it] +2025-02-05 12:47:12 - ERROR - stderr - 17%|█▋ | 3796/22434 [2:39:32<12:50:42, 2.48s/it] +2025-02-05 12:47:12 - ERROR - stderr - +2025-02-05 12:47:12 - ERROR - stderr - +2025-02-05 12:47:12 - INFO - stdout - {'loss': 1.1613, 'grad_norm': 1.1111541986465454, 'learning_rate': 1.9001259850466214e-05, 'epoch': 0.51} 
+2025-02-05 12:47:12 - ERROR - stderr - 17%|█▋ | 3796/22434 [2:39:32<12:50:42, 2.48s/it] +2025-02-05 12:47:15 - ERROR - stderr - 17%|█▋ | 3797/22434 [2:39:34<12:58:52, 2.51s/it] +2025-02-05 12:47:15 - ERROR - stderr - +2025-02-05 12:47:15 - ERROR - stderr - +2025-02-05 12:47:15 - INFO - stdout - {'loss': 0.9497, 'grad_norm': 1.156724214553833, 'learning_rate': 1.9000630817788994e-05, 'epoch': 0.51} +2025-02-05 12:47:15 - ERROR - stderr - 17%|█▋ | 3797/22434 [2:39:34<12:58:52, 2.51s/it] +2025-02-05 12:47:17 - ERROR - stderr - 17%|█▋ | 3798/22434 [2:39:37<13:02:39, 2.52s/it] +2025-02-05 12:47:17 - ERROR - stderr - +2025-02-05 12:47:17 - ERROR - stderr - +2025-02-05 12:47:17 - INFO - stdout - {'loss': 0.8897, 'grad_norm': 1.051926612854004, 'learning_rate': 1.900000159750224e-05, 'epoch': 0.51} +2025-02-05 12:47:17 - ERROR - stderr - 17%|█▋ | 3798/22434 [2:39:37<13:02:39, 2.52s/it] +2025-02-05 12:47:20 - ERROR - stderr - 17%|█▋ | 3799/22434 [2:39:39<13:01:27, 2.52s/it] +2025-02-05 12:47:20 - ERROR - stderr - +2025-02-05 12:47:20 - ERROR - stderr - +2025-02-05 12:47:20 - INFO - stdout - {'loss': 0.9084, 'grad_norm': 1.1226952075958252, 'learning_rate': 1.8999372189619062e-05, 'epoch': 0.51} +2025-02-05 12:47:20 - ERROR - stderr - 17%|█▋ | 3799/22434 [2:39:39<13:01:27, 2.52s/it] +2025-02-05 12:47:22 - ERROR - stderr - 17%|█▋ | 3800/22434 [2:39:42<12:54:06, 2.49s/it] +2025-02-05 12:47:22 - ERROR - stderr - +2025-02-05 12:47:22 - ERROR - stderr - +2025-02-05 12:47:22 - INFO - stdout - {'loss': 0.9657, 'grad_norm': 1.1156808137893677, 'learning_rate': 1.8998742594152585e-05, 'epoch': 0.51} +2025-02-05 12:47:22 - ERROR - stderr - 17%|█▋ | 3800/22434 [2:39:42<12:54:06, 2.49s/it] +2025-02-05 12:47:25 - ERROR - stderr - 17%|█▋ | 3801/22434 [2:39:44<12:59:43, 2.51s/it] +2025-02-05 12:47:25 - ERROR - stderr - +2025-02-05 12:47:25 - ERROR - stderr - +2025-02-05 12:47:25 - INFO - stdout - {'loss': 0.8922, 'grad_norm': 1.051435112953186, 'learning_rate': 1.8998112811115924e-05, 
'epoch': 0.51} +2025-02-05 12:47:25 - ERROR - stderr - 17%|█▋ | 3801/22434 [2:39:44<12:59:43, 2.51s/it] +2025-02-05 12:47:27 - ERROR - stderr - 17%|█▋ | 3802/22434 [2:39:47<12:57:26, 2.50s/it] +2025-02-05 12:47:27 - ERROR - stderr - +2025-02-05 12:47:27 - ERROR - stderr - +2025-02-05 12:47:27 - INFO - stdout - {'loss': 0.9868, 'grad_norm': 0.98753422498703, 'learning_rate': 1.8997482840522218e-05, 'epoch': 0.51} +2025-02-05 12:47:27 - ERROR - stderr - 17%|█▋ | 3802/22434 [2:39:47<12:57:26, 2.50s/it] +2025-02-05 12:47:30 - ERROR - stderr - 17%|█▋ | 3803/22434 [2:39:49<12:57:44, 2.50s/it] +2025-02-05 12:47:30 - ERROR - stderr - +2025-02-05 12:47:30 - ERROR - stderr - +2025-02-05 12:47:30 - INFO - stdout - {'loss': 0.8274, 'grad_norm': 1.0197744369506836, 'learning_rate': 1.899685268238459e-05, 'epoch': 0.51} +2025-02-05 12:47:30 - ERROR - stderr - 17%|█▋ | 3803/22434 [2:39:49<12:57:44, 2.50s/it] +2025-02-05 12:47:32 - ERROR - stderr - 17%|█▋ | 3804/22434 [2:39:52<13:23:37, 2.59s/it] +2025-02-05 12:47:32 - ERROR - stderr - +2025-02-05 12:47:32 - ERROR - stderr - +2025-02-05 12:47:32 - INFO - stdout - {'loss': 1.1568, 'grad_norm': 1.183379054069519, 'learning_rate': 1.8996222336716172e-05, 'epoch': 0.51} +2025-02-05 12:47:32 - ERROR - stderr - 17%|█▋ | 3804/22434 [2:39:52<13:23:37, 2.59s/it] +2025-02-05 12:47:35 - ERROR - stderr - 17%|█▋ | 3805/22434 [2:39:55<13:09:49, 2.54s/it] +2025-02-05 12:47:35 - ERROR - stderr - +2025-02-05 12:47:35 - ERROR - stderr - +2025-02-05 12:47:35 - INFO - stdout - {'loss': 0.9507, 'grad_norm': 1.0052472352981567, 'learning_rate': 1.8995591803530115e-05, 'epoch': 0.51} +2025-02-05 12:47:35 - ERROR - stderr - 17%|█▋ | 3805/22434 [2:39:55<13:09:49, 2.54s/it] +2025-02-05 12:47:37 - ERROR - stderr - 17%|█▋ | 3806/22434 [2:39:57<13:00:21, 2.51s/it] +2025-02-05 12:47:37 - ERROR - stderr - +2025-02-05 12:47:37 - ERROR - stderr - +2025-02-05 12:47:37 - INFO - stdout - {'loss': 0.9336, 'grad_norm': 1.1517760753631592, 'learning_rate': 
1.8994961082839548e-05, 'epoch': 0.51} +2025-02-05 12:47:37 - ERROR - stderr - 17%|█▋ | 3806/22434 [2:39:57<13:00:21, 2.51s/it] +2025-02-05 12:47:40 - ERROR - stderr - 17%|█▋ | 3807/22434 [2:40:00<12:53:58, 2.49s/it] +2025-02-05 12:47:40 - ERROR - stderr - +2025-02-05 12:47:40 - ERROR - stderr - +2025-02-05 12:47:40 - INFO - stdout - {'loss': 0.8832, 'grad_norm': 1.1153533458709717, 'learning_rate': 1.899433017465763e-05, 'epoch': 0.51} +2025-02-05 12:47:40 - ERROR - stderr - 17%|█▋ | 3807/22434 [2:40:00<12:53:58, 2.49s/it] +2025-02-05 12:47:42 - ERROR - stderr - 17%|█▋ | 3808/22434 [2:40:02<12:48:53, 2.48s/it] +2025-02-05 12:47:42 - ERROR - stderr - +2025-02-05 12:47:42 - ERROR - stderr - +2025-02-05 12:47:42 - INFO - stdout - {'loss': 0.9337, 'grad_norm': 1.1506197452545166, 'learning_rate': 1.8993699078997506e-05, 'epoch': 0.51} +2025-02-05 12:47:42 - ERROR - stderr - 17%|█▋ | 3808/22434 [2:40:02<12:48:53, 2.48s/it] +2025-02-05 12:47:45 - ERROR - stderr - 17%|█▋ | 3809/22434 [2:40:04<12:42:30, 2.46s/it] +2025-02-05 12:47:45 - ERROR - stderr - +2025-02-05 12:47:45 - ERROR - stderr - +2025-02-05 12:47:45 - INFO - stdout - {'loss': 0.9719, 'grad_norm': 1.1759499311447144, 'learning_rate': 1.899306779587233e-05, 'epoch': 0.51} +2025-02-05 12:47:45 - ERROR - stderr - 17%|█▋ | 3809/22434 [2:40:04<12:42:30, 2.46s/it] +2025-02-05 12:47:47 - ERROR - stderr - 17%|█▋ | 3810/22434 [2:40:07<12:55:59, 2.50s/it] +2025-02-05 12:47:47 - ERROR - stderr - +2025-02-05 12:47:47 - ERROR - stderr - +2025-02-05 12:47:47 - INFO - stdout - {'loss': 0.8986, 'grad_norm': 0.9596386551856995, 'learning_rate': 1.8992436325295258e-05, 'epoch': 0.51} +2025-02-05 12:47:47 - ERROR - stderr - 17%|█▋ | 3810/22434 [2:40:07<12:55:59, 2.50s/it] +2025-02-05 12:47:50 - ERROR - stderr - 17%|█▋ | 3811/22434 [2:40:09<12:53:33, 2.49s/it] +2025-02-05 12:47:50 - ERROR - stderr - +2025-02-05 12:47:50 - ERROR - stderr - +2025-02-05 12:47:50 - INFO - stdout - {'loss': 1.002, 'grad_norm': 1.219245195388794, 
'learning_rate': 1.8991804667279455e-05, 'epoch': 0.51} +2025-02-05 12:47:50 - ERROR - stderr - 17%|█▋ | 3811/22434 [2:40:09<12:53:33, 2.49s/it] +2025-02-05 12:47:52 - ERROR - stderr - 17%|█▋ | 3812/22434 [2:40:12<12:52:09, 2.49s/it] +2025-02-05 12:47:52 - ERROR - stderr - +2025-02-05 12:47:52 - ERROR - stderr - +2025-02-05 12:47:52 - INFO - stdout - {'loss': 0.9283, 'grad_norm': 1.2183505296707153, 'learning_rate': 1.8991172821838093e-05, 'epoch': 0.51} +2025-02-05 12:47:52 - ERROR - stderr - 17%|█▋ | 3812/22434 [2:40:12<12:52:09, 2.49s/it] +2025-02-05 12:47:55 - ERROR - stderr - 17%|█▋ | 3813/22434 [2:40:14<12:45:17, 2.47s/it] +2025-02-05 12:47:55 - ERROR - stderr - +2025-02-05 12:47:55 - ERROR - stderr - +2025-02-05 12:47:55 - INFO - stdout - {'loss': 1.0332, 'grad_norm': 1.310115098953247, 'learning_rate': 1.8990540788984336e-05, 'epoch': 0.51} +2025-02-05 12:47:55 - ERROR - stderr - 17%|█▋ | 3813/22434 [2:40:14<12:45:17, 2.47s/it] +2025-02-05 12:47:57 - ERROR - stderr - 17%|█▋ | 3814/22434 [2:40:17<12:48:00, 2.47s/it] +2025-02-05 12:47:57 - ERROR - stderr - +2025-02-05 12:47:57 - ERROR - stderr - +2025-02-05 12:47:57 - INFO - stdout - {'loss': 0.9784, 'grad_norm': 1.111986517906189, 'learning_rate': 1.8989908568731356e-05, 'epoch': 0.51} +2025-02-05 12:47:57 - ERROR - stderr - 17%|█▋ | 3814/22434 [2:40:17<12:48:00, 2.47s/it] +2025-02-05 12:48:00 - ERROR - stderr - 17%|█▋ | 3815/22434 [2:40:19<12:50:14, 2.48s/it] +2025-02-05 12:48:00 - ERROR - stderr - +2025-02-05 12:48:00 - ERROR - stderr - +2025-02-05 12:48:00 - INFO - stdout - {'loss': 1.1051, 'grad_norm': 1.2221065759658813, 'learning_rate': 1.8989276161092337e-05, 'epoch': 0.51} +2025-02-05 12:48:00 - ERROR - stderr - 17%|█▋ | 3815/22434 [2:40:19<12:50:14, 2.48s/it] +2025-02-05 12:48:02 - ERROR - stderr - 17%|█▋ | 3816/22434 [2:40:22<12:45:41, 2.47s/it] +2025-02-05 12:48:02 - ERROR - stderr - +2025-02-05 12:48:02 - ERROR - stderr - +2025-02-05 12:48:02 - INFO - stdout - {'loss': 0.9523, 'grad_norm': 
1.1372051239013672, 'learning_rate': 1.898864356608046e-05, 'epoch': 0.51} +2025-02-05 12:48:02 - ERROR - stderr - 17%|█▋ | 3816/22434 [2:40:22<12:45:41, 2.47s/it] +2025-02-05 12:48:05 - ERROR - stderr - 17%|█▋ | 3817/22434 [2:40:24<12:56:00, 2.50s/it] +2025-02-05 12:48:05 - ERROR - stderr - +2025-02-05 12:48:05 - ERROR - stderr - +2025-02-05 12:48:05 - INFO - stdout - {'loss': 1.05, 'grad_norm': 1.1853171586990356, 'learning_rate': 1.8988010783708906e-05, 'epoch': 0.51} +2025-02-05 12:48:05 - ERROR - stderr - 17%|█▋ | 3817/22434 [2:40:24<12:56:00, 2.50s/it] +2025-02-05 12:48:07 - ERROR - stderr - 17%|█▋ | 3818/22434 [2:40:27<12:52:39, 2.49s/it] +2025-02-05 12:48:07 - ERROR - stderr - +2025-02-05 12:48:07 - ERROR - stderr - +2025-02-05 12:48:07 - INFO - stdout - {'loss': 1.0411, 'grad_norm': 1.1441550254821777, 'learning_rate': 1.8987377813990867e-05, 'epoch': 0.51} +2025-02-05 12:48:07 - ERROR - stderr - 17%|█▋ | 3818/22434 [2:40:27<12:52:39, 2.49s/it] +2025-02-05 12:48:10 - ERROR - stderr - 17%|█▋ | 3819/22434 [2:40:30<13:38:46, 2.64s/it] +2025-02-05 12:48:10 - ERROR - stderr - +2025-02-05 12:48:10 - ERROR - stderr - +2025-02-05 12:48:10 - INFO - stdout - {'loss': 0.9104, 'grad_norm': 1.1583088636398315, 'learning_rate': 1.898674465693954e-05, 'epoch': 0.51} +2025-02-05 12:48:10 - ERROR - stderr - 17%|█▋ | 3819/22434 [2:40:30<13:38:46, 2.64s/it] +2025-02-05 12:48:13 - ERROR - stderr - 17%|█▋ | 3820/22434 [2:40:32<13:39:20, 2.64s/it] +2025-02-05 12:48:13 - ERROR - stderr - +2025-02-05 12:48:13 - ERROR - stderr - +2025-02-05 12:48:13 - INFO - stdout - {'loss': 0.9962, 'grad_norm': 1.1596380472183228, 'learning_rate': 1.8986111312568118e-05, 'epoch': 0.51} +2025-02-05 12:48:13 - ERROR - stderr - 17%|█▋ | 3820/22434 [2:40:32<13:39:20, 2.64s/it] +2025-02-05 12:48:15 - ERROR - stderr - 17%|█▋ | 3821/22434 [2:40:35<13:26:46, 2.60s/it] +2025-02-05 12:48:15 - ERROR - stderr - +2025-02-05 12:48:15 - ERROR - stderr - +2025-02-05 12:48:15 - INFO - stdout - {'loss': 0.9593, 
'grad_norm': 1.1127177476882935, 'learning_rate': 1.8985477780889808e-05, 'epoch': 0.51} +2025-02-05 12:48:15 - ERROR - stderr - 17%|█▋ | 3821/22434 [2:40:35<13:26:46, 2.60s/it] +2025-02-05 12:48:18 - ERROR - stderr - 17%|█▋ | 3822/22434 [2:40:38<13:24:44, 2.59s/it] +2025-02-05 12:48:18 - ERROR - stderr - +2025-02-05 12:48:18 - ERROR - stderr - +2025-02-05 12:48:18 - INFO - stdout - {'loss': 0.9177, 'grad_norm': 1.0773991346359253, 'learning_rate': 1.8984844061917805e-05, 'epoch': 0.51} +2025-02-05 12:48:18 - ERROR - stderr - 17%|█▋ | 3822/22434 [2:40:38<13:24:44, 2.59s/it] +2025-02-05 12:48:20 - ERROR - stderr - 17%|█▋ | 3823/22434 [2:40:40<13:22:07, 2.59s/it] +2025-02-05 12:48:20 - ERROR - stderr - +2025-02-05 12:48:20 - ERROR - stderr - +2025-02-05 12:48:20 - INFO - stdout - {'loss': 1.0687, 'grad_norm': 1.1321998834609985, 'learning_rate': 1.898421015566533e-05, 'epoch': 0.51} +2025-02-05 12:48:20 - ERROR - stderr - 17%|█▋ | 3823/22434 [2:40:40<13:22:07, 2.59s/it] +2025-02-05 12:48:23 - ERROR - stderr - 17%|█▋ | 3824/22434 [2:40:43<13:14:27, 2.56s/it] +2025-02-05 12:48:23 - ERROR - stderr - +2025-02-05 12:48:23 - ERROR - stderr - +2025-02-05 12:48:23 - INFO - stdout - {'loss': 1.16, 'grad_norm': 1.1707265377044678, 'learning_rate': 1.8983576062145594e-05, 'epoch': 0.51} +2025-02-05 12:48:23 - ERROR - stderr - 17%|█▋ | 3824/22434 [2:40:43<13:14:27, 2.56s/it] +2025-02-05 12:48:25 - ERROR - stderr - 17%|█▋ | 3825/22434 [2:40:45<13:05:07, 2.53s/it] +2025-02-05 12:48:25 - ERROR - stderr - +2025-02-05 12:48:25 - ERROR - stderr - +2025-02-05 12:48:25 - INFO - stdout - {'loss': 0.8821, 'grad_norm': 1.043716311454773, 'learning_rate': 1.8982941781371807e-05, 'epoch': 0.51} +2025-02-05 12:48:25 - ERROR - stderr - 17%|█▋ | 3825/22434 [2:40:45<13:05:07, 2.53s/it] +2025-02-05 12:48:28 - ERROR - stderr - 17%|█▋ | 3826/22434 [2:40:48<13:01:23, 2.52s/it] +2025-02-05 12:48:28 - ERROR - stderr - +2025-02-05 12:48:28 - ERROR - stderr - +2025-02-05 12:48:28 - INFO - stdout - 
{'loss': 0.891, 'grad_norm': 1.0262171030044556, 'learning_rate': 1.8982307313357195e-05, 'epoch': 0.51} +2025-02-05 12:48:28 - ERROR - stderr - 17%|█▋ | 3826/22434 [2:40:48<13:01:23, 2.52s/it] +2025-02-05 12:48:30 - ERROR - stderr - 17%|█▋ | 3827/22434 [2:40:50<13:04:20, 2.53s/it] +2025-02-05 12:48:30 - ERROR - stderr - +2025-02-05 12:48:30 - ERROR - stderr - +2025-02-05 12:48:30 - INFO - stdout - {'loss': 0.9601, 'grad_norm': 1.095699429512024, 'learning_rate': 1.8981672658114983e-05, 'epoch': 0.51} +2025-02-05 12:48:30 - ERROR - stderr - 17%|█▋ | 3827/22434 [2:40:50<13:04:20, 2.53s/it] +2025-02-05 12:48:33 - ERROR - stderr - 17%|█▋ | 3828/22434 [2:40:53<12:57:02, 2.51s/it] +2025-02-05 12:48:33 - ERROR - stderr - +2025-02-05 12:48:33 - ERROR - stderr - +2025-02-05 12:48:33 - INFO - stdout - {'loss': 0.9624, 'grad_norm': 1.1988531351089478, 'learning_rate': 1.8981037815658398e-05, 'epoch': 0.51} +2025-02-05 12:48:33 - ERROR - stderr - 17%|█▋ | 3828/22434 [2:40:53<12:57:02, 2.51s/it] +2025-02-05 12:48:35 - ERROR - stderr - 17%|█▋ | 3829/22434 [2:40:55<12:51:03, 2.49s/it] +2025-02-05 12:48:35 - ERROR - stderr - +2025-02-05 12:48:35 - ERROR - stderr - +2025-02-05 12:48:35 - INFO - stdout - {'loss': 1.0078, 'grad_norm': 1.1508054733276367, 'learning_rate': 1.8980402786000677e-05, 'epoch': 0.51} +2025-02-05 12:48:35 - ERROR - stderr - 17%|█▋ | 3829/22434 [2:40:55<12:51:03, 2.49s/it] +2025-02-05 12:48:38 - ERROR - stderr - 17%|█▋ | 3830/22434 [2:40:58<12:53:24, 2.49s/it] +2025-02-05 12:48:38 - ERROR - stderr - +2025-02-05 12:48:38 - ERROR - stderr - +2025-02-05 12:48:38 - INFO - stdout - {'loss': 1.0322, 'grad_norm': 1.093957543373108, 'learning_rate': 1.8979767569155048e-05, 'epoch': 0.51} +2025-02-05 12:48:38 - ERROR - stderr - 17%|█▋ | 3830/22434 [2:40:58<12:53:24, 2.49s/it] +2025-02-05 12:48:40 - ERROR - stderr - 17%|█▋ | 3831/22434 [2:41:00<12:50:47, 2.49s/it] +2025-02-05 12:48:40 - ERROR - stderr - +2025-02-05 12:48:40 - ERROR - stderr - +2025-02-05 12:48:40 - 
INFO - stdout - {'loss': 0.8891, 'grad_norm': 1.0881657600402832, 'learning_rate': 1.897913216513476e-05, 'epoch': 0.51} +2025-02-05 12:48:40 - ERROR - stderr - 17%|█▋ | 3831/22434 [2:41:00<12:50:47, 2.49s/it] +2025-02-05 12:48:43 - ERROR - stderr - 17%|█▋ | 3832/22434 [2:41:02<12:48:17, 2.48s/it] +2025-02-05 12:48:43 - ERROR - stderr - +2025-02-05 12:48:43 - ERROR - stderr - +2025-02-05 12:48:43 - INFO - stdout - {'loss': 0.87, 'grad_norm': 1.05989670753479, 'learning_rate': 1.8978496573953052e-05, 'epoch': 0.51} +2025-02-05 12:48:43 - ERROR - stderr - 17%|█▋ | 3832/22434 [2:41:02<12:48:17, 2.48s/it] +2025-02-05 12:48:45 - ERROR - stderr - 17%|█▋ | 3833/22434 [2:41:05<12:54:14, 2.50s/it] +2025-02-05 12:48:45 - ERROR - stderr - +2025-02-05 12:48:45 - ERROR - stderr - +2025-02-05 12:48:45 - INFO - stdout - {'loss': 0.8853, 'grad_norm': 1.113887906074524, 'learning_rate': 1.8977860795623178e-05, 'epoch': 0.51} +2025-02-05 12:48:45 - ERROR - stderr - 17%|█▋ | 3833/22434 [2:41:05<12:54:14, 2.50s/it] +2025-02-05 12:48:48 - ERROR - stderr - 17%|█▋ | 3834/22434 [2:41:07<12:53:32, 2.50s/it] +2025-02-05 12:48:48 - ERROR - stderr - +2025-02-05 12:48:48 - ERROR - stderr - +2025-02-05 12:48:48 - INFO - stdout - {'loss': 1.051, 'grad_norm': 1.0479919910430908, 'learning_rate': 1.897722483015838e-05, 'epoch': 0.51} +2025-02-05 12:48:48 - ERROR - stderr - 17%|█▋ | 3834/22434 [2:41:08<12:53:32, 2.50s/it] +2025-02-05 12:48:50 - ERROR - stderr - 17%|█▋ | 3835/22434 [2:41:10<12:55:53, 2.50s/it] +2025-02-05 12:48:50 - ERROR - stderr - +2025-02-05 12:48:50 - ERROR - stderr - +2025-02-05 12:48:50 - INFO - stdout - {'loss': 1.0021, 'grad_norm': 1.0667577981948853, 'learning_rate': 1.897658867757193e-05, 'epoch': 0.51} +2025-02-05 12:48:50 - ERROR - stderr - 17%|█▋ | 3835/22434 [2:41:10<12:55:53, 2.50s/it] +2025-02-05 12:48:53 - ERROR - stderr - 17%|█▋ | 3836/22434 [2:41:13<13:23:59, 2.59s/it] +2025-02-05 12:48:53 - ERROR - stderr - +2025-02-05 12:48:53 - ERROR - stderr - +2025-02-05 
12:48:53 - INFO - stdout - {'loss': 0.9569, 'grad_norm': 1.096814513206482, 'learning_rate': 1.897595233787707e-05, 'epoch': 0.51} +2025-02-05 12:48:53 - ERROR - stderr - 17%|█▋ | 3836/22434 [2:41:13<13:23:59, 2.59s/it] +2025-02-05 12:48:55 - ERROR - stderr - 17%|█▋ | 3837/22434 [2:41:15<13:11:28, 2.55s/it] +2025-02-05 12:48:56 - ERROR - stderr - +2025-02-05 12:48:56 - ERROR - stderr - +2025-02-05 12:48:56 - INFO - stdout - {'loss': 0.9629, 'grad_norm': 1.0351325273513794, 'learning_rate': 1.8975315811087077e-05, 'epoch': 0.51} +2025-02-05 12:48:56 - ERROR - stderr - 17%|█▋ | 3837/22434 [2:41:15<13:11:28, 2.55s/it] +2025-02-05 12:48:58 - ERROR - stderr - 17%|█▋ | 3838/22434 [2:41:18<13:10:41, 2.55s/it] +2025-02-05 12:48:58 - ERROR - stderr - +2025-02-05 12:48:58 - ERROR - stderr - +2025-02-05 12:48:58 - INFO - stdout - {'loss': 0.9604, 'grad_norm': 1.2123996019363403, 'learning_rate': 1.8974679097215214e-05, 'epoch': 0.51} +2025-02-05 12:48:58 - ERROR - stderr - 17%|█▋ | 3838/22434 [2:41:18<13:10:41, 2.55s/it] +2025-02-05 12:49:00 - ERROR - stderr - 17%|█▋ | 3839/22434 [2:41:20<13:01:39, 2.52s/it] +2025-02-05 12:49:01 - ERROR - stderr - +2025-02-05 12:49:01 - ERROR - stderr - +2025-02-05 12:49:01 - INFO - stdout - {'loss': 0.8971, 'grad_norm': 1.076482892036438, 'learning_rate': 1.8974042196274752e-05, 'epoch': 0.51} +2025-02-05 12:49:01 - ERROR - stderr - 17%|█▋ | 3839/22434 [2:41:20<13:01:39, 2.52s/it] +2025-02-05 12:49:03 - ERROR - stderr - 17%|█▋ | 3840/22434 [2:41:23<13:17:09, 2.57s/it] +2025-02-05 12:49:03 - ERROR - stderr - +2025-02-05 12:49:03 - ERROR - stderr - +2025-02-05 12:49:03 - INFO - stdout - {'loss': 0.959, 'grad_norm': 1.0794481039047241, 'learning_rate': 1.8973405108278967e-05, 'epoch': 0.51} +2025-02-05 12:49:03 - ERROR - stderr - 17%|█▋ | 3840/22434 [2:41:23<13:17:09, 2.57s/it] +2025-02-05 12:49:06 - ERROR - stderr - 17%|█▋ | 3841/22434 [2:41:25<13:12:40, 2.56s/it] +2025-02-05 12:49:06 - ERROR - stderr - +2025-02-05 12:49:06 - ERROR - stderr - 
+2025-02-05 12:49:06 - INFO - stdout - {'loss': 1.0548, 'grad_norm': 1.1245522499084473, 'learning_rate': 1.8972767833241142e-05, 'epoch': 0.51} +2025-02-05 12:49:06 - ERROR - stderr - 17%|█▋ | 3841/22434 [2:41:26<13:12:40, 2.56s/it] +2025-02-05 12:49:08 - ERROR - stderr - 17%|█▋ | 3842/22434 [2:41:28<13:04:01, 2.53s/it] +2025-02-05 12:49:08 - ERROR - stderr - +2025-02-05 12:49:08 - ERROR - stderr - +2025-02-05 12:49:08 - INFO - stdout - {'loss': 0.9976, 'grad_norm': 1.0636851787567139, 'learning_rate': 1.8972130371174557e-05, 'epoch': 0.51} +2025-02-05 12:49:08 - ERROR - stderr - 17%|█▋ | 3842/22434 [2:41:28<13:04:01, 2.53s/it] +2025-02-05 12:49:11 - ERROR - stderr - 17%|█▋ | 3843/22434 [2:41:30<12:58:25, 2.51s/it] +2025-02-05 12:49:11 - ERROR - stderr - +2025-02-05 12:49:11 - ERROR - stderr - +2025-02-05 12:49:11 - INFO - stdout - {'loss': 0.936, 'grad_norm': 1.137702226638794, 'learning_rate': 1.89714927220925e-05, 'epoch': 0.51} +2025-02-05 12:49:11 - ERROR - stderr - 17%|█▋ | 3843/22434 [2:41:30<12:58:25, 2.51s/it] +2025-02-05 12:49:13 - ERROR - stderr - 17%|█▋ | 3844/22434 [2:41:33<12:52:06, 2.49s/it] +2025-02-05 12:49:13 - ERROR - stderr - +2025-02-05 12:49:13 - ERROR - stderr - +2025-02-05 12:49:13 - INFO - stdout - {'loss': 0.9398, 'grad_norm': 1.1484642028808594, 'learning_rate': 1.897085488600826e-05, 'epoch': 0.51} +2025-02-05 12:49:13 - ERROR - stderr - 17%|█▋ | 3844/22434 [2:41:33<12:52:06, 2.49s/it] +2025-02-05 12:49:16 - ERROR - stderr - 17%|█▋ | 3845/22434 [2:41:35<12:50:11, 2.49s/it] +2025-02-05 12:49:16 - ERROR - stderr - +2025-02-05 12:49:16 - ERROR - stderr - +2025-02-05 12:49:16 - INFO - stdout - {'loss': 1.113, 'grad_norm': 1.1954537630081177, 'learning_rate': 1.8970216862935134e-05, 'epoch': 0.51} +2025-02-05 12:49:16 - ERROR - stderr - 17%|█▋ | 3845/22434 [2:41:35<12:50:11, 2.49s/it] +2025-02-05 12:49:18 - ERROR - stderr - 17%|█▋ | 3846/22434 [2:41:38<12:44:28, 2.47s/it] +2025-02-05 12:49:18 - ERROR - stderr - +2025-02-05 12:49:18 - ERROR - 
stderr - +2025-02-05 12:49:18 - INFO - stdout - {'loss': 0.9846, 'grad_norm': 1.1452709436416626, 'learning_rate': 1.896957865288642e-05, 'epoch': 0.51} +2025-02-05 12:49:18 - ERROR - stderr - 17%|█▋ | 3846/22434 [2:41:38<12:44:28, 2.47s/it] +2025-02-05 12:49:21 - ERROR - stderr - 17%|█▋ | 3847/22434 [2:41:40<12:50:20, 2.49s/it] +2025-02-05 12:49:21 - ERROR - stderr - +2025-02-05 12:49:21 - ERROR - stderr - +2025-02-05 12:49:21 - INFO - stdout - {'loss': 1.0308, 'grad_norm': 1.0287585258483887, 'learning_rate': 1.8968940255875426e-05, 'epoch': 0.51} +2025-02-05 12:49:21 - ERROR - stderr - 17%|█▋ | 3847/22434 [2:41:40<12:50:20, 2.49s/it] +2025-02-05 12:49:23 - ERROR - stderr - 17%|█▋ | 3848/22434 [2:41:43<12:43:37, 2.47s/it] +2025-02-05 12:49:23 - ERROR - stderr - +2025-02-05 12:49:23 - ERROR - stderr - +2025-02-05 12:49:23 - INFO - stdout - {'loss': 1.0187, 'grad_norm': 1.0327305793762207, 'learning_rate': 1.8968301671915454e-05, 'epoch': 0.51} +2025-02-05 12:49:23 - ERROR - stderr - 17%|█▋ | 3848/22434 [2:41:43<12:43:37, 2.47s/it] +2025-02-05 12:49:25 - ERROR - stderr - 17%|█▋ | 3849/22434 [2:41:45<12:46:09, 2.47s/it] +2025-02-05 12:49:25 - ERROR - stderr - +2025-02-05 12:49:25 - ERROR - stderr - +2025-02-05 12:49:25 - INFO - stdout - {'loss': 1.072, 'grad_norm': 0.9759637117385864, 'learning_rate': 1.8967662901019813e-05, 'epoch': 0.51} +2025-02-05 12:49:25 - ERROR - stderr - 17%|█▋ | 3849/22434 [2:41:45<12:46:09, 2.47s/it] +2025-02-05 12:49:28 - ERROR - stderr - 17%|█▋ | 3850/22434 [2:41:48<12:52:20, 2.49s/it] +2025-02-05 12:49:28 - ERROR - stderr - +2025-02-05 12:49:28 - ERROR - stderr - +2025-02-05 12:49:28 - INFO - stdout - {'loss': 1.0316, 'grad_norm': 1.0826876163482666, 'learning_rate': 1.8967023943201818e-05, 'epoch': 0.51} +2025-02-05 12:49:28 - ERROR - stderr - 17%|█▋ | 3850/22434 [2:41:48<12:52:20, 2.49s/it] +2025-02-05 12:49:30 - ERROR - stderr - 17%|█▋ | 3851/22434 [2:41:50<12:53:59, 2.50s/it] +2025-02-05 12:49:31 - ERROR - stderr - +2025-02-05 
12:49:31 - ERROR - stderr - +2025-02-05 12:49:31 - INFO - stdout - {'loss': 0.902, 'grad_norm': 1.0880807638168335, 'learning_rate': 1.8966384798474793e-05, 'epoch': 0.51} +2025-02-05 12:49:31 - ERROR - stderr - 17%|█▋ | 3851/22434 [2:41:50<12:53:59, 2.50s/it] +2025-02-05 12:49:33 - ERROR - stderr - 17%|█▋ | 3852/22434 [2:41:53<13:24:13, 2.60s/it] +2025-02-05 12:49:33 - ERROR - stderr - +2025-02-05 12:49:33 - ERROR - stderr - +2025-02-05 12:49:33 - INFO - stdout - {'loss': 0.9522, 'grad_norm': 1.1165785789489746, 'learning_rate': 1.8965745466852055e-05, 'epoch': 0.52} +2025-02-05 12:49:33 - ERROR - stderr - 17%|█▋ | 3852/22434 [2:41:53<13:24:13, 2.60s/it] +2025-02-05 12:49:36 - ERROR - stderr - 17%|█▋ | 3853/22434 [2:41:56<13:25:57, 2.60s/it] +2025-02-05 12:49:36 - ERROR - stderr - +2025-02-05 12:49:36 - ERROR - stderr - +2025-02-05 12:49:36 - INFO - stdout - {'loss': 1.0999, 'grad_norm': 1.1486557722091675, 'learning_rate': 1.8965105948346934e-05, 'epoch': 0.52} +2025-02-05 12:49:36 - ERROR - stderr - 17%|█▋ | 3853/22434 [2:41:56<13:25:57, 2.60s/it] +2025-02-05 12:49:38 - ERROR - stderr - 17%|█▋ | 3854/22434 [2:41:58<13:17:40, 2.58s/it] +2025-02-05 12:49:38 - ERROR - stderr - +2025-02-05 12:49:38 - ERROR - stderr - +2025-02-05 12:49:38 - INFO - stdout - {'loss': 1.161, 'grad_norm': 1.2315080165863037, 'learning_rate': 1.8964466242972758e-05, 'epoch': 0.52} +2025-02-05 12:49:38 - ERROR - stderr - 17%|█▋ | 3854/22434 [2:41:58<13:17:40, 2.58s/it] +2025-02-05 12:49:41 - ERROR - stderr - 17%|█▋ | 3855/22434 [2:42:01<13:09:54, 2.55s/it] +2025-02-05 12:49:41 - ERROR - stderr - +2025-02-05 12:49:41 - ERROR - stderr - +2025-02-05 12:49:41 - INFO - stdout - {'loss': 1.0412, 'grad_norm': 1.1569840908050537, 'learning_rate': 1.896382635074286e-05, 'epoch': 0.52} +2025-02-05 12:49:41 - ERROR - stderr - 17%|█▋ | 3855/22434 [2:42:01<13:09:54, 2.55s/it] +2025-02-05 12:49:44 - ERROR - stderr - 17%|█▋ | 3856/22434 [2:42:03<13:27:41, 2.61s/it] +2025-02-05 12:49:44 - ERROR - stderr - 
+2025-02-05 12:49:44 - ERROR - stderr - +2025-02-05 12:49:44 - INFO - stdout - {'loss': 0.8947, 'grad_norm': 1.0345807075500488, 'learning_rate': 1.8963186271670578e-05, 'epoch': 0.52} +2025-02-05 12:49:44 - ERROR - stderr - 17%|█▋ | 3856/22434 [2:42:03<13:27:41, 2.61s/it] +2025-02-05 12:49:46 - ERROR - stderr - 17%|█▋ | 3857/22434 [2:42:06<13:20:14, 2.58s/it] +2025-02-05 12:49:46 - ERROR - stderr - +2025-02-05 12:49:46 - ERROR - stderr - +2025-02-05 12:49:46 - INFO - stdout - {'loss': 1.0052, 'grad_norm': 1.2175929546356201, 'learning_rate': 1.896254600576926e-05, 'epoch': 0.52} +2025-02-05 12:49:46 - ERROR - stderr - 17%|█▋ | 3857/22434 [2:42:06<13:20:14, 2.58s/it] +2025-02-05 12:49:49 - ERROR - stderr - 17%|█▋ | 3858/22434 [2:42:08<13:10:59, 2.55s/it] +2025-02-05 12:49:49 - ERROR - stderr - +2025-02-05 12:49:49 - ERROR - stderr - +2025-02-05 12:49:49 - INFO - stdout - {'loss': 0.8375, 'grad_norm': 0.9854939579963684, 'learning_rate': 1.896190555305224e-05, 'epoch': 0.52} +2025-02-05 12:49:49 - ERROR - stderr - 17%|█▋ | 3858/22434 [2:42:08<13:10:59, 2.55s/it] +2025-02-05 12:49:51 - ERROR - stderr - 17%|█▋ | 3859/22434 [2:42:11<13:04:28, 2.53s/it] +2025-02-05 12:49:51 - ERROR - stderr - +2025-02-05 12:49:51 - ERROR - stderr - +2025-02-05 12:49:51 - INFO - stdout - {'loss': 0.9554, 'grad_norm': 1.0953930616378784, 'learning_rate': 1.8961264913532876e-05, 'epoch': 0.52} +2025-02-05 12:49:51 - ERROR - stderr - 17%|█▋ | 3859/22434 [2:42:11<13:04:28, 2.53s/it] +2025-02-05 12:49:54 - ERROR - stderr - 17%|█▋ | 3860/22434 [2:42:13<12:53:55, 2.50s/it] +2025-02-05 12:49:54 - ERROR - stderr - +2025-02-05 12:49:54 - ERROR - stderr - +2025-02-05 12:49:54 - INFO - stdout - {'loss': 1.0183, 'grad_norm': 1.3187388181686401, 'learning_rate': 1.8960624087224527e-05, 'epoch': 0.52} +2025-02-05 12:49:54 - ERROR - stderr - 17%|█▋ | 3860/22434 [2:42:13<12:53:55, 2.50s/it] +2025-02-05 12:49:56 - ERROR - stderr - 17%|█▋ | 3861/22434 [2:42:16<12:58:03, 2.51s/it] +2025-02-05 12:49:56 - 
ERROR - stderr - +2025-02-05 12:49:56 - ERROR - stderr - +2025-02-05 12:49:56 - INFO - stdout - {'loss': 0.8611, 'grad_norm': 1.0151300430297852, 'learning_rate': 1.8959983074140535e-05, 'epoch': 0.52} +2025-02-05 12:49:56 - ERROR - stderr - 17%|█▋ | 3861/22434 [2:42:16<12:58:03, 2.51s/it] +2025-02-05 12:49:59 - ERROR - stderr - 17%|█▋ | 3862/22434 [2:42:18<12:53:02, 2.50s/it] +2025-02-05 12:49:59 - ERROR - stderr - +2025-02-05 12:49:59 - ERROR - stderr - +2025-02-05 12:49:59 - INFO - stdout - {'loss': 1.0728, 'grad_norm': 1.0359649658203125, 'learning_rate': 1.895934187429427e-05, 'epoch': 0.52} +2025-02-05 12:49:59 - ERROR - stderr - 17%|█▋ | 3862/22434 [2:42:18<12:53:02, 2.50s/it] +2025-02-05 12:50:01 - ERROR - stderr - 17%|█▋ | 3863/22434 [2:42:21<12:49:05, 2.48s/it] +2025-02-05 12:50:01 - ERROR - stderr - +2025-02-05 12:50:01 - ERROR - stderr - +2025-02-05 12:50:01 - INFO - stdout - {'loss': 1.1133, 'grad_norm': 1.208021879196167, 'learning_rate': 1.8958700487699103e-05, 'epoch': 0.52} +2025-02-05 12:50:01 - ERROR - stderr - 17%|█▋ | 3863/22434 [2:42:21<12:49:05, 2.48s/it] +2025-02-05 12:50:04 - ERROR - stderr - 17%|█▋ | 3864/22434 [2:42:23<12:51:34, 2.49s/it] +2025-02-05 12:50:04 - ERROR - stderr - +2025-02-05 12:50:04 - ERROR - stderr - +2025-02-05 12:50:04 - INFO - stdout - {'loss': 0.9771, 'grad_norm': 1.060405969619751, 'learning_rate': 1.8958058914368393e-05, 'epoch': 0.52} +2025-02-05 12:50:04 - ERROR - stderr - 17%|█▋ | 3864/22434 [2:42:23<12:51:34, 2.49s/it] +2025-02-05 12:50:06 - ERROR - stderr - 17%|█▋ | 3865/22434 [2:42:26<12:46:34, 2.48s/it] +2025-02-05 12:50:06 - ERROR - stderr - +2025-02-05 12:50:06 - ERROR - stderr - +2025-02-05 12:50:06 - INFO - stdout - {'loss': 0.8762, 'grad_norm': 1.0115083456039429, 'learning_rate': 1.8957417154315517e-05, 'epoch': 0.52} +2025-02-05 12:50:06 - ERROR - stderr - 17%|█▋ | 3865/22434 [2:42:26<12:46:34, 2.48s/it] +2025-02-05 12:50:09 - ERROR - stderr - 17%|█▋ | 3866/22434 [2:42:28<12:50:12, 2.49s/it] 
+2025-02-05 12:50:09 - ERROR - stderr - +2025-02-05 12:50:09 - ERROR - stderr - +2025-02-05 12:50:09 - INFO - stdout - {'loss': 1.0173, 'grad_norm': 0.9833860993385315, 'learning_rate': 1.8956775207553853e-05, 'epoch': 0.52} +2025-02-05 12:50:09 - ERROR - stderr - 17%|█▋ | 3866/22434 [2:42:28<12:50:12, 2.49s/it] +2025-02-05 12:50:11 - ERROR - stderr - 17%|█▋ | 3867/22434 [2:42:31<13:13:52, 2.57s/it] +2025-02-05 12:50:11 - ERROR - stderr - +2025-02-05 12:50:11 - ERROR - stderr - +2025-02-05 12:50:11 - INFO - stdout - {'loss': 1.106, 'grad_norm': 1.0816105604171753, 'learning_rate': 1.895613307409678e-05, 'epoch': 0.52} +2025-02-05 12:50:11 - ERROR - stderr - 17%|█▋ | 3867/22434 [2:42:31<13:13:52, 2.57s/it] +2025-02-05 12:50:14 - ERROR - stderr - 17%|█▋ | 3868/22434 [2:42:34<13:07:19, 2.54s/it] +2025-02-05 12:50:14 - ERROR - stderr - +2025-02-05 12:50:14 - ERROR - stderr - +2025-02-05 12:50:14 - INFO - stdout - {'loss': 0.9081, 'grad_norm': 1.2732880115509033, 'learning_rate': 1.8955490753957678e-05, 'epoch': 0.52} +2025-02-05 12:50:14 - ERROR - stderr - 17%|█▋ | 3868/22434 [2:42:34<13:07:19, 2.54s/it] +2025-02-05 12:50:16 - ERROR - stderr - 17%|█▋ | 3869/22434 [2:42:36<12:57:44, 2.51s/it] +2025-02-05 12:50:16 - ERROR - stderr - +2025-02-05 12:50:16 - ERROR - stderr - +2025-02-05 12:50:16 - INFO - stdout - {'loss': 0.9248, 'grad_norm': 1.0721365213394165, 'learning_rate': 1.8954848247149948e-05, 'epoch': 0.52} +2025-02-05 12:50:16 - ERROR - stderr - 17%|█▋ | 3869/22434 [2:42:36<12:57:44, 2.51s/it] +2025-02-05 12:50:19 - ERROR - stderr - 17%|█▋ | 3870/22434 [2:42:38<12:53:37, 2.50s/it] +2025-02-05 12:50:19 - ERROR - stderr - +2025-02-05 12:50:19 - ERROR - stderr - +2025-02-05 12:50:19 - INFO - stdout - {'loss': 0.8849, 'grad_norm': 1.0737065076828003, 'learning_rate': 1.895420555368697e-05, 'epoch': 0.52} +2025-02-05 12:50:19 - ERROR - stderr - 17%|█▋ | 3870/22434 [2:42:38<12:53:37, 2.50s/it] +2025-02-05 12:50:21 - ERROR - stderr - 17%|█▋ | 3871/22434 
[2:42:41<12:52:45, 2.50s/it] +2025-02-05 12:50:21 - ERROR - stderr - +2025-02-05 12:50:21 - ERROR - stderr - +2025-02-05 12:50:21 - INFO - stdout - {'loss': 1.0696, 'grad_norm': 1.1095443964004517, 'learning_rate': 1.895356267358215e-05, 'epoch': 0.52} +2025-02-05 12:50:21 - ERROR - stderr - 17%|█▋ | 3871/22434 [2:42:41<12:52:45, 2.50s/it] +2025-02-05 12:50:24 - ERROR - stderr - 17%|█▋ | 3872/22434 [2:42:43<12:57:23, 2.51s/it] +2025-02-05 12:50:24 - ERROR - stderr - +2025-02-05 12:50:24 - ERROR - stderr - +2025-02-05 12:50:24 - INFO - stdout - {'loss': 1.0665, 'grad_norm': 1.0778032541275024, 'learning_rate': 1.8952919606848882e-05, 'epoch': 0.52} +2025-02-05 12:50:24 - ERROR - stderr - 17%|█▋ | 3872/22434 [2:42:44<12:57:23, 2.51s/it] +2025-02-05 12:50:26 - ERROR - stderr - 17%|█▋ | 3873/22434 [2:42:46<12:54:22, 2.50s/it] +2025-02-05 12:50:26 - ERROR - stderr - +2025-02-05 12:50:26 - ERROR - stderr - +2025-02-05 12:50:26 - INFO - stdout - {'loss': 1.0059, 'grad_norm': 1.0335626602172852, 'learning_rate': 1.895227635350057e-05, 'epoch': 0.52} +2025-02-05 12:50:26 - ERROR - stderr - 17%|█▋ | 3873/22434 [2:42:46<12:54:22, 2.50s/it] +2025-02-05 12:50:29 - ERROR - stderr - 17%|█▋ | 3874/22434 [2:42:49<13:00:52, 2.52s/it] +2025-02-05 12:50:29 - ERROR - stderr - +2025-02-05 12:50:29 - ERROR - stderr - +2025-02-05 12:50:29 - INFO - stdout - {'loss': 0.9658, 'grad_norm': 1.0494685173034668, 'learning_rate': 1.8951632913550625e-05, 'epoch': 0.52} +2025-02-05 12:50:29 - ERROR - stderr - 17%|█▋ | 3874/22434 [2:42:49<13:00:52, 2.52s/it] +2025-02-05 12:50:31 - ERROR - stderr - 17%|█▋ | 3875/22434 [2:42:51<12:54:05, 2.50s/it] +2025-02-05 12:50:31 - ERROR - stderr - +2025-02-05 12:50:31 - ERROR - stderr - +2025-02-05 12:50:31 - INFO - stdout - {'loss': 0.9021, 'grad_norm': 1.127772331237793, 'learning_rate': 1.8950989287012457e-05, 'epoch': 0.52} +2025-02-05 12:50:31 - ERROR - stderr - 17%|█▋ | 3875/22434 [2:42:51<12:54:05, 2.50s/it] +2025-02-05 12:50:34 - ERROR - stderr - 17%|█▋ 
| 3876/22434 [2:42:53<12:50:24, 2.49s/it] +2025-02-05 12:50:34 - ERROR - stderr - +2025-02-05 12:50:34 - ERROR - stderr - +2025-02-05 12:50:34 - INFO - stdout - {'loss': 1.0194, 'grad_norm': 1.1766126155853271, 'learning_rate': 1.8950345473899484e-05, 'epoch': 0.52} +2025-02-05 12:50:34 - ERROR - stderr - 17%|█▋ | 3876/22434 [2:42:53<12:50:24, 2.49s/it] +2025-02-05 12:50:36 - ERROR - stderr - 17%|█▋ | 3877/22434 [2:42:56<12:51:28, 2.49s/it] +2025-02-05 12:50:36 - ERROR - stderr - +2025-02-05 12:50:36 - ERROR - stderr - +2025-02-05 12:50:36 - INFO - stdout - {'loss': 0.9546, 'grad_norm': 1.1021366119384766, 'learning_rate': 1.8949701474225123e-05, 'epoch': 0.52} +2025-02-05 12:50:36 - ERROR - stderr - 17%|█▋ | 3877/22434 [2:42:56<12:51:28, 2.49s/it] +2025-02-05 12:50:39 - ERROR - stderr - 17%|█▋ | 3878/22434 [2:42:58<12:51:02, 2.49s/it] +2025-02-05 12:50:39 - ERROR - stderr - +2025-02-05 12:50:39 - ERROR - stderr - +2025-02-05 12:50:39 - INFO - stdout - {'loss': 0.9591, 'grad_norm': 1.020088791847229, 'learning_rate': 1.89490572880028e-05, 'epoch': 0.52} +2025-02-05 12:50:39 - ERROR - stderr - 17%|█▋ | 3878/22434 [2:42:58<12:51:02, 2.49s/it] +2025-02-05 12:50:41 - ERROR - stderr - 17%|█▋ | 3879/22434 [2:43:01<12:48:03, 2.48s/it] +2025-02-05 12:50:41 - ERROR - stderr - +2025-02-05 12:50:41 - ERROR - stderr - +2025-02-05 12:50:41 - INFO - stdout - {'loss': 0.9431, 'grad_norm': 1.1151494979858398, 'learning_rate': 1.894841291524594e-05, 'epoch': 0.52} +2025-02-05 12:50:41 - ERROR - stderr - 17%|█▋ | 3879/22434 [2:43:01<12:48:03, 2.48s/it] +2025-02-05 12:50:44 - ERROR - stderr - 17%|█▋ | 3880/22434 [2:43:03<12:48:26, 2.48s/it] +2025-02-05 12:50:44 - ERROR - stderr - +2025-02-05 12:50:44 - ERROR - stderr - +2025-02-05 12:50:44 - INFO - stdout - {'loss': 1.0015, 'grad_norm': 1.071023941040039, 'learning_rate': 1.8947768355967975e-05, 'epoch': 0.52} +2025-02-05 12:50:44 - ERROR - stderr - 17%|█▋ | 3880/22434 [2:43:03<12:48:26, 2.48s/it] +2025-02-05 12:50:46 - ERROR - 
stderr - 17%|█▋ | 3881/22434 [2:43:06<12:54:02, 2.50s/it] +2025-02-05 12:50:46 - ERROR - stderr - +2025-02-05 12:50:46 - ERROR - stderr - +2025-02-05 12:50:46 - INFO - stdout - {'loss': 1.0225, 'grad_norm': 0.9775139093399048, 'learning_rate': 1.8947123610182342e-05, 'epoch': 0.52} +2025-02-05 12:50:46 - ERROR - stderr - 17%|█▋ | 3881/22434 [2:43:06<12:54:02, 2.50s/it] +2025-02-05 12:50:49 - ERROR - stderr - 17%|█▋ | 3882/22434 [2:43:08<12:52:24, 2.50s/it] +2025-02-05 12:50:49 - ERROR - stderr - +2025-02-05 12:50:49 - ERROR - stderr - +2025-02-05 12:50:49 - INFO - stdout - {'loss': 0.8393, 'grad_norm': 1.0321720838546753, 'learning_rate': 1.894647867790248e-05, 'epoch': 0.52} +2025-02-05 12:50:49 - ERROR - stderr - 17%|█▋ | 3882/22434 [2:43:08<12:52:24, 2.50s/it] +2025-02-05 12:50:51 - ERROR - stderr - 17%|█▋ | 3883/22434 [2:43:11<12:52:55, 2.50s/it] +2025-02-05 12:50:51 - ERROR - stderr - +2025-02-05 12:50:51 - ERROR - stderr - +2025-02-05 12:50:51 - INFO - stdout - {'loss': 1.0616, 'grad_norm': 1.1958746910095215, 'learning_rate': 1.8945833559141825e-05, 'epoch': 0.52} +2025-02-05 12:50:51 - ERROR - stderr - 17%|█▋ | 3883/22434 [2:43:11<12:52:55, 2.50s/it] +2025-02-05 12:50:54 - ERROR - stderr - 17%|█▋ | 3884/22434 [2:43:13<12:52:19, 2.50s/it] +2025-02-05 12:50:54 - ERROR - stderr - +2025-02-05 12:50:54 - ERROR - stderr - +2025-02-05 12:50:54 - INFO - stdout - {'loss': 1.1117, 'grad_norm': 1.1184295415878296, 'learning_rate': 1.8945188253913837e-05, 'epoch': 0.52} +2025-02-05 12:50:54 - ERROR - stderr - 17%|█▋ | 3884/22434 [2:43:13<12:52:19, 2.50s/it] +2025-02-05 12:50:56 - ERROR - stderr - 17%|█▋ | 3885/22434 [2:43:16<12:50:44, 2.49s/it] +2025-02-05 12:50:56 - ERROR - stderr - +2025-02-05 12:50:56 - ERROR - stderr - +2025-02-05 12:50:56 - INFO - stdout - {'loss': 0.9481, 'grad_norm': 1.1438792943954468, 'learning_rate': 1.8944542762231955e-05, 'epoch': 0.52} +2025-02-05 12:50:56 - ERROR - stderr - 17%|█▋ | 3885/22434 [2:43:16<12:50:44, 2.49s/it] +2025-02-05 
12:50:59 - ERROR - stderr - 17%|█▋ | 3886/22434 [2:43:19<13:14:45, 2.57s/it] +2025-02-05 12:50:59 - ERROR - stderr - +2025-02-05 12:50:59 - ERROR - stderr - +2025-02-05 12:50:59 - INFO - stdout - {'loss': 0.8317, 'grad_norm': 1.1248042583465576, 'learning_rate': 1.8943897084109638e-05, 'epoch': 0.52} +2025-02-05 12:50:59 - ERROR - stderr - 17%|█▋ | 3886/22434 [2:43:19<13:14:45, 2.57s/it] +2025-02-05 12:51:01 - ERROR - stderr - 17%|█▋ | 3887/22434 [2:43:21<13:04:47, 2.54s/it] +2025-02-05 12:51:01 - ERROR - stderr - +2025-02-05 12:51:01 - ERROR - stderr - +2025-02-05 12:51:01 - INFO - stdout - {'loss': 1.034, 'grad_norm': 1.2589408159255981, 'learning_rate': 1.8943251219560347e-05, 'epoch': 0.52} +2025-02-05 12:51:01 - ERROR - stderr - 17%|█▋ | 3887/22434 [2:43:21<13:04:47, 2.54s/it] +2025-02-05 12:51:04 - ERROR - stderr - 17%|█▋ | 3888/22434 [2:43:24<13:02:49, 2.53s/it] +2025-02-05 12:51:04 - ERROR - stderr - +2025-02-05 12:51:04 - ERROR - stderr - +2025-02-05 12:51:04 - INFO - stdout - {'loss': 0.9552, 'grad_norm': 1.0450526475906372, 'learning_rate': 1.8942605168597542e-05, 'epoch': 0.52} +2025-02-05 12:51:04 - ERROR - stderr - 17%|█▋ | 3888/22434 [2:43:24<13:02:49, 2.53s/it] +2025-02-05 12:51:06 - ERROR - stderr - 17%|█▋ | 3889/22434 [2:43:26<12:52:42, 2.50s/it] +2025-02-05 12:51:06 - ERROR - stderr - +2025-02-05 12:51:06 - ERROR - stderr - +2025-02-05 12:51:06 - INFO - stdout - {'loss': 0.9432, 'grad_norm': 1.2703546285629272, 'learning_rate': 1.894195893123469e-05, 'epoch': 0.52} +2025-02-05 12:51:06 - ERROR - stderr - 17%|█▋ | 3889/22434 [2:43:26<12:52:42, 2.50s/it] +2025-02-05 12:51:09 - ERROR - stderr - 17%|█▋ | 3890/22434 [2:43:29<13:13:20, 2.57s/it] +2025-02-05 12:51:09 - ERROR - stderr - +2025-02-05 12:51:09 - ERROR - stderr - +2025-02-05 12:51:09 - INFO - stdout - {'loss': 1.0503, 'grad_norm': 1.1543128490447998, 'learning_rate': 1.894131250748526e-05, 'epoch': 0.52} +2025-02-05 12:51:09 - ERROR - stderr - 17%|█▋ | 3890/22434 [2:43:29<13:13:20, 2.57s/it] 
+2025-02-05 12:51:12 - ERROR - stderr - 17%|█▋ | 3891/22434 [2:43:31<13:18:21, 2.58s/it] +2025-02-05 12:51:12 - ERROR - stderr - +2025-02-05 12:51:12 - ERROR - stderr - +2025-02-05 12:51:12 - INFO - stdout - {'loss': 0.8552, 'grad_norm': 1.1224424839019775, 'learning_rate': 1.8940665897362724e-05, 'epoch': 0.52} +2025-02-05 12:51:12 - ERROR - stderr - 17%|█▋ | 3891/22434 [2:43:31<13:18:21, 2.58s/it] +2025-02-05 12:51:14 - ERROR - stderr - 17%|█▋ | 3892/22434 [2:43:34<13:02:31, 2.53s/it] +2025-02-05 12:51:14 - ERROR - stderr - +2025-02-05 12:51:14 - ERROR - stderr - +2025-02-05 12:51:14 - INFO - stdout - {'loss': 1.0045, 'grad_norm': 1.0187281370162964, 'learning_rate': 1.8940019100880564e-05, 'epoch': 0.52} +2025-02-05 12:51:14 - ERROR - stderr - 17%|█▋ | 3892/22434 [2:43:34<13:02:31, 2.53s/it] +2025-02-05 12:51:17 - ERROR - stderr - 17%|█▋ | 3893/22434 [2:43:36<13:06:39, 2.55s/it] +2025-02-05 12:51:17 - ERROR - stderr - +2025-02-05 12:51:17 - ERROR - stderr - +2025-02-05 12:51:17 - INFO - stdout - {'loss': 0.922, 'grad_norm': 1.040170431137085, 'learning_rate': 1.8939372118052263e-05, 'epoch': 0.52} +2025-02-05 12:51:17 - ERROR - stderr - 17%|█▋ | 3893/22434 [2:43:36<13:06:39, 2.55s/it] +2025-02-05 12:51:19 - ERROR - stderr - 17%|█▋ | 3894/22434 [2:43:39<13:01:02, 2.53s/it] +2025-02-05 12:51:19 - ERROR - stderr - +2025-02-05 12:51:19 - ERROR - stderr - +2025-02-05 12:51:19 - INFO - stdout - {'loss': 0.9302, 'grad_norm': 1.140371322631836, 'learning_rate': 1.89387249488913e-05, 'epoch': 0.52} +2025-02-05 12:51:19 - ERROR - stderr - 17%|█▋ | 3894/22434 [2:43:39<13:01:02, 2.53s/it] +2025-02-05 12:51:22 - ERROR - stderr - 17%|█▋ | 3895/22434 [2:43:41<12:59:50, 2.52s/it] +2025-02-05 12:51:22 - ERROR - stderr - +2025-02-05 12:51:22 - ERROR - stderr - +2025-02-05 12:51:22 - INFO - stdout - {'loss': 0.9625, 'grad_norm': 1.0946296453475952, 'learning_rate': 1.8938077593411172e-05, 'epoch': 0.52} +2025-02-05 12:51:22 - ERROR - stderr - 17%|█▋ | 3895/22434 [2:43:41<12:59:50, 
2.52s/it] +2025-02-05 12:51:24 - ERROR - stderr - 17%|█▋ | 3896/22434 [2:43:44<12:55:52, 2.51s/it] +2025-02-05 12:51:24 - ERROR - stderr - +2025-02-05 12:51:24 - ERROR - stderr - +2025-02-05 12:51:24 - INFO - stdout - {'loss': 0.873, 'grad_norm': 1.0061008930206299, 'learning_rate': 1.893743005162537e-05, 'epoch': 0.52} +2025-02-05 12:51:24 - ERROR - stderr - 17%|█▋ | 3896/22434 [2:43:44<12:55:52, 2.51s/it] +2025-02-05 12:51:27 - ERROR - stderr - 17%|█▋ | 3897/22434 [2:43:46<12:54:03, 2.51s/it] +2025-02-05 12:51:27 - ERROR - stderr - +2025-02-05 12:51:27 - ERROR - stderr - +2025-02-05 12:51:27 - INFO - stdout - {'loss': 1.0298, 'grad_norm': 1.3982113599777222, 'learning_rate': 1.8936782323547387e-05, 'epoch': 0.52} +2025-02-05 12:51:27 - ERROR - stderr - 17%|█▋ | 3897/22434 [2:43:46<12:54:03, 2.51s/it] +2025-02-05 12:51:29 - ERROR - stderr - 17%|█▋ | 3898/22434 [2:43:49<12:49:17, 2.49s/it] +2025-02-05 12:51:29 - ERROR - stderr - +2025-02-05 12:51:29 - ERROR - stderr - +2025-02-05 12:51:29 - INFO - stdout - {'loss': 1.1054, 'grad_norm': 1.0664973258972168, 'learning_rate': 1.893613440919073e-05, 'epoch': 0.52} +2025-02-05 12:51:29 - ERROR - stderr - 17%|█▋ | 3898/22434 [2:43:49<12:49:17, 2.49s/it] +2025-02-05 12:51:31 - ERROR - stderr - 17%|█▋ | 3899/22434 [2:43:51<12:45:01, 2.48s/it] +2025-02-05 12:51:32 - ERROR - stderr - +2025-02-05 12:51:32 - ERROR - stderr - +2025-02-05 12:51:32 - INFO - stdout - {'loss': 0.9052, 'grad_norm': 1.007728099822998, 'learning_rate': 1.8935486308568902e-05, 'epoch': 0.52} +2025-02-05 12:51:32 - ERROR - stderr - 17%|█▋ | 3899/22434 [2:43:51<12:45:01, 2.48s/it] +2025-02-05 12:51:34 - ERROR - stderr - 17%|█▋ | 3900/22434 [2:43:54<12:46:33, 2.48s/it] +2025-02-05 12:51:34 - ERROR - stderr - +2025-02-05 12:51:34 - ERROR - stderr - +2025-02-05 12:51:34 - INFO - stdout - {'loss': 1.0236, 'grad_norm': 1.0875911712646484, 'learning_rate': 1.8934838021695415e-05, 'epoch': 0.52} +2025-02-05 12:51:34 - ERROR - stderr - 17%|█▋ | 3900/22434 
[2:43:54<12:46:33, 2.48s/it] +2025-02-05 12:51:36 - ERROR - stderr - 17%|█▋ | 3901/22434 [2:43:56<12:42:23, 2.47s/it] +2025-02-05 12:51:36 - ERROR - stderr - +2025-02-05 12:51:36 - ERROR - stderr - +2025-02-05 12:51:36 - INFO - stdout - {'loss': 0.9267, 'grad_norm': 1.1241705417633057, 'learning_rate': 1.8934189548583774e-05, 'epoch': 0.52} +2025-02-05 12:51:36 - ERROR - stderr - 17%|█▋ | 3901/22434 [2:43:56<12:42:23, 2.47s/it] +2025-02-05 12:51:39 - ERROR - stderr - 17%|█▋ | 3902/22434 [2:43:59<12:45:39, 2.48s/it] +2025-02-05 12:51:39 - ERROR - stderr - +2025-02-05 12:51:39 - ERROR - stderr - +2025-02-05 12:51:39 - INFO - stdout - {'loss': 1.0, 'grad_norm': 1.0521745681762695, 'learning_rate': 1.8933540889247504e-05, 'epoch': 0.52} +2025-02-05 12:51:39 - ERROR - stderr - 17%|█▋ | 3902/22434 [2:43:59<12:45:39, 2.48s/it] +2025-02-05 12:51:42 - ERROR - stderr - 17%|█▋ | 3903/22434 [2:44:01<12:57:48, 2.52s/it] +2025-02-05 12:51:42 - ERROR - stderr - +2025-02-05 12:51:42 - ERROR - stderr - +2025-02-05 12:51:42 - INFO - stdout - {'loss': 0.9531, 'grad_norm': 1.100459337234497, 'learning_rate': 1.8932892043700125e-05, 'epoch': 0.52} +2025-02-05 12:51:42 - ERROR - stderr - 17%|█▋ | 3903/22434 [2:44:01<12:57:48, 2.52s/it] +2025-02-05 12:51:44 - ERROR - stderr - 17%|█▋ | 3904/22434 [2:44:04<12:52:08, 2.50s/it] +2025-02-05 12:51:44 - ERROR - stderr - +2025-02-05 12:51:44 - ERROR - stderr - +2025-02-05 12:51:44 - INFO - stdout - {'loss': 1.0201, 'grad_norm': 1.258790373802185, 'learning_rate': 1.8932243011955154e-05, 'epoch': 0.52} +2025-02-05 12:51:44 - ERROR - stderr - 17%|█▋ | 3904/22434 [2:44:04<12:52:08, 2.50s/it] +2025-02-05 12:51:46 - ERROR - stderr - 17%|█▋ | 3905/22434 [2:44:06<12:49:27, 2.49s/it] +2025-02-05 12:51:47 - ERROR - stderr - +2025-02-05 12:51:47 - ERROR - stderr - +2025-02-05 12:51:47 - INFO - stdout - {'loss': 0.9818, 'grad_norm': 1.1220483779907227, 'learning_rate': 1.8931593794026128e-05, 'epoch': 0.52} +2025-02-05 12:51:47 - ERROR - stderr - 17%|█▋ | 
3905/22434 [2:44:06<12:49:27, 2.49s/it] +2025-02-05 12:51:49 - ERROR - stderr - 17%|█▋ | 3906/22434 [2:44:09<12:45:38, 2.48s/it] +2025-02-05 12:51:49 - ERROR - stderr - +2025-02-05 12:51:49 - ERROR - stderr - +2025-02-05 12:51:49 - INFO - stdout - {'loss': 0.919, 'grad_norm': 1.107544183731079, 'learning_rate': 1.8930944389926575e-05, 'epoch': 0.52} +2025-02-05 12:51:49 - ERROR - stderr - 17%|█▋ | 3906/22434 [2:44:09<12:45:38, 2.48s/it] +2025-02-05 12:51:52 - ERROR - stderr - 17%|█▋ | 3907/22434 [2:44:11<12:56:23, 2.51s/it] +2025-02-05 12:51:52 - ERROR - stderr - +2025-02-05 12:51:52 - ERROR - stderr - +2025-02-05 12:51:52 - INFO - stdout - {'loss': 0.9646, 'grad_norm': 1.2315235137939453, 'learning_rate': 1.8930294799670034e-05, 'epoch': 0.52} +2025-02-05 12:51:52 - ERROR - stderr - 17%|█▋ | 3907/22434 [2:44:11<12:56:23, 2.51s/it] +2025-02-05 12:51:54 - ERROR - stderr - 17%|█▋ | 3908/22434 [2:44:14<12:52:15, 2.50s/it] +2025-02-05 12:51:54 - ERROR - stderr - +2025-02-05 12:51:54 - ERROR - stderr - +2025-02-05 12:51:54 - INFO - stdout - {'loss': 0.7975, 'grad_norm': 0.9500715136528015, 'learning_rate': 1.892964502327004e-05, 'epoch': 0.52} +2025-02-05 12:51:54 - ERROR - stderr - 17%|█▋ | 3908/22434 [2:44:14<12:52:15, 2.50s/it] +2025-02-05 12:51:56 - ERROR - stderr - 17%|█▋ | 3909/22434 [2:44:16<12:49:19, 2.49s/it] +2025-02-05 12:51:56 - ERROR - stderr - +2025-02-05 12:51:56 - ERROR - stderr - +2025-02-05 12:51:56 - INFO - stdout - {'loss': 0.9765, 'grad_norm': 1.1456037759780884, 'learning_rate': 1.8928995060740144e-05, 'epoch': 0.52} +2025-02-05 12:51:56 - ERROR - stderr - 17%|█▋ | 3909/22434 [2:44:16<12:49:19, 2.49s/it] +2025-02-05 12:51:59 - ERROR - stderr - 17%|█▋ | 3910/22434 [2:44:19<12:46:05, 2.48s/it] +2025-02-05 12:51:59 - ERROR - stderr - +2025-02-05 12:51:59 - ERROR - stderr - +2025-02-05 12:51:59 - INFO - stdout - {'loss': 0.9696, 'grad_norm': 1.0872883796691895, 'learning_rate': 1.8928344912093887e-05, 'epoch': 0.52} +2025-02-05 12:51:59 - ERROR - 
stderr - 17%|█▋ | 3910/22434 [2:44:19<12:46:05, 2.48s/it] +2025-02-05 12:52:01 - ERROR - stderr - 17%|█▋ | 3911/22434 [2:44:21<12:40:33, 2.46s/it] +2025-02-05 12:52:01 - ERROR - stderr - +2025-02-05 12:52:01 - ERROR - stderr - +2025-02-05 12:52:01 - INFO - stdout - {'loss': 0.8645, 'grad_norm': 1.057666540145874, 'learning_rate': 1.8927694577344825e-05, 'epoch': 0.52} +2025-02-05 12:52:01 - ERROR - stderr - 17%|█▋ | 3911/22434 [2:44:21<12:40:33, 2.46s/it] +2025-02-05 12:52:04 - ERROR - stderr - 17%|█▋ | 3912/22434 [2:44:24<12:35:33, 2.45s/it] +2025-02-05 12:52:04 - ERROR - stderr - +2025-02-05 12:52:04 - ERROR - stderr - +2025-02-05 12:52:04 - INFO - stdout - {'loss': 0.9933, 'grad_norm': 1.1444308757781982, 'learning_rate': 1.892704405650651e-05, 'epoch': 0.52} +2025-02-05 12:52:04 - ERROR - stderr - 17%|█▋ | 3912/22434 [2:44:24<12:35:33, 2.45s/it] +2025-02-05 12:52:06 - ERROR - stderr - 17%|█▋ | 3913/22434 [2:44:26<12:41:04, 2.47s/it] +2025-02-05 12:52:06 - ERROR - stderr - +2025-02-05 12:52:06 - ERROR - stderr - +2025-02-05 12:52:06 - INFO - stdout - {'loss': 0.9565, 'grad_norm': 1.1053740978240967, 'learning_rate': 1.8926393349592506e-05, 'epoch': 0.52} +2025-02-05 12:52:06 - ERROR - stderr - 17%|█▋ | 3913/22434 [2:44:26<12:41:04, 2.47s/it] +2025-02-05 12:52:09 - ERROR - stderr - 17%|█▋ | 3914/22434 [2:44:28<12:42:36, 2.47s/it] +2025-02-05 12:52:09 - ERROR - stderr - +2025-02-05 12:52:09 - ERROR - stderr - +2025-02-05 12:52:09 - INFO - stdout - {'loss': 1.0202, 'grad_norm': 1.1594345569610596, 'learning_rate': 1.8925742456616375e-05, 'epoch': 0.52} +2025-02-05 12:52:09 - ERROR - stderr - 17%|█▋ | 3914/22434 [2:44:29<12:42:36, 2.47s/it] +2025-02-05 12:52:11 - ERROR - stderr - 17%|█▋ | 3915/22434 [2:44:31<12:42:09, 2.47s/it] +2025-02-05 12:52:11 - ERROR - stderr - +2025-02-05 12:52:11 - ERROR - stderr - +2025-02-05 12:52:11 - INFO - stdout - {'loss': 0.983, 'grad_norm': 1.0516413450241089, 'learning_rate': 1.8925091377591684e-05, 'epoch': 0.52} +2025-02-05 
12:52:11 - ERROR - stderr - 17%|█▋ | 3915/22434 [2:44:31<12:42:09, 2.47s/it] +2025-02-05 12:52:14 - ERROR - stderr - 17%|█▋ | 3916/22434 [2:44:33<12:38:12, 2.46s/it] +2025-02-05 12:52:14 - ERROR - stderr - +2025-02-05 12:52:14 - ERROR - stderr - +2025-02-05 12:52:14 - INFO - stdout - {'loss': 0.9984, 'grad_norm': 1.1840918064117432, 'learning_rate': 1.8924440112532e-05, 'epoch': 0.52} +2025-02-05 12:52:14 - ERROR - stderr - 17%|█▋ | 3916/22434 [2:44:33<12:38:12, 2.46s/it] +2025-02-05 12:52:16 - ERROR - stderr - 17%|█▋ | 3917/22434 [2:44:36<12:45:36, 2.48s/it] +2025-02-05 12:52:16 - ERROR - stderr - +2025-02-05 12:52:16 - ERROR - stderr - +2025-02-05 12:52:16 - INFO - stdout - {'loss': 0.9514, 'grad_norm': 1.076615333557129, 'learning_rate': 1.892378866145091e-05, 'epoch': 0.52} +2025-02-05 12:52:16 - ERROR - stderr - 17%|█▋ | 3917/22434 [2:44:36<12:45:36, 2.48s/it] +2025-02-05 12:52:19 - ERROR - stderr - 17%|█▋ | 3918/22434 [2:44:38<12:46:50, 2.48s/it] +2025-02-05 12:52:19 - ERROR - stderr - +2025-02-05 12:52:19 - ERROR - stderr - +2025-02-05 12:52:19 - INFO - stdout - {'loss': 0.9191, 'grad_norm': 1.0013864040374756, 'learning_rate': 1.8923137024361975e-05, 'epoch': 0.52} +2025-02-05 12:52:19 - ERROR - stderr - 17%|█▋ | 3918/22434 [2:44:38<12:46:50, 2.48s/it] +2025-02-05 12:52:21 - ERROR - stderr - 17%|█▋ | 3919/22434 [2:44:41<12:43:23, 2.47s/it] +2025-02-05 12:52:21 - ERROR - stderr - +2025-02-05 12:52:21 - ERROR - stderr - +2025-02-05 12:52:21 - INFO - stdout - {'loss': 1.0503, 'grad_norm': 1.20210862159729, 'learning_rate': 1.8922485201278792e-05, 'epoch': 0.52} +2025-02-05 12:52:21 - ERROR - stderr - 17%|█▋ | 3919/22434 [2:44:41<12:43:23, 2.47s/it] +2025-02-05 12:52:24 - ERROR - stderr - 17%|█▋ | 3920/22434 [2:44:43<12:38:38, 2.46s/it] +2025-02-05 12:52:24 - ERROR - stderr - +2025-02-05 12:52:24 - ERROR - stderr - +2025-02-05 12:52:24 - INFO - stdout - {'loss': 0.9515, 'grad_norm': 1.0958980321884155, 'learning_rate': 1.892183319221494e-05, 'epoch': 0.52} 
+2025-02-05 12:52:24 - ERROR - stderr - 17%|█▋ | 3920/22434 [2:44:43<12:38:38, 2.46s/it] +2025-02-05 12:52:26 - ERROR - stderr - 17%|█▋ | 3921/22434 [2:44:46<12:48:59, 2.49s/it] +2025-02-05 12:52:26 - ERROR - stderr - +2025-02-05 12:52:26 - ERROR - stderr - +2025-02-05 12:52:26 - INFO - stdout - {'loss': 1.0282, 'grad_norm': 1.1728498935699463, 'learning_rate': 1.8921180997184014e-05, 'epoch': 0.52} +2025-02-05 12:52:26 - ERROR - stderr - 17%|█▋ | 3921/22434 [2:44:46<12:48:59, 2.49s/it] +2025-02-05 12:52:29 - ERROR - stderr - 17%|█▋ | 3922/22434 [2:44:49<13:15:38, 2.58s/it] +2025-02-05 12:52:29 - ERROR - stderr - +2025-02-05 12:52:29 - ERROR - stderr - +2025-02-05 12:52:29 - INFO - stdout - {'loss': 1.0716, 'grad_norm': 1.139930248260498, 'learning_rate': 1.892052861619961e-05, 'epoch': 0.52} +2025-02-05 12:52:29 - ERROR - stderr - 17%|█▋ | 3922/22434 [2:44:49<13:15:38, 2.58s/it] +2025-02-05 12:52:32 - ERROR - stderr - 17%|█▋ | 3923/22434 [2:44:51<13:24:05, 2.61s/it] +2025-02-05 12:52:32 - ERROR - stderr - +2025-02-05 12:52:32 - ERROR - stderr - +2025-02-05 12:52:32 - INFO - stdout - {'loss': 0.9014, 'grad_norm': 1.098024845123291, 'learning_rate': 1.8919876049275318e-05, 'epoch': 0.52} +2025-02-05 12:52:32 - ERROR - stderr - 17%|█▋ | 3923/22434 [2:44:51<13:24:05, 2.61s/it] +2025-02-05 12:52:34 - ERROR - stderr - 17%|█▋ | 3924/22434 [2:44:54<13:16:44, 2.58s/it] +2025-02-05 12:52:34 - ERROR - stderr - +2025-02-05 12:52:34 - ERROR - stderr - +2025-02-05 12:52:34 - INFO - stdout - {'loss': 1.0292, 'grad_norm': 1.1482025384902954, 'learning_rate': 1.8919223296424746e-05, 'epoch': 0.52} +2025-02-05 12:52:34 - ERROR - stderr - 17%|█▋ | 3924/22434 [2:44:54<13:16:44, 2.58s/it] +2025-02-05 12:52:37 - ERROR - stderr - 17%|█▋ | 3925/22434 [2:44:56<13:11:20, 2.57s/it] +2025-02-05 12:52:37 - ERROR - stderr - +2025-02-05 12:52:37 - ERROR - stderr - +2025-02-05 12:52:37 - INFO - stdout - {'loss': 1.0716, 'grad_norm': 1.079440712928772, 'learning_rate': 1.8918570357661502e-05, 
'epoch': 0.52} +2025-02-05 12:52:37 - ERROR - stderr - 17%|█▋ | 3925/22434 [2:44:56<13:11:20, 2.57s/it] +2025-02-05 12:52:39 - ERROR - stderr - 18%|█▊ | 3926/22434 [2:44:59<13:11:57, 2.57s/it] +2025-02-05 12:52:39 - ERROR - stderr - +2025-02-05 12:52:39 - ERROR - stderr - +2025-02-05 12:52:39 - INFO - stdout - {'loss': 1.0171, 'grad_norm': 1.2071138620376587, 'learning_rate': 1.891791723299919e-05, 'epoch': 0.53} +2025-02-05 12:52:39 - ERROR - stderr - 18%|█▊ | 3926/22434 [2:44:59<13:11:57, 2.57s/it] +2025-02-05 12:52:42 - ERROR - stderr - 18%|█▊ | 3927/22434 [2:45:01<13:07:11, 2.55s/it] +2025-02-05 12:52:42 - ERROR - stderr - +2025-02-05 12:52:42 - ERROR - stderr - +2025-02-05 12:52:42 - INFO - stdout - {'loss': 0.993, 'grad_norm': 1.0317797660827637, 'learning_rate': 1.8917263922451427e-05, 'epoch': 0.53} +2025-02-05 12:52:42 - ERROR - stderr - 18%|█▊ | 3927/22434 [2:45:02<13:07:11, 2.55s/it] +2025-02-05 12:52:44 - ERROR - stderr - 18%|█▊ | 3928/22434 [2:45:04<12:58:33, 2.52s/it] +2025-02-05 12:52:44 - ERROR - stderr - +2025-02-05 12:52:44 - ERROR - stderr - +2025-02-05 12:52:44 - INFO - stdout - {'loss': 0.9571, 'grad_norm': 1.1713004112243652, 'learning_rate': 1.8916610426031835e-05, 'epoch': 0.53} +2025-02-05 12:52:44 - ERROR - stderr - 18%|█▊ | 3928/22434 [2:45:04<12:58:33, 2.52s/it] +2025-02-05 12:52:47 - ERROR - stderr - 18%|█▊ | 3929/22434 [2:45:06<12:58:55, 2.53s/it] +2025-02-05 12:52:47 - ERROR - stderr - +2025-02-05 12:52:47 - ERROR - stderr - +2025-02-05 12:52:47 - INFO - stdout - {'loss': 0.9371, 'grad_norm': 1.0108625888824463, 'learning_rate': 1.8915956743754026e-05, 'epoch': 0.53} +2025-02-05 12:52:47 - ERROR - stderr - 18%|█▊ | 3929/22434 [2:45:06<12:58:55, 2.53s/it] +2025-02-05 12:52:49 - ERROR - stderr - 18%|█▊ | 3930/22434 [2:45:09<12:52:59, 2.51s/it] +2025-02-05 12:52:49 - ERROR - stderr - +2025-02-05 12:52:49 - ERROR - stderr - +2025-02-05 12:52:49 - INFO - stdout - {'loss': 0.9245, 'grad_norm': 1.0294760465621948, 'learning_rate': 
1.8915302875631633e-05, 'epoch': 0.53} +2025-02-05 12:52:49 - ERROR - stderr - 18%|█▊ | 3930/22434 [2:45:09<12:52:59, 2.51s/it] +2025-02-05 12:52:52 - ERROR - stderr - 18%|█▊ | 3931/22434 [2:45:11<12:53:22, 2.51s/it] +2025-02-05 12:52:52 - ERROR - stderr - +2025-02-05 12:52:52 - ERROR - stderr - +2025-02-05 12:52:52 - INFO - stdout - {'loss': 1.0639, 'grad_norm': 1.2941956520080566, 'learning_rate': 1.8914648821678278e-05, 'epoch': 0.53} +2025-02-05 12:52:52 - ERROR - stderr - 18%|█▊ | 3931/22434 [2:45:11<12:53:22, 2.51s/it] +2025-02-05 12:52:54 - ERROR - stderr - 18%|█▊ | 3932/22434 [2:45:14<12:47:32, 2.49s/it] +2025-02-05 12:52:54 - ERROR - stderr - +2025-02-05 12:52:54 - ERROR - stderr - +2025-02-05 12:52:54 - INFO - stdout - {'loss': 0.8877, 'grad_norm': 1.0763232707977295, 'learning_rate': 1.8913994581907605e-05, 'epoch': 0.53} +2025-02-05 12:52:54 - ERROR - stderr - 18%|█▊ | 3932/22434 [2:45:14<12:47:32, 2.49s/it] +2025-02-05 12:52:57 - ERROR - stderr - 18%|█▊ | 3933/22434 [2:45:17<13:14:30, 2.58s/it] +2025-02-05 12:52:57 - ERROR - stderr - +2025-02-05 12:52:57 - ERROR - stderr - +2025-02-05 12:52:57 - INFO - stdout - {'loss': 0.9157, 'grad_norm': 1.0422208309173584, 'learning_rate': 1.891334015633324e-05, 'epoch': 0.53} +2025-02-05 12:52:57 - ERROR - stderr - 18%|█▊ | 3933/22434 [2:45:17<13:14:30, 2.58s/it] +2025-02-05 12:52:59 - ERROR - stderr - 18%|█▊ | 3934/22434 [2:45:19<13:11:14, 2.57s/it] +2025-02-05 12:52:59 - ERROR - stderr - +2025-02-05 12:52:59 - ERROR - stderr - +2025-02-05 12:52:59 - INFO - stdout - {'loss': 1.0086, 'grad_norm': 1.0282213687896729, 'learning_rate': 1.891268554496883e-05, 'epoch': 0.53} +2025-02-05 12:52:59 - ERROR - stderr - 18%|█▊ | 3934/22434 [2:45:19<13:11:14, 2.57s/it] +2025-02-05 12:53:02 - ERROR - stderr - 18%|█▊ | 3935/22434 [2:45:22<12:57:58, 2.52s/it] +2025-02-05 12:53:02 - ERROR - stderr - +2025-02-05 12:53:02 - ERROR - stderr - +2025-02-05 12:53:02 - INFO - stdout - {'loss': 0.9986, 'grad_norm': 1.2093687057495117, 
'learning_rate': 1.8912030747828018e-05, 'epoch': 0.53} +2025-02-05 12:53:02 - ERROR - stderr - 18%|█▊ | 3935/22434 [2:45:22<12:57:58, 2.52s/it] +2025-02-05 12:53:04 - ERROR - stderr - 18%|█▊ | 3936/22434 [2:45:24<12:52:18, 2.51s/it] +2025-02-05 12:53:04 - ERROR - stderr - +2025-02-05 12:53:04 - ERROR - stderr - +2025-02-05 12:53:04 - INFO - stdout - {'loss': 1.0043, 'grad_norm': 1.0463991165161133, 'learning_rate': 1.8911375764924455e-05, 'epoch': 0.53} +2025-02-05 12:53:04 - ERROR - stderr - 18%|█▊ | 3936/22434 [2:45:24<12:52:18, 2.51s/it] +2025-02-05 12:53:07 - ERROR - stderr - 18%|█▊ | 3937/22434 [2:45:27<12:53:45, 2.51s/it] +2025-02-05 12:53:07 - ERROR - stderr - +2025-02-05 12:53:07 - ERROR - stderr - +2025-02-05 12:53:07 - INFO - stdout - {'loss': 0.9172, 'grad_norm': 1.0864888429641724, 'learning_rate': 1.8910720596271787e-05, 'epoch': 0.53} +2025-02-05 12:53:07 - ERROR - stderr - 18%|█▊ | 3937/22434 [2:45:27<12:53:45, 2.51s/it] +2025-02-05 12:53:09 - ERROR - stderr - 18%|█▊ | 3938/22434 [2:45:29<12:54:40, 2.51s/it] +2025-02-05 12:53:09 - ERROR - stderr - +2025-02-05 12:53:09 - ERROR - stderr - +2025-02-05 12:53:09 - INFO - stdout - {'loss': 0.9725, 'grad_norm': 1.023023009300232, 'learning_rate': 1.891006524188368e-05, 'epoch': 0.53} +2025-02-05 12:53:09 - ERROR - stderr - 18%|█▊ | 3938/22434 [2:45:29<12:54:40, 2.51s/it] +2025-02-05 12:53:12 - ERROR - stderr - 18%|█▊ | 3939/22434 [2:45:32<13:02:08, 2.54s/it] +2025-02-05 12:53:12 - ERROR - stderr - +2025-02-05 12:53:12 - ERROR - stderr - +2025-02-05 12:53:12 - INFO - stdout - {'loss': 0.8713, 'grad_norm': 1.079361915588379, 'learning_rate': 1.8909409701773787e-05, 'epoch': 0.53} +2025-02-05 12:53:12 - ERROR - stderr - 18%|█▊ | 3939/22434 [2:45:32<13:02:08, 2.54s/it] +2025-02-05 12:53:14 - ERROR - stderr - 18%|█▊ | 3940/22434 [2:45:34<12:55:19, 2.52s/it] +2025-02-05 12:53:14 - ERROR - stderr - +2025-02-05 12:53:14 - ERROR - stderr - +2025-02-05 12:53:14 - INFO - stdout - {'loss': 0.924, 'grad_norm': 
1.0619900226593018, 'learning_rate': 1.8908753975955772e-05, 'epoch': 0.53} +2025-02-05 12:53:14 - ERROR - stderr - 18%|█▊ | 3940/22434 [2:45:34<12:55:19, 2.52s/it] +2025-02-05 12:53:17 - ERROR - stderr - 18%|█▊ | 3941/22434 [2:45:37<12:54:00, 2.51s/it] +2025-02-05 12:53:17 - ERROR - stderr - +2025-02-05 12:53:17 - ERROR - stderr - +2025-02-05 12:53:17 - INFO - stdout - {'loss': 0.9711, 'grad_norm': 1.067112684249878, 'learning_rate': 1.890809806444331e-05, 'epoch': 0.53} +2025-02-05 12:53:17 - ERROR - stderr - 18%|█▊ | 3941/22434 [2:45:37<12:54:00, 2.51s/it] +2025-02-05 12:53:19 - ERROR - stderr - 18%|█▊ | 3942/22434 [2:45:39<12:47:25, 2.49s/it] +2025-02-05 12:53:19 - ERROR - stderr - +2025-02-05 12:53:19 - ERROR - stderr - +2025-02-05 12:53:19 - INFO - stdout - {'loss': 0.9091, 'grad_norm': 1.1576350927352905, 'learning_rate': 1.8907441967250064e-05, 'epoch': 0.53} +2025-02-05 12:53:19 - ERROR - stderr - 18%|█▊ | 3942/22434 [2:45:39<12:47:25, 2.49s/it] +2025-02-05 12:53:22 - ERROR - stderr - 18%|█▊ | 3943/22434 [2:45:42<12:41:41, 2.47s/it] +2025-02-05 12:53:22 - ERROR - stderr - +2025-02-05 12:53:22 - ERROR - stderr - +2025-02-05 12:53:22 - INFO - stdout - {'loss': 0.9792, 'grad_norm': 1.2047412395477295, 'learning_rate': 1.8906785684389715e-05, 'epoch': 0.53} +2025-02-05 12:53:22 - ERROR - stderr - 18%|█▊ | 3943/22434 [2:45:42<12:41:41, 2.47s/it] +2025-02-05 12:53:24 - ERROR - stderr - 18%|█▊ | 3944/22434 [2:45:44<12:37:35, 2.46s/it] +2025-02-05 12:53:24 - ERROR - stderr - +2025-02-05 12:53:24 - ERROR - stderr - +2025-02-05 12:53:24 - INFO - stdout - {'loss': 0.8706, 'grad_norm': 0.9922213554382324, 'learning_rate': 1.8906129215875943e-05, 'epoch': 0.53} +2025-02-05 12:53:24 - ERROR - stderr - 18%|█▊ | 3944/22434 [2:45:44<12:37:35, 2.46s/it] +2025-02-05 12:53:27 - ERROR - stderr - 18%|█▊ | 3945/22434 [2:45:46<12:37:51, 2.46s/it] +2025-02-05 12:53:27 - ERROR - stderr - +2025-02-05 12:53:27 - ERROR - stderr - +2025-02-05 12:53:27 - INFO - stdout - {'loss': 1.0702, 
'grad_norm': 1.1775661706924438, 'learning_rate': 1.8905472561722425e-05, 'epoch': 0.53} +2025-02-05 12:53:27 - ERROR - stderr - 18%|█▊ | 3945/22434 [2:45:46<12:37:51, 2.46s/it] +2025-02-05 12:53:29 - ERROR - stderr - 18%|█▊ | 3946/22434 [2:45:49<12:35:51, 2.45s/it] +2025-02-05 12:53:29 - ERROR - stderr - +2025-02-05 12:53:29 - ERROR - stderr - +2025-02-05 12:53:29 - INFO - stdout - {'loss': 1.0432, 'grad_norm': 1.1330151557922363, 'learning_rate': 1.8904815721942857e-05, 'epoch': 0.53} +2025-02-05 12:53:29 - ERROR - stderr - 18%|█▊ | 3946/22434 [2:45:49<12:35:51, 2.45s/it] +2025-02-05 12:53:32 - ERROR - stderr - 18%|█▊ | 3947/22434 [2:45:51<12:34:30, 2.45s/it] +2025-02-05 12:53:32 - ERROR - stderr - +2025-02-05 12:53:32 - ERROR - stderr - +2025-02-05 12:53:32 - INFO - stdout - {'loss': 1.0329, 'grad_norm': 1.0949708223342896, 'learning_rate': 1.8904158696550927e-05, 'epoch': 0.53} +2025-02-05 12:53:32 - ERROR - stderr - 18%|█▊ | 3947/22434 [2:45:51<12:34:30, 2.45s/it] +2025-02-05 12:53:34 - ERROR - stderr - 18%|█▊ | 3948/22434 [2:45:54<12:37:17, 2.46s/it] +2025-02-05 12:53:34 - ERROR - stderr - +2025-02-05 12:53:34 - ERROR - stderr - +2025-02-05 12:53:34 - INFO - stdout - {'loss': 1.0306, 'grad_norm': 1.1763720512390137, 'learning_rate': 1.8903501485560328e-05, 'epoch': 0.53} +2025-02-05 12:53:34 - ERROR - stderr - 18%|█▊ | 3948/22434 [2:45:54<12:37:17, 2.46s/it] +2025-02-05 12:53:36 - ERROR - stderr - 18%|█▊ | 3949/22434 [2:45:56<12:35:58, 2.45s/it] +2025-02-05 12:53:37 - ERROR - stderr - +2025-02-05 12:53:37 - ERROR - stderr - +2025-02-05 12:53:37 - INFO - stdout - {'loss': 0.8144, 'grad_norm': 1.034354329109192, 'learning_rate': 1.8902844088984757e-05, 'epoch': 0.53} +2025-02-05 12:53:37 - ERROR - stderr - 18%|█▊ | 3949/22434 [2:45:56<12:35:58, 2.45s/it] +2025-02-05 12:53:39 - ERROR - stderr - 18%|█▊ | 3950/22434 [2:45:59<13:01:04, 2.54s/it] +2025-02-05 12:53:39 - ERROR - stderr - +2025-02-05 12:53:39 - ERROR - stderr - +2025-02-05 12:53:39 - INFO - stdout - 
{'loss': 0.9686, 'grad_norm': 1.0692715644836426, 'learning_rate': 1.8902186506837924e-05, 'epoch': 0.53} +2025-02-05 12:53:39 - ERROR - stderr - 18%|█▊ | 3950/22434 [2:45:59<13:01:04, 2.54s/it] +2025-02-05 12:53:42 - ERROR - stderr - 18%|█▊ | 3951/22434 [2:46:01<12:57:31, 2.52s/it] +2025-02-05 12:53:42 - ERROR - stderr - +2025-02-05 12:53:42 - ERROR - stderr - +2025-02-05 12:53:42 - INFO - stdout - {'loss': 0.8168, 'grad_norm': 0.9756340384483337, 'learning_rate': 1.890152873913353e-05, 'epoch': 0.53} +2025-02-05 12:53:42 - ERROR - stderr - 18%|█▊ | 3951/22434 [2:46:01<12:57:31, 2.52s/it] +2025-02-05 12:53:44 - ERROR - stderr - 18%|█▊ | 3952/22434 [2:46:04<12:58:23, 2.53s/it] +2025-02-05 12:53:44 - ERROR - stderr - +2025-02-05 12:53:44 - ERROR - stderr - +2025-02-05 12:53:44 - INFO - stdout - {'loss': 1.0726, 'grad_norm': 1.1703331470489502, 'learning_rate': 1.8900870785885288e-05, 'epoch': 0.53} +2025-02-05 12:53:44 - ERROR - stderr - 18%|█▊ | 3952/22434 [2:46:04<12:58:23, 2.53s/it] +2025-02-05 12:53:47 - ERROR - stderr - 18%|█▊ | 3953/22434 [2:46:06<12:53:36, 2.51s/it] +2025-02-05 12:53:47 - ERROR - stderr - +2025-02-05 12:53:47 - ERROR - stderr - +2025-02-05 12:53:47 - INFO - stdout - {'loss': 0.909, 'grad_norm': 1.0233592987060547, 'learning_rate': 1.890021264710691e-05, 'epoch': 0.53} +2025-02-05 12:53:47 - ERROR - stderr - 18%|█▊ | 3953/22434 [2:46:07<12:53:36, 2.51s/it] +2025-02-05 12:53:49 - ERROR - stderr - 18%|█▊ | 3954/22434 [2:46:09<13:06:47, 2.55s/it] +2025-02-05 12:53:49 - ERROR - stderr - +2025-02-05 12:53:49 - ERROR - stderr - +2025-02-05 12:53:49 - INFO - stdout - {'loss': 0.9969, 'grad_norm': 1.038329839706421, 'learning_rate': 1.889955432281212e-05, 'epoch': 0.53} +2025-02-05 12:53:49 - ERROR - stderr - 18%|█▊ | 3954/22434 [2:46:09<13:06:47, 2.55s/it] +2025-02-05 12:53:52 - ERROR - stderr - 18%|█▊ | 3955/22434 [2:46:12<12:55:38, 2.52s/it] +2025-02-05 12:53:52 - ERROR - stderr - +2025-02-05 12:53:52 - ERROR - stderr - +2025-02-05 12:53:52 - INFO 
- stdout - {'loss': 0.995, 'grad_norm': 1.1241778135299683, 'learning_rate': 1.8898895813014633e-05, 'epoch': 0.53} +2025-02-05 12:53:52 - ERROR - stderr - 18%|█▊ | 3955/22434 [2:46:12<12:55:38, 2.52s/it] +2025-02-05 12:53:54 - ERROR - stderr - 18%|█▊ | 3956/22434 [2:46:14<12:48:32, 2.50s/it] +2025-02-05 12:53:54 - ERROR - stderr - +2025-02-05 12:53:54 - ERROR - stderr - +2025-02-05 12:53:54 - INFO - stdout - {'loss': 0.8693, 'grad_norm': 1.034817099571228, 'learning_rate': 1.8898237117728177e-05, 'epoch': 0.53} +2025-02-05 12:53:54 - ERROR - stderr - 18%|█▊ | 3956/22434 [2:46:14<12:48:32, 2.50s/it] +2025-02-05 12:53:57 - ERROR - stderr - 18%|█▊ | 3957/22434 [2:46:16<12:47:43, 2.49s/it] +2025-02-05 12:53:57 - ERROR - stderr - +2025-02-05 12:53:57 - ERROR - stderr - +2025-02-05 12:53:57 - INFO - stdout - {'loss': 0.9579, 'grad_norm': 1.0925058126449585, 'learning_rate': 1.8897578236966486e-05, 'epoch': 0.53} +2025-02-05 12:53:57 - ERROR - stderr - 18%|█▊ | 3957/22434 [2:46:17<12:47:43, 2.49s/it] +2025-02-05 12:53:59 - ERROR - stderr - 18%|█▊ | 3958/22434 [2:46:19<12:51:52, 2.51s/it] +2025-02-05 12:53:59 - ERROR - stderr - +2025-02-05 12:53:59 - ERROR - stderr - +2025-02-05 12:53:59 - INFO - stdout - {'loss': 0.9749, 'grad_norm': 1.1579203605651855, 'learning_rate': 1.889691917074329e-05, 'epoch': 0.53} +2025-02-05 12:53:59 - ERROR - stderr - 18%|█▊ | 3958/22434 [2:46:19<12:51:52, 2.51s/it] +2025-02-05 12:54:02 - ERROR - stderr - 18%|█▊ | 3959/22434 [2:46:21<12:45:02, 2.48s/it] +2025-02-05 12:54:02 - ERROR - stderr - +2025-02-05 12:54:02 - ERROR - stderr - +2025-02-05 12:54:02 - INFO - stdout - {'loss': 0.9824, 'grad_norm': 1.1268013715744019, 'learning_rate': 1.8896259919072325e-05, 'epoch': 0.53} +2025-02-05 12:54:02 - ERROR - stderr - 18%|█▊ | 3959/22434 [2:46:21<12:45:02, 2.48s/it] +2025-02-05 12:54:04 - ERROR - stderr - 18%|█▊ | 3960/22434 [2:46:24<12:48:08, 2.49s/it] +2025-02-05 12:54:04 - ERROR - stderr - +2025-02-05 12:54:04 - ERROR - stderr - +2025-02-05 
12:54:04 - INFO - stdout - {'loss': 1.0323, 'grad_norm': 1.2145124673843384, 'learning_rate': 1.8895600481967337e-05, 'epoch': 0.53} +2025-02-05 12:54:04 - ERROR - stderr - 18%|█▊ | 3960/22434 [2:46:24<12:48:08, 2.49s/it] +2025-02-05 12:54:07 - ERROR - stderr - 18%|█▊ | 3961/22434 [2:46:26<12:42:39, 2.48s/it] +2025-02-05 12:54:07 - ERROR - stderr - +2025-02-05 12:54:07 - ERROR - stderr - +2025-02-05 12:54:07 - INFO - stdout - {'loss': 0.9027, 'grad_norm': 1.0292880535125732, 'learning_rate': 1.889494085944207e-05, 'epoch': 0.53} +2025-02-05 12:54:07 - ERROR - stderr - 18%|█▊ | 3961/22434 [2:46:26<12:42:39, 2.48s/it] +2025-02-05 12:54:09 - ERROR - stderr - 18%|█▊ | 3962/22434 [2:46:29<12:40:03, 2.47s/it] +2025-02-05 12:54:09 - ERROR - stderr - +2025-02-05 12:54:09 - ERROR - stderr - +2025-02-05 12:54:09 - INFO - stdout - {'loss': 0.908, 'grad_norm': 1.286773443222046, 'learning_rate': 1.8894281051510267e-05, 'epoch': 0.53} +2025-02-05 12:54:09 - ERROR - stderr - 18%|█▊ | 3962/22434 [2:46:29<12:40:03, 2.47s/it] +2025-02-05 12:54:12 - ERROR - stderr - 18%|█▊ | 3963/22434 [2:46:31<12:48:49, 2.50s/it] +2025-02-05 12:54:12 - ERROR - stderr - +2025-02-05 12:54:12 - ERROR - stderr - +2025-02-05 12:54:12 - INFO - stdout - {'loss': 1.0053, 'grad_norm': 1.1288243532180786, 'learning_rate': 1.889362105818569e-05, 'epoch': 0.53} +2025-02-05 12:54:12 - ERROR - stderr - 18%|█▊ | 3963/22434 [2:46:31<12:48:49, 2.50s/it] +2025-02-05 12:54:14 - ERROR - stderr - 18%|█▊ | 3964/22434 [2:46:34<12:48:21, 2.50s/it] +2025-02-05 12:54:14 - ERROR - stderr - +2025-02-05 12:54:14 - ERROR - stderr - +2025-02-05 12:54:14 - INFO - stdout - {'loss': 0.9721, 'grad_norm': 1.1426972150802612, 'learning_rate': 1.8892960879482092e-05, 'epoch': 0.53} +2025-02-05 12:54:14 - ERROR - stderr - 18%|█▊ | 3964/22434 [2:46:34<12:48:21, 2.50s/it] +2025-02-05 12:54:17 - ERROR - stderr - 18%|█▊ | 3965/22434 [2:46:36<12:46:08, 2.49s/it] +2025-02-05 12:54:17 - ERROR - stderr - +2025-02-05 12:54:17 - ERROR - stderr - 
+2025-02-05 12:54:17 - INFO - stdout - {'loss': 0.9593, 'grad_norm': 1.150708556175232, 'learning_rate': 1.889230051541324e-05, 'epoch': 0.53} +2025-02-05 12:54:17 - ERROR - stderr - 18%|█▊ | 3965/22434 [2:46:36<12:46:08, 2.49s/it] +2025-02-05 12:54:19 - ERROR - stderr - 18%|█▊ | 3966/22434 [2:46:39<12:45:41, 2.49s/it] +2025-02-05 12:54:19 - ERROR - stderr - +2025-02-05 12:54:19 - ERROR - stderr - +2025-02-05 12:54:19 - INFO - stdout - {'loss': 0.9213, 'grad_norm': 1.0496158599853516, 'learning_rate': 1.8891639965992884e-05, 'epoch': 0.53} +2025-02-05 12:54:19 - ERROR - stderr - 18%|█▊ | 3966/22434 [2:46:39<12:45:41, 2.49s/it] +2025-02-05 12:54:22 - ERROR - stderr - 18%|█▊ | 3967/22434 [2:46:41<12:55:27, 2.52s/it] +2025-02-05 12:54:22 - ERROR - stderr - +2025-02-05 12:54:22 - ERROR - stderr - +2025-02-05 12:54:22 - INFO - stdout - {'loss': 0.8702, 'grad_norm': 1.0713404417037964, 'learning_rate': 1.8890979231234806e-05, 'epoch': 0.53} +2025-02-05 12:54:22 - ERROR - stderr - 18%|█▊ | 3967/22434 [2:46:42<12:55:27, 2.52s/it] +2025-02-05 12:54:24 - ERROR - stderr - 18%|█▊ | 3968/22434 [2:46:44<12:52:11, 2.51s/it] +2025-02-05 12:54:24 - ERROR - stderr - +2025-02-05 12:54:24 - ERROR - stderr - +2025-02-05 12:54:24 - INFO - stdout - {'loss': 0.9381, 'grad_norm': 1.0482089519500732, 'learning_rate': 1.8890318311152773e-05, 'epoch': 0.53} +2025-02-05 12:54:24 - ERROR - stderr - 18%|█▊ | 3968/22434 [2:46:44<12:52:11, 2.51s/it] +2025-02-05 12:54:27 - ERROR - stderr - 18%|█▊ | 3969/22434 [2:46:47<13:22:02, 2.61s/it] +2025-02-05 12:54:27 - ERROR - stderr - +2025-02-05 12:54:27 - ERROR - stderr - +2025-02-05 12:54:27 - INFO - stdout - {'loss': 0.9044, 'grad_norm': 1.0940866470336914, 'learning_rate': 1.888965720576056e-05, 'epoch': 0.53} +2025-02-05 12:54:27 - ERROR - stderr - 18%|█▊ | 3969/22434 [2:46:47<13:22:02, 2.61s/it] +2025-02-05 12:54:30 - ERROR - stderr - 18%|█▊ | 3970/22434 [2:46:49<13:17:32, 2.59s/it] +2025-02-05 12:54:30 - ERROR - stderr - +2025-02-05 12:54:30 - 
ERROR - stderr - +2025-02-05 12:54:30 - INFO - stdout - {'loss': 1.0506, 'grad_norm': 1.1427652835845947, 'learning_rate': 1.888899591507195e-05, 'epoch': 0.53} +2025-02-05 12:54:30 - ERROR - stderr - 18%|█▊ | 3970/22434 [2:46:49<13:17:32, 2.59s/it] +2025-02-05 12:54:32 - ERROR - stderr - 18%|█▊ | 3971/22434 [2:46:52<13:07:38, 2.56s/it] +2025-02-05 12:54:32 - ERROR - stderr - +2025-02-05 12:54:32 - ERROR - stderr - +2025-02-05 12:54:32 - INFO - stdout - {'loss': 0.9982, 'grad_norm': 1.1380037069320679, 'learning_rate': 1.8888334439100728e-05, 'epoch': 0.53} +2025-02-05 12:54:32 - ERROR - stderr - 18%|█▊ | 3971/22434 [2:46:52<13:07:38, 2.56s/it] +2025-02-05 12:54:34 - ERROR - stderr - 18%|█▊ | 3972/22434 [2:46:54<12:54:34, 2.52s/it] +2025-02-05 12:54:35 - ERROR - stderr - +2025-02-05 12:54:35 - ERROR - stderr - +2025-02-05 12:54:35 - INFO - stdout - {'loss': 0.89, 'grad_norm': 0.9526935815811157, 'learning_rate': 1.8887672777860678e-05, 'epoch': 0.53} +2025-02-05 12:54:35 - ERROR - stderr - 18%|█▊ | 3972/22434 [2:46:54<12:54:34, 2.52s/it] +2025-02-05 12:54:37 - ERROR - stderr - 18%|█▊ | 3973/22434 [2:46:57<12:47:12, 2.49s/it] +2025-02-05 12:54:37 - ERROR - stderr - +2025-02-05 12:54:37 - ERROR - stderr - +2025-02-05 12:54:37 - INFO - stdout - {'loss': 0.9734, 'grad_norm': 1.0654829740524292, 'learning_rate': 1.8887010931365592e-05, 'epoch': 0.53} +2025-02-05 12:54:37 - ERROR - stderr - 18%|█▊ | 3973/22434 [2:46:57<12:47:12, 2.49s/it] +2025-02-05 12:54:40 - ERROR - stderr - 18%|█▊ | 3974/22434 [2:46:59<13:05:57, 2.55s/it] +2025-02-05 12:54:40 - ERROR - stderr - +2025-02-05 12:54:40 - ERROR - stderr - +2025-02-05 12:54:40 - INFO - stdout - {'loss': 0.9873, 'grad_norm': 1.1163285970687866, 'learning_rate': 1.888634889962927e-05, 'epoch': 0.53} +2025-02-05 12:54:40 - ERROR - stderr - 18%|█▊ | 3974/22434 [2:46:59<13:05:57, 2.55s/it] +2025-02-05 12:54:42 - ERROR - stderr - 18%|█▊ | 3975/22434 [2:47:02<12:56:04, 2.52s/it] +2025-02-05 12:54:42 - ERROR - stderr - +2025-02-05 
12:54:42 - ERROR - stderr - +2025-02-05 12:54:42 - INFO - stdout - {'loss': 0.8316, 'grad_norm': 1.14678156375885, 'learning_rate': 1.8885686682665505e-05, 'epoch': 0.53} +2025-02-05 12:54:42 - ERROR - stderr - 18%|█▊ | 3975/22434 [2:47:02<12:56:04, 2.52s/it] +2025-02-05 12:54:45 - ERROR - stderr - 18%|█▊ | 3976/22434 [2:47:04<13:09:36, 2.57s/it] +2025-02-05 12:54:45 - ERROR - stderr - +2025-02-05 12:54:45 - ERROR - stderr - +2025-02-05 12:54:45 - INFO - stdout - {'loss': 0.961, 'grad_norm': 1.165987253189087, 'learning_rate': 1.8885024280488108e-05, 'epoch': 0.53} +2025-02-05 12:54:45 - ERROR - stderr - 18%|█▊ | 3976/22434 [2:47:05<13:09:36, 2.57s/it] +2025-02-05 12:54:47 - ERROR - stderr - 18%|█▊ | 3977/22434 [2:47:07<12:59:29, 2.53s/it] +2025-02-05 12:54:47 - ERROR - stderr - +2025-02-05 12:54:47 - ERROR - stderr - +2025-02-05 12:54:47 - INFO - stdout - {'loss': 1.0257, 'grad_norm': 1.1527067422866821, 'learning_rate': 1.888436169311088e-05, 'epoch': 0.53} +2025-02-05 12:54:47 - ERROR - stderr - 18%|█▊ | 3977/22434 [2:47:07<12:59:29, 2.53s/it] +2025-02-05 12:54:50 - ERROR - stderr - 18%|█▊ | 3978/22434 [2:47:10<13:29:27, 2.63s/it] +2025-02-05 12:54:50 - ERROR - stderr - +2025-02-05 12:54:50 - ERROR - stderr - +2025-02-05 12:54:50 - INFO - stdout - {'loss': 0.9289, 'grad_norm': 1.114498257637024, 'learning_rate': 1.8883698920547633e-05, 'epoch': 0.53} +2025-02-05 12:54:50 - ERROR - stderr - 18%|█▊ | 3978/22434 [2:47:10<13:29:27, 2.63s/it] +2025-02-05 12:54:53 - ERROR - stderr - 18%|█▊ | 3979/22434 [2:47:12<13:15:54, 2.59s/it] +2025-02-05 12:54:53 - ERROR - stderr - +2025-02-05 12:54:53 - ERROR - stderr - +2025-02-05 12:54:53 - INFO - stdout - {'loss': 0.8959, 'grad_norm': 1.0571329593658447, 'learning_rate': 1.8883035962812184e-05, 'epoch': 0.53} +2025-02-05 12:54:53 - ERROR - stderr - 18%|█▊ | 3979/22434 [2:47:12<13:15:54, 2.59s/it] +2025-02-05 12:54:55 - ERROR - stderr - 18%|█▊ | 3980/22434 [2:47:15<13:08:17, 2.56s/it] +2025-02-05 12:54:55 - ERROR - stderr - 
+2025-02-05 12:54:55 - ERROR - stderr - +2025-02-05 12:54:55 - INFO - stdout - {'loss': 0.9485, 'grad_norm': 1.1587417125701904, 'learning_rate': 1.888237281991835e-05, 'epoch': 0.53} +2025-02-05 12:54:55 - ERROR - stderr - 18%|█▊ | 3980/22434 [2:47:15<13:08:17, 2.56s/it] +2025-02-05 12:54:58 - ERROR - stderr - 18%|█▊ | 3981/22434 [2:47:17<13:01:55, 2.54s/it] +2025-02-05 12:54:58 - ERROR - stderr - +2025-02-05 12:54:58 - ERROR - stderr - +2025-02-05 12:54:58 - INFO - stdout - {'loss': 0.8969, 'grad_norm': 1.1801154613494873, 'learning_rate': 1.8881709491879954e-05, 'epoch': 0.53} +2025-02-05 12:54:58 - ERROR - stderr - 18%|█▊ | 3981/22434 [2:47:17<13:01:55, 2.54s/it] +2025-02-05 12:55:00 - ERROR - stderr - 18%|█▊ | 3982/22434 [2:47:20<12:58:53, 2.53s/it] +2025-02-05 12:55:00 - ERROR - stderr - +2025-02-05 12:55:00 - ERROR - stderr - +2025-02-05 12:55:00 - INFO - stdout - {'loss': 0.8846, 'grad_norm': 1.1700735092163086, 'learning_rate': 1.8881045978710823e-05, 'epoch': 0.53} +2025-02-05 12:55:00 - ERROR - stderr - 18%|█▊ | 3982/22434 [2:47:20<12:58:53, 2.53s/it] +2025-02-05 12:55:03 - ERROR - stderr - 18%|█▊ | 3983/22434 [2:47:23<13:29:25, 2.63s/it] +2025-02-05 12:55:03 - ERROR - stderr - +2025-02-05 12:55:03 - ERROR - stderr - +2025-02-05 12:55:03 - INFO - stdout - {'loss': 1.0252, 'grad_norm': 1.1310909986495972, 'learning_rate': 1.8880382280424786e-05, 'epoch': 0.53} +2025-02-05 12:55:03 - ERROR - stderr - 18%|█▊ | 3983/22434 [2:47:23<13:29:25, 2.63s/it] +2025-02-05 12:55:05 - ERROR - stderr - 18%|█▊ | 3984/22434 [2:47:25<13:17:13, 2.59s/it] +2025-02-05 12:55:05 - ERROR - stderr - +2025-02-05 12:55:05 - ERROR - stderr - +2025-02-05 12:55:05 - INFO - stdout - {'loss': 1.0342, 'grad_norm': 1.061645269393921, 'learning_rate': 1.887971839703568e-05, 'epoch': 0.53} +2025-02-05 12:55:05 - ERROR - stderr - 18%|█▊ | 3984/22434 [2:47:25<13:17:13, 2.59s/it] +2025-02-05 12:55:08 - ERROR - stderr - 18%|█▊ | 3985/22434 [2:47:28<13:11:00, 2.57s/it] +2025-02-05 12:55:08 - 
ERROR - stderr - +2025-02-05 12:55:08 - ERROR - stderr - +2025-02-05 12:55:08 - INFO - stdout - {'loss': 0.9589, 'grad_norm': 1.0959954261779785, 'learning_rate': 1.887905432855734e-05, 'epoch': 0.53} +2025-02-05 12:55:08 - ERROR - stderr - 18%|█▊ | 3985/22434 [2:47:28<13:11:00, 2.57s/it] +2025-02-05 12:55:11 - ERROR - stderr - 18%|█▊ | 3986/22434 [2:47:30<13:21:56, 2.61s/it] +2025-02-05 12:55:11 - ERROR - stderr - +2025-02-05 12:55:11 - ERROR - stderr - +2025-02-05 12:55:11 - INFO - stdout - {'loss': 0.9091, 'grad_norm': 1.0616424083709717, 'learning_rate': 1.8878390075003607e-05, 'epoch': 0.53} +2025-02-05 12:55:11 - ERROR - stderr - 18%|█▊ | 3986/22434 [2:47:30<13:21:56, 2.61s/it] +2025-02-05 12:55:13 - ERROR - stderr - 18%|█▊ | 3987/22434 [2:47:33<13:32:18, 2.64s/it] +2025-02-05 12:55:13 - ERROR - stderr - +2025-02-05 12:55:13 - ERROR - stderr - +2025-02-05 12:55:13 - INFO - stdout - {'loss': 0.9624, 'grad_norm': 1.180062174797058, 'learning_rate': 1.8877725636388327e-05, 'epoch': 0.53} +2025-02-05 12:55:13 - ERROR - stderr - 18%|█▊ | 3987/22434 [2:47:33<13:32:18, 2.64s/it] +2025-02-05 12:55:16 - ERROR - stderr - 18%|█▊ | 3988/22434 [2:47:36<13:19:22, 2.60s/it] +2025-02-05 12:55:16 - ERROR - stderr - +2025-02-05 12:55:16 - ERROR - stderr - +2025-02-05 12:55:16 - INFO - stdout - {'loss': 0.9843, 'grad_norm': 1.0381860733032227, 'learning_rate': 1.8877061012725355e-05, 'epoch': 0.53} +2025-02-05 12:55:16 - ERROR - stderr - 18%|█▊ | 3988/22434 [2:47:36<13:19:22, 2.60s/it] +2025-02-05 12:55:18 - ERROR - stderr - 18%|█▊ | 3989/22434 [2:47:38<13:09:18, 2.57s/it] +2025-02-05 12:55:18 - ERROR - stderr - +2025-02-05 12:55:18 - ERROR - stderr - +2025-02-05 12:55:18 - INFO - stdout - {'loss': 0.9808, 'grad_norm': 1.099104881286621, 'learning_rate': 1.8876396204028543e-05, 'epoch': 0.53} +2025-02-05 12:55:18 - ERROR - stderr - 18%|█▊ | 3989/22434 [2:47:38<13:09:18, 2.57s/it] +2025-02-05 12:55:21 - ERROR - stderr - 18%|█▊ | 3990/22434 [2:47:41<13:03:14, 2.55s/it] 
+2025-02-05 12:55:21 - ERROR - stderr - +2025-02-05 12:55:21 - ERROR - stderr - +2025-02-05 12:55:21 - INFO - stdout - {'loss': 0.8673, 'grad_norm': 1.3772532939910889, 'learning_rate': 1.887573121031174e-05, 'epoch': 0.53} +2025-02-05 12:55:21 - ERROR - stderr - 18%|█▊ | 3990/22434 [2:47:41<13:03:14, 2.55s/it] +2025-02-05 12:55:23 - ERROR - stderr - 18%|█▊ | 3991/22434 [2:47:43<12:52:14, 2.51s/it] +2025-02-05 12:55:23 - ERROR - stderr - +2025-02-05 12:55:23 - ERROR - stderr - +2025-02-05 12:55:23 - INFO - stdout - {'loss': 0.9918, 'grad_norm': 1.284437656402588, 'learning_rate': 1.887506603158882e-05, 'epoch': 0.53} +2025-02-05 12:55:23 - ERROR - stderr - 18%|█▊ | 3991/22434 [2:47:43<12:52:14, 2.51s/it] +2025-02-05 12:55:26 - ERROR - stderr - 18%|█▊ | 3992/22434 [2:47:45<12:43:10, 2.48s/it] +2025-02-05 12:55:26 - ERROR - stderr - +2025-02-05 12:55:26 - ERROR - stderr - +2025-02-05 12:55:26 - INFO - stdout - {'loss': 1.0122, 'grad_norm': 1.087471604347229, 'learning_rate': 1.8874400667873634e-05, 'epoch': 0.53} +2025-02-05 12:55:26 - ERROR - stderr - 18%|█▊ | 3992/22434 [2:47:45<12:43:10, 2.48s/it] +2025-02-05 12:55:28 - ERROR - stderr - 18%|█▊ | 3993/22434 [2:47:48<12:43:23, 2.48s/it] +2025-02-05 12:55:28 - ERROR - stderr - +2025-02-05 12:55:28 - ERROR - stderr - +2025-02-05 12:55:28 - INFO - stdout - {'loss': 0.9962, 'grad_norm': 1.028607964515686, 'learning_rate': 1.887373511918006e-05, 'epoch': 0.53} +2025-02-05 12:55:28 - ERROR - stderr - 18%|█▊ | 3993/22434 [2:47:48<12:43:23, 2.48s/it] +2025-02-05 12:55:31 - ERROR - stderr - 18%|█▊ | 3994/22434 [2:47:50<12:38:34, 2.47s/it] +2025-02-05 12:55:31 - ERROR - stderr - +2025-02-05 12:55:31 - ERROR - stderr - +2025-02-05 12:55:31 - INFO - stdout - {'loss': 0.9383, 'grad_norm': 1.147425889968872, 'learning_rate': 1.887306938552197e-05, 'epoch': 0.53} +2025-02-05 12:55:31 - ERROR - stderr - 18%|█▊ | 3994/22434 [2:47:50<12:38:34, 2.47s/it] +2025-02-05 12:55:33 - ERROR - stderr - 18%|█▊ | 3995/22434 [2:47:53<13:00:56, 
2.54s/it] +2025-02-05 12:55:33 - ERROR - stderr - +2025-02-05 12:55:33 - ERROR - stderr - +2025-02-05 12:55:33 - INFO - stdout - {'loss': 0.9109, 'grad_norm': 1.069148302078247, 'learning_rate': 1.887240346691324e-05, 'epoch': 0.53} +2025-02-05 12:55:33 - ERROR - stderr - 18%|█▊ | 3995/22434 [2:47:53<13:00:56, 2.54s/it] +2025-02-05 12:55:36 - ERROR - stderr - 18%|█▊ | 3996/22434 [2:47:56<13:04:08, 2.55s/it] +2025-02-05 12:55:36 - ERROR - stderr - +2025-02-05 12:55:36 - ERROR - stderr - +2025-02-05 12:55:36 - INFO - stdout - {'loss': 0.9228, 'grad_norm': 1.1001719236373901, 'learning_rate': 1.8871737363367745e-05, 'epoch': 0.53} +2025-02-05 12:55:36 - ERROR - stderr - 18%|█▊ | 3996/22434 [2:47:56<13:04:08, 2.55s/it] +2025-02-05 12:55:38 - ERROR - stderr - 18%|█▊ | 3997/22434 [2:47:58<13:00:50, 2.54s/it] +2025-02-05 12:55:38 - ERROR - stderr - +2025-02-05 12:55:38 - ERROR - stderr - +2025-02-05 12:55:38 - INFO - stdout - {'loss': 1.0522, 'grad_norm': 1.1183935403823853, 'learning_rate': 1.887107107489938e-05, 'epoch': 0.53} +2025-02-05 12:55:38 - ERROR - stderr - 18%|█▊ | 3997/22434 [2:47:58<13:00:50, 2.54s/it] +2025-02-05 12:55:41 - ERROR - stderr - 18%|█▊ | 3998/22434 [2:48:01<13:01:24, 2.54s/it] +2025-02-05 12:55:41 - ERROR - stderr - +2025-02-05 12:55:41 - ERROR - stderr - +2025-02-05 12:55:41 - INFO - stdout - {'loss': 0.9477, 'grad_norm': 1.1498290300369263, 'learning_rate': 1.8870404601522022e-05, 'epoch': 0.53} +2025-02-05 12:55:41 - ERROR - stderr - 18%|█▊ | 3998/22434 [2:48:01<13:01:24, 2.54s/it] +2025-02-05 12:55:43 - ERROR - stderr - 18%|█▊ | 3999/22434 [2:48:03<12:56:26, 2.53s/it] +2025-02-05 12:55:43 - ERROR - stderr - +2025-02-05 12:55:43 - ERROR - stderr - +2025-02-05 12:55:43 - INFO - stdout - {'loss': 0.9049, 'grad_norm': 1.1521180868148804, 'learning_rate': 1.8869737943249572e-05, 'epoch': 0.53} +2025-02-05 12:55:43 - ERROR - stderr - 18%|█▊ | 3999/22434 [2:48:03<12:56:26, 2.53s/it] +2025-02-05 12:55:46 - ERROR - stderr - 18%|█▊ | 4000/22434 
[2:48:06<13:04:18, 2.55s/it] +2025-02-05 12:55:46 - ERROR - stderr - +2025-02-05 12:55:46 - ERROR - stderr - +2025-02-05 12:55:46 - INFO - stdout - {'loss': 0.9458, 'grad_norm': 1.210731029510498, 'learning_rate': 1.8869071100095922e-05, 'epoch': 0.53} +2025-02-05 12:55:46 - ERROR - stderr - 18%|█▊ | 4000/22434 [2:48:06<13:04:18, 2.55s/it] +2025-02-05 12:55:49 - ERROR - stderr - 18%|█▊ | 4001/22434 [2:48:08<13:03:55, 2.55s/it] +2025-02-05 12:55:49 - ERROR - stderr - +2025-02-05 12:55:49 - ERROR - stderr - +2025-02-05 12:55:49 - INFO - stdout - {'loss': 1.0425, 'grad_norm': 1.0592319965362549, 'learning_rate': 1.886840407207497e-05, 'epoch': 0.54} +2025-02-05 12:55:49 - ERROR - stderr - 18%|█▊ | 4001/22434 [2:48:08<13:03:55, 2.55s/it] +2025-02-05 12:55:51 - ERROR - stderr - 18%|█▊ | 4002/22434 [2:48:11<12:55:04, 2.52s/it] +2025-02-05 12:55:51 - ERROR - stderr - +2025-02-05 12:55:51 - ERROR - stderr - +2025-02-05 12:55:51 - INFO - stdout - {'loss': 0.9616, 'grad_norm': 1.1009807586669922, 'learning_rate': 1.886773685920062e-05, 'epoch': 0.54} +2025-02-05 12:55:51 - ERROR - stderr - 18%|█▊ | 4002/22434 [2:48:11<12:55:04, 2.52s/it] +2025-02-05 12:55:54 - ERROR - stderr - 18%|█▊ | 4003/22434 [2:48:13<12:49:56, 2.51s/it] +2025-02-05 12:55:54 - ERROR - stderr - +2025-02-05 12:55:54 - ERROR - stderr - +2025-02-05 12:55:54 - INFO - stdout - {'loss': 0.9673, 'grad_norm': 1.0995705127716064, 'learning_rate': 1.8867069461486785e-05, 'epoch': 0.54} +2025-02-05 12:55:54 - ERROR - stderr - 18%|█▊ | 4003/22434 [2:48:13<12:49:56, 2.51s/it] +2025-02-05 12:55:56 - ERROR - stderr - 18%|█▊ | 4004/22434 [2:48:16<12:47:42, 2.50s/it] +2025-02-05 12:55:56 - ERROR - stderr - +2025-02-05 12:55:56 - ERROR - stderr - +2025-02-05 12:55:56 - INFO - stdout - {'loss': 0.9884, 'grad_norm': 1.076185941696167, 'learning_rate': 1.8866401878947365e-05, 'epoch': 0.54} +2025-02-05 12:55:56 - ERROR - stderr - 18%|█▊ | 4004/22434 [2:48:16<12:47:42, 2.50s/it] +2025-02-05 12:55:58 - ERROR - stderr - 18%|█▊ | 
4005/22434 [2:48:18<12:47:13, 2.50s/it] +2025-02-05 12:55:59 - ERROR - stderr - +2025-02-05 12:55:59 - ERROR - stderr - +2025-02-05 12:55:59 - INFO - stdout - {'loss': 1.032, 'grad_norm': 1.0944101810455322, 'learning_rate': 1.886573411159629e-05, 'epoch': 0.54} +2025-02-05 12:55:59 - ERROR - stderr - 18%|█▊ | 4005/22434 [2:48:18<12:47:13, 2.50s/it] +2025-02-05 12:56:01 - ERROR - stderr - 18%|█▊ | 4006/22434 [2:48:21<13:03:34, 2.55s/it] +2025-02-05 12:56:01 - ERROR - stderr - +2025-02-05 12:56:01 - ERROR - stderr - +2025-02-05 12:56:01 - INFO - stdout - {'loss': 1.0553, 'grad_norm': 1.0662139654159546, 'learning_rate': 1.8865066159447468e-05, 'epoch': 0.54} +2025-02-05 12:56:01 - ERROR - stderr - 18%|█▊ | 4006/22434 [2:48:21<13:03:34, 2.55s/it] +2025-02-05 12:56:04 - ERROR - stderr - 18%|█▊ | 4007/22434 [2:48:23<12:58:37, 2.54s/it] +2025-02-05 12:56:04 - ERROR - stderr - +2025-02-05 12:56:04 - ERROR - stderr - +2025-02-05 12:56:04 - INFO - stdout - {'loss': 0.8748, 'grad_norm': 0.9646372199058533, 'learning_rate': 1.8864398022514823e-05, 'epoch': 0.54} +2025-02-05 12:56:04 - ERROR - stderr - 18%|█▊ | 4007/22434 [2:48:23<12:58:37, 2.54s/it] +2025-02-05 12:56:06 - ERROR - stderr - 18%|█▊ | 4008/22434 [2:48:26<12:54:16, 2.52s/it] +2025-02-05 12:56:06 - ERROR - stderr - +2025-02-05 12:56:06 - ERROR - stderr - +2025-02-05 12:56:06 - INFO - stdout - {'loss': 0.9366, 'grad_norm': 1.0678128004074097, 'learning_rate': 1.8863729700812282e-05, 'epoch': 0.54} +2025-02-05 12:56:06 - ERROR - stderr - 18%|█▊ | 4008/22434 [2:48:26<12:54:16, 2.52s/it] +2025-02-05 12:56:09 - ERROR - stderr - 18%|█▊ | 4009/22434 [2:48:28<12:56:27, 2.53s/it] +2025-02-05 12:56:09 - ERROR - stderr - +2025-02-05 12:56:09 - ERROR - stderr - +2025-02-05 12:56:09 - INFO - stdout - {'loss': 0.8202, 'grad_norm': 1.0341919660568237, 'learning_rate': 1.886306119435378e-05, 'epoch': 0.54} +2025-02-05 12:56:09 - ERROR - stderr - 18%|█▊ | 4009/22434 [2:48:29<12:56:27, 2.53s/it] +2025-02-05 12:56:11 - ERROR - 
stderr - 18%|█▊ | 4010/22434 [2:48:31<12:50:08, 2.51s/it] +2025-02-05 12:56:11 - ERROR - stderr - +2025-02-05 12:56:11 - ERROR - stderr - +2025-02-05 12:56:11 - INFO - stdout - {'loss': 0.982, 'grad_norm': 1.1835156679153442, 'learning_rate': 1.886239250315325e-05, 'epoch': 0.54} +2025-02-05 12:56:11 - ERROR - stderr - 18%|█▊ | 4010/22434 [2:48:31<12:50:08, 2.51s/it] +2025-02-05 12:56:14 - ERROR - stderr - 18%|█▊ | 4011/22434 [2:48:33<12:47:45, 2.50s/it] +2025-02-05 12:56:14 - ERROR - stderr - +2025-02-05 12:56:14 - ERROR - stderr - +2025-02-05 12:56:14 - INFO - stdout - {'loss': 0.9127, 'grad_norm': 1.1393098831176758, 'learning_rate': 1.8861723627224627e-05, 'epoch': 0.54} +2025-02-05 12:56:14 - ERROR - stderr - 18%|█▊ | 4011/22434 [2:48:33<12:47:45, 2.50s/it] +2025-02-05 12:56:16 - ERROR - stderr - 18%|█▊ | 4012/22434 [2:48:36<12:53:33, 2.52s/it] +2025-02-05 12:56:16 - ERROR - stderr - +2025-02-05 12:56:16 - ERROR - stderr - +2025-02-05 12:56:16 - INFO - stdout - {'loss': 0.9508, 'grad_norm': 1.1345680952072144, 'learning_rate': 1.8861054566581852e-05, 'epoch': 0.54} +2025-02-05 12:56:16 - ERROR - stderr - 18%|█▊ | 4012/22434 [2:48:36<12:53:33, 2.52s/it] +2025-02-05 12:56:19 - ERROR - stderr - 18%|█▊ | 4013/22434 [2:48:38<12:49:05, 2.51s/it] +2025-02-05 12:56:19 - ERROR - stderr - +2025-02-05 12:56:19 - ERROR - stderr - +2025-02-05 12:56:19 - INFO - stdout - {'loss': 0.8737, 'grad_norm': 1.1731466054916382, 'learning_rate': 1.8860385321238877e-05, 'epoch': 0.54} +2025-02-05 12:56:19 - ERROR - stderr - 18%|█▊ | 4013/22434 [2:48:39<12:49:05, 2.51s/it] +2025-02-05 12:56:21 - ERROR - stderr - 18%|█▊ | 4014/22434 [2:48:41<12:49:43, 2.51s/it] +2025-02-05 12:56:21 - ERROR - stderr - +2025-02-05 12:56:21 - ERROR - stderr - +2025-02-05 12:56:21 - INFO - stdout - {'loss': 0.9562, 'grad_norm': 1.1283605098724365, 'learning_rate': 1.885971589120965e-05, 'epoch': 0.54} +2025-02-05 12:56:21 - ERROR - stderr - 18%|█▊ | 4014/22434 [2:48:41<12:49:43, 2.51s/it] +2025-02-05
12:56:24 - ERROR - stderr - 18%|█▊ | 4015/22434 [2:48:43<12:45:31, 2.49s/it] +2025-02-05 12:56:24 - ERROR - stderr - +2025-02-05 12:56:24 - ERROR - stderr - +2025-02-05 12:56:24 - INFO - stdout - {'loss': 1.0774, 'grad_norm': 1.0630086660385132, 'learning_rate': 1.8859046276508118e-05, 'epoch': 0.54} +2025-02-05 12:56:24 - ERROR - stderr - 18%|█▊ | 4015/22434 [2:48:43<12:45:31, 2.49s/it] +2025-02-05 12:56:26 - ERROR - stderr - 18%|█▊ | 4016/22434 [2:48:46<12:43:56, 2.49s/it] +2025-02-05 12:56:26 - ERROR - stderr - +2025-02-05 12:56:26 - ERROR - stderr - +2025-02-05 12:56:26 - INFO - stdout - {'loss': 0.9711, 'grad_norm': 1.1081104278564453, 'learning_rate': 1.885837647714825e-05, 'epoch': 0.54} +2025-02-05 12:56:26 - ERROR - stderr - 18%|█▊ | 4016/22434 [2:48:46<12:43:56, 2.49s/it] +2025-02-05 12:56:29 - ERROR - stderr - 18%|█▊ | 4017/22434 [2:48:48<12:43:28, 2.49s/it] +2025-02-05 12:56:29 - ERROR - stderr - +2025-02-05 12:56:29 - ERROR - stderr - +2025-02-05 12:56:29 - INFO - stdout - {'loss': 0.979, 'grad_norm': 0.9931021332740784, 'learning_rate': 1.8857706493143995e-05, 'epoch': 0.54} +2025-02-05 12:56:29 - ERROR - stderr - 18%|█▊ | 4017/22434 [2:48:48<12:43:28, 2.49s/it] +2025-02-05 12:56:31 - ERROR - stderr - 18%|█▊ | 4018/22434 [2:48:51<12:41:19, 2.48s/it] +2025-02-05 12:56:31 - ERROR - stderr - +2025-02-05 12:56:31 - ERROR - stderr - +2025-02-05 12:56:31 - INFO - stdout - {'loss': 0.9207, 'grad_norm': 1.0917123556137085, 'learning_rate': 1.8857036324509324e-05, 'epoch': 0.54} +2025-02-05 12:56:31 - ERROR - stderr - 18%|█▊ | 4018/22434 [2:48:51<12:41:19, 2.48s/it] +2025-02-05 12:56:34 - ERROR - stderr - 18%|█▊ | 4019/22434 [2:48:53<12:48:06, 2.50s/it] +2025-02-05 12:56:34 - ERROR - stderr - +2025-02-05 12:56:34 - ERROR - stderr - +2025-02-05 12:56:34 - INFO - stdout - {'loss': 1.1062, 'grad_norm': 1.0740206241607666, 'learning_rate': 1.8856365971258212e-05, 'epoch': 0.54} +2025-02-05 12:56:34 - ERROR - stderr - 18%|█▊ | 4019/22434 [2:48:53<12:48:06, 
2.50s/it] +2025-02-05 12:56:36 - ERROR - stderr - 18%|█▊ | 4020/22434 [2:48:56<12:44:45, 2.49s/it] +2025-02-05 12:56:36 - ERROR - stderr - +2025-02-05 12:56:36 - ERROR - stderr - +2025-02-05 12:56:36 - INFO - stdout - {'loss': 1.0445, 'grad_norm': 1.1552101373672485, 'learning_rate': 1.885569543340462e-05, 'epoch': 0.54} +2025-02-05 12:56:36 - ERROR - stderr - 18%|█▊ | 4020/22434 [2:48:56<12:44:45, 2.49s/it] +2025-02-05 12:56:39 - ERROR - stderr - 18%|█▊ | 4021/22434 [2:48:58<12:55:32, 2.53s/it] +2025-02-05 12:56:39 - ERROR - stderr - +2025-02-05 12:56:39 - ERROR - stderr - +2025-02-05 12:56:39 - INFO - stdout - {'loss': 1.0089, 'grad_norm': 1.117110013961792, 'learning_rate': 1.8855024710962536e-05, 'epoch': 0.54} +2025-02-05 12:56:39 - ERROR - stderr - 18%|█▊ | 4021/22434 [2:48:59<12:55:32, 2.53s/it] +2025-02-05 12:56:41 - ERROR - stderr - 18%|█▊ | 4022/22434 [2:49:01<12:58:08, 2.54s/it] +2025-02-05 12:56:41 - ERROR - stderr - +2025-02-05 12:56:41 - ERROR - stderr - +2025-02-05 12:56:41 - INFO - stdout - {'loss': 0.9584, 'grad_norm': 1.1631462574005127, 'learning_rate': 1.885435380394593e-05, 'epoch': 0.54} +2025-02-05 12:56:41 - ERROR - stderr - 18%|█▊ | 4022/22434 [2:49:01<12:58:08, 2.54s/it] +2025-02-05 12:56:44 - ERROR - stderr - 18%|█▊ | 4023/22434 [2:49:04<12:54:33, 2.52s/it] +2025-02-05 12:56:44 - ERROR - stderr - +2025-02-05 12:56:44 - ERROR - stderr - +2025-02-05 12:56:44 - INFO - stdout - {'loss': 0.9404, 'grad_norm': 1.017776370048523, 'learning_rate': 1.8853682712368796e-05, 'epoch': 0.54} +2025-02-05 12:56:44 - ERROR - stderr - 18%|█▊ | 4023/22434 [2:49:04<12:54:33, 2.52s/it] +2025-02-05 12:56:46 - ERROR - stderr - 18%|█▊ | 4024/22434 [2:49:06<12:58:01, 2.54s/it] +2025-02-05 12:56:46 - ERROR - stderr - +2025-02-05 12:56:46 - ERROR - stderr - +2025-02-05 12:56:46 - INFO - stdout - {'loss': 0.9939, 'grad_norm': 1.0239611864089966, 'learning_rate': 1.8853011436245113e-05, 'epoch': 0.54} +2025-02-05 12:56:46 - ERROR - stderr - 18%|█▊ | 4024/22434 
[2:49:06<12:58:01, 2.54s/it] +2025-02-05 12:56:49 - ERROR - stderr - 18%|█▊ | 4025/22434 [2:49:09<12:49:23, 2.51s/it] +2025-02-05 12:56:49 - ERROR - stderr - +2025-02-05 12:56:49 - ERROR - stderr - +2025-02-05 12:56:49 - INFO - stdout - {'loss': 1.1091, 'grad_norm': 1.2450803518295288, 'learning_rate': 1.885233997558888e-05, 'epoch': 0.54} +2025-02-05 12:56:49 - ERROR - stderr - 18%|█▊ | 4025/22434 [2:49:09<12:49:23, 2.51s/it] +2025-02-05 12:56:51 - ERROR - stderr - 18%|█▊ | 4026/22434 [2:49:11<12:45:35, 2.50s/it] +2025-02-05 12:56:51 - ERROR - stderr - +2025-02-05 12:56:51 - ERROR - stderr - +2025-02-05 12:56:51 - INFO - stdout - {'loss': 1.1424, 'grad_norm': 1.122562050819397, 'learning_rate': 1.8851668330414092e-05, 'epoch': 0.54} +2025-02-05 12:56:51 - ERROR - stderr - 18%|█▊ | 4026/22434 [2:49:11<12:45:35, 2.50s/it] +2025-02-05 12:56:54 - ERROR - stderr - 18%|█▊ | 4027/22434 [2:49:14<12:47:17, 2.50s/it] +2025-02-05 12:56:54 - ERROR - stderr - +2025-02-05 12:56:54 - ERROR - stderr - +2025-02-05 12:56:54 - INFO - stdout - {'loss': 0.9484, 'grad_norm': 1.1152565479278564, 'learning_rate': 1.885099650073475e-05, 'epoch': 0.54} +2025-02-05 12:56:54 - ERROR - stderr - 18%|█▊ | 4027/22434 [2:49:14<12:47:17, 2.50s/it] +2025-02-05 12:56:56 - ERROR - stderr - 18%|█▊ | 4028/22434 [2:49:16<12:54:51, 2.53s/it] +2025-02-05 12:56:56 - ERROR - stderr - +2025-02-05 12:56:56 - ERROR - stderr - +2025-02-05 12:56:56 - INFO - stdout - {'loss': 0.8987, 'grad_norm': 1.0552746057510376, 'learning_rate': 1.8850324486564853e-05, 'epoch': 0.54} +2025-02-05 12:56:56 - ERROR - stderr - 18%|█▊ | 4028/22434 [2:49:16<12:54:51, 2.53s/it] +2025-02-05 12:56:59 - ERROR - stderr - 18%|█▊ | 4029/22434 [2:49:19<12:50:55, 2.51s/it] +2025-02-05 12:56:59 - ERROR - stderr - +2025-02-05 12:56:59 - ERROR - stderr - +2025-02-05 12:56:59 - INFO - stdout - {'loss': 0.9762, 'grad_norm': 1.0813008546829224, 'learning_rate': 1.884965228791841e-05, 'epoch': 0.54} +2025-02-05 12:56:59 - ERROR - stderr - 18%|█▊ | 
4029/22434 [2:49:19<12:50:55, 2.51s/it] +2025-02-05 12:57:01 - ERROR - stderr - 18%|█▊ | 4030/22434 [2:49:21<12:51:08, 2.51s/it] +2025-02-05 12:57:01 - ERROR - stderr - +2025-02-05 12:57:01 - ERROR - stderr - +2025-02-05 12:57:01 - INFO - stdout - {'loss': 1.0349, 'grad_norm': 1.1637060642242432, 'learning_rate': 1.8848979904809435e-05, 'epoch': 0.54} +2025-02-05 12:57:01 - ERROR - stderr - 18%|█▊ | 4030/22434 [2:49:21<12:51:08, 2.51s/it] +2025-02-05 12:57:04 - ERROR - stderr - 18%|█▊ | 4031/22434 [2:49:24<12:48:32, 2.51s/it] +2025-02-05 12:57:04 - ERROR - stderr - +2025-02-05 12:57:04 - ERROR - stderr - +2025-02-05 12:57:04 - INFO - stdout - {'loss': 0.9885, 'grad_norm': 1.0969377756118774, 'learning_rate': 1.884830733725194e-05, 'epoch': 0.54} +2025-02-05 12:57:04 - ERROR - stderr - 18%|█▊ | 4031/22434 [2:49:24<12:48:32, 2.51s/it] +2025-02-05 12:57:06 - ERROR - stderr - 18%|█▊ | 4032/22434 [2:49:26<12:48:39, 2.51s/it] +2025-02-05 12:57:06 - ERROR - stderr - +2025-02-05 12:57:06 - ERROR - stderr - +2025-02-05 12:57:06 - INFO - stdout - {'loss': 0.9344, 'grad_norm': 1.0484496355056763, 'learning_rate': 1.8847634585259948e-05, 'epoch': 0.54} +2025-02-05 12:57:06 - ERROR - stderr - 18%|█▊ | 4032/22434 [2:49:26<12:48:39, 2.51s/it] +2025-02-05 12:57:09 - ERROR - stderr - 18%|█▊ | 4033/22434 [2:49:29<12:45:54, 2.50s/it] +2025-02-05 12:57:09 - ERROR - stderr - +2025-02-05 12:57:09 - ERROR - stderr - +2025-02-05 12:57:09 - INFO - stdout - {'loss': 1.0066, 'grad_norm': 1.0504816770553589, 'learning_rate': 1.8846961648847476e-05, 'epoch': 0.54} +2025-02-05 12:57:09 - ERROR - stderr - 18%|█▊ | 4033/22434 [2:49:29<12:45:54, 2.50s/it] +2025-02-05 12:57:11 - ERROR - stderr - 18%|█▊ | 4034/22434 [2:49:31<12:46:39, 2.50s/it] +2025-02-05 12:57:11 - ERROR - stderr - +2025-02-05 12:57:11 - ERROR - stderr - +2025-02-05 12:57:11 - INFO - stdout - {'loss': 1.0313, 'grad_norm': 1.1143165826797485, 'learning_rate': 1.8846288528028555e-05, 'epoch': 0.54} +2025-02-05 12:57:11 - ERROR - 
stderr - 18%|█▊ | 4034/22434 [2:49:31<12:46:39, 2.50s/it] +2025-02-05 12:57:14 - ERROR - stderr - 18%|█▊ | 4035/22434 [2:49:34<12:42:20, 2.49s/it] +2025-02-05 12:57:14 - ERROR - stderr - +2025-02-05 12:57:14 - ERROR - stderr - +2025-02-05 12:57:14 - INFO - stdout - {'loss': 1.017, 'grad_norm': 1.118200421333313, 'learning_rate': 1.8845615222817217e-05, 'epoch': 0.54} +2025-02-05 12:57:14 - ERROR - stderr - 18%|█▊ | 4035/22434 [2:49:34<12:42:20, 2.49s/it] +2025-02-05 12:57:16 - ERROR - stderr - 18%|█▊ | 4036/22434 [2:49:36<12:54:57, 2.53s/it] +2025-02-05 12:57:16 - ERROR - stderr - +2025-02-05 12:57:16 - ERROR - stderr - +2025-02-05 12:57:16 - INFO - stdout - {'loss': 0.9642, 'grad_norm': 1.1040101051330566, 'learning_rate': 1.884494173322749e-05, 'epoch': 0.54} +2025-02-05 12:57:16 - ERROR - stderr - 18%|█▊ | 4036/22434 [2:49:36<12:54:57, 2.53s/it] +2025-02-05 12:57:19 - ERROR - stderr - 18%|█▊ | 4037/22434 [2:49:39<12:45:20, 2.50s/it] +2025-02-05 12:57:19 - ERROR - stderr - +2025-02-05 12:57:19 - ERROR - stderr - +2025-02-05 12:57:19 - INFO - stdout - {'loss': 1.0766, 'grad_norm': 1.202311635017395, 'learning_rate': 1.884426805927342e-05, 'epoch': 0.54} +2025-02-05 12:57:19 - ERROR - stderr - 18%|█▊ | 4037/22434 [2:49:39<12:45:20, 2.50s/it] +2025-02-05 12:57:21 - ERROR - stderr - 18%|█▊ | 4038/22434 [2:49:41<12:45:20, 2.50s/it] +2025-02-05 12:57:21 - ERROR - stderr - +2025-02-05 12:57:21 - ERROR - stderr - +2025-02-05 12:57:21 - INFO - stdout - {'loss': 0.8847, 'grad_norm': 1.0037615299224854, 'learning_rate': 1.8843594200969043e-05, 'epoch': 0.54} +2025-02-05 12:57:21 - ERROR - stderr - 18%|█▊ | 4038/22434 [2:49:41<12:45:20, 2.50s/it] +2025-02-05 12:57:24 - ERROR - stderr - 18%|█▊ | 4039/22434 [2:49:44<12:41:02, 2.48s/it] +2025-02-05 12:57:24 - ERROR - stderr - +2025-02-05 12:57:24 - ERROR - stderr - +2025-02-05 12:57:24 - INFO - stdout - {'loss': 1.0628, 'grad_norm': 1.060538649559021, 'learning_rate': 1.884292015832841e-05, 'epoch': 0.54} +2025-02-05 12:57:24 - 
ERROR - stderr - 18%|█▊ | 4039/22434 [2:49:44<12:41:02, 2.48s/it] +2025-02-05 12:57:26 - ERROR - stderr - 18%|█▊ | 4040/22434 [2:49:46<12:39:25, 2.48s/it] +2025-02-05 12:57:26 - ERROR - stderr - +2025-02-05 12:57:26 - ERROR - stderr - +2025-02-05 12:57:26 - INFO - stdout - {'loss': 0.9419, 'grad_norm': 1.1091669797897339, 'learning_rate': 1.8842245931365564e-05, 'epoch': 0.54} +2025-02-05 12:57:26 - ERROR - stderr - 18%|█▊ | 4040/22434 [2:49:46<12:39:25, 2.48s/it] +2025-02-05 12:57:29 - ERROR - stderr - 18%|█▊ | 4041/22434 [2:49:49<13:09:39, 2.58s/it] +2025-02-05 12:57:29 - ERROR - stderr - +2025-02-05 12:57:29 - ERROR - stderr - +2025-02-05 12:57:29 - INFO - stdout - {'loss': 0.8612, 'grad_norm': 0.9443292617797852, 'learning_rate': 1.8841571520094564e-05, 'epoch': 0.54} +2025-02-05 12:57:29 - ERROR - stderr - 18%|█▊ | 4041/22434 [2:49:49<13:09:39, 2.58s/it] +2025-02-05 12:57:31 - ERROR - stderr - 18%|█▊ | 4042/22434 [2:49:51<12:55:57, 2.53s/it] +2025-02-05 12:57:32 - ERROR - stderr - +2025-02-05 12:57:32 - ERROR - stderr - +2025-02-05 12:57:32 - INFO - stdout - {'loss': 0.9891, 'grad_norm': 1.095067024230957, 'learning_rate': 1.8840896924529466e-05, 'epoch': 0.54} +2025-02-05 12:57:32 - ERROR - stderr - 18%|█▊ | 4042/22434 [2:49:51<12:55:57, 2.53s/it] +2025-02-05 12:57:34 - ERROR - stderr - 18%|█▊ | 4043/22434 [2:49:54<13:25:36, 2.63s/it] +2025-02-05 12:57:34 - ERROR - stderr - +2025-02-05 12:57:34 - ERROR - stderr - +2025-02-05 12:57:34 - INFO - stdout - {'loss': 0.8, 'grad_norm': 1.0677266120910645, 'learning_rate': 1.8840222144684333e-05, 'epoch': 0.54} +2025-02-05 12:57:34 - ERROR - stderr - 18%|█▊ | 4043/22434 [2:49:54<13:25:36, 2.63s/it] +2025-02-05 12:57:37 - ERROR - stderr - 18%|█▊ | 4044/22434 [2:49:57<13:09:56, 2.58s/it] +2025-02-05 12:57:37 - ERROR - stderr - +2025-02-05 12:57:37 - ERROR - stderr - +2025-02-05 12:57:37 - INFO - stdout - {'loss': 0.9176, 'grad_norm': 1.0165082216262817, 'learning_rate': 1.8839547180573228e-05, 'epoch': 0.54} +2025-02-05 
12:57:37 - ERROR - stderr - 18%|█▊ | 4044/22434 [2:49:57<13:09:56, 2.58s/it] +2025-02-05 12:57:39 - ERROR - stderr - 18%|█▊ | 4045/22434 [2:49:59<12:53:41, 2.52s/it] +2025-02-05 12:57:39 - ERROR - stderr - +2025-02-05 12:57:39 - ERROR - stderr - +2025-02-05 12:57:39 - INFO - stdout - {'loss': 1.1452, 'grad_norm': 1.2070832252502441, 'learning_rate': 1.883887203221022e-05, 'epoch': 0.54} +2025-02-05 12:57:39 - ERROR - stderr - 18%|█▊ | 4045/22434 [2:49:59<12:53:41, 2.52s/it] +2025-02-05 12:57:42 - ERROR - stderr - 18%|█▊ | 4046/22434 [2:50:01<12:49:07, 2.51s/it] +2025-02-05 12:57:42 - ERROR - stderr - +2025-02-05 12:57:42 - ERROR - stderr - +2025-02-05 12:57:42 - INFO - stdout - {'loss': 0.8936, 'grad_norm': 0.9099141955375671, 'learning_rate': 1.8838196699609385e-05, 'epoch': 0.54} +2025-02-05 12:57:42 - ERROR - stderr - 18%|█▊ | 4046/22434 [2:50:01<12:49:07, 2.51s/it] +2025-02-05 12:57:44 - ERROR - stderr - 18%|█▊ | 4047/22434 [2:50:04<12:44:37, 2.50s/it] +2025-02-05 12:57:44 - ERROR - stderr - +2025-02-05 12:57:44 - ERROR - stderr - +2025-02-05 12:57:44 - INFO - stdout - {'loss': 1.0062, 'grad_norm': 0.9718128442764282, 'learning_rate': 1.8837521182784795e-05, 'epoch': 0.54} +2025-02-05 12:57:44 - ERROR - stderr - 18%|█▊ | 4047/22434 [2:50:04<12:44:37, 2.50s/it] +2025-02-05 12:57:47 - ERROR - stderr - 18%|█▊ | 4048/22434 [2:50:06<12:41:08, 2.48s/it] +2025-02-05 12:57:47 - ERROR - stderr - +2025-02-05 12:57:47 - ERROR - stderr - +2025-02-05 12:57:47 - INFO - stdout - {'loss': 0.9909, 'grad_norm': 1.1335023641586304, 'learning_rate': 1.8836845481750533e-05, 'epoch': 0.54} +2025-02-05 12:57:47 - ERROR - stderr - 18%|█▊ | 4048/22434 [2:50:06<12:41:08, 2.48s/it] +2025-02-05 12:57:49 - ERROR - stderr - 18%|█▊ | 4049/22434 [2:50:09<12:53:59, 2.53s/it] +2025-02-05 12:57:49 - ERROR - stderr - +2025-02-05 12:57:49 - ERROR - stderr - +2025-02-05 12:57:49 - INFO - stdout - {'loss': 0.9943, 'grad_norm': 1.0748789310455322, 'learning_rate': 1.8836169596520683e-05, 'epoch': 
0.54} +2025-02-05 12:57:49 - ERROR - stderr - 18%|█▊ | 4049/22434 [2:50:09<12:53:59, 2.53s/it] +2025-02-05 12:57:52 - ERROR - stderr - 18%|█▊ | 4050/22434 [2:50:11<12:50:17, 2.51s/it] +2025-02-05 12:57:52 - ERROR - stderr - +2025-02-05 12:57:52 - ERROR - stderr - +2025-02-05 12:57:52 - INFO - stdout - {'loss': 0.9091, 'grad_norm': 1.1526007652282715, 'learning_rate': 1.883549352710933e-05, 'epoch': 0.54} +2025-02-05 12:57:52 - ERROR - stderr - 18%|█▊ | 4050/22434 [2:50:11<12:50:17, 2.51s/it] +2025-02-05 12:57:54 - ERROR - stderr - 18%|█▊ | 4051/22434 [2:50:14<12:43:57, 2.49s/it] +2025-02-05 12:57:54 - ERROR - stderr - +2025-02-05 12:57:54 - ERROR - stderr - +2025-02-05 12:57:54 - INFO - stdout - {'loss': 1.1026, 'grad_norm': 1.204253911972046, 'learning_rate': 1.8834817273530572e-05, 'epoch': 0.54} +2025-02-05 12:57:54 - ERROR - stderr - 18%|█▊ | 4051/22434 [2:50:14<12:43:57, 2.49s/it] +2025-02-05 12:57:57 - ERROR - stderr - 18%|█▊ | 4052/22434 [2:50:16<12:45:20, 2.50s/it] +2025-02-05 12:57:57 - ERROR - stderr - +2025-02-05 12:57:57 - ERROR - stderr - +2025-02-05 12:57:57 - INFO - stdout - {'loss': 1.0354, 'grad_norm': 1.2260923385620117, 'learning_rate': 1.88341408357985e-05, 'epoch': 0.54} +2025-02-05 12:57:57 - ERROR - stderr - 18%|█▊ | 4052/22434 [2:50:16<12:45:20, 2.50s/it] +2025-02-05 12:57:59 - ERROR - stderr - 18%|█▊ | 4053/22434 [2:50:19<12:45:44, 2.50s/it] +2025-02-05 12:57:59 - ERROR - stderr - +2025-02-05 12:57:59 - ERROR - stderr - +2025-02-05 12:57:59 - INFO - stdout - {'loss': 0.9088, 'grad_norm': 1.0631901025772095, 'learning_rate': 1.8833464213927217e-05, 'epoch': 0.54} +2025-02-05 12:57:59 - ERROR - stderr - 18%|█▊ | 4053/22434 [2:50:19<12:45:44, 2.50s/it] +2025-02-05 12:58:02 - ERROR - stderr - 18%|█▊ | 4054/22434 [2:50:21<12:40:42, 2.48s/it] +2025-02-05 12:58:02 - ERROR - stderr - +2025-02-05 12:58:02 - ERROR - stderr - +2025-02-05 12:58:02 - INFO - stdout - {'loss': 0.8789, 'grad_norm': 1.0479751825332642, 'learning_rate': 
1.8832787407930825e-05, 'epoch': 0.54} +2025-02-05 12:58:02 - ERROR - stderr - 18%|█▊ | 4054/22434 [2:50:21<12:40:42, 2.48s/it] +2025-02-05 12:58:04 - ERROR - stderr - 18%|█▊ | 4055/22434 [2:50:24<12:33:01, 2.46s/it] +2025-02-05 12:58:04 - ERROR - stderr - +2025-02-05 12:58:04 - ERROR - stderr - +2025-02-05 12:58:04 - INFO - stdout - {'loss': 1.014, 'grad_norm': 1.242635726928711, 'learning_rate': 1.8832110417823433e-05, 'epoch': 0.54} +2025-02-05 12:58:04 - ERROR - stderr - 18%|█▊ | 4055/22434 [2:50:24<12:33:01, 2.46s/it] +2025-02-05 12:58:07 - ERROR - stderr - 18%|█▊ | 4056/22434 [2:50:26<12:53:19, 2.52s/it] +2025-02-05 12:58:07 - ERROR - stderr - +2025-02-05 12:58:07 - ERROR - stderr - +2025-02-05 12:58:07 - INFO - stdout - {'loss': 0.9911, 'grad_norm': 1.082195520401001, 'learning_rate': 1.8831433243619148e-05, 'epoch': 0.54} +2025-02-05 12:58:07 - ERROR - stderr - 18%|█▊ | 4056/22434 [2:50:26<12:53:19, 2.52s/it] +2025-02-05 12:58:09 - ERROR - stderr - 18%|█▊ | 4057/22434 [2:50:29<12:45:00, 2.50s/it] +2025-02-05 12:58:09 - ERROR - stderr - +2025-02-05 12:58:09 - ERROR - stderr - +2025-02-05 12:58:09 - INFO - stdout - {'loss': 1.037, 'grad_norm': 1.1591027975082397, 'learning_rate': 1.8830755885332087e-05, 'epoch': 0.54} +2025-02-05 12:58:09 - ERROR - stderr - 18%|█▊ | 4057/22434 [2:50:29<12:45:00, 2.50s/it] +2025-02-05 12:58:12 - ERROR - stderr - 18%|█▊ | 4058/22434 [2:50:31<12:41:34, 2.49s/it] +2025-02-05 12:58:12 - ERROR - stderr - +2025-02-05 12:58:12 - ERROR - stderr - +2025-02-05 12:58:12 - INFO - stdout - {'loss': 0.8676, 'grad_norm': 1.0585474967956543, 'learning_rate': 1.8830078342976374e-05, 'epoch': 0.54} +2025-02-05 12:58:12 - ERROR - stderr - 18%|█▊ | 4058/22434 [2:50:31<12:41:34, 2.49s/it] +2025-02-05 12:58:14 - ERROR - stderr - 18%|█▊ | 4059/22434 [2:50:34<12:48:42, 2.51s/it] +2025-02-05 12:58:14 - ERROR - stderr - +2025-02-05 12:58:14 - ERROR - stderr - +2025-02-05 12:58:14 - INFO - stdout - {'loss': 0.8947, 'grad_norm': 0.8934906125068665, 
'learning_rate': 1.8829400616566124e-05, 'epoch': 0.54} +2025-02-05 12:58:14 - ERROR - stderr - 18%|█▊ | 4059/22434 [2:50:34<12:48:42, 2.51s/it] +2025-02-05 12:58:17 - ERROR - stderr - 18%|█▊ | 4060/22434 [2:50:36<12:47:27, 2.51s/it] +2025-02-05 12:58:17 - ERROR - stderr - +2025-02-05 12:58:17 - ERROR - stderr - +2025-02-05 12:58:17 - INFO - stdout - {'loss': 1.1145, 'grad_norm': 1.2074781656265259, 'learning_rate': 1.882872270611547e-05, 'epoch': 0.54} +2025-02-05 12:58:17 - ERROR - stderr - 18%|█▊ | 4060/22434 [2:50:36<12:47:27, 2.51s/it] +2025-02-05 12:58:19 - ERROR - stderr - 18%|█▊ | 4061/22434 [2:50:39<13:20:42, 2.61s/it] +2025-02-05 12:58:20 - ERROR - stderr - +2025-02-05 12:58:20 - ERROR - stderr - +2025-02-05 12:58:20 - INFO - stdout - {'loss': 0.9149, 'grad_norm': 1.0659806728363037, 'learning_rate': 1.8828044611638538e-05, 'epoch': 0.54} +2025-02-05 12:58:20 - ERROR - stderr - 18%|█▊ | 4061/22434 [2:50:39<13:20:42, 2.61s/it] +2025-02-05 12:58:22 - ERROR - stderr - 18%|█▊ | 4062/22434 [2:50:42<13:03:12, 2.56s/it] +2025-02-05 12:58:22 - ERROR - stderr - +2025-02-05 12:58:22 - ERROR - stderr - +2025-02-05 12:58:22 - INFO - stdout - {'loss': 1.0843, 'grad_norm': 1.1296091079711914, 'learning_rate': 1.8827366333149465e-05, 'epoch': 0.54} +2025-02-05 12:58:22 - ERROR - stderr - 18%|█▊ | 4062/22434 [2:50:42<13:03:12, 2.56s/it] +2025-02-05 12:58:24 - ERROR - stderr - 18%|█▊ | 4063/22434 [2:50:44<13:00:44, 2.55s/it] +2025-02-05 12:58:24 - ERROR - stderr - +2025-02-05 12:58:24 - ERROR - stderr - +2025-02-05 12:58:24 - INFO - stdout - {'loss': 1.0003, 'grad_norm': 0.9791759848594666, 'learning_rate': 1.8826687870662383e-05, 'epoch': 0.54} +2025-02-05 12:58:24 - ERROR - stderr - 18%|█▊ | 4063/22434 [2:50:44<13:00:44, 2.55s/it] +2025-02-05 12:58:27 - ERROR - stderr - 18%|█▊ | 4064/22434 [2:50:47<13:00:45, 2.55s/it] +2025-02-05 12:58:27 - ERROR - stderr - +2025-02-05 12:58:27 - ERROR - stderr - +2025-02-05 12:58:27 - INFO - stdout - {'loss': 0.8917, 'grad_norm': 
0.9883964657783508, 'learning_rate': 1.882600922419144e-05, 'epoch': 0.54} +2025-02-05 12:58:27 - ERROR - stderr - 18%|█▊ | 4064/22434 [2:50:47<13:00:45, 2.55s/it] +2025-02-05 12:58:29 - ERROR - stderr - 18%|█▊ | 4065/22434 [2:50:49<12:53:04, 2.53s/it] +2025-02-05 12:58:30 - ERROR - stderr - +2025-02-05 12:58:30 - ERROR - stderr - +2025-02-05 12:58:30 - INFO - stdout - {'loss': 1.0969, 'grad_norm': 1.1391581296920776, 'learning_rate': 1.8825330393750783e-05, 'epoch': 0.54} +2025-02-05 12:58:30 - ERROR - stderr - 18%|█▊ | 4065/22434 [2:50:49<12:53:04, 2.53s/it] +2025-02-05 12:58:32 - ERROR - stderr - 18%|█▊ | 4066/22434 [2:50:52<12:51:09, 2.52s/it] +2025-02-05 12:58:32 - ERROR - stderr - +2025-02-05 12:58:32 - ERROR - stderr - +2025-02-05 12:58:32 - INFO - stdout - {'loss': 1.0222, 'grad_norm': 1.1297281980514526, 'learning_rate': 1.882465137935456e-05, 'epoch': 0.54} +2025-02-05 12:58:32 - ERROR - stderr - 18%|█▊ | 4066/22434 [2:50:52<12:51:09, 2.52s/it] +2025-02-05 12:58:34 - ERROR - stderr - 18%|█▊ | 4067/22434 [2:50:54<12:50:49, 2.52s/it] +2025-02-05 12:58:35 - ERROR - stderr - +2025-02-05 12:58:35 - ERROR - stderr - +2025-02-05 12:58:35 - INFO - stdout - {'loss': 1.0208, 'grad_norm': 1.1820268630981445, 'learning_rate': 1.8823972181016922e-05, 'epoch': 0.54} +2025-02-05 12:58:35 - ERROR - stderr - 18%|█▊ | 4067/22434 [2:50:54<12:50:49, 2.52s/it] +2025-02-05 12:58:37 - ERROR - stderr - 18%|█▊ | 4068/22434 [2:50:57<12:45:15, 2.50s/it] +2025-02-05 12:58:37 - ERROR - stderr - +2025-02-05 12:58:37 - ERROR - stderr - +2025-02-05 12:58:37 - INFO - stdout - {'loss': 0.9482, 'grad_norm': 1.0535166263580322, 'learning_rate': 1.8823292798752023e-05, 'epoch': 0.54} +2025-02-05 12:58:37 - ERROR - stderr - 18%|█▊ | 4068/22434 [2:50:57<12:45:15, 2.50s/it] +2025-02-05 12:58:39 - ERROR - stderr - 18%|█▊ | 4069/22434 [2:50:59<12:39:31, 2.48s/it] +2025-02-05 12:58:39 - ERROR - stderr - +2025-02-05 12:58:39 - ERROR - stderr - +2025-02-05 12:58:39 - INFO - stdout - {'loss': 1.0862, 
'grad_norm': 1.2228018045425415, 'learning_rate': 1.8822613232574035e-05, 'epoch': 0.54} +2025-02-05 12:58:39 - ERROR - stderr - 18%|█▊ | 4069/22434 [2:50:59<12:39:31, 2.48s/it] +2025-02-05 12:58:42 - ERROR - stderr - 18%|█▊ | 4070/22434 [2:51:02<12:39:42, 2.48s/it] +2025-02-05 12:58:42 - ERROR - stderr - +2025-02-05 12:58:42 - ERROR - stderr - +2025-02-05 12:58:42 - INFO - stdout - {'loss': 0.9011, 'grad_norm': 0.9343435168266296, 'learning_rate': 1.882193348249711e-05, 'epoch': 0.54} +2025-02-05 12:58:42 - ERROR - stderr - 18%|█▊ | 4070/22434 [2:51:02<12:39:42, 2.48s/it] +2025-02-05 12:58:44 - ERROR - stderr - 18%|█▊ | 4071/22434 [2:51:04<12:36:11, 2.47s/it] +2025-02-05 12:58:44 - ERROR - stderr - +2025-02-05 12:58:44 - ERROR - stderr - +2025-02-05 12:58:44 - INFO - stdout - {'loss': 1.0211, 'grad_norm': 1.1489194631576538, 'learning_rate': 1.8821253548535427e-05, 'epoch': 0.54} +2025-02-05 12:58:44 - ERROR - stderr - 18%|█▊ | 4071/22434 [2:51:04<12:36:11, 2.47s/it] +2025-02-05 12:58:47 - ERROR - stderr - 18%|█▊ | 4072/22434 [2:51:07<12:39:09, 2.48s/it] +2025-02-05 12:58:47 - ERROR - stderr - +2025-02-05 12:58:47 - ERROR - stderr - +2025-02-05 12:58:47 - INFO - stdout - {'loss': 1.0102, 'grad_norm': 1.1050649881362915, 'learning_rate': 1.8820573430703155e-05, 'epoch': 0.54} +2025-02-05 12:58:47 - ERROR - stderr - 18%|█▊ | 4072/22434 [2:51:07<12:39:09, 2.48s/it] +2025-02-05 12:58:49 - ERROR - stderr - 18%|█▊ | 4073/22434 [2:51:09<12:39:16, 2.48s/it] +2025-02-05 12:58:49 - ERROR - stderr - +2025-02-05 12:58:49 - ERROR - stderr - +2025-02-05 12:58:49 - INFO - stdout - {'loss': 0.9198, 'grad_norm': 1.0614635944366455, 'learning_rate': 1.881989312901447e-05, 'epoch': 0.54} +2025-02-05 12:58:49 - ERROR - stderr - 18%|█▊ | 4073/22434 [2:51:09<12:39:16, 2.48s/it] +2025-02-05 12:58:52 - ERROR - stderr - 18%|█▊ | 4074/22434 [2:51:12<12:47:09, 2.51s/it] +2025-02-05 12:58:52 - ERROR - stderr - +2025-02-05 12:58:52 - ERROR - stderr - +2025-02-05 12:58:52 - INFO - stdout - 
{'loss': 0.9444, 'grad_norm': 0.9965329170227051, 'learning_rate': 1.881921264348355e-05, 'epoch': 0.54} +2025-02-05 12:58:52 - ERROR - stderr - 18%|█▊ | 4074/22434 [2:51:12<12:47:09, 2.51s/it] +2025-02-05 12:58:54 - ERROR - stderr - 18%|█▊ | 4075/22434 [2:51:14<12:38:47, 2.48s/it] +2025-02-05 12:58:54 - ERROR - stderr - +2025-02-05 12:58:54 - ERROR - stderr - +2025-02-05 12:58:54 - INFO - stdout - {'loss': 1.0339, 'grad_norm': 1.0792934894561768, 'learning_rate': 1.8818531974124584e-05, 'epoch': 0.54} +2025-02-05 12:58:54 - ERROR - stderr - 18%|█▊ | 4075/22434 [2:51:14<12:38:47, 2.48s/it] +2025-02-05 12:58:57 - ERROR - stderr - 18%|█▊ | 4076/22434 [2:51:16<12:34:12, 2.46s/it] +2025-02-05 12:58:57 - ERROR - stderr - +2025-02-05 12:58:57 - ERROR - stderr - +2025-02-05 12:58:57 - INFO - stdout - {'loss': 0.9802, 'grad_norm': 1.233396053314209, 'learning_rate': 1.881785112095176e-05, 'epoch': 0.55} +2025-02-05 12:58:57 - ERROR - stderr - 18%|█▊ | 4076/22434 [2:51:17<12:34:12, 2.46s/it] +2025-02-05 12:58:59 - ERROR - stderr - 18%|█▊ | 4077/22434 [2:51:19<12:38:19, 2.48s/it] +2025-02-05 12:58:59 - ERROR - stderr - +2025-02-05 12:58:59 - ERROR - stderr - +2025-02-05 12:58:59 - INFO - stdout - {'loss': 0.895, 'grad_norm': 1.0449467897415161, 'learning_rate': 1.8817170083979262e-05, 'epoch': 0.55} +2025-02-05 12:58:59 - ERROR - stderr - 18%|█▊ | 4077/22434 [2:51:19<12:38:19, 2.48s/it] +2025-02-05 12:59:02 - ERROR - stderr - 18%|█▊ | 4078/22434 [2:51:21<12:35:00, 2.47s/it] +2025-02-05 12:59:02 - ERROR - stderr - +2025-02-05 12:59:02 - ERROR - stderr - +2025-02-05 12:59:02 - INFO - stdout - {'loss': 0.9726, 'grad_norm': 1.0529789924621582, 'learning_rate': 1.8816488863221294e-05, 'epoch': 0.55} +2025-02-05 12:59:02 - ERROR - stderr - 18%|█▊ | 4078/22434 [2:51:21<12:35:00, 2.47s/it] +2025-02-05 12:59:04 - ERROR - stderr - 18%|█▊ | 4079/22434 [2:51:24<12:37:57, 2.48s/it] +2025-02-05 12:59:04 - ERROR - stderr - +2025-02-05 12:59:04 - ERROR - stderr - +2025-02-05 12:59:04 - INFO 
- stdout - {'loss': 0.9804, 'grad_norm': 1.057137370109558, 'learning_rate': 1.881580745869205e-05, 'epoch': 0.55} +2025-02-05 12:59:04 - ERROR - stderr - 18%|█▊ | 4079/22434 [2:51:24<12:37:57, 2.48s/it] +2025-02-05 12:59:07 - ERROR - stderr - 18%|█▊ | 4080/22434 [2:51:26<12:34:11, 2.47s/it] +2025-02-05 12:59:07 - ERROR - stderr - +2025-02-05 12:59:07 - ERROR - stderr - +2025-02-05 12:59:07 - INFO - stdout - {'loss': 0.9712, 'grad_norm': 1.1353020668029785, 'learning_rate': 1.8815125870405738e-05, 'epoch': 0.55} +2025-02-05 12:59:07 - ERROR - stderr - 18%|█▊ | 4080/22434 [2:51:26<12:34:11, 2.47s/it] +2025-02-05 12:59:09 - ERROR - stderr - 18%|█▊ | 4081/22434 [2:51:29<12:33:49, 2.46s/it] +2025-02-05 12:59:09 - ERROR - stderr - +2025-02-05 12:59:09 - ERROR - stderr - +2025-02-05 12:59:09 - INFO - stdout - {'loss': 1.1781, 'grad_norm': 1.165024995803833, 'learning_rate': 1.8814444098376562e-05, 'epoch': 0.55} +2025-02-05 12:59:09 - ERROR - stderr - 18%|█▊ | 4081/22434 [2:51:29<12:33:49, 2.46s/it] +2025-02-05 12:59:12 - ERROR - stderr - 18%|█▊ | 4082/22434 [2:51:31<12:32:38, 2.46s/it] +2025-02-05 12:59:12 - ERROR - stderr - +2025-02-05 12:59:12 - ERROR - stderr - +2025-02-05 12:59:12 - INFO - stdout - {'loss': 1.0514, 'grad_norm': 1.25754976272583, 'learning_rate': 1.881376214261873e-05, 'epoch': 0.55} +2025-02-05 12:59:12 - ERROR - stderr - 18%|█▊ | 4082/22434 [2:51:31<12:32:38, 2.46s/it] +2025-02-05 12:59:14 - ERROR - stderr - 18%|█▊ | 4083/22434 [2:51:34<12:37:41, 2.48s/it] +2025-02-05 12:59:14 - ERROR - stderr - +2025-02-05 12:59:14 - ERROR - stderr - +2025-02-05 12:59:14 - INFO - stdout - {'loss': 0.9676, 'grad_norm': 1.0897449254989624, 'learning_rate': 1.8813080003146463e-05, 'epoch': 0.55} +2025-02-05 12:59:14 - ERROR - stderr - 18%|█▊ | 4083/22434 [2:51:34<12:37:41, 2.48s/it] +2025-02-05 12:59:17 - ERROR - stderr - 18%|█▊ | 4084/22434 [2:51:36<12:38:52, 2.48s/it] +2025-02-05 12:59:17 - ERROR - stderr - +2025-02-05 12:59:17 - ERROR - stderr - +2025-02-05 
12:59:17 - INFO - stdout - {'loss': 0.9263, 'grad_norm': 0.9986870884895325, 'learning_rate': 1.8812397679973975e-05, 'epoch': 0.55} +2025-02-05 12:59:17 - ERROR - stderr - 18%|█▊ | 4084/22434 [2:51:36<12:38:52, 2.48s/it] +2025-02-05 12:59:19 - ERROR - stderr - 18%|█▊ | 4085/22434 [2:51:39<12:35:55, 2.47s/it] +2025-02-05 12:59:19 - ERROR - stderr - +2025-02-05 12:59:19 - ERROR - stderr - +2025-02-05 12:59:19 - INFO - stdout - {'loss': 0.872, 'grad_norm': 1.0525767803192139, 'learning_rate': 1.8811715173115492e-05, 'epoch': 0.55} +2025-02-05 12:59:19 - ERROR - stderr - 18%|█▊ | 4085/22434 [2:51:39<12:35:55, 2.47s/it] +2025-02-05 12:59:22 - ERROR - stderr - 18%|█▊ | 4086/22434 [2:51:41<12:45:54, 2.50s/it] +2025-02-05 12:59:22 - ERROR - stderr - +2025-02-05 12:59:22 - ERROR - stderr - +2025-02-05 12:59:22 - INFO - stdout - {'loss': 0.9688, 'grad_norm': 1.033512830734253, 'learning_rate': 1.8811032482585235e-05, 'epoch': 0.55} +2025-02-05 12:59:22 - ERROR - stderr - 18%|█▊ | 4086/22434 [2:51:41<12:45:54, 2.50s/it] +2025-02-05 12:59:24 - ERROR - stderr - 18%|█▊ | 4087/22434 [2:51:44<12:45:43, 2.50s/it] +2025-02-05 12:59:24 - ERROR - stderr - +2025-02-05 12:59:24 - ERROR - stderr - +2025-02-05 12:59:24 - INFO - stdout - {'loss': 0.8851, 'grad_norm': 1.0833057165145874, 'learning_rate': 1.881034960839744e-05, 'epoch': 0.55} +2025-02-05 12:59:24 - ERROR - stderr - 18%|█▊ | 4087/22434 [2:51:44<12:45:43, 2.50s/it] +2025-02-05 12:59:27 - ERROR - stderr - 18%|█▊ | 4088/22434 [2:51:46<12:44:20, 2.50s/it] +2025-02-05 12:59:27 - ERROR - stderr - +2025-02-05 12:59:27 - ERROR - stderr - +2025-02-05 12:59:27 - INFO - stdout - {'loss': 0.9235, 'grad_norm': 1.0895195007324219, 'learning_rate': 1.8809666550566334e-05, 'epoch': 0.55} +2025-02-05 12:59:27 - ERROR - stderr - 18%|█▊ | 4088/22434 [2:51:46<12:44:20, 2.50s/it] +2025-02-05 12:59:29 - ERROR - stderr - 18%|█▊ | 4089/22434 [2:51:49<12:47:39, 2.51s/it] +2025-02-05 12:59:29 - ERROR - stderr - +2025-02-05 12:59:29 - ERROR - stderr - 
+2025-02-05 12:59:29 - INFO - stdout - {'loss': 0.8973, 'grad_norm': 1.0610026121139526, 'learning_rate': 1.8808983309106164e-05, 'epoch': 0.55} +2025-02-05 12:59:29 - ERROR - stderr - 18%|█▊ | 4089/22434 [2:51:49<12:47:39, 2.51s/it] +2025-02-05 12:59:32 - ERROR - stderr - 18%|█▊ | 4090/22434 [2:51:51<12:39:42, 2.48s/it] +2025-02-05 12:59:32 - ERROR - stderr - +2025-02-05 12:59:32 - ERROR - stderr - +2025-02-05 12:59:32 - INFO - stdout - {'loss': 1.0943, 'grad_norm': 1.1304194927215576, 'learning_rate': 1.880829988403116e-05, 'epoch': 0.55} +2025-02-05 12:59:32 - ERROR - stderr - 18%|█▊ | 4090/22434 [2:51:51<12:39:42, 2.48s/it] +2025-02-05 12:59:34 - ERROR - stderr - 18%|█▊ | 4091/22434 [2:51:54<12:35:03, 2.47s/it] +2025-02-05 12:59:34 - ERROR - stderr - +2025-02-05 12:59:34 - ERROR - stderr - +2025-02-05 12:59:34 - INFO - stdout - {'loss': 0.967, 'grad_norm': 1.2175449132919312, 'learning_rate': 1.880761627535558e-05, 'epoch': 0.55} +2025-02-05 12:59:34 - ERROR - stderr - 18%|█▊ | 4091/22434 [2:51:54<12:35:03, 2.47s/it] +2025-02-05 12:59:36 - ERROR - stderr - 18%|█▊ | 4092/22434 [2:51:56<12:36:11, 2.47s/it] +2025-02-05 12:59:36 - ERROR - stderr - +2025-02-05 12:59:36 - ERROR - stderr - +2025-02-05 12:59:36 - INFO - stdout - {'loss': 1.0145, 'grad_norm': 1.1401782035827637, 'learning_rate': 1.8806932483093666e-05, 'epoch': 0.55} +2025-02-05 12:59:36 - ERROR - stderr - 18%|█▊ | 4092/22434 [2:51:56<12:36:11, 2.47s/it] +2025-02-05 12:59:39 - ERROR - stderr - 18%|█▊ | 4093/22434 [2:51:59<12:34:44, 2.47s/it] +2025-02-05 12:59:39 - ERROR - stderr - +2025-02-05 12:59:39 - ERROR - stderr - +2025-02-05 12:59:39 - INFO - stdout - {'loss': 0.9546, 'grad_norm': 1.1192463636398315, 'learning_rate': 1.8806248507259668e-05, 'epoch': 0.55} +2025-02-05 12:59:39 - ERROR - stderr - 18%|█▊ | 4093/22434 [2:51:59<12:34:44, 2.47s/it] +2025-02-05 12:59:41 - ERROR - stderr - 18%|█▊ | 4094/22434 [2:52:01<12:47:58, 2.51s/it] +2025-02-05 12:59:42 - ERROR - stderr - +2025-02-05 12:59:42 - 
ERROR - stderr - +2025-02-05 12:59:42 - INFO - stdout - {'loss': 0.9377, 'grad_norm': 1.172466516494751, 'learning_rate': 1.880556434786785e-05, 'epoch': 0.55} +2025-02-05 12:59:42 - ERROR - stderr - 18%|█▊ | 4094/22434 [2:52:01<12:47:58, 2.51s/it] +2025-02-05 12:59:44 - ERROR - stderr - 18%|█▊ | 4095/22434 [2:52:04<12:49:17, 2.52s/it] +2025-02-05 12:59:44 - ERROR - stderr - +2025-02-05 12:59:44 - ERROR - stderr - +2025-02-05 12:59:44 - INFO - stdout - {'loss': 1.0167, 'grad_norm': 1.0581740140914917, 'learning_rate': 1.8804880004932468e-05, 'epoch': 0.55} +2025-02-05 12:59:44 - ERROR - stderr - 18%|█▊ | 4095/22434 [2:52:04<12:49:17, 2.52s/it] +2025-02-05 12:59:47 - ERROR - stderr - 18%|█▊ | 4096/22434 [2:52:06<12:49:00, 2.52s/it] +2025-02-05 12:59:47 - ERROR - stderr - +2025-02-05 12:59:47 - ERROR - stderr - +2025-02-05 12:59:47 - INFO - stdout - {'loss': 1.0049, 'grad_norm': 1.130346655845642, 'learning_rate': 1.8804195478467785e-05, 'epoch': 0.55} +2025-02-05 12:59:47 - ERROR - stderr - 18%|█▊ | 4096/22434 [2:52:06<12:49:00, 2.52s/it] +2025-02-05 12:59:49 - ERROR - stderr - 18%|█▊ | 4097/22434 [2:52:09<12:50:44, 2.52s/it] +2025-02-05 12:59:49 - ERROR - stderr - +2025-02-05 12:59:49 - ERROR - stderr - +2025-02-05 12:59:49 - INFO - stdout - {'loss': 0.9096, 'grad_norm': 1.031082272529602, 'learning_rate': 1.8803510768488075e-05, 'epoch': 0.55} +2025-02-05 12:59:49 - ERROR - stderr - 18%|█▊ | 4097/22434 [2:52:09<12:50:44, 2.52s/it] +2025-02-05 12:59:52 - ERROR - stderr - 18%|█▊ | 4098/22434 [2:52:11<12:49:17, 2.52s/it] +2025-02-05 12:59:52 - ERROR - stderr - +2025-02-05 12:59:52 - ERROR - stderr - +2025-02-05 12:59:52 - INFO - stdout - {'loss': 0.9791, 'grad_norm': 1.0581367015838623, 'learning_rate': 1.8802825875007604e-05, 'epoch': 0.55} +2025-02-05 12:59:52 - ERROR - stderr - 18%|█▊ | 4098/22434 [2:52:11<12:49:17, 2.52s/it] +2025-02-05 12:59:54 - ERROR - stderr - 18%|█▊ | 4099/22434 [2:52:14<12:47:45, 2.51s/it] +2025-02-05 12:59:54 - ERROR - stderr - +2025-02-05 
12:59:54 - ERROR - stderr - +2025-02-05 12:59:54 - INFO - stdout - {'loss': 0.9072, 'grad_norm': 1.0102113485336304, 'learning_rate': 1.8802140798040653e-05, 'epoch': 0.55} +2025-02-05 12:59:54 - ERROR - stderr - 18%|█▊ | 4099/22434 [2:52:14<12:47:45, 2.51s/it] +2025-02-05 12:59:57 - ERROR - stderr - 18%|█▊ | 4100/22434 [2:52:16<12:44:35, 2.50s/it] +2025-02-05 12:59:57 - ERROR - stderr - +2025-02-05 12:59:57 - ERROR - stderr - +2025-02-05 12:59:57 - INFO - stdout - {'loss': 0.9316, 'grad_norm': 1.1040164232254028, 'learning_rate': 1.88014555376015e-05, 'epoch': 0.55} +2025-02-05 12:59:57 - ERROR - stderr - 18%|█▊ | 4100/22434 [2:52:16<12:44:35, 2.50s/it] +2025-02-05 12:59:59 - ERROR - stderr - 18%|█▊ | 4101/22434 [2:52:19<12:42:04, 2.49s/it] +2025-02-05 12:59:59 - ERROR - stderr - +2025-02-05 12:59:59 - ERROR - stderr - +2025-02-05 12:59:59 - INFO - stdout - {'loss': 0.874, 'grad_norm': 1.0502278804779053, 'learning_rate': 1.880077009370443e-05, 'epoch': 0.55} +2025-02-05 12:59:59 - ERROR - stderr - 18%|█▊ | 4101/22434 [2:52:19<12:42:04, 2.49s/it] +2025-02-05 13:00:02 - ERROR - stderr - 18%|█▊ | 4102/22434 [2:52:21<12:42:01, 2.49s/it] +2025-02-05 13:00:02 - ERROR - stderr - +2025-02-05 13:00:02 - ERROR - stderr - +2025-02-05 13:00:02 - INFO - stdout - {'loss': 1.0042, 'grad_norm': 1.321721076965332, 'learning_rate': 1.8800084466363726e-05, 'epoch': 0.55} +2025-02-05 13:00:02 - ERROR - stderr - 18%|█▊ | 4102/22434 [2:52:21<12:42:01, 2.49s/it] +2025-02-05 13:00:04 - ERROR - stderr - 18%|█▊ | 4103/22434 [2:52:24<12:40:44, 2.49s/it] +2025-02-05 13:00:04 - ERROR - stderr - +2025-02-05 13:00:04 - ERROR - stderr - +2025-02-05 13:00:04 - INFO - stdout - {'loss': 0.9917, 'grad_norm': 1.0465561151504517, 'learning_rate': 1.8799398655593682e-05, 'epoch': 0.55} +2025-02-05 13:00:04 - ERROR - stderr - 18%|█▊ | 4103/22434 [2:52:24<12:40:44, 2.49s/it] +2025-02-05 13:00:07 - ERROR - stderr - 18%|█▊ | 4104/22434 [2:52:26<12:43:50, 2.50s/it] +2025-02-05 13:00:07 - ERROR - stderr - 
+2025-02-05 13:00:07 - ERROR - stderr - +2025-02-05 13:00:07 - INFO - stdout - {'loss': 0.9839, 'grad_norm': 1.015295386314392, 'learning_rate': 1.8798712661408594e-05, 'epoch': 0.55} +2025-02-05 13:00:07 - ERROR - stderr - 18%|█▊ | 4104/22434 [2:52:26<12:43:50, 2.50s/it] +2025-02-05 13:00:09 - ERROR - stderr - 18%|█▊ | 4105/22434 [2:52:29<13:05:31, 2.57s/it] +2025-02-05 13:00:09 - ERROR - stderr - +2025-02-05 13:00:09 - ERROR - stderr - +2025-02-05 13:00:09 - INFO - stdout - {'loss': 0.905, 'grad_norm': 1.0752264261245728, 'learning_rate': 1.8798026483822763e-05, 'epoch': 0.55} +2025-02-05 13:00:09 - ERROR - stderr - 18%|█▊ | 4105/22434 [2:52:29<13:05:31, 2.57s/it] +2025-02-05 13:00:12 - ERROR - stderr - 18%|█▊ | 4106/22434 [2:52:32<13:09:26, 2.58s/it] +2025-02-05 13:00:12 - ERROR - stderr - +2025-02-05 13:00:12 - ERROR - stderr - +2025-02-05 13:00:12 - INFO - stdout - {'loss': 0.9949, 'grad_norm': 0.9739238023757935, 'learning_rate': 1.8797340122850484e-05, 'epoch': 0.55} +2025-02-05 13:00:12 - ERROR - stderr - 18%|█▊ | 4106/22434 [2:52:32<13:09:26, 2.58s/it] +2025-02-05 13:00:15 - ERROR - stderr - 18%|█▊ | 4107/22434 [2:52:34<13:17:36, 2.61s/it] +2025-02-05 13:00:15 - ERROR - stderr - +2025-02-05 13:00:15 - ERROR - stderr - +2025-02-05 13:00:15 - INFO - stdout - {'loss': 0.8715, 'grad_norm': 1.004654884338379, 'learning_rate': 1.879665357850607e-05, 'epoch': 0.55} +2025-02-05 13:00:15 - ERROR - stderr - 18%|█▊ | 4107/22434 [2:52:34<13:17:36, 2.61s/it] +2025-02-05 13:00:17 - ERROR - stderr - 18%|█▊ | 4108/22434 [2:52:37<13:08:20, 2.58s/it] +2025-02-05 13:00:17 - ERROR - stderr - +2025-02-05 13:00:17 - ERROR - stderr - +2025-02-05 13:00:17 - INFO - stdout - {'loss': 0.8753, 'grad_norm': 1.158569097518921, 'learning_rate': 1.879596685080383e-05, 'epoch': 0.55} +2025-02-05 13:00:17 - ERROR - stderr - 18%|█▊ | 4108/22434 [2:52:37<13:08:20, 2.58s/it] +2025-02-05 13:00:20 - ERROR - stderr - 18%|█▊ | 4109/22434 [2:52:39<13:05:50, 2.57s/it] +2025-02-05 13:00:20 - ERROR - 
stderr - +2025-02-05 13:00:20 - ERROR - stderr - +2025-02-05 13:00:20 - INFO - stdout - {'loss': 1.0681, 'grad_norm': 1.0677436590194702, 'learning_rate': 1.8795279939758076e-05, 'epoch': 0.55} +2025-02-05 13:00:20 - ERROR - stderr - 18%|█▊ | 4109/22434 [2:52:39<13:05:50, 2.57s/it] +2025-02-05 13:00:22 - ERROR - stderr - 18%|█▊ | 4110/22434 [2:52:42<12:56:29, 2.54s/it] +2025-02-05 13:00:22 - ERROR - stderr - +2025-02-05 13:00:22 - ERROR - stderr - +2025-02-05 13:00:22 - INFO - stdout - {'loss': 1.0462, 'grad_norm': 1.116233229637146, 'learning_rate': 1.8794592845383133e-05, 'epoch': 0.55} +2025-02-05 13:00:22 - ERROR - stderr - 18%|█▊ | 4110/22434 [2:52:42<12:56:29, 2.54s/it] +2025-02-05 13:00:25 - ERROR - stderr - 18%|█▊ | 4111/22434 [2:52:44<12:55:05, 2.54s/it] +2025-02-05 13:00:25 - ERROR - stderr - +2025-02-05 13:00:25 - ERROR - stderr - +2025-02-05 13:00:25 - INFO - stdout - {'loss': 0.889, 'grad_norm': 1.1746715307235718, 'learning_rate': 1.8793905567693313e-05, 'epoch': 0.55} +2025-02-05 13:00:25 - ERROR - stderr - 18%|█▊ | 4111/22434 [2:52:44<12:55:05, 2.54s/it] +2025-02-05 13:00:27 - ERROR - stderr - 18%|█▊ | 4112/22434 [2:52:47<12:49:17, 2.52s/it] +2025-02-05 13:00:27 - ERROR - stderr - +2025-02-05 13:00:27 - ERROR - stderr - +2025-02-05 13:00:27 - INFO - stdout - {'loss': 0.9486, 'grad_norm': 1.1653187274932861, 'learning_rate': 1.8793218106702947e-05, 'epoch': 0.55} +2025-02-05 13:00:27 - ERROR - stderr - 18%|█▊ | 4112/22434 [2:52:47<12:49:17, 2.52s/it] +2025-02-05 13:00:30 - ERROR - stderr - 18%|█▊ | 4113/22434 [2:52:49<12:45:57, 2.51s/it] +2025-02-05 13:00:30 - ERROR - stderr - +2025-02-05 13:00:30 - ERROR - stderr - +2025-02-05 13:00:30 - INFO - stdout - {'loss': 0.9997, 'grad_norm': 1.1489580869674683, 'learning_rate': 1.8792530462426364e-05, 'epoch': 0.55} +2025-02-05 13:00:30 - ERROR - stderr - 18%|█▊ | 4113/22434 [2:52:49<12:45:57, 2.51s/it] +2025-02-05 13:00:32 - ERROR - stderr - 18%|█▊ | 4114/22434 [2:52:52<12:49:38, 2.52s/it] +2025-02-05 
13:00:32 - ERROR - stderr - +2025-02-05 13:00:32 - ERROR - stderr - +2025-02-05 13:00:32 - INFO - stdout - {'loss': 0.9157, 'grad_norm': 1.076123595237732, 'learning_rate': 1.87918426348779e-05, 'epoch': 0.55} +2025-02-05 13:00:32 - ERROR - stderr - 18%|█▊ | 4114/22434 [2:52:52<12:49:38, 2.52s/it] +2025-02-05 13:00:35 - ERROR - stderr - 18%|█▊ | 4115/22434 [2:52:54<12:42:12, 2.50s/it] +2025-02-05 13:00:35 - ERROR - stderr - +2025-02-05 13:00:35 - ERROR - stderr - +2025-02-05 13:00:35 - INFO - stdout - {'loss': 0.9784, 'grad_norm': 1.066899299621582, 'learning_rate': 1.8791154624071885e-05, 'epoch': 0.55} +2025-02-05 13:00:35 - ERROR - stderr - 18%|█▊ | 4115/22434 [2:52:54<12:42:12, 2.50s/it] +2025-02-05 13:00:37 - ERROR - stderr - 18%|█▊ | 4116/22434 [2:52:57<12:43:52, 2.50s/it] +2025-02-05 13:00:37 - ERROR - stderr - +2025-02-05 13:00:37 - ERROR - stderr - +2025-02-05 13:00:37 - INFO - stdout - {'loss': 0.9966, 'grad_norm': 1.1230682134628296, 'learning_rate': 1.8790466430022665e-05, 'epoch': 0.55} +2025-02-05 13:00:37 - ERROR - stderr - 18%|█▊ | 4116/22434 [2:52:57<12:43:52, 2.50s/it] +2025-02-05 13:00:37 - INFO - stdout - WARNING: tokenization mismatch: 1 vs. 55. 
(ignored) +2025-02-05 13:00:40 - ERROR - stderr - 18%|█▊ | 4117/22434 [2:52:59<12:43:00, 2.50s/it] +2025-02-05 13:00:40 - ERROR - stderr - +2025-02-05 13:00:40 - ERROR - stderr - +2025-02-05 13:00:40 - INFO - stdout - {'loss': 0.9593, 'grad_norm': 1.2411407232284546, 'learning_rate': 1.8789778052744587e-05, 'epoch': 0.55} +2025-02-05 13:00:40 - ERROR - stderr - 18%|█▊ | 4117/22434 [2:52:59<12:43:00, 2.50s/it] +2025-02-05 13:00:42 - ERROR - stderr - 18%|█▊ | 4118/22434 [2:53:02<12:35:22, 2.47s/it] +2025-02-05 13:00:42 - ERROR - stderr - +2025-02-05 13:00:42 - ERROR - stderr - +2025-02-05 13:00:42 - INFO - stdout - {'loss': 1.025, 'grad_norm': 1.2296236753463745, 'learning_rate': 1.878908949225199e-05, 'epoch': 0.55} +2025-02-05 13:00:42 - ERROR - stderr - 18%|█▊ | 4118/22434 [2:53:02<12:35:22, 2.47s/it] +2025-02-05 13:00:45 - ERROR - stderr - 18%|█▊ | 4119/22434 [2:53:04<12:46:27, 2.51s/it] +2025-02-05 13:00:45 - ERROR - stderr - +2025-02-05 13:00:45 - ERROR - stderr - +2025-02-05 13:00:45 - INFO - stdout - {'loss': 0.9995, 'grad_norm': 1.1244949102401733, 'learning_rate': 1.878840074855924e-05, 'epoch': 0.55} +2025-02-05 13:00:45 - ERROR - stderr - 18%|█▊ | 4119/22434 [2:53:04<12:46:27, 2.51s/it] +2025-02-05 13:00:47 - ERROR - stderr - 18%|█▊ | 4120/22434 [2:53:07<12:38:02, 2.48s/it] +2025-02-05 13:00:47 - ERROR - stderr - +2025-02-05 13:00:47 - ERROR - stderr - +2025-02-05 13:00:47 - INFO - stdout - {'loss': 1.0638, 'grad_norm': 1.1304404735565186, 'learning_rate': 1.8787711821680682e-05, 'epoch': 0.55} +2025-02-05 13:00:47 - ERROR - stderr - 18%|█▊ | 4120/22434 [2:53:07<12:38:02, 2.48s/it] +2025-02-05 13:00:49 - ERROR - stderr - 18%|█▊ | 4121/22434 [2:53:09<12:34:13, 2.47s/it] +2025-02-05 13:00:49 - ERROR - stderr - +2025-02-05 13:00:49 - ERROR - stderr - +2025-02-05 13:00:49 - INFO - stdout - {'loss': 0.8397, 'grad_norm': 1.1003179550170898, 'learning_rate': 1.878702271163068e-05, 'epoch': 0.55} +2025-02-05 13:00:49 - ERROR - stderr - 18%|█▊ | 4121/22434 
[2:53:09<12:34:13, 2.47s/it] +2025-02-05 13:00:52 - ERROR - stderr - 18%|█▊ | 4122/22434 [2:53:12<12:40:51, 2.49s/it] +2025-02-05 13:00:52 - ERROR - stderr - +2025-02-05 13:00:52 - ERROR - stderr - +2025-02-05 13:00:52 - INFO - stdout - {'loss': 1.0425, 'grad_norm': 1.2588616609573364, 'learning_rate': 1.8786333418423597e-05, 'epoch': 0.55} +2025-02-05 13:00:52 - ERROR - stderr - 18%|█▊ | 4122/22434 [2:53:12<12:40:51, 2.49s/it] +2025-02-05 13:00:54 - ERROR - stderr - 18%|█▊ | 4123/22434 [2:53:14<12:36:36, 2.48s/it] +2025-02-05 13:00:54 - ERROR - stderr - +2025-02-05 13:00:54 - ERROR - stderr - +2025-02-05 13:00:54 - INFO - stdout - {'loss': 1.1004, 'grad_norm': 1.2148840427398682, 'learning_rate': 1.8785643942073804e-05, 'epoch': 0.55} +2025-02-05 13:00:54 - ERROR - stderr - 18%|█▊ | 4123/22434 [2:53:14<12:36:36, 2.48s/it] +2025-02-05 13:00:57 - ERROR - stderr - 18%|█▊ | 4124/22434 [2:53:17<12:47:17, 2.51s/it] +2025-02-05 13:00:57 - ERROR - stderr - +2025-02-05 13:00:57 - ERROR - stderr - +2025-02-05 13:00:57 - INFO - stdout - {'loss': 0.9765, 'grad_norm': 0.9715263247489929, 'learning_rate': 1.878495428259567e-05, 'epoch': 0.55} +2025-02-05 13:00:57 - ERROR - stderr - 18%|█▊ | 4124/22434 [2:53:17<12:47:17, 2.51s/it] +2025-02-05 13:01:00 - ERROR - stderr - 18%|█▊ | 4125/22434 [2:53:19<12:47:50, 2.52s/it] +2025-02-05 13:01:00 - ERROR - stderr - +2025-02-05 13:01:00 - ERROR - stderr - +2025-02-05 13:01:00 - INFO - stdout - {'loss': 1.0754, 'grad_norm': 1.134539246559143, 'learning_rate': 1.8784264440003567e-05, 'epoch': 0.55} +2025-02-05 13:01:00 - ERROR - stderr - 18%|█▊ | 4125/22434 [2:53:19<12:47:50, 2.52s/it] +2025-02-05 13:01:02 - ERROR - stderr - 18%|█▊ | 4126/22434 [2:53:22<12:46:09, 2.51s/it] +2025-02-05 13:01:02 - ERROR - stderr - +2025-02-05 13:01:02 - ERROR - stderr - +2025-02-05 13:01:02 - INFO - stdout - {'loss': 1.0087, 'grad_norm': 1.012941837310791, 'learning_rate': 1.878357441431188e-05, 'epoch': 0.55} +2025-02-05 13:01:02 - ERROR - stderr - 18%|█▊ | 
4126/22434 [2:53:22<12:46:09, 2.51s/it] +2025-02-05 13:01:04 - ERROR - stderr - 18%|█▊ | 4127/22434 [2:53:24<12:39:13, 2.49s/it] +2025-02-05 13:01:05 - ERROR - stderr - +2025-02-05 13:01:05 - ERROR - stderr - +2025-02-05 13:01:05 - INFO - stdout - {'loss': 0.948, 'grad_norm': 1.0308141708374023, 'learning_rate': 1.878288420553499e-05, 'epoch': 0.55} +2025-02-05 13:01:05 - ERROR - stderr - 18%|█▊ | 4127/22434 [2:53:24<12:39:13, 2.49s/it] +2025-02-05 13:01:07 - ERROR - stderr - 18%|█▊ | 4128/22434 [2:53:27<12:38:30, 2.49s/it] +2025-02-05 13:01:07 - ERROR - stderr - +2025-02-05 13:01:07 - ERROR - stderr - +2025-02-05 13:01:07 - INFO - stdout - {'loss': 0.9309, 'grad_norm': 1.0697585344314575, 'learning_rate': 1.878219381368728e-05, 'epoch': 0.55} +2025-02-05 13:01:07 - ERROR - stderr - 18%|█▊ | 4128/22434 [2:53:27<12:38:30, 2.49s/it] +2025-02-05 13:01:10 - ERROR - stderr - 18%|█▊ | 4129/22434 [2:53:29<12:43:03, 2.50s/it] +2025-02-05 13:01:10 - ERROR - stderr - +2025-02-05 13:01:10 - ERROR - stderr - +2025-02-05 13:01:10 - INFO - stdout - {'loss': 1.0639, 'grad_norm': 1.0895172357559204, 'learning_rate': 1.8781503238783146e-05, 'epoch': 0.55} +2025-02-05 13:01:10 - ERROR - stderr - 18%|█▊ | 4129/22434 [2:53:29<12:43:03, 2.50s/it] +2025-02-05 13:01:12 - ERROR - stderr - 18%|█▊ | 4130/22434 [2:53:32<12:41:38, 2.50s/it] +2025-02-05 13:01:12 - ERROR - stderr - +2025-02-05 13:01:12 - ERROR - stderr - +2025-02-05 13:01:12 - INFO - stdout - {'loss': 0.8741, 'grad_norm': 0.9940357804298401, 'learning_rate': 1.878081248083698e-05, 'epoch': 0.55} +2025-02-05 13:01:12 - ERROR - stderr - 18%|█▊ | 4130/22434 [2:53:32<12:41:38, 2.50s/it] +2025-02-05 13:01:14 - ERROR - stderr - 18%|█▊ | 4131/22434 [2:53:34<12:38:13, 2.49s/it] +2025-02-05 13:01:15 - ERROR - stderr - +2025-02-05 13:01:15 - ERROR - stderr - +2025-02-05 13:01:15 - INFO - stdout - {'loss': 0.8339, 'grad_norm': 1.0215710401535034, 'learning_rate': 1.8780121539863182e-05, 'epoch': 0.55} +2025-02-05 13:01:15 - ERROR - stderr 
- 18%|█▊ | 4131/22434 [2:53:34<12:38:13, 2.49s/it] +2025-02-05 13:01:17 - ERROR - stderr - 18%|█▊ | 4132/22434 [2:53:37<12:40:58, 2.49s/it] +2025-02-05 13:01:17 - ERROR - stderr - +2025-02-05 13:01:17 - ERROR - stderr - +2025-02-05 13:01:17 - INFO - stdout - {'loss': 1.0038, 'grad_norm': 1.1118327379226685, 'learning_rate': 1.877943041587615e-05, 'epoch': 0.55} +2025-02-05 13:01:17 - ERROR - stderr - 18%|█▊ | 4132/22434 [2:53:37<12:40:58, 2.49s/it] +2025-02-05 13:01:19 - ERROR - stderr - 18%|█▊ | 4133/22434 [2:53:39<12:34:29, 2.47s/it] +2025-02-05 13:01:19 - ERROR - stderr - +2025-02-05 13:01:19 - ERROR - stderr - +2025-02-05 13:01:19 - INFO - stdout - {'loss': 0.9596, 'grad_norm': 1.1815924644470215, 'learning_rate': 1.877873910889029e-05, 'epoch': 0.55} +2025-02-05 13:01:19 - ERROR - stderr - 18%|█▊ | 4133/22434 [2:53:39<12:34:29, 2.47s/it] +2025-02-05 13:01:22 - ERROR - stderr - 18%|█▊ | 4134/22434 [2:53:42<12:34:03, 2.47s/it] +2025-02-05 13:01:22 - ERROR - stderr - +2025-02-05 13:01:22 - ERROR - stderr - +2025-02-05 13:01:22 - INFO - stdout - {'loss': 0.873, 'grad_norm': 1.1048861742019653, 'learning_rate': 1.8778047618920016e-05, 'epoch': 0.55} +2025-02-05 13:01:22 - ERROR - stderr - 18%|█▊ | 4134/22434 [2:53:42<12:34:03, 2.47s/it] +2025-02-05 13:01:24 - ERROR - stderr - 18%|█▊ | 4135/22434 [2:53:44<12:40:15, 2.49s/it] +2025-02-05 13:01:24 - ERROR - stderr - +2025-02-05 13:01:24 - ERROR - stderr - +2025-02-05 13:01:24 - INFO - stdout - {'loss': 1.0458, 'grad_norm': 1.086790680885315, 'learning_rate': 1.877735594597974e-05, 'epoch': 0.55} +2025-02-05 13:01:24 - ERROR - stderr - 18%|█▊ | 4135/22434 [2:53:44<12:40:15, 2.49s/it] +2025-02-05 13:01:27 - ERROR - stderr - 18%|█▊ | 4136/22434 [2:53:47<12:46:25, 2.51s/it] +2025-02-05 13:01:27 - ERROR - stderr - +2025-02-05 13:01:27 - ERROR - stderr - +2025-02-05 13:01:27 - INFO - stdout - {'loss': 0.8433, 'grad_norm': 1.1849241256713867, 'learning_rate': 1.8776664090083872e-05, 'epoch': 0.55} +2025-02-05 13:01:27 - 
ERROR - stderr - 18%|█▊ | 4136/22434 [2:53:47<12:46:25, 2.51s/it] +2025-02-05 13:01:30 - ERROR - stderr - 18%|█▊ | 4137/22434 [2:53:49<13:04:56, 2.57s/it] +2025-02-05 13:01:30 - ERROR - stderr - +2025-02-05 13:01:30 - ERROR - stderr - +2025-02-05 13:01:30 - INFO - stdout - {'loss': 0.996, 'grad_norm': 1.151999831199646, 'learning_rate': 1.8775972051246846e-05, 'epoch': 0.55} +2025-02-05 13:01:30 - ERROR - stderr - 18%|█▊ | 4137/22434 [2:53:49<13:04:56, 2.57s/it] +2025-02-05 13:01:32 - ERROR - stderr - 18%|█▊ | 4138/22434 [2:53:52<12:55:39, 2.54s/it] +2025-02-05 13:01:32 - ERROR - stderr - +2025-02-05 13:01:32 - ERROR - stderr - +2025-02-05 13:01:32 - INFO - stdout - {'loss': 0.984, 'grad_norm': 1.1179007291793823, 'learning_rate': 1.877527982948308e-05, 'epoch': 0.55} +2025-02-05 13:01:32 - ERROR - stderr - 18%|█▊ | 4138/22434 [2:53:52<12:55:39, 2.54s/it] +2025-02-05 13:01:35 - ERROR - stderr - 18%|█▊ | 4139/22434 [2:53:54<12:45:26, 2.51s/it] +2025-02-05 13:01:35 - ERROR - stderr - +2025-02-05 13:01:35 - ERROR - stderr - +2025-02-05 13:01:35 - INFO - stdout - {'loss': 1.0434, 'grad_norm': 1.1322762966156006, 'learning_rate': 1.8774587424807e-05, 'epoch': 0.55} +2025-02-05 13:01:35 - ERROR - stderr - 18%|█▊ | 4139/22434 [2:53:54<12:45:26, 2.51s/it] +2025-02-05 13:01:37 - ERROR - stderr - 18%|█▊ | 4140/22434 [2:53:57<12:38:41, 2.49s/it] +2025-02-05 13:01:37 - ERROR - stderr - +2025-02-05 13:01:37 - ERROR - stderr - +2025-02-05 13:01:37 - INFO - stdout - {'loss': 0.9508, 'grad_norm': 1.1262671947479248, 'learning_rate': 1.8773894837233044e-05, 'epoch': 0.55} +2025-02-05 13:01:37 - ERROR - stderr - 18%|█▊ | 4140/22434 [2:53:57<12:38:41, 2.49s/it] +2025-02-05 13:01:39 - ERROR - stderr - 18%|█▊ | 4141/22434 [2:53:59<12:34:11, 2.47s/it] +2025-02-05 13:01:40 - ERROR - stderr - +2025-02-05 13:01:40 - ERROR - stderr - +2025-02-05 13:01:40 - INFO - stdout - {'loss': 1.0189, 'grad_norm': 1.2443318367004395, 'learning_rate': 1.8773202066775646e-05, 'epoch': 0.55} +2025-02-05 
13:01:40 - ERROR - stderr - 18%|█▊ | 4141/22434 [2:53:59<12:34:11, 2.47s/it] +2025-02-05 13:01:42 - ERROR - stderr - 18%|█▊ | 4142/22434 [2:54:02<12:31:24, 2.46s/it] +2025-02-05 13:01:42 - ERROR - stderr - +2025-02-05 13:01:42 - ERROR - stderr - +2025-02-05 13:01:42 - INFO - stdout - {'loss': 0.8627, 'grad_norm': 1.0210678577423096, 'learning_rate': 1.8772509113449243e-05, 'epoch': 0.55} +2025-02-05 13:01:42 - ERROR - stderr - 18%|█▊ | 4142/22434 [2:54:02<12:31:24, 2.46s/it] +2025-02-05 13:01:44 - ERROR - stderr - 18%|█▊ | 4143/22434 [2:54:04<12:33:36, 2.47s/it] +2025-02-05 13:01:44 - ERROR - stderr - +2025-02-05 13:01:44 - ERROR - stderr - +2025-02-05 13:01:44 - INFO - stdout - {'loss': 1.01, 'grad_norm': 1.1153596639633179, 'learning_rate': 1.8771815977268284e-05, 'epoch': 0.55} +2025-02-05 13:01:44 - ERROR - stderr - 18%|█▊ | 4143/22434 [2:54:04<12:33:36, 2.47s/it] +2025-02-05 13:01:47 - ERROR - stderr - 18%|█▊ | 4144/22434 [2:54:07<12:31:32, 2.47s/it] +2025-02-05 13:01:47 - ERROR - stderr - +2025-02-05 13:01:47 - ERROR - stderr - +2025-02-05 13:01:47 - INFO - stdout - {'loss': 1.0077, 'grad_norm': 1.0814077854156494, 'learning_rate': 1.8771122658247214e-05, 'epoch': 0.55} +2025-02-05 13:01:47 - ERROR - stderr - 18%|█▊ | 4144/22434 [2:54:07<12:31:32, 2.47s/it] +2025-02-05 13:01:49 - ERROR - stderr - 18%|█▊ | 4145/22434 [2:54:09<12:36:55, 2.48s/it] +2025-02-05 13:01:49 - ERROR - stderr - +2025-02-05 13:01:49 - ERROR - stderr - +2025-02-05 13:01:49 - INFO - stdout - {'loss': 1.0627, 'grad_norm': 1.0816489458084106, 'learning_rate': 1.877042915640049e-05, 'epoch': 0.55} +2025-02-05 13:01:49 - ERROR - stderr - 18%|█▊ | 4145/22434 [2:54:09<12:36:55, 2.48s/it] +2025-02-05 13:01:52 - ERROR - stderr - 18%|█▊ | 4146/22434 [2:54:12<12:42:22, 2.50s/it] +2025-02-05 13:01:52 - ERROR - stderr - +2025-02-05 13:01:52 - ERROR - stderr - +2025-02-05 13:01:52 - INFO - stdout - {'loss': 0.9253, 'grad_norm': 1.2560906410217285, 'learning_rate': 1.8769735471742555e-05, 'epoch': 0.55} 
+2025-02-05 13:01:52 - ERROR - stderr - 18%|█▊ | 4146/22434 [2:54:12<12:42:22, 2.50s/it] +2025-02-05 13:01:54 - ERROR - stderr - 18%|█▊ | 4147/22434 [2:54:14<12:41:05, 2.50s/it] +2025-02-05 13:01:54 - ERROR - stderr - +2025-02-05 13:01:54 - ERROR - stderr - +2025-02-05 13:01:54 - INFO - stdout - {'loss': 1.0188, 'grad_norm': 1.0629435777664185, 'learning_rate': 1.876904160428788e-05, 'epoch': 0.55} +2025-02-05 13:01:54 - ERROR - stderr - 18%|█▊ | 4147/22434 [2:54:14<12:41:05, 2.50s/it] +2025-02-05 13:01:57 - ERROR - stderr - 18%|█▊ | 4148/22434 [2:54:17<12:45:13, 2.51s/it] +2025-02-05 13:01:57 - ERROR - stderr - +2025-02-05 13:01:57 - ERROR - stderr - +2025-02-05 13:01:57 - INFO - stdout - {'loss': 0.9647, 'grad_norm': 1.0532605648040771, 'learning_rate': 1.8768347554050922e-05, 'epoch': 0.55} +2025-02-05 13:01:57 - ERROR - stderr - 18%|█▊ | 4148/22434 [2:54:17<12:45:13, 2.51s/it] +2025-02-05 13:01:59 - ERROR - stderr - 18%|█▊ | 4149/22434 [2:54:19<12:42:41, 2.50s/it] +2025-02-05 13:01:59 - ERROR - stderr - +2025-02-05 13:01:59 - ERROR - stderr - +2025-02-05 13:01:59 - INFO - stdout - {'loss': 0.985, 'grad_norm': 1.1149368286132812, 'learning_rate': 1.8767653321046153e-05, 'epoch': 0.55} +2025-02-05 13:01:59 - ERROR - stderr - 18%|█▊ | 4149/22434 [2:54:19<12:42:41, 2.50s/it] +2025-02-05 13:02:02 - ERROR - stderr - 18%|█▊ | 4150/22434 [2:54:22<12:50:19, 2.53s/it] +2025-02-05 13:02:02 - ERROR - stderr - +2025-02-05 13:02:02 - ERROR - stderr - +2025-02-05 13:02:02 - INFO - stdout - {'loss': 0.9483, 'grad_norm': 1.1745245456695557, 'learning_rate': 1.8766958905288035e-05, 'epoch': 0.55} +2025-02-05 13:02:02 - ERROR - stderr - 18%|█▊ | 4150/22434 [2:54:22<12:50:19, 2.53s/it] +2025-02-05 13:02:05 - ERROR - stderr - 19%|█▊ | 4151/22434 [2:54:24<12:50:17, 2.53s/it] +2025-02-05 13:02:05 - ERROR - stderr - +2025-02-05 13:02:05 - ERROR - stderr - +2025-02-05 13:02:05 - INFO - stdout - {'loss': 1.0683, 'grad_norm': 1.0421106815338135, 'learning_rate': 1.876626430679105e-05, 
'epoch': 0.56} +2025-02-05 13:02:05 - ERROR - stderr - 19%|█▊ | 4151/22434 [2:54:24<12:50:17, 2.53s/it] +2025-02-05 13:02:07 - ERROR - stderr - 19%|█▊ | 4152/22434 [2:54:27<12:41:16, 2.50s/it] +2025-02-05 13:02:07 - ERROR - stderr - +2025-02-05 13:02:07 - ERROR - stderr - +2025-02-05 13:02:07 - INFO - stdout - {'loss': 0.9195, 'grad_norm': 1.0267407894134521, 'learning_rate': 1.8765569525569677e-05, 'epoch': 0.56} +2025-02-05 13:02:07 - ERROR - stderr - 19%|█▊ | 4152/22434 [2:54:27<12:41:16, 2.50s/it] +2025-02-05 13:02:09 - ERROR - stderr - 19%|█▊ | 4153/22434 [2:54:29<12:42:32, 2.50s/it] +2025-02-05 13:02:10 - ERROR - stderr - +2025-02-05 13:02:10 - ERROR - stderr - +2025-02-05 13:02:10 - INFO - stdout - {'loss': 0.8578, 'grad_norm': 1.0786383152008057, 'learning_rate': 1.876487456163839e-05, 'epoch': 0.56} +2025-02-05 13:02:10 - ERROR - stderr - 19%|█▊ | 4153/22434 [2:54:29<12:42:32, 2.50s/it] +2025-02-05 13:02:12 - ERROR - stderr - 19%|█▊ | 4154/22434 [2:54:32<12:42:50, 2.50s/it] +2025-02-05 13:02:12 - ERROR - stderr - +2025-02-05 13:02:12 - ERROR - stderr - +2025-02-05 13:02:12 - INFO - stdout - {'loss': 0.9591, 'grad_norm': 1.2095805406570435, 'learning_rate': 1.876417941501168e-05, 'epoch': 0.56} +2025-02-05 13:02:12 - ERROR - stderr - 19%|█▊ | 4154/22434 [2:54:32<12:42:50, 2.50s/it] +2025-02-05 13:02:15 - ERROR - stderr - 19%|█▊ | 4155/22434 [2:54:34<12:51:11, 2.53s/it] +2025-02-05 13:02:15 - ERROR - stderr - +2025-02-05 13:02:15 - ERROR - stderr - +2025-02-05 13:02:15 - INFO - stdout - {'loss': 0.8748, 'grad_norm': 1.0385119915008545, 'learning_rate': 1.876348408570404e-05, 'epoch': 0.56} +2025-02-05 13:02:15 - ERROR - stderr - 19%|█▊ | 4155/22434 [2:54:34<12:51:11, 2.53s/it] +2025-02-05 13:02:17 - ERROR - stderr - 19%|█▊ | 4156/22434 [2:54:37<12:46:08, 2.51s/it] +2025-02-05 13:02:17 - ERROR - stderr - +2025-02-05 13:02:17 - ERROR - stderr - +2025-02-05 13:02:17 - INFO - stdout - {'loss': 0.8829, 'grad_norm': 1.0932854413986206, 'learning_rate': 
1.876278857372996e-05, 'epoch': 0.56} +2025-02-05 13:02:17 - ERROR - stderr - 19%|█▊ | 4156/22434 [2:54:37<12:46:08, 2.51s/it] +2025-02-05 13:02:20 - ERROR - stderr - 19%|█▊ | 4157/22434 [2:54:39<12:43:09, 2.51s/it] +2025-02-05 13:02:20 - ERROR - stderr - +2025-02-05 13:02:20 - ERROR - stderr - +2025-02-05 13:02:20 - INFO - stdout - {'loss': 0.9742, 'grad_norm': 0.9786622524261475, 'learning_rate': 1.8762092879103938e-05, 'epoch': 0.56} +2025-02-05 13:02:20 - ERROR - stderr - 19%|█▊ | 4157/22434 [2:54:39<12:43:09, 2.51s/it] +2025-02-05 13:02:22 - ERROR - stderr - 19%|█▊ | 4158/22434 [2:54:42<12:42:31, 2.50s/it] +2025-02-05 13:02:22 - ERROR - stderr - +2025-02-05 13:02:22 - ERROR - stderr - +2025-02-05 13:02:22 - INFO - stdout - {'loss': 0.9489, 'grad_norm': 1.1122028827667236, 'learning_rate': 1.8761397001840472e-05, 'epoch': 0.56} +2025-02-05 13:02:22 - ERROR - stderr - 19%|█▊ | 4158/22434 [2:54:42<12:42:31, 2.50s/it] +2025-02-05 13:02:24 - ERROR - stderr - 19%|█▊ | 4159/22434 [2:54:44<12:35:33, 2.48s/it] +2025-02-05 13:02:25 - ERROR - stderr - +2025-02-05 13:02:25 - ERROR - stderr - +2025-02-05 13:02:25 - INFO - stdout - {'loss': 1.0692, 'grad_norm': 1.1907130479812622, 'learning_rate': 1.8760700941954066e-05, 'epoch': 0.56} +2025-02-05 13:02:25 - ERROR - stderr - 19%|█▊ | 4159/22434 [2:54:44<12:35:33, 2.48s/it] +2025-02-05 13:02:27 - ERROR - stderr - 19%|█▊ | 4160/22434 [2:54:47<12:40:56, 2.50s/it] +2025-02-05 13:02:27 - ERROR - stderr - +2025-02-05 13:02:27 - ERROR - stderr - +2025-02-05 13:02:27 - INFO - stdout - {'loss': 1.0327, 'grad_norm': 1.1519546508789062, 'learning_rate': 1.8760004699459236e-05, 'epoch': 0.56} +2025-02-05 13:02:27 - ERROR - stderr - 19%|█▊ | 4160/22434 [2:54:47<12:40:56, 2.50s/it] +2025-02-05 13:02:29 - ERROR - stderr - 19%|█▊ | 4161/22434 [2:54:49<12:33:10, 2.47s/it] +2025-02-05 13:02:29 - ERROR - stderr - +2025-02-05 13:02:29 - ERROR - stderr - +2025-02-05 13:02:29 - INFO - stdout - {'loss': 0.9116, 'grad_norm': 1.0557442903518677, 
'learning_rate': 1.8759308274370492e-05, 'epoch': 0.56} +2025-02-05 13:02:29 - ERROR - stderr - 19%|█▊ | 4161/22434 [2:54:49<12:33:10, 2.47s/it] +2025-02-05 13:02:32 - ERROR - stderr - 19%|█▊ | 4162/22434 [2:54:52<12:33:38, 2.47s/it] +2025-02-05 13:02:32 - ERROR - stderr - +2025-02-05 13:02:32 - ERROR - stderr - +2025-02-05 13:02:32 - INFO - stdout - {'loss': 0.9897, 'grad_norm': 1.1056785583496094, 'learning_rate': 1.8758611666702347e-05, 'epoch': 0.56} +2025-02-05 13:02:32 - ERROR - stderr - 19%|█▊ | 4162/22434 [2:54:52<12:33:38, 2.47s/it] +2025-02-05 13:02:34 - ERROR - stderr - 19%|█▊ | 4163/22434 [2:54:54<12:28:05, 2.46s/it] +2025-02-05 13:02:34 - ERROR - stderr - +2025-02-05 13:02:34 - ERROR - stderr - +2025-02-05 13:02:34 - INFO - stdout - {'loss': 0.9291, 'grad_norm': 1.1571147441864014, 'learning_rate': 1.875791487646932e-05, 'epoch': 0.56} +2025-02-05 13:02:34 - ERROR - stderr - 19%|█▊ | 4163/22434 [2:54:54<12:28:05, 2.46s/it] +2025-02-05 13:02:37 - ERROR - stderr - 19%|█▊ | 4164/22434 [2:54:57<12:28:50, 2.46s/it] +2025-02-05 13:02:37 - ERROR - stderr - +2025-02-05 13:02:37 - ERROR - stderr - +2025-02-05 13:02:37 - INFO - stdout - {'loss': 1.0663, 'grad_norm': 1.0730870962142944, 'learning_rate': 1.8757217903685943e-05, 'epoch': 0.56} +2025-02-05 13:02:37 - ERROR - stderr - 19%|█▊ | 4164/22434 [2:54:57<12:28:50, 2.46s/it] +2025-02-05 13:02:39 - ERROR - stderr - 19%|█▊ | 4165/22434 [2:54:59<12:34:15, 2.48s/it] +2025-02-05 13:02:39 - ERROR - stderr - +2025-02-05 13:02:39 - ERROR - stderr - +2025-02-05 13:02:39 - INFO - stdout - {'loss': 0.9371, 'grad_norm': 1.1050481796264648, 'learning_rate': 1.8756520748366735e-05, 'epoch': 0.56} +2025-02-05 13:02:39 - ERROR - stderr - 19%|█▊ | 4165/22434 [2:54:59<12:34:15, 2.48s/it] +2025-02-05 13:02:42 - ERROR - stderr - 19%|█▊ | 4166/22434 [2:55:02<12:40:20, 2.50s/it] +2025-02-05 13:02:42 - ERROR - stderr - +2025-02-05 13:02:42 - ERROR - stderr - +2025-02-05 13:02:42 - INFO - stdout - {'loss': 0.9153, 'grad_norm': 
1.100428819656372, 'learning_rate': 1.875582341052623e-05, 'epoch': 0.56} +2025-02-05 13:02:42 - ERROR - stderr - 19%|█▊ | 4166/22434 [2:55:02<12:40:20, 2.50s/it] +2025-02-05 13:02:44 - ERROR - stderr - 19%|█▊ | 4167/22434 [2:55:04<12:41:46, 2.50s/it] +2025-02-05 13:02:44 - ERROR - stderr - +2025-02-05 13:02:44 - ERROR - stderr - +2025-02-05 13:02:44 - INFO - stdout - {'loss': 1.0603, 'grad_norm': 1.1881496906280518, 'learning_rate': 1.875512589017897e-05, 'epoch': 0.56} +2025-02-05 13:02:44 - ERROR - stderr - 19%|█▊ | 4167/22434 [2:55:04<12:41:46, 2.50s/it] +2025-02-05 13:02:47 - ERROR - stderr - 19%|█▊ | 4168/22434 [2:55:07<12:40:37, 2.50s/it] +2025-02-05 13:02:47 - ERROR - stderr - +2025-02-05 13:02:47 - ERROR - stderr - +2025-02-05 13:02:47 - INFO - stdout - {'loss': 1.0452, 'grad_norm': 1.099585771560669, 'learning_rate': 1.8754428187339484e-05, 'epoch': 0.56} +2025-02-05 13:02:47 - ERROR - stderr - 19%|█▊ | 4168/22434 [2:55:07<12:40:37, 2.50s/it] +2025-02-05 13:02:49 - ERROR - stderr - 19%|█▊ | 4169/22434 [2:55:09<12:38:21, 2.49s/it] +2025-02-05 13:02:49 - ERROR - stderr - +2025-02-05 13:02:49 - ERROR - stderr - +2025-02-05 13:02:49 - INFO - stdout - {'loss': 0.9291, 'grad_norm': 1.0015478134155273, 'learning_rate': 1.875373030202232e-05, 'epoch': 0.56} +2025-02-05 13:02:49 - ERROR - stderr - 19%|█▊ | 4169/22434 [2:55:09<12:38:21, 2.49s/it] +2025-02-05 13:02:52 - ERROR - stderr - 19%|█▊ | 4170/22434 [2:55:12<12:37:14, 2.49s/it] +2025-02-05 13:02:52 - ERROR - stderr - +2025-02-05 13:02:52 - ERROR - stderr - +2025-02-05 13:02:52 - INFO - stdout - {'loss': 1.2123, 'grad_norm': 1.2411212921142578, 'learning_rate': 1.8753032234242024e-05, 'epoch': 0.56} +2025-02-05 13:02:52 - ERROR - stderr - 19%|█▊ | 4170/22434 [2:55:12<12:37:14, 2.49s/it] +2025-02-05 13:02:54 - ERROR - stderr - 19%|█▊ | 4171/22434 [2:55:14<12:41:10, 2.50s/it] +2025-02-05 13:02:54 - ERROR - stderr - +2025-02-05 13:02:54 - ERROR - stderr - +2025-02-05 13:02:54 - INFO - stdout - {'loss': 0.9302, 
'grad_norm': 0.9905581474304199, 'learning_rate': 1.875233398401315e-05, 'epoch': 0.56} +2025-02-05 13:02:54 - ERROR - stderr - 19%|█▊ | 4171/22434 [2:55:14<12:41:10, 2.50s/it] +2025-02-05 13:02:57 - ERROR - stderr - 19%|█▊ | 4172/22434 [2:55:17<12:32:36, 2.47s/it] +2025-02-05 13:02:57 - ERROR - stderr - +2025-02-05 13:02:57 - ERROR - stderr - +2025-02-05 13:02:57 - INFO - stdout - {'loss': 0.9981, 'grad_norm': 1.145564317703247, 'learning_rate': 1.8751635551350243e-05, 'epoch': 0.56} +2025-02-05 13:02:57 - ERROR - stderr - 19%|█▊ | 4172/22434 [2:55:17<12:32:36, 2.47s/it] +2025-02-05 13:02:59 - ERROR - stderr - 19%|█▊ | 4173/22434 [2:55:19<12:39:47, 2.50s/it] +2025-02-05 13:02:59 - ERROR - stderr - +2025-02-05 13:02:59 - ERROR - stderr - +2025-02-05 13:02:59 - INFO - stdout - {'loss': 0.922, 'grad_norm': 1.1210523843765259, 'learning_rate': 1.8750936936267874e-05, 'epoch': 0.56} +2025-02-05 13:02:59 - ERROR - stderr - 19%|█▊ | 4173/22434 [2:55:19<12:39:47, 2.50s/it] +2025-02-05 13:03:02 - ERROR - stderr - 19%|█▊ | 4174/22434 [2:55:22<12:56:27, 2.55s/it] +2025-02-05 13:03:02 - ERROR - stderr - +2025-02-05 13:03:02 - ERROR - stderr - +2025-02-05 13:03:02 - INFO - stdout - {'loss': 0.9693, 'grad_norm': 1.1233254671096802, 'learning_rate': 1.8750238138780595e-05, 'epoch': 0.56} +2025-02-05 13:03:02 - ERROR - stderr - 19%|█▊ | 4174/22434 [2:55:22<12:56:27, 2.55s/it] +2025-02-05 13:03:18 - ERROR - stderr - 19%|█▊ | 4175/22434 [2:55:38<33:13:07, 6.55s/it] +2025-02-05 13:03:18 - ERROR - stderr - +2025-02-05 13:03:18 - ERROR - stderr - +2025-02-05 13:03:18 - INFO - stdout - {'loss': 1.1435, 'grad_norm': 1.2204418182373047, 'learning_rate': 1.8749539158902975e-05, 'epoch': 0.56} +2025-02-05 13:03:18 - ERROR - stderr - 19%|█▊ | 4175/22434 [2:55:38<33:13:07, 6.55s/it] +2025-02-05 13:03:23 - ERROR - stderr - 19%|█▊ | 4176/22434 [2:55:43<31:25:21, 6.20s/it] +2025-02-05 13:03:23 - ERROR - stderr - +2025-02-05 13:03:23 - ERROR - stderr - +2025-02-05 13:03:23 - INFO - stdout - 
{'loss': 1.0065, 'grad_norm': 1.2252203226089478, 'learning_rate': 1.8748839996649583e-05, 'epoch': 0.56} +2025-02-05 13:03:23 - ERROR - stderr - 19%|█▊ | 4176/22434 [2:55:43<31:25:21, 6.20s/it] +2025-02-05 13:03:26 - ERROR - stderr - 19%|█▊ | 4177/22434 [2:55:46<25:53:28, 5.11s/it] +2025-02-05 13:03:26 - ERROR - stderr - +2025-02-05 13:03:26 - ERROR - stderr - +2025-02-05 13:03:26 - INFO - stdout - {'loss': 0.9989, 'grad_norm': 1.0889040231704712, 'learning_rate': 1.8748140652034992e-05, 'epoch': 0.56} +2025-02-05 13:03:26 - ERROR - stderr - 19%|█▊ | 4177/22434 [2:55:46<25:53:28, 5.11s/it] +2025-02-05 13:03:29 - ERROR - stderr - 19%|█▊ | 4178/22434 [2:55:49<23:13:35, 4.58s/it] +2025-02-05 13:03:29 - ERROR - stderr - +2025-02-05 13:03:29 - ERROR - stderr - +2025-02-05 13:03:29 - INFO - stdout - {'loss': 0.9868, 'grad_norm': 1.135770559310913, 'learning_rate': 1.8747441125073784e-05, 'epoch': 0.56} +2025-02-05 13:03:29 - ERROR - stderr - 19%|█▊ | 4178/22434 [2:55:49<23:13:35, 4.58s/it] +2025-02-05 13:03:39 - ERROR - stderr - 19%|█▊ | 4179/22434 [2:55:59<31:43:38, 6.26s/it] +2025-02-05 13:03:39 - ERROR - stderr - +2025-02-05 13:03:39 - ERROR - stderr - +2025-02-05 13:03:39 - INFO - stdout - {'loss': 0.9424, 'grad_norm': 1.039806842803955, 'learning_rate': 1.8746741415780535e-05, 'epoch': 0.56} +2025-02-05 13:03:39 - ERROR - stderr - 19%|█▊ | 4179/22434 [2:55:59<31:43:38, 6.26s/it] +2025-02-05 13:03:50 - ERROR - stderr - 19%|█▊ | 4180/22434 [2:56:09<37:46:00, 7.45s/it] +2025-02-05 13:03:50 - ERROR - stderr - +2025-02-05 13:03:50 - ERROR - stderr - +2025-02-05 13:03:50 - INFO - stdout - {'loss': 0.9424, 'grad_norm': 1.1514462232589722, 'learning_rate': 1.874604152416983e-05, 'epoch': 0.56} +2025-02-05 13:03:50 - ERROR - stderr - 19%|█▊ | 4180/22434 [2:56:09<37:46:00, 7.45s/it] +2025-02-05 13:03:52 - ERROR - stderr - 19%|█▊ | 4181/22434 [2:56:12<30:07:17, 5.94s/it] +2025-02-05 13:03:52 - ERROR - stderr - +2025-02-05 13:03:52 - ERROR - stderr - +2025-02-05 13:03:52 - 
INFO - stdout - {'loss': 0.8366, 'grad_norm': 1.019472599029541, 'learning_rate': 1.874534145025626e-05, 'epoch': 0.56} +2025-02-05 13:03:52 - ERROR - stderr - 19%|█▊ | 4181/22434 [2:56:12<30:07:17, 5.94s/it] +2025-02-05 13:04:08 - ERROR - stderr - 19%|█▊ | 4182/22434 [2:56:28<46:14:29, 9.12s/it] +2025-02-05 13:04:09 - ERROR - stderr - +2025-02-05 13:04:09 - ERROR - stderr - +2025-02-05 13:04:09 - INFO - stdout - {'loss': 0.9199, 'grad_norm': 1.045609712600708, 'learning_rate': 1.8744641194054417e-05, 'epoch': 0.56} +2025-02-05 13:04:09 - ERROR - stderr - 19%|█▊ | 4182/22434 [2:56:28<46:14:29, 9.12s/it] +2025-02-05 13:04:20 - ERROR - stderr - 19%|█▊ | 4183/22434 [2:56:39<49:12:14, 9.71s/it] +2025-02-05 13:04:20 - ERROR - stderr - +2025-02-05 13:04:20 - ERROR - stderr - +2025-02-05 13:04:20 - INFO - stdout - {'loss': 1.0042, 'grad_norm': 1.0890038013458252, 'learning_rate': 1.874394075557889e-05, 'epoch': 0.56} +2025-02-05 13:04:20 - ERROR - stderr - 19%|█▊ | 4183/22434 [2:56:39<49:12:14, 9.71s/it] +2025-02-05 13:04:48 - ERROR - stderr - 19%|█▊ | 4184/22434 [2:57:08<77:46:10, 15.34s/it] +2025-02-05 13:04:48 - ERROR - stderr - +2025-02-05 13:04:48 - ERROR - stderr - +2025-02-05 13:04:48 - INFO - stdout - {'loss': 0.97, 'grad_norm': 1.1431515216827393, 'learning_rate': 1.874324013484429e-05, 'epoch': 0.56} +2025-02-05 13:04:48 - ERROR - stderr - 19%|█▊ | 4184/22434 [2:57:08<77:46:10, 15.34s/it] +2025-02-05 13:05:21 - ERROR - stderr - 19%|█▊ | 4185/22434 [2:57:41<105:11:48, 20.75s/it] +2025-02-05 13:05:21 - ERROR - stderr - +2025-02-05 13:05:21 - ERROR - stderr - +2025-02-05 13:05:21 - INFO - stdout - {'loss': 0.8798, 'grad_norm': 1.0080128908157349, 'learning_rate': 1.8742539331865214e-05, 'epoch': 0.56} +2025-02-05 13:05:21 - ERROR - stderr - 19%|█▊ | 4185/22434 [2:57:41<105:11:48, 20.75s/it] +2025-02-05 13:05:54 - ERROR - stderr - 19%|█▊ | 4186/22434 [2:58:13<122:35:50, 24.19s/it] +2025-02-05 13:05:54 - ERROR - stderr - +2025-02-05 13:05:54 - ERROR - stderr - 
+2025-02-05 13:05:54 - INFO - stdout - {'loss': 0.971, 'grad_norm': 1.025169849395752, 'learning_rate': 1.8741838346656275e-05, 'epoch': 0.56} +2025-02-05 13:05:54 - ERROR - stderr - 19%|█▊ | 4186/22434 [2:58:13<122:35:50, 24.19s/it] +2025-02-05 13:05:56 - ERROR - stderr - 19%|█▊ | 4187/22434 [2:58:16<89:37:35, 17.68s/it] +2025-02-05 13:05:56 - ERROR - stderr - +2025-02-05 13:05:56 - ERROR - stderr - +2025-02-05 13:05:56 - INFO - stdout - {'loss': 1.051, 'grad_norm': 1.0815359354019165, 'learning_rate': 1.8741137179232077e-05, 'epoch': 0.56} +2025-02-05 13:05:56 - ERROR - stderr - 19%|█▊ | 4187/22434 [2:58:16<89:37:35, 17.68s/it] +2025-02-05 13:05:59 - ERROR - stderr - 19%|█▊ | 4188/22434 [2:58:18<66:31:59, 13.13s/it] +2025-02-05 13:05:59 - ERROR - stderr - +2025-02-05 13:05:59 - ERROR - stderr - +2025-02-05 13:05:59 - INFO - stdout - {'loss': 1.0302, 'grad_norm': 1.1972018480300903, 'learning_rate': 1.8740435829607237e-05, 'epoch': 0.56} +2025-02-05 13:05:59 - ERROR - stderr - 19%|█▊ | 4188/22434 [2:58:18<66:31:59, 13.13s/it] +2025-02-05 13:06:22 - ERROR - stderr - 19%|█▊ | 4189/22434 [2:58:41<81:29:45, 16.08s/it] +2025-02-05 13:06:22 - ERROR - stderr - +2025-02-05 13:06:22 - ERROR - stderr - +2025-02-05 13:06:22 - INFO - stdout - {'loss': 1.0006, 'grad_norm': 1.1575087308883667, 'learning_rate': 1.873973429779638e-05, 'epoch': 0.56} +2025-02-05 13:06:22 - ERROR - stderr - 19%|█▊ | 4189/22434 [2:58:41<81:29:45, 16.08s/it] +2025-02-05 13:06:53 - ERROR - stderr - 19%|█▊ | 4190/22434 [2:59:13<104:31:09, 20.62s/it] +2025-02-05 13:06:53 - ERROR - stderr - +2025-02-05 13:06:53 - ERROR - stderr - +2025-02-05 13:06:53 - INFO - stdout - {'loss': 0.9701, 'grad_norm': 1.0021131038665771, 'learning_rate': 1.8739032583814124e-05, 'epoch': 0.56} +2025-02-05 13:06:53 - ERROR - stderr - 19%|█▊ | 4190/22434 [2:59:13<104:31:09, 20.62s/it] +2025-02-05 13:07:13 - ERROR - stderr - 19%|█▊ | 4191/22434 [2:59:33<103:45:58, 20.48s/it] +2025-02-05 13:07:13 - ERROR - stderr - +2025-02-05 
13:07:13 - ERROR - stderr - +2025-02-05 13:07:13 - INFO - stdout - {'loss': 0.8772, 'grad_norm': 1.107019305229187, 'learning_rate': 1.8738330687675094e-05, 'epoch': 0.56} +2025-02-05 13:07:13 - ERROR - stderr - 19%|█▊ | 4191/22434 [2:59:33<103:45:58, 20.48s/it] +2025-02-05 13:07:15 - ERROR - stderr - 19%|█▊ | 4192/22434 [2:59:35<76:25:05, 15.08s/it] +2025-02-05 13:07:16 - ERROR - stderr - +2025-02-05 13:07:16 - ERROR - stderr - +2025-02-05 13:07:16 - INFO - stdout - {'loss': 0.9294, 'grad_norm': 1.0211387872695923, 'learning_rate': 1.8737628609393922e-05, 'epoch': 0.56} +2025-02-05 13:07:16 - ERROR - stderr - 19%|█▊ | 4192/22434 [2:59:35<76:25:05, 15.08s/it] +2025-02-05 13:07:27 - ERROR - stderr - 19%|█▊ | 4193/22434 [2:59:47<71:32:31, 14.12s/it] +2025-02-05 13:07:27 - ERROR - stderr - +2025-02-05 13:07:27 - ERROR - stderr - +2025-02-05 13:07:27 - INFO - stdout - {'loss': 0.9133, 'grad_norm': 0.9384823441505432, 'learning_rate': 1.8736926348985246e-05, 'epoch': 0.56} +2025-02-05 13:07:27 - ERROR - stderr - 19%|█▊ | 4193/22434 [2:59:47<71:32:31, 14.12s/it] +2025-02-05 13:07:55 - ERROR - stderr - 19%|█▊ | 4194/22434 [3:00:14<91:30:47, 18.06s/it] +2025-02-05 13:07:55 - ERROR - stderr - +2025-02-05 13:07:55 - ERROR - stderr - +2025-02-05 13:07:55 - INFO - stdout - {'loss': 0.9611, 'grad_norm': 1.1420704126358032, 'learning_rate': 1.8736223906463698e-05, 'epoch': 0.56} +2025-02-05 13:07:55 - ERROR - stderr - 19%|█▊ | 4194/22434 [3:00:14<91:30:47, 18.06s/it] +2025-02-05 13:08:10 - ERROR - stderr - 19%|█▊ | 4195/22434 [3:00:30<87:09:10, 17.20s/it] +2025-02-05 13:08:10 - ERROR - stderr - +2025-02-05 13:08:10 - ERROR - stderr - +2025-02-05 13:08:10 - INFO - stdout - {'loss': 0.8352, 'grad_norm': 1.0146013498306274, 'learning_rate': 1.8735521281843923e-05, 'epoch': 0.56} +2025-02-05 13:08:10 - ERROR - stderr - 19%|█▊ | 4195/22434 [3:00:30<87:09:10, 17.20s/it] +2025-02-05 13:08:36 - ERROR - stderr - 19%|█▊ | 4196/22434 [3:00:56<100:50:14, 19.90s/it] +2025-02-05 13:08:36 - 
ERROR - stderr - +2025-02-05 13:08:36 - ERROR - stderr - +2025-02-05 13:08:36 - INFO - stdout - {'loss': 1.0533, 'grad_norm': 1.2548308372497559, 'learning_rate': 1.8734818475140565e-05, 'epoch': 0.56} +2025-02-05 13:08:36 - ERROR - stderr - 19%|█▊ | 4196/22434 [3:00:56<100:50:14, 19.90s/it] +2025-02-05 13:08:53 - ERROR - stderr - 19%|█▊ | 4197/22434 [3:01:12<95:53:26, 18.93s/it] +2025-02-05 13:08:53 - ERROR - stderr - +2025-02-05 13:08:53 - ERROR - stderr - +2025-02-05 13:08:53 - INFO - stdout - {'loss': 1.0579, 'grad_norm': 1.1899572610855103, 'learning_rate': 1.8734115486368275e-05, 'epoch': 0.56} +2025-02-05 13:08:53 - ERROR - stderr - 19%|█▊ | 4197/22434 [3:01:12<95:53:26, 18.93s/it] +2025-02-05 13:09:12 - ERROR - stderr - 19%|█▊ | 4198/22434 [3:01:32<97:05:58, 19.17s/it] +2025-02-05 13:09:12 - ERROR - stderr - +2025-02-05 13:09:12 - ERROR - stderr - +2025-02-05 13:09:12 - INFO - stdout - {'loss': 0.9172, 'grad_norm': 1.1153916120529175, 'learning_rate': 1.8733412315541706e-05, 'epoch': 0.56} +2025-02-05 13:09:12 - ERROR - stderr - 19%|█▊ | 4198/22434 [3:01:32<97:05:58, 19.17s/it] +2025-02-05 13:09:49 - ERROR - stderr - 19%|█▊ | 4199/22434 [3:02:09<123:32:19, 24.39s/it] +2025-02-05 13:09:49 - ERROR - stderr - +2025-02-05 13:09:49 - ERROR - stderr - +2025-02-05 13:09:49 - INFO - stdout - {'loss': 1.1226, 'grad_norm': 1.0486317873001099, 'learning_rate': 1.8732708962675513e-05, 'epoch': 0.56} +2025-02-05 13:09:49 - ERROR - stderr - 19%|█▊ | 4199/22434 [3:02:09<123:32:19, 24.39s/it] +2025-02-05 13:09:51 - ERROR - stderr - 19%|█▊ | 4200/22434 [3:02:11<90:18:06, 17.83s/it] +2025-02-05 13:09:52 - ERROR - stderr - +2025-02-05 13:09:52 - ERROR - stderr - +2025-02-05 13:09:52 - INFO - stdout - {'loss': 0.9975, 'grad_norm': 1.20783531665802, 'learning_rate': 1.8732005427784357e-05, 'epoch': 0.56} +2025-02-05 13:09:52 - ERROR - stderr - 19%|█▊ | 4200/22434 [3:02:11<90:18:06, 17.83s/it] +2025-02-05 13:10:13 - ERROR - stderr - 19%|█▊ | 4201/22434 [3:02:32<95:11:10, 
18.79s/it] +2025-02-05 13:10:13 - ERROR - stderr - +2025-02-05 13:10:13 - ERROR - stderr - +2025-02-05 13:10:13 - INFO - stdout - {'loss': 1.1299, 'grad_norm': 1.0829203128814697, 'learning_rate': 1.8731301710882905e-05, 'epoch': 0.56} +2025-02-05 13:10:13 - ERROR - stderr - 19%|█▊ | 4201/22434 [3:02:32<95:11:10, 18.79s/it] +2025-02-05 13:10:34 - ERROR - stderr - 19%|█▊ | 4202/22434 [3:02:54<99:56:44, 19.73s/it] +2025-02-05 13:10:35 - ERROR - stderr - +2025-02-05 13:10:35 - ERROR - stderr - +2025-02-05 13:10:35 - INFO - stdout - {'loss': 0.9116, 'grad_norm': 1.0710757970809937, 'learning_rate': 1.8730597811985826e-05, 'epoch': 0.56} +2025-02-05 13:10:35 - ERROR - stderr - 19%|█▊ | 4202/22434 [3:02:54<99:56:44, 19.73s/it] +2025-02-05 13:10:37 - ERROR - stderr - 19%|█▊ | 4203/22434 [3:02:57<73:42:58, 14.56s/it] +2025-02-05 13:10:37 - ERROR - stderr - +2025-02-05 13:10:37 - ERROR - stderr - +2025-02-05 13:10:37 - INFO - stdout - {'loss': 0.8762, 'grad_norm': 1.0341309309005737, 'learning_rate': 1.872989373110779e-05, 'epoch': 0.56} +2025-02-05 13:10:37 - ERROR - stderr - 19%|█▊ | 4203/22434 [3:02:57<73:42:58, 14.56s/it] +2025-02-05 13:10:54 - ERROR - stderr - 19%|█▊ | 4204/22434 [3:03:14<77:40:55, 15.34s/it] +2025-02-05 13:10:54 - ERROR - stderr - +2025-02-05 13:10:54 - ERROR - stderr - +2025-02-05 13:10:54 - INFO - stdout - {'loss': 1.0129, 'grad_norm': 1.127497673034668, 'learning_rate': 1.8729189468263466e-05, 'epoch': 0.56} +2025-02-05 13:10:54 - ERROR - stderr - 19%|█▊ | 4204/22434 [3:03:14<77:40:55, 15.34s/it] +2025-02-05 13:11:10 - ERROR - stderr - 19%|█▊ | 4205/22434 [3:03:30<78:11:34, 15.44s/it] +2025-02-05 13:11:10 - ERROR - stderr - +2025-02-05 13:11:10 - ERROR - stderr - +2025-02-05 13:11:10 - INFO - stdout - {'loss': 1.0923, 'grad_norm': 1.1886883974075317, 'learning_rate': 1.8728485023467547e-05, 'epoch': 0.56} +2025-02-05 13:11:10 - ERROR - stderr - 19%|█▊ | 4205/22434 [3:03:30<78:11:34, 15.44s/it] +2025-02-05 13:11:48 - ERROR - stderr - 19%|█▊ | 
4206/22434 [3:04:08<113:17:38, 22.38s/it] +2025-02-05 13:11:48 - ERROR - stderr - +2025-02-05 13:11:48 - ERROR - stderr - +2025-02-05 13:11:48 - INFO - stdout - {'loss': 0.9698, 'grad_norm': 1.2200772762298584, 'learning_rate': 1.8727780396734707e-05, 'epoch': 0.56} +2025-02-05 13:11:48 - ERROR - stderr - 19%|█▊ | 4206/22434 [3:04:08<113:17:38, 22.38s/it] +2025-02-05 13:11:51 - ERROR - stderr - 19%|█▉ | 4207/22434 [3:04:11<83:03:25, 16.40s/it] +2025-02-05 13:11:51 - ERROR - stderr - +2025-02-05 13:11:51 - ERROR - stderr - +2025-02-05 13:11:51 - INFO - stdout - {'loss': 1.0018, 'grad_norm': 1.106721043586731, 'learning_rate': 1.8727075588079638e-05, 'epoch': 0.56} +2025-02-05 13:11:51 - ERROR - stderr - 19%|█▉ | 4207/22434 [3:04:11<83:03:25, 16.40s/it] +2025-02-05 13:12:16 - ERROR - stderr - 19%|█▉ | 4208/22434 [3:04:36<96:52:50, 19.14s/it] +2025-02-05 13:12:16 - ERROR - stderr - +2025-02-05 13:12:16 - ERROR - stderr - +2025-02-05 13:12:16 - INFO - stdout - {'loss': 0.904, 'grad_norm': 0.8979536294937134, 'learning_rate': 1.8726370597517026e-05, 'epoch': 0.56} +2025-02-05 13:12:16 - ERROR - stderr - 19%|█▉ | 4208/22434 [3:04:36<96:52:50, 19.14s/it] +2025-02-05 13:12:53 - ERROR - stderr - 19%|█▉ | 4209/22434 [3:05:13<123:07:58, 24.32s/it] +2025-02-05 13:12:53 - ERROR - stderr - +2025-02-05 13:12:53 - ERROR - stderr - +2025-02-05 13:12:53 - INFO - stdout - {'loss': 0.9454, 'grad_norm': 1.0531038045883179, 'learning_rate': 1.8725665425061574e-05, 'epoch': 0.56} +2025-02-05 13:12:53 - ERROR - stderr - 19%|█▉ | 4209/22434 [3:05:13<123:07:58, 24.32s/it] +2025-02-05 13:13:10 - ERROR - stderr - 19%|█▉ | 4210/22434 [3:05:30<112:27:51, 22.22s/it] +2025-02-05 13:13:10 - ERROR - stderr - +2025-02-05 13:13:10 - ERROR - stderr - +2025-02-05 13:13:10 - INFO - stdout - {'loss': 0.8943, 'grad_norm': 1.0507900714874268, 'learning_rate': 1.8724960070727974e-05, 'epoch': 0.56} +2025-02-05 13:13:10 - ERROR - stderr - 19%|█▉ | 4210/22434 [3:05:30<112:27:51, 22.22s/it] +2025-02-05 
13:13:26 - ERROR - stderr - 19%|█▉ | 4211/22434 [3:05:46<103:39:48, 20.48s/it] +2025-02-05 13:13:27 - ERROR - stderr - +2025-02-05 13:13:27 - ERROR - stderr - +2025-02-05 13:13:27 - INFO - stdout - {'loss': 1.0486, 'grad_norm': 1.1653261184692383, 'learning_rate': 1.8724254534530926e-05, 'epoch': 0.56} +2025-02-05 13:13:27 - ERROR - stderr - 19%|█▉ | 4211/22434 [3:05:46<103:39:48, 20.48s/it] +2025-02-05 13:13:39 - ERROR - stderr - 19%|█▉ | 4212/22434 [3:05:59<91:53:39, 18.15s/it] +2025-02-05 13:13:39 - ERROR - stderr - +2025-02-05 13:13:39 - ERROR - stderr - +2025-02-05 13:13:39 - INFO - stdout - {'loss': 0.9609, 'grad_norm': 1.126889705657959, 'learning_rate': 1.8723548816485147e-05, 'epoch': 0.56} +2025-02-05 13:13:39 - ERROR - stderr - 19%|█▉ | 4212/22434 [3:05:59<91:53:39, 18.15s/it] +2025-02-05 13:13:42 - ERROR - stderr - 19%|█▉ | 4213/22434 [3:06:01<68:08:14, 13.46s/it] +2025-02-05 13:13:42 - ERROR - stderr - +2025-02-05 13:13:42 - ERROR - stderr - +2025-02-05 13:13:42 - INFO - stdout - {'loss': 0.8531, 'grad_norm': 1.1318472623825073, 'learning_rate': 1.8722842916605338e-05, 'epoch': 0.56} +2025-02-05 13:13:42 - ERROR - stderr - 19%|█▉ | 4213/22434 [3:06:02<68:08:14, 13.46s/it] +2025-02-05 13:14:15 - ERROR - stderr - 19%|█▉ | 4214/22434 [3:06:35<98:13:27, 19.41s/it] +2025-02-05 13:14:15 - ERROR - stderr - +2025-02-05 13:14:15 - ERROR - stderr - +2025-02-05 13:14:15 - INFO - stdout - {'loss': 0.9788, 'grad_norm': 1.0813262462615967, 'learning_rate': 1.8722136834906214e-05, 'epoch': 0.56} +2025-02-05 13:14:15 - ERROR - stderr - 19%|█▉ | 4214/22434 [3:06:35<98:13:27, 19.41s/it] +2025-02-05 13:14:17 - ERROR - stderr - 19%|█▉ | 4215/22434 [3:06:37<72:30:03, 14.33s/it] +2025-02-05 13:14:18 - ERROR - stderr - +2025-02-05 13:14:18 - ERROR - stderr - +2025-02-05 13:14:18 - INFO - stdout - {'loss': 1.004, 'grad_norm': 1.1463592052459717, 'learning_rate': 1.8721430571402496e-05, 'epoch': 0.56} +2025-02-05 13:14:18 - ERROR - stderr - 19%|█▉ | 4215/22434 
[3:06:37<72:30:03, 14.33s/it] +2025-02-05 13:14:54 - ERROR - stderr - 19%|█▉ | 4216/22434 [3:07:14<106:46:17, 21.10s/it] +2025-02-05 13:14:54 - ERROR - stderr - +2025-02-05 13:14:54 - ERROR - stderr - +2025-02-05 13:14:54 - INFO - stdout - {'loss': 0.8571, 'grad_norm': 1.1911218166351318, 'learning_rate': 1.87207241261089e-05, 'epoch': 0.56} +2025-02-05 13:14:54 - ERROR - stderr - 19%|█▉ | 4216/22434 [3:07:14<106:46:17, 21.10s/it] +2025-02-05 13:15:12 - ERROR - stderr - 19%|█▉ | 4217/22434 [3:07:32<101:06:47, 19.98s/it] +2025-02-05 13:15:12 - ERROR - stderr - +2025-02-05 13:15:12 - ERROR - stderr - +2025-02-05 13:15:12 - INFO - stdout - {'loss': 1.0385, 'grad_norm': 1.0400562286376953, 'learning_rate': 1.8720017499040154e-05, 'epoch': 0.56} +2025-02-05 13:15:12 - ERROR - stderr - 19%|█▉ | 4217/22434 [3:07:32<101:06:47, 19.98s/it] +2025-02-05 13:15:14 - ERROR - stderr - 19%|█▉ | 4218/22434 [3:07:34<74:32:42, 14.73s/it] +2025-02-05 13:15:14 - ERROR - stderr - +2025-02-05 13:15:14 - ERROR - stderr - +2025-02-05 13:15:14 - INFO - stdout - {'loss': 1.0449, 'grad_norm': 1.292311429977417, 'learning_rate': 1.8719310690210993e-05, 'epoch': 0.56} +2025-02-05 13:15:14 - ERROR - stderr - 19%|█▉ | 4218/22434 [3:07:34<74:32:42, 14.73s/it] +2025-02-05 13:15:29 - ERROR - stderr - 19%|█▉ | 4219/22434 [3:07:49<74:30:17, 14.73s/it] +2025-02-05 13:15:29 - ERROR - stderr - +2025-02-05 13:15:29 - ERROR - stderr - +2025-02-05 13:15:29 - INFO - stdout - {'loss': 0.8971, 'grad_norm': 1.058998942375183, 'learning_rate': 1.871860369963614e-05, 'epoch': 0.56} +2025-02-05 13:15:29 - ERROR - stderr - 19%|█▉ | 4219/22434 [3:07:49<74:30:17, 14.73s/it] +2025-02-05 13:15:41 - ERROR - stderr - 19%|█▉ | 4220/22434 [3:08:01<70:18:41, 13.90s/it] +2025-02-05 13:15:41 - ERROR - stderr - +2025-02-05 13:15:41 - ERROR - stderr - +2025-02-05 13:15:41 - INFO - stdout - {'loss': 0.9394, 'grad_norm': 0.9786389470100403, 'learning_rate': 1.8717896527330334e-05, 'epoch': 0.56} +2025-02-05 13:15:41 - ERROR - 
stderr - 19%|█▉ | 4220/22434 [3:08:01<70:18:41, 13.90s/it] +2025-02-05 13:15:52 - ERROR - stderr - 19%|█▉ | 4221/22434 [3:08:12<65:57:39, 13.04s/it] +2025-02-05 13:15:52 - ERROR - stderr - +2025-02-05 13:15:52 - ERROR - stderr - +2025-02-05 13:15:52 - INFO - stdout - {'loss': 1.0865, 'grad_norm': 1.1137664318084717, 'learning_rate': 1.8717189173308322e-05, 'epoch': 0.56} +2025-02-05 13:15:52 - ERROR - stderr - 19%|█▉ | 4221/22434 [3:08:12<65:57:39, 13.04s/it] +2025-02-05 13:15:54 - ERROR - stderr - 19%|█▉ | 4222/22434 [3:08:14<49:54:10, 9.86s/it] +2025-02-05 13:15:54 - ERROR - stderr - +2025-02-05 13:15:54 - ERROR - stderr - +2025-02-05 13:15:54 - INFO - stdout - {'loss': 1.0253, 'grad_norm': 1.2083667516708374, 'learning_rate': 1.8716481637584838e-05, 'epoch': 0.56} +2025-02-05 13:15:54 - ERROR - stderr - 19%|█▉ | 4222/22434 [3:08:14<49:54:10, 9.86s/it] +2025-02-05 13:16:20 - ERROR - stderr - 19%|█▉ | 4223/22434 [3:08:39<73:07:15, 14.45s/it] +2025-02-05 13:16:20 - ERROR - stderr - +2025-02-05 13:16:20 - ERROR - stderr - +2025-02-05 13:16:20 - INFO - stdout - {'loss': 0.8516, 'grad_norm': 1.041096806526184, 'learning_rate': 1.871577392017464e-05, 'epoch': 0.56} +2025-02-05 13:16:20 - ERROR - stderr - 19%|█▉ | 4223/22434 [3:08:39<73:07:15, 14.45s/it] +2025-02-05 13:16:43 - ERROR - stderr - 19%|█▉ | 4224/22434 [3:09:03<87:10:09, 17.23s/it] +2025-02-05 13:16:43 - ERROR - stderr - +2025-02-05 13:16:43 - ERROR - stderr - +2025-02-05 13:16:43 - INFO - stdout - {'loss': 1.0062, 'grad_norm': 1.1420881748199463, 'learning_rate': 1.8715066021092472e-05, 'epoch': 0.56} +2025-02-05 13:16:43 - ERROR - stderr - 19%|█▉ | 4224/22434 [3:09:03<87:10:09, 17.23s/it] +2025-02-05 13:17:09 - ERROR - stderr - 19%|█▉ | 4225/22434 [3:09:29<100:15:15, 19.82s/it] +2025-02-05 13:17:09 - ERROR - stderr - +2025-02-05 13:17:09 - ERROR - stderr - +2025-02-05 13:17:09 - INFO - stdout - {'loss': 0.9895, 'grad_norm': 1.0848466157913208, 'learning_rate': 1.8714357940353092e-05, 'epoch': 0.56} 
+2025-02-05 13:17:09 - ERROR - stderr - 19%|█▉ | 4225/22434 [3:09:29<100:15:15, 19.82s/it] +2025-02-05 13:17:12 - ERROR - stderr - 19%|█▉ | 4226/22434 [3:09:31<73:53:58, 14.61s/it] +2025-02-05 13:17:12 - ERROR - stderr - +2025-02-05 13:17:12 - ERROR - stderr - +2025-02-05 13:17:12 - INFO - stdout - {'loss': 0.992, 'grad_norm': 1.1167887449264526, 'learning_rate': 1.871364967797126e-05, 'epoch': 0.57} +2025-02-05 13:17:12 - ERROR - stderr - 19%|█▉ | 4226/22434 [3:09:31<73:53:58, 14.61s/it] +2025-02-05 13:17:46 - ERROR - stderr - 19%|█▉ | 4227/22434 [3:10:06<103:42:54, 20.51s/it] +2025-02-05 13:17:46 - ERROR - stderr - +2025-02-05 13:17:46 - ERROR - stderr - +2025-02-05 13:17:46 - INFO - stdout - {'loss': 1.0205, 'grad_norm': 1.1449971199035645, 'learning_rate': 1.8712941233961736e-05, 'epoch': 0.57} +2025-02-05 13:17:46 - ERROR - stderr - 19%|█▉ | 4227/22434 [3:10:06<103:42:54, 20.51s/it] +2025-02-05 13:18:26 - ERROR - stderr - 19%|█▉ | 4228/22434 [3:10:46<133:10:11, 26.33s/it] +2025-02-05 13:18:26 - ERROR - stderr - +2025-02-05 13:18:26 - ERROR - stderr - +2025-02-05 13:18:26 - INFO - stdout - {'loss': 0.9632, 'grad_norm': 1.130625605583191, 'learning_rate': 1.8712232608339294e-05, 'epoch': 0.57} +2025-02-05 13:18:26 - ERROR - stderr - 19%|█▉ | 4228/22434 [3:10:46<133:10:11, 26.33s/it] +2025-02-05 13:18:57 - ERROR - stderr - 19%|█▉ | 4229/22434 [3:11:17<140:21:03, 27.75s/it] +2025-02-05 13:18:57 - ERROR - stderr - +2025-02-05 13:18:57 - ERROR - stderr - +2025-02-05 13:18:57 - INFO - stdout - {'loss': 1.0497, 'grad_norm': 1.1609690189361572, 'learning_rate': 1.8711523801118694e-05, 'epoch': 0.57} +2025-02-05 13:18:57 - ERROR - stderr - 19%|█▉ | 4229/22434 [3:11:17<140:21:03, 27.75s/it] +2025-02-05 13:18:59 - ERROR - stderr - 19%|█▉ | 4230/22434 [3:11:19<102:03:10, 20.18s/it] +2025-02-05 13:18:59 - ERROR - stderr - +2025-02-05 13:18:59 - ERROR - stderr - +2025-02-05 13:18:59 - INFO - stdout - {'loss': 0.9908, 'grad_norm': 1.2421350479125977, 'learning_rate': 
1.8710814812314722e-05, 'epoch': 0.57} +2025-02-05 13:18:59 - ERROR - stderr - 19%|█▉ | 4230/22434 [3:11:19<102:03:10, 20.18s/it] +2025-02-05 13:19:21 - ERROR - stderr - 19%|█▉ | 4231/22434 [3:11:41<104:52:09, 20.74s/it] +2025-02-05 13:19:21 - ERROR - stderr - +2025-02-05 13:19:21 - ERROR - stderr - +2025-02-05 13:19:21 - INFO - stdout - {'loss': 0.9738, 'grad_norm': 1.0364757776260376, 'learning_rate': 1.871010564194215e-05, 'epoch': 0.57} +2025-02-05 13:19:21 - ERROR - stderr - 19%|█▉ | 4231/22434 [3:11:41<104:52:09, 20.74s/it] +2025-02-05 13:19:41 - ERROR - stderr - 19%|█▉ | 4232/22434 [3:12:01<103:12:36, 20.41s/it] +2025-02-05 13:19:41 - ERROR - stderr - +2025-02-05 13:19:41 - ERROR - stderr - +2025-02-05 13:19:41 - INFO - stdout - {'loss': 1.0001, 'grad_norm': 1.160951852798462, 'learning_rate': 1.870939629001576e-05, 'epoch': 0.57} +2025-02-05 13:19:41 - ERROR - stderr - 19%|█▉ | 4232/22434 [3:12:01<103:12:36, 20.41s/it] +2025-02-05 13:20:08 - ERROR - stderr - 19%|█▉ | 4233/22434 [3:12:28<113:29:33, 22.45s/it] +2025-02-05 13:20:08 - ERROR - stderr - +2025-02-05 13:20:08 - ERROR - stderr - +2025-02-05 13:20:08 - INFO - stdout - {'loss': 0.9345, 'grad_norm': 1.1690552234649658, 'learning_rate': 1.8708686756550338e-05, 'epoch': 0.57} +2025-02-05 13:20:08 - ERROR - stderr - 19%|█▉ | 4233/22434 [3:12:28<113:29:33, 22.45s/it] +2025-02-05 13:20:11 - ERROR - stderr - 19%|█▉ | 4234/22434 [3:12:30<83:07:42, 16.44s/it] +2025-02-05 13:20:11 - ERROR - stderr - +2025-02-05 13:20:11 - ERROR - stderr - +2025-02-05 13:20:11 - INFO - stdout - {'loss': 0.9256, 'grad_norm': 1.1095682382583618, 'learning_rate': 1.8707977041560673e-05, 'epoch': 0.57} +2025-02-05 13:20:11 - ERROR - stderr - 19%|█▉ | 4234/22434 [3:12:30<83:07:42, 16.44s/it] +2025-02-05 13:20:39 - ERROR - stderr - 19%|█▉ | 4235/22434 [3:12:59<101:46:45, 20.13s/it] +2025-02-05 13:20:39 - ERROR - stderr - +2025-02-05 13:20:39 - ERROR - stderr - +2025-02-05 13:20:39 - INFO - stdout - {'loss': 1.0911, 'grad_norm': 
1.1522551774978638, 'learning_rate': 1.870726714506156e-05, 'epoch': 0.57} +2025-02-05 13:20:39 - ERROR - stderr - 19%|█▉ | 4235/22434 [3:12:59<101:46:45, 20.13s/it] +2025-02-05 13:20:42 - ERROR - stderr - 19%|█▉ | 4236/22434 [3:13:02<74:58:37, 14.83s/it] +2025-02-05 13:20:42 - ERROR - stderr - +2025-02-05 13:20:42 - ERROR - stderr - +2025-02-05 13:20:42 - INFO - stdout - {'loss': 0.974, 'grad_norm': 0.9971983432769775, 'learning_rate': 1.8706557067067795e-05, 'epoch': 0.57} +2025-02-05 13:20:42 - ERROR - stderr - 19%|█▉ | 4236/22434 [3:13:02<74:58:37, 14.83s/it] +2025-02-05 13:20:44 - ERROR - stderr - 19%|█▉ | 4237/22434 [3:13:04<56:18:27, 11.14s/it] +2025-02-05 13:20:44 - ERROR - stderr - +2025-02-05 13:20:44 - ERROR - stderr - +2025-02-05 13:20:44 - INFO - stdout - {'loss': 0.9948, 'grad_norm': 1.1638718843460083, 'learning_rate': 1.870584680759418e-05, 'epoch': 0.57} +2025-02-05 13:20:44 - ERROR - stderr - 19%|█▉ | 4237/22434 [3:13:04<56:18:27, 11.14s/it] +2025-02-05 13:21:12 - ERROR - stderr - 19%|█▉ | 4238/22434 [3:13:32<80:57:46, 16.02s/it] +2025-02-05 13:21:12 - ERROR - stderr - +2025-02-05 13:21:12 - ERROR - stderr - +2025-02-05 13:21:12 - INFO - stdout - {'loss': 1.013, 'grad_norm': 1.300297498703003, 'learning_rate': 1.8705136366655518e-05, 'epoch': 0.57} +2025-02-05 13:21:12 - ERROR - stderr - 19%|█▉ | 4238/22434 [3:13:32<80:57:46, 16.02s/it] +2025-02-05 13:21:14 - ERROR - stderr - 19%|█▉ | 4239/22434 [3:13:34<60:25:05, 11.95s/it] +2025-02-05 13:21:14 - ERROR - stderr - +2025-02-05 13:21:14 - ERROR - stderr - +2025-02-05 13:21:14 - INFO - stdout - {'loss': 0.7816, 'grad_norm': 0.9951872825622559, 'learning_rate': 1.8704425744266616e-05, 'epoch': 0.57} +2025-02-05 13:21:14 - ERROR - stderr - 19%|█▉ | 4239/22434 [3:13:34<60:25:05, 11.95s/it] +2025-02-05 13:21:17 - ERROR - stderr - 19%|█▉ | 4240/22434 [3:13:37<46:04:44, 9.12s/it] +2025-02-05 13:21:17 - ERROR - stderr - +2025-02-05 13:21:17 - ERROR - stderr - +2025-02-05 13:21:17 - INFO - stdout - {'loss': 
0.9191, 'grad_norm': 1.0650863647460938, 'learning_rate': 1.8703714940442294e-05, 'epoch': 0.57} +2025-02-05 13:21:17 - ERROR - stderr - 19%|█▉ | 4240/22434 [3:13:37<46:04:44, 9.12s/it] +2025-02-05 13:21:44 - ERROR - stderr - 19%|█▉ | 4241/22434 [3:14:04<73:30:14, 14.54s/it] +2025-02-05 13:21:44 - ERROR - stderr - +2025-02-05 13:21:44 - ERROR - stderr - +2025-02-05 13:21:44 - INFO - stdout - {'loss': 1.0494, 'grad_norm': 1.220794677734375, 'learning_rate': 1.870300395519736e-05, 'epoch': 0.57} +2025-02-05 13:21:44 - ERROR - stderr - 19%|█▉ | 4241/22434 [3:14:04<73:30:14, 14.54s/it] +2025-02-05 13:21:46 - ERROR - stderr - 19%|█▉ | 4242/22434 [3:14:06<55:07:20, 10.91s/it] +2025-02-05 13:21:46 - ERROR - stderr - +2025-02-05 13:21:46 - ERROR - stderr - +2025-02-05 13:21:46 - INFO - stdout - {'loss': 1.0441, 'grad_norm': 1.1574805974960327, 'learning_rate': 1.8702292788546634e-05, 'epoch': 0.57} +2025-02-05 13:21:46 - ERROR - stderr - 19%|█▉ | 4242/22434 [3:14:06<55:07:20, 10.91s/it] +2025-02-05 13:21:49 - ERROR - stderr - 19%|█▉ | 4243/22434 [3:14:09<42:17:04, 8.37s/it] +2025-02-05 13:21:49 - ERROR - stderr - +2025-02-05 13:21:49 - ERROR - stderr - +2025-02-05 13:21:49 - INFO - stdout - {'loss': 0.8493, 'grad_norm': 1.1586596965789795, 'learning_rate': 1.8701581440504945e-05, 'epoch': 0.57} +2025-02-05 13:21:49 - ERROR - stderr - 19%|█▉ | 4243/22434 [3:14:09<42:17:04, 8.37s/it] +2025-02-05 13:21:51 - ERROR - stderr - 19%|█▉ | 4244/22434 [3:14:11<33:18:49, 6.59s/it] +2025-02-05 13:21:51 - ERROR - stderr - +2025-02-05 13:21:51 - ERROR - stderr - +2025-02-05 13:21:51 - INFO - stdout - {'loss': 0.9277, 'grad_norm': 1.103170394897461, 'learning_rate': 1.8700869911087115e-05, 'epoch': 0.57} +2025-02-05 13:21:51 - ERROR - stderr - 19%|█▉ | 4244/22434 [3:14:11<33:18:49, 6.59s/it] +2025-02-05 13:21:54 - ERROR - stderr - 19%|█▉ | 4245/22434 [3:14:14<27:04:07, 5.36s/it] +2025-02-05 13:21:54 - ERROR - stderr - +2025-02-05 13:21:54 - ERROR - stderr - +2025-02-05 13:21:54 - INFO - 
stdout - {'loss': 0.9952, 'grad_norm': 1.0954698324203491, 'learning_rate': 1.870015820030798e-05, 'epoch': 0.57} +2025-02-05 13:21:54 - ERROR - stderr - 19%|█▉ | 4245/22434 [3:14:14<27:04:07, 5.36s/it] +2025-02-05 13:22:07 - ERROR - stderr - 19%|█▉ | 4246/22434 [3:14:27<39:28:06, 7.81s/it] +2025-02-05 13:22:07 - ERROR - stderr - +2025-02-05 13:22:07 - ERROR - stderr - +2025-02-05 13:22:07 - INFO - stdout - {'loss': 1.025, 'grad_norm': 1.126454472541809, 'learning_rate': 1.8699446308182372e-05, 'epoch': 0.57} +2025-02-05 13:22:07 - ERROR - stderr - 19%|█▉ | 4246/22434 [3:14:27<39:28:06, 7.81s/it] +2025-02-05 13:22:21 - ERROR - stderr - 19%|█▉ | 4247/22434 [3:14:41<48:49:33, 9.66s/it] +2025-02-05 13:22:21 - ERROR - stderr - +2025-02-05 13:22:21 - ERROR - stderr - +2025-02-05 13:22:21 - INFO - stdout - {'loss': 0.9862, 'grad_norm': 1.1199926137924194, 'learning_rate': 1.869873423472513e-05, 'epoch': 0.57} +2025-02-05 13:22:21 - ERROR - stderr - 19%|█▉ | 4247/22434 [3:14:41<48:49:33, 9.66s/it] +2025-02-05 13:22:24 - ERROR - stderr - 19%|█▉ | 4248/22434 [3:14:44<37:53:05, 7.50s/it] +2025-02-05 13:22:24 - ERROR - stderr - +2025-02-05 13:22:24 - ERROR - stderr - +2025-02-05 13:22:24 - INFO - stdout - {'loss': 0.9111, 'grad_norm': 1.0378814935684204, 'learning_rate': 1.8698021979951096e-05, 'epoch': 0.57} +2025-02-05 13:22:24 - ERROR - stderr - 19%|█▉ | 4248/22434 [3:14:44<37:53:05, 7.50s/it] +2025-02-05 13:22:35 - ERROR - stderr - 19%|█▉ | 4249/22434 [3:14:55<43:22:01, 8.59s/it] +2025-02-05 13:22:35 - ERROR - stderr - +2025-02-05 13:22:35 - ERROR - stderr - +2025-02-05 13:22:35 - INFO - stdout - {'loss': 1.0847, 'grad_norm': 1.0390182733535767, 'learning_rate': 1.8697309543875115e-05, 'epoch': 0.57} +2025-02-05 13:22:35 - ERROR - stderr - 19%|█▉ | 4249/22434 [3:14:55<43:22:01, 8.59s/it] +2025-02-05 13:22:44 - ERROR - stderr - 19%|█▉ | 4250/22434 [3:15:04<44:06:04, 8.73s/it] +2025-02-05 13:22:44 - ERROR - stderr - +2025-02-05 13:22:44 - ERROR - stderr - +2025-02-05 
13:22:44 - INFO - stdout - {'loss': 1.0063, 'grad_norm': 1.2294104099273682, 'learning_rate': 1.8696596926512043e-05, 'epoch': 0.57} +2025-02-05 13:22:44 - ERROR - stderr - 19%|█▉ | 4250/22434 [3:15:04<44:06:04, 8.73s/it] +2025-02-05 13:22:52 - ERROR - stderr - 19%|█▉ | 4251/22434 [3:15:12<42:53:11, 8.49s/it] +2025-02-05 13:22:52 - ERROR - stderr - +2025-02-05 13:22:52 - ERROR - stderr - +2025-02-05 13:22:52 - INFO - stdout - {'loss': 1.0005, 'grad_norm': 1.206725001335144, 'learning_rate': 1.8695884127876728e-05, 'epoch': 0.57} +2025-02-05 13:22:52 - ERROR - stderr - 19%|█▉ | 4251/22434 [3:15:12<42:53:11, 8.49s/it] +2025-02-05 13:22:56 - ERROR - stderr - 19%|█▉ | 4252/22434 [3:15:16<37:01:34, 7.33s/it] +2025-02-05 13:22:57 - ERROR - stderr - +2025-02-05 13:22:57 - ERROR - stderr - +2025-02-05 13:22:57 - INFO - stdout - {'loss': 0.9163, 'grad_norm': 1.1476114988327026, 'learning_rate': 1.869517114798403e-05, 'epoch': 0.57} +2025-02-05 13:22:57 - ERROR - stderr - 19%|█▉ | 4252/22434 [3:15:16<37:01:34, 7.33s/it] +2025-02-05 13:22:59 - ERROR - stderr - 19%|█▉ | 4253/22434 [3:15:19<29:41:43, 5.88s/it] +2025-02-05 13:22:59 - ERROR - stderr - +2025-02-05 13:22:59 - ERROR - stderr - +2025-02-05 13:22:59 - INFO - stdout - {'loss': 0.9948, 'grad_norm': 1.168081521987915, 'learning_rate': 1.8694457986848808e-05, 'epoch': 0.57} +2025-02-05 13:22:59 - ERROR - stderr - 19%|█▉ | 4253/22434 [3:15:19<29:41:43, 5.88s/it] +2025-02-05 13:23:02 - ERROR - stderr - 19%|█▉ | 4254/22434 [3:15:21<24:39:12, 4.88s/it] +2025-02-05 13:23:02 - ERROR - stderr - +2025-02-05 13:23:02 - ERROR - stderr - +2025-02-05 13:23:02 - INFO - stdout - {'loss': 1.0201, 'grad_norm': 1.1543513536453247, 'learning_rate': 1.869374464448593e-05, 'epoch': 0.57} +2025-02-05 13:23:02 - ERROR - stderr - 19%|█▉ | 4254/22434 [3:15:21<24:39:12, 4.88s/it] +2025-02-05 13:23:04 - ERROR - stderr - 19%|█▉ | 4255/22434 [3:15:24<21:00:36, 4.16s/it] +2025-02-05 13:23:04 - ERROR - stderr - +2025-02-05 13:23:04 - ERROR - stderr - 
+2025-02-05 13:23:04 - INFO - stdout - {'loss': 0.8849, 'grad_norm': 1.1034555435180664, 'learning_rate': 1.8693031120910264e-05, 'epoch': 0.57} +2025-02-05 13:23:04 - ERROR - stderr - 19%|█▉ | 4255/22434 [3:15:24<21:00:36, 4.16s/it] +2025-02-05 13:23:07 - ERROR - stderr - 19%|█▉ | 4256/22434 [3:15:26<18:34:09, 3.68s/it] +2025-02-05 13:23:07 - ERROR - stderr - +2025-02-05 13:23:07 - ERROR - stderr - +2025-02-05 13:23:07 - INFO - stdout - {'loss': 0.9204, 'grad_norm': 1.1699445247650146, 'learning_rate': 1.8692317416136686e-05, 'epoch': 0.57} +2025-02-05 13:23:07 - ERROR - stderr - 19%|█▉ | 4256/22434 [3:15:26<18:34:09, 3.68s/it] +2025-02-05 13:23:09 - ERROR - stderr - 19%|█▉ | 4257/22434 [3:15:29<16:40:38, 3.30s/it] +2025-02-05 13:23:09 - ERROR - stderr - +2025-02-05 13:23:09 - ERROR - stderr - +2025-02-05 13:23:09 - INFO - stdout - {'loss': 1.0621, 'grad_norm': 1.2645628452301025, 'learning_rate': 1.8691603530180064e-05, 'epoch': 0.57} +2025-02-05 13:23:09 - ERROR - stderr - 19%|█▉ | 4257/22434 [3:15:29<16:40:38, 3.30s/it] +2025-02-05 13:23:11 - ERROR - stderr - 19%|█▉ | 4258/22434 [3:15:31<15:19:40, 3.04s/it] +2025-02-05 13:23:11 - ERROR - stderr - +2025-02-05 13:23:11 - ERROR - stderr - +2025-02-05 13:23:11 - INFO - stdout - {'loss': 1.0086, 'grad_norm': 1.1419788599014282, 'learning_rate': 1.8690889463055285e-05, 'epoch': 0.57} +2025-02-05 13:23:11 - ERROR - stderr - 19%|█▉ | 4258/22434 [3:15:31<15:19:40, 3.04s/it] +2025-02-05 13:23:14 - ERROR - stderr - 19%|█▉ | 4259/22434 [3:15:34<14:27:02, 2.86s/it] +2025-02-05 13:23:14 - ERROR - stderr - +2025-02-05 13:23:14 - ERROR - stderr - +2025-02-05 13:23:14 - INFO - stdout - {'loss': 1.0043, 'grad_norm': 1.179091453552246, 'learning_rate': 1.8690175214777233e-05, 'epoch': 0.57} +2025-02-05 13:23:14 - ERROR - stderr - 19%|█▉ | 4259/22434 [3:15:34<14:27:02, 2.86s/it] +2025-02-05 13:23:16 - ERROR - stderr - 19%|█▉ | 4260/22434 [3:15:36<13:50:49, 2.74s/it] +2025-02-05 13:23:16 - ERROR - stderr - +2025-02-05 13:23:16 - 
ERROR - stderr - +2025-02-05 13:23:16 - INFO - stdout - {'loss': 1.0159, 'grad_norm': 1.084663987159729, 'learning_rate': 1.8689460785360792e-05, 'epoch': 0.57} +2025-02-05 13:23:16 - ERROR - stderr - 19%|█▉ | 4260/22434 [3:15:36<13:50:49, 2.74s/it] +2025-02-05 13:23:19 - ERROR - stderr - 19%|█▉ | 4261/22434 [3:15:39<13:24:42, 2.66s/it] +2025-02-05 13:23:19 - ERROR - stderr - +2025-02-05 13:23:19 - ERROR - stderr - +2025-02-05 13:23:19 - INFO - stdout - {'loss': 1.0621, 'grad_norm': 1.1908880472183228, 'learning_rate': 1.8688746174820857e-05, 'epoch': 0.57} +2025-02-05 13:23:19 - ERROR - stderr - 19%|█▉ | 4261/22434 [3:15:39<13:24:42, 2.66s/it] +2025-02-05 13:23:22 - ERROR - stderr - 19%|█▉ | 4262/22434 [3:15:41<13:31:59, 2.68s/it] +2025-02-05 13:23:22 - ERROR - stderr - +2025-02-05 13:23:22 - ERROR - stderr - +2025-02-05 13:23:22 - INFO - stdout - {'loss': 0.9558, 'grad_norm': 1.1702381372451782, 'learning_rate': 1.868803138317232e-05, 'epoch': 0.57} +2025-02-05 13:23:22 - ERROR - stderr - 19%|█▉ | 4262/22434 [3:15:41<13:31:59, 2.68s/it] +2025-02-05 13:23:24 - ERROR - stderr - 19%|█▉ | 4263/22434 [3:15:44<13:24:52, 2.66s/it] +2025-02-05 13:23:24 - ERROR - stderr - +2025-02-05 13:23:24 - ERROR - stderr - +2025-02-05 13:23:24 - INFO - stdout - {'loss': 0.9667, 'grad_norm': 1.0802279710769653, 'learning_rate': 1.8687316410430086e-05, 'epoch': 0.57} +2025-02-05 13:23:24 - ERROR - stderr - 19%|█▉ | 4263/22434 [3:15:44<13:24:52, 2.66s/it] +2025-02-05 13:23:38 - ERROR - stderr - 19%|█▉ | 4264/22434 [3:15:58<30:35:16, 6.06s/it] +2025-02-05 13:23:38 - ERROR - stderr - +2025-02-05 13:23:38 - ERROR - stderr - +2025-02-05 13:23:38 - INFO - stdout - {'loss': 0.8215, 'grad_norm': 1.0563397407531738, 'learning_rate': 1.8686601256609053e-05, 'epoch': 0.57} +2025-02-05 13:23:38 - ERROR - stderr - 19%|█▉ | 4264/22434 [3:15:58<30:35:16, 6.06s/it] +2025-02-05 13:23:41 - ERROR - stderr - 19%|█▉ | 4265/22434 [3:16:00<25:06:04, 4.97s/it] +2025-02-05 13:23:41 - ERROR - stderr - 
+2025-02-05 13:23:41 - ERROR - stderr - +2025-02-05 13:23:41 - INFO - stdout - {'loss': 1.0115, 'grad_norm': 1.0943865776062012, 'learning_rate': 1.868588592172413e-05, 'epoch': 0.57} +2025-02-05 13:23:41 - ERROR - stderr - 19%|█▉ | 4265/22434 [3:16:00<25:06:04, 4.97s/it] +2025-02-05 13:23:43 - ERROR - stderr - 19%|█▉ | 4266/22434 [3:16:03<21:24:41, 4.24s/it] +2025-02-05 13:23:43 - ERROR - stderr - +2025-02-05 13:23:43 - ERROR - stderr - +2025-02-05 13:23:43 - INFO - stdout - {'loss': 0.9674, 'grad_norm': 1.1347267627716064, 'learning_rate': 1.8685170405790222e-05, 'epoch': 0.57} +2025-02-05 13:23:43 - ERROR - stderr - 19%|█▉ | 4266/22434 [3:16:03<21:24:41, 4.24s/it] +2025-02-05 13:23:46 - ERROR - stderr - 19%|█▉ | 4267/22434 [3:16:05<18:45:55, 3.72s/it] +2025-02-05 13:23:46 - ERROR - stderr - +2025-02-05 13:23:46 - ERROR - stderr - +2025-02-05 13:23:46 - INFO - stdout - {'loss': 1.017, 'grad_norm': 1.0891984701156616, 'learning_rate': 1.868445470882225e-05, 'epoch': 0.57} +2025-02-05 13:23:46 - ERROR - stderr - 19%|█▉ | 4267/22434 [3:16:05<18:45:55, 3.72s/it] +2025-02-05 13:24:03 - ERROR - stderr - 19%|█▉ | 4268/22434 [3:16:23<39:45:37, 7.88s/it] +2025-02-05 13:24:03 - ERROR - stderr - +2025-02-05 13:24:03 - ERROR - stderr - +2025-02-05 13:24:03 - INFO - stdout - {'loss': 1.0345, 'grad_norm': 1.0536088943481445, 'learning_rate': 1.8683738830835132e-05, 'epoch': 0.57} +2025-02-05 13:24:03 - ERROR - stderr - 19%|█▉ | 4268/22434 [3:16:23<39:45:37, 7.88s/it] +2025-02-05 13:24:33 - ERROR - stderr - 19%|█▉ | 4269/22434 [3:16:53<73:09:12, 14.50s/it] +2025-02-05 13:24:33 - ERROR - stderr - +2025-02-05 13:24:33 - ERROR - stderr - +2025-02-05 13:24:33 - INFO - stdout - {'loss': 1.0318, 'grad_norm': 1.1744624376296997, 'learning_rate': 1.8683022771843785e-05, 'epoch': 0.57} +2025-02-05 13:24:33 - ERROR - stderr - 19%|█▉ | 4269/22434 [3:16:53<73:09:12, 14.50s/it] +2025-02-05 13:25:12 - ERROR - stderr - 19%|█▉ | 4270/22434 [3:17:32<110:10:01, 21.83s/it] +2025-02-05 13:25:12 - 
ERROR - stderr - +2025-02-05 13:25:12 - ERROR - stderr - +2025-02-05 13:25:12 - INFO - stdout - {'loss': 0.857, 'grad_norm': 0.9553273916244507, 'learning_rate': 1.8682306531863137e-05, 'epoch': 0.57} +2025-02-05 13:25:12 - ERROR - stderr - 19%|█▉ | 4270/22434 [3:17:32<110:10:01, 21.83s/it] +2025-02-05 13:25:15 - ERROR - stderr - 19%|█▉ | 4271/22434 [3:17:34<81:04:52, 16.07s/it] +2025-02-05 13:25:15 - ERROR - stderr - +2025-02-05 13:25:15 - ERROR - stderr - +2025-02-05 13:25:15 - INFO - stdout - {'loss': 1.0382, 'grad_norm': 1.1216747760772705, 'learning_rate': 1.868159011090812e-05, 'epoch': 0.57} +2025-02-05 13:25:15 - ERROR - stderr - 19%|█▉ | 4271/22434 [3:17:35<81:04:52, 16.07s/it] +2025-02-05 13:25:53 - ERROR - stderr - 19%|█▉ | 4272/22434 [3:18:13<115:19:39, 22.86s/it] +2025-02-05 13:25:53 - ERROR - stderr - +2025-02-05 13:25:53 - ERROR - stderr - +2025-02-05 13:25:53 - INFO - stdout - {'loss': 0.9544, 'grad_norm': 1.0053479671478271, 'learning_rate': 1.868087350899366e-05, 'epoch': 0.57} +2025-02-05 13:25:53 - ERROR - stderr - 19%|█▉ | 4272/22434 [3:18:13<115:19:39, 22.86s/it] +2025-02-05 13:26:07 - ERROR - stderr - 19%|█▉ | 4273/22434 [3:18:27<101:30:30, 20.12s/it] +2025-02-05 13:26:07 - ERROR - stderr - +2025-02-05 13:26:07 - ERROR - stderr - +2025-02-05 13:26:07 - INFO - stdout - {'loss': 1.0534, 'grad_norm': 1.147739291191101, 'learning_rate': 1.8680156726134702e-05, 'epoch': 0.57} +2025-02-05 13:26:07 - ERROR - stderr - 19%|█▉ | 4273/22434 [3:18:27<101:30:30, 20.12s/it] +2025-02-05 13:26:51 - ERROR - stderr - 19%|█▉ | 4274/22434 [3:19:10<136:45:08, 27.11s/it] +2025-02-05 13:26:51 - ERROR - stderr - +2025-02-05 13:26:51 - ERROR - stderr - +2025-02-05 13:26:51 - INFO - stdout - {'loss': 0.9791, 'grad_norm': 1.0907477140426636, 'learning_rate': 1.8679439762346186e-05, 'epoch': 0.57} +2025-02-05 13:26:51 - ERROR - stderr - 19%|█▉ | 4274/22434 [3:19:10<136:45:08, 27.11s/it] +2025-02-05 13:27:05 - ERROR - stderr - 19%|█▉ | 4275/22434 [3:19:25<117:20:41, 
23.26s/it] +2025-02-05 13:27:05 - ERROR - stderr - +2025-02-05 13:27:05 - ERROR - stderr - +2025-02-05 13:27:05 - INFO - stdout - {'loss': 1.0878, 'grad_norm': 1.200300931930542, 'learning_rate': 1.8678722617643047e-05, 'epoch': 0.57} +2025-02-05 13:27:05 - ERROR - stderr - 19%|█▉ | 4275/22434 [3:19:25<117:20:41, 23.26s/it] +2025-02-05 13:27:49 - ERROR - stderr - 19%|█▉ | 4276/22434 [3:20:09<149:05:27, 29.56s/it] +2025-02-05 13:27:49 - ERROR - stderr - +2025-02-05 13:27:49 - ERROR - stderr - +2025-02-05 13:27:49 - INFO - stdout - {'loss': 1.0301, 'grad_norm': 1.168286919593811, 'learning_rate': 1.8678005292040243e-05, 'epoch': 0.57} +2025-02-05 13:27:49 - ERROR - stderr - 19%|█▉ | 4276/22434 [3:20:09<149:05:27, 29.56s/it] +2025-02-05 13:28:34 - ERROR - stderr - 19%|█▉ | 4277/22434 [3:20:53<171:47:09, 34.06s/it] +2025-02-05 13:28:34 - ERROR - stderr - +2025-02-05 13:28:34 - ERROR - stderr - +2025-02-05 13:28:34 - INFO - stdout - {'loss': 0.9846, 'grad_norm': 1.0300790071487427, 'learning_rate': 1.8677287785552724e-05, 'epoch': 0.57} +2025-02-05 13:28:34 - ERROR - stderr - 19%|█▉ | 4277/22434 [3:20:53<171:47:09, 34.06s/it] +2025-02-05 13:29:14 - ERROR - stderr - 19%|█▉ | 4278/22434 [3:21:34<181:59:24, 36.09s/it] +2025-02-05 13:29:15 - ERROR - stderr - +2025-02-05 13:29:15 - ERROR - stderr - +2025-02-05 13:29:15 - INFO - stdout - {'loss': 0.9177, 'grad_norm': 1.0243234634399414, 'learning_rate': 1.8676570098195443e-05, 'epoch': 0.57} +2025-02-05 13:29:15 - ERROR - stderr - 19%|█▉ | 4278/22434 [3:21:34<181:59:24, 36.09s/it] +2025-02-05 13:29:17 - ERROR - stderr - 19%|█▉ | 4279/22434 [3:21:37<131:12:55, 26.02s/it] +2025-02-05 13:29:17 - ERROR - stderr - +2025-02-05 13:29:17 - ERROR - stderr - +2025-02-05 13:29:17 - INFO - stdout - {'loss': 0.8831, 'grad_norm': 1.092434048652649, 'learning_rate': 1.867585222998336e-05, 'epoch': 0.57} +2025-02-05 13:29:17 - ERROR - stderr - 19%|█▉ | 4279/22434 [3:21:37<131:12:55, 26.02s/it] +2025-02-05 13:30:03 - ERROR - stderr - 19%|█▉ | 
4280/22434 [3:22:23<162:05:18, 32.14s/it] +2025-02-05 13:30:03 - ERROR - stderr - +2025-02-05 13:30:03 - ERROR - stderr - +2025-02-05 13:30:03 - INFO - stdout - {'loss': 1.0003, 'grad_norm': 1.1066855192184448, 'learning_rate': 1.867513418093144e-05, 'epoch': 0.57} +2025-02-05 13:30:03 - ERROR - stderr - 19%|█▉ | 4280/22434 [3:22:23<162:05:18, 32.14s/it] +2025-02-05 13:30:40 - ERROR - stderr - 19%|█▉ | 4281/22434 [3:23:00<168:50:20, 33.48s/it] +2025-02-05 13:30:40 - ERROR - stderr - +2025-02-05 13:30:40 - ERROR - stderr - +2025-02-05 13:30:40 - INFO - stdout - {'loss': 0.892, 'grad_norm': 0.9998567700386047, 'learning_rate': 1.8674415951054647e-05, 'epoch': 0.57} +2025-02-05 13:30:40 - ERROR - stderr - 19%|█▉ | 4281/22434 [3:23:00<168:50:20, 33.48s/it] +2025-02-05 13:30:58 - ERROR - stderr - 19%|█▉ | 4282/22434 [3:23:18<145:28:35, 28.85s/it] +2025-02-05 13:30:58 - ERROR - stderr - +2025-02-05 13:30:58 - ERROR - stderr - +2025-02-05 13:30:58 - INFO - stdout - {'loss': 0.8627, 'grad_norm': 0.9793708920478821, 'learning_rate': 1.8673697540367957e-05, 'epoch': 0.57} +2025-02-05 13:30:58 - ERROR - stderr - 19%|█▉ | 4282/22434 [3:23:18<145:28:35, 28.85s/it] +2025-02-05 13:31:01 - ERROR - stderr - 19%|█▉ | 4283/22434 [3:23:20<105:34:29, 20.94s/it] +2025-02-05 13:31:01 - ERROR - stderr - +2025-02-05 13:31:01 - ERROR - stderr - +2025-02-05 13:31:01 - INFO - stdout - {'loss': 1.0422, 'grad_norm': 1.2207691669464111, 'learning_rate': 1.867297894888634e-05, 'epoch': 0.57} +2025-02-05 13:31:01 - ERROR - stderr - 19%|█▉ | 4283/22434 [3:23:20<105:34:29, 20.94s/it] +2025-02-05 13:31:50 - ERROR - stderr - 19%|█▉ | 4284/22434 [3:24:09<148:13:11, 29.40s/it] +2025-02-05 13:31:50 - ERROR - stderr - +2025-02-05 13:31:50 - ERROR - stderr - +2025-02-05 13:31:50 - INFO - stdout - {'loss': 0.9389, 'grad_norm': 0.9690563082695007, 'learning_rate': 1.8672260176624775e-05, 'epoch': 0.57} +2025-02-05 13:31:50 - ERROR - stderr - 19%|█▉ | 4284/22434 [3:24:10<148:13:11, 29.40s/it] +2025-02-05 
13:32:38 - ERROR - stderr - 19%|█▉ | 4285/22434 [3:24:57<176:22:20, 34.98s/it] +2025-02-05 13:32:38 - ERROR - stderr - +2025-02-05 13:32:38 - ERROR - stderr - +2025-02-05 13:32:38 - INFO - stdout - {'loss': 0.9732, 'grad_norm': 1.1791579723358154, 'learning_rate': 1.8671541223598248e-05, 'epoch': 0.57} +2025-02-05 13:32:38 - ERROR - stderr - 19%|█▉ | 4285/22434 [3:24:58<176:22:20, 34.98s/it] +2025-02-05 13:32:40 - ERROR - stderr - 19%|█▉ | 4286/22434 [3:25:00<127:07:55, 25.22s/it] +2025-02-05 13:32:40 - ERROR - stderr - +2025-02-05 13:32:40 - ERROR - stderr - +2025-02-05 13:32:40 - INFO - stdout - {'loss': 1.0981, 'grad_norm': 1.2792478799819946, 'learning_rate': 1.867082208982174e-05, 'epoch': 0.57} +2025-02-05 13:32:40 - ERROR - stderr - 19%|█▉ | 4286/22434 [3:25:00<127:07:55, 25.22s/it] +2025-02-05 13:32:58 - ERROR - stderr - 19%|█▉ | 4287/22434 [3:25:18<116:22:29, 23.09s/it] +2025-02-05 13:32:58 - ERROR - stderr - +2025-02-05 13:32:58 - ERROR - stderr - +2025-02-05 13:32:58 - INFO - stdout - {'loss': 1.0178, 'grad_norm': 1.2760359048843384, 'learning_rate': 1.867010277531024e-05, 'epoch': 0.57} +2025-02-05 13:32:58 - ERROR - stderr - 19%|█▉ | 4287/22434 [3:25:18<116:22:29, 23.09s/it] +2025-02-05 13:33:01 - ERROR - stderr - 19%|█▉ | 4288/22434 [3:25:21<85:18:24, 16.92s/it] +2025-02-05 13:33:01 - ERROR - stderr - +2025-02-05 13:33:01 - ERROR - stderr - +2025-02-05 13:33:01 - INFO - stdout - {'loss': 1.0475, 'grad_norm': 1.3104729652404785, 'learning_rate': 1.866938328007875e-05, 'epoch': 0.57} +2025-02-05 13:33:01 - ERROR - stderr - 19%|█▉ | 4288/22434 [3:25:21<85:18:24, 16.92s/it] +2025-02-05 13:33:35 - ERROR - stderr - 19%|█▉ | 4289/22434 [3:25:55<112:04:57, 22.24s/it] +2025-02-05 13:33:35 - ERROR - stderr - +2025-02-05 13:33:35 - ERROR - stderr - +2025-02-05 13:33:35 - INFO - stdout - {'loss': 0.9351, 'grad_norm': 1.0913432836532593, 'learning_rate': 1.8668663604142257e-05, 'epoch': 0.57} +2025-02-05 13:33:35 - ERROR - stderr - 19%|█▉ | 4289/22434 
[3:25:55<112:04:57, 22.24s/it] +2025-02-05 13:33:38 - ERROR - stderr - 19%|█▉ | 4290/22434 [3:25:58<82:07:45, 16.30s/it] +2025-02-05 13:33:38 - ERROR - stderr - +2025-02-05 13:33:38 - ERROR - stderr - +2025-02-05 13:33:38 - INFO - stdout - {'loss': 1.008, 'grad_norm': 1.0429764986038208, 'learning_rate': 1.866794374751577e-05, 'epoch': 0.57} +2025-02-05 13:33:38 - ERROR - stderr - 19%|█▉ | 4290/22434 [3:25:58<82:07:45, 16.30s/it] +2025-02-05 13:33:55 - ERROR - stderr - 19%|█▉ | 4291/22434 [3:26:15<83:24:17, 16.55s/it] +2025-02-05 13:33:55 - ERROR - stderr - +2025-02-05 13:33:55 - ERROR - stderr - +2025-02-05 13:33:55 - INFO - stdout - {'loss': 1.0325, 'grad_norm': 1.0540709495544434, 'learning_rate': 1.8667223710214286e-05, 'epoch': 0.57} +2025-02-05 13:33:55 - ERROR - stderr - 19%|█▉ | 4291/22434 [3:26:15<83:24:17, 16.55s/it] +2025-02-05 13:34:49 - ERROR - stderr - 19%|█▉ | 4292/22434 [3:27:09<139:46:54, 27.74s/it] +2025-02-05 13:34:49 - ERROR - stderr - +2025-02-05 13:34:49 - ERROR - stderr - +2025-02-05 13:34:49 - INFO - stdout - {'loss': 0.9381, 'grad_norm': 1.1324586868286133, 'learning_rate': 1.8666503492252818e-05, 'epoch': 0.57} +2025-02-05 13:34:49 - ERROR - stderr - 19%|█▉ | 4292/22434 [3:27:09<139:46:54, 27.74s/it] +2025-02-05 13:34:51 - ERROR - stderr - 19%|█▉ | 4293/22434 [3:27:11<101:34:17, 20.16s/it] +2025-02-05 13:34:51 - ERROR - stderr - +2025-02-05 13:34:51 - ERROR - stderr - +2025-02-05 13:34:51 - INFO - stdout - {'loss': 0.9818, 'grad_norm': 1.1117823123931885, 'learning_rate': 1.866578309364638e-05, 'epoch': 0.57} +2025-02-05 13:34:51 - ERROR - stderr - 19%|█▉ | 4293/22434 [3:27:11<101:34:17, 20.16s/it] +2025-02-05 13:34:54 - ERROR - stderr - 19%|█▉ | 4294/22434 [3:27:14<74:52:10, 14.86s/it] +2025-02-05 13:34:54 - ERROR - stderr - +2025-02-05 13:34:54 - ERROR - stderr - +2025-02-05 13:34:54 - INFO - stdout - {'loss': 1.0504, 'grad_norm': 1.015079140663147, 'learning_rate': 1.8665062514409985e-05, 'epoch': 0.57} +2025-02-05 13:34:54 - ERROR - 
stderr - 19%|█▉ | 4294/22434 [3:27:14<74:52:10, 14.86s/it] +2025-02-05 13:34:56 - ERROR - stderr - 19%|█▉ | 4295/22434 [3:27:16<56:08:05, 11.14s/it] +2025-02-05 13:34:56 - ERROR - stderr - +2025-02-05 13:34:56 - ERROR - stderr - +2025-02-05 13:34:56 - INFO - stdout - {'loss': 0.9238, 'grad_norm': 1.011838674545288, 'learning_rate': 1.866434175455865e-05, 'epoch': 0.57} +2025-02-05 13:34:56 - ERROR - stderr - 19%|█▉ | 4295/22434 [3:27:16<56:08:05, 11.14s/it] +2025-02-05 13:35:17 - ERROR - stderr - 19%|█▉ | 4296/22434 [3:27:37<70:42:41, 14.03s/it] +2025-02-05 13:35:17 - ERROR - stderr - +2025-02-05 13:35:17 - ERROR - stderr - +2025-02-05 13:35:17 - INFO - stdout - {'loss': 1.0773, 'grad_norm': 1.219601035118103, 'learning_rate': 1.8663620814107404e-05, 'epoch': 0.57} +2025-02-05 13:35:17 - ERROR - stderr - 19%|█▉ | 4296/22434 [3:27:37<70:42:41, 14.03s/it] +2025-02-05 13:36:11 - ERROR - stderr - 19%|█▉ | 4297/22434 [3:28:30<130:16:35, 25.86s/it] +2025-02-05 13:36:11 - ERROR - stderr - +2025-02-05 13:36:11 - ERROR - stderr - +2025-02-05 13:36:11 - INFO - stdout - {'loss': 0.9675, 'grad_norm': 1.047299861907959, 'learning_rate': 1.8662899693071276e-05, 'epoch': 0.57} +2025-02-05 13:36:11 - ERROR - stderr - 19%|█▉ | 4297/22434 [3:28:30<130:16:35, 25.86s/it] +2025-02-05 13:37:00 - ERROR - stderr - 19%|█▉ | 4298/22434 [3:29:20<166:32:39, 33.06s/it] +2025-02-05 13:37:00 - ERROR - stderr - +2025-02-05 13:37:00 - ERROR - stderr - +2025-02-05 13:37:00 - INFO - stdout - {'loss': 0.8747, 'grad_norm': 1.1643438339233398, 'learning_rate': 1.8662178391465288e-05, 'epoch': 0.57} +2025-02-05 13:37:00 - ERROR - stderr - 19%|█▉ | 4298/22434 [3:29:20<166:32:39, 33.06s/it] +2025-02-05 13:37:49 - ERROR - stderr - 19%|█▉ | 4299/22434 [3:30:09<189:55:08, 37.70s/it] +2025-02-05 13:37:49 - ERROR - stderr - +2025-02-05 13:37:49 - ERROR - stderr - +2025-02-05 13:37:49 - INFO - stdout - {'loss': 1.0029, 'grad_norm': 1.1427836418151855, 'learning_rate': 1.8661456909304482e-05, 'epoch': 0.57} 
+2025-02-05 13:37:49 - ERROR - stderr - 19%|█▉ | 4299/22434 [3:30:09<189:55:08, 37.70s/it] +2025-02-05 13:37:51 - ERROR - stderr - 19%|█▉ | 4300/22434 [3:30:11<136:44:31, 27.15s/it] +2025-02-05 13:37:51 - ERROR - stderr - +2025-02-05 13:37:51 - ERROR - stderr - +2025-02-05 13:37:51 - INFO - stdout - {'loss': 1.0718, 'grad_norm': 1.093421220779419, 'learning_rate': 1.8660735246603896e-05, 'epoch': 0.58} +2025-02-05 13:37:51 - ERROR - stderr - 19%|█▉ | 4300/22434 [3:30:11<136:44:31, 27.15s/it] +2025-02-05 13:37:54 - ERROR - stderr - 19%|█▉ | 4301/22434 [3:30:14<99:27:45, 19.75s/it] +2025-02-05 13:37:54 - ERROR - stderr - +2025-02-05 13:37:54 - ERROR - stderr - +2025-02-05 13:37:54 - INFO - stdout - {'loss': 0.9216, 'grad_norm': 1.1446141004562378, 'learning_rate': 1.866001340337857e-05, 'epoch': 0.58} +2025-02-05 13:37:54 - ERROR - stderr - 19%|█▉ | 4301/22434 [3:30:14<99:27:45, 19.75s/it] +2025-02-05 13:38:40 - ERROR - stderr - 19%|█▉ | 4302/22434 [3:31:00<139:26:19, 27.68s/it] +2025-02-05 13:38:40 - ERROR - stderr - +2025-02-05 13:38:40 - ERROR - stderr - +2025-02-05 13:38:40 - INFO - stdout - {'loss': 0.9749, 'grad_norm': 1.064455270767212, 'learning_rate': 1.8659291379643553e-05, 'epoch': 0.58} +2025-02-05 13:38:40 - ERROR - stderr - 19%|█▉ | 4302/22434 [3:31:00<139:26:19, 27.68s/it] +2025-02-05 13:39:09 - ERROR - stderr - 19%|█▉ | 4303/22434 [3:31:28<140:40:14, 27.93s/it] +2025-02-05 13:39:09 - ERROR - stderr - +2025-02-05 13:39:09 - ERROR - stderr - +2025-02-05 13:39:09 - INFO - stdout - {'loss': 0.9315, 'grad_norm': 1.060905933380127, 'learning_rate': 1.8658569175413893e-05, 'epoch': 0.58} +2025-02-05 13:39:09 - ERROR - stderr - 19%|█▉ | 4303/22434 [3:31:28<140:40:14, 27.93s/it] +2025-02-05 13:39:59 - ERROR - stderr - 19%|█▉ | 4304/22434 [3:32:19<175:15:53, 34.80s/it] +2025-02-05 13:40:00 - ERROR - stderr - +2025-02-05 13:40:00 - ERROR - stderr - +2025-02-05 13:40:00 - INFO - stdout - {'loss': 0.9146, 'grad_norm': 1.0140661001205444, 'learning_rate': 
1.865784679070464e-05, 'epoch': 0.58} +2025-02-05 13:40:00 - ERROR - stderr - 19%|█▉ | 4304/22434 [3:32:19<175:15:53, 34.80s/it] +2025-02-05 13:40:02 - ERROR - stderr - 19%|█▉ | 4305/22434 [3:32:22<126:36:15, 25.14s/it] +2025-02-05 13:40:02 - ERROR - stderr - +2025-02-05 13:40:02 - ERROR - stderr - +2025-02-05 13:40:02 - INFO - stdout - {'loss': 0.9328, 'grad_norm': 1.0355207920074463, 'learning_rate': 1.8657124225530857e-05, 'epoch': 0.58} +2025-02-05 13:40:02 - ERROR - stderr - 19%|█▉ | 4305/22434 [3:32:22<126:36:15, 25.14s/it] +2025-02-05 13:40:05 - ERROR - stderr - 19%|█▉ | 4306/22434 [3:32:24<92:28:32, 18.36s/it] +2025-02-05 13:40:05 - ERROR - stderr - +2025-02-05 13:40:05 - ERROR - stderr - +2025-02-05 13:40:05 - INFO - stdout - {'loss': 0.865, 'grad_norm': 1.1159552335739136, 'learning_rate': 1.8656401479907607e-05, 'epoch': 0.58} +2025-02-05 13:40:05 - ERROR - stderr - 19%|█▉ | 4306/22434 [3:32:24<92:28:32, 18.36s/it] +2025-02-05 13:40:07 - ERROR - stderr - 19%|█▉ | 4307/22434 [3:32:27<68:28:53, 13.60s/it] +2025-02-05 13:40:07 - ERROR - stderr - +2025-02-05 13:40:07 - ERROR - stderr - +2025-02-05 13:40:07 - INFO - stdout - {'loss': 0.9141, 'grad_norm': 1.05385160446167, 'learning_rate': 1.865567855384995e-05, 'epoch': 0.58} +2025-02-05 13:40:07 - ERROR - stderr - 19%|█▉ | 4307/22434 [3:32:27<68:28:53, 13.60s/it] +2025-02-05 13:40:44 - ERROR - stderr - 19%|█▉ | 4308/22434 [3:33:03<103:06:05, 20.48s/it] +2025-02-05 13:40:44 - ERROR - stderr - +2025-02-05 13:40:44 - ERROR - stderr - +2025-02-05 13:40:44 - INFO - stdout - {'loss': 0.9237, 'grad_norm': 1.0680540800094604, 'learning_rate': 1.8654955447372957e-05, 'epoch': 0.58} +2025-02-05 13:40:44 - ERROR - stderr - 19%|█▉ | 4308/22434 [3:33:03<103:06:05, 20.48s/it] +2025-02-05 13:41:18 - ERROR - stderr - 19%|█▉ | 4309/22434 [3:33:38<124:28:15, 24.72s/it] +2025-02-05 13:41:18 - ERROR - stderr - +2025-02-05 13:41:18 - ERROR - stderr - +2025-02-05 13:41:18 - INFO - stdout - {'loss': 0.8756, 'grad_norm': 
1.0946894884109497, 'learning_rate': 1.8654232160491696e-05, 'epoch': 0.58} +2025-02-05 13:41:18 - ERROR - stderr - 19%|█▉ | 4309/22434 [3:33:38<124:28:15, 24.72s/it] +2025-02-05 13:41:27 - ERROR - stderr - 19%|█▉ | 4310/22434 [3:33:47<100:21:42, 19.94s/it] +2025-02-05 13:41:27 - ERROR - stderr - +2025-02-05 13:41:27 - ERROR - stderr - +2025-02-05 13:41:27 - INFO - stdout - {'loss': 0.9475, 'grad_norm': 1.0846202373504639, 'learning_rate': 1.865350869322125e-05, 'epoch': 0.58} +2025-02-05 13:41:27 - ERROR - stderr - 19%|█▉ | 4310/22434 [3:33:47<100:21:42, 19.94s/it] +2025-02-05 13:41:30 - ERROR - stderr - 19%|█▉ | 4311/22434 [3:33:49<74:01:21, 14.70s/it] +2025-02-05 13:41:30 - ERROR - stderr - +2025-02-05 13:41:30 - ERROR - stderr - +2025-02-05 13:41:30 - INFO - stdout - {'loss': 0.9424, 'grad_norm': 1.0334192514419556, 'learning_rate': 1.8652785045576692e-05, 'epoch': 0.58} +2025-02-05 13:41:30 - ERROR - stderr - 19%|█▉ | 4311/22434 [3:33:49<74:01:21, 14.70s/it] +2025-02-05 13:41:32 - ERROR - stderr - 19%|█▉ | 4312/22434 [3:33:52<55:28:46, 11.02s/it] +2025-02-05 13:41:32 - ERROR - stderr - +2025-02-05 13:41:32 - ERROR - stderr - +2025-02-05 13:41:32 - INFO - stdout - {'loss': 0.9894, 'grad_norm': 1.056339144706726, 'learning_rate': 1.8652061217573115e-05, 'epoch': 0.58} +2025-02-05 13:41:32 - ERROR - stderr - 19%|█▉ | 4312/22434 [3:33:52<55:28:46, 11.02s/it] +2025-02-05 13:41:35 - ERROR - stderr - 19%|█▉ | 4313/22434 [3:33:54<42:42:35, 8.48s/it] +2025-02-05 13:41:35 - ERROR - stderr - +2025-02-05 13:41:35 - ERROR - stderr - +2025-02-05 13:41:35 - INFO - stdout - {'loss': 1.0031, 'grad_norm': 1.25147545337677, 'learning_rate': 1.8651337209225598e-05, 'epoch': 0.58} +2025-02-05 13:41:35 - ERROR - stderr - 19%|█▉ | 4313/22434 [3:33:54<42:42:35, 8.48s/it] +2025-02-05 13:42:05 - ERROR - stderr - 19%|█▉ | 4314/22434 [3:34:25<76:04:14, 15.11s/it] +2025-02-05 13:42:05 - ERROR - stderr - +2025-02-05 13:42:05 - ERROR - stderr - +2025-02-05 13:42:05 - INFO - stdout - 
{'loss': 0.9595, 'grad_norm': 1.0496535301208496, 'learning_rate': 1.8650613020549232e-05, 'epoch': 0.58} +2025-02-05 13:42:05 - ERROR - stderr - 19%|█▉ | 4314/22434 [3:34:25<76:04:14, 15.11s/it] +2025-02-05 13:42:08 - ERROR - stderr - 19%|█▉ | 4315/22434 [3:34:27<57:01:24, 11.33s/it] +2025-02-05 13:42:08 - ERROR - stderr - +2025-02-05 13:42:08 - ERROR - stderr - +2025-02-05 13:42:08 - INFO - stdout - {'loss': 1.0372, 'grad_norm': 1.0301775932312012, 'learning_rate': 1.8649888651559122e-05, 'epoch': 0.58} +2025-02-05 13:42:08 - ERROR - stderr - 19%|█▉ | 4315/22434 [3:34:27<57:01:24, 11.33s/it] +2025-02-05 13:42:10 - ERROR - stderr - 19%|█▉ | 4316/22434 [3:34:30<43:37:21, 8.67s/it] +2025-02-05 13:42:10 - ERROR - stderr - +2025-02-05 13:42:10 - ERROR - stderr - +2025-02-05 13:42:10 - INFO - stdout - {'loss': 0.8965, 'grad_norm': 1.0061918497085571, 'learning_rate': 1.8649164102270357e-05, 'epoch': 0.58} +2025-02-05 13:42:10 - ERROR - stderr - 19%|█▉ | 4316/22434 [3:34:30<43:37:21, 8.67s/it] +2025-02-05 13:42:13 - ERROR - stderr - 19%|█▉ | 4317/22434 [3:34:32<34:26:50, 6.84s/it] +2025-02-05 13:42:13 - ERROR - stderr - +2025-02-05 13:42:13 - ERROR - stderr - +2025-02-05 13:42:13 - INFO - stdout - {'loss': 1.039, 'grad_norm': 1.0771774053573608, 'learning_rate': 1.8648439372698043e-05, 'epoch': 0.58} +2025-02-05 13:42:13 - ERROR - stderr - 19%|█▉ | 4317/22434 [3:34:32<34:26:50, 6.84s/it] +2025-02-05 13:42:15 - ERROR - stderr - 19%|█▉ | 4318/22434 [3:34:35<27:52:42, 5.54s/it] +2025-02-05 13:42:15 - ERROR - stderr - +2025-02-05 13:42:15 - ERROR - stderr - +2025-02-05 13:42:15 - INFO - stdout - {'loss': 0.8077, 'grad_norm': 0.9801791906356812, 'learning_rate': 1.8647714462857284e-05, 'epoch': 0.58} +2025-02-05 13:42:15 - ERROR - stderr - 19%|█▉ | 4318/22434 [3:34:35<27:52:42, 5.54s/it] +2025-02-05 13:42:41 - ERROR - stderr - 19%|█▉ | 4319/22434 [3:35:00<57:50:12, 11.49s/it] +2025-02-05 13:42:41 - ERROR - stderr - +2025-02-05 13:42:41 - ERROR - stderr - +2025-02-05 13:42:41 
- INFO - stdout - {'loss': 0.9401, 'grad_norm': 1.144612431526184, 'learning_rate': 1.8646989372763194e-05, 'epoch': 0.58} +2025-02-05 13:42:41 - ERROR - stderr - 19%|█▉ | 4319/22434 [3:35:00<57:50:12, 11.49s/it] +2025-02-05 13:43:02 - ERROR - stderr - 19%|█▉ | 4320/22434 [3:35:21<72:21:27, 14.38s/it] +2025-02-05 13:43:02 - ERROR - stderr - +2025-02-05 13:43:02 - ERROR - stderr - +2025-02-05 13:43:02 - INFO - stdout - {'loss': 1.1049, 'grad_norm': 1.1350862979888916, 'learning_rate': 1.8646264102430884e-05, 'epoch': 0.58} +2025-02-05 13:43:02 - ERROR - stderr - 19%|█▉ | 4320/22434 [3:35:21<72:21:27, 14.38s/it] +2025-02-05 13:43:04 - ERROR - stderr - 19%|█▉ | 4321/22434 [3:35:24<54:18:12, 10.79s/it] +2025-02-05 13:43:04 - ERROR - stderr - +2025-02-05 13:43:04 - ERROR - stderr - +2025-02-05 13:43:04 - INFO - stdout - {'loss': 0.9418, 'grad_norm': 1.1326942443847656, 'learning_rate': 1.864553865187547e-05, 'epoch': 0.58} +2025-02-05 13:43:04 - ERROR - stderr - 19%|█▉ | 4321/22434 [3:35:24<54:18:12, 10.79s/it] +2025-02-05 13:43:07 - ERROR - stderr - 19%|█▉ | 4322/22434 [3:35:26<41:48:18, 8.31s/it] +2025-02-05 13:43:07 - ERROR - stderr - +2025-02-05 13:43:07 - ERROR - stderr - +2025-02-05 13:43:07 - INFO - stdout - {'loss': 0.993, 'grad_norm': 1.2036370038986206, 'learning_rate': 1.864481302111208e-05, 'epoch': 0.58} +2025-02-05 13:43:07 - ERROR - stderr - 19%|█▉ | 4322/22434 [3:35:26<41:48:18, 8.31s/it] +2025-02-05 13:43:09 - ERROR - stderr - 19%|█▉ | 4323/22434 [3:35:29<33:01:04, 6.56s/it] +2025-02-05 13:43:09 - ERROR - stderr - +2025-02-05 13:43:09 - ERROR - stderr - +2025-02-05 13:43:09 - INFO - stdout - {'loss': 0.8166, 'grad_norm': 0.9390064477920532, 'learning_rate': 1.8644087210155834e-05, 'epoch': 0.58} +2025-02-05 13:43:09 - ERROR - stderr - 19%|█▉ | 4323/22434 [3:35:29<33:01:04, 6.56s/it] +2025-02-05 13:43:12 - ERROR - stderr - 19%|█▉ | 4324/22434 [3:35:31<26:56:35, 5.36s/it] +2025-02-05 13:43:12 - ERROR - stderr - +2025-02-05 13:43:12 - ERROR - stderr - 
+2025-02-05 13:43:12 - INFO - stdout - {'loss': 0.9941, 'grad_norm': 1.078291893005371, 'learning_rate': 1.864336121902186e-05, 'epoch': 0.58} +2025-02-05 13:43:12 - ERROR - stderr - 19%|█▉ | 4324/22434 [3:35:31<26:56:35, 5.36s/it] +2025-02-05 13:43:14 - ERROR - stderr - 19%|█▉ | 4325/22434 [3:35:34<22:45:45, 4.53s/it] +2025-02-05 13:43:14 - ERROR - stderr - +2025-02-05 13:43:14 - ERROR - stderr - +2025-02-05 13:43:14 - INFO - stdout - {'loss': 0.9133, 'grad_norm': 1.100733757019043, 'learning_rate': 1.864263504772529e-05, 'epoch': 0.58} +2025-02-05 13:43:14 - ERROR - stderr - 19%|█▉ | 4325/22434 [3:35:34<22:45:45, 4.53s/it] +2025-02-05 13:43:17 - ERROR - stderr - 19%|█▉ | 4326/22434 [3:35:36<19:40:53, 3.91s/it] +2025-02-05 13:43:17 - ERROR - stderr - +2025-02-05 13:43:17 - ERROR - stderr - +2025-02-05 13:43:17 - INFO - stdout - {'loss': 1.0392, 'grad_norm': 1.0020307302474976, 'learning_rate': 1.864190869628127e-05, 'epoch': 0.58} +2025-02-05 13:43:17 - ERROR - stderr - 19%|█▉ | 4326/22434 [3:35:36<19:40:53, 3.91s/it] +2025-02-05 13:43:24 - ERROR - stderr - 19%|█▉ | 4327/22434 [3:35:43<24:15:40, 4.82s/it] +2025-02-05 13:43:24 - ERROR - stderr - +2025-02-05 13:43:24 - ERROR - stderr - +2025-02-05 13:43:24 - INFO - stdout - {'loss': 0.9659, 'grad_norm': 1.1002930402755737, 'learning_rate': 1.8641182164704924e-05, 'epoch': 0.58} +2025-02-05 13:43:24 - ERROR - stderr - 19%|█▉ | 4327/22434 [3:35:43<24:15:40, 4.82s/it] +2025-02-05 13:43:26 - ERROR - stderr - 19%|█▉ | 4328/22434 [3:35:46<20:45:22, 4.13s/it] +2025-02-05 13:43:26 - ERROR - stderr - +2025-02-05 13:43:26 - ERROR - stderr - +2025-02-05 13:43:26 - INFO - stdout - {'loss': 1.068, 'grad_norm': 1.2451192140579224, 'learning_rate': 1.864045545301141e-05, 'epoch': 0.58} +2025-02-05 13:43:26 - ERROR - stderr - 19%|█▉ | 4328/22434 [3:35:46<20:45:22, 4.13s/it] +2025-02-05 13:43:29 - ERROR - stderr - 19%|█▉ | 4329/22434 [3:35:48<18:15:46, 3.63s/it] +2025-02-05 13:43:29 - ERROR - stderr - +2025-02-05 13:43:29 - ERROR - 
stderr - +2025-02-05 13:43:29 - INFO - stdout - {'loss': 0.9935, 'grad_norm': 1.0918710231781006, 'learning_rate': 1.863972856121587e-05, 'epoch': 0.58} +2025-02-05 13:43:29 - ERROR - stderr - 19%|█▉ | 4329/22434 [3:35:48<18:15:46, 3.63s/it] +2025-02-05 13:43:31 - ERROR - stderr - 19%|█▉ | 4330/22434 [3:35:51<16:38:15, 3.31s/it] +2025-02-05 13:43:31 - ERROR - stderr - +2025-02-05 13:43:31 - ERROR - stderr - +2025-02-05 13:43:31 - INFO - stdout - {'loss': 0.9033, 'grad_norm': 1.054304838180542, 'learning_rate': 1.8639001489333453e-05, 'epoch': 0.58} +2025-02-05 13:43:31 - ERROR - stderr - 19%|█▉ | 4330/22434 [3:35:51<16:38:15, 3.31s/it] +2025-02-05 13:43:34 - ERROR - stderr - 19%|█▉ | 4331/22434 [3:35:53<15:28:29, 3.08s/it] +2025-02-05 13:43:34 - ERROR - stderr - +2025-02-05 13:43:34 - ERROR - stderr - +2025-02-05 13:43:34 - INFO - stdout - {'loss': 0.9684, 'grad_norm': 1.0869016647338867, 'learning_rate': 1.8638274237379316e-05, 'epoch': 0.58} +2025-02-05 13:43:34 - ERROR - stderr - 19%|█▉ | 4331/22434 [3:35:53<15:28:29, 3.08s/it] +2025-02-05 13:43:36 - ERROR - stderr - 19%|█▉ | 4332/22434 [3:35:56<14:37:56, 2.91s/it] +2025-02-05 13:43:36 - ERROR - stderr - +2025-02-05 13:43:36 - ERROR - stderr - +2025-02-05 13:43:36 - INFO - stdout - {'loss': 1.0336, 'grad_norm': 1.1065447330474854, 'learning_rate': 1.863754680536862e-05, 'epoch': 0.58} +2025-02-05 13:43:36 - ERROR - stderr - 19%|█▉ | 4332/22434 [3:35:56<14:37:56, 2.91s/it] +2025-02-05 13:43:39 - ERROR - stderr - 19%|█▉ | 4333/22434 [3:35:58<13:54:09, 2.76s/it] +2025-02-05 13:43:39 - ERROR - stderr - +2025-02-05 13:43:39 - ERROR - stderr - +2025-02-05 13:43:39 - INFO - stdout - {'loss': 0.969, 'grad_norm': 1.0373461246490479, 'learning_rate': 1.863681919331653e-05, 'epoch': 0.58} +2025-02-05 13:43:39 - ERROR - stderr - 19%|█▉ | 4333/22434 [3:35:58<13:54:09, 2.76s/it] +2025-02-05 13:43:42 - ERROR - stderr - 19%|█▉ | 4334/22434 [3:36:01<14:24:09, 2.86s/it] +2025-02-05 13:43:42 - ERROR - stderr - +2025-02-05 13:43:42 
- ERROR - stderr - +2025-02-05 13:43:42 - INFO - stdout - {'loss': 0.9722, 'grad_norm': 1.161434292793274, 'learning_rate': 1.86360914012382e-05, 'epoch': 0.58} +2025-02-05 13:43:42 - ERROR - stderr - 19%|█▉ | 4334/22434 [3:36:02<14:24:09, 2.86s/it] +2025-02-05 13:43:44 - ERROR - stderr - 19%|█▉ | 4335/22434 [3:36:04<13:55:17, 2.77s/it] +2025-02-05 13:43:44 - ERROR - stderr - +2025-02-05 13:43:44 - ERROR - stderr - +2025-02-05 13:43:44 - INFO - stdout - {'loss': 0.9716, 'grad_norm': 1.1578494310379028, 'learning_rate': 1.8635363429148816e-05, 'epoch': 0.58} +2025-02-05 13:43:44 - ERROR - stderr - 19%|█▉ | 4335/22434 [3:36:04<13:55:17, 2.77s/it] +2025-02-05 13:43:47 - ERROR - stderr - 19%|█▉ | 4336/22434 [3:36:07<13:29:55, 2.69s/it] +2025-02-05 13:43:47 - ERROR - stderr - +2025-02-05 13:43:47 - ERROR - stderr - +2025-02-05 13:43:47 - INFO - stdout - {'loss': 0.9918, 'grad_norm': 1.0085622072219849, 'learning_rate': 1.863463527706354e-05, 'epoch': 0.58} +2025-02-05 13:43:47 - ERROR - stderr - 19%|█▉ | 4336/22434 [3:36:07<13:29:55, 2.69s/it] +2025-02-05 13:43:49 - ERROR - stderr - 19%|█▉ | 4337/22434 [3:36:09<13:21:32, 2.66s/it] +2025-02-05 13:43:49 - ERROR - stderr - +2025-02-05 13:43:49 - ERROR - stderr - +2025-02-05 13:43:49 - INFO - stdout - {'loss': 0.8571, 'grad_norm': 1.1762139797210693, 'learning_rate': 1.8633906944997557e-05, 'epoch': 0.58} +2025-02-05 13:43:49 - ERROR - stderr - 19%|█▉ | 4337/22434 [3:36:09<13:21:32, 2.66s/it] +2025-02-05 13:43:52 - ERROR - stderr - 19%|█▉ | 4338/22434 [3:36:12<13:07:11, 2.61s/it] +2025-02-05 13:43:52 - ERROR - stderr - +2025-02-05 13:43:52 - ERROR - stderr - +2025-02-05 13:43:52 - INFO - stdout - {'loss': 1.048, 'grad_norm': 1.1650320291519165, 'learning_rate': 1.8633178432966044e-05, 'epoch': 0.58} +2025-02-05 13:43:52 - ERROR - stderr - 19%|█▉ | 4338/22434 [3:36:12<13:07:11, 2.61s/it] +2025-02-05 13:43:54 - ERROR - stderr - 19%|█▉ | 4339/22434 [3:36:14<12:54:30, 2.57s/it] +2025-02-05 13:43:54 - ERROR - stderr - 
+2025-02-05 13:43:54 - ERROR - stderr - +2025-02-05 13:43:54 - INFO - stdout - {'loss': 1.0059, 'grad_norm': 1.022343635559082, 'learning_rate': 1.8632449740984187e-05, 'epoch': 0.58} +2025-02-05 13:43:54 - ERROR - stderr - 19%|█▉ | 4339/22434 [3:36:14<12:54:30, 2.57s/it] +2025-02-05 13:43:57 - ERROR - stderr - 19%|█▉ | 4340/22434 [3:36:17<12:43:12, 2.53s/it] +2025-02-05 13:43:57 - ERROR - stderr - +2025-02-05 13:43:57 - ERROR - stderr - +2025-02-05 13:43:57 - INFO - stdout - {'loss': 0.9996, 'grad_norm': 1.2207787036895752, 'learning_rate': 1.863172086906718e-05, 'epoch': 0.58} +2025-02-05 13:43:57 - ERROR - stderr - 19%|█▉ | 4340/22434 [3:36:17<12:43:12, 2.53s/it] +2025-02-05 13:43:59 - ERROR - stderr - 19%|█▉ | 4341/22434 [3:36:19<12:42:51, 2.53s/it] +2025-02-05 13:43:59 - ERROR - stderr - +2025-02-05 13:43:59 - ERROR - stderr - +2025-02-05 13:43:59 - INFO - stdout - {'loss': 0.9458, 'grad_norm': 1.0533626079559326, 'learning_rate': 1.8630991817230205e-05, 'epoch': 0.58} +2025-02-05 13:43:59 - ERROR - stderr - 19%|█▉ | 4341/22434 [3:36:19<12:42:51, 2.53s/it] +2025-02-05 13:44:02 - ERROR - stderr - 19%|█▉ | 4342/22434 [3:36:22<12:40:05, 2.52s/it] +2025-02-05 13:44:02 - ERROR - stderr - +2025-02-05 13:44:02 - ERROR - stderr - +2025-02-05 13:44:02 - INFO - stdout - {'loss': 0.9323, 'grad_norm': 1.092612624168396, 'learning_rate': 1.8630262585488465e-05, 'epoch': 0.58} +2025-02-05 13:44:02 - ERROR - stderr - 19%|█▉ | 4342/22434 [3:36:22<12:40:05, 2.52s/it] +2025-02-05 13:44:04 - ERROR - stderr - 19%|█▉ | 4343/22434 [3:36:24<12:40:33, 2.52s/it] +2025-02-05 13:44:04 - ERROR - stderr - +2025-02-05 13:44:04 - ERROR - stderr - +2025-02-05 13:44:04 - INFO - stdout - {'loss': 0.9896, 'grad_norm': 1.170183539390564, 'learning_rate': 1.8629533173857164e-05, 'epoch': 0.58} +2025-02-05 13:44:04 - ERROR - stderr - 19%|█▉ | 4343/22434 [3:36:24<12:40:33, 2.52s/it] +2025-02-05 13:44:07 - ERROR - stderr - 19%|█▉ | 4344/22434 [3:36:27<12:41:52, 2.53s/it] +2025-02-05 13:44:07 - ERROR 
- stderr - +2025-02-05 13:44:07 - ERROR - stderr - +2025-02-05 13:44:07 - INFO - stdout - {'loss': 0.9136, 'grad_norm': 0.9614370465278625, 'learning_rate': 1.8628803582351497e-05, 'epoch': 0.58} +2025-02-05 13:44:07 - ERROR - stderr - 19%|█▉ | 4344/22434 [3:36:27<12:41:52, 2.53s/it] +2025-02-05 13:44:09 - ERROR - stderr - 19%|█▉ | 4345/22434 [3:36:29<12:35:51, 2.51s/it] +2025-02-05 13:44:09 - ERROR - stderr - +2025-02-05 13:44:09 - ERROR - stderr - +2025-02-05 13:44:09 - INFO - stdout - {'loss': 0.8956, 'grad_norm': 1.0926681756973267, 'learning_rate': 1.862807381098668e-05, 'epoch': 0.58} +2025-02-05 13:44:09 - ERROR - stderr - 19%|█▉ | 4345/22434 [3:36:29<12:35:51, 2.51s/it] +2025-02-05 13:44:12 - ERROR - stderr - 19%|█▉ | 4346/22434 [3:36:32<12:32:33, 2.50s/it] +2025-02-05 13:44:12 - ERROR - stderr - +2025-02-05 13:44:12 - ERROR - stderr - +2025-02-05 13:44:12 - INFO - stdout - {'loss': 0.8954, 'grad_norm': 0.983894407749176, 'learning_rate': 1.862734385977792e-05, 'epoch': 0.58} +2025-02-05 13:44:12 - ERROR - stderr - 19%|█▉ | 4346/22434 [3:36:32<12:32:33, 2.50s/it] +2025-02-05 13:44:14 - ERROR - stderr - 19%|█▉ | 4347/22434 [3:36:34<12:38:49, 2.52s/it] +2025-02-05 13:44:14 - ERROR - stderr - +2025-02-05 13:44:14 - ERROR - stderr - +2025-02-05 13:44:14 - INFO - stdout - {'loss': 0.8689, 'grad_norm': 1.0330544710159302, 'learning_rate': 1.862661372874043e-05, 'epoch': 0.58} +2025-02-05 13:44:14 - ERROR - stderr - 19%|█▉ | 4347/22434 [3:36:34<12:38:49, 2.52s/it] +2025-02-05 13:44:17 - ERROR - stderr - 19%|█▉ | 4348/22434 [3:36:37<12:44:30, 2.54s/it] +2025-02-05 13:44:17 - ERROR - stderr - +2025-02-05 13:44:17 - ERROR - stderr - +2025-02-05 13:44:17 - INFO - stdout - {'loss': 1.044, 'grad_norm': 1.0935121774673462, 'learning_rate': 1.8625883417889435e-05, 'epoch': 0.58} +2025-02-05 13:44:17 - ERROR - stderr - 19%|█▉ | 4348/22434 [3:36:37<12:44:30, 2.54s/it] +2025-02-05 13:44:19 - ERROR - stderr - 19%|█▉ | 4349/22434 [3:36:39<12:42:15, 2.53s/it] +2025-02-05 
13:44:19 - ERROR - stderr - +2025-02-05 13:44:19 - ERROR - stderr - +2025-02-05 13:44:19 - INFO - stdout - {'loss': 0.9675, 'grad_norm': 1.101121425628662, 'learning_rate': 1.862515292724015e-05, 'epoch': 0.58} +2025-02-05 13:44:19 - ERROR - stderr - 19%|█▉ | 4349/22434 [3:36:39<12:42:15, 2.53s/it] +2025-02-05 13:44:22 - ERROR - stderr - 19%|█▉ | 4350/22434 [3:36:42<13:14:05, 2.63s/it] +2025-02-05 13:44:22 - ERROR - stderr - +2025-02-05 13:44:22 - ERROR - stderr - +2025-02-05 13:44:22 - INFO - stdout - {'loss': 0.733, 'grad_norm': 1.0263274908065796, 'learning_rate': 1.862442225680781e-05, 'epoch': 0.58} +2025-02-05 13:44:22 - ERROR - stderr - 19%|█▉ | 4350/22434 [3:36:42<13:14:05, 2.63s/it] +2025-02-05 13:45:04 - ERROR - stderr - 19%|█▉ | 4351/22434 [3:37:24<71:53:24, 14.31s/it] +2025-02-05 13:45:04 - ERROR - stderr - +2025-02-05 13:45:04 - ERROR - stderr - +2025-02-05 13:45:04 - INFO - stdout - {'loss': 1.0055, 'grad_norm': 1.1773288249969482, 'learning_rate': 1.862369140660764e-05, 'epoch': 0.58} +2025-02-05 13:45:04 - ERROR - stderr - 19%|█▉ | 4351/22434 [3:37:24<71:53:24, 14.31s/it] +2025-02-05 13:45:38 - ERROR - stderr - 19%|█▉ | 4352/22434 [3:37:57<101:02:00, 20.12s/it] +2025-02-05 13:45:38 - ERROR - stderr - +2025-02-05 13:45:38 - ERROR - stderr - +2025-02-05 13:45:38 - INFO - stdout - {'loss': 0.9618, 'grad_norm': 1.075722336769104, 'learning_rate': 1.8622960376654872e-05, 'epoch': 0.58} +2025-02-05 13:45:38 - ERROR - stderr - 19%|█▉ | 4352/22434 [3:37:57<101:02:00, 20.12s/it] +2025-02-05 13:45:52 - ERROR - stderr - 19%|█▉ | 4353/22434 [3:38:12<92:40:37, 18.45s/it] +2025-02-05 13:45:52 - ERROR - stderr - +2025-02-05 13:45:52 - ERROR - stderr - +2025-02-05 13:45:52 - INFO - stdout - {'loss': 0.9435, 'grad_norm': 1.2372747659683228, 'learning_rate': 1.8622229166964748e-05, 'epoch': 0.58} +2025-02-05 13:45:52 - ERROR - stderr - 19%|█▉ | 4353/22434 [3:38:12<92:40:37, 18.45s/it] +2025-02-05 13:46:15 - ERROR - stderr - 19%|█▉ | 4354/22434 [3:38:35<99:50:13, 
19.88s/it] +2025-02-05 13:46:15 - ERROR - stderr - +2025-02-05 13:46:15 - ERROR - stderr - +2025-02-05 13:46:15 - INFO - stdout - {'loss': 0.858, 'grad_norm': 0.9891464114189148, 'learning_rate': 1.8621497777552508e-05, 'epoch': 0.58} +2025-02-05 13:46:15 - ERROR - stderr - 19%|█▉ | 4354/22434 [3:38:35<99:50:13, 19.88s/it] +2025-02-05 13:46:43 - ERROR - stderr - 19%|█▉ | 4355/22434 [3:39:03<111:24:41, 22.18s/it] +2025-02-05 13:46:43 - ERROR - stderr - +2025-02-05 13:46:43 - ERROR - stderr - +2025-02-05 13:46:43 - INFO - stdout - {'loss': 0.9022, 'grad_norm': 1.172113299369812, 'learning_rate': 1.8620766208433395e-05, 'epoch': 0.58} +2025-02-05 13:46:43 - ERROR - stderr - 19%|█▉ | 4355/22434 [3:39:03<111:24:41, 22.18s/it] +2025-02-05 13:47:29 - ERROR - stderr - 19%|█▉ | 4356/22434 [3:39:49<147:29:33, 29.37s/it] +2025-02-05 13:47:29 - ERROR - stderr - +2025-02-05 13:47:29 - ERROR - stderr - +2025-02-05 13:47:29 - INFO - stdout - {'loss': 1.09, 'grad_norm': 1.0522807836532593, 'learning_rate': 1.8620034459622663e-05, 'epoch': 0.58} +2025-02-05 13:47:29 - ERROR - stderr - 19%|█▉ | 4356/22434 [3:39:49<147:29:33, 29.37s/it] +2025-02-05 13:47:31 - ERROR - stderr - 19%|█▉ | 4357/22434 [3:39:51<106:55:27, 21.29s/it] +2025-02-05 13:47:32 - ERROR - stderr - +2025-02-05 13:47:32 - ERROR - stderr - +2025-02-05 13:47:32 - INFO - stdout - {'loss': 1.1329, 'grad_norm': 1.130573034286499, 'learning_rate': 1.8619302531135555e-05, 'epoch': 0.58} +2025-02-05 13:47:32 - ERROR - stderr - 19%|█▉ | 4357/22434 [3:39:51<106:55:27, 21.29s/it] +2025-02-05 13:47:49 - ERROR - stderr - 19%|█▉ | 4358/22434 [3:40:09<100:52:26, 20.09s/it] +2025-02-05 13:47:49 - ERROR - stderr - +2025-02-05 13:47:49 - ERROR - stderr - +2025-02-05 13:47:49 - INFO - stdout - {'loss': 0.9432, 'grad_norm': 1.1870092153549194, 'learning_rate': 1.8618570422987342e-05, 'epoch': 0.58} +2025-02-05 13:47:49 - ERROR - stderr - 19%|█▉ | 4358/22434 [3:40:09<100:52:26, 20.09s/it] +2025-02-05 13:48:04 - ERROR - stderr - 19%|█▉ | 
4359/22434 [3:40:24<93:51:30, 18.69s/it] +2025-02-05 13:48:04 - ERROR - stderr - +2025-02-05 13:48:04 - ERROR - stderr - +2025-02-05 13:48:04 - INFO - stdout - {'loss': 0.9633, 'grad_norm': 1.079034447669983, 'learning_rate': 1.861783813519327e-05, 'epoch': 0.58} +2025-02-05 13:48:04 - ERROR - stderr - 19%|█▉ | 4359/22434 [3:40:24<93:51:30, 18.69s/it] +2025-02-05 13:49:47 - ERROR - stderr - 19%|█▉ | 4360/22434 [3:42:07<220:29:53, 43.92s/it] +2025-02-05 13:49:47 - ERROR - stderr - +2025-02-05 13:49:47 - ERROR - stderr - +2025-02-05 13:49:47 - INFO - stdout - {'loss': 0.9559, 'grad_norm': 0.9621286392211914, 'learning_rate': 1.8617105667768607e-05, 'epoch': 0.58} +2025-02-05 13:49:47 - ERROR - stderr - 19%|█▉ | 4360/22434 [3:42:07<220:29:53, 43.92s/it] +2025-02-05 13:50:18 - ERROR - stderr - 19%|█▉ | 4361/22434 [3:42:38<201:41:08, 40.17s/it] +2025-02-05 13:50:18 - ERROR - stderr - +2025-02-05 13:50:18 - ERROR - stderr - +2025-02-05 13:50:18 - INFO - stdout - {'loss': 0.9894, 'grad_norm': 1.178645372390747, 'learning_rate': 1.8616373020728627e-05, 'epoch': 0.58} +2025-02-05 13:50:18 - ERROR - stderr - 19%|█▉ | 4361/22434 [3:42:38<201:41:08, 40.17s/it] +2025-02-05 13:50:50 - ERROR - stderr - 19%|█▉ | 4362/22434 [3:43:10<189:10:57, 37.69s/it] +2025-02-05 13:50:50 - ERROR - stderr - +2025-02-05 13:50:50 - ERROR - stderr - +2025-02-05 13:50:50 - INFO - stdout - {'loss': 0.9668, 'grad_norm': 1.0526145696640015, 'learning_rate': 1.8615640194088592e-05, 'epoch': 0.58} +2025-02-05 13:50:50 - ERROR - stderr - 19%|█▉ | 4362/22434 [3:43:10<189:10:57, 37.69s/it] +2025-02-05 13:50:53 - ERROR - stderr - 19%|█▉ | 4363/22434 [3:43:13<136:13:34, 27.14s/it] +2025-02-05 13:50:53 - ERROR - stderr - +2025-02-05 13:50:53 - ERROR - stderr - +2025-02-05 13:50:53 - INFO - stdout - {'loss': 1.1666, 'grad_norm': 1.0943219661712646, 'learning_rate': 1.8614907187863786e-05, 'epoch': 0.58} +2025-02-05 13:50:53 - ERROR - stderr - 19%|█▉ | 4363/22434 [3:43:13<136:13:34, 27.14s/it] +2025-02-05 
13:50:55 - ERROR - stderr - 19%|█▉ | 4364/22434 [3:43:15<99:01:08, 19.73s/it] +2025-02-05 13:50:55 - ERROR - stderr - +2025-02-05 13:50:55 - ERROR - stderr - +2025-02-05 13:50:55 - INFO - stdout - {'loss': 1.1163, 'grad_norm': 1.1866589784622192, 'learning_rate': 1.861417400206948e-05, 'epoch': 0.58} +2025-02-05 13:50:55 - ERROR - stderr - 19%|█▉ | 4364/22434 [3:43:15<99:01:08, 19.73s/it] +2025-02-05 13:50:58 - ERROR - stderr - 19%|█▉ | 4365/22434 [3:43:17<73:02:55, 14.55s/it] +2025-02-05 13:50:58 - ERROR - stderr - +2025-02-05 13:50:58 - ERROR - stderr - +2025-02-05 13:50:58 - INFO - stdout - {'loss': 1.1154, 'grad_norm': 1.2501007318496704, 'learning_rate': 1.8613440636720958e-05, 'epoch': 0.58} +2025-02-05 13:50:58 - ERROR - stderr - 19%|█▉ | 4365/22434 [3:43:18<73:02:55, 14.55s/it] +2025-02-05 13:51:33 - ERROR - stderr - 19%|█▉ | 4366/22434 [3:43:53<104:44:16, 20.87s/it] +2025-02-05 13:51:33 - ERROR - stderr - +2025-02-05 13:51:33 - ERROR - stderr - +2025-02-05 13:51:33 - INFO - stdout - {'loss': 0.8368, 'grad_norm': 0.9247719049453735, 'learning_rate': 1.861270709183351e-05, 'epoch': 0.58} +2025-02-05 13:51:33 - ERROR - stderr - 19%|█▉ | 4366/22434 [3:43:53<104:44:16, 20.87s/it] +2025-02-05 13:52:25 - ERROR - stderr - 19%|█▉ | 4367/22434 [3:44:45<151:33:13, 30.20s/it] +2025-02-05 13:52:25 - ERROR - stderr - +2025-02-05 13:52:25 - ERROR - stderr - +2025-02-05 13:52:25 - INFO - stdout - {'loss': 0.9193, 'grad_norm': 1.028141975402832, 'learning_rate': 1.8611973367422425e-05, 'epoch': 0.58} +2025-02-05 13:52:25 - ERROR - stderr - 19%|█▉ | 4367/22434 [3:44:45<151:33:13, 30.20s/it] +2025-02-05 13:52:57 - ERROR - stderr - 19%|█▉ | 4368/22434 [3:45:17<153:45:49, 30.64s/it] +2025-02-05 13:52:57 - ERROR - stderr - +2025-02-05 13:52:57 - ERROR - stderr - +2025-02-05 13:52:57 - INFO - stdout - {'loss': 1.0214, 'grad_norm': 1.1229690313339233, 'learning_rate': 1.8611239463502997e-05, 'epoch': 0.58} +2025-02-05 13:52:57 - ERROR - stderr - 19%|█▉ | 4368/22434 
[3:45:17<153:45:49, 30.64s/it] +2025-02-05 13:53:10 - ERROR - stderr - 19%|█▉ | 4369/22434 [3:45:30<127:19:44, 25.37s/it] +2025-02-05 13:53:10 - ERROR - stderr - +2025-02-05 13:53:10 - ERROR - stderr - +2025-02-05 13:53:10 - INFO - stdout - {'loss': 0.9486, 'grad_norm': 1.092471718788147, 'learning_rate': 1.861050538009052e-05, 'epoch': 0.58} +2025-02-05 13:53:10 - ERROR - stderr - 19%|█▉ | 4369/22434 [3:45:30<127:19:44, 25.37s/it] +2025-02-05 13:54:05 - ERROR - stderr - 19%|█▉ | 4370/22434 [3:46:24<171:13:28, 34.12s/it] +2025-02-05 13:54:05 - ERROR - stderr - +2025-02-05 13:54:05 - ERROR - stderr - +2025-02-05 13:54:05 - INFO - stdout - {'loss': 0.8775, 'grad_norm': 1.0574297904968262, 'learning_rate': 1.86097711172003e-05, 'epoch': 0.58} +2025-02-05 13:54:05 - ERROR - stderr - 19%|█▉ | 4370/22434 [3:46:24<171:13:28, 34.12s/it] +2025-02-05 13:55:02 - ERROR - stderr - 19%|█▉ | 4371/22434 [3:47:22<206:13:29, 41.10s/it] +2025-02-05 13:55:02 - ERROR - stderr - +2025-02-05 13:55:02 - ERROR - stderr - +2025-02-05 13:55:02 - INFO - stdout - {'loss': 0.9813, 'grad_norm': 1.1997524499893188, 'learning_rate': 1.8609036674847635e-05, 'epoch': 0.58} +2025-02-05 13:55:02 - ERROR - stderr - 19%|█▉ | 4371/22434 [3:47:22<206:13:29, 41.10s/it] +2025-02-05 13:56:02 - ERROR - stderr - 19%|█▉ | 4372/22434 [3:48:22<235:24:59, 46.92s/it] +2025-02-05 13:56:03 - ERROR - stderr - +2025-02-05 13:56:03 - ERROR - stderr - +2025-02-05 13:56:03 - INFO - stdout - {'loss': 0.9694, 'grad_norm': 1.069234848022461, 'learning_rate': 1.8608302053047845e-05, 'epoch': 0.58} +2025-02-05 13:56:03 - ERROR - stderr - 19%|█▉ | 4372/22434 [3:48:22<235:24:59, 46.92s/it] +2025-02-05 13:56:20 - ERROR - stderr - 19%|█▉ | 4373/22434 [3:48:40<191:36:45, 38.19s/it] +2025-02-05 13:56:20 - ERROR - stderr - +2025-02-05 13:56:20 - ERROR - stderr - +2025-02-05 13:56:20 - INFO - stdout - {'loss': 1.1134, 'grad_norm': 1.0913699865341187, 'learning_rate': 1.8607567251816232e-05, 'epoch': 0.58} +2025-02-05 13:56:20 - ERROR 
- stderr - 19%|█▉ | 4373/22434 [3:48:40<191:36:45, 38.19s/it] +2025-02-05 13:56:51 - ERROR - stderr - 19%|█▉ | 4374/22434 [3:49:11<180:12:25, 35.92s/it] +2025-02-05 13:56:51 - ERROR - stderr - +2025-02-05 13:56:51 - ERROR - stderr - +2025-02-05 13:56:51 - INFO - stdout - {'loss': 0.8635, 'grad_norm': 1.2003231048583984, 'learning_rate': 1.8606832271168115e-05, 'epoch': 0.58} +2025-02-05 13:56:51 - ERROR - stderr - 19%|█▉ | 4374/22434 [3:49:11<180:12:25, 35.92s/it] +2025-02-05 13:57:52 - ERROR - stderr - 20%|█▉ | 4375/22434 [3:50:12<218:47:06, 43.61s/it] +2025-02-05 13:57:53 - ERROR - stderr - +2025-02-05 13:57:53 - ERROR - stderr - +2025-02-05 13:57:53 - INFO - stdout - {'loss': 0.9104, 'grad_norm': 0.996042013168335, 'learning_rate': 1.8606097111118817e-05, 'epoch': 0.59} +2025-02-05 13:57:53 - ERROR - stderr - 20%|█▉ | 4375/22434 [3:50:12<218:47:06, 43.61s/it] +2025-02-05 13:57:55 - ERROR - stderr - 20%|█▉ | 4376/22434 [3:50:15<157:19:59, 31.37s/it] +2025-02-05 13:57:55 - ERROR - stderr - +2025-02-05 13:57:55 - ERROR - stderr - +2025-02-05 13:57:55 - INFO - stdout - {'loss': 1.0371, 'grad_norm': 1.040037989616394, 'learning_rate': 1.860536177168366e-05, 'epoch': 0.59} +2025-02-05 13:57:55 - ERROR - stderr - 20%|█▉ | 4376/22434 [3:50:15<157:19:59, 31.37s/it] +2025-02-05 13:58:24 - ERROR - stderr - 20%|█▉ | 4377/22434 [3:50:43<152:35:44, 30.42s/it] +2025-02-05 13:58:24 - ERROR - stderr - +2025-02-05 13:58:24 - ERROR - stderr - +2025-02-05 13:58:24 - INFO - stdout - {'loss': 0.9577, 'grad_norm': 1.0615019798278809, 'learning_rate': 1.8604626252877972e-05, 'epoch': 0.59} +2025-02-05 13:58:24 - ERROR - stderr - 20%|█▉ | 4377/22434 [3:50:43<152:35:44, 30.42s/it] +2025-02-05 13:59:13 - ERROR - stderr - 20%|█▉ | 4378/22434 [3:51:33<180:54:24, 36.07s/it] +2025-02-05 13:59:13 - ERROR - stderr - +2025-02-05 13:59:13 - ERROR - stderr - +2025-02-05 13:59:13 - INFO - stdout - {'loss': 0.9777, 'grad_norm': 1.0888714790344238, 'learning_rate': 1.8603890554717082e-05, 'epoch': 
0.59} +2025-02-05 13:59:13 - ERROR - stderr - 20%|█▉ | 4378/22434 [3:51:33<180:54:24, 36.07s/it] +2025-02-05 13:59:36 - ERROR - stderr - 20%|█▉ | 4379/22434 [3:51:56<161:29:13, 32.20s/it] +2025-02-05 13:59:36 - ERROR - stderr - +2025-02-05 13:59:36 - ERROR - stderr - +2025-02-05 13:59:36 - INFO - stdout - {'loss': 1.1023, 'grad_norm': 1.1677852869033813, 'learning_rate': 1.8603154677216325e-05, 'epoch': 0.59} +2025-02-05 13:59:36 - ERROR - stderr - 20%|█▉ | 4379/22434 [3:51:56<161:29:13, 32.20s/it] +2025-02-05 14:00:10 - ERROR - stderr - 20%|█▉ | 4380/22434 [3:52:30<164:59:31, 32.90s/it] +2025-02-05 14:00:10 - ERROR - stderr - +2025-02-05 14:00:10 - ERROR - stderr - +2025-02-05 14:00:10 - INFO - stdout - {'loss': 0.8889, 'grad_norm': 1.1060407161712646, 'learning_rate': 1.8602418620391046e-05, 'epoch': 0.59} +2025-02-05 14:00:10 - ERROR - stderr - 20%|█▉ | 4380/22434 [3:52:30<164:59:31, 32.90s/it] +2025-02-05 14:00:33 - ERROR - stderr - 20%|█▉ | 4381/22434 [3:52:53<149:38:12, 29.84s/it] +2025-02-05 14:00:33 - ERROR - stderr - +2025-02-05 14:00:33 - ERROR - stderr - +2025-02-05 14:00:33 - INFO - stdout - {'loss': 0.8008, 'grad_norm': 1.0657731294631958, 'learning_rate': 1.8601682384256577e-05, 'epoch': 0.59} +2025-02-05 14:00:33 - ERROR - stderr - 20%|█▉ | 4381/22434 [3:52:53<149:38:12, 29.84s/it] +2025-02-05 14:00:36 - ERROR - stderr - 20%|█▉ | 4382/22434 [3:52:55<108:28:07, 21.63s/it] +2025-02-05 14:00:36 - ERROR - stderr - +2025-02-05 14:00:36 - ERROR - stderr - +2025-02-05 14:00:36 - INFO - stdout - {'loss': 0.8763, 'grad_norm': 1.0989327430725098, 'learning_rate': 1.8600945968828275e-05, 'epoch': 0.59} +2025-02-05 14:00:36 - ERROR - stderr - 20%|█▉ | 4382/22434 [3:52:55<108:28:07, 21.63s/it] +2025-02-05 14:00:38 - ERROR - stderr - 20%|█▉ | 4383/22434 [3:52:58<79:33:51, 15.87s/it] +2025-02-05 14:00:38 - ERROR - stderr - +2025-02-05 14:00:38 - ERROR - stderr - +2025-02-05 14:00:38 - INFO - stdout - {'loss': 1.059, 'grad_norm': 1.2909172773361206, 'learning_rate': 
1.860020937412148e-05, 'epoch': 0.59} +2025-02-05 14:00:38 - ERROR - stderr - 20%|█▉ | 4383/22434 [3:52:58<79:33:51, 15.87s/it] +2025-02-05 14:00:41 - ERROR - stderr - 20%|█▉ | 4384/22434 [3:53:00<59:30:04, 11.87s/it] +2025-02-05 14:00:41 - ERROR - stderr - +2025-02-05 14:00:41 - ERROR - stderr - +2025-02-05 14:00:41 - INFO - stdout - {'loss': 0.9236, 'grad_norm': 1.07817804813385, 'learning_rate': 1.8599472600151555e-05, 'epoch': 0.59} +2025-02-05 14:00:41 - ERROR - stderr - 20%|█▉ | 4384/22434 [3:53:00<59:30:04, 11.87s/it] +2025-02-05 14:01:00 - ERROR - stderr - 20%|█▉ | 4385/22434 [3:53:20<71:14:16, 14.21s/it] +2025-02-05 14:01:00 - ERROR - stderr - +2025-02-05 14:01:00 - ERROR - stderr - +2025-02-05 14:01:00 - INFO - stdout - {'loss': 0.9189, 'grad_norm': 1.0776126384735107, 'learning_rate': 1.859873564693385e-05, 'epoch': 0.59} +2025-02-05 14:01:00 - ERROR - stderr - 20%|█▉ | 4385/22434 [3:53:20<71:14:16, 14.21s/it] +2025-02-05 14:01:15 - ERROR - stderr - 20%|█▉ | 4386/22434 [3:53:35<72:29:59, 14.46s/it] +2025-02-05 14:01:15 - ERROR - stderr - +2025-02-05 14:01:15 - ERROR - stderr - +2025-02-05 14:01:15 - INFO - stdout - {'loss': 0.9382, 'grad_norm': 1.1654759645462036, 'learning_rate': 1.8597998514483724e-05, 'epoch': 0.59} +2025-02-05 14:01:15 - ERROR - stderr - 20%|█▉ | 4386/22434 [3:53:35<72:29:59, 14.46s/it] +2025-02-05 14:01:53 - ERROR - stderr - 20%|█▉ | 4387/22434 [3:54:13<107:11:29, 21.38s/it] +2025-02-05 14:01:53 - ERROR - stderr - +2025-02-05 14:01:53 - ERROR - stderr - +2025-02-05 14:01:53 - INFO - stdout - {'loss': 1.0317, 'grad_norm': 1.1703912019729614, 'learning_rate': 1.8597261202816553e-05, 'epoch': 0.59} +2025-02-05 14:01:53 - ERROR - stderr - 20%|█▉ | 4387/22434 [3:54:13<107:11:29, 21.38s/it] +2025-02-05 14:01:56 - ERROR - stderr - 20%|█▉ | 4388/22434 [3:54:16<79:54:28, 15.94s/it] +2025-02-05 14:01:56 - ERROR - stderr - +2025-02-05 14:01:56 - ERROR - stderr - +2025-02-05 14:01:56 - INFO - stdout - {'loss': 1.0388, 'grad_norm': 
1.0751920938491821, 'learning_rate': 1.8596523711947693e-05, 'epoch': 0.59} +2025-02-05 14:01:56 - ERROR - stderr - 20%|█▉ | 4388/22434 [3:54:16<79:54:28, 15.94s/it] +2025-02-05 14:01:59 - ERROR - stderr - 20%|█▉ | 4389/22434 [3:54:18<59:36:54, 11.89s/it] +2025-02-05 14:01:59 - ERROR - stderr - +2025-02-05 14:01:59 - ERROR - stderr - +2025-02-05 14:01:59 - INFO - stdout - {'loss': 0.9699, 'grad_norm': 1.1128225326538086, 'learning_rate': 1.8595786041892526e-05, 'epoch': 0.59} +2025-02-05 14:01:59 - ERROR - stderr - 20%|█▉ | 4389/22434 [3:54:18<59:36:54, 11.89s/it] +2025-02-05 14:02:01 - ERROR - stderr - 20%|█▉ | 4390/22434 [3:54:21<45:29:51, 9.08s/it] +2025-02-05 14:02:01 - ERROR - stderr - +2025-02-05 14:02:01 - ERROR - stderr - +2025-02-05 14:02:01 - INFO - stdout - {'loss': 1.0197, 'grad_norm': 1.1440491676330566, 'learning_rate': 1.8595048192666425e-05, 'epoch': 0.59} +2025-02-05 14:02:01 - ERROR - stderr - 20%|█▉ | 4390/22434 [3:54:21<45:29:51, 9.08s/it] +2025-02-05 14:02:04 - ERROR - stderr - 20%|█▉ | 4391/22434 [3:54:23<35:41:44, 7.12s/it] +2025-02-05 14:02:04 - ERROR - stderr - +2025-02-05 14:02:04 - ERROR - stderr - +2025-02-05 14:02:04 - INFO - stdout - {'loss': 1.0001, 'grad_norm': 1.0496002435684204, 'learning_rate': 1.8594310164284767e-05, 'epoch': 0.59} +2025-02-05 14:02:04 - ERROR - stderr - 20%|█▉ | 4391/22434 [3:54:23<35:41:44, 7.12s/it] +2025-02-05 14:02:06 - ERROR - stderr - 20%|█▉ | 4392/22434 [3:54:26<29:12:18, 5.83s/it] +2025-02-05 14:02:06 - ERROR - stderr - +2025-02-05 14:02:06 - ERROR - stderr - +2025-02-05 14:02:06 - INFO - stdout - {'loss': 1.0825, 'grad_norm': 1.063289761543274, 'learning_rate': 1.8593571956762937e-05, 'epoch': 0.59} +2025-02-05 14:02:06 - ERROR - stderr - 20%|█▉ | 4392/22434 [3:54:26<29:12:18, 5.83s/it] +2025-02-05 14:02:31 - ERROR - stderr - 20%|█▉ | 4393/22434 [3:54:50<56:40:30, 11.31s/it] +2025-02-05 14:02:31 - ERROR - stderr - +2025-02-05 14:02:31 - ERROR - stderr - +2025-02-05 14:02:31 - INFO - stdout - {'loss': 
0.9692, 'grad_norm': 1.088711142539978, 'learning_rate': 1.8592833570116324e-05, 'epoch': 0.59} +2025-02-05 14:02:31 - ERROR - stderr - 20%|█▉ | 4393/22434 [3:54:50<56:40:30, 11.31s/it] +2025-02-05 14:02:55 - ERROR - stderr - 20%|█▉ | 4394/22434 [3:55:14<75:57:31, 15.16s/it] +2025-02-05 14:02:55 - ERROR - stderr - +2025-02-05 14:02:55 - ERROR - stderr - +2025-02-05 14:02:55 - INFO - stdout - {'loss': 0.9048, 'grad_norm': 1.0396865606307983, 'learning_rate': 1.8592095004360316e-05, 'epoch': 0.59} +2025-02-05 14:02:55 - ERROR - stderr - 20%|█▉ | 4394/22434 [3:55:14<75:57:31, 15.16s/it] +2025-02-05 14:02:57 - ERROR - stderr - 20%|█▉ | 4395/22434 [3:55:17<56:54:50, 11.36s/it] +2025-02-05 14:02:57 - ERROR - stderr - +2025-02-05 14:02:57 - ERROR - stderr - +2025-02-05 14:02:57 - INFO - stdout - {'loss': 1.0185, 'grad_norm': 1.162926197052002, 'learning_rate': 1.8591356259510315e-05, 'epoch': 0.59} +2025-02-05 14:02:57 - ERROR - stderr - 20%|█▉ | 4395/22434 [3:55:17<56:54:50, 11.36s/it] +2025-02-05 14:03:00 - ERROR - stderr - 20%|█▉ | 4396/22434 [3:55:19<43:39:01, 8.71s/it] +2025-02-05 14:03:00 - ERROR - stderr - +2025-02-05 14:03:00 - ERROR - stderr - +2025-02-05 14:03:00 - INFO - stdout - {'loss': 0.9886, 'grad_norm': 1.037520408630371, 'learning_rate': 1.859061733558171e-05, 'epoch': 0.59} +2025-02-05 14:03:00 - ERROR - stderr - 20%|█▉ | 4396/22434 [3:55:19<43:39:01, 8.71s/it] +2025-02-05 14:03:02 - ERROR - stderr - 20%|█▉ | 4397/22434 [3:55:22<34:17:17, 6.84s/it] +2025-02-05 14:03:02 - ERROR - stderr - +2025-02-05 14:03:02 - ERROR - stderr - +2025-02-05 14:03:02 - INFO - stdout - {'loss': 0.9297, 'grad_norm': 1.1378631591796875, 'learning_rate': 1.8589878232589904e-05, 'epoch': 0.59} +2025-02-05 14:03:02 - ERROR - stderr - 20%|█▉ | 4397/22434 [3:55:22<34:17:17, 6.84s/it] +2025-02-05 14:03:21 - ERROR - stderr - 20%|█▉ | 4398/22434 [3:55:41<52:04:54, 10.40s/it] +2025-02-05 14:03:21 - ERROR - stderr - +2025-02-05 14:03:21 - ERROR - stderr - +2025-02-05 14:03:21 - INFO - 
stdout - {'loss': 0.9098, 'grad_norm': 1.198503851890564, 'learning_rate': 1.858913895055031e-05, 'epoch': 0.59} +2025-02-05 14:03:21 - ERROR - stderr - 20%|█▉ | 4398/22434 [3:55:41<52:04:54, 10.40s/it] +2025-02-05 14:03:23 - ERROR - stderr - 20%|█▉ | 4399/22434 [3:55:43<40:13:49, 8.03s/it] +2025-02-05 14:03:23 - ERROR - stderr - +2025-02-05 14:03:23 - ERROR - stderr - +2025-02-05 14:03:23 - INFO - stdout - {'loss': 1.0114, 'grad_norm': 1.0104840993881226, 'learning_rate': 1.858839948947833e-05, 'epoch': 0.59} +2025-02-05 14:03:23 - ERROR - stderr - 20%|█▉ | 4399/22434 [3:55:43<40:13:49, 8.03s/it] +2025-02-05 14:03:26 - ERROR - stderr - 20%|█▉ | 4400/22434 [3:55:46<31:56:15, 6.38s/it] +2025-02-05 14:03:26 - ERROR - stderr - +2025-02-05 14:03:26 - ERROR - stderr - +2025-02-05 14:03:26 - INFO - stdout - {'loss': 0.9127, 'grad_norm': 1.0440484285354614, 'learning_rate': 1.8587659849389386e-05, 'epoch': 0.59} +2025-02-05 14:03:26 - ERROR - stderr - 20%|█▉ | 4400/22434 [3:55:46<31:56:15, 6.38s/it] +2025-02-05 14:03:28 - ERROR - stderr - 20%|█▉ | 4401/22434 [3:55:48<26:03:10, 5.20s/it] +2025-02-05 14:03:28 - ERROR - stderr - +2025-02-05 14:03:28 - ERROR - stderr - +2025-02-05 14:03:28 - INFO - stdout - {'loss': 0.989, 'grad_norm': 0.9837992787361145, 'learning_rate': 1.8586920030298885e-05, 'epoch': 0.59} +2025-02-05 14:03:28 - ERROR - stderr - 20%|█▉ | 4401/22434 [3:55:48<26:03:10, 5.20s/it] +2025-02-05 14:03:31 - ERROR - stderr - 20%|█▉ | 4402/22434 [3:55:51<21:55:01, 4.38s/it] +2025-02-05 14:03:31 - ERROR - stderr - +2025-02-05 14:03:31 - ERROR - stderr - +2025-02-05 14:03:31 - INFO - stdout - {'loss': 1.0159, 'grad_norm': 1.0748567581176758, 'learning_rate': 1.8586180032222255e-05, 'epoch': 0.59} +2025-02-05 14:03:31 - ERROR - stderr - 20%|█▉ | 4402/22434 [3:55:51<21:55:01, 4.38s/it] +2025-02-05 14:03:33 - ERROR - stderr - 20%|█▉ | 4403/22434 [3:55:53<19:24:29, 3.87s/it] +2025-02-05 14:03:34 - ERROR - stderr - +2025-02-05 14:03:34 - ERROR - stderr - +2025-02-05 
14:03:34 - INFO - stdout - {'loss': 1.0879, 'grad_norm': 1.2201601266860962, 'learning_rate': 1.858543985517492e-05, 'epoch': 0.59} +2025-02-05 14:03:34 - ERROR - stderr - 20%|█▉ | 4403/22434 [3:55:53<19:24:29, 3.87s/it] +2025-02-05 14:03:54 - ERROR - stderr - 20%|█▉ | 4404/22434 [3:56:14<44:02:04, 8.79s/it] +2025-02-05 14:03:54 - ERROR - stderr - +2025-02-05 14:03:54 - ERROR - stderr - +2025-02-05 14:03:54 - INFO - stdout - {'loss': 1.0039, 'grad_norm': 1.0066763162612915, 'learning_rate': 1.8584699499172304e-05, 'epoch': 0.59} +2025-02-05 14:03:54 - ERROR - stderr - 20%|█▉ | 4404/22434 [3:56:14<44:02:04, 8.79s/it] +2025-02-05 14:03:56 - ERROR - stderr - 20%|█▉ | 4405/22434 [3:56:16<34:31:45, 6.89s/it] +2025-02-05 14:03:56 - ERROR - stderr - +2025-02-05 14:03:56 - ERROR - stderr - +2025-02-05 14:03:56 - INFO - stdout - {'loss': 1.1156, 'grad_norm': 1.385453462600708, 'learning_rate': 1.858395896422984e-05, 'epoch': 0.59} +2025-02-05 14:03:56 - ERROR - stderr - 20%|█▉ | 4405/22434 [3:56:16<34:31:45, 6.89s/it] +2025-02-05 14:03:59 - ERROR - stderr - 20%|█▉ | 4406/22434 [3:56:19<28:21:45, 5.66s/it] +2025-02-05 14:03:59 - ERROR - stderr - +2025-02-05 14:03:59 - ERROR - stderr - +2025-02-05 14:03:59 - INFO - stdout - {'loss': 0.9929, 'grad_norm': 1.074121356010437, 'learning_rate': 1.8583218250362967e-05, 'epoch': 0.59} +2025-02-05 14:03:59 - ERROR - stderr - 20%|█▉ | 4406/22434 [3:56:19<28:21:45, 5.66s/it] +2025-02-05 14:04:14 - ERROR - stderr - 20%|█▉ | 4407/22434 [3:56:34<42:08:26, 8.42s/it] +2025-02-05 14:04:14 - ERROR - stderr - +2025-02-05 14:04:14 - ERROR - stderr - +2025-02-05 14:04:14 - INFO - stdout - {'loss': 0.9797, 'grad_norm': 1.0838309526443481, 'learning_rate': 1.8582477357587123e-05, 'epoch': 0.59} +2025-02-05 14:04:14 - ERROR - stderr - 20%|█▉ | 4407/22434 [3:56:34<42:08:26, 8.42s/it] +2025-02-05 14:04:17 - ERROR - stderr - 20%|█▉ | 4408/22434 [3:56:36<33:44:10, 6.74s/it] +2025-02-05 14:04:17 - ERROR - stderr - +2025-02-05 14:04:17 - ERROR - stderr - 
+2025-02-05 14:04:17 - INFO - stdout - {'loss': 1.0286, 'grad_norm': 1.1560280323028564, 'learning_rate': 1.858173628591775e-05, 'epoch': 0.59} +2025-02-05 14:04:17 - ERROR - stderr - 20%|█▉ | 4408/22434 [3:56:36<33:44:10, 6.74s/it] +2025-02-05 14:04:19 - ERROR - stderr - 20%|█▉ | 4409/22434 [3:56:39<27:17:57, 5.45s/it] +2025-02-05 14:04:19 - ERROR - stderr - +2025-02-05 14:04:19 - ERROR - stderr - +2025-02-05 14:04:19 - INFO - stdout - {'loss': 0.9663, 'grad_norm': 1.151377558708191, 'learning_rate': 1.85809950353703e-05, 'epoch': 0.59} +2025-02-05 14:04:19 - ERROR - stderr - 20%|█▉ | 4409/22434 [3:56:39<27:17:57, 5.45s/it] +2025-02-05 14:04:22 - ERROR - stderr - 20%|█▉ | 4410/22434 [3:56:41<22:43:36, 4.54s/it] +2025-02-05 14:04:22 - ERROR - stderr - +2025-02-05 14:04:22 - ERROR - stderr - +2025-02-05 14:04:22 - INFO - stdout - {'loss': 0.9735, 'grad_norm': 1.1858372688293457, 'learning_rate': 1.8580253605960215e-05, 'epoch': 0.59} +2025-02-05 14:04:22 - ERROR - stderr - 20%|█▉ | 4410/22434 [3:56:41<22:43:36, 4.54s/it] +2025-02-05 14:04:24 - ERROR - stderr - 20%|█▉ | 4411/22434 [3:56:44<19:37:14, 3.92s/it] +2025-02-05 14:04:24 - ERROR - stderr - +2025-02-05 14:04:24 - ERROR - stderr - +2025-02-05 14:04:24 - INFO - stdout - {'loss': 0.9989, 'grad_norm': 1.083585500717163, 'learning_rate': 1.8579511997702955e-05, 'epoch': 0.59} +2025-02-05 14:04:24 - ERROR - stderr - 20%|█▉ | 4411/22434 [3:56:44<19:37:14, 3.92s/it] +2025-02-05 14:04:26 - ERROR - stderr - 20%|█▉ | 4412/22434 [3:56:46<17:28:00, 3.49s/it] +2025-02-05 14:04:27 - ERROR - stderr - +2025-02-05 14:04:27 - ERROR - stderr - +2025-02-05 14:04:27 - INFO - stdout - {'loss': 0.9549, 'grad_norm': 1.0679858922958374, 'learning_rate': 1.857877021061398e-05, 'epoch': 0.59} +2025-02-05 14:04:27 - ERROR - stderr - 20%|█▉ | 4412/22434 [3:56:46<17:28:00, 3.49s/it] +2025-02-05 14:04:29 - ERROR - stderr - 20%|█▉ | 4413/22434 [3:56:49<16:02:05, 3.20s/it] +2025-02-05 14:04:29 - ERROR - stderr - +2025-02-05 14:04:29 - ERROR - 
stderr - +2025-02-05 14:04:29 - INFO - stdout - {'loss': 0.9281, 'grad_norm': 1.0416409969329834, 'learning_rate': 1.8578028244708747e-05, 'epoch': 0.59} +2025-02-05 14:04:29 - ERROR - stderr - 20%|█▉ | 4413/22434 [3:56:49<16:02:05, 3.20s/it] +2025-02-05 14:04:31 - ERROR - stderr - 20%|█▉ | 4414/22434 [3:56:51<14:56:52, 2.99s/it] +2025-02-05 14:04:32 - ERROR - stderr - +2025-02-05 14:04:32 - ERROR - stderr - +2025-02-05 14:04:32 - INFO - stdout - {'loss': 0.9468, 'grad_norm': 1.0587183237075806, 'learning_rate': 1.8577286100002723e-05, 'epoch': 0.59} +2025-02-05 14:04:32 - ERROR - stderr - 20%|█▉ | 4414/22434 [3:56:51<14:56:52, 2.99s/it] +2025-02-05 14:04:34 - ERROR - stderr - 20%|█▉ | 4415/22434 [3:56:54<14:08:09, 2.82s/it] +2025-02-05 14:04:34 - ERROR - stderr - +2025-02-05 14:04:34 - ERROR - stderr - +2025-02-05 14:04:34 - INFO - stdout - {'loss': 1.0694, 'grad_norm': 1.1815359592437744, 'learning_rate': 1.8576543776511378e-05, 'epoch': 0.59} +2025-02-05 14:04:34 - ERROR - stderr - 20%|█▉ | 4415/22434 [3:56:54<14:08:09, 2.82s/it] +2025-02-05 14:04:36 - ERROR - stderr - 20%|█▉ | 4416/22434 [3:56:56<13:32:51, 2.71s/it] +2025-02-05 14:04:36 - ERROR - stderr - +2025-02-05 14:04:36 - ERROR - stderr - +2025-02-05 14:04:36 - INFO - stdout - {'loss': 1.1438, 'grad_norm': 1.1404277086257935, 'learning_rate': 1.8575801274250185e-05, 'epoch': 0.59} +2025-02-05 14:04:36 - ERROR - stderr - 20%|█▉ | 4416/22434 [3:56:56<13:32:51, 2.71s/it] +2025-02-05 14:04:39 - ERROR - stderr - 20%|█▉ | 4417/22434 [3:56:59<13:11:47, 2.64s/it] +2025-02-05 14:04:39 - ERROR - stderr - +2025-02-05 14:04:39 - ERROR - stderr - +2025-02-05 14:04:39 - INFO - stdout - {'loss': 1.0331, 'grad_norm': 1.1776742935180664, 'learning_rate': 1.857505859323462e-05, 'epoch': 0.59} +2025-02-05 14:04:39 - ERROR - stderr - 20%|█▉ | 4417/22434 [3:56:59<13:11:47, 2.64s/it] +2025-02-05 14:04:41 - ERROR - stderr - 20%|█▉ | 4418/22434 [3:57:01<12:58:24, 2.59s/it] +2025-02-05 14:04:41 - ERROR - stderr - +2025-02-05 
14:04:41 - ERROR - stderr - +2025-02-05 14:04:41 - INFO - stdout - {'loss': 0.9235, 'grad_norm': 0.981890082359314, 'learning_rate': 1.8574315733480165e-05, 'epoch': 0.59} +2025-02-05 14:04:41 - ERROR - stderr - 20%|█▉ | 4418/22434 [3:57:01<12:58:24, 2.59s/it] +2025-02-05 14:04:44 - ERROR - stderr - 20%|█▉ | 4419/22434 [3:57:04<12:47:47, 2.56s/it] +2025-02-05 14:04:44 - ERROR - stderr - +2025-02-05 14:04:44 - ERROR - stderr - +2025-02-05 14:04:44 - INFO - stdout - {'loss': 0.9454, 'grad_norm': 1.0318905115127563, 'learning_rate': 1.85735726950023e-05, 'epoch': 0.59} +2025-02-05 14:04:44 - ERROR - stderr - 20%|█▉ | 4419/22434 [3:57:04<12:47:47, 2.56s/it] +2025-02-05 14:04:46 - ERROR - stderr - 20%|█▉ | 4420/22434 [3:57:06<12:40:14, 2.53s/it] +2025-02-05 14:04:46 - ERROR - stderr - +2025-02-05 14:04:46 - ERROR - stderr - +2025-02-05 14:04:46 - INFO - stdout - {'loss': 1.0348, 'grad_norm': 1.1517056226730347, 'learning_rate': 1.8572829477816522e-05, 'epoch': 0.59} +2025-02-05 14:04:46 - ERROR - stderr - 20%|█▉ | 4420/22434 [3:57:06<12:40:14, 2.53s/it] +2025-02-05 14:04:49 - ERROR - stderr - 20%|█▉ | 4421/22434 [3:57:08<12:30:39, 2.50s/it] +2025-02-05 14:04:49 - ERROR - stderr - +2025-02-05 14:04:49 - ERROR - stderr - +2025-02-05 14:04:49 - INFO - stdout - {'loss': 0.9597, 'grad_norm': 1.0160032510757446, 'learning_rate': 1.8572086081938315e-05, 'epoch': 0.59} +2025-02-05 14:04:49 - ERROR - stderr - 20%|█▉ | 4421/22434 [3:57:09<12:30:39, 2.50s/it] +2025-02-05 14:04:51 - ERROR - stderr - 20%|█▉ | 4422/22434 [3:57:11<12:29:43, 2.50s/it] +2025-02-05 14:04:51 - ERROR - stderr - +2025-02-05 14:04:51 - ERROR - stderr - +2025-02-05 14:04:51 - INFO - stdout - {'loss': 0.9574, 'grad_norm': 1.0701489448547363, 'learning_rate': 1.8571342507383175e-05, 'epoch': 0.59} +2025-02-05 14:04:51 - ERROR - stderr - 20%|█▉ | 4422/22434 [3:57:11<12:29:43, 2.50s/it] +2025-02-05 14:04:54 - ERROR - stderr - 20%|█▉ | 4423/22434 [3:57:13<12:26:43, 2.49s/it] +2025-02-05 14:04:54 - ERROR - stderr - 
+2025-02-05 14:04:54 - ERROR - stderr - +2025-02-05 14:04:54 - INFO - stdout - {'loss': 0.945, 'grad_norm': 1.0123778581619263, 'learning_rate': 1.8570598754166602e-05, 'epoch': 0.59} +2025-02-05 14:04:54 - ERROR - stderr - 20%|█▉ | 4423/22434 [3:57:13<12:26:43, 2.49s/it] +2025-02-05 14:04:56 - ERROR - stderr - 20%|█▉ | 4424/22434 [3:57:16<12:52:19, 2.57s/it] +2025-02-05 14:04:56 - ERROR - stderr - +2025-02-05 14:04:56 - ERROR - stderr - +2025-02-05 14:04:56 - INFO - stdout - {'loss': 1.0119, 'grad_norm': 1.249263048171997, 'learning_rate': 1.85698548223041e-05, 'epoch': 0.59} +2025-02-05 14:04:56 - ERROR - stderr - 20%|█▉ | 4424/22434 [3:57:16<12:52:19, 2.57s/it] +2025-02-05 14:05:21 - ERROR - stderr - 20%|█▉ | 4425/22434 [3:57:41<46:22:00, 9.27s/it] +2025-02-05 14:05:21 - ERROR - stderr - +2025-02-05 14:05:21 - ERROR - stderr - +2025-02-05 14:05:21 - INFO - stdout - {'loss': 0.9766, 'grad_norm': 1.0524859428405762, 'learning_rate': 1.8569110711811173e-05, 'epoch': 0.59} +2025-02-05 14:05:21 - ERROR - stderr - 20%|█▉ | 4425/22434 [3:57:41<46:22:00, 9.27s/it] +2025-02-05 14:05:32 - ERROR - stderr - 20%|█▉ | 4426/22434 [3:57:52<48:30:41, 9.70s/it] +2025-02-05 14:05:32 - ERROR - stderr - +2025-02-05 14:05:32 - ERROR - stderr - +2025-02-05 14:05:32 - INFO - stdout - {'loss': 0.9684, 'grad_norm': 1.2803441286087036, 'learning_rate': 1.8568366422703336e-05, 'epoch': 0.59} +2025-02-05 14:05:32 - ERROR - stderr - 20%|█▉ | 4426/22434 [3:57:52<48:30:41, 9.70s/it] +2025-02-05 14:06:11 - ERROR - stderr - 20%|█▉ | 4427/22434 [3:58:30<91:45:03, 18.34s/it] +2025-02-05 14:06:11 - ERROR - stderr - +2025-02-05 14:06:11 - ERROR - stderr - +2025-02-05 14:06:11 - INFO - stdout - {'loss': 0.9088, 'grad_norm': 1.005005955696106, 'learning_rate': 1.8567621954996098e-05, 'epoch': 0.59} +2025-02-05 14:06:11 - ERROR - stderr - 20%|█▉ | 4427/22434 [3:58:30<91:45:03, 18.34s/it] +2025-02-05 14:06:43 - ERROR - stderr - 20%|█▉ | 4428/22434 [3:59:03<113:24:10, 22.67s/it] +2025-02-05 14:06:43 - 
ERROR - stderr - +2025-02-05 14:06:43 - ERROR - stderr - +2025-02-05 14:06:43 - INFO - stdout - {'loss': 0.8976, 'grad_norm': 1.062455654144287, 'learning_rate': 1.8566877308704977e-05, 'epoch': 0.59} +2025-02-05 14:06:43 - ERROR - stderr - 20%|█▉ | 4428/22434 [3:59:03<113:24:10, 22.67s/it] +2025-02-05 14:07:19 - ERROR - stderr - 20%|█▉ | 4429/22434 [3:59:39<132:43:59, 26.54s/it] +2025-02-05 14:07:19 - ERROR - stderr - +2025-02-05 14:07:19 - ERROR - stderr - +2025-02-05 14:07:19 - INFO - stdout - {'loss': 1.0398, 'grad_norm': 1.0038727521896362, 'learning_rate': 1.8566132483845497e-05, 'epoch': 0.59} +2025-02-05 14:07:19 - ERROR - stderr - 20%|█▉ | 4429/22434 [3:59:39<132:43:59, 26.54s/it] +2025-02-05 14:08:01 - ERROR - stderr - 20%|█▉ | 4430/22434 [4:00:20<155:32:59, 31.10s/it] +2025-02-05 14:08:01 - ERROR - stderr - +2025-02-05 14:08:01 - ERROR - stderr - +2025-02-05 14:08:01 - INFO - stdout - {'loss': 0.9291, 'grad_norm': 1.0474847555160522, 'learning_rate': 1.8565387480433186e-05, 'epoch': 0.59} +2025-02-05 14:08:01 - ERROR - stderr - 20%|█▉ | 4430/22434 [4:00:20<155:32:59, 31.10s/it] +2025-02-05 14:08:33 - ERROR - stderr - 20%|█▉ | 4431/22434 [4:00:53<157:55:40, 31.58s/it] +2025-02-05 14:08:33 - ERROR - stderr - +2025-02-05 14:08:33 - ERROR - stderr - +2025-02-05 14:08:33 - INFO - stdout - {'loss': 1.0637, 'grad_norm': 1.1138916015625, 'learning_rate': 1.8564642298483565e-05, 'epoch': 0.59} +2025-02-05 14:08:33 - ERROR - stderr - 20%|█▉ | 4431/22434 [4:00:53<157:55:40, 31.58s/it] +2025-02-05 14:09:01 - ERROR - stderr - 20%|█▉ | 4432/22434 [4:01:20<151:24:33, 30.28s/it] +2025-02-05 14:09:01 - ERROR - stderr - +2025-02-05 14:09:01 - ERROR - stderr - +2025-02-05 14:09:01 - INFO - stdout - {'loss': 0.9454, 'grad_norm': 1.0432411432266235, 'learning_rate': 1.8563896938012173e-05, 'epoch': 0.59} +2025-02-05 14:09:01 - ERROR - stderr - 20%|█▉ | 4432/22434 [4:01:20<151:24:33, 30.28s/it] +2025-02-05 14:09:19 - ERROR - stderr - 20%|█▉ | 4433/22434 [4:01:39<133:25:36, 
26.68s/it] +2025-02-05 14:09:19 - ERROR - stderr - +2025-02-05 14:09:19 - ERROR - stderr - +2025-02-05 14:09:19 - INFO - stdout - {'loss': 0.8906, 'grad_norm': 1.147680401802063, 'learning_rate': 1.8563151399034543e-05, 'epoch': 0.59} +2025-02-05 14:09:19 - ERROR - stderr - 20%|█▉ | 4433/22434 [4:01:39<133:25:36, 26.68s/it] +2025-02-05 14:10:05 - ERROR - stderr - 20%|█▉ | 4434/22434 [4:02:25<162:16:42, 32.46s/it] +2025-02-05 14:10:05 - ERROR - stderr - +2025-02-05 14:10:05 - ERROR - stderr - +2025-02-05 14:10:05 - INFO - stdout - {'loss': 1.0408, 'grad_norm': 1.134974718093872, 'learning_rate': 1.8562405681566217e-05, 'epoch': 0.59} +2025-02-05 14:10:05 - ERROR - stderr - 20%|█▉ | 4434/22434 [4:02:25<162:16:42, 32.46s/it] +2025-02-05 14:10:52 - ERROR - stderr - 20%|█▉ | 4435/22434 [4:03:12<185:00:47, 37.00s/it] +2025-02-05 14:10:52 - ERROR - stderr - +2025-02-05 14:10:52 - ERROR - stderr - +2025-02-05 14:10:52 - INFO - stdout - {'loss': 0.9175, 'grad_norm': 1.0291316509246826, 'learning_rate': 1.8561659785622737e-05, 'epoch': 0.59} +2025-02-05 14:10:52 - ERROR - stderr - 20%|█▉ | 4435/22434 [4:03:12<185:00:47, 37.00s/it] +2025-02-05 14:10:55 - ERROR - stderr - 20%|█▉ | 4436/22434 [4:03:15<133:12:43, 26.65s/it] +2025-02-05 14:10:55 - ERROR - stderr - +2025-02-05 14:10:55 - ERROR - stderr - +2025-02-05 14:10:55 - INFO - stdout - {'loss': 1.0203, 'grad_norm': 0.9544959664344788, 'learning_rate': 1.8560913711219653e-05, 'epoch': 0.59} +2025-02-05 14:10:55 - ERROR - stderr - 20%|█▉ | 4436/22434 [4:03:15<133:12:43, 26.65s/it] +2025-02-05 14:11:41 - ERROR - stderr - 20%|█▉ | 4437/22434 [4:04:01<162:26:14, 32.49s/it] +2025-02-05 14:11:41 - ERROR - stderr - +2025-02-05 14:11:41 - ERROR - stderr - +2025-02-05 14:11:41 - INFO - stdout - {'loss': 0.9704, 'grad_norm': 1.1420345306396484, 'learning_rate': 1.856016745837251e-05, 'epoch': 0.59} +2025-02-05 14:11:41 - ERROR - stderr - 20%|█▉ | 4437/22434 [4:04:01<162:26:14, 32.49s/it] +2025-02-05 14:12:13 - ERROR - stderr - 20%|█▉ 
| 4438/22434 [4:04:33<162:10:55, 32.44s/it] +2025-02-05 14:12:13 - ERROR - stderr - +2025-02-05 14:12:13 - ERROR - stderr - +2025-02-05 14:12:13 - INFO - stdout - {'loss': 0.962, 'grad_norm': 1.1274502277374268, 'learning_rate': 1.8559421027096873e-05, 'epoch': 0.59} +2025-02-05 14:12:13 - ERROR - stderr - 20%|█▉ | 4438/22434 [4:04:33<162:10:55, 32.44s/it] +2025-02-05 14:12:16 - ERROR - stderr - 20%|█▉ | 4439/22434 [4:04:36<117:18:10, 23.47s/it] +2025-02-05 14:12:16 - ERROR - stderr - +2025-02-05 14:12:16 - ERROR - stderr - +2025-02-05 14:12:16 - INFO - stdout - {'loss': 0.9221, 'grad_norm': 1.0310615301132202, 'learning_rate': 1.8558674417408293e-05, 'epoch': 0.59} +2025-02-05 14:12:16 - ERROR - stderr - 20%|█▉ | 4439/22434 [4:04:36<117:18:10, 23.47s/it] +2025-02-05 14:13:00 - ERROR - stderr - 20%|█▉ | 4440/22434 [4:05:20<148:00:54, 29.61s/it] +2025-02-05 14:13:00 - ERROR - stderr - +2025-02-05 14:13:00 - ERROR - stderr - +2025-02-05 14:13:00 - INFO - stdout - {'loss': 0.9358, 'grad_norm': 1.0321381092071533, 'learning_rate': 1.8557927629322333e-05, 'epoch': 0.59} +2025-02-05 14:13:00 - ERROR - stderr - 20%|█▉ | 4440/22434 [4:05:20<148:00:54, 29.61s/it] +2025-02-05 14:13:48 - ERROR - stderr - 20%|█▉ | 4441/22434 [4:06:08<175:35:22, 35.13s/it] +2025-02-05 14:13:48 - ERROR - stderr - +2025-02-05 14:13:48 - ERROR - stderr - +2025-02-05 14:13:48 - INFO - stdout - {'loss': 0.9564, 'grad_norm': 1.0985547304153442, 'learning_rate': 1.8557180662854565e-05, 'epoch': 0.59} +2025-02-05 14:13:48 - ERROR - stderr - 20%|█▉ | 4441/22434 [4:06:08<175:35:22, 35.13s/it] +2025-02-05 14:14:03 - ERROR - stderr - 20%|█▉ | 4442/22434 [4:06:22<145:13:24, 29.06s/it] +2025-02-05 14:14:03 - ERROR - stderr - +2025-02-05 14:14:03 - ERROR - stderr - +2025-02-05 14:14:03 - INFO - stdout - {'loss': 0.9261, 'grad_norm': 1.0813101530075073, 'learning_rate': 1.855643351802055e-05, 'epoch': 0.59} +2025-02-05 14:14:03 - ERROR - stderr - 20%|█▉ | 4442/22434 [4:06:23<145:13:24, 29.06s/it] +2025-02-05
14:14:36 - ERROR - stderr - 20%|█▉ | 4443/22434 [4:06:56<151:27:41, 30.31s/it] +2025-02-05 14:14:36 - ERROR - stderr - +2025-02-05 14:14:36 - ERROR - stderr - +2025-02-05 14:14:36 - INFO - stdout - {'loss': 0.9867, 'grad_norm': 1.0591710805892944, 'learning_rate': 1.8555686194835868e-05, 'epoch': 0.59} +2025-02-05 14:14:36 - ERROR - stderr - 20%|█▉ | 4443/22434 [4:06:56<151:27:41, 30.31s/it] +2025-02-05 14:14:49 - ERROR - stderr - 20%|█▉ | 4444/22434 [4:07:09<125:17:15, 25.07s/it] +2025-02-05 14:14:49 - ERROR - stderr - +2025-02-05 14:14:49 - ERROR - stderr - +2025-02-05 14:14:49 - INFO - stdout - {'loss': 1.0344, 'grad_norm': 1.1935772895812988, 'learning_rate': 1.8554938693316093e-05, 'epoch': 0.59} +2025-02-05 14:14:49 - ERROR - stderr - 20%|█▉ | 4444/22434 [4:07:09<125:17:15, 25.07s/it] +2025-02-05 14:15:37 - ERROR - stderr - 20%|█▉ | 4445/22434 [4:07:57<159:36:28, 31.94s/it] +2025-02-05 14:15:37 - ERROR - stderr - +2025-02-05 14:15:37 - ERROR - stderr - +2025-02-05 14:15:37 - INFO - stdout - {'loss': 0.8892, 'grad_norm': 1.1170843839645386, 'learning_rate': 1.855419101347681e-05, 'epoch': 0.59} +2025-02-05 14:15:37 - ERROR - stderr - 20%|█▉ | 4445/22434 [4:07:57<159:36:28, 31.94s/it] +2025-02-05 14:15:52 - ERROR - stderr - 20%|█▉ | 4446/22434 [4:08:11<134:04:07, 26.83s/it] +2025-02-05 14:15:52 - ERROR - stderr - +2025-02-05 14:15:52 - ERROR - stderr - +2025-02-05 14:15:52 - INFO - stdout - {'loss': 0.8558, 'grad_norm': 1.0175094604492188, 'learning_rate': 1.8553443155333596e-05, 'epoch': 0.59} +2025-02-05 14:15:52 - ERROR - stderr - 20%|█▉ | 4446/22434 [4:08:11<134:04:07, 26.83s/it] +2025-02-05 14:16:24 - ERROR - stderr - 20%|█▉ | 4447/22434 [4:08:44<142:37:44, 28.55s/it] +2025-02-05 14:16:24 - ERROR - stderr - +2025-02-05 14:16:24 - ERROR - stderr - +2025-02-05 14:16:24 - INFO - stdout - {'loss': 0.9747, 'grad_norm': 1.2034450769424438, 'learning_rate': 1.855269511890205e-05, 'epoch': 0.59} +2025-02-05 14:16:24 - ERROR - stderr - 20%|█▉ | 4447/22434 
[4:08:44<142:37:44, 28.55s/it] +2025-02-05 14:16:31 - ERROR - stderr - 20%|█▉ | 4448/22434 [4:08:51<110:10:18, 22.05s/it] +2025-02-05 14:16:31 - ERROR - stderr - +2025-02-05 14:16:31 - ERROR - stderr - +2025-02-05 14:16:31 - INFO - stdout - {'loss': 0.858, 'grad_norm': 1.1066092252731323, 'learning_rate': 1.8551946904197754e-05, 'epoch': 0.59} +2025-02-05 14:16:31 - ERROR - stderr - 20%|█▉ | 4448/22434 [4:08:51<110:10:18, 22.05s/it] +2025-02-05 14:17:18 - ERROR - stderr - 20%|█▉ | 4449/22434 [4:09:37<146:57:37, 29.42s/it] +2025-02-05 14:17:18 - ERROR - stderr - +2025-02-05 14:17:18 - ERROR - stderr - +2025-02-05 14:17:18 - INFO - stdout - {'loss': 0.8943, 'grad_norm': 1.0075006484985352, 'learning_rate': 1.8551198511236308e-05, 'epoch': 0.59} +2025-02-05 14:17:18 - ERROR - stderr - 20%|█▉ | 4449/22434 [4:09:38<146:57:37, 29.42s/it] +2025-02-05 14:17:20 - ERROR - stderr - 20%|█▉ | 4450/22434 [4:09:40<106:42:11, 21.36s/it] +2025-02-05 14:17:20 - ERROR - stderr - +2025-02-05 14:17:20 - ERROR - stderr - +2025-02-05 14:17:20 - INFO - stdout - {'loss': 0.9346, 'grad_norm': 1.1339243650436401, 'learning_rate': 1.855044994003331e-05, 'epoch': 0.6} +2025-02-05 14:17:20 - ERROR - stderr - 20%|█▉ | 4450/22434 [4:09:40<106:42:11, 21.36s/it] +2025-02-05 14:17:46 - ERROR - stderr - 20%|█▉ | 4451/22434 [4:10:06<113:15:16, 22.67s/it] +2025-02-05 14:17:46 - ERROR - stderr - +2025-02-05 14:17:46 - ERROR - stderr - +2025-02-05 14:17:46 - INFO - stdout - {'loss': 1.044, 'grad_norm': 1.1379661560058594, 'learning_rate': 1.854970119060437e-05, 'epoch': 0.6} +2025-02-05 14:17:46 - ERROR - stderr - 20%|█▉ | 4451/22434 [4:10:06<113:15:16, 22.67s/it] +2025-02-05 14:17:49 - ERROR - stderr - 20%|█▉ | 4452/22434 [4:10:08<83:06:13, 16.64s/it] +2025-02-05 14:17:49 - ERROR - stderr - +2025-02-05 14:17:49 - ERROR - stderr - +2025-02-05 14:17:49 - INFO - stdout - {'loss': 0.9572, 'grad_norm': 1.0578159093856812, 'learning_rate': 1.854895226296509e-05, 'epoch': 0.6} +2025-02-05 14:17:49 - ERROR - 
stderr - 20%|█▉ | 4452/22434 [4:10:08<83:06:13, 16.64s/it] +2025-02-05 14:17:51 - ERROR - stderr - 20%|█▉ | 4453/22434 [4:10:11<61:55:05, 12.40s/it] +2025-02-05 14:17:51 - ERROR - stderr - +2025-02-05 14:17:51 - ERROR - stderr - +2025-02-05 14:17:51 - INFO - stdout - {'loss': 0.9851, 'grad_norm': 1.1160355806350708, 'learning_rate': 1.8548203157131074e-05, 'epoch': 0.6} +2025-02-05 14:17:51 - ERROR - stderr - 20%|█▉ | 4453/22434 [4:10:11<61:55:05, 12.40s/it] +2025-02-05 14:17:53 - ERROR - stderr - 20%|█▉ | 4454/22434 [4:10:13<46:57:29, 9.40s/it] +2025-02-05 14:17:54 - ERROR - stderr - +2025-02-05 14:17:54 - ERROR - stderr - +2025-02-05 14:17:54 - INFO - stdout - {'loss': 1.0036, 'grad_norm': 1.2497044801712036, 'learning_rate': 1.854745387311795e-05, 'epoch': 0.6} +2025-02-05 14:17:54 - ERROR - stderr - 20%|█▉ | 4454/22434 [4:10:13<46:57:29, 9.40s/it] +2025-02-05 14:17:56 - ERROR - stderr - 20%|█▉ | 4455/22434 [4:10:16<36:36:09, 7.33s/it] +2025-02-05 14:17:56 - ERROR - stderr - +2025-02-05 14:17:56 - ERROR - stderr - +2025-02-05 14:17:56 - INFO - stdout - {'loss': 1.0284, 'grad_norm': 1.0638896226882935, 'learning_rate': 1.8546704410941325e-05, 'epoch': 0.6} +2025-02-05 14:17:56 - ERROR - stderr - 20%|█▉ | 4455/22434 [4:10:16<36:36:09, 7.33s/it] +2025-02-05 14:18:39 - ERROR - stderr - 20%|█▉ | 4456/22434 [4:10:59<89:42:13, 17.96s/it] +2025-02-05 14:18:39 - ERROR - stderr - +2025-02-05 14:18:39 - ERROR - stderr - +2025-02-05 14:18:39 - INFO - stdout - {'loss': 0.8751, 'grad_norm': 1.0296021699905396, 'learning_rate': 1.8545954770616825e-05, 'epoch': 0.6} +2025-02-05 14:18:39 - ERROR - stderr - 20%|█▉ | 4456/22434 [4:10:59<89:42:13, 17.96s/it] +2025-02-05 14:19:03 - ERROR - stderr - 20%|█▉ | 4457/22434 [4:11:23<98:54:09, 19.81s/it] +2025-02-05 14:19:03 - ERROR - stderr - +2025-02-05 14:19:03 - ERROR - stderr - +2025-02-05 14:19:03 - INFO - stdout - {'loss': 0.9918, 'grad_norm': 1.1330212354660034, 'learning_rate': 1.8545204952160077e-05, 'epoch': 0.6} +2025-02-05 
14:19:03 - ERROR - stderr - 20%|█▉ | 4457/22434 [4:11:23<98:54:09, 19.81s/it] +2025-02-05 14:19:26 - ERROR - stderr - 20%|█▉ | 4458/22434 [4:11:46<103:52:03, 20.80s/it] +2025-02-05 14:19:26 - ERROR - stderr - +2025-02-05 14:19:26 - ERROR - stderr - +2025-02-05 14:19:26 - INFO - stdout - {'loss': 1.0835, 'grad_norm': 1.1670010089874268, 'learning_rate': 1.8544454955586707e-05, 'epoch': 0.6} +2025-02-05 14:19:26 - ERROR - stderr - 20%|█▉ | 4458/22434 [4:11:46<103:52:03, 20.80s/it] +2025-02-05 14:19:28 - ERROR - stderr - 20%|█▉ | 4459/22434 [4:11:48<76:19:13, 15.29s/it] +2025-02-05 14:19:28 - ERROR - stderr - +2025-02-05 14:19:28 - ERROR - stderr - +2025-02-05 14:19:28 - INFO - stdout - {'loss': 0.8798, 'grad_norm': 1.0553812980651855, 'learning_rate': 1.8543704780912354e-05, 'epoch': 0.6} +2025-02-05 14:19:28 - ERROR - stderr - 20%|█▉ | 4459/22434 [4:11:48<76:19:13, 15.29s/it] +2025-02-05 14:19:53 - ERROR - stderr - 20%|█▉ | 4460/22434 [4:12:13<90:50:02, 18.19s/it] +2025-02-05 14:19:53 - ERROR - stderr - +2025-02-05 14:19:53 - ERROR - stderr - +2025-02-05 14:19:53 - INFO - stdout - {'loss': 1.0532, 'grad_norm': 1.0852761268615723, 'learning_rate': 1.8542954428152647e-05, 'epoch': 0.6} +2025-02-05 14:19:53 - ERROR - stderr - 20%|█▉ | 4460/22434 [4:12:13<90:50:02, 18.19s/it] +2025-02-05 14:19:56 - ERROR - stderr - 20%|█▉ | 4461/22434 [4:12:16<67:19:54, 13.49s/it] +2025-02-05 14:19:56 - ERROR - stderr - +2025-02-05 14:19:56 - ERROR - stderr - +2025-02-05 14:19:56 - INFO - stdout - {'loss': 1.0638, 'grad_norm': 1.1600054502487183, 'learning_rate': 1.8542203897323226e-05, 'epoch': 0.6} +2025-02-05 14:19:56 - ERROR - stderr - 20%|█▉ | 4461/22434 [4:12:16<67:19:54, 13.49s/it] +2025-02-05 14:20:29 - ERROR - stderr - 20%|█▉ | 4462/22434 [4:12:49<96:57:37, 19.42s/it] +2025-02-05 14:20:29 - ERROR - stderr - +2025-02-05 14:20:29 - ERROR - stderr - +2025-02-05 14:20:29 - INFO - stdout - {'loss': 0.9601, 'grad_norm': 1.0125837326049805, 'learning_rate': 1.8541453188439745e-05, 
'epoch': 0.6} +2025-02-05 14:20:29 - ERROR - stderr - 20%|█▉ | 4462/22434 [4:12:49<96:57:37, 19.42s/it] +2025-02-05 14:20:36 - ERROR - stderr - 20%|█▉ | 4463/22434 [4:12:56<78:05:44, 15.64s/it] +2025-02-05 14:20:36 - ERROR - stderr - +2025-02-05 14:20:36 - ERROR - stderr - +2025-02-05 14:20:36 - INFO - stdout - {'loss': 1.0022, 'grad_norm': 1.2771514654159546, 'learning_rate': 1.854070230151784e-05, 'epoch': 0.6} +2025-02-05 14:20:36 - ERROR - stderr - 20%|█▉ | 4463/22434 [4:12:56<78:05:44, 15.64s/it] +2025-02-05 14:21:07 - ERROR - stderr - 20%|█▉ | 4464/22434 [4:13:27<101:34:54, 20.35s/it] +2025-02-05 14:21:07 - ERROR - stderr - +2025-02-05 14:21:07 - ERROR - stderr - +2025-02-05 14:21:07 - INFO - stdout - {'loss': 0.9948, 'grad_norm': 1.2395879030227661, 'learning_rate': 1.8539951236573173e-05, 'epoch': 0.6} +2025-02-05 14:21:07 - ERROR - stderr - 20%|█▉ | 4464/22434 [4:13:27<101:34:54, 20.35s/it] +2025-02-05 14:21:10 - ERROR - stderr - 20%|█▉ | 4465/22434 [4:13:30<74:47:40, 14.98s/it] +2025-02-05 14:21:10 - ERROR - stderr - +2025-02-05 14:21:10 - ERROR - stderr - +2025-02-05 14:21:10 - INFO - stdout - {'loss': 1.0318, 'grad_norm': 1.129096508026123, 'learning_rate': 1.853919999362139e-05, 'epoch': 0.6} +2025-02-05 14:21:10 - ERROR - stderr - 20%|█▉ | 4465/22434 [4:13:30<74:47:40, 14.98s/it] +2025-02-05 14:21:12 - ERROR - stderr - 20%|█▉ | 4466/22434 [4:13:32<56:04:41, 11.24s/it] +2025-02-05 14:21:12 - ERROR - stderr - +2025-02-05 14:21:12 - ERROR - stderr - +2025-02-05 14:21:12 - INFO - stdout - {'loss': 0.9854, 'grad_norm': 1.0584124326705933, 'learning_rate': 1.853844857267816e-05, 'epoch': 0.6} +2025-02-05 14:21:12 - ERROR - stderr - 20%|█▉ | 4466/22434 [4:13:32<56:04:41, 11.24s/it] +2025-02-05 14:21:15 - ERROR - stderr - 20%|█▉ | 4467/22434 [4:13:35<43:05:20, 8.63s/it] +2025-02-05 14:21:15 - ERROR - stderr - +2025-02-05 14:21:15 - ERROR - stderr - +2025-02-05 14:21:15 - INFO - stdout - {'loss': 0.918, 'grad_norm': 1.131452202796936, 'learning_rate': 
1.8537696973759135e-05, 'epoch': 0.6} +2025-02-05 14:21:15 - ERROR - stderr - 20%|█▉ | 4467/22434 [4:13:35<43:05:20, 8.63s/it] +2025-02-05 14:22:13 - ERROR - stderr - 20%|█▉ | 4468/22434 [4:14:33<117:01:01, 23.45s/it] +2025-02-05 14:22:13 - ERROR - stderr - +2025-02-05 14:22:13 - ERROR - stderr - +2025-02-05 14:22:13 - INFO - stdout - {'loss': 1.0887, 'grad_norm': 1.1946680545806885, 'learning_rate': 1.853694519687999e-05, 'epoch': 0.6} +2025-02-05 14:22:13 - ERROR - stderr - 20%|█▉ | 4468/22434 [4:14:33<117:01:01, 23.45s/it] +2025-02-05 14:22:23 - ERROR - stderr - 20%|█▉ | 4469/22434 [4:14:42<96:39:18, 19.37s/it] +2025-02-05 14:22:23 - ERROR - stderr - +2025-02-05 14:22:23 - ERROR - stderr - +2025-02-05 14:22:23 - INFO - stdout - {'loss': 0.9997, 'grad_norm': 1.147078514099121, 'learning_rate': 1.8536193242056386e-05, 'epoch': 0.6} +2025-02-05 14:22:23 - ERROR - stderr - 20%|█▉ | 4469/22434 [4:14:43<96:39:18, 19.37s/it] +2025-02-05 14:22:25 - ERROR - stderr - 20%|█▉ | 4470/22434 [4:14:45<71:18:30, 14.29s/it] +2025-02-05 14:22:25 - ERROR - stderr - +2025-02-05 14:22:25 - ERROR - stderr - +2025-02-05 14:22:25 - INFO - stdout - {'loss': 1.1428, 'grad_norm': 1.224615216255188, 'learning_rate': 1.8535441109304006e-05, 'epoch': 0.6} +2025-02-05 14:22:25 - ERROR - stderr - 20%|█▉ | 4470/22434 [4:14:45<71:18:30, 14.29s/it] +2025-02-05 14:22:28 - ERROR - stderr - 20%|█▉ | 4471/22434 [4:14:47<53:41:48, 10.76s/it] +2025-02-05 14:22:28 - ERROR - stderr - +2025-02-05 14:22:28 - ERROR - stderr - +2025-02-05 14:22:28 - INFO - stdout - {'loss': 1.0077, 'grad_norm': 1.0773061513900757, 'learning_rate': 1.8534688798638524e-05, 'epoch': 0.6} +2025-02-05 14:22:28 - ERROR - stderr - 20%|█▉ | 4471/22434 [4:14:47<53:41:48, 10.76s/it] +2025-02-05 14:22:30 - ERROR - stderr - 20%|█▉ | 4472/22434 [4:14:50<41:20:23, 8.29s/it] +2025-02-05 14:22:30 - ERROR - stderr - +2025-02-05 14:22:30 - ERROR - stderr - +2025-02-05 14:22:30 - INFO - stdout - {'loss': 0.988, 'grad_norm': 1.1713714599609375, 
'learning_rate': 1.853393631007562e-05, 'epoch': 0.6} +2025-02-05 14:22:30 - ERROR - stderr - 20%|█▉ | 4472/22434 [4:14:50<41:20:23, 8.29s/it] +2025-02-05 14:22:33 - ERROR - stderr - 20%|█▉ | 4473/22434 [4:14:53<32:47:48, 6.57s/it] +2025-02-05 14:22:33 - ERROR - stderr - +2025-02-05 14:22:33 - ERROR - stderr - +2025-02-05 14:22:33 - INFO - stdout - {'loss': 0.9334, 'grad_norm': 1.0535506010055542, 'learning_rate': 1.853318364363098e-05, 'epoch': 0.6} +2025-02-05 14:22:33 - ERROR - stderr - 20%|█▉ | 4473/22434 [4:14:53<32:47:48, 6.57s/it] +2025-02-05 14:22:35 - ERROR - stderr - 20%|█▉ | 4474/22434 [4:14:55<26:46:35, 5.37s/it] +2025-02-05 14:22:35 - ERROR - stderr - +2025-02-05 14:22:35 - ERROR - stderr - +2025-02-05 14:22:35 - INFO - stdout - {'loss': 1.0317, 'grad_norm': 1.1029497385025024, 'learning_rate': 1.853243079932029e-05, 'epoch': 0.6} +2025-02-05 14:22:35 - ERROR - stderr - 20%|█▉ | 4474/22434 [4:14:55<26:46:35, 5.37s/it] +2025-02-05 14:22:38 - ERROR - stderr - 20%|█▉ | 4475/22434 [4:14:58<22:51:33, 4.58s/it] +2025-02-05 14:22:38 - ERROR - stderr - +2025-02-05 14:22:38 - ERROR - stderr - +2025-02-05 14:22:38 - INFO - stdout - {'loss': 0.9816, 'grad_norm': 1.0632649660110474, 'learning_rate': 1.8531677777159246e-05, 'epoch': 0.6} +2025-02-05 14:22:38 - ERROR - stderr - 20%|█▉ | 4475/22434 [4:14:58<22:51:33, 4.58s/it] +2025-02-05 14:22:41 - ERROR - stderr - 20%|█▉ | 4476/22434 [4:15:00<19:49:17, 3.97s/it] +2025-02-05 14:22:41 - ERROR - stderr - +2025-02-05 14:22:41 - ERROR - stderr - +2025-02-05 14:22:41 - INFO - stdout - {'loss': 1.1314, 'grad_norm': 1.0627434253692627, 'learning_rate': 1.8530924577163546e-05, 'epoch': 0.6} +2025-02-05 14:22:41 - ERROR - stderr - 20%|█▉ | 4476/22434 [4:15:00<19:49:17, 3.97s/it] +2025-02-05 14:22:43 - ERROR - stderr - 20%|█▉ | 4477/22434 [4:15:03<17:40:38, 3.54s/it] +2025-02-05 14:22:43 - ERROR - stderr - +2025-02-05 14:22:43 - ERROR - stderr - +2025-02-05 14:22:43 - INFO - stdout - {'loss': 1.0636, 'grad_norm': 
1.1574668884277344, 'learning_rate': 1.853017119934888e-05, 'epoch': 0.6} +2025-02-05 14:22:43 - ERROR - stderr - 20%|█▉ | 4477/22434 [4:15:03<17:40:38, 3.54s/it] +2025-02-05 14:22:46 - ERROR - stderr - 20%|█▉ | 4478/22434 [4:15:05<16:04:39, 3.22s/it] +2025-02-05 14:22:46 - ERROR - stderr - +2025-02-05 14:22:46 - ERROR - stderr - +2025-02-05 14:22:46 - INFO - stdout - {'loss': 1.0335, 'grad_norm': 1.1443142890930176, 'learning_rate': 1.852941764373096e-05, 'epoch': 0.6} +2025-02-05 14:22:46 - ERROR - stderr - 20%|█▉ | 4478/22434 [4:15:05<16:04:39, 3.22s/it] +2025-02-05 14:22:48 - ERROR - stderr - 20%|█▉ | 4479/22434 [4:15:08<15:08:28, 3.04s/it] +2025-02-05 14:22:48 - ERROR - stderr - +2025-02-05 14:22:48 - ERROR - stderr - +2025-02-05 14:22:48 - INFO - stdout - {'loss': 0.9357, 'grad_norm': 1.0488827228546143, 'learning_rate': 1.8528663910325492e-05, 'epoch': 0.6} +2025-02-05 14:22:48 - ERROR - stderr - 20%|█▉ | 4479/22434 [4:15:08<15:08:28, 3.04s/it] +2025-02-05 14:22:51 - ERROR - stderr - 20%|█▉ | 4480/22434 [4:15:11<14:30:23, 2.91s/it] +2025-02-05 14:22:51 - ERROR - stderr - +2025-02-05 14:22:51 - ERROR - stderr - +2025-02-05 14:22:51 - INFO - stdout - {'loss': 0.9756, 'grad_norm': 1.0668854713439941, 'learning_rate': 1.852790999914819e-05, 'epoch': 0.6} +2025-02-05 14:22:51 - ERROR - stderr - 20%|█▉ | 4480/22434 [4:15:11<14:30:23, 2.91s/it] +2025-02-05 14:22:53 - ERROR - stderr - 20%|█▉ | 4481/22434 [4:15:13<13:47:35, 2.77s/it] +2025-02-05 14:22:53 - ERROR - stderr - +2025-02-05 14:22:53 - ERROR - stderr - +2025-02-05 14:22:53 - INFO - stdout - {'loss': 0.9152, 'grad_norm': 1.1215001344680786, 'learning_rate': 1.852715591021476e-05, 'epoch': 0.6} +2025-02-05 14:22:53 - ERROR - stderr - 20%|█▉ | 4481/22434 [4:15:13<13:47:35, 2.77s/it] +2025-02-05 14:22:56 - ERROR - stderr - 20%|█▉ | 4482/22434 [4:15:15<13:18:33, 2.67s/it] +2025-02-05 14:22:56 - ERROR - stderr - +2025-02-05 14:22:56 - ERROR - stderr - +2025-02-05 14:22:56 - INFO - stdout - {'loss': 1.0071, 
'grad_norm': 1.1761562824249268, 'learning_rate': 1.8526401643540924e-05, 'epoch': 0.6} +2025-02-05 14:22:56 - ERROR - stderr - 20%|█▉ | 4482/22434 [4:15:16<13:18:33, 2.67s/it] +2025-02-05 14:22:58 - ERROR - stderr - 20%|█▉ | 4483/22434 [4:15:18<13:05:47, 2.63s/it] +2025-02-05 14:22:58 - ERROR - stderr - +2025-02-05 14:22:58 - ERROR - stderr - +2025-02-05 14:22:58 - INFO - stdout - {'loss': 0.9649, 'grad_norm': 1.0299917459487915, 'learning_rate': 1.8525647199142406e-05, 'epoch': 0.6} +2025-02-05 14:22:58 - ERROR - stderr - 20%|█▉ | 4483/22434 [4:15:18<13:05:47, 2.63s/it] +2025-02-05 14:23:01 - ERROR - stderr - 20%|█▉ | 4484/22434 [4:15:21<13:01:01, 2.61s/it] +2025-02-05 14:23:01 - ERROR - stderr - +2025-02-05 14:23:01 - ERROR - stderr - +2025-02-05 14:23:01 - INFO - stdout - {'loss': 0.9146, 'grad_norm': 1.1721644401550293, 'learning_rate': 1.8524892577034928e-05, 'epoch': 0.6} +2025-02-05 14:23:01 - ERROR - stderr - 20%|█▉ | 4484/22434 [4:15:21<13:01:01, 2.61s/it] +2025-02-05 14:23:14 - ERROR - stderr - 20%|█▉ | 4485/22434 [4:15:34<28:36:56, 5.74s/it] +2025-02-05 14:23:14 - ERROR - stderr - +2025-02-05 14:23:14 - ERROR - stderr - +2025-02-05 14:23:14 - INFO - stdout - {'loss': 0.8912, 'grad_norm': 1.0512962341308594, 'learning_rate': 1.8524137777234226e-05, 'epoch': 0.6} +2025-02-05 14:23:14 - ERROR - stderr - 20%|█▉ | 4485/22434 [4:15:34<28:36:56, 5.74s/it] +2025-02-05 14:23:25 - ERROR - stderr - 20%|█▉ | 4486/22434 [4:15:45<37:07:40, 7.45s/it] +2025-02-05 14:23:25 - ERROR - stderr - +2025-02-05 14:23:25 - ERROR - stderr - +2025-02-05 14:23:25 - INFO - stdout - {'loss': 1.0982, 'grad_norm': 1.1344468593597412, 'learning_rate': 1.8523382799756024e-05, 'epoch': 0.6} +2025-02-05 14:23:25 - ERROR - stderr - 20%|█▉ | 4486/22434 [4:15:45<37:07:40, 7.45s/it] +2025-02-05 14:23:43 - ERROR - stderr - 20%|██ | 4487/22434 [4:16:03<53:11:58, 10.67s/it] +2025-02-05 14:23:44 - ERROR - stderr - +2025-02-05 14:23:44 - ERROR - stderr - +2025-02-05 14:23:44 - INFO - stdout - 
{'loss': 0.9431, 'grad_norm': 1.016634464263916, 'learning_rate': 1.8522627644616066e-05, 'epoch': 0.6} +2025-02-05 14:23:44 - ERROR - stderr - 20%|██ | 4487/22434 [4:16:03<53:11:58, 10.67s/it] +2025-02-05 14:24:07 - ERROR - stderr - 20%|██ | 4488/22434 [4:16:27<72:47:26, 14.60s/it] +2025-02-05 14:24:07 - ERROR - stderr - +2025-02-05 14:24:07 - ERROR - stderr - +2025-02-05 14:24:07 - INFO - stdout - {'loss': 0.8622, 'grad_norm': 1.048527479171753, 'learning_rate': 1.852187231183009e-05, 'epoch': 0.6} +2025-02-05 14:24:07 - ERROR - stderr - 20%|██ | 4488/22434 [4:16:27<72:47:26, 14.60s/it] +2025-02-05 14:24:21 - ERROR - stderr - 20%|██ | 4489/22434 [4:16:41<71:57:25, 14.44s/it] +2025-02-05 14:24:21 - ERROR - stderr - +2025-02-05 14:24:21 - ERROR - stderr - +2025-02-05 14:24:21 - INFO - stdout - {'loss': 1.0529, 'grad_norm': 1.2555572986602783, 'learning_rate': 1.852111680141384e-05, 'epoch': 0.6} +2025-02-05 14:24:21 - ERROR - stderr - 20%|██ | 4489/22434 [4:16:41<71:57:25, 14.44s/it] +2025-02-05 14:24:36 - ERROR - stderr - 20%|██ | 4490/22434 [4:16:55<71:50:29, 14.41s/it] +2025-02-05 14:24:36 - ERROR - stderr - +2025-02-05 14:24:36 - ERROR - stderr - +2025-02-05 14:24:36 - INFO - stdout - {'loss': 1.0224, 'grad_norm': 1.0794832706451416, 'learning_rate': 1.8520361113383068e-05, 'epoch': 0.6} +2025-02-05 14:24:36 - ERROR - stderr - 20%|██ | 4490/22434 [4:16:55<71:50:29, 14.41s/it] +2025-02-05 14:25:15 - ERROR - stderr - 20%|██ | 4491/22434 [4:17:35<109:13:12, 21.91s/it] +2025-02-05 14:25:15 - ERROR - stderr - +2025-02-05 14:25:15 - ERROR - stderr - +2025-02-05 14:25:15 - INFO - stdout - {'loss': 0.9989, 'grad_norm': 1.0830272436141968, 'learning_rate': 1.8519605247753517e-05, 'epoch': 0.6} +2025-02-05 14:25:15 - ERROR - stderr - 20%|██ | 4491/22434 [4:17:35<109:13:12, 21.91s/it] +2025-02-05 14:26:03 - ERROR - stderr - 20%|██ | 4492/22434 [4:18:23<147:57:28, 29.69s/it] +2025-02-05 14:26:03 - ERROR - stderr - +2025-02-05 14:26:03 - ERROR - stderr - +2025-02-05 
14:26:03 - INFO - stdout - {'loss': 0.9453, 'grad_norm': 1.099109411239624, 'learning_rate': 1.8518849204540947e-05, 'epoch': 0.6} +2025-02-05 14:26:03 - ERROR - stderr - 20%|██ | 4492/22434 [4:18:23<147:57:28, 29.69s/it] +2025-02-05 14:26:20 - ERROR - stderr - 20%|██ | 4493/22434 [4:18:40<128:45:11, 25.84s/it] +2025-02-05 14:26:20 - ERROR - stderr - +2025-02-05 14:26:20 - ERROR - stderr - +2025-02-05 14:26:20 - INFO - stdout - {'loss': 1.0033, 'grad_norm': 1.2155909538269043, 'learning_rate': 1.8518092983761117e-05, 'epoch': 0.6} +2025-02-05 14:26:20 - ERROR - stderr - 20%|██ | 4493/22434 [4:18:40<128:45:11, 25.84s/it] +2025-02-05 14:26:49 - ERROR - stderr - 20%|██ | 4494/22434 [4:19:09<133:48:00, 26.85s/it] +2025-02-05 14:26:49 - ERROR - stderr - +2025-02-05 14:26:49 - ERROR - stderr - +2025-02-05 14:26:49 - INFO - stdout - {'loss': 1.0491, 'grad_norm': 1.1234641075134277, 'learning_rate': 1.851733658542979e-05, 'epoch': 0.6} +2025-02-05 14:26:49 - ERROR - stderr - 20%|██ | 4494/22434 [4:19:09<133:48:00, 26.85s/it] +2025-02-05 14:27:36 - ERROR - stderr - 20%|██ | 4495/22434 [4:19:56<163:40:18, 32.85s/it] +2025-02-05 14:27:36 - ERROR - stderr - +2025-02-05 14:27:36 - ERROR - stderr - +2025-02-05 14:27:36 - INFO - stdout - {'loss': 0.9313, 'grad_norm': 1.1045058965682983, 'learning_rate': 1.8516580009562734e-05, 'epoch': 0.6} +2025-02-05 14:27:36 - ERROR - stderr - 20%|██ | 4495/22434 [4:19:56<163:40:18, 32.85s/it] +2025-02-05 14:28:27 - ERROR - stderr - 20%|██ | 4496/22434 [4:20:47<191:00:40, 38.33s/it] +2025-02-05 14:28:27 - ERROR - stderr - +2025-02-05 14:28:27 - ERROR - stderr - +2025-02-05 14:28:27 - INFO - stdout - {'loss': 0.9677, 'grad_norm': 1.1874134540557861, 'learning_rate': 1.8515823256175716e-05, 'epoch': 0.6} +2025-02-05 14:28:27 - ERROR - stderr - 20%|██ | 4496/22434 [4:20:47<191:00:40, 38.33s/it] +2025-02-05 14:29:16 - ERROR - stderr - 20%|██ | 4497/22434 [4:21:36<207:22:32, 41.62s/it] +2025-02-05 14:29:16 - ERROR - stderr - +2025-02-05 14:29:16 - 
ERROR - stderr - +2025-02-05 14:29:16 - INFO - stdout - {'loss': 0.9587, 'grad_norm': 1.1104332208633423, 'learning_rate': 1.8515066325284513e-05, 'epoch': 0.6} +2025-02-05 14:29:16 - ERROR - stderr - 20%|██ | 4497/22434 [4:21:36<207:22:32, 41.62s/it] +2025-02-05 14:30:08 - ERROR - stderr - 20%|██ | 4498/22434 [4:22:28<222:42:22, 44.70s/it] +2025-02-05 14:30:08 - ERROR - stderr - +2025-02-05 14:30:08 - ERROR - stderr - +2025-02-05 14:30:08 - INFO - stdout - {'loss': 0.9552, 'grad_norm': 1.0392301082611084, 'learning_rate': 1.8514309216904895e-05, 'epoch': 0.6} +2025-02-05 14:30:08 - ERROR - stderr - 20%|██ | 4498/22434 [4:22:28<222:42:22, 44.70s/it] +2025-02-05 14:30:11 - ERROR - stderr - 20%|██ | 4499/22434 [4:22:30<159:41:00, 32.05s/it] +2025-02-05 14:30:11 - ERROR - stderr - +2025-02-05 14:30:11 - ERROR - stderr - +2025-02-05 14:30:11 - INFO - stdout - {'loss': 0.9353, 'grad_norm': 1.0481865406036377, 'learning_rate': 1.8513551931052654e-05, 'epoch': 0.6} +2025-02-05 14:30:11 - ERROR - stderr - 20%|██ | 4499/22434 [4:22:30<159:41:00, 32.05s/it] +2025-02-05 14:30:28 - ERROR - stderr - 20%|██ | 4500/22434 [4:22:48<138:01:06, 27.71s/it] +2025-02-05 14:30:28 - ERROR - stderr - +2025-02-05 14:30:28 - ERROR - stderr - +2025-02-05 14:30:28 - INFO - stdout - {'loss': 0.9546, 'grad_norm': 1.0300705432891846, 'learning_rate': 1.8512794467743567e-05, 'epoch': 0.6} +2025-02-05 14:30:28 - ERROR - stderr - 20%|██ | 4500/22434 [4:22:48<138:01:06, 27.71s/it] +2025-02-05 14:30:45 - ERROR - stderr - 20%|██ | 4501/22434 [4:23:05<122:10:19, 24.53s/it] +2025-02-05 14:30:45 - ERROR - stderr - +2025-02-05 14:30:45 - ERROR - stderr - +2025-02-05 14:30:45 - INFO - stdout - {'loss': 1.0321, 'grad_norm': 1.1318790912628174, 'learning_rate': 1.8512036826993425e-05, 'epoch': 0.6} +2025-02-05 14:30:45 - ERROR - stderr - 20%|██ | 4501/22434 [4:23:05<122:10:19, 24.53s/it] +2025-02-05 14:31:20 - ERROR - stderr - 20%|██ | 4502/22434 [4:23:40<137:33:05, 27.61s/it] +2025-02-05 14:31:20 - ERROR - 
stderr - +2025-02-05 14:31:20 - ERROR - stderr - +2025-02-05 14:31:20 - INFO - stdout - {'loss': 0.9246, 'grad_norm': 1.0639405250549316, 'learning_rate': 1.8511279008818022e-05, 'epoch': 0.6} +2025-02-05 14:31:20 - ERROR - stderr - 20%|██ | 4502/22434 [4:23:40<137:33:05, 27.61s/it] +2025-02-05 14:32:08 - ERROR - stderr - 20%|██ | 4503/22434 [4:24:27<167:21:46, 33.60s/it] +2025-02-05 14:32:08 - ERROR - stderr - +2025-02-05 14:32:08 - ERROR - stderr - +2025-02-05 14:32:08 - INFO - stdout - {'loss': 1.0496, 'grad_norm': 1.2319903373718262, 'learning_rate': 1.851052101323315e-05, 'epoch': 0.6} +2025-02-05 14:32:08 - ERROR - stderr - 20%|██ | 4503/22434 [4:24:28<167:21:46, 33.60s/it] +2025-02-05 14:32:54 - ERROR - stderr - 20%|██ | 4504/22434 [4:25:14<186:59:31, 37.54s/it] +2025-02-05 14:32:55 - ERROR - stderr - +2025-02-05 14:32:55 - ERROR - stderr - +2025-02-05 14:32:55 - INFO - stdout - {'loss': 0.9195, 'grad_norm': 1.170634150505066, 'learning_rate': 1.8509762840254613e-05, 'epoch': 0.6} +2025-02-05 14:32:55 - ERROR - stderr - 20%|██ | 4504/22434 [4:25:14<186:59:31, 37.54s/it] +2025-02-05 14:33:39 - ERROR - stderr - 20%|██ | 4505/22434 [4:25:59<197:02:33, 39.56s/it] +2025-02-05 14:33:39 - ERROR - stderr - +2025-02-05 14:33:39 - ERROR - stderr - +2025-02-05 14:33:39 - INFO - stdout - {'loss': 0.8844, 'grad_norm': 1.0659806728363037, 'learning_rate': 1.850900448989821e-05, 'epoch': 0.6} +2025-02-05 14:33:39 - ERROR - stderr - 20%|██ | 4505/22434 [4:25:59<197:02:33, 39.56s/it] +2025-02-05 14:33:41 - ERROR - stderr - 20%|██ | 4506/22434 [4:26:01<141:39:06, 28.44s/it] +2025-02-05 14:33:41 - ERROR - stderr - +2025-02-05 14:33:41 - ERROR - stderr - +2025-02-05 14:33:41 - INFO - stdout - {'loss': 0.989, 'grad_norm': 1.113992691040039, 'learning_rate': 1.8508245962179755e-05, 'epoch': 0.6} +2025-02-05 14:33:41 - ERROR - stderr - 20%|██ | 4506/22434 [4:26:01<141:39:06, 28.44s/it] +2025-02-05 14:34:14 - ERROR - stderr - 20%|██ | 4507/22434 [4:26:33<147:25:42, 29.61s/it] 
+2025-02-05 14:34:14 - ERROR - stderr - +2025-02-05 14:34:14 - ERROR - stderr - +2025-02-05 14:34:14 - INFO - stdout - {'loss': 0.8596, 'grad_norm': 1.0443806648254395, 'learning_rate': 1.8507487257115055e-05, 'epoch': 0.6} +2025-02-05 14:34:14 - ERROR - stderr - 20%|██ | 4507/22434 [4:26:33<147:25:42, 29.61s/it] +2025-02-05 14:34:59 - ERROR - stderr - 20%|██ | 4508/22434 [4:27:19<171:24:18, 34.42s/it] +2025-02-05 14:34:59 - ERROR - stderr - +2025-02-05 14:34:59 - ERROR - stderr - +2025-02-05 14:34:59 - INFO - stdout - {'loss': 1.0007, 'grad_norm': 1.2125656604766846, 'learning_rate': 1.850672837471992e-05, 'epoch': 0.6} +2025-02-05 14:34:59 - ERROR - stderr - 20%|██ | 4508/22434 [4:27:19<171:24:18, 34.42s/it] +2025-02-05 14:35:02 - ERROR - stderr - 20%|██ | 4509/22434 [4:27:21<123:43:00, 24.85s/it] +2025-02-05 14:35:02 - ERROR - stderr - +2025-02-05 14:35:02 - ERROR - stderr - +2025-02-05 14:35:02 - INFO - stdout - {'loss': 0.9042, 'grad_norm': 1.0046961307525635, 'learning_rate': 1.8505969315010175e-05, 'epoch': 0.6} +2025-02-05 14:35:02 - ERROR - stderr - 20%|██ | 4509/22434 [4:27:22<123:43:00, 24.85s/it] +2025-02-05 14:35:04 - ERROR - stderr - 20%|██ | 4510/22434 [4:27:24<90:18:51, 18.14s/it] +2025-02-05 14:35:04 - ERROR - stderr - +2025-02-05 14:35:04 - ERROR - stderr - +2025-02-05 14:35:04 - INFO - stdout - {'loss': 0.978, 'grad_norm': 1.0259705781936646, 'learning_rate': 1.8505210078001635e-05, 'epoch': 0.6} +2025-02-05 14:35:04 - ERROR - stderr - 20%|██ | 4510/22434 [4:27:24<90:18:51, 18.14s/it] +2025-02-05 14:35:34 - ERROR - stderr - 20%|██ | 4511/22434 [4:27:53<107:12:22, 21.53s/it] +2025-02-05 14:35:34 - ERROR - stderr - +2025-02-05 14:35:34 - ERROR - stderr - +2025-02-05 14:35:34 - INFO - stdout - {'loss': 0.9601, 'grad_norm': 0.9716789722442627, 'learning_rate': 1.8504450663710134e-05, 'epoch': 0.6} +2025-02-05 14:35:34 - ERROR - stderr - 20%|██ | 4511/22434 [4:27:53<107:12:22, 21.53s/it] +2025-02-05 14:36:16 - ERROR - stderr - 20%|██ | 4512/22434 
[4:28:36<138:30:49, 27.82s/it] +2025-02-05 14:36:16 - ERROR - stderr - +2025-02-05 14:36:16 - ERROR - stderr - +2025-02-05 14:36:16 - INFO - stdout - {'loss': 1.0877, 'grad_norm': 1.1140798330307007, 'learning_rate': 1.8503691072151495e-05, 'epoch': 0.6} +2025-02-05 14:36:16 - ERROR - stderr - 20%|██ | 4512/22434 [4:28:36<138:30:49, 27.82s/it] +2025-02-05 14:36:59 - ERROR - stderr - 20%|██ | 4513/22434 [4:29:19<160:36:40, 32.26s/it] +2025-02-05 14:36:59 - ERROR - stderr - +2025-02-05 14:36:59 - ERROR - stderr - +2025-02-05 14:36:59 - INFO - stdout - {'loss': 0.9907, 'grad_norm': 1.194087028503418, 'learning_rate': 1.8502931303341553e-05, 'epoch': 0.6} +2025-02-05 14:36:59 - ERROR - stderr - 20%|██ | 4513/22434 [4:29:19<160:36:40, 32.26s/it] +2025-02-05 14:37:26 - ERROR - stderr - 20%|██ | 4514/22434 [4:29:45<152:21:23, 30.61s/it] +2025-02-05 14:37:26 - ERROR - stderr - +2025-02-05 14:37:26 - ERROR - stderr - +2025-02-05 14:37:26 - INFO - stdout - {'loss': 0.8972, 'grad_norm': 1.0034937858581543, 'learning_rate': 1.8502171357296144e-05, 'epoch': 0.6} +2025-02-05 14:37:26 - ERROR - stderr - 20%|██ | 4514/22434 [4:29:45<152:21:23, 30.61s/it] +2025-02-05 14:37:53 - ERROR - stderr - 20%|██ | 4515/22434 [4:30:13<148:02:49, 29.74s/it] +2025-02-05 14:37:53 - ERROR - stderr - +2025-02-05 14:37:53 - ERROR - stderr - +2025-02-05 14:37:53 - INFO - stdout - {'loss': 0.9661, 'grad_norm': 0.9939236640930176, 'learning_rate': 1.850141123403111e-05, 'epoch': 0.6} +2025-02-05 14:37:53 - ERROR - stderr - 20%|██ | 4515/22434 [4:30:13<148:02:49, 29.74s/it] +2025-02-05 14:37:56 - ERROR - stderr - 20%|██ | 4516/22434 [4:30:15<107:17:10, 21.56s/it] +2025-02-05 14:37:56 - ERROR - stderr - +2025-02-05 14:37:56 - ERROR - stderr - +2025-02-05 14:37:56 - INFO - stdout - {'loss': 0.8413, 'grad_norm': 0.9628288745880127, 'learning_rate': 1.850065093356229e-05, 'epoch': 0.6} +2025-02-05 14:37:56 - ERROR - stderr - 20%|██ | 4516/22434 [4:30:16<107:17:10, 21.56s/it] +2025-02-05 14:38:30 - ERROR - 
stderr - 20%|██ | 4517/22434 [4:30:49<125:48:30, 25.28s/it] +2025-02-05 14:38:30 - ERROR - stderr - +2025-02-05 14:38:30 - ERROR - stderr - +2025-02-05 14:38:30 - INFO - stdout - {'loss': 0.8985, 'grad_norm': 1.0935051441192627, 'learning_rate': 1.849989045590554e-05, 'epoch': 0.6} +2025-02-05 14:38:30 - ERROR - stderr - 20%|██ | 4517/22434 [4:30:49<125:48:30, 25.28s/it] +2025-02-05 14:38:32 - ERROR - stderr - 20%|██ | 4518/22434 [4:30:52<91:44:12, 18.43s/it] +2025-02-05 14:38:32 - ERROR - stderr - +2025-02-05 14:38:32 - ERROR - stderr - +2025-02-05 14:38:32 - INFO - stdout - {'loss': 1.0148, 'grad_norm': 1.0853569507598877, 'learning_rate': 1.8499129801076704e-05, 'epoch': 0.6} +2025-02-05 14:38:32 - ERROR - stderr - 20%|██ | 4518/22434 [4:30:52<91:44:12, 18.43s/it] +2025-02-05 14:38:35 - ERROR - stderr - 20%|██ | 4519/22434 [4:30:54<67:52:39, 13.64s/it] +2025-02-05 14:38:35 - ERROR - stderr - +2025-02-05 14:38:35 - ERROR - stderr - +2025-02-05 14:38:35 - INFO - stdout - {'loss': 0.9588, 'grad_norm': 0.9970325827598572, 'learning_rate': 1.849836896909164e-05, 'epoch': 0.6} +2025-02-05 14:38:35 - ERROR - stderr - 20%|██ | 4519/22434 [4:30:54<67:52:39, 13.64s/it] +2025-02-05 14:39:10 - ERROR - stderr - 20%|██ | 4520/22434 [4:31:30<100:17:11, 20.15s/it] +2025-02-05 14:39:10 - ERROR - stderr - +2025-02-05 14:39:10 - ERROR - stderr - +2025-02-05 14:39:10 - INFO - stdout - {'loss': 0.8414, 'grad_norm': 1.0848073959350586, 'learning_rate': 1.849760795996621e-05, 'epoch': 0.6} +2025-02-05 14:39:10 - ERROR - stderr - 20%|██ | 4520/22434 [4:31:30<100:17:11, 20.15s/it] +2025-02-05 14:39:24 - ERROR - stderr - 20%|██ | 4521/22434 [4:31:44<91:39:04, 18.42s/it] +2025-02-05 14:39:24 - ERROR - stderr - +2025-02-05 14:39:24 - ERROR - stderr - +2025-02-05 14:39:24 - INFO - stdout - {'loss': 0.9546, 'grad_norm': 1.0946645736694336, 'learning_rate': 1.8496846773716267e-05, 'epoch': 0.6} +2025-02-05 14:39:24 - ERROR - stderr - 20%|██ | 4521/22434 [4:31:44<91:39:04, 18.42s/it] 
+2025-02-05 14:39:56 - ERROR - stderr - 20%|██ | 4522/22434 [4:32:15<110:57:42, 22.30s/it] +2025-02-05 14:39:56 - ERROR - stderr - +2025-02-05 14:39:56 - ERROR - stderr - +2025-02-05 14:39:56 - INFO - stdout - {'loss': 0.9959, 'grad_norm': 1.1542887687683105, 'learning_rate': 1.849608541035769e-05, 'epoch': 0.6} +2025-02-05 14:39:56 - ERROR - stderr - 20%|██ | 4522/22434 [4:32:15<110:57:42, 22.30s/it] +2025-02-05 14:40:29 - ERROR - stderr - 20%|██ | 4523/22434 [4:32:49<127:31:41, 25.63s/it] +2025-02-05 14:40:29 - ERROR - stderr - +2025-02-05 14:40:29 - ERROR - stderr - +2025-02-05 14:40:29 - INFO - stdout - {'loss': 1.0061, 'grad_norm': 1.093762993812561, 'learning_rate': 1.8495323869906342e-05, 'epoch': 0.6} +2025-02-05 14:40:29 - ERROR - stderr - 20%|██ | 4523/22434 [4:32:49<127:31:41, 25.63s/it] +2025-02-05 14:40:32 - ERROR - stderr - 20%|██ | 4524/22434 [4:32:51<92:57:30, 18.69s/it] +2025-02-05 14:40:32 - ERROR - stderr - +2025-02-05 14:40:32 - ERROR - stderr - +2025-02-05 14:40:32 - INFO - stdout - {'loss': 1.0387, 'grad_norm': 1.123477578163147, 'learning_rate': 1.8494562152378093e-05, 'epoch': 0.6} +2025-02-05 14:40:32 - ERROR - stderr - 20%|██ | 4524/22434 [4:32:51<92:57:30, 18.69s/it] +2025-02-05 14:40:34 - ERROR - stderr - 20%|██ | 4525/22434 [4:32:54<68:46:32, 13.83s/it] +2025-02-05 14:40:34 - ERROR - stderr - +2025-02-05 14:40:34 - ERROR - stderr - +2025-02-05 14:40:34 - INFO - stdout - {'loss': 0.935, 'grad_norm': 1.1093143224716187, 'learning_rate': 1.849380025778883e-05, 'epoch': 0.61} +2025-02-05 14:40:34 - ERROR - stderr - 20%|██ | 4525/22434 [4:32:54<68:46:32, 13.83s/it] +2025-02-05 14:40:36 - ERROR - stderr - 20%|██ | 4526/22434 [4:32:56<51:48:03, 10.41s/it] +2025-02-05 14:40:37 - ERROR - stderr - +2025-02-05 14:40:37 - ERROR - stderr - +2025-02-05 14:40:37 - INFO - stdout - {'loss': 1.0147, 'grad_norm': 1.1035022735595703, 'learning_rate': 1.8493038186154424e-05, 'epoch': 0.61} +2025-02-05 14:40:37 - ERROR - stderr - 20%|██ | 4526/22434 
[4:32:56<51:48:03, 10.41s/it] +2025-02-05 14:40:39 - ERROR - stderr - 20%|██ | 4527/22434 [4:32:59<39:53:54, 8.02s/it] +2025-02-05 14:40:39 - ERROR - stderr - +2025-02-05 14:40:39 - ERROR - stderr - +2025-02-05 14:40:39 - INFO - stdout - {'loss': 0.8308, 'grad_norm': 1.0522245168685913, 'learning_rate': 1.8492275937490764e-05, 'epoch': 0.61} +2025-02-05 14:40:39 - ERROR - stderr - 20%|██ | 4527/22434 [4:32:59<39:53:54, 8.02s/it] +2025-02-05 14:40:41 - ERROR - stderr - 20%|██ | 4528/22434 [4:33:01<31:38:24, 6.36s/it] +2025-02-05 14:40:41 - ERROR - stderr - +2025-02-05 14:40:41 - ERROR - stderr - +2025-02-05 14:40:41 - INFO - stdout - {'loss': 1.0213, 'grad_norm': 1.1079338788986206, 'learning_rate': 1.849151351181374e-05, 'epoch': 0.61} +2025-02-05 14:40:41 - ERROR - stderr - 20%|██ | 4528/22434 [4:33:01<31:38:24, 6.36s/it] +2025-02-05 14:40:44 - ERROR - stderr - 20%|██ | 4529/22434 [4:33:04<25:45:55, 5.18s/it] +2025-02-05 14:40:44 - ERROR - stderr - +2025-02-05 14:40:44 - ERROR - stderr - +2025-02-05 14:40:44 - INFO - stdout - {'loss': 1.0146, 'grad_norm': 1.0584173202514648, 'learning_rate': 1.8490750909139242e-05, 'epoch': 0.61} +2025-02-05 14:40:44 - ERROR - stderr - 20%|██ | 4529/22434 [4:33:04<25:45:55, 5.18s/it] +2025-02-05 14:41:08 - ERROR - stderr - 20%|██ | 4530/22434 [4:33:27<53:22:17, 10.73s/it] +2025-02-05 14:41:08 - ERROR - stderr - +2025-02-05 14:41:08 - ERROR - stderr - +2025-02-05 14:41:08 - INFO - stdout - {'loss': 0.8822, 'grad_norm': 1.1158702373504639, 'learning_rate': 1.8489988129483167e-05, 'epoch': 0.61} +2025-02-05 14:41:08 - ERROR - stderr - 20%|██ | 4530/22434 [4:33:27<53:22:17, 10.73s/it] +2025-02-05 14:41:10 - ERROR - stderr - 20%|██ | 4531/22434 [4:33:30<41:01:50, 8.25s/it] +2025-02-05 14:41:10 - ERROR - stderr - +2025-02-05 14:41:10 - ERROR - stderr - +2025-02-05 14:41:10 - INFO - stdout - {'loss': 1.0219, 'grad_norm': 1.125991702079773, 'learning_rate': 1.848922517286141e-05, 'epoch': 0.61} +2025-02-05 14:41:10 - ERROR - stderr - 
20%|██ | 4531/22434 [4:33:30<41:01:50, 8.25s/it] +2025-02-05 14:41:13 - ERROR - stderr - 20%|██ | 4532/22434 [4:33:32<32:33:48, 6.55s/it] +2025-02-05 14:41:13 - ERROR - stderr - +2025-02-05 14:41:13 - ERROR - stderr - +2025-02-05 14:41:13 - INFO - stdout - {'loss': 1.0876, 'grad_norm': 1.1146489381790161, 'learning_rate': 1.848846203928988e-05, 'epoch': 0.61} +2025-02-05 14:41:13 - ERROR - stderr - 20%|██ | 4532/22434 [4:33:32<32:33:48, 6.55s/it] +2025-02-05 14:41:15 - ERROR - stderr - 20%|██ | 4533/22434 [4:33:35<26:37:38, 5.35s/it] +2025-02-05 14:41:15 - ERROR - stderr - +2025-02-05 14:41:15 - ERROR - stderr - +2025-02-05 14:41:15 - INFO - stdout - {'loss': 0.9425, 'grad_norm': 1.020655870437622, 'learning_rate': 1.8487698728784482e-05, 'epoch': 0.61} +2025-02-05 14:41:15 - ERROR - stderr - 20%|██ | 4533/22434 [4:33:35<26:37:38, 5.35s/it] +2025-02-05 14:41:35 - ERROR - stderr - 20%|██ | 4534/22434 [4:33:55<48:00:49, 9.66s/it] +2025-02-05 14:41:35 - ERROR - stderr - +2025-02-05 14:41:35 - ERROR - stderr - +2025-02-05 14:41:35 - INFO - stdout - {'loss': 1.0368, 'grad_norm': 1.037375807762146, 'learning_rate': 1.8486935241361127e-05, 'epoch': 0.61} +2025-02-05 14:41:35 - ERROR - stderr - 20%|██ | 4534/22434 [4:33:55<48:00:49, 9.66s/it] +2025-02-05 14:41:54 - ERROR - stderr - 20%|██ | 4535/22434 [4:34:13<61:30:06, 12.37s/it] +2025-02-05 14:41:54 - ERROR - stderr - +2025-02-05 14:41:54 - ERROR - stderr - +2025-02-05 14:41:54 - INFO - stdout - {'loss': 1.0099, 'grad_norm': 1.2069827318191528, 'learning_rate': 1.8486171577035727e-05, 'epoch': 0.61} +2025-02-05 14:41:54 - ERROR - stderr - 20%|██ | 4535/22434 [4:34:13<61:30:06, 12.37s/it] +2025-02-05 14:41:56 - ERROR - stderr - 20%|██ | 4536/22434 [4:34:16<46:47:32, 9.41s/it] +2025-02-05 14:41:56 - ERROR - stderr - +2025-02-05 14:41:56 - ERROR - stderr - +2025-02-05 14:41:56 - INFO - stdout - {'loss': 0.8572, 'grad_norm': 1.0879740715026855, 'learning_rate': 1.84854077358242e-05, 'epoch': 0.61} +2025-02-05 14:41:56 - 
ERROR - stderr - 20%|██ | 4536/22434 [4:34:16<46:47:32, 9.41s/it] +2025-02-05 14:41:58 - ERROR - stderr - 20%|██ | 4537/22434 [4:34:18<36:23:00, 7.32s/it] +2025-02-05 14:41:59 - ERROR - stderr - +2025-02-05 14:41:59 - ERROR - stderr - +2025-02-05 14:41:59 - INFO - stdout - {'loss': 1.0611, 'grad_norm': 1.036346673965454, 'learning_rate': 1.8484643717742465e-05, 'epoch': 0.61} +2025-02-05 14:41:59 - ERROR - stderr - 20%|██ | 4537/22434 [4:34:18<36:23:00, 7.32s/it] +2025-02-05 14:42:01 - ERROR - stderr - 20%|██ | 4538/22434 [4:34:21<29:09:48, 5.87s/it] +2025-02-05 14:42:01 - ERROR - stderr - +2025-02-05 14:42:01 - ERROR - stderr - +2025-02-05 14:42:01 - INFO - stdout - {'loss': 0.8069, 'grad_norm': 1.044650673866272, 'learning_rate': 1.8483879522806455e-05, 'epoch': 0.61} +2025-02-05 14:42:01 - ERROR - stderr - 20%|██ | 4538/22434 [4:34:21<29:09:48, 5.87s/it] +2025-02-05 14:42:06 - ERROR - stderr - 20%|██ | 4539/22434 [4:34:26<27:58:55, 5.63s/it] +2025-02-05 14:42:06 - ERROR - stderr - +2025-02-05 14:42:06 - ERROR - stderr - +2025-02-05 14:42:06 - INFO - stdout - {'loss': 1.0056, 'grad_norm': 0.9842966794967651, 'learning_rate': 1.8483115151032094e-05, 'epoch': 0.61} +2025-02-05 14:42:06 - ERROR - stderr - 20%|██ | 4539/22434 [4:34:26<27:58:55, 5.63s/it] +2025-02-05 14:42:09 - ERROR - stderr - 20%|██ | 4540/22434 [4:34:28<23:18:02, 4.69s/it] +2025-02-05 14:42:09 - ERROR - stderr - +2025-02-05 14:42:09 - ERROR - stderr - +2025-02-05 14:42:09 - INFO - stdout - {'loss': 0.9188, 'grad_norm': 1.0881311893463135, 'learning_rate': 1.8482350602435315e-05, 'epoch': 0.61} +2025-02-05 14:42:09 - ERROR - stderr - 20%|██ | 4540/22434 [4:34:28<23:18:02, 4.69s/it] +2025-02-05 14:42:11 - ERROR - stderr - 20%|██ | 4541/22434 [4:34:31<20:09:36, 4.06s/it] +2025-02-05 14:42:11 - ERROR - stderr - +2025-02-05 14:42:11 - ERROR - stderr - +2025-02-05 14:42:11 - INFO - stdout - {'loss': 0.9113, 'grad_norm': 1.0662394762039185, 'learning_rate': 1.8481585877032054e-05, 'epoch': 0.61} 
+2025-02-05 14:42:11 - ERROR - stderr - 20%|██ | 4541/22434 [4:34:31<20:09:36, 4.06s/it] +2025-02-05 14:42:14 - ERROR - stderr - 20%|██ | 4542/22434 [4:34:33<17:46:26, 3.58s/it] +2025-02-05 14:42:14 - ERROR - stderr - +2025-02-05 14:42:14 - ERROR - stderr - +2025-02-05 14:42:14 - INFO - stdout - {'loss': 0.8597, 'grad_norm': 0.9918805360794067, 'learning_rate': 1.848082097483825e-05, 'epoch': 0.61} +2025-02-05 14:42:14 - ERROR - stderr - 20%|██ | 4542/22434 [4:34:33<17:46:26, 3.58s/it] +2025-02-05 14:42:16 - ERROR - stderr - 20%|██ | 4543/22434 [4:34:36<16:15:27, 3.27s/it] +2025-02-05 14:42:16 - ERROR - stderr - +2025-02-05 14:42:16 - ERROR - stderr - +2025-02-05 14:42:16 - INFO - stdout - {'loss': 0.9096, 'grad_norm': 1.0060417652130127, 'learning_rate': 1.848005589586985e-05, 'epoch': 0.61} +2025-02-05 14:42:16 - ERROR - stderr - 20%|██ | 4543/22434 [4:34:36<16:15:27, 3.27s/it] +2025-02-05 14:42:19 - ERROR - stderr - 20%|██ | 4544/22434 [4:34:38<15:06:38, 3.04s/it] +2025-02-05 14:42:19 - ERROR - stderr - +2025-02-05 14:42:19 - ERROR - stderr - +2025-02-05 14:42:19 - INFO - stdout - {'loss': 0.9055, 'grad_norm': 1.0584622621536255, 'learning_rate': 1.84792906401428e-05, 'epoch': 0.61} +2025-02-05 14:42:19 - ERROR - stderr - 20%|██ | 4544/22434 [4:34:38<15:06:38, 3.04s/it] +2025-02-05 14:42:21 - ERROR - stderr - 20%|██ | 4545/22434 [4:34:41<14:19:25, 2.88s/it] +2025-02-05 14:42:21 - ERROR - stderr - +2025-02-05 14:42:21 - ERROR - stderr - +2025-02-05 14:42:21 - INFO - stdout - {'loss': 0.9573, 'grad_norm': 1.0492143630981445, 'learning_rate': 1.847852520767305e-05, 'epoch': 0.61} +2025-02-05 14:42:21 - ERROR - stderr - 20%|██ | 4545/22434 [4:34:41<14:19:25, 2.88s/it] +2025-02-05 14:42:24 - ERROR - stderr - 20%|██ | 4546/22434 [4:34:43<13:44:59, 2.77s/it] +2025-02-05 14:42:24 - ERROR - stderr - +2025-02-05 14:42:24 - ERROR - stderr - +2025-02-05 14:42:24 - INFO - stdout - {'loss': 1.0293, 'grad_norm': 1.0446584224700928, 'learning_rate': 1.8477759598476556e-05, 
'epoch': 0.61} +2025-02-05 14:42:24 - ERROR - stderr - 20%|██ | 4546/22434 [4:34:43<13:44:59, 2.77s/it] +2025-02-05 14:42:26 - ERROR - stderr - 20%|██ | 4547/22434 [4:34:46<13:21:22, 2.69s/it] +2025-02-05 14:42:26 - ERROR - stderr - +2025-02-05 14:42:26 - ERROR - stderr - +2025-02-05 14:42:26 - INFO - stdout - {'loss': 0.9103, 'grad_norm': 1.0829218626022339, 'learning_rate': 1.847699381256927e-05, 'epoch': 0.61} +2025-02-05 14:42:26 - ERROR - stderr - 20%|██ | 4547/22434 [4:34:46<13:21:22, 2.69s/it] +2025-02-05 14:42:29 - ERROR - stderr - 20%|██ | 4548/22434 [4:34:48<13:05:23, 2.63s/it] +2025-02-05 14:42:29 - ERROR - stderr - +2025-02-05 14:42:29 - ERROR - stderr - +2025-02-05 14:42:29 - INFO - stdout - {'loss': 0.9276, 'grad_norm': 1.12076735496521, 'learning_rate': 1.8476227849967166e-05, 'epoch': 0.61} +2025-02-05 14:42:29 - ERROR - stderr - 20%|██ | 4548/22434 [4:34:48<13:05:23, 2.63s/it] +2025-02-05 14:42:31 - ERROR - stderr - 20%|██ | 4549/22434 [4:34:51<12:54:46, 2.60s/it] +2025-02-05 14:42:31 - ERROR - stderr - +2025-02-05 14:42:31 - ERROR - stderr - +2025-02-05 14:42:31 - INFO - stdout - {'loss': 0.9711, 'grad_norm': 1.1958202123641968, 'learning_rate': 1.8475461710686202e-05, 'epoch': 0.61} +2025-02-05 14:42:31 - ERROR - stderr - 20%|██ | 4549/22434 [4:34:51<12:54:46, 2.60s/it] +2025-02-05 14:42:34 - ERROR - stderr - 20%|██ | 4550/22434 [4:34:53<12:40:26, 2.55s/it] +2025-02-05 14:42:34 - ERROR - stderr - +2025-02-05 14:42:34 - ERROR - stderr - +2025-02-05 14:42:34 - INFO - stdout - {'loss': 0.8768, 'grad_norm': 1.0606281757354736, 'learning_rate': 1.8474695394742345e-05, 'epoch': 0.61} +2025-02-05 14:42:34 - ERROR - stderr - 20%|██ | 4550/22434 [4:34:53<12:40:26, 2.55s/it] +2025-02-05 14:42:36 - ERROR - stderr - 20%|██ | 4551/22434 [4:34:56<12:36:34, 2.54s/it] +2025-02-05 14:42:36 - ERROR - stderr - +2025-02-05 14:42:36 - ERROR - stderr - +2025-02-05 14:42:36 - INFO - stdout - {'loss': 0.9358, 'grad_norm': 1.0081276893615723, 'learning_rate': 
1.8473928902151576e-05, 'epoch': 0.61} +2025-02-05 14:42:36 - ERROR - stderr - 20%|██ | 4551/22434 [4:34:56<12:36:34, 2.54s/it] +2025-02-05 14:42:39 - ERROR - stderr - 20%|██ | 4552/22434 [4:34:58<12:37:47, 2.54s/it] +2025-02-05 14:42:39 - ERROR - stderr - +2025-02-05 14:42:39 - ERROR - stderr - +2025-02-05 14:42:39 - INFO - stdout - {'loss': 0.9871, 'grad_norm': 1.0348228216171265, 'learning_rate': 1.8473162232929867e-05, 'epoch': 0.61} +2025-02-05 14:42:39 - ERROR - stderr - 20%|██ | 4552/22434 [4:34:58<12:37:47, 2.54s/it] +2025-02-05 14:42:41 - ERROR - stderr - 20%|██ | 4553/22434 [4:35:01<12:36:10, 2.54s/it] +2025-02-05 14:42:41 - ERROR - stderr - +2025-02-05 14:42:41 - ERROR - stderr - +2025-02-05 14:42:41 - INFO - stdout - {'loss': 0.901, 'grad_norm': 1.1591606140136719, 'learning_rate': 1.8472395387093195e-05, 'epoch': 0.61} +2025-02-05 14:42:41 - ERROR - stderr - 20%|██ | 4553/22434 [4:35:01<12:36:10, 2.54s/it] +2025-02-05 14:42:44 - ERROR - stderr - 20%|██ | 4554/22434 [4:35:04<12:41:46, 2.56s/it] +2025-02-05 14:42:44 - ERROR - stderr - +2025-02-05 14:42:44 - ERROR - stderr - +2025-02-05 14:42:44 - INFO - stdout - {'loss': 0.9137, 'grad_norm': 1.1983684301376343, 'learning_rate': 1.8471628364657555e-05, 'epoch': 0.61} +2025-02-05 14:42:44 - ERROR - stderr - 20%|██ | 4554/22434 [4:35:04<12:41:46, 2.56s/it] +2025-02-05 14:42:46 - ERROR - stderr - 20%|██ | 4555/22434 [4:35:06<12:36:36, 2.54s/it] +2025-02-05 14:42:46 - ERROR - stderr - +2025-02-05 14:42:46 - ERROR - stderr - +2025-02-05 14:42:46 - INFO - stdout - {'loss': 1.0372, 'grad_norm': 1.1667745113372803, 'learning_rate': 1.8470861165638926e-05, 'epoch': 0.61} +2025-02-05 14:42:46 - ERROR - stderr - 20%|██ | 4555/22434 [4:35:06<12:36:36, 2.54s/it] +2025-02-05 14:43:06 - ERROR - stderr - 20%|██ | 4556/22434 [4:35:26<38:30:39, 7.75s/it] +2025-02-05 14:43:06 - ERROR - stderr - +2025-02-05 14:43:06 - ERROR - stderr - +2025-02-05 14:43:06 - INFO - stdout - {'loss': 1.234, 'grad_norm': 1.0995213985443115, 
'learning_rate': 1.8470093790053297e-05, 'epoch': 0.61} +2025-02-05 14:43:06 - ERROR - stderr - 20%|██ | 4556/22434 [4:35:26<38:30:39, 7.75s/it] +2025-02-05 14:43:23 - ERROR - stderr - 20%|██ | 4557/22434 [4:35:42<51:25:35, 10.36s/it] +2025-02-05 14:43:23 - ERROR - stderr - +2025-02-05 14:43:23 - ERROR - stderr - +2025-02-05 14:43:23 - INFO - stdout - {'loss': 0.9736, 'grad_norm': 1.1155389547348022, 'learning_rate': 1.8469326237916675e-05, 'epoch': 0.61} +2025-02-05 14:43:23 - ERROR - stderr - 20%|██ | 4557/22434 [4:35:42<51:25:35, 10.36s/it] +2025-02-05 14:43:47 - ERROR - stderr - 20%|██ | 4558/22434 [4:36:07<72:46:50, 14.66s/it] +2025-02-05 14:43:47 - ERROR - stderr - +2025-02-05 14:43:47 - ERROR - stderr - +2025-02-05 14:43:47 - INFO - stdout - {'loss': 0.9469, 'grad_norm': 1.0100648403167725, 'learning_rate': 1.846855850924505e-05, 'epoch': 0.61} +2025-02-05 14:43:47 - ERROR - stderr - 20%|██ | 4558/22434 [4:36:07<72:46:50, 14.66s/it] +2025-02-05 14:44:19 - ERROR - stderr - 20%|██ | 4559/22434 [4:36:38<97:25:10, 19.62s/it] +2025-02-05 14:44:19 - ERROR - stderr - +2025-02-05 14:44:19 - ERROR - stderr - +2025-02-05 14:44:19 - INFO - stdout - {'loss': 1.0334, 'grad_norm': 1.1121280193328857, 'learning_rate': 1.8467790604054423e-05, 'epoch': 0.61} +2025-02-05 14:44:19 - ERROR - stderr - 20%|██ | 4559/22434 [4:36:38<97:25:10, 19.62s/it] +2025-02-05 14:44:51 - ERROR - stderr - 20%|██ | 4560/22434 [4:37:11<117:12:25, 23.61s/it] +2025-02-05 14:44:51 - ERROR - stderr - +2025-02-05 14:44:51 - ERROR - stderr - +2025-02-05 14:44:51 - INFO - stdout - {'loss': 0.921, 'grad_norm': 1.0562087297439575, 'learning_rate': 1.8467022522360805e-05, 'epoch': 0.61} +2025-02-05 14:44:51 - ERROR - stderr - 20%|██ | 4560/22434 [4:37:11<117:12:25, 23.61s/it] +2025-02-05 14:45:17 - ERROR - stderr - 20%|██ | 4561/22434 [4:37:37<120:50:28, 24.34s/it] +2025-02-05 14:45:18 - ERROR - stderr - +2025-02-05 14:45:18 - ERROR - stderr - +2025-02-05 14:45:18 - INFO - stdout - {'loss': 1.0534, 
'grad_norm': 1.1882513761520386, 'learning_rate': 1.8466254264180205e-05, 'epoch': 0.61} +2025-02-05 14:45:18 - ERROR - stderr - 20%|██ | 4561/22434 [4:37:37<120:50:28, 24.34s/it] +2025-02-05 14:45:45 - ERROR - stderr - 20%|██ | 4562/22434 [4:38:05<125:48:33, 25.34s/it] +2025-02-05 14:45:45 - ERROR - stderr - +2025-02-05 14:45:45 - ERROR - stderr - +2025-02-05 14:45:45 - INFO - stdout - {'loss': 0.9164, 'grad_norm': 1.1301093101501465, 'learning_rate': 1.846548582952864e-05, 'epoch': 0.61} +2025-02-05 14:45:45 - ERROR - stderr - 20%|██ | 4562/22434 [4:38:05<125:48:33, 25.34s/it] +2025-02-05 14:46:24 - ERROR - stderr - 20%|██ | 4563/22434 [4:38:44<145:44:38, 29.36s/it] +2025-02-05 14:46:24 - ERROR - stderr - +2025-02-05 14:46:24 - ERROR - stderr - +2025-02-05 14:46:24 - INFO - stdout - {'loss': 1.0461, 'grad_norm': 1.0955933332443237, 'learning_rate': 1.8464717218422115e-05, 'epoch': 0.61} +2025-02-05 14:46:24 - ERROR - stderr - 20%|██ | 4563/22434 [4:38:44<145:44:38, 29.36s/it] +2025-02-05 14:46:26 - ERROR - stderr - 20%|██ | 4564/22434 [4:38:46<105:46:31, 21.31s/it] +2025-02-05 14:46:26 - ERROR - stderr - +2025-02-05 14:46:26 - ERROR - stderr - +2025-02-05 14:46:26 - INFO - stdout - {'loss': 1.012, 'grad_norm': 1.090499997138977, 'learning_rate': 1.8463948430876667e-05, 'epoch': 0.61} +2025-02-05 14:46:26 - ERROR - stderr - 20%|██ | 4564/22434 [4:38:46<105:46:31, 21.31s/it] +2025-02-05 14:46:42 - ERROR - stderr - 20%|██ | 4565/22434 [4:39:01<96:45:35, 19.49s/it] +2025-02-05 14:46:42 - ERROR - stderr - +2025-02-05 14:46:42 - ERROR - stderr - +2025-02-05 14:46:42 - INFO - stdout - {'loss': 0.7973, 'grad_norm': 1.0175905227661133, 'learning_rate': 1.846317946690831e-05, 'epoch': 0.61} +2025-02-05 14:46:42 - ERROR - stderr - 20%|██ | 4565/22434 [4:39:02<96:45:35, 19.49s/it] +2025-02-05 14:47:26 - ERROR - stderr - 20%|██ | 4566/22434 [4:39:46<133:39:08, 26.93s/it] +2025-02-05 14:47:26 - ERROR - stderr - +2025-02-05 14:47:26 - ERROR - stderr - +2025-02-05 14:47:26 - 
INFO - stdout - {'loss': 0.9581, 'grad_norm': 1.081360936164856, 'learning_rate': 1.8462410326533073e-05, 'epoch': 0.61} +2025-02-05 14:47:26 - ERROR - stderr - 20%|██ | 4566/22434 [4:39:46<133:39:08, 26.93s/it] +2025-02-05 14:47:28 - ERROR - stderr - 20%|██ | 4567/22434 [4:39:48<97:13:20, 19.59s/it] +2025-02-05 14:47:28 - ERROR - stderr - +2025-02-05 14:47:28 - ERROR - stderr - +2025-02-05 14:47:28 - INFO - stdout - {'loss': 0.927, 'grad_norm': 0.9667996764183044, 'learning_rate': 1.8461641009766996e-05, 'epoch': 0.61} +2025-02-05 14:47:28 - ERROR - stderr - 20%|██ | 4567/22434 [4:39:48<97:13:20, 19.59s/it] +2025-02-05 14:47:38 - ERROR - stderr - 20%|██ | 4568/22434 [4:39:57<81:48:44, 16.49s/it] +2025-02-05 14:47:38 - ERROR - stderr - +2025-02-05 14:47:38 - ERROR - stderr - +2025-02-05 14:47:38 - INFO - stdout - {'loss': 0.9166, 'grad_norm': 1.0959899425506592, 'learning_rate': 1.8460871516626105e-05, 'epoch': 0.61} +2025-02-05 14:47:38 - ERROR - stderr - 20%|██ | 4568/22434 [4:39:57<81:48:44, 16.49s/it] +2025-02-05 14:48:20 - ERROR - stderr - 20%|██ | 4569/22434 [4:40:39<119:35:50, 24.10s/it] +2025-02-05 14:48:20 - ERROR - stderr - +2025-02-05 14:48:20 - ERROR - stderr - +2025-02-05 14:48:20 - INFO - stdout - {'loss': 1.0318, 'grad_norm': 1.0939836502075195, 'learning_rate': 1.8460101847126445e-05, 'epoch': 0.61} +2025-02-05 14:48:20 - ERROR - stderr - 20%|██ | 4569/22434 [4:40:39<119:35:50, 24.10s/it] +2025-02-05 14:48:52 - ERROR - stderr - 20%|██ | 4570/22434 [4:41:12<132:45:26, 26.75s/it] +2025-02-05 14:48:53 - ERROR - stderr - +2025-02-05 14:48:53 - ERROR - stderr - +2025-02-05 14:48:53 - INFO - stdout - {'loss': 0.9044, 'grad_norm': 0.9785194993019104, 'learning_rate': 1.8459332001284057e-05, 'epoch': 0.61} +2025-02-05 14:48:53 - ERROR - stderr - 20%|██ | 4570/22434 [4:41:12<132:45:26, 26.75s/it] +2025-02-05 14:49:34 - ERROR - stderr - 20%|██ | 4571/22434 [4:41:53<154:11:31, 31.07s/it] +2025-02-05 14:49:34 - ERROR - stderr - +2025-02-05 14:49:34 - ERROR - 
stderr - +2025-02-05 14:49:34 - INFO - stdout - {'loss': 0.9238, 'grad_norm': 1.062530517578125, 'learning_rate': 1.845856197911499e-05, 'epoch': 0.61} +2025-02-05 14:49:34 - ERROR - stderr - 20%|██ | 4571/22434 [4:41:53<154:11:31, 31.07s/it] +2025-02-05 14:49:36 - ERROR - stderr - 20%|██ | 4572/22434 [4:41:56<111:31:44, 22.48s/it] +2025-02-05 14:49:36 - ERROR - stderr - +2025-02-05 14:49:36 - ERROR - stderr - +2025-02-05 14:49:36 - INFO - stdout - {'loss': 0.8209, 'grad_norm': 1.0204249620437622, 'learning_rate': 1.8457791780635288e-05, 'epoch': 0.61} +2025-02-05 14:49:36 - ERROR - stderr - 20%|██ | 4572/22434 [4:41:56<111:31:44, 22.48s/it] +2025-02-05 14:49:39 - ERROR - stderr - 20%|██ | 4573/22434 [4:41:58<81:47:41, 16.49s/it] +2025-02-05 14:49:39 - ERROR - stderr - +2025-02-05 14:49:39 - ERROR - stderr - +2025-02-05 14:49:39 - INFO - stdout - {'loss': 0.9158, 'grad_norm': 1.0798455476760864, 'learning_rate': 1.8457021405861014e-05, 'epoch': 0.61} +2025-02-05 14:49:39 - ERROR - stderr - 20%|██ | 4573/22434 [4:41:58<81:47:41, 16.49s/it] +2025-02-05 14:50:21 - ERROR - stderr - 20%|██ | 4574/22434 [4:42:41<120:45:49, 24.34s/it] +2025-02-05 14:50:21 - ERROR - stderr - +2025-02-05 14:50:21 - ERROR - stderr - +2025-02-05 14:50:21 - INFO - stdout - {'loss': 0.831, 'grad_norm': 0.983466386795044, 'learning_rate': 1.845625085480822e-05, 'epoch': 0.61} +2025-02-05 14:50:21 - ERROR - stderr - 20%|██ | 4574/22434 [4:42:41<120:45:49, 24.34s/it] +2025-02-05 14:50:40 - ERROR - stderr - 20%|██ | 4575/22434 [4:43:00<112:41:56, 22.72s/it] +2025-02-05 14:50:40 - ERROR - stderr - +2025-02-05 14:50:40 - ERROR - stderr - +2025-02-05 14:50:40 - INFO - stdout - {'loss': 0.9387, 'grad_norm': 1.0896072387695312, 'learning_rate': 1.8455480127492968e-05, 'epoch': 0.61} +2025-02-05 14:50:40 - ERROR - stderr - 20%|██ | 4575/22434 [4:43:00<112:41:56, 22.72s/it] +2025-02-05 14:51:26 - ERROR - stderr - 20%|██ | 4576/22434 [4:43:46<147:43:18, 29.78s/it] +2025-02-05 14:51:26 - ERROR - stderr - 
+2025-02-05 14:51:26 - ERROR - stderr - +2025-02-05 14:51:26 - INFO - stdout - {'loss': 0.8393, 'grad_norm': 0.9671067595481873, 'learning_rate': 1.8454709223931323e-05, 'epoch': 0.61} +2025-02-05 14:51:26 - ERROR - stderr - 20%|██ | 4576/22434 [4:43:46<147:43:18, 29.78s/it] +2025-02-05 14:51:41 - ERROR - stderr - 20%|██ | 4577/22434 [4:44:01<125:15:22, 25.25s/it] +2025-02-05 14:51:41 - ERROR - stderr - +2025-02-05 14:51:41 - ERROR - stderr - +2025-02-05 14:51:41 - INFO - stdout - {'loss': 0.9594, 'grad_norm': 0.9905608892440796, 'learning_rate': 1.8453938144139356e-05, 'epoch': 0.61} +2025-02-05 14:51:41 - ERROR - stderr - 20%|██ | 4577/22434 [4:44:01<125:15:22, 25.25s/it] +2025-02-05 14:52:13 - ERROR - stderr - 20%|██ | 4578/22434 [4:44:33<135:28:33, 27.31s/it] +2025-02-05 14:52:13 - ERROR - stderr - +2025-02-05 14:52:13 - ERROR - stderr - +2025-02-05 14:52:13 - INFO - stdout - {'loss': 1.0362, 'grad_norm': 1.0986615419387817, 'learning_rate': 1.845316688813314e-05, 'epoch': 0.61} +2025-02-05 14:52:13 - ERROR - stderr - 20%|██ | 4578/22434 [4:44:33<135:28:33, 27.31s/it] +2025-02-05 14:52:16 - ERROR - stderr - 20%|██ | 4579/22434 [4:44:35<98:26:54, 19.85s/it] +2025-02-05 14:52:16 - ERROR - stderr - +2025-02-05 14:52:16 - ERROR - stderr - +2025-02-05 14:52:16 - INFO - stdout - {'loss': 1.0637, 'grad_norm': 1.175173282623291, 'learning_rate': 1.8452395455928744e-05, 'epoch': 0.61} +2025-02-05 14:52:16 - ERROR - stderr - 20%|██ | 4579/22434 [4:44:35<98:26:54, 19.85s/it] +2025-02-05 14:52:49 - ERROR - stderr - 20%|██ | 4580/22434 [4:45:09<118:12:52, 23.84s/it] +2025-02-05 14:52:49 - ERROR - stderr - +2025-02-05 14:52:49 - ERROR - stderr - +2025-02-05 14:52:49 - INFO - stdout - {'loss': 0.8776, 'grad_norm': 1.1355693340301514, 'learning_rate': 1.8451623847542256e-05, 'epoch': 0.61} +2025-02-05 14:52:49 - ERROR - stderr - 20%|██ | 4580/22434 [4:45:09<118:12:52, 23.84s/it] +2025-02-05 14:53:33 - ERROR - stderr - 20%|██ | 4581/22434 [4:45:53<148:33:45, 29.96s/it] 
+2025-02-05 14:53:33 - ERROR - stderr - +2025-02-05 14:53:33 - ERROR - stderr - +2025-02-05 14:53:33 - INFO - stdout - {'loss': 0.9882, 'grad_norm': 1.1460543870925903, 'learning_rate': 1.8450852062989756e-05, 'epoch': 0.61} +2025-02-05 14:53:33 - ERROR - stderr - 20%|██ | 4581/22434 [4:45:53<148:33:45, 29.96s/it] +2025-02-05 14:53:36 - ERROR - stderr - 20%|██ | 4582/22434 [4:45:55<107:47:24, 21.74s/it] +2025-02-05 14:53:36 - ERROR - stderr - +2025-02-05 14:53:36 - ERROR - stderr - +2025-02-05 14:53:36 - INFO - stdout - {'loss': 0.9201, 'grad_norm': 1.1756792068481445, 'learning_rate': 1.845008010228733e-05, 'epoch': 0.61} +2025-02-05 14:53:36 - ERROR - stderr - 20%|██ | 4582/22434 [4:45:55<107:47:24, 21.74s/it] +2025-02-05 14:53:38 - ERROR - stderr - 20%|██ | 4583/22434 [4:45:58<79:08:02, 15.96s/it] +2025-02-05 14:53:38 - ERROR - stderr - +2025-02-05 14:53:38 - ERROR - stderr - +2025-02-05 14:53:38 - INFO - stdout - {'loss': 1.0574, 'grad_norm': 1.1689866781234741, 'learning_rate': 1.844930796545107e-05, 'epoch': 0.61} +2025-02-05 14:53:38 - ERROR - stderr - 20%|██ | 4583/22434 [4:45:58<79:08:02, 15.96s/it] +2025-02-05 14:53:41 - ERROR - stderr - 20%|██ | 4584/22434 [4:46:00<59:02:21, 11.91s/it] +2025-02-05 14:53:41 - ERROR - stderr - +2025-02-05 14:53:41 - ERROR - stderr - +2025-02-05 14:53:41 - INFO - stdout - {'loss': 1.0118, 'grad_norm': 1.0559935569763184, 'learning_rate': 1.8448535652497073e-05, 'epoch': 0.61} +2025-02-05 14:53:41 - ERROR - stderr - 20%|██ | 4584/22434 [4:46:00<59:02:21, 11.91s/it] +2025-02-05 14:53:43 - ERROR - stderr - 20%|██ | 4585/22434 [4:46:03<44:58:27, 9.07s/it] +2025-02-05 14:53:43 - ERROR - stderr - +2025-02-05 14:53:43 - ERROR - stderr - +2025-02-05 14:53:43 - INFO - stdout - {'loss': 0.8674, 'grad_norm': 1.1090352535247803, 'learning_rate': 1.8447763163441433e-05, 'epoch': 0.61} +2025-02-05 14:53:43 - ERROR - stderr - 20%|██ | 4585/22434 [4:46:03<44:58:27, 9.07s/it] +2025-02-05 14:53:46 - ERROR - stderr - 20%|██ | 4586/22434 
[4:46:05<35:14:20, 7.11s/it] +2025-02-05 14:53:46 - ERROR - stderr - +2025-02-05 14:53:46 - ERROR - stderr - +2025-02-05 14:53:46 - INFO - stdout - {'loss': 0.9188, 'grad_norm': 1.0772432088851929, 'learning_rate': 1.8446990498300254e-05, 'epoch': 0.61} +2025-02-05 14:53:46 - ERROR - stderr - 20%|██ | 4586/22434 [4:46:05<35:14:20, 7.11s/it] +2025-02-05 14:54:40 - ERROR - stderr - 20%|██ | 4587/22434 [4:47:00<105:41:50, 21.32s/it] +2025-02-05 14:54:40 - ERROR - stderr - +2025-02-05 14:54:40 - ERROR - stderr - +2025-02-05 14:54:40 - INFO - stdout - {'loss': 0.8902, 'grad_norm': 1.3038002252578735, 'learning_rate': 1.844621765708964e-05, 'epoch': 0.61} +2025-02-05 14:54:40 - ERROR - stderr - 20%|██ | 4587/22434 [4:47:00<105:41:50, 21.32s/it] +2025-02-05 14:55:15 - ERROR - stderr - 20%|██ | 4588/22434 [4:47:35<125:42:18, 25.36s/it] +2025-02-05 14:55:15 - ERROR - stderr - +2025-02-05 14:55:15 - ERROR - stderr - +2025-02-05 14:55:15 - INFO - stdout - {'loss': 0.9564, 'grad_norm': 1.0212488174438477, 'learning_rate': 1.84454446398257e-05, 'epoch': 0.61} +2025-02-05 14:55:15 - ERROR - stderr - 20%|██ | 4588/22434 [4:47:35<125:42:18, 25.36s/it] +2025-02-05 14:55:17 - ERROR - stderr - 20%|██ | 4589/22434 [4:47:37<91:40:25, 18.49s/it] +2025-02-05 14:55:17 - ERROR - stderr - +2025-02-05 14:55:17 - ERROR - stderr - +2025-02-05 14:55:17 - INFO - stdout - {'loss': 0.9754, 'grad_norm': 1.185678482055664, 'learning_rate': 1.8444671446524552e-05, 'epoch': 0.61} +2025-02-05 14:55:17 - ERROR - stderr - 20%|██ | 4589/22434 [4:47:37<91:40:25, 18.49s/it] +2025-02-05 14:55:53 - ERROR - stderr - 20%|██ | 4590/22434 [4:48:13<117:39:34, 23.74s/it] +2025-02-05 14:55:53 - ERROR - stderr - +2025-02-05 14:55:53 - ERROR - stderr - +2025-02-05 14:55:53 - INFO - stdout - {'loss': 0.8964, 'grad_norm': 1.129547357559204, 'learning_rate': 1.8443898077202306e-05, 'epoch': 0.61} +2025-02-05 14:55:53 - ERROR - stderr - 20%|██ | 4590/22434 [4:48:13<117:39:34, 23.74s/it] +2025-02-05 14:56:21 - ERROR - 
stderr - 20%|██ | 4591/22434 [4:48:40<123:06:44, 24.84s/it] +2025-02-05 14:56:21 - ERROR - stderr - +2025-02-05 14:56:21 - ERROR - stderr - +2025-02-05 14:56:21 - INFO - stdout - {'loss': 1.0037, 'grad_norm': 1.1499437093734741, 'learning_rate': 1.8443124531875086e-05, 'epoch': 0.61} +2025-02-05 14:56:21 - ERROR - stderr - 20%|██ | 4591/22434 [4:48:40<123:06:44, 24.84s/it] +2025-02-05 14:56:46 - ERROR - stderr - 20%|██ | 4592/22434 [4:49:06<124:37:58, 25.15s/it] +2025-02-05 14:56:47 - ERROR - stderr - +2025-02-05 14:56:47 - ERROR - stderr - +2025-02-05 14:56:47 - INFO - stdout - {'loss': 0.8568, 'grad_norm': 1.084995985031128, 'learning_rate': 1.8442350810559012e-05, 'epoch': 0.61} +2025-02-05 14:56:47 - ERROR - stderr - 20%|██ | 4592/22434 [4:49:06<124:37:58, 25.15s/it] +2025-02-05 14:57:20 - ERROR - stderr - 20%|██ | 4593/22434 [4:49:39<136:31:26, 27.55s/it] +2025-02-05 14:57:20 - ERROR - stderr - +2025-02-05 14:57:20 - ERROR - stderr - +2025-02-05 14:57:20 - INFO - stdout - {'loss': 0.9021, 'grad_norm': 1.0891430377960205, 'learning_rate': 1.8441576913270213e-05, 'epoch': 0.61} +2025-02-05 14:57:20 - ERROR - stderr - 20%|██ | 4593/22434 [4:49:39<136:31:26, 27.55s/it] +2025-02-05 14:57:22 - ERROR - stderr - 20%|██ | 4594/22434 [4:49:42<99:13:16, 20.02s/it] +2025-02-05 14:57:22 - ERROR - stderr - +2025-02-05 14:57:22 - ERROR - stderr - +2025-02-05 14:57:22 - INFO - stdout - {'loss': 1.0208, 'grad_norm': 1.162308931350708, 'learning_rate': 1.8440802840024824e-05, 'epoch': 0.61} +2025-02-05 14:57:22 - ERROR - stderr - 20%|██ | 4594/22434 [4:49:42<99:13:16, 20.02s/it] +2025-02-05 14:57:25 - ERROR - stderr - 20%|██ | 4595/22434 [4:49:44<73:06:42, 14.75s/it] +2025-02-05 14:57:25 - ERROR - stderr - +2025-02-05 14:57:25 - ERROR - stderr - +2025-02-05 14:57:25 - INFO - stdout - {'loss': 0.923, 'grad_norm': 1.1022157669067383, 'learning_rate': 1.8440028590838975e-05, 'epoch': 0.61} +2025-02-05 14:57:25 - ERROR - stderr - 20%|██ | 4595/22434 [4:49:44<73:06:42, 14.75s/it] 
+2025-02-05 14:57:53 - ERROR - stderr - 20%|██ | 4596/22434 [4:50:13<93:40:32, 18.91s/it] +2025-02-05 14:57:53 - ERROR - stderr - +2025-02-05 14:57:53 - ERROR - stderr - +2025-02-05 14:57:53 - INFO - stdout - {'loss': 0.9396, 'grad_norm': 1.1547234058380127, 'learning_rate': 1.8439254165728805e-05, 'epoch': 0.61} +2025-02-05 14:57:53 - ERROR - stderr - 20%|██ | 4596/22434 [4:50:13<93:40:32, 18.91s/it] +2025-02-05 14:58:18 - ERROR - stderr - 20%|██ | 4597/22434 [4:50:38<102:32:36, 20.70s/it] +2025-02-05 14:58:18 - ERROR - stderr - +2025-02-05 14:58:18 - ERROR - stderr - +2025-02-05 14:58:18 - INFO - stdout - {'loss': 0.9575, 'grad_norm': 1.0485843420028687, 'learning_rate': 1.8438479564710458e-05, 'epoch': 0.61} +2025-02-05 14:58:18 - ERROR - stderr - 20%|██ | 4597/22434 [4:50:38<102:32:36, 20.70s/it] +2025-02-05 14:58:42 - ERROR - stderr - 20%|██ | 4598/22434 [4:51:02<107:19:44, 21.66s/it] +2025-02-05 14:58:42 - ERROR - stderr - +2025-02-05 14:58:42 - ERROR - stderr - +2025-02-05 14:58:42 - INFO - stdout - {'loss': 0.9345, 'grad_norm': 1.1971862316131592, 'learning_rate': 1.8437704787800085e-05, 'epoch': 0.61} +2025-02-05 14:58:42 - ERROR - stderr - 20%|██ | 4598/22434 [4:51:02<107:19:44, 21.66s/it] +2025-02-05 14:59:05 - ERROR - stderr - 21%|██ | 4599/22434 [4:51:25<109:42:38, 22.15s/it] +2025-02-05 14:59:05 - ERROR - stderr - +2025-02-05 14:59:05 - ERROR - stderr - +2025-02-05 14:59:05 - INFO - stdout - {'loss': 1.0097, 'grad_norm': 1.1647599935531616, 'learning_rate': 1.8436929835013823e-05, 'epoch': 0.62} +2025-02-05 14:59:05 - ERROR - stderr - 21%|██ | 4599/22434 [4:51:25<109:42:38, 22.15s/it] +2025-02-05 14:59:33 - ERROR - stderr - 21%|██ | 4600/22434 [4:51:53<118:11:06, 23.86s/it] +2025-02-05 14:59:33 - ERROR - stderr - +2025-02-05 14:59:33 - ERROR - stderr - +2025-02-05 14:59:33 - INFO - stdout - {'loss': 1.0451, 'grad_norm': 1.0963987112045288, 'learning_rate': 1.843615470636783e-05, 'epoch': 0.62} +2025-02-05 14:59:33 - ERROR - stderr - 21%|██ | 
4600/22434 [4:51:53<118:11:06, 23.86s/it] +2025-02-05 14:59:36 - ERROR - stderr - 21%|██ | 4601/22434 [4:51:55<86:24:58, 17.45s/it] +2025-02-05 14:59:36 - ERROR - stderr - +2025-02-05 14:59:36 - ERROR - stderr - +2025-02-05 14:59:36 - INFO - stdout - {'loss': 0.9089, 'grad_norm': 1.0143883228302002, 'learning_rate': 1.8435379401878274e-05, 'epoch': 0.62} +2025-02-05 14:59:36 - ERROR - stderr - 21%|██ | 4601/22434 [4:51:55<86:24:58, 17.45s/it] +2025-02-05 14:59:38 - ERROR - stderr - 21%|██ | 4602/22434 [4:51:58<64:15:42, 12.97s/it] +2025-02-05 14:59:38 - ERROR - stderr - +2025-02-05 14:59:38 - ERROR - stderr - +2025-02-05 14:59:38 - INFO - stdout - {'loss': 0.8999, 'grad_norm': 1.1572073698043823, 'learning_rate': 1.84346039215613e-05, 'epoch': 0.62} +2025-02-05 14:59:38 - ERROR - stderr - 21%|██ | 4602/22434 [4:51:58<64:15:42, 12.97s/it] +2025-02-05 14:59:41 - ERROR - stderr - 21%|██ | 4603/22434 [4:52:00<48:39:03, 9.82s/it] +2025-02-05 14:59:41 - ERROR - stderr - +2025-02-05 14:59:41 - ERROR - stderr - +2025-02-05 14:59:41 - INFO - stdout - {'loss': 0.9531, 'grad_norm': 1.0570807456970215, 'learning_rate': 1.8433828265433078e-05, 'epoch': 0.62} +2025-02-05 14:59:41 - ERROR - stderr - 21%|██ | 4603/22434 [4:52:00<48:39:03, 9.82s/it] +2025-02-05 14:59:43 - ERROR - stderr - 21%|██ | 4604/22434 [4:52:03<37:51:21, 7.64s/it] +2025-02-05 14:59:43 - ERROR - stderr - +2025-02-05 14:59:43 - ERROR - stderr - +2025-02-05 14:59:43 - INFO - stdout - {'loss': 1.0439, 'grad_norm': 1.1528053283691406, 'learning_rate': 1.843305243350978e-05, 'epoch': 0.62} +2025-02-05 14:59:43 - ERROR - stderr - 21%|██ | 4604/22434 [4:52:03<37:51:21, 7.64s/it] +2025-02-05 14:59:46 - ERROR - stderr - 21%|██ | 4605/22434 [4:52:05<30:15:40, 6.11s/it] +2025-02-05 14:59:46 - ERROR - stderr - +2025-02-05 14:59:46 - ERROR - stderr - +2025-02-05 14:59:46 - INFO - stdout - {'loss': 1.0176, 'grad_norm': 1.0917670726776123, 'learning_rate': 1.8432276425807566e-05, 'epoch': 0.62} +2025-02-05 14:59:46 - ERROR - 
stderr - 21%|██ | 4605/22434 [4:52:05<30:15:40, 6.11s/it] +2025-02-05 14:59:48 - ERROR - stderr - 21%|██ | 4606/22434 [4:52:08<24:51:00, 5.02s/it] +2025-02-05 14:59:48 - ERROR - stderr - +2025-02-05 14:59:48 - ERROR - stderr - +2025-02-05 14:59:48 - INFO - stdout - {'loss': 0.94, 'grad_norm': 1.0241259336471558, 'learning_rate': 1.8431500242342623e-05, 'epoch': 0.62} +2025-02-05 14:59:48 - ERROR - stderr - 21%|██ | 4606/22434 [4:52:08<24:51:00, 5.02s/it] +2025-02-05 14:59:51 - ERROR - stderr - 21%|██ | 4607/22434 [4:52:10<21:07:07, 4.26s/it] +2025-02-05 14:59:51 - ERROR - stderr - +2025-02-05 14:59:51 - ERROR - stderr - +2025-02-05 14:59:51 - INFO - stdout - {'loss': 0.916, 'grad_norm': 1.0566401481628418, 'learning_rate': 1.843072388313113e-05, 'epoch': 0.62} +2025-02-05 14:59:51 - ERROR - stderr - 21%|██ | 4607/22434 [4:52:10<21:07:07, 4.26s/it] +2025-02-05 14:59:53 - ERROR - stderr - 21%|██ | 4608/22434 [4:52:13<18:25:47, 3.72s/it] +2025-02-05 14:59:53 - ERROR - stderr - +2025-02-05 14:59:53 - ERROR - stderr - +2025-02-05 14:59:53 - INFO - stdout - {'loss': 1.0375, 'grad_norm': 1.1511932611465454, 'learning_rate': 1.8429947348189257e-05, 'epoch': 0.62} +2025-02-05 14:59:53 - ERROR - stderr - 21%|██ | 4608/22434 [4:52:13<18:25:47, 3.72s/it] +2025-02-05 14:59:56 - ERROR - stderr - 21%|██ | 4609/22434 [4:52:15<16:42:09, 3.37s/it] +2025-02-05 14:59:56 - ERROR - stderr - +2025-02-05 14:59:56 - ERROR - stderr - +2025-02-05 14:59:56 - INFO - stdout - {'loss': 1.0052, 'grad_norm': 1.0721886157989502, 'learning_rate': 1.8429170637533206e-05, 'epoch': 0.62} +2025-02-05 14:59:56 - ERROR - stderr - 21%|██ | 4609/22434 [4:52:15<16:42:09, 3.37s/it] +2025-02-05 14:59:58 - ERROR - stderr - 21%|██ | 4610/22434 [4:52:18<15:20:35, 3.10s/it] +2025-02-05 14:59:58 - ERROR - stderr - +2025-02-05 14:59:58 - ERROR - stderr - +2025-02-05 14:59:58 - INFO - stdout - {'loss': 1.0661, 'grad_norm': 1.043841004371643, 'learning_rate': 1.8428393751179154e-05, 'epoch': 0.62} +2025-02-05 14:59:58 
- ERROR - stderr - 21%|██ | 4610/22434 [4:52:18<15:20:35, 3.10s/it] +2025-02-05 15:00:01 - ERROR - stderr - 21%|██ | 4611/22434 [4:52:20<14:27:51, 2.92s/it] +2025-02-05 15:00:01 - ERROR - stderr - +2025-02-05 15:00:01 - ERROR - stderr - +2025-02-05 15:00:01 - INFO - stdout - {'loss': 1.0143, 'grad_norm': 1.049148440361023, 'learning_rate': 1.84276166891433e-05, 'epoch': 0.62} +2025-02-05 15:00:01 - ERROR - stderr - 21%|██ | 4611/22434 [4:52:20<14:27:51, 2.92s/it] +2025-02-05 15:00:03 - ERROR - stderr - 21%|██ | 4612/22434 [4:52:23<13:49:06, 2.79s/it] +2025-02-05 15:00:03 - ERROR - stderr - +2025-02-05 15:00:03 - ERROR - stderr - +2025-02-05 15:00:03 - INFO - stdout - {'loss': 1.0137, 'grad_norm': 1.106191873550415, 'learning_rate': 1.842683945144184e-05, 'epoch': 0.62} +2025-02-05 15:00:03 - ERROR - stderr - 21%|██ | 4612/22434 [4:52:23<13:49:06, 2.79s/it] +2025-02-05 15:00:06 - ERROR - stderr - 21%|██ | 4613/22434 [4:52:25<13:20:20, 2.69s/it] +2025-02-05 15:00:06 - ERROR - stderr - +2025-02-05 15:00:06 - ERROR - stderr - +2025-02-05 15:00:06 - INFO - stdout - {'loss': 0.8247, 'grad_norm': 1.0513697862625122, 'learning_rate': 1.8426062038090976e-05, 'epoch': 0.62} +2025-02-05 15:00:06 - ERROR - stderr - 21%|██ | 4613/22434 [4:52:25<13:20:20, 2.69s/it] +2025-02-05 15:00:08 - ERROR - stderr - 21%|██ | 4614/22434 [4:52:28<13:06:12, 2.65s/it] +2025-02-05 15:00:08 - ERROR - stderr - +2025-02-05 15:00:08 - ERROR - stderr - +2025-02-05 15:00:08 - INFO - stdout - {'loss': 0.9646, 'grad_norm': 1.0746268033981323, 'learning_rate': 1.8425284449106912e-05, 'epoch': 0.62} +2025-02-05 15:00:08 - ERROR - stderr - 21%|██ | 4614/22434 [4:52:28<13:06:12, 2.65s/it] +2025-02-05 15:00:11 - ERROR - stderr - 21%|██ | 4615/22434 [4:52:31<13:09:18, 2.66s/it] +2025-02-05 15:00:11 - ERROR - stderr - +2025-02-05 15:00:11 - ERROR - stderr - +2025-02-05 15:00:11 - INFO - stdout - {'loss': 0.9417, 'grad_norm': 1.2171393632888794, 'learning_rate': 1.8424506684505854e-05, 'epoch': 0.62} 
+2025-02-05 15:00:11 - ERROR - stderr - 21%|██ | 4615/22434 [4:52:31<13:09:18, 2.66s/it] +2025-02-05 15:00:13 - ERROR - stderr - 21%|██ | 4616/22434 [4:52:33<12:53:19, 2.60s/it] +2025-02-05 15:00:13 - ERROR - stderr - +2025-02-05 15:00:13 - ERROR - stderr - +2025-02-05 15:00:13 - INFO - stdout - {'loss': 0.8997, 'grad_norm': 0.9602169990539551, 'learning_rate': 1.8423728744304017e-05, 'epoch': 0.62} +2025-02-05 15:00:13 - ERROR - stderr - 21%|██ | 4616/22434 [4:52:33<12:53:19, 2.60s/it] +2025-02-05 15:00:16 - ERROR - stderr - 21%|██ | 4617/22434 [4:52:36<13:15:37, 2.68s/it] +2025-02-05 15:00:16 - ERROR - stderr - +2025-02-05 15:00:16 - ERROR - stderr - +2025-02-05 15:00:16 - INFO - stdout - {'loss': 0.9718, 'grad_norm': 1.1689722537994385, 'learning_rate': 1.8422950628517616e-05, 'epoch': 0.62} +2025-02-05 15:00:16 - ERROR - stderr - 21%|██ | 4617/22434 [4:52:36<13:15:37, 2.68s/it] +2025-02-05 15:00:19 - ERROR - stderr - 21%|██ | 4618/22434 [4:52:38<12:59:04, 2.62s/it] +2025-02-05 15:00:19 - ERROR - stderr - +2025-02-05 15:00:19 - ERROR - stderr - +2025-02-05 15:00:19 - INFO - stdout - {'loss': 0.9334, 'grad_norm': 1.3014100790023804, 'learning_rate': 1.8422172337162865e-05, 'epoch': 0.62} +2025-02-05 15:00:19 - ERROR - stderr - 21%|██ | 4618/22434 [4:52:38<12:59:04, 2.62s/it] +2025-02-05 15:00:21 - ERROR - stderr - 21%|██ | 4619/22434 [4:52:41<12:44:11, 2.57s/it] +2025-02-05 15:00:21 - ERROR - stderr - +2025-02-05 15:00:21 - ERROR - stderr - +2025-02-05 15:00:21 - INFO - stdout - {'loss': 0.976, 'grad_norm': 1.1799534559249878, 'learning_rate': 1.8421393870255996e-05, 'epoch': 0.62} +2025-02-05 15:00:21 - ERROR - stderr - 21%|██ | 4619/22434 [4:52:41<12:44:11, 2.57s/it] +2025-02-05 15:00:45 - ERROR - stderr - 21%|██ | 4620/22434 [4:53:05<44:23:20, 8.97s/it] +2025-02-05 15:00:45 - ERROR - stderr - +2025-02-05 15:00:45 - ERROR - stderr - +2025-02-05 15:00:45 - INFO - stdout - {'loss': 0.9268, 'grad_norm': 1.1077040433883667, 'learning_rate': 1.8420615227813227e-05, 
'epoch': 0.62} +2025-02-05 15:00:45 - ERROR - stderr - 21%|██ | 4620/22434 [4:53:05<44:23:20, 8.97s/it] +2025-02-05 15:01:08 - ERROR - stderr - 21%|██ | 4621/22434 [4:53:28<65:31:58, 13.24s/it] +2025-02-05 15:01:08 - ERROR - stderr - +2025-02-05 15:01:08 - ERROR - stderr - +2025-02-05 15:01:08 - INFO - stdout - {'loss': 0.9542, 'grad_norm': 1.1594727039337158, 'learning_rate': 1.8419836409850794e-05, 'epoch': 0.62} +2025-02-05 15:01:08 - ERROR - stderr - 21%|██ | 4621/22434 [4:53:28<65:31:58, 13.24s/it] +2025-02-05 15:01:38 - ERROR - stderr - 21%|██ | 4622/22434 [4:53:57<89:23:53, 18.07s/it] +2025-02-05 15:01:38 - ERROR - stderr - +2025-02-05 15:01:38 - ERROR - stderr - +2025-02-05 15:01:38 - INFO - stdout - {'loss': 0.946, 'grad_norm': 1.0005004405975342, 'learning_rate': 1.8419057416384927e-05, 'epoch': 0.62} +2025-02-05 15:01:38 - ERROR - stderr - 21%|██ | 4622/22434 [4:53:57<89:23:53, 18.07s/it] +2025-02-05 15:01:40 - ERROR - stderr - 21%|██ | 4623/22434 [4:54:00<66:22:55, 13.42s/it] +2025-02-05 15:01:40 - ERROR - stderr - +2025-02-05 15:01:40 - ERROR - stderr - +2025-02-05 15:01:40 - INFO - stdout - {'loss': 0.9058, 'grad_norm': 1.129563808441162, 'learning_rate': 1.8418278247431862e-05, 'epoch': 0.62} +2025-02-05 15:01:40 - ERROR - stderr - 21%|██ | 4623/22434 [4:54:00<66:22:55, 13.42s/it] +2025-02-05 15:01:43 - ERROR - stderr - 21%|██ | 4624/22434 [4:54:02<50:06:31, 10.13s/it] +2025-02-05 15:01:43 - ERROR - stderr - +2025-02-05 15:01:43 - ERROR - stderr - +2025-02-05 15:01:43 - INFO - stdout - {'loss': 0.9461, 'grad_norm': 1.0353795289993286, 'learning_rate': 1.8417498903007845e-05, 'epoch': 0.62} +2025-02-05 15:01:43 - ERROR - stderr - 21%|██ | 4624/22434 [4:54:02<50:06:31, 10.13s/it] +2025-02-05 15:02:19 - ERROR - stderr - 21%|██ | 4625/22434 [4:54:38<88:42:32, 17.93s/it] +2025-02-05 15:02:19 - ERROR - stderr - +2025-02-05 15:02:19 - ERROR - stderr - +2025-02-05 15:02:19 - INFO - stdout - {'loss': 1.0126, 'grad_norm': 1.0598088502883911, 'learning_rate': 
1.8416719383129114e-05, 'epoch': 0.62} +2025-02-05 15:02:19 - ERROR - stderr - 21%|██ | 4625/22434 [4:54:38<88:42:32, 17.93s/it] +2025-02-05 15:03:14 - ERROR - stderr - 21%|██ | 4626/22434 [4:55:34<143:47:54, 29.07s/it] +2025-02-05 15:03:14 - ERROR - stderr - +2025-02-05 15:03:14 - ERROR - stderr - +2025-02-05 15:03:14 - INFO - stdout - {'loss': 1.058, 'grad_norm': 1.135843276977539, 'learning_rate': 1.8415939687811927e-05, 'epoch': 0.62} +2025-02-05 15:03:14 - ERROR - stderr - 21%|██ | 4626/22434 [4:55:34<143:47:54, 29.07s/it] +2025-02-05 15:04:02 - ERROR - stderr - 21%|██ | 4627/22434 [4:56:22<172:20:16, 34.84s/it] +2025-02-05 15:04:02 - ERROR - stderr - +2025-02-05 15:04:02 - ERROR - stderr - +2025-02-05 15:04:02 - INFO - stdout - {'loss': 0.9312, 'grad_norm': 0.9938992857933044, 'learning_rate': 1.8415159817072525e-05, 'epoch': 0.62} +2025-02-05 15:04:02 - ERROR - stderr - 21%|██ | 4627/22434 [4:56:22<172:20:16, 34.84s/it] +2025-02-05 15:04:30 - ERROR - stderr - 21%|██ | 4628/22434 [4:56:50<162:36:29, 32.88s/it] +2025-02-05 15:04:30 - ERROR - stderr - +2025-02-05 15:04:30 - ERROR - stderr - +2025-02-05 15:04:30 - INFO - stdout - {'loss': 0.9, 'grad_norm': 0.9811779856681824, 'learning_rate': 1.841437977092717e-05, 'epoch': 0.62} +2025-02-05 15:04:30 - ERROR - stderr - 21%|██ | 4628/22434 [4:56:50<162:36:29, 32.88s/it] +2025-02-05 15:04:47 - ERROR - stderr - 21%|██ | 4629/22434 [4:57:06<137:55:10, 27.89s/it] +2025-02-05 15:04:47 - ERROR - stderr - +2025-02-05 15:04:47 - ERROR - stderr - +2025-02-05 15:04:47 - INFO - stdout - {'loss': 1.0626, 'grad_norm': 1.094675898551941, 'learning_rate': 1.8413599549392126e-05, 'epoch': 0.62} +2025-02-05 15:04:47 - ERROR - stderr - 21%|██ | 4629/22434 [4:57:06<137:55:10, 27.89s/it] +2025-02-05 15:05:37 - ERROR - stderr - 21%|██ | 4630/22434 [4:57:56<170:56:18, 34.56s/it] +2025-02-05 15:05:37 - ERROR - stderr - +2025-02-05 15:05:37 - ERROR - stderr - +2025-02-05 15:05:37 - INFO - stdout - {'loss': 0.9661, 'grad_norm': 
1.2727317810058594, 'learning_rate': 1.8412819152483643e-05, 'epoch': 0.62} +2025-02-05 15:05:37 - ERROR - stderr - 21%|██ | 4630/22434 [4:57:57<170:56:18, 34.56s/it] +2025-02-05 15:06:25 - ERROR - stderr - 21%|██ | 4631/22434 [4:58:44<190:51:05, 38.59s/it] +2025-02-05 15:06:25 - ERROR - stderr - +2025-02-05 15:06:25 - ERROR - stderr - +2025-02-05 15:06:25 - INFO - stdout - {'loss': 1.0314, 'grad_norm': 1.0767731666564941, 'learning_rate': 1.8412038580218002e-05, 'epoch': 0.62} +2025-02-05 15:06:25 - ERROR - stderr - 21%|██ | 4631/22434 [4:58:45<190:51:05, 38.59s/it] +2025-02-05 15:06:37 - ERROR - stderr - 21%|██ | 4632/22434 [4:58:56<151:16:43, 30.59s/it] +2025-02-05 15:06:37 - ERROR - stderr - +2025-02-05 15:06:37 - ERROR - stderr - +2025-02-05 15:06:37 - INFO - stdout - {'loss': 0.8313, 'grad_norm': 1.2193446159362793, 'learning_rate': 1.8411257832611463e-05, 'epoch': 0.62} +2025-02-05 15:06:37 - ERROR - stderr - 21%|██ | 4632/22434 [4:58:56<151:16:43, 30.59s/it] +2025-02-05 15:07:10 - ERROR - stderr - 21%|██ | 4633/22434 [4:59:30<155:34:18, 31.46s/it] +2025-02-05 15:07:10 - ERROR - stderr - +2025-02-05 15:07:10 - ERROR - stderr - +2025-02-05 15:07:10 - INFO - stdout - {'loss': 0.9507, 'grad_norm': 1.0851116180419922, 'learning_rate': 1.84104769096803e-05, 'epoch': 0.62} +2025-02-05 15:07:10 - ERROR - stderr - 21%|██ | 4633/22434 [4:59:30<155:34:18, 31.46s/it] +2025-02-05 15:07:59 - ERROR - stderr - 21%|██ | 4634/22434 [5:00:18<180:48:49, 36.57s/it] +2025-02-05 15:07:59 - ERROR - stderr - +2025-02-05 15:07:59 - ERROR - stderr - +2025-02-05 15:07:59 - INFO - stdout - {'loss': 0.8756, 'grad_norm': 1.064950942993164, 'learning_rate': 1.8409695811440796e-05, 'epoch': 0.62} +2025-02-05 15:07:59 - ERROR - stderr - 21%|██ | 4634/22434 [5:00:18<180:48:49, 36.57s/it] +2025-02-05 15:08:01 - ERROR - stderr - 21%|██ | 4635/22434 [5:00:21<130:20:40, 26.36s/it] +2025-02-05 15:08:01 - ERROR - stderr - +2025-02-05 15:08:01 - ERROR - stderr - +2025-02-05 15:08:01 - INFO - stdout 
- {'loss': 0.8942, 'grad_norm': 1.0337715148925781, 'learning_rate': 1.840891453790923e-05, 'epoch': 0.62} +2025-02-05 15:08:01 - ERROR - stderr - 21%|██ | 4635/22434 [5:00:21<130:20:40, 26.36s/it] +2025-02-05 15:08:54 - ERROR - stderr - 21%|██ | 4636/22434 [5:01:13<169:00:36, 34.19s/it] +2025-02-05 15:08:54 - ERROR - stderr - +2025-02-05 15:08:54 - ERROR - stderr - +2025-02-05 15:08:54 - INFO - stdout - {'loss': 0.9237, 'grad_norm': 1.1527636051177979, 'learning_rate': 1.840813308910189e-05, 'epoch': 0.62} +2025-02-05 15:08:54 - ERROR - stderr - 21%|██ | 4636/22434 [5:01:13<169:00:36, 34.19s/it] +2025-02-05 15:09:48 - ERROR - stderr - 21%|██ | 4637/22434 [5:02:07<198:32:51, 40.16s/it] +2025-02-05 15:09:48 - ERROR - stderr - +2025-02-05 15:09:48 - ERROR - stderr - +2025-02-05 15:09:48 - INFO - stdout - {'loss': 1.0748, 'grad_norm': 1.0866047143936157, 'learning_rate': 1.8407351465035056e-05, 'epoch': 0.62} +2025-02-05 15:09:48 - ERROR - stderr - 21%|██ | 4637/22434 [5:02:08<198:32:51, 40.16s/it] +2025-02-05 15:10:39 - ERROR - stderr - 21%|██ | 4638/22434 [5:02:59<215:43:27, 43.64s/it] +2025-02-05 15:10:40 - ERROR - stderr - +2025-02-05 15:10:40 - ERROR - stderr - +2025-02-05 15:10:40 - INFO - stdout - {'loss': 0.8488, 'grad_norm': 0.9458177089691162, 'learning_rate': 1.8406569665725033e-05, 'epoch': 0.62} +2025-02-05 15:10:40 - ERROR - stderr - 21%|██ | 4638/22434 [5:02:59<215:43:27, 43.64s/it] +2025-02-05 15:10:42 - ERROR - stderr - 21%|██ | 4639/22434 [5:03:02<154:42:17, 31.30s/it] +2025-02-05 15:10:42 - ERROR - stderr - +2025-02-05 15:10:42 - ERROR - stderr - +2025-02-05 15:10:42 - INFO - stdout - {'loss': 0.9238, 'grad_norm': 1.008725881576538, 'learning_rate': 1.84057876911881e-05, 'epoch': 0.62} +2025-02-05 15:10:42 - ERROR - stderr - 21%|██ | 4639/22434 [5:03:02<154:42:17, 31.30s/it] +2025-02-05 15:11:16 - ERROR - stderr - 21%|██ | 4640/22434 [5:03:35<158:12:45, 32.01s/it] +2025-02-05 15:11:16 - ERROR - stderr - +2025-02-05 15:11:16 - ERROR - stderr - 
+2025-02-05 15:11:16 - INFO - stdout - {'loss': 1.0503, 'grad_norm': 1.1742769479751587, 'learning_rate': 1.840500554144057e-05, 'epoch': 0.62} +2025-02-05 15:11:16 - ERROR - stderr - 21%|██ | 4640/22434 [5:03:35<158:12:45, 32.01s/it] +2025-02-05 15:12:02 - ERROR - stderr - 21%|██ | 4641/22434 [5:04:22<180:03:23, 36.43s/it] +2025-02-05 15:12:02 - ERROR - stderr - +2025-02-05 15:12:02 - ERROR - stderr - +2025-02-05 15:12:02 - INFO - stdout - {'loss': 0.8906, 'grad_norm': 1.0403498411178589, 'learning_rate': 1.8404223216498747e-05, 'epoch': 0.62} +2025-02-05 15:12:02 - ERROR - stderr - 21%|██ | 4641/22434 [5:04:22<180:03:23, 36.43s/it] +2025-02-05 15:12:46 - ERROR - stderr - 21%|██ | 4642/22434 [5:05:05<190:10:44, 38.48s/it] +2025-02-05 15:12:46 - ERROR - stderr - +2025-02-05 15:12:46 - ERROR - stderr - +2025-02-05 15:12:46 - INFO - stdout - {'loss': 0.8627, 'grad_norm': 0.9641141295433044, 'learning_rate': 1.840344071637893e-05, 'epoch': 0.62} +2025-02-05 15:12:46 - ERROR - stderr - 21%|██ | 4642/22434 [5:05:05<190:10:44, 38.48s/it] +2025-02-05 15:12:48 - ERROR - stderr - 21%|██ | 4643/22434 [5:05:08<136:53:12, 27.70s/it] +2025-02-05 15:12:48 - ERROR - stderr - +2025-02-05 15:12:48 - ERROR - stderr - +2025-02-05 15:12:48 - INFO - stdout - {'loss': 0.8605, 'grad_norm': 0.9822632074356079, 'learning_rate': 1.840265804109743e-05, 'epoch': 0.62} +2025-02-05 15:12:48 - ERROR - stderr - 21%|██ | 4643/22434 [5:05:08<136:53:12, 27.70s/it] +2025-02-05 15:12:51 - ERROR - stderr - 21%|██ | 4644/22434 [5:05:10<99:30:09, 20.14s/it] +2025-02-05 15:12:51 - ERROR - stderr - +2025-02-05 15:12:51 - ERROR - stderr - +2025-02-05 15:12:51 - INFO - stdout - {'loss': 0.8634, 'grad_norm': 0.959027111530304, 'learning_rate': 1.8401875190670565e-05, 'epoch': 0.62} +2025-02-05 15:12:51 - ERROR - stderr - 21%|██ | 4644/22434 [5:05:10<99:30:09, 20.14s/it] +2025-02-05 15:13:29 - ERROR - stderr - 21%|██ | 4645/22434 [5:05:48<125:59:54, 25.50s/it] +2025-02-05 15:13:29 - ERROR - stderr - 
+2025-02-05 15:13:29 - ERROR - stderr - +2025-02-05 15:13:29 - INFO - stdout - {'loss': 0.9709, 'grad_norm': 1.1175315380096436, 'learning_rate': 1.8401092165114654e-05, 'epoch': 0.62} +2025-02-05 15:13:29 - ERROR - stderr - 21%|██ | 4645/22434 [5:05:49<125:59:54, 25.50s/it] +2025-02-05 15:14:03 - ERROR - stderr - 21%|██ | 4646/22434 [5:06:23<138:58:06, 28.12s/it] +2025-02-05 15:14:03 - ERROR - stderr - +2025-02-05 15:14:03 - ERROR - stderr - +2025-02-05 15:14:03 - INFO - stdout - {'loss': 1.0036, 'grad_norm': 1.069287657737732, 'learning_rate': 1.840030896444601e-05, 'epoch': 0.62} +2025-02-05 15:14:03 - ERROR - stderr - 21%|██ | 4646/22434 [5:06:23<138:58:06, 28.12s/it] +2025-02-05 15:14:27 - ERROR - stderr - 21%|██ | 4647/22434 [5:06:47<132:44:59, 26.87s/it] +2025-02-05 15:14:27 - ERROR - stderr - +2025-02-05 15:14:27 - ERROR - stderr - +2025-02-05 15:14:27 - INFO - stdout - {'loss': 0.9933, 'grad_norm': 1.1036072969436646, 'learning_rate': 1.839952558868097e-05, 'epoch': 0.62} +2025-02-05 15:14:27 - ERROR - stderr - 21%|██ | 4647/22434 [5:06:47<132:44:59, 26.87s/it] +2025-02-05 15:14:29 - ERROR - stderr - 21%|██ | 4648/22434 [5:06:49<96:36:14, 19.55s/it] +2025-02-05 15:14:29 - ERROR - stderr - +2025-02-05 15:14:29 - ERROR - stderr - +2025-02-05 15:14:29 - INFO - stdout - {'loss': 1.1598, 'grad_norm': 1.1730804443359375, 'learning_rate': 1.8398742037835853e-05, 'epoch': 0.62} +2025-02-05 15:14:29 - ERROR - stderr - 21%|██ | 4648/22434 [5:06:49<96:36:14, 19.55s/it] +2025-02-05 15:14:58 - ERROR - stderr - 21%|██ | 4649/22434 [5:07:17<109:26:38, 22.15s/it] +2025-02-05 15:14:58 - ERROR - stderr - +2025-02-05 15:14:58 - ERROR - stderr - +2025-02-05 15:14:58 - INFO - stdout - {'loss': 0.8336, 'grad_norm': 1.0492829084396362, 'learning_rate': 1.8397958311927e-05, 'epoch': 0.62} +2025-02-05 15:14:58 - ERROR - stderr - 21%|██ | 4649/22434 [5:07:17<109:26:38, 22.15s/it] +2025-02-05 15:15:34 - ERROR - stderr - 21%|██ | 4650/22434 [5:07:54<130:44:28, 26.47s/it] +2025-02-05 
15:15:34 - ERROR - stderr - +2025-02-05 15:15:34 - ERROR - stderr - +2025-02-05 15:15:34 - INFO - stdout - {'loss': 0.9326, 'grad_norm': 0.9577750563621521, 'learning_rate': 1.8397174410970736e-05, 'epoch': 0.62} +2025-02-05 15:15:34 - ERROR - stderr - 21%|██ | 4650/22434 [5:07:54<130:44:28, 26.47s/it] +2025-02-05 15:16:08 - ERROR - stderr - 21%|██ | 4651/22434 [5:08:28<141:38:28, 28.67s/it] +2025-02-05 15:16:08 - ERROR - stderr - +2025-02-05 15:16:08 - ERROR - stderr - +2025-02-05 15:16:08 - INFO - stdout - {'loss': 0.922, 'grad_norm': 1.0941472053527832, 'learning_rate': 1.8396390334983406e-05, 'epoch': 0.62} +2025-02-05 15:16:08 - ERROR - stderr - 21%|██ | 4651/22434 [5:08:28<141:38:28, 28.67s/it] +2025-02-05 15:16:10 - ERROR - stderr - 21%|██ | 4652/22434 [5:08:30<102:52:00, 20.83s/it] +2025-02-05 15:16:11 - ERROR - stderr - +2025-02-05 15:16:11 - ERROR - stderr - +2025-02-05 15:16:11 - INFO - stdout - {'loss': 0.8784, 'grad_norm': 1.0802595615386963, 'learning_rate': 1.839560608398136e-05, 'epoch': 0.62} +2025-02-05 15:16:11 - ERROR - stderr - 21%|██ | 4652/22434 [5:08:30<102:52:00, 20.83s/it] +2025-02-05 15:16:13 - ERROR - stderr - 21%|██ | 4653/22434 [5:08:33<75:45:01, 15.34s/it] +2025-02-05 15:16:13 - ERROR - stderr - +2025-02-05 15:16:13 - ERROR - stderr - +2025-02-05 15:16:13 - INFO - stdout - {'loss': 0.8857, 'grad_norm': 1.0528788566589355, 'learning_rate': 1.8394821657980936e-05, 'epoch': 0.62} +2025-02-05 15:16:13 - ERROR - stderr - 21%|██ | 4653/22434 [5:08:33<75:45:01, 15.34s/it] +2025-02-05 15:16:50 - ERROR - stderr - 21%|██ | 4654/22434 [5:09:10<108:23:04, 21.95s/it] +2025-02-05 15:16:50 - ERROR - stderr - +2025-02-05 15:16:50 - ERROR - stderr - +2025-02-05 15:16:50 - INFO - stdout - {'loss': 0.9671, 'grad_norm': 1.1716103553771973, 'learning_rate': 1.8394037056998485e-05, 'epoch': 0.62} +2025-02-05 15:16:50 - ERROR - stderr - 21%|██ | 4654/22434 [5:09:10<108:23:04, 21.95s/it] +2025-02-05 15:16:53 - ERROR - stderr - 21%|██ | 4655/22434 
[5:09:13<79:53:47, 16.18s/it] +2025-02-05 15:16:53 - ERROR - stderr - +2025-02-05 15:16:53 - ERROR - stderr - +2025-02-05 15:16:53 - INFO - stdout - {'loss': 0.9228, 'grad_norm': 1.091599941253662, 'learning_rate': 1.8393252281050364e-05, 'epoch': 0.62} +2025-02-05 15:16:53 - ERROR - stderr - 21%|██ | 4655/22434 [5:09:13<79:53:47, 16.18s/it] +2025-02-05 15:16:56 - ERROR - stderr - 21%|██ | 4656/22434 [5:09:15<59:39:28, 12.08s/it] +2025-02-05 15:16:56 - ERROR - stderr - +2025-02-05 15:16:56 - ERROR - stderr - +2025-02-05 15:16:56 - INFO - stdout - {'loss': 0.9923, 'grad_norm': 1.2274764776229858, 'learning_rate': 1.839246733015293e-05, 'epoch': 0.62} +2025-02-05 15:16:56 - ERROR - stderr - 21%|██ | 4656/22434 [5:09:15<59:39:28, 12.08s/it] +2025-02-05 15:17:33 - ERROR - stderr - 21%|██ | 4657/22434 [5:09:53<97:19:48, 19.71s/it] +2025-02-05 15:17:33 - ERROR - stderr - +2025-02-05 15:17:33 - ERROR - stderr - +2025-02-05 15:17:33 - INFO - stdout - {'loss': 1.0606, 'grad_norm': 1.0876737833023071, 'learning_rate': 1.839168220432255e-05, 'epoch': 0.62} +2025-02-05 15:17:33 - ERROR - stderr - 21%|██ | 4657/22434 [5:09:53<97:19:48, 19.71s/it] +2025-02-05 15:18:07 - ERROR - stderr - 21%|██ | 4658/22434 [5:10:27<118:03:00, 23.91s/it] +2025-02-05 15:18:07 - ERROR - stderr - +2025-02-05 15:18:07 - ERROR - stderr - +2025-02-05 15:18:07 - INFO - stdout - {'loss': 0.9703, 'grad_norm': 1.1105893850326538, 'learning_rate': 1.8390896903575584e-05, 'epoch': 0.62} +2025-02-05 15:18:07 - ERROR - stderr - 21%|██ | 4658/22434 [5:10:27<118:03:00, 23.91s/it] +2025-02-05 15:18:09 - ERROR - stderr - 21%|██ | 4659/22434 [5:10:29<86:26:04, 17.51s/it] +2025-02-05 15:18:09 - ERROR - stderr - +2025-02-05 15:18:09 - ERROR - stderr - +2025-02-05 15:18:09 - INFO - stdout - {'loss': 0.8449, 'grad_norm': 1.0752147436141968, 'learning_rate': 1.8390111427928396e-05, 'epoch': 0.62} +2025-02-05 15:18:09 - ERROR - stderr - 21%|██ | 4659/22434 [5:10:29<86:26:04, 17.51s/it] +2025-02-05 15:18:12 - ERROR - 
stderr - 21%|██ | 4660/22434 [5:10:32<64:15:17, 13.01s/it] +2025-02-05 15:18:12 - ERROR - stderr - +2025-02-05 15:18:12 - ERROR - stderr - +2025-02-05 15:18:12 - INFO - stdout - {'loss': 0.9002, 'grad_norm': 1.020026445388794, 'learning_rate': 1.8389325777397368e-05, 'epoch': 0.62} +2025-02-05 15:18:12 - ERROR - stderr - 21%|██ | 4660/22434 [5:10:32<64:15:17, 13.01s/it] +2025-02-05 15:18:39 - ERROR - stderr - 21%|██ | 4661/22434 [5:10:59<85:44:05, 17.37s/it] +2025-02-05 15:18:39 - ERROR - stderr - +2025-02-05 15:18:39 - ERROR - stderr - +2025-02-05 15:18:39 - INFO - stdout - {'loss': 1.0133, 'grad_norm': 1.0753370523452759, 'learning_rate': 1.8388539951998875e-05, 'epoch': 0.62} +2025-02-05 15:18:39 - ERROR - stderr - 21%|██ | 4661/22434 [5:10:59<85:44:05, 17.37s/it] +2025-02-05 15:18:42 - ERROR - stderr - 21%|██ | 4662/22434 [5:11:02<63:38:34, 12.89s/it] +2025-02-05 15:18:42 - ERROR - stderr - +2025-02-05 15:18:42 - ERROR - stderr - +2025-02-05 15:18:42 - INFO - stdout - {'loss': 0.976, 'grad_norm': 1.231313705444336, 'learning_rate': 1.8387753951749284e-05, 'epoch': 0.62} +2025-02-05 15:18:42 - ERROR - stderr - 21%|██ | 4662/22434 [5:11:02<63:38:34, 12.89s/it] +2025-02-05 15:18:45 - ERROR - stderr - 21%|██ | 4663/22434 [5:11:04<48:46:34, 9.88s/it] +2025-02-05 15:18:45 - ERROR - stderr - +2025-02-05 15:18:45 - ERROR - stderr - +2025-02-05 15:18:45 - INFO - stdout - {'loss': 1.0082, 'grad_norm': 1.132586121559143, 'learning_rate': 1.8386967776664996e-05, 'epoch': 0.62} +2025-02-05 15:18:45 - ERROR - stderr - 21%|██ | 4663/22434 [5:11:05<48:46:34, 9.88s/it] +2025-02-05 15:19:06 - ERROR - stderr - 21%|██ | 4664/22434 [5:11:26<65:34:50, 13.29s/it] +2025-02-05 15:19:06 - ERROR - stderr - +2025-02-05 15:19:06 - ERROR - stderr - +2025-02-05 15:19:06 - INFO - stdout - {'loss': 1.018, 'grad_norm': 1.079953908920288, 'learning_rate': 1.8386181426762387e-05, 'epoch': 0.62} +2025-02-05 15:19:06 - ERROR - stderr - 21%|██ | 4664/22434 [5:11:26<65:34:50, 13.29s/it] +2025-02-05 
15:19:08 - ERROR - stderr - 21%|██ | 4665/22434 [5:11:28<49:32:59, 10.04s/it] +2025-02-05 15:19:08 - ERROR - stderr - +2025-02-05 15:19:08 - ERROR - stderr - +2025-02-05 15:19:08 - INFO - stdout - {'loss': 0.977, 'grad_norm': 1.1663509607315063, 'learning_rate': 1.8385394902057853e-05, 'epoch': 0.62} +2025-02-05 15:19:08 - ERROR - stderr - 21%|██ | 4665/22434 [5:11:28<49:32:59, 10.04s/it] +2025-02-05 15:19:24 - ERROR - stderr - 21%|██ | 4666/22434 [5:11:44<57:36:48, 11.67s/it] +2025-02-05 15:19:24 - ERROR - stderr - +2025-02-05 15:19:24 - ERROR - stderr - +2025-02-05 15:19:24 - INFO - stdout - {'loss': 0.9999, 'grad_norm': 1.2637856006622314, 'learning_rate': 1.8384608202567786e-05, 'epoch': 0.62} +2025-02-05 15:19:24 - ERROR - stderr - 21%|██ | 4666/22434 [5:11:44<57:36:48, 11.67s/it] +2025-02-05 15:19:26 - ERROR - stderr - 21%|██ | 4667/22434 [5:11:46<43:59:54, 8.92s/it] +2025-02-05 15:19:26 - ERROR - stderr - +2025-02-05 15:19:26 - ERROR - stderr - +2025-02-05 15:19:26 - INFO - stdout - {'loss': 0.9518, 'grad_norm': 1.0912624597549438, 'learning_rate': 1.838382132830858e-05, 'epoch': 0.62} +2025-02-05 15:19:26 - ERROR - stderr - 21%|██ | 4667/22434 [5:11:46<43:59:54, 8.92s/it] +2025-02-05 15:19:29 - ERROR - stderr - 21%|██ | 4668/22434 [5:11:49<34:28:26, 6.99s/it] +2025-02-05 15:19:29 - ERROR - stderr - +2025-02-05 15:19:29 - ERROR - stderr - +2025-02-05 15:19:29 - INFO - stdout - {'loss': 0.9467, 'grad_norm': 1.052543044090271, 'learning_rate': 1.8383034279296646e-05, 'epoch': 0.62} +2025-02-05 15:19:29 - ERROR - stderr - 21%|██ | 4668/22434 [5:11:49<34:28:26, 6.99s/it] +2025-02-05 15:19:32 - ERROR - stderr - 21%|██ | 4669/22434 [5:11:51<28:02:52, 5.68s/it] +2025-02-05 15:19:32 - ERROR - stderr - +2025-02-05 15:19:32 - ERROR - stderr - +2025-02-05 15:19:32 - INFO - stdout - {'loss': 0.8677, 'grad_norm': 1.0778412818908691, 'learning_rate': 1.838224705554838e-05, 'epoch': 0.62} +2025-02-05 15:19:32 - ERROR - stderr - 21%|██ | 4669/22434 [5:11:51<28:02:52, 
5.68s/it] +2025-02-05 15:19:34 - ERROR - stderr - 21%|██ | 4670/22434 [5:11:54<23:20:32, 4.73s/it] +2025-02-05 15:19:34 - ERROR - stderr - +2025-02-05 15:19:34 - ERROR - stderr - +2025-02-05 15:19:34 - INFO - stdout - {'loss': 0.9495, 'grad_norm': 1.1696240901947021, 'learning_rate': 1.83814596570802e-05, 'epoch': 0.62} +2025-02-05 15:19:34 - ERROR - stderr - 21%|██ | 4670/22434 [5:11:54<23:20:32, 4.73s/it] +2025-02-05 15:19:37 - ERROR - stderr - 21%|██ | 4671/22434 [5:11:57<20:36:45, 4.18s/it] +2025-02-05 15:19:37 - ERROR - stderr - +2025-02-05 15:19:37 - ERROR - stderr - +2025-02-05 15:19:37 - INFO - stdout - {'loss': 0.9752, 'grad_norm': 1.082571029663086, 'learning_rate': 1.8380672083908512e-05, 'epoch': 0.62} +2025-02-05 15:19:37 - ERROR - stderr - 21%|██ | 4671/22434 [5:11:57<20:36:45, 4.18s/it] +2025-02-05 15:19:39 - ERROR - stderr - 21%|██ | 4672/22434 [5:11:59<18:07:56, 3.68s/it] +2025-02-05 15:19:39 - ERROR - stderr - +2025-02-05 15:19:39 - ERROR - stderr - +2025-02-05 15:19:39 - INFO - stdout - {'loss': 0.915, 'grad_norm': 0.9724946618080139, 'learning_rate': 1.837988433604973e-05, 'epoch': 0.62} +2025-02-05 15:19:39 - ERROR - stderr - 21%|██ | 4672/22434 [5:11:59<18:07:56, 3.68s/it] +2025-02-05 15:19:42 - ERROR - stderr - 21%|██ | 4673/22434 [5:12:02<16:26:52, 3.33s/it] +2025-02-05 15:19:42 - ERROR - stderr - +2025-02-05 15:19:42 - ERROR - stderr - +2025-02-05 15:19:42 - INFO - stdout - {'loss': 1.0687, 'grad_norm': 1.0701051950454712, 'learning_rate': 1.837909641352028e-05, 'epoch': 0.62} +2025-02-05 15:19:42 - ERROR - stderr - 21%|██ | 4673/22434 [5:12:02<16:26:52, 3.33s/it] +2025-02-05 15:19:44 - ERROR - stderr - 21%|██ | 4674/22434 [5:12:04<15:13:35, 3.09s/it] +2025-02-05 15:19:45 - ERROR - stderr - +2025-02-05 15:19:45 - ERROR - stderr - +2025-02-05 15:19:45 - INFO - stdout - {'loss': 0.9302, 'grad_norm': 1.0757858753204346, 'learning_rate': 1.8378308316336585e-05, 'epoch': 0.63} +2025-02-05 15:19:45 - ERROR - stderr - 21%|██ | 4674/22434 
[5:12:04<15:13:35, 3.09s/it] +2025-02-05 15:19:47 - ERROR - stderr - 21%|██ | 4675/22434 [5:12:07<14:19:08, 2.90s/it] +2025-02-05 15:19:47 - ERROR - stderr - +2025-02-05 15:19:47 - ERROR - stderr - +2025-02-05 15:19:47 - INFO - stdout - {'loss': 1.0059, 'grad_norm': 1.1231738328933716, 'learning_rate': 1.837752004451507e-05, 'epoch': 0.63} +2025-02-05 15:19:47 - ERROR - stderr - 21%|██ | 4675/22434 [5:12:07<14:19:08, 2.90s/it] +2025-02-05 15:19:50 - ERROR - stderr - 21%|██ | 4676/22434 [5:12:09<13:51:06, 2.81s/it] +2025-02-05 15:19:50 - ERROR - stderr - +2025-02-05 15:19:50 - ERROR - stderr - +2025-02-05 15:19:50 - INFO - stdout - {'loss': 1.0226, 'grad_norm': 0.9858707785606384, 'learning_rate': 1.837673159807216e-05, 'epoch': 0.63} +2025-02-05 15:19:50 - ERROR - stderr - 21%|██ | 4676/22434 [5:12:09<13:51:06, 2.81s/it] +2025-02-05 15:19:52 - ERROR - stderr - 21%|██ | 4677/22434 [5:12:12<13:20:47, 2.71s/it] +2025-02-05 15:19:52 - ERROR - stderr - +2025-02-05 15:19:52 - ERROR - stderr - +2025-02-05 15:19:52 - INFO - stdout - {'loss': 1.1695, 'grad_norm': 1.1747640371322632, 'learning_rate': 1.8375942977024305e-05, 'epoch': 0.63} +2025-02-05 15:19:52 - ERROR - stderr - 21%|██ | 4677/22434 [5:12:12<13:20:47, 2.71s/it] +2025-02-05 15:19:55 - ERROR - stderr - 21%|██ | 4678/22434 [5:12:14<13:05:00, 2.65s/it] +2025-02-05 15:19:55 - ERROR - stderr - +2025-02-05 15:19:55 - ERROR - stderr - +2025-02-05 15:19:55 - INFO - stdout - {'loss': 0.971, 'grad_norm': 1.0730839967727661, 'learning_rate': 1.837515418138793e-05, 'epoch': 0.63} +2025-02-05 15:19:55 - ERROR - stderr - 21%|██ | 4678/22434 [5:12:14<13:05:00, 2.65s/it] +2025-02-05 15:19:57 - ERROR - stderr - 21%|██ | 4679/22434 [5:12:17<12:48:49, 2.60s/it] +2025-02-05 15:19:57 - ERROR - stderr - +2025-02-05 15:19:57 - ERROR - stderr - +2025-02-05 15:19:57 - INFO - stdout - {'loss': 1.0062, 'grad_norm': 1.0914748907089233, 'learning_rate': 1.8374365211179475e-05, 'epoch': 0.63} +2025-02-05 15:19:57 - ERROR - stderr - 21%|██ | 
4679/22434 [5:12:17<12:48:49, 2.60s/it] +2025-02-05 15:20:00 - ERROR - stderr - 21%|██ | 4680/22434 [5:12:19<12:41:51, 2.57s/it] +2025-02-05 15:20:00 - ERROR - stderr - +2025-02-05 15:20:00 - ERROR - stderr - +2025-02-05 15:20:00 - INFO - stdout - {'loss': 1.0343, 'grad_norm': 1.0983752012252808, 'learning_rate': 1.8373576066415397e-05, 'epoch': 0.63} +2025-02-05 15:20:00 - ERROR - stderr - 21%|██ | 4680/22434 [5:12:19<12:41:51, 2.57s/it] +2025-02-05 15:20:02 - ERROR - stderr - 21%|██ | 4681/22434 [5:12:22<12:29:10, 2.53s/it] +2025-02-05 15:20:02 - ERROR - stderr - +2025-02-05 15:20:02 - ERROR - stderr - +2025-02-05 15:20:02 - INFO - stdout - {'loss': 0.9457, 'grad_norm': 1.1198084354400635, 'learning_rate': 1.8372786747112136e-05, 'epoch': 0.63} +2025-02-05 15:20:02 - ERROR - stderr - 21%|██ | 4681/22434 [5:12:22<12:29:10, 2.53s/it] +2025-02-05 15:20:04 - ERROR - stderr - 21%|██ | 4682/22434 [5:12:24<12:30:51, 2.54s/it] +2025-02-05 15:20:05 - ERROR - stderr - +2025-02-05 15:20:05 - ERROR - stderr - +2025-02-05 15:20:05 - INFO - stdout - {'loss': 0.9689, 'grad_norm': 1.0994049310684204, 'learning_rate': 1.8371997253286146e-05, 'epoch': 0.63} +2025-02-05 15:20:05 - ERROR - stderr - 21%|██ | 4682/22434 [5:12:24<12:30:51, 2.54s/it] +2025-02-05 15:20:07 - ERROR - stderr - 21%|██ | 4683/22434 [5:12:27<12:32:55, 2.54s/it] +2025-02-05 15:20:07 - ERROR - stderr - +2025-02-05 15:20:07 - ERROR - stderr - +2025-02-05 15:20:07 - INFO - stdout - {'loss': 0.9985, 'grad_norm': 1.0492175817489624, 'learning_rate': 1.8371207584953886e-05, 'epoch': 0.63} +2025-02-05 15:20:07 - ERROR - stderr - 21%|██ | 4683/22434 [5:12:27<12:32:55, 2.54s/it] +2025-02-05 15:20:10 - ERROR - stderr - 21%|██ | 4684/22434 [5:12:29<12:33:23, 2.55s/it] +2025-02-05 15:20:10 - ERROR - stderr - +2025-02-05 15:20:10 - ERROR - stderr - +2025-02-05 15:20:10 - INFO - stdout - {'loss': 0.9362, 'grad_norm': 0.9940704107284546, 'learning_rate': 1.8370417742131816e-05, 'epoch': 0.63} +2025-02-05 15:20:10 - ERROR - 
stderr - 21%|██ | 4684/22434 [5:12:29<12:33:23, 2.55s/it] +2025-02-05 15:20:12 - ERROR - stderr - 21%|██ | 4685/22434 [5:12:32<12:47:51, 2.60s/it] +2025-02-05 15:20:12 - ERROR - stderr - +2025-02-05 15:20:12 - ERROR - stderr - +2025-02-05 15:20:12 - INFO - stdout - {'loss': 0.8798, 'grad_norm': 0.9964712858200073, 'learning_rate': 1.8369627724836395e-05, 'epoch': 0.63} +2025-02-05 15:20:12 - ERROR - stderr - 21%|██ | 4685/22434 [5:12:32<12:47:51, 2.60s/it] +2025-02-05 15:20:15 - ERROR - stderr - 21%|██ | 4686/22434 [5:12:35<12:34:27, 2.55s/it] +2025-02-05 15:20:15 - ERROR - stderr - +2025-02-05 15:20:15 - ERROR - stderr - +2025-02-05 15:20:15 - INFO - stdout - {'loss': 0.9855, 'grad_norm': 1.1672533750534058, 'learning_rate': 1.8368837533084092e-05, 'epoch': 0.63} +2025-02-05 15:20:15 - ERROR - stderr - 21%|██ | 4686/22434 [5:12:35<12:34:27, 2.55s/it] +2025-02-05 15:20:17 - ERROR - stderr - 21%|██ | 4687/22434 [5:12:37<12:39:45, 2.57s/it] +2025-02-05 15:20:17 - ERROR - stderr - +2025-02-05 15:20:17 - ERROR - stderr - +2025-02-05 15:20:17 - INFO - stdout - {'loss': 0.9179, 'grad_norm': 0.9894228577613831, 'learning_rate': 1.8368047166891382e-05, 'epoch': 0.63} +2025-02-05 15:20:17 - ERROR - stderr - 21%|██ | 4687/22434 [5:12:37<12:39:45, 2.57s/it] +2025-02-05 15:20:35 - ERROR - stderr - 21%|██ | 4688/22434 [5:12:55<34:37:07, 7.02s/it] +2025-02-05 15:20:35 - ERROR - stderr - +2025-02-05 15:20:35 - ERROR - stderr - +2025-02-05 15:20:35 - INFO - stdout - {'loss': 0.9276, 'grad_norm': 1.0386804342269897, 'learning_rate': 1.8367256626274737e-05, 'epoch': 0.63} +2025-02-05 15:20:35 - ERROR - stderr - 21%|██ | 4688/22434 [5:12:55<34:37:07, 7.02s/it] +2025-02-05 15:20:55 - ERROR - stderr - 21%|██ | 4689/22434 [5:13:15<53:51:26, 10.93s/it] +2025-02-05 15:20:55 - ERROR - stderr - +2025-02-05 15:20:55 - ERROR - stderr - +2025-02-05 15:20:55 - INFO - stdout - {'loss': 0.9769, 'grad_norm': 1.2990498542785645, 'learning_rate': 1.836646591125063e-05, 'epoch': 0.63} +2025-02-05 
15:20:55 - ERROR - stderr - 21%|██ | 4689/22434 [5:13:15<53:51:26, 10.93s/it] +2025-02-05 15:21:24 - ERROR - stderr - 21%|██ | 4690/22434 [5:13:43<80:25:59, 16.32s/it] +2025-02-05 15:21:24 - ERROR - stderr - +2025-02-05 15:21:24 - ERROR - stderr - +2025-02-05 15:21:24 - INFO - stdout - {'loss': 1.0282, 'grad_norm': 1.1790125370025635, 'learning_rate': 1.8365675021835548e-05, 'epoch': 0.63} +2025-02-05 15:21:24 - ERROR - stderr - 21%|██ | 4690/22434 [5:13:44<80:25:59, 16.32s/it] +2025-02-05 15:21:26 - ERROR - stderr - 21%|██ | 4691/22434 [5:13:46<59:59:52, 12.17s/it] +2025-02-05 15:21:26 - ERROR - stderr - +2025-02-05 15:21:26 - ERROR - stderr - +2025-02-05 15:21:26 - INFO - stdout - {'loss': 0.9978, 'grad_norm': 1.154527187347412, 'learning_rate': 1.8364883958045978e-05, 'epoch': 0.63} +2025-02-05 15:21:26 - ERROR - stderr - 21%|██ | 4691/22434 [5:13:46<59:59:52, 12.17s/it] +2025-02-05 15:21:29 - ERROR - stderr - 21%|██ | 4692/22434 [5:13:48<45:40:03, 9.27s/it] +2025-02-05 15:21:29 - ERROR - stderr - +2025-02-05 15:21:29 - ERROR - stderr - +2025-02-05 15:21:29 - INFO - stdout - {'loss': 0.9866, 'grad_norm': 1.2219253778457642, 'learning_rate': 1.8364092719898402e-05, 'epoch': 0.63} +2025-02-05 15:21:29 - ERROR - stderr - 21%|██ | 4692/22434 [5:13:49<45:40:03, 9.27s/it] +2025-02-05 15:21:35 - ERROR - stderr - 21%|██ | 4693/22434 [5:13:55<41:52:34, 8.50s/it] +2025-02-05 15:21:35 - ERROR - stderr - +2025-02-05 15:21:35 - ERROR - stderr - +2025-02-05 15:21:35 - INFO - stdout - {'loss': 1.0046, 'grad_norm': 1.1092414855957031, 'learning_rate': 1.836330130740932e-05, 'epoch': 0.63} +2025-02-05 15:21:35 - ERROR - stderr - 21%|██ | 4693/22434 [5:13:55<41:52:34, 8.50s/it] +2025-02-05 15:21:47 - ERROR - stderr - 21%|██ | 4694/22434 [5:14:07<46:48:35, 9.50s/it] +2025-02-05 15:21:47 - ERROR - stderr - +2025-02-05 15:21:47 - ERROR - stderr - +2025-02-05 15:21:47 - INFO - stdout - {'loss': 0.8354, 'grad_norm': 1.0951077938079834, 'learning_rate': 1.8362509720595225e-05, 'epoch': 
0.63} +2025-02-05 15:21:47 - ERROR - stderr - 21%|██ | 4694/22434 [5:14:07<46:48:35, 9.50s/it] +2025-02-05 15:22:13 - ERROR - stderr - 21%|██ | 4695/22434 [5:14:33<70:35:14, 14.33s/it] +2025-02-05 15:22:13 - ERROR - stderr - +2025-02-05 15:22:13 - ERROR - stderr - +2025-02-05 15:22:13 - INFO - stdout - {'loss': 0.902, 'grad_norm': 1.03300940990448, 'learning_rate': 1.8361717959472618e-05, 'epoch': 0.63} +2025-02-05 15:22:13 - ERROR - stderr - 21%|██ | 4695/22434 [5:14:33<70:35:14, 14.33s/it] +2025-02-05 15:22:28 - ERROR - stderr - 21%|██ | 4696/22434 [5:14:47<71:25:24, 14.50s/it] +2025-02-05 15:22:28 - ERROR - stderr - +2025-02-05 15:22:28 - ERROR - stderr - +2025-02-05 15:22:28 - INFO - stdout - {'loss': 0.9704, 'grad_norm': 1.1207783222198486, 'learning_rate': 1.8360926024058e-05, 'epoch': 0.63} +2025-02-05 15:22:28 - ERROR - stderr - 21%|██ | 4696/22434 [5:14:48<71:25:24, 14.50s/it] +2025-02-05 15:23:12 - ERROR - stderr - 21%|██ | 4697/22434 [5:15:32<115:06:54, 23.36s/it] +2025-02-05 15:23:12 - ERROR - stderr - +2025-02-05 15:23:12 - ERROR - stderr - +2025-02-05 15:23:12 - INFO - stdout - {'loss': 0.9068, 'grad_norm': 1.087233543395996, 'learning_rate': 1.836013391436788e-05, 'epoch': 0.63} +2025-02-05 15:23:12 - ERROR - stderr - 21%|██ | 4697/22434 [5:15:32<115:06:54, 23.36s/it] +2025-02-05 15:23:14 - ERROR - stderr - 21%|██ | 4698/22434 [5:15:34<84:14:50, 17.10s/it] +2025-02-05 15:23:14 - ERROR - stderr - +2025-02-05 15:23:14 - ERROR - stderr - +2025-02-05 15:23:14 - INFO - stdout - {'loss': 0.8789, 'grad_norm': 0.927949845790863, 'learning_rate': 1.8359341630418766e-05, 'epoch': 0.63} +2025-02-05 15:23:14 - ERROR - stderr - 21%|██ | 4698/22434 [5:15:34<84:14:50, 17.10s/it] +2025-02-05 15:23:46 - ERROR - stderr - 21%|██ | 4699/22434 [5:16:06<106:05:31, 21.54s/it] +2025-02-05 15:23:46 - ERROR - stderr - +2025-02-05 15:23:46 - ERROR - stderr - +2025-02-05 15:23:46 - INFO - stdout - {'loss': 0.8717, 'grad_norm': 1.1531922817230225, 'learning_rate': 
1.8358549172227176e-05, 'epoch': 0.63} +2025-02-05 15:23:46 - ERROR - stderr - 21%|██ | 4699/22434 [5:16:06<106:05:31, 21.54s/it] +2025-02-05 15:23:49 - ERROR - stderr - 21%|██ | 4700/22434 [5:16:08<77:53:25, 15.81s/it] +2025-02-05 15:23:49 - ERROR - stderr - +2025-02-05 15:23:49 - ERROR - stderr - +2025-02-05 15:23:49 - INFO - stdout - {'loss': 0.9495, 'grad_norm': 1.162847638130188, 'learning_rate': 1.8357756539809627e-05, 'epoch': 0.63} +2025-02-05 15:23:49 - ERROR - stderr - 21%|██ | 4700/22434 [5:16:08<77:53:25, 15.81s/it] +2025-02-05 15:24:38 - ERROR - stderr - 21%|██ | 4701/22434 [5:16:58<128:06:11, 26.01s/it] +2025-02-05 15:24:38 - ERROR - stderr - +2025-02-05 15:24:38 - ERROR - stderr - +2025-02-05 15:24:38 - INFO - stdout - {'loss': 0.9467, 'grad_norm': 1.2207088470458984, 'learning_rate': 1.8356963733182642e-05, 'epoch': 0.63} +2025-02-05 15:24:38 - ERROR - stderr - 21%|██ | 4701/22434 [5:16:58<128:06:11, 26.01s/it] +2025-02-05 15:24:54 - ERROR - stderr - 21%|██ | 4702/22434 [5:17:14<113:07:17, 22.97s/it] +2025-02-05 15:24:54 - ERROR - stderr - +2025-02-05 15:24:54 - ERROR - stderr - +2025-02-05 15:24:54 - INFO - stdout - {'loss': 1.0881, 'grad_norm': 1.0907632112503052, 'learning_rate': 1.835617075236274e-05, 'epoch': 0.63} +2025-02-05 15:24:54 - ERROR - stderr - 21%|██ | 4702/22434 [5:17:14<113:07:17, 22.97s/it] +2025-02-05 15:25:49 - ERROR - stderr - 21%|██ | 4703/22434 [5:18:09<160:12:18, 32.53s/it] +2025-02-05 15:25:49 - ERROR - stderr - +2025-02-05 15:25:49 - ERROR - stderr - +2025-02-05 15:25:49 - INFO - stdout - {'loss': 0.949, 'grad_norm': 1.1067560911178589, 'learning_rate': 1.835537759736646e-05, 'epoch': 0.63} +2025-02-05 15:25:49 - ERROR - stderr - 21%|██ | 4703/22434 [5:18:09<160:12:18, 32.53s/it] +2025-02-05 15:26:04 - ERROR - stderr - 21%|██ | 4704/22434 [5:18:24<134:49:26, 27.38s/it] +2025-02-05 15:26:05 - ERROR - stderr - +2025-02-05 15:26:05 - ERROR - stderr - +2025-02-05 15:26:05 - INFO - stdout - {'loss': 0.9371, 'grad_norm': 
1.0939924716949463, 'learning_rate': 1.8354584268210328e-05, 'epoch': 0.63} +2025-02-05 15:26:05 - ERROR - stderr - 21%|██ | 4704/22434 [5:18:24<134:49:26, 27.38s/it] +2025-02-05 15:26:56 - ERROR - stderr - 21%|██ | 4705/22434 [5:19:16<171:12:20, 34.76s/it] +2025-02-05 15:26:57 - ERROR - stderr - +2025-02-05 15:26:57 - ERROR - stderr - +2025-02-05 15:26:57 - INFO - stdout - {'loss': 0.9368, 'grad_norm': 1.0594813823699951, 'learning_rate': 1.835379076491088e-05, 'epoch': 0.63} +2025-02-05 15:26:57 - ERROR - stderr - 21%|██ | 4705/22434 [5:19:16<171:12:20, 34.76s/it] +2025-02-05 15:27:10 - ERROR - stderr - 21%|██ | 4706/22434 [5:19:29<139:19:22, 28.29s/it] +2025-02-05 15:27:10 - ERROR - stderr - +2025-02-05 15:27:10 - ERROR - stderr - +2025-02-05 15:27:10 - INFO - stdout - {'loss': 0.9513, 'grad_norm': 1.1220465898513794, 'learning_rate': 1.8352997087484657e-05, 'epoch': 0.63} +2025-02-05 15:27:10 - ERROR - stderr - 21%|██ | 4706/22434 [5:19:29<139:19:22, 28.29s/it] +2025-02-05 15:27:29 - ERROR - stderr - 21%|██ | 4707/22434 [5:19:49<125:44:23, 25.54s/it] +2025-02-05 15:27:29 - ERROR - stderr - +2025-02-05 15:27:29 - ERROR - stderr - +2025-02-05 15:27:29 - INFO - stdout - {'loss': 0.9102, 'grad_norm': 1.0112025737762451, 'learning_rate': 1.8352203235948202e-05, 'epoch': 0.63} +2025-02-05 15:27:29 - ERROR - stderr - 21%|██ | 4707/22434 [5:19:49<125:44:23, 25.54s/it] +2025-02-05 15:27:31 - ERROR - stderr - 21%|██ | 4708/22434 [5:19:51<91:37:09, 18.61s/it] +2025-02-05 15:27:31 - ERROR - stderr - +2025-02-05 15:27:31 - ERROR - stderr - +2025-02-05 15:27:31 - INFO - stdout - {'loss': 0.9121, 'grad_norm': 1.080566644668579, 'learning_rate': 1.8351409210318064e-05, 'epoch': 0.63} +2025-02-05 15:27:31 - ERROR - stderr - 21%|██ | 4708/22434 [5:19:51<91:37:09, 18.61s/it] +2025-02-05 15:28:29 - ERROR - stderr - 21%|██ | 4709/22434 [5:20:48<148:51:24, 30.23s/it] +2025-02-05 15:28:29 - ERROR - stderr - +2025-02-05 15:28:29 - ERROR - stderr - +2025-02-05 15:28:29 - INFO - stdout 
- {'loss': 0.9446, 'grad_norm': 1.19144868850708, 'learning_rate': 1.8350615010610796e-05, 'epoch': 0.63} +2025-02-05 15:28:29 - ERROR - stderr - 21%|██ | 4709/22434 [5:20:48<148:51:24, 30.23s/it] +2025-02-05 15:28:31 - ERROR - stderr - 21%|██ | 4710/22434 [5:20:51<107:57:50, 21.93s/it] +2025-02-05 15:28:31 - ERROR - stderr - +2025-02-05 15:28:31 - ERROR - stderr - +2025-02-05 15:28:31 - INFO - stdout - {'loss': 0.9419, 'grad_norm': 1.1358686685562134, 'learning_rate': 1.8349820636842944e-05, 'epoch': 0.63} +2025-02-05 15:28:31 - ERROR - stderr - 21%|██ | 4710/22434 [5:20:51<107:57:50, 21.93s/it] +2025-02-05 15:29:35 - ERROR - stderr - 21%|██ | 4711/22434 [5:21:55<169:55:39, 34.52s/it] +2025-02-05 15:29:35 - ERROR - stderr - +2025-02-05 15:29:35 - ERROR - stderr - +2025-02-05 15:29:35 - INFO - stdout - {'loss': 0.9347, 'grad_norm': 1.1688398122787476, 'learning_rate': 1.8349026089031072e-05, 'epoch': 0.63} +2025-02-05 15:29:35 - ERROR - stderr - 21%|██ | 4711/22434 [5:21:55<169:55:39, 34.52s/it] +2025-02-05 15:33:02 - ERROR - stderr - 21%|██ | 4712/22434 [5:25:22<425:01:50, 86.34s/it] +2025-02-05 15:33:02 - ERROR - stderr - +2025-02-05 15:33:02 - ERROR - stderr - +2025-02-05 15:33:02 - INFO - stdout - {'loss': 1.0436, 'grad_norm': 1.1661232709884644, 'learning_rate': 1.834823136719174e-05, 'epoch': 0.63} +2025-02-05 15:33:02 - ERROR - stderr - 21%|██ | 4712/22434 [5:25:22<425:01:50, 86.34s/it] +2025-02-05 15:33:23 - ERROR - stderr - 21%|██ | 4713/22434 [5:25:42<327:37:52, 66.56s/it] +2025-02-05 15:33:23 - ERROR - stderr - +2025-02-05 15:33:23 - ERROR - stderr - +2025-02-05 15:33:23 - INFO - stdout - {'loss': 0.8125, 'grad_norm': 1.099495530128479, 'learning_rate': 1.8347436471341514e-05, 'epoch': 0.63} +2025-02-05 15:33:23 - ERROR - stderr - 21%|██ | 4713/22434 [5:25:42<327:37:52, 66.56s/it] +2025-02-05 15:33:25 - ERROR - stderr - 21%|██ | 4714/22434 [5:25:45<233:04:57, 47.35s/it] +2025-02-05 15:33:25 - ERROR - stderr - +2025-02-05 15:33:25 - ERROR - stderr - 
+2025-02-05 15:33:25 - INFO - stdout - {'loss': 0.911, 'grad_norm': 1.0422277450561523, 'learning_rate': 1.834664140149696e-05, 'epoch': 0.63} +2025-02-05 15:33:25 - ERROR - stderr - 21%|██ | 4714/22434 [5:25:45<233:04:57, 47.35s/it] +2025-02-05 15:34:00 - ERROR - stderr - 21%|██ | 4715/22434 [5:26:20<214:17:29, 43.54s/it] +2025-02-05 15:34:00 - ERROR - stderr - +2025-02-05 15:34:00 - ERROR - stderr - +2025-02-05 15:34:00 - INFO - stdout - {'loss': 0.8364, 'grad_norm': 1.060258150100708, 'learning_rate': 1.8345846157674657e-05, 'epoch': 0.63} +2025-02-05 15:34:00 - ERROR - stderr - 21%|██ | 4715/22434 [5:26:20<214:17:29, 43.54s/it] +2025-02-05 15:34:39 - ERROR - stderr - 21%|██ | 4716/22434 [5:26:59<208:00:32, 42.26s/it] +2025-02-05 15:34:39 - ERROR - stderr - +2025-02-05 15:34:39 - ERROR - stderr - +2025-02-05 15:34:39 - INFO - stdout - {'loss': 0.9344, 'grad_norm': 1.0634009838104248, 'learning_rate': 1.8345050739891175e-05, 'epoch': 0.63} +2025-02-05 15:34:39 - ERROR - stderr - 21%|██ | 4716/22434 [5:26:59<208:00:32, 42.26s/it] +2025-02-05 15:35:13 - ERROR - stderr - 21%|██ | 4717/22434 [5:27:32<195:00:01, 39.62s/it] +2025-02-05 15:35:13 - ERROR - stderr - +2025-02-05 15:35:13 - ERROR - stderr - +2025-02-05 15:35:13 - INFO - stdout - {'loss': 0.8351, 'grad_norm': 0.9452177286148071, 'learning_rate': 1.8344255148163095e-05, 'epoch': 0.63} +2025-02-05 15:35:13 - ERROR - stderr - 21%|██ | 4717/22434 [5:27:32<195:00:01, 39.62s/it] +2025-02-05 15:35:42 - ERROR - stderr - 21%|██ | 4718/22434 [5:28:02<180:25:57, 36.67s/it] +2025-02-05 15:35:42 - ERROR - stderr - +2025-02-05 15:35:42 - ERROR - stderr - +2025-02-05 15:35:42 - INFO - stdout - {'loss': 0.8849, 'grad_norm': 1.1992483139038086, 'learning_rate': 1.8343459382507003e-05, 'epoch': 0.63} +2025-02-05 15:35:42 - ERROR - stderr - 21%|██ | 4718/22434 [5:28:02<180:25:57, 36.67s/it] +2025-02-05 15:36:25 - ERROR - stderr - 21%|██ | 4719/22434 [5:28:45<189:45:31, 38.56s/it] +2025-02-05 15:36:25 - ERROR - stderr - 
+2025-02-05 15:36:25 - ERROR - stderr - +2025-02-05 15:36:25 - INFO - stdout - {'loss': 0.9358, 'grad_norm': 1.1165494918823242, 'learning_rate': 1.834266344293948e-05, 'epoch': 0.63} +2025-02-05 15:36:25 - ERROR - stderr - 21%|██ | 4719/22434 [5:28:45<189:45:31, 38.56s/it] +2025-02-05 15:37:03 - ERROR - stderr - 21%|██ | 4720/22434 [5:29:23<189:07:50, 38.44s/it] +2025-02-05 15:37:04 - ERROR - stderr - +2025-02-05 15:37:04 - ERROR - stderr - +2025-02-05 15:37:04 - INFO - stdout - {'loss': 0.9112, 'grad_norm': 1.1300991773605347, 'learning_rate': 1.8341867329477125e-05, 'epoch': 0.63} +2025-02-05 15:37:04 - ERROR - stderr - 21%|██ | 4720/22434 [5:29:23<189:07:50, 38.44s/it] +2025-02-05 15:37:06 - ERROR - stderr - 21%|██ | 4721/22434 [5:29:26<136:12:40, 27.68s/it] +2025-02-05 15:37:06 - ERROR - stderr - +2025-02-05 15:37:06 - ERROR - stderr - +2025-02-05 15:37:06 - INFO - stdout - {'loss': 1.1174, 'grad_norm': 1.1435790061950684, 'learning_rate': 1.834107104213653e-05, 'epoch': 0.63} +2025-02-05 15:37:06 - ERROR - stderr - 21%|██ | 4721/22434 [5:29:26<136:12:40, 27.68s/it] +2025-02-05 15:37:51 - ERROR - stderr - 21%|██ | 4722/22434 [5:30:11<161:30:30, 32.83s/it] +2025-02-05 15:37:51 - ERROR - stderr - +2025-02-05 15:37:51 - ERROR - stderr - +2025-02-05 15:37:51 - INFO - stdout - {'loss': 0.9511, 'grad_norm': 1.01833176612854, 'learning_rate': 1.8340274580934284e-05, 'epoch': 0.63} +2025-02-05 15:37:51 - ERROR - stderr - 21%|██ | 4722/22434 [5:30:11<161:30:30, 32.83s/it] +2025-02-05 15:38:23 - ERROR - stderr - 21%|██ | 4723/22434 [5:30:43<160:05:37, 32.54s/it] +2025-02-05 15:38:23 - ERROR - stderr - +2025-02-05 15:38:23 - ERROR - stderr - +2025-02-05 15:38:23 - INFO - stdout - {'loss': 0.9614, 'grad_norm': 1.0562607049942017, 'learning_rate': 1.8339477945886998e-05, 'epoch': 0.63} +2025-02-05 15:38:23 - ERROR - stderr - 21%|██ | 4723/22434 [5:30:43<160:05:37, 32.54s/it] +2025-02-05 15:38:25 - ERROR - stderr - 21%|██ | 4724/22434 [5:30:45<115:44:54, 23.53s/it] 
+2025-02-05 15:38:25 - ERROR - stderr - +2025-02-05 15:38:25 - ERROR - stderr - +2025-02-05 15:38:25 - INFO - stdout - {'loss': 0.9349, 'grad_norm': 1.1323667764663696, 'learning_rate': 1.833868113701127e-05, 'epoch': 0.63} +2025-02-05 15:38:25 - ERROR - stderr - 21%|██ | 4724/22434 [5:30:45<115:44:54, 23.53s/it] +2025-02-05 15:39:43 - ERROR - stderr - 21%|██ | 4725/22434 [5:32:03<195:30:37, 39.74s/it] +2025-02-05 15:39:43 - ERROR - stderr - +2025-02-05 15:39:43 - ERROR - stderr - +2025-02-05 15:39:43 - INFO - stdout - {'loss': 1.084, 'grad_norm': 1.2900893688201904, 'learning_rate': 1.833788415432372e-05, 'epoch': 0.63} +2025-02-05 15:39:43 - ERROR - stderr - 21%|██ | 4725/22434 [5:32:03<195:30:37, 39.74s/it] +2025-02-05 15:39:45 - ERROR - stderr - 21%|██ | 4726/22434 [5:32:05<140:24:52, 28.55s/it] +2025-02-05 15:39:45 - ERROR - stderr - +2025-02-05 15:39:45 - ERROR - stderr - +2025-02-05 15:39:45 - INFO - stdout - {'loss': 0.8973, 'grad_norm': 1.0599790811538696, 'learning_rate': 1.8337086997840952e-05, 'epoch': 0.63} +2025-02-05 15:39:45 - ERROR - stderr - 21%|██ | 4726/22434 [5:32:05<140:24:52, 28.55s/it] +2025-02-05 15:40:24 - ERROR - stderr - 21%|██ | 4727/22434 [5:32:44<155:17:28, 31.57s/it] +2025-02-05 15:40:24 - ERROR - stderr - +2025-02-05 15:40:24 - ERROR - stderr - +2025-02-05 15:40:24 - INFO - stdout - {'loss': 0.8517, 'grad_norm': 1.0545300245285034, 'learning_rate': 1.833628966757958e-05, 'epoch': 0.63} +2025-02-05 15:40:24 - ERROR - stderr - 21%|██ | 4727/22434 [5:32:44<155:17:28, 31.57s/it] +2025-02-05 15:40:52 - ERROR - stderr - 21%|██ | 4728/22434 [5:33:11<149:25:58, 30.38s/it] +2025-02-05 15:40:52 - ERROR - stderr - +2025-02-05 15:40:52 - ERROR - stderr - +2025-02-05 15:40:52 - INFO - stdout - {'loss': 0.9529, 'grad_norm': 1.0771454572677612, 'learning_rate': 1.833549216355623e-05, 'epoch': 0.63} +2025-02-05 15:40:52 - ERROR - stderr - 21%|██ | 4728/22434 [5:33:11<149:25:58, 30.38s/it] +2025-02-05 15:40:54 - ERROR - stderr - 21%|██ | 4729/22434 
[5:33:14<108:13:31, 22.01s/it] +2025-02-05 15:40:54 - ERROR - stderr - +2025-02-05 15:40:54 - ERROR - stderr - +2025-02-05 15:40:54 - INFO - stdout - {'loss': 0.8878, 'grad_norm': 1.005878210067749, 'learning_rate': 1.833469448578752e-05, 'epoch': 0.63} +2025-02-05 15:40:54 - ERROR - stderr - 21%|██ | 4729/22434 [5:33:14<108:13:31, 22.01s/it] +2025-02-05 15:40:57 - ERROR - stderr - 21%|██ | 4730/22434 [5:33:17<79:56:09, 16.25s/it] +2025-02-05 15:40:57 - ERROR - stderr - +2025-02-05 15:40:57 - ERROR - stderr - +2025-02-05 15:40:57 - INFO - stdout - {'loss': 0.9824, 'grad_norm': 1.2105047702789307, 'learning_rate': 1.833389663429008e-05, 'epoch': 0.63} +2025-02-05 15:40:57 - ERROR - stderr - 21%|██ | 4730/22434 [5:33:17<79:56:09, 16.25s/it] +2025-02-05 15:40:59 - ERROR - stderr - 21%|██ | 4731/22434 [5:33:19<59:40:21, 12.13s/it] +2025-02-05 15:40:59 - ERROR - stderr - +2025-02-05 15:40:59 - ERROR - stderr - +2025-02-05 15:40:59 - INFO - stdout - {'loss': 1.0532, 'grad_norm': 1.0651311874389648, 'learning_rate': 1.833309860908054e-05, 'epoch': 0.63} +2025-02-05 15:40:59 - ERROR - stderr - 21%|██ | 4731/22434 [5:33:19<59:40:21, 12.13s/it] +2025-02-05 15:41:02 - ERROR - stderr - 21%|██ | 4732/22434 [5:33:22<45:26:59, 9.24s/it] +2025-02-05 15:41:02 - ERROR - stderr - +2025-02-05 15:41:02 - ERROR - stderr - +2025-02-05 15:41:02 - INFO - stdout - {'loss': 0.9934, 'grad_norm': 0.9635155200958252, 'learning_rate': 1.833230041017553e-05, 'epoch': 0.63} +2025-02-05 15:41:02 - ERROR - stderr - 21%|██ | 4732/22434 [5:33:22<45:26:59, 9.24s/it] +2025-02-05 15:41:04 - ERROR - stderr - 21%|██ | 4733/22434 [5:33:24<35:31:48, 7.23s/it] +2025-02-05 15:41:04 - ERROR - stderr - +2025-02-05 15:41:04 - ERROR - stderr - +2025-02-05 15:41:04 - INFO - stdout - {'loss': 0.8982, 'grad_norm': 1.1282401084899902, 'learning_rate': 1.8331502037591696e-05, 'epoch': 0.63} +2025-02-05 15:41:04 - ERROR - stderr - 21%|██ | 4733/22434 [5:33:24<35:31:48, 7.23s/it] +2025-02-05 15:41:07 - ERROR - stderr - 
21%|██ | 4734/22434 [5:33:27<28:36:23, 5.82s/it] +2025-02-05 15:41:07 - ERROR - stderr - +2025-02-05 15:41:07 - ERROR - stderr - +2025-02-05 15:41:07 - INFO - stdout - {'loss': 0.9306, 'grad_norm': 1.0344957113265991, 'learning_rate': 1.8330703491345668e-05, 'epoch': 0.63} +2025-02-05 15:41:07 - ERROR - stderr - 21%|██ | 4734/22434 [5:33:27<28:36:23, 5.82s/it] +2025-02-05 15:41:32 - ERROR - stderr - 21%|██ | 4735/22434 [5:33:52<57:25:18, 11.68s/it] +2025-02-05 15:41:32 - ERROR - stderr - +2025-02-05 15:41:32 - ERROR - stderr - +2025-02-05 15:41:32 - INFO - stdout - {'loss': 0.8377, 'grad_norm': 0.9527806639671326, 'learning_rate': 1.8329904771454105e-05, 'epoch': 0.63} +2025-02-05 15:41:32 - ERROR - stderr - 21%|██ | 4735/22434 [5:33:52<57:25:18, 11.68s/it] +2025-02-05 15:41:35 - ERROR - stderr - 21%|██ | 4736/22434 [5:33:54<43:49:00, 8.91s/it] +2025-02-05 15:41:35 - ERROR - stderr - +2025-02-05 15:41:35 - ERROR - stderr - +2025-02-05 15:41:35 - INFO - stdout - {'loss': 1.023, 'grad_norm': 1.1379767656326294, 'learning_rate': 1.832910587793364e-05, 'epoch': 0.63} +2025-02-05 15:41:35 - ERROR - stderr - 21%|██ | 4736/22434 [5:33:55<43:49:00, 8.91s/it] +2025-02-05 15:41:50 - ERROR - stderr - 21%|██ | 4737/22434 [5:34:09<52:39:11, 10.71s/it] +2025-02-05 15:41:50 - ERROR - stderr - +2025-02-05 15:41:50 - ERROR - stderr - +2025-02-05 15:41:50 - INFO - stdout - {'loss': 0.9354, 'grad_norm': 0.9548843502998352, 'learning_rate': 1.832830681080094e-05, 'epoch': 0.63} +2025-02-05 15:41:50 - ERROR - stderr - 21%|██ | 4737/22434 [5:34:09<52:39:11, 10.71s/it] +2025-02-05 15:41:52 - ERROR - stderr - 21%|██ | 4738/22434 [5:34:12<40:30:20, 8.24s/it] +2025-02-05 15:41:52 - ERROR - stderr - +2025-02-05 15:41:52 - ERROR - stderr - +2025-02-05 15:41:52 - INFO - stdout - {'loss': 0.9749, 'grad_norm': 1.007220983505249, 'learning_rate': 1.8327507570072648e-05, 'epoch': 0.63} +2025-02-05 15:41:52 - ERROR - stderr - 21%|██ | 4738/22434 [5:34:12<40:30:20, 8.24s/it] +2025-02-05 15:41:56 - 
ERROR - stderr - 21%|██ | 4739/22434 [5:34:15<33:37:15, 6.84s/it] +2025-02-05 15:41:56 - ERROR - stderr - +2025-02-05 15:41:56 - ERROR - stderr - +2025-02-05 15:41:56 - INFO - stdout - {'loss': 0.9314, 'grad_norm': 1.145063042640686, 'learning_rate': 1.8326708155765436e-05, 'epoch': 0.63} +2025-02-05 15:41:56 - ERROR - stderr - 21%|██ | 4739/22434 [5:34:15<33:37:15, 6.84s/it] +2025-02-05 15:41:58 - ERROR - stderr - 21%|██ | 4740/22434 [5:34:18<27:10:43, 5.53s/it] +2025-02-05 15:41:58 - ERROR - stderr - +2025-02-05 15:41:58 - ERROR - stderr - +2025-02-05 15:41:58 - INFO - stdout - {'loss': 0.9659, 'grad_norm': 1.093774437904358, 'learning_rate': 1.8325908567895955e-05, 'epoch': 0.63} +2025-02-05 15:41:58 - ERROR - stderr - 21%|██ | 4740/22434 [5:34:18<27:10:43, 5.53s/it] +2025-02-05 15:42:01 - ERROR - stderr - 21%|██ | 4741/22434 [5:34:20<22:42:52, 4.62s/it] +2025-02-05 15:42:01 - ERROR - stderr - +2025-02-05 15:42:01 - ERROR - stderr - +2025-02-05 15:42:01 - INFO - stdout - {'loss': 1.022, 'grad_norm': 1.2680492401123047, 'learning_rate': 1.832510880648088e-05, 'epoch': 0.63} +2025-02-05 15:42:01 - ERROR - stderr - 21%|██ | 4741/22434 [5:34:20<22:42:52, 4.62s/it] +2025-02-05 15:42:03 - ERROR - stderr - 21%|██ | 4742/22434 [5:34:23<19:30:40, 3.97s/it] +2025-02-05 15:42:03 - ERROR - stderr - +2025-02-05 15:42:03 - ERROR - stderr - +2025-02-05 15:42:03 - INFO - stdout - {'loss': 0.9206, 'grad_norm': 0.9581674337387085, 'learning_rate': 1.8324308871536877e-05, 'epoch': 0.63} +2025-02-05 15:42:03 - ERROR - stderr - 21%|██ | 4742/22434 [5:34:23<19:30:40, 3.97s/it] +2025-02-05 15:42:06 - ERROR - stderr - 21%|██ | 4743/22434 [5:34:25<17:18:41, 3.52s/it] +2025-02-05 15:42:06 - ERROR - stderr - +2025-02-05 15:42:06 - ERROR - stderr - +2025-02-05 15:42:06 - INFO - stdout - {'loss': 0.9434, 'grad_norm': 1.139790654182434, 'learning_rate': 1.832350876308062e-05, 'epoch': 0.63} +2025-02-05 15:42:06 - ERROR - stderr - 21%|██ | 4743/22434 [5:34:25<17:18:41, 3.52s/it] +2025-02-05 
15:42:08 - ERROR - stderr - 21%|██ | 4744/22434 [5:34:28<16:04:33, 3.27s/it] +2025-02-05 15:42:08 - ERROR - stderr - +2025-02-05 15:42:08 - ERROR - stderr - +2025-02-05 15:42:08 - INFO - stdout - {'loss': 1.1061, 'grad_norm': 1.0993587970733643, 'learning_rate': 1.8322708481128787e-05, 'epoch': 0.63} +2025-02-05 15:42:08 - ERROR - stderr - 21%|██ | 4744/22434 [5:34:28<16:04:33, 3.27s/it] +2025-02-05 15:42:11 - ERROR - stderr - 21%|██ | 4745/22434 [5:34:31<15:02:05, 3.06s/it] +2025-02-05 15:42:11 - ERROR - stderr - +2025-02-05 15:42:11 - ERROR - stderr - +2025-02-05 15:42:11 - INFO - stdout - {'loss': 0.9053, 'grad_norm': 0.9931357502937317, 'learning_rate': 1.832190802569806e-05, 'epoch': 0.63} +2025-02-05 15:42:11 - ERROR - stderr - 21%|██ | 4745/22434 [5:34:31<15:02:05, 3.06s/it] +2025-02-05 15:42:13 - ERROR - stderr - 21%|██ | 4746/22434 [5:34:33<14:10:57, 2.89s/it] +2025-02-05 15:42:13 - ERROR - stderr - +2025-02-05 15:42:13 - ERROR - stderr - +2025-02-05 15:42:13 - INFO - stdout - {'loss': 0.9323, 'grad_norm': 0.9859405159950256, 'learning_rate': 1.8321107396805126e-05, 'epoch': 0.63} +2025-02-05 15:42:13 - ERROR - stderr - 21%|██ | 4746/22434 [5:34:33<14:10:57, 2.89s/it] +2025-02-05 15:42:16 - ERROR - stderr - 21%|██ | 4747/22434 [5:34:35<13:29:21, 2.75s/it] +2025-02-05 15:42:16 - ERROR - stderr - +2025-02-05 15:42:16 - ERROR - stderr - +2025-02-05 15:42:16 - INFO - stdout - {'loss': 0.9144, 'grad_norm': 1.113785743713379, 'learning_rate': 1.8320306594466667e-05, 'epoch': 0.63} +2025-02-05 15:42:16 - ERROR - stderr - 21%|██ | 4747/22434 [5:34:36<13:29:21, 2.75s/it] +2025-02-05 15:42:18 - ERROR - stderr - 21%|██ | 4748/22434 [5:34:38<13:01:07, 2.65s/it] +2025-02-05 15:42:18 - ERROR - stderr - +2025-02-05 15:42:18 - ERROR - stderr - +2025-02-05 15:42:18 - INFO - stdout - {'loss': 0.8915, 'grad_norm': 1.0219230651855469, 'learning_rate': 1.8319505618699384e-05, 'epoch': 0.63} +2025-02-05 15:42:18 - ERROR - stderr - 21%|██ | 4748/22434 [5:34:38<13:01:07, 
2.65s/it] +2025-02-05 15:42:21 - ERROR - stderr - 21%|██ | 4749/22434 [5:34:40<12:42:53, 2.59s/it] +2025-02-05 15:42:21 - ERROR - stderr - +2025-02-05 15:42:21 - ERROR - stderr - +2025-02-05 15:42:21 - INFO - stdout - {'loss': 0.8833, 'grad_norm': 1.0530736446380615, 'learning_rate': 1.831870446951996e-05, 'epoch': 0.64} +2025-02-05 15:42:21 - ERROR - stderr - 21%|██ | 4749/22434 [5:34:40<12:42:53, 2.59s/it] +2025-02-05 15:42:23 - ERROR - stderr - 21%|██ | 4750/22434 [5:34:43<12:55:23, 2.63s/it] +2025-02-05 15:42:23 - ERROR - stderr - +2025-02-05 15:42:23 - ERROR - stderr - +2025-02-05 15:42:23 - INFO - stdout - {'loss': 0.961, 'grad_norm': 1.1611578464508057, 'learning_rate': 1.8317903146945106e-05, 'epoch': 0.64} +2025-02-05 15:42:23 - ERROR - stderr - 21%|██ | 4750/22434 [5:34:43<12:55:23, 2.63s/it] +2025-02-05 15:42:26 - ERROR - stderr - 21%|██ | 4751/22434 [5:34:46<12:51:04, 2.62s/it] +2025-02-05 15:42:26 - ERROR - stderr - +2025-02-05 15:42:26 - ERROR - stderr - +2025-02-05 15:42:26 - INFO - stdout - {'loss': 1.0172, 'grad_norm': 1.1257898807525635, 'learning_rate': 1.831710165099152e-05, 'epoch': 0.64} +2025-02-05 15:42:26 - ERROR - stderr - 21%|██ | 4751/22434 [5:34:46<12:51:04, 2.62s/it] +2025-02-05 15:42:28 - ERROR - stderr - 21%|██ | 4752/22434 [5:34:48<12:39:17, 2.58s/it] +2025-02-05 15:42:28 - ERROR - stderr - +2025-02-05 15:42:28 - ERROR - stderr - +2025-02-05 15:42:28 - INFO - stdout - {'loss': 0.8632, 'grad_norm': 1.1275643110275269, 'learning_rate': 1.831629998167591e-05, 'epoch': 0.64} +2025-02-05 15:42:28 - ERROR - stderr - 21%|██ | 4752/22434 [5:34:48<12:39:17, 2.58s/it] +2025-02-05 15:42:31 - ERROR - stderr - 21%|██ | 4753/22434 [5:34:51<12:27:49, 2.54s/it] +2025-02-05 15:42:31 - ERROR - stderr - +2025-02-05 15:42:31 - ERROR - stderr - +2025-02-05 15:42:31 - INFO - stdout - {'loss': 0.9498, 'grad_norm': 1.0739939212799072, 'learning_rate': 1.8315498139014982e-05, 'epoch': 0.64} +2025-02-05 15:42:31 - ERROR - stderr - 21%|██ | 4753/22434 
[5:34:51<12:27:49, 2.54s/it] +2025-02-05 15:42:33 - ERROR - stderr - 21%|██ | 4754/22434 [5:34:53<12:20:55, 2.51s/it] +2025-02-05 15:42:33 - ERROR - stderr - +2025-02-05 15:42:33 - ERROR - stderr - +2025-02-05 15:42:33 - INFO - stdout - {'loss': 0.903, 'grad_norm': 0.9668481945991516, 'learning_rate': 1.8314696123025456e-05, 'epoch': 0.64} +2025-02-05 15:42:33 - ERROR - stderr - 21%|██ | 4754/22434 [5:34:53<12:20:55, 2.51s/it] +2025-02-05 15:42:36 - ERROR - stderr - 21%|██ | 4755/22434 [5:34:56<12:16:15, 2.50s/it] +2025-02-05 15:42:36 - ERROR - stderr - +2025-02-05 15:42:36 - ERROR - stderr - +2025-02-05 15:42:36 - INFO - stdout - {'loss': 0.8521, 'grad_norm': 1.1157217025756836, 'learning_rate': 1.831389393372404e-05, 'epoch': 0.64} +2025-02-05 15:42:36 - ERROR - stderr - 21%|██ | 4755/22434 [5:34:56<12:16:15, 2.50s/it] +2025-02-05 15:42:38 - ERROR - stderr - 21%|██ | 4756/22434 [5:34:58<12:11:45, 2.48s/it] +2025-02-05 15:42:38 - ERROR - stderr - +2025-02-05 15:42:38 - ERROR - stderr - +2025-02-05 15:42:38 - INFO - stdout - {'loss': 0.9637, 'grad_norm': 1.0479258298873901, 'learning_rate': 1.8313091571127467e-05, 'epoch': 0.64} +2025-02-05 15:42:38 - ERROR - stderr - 21%|██ | 4756/22434 [5:34:58<12:11:45, 2.48s/it] +2025-02-05 15:42:41 - ERROR - stderr - 21%|██ | 4757/22434 [5:35:00<12:14:05, 2.49s/it] +2025-02-05 15:42:41 - ERROR - stderr - +2025-02-05 15:42:41 - ERROR - stderr - +2025-02-05 15:42:41 - INFO - stdout - {'loss': 0.9181, 'grad_norm': 1.1885377168655396, 'learning_rate': 1.8312289035252448e-05, 'epoch': 0.64} +2025-02-05 15:42:41 - ERROR - stderr - 21%|██ | 4757/22434 [5:35:01<12:14:05, 2.49s/it] +2025-02-05 15:42:43 - ERROR - stderr - 21%|██ | 4758/22434 [5:35:03<12:14:51, 2.49s/it] +2025-02-05 15:42:43 - ERROR - stderr - +2025-02-05 15:42:43 - ERROR - stderr - +2025-02-05 15:42:43 - INFO - stdout - {'loss': 0.8511, 'grad_norm': 0.9766324758529663, 'learning_rate': 1.8311486326115726e-05, 'epoch': 0.64} +2025-02-05 15:42:43 - ERROR - stderr - 21%|██ 
| 4758/22434 [5:35:03<12:14:51, 2.49s/it] +2025-02-05 15:43:14 - ERROR - stderr - 21%|██ | 4759/22434 [5:35:34<54:27:13, 11.09s/it] +2025-02-05 15:43:14 - ERROR - stderr - +2025-02-05 15:43:14 - ERROR - stderr - +2025-02-05 15:43:14 - INFO - stdout - {'loss': 0.9584, 'grad_norm': 1.0711909532546997, 'learning_rate': 1.8310683443734016e-05, 'epoch': 0.64} +2025-02-05 15:43:14 - ERROR - stderr - 21%|██ | 4759/22434 [5:35:34<54:27:13, 11.09s/it] +2025-02-05 15:43:40 - ERROR - stderr - 21%|██ | 4760/22434 [5:36:00<75:46:20, 15.43s/it] +2025-02-05 15:43:40 - ERROR - stderr - +2025-02-05 15:43:40 - ERROR - stderr - +2025-02-05 15:43:40 - INFO - stdout - {'loss': 0.9871, 'grad_norm': 1.1418178081512451, 'learning_rate': 1.8309880388124067e-05, 'epoch': 0.64} +2025-02-05 15:43:40 - ERROR - stderr - 21%|██ | 4760/22434 [5:36:00<75:46:20, 15.43s/it] +2025-02-05 15:44:11 - ERROR - stderr - 21%|██ | 4761/22434 [5:36:31<98:40:40, 20.10s/it] +2025-02-05 15:44:11 - ERROR - stderr - +2025-02-05 15:44:11 - ERROR - stderr - +2025-02-05 15:44:11 - INFO - stdout - {'loss': 0.9531, 'grad_norm': 0.9983953833580017, 'learning_rate': 1.8309077159302612e-05, 'epoch': 0.64} +2025-02-05 15:44:11 - ERROR - stderr - 21%|██ | 4761/22434 [5:36:31<98:40:40, 20.10s/it] +2025-02-05 15:45:04 - ERROR - stderr - 21%|██ | 4762/22434 [5:37:23<146:34:12, 29.86s/it] +2025-02-05 15:45:04 - ERROR - stderr - +2025-02-05 15:45:04 - ERROR - stderr - +2025-02-05 15:45:04 - INFO - stdout - {'loss': 1.0669, 'grad_norm': 1.1794579029083252, 'learning_rate': 1.8308273757286396e-05, 'epoch': 0.64} +2025-02-05 15:45:04 - ERROR - stderr - 21%|██ | 4762/22434 [5:37:23<146:34:12, 29.86s/it] +2025-02-05 15:45:38 - ERROR - stderr - 21%|██ | 4763/22434 [5:37:58<153:16:31, 31.23s/it] +2025-02-05 15:45:38 - ERROR - stderr - +2025-02-05 15:45:38 - ERROR - stderr - +2025-02-05 15:45:38 - INFO - stdout - {'loss': 0.907, 'grad_norm': 1.1387255191802979, 'learning_rate': 1.8307470182092163e-05, 'epoch': 0.64} +2025-02-05 15:45:38 
- ERROR - stderr - 21%|██ | 4763/22434 [5:37:58<153:16:31, 31.23s/it] +2025-02-05 15:45:55 - ERROR - stderr - 21%|██ | 4764/22434 [5:38:15<132:39:44, 27.03s/it] +2025-02-05 15:45:55 - ERROR - stderr - +2025-02-05 15:45:55 - ERROR - stderr - +2025-02-05 15:45:55 - INFO - stdout - {'loss': 0.8724, 'grad_norm': 0.9812789559364319, 'learning_rate': 1.8306666433736664e-05, 'epoch': 0.64} +2025-02-05 15:45:55 - ERROR - stderr - 21%|██ | 4764/22434 [5:38:15<132:39:44, 27.03s/it] +2025-02-05 15:46:13 - ERROR - stderr - 21%|██ | 4765/22434 [5:38:33<119:28:11, 24.34s/it] +2025-02-05 15:46:13 - ERROR - stderr - +2025-02-05 15:46:13 - ERROR - stderr - +2025-02-05 15:46:13 - INFO - stdout - {'loss': 0.9547, 'grad_norm': 1.073772668838501, 'learning_rate': 1.830586251223665e-05, 'epoch': 0.64} +2025-02-05 15:46:13 - ERROR - stderr - 21%|██ | 4765/22434 [5:38:33<119:28:11, 24.34s/it] +2025-02-05 15:47:15 - ERROR - stderr - 21%|██ | 4766/22434 [5:39:34<174:03:42, 35.47s/it] +2025-02-05 15:47:15 - ERROR - stderr - +2025-02-05 15:47:15 - ERROR - stderr - +2025-02-05 15:47:15 - INFO - stdout - {'loss': 0.8231, 'grad_norm': 0.9252293109893799, 'learning_rate': 1.830505841760888e-05, 'epoch': 0.64} +2025-02-05 15:47:15 - ERROR - stderr - 21%|██ | 4766/22434 [5:39:35<174:03:42, 35.47s/it] +2025-02-05 15:48:10 - ERROR - stderr - 21%|██ | 4767/22434 [5:40:29<202:32:32, 41.27s/it] +2025-02-05 15:48:10 - ERROR - stderr - +2025-02-05 15:48:10 - ERROR - stderr - +2025-02-05 15:48:10 - INFO - stdout - {'loss': 1.0603, 'grad_norm': 1.2247083187103271, 'learning_rate': 1.8304254149870114e-05, 'epoch': 0.64} +2025-02-05 15:48:10 - ERROR - stderr - 21%|██ | 4767/22434 [5:40:29<202:32:32, 41.27s/it] +2025-02-05 15:48:42 - ERROR - stderr - 21%|██▏ | 4768/22434 [5:41:02<189:12:50, 38.56s/it] +2025-02-05 15:48:42 - ERROR - stderr - +2025-02-05 15:48:42 - ERROR - stderr - +2025-02-05 15:48:42 - INFO - stdout - {'loss': 1.0693, 'grad_norm': 1.1987895965576172, 'learning_rate': 1.830344970903712e-05, 
'epoch': 0.64} +2025-02-05 15:48:42 - ERROR - stderr - 21%|██▏ | 4768/22434 [5:41:02<189:12:50, 38.56s/it] +2025-02-05 15:49:29 - ERROR - stderr - 21%|██▏ | 4769/22434 [5:41:48<201:36:31, 41.09s/it] +2025-02-05 15:49:29 - ERROR - stderr - +2025-02-05 15:49:29 - ERROR - stderr - +2025-02-05 15:49:29 - INFO - stdout - {'loss': 0.9995, 'grad_norm': 1.0926916599273682, 'learning_rate': 1.830264509512666e-05, 'epoch': 0.64} +2025-02-05 15:49:29 - ERROR - stderr - 21%|██▏ | 4769/22434 [5:41:49<201:36:31, 41.09s/it] +2025-02-05 15:50:10 - ERROR - stderr - 21%|██▏ | 4770/22434 [5:42:30<202:34:07, 41.28s/it] +2025-02-05 15:50:11 - ERROR - stderr - +2025-02-05 15:50:11 - ERROR - stderr - +2025-02-05 15:50:11 - INFO - stdout - {'loss': 0.8883, 'grad_norm': 1.1352436542510986, 'learning_rate': 1.8301840308155507e-05, 'epoch': 0.64} +2025-02-05 15:50:11 - ERROR - stderr - 21%|██▏ | 4770/22434 [5:42:30<202:34:07, 41.28s/it] +2025-02-05 15:50:13 - ERROR - stderr - 21%|██▏ | 4771/22434 [5:42:33<145:23:06, 29.63s/it] +2025-02-05 15:50:13 - ERROR - stderr - +2025-02-05 15:50:13 - ERROR - stderr - +2025-02-05 15:50:13 - INFO - stdout - {'loss': 0.9494, 'grad_norm': 1.1419860124588013, 'learning_rate': 1.830103534814044e-05, 'epoch': 0.64} +2025-02-05 15:50:13 - ERROR - stderr - 21%|██▏ | 4771/22434 [5:42:33<145:23:06, 29.63s/it] +2025-02-05 15:50:28 - ERROR - stderr - 21%|██▏ | 4772/22434 [5:42:48<124:17:09, 25.33s/it] +2025-02-05 15:50:28 - ERROR - stderr - +2025-02-05 15:50:28 - ERROR - stderr - +2025-02-05 15:50:28 - INFO - stdout - {'loss': 1.0767, 'grad_norm': 1.1492091417312622, 'learning_rate': 1.830023021509823e-05, 'epoch': 0.64} +2025-02-05 15:50:28 - ERROR - stderr - 21%|██▏ | 4772/22434 [5:42:48<124:17:09, 25.33s/it] +2025-02-05 15:51:02 - ERROR - stderr - 21%|██▏ | 4773/22434 [5:43:22<136:56:04, 27.91s/it] +2025-02-05 15:51:02 - ERROR - stderr - +2025-02-05 15:51:02 - ERROR - stderr - +2025-02-05 15:51:02 - INFO - stdout - {'loss': 0.9389, 'grad_norm': 1.04586660861969, 
'learning_rate': 1.8299424909045665e-05, 'epoch': 0.64} +2025-02-05 15:51:02 - ERROR - stderr - 21%|██▏ | 4773/22434 [5:43:22<136:56:04, 27.91s/it] +2025-02-05 15:51:05 - ERROR - stderr - 21%|██▏ | 4774/22434 [5:43:24<99:32:19, 20.29s/it] +2025-02-05 15:51:05 - ERROR - stderr - +2025-02-05 15:51:05 - ERROR - stderr - +2025-02-05 15:51:05 - INFO - stdout - {'loss': 0.8989, 'grad_norm': 1.0014731884002686, 'learning_rate': 1.829861942999953e-05, 'epoch': 0.64} +2025-02-05 15:51:05 - ERROR - stderr - 21%|██▏ | 4774/22434 [5:43:24<99:32:19, 20.29s/it] +2025-02-05 15:51:07 - ERROR - stderr - 21%|██▏ | 4775/22434 [5:43:27<73:15:45, 14.94s/it] +2025-02-05 15:51:07 - ERROR - stderr - +2025-02-05 15:51:07 - ERROR - stderr - +2025-02-05 15:51:07 - INFO - stdout - {'loss': 0.8818, 'grad_norm': 1.0885424613952637, 'learning_rate': 1.8297813777976613e-05, 'epoch': 0.64} +2025-02-05 15:51:07 - ERROR - stderr - 21%|██▏ | 4775/22434 [5:43:27<73:15:45, 14.94s/it] +2025-02-05 15:51:45 - ERROR - stderr - 21%|██▏ | 4776/22434 [5:44:05<107:05:31, 21.83s/it] +2025-02-05 15:51:45 - ERROR - stderr - +2025-02-05 15:51:45 - ERROR - stderr - +2025-02-05 15:51:45 - INFO - stdout - {'loss': 0.8213, 'grad_norm': 1.008792519569397, 'learning_rate': 1.8297007952993713e-05, 'epoch': 0.64} +2025-02-05 15:51:45 - ERROR - stderr - 21%|██▏ | 4776/22434 [5:44:05<107:05:31, 21.83s/it] +2025-02-05 15:53:12 - ERROR - stderr - 21%|██▏ | 4777/22434 [5:45:32<203:14:35, 41.44s/it] +2025-02-05 15:53:12 - ERROR - stderr - +2025-02-05 15:53:12 - ERROR - stderr - +2025-02-05 15:53:12 - INFO - stdout - {'loss': 0.9579, 'grad_norm': 1.1058796644210815, 'learning_rate': 1.8296201955067614e-05, 'epoch': 0.64} +2025-02-05 15:53:12 - ERROR - stderr - 21%|██▏ | 4777/22434 [5:45:32<203:14:35, 41.44s/it] +2025-02-05 15:53:58 - ERROR - stderr - 21%|██▏ | 4778/22434 [5:46:18<209:50:24, 42.79s/it] +2025-02-05 15:53:58 - ERROR - stderr - +2025-02-05 15:53:58 - ERROR - stderr - +2025-02-05 15:53:58 - INFO - stdout - {'loss': 
0.8771, 'grad_norm': 0.9740918278694153, 'learning_rate': 1.829539578421513e-05, 'epoch': 0.64} +2025-02-05 15:53:58 - ERROR - stderr - 21%|██▏ | 4778/22434 [5:46:18<209:50:24, 42.79s/it] +2025-02-05 15:54:42 - ERROR - stderr - 21%|██▏ | 4779/22434 [5:47:01<210:45:54, 42.98s/it] +2025-02-05 15:54:42 - ERROR - stderr - +2025-02-05 15:54:42 - ERROR - stderr - +2025-02-05 15:54:42 - INFO - stdout - {'loss': 0.8778, 'grad_norm': 1.1766633987426758, 'learning_rate': 1.8294589440453056e-05, 'epoch': 0.64} +2025-02-05 15:54:42 - ERROR - stderr - 21%|██▏ | 4779/22434 [5:47:01<210:45:54, 42.98s/it] +2025-02-05 15:55:31 - ERROR - stderr - 21%|██▏ | 4780/22434 [5:47:50<219:46:46, 44.82s/it] +2025-02-05 15:55:31 - ERROR - stderr - +2025-02-05 15:55:31 - ERROR - stderr - +2025-02-05 15:55:31 - INFO - stdout - {'loss': 0.9924, 'grad_norm': 1.13731849193573, 'learning_rate': 1.8293782923798203e-05, 'epoch': 0.64} +2025-02-05 15:55:31 - ERROR - stderr - 21%|██▏ | 4780/22434 [5:47:50<219:46:46, 44.82s/it] +2025-02-05 15:56:16 - ERROR - stderr - 21%|██▏ | 4781/22434 [5:48:35<219:50:45, 44.83s/it] +2025-02-05 15:56:16 - ERROR - stderr - +2025-02-05 15:56:16 - ERROR - stderr - +2025-02-05 15:56:16 - INFO - stdout - {'loss': 0.9942, 'grad_norm': 1.1319187879562378, 'learning_rate': 1.829297623426738e-05, 'epoch': 0.64} +2025-02-05 15:56:16 - ERROR - stderr - 21%|██▏ | 4781/22434 [5:48:35<219:50:45, 44.83s/it] +2025-02-05 15:56:56 - ERROR - stderr - 21%|██▏ | 4782/22434 [5:49:16<213:48:25, 43.60s/it] +2025-02-05 15:56:56 - ERROR - stderr - +2025-02-05 15:56:56 - ERROR - stderr - +2025-02-05 15:56:56 - INFO - stdout - {'loss': 0.9521, 'grad_norm': 1.021061658859253, 'learning_rate': 1.82921693718774e-05, 'epoch': 0.64} +2025-02-05 15:56:56 - ERROR - stderr - 21%|██▏ | 4782/22434 [5:49:16<213:48:25, 43.60s/it] +2025-02-05 15:57:07 - ERROR - stderr - 21%|██▏ | 4783/22434 [5:49:26<164:48:19, 33.61s/it] +2025-02-05 15:57:07 - ERROR - stderr - +2025-02-05 15:57:07 - ERROR - stderr - 
+2025-02-05 15:57:07 - INFO - stdout - {'loss': 1.0728, 'grad_norm': 1.1099739074707031, 'learning_rate': 1.8291362336645088e-05, 'epoch': 0.64} +2025-02-05 15:57:07 - ERROR - stderr - 21%|██▏ | 4783/22434 [5:49:26<164:48:19, 33.61s/it] +2025-02-05 15:57:47 - ERROR - stderr - 21%|██▏ | 4784/22434 [5:50:07<175:08:36, 35.72s/it] +2025-02-05 15:57:47 - ERROR - stderr - +2025-02-05 15:57:47 - ERROR - stderr - +2025-02-05 15:57:47 - INFO - stdout - {'loss': 0.9691, 'grad_norm': 1.0654855966567993, 'learning_rate': 1.8290555128587263e-05, 'epoch': 0.64} +2025-02-05 15:57:47 - ERROR - stderr - 21%|██▏ | 4784/22434 [5:50:07<175:08:36, 35.72s/it] +2025-02-05 15:58:30 - ERROR - stderr - 21%|██▏ | 4785/22434 [5:50:50<185:50:53, 37.91s/it] +2025-02-05 15:58:30 - ERROR - stderr - +2025-02-05 15:58:30 - ERROR - stderr - +2025-02-05 15:58:30 - INFO - stdout - {'loss': 0.8999, 'grad_norm': 1.0759185552597046, 'learning_rate': 1.8289747747720747e-05, 'epoch': 0.64} +2025-02-05 15:58:30 - ERROR - stderr - 21%|██▏ | 4785/22434 [5:50:50<185:50:53, 37.91s/it] +2025-02-05 15:59:03 - ERROR - stderr - 21%|██▏ | 4786/22434 [5:51:22<177:43:23, 36.25s/it] +2025-02-05 15:59:03 - ERROR - stderr - +2025-02-05 15:59:03 - ERROR - stderr - +2025-02-05 15:59:03 - INFO - stdout - {'loss': 0.9299, 'grad_norm': 0.9984836578369141, 'learning_rate': 1.8288940194062373e-05, 'epoch': 0.64} +2025-02-05 15:59:03 - ERROR - stderr - 21%|██▏ | 4786/22434 [5:51:22<177:43:23, 36.25s/it] +2025-02-05 16:00:01 - ERROR - stderr - 21%|██▏ | 4787/22434 [5:52:20<209:38:12, 42.77s/it] +2025-02-05 16:00:01 - ERROR - stderr - +2025-02-05 16:00:01 - ERROR - stderr - +2025-02-05 16:00:01 - INFO - stdout - {'loss': 0.9654, 'grad_norm': 1.1172863245010376, 'learning_rate': 1.8288132467628973e-05, 'epoch': 0.64} +2025-02-05 16:00:01 - ERROR - stderr - 21%|██▏ | 4787/22434 [5:52:20<209:38:12, 42.77s/it] +2025-02-05 16:00:03 - ERROR - stderr - 21%|██▏ | 4788/22434 [5:52:23<150:22:40, 30.68s/it] +2025-02-05 16:00:03 - ERROR - 
stderr - +2025-02-05 16:00:03 - ERROR - stderr - +2025-02-05 16:00:03 - INFO - stdout - {'loss': 0.9351, 'grad_norm': 1.085368037223816, 'learning_rate': 1.8287324568437383e-05, 'epoch': 0.64} +2025-02-05 16:00:03 - ERROR - stderr - 21%|██▏ | 4788/22434 [5:52:23<150:22:40, 30.68s/it] +2025-02-05 16:00:34 - ERROR - stderr - 21%|██▏ | 4789/22434 [5:52:54<151:14:00, 30.86s/it] +2025-02-05 16:00:34 - ERROR - stderr - +2025-02-05 16:00:34 - ERROR - stderr - +2025-02-05 16:00:34 - INFO - stdout - {'loss': 0.8691, 'grad_norm': 1.214124321937561, 'learning_rate': 1.828651649650444e-05, 'epoch': 0.64} +2025-02-05 16:00:34 - ERROR - stderr - 21%|██▏ | 4789/22434 [5:52:54<151:14:00, 30.86s/it] +2025-02-05 16:01:10 - ERROR - stderr - 21%|██▏ | 4790/22434 [5:53:30<158:24:11, 32.32s/it] +2025-02-05 16:01:10 - ERROR - stderr - +2025-02-05 16:01:10 - ERROR - stderr - +2025-02-05 16:01:10 - INFO - stdout - {'loss': 0.913, 'grad_norm': 1.0281245708465576, 'learning_rate': 1.8285708251846994e-05, 'epoch': 0.64} +2025-02-05 16:01:10 - ERROR - stderr - 21%|██▏ | 4790/22434 [5:53:30<158:24:11, 32.32s/it] +2025-02-05 16:01:39 - ERROR - stderr - 21%|██▏ | 4791/22434 [5:53:59<153:12:09, 31.26s/it] +2025-02-05 16:01:39 - ERROR - stderr - +2025-02-05 16:01:39 - ERROR - stderr - +2025-02-05 16:01:39 - INFO - stdout - {'loss': 0.897, 'grad_norm': 1.0462946891784668, 'learning_rate': 1.8284899834481883e-05, 'epoch': 0.64} +2025-02-05 16:01:39 - ERROR - stderr - 21%|██▏ | 4791/22434 [5:53:59<153:12:09, 31.26s/it] +2025-02-05 16:02:08 - ERROR - stderr - 21%|██▏ | 4792/22434 [5:54:28<150:31:52, 30.72s/it] +2025-02-05 16:02:08 - ERROR - stderr - +2025-02-05 16:02:08 - ERROR - stderr - +2025-02-05 16:02:08 - INFO - stdout - {'loss': 1.0796, 'grad_norm': 1.0465197563171387, 'learning_rate': 1.8284091244425965e-05, 'epoch': 0.64} +2025-02-05 16:02:08 - ERROR - stderr - 21%|██▏ | 4792/22434 [5:54:28<150:31:52, 30.72s/it] +2025-02-05 16:02:38 - ERROR - stderr - 21%|██▏ | 4793/22434 [5:54:58<149:33:21, 
30.52s/it] +2025-02-05 16:02:38 - ERROR - stderr - +2025-02-05 16:02:38 - ERROR - stderr - +2025-02-05 16:02:38 - INFO - stdout - {'loss': 0.9644, 'grad_norm': 1.0862988233566284, 'learning_rate': 1.8283282481696093e-05, 'epoch': 0.64} +2025-02-05 16:02:38 - ERROR - stderr - 21%|██▏ | 4793/22434 [5:54:58<149:33:21, 30.52s/it] +2025-02-05 16:02:41 - ERROR - stderr - 21%|██▏ | 4794/22434 [5:55:01<108:30:50, 22.15s/it] +2025-02-05 16:02:41 - ERROR - stderr - +2025-02-05 16:02:41 - ERROR - stderr - +2025-02-05 16:02:41 - INFO - stdout - {'loss': 0.7391, 'grad_norm': 0.9503269791603088, 'learning_rate': 1.828247354630912e-05, 'epoch': 0.64} +2025-02-05 16:02:41 - ERROR - stderr - 21%|██▏ | 4794/22434 [5:55:01<108:30:50, 22.15s/it] +2025-02-05 16:02:59 - ERROR - stderr - 21%|██▏ | 4795/22434 [5:55:19<102:45:20, 20.97s/it] +2025-02-05 16:02:59 - ERROR - stderr - +2025-02-05 16:02:59 - ERROR - stderr - +2025-02-05 16:02:59 - INFO - stdout - {'loss': 0.8947, 'grad_norm': 0.9751207828521729, 'learning_rate': 1.8281664438281918e-05, 'epoch': 0.64} +2025-02-05 16:02:59 - ERROR - stderr - 21%|██▏ | 4795/22434 [5:55:19<102:45:20, 20.97s/it] +2025-02-05 16:03:02 - ERROR - stderr - 21%|██▏ | 4796/22434 [5:55:22<75:41:40, 15.45s/it] +2025-02-05 16:03:02 - ERROR - stderr - +2025-02-05 16:03:02 - ERROR - stderr - +2025-02-05 16:03:02 - INFO - stdout - {'loss': 0.773, 'grad_norm': 0.9611235857009888, 'learning_rate': 1.8280855157631337e-05, 'epoch': 0.64} +2025-02-05 16:03:02 - ERROR - stderr - 21%|██▏ | 4796/22434 [5:55:22<75:41:40, 15.45s/it] +2025-02-05 16:03:04 - ERROR - stderr - 21%|██▏ | 4797/22434 [5:55:24<56:39:47, 11.57s/it] +2025-02-05 16:03:04 - ERROR - stderr - +2025-02-05 16:03:04 - ERROR - stderr - +2025-02-05 16:03:04 - INFO - stdout - {'loss': 0.8314, 'grad_norm': 0.935499906539917, 'learning_rate': 1.8280045704374263e-05, 'epoch': 0.64} +2025-02-05 16:03:04 - ERROR - stderr - 21%|██▏ | 4797/22434 [5:55:24<56:39:47, 11.57s/it] +2025-02-05 16:03:10 - ERROR - stderr - 
21%|██▏ | 4798/22434 [5:55:30<47:54:20, 9.78s/it] +2025-02-05 16:03:10 - ERROR - stderr - +2025-02-05 16:03:10 - ERROR - stderr - +2025-02-05 16:03:10 - INFO - stdout - {'loss': 1.0479, 'grad_norm': 1.11974036693573, 'learning_rate': 1.8279236078527555e-05, 'epoch': 0.64} +2025-02-05 16:03:10 - ERROR - stderr - 21%|██▏ | 4798/22434 [5:55:30<47:54:20, 9.78s/it] +2025-02-05 16:03:14 - ERROR - stderr - 21%|██▏ | 4799/22434 [5:55:34<39:36:54, 8.09s/it] +2025-02-05 16:03:14 - ERROR - stderr - +2025-02-05 16:03:14 - ERROR - stderr - +2025-02-05 16:03:14 - INFO - stdout - {'loss': 0.9767, 'grad_norm': 1.0581741333007812, 'learning_rate': 1.8278426280108092e-05, 'epoch': 0.64} +2025-02-05 16:03:14 - ERROR - stderr - 21%|██▏ | 4799/22434 [5:55:34<39:36:54, 8.09s/it] +2025-02-05 16:03:16 - ERROR - stderr - 21%|██▏ | 4800/22434 [5:55:36<31:21:41, 6.40s/it] +2025-02-05 16:03:17 - ERROR - stderr - +2025-02-05 16:03:17 - ERROR - stderr - +2025-02-05 16:03:17 - INFO - stdout - {'loss': 1.1055, 'grad_norm': 1.095953106880188, 'learning_rate': 1.8277616309132758e-05, 'epoch': 0.64} +2025-02-05 16:03:17 - ERROR - stderr - 21%|██▏ | 4800/22434 [5:55:36<31:21:41, 6.40s/it] +2025-02-05 16:03:19 - ERROR - stderr - 21%|██▏ | 4801/22434 [5:55:39<25:39:56, 5.24s/it] +2025-02-05 16:03:19 - ERROR - stderr - +2025-02-05 16:03:19 - ERROR - stderr - +2025-02-05 16:03:19 - INFO - stdout - {'loss': 0.9283, 'grad_norm': 1.1555147171020508, 'learning_rate': 1.8276806165618432e-05, 'epoch': 0.64} +2025-02-05 16:03:19 - ERROR - stderr - 21%|██▏ | 4801/22434 [5:55:39<25:39:56, 5.24s/it] +2025-02-05 16:03:21 - ERROR - stderr - 21%|██▏ | 4802/22434 [5:55:41<21:35:53, 4.41s/it] +2025-02-05 16:03:22 - ERROR - stderr - +2025-02-05 16:03:22 - ERROR - stderr - +2025-02-05 16:03:22 - INFO - stdout - {'loss': 0.9517, 'grad_norm': 1.1237616539001465, 'learning_rate': 1.8275995849582e-05, 'epoch': 0.64} +2025-02-05 16:03:22 - ERROR - stderr - 21%|██▏ | 4802/22434 [5:55:41<21:35:53, 4.41s/it] +2025-02-05 16:03:24 
- ERROR - stderr - 21%|██▏ | 4803/22434 [5:55:44<18:43:25, 3.82s/it] +2025-02-05 16:03:24 - ERROR - stderr - +2025-02-05 16:03:24 - ERROR - stderr - +2025-02-05 16:03:24 - INFO - stdout - {'loss': 0.8827, 'grad_norm': 1.1533702611923218, 'learning_rate': 1.8275185361040357e-05, 'epoch': 0.64} +2025-02-05 16:03:24 - ERROR - stderr - 21%|██▏ | 4803/22434 [5:55:44<18:43:25, 3.82s/it] +2025-02-05 16:03:26 - ERROR - stderr - 21%|██▏ | 4804/22434 [5:55:46<16:47:23, 3.43s/it] +2025-02-05 16:03:27 - ERROR - stderr - +2025-02-05 16:03:27 - ERROR - stderr - +2025-02-05 16:03:27 - INFO - stdout - {'loss': 0.9852, 'grad_norm': 1.1576662063598633, 'learning_rate': 1.8274374700010387e-05, 'epoch': 0.64} +2025-02-05 16:03:27 - ERROR - stderr - 21%|██▏ | 4804/22434 [5:55:46<16:47:23, 3.43s/it] +2025-02-05 16:03:29 - ERROR - stderr - 21%|██▏ | 4805/22434 [5:55:49<15:44:12, 3.21s/it] +2025-02-05 16:03:29 - ERROR - stderr - +2025-02-05 16:03:29 - ERROR - stderr - +2025-02-05 16:03:29 - INFO - stdout - {'loss': 0.8786, 'grad_norm': 1.0092716217041016, 'learning_rate': 1.8273563866509e-05, 'epoch': 0.64} +2025-02-05 16:03:29 - ERROR - stderr - 21%|██▏ | 4805/22434 [5:55:49<15:44:12, 3.21s/it] +2025-02-05 16:03:32 - ERROR - stderr - 21%|██▏ | 4806/22434 [5:55:51<14:46:18, 3.02s/it] +2025-02-05 16:03:32 - ERROR - stderr - +2025-02-05 16:03:32 - ERROR - stderr - +2025-02-05 16:03:32 - INFO - stdout - {'loss': 0.9335, 'grad_norm': 1.0286104679107666, 'learning_rate': 1.8272752860553088e-05, 'epoch': 0.64} +2025-02-05 16:03:32 - ERROR - stderr - 21%|██▏ | 4806/22434 [5:55:52<14:46:18, 3.02s/it] +2025-02-05 16:03:34 - ERROR - stderr - 21%|██▏ | 4807/22434 [5:55:54<14:05:43, 2.88s/it] +2025-02-05 16:03:34 - ERROR - stderr - +2025-02-05 16:03:34 - ERROR - stderr - +2025-02-05 16:03:34 - INFO - stdout - {'loss': 1.0152, 'grad_norm': 1.1145694255828857, 'learning_rate': 1.8271941682159562e-05, 'epoch': 0.64} +2025-02-05 16:03:34 - ERROR - stderr - 21%|██▏ | 4807/22434 [5:55:54<14:05:43, 
2.88s/it] +2025-02-05 16:03:37 - ERROR - stderr - 21%|██▏ | 4808/22434 [5:55:57<13:47:30, 2.82s/it] +2025-02-05 16:03:37 - ERROR - stderr - +2025-02-05 16:03:37 - ERROR - stderr - +2025-02-05 16:03:37 - INFO - stdout - {'loss': 0.9799, 'grad_norm': 1.2004724740982056, 'learning_rate': 1.8271130331345324e-05, 'epoch': 0.64} +2025-02-05 16:03:37 - ERROR - stderr - 21%|██▏ | 4808/22434 [5:55:57<13:47:30, 2.82s/it] +2025-02-05 16:03:39 - ERROR - stderr - 21%|██▏ | 4809/22434 [5:55:59<13:20:42, 2.73s/it] +2025-02-05 16:03:40 - ERROR - stderr - +2025-02-05 16:03:40 - ERROR - stderr - +2025-02-05 16:03:40 - INFO - stdout - {'loss': 0.9336, 'grad_norm': 1.161144733428955, 'learning_rate': 1.827031880812729e-05, 'epoch': 0.64} +2025-02-05 16:03:40 - ERROR - stderr - 21%|██▏ | 4809/22434 [5:55:59<13:20:42, 2.73s/it] +2025-02-05 16:03:42 - ERROR - stderr - 21%|██▏ | 4810/22434 [5:56:02<13:04:20, 2.67s/it] +2025-02-05 16:03:42 - ERROR - stderr - +2025-02-05 16:03:42 - ERROR - stderr - +2025-02-05 16:03:42 - INFO - stdout - {'loss': 0.9388, 'grad_norm': 1.011474370956421, 'learning_rate': 1.8269507112522375e-05, 'epoch': 0.64} +2025-02-05 16:03:42 - ERROR - stderr - 21%|██▏ | 4810/22434 [5:56:02<13:04:20, 2.67s/it] +2025-02-05 16:03:45 - ERROR - stderr - 21%|██▏ | 4811/22434 [5:56:04<12:55:15, 2.64s/it] +2025-02-05 16:03:45 - ERROR - stderr - +2025-02-05 16:03:45 - ERROR - stderr - +2025-02-05 16:03:45 - INFO - stdout - {'loss': 0.9514, 'grad_norm': 1.0157440900802612, 'learning_rate': 1.82686952445475e-05, 'epoch': 0.64} +2025-02-05 16:03:45 - ERROR - stderr - 21%|██▏ | 4811/22434 [5:56:04<12:55:15, 2.64s/it] +2025-02-05 16:03:47 - ERROR - stderr - 21%|██▏ | 4812/22434 [5:56:07<12:37:21, 2.58s/it] +2025-02-05 16:03:47 - ERROR - stderr - +2025-02-05 16:03:47 - ERROR - stderr - +2025-02-05 16:03:47 - INFO - stdout - {'loss': 0.8761, 'grad_norm': 1.1161648035049438, 'learning_rate': 1.826788320421958e-05, 'epoch': 0.64} +2025-02-05 16:03:47 - ERROR - stderr - 21%|██▏ | 4812/22434 
[5:56:07<12:37:21, 2.58s/it] +2025-02-05 16:03:50 - ERROR - stderr - 21%|██▏ | 4813/22434 [5:56:09<12:46:50, 2.61s/it] +2025-02-05 16:03:50 - ERROR - stderr - +2025-02-05 16:03:50 - ERROR - stderr - +2025-02-05 16:03:50 - INFO - stdout - {'loss': 0.9569, 'grad_norm': 1.0362584590911865, 'learning_rate': 1.8267070991555546e-05, 'epoch': 0.64} +2025-02-05 16:03:50 - ERROR - stderr - 21%|██▏ | 4813/22434 [5:56:10<12:46:50, 2.61s/it] +2025-02-05 16:03:52 - ERROR - stderr - 21%|██▏ | 4814/22434 [5:56:12<12:39:05, 2.58s/it] +2025-02-05 16:03:52 - ERROR - stderr - +2025-02-05 16:03:52 - ERROR - stderr - +2025-02-05 16:03:52 - INFO - stdout - {'loss': 0.9438, 'grad_norm': 1.1404283046722412, 'learning_rate': 1.826625860657233e-05, 'epoch': 0.64} +2025-02-05 16:03:52 - ERROR - stderr - 21%|██▏ | 4814/22434 [5:56:12<12:39:05, 2.58s/it] +2025-02-05 16:03:55 - ERROR - stderr - 21%|██▏ | 4815/22434 [5:56:15<12:35:34, 2.57s/it] +2025-02-05 16:03:55 - ERROR - stderr - +2025-02-05 16:03:55 - ERROR - stderr - +2025-02-05 16:03:55 - INFO - stdout - {'loss': 0.883, 'grad_norm': 0.9849143028259277, 'learning_rate': 1.8265446049286864e-05, 'epoch': 0.64} +2025-02-05 16:03:55 - ERROR - stderr - 21%|██▏ | 4815/22434 [5:56:15<12:35:34, 2.57s/it] +2025-02-05 16:03:58 - ERROR - stderr - 21%|██▏ | 4816/22434 [5:56:17<12:50:22, 2.62s/it] +2025-02-05 16:03:58 - ERROR - stderr - +2025-02-05 16:03:58 - ERROR - stderr - +2025-02-05 16:03:58 - INFO - stdout - {'loss': 0.8855, 'grad_norm': 1.10081946849823, 'learning_rate': 1.8264633319716084e-05, 'epoch': 0.64} +2025-02-05 16:03:58 - ERROR - stderr - 21%|██▏ | 4816/22434 [5:56:17<12:50:22, 2.62s/it] +2025-02-05 16:04:00 - ERROR - stderr - 21%|██▏ | 4817/22434 [5:56:20<12:43:08, 2.60s/it] +2025-02-05 16:04:00 - ERROR - stderr - +2025-02-05 16:04:00 - ERROR - stderr - +2025-02-05 16:04:00 - INFO - stdout - {'loss': 1.0352, 'grad_norm': 1.2425425052642822, 'learning_rate': 1.8263820417876926e-05, 'epoch': 0.64} +2025-02-05 16:04:00 - ERROR - stderr - 
21%|██▏ | 4817/22434 [5:56:20<12:43:08, 2.60s/it] +2025-02-05 16:04:03 - ERROR - stderr - 21%|██▏ | 4818/22434 [5:56:22<12:32:34, 2.56s/it] +2025-02-05 16:04:03 - ERROR - stderr - +2025-02-05 16:04:03 - ERROR - stderr - +2025-02-05 16:04:03 - INFO - stdout - {'loss': 1.0226, 'grad_norm': 0.9970521330833435, 'learning_rate': 1.8263007343786347e-05, 'epoch': 0.64} +2025-02-05 16:04:03 - ERROR - stderr - 21%|██▏ | 4818/22434 [5:56:22<12:32:34, 2.56s/it] +2025-02-05 16:04:05 - ERROR - stderr - 21%|██▏ | 4819/22434 [5:56:25<12:27:28, 2.55s/it] +2025-02-05 16:04:05 - ERROR - stderr - +2025-02-05 16:04:05 - ERROR - stderr - +2025-02-05 16:04:05 - INFO - stdout - {'loss': 0.9416, 'grad_norm': 0.9919441938400269, 'learning_rate': 1.8262194097461284e-05, 'epoch': 0.64} +2025-02-05 16:04:05 - ERROR - stderr - 21%|██▏ | 4819/22434 [5:56:25<12:27:28, 2.55s/it] +2025-02-05 16:04:08 - ERROR - stderr - 21%|██▏ | 4820/22434 [5:56:27<12:27:11, 2.55s/it] +2025-02-05 16:04:08 - ERROR - stderr - +2025-02-05 16:04:08 - ERROR - stderr - +2025-02-05 16:04:08 - INFO - stdout - {'loss': 0.9224, 'grad_norm': 1.0043584108352661, 'learning_rate': 1.826138067891869e-05, 'epoch': 0.64} +2025-02-05 16:04:08 - ERROR - stderr - 21%|██▏ | 4820/22434 [5:56:27<12:27:11, 2.55s/it] +2025-02-05 16:04:41 - ERROR - stderr - 21%|██▏ | 4821/22434 [5:57:00<57:12:17, 11.69s/it] +2025-02-05 16:04:41 - ERROR - stderr - +2025-02-05 16:04:41 - ERROR - stderr - +2025-02-05 16:04:41 - INFO - stdout - {'loss': 0.9419, 'grad_norm': 0.9892141819000244, 'learning_rate': 1.826056708817552e-05, 'epoch': 0.64} +2025-02-05 16:04:41 - ERROR - stderr - 21%|██▏ | 4821/22434 [5:57:00<57:12:17, 11.69s/it] +2025-02-05 16:05:09 - ERROR - stderr - 21%|██▏ | 4822/22434 [5:57:28<80:59:13, 16.55s/it] +2025-02-05 16:05:09 - ERROR - stderr - +2025-02-05 16:05:09 - ERROR - stderr - +2025-02-05 16:05:09 - INFO - stdout - {'loss': 0.9413, 'grad_norm': 0.9406230449676514, 'learning_rate': 1.825975332524873e-05, 'epoch': 0.64} +2025-02-05 
16:05:09 - ERROR - stderr - 21%|██▏ | 4822/22434 [5:57:28<80:59:13, 16.55s/it] +2025-02-05 16:05:36 - ERROR - stderr - 21%|██▏ | 4823/22434 [5:57:56<97:40:07, 19.97s/it] +2025-02-05 16:05:36 - ERROR - stderr - +2025-02-05 16:05:36 - ERROR - stderr - +2025-02-05 16:05:36 - INFO - stdout - {'loss': 1.0176, 'grad_norm': 1.1169888973236084, 'learning_rate': 1.8258939390155294e-05, 'epoch': 0.64} +2025-02-05 16:05:36 - ERROR - stderr - 21%|██▏ | 4823/22434 [5:57:56<97:40:07, 19.97s/it] +2025-02-05 16:06:01 - ERROR - stderr - 22%|██▏ | 4824/22434 [5:58:20<103:55:30, 21.25s/it] +2025-02-05 16:06:01 - ERROR - stderr - +2025-02-05 16:06:01 - ERROR - stderr - +2025-02-05 16:06:01 - INFO - stdout - {'loss': 0.9899, 'grad_norm': 1.1389446258544922, 'learning_rate': 1.8258125282912168e-05, 'epoch': 0.65} +2025-02-05 16:06:01 - ERROR - stderr - 22%|██▏ | 4824/22434 [5:58:20<103:55:30, 21.25s/it] +2025-02-05 16:06:22 - ERROR - stderr - 22%|██▏ | 4825/22434 [5:58:42<104:20:23, 21.33s/it] +2025-02-05 16:06:22 - ERROR - stderr - +2025-02-05 16:06:22 - ERROR - stderr - +2025-02-05 16:06:22 - INFO - stdout - {'loss': 0.8799, 'grad_norm': 0.9809902906417847, 'learning_rate': 1.8257311003536317e-05, 'epoch': 0.65} +2025-02-05 16:06:22 - ERROR - stderr - 22%|██▏ | 4825/22434 [5:58:42<104:20:23, 21.33s/it] +2025-02-05 16:07:02 - ERROR - stderr - 22%|██▏ | 4826/22434 [5:59:22<131:52:27, 26.96s/it] +2025-02-05 16:07:02 - ERROR - stderr - +2025-02-05 16:07:02 - ERROR - stderr - +2025-02-05 16:07:02 - INFO - stdout - {'loss': 1.05, 'grad_norm': 1.0323007106781006, 'learning_rate': 1.8256496552044724e-05, 'epoch': 0.65} +2025-02-05 16:07:02 - ERROR - stderr - 22%|██▏ | 4826/22434 [5:59:22<131:52:27, 26.96s/it] +2025-02-05 16:07:38 - ERROR - stderr - 22%|██▏ | 4827/22434 [5:59:58<144:24:23, 29.53s/it] +2025-02-05 16:07:38 - ERROR - stderr - +2025-02-05 16:07:38 - ERROR - stderr - +2025-02-05 16:07:38 - INFO - stdout - {'loss': 0.9471, 'grad_norm': 1.0785446166992188, 'learning_rate': 
1.825568192845436e-05, 'epoch': 0.65} +2025-02-05 16:07:38 - ERROR - stderr - 22%|██▏ | 4827/22434 [5:59:58<144:24:23, 29.53s/it] +2025-02-05 16:07:55 - ERROR - stderr - 22%|██▏ | 4828/22434 [6:00:14<125:49:25, 25.73s/it] +2025-02-05 16:07:55 - ERROR - stderr - +2025-02-05 16:07:55 - ERROR - stderr - +2025-02-05 16:07:55 - INFO - stdout - {'loss': 0.8708, 'grad_norm': 0.957306444644928, 'learning_rate': 1.8254867132782203e-05, 'epoch': 0.65} +2025-02-05 16:07:55 - ERROR - stderr - 22%|██▏ | 4828/22434 [6:00:14<125:49:25, 25.73s/it] +2025-02-05 16:07:57 - ERROR - stderr - 22%|██▏ | 4829/22434 [6:00:17<91:39:56, 18.74s/it] +2025-02-05 16:07:57 - ERROR - stderr - +2025-02-05 16:07:57 - ERROR - stderr - +2025-02-05 16:07:57 - INFO - stdout - {'loss': 0.9732, 'grad_norm': 1.0840375423431396, 'learning_rate': 1.8254052165045245e-05, 'epoch': 0.65} +2025-02-05 16:07:57 - ERROR - stderr - 22%|██▏ | 4829/22434 [6:00:17<91:39:56, 18.74s/it] +2025-02-05 16:08:54 - ERROR - stderr - 22%|██▏ | 4830/22434 [6:01:13<146:56:27, 30.05s/it] +2025-02-05 16:08:54 - ERROR - stderr - +2025-02-05 16:08:54 - ERROR - stderr - +2025-02-05 16:08:54 - INFO - stdout - {'loss': 0.9341, 'grad_norm': 1.1256452798843384, 'learning_rate': 1.8253237025260465e-05, 'epoch': 0.65} +2025-02-05 16:08:54 - ERROR - stderr - 22%|██▏ | 4830/22434 [6:01:13<146:56:27, 30.05s/it] +2025-02-05 16:09:11 - ERROR - stderr - 22%|██▏ | 4831/22434 [6:01:31<129:01:50, 26.39s/it] +2025-02-05 16:09:11 - ERROR - stderr - +2025-02-05 16:09:11 - ERROR - stderr - +2025-02-05 16:09:11 - INFO - stdout - {'loss': 1.0918, 'grad_norm': 1.032884120941162, 'learning_rate': 1.8252421713444856e-05, 'epoch': 0.65} +2025-02-05 16:09:11 - ERROR - stderr - 22%|██▏ | 4831/22434 [6:01:31<129:01:50, 26.39s/it] +2025-02-05 16:09:33 - ERROR - stderr - 22%|██▏ | 4832/22434 [6:01:52<121:29:04, 24.85s/it] +2025-02-05 16:09:33 - ERROR - stderr - +2025-02-05 16:09:33 - ERROR - stderr - +2025-02-05 16:09:33 - INFO - stdout - {'loss': 0.9002, 
'grad_norm': 1.1249043941497803, 'learning_rate': 1.8251606229615416e-05, 'epoch': 0.65} +2025-02-05 16:09:33 - ERROR - stderr - 22%|██▏ | 4832/22434 [6:01:52<121:29:04, 24.85s/it] +2025-02-05 16:09:54 - ERROR - stderr - 22%|██▏ | 4833/22434 [6:02:14<116:25:28, 23.81s/it] +2025-02-05 16:09:54 - ERROR - stderr - +2025-02-05 16:09:54 - ERROR - stderr - +2025-02-05 16:09:54 - INFO - stdout - {'loss': 0.9437, 'grad_norm': 1.1865218877792358, 'learning_rate': 1.8250790573789135e-05, 'epoch': 0.65} +2025-02-05 16:09:54 - ERROR - stderr - 22%|██▏ | 4833/22434 [6:02:14<116:25:28, 23.81s/it] +2025-02-05 16:10:24 - ERROR - stderr - 22%|██▏ | 4834/22434 [6:02:44<126:08:08, 25.80s/it] +2025-02-05 16:10:25 - ERROR - stderr - +2025-02-05 16:10:25 - ERROR - stderr - +2025-02-05 16:10:25 - INFO - stdout - {'loss': 0.9614, 'grad_norm': 1.162433385848999, 'learning_rate': 1.8249974745983023e-05, 'epoch': 0.65} +2025-02-05 16:10:25 - ERROR - stderr - 22%|██▏ | 4834/22434 [6:02:44<126:08:08, 25.80s/it] +2025-02-05 16:10:58 - ERROR - stderr - 22%|██▏ | 4835/22434 [6:03:18<137:26:41, 28.12s/it] +2025-02-05 16:10:58 - ERROR - stderr - +2025-02-05 16:10:58 - ERROR - stderr - +2025-02-05 16:10:58 - INFO - stdout - {'loss': 0.8857, 'grad_norm': 1.1015671491622925, 'learning_rate': 1.8249158746214085e-05, 'epoch': 0.65} +2025-02-05 16:10:58 - ERROR - stderr - 22%|██▏ | 4835/22434 [6:03:18<137:26:41, 28.12s/it] +2025-02-05 16:11:01 - ERROR - stderr - 22%|██▏ | 4836/22434 [6:03:21<100:24:38, 20.54s/it] +2025-02-05 16:11:01 - ERROR - stderr - +2025-02-05 16:11:01 - ERROR - stderr - +2025-02-05 16:11:01 - INFO - stdout - {'loss': 0.9356, 'grad_norm': 1.0929932594299316, 'learning_rate': 1.824834257449932e-05, 'epoch': 0.65} +2025-02-05 16:11:01 - ERROR - stderr - 22%|██▏ | 4836/22434 [6:03:21<100:24:38, 20.54s/it] +2025-02-05 16:11:03 - ERROR - stderr - 22%|██▏ | 4837/22434 [6:03:23<73:59:36, 15.14s/it] +2025-02-05 16:11:03 - ERROR - stderr - +2025-02-05 16:11:03 - ERROR - stderr - +2025-02-05 
16:11:03 - INFO - stdout - {'loss': 0.9681, 'grad_norm': 1.180952787399292, 'learning_rate': 1.824752623085575e-05, 'epoch': 0.65} +2025-02-05 16:11:03 - ERROR - stderr - 22%|██▏ | 4837/22434 [6:03:23<73:59:36, 15.14s/it] +2025-02-05 16:11:23 - ERROR - stderr - 22%|██▏ | 4838/22434 [6:03:43<81:08:04, 16.60s/it] +2025-02-05 16:11:23 - ERROR - stderr - +2025-02-05 16:11:23 - ERROR - stderr - +2025-02-05 16:11:23 - INFO - stdout - {'loss': 1.019, 'grad_norm': 1.0900529623031616, 'learning_rate': 1.824670971530039e-05, 'epoch': 0.65} +2025-02-05 16:11:23 - ERROR - stderr - 22%|██▏ | 4838/22434 [6:03:43<81:08:04, 16.60s/it] +2025-02-05 16:11:43 - ERROR - stderr - 22%|██▏ | 4839/22434 [6:04:03<86:05:21, 17.61s/it] +2025-02-05 16:11:43 - ERROR - stderr - +2025-02-05 16:11:43 - ERROR - stderr - +2025-02-05 16:11:43 - INFO - stdout - {'loss': 1.0454, 'grad_norm': 1.1713682413101196, 'learning_rate': 1.8245893027850255e-05, 'epoch': 0.65} +2025-02-05 16:11:43 - ERROR - stderr - 22%|██▏ | 4839/22434 [6:04:03<86:05:21, 17.61s/it] +2025-02-05 16:11:46 - ERROR - stderr - 22%|██▏ | 4840/22434 [6:04:06<64:01:43, 13.10s/it] +2025-02-05 16:11:46 - ERROR - stderr - +2025-02-05 16:11:46 - ERROR - stderr - +2025-02-05 16:11:46 - INFO - stdout - {'loss': 0.9767, 'grad_norm': 1.0716418027877808, 'learning_rate': 1.824507616852237e-05, 'epoch': 0.65} +2025-02-05 16:11:46 - ERROR - stderr - 22%|██▏ | 4840/22434 [6:04:06<64:01:43, 13.10s/it] +2025-02-05 16:12:04 - ERROR - stderr - 22%|██▏ | 4841/22434 [6:04:24<71:15:15, 14.58s/it] +2025-02-05 16:12:04 - ERROR - stderr - +2025-02-05 16:12:04 - ERROR - stderr - +2025-02-05 16:12:04 - INFO - stdout - {'loss': 0.9264, 'grad_norm': 1.1568701267242432, 'learning_rate': 1.8244259137333763e-05, 'epoch': 0.65} +2025-02-05 16:12:04 - ERROR - stderr - 22%|██▏ | 4841/22434 [6:04:24<71:15:15, 14.58s/it] +2025-02-05 16:12:53 - ERROR - stderr - 22%|██▏ | 4842/22434 [6:05:13<122:06:01, 24.99s/it] +2025-02-05 16:12:53 - ERROR - stderr - +2025-02-05 16:12:53 
- ERROR - stderr - +2025-02-05 16:12:53 - INFO - stdout - {'loss': 0.8802, 'grad_norm': 0.9616028070449829, 'learning_rate': 1.8243441934301462e-05, 'epoch': 0.65} +2025-02-05 16:12:53 - ERROR - stderr - 22%|██▏ | 4842/22434 [6:05:13<122:06:01, 24.99s/it] +2025-02-05 16:12:56 - ERROR - stderr - 22%|██▏ | 4843/22434 [6:05:16<89:08:34, 18.24s/it] +2025-02-05 16:12:56 - ERROR - stderr - +2025-02-05 16:12:56 - ERROR - stderr - +2025-02-05 16:12:56 - INFO - stdout - {'loss': 1.0983, 'grad_norm': 1.1977514028549194, 'learning_rate': 1.82426245594425e-05, 'epoch': 0.65} +2025-02-05 16:12:56 - ERROR - stderr - 22%|██▏ | 4843/22434 [6:05:16<89:08:34, 18.24s/it] +2025-02-05 16:12:58 - ERROR - stderr - 22%|██▏ | 4844/22434 [6:05:18<65:56:28, 13.50s/it] +2025-02-05 16:12:58 - ERROR - stderr - +2025-02-05 16:12:58 - ERROR - stderr - +2025-02-05 16:12:58 - INFO - stdout - {'loss': 0.976, 'grad_norm': 1.1566635370254517, 'learning_rate': 1.824180701277392e-05, 'epoch': 0.65} +2025-02-05 16:12:58 - ERROR - stderr - 22%|██▏ | 4844/22434 [6:05:18<65:56:28, 13.50s/it] +2025-02-05 16:13:01 - ERROR - stderr - 22%|██▏ | 4845/22434 [6:05:20<49:45:55, 10.19s/it] +2025-02-05 16:13:01 - ERROR - stderr - +2025-02-05 16:13:01 - ERROR - stderr - +2025-02-05 16:13:01 - INFO - stdout - {'loss': 0.8601, 'grad_norm': 1.0821622610092163, 'learning_rate': 1.8240989294312758e-05, 'epoch': 0.65} +2025-02-05 16:13:01 - ERROR - stderr - 22%|██▏ | 4845/22434 [6:05:20<49:45:55, 10.19s/it] +2025-02-05 16:14:01 - ERROR - stderr - 22%|██▏ | 4846/22434 [6:06:21<123:26:52, 25.27s/it] +2025-02-05 16:14:01 - ERROR - stderr - +2025-02-05 16:14:01 - ERROR - stderr - +2025-02-05 16:14:01 - INFO - stdout - {'loss': 0.903, 'grad_norm': 1.052411675453186, 'learning_rate': 1.824017140407606e-05, 'epoch': 0.65} +2025-02-05 16:14:01 - ERROR - stderr - 22%|██▏ | 4846/22434 [6:06:21<123:26:52, 25.27s/it] +2025-02-05 16:14:34 - ERROR - stderr - 22%|██▏ | 4847/22434 [6:06:54<134:58:07, 27.63s/it] +2025-02-05 16:14:34 - ERROR 
- stderr - +2025-02-05 16:14:34 - ERROR - stderr - +2025-02-05 16:14:34 - INFO - stdout - {'loss': 1.0223, 'grad_norm': 1.0975204706192017, 'learning_rate': 1.8239353342080874e-05, 'epoch': 0.65} +2025-02-05 16:14:34 - ERROR - stderr - 22%|██▏ | 4847/22434 [6:06:54<134:58:07, 27.63s/it] +2025-02-05 16:15:21 - ERROR - stderr - 22%|██▏ | 4848/22434 [6:07:41<162:54:00, 33.35s/it] +2025-02-05 16:15:21 - ERROR - stderr - +2025-02-05 16:15:21 - ERROR - stderr - +2025-02-05 16:15:21 - INFO - stdout - {'loss': 1.0235, 'grad_norm': 1.213929533958435, 'learning_rate': 1.8238535108344253e-05, 'epoch': 0.65} +2025-02-05 16:15:21 - ERROR - stderr - 22%|██▏ | 4848/22434 [6:07:41<162:54:00, 33.35s/it] +2025-02-05 16:15:39 - ERROR - stderr - 22%|██▏ | 4849/22434 [6:07:58<140:00:30, 28.66s/it] +2025-02-05 16:15:39 - ERROR - stderr - +2025-02-05 16:15:39 - ERROR - stderr - +2025-02-05 16:15:39 - INFO - stdout - {'loss': 0.988, 'grad_norm': 1.0271984338760376, 'learning_rate': 1.823771670288325e-05, 'epoch': 0.65} +2025-02-05 16:15:39 - ERROR - stderr - 22%|██▏ | 4849/22434 [6:07:58<140:00:30, 28.66s/it] +2025-02-05 16:16:23 - ERROR - stderr - 22%|██▏ | 4850/22434 [6:08:43<163:15:18, 33.42s/it] +2025-02-05 16:16:23 - ERROR - stderr - +2025-02-05 16:16:23 - ERROR - stderr - +2025-02-05 16:16:23 - INFO - stdout - {'loss': 0.8478, 'grad_norm': 1.079950213432312, 'learning_rate': 1.8236898125714925e-05, 'epoch': 0.65} +2025-02-05 16:16:23 - ERROR - stderr - 22%|██▏ | 4850/22434 [6:08:43<163:15:18, 33.42s/it] +2025-02-05 16:16:26 - ERROR - stderr - 22%|██▏ | 4851/22434 [6:08:46<118:01:32, 24.16s/it] +2025-02-05 16:16:26 - ERROR - stderr - +2025-02-05 16:16:26 - ERROR - stderr - +2025-02-05 16:16:26 - INFO - stdout - {'loss': 0.9977, 'grad_norm': 1.2224106788635254, 'learning_rate': 1.823607937685634e-05, 'epoch': 0.65} +2025-02-05 16:16:26 - ERROR - stderr - 22%|██▏ | 4851/22434 [6:08:46<118:01:32, 24.16s/it] +2025-02-05 16:17:14 - ERROR - stderr - 22%|██▏ | 4852/22434 [6:09:33<152:39:18, 
31.26s/it] +2025-02-05 16:17:14 - ERROR - stderr - +2025-02-05 16:17:14 - ERROR - stderr - +2025-02-05 16:17:14 - INFO - stdout - {'loss': 0.9593, 'grad_norm': 1.0794486999511719, 'learning_rate': 1.8235260456324562e-05, 'epoch': 0.65} +2025-02-05 16:17:14 - ERROR - stderr - 22%|██▏ | 4852/22434 [6:09:33<152:39:18, 31.26s/it] +2025-02-05 16:18:01 - ERROR - stderr - 22%|██▏ | 4853/22434 [6:10:21<175:57:32, 36.03s/it] +2025-02-05 16:18:01 - ERROR - stderr - +2025-02-05 16:18:01 - ERROR - stderr - +2025-02-05 16:18:01 - INFO - stdout - {'loss': 1.0668, 'grad_norm': 1.0531206130981445, 'learning_rate': 1.823444136413666e-05, 'epoch': 0.65} +2025-02-05 16:18:01 - ERROR - stderr - 22%|██▏ | 4853/22434 [6:10:21<175:57:32, 36.03s/it] +2025-02-05 16:18:42 - ERROR - stderr - 22%|██▏ | 4854/22434 [6:11:02<184:13:35, 37.73s/it] +2025-02-05 16:18:42 - ERROR - stderr - +2025-02-05 16:18:42 - ERROR - stderr - +2025-02-05 16:18:42 - INFO - stdout - {'loss': 0.9464, 'grad_norm': 1.0323549509048462, 'learning_rate': 1.8233622100309705e-05, 'epoch': 0.65} +2025-02-05 16:18:42 - ERROR - stderr - 22%|██▏ | 4854/22434 [6:11:02<184:13:35, 37.73s/it] +2025-02-05 16:19:15 - ERROR - stderr - 22%|██▏ | 4855/22434 [6:11:34<176:12:35, 36.09s/it] +2025-02-05 16:19:15 - ERROR - stderr - +2025-02-05 16:19:15 - ERROR - stderr - +2025-02-05 16:19:15 - INFO - stdout - {'loss': 0.9281, 'grad_norm': 1.0065720081329346, 'learning_rate': 1.8232802664860783e-05, 'epoch': 0.65} +2025-02-05 16:19:15 - ERROR - stderr - 22%|██▏ | 4855/22434 [6:11:34<176:12:35, 36.09s/it] +2025-02-05 16:19:17 - ERROR - stderr - 22%|██▏ | 4856/22434 [6:11:37<126:57:34, 26.00s/it] +2025-02-05 16:19:17 - ERROR - stderr - +2025-02-05 16:19:17 - ERROR - stderr - +2025-02-05 16:19:17 - INFO - stdout - {'loss': 1.0031, 'grad_norm': 1.1893373727798462, 'learning_rate': 1.823198305780696e-05, 'epoch': 0.65} +2025-02-05 16:19:17 - ERROR - stderr - 22%|██▏ | 4856/22434 [6:11:37<126:57:34, 26.00s/it] +2025-02-05 16:19:48 - ERROR - stderr 
- 22%|██▏ | 4857/22434 [6:12:08<134:05:47, 27.46s/it] +2025-02-05 16:19:48 - ERROR - stderr - +2025-02-05 16:19:48 - ERROR - stderr - +2025-02-05 16:19:48 - INFO - stdout - {'loss': 0.8221, 'grad_norm': 1.0254460573196411, 'learning_rate': 1.823116327916533e-05, 'epoch': 0.65} +2025-02-05 16:19:48 - ERROR - stderr - 22%|██▏ | 4857/22434 [6:12:08<134:05:47, 27.46s/it] +2025-02-05 16:19:50 - ERROR - stderr - 22%|██▏ | 4858/22434 [6:12:10<97:24:54, 19.95s/it] +2025-02-05 16:19:51 - ERROR - stderr - +2025-02-05 16:19:51 - ERROR - stderr - +2025-02-05 16:19:51 - INFO - stdout - {'loss': 0.901, 'grad_norm': 1.0601791143417358, 'learning_rate': 1.823034332895298e-05, 'epoch': 0.65} +2025-02-05 16:19:51 - ERROR - stderr - 22%|██▏ | 4858/22434 [6:12:10<97:24:54, 19.95s/it] +2025-02-05 16:20:20 - ERROR - stderr - 22%|██▏ | 4859/22434 [6:12:39<110:47:04, 22.69s/it] +2025-02-05 16:20:20 - ERROR - stderr - +2025-02-05 16:20:20 - ERROR - stderr - +2025-02-05 16:20:20 - INFO - stdout - {'loss': 0.9601, 'grad_norm': 1.052120566368103, 'learning_rate': 1.8229523207186995e-05, 'epoch': 0.65} +2025-02-05 16:20:20 - ERROR - stderr - 22%|██▏ | 4859/22434 [6:12:39<110:47:04, 22.69s/it] +2025-02-05 16:20:22 - ERROR - stderr - 22%|██▏ | 4860/22434 [6:12:42<81:08:04, 16.62s/it] +2025-02-05 16:20:22 - ERROR - stderr - +2025-02-05 16:20:22 - ERROR - stderr - +2025-02-05 16:20:22 - INFO - stdout - {'loss': 0.9759, 'grad_norm': 1.0642207860946655, 'learning_rate': 1.8228702913884476e-05, 'epoch': 0.65} +2025-02-05 16:20:22 - ERROR - stderr - 22%|██▏ | 4860/22434 [6:12:42<81:08:04, 16.62s/it] +2025-02-05 16:20:25 - ERROR - stderr - 22%|██▏ | 4861/22434 [6:12:45<60:53:20, 12.47s/it] +2025-02-05 16:20:25 - ERROR - stderr - +2025-02-05 16:20:25 - ERROR - stderr - +2025-02-05 16:20:25 - INFO - stdout - {'loss': 1.052, 'grad_norm': 1.2952349185943604, 'learning_rate': 1.8227882449062516e-05, 'epoch': 0.65} +2025-02-05 16:20:25 - ERROR - stderr - 22%|██▏ | 4861/22434 [6:12:45<60:53:20, 12.47s/it] 
+2025-02-05 16:20:54 - ERROR - stderr - 22%|██▏ | 4862/22434 [6:13:14<85:05:00, 17.43s/it] +2025-02-05 16:20:54 - ERROR - stderr - +2025-02-05 16:20:54 - ERROR - stderr - +2025-02-05 16:20:54 - INFO - stdout - {'loss': 1.0181, 'grad_norm': 1.1930335760116577, 'learning_rate': 1.8227061812738223e-05, 'epoch': 0.65} +2025-02-05 16:20:54 - ERROR - stderr - 22%|██▏ | 4862/22434 [6:13:14<85:05:00, 17.43s/it] +2025-02-05 16:21:28 - ERROR - stderr - 22%|██▏ | 4863/22434 [6:13:48<109:57:33, 22.53s/it] +2025-02-05 16:21:28 - ERROR - stderr - +2025-02-05 16:21:28 - ERROR - stderr - +2025-02-05 16:21:28 - INFO - stdout - {'loss': 0.9708, 'grad_norm': 1.1082086563110352, 'learning_rate': 1.82262410049287e-05, 'epoch': 0.65} +2025-02-05 16:21:28 - ERROR - stderr - 22%|██▏ | 4863/22434 [6:13:48<109:57:33, 22.53s/it] +2025-02-05 16:21:47 - ERROR - stderr - 22%|██▏ | 4864/22434 [6:14:07<105:12:34, 21.56s/it] +2025-02-05 16:21:48 - ERROR - stderr - +2025-02-05 16:21:48 - ERROR - stderr - +2025-02-05 16:21:48 - INFO - stdout - {'loss': 0.9398, 'grad_norm': 1.0963523387908936, 'learning_rate': 1.822542002565105e-05, 'epoch': 0.65} +2025-02-05 16:21:48 - ERROR - stderr - 22%|██▏ | 4864/22434 [6:14:07<105:12:34, 21.56s/it] +2025-02-05 16:22:01 - ERROR - stderr - 22%|██▏ | 4865/22434 [6:14:20<92:44:39, 19.00s/it] +2025-02-05 16:22:01 - ERROR - stderr - +2025-02-05 16:22:01 - ERROR - stderr - +2025-02-05 16:22:01 - INFO - stdout - {'loss': 0.9298, 'grad_norm': 1.01266610622406, 'learning_rate': 1.822459887492239e-05, 'epoch': 0.65} +2025-02-05 16:22:01 - ERROR - stderr - 22%|██▏ | 4865/22434 [6:14:20<92:44:39, 19.00s/it] +2025-02-05 16:22:15 - ERROR - stderr - 22%|██▏ | 4866/22434 [6:14:35<85:44:17, 17.57s/it] +2025-02-05 16:22:15 - ERROR - stderr - +2025-02-05 16:22:15 - ERROR - stderr - +2025-02-05 16:22:15 - INFO - stdout - {'loss': 0.9023, 'grad_norm': 1.793610692024231, 'learning_rate': 1.822377755275984e-05, 'epoch': 0.65} +2025-02-05 16:22:15 - ERROR - stderr - 22%|██▏ | 
4866/22434 [6:14:35<85:44:17, 17.57s/it] +2025-02-05 16:22:17 - ERROR - stderr - 22%|██▏ | 4867/22434 [6:14:37<63:42:43, 13.06s/it] +2025-02-05 16:22:17 - ERROR - stderr - +2025-02-05 16:22:17 - ERROR - stderr - +2025-02-05 16:22:17 - INFO - stdout - {'loss': 0.9671, 'grad_norm': 1.0492956638336182, 'learning_rate': 1.822295605918052e-05, 'epoch': 0.65} +2025-02-05 16:22:17 - ERROR - stderr - 22%|██▏ | 4867/22434 [6:14:37<63:42:43, 13.06s/it] +2025-02-05 16:22:20 - ERROR - stderr - 22%|██▏ | 4868/22434 [6:14:40<48:12:57, 9.88s/it] +2025-02-05 16:22:20 - ERROR - stderr - +2025-02-05 16:22:20 - ERROR - stderr - +2025-02-05 16:22:20 - INFO - stdout - {'loss': 0.9851, 'grad_norm': 1.0646440982818604, 'learning_rate': 1.8222134394201543e-05, 'epoch': 0.65} +2025-02-05 16:22:20 - ERROR - stderr - 22%|██▏ | 4868/22434 [6:14:40<48:12:57, 9.88s/it] +2025-02-05 16:22:22 - ERROR - stderr - 22%|██▏ | 4869/22434 [6:14:42<37:29:50, 7.69s/it] +2025-02-05 16:22:22 - ERROR - stderr - +2025-02-05 16:22:22 - ERROR - stderr - +2025-02-05 16:22:22 - INFO - stdout - {'loss': 1.0447, 'grad_norm': 1.0914469957351685, 'learning_rate': 1.8221312557840047e-05, 'epoch': 0.65} +2025-02-05 16:22:22 - ERROR - stderr - 22%|██▏ | 4869/22434 [6:14:42<37:29:50, 7.69s/it] +2025-02-05 16:22:32 - ERROR - stderr - 22%|██▏ | 4870/22434 [6:14:52<40:50:00, 8.37s/it] +2025-02-05 16:22:32 - ERROR - stderr - +2025-02-05 16:22:32 - ERROR - stderr - +2025-02-05 16:22:32 - INFO - stdout - {'loss': 1.0577, 'grad_norm': 1.0594173669815063, 'learning_rate': 1.8220490550113153e-05, 'epoch': 0.65} +2025-02-05 16:22:32 - ERROR - stderr - 22%|██▏ | 4870/22434 [6:14:52<40:50:00, 8.37s/it] +2025-02-05 16:23:04 - ERROR - stderr - 22%|██▏ | 4871/22434 [6:15:24<74:36:36, 15.29s/it] +2025-02-05 16:23:04 - ERROR - stderr - +2025-02-05 16:23:04 - ERROR - stderr - +2025-02-05 16:23:04 - INFO - stdout - {'loss': 1.0581, 'grad_norm': 1.0965802669525146, 'learning_rate': 1.8219668371038002e-05, 'epoch': 0.65} +2025-02-05 16:23:04 
- ERROR - stderr - 22%|██▏ | 4871/22434 [6:15:24<74:36:36, 15.29s/it] +2025-02-05 16:23:31 - ERROR - stderr - 22%|██▏ | 4872/22434 [6:15:50<91:27:35, 18.75s/it] +2025-02-05 16:23:31 - ERROR - stderr - +2025-02-05 16:23:31 - ERROR - stderr - +2025-02-05 16:23:31 - INFO - stdout - {'loss': 0.8925, 'grad_norm': 0.9848311543464661, 'learning_rate': 1.8218846020631725e-05, 'epoch': 0.65} +2025-02-05 16:23:31 - ERROR - stderr - 22%|██▏ | 4872/22434 [6:15:50<91:27:35, 18.75s/it] +2025-02-05 16:23:33 - ERROR - stderr - 22%|██▏ | 4873/22434 [6:15:53<67:41:51, 13.88s/it] +2025-02-05 16:23:33 - ERROR - stderr - +2025-02-05 16:23:33 - ERROR - stderr - +2025-02-05 16:23:33 - INFO - stdout - {'loss': 0.9456, 'grad_norm': 1.1393353939056396, 'learning_rate': 1.8218023498911476e-05, 'epoch': 0.65} +2025-02-05 16:23:33 - ERROR - stderr - 22%|██▏ | 4873/22434 [6:15:53<67:41:51, 13.88s/it] +2025-02-05 16:23:36 - ERROR - stderr - 22%|██▏ | 4874/22434 [6:15:55<50:57:28, 10.45s/it] +2025-02-05 16:23:36 - ERROR - stderr - +2025-02-05 16:23:36 - ERROR - stderr - +2025-02-05 16:23:36 - INFO - stdout - {'loss': 0.9856, 'grad_norm': 1.1741771697998047, 'learning_rate': 1.8217200805894382e-05, 'epoch': 0.65} +2025-02-05 16:23:36 - ERROR - stderr - 22%|██▏ | 4874/22434 [6:15:55<50:57:28, 10.45s/it] +2025-02-05 16:23:45 - ERROR - stderr - 22%|██▏ | 4875/22434 [6:16:05<50:08:01, 10.28s/it] +2025-02-05 16:23:45 - ERROR - stderr - +2025-02-05 16:23:45 - ERROR - stderr - +2025-02-05 16:23:45 - INFO - stdout - {'loss': 0.8658, 'grad_norm': 1.1804367303848267, 'learning_rate': 1.8216377941597607e-05, 'epoch': 0.65} +2025-02-05 16:23:45 - ERROR - stderr - 22%|██▏ | 4875/22434 [6:16:05<50:08:01, 10.28s/it] +2025-02-05 16:23:48 - ERROR - stderr - 22%|██▏ | 4876/22434 [6:16:08<38:43:06, 7.94s/it] +2025-02-05 16:23:48 - ERROR - stderr - +2025-02-05 16:23:48 - ERROR - stderr - +2025-02-05 16:23:48 - INFO - stdout - {'loss': 0.9917, 'grad_norm': 1.0566893815994263, 'learning_rate': 1.8215554906038292e-05, 
'epoch': 0.65} +2025-02-05 16:23:48 - ERROR - stderr - 22%|██▏ | 4876/22434 [6:16:08<38:43:06, 7.94s/it] +2025-02-05 16:23:53 - ERROR - stderr - 22%|██▏ | 4877/22434 [6:16:13<34:59:31, 7.18s/it] +2025-02-05 16:23:53 - ERROR - stderr - +2025-02-05 16:23:53 - ERROR - stderr - +2025-02-05 16:23:53 - INFO - stdout - {'loss': 1.0047, 'grad_norm': 1.1456278562545776, 'learning_rate': 1.8214731699233597e-05, 'epoch': 0.65} +2025-02-05 16:23:53 - ERROR - stderr - 22%|██▏ | 4877/22434 [6:16:13<34:59:31, 7.18s/it] +2025-02-05 16:23:56 - ERROR - stderr - 22%|██▏ | 4878/22434 [6:16:15<28:05:51, 5.76s/it] +2025-02-05 16:23:56 - ERROR - stderr - +2025-02-05 16:23:56 - ERROR - stderr - +2025-02-05 16:23:56 - INFO - stdout - {'loss': 0.9165, 'grad_norm': 1.0901113748550415, 'learning_rate': 1.821390832120068e-05, 'epoch': 0.65} +2025-02-05 16:23:56 - ERROR - stderr - 22%|██▏ | 4878/22434 [6:16:16<28:05:51, 5.76s/it] +2025-02-05 16:23:58 - ERROR - stderr - 22%|██▏ | 4879/22434 [6:16:18<23:22:09, 4.79s/it] +2025-02-05 16:23:58 - ERROR - stderr - +2025-02-05 16:23:58 - ERROR - stderr - +2025-02-05 16:23:58 - INFO - stdout - {'loss': 0.9102, 'grad_norm': 1.0466879606246948, 'learning_rate': 1.8213084771956707e-05, 'epoch': 0.65} +2025-02-05 16:23:58 - ERROR - stderr - 22%|██▏ | 4879/22434 [6:16:18<23:22:09, 4.79s/it] +2025-02-05 16:24:01 - ERROR - stderr - 22%|██▏ | 4880/22434 [6:16:21<20:14:55, 4.15s/it] +2025-02-05 16:24:01 - ERROR - stderr - +2025-02-05 16:24:01 - ERROR - stderr - +2025-02-05 16:24:01 - INFO - stdout - {'loss': 1.0052, 'grad_norm': 1.1013215780258179, 'learning_rate': 1.821226105151884e-05, 'epoch': 0.65} +2025-02-05 16:24:01 - ERROR - stderr - 22%|██▏ | 4880/22434 [6:16:21<20:14:55, 4.15s/it] +2025-02-05 16:24:03 - ERROR - stderr - 22%|██▏ | 4881/22434 [6:16:23<17:50:51, 3.66s/it] +2025-02-05 16:24:03 - ERROR - stderr - +2025-02-05 16:24:03 - ERROR - stderr - +2025-02-05 16:24:03 - INFO - stdout - {'loss': 1.0318, 'grad_norm': 1.161557912826538, 'learning_rate': 
1.821143715990425e-05, 'epoch': 0.65} +2025-02-05 16:24:03 - ERROR - stderr - 22%|██▏ | 4881/22434 [6:16:23<17:50:51, 3.66s/it] +2025-02-05 16:24:06 - ERROR - stderr - 22%|██▏ | 4882/22434 [6:16:26<16:02:45, 3.29s/it] +2025-02-05 16:24:06 - ERROR - stderr - +2025-02-05 16:24:06 - ERROR - stderr - +2025-02-05 16:24:06 - INFO - stdout - {'loss': 0.894, 'grad_norm': 1.0900743007659912, 'learning_rate': 1.821061309713011e-05, 'epoch': 0.65} +2025-02-05 16:24:06 - ERROR - stderr - 22%|██▏ | 4882/22434 [6:16:26<16:02:45, 3.29s/it] +2025-02-05 16:24:08 - ERROR - stderr - 22%|██▏ | 4883/22434 [6:16:28<14:49:21, 3.04s/it] +2025-02-05 16:24:08 - ERROR - stderr - +2025-02-05 16:24:08 - ERROR - stderr - +2025-02-05 16:24:08 - INFO - stdout - {'loss': 0.9536, 'grad_norm': 1.0907121896743774, 'learning_rate': 1.8209788863213594e-05, 'epoch': 0.65} +2025-02-05 16:24:08 - ERROR - stderr - 22%|██▏ | 4883/22434 [6:16:28<14:49:21, 3.04s/it] +2025-02-05 16:24:11 - ERROR - stderr - 22%|██▏ | 4884/22434 [6:16:31<14:02:32, 2.88s/it] +2025-02-05 16:24:11 - ERROR - stderr - +2025-02-05 16:24:11 - ERROR - stderr - +2025-02-05 16:24:11 - INFO - stdout - {'loss': 0.9984, 'grad_norm': 0.998594343662262, 'learning_rate': 1.8208964458171884e-05, 'epoch': 0.65} +2025-02-05 16:24:11 - ERROR - stderr - 22%|██▏ | 4884/22434 [6:16:31<14:02:32, 2.88s/it] +2025-02-05 16:24:13 - ERROR - stderr - 22%|██▏ | 4885/22434 [6:16:33<13:24:50, 2.75s/it] +2025-02-05 16:24:13 - ERROR - stderr - +2025-02-05 16:24:13 - ERROR - stderr - +2025-02-05 16:24:13 - INFO - stdout - {'loss': 0.955, 'grad_norm': 1.1376252174377441, 'learning_rate': 1.820813988202217e-05, 'epoch': 0.65} +2025-02-05 16:24:13 - ERROR - stderr - 22%|██▏ | 4885/22434 [6:16:33<13:24:50, 2.75s/it] +2025-02-05 16:24:16 - ERROR - stderr - 22%|██▏ | 4886/22434 [6:16:36<13:20:17, 2.74s/it] +2025-02-05 16:24:16 - ERROR - stderr - +2025-02-05 16:24:16 - ERROR - stderr - +2025-02-05 16:24:16 - INFO - stdout - {'loss': 0.9364, 'grad_norm': 
1.083677887916565, 'learning_rate': 1.8207315134781633e-05, 'epoch': 0.65} +2025-02-05 16:24:16 - ERROR - stderr - 22%|██▏ | 4886/22434 [6:16:36<13:20:17, 2.74s/it] +2025-02-05 16:24:18 - ERROR - stderr - 22%|██▏ | 4887/22434 [6:16:38<13:00:23, 2.67s/it] +2025-02-05 16:24:19 - ERROR - stderr - +2025-02-05 16:24:19 - ERROR - stderr - +2025-02-05 16:24:19 - INFO - stdout - {'loss': 0.8135, 'grad_norm': 1.0465039014816284, 'learning_rate': 1.8206490216467464e-05, 'epoch': 0.65} +2025-02-05 16:24:19 - ERROR - stderr - 22%|██▏ | 4887/22434 [6:16:38<13:00:23, 2.67s/it] +2025-02-05 16:24:21 - ERROR - stderr - 22%|██▏ | 4888/22434 [6:16:41<12:45:31, 2.62s/it] +2025-02-05 16:24:21 - ERROR - stderr - +2025-02-05 16:24:21 - ERROR - stderr - +2025-02-05 16:24:21 - INFO - stdout - {'loss': 0.8827, 'grad_norm': 0.9878882765769958, 'learning_rate': 1.8205665127096855e-05, 'epoch': 0.65} +2025-02-05 16:24:21 - ERROR - stderr - 22%|██▏ | 4888/22434 [6:16:41<12:45:31, 2.62s/it] +2025-02-05 16:24:23 - ERROR - stderr - 22%|██▏ | 4889/22434 [6:16:43<12:28:44, 2.56s/it] +2025-02-05 16:24:23 - ERROR - stderr - +2025-02-05 16:24:23 - ERROR - stderr - +2025-02-05 16:24:23 - INFO - stdout - {'loss': 1.0562, 'grad_norm': 1.1272354125976562, 'learning_rate': 1.8204839866687014e-05, 'epoch': 0.65} +2025-02-05 16:24:23 - ERROR - stderr - 22%|██▏ | 4889/22434 [6:16:43<12:28:44, 2.56s/it] +2025-02-05 16:24:26 - ERROR - stderr - 22%|██▏ | 4890/22434 [6:16:46<12:24:12, 2.55s/it] +2025-02-05 16:24:26 - ERROR - stderr - +2025-02-05 16:24:26 - ERROR - stderr - +2025-02-05 16:24:26 - INFO - stdout - {'loss': 0.8929, 'grad_norm': 1.004042387008667, 'learning_rate': 1.8204014435255136e-05, 'epoch': 0.65} +2025-02-05 16:24:26 - ERROR - stderr - 22%|██▏ | 4890/22434 [6:16:46<12:24:12, 2.55s/it] +2025-02-05 16:24:29 - ERROR - stderr - 22%|██▏ | 4891/22434 [6:16:48<12:31:59, 2.57s/it] +2025-02-05 16:24:29 - ERROR - stderr - +2025-02-05 16:24:29 - ERROR - stderr - +2025-02-05 16:24:29 - INFO - stdout - 
{'loss': 0.9727, 'grad_norm': 0.9849901795387268, 'learning_rate': 1.820318883281843e-05, 'epoch': 0.65} +2025-02-05 16:24:29 - ERROR - stderr - 22%|██▏ | 4891/22434 [6:16:48<12:31:59, 2.57s/it] +2025-02-05 16:24:31 - ERROR - stderr - 22%|██▏ | 4892/22434 [6:16:51<12:27:46, 2.56s/it] +2025-02-05 16:24:31 - ERROR - stderr - +2025-02-05 16:24:31 - ERROR - stderr - +2025-02-05 16:24:31 - INFO - stdout - {'loss': 0.9918, 'grad_norm': 1.2953550815582275, 'learning_rate': 1.82023630593941e-05, 'epoch': 0.65} +2025-02-05 16:24:31 - ERROR - stderr - 22%|██▏ | 4892/22434 [6:16:51<12:27:46, 2.56s/it] +2025-02-05 16:24:34 - ERROR - stderr - 22%|██▏ | 4893/22434 [6:16:53<12:21:56, 2.54s/it] +2025-02-05 16:24:34 - ERROR - stderr - +2025-02-05 16:24:34 - ERROR - stderr - +2025-02-05 16:24:34 - INFO - stdout - {'loss': 0.9013, 'grad_norm': 1.1145843267440796, 'learning_rate': 1.820153711499936e-05, 'epoch': 0.65} +2025-02-05 16:24:34 - ERROR - stderr - 22%|██▏ | 4893/22434 [6:16:53<12:21:56, 2.54s/it] +2025-02-05 16:24:36 - ERROR - stderr - 22%|██▏ | 4894/22434 [6:16:56<12:24:10, 2.55s/it] +2025-02-05 16:24:36 - ERROR - stderr - +2025-02-05 16:24:36 - ERROR - stderr - +2025-02-05 16:24:36 - INFO - stdout - {'loss': 1.0358, 'grad_norm': 1.1399295330047607, 'learning_rate': 1.820071099965143e-05, 'epoch': 0.65} +2025-02-05 16:24:36 - ERROR - stderr - 22%|██▏ | 4894/22434 [6:16:56<12:24:10, 2.55s/it] +2025-02-05 16:24:39 - ERROR - stderr - 22%|██▏ | 4895/22434 [6:16:58<12:24:27, 2.55s/it] +2025-02-05 16:24:39 - ERROR - stderr - +2025-02-05 16:24:39 - ERROR - stderr - +2025-02-05 16:24:39 - INFO - stdout - {'loss': 1.0702, 'grad_norm': 1.1147061586380005, 'learning_rate': 1.8199884713367524e-05, 'epoch': 0.65} +2025-02-05 16:24:39 - ERROR - stderr - 22%|██▏ | 4895/22434 [6:16:59<12:24:27, 2.55s/it] +2025-02-05 16:24:41 - ERROR - stderr - 22%|██▏ | 4896/22434 [6:17:01<12:26:01, 2.55s/it] +2025-02-05 16:24:41 - ERROR - stderr - +2025-02-05 16:24:41 - ERROR - stderr - +2025-02-05 
16:24:41 - INFO - stdout - {'loss': 0.9767, 'grad_norm': 1.0737214088439941, 'learning_rate': 1.8199058256164866e-05, 'epoch': 0.65} +2025-02-05 16:24:41 - ERROR - stderr - 22%|██▏ | 4896/22434 [6:17:01<12:26:01, 2.55s/it] +2025-02-05 16:24:44 - ERROR - stderr - 22%|██▏ | 4897/22434 [6:17:03<12:19:21, 2.53s/it] +2025-02-05 16:24:44 - ERROR - stderr - +2025-02-05 16:24:44 - ERROR - stderr - +2025-02-05 16:24:44 - INFO - stdout - {'loss': 0.9766, 'grad_norm': 1.1452678442001343, 'learning_rate': 1.8198231628060686e-05, 'epoch': 0.65} +2025-02-05 16:24:44 - ERROR - stderr - 22%|██▏ | 4897/22434 [6:17:04<12:19:21, 2.53s/it] +2025-02-05 16:24:46 - ERROR - stderr - 22%|██▏ | 4898/22434 [6:17:06<12:16:56, 2.52s/it] +2025-02-05 16:24:46 - ERROR - stderr - +2025-02-05 16:24:46 - ERROR - stderr - +2025-02-05 16:24:46 - INFO - stdout - {'loss': 1.0399, 'grad_norm': 1.0882238149642944, 'learning_rate': 1.8197404829072214e-05, 'epoch': 0.65} +2025-02-05 16:24:46 - ERROR - stderr - 22%|██▏ | 4898/22434 [6:17:06<12:16:56, 2.52s/it] +2025-02-05 16:24:49 - ERROR - stderr - 22%|██▏ | 4899/22434 [6:17:09<12:22:33, 2.54s/it] +2025-02-05 16:24:49 - ERROR - stderr - +2025-02-05 16:24:49 - ERROR - stderr - +2025-02-05 16:24:49 - INFO - stdout - {'loss': 0.9628, 'grad_norm': 1.237720251083374, 'learning_rate': 1.819657785921668e-05, 'epoch': 0.66} +2025-02-05 16:24:49 - ERROR - stderr - 22%|██▏ | 4899/22434 [6:17:09<12:22:33, 2.54s/it] +2025-02-05 16:24:51 - ERROR - stderr - 22%|██▏ | 4900/22434 [6:17:11<12:19:30, 2.53s/it] +2025-02-05 16:24:51 - ERROR - stderr - +2025-02-05 16:24:51 - ERROR - stderr - +2025-02-05 16:24:51 - INFO - stdout - {'loss': 0.9597, 'grad_norm': 1.051042914390564, 'learning_rate': 1.8195750718511326e-05, 'epoch': 0.66} +2025-02-05 16:24:51 - ERROR - stderr - 22%|██▏ | 4900/22434 [6:17:11<12:19:30, 2.53s/it] +2025-02-05 16:24:54 - ERROR - stderr - 22%|██▏ | 4901/22434 [6:17:14<12:24:14, 2.55s/it] +2025-02-05 16:24:54 - ERROR - stderr - +2025-02-05 16:24:54 - ERROR 
- stderr - +2025-02-05 16:24:54 - INFO - stdout - {'loss': 0.9633, 'grad_norm': 1.1524134874343872, 'learning_rate': 1.819492340697339e-05, 'epoch': 0.66} +2025-02-05 16:24:54 - ERROR - stderr - 22%|██▏ | 4901/22434 [6:17:14<12:24:14, 2.55s/it] +2025-02-05 16:24:56 - ERROR - stderr - 22%|██▏ | 4902/22434 [6:17:16<12:25:53, 2.55s/it] +2025-02-05 16:24:57 - ERROR - stderr - +2025-02-05 16:24:57 - ERROR - stderr - +2025-02-05 16:24:57 - INFO - stdout - {'loss': 0.8818, 'grad_norm': 1.1068754196166992, 'learning_rate': 1.8194095924620114e-05, 'epoch': 0.66} +2025-02-05 16:24:57 - ERROR - stderr - 22%|██▏ | 4902/22434 [6:17:16<12:25:53, 2.55s/it] +2025-02-05 16:24:59 - ERROR - stderr - 22%|██▏ | 4903/22434 [6:17:19<12:25:18, 2.55s/it] +2025-02-05 16:24:59 - ERROR - stderr - +2025-02-05 16:24:59 - ERROR - stderr - +2025-02-05 16:24:59 - INFO - stdout - {'loss': 1.0163, 'grad_norm': 1.1498146057128906, 'learning_rate': 1.8193268271468754e-05, 'epoch': 0.66} +2025-02-05 16:24:59 - ERROR - stderr - 22%|██▏ | 4903/22434 [6:17:19<12:25:18, 2.55s/it] +2025-02-05 16:25:02 - ERROR - stderr - 22%|██▏ | 4904/22434 [6:17:21<12:26:20, 2.55s/it] +2025-02-05 16:25:02 - ERROR - stderr - +2025-02-05 16:25:02 - ERROR - stderr - +2025-02-05 16:25:02 - INFO - stdout - {'loss': 1.0938, 'grad_norm': 1.1875187158584595, 'learning_rate': 1.8192440447536554e-05, 'epoch': 0.66} +2025-02-05 16:25:02 - ERROR - stderr - 22%|██▏ | 4904/22434 [6:17:21<12:26:20, 2.55s/it] +2025-02-05 16:25:04 - ERROR - stderr - 22%|██▏ | 4905/22434 [6:17:24<12:22:04, 2.54s/it] +2025-02-05 16:25:04 - ERROR - stderr - +2025-02-05 16:25:04 - ERROR - stderr - +2025-02-05 16:25:04 - INFO - stdout - {'loss': 0.8908, 'grad_norm': 0.9934622645378113, 'learning_rate': 1.8191612452840775e-05, 'epoch': 0.66} +2025-02-05 16:25:04 - ERROR - stderr - 22%|██▏ | 4905/22434 [6:17:24<12:22:04, 2.54s/it] +2025-02-05 16:25:07 - ERROR - stderr - 22%|██▏ | 4906/22434 [6:17:26<12:25:07, 2.55s/it] +2025-02-05 16:25:07 - ERROR - stderr - 
+2025-02-05 16:25:07 - ERROR - stderr - +2025-02-05 16:25:07 - INFO - stdout - {'loss': 0.9663, 'grad_norm': 1.1322556734085083, 'learning_rate': 1.819078428739867e-05, 'epoch': 0.66} +2025-02-05 16:25:07 - ERROR - stderr - 22%|██▏ | 4906/22434 [6:17:26<12:25:07, 2.55s/it] +2025-02-05 16:25:09 - ERROR - stderr - 22%|██▏ | 4907/22434 [6:17:29<12:34:40, 2.58s/it] +2025-02-05 16:25:09 - ERROR - stderr - +2025-02-05 16:25:09 - ERROR - stderr - +2025-02-05 16:25:09 - INFO - stdout - {'loss': 0.8333, 'grad_norm': 1.1673023700714111, 'learning_rate': 1.8189955951227504e-05, 'epoch': 0.66} +2025-02-05 16:25:09 - ERROR - stderr - 22%|██▏ | 4907/22434 [6:17:29<12:34:40, 2.58s/it] +2025-02-05 16:25:12 - ERROR - stderr - 22%|██▏ | 4908/22434 [6:17:32<12:35:29, 2.59s/it] +2025-02-05 16:25:12 - ERROR - stderr - +2025-02-05 16:25:12 - ERROR - stderr - +2025-02-05 16:25:12 - INFO - stdout - {'loss': 1.0686, 'grad_norm': 1.0496773719787598, 'learning_rate': 1.818912744434455e-05, 'epoch': 0.66} +2025-02-05 16:25:12 - ERROR - stderr - 22%|██▏ | 4908/22434 [6:17:32<12:35:29, 2.59s/it] +2025-02-05 16:25:15 - ERROR - stderr - 22%|██▏ | 4909/22434 [6:17:34<12:36:12, 2.59s/it] +2025-02-05 16:25:15 - ERROR - stderr - +2025-02-05 16:25:15 - ERROR - stderr - +2025-02-05 16:25:15 - INFO - stdout - {'loss': 0.8953, 'grad_norm': 0.9572871327400208, 'learning_rate': 1.818829876676706e-05, 'epoch': 0.66} +2025-02-05 16:25:15 - ERROR - stderr - 22%|██▏ | 4909/22434 [6:17:34<12:36:12, 2.59s/it] +2025-02-05 16:25:17 - ERROR - stderr - 22%|██▏ | 4910/22434 [6:17:37<12:25:17, 2.55s/it] +2025-02-05 16:25:17 - ERROR - stderr - +2025-02-05 16:25:17 - ERROR - stderr - +2025-02-05 16:25:17 - INFO - stdout - {'loss': 0.874, 'grad_norm': 1.0872960090637207, 'learning_rate': 1.8187469918512323e-05, 'epoch': 0.66} +2025-02-05 16:25:17 - ERROR - stderr - 22%|██▏ | 4910/22434 [6:17:37<12:25:17, 2.55s/it] +2025-02-05 16:25:19 - ERROR - stderr - 22%|██▏ | 4911/22434 [6:17:39<12:17:27, 2.53s/it] +2025-02-05 
16:25:19 - ERROR - stderr - +2025-02-05 16:25:19 - ERROR - stderr - +2025-02-05 16:25:19 - INFO - stdout - {'loss': 0.8465, 'grad_norm': 1.0465223789215088, 'learning_rate': 1.8186640899597612e-05, 'epoch': 0.66} +2025-02-05 16:25:19 - ERROR - stderr - 22%|██▏ | 4911/22434 [6:17:39<12:17:27, 2.53s/it] +2025-02-05 16:25:22 - ERROR - stderr - 22%|██▏ | 4912/22434 [6:17:42<12:34:48, 2.58s/it] +2025-02-05 16:25:22 - ERROR - stderr - +2025-02-05 16:25:22 - ERROR - stderr - +2025-02-05 16:25:22 - INFO - stdout - {'loss': 1.0422, 'grad_norm': 1.1264820098876953, 'learning_rate': 1.8185811710040203e-05, 'epoch': 0.66} +2025-02-05 16:25:22 - ERROR - stderr - 22%|██▏ | 4912/22434 [6:17:42<12:34:48, 2.58s/it] +2025-02-05 16:25:25 - ERROR - stderr - 22%|██▏ | 4913/22434 [6:17:44<12:31:44, 2.57s/it] +2025-02-05 16:25:25 - ERROR - stderr - +2025-02-05 16:25:25 - ERROR - stderr - +2025-02-05 16:25:25 - INFO - stdout - {'loss': 0.9631, 'grad_norm': 1.042545199394226, 'learning_rate': 1.8184982349857384e-05, 'epoch': 0.66} +2025-02-05 16:25:25 - ERROR - stderr - 22%|██▏ | 4913/22434 [6:17:45<12:31:44, 2.57s/it] +2025-02-05 16:25:27 - ERROR - stderr - 22%|██▏ | 4914/22434 [6:17:47<12:19:24, 2.53s/it] +2025-02-05 16:25:27 - ERROR - stderr - +2025-02-05 16:25:27 - ERROR - stderr - +2025-02-05 16:25:27 - INFO - stdout - {'loss': 0.9864, 'grad_norm': 1.063456416130066, 'learning_rate': 1.8184152819066437e-05, 'epoch': 0.66} +2025-02-05 16:25:27 - ERROR - stderr - 22%|██▏ | 4914/22434 [6:17:47<12:19:24, 2.53s/it] +2025-02-05 16:25:30 - ERROR - stderr - 22%|██▏ | 4915/22434 [6:17:49<12:14:17, 2.51s/it] +2025-02-05 16:25:30 - ERROR - stderr - +2025-02-05 16:25:30 - ERROR - stderr - +2025-02-05 16:25:30 - INFO - stdout - {'loss': 0.7838, 'grad_norm': 1.0736908912658691, 'learning_rate': 1.8183323117684656e-05, 'epoch': 0.66} +2025-02-05 16:25:30 - ERROR - stderr - 22%|██▏ | 4915/22434 [6:17:49<12:14:17, 2.51s/it] +2025-02-05 16:25:32 - ERROR - stderr - 22%|██▏ | 4916/22434 
[6:17:52<12:06:30, 2.49s/it] +2025-02-05 16:25:32 - ERROR - stderr - +2025-02-05 16:25:32 - ERROR - stderr - +2025-02-05 16:25:32 - INFO - stdout - {'loss': 0.9997, 'grad_norm': 1.1113524436950684, 'learning_rate': 1.818249324572934e-05, 'epoch': 0.66} +2025-02-05 16:25:32 - ERROR - stderr - 22%|██▏ | 4916/22434 [6:17:52<12:06:30, 2.49s/it] +2025-02-05 16:25:35 - ERROR - stderr - 22%|██▏ | 4917/22434 [6:17:54<12:09:56, 2.50s/it] +2025-02-05 16:25:35 - ERROR - stderr - +2025-02-05 16:25:35 - ERROR - stderr - +2025-02-05 16:25:35 - INFO - stdout - {'loss': 0.9389, 'grad_norm': 1.0285409688949585, 'learning_rate': 1.8181663203217774e-05, 'epoch': 0.66} +2025-02-05 16:25:35 - ERROR - stderr - 22%|██▏ | 4917/22434 [6:17:54<12:09:56, 2.50s/it] +2025-02-05 16:25:37 - ERROR - stderr - 22%|██▏ | 4918/22434 [6:17:57<12:08:33, 2.50s/it] +2025-02-05 16:25:37 - ERROR - stderr - +2025-02-05 16:25:37 - ERROR - stderr - +2025-02-05 16:25:37 - INFO - stdout - {'loss': 0.9968, 'grad_norm': 1.1099438667297363, 'learning_rate': 1.8180832990167273e-05, 'epoch': 0.66} +2025-02-05 16:25:37 - ERROR - stderr - 22%|██▏ | 4918/22434 [6:17:57<12:08:33, 2.50s/it] +2025-02-05 16:25:40 - ERROR - stderr - 22%|██▏ | 4919/22434 [6:17:59<12:06:16, 2.49s/it] +2025-02-05 16:25:40 - ERROR - stderr - +2025-02-05 16:25:40 - ERROR - stderr - +2025-02-05 16:25:40 - INFO - stdout - {'loss': 1.0279, 'grad_norm': 0.9810138940811157, 'learning_rate': 1.8180002606595135e-05, 'epoch': 0.66} +2025-02-05 16:25:40 - ERROR - stderr - 22%|██▏ | 4919/22434 [6:17:59<12:06:16, 2.49s/it] +2025-02-05 16:25:42 - ERROR - stderr - 22%|██▏ | 4920/22434 [6:18:02<12:05:04, 2.48s/it] +2025-02-05 16:25:42 - ERROR - stderr - +2025-02-05 16:25:42 - ERROR - stderr - +2025-02-05 16:25:42 - INFO - stdout - {'loss': 1.0663, 'grad_norm': 0.9956666827201843, 'learning_rate': 1.817917205251867e-05, 'epoch': 0.66} +2025-02-05 16:25:42 - ERROR - stderr - 22%|██▏ | 4920/22434 [6:18:02<12:05:04, 2.48s/it] +2025-02-05 16:25:45 - ERROR - stderr 
- 22%|██▏ | 4921/22434 [6:18:04<12:07:42, 2.49s/it] +2025-02-05 16:25:45 - ERROR - stderr - +2025-02-05 16:25:45 - ERROR - stderr - +2025-02-05 16:25:45 - INFO - stdout - {'loss': 0.9678, 'grad_norm': 1.0833066701889038, 'learning_rate': 1.8178341327955193e-05, 'epoch': 0.66} +2025-02-05 16:25:45 - ERROR - stderr - 22%|██▏ | 4921/22434 [6:18:04<12:07:42, 2.49s/it] +2025-02-05 16:25:47 - ERROR - stderr - 22%|██▏ | 4922/22434 [6:18:07<12:06:49, 2.49s/it] +2025-02-05 16:25:47 - ERROR - stderr - +2025-02-05 16:25:47 - ERROR - stderr - +2025-02-05 16:25:47 - INFO - stdout - {'loss': 1.0245, 'grad_norm': 1.0350220203399658, 'learning_rate': 1.8177510432922013e-05, 'epoch': 0.66} +2025-02-05 16:25:47 - ERROR - stderr - 22%|██▏ | 4922/22434 [6:18:07<12:06:49, 2.49s/it] +2025-02-05 16:25:50 - ERROR - stderr - 22%|██▏ | 4923/22434 [6:18:09<12:08:11, 2.50s/it] +2025-02-05 16:25:50 - ERROR - stderr - +2025-02-05 16:25:50 - ERROR - stderr - +2025-02-05 16:25:50 - INFO - stdout - {'loss': 1.0295, 'grad_norm': 1.1310279369354248, 'learning_rate': 1.8176679367436453e-05, 'epoch': 0.66} +2025-02-05 16:25:50 - ERROR - stderr - 22%|██▏ | 4923/22434 [6:18:09<12:08:11, 2.50s/it] +2025-02-05 16:25:52 - ERROR - stderr - 22%|██▏ | 4924/22434 [6:18:12<12:05:54, 2.49s/it] +2025-02-05 16:25:52 - ERROR - stderr - +2025-02-05 16:25:52 - ERROR - stderr - +2025-02-05 16:25:52 - INFO - stdout - {'loss': 0.8932, 'grad_norm': 0.9682749509811401, 'learning_rate': 1.817584813151584e-05, 'epoch': 0.66} +2025-02-05 16:25:52 - ERROR - stderr - 22%|██▏ | 4924/22434 [6:18:12<12:05:54, 2.49s/it] +2025-02-05 16:25:54 - ERROR - stderr - 22%|██▏ | 4925/22434 [6:18:14<12:03:30, 2.48s/it] +2025-02-05 16:25:54 - ERROR - stderr - +2025-02-05 16:25:54 - ERROR - stderr - +2025-02-05 16:25:54 - INFO - stdout - {'loss': 0.9556, 'grad_norm': 1.152813196182251, 'learning_rate': 1.817501672517749e-05, 'epoch': 0.66} +2025-02-05 16:25:54 - ERROR - stderr - 22%|██▏ | 4925/22434 [6:18:14<12:03:30, 2.48s/it] +2025-02-05 
16:25:57 - ERROR - stderr - 22%|██▏ | 4926/22434 [6:18:17<12:02:23, 2.48s/it] +2025-02-05 16:25:57 - ERROR - stderr - +2025-02-05 16:25:57 - ERROR - stderr - +2025-02-05 16:25:57 - INFO - stdout - {'loss': 0.9174, 'grad_norm': 1.0485787391662598, 'learning_rate': 1.8174185148438745e-05, 'epoch': 0.66} +2025-02-05 16:25:57 - ERROR - stderr - 22%|██▏ | 4926/22434 [6:18:17<12:02:23, 2.48s/it] +2025-02-05 16:25:59 - ERROR - stderr - 22%|██▏ | 4927/22434 [6:18:19<12:01:38, 2.47s/it] +2025-02-05 16:25:59 - ERROR - stderr - +2025-02-05 16:25:59 - ERROR - stderr - +2025-02-05 16:25:59 - INFO - stdout - {'loss': 0.9915, 'grad_norm': 1.0092227458953857, 'learning_rate': 1.817335340131693e-05, 'epoch': 0.66} +2025-02-05 16:25:59 - ERROR - stderr - 22%|██▏ | 4927/22434 [6:18:19<12:01:38, 2.47s/it] +2025-02-05 16:26:02 - ERROR - stderr - 22%|██▏ | 4928/22434 [6:18:22<12:00:30, 2.47s/it] +2025-02-05 16:26:02 - ERROR - stderr - +2025-02-05 16:26:02 - ERROR - stderr - +2025-02-05 16:26:02 - INFO - stdout - {'loss': 0.9766, 'grad_norm': 1.175471544265747, 'learning_rate': 1.8172521483829384e-05, 'epoch': 0.66} +2025-02-05 16:26:02 - ERROR - stderr - 22%|██▏ | 4928/22434 [6:18:22<12:00:30, 2.47s/it] +2025-02-05 16:26:04 - ERROR - stderr - 22%|██▏ | 4929/22434 [6:18:24<11:58:52, 2.46s/it] +2025-02-05 16:26:04 - ERROR - stderr - +2025-02-05 16:26:04 - ERROR - stderr - +2025-02-05 16:26:04 - INFO - stdout - {'loss': 0.9493, 'grad_norm': 1.0688331127166748, 'learning_rate': 1.8171689395993447e-05, 'epoch': 0.66} +2025-02-05 16:26:04 - ERROR - stderr - 22%|██▏ | 4929/22434 [6:18:24<11:58:52, 2.46s/it] +2025-02-05 16:26:07 - ERROR - stderr - 22%|██▏ | 4930/22434 [6:18:27<12:04:17, 2.48s/it] +2025-02-05 16:26:07 - ERROR - stderr - +2025-02-05 16:26:07 - ERROR - stderr - +2025-02-05 16:26:07 - INFO - stdout - {'loss': 0.8672, 'grad_norm': 0.9807957410812378, 'learning_rate': 1.8170857137826465e-05, 'epoch': 0.66} +2025-02-05 16:26:07 - ERROR - stderr - 22%|██▏ | 4930/22434 
[6:18:27<12:04:17, 2.48s/it] +2025-02-05 16:26:09 - ERROR - stderr - 22%|██▏ | 4931/22434 [6:18:29<12:06:06, 2.49s/it] +2025-02-05 16:26:09 - ERROR - stderr - +2025-02-05 16:26:09 - ERROR - stderr - +2025-02-05 16:26:09 - INFO - stdout - {'loss': 1.0332, 'grad_norm': 1.101035714149475, 'learning_rate': 1.8170024709345786e-05, 'epoch': 0.66} +2025-02-05 16:26:09 - ERROR - stderr - 22%|██▏ | 4931/22434 [6:18:29<12:06:06, 2.49s/it] +2025-02-05 16:26:12 - ERROR - stderr - 22%|██▏ | 4932/22434 [6:18:32<12:01:39, 2.47s/it] +2025-02-05 16:26:12 - ERROR - stderr - +2025-02-05 16:26:12 - ERROR - stderr - +2025-02-05 16:26:12 - INFO - stdout - {'loss': 1.0438, 'grad_norm': 1.2423990964889526, 'learning_rate': 1.816919211056876e-05, 'epoch': 0.66} +2025-02-05 16:26:12 - ERROR - stderr - 22%|██▏ | 4932/22434 [6:18:32<12:01:39, 2.47s/it] +2025-02-05 16:26:14 - ERROR - stderr - 22%|██▏ | 4933/22434 [6:18:34<11:59:58, 2.47s/it] +2025-02-05 16:26:14 - ERROR - stderr - +2025-02-05 16:26:14 - ERROR - stderr - +2025-02-05 16:26:14 - INFO - stdout - {'loss': 0.9625, 'grad_norm': 1.0998975038528442, 'learning_rate': 1.816835934151274e-05, 'epoch': 0.66} +2025-02-05 16:26:14 - ERROR - stderr - 22%|██▏ | 4933/22434 [6:18:34<11:59:58, 2.47s/it] +2025-02-05 16:26:17 - ERROR - stderr - 22%|██▏ | 4934/22434 [6:18:36<11:55:46, 2.45s/it] +2025-02-05 16:26:17 - ERROR - stderr - +2025-02-05 16:26:17 - ERROR - stderr - +2025-02-05 16:26:17 - INFO - stdout - {'loss': 0.9311, 'grad_norm': 1.059422254562378, 'learning_rate': 1.8167526402195085e-05, 'epoch': 0.66} +2025-02-05 16:26:17 - ERROR - stderr - 22%|██▏ | 4934/22434 [6:18:36<11:55:46, 2.45s/it] +2025-02-05 16:26:19 - ERROR - stderr - 22%|██▏ | 4935/22434 [6:18:39<12:01:27, 2.47s/it] +2025-02-05 16:26:19 - ERROR - stderr - +2025-02-05 16:26:19 - ERROR - stderr - +2025-02-05 16:26:19 - INFO - stdout - {'loss': 0.9523, 'grad_norm': 0.9626438617706299, 'learning_rate': 1.816669329263316e-05, 'epoch': 0.66} +2025-02-05 16:26:19 - ERROR - stderr - 
22%|██▏ | 4935/22434 [6:18:39<12:01:27, 2.47s/it] +2025-02-05 16:26:22 - ERROR - stderr - 22%|██▏ | 4936/22434 [6:18:41<12:06:27, 2.49s/it] +2025-02-05 16:26:22 - ERROR - stderr - +2025-02-05 16:26:22 - ERROR - stderr - +2025-02-05 16:26:22 - INFO - stdout - {'loss': 0.9433, 'grad_norm': 1.1004456281661987, 'learning_rate': 1.8165860012844325e-05, 'epoch': 0.66} +2025-02-05 16:26:22 - ERROR - stderr - 22%|██▏ | 4936/22434 [6:18:42<12:06:27, 2.49s/it] +2025-02-05 16:26:24 - ERROR - stderr - 22%|██▏ | 4937/22434 [6:18:44<12:00:22, 2.47s/it] +2025-02-05 16:26:24 - ERROR - stderr - +2025-02-05 16:26:24 - ERROR - stderr - +2025-02-05 16:26:24 - INFO - stdout - {'loss': 0.998, 'grad_norm': 1.078370451927185, 'learning_rate': 1.8165026562845954e-05, 'epoch': 0.66} +2025-02-05 16:26:24 - ERROR - stderr - 22%|██▏ | 4937/22434 [6:18:44<12:00:22, 2.47s/it] +2025-02-05 16:26:27 - ERROR - stderr - 22%|██▏ | 4938/22434 [6:18:46<11:58:40, 2.46s/it] +2025-02-05 16:26:27 - ERROR - stderr - +2025-02-05 16:26:27 - ERROR - stderr - +2025-02-05 16:26:27 - INFO - stdout - {'loss': 0.9913, 'grad_norm': 1.0814099311828613, 'learning_rate': 1.8164192942655418e-05, 'epoch': 0.66} +2025-02-05 16:26:27 - ERROR - stderr - 22%|██▏ | 4938/22434 [6:18:46<11:58:40, 2.46s/it] +2025-02-05 16:26:29 - ERROR - stderr - 22%|██▏ | 4939/22434 [6:18:49<11:59:29, 2.47s/it] +2025-02-05 16:26:29 - ERROR - stderr - +2025-02-05 16:26:29 - ERROR - stderr - +2025-02-05 16:26:29 - INFO - stdout - {'loss': 0.9861, 'grad_norm': 1.044791579246521, 'learning_rate': 1.816335915229009e-05, 'epoch': 0.66} +2025-02-05 16:26:29 - ERROR - stderr - 22%|██▏ | 4939/22434 [6:18:49<11:59:29, 2.47s/it] +2025-02-05 16:26:31 - ERROR - stderr - 22%|██▏ | 4940/22434 [6:18:51<11:59:26, 2.47s/it] +2025-02-05 16:26:32 - ERROR - stderr - +2025-02-05 16:26:32 - ERROR - stderr - +2025-02-05 16:26:32 - INFO - stdout - {'loss': 0.945, 'grad_norm': 1.0157090425491333, 'learning_rate': 1.8162525191767354e-05, 'epoch': 0.66} +2025-02-05 
16:26:32 - ERROR - stderr - 22%|██▏ | 4940/22434 [6:18:51<11:59:26, 2.47s/it] +2025-02-05 16:26:34 - ERROR - stderr - 22%|██▏ | 4941/22434 [6:18:54<11:58:23, 2.46s/it] +2025-02-05 16:26:34 - ERROR - stderr - +2025-02-05 16:26:34 - ERROR - stderr - +2025-02-05 16:26:34 - INFO - stdout - {'loss': 0.9928, 'grad_norm': 1.212355613708496, 'learning_rate': 1.816169106110459e-05, 'epoch': 0.66} +2025-02-05 16:26:34 - ERROR - stderr - 22%|██▏ | 4941/22434 [6:18:54<11:58:23, 2.46s/it] +2025-02-05 16:26:36 - ERROR - stderr - 22%|██▏ | 4942/22434 [6:18:56<12:04:40, 2.49s/it] +2025-02-05 16:26:37 - ERROR - stderr - +2025-02-05 16:26:37 - ERROR - stderr - +2025-02-05 16:26:37 - INFO - stdout - {'loss': 1.0804, 'grad_norm': 1.040511131286621, 'learning_rate': 1.8160856760319186e-05, 'epoch': 0.66} +2025-02-05 16:26:37 - ERROR - stderr - 22%|██▏ | 4942/22434 [6:18:56<12:04:40, 2.49s/it] +2025-02-05 16:26:39 - ERROR - stderr - 22%|██▏ | 4943/22434 [6:18:59<12:01:37, 2.48s/it] +2025-02-05 16:26:39 - ERROR - stderr - +2025-02-05 16:26:39 - ERROR - stderr - +2025-02-05 16:26:39 - INFO - stdout - {'loss': 0.9962, 'grad_norm': 1.191188097000122, 'learning_rate': 1.816002228942853e-05, 'epoch': 0.66} +2025-02-05 16:26:39 - ERROR - stderr - 22%|██▏ | 4943/22434 [6:18:59<12:01:37, 2.48s/it] +2025-02-05 16:26:41 - ERROR - stderr - 22%|██▏ | 4944/22434 [6:19:01<12:05:41, 2.49s/it] +2025-02-05 16:26:42 - ERROR - stderr - +2025-02-05 16:26:42 - ERROR - stderr - +2025-02-05 16:26:42 - INFO - stdout - {'loss': 1.2826, 'grad_norm': 1.2231699228286743, 'learning_rate': 1.815918764845002e-05, 'epoch': 0.66} +2025-02-05 16:26:42 - ERROR - stderr - 22%|██▏ | 4944/22434 [6:19:01<12:05:41, 2.49s/it] +2025-02-05 16:26:44 - ERROR - stderr - 22%|██▏ | 4945/22434 [6:19:04<12:13:14, 2.52s/it] +2025-02-05 16:26:44 - ERROR - stderr - +2025-02-05 16:26:44 - ERROR - stderr - +2025-02-05 16:26:44 - INFO - stdout - {'loss': 0.9311, 'grad_norm': 1.021012783050537, 'learning_rate': 1.8158352837401052e-05, 'epoch': 
0.66} +2025-02-05 16:26:44 - ERROR - stderr - 22%|██▏ | 4945/22434 [6:19:04<12:13:14, 2.52s/it] +2025-02-05 16:26:47 - ERROR - stderr - 22%|██▏ | 4946/22434 [6:19:06<12:16:15, 2.53s/it] +2025-02-05 16:26:47 - ERROR - stderr - +2025-02-05 16:26:47 - ERROR - stderr - +2025-02-05 16:26:47 - INFO - stdout - {'loss': 0.9375, 'grad_norm': 1.165655255317688, 'learning_rate': 1.8157517856299024e-05, 'epoch': 0.66} +2025-02-05 16:26:47 - ERROR - stderr - 22%|██▏ | 4946/22434 [6:19:06<12:16:15, 2.53s/it] +2025-02-05 16:26:49 - ERROR - stderr - 22%|██▏ | 4947/22434 [6:19:09<12:05:40, 2.49s/it] +2025-02-05 16:26:49 - ERROR - stderr - +2025-02-05 16:26:49 - ERROR - stderr - +2025-02-05 16:26:49 - INFO - stdout - {'loss': 0.9278, 'grad_norm': 1.1837654113769531, 'learning_rate': 1.815668270516134e-05, 'epoch': 0.66} +2025-02-05 16:26:49 - ERROR - stderr - 22%|██▏ | 4947/22434 [6:19:09<12:05:40, 2.49s/it] +2025-02-05 16:26:51 - ERROR - stderr - 22%|██▏ | 4948/22434 [6:19:11<12:01:27, 2.48s/it] +2025-02-05 16:26:51 - ERROR - stderr - +2025-02-05 16:26:51 - ERROR - stderr - +2025-02-05 16:26:51 - INFO - stdout - {'loss': 0.8607, 'grad_norm': 1.0211386680603027, 'learning_rate': 1.8155847384005417e-05, 'epoch': 0.66} +2025-02-05 16:26:51 - ERROR - stderr - 22%|██▏ | 4948/22434 [6:19:11<12:01:27, 2.48s/it] +2025-02-05 16:26:54 - ERROR - stderr - 22%|██▏ | 4949/22434 [6:19:14<11:59:36, 2.47s/it] +2025-02-05 16:26:54 - ERROR - stderr - +2025-02-05 16:26:54 - ERROR - stderr - +2025-02-05 16:26:54 - INFO - stdout - {'loss': 0.9783, 'grad_norm': 1.158022403717041, 'learning_rate': 1.8155011892848656e-05, 'epoch': 0.66} +2025-02-05 16:26:54 - ERROR - stderr - 22%|██▏ | 4949/22434 [6:19:14<11:59:36, 2.47s/it] +2025-02-05 16:26:56 - ERROR - stderr - 22%|██▏ | 4950/22434 [6:19:16<11:54:55, 2.45s/it] +2025-02-05 16:26:56 - ERROR - stderr - +2025-02-05 16:26:56 - ERROR - stderr - +2025-02-05 16:26:56 - INFO - stdout - {'loss': 0.9936, 'grad_norm': 1.0513328313827515, 'learning_rate': 
1.8154176231708472e-05, 'epoch': 0.66} +2025-02-05 16:26:56 - ERROR - stderr - 22%|██▏ | 4950/22434 [6:19:16<11:54:55, 2.45s/it] +2025-02-05 16:26:59 - ERROR - stderr - 22%|██▏ | 4951/22434 [6:19:19<11:54:21, 2.45s/it] +2025-02-05 16:26:59 - ERROR - stderr - +2025-02-05 16:26:59 - ERROR - stderr - +2025-02-05 16:26:59 - INFO - stdout - {'loss': 0.8866, 'grad_norm': 1.0957874059677124, 'learning_rate': 1.815334040060229e-05, 'epoch': 0.66} +2025-02-05 16:26:59 - ERROR - stderr - 22%|██▏ | 4951/22434 [6:19:19<11:54:21, 2.45s/it] +2025-02-05 16:27:01 - ERROR - stderr - 22%|██▏ | 4952/22434 [6:19:21<11:55:59, 2.46s/it] +2025-02-05 16:27:01 - ERROR - stderr - +2025-02-05 16:27:01 - ERROR - stderr - +2025-02-05 16:27:01 - INFO - stdout - {'loss': 1.0454, 'grad_norm': 1.163976788520813, 'learning_rate': 1.815250439954753e-05, 'epoch': 0.66} +2025-02-05 16:27:01 - ERROR - stderr - 22%|██▏ | 4952/22434 [6:19:21<11:55:59, 2.46s/it] +2025-02-05 16:27:04 - ERROR - stderr - 22%|██▏ | 4953/22434 [6:19:23<11:54:20, 2.45s/it] +2025-02-05 16:27:04 - ERROR - stderr - +2025-02-05 16:27:04 - ERROR - stderr - +2025-02-05 16:27:04 - INFO - stdout - {'loss': 0.8984, 'grad_norm': 0.9470677375793457, 'learning_rate': 1.8151668228561616e-05, 'epoch': 0.66} +2025-02-05 16:27:04 - ERROR - stderr - 22%|██▏ | 4953/22434 [6:19:23<11:54:20, 2.45s/it] +2025-02-05 16:27:06 - ERROR - stderr - 22%|██▏ | 4954/22434 [6:19:26<11:52:53, 2.45s/it] +2025-02-05 16:27:06 - ERROR - stderr - +2025-02-05 16:27:06 - ERROR - stderr - +2025-02-05 16:27:06 - INFO - stdout - {'loss': 0.9072, 'grad_norm': 1.0575108528137207, 'learning_rate': 1.815083188766198e-05, 'epoch': 0.66} +2025-02-05 16:27:06 - ERROR - stderr - 22%|██▏ | 4954/22434 [6:19:26<11:52:53, 2.45s/it] +2025-02-05 16:27:09 - ERROR - stderr - 22%|██▏ | 4955/22434 [6:19:28<11:54:33, 2.45s/it] +2025-02-05 16:27:09 - ERROR - stderr - +2025-02-05 16:27:09 - ERROR - stderr - +2025-02-05 16:27:09 - INFO - stdout - {'loss': 1.1552, 'grad_norm': 
1.243083119392395, 'learning_rate': 1.814999537686605e-05, 'epoch': 0.66} +2025-02-05 16:27:09 - ERROR - stderr - 22%|██▏ | 4955/22434 [6:19:28<11:54:33, 2.45s/it] +2025-02-05 16:27:11 - ERROR - stderr - 22%|██▏ | 4956/22434 [6:19:31<11:58:16, 2.47s/it] +2025-02-05 16:27:11 - ERROR - stderr - +2025-02-05 16:27:11 - ERROR - stderr - +2025-02-05 16:27:11 - INFO - stdout - {'loss': 0.8107, 'grad_norm': 1.0193355083465576, 'learning_rate': 1.8149158696191268e-05, 'epoch': 0.66} +2025-02-05 16:27:11 - ERROR - stderr - 22%|██▏ | 4956/22434 [6:19:31<11:58:16, 2.47s/it] +2025-02-05 16:27:14 - ERROR - stderr - 22%|██▏ | 4957/22434 [6:19:33<12:03:50, 2.48s/it] +2025-02-05 16:27:14 - ERROR - stderr - +2025-02-05 16:27:14 - ERROR - stderr - +2025-02-05 16:27:14 - INFO - stdout - {'loss': 0.8147, 'grad_norm': 0.9294533133506775, 'learning_rate': 1.8148321845655066e-05, 'epoch': 0.66} +2025-02-05 16:27:14 - ERROR - stderr - 22%|██▏ | 4957/22434 [6:19:33<12:03:50, 2.48s/it] +2025-02-05 16:27:16 - ERROR - stderr - 22%|██▏ | 4958/22434 [6:19:36<12:05:10, 2.49s/it] +2025-02-05 16:27:16 - ERROR - stderr - +2025-02-05 16:27:16 - ERROR - stderr - +2025-02-05 16:27:16 - INFO - stdout - {'loss': 0.8627, 'grad_norm': 0.9704387187957764, 'learning_rate': 1.8147484825274895e-05, 'epoch': 0.66} +2025-02-05 16:27:16 - ERROR - stderr - 22%|██▏ | 4958/22434 [6:19:36<12:05:10, 2.49s/it] +2025-02-05 16:27:19 - ERROR - stderr - 22%|██▏ | 4959/22434 [6:19:38<12:07:54, 2.50s/it] +2025-02-05 16:27:19 - ERROR - stderr - +2025-02-05 16:27:19 - ERROR - stderr - +2025-02-05 16:27:19 - INFO - stdout - {'loss': 0.8921, 'grad_norm': 1.010780930519104, 'learning_rate': 1.81466476350682e-05, 'epoch': 0.66} +2025-02-05 16:27:19 - ERROR - stderr - 22%|██▏ | 4959/22434 [6:19:38<12:07:54, 2.50s/it] +2025-02-05 16:27:21 - ERROR - stderr - 22%|██▏ | 4960/22434 [6:19:41<12:02:46, 2.48s/it] +2025-02-05 16:27:21 - ERROR - stderr - +2025-02-05 16:27:21 - ERROR - stderr - +2025-02-05 16:27:21 - INFO - stdout - {'loss': 
0.8169, 'grad_norm': 1.1641600131988525, 'learning_rate': 1.814581027505243e-05, 'epoch': 0.66} +2025-02-05 16:27:21 - ERROR - stderr - 22%|██▏ | 4960/22434 [6:19:41<12:02:46, 2.48s/it] +2025-02-05 16:27:24 - ERROR - stderr - 22%|██▏ | 4961/22434 [6:19:43<12:02:30, 2.48s/it] +2025-02-05 16:27:24 - ERROR - stderr - +2025-02-05 16:27:24 - ERROR - stderr - +2025-02-05 16:27:24 - INFO - stdout - {'loss': 0.8898, 'grad_norm': 1.101241946220398, 'learning_rate': 1.814497274524504e-05, 'epoch': 0.66} +2025-02-05 16:27:24 - ERROR - stderr - 22%|██▏ | 4961/22434 [6:19:43<12:02:30, 2.48s/it] +2025-02-05 16:27:26 - ERROR - stderr - 22%|██▏ | 4962/22434 [6:19:46<12:05:23, 2.49s/it] +2025-02-05 16:27:26 - ERROR - stderr - +2025-02-05 16:27:26 - ERROR - stderr - +2025-02-05 16:27:26 - INFO - stdout - {'loss': 0.9805, 'grad_norm': 1.1946091651916504, 'learning_rate': 1.8144135045663486e-05, 'epoch': 0.66} +2025-02-05 16:27:26 - ERROR - stderr - 22%|██▏ | 4962/22434 [6:19:46<12:05:23, 2.49s/it] +2025-02-05 16:27:29 - ERROR - stderr - 22%|██▏ | 4963/22434 [6:19:48<12:06:10, 2.49s/it] +2025-02-05 16:27:29 - ERROR - stderr - +2025-02-05 16:27:29 - ERROR - stderr - +2025-02-05 16:27:29 - INFO - stdout - {'loss': 0.9625, 'grad_norm': 1.1613874435424805, 'learning_rate': 1.814329717632523e-05, 'epoch': 0.66} +2025-02-05 16:27:29 - ERROR - stderr - 22%|██▏ | 4963/22434 [6:19:48<12:06:10, 2.49s/it] +2025-02-05 16:27:31 - ERROR - stderr - 22%|██▏ | 4964/22434 [6:19:51<12:00:31, 2.47s/it] +2025-02-05 16:27:31 - ERROR - stderr - +2025-02-05 16:27:31 - ERROR - stderr - +2025-02-05 16:27:31 - INFO - stdout - {'loss': 1.1162, 'grad_norm': 1.202282428741455, 'learning_rate': 1.814245913724774e-05, 'epoch': 0.66} +2025-02-05 16:27:31 - ERROR - stderr - 22%|██▏ | 4964/22434 [6:19:51<12:00:31, 2.47s/it] +2025-02-05 16:27:34 - ERROR - stderr - 22%|██▏ | 4965/22434 [6:19:53<12:08:00, 2.50s/it] +2025-02-05 16:27:34 - ERROR - stderr - +2025-02-05 16:27:34 - ERROR - stderr - +2025-02-05 16:27:34 - INFO 
- stdout - {'loss': 0.9634, 'grad_norm': 1.077477216720581, 'learning_rate': 1.8141620928448474e-05, 'epoch': 0.66} +2025-02-05 16:27:34 - ERROR - stderr - 22%|██▏ | 4965/22434 [6:19:53<12:08:00, 2.50s/it] +2025-02-05 16:27:36 - ERROR - stderr - 22%|██▏ | 4966/22434 [6:19:56<12:05:56, 2.49s/it] +2025-02-05 16:27:36 - ERROR - stderr - +2025-02-05 16:27:36 - ERROR - stderr - +2025-02-05 16:27:36 - INFO - stdout - {'loss': 0.977, 'grad_norm': 1.1463258266448975, 'learning_rate': 1.8140782549944915e-05, 'epoch': 0.66} +2025-02-05 16:27:36 - ERROR - stderr - 22%|██▏ | 4966/22434 [6:19:56<12:05:56, 2.49s/it] +2025-02-05 16:27:39 - ERROR - stderr - 22%|██▏ | 4967/22434 [6:19:59<12:54:11, 2.66s/it] +2025-02-05 16:27:39 - ERROR - stderr - +2025-02-05 16:27:39 - ERROR - stderr - +2025-02-05 16:27:39 - INFO - stdout - {'loss': 0.88, 'grad_norm': 0.9715328812599182, 'learning_rate': 1.8139944001754533e-05, 'epoch': 0.66} +2025-02-05 16:27:39 - ERROR - stderr - 22%|██▏ | 4967/22434 [6:19:59<12:54:11, 2.66s/it] +2025-02-05 16:27:42 - ERROR - stderr - 22%|██▏ | 4968/22434 [6:20:01<12:37:01, 2.60s/it] +2025-02-05 16:27:42 - ERROR - stderr - +2025-02-05 16:27:42 - ERROR - stderr - +2025-02-05 16:27:42 - INFO - stdout - {'loss': 0.9837, 'grad_norm': 1.2045345306396484, 'learning_rate': 1.813910528389481e-05, 'epoch': 0.66} +2025-02-05 16:27:42 - ERROR - stderr - 22%|██▏ | 4968/22434 [6:20:01<12:37:01, 2.60s/it] +2025-02-05 16:27:44 - ERROR - stderr - 22%|██▏ | 4969/22434 [6:20:04<12:24:43, 2.56s/it] +2025-02-05 16:27:44 - ERROR - stderr - +2025-02-05 16:27:44 - ERROR - stderr - +2025-02-05 16:27:44 - INFO - stdout - {'loss': 1.0145, 'grad_norm': 1.047640085220337, 'learning_rate': 1.8138266396383222e-05, 'epoch': 0.66} +2025-02-05 16:27:44 - ERROR - stderr - 22%|██▏ | 4969/22434 [6:20:04<12:24:43, 2.56s/it] +2025-02-05 16:27:47 - ERROR - stderr - 22%|██▏ | 4970/22434 [6:20:06<12:23:37, 2.55s/it] +2025-02-05 16:27:47 - ERROR - stderr - +2025-02-05 16:27:47 - ERROR - stderr - 
+2025-02-05 16:27:47 - INFO - stdout - {'loss': 0.8953, 'grad_norm': 1.0173547267913818, 'learning_rate': 1.813742733923726e-05, 'epoch': 0.66} +2025-02-05 16:27:47 - ERROR - stderr - 22%|██▏ | 4970/22434 [6:20:06<12:23:37, 2.55s/it] +2025-02-05 16:27:49 - ERROR - stderr - 22%|██▏ | 4971/22434 [6:20:09<12:17:06, 2.53s/it] +2025-02-05 16:27:49 - ERROR - stderr - +2025-02-05 16:27:49 - ERROR - stderr - +2025-02-05 16:27:49 - INFO - stdout - {'loss': 0.9198, 'grad_norm': 1.2930530309677124, 'learning_rate': 1.813658811247441e-05, 'epoch': 0.66} +2025-02-05 16:27:49 - ERROR - stderr - 22%|██▏ | 4971/22434 [6:20:09<12:17:06, 2.53s/it] +2025-02-05 16:27:52 - ERROR - stderr - 22%|██▏ | 4972/22434 [6:20:11<12:20:06, 2.54s/it] +2025-02-05 16:27:52 - ERROR - stderr - +2025-02-05 16:27:52 - ERROR - stderr - +2025-02-05 16:27:52 - INFO - stdout - {'loss': 0.9414, 'grad_norm': 1.1037321090698242, 'learning_rate': 1.8135748716112168e-05, 'epoch': 0.66} +2025-02-05 16:27:52 - ERROR - stderr - 22%|██▏ | 4972/22434 [6:20:11<12:20:06, 2.54s/it] +2025-02-05 16:27:54 - ERROR - stderr - 22%|██▏ | 4973/22434 [6:20:14<12:15:55, 2.53s/it] +2025-02-05 16:27:54 - ERROR - stderr - +2025-02-05 16:27:54 - ERROR - stderr - +2025-02-05 16:27:54 - INFO - stdout - {'loss': 0.9024, 'grad_norm': 1.1478307247161865, 'learning_rate': 1.8134909150168028e-05, 'epoch': 0.67} +2025-02-05 16:27:54 - ERROR - stderr - 22%|██▏ | 4973/22434 [6:20:14<12:15:55, 2.53s/it] +2025-02-05 16:27:57 - ERROR - stderr - 22%|██▏ | 4974/22434 [6:20:17<12:28:26, 2.57s/it] +2025-02-05 16:27:57 - ERROR - stderr - +2025-02-05 16:27:57 - ERROR - stderr - +2025-02-05 16:27:57 - INFO - stdout - {'loss': 0.8417, 'grad_norm': 1.0730478763580322, 'learning_rate': 1.8134069414659496e-05, 'epoch': 0.67} +2025-02-05 16:27:57 - ERROR - stderr - 22%|██▏ | 4974/22434 [6:20:17<12:28:26, 2.57s/it] +2025-02-05 16:27:59 - ERROR - stderr - 22%|██▏ | 4975/22434 [6:20:19<12:23:18, 2.55s/it] +2025-02-05 16:27:59 - ERROR - stderr - +2025-02-05 
16:27:59 - ERROR - stderr - +2025-02-05 16:27:59 - INFO - stdout - {'loss': 1.0292, 'grad_norm': 1.0726128816604614, 'learning_rate': 1.813322950960406e-05, 'epoch': 0.67} +2025-02-05 16:27:59 - ERROR - stderr - 22%|██▏ | 4975/22434 [6:20:19<12:23:18, 2.55s/it] +2025-02-05 16:28:02 - ERROR - stderr - 22%|██▏ | 4976/22434 [6:20:22<12:21:07, 2.55s/it] +2025-02-05 16:28:02 - ERROR - stderr - +2025-02-05 16:28:02 - ERROR - stderr - +2025-02-05 16:28:02 - INFO - stdout - {'loss': 0.9956, 'grad_norm': 1.0035371780395508, 'learning_rate': 1.8132389435019248e-05, 'epoch': 0.67} +2025-02-05 16:28:02 - ERROR - stderr - 22%|██▏ | 4976/22434 [6:20:22<12:21:07, 2.55s/it] +2025-02-05 16:28:04 - ERROR - stderr - 22%|██▏ | 4977/22434 [6:20:24<12:12:01, 2.52s/it] +2025-02-05 16:28:04 - ERROR - stderr - +2025-02-05 16:28:04 - ERROR - stderr - +2025-02-05 16:28:04 - INFO - stdout - {'loss': 0.8932, 'grad_norm': 1.1524064540863037, 'learning_rate': 1.8131549190922556e-05, 'epoch': 0.67} +2025-02-05 16:28:04 - ERROR - stderr - 22%|██▏ | 4977/22434 [6:20:24<12:12:01, 2.52s/it] +2025-02-05 16:28:07 - ERROR - stderr - 22%|██▏ | 4978/22434 [6:20:27<12:11:54, 2.52s/it] +2025-02-05 16:28:07 - ERROR - stderr - +2025-02-05 16:28:07 - ERROR - stderr - +2025-02-05 16:28:07 - INFO - stdout - {'loss': 0.9171, 'grad_norm': 1.0357332229614258, 'learning_rate': 1.81307087773315e-05, 'epoch': 0.67} +2025-02-05 16:28:07 - ERROR - stderr - 22%|██▏ | 4978/22434 [6:20:27<12:11:54, 2.52s/it] +2025-02-05 16:28:09 - ERROR - stderr - 22%|██▏ | 4979/22434 [6:20:29<12:20:31, 2.55s/it] +2025-02-05 16:28:09 - ERROR - stderr - +2025-02-05 16:28:09 - ERROR - stderr - +2025-02-05 16:28:09 - INFO - stdout - {'loss': 1.0408, 'grad_norm': 1.0936928987503052, 'learning_rate': 1.81298681942636e-05, 'epoch': 0.67} +2025-02-05 16:28:09 - ERROR - stderr - 22%|██▏ | 4979/22434 [6:20:29<12:20:31, 2.55s/it] +2025-02-05 16:28:12 - ERROR - stderr - 22%|██▏ | 4980/22434 [6:20:32<12:17:08, 2.53s/it] +2025-02-05 16:28:12 - ERROR - 
stderr - +2025-02-05 16:28:12 - ERROR - stderr - +2025-02-05 16:28:12 - INFO - stdout - {'loss': 0.9297, 'grad_norm': 1.0289288759231567, 'learning_rate': 1.8129027441736382e-05, 'epoch': 0.67} +2025-02-05 16:28:12 - ERROR - stderr - 22%|██▏ | 4980/22434 [6:20:32<12:17:08, 2.53s/it] +2025-02-05 16:28:14 - ERROR - stderr - 22%|██▏ | 4981/22434 [6:20:34<12:19:02, 2.54s/it] +2025-02-05 16:28:14 - ERROR - stderr - +2025-02-05 16:28:14 - ERROR - stderr - +2025-02-05 16:28:14 - INFO - stdout - {'loss': 0.9367, 'grad_norm': 1.031346321105957, 'learning_rate': 1.8128186519767364e-05, 'epoch': 0.67} +2025-02-05 16:28:14 - ERROR - stderr - 22%|██▏ | 4981/22434 [6:20:34<12:19:02, 2.54s/it] +2025-02-05 16:28:17 - ERROR - stderr - 22%|██▏ | 4982/22434 [6:20:37<12:14:52, 2.53s/it] +2025-02-05 16:28:17 - ERROR - stderr - +2025-02-05 16:28:17 - ERROR - stderr - +2025-02-05 16:28:17 - INFO - stdout - {'loss': 1.0336, 'grad_norm': 1.0336720943450928, 'learning_rate': 1.8127345428374074e-05, 'epoch': 0.67} +2025-02-05 16:28:17 - ERROR - stderr - 22%|██▏ | 4982/22434 [6:20:37<12:14:52, 2.53s/it] +2025-02-05 16:28:19 - ERROR - stderr - 22%|██▏ | 4983/22434 [6:20:39<12:11:12, 2.51s/it] +2025-02-05 16:28:19 - ERROR - stderr - +2025-02-05 16:28:19 - ERROR - stderr - +2025-02-05 16:28:19 - INFO - stdout - {'loss': 0.9371, 'grad_norm': 0.9850664138793945, 'learning_rate': 1.8126504167574045e-05, 'epoch': 0.67} +2025-02-05 16:28:19 - ERROR - stderr - 22%|██▏ | 4983/22434 [6:20:39<12:11:12, 2.51s/it] +2025-02-05 16:28:22 - ERROR - stderr - 22%|██▏ | 4984/22434 [6:20:42<12:05:53, 2.50s/it] +2025-02-05 16:28:22 - ERROR - stderr - +2025-02-05 16:28:22 - ERROR - stderr - +2025-02-05 16:28:22 - INFO - stdout - {'loss': 0.9669, 'grad_norm': 1.029054880142212, 'learning_rate': 1.8125662737384814e-05, 'epoch': 0.67} +2025-02-05 16:28:22 - ERROR - stderr - 22%|██▏ | 4984/22434 [6:20:42<12:05:53, 2.50s/it] +2025-02-05 16:28:24 - ERROR - stderr - 22%|██▏ | 4985/22434 [6:20:44<12:02:26, 2.48s/it] 
+2025-02-05 16:28:24 - ERROR - stderr - +2025-02-05 16:28:24 - ERROR - stderr - +2025-02-05 16:28:24 - INFO - stdout - {'loss': 1.0181, 'grad_norm': 1.0611985921859741, 'learning_rate': 1.812482113782392e-05, 'epoch': 0.67} +2025-02-05 16:28:24 - ERROR - stderr - 22%|██▏ | 4985/22434 [6:20:44<12:02:26, 2.48s/it] +2025-02-05 16:28:27 - ERROR - stderr - 22%|██▏ | 4986/22434 [6:20:47<12:06:23, 2.50s/it] +2025-02-05 16:28:27 - ERROR - stderr - +2025-02-05 16:28:27 - ERROR - stderr - +2025-02-05 16:28:27 - INFO - stdout - {'loss': 0.9778, 'grad_norm': 1.0016247034072876, 'learning_rate': 1.81239793689089e-05, 'epoch': 0.67} +2025-02-05 16:28:27 - ERROR - stderr - 22%|██▏ | 4986/22434 [6:20:47<12:06:23, 2.50s/it] +2025-02-05 16:28:29 - ERROR - stderr - 22%|██▏ | 4987/22434 [6:20:49<12:04:30, 2.49s/it] +2025-02-05 16:28:29 - ERROR - stderr - +2025-02-05 16:28:29 - ERROR - stderr - +2025-02-05 16:28:29 - INFO - stdout - {'loss': 0.8778, 'grad_norm': 1.0768470764160156, 'learning_rate': 1.8123137430657308e-05, 'epoch': 0.67} +2025-02-05 16:28:29 - ERROR - stderr - 22%|██▏ | 4987/22434 [6:20:49<12:04:30, 2.49s/it] +2025-02-05 16:28:32 - ERROR - stderr - 22%|██▏ | 4988/22434 [6:20:52<11:59:07, 2.47s/it] +2025-02-05 16:28:32 - ERROR - stderr - +2025-02-05 16:28:32 - ERROR - stderr - +2025-02-05 16:28:32 - INFO - stdout - {'loss': 0.9919, 'grad_norm': 1.0309611558914185, 'learning_rate': 1.8122295323086688e-05, 'epoch': 0.67} +2025-02-05 16:28:32 - ERROR - stderr - 22%|██▏ | 4988/22434 [6:20:52<11:59:07, 2.47s/it] +2025-02-05 16:28:34 - ERROR - stderr - 22%|██▏ | 4989/22434 [6:20:54<11:56:57, 2.47s/it] +2025-02-05 16:28:34 - ERROR - stderr - +2025-02-05 16:28:34 - ERROR - stderr - +2025-02-05 16:28:34 - INFO - stdout - {'loss': 0.8895, 'grad_norm': 1.0286513566970825, 'learning_rate': 1.8121453046214593e-05, 'epoch': 0.67} +2025-02-05 16:28:34 - ERROR - stderr - 22%|██▏ | 4989/22434 [6:20:54<11:56:57, 2.47s/it] +2025-02-05 16:28:37 - ERROR - stderr - 22%|██▏ | 4990/22434 
[6:20:56<12:00:02, 2.48s/it] +2025-02-05 16:28:37 - ERROR - stderr - +2025-02-05 16:28:37 - ERROR - stderr - +2025-02-05 16:28:37 - INFO - stdout - {'loss': 0.8965, 'grad_norm': 1.0100020170211792, 'learning_rate': 1.8120610600058582e-05, 'epoch': 0.67} +2025-02-05 16:28:37 - ERROR - stderr - 22%|██▏ | 4990/22434 [6:20:57<12:00:02, 2.48s/it] +2025-02-05 16:28:39 - ERROR - stderr - 22%|██▏ | 4991/22434 [6:20:59<12:00:56, 2.48s/it] +2025-02-05 16:28:39 - ERROR - stderr - +2025-02-05 16:28:39 - ERROR - stderr - +2025-02-05 16:28:39 - INFO - stdout - {'loss': 1.0634, 'grad_norm': 1.101260781288147, 'learning_rate': 1.8119767984636213e-05, 'epoch': 0.67} +2025-02-05 16:28:39 - ERROR - stderr - 22%|██▏ | 4991/22434 [6:20:59<12:00:56, 2.48s/it] +2025-02-05 16:28:42 - ERROR - stderr - 22%|██▏ | 4992/22434 [6:21:02<12:05:44, 2.50s/it] +2025-02-05 16:28:42 - ERROR - stderr - +2025-02-05 16:28:42 - ERROR - stderr - +2025-02-05 16:28:42 - INFO - stdout - {'loss': 0.8216, 'grad_norm': 0.9628750681877136, 'learning_rate': 1.811892519996505e-05, 'epoch': 0.67} +2025-02-05 16:28:42 - ERROR - stderr - 22%|██▏ | 4992/22434 [6:21:02<12:05:44, 2.50s/it] +2025-02-05 16:28:44 - ERROR - stderr - 22%|██▏ | 4993/22434 [6:21:04<12:04:38, 2.49s/it] +2025-02-05 16:28:44 - ERROR - stderr - +2025-02-05 16:28:44 - ERROR - stderr - +2025-02-05 16:28:44 - INFO - stdout - {'loss': 0.9784, 'grad_norm': 1.0571770668029785, 'learning_rate': 1.8118082246062657e-05, 'epoch': 0.67} +2025-02-05 16:28:44 - ERROR - stderr - 22%|██▏ | 4993/22434 [6:21:04<12:04:38, 2.49s/it] +2025-02-05 16:28:47 - ERROR - stderr - 22%|██▏ | 4994/22434 [6:21:06<12:04:12, 2.49s/it] +2025-02-05 16:28:47 - ERROR - stderr - +2025-02-05 16:28:47 - ERROR - stderr - +2025-02-05 16:28:47 - INFO - stdout - {'loss': 0.9442, 'grad_norm': 1.1104413270950317, 'learning_rate': 1.8117239122946615e-05, 'epoch': 0.67} +2025-02-05 16:28:47 - ERROR - stderr - 22%|██▏ | 4994/22434 [6:21:07<12:04:12, 2.49s/it] +2025-02-05 16:28:49 - ERROR - stderr 
- 22%|██▏ | 4995/22434 [6:21:09<11:59:39, 2.48s/it] +2025-02-05 16:28:49 - ERROR - stderr - +2025-02-05 16:28:49 - ERROR - stderr - +2025-02-05 16:28:49 - INFO - stdout - {'loss': 1.0236, 'grad_norm': 1.0943197011947632, 'learning_rate': 1.8116395830634485e-05, 'epoch': 0.67} +2025-02-05 16:28:49 - ERROR - stderr - 22%|██▏ | 4995/22434 [6:21:09<11:59:39, 2.48s/it] +2025-02-05 16:28:52 - ERROR - stderr - 22%|██▏ | 4996/22434 [6:21:11<11:59:27, 2.48s/it] +2025-02-05 16:28:52 - ERROR - stderr - +2025-02-05 16:28:52 - ERROR - stderr - +2025-02-05 16:28:52 - INFO - stdout - {'loss': 0.9944, 'grad_norm': 0.9976595044136047, 'learning_rate': 1.8115552369143855e-05, 'epoch': 0.67} +2025-02-05 16:28:52 - ERROR - stderr - 22%|██▏ | 4996/22434 [6:21:11<11:59:27, 2.48s/it] +2025-02-05 16:28:54 - ERROR - stderr - 22%|██▏ | 4997/22434 [6:21:14<12:02:08, 2.48s/it] +2025-02-05 16:28:54 - ERROR - stderr - +2025-02-05 16:28:54 - ERROR - stderr - +2025-02-05 16:28:54 - INFO - stdout - {'loss': 1.0038, 'grad_norm': 1.1618831157684326, 'learning_rate': 1.81147087384923e-05, 'epoch': 0.67} +2025-02-05 16:28:54 - ERROR - stderr - 22%|██▏ | 4997/22434 [6:21:14<12:02:08, 2.48s/it] +2025-02-05 16:28:57 - ERROR - stderr - 22%|██▏ | 4998/22434 [6:21:17<12:23:02, 2.56s/it] +2025-02-05 16:28:57 - ERROR - stderr - +2025-02-05 16:28:57 - ERROR - stderr - +2025-02-05 16:28:57 - INFO - stdout - {'loss': 0.9281, 'grad_norm': 1.1059714555740356, 'learning_rate': 1.81138649386974e-05, 'epoch': 0.67} +2025-02-05 16:28:57 - ERROR - stderr - 22%|██▏ | 4998/22434 [6:21:17<12:23:02, 2.56s/it] +2025-02-05 16:28:59 - ERROR - stderr - 22%|██▏ | 4999/22434 [6:21:19<12:12:35, 2.52s/it] +2025-02-05 16:28:59 - ERROR - stderr - +2025-02-05 16:28:59 - ERROR - stderr - +2025-02-05 16:28:59 - INFO - stdout - {'loss': 0.888, 'grad_norm': 0.9660097360610962, 'learning_rate': 1.8113020969776758e-05, 'epoch': 0.67} +2025-02-05 16:28:59 - ERROR - stderr - 22%|██▏ | 4999/22434 [6:21:19<12:12:35, 2.52s/it] +2025-02-05 
16:29:02 - ERROR - stderr - 22%|██▏ | 5000/22434 [6:21:22<12:37:56, 2.61s/it] +2025-02-05 16:29:02 - ERROR - stderr - +2025-02-05 16:29:02 - ERROR - stderr - +2025-02-05 16:29:02 - INFO - stdout - {'loss': 1.0256, 'grad_norm': 1.064026117324829, 'learning_rate': 1.8112176831747953e-05, 'epoch': 0.67} +2025-02-05 16:29:02 - ERROR - stderr - 22%|██▏ | 5000/22434 [6:21:22<12:37:56, 2.61s/it] +2025-02-05 16:29:05 - ERROR - stderr - 22%|██▏ | 5001/22434 [6:21:24<12:25:10, 2.56s/it] +2025-02-05 16:29:05 - ERROR - stderr - +2025-02-05 16:29:05 - ERROR - stderr - +2025-02-05 16:29:05 - INFO - stdout - {'loss': 0.9215, 'grad_norm': 0.9980587959289551, 'learning_rate': 1.8111332524628587e-05, 'epoch': 0.67} +2025-02-05 16:29:05 - ERROR - stderr - 22%|██▏ | 5001/22434 [6:21:24<12:25:10, 2.56s/it] +2025-02-05 16:29:07 - ERROR - stderr - 22%|██▏ | 5002/22434 [6:21:27<12:19:37, 2.55s/it] +2025-02-05 16:29:07 - ERROR - stderr - +2025-02-05 16:29:07 - ERROR - stderr - +2025-02-05 16:29:07 - INFO - stdout - {'loss': 0.9625, 'grad_norm': 1.037880778312683, 'learning_rate': 1.8110488048436254e-05, 'epoch': 0.67} +2025-02-05 16:29:07 - ERROR - stderr - 22%|██▏ | 5002/22434 [6:21:27<12:19:37, 2.55s/it] +2025-02-05 16:29:10 - ERROR - stderr - 22%|██▏ | 5003/22434 [6:21:29<12:14:01, 2.53s/it] +2025-02-05 16:29:10 - ERROR - stderr - +2025-02-05 16:29:10 - ERROR - stderr - +2025-02-05 16:29:10 - INFO - stdout - {'loss': 1.1008, 'grad_norm': 1.139431118965149, 'learning_rate': 1.8109643403188558e-05, 'epoch': 0.67} +2025-02-05 16:29:10 - ERROR - stderr - 22%|██▏ | 5003/22434 [6:21:29<12:14:01, 2.53s/it] +2025-02-05 16:29:12 - ERROR - stderr - 22%|██▏ | 5004/22434 [6:21:32<12:16:57, 2.54s/it] +2025-02-05 16:29:12 - ERROR - stderr - +2025-02-05 16:29:12 - ERROR - stderr - +2025-02-05 16:29:12 - INFO - stdout - {'loss': 0.9325, 'grad_norm': 0.9601593613624573, 'learning_rate': 1.8108798588903105e-05, 'epoch': 0.67} +2025-02-05 16:29:12 - ERROR - stderr - 22%|██▏ | 5004/22434 [6:21:32<12:16:57, 
2.54s/it] +2025-02-05 16:29:15 - ERROR - stderr - 22%|██▏ | 5005/22434 [6:21:35<12:46:16, 2.64s/it] +2025-02-05 16:29:15 - ERROR - stderr - +2025-02-05 16:29:15 - ERROR - stderr - +2025-02-05 16:29:15 - INFO - stdout - {'loss': 0.9648, 'grad_norm': 1.069495677947998, 'learning_rate': 1.8107953605597507e-05, 'epoch': 0.67} +2025-02-05 16:29:15 - ERROR - stderr - 22%|██▏ | 5005/22434 [6:21:35<12:46:16, 2.64s/it] +2025-02-05 16:29:17 - ERROR - stderr - 22%|██▏ | 5006/22434 [6:21:37<12:30:55, 2.59s/it] +2025-02-05 16:29:17 - ERROR - stderr - +2025-02-05 16:29:17 - ERROR - stderr - +2025-02-05 16:29:17 - INFO - stdout - {'loss': 0.8966, 'grad_norm': 1.0853127241134644, 'learning_rate': 1.8107108453289373e-05, 'epoch': 0.67} +2025-02-05 16:29:17 - ERROR - stderr - 22%|██▏ | 5006/22434 [6:21:37<12:30:55, 2.59s/it] +2025-02-05 16:29:20 - ERROR - stderr - 22%|██▏ | 5007/22434 [6:21:40<12:21:45, 2.55s/it] +2025-02-05 16:29:20 - ERROR - stderr - +2025-02-05 16:29:20 - ERROR - stderr - +2025-02-05 16:29:20 - INFO - stdout - {'loss': 0.9711, 'grad_norm': 1.0191290378570557, 'learning_rate': 1.810626313199632e-05, 'epoch': 0.67} +2025-02-05 16:29:20 - ERROR - stderr - 22%|██▏ | 5007/22434 [6:21:40<12:21:45, 2.55s/it] +2025-02-05 16:29:22 - ERROR - stderr - 22%|██▏ | 5008/22434 [6:21:42<12:14:32, 2.53s/it] +2025-02-05 16:29:22 - ERROR - stderr - +2025-02-05 16:29:22 - ERROR - stderr - +2025-02-05 16:29:22 - INFO - stdout - {'loss': 1.0939, 'grad_norm': 1.1415996551513672, 'learning_rate': 1.8105417641735974e-05, 'epoch': 0.67} +2025-02-05 16:29:22 - ERROR - stderr - 22%|██▏ | 5008/22434 [6:21:42<12:14:32, 2.53s/it] +2025-02-05 16:29:25 - ERROR - stderr - 22%|██▏ | 5009/22434 [6:21:45<12:06:08, 2.50s/it] +2025-02-05 16:29:25 - ERROR - stderr - +2025-02-05 16:29:25 - ERROR - stderr - +2025-02-05 16:29:25 - INFO - stdout - {'loss': 0.8584, 'grad_norm': 0.9952882528305054, 'learning_rate': 1.810457198252595e-05, 'epoch': 0.67} +2025-02-05 16:29:25 - ERROR - stderr - 22%|██▏ | 
5009/22434 [6:21:45<12:06:08, 2.50s/it] +2025-02-05 16:29:27 - ERROR - stderr - 22%|██▏ | 5010/22434 [6:21:47<12:04:42, 2.50s/it] +2025-02-05 16:29:27 - ERROR - stderr - +2025-02-05 16:29:27 - ERROR - stderr - +2025-02-05 16:29:27 - INFO - stdout - {'loss': 0.9274, 'grad_norm': 1.0715973377227783, 'learning_rate': 1.8103726154383876e-05, 'epoch': 0.67} +2025-02-05 16:29:27 - ERROR - stderr - 22%|██▏ | 5010/22434 [6:21:47<12:04:42, 2.50s/it] +2025-02-05 16:29:30 - ERROR - stderr - 22%|██▏ | 5011/22434 [6:21:50<11:58:03, 2.47s/it] +2025-02-05 16:29:30 - ERROR - stderr - +2025-02-05 16:29:30 - ERROR - stderr - +2025-02-05 16:29:30 - INFO - stdout - {'loss': 1.0282, 'grad_norm': 1.0314003229141235, 'learning_rate': 1.8102880157327386e-05, 'epoch': 0.67} +2025-02-05 16:29:30 - ERROR - stderr - 22%|██▏ | 5011/22434 [6:21:50<11:58:03, 2.47s/it] +2025-02-05 16:29:33 - ERROR - stderr - 22%|██▏ | 5012/22434 [6:21:52<12:29:35, 2.58s/it] +2025-02-05 16:29:33 - ERROR - stderr - +2025-02-05 16:29:33 - ERROR - stderr - +2025-02-05 16:29:33 - INFO - stdout - {'loss': 0.968, 'grad_norm': 1.1185998916625977, 'learning_rate': 1.8102033991374118e-05, 'epoch': 0.67} +2025-02-05 16:29:33 - ERROR - stderr - 22%|██▏ | 5012/22434 [6:21:52<12:29:35, 2.58s/it] +2025-02-05 16:29:35 - ERROR - stderr - 22%|██▏ | 5013/22434 [6:21:55<12:29:32, 2.58s/it] +2025-02-05 16:29:35 - ERROR - stderr - +2025-02-05 16:29:35 - ERROR - stderr - +2025-02-05 16:29:35 - INFO - stdout - {'loss': 1.0646, 'grad_norm': 1.0908783674240112, 'learning_rate': 1.8101187656541695e-05, 'epoch': 0.67} +2025-02-05 16:29:35 - ERROR - stderr - 22%|██▏ | 5013/22434 [6:21:55<12:29:32, 2.58s/it] +2025-02-05 16:29:38 - ERROR - stderr - 22%|██▏ | 5014/22434 [6:21:57<12:23:57, 2.56s/it] +2025-02-05 16:29:38 - ERROR - stderr - +2025-02-05 16:29:38 - ERROR - stderr - +2025-02-05 16:29:38 - INFO - stdout - {'loss': 1.0432, 'grad_norm': 1.1463176012039185, 'learning_rate': 1.8100341152847772e-05, 'epoch': 0.67} +2025-02-05 16:29:38 - 
ERROR - stderr - 22%|██▏ | 5014/22434 [6:21:57<12:23:57, 2.56s/it] +2025-02-05 16:29:40 - ERROR - stderr - 22%|██▏ | 5015/22434 [6:22:00<12:16:11, 2.54s/it] +2025-02-05 16:29:40 - ERROR - stderr - +2025-02-05 16:29:40 - ERROR - stderr - +2025-02-05 16:29:40 - INFO - stdout - {'loss': 1.0687, 'grad_norm': 1.1876200437545776, 'learning_rate': 1.809949448030999e-05, 'epoch': 0.67} +2025-02-05 16:29:40 - ERROR - stderr - 22%|██▏ | 5015/22434 [6:22:00<12:16:11, 2.54s/it] +2025-02-05 16:29:43 - ERROR - stderr - 22%|██▏ | 5016/22434 [6:22:02<12:10:25, 2.52s/it] +2025-02-05 16:29:43 - ERROR - stderr - +2025-02-05 16:29:43 - ERROR - stderr - +2025-02-05 16:29:43 - INFO - stdout - {'loss': 0.9486, 'grad_norm': 1.129399061203003, 'learning_rate': 1.8098647638946e-05, 'epoch': 0.67} +2025-02-05 16:29:43 - ERROR - stderr - 22%|██▏ | 5016/22434 [6:22:02<12:10:25, 2.52s/it] +2025-02-05 16:29:45 - ERROR - stderr - 22%|██▏ | 5017/22434 [6:22:05<12:10:36, 2.52s/it] +2025-02-05 16:29:45 - ERROR - stderr - +2025-02-05 16:29:45 - ERROR - stderr - +2025-02-05 16:29:45 - INFO - stdout - {'loss': 1.0181, 'grad_norm': 1.0842876434326172, 'learning_rate': 1.809780062877344e-05, 'epoch': 0.67} +2025-02-05 16:29:45 - ERROR - stderr - 22%|██▏ | 5017/22434 [6:22:05<12:10:36, 2.52s/it] +2025-02-05 16:29:48 - ERROR - stderr - 22%|██▏ | 5018/22434 [6:22:07<12:07:43, 2.51s/it] +2025-02-05 16:29:48 - ERROR - stderr - +2025-02-05 16:29:48 - ERROR - stderr - +2025-02-05 16:29:48 - INFO - stdout - {'loss': 0.98, 'grad_norm': 1.132673740386963, 'learning_rate': 1.8096953449809983e-05, 'epoch': 0.67} +2025-02-05 16:29:48 - ERROR - stderr - 22%|██▏ | 5018/22434 [6:22:07<12:07:43, 2.51s/it] +2025-02-05 16:29:50 - ERROR - stderr - 22%|██▏ | 5019/22434 [6:22:10<12:07:17, 2.51s/it] +2025-02-05 16:29:50 - ERROR - stderr - +2025-02-05 16:29:50 - ERROR - stderr - +2025-02-05 16:29:50 - INFO - stdout - {'loss': 0.9816, 'grad_norm': 0.9741018414497375, 'learning_rate': 1.809610610207327e-05, 'epoch': 0.67} 
+2025-02-05 16:29:50 - ERROR - stderr - 22%|██▏ | 5019/22434 [6:22:10<12:07:17, 2.51s/it] +2025-02-05 16:29:53 - ERROR - stderr - 22%|██▏ | 5020/22434 [6:22:12<12:02:08, 2.49s/it] +2025-02-05 16:29:53 - ERROR - stderr - +2025-02-05 16:29:53 - ERROR - stderr - +2025-02-05 16:29:53 - INFO - stdout - {'loss': 0.8669, 'grad_norm': 1.0211485624313354, 'learning_rate': 1.8095258585580983e-05, 'epoch': 0.67} +2025-02-05 16:29:53 - ERROR - stderr - 22%|██▏ | 5020/22434 [6:22:12<12:02:08, 2.49s/it] +2025-02-05 16:29:55 - ERROR - stderr - 22%|██▏ | 5021/22434 [6:22:15<12:06:44, 2.50s/it] +2025-02-05 16:29:55 - ERROR - stderr - +2025-02-05 16:29:55 - ERROR - stderr - +2025-02-05 16:29:55 - INFO - stdout - {'loss': 1.0076, 'grad_norm': 1.368371844291687, 'learning_rate': 1.809441090035077e-05, 'epoch': 0.67} +2025-02-05 16:29:55 - ERROR - stderr - 22%|██▏ | 5021/22434 [6:22:15<12:06:44, 2.50s/it] +2025-02-05 16:29:58 - ERROR - stderr - 22%|██▏ | 5022/22434 [6:22:17<12:05:59, 2.50s/it] +2025-02-05 16:29:58 - ERROR - stderr - +2025-02-05 16:29:58 - ERROR - stderr - +2025-02-05 16:29:58 - INFO - stdout - {'loss': 1.0083, 'grad_norm': 1.080718994140625, 'learning_rate': 1.809356304640031e-05, 'epoch': 0.67} +2025-02-05 16:29:58 - ERROR - stderr - 22%|██▏ | 5022/22434 [6:22:17<12:05:59, 2.50s/it] +2025-02-05 16:30:00 - ERROR - stderr - 22%|██▏ | 5023/22434 [6:22:20<12:13:12, 2.53s/it] +2025-02-05 16:30:00 - ERROR - stderr - +2025-02-05 16:30:00 - ERROR - stderr - +2025-02-05 16:30:00 - INFO - stdout - {'loss': 0.9311, 'grad_norm': 0.990145206451416, 'learning_rate': 1.809271502374727e-05, 'epoch': 0.67} +2025-02-05 16:30:00 - ERROR - stderr - 22%|██▏ | 5023/22434 [6:22:20<12:13:12, 2.53s/it] +2025-02-05 16:30:03 - ERROR - stderr - 22%|██▏ | 5024/22434 [6:22:22<12:08:45, 2.51s/it] +2025-02-05 16:30:03 - ERROR - stderr - +2025-02-05 16:30:03 - ERROR - stderr - +2025-02-05 16:30:03 - INFO - stdout - {'loss': 1.0158, 'grad_norm': 1.17551589012146, 'learning_rate': 
1.8091866832409332e-05, 'epoch': 0.67} +2025-02-05 16:30:03 - ERROR - stderr - 22%|██▏ | 5024/22434 [6:22:22<12:08:45, 2.51s/it] +2025-02-05 16:30:05 - ERROR - stderr - 22%|██▏ | 5025/22434 [6:22:25<12:05:18, 2.50s/it] +2025-02-05 16:30:05 - ERROR - stderr - +2025-02-05 16:30:05 - ERROR - stderr - +2025-02-05 16:30:05 - INFO - stdout - {'loss': 1.1643, 'grad_norm': 1.1224229335784912, 'learning_rate': 1.8091018472404172e-05, 'epoch': 0.67} +2025-02-05 16:30:05 - ERROR - stderr - 22%|██▏ | 5025/22434 [6:22:25<12:05:18, 2.50s/it] +2025-02-05 16:30:08 - ERROR - stderr - 22%|██▏ | 5026/22434 [6:22:27<12:06:17, 2.50s/it] +2025-02-05 16:30:08 - ERROR - stderr - +2025-02-05 16:30:08 - ERROR - stderr - +2025-02-05 16:30:08 - INFO - stdout - {'loss': 0.9099, 'grad_norm': 1.0456095933914185, 'learning_rate': 1.8090169943749477e-05, 'epoch': 0.67} +2025-02-05 16:30:08 - ERROR - stderr - 22%|██▏ | 5026/22434 [6:22:27<12:06:17, 2.50s/it] +2025-02-05 16:30:10 - ERROR - stderr - 22%|██▏ | 5027/22434 [6:22:30<12:10:56, 2.52s/it] +2025-02-05 16:30:10 - ERROR - stderr - +2025-02-05 16:30:10 - ERROR - stderr - +2025-02-05 16:30:10 - INFO - stdout - {'loss': 0.9243, 'grad_norm': 0.9828181862831116, 'learning_rate': 1.808932124646293e-05, 'epoch': 0.67} +2025-02-05 16:30:10 - ERROR - stderr - 22%|██▏ | 5027/22434 [6:22:30<12:10:56, 2.52s/it] +2025-02-05 16:30:13 - ERROR - stderr - 22%|██▏ | 5028/22434 [6:22:32<12:01:48, 2.49s/it] +2025-02-05 16:30:13 - ERROR - stderr - +2025-02-05 16:30:13 - ERROR - stderr - +2025-02-05 16:30:13 - INFO - stdout - {'loss': 0.989, 'grad_norm': 1.097732424736023, 'learning_rate': 1.8088472380562218e-05, 'epoch': 0.67} +2025-02-05 16:30:13 - ERROR - stderr - 22%|██▏ | 5028/22434 [6:22:32<12:01:48, 2.49s/it] +2025-02-05 16:30:15 - ERROR - stderr - 22%|██▏ | 5029/22434 [6:22:35<12:27:47, 2.58s/it] +2025-02-05 16:30:15 - ERROR - stderr - +2025-02-05 16:30:15 - ERROR - stderr - +2025-02-05 16:30:15 - INFO - stdout - {'loss': 1.0223, 'grad_norm': 
1.2297818660736084, 'learning_rate': 1.808762334606504e-05, 'epoch': 0.67} +2025-02-05 16:30:15 - ERROR - stderr - 22%|██▏ | 5029/22434 [6:22:35<12:27:47, 2.58s/it] +2025-02-05 16:30:18 - ERROR - stderr - 22%|██▏ | 5030/22434 [6:22:38<12:17:07, 2.54s/it] +2025-02-05 16:30:18 - ERROR - stderr - +2025-02-05 16:30:18 - ERROR - stderr - +2025-02-05 16:30:18 - INFO - stdout - {'loss': 0.9125, 'grad_norm': 1.1043789386749268, 'learning_rate': 1.8086774142989095e-05, 'epoch': 0.67} +2025-02-05 16:30:18 - ERROR - stderr - 22%|██▏ | 5030/22434 [6:22:38<12:17:07, 2.54s/it] +2025-02-05 16:30:20 - ERROR - stderr - 22%|██▏ | 5031/22434 [6:22:40<12:18:44, 2.55s/it] +2025-02-05 16:30:20 - ERROR - stderr - +2025-02-05 16:30:20 - ERROR - stderr - +2025-02-05 16:30:20 - INFO - stdout - {'loss': 0.8846, 'grad_norm': 1.0243536233901978, 'learning_rate': 1.8085924771352083e-05, 'epoch': 0.67} +2025-02-05 16:30:20 - ERROR - stderr - 22%|██▏ | 5031/22434 [6:22:40<12:18:44, 2.55s/it] +2025-02-05 16:30:23 - ERROR - stderr - 22%|██▏ | 5032/22434 [6:22:43<12:20:52, 2.55s/it] +2025-02-05 16:30:23 - ERROR - stderr - +2025-02-05 16:30:23 - ERROR - stderr - +2025-02-05 16:30:23 - INFO - stdout - {'loss': 0.9528, 'grad_norm': 0.9904436469078064, 'learning_rate': 1.8085075231171702e-05, 'epoch': 0.67} +2025-02-05 16:30:23 - ERROR - stderr - 22%|██▏ | 5032/22434 [6:22:43<12:20:52, 2.55s/it] +2025-02-05 16:30:25 - ERROR - stderr - 22%|██▏ | 5033/22434 [6:22:45<12:09:40, 2.52s/it] +2025-02-05 16:30:25 - ERROR - stderr - +2025-02-05 16:30:25 - ERROR - stderr - +2025-02-05 16:30:25 - INFO - stdout - {'loss': 0.9227, 'grad_norm': 1.0466152429580688, 'learning_rate': 1.8084225522465667e-05, 'epoch': 0.67} +2025-02-05 16:30:25 - ERROR - stderr - 22%|██▏ | 5033/22434 [6:22:45<12:09:40, 2.52s/it] +2025-02-05 16:30:28 - ERROR - stderr - 22%|██▏ | 5034/22434 [6:22:48<12:00:13, 2.48s/it] +2025-02-05 16:30:28 - ERROR - stderr - +2025-02-05 16:30:28 - ERROR - stderr - +2025-02-05 16:30:28 - INFO - stdout - 
{'loss': 0.9701, 'grad_norm': 1.0991414785385132, 'learning_rate': 1.8083375645251687e-05, 'epoch': 0.67} +2025-02-05 16:30:28 - ERROR - stderr - 22%|██▏ | 5034/22434 [6:22:48<12:00:13, 2.48s/it] +2025-02-05 16:30:30 - ERROR - stderr - 22%|██▏ | 5035/22434 [6:22:50<11:56:12, 2.47s/it] +2025-02-05 16:30:30 - ERROR - stderr - +2025-02-05 16:30:30 - ERROR - stderr - +2025-02-05 16:30:30 - INFO - stdout - {'loss': 0.9533, 'grad_norm': 1.1972569227218628, 'learning_rate': 1.8082525599547474e-05, 'epoch': 0.67} +2025-02-05 16:30:30 - ERROR - stderr - 22%|██▏ | 5035/22434 [6:22:50<11:56:12, 2.47s/it] +2025-02-05 16:30:33 - ERROR - stderr - 22%|██▏ | 5036/22434 [6:22:53<11:59:57, 2.48s/it] +2025-02-05 16:30:33 - ERROR - stderr - +2025-02-05 16:30:33 - ERROR - stderr - +2025-02-05 16:30:33 - INFO - stdout - {'loss': 0.8965, 'grad_norm': 1.0884032249450684, 'learning_rate': 1.8081675385370753e-05, 'epoch': 0.67} +2025-02-05 16:30:33 - ERROR - stderr - 22%|██▏ | 5036/22434 [6:22:53<11:59:57, 2.48s/it] +2025-02-05 16:30:35 - ERROR - stderr - 22%|██▏ | 5037/22434 [6:22:55<11:54:30, 2.46s/it] +2025-02-05 16:30:35 - ERROR - stderr - +2025-02-05 16:30:35 - ERROR - stderr - +2025-02-05 16:30:35 - INFO - stdout - {'loss': 0.9585, 'grad_norm': 1.0727729797363281, 'learning_rate': 1.808082500273924e-05, 'epoch': 0.67} +2025-02-05 16:30:35 - ERROR - stderr - 22%|██▏ | 5037/22434 [6:22:55<11:54:30, 2.46s/it] +2025-02-05 16:30:38 - ERROR - stderr - 22%|██▏ | 5038/22434 [6:22:57<11:57:42, 2.48s/it] +2025-02-05 16:30:38 - ERROR - stderr - +2025-02-05 16:30:38 - ERROR - stderr - +2025-02-05 16:30:38 - INFO - stdout - {'loss': 0.865, 'grad_norm': 1.0311223268508911, 'learning_rate': 1.807997445167066e-05, 'epoch': 0.67} +2025-02-05 16:30:38 - ERROR - stderr - 22%|██▏ | 5038/22434 [6:22:58<11:57:42, 2.48s/it] +2025-02-05 16:30:40 - ERROR - stderr - 22%|██▏ | 5039/22434 [6:23:00<11:55:02, 2.47s/it] +2025-02-05 16:30:40 - ERROR - stderr - +2025-02-05 16:30:40 - ERROR - stderr - +2025-02-05 
16:30:40 - INFO - stdout - {'loss': 0.9585, 'grad_norm': 1.069775104522705, 'learning_rate': 1.8079123732182748e-05, 'epoch': 0.67} +2025-02-05 16:30:40 - ERROR - stderr - 22%|██▏ | 5039/22434 [6:23:00<11:55:02, 2.47s/it] +2025-02-05 16:30:43 - ERROR - stderr - 22%|██▏ | 5040/22434 [6:23:02<11:53:25, 2.46s/it] +2025-02-05 16:30:43 - ERROR - stderr - +2025-02-05 16:30:43 - ERROR - stderr - +2025-02-05 16:30:43 - INFO - stdout - {'loss': 0.9612, 'grad_norm': 1.1405057907104492, 'learning_rate': 1.807827284429323e-05, 'epoch': 0.67} +2025-02-05 16:30:43 - ERROR - stderr - 22%|██▏ | 5040/22434 [6:23:02<11:53:25, 2.46s/it] +2025-02-05 16:30:43 - INFO - stdout - WARNING: tokenization mismatch: 112 vs. 138. (ignored) +2025-02-05 16:30:45 - ERROR - stderr - 22%|██▏ | 5041/22434 [6:23:05<11:57:11, 2.47s/it] +2025-02-05 16:30:45 - ERROR - stderr - +2025-02-05 16:30:45 - ERROR - stderr - +2025-02-05 16:30:45 - INFO - stdout - {'loss': 0.7721, 'grad_norm': 0.9590426087379456, 'learning_rate': 1.8077421788019848e-05, 'epoch': 0.67} +2025-02-05 16:30:45 - ERROR - stderr - 22%|██▏ | 5041/22434 [6:23:05<11:57:11, 2.47s/it] +2025-02-05 16:30:48 - ERROR - stderr - 22%|██▏ | 5042/22434 [6:23:08<12:36:53, 2.61s/it] +2025-02-05 16:30:48 - ERROR - stderr - +2025-02-05 16:30:48 - ERROR - stderr - +2025-02-05 16:30:48 - INFO - stdout - {'loss': 1.02, 'grad_norm': 1.1761194467544556, 'learning_rate': 1.8076570563380333e-05, 'epoch': 0.67} +2025-02-05 16:30:48 - ERROR - stderr - 22%|██▏ | 5042/22434 [6:23:08<12:36:53, 2.61s/it] +2025-02-05 16:30:50 - ERROR - stderr - 22%|██▏ | 5043/22434 [6:23:10<12:21:36, 2.56s/it] +2025-02-05 16:30:51 - ERROR - stderr - +2025-02-05 16:30:51 - ERROR - stderr - +2025-02-05 16:30:51 - INFO - stdout - {'loss': 1.1724, 'grad_norm': 1.163806676864624, 'learning_rate': 1.8075719170392437e-05, 'epoch': 0.67} +2025-02-05 16:30:51 - ERROR - stderr - 22%|██▏ | 5043/22434 [6:23:10<12:21:36, 2.56s/it] +2025-02-05 16:30:51 - WARNING - 
transformers.tokenization_utils_base - Token indices sequence length is longer than the specified maximum sequence length for this model (2736 > 2048). Running this sequence through the model will result in indexing errors +2025-02-05 16:30:51 - WARNING - transformers.tokenization_utils_base - Token indices sequence length is longer than the specified maximum sequence length for this model (2736 > 2048). Running this sequence through the model will result in indexing errors +2025-02-05 16:30:53 - ERROR - stderr - 22%|██▏ | 5044/22434 [6:23:13<12:16:33, 2.54s/it] +2025-02-05 16:30:53 - ERROR - stderr - +2025-02-05 16:30:53 - ERROR - stderr - +2025-02-05 16:30:53 - INFO - stdout - {'loss': 0.895, 'grad_norm': 1.0814969539642334, 'learning_rate': 1.80748676090739e-05, 'epoch': 0.67} +2025-02-05 16:30:53 - ERROR - stderr - 22%|██▏ | 5044/22434 [6:23:13<12:16:33, 2.54s/it] +2025-02-05 16:30:59 - ERROR - stderr - 22%|██▏ | 5045/22434 [6:23:18<16:56:11, 3.51s/it] +2025-02-05 16:30:59 - ERROR - stderr - +2025-02-05 16:30:59 - ERROR - stderr - +2025-02-05 16:30:59 - INFO - stdout - {'loss': 1.0519, 'grad_norm': 1.1215808391571045, 'learning_rate': 1.8074015879442475e-05, 'epoch': 0.67} +2025-02-05 16:30:59 - ERROR - stderr - 22%|██▏ | 5045/22434 [6:23:19<16:56:11, 3.51s/it] +2025-02-05 16:31:01 - ERROR - stderr - 22%|██▏ | 5046/22434 [6:23:21<15:31:21, 3.21s/it] +2025-02-05 16:31:01 - ERROR - stderr - +2025-02-05 16:31:01 - ERROR - stderr - +2025-02-05 16:31:01 - INFO - stdout - {'loss': 0.9824, 'grad_norm': 1.0824809074401855, 'learning_rate': 1.8073163981515915e-05, 'epoch': 0.67} +2025-02-05 16:31:01 - ERROR - stderr - 22%|██▏ | 5046/22434 [6:23:21<15:31:21, 3.21s/it] +2025-02-05 16:31:04 - ERROR - stderr - 22%|██▏ | 5047/22434 [6:23:24<14:29:08, 3.00s/it] +2025-02-05 16:31:04 - ERROR - stderr - +2025-02-05 16:31:04 - ERROR - stderr - +2025-02-05 16:31:04 - INFO - stdout - {'loss': 1.0461, 'grad_norm': 1.1442539691925049, 'learning_rate': 1.8072311915311978e-05, 'epoch': 
0.67} +2025-02-05 16:31:04 - ERROR - stderr - 22%|██▏ | 5047/22434 [6:23:24<14:29:08, 3.00s/it] +2025-02-05 16:31:06 - ERROR - stderr - 23%|██▎ | 5048/22434 [6:23:26<13:41:49, 2.84s/it] +2025-02-05 16:31:06 - ERROR - stderr - +2025-02-05 16:31:06 - ERROR - stderr - +2025-02-05 16:31:06 - INFO - stdout - {'loss': 0.8791, 'grad_norm': 1.0627573728561401, 'learning_rate': 1.8071459680848423e-05, 'epoch': 0.68} +2025-02-05 16:31:06 - ERROR - stderr - 23%|██▎ | 5048/22434 [6:23:26<13:41:49, 2.84s/it] +2025-02-05 16:31:09 - ERROR - stderr - 23%|██▎ | 5049/22434 [6:23:28<13:11:43, 2.73s/it] +2025-02-05 16:31:09 - ERROR - stderr - +2025-02-05 16:31:09 - ERROR - stderr - +2025-02-05 16:31:09 - INFO - stdout - {'loss': 0.9051, 'grad_norm': 1.005487322807312, 'learning_rate': 1.8070607278143016e-05, 'epoch': 0.68} +2025-02-05 16:31:09 - ERROR - stderr - 23%|██▎ | 5049/22434 [6:23:29<13:11:43, 2.73s/it] +2025-02-05 16:31:11 - ERROR - stderr - 23%|██▎ | 5050/22434 [6:23:31<13:09:55, 2.73s/it] +2025-02-05 16:31:11 - ERROR - stderr - +2025-02-05 16:31:11 - ERROR - stderr - +2025-02-05 16:31:11 - INFO - stdout - {'loss': 0.9219, 'grad_norm': 1.163400650024414, 'learning_rate': 1.8069754707213522e-05, 'epoch': 0.68} +2025-02-05 16:31:11 - ERROR - stderr - 23%|██▎ | 5050/22434 [6:23:31<13:09:55, 2.73s/it] +2025-02-05 16:31:14 - ERROR - stderr - 23%|██▎ | 5051/22434 [6:23:34<12:46:58, 2.65s/it] +2025-02-05 16:31:14 - ERROR - stderr - +2025-02-05 16:31:14 - ERROR - stderr - +2025-02-05 16:31:14 - INFO - stdout - {'loss': 0.9624, 'grad_norm': 1.077052354812622, 'learning_rate': 1.806890196807771e-05, 'epoch': 0.68} +2025-02-05 16:31:14 - ERROR - stderr - 23%|██▎ | 5051/22434 [6:23:34<12:46:58, 2.65s/it] +2025-02-05 16:31:16 - ERROR - stderr - 23%|██▎ | 5052/22434 [6:23:36<12:31:32, 2.59s/it] +2025-02-05 16:31:16 - ERROR - stderr - +2025-02-05 16:31:16 - ERROR - stderr - +2025-02-05 16:31:16 - INFO - stdout - {'loss': 1.0012, 'grad_norm': 0.980795681476593, 'learning_rate': 
1.8068049060753365e-05, 'epoch': 0.68} +2025-02-05 16:31:16 - ERROR - stderr - 23%|██▎ | 5052/22434 [6:23:36<12:31:32, 2.59s/it] +2025-02-05 16:31:19 - ERROR - stderr - 23%|██▎ | 5053/22434 [6:23:39<12:21:33, 2.56s/it] +2025-02-05 16:31:19 - ERROR - stderr - +2025-02-05 16:31:19 - ERROR - stderr - +2025-02-05 16:31:19 - INFO - stdout - {'loss': 0.867, 'grad_norm': 1.0475205183029175, 'learning_rate': 1.8067195985258253e-05, 'epoch': 0.68} +2025-02-05 16:31:19 - ERROR - stderr - 23%|██▎ | 5053/22434 [6:23:39<12:21:33, 2.56s/it] +2025-02-05 16:31:21 - ERROR - stderr - 23%|██▎ | 5054/22434 [6:23:41<12:16:00, 2.54s/it] +2025-02-05 16:31:21 - ERROR - stderr - +2025-02-05 16:31:21 - ERROR - stderr - +2025-02-05 16:31:21 - INFO - stdout - {'loss': 0.98, 'grad_norm': 1.0309828519821167, 'learning_rate': 1.8066342741610158e-05, 'epoch': 0.68} +2025-02-05 16:31:21 - ERROR - stderr - 23%|██▎ | 5054/22434 [6:23:41<12:16:00, 2.54s/it] +2025-02-05 16:31:24 - ERROR - stderr - 23%|██▎ | 5055/22434 [6:23:44<12:08:44, 2.52s/it] +2025-02-05 16:31:24 - ERROR - stderr - +2025-02-05 16:31:24 - ERROR - stderr - +2025-02-05 16:31:24 - INFO - stdout - {'loss': 0.8414, 'grad_norm': 1.0276451110839844, 'learning_rate': 1.806548932982687e-05, 'epoch': 0.68} +2025-02-05 16:31:24 - ERROR - stderr - 23%|██▎ | 5055/22434 [6:23:44<12:08:44, 2.52s/it] +2025-02-05 16:31:26 - ERROR - stderr - 23%|██▎ | 5056/22434 [6:23:46<12:07:40, 2.51s/it] +2025-02-05 16:31:26 - ERROR - stderr - +2025-02-05 16:31:26 - ERROR - stderr - +2025-02-05 16:31:26 - INFO - stdout - {'loss': 0.8625, 'grad_norm': 1.0409561395645142, 'learning_rate': 1.8064635749926172e-05, 'epoch': 0.68} +2025-02-05 16:31:26 - ERROR - stderr - 23%|██▎ | 5056/22434 [6:23:46<12:07:40, 2.51s/it] +2025-02-05 16:31:29 - ERROR - stderr - 23%|██▎ | 5057/22434 [6:23:49<12:03:18, 2.50s/it] +2025-02-05 16:31:29 - ERROR - stderr - +2025-02-05 16:31:29 - ERROR - stderr - +2025-02-05 16:31:29 - INFO - stdout - {'loss': 0.987, 'grad_norm': 
1.0347881317138672, 'learning_rate': 1.8063782001925864e-05, 'epoch': 0.68} +2025-02-05 16:31:29 - ERROR - stderr - 23%|██▎ | 5057/22434 [6:23:49<12:03:18, 2.50s/it] +2025-02-05 16:31:31 - ERROR - stderr - 23%|██▎ | 5058/22434 [6:23:51<11:58:20, 2.48s/it] +2025-02-05 16:31:31 - ERROR - stderr - +2025-02-05 16:31:31 - ERROR - stderr - +2025-02-05 16:31:31 - INFO - stdout - {'loss': 0.9924, 'grad_norm': 1.0494024753570557, 'learning_rate': 1.8062928085843732e-05, 'epoch': 0.68} +2025-02-05 16:31:31 - ERROR - stderr - 23%|██▎ | 5058/22434 [6:23:51<11:58:20, 2.48s/it] +2025-02-05 16:31:34 - ERROR - stderr - 23%|██▎ | 5059/22434 [6:23:54<12:06:39, 2.51s/it] +2025-02-05 16:31:34 - ERROR - stderr - +2025-02-05 16:31:34 - ERROR - stderr - +2025-02-05 16:31:34 - INFO - stdout - {'loss': 1.0123, 'grad_norm': 1.0453131198883057, 'learning_rate': 1.806207400169758e-05, 'epoch': 0.68} +2025-02-05 16:31:34 - ERROR - stderr - 23%|██▎ | 5059/22434 [6:23:54<12:06:39, 2.51s/it] +2025-02-05 16:31:36 - ERROR - stderr - 23%|██▎ | 5060/22434 [6:23:56<12:03:27, 2.50s/it] +2025-02-05 16:31:36 - ERROR - stderr - +2025-02-05 16:31:36 - ERROR - stderr - +2025-02-05 16:31:36 - INFO - stdout - {'loss': 0.9703, 'grad_norm': 1.0931572914123535, 'learning_rate': 1.806121974950521e-05, 'epoch': 0.68} +2025-02-05 16:31:36 - ERROR - stderr - 23%|██▎ | 5060/22434 [6:23:56<12:03:27, 2.50s/it] +2025-02-05 16:31:39 - ERROR - stderr - 23%|██▎ | 5061/22434 [6:23:58<12:02:31, 2.50s/it] +2025-02-05 16:31:39 - ERROR - stderr - +2025-02-05 16:31:39 - ERROR - stderr - +2025-02-05 16:31:39 - INFO - stdout - {'loss': 0.9707, 'grad_norm': 1.053357481956482, 'learning_rate': 1.806036532928443e-05, 'epoch': 0.68} +2025-02-05 16:31:39 - ERROR - stderr - 23%|██▎ | 5061/22434 [6:23:59<12:02:31, 2.50s/it] +2025-02-05 16:31:41 - ERROR - stderr - 23%|██▎ | 5062/22434 [6:24:01<12:06:19, 2.51s/it] +2025-02-05 16:31:41 - ERROR - stderr - +2025-02-05 16:31:41 - ERROR - stderr - +2025-02-05 16:31:41 - INFO - stdout - {'loss': 
0.941, 'grad_norm': 1.0865283012390137, 'learning_rate': 1.8059510741053045e-05, 'epoch': 0.68} +2025-02-05 16:31:41 - ERROR - stderr - 23%|██▎ | 5062/22434 [6:24:01<12:06:19, 2.51s/it] +2025-02-05 16:31:44 - ERROR - stderr - 23%|██▎ | 5063/22434 [6:24:04<12:05:14, 2.51s/it] +2025-02-05 16:31:44 - ERROR - stderr - +2025-02-05 16:31:44 - ERROR - stderr - +2025-02-05 16:31:44 - INFO - stdout - {'loss': 0.9522, 'grad_norm': 1.1608012914657593, 'learning_rate': 1.805865598482887e-05, 'epoch': 0.68} +2025-02-05 16:31:44 - ERROR - stderr - 23%|██▎ | 5063/22434 [6:24:04<12:05:14, 2.51s/it] +2025-02-05 16:31:46 - ERROR - stderr - 23%|██▎ | 5064/22434 [6:24:06<12:02:03, 2.49s/it] +2025-02-05 16:31:46 - ERROR - stderr - +2025-02-05 16:31:46 - ERROR - stderr - +2025-02-05 16:31:46 - INFO - stdout - {'loss': 0.9258, 'grad_norm': 1.0921530723571777, 'learning_rate': 1.805780106062973e-05, 'epoch': 0.68} +2025-02-05 16:31:46 - ERROR - stderr - 23%|██▎ | 5064/22434 [6:24:06<12:02:03, 2.49s/it] +2025-02-05 16:31:49 - ERROR - stderr - 23%|██▎ | 5065/22434 [6:24:09<12:03:03, 2.50s/it] +2025-02-05 16:31:49 - ERROR - stderr - +2025-02-05 16:31:49 - ERROR - stderr - +2025-02-05 16:31:49 - INFO - stdout - {'loss': 0.9239, 'grad_norm': 1.0793124437332153, 'learning_rate': 1.805694596847343e-05, 'epoch': 0.68} +2025-02-05 16:31:49 - ERROR - stderr - 23%|██▎ | 5065/22434 [6:24:09<12:03:03, 2.50s/it] +2025-02-05 16:31:52 - ERROR - stderr - 23%|██▎ | 5066/22434 [6:24:11<12:41:04, 2.63s/it] +2025-02-05 16:31:52 - ERROR - stderr - +2025-02-05 16:31:52 - ERROR - stderr - +2025-02-05 16:31:52 - INFO - stdout - {'loss': 0.8461, 'grad_norm': 1.0646467208862305, 'learning_rate': 1.80560907083778e-05, 'epoch': 0.68} +2025-02-05 16:31:52 - ERROR - stderr - 23%|██▎ | 5066/22434 [6:24:11<12:41:04, 2.63s/it] +2025-02-05 16:31:54 - ERROR - stderr - 23%|██▎ | 5067/22434 [6:24:14<12:24:52, 2.57s/it] +2025-02-05 16:31:54 - ERROR - stderr - +2025-02-05 16:31:54 - ERROR - stderr - +2025-02-05 16:31:54 - INFO 
- stdout - {'loss': 1.0139, 'grad_norm': 1.1142200231552124, 'learning_rate': 1.8055235280360674e-05, 'epoch': 0.68} +2025-02-05 16:31:54 - ERROR - stderr - 23%|██▎ | 5067/22434 [6:24:14<12:24:52, 2.57s/it] +2025-02-05 16:31:57 - ERROR - stderr - 23%|██▎ | 5068/22434 [6:24:16<12:17:46, 2.55s/it] +2025-02-05 16:31:57 - ERROR - stderr - +2025-02-05 16:31:57 - ERROR - stderr - +2025-02-05 16:31:57 - INFO - stdout - {'loss': 0.9115, 'grad_norm': 1.1605818271636963, 'learning_rate': 1.8054379684439874e-05, 'epoch': 0.68} +2025-02-05 16:31:57 - ERROR - stderr - 23%|██▎ | 5068/22434 [6:24:16<12:17:46, 2.55s/it] +2025-02-05 16:31:59 - ERROR - stderr - 23%|██▎ | 5069/22434 [6:24:19<12:10:40, 2.52s/it] +2025-02-05 16:31:59 - ERROR - stderr - +2025-02-05 16:31:59 - ERROR - stderr - +2025-02-05 16:31:59 - INFO - stdout - {'loss': 1.0478, 'grad_norm': 1.194240927696228, 'learning_rate': 1.8053523920633235e-05, 'epoch': 0.68} +2025-02-05 16:31:59 - ERROR - stderr - 23%|██▎ | 5069/22434 [6:24:19<12:10:40, 2.52s/it] +2025-02-05 16:32:02 - ERROR - stderr - 23%|██▎ | 5070/22434 [6:24:21<12:12:31, 2.53s/it] +2025-02-05 16:32:02 - ERROR - stderr - +2025-02-05 16:32:02 - ERROR - stderr - +2025-02-05 16:32:02 - INFO - stdout - {'loss': 0.9738, 'grad_norm': 0.9740921854972839, 'learning_rate': 1.8052667988958597e-05, 'epoch': 0.68} +2025-02-05 16:32:02 - ERROR - stderr - 23%|██▎ | 5070/22434 [6:24:21<12:12:31, 2.53s/it] +2025-02-05 16:32:04 - ERROR - stderr - 23%|██▎ | 5071/22434 [6:24:24<12:08:03, 2.52s/it] +2025-02-05 16:32:04 - ERROR - stderr - +2025-02-05 16:32:04 - ERROR - stderr - +2025-02-05 16:32:04 - INFO - stdout - {'loss': 0.8986, 'grad_norm': 1.2290987968444824, 'learning_rate': 1.8051811889433803e-05, 'epoch': 0.68} +2025-02-05 16:32:04 - ERROR - stderr - 23%|██▎ | 5071/22434 [6:24:24<12:08:03, 2.52s/it] +2025-02-05 16:32:07 - ERROR - stderr - 23%|██▎ | 5072/22434 [6:24:26<12:17:12, 2.55s/it] +2025-02-05 16:32:07 - ERROR - stderr - +2025-02-05 16:32:07 - ERROR - stderr - 
+2025-02-05 16:32:07 - INFO - stdout - {'loss': 1.0764, 'grad_norm': 1.0792953968048096, 'learning_rate': 1.805095562207669e-05, 'epoch': 0.68} +2025-02-05 16:32:07 - ERROR - stderr - 23%|██▎ | 5072/22434 [6:24:27<12:17:12, 2.55s/it] +2025-02-05 16:32:09 - ERROR - stderr - 23%|██▎ | 5073/22434 [6:24:29<12:11:37, 2.53s/it] +2025-02-05 16:32:09 - ERROR - stderr - +2025-02-05 16:32:09 - ERROR - stderr - +2025-02-05 16:32:09 - INFO - stdout - {'loss': 1.0404, 'grad_norm': 1.1804550886154175, 'learning_rate': 1.8050099186905114e-05, 'epoch': 0.68} +2025-02-05 16:32:09 - ERROR - stderr - 23%|██▎ | 5073/22434 [6:24:29<12:11:37, 2.53s/it] +2025-02-05 16:32:12 - ERROR - stderr - 23%|██▎ | 5074/22434 [6:24:31<12:07:50, 2.52s/it] +2025-02-05 16:32:12 - ERROR - stderr - +2025-02-05 16:32:12 - ERROR - stderr - +2025-02-05 16:32:12 - INFO - stdout - {'loss': 1.0377, 'grad_norm': 1.1123442649841309, 'learning_rate': 1.8049242583936923e-05, 'epoch': 0.68} +2025-02-05 16:32:12 - ERROR - stderr - 23%|██▎ | 5074/22434 [6:24:32<12:07:50, 2.52s/it] +2025-02-05 16:32:14 - ERROR - stderr - 23%|██▎ | 5075/22434 [6:24:34<12:21:37, 2.56s/it] +2025-02-05 16:32:14 - ERROR - stderr - +2025-02-05 16:32:14 - ERROR - stderr - +2025-02-05 16:32:14 - INFO - stdout - {'loss': 0.9334, 'grad_norm': 1.0268845558166504, 'learning_rate': 1.8048385813189973e-05, 'epoch': 0.68} +2025-02-05 16:32:14 - ERROR - stderr - 23%|██▎ | 5075/22434 [6:24:34<12:21:37, 2.56s/it] +2025-02-05 16:32:17 - ERROR - stderr - 23%|██▎ | 5076/22434 [6:24:37<12:18:39, 2.55s/it] +2025-02-05 16:32:17 - ERROR - stderr - +2025-02-05 16:32:17 - ERROR - stderr - +2025-02-05 16:32:17 - INFO - stdout - {'loss': 0.9569, 'grad_norm': 1.058103084564209, 'learning_rate': 1.804752887468212e-05, 'epoch': 0.68} +2025-02-05 16:32:17 - ERROR - stderr - 23%|██▎ | 5076/22434 [6:24:37<12:18:39, 2.55s/it] +2025-02-05 16:32:19 - ERROR - stderr - 23%|██▎ | 5077/22434 [6:24:39<12:13:31, 2.54s/it] +2025-02-05 16:32:19 - ERROR - stderr - +2025-02-05 
16:32:19 - ERROR - stderr - +2025-02-05 16:32:19 - INFO - stdout - {'loss': 0.9504, 'grad_norm': 1.0855058431625366, 'learning_rate': 1.8046671768431233e-05, 'epoch': 0.68} +2025-02-05 16:32:19 - ERROR - stderr - 23%|██▎ | 5077/22434 [6:24:39<12:13:31, 2.54s/it] +2025-02-05 16:32:22 - ERROR - stderr - 23%|██▎ | 5078/22434 [6:24:42<12:08:27, 2.52s/it] +2025-02-05 16:32:22 - ERROR - stderr - +2025-02-05 16:32:22 - ERROR - stderr - +2025-02-05 16:32:22 - INFO - stdout - {'loss': 0.9085, 'grad_norm': 1.0597195625305176, 'learning_rate': 1.804581449445517e-05, 'epoch': 0.68} +2025-02-05 16:32:22 - ERROR - stderr - 23%|██▎ | 5078/22434 [6:24:42<12:08:27, 2.52s/it] +2025-02-05 16:32:24 - ERROR - stderr - 23%|██▎ | 5079/22434 [6:24:44<12:07:16, 2.51s/it] +2025-02-05 16:32:24 - ERROR - stderr - +2025-02-05 16:32:24 - ERROR - stderr - +2025-02-05 16:32:24 - INFO - stdout - {'loss': 1.0389, 'grad_norm': 1.0111112594604492, 'learning_rate': 1.8044957052771803e-05, 'epoch': 0.68} +2025-02-05 16:32:24 - ERROR - stderr - 23%|██▎ | 5079/22434 [6:24:44<12:07:16, 2.51s/it] +2025-02-05 16:32:27 - ERROR - stderr - 23%|██▎ | 5080/22434 [6:24:47<12:18:45, 2.55s/it] +2025-02-05 16:32:27 - ERROR - stderr - +2025-02-05 16:32:27 - ERROR - stderr - +2025-02-05 16:32:27 - INFO - stdout - {'loss': 0.9215, 'grad_norm': 0.8890573382377625, 'learning_rate': 1.8044099443399003e-05, 'epoch': 0.68} +2025-02-05 16:32:27 - ERROR - stderr - 23%|██▎ | 5080/22434 [6:24:47<12:18:45, 2.55s/it] +2025-02-05 16:32:29 - ERROR - stderr - 23%|██▎ | 5081/22434 [6:24:49<12:09:33, 2.52s/it] +2025-02-05 16:32:30 - ERROR - stderr - +2025-02-05 16:32:30 - ERROR - stderr - +2025-02-05 16:32:30 - INFO - stdout - {'loss': 0.9368, 'grad_norm': 1.094689130783081, 'learning_rate': 1.804324166635465e-05, 'epoch': 0.68} +2025-02-05 16:32:30 - ERROR - stderr - 23%|██▎ | 5081/22434 [6:24:49<12:09:33, 2.52s/it] +2025-02-05 16:32:32 - ERROR - stderr - 23%|██▎ | 5082/22434 [6:24:52<12:09:37, 2.52s/it] +2025-02-05 16:32:32 - ERROR 
- stderr - +2025-02-05 16:32:32 - ERROR - stderr - +2025-02-05 16:32:32 - INFO - stdout - {'loss': 0.9582, 'grad_norm': 1.1405119895935059, 'learning_rate': 1.8042383721656617e-05, 'epoch': 0.68} +2025-02-05 16:32:32 - ERROR - stderr - 23%|██▎ | 5082/22434 [6:24:52<12:09:37, 2.52s/it] +2025-02-05 16:32:34 - ERROR - stderr - 23%|██▎ | 5083/22434 [6:24:54<12:05:37, 2.51s/it] +2025-02-05 16:32:35 - ERROR - stderr - +2025-02-05 16:32:35 - ERROR - stderr - +2025-02-05 16:32:35 - INFO - stdout - {'loss': 1.1045, 'grad_norm': 1.1554011106491089, 'learning_rate': 1.8041525609322795e-05, 'epoch': 0.68} +2025-02-05 16:32:35 - ERROR - stderr - 23%|██▎ | 5083/22434 [6:24:54<12:05:37, 2.51s/it] +2025-02-05 16:32:37 - ERROR - stderr - 23%|██▎ | 5084/22434 [6:24:57<12:07:53, 2.52s/it] +2025-02-05 16:32:37 - ERROR - stderr - +2025-02-05 16:32:37 - ERROR - stderr - +2025-02-05 16:32:37 - INFO - stdout - {'loss': 1.0195, 'grad_norm': 1.1559550762176514, 'learning_rate': 1.8040667329371063e-05, 'epoch': 0.68} +2025-02-05 16:32:37 - ERROR - stderr - 23%|██▎ | 5084/22434 [6:24:57<12:07:53, 2.52s/it] +2025-02-05 16:32:40 - ERROR - stderr - 23%|██▎ | 5085/22434 [6:24:59<12:24:01, 2.57s/it] +2025-02-05 16:32:40 - ERROR - stderr - +2025-02-05 16:32:40 - ERROR - stderr - +2025-02-05 16:32:40 - INFO - stdout - {'loss': 0.9063, 'grad_norm': 1.0837669372558594, 'learning_rate': 1.8039808881819318e-05, 'epoch': 0.68} +2025-02-05 16:32:40 - ERROR - stderr - 23%|██▎ | 5085/22434 [6:25:00<12:24:01, 2.57s/it] +2025-02-05 16:32:42 - ERROR - stderr - 23%|██▎ | 5086/22434 [6:25:02<12:12:36, 2.53s/it] +2025-02-05 16:32:42 - ERROR - stderr - +2025-02-05 16:32:42 - ERROR - stderr - +2025-02-05 16:32:42 - INFO - stdout - {'loss': 0.8929, 'grad_norm': 1.0689849853515625, 'learning_rate': 1.803895026668545e-05, 'epoch': 0.68} +2025-02-05 16:32:42 - ERROR - stderr - 23%|██▎ | 5086/22434 [6:25:02<12:12:36, 2.53s/it] +2025-02-05 16:32:45 - ERROR - stderr - 23%|██▎ | 5087/22434 [6:25:04<12:05:05, 2.51s/it] 
+2025-02-05 16:32:45 - ERROR - stderr - +2025-02-05 16:32:45 - ERROR - stderr - +2025-02-05 16:32:45 - INFO - stdout - {'loss': 0.8775, 'grad_norm': 1.1741976737976074, 'learning_rate': 1.8038091483987357e-05, 'epoch': 0.68} +2025-02-05 16:32:45 - ERROR - stderr - 23%|██▎ | 5087/22434 [6:25:04<12:05:05, 2.51s/it] +2025-02-05 16:32:47 - ERROR - stderr - 23%|██▎ | 5088/22434 [6:25:07<12:03:18, 2.50s/it] +2025-02-05 16:32:47 - ERROR - stderr - +2025-02-05 16:32:47 - ERROR - stderr - +2025-02-05 16:32:47 - INFO - stdout - {'loss': 1.0531, 'grad_norm': 1.2029422521591187, 'learning_rate': 1.8037232533742936e-05, 'epoch': 0.68} +2025-02-05 16:32:47 - ERROR - stderr - 23%|██▎ | 5088/22434 [6:25:07<12:03:18, 2.50s/it] +2025-02-05 16:32:50 - ERROR - stderr - 23%|██▎ | 5089/22434 [6:25:09<12:10:56, 2.53s/it] +2025-02-05 16:32:50 - ERROR - stderr - +2025-02-05 16:32:50 - ERROR - stderr - +2025-02-05 16:32:50 - INFO - stdout - {'loss': 1.0407, 'grad_norm': 1.0770916938781738, 'learning_rate': 1.8036373415970093e-05, 'epoch': 0.68} +2025-02-05 16:32:50 - ERROR - stderr - 23%|██▎ | 5089/22434 [6:25:10<12:10:56, 2.53s/it] +2025-02-05 16:32:52 - ERROR - stderr - 23%|██▎ | 5090/22434 [6:25:12<12:10:43, 2.53s/it] +2025-02-05 16:32:52 - ERROR - stderr - +2025-02-05 16:32:52 - ERROR - stderr - +2025-02-05 16:32:52 - INFO - stdout - {'loss': 0.8879, 'grad_norm': 0.9712393879890442, 'learning_rate': 1.8035514130686737e-05, 'epoch': 0.68} +2025-02-05 16:32:52 - ERROR - stderr - 23%|██▎ | 5090/22434 [6:25:12<12:10:43, 2.53s/it] +2025-02-05 16:32:55 - ERROR - stderr - 23%|██▎ | 5091/22434 [6:25:14<12:05:43, 2.51s/it] +2025-02-05 16:32:55 - ERROR - stderr - +2025-02-05 16:32:55 - ERROR - stderr - +2025-02-05 16:32:55 - INFO - stdout - {'loss': 0.9188, 'grad_norm': 1.0829929113388062, 'learning_rate': 1.803465467791078e-05, 'epoch': 0.68} +2025-02-05 16:32:55 - ERROR - stderr - 23%|██▎ | 5091/22434 [6:25:14<12:05:43, 2.51s/it] +2025-02-05 16:32:57 - ERROR - stderr - 23%|██▎ | 5092/22434 
[6:25:17<12:11:28, 2.53s/it] +2025-02-05 16:32:57 - ERROR - stderr - +2025-02-05 16:32:57 - ERROR - stderr - +2025-02-05 16:32:57 - INFO - stdout - {'loss': 0.7929, 'grad_norm': 1.0641124248504639, 'learning_rate': 1.8033795057660134e-05, 'epoch': 0.68} +2025-02-05 16:32:57 - ERROR - stderr - 23%|██▎ | 5092/22434 [6:25:17<12:11:28, 2.53s/it] +2025-02-05 16:33:00 - ERROR - stderr - 23%|██▎ | 5093/22434 [6:25:20<12:15:39, 2.55s/it] +2025-02-05 16:33:00 - ERROR - stderr - +2025-02-05 16:33:00 - ERROR - stderr - +2025-02-05 16:33:00 - INFO - stdout - {'loss': 0.9511, 'grad_norm': 1.576263666152954, 'learning_rate': 1.8032935269952714e-05, 'epoch': 0.68} +2025-02-05 16:33:00 - ERROR - stderr - 23%|██▎ | 5093/22434 [6:25:20<12:15:39, 2.55s/it] +2025-02-05 16:33:02 - ERROR - stderr - 23%|██▎ | 5094/22434 [6:25:22<12:06:01, 2.51s/it] +2025-02-05 16:33:02 - ERROR - stderr - +2025-02-05 16:33:02 - ERROR - stderr - +2025-02-05 16:33:02 - INFO - stdout - {'loss': 0.9749, 'grad_norm': 1.057915449142456, 'learning_rate': 1.803207531480645e-05, 'epoch': 0.68} +2025-02-05 16:33:02 - ERROR - stderr - 23%|██▎ | 5094/22434 [6:25:22<12:06:01, 2.51s/it] +2025-02-05 16:33:05 - ERROR - stderr - 23%|██▎ | 5095/22434 [6:25:25<12:14:10, 2.54s/it] +2025-02-05 16:33:05 - ERROR - stderr - +2025-02-05 16:33:05 - ERROR - stderr - +2025-02-05 16:33:05 - INFO - stdout - {'loss': 0.9927, 'grad_norm': 1.077998161315918, 'learning_rate': 1.803121519223926e-05, 'epoch': 0.68} +2025-02-05 16:33:05 - ERROR - stderr - 23%|██▎ | 5095/22434 [6:25:25<12:14:10, 2.54s/it] +2025-02-05 16:33:07 - ERROR - stderr - 23%|██▎ | 5096/22434 [6:25:27<12:18:24, 2.56s/it] +2025-02-05 16:33:08 - ERROR - stderr - +2025-02-05 16:33:08 - ERROR - stderr - +2025-02-05 16:33:08 - INFO - stdout - {'loss': 1.0748, 'grad_norm': 1.2218754291534424, 'learning_rate': 1.8030354902269077e-05, 'epoch': 0.68} +2025-02-05 16:33:08 - ERROR - stderr - 23%|██▎ | 5096/22434 [6:25:27<12:18:24, 2.56s/it] +2025-02-05 16:33:10 - ERROR - stderr - 
23%|██▎ | 5097/22434 [6:25:30<12:50:07, 2.67s/it] +2025-02-05 16:33:10 - ERROR - stderr - +2025-02-05 16:33:10 - ERROR - stderr - +2025-02-05 16:33:10 - INFO - stdout - {'loss': 0.9096, 'grad_norm': 1.1164921522140503, 'learning_rate': 1.8029494444913825e-05, 'epoch': 0.68} +2025-02-05 16:33:10 - ERROR - stderr - 23%|██▎ | 5097/22434 [6:25:30<12:50:07, 2.67s/it] +2025-02-05 16:33:13 - ERROR - stderr - 23%|██▎ | 5098/22434 [6:25:33<12:47:03, 2.65s/it] +2025-02-05 16:33:13 - ERROR - stderr - +2025-02-05 16:33:13 - ERROR - stderr - +2025-02-05 16:33:13 - INFO - stdout - {'loss': 1.0513, 'grad_norm': 1.3206048011779785, 'learning_rate': 1.8028633820191448e-05, 'epoch': 0.68} +2025-02-05 16:33:13 - ERROR - stderr - 23%|██▎ | 5098/22434 [6:25:33<12:47:03, 2.65s/it] +2025-02-05 16:33:16 - ERROR - stderr - 23%|██▎ | 5099/22434 [6:25:35<12:34:51, 2.61s/it] +2025-02-05 16:33:16 - ERROR - stderr - +2025-02-05 16:33:16 - ERROR - stderr - +2025-02-05 16:33:16 - INFO - stdout - {'loss': 0.9239, 'grad_norm': 1.0226329565048218, 'learning_rate': 1.8027773028119878e-05, 'epoch': 0.68} +2025-02-05 16:33:16 - ERROR - stderr - 23%|██▎ | 5099/22434 [6:25:35<12:34:51, 2.61s/it] +2025-02-05 16:33:18 - ERROR - stderr - 23%|██▎ | 5100/22434 [6:25:38<12:29:41, 2.60s/it] +2025-02-05 16:33:18 - ERROR - stderr - +2025-02-05 16:33:18 - ERROR - stderr - +2025-02-05 16:33:18 - INFO - stdout - {'loss': 1.0135, 'grad_norm': 1.1730430126190186, 'learning_rate': 1.8026912068717064e-05, 'epoch': 0.68} +2025-02-05 16:33:18 - ERROR - stderr - 23%|██▎ | 5100/22434 [6:25:38<12:29:41, 2.60s/it] +2025-02-05 16:33:21 - ERROR - stderr - 23%|██▎ | 5101/22434 [6:25:40<12:22:53, 2.57s/it] +2025-02-05 16:33:21 - ERROR - stderr - +2025-02-05 16:33:21 - ERROR - stderr - +2025-02-05 16:33:21 - INFO - stdout - {'loss': 0.7907, 'grad_norm': 1.0840502977371216, 'learning_rate': 1.8026050942000946e-05, 'epoch': 0.68} +2025-02-05 16:33:21 - ERROR - stderr - 23%|██▎ | 5101/22434 [6:25:40<12:22:53, 2.57s/it] +2025-02-05 
16:33:23 - ERROR - stderr - 23%|██▎ | 5102/22434 [6:25:43<12:16:07, 2.55s/it] +2025-02-05 16:33:23 - ERROR - stderr - +2025-02-05 16:33:23 - ERROR - stderr - +2025-02-05 16:33:23 - INFO - stdout - {'loss': 0.9023, 'grad_norm': 1.049568772315979, 'learning_rate': 1.8025189647989483e-05, 'epoch': 0.68} +2025-02-05 16:33:23 - ERROR - stderr - 23%|██▎ | 5102/22434 [6:25:43<12:16:07, 2.55s/it] +2025-02-05 16:33:26 - ERROR - stderr - 23%|██▎ | 5103/22434 [6:25:45<12:18:41, 2.56s/it] +2025-02-05 16:33:26 - ERROR - stderr - +2025-02-05 16:33:26 - ERROR - stderr - +2025-02-05 16:33:26 - INFO - stdout - {'loss': 1.0354, 'grad_norm': 1.0245225429534912, 'learning_rate': 1.8024328186700616e-05, 'epoch': 0.68} +2025-02-05 16:33:26 - ERROR - stderr - 23%|██▎ | 5103/22434 [6:25:45<12:18:41, 2.56s/it] +2025-02-05 16:33:28 - ERROR - stderr - 23%|██▎ | 5104/22434 [6:25:48<12:11:57, 2.53s/it] +2025-02-05 16:33:28 - ERROR - stderr - +2025-02-05 16:33:28 - ERROR - stderr - +2025-02-05 16:33:28 - INFO - stdout - {'loss': 0.9803, 'grad_norm': 0.9409737586975098, 'learning_rate': 1.8023466558152308e-05, 'epoch': 0.68} +2025-02-05 16:33:28 - ERROR - stderr - 23%|██▎ | 5104/22434 [6:25:48<12:11:57, 2.53s/it] +2025-02-05 16:33:31 - ERROR - stderr - 23%|██▎ | 5105/22434 [6:25:50<12:10:14, 2.53s/it] +2025-02-05 16:33:31 - ERROR - stderr - +2025-02-05 16:33:31 - ERROR - stderr - +2025-02-05 16:33:31 - INFO - stdout - {'loss': 0.9058, 'grad_norm': 1.1060967445373535, 'learning_rate': 1.8022604762362514e-05, 'epoch': 0.68} +2025-02-05 16:33:31 - ERROR - stderr - 23%|██▎ | 5105/22434 [6:25:50<12:10:14, 2.53s/it] +2025-02-05 16:33:33 - ERROR - stderr - 23%|██▎ | 5106/22434 [6:25:53<12:01:45, 2.50s/it] +2025-02-05 16:33:33 - ERROR - stderr - +2025-02-05 16:33:33 - ERROR - stderr - +2025-02-05 16:33:33 - INFO - stdout - {'loss': 0.9523, 'grad_norm': 1.1317620277404785, 'learning_rate': 1.8021742799349206e-05, 'epoch': 0.68} +2025-02-05 16:33:33 - ERROR - stderr - 23%|██▎ | 5106/22434 
[6:25:53<12:01:45, 2.50s/it] +2025-02-05 16:33:36 - ERROR - stderr - 23%|██▎ | 5107/22434 [6:25:55<11:59:21, 2.49s/it] +2025-02-05 16:33:36 - ERROR - stderr - +2025-02-05 16:33:36 - ERROR - stderr - +2025-02-05 16:33:36 - INFO - stdout - {'loss': 1.0111, 'grad_norm': 1.2041938304901123, 'learning_rate': 1.802088066913034e-05, 'epoch': 0.68} +2025-02-05 16:33:36 - ERROR - stderr - 23%|██▎ | 5107/22434 [6:25:55<11:59:21, 2.49s/it] +2025-02-05 16:33:38 - ERROR - stderr - 23%|██▎ | 5108/22434 [6:25:58<12:10:57, 2.53s/it] +2025-02-05 16:33:38 - ERROR - stderr - +2025-02-05 16:33:38 - ERROR - stderr - +2025-02-05 16:33:38 - INFO - stdout - {'loss': 0.9488, 'grad_norm': 1.054218053817749, 'learning_rate': 1.8020018371723895e-05, 'epoch': 0.68} +2025-02-05 16:33:38 - ERROR - stderr - 23%|██▎ | 5108/22434 [6:25:58<12:10:57, 2.53s/it] +2025-02-05 16:33:41 - ERROR - stderr - 23%|██▎ | 5109/22434 [6:26:00<12:05:24, 2.51s/it] +2025-02-05 16:33:41 - ERROR - stderr - +2025-02-05 16:33:41 - ERROR - stderr - +2025-02-05 16:33:41 - INFO - stdout - {'loss': 0.9669, 'grad_norm': 1.1941221952438354, 'learning_rate': 1.801915590714784e-05, 'epoch': 0.68} +2025-02-05 16:33:41 - ERROR - stderr - 23%|██▎ | 5109/22434 [6:26:00<12:05:24, 2.51s/it] +2025-02-05 16:33:43 - ERROR - stderr - 23%|██▎ | 5110/22434 [6:26:03<12:00:15, 2.49s/it] +2025-02-05 16:33:43 - ERROR - stderr - +2025-02-05 16:33:43 - ERROR - stderr - +2025-02-05 16:33:43 - INFO - stdout - {'loss': 0.8966, 'grad_norm': 1.0763728618621826, 'learning_rate': 1.8018293275420156e-05, 'epoch': 0.68} +2025-02-05 16:33:43 - ERROR - stderr - 23%|██▎ | 5110/22434 [6:26:03<12:00:15, 2.49s/it] +2025-02-05 16:33:46 - ERROR - stderr - 23%|██▎ | 5111/22434 [6:26:05<12:00:11, 2.49s/it] +2025-02-05 16:33:46 - ERROR - stderr - +2025-02-05 16:33:46 - ERROR - stderr - +2025-02-05 16:33:46 - INFO - stdout - {'loss': 0.8978, 'grad_norm': 1.0471513271331787, 'learning_rate': 1.801743047655882e-05, 'epoch': 0.68} +2025-02-05 16:33:46 - ERROR - stderr - 
23%|██▎ | 5111/22434 [6:26:05<12:00:11, 2.49s/it] +2025-02-05 16:33:48 - ERROR - stderr - 23%|██▎ | 5112/22434 [6:26:08<11:56:05, 2.48s/it] +2025-02-05 16:33:48 - ERROR - stderr - +2025-02-05 16:33:48 - ERROR - stderr - +2025-02-05 16:33:48 - INFO - stdout - {'loss': 0.9878, 'grad_norm': 1.0998284816741943, 'learning_rate': 1.8016567510581814e-05, 'epoch': 0.68} +2025-02-05 16:33:48 - ERROR - stderr - 23%|██▎ | 5112/22434 [6:26:08<11:56:05, 2.48s/it] +2025-02-05 16:33:51 - ERROR - stderr - 23%|██▎ | 5113/22434 [6:26:10<11:52:37, 2.47s/it] +2025-02-05 16:33:51 - ERROR - stderr - +2025-02-05 16:33:51 - ERROR - stderr - +2025-02-05 16:33:51 - INFO - stdout - {'loss': 1.0458, 'grad_norm': 1.173107624053955, 'learning_rate': 1.801570437750713e-05, 'epoch': 0.68} +2025-02-05 16:33:51 - ERROR - stderr - 23%|██▎ | 5113/22434 [6:26:10<11:52:37, 2.47s/it] +2025-02-05 16:33:53 - ERROR - stderr - 23%|██▎ | 5114/22434 [6:26:13<11:51:09, 2.46s/it] +2025-02-05 16:33:53 - ERROR - stderr - +2025-02-05 16:33:53 - ERROR - stderr - +2025-02-05 16:33:53 - INFO - stdout - {'loss': 0.9432, 'grad_norm': 1.088143229484558, 'learning_rate': 1.8014841077352764e-05, 'epoch': 0.68} +2025-02-05 16:33:53 - ERROR - stderr - 23%|██▎ | 5114/22434 [6:26:13<11:51:09, 2.46s/it] +2025-02-05 16:33:55 - ERROR - stderr - 23%|██▎ | 5115/22434 [6:26:15<11:50:23, 2.46s/it] +2025-02-05 16:33:55 - ERROR - stderr - +2025-02-05 16:33:55 - ERROR - stderr - +2025-02-05 16:33:55 - INFO - stdout - {'loss': 0.941, 'grad_norm': 1.123342514038086, 'learning_rate': 1.8013977610136698e-05, 'epoch': 0.68} +2025-02-05 16:33:55 - ERROR - stderr - 23%|██▎ | 5115/22434 [6:26:15<11:50:23, 2.46s/it] +2025-02-05 16:33:58 - ERROR - stderr - 23%|██▎ | 5116/22434 [6:26:18<11:56:04, 2.48s/it] +2025-02-05 16:33:58 - ERROR - stderr - +2025-02-05 16:33:58 - ERROR - stderr - +2025-02-05 16:33:58 - INFO - stdout - {'loss': 0.808, 'grad_norm': 1.0366772413253784, 'learning_rate': 1.8013113975876942e-05, 'epoch': 0.68} +2025-02-05 16:33:58 
- ERROR - stderr - 23%|██▎ | 5116/22434 [6:26:18<11:56:04, 2.48s/it] +2025-02-05 16:34:00 - ERROR - stderr - 23%|██▎ | 5117/22434 [6:26:20<11:56:59, 2.48s/it] +2025-02-05 16:34:00 - ERROR - stderr - +2025-02-05 16:34:00 - ERROR - stderr - +2025-02-05 16:34:00 - INFO - stdout - {'loss': 0.8577, 'grad_norm': 1.0562697649002075, 'learning_rate': 1.8012250174591492e-05, 'epoch': 0.68} +2025-02-05 16:34:00 - ERROR - stderr - 23%|██▎ | 5117/22434 [6:26:20<11:56:59, 2.48s/it] +2025-02-05 16:34:03 - ERROR - stderr - 23%|██▎ | 5118/22434 [6:26:23<11:56:58, 2.48s/it] +2025-02-05 16:34:03 - ERROR - stderr - +2025-02-05 16:34:03 - ERROR - stderr - +2025-02-05 16:34:03 - INFO - stdout - {'loss': 1.095, 'grad_norm': 1.283618688583374, 'learning_rate': 1.8011386206298357e-05, 'epoch': 0.68} +2025-02-05 16:34:03 - ERROR - stderr - 23%|██▎ | 5118/22434 [6:26:23<11:56:58, 2.48s/it] +2025-02-05 16:34:05 - ERROR - stderr - 23%|██▎ | 5119/22434 [6:26:25<11:54:51, 2.48s/it] +2025-02-05 16:34:05 - ERROR - stderr - +2025-02-05 16:34:05 - ERROR - stderr - +2025-02-05 16:34:05 - INFO - stdout - {'loss': 0.8278, 'grad_norm': 0.9584662318229675, 'learning_rate': 1.8010522071015537e-05, 'epoch': 0.68} +2025-02-05 16:34:05 - ERROR - stderr - 23%|██▎ | 5119/22434 [6:26:25<11:54:51, 2.48s/it] +2025-02-05 16:34:08 - ERROR - stderr - 23%|██▎ | 5120/22434 [6:26:28<12:02:45, 2.50s/it] +2025-02-05 16:34:08 - ERROR - stderr - +2025-02-05 16:34:08 - ERROR - stderr - +2025-02-05 16:34:08 - INFO - stdout - {'loss': 1.0009, 'grad_norm': 1.0604195594787598, 'learning_rate': 1.8009657768761052e-05, 'epoch': 0.68} +2025-02-05 16:34:08 - ERROR - stderr - 23%|██▎ | 5120/22434 [6:26:28<12:02:45, 2.50s/it] +2025-02-05 16:34:10 - ERROR - stderr - 23%|██▎ | 5121/22434 [6:26:30<12:00:34, 2.50s/it] +2025-02-05 16:34:10 - ERROR - stderr - +2025-02-05 16:34:10 - ERROR - stderr - +2025-02-05 16:34:10 - INFO - stdout - {'loss': 0.9388, 'grad_norm': 1.0978963375091553, 'learning_rate': 1.8008793299552914e-05, 'epoch': 
0.68} +2025-02-05 16:34:10 - ERROR - stderr - 23%|██▎ | 5121/22434 [6:26:30<12:00:34, 2.50s/it] +2025-02-05 16:34:13 - ERROR - stderr - 23%|██▎ | 5122/22434 [6:26:33<11:55:04, 2.48s/it] +2025-02-05 16:34:13 - ERROR - stderr - +2025-02-05 16:34:13 - ERROR - stderr - +2025-02-05 16:34:13 - INFO - stdout - {'loss': 0.9831, 'grad_norm': 1.1427022218704224, 'learning_rate': 1.8007928663409148e-05, 'epoch': 0.68} +2025-02-05 16:34:13 - ERROR - stderr - 23%|██▎ | 5122/22434 [6:26:33<11:55:04, 2.48s/it] +2025-02-05 16:34:15 - ERROR - stderr - 23%|██▎ | 5123/22434 [6:26:35<11:58:06, 2.49s/it] +2025-02-05 16:34:15 - ERROR - stderr - +2025-02-05 16:34:15 - ERROR - stderr - +2025-02-05 16:34:15 - INFO - stdout - {'loss': 0.9301, 'grad_norm': 1.060240387916565, 'learning_rate': 1.8007063860347768e-05, 'epoch': 0.69} +2025-02-05 16:34:15 - ERROR - stderr - 23%|██▎ | 5123/22434 [6:26:35<11:58:06, 2.49s/it] +2025-02-05 16:34:18 - ERROR - stderr - 23%|██▎ | 5124/22434 [6:26:38<12:03:45, 2.51s/it] +2025-02-05 16:34:18 - ERROR - stderr - +2025-02-05 16:34:18 - ERROR - stderr - +2025-02-05 16:34:18 - INFO - stdout - {'loss': 1.0026, 'grad_norm': 1.0550285577774048, 'learning_rate': 1.8006198890386802e-05, 'epoch': 0.69} +2025-02-05 16:34:18 - ERROR - stderr - 23%|██▎ | 5124/22434 [6:26:38<12:03:45, 2.51s/it] +2025-02-05 16:34:20 - ERROR - stderr - 23%|██▎ | 5125/22434 [6:26:40<12:08:26, 2.53s/it] +2025-02-05 16:34:21 - ERROR - stderr - +2025-02-05 16:34:21 - ERROR - stderr - +2025-02-05 16:34:21 - INFO - stdout - {'loss': 1.0482, 'grad_norm': 1.1321195363998413, 'learning_rate': 1.8005333753544283e-05, 'epoch': 0.69} +2025-02-05 16:34:21 - ERROR - stderr - 23%|██▎ | 5125/22434 [6:26:40<12:08:26, 2.53s/it] +2025-02-05 16:34:23 - ERROR - stderr - 23%|██▎ | 5126/22434 [6:26:43<12:08:02, 2.52s/it] +2025-02-05 16:34:23 - ERROR - stderr - +2025-02-05 16:34:23 - ERROR - stderr - +2025-02-05 16:34:23 - INFO - stdout - {'loss': 0.9728, 'grad_norm': 1.0665620565414429, 'learning_rate': 
1.8004468449838245e-05, 'epoch': 0.69} +2025-02-05 16:34:23 - ERROR - stderr - 23%|██▎ | 5126/22434 [6:26:43<12:08:02, 2.52s/it] +2025-02-05 16:34:25 - ERROR - stderr - 23%|██▎ | 5127/22434 [6:26:45<12:01:23, 2.50s/it] +2025-02-05 16:34:26 - ERROR - stderr - +2025-02-05 16:34:26 - ERROR - stderr - +2025-02-05 16:34:26 - INFO - stdout - {'loss': 0.9197, 'grad_norm': 1.1393606662750244, 'learning_rate': 1.8003602979286717e-05, 'epoch': 0.69} +2025-02-05 16:34:26 - ERROR - stderr - 23%|██▎ | 5127/22434 [6:26:45<12:01:23, 2.50s/it] +2025-02-05 16:34:28 - ERROR - stderr - 23%|██▎ | 5128/22434 [6:26:48<12:03:29, 2.51s/it] +2025-02-05 16:34:28 - ERROR - stderr - +2025-02-05 16:34:28 - ERROR - stderr - +2025-02-05 16:34:28 - INFO - stdout - {'loss': 1.0298, 'grad_norm': 1.111890435218811, 'learning_rate': 1.8002737341907743e-05, 'epoch': 0.69} +2025-02-05 16:34:28 - ERROR - stderr - 23%|██▎ | 5128/22434 [6:26:48<12:03:29, 2.51s/it] +2025-02-05 16:34:31 - ERROR - stderr - 23%|██▎ | 5129/22434 [6:26:50<12:06:16, 2.52s/it] +2025-02-05 16:34:31 - ERROR - stderr - +2025-02-05 16:34:31 - ERROR - stderr - +2025-02-05 16:34:31 - INFO - stdout - {'loss': 0.9258, 'grad_norm': 1.1211916208267212, 'learning_rate': 1.800187153771937e-05, 'epoch': 0.69} +2025-02-05 16:34:31 - ERROR - stderr - 23%|██▎ | 5129/22434 [6:26:50<12:06:16, 2.52s/it] +2025-02-05 16:34:33 - ERROR - stderr - 23%|██▎ | 5130/22434 [6:26:53<11:59:51, 2.50s/it] +2025-02-05 16:34:33 - ERROR - stderr - +2025-02-05 16:34:33 - ERROR - stderr - +2025-02-05 16:34:33 - INFO - stdout - {'loss': 1.0111, 'grad_norm': 1.0774627923965454, 'learning_rate': 1.800100556673964e-05, 'epoch': 0.69} +2025-02-05 16:34:33 - ERROR - stderr - 23%|██▎ | 5130/22434 [6:26:53<11:59:51, 2.50s/it] +2025-02-05 16:34:36 - ERROR - stderr - 23%|██▎ | 5131/22434 [6:26:55<12:03:49, 2.51s/it] +2025-02-05 16:34:36 - ERROR - stderr - +2025-02-05 16:34:36 - ERROR - stderr - +2025-02-05 16:34:36 - INFO - stdout - {'loss': 0.8513, 'grad_norm': 
0.9830366969108582, 'learning_rate': 1.800013942898661e-05, 'epoch': 0.69} +2025-02-05 16:34:36 - ERROR - stderr - 23%|██▎ | 5131/22434 [6:26:55<12:03:49, 2.51s/it] +2025-02-05 16:34:38 - ERROR - stderr - 23%|██▎ | 5132/22434 [6:26:58<12:05:07, 2.51s/it] +2025-02-05 16:34:38 - ERROR - stderr - +2025-02-05 16:34:38 - ERROR - stderr - +2025-02-05 16:34:38 - INFO - stdout - {'loss': 0.9994, 'grad_norm': 1.2034239768981934, 'learning_rate': 1.7999273124478324e-05, 'epoch': 0.69} +2025-02-05 16:34:38 - ERROR - stderr - 23%|██▎ | 5132/22434 [6:26:58<12:05:07, 2.51s/it] +2025-02-05 16:34:40 - ERROR - stderr - 23%|██▎ | 5133/22434 [6:27:00<11:59:31, 2.50s/it] +2025-02-05 16:34:41 - ERROR - stderr - +2025-02-05 16:34:41 - ERROR - stderr - +2025-02-05 16:34:41 - INFO - stdout - {'loss': 1.0047, 'grad_norm': 1.1258162260055542, 'learning_rate': 1.7998406653232842e-05, 'epoch': 0.69} +2025-02-05 16:34:41 - ERROR - stderr - 23%|██▎ | 5133/22434 [6:27:00<11:59:31, 2.50s/it] +2025-02-05 16:34:43 - ERROR - stderr - 23%|██▎ | 5134/22434 [6:27:03<11:56:29, 2.48s/it] +2025-02-05 16:34:43 - ERROR - stderr - +2025-02-05 16:34:43 - ERROR - stderr - +2025-02-05 16:34:43 - INFO - stdout - {'loss': 0.9751, 'grad_norm': 1.1947698593139648, 'learning_rate': 1.7997540015268234e-05, 'epoch': 0.69} +2025-02-05 16:34:43 - ERROR - stderr - 23%|██▎ | 5134/22434 [6:27:03<11:56:29, 2.48s/it] +2025-02-05 16:34:45 - ERROR - stderr - 23%|██▎ | 5135/22434 [6:27:05<11:50:53, 2.47s/it] +2025-02-05 16:34:45 - ERROR - stderr - +2025-02-05 16:34:45 - ERROR - stderr - +2025-02-05 16:34:45 - INFO - stdout - {'loss': 0.9367, 'grad_norm': 1.1146042346954346, 'learning_rate': 1.7996673210602555e-05, 'epoch': 0.69} +2025-02-05 16:34:45 - ERROR - stderr - 23%|██▎ | 5135/22434 [6:27:05<11:50:53, 2.47s/it] +2025-02-05 16:34:48 - ERROR - stderr - 23%|██▎ | 5136/22434 [6:27:08<11:52:12, 2.47s/it] +2025-02-05 16:34:48 - ERROR - stderr - +2025-02-05 16:34:48 - ERROR - stderr - +2025-02-05 16:34:48 - INFO - stdout - 
{'loss': 0.9517, 'grad_norm': 1.0870232582092285, 'learning_rate': 1.7995806239253873e-05, 'epoch': 0.69} +2025-02-05 16:34:48 - ERROR - stderr - 23%|██▎ | 5136/22434 [6:27:08<11:52:12, 2.47s/it] +2025-02-05 16:34:50 - ERROR - stderr - 23%|██▎ | 5137/22434 [6:27:10<11:59:59, 2.50s/it] +2025-02-05 16:34:50 - ERROR - stderr - +2025-02-05 16:34:50 - ERROR - stderr - +2025-02-05 16:34:50 - INFO - stdout - {'loss': 0.9957, 'grad_norm': 1.0905252695083618, 'learning_rate': 1.799493910124026e-05, 'epoch': 0.69} +2025-02-05 16:34:50 - ERROR - stderr - 23%|██▎ | 5137/22434 [6:27:10<11:59:59, 2.50s/it] +2025-02-05 16:34:53 - ERROR - stderr - 23%|██▎ | 5138/22434 [6:27:13<12:01:06, 2.50s/it] +2025-02-05 16:34:53 - ERROR - stderr - +2025-02-05 16:34:53 - ERROR - stderr - +2025-02-05 16:34:53 - INFO - stdout - {'loss': 0.9696, 'grad_norm': 1.0507646799087524, 'learning_rate': 1.7994071796579794e-05, 'epoch': 0.69} +2025-02-05 16:34:53 - ERROR - stderr - 23%|██▎ | 5138/22434 [6:27:13<12:01:06, 2.50s/it] +2025-02-05 16:34:56 - ERROR - stderr - 23%|██▎ | 5139/22434 [6:27:15<12:20:01, 2.57s/it] +2025-02-05 16:34:56 - ERROR - stderr - +2025-02-05 16:34:56 - ERROR - stderr - +2025-02-05 16:34:56 - INFO - stdout - {'loss': 1.0902, 'grad_norm': 1.0436795949935913, 'learning_rate': 1.799320432529055e-05, 'epoch': 0.69} +2025-02-05 16:34:56 - ERROR - stderr - 23%|██▎ | 5139/22434 [6:27:15<12:20:01, 2.57s/it] +2025-02-05 16:34:58 - ERROR - stderr - 23%|██▎ | 5140/22434 [6:27:18<12:13:29, 2.54s/it] +2025-02-05 16:34:58 - ERROR - stderr - +2025-02-05 16:34:58 - ERROR - stderr - +2025-02-05 16:34:58 - INFO - stdout - {'loss': 0.8826, 'grad_norm': 1.0312986373901367, 'learning_rate': 1.799233668739061e-05, 'epoch': 0.69} +2025-02-05 16:34:58 - ERROR - stderr - 23%|██▎ | 5140/22434 [6:27:18<12:13:29, 2.54s/it] +2025-02-05 16:35:01 - ERROR - stderr - 23%|██▎ | 5141/22434 [6:27:20<12:07:09, 2.52s/it] +2025-02-05 16:35:01 - ERROR - stderr - +2025-02-05 16:35:01 - ERROR - stderr - +2025-02-05 
16:35:01 - INFO - stdout - {'loss': 0.8882, 'grad_norm': 1.0144051313400269, 'learning_rate': 1.799146888289806e-05, 'epoch': 0.69} +2025-02-05 16:35:01 - ERROR - stderr - 23%|██▎ | 5141/22434 [6:27:20<12:07:09, 2.52s/it] +2025-02-05 16:35:03 - ERROR - stderr - 23%|██▎ | 5142/22434 [6:27:23<12:03:06, 2.51s/it] +2025-02-05 16:35:03 - ERROR - stderr - +2025-02-05 16:35:03 - ERROR - stderr - +2025-02-05 16:35:03 - INFO - stdout - {'loss': 0.938, 'grad_norm': 1.09243643283844, 'learning_rate': 1.7990600911830988e-05, 'epoch': 0.69} +2025-02-05 16:35:03 - ERROR - stderr - 23%|██▎ | 5142/22434 [6:27:23<12:03:06, 2.51s/it] +2025-02-05 16:35:06 - ERROR - stderr - 23%|██▎ | 5143/22434 [6:27:25<12:03:26, 2.51s/it] +2025-02-05 16:35:06 - ERROR - stderr - +2025-02-05 16:35:06 - ERROR - stderr - +2025-02-05 16:35:06 - INFO - stdout - {'loss': 0.9108, 'grad_norm': 1.116445541381836, 'learning_rate': 1.7989732774207486e-05, 'epoch': 0.69} +2025-02-05 16:35:06 - ERROR - stderr - 23%|██▎ | 5143/22434 [6:27:25<12:03:26, 2.51s/it] +2025-02-05 16:35:08 - ERROR - stderr - 23%|██▎ | 5144/22434 [6:27:28<12:07:16, 2.52s/it] +2025-02-05 16:35:08 - ERROR - stderr - +2025-02-05 16:35:08 - ERROR - stderr - +2025-02-05 16:35:08 - INFO - stdout - {'loss': 0.9364, 'grad_norm': 1.0592668056488037, 'learning_rate': 1.798886447004565e-05, 'epoch': 0.69} +2025-02-05 16:35:08 - ERROR - stderr - 23%|██▎ | 5144/22434 [6:27:28<12:07:16, 2.52s/it] +2025-02-05 16:35:11 - ERROR - stderr - 23%|██▎ | 5145/22434 [6:27:30<12:04:16, 2.51s/it] +2025-02-05 16:35:11 - ERROR - stderr - +2025-02-05 16:35:11 - ERROR - stderr - +2025-02-05 16:35:11 - INFO - stdout - {'loss': 1.0881, 'grad_norm': 1.0879862308502197, 'learning_rate': 1.798799599936358e-05, 'epoch': 0.69} +2025-02-05 16:35:11 - ERROR - stderr - 23%|██▎ | 5145/22434 [6:27:30<12:04:16, 2.51s/it] +2025-02-05 16:35:13 - ERROR - stderr - 23%|██▎ | 5146/22434 [6:27:33<12:02:04, 2.51s/it] +2025-02-05 16:35:13 - ERROR - stderr - +2025-02-05 16:35:13 - ERROR - 
stderr - +2025-02-05 16:35:13 - INFO - stdout - {'loss': 0.8993, 'grad_norm': 1.022619366645813, 'learning_rate': 1.7987127362179375e-05, 'epoch': 0.69} +2025-02-05 16:35:13 - ERROR - stderr - 23%|██▎ | 5146/22434 [6:27:33<12:02:04, 2.51s/it] +2025-02-05 16:35:16 - ERROR - stderr - 23%|██▎ | 5147/22434 [6:27:36<12:29:02, 2.60s/it] +2025-02-05 16:35:16 - ERROR - stderr - +2025-02-05 16:35:16 - ERROR - stderr - +2025-02-05 16:35:16 - INFO - stdout - {'loss': 0.9809, 'grad_norm': 1.0596449375152588, 'learning_rate': 1.7986258558511146e-05, 'epoch': 0.69} +2025-02-05 16:35:16 - ERROR - stderr - 23%|██▎ | 5147/22434 [6:27:36<12:29:02, 2.60s/it] +2025-02-05 16:35:18 - ERROR - stderr - 23%|██▎ | 5148/22434 [6:27:38<12:15:21, 2.55s/it] +2025-02-05 16:35:18 - ERROR - stderr - +2025-02-05 16:35:18 - ERROR - stderr - +2025-02-05 16:35:18 - INFO - stdout - {'loss': 0.9455, 'grad_norm': 1.019476294517517, 'learning_rate': 1.7985389588377e-05, 'epoch': 0.69} +2025-02-05 16:35:18 - ERROR - stderr - 23%|██▎ | 5148/22434 [6:27:38<12:15:21, 2.55s/it] +2025-02-05 16:35:21 - ERROR - stderr - 23%|██▎ | 5149/22434 [6:27:41<12:07:45, 2.53s/it] +2025-02-05 16:35:21 - ERROR - stderr - +2025-02-05 16:35:21 - ERROR - stderr - +2025-02-05 16:35:21 - INFO - stdout - {'loss': 0.9762, 'grad_norm': 1.0632236003875732, 'learning_rate': 1.7984520451795043e-05, 'epoch': 0.69} +2025-02-05 16:35:21 - ERROR - stderr - 23%|██▎ | 5149/22434 [6:27:41<12:07:45, 2.53s/it] +2025-02-05 16:35:23 - ERROR - stderr - 23%|██▎ | 5150/22434 [6:27:43<12:02:47, 2.51s/it] +2025-02-05 16:35:23 - ERROR - stderr - +2025-02-05 16:35:23 - ERROR - stderr - +2025-02-05 16:35:23 - INFO - stdout - {'loss': 0.9919, 'grad_norm': 1.2100046873092651, 'learning_rate': 1.7983651148783402e-05, 'epoch': 0.69} +2025-02-05 16:35:23 - ERROR - stderr - 23%|██▎ | 5150/22434 [6:27:43<12:02:47, 2.51s/it] +2025-02-05 16:35:26 - ERROR - stderr - 23%|██▎ | 5151/22434 [6:27:46<11:56:11, 2.49s/it] +2025-02-05 16:35:26 - ERROR - stderr - 
+2025-02-05 16:35:26 - ERROR - stderr - +2025-02-05 16:35:26 - INFO - stdout - {'loss': 0.9039, 'grad_norm': 1.1318711042404175, 'learning_rate': 1.798278167936019e-05, 'epoch': 0.69} +2025-02-05 16:35:26 - ERROR - stderr - 23%|██▎ | 5151/22434 [6:27:46<11:56:11, 2.49s/it] +2025-02-05 16:35:28 - ERROR - stderr - 23%|██▎ | 5152/22434 [6:27:48<12:00:30, 2.50s/it] +2025-02-05 16:35:28 - ERROR - stderr - +2025-02-05 16:35:28 - ERROR - stderr - +2025-02-05 16:35:28 - INFO - stdout - {'loss': 0.9316, 'grad_norm': 1.0051398277282715, 'learning_rate': 1.7981912043543535e-05, 'epoch': 0.69} +2025-02-05 16:35:28 - ERROR - stderr - 23%|██▎ | 5152/22434 [6:27:48<12:00:30, 2.50s/it] +2025-02-05 16:35:31 - ERROR - stderr - 23%|██▎ | 5153/22434 [6:27:51<11:57:28, 2.49s/it] +2025-02-05 16:35:31 - ERROR - stderr - +2025-02-05 16:35:31 - ERROR - stderr - +2025-02-05 16:35:31 - INFO - stdout - {'loss': 0.9173, 'grad_norm': 0.9786632657051086, 'learning_rate': 1.798104224135156e-05, 'epoch': 0.69} +2025-02-05 16:35:31 - ERROR - stderr - 23%|██▎ | 5153/22434 [6:27:51<11:57:28, 2.49s/it] +2025-02-05 16:35:34 - ERROR - stderr - 23%|██▎ | 5154/22434 [6:27:53<12:29:29, 2.60s/it] +2025-02-05 16:35:34 - ERROR - stderr - +2025-02-05 16:35:34 - ERROR - stderr - +2025-02-05 16:35:34 - INFO - stdout - {'loss': 0.9628, 'grad_norm': 1.1040886640548706, 'learning_rate': 1.7980172272802398e-05, 'epoch': 0.69} +2025-02-05 16:35:34 - ERROR - stderr - 23%|██▎ | 5154/22434 [6:27:53<12:29:29, 2.60s/it] +2025-02-05 16:35:36 - ERROR - stderr - 23%|██▎ | 5155/22434 [6:27:56<12:15:57, 2.56s/it] +2025-02-05 16:35:36 - ERROR - stderr - +2025-02-05 16:35:36 - ERROR - stderr - +2025-02-05 16:35:36 - INFO - stdout - {'loss': 0.9817, 'grad_norm': 1.1284029483795166, 'learning_rate': 1.797930213791418e-05, 'epoch': 0.69} +2025-02-05 16:35:36 - ERROR - stderr - 23%|██▎ | 5155/22434 [6:27:56<12:15:57, 2.56s/it] +2025-02-05 16:35:39 - ERROR - stderr - 23%|██▎ | 5156/22434 [6:27:58<12:07:35, 2.53s/it] +2025-02-05 
16:35:39 - ERROR - stderr - +2025-02-05 16:35:39 - ERROR - stderr - +2025-02-05 16:35:39 - INFO - stdout - {'loss': 0.9284, 'grad_norm': 1.1185822486877441, 'learning_rate': 1.7978431836705043e-05, 'epoch': 0.69} +2025-02-05 16:35:39 - ERROR - stderr - 23%|██▎ | 5156/22434 [6:27:58<12:07:35, 2.53s/it] +2025-02-05 16:35:41 - ERROR - stderr - 23%|██▎ | 5157/22434 [6:28:01<12:05:45, 2.52s/it] +2025-02-05 16:35:41 - ERROR - stderr - +2025-02-05 16:35:41 - ERROR - stderr - +2025-02-05 16:35:41 - INFO - stdout - {'loss': 0.8564, 'grad_norm': 0.9561588168144226, 'learning_rate': 1.797756136919313e-05, 'epoch': 0.69} +2025-02-05 16:35:41 - ERROR - stderr - 23%|██▎ | 5157/22434 [6:28:01<12:05:45, 2.52s/it] +2025-02-05 16:35:44 - ERROR - stderr - 23%|██▎ | 5158/22434 [6:28:03<12:01:16, 2.50s/it] +2025-02-05 16:35:44 - ERROR - stderr - +2025-02-05 16:35:44 - ERROR - stderr - +2025-02-05 16:35:44 - INFO - stdout - {'loss': 1.0143, 'grad_norm': 1.0426297187805176, 'learning_rate': 1.7976690735396586e-05, 'epoch': 0.69} +2025-02-05 16:35:44 - ERROR - stderr - 23%|██▎ | 5158/22434 [6:28:03<12:01:16, 2.50s/it] +2025-02-05 16:35:46 - ERROR - stderr - 23%|██▎ | 5159/22434 [6:28:06<12:01:43, 2.51s/it] +2025-02-05 16:35:46 - ERROR - stderr - +2025-02-05 16:35:46 - ERROR - stderr - +2025-02-05 16:35:46 - INFO - stdout - {'loss': 0.9952, 'grad_norm': 1.085315465927124, 'learning_rate': 1.7975819935333554e-05, 'epoch': 0.69} +2025-02-05 16:35:46 - ERROR - stderr - 23%|██▎ | 5159/22434 [6:28:06<12:01:43, 2.51s/it] +2025-02-05 16:35:49 - ERROR - stderr - 23%|██▎ | 5160/22434 [6:28:09<12:24:39, 2.59s/it] +2025-02-05 16:35:49 - ERROR - stderr - +2025-02-05 16:35:49 - ERROR - stderr - +2025-02-05 16:35:49 - INFO - stdout - {'loss': 0.9741, 'grad_norm': 0.9940829277038574, 'learning_rate': 1.797494896902219e-05, 'epoch': 0.69} +2025-02-05 16:35:49 - ERROR - stderr - 23%|██▎ | 5160/22434 [6:28:09<12:24:39, 2.59s/it] +2025-02-05 16:35:51 - ERROR - stderr - 23%|██▎ | 5161/22434 [6:28:11<12:17:49, 
2.56s/it] +2025-02-05 16:35:51 - ERROR - stderr - +2025-02-05 16:35:51 - ERROR - stderr - +2025-02-05 16:35:51 - INFO - stdout - {'loss': 0.8824, 'grad_norm': 1.067638874053955, 'learning_rate': 1.797407783648064e-05, 'epoch': 0.69} +2025-02-05 16:35:51 - ERROR - stderr - 23%|██▎ | 5161/22434 [6:28:11<12:17:49, 2.56s/it] +2025-02-05 16:35:54 - ERROR - stderr - 23%|██▎ | 5162/22434 [6:28:14<12:11:54, 2.54s/it] +2025-02-05 16:35:54 - ERROR - stderr - +2025-02-05 16:35:54 - ERROR - stderr - +2025-02-05 16:35:54 - INFO - stdout - {'loss': 1.0436, 'grad_norm': 1.098319172859192, 'learning_rate': 1.797320653772707e-05, 'epoch': 0.69} +2025-02-05 16:35:54 - ERROR - stderr - 23%|██▎ | 5162/22434 [6:28:14<12:11:54, 2.54s/it] +2025-02-05 16:35:56 - ERROR - stderr - 23%|██▎ | 5163/22434 [6:28:16<12:04:56, 2.52s/it] +2025-02-05 16:35:56 - ERROR - stderr - +2025-02-05 16:35:56 - ERROR - stderr - +2025-02-05 16:35:56 - INFO - stdout - {'loss': 1.0688, 'grad_norm': 1.1455984115600586, 'learning_rate': 1.7972335072779646e-05, 'epoch': 0.69} +2025-02-05 16:35:56 - ERROR - stderr - 23%|██▎ | 5163/22434 [6:28:16<12:04:56, 2.52s/it] +2025-02-05 16:35:59 - ERROR - stderr - 23%|██▎ | 5164/22434 [6:28:19<12:02:04, 2.51s/it] +2025-02-05 16:35:59 - ERROR - stderr - +2025-02-05 16:35:59 - ERROR - stderr - +2025-02-05 16:35:59 - INFO - stdout - {'loss': 1.0348, 'grad_norm': 1.0796681642532349, 'learning_rate': 1.797146344165652e-05, 'epoch': 0.69} +2025-02-05 16:35:59 - ERROR - stderr - 23%|██▎ | 5164/22434 [6:28:19<12:02:04, 2.51s/it] +2025-02-05 16:36:02 - ERROR - stderr - 23%|██▎ | 5165/22434 [6:28:22<12:48:01, 2.67s/it] +2025-02-05 16:36:02 - ERROR - stderr - +2025-02-05 16:36:02 - ERROR - stderr - +2025-02-05 16:36:02 - INFO - stdout - {'loss': 1.0522, 'grad_norm': 1.1698533296585083, 'learning_rate': 1.797059164437587e-05, 'epoch': 0.69} +2025-02-05 16:36:02 - ERROR - stderr - 23%|██▎ | 5165/22434 [6:28:22<12:48:01, 2.67s/it] +2025-02-05 16:36:04 - ERROR - stderr - 23%|██▎ | 5166/22434 
[6:28:24<12:46:53, 2.66s/it] +2025-02-05 16:36:04 - ERROR - stderr - +2025-02-05 16:36:04 - ERROR - stderr - +2025-02-05 16:36:04 - INFO - stdout - {'loss': 0.9504, 'grad_norm': 1.1333894729614258, 'learning_rate': 1.796971968095586e-05, 'epoch': 0.69} +2025-02-05 16:36:04 - ERROR - stderr - 23%|██▎ | 5166/22434 [6:28:24<12:46:53, 2.66s/it] +2025-02-05 16:36:07 - ERROR - stderr - 23%|██▎ | 5167/22434 [6:28:27<12:44:12, 2.66s/it] +2025-02-05 16:36:07 - ERROR - stderr - +2025-02-05 16:36:07 - ERROR - stderr - +2025-02-05 16:36:07 - INFO - stdout - {'loss': 1.1082, 'grad_norm': 1.1816248893737793, 'learning_rate': 1.796884755141467e-05, 'epoch': 0.69} +2025-02-05 16:36:07 - ERROR - stderr - 23%|██▎ | 5167/22434 [6:28:27<12:44:12, 2.66s/it] +2025-02-05 16:36:10 - ERROR - stderr - 23%|██▎ | 5168/22434 [6:28:29<12:34:00, 2.62s/it] +2025-02-05 16:36:10 - ERROR - stderr - +2025-02-05 16:36:10 - ERROR - stderr - +2025-02-05 16:36:10 - INFO - stdout - {'loss': 0.9955, 'grad_norm': 1.1587681770324707, 'learning_rate': 1.796797525577048e-05, 'epoch': 0.69} +2025-02-05 16:36:10 - ERROR - stderr - 23%|██▎ | 5168/22434 [6:28:29<12:34:00, 2.62s/it] +2025-02-05 16:36:12 - ERROR - stderr - 23%|██▎ | 5169/22434 [6:28:32<12:19:25, 2.57s/it] +2025-02-05 16:36:12 - ERROR - stderr - +2025-02-05 16:36:12 - ERROR - stderr - +2025-02-05 16:36:12 - INFO - stdout - {'loss': 0.9992, 'grad_norm': 1.1130093336105347, 'learning_rate': 1.796710279404147e-05, 'epoch': 0.69} +2025-02-05 16:36:12 - ERROR - stderr - 23%|██▎ | 5169/22434 [6:28:32<12:19:25, 2.57s/it] +2025-02-05 16:36:14 - ERROR - stderr - 23%|██▎ | 5170/22434 [6:28:34<12:07:25, 2.53s/it] +2025-02-05 16:36:15 - ERROR - stderr - +2025-02-05 16:36:15 - ERROR - stderr - +2025-02-05 16:36:15 - INFO - stdout - {'loss': 0.9963, 'grad_norm': 1.2344483137130737, 'learning_rate': 1.7966230166245825e-05, 'epoch': 0.69} +2025-02-05 16:36:15 - ERROR - stderr - 23%|██▎ | 5170/22434 [6:28:34<12:07:25, 2.53s/it] +2025-02-05 16:36:17 - ERROR - stderr - 
23%|██▎ | 5171/22434 [6:28:37<12:06:10, 2.52s/it] +2025-02-05 16:36:17 - ERROR - stderr - +2025-02-05 16:36:17 - ERROR - stderr - +2025-02-05 16:36:17 - INFO - stdout - {'loss': 0.9422, 'grad_norm': 1.0873537063598633, 'learning_rate': 1.7965357372401733e-05, 'epoch': 0.69} +2025-02-05 16:36:17 - ERROR - stderr - 23%|██▎ | 5171/22434 [6:28:37<12:06:10, 2.52s/it] +2025-02-05 16:36:19 - ERROR - stderr - 23%|██▎ | 5172/22434 [6:28:39<12:02:39, 2.51s/it] +2025-02-05 16:36:20 - ERROR - stderr - +2025-02-05 16:36:20 - ERROR - stderr - +2025-02-05 16:36:20 - INFO - stdout - {'loss': 1.038, 'grad_norm': 1.083786964416504, 'learning_rate': 1.7964484412527394e-05, 'epoch': 0.69} +2025-02-05 16:36:20 - ERROR - stderr - 23%|██▎ | 5172/22434 [6:28:39<12:02:39, 2.51s/it] +2025-02-05 16:36:22 - ERROR - stderr - 23%|██▎ | 5173/22434 [6:28:42<12:08:55, 2.53s/it] +2025-02-05 16:36:22 - ERROR - stderr - +2025-02-05 16:36:22 - ERROR - stderr - +2025-02-05 16:36:22 - INFO - stdout - {'loss': 0.9729, 'grad_norm': 1.1155650615692139, 'learning_rate': 1.7963611286640996e-05, 'epoch': 0.69} +2025-02-05 16:36:22 - ERROR - stderr - 23%|██▎ | 5173/22434 [6:28:42<12:08:55, 2.53s/it] +2025-02-05 16:36:25 - ERROR - stderr - 23%|██▎ | 5174/22434 [6:28:44<12:06:12, 2.52s/it] +2025-02-05 16:36:25 - ERROR - stderr - +2025-02-05 16:36:25 - ERROR - stderr - +2025-02-05 16:36:25 - INFO - stdout - {'loss': 0.9215, 'grad_norm': 1.072467565536499, 'learning_rate': 1.7962737994760743e-05, 'epoch': 0.69} +2025-02-05 16:36:25 - ERROR - stderr - 23%|██▎ | 5174/22434 [6:28:44<12:06:12, 2.52s/it] +2025-02-05 16:36:27 - ERROR - stderr - 23%|██▎ | 5175/22434 [6:28:47<12:03:36, 2.52s/it] +2025-02-05 16:36:27 - ERROR - stderr - +2025-02-05 16:36:27 - ERROR - stderr - +2025-02-05 16:36:27 - INFO - stdout - {'loss': 1.0741, 'grad_norm': 1.2219685316085815, 'learning_rate': 1.796186453690483e-05, 'epoch': 0.69} +2025-02-05 16:36:27 - ERROR - stderr - 23%|██▎ | 5175/22434 [6:28:47<12:03:36, 2.52s/it] +2025-02-05 
16:36:30 - ERROR - stderr - 23%|██▎ | 5176/22434 [6:28:49<12:10:20, 2.54s/it] +2025-02-05 16:36:30 - ERROR - stderr - +2025-02-05 16:36:30 - ERROR - stderr - +2025-02-05 16:36:30 - INFO - stdout - {'loss': 0.8628, 'grad_norm': 0.9055397510528564, 'learning_rate': 1.7960990913091477e-05, 'epoch': 0.69} +2025-02-05 16:36:30 - ERROR - stderr - 23%|██▎ | 5176/22434 [6:28:49<12:10:20, 2.54s/it] +2025-02-05 16:36:32 - ERROR - stderr - 23%|██▎ | 5177/22434 [6:28:52<12:09:16, 2.54s/it] +2025-02-05 16:36:32 - ERROR - stderr - +2025-02-05 16:36:32 - ERROR - stderr - +2025-02-05 16:36:32 - INFO - stdout - {'loss': 0.986, 'grad_norm': 1.066775918006897, 'learning_rate': 1.7960117123338884e-05, 'epoch': 0.69} +2025-02-05 16:36:32 - ERROR - stderr - 23%|██▎ | 5177/22434 [6:28:52<12:09:16, 2.54s/it] +2025-02-05 16:36:35 - ERROR - stderr - 23%|██▎ | 5178/22434 [6:28:54<12:02:31, 2.51s/it] +2025-02-05 16:36:35 - ERROR - stderr - +2025-02-05 16:36:35 - ERROR - stderr - +2025-02-05 16:36:35 - INFO - stdout - {'loss': 0.9648, 'grad_norm': 1.1202335357666016, 'learning_rate': 1.7959243167665263e-05, 'epoch': 0.69} +2025-02-05 16:36:35 - ERROR - stderr - 23%|██▎ | 5178/22434 [6:28:54<12:02:31, 2.51s/it] +2025-02-05 16:36:37 - ERROR - stderr - 23%|██▎ | 5179/22434 [6:28:57<11:57:47, 2.50s/it] +2025-02-05 16:36:37 - ERROR - stderr - +2025-02-05 16:36:37 - ERROR - stderr - +2025-02-05 16:36:37 - INFO - stdout - {'loss': 0.9279, 'grad_norm': 1.1116210222244263, 'learning_rate': 1.7958369046088837e-05, 'epoch': 0.69} +2025-02-05 16:36:37 - ERROR - stderr - 23%|██▎ | 5179/22434 [6:28:57<11:57:47, 2.50s/it] +2025-02-05 16:36:40 - ERROR - stderr - 23%|██▎ | 5180/22434 [6:28:59<11:59:41, 2.50s/it] +2025-02-05 16:36:40 - ERROR - stderr - +2025-02-05 16:36:40 - ERROR - stderr - +2025-02-05 16:36:40 - INFO - stdout - {'loss': 1.0667, 'grad_norm': 1.0598499774932861, 'learning_rate': 1.7957494758627823e-05, 'epoch': 0.69} +2025-02-05 16:36:40 - ERROR - stderr - 23%|██▎ | 5180/22434 
[6:28:59<11:59:41, 2.50s/it] +2025-02-05 16:36:42 - ERROR - stderr - 23%|██▎ | 5181/22434 [6:29:02<11:58:17, 2.50s/it] +2025-02-05 16:36:42 - ERROR - stderr - +2025-02-05 16:36:42 - ERROR - stderr - +2025-02-05 16:36:42 - INFO - stdout - {'loss': 0.9479, 'grad_norm': 1.0478835105895996, 'learning_rate': 1.7956620305300444e-05, 'epoch': 0.69} +2025-02-05 16:36:42 - ERROR - stderr - 23%|██▎ | 5181/22434 [6:29:02<11:58:17, 2.50s/it] +2025-02-05 16:36:45 - ERROR - stderr - 23%|██▎ | 5182/22434 [6:29:04<12:02:00, 2.51s/it] +2025-02-05 16:36:45 - ERROR - stderr - +2025-02-05 16:36:45 - ERROR - stderr - +2025-02-05 16:36:45 - INFO - stdout - {'loss': 1.0016, 'grad_norm': 1.0910258293151855, 'learning_rate': 1.795574568612493e-05, 'epoch': 0.69} +2025-02-05 16:36:45 - ERROR - stderr - 23%|██▎ | 5182/22434 [6:29:04<12:02:00, 2.51s/it] +2025-02-05 16:36:47 - ERROR - stderr - 23%|██▎ | 5183/22434 [6:29:07<12:05:34, 2.52s/it] +2025-02-05 16:36:47 - ERROR - stderr - +2025-02-05 16:36:47 - ERROR - stderr - +2025-02-05 16:36:47 - INFO - stdout - {'loss': 0.9586, 'grad_norm': 1.0974817276000977, 'learning_rate': 1.795487090111951e-05, 'epoch': 0.69} +2025-02-05 16:36:47 - ERROR - stderr - 23%|██▎ | 5183/22434 [6:29:07<12:05:34, 2.52s/it] +2025-02-05 16:36:50 - ERROR - stderr - 23%|██▎ | 5184/22434 [6:29:09<12:04:12, 2.52s/it] +2025-02-05 16:36:50 - ERROR - stderr - +2025-02-05 16:36:50 - ERROR - stderr - +2025-02-05 16:36:50 - INFO - stdout - {'loss': 1.0022, 'grad_norm': 1.0336499214172363, 'learning_rate': 1.795399595030242e-05, 'epoch': 0.69} +2025-02-05 16:36:50 - ERROR - stderr - 23%|██▎ | 5184/22434 [6:29:10<12:04:12, 2.52s/it] +2025-02-05 16:36:53 - ERROR - stderr - 23%|██▎ | 5185/22434 [6:29:12<12:39:51, 2.64s/it] +2025-02-05 16:36:53 - ERROR - stderr - +2025-02-05 16:36:53 - ERROR - stderr - +2025-02-05 16:36:53 - INFO - stdout - {'loss': 0.8889, 'grad_norm': 1.0178587436676025, 'learning_rate': 1.7953120833691894e-05, 'epoch': 0.69} +2025-02-05 16:36:53 - ERROR - stderr 
- 23%|██▎ | 5185/22434 [6:29:12<12:39:51, 2.64s/it] +2025-02-05 16:36:55 - ERROR - stderr - 23%|██▎ | 5186/22434 [6:29:15<12:28:50, 2.60s/it] +2025-02-05 16:36:55 - ERROR - stderr - +2025-02-05 16:36:55 - ERROR - stderr - +2025-02-05 16:36:55 - INFO - stdout - {'loss': 0.8964, 'grad_norm': 1.0869948863983154, 'learning_rate': 1.7952245551306173e-05, 'epoch': 0.69} +2025-02-05 16:36:55 - ERROR - stderr - 23%|██▎ | 5186/22434 [6:29:15<12:28:50, 2.60s/it] +2025-02-05 16:36:58 - ERROR - stderr - 23%|██▎ | 5187/22434 [6:29:17<12:13:26, 2.55s/it] +2025-02-05 16:36:58 - ERROR - stderr - +2025-02-05 16:36:58 - ERROR - stderr - +2025-02-05 16:36:58 - INFO - stdout - {'loss': 0.9192, 'grad_norm': 1.051178216934204, 'learning_rate': 1.7951370103163507e-05, 'epoch': 0.69} +2025-02-05 16:36:58 - ERROR - stderr - 23%|██▎ | 5187/22434 [6:29:17<12:13:26, 2.55s/it] +2025-02-05 16:37:00 - ERROR - stderr - 23%|██▎ | 5188/22434 [6:29:20<12:09:56, 2.54s/it] +2025-02-05 16:37:00 - ERROR - stderr - +2025-02-05 16:37:00 - ERROR - stderr - +2025-02-05 16:37:00 - INFO - stdout - {'loss': 0.9115, 'grad_norm': 1.145849585533142, 'learning_rate': 1.795049448928213e-05, 'epoch': 0.69} +2025-02-05 16:37:00 - ERROR - stderr - 23%|██▎ | 5188/22434 [6:29:20<12:09:56, 2.54s/it] +2025-02-05 16:37:03 - ERROR - stderr - 23%|██▎ | 5189/22434 [6:29:22<12:03:10, 2.52s/it] +2025-02-05 16:37:03 - ERROR - stderr - +2025-02-05 16:37:03 - ERROR - stderr - +2025-02-05 16:37:03 - INFO - stdout - {'loss': 0.9271, 'grad_norm': 1.0809355974197388, 'learning_rate': 1.7949618709680315e-05, 'epoch': 0.69} +2025-02-05 16:37:03 - ERROR - stderr - 23%|██▎ | 5189/22434 [6:29:22<12:03:10, 2.52s/it] +2025-02-05 16:37:05 - ERROR - stderr - 23%|██▎ | 5190/22434 [6:29:25<12:05:16, 2.52s/it] +2025-02-05 16:37:05 - ERROR - stderr - +2025-02-05 16:37:05 - ERROR - stderr - +2025-02-05 16:37:05 - INFO - stdout - {'loss': 1.0029, 'grad_norm': 1.172094464302063, 'learning_rate': 1.79487427643763e-05, 'epoch': 0.69} +2025-02-05 
16:37:05 - ERROR - stderr - 23%|██▎ | 5190/22434 [6:29:25<12:05:16, 2.52s/it] +2025-02-05 16:37:08 - ERROR - stderr - 23%|██▎ | 5191/22434 [6:29:27<11:59:50, 2.50s/it] +2025-02-05 16:37:08 - ERROR - stderr - +2025-02-05 16:37:08 - ERROR - stderr - +2025-02-05 16:37:08 - INFO - stdout - {'loss': 1.003, 'grad_norm': 1.1019080877304077, 'learning_rate': 1.7947866653388346e-05, 'epoch': 0.69} +2025-02-05 16:37:08 - ERROR - stderr - 23%|██▎ | 5191/22434 [6:29:27<11:59:50, 2.50s/it] +2025-02-05 16:37:10 - ERROR - stderr - 23%|██▎ | 5192/22434 [6:29:30<12:00:45, 2.51s/it] +2025-02-05 16:37:10 - ERROR - stderr - +2025-02-05 16:37:10 - ERROR - stderr - +2025-02-05 16:37:10 - INFO - stdout - {'loss': 0.9258, 'grad_norm': 1.0884426832199097, 'learning_rate': 1.794699037673472e-05, 'epoch': 0.69} +2025-02-05 16:37:10 - ERROR - stderr - 23%|██▎ | 5192/22434 [6:29:30<12:00:45, 2.51s/it] +2025-02-05 16:37:13 - ERROR - stderr - 23%|██▎ | 5193/22434 [6:29:32<11:57:39, 2.50s/it] +2025-02-05 16:37:13 - ERROR - stderr - +2025-02-05 16:37:13 - ERROR - stderr - +2025-02-05 16:37:13 - INFO - stdout - {'loss': 0.9453, 'grad_norm': 1.0800942182540894, 'learning_rate': 1.7946113934433686e-05, 'epoch': 0.69} +2025-02-05 16:37:13 - ERROR - stderr - 23%|██▎ | 5193/22434 [6:29:32<11:57:39, 2.50s/it] +2025-02-05 16:37:15 - ERROR - stderr - 23%|██▎ | 5194/22434 [6:29:35<11:56:13, 2.49s/it] +2025-02-05 16:37:15 - ERROR - stderr - +2025-02-05 16:37:15 - ERROR - stderr - +2025-02-05 16:37:15 - INFO - stdout - {'loss': 1.0003, 'grad_norm': 1.0571277141571045, 'learning_rate': 1.7945237326503507e-05, 'epoch': 0.69} +2025-02-05 16:37:15 - ERROR - stderr - 23%|██▎ | 5194/22434 [6:29:35<11:56:13, 2.49s/it] +2025-02-05 16:37:17 - ERROR - stderr - 23%|██▎ | 5195/22434 [6:29:37<11:51:15, 2.48s/it] +2025-02-05 16:37:18 - ERROR - stderr - +2025-02-05 16:37:18 - ERROR - stderr - +2025-02-05 16:37:18 - INFO - stdout - {'loss': 0.9992, 'grad_norm': 1.208843469619751, 'learning_rate': 1.7944360552962455e-05, 
'epoch': 0.69} +2025-02-05 16:37:18 - ERROR - stderr - 23%|██▎ | 5195/22434 [6:29:37<11:51:15, 2.48s/it] +2025-02-05 16:37:20 - ERROR - stderr - 23%|██▎ | 5196/22434 [6:29:40<11:50:47, 2.47s/it] +2025-02-05 16:37:20 - ERROR - stderr - +2025-02-05 16:37:20 - ERROR - stderr - +2025-02-05 16:37:20 - INFO - stdout - {'loss': 0.9665, 'grad_norm': 1.2451117038726807, 'learning_rate': 1.7943483613828817e-05, 'epoch': 0.69} +2025-02-05 16:37:20 - ERROR - stderr - 23%|██▎ | 5196/22434 [6:29:40<11:50:47, 2.47s/it] +2025-02-05 16:37:22 - ERROR - stderr - 23%|██▎ | 5197/22434 [6:29:42<11:51:54, 2.48s/it] +2025-02-05 16:37:22 - ERROR - stderr - +2025-02-05 16:37:22 - ERROR - stderr - +2025-02-05 16:37:22 - INFO - stdout - {'loss': 0.9463, 'grad_norm': 1.166275978088379, 'learning_rate': 1.7942606509120862e-05, 'epoch': 0.69} +2025-02-05 16:37:22 - ERROR - stderr - 23%|██▎ | 5197/22434 [6:29:42<11:51:54, 2.48s/it] +2025-02-05 16:37:25 - ERROR - stderr - 23%|██▎ | 5198/22434 [6:29:45<11:53:51, 2.48s/it] +2025-02-05 16:37:25 - ERROR - stderr - +2025-02-05 16:37:25 - ERROR - stderr - +2025-02-05 16:37:25 - INFO - stdout - {'loss': 0.957, 'grad_norm': 1.0716139078140259, 'learning_rate': 1.7941729238856868e-05, 'epoch': 0.7} +2025-02-05 16:37:25 - ERROR - stderr - 23%|██▎ | 5198/22434 [6:29:45<11:53:51, 2.48s/it] +2025-02-05 16:37:27 - ERROR - stderr - 23%|██▎ | 5199/22434 [6:29:47<11:53:28, 2.48s/it] +2025-02-05 16:37:27 - ERROR - stderr - +2025-02-05 16:37:27 - ERROR - stderr - +2025-02-05 16:37:27 - INFO - stdout - {'loss': 0.9251, 'grad_norm': 1.001428246498108, 'learning_rate': 1.7940851803055138e-05, 'epoch': 0.7} +2025-02-05 16:37:27 - ERROR - stderr - 23%|██▎ | 5199/22434 [6:29:47<11:53:28, 2.48s/it] +2025-02-05 16:37:30 - ERROR - stderr - 23%|██▎ | 5200/22434 [6:29:50<11:56:52, 2.50s/it] +2025-02-05 16:37:30 - ERROR - stderr - +2025-02-05 16:37:30 - ERROR - stderr - +2025-02-05 16:37:30 - INFO - stdout - {'loss': 1.0275, 'grad_norm': 1.0861520767211914, 'learning_rate': 
1.7939974201733944e-05, 'epoch': 0.7} +2025-02-05 16:37:30 - ERROR - stderr - 23%|██▎ | 5200/22434 [6:29:50<11:56:52, 2.50s/it] +2025-02-05 16:37:32 - ERROR - stderr - 23%|██▎ | 5201/22434 [6:29:52<11:49:57, 2.47s/it] +2025-02-05 16:37:32 - ERROR - stderr - +2025-02-05 16:37:32 - ERROR - stderr - +2025-02-05 16:37:32 - INFO - stdout - {'loss': 0.8965, 'grad_norm': 1.0136325359344482, 'learning_rate': 1.7939096434911586e-05, 'epoch': 0.7} +2025-02-05 16:37:32 - ERROR - stderr - 23%|██▎ | 5201/22434 [6:29:52<11:49:57, 2.47s/it] +2025-02-05 16:37:35 - ERROR - stderr - 23%|██▎ | 5202/22434 [6:29:55<11:53:34, 2.48s/it] +2025-02-05 16:37:35 - ERROR - stderr - +2025-02-05 16:37:35 - ERROR - stderr - +2025-02-05 16:37:35 - INFO - stdout - {'loss': 1.0769, 'grad_norm': 1.1068634986877441, 'learning_rate': 1.7938218502606362e-05, 'epoch': 0.7} +2025-02-05 16:37:35 - ERROR - stderr - 23%|██▎ | 5202/22434 [6:29:55<11:53:34, 2.48s/it] +2025-02-05 16:37:37 - ERROR - stderr - 23%|██▎ | 5203/22434 [6:29:57<11:52:43, 2.48s/it] +2025-02-05 16:37:37 - ERROR - stderr - +2025-02-05 16:37:37 - ERROR - stderr - +2025-02-05 16:37:37 - INFO - stdout - {'loss': 0.9873, 'grad_norm': 1.0925811529159546, 'learning_rate': 1.7937340404836566e-05, 'epoch': 0.7} +2025-02-05 16:37:37 - ERROR - stderr - 23%|██▎ | 5203/22434 [6:29:57<11:52:43, 2.48s/it] +2025-02-05 16:37:40 - ERROR - stderr - 23%|██▎ | 5204/22434 [6:30:00<11:56:47, 2.50s/it] +2025-02-05 16:37:40 - ERROR - stderr - +2025-02-05 16:37:40 - ERROR - stderr - +2025-02-05 16:37:40 - INFO - stdout - {'loss': 0.982, 'grad_norm': 0.9867472648620605, 'learning_rate': 1.7936462141620507e-05, 'epoch': 0.7} +2025-02-05 16:37:40 - ERROR - stderr - 23%|██▎ | 5204/22434 [6:30:00<11:56:47, 2.50s/it] +2025-02-05 16:37:42 - ERROR - stderr - 23%|██▎ | 5205/22434 [6:30:02<11:56:54, 2.50s/it] +2025-02-05 16:37:42 - ERROR - stderr - +2025-02-05 16:37:42 - ERROR - stderr - +2025-02-05 16:37:42 - INFO - stdout - {'loss': 0.9542, 'grad_norm': 
1.0225833654403687, 'learning_rate': 1.7935583712976487e-05, 'epoch': 0.7} +2025-02-05 16:37:42 - ERROR - stderr - 23%|██▎ | 5205/22434 [6:30:02<11:56:54, 2.50s/it] +2025-02-05 16:37:45 - ERROR - stderr - 23%|██▎ | 5206/22434 [6:30:05<11:51:01, 2.48s/it] +2025-02-05 16:37:45 - ERROR - stderr - +2025-02-05 16:37:45 - ERROR - stderr - +2025-02-05 16:37:45 - INFO - stdout - {'loss': 0.9161, 'grad_norm': 1.1636637449264526, 'learning_rate': 1.7934705118922823e-05, 'epoch': 0.7} +2025-02-05 16:37:45 - ERROR - stderr - 23%|██▎ | 5206/22434 [6:30:05<11:51:01, 2.48s/it] +2025-02-05 16:37:47 - ERROR - stderr - 23%|██▎ | 5207/22434 [6:30:07<11:48:49, 2.47s/it] +2025-02-05 16:37:47 - ERROR - stderr - +2025-02-05 16:37:47 - ERROR - stderr - +2025-02-05 16:37:47 - INFO - stdout - {'loss': 0.9665, 'grad_norm': 1.1225420236587524, 'learning_rate': 1.793382635947782e-05, 'epoch': 0.7} +2025-02-05 16:37:47 - ERROR - stderr - 23%|██▎ | 5207/22434 [6:30:07<11:48:49, 2.47s/it] +2025-02-05 16:37:50 - ERROR - stderr - 23%|██▎ | 5208/22434 [6:30:10<12:18:36, 2.57s/it] +2025-02-05 16:37:50 - ERROR - stderr - +2025-02-05 16:37:50 - ERROR - stderr - +2025-02-05 16:37:50 - INFO - stdout - {'loss': 1.0437, 'grad_norm': 1.0824493169784546, 'learning_rate': 1.7932947434659796e-05, 'epoch': 0.7} +2025-02-05 16:37:50 - ERROR - stderr - 23%|██▎ | 5208/22434 [6:30:10<12:18:36, 2.57s/it] +2025-02-05 16:37:53 - ERROR - stderr - 23%|██▎ | 5209/22434 [6:30:12<12:14:32, 2.56s/it] +2025-02-05 16:37:53 - ERROR - stderr - +2025-02-05 16:37:53 - ERROR - stderr - +2025-02-05 16:37:53 - INFO - stdout - {'loss': 0.9959, 'grad_norm': 0.9740232229232788, 'learning_rate': 1.7932068344487076e-05, 'epoch': 0.7} +2025-02-05 16:37:53 - ERROR - stderr - 23%|██▎ | 5209/22434 [6:30:12<12:14:32, 2.56s/it] +2025-02-05 16:37:55 - ERROR - stderr - 23%|██▎ | 5210/22434 [6:30:15<12:10:28, 2.54s/it] +2025-02-05 16:37:55 - ERROR - stderr - +2025-02-05 16:37:55 - ERROR - stderr - +2025-02-05 16:37:55 - INFO - stdout - {'loss': 
1.0007, 'grad_norm': 1.0829992294311523, 'learning_rate': 1.7931189088977984e-05, 'epoch': 0.7} +2025-02-05 16:37:55 - ERROR - stderr - 23%|██▎ | 5210/22434 [6:30:15<12:10:28, 2.54s/it] +2025-02-05 16:37:58 - ERROR - stderr - 23%|██▎ | 5211/22434 [6:30:17<12:06:24, 2.53s/it] +2025-02-05 16:37:58 - ERROR - stderr - +2025-02-05 16:37:58 - ERROR - stderr - +2025-02-05 16:37:58 - INFO - stdout - {'loss': 0.9698, 'grad_norm': 1.2006179094314575, 'learning_rate': 1.793030966815084e-05, 'epoch': 0.7} +2025-02-05 16:37:58 - ERROR - stderr - 23%|██▎ | 5211/22434 [6:30:17<12:06:24, 2.53s/it] +2025-02-05 16:38:00 - ERROR - stderr - 23%|██▎ | 5212/22434 [6:30:20<11:59:27, 2.51s/it] +2025-02-05 16:38:00 - ERROR - stderr - +2025-02-05 16:38:00 - ERROR - stderr - +2025-02-05 16:38:00 - INFO - stdout - {'loss': 0.8842, 'grad_norm': 1.1203135251998901, 'learning_rate': 1.792943008202398e-05, 'epoch': 0.7} +2025-02-05 16:38:00 - ERROR - stderr - 23%|██▎ | 5212/22434 [6:30:20<11:59:27, 2.51s/it] +2025-02-05 16:38:03 - ERROR - stderr - 23%|██▎ | 5213/22434 [6:30:22<11:57:30, 2.50s/it] +2025-02-05 16:38:03 - ERROR - stderr - +2025-02-05 16:38:03 - ERROR - stderr - +2025-02-05 16:38:03 - INFO - stdout - {'loss': 0.8798, 'grad_norm': 1.1312508583068848, 'learning_rate': 1.7928550330615743e-05, 'epoch': 0.7} +2025-02-05 16:38:03 - ERROR - stderr - 23%|██▎ | 5213/22434 [6:30:22<11:57:30, 2.50s/it] +2025-02-05 16:38:05 - ERROR - stderr - 23%|██▎ | 5214/22434 [6:30:25<12:17:48, 2.57s/it] +2025-02-05 16:38:05 - ERROR - stderr - +2025-02-05 16:38:05 - ERROR - stderr - +2025-02-05 16:38:05 - INFO - stdout - {'loss': 0.9134, 'grad_norm': 1.0806264877319336, 'learning_rate': 1.7927670413944458e-05, 'epoch': 0.7} +2025-02-05 16:38:05 - ERROR - stderr - 23%|██▎ | 5214/22434 [6:30:25<12:17:48, 2.57s/it] +2025-02-05 16:38:08 - ERROR - stderr - 23%|██▎ | 5215/22434 [6:30:27<12:06:22, 2.53s/it] +2025-02-05 16:38:08 - ERROR - stderr - +2025-02-05 16:38:08 - ERROR - stderr - +2025-02-05 16:38:08 - INFO - 
stdout - {'loss': 0.9105, 'grad_norm': 1.141685128211975, 'learning_rate': 1.792679033202847e-05, 'epoch': 0.7} +2025-02-05 16:38:08 - ERROR - stderr - 23%|██▎ | 5215/22434 [6:30:28<12:06:22, 2.53s/it] +2025-02-05 16:38:10 - ERROR - stderr - 23%|██▎ | 5216/22434 [6:30:30<12:07:03, 2.53s/it] +2025-02-05 16:38:10 - ERROR - stderr - +2025-02-05 16:38:10 - ERROR - stderr - +2025-02-05 16:38:10 - INFO - stdout - {'loss': 0.8921, 'grad_norm': 1.0786662101745605, 'learning_rate': 1.792591008488612e-05, 'epoch': 0.7} +2025-02-05 16:38:10 - ERROR - stderr - 23%|██▎ | 5216/22434 [6:30:30<12:07:03, 2.53s/it] +2025-02-05 16:38:13 - ERROR - stderr - 23%|██▎ | 5217/22434 [6:30:32<11:58:03, 2.50s/it] +2025-02-05 16:38:13 - ERROR - stderr - +2025-02-05 16:38:13 - ERROR - stderr - +2025-02-05 16:38:13 - INFO - stdout - {'loss': 0.9572, 'grad_norm': 1.2741613388061523, 'learning_rate': 1.792502967253576e-05, 'epoch': 0.7} +2025-02-05 16:38:13 - ERROR - stderr - 23%|██▎ | 5217/22434 [6:30:32<11:58:03, 2.50s/it] +2025-02-05 16:38:15 - ERROR - stderr - 23%|██▎ | 5218/22434 [6:30:35<11:56:51, 2.50s/it] +2025-02-05 16:38:15 - ERROR - stderr - +2025-02-05 16:38:15 - ERROR - stderr - +2025-02-05 16:38:15 - INFO - stdout - {'loss': 0.9395, 'grad_norm': 0.9734985828399658, 'learning_rate': 1.792414909499574e-05, 'epoch': 0.7} +2025-02-05 16:38:15 - ERROR - stderr - 23%|██▎ | 5218/22434 [6:30:35<11:56:51, 2.50s/it] +2025-02-05 16:38:18 - ERROR - stderr - 23%|██▎ | 5219/22434 [6:30:37<11:54:43, 2.49s/it] +2025-02-05 16:38:18 - ERROR - stderr - +2025-02-05 16:38:18 - ERROR - stderr - +2025-02-05 16:38:18 - INFO - stdout - {'loss': 0.9642, 'grad_norm': 1.041425108909607, 'learning_rate': 1.7923268352284415e-05, 'epoch': 0.7} +2025-02-05 16:38:18 - ERROR - stderr - 23%|██▎ | 5219/22434 [6:30:37<11:54:43, 2.49s/it] +2025-02-05 16:38:20 - ERROR - stderr - 23%|██▎ | 5220/22434 [6:30:40<12:00:23, 2.51s/it] +2025-02-05 16:38:20 - ERROR - stderr - +2025-02-05 16:38:20 - ERROR - stderr - +2025-02-05 
16:38:20 - INFO - stdout - {'loss': 0.9762, 'grad_norm': 1.037178874015808, 'learning_rate': 1.7922387444420143e-05, 'epoch': 0.7} +2025-02-05 16:38:20 - ERROR - stderr - 23%|██▎ | 5220/22434 [6:30:40<12:00:23, 2.51s/it] +2025-02-05 16:38:23 - ERROR - stderr - 23%|██▎ | 5221/22434 [6:30:42<11:55:08, 2.49s/it] +2025-02-05 16:38:23 - ERROR - stderr - +2025-02-05 16:38:23 - ERROR - stderr - +2025-02-05 16:38:23 - INFO - stdout - {'loss': 0.873, 'grad_norm': 1.1781412363052368, 'learning_rate': 1.7921506371421285e-05, 'epoch': 0.7} +2025-02-05 16:38:23 - ERROR - stderr - 23%|██▎ | 5221/22434 [6:30:42<11:55:08, 2.49s/it] +2025-02-05 16:38:25 - ERROR - stderr - 23%|██▎ | 5222/22434 [6:30:45<12:25:25, 2.60s/it] +2025-02-05 16:38:26 - ERROR - stderr - +2025-02-05 16:38:26 - ERROR - stderr - +2025-02-05 16:38:26 - INFO - stdout - {'loss': 0.8171, 'grad_norm': 0.996019184589386, 'learning_rate': 1.7920625133306205e-05, 'epoch': 0.7} +2025-02-05 16:38:26 - ERROR - stderr - 23%|██▎ | 5222/22434 [6:30:45<12:25:25, 2.60s/it] +2025-02-05 16:38:28 - ERROR - stderr - 23%|██▎ | 5223/22434 [6:30:48<12:14:19, 2.56s/it] +2025-02-05 16:38:28 - ERROR - stderr - +2025-02-05 16:38:28 - ERROR - stderr - +2025-02-05 16:38:28 - INFO - stdout - {'loss': 1.031, 'grad_norm': 1.1254467964172363, 'learning_rate': 1.7919743730093278e-05, 'epoch': 0.7} +2025-02-05 16:38:28 - ERROR - stderr - 23%|██▎ | 5223/22434 [6:30:48<12:14:19, 2.56s/it] +2025-02-05 16:38:31 - ERROR - stderr - 23%|██▎ | 5224/22434 [6:30:50<12:22:22, 2.59s/it] +2025-02-05 16:38:31 - ERROR - stderr - +2025-02-05 16:38:31 - ERROR - stderr - +2025-02-05 16:38:31 - INFO - stdout - {'loss': 1.1622, 'grad_norm': 1.1469203233718872, 'learning_rate': 1.791886216180087e-05, 'epoch': 0.7} +2025-02-05 16:38:31 - ERROR - stderr - 23%|██▎ | 5224/22434 [6:30:50<12:22:22, 2.59s/it] +2025-02-05 16:38:33 - ERROR - stderr - 23%|██▎ | 5225/22434 [6:30:53<12:15:41, 2.57s/it] +2025-02-05 16:38:33 - ERROR - stderr - +2025-02-05 16:38:33 - ERROR - 
stderr - +2025-02-05 16:38:33 - INFO - stdout - {'loss': 1.0425, 'grad_norm': 1.1206374168395996, 'learning_rate': 1.7917980428447356e-05, 'epoch': 0.7} +2025-02-05 16:38:33 - ERROR - stderr - 23%|██▎ | 5225/22434 [6:30:53<12:15:41, 2.57s/it] +2025-02-05 16:38:36 - ERROR - stderr - 23%|██▎ | 5226/22434 [6:30:55<12:09:38, 2.54s/it] +2025-02-05 16:38:36 - ERROR - stderr - +2025-02-05 16:38:36 - ERROR - stderr - +2025-02-05 16:38:36 - INFO - stdout - {'loss': 0.918, 'grad_norm': 1.1212012767791748, 'learning_rate': 1.7917098530051117e-05, 'epoch': 0.7} +2025-02-05 16:38:36 - ERROR - stderr - 23%|██▎ | 5226/22434 [6:30:55<12:09:38, 2.54s/it] +2025-02-05 16:38:38 - ERROR - stderr - 23%|██▎ | 5227/22434 [6:30:58<12:04:13, 2.53s/it] +2025-02-05 16:38:38 - ERROR - stderr - +2025-02-05 16:38:38 - ERROR - stderr - +2025-02-05 16:38:38 - INFO - stdout - {'loss': 1.0259, 'grad_norm': 1.1862643957138062, 'learning_rate': 1.7916216466630532e-05, 'epoch': 0.7} +2025-02-05 16:38:38 - ERROR - stderr - 23%|██▎ | 5227/22434 [6:30:58<12:04:13, 2.53s/it] +2025-02-05 16:38:41 - ERROR - stderr - 23%|██▎ | 5228/22434 [6:31:00<12:10:29, 2.55s/it] +2025-02-05 16:38:41 - ERROR - stderr - +2025-02-05 16:38:41 - ERROR - stderr - +2025-02-05 16:38:41 - INFO - stdout - {'loss': 0.9888, 'grad_norm': 1.0381441116333008, 'learning_rate': 1.7915334238203995e-05, 'epoch': 0.7} +2025-02-05 16:38:41 - ERROR - stderr - 23%|██▎ | 5228/22434 [6:31:01<12:10:29, 2.55s/it] +2025-02-05 16:38:43 - ERROR - stderr - 23%|██▎ | 5229/22434 [6:31:03<12:01:55, 2.52s/it] +2025-02-05 16:38:43 - ERROR - stderr - +2025-02-05 16:38:43 - ERROR - stderr - +2025-02-05 16:38:43 - INFO - stdout - {'loss': 0.9357, 'grad_norm': 1.0241427421569824, 'learning_rate': 1.7914451844789887e-05, 'epoch': 0.7} +2025-02-05 16:38:43 - ERROR - stderr - 23%|██▎ | 5229/22434 [6:31:03<12:01:55, 2.52s/it] +2025-02-05 16:38:46 - ERROR - stderr - 23%|██▎ | 5230/22434 [6:31:05<11:59:58, 2.51s/it] +2025-02-05 16:38:46 - ERROR - stderr - +2025-02-05 
16:38:46 - ERROR - stderr - +2025-02-05 16:38:46 - INFO - stdout - {'loss': 0.8757, 'grad_norm': 1.0289061069488525, 'learning_rate': 1.7913569286406606e-05, 'epoch': 0.7} +2025-02-05 16:38:46 - ERROR - stderr - 23%|██▎ | 5230/22434 [6:31:05<11:59:58, 2.51s/it] +2025-02-05 16:38:48 - ERROR - stderr - 23%|██▎ | 5231/22434 [6:31:08<11:59:32, 2.51s/it] +2025-02-05 16:38:48 - ERROR - stderr - +2025-02-05 16:38:48 - ERROR - stderr - +2025-02-05 16:38:48 - INFO - stdout - {'loss': 1.079, 'grad_norm': 0.9870235323905945, 'learning_rate': 1.7912686563072542e-05, 'epoch': 0.7} +2025-02-05 16:38:48 - ERROR - stderr - 23%|██▎ | 5231/22434 [6:31:08<11:59:32, 2.51s/it] +2025-02-05 16:38:51 - ERROR - stderr - 23%|██▎ | 5232/22434 [6:31:10<11:58:20, 2.51s/it] +2025-02-05 16:38:51 - ERROR - stderr - +2025-02-05 16:38:51 - ERROR - stderr - +2025-02-05 16:38:51 - INFO - stdout - {'loss': 0.9584, 'grad_norm': 0.9351276159286499, 'learning_rate': 1.79118036748061e-05, 'epoch': 0.7} +2025-02-05 16:38:51 - ERROR - stderr - 23%|██▎ | 5232/22434 [6:31:10<11:58:20, 2.51s/it] +2025-02-05 16:38:53 - ERROR - stderr - 23%|██▎ | 5233/22434 [6:31:13<11:53:22, 2.49s/it] +2025-02-05 16:38:53 - ERROR - stderr - +2025-02-05 16:38:53 - ERROR - stderr - +2025-02-05 16:38:53 - INFO - stdout - {'loss': 0.9493, 'grad_norm': 1.1108912229537964, 'learning_rate': 1.791092062162568e-05, 'epoch': 0.7} +2025-02-05 16:38:53 - ERROR - stderr - 23%|██▎ | 5233/22434 [6:31:13<11:53:22, 2.49s/it] +2025-02-05 16:38:56 - ERROR - stderr - 23%|██▎ | 5234/22434 [6:31:15<11:54:11, 2.49s/it] +2025-02-05 16:38:56 - ERROR - stderr - +2025-02-05 16:38:56 - ERROR - stderr - +2025-02-05 16:38:56 - INFO - stdout - {'loss': 0.9597, 'grad_norm': 1.1023406982421875, 'learning_rate': 1.7910037403549695e-05, 'epoch': 0.7} +2025-02-05 16:38:56 - ERROR - stderr - 23%|██▎ | 5234/22434 [6:31:15<11:54:11, 2.49s/it] +2025-02-05 16:38:58 - ERROR - stderr - 23%|██▎ | 5235/22434 [6:31:18<11:55:01, 2.49s/it] +2025-02-05 16:38:58 - ERROR - 
stderr - +2025-02-05 16:38:58 - ERROR - stderr - +2025-02-05 16:38:58 - INFO - stdout - {'loss': 0.8516, 'grad_norm': 0.9589882493019104, 'learning_rate': 1.7909154020596543e-05, 'epoch': 0.7} +2025-02-05 16:38:58 - ERROR - stderr - 23%|██▎ | 5235/22434 [6:31:18<11:55:01, 2.49s/it] +2025-02-05 16:39:01 - ERROR - stderr - 23%|██▎ | 5236/22434 [6:31:20<12:00:13, 2.51s/it] +2025-02-05 16:39:01 - ERROR - stderr - +2025-02-05 16:39:01 - ERROR - stderr - +2025-02-05 16:39:01 - INFO - stdout - {'loss': 0.9421, 'grad_norm': 1.09096360206604, 'learning_rate': 1.7908270472784647e-05, 'epoch': 0.7} +2025-02-05 16:39:01 - ERROR - stderr - 23%|██▎ | 5236/22434 [6:31:20<12:00:13, 2.51s/it] +2025-02-05 16:39:03 - ERROR - stderr - 23%|██▎ | 5237/22434 [6:31:23<12:12:17, 2.55s/it] +2025-02-05 16:39:03 - ERROR - stderr - +2025-02-05 16:39:03 - ERROR - stderr - +2025-02-05 16:39:03 - INFO - stdout - {'loss': 0.9539, 'grad_norm': 0.996152937412262, 'learning_rate': 1.7907386760132418e-05, 'epoch': 0.7} +2025-02-05 16:39:03 - ERROR - stderr - 23%|██▎ | 5237/22434 [6:31:23<12:12:17, 2.55s/it] +2025-02-05 16:39:06 - ERROR - stderr - 23%|██▎ | 5238/22434 [6:31:26<12:09:06, 2.54s/it] +2025-02-05 16:39:06 - ERROR - stderr - +2025-02-05 16:39:06 - ERROR - stderr - +2025-02-05 16:39:06 - INFO - stdout - {'loss': 0.8822, 'grad_norm': 1.2319047451019287, 'learning_rate': 1.790650288265828e-05, 'epoch': 0.7} +2025-02-05 16:39:06 - ERROR - stderr - 23%|██▎ | 5238/22434 [6:31:26<12:09:06, 2.54s/it] +2025-02-05 16:39:08 - ERROR - stderr - 23%|██▎ | 5239/22434 [6:31:28<12:18:05, 2.58s/it] +2025-02-05 16:39:09 - ERROR - stderr - +2025-02-05 16:39:09 - ERROR - stderr - +2025-02-05 16:39:09 - INFO - stdout - {'loss': 0.9331, 'grad_norm': 1.1690583229064941, 'learning_rate': 1.7905618840380655e-05, 'epoch': 0.7} +2025-02-05 16:39:09 - ERROR - stderr - 23%|██▎ | 5239/22434 [6:31:28<12:18:05, 2.58s/it] +2025-02-05 16:39:11 - ERROR - stderr - 23%|██▎ | 5240/22434 [6:31:31<12:06:55, 2.54s/it] +2025-02-05 
16:39:11 - ERROR - stderr - +2025-02-05 16:39:11 - ERROR - stderr - +2025-02-05 16:39:11 - INFO - stdout - {'loss': 0.7813, 'grad_norm': 1.053465723991394, 'learning_rate': 1.790473463331797e-05, 'epoch': 0.7} +2025-02-05 16:39:11 - ERROR - stderr - 23%|██▎ | 5240/22434 [6:31:31<12:06:55, 2.54s/it] +2025-02-05 16:39:13 - ERROR - stderr - 23%|██▎ | 5241/22434 [6:31:33<11:58:54, 2.51s/it] +2025-02-05 16:39:13 - ERROR - stderr - +2025-02-05 16:39:13 - ERROR - stderr - +2025-02-05 16:39:13 - INFO - stdout - {'loss': 1.0106, 'grad_norm': 1.1357570886611938, 'learning_rate': 1.7903850261488656e-05, 'epoch': 0.7} +2025-02-05 16:39:13 - ERROR - stderr - 23%|██▎ | 5241/22434 [6:31:33<11:58:54, 2.51s/it] +2025-02-05 16:39:16 - ERROR - stderr - 23%|██▎ | 5242/22434 [6:31:36<11:55:09, 2.50s/it] +2025-02-05 16:39:16 - ERROR - stderr - +2025-02-05 16:39:16 - ERROR - stderr - +2025-02-05 16:39:16 - INFO - stdout - {'loss': 1.0207, 'grad_norm': 1.087565302848816, 'learning_rate': 1.7902965724911148e-05, 'epoch': 0.7} +2025-02-05 16:39:16 - ERROR - stderr - 23%|██▎ | 5242/22434 [6:31:36<11:55:09, 2.50s/it] +2025-02-05 16:39:18 - ERROR - stderr - 23%|██▎ | 5243/22434 [6:31:38<11:51:58, 2.48s/it] +2025-02-05 16:39:18 - ERROR - stderr - +2025-02-05 16:39:18 - ERROR - stderr - +2025-02-05 16:39:18 - INFO - stdout - {'loss': 0.8715, 'grad_norm': 1.0234267711639404, 'learning_rate': 1.7902081023603878e-05, 'epoch': 0.7} +2025-02-05 16:39:18 - ERROR - stderr - 23%|██▎ | 5243/22434 [6:31:38<11:51:58, 2.48s/it] +2025-02-05 16:39:21 - ERROR - stderr - 23%|██▎ | 5244/22434 [6:31:40<11:47:18, 2.47s/it] +2025-02-05 16:39:21 - ERROR - stderr - +2025-02-05 16:39:21 - ERROR - stderr - +2025-02-05 16:39:21 - INFO - stdout - {'loss': 0.8377, 'grad_norm': 1.0132678747177124, 'learning_rate': 1.7901196157585296e-05, 'epoch': 0.7} +2025-02-05 16:39:21 - ERROR - stderr - 23%|██▎ | 5244/22434 [6:31:41<11:47:18, 2.47s/it] +2025-02-05 16:39:23 - ERROR - stderr - 23%|██▎ | 5245/22434 [6:31:43<11:44:32, 
2.46s/it] +2025-02-05 16:39:23 - ERROR - stderr - +2025-02-05 16:39:23 - ERROR - stderr - +2025-02-05 16:39:23 - INFO - stdout - {'loss': 1.0199, 'grad_norm': 1.1064453125, 'learning_rate': 1.7900311126873835e-05, 'epoch': 0.7} +2025-02-05 16:39:23 - ERROR - stderr - 23%|██▎ | 5245/22434 [6:31:43<11:44:32, 2.46s/it] +2025-02-05 16:39:26 - ERROR - stderr - 23%|██▎ | 5246/22434 [6:31:46<12:23:45, 2.60s/it] +2025-02-05 16:39:26 - ERROR - stderr - +2025-02-05 16:39:26 - ERROR - stderr - +2025-02-05 16:39:26 - INFO - stdout - {'loss': 0.8774, 'grad_norm': 1.1431941986083984, 'learning_rate': 1.789942593148795e-05, 'epoch': 0.7} +2025-02-05 16:39:26 - ERROR - stderr - 23%|██▎ | 5246/22434 [6:31:46<12:23:45, 2.60s/it] +2025-02-05 16:39:29 - ERROR - stderr - 23%|██▎ | 5247/22434 [6:31:48<12:23:37, 2.60s/it] +2025-02-05 16:39:29 - ERROR - stderr - +2025-02-05 16:39:29 - ERROR - stderr - +2025-02-05 16:39:29 - INFO - stdout - {'loss': 0.9808, 'grad_norm': 1.0045065879821777, 'learning_rate': 1.7898540571446093e-05, 'epoch': 0.7} +2025-02-05 16:39:29 - ERROR - stderr - 23%|██▎ | 5247/22434 [6:31:48<12:23:37, 2.60s/it] +2025-02-05 16:39:31 - ERROR - stderr - 23%|██▎ | 5248/22434 [6:31:51<12:18:07, 2.58s/it] +2025-02-05 16:39:31 - ERROR - stderr - +2025-02-05 16:39:31 - ERROR - stderr - +2025-02-05 16:39:31 - INFO - stdout - {'loss': 0.992, 'grad_norm': 0.9827533960342407, 'learning_rate': 1.7897655046766712e-05, 'epoch': 0.7} +2025-02-05 16:39:31 - ERROR - stderr - 23%|██▎ | 5248/22434 [6:31:51<12:18:07, 2.58s/it] +2025-02-05 16:39:34 - ERROR - stderr - 23%|██▎ | 5249/22434 [6:31:53<12:08:10, 2.54s/it] +2025-02-05 16:39:34 - ERROR - stderr - +2025-02-05 16:39:34 - ERROR - stderr - +2025-02-05 16:39:34 - INFO - stdout - {'loss': 0.9527, 'grad_norm': 1.1866761445999146, 'learning_rate': 1.789676935746827e-05, 'epoch': 0.7} +2025-02-05 16:39:34 - ERROR - stderr - 23%|██▎ | 5249/22434 [6:31:53<12:08:10, 2.54s/it] +2025-02-05 16:39:36 - ERROR - stderr - 23%|██▎ | 5250/22434 
[6:31:56<12:05:57, 2.53s/it] +2025-02-05 16:39:36 - ERROR - stderr - +2025-02-05 16:39:36 - ERROR - stderr - +2025-02-05 16:39:36 - INFO - stdout - {'loss': 0.9541, 'grad_norm': 1.115775465965271, 'learning_rate': 1.7895883503569228e-05, 'epoch': 0.7} +2025-02-05 16:39:36 - ERROR - stderr - 23%|██▎ | 5250/22434 [6:31:56<12:05:57, 2.53s/it] +2025-02-05 16:39:39 - ERROR - stderr - 23%|██▎ | 5251/22434 [6:31:58<12:03:07, 2.52s/it] +2025-02-05 16:39:39 - ERROR - stderr - +2025-02-05 16:39:39 - ERROR - stderr - +2025-02-05 16:39:39 - INFO - stdout - {'loss': 0.8408, 'grad_norm': 1.0478111505508423, 'learning_rate': 1.789499748508805e-05, 'epoch': 0.7} +2025-02-05 16:39:39 - ERROR - stderr - 23%|██▎ | 5251/22434 [6:31:59<12:03:07, 2.52s/it] +2025-02-05 16:39:41 - ERROR - stderr - 23%|██▎ | 5252/22434 [6:32:01<12:03:42, 2.53s/it] +2025-02-05 16:39:41 - ERROR - stderr - +2025-02-05 16:39:41 - ERROR - stderr - +2025-02-05 16:39:41 - INFO - stdout - {'loss': 0.9891, 'grad_norm': 0.9772509336471558, 'learning_rate': 1.7894111302043203e-05, 'epoch': 0.7} +2025-02-05 16:39:41 - ERROR - stderr - 23%|██▎ | 5252/22434 [6:32:01<12:03:42, 2.53s/it] +2025-02-05 16:39:44 - ERROR - stderr - 23%|██▎ | 5253/22434 [6:32:03<11:58:42, 2.51s/it] +2025-02-05 16:39:44 - ERROR - stderr - +2025-02-05 16:39:44 - ERROR - stderr - +2025-02-05 16:39:44 - INFO - stdout - {'loss': 0.9032, 'grad_norm': 0.959745466709137, 'learning_rate': 1.7893224954453163e-05, 'epoch': 0.7} +2025-02-05 16:39:44 - ERROR - stderr - 23%|██▎ | 5253/22434 [6:32:04<11:58:42, 2.51s/it] +2025-02-05 16:39:46 - ERROR - stderr - 23%|██▎ | 5254/22434 [6:32:06<11:54:53, 2.50s/it] +2025-02-05 16:39:46 - ERROR - stderr - +2025-02-05 16:39:46 - ERROR - stderr - +2025-02-05 16:39:46 - INFO - stdout - {'loss': 1.0313, 'grad_norm': 1.1236652135849, 'learning_rate': 1.78923384423364e-05, 'epoch': 0.7} +2025-02-05 16:39:46 - ERROR - stderr - 23%|██▎ | 5254/22434 [6:32:06<11:54:53, 2.50s/it] +2025-02-05 16:39:49 - ERROR - stderr - 23%|██▎ 
| 5255/22434 [6:32:08<11:49:33, 2.48s/it] +2025-02-05 16:39:49 - ERROR - stderr - +2025-02-05 16:39:49 - ERROR - stderr - +2025-02-05 16:39:49 - INFO - stdout - {'loss': 1.0228, 'grad_norm': 1.166336178779602, 'learning_rate': 1.7891451765711393e-05, 'epoch': 0.7} +2025-02-05 16:39:49 - ERROR - stderr - 23%|██▎ | 5255/22434 [6:32:08<11:49:33, 2.48s/it] +2025-02-05 16:39:51 - ERROR - stderr - 23%|██▎ | 5256/22434 [6:32:11<11:46:54, 2.47s/it] +2025-02-05 16:39:51 - ERROR - stderr - +2025-02-05 16:39:51 - ERROR - stderr - +2025-02-05 16:39:51 - INFO - stdout - {'loss': 0.9542, 'grad_norm': 1.0581203699111938, 'learning_rate': 1.7890564924596624e-05, 'epoch': 0.7} +2025-02-05 16:39:51 - ERROR - stderr - 23%|██▎ | 5256/22434 [6:32:11<11:46:54, 2.47s/it] +2025-02-05 16:39:54 - ERROR - stderr - 23%|██▎ | 5257/22434 [6:32:13<11:56:36, 2.50s/it] +2025-02-05 16:39:54 - ERROR - stderr - +2025-02-05 16:39:54 - ERROR - stderr - +2025-02-05 16:39:54 - INFO - stdout - {'loss': 0.9759, 'grad_norm': 1.1052253246307373, 'learning_rate': 1.788967791901058e-05, 'epoch': 0.7} +2025-02-05 16:39:54 - ERROR - stderr - 23%|██▎ | 5257/22434 [6:32:13<11:56:36, 2.50s/it] +2025-02-05 16:39:56 - ERROR - stderr - 23%|██▎ | 5258/22434 [6:32:16<12:01:14, 2.52s/it] +2025-02-05 16:39:56 - ERROR - stderr - +2025-02-05 16:39:56 - ERROR - stderr - +2025-02-05 16:39:56 - INFO - stdout - {'loss': 1.0141, 'grad_norm': 1.1265859603881836, 'learning_rate': 1.7888790748971753e-05, 'epoch': 0.7} +2025-02-05 16:39:56 - ERROR - stderr - 23%|██▎ | 5258/22434 [6:32:16<12:01:14, 2.52s/it] +2025-02-05 16:39:59 - ERROR - stderr - 23%|██▎ | 5259/22434 [6:32:18<11:59:06, 2.51s/it] +2025-02-05 16:39:59 - ERROR - stderr - +2025-02-05 16:39:59 - ERROR - stderr - +2025-02-05 16:39:59 - INFO - stdout - {'loss': 0.9018, 'grad_norm': 0.9762358069419861, 'learning_rate': 1.7887903414498632e-05, 'epoch': 0.7} +2025-02-05 16:39:59 - ERROR - stderr - 23%|██▎ | 5259/22434 [6:32:18<11:59:06, 2.51s/it] +2025-02-05 16:40:01 - ERROR 
- stderr - 23%|██▎ | 5260/22434 [6:32:21<11:53:21, 2.49s/it] +2025-02-05 16:40:01 - ERROR - stderr - +2025-02-05 16:40:01 - ERROR - stderr - +2025-02-05 16:40:01 - INFO - stdout - {'loss': 1.0562, 'grad_norm': 1.12031090259552, 'learning_rate': 1.7887015915609708e-05, 'epoch': 0.7} +2025-02-05 16:40:01 - ERROR - stderr - 23%|██▎ | 5260/22434 [6:32:21<11:53:21, 2.49s/it] +2025-02-05 16:40:04 - ERROR - stderr - 23%|██▎ | 5261/22434 [6:32:23<12:01:22, 2.52s/it] +2025-02-05 16:40:04 - ERROR - stderr - +2025-02-05 16:40:04 - ERROR - stderr - +2025-02-05 16:40:04 - INFO - stdout - {'loss': 1.031, 'grad_norm': 1.1069087982177734, 'learning_rate': 1.7886128252323486e-05, 'epoch': 0.7} +2025-02-05 16:40:04 - ERROR - stderr - 23%|██▎ | 5261/22434 [6:32:24<12:01:22, 2.52s/it] +2025-02-05 16:40:06 - ERROR - stderr - 23%|██▎ | 5262/22434 [6:32:26<12:08:43, 2.55s/it] +2025-02-05 16:40:06 - ERROR - stderr - +2025-02-05 16:40:06 - ERROR - stderr - +2025-02-05 16:40:06 - INFO - stdout - {'loss': 0.9809, 'grad_norm': 1.1441650390625, 'learning_rate': 1.7885240424658466e-05, 'epoch': 0.7} +2025-02-05 16:40:06 - ERROR - stderr - 23%|██▎ | 5262/22434 [6:32:26<12:08:43, 2.55s/it] +2025-02-05 16:40:09 - ERROR - stderr - 23%|██▎ | 5263/22434 [6:32:29<12:09:41, 2.55s/it] +2025-02-05 16:40:09 - ERROR - stderr - +2025-02-05 16:40:09 - ERROR - stderr - +2025-02-05 16:40:09 - INFO - stdout - {'loss': 0.9829, 'grad_norm': 0.9909963607788086, 'learning_rate': 1.7884352432633157e-05, 'epoch': 0.7} +2025-02-05 16:40:09 - ERROR - stderr - 23%|██▎ | 5263/22434 [6:32:29<12:09:41, 2.55s/it] +2025-02-05 16:40:11 - ERROR - stderr - 23%|██▎ | 5264/22434 [6:32:31<12:10:12, 2.55s/it] +2025-02-05 16:40:11 - ERROR - stderr - +2025-02-05 16:40:11 - ERROR - stderr - +2025-02-05 16:40:11 - INFO - stdout - {'loss': 0.8514, 'grad_norm': 1.0335956811904907, 'learning_rate': 1.7883464276266064e-05, 'epoch': 0.7} +2025-02-05 16:40:11 - ERROR - stderr - 23%|██▎ | 5264/22434 [6:32:31<12:10:12, 2.55s/it] +2025-02-05 
16:40:14 - ERROR - stderr - 23%|██▎ | 5265/22434 [6:32:34<12:12:40, 2.56s/it] +2025-02-05 16:40:14 - ERROR - stderr - +2025-02-05 16:40:14 - ERROR - stderr - +2025-02-05 16:40:14 - INFO - stdout - {'loss': 0.9805, 'grad_norm': 1.1265125274658203, 'learning_rate': 1.7882575955575702e-05, 'epoch': 0.7} +2025-02-05 16:40:14 - ERROR - stderr - 23%|██▎ | 5265/22434 [6:32:34<12:12:40, 2.56s/it] +2025-02-05 16:40:17 - ERROR - stderr - 23%|██▎ | 5266/22434 [6:32:36<12:15:10, 2.57s/it] +2025-02-05 16:40:17 - ERROR - stderr - +2025-02-05 16:40:17 - ERROR - stderr - +2025-02-05 16:40:17 - INFO - stdout - {'loss': 0.8798, 'grad_norm': 0.9583535194396973, 'learning_rate': 1.788168747058059e-05, 'epoch': 0.7} +2025-02-05 16:40:17 - ERROR - stderr - 23%|██▎ | 5266/22434 [6:32:36<12:15:10, 2.57s/it] +2025-02-05 16:40:19 - ERROR - stderr - 23%|██▎ | 5267/22434 [6:32:39<12:10:50, 2.55s/it] +2025-02-05 16:40:19 - ERROR - stderr - +2025-02-05 16:40:19 - ERROR - stderr - +2025-02-05 16:40:19 - INFO - stdout - {'loss': 1.0675, 'grad_norm': 1.2350503206253052, 'learning_rate': 1.788079882129924e-05, 'epoch': 0.7} +2025-02-05 16:40:19 - ERROR - stderr - 23%|██▎ | 5267/22434 [6:32:39<12:10:50, 2.55s/it] +2025-02-05 16:40:22 - ERROR - stderr - 23%|██▎ | 5268/22434 [6:32:41<12:02:23, 2.52s/it] +2025-02-05 16:40:22 - ERROR - stderr - +2025-02-05 16:40:22 - ERROR - stderr - +2025-02-05 16:40:22 - INFO - stdout - {'loss': 0.9671, 'grad_norm': 1.0673515796661377, 'learning_rate': 1.7879910007750184e-05, 'epoch': 0.7} +2025-02-05 16:40:22 - ERROR - stderr - 23%|██▎ | 5268/22434 [6:32:41<12:02:23, 2.52s/it] +2025-02-05 16:40:24 - ERROR - stderr - 23%|██▎ | 5269/22434 [6:32:44<11:57:26, 2.51s/it] +2025-02-05 16:40:24 - ERROR - stderr - +2025-02-05 16:40:24 - ERROR - stderr - +2025-02-05 16:40:24 - INFO - stdout - {'loss': 0.9373, 'grad_norm': 1.1447023153305054, 'learning_rate': 1.787902102995194e-05, 'epoch': 0.7} +2025-02-05 16:40:24 - ERROR - stderr - 23%|██▎ | 5269/22434 [6:32:44<11:57:26, 
2.51s/it] +2025-02-05 16:40:27 - ERROR - stderr - 23%|██▎ | 5270/22434 [6:32:46<11:56:15, 2.50s/it] +2025-02-05 16:40:27 - ERROR - stderr - +2025-02-05 16:40:27 - ERROR - stderr - +2025-02-05 16:40:27 - INFO - stdout - {'loss': 0.8941, 'grad_norm': 1.0985013246536255, 'learning_rate': 1.7878131887923045e-05, 'epoch': 0.7} +2025-02-05 16:40:27 - ERROR - stderr - 23%|██▎ | 5270/22434 [6:32:46<11:56:15, 2.50s/it] +2025-02-05 16:40:29 - ERROR - stderr - 23%|██▎ | 5271/22434 [6:32:49<11:49:59, 2.48s/it] +2025-02-05 16:40:29 - ERROR - stderr - +2025-02-05 16:40:29 - ERROR - stderr - +2025-02-05 16:40:29 - INFO - stdout - {'loss': 0.9793, 'grad_norm': 1.0972760915756226, 'learning_rate': 1.7877242581682028e-05, 'epoch': 0.7} +2025-02-05 16:40:29 - ERROR - stderr - 23%|██▎ | 5271/22434 [6:32:49<11:49:59, 2.48s/it] +2025-02-05 16:40:32 - ERROR - stderr - 24%|██▎ | 5272/22434 [6:32:51<11:55:06, 2.50s/it] +2025-02-05 16:40:32 - ERROR - stderr - +2025-02-05 16:40:32 - ERROR - stderr - +2025-02-05 16:40:32 - INFO - stdout - {'loss': 0.9436, 'grad_norm': 1.0065748691558838, 'learning_rate': 1.7876353111247425e-05, 'epoch': 0.71} +2025-02-05 16:40:32 - ERROR - stderr - 24%|██▎ | 5272/22434 [6:32:51<11:55:06, 2.50s/it] +2025-02-05 16:40:34 - ERROR - stderr - 24%|██▎ | 5273/22434 [6:32:54<11:58:56, 2.51s/it] +2025-02-05 16:40:34 - ERROR - stderr - +2025-02-05 16:40:34 - ERROR - stderr - +2025-02-05 16:40:34 - INFO - stdout - {'loss': 0.9163, 'grad_norm': 1.0345087051391602, 'learning_rate': 1.7875463476637783e-05, 'epoch': 0.71} +2025-02-05 16:40:34 - ERROR - stderr - 24%|██▎ | 5273/22434 [6:32:54<11:58:56, 2.51s/it] +2025-02-05 16:40:37 - ERROR - stderr - 24%|██▎ | 5274/22434 [6:32:56<11:53:58, 2.50s/it] +2025-02-05 16:40:37 - ERROR - stderr - +2025-02-05 16:40:37 - ERROR - stderr - +2025-02-05 16:40:37 - INFO - stdout - {'loss': 0.904, 'grad_norm': 1.1655449867248535, 'learning_rate': 1.7874573677871638e-05, 'epoch': 0.71} +2025-02-05 16:40:37 - ERROR - stderr - 24%|██▎ | 
5274/22434 [6:32:56<11:53:58, 2.50s/it] +2025-02-05 16:40:39 - ERROR - stderr - 24%|██▎ | 5275/22434 [6:32:59<11:50:38, 2.48s/it] +2025-02-05 16:40:39 - ERROR - stderr - +2025-02-05 16:40:39 - ERROR - stderr - +2025-02-05 16:40:39 - INFO - stdout - {'loss': 0.9389, 'grad_norm': 1.2329585552215576, 'learning_rate': 1.787368371496754e-05, 'epoch': 0.71} +2025-02-05 16:40:39 - ERROR - stderr - 24%|██▎ | 5275/22434 [6:32:59<11:50:38, 2.48s/it] +2025-02-05 16:40:41 - ERROR - stderr - 24%|██▎ | 5276/22434 [6:33:01<11:51:58, 2.49s/it] +2025-02-05 16:40:42 - ERROR - stderr - +2025-02-05 16:40:42 - ERROR - stderr - +2025-02-05 16:40:42 - INFO - stdout - {'loss': 0.9652, 'grad_norm': 1.0300840139389038, 'learning_rate': 1.787279358794404e-05, 'epoch': 0.71} +2025-02-05 16:40:42 - ERROR - stderr - 24%|██▎ | 5276/22434 [6:33:01<11:51:58, 2.49s/it] +2025-02-05 16:40:44 - ERROR - stderr - 24%|██▎ | 5277/22434 [6:33:04<11:59:32, 2.52s/it] +2025-02-05 16:40:44 - ERROR - stderr - +2025-02-05 16:40:44 - ERROR - stderr - +2025-02-05 16:40:44 - INFO - stdout - {'loss': 0.987, 'grad_norm': 1.0218945741653442, 'learning_rate': 1.787190329681969e-05, 'epoch': 0.71} +2025-02-05 16:40:44 - ERROR - stderr - 24%|██▎ | 5277/22434 [6:33:04<11:59:32, 2.52s/it] +2025-02-05 16:40:47 - ERROR - stderr - 24%|██▎ | 5278/22434 [6:33:06<11:59:40, 2.52s/it] +2025-02-05 16:40:47 - ERROR - stderr - +2025-02-05 16:40:47 - ERROR - stderr - +2025-02-05 16:40:47 - INFO - stdout - {'loss': 0.9745, 'grad_norm': 1.064854383468628, 'learning_rate': 1.787101284161305e-05, 'epoch': 0.71} +2025-02-05 16:40:47 - ERROR - stderr - 24%|██▎ | 5278/22434 [6:33:06<11:59:40, 2.52s/it] +2025-02-05 16:40:49 - ERROR - stderr - 24%|██▎ | 5279/22434 [6:33:09<11:59:45, 2.52s/it] +2025-02-05 16:40:49 - ERROR - stderr - +2025-02-05 16:40:49 - ERROR - stderr - +2025-02-05 16:40:49 - INFO - stdout - {'loss': 0.9774, 'grad_norm': 1.0169978141784668, 'learning_rate': 1.787012222234268e-05, 'epoch': 0.71} +2025-02-05 16:40:49 - ERROR - 
stderr - 24%|██▎ | 5279/22434 [6:33:09<11:59:45, 2.52s/it] +2025-02-05 16:40:52 - ERROR - stderr - 24%|██▎ | 5280/22434 [6:33:11<11:56:35, 2.51s/it] +2025-02-05 16:40:52 - ERROR - stderr - +2025-02-05 16:40:52 - ERROR - stderr - +2025-02-05 16:40:52 - INFO - stdout - {'loss': 0.8909, 'grad_norm': 1.0274205207824707, 'learning_rate': 1.786923143902714e-05, 'epoch': 0.71} +2025-02-05 16:40:52 - ERROR - stderr - 24%|██▎ | 5280/22434 [6:33:11<11:56:35, 2.51s/it] +2025-02-05 16:40:54 - ERROR - stderr - 24%|██▎ | 5281/22434 [6:33:14<12:29:05, 2.62s/it] +2025-02-05 16:40:54 - ERROR - stderr - +2025-02-05 16:40:54 - ERROR - stderr - +2025-02-05 16:40:54 - INFO - stdout - {'loss': 0.9401, 'grad_norm': 1.074730396270752, 'learning_rate': 1.7868340491685e-05, 'epoch': 0.71} +2025-02-05 16:40:54 - ERROR - stderr - 24%|██▎ | 5281/22434 [6:33:14<12:29:05, 2.62s/it] +2025-02-05 16:40:57 - ERROR - stderr - 24%|██▎ | 5282/22434 [6:33:17<12:15:09, 2.57s/it] +2025-02-05 16:40:57 - ERROR - stderr - +2025-02-05 16:40:57 - ERROR - stderr - +2025-02-05 16:40:57 - INFO - stdout - {'loss': 0.9214, 'grad_norm': 1.1362671852111816, 'learning_rate': 1.7867449380334834e-05, 'epoch': 0.71} +2025-02-05 16:40:57 - ERROR - stderr - 24%|██▎ | 5282/22434 [6:33:17<12:15:09, 2.57s/it] +2025-02-05 16:40:59 - ERROR - stderr - 24%|██▎ | 5283/22434 [6:33:19<12:04:30, 2.53s/it] +2025-02-05 16:40:59 - ERROR - stderr - +2025-02-05 16:40:59 - ERROR - stderr - +2025-02-05 16:40:59 - INFO - stdout - {'loss': 0.8922, 'grad_norm': 1.0211025476455688, 'learning_rate': 1.7866558104995214e-05, 'epoch': 0.71} +2025-02-05 16:40:59 - ERROR - stderr - 24%|██▎ | 5283/22434 [6:33:19<12:04:30, 2.53s/it] +2025-02-05 16:41:02 - ERROR - stderr - 24%|██▎ | 5284/22434 [6:33:22<11:56:42, 2.51s/it] +2025-02-05 16:41:02 - ERROR - stderr - +2025-02-05 16:41:02 - ERROR - stderr - +2025-02-05 16:41:02 - INFO - stdout - {'loss': 0.9344, 'grad_norm': 1.0863420963287354, 'learning_rate': 1.786566666568472e-05, 'epoch': 0.71} +2025-02-05 
16:41:02 - ERROR - stderr - 24%|██▎ | 5284/22434 [6:33:22<11:56:42, 2.51s/it] +2025-02-05 16:41:04 - ERROR - stderr - 24%|██▎ | 5285/22434 [6:33:24<11:56:57, 2.51s/it] +2025-02-05 16:41:04 - ERROR - stderr - +2025-02-05 16:41:04 - ERROR - stderr - +2025-02-05 16:41:04 - INFO - stdout - {'loss': 1.0502, 'grad_norm': 1.0758394002914429, 'learning_rate': 1.7864775062421924e-05, 'epoch': 0.71} +2025-02-05 16:41:04 - ERROR - stderr - 24%|██▎ | 5285/22434 [6:33:24<11:56:57, 2.51s/it] +2025-02-05 16:41:07 - ERROR - stderr - 24%|██▎ | 5286/22434 [6:33:27<11:57:29, 2.51s/it] +2025-02-05 16:41:07 - ERROR - stderr - +2025-02-05 16:41:07 - ERROR - stderr - +2025-02-05 16:41:07 - INFO - stdout - {'loss': 1.0557, 'grad_norm': 1.0227526426315308, 'learning_rate': 1.7863883295225423e-05, 'epoch': 0.71} +2025-02-05 16:41:07 - ERROR - stderr - 24%|██▎ | 5286/22434 [6:33:27<11:57:29, 2.51s/it] +2025-02-05 16:41:09 - ERROR - stderr - 24%|██▎ | 5287/22434 [6:33:29<11:59:06, 2.52s/it] +2025-02-05 16:41:09 - ERROR - stderr - +2025-02-05 16:41:09 - ERROR - stderr - +2025-02-05 16:41:09 - INFO - stdout - {'loss': 0.9861, 'grad_norm': 1.0228816270828247, 'learning_rate': 1.78629913641138e-05, 'epoch': 0.71} +2025-02-05 16:41:09 - ERROR - stderr - 24%|██▎ | 5287/22434 [6:33:29<11:59:06, 2.52s/it] +2025-02-05 16:41:12 - ERROR - stderr - 24%|██▎ | 5288/22434 [6:33:32<11:54:52, 2.50s/it] +2025-02-05 16:41:12 - ERROR - stderr - +2025-02-05 16:41:12 - ERROR - stderr - +2025-02-05 16:41:12 - INFO - stdout - {'loss': 0.9826, 'grad_norm': 1.1481306552886963, 'learning_rate': 1.7862099269105644e-05, 'epoch': 0.71} +2025-02-05 16:41:12 - ERROR - stderr - 24%|██▎ | 5288/22434 [6:33:32<11:54:52, 2.50s/it] +2025-02-05 16:41:14 - ERROR - stderr - 24%|██▎ | 5289/22434 [6:33:34<11:48:39, 2.48s/it] +2025-02-05 16:41:14 - ERROR - stderr - +2025-02-05 16:41:14 - ERROR - stderr - +2025-02-05 16:41:14 - INFO - stdout - {'loss': 0.9349, 'grad_norm': 1.1520885229110718, 'learning_rate': 1.786120701021955e-05, 
'epoch': 0.71} +2025-02-05 16:41:14 - ERROR - stderr - 24%|██▎ | 5289/22434 [6:33:34<11:48:39, 2.48s/it] +2025-02-05 16:41:17 - ERROR - stderr - 24%|██▎ | 5290/22434 [6:33:36<11:46:06, 2.47s/it] +2025-02-05 16:41:17 - ERROR - stderr - +2025-02-05 16:41:17 - ERROR - stderr - +2025-02-05 16:41:17 - INFO - stdout - {'loss': 0.8703, 'grad_norm': 1.0344934463500977, 'learning_rate': 1.7860314587474125e-05, 'epoch': 0.71} +2025-02-05 16:41:17 - ERROR - stderr - 24%|██▎ | 5290/22434 [6:33:37<11:46:06, 2.47s/it] +2025-02-05 16:41:19 - ERROR - stderr - 24%|██▎ | 5291/22434 [6:33:39<11:43:05, 2.46s/it] +2025-02-05 16:41:19 - ERROR - stderr - +2025-02-05 16:41:19 - ERROR - stderr - +2025-02-05 16:41:19 - INFO - stdout - {'loss': 0.8574, 'grad_norm': 1.1576783657073975, 'learning_rate': 1.785942200088796e-05, 'epoch': 0.71} +2025-02-05 16:41:19 - ERROR - stderr - 24%|██▎ | 5291/22434 [6:33:39<11:43:05, 2.46s/it] +2025-02-05 16:41:22 - ERROR - stderr - 24%|██▎ | 5292/22434 [6:33:41<11:50:12, 2.49s/it] +2025-02-05 16:41:22 - ERROR - stderr - +2025-02-05 16:41:22 - ERROR - stderr - +2025-02-05 16:41:22 - INFO - stdout - {'loss': 1.0546, 'grad_norm': 1.1413007974624634, 'learning_rate': 1.785852925047966e-05, 'epoch': 0.71} +2025-02-05 16:41:22 - ERROR - stderr - 24%|██▎ | 5292/22434 [6:33:42<11:50:12, 2.49s/it] +2025-02-05 16:41:24 - ERROR - stderr - 24%|██▎ | 5293/22434 [6:33:44<11:50:25, 2.49s/it] +2025-02-05 16:41:24 - ERROR - stderr - +2025-02-05 16:41:24 - ERROR - stderr - +2025-02-05 16:41:24 - INFO - stdout - {'loss': 0.9736, 'grad_norm': 1.1409422159194946, 'learning_rate': 1.7857636336267843e-05, 'epoch': 0.71} +2025-02-05 16:41:24 - ERROR - stderr - 24%|██▎ | 5293/22434 [6:33:44<11:50:25, 2.49s/it] +2025-02-05 16:41:27 - ERROR - stderr - 24%|██▎ | 5294/22434 [6:33:46<11:52:30, 2.49s/it] +2025-02-05 16:41:27 - ERROR - stderr - +2025-02-05 16:41:27 - ERROR - stderr - +2025-02-05 16:41:27 - INFO - stdout - {'loss': 1.0161, 'grad_norm': 1.0932285785675049, 'learning_rate': 
1.7856743258271115e-05, 'epoch': 0.71} +2025-02-05 16:41:27 - ERROR - stderr - 24%|██▎ | 5294/22434 [6:33:47<11:52:30, 2.49s/it] +2025-02-05 16:41:29 - ERROR - stderr - 24%|██▎ | 5295/22434 [6:33:49<11:51:55, 2.49s/it] +2025-02-05 16:41:29 - ERROR - stderr - +2025-02-05 16:41:29 - ERROR - stderr - +2025-02-05 16:41:29 - INFO - stdout - {'loss': 0.9781, 'grad_norm': 1.1391288042068481, 'learning_rate': 1.785585001650809e-05, 'epoch': 0.71} +2025-02-05 16:41:29 - ERROR - stderr - 24%|██▎ | 5295/22434 [6:33:49<11:51:55, 2.49s/it] +2025-02-05 16:41:32 - ERROR - stderr - 24%|██▎ | 5296/22434 [6:33:52<11:58:09, 2.51s/it] +2025-02-05 16:41:32 - ERROR - stderr - +2025-02-05 16:41:32 - ERROR - stderr - +2025-02-05 16:41:32 - INFO - stdout - {'loss': 0.9149, 'grad_norm': 1.0212510824203491, 'learning_rate': 1.7854956610997388e-05, 'epoch': 0.71} +2025-02-05 16:41:32 - ERROR - stderr - 24%|██▎ | 5296/22434 [6:33:52<11:58:09, 2.51s/it] +2025-02-05 16:41:34 - ERROR - stderr - 24%|██▎ | 5297/22434 [6:33:54<11:52:32, 2.49s/it] +2025-02-05 16:41:34 - ERROR - stderr - +2025-02-05 16:41:34 - ERROR - stderr - +2025-02-05 16:41:34 - INFO - stdout - {'loss': 1.0497, 'grad_norm': 1.2093931436538696, 'learning_rate': 1.7854063041757635e-05, 'epoch': 0.71} +2025-02-05 16:41:34 - ERROR - stderr - 24%|██▎ | 5297/22434 [6:33:54<11:52:32, 2.49s/it] +2025-02-05 16:41:37 - ERROR - stderr - 24%|██▎ | 5298/22434 [6:33:57<11:57:57, 2.51s/it] +2025-02-05 16:41:37 - ERROR - stderr - +2025-02-05 16:41:37 - ERROR - stderr - +2025-02-05 16:41:37 - INFO - stdout - {'loss': 1.0709, 'grad_norm': 1.082269549369812, 'learning_rate': 1.785316930880745e-05, 'epoch': 0.71} +2025-02-05 16:41:37 - ERROR - stderr - 24%|██▎ | 5298/22434 [6:33:57<11:57:57, 2.51s/it] +2025-02-05 16:41:39 - ERROR - stderr - 24%|██▎ | 5299/22434 [6:33:59<11:56:37, 2.51s/it] +2025-02-05 16:41:39 - ERROR - stderr - +2025-02-05 16:41:39 - ERROR - stderr - +2025-02-05 16:41:39 - INFO - stdout - {'loss': 0.964, 'grad_norm': 
0.9924930930137634, 'learning_rate': 1.7852275412165467e-05, 'epoch': 0.71} +2025-02-05 16:41:39 - ERROR - stderr - 24%|██▎ | 5299/22434 [6:33:59<11:56:37, 2.51s/it] +2025-02-05 16:41:42 - ERROR - stderr - 24%|██▎ | 5300/22434 [6:34:02<12:00:08, 2.52s/it] +2025-02-05 16:41:42 - ERROR - stderr - +2025-02-05 16:41:42 - ERROR - stderr - +2025-02-05 16:41:42 - INFO - stdout - {'loss': 0.9801, 'grad_norm': 1.0674864053726196, 'learning_rate': 1.7851381351850318e-05, 'epoch': 0.71} +2025-02-05 16:41:42 - ERROR - stderr - 24%|██▎ | 5300/22434 [6:34:02<12:00:08, 2.52s/it] +2025-02-05 16:41:44 - ERROR - stderr - 24%|██▎ | 5301/22434 [6:34:04<11:56:23, 2.51s/it] +2025-02-05 16:41:44 - ERROR - stderr - +2025-02-05 16:41:44 - ERROR - stderr - +2025-02-05 16:41:44 - INFO - stdout - {'loss': 0.9648, 'grad_norm': 1.0504636764526367, 'learning_rate': 1.7850487127880636e-05, 'epoch': 0.71} +2025-02-05 16:41:44 - ERROR - stderr - 24%|██▎ | 5301/22434 [6:34:04<11:56:23, 2.51s/it] +2025-02-05 16:41:47 - ERROR - stderr - 24%|██▎ | 5302/22434 [6:34:07<11:55:08, 2.50s/it] +2025-02-05 16:41:47 - ERROR - stderr - +2025-02-05 16:41:47 - ERROR - stderr - +2025-02-05 16:41:47 - INFO - stdout - {'loss': 0.9881, 'grad_norm': 1.0514013767242432, 'learning_rate': 1.7849592740275063e-05, 'epoch': 0.71} +2025-02-05 16:41:47 - ERROR - stderr - 24%|██▎ | 5302/22434 [6:34:07<11:55:08, 2.50s/it] +2025-02-05 16:41:49 - ERROR - stderr - 24%|██▎ | 5303/22434 [6:34:09<11:52:18, 2.49s/it] +2025-02-05 16:41:49 - ERROR - stderr - +2025-02-05 16:41:49 - ERROR - stderr - +2025-02-05 16:41:49 - INFO - stdout - {'loss': 0.9545, 'grad_norm': 1.1882227659225464, 'learning_rate': 1.784869818905224e-05, 'epoch': 0.71} +2025-02-05 16:41:49 - ERROR - stderr - 24%|██▎ | 5303/22434 [6:34:09<11:52:18, 2.49s/it] +2025-02-05 16:41:52 - ERROR - stderr - 24%|██▎ | 5304/22434 [6:34:12<12:02:39, 2.53s/it] +2025-02-05 16:41:52 - ERROR - stderr - +2025-02-05 16:41:52 - ERROR - stderr - +2025-02-05 16:41:52 - INFO - stdout - 
{'loss': 1.0266, 'grad_norm': 1.171319842338562, 'learning_rate': 1.7847803474230813e-05, 'epoch': 0.71} +2025-02-05 16:41:52 - ERROR - stderr - 24%|██▎ | 5304/22434 [6:34:12<12:02:39, 2.53s/it] +2025-02-05 16:41:54 - ERROR - stderr - 24%|██▎ | 5305/22434 [6:34:14<11:59:03, 2.52s/it] +2025-02-05 16:41:54 - ERROR - stderr - +2025-02-05 16:41:54 - ERROR - stderr - +2025-02-05 16:41:54 - INFO - stdout - {'loss': 0.9881, 'grad_norm': 1.018519639968872, 'learning_rate': 1.7846908595829432e-05, 'epoch': 0.71} +2025-02-05 16:41:54 - ERROR - stderr - 24%|██▎ | 5305/22434 [6:34:14<11:59:03, 2.52s/it] +2025-02-05 16:41:57 - ERROR - stderr - 24%|██▎ | 5306/22434 [6:34:17<11:57:21, 2.51s/it] +2025-02-05 16:41:57 - ERROR - stderr - +2025-02-05 16:41:57 - ERROR - stderr - +2025-02-05 16:41:57 - INFO - stdout - {'loss': 0.8423, 'grad_norm': 1.0081459283828735, 'learning_rate': 1.7846013553866754e-05, 'epoch': 0.71} +2025-02-05 16:41:57 - ERROR - stderr - 24%|██▎ | 5306/22434 [6:34:17<11:57:21, 2.51s/it] +2025-02-05 16:41:59 - ERROR - stderr - 24%|██▎ | 5307/22434 [6:34:19<11:56:06, 2.51s/it] +2025-02-05 16:41:59 - ERROR - stderr - +2025-02-05 16:41:59 - ERROR - stderr - +2025-02-05 16:41:59 - INFO - stdout - {'loss': 0.9642, 'grad_norm': 1.0839706659317017, 'learning_rate': 1.7845118348361428e-05, 'epoch': 0.71} +2025-02-05 16:41:59 - ERROR - stderr - 24%|██▎ | 5307/22434 [6:34:19<11:56:06, 2.51s/it] +2025-02-05 16:42:02 - ERROR - stderr - 24%|██▎ | 5308/22434 [6:34:22<12:02:30, 2.53s/it] +2025-02-05 16:42:02 - ERROR - stderr - +2025-02-05 16:42:02 - ERROR - stderr - +2025-02-05 16:42:02 - INFO - stdout - {'loss': 0.7332, 'grad_norm': 0.9726243615150452, 'learning_rate': 1.7844222979332115e-05, 'epoch': 0.71} +2025-02-05 16:42:02 - ERROR - stderr - 24%|██▎ | 5308/22434 [6:34:22<12:02:30, 2.53s/it] +2025-02-05 16:42:04 - ERROR - stderr - 24%|██▎ | 5309/22434 [6:34:24<12:02:08, 2.53s/it] +2025-02-05 16:42:05 - ERROR - stderr - +2025-02-05 16:42:05 - ERROR - stderr - +2025-02-05 
16:42:05 - INFO - stdout - {'loss': 0.9754, 'grad_norm': 1.054402470588684, 'learning_rate': 1.7843327446797482e-05, 'epoch': 0.71} +2025-02-05 16:42:05 - ERROR - stderr - 24%|██▎ | 5309/22434 [6:34:24<12:02:08, 2.53s/it] +2025-02-05 16:42:07 - ERROR - stderr - 24%|██▎ | 5310/22434 [6:34:27<11:54:06, 2.50s/it] +2025-02-05 16:42:07 - ERROR - stderr - +2025-02-05 16:42:07 - ERROR - stderr - +2025-02-05 16:42:07 - INFO - stdout - {'loss': 0.9681, 'grad_norm': 1.0407793521881104, 'learning_rate': 1.7842431750776196e-05, 'epoch': 0.71} +2025-02-05 16:42:07 - ERROR - stderr - 24%|██▎ | 5310/22434 [6:34:27<11:54:06, 2.50s/it] +2025-02-05 16:42:09 - ERROR - stderr - 24%|██▎ | 5311/22434 [6:34:29<11:54:58, 2.51s/it] +2025-02-05 16:42:09 - ERROR - stderr - +2025-02-05 16:42:09 - ERROR - stderr - +2025-02-05 16:42:09 - INFO - stdout - {'loss': 0.9874, 'grad_norm': 0.9815563559532166, 'learning_rate': 1.784153589128692e-05, 'epoch': 0.71} +2025-02-05 16:42:09 - ERROR - stderr - 24%|██▎ | 5311/22434 [6:34:29<11:54:58, 2.51s/it] +2025-02-05 16:42:12 - ERROR - stderr - 24%|██▎ | 5312/22434 [6:34:32<11:56:07, 2.51s/it] +2025-02-05 16:42:12 - ERROR - stderr - +2025-02-05 16:42:12 - ERROR - stderr - +2025-02-05 16:42:12 - INFO - stdout - {'loss': 1.008, 'grad_norm': 1.109031081199646, 'learning_rate': 1.7840639868348338e-05, 'epoch': 0.71} +2025-02-05 16:42:12 - ERROR - stderr - 24%|██▎ | 5312/22434 [6:34:32<11:56:07, 2.51s/it] +2025-02-05 16:42:14 - ERROR - stderr - 24%|██▎ | 5313/22434 [6:34:34<11:48:56, 2.48s/it] +2025-02-05 16:42:14 - ERROR - stderr - +2025-02-05 16:42:14 - ERROR - stderr - +2025-02-05 16:42:14 - INFO - stdout - {'loss': 1.0199, 'grad_norm': 1.0666192770004272, 'learning_rate': 1.7839743681979117e-05, 'epoch': 0.71} +2025-02-05 16:42:14 - ERROR - stderr - 24%|██▎ | 5313/22434 [6:34:34<11:48:56, 2.48s/it] +2025-02-05 16:42:17 - ERROR - stderr - 24%|██▎ | 5314/22434 [6:34:37<12:09:10, 2.56s/it] +2025-02-05 16:42:17 - ERROR - stderr - +2025-02-05 16:42:17 - ERROR - 
stderr - +2025-02-05 16:42:17 - INFO - stdout - {'loss': 0.8564, 'grad_norm': 1.0544461011886597, 'learning_rate': 1.783884733219794e-05, 'epoch': 0.71} +2025-02-05 16:42:17 - ERROR - stderr - 24%|██▎ | 5314/22434 [6:34:37<12:09:10, 2.56s/it] +2025-02-05 16:42:20 - ERROR - stderr - 24%|██▎ | 5315/22434 [6:34:39<12:05:27, 2.54s/it] +2025-02-05 16:42:20 - ERROR - stderr - +2025-02-05 16:42:20 - ERROR - stderr - +2025-02-05 16:42:20 - INFO - stdout - {'loss': 0.9478, 'grad_norm': 0.9892165064811707, 'learning_rate': 1.783795081902349e-05, 'epoch': 0.71} +2025-02-05 16:42:20 - ERROR - stderr - 24%|██▎ | 5315/22434 [6:34:39<12:05:27, 2.54s/it] +2025-02-05 16:42:22 - ERROR - stderr - 24%|██▎ | 5316/22434 [6:34:42<12:04:02, 2.54s/it] +2025-02-05 16:42:22 - ERROR - stderr - +2025-02-05 16:42:22 - ERROR - stderr - +2025-02-05 16:42:22 - INFO - stdout - {'loss': 0.8816, 'grad_norm': 0.9916752576828003, 'learning_rate': 1.783705414247446e-05, 'epoch': 0.71} +2025-02-05 16:42:22 - ERROR - stderr - 24%|██▎ | 5316/22434 [6:34:42<12:04:02, 2.54s/it] +2025-02-05 16:42:25 - ERROR - stderr - 24%|██▎ | 5317/22434 [6:34:44<12:00:35, 2.53s/it] +2025-02-05 16:42:25 - ERROR - stderr - +2025-02-05 16:42:25 - ERROR - stderr - +2025-02-05 16:42:25 - INFO - stdout - {'loss': 0.9885, 'grad_norm': 1.0418808460235596, 'learning_rate': 1.783615730256953e-05, 'epoch': 0.71} +2025-02-05 16:42:25 - ERROR - stderr - 24%|██▎ | 5317/22434 [6:34:44<12:00:35, 2.53s/it] +2025-02-05 16:42:27 - ERROR - stderr - 24%|██▎ | 5318/22434 [6:34:47<12:05:55, 2.54s/it] +2025-02-05 16:42:27 - ERROR - stderr - +2025-02-05 16:42:27 - ERROR - stderr - +2025-02-05 16:42:27 - INFO - stdout - {'loss': 0.9534, 'grad_norm': 1.0031366348266602, 'learning_rate': 1.7835260299327402e-05, 'epoch': 0.71} +2025-02-05 16:42:27 - ERROR - stderr - 24%|██▎ | 5318/22434 [6:34:47<12:05:55, 2.54s/it] +2025-02-05 16:42:30 - ERROR - stderr - 24%|██▎ | 5319/22434 [6:34:49<11:58:17, 2.52s/it] +2025-02-05 16:42:30 - ERROR - stderr - 
+2025-02-05 16:42:30 - ERROR - stderr - +2025-02-05 16:42:30 - INFO - stdout - {'loss': 0.9269, 'grad_norm': 1.0235954523086548, 'learning_rate': 1.7834363132766772e-05, 'epoch': 0.71} +2025-02-05 16:42:30 - ERROR - stderr - 24%|██▎ | 5319/22434 [6:34:49<11:58:17, 2.52s/it] +2025-02-05 16:42:32 - ERROR - stderr - 24%|██▎ | 5320/22434 [6:34:52<11:57:49, 2.52s/it] +2025-02-05 16:42:32 - ERROR - stderr - +2025-02-05 16:42:32 - ERROR - stderr - +2025-02-05 16:42:32 - INFO - stdout - {'loss': 1.0242, 'grad_norm': 1.0455982685089111, 'learning_rate': 1.7833465802906338e-05, 'epoch': 0.71} +2025-02-05 16:42:32 - ERROR - stderr - 24%|██▎ | 5320/22434 [6:34:52<11:57:49, 2.52s/it] +2025-02-05 16:42:35 - ERROR - stderr - 24%|██▎ | 5321/22434 [6:34:54<11:51:35, 2.49s/it] +2025-02-05 16:42:35 - ERROR - stderr - +2025-02-05 16:42:35 - ERROR - stderr - +2025-02-05 16:42:35 - INFO - stdout - {'loss': 0.9916, 'grad_norm': 1.2224328517913818, 'learning_rate': 1.7832568309764802e-05, 'epoch': 0.71} +2025-02-05 16:42:35 - ERROR - stderr - 24%|██▎ | 5321/22434 [6:34:54<11:51:35, 2.49s/it] +2025-02-05 16:42:37 - ERROR - stderr - 24%|██▎ | 5322/22434 [6:34:57<11:47:52, 2.48s/it] +2025-02-05 16:42:37 - ERROR - stderr - +2025-02-05 16:42:37 - ERROR - stderr - +2025-02-05 16:42:37 - INFO - stdout - {'loss': 0.9772, 'grad_norm': 0.9905663728713989, 'learning_rate': 1.783167065336088e-05, 'epoch': 0.71} +2025-02-05 16:42:37 - ERROR - stderr - 24%|██▎ | 5322/22434 [6:34:57<11:47:52, 2.48s/it] +2025-02-05 16:42:40 - ERROR - stderr - 24%|██▎ | 5323/22434 [6:34:59<11:51:07, 2.49s/it] +2025-02-05 16:42:40 - ERROR - stderr - +2025-02-05 16:42:40 - ERROR - stderr - +2025-02-05 16:42:40 - INFO - stdout - {'loss': 0.9369, 'grad_norm': 0.9096208810806274, 'learning_rate': 1.7830772833713275e-05, 'epoch': 0.71} +2025-02-05 16:42:40 - ERROR - stderr - 24%|██▎ | 5323/22434 [6:34:59<11:51:07, 2.49s/it] +2025-02-05 16:42:42 - ERROR - stderr - 24%|██▎ | 5324/22434 [6:35:02<11:48:05, 2.48s/it] +2025-02-05 
16:42:42 - ERROR - stderr - +2025-02-05 16:42:42 - ERROR - stderr - +2025-02-05 16:42:42 - INFO - stdout - {'loss': 1.0427, 'grad_norm': 1.181073546409607, 'learning_rate': 1.7829874850840705e-05, 'epoch': 0.71} +2025-02-05 16:42:42 - ERROR - stderr - 24%|██▎ | 5324/22434 [6:35:02<11:48:05, 2.48s/it] +2025-02-05 16:42:44 - ERROR - stderr - 24%|██▎ | 5325/22434 [6:35:04<11:42:58, 2.47s/it] +2025-02-05 16:42:45 - ERROR - stderr - +2025-02-05 16:42:45 - ERROR - stderr - +2025-02-05 16:42:45 - INFO - stdout - {'loss': 0.9686, 'grad_norm': 1.0163829326629639, 'learning_rate': 1.7828976704761884e-05, 'epoch': 0.71} +2025-02-05 16:42:45 - ERROR - stderr - 24%|██▎ | 5325/22434 [6:35:04<11:42:58, 2.47s/it] +2025-02-05 16:42:47 - ERROR - stderr - 24%|██▎ | 5326/22434 [6:35:07<11:40:01, 2.46s/it] +2025-02-05 16:42:47 - ERROR - stderr - +2025-02-05 16:42:47 - ERROR - stderr - +2025-02-05 16:42:47 - INFO - stdout - {'loss': 0.8775, 'grad_norm': 1.2507660388946533, 'learning_rate': 1.7828078395495536e-05, 'epoch': 0.71} +2025-02-05 16:42:47 - ERROR - stderr - 24%|██▎ | 5326/22434 [6:35:07<11:40:01, 2.46s/it] +2025-02-05 16:42:49 - ERROR - stderr - 24%|██▎ | 5327/22434 [6:35:09<11:42:50, 2.47s/it] +2025-02-05 16:42:49 - ERROR - stderr - +2025-02-05 16:42:49 - ERROR - stderr - +2025-02-05 16:42:49 - INFO - stdout - {'loss': 0.946, 'grad_norm': 1.048471212387085, 'learning_rate': 1.7827179923060382e-05, 'epoch': 0.71} +2025-02-05 16:42:49 - ERROR - stderr - 24%|██▎ | 5327/22434 [6:35:09<11:42:50, 2.47s/it] +2025-02-05 16:42:52 - ERROR - stderr - 24%|██▎ | 5328/22434 [6:35:12<11:47:05, 2.48s/it] +2025-02-05 16:42:52 - ERROR - stderr - +2025-02-05 16:42:52 - ERROR - stderr - +2025-02-05 16:42:52 - INFO - stdout - {'loss': 0.9341, 'grad_norm': 1.0272212028503418, 'learning_rate': 1.782628128747516e-05, 'epoch': 0.71} +2025-02-05 16:42:52 - ERROR - stderr - 24%|██▎ | 5328/22434 [6:35:12<11:47:05, 2.48s/it] +2025-02-05 16:42:54 - ERROR - stderr - 24%|██▍ | 5329/22434 [6:35:14<11:46:30, 
2.48s/it] +2025-02-05 16:42:54 - ERROR - stderr - +2025-02-05 16:42:54 - ERROR - stderr - +2025-02-05 16:42:54 - INFO - stdout - {'loss': 1.0057, 'grad_norm': 1.1031184196472168, 'learning_rate': 1.7825382488758585e-05, 'epoch': 0.71} +2025-02-05 16:42:54 - ERROR - stderr - 24%|██▍ | 5329/22434 [6:35:14<11:46:30, 2.48s/it] +2025-02-05 16:42:57 - ERROR - stderr - 24%|██▍ | 5330/22434 [6:35:17<11:46:50, 2.48s/it] +2025-02-05 16:42:57 - ERROR - stderr - +2025-02-05 16:42:57 - ERROR - stderr - +2025-02-05 16:42:57 - INFO - stdout - {'loss': 1.1132, 'grad_norm': 1.1085314750671387, 'learning_rate': 1.7824483526929403e-05, 'epoch': 0.71} +2025-02-05 16:42:57 - ERROR - stderr - 24%|██▍ | 5330/22434 [6:35:17<11:46:50, 2.48s/it] +2025-02-05 16:42:59 - ERROR - stderr - 24%|██▍ | 5331/22434 [6:35:19<11:45:45, 2.48s/it] +2025-02-05 16:42:59 - ERROR - stderr - +2025-02-05 16:42:59 - ERROR - stderr - +2025-02-05 16:42:59 - INFO - stdout - {'loss': 1.0181, 'grad_norm': 1.0439192056655884, 'learning_rate': 1.782358440200635e-05, 'epoch': 0.71} +2025-02-05 16:42:59 - ERROR - stderr - 24%|██▍ | 5331/22434 [6:35:19<11:45:45, 2.48s/it] +2025-02-05 16:43:02 - ERROR - stderr - 24%|██▍ | 5332/22434 [6:35:22<11:48:57, 2.49s/it] +2025-02-05 16:43:02 - ERROR - stderr - +2025-02-05 16:43:02 - ERROR - stderr - +2025-02-05 16:43:02 - INFO - stdout - {'loss': 1.0269, 'grad_norm': 1.0995310544967651, 'learning_rate': 1.782268511400817e-05, 'epoch': 0.71} +2025-02-05 16:43:02 - ERROR - stderr - 24%|██▍ | 5332/22434 [6:35:22<11:48:57, 2.49s/it] +2025-02-05 16:43:04 - ERROR - stderr - 24%|██▍ | 5333/22434 [6:35:24<11:51:24, 2.50s/it] +2025-02-05 16:43:04 - ERROR - stderr - +2025-02-05 16:43:04 - ERROR - stderr - +2025-02-05 16:43:04 - INFO - stdout - {'loss': 0.9717, 'grad_norm': 1.021683692932129, 'learning_rate': 1.7821785662953597e-05, 'epoch': 0.71} +2025-02-05 16:43:04 - ERROR - stderr - 24%|██▍ | 5333/22434 [6:35:24<11:51:24, 2.50s/it] +2025-02-05 16:43:07 - ERROR - stderr - 24%|██▍ | 
5334/22434 [6:35:27<11:45:55, 2.48s/it] +2025-02-05 16:43:07 - ERROR - stderr - +2025-02-05 16:43:07 - ERROR - stderr - +2025-02-05 16:43:07 - INFO - stdout - {'loss': 0.9467, 'grad_norm': 1.1692471504211426, 'learning_rate': 1.782088604886139e-05, 'epoch': 0.71} +2025-02-05 16:43:07 - ERROR - stderr - 24%|██▍ | 5334/22434 [6:35:27<11:45:55, 2.48s/it] +2025-02-05 16:43:09 - ERROR - stderr - 24%|██▍ | 5335/22434 [6:35:29<11:41:07, 2.46s/it] +2025-02-05 16:43:09 - ERROR - stderr - +2025-02-05 16:43:09 - ERROR - stderr - +2025-02-05 16:43:09 - INFO - stdout - {'loss': 1.0362, 'grad_norm': 1.189568281173706, 'learning_rate': 1.7819986271750295e-05, 'epoch': 0.71} +2025-02-05 16:43:09 - ERROR - stderr - 24%|██▍ | 5335/22434 [6:35:29<11:41:07, 2.46s/it] +2025-02-05 16:43:12 - ERROR - stderr - 24%|██▍ | 5336/22434 [6:35:32<11:51:28, 2.50s/it] +2025-02-05 16:43:12 - ERROR - stderr - +2025-02-05 16:43:12 - ERROR - stderr - +2025-02-05 16:43:12 - INFO - stdout - {'loss': 0.8939, 'grad_norm': 1.0767238140106201, 'learning_rate': 1.781908633163907e-05, 'epoch': 0.71} +2025-02-05 16:43:12 - ERROR - stderr - 24%|██▍ | 5336/22434 [6:35:32<11:51:28, 2.50s/it] +2025-02-05 16:43:14 - ERROR - stderr - 24%|██▍ | 5337/22434 [6:35:34<11:49:55, 2.49s/it] +2025-02-05 16:43:14 - ERROR - stderr - +2025-02-05 16:43:14 - ERROR - stderr - +2025-02-05 16:43:14 - INFO - stdout - {'loss': 0.8912, 'grad_norm': 0.966705858707428, 'learning_rate': 1.7818186228546474e-05, 'epoch': 0.71} +2025-02-05 16:43:14 - ERROR - stderr - 24%|██▍ | 5337/22434 [6:35:34<11:49:55, 2.49s/it] +2025-02-05 16:43:17 - ERROR - stderr - 24%|██▍ | 5338/22434 [6:35:37<11:52:37, 2.50s/it] +2025-02-05 16:43:17 - ERROR - stderr - +2025-02-05 16:43:17 - ERROR - stderr - +2025-02-05 16:43:17 - INFO - stdout - {'loss': 0.8977, 'grad_norm': 1.1073014736175537, 'learning_rate': 1.7817285962491268e-05, 'epoch': 0.71} +2025-02-05 16:43:17 - ERROR - stderr - 24%|██▍ | 5338/22434 [6:35:37<11:52:37, 2.50s/it] +2025-02-05 16:43:19 - ERROR 
- stderr - 24%|██▍ | 5339/22434 [6:35:39<11:57:16, 2.52s/it] +2025-02-05 16:43:19 - ERROR - stderr - +2025-02-05 16:43:19 - ERROR - stderr - +2025-02-05 16:43:19 - INFO - stdout - {'loss': 0.9191, 'grad_norm': 1.1901623010635376, 'learning_rate': 1.7816385533492213e-05, 'epoch': 0.71} +2025-02-05 16:43:19 - ERROR - stderr - 24%|██▍ | 5339/22434 [6:35:39<11:57:16, 2.52s/it] +2025-02-05 16:43:22 - ERROR - stderr - 24%|██▍ | 5340/22434 [6:35:42<12:00:52, 2.53s/it] +2025-02-05 16:43:22 - ERROR - stderr - +2025-02-05 16:43:22 - ERROR - stderr - +2025-02-05 16:43:22 - INFO - stdout - {'loss': 0.9866, 'grad_norm': 1.0701591968536377, 'learning_rate': 1.7815484941568084e-05, 'epoch': 0.71} +2025-02-05 16:43:22 - ERROR - stderr - 24%|██▍ | 5340/22434 [6:35:42<12:00:52, 2.53s/it] +2025-02-05 16:43:25 - ERROR - stderr - 24%|██▍ | 5341/22434 [6:35:44<12:15:34, 2.58s/it] +2025-02-05 16:43:25 - ERROR - stderr - +2025-02-05 16:43:25 - ERROR - stderr - +2025-02-05 16:43:25 - INFO - stdout - {'loss': 0.9453, 'grad_norm': 0.9914907813072205, 'learning_rate': 1.781458418673765e-05, 'epoch': 0.71} +2025-02-05 16:43:25 - ERROR - stderr - 24%|██▍ | 5341/22434 [6:35:44<12:15:34, 2.58s/it] +2025-02-05 16:43:27 - ERROR - stderr - 24%|██▍ | 5342/22434 [6:35:47<12:25:57, 2.62s/it] +2025-02-05 16:43:27 - ERROR - stderr - +2025-02-05 16:43:27 - ERROR - stderr - +2025-02-05 16:43:27 - INFO - stdout - {'loss': 0.9324, 'grad_norm': 1.0258045196533203, 'learning_rate': 1.7813683269019682e-05, 'epoch': 0.71} +2025-02-05 16:43:27 - ERROR - stderr - 24%|██▍ | 5342/22434 [6:35:47<12:25:57, 2.62s/it] +2025-02-05 16:43:30 - ERROR - stderr - 24%|██▍ | 5343/22434 [6:35:50<12:22:06, 2.61s/it] +2025-02-05 16:43:30 - ERROR - stderr - +2025-02-05 16:43:30 - ERROR - stderr - +2025-02-05 16:43:30 - INFO - stdout - {'loss': 0.8608, 'grad_norm': 0.9813135266304016, 'learning_rate': 1.781278218843297e-05, 'epoch': 0.71} +2025-02-05 16:43:30 - ERROR - stderr - 24%|██▍ | 5343/22434 [6:35:50<12:22:06, 2.61s/it] 
+2025-02-05 16:43:32 - ERROR - stderr - 24%|██▍ | 5344/22434 [6:35:52<12:16:05, 2.58s/it] +2025-02-05 16:43:32 - ERROR - stderr - +2025-02-05 16:43:32 - ERROR - stderr - +2025-02-05 16:43:32 - INFO - stdout - {'loss': 0.9924, 'grad_norm': 0.950508713722229, 'learning_rate': 1.7811880944996285e-05, 'epoch': 0.71} +2025-02-05 16:43:32 - ERROR - stderr - 24%|██▍ | 5344/22434 [6:35:52<12:16:05, 2.58s/it] +2025-02-05 16:43:35 - ERROR - stderr - 24%|██▍ | 5345/22434 [6:35:55<12:44:30, 2.68s/it] +2025-02-05 16:43:35 - ERROR - stderr - +2025-02-05 16:43:35 - ERROR - stderr - +2025-02-05 16:43:35 - INFO - stdout - {'loss': 1.0356, 'grad_norm': 1.1717063188552856, 'learning_rate': 1.7810979538728416e-05, 'epoch': 0.71} +2025-02-05 16:43:35 - ERROR - stderr - 24%|██▍ | 5345/22434 [6:35:55<12:44:30, 2.68s/it] +2025-02-05 16:43:38 - ERROR - stderr - 24%|██▍ | 5346/22434 [6:35:58<12:42:23, 2.68s/it] +2025-02-05 16:43:38 - ERROR - stderr - +2025-02-05 16:43:38 - ERROR - stderr - +2025-02-05 16:43:38 - INFO - stdout - {'loss': 1.0761, 'grad_norm': 1.1714346408843994, 'learning_rate': 1.7810077969648157e-05, 'epoch': 0.71} +2025-02-05 16:43:38 - ERROR - stderr - 24%|██▍ | 5346/22434 [6:35:58<12:42:23, 2.68s/it] +2025-02-05 16:43:40 - ERROR - stderr - 24%|██▍ | 5347/22434 [6:36:00<12:25:58, 2.62s/it] +2025-02-05 16:43:41 - ERROR - stderr - +2025-02-05 16:43:41 - ERROR - stderr - +2025-02-05 16:43:41 - INFO - stdout - {'loss': 1.1361, 'grad_norm': 1.1618902683258057, 'learning_rate': 1.780917623777429e-05, 'epoch': 0.72} +2025-02-05 16:43:41 - ERROR - stderr - 24%|██▍ | 5347/22434 [6:36:00<12:25:58, 2.62s/it] +2025-02-05 16:43:43 - ERROR - stderr - 24%|██▍ | 5348/22434 [6:36:03<12:14:48, 2.58s/it] +2025-02-05 16:43:43 - ERROR - stderr - +2025-02-05 16:43:43 - ERROR - stderr - +2025-02-05 16:43:43 - INFO - stdout - {'loss': 0.932, 'grad_norm': 1.1420725584030151, 'learning_rate': 1.7808274343125626e-05, 'epoch': 0.72} +2025-02-05 16:43:43 - ERROR - stderr - 24%|██▍ | 5348/22434 
[6:36:03<12:14:48, 2.58s/it] +2025-02-05 16:43:46 - ERROR - stderr - 24%|██▍ | 5349/22434 [6:36:05<12:12:24, 2.57s/it] +2025-02-05 16:43:46 - ERROR - stderr - +2025-02-05 16:43:46 - ERROR - stderr - +2025-02-05 16:43:46 - INFO - stdout - {'loss': 0.936, 'grad_norm': 1.1327266693115234, 'learning_rate': 1.7807372285720945e-05, 'epoch': 0.72} +2025-02-05 16:43:46 - ERROR - stderr - 24%|██▍ | 5349/22434 [6:36:05<12:12:24, 2.57s/it] +2025-02-05 16:43:48 - ERROR - stderr - 24%|██▍ | 5350/22434 [6:36:08<12:11:08, 2.57s/it] +2025-02-05 16:43:48 - ERROR - stderr - +2025-02-05 16:43:48 - ERROR - stderr - +2025-02-05 16:43:48 - INFO - stdout - {'loss': 1.022, 'grad_norm': 1.107387900352478, 'learning_rate': 1.7806470065579064e-05, 'epoch': 0.72} +2025-02-05 16:43:48 - ERROR - stderr - 24%|██▍ | 5350/22434 [6:36:08<12:11:08, 2.57s/it] +2025-02-05 16:43:51 - ERROR - stderr - 24%|██▍ | 5351/22434 [6:36:11<12:28:23, 2.63s/it] +2025-02-05 16:43:51 - ERROR - stderr - +2025-02-05 16:43:51 - ERROR - stderr - +2025-02-05 16:43:51 - INFO - stdout - {'loss': 0.8787, 'grad_norm': 1.0707104206085205, 'learning_rate': 1.7805567682718785e-05, 'epoch': 0.72} +2025-02-05 16:43:51 - ERROR - stderr - 24%|██▍ | 5351/22434 [6:36:11<12:28:23, 2.63s/it] +2025-02-05 16:43:53 - ERROR - stderr - 24%|██▍ | 5352/22434 [6:36:13<12:18:57, 2.60s/it] +2025-02-05 16:43:53 - ERROR - stderr - +2025-02-05 16:43:53 - ERROR - stderr - +2025-02-05 16:43:53 - INFO - stdout - {'loss': 0.9422, 'grad_norm': 1.0453429222106934, 'learning_rate': 1.7804665137158917e-05, 'epoch': 0.72} +2025-02-05 16:43:53 - ERROR - stderr - 24%|██▍ | 5352/22434 [6:36:13<12:18:57, 2.60s/it] +2025-02-05 16:43:56 - ERROR - stderr - 24%|██▍ | 5353/22434 [6:36:16<12:16:31, 2.59s/it] +2025-02-05 16:43:56 - ERROR - stderr - +2025-02-05 16:43:56 - ERROR - stderr - +2025-02-05 16:43:56 - INFO - stdout - {'loss': 0.86, 'grad_norm': 0.9811695218086243, 'learning_rate': 1.780376242891827e-05, 'epoch': 0.72} +2025-02-05 16:43:56 - ERROR - stderr - 
24%|██▍ | 5353/22434 [6:36:16<12:16:31, 2.59s/it] +2025-02-05 16:43:58 - ERROR - stderr - 24%|██▍ | 5354/22434 [6:36:18<12:08:59, 2.56s/it] +2025-02-05 16:43:58 - ERROR - stderr - +2025-02-05 16:43:58 - ERROR - stderr - +2025-02-05 16:43:58 - INFO - stdout - {'loss': 0.9357, 'grad_norm': 1.0117377042770386, 'learning_rate': 1.7802859558015666e-05, 'epoch': 0.72} +2025-02-05 16:43:58 - ERROR - stderr - 24%|██▍ | 5354/22434 [6:36:18<12:08:59, 2.56s/it] +2025-02-05 16:44:01 - ERROR - stderr - 24%|██▍ | 5355/22434 [6:36:21<11:58:42, 2.52s/it] +2025-02-05 16:44:01 - ERROR - stderr - +2025-02-05 16:44:01 - ERROR - stderr - +2025-02-05 16:44:01 - INFO - stdout - {'loss': 0.9805, 'grad_norm': 1.071099042892456, 'learning_rate': 1.7801956524469922e-05, 'epoch': 0.72} +2025-02-05 16:44:01 - ERROR - stderr - 24%|██▍ | 5355/22434 [6:36:21<11:58:42, 2.52s/it] +2025-02-05 16:44:03 - ERROR - stderr - 24%|██▍ | 5356/22434 [6:36:23<11:50:12, 2.50s/it] +2025-02-05 16:44:03 - ERROR - stderr - +2025-02-05 16:44:03 - ERROR - stderr - +2025-02-05 16:44:03 - INFO - stdout - {'loss': 0.9908, 'grad_norm': 1.0444166660308838, 'learning_rate': 1.7801053328299856e-05, 'epoch': 0.72} +2025-02-05 16:44:03 - ERROR - stderr - 24%|██▍ | 5356/22434 [6:36:23<11:50:12, 2.50s/it] +2025-02-05 16:44:06 - ERROR - stderr - 24%|██▍ | 5357/22434 [6:36:26<11:51:14, 2.50s/it] +2025-02-05 16:44:06 - ERROR - stderr - +2025-02-05 16:44:06 - ERROR - stderr - +2025-02-05 16:44:06 - INFO - stdout - {'loss': 0.9765, 'grad_norm': 1.1647387742996216, 'learning_rate': 1.78001499695243e-05, 'epoch': 0.72} +2025-02-05 16:44:06 - ERROR - stderr - 24%|██▍ | 5357/22434 [6:36:26<11:51:14, 2.50s/it] +2025-02-05 16:44:08 - ERROR - stderr - 24%|██▍ | 5358/22434 [6:36:28<11:45:18, 2.48s/it] +2025-02-05 16:44:08 - ERROR - stderr - +2025-02-05 16:44:08 - ERROR - stderr - +2025-02-05 16:44:08 - INFO - stdout - {'loss': 0.8907, 'grad_norm': 1.1209625005722046, 'learning_rate': 1.779924644816208e-05, 'epoch': 0.72} +2025-02-05 
16:44:08 - ERROR - stderr - 24%|██▍ | 5358/22434 [6:36:28<11:45:18, 2.48s/it] +2025-02-05 16:44:11 - ERROR - stderr - 24%|██▍ | 5359/22434 [6:36:30<11:44:12, 2.47s/it] +2025-02-05 16:44:11 - ERROR - stderr - +2025-02-05 16:44:11 - ERROR - stderr - +2025-02-05 16:44:11 - INFO - stdout - {'loss': 0.9039, 'grad_norm': 1.054835319519043, 'learning_rate': 1.779834276423203e-05, 'epoch': 0.72} +2025-02-05 16:44:11 - ERROR - stderr - 24%|██▍ | 5359/22434 [6:36:31<11:44:12, 2.47s/it] +2025-02-05 16:44:13 - ERROR - stderr - 24%|██▍ | 5360/22434 [6:36:33<11:49:40, 2.49s/it] +2025-02-05 16:44:13 - ERROR - stderr - +2025-02-05 16:44:13 - ERROR - stderr - +2025-02-05 16:44:13 - INFO - stdout - {'loss': 0.8217, 'grad_norm': 0.9631587266921997, 'learning_rate': 1.7797438917752992e-05, 'epoch': 0.72} +2025-02-05 16:44:13 - ERROR - stderr - 24%|██▍ | 5360/22434 [6:36:33<11:49:40, 2.49s/it] +2025-02-05 16:44:16 - ERROR - stderr - 24%|██▍ | 5361/22434 [6:36:36<11:54:57, 2.51s/it] +2025-02-05 16:44:16 - ERROR - stderr - +2025-02-05 16:44:16 - ERROR - stderr - +2025-02-05 16:44:16 - INFO - stdout - {'loss': 0.9218, 'grad_norm': 1.1388700008392334, 'learning_rate': 1.7796534908743798e-05, 'epoch': 0.72} +2025-02-05 16:44:16 - ERROR - stderr - 24%|██▍ | 5361/22434 [6:36:36<11:54:57, 2.51s/it] +2025-02-05 16:44:18 - ERROR - stderr - 24%|██▍ | 5362/22434 [6:36:38<11:53:09, 2.51s/it] +2025-02-05 16:44:18 - ERROR - stderr - +2025-02-05 16:44:18 - ERROR - stderr - +2025-02-05 16:44:18 - INFO - stdout - {'loss': 0.9053, 'grad_norm': 1.0172324180603027, 'learning_rate': 1.7795630737223296e-05, 'epoch': 0.72} +2025-02-05 16:44:18 - ERROR - stderr - 24%|██▍ | 5362/22434 [6:36:38<11:53:09, 2.51s/it] +2025-02-05 16:44:21 - ERROR - stderr - 24%|██▍ | 5363/22434 [6:36:41<11:52:08, 2.50s/it] +2025-02-05 16:44:21 - ERROR - stderr - +2025-02-05 16:44:21 - ERROR - stderr - +2025-02-05 16:44:21 - INFO - stdout - {'loss': 0.8661, 'grad_norm': 1.015089511871338, 'learning_rate': 1.7794726403210328e-05, 
'epoch': 0.72} +2025-02-05 16:44:21 - ERROR - stderr - 24%|██▍ | 5363/22434 [6:36:41<11:52:08, 2.50s/it] +2025-02-05 16:44:23 - ERROR - stderr - 24%|██▍ | 5364/22434 [6:36:43<11:48:31, 2.49s/it] +2025-02-05 16:44:23 - ERROR - stderr - +2025-02-05 16:44:23 - ERROR - stderr - +2025-02-05 16:44:23 - INFO - stdout - {'loss': 0.8592, 'grad_norm': 1.0246933698654175, 'learning_rate': 1.779382190672375e-05, 'epoch': 0.72} +2025-02-05 16:44:23 - ERROR - stderr - 24%|██▍ | 5364/22434 [6:36:43<11:48:31, 2.49s/it] +2025-02-05 16:44:26 - ERROR - stderr - 24%|██▍ | 5365/22434 [6:36:46<11:54:35, 2.51s/it] +2025-02-05 16:44:26 - ERROR - stderr - +2025-02-05 16:44:26 - ERROR - stderr - +2025-02-05 16:44:26 - INFO - stdout - {'loss': 0.8703, 'grad_norm': 1.292546272277832, 'learning_rate': 1.779291724778241e-05, 'epoch': 0.72} +2025-02-05 16:44:26 - ERROR - stderr - 24%|██▍ | 5365/22434 [6:36:46<11:54:35, 2.51s/it] +2025-02-05 16:44:28 - ERROR - stderr - 24%|██▍ | 5366/22434 [6:36:48<11:47:11, 2.49s/it] +2025-02-05 16:44:28 - ERROR - stderr - +2025-02-05 16:44:28 - ERROR - stderr - +2025-02-05 16:44:28 - INFO - stdout - {'loss': 1.0226, 'grad_norm': 1.070896863937378, 'learning_rate': 1.779201242640517e-05, 'epoch': 0.72} +2025-02-05 16:44:28 - ERROR - stderr - 24%|██▍ | 5366/22434 [6:36:48<11:47:11, 2.49s/it] +2025-02-05 16:44:31 - ERROR - stderr - 24%|██▍ | 5367/22434 [6:36:50<11:44:04, 2.48s/it] +2025-02-05 16:44:31 - ERROR - stderr - +2025-02-05 16:44:31 - ERROR - stderr - +2025-02-05 16:44:31 - INFO - stdout - {'loss': 0.9088, 'grad_norm': 1.0165013074874878, 'learning_rate': 1.7791107442610886e-05, 'epoch': 0.72} +2025-02-05 16:44:31 - ERROR - stderr - 24%|██▍ | 5367/22434 [6:36:51<11:44:04, 2.48s/it] +2025-02-05 16:44:33 - ERROR - stderr - 24%|██▍ | 5368/22434 [6:36:53<11:46:47, 2.48s/it] +2025-02-05 16:44:33 - ERROR - stderr - +2025-02-05 16:44:33 - ERROR - stderr - +2025-02-05 16:44:33 - INFO - stdout - {'loss': 1.0075, 'grad_norm': 1.0338480472564697, 'learning_rate': 
1.779020229641842e-05, 'epoch': 0.72} +2025-02-05 16:44:33 - ERROR - stderr - 24%|██▍ | 5368/22434 [6:36:53<11:46:47, 2.48s/it] +2025-02-05 16:44:36 - ERROR - stderr - 24%|██▍ | 5369/22434 [6:36:55<11:44:24, 2.48s/it] +2025-02-05 16:44:36 - ERROR - stderr - +2025-02-05 16:44:36 - ERROR - stderr - +2025-02-05 16:44:36 - INFO - stdout - {'loss': 1.0456, 'grad_norm': 1.1418612003326416, 'learning_rate': 1.7789296987846644e-05, 'epoch': 0.72} +2025-02-05 16:44:36 - ERROR - stderr - 24%|██▍ | 5369/22434 [6:36:55<11:44:24, 2.48s/it] +2025-02-05 16:44:38 - ERROR - stderr - 24%|██▍ | 5370/22434 [6:36:58<11:42:42, 2.47s/it] +2025-02-05 16:44:38 - ERROR - stderr - +2025-02-05 16:44:38 - ERROR - stderr - +2025-02-05 16:44:38 - INFO - stdout - {'loss': 0.8802, 'grad_norm': 1.0352901220321655, 'learning_rate': 1.7788391516914422e-05, 'epoch': 0.72} +2025-02-05 16:44:38 - ERROR - stderr - 24%|██▍ | 5370/22434 [6:36:58<11:42:42, 2.47s/it] +2025-02-05 16:44:41 - ERROR - stderr - 24%|██▍ | 5371/22434 [6:37:00<11:47:03, 2.49s/it] +2025-02-05 16:44:41 - ERROR - stderr - +2025-02-05 16:44:41 - ERROR - stderr - +2025-02-05 16:44:41 - INFO - stdout - {'loss': 0.9889, 'grad_norm': 1.0773141384124756, 'learning_rate': 1.7787485883640635e-05, 'epoch': 0.72} +2025-02-05 16:44:41 - ERROR - stderr - 24%|██▍ | 5371/22434 [6:37:00<11:47:03, 2.49s/it] +2025-02-05 16:44:43 - ERROR - stderr - 24%|██▍ | 5372/22434 [6:37:03<11:50:24, 2.50s/it] +2025-02-05 16:44:43 - ERROR - stderr - +2025-02-05 16:44:43 - ERROR - stderr - +2025-02-05 16:44:43 - INFO - stdout - {'loss': 0.9228, 'grad_norm': 1.1402558088302612, 'learning_rate': 1.7786580088044157e-05, 'epoch': 0.72} +2025-02-05 16:44:43 - ERROR - stderr - 24%|██▍ | 5372/22434 [6:37:03<11:50:24, 2.50s/it] +2025-02-05 16:44:46 - ERROR - stderr - 24%|██▍ | 5373/22434 [6:37:05<11:46:41, 2.49s/it] +2025-02-05 16:44:46 - ERROR - stderr - +2025-02-05 16:44:46 - ERROR - stderr - +2025-02-05 16:44:46 - INFO - stdout - {'loss': 1.1222, 'grad_norm': 
1.1984896659851074, 'learning_rate': 1.7785674130143865e-05, 'epoch': 0.72} +2025-02-05 16:44:46 - ERROR - stderr - 24%|██▍ | 5373/22434 [6:37:05<11:46:41, 2.49s/it] +2025-02-05 16:44:48 - ERROR - stderr - 24%|██▍ | 5374/22434 [6:37:08<11:49:08, 2.49s/it] +2025-02-05 16:44:48 - ERROR - stderr - +2025-02-05 16:44:48 - ERROR - stderr - +2025-02-05 16:44:48 - INFO - stdout - {'loss': 0.8696, 'grad_norm': 0.9233139753341675, 'learning_rate': 1.778476800995865e-05, 'epoch': 0.72} +2025-02-05 16:44:48 - ERROR - stderr - 24%|██▍ | 5374/22434 [6:37:08<11:49:08, 2.49s/it] +2025-02-05 16:44:51 - ERROR - stderr - 24%|██▍ | 5375/22434 [6:37:10<11:45:03, 2.48s/it] +2025-02-05 16:44:51 - ERROR - stderr - +2025-02-05 16:44:51 - ERROR - stderr - +2025-02-05 16:44:51 - INFO - stdout - {'loss': 0.9305, 'grad_norm': 1.0708703994750977, 'learning_rate': 1.7783861727507394e-05, 'epoch': 0.72} +2025-02-05 16:44:51 - ERROR - stderr - 24%|██▍ | 5375/22434 [6:37:10<11:45:03, 2.48s/it] +2025-02-05 16:44:53 - ERROR - stderr - 24%|██▍ | 5376/22434 [6:37:13<11:47:36, 2.49s/it] +2025-02-05 16:44:53 - ERROR - stderr - +2025-02-05 16:44:53 - ERROR - stderr - +2025-02-05 16:44:53 - INFO - stdout - {'loss': 1.0838, 'grad_norm': 1.2617658376693726, 'learning_rate': 1.7782955282808986e-05, 'epoch': 0.72} +2025-02-05 16:44:53 - ERROR - stderr - 24%|██▍ | 5376/22434 [6:37:13<11:47:36, 2.49s/it] +2025-02-05 16:44:56 - ERROR - stderr - 24%|██▍ | 5377/22434 [6:37:15<11:44:07, 2.48s/it] +2025-02-05 16:44:56 - ERROR - stderr - +2025-02-05 16:44:56 - ERROR - stderr - +2025-02-05 16:44:56 - INFO - stdout - {'loss': 0.8672, 'grad_norm': 1.1590847969055176, 'learning_rate': 1.7782048675882325e-05, 'epoch': 0.72} +2025-02-05 16:44:56 - ERROR - stderr - 24%|██▍ | 5377/22434 [6:37:15<11:44:07, 2.48s/it] +2025-02-05 16:44:58 - ERROR - stderr - 24%|██▍ | 5378/22434 [6:37:18<11:44:55, 2.48s/it] +2025-02-05 16:44:58 - ERROR - stderr - +2025-02-05 16:44:58 - ERROR - stderr - +2025-02-05 16:44:58 - INFO - stdout - 
{'loss': 0.7874, 'grad_norm': 1.036059021949768, 'learning_rate': 1.7781141906746304e-05, 'epoch': 0.72} +2025-02-05 16:44:58 - ERROR - stderr - 24%|██▍ | 5378/22434 [6:37:18<11:44:55, 2.48s/it] +2025-02-05 16:45:01 - ERROR - stderr - 24%|██▍ | 5379/22434 [6:37:20<11:48:56, 2.49s/it] +2025-02-05 16:45:01 - ERROR - stderr - +2025-02-05 16:45:01 - ERROR - stderr - +2025-02-05 16:45:01 - INFO - stdout - {'loss': 0.8291, 'grad_norm': 1.0484755039215088, 'learning_rate': 1.7780234975419828e-05, 'epoch': 0.72} +2025-02-05 16:45:01 - ERROR - stderr - 24%|██▍ | 5379/22434 [6:37:20<11:48:56, 2.49s/it] +2025-02-05 16:45:03 - ERROR - stderr - 24%|██▍ | 5380/22434 [6:37:23<11:46:12, 2.48s/it] +2025-02-05 16:45:03 - ERROR - stderr - +2025-02-05 16:45:03 - ERROR - stderr - +2025-02-05 16:45:03 - INFO - stdout - {'loss': 1.0946, 'grad_norm': 1.2048934698104858, 'learning_rate': 1.77793278819218e-05, 'epoch': 0.72} +2025-02-05 16:45:03 - ERROR - stderr - 24%|██▍ | 5380/22434 [6:37:23<11:46:12, 2.48s/it] +2025-02-05 16:45:06 - ERROR - stderr - 24%|██▍ | 5381/22434 [6:37:25<11:47:20, 2.49s/it] +2025-02-05 16:45:06 - ERROR - stderr - +2025-02-05 16:45:06 - ERROR - stderr - +2025-02-05 16:45:06 - INFO - stdout - {'loss': 0.9157, 'grad_norm': 1.1381714344024658, 'learning_rate': 1.7778420626271123e-05, 'epoch': 0.72} +2025-02-05 16:45:06 - ERROR - stderr - 24%|██▍ | 5381/22434 [6:37:25<11:47:20, 2.49s/it] +2025-02-05 16:45:08 - ERROR - stderr - 24%|██▍ | 5382/22434 [6:37:28<11:54:43, 2.51s/it] +2025-02-05 16:45:08 - ERROR - stderr - +2025-02-05 16:45:08 - ERROR - stderr - +2025-02-05 16:45:08 - INFO - stdout - {'loss': 0.9915, 'grad_norm': 1.1251357793807983, 'learning_rate': 1.777751320848671e-05, 'epoch': 0.72} +2025-02-05 16:45:08 - ERROR - stderr - 24%|██▍ | 5382/22434 [6:37:28<11:54:43, 2.51s/it] +2025-02-05 16:45:11 - ERROR - stderr - 24%|██▍ | 5383/22434 [6:37:30<11:50:05, 2.50s/it] +2025-02-05 16:45:11 - ERROR - stderr - +2025-02-05 16:45:11 - ERROR - stderr - +2025-02-05 
16:45:11 - INFO - stdout - {'loss': 1.0945, 'grad_norm': 1.180052638053894, 'learning_rate': 1.777660562858748e-05, 'epoch': 0.72} +2025-02-05 16:45:11 - ERROR - stderr - 24%|██▍ | 5383/22434 [6:37:30<11:50:05, 2.50s/it] +2025-02-05 16:45:13 - ERROR - stderr - 24%|██▍ | 5384/22434 [6:37:33<11:51:08, 2.50s/it] +2025-02-05 16:45:13 - ERROR - stderr - +2025-02-05 16:45:13 - ERROR - stderr - +2025-02-05 16:45:13 - INFO - stdout - {'loss': 0.9261, 'grad_norm': 1.0401805639266968, 'learning_rate': 1.7775697886592345e-05, 'epoch': 0.72} +2025-02-05 16:45:13 - ERROR - stderr - 24%|██▍ | 5384/22434 [6:37:33<11:51:08, 2.50s/it] +2025-02-05 16:45:16 - ERROR - stderr - 24%|██▍ | 5385/22434 [6:37:35<11:50:25, 2.50s/it] +2025-02-05 16:45:16 - ERROR - stderr - +2025-02-05 16:45:16 - ERROR - stderr - +2025-02-05 16:45:16 - INFO - stdout - {'loss': 0.9992, 'grad_norm': 1.0714852809906006, 'learning_rate': 1.777478998252023e-05, 'epoch': 0.72} +2025-02-05 16:45:16 - ERROR - stderr - 24%|██▍ | 5385/22434 [6:37:35<11:50:25, 2.50s/it] +2025-02-05 16:45:18 - ERROR - stderr - 24%|██▍ | 5386/22434 [6:37:38<11:51:31, 2.50s/it] +2025-02-05 16:45:18 - ERROR - stderr - +2025-02-05 16:45:18 - ERROR - stderr - +2025-02-05 16:45:18 - INFO - stdout - {'loss': 0.9417, 'grad_norm': 1.098952054977417, 'learning_rate': 1.7773881916390056e-05, 'epoch': 0.72} +2025-02-05 16:45:18 - ERROR - stderr - 24%|██▍ | 5386/22434 [6:37:38<11:51:31, 2.50s/it] +2025-02-05 16:45:21 - ERROR - stderr - 24%|██▍ | 5387/22434 [6:37:40<12:02:57, 2.54s/it] +2025-02-05 16:45:21 - ERROR - stderr - +2025-02-05 16:45:21 - ERROR - stderr - +2025-02-05 16:45:21 - INFO - stdout - {'loss': 0.9151, 'grad_norm': 1.1172902584075928, 'learning_rate': 1.777297368822075e-05, 'epoch': 0.72} +2025-02-05 16:45:21 - ERROR - stderr - 24%|██▍ | 5387/22434 [6:37:41<12:02:57, 2.54s/it] +2025-02-05 16:45:23 - ERROR - stderr - 24%|██▍ | 5388/22434 [6:37:43<11:56:45, 2.52s/it] +2025-02-05 16:45:23 - ERROR - stderr - +2025-02-05 16:45:23 - ERROR - 
stderr - +2025-02-05 16:45:23 - INFO - stdout - {'loss': 0.9418, 'grad_norm': 1.043253779411316, 'learning_rate': 1.777206529803125e-05, 'epoch': 0.72} +2025-02-05 16:45:23 - ERROR - stderr - 24%|██▍ | 5388/22434 [6:37:43<11:56:45, 2.52s/it] +2025-02-05 16:45:26 - ERROR - stderr - 24%|██▍ | 5389/22434 [6:37:46<12:01:14, 2.54s/it] +2025-02-05 16:45:26 - ERROR - stderr - +2025-02-05 16:45:26 - ERROR - stderr - +2025-02-05 16:45:26 - INFO - stdout - {'loss': 0.8409, 'grad_norm': 0.9360518455505371, 'learning_rate': 1.7771156745840482e-05, 'epoch': 0.72} +2025-02-05 16:45:26 - ERROR - stderr - 24%|██▍ | 5389/22434 [6:37:46<12:01:14, 2.54s/it] +2025-02-05 16:45:28 - ERROR - stderr - 24%|██▍ | 5390/22434 [6:37:48<11:59:43, 2.53s/it] +2025-02-05 16:45:28 - ERROR - stderr - +2025-02-05 16:45:28 - ERROR - stderr - +2025-02-05 16:45:28 - INFO - stdout - {'loss': 0.9553, 'grad_norm': 0.9903491139411926, 'learning_rate': 1.777024803166739e-05, 'epoch': 0.72} +2025-02-05 16:45:28 - ERROR - stderr - 24%|██▍ | 5390/22434 [6:37:48<11:59:43, 2.53s/it] +2025-02-05 16:45:31 - ERROR - stderr - 24%|██▍ | 5391/22434 [6:37:50<11:51:28, 2.50s/it] +2025-02-05 16:45:31 - ERROR - stderr - +2025-02-05 16:45:31 - ERROR - stderr - +2025-02-05 16:45:31 - INFO - stdout - {'loss': 0.9157, 'grad_norm': 1.061397910118103, 'learning_rate': 1.7769339155530915e-05, 'epoch': 0.72} +2025-02-05 16:45:31 - ERROR - stderr - 24%|██▍ | 5391/22434 [6:37:51<11:51:28, 2.50s/it] +2025-02-05 16:45:33 - ERROR - stderr - 24%|██▍ | 5392/22434 [6:37:53<11:58:15, 2.53s/it] +2025-02-05 16:45:33 - ERROR - stderr - +2025-02-05 16:45:33 - ERROR - stderr - +2025-02-05 16:45:33 - INFO - stdout - {'loss': 0.8587, 'grad_norm': 1.0103857517242432, 'learning_rate': 1.7768430117449998e-05, 'epoch': 0.72} +2025-02-05 16:45:33 - ERROR - stderr - 24%|██▍ | 5392/22434 [6:37:53<11:58:15, 2.53s/it] +2025-02-05 16:45:36 - ERROR - stderr - 24%|██▍ | 5393/22434 [6:37:56<12:02:32, 2.54s/it] +2025-02-05 16:45:36 - ERROR - stderr - 
+2025-02-05 16:45:36 - ERROR - stderr - +2025-02-05 16:45:36 - INFO - stdout - {'loss': 1.046, 'grad_norm': 1.0836666822433472, 'learning_rate': 1.7767520917443584e-05, 'epoch': 0.72} +2025-02-05 16:45:36 - ERROR - stderr - 24%|██▍ | 5393/22434 [6:37:56<12:02:32, 2.54s/it] +2025-02-05 16:45:39 - ERROR - stderr - 24%|██▍ | 5394/22434 [6:37:58<12:09:34, 2.57s/it] +2025-02-05 16:45:39 - ERROR - stderr - +2025-02-05 16:45:39 - ERROR - stderr - +2025-02-05 16:45:39 - INFO - stdout - {'loss': 1.0568, 'grad_norm': 1.131263017654419, 'learning_rate': 1.7766611555530638e-05, 'epoch': 0.72} +2025-02-05 16:45:39 - ERROR - stderr - 24%|██▍ | 5394/22434 [6:37:58<12:09:34, 2.57s/it] +2025-02-05 16:45:41 - ERROR - stderr - 24%|██▍ | 5395/22434 [6:38:01<12:02:41, 2.54s/it] +2025-02-05 16:45:41 - ERROR - stderr - +2025-02-05 16:45:41 - ERROR - stderr - +2025-02-05 16:45:41 - INFO - stdout - {'loss': 0.8317, 'grad_norm': 0.9226694107055664, 'learning_rate': 1.7765702031730102e-05, 'epoch': 0.72} +2025-02-05 16:45:41 - ERROR - stderr - 24%|██▍ | 5395/22434 [6:38:01<12:02:41, 2.54s/it] +2025-02-05 16:45:43 - ERROR - stderr - 24%|██▍ | 5396/22434 [6:38:03<11:58:11, 2.53s/it] +2025-02-05 16:45:44 - ERROR - stderr - +2025-02-05 16:45:44 - ERROR - stderr - +2025-02-05 16:45:44 - INFO - stdout - {'loss': 0.9089, 'grad_norm': 1.1343775987625122, 'learning_rate': 1.7764792346060936e-05, 'epoch': 0.72} +2025-02-05 16:45:44 - ERROR - stderr - 24%|██▍ | 5396/22434 [6:38:03<11:58:11, 2.53s/it] +2025-02-05 16:45:46 - ERROR - stderr - 24%|██▍ | 5397/22434 [6:38:06<11:51:57, 2.51s/it] +2025-02-05 16:45:46 - ERROR - stderr - +2025-02-05 16:45:46 - ERROR - stderr - +2025-02-05 16:45:46 - INFO - stdout - {'loss': 0.9279, 'grad_norm': 1.0138285160064697, 'learning_rate': 1.7763882498542104e-05, 'epoch': 0.72} +2025-02-05 16:45:46 - ERROR - stderr - 24%|██▍ | 5397/22434 [6:38:06<11:51:57, 2.51s/it] +2025-02-05 16:45:48 - ERROR - stderr - 24%|██▍ | 5398/22434 [6:38:08<11:52:07, 2.51s/it] +2025-02-05 
16:45:49 - ERROR - stderr - +2025-02-05 16:45:49 - ERROR - stderr - +2025-02-05 16:45:49 - INFO - stdout - {'loss': 1.0081, 'grad_norm': 1.101556658744812, 'learning_rate': 1.7762972489192575e-05, 'epoch': 0.72} +2025-02-05 16:45:49 - ERROR - stderr - 24%|██▍ | 5398/22434 [6:38:08<11:52:07, 2.51s/it] +2025-02-05 16:45:51 - ERROR - stderr - 24%|██▍ | 5399/22434 [6:38:11<11:51:40, 2.51s/it] +2025-02-05 16:45:51 - ERROR - stderr - +2025-02-05 16:45:51 - ERROR - stderr - +2025-02-05 16:45:51 - INFO - stdout - {'loss': 0.801, 'grad_norm': 1.0840650796890259, 'learning_rate': 1.7762062318031307e-05, 'epoch': 0.72} +2025-02-05 16:45:51 - ERROR - stderr - 24%|██▍ | 5399/22434 [6:38:11<11:51:40, 2.51s/it] +2025-02-05 16:45:53 - ERROR - stderr - 24%|██▍ | 5400/22434 [6:38:13<11:46:17, 2.49s/it] +2025-02-05 16:45:53 - ERROR - stderr - +2025-02-05 16:45:53 - ERROR - stderr - +2025-02-05 16:45:53 - INFO - stdout - {'loss': 0.9397, 'grad_norm': 1.1196093559265137, 'learning_rate': 1.776115198507728e-05, 'epoch': 0.72} +2025-02-05 16:45:53 - ERROR - stderr - 24%|██▍ | 5400/22434 [6:38:13<11:46:17, 2.49s/it] +2025-02-05 16:45:56 - ERROR - stderr - 24%|██▍ | 5401/22434 [6:38:16<11:49:04, 2.50s/it] +2025-02-05 16:45:56 - ERROR - stderr - +2025-02-05 16:45:56 - ERROR - stderr - +2025-02-05 16:45:56 - INFO - stdout - {'loss': 0.9643, 'grad_norm': 1.0707428455352783, 'learning_rate': 1.776024149034947e-05, 'epoch': 0.72} +2025-02-05 16:45:56 - ERROR - stderr - 24%|██▍ | 5401/22434 [6:38:16<11:49:04, 2.50s/it] +2025-02-05 16:45:58 - ERROR - stderr - 24%|██▍ | 5402/22434 [6:38:18<11:48:21, 2.50s/it] +2025-02-05 16:45:58 - ERROR - stderr - +2025-02-05 16:45:58 - ERROR - stderr - +2025-02-05 16:45:58 - INFO - stdout - {'loss': 0.9521, 'grad_norm': 1.1441192626953125, 'learning_rate': 1.7759330833866847e-05, 'epoch': 0.72} +2025-02-05 16:45:58 - ERROR - stderr - 24%|██▍ | 5402/22434 [6:38:18<11:48:21, 2.50s/it] +2025-02-05 16:46:01 - ERROR - stderr - 24%|██▍ | 5403/22434 [6:38:21<11:48:42, 
2.50s/it] +2025-02-05 16:46:01 - ERROR - stderr - +2025-02-05 16:46:01 - ERROR - stderr - +2025-02-05 16:46:01 - INFO - stdout - {'loss': 0.927, 'grad_norm': 1.0849087238311768, 'learning_rate': 1.77584200156484e-05, 'epoch': 0.72} +2025-02-05 16:46:01 - ERROR - stderr - 24%|██▍ | 5403/22434 [6:38:21<11:48:42, 2.50s/it] +2025-02-05 16:46:03 - ERROR - stderr - 24%|██▍ | 5404/22434 [6:38:23<11:49:45, 2.50s/it] +2025-02-05 16:46:03 - ERROR - stderr - +2025-02-05 16:46:03 - ERROR - stderr - +2025-02-05 16:46:03 - INFO - stdout - {'loss': 0.8853, 'grad_norm': 0.9749464392662048, 'learning_rate': 1.7757509035713107e-05, 'epoch': 0.72} +2025-02-05 16:46:03 - ERROR - stderr - 24%|██▍ | 5404/22434 [6:38:23<11:49:45, 2.50s/it] +2025-02-05 16:46:06 - ERROR - stderr - 24%|██▍ | 5405/22434 [6:38:26<11:50:33, 2.50s/it] +2025-02-05 16:46:06 - ERROR - stderr - +2025-02-05 16:46:06 - ERROR - stderr - +2025-02-05 16:46:06 - INFO - stdout - {'loss': 0.8962, 'grad_norm': 0.947208046913147, 'learning_rate': 1.7756597894079966e-05, 'epoch': 0.72} +2025-02-05 16:46:06 - ERROR - stderr - 24%|██▍ | 5405/22434 [6:38:26<11:50:33, 2.50s/it] +2025-02-05 16:46:08 - ERROR - stderr - 24%|██▍ | 5406/22434 [6:38:28<11:52:09, 2.51s/it] +2025-02-05 16:46:09 - ERROR - stderr - +2025-02-05 16:46:09 - ERROR - stderr - +2025-02-05 16:46:09 - INFO - stdout - {'loss': 0.9572, 'grad_norm': 1.1464723348617554, 'learning_rate': 1.7755686590767962e-05, 'epoch': 0.72} +2025-02-05 16:46:09 - ERROR - stderr - 24%|██▍ | 5406/22434 [6:38:28<11:52:09, 2.51s/it] +2025-02-05 16:46:11 - ERROR - stderr - 24%|██▍ | 5407/22434 [6:38:31<11:49:01, 2.50s/it] +2025-02-05 16:46:11 - ERROR - stderr - +2025-02-05 16:46:11 - ERROR - stderr - +2025-02-05 16:46:11 - INFO - stdout - {'loss': 0.9651, 'grad_norm': 1.189192533493042, 'learning_rate': 1.7754775125796095e-05, 'epoch': 0.72} +2025-02-05 16:46:11 - ERROR - stderr - 24%|██▍ | 5407/22434 [6:38:31<11:49:01, 2.50s/it] +2025-02-05 16:46:13 - ERROR - stderr - 24%|██▍ | 
5408/22434 [6:38:33<11:52:04, 2.51s/it] +2025-02-05 16:46:14 - ERROR - stderr - +2025-02-05 16:46:14 - ERROR - stderr - +2025-02-05 16:46:14 - INFO - stdout - {'loss': 0.8988, 'grad_norm': 1.0269020795822144, 'learning_rate': 1.7753863499183358e-05, 'epoch': 0.72} +2025-02-05 16:46:14 - ERROR - stderr - 24%|██▍ | 5408/22434 [6:38:33<11:52:04, 2.51s/it] +2025-02-05 16:46:16 - ERROR - stderr - 24%|██▍ | 5409/22434 [6:38:36<11:55:41, 2.52s/it] +2025-02-05 16:46:16 - ERROR - stderr - +2025-02-05 16:46:16 - ERROR - stderr - +2025-02-05 16:46:16 - INFO - stdout - {'loss': 1.2092, 'grad_norm': 1.1524895429611206, 'learning_rate': 1.775295171094876e-05, 'epoch': 0.72} +2025-02-05 16:46:16 - ERROR - stderr - 24%|██▍ | 5409/22434 [6:38:36<11:55:41, 2.52s/it] +2025-02-05 16:46:19 - ERROR - stderr - 24%|██▍ | 5410/22434 [6:38:38<11:55:25, 2.52s/it] +2025-02-05 16:46:19 - ERROR - stderr - +2025-02-05 16:46:19 - ERROR - stderr - +2025-02-05 16:46:19 - INFO - stdout - {'loss': 0.9952, 'grad_norm': 1.0381126403808594, 'learning_rate': 1.77520397611113e-05, 'epoch': 0.72} +2025-02-05 16:46:19 - ERROR - stderr - 24%|██▍ | 5410/22434 [6:38:38<11:55:25, 2.52s/it] +2025-02-05 16:46:21 - ERROR - stderr - 24%|██▍ | 5411/22434 [6:38:41<12:10:11, 2.57s/it] +2025-02-05 16:46:21 - ERROR - stderr - +2025-02-05 16:46:21 - ERROR - stderr - +2025-02-05 16:46:21 - INFO - stdout - {'loss': 0.859, 'grad_norm': 1.070483922958374, 'learning_rate': 1.775112764968999e-05, 'epoch': 0.72} +2025-02-05 16:46:21 - ERROR - stderr - 24%|██▍ | 5411/22434 [6:38:41<12:10:11, 2.57s/it] +2025-02-05 16:46:24 - ERROR - stderr - 24%|██▍ | 5412/22434 [6:38:43<12:02:13, 2.55s/it] +2025-02-05 16:46:24 - ERROR - stderr - +2025-02-05 16:46:24 - ERROR - stderr - +2025-02-05 16:46:24 - INFO - stdout - {'loss': 0.7976, 'grad_norm': 1.022913932800293, 'learning_rate': 1.775021537670384e-05, 'epoch': 0.72} +2025-02-05 16:46:24 - ERROR - stderr - 24%|██▍ | 5412/22434 [6:38:44<12:02:13, 2.55s/it] +2025-02-05 16:46:26 - ERROR - 
stderr - 24%|██▍ | 5413/22434 [6:38:46<11:56:05, 2.52s/it] +2025-02-05 16:46:26 - ERROR - stderr - +2025-02-05 16:46:26 - ERROR - stderr - +2025-02-05 16:46:26 - INFO - stdout - {'loss': 0.99, 'grad_norm': 1.089581847190857, 'learning_rate': 1.7749302942171866e-05, 'epoch': 0.72} +2025-02-05 16:46:26 - ERROR - stderr - 24%|██▍ | 5413/22434 [6:38:46<11:56:05, 2.52s/it] +2025-02-05 16:46:29 - ERROR - stderr - 24%|██▍ | 5414/22434 [6:38:48<11:57:42, 2.53s/it] +2025-02-05 16:46:29 - ERROR - stderr - +2025-02-05 16:46:29 - ERROR - stderr - +2025-02-05 16:46:29 - INFO - stdout - {'loss': 1.0403, 'grad_norm': 1.1272262334823608, 'learning_rate': 1.7748390346113085e-05, 'epoch': 0.72} +2025-02-05 16:46:29 - ERROR - stderr - 24%|██▍ | 5414/22434 [6:38:49<11:57:42, 2.53s/it] +2025-02-05 16:46:31 - ERROR - stderr - 24%|██▍ | 5415/22434 [6:38:51<11:50:14, 2.50s/it] +2025-02-05 16:46:31 - ERROR - stderr - +2025-02-05 16:46:31 - ERROR - stderr - +2025-02-05 16:46:31 - INFO - stdout - {'loss': 0.9009, 'grad_norm': 1.1359671354293823, 'learning_rate': 1.7747477588546528e-05, 'epoch': 0.72} +2025-02-05 16:46:31 - ERROR - stderr - 24%|██▍ | 5415/22434 [6:38:51<11:50:14, 2.50s/it] +2025-02-05 16:46:34 - ERROR - stderr - 24%|██▍ | 5416/22434 [6:38:53<11:45:49, 2.49s/it] +2025-02-05 16:46:34 - ERROR - stderr - +2025-02-05 16:46:34 - ERROR - stderr - +2025-02-05 16:46:34 - INFO - stdout - {'loss': 0.8761, 'grad_norm': 1.015596866607666, 'learning_rate': 1.774656466949121e-05, 'epoch': 0.72} +2025-02-05 16:46:34 - ERROR - stderr - 24%|██▍ | 5416/22434 [6:38:53<11:45:49, 2.49s/it] +2025-02-05 16:46:36 - ERROR - stderr - 24%|██▍ | 5417/22434 [6:38:56<11:51:25, 2.51s/it] +2025-02-05 16:46:36 - ERROR - stderr - +2025-02-05 16:46:36 - ERROR - stderr - +2025-02-05 16:46:36 - INFO - stdout - {'loss': 0.9472, 'grad_norm': 1.0954852104187012, 'learning_rate': 1.7745651588966167e-05, 'epoch': 0.72} +2025-02-05 16:46:36 - ERROR - stderr - 24%|██▍ | 5417/22434 [6:38:56<11:51:25, 2.51s/it] 
+2025-02-05 16:46:39 - ERROR - stderr - 24%|██▍ | 5418/22434 [6:38:58<11:52:35, 2.51s/it] +2025-02-05 16:46:39 - ERROR - stderr - +2025-02-05 16:46:39 - ERROR - stderr - +2025-02-05 16:46:39 - INFO - stdout - {'loss': 0.7797, 'grad_norm': 1.0864711999893188, 'learning_rate': 1.7744738346990425e-05, 'epoch': 0.72} +2025-02-05 16:46:39 - ERROR - stderr - 24%|██▍ | 5418/22434 [6:38:59<11:52:35, 2.51s/it] +2025-02-05 16:46:41 - ERROR - stderr - 24%|██▍ | 5419/22434 [6:39:01<11:50:08, 2.50s/it] +2025-02-05 16:46:41 - ERROR - stderr - +2025-02-05 16:46:41 - ERROR - stderr - +2025-02-05 16:46:41 - INFO - stdout - {'loss': 0.9695, 'grad_norm': 1.0508103370666504, 'learning_rate': 1.7743824943583028e-05, 'epoch': 0.72} +2025-02-05 16:46:41 - ERROR - stderr - 24%|██▍ | 5419/22434 [6:39:01<11:50:08, 2.50s/it] +2025-02-05 16:46:44 - ERROR - stderr - 24%|██▍ | 5420/22434 [6:39:03<11:45:36, 2.49s/it] +2025-02-05 16:46:44 - ERROR - stderr - +2025-02-05 16:46:44 - ERROR - stderr - +2025-02-05 16:46:44 - INFO - stdout - {'loss': 0.9365, 'grad_norm': 1.115043044090271, 'learning_rate': 1.7742911378763006e-05, 'epoch': 0.72} +2025-02-05 16:46:44 - ERROR - stderr - 24%|██▍ | 5420/22434 [6:39:03<11:45:36, 2.49s/it] +2025-02-05 16:46:46 - ERROR - stderr - 24%|██▍ | 5421/22434 [6:39:06<11:48:55, 2.50s/it] +2025-02-05 16:46:46 - ERROR - stderr - +2025-02-05 16:46:46 - ERROR - stderr - +2025-02-05 16:46:46 - INFO - stdout - {'loss': 1.0612, 'grad_norm': 1.0312976837158203, 'learning_rate': 1.7741997652549408e-05, 'epoch': 0.72} +2025-02-05 16:46:46 - ERROR - stderr - 24%|██▍ | 5421/22434 [6:39:06<11:48:55, 2.50s/it] +2025-02-05 16:46:49 - ERROR - stderr - 24%|██▍ | 5422/22434 [6:39:08<11:48:52, 2.50s/it] +2025-02-05 16:46:49 - ERROR - stderr - +2025-02-05 16:46:49 - ERROR - stderr - +2025-02-05 16:46:49 - INFO - stdout - {'loss': 0.9425, 'grad_norm': 1.1387337446212769, 'learning_rate': 1.7741083764961274e-05, 'epoch': 0.73} +2025-02-05 16:46:49 - ERROR - stderr - 24%|██▍ | 5422/22434 
[6:39:08<11:48:52, 2.50s/it] +2025-02-05 16:46:51 - ERROR - stderr - 24%|██▍ | 5423/22434 [6:39:11<11:45:48, 2.49s/it] +2025-02-05 16:46:51 - ERROR - stderr - +2025-02-05 16:46:51 - ERROR - stderr - +2025-02-05 16:46:51 - INFO - stdout - {'loss': 0.8531, 'grad_norm': 1.0692471265792847, 'learning_rate': 1.774016971601766e-05, 'epoch': 0.73} +2025-02-05 16:46:51 - ERROR - stderr - 24%|██▍ | 5423/22434 [6:39:11<11:45:48, 2.49s/it] +2025-02-05 16:46:54 - ERROR - stderr - 24%|██▍ | 5424/22434 [6:39:13<11:42:10, 2.48s/it] +2025-02-05 16:46:54 - ERROR - stderr - +2025-02-05 16:46:54 - ERROR - stderr - +2025-02-05 16:46:54 - INFO - stdout - {'loss': 0.9036, 'grad_norm': 0.9859254956245422, 'learning_rate': 1.773925550573761e-05, 'epoch': 0.73} +2025-02-05 16:46:54 - ERROR - stderr - 24%|██▍ | 5424/22434 [6:39:13<11:42:10, 2.48s/it] +2025-02-05 16:46:56 - ERROR - stderr - 24%|██▍ | 5425/22434 [6:39:16<11:45:23, 2.49s/it] +2025-02-05 16:46:56 - ERROR - stderr - +2025-02-05 16:46:56 - ERROR - stderr - +2025-02-05 16:46:56 - INFO - stdout - {'loss': 0.9257, 'grad_norm': 1.1124447584152222, 'learning_rate': 1.7738341134140188e-05, 'epoch': 0.73} +2025-02-05 16:46:56 - ERROR - stderr - 24%|██▍ | 5425/22434 [6:39:16<11:45:23, 2.49s/it] +2025-02-05 16:46:59 - ERROR - stderr - 24%|██▍ | 5426/22434 [6:39:18<11:44:46, 2.49s/it] +2025-02-05 16:46:59 - ERROR - stderr - +2025-02-05 16:46:59 - ERROR - stderr - +2025-02-05 16:46:59 - INFO - stdout - {'loss': 1.0117, 'grad_norm': 1.0326091051101685, 'learning_rate': 1.773742660124445e-05, 'epoch': 0.73} +2025-02-05 16:46:59 - ERROR - stderr - 24%|██▍ | 5426/22434 [6:39:18<11:44:46, 2.49s/it] +2025-02-05 16:47:01 - ERROR - stderr - 24%|██▍ | 5427/22434 [6:39:21<11:48:01, 2.50s/it] +2025-02-05 16:47:01 - ERROR - stderr - +2025-02-05 16:47:01 - ERROR - stderr - +2025-02-05 16:47:01 - INFO - stdout - {'loss': 0.9069, 'grad_norm': 0.9975651502609253, 'learning_rate': 1.7736511907069455e-05, 'epoch': 0.73} +2025-02-05 16:47:01 - ERROR - stderr 
- 24%|██▍ | 5427/22434 [6:39:21<11:48:01, 2.50s/it] +2025-02-05 16:47:04 - ERROR - stderr - 24%|██▍ | 5428/22434 [6:39:23<11:42:53, 2.48s/it] +2025-02-05 16:47:04 - ERROR - stderr - +2025-02-05 16:47:04 - ERROR - stderr - +2025-02-05 16:47:04 - INFO - stdout - {'loss': 1.0254, 'grad_norm': 1.1344283819198608, 'learning_rate': 1.7735597051634277e-05, 'epoch': 0.73} +2025-02-05 16:47:04 - ERROR - stderr - 24%|██▍ | 5428/22434 [6:39:23<11:42:53, 2.48s/it] +2025-02-05 16:47:06 - ERROR - stderr - 24%|██▍ | 5429/22434 [6:39:26<11:43:24, 2.48s/it] +2025-02-05 16:47:06 - ERROR - stderr - +2025-02-05 16:47:06 - ERROR - stderr - +2025-02-05 16:47:06 - INFO - stdout - {'loss': 0.8987, 'grad_norm': 1.1387197971343994, 'learning_rate': 1.773468203495798e-05, 'epoch': 0.73} +2025-02-05 16:47:06 - ERROR - stderr - 24%|██▍ | 5429/22434 [6:39:26<11:43:24, 2.48s/it] +2025-02-05 16:47:08 - ERROR - stderr - 24%|██▍ | 5430/22434 [6:39:28<11:39:21, 2.47s/it] +2025-02-05 16:47:09 - ERROR - stderr - +2025-02-05 16:47:09 - ERROR - stderr - +2025-02-05 16:47:09 - INFO - stdout - {'loss': 1.008, 'grad_norm': 1.1215769052505493, 'learning_rate': 1.7733766857059635e-05, 'epoch': 0.73} +2025-02-05 16:47:09 - ERROR - stderr - 24%|██▍ | 5430/22434 [6:39:28<11:39:21, 2.47s/it] +2025-02-05 16:47:11 - ERROR - stderr - 24%|██▍ | 5431/22434 [6:39:31<11:41:28, 2.48s/it] +2025-02-05 16:47:11 - ERROR - stderr - +2025-02-05 16:47:11 - ERROR - stderr - +2025-02-05 16:47:11 - INFO - stdout - {'loss': 0.8995, 'grad_norm': 1.0588525533676147, 'learning_rate': 1.773285151795832e-05, 'epoch': 0.73} +2025-02-05 16:47:11 - ERROR - stderr - 24%|██▍ | 5431/22434 [6:39:31<11:41:28, 2.48s/it] +2025-02-05 16:47:13 - ERROR - stderr - 24%|██▍ | 5432/22434 [6:39:33<11:43:55, 2.48s/it] +2025-02-05 16:47:14 - ERROR - stderr - +2025-02-05 16:47:14 - ERROR - stderr - +2025-02-05 16:47:14 - INFO - stdout - {'loss': 1.0114, 'grad_norm': 1.140607476234436, 'learning_rate': 1.7731936017673116e-05, 'epoch': 0.73} +2025-02-05 
16:47:14 - ERROR - stderr - 24%|██▍ | 5432/22434 [6:39:33<11:43:55, 2.48s/it] +2025-02-05 16:47:16 - ERROR - stderr - 24%|██▍ | 5433/22434 [6:39:36<11:47:40, 2.50s/it] +2025-02-05 16:47:16 - ERROR - stderr - +2025-02-05 16:47:16 - ERROR - stderr - +2025-02-05 16:47:16 - INFO - stdout - {'loss': 1.0907, 'grad_norm': 1.0446076393127441, 'learning_rate': 1.7731020356223102e-05, 'epoch': 0.73} +2025-02-05 16:47:16 - ERROR - stderr - 24%|██▍ | 5433/22434 [6:39:36<11:47:40, 2.50s/it] +2025-02-05 16:47:18 - ERROR - stderr - 24%|██▍ | 5434/22434 [6:39:38<11:43:58, 2.48s/it] +2025-02-05 16:47:18 - ERROR - stderr - +2025-02-05 16:47:18 - ERROR - stderr - +2025-02-05 16:47:18 - INFO - stdout - {'loss': 1.0299, 'grad_norm': 1.117741346359253, 'learning_rate': 1.773010453362737e-05, 'epoch': 0.73} +2025-02-05 16:47:18 - ERROR - stderr - 24%|██▍ | 5434/22434 [6:39:38<11:43:58, 2.48s/it] +2025-02-05 16:47:21 - ERROR - stderr - 24%|██▍ | 5435/22434 [6:39:41<11:43:07, 2.48s/it] +2025-02-05 16:47:21 - ERROR - stderr - +2025-02-05 16:47:21 - ERROR - stderr - +2025-02-05 16:47:21 - INFO - stdout - {'loss': 1.012, 'grad_norm': 1.116154432296753, 'learning_rate': 1.7729188549905004e-05, 'epoch': 0.73} +2025-02-05 16:47:21 - ERROR - stderr - 24%|██▍ | 5435/22434 [6:39:41<11:43:07, 2.48s/it] +2025-02-05 16:47:23 - ERROR - stderr - 24%|██▍ | 5436/22434 [6:39:43<11:43:41, 2.48s/it] +2025-02-05 16:47:23 - ERROR - stderr - +2025-02-05 16:47:23 - ERROR - stderr - +2025-02-05 16:47:23 - INFO - stdout - {'loss': 1.0374, 'grad_norm': 1.1451771259307861, 'learning_rate': 1.77282724050751e-05, 'epoch': 0.73} +2025-02-05 16:47:23 - ERROR - stderr - 24%|██▍ | 5436/22434 [6:39:43<11:43:41, 2.48s/it] +2025-02-05 16:47:26 - ERROR - stderr - 24%|██▍ | 5437/22434 [6:39:46<11:45:42, 2.49s/it] +2025-02-05 16:47:26 - ERROR - stderr - +2025-02-05 16:47:26 - ERROR - stderr - +2025-02-05 16:47:26 - INFO - stdout - {'loss': 0.8703, 'grad_norm': 1.016891360282898, 'learning_rate': 1.7727356099156755e-05, 'epoch': 
0.73} +2025-02-05 16:47:26 - ERROR - stderr - 24%|██▍ | 5437/22434 [6:39:46<11:45:42, 2.49s/it] +2025-02-05 16:47:28 - ERROR - stderr - 24%|██▍ | 5438/22434 [6:39:48<11:49:52, 2.51s/it] +2025-02-05 16:47:29 - ERROR - stderr - +2025-02-05 16:47:29 - ERROR - stderr - +2025-02-05 16:47:29 - INFO - stdout - {'loss': 0.964, 'grad_norm': 1.0598512887954712, 'learning_rate': 1.7726439632169064e-05, 'epoch': 0.73} +2025-02-05 16:47:29 - ERROR - stderr - 24%|██▍ | 5438/22434 [6:39:48<11:49:52, 2.51s/it] +2025-02-05 16:47:31 - ERROR - stderr - 24%|██▍ | 5439/22434 [6:39:51<11:42:40, 2.48s/it] +2025-02-05 16:47:31 - ERROR - stderr - +2025-02-05 16:47:31 - ERROR - stderr - +2025-02-05 16:47:31 - INFO - stdout - {'loss': 0.8849, 'grad_norm': 1.020731806755066, 'learning_rate': 1.772552300413113e-05, 'epoch': 0.73} +2025-02-05 16:47:31 - ERROR - stderr - 24%|██▍ | 5439/22434 [6:39:51<11:42:40, 2.48s/it] +2025-02-05 16:47:33 - ERROR - stderr - 24%|██▍ | 5440/22434 [6:39:53<11:47:02, 2.50s/it] +2025-02-05 16:47:33 - ERROR - stderr - +2025-02-05 16:47:33 - ERROR - stderr - +2025-02-05 16:47:33 - INFO - stdout - {'loss': 0.9839, 'grad_norm': 1.1210649013519287, 'learning_rate': 1.7724606215062065e-05, 'epoch': 0.73} +2025-02-05 16:47:33 - ERROR - stderr - 24%|██▍ | 5440/22434 [6:39:53<11:47:02, 2.50s/it] +2025-02-05 16:47:36 - ERROR - stderr - 24%|██▍ | 5441/22434 [6:39:56<11:44:00, 2.49s/it] +2025-02-05 16:47:36 - ERROR - stderr - +2025-02-05 16:47:36 - ERROR - stderr - +2025-02-05 16:47:36 - INFO - stdout - {'loss': 0.9993, 'grad_norm': 1.1568015813827515, 'learning_rate': 1.7723689264980974e-05, 'epoch': 0.73} +2025-02-05 16:47:36 - ERROR - stderr - 24%|██▍ | 5441/22434 [6:39:56<11:44:00, 2.49s/it] +2025-02-05 16:47:38 - ERROR - stderr - 24%|██▍ | 5442/22434 [6:39:58<11:43:47, 2.49s/it] +2025-02-05 16:47:38 - ERROR - stderr - +2025-02-05 16:47:38 - ERROR - stderr - +2025-02-05 16:47:38 - INFO - stdout - {'loss': 0.9102, 'grad_norm': 1.0551351308822632, 'learning_rate': 
1.772277215390697e-05, 'epoch': 0.73} +2025-02-05 16:47:38 - ERROR - stderr - 24%|██▍ | 5442/22434 [6:39:58<11:43:47, 2.49s/it] +2025-02-05 16:47:41 - ERROR - stderr - 24%|██▍ | 5443/22434 [6:40:01<11:42:48, 2.48s/it] +2025-02-05 16:47:41 - ERROR - stderr - +2025-02-05 16:47:41 - ERROR - stderr - +2025-02-05 16:47:41 - INFO - stdout - {'loss': 1.0048, 'grad_norm': 1.0488078594207764, 'learning_rate': 1.7721854881859166e-05, 'epoch': 0.73} +2025-02-05 16:47:41 - ERROR - stderr - 24%|██▍ | 5443/22434 [6:40:01<11:42:48, 2.48s/it] +2025-02-05 16:47:43 - ERROR - stderr - 24%|██▍ | 5444/22434 [6:40:03<11:39:45, 2.47s/it] +2025-02-05 16:47:43 - ERROR - stderr - +2025-02-05 16:47:43 - ERROR - stderr - +2025-02-05 16:47:43 - INFO - stdout - {'loss': 0.875, 'grad_norm': 1.0173295736312866, 'learning_rate': 1.7720937448856694e-05, 'epoch': 0.73} +2025-02-05 16:47:43 - ERROR - stderr - 24%|██▍ | 5444/22434 [6:40:03<11:39:45, 2.47s/it] +2025-02-05 16:47:46 - ERROR - stderr - 24%|██▍ | 5445/22434 [6:40:06<11:44:28, 2.49s/it] +2025-02-05 16:47:46 - ERROR - stderr - +2025-02-05 16:47:46 - ERROR - stderr - +2025-02-05 16:47:46 - INFO - stdout - {'loss': 0.9178, 'grad_norm': 1.01760995388031, 'learning_rate': 1.7720019854918663e-05, 'epoch': 0.73} +2025-02-05 16:47:46 - ERROR - stderr - 24%|██▍ | 5445/22434 [6:40:06<11:44:28, 2.49s/it] +2025-02-05 16:47:48 - ERROR - stderr - 24%|██▍ | 5446/22434 [6:40:08<11:40:24, 2.47s/it] +2025-02-05 16:47:48 - ERROR - stderr - +2025-02-05 16:47:48 - ERROR - stderr - +2025-02-05 16:47:48 - INFO - stdout - {'loss': 0.9401, 'grad_norm': 1.0633618831634521, 'learning_rate': 1.771910210006421e-05, 'epoch': 0.73} +2025-02-05 16:47:48 - ERROR - stderr - 24%|██▍ | 5446/22434 [6:40:08<11:40:24, 2.47s/it] +2025-02-05 16:47:51 - ERROR - stderr - 24%|██▍ | 5447/22434 [6:40:10<11:39:07, 2.47s/it] +2025-02-05 16:47:51 - ERROR - stderr - +2025-02-05 16:47:51 - ERROR - stderr - +2025-02-05 16:47:51 - INFO - stdout - {'loss': 0.933, 'grad_norm': 
0.9810612201690674, 'learning_rate': 1.771818418431246e-05, 'epoch': 0.73} +2025-02-05 16:47:51 - ERROR - stderr - 24%|██▍ | 5447/22434 [6:40:11<11:39:07, 2.47s/it] +2025-02-05 16:47:53 - ERROR - stderr - 24%|██▍ | 5448/22434 [6:40:13<11:48:06, 2.50s/it] +2025-02-05 16:47:53 - ERROR - stderr - +2025-02-05 16:47:53 - ERROR - stderr - +2025-02-05 16:47:53 - INFO - stdout - {'loss': 1.0128, 'grad_norm': 1.0796051025390625, 'learning_rate': 1.7717266107682544e-05, 'epoch': 0.73} +2025-02-05 16:47:53 - ERROR - stderr - 24%|██▍ | 5448/22434 [6:40:13<11:48:06, 2.50s/it] +2025-02-05 16:47:56 - ERROR - stderr - 24%|██▍ | 5449/22434 [6:40:16<11:47:12, 2.50s/it] +2025-02-05 16:47:56 - ERROR - stderr - +2025-02-05 16:47:56 - ERROR - stderr - +2025-02-05 16:47:56 - INFO - stdout - {'loss': 1.1054, 'grad_norm': 1.2071588039398193, 'learning_rate': 1.77163478701936e-05, 'epoch': 0.73} +2025-02-05 16:47:56 - ERROR - stderr - 24%|██▍ | 5449/22434 [6:40:16<11:47:12, 2.50s/it] +2025-02-05 16:47:58 - ERROR - stderr - 24%|██▍ | 5450/22434 [6:40:18<11:54:37, 2.52s/it] +2025-02-05 16:47:58 - ERROR - stderr - +2025-02-05 16:47:58 - ERROR - stderr - +2025-02-05 16:47:58 - INFO - stdout - {'loss': 1.023, 'grad_norm': 1.03304123878479, 'learning_rate': 1.7715429471864768e-05, 'epoch': 0.73} +2025-02-05 16:47:58 - ERROR - stderr - 24%|██▍ | 5450/22434 [6:40:18<11:54:37, 2.52s/it] +2025-02-05 16:48:01 - ERROR - stderr - 24%|██▍ | 5451/22434 [6:40:21<11:56:00, 2.53s/it] +2025-02-05 16:48:01 - ERROR - stderr - +2025-02-05 16:48:01 - ERROR - stderr - +2025-02-05 16:48:01 - INFO - stdout - {'loss': 0.8822, 'grad_norm': 1.0942498445510864, 'learning_rate': 1.7714510912715194e-05, 'epoch': 0.73} +2025-02-05 16:48:01 - ERROR - stderr - 24%|██▍ | 5451/22434 [6:40:21<11:56:00, 2.53s/it] +2025-02-05 16:48:03 - ERROR - stderr - 24%|██▍ | 5452/22434 [6:40:23<11:52:29, 2.52s/it] +2025-02-05 16:48:03 - ERROR - stderr - +2025-02-05 16:48:03 - ERROR - stderr - +2025-02-05 16:48:03 - INFO - stdout - {'loss': 
0.92, 'grad_norm': 0.954436182975769, 'learning_rate': 1.771359219276402e-05, 'epoch': 0.73} +2025-02-05 16:48:03 - ERROR - stderr - 24%|██▍ | 5452/22434 [6:40:23<11:52:29, 2.52s/it] +2025-02-05 16:48:06 - ERROR - stderr - 24%|██▍ | 5453/22434 [6:40:26<11:46:29, 2.50s/it] +2025-02-05 16:48:06 - ERROR - stderr - +2025-02-05 16:48:06 - ERROR - stderr - +2025-02-05 16:48:06 - INFO - stdout - {'loss': 0.8816, 'grad_norm': 1.010201096534729, 'learning_rate': 1.77126733120304e-05, 'epoch': 0.73} +2025-02-05 16:48:06 - ERROR - stderr - 24%|██▍ | 5453/22434 [6:40:26<11:46:29, 2.50s/it] +2025-02-05 16:48:08 - ERROR - stderr - 24%|██▍ | 5454/22434 [6:40:28<11:48:56, 2.51s/it] +2025-02-05 16:48:08 - ERROR - stderr - +2025-02-05 16:48:08 - ERROR - stderr - +2025-02-05 16:48:08 - INFO - stdout - {'loss': 0.9224, 'grad_norm': 0.9629737138748169, 'learning_rate': 1.7711754270533483e-05, 'epoch': 0.73} +2025-02-05 16:48:08 - ERROR - stderr - 24%|██▍ | 5454/22434 [6:40:28<11:48:56, 2.51s/it] +2025-02-05 16:48:11 - ERROR - stderr - 24%|██▍ | 5455/22434 [6:40:31<11:51:48, 2.52s/it] +2025-02-05 16:48:11 - ERROR - stderr - +2025-02-05 16:48:11 - ERROR - stderr - +2025-02-05 16:48:11 - INFO - stdout - {'loss': 0.8518, 'grad_norm': 1.0090998411178589, 'learning_rate': 1.771083506829243e-05, 'epoch': 0.73} +2025-02-05 16:48:11 - ERROR - stderr - 24%|██▍ | 5455/22434 [6:40:31<11:51:48, 2.52s/it] +2025-02-05 16:48:13 - ERROR - stderr - 24%|██▍ | 5456/22434 [6:40:33<11:52:55, 2.52s/it] +2025-02-05 16:48:13 - ERROR - stderr - +2025-02-05 16:48:13 - ERROR - stderr - +2025-02-05 16:48:13 - INFO - stdout - {'loss': 0.8565, 'grad_norm': 0.9697344899177551, 'learning_rate': 1.7709915705326394e-05, 'epoch': 0.73} +2025-02-05 16:48:13 - ERROR - stderr - 24%|██▍ | 5456/22434 [6:40:33<11:52:55, 2.52s/it] +2025-02-05 16:48:16 - ERROR - stderr - 24%|██▍ | 5457/22434 [6:40:36<11:52:52, 2.52s/it] +2025-02-05 16:48:16 - ERROR - stderr - +2025-02-05 16:48:16 - ERROR - stderr - +2025-02-05 16:48:16 - INFO - 
stdout - {'loss': 0.9162, 'grad_norm': 1.096519947052002, 'learning_rate': 1.770899618165455e-05, 'epoch': 0.73} +2025-02-05 16:48:16 - ERROR - stderr - 24%|██▍ | 5457/22434 [6:40:36<11:52:52, 2.52s/it] +2025-02-05 16:48:19 - ERROR - stderr - 24%|██▍ | 5458/22434 [6:40:38<11:56:50, 2.53s/it] +2025-02-05 16:48:19 - ERROR - stderr - +2025-02-05 16:48:19 - ERROR - stderr - +2025-02-05 16:48:19 - INFO - stdout - {'loss': 0.8868, 'grad_norm': 1.0003653764724731, 'learning_rate': 1.770807649729605e-05, 'epoch': 0.73} +2025-02-05 16:48:19 - ERROR - stderr - 24%|██▍ | 5458/22434 [6:40:38<11:56:50, 2.53s/it] +2025-02-05 16:48:21 - ERROR - stderr - 24%|██▍ | 5459/22434 [6:40:41<11:53:31, 2.52s/it] +2025-02-05 16:48:21 - ERROR - stderr - +2025-02-05 16:48:21 - ERROR - stderr - +2025-02-05 16:48:21 - INFO - stdout - {'loss': 0.9921, 'grad_norm': 1.062525749206543, 'learning_rate': 1.7707156652270076e-05, 'epoch': 0.73} +2025-02-05 16:48:21 - ERROR - stderr - 24%|██▍ | 5459/22434 [6:40:41<11:53:31, 2.52s/it] +2025-02-05 16:48:24 - ERROR - stderr - 24%|██▍ | 5460/22434 [6:40:43<11:51:29, 2.51s/it] +2025-02-05 16:48:24 - ERROR - stderr - +2025-02-05 16:48:24 - ERROR - stderr - +2025-02-05 16:48:24 - INFO - stdout - {'loss': 0.9239, 'grad_norm': 1.144569754600525, 'learning_rate': 1.7706236646595792e-05, 'epoch': 0.73} +2025-02-05 16:48:24 - ERROR - stderr - 24%|██▍ | 5460/22434 [6:40:43<11:51:29, 2.51s/it] +2025-02-05 16:48:26 - ERROR - stderr - 24%|██▍ | 5461/22434 [6:40:46<11:47:59, 2.50s/it] +2025-02-05 16:48:26 - ERROR - stderr - +2025-02-05 16:48:26 - ERROR - stderr - +2025-02-05 16:48:26 - INFO - stdout - {'loss': 0.8827, 'grad_norm': 1.0911624431610107, 'learning_rate': 1.7705316480292386e-05, 'epoch': 0.73} +2025-02-05 16:48:26 - ERROR - stderr - 24%|██▍ | 5461/22434 [6:40:46<11:47:59, 2.50s/it] +2025-02-05 16:48:28 - ERROR - stderr - 24%|██▍ | 5462/22434 [6:40:48<11:42:52, 2.48s/it] +2025-02-05 16:48:28 - ERROR - stderr - +2025-02-05 16:48:28 - ERROR - stderr - 
+2025-02-05 16:48:28 - INFO - stdout - {'loss': 0.9305, 'grad_norm': 1.1237787008285522, 'learning_rate': 1.7704396153379024e-05, 'epoch': 0.73} +2025-02-05 16:48:28 - ERROR - stderr - 24%|██▍ | 5462/22434 [6:40:48<11:42:52, 2.48s/it] +2025-02-05 16:48:31 - ERROR - stderr - 24%|██▍ | 5463/22434 [6:40:51<11:41:38, 2.48s/it] +2025-02-05 16:48:31 - ERROR - stderr - +2025-02-05 16:48:31 - ERROR - stderr - +2025-02-05 16:48:31 - INFO - stdout - {'loss': 0.9271, 'grad_norm': 1.0386147499084473, 'learning_rate': 1.77034756658749e-05, 'epoch': 0.73} +2025-02-05 16:48:31 - ERROR - stderr - 24%|██▍ | 5463/22434 [6:40:51<11:41:38, 2.48s/it] +2025-02-05 16:48:33 - ERROR - stderr - 24%|██▍ | 5464/22434 [6:40:53<11:37:31, 2.47s/it] +2025-02-05 16:48:33 - ERROR - stderr - +2025-02-05 16:48:33 - ERROR - stderr - +2025-02-05 16:48:33 - INFO - stdout - {'loss': 0.8147, 'grad_norm': 1.1341667175292969, 'learning_rate': 1.7702555017799197e-05, 'epoch': 0.73} +2025-02-05 16:48:33 - ERROR - stderr - 24%|██▍ | 5464/22434 [6:40:53<11:37:31, 2.47s/it] +2025-02-05 16:48:36 - ERROR - stderr - 24%|██▍ | 5465/22434 [6:40:56<12:11:14, 2.59s/it] +2025-02-05 16:48:36 - ERROR - stderr - +2025-02-05 16:48:36 - ERROR - stderr - +2025-02-05 16:48:36 - INFO - stdout - {'loss': 0.8925, 'grad_norm': 1.025303602218628, 'learning_rate': 1.7701634209171103e-05, 'epoch': 0.73} +2025-02-05 16:48:36 - ERROR - stderr - 24%|██▍ | 5465/22434 [6:40:56<12:11:14, 2.59s/it] +2025-02-05 16:48:39 - ERROR - stderr - 24%|██▍ | 5466/22434 [6:40:59<12:33:08, 2.66s/it] +2025-02-05 16:48:39 - ERROR - stderr - +2025-02-05 16:48:39 - ERROR - stderr - +2025-02-05 16:48:39 - INFO - stdout - {'loss': 0.9704, 'grad_norm': 1.1619781255722046, 'learning_rate': 1.770071324000982e-05, 'epoch': 0.73} +2025-02-05 16:48:39 - ERROR - stderr - 24%|██▍ | 5466/22434 [6:40:59<12:33:08, 2.66s/it] +2025-02-05 16:48:42 - ERROR - stderr - 24%|██▍ | 5467/22434 [6:41:01<12:25:38, 2.64s/it] +2025-02-05 16:48:42 - ERROR - stderr - +2025-02-05 
16:48:42 - ERROR - stderr - +2025-02-05 16:48:42 - INFO - stdout - {'loss': 0.8628, 'grad_norm': 0.9426234364509583, 'learning_rate': 1.769979211033453e-05, 'epoch': 0.73} +2025-02-05 16:48:42 - ERROR - stderr - 24%|██▍ | 5467/22434 [6:41:01<12:25:38, 2.64s/it] +2025-02-05 16:48:44 - ERROR - stderr - 24%|██▍ | 5468/22434 [6:41:04<12:19:30, 2.62s/it] +2025-02-05 16:48:44 - ERROR - stderr - +2025-02-05 16:48:44 - ERROR - stderr - +2025-02-05 16:48:44 - INFO - stdout - {'loss': 1.0462, 'grad_norm': 1.0559861660003662, 'learning_rate': 1.7698870820164448e-05, 'epoch': 0.73} +2025-02-05 16:48:44 - ERROR - stderr - 24%|██▍ | 5468/22434 [6:41:04<12:19:30, 2.62s/it] +2025-02-05 16:48:47 - ERROR - stderr - 24%|██▍ | 5469/22434 [6:41:06<12:06:08, 2.57s/it] +2025-02-05 16:48:47 - ERROR - stderr - +2025-02-05 16:48:47 - ERROR - stderr - +2025-02-05 16:48:47 - INFO - stdout - {'loss': 0.7941, 'grad_norm': 0.9688773155212402, 'learning_rate': 1.7697949369518766e-05, 'epoch': 0.73} +2025-02-05 16:48:47 - ERROR - stderr - 24%|██▍ | 5469/22434 [6:41:06<12:06:08, 2.57s/it] +2025-02-05 16:48:49 - ERROR - stderr - 24%|██▍ | 5470/22434 [6:41:09<11:56:44, 2.54s/it] +2025-02-05 16:48:49 - ERROR - stderr - +2025-02-05 16:48:49 - ERROR - stderr - +2025-02-05 16:48:49 - INFO - stdout - {'loss': 0.9583, 'grad_norm': 1.1188685894012451, 'learning_rate': 1.76970277584167e-05, 'epoch': 0.73} +2025-02-05 16:48:49 - ERROR - stderr - 24%|██▍ | 5470/22434 [6:41:09<11:56:44, 2.54s/it] +2025-02-05 16:48:52 - ERROR - stderr - 24%|██▍ | 5471/22434 [6:41:11<11:57:18, 2.54s/it] +2025-02-05 16:48:52 - ERROR - stderr - +2025-02-05 16:48:52 - ERROR - stderr - +2025-02-05 16:48:52 - INFO - stdout - {'loss': 1.0556, 'grad_norm': 1.0497543811798096, 'learning_rate': 1.769610598687745e-05, 'epoch': 0.73} +2025-02-05 16:48:52 - ERROR - stderr - 24%|██▍ | 5471/22434 [6:41:11<11:57:18, 2.54s/it] +2025-02-05 16:48:54 - ERROR - stderr - 24%|██▍ | 5472/22434 [6:41:14<12:02:08, 2.55s/it] +2025-02-05 16:48:54 - ERROR - 
stderr - +2025-02-05 16:48:54 - ERROR - stderr - +2025-02-05 16:48:54 - INFO - stdout - {'loss': 0.9078, 'grad_norm': 1.0324809551239014, 'learning_rate': 1.7695184054920236e-05, 'epoch': 0.73} +2025-02-05 16:48:54 - ERROR - stderr - 24%|██▍ | 5472/22434 [6:41:14<12:02:08, 2.55s/it] +2025-02-05 16:48:57 - ERROR - stderr - 24%|██▍ | 5473/22434 [6:41:17<11:58:33, 2.54s/it] +2025-02-05 16:48:57 - ERROR - stderr - +2025-02-05 16:48:57 - ERROR - stderr - +2025-02-05 16:48:57 - INFO - stdout - {'loss': 0.969, 'grad_norm': 1.0529309511184692, 'learning_rate': 1.7694261962564278e-05, 'epoch': 0.73} +2025-02-05 16:48:57 - ERROR - stderr - 24%|██▍ | 5473/22434 [6:41:17<11:58:33, 2.54s/it] +2025-02-05 16:48:59 - ERROR - stderr - 24%|██▍ | 5474/22434 [6:41:19<11:58:42, 2.54s/it] +2025-02-05 16:48:59 - ERROR - stderr - +2025-02-05 16:48:59 - ERROR - stderr - +2025-02-05 16:48:59 - INFO - stdout - {'loss': 0.9219, 'grad_norm': 1.1453649997711182, 'learning_rate': 1.769333970982879e-05, 'epoch': 0.73} +2025-02-05 16:48:59 - ERROR - stderr - 24%|██▍ | 5474/22434 [6:41:19<11:58:42, 2.54s/it] +2025-02-05 16:49:02 - ERROR - stderr - 24%|██▍ | 5475/22434 [6:41:21<11:50:36, 2.51s/it] +2025-02-05 16:49:02 - ERROR - stderr - +2025-02-05 16:49:02 - ERROR - stderr - +2025-02-05 16:49:02 - INFO - stdout - {'loss': 1.0451, 'grad_norm': 1.103806734085083, 'learning_rate': 1.7692417296733e-05, 'epoch': 0.73} +2025-02-05 16:49:02 - ERROR - stderr - 24%|██▍ | 5475/22434 [6:41:22<11:50:36, 2.51s/it] +2025-02-05 16:49:04 - ERROR - stderr - 24%|██▍ | 5476/22434 [6:41:24<11:49:24, 2.51s/it] +2025-02-05 16:49:04 - ERROR - stderr - +2025-02-05 16:49:04 - ERROR - stderr - +2025-02-05 16:49:04 - INFO - stdout - {'loss': 0.9989, 'grad_norm': 1.2688848972320557, 'learning_rate': 1.769149472329613e-05, 'epoch': 0.73} +2025-02-05 16:49:04 - ERROR - stderr - 24%|██▍ | 5476/22434 [6:41:24<11:49:24, 2.51s/it] +2025-02-05 16:49:07 - ERROR - stderr - 24%|██▍ | 5477/22434 [6:41:26<11:46:03, 2.50s/it] +2025-02-05 
16:49:07 - ERROR - stderr - +2025-02-05 16:49:07 - ERROR - stderr - +2025-02-05 16:49:07 - INFO - stdout - {'loss': 1.0707, 'grad_norm': 1.1294771432876587, 'learning_rate': 1.769057198953741e-05, 'epoch': 0.73} +2025-02-05 16:49:07 - ERROR - stderr - 24%|██▍ | 5477/22434 [6:41:27<11:46:03, 2.50s/it] +2025-02-05 16:49:09 - ERROR - stderr - 24%|██▍ | 5478/22434 [6:41:29<11:45:35, 2.50s/it] +2025-02-05 16:49:09 - ERROR - stderr - +2025-02-05 16:49:09 - ERROR - stderr - +2025-02-05 16:49:09 - INFO - stdout - {'loss': 0.9184, 'grad_norm': 1.0375664234161377, 'learning_rate': 1.7689649095476078e-05, 'epoch': 0.73} +2025-02-05 16:49:09 - ERROR - stderr - 24%|██▍ | 5478/22434 [6:41:29<11:45:35, 2.50s/it] +2025-02-05 16:49:12 - ERROR - stderr - 24%|██▍ | 5479/22434 [6:41:32<11:53:12, 2.52s/it] +2025-02-05 16:49:12 - ERROR - stderr - +2025-02-05 16:49:12 - ERROR - stderr - +2025-02-05 16:49:12 - INFO - stdout - {'loss': 0.9285, 'grad_norm': 1.0189743041992188, 'learning_rate': 1.768872604113137e-05, 'epoch': 0.73} +2025-02-05 16:49:12 - ERROR - stderr - 24%|██▍ | 5479/22434 [6:41:32<11:53:12, 2.52s/it] +2025-02-05 16:49:14 - ERROR - stderr - 24%|██▍ | 5480/22434 [6:41:34<11:50:34, 2.51s/it] +2025-02-05 16:49:14 - ERROR - stderr - +2025-02-05 16:49:14 - ERROR - stderr - +2025-02-05 16:49:14 - INFO - stdout - {'loss': 1.014, 'grad_norm': 1.1088390350341797, 'learning_rate': 1.7687802826522525e-05, 'epoch': 0.73} +2025-02-05 16:49:14 - ERROR - stderr - 24%|██▍ | 5480/22434 [6:41:34<11:50:34, 2.51s/it] +2025-02-05 16:49:17 - ERROR - stderr - 24%|██▍ | 5481/22434 [6:41:36<11:44:00, 2.49s/it] +2025-02-05 16:49:17 - ERROR - stderr - +2025-02-05 16:49:17 - ERROR - stderr - +2025-02-05 16:49:17 - INFO - stdout - {'loss': 0.8401, 'grad_norm': 0.9751871824264526, 'learning_rate': 1.7686879451668783e-05, 'epoch': 0.73} +2025-02-05 16:49:17 - ERROR - stderr - 24%|██▍ | 5481/22434 [6:41:37<11:44:00, 2.49s/it] +2025-02-05 16:49:19 - ERROR - stderr - 24%|██▍ | 5482/22434 [6:41:39<11:37:44, 
2.47s/it] +2025-02-05 16:49:19 - ERROR - stderr - +2025-02-05 16:49:19 - ERROR - stderr - +2025-02-05 16:49:19 - INFO - stdout - {'loss': 0.8735, 'grad_norm': 1.022199273109436, 'learning_rate': 1.7685955916589396e-05, 'epoch': 0.73} +2025-02-05 16:49:19 - ERROR - stderr - 24%|██▍ | 5482/22434 [6:41:39<11:37:44, 2.47s/it] +2025-02-05 16:49:22 - ERROR - stderr - 24%|██▍ | 5483/22434 [6:41:41<11:46:25, 2.50s/it] +2025-02-05 16:49:22 - ERROR - stderr - +2025-02-05 16:49:22 - ERROR - stderr - +2025-02-05 16:49:22 - INFO - stdout - {'loss': 0.9189, 'grad_norm': 1.0358741283416748, 'learning_rate': 1.7685032221303616e-05, 'epoch': 0.73} +2025-02-05 16:49:22 - ERROR - stderr - 24%|██▍ | 5483/22434 [6:41:42<11:46:25, 2.50s/it] +2025-02-05 16:49:24 - ERROR - stderr - 24%|██▍ | 5484/22434 [6:41:44<11:47:28, 2.50s/it] +2025-02-05 16:49:24 - ERROR - stderr - +2025-02-05 16:49:24 - ERROR - stderr - +2025-02-05 16:49:24 - INFO - stdout - {'loss': 0.9417, 'grad_norm': 1.0660679340362549, 'learning_rate': 1.768410836583069e-05, 'epoch': 0.73} +2025-02-05 16:49:24 - ERROR - stderr - 24%|██▍ | 5484/22434 [6:41:44<11:47:28, 2.50s/it] +2025-02-05 16:49:27 - ERROR - stderr - 24%|██▍ | 5485/22434 [6:41:46<11:46:50, 2.50s/it] +2025-02-05 16:49:27 - ERROR - stderr - +2025-02-05 16:49:27 - ERROR - stderr - +2025-02-05 16:49:27 - INFO - stdout - {'loss': 0.9258, 'grad_norm': 0.9852597713470459, 'learning_rate': 1.7683184350189878e-05, 'epoch': 0.73} +2025-02-05 16:49:27 - ERROR - stderr - 24%|██▍ | 5485/22434 [6:41:47<11:46:50, 2.50s/it] +2025-02-05 16:49:29 - ERROR - stderr - 24%|██▍ | 5486/22434 [6:41:49<11:51:39, 2.52s/it] +2025-02-05 16:49:29 - ERROR - stderr - +2025-02-05 16:49:29 - ERROR - stderr - +2025-02-05 16:49:29 - INFO - stdout - {'loss': 0.9559, 'grad_norm': 0.9773516654968262, 'learning_rate': 1.768226017440044e-05, 'epoch': 0.73} +2025-02-05 16:49:29 - ERROR - stderr - 24%|██▍ | 5486/22434 [6:41:49<11:51:39, 2.52s/it] +2025-02-05 16:49:32 - ERROR - stderr - 24%|██▍ | 
5487/22434 [6:41:52<11:58:13, 2.54s/it] +2025-02-05 16:49:32 - ERROR - stderr - +2025-02-05 16:49:32 - ERROR - stderr - +2025-02-05 16:49:32 - INFO - stdout - {'loss': 1.011, 'grad_norm': 1.1555254459381104, 'learning_rate': 1.768133583848164e-05, 'epoch': 0.73} +2025-02-05 16:49:32 - ERROR - stderr - 24%|██▍ | 5487/22434 [6:41:52<11:58:13, 2.54s/it] +2025-02-05 16:49:34 - ERROR - stderr - 24%|██▍ | 5488/22434 [6:41:54<12:04:21, 2.56s/it] +2025-02-05 16:49:35 - ERROR - stderr - +2025-02-05 16:49:35 - ERROR - stderr - +2025-02-05 16:49:35 - INFO - stdout - {'loss': 1.0207, 'grad_norm': 1.1057606935501099, 'learning_rate': 1.768041134245275e-05, 'epoch': 0.73} +2025-02-05 16:49:35 - ERROR - stderr - 24%|██▍ | 5488/22434 [6:41:54<12:04:21, 2.56s/it] +2025-02-05 16:49:37 - ERROR - stderr - 24%|██▍ | 5489/22434 [6:41:57<12:07:59, 2.58s/it] +2025-02-05 16:49:37 - ERROR - stderr - +2025-02-05 16:49:37 - ERROR - stderr - +2025-02-05 16:49:37 - INFO - stdout - {'loss': 1.0686, 'grad_norm': 1.0660011768341064, 'learning_rate': 1.7679486686333027e-05, 'epoch': 0.73} +2025-02-05 16:49:37 - ERROR - stderr - 24%|██▍ | 5489/22434 [6:41:57<12:07:59, 2.58s/it] +2025-02-05 16:49:40 - ERROR - stderr - 24%|██▍ | 5490/22434 [6:41:59<12:04:50, 2.57s/it] +2025-02-05 16:49:40 - ERROR - stderr - +2025-02-05 16:49:40 - ERROR - stderr - +2025-02-05 16:49:40 - INFO - stdout - {'loss': 0.912, 'grad_norm': 1.104441523551941, 'learning_rate': 1.7678561870141755e-05, 'epoch': 0.73} +2025-02-05 16:49:40 - ERROR - stderr - 24%|██▍ | 5490/22434 [6:41:59<12:04:50, 2.57s/it] +2025-02-05 16:49:42 - ERROR - stderr - 24%|██▍ | 5491/22434 [6:42:02<12:00:56, 2.55s/it] +2025-02-05 16:49:42 - ERROR - stderr - +2025-02-05 16:49:42 - ERROR - stderr - +2025-02-05 16:49:42 - INFO - stdout - {'loss': 0.9346, 'grad_norm': 1.0470383167266846, 'learning_rate': 1.767763689389821e-05, 'epoch': 0.73} +2025-02-05 16:49:42 - ERROR - stderr - 24%|██▍ | 5491/22434 [6:42:02<12:00:56, 2.55s/it] +2025-02-05 16:49:45 - ERROR - 
stderr - 24%|██▍ | 5492/22434 [6:42:04<11:56:16, 2.54s/it] +2025-02-05 16:49:45 - ERROR - stderr - +2025-02-05 16:49:45 - ERROR - stderr - +2025-02-05 16:49:45 - INFO - stdout - {'loss': 1.0052, 'grad_norm': 1.101184606552124, 'learning_rate': 1.767671175762167e-05, 'epoch': 0.73} +2025-02-05 16:49:45 - ERROR - stderr - 24%|██▍ | 5492/22434 [6:42:04<11:56:16, 2.54s/it] +2025-02-05 16:49:47 - ERROR - stderr - 24%|██▍ | 5493/22434 [6:42:07<11:50:49, 2.52s/it] +2025-02-05 16:49:47 - ERROR - stderr - +2025-02-05 16:49:47 - ERROR - stderr - +2025-02-05 16:49:47 - INFO - stdout - {'loss': 0.9898, 'grad_norm': 1.0381447076797485, 'learning_rate': 1.767578646133142e-05, 'epoch': 0.73} +2025-02-05 16:49:47 - ERROR - stderr - 24%|██▍ | 5493/22434 [6:42:07<11:50:49, 2.52s/it] +2025-02-05 16:49:50 - ERROR - stderr - 24%|██▍ | 5494/22434 [6:42:09<11:43:19, 2.49s/it] +2025-02-05 16:49:50 - ERROR - stderr - +2025-02-05 16:49:50 - ERROR - stderr - +2025-02-05 16:49:50 - INFO - stdout - {'loss': 1.0527, 'grad_norm': 1.0679866075515747, 'learning_rate': 1.7674861005046743e-05, 'epoch': 0.73} +2025-02-05 16:49:50 - ERROR - stderr - 24%|██▍ | 5494/22434 [6:42:09<11:43:19, 2.49s/it] +2025-02-05 16:49:52 - ERROR - stderr - 24%|██▍ | 5495/22434 [6:42:12<11:43:46, 2.49s/it] +2025-02-05 16:49:52 - ERROR - stderr - +2025-02-05 16:49:52 - ERROR - stderr - +2025-02-05 16:49:52 - INFO - stdout - {'loss': 0.9514, 'grad_norm': 0.9806519746780396, 'learning_rate': 1.7673935388786936e-05, 'epoch': 0.73} +2025-02-05 16:49:52 - ERROR - stderr - 24%|██▍ | 5495/22434 [6:42:12<11:43:46, 2.49s/it] +2025-02-05 16:49:55 - ERROR - stderr - 24%|██▍ | 5496/22434 [6:42:14<11:47:50, 2.51s/it] +2025-02-05 16:49:55 - ERROR - stderr - +2025-02-05 16:49:55 - ERROR - stderr - +2025-02-05 16:49:55 - INFO - stdout - {'loss': 0.9131, 'grad_norm': 0.9385021328926086, 'learning_rate': 1.767300961257129e-05, 'epoch': 0.73} +2025-02-05 16:49:55 - ERROR - stderr - 24%|██▍ | 5496/22434 [6:42:14<11:47:50, 2.51s/it] 
+2025-02-05 16:49:57 - ERROR - stderr - 25%|██▍ | 5497/22434 [6:42:17<11:46:11, 2.50s/it] +2025-02-05 16:49:57 - ERROR - stderr - +2025-02-05 16:49:57 - ERROR - stderr - +2025-02-05 16:49:57 - INFO - stdout - {'loss': 0.9469, 'grad_norm': 1.114537000656128, 'learning_rate': 1.7672083676419095e-05, 'epoch': 0.74} +2025-02-05 16:49:57 - ERROR - stderr - 25%|██▍ | 5497/22434 [6:42:17<11:46:11, 2.50s/it] +2025-02-05 16:50:00 - ERROR - stderr - 25%|██▍ | 5498/22434 [6:42:19<11:41:19, 2.48s/it] +2025-02-05 16:50:00 - ERROR - stderr - +2025-02-05 16:50:00 - ERROR - stderr - +2025-02-05 16:50:00 - INFO - stdout - {'loss': 0.9819, 'grad_norm': 1.0891109704971313, 'learning_rate': 1.767115758034966e-05, 'epoch': 0.74} +2025-02-05 16:50:00 - ERROR - stderr - 25%|██▍ | 5498/22434 [6:42:19<11:41:19, 2.48s/it] +2025-02-05 16:50:02 - ERROR - stderr - 25%|██▍ | 5499/22434 [6:42:22<11:43:48, 2.49s/it] +2025-02-05 16:50:02 - ERROR - stderr - +2025-02-05 16:50:02 - ERROR - stderr - +2025-02-05 16:50:02 - INFO - stdout - {'loss': 1.0279, 'grad_norm': 1.0426448583602905, 'learning_rate': 1.767023132438229e-05, 'epoch': 0.74} +2025-02-05 16:50:02 - ERROR - stderr - 25%|██▍ | 5499/22434 [6:42:22<11:43:48, 2.49s/it] +2025-02-05 16:50:05 - ERROR - stderr - 25%|██▍ | 5500/22434 [6:42:25<12:14:11, 2.60s/it] +2025-02-05 16:50:05 - ERROR - stderr - +2025-02-05 16:50:05 - ERROR - stderr - +2025-02-05 16:50:05 - INFO - stdout - {'loss': 0.8487, 'grad_norm': 0.9964267611503601, 'learning_rate': 1.766930490853628e-05, 'epoch': 0.74} +2025-02-05 16:50:05 - ERROR - stderr - 25%|██▍ | 5500/22434 [6:42:25<12:14:11, 2.60s/it] +2025-02-05 16:50:07 - ERROR - stderr - 25%|██▍ | 5501/22434 [6:42:27<12:05:43, 2.57s/it] +2025-02-05 16:50:07 - ERROR - stderr - +2025-02-05 16:50:07 - ERROR - stderr - +2025-02-05 16:50:07 - INFO - stdout - {'loss': 0.9124, 'grad_norm': 1.0381603240966797, 'learning_rate': 1.7668378332830953e-05, 'epoch': 0.74} +2025-02-05 16:50:07 - ERROR - stderr - 25%|██▍ | 5501/22434 
[6:42:27<12:05:43, 2.57s/it] +2025-02-05 16:50:10 - ERROR - stderr - 25%|██▍ | 5502/22434 [6:42:30<12:00:05, 2.55s/it] +2025-02-05 16:50:10 - ERROR - stderr - +2025-02-05 16:50:10 - ERROR - stderr - +2025-02-05 16:50:10 - INFO - stdout - {'loss': 0.8301, 'grad_norm': 0.9481689929962158, 'learning_rate': 1.7667451597285617e-05, 'epoch': 0.74} +2025-02-05 16:50:10 - ERROR - stderr - 25%|██▍ | 5502/22434 [6:42:30<12:00:05, 2.55s/it] +2025-02-05 16:50:12 - ERROR - stderr - 25%|██▍ | 5503/22434 [6:42:32<11:57:28, 2.54s/it] +2025-02-05 16:50:12 - ERROR - stderr - +2025-02-05 16:50:12 - ERROR - stderr - +2025-02-05 16:50:12 - INFO - stdout - {'loss': 0.8151, 'grad_norm': 1.0289973020553589, 'learning_rate': 1.7666524701919588e-05, 'epoch': 0.74} +2025-02-05 16:50:12 - ERROR - stderr - 25%|██▍ | 5503/22434 [6:42:32<11:57:28, 2.54s/it] +2025-02-05 16:50:15 - ERROR - stderr - 25%|██▍ | 5504/22434 [6:42:35<11:54:48, 2.53s/it] +2025-02-05 16:50:15 - ERROR - stderr - +2025-02-05 16:50:15 - ERROR - stderr - +2025-02-05 16:50:15 - INFO - stdout - {'loss': 0.894, 'grad_norm': 1.0425347089767456, 'learning_rate': 1.7665597646752187e-05, 'epoch': 0.74} +2025-02-05 16:50:15 - ERROR - stderr - 25%|██▍ | 5504/22434 [6:42:35<11:54:48, 2.53s/it] +2025-02-05 16:50:17 - ERROR - stderr - 25%|██▍ | 5505/22434 [6:42:37<11:51:58, 2.52s/it] +2025-02-05 16:50:17 - ERROR - stderr - +2025-02-05 16:50:17 - ERROR - stderr - +2025-02-05 16:50:17 - INFO - stdout - {'loss': 0.9095, 'grad_norm': 1.006659746170044, 'learning_rate': 1.766467043180274e-05, 'epoch': 0.74} +2025-02-05 16:50:17 - ERROR - stderr - 25%|██▍ | 5505/22434 [6:42:37<11:51:58, 2.52s/it] +2025-02-05 16:50:20 - ERROR - stderr - 25%|██▍ | 5506/22434 [6:42:40<11:49:13, 2.51s/it] +2025-02-05 16:50:20 - ERROR - stderr - +2025-02-05 16:50:20 - ERROR - stderr - +2025-02-05 16:50:20 - INFO - stdout - {'loss': 0.9025, 'grad_norm': 1.0175583362579346, 'learning_rate': 1.7663743057090572e-05, 'epoch': 0.74} +2025-02-05 16:50:20 - ERROR - stderr 
- 25%|██▍ | 5506/22434 [6:42:40<11:49:13, 2.51s/it] +2025-02-05 16:50:22 - ERROR - stderr - 25%|██▍ | 5507/22434 [6:42:42<11:46:56, 2.51s/it] +2025-02-05 16:50:22 - ERROR - stderr - +2025-02-05 16:50:22 - ERROR - stderr - +2025-02-05 16:50:22 - INFO - stdout - {'loss': 0.8041, 'grad_norm': 1.0142004489898682, 'learning_rate': 1.7662815522635016e-05, 'epoch': 0.74} +2025-02-05 16:50:22 - ERROR - stderr - 25%|██▍ | 5507/22434 [6:42:42<11:46:56, 2.51s/it] +2025-02-05 16:50:25 - ERROR - stderr - 25%|██▍ | 5508/22434 [6:42:45<11:48:56, 2.51s/it] +2025-02-05 16:50:25 - ERROR - stderr - +2025-02-05 16:50:25 - ERROR - stderr - +2025-02-05 16:50:25 - INFO - stdout - {'loss': 0.928, 'grad_norm': 1.0304288864135742, 'learning_rate': 1.7661887828455396e-05, 'epoch': 0.74} +2025-02-05 16:50:25 - ERROR - stderr - 25%|██▍ | 5508/22434 [6:42:45<11:48:56, 2.51s/it] +2025-02-05 16:50:27 - ERROR - stderr - 25%|██▍ | 5509/22434 [6:42:47<11:44:32, 2.50s/it] +2025-02-05 16:50:27 - ERROR - stderr - +2025-02-05 16:50:27 - ERROR - stderr - +2025-02-05 16:50:27 - INFO - stdout - {'loss': 1.0912, 'grad_norm': 1.1089518070220947, 'learning_rate': 1.7660959974571064e-05, 'epoch': 0.74} +2025-02-05 16:50:27 - ERROR - stderr - 25%|██▍ | 5509/22434 [6:42:47<11:44:32, 2.50s/it] +2025-02-05 16:50:30 - ERROR - stderr - 25%|██▍ | 5510/22434 [6:42:50<11:42:46, 2.49s/it] +2025-02-05 16:50:30 - ERROR - stderr - +2025-02-05 16:50:30 - ERROR - stderr - +2025-02-05 16:50:30 - INFO - stdout - {'loss': 0.8898, 'grad_norm': 1.0991125106811523, 'learning_rate': 1.7660031961001344e-05, 'epoch': 0.74} +2025-02-05 16:50:30 - ERROR - stderr - 25%|██▍ | 5510/22434 [6:42:50<11:42:46, 2.49s/it] +2025-02-05 16:50:32 - ERROR - stderr - 25%|██▍ | 5511/22434 [6:42:52<11:39:07, 2.48s/it] +2025-02-05 16:50:32 - ERROR - stderr - +2025-02-05 16:50:32 - ERROR - stderr - +2025-02-05 16:50:32 - INFO - stdout - {'loss': 1.1214, 'grad_norm': 1.158766746520996, 'learning_rate': 1.7659103787765594e-05, 'epoch': 0.74} +2025-02-05 
16:50:32 - ERROR - stderr - 25%|██▍ | 5511/22434 [6:42:52<11:39:07, 2.48s/it] +2025-02-05 16:50:35 - ERROR - stderr - 25%|██▍ | 5512/22434 [6:42:55<11:36:15, 2.47s/it] +2025-02-05 16:50:35 - ERROR - stderr - +2025-02-05 16:50:35 - ERROR - stderr - +2025-02-05 16:50:35 - INFO - stdout - {'loss': 0.964, 'grad_norm': 1.1270241737365723, 'learning_rate': 1.7658175454883152e-05, 'epoch': 0.74} +2025-02-05 16:50:35 - ERROR - stderr - 25%|██▍ | 5512/22434 [6:42:55<11:36:15, 2.47s/it] +2025-02-05 16:50:37 - ERROR - stderr - 25%|██▍ | 5513/22434 [6:42:57<11:33:59, 2.46s/it] +2025-02-05 16:50:37 - ERROR - stderr - +2025-02-05 16:50:37 - ERROR - stderr - +2025-02-05 16:50:37 - INFO - stdout - {'loss': 1.036, 'grad_norm': 1.0338053703308105, 'learning_rate': 1.765724696237337e-05, 'epoch': 0.74} +2025-02-05 16:50:37 - ERROR - stderr - 25%|██▍ | 5513/22434 [6:42:57<11:33:59, 2.46s/it] +2025-02-05 16:50:40 - ERROR - stderr - 25%|██▍ | 5514/22434 [6:42:59<11:38:01, 2.48s/it] +2025-02-05 16:50:40 - ERROR - stderr - +2025-02-05 16:50:40 - ERROR - stderr - +2025-02-05 16:50:40 - INFO - stdout - {'loss': 0.9089, 'grad_norm': 1.0214444398880005, 'learning_rate': 1.7656318310255604e-05, 'epoch': 0.74} +2025-02-05 16:50:40 - ERROR - stderr - 25%|██▍ | 5514/22434 [6:43:00<11:38:01, 2.48s/it] +2025-02-05 16:50:42 - ERROR - stderr - 25%|██▍ | 5515/22434 [6:43:02<11:41:52, 2.49s/it] +2025-02-05 16:50:42 - ERROR - stderr - +2025-02-05 16:50:42 - ERROR - stderr - +2025-02-05 16:50:42 - INFO - stdout - {'loss': 1.0408, 'grad_norm': 1.1164906024932861, 'learning_rate': 1.765538949854921e-05, 'epoch': 0.74} +2025-02-05 16:50:42 - ERROR - stderr - 25%|██▍ | 5515/22434 [6:43:02<11:41:52, 2.49s/it] +2025-02-05 16:50:45 - ERROR - stderr - 25%|██▍ | 5516/22434 [6:43:05<11:42:31, 2.49s/it] +2025-02-05 16:50:45 - ERROR - stderr - +2025-02-05 16:50:45 - ERROR - stderr - +2025-02-05 16:50:45 - INFO - stdout - {'loss': 0.8648, 'grad_norm': 1.0141122341156006, 'learning_rate': 1.7654460527273543e-05, 
'epoch': 0.74} +2025-02-05 16:50:45 - ERROR - stderr - 25%|██▍ | 5516/22434 [6:43:05<11:42:31, 2.49s/it] +2025-02-05 16:50:47 - ERROR - stderr - 25%|██▍ | 5517/22434 [6:43:07<11:37:19, 2.47s/it] +2025-02-05 16:50:47 - ERROR - stderr - +2025-02-05 16:50:47 - ERROR - stderr - +2025-02-05 16:50:47 - INFO - stdout - {'loss': 1.0089, 'grad_norm': 1.1392110586166382, 'learning_rate': 1.7653531396447975e-05, 'epoch': 0.74} +2025-02-05 16:50:47 - ERROR - stderr - 25%|██▍ | 5517/22434 [6:43:07<11:37:19, 2.47s/it] +2025-02-05 16:50:50 - ERROR - stderr - 25%|██▍ | 5518/22434 [6:43:09<11:39:49, 2.48s/it] +2025-02-05 16:50:50 - ERROR - stderr - +2025-02-05 16:50:50 - ERROR - stderr - +2025-02-05 16:50:50 - INFO - stdout - {'loss': 0.9318, 'grad_norm': 1.0669268369674683, 'learning_rate': 1.7652602106091866e-05, 'epoch': 0.74} +2025-02-05 16:50:50 - ERROR - stderr - 25%|██▍ | 5518/22434 [6:43:10<11:39:49, 2.48s/it] +2025-02-05 16:50:52 - ERROR - stderr - 25%|██▍ | 5519/22434 [6:43:12<11:48:37, 2.51s/it] +2025-02-05 16:50:52 - ERROR - stderr - +2025-02-05 16:50:52 - ERROR - stderr - +2025-02-05 16:50:52 - INFO - stdout - {'loss': 0.9506, 'grad_norm': 1.0497102737426758, 'learning_rate': 1.7651672656224592e-05, 'epoch': 0.74} +2025-02-05 16:50:52 - ERROR - stderr - 25%|██▍ | 5519/22434 [6:43:12<11:48:37, 2.51s/it] +2025-02-05 16:50:55 - ERROR - stderr - 25%|██▍ | 5520/22434 [6:43:15<11:57:09, 2.54s/it] +2025-02-05 16:50:55 - ERROR - stderr - +2025-02-05 16:50:55 - ERROR - stderr - +2025-02-05 16:50:55 - INFO - stdout - {'loss': 0.9395, 'grad_norm': 1.0458214282989502, 'learning_rate': 1.765074304686552e-05, 'epoch': 0.74} +2025-02-05 16:50:55 - ERROR - stderr - 25%|██▍ | 5520/22434 [6:43:15<11:57:09, 2.54s/it] +2025-02-05 16:50:57 - ERROR - stderr - 25%|██▍ | 5521/22434 [6:43:17<11:49:37, 2.52s/it] +2025-02-05 16:50:57 - ERROR - stderr - +2025-02-05 16:50:57 - ERROR - stderr - +2025-02-05 16:50:57 - INFO - stdout - {'loss': 0.9141, 'grad_norm': 1.0274564027786255, 'learning_rate': 
1.7649813278034032e-05, 'epoch': 0.74} +2025-02-05 16:50:57 - ERROR - stderr - 25%|██▍ | 5521/22434 [6:43:17<11:49:37, 2.52s/it] +2025-02-05 16:51:00 - ERROR - stderr - 25%|██▍ | 5522/22434 [6:43:20<11:49:40, 2.52s/it] +2025-02-05 16:51:00 - ERROR - stderr - +2025-02-05 16:51:00 - ERROR - stderr - +2025-02-05 16:51:00 - INFO - stdout - {'loss': 0.8309, 'grad_norm': 1.0003740787506104, 'learning_rate': 1.7648883349749506e-05, 'epoch': 0.74} +2025-02-05 16:51:00 - ERROR - stderr - 25%|██▍ | 5522/22434 [6:43:20<11:49:40, 2.52s/it] +2025-02-05 16:51:02 - ERROR - stderr - 25%|██▍ | 5523/22434 [6:43:22<11:47:25, 2.51s/it] +2025-02-05 16:51:02 - ERROR - stderr - +2025-02-05 16:51:02 - ERROR - stderr - +2025-02-05 16:51:02 - INFO - stdout - {'loss': 0.927, 'grad_norm': 1.0978950262069702, 'learning_rate': 1.7647953262031325e-05, 'epoch': 0.74} +2025-02-05 16:51:02 - ERROR - stderr - 25%|██▍ | 5523/22434 [6:43:22<11:47:25, 2.51s/it] +2025-02-05 16:51:05 - ERROR - stderr - 25%|██▍ | 5524/22434 [6:43:25<11:43:49, 2.50s/it] +2025-02-05 16:51:05 - ERROR - stderr - +2025-02-05 16:51:05 - ERROR - stderr - +2025-02-05 16:51:05 - INFO - stdout - {'loss': 0.9315, 'grad_norm': 1.028764009475708, 'learning_rate': 1.7647023014898878e-05, 'epoch': 0.74} +2025-02-05 16:51:05 - ERROR - stderr - 25%|██▍ | 5524/22434 [6:43:25<11:43:49, 2.50s/it] +2025-02-05 16:51:07 - ERROR - stderr - 25%|██▍ | 5525/22434 [6:43:27<11:45:21, 2.50s/it] +2025-02-05 16:51:07 - ERROR - stderr - +2025-02-05 16:51:07 - ERROR - stderr - +2025-02-05 16:51:07 - INFO - stdout - {'loss': 0.9202, 'grad_norm': 1.088834285736084, 'learning_rate': 1.7646092608371553e-05, 'epoch': 0.74} +2025-02-05 16:51:07 - ERROR - stderr - 25%|██▍ | 5525/22434 [6:43:27<11:45:21, 2.50s/it] +2025-02-05 16:51:10 - ERROR - stderr - 25%|██▍ | 5526/22434 [6:43:30<11:46:04, 2.51s/it] +2025-02-05 16:51:10 - ERROR - stderr - +2025-02-05 16:51:10 - ERROR - stderr - +2025-02-05 16:51:10 - INFO - stdout - {'loss': 0.9356, 'grad_norm': 
1.1014736890792847, 'learning_rate': 1.7645162042468742e-05, 'epoch': 0.74} +2025-02-05 16:51:10 - ERROR - stderr - 25%|██▍ | 5526/22434 [6:43:30<11:46:04, 2.51s/it] +2025-02-05 16:51:12 - ERROR - stderr - 25%|██▍ | 5527/22434 [6:43:32<11:41:22, 2.49s/it] +2025-02-05 16:51:12 - ERROR - stderr - +2025-02-05 16:51:12 - ERROR - stderr - +2025-02-05 16:51:12 - INFO - stdout - {'loss': 0.9033, 'grad_norm': 1.1460351943969727, 'learning_rate': 1.764423131720985e-05, 'epoch': 0.74} +2025-02-05 16:51:12 - ERROR - stderr - 25%|██▍ | 5527/22434 [6:43:32<11:41:22, 2.49s/it] +2025-02-05 16:51:15 - ERROR - stderr - 25%|██▍ | 5528/22434 [6:43:35<11:40:35, 2.49s/it] +2025-02-05 16:51:15 - ERROR - stderr - +2025-02-05 16:51:15 - ERROR - stderr - +2025-02-05 16:51:15 - INFO - stdout - {'loss': 0.8716, 'grad_norm': 1.1521360874176025, 'learning_rate': 1.7643300432614262e-05, 'epoch': 0.74} +2025-02-05 16:51:15 - ERROR - stderr - 25%|██▍ | 5528/22434 [6:43:35<11:40:35, 2.49s/it] +2025-02-05 16:51:17 - ERROR - stderr - 25%|██▍ | 5529/22434 [6:43:37<11:39:02, 2.48s/it] +2025-02-05 16:51:17 - ERROR - stderr - +2025-02-05 16:51:17 - ERROR - stderr - +2025-02-05 16:51:17 - INFO - stdout - {'loss': 0.8171, 'grad_norm': 0.9602109789848328, 'learning_rate': 1.7642369388701394e-05, 'epoch': 0.74} +2025-02-05 16:51:17 - ERROR - stderr - 25%|██▍ | 5529/22434 [6:43:37<11:39:02, 2.48s/it] +2025-02-05 16:51:20 - ERROR - stderr - 25%|██▍ | 5530/22434 [6:43:39<11:39:17, 2.48s/it] +2025-02-05 16:51:20 - ERROR - stderr - +2025-02-05 16:51:20 - ERROR - stderr - +2025-02-05 16:51:20 - INFO - stdout - {'loss': 0.9885, 'grad_norm': 1.0671948194503784, 'learning_rate': 1.764143818549065e-05, 'epoch': 0.74} +2025-02-05 16:51:20 - ERROR - stderr - 25%|██▍ | 5530/22434 [6:43:40<11:39:17, 2.48s/it] +2025-02-05 16:51:22 - ERROR - stderr - 25%|██▍ | 5531/22434 [6:43:42<11:37:08, 2.47s/it] +2025-02-05 16:51:22 - ERROR - stderr - +2025-02-05 16:51:22 - ERROR - stderr - +2025-02-05 16:51:22 - INFO - stdout - 
{'loss': 1.0208, 'grad_norm': 1.1693493127822876, 'learning_rate': 1.764050682300144e-05, 'epoch': 0.74} +2025-02-05 16:51:22 - ERROR - stderr - 25%|██▍ | 5531/22434 [6:43:42<11:37:08, 2.47s/it] +2025-02-05 16:51:25 - ERROR - stderr - 25%|██▍ | 5532/22434 [6:43:45<11:43:04, 2.50s/it] +2025-02-05 16:51:25 - ERROR - stderr - +2025-02-05 16:51:25 - ERROR - stderr - +2025-02-05 16:51:25 - INFO - stdout - {'loss': 0.9083, 'grad_norm': 1.0283278226852417, 'learning_rate': 1.7639575301253174e-05, 'epoch': 0.74} +2025-02-05 16:51:25 - ERROR - stderr - 25%|██▍ | 5532/22434 [6:43:45<11:43:04, 2.50s/it] +2025-02-05 16:51:27 - ERROR - stderr - 25%|██▍ | 5533/22434 [6:43:47<11:39:15, 2.48s/it] +2025-02-05 16:51:27 - ERROR - stderr - +2025-02-05 16:51:27 - ERROR - stderr - +2025-02-05 16:51:27 - INFO - stdout - {'loss': 0.9466, 'grad_norm': 1.1111806631088257, 'learning_rate': 1.7638643620265275e-05, 'epoch': 0.74} +2025-02-05 16:51:27 - ERROR - stderr - 25%|██▍ | 5533/22434 [6:43:47<11:39:15, 2.48s/it] +2025-02-05 16:51:30 - ERROR - stderr - 25%|██▍ | 5534/22434 [6:43:49<11:39:13, 2.48s/it] +2025-02-05 16:51:30 - ERROR - stderr - +2025-02-05 16:51:30 - ERROR - stderr - +2025-02-05 16:51:30 - INFO - stdout - {'loss': 0.856, 'grad_norm': 1.0647213459014893, 'learning_rate': 1.7637711780057157e-05, 'epoch': 0.74} +2025-02-05 16:51:30 - ERROR - stderr - 25%|██▍ | 5534/22434 [6:43:49<11:39:13, 2.48s/it] +2025-02-05 16:51:32 - ERROR - stderr - 25%|██▍ | 5535/22434 [6:43:52<11:36:24, 2.47s/it] +2025-02-05 16:51:32 - ERROR - stderr - +2025-02-05 16:51:32 - ERROR - stderr - +2025-02-05 16:51:32 - INFO - stdout - {'loss': 0.9703, 'grad_norm': 1.0383224487304688, 'learning_rate': 1.7636779780648244e-05, 'epoch': 0.74} +2025-02-05 16:51:32 - ERROR - stderr - 25%|██▍ | 5535/22434 [6:43:52<11:36:24, 2.47s/it] +2025-02-05 16:51:35 - ERROR - stderr - 25%|██▍ | 5536/22434 [6:43:54<11:36:45, 2.47s/it] +2025-02-05 16:51:35 - ERROR - stderr - +2025-02-05 16:51:35 - ERROR - stderr - +2025-02-05 
16:51:35 - INFO - stdout - {'loss': 0.9503, 'grad_norm': 1.2511208057403564, 'learning_rate': 1.7635847622057967e-05, 'epoch': 0.74} +2025-02-05 16:51:35 - ERROR - stderr - 25%|██▍ | 5536/22434 [6:43:54<11:36:45, 2.47s/it] +2025-02-05 16:51:37 - ERROR - stderr - 25%|██▍ | 5537/22434 [6:43:57<11:36:42, 2.47s/it] +2025-02-05 16:51:37 - ERROR - stderr - +2025-02-05 16:51:37 - ERROR - stderr - +2025-02-05 16:51:37 - INFO - stdout - {'loss': 0.8861, 'grad_norm': 0.9401532411575317, 'learning_rate': 1.7634915304305752e-05, 'epoch': 0.74} +2025-02-05 16:51:37 - ERROR - stderr - 25%|██▍ | 5537/22434 [6:43:57<11:36:42, 2.47s/it] +2025-02-05 16:51:40 - ERROR - stderr - 25%|██▍ | 5538/22434 [6:43:59<11:42:31, 2.49s/it] +2025-02-05 16:51:40 - ERROR - stderr - +2025-02-05 16:51:40 - ERROR - stderr - +2025-02-05 16:51:40 - INFO - stdout - {'loss': 1.0631, 'grad_norm': 1.1136353015899658, 'learning_rate': 1.763398282741103e-05, 'epoch': 0.74} +2025-02-05 16:51:40 - ERROR - stderr - 25%|██▍ | 5538/22434 [6:43:59<11:42:31, 2.49s/it] +2025-02-05 16:51:42 - ERROR - stderr - 25%|██▍ | 5539/22434 [6:44:02<11:36:18, 2.47s/it] +2025-02-05 16:51:42 - ERROR - stderr - +2025-02-05 16:51:42 - ERROR - stderr - +2025-02-05 16:51:42 - INFO - stdout - {'loss': 0.9546, 'grad_norm': 1.039443850517273, 'learning_rate': 1.7633050191393243e-05, 'epoch': 0.74} +2025-02-05 16:51:42 - ERROR - stderr - 25%|██▍ | 5539/22434 [6:44:02<11:36:18, 2.47s/it] +2025-02-05 16:51:45 - ERROR - stderr - 25%|██▍ | 5540/22434 [6:44:04<11:36:21, 2.47s/it] +2025-02-05 16:51:45 - ERROR - stderr - +2025-02-05 16:51:45 - ERROR - stderr - +2025-02-05 16:51:45 - INFO - stdout - {'loss': 1.0136, 'grad_norm': 1.177010416984558, 'learning_rate': 1.763211739627183e-05, 'epoch': 0.74} +2025-02-05 16:51:45 - ERROR - stderr - 25%|██▍ | 5540/22434 [6:44:04<11:36:21, 2.47s/it] +2025-02-05 16:51:47 - ERROR - stderr - 25%|██▍ | 5541/22434 [6:44:07<11:53:28, 2.53s/it] +2025-02-05 16:51:47 - ERROR - stderr - +2025-02-05 16:51:47 - ERROR - 
stderr - +2025-02-05 16:51:47 - INFO - stdout - {'loss': 0.946, 'grad_norm': 1.1243691444396973, 'learning_rate': 1.7631184442066232e-05, 'epoch': 0.74} +2025-02-05 16:51:47 - ERROR - stderr - 25%|██▍ | 5541/22434 [6:44:07<11:53:28, 2.53s/it] +2025-02-05 16:51:50 - ERROR - stderr - 25%|██▍ | 5542/22434 [6:44:09<11:47:30, 2.51s/it] +2025-02-05 16:51:50 - ERROR - stderr - +2025-02-05 16:51:50 - ERROR - stderr - +2025-02-05 16:51:50 - INFO - stdout - {'loss': 0.9143, 'grad_norm': 1.0923787355422974, 'learning_rate': 1.76302513287959e-05, 'epoch': 0.74} +2025-02-05 16:51:50 - ERROR - stderr - 25%|██▍ | 5542/22434 [6:44:09<11:47:30, 2.51s/it] +2025-02-05 16:51:52 - ERROR - stderr - 25%|██▍ | 5543/22434 [6:44:12<11:44:32, 2.50s/it] +2025-02-05 16:51:52 - ERROR - stderr - +2025-02-05 16:51:52 - ERROR - stderr - +2025-02-05 16:51:52 - INFO - stdout - {'loss': 1.0334, 'grad_norm': 1.1249938011169434, 'learning_rate': 1.7629318056480276e-05, 'epoch': 0.74} +2025-02-05 16:51:52 - ERROR - stderr - 25%|██▍ | 5543/22434 [6:44:12<11:44:32, 2.50s/it] +2025-02-05 16:51:55 - ERROR - stderr - 25%|██▍ | 5544/22434 [6:44:14<11:40:25, 2.49s/it] +2025-02-05 16:51:55 - ERROR - stderr - +2025-02-05 16:51:55 - ERROR - stderr - +2025-02-05 16:51:55 - INFO - stdout - {'loss': 0.9399, 'grad_norm': 1.1163212060928345, 'learning_rate': 1.7628384625138818e-05, 'epoch': 0.74} +2025-02-05 16:51:55 - ERROR - stderr - 25%|██▍ | 5544/22434 [6:44:14<11:40:25, 2.49s/it] +2025-02-05 16:51:57 - ERROR - stderr - 25%|██▍ | 5545/22434 [6:44:17<11:41:17, 2.49s/it] +2025-02-05 16:51:57 - ERROR - stderr - +2025-02-05 16:51:57 - ERROR - stderr - +2025-02-05 16:51:57 - INFO - stdout - {'loss': 0.7557, 'grad_norm': 0.9701418280601501, 'learning_rate': 1.7627451034790983e-05, 'epoch': 0.74} +2025-02-05 16:51:57 - ERROR - stderr - 25%|██▍ | 5545/22434 [6:44:17<11:41:17, 2.49s/it] +2025-02-05 16:52:00 - ERROR - stderr - 25%|██▍ | 5546/22434 [6:44:19<11:38:27, 2.48s/it] +2025-02-05 16:52:00 - ERROR - stderr - 
+2025-02-05 16:52:00 - ERROR - stderr - +2025-02-05 16:52:00 - INFO - stdout - {'loss': 0.9188, 'grad_norm': 1.0682822465896606, 'learning_rate': 1.762651728545623e-05, 'epoch': 0.74} +2025-02-05 16:52:00 - ERROR - stderr - 25%|██▍ | 5546/22434 [6:44:19<11:38:27, 2.48s/it] +2025-02-05 16:52:02 - ERROR - stderr - 25%|██▍ | 5547/22434 [6:44:22<11:57:05, 2.55s/it] +2025-02-05 16:52:02 - ERROR - stderr - +2025-02-05 16:52:02 - ERROR - stderr - +2025-02-05 16:52:02 - INFO - stdout - {'loss': 0.9236, 'grad_norm': 0.987820565700531, 'learning_rate': 1.7625583377154023e-05, 'epoch': 0.74} +2025-02-05 16:52:02 - ERROR - stderr - 25%|██▍ | 5547/22434 [6:44:22<11:57:05, 2.55s/it] +2025-02-05 16:52:05 - ERROR - stderr - 25%|██▍ | 5548/22434 [6:44:25<11:54:50, 2.54s/it] +2025-02-05 16:52:05 - ERROR - stderr - +2025-02-05 16:52:05 - ERROR - stderr - +2025-02-05 16:52:05 - INFO - stdout - {'loss': 1.0382, 'grad_norm': 1.1549816131591797, 'learning_rate': 1.7624649309903824e-05, 'epoch': 0.74} +2025-02-05 16:52:05 - ERROR - stderr - 25%|██▍ | 5548/22434 [6:44:25<11:54:50, 2.54s/it] +2025-02-05 16:52:07 - ERROR - stderr - 25%|██▍ | 5549/22434 [6:44:27<11:50:58, 2.53s/it] +2025-02-05 16:52:07 - ERROR - stderr - +2025-02-05 16:52:07 - ERROR - stderr - +2025-02-05 16:52:07 - INFO - stdout - {'loss': 0.8969, 'grad_norm': 1.1395118236541748, 'learning_rate': 1.7623715083725107e-05, 'epoch': 0.74} +2025-02-05 16:52:07 - ERROR - stderr - 25%|██▍ | 5549/22434 [6:44:27<11:50:58, 2.53s/it] +2025-02-05 16:52:10 - ERROR - stderr - 25%|██▍ | 5550/22434 [6:44:29<11:44:52, 2.50s/it] +2025-02-05 16:52:10 - ERROR - stderr - +2025-02-05 16:52:10 - ERROR - stderr - +2025-02-05 16:52:10 - INFO - stdout - {'loss': 0.8673, 'grad_norm': 1.0887000560760498, 'learning_rate': 1.7622780698637348e-05, 'epoch': 0.74} +2025-02-05 16:52:10 - ERROR - stderr - 25%|██▍ | 5550/22434 [6:44:30<11:44:52, 2.50s/it] +2025-02-05 16:52:12 - ERROR - stderr - 25%|██▍ | 5551/22434 [6:44:32<11:57:25, 2.55s/it] +2025-02-05 
16:52:12 - ERROR - stderr - +2025-02-05 16:52:12 - ERROR - stderr - +2025-02-05 16:52:12 - INFO - stdout - {'loss': 0.9219, 'grad_norm': 1.0476871728897095, 'learning_rate': 1.7621846154660017e-05, 'epoch': 0.74} +2025-02-05 16:52:12 - ERROR - stderr - 25%|██▍ | 5551/22434 [6:44:32<11:57:25, 2.55s/it] +2025-02-05 16:52:15 - ERROR - stderr - 25%|██▍ | 5552/22434 [6:44:35<11:50:52, 2.53s/it] +2025-02-05 16:52:15 - ERROR - stderr - +2025-02-05 16:52:15 - ERROR - stderr - +2025-02-05 16:52:15 - INFO - stdout - {'loss': 0.9583, 'grad_norm': 0.9983686208724976, 'learning_rate': 1.7620911451812595e-05, 'epoch': 0.74} +2025-02-05 16:52:15 - ERROR - stderr - 25%|██▍ | 5552/22434 [6:44:35<11:50:52, 2.53s/it] +2025-02-05 16:52:18 - ERROR - stderr - 25%|██▍ | 5553/22434 [6:44:37<12:08:59, 2.59s/it] +2025-02-05 16:52:18 - ERROR - stderr - +2025-02-05 16:52:18 - ERROR - stderr - +2025-02-05 16:52:18 - INFO - stdout - {'loss': 0.9644, 'grad_norm': 1.0809341669082642, 'learning_rate': 1.7619976590114568e-05, 'epoch': 0.74} +2025-02-05 16:52:18 - ERROR - stderr - 25%|██▍ | 5553/22434 [6:44:37<12:08:59, 2.59s/it] +2025-02-05 16:52:20 - ERROR - stderr - 25%|██▍ | 5554/22434 [6:44:40<11:57:19, 2.55s/it] +2025-02-05 16:52:20 - ERROR - stderr - +2025-02-05 16:52:20 - ERROR - stderr - +2025-02-05 16:52:20 - INFO - stdout - {'loss': 1.0363, 'grad_norm': 1.088506817817688, 'learning_rate': 1.761904156958542e-05, 'epoch': 0.74} +2025-02-05 16:52:20 - ERROR - stderr - 25%|██▍ | 5554/22434 [6:44:40<11:57:19, 2.55s/it] +2025-02-05 16:52:23 - ERROR - stderr - 25%|██▍ | 5555/22434 [6:44:42<11:56:43, 2.55s/it] +2025-02-05 16:52:23 - ERROR - stderr - +2025-02-05 16:52:23 - ERROR - stderr - +2025-02-05 16:52:23 - INFO - stdout - {'loss': 0.8884, 'grad_norm': 0.9728800654411316, 'learning_rate': 1.7618106390244643e-05, 'epoch': 0.74} +2025-02-05 16:52:23 - ERROR - stderr - 25%|██▍ | 5555/22434 [6:44:42<11:56:43, 2.55s/it] +2025-02-05 16:52:25 - ERROR - stderr - 25%|██▍ | 5556/22434
[6:44:45<12:01:33, 2.57s/it] +2025-02-05 16:52:25 - ERROR - stderr - +2025-02-05 16:52:25 - ERROR - stderr - +2025-02-05 16:52:25 - INFO - stdout - {'loss': 0.9946, 'grad_norm': 1.0573627948760986, 'learning_rate': 1.7617171052111722e-05, 'epoch': 0.74} +2025-02-05 16:52:25 - ERROR - stderr - 25%|██▍ | 5556/22434 [6:44:45<12:01:33, 2.57s/it] +2025-02-05 16:52:28 - ERROR - stderr - 25%|██▍ | 5557/22434 [6:44:47<11:54:09, 2.54s/it] +2025-02-05 16:52:28 - ERROR - stderr - +2025-02-05 16:52:28 - ERROR - stderr - +2025-02-05 16:52:28 - INFO - stdout - {'loss': 0.9673, 'grad_norm': 1.1788753271102905, 'learning_rate': 1.7616235555206165e-05, 'epoch': 0.74} +2025-02-05 16:52:28 - ERROR - stderr - 25%|██▍ | 5557/22434 [6:44:47<11:54:09, 2.54s/it] +2025-02-05 16:52:30 - ERROR - stderr - 25%|██▍ | 5558/22434 [6:44:50<11:52:12, 2.53s/it] +2025-02-05 16:52:30 - ERROR - stderr - +2025-02-05 16:52:30 - ERROR - stderr - +2025-02-05 16:52:30 - INFO - stdout - {'loss': 0.8881, 'grad_norm': 1.0032631158828735, 'learning_rate': 1.7615299899547466e-05, 'epoch': 0.74} +2025-02-05 16:52:30 - ERROR - stderr - 25%|██▍ | 5558/22434 [6:44:50<11:52:12, 2.53s/it] +2025-02-05 16:52:33 - ERROR - stderr - 25%|██▍ | 5559/22434 [6:44:52<11:52:32, 2.53s/it] +2025-02-05 16:52:33 - ERROR - stderr - +2025-02-05 16:52:33 - ERROR - stderr - +2025-02-05 16:52:33 - INFO - stdout - {'loss': 0.8891, 'grad_norm': 1.1179721355438232, 'learning_rate': 1.7614364085155126e-05, 'epoch': 0.74} +2025-02-05 16:52:33 - ERROR - stderr - 25%|██▍ | 5559/22434 [6:44:53<11:52:32, 2.53s/it] +2025-02-05 16:52:35 - ERROR - stderr - 25%|██▍ | 5560/22434 [6:44:55<11:46:07, 2.51s/it] +2025-02-05 16:52:35 - ERROR - stderr - +2025-02-05 16:52:35 - ERROR - stderr - +2025-02-05 16:52:35 - INFO - stdout - {'loss': 1.0801, 'grad_norm': 1.1642725467681885, 'learning_rate': 1.7613428112048652e-05, 'epoch': 0.74} +2025-02-05 16:52:35 - ERROR - stderr - 25%|██▍ | 5560/22434 [6:44:55<11:46:07, 2.51s/it] +2025-02-05 16:52:38 - ERROR - 
stderr - 25%|██▍ | 5561/22434 [6:44:57<11:47:42, 2.52s/it] +2025-02-05 16:52:38 - ERROR - stderr - +2025-02-05 16:52:38 - ERROR - stderr - +2025-02-05 16:52:38 - INFO - stdout - {'loss': 0.9963, 'grad_norm': 1.1616088151931763, 'learning_rate': 1.7612491980247553e-05, 'epoch': 0.74} +2025-02-05 16:52:38 - ERROR - stderr - 25%|██▍ | 5561/22434 [6:44:58<11:47:42, 2.52s/it] +2025-02-05 16:52:40 - ERROR - stderr - 25%|██▍ | 5562/22434 [6:45:00<11:49:20, 2.52s/it] +2025-02-05 16:52:40 - ERROR - stderr - +2025-02-05 16:52:40 - ERROR - stderr - +2025-02-05 16:52:40 - INFO - stdout - {'loss': 0.9141, 'grad_norm': 1.0798288583755493, 'learning_rate': 1.7611555689771346e-05, 'epoch': 0.74} +2025-02-05 16:52:40 - ERROR - stderr - 25%|██▍ | 5562/22434 [6:45:00<11:49:20, 2.52s/it] +2025-02-05 16:52:43 - ERROR - stderr - 25%|██▍ | 5563/22434 [6:45:02<11:40:57, 2.49s/it] +2025-02-05 16:52:43 - ERROR - stderr - +2025-02-05 16:52:43 - ERROR - stderr - +2025-02-05 16:52:43 - INFO - stdout - {'loss': 0.9248, 'grad_norm': 1.0646347999572754, 'learning_rate': 1.7610619240639545e-05, 'epoch': 0.74} +2025-02-05 16:52:43 - ERROR - stderr - 25%|██▍ | 5563/22434 [6:45:02<11:40:57, 2.49s/it] +2025-02-05 16:52:46 - ERROR - stderr - 25%|██▍ | 5564/22434 [6:45:05<12:13:48, 2.61s/it] +2025-02-05 16:52:46 - ERROR - stderr - +2025-02-05 16:52:46 - ERROR - stderr - +2025-02-05 16:52:46 - INFO - stdout - {'loss': 0.7928, 'grad_norm': 1.119341254234314, 'learning_rate': 1.7609682632871664e-05, 'epoch': 0.74} +2025-02-05 16:52:46 - ERROR - stderr - 25%|██▍ | 5564/22434 [6:45:05<12:13:48, 2.61s/it] +2025-02-05 16:52:48 - ERROR - stderr - 25%|██▍ | 5565/22434 [6:45:08<12:04:04, 2.58s/it] +2025-02-05 16:52:48 - ERROR - stderr - +2025-02-05 16:52:48 - ERROR - stderr - +2025-02-05 16:52:48 - INFO - stdout - {'loss': 0.9003, 'grad_norm': 0.9966019988059998, 'learning_rate': 1.7608745866487233e-05, 'epoch': 0.74} +2025-02-05 16:52:48 - ERROR - stderr - 25%|██▍ | 5565/22434 [6:45:08<12:04:04, 2.58s/it] 
+2025-02-05 16:52:51 - ERROR - stderr - 25%|██▍ | 5566/22434 [6:45:10<11:58:51, 2.56s/it] +2025-02-05 16:52:51 - ERROR - stderr - +2025-02-05 16:52:51 - ERROR - stderr - +2025-02-05 16:52:51 - INFO - stdout - {'loss': 0.9232, 'grad_norm': 1.0849602222442627, 'learning_rate': 1.7607808941505774e-05, 'epoch': 0.74} +2025-02-05 16:52:51 - ERROR - stderr - 25%|██▍ | 5566/22434 [6:45:10<11:58:51, 2.56s/it] +2025-02-05 16:52:53 - ERROR - stderr - 25%|██▍ | 5567/22434 [6:45:13<11:54:05, 2.54s/it] +2025-02-05 16:52:53 - ERROR - stderr - +2025-02-05 16:52:53 - ERROR - stderr - +2025-02-05 16:52:53 - INFO - stdout - {'loss': 0.934, 'grad_norm': 1.1072165966033936, 'learning_rate': 1.7606871857946817e-05, 'epoch': 0.74} +2025-02-05 16:52:53 - ERROR - stderr - 25%|██▍ | 5567/22434 [6:45:13<11:54:05, 2.54s/it] +2025-02-05 16:52:56 - ERROR - stderr - 25%|██▍ | 5568/22434 [6:45:15<11:55:31, 2.55s/it] +2025-02-05 16:52:56 - ERROR - stderr - +2025-02-05 16:52:56 - ERROR - stderr - +2025-02-05 16:52:56 - INFO - stdout - {'loss': 0.9402, 'grad_norm': 1.032358169555664, 'learning_rate': 1.7605934615829897e-05, 'epoch': 0.74} +2025-02-05 16:52:56 - ERROR - stderr - 25%|██▍ | 5568/22434 [6:45:15<11:55:31, 2.55s/it] +2025-02-05 16:52:58 - ERROR - stderr - 25%|██▍ | 5569/22434 [6:45:18<12:15:21, 2.62s/it] +2025-02-05 16:52:58 - ERROR - stderr - +2025-02-05 16:52:58 - ERROR - stderr - +2025-02-05 16:52:58 - INFO - stdout - {'loss': 1.003, 'grad_norm': 0.9713364243507385, 'learning_rate': 1.760499721517455e-05, 'epoch': 0.74} +2025-02-05 16:52:58 - ERROR - stderr - 25%|██▍ | 5569/22434 [6:45:18<12:15:21, 2.62s/it] +2025-02-05 16:53:01 - ERROR - stderr - 25%|██▍ | 5570/22434 [6:45:21<12:05:47, 2.58s/it] +2025-02-05 16:53:01 - ERROR - stderr - +2025-02-05 16:53:01 - ERROR - stderr - +2025-02-05 16:53:01 - INFO - stdout - {'loss': 0.9463, 'grad_norm': 1.0515556335449219, 'learning_rate': 1.7604059656000313e-05, 'epoch': 0.74} +2025-02-05 16:53:01 - ERROR - stderr - 25%|██▍ | 5570/22434 
[6:45:21<12:05:47, 2.58s/it] +2025-02-05 16:53:03 - ERROR - stderr - 25%|██▍ | 5571/22434 [6:45:23<11:56:05, 2.55s/it] +2025-02-05 16:53:03 - ERROR - stderr - +2025-02-05 16:53:03 - ERROR - stderr - +2025-02-05 16:53:03 - INFO - stdout - {'loss': 0.9029, 'grad_norm': 1.027031421661377, 'learning_rate': 1.7603121938326726e-05, 'epoch': 0.74} +2025-02-05 16:53:03 - ERROR - stderr - 25%|██▍ | 5571/22434 [6:45:23<11:56:05, 2.55s/it] +2025-02-05 16:53:06 - ERROR - stderr - 25%|██▍ | 5572/22434 [6:45:26<11:48:51, 2.52s/it] +2025-02-05 16:53:06 - ERROR - stderr - +2025-02-05 16:53:06 - ERROR - stderr - +2025-02-05 16:53:06 - INFO - stdout - {'loss': 1.0512, 'grad_norm': 1.1110166311264038, 'learning_rate': 1.7602184062173338e-05, 'epoch': 0.75} +2025-02-05 16:53:06 - ERROR - stderr - 25%|██▍ | 5572/22434 [6:45:26<11:48:51, 2.52s/it] +2025-02-05 16:53:08 - ERROR - stderr - 25%|██▍ | 5573/22434 [6:45:28<11:48:00, 2.52s/it] +2025-02-05 16:53:08 - ERROR - stderr - +2025-02-05 16:53:08 - ERROR - stderr - +2025-02-05 16:53:08 - INFO - stdout - {'loss': 0.8847, 'grad_norm': 1.0961997509002686, 'learning_rate': 1.7601246027559697e-05, 'epoch': 0.75} +2025-02-05 16:53:08 - ERROR - stderr - 25%|██▍ | 5573/22434 [6:45:28<11:48:00, 2.52s/it] +2025-02-05 16:53:11 - ERROR - stderr - 25%|██▍ | 5574/22434 [6:45:31<11:42:06, 2.50s/it] +2025-02-05 16:53:11 - ERROR - stderr - +2025-02-05 16:53:11 - ERROR - stderr - +2025-02-05 16:53:11 - INFO - stdout - {'loss': 0.9476, 'grad_norm': 1.0574986934661865, 'learning_rate': 1.7600307834505358e-05, 'epoch': 0.75} +2025-02-05 16:53:11 - ERROR - stderr - 25%|██▍ | 5574/22434 [6:45:31<11:42:06, 2.50s/it] +2025-02-05 16:53:13 - ERROR - stderr - 25%|██▍ | 5575/22434 [6:45:33<11:42:13, 2.50s/it] +2025-02-05 16:53:13 - ERROR - stderr - +2025-02-05 16:53:13 - ERROR - stderr - +2025-02-05 16:53:13 - INFO - stdout - {'loss': 1.0106, 'grad_norm': 1.1161792278289795, 'learning_rate': 1.759936948302987e-05, 'epoch': 0.75} +2025-02-05 16:53:13 - ERROR - stderr 
- 25%|██▍ | 5575/22434 [6:45:33<11:42:13, 2.50s/it] +2025-02-05 16:53:16 - ERROR - stderr - 25%|██▍ | 5576/22434 [6:45:36<11:41:35, 2.50s/it] +2025-02-05 16:53:16 - ERROR - stderr - +2025-02-05 16:53:16 - ERROR - stderr - +2025-02-05 16:53:16 - INFO - stdout - {'loss': 0.9083, 'grad_norm': 0.9402395486831665, 'learning_rate': 1.7598430973152805e-05, 'epoch': 0.75} +2025-02-05 16:53:16 - ERROR - stderr - 25%|██▍ | 5576/22434 [6:45:36<11:41:35, 2.50s/it] +2025-02-05 16:53:18 - ERROR - stderr - 25%|██▍ | 5577/22434 [6:45:38<11:37:32, 2.48s/it] +2025-02-05 16:53:18 - ERROR - stderr - +2025-02-05 16:53:18 - ERROR - stderr - +2025-02-05 16:53:18 - INFO - stdout - {'loss': 0.8799, 'grad_norm': 1.133420467376709, 'learning_rate': 1.759749230489371e-05, 'epoch': 0.75} +2025-02-05 16:53:18 - ERROR - stderr - 25%|██▍ | 5577/22434 [6:45:38<11:37:32, 2.48s/it] +2025-02-05 16:53:21 - ERROR - stderr - 25%|██▍ | 5578/22434 [6:45:40<11:39:00, 2.49s/it] +2025-02-05 16:53:21 - ERROR - stderr - +2025-02-05 16:53:21 - ERROR - stderr - +2025-02-05 16:53:21 - INFO - stdout - {'loss': 1.0189, 'grad_norm': 0.9927236437797546, 'learning_rate': 1.759655347827216e-05, 'epoch': 0.75} +2025-02-05 16:53:21 - ERROR - stderr - 25%|██▍ | 5578/22434 [6:45:41<11:39:00, 2.49s/it] +2025-02-05 16:53:23 - ERROR - stderr - 25%|██▍ | 5579/22434 [6:45:43<11:43:54, 2.51s/it] +2025-02-05 16:53:23 - ERROR - stderr - +2025-02-05 16:53:23 - ERROR - stderr - +2025-02-05 16:53:23 - INFO - stdout - {'loss': 1.0268, 'grad_norm': 1.092087984085083, 'learning_rate': 1.7595614493307726e-05, 'epoch': 0.75} +2025-02-05 16:53:23 - ERROR - stderr - 25%|██▍ | 5579/22434 [6:45:43<11:43:54, 2.51s/it] +2025-02-05 16:53:26 - ERROR - stderr - 25%|██▍ | 5580/22434 [6:45:45<11:38:43, 2.49s/it] +2025-02-05 16:53:26 - ERROR - stderr - +2025-02-05 16:53:26 - ERROR - stderr - +2025-02-05 16:53:26 - INFO - stdout - {'loss': 0.9565, 'grad_norm': 1.0169463157653809, 'learning_rate': 1.7594675350019975e-05, 'epoch': 0.75} +2025-02-05 
16:53:26 - ERROR - stderr - 25%|██▍ | 5580/22434 [6:45:46<11:38:43, 2.49s/it] +2025-02-05 16:53:28 - ERROR - stderr - 25%|██▍ | 5581/22434 [6:45:48<11:45:10, 2.51s/it] +2025-02-05 16:53:28 - ERROR - stderr - +2025-02-05 16:53:28 - ERROR - stderr - +2025-02-05 16:53:28 - INFO - stdout - {'loss': 0.9439, 'grad_norm': 0.9976377487182617, 'learning_rate': 1.759373604842848e-05, 'epoch': 0.75} +2025-02-05 16:53:28 - ERROR - stderr - 25%|██▍ | 5581/22434 [6:45:48<11:45:10, 2.51s/it] +2025-02-05 16:53:31 - ERROR - stderr - 25%|██▍ | 5582/22434 [6:45:51<11:46:35, 2.52s/it] +2025-02-05 16:53:31 - ERROR - stderr - +2025-02-05 16:53:31 - ERROR - stderr - +2025-02-05 16:53:31 - INFO - stdout - {'loss': 0.9419, 'grad_norm': 1.0684986114501953, 'learning_rate': 1.759279658855282e-05, 'epoch': 0.75} +2025-02-05 16:53:31 - ERROR - stderr - 25%|██▍ | 5582/22434 [6:45:51<11:46:35, 2.52s/it] +2025-02-05 16:53:33 - ERROR - stderr - 25%|██▍ | 5583/22434 [6:45:53<11:43:59, 2.51s/it] +2025-02-05 16:53:33 - ERROR - stderr - +2025-02-05 16:53:33 - ERROR - stderr - +2025-02-05 16:53:33 - INFO - stdout - {'loss': 1.0049, 'grad_norm': 1.2004917860031128, 'learning_rate': 1.759185697041259e-05, 'epoch': 0.75} +2025-02-05 16:53:33 - ERROR - stderr - 25%|██▍ | 5583/22434 [6:45:53<11:43:59, 2.51s/it] +2025-02-05 16:53:36 - ERROR - stderr - 25%|██▍ | 5584/22434 [6:45:56<11:58:43, 2.56s/it] +2025-02-05 16:53:36 - ERROR - stderr - +2025-02-05 16:53:36 - ERROR - stderr - +2025-02-05 16:53:36 - INFO - stdout - {'loss': 1.0354, 'grad_norm': 1.0028046369552612, 'learning_rate': 1.759091719402736e-05, 'epoch': 0.75} +2025-02-05 16:53:36 - ERROR - stderr - 25%|██▍ | 5584/22434 [6:45:56<11:58:43, 2.56s/it] +2025-02-05 16:53:39 - ERROR - stderr - 25%|██▍ | 5585/22434 [6:45:58<11:56:19, 2.55s/it] +2025-02-05 16:53:39 - ERROR - stderr - +2025-02-05 16:53:39 - ERROR - stderr - +2025-02-05 16:53:39 - INFO - stdout - {'loss': 0.9992, 'grad_norm': 1.0717568397521973, 'learning_rate': 1.7589977259416728e-05, 
'epoch': 0.75} +2025-02-05 16:53:39 - ERROR - stderr - 25%|██▍ | 5585/22434 [6:45:58<11:56:19, 2.55s/it] +2025-02-05 16:53:41 - ERROR - stderr - 25%|██▍ | 5586/22434 [6:46:01<12:03:11, 2.58s/it] +2025-02-05 16:53:41 - ERROR - stderr - +2025-02-05 16:53:41 - ERROR - stderr - +2025-02-05 16:53:41 - INFO - stdout - {'loss': 0.9196, 'grad_norm': 1.0084487199783325, 'learning_rate': 1.7589037166600283e-05, 'epoch': 0.75} +2025-02-05 16:53:41 - ERROR - stderr - 25%|██▍ | 5586/22434 [6:46:01<12:03:11, 2.58s/it] +2025-02-05 16:53:44 - ERROR - stderr - 25%|██▍ | 5587/22434 [6:46:03<11:59:15, 2.56s/it] +2025-02-05 16:53:44 - ERROR - stderr - +2025-02-05 16:53:44 - ERROR - stderr - +2025-02-05 16:53:44 - INFO - stdout - {'loss': 0.9689, 'grad_norm': 1.0724035501480103, 'learning_rate': 1.758809691559762e-05, 'epoch': 0.75} +2025-02-05 16:53:44 - ERROR - stderr - 25%|██▍ | 5587/22434 [6:46:03<11:59:15, 2.56s/it] +2025-02-05 16:53:46 - ERROR - stderr - 25%|██▍ | 5588/22434 [6:46:06<11:54:01, 2.54s/it] +2025-02-05 16:53:46 - ERROR - stderr - +2025-02-05 16:53:46 - ERROR - stderr - +2025-02-05 16:53:46 - INFO - stdout - {'loss': 1.0199, 'grad_norm': 1.0877522230148315, 'learning_rate': 1.7587156506428337e-05, 'epoch': 0.75} +2025-02-05 16:53:46 - ERROR - stderr - 25%|██▍ | 5588/22434 [6:46:06<11:54:01, 2.54s/it] +2025-02-05 16:53:49 - ERROR - stderr - 25%|██▍ | 5589/22434 [6:46:08<11:44:55, 2.51s/it] +2025-02-05 16:53:49 - ERROR - stderr - +2025-02-05 16:53:49 - ERROR - stderr - +2025-02-05 16:53:49 - INFO - stdout - {'loss': 0.9479, 'grad_norm': 0.9636661410331726, 'learning_rate': 1.758621593911203e-05, 'epoch': 0.75} +2025-02-05 16:53:49 - ERROR - stderr - 25%|██▍ | 5589/22434 [6:46:08<11:44:55, 2.51s/it] +2025-02-05 16:53:51 - ERROR - stderr - 25%|██▍ | 5590/22434 [6:46:11<11:42:13, 2.50s/it] +2025-02-05 16:53:51 - ERROR - stderr - +2025-02-05 16:53:51 - ERROR - stderr - +2025-02-05 16:53:51 - INFO - stdout - {'loss': 0.9614, 'grad_norm': 1.1206096410751343, 'learning_rate': 
1.758527521366832e-05, 'epoch': 0.75} +2025-02-05 16:53:51 - ERROR - stderr - 25%|██▍ | 5590/22434 [6:46:11<11:42:13, 2.50s/it] +2025-02-05 16:53:54 - ERROR - stderr - 25%|██▍ | 5591/22434 [6:46:13<11:51:00, 2.53s/it] +2025-02-05 16:53:54 - ERROR - stderr - +2025-02-05 16:53:54 - ERROR - stderr - +2025-02-05 16:53:54 - INFO - stdout - {'loss': 0.9226, 'grad_norm': 0.9915058016777039, 'learning_rate': 1.7584334330116807e-05, 'epoch': 0.75} +2025-02-05 16:53:54 - ERROR - stderr - 25%|██▍ | 5591/22434 [6:46:14<11:51:00, 2.53s/it] +2025-02-05 16:53:56 - ERROR - stderr - 25%|██▍ | 5592/22434 [6:46:16<11:52:23, 2.54s/it] +2025-02-05 16:53:56 - ERROR - stderr - +2025-02-05 16:53:56 - ERROR - stderr - +2025-02-05 16:53:56 - INFO - stdout - {'loss': 0.9411, 'grad_norm': 1.0223729610443115, 'learning_rate': 1.7583393288477097e-05, 'epoch': 0.75} +2025-02-05 16:53:56 - ERROR - stderr - 25%|██▍ | 5592/22434 [6:46:16<11:52:23, 2.54s/it] +2025-02-05 16:53:59 - ERROR - stderr - 25%|██▍ | 5593/22434 [6:46:19<11:50:34, 2.53s/it] +2025-02-05 16:53:59 - ERROR - stderr - +2025-02-05 16:53:59 - ERROR - stderr - +2025-02-05 16:53:59 - INFO - stdout - {'loss': 0.9011, 'grad_norm': 0.9829967617988586, 'learning_rate': 1.7582452088768814e-05, 'epoch': 0.75} +2025-02-05 16:53:59 - ERROR - stderr - 25%|██▍ | 5593/22434 [6:46:19<11:50:34, 2.53s/it] +2025-02-05 16:54:01 - ERROR - stderr - 25%|██▍ | 5594/22434 [6:46:21<11:51:39, 2.54s/it] +2025-02-05 16:54:01 - ERROR - stderr - +2025-02-05 16:54:01 - ERROR - stderr - +2025-02-05 16:54:01 - INFO - stdout - {'loss': 0.9306, 'grad_norm': 1.0687378644943237, 'learning_rate': 1.758151073101157e-05, 'epoch': 0.75} +2025-02-05 16:54:01 - ERROR - stderr - 25%|██▍ | 5594/22434 [6:46:21<11:51:39, 2.54s/it] +2025-02-05 16:54:04 - ERROR - stderr - 25%|██▍ | 5595/22434 [6:46:24<11:44:56, 2.51s/it] +2025-02-05 16:54:04 - ERROR - stderr - +2025-02-05 16:54:04 - ERROR - stderr - +2025-02-05 16:54:04 - INFO - stdout - {'loss': 0.9327, 'grad_norm': 
1.1205363273620605, 'learning_rate': 1.758056921522499e-05, 'epoch': 0.75} +2025-02-05 16:54:04 - ERROR - stderr - 25%|██▍ | 5595/22434 [6:46:24<11:44:56, 2.51s/it] +2025-02-05 16:54:06 - ERROR - stderr - 25%|██▍ | 5596/22434 [6:46:26<11:52:14, 2.54s/it] +2025-02-05 16:54:06 - ERROR - stderr - +2025-02-05 16:54:06 - ERROR - stderr - +2025-02-05 16:54:06 - INFO - stdout - {'loss': 0.882, 'grad_norm': 1.0322699546813965, 'learning_rate': 1.7579627541428702e-05, 'epoch': 0.75} +2025-02-05 16:54:06 - ERROR - stderr - 25%|██▍ | 5596/22434 [6:46:26<11:52:14, 2.54s/it] +2025-02-05 16:54:09 - ERROR - stderr - 25%|██▍ | 5597/22434 [6:46:29<11:47:32, 2.52s/it] +2025-02-05 16:54:09 - ERROR - stderr - +2025-02-05 16:54:09 - ERROR - stderr - +2025-02-05 16:54:09 - INFO - stdout - {'loss': 0.9656, 'grad_norm': 1.1521402597427368, 'learning_rate': 1.7578685709642327e-05, 'epoch': 0.75} +2025-02-05 16:54:09 - ERROR - stderr - 25%|██▍ | 5597/22434 [6:46:29<11:47:32, 2.52s/it] +2025-02-05 16:54:11 - ERROR - stderr - 25%|██▍ | 5598/22434 [6:46:31<11:43:00, 2.51s/it] +2025-02-05 16:54:11 - ERROR - stderr - +2025-02-05 16:54:11 - ERROR - stderr - +2025-02-05 16:54:11 - INFO - stdout - {'loss': 0.9424, 'grad_norm': 1.1766597032546997, 'learning_rate': 1.75777437198855e-05, 'epoch': 0.75} +2025-02-05 16:54:11 - ERROR - stderr - 25%|██▍ | 5598/22434 [6:46:31<11:43:00, 2.51s/it] +2025-02-05 16:54:14 - ERROR - stderr - 25%|██▍ | 5599/22434 [6:46:34<11:43:31, 2.51s/it] +2025-02-05 16:54:14 - ERROR - stderr - +2025-02-05 16:54:14 - ERROR - stderr - +2025-02-05 16:54:14 - INFO - stdout - {'loss': 0.8523, 'grad_norm': 1.0219770669937134, 'learning_rate': 1.7576801572177858e-05, 'epoch': 0.75} +2025-02-05 16:54:14 - ERROR - stderr - 25%|██▍ | 5599/22434 [6:46:34<11:43:31, 2.51s/it] +2025-02-05 16:54:16 - ERROR - stderr - 25%|██▍ | 5600/22434 [6:46:36<11:40:36, 2.50s/it] +2025-02-05 16:54:16 - ERROR - stderr - +2025-02-05 16:54:16 - ERROR - stderr - +2025-02-05 16:54:16 - INFO - stdout - {'loss': 
1.0568, 'grad_norm': 1.075208067893982, 'learning_rate': 1.7575859266539036e-05, 'epoch': 0.75} +2025-02-05 16:54:16 - ERROR - stderr - 25%|██▍ | 5600/22434 [6:46:36<11:40:36, 2.50s/it] +2025-02-05 16:54:19 - ERROR - stderr - 25%|██▍ | 5601/22434 [6:46:39<11:45:06, 2.51s/it] +2025-02-05 16:54:19 - ERROR - stderr - +2025-02-05 16:54:19 - ERROR - stderr - +2025-02-05 16:54:19 - INFO - stdout - {'loss': 0.8333, 'grad_norm': 1.033706784248352, 'learning_rate': 1.757491680298868e-05, 'epoch': 0.75} +2025-02-05 16:54:19 - ERROR - stderr - 25%|██▍ | 5601/22434 [6:46:39<11:45:06, 2.51s/it] +2025-02-05 16:54:21 - ERROR - stderr - 25%|██▍ | 5602/22434 [6:46:41<11:44:32, 2.51s/it] +2025-02-05 16:54:21 - ERROR - stderr - +2025-02-05 16:54:21 - ERROR - stderr - +2025-02-05 16:54:21 - INFO - stdout - {'loss': 0.8621, 'grad_norm': 0.9717497229576111, 'learning_rate': 1.757397418154643e-05, 'epoch': 0.75} +2025-02-05 16:54:21 - ERROR - stderr - 25%|██▍ | 5602/22434 [6:46:41<11:44:32, 2.51s/it] +2025-02-05 16:54:24 - ERROR - stderr - 25%|██▍ | 5603/22434 [6:46:44<11:38:14, 2.49s/it] +2025-02-05 16:54:24 - ERROR - stderr - +2025-02-05 16:54:24 - ERROR - stderr - +2025-02-05 16:54:24 - INFO - stdout - {'loss': 0.9406, 'grad_norm': 1.0269144773483276, 'learning_rate': 1.7573031402231936e-05, 'epoch': 0.75} +2025-02-05 16:54:24 - ERROR - stderr - 25%|██▍ | 5603/22434 [6:46:44<11:38:14, 2.49s/it] +2025-02-05 16:54:26 - ERROR - stderr - 25%|██▍ | 5604/22434 [6:46:46<11:37:08, 2.49s/it] +2025-02-05 16:54:26 - ERROR - stderr - +2025-02-05 16:54:26 - ERROR - stderr - +2025-02-05 16:54:26 - INFO - stdout - {'loss': 0.9934, 'grad_norm': 1.1177387237548828, 'learning_rate': 1.7572088465064847e-05, 'epoch': 0.75} +2025-02-05 16:54:26 - ERROR - stderr - 25%|██▍ | 5604/22434 [6:46:46<11:37:08, 2.49s/it] +2025-02-05 16:54:29 - ERROR - stderr - 25%|██▍ | 5605/22434 [6:46:49<11:37:16, 2.49s/it] +2025-02-05 16:54:29 - ERROR - stderr - +2025-02-05 16:54:29 - ERROR - stderr - +2025-02-05 16:54:29 - 
INFO - stdout - {'loss': 1.0455, 'grad_norm': 1.0443004369735718, 'learning_rate': 1.757114537006482e-05, 'epoch': 0.75} +2025-02-05 16:54:29 - ERROR - stderr - 25%|██▍ | 5605/22434 [6:46:49<11:37:16, 2.49s/it] +2025-02-05 16:54:31 - ERROR - stderr - 25%|██▍ | 5606/22434 [6:46:51<11:56:09, 2.55s/it] +2025-02-05 16:54:32 - ERROR - stderr - +2025-02-05 16:54:32 - ERROR - stderr - +2025-02-05 16:54:32 - INFO - stdout - {'loss': 0.921, 'grad_norm': 1.0846948623657227, 'learning_rate': 1.7570202117251517e-05, 'epoch': 0.75} +2025-02-05 16:54:32 - ERROR - stderr - 25%|██▍ | 5606/22434 [6:46:51<11:56:09, 2.55s/it] +2025-02-05 16:54:34 - ERROR - stderr - 25%|██▍ | 5607/22434 [6:46:54<11:45:59, 2.52s/it] +2025-02-05 16:54:34 - ERROR - stderr - +2025-02-05 16:54:34 - ERROR - stderr - +2025-02-05 16:54:34 - INFO - stdout - {'loss': 0.9046, 'grad_norm': 1.1932439804077148, 'learning_rate': 1.7569258706644588e-05, 'epoch': 0.75} +2025-02-05 16:54:34 - ERROR - stderr - 25%|██▍ | 5607/22434 [6:46:54<11:45:59, 2.52s/it] +2025-02-05 16:54:36 - ERROR - stderr - 25%|██▍ | 5608/22434 [6:46:56<11:44:02, 2.51s/it] +2025-02-05 16:54:36 - ERROR - stderr - +2025-02-05 16:54:36 - ERROR - stderr - +2025-02-05 16:54:36 - INFO - stdout - {'loss': 1.0535, 'grad_norm': 1.0925523042678833, 'learning_rate': 1.756831513826371e-05, 'epoch': 0.75} +2025-02-05 16:54:36 - ERROR - stderr - 25%|██▍ | 5608/22434 [6:46:56<11:44:02, 2.51s/it] +2025-02-05 16:54:39 - ERROR - stderr - 25%|██▌ | 5609/22434 [6:46:59<12:06:13, 2.59s/it] +2025-02-05 16:54:39 - ERROR - stderr - +2025-02-05 16:54:39 - ERROR - stderr - +2025-02-05 16:54:39 - INFO - stdout - {'loss': 0.9062, 'grad_norm': 1.0358389616012573, 'learning_rate': 1.7567371412128544e-05, 'epoch': 0.75} +2025-02-05 16:54:39 - ERROR - stderr - 25%|██▌ | 5609/22434 [6:46:59<12:06:13, 2.59s/it] +2025-02-05 16:54:42 - ERROR - stderr - 25%|██▌ | 5610/22434 [6:47:01<11:53:27, 2.54s/it] +2025-02-05 16:54:42 - ERROR - stderr - +2025-02-05 16:54:42 - ERROR - stderr - 
+2025-02-05 16:54:42 - INFO - stdout - {'loss': 0.9396, 'grad_norm': 1.0828266143798828, 'learning_rate': 1.7566427528258758e-05, 'epoch': 0.75} +2025-02-05 16:54:42 - ERROR - stderr - 25%|██▌ | 5610/22434 [6:47:01<11:53:27, 2.54s/it] +2025-02-05 16:54:44 - ERROR - stderr - 25%|██▌ | 5611/22434 [6:47:04<11:50:43, 2.53s/it] +2025-02-05 16:54:44 - ERROR - stderr - +2025-02-05 16:54:44 - ERROR - stderr - +2025-02-05 16:54:44 - INFO - stdout - {'loss': 0.9433, 'grad_norm': 1.0721856355667114, 'learning_rate': 1.7565483486674035e-05, 'epoch': 0.75} +2025-02-05 16:54:44 - ERROR - stderr - 25%|██▌ | 5611/22434 [6:47:04<11:50:43, 2.53s/it] +2025-02-05 16:54:47 - ERROR - stderr - 25%|██▌ | 5612/22434 [6:47:06<11:48:08, 2.53s/it] +2025-02-05 16:54:47 - ERROR - stderr - +2025-02-05 16:54:47 - ERROR - stderr - +2025-02-05 16:54:47 - INFO - stdout - {'loss': 0.9331, 'grad_norm': 0.9857664704322815, 'learning_rate': 1.7564539287394048e-05, 'epoch': 0.75} +2025-02-05 16:54:47 - ERROR - stderr - 25%|██▌ | 5612/22434 [6:47:06<11:48:08, 2.53s/it] +2025-02-05 16:54:49 - ERROR - stderr - 25%|██▌ | 5613/22434 [6:47:09<11:48:25, 2.53s/it] +2025-02-05 16:54:49 - ERROR - stderr - +2025-02-05 16:54:49 - ERROR - stderr - +2025-02-05 16:54:49 - INFO - stdout - {'loss': 1.064, 'grad_norm': 1.0738693475723267, 'learning_rate': 1.7563594930438475e-05, 'epoch': 0.75} +2025-02-05 16:54:49 - ERROR - stderr - 25%|██▌ | 5613/22434 [6:47:09<11:48:25, 2.53s/it] +2025-02-05 16:54:52 - ERROR - stderr - 25%|██▌ | 5614/22434 [6:47:12<12:00:47, 2.57s/it] +2025-02-05 16:54:52 - ERROR - stderr - +2025-02-05 16:54:52 - ERROR - stderr - +2025-02-05 16:54:52 - INFO - stdout - {'loss': 1.0144, 'grad_norm': 0.9988113045692444, 'learning_rate': 1.7562650415827004e-05, 'epoch': 0.75} +2025-02-05 16:54:52 - ERROR - stderr - 25%|██▌ | 5614/22434 [6:47:12<12:00:47, 2.57s/it] +2025-02-05 16:54:54 - ERROR - stderr - 25%|██▌ | 5615/22434 [6:47:14<11:56:21, 2.56s/it] +2025-02-05 16:54:54 - ERROR - stderr - +2025-02-05 
16:54:54 - ERROR - stderr - +2025-02-05 16:54:54 - INFO - stdout - {'loss': 0.8437, 'grad_norm': 1.0331710577011108, 'learning_rate': 1.7561705743579323e-05, 'epoch': 0.75} +2025-02-05 16:54:54 - ERROR - stderr - 25%|██▌ | 5615/22434 [6:47:14<11:56:21, 2.56s/it] +2025-02-05 16:54:57 - ERROR - stderr - 25%|██▌ | 5616/22434 [6:47:17<11:47:55, 2.53s/it] +2025-02-05 16:54:57 - ERROR - stderr - +2025-02-05 16:54:57 - ERROR - stderr - +2025-02-05 16:54:57 - INFO - stdout - {'loss': 0.8231, 'grad_norm': 1.015241026878357, 'learning_rate': 1.756076091371512e-05, 'epoch': 0.75} +2025-02-05 16:54:57 - ERROR - stderr - 25%|██▌ | 5616/22434 [6:47:17<11:47:55, 2.53s/it] +2025-02-05 16:54:59 - ERROR - stderr - 25%|██▌ | 5617/22434 [6:47:19<11:40:09, 2.50s/it] +2025-02-05 16:54:59 - ERROR - stderr - +2025-02-05 16:54:59 - ERROR - stderr - +2025-02-05 16:54:59 - INFO - stdout - {'loss': 0.9648, 'grad_norm': 1.1775310039520264, 'learning_rate': 1.755981592625409e-05, 'epoch': 0.75} +2025-02-05 16:54:59 - ERROR - stderr - 25%|██▌ | 5617/22434 [6:47:19<11:40:09, 2.50s/it] +2025-02-05 16:55:02 - ERROR - stderr - 25%|██▌ | 5618/22434 [6:47:22<11:42:27, 2.51s/it] +2025-02-05 16:55:02 - ERROR - stderr - +2025-02-05 16:55:02 - ERROR - stderr - +2025-02-05 16:55:02 - INFO - stdout - {'loss': 0.9883, 'grad_norm': 1.0573056936264038, 'learning_rate': 1.7558870781215936e-05, 'epoch': 0.75} +2025-02-05 16:55:02 - ERROR - stderr - 25%|██▌ | 5618/22434 [6:47:22<11:42:27, 2.51s/it] +2025-02-05 16:55:04 - ERROR - stderr - 25%|██▌ | 5619/22434 [6:47:24<11:47:07, 2.52s/it] +2025-02-05 16:55:04 - ERROR - stderr - +2025-02-05 16:55:04 - ERROR - stderr - +2025-02-05 16:55:04 - INFO - stdout - {'loss': 0.8707, 'grad_norm': 1.0810927152633667, 'learning_rate': 1.755792547862035e-05, 'epoch': 0.75} +2025-02-05 16:55:04 - ERROR - stderr - 25%|██▌ | 5619/22434 [6:47:24<11:47:07, 2.52s/it] +2025-02-05 16:55:07 - ERROR - stderr - 25%|██▌ | 5620/22434 [6:47:27<11:45:37, 2.52s/it] +2025-02-05 16:55:07 - ERROR - 
stderr - +2025-02-05 16:55:07 - ERROR - stderr - +2025-02-05 16:55:07 - INFO - stdout - {'loss': 0.9802, 'grad_norm': 1.0351015329360962, 'learning_rate': 1.7556980018487036e-05, 'epoch': 0.75} +2025-02-05 16:55:07 - ERROR - stderr - 25%|██▌ | 5620/22434 [6:47:27<11:45:37, 2.52s/it] +2025-02-05 16:55:09 - ERROR - stderr - 25%|██▌ | 5621/22434 [6:47:29<11:43:34, 2.51s/it] +2025-02-05 16:55:09 - ERROR - stderr - +2025-02-05 16:55:09 - ERROR - stderr - +2025-02-05 16:55:09 - INFO - stdout - {'loss': 0.904, 'grad_norm': 1.0617460012435913, 'learning_rate': 1.7556034400835712e-05, 'epoch': 0.75} +2025-02-05 16:55:09 - ERROR - stderr - 25%|██▌ | 5621/22434 [6:47:29<11:43:34, 2.51s/it] +2025-02-05 16:55:12 - ERROR - stderr - 25%|██▌ | 5622/22434 [6:47:32<11:35:07, 2.48s/it] +2025-02-05 16:55:12 - ERROR - stderr - +2025-02-05 16:55:12 - ERROR - stderr - +2025-02-05 16:55:12 - INFO - stdout - {'loss': 1.0427, 'grad_norm': 1.2125509977340698, 'learning_rate': 1.7555088625686075e-05, 'epoch': 0.75} +2025-02-05 16:55:12 - ERROR - stderr - 25%|██▌ | 5622/22434 [6:47:32<11:35:07, 2.48s/it] +2025-02-05 16:55:14 - ERROR - stderr - 25%|██▌ | 5623/22434 [6:47:34<11:31:39, 2.47s/it] +2025-02-05 16:55:14 - ERROR - stderr - +2025-02-05 16:55:14 - ERROR - stderr - +2025-02-05 16:55:14 - INFO - stdout - {'loss': 1.0246, 'grad_norm': 1.1726773977279663, 'learning_rate': 1.7554142693057848e-05, 'epoch': 0.75} +2025-02-05 16:55:14 - ERROR - stderr - 25%|██▌ | 5623/22434 [6:47:34<11:31:39, 2.47s/it] +2025-02-05 16:55:17 - ERROR - stderr - 25%|██▌ | 5624/22434 [6:47:36<11:31:27, 2.47s/it] +2025-02-05 16:55:17 - ERROR - stderr - +2025-02-05 16:55:17 - ERROR - stderr - +2025-02-05 16:55:17 - INFO - stdout - {'loss': 1.0829, 'grad_norm': 1.0637493133544922, 'learning_rate': 1.7553196602970746e-05, 'epoch': 0.75} +2025-02-05 16:55:17 - ERROR - stderr - 25%|██▌ | 5624/22434 [6:47:36<11:31:27, 2.47s/it] +2025-02-05 16:55:19 - ERROR - stderr - 25%|██▌ | 5625/22434 [6:47:39<11:48:34, 2.53s/it] 
+2025-02-05 16:55:19 - ERROR - stderr - +2025-02-05 16:55:19 - ERROR - stderr - +2025-02-05 16:55:19 - INFO - stdout - {'loss': 0.9245, 'grad_norm': 1.1356314420700073, 'learning_rate': 1.7552250355444486e-05, 'epoch': 0.75} +2025-02-05 16:55:19 - ERROR - stderr - 25%|██▌ | 5625/22434 [6:47:39<11:48:34, 2.53s/it] +2025-02-05 16:55:22 - ERROR - stderr - 25%|██▌ | 5626/22434 [6:47:42<11:44:56, 2.52s/it] +2025-02-05 16:55:22 - ERROR - stderr - +2025-02-05 16:55:22 - ERROR - stderr - +2025-02-05 16:55:22 - INFO - stdout - {'loss': 1.0833, 'grad_norm': 1.0804098844528198, 'learning_rate': 1.75513039504988e-05, 'epoch': 0.75} +2025-02-05 16:55:22 - ERROR - stderr - 25%|██▌ | 5626/22434 [6:47:42<11:44:56, 2.52s/it] +2025-02-05 16:55:24 - ERROR - stderr - 25%|██▌ | 5627/22434 [6:47:44<11:42:37, 2.51s/it] +2025-02-05 16:55:24 - ERROR - stderr - +2025-02-05 16:55:24 - ERROR - stderr - +2025-02-05 16:55:24 - INFO - stdout - {'loss': 0.9018, 'grad_norm': 0.9765375852584839, 'learning_rate': 1.75503573881534e-05, 'epoch': 0.75} +2025-02-05 16:55:24 - ERROR - stderr - 25%|██▌ | 5627/22434 [6:47:44<11:42:37, 2.51s/it] +2025-02-05 16:55:27 - ERROR - stderr - 25%|██▌ | 5628/22434 [6:47:47<11:53:32, 2.55s/it] +2025-02-05 16:55:27 - ERROR - stderr - +2025-02-05 16:55:27 - ERROR - stderr - +2025-02-05 16:55:27 - INFO - stdout - {'loss': 0.9924, 'grad_norm': 1.0798091888427734, 'learning_rate': 1.754941066842803e-05, 'epoch': 0.75} +2025-02-05 16:55:27 - ERROR - stderr - 25%|██▌ | 5628/22434 [6:47:47<11:53:32, 2.55s/it] +2025-02-05 16:55:29 - ERROR - stderr - 25%|██▌ | 5629/22434 [6:47:49<11:43:24, 2.51s/it] +2025-02-05 16:55:29 - ERROR - stderr - +2025-02-05 16:55:29 - ERROR - stderr - +2025-02-05 16:55:29 - INFO - stdout - {'loss': 1.0673, 'grad_norm': 1.0957142114639282, 'learning_rate': 1.754846379134242e-05, 'epoch': 0.75} +2025-02-05 16:55:29 - ERROR - stderr - 25%|██▌ | 5629/22434 [6:47:49<11:43:24, 2.51s/it] +2025-02-05 16:55:32 - ERROR - stderr - 25%|██▌ | 5630/22434 
[6:47:52<11:49:01, 2.53s/it] +2025-02-05 16:55:32 - ERROR - stderr - +2025-02-05 16:55:32 - ERROR - stderr - +2025-02-05 16:55:32 - INFO - stdout - {'loss': 1.0201, 'grad_norm': 1.0026651620864868, 'learning_rate': 1.7547516756916304e-05, 'epoch': 0.75} +2025-02-05 16:55:32 - ERROR - stderr - 25%|██▌ | 5630/22434 [6:47:52<11:49:01, 2.53s/it] +2025-02-05 16:55:34 - ERROR - stderr - 25%|██▌ | 5631/22434 [6:47:54<11:39:45, 2.50s/it] +2025-02-05 16:55:34 - ERROR - stderr - +2025-02-05 16:55:34 - ERROR - stderr - +2025-02-05 16:55:34 - INFO - stdout - {'loss': 0.8949, 'grad_norm': 0.9785611629486084, 'learning_rate': 1.7546569565169423e-05, 'epoch': 0.75} +2025-02-05 16:55:34 - ERROR - stderr - 25%|██▌ | 5631/22434 [6:47:54<11:39:45, 2.50s/it] +2025-02-05 16:55:37 - ERROR - stderr - 25%|██▌ | 5632/22434 [6:47:57<11:39:56, 2.50s/it] +2025-02-05 16:55:37 - ERROR - stderr - +2025-02-05 16:55:37 - ERROR - stderr - +2025-02-05 16:55:37 - INFO - stdout - {'loss': 0.9347, 'grad_norm': 1.0145694017410278, 'learning_rate': 1.754562221612152e-05, 'epoch': 0.75} +2025-02-05 16:55:37 - ERROR - stderr - 25%|██▌ | 5632/22434 [6:47:57<11:39:56, 2.50s/it] +2025-02-05 16:55:39 - ERROR - stderr - 25%|██▌ | 5633/22434 [6:47:59<11:41:43, 2.51s/it] +2025-02-05 16:55:39 - ERROR - stderr - +2025-02-05 16:55:39 - ERROR - stderr - +2025-02-05 16:55:39 - INFO - stdout - {'loss': 0.9761, 'grad_norm': 1.1531141996383667, 'learning_rate': 1.7544674709792343e-05, 'epoch': 0.75} +2025-02-05 16:55:39 - ERROR - stderr - 25%|██▌ | 5633/22434 [6:47:59<11:41:43, 2.51s/it] +2025-02-05 16:55:42 - ERROR - stderr - 25%|██▌ | 5634/22434 [6:48:02<11:47:52, 2.53s/it] +2025-02-05 16:55:42 - ERROR - stderr - +2025-02-05 16:55:42 - ERROR - stderr - +2025-02-05 16:55:42 - INFO - stdout - {'loss': 1.1516, 'grad_norm': 1.1732995510101318, 'learning_rate': 1.7543727046201642e-05, 'epoch': 0.75} +2025-02-05 16:55:42 - ERROR - stderr - 25%|██▌ | 5634/22434 [6:48:02<11:47:52, 2.53s/it] +2025-02-05 16:55:44 - ERROR - 
stderr - 25%|██▌ | 5635/22434 [6:48:04<11:47:16, 2.53s/it] +2025-02-05 16:55:45 - ERROR - stderr - +2025-02-05 16:55:45 - ERROR - stderr - +2025-02-05 16:55:45 - INFO - stdout - {'loss': 0.8211, 'grad_norm': 1.114938497543335, 'learning_rate': 1.754277922536917e-05, 'epoch': 0.75} +2025-02-05 16:55:45 - ERROR - stderr - 25%|██▌ | 5635/22434 [6:48:04<11:47:16, 2.53s/it] +2025-02-05 16:55:47 - ERROR - stderr - 25%|██▌ | 5636/22434 [6:48:07<11:48:36, 2.53s/it] +2025-02-05 16:55:47 - ERROR - stderr - +2025-02-05 16:55:47 - ERROR - stderr - +2025-02-05 16:55:47 - INFO - stdout - {'loss': 1.0543, 'grad_norm': 1.1262239217758179, 'learning_rate': 1.7541831247314678e-05, 'epoch': 0.75} +2025-02-05 16:55:47 - ERROR - stderr - 25%|██▌ | 5636/22434 [6:48:07<11:48:36, 2.53s/it] +2025-02-05 16:55:50 - ERROR - stderr - 25%|██▌ | 5637/22434 [6:48:09<12:01:58, 2.58s/it] +2025-02-05 16:55:50 - ERROR - stderr - +2025-02-05 16:55:50 - ERROR - stderr - +2025-02-05 16:55:50 - INFO - stdout - {'loss': 1.0679, 'grad_norm': 1.1927224397659302, 'learning_rate': 1.7540883112057933e-05, 'epoch': 0.75} +2025-02-05 16:55:50 - ERROR - stderr - 25%|██▌ | 5637/22434 [6:48:10<12:01:58, 2.58s/it] +2025-02-05 16:55:53 - ERROR - stderr - 25%|██▌ | 5638/22434 [6:48:12<12:25:49, 2.66s/it] +2025-02-05 16:55:53 - ERROR - stderr - +2025-02-05 16:55:53 - ERROR - stderr - +2025-02-05 16:55:53 - INFO - stdout - {'loss': 1.0626, 'grad_norm': 1.0859661102294922, 'learning_rate': 1.7539934819618696e-05, 'epoch': 0.75} +2025-02-05 16:55:53 - ERROR - stderr - 25%|██▌ | 5638/22434 [6:48:12<12:25:49, 2.66s/it] +2025-02-05 16:55:55 - ERROR - stderr - 25%|██▌ | 5639/22434 [6:48:15<12:11:41, 2.61s/it] +2025-02-05 16:55:55 - ERROR - stderr - +2025-02-05 16:55:55 - ERROR - stderr - +2025-02-05 16:55:55 - INFO - stdout - {'loss': 0.9055, 'grad_norm': 1.0610706806182861, 'learning_rate': 1.7538986370016732e-05, 'epoch': 0.75} +2025-02-05 16:55:55 - ERROR - stderr - 25%|██▌ | 5639/22434 [6:48:15<12:11:41, 2.61s/it] 
+2025-02-05 16:55:58 - ERROR - stderr - 25%|██▌ | 5640/22434 [6:48:17<11:56:31, 2.56s/it] +2025-02-05 16:55:58 - ERROR - stderr - +2025-02-05 16:55:58 - ERROR - stderr - +2025-02-05 16:55:58 - INFO - stdout - {'loss': 0.8957, 'grad_norm': 1.0506479740142822, 'learning_rate': 1.7538037763271812e-05, 'epoch': 0.75} +2025-02-05 16:55:58 - ERROR - stderr - 25%|██▌ | 5640/22434 [6:48:17<11:56:31, 2.56s/it] +2025-02-05 16:56:00 - ERROR - stderr - 25%|██▌ | 5641/22434 [6:48:20<11:44:45, 2.52s/it] +2025-02-05 16:56:00 - ERROR - stderr - +2025-02-05 16:56:00 - ERROR - stderr - +2025-02-05 16:56:00 - INFO - stdout - {'loss': 0.9853, 'grad_norm': 0.9687379598617554, 'learning_rate': 1.7537088999403708e-05, 'epoch': 0.75} +2025-02-05 16:56:00 - ERROR - stderr - 25%|██▌ | 5641/22434 [6:48:20<11:44:45, 2.52s/it] +2025-02-05 16:56:02 - ERROR - stderr - 25%|██▌ | 5642/22434 [6:48:22<11:40:13, 2.50s/it] +2025-02-05 16:56:02 - ERROR - stderr - +2025-02-05 16:56:02 - ERROR - stderr - +2025-02-05 16:56:02 - INFO - stdout - {'loss': 0.8248, 'grad_norm': 0.9650346040725708, 'learning_rate': 1.7536140078432194e-05, 'epoch': 0.75} +2025-02-05 16:56:02 - ERROR - stderr - 25%|██▌ | 5642/22434 [6:48:22<11:40:13, 2.50s/it] +2025-02-05 16:56:05 - ERROR - stderr - 25%|██▌ | 5643/22434 [6:48:25<11:34:22, 2.48s/it] +2025-02-05 16:56:05 - ERROR - stderr - +2025-02-05 16:56:05 - ERROR - stderr - +2025-02-05 16:56:05 - INFO - stdout - {'loss': 0.9724, 'grad_norm': 1.0056564807891846, 'learning_rate': 1.7535191000377055e-05, 'epoch': 0.75} +2025-02-05 16:56:05 - ERROR - stderr - 25%|██▌ | 5643/22434 [6:48:25<11:34:22, 2.48s/it] +2025-02-05 16:56:07 - ERROR - stderr - 25%|██▌ | 5644/22434 [6:48:27<11:37:15, 2.49s/it] +2025-02-05 16:56:07 - ERROR - stderr - +2025-02-05 16:56:07 - ERROR - stderr - +2025-02-05 16:56:07 - INFO - stdout - {'loss': 1.0219, 'grad_norm': 0.97073894739151, 'learning_rate': 1.753424176525807e-05, 'epoch': 0.75} +2025-02-05 16:56:07 - ERROR - stderr - 25%|██▌ | 5644/22434 
[6:48:27<11:37:15, 2.49s/it] +2025-02-05 16:56:10 - ERROR - stderr - 25%|██▌ | 5645/22434 [6:48:30<11:34:14, 2.48s/it] +2025-02-05 16:56:10 - ERROR - stderr - +2025-02-05 16:56:10 - ERROR - stderr - +2025-02-05 16:56:10 - INFO - stdout - {'loss': 0.8723, 'grad_norm': 1.0251795053482056, 'learning_rate': 1.753329237309502e-05, 'epoch': 0.75} +2025-02-05 16:56:10 - ERROR - stderr - 25%|██▌ | 5645/22434 [6:48:30<11:34:14, 2.48s/it] +2025-02-05 16:56:12 - ERROR - stderr - 25%|██▌ | 5646/22434 [6:48:32<11:37:03, 2.49s/it] +2025-02-05 16:56:12 - ERROR - stderr - +2025-02-05 16:56:12 - ERROR - stderr - +2025-02-05 16:56:12 - INFO - stdout - {'loss': 1.0067, 'grad_norm': 1.2767223119735718, 'learning_rate': 1.75323428239077e-05, 'epoch': 0.76} +2025-02-05 16:56:12 - ERROR - stderr - 25%|██▌ | 5646/22434 [6:48:32<11:37:03, 2.49s/it] +2025-02-05 16:56:15 - ERROR - stderr - 25%|██▌ | 5647/22434 [6:48:35<11:33:02, 2.48s/it] +2025-02-05 16:56:15 - ERROR - stderr - +2025-02-05 16:56:15 - ERROR - stderr - +2025-02-05 16:56:15 - INFO - stdout - {'loss': 0.9036, 'grad_norm': 1.0767724514007568, 'learning_rate': 1.7531393117715906e-05, 'epoch': 0.76} +2025-02-05 16:56:15 - ERROR - stderr - 25%|██▌ | 5647/22434 [6:48:35<11:33:02, 2.48s/it] +2025-02-05 16:56:17 - ERROR - stderr - 25%|██▌ | 5648/22434 [6:48:37<11:32:52, 2.48s/it] +2025-02-05 16:56:17 - ERROR - stderr - +2025-02-05 16:56:17 - ERROR - stderr - +2025-02-05 16:56:17 - INFO - stdout - {'loss': 0.801, 'grad_norm': 0.9715018272399902, 'learning_rate': 1.7530443254539426e-05, 'epoch': 0.76} +2025-02-05 16:56:17 - ERROR - stderr - 25%|██▌ | 5648/22434 [6:48:37<11:32:52, 2.48s/it] +2025-02-05 16:56:20 - ERROR - stderr - 25%|██▌ | 5649/22434 [6:48:40<11:58:32, 2.57s/it] +2025-02-05 16:56:20 - ERROR - stderr - +2025-02-05 16:56:20 - ERROR - stderr - +2025-02-05 16:56:20 - INFO - stdout - {'loss': 0.9739, 'grad_norm': 1.1763389110565186, 'learning_rate': 1.7529493234398062e-05, 'epoch': 0.76} +2025-02-05 16:56:20 - ERROR - stderr - 
25%|██▌ | 5649/22434 [6:48:40<11:58:32, 2.57s/it] +2025-02-05 16:56:23 - ERROR - stderr - 25%|██▌ | 5650/22434 [6:48:43<12:30:00, 2.68s/it] +2025-02-05 16:56:23 - ERROR - stderr - +2025-02-05 16:56:23 - ERROR - stderr - +2025-02-05 16:56:23 - INFO - stdout - {'loss': 0.9803, 'grad_norm': 1.3050271272659302, 'learning_rate': 1.752854305731162e-05, 'epoch': 0.76} +2025-02-05 16:56:23 - ERROR - stderr - 25%|██▌ | 5650/22434 [6:48:43<12:30:00, 2.68s/it] +2025-02-05 16:56:25 - ERROR - stderr - 25%|██▌ | 5651/22434 [6:48:45<12:13:54, 2.62s/it] +2025-02-05 16:56:26 - ERROR - stderr - +2025-02-05 16:56:26 - ERROR - stderr - +2025-02-05 16:56:26 - INFO - stdout - {'loss': 1.0525, 'grad_norm': 1.058416724205017, 'learning_rate': 1.75275927232999e-05, 'epoch': 0.76} +2025-02-05 16:56:26 - ERROR - stderr - 25%|██▌ | 5651/22434 [6:48:45<12:13:54, 2.62s/it] +2025-02-05 16:56:28 - ERROR - stderr - 25%|██▌ | 5652/22434 [6:48:48<12:04:25, 2.59s/it] +2025-02-05 16:56:28 - ERROR - stderr - +2025-02-05 16:56:28 - ERROR - stderr - +2025-02-05 16:56:28 - INFO - stdout - {'loss': 0.9139, 'grad_norm': 1.0298298597335815, 'learning_rate': 1.752664223238271e-05, 'epoch': 0.76} +2025-02-05 16:56:28 - ERROR - stderr - 25%|██▌ | 5652/22434 [6:48:48<12:04:25, 2.59s/it] +2025-02-05 16:56:31 - ERROR - stderr - 25%|██▌ | 5653/22434 [6:48:50<12:01:48, 2.58s/it] +2025-02-05 16:56:31 - ERROR - stderr - +2025-02-05 16:56:31 - ERROR - stderr - +2025-02-05 16:56:31 - INFO - stdout - {'loss': 0.8472, 'grad_norm': 0.9952281713485718, 'learning_rate': 1.7525691584579866e-05, 'epoch': 0.76} +2025-02-05 16:56:31 - ERROR - stderr - 25%|██▌ | 5653/22434 [6:48:50<12:01:48, 2.58s/it] +2025-02-05 16:56:33 - ERROR - stderr - 25%|██▌ | 5654/22434 [6:48:53<11:57:57, 2.57s/it] +2025-02-05 16:56:33 - ERROR - stderr - +2025-02-05 16:56:33 - ERROR - stderr - +2025-02-05 16:56:33 - INFO - stdout - {'loss': 0.9506, 'grad_norm': 1.1030126810073853, 'learning_rate': 1.7524740779911185e-05, 'epoch': 0.76} +2025-02-05 
16:56:33 - ERROR - stderr - 25%|██▌ | 5654/22434 [6:48:53<11:57:57, 2.57s/it] +2025-02-05 16:56:35 - ERROR - stderr - 25%|██▌ | 5655/22434 [6:48:55<11:44:11, 2.52s/it] +2025-02-05 16:56:36 - ERROR - stderr - +2025-02-05 16:56:36 - ERROR - stderr - +2025-02-05 16:56:36 - INFO - stdout - {'loss': 0.9457, 'grad_norm': 1.1117812395095825, 'learning_rate': 1.752378981839648e-05, 'epoch': 0.76} +2025-02-05 16:56:36 - ERROR - stderr - 25%|██▌ | 5655/22434 [6:48:55<11:44:11, 2.52s/it] +2025-02-05 16:56:38 - ERROR - stderr - 25%|██▌ | 5656/22434 [6:48:58<11:37:15, 2.49s/it] +2025-02-05 16:56:38 - ERROR - stderr - +2025-02-05 16:56:38 - ERROR - stderr - +2025-02-05 16:56:38 - INFO - stdout - {'loss': 0.9337, 'grad_norm': 1.0242729187011719, 'learning_rate': 1.752283870005558e-05, 'epoch': 0.76} +2025-02-05 16:56:38 - ERROR - stderr - 25%|██▌ | 5656/22434 [6:48:58<11:37:15, 2.49s/it] +2025-02-05 16:56:40 - ERROR - stderr - 25%|██▌ | 5657/22434 [6:49:00<11:45:10, 2.52s/it] +2025-02-05 16:56:41 - ERROR - stderr - +2025-02-05 16:56:41 - ERROR - stderr - +2025-02-05 16:56:41 - INFO - stdout - {'loss': 0.9674, 'grad_norm': 1.1097509860992432, 'learning_rate': 1.7521887424908298e-05, 'epoch': 0.76} +2025-02-05 16:56:41 - ERROR - stderr - 25%|██▌ | 5657/22434 [6:49:00<11:45:10, 2.52s/it] +2025-02-05 16:56:43 - ERROR - stderr - 25%|██▌ | 5658/22434 [6:49:03<11:40:41, 2.51s/it] +2025-02-05 16:56:43 - ERROR - stderr - +2025-02-05 16:56:43 - ERROR - stderr - +2025-02-05 16:56:43 - INFO - stdout - {'loss': 0.9165, 'grad_norm': 1.0772755146026611, 'learning_rate': 1.7520935992974477e-05, 'epoch': 0.76} +2025-02-05 16:56:43 - ERROR - stderr - 25%|██▌ | 5658/22434 [6:49:03<11:40:41, 2.51s/it] +2025-02-05 16:56:45 - ERROR - stderr - 25%|██▌ | 5659/22434 [6:49:05<11:40:32, 2.51s/it] +2025-02-05 16:56:46 - ERROR - stderr - +2025-02-05 16:56:46 - ERROR - stderr - +2025-02-05 16:56:46 - INFO - stdout - {'loss': 0.964, 'grad_norm': 1.1165934801101685, 'learning_rate': 1.7519984404273936e-05, 
'epoch': 0.76} +2025-02-05 16:56:46 - ERROR - stderr - 25%|██▌ | 5659/22434 [6:49:05<11:40:32, 2.51s/it] +2025-02-05 16:56:48 - ERROR - stderr - 25%|██▌ | 5660/22434 [6:49:08<11:33:37, 2.48s/it] +2025-02-05 16:56:48 - ERROR - stderr - +2025-02-05 16:56:48 - ERROR - stderr - +2025-02-05 16:56:48 - INFO - stdout - {'loss': 0.9455, 'grad_norm': 1.1086770296096802, 'learning_rate': 1.7519032658826523e-05, 'epoch': 0.76} +2025-02-05 16:56:48 - ERROR - stderr - 25%|██▌ | 5660/22434 [6:49:08<11:33:37, 2.48s/it] +2025-02-05 16:56:50 - ERROR - stderr - 25%|██▌ | 5661/22434 [6:49:10<11:36:32, 2.49s/it] +2025-02-05 16:56:50 - ERROR - stderr - +2025-02-05 16:56:50 - ERROR - stderr - +2025-02-05 16:56:50 - INFO - stdout - {'loss': 1.0432, 'grad_norm': 1.1837263107299805, 'learning_rate': 1.7518080756652068e-05, 'epoch': 0.76} +2025-02-05 16:56:50 - ERROR - stderr - 25%|██▌ | 5661/22434 [6:49:10<11:36:32, 2.49s/it] +2025-02-05 16:56:53 - ERROR - stderr - 25%|██▌ | 5662/22434 [6:49:13<11:33:48, 2.48s/it] +2025-02-05 16:56:53 - ERROR - stderr - +2025-02-05 16:56:53 - ERROR - stderr - +2025-02-05 16:56:53 - INFO - stdout - {'loss': 1.0047, 'grad_norm': 1.078892707824707, 'learning_rate': 1.751712869777041e-05, 'epoch': 0.76} +2025-02-05 16:56:53 - ERROR - stderr - 25%|██▌ | 5662/22434 [6:49:13<11:33:48, 2.48s/it] +2025-02-05 16:56:55 - ERROR - stderr - 25%|██▌ | 5663/22434 [6:49:15<11:32:09, 2.48s/it] +2025-02-05 16:56:55 - ERROR - stderr - +2025-02-05 16:56:55 - ERROR - stderr - +2025-02-05 16:56:55 - INFO - stdout - {'loss': 0.9906, 'grad_norm': 1.0345041751861572, 'learning_rate': 1.7516176482201397e-05, 'epoch': 0.76} +2025-02-05 16:56:55 - ERROR - stderr - 25%|██▌ | 5663/22434 [6:49:15<11:32:09, 2.48s/it] +2025-02-05 16:56:58 - ERROR - stderr - 25%|██▌ | 5664/22434 [6:49:18<11:30:55, 2.47s/it] +2025-02-05 16:56:58 - ERROR - stderr - +2025-02-05 16:56:58 - ERROR - stderr - +2025-02-05 16:56:58 - INFO - stdout - {'loss': 0.8971, 'grad_norm': 1.018334150314331, 'learning_rate': 
1.751522410996488e-05, 'epoch': 0.76} +2025-02-05 16:56:58 - ERROR - stderr - 25%|██▌ | 5664/22434 [6:49:18<11:30:55, 2.47s/it] +2025-02-05 16:57:00 - ERROR - stderr - 25%|██▌ | 5665/22434 [6:49:20<11:48:50, 2.54s/it] +2025-02-05 16:57:01 - ERROR - stderr - +2025-02-05 16:57:01 - ERROR - stderr - +2025-02-05 16:57:01 - INFO - stdout - {'loss': 0.9846, 'grad_norm': 1.1631557941436768, 'learning_rate': 1.751427158108071e-05, 'epoch': 0.76} +2025-02-05 16:57:01 - ERROR - stderr - 25%|██▌ | 5665/22434 [6:49:20<11:48:50, 2.54s/it] +2025-02-05 16:57:03 - ERROR - stderr - 25%|██▌ | 5666/22434 [6:49:23<12:18:53, 2.64s/it] +2025-02-05 16:57:03 - ERROR - stderr - +2025-02-05 16:57:03 - ERROR - stderr - +2025-02-05 16:57:03 - INFO - stdout - {'loss': 0.9793, 'grad_norm': 0.9003881216049194, 'learning_rate': 1.7513318895568734e-05, 'epoch': 0.76} +2025-02-05 16:57:03 - ERROR - stderr - 25%|██▌ | 5666/22434 [6:49:23<12:18:53, 2.64s/it] +2025-02-05 16:57:06 - ERROR - stderr - 25%|██▌ | 5667/22434 [6:49:26<12:02:32, 2.59s/it] +2025-02-05 16:57:06 - ERROR - stderr - +2025-02-05 16:57:06 - ERROR - stderr - +2025-02-05 16:57:06 - INFO - stdout - {'loss': 0.7924, 'grad_norm': 0.9781140089035034, 'learning_rate': 1.7512366053448818e-05, 'epoch': 0.76} +2025-02-05 16:57:06 - ERROR - stderr - 25%|██▌ | 5667/22434 [6:49:26<12:02:32, 2.59s/it] +2025-02-05 16:57:08 - ERROR - stderr - 25%|██▌ | 5668/22434 [6:49:28<11:57:04, 2.57s/it] +2025-02-05 16:57:08 - ERROR - stderr - +2025-02-05 16:57:08 - ERROR - stderr - +2025-02-05 16:57:08 - INFO - stdout - {'loss': 0.9497, 'grad_norm': 1.0694317817687988, 'learning_rate': 1.751141305474082e-05, 'epoch': 0.76} +2025-02-05 16:57:08 - ERROR - stderr - 25%|██▌ | 5668/22434 [6:49:28<11:57:04, 2.57s/it] +2025-02-05 16:57:11 - ERROR - stderr - 25%|██▌ | 5669/22434 [6:49:31<11:47:42, 2.53s/it] +2025-02-05 16:57:11 - ERROR - stderr - +2025-02-05 16:57:11 - ERROR - stderr - +2025-02-05 16:57:11 - INFO - stdout - {'loss': 0.9409, 'grad_norm': 
1.1110020875930786, 'learning_rate': 1.7510459899464604e-05, 'epoch': 0.76} +2025-02-05 16:57:11 - ERROR - stderr - 25%|██▌ | 5669/22434 [6:49:31<11:47:42, 2.53s/it] +2025-02-05 16:57:13 - ERROR - stderr - 25%|██▌ | 5670/22434 [6:49:33<11:46:34, 2.53s/it] +2025-02-05 16:57:13 - ERROR - stderr - +2025-02-05 16:57:13 - ERROR - stderr - +2025-02-05 16:57:13 - INFO - stdout - {'loss': 0.9847, 'grad_norm': 0.9874710440635681, 'learning_rate': 1.750950658764004e-05, 'epoch': 0.76} +2025-02-05 16:57:13 - ERROR - stderr - 25%|██▌ | 5670/22434 [6:49:33<11:46:34, 2.53s/it] +2025-02-05 16:57:16 - ERROR - stderr - 25%|██▌ | 5671/22434 [6:49:36<11:45:24, 2.52s/it] +2025-02-05 16:57:16 - ERROR - stderr - +2025-02-05 16:57:16 - ERROR - stderr - +2025-02-05 16:57:16 - INFO - stdout - {'loss': 0.9138, 'grad_norm': 1.0974586009979248, 'learning_rate': 1.7508553119286995e-05, 'epoch': 0.76} +2025-02-05 16:57:16 - ERROR - stderr - 25%|██▌ | 5671/22434 [6:49:36<11:45:24, 2.52s/it] +2025-02-05 16:57:18 - ERROR - stderr - 25%|██▌ | 5672/22434 [6:49:38<11:37:43, 2.50s/it] +2025-02-05 16:57:18 - ERROR - stderr - +2025-02-05 16:57:18 - ERROR - stderr - +2025-02-05 16:57:18 - INFO - stdout - {'loss': 0.8963, 'grad_norm': 1.0416758060455322, 'learning_rate': 1.7507599494425344e-05, 'epoch': 0.76} +2025-02-05 16:57:18 - ERROR - stderr - 25%|██▌ | 5672/22434 [6:49:38<11:37:43, 2.50s/it] +2025-02-05 16:57:21 - ERROR - stderr - 25%|██▌ | 5673/22434 [6:49:40<11:35:06, 2.49s/it] +2025-02-05 16:57:21 - ERROR - stderr - +2025-02-05 16:57:21 - ERROR - stderr - +2025-02-05 16:57:21 - INFO - stdout - {'loss': 1.042, 'grad_norm': 1.052480697631836, 'learning_rate': 1.7506645713074967e-05, 'epoch': 0.76} +2025-02-05 16:57:21 - ERROR - stderr - 25%|██▌ | 5673/22434 [6:49:41<11:35:06, 2.49s/it] +2025-02-05 16:57:23 - ERROR - stderr - 25%|██▌ | 5674/22434 [6:49:43<11:30:54, 2.47s/it] +2025-02-05 16:57:23 - ERROR - stderr - +2025-02-05 16:57:23 - ERROR - stderr - +2025-02-05 16:57:23 - INFO - stdout - 
{'loss': 0.9767, 'grad_norm': 1.0267629623413086, 'learning_rate': 1.7505691775255744e-05, 'epoch': 0.76} +2025-02-05 16:57:23 - ERROR - stderr - 25%|██▌ | 5674/22434 [6:49:43<11:30:54, 2.47s/it] +2025-02-05 16:57:26 - ERROR - stderr - 25%|██▌ | 5675/22434 [6:49:45<11:31:04, 2.47s/it] +2025-02-05 16:57:26 - ERROR - stderr - +2025-02-05 16:57:26 - ERROR - stderr - +2025-02-05 16:57:26 - INFO - stdout - {'loss': 0.8877, 'grad_norm': 1.0133389234542847, 'learning_rate': 1.7504737680987557e-05, 'epoch': 0.76} +2025-02-05 16:57:26 - ERROR - stderr - 25%|██▌ | 5675/22434 [6:49:45<11:31:04, 2.47s/it] +2025-02-05 16:57:28 - ERROR - stderr - 25%|██▌ | 5676/22434 [6:49:48<11:40:07, 2.51s/it] +2025-02-05 16:57:28 - ERROR - stderr - +2025-02-05 16:57:28 - ERROR - stderr - +2025-02-05 16:57:28 - INFO - stdout - {'loss': 0.8761, 'grad_norm': 1.019167184829712, 'learning_rate': 1.7503783430290295e-05, 'epoch': 0.76} +2025-02-05 16:57:28 - ERROR - stderr - 25%|██▌ | 5676/22434 [6:49:48<11:40:07, 2.51s/it] +2025-02-05 16:57:31 - ERROR - stderr - 25%|██▌ | 5677/22434 [6:49:51<11:56:24, 2.57s/it] +2025-02-05 16:57:31 - ERROR - stderr - +2025-02-05 16:57:31 - ERROR - stderr - +2025-02-05 16:57:31 - INFO - stdout - {'loss': 1.0391, 'grad_norm': 1.1321409940719604, 'learning_rate': 1.7502829023183848e-05, 'epoch': 0.76} +2025-02-05 16:57:31 - ERROR - stderr - 25%|██▌ | 5677/22434 [6:49:51<11:56:24, 2.57s/it] +2025-02-05 16:57:33 - ERROR - stderr - 25%|██▌ | 5678/22434 [6:49:53<11:50:41, 2.54s/it] +2025-02-05 16:57:33 - ERROR - stderr - +2025-02-05 16:57:33 - ERROR - stderr - +2025-02-05 16:57:33 - INFO - stdout - {'loss': 0.9548, 'grad_norm': 1.2106661796569824, 'learning_rate': 1.750187445968811e-05, 'epoch': 0.76} +2025-02-05 16:57:33 - ERROR - stderr - 25%|██▌ | 5678/22434 [6:49:53<11:50:41, 2.54s/it] +2025-02-05 16:57:36 - ERROR - stderr - 25%|██▌ | 5679/22434 [6:49:56<11:41:52, 2.51s/it] +2025-02-05 16:57:36 - ERROR - stderr - +2025-02-05 16:57:36 - ERROR - stderr - +2025-02-05 
16:57:36 - INFO - stdout - {'loss': 0.8892, 'grad_norm': 1.2190868854522705, 'learning_rate': 1.7500919739822973e-05, 'epoch': 0.76} +2025-02-05 16:57:36 - ERROR - stderr - 25%|██▌ | 5679/22434 [6:49:56<11:41:52, 2.51s/it] +2025-02-05 16:57:38 - ERROR - stderr - 25%|██▌ | 5680/22434 [6:49:58<11:43:18, 2.52s/it] +2025-02-05 16:57:38 - ERROR - stderr - +2025-02-05 16:57:38 - ERROR - stderr - +2025-02-05 16:57:38 - INFO - stdout - {'loss': 0.8902, 'grad_norm': 1.1106572151184082, 'learning_rate': 1.749996486360835e-05, 'epoch': 0.76} +2025-02-05 16:57:38 - ERROR - stderr - 25%|██▌ | 5680/22434 [6:49:58<11:43:18, 2.52s/it] +2025-02-05 16:57:41 - ERROR - stderr - 25%|██▌ | 5681/22434 [6:50:01<11:36:52, 2.50s/it] +2025-02-05 16:57:41 - ERROR - stderr - +2025-02-05 16:57:41 - ERROR - stderr - +2025-02-05 16:57:41 - INFO - stdout - {'loss': 0.9601, 'grad_norm': 0.9934551119804382, 'learning_rate': 1.7499009831064127e-05, 'epoch': 0.76} +2025-02-05 16:57:41 - ERROR - stderr - 25%|██▌ | 5681/22434 [6:50:01<11:36:52, 2.50s/it] +2025-02-05 16:57:43 - ERROR - stderr - 25%|██▌ | 5682/22434 [6:50:03<11:32:07, 2.48s/it] +2025-02-05 16:57:43 - ERROR - stderr - +2025-02-05 16:57:43 - ERROR - stderr - +2025-02-05 16:57:43 - INFO - stdout - {'loss': 0.9447, 'grad_norm': 1.0583659410476685, 'learning_rate': 1.7498054642210225e-05, 'epoch': 0.76} +2025-02-05 16:57:43 - ERROR - stderr - 25%|██▌ | 5682/22434 [6:50:03<11:32:07, 2.48s/it] +2025-02-05 16:57:46 - ERROR - stderr - 25%|██▌ | 5683/22434 [6:50:06<11:35:41, 2.49s/it] +2025-02-05 16:57:46 - ERROR - stderr - +2025-02-05 16:57:46 - ERROR - stderr - +2025-02-05 16:57:46 - INFO - stdout - {'loss': 0.8503, 'grad_norm': 1.0365110635757446, 'learning_rate': 1.7497099297066546e-05, 'epoch': 0.76} +2025-02-05 16:57:46 - ERROR - stderr - 25%|██▌ | 5683/22434 [6:50:06<11:35:41, 2.49s/it] +2025-02-05 16:57:48 - ERROR - stderr - 25%|██▌ | 5684/22434 [6:50:08<11:33:12, 2.48s/it] +2025-02-05 16:57:48 - ERROR - stderr - +2025-02-05 16:57:48 - 
ERROR - stderr - +2025-02-05 16:57:48 - INFO - stdout - {'loss': 0.9704, 'grad_norm': 1.0689677000045776, 'learning_rate': 1.749614379565301e-05, 'epoch': 0.76} +2025-02-05 16:57:48 - ERROR - stderr - 25%|██▌ | 5684/22434 [6:50:08<11:33:12, 2.48s/it] +2025-02-05 16:57:51 - ERROR - stderr - 25%|██▌ | 5685/22434 [6:50:11<11:33:28, 2.48s/it] +2025-02-05 16:57:51 - ERROR - stderr - +2025-02-05 16:57:51 - ERROR - stderr - +2025-02-05 16:57:51 - INFO - stdout - {'loss': 0.9119, 'grad_norm': 1.05976402759552, 'learning_rate': 1.7495188137989526e-05, 'epoch': 0.76} +2025-02-05 16:57:51 - ERROR - stderr - 25%|██▌ | 5685/22434 [6:50:11<11:33:28, 2.48s/it] +2025-02-05 16:57:53 - ERROR - stderr - 25%|██▌ | 5686/22434 [6:50:13<11:39:43, 2.51s/it] +2025-02-05 16:57:53 - ERROR - stderr - +2025-02-05 16:57:53 - ERROR - stderr - +2025-02-05 16:57:53 - INFO - stdout - {'loss': 0.9304, 'grad_norm': 1.1034635305404663, 'learning_rate': 1.749423232409602e-05, 'epoch': 0.76} +2025-02-05 16:57:53 - ERROR - stderr - 25%|██▌ | 5686/22434 [6:50:13<11:39:43, 2.51s/it] +2025-02-05 16:57:56 - ERROR - stderr - 25%|██▌ | 5687/22434 [6:50:16<11:41:45, 2.51s/it] +2025-02-05 16:57:56 - ERROR - stderr - +2025-02-05 16:57:56 - ERROR - stderr - +2025-02-05 16:57:56 - INFO - stdout - {'loss': 0.9736, 'grad_norm': 1.0311964750289917, 'learning_rate': 1.749327635399241e-05, 'epoch': 0.76} +2025-02-05 16:57:56 - ERROR - stderr - 25%|██▌ | 5687/22434 [6:50:16<11:41:45, 2.51s/it] +2025-02-05 16:57:58 - ERROR - stderr - 25%|██▌ | 5688/22434 [6:50:18<11:43:49, 2.52s/it] +2025-02-05 16:57:58 - ERROR - stderr - +2025-02-05 16:57:58 - ERROR - stderr - +2025-02-05 16:57:58 - INFO - stdout - {'loss': 1.043, 'grad_norm': 1.1243400573730469, 'learning_rate': 1.7492320227698624e-05, 'epoch': 0.76} +2025-02-05 16:57:58 - ERROR - stderr - 25%|██▌ | 5688/22434 [6:50:18<11:43:49, 2.52s/it] +2025-02-05 16:58:01 - ERROR - stderr - 25%|██▌ | 5689/22434 [6:50:21<11:37:47, 2.50s/it] +2025-02-05 16:58:01 - ERROR - stderr - 
+2025-02-05 16:58:01 - ERROR - stderr - +2025-02-05 16:58:01 - INFO - stdout - {'loss': 0.9555, 'grad_norm': 1.0421708822250366, 'learning_rate': 1.7491363945234595e-05, 'epoch': 0.76} +2025-02-05 16:58:01 - ERROR - stderr - 25%|██▌ | 5689/22434 [6:50:21<11:37:47, 2.50s/it] +2025-02-05 16:58:03 - ERROR - stderr - 25%|██▌ | 5690/22434 [6:50:23<11:33:04, 2.48s/it] +2025-02-05 16:58:03 - ERROR - stderr - +2025-02-05 16:58:03 - ERROR - stderr - +2025-02-05 16:58:03 - INFO - stdout - {'loss': 0.9402, 'grad_norm': 1.1084234714508057, 'learning_rate': 1.7490407506620252e-05, 'epoch': 0.76} +2025-02-05 16:58:03 - ERROR - stderr - 25%|██▌ | 5690/22434 [6:50:23<11:33:04, 2.48s/it] +2025-02-05 16:58:06 - ERROR - stderr - 25%|██▌ | 5691/22434 [6:50:26<11:39:29, 2.51s/it] +2025-02-05 16:58:06 - ERROR - stderr - +2025-02-05 16:58:06 - ERROR - stderr - +2025-02-05 16:58:06 - INFO - stdout - {'loss': 0.8836, 'grad_norm': 0.9782710671424866, 'learning_rate': 1.748945091187553e-05, 'epoch': 0.76} +2025-02-05 16:58:06 - ERROR - stderr - 25%|██▌ | 5691/22434 [6:50:26<11:39:29, 2.51s/it] +2025-02-05 16:58:08 - ERROR - stderr - 25%|██▌ | 5692/22434 [6:50:28<11:37:29, 2.50s/it] +2025-02-05 16:58:08 - ERROR - stderr - +2025-02-05 16:58:08 - ERROR - stderr - +2025-02-05 16:58:08 - INFO - stdout - {'loss': 0.9128, 'grad_norm': 1.0322253704071045, 'learning_rate': 1.7488494161020374e-05, 'epoch': 0.76} +2025-02-05 16:58:08 - ERROR - stderr - 25%|██▌ | 5692/22434 [6:50:28<11:37:29, 2.50s/it] +2025-02-05 16:58:11 - ERROR - stderr - 25%|██▌ | 5693/22434 [6:50:31<11:38:19, 2.50s/it] +2025-02-05 16:58:11 - ERROR - stderr - +2025-02-05 16:58:11 - ERROR - stderr - +2025-02-05 16:58:11 - INFO - stdout - {'loss': 0.9546, 'grad_norm': 1.0175551176071167, 'learning_rate': 1.748753725407472e-05, 'epoch': 0.76} +2025-02-05 16:58:11 - ERROR - stderr - 25%|██▌ | 5693/22434 [6:50:31<11:38:19, 2.50s/it] +2025-02-05 16:58:13 - ERROR - stderr - 25%|██▌ | 5694/22434 [6:50:33<11:37:06, 2.50s/it] +2025-02-05 
16:58:13 - ERROR - stderr - +2025-02-05 16:58:13 - ERROR - stderr - +2025-02-05 16:58:13 - INFO - stdout - {'loss': 0.9517, 'grad_norm': 1.0329780578613281, 'learning_rate': 1.748658019105852e-05, 'epoch': 0.76} +2025-02-05 16:58:13 - ERROR - stderr - 25%|██▌ | 5694/22434 [6:50:33<11:37:06, 2.50s/it] +2025-02-05 16:58:16 - ERROR - stderr - 25%|██▌ | 5695/22434 [6:50:36<11:31:02, 2.48s/it] +2025-02-05 16:58:16 - ERROR - stderr - +2025-02-05 16:58:16 - ERROR - stderr - +2025-02-05 16:58:16 - INFO - stdout - {'loss': 0.9757, 'grad_norm': 1.0101404190063477, 'learning_rate': 1.7485622971991718e-05, 'epoch': 0.76} +2025-02-05 16:58:16 - ERROR - stderr - 25%|██▌ | 5695/22434 [6:50:36<11:31:02, 2.48s/it] +2025-02-05 16:58:18 - ERROR - stderr - 25%|██▌ | 5696/22434 [6:50:38<11:30:55, 2.48s/it] +2025-02-05 16:58:18 - ERROR - stderr - +2025-02-05 16:58:18 - ERROR - stderr - +2025-02-05 16:58:18 - INFO - stdout - {'loss': 0.971, 'grad_norm': 1.1176928281784058, 'learning_rate': 1.748466559689427e-05, 'epoch': 0.76} +2025-02-05 16:58:18 - ERROR - stderr - 25%|██▌ | 5696/22434 [6:50:38<11:30:55, 2.48s/it] +2025-02-05 16:58:21 - ERROR - stderr - 25%|██▌ | 5697/22434 [6:50:40<11:32:09, 2.48s/it] +2025-02-05 16:58:21 - ERROR - stderr - +2025-02-05 16:58:21 - ERROR - stderr - +2025-02-05 16:58:21 - INFO - stdout - {'loss': 0.9593, 'grad_norm': 1.085686445236206, 'learning_rate': 1.7483708065786124e-05, 'epoch': 0.76} +2025-02-05 16:58:21 - ERROR - stderr - 25%|██▌ | 5697/22434 [6:50:41<11:32:09, 2.48s/it] +2025-02-05 16:58:23 - ERROR - stderr - 25%|██▌ | 5698/22434 [6:50:43<11:34:44, 2.49s/it] +2025-02-05 16:58:23 - ERROR - stderr - +2025-02-05 16:58:23 - ERROR - stderr - +2025-02-05 16:58:23 - INFO - stdout - {'loss': 0.8614, 'grad_norm': 1.1791216135025024, 'learning_rate': 1.748275037868725e-05, 'epoch': 0.76} +2025-02-05 16:58:23 - ERROR - stderr - 25%|██▌ | 5698/22434 [6:50:43<11:34:44, 2.49s/it] +2025-02-05 16:58:26 - ERROR - stderr - 25%|██▌ | 5699/22434 [6:50:45<11:32:07, 
2.48s/it] +2025-02-05 16:58:26 - ERROR - stderr - +2025-02-05 16:58:26 - ERROR - stderr - +2025-02-05 16:58:26 - INFO - stdout - {'loss': 0.972, 'grad_norm': 1.1431652307510376, 'learning_rate': 1.7481792535617602e-05, 'epoch': 0.76} +2025-02-05 16:58:26 - ERROR - stderr - 25%|██▌ | 5699/22434 [6:50:46<11:32:07, 2.48s/it] +2025-02-05 16:58:28 - ERROR - stderr - 25%|██▌ | 5700/22434 [6:50:48<11:35:08, 2.49s/it] +2025-02-05 16:58:28 - ERROR - stderr - +2025-02-05 16:58:28 - ERROR - stderr - +2025-02-05 16:58:28 - INFO - stdout - {'loss': 0.9129, 'grad_norm': 1.0990146398544312, 'learning_rate': 1.748083453659715e-05, 'epoch': 0.76} +2025-02-05 16:58:28 - ERROR - stderr - 25%|██▌ | 5700/22434 [6:50:48<11:35:08, 2.49s/it] +2025-02-05 16:58:31 - ERROR - stderr - 25%|██▌ | 5701/22434 [6:50:51<11:38:54, 2.51s/it] +2025-02-05 16:58:31 - ERROR - stderr - +2025-02-05 16:58:31 - ERROR - stderr - +2025-02-05 16:58:31 - INFO - stdout - {'loss': 0.8663, 'grad_norm': 1.1180288791656494, 'learning_rate': 1.747987638164586e-05, 'epoch': 0.76} +2025-02-05 16:58:31 - ERROR - stderr - 25%|██▌ | 5701/22434 [6:50:51<11:38:54, 2.51s/it] +2025-02-05 16:58:33 - ERROR - stderr - 25%|██▌ | 5702/22434 [6:50:53<11:37:16, 2.50s/it] +2025-02-05 16:58:33 - ERROR - stderr - +2025-02-05 16:58:33 - ERROR - stderr - +2025-02-05 16:58:33 - INFO - stdout - {'loss': 0.9314, 'grad_norm': 0.9956672191619873, 'learning_rate': 1.7478918070783703e-05, 'epoch': 0.76} +2025-02-05 16:58:33 - ERROR - stderr - 25%|██▌ | 5702/22434 [6:50:53<11:37:16, 2.50s/it] +2025-02-05 16:58:36 - ERROR - stderr - 25%|██▌ | 5703/22434 [6:50:55<11:35:23, 2.49s/it] +2025-02-05 16:58:36 - ERROR - stderr - +2025-02-05 16:58:36 - ERROR - stderr - +2025-02-05 16:58:36 - INFO - stdout - {'loss': 0.9435, 'grad_norm': 0.9825080633163452, 'learning_rate': 1.7477959604030656e-05, 'epoch': 0.76} +2025-02-05 16:58:36 - ERROR - stderr - 25%|██▌ | 5703/22434 [6:50:56<11:35:23, 2.49s/it] +2025-02-05 16:58:38 - ERROR - stderr - 25%|██▌ | 
5704/22434 [6:50:58<11:34:02, 2.49s/it] +2025-02-05 16:58:38 - ERROR - stderr - +2025-02-05 16:58:38 - ERROR - stderr - +2025-02-05 16:58:38 - INFO - stdout - {'loss': 0.9414, 'grad_norm': 1.0081071853637695, 'learning_rate': 1.7477000981406697e-05, 'epoch': 0.76} +2025-02-05 16:58:38 - ERROR - stderr - 25%|██▌ | 5704/22434 [6:50:58<11:34:02, 2.49s/it] +2025-02-05 16:58:41 - ERROR - stderr - 25%|██▌ | 5705/22434 [6:51:01<11:48:31, 2.54s/it] +2025-02-05 16:58:41 - ERROR - stderr - +2025-02-05 16:58:41 - ERROR - stderr - +2025-02-05 16:58:41 - INFO - stdout - {'loss': 1.0138, 'grad_norm': 1.0427356958389282, 'learning_rate': 1.7476042202931806e-05, 'epoch': 0.76} +2025-02-05 16:58:41 - ERROR - stderr - 25%|██▌ | 5705/22434 [6:51:01<11:48:31, 2.54s/it] +2025-02-05 16:58:43 - ERROR - stderr - 25%|██▌ | 5706/22434 [6:51:03<11:44:11, 2.53s/it] +2025-02-05 16:58:43 - ERROR - stderr - +2025-02-05 16:58:43 - ERROR - stderr - +2025-02-05 16:58:43 - INFO - stdout - {'loss': 0.9377, 'grad_norm': 1.0891045331954956, 'learning_rate': 1.747508326862597e-05, 'epoch': 0.76} +2025-02-05 16:58:43 - ERROR - stderr - 25%|██▌ | 5706/22434 [6:51:03<11:44:11, 2.53s/it] +2025-02-05 16:58:46 - ERROR - stderr - 25%|██▌ | 5707/22434 [6:51:06<11:38:17, 2.50s/it] +2025-02-05 16:58:46 - ERROR - stderr - +2025-02-05 16:58:46 - ERROR - stderr - +2025-02-05 16:58:46 - INFO - stdout - {'loss': 1.0658, 'grad_norm': 1.2020474672317505, 'learning_rate': 1.7474124178509176e-05, 'epoch': 0.76} +2025-02-05 16:58:46 - ERROR - stderr - 25%|██▌ | 5707/22434 [6:51:06<11:38:17, 2.50s/it] +2025-02-05 16:58:48 - ERROR - stderr - 25%|██▌ | 5708/22434 [6:51:08<11:37:49, 2.50s/it] +2025-02-05 16:58:48 - ERROR - stderr - +2025-02-05 16:58:48 - ERROR - stderr - +2025-02-05 16:58:48 - INFO - stdout - {'loss': 0.914, 'grad_norm': 1.0939958095550537, 'learning_rate': 1.7473164932601414e-05, 'epoch': 0.76} +2025-02-05 16:58:48 - ERROR - stderr - 25%|██▌ | 5708/22434 [6:51:08<11:37:49, 2.50s/it] +2025-02-05 16:58:51 - 
ERROR - stderr - 25%|██▌ | 5709/22434 [6:51:11<11:36:15, 2.50s/it] +2025-02-05 16:58:51 - ERROR - stderr - +2025-02-05 16:58:51 - ERROR - stderr - +2025-02-05 16:58:51 - INFO - stdout - {'loss': 1.1071, 'grad_norm': 1.1803240776062012, 'learning_rate': 1.7472205530922683e-05, 'epoch': 0.76} +2025-02-05 16:58:51 - ERROR - stderr - 25%|██▌ | 5709/22434 [6:51:11<11:36:15, 2.50s/it] +2025-02-05 16:58:53 - ERROR - stderr - 25%|██▌ | 5710/22434 [6:51:13<11:34:43, 2.49s/it] +2025-02-05 16:58:53 - ERROR - stderr - +2025-02-05 16:58:53 - ERROR - stderr - +2025-02-05 16:58:53 - INFO - stdout - {'loss': 1.0456, 'grad_norm': 1.0756531953811646, 'learning_rate': 1.7471245973492977e-05, 'epoch': 0.76} +2025-02-05 16:58:53 - ERROR - stderr - 25%|██▌ | 5710/22434 [6:51:13<11:34:43, 2.49s/it] +2025-02-05 16:58:56 - ERROR - stderr - 25%|██▌ | 5711/22434 [6:51:16<11:46:34, 2.54s/it] +2025-02-05 16:58:56 - ERROR - stderr - +2025-02-05 16:58:56 - ERROR - stderr - +2025-02-05 16:58:56 - INFO - stdout - {'loss': 0.9322, 'grad_norm': 1.1000767946243286, 'learning_rate': 1.7470286260332296e-05, 'epoch': 0.76} +2025-02-05 16:58:56 - ERROR - stderr - 25%|██▌ | 5711/22434 [6:51:16<11:46:34, 2.54s/it] +2025-02-05 16:58:58 - ERROR - stderr - 25%|██▌ | 5712/22434 [6:51:18<11:41:08, 2.52s/it] +2025-02-05 16:58:58 - ERROR - stderr - +2025-02-05 16:58:58 - ERROR - stderr - +2025-02-05 16:58:58 - INFO - stdout - {'loss': 0.8867, 'grad_norm': 1.0814807415008545, 'learning_rate': 1.7469326391460647e-05, 'epoch': 0.76} +2025-02-05 16:58:58 - ERROR - stderr - 25%|██▌ | 5712/22434 [6:51:18<11:41:08, 2.52s/it] +2025-02-05 16:59:01 - ERROR - stderr - 25%|██▌ | 5713/22434 [6:51:21<11:48:09, 2.54s/it] +2025-02-05 16:59:01 - ERROR - stderr - +2025-02-05 16:59:01 - ERROR - stderr - +2025-02-05 16:59:01 - INFO - stdout - {'loss': 0.9926, 'grad_norm': 1.0714526176452637, 'learning_rate': 1.7468366366898038e-05, 'epoch': 0.76} +2025-02-05 16:59:01 - ERROR - stderr - 25%|██▌ | 5713/22434 [6:51:21<11:48:09, 
2.54s/it] +2025-02-05 16:59:03 - ERROR - stderr - 25%|██▌ | 5714/22434 [6:51:23<11:45:16, 2.53s/it] +2025-02-05 16:59:04 - ERROR - stderr - +2025-02-05 16:59:04 - ERROR - stderr - +2025-02-05 16:59:04 - INFO - stdout - {'loss': 0.7909, 'grad_norm': 1.1460797786712646, 'learning_rate': 1.7467406186664474e-05, 'epoch': 0.76} +2025-02-05 16:59:04 - ERROR - stderr - 25%|██▌ | 5714/22434 [6:51:23<11:45:16, 2.53s/it] +2025-02-05 16:59:06 - ERROR - stderr - 25%|██▌ | 5715/22434 [6:51:26<11:43:32, 2.52s/it] +2025-02-05 16:59:06 - ERROR - stderr - +2025-02-05 16:59:06 - ERROR - stderr - +2025-02-05 16:59:06 - INFO - stdout - {'loss': 0.9048, 'grad_norm': 0.9759002923965454, 'learning_rate': 1.746644585077998e-05, 'epoch': 0.76} +2025-02-05 16:59:06 - ERROR - stderr - 25%|██▌ | 5715/22434 [6:51:26<11:43:32, 2.52s/it] +2025-02-05 16:59:09 - ERROR - stderr - 25%|██▌ | 5716/22434 [6:51:28<11:44:57, 2.53s/it] +2025-02-05 16:59:09 - ERROR - stderr - +2025-02-05 16:59:09 - ERROR - stderr - +2025-02-05 16:59:09 - INFO - stdout - {'loss': 0.9642, 'grad_norm': 0.9731238484382629, 'learning_rate': 1.7465485359264565e-05, 'epoch': 0.76} +2025-02-05 16:59:09 - ERROR - stderr - 25%|██▌ | 5716/22434 [6:51:28<11:44:57, 2.53s/it] +2025-02-05 16:59:11 - ERROR - stderr - 25%|██▌ | 5717/22434 [6:51:31<11:41:46, 2.52s/it] +2025-02-05 16:59:11 - ERROR - stderr - +2025-02-05 16:59:11 - ERROR - stderr - +2025-02-05 16:59:11 - INFO - stdout - {'loss': 0.8957, 'grad_norm': 0.9622951149940491, 'learning_rate': 1.7464524712138252e-05, 'epoch': 0.76} +2025-02-05 16:59:11 - ERROR - stderr - 25%|██▌ | 5717/22434 [6:51:31<11:41:46, 2.52s/it] +2025-02-05 16:59:14 - ERROR - stderr - 25%|██▌ | 5718/22434 [6:51:33<11:41:34, 2.52s/it] +2025-02-05 16:59:14 - ERROR - stderr - +2025-02-05 16:59:14 - ERROR - stderr - +2025-02-05 16:59:14 - INFO - stdout - {'loss': 0.8636, 'grad_norm': 1.0308570861816406, 'learning_rate': 1.746356390942106e-05, 'epoch': 0.76} +2025-02-05 16:59:14 - ERROR - stderr - 25%|██▌ | 
5718/22434 [6:51:33<11:41:34, 2.52s/it] +2025-02-05 16:59:16 - ERROR - stderr - 25%|██▌ | 5719/22434 [6:51:36<11:38:36, 2.51s/it] +2025-02-05 16:59:16 - ERROR - stderr - +2025-02-05 16:59:16 - ERROR - stderr - +2025-02-05 16:59:16 - INFO - stdout - {'loss': 0.7879, 'grad_norm': 1.0122634172439575, 'learning_rate': 1.7462602951133022e-05, 'epoch': 0.76} +2025-02-05 16:59:16 - ERROR - stderr - 25%|██▌ | 5719/22434 [6:51:36<11:38:36, 2.51s/it] +2025-02-05 16:59:19 - ERROR - stderr - 25%|██▌ | 5720/22434 [6:51:38<11:42:08, 2.52s/it] +2025-02-05 16:59:19 - ERROR - stderr - +2025-02-05 16:59:19 - ERROR - stderr - +2025-02-05 16:59:19 - INFO - stdout - {'loss': 0.9342, 'grad_norm': 1.12986421585083, 'learning_rate': 1.7461641837294167e-05, 'epoch': 0.76} +2025-02-05 16:59:19 - ERROR - stderr - 25%|██▌ | 5720/22434 [6:51:38<11:42:08, 2.52s/it] +2025-02-05 16:59:21 - ERROR - stderr - 26%|██▌ | 5721/22434 [6:51:41<11:50:33, 2.55s/it] +2025-02-05 16:59:21 - ERROR - stderr - +2025-02-05 16:59:21 - ERROR - stderr - +2025-02-05 16:59:21 - INFO - stdout - {'loss': 1.0302, 'grad_norm': 1.1417661905288696, 'learning_rate': 1.7460680567924528e-05, 'epoch': 0.77} +2025-02-05 16:59:21 - ERROR - stderr - 26%|██▌ | 5721/22434 [6:51:41<11:50:33, 2.55s/it] +2025-02-05 16:59:24 - ERROR - stderr - 26%|██▌ | 5722/22434 [6:51:43<11:43:10, 2.52s/it] +2025-02-05 16:59:24 - ERROR - stderr - +2025-02-05 16:59:24 - ERROR - stderr - +2025-02-05 16:59:24 - INFO - stdout - {'loss': 1.0263, 'grad_norm': 1.1987031698226929, 'learning_rate': 1.7459719143044146e-05, 'epoch': 0.77} +2025-02-05 16:59:24 - ERROR - stderr - 26%|██▌ | 5722/22434 [6:51:43<11:43:10, 2.52s/it] +2025-02-05 16:59:26 - ERROR - stderr - 26%|██▌ | 5723/22434 [6:51:46<11:37:32, 2.50s/it] +2025-02-05 16:59:26 - ERROR - stderr - +2025-02-05 16:59:26 - ERROR - stderr - +2025-02-05 16:59:26 - INFO - stdout - {'loss': 0.8962, 'grad_norm': 1.044432282447815, 'learning_rate': 1.745875756267305e-05, 'epoch': 0.77} +2025-02-05 16:59:26 - ERROR 
- stderr - 26%|██▌ | 5723/22434 [6:51:46<11:37:32, 2.50s/it] +2025-02-05 16:59:29 - ERROR - stderr - 26%|██▌ | 5724/22434 [6:51:48<11:37:41, 2.51s/it] +2025-02-05 16:59:29 - ERROR - stderr - +2025-02-05 16:59:29 - ERROR - stderr - +2025-02-05 16:59:29 - INFO - stdout - {'loss': 0.9589, 'grad_norm': 1.0600156784057617, 'learning_rate': 1.7457795826831293e-05, 'epoch': 0.77} +2025-02-05 16:59:29 - ERROR - stderr - 26%|██▌ | 5724/22434 [6:51:48<11:37:41, 2.51s/it] +2025-02-05 16:59:31 - ERROR - stderr - 26%|██▌ | 5725/22434 [6:51:51<11:35:37, 2.50s/it] +2025-02-05 16:59:31 - ERROR - stderr - +2025-02-05 16:59:31 - ERROR - stderr - +2025-02-05 16:59:31 - INFO - stdout - {'loss': 1.0862, 'grad_norm': 1.1277058124542236, 'learning_rate': 1.7456833935538917e-05, 'epoch': 0.77} +2025-02-05 16:59:31 - ERROR - stderr - 26%|██▌ | 5725/22434 [6:51:51<11:35:37, 2.50s/it] +2025-02-05 16:59:34 - ERROR - stderr - 26%|██▌ | 5726/22434 [6:51:53<11:41:55, 2.52s/it] +2025-02-05 16:59:34 - ERROR - stderr - +2025-02-05 16:59:34 - ERROR - stderr - +2025-02-05 16:59:34 - INFO - stdout - {'loss': 0.9279, 'grad_norm': 1.094230055809021, 'learning_rate': 1.7455871888815972e-05, 'epoch': 0.77} +2025-02-05 16:59:34 - ERROR - stderr - 26%|██▌ | 5726/22434 [6:51:53<11:41:55, 2.52s/it] +2025-02-05 16:59:37 - ERROR - stderr - 26%|██▌ | 5727/22434 [6:51:56<12:09:21, 2.62s/it] +2025-02-05 16:59:37 - ERROR - stderr - +2025-02-05 16:59:37 - ERROR - stderr - +2025-02-05 16:59:37 - INFO - stdout - {'loss': 0.9446, 'grad_norm': 1.0901530981063843, 'learning_rate': 1.7454909686682515e-05, 'epoch': 0.77} +2025-02-05 16:59:37 - ERROR - stderr - 26%|██▌ | 5727/22434 [6:51:56<12:09:21, 2.62s/it] +2025-02-05 16:59:39 - ERROR - stderr - 26%|██▌ | 5728/22434 [6:51:59<12:00:24, 2.59s/it] +2025-02-05 16:59:39 - ERROR - stderr - +2025-02-05 16:59:39 - ERROR - stderr - +2025-02-05 16:59:39 - INFO - stdout - {'loss': 1.1089, 'grad_norm': 1.1245795488357544, 'learning_rate': 1.7453947329158597e-05, 'epoch': 0.77} 
+2025-02-05 16:59:39 - ERROR - stderr - 26%|██▌ | 5728/22434 [6:51:59<12:00:24, 2.59s/it] +2025-02-05 16:59:41 - ERROR - stderr - 26%|██▌ | 5729/22434 [6:52:01<11:47:36, 2.54s/it] +2025-02-05 16:59:42 - ERROR - stderr - +2025-02-05 16:59:42 - ERROR - stderr - +2025-02-05 16:59:42 - INFO - stdout - {'loss': 0.9002, 'grad_norm': 1.0885945558547974, 'learning_rate': 1.7452984816264282e-05, 'epoch': 0.77} +2025-02-05 16:59:42 - ERROR - stderr - 26%|██▌ | 5729/22434 [6:52:01<11:47:36, 2.54s/it] +2025-02-05 16:59:44 - ERROR - stderr - 26%|██▌ | 5730/22434 [6:52:04<11:44:32, 2.53s/it] +2025-02-05 16:59:44 - ERROR - stderr - +2025-02-05 16:59:44 - ERROR - stderr - +2025-02-05 16:59:44 - INFO - stdout - {'loss': 0.8455, 'grad_norm': 1.0388959646224976, 'learning_rate': 1.7452022148019626e-05, 'epoch': 0.77} +2025-02-05 16:59:44 - ERROR - stderr - 26%|██▌ | 5730/22434 [6:52:04<11:44:32, 2.53s/it] +2025-02-05 16:59:46 - ERROR - stderr - 26%|██▌ | 5731/22434 [6:52:06<11:37:21, 2.51s/it] +2025-02-05 16:59:46 - ERROR - stderr - +2025-02-05 16:59:46 - ERROR - stderr - +2025-02-05 16:59:46 - INFO - stdout - {'loss': 0.9229, 'grad_norm': 1.0902312994003296, 'learning_rate': 1.7451059324444702e-05, 'epoch': 0.77} +2025-02-05 16:59:46 - ERROR - stderr - 26%|██▌ | 5731/22434 [6:52:06<11:37:21, 2.51s/it] +2025-02-05 16:59:49 - ERROR - stderr - 26%|██▌ | 5732/22434 [6:52:09<12:04:42, 2.60s/it] +2025-02-05 16:59:49 - ERROR - stderr - +2025-02-05 16:59:49 - ERROR - stderr - +2025-02-05 16:59:49 - INFO - stdout - {'loss': 0.954, 'grad_norm': 1.0550434589385986, 'learning_rate': 1.7450096345559576e-05, 'epoch': 0.77} +2025-02-05 16:59:49 - ERROR - stderr - 26%|██▌ | 5732/22434 [6:52:09<12:04:42, 2.60s/it] +2025-02-05 16:59:52 - ERROR - stderr - 26%|██▌ | 5733/22434 [6:52:12<11:57:28, 2.58s/it] +2025-02-05 16:59:52 - ERROR - stderr - +2025-02-05 16:59:52 - ERROR - stderr - +2025-02-05 16:59:52 - INFO - stdout - {'loss': 0.9638, 'grad_norm': 0.9747079014778137, 'learning_rate': 
1.7449133211384325e-05, 'epoch': 0.77} +2025-02-05 16:59:52 - ERROR - stderr - 26%|██▌ | 5733/22434 [6:52:12<11:57:28, 2.58s/it] +2025-02-05 16:59:54 - ERROR - stderr - 26%|██▌ | 5734/22434 [6:52:14<11:49:40, 2.55s/it] +2025-02-05 16:59:54 - ERROR - stderr - +2025-02-05 16:59:54 - ERROR - stderr - +2025-02-05 16:59:54 - INFO - stdout - {'loss': 0.9623, 'grad_norm': 1.0863221883773804, 'learning_rate': 1.7448169921939014e-05, 'epoch': 0.77} +2025-02-05 16:59:54 - ERROR - stderr - 26%|██▌ | 5734/22434 [6:52:14<11:49:40, 2.55s/it] +2025-02-05 16:59:57 - ERROR - stderr - 26%|██▌ | 5735/22434 [6:52:16<11:42:07, 2.52s/it] +2025-02-05 16:59:57 - ERROR - stderr - +2025-02-05 16:59:57 - ERROR - stderr - +2025-02-05 16:59:57 - INFO - stdout - {'loss': 0.7959, 'grad_norm': 1.0640642642974854, 'learning_rate': 1.744720647724373e-05, 'epoch': 0.77} +2025-02-05 16:59:57 - ERROR - stderr - 26%|██▌ | 5735/22434 [6:52:17<11:42:07, 2.52s/it] +2025-02-05 16:59:59 - ERROR - stderr - 26%|██▌ | 5736/22434 [6:52:19<11:40:28, 2.52s/it] +2025-02-05 16:59:59 - ERROR - stderr - +2025-02-05 16:59:59 - ERROR - stderr - +2025-02-05 16:59:59 - INFO - stdout - {'loss': 0.9731, 'grad_norm': 0.9944091439247131, 'learning_rate': 1.7446242877318553e-05, 'epoch': 0.77} +2025-02-05 16:59:59 - ERROR - stderr - 26%|██▌ | 5736/22434 [6:52:19<11:40:28, 2.52s/it] +2025-02-05 17:00:02 - ERROR - stderr - 26%|██▌ | 5737/22434 [6:52:21<11:40:21, 2.52s/it] +2025-02-05 17:00:02 - ERROR - stderr - +2025-02-05 17:00:02 - ERROR - stderr - +2025-02-05 17:00:02 - INFO - stdout - {'loss': 0.8952, 'grad_norm': 0.9624443650245667, 'learning_rate': 1.7445279122183567e-05, 'epoch': 0.77} +2025-02-05 17:00:02 - ERROR - stderr - 26%|██▌ | 5737/22434 [6:52:22<11:40:21, 2.52s/it] +2025-02-05 17:00:04 - ERROR - stderr - 26%|██▌ | 5738/22434 [6:52:24<11:44:54, 2.53s/it] +2025-02-05 17:00:04 - ERROR - stderr - +2025-02-05 17:00:04 - ERROR - stderr - +2025-02-05 17:00:04 - INFO - stdout - {'loss': 0.8447, 'grad_norm': 
1.149829387664795, 'learning_rate': 1.7444315211858864e-05, 'epoch': 0.77} +2025-02-05 17:00:04 - ERROR - stderr - 26%|██▌ | 5738/22434 [6:52:24<11:44:54, 2.53s/it] +2025-02-05 17:00:07 - ERROR - stderr - 26%|██▌ | 5739/22434 [6:52:27<11:37:51, 2.51s/it] +2025-02-05 17:00:07 - ERROR - stderr - +2025-02-05 17:00:07 - ERROR - stderr - +2025-02-05 17:00:07 - INFO - stdout - {'loss': 0.9548, 'grad_norm': 0.9767423272132874, 'learning_rate': 1.7443351146364534e-05, 'epoch': 0.77} +2025-02-05 17:00:07 - ERROR - stderr - 26%|██▌ | 5739/22434 [6:52:27<11:37:51, 2.51s/it] +2025-02-05 17:00:09 - ERROR - stderr - 26%|██▌ | 5740/22434 [6:52:29<11:38:04, 2.51s/it] +2025-02-05 17:00:09 - ERROR - stderr - +2025-02-05 17:00:09 - ERROR - stderr - +2025-02-05 17:00:09 - INFO - stdout - {'loss': 1.0041, 'grad_norm': 1.1116724014282227, 'learning_rate': 1.744238692572067e-05, 'epoch': 0.77} +2025-02-05 17:00:09 - ERROR - stderr - 26%|██▌ | 5740/22434 [6:52:29<11:38:04, 2.51s/it] +2025-02-05 17:00:12 - ERROR - stderr - 26%|██▌ | 5741/22434 [6:52:32<11:42:20, 2.52s/it] +2025-02-05 17:00:12 - ERROR - stderr - +2025-02-05 17:00:12 - ERROR - stderr - +2025-02-05 17:00:12 - INFO - stdout - {'loss': 0.955, 'grad_norm': 1.12540864944458, 'learning_rate': 1.7441422549947375e-05, 'epoch': 0.77} +2025-02-05 17:00:12 - ERROR - stderr - 26%|██▌ | 5741/22434 [6:52:32<11:42:20, 2.52s/it] +2025-02-05 17:00:14 - ERROR - stderr - 26%|██▌ | 5742/22434 [6:52:34<11:36:18, 2.50s/it] +2025-02-05 17:00:14 - ERROR - stderr - +2025-02-05 17:00:14 - ERROR - stderr - +2025-02-05 17:00:14 - INFO - stdout - {'loss': 0.9544, 'grad_norm': 1.1024413108825684, 'learning_rate': 1.7440458019064745e-05, 'epoch': 0.77} +2025-02-05 17:00:14 - ERROR - stderr - 26%|██▌ | 5742/22434 [6:52:34<11:36:18, 2.50s/it] +2025-02-05 17:00:17 - ERROR - stderr - 26%|██▌ | 5743/22434 [6:52:37<11:33:38, 2.49s/it] +2025-02-05 17:00:17 - ERROR - stderr - +2025-02-05 17:00:17 - ERROR - stderr - +2025-02-05 17:00:17 - INFO - stdout - {'loss': 
0.9678, 'grad_norm': 1.094484806060791, 'learning_rate': 1.743949333309289e-05, 'epoch': 0.77} +2025-02-05 17:00:17 - ERROR - stderr - 26%|██▌ | 5743/22434 [6:52:37<11:33:38, 2.49s/it] +2025-02-05 17:00:19 - ERROR - stderr - 26%|██▌ | 5744/22434 [6:52:39<11:43:56, 2.53s/it] +2025-02-05 17:00:19 - ERROR - stderr - +2025-02-05 17:00:19 - ERROR - stderr - +2025-02-05 17:00:19 - INFO - stdout - {'loss': 0.9342, 'grad_norm': 1.133272409439087, 'learning_rate': 1.7438528492051914e-05, 'epoch': 0.77} +2025-02-05 17:00:19 - ERROR - stderr - 26%|██▌ | 5744/22434 [6:52:39<11:43:56, 2.53s/it] +2025-02-05 17:00:22 - ERROR - stderr - 26%|██▌ | 5745/22434 [6:52:42<11:37:27, 2.51s/it] +2025-02-05 17:00:22 - ERROR - stderr - +2025-02-05 17:00:22 - ERROR - stderr - +2025-02-05 17:00:22 - INFO - stdout - {'loss': 0.9971, 'grad_norm': 1.1478476524353027, 'learning_rate': 1.743756349596193e-05, 'epoch': 0.77} +2025-02-05 17:00:22 - ERROR - stderr - 26%|██▌ | 5745/22434 [6:52:42<11:37:27, 2.51s/it] +2025-02-05 17:00:24 - ERROR - stderr - 26%|██▌ | 5746/22434 [6:52:44<11:47:26, 2.54s/it] +2025-02-05 17:00:24 - ERROR - stderr - +2025-02-05 17:00:24 - ERROR - stderr - +2025-02-05 17:00:24 - INFO - stdout - {'loss': 0.9334, 'grad_norm': 1.0720198154449463, 'learning_rate': 1.743659834484305e-05, 'epoch': 0.77} +2025-02-05 17:00:24 - ERROR - stderr - 26%|██▌ | 5746/22434 [6:52:44<11:47:26, 2.54s/it] +2025-02-05 17:00:27 - ERROR - stderr - 26%|██▌ | 5747/22434 [6:52:47<12:00:06, 2.59s/it] +2025-02-05 17:00:27 - ERROR - stderr - +2025-02-05 17:00:27 - ERROR - stderr - +2025-02-05 17:00:27 - INFO - stdout - {'loss': 0.8908, 'grad_norm': 1.0617471933364868, 'learning_rate': 1.7435633038715396e-05, 'epoch': 0.77} +2025-02-05 17:00:27 - ERROR - stderr - 26%|██▌ | 5747/22434 [6:52:47<12:00:06, 2.59s/it] +2025-02-05 17:00:30 - ERROR - stderr - 26%|██▌ | 5748/22434 [6:52:49<11:56:15, 2.58s/it] +2025-02-05 17:00:30 - ERROR - stderr - +2025-02-05 17:00:30 - ERROR - stderr - +2025-02-05 17:00:30 - INFO 
- stdout - {'loss': 1.0, 'grad_norm': 1.0409166812896729, 'learning_rate': 1.7434667577599086e-05, 'epoch': 0.77} +2025-02-05 17:00:30 - ERROR - stderr - 26%|██▌ | 5748/22434 [6:52:50<11:56:15, 2.58s/it] +2025-02-05 17:00:32 - ERROR - stderr - 26%|██▌ | 5749/22434 [6:52:52<11:49:29, 2.55s/it] +2025-02-05 17:00:32 - ERROR - stderr - +2025-02-05 17:00:32 - ERROR - stderr - +2025-02-05 17:00:32 - INFO - stdout - {'loss': 1.0408, 'grad_norm': 1.1328110694885254, 'learning_rate': 1.7433701961514242e-05, 'epoch': 0.77} +2025-02-05 17:00:32 - ERROR - stderr - 26%|██▌ | 5749/22434 [6:52:52<11:49:29, 2.55s/it] +2025-02-05 17:00:35 - ERROR - stderr - 26%|██▌ | 5750/22434 [6:52:54<11:48:13, 2.55s/it] +2025-02-05 17:00:35 - ERROR - stderr - +2025-02-05 17:00:35 - ERROR - stderr - +2025-02-05 17:00:35 - INFO - stdout - {'loss': 0.9908, 'grad_norm': 1.175031304359436, 'learning_rate': 1.7432736190480995e-05, 'epoch': 0.77} +2025-02-05 17:00:35 - ERROR - stderr - 26%|██▌ | 5750/22434 [6:52:55<11:48:13, 2.55s/it] +2025-02-05 17:00:37 - ERROR - stderr - 26%|██▌ | 5751/22434 [6:52:57<11:43:39, 2.53s/it] +2025-02-05 17:00:37 - ERROR - stderr - +2025-02-05 17:00:37 - ERROR - stderr - +2025-02-05 17:00:37 - INFO - stdout - {'loss': 1.0363, 'grad_norm': 1.1278076171875, 'learning_rate': 1.7431770264519478e-05, 'epoch': 0.77} +2025-02-05 17:00:37 - ERROR - stderr - 26%|██▌ | 5751/22434 [6:52:57<11:43:39, 2.53s/it] +2025-02-05 17:00:40 - ERROR - stderr - 26%|██▌ | 5752/22434 [6:52:59<11:39:45, 2.52s/it] +2025-02-05 17:00:40 - ERROR - stderr - +2025-02-05 17:00:40 - ERROR - stderr - +2025-02-05 17:00:40 - INFO - stdout - {'loss': 0.8803, 'grad_norm': 1.2135567665100098, 'learning_rate': 1.7430804183649818e-05, 'epoch': 0.77} +2025-02-05 17:00:40 - ERROR - stderr - 26%|██▌ | 5752/22434 [6:53:00<11:39:45, 2.52s/it] +2025-02-05 17:00:42 - ERROR - stderr - 26%|██▌ | 5753/22434 [6:53:02<11:33:38, 2.49s/it] +2025-02-05 17:00:42 - ERROR - stderr - +2025-02-05 17:00:42 - ERROR - stderr - 
+2025-02-05 17:00:42 - INFO - stdout - {'loss': 0.8361, 'grad_norm': 1.0099236965179443, 'learning_rate': 1.7429837947892154e-05, 'epoch': 0.77} +2025-02-05 17:00:42 - ERROR - stderr - 26%|██▌ | 5753/22434 [6:53:02<11:33:38, 2.49s/it] +2025-02-05 17:00:45 - ERROR - stderr - 26%|██▌ | 5754/22434 [6:53:04<11:39:28, 2.52s/it] +2025-02-05 17:00:45 - ERROR - stderr - +2025-02-05 17:00:45 - ERROR - stderr - +2025-02-05 17:00:45 - INFO - stdout - {'loss': 0.9258, 'grad_norm': 1.0451719760894775, 'learning_rate': 1.7428871557266628e-05, 'epoch': 0.77} +2025-02-05 17:00:45 - ERROR - stderr - 26%|██▌ | 5754/22434 [6:53:05<11:39:28, 2.52s/it] +2025-02-05 17:00:47 - ERROR - stderr - 26%|██▌ | 5755/22434 [6:53:07<11:37:33, 2.51s/it] +2025-02-05 17:00:47 - ERROR - stderr - +2025-02-05 17:00:47 - ERROR - stderr - +2025-02-05 17:00:47 - INFO - stdout - {'loss': 0.9013, 'grad_norm': 1.1481705904006958, 'learning_rate': 1.7427905011793385e-05, 'epoch': 0.77} +2025-02-05 17:00:47 - ERROR - stderr - 26%|██▌ | 5755/22434 [6:53:07<11:37:33, 2.51s/it] +2025-02-05 17:00:50 - ERROR - stderr - 26%|██▌ | 5756/22434 [6:53:09<11:34:17, 2.50s/it] +2025-02-05 17:00:50 - ERROR - stderr - +2025-02-05 17:00:50 - ERROR - stderr - +2025-02-05 17:00:50 - INFO - stdout - {'loss': 0.9217, 'grad_norm': 1.0951621532440186, 'learning_rate': 1.742693831149257e-05, 'epoch': 0.77} +2025-02-05 17:00:50 - ERROR - stderr - 26%|██▌ | 5756/22434 [6:53:09<11:34:17, 2.50s/it] +2025-02-05 17:00:52 - ERROR - stderr - 26%|██▌ | 5757/22434 [6:53:12<11:35:31, 2.50s/it] +2025-02-05 17:00:52 - ERROR - stderr - +2025-02-05 17:00:52 - ERROR - stderr - +2025-02-05 17:00:52 - INFO - stdout - {'loss': 0.9773, 'grad_norm': 1.0907785892486572, 'learning_rate': 1.7425971456384333e-05, 'epoch': 0.77} +2025-02-05 17:00:52 - ERROR - stderr - 26%|██▌ | 5757/22434 [6:53:12<11:35:31, 2.50s/it] +2025-02-05 17:00:55 - ERROR - stderr - 26%|██▌ | 5758/22434 [6:53:14<11:33:25, 2.49s/it] +2025-02-05 17:00:55 - ERROR - stderr - +2025-02-05 
17:00:55 - ERROR - stderr - +2025-02-05 17:00:55 - INFO - stdout - {'loss': 0.8736, 'grad_norm': 1.0733246803283691, 'learning_rate': 1.7425004446488825e-05, 'epoch': 0.77} +2025-02-05 17:00:55 - ERROR - stderr - 26%|██▌ | 5758/22434 [6:53:14<11:33:25, 2.49s/it] +2025-02-05 17:00:57 - ERROR - stderr - 26%|██▌ | 5759/22434 [6:53:17<11:32:41, 2.49s/it] +2025-02-05 17:00:57 - ERROR - stderr - +2025-02-05 17:00:57 - ERROR - stderr - +2025-02-05 17:00:57 - INFO - stdout - {'loss': 1.0529, 'grad_norm': 1.0340436697006226, 'learning_rate': 1.7424037281826204e-05, 'epoch': 0.77} +2025-02-05 17:00:57 - ERROR - stderr - 26%|██▌ | 5759/22434 [6:53:17<11:32:41, 2.49s/it] +2025-02-05 17:01:00 - ERROR - stderr - 26%|██▌ | 5760/22434 [6:53:19<11:34:10, 2.50s/it] +2025-02-05 17:01:00 - ERROR - stderr - +2025-02-05 17:01:00 - ERROR - stderr - +2025-02-05 17:01:00 - INFO - stdout - {'loss': 0.995, 'grad_norm': 1.153602123260498, 'learning_rate': 1.7423069962416634e-05, 'epoch': 0.77} +2025-02-05 17:01:00 - ERROR - stderr - 26%|██▌ | 5760/22434 [6:53:19<11:34:10, 2.50s/it] +2025-02-05 17:01:02 - ERROR - stderr - 26%|██▌ | 5761/22434 [6:53:22<11:30:27, 2.48s/it] +2025-02-05 17:01:02 - ERROR - stderr - +2025-02-05 17:01:02 - ERROR - stderr - +2025-02-05 17:01:02 - INFO - stdout - {'loss': 0.9111, 'grad_norm': 0.9764849543571472, 'learning_rate': 1.7422102488280266e-05, 'epoch': 0.77} +2025-02-05 17:01:02 - ERROR - stderr - 26%|██▌ | 5761/22434 [6:53:22<11:30:27, 2.48s/it] +2025-02-05 17:01:05 - ERROR - stderr - 26%|██▌ | 5762/22434 [6:53:24<11:35:10, 2.50s/it] +2025-02-05 17:01:05 - ERROR - stderr - +2025-02-05 17:01:05 - ERROR - stderr - +2025-02-05 17:01:05 - INFO - stdout - {'loss': 1.0711, 'grad_norm': 1.1071991920471191, 'learning_rate': 1.742113485943728e-05, 'epoch': 0.77} +2025-02-05 17:01:05 - ERROR - stderr - 26%|██▌ | 5762/22434 [6:53:24<11:35:10, 2.50s/it] +2025-02-05 17:01:07 - ERROR - stderr - 26%|██▌ | 5763/22434 [6:53:27<11:34:49, 2.50s/it] +2025-02-05 17:01:07 - ERROR 
- stderr - +2025-02-05 17:01:07 - ERROR - stderr - +2025-02-05 17:01:07 - INFO - stdout - {'loss': 0.8927, 'grad_norm': 1.0592581033706665, 'learning_rate': 1.742016707590784e-05, 'epoch': 0.77} +2025-02-05 17:01:07 - ERROR - stderr - 26%|██▌ | 5763/22434 [6:53:27<11:34:49, 2.50s/it] +2025-02-05 17:01:10 - ERROR - stderr - 26%|██▌ | 5764/22434 [6:53:29<11:39:00, 2.52s/it] +2025-02-05 17:01:10 - ERROR - stderr - +2025-02-05 17:01:10 - ERROR - stderr - +2025-02-05 17:01:10 - INFO - stdout - {'loss': 1.1665, 'grad_norm': 1.1966403722763062, 'learning_rate': 1.7419199137712112e-05, 'epoch': 0.77} +2025-02-05 17:01:10 - ERROR - stderr - 26%|██▌ | 5764/22434 [6:53:30<11:39:00, 2.52s/it] +2025-02-05 17:01:12 - ERROR - stderr - 26%|██▌ | 5765/22434 [6:53:32<11:36:15, 2.51s/it] +2025-02-05 17:01:12 - ERROR - stderr - +2025-02-05 17:01:12 - ERROR - stderr - +2025-02-05 17:01:12 - INFO - stdout - {'loss': 1.0723, 'grad_norm': 1.1510671377182007, 'learning_rate': 1.7418231044870283e-05, 'epoch': 0.77} +2025-02-05 17:01:12 - ERROR - stderr - 26%|██▌ | 5765/22434 [6:53:32<11:36:15, 2.51s/it] +2025-02-05 17:01:15 - ERROR - stderr - 26%|██▌ | 5766/22434 [6:53:34<11:35:41, 2.50s/it] +2025-02-05 17:01:15 - ERROR - stderr - +2025-02-05 17:01:15 - ERROR - stderr - +2025-02-05 17:01:15 - INFO - stdout - {'loss': 0.93, 'grad_norm': 0.9715163111686707, 'learning_rate': 1.741726279740252e-05, 'epoch': 0.77} +2025-02-05 17:01:15 - ERROR - stderr - 26%|██▌ | 5766/22434 [6:53:35<11:35:41, 2.50s/it] +2025-02-05 17:01:17 - ERROR - stderr - 26%|██▌ | 5767/22434 [6:53:37<11:35:52, 2.51s/it] +2025-02-05 17:01:17 - ERROR - stderr - +2025-02-05 17:01:17 - ERROR - stderr - +2025-02-05 17:01:17 - INFO - stdout - {'loss': 0.8397, 'grad_norm': 1.0256233215332031, 'learning_rate': 1.7416294395329018e-05, 'epoch': 0.77} +2025-02-05 17:01:17 - ERROR - stderr - 26%|██▌ | 5767/22434 [6:53:37<11:35:52, 2.51s/it] +2025-02-05 17:01:20 - ERROR - stderr - 26%|██▌ | 5768/22434 [6:53:39<11:34:07, 2.50s/it] 
+2025-02-05 17:01:20 - ERROR - stderr - +2025-02-05 17:01:20 - ERROR - stderr - +2025-02-05 17:01:20 - INFO - stdout - {'loss': 0.9558, 'grad_norm': 1.136228084564209, 'learning_rate': 1.741532583866995e-05, 'epoch': 0.77} +2025-02-05 17:01:20 - ERROR - stderr - 26%|██▌ | 5768/22434 [6:53:39<11:34:07, 2.50s/it] +2025-02-05 17:01:22 - ERROR - stderr - 26%|██▌ | 5769/22434 [6:53:42<11:31:14, 2.49s/it] +2025-02-05 17:01:22 - ERROR - stderr - +2025-02-05 17:01:22 - ERROR - stderr - +2025-02-05 17:01:22 - INFO - stdout - {'loss': 0.9662, 'grad_norm': 1.1174087524414062, 'learning_rate': 1.7414357127445515e-05, 'epoch': 0.77} +2025-02-05 17:01:22 - ERROR - stderr - 26%|██▌ | 5769/22434 [6:53:42<11:31:14, 2.49s/it] +2025-02-05 17:01:25 - ERROR - stderr - 26%|██▌ | 5770/22434 [6:53:44<11:29:47, 2.48s/it] +2025-02-05 17:01:25 - ERROR - stderr - +2025-02-05 17:01:25 - ERROR - stderr - +2025-02-05 17:01:25 - INFO - stdout - {'loss': 1.0482, 'grad_norm': 1.1421359777450562, 'learning_rate': 1.74133882616759e-05, 'epoch': 0.77} +2025-02-05 17:01:25 - ERROR - stderr - 26%|██▌ | 5770/22434 [6:53:44<11:29:47, 2.48s/it] +2025-02-05 17:01:27 - ERROR - stderr - 26%|██▌ | 5771/22434 [6:53:47<11:39:15, 2.52s/it] +2025-02-05 17:01:27 - ERROR - stderr - +2025-02-05 17:01:27 - ERROR - stderr - +2025-02-05 17:01:27 - INFO - stdout - {'loss': 0.9371, 'grad_norm': 1.0439422130584717, 'learning_rate': 1.74124192413813e-05, 'epoch': 0.77} +2025-02-05 17:01:27 - ERROR - stderr - 26%|██▌ | 5771/22434 [6:53:47<11:39:15, 2.52s/it] +2025-02-05 17:01:30 - ERROR - stderr - 26%|██▌ | 5772/22434 [6:53:50<11:47:36, 2.55s/it] +2025-02-05 17:01:30 - ERROR - stderr - +2025-02-05 17:01:30 - ERROR - stderr - +2025-02-05 17:01:30 - INFO - stdout - {'loss': 0.9907, 'grad_norm': 1.0925662517547607, 'learning_rate': 1.7411450066581913e-05, 'epoch': 0.77} +2025-02-05 17:01:30 - ERROR - stderr - 26%|██▌ | 5772/22434 [6:53:50<11:47:36, 2.55s/it] +2025-02-05 17:01:32 - ERROR - stderr - 26%|██▌ | 5773/22434 
[6:53:52<11:41:51, 2.53s/it] +2025-02-05 17:01:32 - ERROR - stderr - +2025-02-05 17:01:32 - ERROR - stderr - +2025-02-05 17:01:32 - INFO - stdout - {'loss': 0.9923, 'grad_norm': 1.1391443014144897, 'learning_rate': 1.7410480737297942e-05, 'epoch': 0.77} +2025-02-05 17:01:32 - ERROR - stderr - 26%|██▌ | 5773/22434 [6:53:52<11:41:51, 2.53s/it] +2025-02-05 17:01:35 - ERROR - stderr - 26%|██▌ | 5774/22434 [6:53:55<11:40:31, 2.52s/it] +2025-02-05 17:01:35 - ERROR - stderr - +2025-02-05 17:01:35 - ERROR - stderr - +2025-02-05 17:01:35 - INFO - stdout - {'loss': 0.8843, 'grad_norm': 1.1026073694229126, 'learning_rate': 1.7409511253549592e-05, 'epoch': 0.77} +2025-02-05 17:01:35 - ERROR - stderr - 26%|██▌ | 5774/22434 [6:53:55<11:40:31, 2.52s/it] +2025-02-05 17:01:37 - ERROR - stderr - 26%|██▌ | 5775/22434 [6:53:57<11:48:46, 2.55s/it] +2025-02-05 17:01:37 - ERROR - stderr - +2025-02-05 17:01:37 - ERROR - stderr - +2025-02-05 17:01:37 - INFO - stdout - {'loss': 0.9102, 'grad_norm': 1.0762516260147095, 'learning_rate': 1.740854161535707e-05, 'epoch': 0.77} +2025-02-05 17:01:37 - ERROR - stderr - 26%|██▌ | 5775/22434 [6:53:57<11:48:46, 2.55s/it] +2025-02-05 17:01:40 - ERROR - stderr - 26%|██▌ | 5776/22434 [6:54:00<12:15:08, 2.65s/it] +2025-02-05 17:01:40 - ERROR - stderr - +2025-02-05 17:01:40 - ERROR - stderr - +2025-02-05 17:01:40 - INFO - stdout - {'loss': 0.9529, 'grad_norm': 1.0368516445159912, 'learning_rate': 1.7407571822740584e-05, 'epoch': 0.77} +2025-02-05 17:01:40 - ERROR - stderr - 26%|██▌ | 5776/22434 [6:54:00<12:15:08, 2.65s/it] +2025-02-05 17:01:43 - ERROR - stderr - 26%|██▌ | 5777/22434 [6:54:03<12:01:54, 2.60s/it] +2025-02-05 17:01:43 - ERROR - stderr - +2025-02-05 17:01:43 - ERROR - stderr - +2025-02-05 17:01:43 - INFO - stdout - {'loss': 0.9205, 'grad_norm': 1.1312167644500732, 'learning_rate': 1.7406601875720354e-05, 'epoch': 0.77} +2025-02-05 17:01:43 - ERROR - stderr - 26%|██▌ | 5777/22434 [6:54:03<12:01:54, 2.60s/it] +2025-02-05 17:01:45 - ERROR - 
stderr - 26%|██▌ | 5778/22434 [6:54:05<11:52:31, 2.57s/it] +2025-02-05 17:01:45 - ERROR - stderr - +2025-02-05 17:01:45 - ERROR - stderr - +2025-02-05 17:01:45 - INFO - stdout - {'loss': 0.943, 'grad_norm': 1.128832221031189, 'learning_rate': 1.7405631774316595e-05, 'epoch': 0.77} +2025-02-05 17:01:45 - ERROR - stderr - 26%|██▌ | 5778/22434 [6:54:05<11:52:31, 2.57s/it] +2025-02-05 17:01:48 - ERROR - stderr - 26%|██▌ | 5779/22434 [6:54:08<11:47:39, 2.55s/it] +2025-02-05 17:01:48 - ERROR - stderr - +2025-02-05 17:01:48 - ERROR - stderr - +2025-02-05 17:01:48 - INFO - stdout - {'loss': 0.8692, 'grad_norm': 2.2299225330352783, 'learning_rate': 1.740466151854953e-05, 'epoch': 0.77} +2025-02-05 17:01:48 - ERROR - stderr - 26%|██▌ | 5779/22434 [6:54:08<11:47:39, 2.55s/it] +2025-02-05 17:01:50 - ERROR - stderr - 26%|██▌ | 5780/22434 [6:54:10<11:46:26, 2.55s/it] +2025-02-05 17:01:50 - ERROR - stderr - +2025-02-05 17:01:50 - ERROR - stderr - +2025-02-05 17:01:50 - INFO - stdout - {'loss': 0.9667, 'grad_norm': 1.0916657447814941, 'learning_rate': 1.740369110843938e-05, 'epoch': 0.77} +2025-02-05 17:01:50 - ERROR - stderr - 26%|██▌ | 5780/22434 [6:54:10<11:46:26, 2.55s/it] +2025-02-05 17:01:53 - ERROR - stderr - 26%|██▌ | 5781/22434 [6:54:13<11:46:40, 2.55s/it] +2025-02-05 17:01:53 - ERROR - stderr - +2025-02-05 17:01:53 - ERROR - stderr - +2025-02-05 17:01:53 - INFO - stdout - {'loss': 0.9951, 'grad_norm': 1.0875526666641235, 'learning_rate': 1.740272054400637e-05, 'epoch': 0.77} +2025-02-05 17:01:53 - ERROR - stderr - 26%|██▌ | 5781/22434 [6:54:13<11:46:40, 2.55s/it] +2025-02-05 17:01:55 - ERROR - stderr - 26%|██▌ | 5782/22434 [6:54:15<11:42:51, 2.53s/it] +2025-02-05 17:01:55 - ERROR - stderr - +2025-02-05 17:01:55 - ERROR - stderr - +2025-02-05 17:01:55 - INFO - stdout - {'loss': 0.8968, 'grad_norm': 1.1238740682601929, 'learning_rate': 1.740174982527074e-05, 'epoch': 0.77} +2025-02-05 17:01:55 - ERROR - stderr - 26%|██▌ | 5782/22434 [6:54:15<11:42:51, 2.53s/it] +2025-02-05 
17:01:58 - ERROR - stderr - 26%|██▌ | 5783/22434 [6:54:18<11:40:45, 2.53s/it] +2025-02-05 17:01:58 - ERROR - stderr - +2025-02-05 17:01:58 - ERROR - stderr - +2025-02-05 17:01:58 - INFO - stdout - {'loss': 1.026, 'grad_norm': 1.1533631086349487, 'learning_rate': 1.7400778952252716e-05, 'epoch': 0.77} +2025-02-05 17:01:58 - ERROR - stderr - 26%|██▌ | 5783/22434 [6:54:18<11:40:45, 2.53s/it] +2025-02-05 17:02:00 - ERROR - stderr - 26%|██▌ | 5784/22434 [6:54:20<11:35:18, 2.51s/it] +2025-02-05 17:02:00 - ERROR - stderr - +2025-02-05 17:02:00 - ERROR - stderr - +2025-02-05 17:02:00 - INFO - stdout - {'loss': 1.0415, 'grad_norm': 1.141110897064209, 'learning_rate': 1.7399807924972533e-05, 'epoch': 0.77} +2025-02-05 17:02:00 - ERROR - stderr - 26%|██▌ | 5784/22434 [6:54:20<11:35:18, 2.51s/it] +2025-02-05 17:02:03 - ERROR - stderr - 26%|██▌ | 5785/22434 [6:54:23<11:34:15, 2.50s/it] +2025-02-05 17:02:03 - ERROR - stderr - +2025-02-05 17:02:03 - ERROR - stderr - +2025-02-05 17:02:03 - INFO - stdout - {'loss': 0.9102, 'grad_norm': 1.1428486108779907, 'learning_rate': 1.739883674345044e-05, 'epoch': 0.77} +2025-02-05 17:02:03 - ERROR - stderr - 26%|██▌ | 5785/22434 [6:54:23<11:34:15, 2.50s/it] +2025-02-05 17:02:05 - ERROR - stderr - 26%|██▌ | 5786/22434 [6:54:25<11:30:10, 2.49s/it] +2025-02-05 17:02:05 - ERROR - stderr - +2025-02-05 17:02:05 - ERROR - stderr - +2025-02-05 17:02:05 - INFO - stdout - {'loss': 0.7729, 'grad_norm': 1.0603721141815186, 'learning_rate': 1.7397865407706667e-05, 'epoch': 0.77} +2025-02-05 17:02:05 - ERROR - stderr - 26%|██▌ | 5786/22434 [6:54:25<11:30:10, 2.49s/it] +2025-02-05 17:02:08 - ERROR - stderr - 26%|██▌ | 5787/22434 [6:54:28<11:33:47, 2.50s/it] +2025-02-05 17:02:08 - ERROR - stderr - +2025-02-05 17:02:08 - ERROR - stderr - +2025-02-05 17:02:08 - INFO - stdout - {'loss': 1.0116, 'grad_norm': 1.0730334520339966, 'learning_rate': 1.7396893917761476e-05, 'epoch': 0.77} +2025-02-05 17:02:08 - ERROR - stderr - 26%|██▌ | 5787/22434 [6:54:28<11:33:47, 
2.50s/it] +2025-02-05 17:02:10 - ERROR - stderr - 26%|██▌ | 5788/22434 [6:54:30<11:42:37, 2.53s/it] +2025-02-05 17:02:10 - ERROR - stderr - +2025-02-05 17:02:10 - ERROR - stderr - +2025-02-05 17:02:10 - INFO - stdout - {'loss': 0.8683, 'grad_norm': 1.0567317008972168, 'learning_rate': 1.7395922273635106e-05, 'epoch': 0.77} +2025-02-05 17:02:10 - ERROR - stderr - 26%|██▌ | 5788/22434 [6:54:30<11:42:37, 2.53s/it] +2025-02-05 17:02:13 - ERROR - stderr - 26%|██▌ | 5789/22434 [6:54:33<11:42:39, 2.53s/it] +2025-02-05 17:02:13 - ERROR - stderr - +2025-02-05 17:02:13 - ERROR - stderr - +2025-02-05 17:02:13 - INFO - stdout - {'loss': 0.9797, 'grad_norm': 1.1615196466445923, 'learning_rate': 1.7394950475347814e-05, 'epoch': 0.77} +2025-02-05 17:02:13 - ERROR - stderr - 26%|██▌ | 5789/22434 [6:54:33<11:42:39, 2.53s/it] +2025-02-05 17:02:15 - ERROR - stderr - 26%|██▌ | 5790/22434 [6:54:35<11:38:55, 2.52s/it] +2025-02-05 17:02:16 - ERROR - stderr - +2025-02-05 17:02:16 - ERROR - stderr - +2025-02-05 17:02:16 - INFO - stdout - {'loss': 0.8486, 'grad_norm': 0.9932485222816467, 'learning_rate': 1.7393978522919855e-05, 'epoch': 0.77} +2025-02-05 17:02:16 - ERROR - stderr - 26%|██▌ | 5790/22434 [6:54:35<11:38:55, 2.52s/it] +2025-02-05 17:02:18 - ERROR - stderr - 26%|██▌ | 5791/22434 [6:54:38<11:57:52, 2.59s/it] +2025-02-05 17:02:18 - ERROR - stderr - +2025-02-05 17:02:18 - ERROR - stderr - +2025-02-05 17:02:18 - INFO - stdout - {'loss': 0.9237, 'grad_norm': 1.0752326250076294, 'learning_rate': 1.739300641637149e-05, 'epoch': 0.77} +2025-02-05 17:02:18 - ERROR - stderr - 26%|██▌ | 5791/22434 [6:54:38<11:57:52, 2.59s/it] +2025-02-05 17:02:21 - ERROR - stderr - 26%|██▌ | 5792/22434 [6:54:40<11:52:19, 2.57s/it] +2025-02-05 17:02:21 - ERROR - stderr - +2025-02-05 17:02:21 - ERROR - stderr - +2025-02-05 17:02:21 - INFO - stdout - {'loss': 1.0123, 'grad_norm': 1.1332244873046875, 'learning_rate': 1.7392034155722977e-05, 'epoch': 0.77} +2025-02-05 17:02:21 - ERROR - stderr - 26%|██▌ | 
5792/22434 [6:54:41<11:52:19, 2.57s/it] +2025-02-05 17:02:23 - ERROR - stderr - 26%|██▌ | 5793/22434 [6:54:43<11:50:22, 2.56s/it] +2025-02-05 17:02:23 - ERROR - stderr - +2025-02-05 17:02:23 - ERROR - stderr - +2025-02-05 17:02:23 - INFO - stdout - {'loss': 0.9772, 'grad_norm': 1.0429304838180542, 'learning_rate': 1.739106174099459e-05, 'epoch': 0.77} +2025-02-05 17:02:23 - ERROR - stderr - 26%|██▌ | 5793/22434 [6:54:43<11:50:22, 2.56s/it] +2025-02-05 17:02:26 - ERROR - stderr - 26%|██▌ | 5794/22434 [6:54:45<11:40:40, 2.53s/it] +2025-02-05 17:02:26 - ERROR - stderr - +2025-02-05 17:02:26 - ERROR - stderr - +2025-02-05 17:02:26 - INFO - stdout - {'loss': 0.9542, 'grad_norm': 1.0938130617141724, 'learning_rate': 1.7390089172206594e-05, 'epoch': 0.77} +2025-02-05 17:02:26 - ERROR - stderr - 26%|██▌ | 5794/22434 [6:54:46<11:40:40, 2.53s/it] +2025-02-05 17:02:28 - ERROR - stderr - 26%|██▌ | 5795/22434 [6:54:48<11:34:34, 2.50s/it] +2025-02-05 17:02:28 - ERROR - stderr - +2025-02-05 17:02:28 - ERROR - stderr - +2025-02-05 17:02:28 - INFO - stdout - {'loss': 0.9704, 'grad_norm': 1.0806126594543457, 'learning_rate': 1.738911644937926e-05, 'epoch': 0.77} +2025-02-05 17:02:28 - ERROR - stderr - 26%|██▌ | 5795/22434 [6:54:48<11:34:34, 2.50s/it] +2025-02-05 17:02:31 - ERROR - stderr - 26%|██▌ | 5796/22434 [6:54:50<11:34:14, 2.50s/it] +2025-02-05 17:02:31 - ERROR - stderr - +2025-02-05 17:02:31 - ERROR - stderr - +2025-02-05 17:02:31 - INFO - stdout - {'loss': 1.0272, 'grad_norm': 1.1858100891113281, 'learning_rate': 1.738814357253286e-05, 'epoch': 0.78} +2025-02-05 17:02:31 - ERROR - stderr - 26%|██▌ | 5796/22434 [6:54:50<11:34:14, 2.50s/it] +2025-02-05 17:02:33 - ERROR - stderr - 26%|██▌ | 5797/22434 [6:54:53<11:31:32, 2.49s/it] +2025-02-05 17:02:33 - ERROR - stderr - +2025-02-05 17:02:33 - ERROR - stderr - +2025-02-05 17:02:33 - INFO - stdout - {'loss': 1.0184, 'grad_norm': 1.1996965408325195, 'learning_rate': 1.738717054168768e-05, 'epoch': 0.78} +2025-02-05 17:02:33 - ERROR 
- stderr - 26%|██▌ | 5797/22434 [6:54:53<11:31:32, 2.49s/it] +2025-02-05 17:02:36 - ERROR - stderr - 26%|██▌ | 5798/22434 [6:54:55<11:29:55, 2.49s/it] +2025-02-05 17:02:36 - ERROR - stderr - +2025-02-05 17:02:36 - ERROR - stderr - +2025-02-05 17:02:36 - INFO - stdout - {'loss': 0.8248, 'grad_norm': 0.9996867775917053, 'learning_rate': 1.7386197356863998e-05, 'epoch': 0.78} +2025-02-05 17:02:36 - ERROR - stderr - 26%|██▌ | 5798/22434 [6:54:55<11:29:55, 2.49s/it] +2025-02-05 17:02:38 - ERROR - stderr - 26%|██▌ | 5799/22434 [6:54:58<11:29:20, 2.49s/it] +2025-02-05 17:02:38 - ERROR - stderr - +2025-02-05 17:02:38 - ERROR - stderr - +2025-02-05 17:02:38 - INFO - stdout - {'loss': 0.9124, 'grad_norm': 1.0730143785476685, 'learning_rate': 1.73852240180821e-05, 'epoch': 0.78} +2025-02-05 17:02:38 - ERROR - stderr - 26%|██▌ | 5799/22434 [6:54:58<11:29:20, 2.49s/it] +2025-02-05 17:02:41 - ERROR - stderr - 26%|██▌ | 5800/22434 [6:55:00<11:30:28, 2.49s/it] +2025-02-05 17:02:41 - ERROR - stderr - +2025-02-05 17:02:41 - ERROR - stderr - +2025-02-05 17:02:41 - INFO - stdout - {'loss': 0.9296, 'grad_norm': 1.207648515701294, 'learning_rate': 1.7384250525362277e-05, 'epoch': 0.78} +2025-02-05 17:02:41 - ERROR - stderr - 26%|██▌ | 5800/22434 [6:55:00<11:30:28, 2.49s/it] +2025-02-05 17:02:43 - ERROR - stderr - 26%|██▌ | 5801/22434 [6:55:03<11:43:30, 2.54s/it] +2025-02-05 17:02:43 - ERROR - stderr - +2025-02-05 17:02:43 - ERROR - stderr - +2025-02-05 17:02:43 - INFO - stdout - {'loss': 0.9525, 'grad_norm': 1.144271969795227, 'learning_rate': 1.738327687872481e-05, 'epoch': 0.78} +2025-02-05 17:02:43 - ERROR - stderr - 26%|██▌ | 5801/22434 [6:55:03<11:43:30, 2.54s/it] +2025-02-05 17:02:46 - ERROR - stderr - 26%|██▌ | 5802/22434 [6:55:05<11:35:58, 2.51s/it] +2025-02-05 17:02:46 - ERROR - stderr - +2025-02-05 17:02:46 - ERROR - stderr - +2025-02-05 17:02:46 - INFO - stdout - {'loss': 0.9432, 'grad_norm': 1.1087696552276611, 'learning_rate': 1.7382303078190014e-05, 'epoch': 0.78} 
+2025-02-05 17:02:46 - ERROR - stderr - 26%|██▌ | 5802/22434 [6:55:06<11:35:58, 2.51s/it] +2025-02-05 17:02:48 - ERROR - stderr - 26%|██▌ | 5803/22434 [6:55:08<11:32:00, 2.50s/it] +2025-02-05 17:02:48 - ERROR - stderr - +2025-02-05 17:02:48 - ERROR - stderr - +2025-02-05 17:02:48 - INFO - stdout - {'loss': 0.9219, 'grad_norm': 1.032373309135437, 'learning_rate': 1.7381329123778166e-05, 'epoch': 0.78} +2025-02-05 17:02:48 - ERROR - stderr - 26%|██▌ | 5803/22434 [6:55:08<11:32:00, 2.50s/it] +2025-02-05 17:02:51 - ERROR - stderr - 26%|██▌ | 5804/22434 [6:55:10<11:30:02, 2.49s/it] +2025-02-05 17:02:51 - ERROR - stderr - +2025-02-05 17:02:51 - ERROR - stderr - +2025-02-05 17:02:51 - INFO - stdout - {'loss': 1.0668, 'grad_norm': 1.115530014038086, 'learning_rate': 1.7380355015509577e-05, 'epoch': 0.78} +2025-02-05 17:02:51 - ERROR - stderr - 26%|██▌ | 5804/22434 [6:55:10<11:30:02, 2.49s/it] +2025-02-05 17:02:53 - ERROR - stderr - 26%|██▌ | 5805/22434 [6:55:13<11:57:55, 2.59s/it] +2025-02-05 17:02:54 - ERROR - stderr - +2025-02-05 17:02:54 - ERROR - stderr - +2025-02-05 17:02:54 - INFO - stdout - {'loss': 0.9275, 'grad_norm': 1.0470343828201294, 'learning_rate': 1.7379380753404548e-05, 'epoch': 0.78} +2025-02-05 17:02:54 - ERROR - stderr - 26%|██▌ | 5805/22434 [6:55:13<11:57:55, 2.59s/it] +2025-02-05 17:02:56 - ERROR - stderr - 26%|██▌ | 5806/22434 [6:55:16<11:46:56, 2.55s/it] +2025-02-05 17:02:56 - ERROR - stderr - +2025-02-05 17:02:56 - ERROR - stderr - +2025-02-05 17:02:56 - INFO - stdout - {'loss': 0.9981, 'grad_norm': 1.125929832458496, 'learning_rate': 1.737840633748339e-05, 'epoch': 0.78} +2025-02-05 17:02:56 - ERROR - stderr - 26%|██▌ | 5806/22434 [6:55:16<11:46:56, 2.55s/it] +2025-02-05 17:02:58 - ERROR - stderr - 26%|██▌ | 5807/22434 [6:55:18<11:47:12, 2.55s/it] +2025-02-05 17:02:59 - ERROR - stderr - +2025-02-05 17:02:59 - ERROR - stderr - +2025-02-05 17:02:59 - INFO - stdout - {'loss': 0.9639, 'grad_norm': 0.9741032123565674, 'learning_rate': 
1.7377431767766414e-05, 'epoch': 0.78} +2025-02-05 17:02:59 - ERROR - stderr - 26%|██▌ | 5807/22434 [6:55:18<11:47:12, 2.55s/it] +2025-02-05 17:03:01 - ERROR - stderr - 26%|██▌ | 5808/22434 [6:55:21<11:41:13, 2.53s/it] +2025-02-05 17:03:01 - ERROR - stderr - +2025-02-05 17:03:01 - ERROR - stderr - +2025-02-05 17:03:01 - INFO - stdout - {'loss': 0.9877, 'grad_norm': 1.324411392211914, 'learning_rate': 1.7376457044273932e-05, 'epoch': 0.78} +2025-02-05 17:03:01 - ERROR - stderr - 26%|██▌ | 5808/22434 [6:55:21<11:41:13, 2.53s/it] +2025-02-05 17:03:03 - ERROR - stderr - 26%|██▌ | 5809/22434 [6:55:23<11:38:58, 2.52s/it] +2025-02-05 17:03:04 - ERROR - stderr - +2025-02-05 17:03:04 - ERROR - stderr - +2025-02-05 17:03:04 - INFO - stdout - {'loss': 0.8731, 'grad_norm': 1.006172776222229, 'learning_rate': 1.737548216702626e-05, 'epoch': 0.78} +2025-02-05 17:03:04 - ERROR - stderr - 26%|██▌ | 5809/22434 [6:55:23<11:38:58, 2.52s/it] +2025-02-05 17:03:06 - ERROR - stderr - 26%|██▌ | 5810/22434 [6:55:26<11:56:52, 2.59s/it] +2025-02-05 17:03:06 - ERROR - stderr - +2025-02-05 17:03:06 - ERROR - stderr - +2025-02-05 17:03:06 - INFO - stdout - {'loss': 1.0281, 'grad_norm': 1.1740729808807373, 'learning_rate': 1.737450713604372e-05, 'epoch': 0.78} +2025-02-05 17:03:06 - ERROR - stderr - 26%|██▌ | 5810/22434 [6:55:26<11:56:52, 2.59s/it] +2025-02-05 17:03:09 - ERROR - stderr - 26%|██▌ | 5811/22434 [6:55:28<11:51:25, 2.57s/it] +2025-02-05 17:03:09 - ERROR - stderr - +2025-02-05 17:03:09 - ERROR - stderr - +2025-02-05 17:03:09 - INFO - stdout - {'loss': 0.8052, 'grad_norm': 1.071735143661499, 'learning_rate': 1.7373531951346634e-05, 'epoch': 0.78} +2025-02-05 17:03:09 - ERROR - stderr - 26%|██▌ | 5811/22434 [6:55:29<11:51:25, 2.57s/it] +2025-02-05 17:03:11 - ERROR - stderr - 26%|██▌ | 5812/22434 [6:55:31<11:39:59, 2.53s/it] +2025-02-05 17:03:11 - ERROR - stderr - +2025-02-05 17:03:11 - ERROR - stderr - +2025-02-05 17:03:11 - INFO - stdout - {'loss': 1.0308, 'grad_norm': 
1.148179292678833, 'learning_rate': 1.7372556612955335e-05, 'epoch': 0.78} +2025-02-05 17:03:11 - ERROR - stderr - 26%|██▌ | 5812/22434 [6:55:31<11:39:59, 2.53s/it] +2025-02-05 17:03:14 - ERROR - stderr - 26%|██▌ | 5813/22434 [6:55:33<11:37:44, 2.52s/it] +2025-02-05 17:03:14 - ERROR - stderr - +2025-02-05 17:03:14 - ERROR - stderr - +2025-02-05 17:03:14 - INFO - stdout - {'loss': 0.9039, 'grad_norm': 1.0332939624786377, 'learning_rate': 1.737158112089014e-05, 'epoch': 0.78} +2025-02-05 17:03:14 - ERROR - stderr - 26%|██▌ | 5813/22434 [6:55:33<11:37:44, 2.52s/it] +2025-02-05 17:03:16 - ERROR - stderr - 26%|██▌ | 5814/22434 [6:55:36<11:39:55, 2.53s/it] +2025-02-05 17:03:16 - ERROR - stderr - +2025-02-05 17:03:16 - ERROR - stderr - +2025-02-05 17:03:16 - INFO - stdout - {'loss': 0.9084, 'grad_norm': 1.237817406654358, 'learning_rate': 1.73706054751714e-05, 'epoch': 0.78} +2025-02-05 17:03:16 - ERROR - stderr - 26%|██▌ | 5814/22434 [6:55:36<11:39:55, 2.53s/it] +2025-02-05 17:03:19 - ERROR - stderr - 26%|██▌ | 5815/22434 [6:55:39<11:43:10, 2.54s/it] +2025-02-05 17:03:19 - ERROR - stderr - +2025-02-05 17:03:19 - ERROR - stderr - +2025-02-05 17:03:19 - INFO - stdout - {'loss': 0.9833, 'grad_norm': 1.0857131481170654, 'learning_rate': 1.7369629675819436e-05, 'epoch': 0.78} +2025-02-05 17:03:19 - ERROR - stderr - 26%|██▌ | 5815/22434 [6:55:39<11:43:10, 2.54s/it] +2025-02-05 17:03:21 - ERROR - stderr - 26%|██▌ | 5816/22434 [6:55:41<11:43:31, 2.54s/it] +2025-02-05 17:03:21 - ERROR - stderr - +2025-02-05 17:03:21 - ERROR - stderr - +2025-02-05 17:03:21 - INFO - stdout - {'loss': 0.9321, 'grad_norm': 1.030595302581787, 'learning_rate': 1.7368653722854593e-05, 'epoch': 0.78} +2025-02-05 17:03:21 - ERROR - stderr - 26%|██▌ | 5816/22434 [6:55:41<11:43:31, 2.54s/it] +2025-02-05 17:03:24 - ERROR - stderr - 26%|██▌ | 5817/22434 [6:55:44<12:07:13, 2.63s/it] +2025-02-05 17:03:24 - ERROR - stderr - +2025-02-05 17:03:24 - ERROR - stderr - +2025-02-05 17:03:24 - INFO - stdout - {'loss': 
0.9836, 'grad_norm': 1.0658055543899536, 'learning_rate': 1.7367677616297215e-05, 'epoch': 0.78} +2025-02-05 17:03:24 - ERROR - stderr - 26%|██▌ | 5817/22434 [6:55:44<12:07:13, 2.63s/it] +2025-02-05 17:03:27 - ERROR - stderr - 26%|██▌ | 5818/22434 [6:55:47<12:11:18, 2.64s/it] +2025-02-05 17:03:27 - ERROR - stderr - +2025-02-05 17:03:27 - ERROR - stderr - +2025-02-05 17:03:27 - INFO - stdout - {'loss': 0.9866, 'grad_norm': 1.0333417654037476, 'learning_rate': 1.7366701356167648e-05, 'epoch': 0.78} +2025-02-05 17:03:27 - ERROR - stderr - 26%|██▌ | 5818/22434 [6:55:47<12:11:18, 2.64s/it] +2025-02-05 17:03:29 - ERROR - stderr - 26%|██▌ | 5819/22434 [6:55:49<12:01:29, 2.61s/it] +2025-02-05 17:03:29 - ERROR - stderr - +2025-02-05 17:03:29 - ERROR - stderr - +2025-02-05 17:03:29 - INFO - stdout - {'loss': 0.833, 'grad_norm': 1.0584616661071777, 'learning_rate': 1.7365724942486243e-05, 'epoch': 0.78} +2025-02-05 17:03:29 - ERROR - stderr - 26%|██▌ | 5819/22434 [6:55:49<12:01:29, 2.61s/it] +2025-02-05 17:03:32 - ERROR - stderr - 26%|██▌ | 5820/22434 [6:55:52<11:50:32, 2.57s/it] +2025-02-05 17:03:32 - ERROR - stderr - +2025-02-05 17:03:32 - ERROR - stderr - +2025-02-05 17:03:32 - INFO - stdout - {'loss': 0.9557, 'grad_norm': 1.0248152017593384, 'learning_rate': 1.7364748375273347e-05, 'epoch': 0.78} +2025-02-05 17:03:32 - ERROR - stderr - 26%|██▌ | 5820/22434 [6:55:52<11:50:32, 2.57s/it] +2025-02-05 17:03:34 - ERROR - stderr - 26%|██▌ | 5821/22434 [6:55:54<11:48:05, 2.56s/it] +2025-02-05 17:03:34 - ERROR - stderr - +2025-02-05 17:03:34 - ERROR - stderr - +2025-02-05 17:03:34 - INFO - stdout - {'loss': 1.0483, 'grad_norm': 1.035446047782898, 'learning_rate': 1.7363771654549317e-05, 'epoch': 0.78} +2025-02-05 17:03:34 - ERROR - stderr - 26%|██▌ | 5821/22434 [6:55:54<11:48:05, 2.56s/it] +2025-02-05 17:03:37 - ERROR - stderr - 26%|██▌ | 5822/22434 [6:55:57<11:38:20, 2.52s/it] +2025-02-05 17:03:37 - ERROR - stderr - +2025-02-05 17:03:37 - ERROR - stderr - +2025-02-05 17:03:37 - 
INFO - stdout - {'loss': 0.9852, 'grad_norm': 1.056353211402893, 'learning_rate': 1.7362794780334516e-05, 'epoch': 0.78} +2025-02-05 17:03:37 - ERROR - stderr - 26%|██▌ | 5822/22434 [6:55:57<11:38:20, 2.52s/it] +2025-02-05 17:03:39 - ERROR - stderr - 26%|██▌ | 5823/22434 [6:55:59<11:36:34, 2.52s/it] +2025-02-05 17:03:39 - ERROR - stderr - +2025-02-05 17:03:39 - ERROR - stderr - +2025-02-05 17:03:39 - INFO - stdout - {'loss': 0.8998, 'grad_norm': 1.0895535945892334, 'learning_rate': 1.73618177526493e-05, 'epoch': 0.78} +2025-02-05 17:03:39 - ERROR - stderr - 26%|██▌ | 5823/22434 [6:55:59<11:36:34, 2.52s/it] +2025-02-05 17:03:42 - ERROR - stderr - 26%|██▌ | 5824/22434 [6:56:02<11:39:07, 2.53s/it] +2025-02-05 17:03:42 - ERROR - stderr - +2025-02-05 17:03:42 - ERROR - stderr - +2025-02-05 17:03:42 - INFO - stdout - {'loss': 0.9623, 'grad_norm': 1.0371977090835571, 'learning_rate': 1.736084057151404e-05, 'epoch': 0.78} +2025-02-05 17:03:42 - ERROR - stderr - 26%|██▌ | 5824/22434 [6:56:02<11:39:07, 2.53s/it] +2025-02-05 17:03:44 - ERROR - stderr - 26%|██▌ | 5825/22434 [6:56:04<11:35:14, 2.51s/it] +2025-02-05 17:03:44 - ERROR - stderr - +2025-02-05 17:03:44 - ERROR - stderr - +2025-02-05 17:03:44 - INFO - stdout - {'loss': 1.062, 'grad_norm': 1.2121824026107788, 'learning_rate': 1.73598632369491e-05, 'epoch': 0.78} +2025-02-05 17:03:44 - ERROR - stderr - 26%|██▌ | 5825/22434 [6:56:04<11:35:14, 2.51s/it] +2025-02-05 17:03:47 - ERROR - stderr - 26%|██▌ | 5826/22434 [6:56:07<11:30:54, 2.50s/it] +2025-02-05 17:03:47 - ERROR - stderr - +2025-02-05 17:03:47 - ERROR - stderr - +2025-02-05 17:03:47 - INFO - stdout - {'loss': 0.9181, 'grad_norm': 1.1948134899139404, 'learning_rate': 1.7358885748974853e-05, 'epoch': 0.78} +2025-02-05 17:03:47 - ERROR - stderr - 26%|██▌ | 5826/22434 [6:56:07<11:30:54, 2.50s/it] +2025-02-05 17:03:49 - ERROR - stderr - 26%|██▌ | 5827/22434 [6:56:09<11:31:03, 2.50s/it] +2025-02-05 17:03:49 - ERROR - stderr - +2025-02-05 17:03:49 - ERROR - stderr - 
+2025-02-05 17:03:49 - INFO - stdout - {'loss': 1.0609, 'grad_norm': 1.1606647968292236, 'learning_rate': 1.7357908107611677e-05, 'epoch': 0.78} +2025-02-05 17:03:49 - ERROR - stderr - 26%|██▌ | 5827/22434 [6:56:09<11:31:03, 2.50s/it] +2025-02-05 17:03:52 - ERROR - stderr - 26%|██▌ | 5828/22434 [6:56:11<11:24:37, 2.47s/it] +2025-02-05 17:03:52 - ERROR - stderr - +2025-02-05 17:03:52 - ERROR - stderr - +2025-02-05 17:03:52 - INFO - stdout - {'loss': 0.9753, 'grad_norm': 1.107747197151184, 'learning_rate': 1.735693031287995e-05, 'epoch': 0.78} +2025-02-05 17:03:52 - ERROR - stderr - 26%|██▌ | 5828/22434 [6:56:12<11:24:37, 2.47s/it] +2025-02-05 17:03:54 - ERROR - stderr - 26%|██▌ | 5829/22434 [6:56:14<11:20:30, 2.46s/it] +2025-02-05 17:03:54 - ERROR - stderr - +2025-02-05 17:03:54 - ERROR - stderr - +2025-02-05 17:03:54 - INFO - stdout - {'loss': 1.0478, 'grad_norm': 1.1481611728668213, 'learning_rate': 1.7355952364800045e-05, 'epoch': 0.78} +2025-02-05 17:03:54 - ERROR - stderr - 26%|██▌ | 5829/22434 [6:56:14<11:20:30, 2.46s/it] +2025-02-05 17:03:57 - ERROR - stderr - 26%|██▌ | 5830/22434 [6:56:17<11:38:16, 2.52s/it] +2025-02-05 17:03:57 - ERROR - stderr - +2025-02-05 17:03:57 - ERROR - stderr - +2025-02-05 17:03:57 - INFO - stdout - {'loss': 0.9896, 'grad_norm': 1.1143673658370972, 'learning_rate': 1.7354974263392353e-05, 'epoch': 0.78} +2025-02-05 17:03:57 - ERROR - stderr - 26%|██▌ | 5830/22434 [6:56:17<11:38:16, 2.52s/it] +2025-02-05 17:03:59 - ERROR - stderr - 26%|██▌ | 5831/22434 [6:56:19<11:41:36, 2.54s/it] +2025-02-05 17:03:59 - ERROR - stderr - +2025-02-05 17:03:59 - ERROR - stderr - +2025-02-05 17:03:59 - INFO - stdout - {'loss': 1.0293, 'grad_norm': 1.1509370803833008, 'learning_rate': 1.7353996008677262e-05, 'epoch': 0.78} +2025-02-05 17:03:59 - ERROR - stderr - 26%|██▌ | 5831/22434 [6:56:19<11:41:36, 2.54s/it] +2025-02-05 17:04:02 - ERROR - stderr - 26%|██▌ | 5832/22434 [6:56:22<11:39:20, 2.53s/it] +2025-02-05 17:04:02 - ERROR - stderr - +2025-02-05 
17:04:02 - ERROR - stderr - +2025-02-05 17:04:02 - INFO - stdout - {'loss': 0.7959, 'grad_norm': 0.9845685362815857, 'learning_rate': 1.735301760067516e-05, 'epoch': 0.78} +2025-02-05 17:04:02 - ERROR - stderr - 26%|██▌ | 5832/22434 [6:56:22<11:39:20, 2.53s/it] +2025-02-05 17:04:04 - ERROR - stderr - 26%|██▌ | 5833/22434 [6:56:24<11:33:54, 2.51s/it] +2025-02-05 17:04:04 - ERROR - stderr - +2025-02-05 17:04:04 - ERROR - stderr - +2025-02-05 17:04:04 - INFO - stdout - {'loss': 1.0104, 'grad_norm': 1.1169915199279785, 'learning_rate': 1.7352039039406442e-05, 'epoch': 0.78} +2025-02-05 17:04:04 - ERROR - stderr - 26%|██▌ | 5833/22434 [6:56:24<11:33:54, 2.51s/it] +2025-02-05 17:04:07 - ERROR - stderr - 26%|██▌ | 5834/22434 [6:56:27<11:31:52, 2.50s/it] +2025-02-05 17:04:07 - ERROR - stderr - +2025-02-05 17:04:07 - ERROR - stderr - +2025-02-05 17:04:07 - INFO - stdout - {'loss': 0.8499, 'grad_norm': 1.0956919193267822, 'learning_rate': 1.7351060324891506e-05, 'epoch': 0.78} +2025-02-05 17:04:07 - ERROR - stderr - 26%|██▌ | 5834/22434 [6:56:27<11:31:52, 2.50s/it] +2025-02-05 17:04:09 - ERROR - stderr - 26%|██▌ | 5835/22434 [6:56:29<11:31:39, 2.50s/it] +2025-02-05 17:04:09 - ERROR - stderr - +2025-02-05 17:04:09 - ERROR - stderr - +2025-02-05 17:04:09 - INFO - stdout - {'loss': 0.8643, 'grad_norm': 0.954009473323822, 'learning_rate': 1.735008145715075e-05, 'epoch': 0.78} +2025-02-05 17:04:09 - ERROR - stderr - 26%|██▌ | 5835/22434 [6:56:29<11:31:39, 2.50s/it] +2025-02-05 17:04:12 - ERROR - stderr - 26%|██▌ | 5836/22434 [6:56:32<11:36:13, 2.52s/it] +2025-02-05 17:04:12 - ERROR - stderr - +2025-02-05 17:04:12 - ERROR - stderr - +2025-02-05 17:04:12 - INFO - stdout - {'loss': 1.0588, 'grad_norm': 1.2194390296936035, 'learning_rate': 1.734910243620458e-05, 'epoch': 0.78} +2025-02-05 17:04:12 - ERROR - stderr - 26%|██▌ | 5836/22434 [6:56:32<11:36:13, 2.52s/it] +2025-02-05 17:04:14 - ERROR - stderr - 26%|██▌ | 5837/22434 [6:56:34<11:34:41, 2.51s/it] +2025-02-05 17:04:14 - ERROR - 
stderr - +2025-02-05 17:04:14 - ERROR - stderr - +2025-02-05 17:04:14 - INFO - stdout - {'loss': 0.8894, 'grad_norm': 1.0090768337249756, 'learning_rate': 1.73481232620734e-05, 'epoch': 0.78} +2025-02-05 17:04:14 - ERROR - stderr - 26%|██▌ | 5837/22434 [6:56:34<11:34:41, 2.51s/it] +2025-02-05 17:04:17 - ERROR - stderr - 26%|██▌ | 5838/22434 [6:56:37<11:35:59, 2.52s/it] +2025-02-05 17:04:17 - ERROR - stderr - +2025-02-05 17:04:17 - ERROR - stderr - +2025-02-05 17:04:17 - INFO - stdout - {'loss': 0.9734, 'grad_norm': 1.0626386404037476, 'learning_rate': 1.734714393477763e-05, 'epoch': 0.78} +2025-02-05 17:04:17 - ERROR - stderr - 26%|██▌ | 5838/22434 [6:56:37<11:35:59, 2.52s/it] +2025-02-05 17:04:19 - ERROR - stderr - 26%|██▌ | 5839/22434 [6:56:39<11:29:54, 2.49s/it] +2025-02-05 17:04:19 - ERROR - stderr - +2025-02-05 17:04:19 - ERROR - stderr - +2025-02-05 17:04:19 - INFO - stdout - {'loss': 0.9039, 'grad_norm': 0.9648792147636414, 'learning_rate': 1.734616445433767e-05, 'epoch': 0.78} +2025-02-05 17:04:19 - ERROR - stderr - 26%|██▌ | 5839/22434 [6:56:39<11:29:54, 2.49s/it] +2025-02-05 17:04:22 - ERROR - stderr - 26%|██▌ | 5840/22434 [6:56:42<11:27:56, 2.49s/it] +2025-02-05 17:04:22 - ERROR - stderr - +2025-02-05 17:04:22 - ERROR - stderr - +2025-02-05 17:04:22 - INFO - stdout - {'loss': 1.0055, 'grad_norm': 1.12740957736969, 'learning_rate': 1.734518482077394e-05, 'epoch': 0.78} +2025-02-05 17:04:22 - ERROR - stderr - 26%|██▌ | 5840/22434 [6:56:42<11:27:56, 2.49s/it] +2025-02-05 17:04:24 - ERROR - stderr - 26%|██▌ | 5841/22434 [6:56:44<11:27:02, 2.48s/it] +2025-02-05 17:04:24 - ERROR - stderr - +2025-02-05 17:04:24 - ERROR - stderr - +2025-02-05 17:04:24 - INFO - stdout - {'loss': 0.9313, 'grad_norm': 1.0662246942520142, 'learning_rate': 1.7344205034106862e-05, 'epoch': 0.78} +2025-02-05 17:04:24 - ERROR - stderr - 26%|██▌ | 5841/22434 [6:56:44<11:27:02, 2.48s/it] +2025-02-05 17:04:27 - ERROR - stderr - 26%|██▌ | 5842/22434 [6:56:46<11:21:12, 2.46s/it] +2025-02-05 
17:04:27 - ERROR - stderr - +2025-02-05 17:04:27 - ERROR - stderr - +2025-02-05 17:04:27 - INFO - stdout - {'loss': 1.0032, 'grad_norm': 1.106798768043518, 'learning_rate': 1.7343225094356857e-05, 'epoch': 0.78} +2025-02-05 17:04:27 - ERROR - stderr - 26%|██▌ | 5842/22434 [6:56:47<11:21:12, 2.46s/it] +2025-02-05 17:04:29 - ERROR - stderr - 26%|██▌ | 5843/22434 [6:56:49<11:22:53, 2.47s/it] +2025-02-05 17:04:29 - ERROR - stderr - +2025-02-05 17:04:29 - ERROR - stderr - +2025-02-05 17:04:29 - INFO - stdout - {'loss': 1.0699, 'grad_norm': 1.1787093877792358, 'learning_rate': 1.7342245001544352e-05, 'epoch': 0.78} +2025-02-05 17:04:29 - ERROR - stderr - 26%|██▌ | 5843/22434 [6:56:49<11:22:53, 2.47s/it] +2025-02-05 17:04:32 - ERROR - stderr - 26%|██▌ | 5844/22434 [6:56:51<11:23:59, 2.47s/it] +2025-02-05 17:04:32 - ERROR - stderr - +2025-02-05 17:04:32 - ERROR - stderr - +2025-02-05 17:04:32 - INFO - stdout - {'loss': 0.902, 'grad_norm': 1.0218850374221802, 'learning_rate': 1.7341264755689776e-05, 'epoch': 0.78} +2025-02-05 17:04:32 - ERROR - stderr - 26%|██▌ | 5844/22434 [6:56:51<11:23:59, 2.47s/it] +2025-02-05 17:04:34 - ERROR - stderr - 26%|██▌ | 5845/22434 [6:56:54<11:32:31, 2.50s/it] +2025-02-05 17:04:34 - ERROR - stderr - +2025-02-05 17:04:34 - ERROR - stderr - +2025-02-05 17:04:34 - INFO - stdout - {'loss': 1.0364, 'grad_norm': 1.0944106578826904, 'learning_rate': 1.734028435681356e-05, 'epoch': 0.78} +2025-02-05 17:04:34 - ERROR - stderr - 26%|██▌ | 5845/22434 [6:56:54<11:32:31, 2.50s/it] +2025-02-05 17:04:37 - ERROR - stderr - 26%|██▌ | 5846/22434 [6:56:56<11:25:32, 2.48s/it] +2025-02-05 17:04:37 - ERROR - stderr - +2025-02-05 17:04:37 - ERROR - stderr - +2025-02-05 17:04:37 - INFO - stdout - {'loss': 0.983, 'grad_norm': 1.1498346328735352, 'learning_rate': 1.7339303804936145e-05, 'epoch': 0.78} +2025-02-05 17:04:37 - ERROR - stderr - 26%|██▌ | 5846/22434 [6:56:56<11:25:32, 2.48s/it] +2025-02-05 17:04:39 - ERROR - stderr - 26%|██▌ | 5847/22434 [6:56:59<11:27:38, 
2.49s/it] +2025-02-05 17:04:39 - ERROR - stderr - +2025-02-05 17:04:39 - ERROR - stderr - +2025-02-05 17:04:39 - INFO - stdout - {'loss': 0.8816, 'grad_norm': 0.9575804471969604, 'learning_rate': 1.7338323100077962e-05, 'epoch': 0.78} +2025-02-05 17:04:39 - ERROR - stderr - 26%|██▌ | 5847/22434 [6:56:59<11:27:38, 2.49s/it] +2025-02-05 17:04:42 - ERROR - stderr - 26%|██▌ | 5848/22434 [6:57:01<11:23:49, 2.47s/it] +2025-02-05 17:04:42 - ERROR - stderr - +2025-02-05 17:04:42 - ERROR - stderr - +2025-02-05 17:04:42 - INFO - stdout - {'loss': 0.9654, 'grad_norm': 1.039419412612915, 'learning_rate': 1.7337342242259455e-05, 'epoch': 0.78} +2025-02-05 17:04:42 - ERROR - stderr - 26%|██▌ | 5848/22434 [6:57:01<11:23:49, 2.47s/it] +2025-02-05 17:04:44 - ERROR - stderr - 26%|██▌ | 5849/22434 [6:57:04<11:31:29, 2.50s/it] +2025-02-05 17:04:44 - ERROR - stderr - +2025-02-05 17:04:44 - ERROR - stderr - +2025-02-05 17:04:44 - INFO - stdout - {'loss': 0.8725, 'grad_norm': 1.0011546611785889, 'learning_rate': 1.733636123150107e-05, 'epoch': 0.78} +2025-02-05 17:04:44 - ERROR - stderr - 26%|██▌ | 5849/22434 [6:57:04<11:31:29, 2.50s/it] +2025-02-05 17:04:47 - ERROR - stderr - 26%|██▌ | 5850/22434 [6:57:06<11:24:25, 2.48s/it] +2025-02-05 17:04:47 - ERROR - stderr - +2025-02-05 17:04:47 - ERROR - stderr - +2025-02-05 17:04:47 - INFO - stdout - {'loss': 0.9797, 'grad_norm': 0.9742418527603149, 'learning_rate': 1.7335380067823258e-05, 'epoch': 0.78} +2025-02-05 17:04:47 - ERROR - stderr - 26%|██▌ | 5850/22434 [6:57:06<11:24:25, 2.48s/it] +2025-02-05 17:04:49 - ERROR - stderr - 26%|██▌ | 5851/22434 [6:57:09<11:21:56, 2.47s/it] +2025-02-05 17:04:49 - ERROR - stderr - +2025-02-05 17:04:49 - ERROR - stderr - +2025-02-05 17:04:49 - INFO - stdout - {'loss': 0.8143, 'grad_norm': 0.9383313059806824, 'learning_rate': 1.7334398751246463e-05, 'epoch': 0.78} +2025-02-05 17:04:49 - ERROR - stderr - 26%|██▌ | 5851/22434 [6:57:09<11:21:56, 2.47s/it] +2025-02-05 17:04:51 - ERROR - stderr - 26%|██▌ | 
5852/22434 [6:57:11<11:17:08, 2.45s/it] +2025-02-05 17:04:51 - ERROR - stderr - +2025-02-05 17:04:51 - ERROR - stderr - +2025-02-05 17:04:51 - INFO - stdout - {'loss': 0.8865, 'grad_norm': 1.0585530996322632, 'learning_rate': 1.733341728179115e-05, 'epoch': 0.78} +2025-02-05 17:04:51 - ERROR - stderr - 26%|██▌ | 5852/22434 [6:57:11<11:17:08, 2.45s/it] +2025-02-05 17:04:54 - ERROR - stderr - 26%|██▌ | 5853/22434 [6:57:14<11:26:35, 2.48s/it] +2025-02-05 17:04:54 - ERROR - stderr - +2025-02-05 17:04:54 - ERROR - stderr - +2025-02-05 17:04:54 - INFO - stdout - {'loss': 0.9445, 'grad_norm': 1.0603220462799072, 'learning_rate': 1.7332435659477765e-05, 'epoch': 0.78} +2025-02-05 17:04:54 - ERROR - stderr - 26%|██▌ | 5853/22434 [6:57:14<11:26:35, 2.48s/it] +2025-02-05 17:04:57 - ERROR - stderr - 26%|██▌ | 5854/22434 [6:57:16<11:30:29, 2.50s/it] +2025-02-05 17:04:57 - ERROR - stderr - +2025-02-05 17:04:57 - ERROR - stderr - +2025-02-05 17:04:57 - INFO - stdout - {'loss': 0.8455, 'grad_norm': 0.9509584903717041, 'learning_rate': 1.733145388432678e-05, 'epoch': 0.78} +2025-02-05 17:04:57 - ERROR - stderr - 26%|██▌ | 5854/22434 [6:57:16<11:30:29, 2.50s/it] +2025-02-05 17:04:59 - ERROR - stderr - 26%|██▌ | 5855/22434 [6:57:19<11:28:13, 2.49s/it] +2025-02-05 17:04:59 - ERROR - stderr - +2025-02-05 17:04:59 - ERROR - stderr - +2025-02-05 17:04:59 - INFO - stdout - {'loss': 0.9293, 'grad_norm': 1.1102031469345093, 'learning_rate': 1.7330471956358653e-05, 'epoch': 0.78} +2025-02-05 17:04:59 - ERROR - stderr - 26%|██▌ | 5855/22434 [6:57:19<11:28:13, 2.49s/it] +2025-02-05 17:05:02 - ERROR - stderr - 26%|██▌ | 5856/22434 [6:57:21<11:36:33, 2.52s/it] +2025-02-05 17:05:02 - ERROR - stderr - +2025-02-05 17:05:02 - ERROR - stderr - +2025-02-05 17:05:02 - INFO - stdout - {'loss': 0.8899, 'grad_norm': 1.098401427268982, 'learning_rate': 1.7329489875593852e-05, 'epoch': 0.78} +2025-02-05 17:05:02 - ERROR - stderr - 26%|██▌ | 5856/22434 [6:57:21<11:36:33, 2.52s/it] +2025-02-05 17:05:04 - 
ERROR - stderr - 26%|██▌ | 5857/22434 [6:57:24<11:33:03, 2.51s/it] +2025-02-05 17:05:04 - ERROR - stderr - +2025-02-05 17:05:04 - ERROR - stderr - +2025-02-05 17:05:04 - INFO - stdout - {'loss': 0.8922, 'grad_norm': 1.0150678157806396, 'learning_rate': 1.732850764205285e-05, 'epoch': 0.78} +2025-02-05 17:05:04 - ERROR - stderr - 26%|██▌ | 5857/22434 [6:57:24<11:33:03, 2.51s/it] +2025-02-05 17:05:07 - ERROR - stderr - 26%|██▌ | 5858/22434 [6:57:26<11:31:52, 2.50s/it] +2025-02-05 17:05:07 - ERROR - stderr - +2025-02-05 17:05:07 - ERROR - stderr - +2025-02-05 17:05:07 - INFO - stdout - {'loss': 0.9742, 'grad_norm': 0.9785661101341248, 'learning_rate': 1.7327525255756118e-05, 'epoch': 0.78} +2025-02-05 17:05:07 - ERROR - stderr - 26%|██▌ | 5858/22434 [6:57:26<11:31:52, 2.50s/it] +2025-02-05 17:05:09 - ERROR - stderr - 26%|██▌ | 5859/22434 [6:57:29<11:25:29, 2.48s/it] +2025-02-05 17:05:09 - ERROR - stderr - +2025-02-05 17:05:09 - ERROR - stderr - +2025-02-05 17:05:09 - INFO - stdout - {'loss': 0.983, 'grad_norm': 1.0655995607376099, 'learning_rate': 1.7326542716724127e-05, 'epoch': 0.78} +2025-02-05 17:05:09 - ERROR - stderr - 26%|██▌ | 5859/22434 [6:57:29<11:25:29, 2.48s/it] +2025-02-05 17:05:12 - ERROR - stderr - 26%|██▌ | 5860/22434 [6:57:31<11:31:54, 2.50s/it] +2025-02-05 17:05:12 - ERROR - stderr - +2025-02-05 17:05:12 - ERROR - stderr - +2025-02-05 17:05:12 - INFO - stdout - {'loss': 0.9121, 'grad_norm': 0.9597586393356323, 'learning_rate': 1.732556002497737e-05, 'epoch': 0.78} +2025-02-05 17:05:12 - ERROR - stderr - 26%|██▌ | 5860/22434 [6:57:31<11:31:54, 2.50s/it] +2025-02-05 17:05:14 - ERROR - stderr - 26%|██▌ | 5861/22434 [6:57:34<11:29:53, 2.50s/it] +2025-02-05 17:05:14 - ERROR - stderr - +2025-02-05 17:05:14 - ERROR - stderr - +2025-02-05 17:05:14 - INFO - stdout - {'loss': 0.8767, 'grad_norm': 0.9849139451980591, 'learning_rate': 1.7324577180536325e-05, 'epoch': 0.78} +2025-02-05 17:05:14 - ERROR - stderr - 26%|██▌ | 5861/22434 [6:57:34<11:29:53, 2.50s/it] 
+2025-02-05 17:05:16 - ERROR - stderr - 26%|██▌ | 5862/22434 [6:57:36<11:24:36, 2.48s/it] +2025-02-05 17:05:17 - ERROR - stderr - +2025-02-05 17:05:17 - ERROR - stderr - +2025-02-05 17:05:17 - INFO - stdout - {'loss': 0.9009, 'grad_norm': 0.9647621512413025, 'learning_rate': 1.7323594183421476e-05, 'epoch': 0.78} +2025-02-05 17:05:17 - ERROR - stderr - 26%|██▌ | 5862/22434 [6:57:36<11:24:36, 2.48s/it] +2025-02-05 17:05:19 - ERROR - stderr - 26%|██▌ | 5863/22434 [6:57:39<11:28:46, 2.49s/it] +2025-02-05 17:05:19 - ERROR - stderr - +2025-02-05 17:05:19 - ERROR - stderr - +2025-02-05 17:05:19 - INFO - stdout - {'loss': 0.8827, 'grad_norm': 1.1644455194473267, 'learning_rate': 1.7322611033653316e-05, 'epoch': 0.78} +2025-02-05 17:05:19 - ERROR - stderr - 26%|██▌ | 5863/22434 [6:57:39<11:28:46, 2.49s/it] +2025-02-05 17:05:21 - ERROR - stderr - 26%|██▌ | 5864/22434 [6:57:41<11:23:36, 2.48s/it] +2025-02-05 17:05:21 - ERROR - stderr - +2025-02-05 17:05:21 - ERROR - stderr - +2025-02-05 17:05:21 - INFO - stdout - {'loss': 1.0698, 'grad_norm': 1.057141661643982, 'learning_rate': 1.7321627731252336e-05, 'epoch': 0.78} +2025-02-05 17:05:21 - ERROR - stderr - 26%|██▌ | 5864/22434 [6:57:41<11:23:36, 2.48s/it] +2025-02-05 17:05:24 - ERROR - stderr - 26%|██▌ | 5865/22434 [6:57:44<11:23:14, 2.47s/it] +2025-02-05 17:05:24 - ERROR - stderr - +2025-02-05 17:05:24 - ERROR - stderr - +2025-02-05 17:05:24 - INFO - stdout - {'loss': 1.0372, 'grad_norm': 1.129396677017212, 'learning_rate': 1.732064427623903e-05, 'epoch': 0.78} +2025-02-05 17:05:24 - ERROR - stderr - 26%|██▌ | 5865/22434 [6:57:44<11:23:14, 2.47s/it] +2025-02-05 17:05:26 - ERROR - stderr - 26%|██▌ | 5866/22434 [6:57:46<11:24:39, 2.48s/it] +2025-02-05 17:05:26 - ERROR - stderr - +2025-02-05 17:05:26 - ERROR - stderr - +2025-02-05 17:05:26 - INFO - stdout - {'loss': 0.9073, 'grad_norm': 1.0874342918395996, 'learning_rate': 1.7319660668633897e-05, 'epoch': 0.78} +2025-02-05 17:05:26 - ERROR - stderr - 26%|██▌ | 5866/22434 
[6:57:46<11:24:39, 2.48s/it] +2025-02-05 17:05:29 - ERROR - stderr - 26%|██▌ | 5867/22434 [6:57:49<11:22:48, 2.47s/it] +2025-02-05 17:05:29 - ERROR - stderr - +2025-02-05 17:05:29 - ERROR - stderr - +2025-02-05 17:05:29 - INFO - stdout - {'loss': 1.076, 'grad_norm': 1.1351569890975952, 'learning_rate': 1.7318676908457447e-05, 'epoch': 0.78} +2025-02-05 17:05:29 - ERROR - stderr - 26%|██▌ | 5867/22434 [6:57:49<11:22:48, 2.47s/it] +2025-02-05 17:05:32 - ERROR - stderr - 26%|██▌ | 5868/22434 [6:57:51<11:41:48, 2.54s/it] +2025-02-05 17:05:32 - ERROR - stderr - +2025-02-05 17:05:32 - ERROR - stderr - +2025-02-05 17:05:32 - INFO - stdout - {'loss': 0.9703, 'grad_norm': 1.0553786754608154, 'learning_rate': 1.7317692995730174e-05, 'epoch': 0.78} +2025-02-05 17:05:32 - ERROR - stderr - 26%|██▌ | 5868/22434 [6:57:51<11:41:48, 2.54s/it] +2025-02-05 17:05:34 - ERROR - stderr - 26%|██▌ | 5869/22434 [6:57:54<11:39:36, 2.53s/it] +2025-02-05 17:05:34 - ERROR - stderr - +2025-02-05 17:05:34 - ERROR - stderr - +2025-02-05 17:05:34 - INFO - stdout - {'loss': 0.8443, 'grad_norm': 1.2016065120697021, 'learning_rate': 1.7316708930472596e-05, 'epoch': 0.78} +2025-02-05 17:05:34 - ERROR - stderr - 26%|██▌ | 5869/22434 [6:57:54<11:39:36, 2.53s/it] +2025-02-05 17:05:37 - ERROR - stderr - 26%|██▌ | 5870/22434 [6:57:56<11:32:02, 2.51s/it] +2025-02-05 17:05:37 - ERROR - stderr - +2025-02-05 17:05:37 - ERROR - stderr - +2025-02-05 17:05:37 - INFO - stdout - {'loss': 1.0887, 'grad_norm': 1.0746028423309326, 'learning_rate': 1.731572471270522e-05, 'epoch': 0.78} +2025-02-05 17:05:37 - ERROR - stderr - 26%|██▌ | 5870/22434 [6:57:56<11:32:02, 2.51s/it] +2025-02-05 17:05:39 - ERROR - stderr - 26%|██▌ | 5871/22434 [6:57:59<11:27:44, 2.49s/it] +2025-02-05 17:05:39 - ERROR - stderr - +2025-02-05 17:05:39 - ERROR - stderr - +2025-02-05 17:05:39 - INFO - stdout - {'loss': 0.866, 'grad_norm': 0.981548547744751, 'learning_rate': 1.7314740342448565e-05, 'epoch': 0.79} +2025-02-05 17:05:39 - ERROR - stderr - 
26%|██▌ | 5871/22434 [6:57:59<11:27:44, 2.49s/it] +2025-02-05 17:05:42 - ERROR - stderr - 26%|██▌ | 5872/22434 [6:58:01<11:32:46, 2.51s/it] +2025-02-05 17:05:42 - ERROR - stderr - +2025-02-05 17:05:42 - ERROR - stderr - +2025-02-05 17:05:42 - INFO - stdout - {'loss': 1.0798, 'grad_norm': 1.1151477098464966, 'learning_rate': 1.731375581972315e-05, 'epoch': 0.79} +2025-02-05 17:05:42 - ERROR - stderr - 26%|██▌ | 5872/22434 [6:58:01<11:32:46, 2.51s/it] +2025-02-05 17:05:44 - ERROR - stderr - 26%|██▌ | 5873/22434 [6:58:04<11:38:19, 2.53s/it] +2025-02-05 17:05:44 - ERROR - stderr - +2025-02-05 17:05:44 - ERROR - stderr - +2025-02-05 17:05:44 - INFO - stdout - {'loss': 1.0079, 'grad_norm': 1.1292221546173096, 'learning_rate': 1.7312771144549488e-05, 'epoch': 0.79} +2025-02-05 17:05:44 - ERROR - stderr - 26%|██▌ | 5873/22434 [6:58:04<11:38:19, 2.53s/it] +2025-02-05 17:05:47 - ERROR - stderr - 26%|██▌ | 5874/22434 [6:58:06<11:35:31, 2.52s/it] +2025-02-05 17:05:47 - ERROR - stderr - +2025-02-05 17:05:47 - ERROR - stderr - +2025-02-05 17:05:47 - INFO - stdout - {'loss': 1.0172, 'grad_norm': 1.0944479703903198, 'learning_rate': 1.7311786316948112e-05, 'epoch': 0.79} +2025-02-05 17:05:47 - ERROR - stderr - 26%|██▌ | 5874/22434 [6:58:06<11:35:31, 2.52s/it] +2025-02-05 17:05:49 - ERROR - stderr - 26%|██▌ | 5875/22434 [6:58:09<11:41:47, 2.54s/it] +2025-02-05 17:05:49 - ERROR - stderr - +2025-02-05 17:05:49 - ERROR - stderr - +2025-02-05 17:05:49 - INFO - stdout - {'loss': 0.8997, 'grad_norm': 1.0610533952713013, 'learning_rate': 1.7310801336939542e-05, 'epoch': 0.79} +2025-02-05 17:05:49 - ERROR - stderr - 26%|██▌ | 5875/22434 [6:58:09<11:41:47, 2.54s/it] +2025-02-05 17:05:52 - ERROR - stderr - 26%|██▌ | 5876/22434 [6:58:12<12:05:30, 2.63s/it] +2025-02-05 17:05:52 - ERROR - stderr - +2025-02-05 17:05:52 - ERROR - stderr - +2025-02-05 17:05:52 - INFO - stdout - {'loss': 0.8724, 'grad_norm': 1.0645579099655151, 'learning_rate': 1.730981620454432e-05, 'epoch': 0.79} +2025-02-05 
17:05:52 - ERROR - stderr - 26%|██▌ | 5876/22434 [6:58:12<12:05:30, 2.63s/it] +2025-02-05 17:05:55 - ERROR - stderr - 26%|██▌ | 5877/22434 [6:58:14<12:02:13, 2.62s/it] +2025-02-05 17:05:55 - ERROR - stderr - +2025-02-05 17:05:55 - ERROR - stderr - +2025-02-05 17:05:55 - INFO - stdout - {'loss': 0.9395, 'grad_norm': 1.1806964874267578, 'learning_rate': 1.7308830919782972e-05, 'epoch': 0.79} +2025-02-05 17:05:55 - ERROR - stderr - 26%|██▌ | 5877/22434 [6:58:14<12:02:13, 2.62s/it] +2025-02-05 17:05:57 - ERROR - stderr - 26%|██▌ | 5878/22434 [6:58:17<11:53:21, 2.59s/it] +2025-02-05 17:05:57 - ERROR - stderr - +2025-02-05 17:05:57 - ERROR - stderr - +2025-02-05 17:05:57 - INFO - stdout - {'loss': 0.9602, 'grad_norm': 1.1036674976348877, 'learning_rate': 1.7307845482676033e-05, 'epoch': 0.79} +2025-02-05 17:05:57 - ERROR - stderr - 26%|██▌ | 5878/22434 [6:58:17<11:53:21, 2.59s/it] +2025-02-05 17:06:00 - ERROR - stderr - 26%|██▌ | 5879/22434 [6:58:19<11:53:32, 2.59s/it] +2025-02-05 17:06:00 - ERROR - stderr - +2025-02-05 17:06:00 - ERROR - stderr - +2025-02-05 17:06:00 - INFO - stdout - {'loss': 0.9046, 'grad_norm': 1.0884637832641602, 'learning_rate': 1.7306859893244056e-05, 'epoch': 0.79} +2025-02-05 17:06:00 - ERROR - stderr - 26%|██▌ | 5879/22434 [6:58:20<11:53:32, 2.59s/it] +2025-02-05 17:06:02 - ERROR - stderr - 26%|██▌ | 5880/22434 [6:58:22<11:40:49, 2.54s/it] +2025-02-05 17:06:02 - ERROR - stderr - +2025-02-05 17:06:02 - ERROR - stderr - +2025-02-05 17:06:02 - INFO - stdout - {'loss': 0.8987, 'grad_norm': 1.0975658893585205, 'learning_rate': 1.730587415150757e-05, 'epoch': 0.79} +2025-02-05 17:06:02 - ERROR - stderr - 26%|██▌ | 5880/22434 [6:58:22<11:40:49, 2.54s/it] +2025-02-05 17:06:05 - ERROR - stderr - 26%|██▌ | 5881/22434 [6:58:24<11:39:44, 2.54s/it] +2025-02-05 17:06:05 - ERROR - stderr - +2025-02-05 17:06:05 - ERROR - stderr - +2025-02-05 17:06:05 - INFO - stdout - {'loss': 1.0424, 'grad_norm': 1.2087692022323608, 'learning_rate': 1.7304888257487128e-05, 
'epoch': 0.79} +2025-02-05 17:06:05 - ERROR - stderr - 26%|██▌ | 5881/22434 [6:58:25<11:39:44, 2.54s/it] +2025-02-05 17:06:07 - ERROR - stderr - 26%|██▌ | 5882/22434 [6:58:27<11:38:52, 2.53s/it] +2025-02-05 17:06:07 - ERROR - stderr - +2025-02-05 17:06:07 - ERROR - stderr - +2025-02-05 17:06:07 - INFO - stdout - {'loss': 1.0312, 'grad_norm': 1.114935278892517, 'learning_rate': 1.7303902211203282e-05, 'epoch': 0.79} +2025-02-05 17:06:07 - ERROR - stderr - 26%|██▌ | 5882/22434 [6:58:27<11:38:52, 2.53s/it] +2025-02-05 17:06:10 - ERROR - stderr - 26%|██▌ | 5883/22434 [6:58:30<12:04:38, 2.63s/it] +2025-02-05 17:06:10 - ERROR - stderr - +2025-02-05 17:06:10 - ERROR - stderr - +2025-02-05 17:06:10 - INFO - stdout - {'loss': 1.014, 'grad_norm': 1.0774348974227905, 'learning_rate': 1.7302916012676587e-05, 'epoch': 0.79} +2025-02-05 17:06:10 - ERROR - stderr - 26%|██▌ | 5883/22434 [6:58:30<12:04:38, 2.63s/it] +2025-02-05 17:06:13 - ERROR - stderr - 26%|██▌ | 5884/22434 [6:58:32<11:56:47, 2.60s/it] +2025-02-05 17:06:13 - ERROR - stderr - +2025-02-05 17:06:13 - ERROR - stderr - +2025-02-05 17:06:13 - INFO - stdout - {'loss': 0.9218, 'grad_norm': 1.0701504945755005, 'learning_rate': 1.730192966192759e-05, 'epoch': 0.79} +2025-02-05 17:06:13 - ERROR - stderr - 26%|██▌ | 5884/22434 [6:58:32<11:56:47, 2.60s/it] +2025-02-05 17:06:15 - ERROR - stderr - 26%|██▌ | 5885/22434 [6:58:35<11:45:25, 2.56s/it] +2025-02-05 17:06:15 - ERROR - stderr - +2025-02-05 17:06:15 - ERROR - stderr - +2025-02-05 17:06:15 - INFO - stdout - {'loss': 1.0027, 'grad_norm': 1.119737982749939, 'learning_rate': 1.7300943158976863e-05, 'epoch': 0.79} +2025-02-05 17:06:15 - ERROR - stderr - 26%|██▌ | 5885/22434 [6:58:35<11:45:25, 2.56s/it] +2025-02-05 17:06:18 - ERROR - stderr - 26%|██▌ | 5886/22434 [6:58:37<11:40:21, 2.54s/it] +2025-02-05 17:06:18 - ERROR - stderr - +2025-02-05 17:06:18 - ERROR - stderr - +2025-02-05 17:06:18 - INFO - stdout - {'loss': 0.9071, 'grad_norm': 0.9682656526565552, 'learning_rate': 
1.7299956503844955e-05, 'epoch': 0.79} +2025-02-05 17:06:18 - ERROR - stderr - 26%|██▌ | 5886/22434 [6:58:37<11:40:21, 2.54s/it] +2025-02-05 17:06:20 - ERROR - stderr - 26%|██▌ | 5887/22434 [6:58:40<11:32:08, 2.51s/it] +2025-02-05 17:06:20 - ERROR - stderr - +2025-02-05 17:06:20 - ERROR - stderr - +2025-02-05 17:06:20 - INFO - stdout - {'loss': 0.9025, 'grad_norm': 1.1441692113876343, 'learning_rate': 1.7298969696552442e-05, 'epoch': 0.79} +2025-02-05 17:06:20 - ERROR - stderr - 26%|██▌ | 5887/22434 [6:58:40<11:32:08, 2.51s/it] +2025-02-05 17:06:23 - ERROR - stderr - 26%|██▌ | 5888/22434 [6:58:42<11:34:59, 2.52s/it] +2025-02-05 17:06:23 - ERROR - stderr - +2025-02-05 17:06:23 - ERROR - stderr - +2025-02-05 17:06:23 - INFO - stdout - {'loss': 0.973, 'grad_norm': 1.169907808303833, 'learning_rate': 1.729798273711989e-05, 'epoch': 0.79} +2025-02-05 17:06:23 - ERROR - stderr - 26%|██▌ | 5888/22434 [6:58:42<11:34:59, 2.52s/it] +2025-02-05 17:06:25 - ERROR - stderr - 26%|██▋ | 5889/22434 [6:58:45<11:31:58, 2.51s/it] +2025-02-05 17:06:25 - ERROR - stderr - +2025-02-05 17:06:25 - ERROR - stderr - +2025-02-05 17:06:25 - INFO - stdout - {'loss': 0.9467, 'grad_norm': 1.281720757484436, 'learning_rate': 1.7296995625567872e-05, 'epoch': 0.79} +2025-02-05 17:06:25 - ERROR - stderr - 26%|██▋ | 5889/22434 [6:58:45<11:31:58, 2.51s/it] +2025-02-05 17:06:28 - ERROR - stderr - 26%|██▋ | 5890/22434 [6:58:47<11:33:56, 2.52s/it] +2025-02-05 17:06:28 - ERROR - stderr - +2025-02-05 17:06:28 - ERROR - stderr - +2025-02-05 17:06:28 - INFO - stdout - {'loss': 0.9103, 'grad_norm': 1.0011168718338013, 'learning_rate': 1.729600836191696e-05, 'epoch': 0.79} +2025-02-05 17:06:28 - ERROR - stderr - 26%|██▋ | 5890/22434 [6:58:47<11:33:56, 2.52s/it] +2025-02-05 17:06:30 - ERROR - stderr - 26%|██▋ | 5891/22434 [6:58:50<11:34:05, 2.52s/it] +2025-02-05 17:06:30 - ERROR - stderr - +2025-02-05 17:06:30 - ERROR - stderr - +2025-02-05 17:06:30 - INFO - stdout - {'loss': 0.915, 'grad_norm': 
1.0064868927001953, 'learning_rate': 1.729502094618774e-05, 'epoch': 0.79} +2025-02-05 17:06:30 - ERROR - stderr - 26%|██▋ | 5891/22434 [6:58:50<11:34:05, 2.52s/it] +2025-02-05 17:06:33 - ERROR - stderr - 26%|██▋ | 5892/22434 [6:58:52<11:34:54, 2.52s/it] +2025-02-05 17:06:33 - ERROR - stderr - +2025-02-05 17:06:33 - ERROR - stderr - +2025-02-05 17:06:33 - INFO - stdout - {'loss': 0.8785, 'grad_norm': 1.0504189729690552, 'learning_rate': 1.7294033378400786e-05, 'epoch': 0.79} +2025-02-05 17:06:33 - ERROR - stderr - 26%|██▋ | 5892/22434 [6:58:52<11:34:54, 2.52s/it] +2025-02-05 17:06:35 - ERROR - stderr - 26%|██▋ | 5893/22434 [6:58:55<11:46:28, 2.56s/it] +2025-02-05 17:06:35 - ERROR - stderr - +2025-02-05 17:06:35 - ERROR - stderr - +2025-02-05 17:06:35 - INFO - stdout - {'loss': 0.9442, 'grad_norm': 1.0779844522476196, 'learning_rate': 1.7293045658576687e-05, 'epoch': 0.79} +2025-02-05 17:06:35 - ERROR - stderr - 26%|██▋ | 5893/22434 [6:58:55<11:46:28, 2.56s/it] +2025-02-05 17:06:38 - ERROR - stderr - 26%|██▋ | 5894/22434 [6:58:57<11:38:35, 2.53s/it] +2025-02-05 17:06:38 - ERROR - stderr - +2025-02-05 17:06:38 - ERROR - stderr - +2025-02-05 17:06:38 - INFO - stdout - {'loss': 0.869, 'grad_norm': 1.0728856325149536, 'learning_rate': 1.729205778673603e-05, 'epoch': 0.79} +2025-02-05 17:06:38 - ERROR - stderr - 26%|██▋ | 5894/22434 [6:58:58<11:38:35, 2.53s/it] +2025-02-05 17:06:40 - ERROR - stderr - 26%|██▋ | 5895/22434 [6:59:00<11:38:21, 2.53s/it] +2025-02-05 17:06:40 - ERROR - stderr - +2025-02-05 17:06:40 - ERROR - stderr - +2025-02-05 17:06:40 - INFO - stdout - {'loss': 0.8884, 'grad_norm': 1.02186918258667, 'learning_rate': 1.7291069762899404e-05, 'epoch': 0.79} +2025-02-05 17:06:40 - ERROR - stderr - 26%|██▋ | 5895/22434 [6:59:00<11:38:21, 2.53s/it] +2025-02-05 17:06:43 - ERROR - stderr - 26%|██▋ | 5896/22434 [6:59:03<11:33:29, 2.52s/it] +2025-02-05 17:06:43 - ERROR - stderr - +2025-02-05 17:06:43 - ERROR - stderr - +2025-02-05 17:06:43 - INFO - stdout - {'loss': 
0.8941, 'grad_norm': 1.074196219444275, 'learning_rate': 1.7290081587087406e-05, 'epoch': 0.79} +2025-02-05 17:06:43 - ERROR - stderr - 26%|██▋ | 5896/22434 [6:59:03<11:33:29, 2.52s/it] +2025-02-05 17:06:45 - ERROR - stderr - 26%|██▋ | 5897/22434 [6:59:05<11:31:00, 2.51s/it] +2025-02-05 17:06:45 - ERROR - stderr - +2025-02-05 17:06:45 - ERROR - stderr - +2025-02-05 17:06:45 - INFO - stdout - {'loss': 0.926, 'grad_norm': 1.127129077911377, 'learning_rate': 1.7289093259320635e-05, 'epoch': 0.79} +2025-02-05 17:06:45 - ERROR - stderr - 26%|██▋ | 5897/22434 [6:59:05<11:31:00, 2.51s/it] +2025-02-05 17:06:48 - ERROR - stderr - 26%|██▋ | 5898/22434 [6:59:08<11:35:23, 2.52s/it] +2025-02-05 17:06:48 - ERROR - stderr - +2025-02-05 17:06:48 - ERROR - stderr - +2025-02-05 17:06:48 - INFO - stdout - {'loss': 0.8504, 'grad_norm': 1.024257779121399, 'learning_rate': 1.7288104779619688e-05, 'epoch': 0.79} +2025-02-05 17:06:48 - ERROR - stderr - 26%|██▋ | 5898/22434 [6:59:08<11:35:23, 2.52s/it] +2025-02-05 17:06:50 - ERROR - stderr - 26%|██▋ | 5899/22434 [6:59:10<11:37:17, 2.53s/it] +2025-02-05 17:06:50 - ERROR - stderr - +2025-02-05 17:06:50 - ERROR - stderr - +2025-02-05 17:06:50 - INFO - stdout - {'loss': 0.877, 'grad_norm': 1.0059282779693604, 'learning_rate': 1.7287116148005173e-05, 'epoch': 0.79} +2025-02-05 17:06:50 - ERROR - stderr - 26%|██▋ | 5899/22434 [6:59:10<11:37:17, 2.53s/it] +2025-02-05 17:06:53 - ERROR - stderr - 26%|██▋ | 5900/22434 [6:59:13<11:29:30, 2.50s/it] +2025-02-05 17:06:53 - ERROR - stderr - +2025-02-05 17:06:53 - ERROR - stderr - +2025-02-05 17:06:53 - INFO - stdout - {'loss': 0.9255, 'grad_norm': 1.1229854822158813, 'learning_rate': 1.7286127364497692e-05, 'epoch': 0.79} +2025-02-05 17:06:53 - ERROR - stderr - 26%|██▋ | 5900/22434 [6:59:13<11:29:30, 2.50s/it] +2025-02-05 17:06:55 - ERROR - stderr - 26%|██▋ | 5901/22434 [6:59:15<11:30:41, 2.51s/it] +2025-02-05 17:06:55 - ERROR - stderr - +2025-02-05 17:06:55 - ERROR - stderr - +2025-02-05 17:06:55 - INFO 
- stdout - {'loss': 1.0296, 'grad_norm': 1.1694836616516113, 'learning_rate': 1.728513842911786e-05, 'epoch': 0.79} +2025-02-05 17:06:55 - ERROR - stderr - 26%|██▋ | 5901/22434 [6:59:15<11:30:41, 2.51s/it] +2025-02-05 17:06:58 - ERROR - stderr - 26%|██▋ | 5902/22434 [6:59:18<11:31:22, 2.51s/it] +2025-02-05 17:06:58 - ERROR - stderr - +2025-02-05 17:06:58 - ERROR - stderr - +2025-02-05 17:06:58 - INFO - stdout - {'loss': 0.845, 'grad_norm': 0.9748122692108154, 'learning_rate': 1.7284149341886286e-05, 'epoch': 0.79} +2025-02-05 17:06:58 - ERROR - stderr - 26%|██▋ | 5902/22434 [6:59:18<11:31:22, 2.51s/it] +2025-02-05 17:07:00 - ERROR - stderr - 26%|██▋ | 5903/22434 [6:59:20<11:33:10, 2.52s/it] +2025-02-05 17:07:00 - ERROR - stderr - +2025-02-05 17:07:00 - ERROR - stderr - +2025-02-05 17:07:00 - INFO - stdout - {'loss': 1.0101, 'grad_norm': 1.0393608808517456, 'learning_rate': 1.7283160102823594e-05, 'epoch': 0.79} +2025-02-05 17:07:00 - ERROR - stderr - 26%|██▋ | 5903/22434 [6:59:20<11:33:10, 2.52s/it] +2025-02-05 17:07:03 - ERROR - stderr - 26%|██▋ | 5904/22434 [6:59:23<11:34:31, 2.52s/it] +2025-02-05 17:07:03 - ERROR - stderr - +2025-02-05 17:07:03 - ERROR - stderr - +2025-02-05 17:07:03 - INFO - stdout - {'loss': 0.8974, 'grad_norm': 1.0212371349334717, 'learning_rate': 1.7282170711950396e-05, 'epoch': 0.79} +2025-02-05 17:07:03 - ERROR - stderr - 26%|██▋ | 5904/22434 [6:59:23<11:34:31, 2.52s/it] +2025-02-05 17:07:05 - ERROR - stderr - 26%|██▋ | 5905/22434 [6:59:25<11:28:33, 2.50s/it] +2025-02-05 17:07:05 - ERROR - stderr - +2025-02-05 17:07:05 - ERROR - stderr - +2025-02-05 17:07:05 - INFO - stdout - {'loss': 0.9799, 'grad_norm': 1.131479263305664, 'learning_rate': 1.7281181169287318e-05, 'epoch': 0.79} +2025-02-05 17:07:05 - ERROR - stderr - 26%|██▋ | 5905/22434 [6:59:25<11:28:33, 2.50s/it] +2025-02-05 17:07:08 - ERROR - stderr - 26%|██▋ | 5906/22434 [6:59:28<11:30:31, 2.51s/it] +2025-02-05 17:07:08 - ERROR - stderr - +2025-02-05 17:07:08 - ERROR - stderr - 
+2025-02-05 17:07:08 - INFO - stdout - {'loss': 0.9808, 'grad_norm': 1.0069595575332642, 'learning_rate': 1.7280191474854988e-05, 'epoch': 0.79} +2025-02-05 17:07:08 - ERROR - stderr - 26%|██▋ | 5906/22434 [6:59:28<11:30:31, 2.51s/it] +2025-02-05 17:07:10 - ERROR - stderr - 26%|██▋ | 5907/22434 [6:59:30<11:31:06, 2.51s/it] +2025-02-05 17:07:10 - ERROR - stderr - +2025-02-05 17:07:10 - ERROR - stderr - +2025-02-05 17:07:10 - INFO - stdout - {'loss': 1.0175, 'grad_norm': 1.0685888528823853, 'learning_rate': 1.7279201628674028e-05, 'epoch': 0.79} +2025-02-05 17:07:10 - ERROR - stderr - 26%|██▋ | 5907/22434 [6:59:30<11:31:06, 2.51s/it] +2025-02-05 17:07:13 - ERROR - stderr - 26%|██▋ | 5908/22434 [6:59:33<11:27:26, 2.50s/it] +2025-02-05 17:07:13 - ERROR - stderr - +2025-02-05 17:07:13 - ERROR - stderr - +2025-02-05 17:07:13 - INFO - stdout - {'loss': 0.9228, 'grad_norm': 0.9918084144592285, 'learning_rate': 1.727821163076508e-05, 'epoch': 0.79} +2025-02-05 17:07:13 - ERROR - stderr - 26%|██▋ | 5908/22434 [6:59:33<11:27:26, 2.50s/it] +2025-02-05 17:07:15 - ERROR - stderr - 26%|██▋ | 5909/22434 [6:59:35<11:22:44, 2.48s/it] +2025-02-05 17:07:15 - ERROR - stderr - +2025-02-05 17:07:15 - ERROR - stderr - +2025-02-05 17:07:15 - INFO - stdout - {'loss': 0.9198, 'grad_norm': 0.9413108825683594, 'learning_rate': 1.7277221481148774e-05, 'epoch': 0.79} +2025-02-05 17:07:15 - ERROR - stderr - 26%|██▋ | 5909/22434 [6:59:35<11:22:44, 2.48s/it] +2025-02-05 17:07:18 - ERROR - stderr - 26%|██▋ | 5910/22434 [6:59:38<11:30:35, 2.51s/it] +2025-02-05 17:07:18 - ERROR - stderr - +2025-02-05 17:07:18 - ERROR - stderr - +2025-02-05 17:07:18 - INFO - stdout - {'loss': 0.8837, 'grad_norm': 1.0364792346954346, 'learning_rate': 1.727623117984575e-05, 'epoch': 0.79} +2025-02-05 17:07:18 - ERROR - stderr - 26%|██▋ | 5910/22434 [6:59:38<11:30:35, 2.51s/it] +2025-02-05 17:07:20 - ERROR - stderr - 26%|██▋ | 5911/22434 [6:59:40<11:29:24, 2.50s/it] +2025-02-05 17:07:20 - ERROR - stderr - +2025-02-05 
17:07:20 - ERROR - stderr - +2025-02-05 17:07:20 - INFO - stdout - {'loss': 1.0836, 'grad_norm': 1.1601110696792603, 'learning_rate': 1.727524072687665e-05, 'epoch': 0.79} +2025-02-05 17:07:20 - ERROR - stderr - 26%|██▋ | 5911/22434 [6:59:40<11:29:24, 2.50s/it] +2025-02-05 17:07:23 - ERROR - stderr - 26%|██▋ | 5912/22434 [6:59:43<11:32:41, 2.52s/it] +2025-02-05 17:07:23 - ERROR - stderr - +2025-02-05 17:07:23 - ERROR - stderr - +2025-02-05 17:07:23 - INFO - stdout - {'loss': 0.9599, 'grad_norm': 1.0005912780761719, 'learning_rate': 1.7274250122262116e-05, 'epoch': 0.79} +2025-02-05 17:07:23 - ERROR - stderr - 26%|██▋ | 5912/22434 [6:59:43<11:32:41, 2.52s/it] +2025-02-05 17:07:25 - ERROR - stderr - 26%|██▋ | 5913/22434 [6:59:45<11:34:44, 2.52s/it] +2025-02-05 17:07:25 - ERROR - stderr - +2025-02-05 17:07:25 - ERROR - stderr - +2025-02-05 17:07:25 - INFO - stdout - {'loss': 0.8913, 'grad_norm': 1.0677276849746704, 'learning_rate': 1.7273259366022802e-05, 'epoch': 0.79} +2025-02-05 17:07:25 - ERROR - stderr - 26%|██▋ | 5913/22434 [6:59:45<11:34:44, 2.52s/it] +2025-02-05 17:07:28 - ERROR - stderr - 26%|██▋ | 5914/22434 [6:59:48<11:29:50, 2.51s/it] +2025-02-05 17:07:28 - ERROR - stderr - +2025-02-05 17:07:28 - ERROR - stderr - +2025-02-05 17:07:28 - INFO - stdout - {'loss': 0.9278, 'grad_norm': 1.0820367336273193, 'learning_rate': 1.7272268458179352e-05, 'epoch': 0.79} +2025-02-05 17:07:28 - ERROR - stderr - 26%|██▋ | 5914/22434 [6:59:48<11:29:50, 2.51s/it] +2025-02-05 17:07:30 - ERROR - stderr - 26%|██▋ | 5915/22434 [6:59:50<11:37:58, 2.54s/it] +2025-02-05 17:07:31 - ERROR - stderr - +2025-02-05 17:07:31 - ERROR - stderr - +2025-02-05 17:07:31 - INFO - stdout - {'loss': 0.8708, 'grad_norm': 1.1510486602783203, 'learning_rate': 1.727127739875243e-05, 'epoch': 0.79} +2025-02-05 17:07:31 - ERROR - stderr - 26%|██▋ | 5915/22434 [6:59:50<11:37:58, 2.54s/it] +2025-02-05 17:07:33 - ERROR - stderr - 26%|██▋ | 5916/22434 [6:59:53<11:37:09, 2.53s/it] +2025-02-05 17:07:33 - ERROR 
- stderr - +2025-02-05 17:07:33 - ERROR - stderr - +2025-02-05 17:07:33 - INFO - stdout - {'loss': 0.8709, 'grad_norm': 1.0579713582992554, 'learning_rate': 1.7270286187762686e-05, 'epoch': 0.79} +2025-02-05 17:07:33 - ERROR - stderr - 26%|██▋ | 5916/22434 [6:59:53<11:37:09, 2.53s/it] +2025-02-05 17:07:36 - ERROR - stderr - 26%|██▋ | 5917/22434 [6:59:55<11:35:14, 2.53s/it] +2025-02-05 17:07:36 - ERROR - stderr - +2025-02-05 17:07:36 - ERROR - stderr - +2025-02-05 17:07:36 - INFO - stdout - {'loss': 0.9742, 'grad_norm': 1.0919411182403564, 'learning_rate': 1.7269294825230784e-05, 'epoch': 0.79} +2025-02-05 17:07:36 - ERROR - stderr - 26%|██▋ | 5917/22434 [6:59:55<11:35:14, 2.53s/it] +2025-02-05 17:07:38 - ERROR - stderr - 26%|██▋ | 5918/22434 [6:59:58<12:01:13, 2.62s/it] +2025-02-05 17:07:38 - ERROR - stderr - +2025-02-05 17:07:38 - ERROR - stderr - +2025-02-05 17:07:38 - INFO - stdout - {'loss': 0.9494, 'grad_norm': 1.0626649856567383, 'learning_rate': 1.7268303311177387e-05, 'epoch': 0.79} +2025-02-05 17:07:38 - ERROR - stderr - 26%|██▋ | 5918/22434 [6:59:58<12:01:13, 2.62s/it] +2025-02-05 17:07:41 - ERROR - stderr - 26%|██▋ | 5919/22434 [7:00:01<11:51:09, 2.58s/it] +2025-02-05 17:07:41 - ERROR - stderr - +2025-02-05 17:07:41 - ERROR - stderr - +2025-02-05 17:07:41 - INFO - stdout - {'loss': 1.0083, 'grad_norm': 0.970781147480011, 'learning_rate': 1.7267311645623163e-05, 'epoch': 0.79} +2025-02-05 17:07:41 - ERROR - stderr - 26%|██▋ | 5919/22434 [7:00:01<11:51:09, 2.58s/it] +2025-02-05 17:07:44 - ERROR - stderr - 26%|██▋ | 5920/22434 [7:00:03<12:11:07, 2.66s/it] +2025-02-05 17:07:44 - ERROR - stderr - +2025-02-05 17:07:44 - ERROR - stderr - +2025-02-05 17:07:44 - INFO - stdout - {'loss': 0.9503, 'grad_norm': 1.118196725845337, 'learning_rate': 1.726631982858878e-05, 'epoch': 0.79} +2025-02-05 17:07:44 - ERROR - stderr - 26%|██▋ | 5920/22434 [7:00:03<12:11:07, 2.66s/it] +2025-02-05 17:07:46 - ERROR - stderr - 26%|██▋ | 5921/22434 [7:00:06<11:55:50, 2.60s/it] 
+2025-02-05 17:07:46 - ERROR - stderr - +2025-02-05 17:07:46 - ERROR - stderr - +2025-02-05 17:07:46 - INFO - stdout - {'loss': 1.0777, 'grad_norm': 1.153403401374817, 'learning_rate': 1.7265327860094916e-05, 'epoch': 0.79} +2025-02-05 17:07:46 - ERROR - stderr - 26%|██▋ | 5921/22434 [7:00:06<11:55:50, 2.60s/it] +2025-02-05 17:07:49 - ERROR - stderr - 26%|██▋ | 5922/22434 [7:00:08<11:45:36, 2.56s/it] +2025-02-05 17:07:49 - ERROR - stderr - +2025-02-05 17:07:49 - ERROR - stderr - +2025-02-05 17:07:49 - INFO - stdout - {'loss': 0.8602, 'grad_norm': 0.9938598871231079, 'learning_rate': 1.7264335740162244e-05, 'epoch': 0.79} +2025-02-05 17:07:49 - ERROR - stderr - 26%|██▋ | 5922/22434 [7:00:08<11:45:36, 2.56s/it] +2025-02-05 17:07:51 - ERROR - stderr - 26%|██▋ | 5923/22434 [7:00:11<11:39:17, 2.54s/it] +2025-02-05 17:07:51 - ERROR - stderr - +2025-02-05 17:07:51 - ERROR - stderr - +2025-02-05 17:07:51 - INFO - stdout - {'loss': 1.028, 'grad_norm': 1.1479504108428955, 'learning_rate': 1.7263343468811444e-05, 'epoch': 0.79} +2025-02-05 17:07:51 - ERROR - stderr - 26%|██▋ | 5923/22434 [7:00:11<11:39:17, 2.54s/it] +2025-02-05 17:07:54 - ERROR - stderr - 26%|██▋ | 5924/22434 [7:00:13<11:38:11, 2.54s/it] +2025-02-05 17:07:54 - ERROR - stderr - +2025-02-05 17:07:54 - ERROR - stderr - +2025-02-05 17:07:54 - INFO - stdout - {'loss': 0.9505, 'grad_norm': 1.1931774616241455, 'learning_rate': 1.72623510460632e-05, 'epoch': 0.79} +2025-02-05 17:07:54 - ERROR - stderr - 26%|██▋ | 5924/22434 [7:00:13<11:38:11, 2.54s/it] +2025-02-05 17:07:56 - ERROR - stderr - 26%|██▋ | 5925/22434 [7:00:16<11:35:11, 2.53s/it] +2025-02-05 17:07:56 - ERROR - stderr - +2025-02-05 17:07:56 - ERROR - stderr - +2025-02-05 17:07:56 - INFO - stdout - {'loss': 0.8999, 'grad_norm': 1.0811222791671753, 'learning_rate': 1.7261358471938195e-05, 'epoch': 0.79} +2025-02-05 17:07:56 - ERROR - stderr - 26%|██▋ | 5925/22434 [7:00:16<11:35:11, 2.53s/it] +2025-02-05 17:07:59 - ERROR - stderr - 26%|██▋ | 5926/22434 
[7:00:18<11:32:36, 2.52s/it] +2025-02-05 17:07:59 - ERROR - stderr - +2025-02-05 17:07:59 - ERROR - stderr - +2025-02-05 17:07:59 - INFO - stdout - {'loss': 0.892, 'grad_norm': 1.014931082725525, 'learning_rate': 1.7260365746457125e-05, 'epoch': 0.79} +2025-02-05 17:07:59 - ERROR - stderr - 26%|██▋ | 5926/22434 [7:00:18<11:32:36, 2.52s/it] +2025-02-05 17:08:01 - ERROR - stderr - 26%|██▋ | 5927/22434 [7:00:21<11:36:45, 2.53s/it] +2025-02-05 17:08:01 - ERROR - stderr - +2025-02-05 17:08:01 - ERROR - stderr - +2025-02-05 17:08:01 - INFO - stdout - {'loss': 0.8746, 'grad_norm': 0.9597230553627014, 'learning_rate': 1.725937286964068e-05, 'epoch': 0.79} +2025-02-05 17:08:01 - ERROR - stderr - 26%|██▋ | 5927/22434 [7:00:21<11:36:45, 2.53s/it] +2025-02-05 17:08:04 - ERROR - stderr - 26%|██▋ | 5928/22434 [7:00:24<11:41:49, 2.55s/it] +2025-02-05 17:08:04 - ERROR - stderr - +2025-02-05 17:08:04 - ERROR - stderr - +2025-02-05 17:08:04 - INFO - stdout - {'loss': 0.7494, 'grad_norm': 0.9802173972129822, 'learning_rate': 1.725837984150955e-05, 'epoch': 0.79} +2025-02-05 17:08:04 - ERROR - stderr - 26%|██▋ | 5928/22434 [7:00:24<11:41:49, 2.55s/it] +2025-02-05 17:08:06 - ERROR - stderr - 26%|██▋ | 5929/22434 [7:00:26<11:38:15, 2.54s/it] +2025-02-05 17:08:06 - ERROR - stderr - +2025-02-05 17:08:06 - ERROR - stderr - +2025-02-05 17:08:06 - INFO - stdout - {'loss': 0.8316, 'grad_norm': 1.0733377933502197, 'learning_rate': 1.7257386662084435e-05, 'epoch': 0.79} +2025-02-05 17:08:06 - ERROR - stderr - 26%|██▋ | 5929/22434 [7:00:26<11:38:15, 2.54s/it] +2025-02-05 17:08:09 - ERROR - stderr - 26%|██▋ | 5930/22434 [7:00:29<11:38:30, 2.54s/it] +2025-02-05 17:08:09 - ERROR - stderr - +2025-02-05 17:08:09 - ERROR - stderr - +2025-02-05 17:08:09 - INFO - stdout - {'loss': 1.0157, 'grad_norm': 1.0939191579818726, 'learning_rate': 1.7256393331386046e-05, 'epoch': 0.79} +2025-02-05 17:08:09 - ERROR - stderr - 26%|██▋ | 5930/22434 [7:00:29<11:38:30, 2.54s/it] +2025-02-05 17:08:11 - ERROR - stderr - 
26%|██▋ | 5931/22434 [7:00:31<11:35:14, 2.53s/it] +2025-02-05 17:08:11 - ERROR - stderr - +2025-02-05 17:08:11 - ERROR - stderr - +2025-02-05 17:08:11 - INFO - stdout - {'loss': 1.0044, 'grad_norm': 1.167578935623169, 'learning_rate': 1.7255399849435077e-05, 'epoch': 0.79} +2025-02-05 17:08:11 - ERROR - stderr - 26%|██▋ | 5931/22434 [7:00:31<11:35:14, 2.53s/it] +2025-02-05 17:08:14 - ERROR - stderr - 26%|██▋ | 5932/22434 [7:00:34<11:35:11, 2.53s/it] +2025-02-05 17:08:14 - ERROR - stderr - +2025-02-05 17:08:14 - ERROR - stderr - +2025-02-05 17:08:14 - INFO - stdout - {'loss': 0.927, 'grad_norm': 0.9683929681777954, 'learning_rate': 1.7254406216252243e-05, 'epoch': 0.79} +2025-02-05 17:08:14 - ERROR - stderr - 26%|██▋ | 5932/22434 [7:00:34<11:35:11, 2.53s/it] +2025-02-05 17:08:16 - ERROR - stderr - 26%|██▋ | 5933/22434 [7:00:36<11:37:45, 2.54s/it] +2025-02-05 17:08:16 - ERROR - stderr - +2025-02-05 17:08:16 - ERROR - stderr - +2025-02-05 17:08:16 - INFO - stdout - {'loss': 0.9656, 'grad_norm': 1.0881621837615967, 'learning_rate': 1.7253412431858253e-05, 'epoch': 0.79} +2025-02-05 17:08:16 - ERROR - stderr - 26%|██▋ | 5933/22434 [7:00:36<11:37:45, 2.54s/it] +2025-02-05 17:08:19 - ERROR - stderr - 26%|██▋ | 5934/22434 [7:00:39<11:34:34, 2.53s/it] +2025-02-05 17:08:19 - ERROR - stderr - +2025-02-05 17:08:19 - ERROR - stderr - +2025-02-05 17:08:19 - INFO - stdout - {'loss': 0.9237, 'grad_norm': 0.9965432286262512, 'learning_rate': 1.7252418496273822e-05, 'epoch': 0.79} +2025-02-05 17:08:19 - ERROR - stderr - 26%|██▋ | 5934/22434 [7:00:39<11:34:34, 2.53s/it] +2025-02-05 17:08:21 - ERROR - stderr - 26%|██▋ | 5935/22434 [7:00:41<11:36:54, 2.53s/it] +2025-02-05 17:08:22 - ERROR - stderr - +2025-02-05 17:08:22 - ERROR - stderr - +2025-02-05 17:08:22 - INFO - stdout - {'loss': 0.951, 'grad_norm': 1.0255216360092163, 'learning_rate': 1.7251424409519665e-05, 'epoch': 0.79} +2025-02-05 17:08:22 - ERROR - stderr - 26%|██▋ | 5935/22434 [7:00:41<11:36:54, 2.53s/it] +2025-02-05 
17:08:24 - ERROR - stderr - 26%|██▋ | 5936/22434 [7:00:44<11:50:23, 2.58s/it] +2025-02-05 17:08:24 - ERROR - stderr - +2025-02-05 17:08:24 - ERROR - stderr - +2025-02-05 17:08:24 - INFO - stdout - {'loss': 0.9138, 'grad_norm': 0.9688674211502075, 'learning_rate': 1.7250430171616507e-05, 'epoch': 0.79} +2025-02-05 17:08:24 - ERROR - stderr - 26%|██▋ | 5936/22434 [7:00:44<11:50:23, 2.58s/it] +2025-02-05 17:08:27 - ERROR - stderr - 26%|██▋ | 5937/22434 [7:00:47<11:51:11, 2.59s/it] +2025-02-05 17:08:27 - ERROR - stderr - +2025-02-05 17:08:27 - ERROR - stderr - +2025-02-05 17:08:27 - INFO - stdout - {'loss': 0.9318, 'grad_norm': 1.1297768354415894, 'learning_rate': 1.724943578258507e-05, 'epoch': 0.79} +2025-02-05 17:08:27 - ERROR - stderr - 26%|██▋ | 5937/22434 [7:00:47<11:51:11, 2.59s/it] +2025-02-05 17:08:29 - ERROR - stderr - 26%|██▋ | 5938/22434 [7:00:49<11:43:57, 2.56s/it] +2025-02-05 17:08:29 - ERROR - stderr - +2025-02-05 17:08:29 - ERROR - stderr - +2025-02-05 17:08:29 - INFO - stdout - {'loss': 0.9276, 'grad_norm': 1.1526602506637573, 'learning_rate': 1.7248441242446082e-05, 'epoch': 0.79} +2025-02-05 17:08:29 - ERROR - stderr - 26%|██▋ | 5938/22434 [7:00:49<11:43:57, 2.56s/it] +2025-02-05 17:08:32 - ERROR - stderr - 26%|██▋ | 5939/22434 [7:00:52<11:36:46, 2.53s/it] +2025-02-05 17:08:32 - ERROR - stderr - +2025-02-05 17:08:32 - ERROR - stderr - +2025-02-05 17:08:32 - INFO - stdout - {'loss': 0.934, 'grad_norm': 1.1144160032272339, 'learning_rate': 1.7247446551220273e-05, 'epoch': 0.79} +2025-02-05 17:08:32 - ERROR - stderr - 26%|██▋ | 5939/22434 [7:00:52<11:36:46, 2.53s/it] +2025-02-05 17:08:34 - ERROR - stderr - 26%|██▋ | 5940/22434 [7:00:54<11:28:24, 2.50s/it] +2025-02-05 17:08:34 - ERROR - stderr - +2025-02-05 17:08:34 - ERROR - stderr - +2025-02-05 17:08:34 - INFO - stdout - {'loss': 0.9859, 'grad_norm': 1.1218068599700928, 'learning_rate': 1.724645170892837e-05, 'epoch': 0.79} +2025-02-05 17:08:34 - ERROR - stderr - 26%|██▋ | 5940/22434 [7:00:54<11:28:24, 
2.50s/it] +2025-02-05 17:08:37 - ERROR - stderr - 26%|██▋ | 5941/22434 [7:00:57<11:32:07, 2.52s/it] +2025-02-05 17:08:37 - ERROR - stderr - +2025-02-05 17:08:37 - ERROR - stderr - +2025-02-05 17:08:37 - INFO - stdout - {'loss': 1.0143, 'grad_norm': 1.1022231578826904, 'learning_rate': 1.7245456715591122e-05, 'epoch': 0.79} +2025-02-05 17:08:37 - ERROR - stderr - 26%|██▋ | 5941/22434 [7:00:57<11:32:07, 2.52s/it] +2025-02-05 17:08:39 - ERROR - stderr - 26%|██▋ | 5942/22434 [7:00:59<11:45:14, 2.57s/it] +2025-02-05 17:08:39 - ERROR - stderr - +2025-02-05 17:08:39 - ERROR - stderr - +2025-02-05 17:08:39 - INFO - stdout - {'loss': 0.9203, 'grad_norm': 0.9646422863006592, 'learning_rate': 1.724446157122926e-05, 'epoch': 0.79} +2025-02-05 17:08:39 - ERROR - stderr - 26%|██▋ | 5942/22434 [7:00:59<11:45:14, 2.57s/it] +2025-02-05 17:08:42 - ERROR - stderr - 26%|██▋ | 5943/22434 [7:01:02<11:42:25, 2.56s/it] +2025-02-05 17:08:42 - ERROR - stderr - +2025-02-05 17:08:42 - ERROR - stderr - +2025-02-05 17:08:42 - INFO - stdout - {'loss': 0.8632, 'grad_norm': 0.9386504888534546, 'learning_rate': 1.7243466275863525e-05, 'epoch': 0.79} +2025-02-05 17:08:42 - ERROR - stderr - 26%|██▋ | 5943/22434 [7:01:02<11:42:25, 2.56s/it] +2025-02-05 17:08:44 - ERROR - stderr - 26%|██▋ | 5944/22434 [7:01:04<11:33:15, 2.52s/it] +2025-02-05 17:08:44 - ERROR - stderr - +2025-02-05 17:08:44 - ERROR - stderr - +2025-02-05 17:08:44 - INFO - stdout - {'loss': 0.9393, 'grad_norm': 1.1277166604995728, 'learning_rate': 1.7242470829514674e-05, 'epoch': 0.79} +2025-02-05 17:08:44 - ERROR - stderr - 26%|██▋ | 5944/22434 [7:01:04<11:33:15, 2.52s/it] +2025-02-05 17:08:47 - ERROR - stderr - 26%|██▋ | 5945/22434 [7:01:07<11:27:54, 2.50s/it] +2025-02-05 17:08:47 - ERROR - stderr - +2025-02-05 17:08:47 - ERROR - stderr - +2025-02-05 17:08:47 - INFO - stdout - {'loss': 1.0191, 'grad_norm': 1.03009831905365, 'learning_rate': 1.724147523220344e-05, 'epoch': 0.79} +2025-02-05 17:08:47 - ERROR - stderr - 26%|██▋ | 
5945/22434 [7:01:07<11:27:54, 2.50s/it] +2025-02-05 17:08:49 - ERROR - stderr - 27%|██▋ | 5946/22434 [7:01:09<11:28:14, 2.50s/it] +2025-02-05 17:08:49 - ERROR - stderr - +2025-02-05 17:08:49 - ERROR - stderr - +2025-02-05 17:08:49 - INFO - stdout - {'loss': 0.96, 'grad_norm': 1.011220932006836, 'learning_rate': 1.724047948395059e-05, 'epoch': 0.8} +2025-02-05 17:08:49 - ERROR - stderr - 27%|██▋ | 5946/22434 [7:01:09<11:28:14, 2.50s/it] +2025-02-05 17:08:52 - ERROR - stderr - 27%|██▋ | 5947/22434 [7:01:12<11:25:45, 2.50s/it] +2025-02-05 17:08:52 - ERROR - stderr - +2025-02-05 17:08:52 - ERROR - stderr - +2025-02-05 17:08:52 - INFO - stdout - {'loss': 0.9475, 'grad_norm': 1.137093186378479, 'learning_rate': 1.7239483584776873e-05, 'epoch': 0.8} +2025-02-05 17:08:52 - ERROR - stderr - 27%|██▋ | 5947/22434 [7:01:12<11:25:45, 2.50s/it] +2025-02-05 17:08:54 - ERROR - stderr - 27%|██▋ | 5948/22434 [7:01:14<11:26:43, 2.50s/it] +2025-02-05 17:08:54 - ERROR - stderr - +2025-02-05 17:08:54 - ERROR - stderr - +2025-02-05 17:08:54 - INFO - stdout - {'loss': 0.9039, 'grad_norm': 1.0254755020141602, 'learning_rate': 1.7238487534703045e-05, 'epoch': 0.8} +2025-02-05 17:08:54 - ERROR - stderr - 27%|██▋ | 5948/22434 [7:01:14<11:26:43, 2.50s/it] +2025-02-05 17:08:57 - ERROR - stderr - 27%|██▋ | 5949/22434 [7:01:17<11:25:36, 2.50s/it] +2025-02-05 17:08:57 - ERROR - stderr - +2025-02-05 17:08:57 - ERROR - stderr - +2025-02-05 17:08:57 - INFO - stdout - {'loss': 0.8243, 'grad_norm': 1.081653356552124, 'learning_rate': 1.7237491333749874e-05, 'epoch': 0.8} +2025-02-05 17:08:57 - ERROR - stderr - 27%|██▋ | 5949/22434 [7:01:17<11:25:36, 2.50s/it] +2025-02-05 17:08:59 - ERROR - stderr - 27%|██▋ | 5950/22434 [7:01:19<11:25:20, 2.49s/it] +2025-02-05 17:08:59 - ERROR - stderr - +2025-02-05 17:08:59 - ERROR - stderr - +2025-02-05 17:08:59 - INFO - stdout - {'loss': 0.9438, 'grad_norm': 1.0846514701843262, 'learning_rate': 1.723649498193812e-05, 'epoch': 0.8} +2025-02-05 17:08:59 - ERROR - 
stderr - 27%|██▋ | 5950/22434 [7:01:19<11:25:20, 2.49s/it] +2025-02-05 17:09:02 - ERROR - stderr - 27%|██▋ | 5951/22434 [7:01:22<11:30:47, 2.51s/it] +2025-02-05 17:09:02 - ERROR - stderr - +2025-02-05 17:09:02 - ERROR - stderr - +2025-02-05 17:09:02 - INFO - stdout - {'loss': 0.949, 'grad_norm': 1.1029421091079712, 'learning_rate': 1.7235498479288554e-05, 'epoch': 0.8} +2025-02-05 17:09:02 - ERROR - stderr - 27%|██▋ | 5951/22434 [7:01:22<11:30:47, 2.51s/it] +2025-02-05 17:09:04 - ERROR - stderr - 27%|██▋ | 5952/22434 [7:01:24<11:37:03, 2.54s/it] +2025-02-05 17:09:05 - ERROR - stderr - +2025-02-05 17:09:05 - ERROR - stderr - +2025-02-05 17:09:05 - INFO - stdout - {'loss': 1.0229, 'grad_norm': 1.2071943283081055, 'learning_rate': 1.7234501825821946e-05, 'epoch': 0.8} +2025-02-05 17:09:05 - ERROR - stderr - 27%|██▋ | 5952/22434 [7:01:24<11:37:03, 2.54s/it] +2025-02-05 17:09:07 - ERROR - stderr - 27%|██▋ | 5953/22434 [7:01:27<11:34:38, 2.53s/it] +2025-02-05 17:09:07 - ERROR - stderr - +2025-02-05 17:09:07 - ERROR - stderr - +2025-02-05 17:09:07 - INFO - stdout - {'loss': 0.9488, 'grad_norm': 1.0350154638290405, 'learning_rate': 1.7233505021559066e-05, 'epoch': 0.8} +2025-02-05 17:09:07 - ERROR - stderr - 27%|██▋ | 5953/22434 [7:01:27<11:34:38, 2.53s/it] +2025-02-05 17:09:10 - ERROR - stderr - 27%|██▋ | 5954/22434 [7:01:29<11:39:55, 2.55s/it] +2025-02-05 17:09:10 - ERROR - stderr - +2025-02-05 17:09:10 - ERROR - stderr - +2025-02-05 17:09:10 - INFO - stdout - {'loss': 0.9225, 'grad_norm': 1.114148497581482, 'learning_rate': 1.7232508066520702e-05, 'epoch': 0.8} +2025-02-05 17:09:10 - ERROR - stderr - 27%|██▋ | 5954/22434 [7:01:29<11:39:55, 2.55s/it] +2025-02-05 17:09:12 - ERROR - stderr - 27%|██▋ | 5955/22434 [7:01:32<11:32:40, 2.52s/it] +2025-02-05 17:09:12 - ERROR - stderr - +2025-02-05 17:09:12 - ERROR - stderr - +2025-02-05 17:09:12 - INFO - stdout - {'loss': 1.0391, 'grad_norm': 1.0580759048461914, 'learning_rate': 1.7231510960727625e-05, 'epoch': 0.8} +2025-02-05 
17:09:12 - ERROR - stderr - 27%|██▋ | 5955/22434 [7:01:32<11:32:40, 2.52s/it] +2025-02-05 17:09:15 - ERROR - stderr - 27%|██▋ | 5956/22434 [7:01:34<11:31:42, 2.52s/it] +2025-02-05 17:09:15 - ERROR - stderr - +2025-02-05 17:09:15 - ERROR - stderr - +2025-02-05 17:09:15 - INFO - stdout - {'loss': 0.9178, 'grad_norm': 1.0351217985153198, 'learning_rate': 1.723051370420062e-05, 'epoch': 0.8} +2025-02-05 17:09:15 - ERROR - stderr - 27%|██▋ | 5956/22434 [7:01:34<11:31:42, 2.52s/it] +2025-02-05 17:09:17 - ERROR - stderr - 27%|██▋ | 5957/22434 [7:01:37<11:31:22, 2.52s/it] +2025-02-05 17:09:17 - ERROR - stderr - +2025-02-05 17:09:17 - ERROR - stderr - +2025-02-05 17:09:17 - INFO - stdout - {'loss': 1.0899, 'grad_norm': 1.1464687585830688, 'learning_rate': 1.7229516296960477e-05, 'epoch': 0.8} +2025-02-05 17:09:17 - ERROR - stderr - 27%|██▋ | 5957/22434 [7:01:37<11:31:22, 2.52s/it] +2025-02-05 17:09:20 - ERROR - stderr - 27%|██▋ | 5958/22434 [7:01:39<11:37:15, 2.54s/it] +2025-02-05 17:09:20 - ERROR - stderr - +2025-02-05 17:09:20 - ERROR - stderr - +2025-02-05 17:09:20 - INFO - stdout - {'loss': 1.0905, 'grad_norm': 1.1180436611175537, 'learning_rate': 1.7228518739027985e-05, 'epoch': 0.8} +2025-02-05 17:09:20 - ERROR - stderr - 27%|██▋ | 5958/22434 [7:01:39<11:37:15, 2.54s/it] +2025-02-05 17:09:23 - ERROR - stderr - 27%|██▋ | 5959/22434 [7:01:43<12:23:05, 2.71s/it] +2025-02-05 17:09:23 - ERROR - stderr - +2025-02-05 17:09:23 - ERROR - stderr - +2025-02-05 17:09:23 - INFO - stdout - {'loss': 0.9592, 'grad_norm': 1.0598148107528687, 'learning_rate': 1.7227521030423938e-05, 'epoch': 0.8} +2025-02-05 17:09:23 - ERROR - stderr - 27%|██▋ | 5959/22434 [7:01:43<12:23:05, 2.71s/it] +2025-02-05 17:09:25 - ERROR - stderr - 27%|██▋ | 5960/22434 [7:01:45<12:03:30, 2.64s/it] +2025-02-05 17:09:25 - ERROR - stderr - +2025-02-05 17:09:25 - ERROR - stderr - +2025-02-05 17:09:25 - INFO - stdout - {'loss': 0.8654, 'grad_norm': 1.0116569995880127, 'learning_rate': 1.722652317116913e-05, 
'epoch': 0.8} +2025-02-05 17:09:25 - ERROR - stderr - 27%|██▋ | 5960/22434 [7:01:45<12:03:30, 2.64s/it] +2025-02-05 17:09:28 - ERROR - stderr - 27%|██▋ | 5961/22434 [7:01:47<11:52:32, 2.60s/it] +2025-02-05 17:09:28 - ERROR - stderr - +2025-02-05 17:09:28 - ERROR - stderr - +2025-02-05 17:09:28 - INFO - stdout - {'loss': 0.9046, 'grad_norm': 1.1499139070510864, 'learning_rate': 1.722552516128436e-05, 'epoch': 0.8} +2025-02-05 17:09:28 - ERROR - stderr - 27%|██▋ | 5961/22434 [7:01:48<11:52:32, 2.60s/it] +2025-02-05 17:09:30 - ERROR - stderr - 27%|██▋ | 5962/22434 [7:01:50<11:48:07, 2.58s/it] +2025-02-05 17:09:30 - ERROR - stderr - +2025-02-05 17:09:30 - ERROR - stderr - +2025-02-05 17:09:30 - INFO - stdout - {'loss': 0.8976, 'grad_norm': 1.0761595964431763, 'learning_rate': 1.7224527000790436e-05, 'epoch': 0.8} +2025-02-05 17:09:30 - ERROR - stderr - 27%|██▋ | 5962/22434 [7:01:50<11:48:07, 2.58s/it] +2025-02-05 17:09:33 - ERROR - stderr - 27%|██▋ | 5963/22434 [7:01:53<11:43:14, 2.56s/it] +2025-02-05 17:09:33 - ERROR - stderr - +2025-02-05 17:09:33 - ERROR - stderr - +2025-02-05 17:09:33 - INFO - stdout - {'loss': 1.0545, 'grad_norm': 1.2150306701660156, 'learning_rate': 1.7223528689708157e-05, 'epoch': 0.8} +2025-02-05 17:09:33 - ERROR - stderr - 27%|██▋ | 5963/22434 [7:01:53<11:43:14, 2.56s/it] +2025-02-05 17:09:35 - ERROR - stderr - 27%|██▋ | 5964/22434 [7:01:55<11:38:23, 2.54s/it] +2025-02-05 17:09:35 - ERROR - stderr - +2025-02-05 17:09:35 - ERROR - stderr - +2025-02-05 17:09:35 - INFO - stdout - {'loss': 0.9248, 'grad_norm': 0.9700686931610107, 'learning_rate': 1.7222530228058338e-05, 'epoch': 0.8} +2025-02-05 17:09:35 - ERROR - stderr - 27%|██▋ | 5964/22434 [7:01:55<11:38:23, 2.54s/it] +2025-02-05 17:09:38 - ERROR - stderr - 27%|██▋ | 5965/22434 [7:01:58<11:39:15, 2.55s/it] +2025-02-05 17:09:38 - ERROR - stderr - +2025-02-05 17:09:38 - ERROR - stderr - +2025-02-05 17:09:38 - INFO - stdout - {'loss': 0.8833, 'grad_norm': 1.1248748302459717, 'learning_rate': 
1.722153161586178e-05, 'epoch': 0.8} +2025-02-05 17:09:38 - ERROR - stderr - 27%|██▋ | 5965/22434 [7:01:58<11:39:15, 2.55s/it] +2025-02-05 17:09:40 - ERROR - stderr - 27%|██▋ | 5966/22434 [7:02:00<11:40:12, 2.55s/it] +2025-02-05 17:09:40 - ERROR - stderr - +2025-02-05 17:09:40 - ERROR - stderr - +2025-02-05 17:09:40 - INFO - stdout - {'loss': 1.011, 'grad_norm': 1.2003587484359741, 'learning_rate': 1.7220532853139313e-05, 'epoch': 0.8} +2025-02-05 17:09:40 - ERROR - stderr - 27%|██▋ | 5966/22434 [7:02:00<11:40:12, 2.55s/it] +2025-02-05 17:09:43 - ERROR - stderr - 27%|██▋ | 5967/22434 [7:02:03<11:30:41, 2.52s/it] +2025-02-05 17:09:43 - ERROR - stderr - +2025-02-05 17:09:43 - ERROR - stderr - +2025-02-05 17:09:43 - INFO - stdout - {'loss': 0.8001, 'grad_norm': 1.085605263710022, 'learning_rate': 1.7219533939911743e-05, 'epoch': 0.8} +2025-02-05 17:09:43 - ERROR - stderr - 27%|██▋ | 5967/22434 [7:02:03<11:30:41, 2.52s/it] +2025-02-05 17:09:45 - ERROR - stderr - 27%|██▋ | 5968/22434 [7:02:05<11:25:57, 2.50s/it] +2025-02-05 17:09:45 - ERROR - stderr - +2025-02-05 17:09:45 - ERROR - stderr - +2025-02-05 17:09:45 - INFO - stdout - {'loss': 0.8284, 'grad_norm': 1.1982121467590332, 'learning_rate': 1.72185348761999e-05, 'epoch': 0.8} +2025-02-05 17:09:45 - ERROR - stderr - 27%|██▋ | 5968/22434 [7:02:05<11:25:57, 2.50s/it] +2025-02-05 17:09:48 - ERROR - stderr - 27%|██▋ | 5969/22434 [7:02:08<11:28:21, 2.51s/it] +2025-02-05 17:09:48 - ERROR - stderr - +2025-02-05 17:09:48 - ERROR - stderr - +2025-02-05 17:09:48 - INFO - stdout - {'loss': 1.1263, 'grad_norm': 1.0838556289672852, 'learning_rate': 1.7217535662024602e-05, 'epoch': 0.8} +2025-02-05 17:09:48 - ERROR - stderr - 27%|██▋ | 5969/22434 [7:02:08<11:28:21, 2.51s/it] +2025-02-05 17:09:50 - ERROR - stderr - 27%|██▋ | 5970/22434 [7:02:10<11:37:14, 2.54s/it] +2025-02-05 17:09:50 - ERROR - stderr - +2025-02-05 17:09:50 - ERROR - stderr - +2025-02-05 17:09:50 - INFO - stdout - {'loss': 0.9432, 'grad_norm': 1.0332542657852173, 
'learning_rate': 1.721653629740668e-05, 'epoch': 0.8} +2025-02-05 17:09:50 - ERROR - stderr - 27%|██▋ | 5970/22434 [7:02:10<11:37:14, 2.54s/it] +2025-02-05 17:09:53 - ERROR - stderr - 27%|██▋ | 5971/22434 [7:02:13<11:29:14, 2.51s/it] +2025-02-05 17:09:53 - ERROR - stderr - +2025-02-05 17:09:53 - ERROR - stderr - +2025-02-05 17:09:53 - INFO - stdout - {'loss': 0.9644, 'grad_norm': 1.08811616897583, 'learning_rate': 1.721553678236697e-05, 'epoch': 0.8} +2025-02-05 17:09:53 - ERROR - stderr - 27%|██▋ | 5971/22434 [7:02:13<11:29:14, 2.51s/it] +2025-02-05 17:09:55 - ERROR - stderr - 27%|██▋ | 5972/22434 [7:02:15<11:36:14, 2.54s/it] +2025-02-05 17:09:56 - ERROR - stderr - +2025-02-05 17:09:56 - ERROR - stderr - +2025-02-05 17:09:56 - INFO - stdout - {'loss': 0.8914, 'grad_norm': 1.099745750427246, 'learning_rate': 1.7214537116926292e-05, 'epoch': 0.8} +2025-02-05 17:09:56 - ERROR - stderr - 27%|██▋ | 5972/22434 [7:02:15<11:36:14, 2.54s/it] +2025-02-05 17:09:58 - ERROR - stderr - 27%|██▋ | 5973/22434 [7:02:18<11:32:00, 2.52s/it] +2025-02-05 17:09:58 - ERROR - stderr - +2025-02-05 17:09:58 - ERROR - stderr - +2025-02-05 17:09:58 - INFO - stdout - {'loss': 0.9315, 'grad_norm': 1.1409785747528076, 'learning_rate': 1.7213537301105496e-05, 'epoch': 0.8} +2025-02-05 17:09:58 - ERROR - stderr - 27%|██▋ | 5973/22434 [7:02:18<11:32:00, 2.52s/it] +2025-02-05 17:10:01 - ERROR - stderr - 27%|██▋ | 5974/22434 [7:02:20<11:38:32, 2.55s/it] +2025-02-05 17:10:01 - ERROR - stderr - +2025-02-05 17:10:01 - ERROR - stderr - +2025-02-05 17:10:01 - INFO - stdout - {'loss': 1.0215, 'grad_norm': 1.2062530517578125, 'learning_rate': 1.7212537334925416e-05, 'epoch': 0.8} +2025-02-05 17:10:01 - ERROR - stderr - 27%|██▋ | 5974/22434 [7:02:20<11:38:32, 2.55s/it] +2025-02-05 17:10:03 - ERROR - stderr - 27%|██▋ | 5975/22434 [7:02:23<11:30:38, 2.52s/it] +2025-02-05 17:10:03 - ERROR - stderr - +2025-02-05 17:10:03 - ERROR - stderr - +2025-02-05 17:10:03 - INFO - stdout - {'loss': 1.0395, 'grad_norm': 
1.1689670085906982, 'learning_rate': 1.7211537218406897e-05, 'epoch': 0.8} +2025-02-05 17:10:03 - ERROR - stderr - 27%|██▋ | 5975/22434 [7:02:23<11:30:38, 2.52s/it] +2025-02-05 17:10:06 - ERROR - stderr - 27%|██▋ | 5976/22434 [7:02:25<11:32:20, 2.52s/it] +2025-02-05 17:10:06 - ERROR - stderr - +2025-02-05 17:10:06 - ERROR - stderr - +2025-02-05 17:10:06 - INFO - stdout - {'loss': 0.9498, 'grad_norm': 1.2341601848602295, 'learning_rate': 1.7210536951570788e-05, 'epoch': 0.8} +2025-02-05 17:10:06 - ERROR - stderr - 27%|██▋ | 5976/22434 [7:02:25<11:32:20, 2.52s/it] +2025-02-05 17:10:08 - ERROR - stderr - 27%|██▋ | 5977/22434 [7:02:28<11:31:56, 2.52s/it] +2025-02-05 17:10:08 - ERROR - stderr - +2025-02-05 17:10:08 - ERROR - stderr - +2025-02-05 17:10:08 - INFO - stdout - {'loss': 0.8595, 'grad_norm': 1.0076992511749268, 'learning_rate': 1.7209536534437935e-05, 'epoch': 0.8} +2025-02-05 17:10:08 - ERROR - stderr - 27%|██▋ | 5977/22434 [7:02:28<11:31:56, 2.52s/it] +2025-02-05 17:10:11 - ERROR - stderr - 27%|██▋ | 5978/22434 [7:02:30<11:38:37, 2.55s/it] +2025-02-05 17:10:11 - ERROR - stderr - +2025-02-05 17:10:11 - ERROR - stderr - +2025-02-05 17:10:11 - INFO - stdout - {'loss': 0.9613, 'grad_norm': 1.0309330224990845, 'learning_rate': 1.720853596702919e-05, 'epoch': 0.8} +2025-02-05 17:10:11 - ERROR - stderr - 27%|██▋ | 5978/22434 [7:02:31<11:38:37, 2.55s/it] +2025-02-05 17:10:13 - ERROR - stderr - 27%|██▋ | 5979/22434 [7:02:33<11:31:16, 2.52s/it] +2025-02-05 17:10:13 - ERROR - stderr - +2025-02-05 17:10:13 - ERROR - stderr - +2025-02-05 17:10:13 - INFO - stdout - {'loss': 0.941, 'grad_norm': 1.03667151927948, 'learning_rate': 1.7207535249365412e-05, 'epoch': 0.8} +2025-02-05 17:10:13 - ERROR - stderr - 27%|██▋ | 5979/22434 [7:02:33<11:31:16, 2.52s/it] +2025-02-05 17:10:16 - ERROR - stderr - 27%|██▋ | 5980/22434 [7:02:35<11:30:31, 2.52s/it] +2025-02-05 17:10:16 - ERROR - stderr - +2025-02-05 17:10:16 - ERROR - stderr - +2025-02-05 17:10:16 - INFO - stdout - {'loss': 
0.9793, 'grad_norm': 1.2212883234024048, 'learning_rate': 1.7206534381467456e-05, 'epoch': 0.8} +2025-02-05 17:10:16 - ERROR - stderr - 27%|██▋ | 5980/22434 [7:02:35<11:30:31, 2.52s/it] +2025-02-05 17:10:18 - ERROR - stderr - 27%|██▋ | 5981/22434 [7:02:38<11:31:43, 2.52s/it] +2025-02-05 17:10:18 - ERROR - stderr - +2025-02-05 17:10:18 - ERROR - stderr - +2025-02-05 17:10:18 - INFO - stdout - {'loss': 0.9442, 'grad_norm': 1.0123236179351807, 'learning_rate': 1.720553336335619e-05, 'epoch': 0.8} +2025-02-05 17:10:18 - ERROR - stderr - 27%|██▋ | 5981/22434 [7:02:38<11:31:43, 2.52s/it] +2025-02-05 17:10:21 - ERROR - stderr - 27%|██▋ | 5982/22434 [7:02:40<11:23:34, 2.49s/it] +2025-02-05 17:10:21 - ERROR - stderr - +2025-02-05 17:10:21 - ERROR - stderr - +2025-02-05 17:10:21 - INFO - stdout - {'loss': 0.8257, 'grad_norm': 1.1629676818847656, 'learning_rate': 1.7204532195052476e-05, 'epoch': 0.8} +2025-02-05 17:10:21 - ERROR - stderr - 27%|██▋ | 5982/22434 [7:02:40<11:23:34, 2.49s/it] +2025-02-05 17:10:23 - ERROR - stderr - 27%|██▋ | 5983/22434 [7:02:43<11:21:04, 2.48s/it] +2025-02-05 17:10:23 - ERROR - stderr - +2025-02-05 17:10:23 - ERROR - stderr - +2025-02-05 17:10:23 - INFO - stdout - {'loss': 0.9001, 'grad_norm': 0.9287083148956299, 'learning_rate': 1.720353087657718e-05, 'epoch': 0.8} +2025-02-05 17:10:23 - ERROR - stderr - 27%|██▋ | 5983/22434 [7:02:43<11:21:04, 2.48s/it] +2025-02-05 17:10:26 - ERROR - stderr - 27%|██▋ | 5984/22434 [7:02:45<11:21:55, 2.49s/it] +2025-02-05 17:10:26 - ERROR - stderr - +2025-02-05 17:10:26 - ERROR - stderr - +2025-02-05 17:10:26 - INFO - stdout - {'loss': 0.9154, 'grad_norm': 1.1815904378890991, 'learning_rate': 1.7202529407951175e-05, 'epoch': 0.8} +2025-02-05 17:10:26 - ERROR - stderr - 27%|██▋ | 5984/22434 [7:02:45<11:21:55, 2.49s/it] +2025-02-05 17:10:28 - ERROR - stderr - 27%|██▋ | 5985/22434 [7:02:48<11:27:56, 2.51s/it] +2025-02-05 17:10:28 - ERROR - stderr - +2025-02-05 17:10:28 - ERROR - stderr - +2025-02-05 17:10:28 - INFO - 
stdout - {'loss': 1.086, 'grad_norm': 1.0900535583496094, 'learning_rate': 1.720152778919534e-05, 'epoch': 0.8} +2025-02-05 17:10:28 - ERROR - stderr - 27%|██▋ | 5985/22434 [7:02:48<11:27:56, 2.51s/it] +2025-02-05 17:10:31 - ERROR - stderr - 27%|██▋ | 5986/22434 [7:02:50<11:26:07, 2.50s/it] +2025-02-05 17:10:31 - ERROR - stderr - +2025-02-05 17:10:31 - ERROR - stderr - +2025-02-05 17:10:31 - INFO - stdout - {'loss': 1.0278, 'grad_norm': 1.1996012926101685, 'learning_rate': 1.720052602033055e-05, 'epoch': 0.8} +2025-02-05 17:10:31 - ERROR - stderr - 27%|██▋ | 5986/22434 [7:02:50<11:26:07, 2.50s/it] +2025-02-05 17:10:33 - ERROR - stderr - 27%|██▋ | 5987/22434 [7:02:53<11:21:34, 2.49s/it] +2025-02-05 17:10:33 - ERROR - stderr - +2025-02-05 17:10:33 - ERROR - stderr - +2025-02-05 17:10:33 - INFO - stdout - {'loss': 0.9064, 'grad_norm': 1.0817656517028809, 'learning_rate': 1.719952410137768e-05, 'epoch': 0.8} +2025-02-05 17:10:33 - ERROR - stderr - 27%|██▋ | 5987/22434 [7:02:53<11:21:34, 2.49s/it] +2025-02-05 17:10:36 - ERROR - stderr - 27%|██▋ | 5988/22434 [7:02:55<11:19:18, 2.48s/it] +2025-02-05 17:10:36 - ERROR - stderr - +2025-02-05 17:10:36 - ERROR - stderr - +2025-02-05 17:10:36 - INFO - stdout - {'loss': 1.0129, 'grad_norm': 1.1302690505981445, 'learning_rate': 1.7198522032357622e-05, 'epoch': 0.8} +2025-02-05 17:10:36 - ERROR - stderr - 27%|██▋ | 5988/22434 [7:02:55<11:19:18, 2.48s/it] +2025-02-05 17:10:38 - ERROR - stderr - 27%|██▋ | 5989/22434 [7:02:58<11:16:53, 2.47s/it] +2025-02-05 17:10:38 - ERROR - stderr - +2025-02-05 17:10:38 - ERROR - stderr - +2025-02-05 17:10:38 - INFO - stdout - {'loss': 0.8896, 'grad_norm': 1.0130740404129028, 'learning_rate': 1.7197519813291262e-05, 'epoch': 0.8} +2025-02-05 17:10:38 - ERROR - stderr - 27%|██▋ | 5989/22434 [7:02:58<11:16:53, 2.47s/it] +2025-02-05 17:10:40 - ERROR - stderr - 27%|██▋ | 5990/22434 [7:03:00<11:16:46, 2.47s/it] +2025-02-05 17:10:40 - ERROR - stderr - +2025-02-05 17:10:40 - ERROR - stderr - +2025-02-05 
17:10:40 - INFO - stdout - {'loss': 1.0032, 'grad_norm': 1.072466254234314, 'learning_rate': 1.7196517444199487e-05, 'epoch': 0.8} +2025-02-05 17:10:40 - ERROR - stderr - 27%|██▋ | 5990/22434 [7:03:00<11:16:46, 2.47s/it] +2025-02-05 17:10:43 - ERROR - stderr - 27%|██▋ | 5991/22434 [7:03:03<11:13:26, 2.46s/it] +2025-02-05 17:10:43 - ERROR - stderr - +2025-02-05 17:10:43 - ERROR - stderr - +2025-02-05 17:10:43 - INFO - stdout - {'loss': 0.9505, 'grad_norm': 1.0459058284759521, 'learning_rate': 1.7195514925103195e-05, 'epoch': 0.8} +2025-02-05 17:10:43 - ERROR - stderr - 27%|██▋ | 5991/22434 [7:03:03<11:13:26, 2.46s/it] +2025-02-05 17:10:46 - ERROR - stderr - 27%|██▋ | 5992/22434 [7:03:06<12:10:59, 2.67s/it] +2025-02-05 17:10:46 - ERROR - stderr - +2025-02-05 17:10:46 - ERROR - stderr - +2025-02-05 17:10:46 - INFO - stdout - {'loss': 0.9115, 'grad_norm': 1.1594972610473633, 'learning_rate': 1.7194512256023276e-05, 'epoch': 0.8} +2025-02-05 17:10:46 - ERROR - stderr - 27%|██▋ | 5992/22434 [7:03:06<12:10:59, 2.67s/it] +2025-02-05 17:10:49 - ERROR - stderr - 27%|██▋ | 5993/22434 [7:03:08<11:54:44, 2.61s/it] +2025-02-05 17:10:49 - ERROR - stderr - +2025-02-05 17:10:49 - ERROR - stderr - +2025-02-05 17:10:49 - INFO - stdout - {'loss': 0.913, 'grad_norm': 1.20310640335083, 'learning_rate': 1.7193509436980633e-05, 'epoch': 0.8} +2025-02-05 17:10:49 - ERROR - stderr - 27%|██▋ | 5993/22434 [7:03:08<11:54:44, 2.61s/it] +2025-02-05 17:10:51 - ERROR - stderr - 27%|██▋ | 5994/22434 [7:03:11<11:41:52, 2.56s/it] +2025-02-05 17:10:51 - ERROR - stderr - +2025-02-05 17:10:51 - ERROR - stderr - +2025-02-05 17:10:51 - INFO - stdout - {'loss': 0.8977, 'grad_norm': 1.1311678886413574, 'learning_rate': 1.7192506467996174e-05, 'epoch': 0.8} +2025-02-05 17:10:51 - ERROR - stderr - 27%|██▋ | 5994/22434 [7:03:11<11:41:52, 2.56s/it] +2025-02-05 17:10:53 - ERROR - stderr - 27%|██▋ | 5995/22434 [7:03:13<11:36:39, 2.54s/it] +2025-02-05 17:10:54 - ERROR - stderr - +2025-02-05 17:10:54 - ERROR - 
stderr - +2025-02-05 17:10:54 - INFO - stdout - {'loss': 0.8419, 'grad_norm': 0.9222077131271362, 'learning_rate': 1.7191503349090797e-05, 'epoch': 0.8} +2025-02-05 17:10:54 - ERROR - stderr - 27%|██▋ | 5995/22434 [7:03:13<11:36:39, 2.54s/it] +2025-02-05 17:10:56 - ERROR - stderr - 27%|██▋ | 5996/22434 [7:03:16<11:38:19, 2.55s/it] +2025-02-05 17:10:56 - ERROR - stderr - +2025-02-05 17:10:56 - ERROR - stderr - +2025-02-05 17:10:56 - INFO - stdout - {'loss': 0.8525, 'grad_norm': 1.1015582084655762, 'learning_rate': 1.7190500080285412e-05, 'epoch': 0.8} +2025-02-05 17:10:56 - ERROR - stderr - 27%|██▋ | 5996/22434 [7:03:16<11:38:19, 2.55s/it] +2025-02-05 17:10:59 - ERROR - stderr - 27%|██▋ | 5997/22434 [7:03:18<11:33:21, 2.53s/it] +2025-02-05 17:10:59 - ERROR - stderr - +2025-02-05 17:10:59 - ERROR - stderr - +2025-02-05 17:10:59 - INFO - stdout - {'loss': 1.0288, 'grad_norm': 1.1134991645812988, 'learning_rate': 1.7189496661600936e-05, 'epoch': 0.8} +2025-02-05 17:10:59 - ERROR - stderr - 27%|██▋ | 5997/22434 [7:03:18<11:33:21, 2.53s/it] +2025-02-05 17:11:01 - ERROR - stderr - 27%|██▋ | 5998/22434 [7:03:21<11:33:43, 2.53s/it] +2025-02-05 17:11:01 - ERROR - stderr - +2025-02-05 17:11:01 - ERROR - stderr - +2025-02-05 17:11:01 - INFO - stdout - {'loss': 1.0164, 'grad_norm': 1.0536115169525146, 'learning_rate': 1.7188493093058283e-05, 'epoch': 0.8} +2025-02-05 17:11:01 - ERROR - stderr - 27%|██▋ | 5998/22434 [7:03:21<11:33:43, 2.53s/it] +2025-02-05 17:11:04 - ERROR - stderr - 27%|██▋ | 5999/22434 [7:03:24<12:27:04, 2.73s/it] +2025-02-05 17:11:04 - ERROR - stderr - +2025-02-05 17:11:04 - ERROR - stderr - +2025-02-05 17:11:04 - INFO - stdout - {'loss': 0.9134, 'grad_norm': 0.9787282943725586, 'learning_rate': 1.718748937467837e-05, 'epoch': 0.8} +2025-02-05 17:11:04 - ERROR - stderr - 27%|██▋ | 5999/22434 [7:03:24<12:27:04, 2.73s/it] +2025-02-05 17:11:07 - ERROR - stderr - 27%|██▋ | 6000/22434 [7:03:26<12:07:54, 2.66s/it] +2025-02-05 17:11:07 - ERROR - stderr - +2025-02-05 
17:11:07 - ERROR - stderr - +2025-02-05 17:11:07 - INFO - stdout - {'loss': 0.9756, 'grad_norm': 1.1369825601577759, 'learning_rate': 1.7186485506482115e-05, 'epoch': 0.8} +2025-02-05 17:11:07 - ERROR - stderr - 27%|██▋ | 6000/22434 [7:03:27<12:07:54, 2.66s/it] +2025-02-05 17:11:09 - ERROR - stderr - 27%|██▋ | 6001/22434 [7:03:29<12:00:14, 2.63s/it] +2025-02-05 17:11:09 - ERROR - stderr - +2025-02-05 17:11:09 - ERROR - stderr - +2025-02-05 17:11:09 - INFO - stdout - {'loss': 0.9445, 'grad_norm': 1.1553720235824585, 'learning_rate': 1.718548148849045e-05, 'epoch': 0.8} +2025-02-05 17:11:09 - ERROR - stderr - 27%|██▋ | 6001/22434 [7:03:29<12:00:14, 2.63s/it] +2025-02-05 17:11:12 - ERROR - stderr - 27%|██▋ | 6002/22434 [7:03:32<11:59:09, 2.63s/it] +2025-02-05 17:11:12 - ERROR - stderr - +2025-02-05 17:11:12 - ERROR - stderr - +2025-02-05 17:11:12 - INFO - stdout - {'loss': 0.9742, 'grad_norm': 0.9981961846351624, 'learning_rate': 1.7184477320724297e-05, 'epoch': 0.8} +2025-02-05 17:11:12 - ERROR - stderr - 27%|██▋ | 6002/22434 [7:03:32<11:59:09, 2.63s/it] +2025-02-05 17:11:15 - ERROR - stderr - 27%|██▋ | 6003/22434 [7:03:34<12:10:12, 2.67s/it] +2025-02-05 17:11:15 - ERROR - stderr - +2025-02-05 17:11:15 - ERROR - stderr - +2025-02-05 17:11:15 - INFO - stdout - {'loss': 0.89, 'grad_norm': 1.0971591472625732, 'learning_rate': 1.718347300320459e-05, 'epoch': 0.8} +2025-02-05 17:11:15 - ERROR - stderr - 27%|██▋ | 6003/22434 [7:03:34<12:10:12, 2.67s/it] +2025-02-05 17:11:17 - ERROR - stderr - 27%|██▋ | 6004/22434 [7:03:37<11:54:50, 2.61s/it] +2025-02-05 17:11:17 - ERROR - stderr - +2025-02-05 17:11:17 - ERROR - stderr - +2025-02-05 17:11:17 - INFO - stdout - {'loss': 0.8237, 'grad_norm': 0.9448205232620239, 'learning_rate': 1.7182468535952263e-05, 'epoch': 0.8} +2025-02-05 17:11:17 - ERROR - stderr - 27%|██▋ | 6004/22434 [7:03:37<11:54:50, 2.61s/it] +2025-02-05 17:11:20 - ERROR - stderr - 27%|██▋ | 6005/22434 [7:03:40<11:54:16, 2.61s/it] +2025-02-05 17:11:20 - ERROR - 
stderr - +2025-02-05 17:11:20 - ERROR - stderr - +2025-02-05 17:11:20 - INFO - stdout - {'loss': 0.8833, 'grad_norm': 1.0414693355560303, 'learning_rate': 1.718146391898825e-05, 'epoch': 0.8} +2025-02-05 17:11:20 - ERROR - stderr - 27%|██▋ | 6005/22434 [7:03:40<11:54:16, 2.61s/it] +2025-02-05 17:11:22 - ERROR - stderr - 27%|██▋ | 6006/22434 [7:03:42<11:49:00, 2.59s/it] +2025-02-05 17:11:22 - ERROR - stderr - +2025-02-05 17:11:22 - ERROR - stderr - +2025-02-05 17:11:22 - INFO - stdout - {'loss': 0.835, 'grad_norm': 0.9588685035705566, 'learning_rate': 1.71804591523335e-05, 'epoch': 0.8} +2025-02-05 17:11:22 - ERROR - stderr - 27%|██▋ | 6006/22434 [7:03:42<11:49:00, 2.59s/it] +2025-02-05 17:11:25 - ERROR - stderr - 27%|██▋ | 6007/22434 [7:03:45<11:38:01, 2.55s/it] +2025-02-05 17:11:25 - ERROR - stderr - +2025-02-05 17:11:25 - ERROR - stderr - +2025-02-05 17:11:25 - INFO - stdout - {'loss': 0.904, 'grad_norm': 0.981637716293335, 'learning_rate': 1.717945423600894e-05, 'epoch': 0.8} +2025-02-05 17:11:25 - ERROR - stderr - 27%|██▋ | 6007/22434 [7:03:45<11:38:01, 2.55s/it] +2025-02-05 17:11:27 - ERROR - stderr - 27%|██▋ | 6008/22434 [7:03:47<11:32:54, 2.53s/it] +2025-02-05 17:11:27 - ERROR - stderr - +2025-02-05 17:11:27 - ERROR - stderr - +2025-02-05 17:11:27 - INFO - stdout - {'loss': 0.9563, 'grad_norm': 1.0093623399734497, 'learning_rate': 1.717844917003553e-05, 'epoch': 0.8} +2025-02-05 17:11:27 - ERROR - stderr - 27%|██▋ | 6008/22434 [7:03:47<11:32:54, 2.53s/it] +2025-02-05 17:11:30 - ERROR - stderr - 27%|██▋ | 6009/22434 [7:03:50<12:24:00, 2.72s/it] +2025-02-05 17:11:30 - ERROR - stderr - +2025-02-05 17:11:30 - ERROR - stderr - +2025-02-05 17:11:30 - INFO - stdout - {'loss': 0.9329, 'grad_norm': 0.9742627143859863, 'learning_rate': 1.7177443954434218e-05, 'epoch': 0.8} +2025-02-05 17:11:30 - ERROR - stderr - 27%|██▋ | 6009/22434 [7:03:50<12:24:00, 2.72s/it] +2025-02-05 17:11:33 - ERROR - stderr - 27%|██▋ | 6010/22434 [7:03:53<12:08:59, 2.66s/it] +2025-02-05 
17:11:33 - ERROR - stderr - +2025-02-05 17:11:33 - ERROR - stderr - +2025-02-05 17:11:33 - INFO - stdout - {'loss': 0.7878, 'grad_norm': 1.0158179998397827, 'learning_rate': 1.7176438589225955e-05, 'epoch': 0.8} +2025-02-05 17:11:33 - ERROR - stderr - 27%|██▋ | 6010/22434 [7:03:53<12:08:59, 2.66s/it] +2025-02-05 17:11:35 - ERROR - stderr - 27%|██▋ | 6011/22434 [7:03:55<11:54:34, 2.61s/it] +2025-02-05 17:11:35 - ERROR - stderr - +2025-02-05 17:11:35 - ERROR - stderr - +2025-02-05 17:11:35 - INFO - stdout - {'loss': 0.8924, 'grad_norm': 0.9885859489440918, 'learning_rate': 1.7175433074431697e-05, 'epoch': 0.8} +2025-02-05 17:11:35 - ERROR - stderr - 27%|██▋ | 6011/22434 [7:03:55<11:54:34, 2.61s/it] +2025-02-05 17:11:38 - ERROR - stderr - 27%|██▋ | 6012/22434 [7:03:58<11:48:42, 2.59s/it] +2025-02-05 17:11:38 - ERROR - stderr - +2025-02-05 17:11:38 - ERROR - stderr - +2025-02-05 17:11:38 - INFO - stdout - {'loss': 1.0202, 'grad_norm': 1.1555663347244263, 'learning_rate': 1.7174427410072404e-05, 'epoch': 0.8} +2025-02-05 17:11:38 - ERROR - stderr - 27%|██▋ | 6012/22434 [7:03:58<11:48:42, 2.59s/it] +2025-02-05 17:11:40 - ERROR - stderr - 27%|██▋ | 6013/22434 [7:04:00<11:42:49, 2.57s/it] +2025-02-05 17:11:41 - ERROR - stderr - +2025-02-05 17:11:41 - ERROR - stderr - +2025-02-05 17:11:41 - INFO - stdout - {'loss': 0.8576, 'grad_norm': 0.9582664966583252, 'learning_rate': 1.717342159616903e-05, 'epoch': 0.8} +2025-02-05 17:11:41 - ERROR - stderr - 27%|██▋ | 6013/22434 [7:04:00<11:42:49, 2.57s/it] +2025-02-05 17:11:43 - ERROR - stderr - 27%|██▋ | 6014/22434 [7:04:03<11:34:06, 2.54s/it] +2025-02-05 17:11:43 - ERROR - stderr - +2025-02-05 17:11:43 - ERROR - stderr - +2025-02-05 17:11:43 - INFO - stdout - {'loss': 0.8963, 'grad_norm': 1.136109471321106, 'learning_rate': 1.7172415632742552e-05, 'epoch': 0.8} +2025-02-05 17:11:43 - ERROR - stderr - 27%|██▋ | 6014/22434 [7:04:03<11:34:06, 2.54s/it] +2025-02-05 17:11:45 - ERROR - stderr - 27%|██▋ | 6015/22434 [7:04:05<11:24:52, 
2.50s/it] +2025-02-05 17:11:45 - ERROR - stderr - +2025-02-05 17:11:45 - ERROR - stderr - +2025-02-05 17:11:45 - INFO - stdout - {'loss': 0.9359, 'grad_norm': 1.0619771480560303, 'learning_rate': 1.7171409519813936e-05, 'epoch': 0.8} +2025-02-05 17:11:45 - ERROR - stderr - 27%|██▋ | 6015/22434 [7:04:05<11:24:52, 2.50s/it] +2025-02-05 17:11:48 - ERROR - stderr - 27%|██▋ | 6016/22434 [7:04:08<11:19:57, 2.48s/it] +2025-02-05 17:11:48 - ERROR - stderr - +2025-02-05 17:11:48 - ERROR - stderr - +2025-02-05 17:11:48 - INFO - stdout - {'loss': 1.0642, 'grad_norm': 1.134253978729248, 'learning_rate': 1.7170403257404147e-05, 'epoch': 0.8} +2025-02-05 17:11:48 - ERROR - stderr - 27%|██▋ | 6016/22434 [7:04:08<11:19:57, 2.48s/it] +2025-02-05 17:11:50 - ERROR - stderr - 27%|██▋ | 6017/22434 [7:04:10<11:25:41, 2.51s/it] +2025-02-05 17:11:50 - ERROR - stderr - +2025-02-05 17:11:50 - ERROR - stderr - +2025-02-05 17:11:50 - INFO - stdout - {'loss': 0.8841, 'grad_norm': 1.12119722366333, 'learning_rate': 1.7169396845534164e-05, 'epoch': 0.8} +2025-02-05 17:11:50 - ERROR - stderr - 27%|██▋ | 6017/22434 [7:04:10<11:25:41, 2.51s/it] +2025-02-05 17:11:53 - ERROR - stderr - 27%|██▋ | 6018/22434 [7:04:13<11:30:13, 2.52s/it] +2025-02-05 17:11:53 - ERROR - stderr - +2025-02-05 17:11:53 - ERROR - stderr - +2025-02-05 17:11:53 - INFO - stdout - {'loss': 1.0339, 'grad_norm': 1.0171111822128296, 'learning_rate': 1.7168390284224964e-05, 'epoch': 0.8} +2025-02-05 17:11:53 - ERROR - stderr - 27%|██▋ | 6018/22434 [7:04:13<11:30:13, 2.52s/it] +2025-02-05 17:11:55 - ERROR - stderr - 27%|██▋ | 6019/22434 [7:04:15<11:29:40, 2.52s/it] +2025-02-05 17:11:55 - ERROR - stderr - +2025-02-05 17:11:55 - ERROR - stderr - +2025-02-05 17:11:55 - INFO - stdout - {'loss': 1.0289, 'grad_norm': 1.0128767490386963, 'learning_rate': 1.7167383573497526e-05, 'epoch': 0.8} +2025-02-05 17:11:55 - ERROR - stderr - 27%|██▋ | 6019/22434 [7:04:15<11:29:40, 2.52s/it] +2025-02-05 17:11:58 - ERROR - stderr - 27%|██▋ | 6020/22434 
[7:04:18<11:27:07, 2.51s/it] +2025-02-05 17:11:58 - ERROR - stderr - +2025-02-05 17:11:58 - ERROR - stderr - +2025-02-05 17:11:58 - INFO - stdout - {'loss': 0.9209, 'grad_norm': 1.2031018733978271, 'learning_rate': 1.716637671337284e-05, 'epoch': 0.81} +2025-02-05 17:11:58 - ERROR - stderr - 27%|██▋ | 6020/22434 [7:04:18<11:27:07, 2.51s/it] +2025-02-05 17:12:00 - ERROR - stderr - 27%|██▋ | 6021/22434 [7:04:20<11:26:11, 2.51s/it] +2025-02-05 17:12:00 - ERROR - stderr - +2025-02-05 17:12:00 - ERROR - stderr - +2025-02-05 17:12:00 - INFO - stdout - {'loss': 1.086, 'grad_norm': 1.1009597778320312, 'learning_rate': 1.7165369703871886e-05, 'epoch': 0.81} +2025-02-05 17:12:00 - ERROR - stderr - 27%|██▋ | 6021/22434 [7:04:20<11:26:11, 2.51s/it] +2025-02-05 17:12:03 - ERROR - stderr - 27%|██▋ | 6022/22434 [7:04:23<11:32:01, 2.53s/it] +2025-02-05 17:12:03 - ERROR - stderr - +2025-02-05 17:12:03 - ERROR - stderr - +2025-02-05 17:12:03 - INFO - stdout - {'loss': 0.9716, 'grad_norm': 1.144898772239685, 'learning_rate': 1.7164362545015656e-05, 'epoch': 0.81} +2025-02-05 17:12:03 - ERROR - stderr - 27%|██▋ | 6022/22434 [7:04:23<11:32:01, 2.53s/it] +2025-02-05 17:12:06 - ERROR - stderr - 27%|██▋ | 6023/22434 [7:04:25<11:33:38, 2.54s/it] +2025-02-05 17:12:06 - ERROR - stderr - +2025-02-05 17:12:06 - ERROR - stderr - +2025-02-05 17:12:06 - INFO - stdout - {'loss': 0.8193, 'grad_norm': 1.0333991050720215, 'learning_rate': 1.7163355236825146e-05, 'epoch': 0.81} +2025-02-05 17:12:06 - ERROR - stderr - 27%|██▋ | 6023/22434 [7:04:25<11:33:38, 2.54s/it] +2025-02-05 17:12:08 - ERROR - stderr - 27%|██▋ | 6024/22434 [7:04:28<11:28:01, 2.52s/it] +2025-02-05 17:12:08 - ERROR - stderr - +2025-02-05 17:12:08 - ERROR - stderr - +2025-02-05 17:12:08 - INFO - stdout - {'loss': 0.8673, 'grad_norm': 1.0955322980880737, 'learning_rate': 1.7162347779321352e-05, 'epoch': 0.81} +2025-02-05 17:12:08 - ERROR - stderr - 27%|██▋ | 6024/22434 [7:04:28<11:28:01, 2.52s/it] +2025-02-05 17:12:11 - ERROR - stderr 
- 27%|██▋ | 6025/22434 [7:04:30<11:28:34, 2.52s/it] +2025-02-05 17:12:11 - ERROR - stderr - +2025-02-05 17:12:11 - ERROR - stderr - +2025-02-05 17:12:11 - INFO - stdout - {'loss': 0.9881, 'grad_norm': 1.046897530555725, 'learning_rate': 1.716134017252527e-05, 'epoch': 0.81} +2025-02-05 17:12:11 - ERROR - stderr - 27%|██▋ | 6025/22434 [7:04:30<11:28:34, 2.52s/it] +2025-02-05 17:12:13 - ERROR - stderr - 27%|██▋ | 6026/22434 [7:04:33<11:29:37, 2.52s/it] +2025-02-05 17:12:13 - ERROR - stderr - +2025-02-05 17:12:13 - ERROR - stderr - +2025-02-05 17:12:13 - INFO - stdout - {'loss': 0.9068, 'grad_norm': 1.1322290897369385, 'learning_rate': 1.7160332416457907e-05, 'epoch': 0.81} +2025-02-05 17:12:13 - ERROR - stderr - 27%|██▋ | 6026/22434 [7:04:33<11:29:37, 2.52s/it] +2025-02-05 17:12:16 - ERROR - stderr - 27%|██▋ | 6027/22434 [7:04:35<11:25:34, 2.51s/it] +2025-02-05 17:12:16 - ERROR - stderr - +2025-02-05 17:12:16 - ERROR - stderr - +2025-02-05 17:12:16 - INFO - stdout - {'loss': 0.9603, 'grad_norm': 1.1079896688461304, 'learning_rate': 1.7159324511140266e-05, 'epoch': 0.81} +2025-02-05 17:12:16 - ERROR - stderr - 27%|██▋ | 6027/22434 [7:04:35<11:25:34, 2.51s/it] +2025-02-05 17:12:18 - ERROR - stderr - 27%|██▋ | 6028/22434 [7:04:38<11:29:08, 2.52s/it] +2025-02-05 17:12:18 - ERROR - stderr - +2025-02-05 17:12:18 - ERROR - stderr - +2025-02-05 17:12:18 - INFO - stdout - {'loss': 0.9239, 'grad_norm': 0.9854230284690857, 'learning_rate': 1.7158316456593356e-05, 'epoch': 0.81} +2025-02-05 17:12:18 - ERROR - stderr - 27%|██▋ | 6028/22434 [7:04:38<11:29:08, 2.52s/it] +2025-02-05 17:12:21 - ERROR - stderr - 27%|██▋ | 6029/22434 [7:04:40<11:31:19, 2.53s/it] +2025-02-05 17:12:21 - ERROR - stderr - +2025-02-05 17:12:21 - ERROR - stderr - +2025-02-05 17:12:21 - INFO - stdout - {'loss': 0.9519, 'grad_norm': 1.167246699333191, 'learning_rate': 1.7157308252838187e-05, 'epoch': 0.81} +2025-02-05 17:12:21 - ERROR - stderr - 27%|██▋ | 6029/22434 [7:04:40<11:31:19, 2.53s/it] +2025-02-05 
17:12:23 - ERROR - stderr - 27%|██▋ | 6030/22434 [7:04:43<11:45:32, 2.58s/it] +2025-02-05 17:12:23 - ERROR - stderr - +2025-02-05 17:12:23 - ERROR - stderr - +2025-02-05 17:12:23 - INFO - stdout - {'loss': 0.9555, 'grad_norm': 1.0009126663208008, 'learning_rate': 1.715629989989578e-05, 'epoch': 0.81} +2025-02-05 17:12:23 - ERROR - stderr - 27%|██▋ | 6030/22434 [7:04:43<11:45:32, 2.58s/it] +2025-02-05 17:12:26 - ERROR - stderr - 27%|██▋ | 6031/22434 [7:04:46<11:43:19, 2.57s/it] +2025-02-05 17:12:26 - ERROR - stderr - +2025-02-05 17:12:26 - ERROR - stderr - +2025-02-05 17:12:26 - INFO - stdout - {'loss': 0.9597, 'grad_norm': 0.962867021560669, 'learning_rate': 1.7155291397787147e-05, 'epoch': 0.81} +2025-02-05 17:12:26 - ERROR - stderr - 27%|██▋ | 6031/22434 [7:04:46<11:43:19, 2.57s/it] +2025-02-05 17:12:28 - ERROR - stderr - 27%|██▋ | 6032/22434 [7:04:48<11:37:18, 2.55s/it] +2025-02-05 17:12:28 - ERROR - stderr - +2025-02-05 17:12:28 - ERROR - stderr - +2025-02-05 17:12:28 - INFO - stdout - {'loss': 0.8535, 'grad_norm': 1.0597095489501953, 'learning_rate': 1.715428274653331e-05, 'epoch': 0.81} +2025-02-05 17:12:28 - ERROR - stderr - 27%|██▋ | 6032/22434 [7:04:48<11:37:18, 2.55s/it] +2025-02-05 17:12:31 - ERROR - stderr - 27%|██▋ | 6033/22434 [7:04:51<11:29:53, 2.52s/it] +2025-02-05 17:12:31 - ERROR - stderr - +2025-02-05 17:12:31 - ERROR - stderr - +2025-02-05 17:12:31 - INFO - stdout - {'loss': 0.9758, 'grad_norm': 1.1344106197357178, 'learning_rate': 1.71532739461553e-05, 'epoch': 0.81} +2025-02-05 17:12:31 - ERROR - stderr - 27%|██▋ | 6033/22434 [7:04:51<11:29:53, 2.52s/it] +2025-02-05 17:12:33 - ERROR - stderr - 27%|██▋ | 6034/22434 [7:04:53<11:26:20, 2.51s/it] +2025-02-05 17:12:33 - ERROR - stderr - +2025-02-05 17:12:33 - ERROR - stderr - +2025-02-05 17:12:33 - INFO - stdout - {'loss': 0.9708, 'grad_norm': 1.1039469242095947, 'learning_rate': 1.7152264996674138e-05, 'epoch': 0.81} +2025-02-05 17:12:33 - ERROR - stderr - 27%|██▋ | 6034/22434 [7:04:53<11:26:20, 
2.51s/it] +2025-02-05 17:12:36 - ERROR - stderr - 27%|██▋ | 6035/22434 [7:04:56<11:22:14, 2.50s/it] +2025-02-05 17:12:36 - ERROR - stderr - +2025-02-05 17:12:36 - ERROR - stderr - +2025-02-05 17:12:36 - INFO - stdout - {'loss': 0.8675, 'grad_norm': 0.9794313907623291, 'learning_rate': 1.7151255898110853e-05, 'epoch': 0.81} +2025-02-05 17:12:36 - ERROR - stderr - 27%|██▋ | 6035/22434 [7:04:56<11:22:14, 2.50s/it] +2025-02-05 17:12:38 - ERROR - stderr - 27%|██▋ | 6036/22434 [7:04:58<11:25:36, 2.51s/it] +2025-02-05 17:12:38 - ERROR - stderr - +2025-02-05 17:12:38 - ERROR - stderr - +2025-02-05 17:12:38 - INFO - stdout - {'loss': 0.9654, 'grad_norm': 1.0070325136184692, 'learning_rate': 1.7150246650486483e-05, 'epoch': 0.81} +2025-02-05 17:12:38 - ERROR - stderr - 27%|██▋ | 6036/22434 [7:04:58<11:25:36, 2.51s/it] +2025-02-05 17:12:41 - ERROR - stderr - 27%|██▋ | 6037/22434 [7:05:01<11:24:41, 2.51s/it] +2025-02-05 17:12:41 - ERROR - stderr - +2025-02-05 17:12:41 - ERROR - stderr - +2025-02-05 17:12:41 - INFO - stdout - {'loss': 0.8769, 'grad_norm': 1.0271183252334595, 'learning_rate': 1.7149237253822065e-05, 'epoch': 0.81} +2025-02-05 17:12:41 - ERROR - stderr - 27%|██▋ | 6037/22434 [7:05:01<11:24:41, 2.51s/it] +2025-02-05 17:12:43 - ERROR - stderr - 27%|██▋ | 6038/22434 [7:05:03<11:19:58, 2.49s/it] +2025-02-05 17:12:43 - ERROR - stderr - +2025-02-05 17:12:43 - ERROR - stderr - +2025-02-05 17:12:43 - INFO - stdout - {'loss': 0.9432, 'grad_norm': 1.057939052581787, 'learning_rate': 1.714822770813864e-05, 'epoch': 0.81} +2025-02-05 17:12:43 - ERROR - stderr - 27%|██▋ | 6038/22434 [7:05:03<11:19:58, 2.49s/it] +2025-02-05 17:12:46 - ERROR - stderr - 27%|██▋ | 6039/22434 [7:05:06<11:22:14, 2.50s/it] +2025-02-05 17:12:46 - ERROR - stderr - +2025-02-05 17:12:46 - ERROR - stderr - +2025-02-05 17:12:46 - INFO - stdout - {'loss': 1.0501, 'grad_norm': 1.1301624774932861, 'learning_rate': 1.714721801345724e-05, 'epoch': 0.81} +2025-02-05 17:12:46 - ERROR - stderr - 27%|██▋ | 
6039/22434 [7:05:06<11:22:14, 2.50s/it] +2025-02-05 17:12:48 - ERROR - stderr - 27%|██▋ | 6040/22434 [7:05:08<11:32:01, 2.53s/it] +2025-02-05 17:12:48 - ERROR - stderr - +2025-02-05 17:12:48 - ERROR - stderr - +2025-02-05 17:12:48 - INFO - stdout - {'loss': 0.9327, 'grad_norm': 1.1286258697509766, 'learning_rate': 1.714620816979893e-05, 'epoch': 0.81} +2025-02-05 17:12:48 - ERROR - stderr - 27%|██▋ | 6040/22434 [7:05:08<11:32:01, 2.53s/it] +2025-02-05 17:12:51 - ERROR - stderr - 27%|██▋ | 6041/22434 [7:05:11<11:28:58, 2.52s/it] +2025-02-05 17:12:51 - ERROR - stderr - +2025-02-05 17:12:51 - ERROR - stderr - +2025-02-05 17:12:51 - INFO - stdout - {'loss': 0.8419, 'grad_norm': 0.9469525218009949, 'learning_rate': 1.714519817718474e-05, 'epoch': 0.81} +2025-02-05 17:12:51 - ERROR - stderr - 27%|██▋ | 6041/22434 [7:05:11<11:28:58, 2.52s/it] +2025-02-05 17:12:54 - ERROR - stderr - 27%|██▋ | 6042/22434 [7:05:13<11:34:10, 2.54s/it] +2025-02-05 17:12:54 - ERROR - stderr - +2025-02-05 17:12:54 - ERROR - stderr - +2025-02-05 17:12:54 - INFO - stdout - {'loss': 0.9878, 'grad_norm': 1.1028311252593994, 'learning_rate': 1.7144188035635735e-05, 'epoch': 0.81} +2025-02-05 17:12:54 - ERROR - stderr - 27%|██▋ | 6042/22434 [7:05:13<11:34:10, 2.54s/it] +2025-02-05 17:12:56 - ERROR - stderr - 27%|██▋ | 6043/22434 [7:05:16<11:31:24, 2.53s/it] +2025-02-05 17:12:56 - ERROR - stderr - +2025-02-05 17:12:56 - ERROR - stderr - +2025-02-05 17:12:56 - INFO - stdout - {'loss': 1.0624, 'grad_norm': 1.1041207313537598, 'learning_rate': 1.714317774517297e-05, 'epoch': 0.81} +2025-02-05 17:12:56 - ERROR - stderr - 27%|██▋ | 6043/22434 [7:05:16<11:31:24, 2.53s/it] +2025-02-05 17:12:58 - ERROR - stderr - 27%|██▋ | 6044/22434 [7:05:18<11:26:25, 2.51s/it] +2025-02-05 17:12:59 - ERROR - stderr - +2025-02-05 17:12:59 - ERROR - stderr - +2025-02-05 17:12:59 - INFO - stdout - {'loss': 0.9587, 'grad_norm': 1.0350028276443481, 'learning_rate': 1.7142167305817495e-05, 'epoch': 0.81} +2025-02-05 17:12:59 - 
ERROR - stderr - 27%|██▋ | 6044/22434 [7:05:18<11:26:25, 2.51s/it] +2025-02-05 17:13:01 - ERROR - stderr - 27%|██▋ | 6045/22434 [7:05:21<11:23:38, 2.50s/it] +2025-02-05 17:13:01 - ERROR - stderr - +2025-02-05 17:13:01 - ERROR - stderr - +2025-02-05 17:13:01 - INFO - stdout - {'loss': 0.8189, 'grad_norm': 1.0243061780929565, 'learning_rate': 1.714115671759038e-05, 'epoch': 0.81} +2025-02-05 17:13:01 - ERROR - stderr - 27%|██▋ | 6045/22434 [7:05:21<11:23:38, 2.50s/it] +2025-02-05 17:13:03 - ERROR - stderr - 27%|██▋ | 6046/22434 [7:05:23<11:23:03, 2.50s/it] +2025-02-05 17:13:04 - ERROR - stderr - +2025-02-05 17:13:04 - ERROR - stderr - +2025-02-05 17:13:04 - INFO - stdout - {'loss': 0.9866, 'grad_norm': 1.1283940076828003, 'learning_rate': 1.7140145980512684e-05, 'epoch': 0.81} +2025-02-05 17:13:04 - ERROR - stderr - 27%|██▋ | 6046/22434 [7:05:23<11:23:03, 2.50s/it] +2025-02-05 17:13:06 - ERROR - stderr - 27%|██▋ | 6047/22434 [7:05:26<11:26:26, 2.51s/it] +2025-02-05 17:13:06 - ERROR - stderr - +2025-02-05 17:13:06 - ERROR - stderr - +2025-02-05 17:13:06 - INFO - stdout - {'loss': 0.9221, 'grad_norm': 1.0392546653747559, 'learning_rate': 1.7139135094605478e-05, 'epoch': 0.81} +2025-02-05 17:13:06 - ERROR - stderr - 27%|██▋ | 6047/22434 [7:05:26<11:26:26, 2.51s/it] +2025-02-05 17:13:09 - ERROR - stderr - 27%|██▋ | 6048/22434 [7:05:28<11:26:16, 2.51s/it] +2025-02-05 17:13:09 - ERROR - stderr - +2025-02-05 17:13:09 - ERROR - stderr - +2025-02-05 17:13:09 - INFO - stdout - {'loss': 0.9427, 'grad_norm': 1.103288173675537, 'learning_rate': 1.7138124059889834e-05, 'epoch': 0.81} +2025-02-05 17:13:09 - ERROR - stderr - 27%|██▋ | 6048/22434 [7:05:28<11:26:16, 2.51s/it] +2025-02-05 17:13:11 - ERROR - stderr - 27%|██▋ | 6049/22434 [7:05:31<11:21:55, 2.50s/it] +2025-02-05 17:13:11 - ERROR - stderr - +2025-02-05 17:13:11 - ERROR - stderr - +2025-02-05 17:13:11 - INFO - stdout - {'loss': 0.8547, 'grad_norm': 1.0742000341415405, 'learning_rate': 1.713711287638682e-05, 'epoch': 
0.81} +2025-02-05 17:13:11 - ERROR - stderr - 27%|██▋ | 6049/22434 [7:05:31<11:21:55, 2.50s/it] +2025-02-05 17:13:14 - ERROR - stderr - 27%|██▋ | 6050/22434 [7:05:33<11:25:01, 2.51s/it] +2025-02-05 17:13:14 - ERROR - stderr - +2025-02-05 17:13:14 - ERROR - stderr - +2025-02-05 17:13:14 - INFO - stdout - {'loss': 0.8976, 'grad_norm': 1.0859650373458862, 'learning_rate': 1.7136101544117526e-05, 'epoch': 0.81} +2025-02-05 17:13:14 - ERROR - stderr - 27%|██▋ | 6050/22434 [7:05:33<11:25:01, 2.51s/it] +2025-02-05 17:13:16 - ERROR - stderr - 27%|██▋ | 6051/22434 [7:05:36<11:25:42, 2.51s/it] +2025-02-05 17:13:16 - ERROR - stderr - +2025-02-05 17:13:16 - ERROR - stderr - +2025-02-05 17:13:16 - INFO - stdout - {'loss': 0.9624, 'grad_norm': 1.0058294534683228, 'learning_rate': 1.713509006310302e-05, 'epoch': 0.81} +2025-02-05 17:13:16 - ERROR - stderr - 27%|██▋ | 6051/22434 [7:05:36<11:25:42, 2.51s/it] +2025-02-05 17:13:19 - ERROR - stderr - 27%|██▋ | 6052/22434 [7:05:38<11:29:31, 2.53s/it] +2025-02-05 17:13:19 - ERROR - stderr - +2025-02-05 17:13:19 - ERROR - stderr - +2025-02-05 17:13:19 - INFO - stdout - {'loss': 0.9371, 'grad_norm': 0.9886820912361145, 'learning_rate': 1.7134078433364386e-05, 'epoch': 0.81} +2025-02-05 17:13:19 - ERROR - stderr - 27%|██▋ | 6052/22434 [7:05:38<11:29:31, 2.53s/it] +2025-02-05 17:13:21 - ERROR - stderr - 27%|██▋ | 6053/22434 [7:05:41<11:30:42, 2.53s/it] +2025-02-05 17:13:21 - ERROR - stderr - +2025-02-05 17:13:21 - ERROR - stderr - +2025-02-05 17:13:21 - INFO - stdout - {'loss': 1.1178, 'grad_norm': 1.1034040451049805, 'learning_rate': 1.7133066654922714e-05, 'epoch': 0.81} +2025-02-05 17:13:21 - ERROR - stderr - 27%|██▋ | 6053/22434 [7:05:41<11:30:42, 2.53s/it] +2025-02-05 17:13:24 - ERROR - stderr - 27%|██▋ | 6054/22434 [7:05:44<11:36:54, 2.55s/it] +2025-02-05 17:13:24 - ERROR - stderr - +2025-02-05 17:13:24 - ERROR - stderr - +2025-02-05 17:13:24 - INFO - stdout - {'loss': 1.0018, 'grad_norm': 1.0523929595947266, 'learning_rate': 
1.7132054727799096e-05, 'epoch': 0.81} +2025-02-05 17:13:24 - ERROR - stderr - 27%|██▋ | 6054/22434 [7:05:44<11:36:54, 2.55s/it] +2025-02-05 17:13:26 - ERROR - stderr - 27%|██▋ | 6055/22434 [7:05:46<11:38:24, 2.56s/it] +2025-02-05 17:13:26 - ERROR - stderr - +2025-02-05 17:13:26 - ERROR - stderr - +2025-02-05 17:13:26 - INFO - stdout - {'loss': 1.0176, 'grad_norm': 0.9644655585289001, 'learning_rate': 1.7131042652014623e-05, 'epoch': 0.81} +2025-02-05 17:13:26 - ERROR - stderr - 27%|██▋ | 6055/22434 [7:05:46<11:38:24, 2.56s/it] +2025-02-05 17:13:29 - ERROR - stderr - 27%|██▋ | 6056/22434 [7:05:49<11:31:15, 2.53s/it] +2025-02-05 17:13:29 - ERROR - stderr - +2025-02-05 17:13:29 - ERROR - stderr - +2025-02-05 17:13:29 - INFO - stdout - {'loss': 0.9593, 'grad_norm': 1.1424295902252197, 'learning_rate': 1.7130030427590386e-05, 'epoch': 0.81} +2025-02-05 17:13:29 - ERROR - stderr - 27%|██▋ | 6056/22434 [7:05:49<11:31:15, 2.53s/it] +2025-02-05 17:13:31 - ERROR - stderr - 27%|██▋ | 6057/22434 [7:05:51<11:30:29, 2.53s/it] +2025-02-05 17:13:31 - ERROR - stderr - +2025-02-05 17:13:31 - ERROR - stderr - +2025-02-05 17:13:31 - INFO - stdout - {'loss': 0.8514, 'grad_norm': 1.0487345457077026, 'learning_rate': 1.712901805454749e-05, 'epoch': 0.81} +2025-02-05 17:13:31 - ERROR - stderr - 27%|██▋ | 6057/22434 [7:05:51<11:30:29, 2.53s/it] +2025-02-05 17:13:34 - ERROR - stderr - 27%|██▋ | 6058/22434 [7:05:54<11:30:59, 2.53s/it] +2025-02-05 17:13:34 - ERROR - stderr - +2025-02-05 17:13:34 - ERROR - stderr - +2025-02-05 17:13:34 - INFO - stdout - {'loss': 0.893, 'grad_norm': 1.1162453889846802, 'learning_rate': 1.712800553290703e-05, 'epoch': 0.81} +2025-02-05 17:13:34 - ERROR - stderr - 27%|██▋ | 6058/22434 [7:05:54<11:30:59, 2.53s/it] +2025-02-05 17:13:36 - ERROR - stderr - 27%|██▋ | 6059/22434 [7:05:56<11:26:48, 2.52s/it] +2025-02-05 17:13:36 - ERROR - stderr - +2025-02-05 17:13:36 - ERROR - stderr - +2025-02-05 17:13:36 - INFO - stdout - {'loss': 0.9317, 'grad_norm': 
1.0783329010009766, 'learning_rate': 1.712699286269012e-05, 'epoch': 0.81} +2025-02-05 17:13:36 - ERROR - stderr - 27%|██▋ | 6059/22434 [7:05:56<11:26:48, 2.52s/it] +2025-02-05 17:13:39 - ERROR - stderr - 27%|██▋ | 6060/22434 [7:05:59<11:25:50, 2.51s/it] +2025-02-05 17:13:39 - ERROR - stderr - +2025-02-05 17:13:39 - ERROR - stderr - +2025-02-05 17:13:39 - INFO - stdout - {'loss': 0.9456, 'grad_norm': 0.9578342437744141, 'learning_rate': 1.712598004391786e-05, 'epoch': 0.81} +2025-02-05 17:13:39 - ERROR - stderr - 27%|██▋ | 6060/22434 [7:05:59<11:25:50, 2.51s/it] +2025-02-05 17:13:41 - ERROR - stderr - 27%|██▋ | 6061/22434 [7:06:01<11:27:33, 2.52s/it] +2025-02-05 17:13:41 - ERROR - stderr - +2025-02-05 17:13:41 - ERROR - stderr - +2025-02-05 17:13:41 - INFO - stdout - {'loss': 0.8595, 'grad_norm': 1.022254228591919, 'learning_rate': 1.7124967076611368e-05, 'epoch': 0.81} +2025-02-05 17:13:41 - ERROR - stderr - 27%|██▋ | 6061/22434 [7:06:01<11:27:33, 2.52s/it] +2025-02-05 17:13:44 - ERROR - stderr - 27%|██▋ | 6062/22434 [7:06:04<11:24:30, 2.51s/it] +2025-02-05 17:13:44 - ERROR - stderr - +2025-02-05 17:13:44 - ERROR - stderr - +2025-02-05 17:13:44 - INFO - stdout - {'loss': 0.9, 'grad_norm': 1.091898798942566, 'learning_rate': 1.7123953960791754e-05, 'epoch': 0.81} +2025-02-05 17:13:44 - ERROR - stderr - 27%|██▋ | 6062/22434 [7:06:04<11:24:30, 2.51s/it] +2025-02-05 17:13:46 - ERROR - stderr - 27%|██▋ | 6063/22434 [7:06:06<11:21:52, 2.50s/it] +2025-02-05 17:13:46 - ERROR - stderr - +2025-02-05 17:13:46 - ERROR - stderr - +2025-02-05 17:13:46 - INFO - stdout - {'loss': 0.8918, 'grad_norm': 1.0217387676239014, 'learning_rate': 1.7122940696480137e-05, 'epoch': 0.81} +2025-02-05 17:13:46 - ERROR - stderr - 27%|██▋ | 6063/22434 [7:06:06<11:21:52, 2.50s/it] +2025-02-05 17:13:49 - ERROR - stderr - 27%|██▋ | 6064/22434 [7:06:09<11:15:11, 2.47s/it] +2025-02-05 17:13:49 - ERROR - stderr - +2025-02-05 17:13:49 - ERROR - stderr - +2025-02-05 17:13:49 - INFO - stdout - {'loss': 
0.882, 'grad_norm': 1.0604270696640015, 'learning_rate': 1.7121927283697636e-05, 'epoch': 0.81} +2025-02-05 17:13:49 - ERROR - stderr - 27%|██▋ | 6064/22434 [7:06:09<11:15:11, 2.47s/it] +2025-02-05 17:13:51 - ERROR - stderr - 27%|██▋ | 6065/22434 [7:06:11<11:21:57, 2.50s/it] +2025-02-05 17:13:51 - ERROR - stderr - +2025-02-05 17:13:51 - ERROR - stderr - +2025-02-05 17:13:51 - INFO - stdout - {'loss': 0.8589, 'grad_norm': 0.9987754225730896, 'learning_rate': 1.7120913722465378e-05, 'epoch': 0.81} +2025-02-05 17:13:51 - ERROR - stderr - 27%|██▋ | 6065/22434 [7:06:11<11:21:57, 2.50s/it] +2025-02-05 17:13:54 - ERROR - stderr - 27%|██▋ | 6066/22434 [7:06:14<11:25:39, 2.51s/it] +2025-02-05 17:13:54 - ERROR - stderr - +2025-02-05 17:13:54 - ERROR - stderr - +2025-02-05 17:13:54 - INFO - stdout - {'loss': 0.9458, 'grad_norm': 1.1152828931808472, 'learning_rate': 1.7119900012804484e-05, 'epoch': 0.81} +2025-02-05 17:13:54 - ERROR - stderr - 27%|██▋ | 6066/22434 [7:06:14<11:25:39, 2.51s/it] +2025-02-05 17:13:56 - ERROR - stderr - 27%|██▋ | 6067/22434 [7:06:16<11:19:06, 2.49s/it] +2025-02-05 17:13:56 - ERROR - stderr - +2025-02-05 17:13:56 - ERROR - stderr - +2025-02-05 17:13:56 - INFO - stdout - {'loss': 0.9186, 'grad_norm': 1.1335035562515259, 'learning_rate': 1.7118886154736092e-05, 'epoch': 0.81} +2025-02-05 17:13:56 - ERROR - stderr - 27%|██▋ | 6067/22434 [7:06:16<11:19:06, 2.49s/it] +2025-02-05 17:13:59 - ERROR - stderr - 27%|██▋ | 6068/22434 [7:06:19<11:53:09, 2.61s/it] +2025-02-05 17:13:59 - ERROR - stderr - +2025-02-05 17:13:59 - ERROR - stderr - +2025-02-05 17:13:59 - INFO - stdout - {'loss': 0.8612, 'grad_norm': 0.950318455696106, 'learning_rate': 1.7117872148281324e-05, 'epoch': 0.81} +2025-02-05 17:13:59 - ERROR - stderr - 27%|██▋ | 6068/22434 [7:06:19<11:53:09, 2.61s/it] +2025-02-05 17:14:02 - ERROR - stderr - 27%|██▋ | 6069/22434 [7:06:22<11:55:51, 2.62s/it] +2025-02-05 17:14:02 - ERROR - stderr - +2025-02-05 17:14:02 - ERROR - stderr - +2025-02-05 17:14:02 - 
INFO - stdout - {'loss': 0.9819, 'grad_norm': 1.0655595064163208, 'learning_rate': 1.7116857993461326e-05, 'epoch': 0.81} +2025-02-05 17:14:02 - ERROR - stderr - 27%|██▋ | 6069/22434 [7:06:22<11:55:51, 2.62s/it] +2025-02-05 17:14:04 - ERROR - stderr - 27%|██▋ | 6070/22434 [7:06:24<11:40:55, 2.57s/it] +2025-02-05 17:14:04 - ERROR - stderr - +2025-02-05 17:14:04 - ERROR - stderr - +2025-02-05 17:14:04 - INFO - stdout - {'loss': 0.8233, 'grad_norm': 0.924047589302063, 'learning_rate': 1.7115843690297236e-05, 'epoch': 0.81} +2025-02-05 17:14:04 - ERROR - stderr - 27%|██▋ | 6070/22434 [7:06:24<11:40:55, 2.57s/it] +2025-02-05 17:14:07 - ERROR - stderr - 27%|██▋ | 6071/22434 [7:06:27<11:40:49, 2.57s/it] +2025-02-05 17:14:07 - ERROR - stderr - +2025-02-05 17:14:07 - ERROR - stderr - +2025-02-05 17:14:07 - INFO - stdout - {'loss': 0.9554, 'grad_norm': 1.0580531358718872, 'learning_rate': 1.711482923881019e-05, 'epoch': 0.81} +2025-02-05 17:14:07 - ERROR - stderr - 27%|██▋ | 6071/22434 [7:06:27<11:40:49, 2.57s/it] +2025-02-05 17:14:10 - ERROR - stderr - 27%|██▋ | 6072/22434 [7:06:29<11:50:29, 2.61s/it] +2025-02-05 17:14:10 - ERROR - stderr - +2025-02-05 17:14:10 - ERROR - stderr - +2025-02-05 17:14:10 - INFO - stdout - {'loss': 0.891, 'grad_norm': 0.9948450326919556, 'learning_rate': 1.7113814639021334e-05, 'epoch': 0.81} +2025-02-05 17:14:10 - ERROR - stderr - 27%|██▋ | 6072/22434 [7:06:29<11:50:29, 2.61s/it] +2025-02-05 17:14:12 - ERROR - stderr - 27%|██▋ | 6073/22434 [7:06:32<12:16:17, 2.70s/it] +2025-02-05 17:14:13 - ERROR - stderr - +2025-02-05 17:14:13 - ERROR - stderr - +2025-02-05 17:14:13 - INFO - stdout - {'loss': 0.938, 'grad_norm': 0.9294485449790955, 'learning_rate': 1.7112799890951823e-05, 'epoch': 0.81} +2025-02-05 17:14:13 - ERROR - stderr - 27%|██▋ | 6073/22434 [7:06:32<12:16:17, 2.70s/it] +2025-02-05 17:14:15 - ERROR - stderr - 27%|██▋ | 6074/22434 [7:06:35<11:57:18, 2.63s/it] +2025-02-05 17:14:15 - ERROR - stderr - +2025-02-05 17:14:15 - ERROR - stderr - 
+2025-02-05 17:14:15 - INFO - stdout - {'loss': 0.986, 'grad_norm': 1.0952844619750977, 'learning_rate': 1.7111784994622804e-05, 'epoch': 0.81} +2025-02-05 17:14:15 - ERROR - stderr - 27%|██▋ | 6074/22434 [7:06:35<11:57:18, 2.63s/it] +2025-02-05 17:14:17 - ERROR - stderr - 27%|██▋ | 6075/22434 [7:06:37<11:50:45, 2.61s/it] +2025-02-05 17:14:18 - ERROR - stderr - +2025-02-05 17:14:18 - ERROR - stderr - +2025-02-05 17:14:18 - INFO - stdout - {'loss': 0.9233, 'grad_norm': 1.0463758707046509, 'learning_rate': 1.711076995005543e-05, 'epoch': 0.81} +2025-02-05 17:14:18 - ERROR - stderr - 27%|██▋ | 6075/22434 [7:06:37<11:50:45, 2.61s/it] +2025-02-05 17:14:20 - ERROR - stderr - 27%|██▋ | 6076/22434 [7:06:40<11:37:37, 2.56s/it] +2025-02-05 17:14:20 - ERROR - stderr - +2025-02-05 17:14:20 - ERROR - stderr - +2025-02-05 17:14:20 - INFO - stdout - {'loss': 0.8834, 'grad_norm': 1.1055735349655151, 'learning_rate': 1.710975475727086e-05, 'epoch': 0.81} +2025-02-05 17:14:20 - ERROR - stderr - 27%|██▋ | 6076/22434 [7:06:40<11:37:37, 2.56s/it] +2025-02-05 17:14:23 - ERROR - stderr - 27%|██▋ | 6077/22434 [7:06:43<11:59:51, 2.64s/it] +2025-02-05 17:14:23 - ERROR - stderr - +2025-02-05 17:14:23 - ERROR - stderr - +2025-02-05 17:14:23 - INFO - stdout - {'loss': 0.9209, 'grad_norm': 1.1485838890075684, 'learning_rate': 1.7108739416290257e-05, 'epoch': 0.81} +2025-02-05 17:14:23 - ERROR - stderr - 27%|██▋ | 6077/22434 [7:06:43<11:59:51, 2.64s/it] +2025-02-05 17:14:25 - ERROR - stderr - 27%|██▋ | 6078/22434 [7:06:45<11:56:47, 2.63s/it] +2025-02-05 17:14:25 - ERROR - stderr - +2025-02-05 17:14:25 - ERROR - stderr - +2025-02-05 17:14:25 - INFO - stdout - {'loss': 1.0076, 'grad_norm': 1.16169273853302, 'learning_rate': 1.7107723927134788e-05, 'epoch': 0.81} +2025-02-05 17:14:25 - ERROR - stderr - 27%|██▋ | 6078/22434 [7:06:45<11:56:47, 2.63s/it] +2025-02-05 17:14:28 - ERROR - stderr - 27%|██▋ | 6079/22434 [7:06:48<11:39:22, 2.57s/it] +2025-02-05 17:14:28 - ERROR - stderr - +2025-02-05 
17:14:28 - ERROR - stderr - +2025-02-05 17:14:28 - INFO - stdout - {'loss': 1.0211, 'grad_norm': 1.0903571844100952, 'learning_rate': 1.710670828982561e-05, 'epoch': 0.81} +2025-02-05 17:14:28 - ERROR - stderr - 27%|██▋ | 6079/22434 [7:06:48<11:39:22, 2.57s/it] +2025-02-05 17:14:30 - ERROR - stderr - 27%|██▋ | 6080/22434 [7:06:50<11:33:20, 2.54s/it] +2025-02-05 17:14:30 - ERROR - stderr - +2025-02-05 17:14:30 - ERROR - stderr - +2025-02-05 17:14:30 - INFO - stdout - {'loss': 0.954, 'grad_norm': 1.1035288572311401, 'learning_rate': 1.7105692504383898e-05, 'epoch': 0.81} +2025-02-05 17:14:30 - ERROR - stderr - 27%|██▋ | 6080/22434 [7:06:50<11:33:20, 2.54s/it] +2025-02-05 17:14:33 - ERROR - stderr - 27%|██▋ | 6081/22434 [7:06:53<11:28:48, 2.53s/it] +2025-02-05 17:14:33 - ERROR - stderr - +2025-02-05 17:14:33 - ERROR - stderr - +2025-02-05 17:14:33 - INFO - stdout - {'loss': 0.9728, 'grad_norm': 1.0425844192504883, 'learning_rate': 1.7104676570830824e-05, 'epoch': 0.81} +2025-02-05 17:14:33 - ERROR - stderr - 27%|██▋ | 6081/22434 [7:06:53<11:28:48, 2.53s/it] +2025-02-05 17:14:35 - ERROR - stderr - 27%|██▋ | 6082/22434 [7:06:55<11:28:28, 2.53s/it] +2025-02-05 17:14:35 - ERROR - stderr - +2025-02-05 17:14:35 - ERROR - stderr - +2025-02-05 17:14:35 - INFO - stdout - {'loss': 0.8758, 'grad_norm': 1.0070650577545166, 'learning_rate': 1.710366048918757e-05, 'epoch': 0.81} +2025-02-05 17:14:35 - ERROR - stderr - 27%|██▋ | 6082/22434 [7:06:55<11:28:28, 2.53s/it] +2025-02-05 17:14:38 - ERROR - stderr - 27%|██▋ | 6083/22434 [7:06:58<11:30:30, 2.53s/it] +2025-02-05 17:14:38 - ERROR - stderr - +2025-02-05 17:14:38 - ERROR - stderr - +2025-02-05 17:14:38 - INFO - stdout - {'loss': 0.902, 'grad_norm': 1.0774873495101929, 'learning_rate': 1.7102644259475308e-05, 'epoch': 0.81} +2025-02-05 17:14:38 - ERROR - stderr - 27%|██▋ | 6083/22434 [7:06:58<11:30:30, 2.53s/it] +2025-02-05 17:14:40 - ERROR - stderr - 27%|██▋ | 6084/22434 [7:07:00<11:30:35, 2.53s/it] +2025-02-05 17:14:40 - ERROR - 
stderr - +2025-02-05 17:14:40 - ERROR - stderr - +2025-02-05 17:14:40 - INFO - stdout - {'loss': 1.0043, 'grad_norm': 1.142493724822998, 'learning_rate': 1.710162788171522e-05, 'epoch': 0.81} +2025-02-05 17:14:40 - ERROR - stderr - 27%|██▋ | 6084/22434 [7:07:00<11:30:35, 2.53s/it] +2025-02-05 17:14:43 - ERROR - stderr - 27%|██▋ | 6085/22434 [7:07:03<11:29:58, 2.53s/it] +2025-02-05 17:14:43 - ERROR - stderr - +2025-02-05 17:14:43 - ERROR - stderr - +2025-02-05 17:14:43 - INFO - stdout - {'loss': 0.8673, 'grad_norm': 0.8893013000488281, 'learning_rate': 1.71006113559285e-05, 'epoch': 0.81} +2025-02-05 17:14:43 - ERROR - stderr - 27%|██▋ | 6085/22434 [7:07:03<11:29:58, 2.53s/it] +2025-02-05 17:14:45 - ERROR - stderr - 27%|██▋ | 6086/22434 [7:07:05<11:34:23, 2.55s/it] +2025-02-05 17:14:46 - ERROR - stderr - +2025-02-05 17:14:46 - ERROR - stderr - +2025-02-05 17:14:46 - INFO - stdout - {'loss': 0.9643, 'grad_norm': 1.0045000314712524, 'learning_rate': 1.7099594682136325e-05, 'epoch': 0.81} +2025-02-05 17:14:46 - ERROR - stderr - 27%|██▋ | 6086/22434 [7:07:05<11:34:23, 2.55s/it] +2025-02-05 17:14:48 - ERROR - stderr - 27%|██▋ | 6087/22434 [7:07:08<11:26:51, 2.52s/it] +2025-02-05 17:14:48 - ERROR - stderr - +2025-02-05 17:14:48 - ERROR - stderr - +2025-02-05 17:14:48 - INFO - stdout - {'loss': 0.9119, 'grad_norm': 1.2097805738449097, 'learning_rate': 1.7098577860359896e-05, 'epoch': 0.81} +2025-02-05 17:14:48 - ERROR - stderr - 27%|██▋ | 6087/22434 [7:07:08<11:26:51, 2.52s/it] +2025-02-05 17:14:50 - ERROR - stderr - 27%|██▋ | 6088/22434 [7:07:10<11:23:05, 2.51s/it] +2025-02-05 17:14:50 - ERROR - stderr - +2025-02-05 17:14:50 - ERROR - stderr - +2025-02-05 17:14:50 - INFO - stdout - {'loss': 0.9811, 'grad_norm': 1.0805107355117798, 'learning_rate': 1.7097560890620403e-05, 'epoch': 0.81} +2025-02-05 17:14:50 - ERROR - stderr - 27%|██▋ | 6088/22434 [7:07:10<11:23:05, 2.51s/it] +2025-02-05 17:14:53 - ERROR - stderr - 27%|██▋ | 6089/22434 [7:07:13<11:24:12, 2.51s/it] 
+2025-02-05 17:14:53 - ERROR - stderr - +2025-02-05 17:14:53 - ERROR - stderr - +2025-02-05 17:14:53 - INFO - stdout - {'loss': 0.9281, 'grad_norm': 1.1926743984222412, 'learning_rate': 1.7096543772939047e-05, 'epoch': 0.81} +2025-02-05 17:14:53 - ERROR - stderr - 27%|██▋ | 6089/22434 [7:07:13<11:24:12, 2.51s/it] +2025-02-05 17:14:56 - ERROR - stderr - 27%|██▋ | 6090/22434 [7:07:15<11:29:20, 2.53s/it] +2025-02-05 17:14:56 - ERROR - stderr - +2025-02-05 17:14:56 - ERROR - stderr - +2025-02-05 17:14:56 - INFO - stdout - {'loss': 0.8964, 'grad_norm': 0.946707010269165, 'learning_rate': 1.709552650733702e-05, 'epoch': 0.81} +2025-02-05 17:14:56 - ERROR - stderr - 27%|██▋ | 6090/22434 [7:07:15<11:29:20, 2.53s/it] +2025-02-05 17:14:58 - ERROR - stderr - 27%|██▋ | 6091/22434 [7:07:18<11:25:24, 2.52s/it] +2025-02-05 17:14:58 - ERROR - stderr - +2025-02-05 17:14:58 - ERROR - stderr - +2025-02-05 17:14:58 - INFO - stdout - {'loss': 0.9068, 'grad_norm': 0.9843320250511169, 'learning_rate': 1.709450909383554e-05, 'epoch': 0.81} +2025-02-05 17:14:58 - ERROR - stderr - 27%|██▋ | 6091/22434 [7:07:18<11:25:24, 2.52s/it] +2025-02-05 17:15:00 - ERROR - stderr - 27%|██▋ | 6092/22434 [7:07:20<11:20:07, 2.50s/it] +2025-02-05 17:15:00 - ERROR - stderr - +2025-02-05 17:15:00 - ERROR - stderr - +2025-02-05 17:15:00 - INFO - stdout - {'loss': 0.906, 'grad_norm': 1.0823416709899902, 'learning_rate': 1.7093491532455804e-05, 'epoch': 0.81} +2025-02-05 17:15:00 - ERROR - stderr - 27%|██▋ | 6092/22434 [7:07:20<11:20:07, 2.50s/it] +2025-02-05 17:15:03 - ERROR - stderr - 27%|██▋ | 6093/22434 [7:07:23<11:18:26, 2.49s/it] +2025-02-05 17:15:03 - ERROR - stderr - +2025-02-05 17:15:03 - ERROR - stderr - +2025-02-05 17:15:03 - INFO - stdout - {'loss': 0.9362, 'grad_norm': 1.0088683366775513, 'learning_rate': 1.7092473823219028e-05, 'epoch': 0.81} +2025-02-05 17:15:03 - ERROR - stderr - 27%|██▋ | 6093/22434 [7:07:23<11:18:26, 2.49s/it] +2025-02-05 17:15:05 - ERROR - stderr - 27%|██▋ | 6094/22434 
[7:07:25<11:16:47, 2.49s/it] +2025-02-05 17:15:05 - ERROR - stderr - +2025-02-05 17:15:05 - ERROR - stderr - +2025-02-05 17:15:05 - INFO - stdout - {'loss': 0.9045, 'grad_norm': 0.9953064322471619, 'learning_rate': 1.7091455966146418e-05, 'epoch': 0.81} +2025-02-05 17:15:05 - ERROR - stderr - 27%|██▋ | 6094/22434 [7:07:25<11:16:47, 2.49s/it] +2025-02-05 17:15:08 - ERROR - stderr - 27%|██▋ | 6095/22434 [7:07:28<11:18:39, 2.49s/it] +2025-02-05 17:15:08 - ERROR - stderr - +2025-02-05 17:15:08 - ERROR - stderr - +2025-02-05 17:15:08 - INFO - stdout - {'loss': 0.9155, 'grad_norm': 1.0562125444412231, 'learning_rate': 1.7090437961259195e-05, 'epoch': 0.82} +2025-02-05 17:15:08 - ERROR - stderr - 27%|██▋ | 6095/22434 [7:07:28<11:18:39, 2.49s/it] +2025-02-05 17:15:10 - ERROR - stderr - 27%|██▋ | 6096/22434 [7:07:30<11:19:24, 2.50s/it] +2025-02-05 17:15:10 - ERROR - stderr - +2025-02-05 17:15:10 - ERROR - stderr - +2025-02-05 17:15:10 - INFO - stdout - {'loss': 0.9863, 'grad_norm': 1.160382628440857, 'learning_rate': 1.7089419808578574e-05, 'epoch': 0.82} +2025-02-05 17:15:10 - ERROR - stderr - 27%|██▋ | 6096/22434 [7:07:30<11:19:24, 2.50s/it] +2025-02-05 17:15:13 - ERROR - stderr - 27%|██▋ | 6097/22434 [7:07:33<11:41:39, 2.58s/it] +2025-02-05 17:15:13 - ERROR - stderr - +2025-02-05 17:15:13 - ERROR - stderr - +2025-02-05 17:15:13 - INFO - stdout - {'loss': 0.925, 'grad_norm': 1.1183600425720215, 'learning_rate': 1.7088401508125785e-05, 'epoch': 0.82} +2025-02-05 17:15:13 - ERROR - stderr - 27%|██▋ | 6097/22434 [7:07:33<11:41:39, 2.58s/it] +2025-02-05 17:15:16 - ERROR - stderr - 27%|██▋ | 6098/22434 [7:07:35<11:32:13, 2.54s/it] +2025-02-05 17:15:16 - ERROR - stderr - +2025-02-05 17:15:16 - ERROR - stderr - +2025-02-05 17:15:16 - INFO - stdout - {'loss': 1.1005, 'grad_norm': 1.0507615804672241, 'learning_rate': 1.708738305992205e-05, 'epoch': 0.82} +2025-02-05 17:15:16 - ERROR - stderr - 27%|██▋ | 6098/22434 [7:07:35<11:32:13, 2.54s/it] +2025-02-05 17:15:18 - ERROR - stderr 
- 27%|██▋ | 6099/22434 [7:07:38<11:26:50, 2.52s/it] +2025-02-05 17:15:18 - ERROR - stderr - +2025-02-05 17:15:18 - ERROR - stderr - +2025-02-05 17:15:18 - INFO - stdout - {'loss': 0.9254, 'grad_norm': 1.0413898229599, 'learning_rate': 1.7086364463988597e-05, 'epoch': 0.82} +2025-02-05 17:15:18 - ERROR - stderr - 27%|██▋ | 6099/22434 [7:07:38<11:26:50, 2.52s/it] +2025-02-05 17:15:21 - ERROR - stderr - 27%|██▋ | 6100/22434 [7:07:40<11:25:57, 2.52s/it] +2025-02-05 17:15:21 - ERROR - stderr - +2025-02-05 17:15:21 - ERROR - stderr - +2025-02-05 17:15:21 - INFO - stdout - {'loss': 1.0276, 'grad_norm': 1.0653586387634277, 'learning_rate': 1.7085345720346655e-05, 'epoch': 0.82} +2025-02-05 17:15:21 - ERROR - stderr - 27%|██▋ | 6100/22434 [7:07:40<11:25:57, 2.52s/it] +2025-02-05 17:15:23 - ERROR - stderr - 27%|██▋ | 6101/22434 [7:07:43<11:21:07, 2.50s/it] +2025-02-05 17:15:23 - ERROR - stderr - +2025-02-05 17:15:23 - ERROR - stderr - +2025-02-05 17:15:23 - INFO - stdout - {'loss': 0.9608, 'grad_norm': 1.0066090822219849, 'learning_rate': 1.7084326829017464e-05, 'epoch': 0.82} +2025-02-05 17:15:23 - ERROR - stderr - 27%|██▋ | 6101/22434 [7:07:43<11:21:07, 2.50s/it] +2025-02-05 17:15:26 - ERROR - stderr - 27%|██▋ | 6102/22434 [7:07:45<11:17:55, 2.49s/it] +2025-02-05 17:15:26 - ERROR - stderr - +2025-02-05 17:15:26 - ERROR - stderr - +2025-02-05 17:15:26 - INFO - stdout - {'loss': 0.8517, 'grad_norm': 1.0620393753051758, 'learning_rate': 1.7083307790022255e-05, 'epoch': 0.82} +2025-02-05 17:15:26 - ERROR - stderr - 27%|██▋ | 6102/22434 [7:07:45<11:17:55, 2.49s/it] +2025-02-05 17:15:28 - ERROR - stderr - 27%|██▋ | 6103/22434 [7:07:48<11:17:50, 2.49s/it] +2025-02-05 17:15:28 - ERROR - stderr - +2025-02-05 17:15:28 - ERROR - stderr - +2025-02-05 17:15:28 - INFO - stdout - {'loss': 0.9747, 'grad_norm': 1.108443021774292, 'learning_rate': 1.708228860338228e-05, 'epoch': 0.82} +2025-02-05 17:15:28 - ERROR - stderr - 27%|██▋ | 6103/22434 [7:07:48<11:17:50, 2.49s/it] +2025-02-05 
17:15:31 - ERROR - stderr - 27%|██▋ | 6104/22434 [7:07:50<11:22:15, 2.51s/it] +2025-02-05 17:15:31 - ERROR - stderr - +2025-02-05 17:15:31 - ERROR - stderr - +2025-02-05 17:15:31 - INFO - stdout - {'loss': 1.0128, 'grad_norm': 1.1763421297073364, 'learning_rate': 1.7081269269118773e-05, 'epoch': 0.82} +2025-02-05 17:15:31 - ERROR - stderr - 27%|██▋ | 6104/22434 [7:07:50<11:22:15, 2.51s/it] +2025-02-05 17:15:33 - ERROR - stderr - 27%|██▋ | 6105/22434 [7:07:53<11:26:13, 2.52s/it] +2025-02-05 17:15:33 - ERROR - stderr - +2025-02-05 17:15:33 - ERROR - stderr - +2025-02-05 17:15:33 - INFO - stdout - {'loss': 0.9683, 'grad_norm': 1.0500962734222412, 'learning_rate': 1.7080249787252984e-05, 'epoch': 0.82} +2025-02-05 17:15:33 - ERROR - stderr - 27%|██▋ | 6105/22434 [7:07:53<11:26:13, 2.52s/it] +2025-02-05 17:15:36 - ERROR - stderr - 27%|██▋ | 6106/22434 [7:07:56<11:52:49, 2.62s/it] +2025-02-05 17:15:36 - ERROR - stderr - +2025-02-05 17:15:36 - ERROR - stderr - +2025-02-05 17:15:36 - INFO - stdout - {'loss': 0.853, 'grad_norm': 0.9833402633666992, 'learning_rate': 1.707923015780616e-05, 'epoch': 0.82} +2025-02-05 17:15:36 - ERROR - stderr - 27%|██▋ | 6106/22434 [7:07:56<11:52:49, 2.62s/it] +2025-02-05 17:15:39 - ERROR - stderr - 27%|██▋ | 6107/22434 [7:07:58<11:52:50, 2.62s/it] +2025-02-05 17:15:39 - ERROR - stderr - +2025-02-05 17:15:39 - ERROR - stderr - +2025-02-05 17:15:39 - INFO - stdout - {'loss': 0.9281, 'grad_norm': 1.1283477544784546, 'learning_rate': 1.707821038079956e-05, 'epoch': 0.82} +2025-02-05 17:15:39 - ERROR - stderr - 27%|██▋ | 6107/22434 [7:07:58<11:52:50, 2.62s/it] +2025-02-05 17:15:42 - ERROR - stderr - 27%|██▋ | 6108/22434 [7:08:01<12:21:09, 2.72s/it] +2025-02-05 17:15:42 - ERROR - stderr - +2025-02-05 17:15:42 - ERROR - stderr - +2025-02-05 17:15:42 - INFO - stdout - {'loss': 0.7806, 'grad_norm': 0.9429518580436707, 'learning_rate': 1.707719045625444e-05, 'epoch': 0.82} +2025-02-05 17:15:42 - ERROR - stderr - 27%|██▋ | 6108/22434 [7:08:01<12:21:09, 
2.72s/it] +2025-02-05 17:15:44 - ERROR - stderr - 27%|██▋ | 6109/22434 [7:08:04<12:04:50, 2.66s/it] +2025-02-05 17:15:44 - ERROR - stderr - +2025-02-05 17:15:44 - ERROR - stderr - +2025-02-05 17:15:44 - INFO - stdout - {'loss': 0.9516, 'grad_norm': 1.0016028881072998, 'learning_rate': 1.7076170384192053e-05, 'epoch': 0.82} +2025-02-05 17:15:44 - ERROR - stderr - 27%|██▋ | 6109/22434 [7:08:04<12:04:50, 2.66s/it] +2025-02-05 17:15:47 - ERROR - stderr - 27%|██▋ | 6110/22434 [7:08:07<12:03:56, 2.66s/it] +2025-02-05 17:15:47 - ERROR - stderr - +2025-02-05 17:15:47 - ERROR - stderr - +2025-02-05 17:15:47 - INFO - stdout - {'loss': 1.0045, 'grad_norm': 1.1430487632751465, 'learning_rate': 1.7075150164633666e-05, 'epoch': 0.82} +2025-02-05 17:15:47 - ERROR - stderr - 27%|██▋ | 6110/22434 [7:08:07<12:03:56, 2.66s/it] +2025-02-05 17:15:49 - ERROR - stderr - 27%|██▋ | 6111/22434 [7:08:09<11:54:02, 2.62s/it] +2025-02-05 17:15:49 - ERROR - stderr - +2025-02-05 17:15:49 - ERROR - stderr - +2025-02-05 17:15:49 - INFO - stdout - {'loss': 0.9519, 'grad_norm': 1.1011921167373657, 'learning_rate': 1.7074129797600547e-05, 'epoch': 0.82} +2025-02-05 17:15:49 - ERROR - stderr - 27%|██▋ | 6111/22434 [7:08:09<11:54:02, 2.62s/it] +2025-02-05 17:15:52 - ERROR - stderr - 27%|██▋ | 6112/22434 [7:08:12<11:45:39, 2.59s/it] +2025-02-05 17:15:52 - ERROR - stderr - +2025-02-05 17:15:52 - ERROR - stderr - +2025-02-05 17:15:52 - INFO - stdout - {'loss': 0.8948, 'grad_norm': 1.0478070974349976, 'learning_rate': 1.707310928311396e-05, 'epoch': 0.82} +2025-02-05 17:15:52 - ERROR - stderr - 27%|██▋ | 6112/22434 [7:08:12<11:45:39, 2.59s/it] +2025-02-05 17:15:54 - ERROR - stderr - 27%|██▋ | 6113/22434 [7:08:14<11:44:40, 2.59s/it] +2025-02-05 17:15:54 - ERROR - stderr - +2025-02-05 17:15:54 - ERROR - stderr - +2025-02-05 17:15:54 - INFO - stdout - {'loss': 0.969, 'grad_norm': 1.0234606266021729, 'learning_rate': 1.707208862119518e-05, 'epoch': 0.82} +2025-02-05 17:15:54 - ERROR - stderr - 27%|██▋ | 
6113/22434 [7:08:14<11:44:40, 2.59s/it] +2025-02-05 17:15:57 - ERROR - stderr - 27%|██▋ | 6114/22434 [7:08:17<11:37:17, 2.56s/it] +2025-02-05 17:15:57 - ERROR - stderr - +2025-02-05 17:15:57 - ERROR - stderr - +2025-02-05 17:15:57 - INFO - stdout - {'loss': 0.9826, 'grad_norm': 1.094452977180481, 'learning_rate': 1.7071067811865477e-05, 'epoch': 0.82} +2025-02-05 17:15:57 - ERROR - stderr - 27%|██▋ | 6114/22434 [7:08:17<11:37:17, 2.56s/it] +2025-02-05 17:15:59 - ERROR - stderr - 27%|██▋ | 6115/22434 [7:08:19<11:39:53, 2.57s/it] +2025-02-05 17:16:00 - ERROR - stderr - +2025-02-05 17:16:00 - ERROR - stderr - +2025-02-05 17:16:00 - INFO - stdout - {'loss': 0.9562, 'grad_norm': 1.162048101425171, 'learning_rate': 1.707004685514613e-05, 'epoch': 0.82} +2025-02-05 17:16:00 - ERROR - stderr - 27%|██▋ | 6115/22434 [7:08:19<11:39:53, 2.57s/it] +2025-02-05 17:16:02 - ERROR - stderr - 27%|██▋ | 6116/22434 [7:08:22<11:31:38, 2.54s/it] +2025-02-05 17:16:02 - ERROR - stderr - +2025-02-05 17:16:02 - ERROR - stderr - +2025-02-05 17:16:02 - INFO - stdout - {'loss': 0.9095, 'grad_norm': 1.042914628982544, 'learning_rate': 1.7069025751058426e-05, 'epoch': 0.82} +2025-02-05 17:16:02 - ERROR - stderr - 27%|██▋ | 6116/22434 [7:08:22<11:31:38, 2.54s/it] +2025-02-05 17:16:05 - ERROR - stderr - 27%|██▋ | 6117/22434 [7:08:24<11:32:32, 2.55s/it] +2025-02-05 17:16:05 - ERROR - stderr - +2025-02-05 17:16:05 - ERROR - stderr - +2025-02-05 17:16:05 - INFO - stdout - {'loss': 0.8642, 'grad_norm': 1.0458208322525024, 'learning_rate': 1.7068004499623645e-05, 'epoch': 0.82} +2025-02-05 17:16:05 - ERROR - stderr - 27%|██▋ | 6117/22434 [7:08:24<11:32:32, 2.55s/it] +2025-02-05 17:16:07 - ERROR - stderr - 27%|██▋ | 6118/22434 [7:08:27<11:33:46, 2.55s/it] +2025-02-05 17:16:07 - ERROR - stderr - +2025-02-05 17:16:07 - ERROR - stderr - +2025-02-05 17:16:07 - INFO - stdout - {'loss': 0.9487, 'grad_norm': 1.1367179155349731, 'learning_rate': 1.7066983100863072e-05, 'epoch': 0.82} +2025-02-05 17:16:07 - ERROR 
- stderr - 27%|██▋ | 6118/22434 [7:08:27<11:33:46, 2.55s/it] +2025-02-05 17:16:10 - ERROR - stderr - 27%|██▋ | 6119/22434 [7:08:29<11:38:54, 2.57s/it] +2025-02-05 17:16:10 - ERROR - stderr - +2025-02-05 17:16:10 - ERROR - stderr - +2025-02-05 17:16:10 - INFO - stdout - {'loss': 0.8661, 'grad_norm': 1.0643582344055176, 'learning_rate': 1.7065961554797997e-05, 'epoch': 0.82} +2025-02-05 17:16:10 - ERROR - stderr - 27%|██▋ | 6119/22434 [7:08:30<11:38:54, 2.57s/it] +2025-02-05 17:16:12 - ERROR - stderr - 27%|██▋ | 6120/22434 [7:08:32<11:32:34, 2.55s/it] +2025-02-05 17:16:12 - ERROR - stderr - +2025-02-05 17:16:12 - ERROR - stderr - +2025-02-05 17:16:12 - INFO - stdout - {'loss': 0.9107, 'grad_norm': 1.0496025085449219, 'learning_rate': 1.7064939861449716e-05, 'epoch': 0.82} +2025-02-05 17:16:12 - ERROR - stderr - 27%|██▋ | 6120/22434 [7:08:32<11:32:34, 2.55s/it] +2025-02-05 17:16:15 - ERROR - stderr - 27%|██▋ | 6121/22434 [7:08:34<11:26:25, 2.52s/it] +2025-02-05 17:16:15 - ERROR - stderr - +2025-02-05 17:16:15 - ERROR - stderr - +2025-02-05 17:16:15 - INFO - stdout - {'loss': 0.9959, 'grad_norm': 1.0279496908187866, 'learning_rate': 1.7063918020839525e-05, 'epoch': 0.82} +2025-02-05 17:16:15 - ERROR - stderr - 27%|██▋ | 6121/22434 [7:08:34<11:26:25, 2.52s/it] +2025-02-05 17:16:17 - ERROR - stderr - 27%|██▋ | 6122/22434 [7:08:37<11:22:26, 2.51s/it] +2025-02-05 17:16:17 - ERROR - stderr - +2025-02-05 17:16:17 - ERROR - stderr - +2025-02-05 17:16:17 - INFO - stdout - {'loss': 1.0575, 'grad_norm': 1.1633539199829102, 'learning_rate': 1.7062896032988723e-05, 'epoch': 0.82} +2025-02-05 17:16:17 - ERROR - stderr - 27%|██▋ | 6122/22434 [7:08:37<11:22:26, 2.51s/it] +2025-02-05 17:16:20 - ERROR - stderr - 27%|██▋ | 6123/22434 [7:08:39<11:26:01, 2.52s/it] +2025-02-05 17:16:20 - ERROR - stderr - +2025-02-05 17:16:20 - ERROR - stderr - +2025-02-05 17:16:20 - INFO - stdout - {'loss': 0.8754, 'grad_norm': 0.9825596213340759, 'learning_rate': 1.7061873897918607e-05, 'epoch': 0.82} 
+2025-02-05 17:16:20 - ERROR - stderr - 27%|██▋ | 6123/22434 [7:08:40<11:26:01, 2.52s/it] +2025-02-05 17:16:22 - ERROR - stderr - 27%|██▋ | 6124/22434 [7:08:42<11:17:02, 2.49s/it] +2025-02-05 17:16:22 - ERROR - stderr - +2025-02-05 17:16:22 - ERROR - stderr - +2025-02-05 17:16:22 - INFO - stdout - {'loss': 0.9121, 'grad_norm': 1.1353669166564941, 'learning_rate': 1.706085161565049e-05, 'epoch': 0.82} +2025-02-05 17:16:22 - ERROR - stderr - 27%|██▋ | 6124/22434 [7:08:42<11:17:02, 2.49s/it] +2025-02-05 17:16:25 - ERROR - stderr - 27%|██▋ | 6125/22434 [7:08:44<11:26:44, 2.53s/it] +2025-02-05 17:16:25 - ERROR - stderr - +2025-02-05 17:16:25 - ERROR - stderr - +2025-02-05 17:16:25 - INFO - stdout - {'loss': 1.0054, 'grad_norm': 1.0410075187683105, 'learning_rate': 1.705982918620568e-05, 'epoch': 0.82} +2025-02-05 17:16:25 - ERROR - stderr - 27%|██▋ | 6125/22434 [7:08:45<11:26:44, 2.53s/it] +2025-02-05 17:16:27 - ERROR - stderr - 27%|██▋ | 6126/22434 [7:08:47<11:19:02, 2.50s/it] +2025-02-05 17:16:27 - ERROR - stderr - +2025-02-05 17:16:27 - ERROR - stderr - +2025-02-05 17:16:27 - INFO - stdout - {'loss': 0.9491, 'grad_norm': 1.1720629930496216, 'learning_rate': 1.7058806609605482e-05, 'epoch': 0.82} +2025-02-05 17:16:27 - ERROR - stderr - 27%|██▋ | 6126/22434 [7:08:47<11:19:02, 2.50s/it] +2025-02-05 17:16:30 - ERROR - stderr - 27%|██▋ | 6127/22434 [7:08:49<11:18:35, 2.50s/it] +2025-02-05 17:16:30 - ERROR - stderr - +2025-02-05 17:16:30 - ERROR - stderr - +2025-02-05 17:16:30 - INFO - stdout - {'loss': 0.8616, 'grad_norm': 1.0217541456222534, 'learning_rate': 1.705778388587122e-05, 'epoch': 0.82} +2025-02-05 17:16:30 - ERROR - stderr - 27%|██▋ | 6127/22434 [7:08:49<11:18:35, 2.50s/it] +2025-02-05 17:16:32 - ERROR - stderr - 27%|██▋ | 6128/22434 [7:08:52<11:22:58, 2.51s/it] +2025-02-05 17:16:32 - ERROR - stderr - +2025-02-05 17:16:32 - ERROR - stderr - +2025-02-05 17:16:32 - INFO - stdout - {'loss': 0.8867, 'grad_norm': 1.0271517038345337, 'learning_rate': 
1.70567610150242e-05, 'epoch': 0.82} +2025-02-05 17:16:32 - ERROR - stderr - 27%|██▋ | 6128/22434 [7:08:52<11:22:58, 2.51s/it] +2025-02-05 17:16:35 - ERROR - stderr - 27%|██▋ | 6129/22434 [7:08:55<11:31:48, 2.55s/it] +2025-02-05 17:16:35 - ERROR - stderr - +2025-02-05 17:16:35 - ERROR - stderr - +2025-02-05 17:16:35 - INFO - stdout - {'loss': 1.0929, 'grad_norm': 1.1515856981277466, 'learning_rate': 1.7055737997085753e-05, 'epoch': 0.82} +2025-02-05 17:16:35 - ERROR - stderr - 27%|██▋ | 6129/22434 [7:08:55<11:31:48, 2.55s/it] +2025-02-05 17:16:37 - ERROR - stderr - 27%|██▋ | 6130/22434 [7:08:57<11:27:01, 2.53s/it] +2025-02-05 17:16:37 - ERROR - stderr - +2025-02-05 17:16:37 - ERROR - stderr - +2025-02-05 17:16:37 - INFO - stdout - {'loss': 0.9406, 'grad_norm': 1.0481700897216797, 'learning_rate': 1.7054714832077198e-05, 'epoch': 0.82} +2025-02-05 17:16:37 - ERROR - stderr - 27%|██▋ | 6130/22434 [7:08:57<11:27:01, 2.53s/it] +2025-02-05 17:16:40 - ERROR - stderr - 27%|██▋ | 6131/22434 [7:09:00<11:28:29, 2.53s/it] +2025-02-05 17:16:40 - ERROR - stderr - +2025-02-05 17:16:40 - ERROR - stderr - +2025-02-05 17:16:40 - INFO - stdout - {'loss': 0.9357, 'grad_norm': 1.2201708555221558, 'learning_rate': 1.7053691520019863e-05, 'epoch': 0.82} +2025-02-05 17:16:40 - ERROR - stderr - 27%|██▋ | 6131/22434 [7:09:00<11:28:29, 2.53s/it] +2025-02-05 17:16:42 - ERROR - stderr - 27%|██▋ | 6132/22434 [7:09:02<11:21:12, 2.51s/it] +2025-02-05 17:16:42 - ERROR - stderr - +2025-02-05 17:16:42 - ERROR - stderr - +2025-02-05 17:16:42 - INFO - stdout - {'loss': 0.8883, 'grad_norm': 0.9761015176773071, 'learning_rate': 1.705266806093508e-05, 'epoch': 0.82} +2025-02-05 17:16:42 - ERROR - stderr - 27%|██▋ | 6132/22434 [7:09:02<11:21:12, 2.51s/it] +2025-02-05 17:16:45 - ERROR - stderr - 27%|██▋ | 6133/22434 [7:09:05<11:21:16, 2.51s/it] +2025-02-05 17:16:45 - ERROR - stderr - +2025-02-05 17:16:45 - ERROR - stderr - +2025-02-05 17:16:45 - INFO - stdout - {'loss': 0.9383, 'grad_norm': 
1.061244249343872, 'learning_rate': 1.7051644454844175e-05, 'epoch': 0.82} +2025-02-05 17:16:45 - ERROR - stderr - 27%|██▋ | 6133/22434 [7:09:05<11:21:16, 2.51s/it] +2025-02-05 17:16:47 - ERROR - stderr - 27%|██▋ | 6134/22434 [7:09:07<11:19:23, 2.50s/it] +2025-02-05 17:16:47 - ERROR - stderr - +2025-02-05 17:16:47 - ERROR - stderr - +2025-02-05 17:16:47 - INFO - stdout - {'loss': 0.8859, 'grad_norm': 1.055127739906311, 'learning_rate': 1.705062070176849e-05, 'epoch': 0.82} +2025-02-05 17:16:47 - ERROR - stderr - 27%|██▋ | 6134/22434 [7:09:07<11:19:23, 2.50s/it] +2025-02-05 17:16:50 - ERROR - stderr - 27%|██▋ | 6135/22434 [7:09:10<11:18:37, 2.50s/it] +2025-02-05 17:16:50 - ERROR - stderr - +2025-02-05 17:16:50 - ERROR - stderr - +2025-02-05 17:16:50 - INFO - stdout - {'loss': 1.0374, 'grad_norm': 1.154029369354248, 'learning_rate': 1.704959680172937e-05, 'epoch': 0.82} +2025-02-05 17:16:50 - ERROR - stderr - 27%|██▋ | 6135/22434 [7:09:10<11:18:37, 2.50s/it] +2025-02-05 17:16:52 - ERROR - stderr - 27%|██▋ | 6136/22434 [7:09:12<11:18:17, 2.50s/it] +2025-02-05 17:16:52 - ERROR - stderr - +2025-02-05 17:16:52 - ERROR - stderr - +2025-02-05 17:16:52 - INFO - stdout - {'loss': 0.9582, 'grad_norm': 1.0986170768737793, 'learning_rate': 1.7048572754748143e-05, 'epoch': 0.82} +2025-02-05 17:16:52 - ERROR - stderr - 27%|██▋ | 6136/22434 [7:09:12<11:18:17, 2.50s/it] +2025-02-05 17:16:55 - ERROR - stderr - 27%|██▋ | 6137/22434 [7:09:15<11:18:59, 2.50s/it] +2025-02-05 17:16:55 - ERROR - stderr - +2025-02-05 17:16:55 - ERROR - stderr - +2025-02-05 17:16:55 - INFO - stdout - {'loss': 0.89, 'grad_norm': 1.1026197671890259, 'learning_rate': 1.7047548560846166e-05, 'epoch': 0.82} +2025-02-05 17:16:55 - ERROR - stderr - 27%|██▋ | 6137/22434 [7:09:15<11:18:59, 2.50s/it] +2025-02-05 17:16:57 - ERROR - stderr - 27%|██▋ | 6138/22434 [7:09:17<11:21:17, 2.51s/it] +2025-02-05 17:16:57 - ERROR - stderr - +2025-02-05 17:16:57 - ERROR - stderr - +2025-02-05 17:16:57 - INFO - stdout - {'loss': 
0.9244, 'grad_norm': 0.9968591332435608, 'learning_rate': 1.7046524220044783e-05, 'epoch': 0.82} +2025-02-05 17:16:57 - ERROR - stderr - 27%|██▋ | 6138/22434 [7:09:17<11:21:17, 2.51s/it] +2025-02-05 17:17:00 - ERROR - stderr - 27%|██▋ | 6139/22434 [7:09:20<11:24:44, 2.52s/it] +2025-02-05 17:17:00 - ERROR - stderr - +2025-02-05 17:17:00 - ERROR - stderr - +2025-02-05 17:17:00 - INFO - stdout - {'loss': 0.9832, 'grad_norm': 1.1902706623077393, 'learning_rate': 1.7045499732365342e-05, 'epoch': 0.82} +2025-02-05 17:17:00 - ERROR - stderr - 27%|██▋ | 6139/22434 [7:09:20<11:24:44, 2.52s/it] +2025-02-05 17:17:02 - ERROR - stderr - 27%|██▋ | 6140/22434 [7:09:22<11:23:35, 2.52s/it] +2025-02-05 17:17:02 - ERROR - stderr - +2025-02-05 17:17:02 - ERROR - stderr - +2025-02-05 17:17:02 - INFO - stdout - {'loss': 1.0832, 'grad_norm': 1.1944248676300049, 'learning_rate': 1.7044475097829203e-05, 'epoch': 0.82} +2025-02-05 17:17:02 - ERROR - stderr - 27%|██▋ | 6140/22434 [7:09:22<11:23:35, 2.52s/it] +2025-02-05 17:17:05 - ERROR - stderr - 27%|██▋ | 6141/22434 [7:09:25<11:26:56, 2.53s/it] +2025-02-05 17:17:05 - ERROR - stderr - +2025-02-05 17:17:05 - ERROR - stderr - +2025-02-05 17:17:05 - INFO - stdout - {'loss': 1.0104, 'grad_norm': 1.1179265975952148, 'learning_rate': 1.704345031645772e-05, 'epoch': 0.82} +2025-02-05 17:17:05 - ERROR - stderr - 27%|██▋ | 6141/22434 [7:09:25<11:26:56, 2.53s/it] +2025-02-05 17:17:07 - ERROR - stderr - 27%|██▋ | 6142/22434 [7:09:27<11:25:18, 2.52s/it] +2025-02-05 17:17:07 - ERROR - stderr - +2025-02-05 17:17:07 - ERROR - stderr - +2025-02-05 17:17:07 - INFO - stdout - {'loss': 0.9153, 'grad_norm': 1.0587571859359741, 'learning_rate': 1.7042425388272256e-05, 'epoch': 0.82} +2025-02-05 17:17:07 - ERROR - stderr - 27%|██▋ | 6142/22434 [7:09:27<11:25:18, 2.52s/it] +2025-02-05 17:17:10 - ERROR - stderr - 27%|██▋ | 6143/22434 [7:09:30<11:22:58, 2.52s/it] +2025-02-05 17:17:10 - ERROR - stderr - +2025-02-05 17:17:10 - ERROR - stderr - +2025-02-05 17:17:10 - 
INFO - stdout - {'loss': 1.0031, 'grad_norm': 1.060757040977478, 'learning_rate': 1.7041400313294176e-05, 'epoch': 0.82} +2025-02-05 17:17:10 - ERROR - stderr - 27%|██▋ | 6143/22434 [7:09:30<11:22:58, 2.52s/it] +2025-02-05 17:17:12 - ERROR - stderr - 27%|██▋ | 6144/22434 [7:09:32<11:25:54, 2.53s/it] +2025-02-05 17:17:13 - ERROR - stderr - +2025-02-05 17:17:13 - ERROR - stderr - +2025-02-05 17:17:13 - INFO - stdout - {'loss': 0.9761, 'grad_norm': 1.040330410003662, 'learning_rate': 1.704037509154484e-05, 'epoch': 0.82} +2025-02-05 17:17:13 - ERROR - stderr - 27%|██▋ | 6144/22434 [7:09:32<11:25:54, 2.53s/it] +2025-02-05 17:17:15 - ERROR - stderr - 27%|██▋ | 6145/22434 [7:09:35<11:21:57, 2.51s/it] +2025-02-05 17:17:15 - ERROR - stderr - +2025-02-05 17:17:15 - ERROR - stderr - +2025-02-05 17:17:15 - INFO - stdout - {'loss': 0.9934, 'grad_norm': 1.3250054121017456, 'learning_rate': 1.7039349723045625e-05, 'epoch': 0.82} +2025-02-05 17:17:15 - ERROR - stderr - 27%|██▋ | 6145/22434 [7:09:35<11:21:57, 2.51s/it] +2025-02-05 17:17:18 - ERROR - stderr - 27%|██▋ | 6146/22434 [7:09:37<11:24:30, 2.52s/it] +2025-02-05 17:17:18 - ERROR - stderr - +2025-02-05 17:17:18 - ERROR - stderr - +2025-02-05 17:17:18 - INFO - stdout - {'loss': 0.9361, 'grad_norm': 0.9038297533988953, 'learning_rate': 1.7038324207817902e-05, 'epoch': 0.82} +2025-02-05 17:17:18 - ERROR - stderr - 27%|██▋ | 6146/22434 [7:09:37<11:24:30, 2.52s/it] +2025-02-05 17:17:20 - ERROR - stderr - 27%|██▋ | 6147/22434 [7:09:40<11:30:30, 2.54s/it] +2025-02-05 17:17:20 - ERROR - stderr - +2025-02-05 17:17:20 - ERROR - stderr - +2025-02-05 17:17:20 - INFO - stdout - {'loss': 0.8073, 'grad_norm': 0.9741806983947754, 'learning_rate': 1.7037298545883042e-05, 'epoch': 0.82} +2025-02-05 17:17:20 - ERROR - stderr - 27%|██▋ | 6147/22434 [7:09:40<11:30:30, 2.54s/it] +2025-02-05 17:17:23 - ERROR - stderr - 27%|██▋ | 6148/22434 [7:09:43<11:59:27, 2.65s/it] +2025-02-05 17:17:23 - ERROR - stderr - +2025-02-05 17:17:23 - ERROR - stderr - 
+2025-02-05 17:17:23 - INFO - stdout - {'loss': 0.8329, 'grad_norm': 0.8642858862876892, 'learning_rate': 1.7036272737262432e-05, 'epoch': 0.82} +2025-02-05 17:17:23 - ERROR - stderr - 27%|██▋ | 6148/22434 [7:09:43<11:59:27, 2.65s/it] +2025-02-05 17:17:26 - ERROR - stderr - 27%|██▋ | 6149/22434 [7:09:46<12:20:16, 2.73s/it] +2025-02-05 17:17:26 - ERROR - stderr - +2025-02-05 17:17:26 - ERROR - stderr - +2025-02-05 17:17:26 - INFO - stdout - {'loss': 0.929, 'grad_norm': 1.0173125267028809, 'learning_rate': 1.7035246781977447e-05, 'epoch': 0.82} +2025-02-05 17:17:26 - ERROR - stderr - 27%|██▋ | 6149/22434 [7:09:46<12:20:16, 2.73s/it] +2025-02-05 17:17:28 - ERROR - stderr - 27%|██▋ | 6150/22434 [7:09:48<12:04:20, 2.67s/it] +2025-02-05 17:17:28 - ERROR - stderr - +2025-02-05 17:17:28 - ERROR - stderr - +2025-02-05 17:17:28 - INFO - stdout - {'loss': 0.9053, 'grad_norm': 1.0292012691497803, 'learning_rate': 1.7034220680049477e-05, 'epoch': 0.82} +2025-02-05 17:17:28 - ERROR - stderr - 27%|██▋ | 6150/22434 [7:09:48<12:04:20, 2.67s/it] +2025-02-05 17:17:31 - ERROR - stderr - 27%|██▋ | 6151/22434 [7:09:51<11:46:27, 2.60s/it] +2025-02-05 17:17:31 - ERROR - stderr - +2025-02-05 17:17:31 - ERROR - stderr - +2025-02-05 17:17:31 - INFO - stdout - {'loss': 1.0302, 'grad_norm': 1.065398097038269, 'learning_rate': 1.7033194431499903e-05, 'epoch': 0.82} +2025-02-05 17:17:31 - ERROR - stderr - 27%|██▋ | 6151/22434 [7:09:51<11:46:27, 2.60s/it] +2025-02-05 17:17:33 - ERROR - stderr - 27%|██▋ | 6152/22434 [7:09:53<11:34:30, 2.56s/it] +2025-02-05 17:17:33 - ERROR - stderr - +2025-02-05 17:17:33 - ERROR - stderr - +2025-02-05 17:17:33 - INFO - stdout - {'loss': 0.9388, 'grad_norm': 1.0922472476959229, 'learning_rate': 1.7032168036350126e-05, 'epoch': 0.82} +2025-02-05 17:17:33 - ERROR - stderr - 27%|██▋ | 6152/22434 [7:09:53<11:34:30, 2.56s/it] +2025-02-05 17:17:36 - ERROR - stderr - 27%|██▋ | 6153/22434 [7:09:56<11:25:37, 2.53s/it] +2025-02-05 17:17:36 - ERROR - stderr - +2025-02-05 
17:17:36 - ERROR - stderr - +2025-02-05 17:17:36 - INFO - stdout - {'loss': 0.8563, 'grad_norm': 1.0479071140289307, 'learning_rate': 1.7031141494621534e-05, 'epoch': 0.82} +2025-02-05 17:17:36 - ERROR - stderr - 27%|██▋ | 6153/22434 [7:09:56<11:25:37, 2.53s/it] +2025-02-05 17:17:39 - ERROR - stderr - 27%|██▋ | 6154/22434 [7:09:58<11:42:08, 2.59s/it] +2025-02-05 17:17:39 - ERROR - stderr - +2025-02-05 17:17:39 - ERROR - stderr - +2025-02-05 17:17:39 - INFO - stdout - {'loss': 0.9729, 'grad_norm': 1.1110531091690063, 'learning_rate': 1.7030114806335528e-05, 'epoch': 0.82} +2025-02-05 17:17:39 - ERROR - stderr - 27%|██▋ | 6154/22434 [7:09:58<11:42:08, 2.59s/it] +2025-02-05 17:17:41 - ERROR - stderr - 27%|██▋ | 6155/22434 [7:10:01<11:29:59, 2.54s/it] +2025-02-05 17:17:41 - ERROR - stderr - +2025-02-05 17:17:41 - ERROR - stderr - +2025-02-05 17:17:41 - INFO - stdout - {'loss': 0.8602, 'grad_norm': 1.0562607049942017, 'learning_rate': 1.70290879715135e-05, 'epoch': 0.82} +2025-02-05 17:17:41 - ERROR - stderr - 27%|██▋ | 6155/22434 [7:10:01<11:29:59, 2.54s/it] +2025-02-05 17:17:44 - ERROR - stderr - 27%|██▋ | 6156/22434 [7:10:03<11:29:35, 2.54s/it] +2025-02-05 17:17:44 - ERROR - stderr - +2025-02-05 17:17:44 - ERROR - stderr - +2025-02-05 17:17:44 - INFO - stdout - {'loss': 0.8301, 'grad_norm': 1.1326544284820557, 'learning_rate': 1.7028060990176865e-05, 'epoch': 0.82} +2025-02-05 17:17:44 - ERROR - stderr - 27%|██▋ | 6156/22434 [7:10:03<11:29:35, 2.54s/it] +2025-02-05 17:17:46 - ERROR - stderr - 27%|██▋ | 6157/22434 [7:10:06<11:28:44, 2.54s/it] +2025-02-05 17:17:46 - ERROR - stderr - +2025-02-05 17:17:46 - ERROR - stderr - +2025-02-05 17:17:46 - INFO - stdout - {'loss': 1.0559, 'grad_norm': 1.1494784355163574, 'learning_rate': 1.702703386234702e-05, 'epoch': 0.82} +2025-02-05 17:17:46 - ERROR - stderr - 27%|██▋ | 6157/22434 [7:10:06<11:28:44, 2.54s/it] +2025-02-05 17:17:48 - ERROR - stderr - 27%|██▋ | 6158/22434 [7:10:08<11:22:23, 2.52s/it] +2025-02-05 17:17:49 - ERROR 
- stderr - +2025-02-05 17:17:49 - ERROR - stderr - +2025-02-05 17:17:49 - INFO - stdout - {'loss': 0.9313, 'grad_norm': 1.0292245149612427, 'learning_rate': 1.7026006588045382e-05, 'epoch': 0.82} +2025-02-05 17:17:49 - ERROR - stderr - 27%|██▋ | 6158/22434 [7:10:08<11:22:23, 2.52s/it] +2025-02-05 17:17:51 - ERROR - stderr - 27%|██▋ | 6159/22434 [7:10:11<11:20:03, 2.51s/it] +2025-02-05 17:17:51 - ERROR - stderr - +2025-02-05 17:17:51 - ERROR - stderr - +2025-02-05 17:17:51 - INFO - stdout - {'loss': 0.8746, 'grad_norm': 1.1391910314559937, 'learning_rate': 1.7024979167293354e-05, 'epoch': 0.82} +2025-02-05 17:17:51 - ERROR - stderr - 27%|██▋ | 6159/22434 [7:10:11<11:20:03, 2.51s/it] +2025-02-05 17:17:54 - ERROR - stderr - 27%|██▋ | 6160/22434 [7:10:13<11:21:47, 2.51s/it] +2025-02-05 17:17:54 - ERROR - stderr - +2025-02-05 17:17:54 - ERROR - stderr - +2025-02-05 17:17:54 - INFO - stdout - {'loss': 1.071, 'grad_norm': 1.0820252895355225, 'learning_rate': 1.702395160011236e-05, 'epoch': 0.82} +2025-02-05 17:17:54 - ERROR - stderr - 27%|██▋ | 6160/22434 [7:10:13<11:21:47, 2.51s/it] +2025-02-05 17:17:56 - ERROR - stderr - 27%|██▋ | 6161/22434 [7:10:16<11:21:10, 2.51s/it] +2025-02-05 17:17:56 - ERROR - stderr - +2025-02-05 17:17:56 - ERROR - stderr - +2025-02-05 17:17:56 - INFO - stdout - {'loss': 0.9617, 'grad_norm': 1.1025400161743164, 'learning_rate': 1.7022923886523818e-05, 'epoch': 0.82} +2025-02-05 17:17:56 - ERROR - stderr - 27%|██▋ | 6161/22434 [7:10:16<11:21:10, 2.51s/it] +2025-02-05 17:17:59 - ERROR - stderr - 27%|██▋ | 6162/22434 [7:10:18<11:24:53, 2.53s/it] +2025-02-05 17:17:59 - ERROR - stderr - +2025-02-05 17:17:59 - ERROR - stderr - +2025-02-05 17:17:59 - INFO - stdout - {'loss': 1.0077, 'grad_norm': 1.1073821783065796, 'learning_rate': 1.702189602654915e-05, 'epoch': 0.82} +2025-02-05 17:17:59 - ERROR - stderr - 27%|██▋ | 6162/22434 [7:10:18<11:24:53, 2.53s/it] +2025-02-05 17:18:01 - ERROR - stderr - 27%|██▋ | 6163/22434 [7:10:21<11:31:48, 2.55s/it] 
+2025-02-05 17:18:01 - ERROR - stderr - +2025-02-05 17:18:01 - ERROR - stderr - +2025-02-05 17:18:01 - INFO - stdout - {'loss': 1.0497, 'grad_norm': 1.083636999130249, 'learning_rate': 1.7020868020209773e-05, 'epoch': 0.82} +2025-02-05 17:18:01 - ERROR - stderr - 27%|██▋ | 6163/22434 [7:10:21<11:31:48, 2.55s/it] +2025-02-05 17:18:04 - ERROR - stderr - 27%|██▋ | 6164/22434 [7:10:23<11:30:15, 2.55s/it] +2025-02-05 17:18:04 - ERROR - stderr - +2025-02-05 17:18:04 - ERROR - stderr - +2025-02-05 17:18:04 - INFO - stdout - {'loss': 0.8729, 'grad_norm': 1.0290521383285522, 'learning_rate': 1.7019839867527122e-05, 'epoch': 0.82} +2025-02-05 17:18:04 - ERROR - stderr - 27%|██▋ | 6164/22434 [7:10:24<11:30:15, 2.55s/it] +2025-02-05 17:18:06 - ERROR - stderr - 27%|██▋ | 6165/22434 [7:10:26<11:26:48, 2.53s/it] +2025-02-05 17:18:06 - ERROR - stderr - +2025-02-05 17:18:06 - ERROR - stderr - +2025-02-05 17:18:06 - INFO - stdout - {'loss': 0.9581, 'grad_norm': 1.0141433477401733, 'learning_rate': 1.701881156852263e-05, 'epoch': 0.82} +2025-02-05 17:18:06 - ERROR - stderr - 27%|██▋ | 6165/22434 [7:10:26<11:26:48, 2.53s/it] +2025-02-05 17:18:09 - ERROR - stderr - 27%|██▋ | 6166/22434 [7:10:28<11:18:30, 2.50s/it] +2025-02-05 17:18:09 - ERROR - stderr - +2025-02-05 17:18:09 - ERROR - stderr - +2025-02-05 17:18:09 - INFO - stdout - {'loss': 0.8941, 'grad_norm': 1.0510011911392212, 'learning_rate': 1.7017783123217725e-05, 'epoch': 0.82} +2025-02-05 17:18:09 - ERROR - stderr - 27%|██▋ | 6166/22434 [7:10:28<11:18:30, 2.50s/it] +2025-02-05 17:18:11 - ERROR - stderr - 27%|██▋ | 6167/22434 [7:10:31<11:16:35, 2.50s/it] +2025-02-05 17:18:11 - ERROR - stderr - +2025-02-05 17:18:11 - ERROR - stderr - +2025-02-05 17:18:11 - INFO - stdout - {'loss': 0.9721, 'grad_norm': 1.148488163948059, 'learning_rate': 1.7016754531633846e-05, 'epoch': 0.82} +2025-02-05 17:18:11 - ERROR - stderr - 27%|██▋ | 6167/22434 [7:10:31<11:16:35, 2.50s/it] +2025-02-05 17:18:14 - ERROR - stderr - 27%|██▋ | 6168/22434 
[7:10:33<11:16:48, 2.50s/it] +2025-02-05 17:18:14 - ERROR - stderr - +2025-02-05 17:18:14 - ERROR - stderr - +2025-02-05 17:18:14 - INFO - stdout - {'loss': 0.9532, 'grad_norm': 0.9917287826538086, 'learning_rate': 1.701572579379243e-05, 'epoch': 0.82} +2025-02-05 17:18:14 - ERROR - stderr - 27%|██▋ | 6168/22434 [7:10:33<11:16:48, 2.50s/it] +2025-02-05 17:18:16 - ERROR - stderr - 27%|██▋ | 6169/22434 [7:10:36<11:17:57, 2.50s/it] +2025-02-05 17:18:16 - ERROR - stderr - +2025-02-05 17:18:16 - ERROR - stderr - +2025-02-05 17:18:16 - INFO - stdout - {'loss': 1.0275, 'grad_norm': 1.1366647481918335, 'learning_rate': 1.7014696909714928e-05, 'epoch': 0.82} +2025-02-05 17:18:16 - ERROR - stderr - 27%|██▋ | 6169/22434 [7:10:36<11:17:57, 2.50s/it] +2025-02-05 17:18:19 - ERROR - stderr - 28%|██▊ | 6170/22434 [7:10:38<11:12:31, 2.48s/it] +2025-02-05 17:18:19 - ERROR - stderr - +2025-02-05 17:18:19 - ERROR - stderr - +2025-02-05 17:18:19 - INFO - stdout - {'loss': 0.9064, 'grad_norm': 1.041864275932312, 'learning_rate': 1.7013667879422778e-05, 'epoch': 0.83} +2025-02-05 17:18:19 - ERROR - stderr - 28%|██▊ | 6170/22434 [7:10:38<11:12:31, 2.48s/it] +2025-02-05 17:18:21 - ERROR - stderr - 28%|██▊ | 6171/22434 [7:10:41<11:07:27, 2.46s/it] +2025-02-05 17:18:21 - ERROR - stderr - +2025-02-05 17:18:21 - ERROR - stderr - +2025-02-05 17:18:21 - INFO - stdout - {'loss': 0.9973, 'grad_norm': 1.248285174369812, 'learning_rate': 1.701263870293743e-05, 'epoch': 0.83} +2025-02-05 17:18:21 - ERROR - stderr - 28%|██▊ | 6171/22434 [7:10:41<11:07:27, 2.46s/it] +2025-02-05 17:18:24 - ERROR - stderr - 28%|██▊ | 6172/22434 [7:10:43<11:10:26, 2.47s/it] +2025-02-05 17:18:24 - ERROR - stderr - +2025-02-05 17:18:24 - ERROR - stderr - +2025-02-05 17:18:24 - INFO - stdout - {'loss': 0.9129, 'grad_norm': 1.0920511484146118, 'learning_rate': 1.7011609380280344e-05, 'epoch': 0.83} +2025-02-05 17:18:24 - ERROR - stderr - 28%|██▊ | 6172/22434 [7:10:43<11:10:26, 2.47s/it] +2025-02-05 17:18:26 - ERROR - stderr - 
28%|██▊ | 6173/22434 [7:10:46<11:09:20, 2.47s/it] +2025-02-05 17:18:26 - ERROR - stderr - +2025-02-05 17:18:26 - ERROR - stderr - +2025-02-05 17:18:26 - INFO - stdout - {'loss': 0.9571, 'grad_norm': 1.3310837745666504, 'learning_rate': 1.701057991147297e-05, 'epoch': 0.83} +2025-02-05 17:18:26 - ERROR - stderr - 28%|██▊ | 6173/22434 [7:10:46<11:09:20, 2.47s/it] +2025-02-05 17:18:28 - ERROR - stderr - 28%|██▊ | 6174/22434 [7:10:48<11:11:26, 2.48s/it] +2025-02-05 17:18:29 - ERROR - stderr - +2025-02-05 17:18:29 - ERROR - stderr - +2025-02-05 17:18:29 - INFO - stdout - {'loss': 0.9721, 'grad_norm': 1.1831388473510742, 'learning_rate': 1.7009550296536762e-05, 'epoch': 0.83} +2025-02-05 17:18:29 - ERROR - stderr - 28%|██▊ | 6174/22434 [7:10:48<11:11:26, 2.48s/it] +2025-02-05 17:18:31 - ERROR - stderr - 28%|██▊ | 6175/22434 [7:10:51<11:15:09, 2.49s/it] +2025-02-05 17:18:31 - ERROR - stderr - +2025-02-05 17:18:31 - ERROR - stderr - +2025-02-05 17:18:31 - INFO - stdout - {'loss': 0.8446, 'grad_norm': 1.0175886154174805, 'learning_rate': 1.700852053549319e-05, 'epoch': 0.83} +2025-02-05 17:18:31 - ERROR - stderr - 28%|██▊ | 6175/22434 [7:10:51<11:15:09, 2.49s/it] +2025-02-05 17:18:34 - ERROR - stderr - 28%|██▊ | 6176/22434 [7:10:53<11:18:52, 2.51s/it] +2025-02-05 17:18:34 - ERROR - stderr - +2025-02-05 17:18:34 - ERROR - stderr - +2025-02-05 17:18:34 - INFO - stdout - {'loss': 0.8424, 'grad_norm': 1.0355157852172852, 'learning_rate': 1.7007490628363706e-05, 'epoch': 0.83} +2025-02-05 17:18:34 - ERROR - stderr - 28%|██▊ | 6176/22434 [7:10:53<11:18:52, 2.51s/it] +2025-02-05 17:18:36 - ERROR - stderr - 28%|██▊ | 6177/22434 [7:10:56<11:18:48, 2.51s/it] +2025-02-05 17:18:36 - ERROR - stderr - +2025-02-05 17:18:36 - ERROR - stderr - +2025-02-05 17:18:36 - INFO - stdout - {'loss': 0.9823, 'grad_norm': 1.1303750276565552, 'learning_rate': 1.7006460575169792e-05, 'epoch': 0.83} +2025-02-05 17:18:36 - ERROR - stderr - 28%|██▊ | 6177/22434 [7:10:56<11:18:48, 2.51s/it] +2025-02-05 
17:18:39 - ERROR - stderr - 28%|██▊ | 6178/22434 [7:10:58<11:28:58, 2.54s/it] +2025-02-05 17:18:39 - ERROR - stderr - +2025-02-05 17:18:39 - ERROR - stderr - +2025-02-05 17:18:39 - INFO - stdout - {'loss': 1.0546, 'grad_norm': 1.0448142290115356, 'learning_rate': 1.700543037593291e-05, 'epoch': 0.83} +2025-02-05 17:18:39 - ERROR - stderr - 28%|██▊ | 6178/22434 [7:10:58<11:28:58, 2.54s/it] +2025-02-05 17:18:41 - ERROR - stderr - 28%|██▊ | 6179/22434 [7:11:01<11:23:40, 2.52s/it] +2025-02-05 17:18:41 - ERROR - stderr - +2025-02-05 17:18:41 - ERROR - stderr - +2025-02-05 17:18:41 - INFO - stdout - {'loss': 0.8833, 'grad_norm': 1.0527616739273071, 'learning_rate': 1.700440003067454e-05, 'epoch': 0.83} +2025-02-05 17:18:41 - ERROR - stderr - 28%|██▊ | 6179/22434 [7:11:01<11:23:40, 2.52s/it] +2025-02-05 17:18:44 - ERROR - stderr - 28%|██▊ | 6180/22434 [7:11:03<11:22:30, 2.52s/it] +2025-02-05 17:18:44 - ERROR - stderr - +2025-02-05 17:18:44 - ERROR - stderr - +2025-02-05 17:18:44 - INFO - stdout - {'loss': 0.9029, 'grad_norm': 1.1139705181121826, 'learning_rate': 1.7003369539416147e-05, 'epoch': 0.83} +2025-02-05 17:18:44 - ERROR - stderr - 28%|██▊ | 6180/22434 [7:11:03<11:22:30, 2.52s/it] +2025-02-05 17:18:46 - ERROR - stderr - 28%|██▊ | 6181/22434 [7:11:06<11:14:27, 2.49s/it] +2025-02-05 17:18:46 - ERROR - stderr - +2025-02-05 17:18:46 - ERROR - stderr - +2025-02-05 17:18:46 - INFO - stdout - {'loss': 0.8723, 'grad_norm': 1.0564367771148682, 'learning_rate': 1.700233890217922e-05, 'epoch': 0.83} +2025-02-05 17:18:46 - ERROR - stderr - 28%|██▊ | 6181/22434 [7:11:06<11:14:27, 2.49s/it] +2025-02-05 17:18:49 - ERROR - stderr - 28%|██▊ | 6182/22434 [7:11:08<11:14:09, 2.49s/it] +2025-02-05 17:18:49 - ERROR - stderr - +2025-02-05 17:18:49 - ERROR - stderr - +2025-02-05 17:18:49 - INFO - stdout - {'loss': 1.0272, 'grad_norm': 1.1202948093414307, 'learning_rate': 1.7001308118985237e-05, 'epoch': 0.83} +2025-02-05 17:18:49 - ERROR - stderr - 28%|██▊ | 6182/22434 [7:11:08<11:14:09, 
2.49s/it] +2025-02-05 17:18:51 - ERROR - stderr - 28%|██▊ | 6183/22434 [7:11:11<11:19:40, 2.51s/it] +2025-02-05 17:18:51 - ERROR - stderr - +2025-02-05 17:18:51 - ERROR - stderr - +2025-02-05 17:18:51 - INFO - stdout - {'loss': 0.9133, 'grad_norm': 1.1927080154418945, 'learning_rate': 1.700027718985569e-05, 'epoch': 0.83} +2025-02-05 17:18:51 - ERROR - stderr - 28%|██▊ | 6183/22434 [7:11:11<11:19:40, 2.51s/it] +2025-02-05 17:18:54 - ERROR - stderr - 28%|██▊ | 6184/22434 [7:11:14<11:39:37, 2.58s/it] +2025-02-05 17:18:54 - ERROR - stderr - +2025-02-05 17:18:54 - ERROR - stderr - +2025-02-05 17:18:54 - INFO - stdout - {'loss': 0.8929, 'grad_norm': 1.0962576866149902, 'learning_rate': 1.699924611481206e-05, 'epoch': 0.83} +2025-02-05 17:18:54 - ERROR - stderr - 28%|██▊ | 6184/22434 [7:11:14<11:39:37, 2.58s/it] +2025-02-05 17:18:56 - ERROR - stderr - 28%|██▊ | 6185/22434 [7:11:16<11:36:41, 2.57s/it] +2025-02-05 17:18:56 - ERROR - stderr - +2025-02-05 17:18:56 - ERROR - stderr - +2025-02-05 17:18:56 - INFO - stdout - {'loss': 0.901, 'grad_norm': 1.0162962675094604, 'learning_rate': 1.6998214893875845e-05, 'epoch': 0.83} +2025-02-05 17:18:56 - ERROR - stderr - 28%|██▊ | 6185/22434 [7:11:16<11:36:41, 2.57s/it] +2025-02-05 17:18:59 - ERROR - stderr - 28%|██▊ | 6186/22434 [7:11:19<11:29:21, 2.55s/it] +2025-02-05 17:18:59 - ERROR - stderr - +2025-02-05 17:18:59 - ERROR - stderr - +2025-02-05 17:18:59 - INFO - stdout - {'loss': 0.8625, 'grad_norm': 1.0443971157073975, 'learning_rate': 1.6997183527068536e-05, 'epoch': 0.83} +2025-02-05 17:18:59 - ERROR - stderr - 28%|██▊ | 6186/22434 [7:11:19<11:29:21, 2.55s/it] +2025-02-05 17:19:01 - ERROR - stderr - 28%|██▊ | 6187/22434 [7:11:21<11:25:42, 2.53s/it] +2025-02-05 17:19:01 - ERROR - stderr - +2025-02-05 17:19:01 - ERROR - stderr - +2025-02-05 17:19:01 - INFO - stdout - {'loss': 0.9086, 'grad_norm': 1.0037717819213867, 'learning_rate': 1.699615201441163e-05, 'epoch': 0.83} +2025-02-05 17:19:01 - ERROR - stderr - 28%|██▊ | 
6187/22434 [7:11:21<11:25:42, 2.53s/it] +2025-02-05 17:19:04 - ERROR - stderr - 28%|██▊ | 6188/22434 [7:11:24<11:18:02, 2.50s/it] +2025-02-05 17:19:04 - ERROR - stderr - +2025-02-05 17:19:04 - ERROR - stderr - +2025-02-05 17:19:04 - INFO - stdout - {'loss': 0.848, 'grad_norm': 1.1338119506835938, 'learning_rate': 1.699512035592663e-05, 'epoch': 0.83} +2025-02-05 17:19:04 - ERROR - stderr - 28%|██▊ | 6188/22434 [7:11:24<11:18:02, 2.50s/it] +2025-02-05 17:19:06 - ERROR - stderr - 28%|██▊ | 6189/22434 [7:11:26<11:21:12, 2.52s/it] +2025-02-05 17:19:06 - ERROR - stderr - +2025-02-05 17:19:06 - ERROR - stderr - +2025-02-05 17:19:06 - INFO - stdout - {'loss': 0.9257, 'grad_norm': 1.0317057371139526, 'learning_rate': 1.6994088551635043e-05, 'epoch': 0.83} +2025-02-05 17:19:06 - ERROR - stderr - 28%|██▊ | 6189/22434 [7:11:26<11:21:12, 2.52s/it] +2025-02-05 17:19:09 - ERROR - stderr - 28%|██▊ | 6190/22434 [7:11:29<11:16:35, 2.50s/it] +2025-02-05 17:19:09 - ERROR - stderr - +2025-02-05 17:19:09 - ERROR - stderr - +2025-02-05 17:19:09 - INFO - stdout - {'loss': 0.8783, 'grad_norm': 1.0992035865783691, 'learning_rate': 1.6993056601558372e-05, 'epoch': 0.83} +2025-02-05 17:19:09 - ERROR - stderr - 28%|██▊ | 6190/22434 [7:11:29<11:16:35, 2.50s/it] +2025-02-05 17:19:11 - ERROR - stderr - 28%|██▊ | 6191/22434 [7:11:31<11:26:11, 2.53s/it] +2025-02-05 17:19:12 - ERROR - stderr - +2025-02-05 17:19:12 - ERROR - stderr - +2025-02-05 17:19:12 - INFO - stdout - {'loss': 0.9876, 'grad_norm': 1.0445293188095093, 'learning_rate': 1.6992024505718126e-05, 'epoch': 0.83} +2025-02-05 17:19:12 - ERROR - stderr - 28%|██▊ | 6191/22434 [7:11:31<11:26:11, 2.53s/it] +2025-02-05 17:19:14 - ERROR - stderr - 28%|██▊ | 6192/22434 [7:11:34<11:24:23, 2.53s/it] +2025-02-05 17:19:14 - ERROR - stderr - +2025-02-05 17:19:14 - ERROR - stderr - +2025-02-05 17:19:14 - INFO - stdout - {'loss': 0.8892, 'grad_norm': 1.0581703186035156, 'learning_rate': 1.699099226413582e-05, 'epoch': 0.83} +2025-02-05 17:19:14 - 
ERROR - stderr - 28%|██▊ | 6192/22434 [7:11:34<11:24:23, 2.53s/it] +2025-02-05 17:19:17 - ERROR - stderr - 28%|██▊ | 6193/22434 [7:11:36<11:29:34, 2.55s/it] +2025-02-05 17:19:17 - ERROR - stderr - +2025-02-05 17:19:17 - ERROR - stderr - +2025-02-05 17:19:17 - INFO - stdout - {'loss': 0.9715, 'grad_norm': 1.1670211553573608, 'learning_rate': 1.6989959876832972e-05, 'epoch': 0.83} +2025-02-05 17:19:17 - ERROR - stderr - 28%|██▊ | 6193/22434 [7:11:36<11:29:34, 2.55s/it] +2025-02-05 17:19:19 - ERROR - stderr - 28%|██▊ | 6194/22434 [7:11:39<11:24:42, 2.53s/it] +2025-02-05 17:19:19 - ERROR - stderr - +2025-02-05 17:19:19 - ERROR - stderr - +2025-02-05 17:19:19 - INFO - stdout - {'loss': 0.8635, 'grad_norm': 1.0369857549667358, 'learning_rate': 1.6988927343831093e-05, 'epoch': 0.83} +2025-02-05 17:19:19 - ERROR - stderr - 28%|██▊ | 6194/22434 [7:11:39<11:24:42, 2.53s/it] +2025-02-05 17:19:21 - ERROR - stderr - 28%|██▊ | 6195/22434 [7:11:41<11:17:18, 2.50s/it] +2025-02-05 17:19:22 - ERROR - stderr - +2025-02-05 17:19:22 - ERROR - stderr - +2025-02-05 17:19:22 - INFO - stdout - {'loss': 0.9588, 'grad_norm': 1.0399136543273926, 'learning_rate': 1.6987894665151718e-05, 'epoch': 0.83} +2025-02-05 17:19:22 - ERROR - stderr - 28%|██▊ | 6195/22434 [7:11:41<11:17:18, 2.50s/it] +2025-02-05 17:19:24 - ERROR - stderr - 28%|██▊ | 6196/22434 [7:11:44<11:31:57, 2.56s/it] +2025-02-05 17:19:24 - ERROR - stderr - +2025-02-05 17:19:24 - ERROR - stderr - +2025-02-05 17:19:24 - INFO - stdout - {'loss': 0.8707, 'grad_norm': 1.045790433883667, 'learning_rate': 1.698686184081636e-05, 'epoch': 0.83} +2025-02-05 17:19:24 - ERROR - stderr - 28%|██▊ | 6196/22434 [7:11:44<11:31:57, 2.56s/it] +2025-02-05 17:19:27 - ERROR - stderr - 28%|██▊ | 6197/22434 [7:11:46<11:29:35, 2.55s/it] +2025-02-05 17:19:27 - ERROR - stderr - +2025-02-05 17:19:27 - ERROR - stderr - +2025-02-05 17:19:27 - INFO - stdout - {'loss': 0.9746, 'grad_norm': 1.0708565711975098, 'learning_rate': 1.698582887084656e-05, 'epoch': 0.83} 
+2025-02-05 17:19:27 - ERROR - stderr - 28%|██▊ | 6197/22434 [7:11:47<11:29:35, 2.55s/it] +2025-02-05 17:19:29 - ERROR - stderr - 28%|██▊ | 6198/22434 [7:11:49<11:26:46, 2.54s/it] +2025-02-05 17:19:29 - ERROR - stderr - +2025-02-05 17:19:29 - ERROR - stderr - +2025-02-05 17:19:29 - INFO - stdout - {'loss': 0.9087, 'grad_norm': 1.1816719770431519, 'learning_rate': 1.6984795755263836e-05, 'epoch': 0.83} +2025-02-05 17:19:29 - ERROR - stderr - 28%|██▊ | 6198/22434 [7:11:49<11:26:46, 2.54s/it] +2025-02-05 17:19:32 - ERROR - stderr - 28%|██▊ | 6199/22434 [7:11:52<11:39:40, 2.59s/it] +2025-02-05 17:19:32 - ERROR - stderr - +2025-02-05 17:19:32 - ERROR - stderr - +2025-02-05 17:19:32 - INFO - stdout - {'loss': 0.8929, 'grad_norm': 1.0195719003677368, 'learning_rate': 1.6983762494089732e-05, 'epoch': 0.83} +2025-02-05 17:19:32 - ERROR - stderr - 28%|██▊ | 6199/22434 [7:11:52<11:39:40, 2.59s/it] +2025-02-05 17:19:34 - ERROR - stderr - 28%|██▊ | 6200/22434 [7:11:54<11:37:06, 2.58s/it] +2025-02-05 17:19:35 - ERROR - stderr - +2025-02-05 17:19:35 - ERROR - stderr - +2025-02-05 17:19:35 - INFO - stdout - {'loss': 0.868, 'grad_norm': 0.986464262008667, 'learning_rate': 1.698272908734578e-05, 'epoch': 0.83} +2025-02-05 17:19:35 - ERROR - stderr - 28%|██▊ | 6200/22434 [7:11:54<11:37:06, 2.58s/it] +2025-02-05 17:19:37 - ERROR - stderr - 28%|██▊ | 6201/22434 [7:11:57<11:25:54, 2.54s/it] +2025-02-05 17:19:37 - ERROR - stderr - +2025-02-05 17:19:37 - ERROR - stderr - +2025-02-05 17:19:37 - INFO - stdout - {'loss': 0.9668, 'grad_norm': 1.1000392436981201, 'learning_rate': 1.6981695535053518e-05, 'epoch': 0.83} +2025-02-05 17:19:37 - ERROR - stderr - 28%|██▊ | 6201/22434 [7:11:57<11:25:54, 2.54s/it] +2025-02-05 17:19:39 - ERROR - stderr - 28%|██▊ | 6202/22434 [7:11:59<11:24:47, 2.53s/it] +2025-02-05 17:19:39 - ERROR - stderr - +2025-02-05 17:19:39 - ERROR - stderr - +2025-02-05 17:19:39 - INFO - stdout - {'loss': 0.855, 'grad_norm': 0.9747217893600464, 'learning_rate': 
1.69806618372345e-05, 'epoch': 0.83} +2025-02-05 17:19:39 - ERROR - stderr - 28%|██▊ | 6202/22434 [7:11:59<11:24:47, 2.53s/it] +2025-02-05 17:19:42 - ERROR - stderr - 28%|██▊ | 6203/22434 [7:12:02<11:24:59, 2.53s/it] +2025-02-05 17:19:42 - ERROR - stderr - +2025-02-05 17:19:42 - ERROR - stderr - +2025-02-05 17:19:42 - INFO - stdout - {'loss': 0.9342, 'grad_norm': 1.1245551109313965, 'learning_rate': 1.697962799391026e-05, 'epoch': 0.83} +2025-02-05 17:19:42 - ERROR - stderr - 28%|██▊ | 6203/22434 [7:12:02<11:24:59, 2.53s/it] +2025-02-05 17:19:44 - ERROR - stderr - 28%|██▊ | 6204/22434 [7:12:04<11:19:43, 2.51s/it] +2025-02-05 17:19:44 - ERROR - stderr - +2025-02-05 17:19:44 - ERROR - stderr - +2025-02-05 17:19:44 - INFO - stdout - {'loss': 0.9354, 'grad_norm': 1.0616766214370728, 'learning_rate': 1.6978594005102354e-05, 'epoch': 0.83} +2025-02-05 17:19:44 - ERROR - stderr - 28%|██▊ | 6204/22434 [7:12:04<11:19:43, 2.51s/it] +2025-02-05 17:19:47 - ERROR - stderr - 28%|██▊ | 6205/22434 [7:12:07<11:54:32, 2.64s/it] +2025-02-05 17:19:47 - ERROR - stderr - +2025-02-05 17:19:47 - ERROR - stderr - +2025-02-05 17:19:47 - INFO - stdout - {'loss': 0.9014, 'grad_norm': 1.0917917490005493, 'learning_rate': 1.6977559870832336e-05, 'epoch': 0.83} +2025-02-05 17:19:47 - ERROR - stderr - 28%|██▊ | 6205/22434 [7:12:07<11:54:32, 2.64s/it] +2025-02-05 17:19:50 - ERROR - stderr - 28%|██▊ | 6206/22434 [7:12:10<12:04:51, 2.68s/it] +2025-02-05 17:19:50 - ERROR - stderr - +2025-02-05 17:19:50 - ERROR - stderr - +2025-02-05 17:19:50 - INFO - stdout - {'loss': 0.8093, 'grad_norm': 1.1070598363876343, 'learning_rate': 1.697652559112176e-05, 'epoch': 0.83} +2025-02-05 17:19:50 - ERROR - stderr - 28%|██▊ | 6206/22434 [7:12:10<12:04:51, 2.68s/it] +2025-02-05 17:19:53 - ERROR - stderr - 28%|██▊ | 6207/22434 [7:12:12<11:50:14, 2.63s/it] +2025-02-05 17:19:53 - ERROR - stderr - +2025-02-05 17:19:53 - ERROR - stderr - +2025-02-05 17:19:53 - INFO - stdout - {'loss': 0.9038, 'grad_norm': 
1.0546437501907349, 'learning_rate': 1.6975491165992182e-05, 'epoch': 0.83} +2025-02-05 17:19:53 - ERROR - stderr - 28%|██▊ | 6207/22434 [7:12:12<11:50:14, 2.63s/it] +2025-02-05 17:19:55 - ERROR - stderr - 28%|██▊ | 6208/22434 [7:12:15<11:40:21, 2.59s/it] +2025-02-05 17:19:55 - ERROR - stderr - +2025-02-05 17:19:55 - ERROR - stderr - +2025-02-05 17:19:55 - INFO - stdout - {'loss': 0.9929, 'grad_norm': 1.072019100189209, 'learning_rate': 1.6974456595465166e-05, 'epoch': 0.83} +2025-02-05 17:19:55 - ERROR - stderr - 28%|██▊ | 6208/22434 [7:12:15<11:40:21, 2.59s/it] +2025-02-05 17:19:58 - ERROR - stderr - 28%|██▊ | 6209/22434 [7:12:17<11:36:43, 2.58s/it] +2025-02-05 17:19:58 - ERROR - stderr - +2025-02-05 17:19:58 - ERROR - stderr - +2025-02-05 17:19:58 - INFO - stdout - {'loss': 0.8224, 'grad_norm': 1.1376469135284424, 'learning_rate': 1.6973421879562275e-05, 'epoch': 0.83} +2025-02-05 17:19:58 - ERROR - stderr - 28%|██▊ | 6209/22434 [7:12:18<11:36:43, 2.58s/it] +2025-02-05 17:20:00 - ERROR - stderr - 28%|██▊ | 6210/22434 [7:12:20<11:39:30, 2.59s/it] +2025-02-05 17:20:00 - ERROR - stderr - +2025-02-05 17:20:00 - ERROR - stderr - +2025-02-05 17:20:00 - INFO - stdout - {'loss': 0.9159, 'grad_norm': 0.9903003573417664, 'learning_rate': 1.697238701830508e-05, 'epoch': 0.83} +2025-02-05 17:20:00 - ERROR - stderr - 28%|██▊ | 6210/22434 [7:12:20<11:39:30, 2.59s/it] +2025-02-05 17:20:03 - ERROR - stderr - 28%|██▊ | 6211/22434 [7:12:23<11:31:29, 2.56s/it] +2025-02-05 17:20:03 - ERROR - stderr - +2025-02-05 17:20:03 - ERROR - stderr - +2025-02-05 17:20:03 - INFO - stdout - {'loss': 0.872, 'grad_norm': 0.9316397309303284, 'learning_rate': 1.697135201171515e-05, 'epoch': 0.83} +2025-02-05 17:20:03 - ERROR - stderr - 28%|██▊ | 6211/22434 [7:12:23<11:31:29, 2.56s/it] +2025-02-05 17:20:05 - ERROR - stderr - 28%|██▊ | 6212/22434 [7:12:25<11:25:11, 2.53s/it] +2025-02-05 17:20:05 - ERROR - stderr - +2025-02-05 17:20:05 - ERROR - stderr - +2025-02-05 17:20:05 - INFO - stdout - {'loss': 
0.978, 'grad_norm': 1.0285007953643799, 'learning_rate': 1.6970316859814054e-05, 'epoch': 0.83} +2025-02-05 17:20:05 - ERROR - stderr - 28%|██▊ | 6212/22434 [7:12:25<11:25:11, 2.53s/it] +2025-02-05 17:20:08 - ERROR - stderr - 28%|██▊ | 6213/22434 [7:12:27<11:18:37, 2.51s/it] +2025-02-05 17:20:08 - ERROR - stderr - +2025-02-05 17:20:08 - ERROR - stderr - +2025-02-05 17:20:08 - INFO - stdout - {'loss': 0.9887, 'grad_norm': 0.994144856929779, 'learning_rate': 1.6969281562623375e-05, 'epoch': 0.83} +2025-02-05 17:20:08 - ERROR - stderr - 28%|██▊ | 6213/22434 [7:12:28<11:18:37, 2.51s/it] +2025-02-05 17:20:10 - ERROR - stderr - 28%|██▊ | 6214/22434 [7:12:30<11:19:28, 2.51s/it] +2025-02-05 17:20:10 - ERROR - stderr - +2025-02-05 17:20:10 - ERROR - stderr - +2025-02-05 17:20:10 - INFO - stdout - {'loss': 0.882, 'grad_norm': 1.176943063735962, 'learning_rate': 1.6968246120164692e-05, 'epoch': 0.83} +2025-02-05 17:20:10 - ERROR - stderr - 28%|██▊ | 6214/22434 [7:12:30<11:19:28, 2.51s/it] +2025-02-05 17:20:13 - ERROR - stderr - 28%|██▊ | 6215/22434 [7:12:33<11:19:09, 2.51s/it] +2025-02-05 17:20:13 - ERROR - stderr - +2025-02-05 17:20:13 - ERROR - stderr - +2025-02-05 17:20:13 - INFO - stdout - {'loss': 0.9149, 'grad_norm': 1.0672295093536377, 'learning_rate': 1.6967210532459584e-05, 'epoch': 0.83} +2025-02-05 17:20:13 - ERROR - stderr - 28%|██▊ | 6215/22434 [7:12:33<11:19:09, 2.51s/it] +2025-02-05 17:20:16 - ERROR - stderr - 28%|██▊ | 6216/22434 [7:12:35<11:54:15, 2.64s/it] +2025-02-05 17:20:16 - ERROR - stderr - +2025-02-05 17:20:16 - ERROR - stderr - +2025-02-05 17:20:16 - INFO - stdout - {'loss': 0.971, 'grad_norm': 1.1021041870117188, 'learning_rate': 1.696617479952964e-05, 'epoch': 0.83} +2025-02-05 17:20:16 - ERROR - stderr - 28%|██▊ | 6216/22434 [7:12:36<11:54:15, 2.64s/it] +2025-02-05 17:20:18 - ERROR - stderr - 28%|██▊ | 6217/22434 [7:12:38<11:51:00, 2.63s/it] +2025-02-05 17:20:18 - ERROR - stderr - +2025-02-05 17:20:18 - ERROR - stderr - +2025-02-05 17:20:18 - INFO 
- stdout - {'loss': 0.9608, 'grad_norm': 1.0570067167282104, 'learning_rate': 1.6965138921396452e-05, 'epoch': 0.83} +2025-02-05 17:20:18 - ERROR - stderr - 28%|██▊ | 6217/22434 [7:12:38<11:51:00, 2.63s/it] +2025-02-05 17:20:21 - ERROR - stderr - 28%|██▊ | 6218/22434 [7:12:41<11:42:45, 2.60s/it] +2025-02-05 17:20:21 - ERROR - stderr - +2025-02-05 17:20:21 - ERROR - stderr - +2025-02-05 17:20:21 - INFO - stdout - {'loss': 0.9281, 'grad_norm': 0.9825366139411926, 'learning_rate': 1.6964102898081608e-05, 'epoch': 0.83} +2025-02-05 17:20:21 - ERROR - stderr - 28%|██▊ | 6218/22434 [7:12:41<11:42:45, 2.60s/it] +2025-02-05 17:20:23 - ERROR - stderr - 28%|██▊ | 6219/22434 [7:12:43<11:39:45, 2.59s/it] +2025-02-05 17:20:23 - ERROR - stderr - +2025-02-05 17:20:23 - ERROR - stderr - +2025-02-05 17:20:23 - INFO - stdout - {'loss': 0.9509, 'grad_norm': 1.0337327718734741, 'learning_rate': 1.69630667296067e-05, 'epoch': 0.83} +2025-02-05 17:20:23 - ERROR - stderr - 28%|██▊ | 6219/22434 [7:12:43<11:39:45, 2.59s/it] +2025-02-05 17:20:26 - ERROR - stderr - 28%|██▊ | 6220/22434 [7:12:46<11:33:53, 2.57s/it] +2025-02-05 17:20:26 - ERROR - stderr - +2025-02-05 17:20:26 - ERROR - stderr - +2025-02-05 17:20:26 - INFO - stdout - {'loss': 1.0371, 'grad_norm': 1.192141056060791, 'learning_rate': 1.6962030415993327e-05, 'epoch': 0.83} +2025-02-05 17:20:26 - ERROR - stderr - 28%|██▊ | 6220/22434 [7:12:46<11:33:53, 2.57s/it] +2025-02-05 17:20:28 - ERROR - stderr - 28%|██▊ | 6221/22434 [7:12:48<11:25:29, 2.54s/it] +2025-02-05 17:20:28 - ERROR - stderr - +2025-02-05 17:20:28 - ERROR - stderr - +2025-02-05 17:20:28 - INFO - stdout - {'loss': 0.9718, 'grad_norm': 1.1258766651153564, 'learning_rate': 1.6960993957263094e-05, 'epoch': 0.83} +2025-02-05 17:20:28 - ERROR - stderr - 28%|██▊ | 6221/22434 [7:12:48<11:25:29, 2.54s/it] +2025-02-05 17:20:31 - ERROR - stderr - 28%|██▊ | 6222/22434 [7:12:51<11:23:33, 2.53s/it] +2025-02-05 17:20:31 - ERROR - stderr - +2025-02-05 17:20:31 - ERROR - stderr - 
+2025-02-05 17:20:31 - INFO - stdout - {'loss': 0.87, 'grad_norm': 0.9789291024208069, 'learning_rate': 1.6959957353437605e-05, 'epoch': 0.83} +2025-02-05 17:20:31 - ERROR - stderr - 28%|██▊ | 6222/22434 [7:12:51<11:23:33, 2.53s/it] +2025-02-05 17:20:33 - ERROR - stderr - 28%|██▊ | 6223/22434 [7:12:53<11:20:57, 2.52s/it] +2025-02-05 17:20:33 - ERROR - stderr - +2025-02-05 17:20:33 - ERROR - stderr - +2025-02-05 17:20:33 - INFO - stdout - {'loss': 0.9748, 'grad_norm': 1.0538341999053955, 'learning_rate': 1.6958920604538462e-05, 'epoch': 0.83} +2025-02-05 17:20:33 - ERROR - stderr - 28%|██▊ | 6223/22434 [7:12:53<11:20:57, 2.52s/it] +2025-02-05 17:20:36 - ERROR - stderr - 28%|██▊ | 6224/22434 [7:12:56<11:17:41, 2.51s/it] +2025-02-05 17:20:36 - ERROR - stderr - +2025-02-05 17:20:36 - ERROR - stderr - +2025-02-05 17:20:36 - INFO - stdout - {'loss': 0.96, 'grad_norm': 1.275272011756897, 'learning_rate': 1.695788371058728e-05, 'epoch': 0.83} +2025-02-05 17:20:36 - ERROR - stderr - 28%|██▊ | 6224/22434 [7:12:56<11:17:41, 2.51s/it] +2025-02-05 17:20:38 - ERROR - stderr - 28%|██▊ | 6225/22434 [7:12:58<11:17:01, 2.51s/it] +2025-02-05 17:20:38 - ERROR - stderr - +2025-02-05 17:20:38 - ERROR - stderr - +2025-02-05 17:20:38 - INFO - stdout - {'loss': 0.9858, 'grad_norm': 1.0702353715896606, 'learning_rate': 1.6956846671605667e-05, 'epoch': 0.83} +2025-02-05 17:20:38 - ERROR - stderr - 28%|██▊ | 6225/22434 [7:12:58<11:17:01, 2.51s/it] +2025-02-05 17:20:41 - ERROR - stderr - 28%|██▊ | 6226/22434 [7:13:01<11:15:10, 2.50s/it] +2025-02-05 17:20:41 - ERROR - stderr - +2025-02-05 17:20:41 - ERROR - stderr - +2025-02-05 17:20:41 - INFO - stdout - {'loss': 0.8968, 'grad_norm': 1.1408076286315918, 'learning_rate': 1.6955809487615244e-05, 'epoch': 0.83} +2025-02-05 17:20:41 - ERROR - stderr - 28%|██▊ | 6226/22434 [7:13:01<11:15:10, 2.50s/it] +2025-02-05 17:20:43 - ERROR - stderr - 28%|██▊ | 6227/22434 [7:13:03<11:13:35, 2.49s/it] +2025-02-05 17:20:43 - ERROR - stderr - +2025-02-05 17:20:43 
- ERROR - stderr - +2025-02-05 17:20:43 - INFO - stdout - {'loss': 1.0432, 'grad_norm': 1.1220728158950806, 'learning_rate': 1.695477215863763e-05, 'epoch': 0.83} +2025-02-05 17:20:43 - ERROR - stderr - 28%|██▊ | 6227/22434 [7:13:03<11:13:35, 2.49s/it] +2025-02-05 17:20:46 - ERROR - stderr - 28%|██▊ | 6228/22434 [7:13:06<11:13:44, 2.49s/it] +2025-02-05 17:20:46 - ERROR - stderr - +2025-02-05 17:20:46 - ERROR - stderr - +2025-02-05 17:20:46 - INFO - stdout - {'loss': 1.0044, 'grad_norm': 1.0511724948883057, 'learning_rate': 1.6953734684694444e-05, 'epoch': 0.83} +2025-02-05 17:20:46 - ERROR - stderr - 28%|██▊ | 6228/22434 [7:13:06<11:13:44, 2.49s/it] +2025-02-05 17:20:48 - ERROR - stderr - 28%|██▊ | 6229/22434 [7:13:08<11:13:06, 2.49s/it] +2025-02-05 17:20:48 - ERROR - stderr - +2025-02-05 17:20:48 - ERROR - stderr - +2025-02-05 17:20:48 - INFO - stdout - {'loss': 0.8889, 'grad_norm': 1.1092078685760498, 'learning_rate': 1.695269706580731e-05, 'epoch': 0.83} +2025-02-05 17:20:48 - ERROR - stderr - 28%|██▊ | 6229/22434 [7:13:08<11:13:06, 2.49s/it] +2025-02-05 17:20:51 - ERROR - stderr - 28%|██▊ | 6230/22434 [7:13:11<11:17:54, 2.51s/it] +2025-02-05 17:20:51 - ERROR - stderr - +2025-02-05 17:20:51 - ERROR - stderr - +2025-02-05 17:20:51 - INFO - stdout - {'loss': 1.0218, 'grad_norm': 1.339896321296692, 'learning_rate': 1.695165930199786e-05, 'epoch': 0.83} +2025-02-05 17:20:51 - ERROR - stderr - 28%|██▊ | 6230/22434 [7:13:11<11:17:54, 2.51s/it] +2025-02-05 17:20:53 - ERROR - stderr - 28%|██▊ | 6231/22434 [7:13:13<11:12:57, 2.49s/it] +2025-02-05 17:20:53 - ERROR - stderr - +2025-02-05 17:20:53 - ERROR - stderr - +2025-02-05 17:20:53 - INFO - stdout - {'loss': 0.9761, 'grad_norm': 1.057202696800232, 'learning_rate': 1.695062139328773e-05, 'epoch': 0.83} +2025-02-05 17:20:53 - ERROR - stderr - 28%|██▊ | 6231/22434 [7:13:13<11:12:57, 2.49s/it] +2025-02-05 17:20:56 - ERROR - stderr - 28%|██▊ | 6232/22434 [7:13:16<11:30:50, 2.56s/it] +2025-02-05 17:20:56 - ERROR - stderr - 
+2025-02-05 17:20:56 - ERROR - stderr - +2025-02-05 17:20:56 - INFO - stdout - {'loss': 0.949, 'grad_norm': 1.1081269979476929, 'learning_rate': 1.694958333969854e-05, 'epoch': 0.83} +2025-02-05 17:20:56 - ERROR - stderr - 28%|██▊ | 6232/22434 [7:13:16<11:30:50, 2.56s/it] +2025-02-05 17:20:59 - ERROR - stderr - 28%|██▊ | 6233/22434 [7:13:18<11:23:23, 2.53s/it] +2025-02-05 17:20:59 - ERROR - stderr - +2025-02-05 17:20:59 - ERROR - stderr - +2025-02-05 17:20:59 - INFO - stdout - {'loss': 0.9558, 'grad_norm': 1.081121563911438, 'learning_rate': 1.6948545141251934e-05, 'epoch': 0.83} +2025-02-05 17:20:59 - ERROR - stderr - 28%|██▊ | 6233/22434 [7:13:18<11:23:23, 2.53s/it] +2025-02-05 17:21:01 - ERROR - stderr - 28%|██▊ | 6234/22434 [7:13:21<11:18:13, 2.51s/it] +2025-02-05 17:21:01 - ERROR - stderr - +2025-02-05 17:21:01 - ERROR - stderr - +2025-02-05 17:21:01 - INFO - stdout - {'loss': 0.9995, 'grad_norm': 1.0447009801864624, 'learning_rate': 1.6947506797969563e-05, 'epoch': 0.83} +2025-02-05 17:21:01 - ERROR - stderr - 28%|██▊ | 6234/22434 [7:13:21<11:18:13, 2.51s/it] +2025-02-05 17:21:03 - ERROR - stderr - 28%|██▊ | 6235/22434 [7:13:23<11:12:59, 2.49s/it] +2025-02-05 17:21:03 - ERROR - stderr - +2025-02-05 17:21:03 - ERROR - stderr - +2025-02-05 17:21:03 - INFO - stdout - {'loss': 1.0295, 'grad_norm': 1.0064798593521118, 'learning_rate': 1.6946468309873055e-05, 'epoch': 0.83} +2025-02-05 17:21:03 - ERROR - stderr - 28%|██▊ | 6235/22434 [7:13:23<11:12:59, 2.49s/it] +2025-02-05 17:21:06 - ERROR - stderr - 28%|██▊ | 6236/22434 [7:13:26<11:07:01, 2.47s/it] +2025-02-05 17:21:06 - ERROR - stderr - +2025-02-05 17:21:06 - ERROR - stderr - +2025-02-05 17:21:06 - INFO - stdout - {'loss': 0.9852, 'grad_norm': 0.9835310578346252, 'learning_rate': 1.694542967698406e-05, 'epoch': 0.83} +2025-02-05 17:21:06 - ERROR - stderr - 28%|██▊ | 6236/22434 [7:13:26<11:07:01, 2.47s/it] +2025-02-05 17:21:08 - ERROR - stderr - 28%|██▊ | 6237/22434 [7:13:28<11:05:23, 2.46s/it] +2025-02-05 
17:21:08 - ERROR - stderr - +2025-02-05 17:21:08 - ERROR - stderr - +2025-02-05 17:21:08 - INFO - stdout - {'loss': 0.8355, 'grad_norm': 0.9826045036315918, 'learning_rate': 1.6944390899324234e-05, 'epoch': 0.83} +2025-02-05 17:21:08 - ERROR - stderr - 28%|██▊ | 6237/22434 [7:13:28<11:05:23, 2.46s/it] +2025-02-05 17:21:11 - ERROR - stderr - 28%|██▊ | 6238/22434 [7:13:31<11:13:42, 2.50s/it] +2025-02-05 17:21:11 - ERROR - stderr - +2025-02-05 17:21:11 - ERROR - stderr - +2025-02-05 17:21:11 - INFO - stdout - {'loss': 0.8803, 'grad_norm': 1.0677248239517212, 'learning_rate': 1.694335197691522e-05, 'epoch': 0.83} +2025-02-05 17:21:11 - ERROR - stderr - 28%|██▊ | 6238/22434 [7:13:31<11:13:42, 2.50s/it] +2025-02-05 17:21:13 - ERROR - stderr - 28%|██▊ | 6239/22434 [7:13:33<11:16:32, 2.51s/it] +2025-02-05 17:21:13 - ERROR - stderr - +2025-02-05 17:21:13 - ERROR - stderr - +2025-02-05 17:21:13 - INFO - stdout - {'loss': 0.8361, 'grad_norm': 1.047454595565796, 'learning_rate': 1.6942312909778683e-05, 'epoch': 0.83} +2025-02-05 17:21:13 - ERROR - stderr - 28%|██▊ | 6239/22434 [7:13:33<11:16:32, 2.51s/it] +2025-02-05 17:21:16 - ERROR - stderr - 28%|██▊ | 6240/22434 [7:13:36<11:44:08, 2.61s/it] +2025-02-05 17:21:16 - ERROR - stderr - +2025-02-05 17:21:16 - ERROR - stderr - +2025-02-05 17:21:16 - INFO - stdout - {'loss': 0.903, 'grad_norm': 1.0687263011932373, 'learning_rate': 1.6941273697936273e-05, 'epoch': 0.83} +2025-02-05 17:21:16 - ERROR - stderr - 28%|██▊ | 6240/22434 [7:13:36<11:44:08, 2.61s/it] +2025-02-05 17:21:19 - ERROR - stderr - 28%|██▊ | 6241/22434 [7:13:39<11:36:25, 2.58s/it] +2025-02-05 17:21:19 - ERROR - stderr - +2025-02-05 17:21:19 - ERROR - stderr - +2025-02-05 17:21:19 - INFO - stdout - {'loss': 0.7732, 'grad_norm': 1.0576106309890747, 'learning_rate': 1.6940234341409657e-05, 'epoch': 0.83} +2025-02-05 17:21:19 - ERROR - stderr - 28%|██▊ | 6241/22434 [7:13:39<11:36:25, 2.58s/it] +2025-02-05 17:21:21 - ERROR - stderr - 28%|██▊ | 6242/22434 [7:13:41<11:27:19, 
2.55s/it] +2025-02-05 17:21:21 - ERROR - stderr - +2025-02-05 17:21:21 - ERROR - stderr - +2025-02-05 17:21:21 - INFO - stdout - {'loss': 0.8796, 'grad_norm': 0.9619467854499817, 'learning_rate': 1.6939194840220497e-05, 'epoch': 0.83} +2025-02-05 17:21:21 - ERROR - stderr - 28%|██▊ | 6242/22434 [7:13:41<11:27:19, 2.55s/it] +2025-02-05 17:21:24 - ERROR - stderr - 28%|██▊ | 6243/22434 [7:13:44<11:25:36, 2.54s/it] +2025-02-05 17:21:24 - ERROR - stderr - +2025-02-05 17:21:24 - ERROR - stderr - +2025-02-05 17:21:24 - INFO - stdout - {'loss': 0.8976, 'grad_norm': 1.1115882396697998, 'learning_rate': 1.693815519439046e-05, 'epoch': 0.83} +2025-02-05 17:21:24 - ERROR - stderr - 28%|██▊ | 6243/22434 [7:13:44<11:25:36, 2.54s/it] +2025-02-05 17:21:26 - ERROR - stderr - 28%|██▊ | 6244/22434 [7:13:46<11:22:17, 2.53s/it] +2025-02-05 17:21:26 - ERROR - stderr - +2025-02-05 17:21:26 - ERROR - stderr - +2025-02-05 17:21:26 - INFO - stdout - {'loss': 0.8988, 'grad_norm': 0.9547367095947266, 'learning_rate': 1.693711540394122e-05, 'epoch': 0.83} +2025-02-05 17:21:26 - ERROR - stderr - 28%|██▊ | 6244/22434 [7:13:46<11:22:17, 2.53s/it] +2025-02-05 17:21:29 - ERROR - stderr - 28%|██▊ | 6245/22434 [7:13:49<11:29:49, 2.56s/it] +2025-02-05 17:21:29 - ERROR - stderr - +2025-02-05 17:21:29 - ERROR - stderr - +2025-02-05 17:21:29 - INFO - stdout - {'loss': 0.9039, 'grad_norm': 1.1374239921569824, 'learning_rate': 1.693607546889444e-05, 'epoch': 0.84} +2025-02-05 17:21:29 - ERROR - stderr - 28%|██▊ | 6245/22434 [7:13:49<11:29:49, 2.56s/it] +2025-02-05 17:21:31 - ERROR - stderr - 28%|██▊ | 6246/22434 [7:13:51<11:25:50, 2.54s/it] +2025-02-05 17:21:31 - ERROR - stderr - +2025-02-05 17:21:31 - ERROR - stderr - +2025-02-05 17:21:31 - INFO - stdout - {'loss': 1.0201, 'grad_norm': 1.0855188369750977, 'learning_rate': 1.693503538927181e-05, 'epoch': 0.84} +2025-02-05 17:21:31 - ERROR - stderr - 28%|██▊ | 6246/22434 [7:13:51<11:25:50, 2.54s/it] +2025-02-05 17:21:34 - ERROR - stderr - 28%|██▊ | 
6247/22434 [7:13:54<11:22:48, 2.53s/it] +2025-02-05 17:21:34 - ERROR - stderr - +2025-02-05 17:21:34 - ERROR - stderr - +2025-02-05 17:21:34 - INFO - stdout - {'loss': 0.9262, 'grad_norm': 1.0434775352478027, 'learning_rate': 1.6933995165095006e-05, 'epoch': 0.84} +2025-02-05 17:21:34 - ERROR - stderr - 28%|██▊ | 6247/22434 [7:13:54<11:22:48, 2.53s/it] +2025-02-05 17:21:36 - ERROR - stderr - 28%|██▊ | 6248/22434 [7:13:56<11:20:27, 2.52s/it] +2025-02-05 17:21:36 - ERROR - stderr - +2025-02-05 17:21:36 - ERROR - stderr - +2025-02-05 17:21:36 - INFO - stdout - {'loss': 0.9966, 'grad_norm': 1.0397087335586548, 'learning_rate': 1.6932954796385703e-05, 'epoch': 0.84} +2025-02-05 17:21:36 - ERROR - stderr - 28%|██▊ | 6248/22434 [7:13:56<11:20:27, 2.52s/it] +2025-02-05 17:21:39 - ERROR - stderr - 28%|██▊ | 6249/22434 [7:13:59<11:17:49, 2.51s/it] +2025-02-05 17:21:39 - ERROR - stderr - +2025-02-05 17:21:39 - ERROR - stderr - +2025-02-05 17:21:39 - INFO - stdout - {'loss': 0.9304, 'grad_norm': 0.989005982875824, 'learning_rate': 1.693191428316559e-05, 'epoch': 0.84} +2025-02-05 17:21:39 - ERROR - stderr - 28%|██▊ | 6249/22434 [7:13:59<11:17:49, 2.51s/it] +2025-02-05 17:21:41 - ERROR - stderr - 28%|██▊ | 6250/22434 [7:14:01<11:21:25, 2.53s/it] +2025-02-05 17:21:41 - ERROR - stderr - +2025-02-05 17:21:41 - ERROR - stderr - +2025-02-05 17:21:41 - INFO - stdout - {'loss': 0.987, 'grad_norm': 1.1155787706375122, 'learning_rate': 1.6930873625456362e-05, 'epoch': 0.84} +2025-02-05 17:21:41 - ERROR - stderr - 28%|██▊ | 6250/22434 [7:14:01<11:21:25, 2.53s/it] +2025-02-05 17:21:44 - ERROR - stderr - 28%|██▊ | 6251/22434 [7:14:04<11:19:53, 2.52s/it] +2025-02-05 17:21:44 - ERROR - stderr - +2025-02-05 17:21:44 - ERROR - stderr - +2025-02-05 17:21:44 - INFO - stdout - {'loss': 0.7934, 'grad_norm': 1.014721155166626, 'learning_rate': 1.69298328232797e-05, 'epoch': 0.84} +2025-02-05 17:21:44 - ERROR - stderr - 28%|██▊ | 6251/22434 [7:14:04<11:19:53, 2.52s/it] +2025-02-05 17:21:46 - ERROR - 
stderr - 28%|██▊ | 6252/22434 [7:14:06<11:22:54, 2.53s/it] +2025-02-05 17:21:47 - ERROR - stderr - +2025-02-05 17:21:47 - ERROR - stderr - +2025-02-05 17:21:47 - INFO - stdout - {'loss': 0.8197, 'grad_norm': 1.0660616159439087, 'learning_rate': 1.6928791876657306e-05, 'epoch': 0.84} +2025-02-05 17:21:47 - ERROR - stderr - 28%|██▊ | 6252/22434 [7:14:06<11:22:54, 2.53s/it] +2025-02-05 17:21:49 - ERROR - stderr - 28%|██▊ | 6253/22434 [7:14:09<11:18:15, 2.52s/it] +2025-02-05 17:21:49 - ERROR - stderr - +2025-02-05 17:21:49 - ERROR - stderr - +2025-02-05 17:21:49 - INFO - stdout - {'loss': 0.9246, 'grad_norm': 1.0063304901123047, 'learning_rate': 1.6927750785610876e-05, 'epoch': 0.84} +2025-02-05 17:21:49 - ERROR - stderr - 28%|██▊ | 6253/22434 [7:14:09<11:18:15, 2.52s/it] +2025-02-05 17:21:52 - ERROR - stderr - 28%|██▊ | 6254/22434 [7:14:11<11:34:15, 2.57s/it] +2025-02-05 17:21:52 - ERROR - stderr - +2025-02-05 17:21:52 - ERROR - stderr - +2025-02-05 17:21:52 - INFO - stdout - {'loss': 0.911, 'grad_norm': 1.0346862077713013, 'learning_rate': 1.6926709550162112e-05, 'epoch': 0.84} +2025-02-05 17:21:52 - ERROR - stderr - 28%|██▊ | 6254/22434 [7:14:12<11:34:15, 2.57s/it] +2025-02-05 17:21:54 - ERROR - stderr - 28%|██▊ | 6255/22434 [7:14:14<11:30:18, 2.56s/it] +2025-02-05 17:21:54 - ERROR - stderr - +2025-02-05 17:21:54 - ERROR - stderr - +2025-02-05 17:21:54 - INFO - stdout - {'loss': 1.0264, 'grad_norm': 1.2380086183547974, 'learning_rate': 1.692566817033271e-05, 'epoch': 0.84} +2025-02-05 17:21:54 - ERROR - stderr - 28%|██▊ | 6255/22434 [7:14:14<11:30:18, 2.56s/it] +2025-02-05 17:21:57 - ERROR - stderr - 28%|██▊ | 6256/22434 [7:14:16<11:24:06, 2.54s/it] +2025-02-05 17:21:57 - ERROR - stderr - +2025-02-05 17:21:57 - ERROR - stderr - +2025-02-05 17:21:57 - INFO - stdout - {'loss': 0.8628, 'grad_norm': 1.0647270679473877, 'learning_rate': 1.692462664614439e-05, 'epoch': 0.84} +2025-02-05 17:21:57 - ERROR - stderr - 28%|██▊ | 6256/22434 [7:14:17<11:24:06, 2.54s/it] 
+2025-02-05 17:21:59 - ERROR - stderr - 28%|██▊ | 6257/22434 [7:14:19<11:16:04, 2.51s/it] +2025-02-05 17:21:59 - ERROR - stderr - +2025-02-05 17:21:59 - ERROR - stderr - +2025-02-05 17:21:59 - INFO - stdout - {'loss': 0.9609, 'grad_norm': 1.0911678075790405, 'learning_rate': 1.692358497761885e-05, 'epoch': 0.84} +2025-02-05 17:21:59 - ERROR - stderr - 28%|██▊ | 6257/22434 [7:14:19<11:16:04, 2.51s/it] +2025-02-05 17:22:02 - ERROR - stderr - 28%|██▊ | 6258/22434 [7:14:21<11:11:09, 2.49s/it] +2025-02-05 17:22:02 - ERROR - stderr - +2025-02-05 17:22:02 - ERROR - stderr - +2025-02-05 17:22:02 - INFO - stdout - {'loss': 0.9897, 'grad_norm': 0.980737566947937, 'learning_rate': 1.6922543164777805e-05, 'epoch': 0.84} +2025-02-05 17:22:02 - ERROR - stderr - 28%|██▊ | 6258/22434 [7:14:21<11:11:09, 2.49s/it] +2025-02-05 17:22:04 - ERROR - stderr - 28%|██▊ | 6259/22434 [7:14:24<11:12:15, 2.49s/it] +2025-02-05 17:22:04 - ERROR - stderr - +2025-02-05 17:22:04 - ERROR - stderr - +2025-02-05 17:22:04 - INFO - stdout - {'loss': 1.0212, 'grad_norm': 1.0662826299667358, 'learning_rate': 1.692150120764297e-05, 'epoch': 0.84} +2025-02-05 17:22:04 - ERROR - stderr - 28%|██▊ | 6259/22434 [7:14:24<11:12:15, 2.49s/it] +2025-02-05 17:22:07 - ERROR - stderr - 28%|██▊ | 6260/22434 [7:14:26<11:12:47, 2.50s/it] +2025-02-05 17:22:07 - ERROR - stderr - +2025-02-05 17:22:07 - ERROR - stderr - +2025-02-05 17:22:07 - INFO - stdout - {'loss': 0.9197, 'grad_norm': 1.0151029825210571, 'learning_rate': 1.692045910623607e-05, 'epoch': 0.84} +2025-02-05 17:22:07 - ERROR - stderr - 28%|██▊ | 6260/22434 [7:14:26<11:12:47, 2.50s/it] +2025-02-05 17:22:09 - ERROR - stderr - 28%|██▊ | 6261/22434 [7:14:29<11:08:26, 2.48s/it] +2025-02-05 17:22:09 - ERROR - stderr - +2025-02-05 17:22:09 - ERROR - stderr - +2025-02-05 17:22:09 - INFO - stdout - {'loss': 0.9893, 'grad_norm': 1.0873527526855469, 'learning_rate': 1.691941686057882e-05, 'epoch': 0.84} +2025-02-05 17:22:09 - ERROR - stderr - 28%|██▊ | 6261/22434 
[7:14:29<11:08:26, 2.48s/it] +2025-02-05 17:22:12 - ERROR - stderr - 28%|██▊ | 6262/22434 [7:14:31<11:11:59, 2.49s/it] +2025-02-05 17:22:12 - ERROR - stderr - +2025-02-05 17:22:12 - ERROR - stderr - +2025-02-05 17:22:12 - INFO - stdout - {'loss': 1.0698, 'grad_norm': 1.0680855512619019, 'learning_rate': 1.691837447069295e-05, 'epoch': 0.84} +2025-02-05 17:22:12 - ERROR - stderr - 28%|██▊ | 6262/22434 [7:14:31<11:11:59, 2.49s/it] +2025-02-05 17:22:14 - ERROR - stderr - 28%|██▊ | 6263/22434 [7:14:34<11:17:14, 2.51s/it] +2025-02-05 17:22:14 - ERROR - stderr - +2025-02-05 17:22:14 - ERROR - stderr - +2025-02-05 17:22:14 - INFO - stdout - {'loss': 0.8106, 'grad_norm': 0.9014647603034973, 'learning_rate': 1.6917331936600183e-05, 'epoch': 0.84} +2025-02-05 17:22:14 - ERROR - stderr - 28%|██▊ | 6263/22434 [7:14:34<11:17:14, 2.51s/it] +2025-02-05 17:22:17 - ERROR - stderr - 28%|██▊ | 6264/22434 [7:14:37<11:32:48, 2.57s/it] +2025-02-05 17:22:17 - ERROR - stderr - +2025-02-05 17:22:17 - ERROR - stderr - +2025-02-05 17:22:17 - INFO - stdout - {'loss': 0.8819, 'grad_norm': 1.0312988758087158, 'learning_rate': 1.6916289258322246e-05, 'epoch': 0.84} +2025-02-05 17:22:17 - ERROR - stderr - 28%|██▊ | 6264/22434 [7:14:37<11:32:48, 2.57s/it] +2025-02-05 17:22:20 - ERROR - stderr - 28%|██▊ | 6265/22434 [7:14:39<11:42:03, 2.61s/it] +2025-02-05 17:22:20 - ERROR - stderr - +2025-02-05 17:22:20 - ERROR - stderr - +2025-02-05 17:22:20 - INFO - stdout - {'loss': 0.9669, 'grad_norm': 0.9442629814147949, 'learning_rate': 1.691524643588088e-05, 'epoch': 0.84} +2025-02-05 17:22:20 - ERROR - stderr - 28%|██▊ | 6265/22434 [7:14:39<11:42:03, 2.61s/it] +2025-02-05 17:22:22 - ERROR - stderr - 28%|██▊ | 6266/22434 [7:14:42<11:30:40, 2.56s/it] +2025-02-05 17:22:22 - ERROR - stderr - +2025-02-05 17:22:22 - ERROR - stderr - +2025-02-05 17:22:22 - INFO - stdout - {'loss': 0.9529, 'grad_norm': 1.1172345876693726, 'learning_rate': 1.691420346929782e-05, 'epoch': 0.84} +2025-02-05 17:22:22 - ERROR - stderr 
- 28%|██▊ | 6266/22434 [7:14:42<11:30:40, 2.56s/it] +2025-02-05 17:22:25 - ERROR - stderr - 28%|██▊ | 6267/22434 [7:14:45<11:46:59, 2.62s/it] +2025-02-05 17:22:25 - ERROR - stderr - +2025-02-05 17:22:25 - ERROR - stderr - +2025-02-05 17:22:25 - INFO - stdout - {'loss': 0.894, 'grad_norm': 1.006263017654419, 'learning_rate': 1.6913160358594803e-05, 'epoch': 0.84} +2025-02-05 17:22:25 - ERROR - stderr - 28%|██▊ | 6267/22434 [7:14:45<11:46:59, 2.62s/it] +2025-02-05 17:22:27 - ERROR - stderr - 28%|██▊ | 6268/22434 [7:14:47<11:41:45, 2.60s/it] +2025-02-05 17:22:27 - ERROR - stderr - +2025-02-05 17:22:27 - ERROR - stderr - +2025-02-05 17:22:27 - INFO - stdout - {'loss': 1.0314, 'grad_norm': 0.9992109537124634, 'learning_rate': 1.6912117103793578e-05, 'epoch': 0.84} +2025-02-05 17:22:27 - ERROR - stderr - 28%|██▊ | 6268/22434 [7:14:47<11:41:45, 2.60s/it] +2025-02-05 17:22:30 - ERROR - stderr - 28%|██▊ | 6269/22434 [7:14:50<11:32:37, 2.57s/it] +2025-02-05 17:22:30 - ERROR - stderr - +2025-02-05 17:22:30 - ERROR - stderr - +2025-02-05 17:22:30 - INFO - stdout - {'loss': 0.9283, 'grad_norm': 1.0451394319534302, 'learning_rate': 1.6911073704915883e-05, 'epoch': 0.84} +2025-02-05 17:22:30 - ERROR - stderr - 28%|██▊ | 6269/22434 [7:14:50<11:32:37, 2.57s/it] +2025-02-05 17:22:32 - ERROR - stderr - 28%|██▊ | 6270/22434 [7:14:52<11:28:38, 2.56s/it] +2025-02-05 17:22:32 - ERROR - stderr - +2025-02-05 17:22:32 - ERROR - stderr - +2025-02-05 17:22:32 - INFO - stdout - {'loss': 1.0524, 'grad_norm': 1.1377421617507935, 'learning_rate': 1.691003016198347e-05, 'epoch': 0.84} +2025-02-05 17:22:32 - ERROR - stderr - 28%|██▊ | 6270/22434 [7:14:52<11:28:38, 2.56s/it] +2025-02-05 17:22:35 - ERROR - stderr - 28%|██▊ | 6271/22434 [7:14:55<11:22:51, 2.53s/it] +2025-02-05 17:22:35 - ERROR - stderr - +2025-02-05 17:22:35 - ERROR - stderr - +2025-02-05 17:22:35 - INFO - stdout - {'loss': 0.9054, 'grad_norm': 0.9296470284461975, 'learning_rate': 1.690898647501809e-05, 'epoch': 0.84} +2025-02-05 
17:22:35 - ERROR - stderr - 28%|██▊ | 6271/22434 [7:14:55<11:22:51, 2.53s/it] +2025-02-05 17:22:37 - ERROR - stderr - 28%|██▊ | 6272/22434 [7:14:57<11:15:47, 2.51s/it] +2025-02-05 17:22:37 - ERROR - stderr - +2025-02-05 17:22:37 - ERROR - stderr - +2025-02-05 17:22:37 - INFO - stdout - {'loss': 0.9183, 'grad_norm': 1.1319226026535034, 'learning_rate': 1.69079426440415e-05, 'epoch': 0.84} +2025-02-05 17:22:37 - ERROR - stderr - 28%|██▊ | 6272/22434 [7:14:57<11:15:47, 2.51s/it] +2025-02-05 17:22:40 - ERROR - stderr - 28%|██▊ | 6273/22434 [7:15:00<11:20:12, 2.53s/it] +2025-02-05 17:22:40 - ERROR - stderr - +2025-02-05 17:22:40 - ERROR - stderr - +2025-02-05 17:22:40 - INFO - stdout - {'loss': 0.8667, 'grad_norm': 1.005556583404541, 'learning_rate': 1.6906898669075452e-05, 'epoch': 0.84} +2025-02-05 17:22:40 - ERROR - stderr - 28%|██▊ | 6273/22434 [7:15:00<11:20:12, 2.53s/it] +2025-02-05 17:22:42 - ERROR - stderr - 28%|██▊ | 6274/22434 [7:15:02<11:15:24, 2.51s/it] +2025-02-05 17:22:42 - ERROR - stderr - +2025-02-05 17:22:42 - ERROR - stderr - +2025-02-05 17:22:42 - INFO - stdout - {'loss': 1.0613, 'grad_norm': 1.1296900510787964, 'learning_rate': 1.6905854550141717e-05, 'epoch': 0.84} +2025-02-05 17:22:42 - ERROR - stderr - 28%|██▊ | 6274/22434 [7:15:02<11:15:24, 2.51s/it] +2025-02-05 17:22:45 - ERROR - stderr - 28%|██▊ | 6275/22434 [7:15:05<11:14:04, 2.50s/it] +2025-02-05 17:22:45 - ERROR - stderr - +2025-02-05 17:22:45 - ERROR - stderr - +2025-02-05 17:22:45 - INFO - stdout - {'loss': 0.8075, 'grad_norm': 0.9757203459739685, 'learning_rate': 1.6904810287262047e-05, 'epoch': 0.84} +2025-02-05 17:22:45 - ERROR - stderr - 28%|██▊ | 6275/22434 [7:15:05<11:14:04, 2.50s/it] +2025-02-05 17:22:47 - ERROR - stderr - 28%|██▊ | 6276/22434 [7:15:07<11:13:48, 2.50s/it] +2025-02-05 17:22:47 - ERROR - stderr - +2025-02-05 17:22:47 - ERROR - stderr - +2025-02-05 17:22:47 - INFO - stdout - {'loss': 1.013, 'grad_norm': 1.1405946016311646, 'learning_rate': 1.6903765880458216e-05,
'epoch': 0.84} +2025-02-05 17:22:47 - ERROR - stderr - 28%|██▊ | 6276/22434 [7:15:07<11:13:48, 2.50s/it] +2025-02-05 17:22:50 - ERROR - stderr - 28%|██▊ | 6277/22434 [7:15:09<11:09:02, 2.48s/it] +2025-02-05 17:22:50 - ERROR - stderr - +2025-02-05 17:22:50 - ERROR - stderr - +2025-02-05 17:22:50 - INFO - stdout - {'loss': 0.8188, 'grad_norm': 0.9648895263671875, 'learning_rate': 1.690272132975199e-05, 'epoch': 0.84} +2025-02-05 17:22:50 - ERROR - stderr - 28%|██▊ | 6277/22434 [7:15:10<11:09:02, 2.48s/it] +2025-02-05 17:22:52 - ERROR - stderr - 28%|██▊ | 6278/22434 [7:15:12<11:08:43, 2.48s/it] +2025-02-05 17:22:52 - ERROR - stderr - +2025-02-05 17:22:52 - ERROR - stderr - +2025-02-05 17:22:52 - INFO - stdout - {'loss': 0.9642, 'grad_norm': 0.9771251678466797, 'learning_rate': 1.6901676635165144e-05, 'epoch': 0.84} +2025-02-05 17:22:52 - ERROR - stderr - 28%|██▊ | 6278/22434 [7:15:12<11:08:43, 2.48s/it] +2025-02-05 17:22:55 - ERROR - stderr - 28%|██▊ | 6279/22434 [7:15:14<11:09:19, 2.49s/it] +2025-02-05 17:22:55 - ERROR - stderr - +2025-02-05 17:22:55 - ERROR - stderr - +2025-02-05 17:22:55 - INFO - stdout - {'loss': 0.9902, 'grad_norm': 1.098215937614441, 'learning_rate': 1.6900631796719455e-05, 'epoch': 0.84} +2025-02-05 17:22:55 - ERROR - stderr - 28%|██▊ | 6279/22434 [7:15:14<11:09:19, 2.49s/it] +2025-02-05 17:22:57 - ERROR - stderr - 28%|██▊ | 6280/22434 [7:15:17<11:07:08, 2.48s/it] +2025-02-05 17:22:57 - ERROR - stderr - +2025-02-05 17:22:57 - ERROR - stderr - +2025-02-05 17:22:57 - INFO - stdout - {'loss': 1.0555, 'grad_norm': 0.9888482689857483, 'learning_rate': 1.6899586814436692e-05, 'epoch': 0.84} +2025-02-05 17:22:57 - ERROR - stderr - 28%|██▊ | 6280/22434 [7:15:17<11:07:08, 2.48s/it] +2025-02-05 17:23:00 - ERROR - stderr - 28%|██▊ | 6281/22434 [7:15:19<11:12:56, 2.50s/it] +2025-02-05 17:23:00 - ERROR - stderr - +2025-02-05 17:23:00 - ERROR - stderr - +2025-02-05 17:23:00 - INFO - stdout - {'loss': 0.9126, 'grad_norm': 1.0288373231887817, 'learning_rate': 
1.6898541688338648e-05, 'epoch': 0.84} +2025-02-05 17:23:00 - ERROR - stderr - 28%|██▊ | 6281/22434 [7:15:20<11:12:56, 2.50s/it] +2025-02-05 17:23:02 - ERROR - stderr - 28%|██▊ | 6282/22434 [7:15:22<11:10:06, 2.49s/it] +2025-02-05 17:23:02 - ERROR - stderr - +2025-02-05 17:23:02 - ERROR - stderr - +2025-02-05 17:23:02 - INFO - stdout - {'loss': 1.0251, 'grad_norm': 1.0977911949157715, 'learning_rate': 1.6897496418447108e-05, 'epoch': 0.84} +2025-02-05 17:23:02 - ERROR - stderr - 28%|██▊ | 6282/22434 [7:15:22<11:10:06, 2.49s/it] +2025-02-05 17:23:05 - ERROR - stderr - 28%|██▊ | 6283/22434 [7:15:24<11:09:57, 2.49s/it] +2025-02-05 17:23:05 - ERROR - stderr - +2025-02-05 17:23:05 - ERROR - stderr - +2025-02-05 17:23:05 - INFO - stdout - {'loss': 0.8863, 'grad_norm': 0.9422560930252075, 'learning_rate': 1.6896451004783848e-05, 'epoch': 0.84} +2025-02-05 17:23:05 - ERROR - stderr - 28%|██▊ | 6283/22434 [7:15:24<11:09:57, 2.49s/it] +2025-02-05 17:23:07 - ERROR - stderr - 28%|██▊ | 6284/22434 [7:15:27<11:13:08, 2.50s/it] +2025-02-05 17:23:07 - ERROR - stderr - +2025-02-05 17:23:07 - ERROR - stderr - +2025-02-05 17:23:07 - INFO - stdout - {'loss': 0.9248, 'grad_norm': 1.0523384809494019, 'learning_rate': 1.689540544737067e-05, 'epoch': 0.84} +2025-02-05 17:23:07 - ERROR - stderr - 28%|██▊ | 6284/22434 [7:15:27<11:13:08, 2.50s/it] +2025-02-05 17:23:10 - ERROR - stderr - 28%|██▊ | 6285/22434 [7:15:29<11:10:10, 2.49s/it] +2025-02-05 17:23:10 - ERROR - stderr - +2025-02-05 17:23:10 - ERROR - stderr - +2025-02-05 17:23:10 - INFO - stdout - {'loss': 0.9582, 'grad_norm': 0.9838606119155884, 'learning_rate': 1.6894359746229362e-05, 'epoch': 0.84} +2025-02-05 17:23:10 - ERROR - stderr - 28%|██▊ | 6285/22434 [7:15:29<11:10:10, 2.49s/it] +2025-02-05 17:23:12 - ERROR - stderr - 28%|██▊ | 6286/22434 [7:15:32<11:09:41, 2.49s/it] +2025-02-05 17:23:12 - ERROR - stderr - +2025-02-05 17:23:12 - ERROR - stderr - +2025-02-05 17:23:12 - INFO - stdout - {'loss': 0.8797, 'grad_norm': 
1.1502082347869873, 'learning_rate': 1.6893313901381724e-05, 'epoch': 0.84} +2025-02-05 17:23:12 - ERROR - stderr - 28%|██▊ | 6286/22434 [7:15:32<11:09:41, 2.49s/it] +2025-02-05 17:23:15 - ERROR - stderr - 28%|██▊ | 6287/22434 [7:15:34<11:05:31, 2.47s/it] +2025-02-05 17:23:15 - ERROR - stderr - +2025-02-05 17:23:15 - ERROR - stderr - +2025-02-05 17:23:15 - INFO - stdout - {'loss': 0.9738, 'grad_norm': 1.0644716024398804, 'learning_rate': 1.6892267912849556e-05, 'epoch': 0.84} +2025-02-05 17:23:15 - ERROR - stderr - 28%|██▊ | 6287/22434 [7:15:34<11:05:31, 2.47s/it] +2025-02-05 17:23:17 - ERROR - stderr - 28%|██▊ | 6288/22434 [7:15:37<11:08:12, 2.48s/it] +2025-02-05 17:23:17 - ERROR - stderr - +2025-02-05 17:23:17 - ERROR - stderr - +2025-02-05 17:23:17 - INFO - stdout - {'loss': 0.8603, 'grad_norm': 1.1231529712677002, 'learning_rate': 1.6891221780654654e-05, 'epoch': 0.84} +2025-02-05 17:23:17 - ERROR - stderr - 28%|██▊ | 6288/22434 [7:15:37<11:08:12, 2.48s/it] +2025-02-05 17:23:20 - ERROR - stderr - 28%|██▊ | 6289/22434 [7:15:39<11:06:26, 2.48s/it] +2025-02-05 17:23:20 - ERROR - stderr - +2025-02-05 17:23:20 - ERROR - stderr - +2025-02-05 17:23:20 - INFO - stdout - {'loss': 0.9645, 'grad_norm': 1.2128039598464966, 'learning_rate': 1.689017550481883e-05, 'epoch': 0.84} +2025-02-05 17:23:20 - ERROR - stderr - 28%|██▊ | 6289/22434 [7:15:39<11:06:26, 2.48s/it] +2025-02-05 17:23:22 - ERROR - stderr - 28%|██▊ | 6290/22434 [7:15:42<11:06:22, 2.48s/it] +2025-02-05 17:23:22 - ERROR - stderr - +2025-02-05 17:23:22 - ERROR - stderr - +2025-02-05 17:23:22 - INFO - stdout - {'loss': 0.8889, 'grad_norm': 0.9433903098106384, 'learning_rate': 1.6889129085363892e-05, 'epoch': 0.84} +2025-02-05 17:23:22 - ERROR - stderr - 28%|██▊ | 6290/22434 [7:15:42<11:06:22, 2.48s/it] +2025-02-05 17:23:24 - ERROR - stderr - 28%|██▊ | 6291/22434 [7:15:44<11:05:18, 2.47s/it] +2025-02-05 17:23:25 - ERROR - stderr - +2025-02-05 17:23:25 - ERROR - stderr - +2025-02-05 17:23:25 - INFO - stdout - 
{'loss': 1.0538, 'grad_norm': 1.2111896276474, 'learning_rate': 1.6888082522311648e-05, 'epoch': 0.84} +2025-02-05 17:23:25 - ERROR - stderr - 28%|██▊ | 6291/22434 [7:15:44<11:05:18, 2.47s/it] +2025-02-05 17:23:27 - ERROR - stderr - 28%|██▊ | 6292/22434 [7:15:47<11:21:54, 2.53s/it] +2025-02-05 17:23:27 - ERROR - stderr - +2025-02-05 17:23:27 - ERROR - stderr - +2025-02-05 17:23:27 - INFO - stdout - {'loss': 0.9643, 'grad_norm': 0.9870617985725403, 'learning_rate': 1.6887035815683918e-05, 'epoch': 0.84} +2025-02-05 17:23:27 - ERROR - stderr - 28%|██▊ | 6292/22434 [7:15:47<11:21:54, 2.53s/it] +2025-02-05 17:23:30 - ERROR - stderr - 28%|██▊ | 6293/22434 [7:15:49<11:18:11, 2.52s/it] +2025-02-05 17:23:30 - ERROR - stderr - +2025-02-05 17:23:30 - ERROR - stderr - +2025-02-05 17:23:30 - INFO - stdout - {'loss': 0.895, 'grad_norm': 0.9630647301673889, 'learning_rate': 1.6885988965502514e-05, 'epoch': 0.84} +2025-02-05 17:23:30 - ERROR - stderr - 28%|██▊ | 6293/22434 [7:15:49<11:18:11, 2.52s/it] +2025-02-05 17:23:32 - ERROR - stderr - 28%|██▊ | 6294/22434 [7:15:52<11:14:33, 2.51s/it] +2025-02-05 17:23:32 - ERROR - stderr - +2025-02-05 17:23:32 - ERROR - stderr - +2025-02-05 17:23:32 - INFO - stdout - {'loss': 0.9944, 'grad_norm': 1.0599976778030396, 'learning_rate': 1.6884941971789263e-05, 'epoch': 0.84} +2025-02-05 17:23:32 - ERROR - stderr - 28%|██▊ | 6294/22434 [7:15:52<11:14:33, 2.51s/it] +2025-02-05 17:23:35 - ERROR - stderr - 28%|██▊ | 6295/22434 [7:15:54<11:11:40, 2.50s/it] +2025-02-05 17:23:35 - ERROR - stderr - +2025-02-05 17:23:35 - ERROR - stderr - +2025-02-05 17:23:35 - INFO - stdout - {'loss': 0.8877, 'grad_norm': 1.0369551181793213, 'learning_rate': 1.688389483456598e-05, 'epoch': 0.84} +2025-02-05 17:23:35 - ERROR - stderr - 28%|██▊ | 6295/22434 [7:15:54<11:11:40, 2.50s/it] +2025-02-05 17:23:37 - ERROR - stderr - 28%|██▊ | 6296/22434 [7:15:57<11:09:25, 2.49s/it] +2025-02-05 17:23:37 - ERROR - stderr - +2025-02-05 17:23:37 - ERROR - stderr - +2025-02-05 
17:23:37 - INFO - stdout - {'loss': 0.9182, 'grad_norm': 1.0309689044952393, 'learning_rate': 1.6882847553854497e-05, 'epoch': 0.84} +2025-02-05 17:23:37 - ERROR - stderr - 28%|██▊ | 6296/22434 [7:15:57<11:09:25, 2.49s/it] +2025-02-05 17:23:40 - ERROR - stderr - 28%|██▊ | 6297/22434 [7:15:59<11:13:31, 2.50s/it] +2025-02-05 17:23:40 - ERROR - stderr - +2025-02-05 17:23:40 - ERROR - stderr - +2025-02-05 17:23:40 - INFO - stdout - {'loss': 0.9038, 'grad_norm': 1.0261473655700684, 'learning_rate': 1.6881800129676643e-05, 'epoch': 0.84} +2025-02-05 17:23:40 - ERROR - stderr - 28%|██▊ | 6297/22434 [7:15:59<11:13:31, 2.50s/it] +2025-02-05 17:23:42 - ERROR - stderr - 28%|██▊ | 6298/22434 [7:16:02<11:20:26, 2.53s/it] +2025-02-05 17:23:42 - ERROR - stderr - +2025-02-05 17:23:42 - ERROR - stderr - +2025-02-05 17:23:42 - INFO - stdout - {'loss': 0.9059, 'grad_norm': 1.0375601053237915, 'learning_rate': 1.6880752562054253e-05, 'epoch': 0.84} +2025-02-05 17:23:42 - ERROR - stderr - 28%|██▊ | 6298/22434 [7:16:02<11:20:26, 2.53s/it] +2025-02-05 17:23:45 - ERROR - stderr - 28%|██▊ | 6299/22434 [7:16:04<11:11:31, 2.50s/it] +2025-02-05 17:23:45 - ERROR - stderr - +2025-02-05 17:23:45 - ERROR - stderr - +2025-02-05 17:23:45 - INFO - stdout - {'loss': 0.8628, 'grad_norm': 1.0322469472885132, 'learning_rate': 1.687970485100916e-05, 'epoch': 0.84} +2025-02-05 17:23:45 - ERROR - stderr - 28%|██▊ | 6299/22434 [7:16:04<11:11:31, 2.50s/it] +2025-02-05 17:23:47 - ERROR - stderr - 28%|██▊ | 6300/22434 [7:16:07<11:11:42, 2.50s/it] +2025-02-05 17:23:47 - ERROR - stderr - +2025-02-05 17:23:47 - ERROR - stderr - +2025-02-05 17:23:47 - INFO - stdout - {'loss': 0.8839, 'grad_norm': 0.9662466645240784, 'learning_rate': 1.68786569965632e-05, 'epoch': 0.84} +2025-02-05 17:23:47 - ERROR - stderr - 28%|██▊ | 6300/22434 [7:16:07<11:11:42, 2.50s/it] +2025-02-05 17:23:50 - ERROR - stderr - 28%|██▊ | 6301/22434 [7:16:09<11:06:52, 2.48s/it] +2025-02-05 17:23:50 - ERROR - stderr - +2025-02-05 17:23:50 - ERROR 
- stderr - +2025-02-05 17:23:50 - INFO - stdout - {'loss': 0.9052, 'grad_norm': 1.0548816919326782, 'learning_rate': 1.6877608998738216e-05, 'epoch': 0.84} +2025-02-05 17:23:50 - ERROR - stderr - 28%|██▊ | 6301/22434 [7:16:09<11:06:52, 2.48s/it] +2025-02-05 17:23:52 - ERROR - stderr - 28%|██▊ | 6302/22434 [7:16:12<11:05:10, 2.47s/it] +2025-02-05 17:23:52 - ERROR - stderr - +2025-02-05 17:23:52 - ERROR - stderr - +2025-02-05 17:23:52 - INFO - stdout - {'loss': 0.9428, 'grad_norm': 1.0748306512832642, 'learning_rate': 1.687656085755606e-05, 'epoch': 0.84} +2025-02-05 17:23:52 - ERROR - stderr - 28%|██▊ | 6302/22434 [7:16:12<11:05:10, 2.47s/it] +2025-02-05 17:23:55 - ERROR - stderr - 28%|██▊ | 6303/22434 [7:16:14<11:09:47, 2.49s/it] +2025-02-05 17:23:55 - ERROR - stderr - +2025-02-05 17:23:55 - ERROR - stderr - +2025-02-05 17:23:55 - INFO - stdout - {'loss': 0.9902, 'grad_norm': 1.1008113622665405, 'learning_rate': 1.687551257303857e-05, 'epoch': 0.84} +2025-02-05 17:23:55 - ERROR - stderr - 28%|██▊ | 6303/22434 [7:16:14<11:09:47, 2.49s/it] +2025-02-05 17:23:57 - ERROR - stderr - 28%|██▊ | 6304/22434 [7:16:17<11:10:49, 2.50s/it] +2025-02-05 17:23:57 - ERROR - stderr - +2025-02-05 17:23:57 - ERROR - stderr - +2025-02-05 17:23:57 - INFO - stdout - {'loss': 0.8826, 'grad_norm': 0.990467369556427, 'learning_rate': 1.6874464145207597e-05, 'epoch': 0.84} +2025-02-05 17:23:57 - ERROR - stderr - 28%|██▊ | 6304/22434 [7:16:17<11:10:49, 2.50s/it] +2025-02-05 17:24:00 - ERROR - stderr - 28%|██▊ | 6305/22434 [7:16:19<11:15:45, 2.51s/it] +2025-02-05 17:24:00 - ERROR - stderr - +2025-02-05 17:24:00 - ERROR - stderr - +2025-02-05 17:24:00 - INFO - stdout - {'loss': 0.9027, 'grad_norm': 1.0164737701416016, 'learning_rate': 1.6873415574085e-05, 'epoch': 0.84} +2025-02-05 17:24:00 - ERROR - stderr - 28%|██▊ | 6305/22434 [7:16:19<11:15:45, 2.51s/it] +2025-02-05 17:24:02 - ERROR - stderr - 28%|██▊ | 6306/22434 [7:16:22<11:20:35, 2.53s/it] +2025-02-05 17:24:02 - ERROR - stderr - 
+2025-02-05 17:24:02 - ERROR - stderr - +2025-02-05 17:24:02 - INFO - stdout - {'loss': 0.9582, 'grad_norm': 0.9884905219078064, 'learning_rate': 1.687236685969263e-05, 'epoch': 0.84} +2025-02-05 17:24:02 - ERROR - stderr - 28%|██▊ | 6306/22434 [7:16:22<11:20:35, 2.53s/it] +2025-02-05 17:24:05 - ERROR - stderr - 28%|██▊ | 6307/22434 [7:16:24<11:16:41, 2.52s/it] +2025-02-05 17:24:05 - ERROR - stderr - +2025-02-05 17:24:05 - ERROR - stderr - +2025-02-05 17:24:05 - INFO - stdout - {'loss': 1.0188, 'grad_norm': 1.0693950653076172, 'learning_rate': 1.687131800205235e-05, 'epoch': 0.84} +2025-02-05 17:24:05 - ERROR - stderr - 28%|██▊ | 6307/22434 [7:16:24<11:16:41, 2.52s/it] +2025-02-05 17:24:07 - ERROR - stderr - 28%|██▊ | 6308/22434 [7:16:27<11:14:22, 2.51s/it] +2025-02-05 17:24:07 - ERROR - stderr - +2025-02-05 17:24:07 - ERROR - stderr - +2025-02-05 17:24:07 - INFO - stdout - {'loss': 1.0249, 'grad_norm': 1.2533334493637085, 'learning_rate': 1.687026900118602e-05, 'epoch': 0.84} +2025-02-05 17:24:07 - ERROR - stderr - 28%|██▊ | 6308/22434 [7:16:27<11:14:22, 2.51s/it] +2025-02-05 17:24:10 - ERROR - stderr - 28%|██▊ | 6309/22434 [7:16:29<11:12:35, 2.50s/it] +2025-02-05 17:24:10 - ERROR - stderr - +2025-02-05 17:24:10 - ERROR - stderr - +2025-02-05 17:24:10 - INFO - stdout - {'loss': 0.9091, 'grad_norm': 0.9755898118019104, 'learning_rate': 1.686921985711551e-05, 'epoch': 0.84} +2025-02-05 17:24:10 - ERROR - stderr - 28%|██▊ | 6309/22434 [7:16:29<11:12:35, 2.50s/it] +2025-02-05 17:24:12 - ERROR - stderr - 28%|██▊ | 6310/22434 [7:16:32<11:09:48, 2.49s/it] +2025-02-05 17:24:12 - ERROR - stderr - +2025-02-05 17:24:12 - ERROR - stderr - +2025-02-05 17:24:12 - INFO - stdout - {'loss': 0.8727, 'grad_norm': 1.092630386352539, 'learning_rate': 1.686817056986268e-05, 'epoch': 0.84} +2025-02-05 17:24:12 - ERROR - stderr - 28%|██▊ | 6310/22434 [7:16:32<11:09:48, 2.49s/it] +2025-02-05 17:24:15 - ERROR - stderr - 28%|██▊ | 6311/22434 [7:16:34<11:05:45, 2.48s/it] +2025-02-05 17:24:15 
- ERROR - stderr - +2025-02-05 17:24:15 - ERROR - stderr - +2025-02-05 17:24:15 - INFO - stdout - {'loss': 0.977, 'grad_norm': 1.0801206827163696, 'learning_rate': 1.6867121139449413e-05, 'epoch': 0.84} +2025-02-05 17:24:15 - ERROR - stderr - 28%|██▊ | 6311/22434 [7:16:34<11:05:45, 2.48s/it] +2025-02-05 17:24:17 - ERROR - stderr - 28%|██▊ | 6312/22434 [7:16:37<11:01:11, 2.46s/it] +2025-02-05 17:24:17 - ERROR - stderr - +2025-02-05 17:24:17 - ERROR - stderr - +2025-02-05 17:24:17 - INFO - stdout - {'loss': 0.9515, 'grad_norm': 1.1071114540100098, 'learning_rate': 1.6866071565897574e-05, 'epoch': 0.84} +2025-02-05 17:24:17 - ERROR - stderr - 28%|██▊ | 6312/22434 [7:16:37<11:01:11, 2.46s/it] +2025-02-05 17:24:19 - ERROR - stderr - 28%|██▊ | 6313/22434 [7:16:39<11:05:09, 2.48s/it] +2025-02-05 17:24:20 - ERROR - stderr - +2025-02-05 17:24:20 - ERROR - stderr - +2025-02-05 17:24:20 - INFO - stdout - {'loss': 0.9799, 'grad_norm': 1.0245574712753296, 'learning_rate': 1.6865021849229042e-05, 'epoch': 0.84} +2025-02-05 17:24:20 - ERROR - stderr - 28%|██▊ | 6313/22434 [7:16:39<11:05:09, 2.48s/it] +2025-02-05 17:24:22 - ERROR - stderr - 28%|██▊ | 6314/22434 [7:16:42<11:06:26, 2.48s/it] +2025-02-05 17:24:22 - ERROR - stderr - +2025-02-05 17:24:22 - ERROR - stderr - +2025-02-05 17:24:22 - INFO - stdout - {'loss': 0.914, 'grad_norm': 0.9975886344909668, 'learning_rate': 1.68639719894657e-05, 'epoch': 0.84} +2025-02-05 17:24:22 - ERROR - stderr - 28%|██▊ | 6314/22434 [7:16:42<11:06:26, 2.48s/it] +2025-02-05 17:24:24 - ERROR - stderr - 28%|██▊ | 6315/22434 [7:16:44<11:09:43, 2.49s/it] +2025-02-05 17:24:25 - ERROR - stderr - +2025-02-05 17:24:25 - ERROR - stderr - +2025-02-05 17:24:25 - INFO - stdout - {'loss': 1.0333, 'grad_norm': 1.087110161781311, 'learning_rate': 1.686292198662943e-05, 'epoch': 0.84} +2025-02-05 17:24:25 - ERROR - stderr - 28%|██▊ | 6315/22434 [7:16:44<11:09:43, 2.49s/it] +2025-02-05 17:24:27 - ERROR - stderr - 28%|██▊ | 6316/22434 [7:16:47<11:11:50, 2.50s/it] 
+2025-02-05 17:24:27 - ERROR - stderr - +2025-02-05 17:24:27 - ERROR - stderr - +2025-02-05 17:24:27 - INFO - stdout - {'loss': 0.8577, 'grad_norm': 1.081152081489563, 'learning_rate': 1.6861871840742118e-05, 'epoch': 0.84} +2025-02-05 17:24:27 - ERROR - stderr - 28%|██▊ | 6316/22434 [7:16:47<11:11:50, 2.50s/it] +2025-02-05 17:24:29 - ERROR - stderr - 28%|██▊ | 6317/22434 [7:16:49<11:09:03, 2.49s/it] +2025-02-05 17:24:30 - ERROR - stderr - +2025-02-05 17:24:30 - ERROR - stderr - +2025-02-05 17:24:30 - INFO - stdout - {'loss': 0.9317, 'grad_norm': 1.0627353191375732, 'learning_rate': 1.6860821551825655e-05, 'epoch': 0.84} +2025-02-05 17:24:30 - ERROR - stderr - 28%|██▊ | 6317/22434 [7:16:49<11:09:03, 2.49s/it] +2025-02-05 17:24:32 - ERROR - stderr - 28%|██▊ | 6318/22434 [7:16:52<11:08:03, 2.49s/it] +2025-02-05 17:24:32 - ERROR - stderr - +2025-02-05 17:24:32 - ERROR - stderr - +2025-02-05 17:24:32 - INFO - stdout - {'loss': 0.9217, 'grad_norm': 1.0807102918624878, 'learning_rate': 1.685977111990193e-05, 'epoch': 0.84} +2025-02-05 17:24:32 - ERROR - stderr - 28%|██▊ | 6318/22434 [7:16:52<11:08:03, 2.49s/it] +2025-02-05 17:24:34 - ERROR - stderr - 28%|██▊ | 6319/22434 [7:16:54<11:06:28, 2.48s/it] +2025-02-05 17:24:34 - ERROR - stderr - +2025-02-05 17:24:34 - ERROR - stderr - +2025-02-05 17:24:34 - INFO - stdout - {'loss': 0.9624, 'grad_norm': 1.1931391954421997, 'learning_rate': 1.6858720544992843e-05, 'epoch': 0.85} +2025-02-05 17:24:34 - ERROR - stderr - 28%|██▊ | 6319/22434 [7:16:54<11:06:28, 2.48s/it] +2025-02-05 17:24:37 - ERROR - stderr - 28%|██▊ | 6320/22434 [7:16:57<11:11:00, 2.50s/it] +2025-02-05 17:24:37 - ERROR - stderr - +2025-02-05 17:24:37 - ERROR - stderr - +2025-02-05 17:24:37 - INFO - stdout - {'loss': 0.8237, 'grad_norm': 1.0161738395690918, 'learning_rate': 1.6857669827120285e-05, 'epoch': 0.85} +2025-02-05 17:24:37 - ERROR - stderr - 28%|██▊ | 6320/22434 [7:16:57<11:11:00, 2.50s/it] +2025-02-05 17:24:39 - ERROR - stderr - 28%|██▊ | 6321/22434 
[7:16:59<11:10:48, 2.50s/it] +2025-02-05 17:24:40 - ERROR - stderr - +2025-02-05 17:24:40 - ERROR - stderr - +2025-02-05 17:24:40 - INFO - stdout - {'loss': 0.9922, 'grad_norm': 1.0203443765640259, 'learning_rate': 1.6856618966306164e-05, 'epoch': 0.85} +2025-02-05 17:24:40 - ERROR - stderr - 28%|██▊ | 6321/22434 [7:16:59<11:10:48, 2.50s/it] +2025-02-05 17:24:42 - ERROR - stderr - 28%|██▊ | 6322/22434 [7:17:02<11:16:14, 2.52s/it] +2025-02-05 17:24:42 - ERROR - stderr - +2025-02-05 17:24:42 - ERROR - stderr - +2025-02-05 17:24:42 - INFO - stdout - {'loss': 0.8714, 'grad_norm': 1.057619571685791, 'learning_rate': 1.685556796257238e-05, 'epoch': 0.85} +2025-02-05 17:24:42 - ERROR - stderr - 28%|██▊ | 6322/22434 [7:17:02<11:16:14, 2.52s/it] +2025-02-05 17:24:44 - ERROR - stderr - 28%|██▊ | 6323/22434 [7:17:04<11:10:55, 2.50s/it] +2025-02-05 17:24:45 - ERROR - stderr - +2025-02-05 17:24:45 - ERROR - stderr - +2025-02-05 17:24:45 - INFO - stdout - {'loss': 0.9564, 'grad_norm': 1.0800080299377441, 'learning_rate': 1.6854516815940844e-05, 'epoch': 0.85} +2025-02-05 17:24:45 - ERROR - stderr - 28%|██▊ | 6323/22434 [7:17:04<11:10:55, 2.50s/it] +2025-02-05 17:24:47 - ERROR - stderr - 28%|██▊ | 6324/22434 [7:17:07<11:07:41, 2.49s/it] +2025-02-05 17:24:47 - ERROR - stderr - +2025-02-05 17:24:47 - ERROR - stderr - +2025-02-05 17:24:47 - INFO - stdout - {'loss': 0.9349, 'grad_norm': 1.0452362298965454, 'learning_rate': 1.6853465526433465e-05, 'epoch': 0.85} +2025-02-05 17:24:47 - ERROR - stderr - 28%|██▊ | 6324/22434 [7:17:07<11:07:41, 2.49s/it] +2025-02-05 17:24:49 - ERROR - stderr - 28%|██▊ | 6325/22434 [7:17:09<11:05:00, 2.48s/it] +2025-02-05 17:24:49 - ERROR - stderr - +2025-02-05 17:24:49 - ERROR - stderr - +2025-02-05 17:24:49 - INFO - stdout - {'loss': 1.0769, 'grad_norm': 1.063637137413025, 'learning_rate': 1.6852414094072153e-05, 'epoch': 0.85} +2025-02-05 17:24:49 - ERROR - stderr - 28%|██▊ | 6325/22434 [7:17:09<11:05:00, 2.48s/it] +2025-02-05 17:24:52 - ERROR - stderr 
- 28%|██▊ | 6326/22434 [7:17:12<11:09:55, 2.50s/it] +2025-02-05 17:24:52 - ERROR - stderr - +2025-02-05 17:24:52 - ERROR - stderr - +2025-02-05 17:24:52 - INFO - stdout - {'loss': 1.0414, 'grad_norm': 1.0307679176330566, 'learning_rate': 1.6851362518878823e-05, 'epoch': 0.85} +2025-02-05 17:24:52 - ERROR - stderr - 28%|██▊ | 6326/22434 [7:17:12<11:09:55, 2.50s/it] +2025-02-05 17:24:54 - ERROR - stderr - 28%|██▊ | 6327/22434 [7:17:14<11:09:58, 2.50s/it] +2025-02-05 17:24:54 - ERROR - stderr - +2025-02-05 17:24:54 - ERROR - stderr - +2025-02-05 17:24:54 - INFO - stdout - {'loss': 0.975, 'grad_norm': 1.0028204917907715, 'learning_rate': 1.6850310800875402e-05, 'epoch': 0.85} +2025-02-05 17:24:54 - ERROR - stderr - 28%|██▊ | 6327/22434 [7:17:14<11:09:58, 2.50s/it] +2025-02-05 17:24:57 - ERROR - stderr - 28%|██▊ | 6328/22434 [7:17:17<11:14:08, 2.51s/it] +2025-02-05 17:24:57 - ERROR - stderr - +2025-02-05 17:24:57 - ERROR - stderr - +2025-02-05 17:24:57 - INFO - stdout - {'loss': 0.9348, 'grad_norm': 1.2184512615203857, 'learning_rate': 1.6849258940083806e-05, 'epoch': 0.85} +2025-02-05 17:24:57 - ERROR - stderr - 28%|██▊ | 6328/22434 [7:17:17<11:14:08, 2.51s/it] +2025-02-05 17:24:59 - ERROR - stderr - 28%|██▊ | 6329/22434 [7:17:19<11:15:35, 2.52s/it] +2025-02-05 17:25:00 - ERROR - stderr - +2025-02-05 17:25:00 - ERROR - stderr - +2025-02-05 17:25:00 - INFO - stdout - {'loss': 0.9101, 'grad_norm': 1.021688461303711, 'learning_rate': 1.684820693652596e-05, 'epoch': 0.85} +2025-02-05 17:25:00 - ERROR - stderr - 28%|██▊ | 6329/22434 [7:17:19<11:15:35, 2.52s/it] +2025-02-05 17:25:02 - ERROR - stderr - 28%|██▊ | 6330/22434 [7:17:22<11:11:20, 2.50s/it] +2025-02-05 17:25:02 - ERROR - stderr - +2025-02-05 17:25:02 - ERROR - stderr - +2025-02-05 17:25:02 - INFO - stdout - {'loss': 0.8863, 'grad_norm': 1.1253647804260254, 'learning_rate': 1.6847154790223797e-05, 'epoch': 0.85} +2025-02-05 17:25:02 - ERROR - stderr - 28%|██▊ | 6330/22434 [7:17:22<11:11:20, 2.50s/it] +2025-02-05 
17:25:05 - ERROR - stderr - 28%|██▊ | 6331/22434 [7:17:24<11:26:08, 2.56s/it] +2025-02-05 17:25:05 - ERROR - stderr - +2025-02-05 17:25:05 - ERROR - stderr - +2025-02-05 17:25:05 - INFO - stdout - {'loss': 0.9205, 'grad_norm': 1.1511632204055786, 'learning_rate': 1.6846102501199244e-05, 'epoch': 0.85} +2025-02-05 17:25:05 - ERROR - stderr - 28%|██▊ | 6331/22434 [7:17:24<11:26:08, 2.56s/it] +2025-02-05 17:25:07 - ERROR - stderr - 28%|██▊ | 6332/22434 [7:17:27<11:23:15, 2.55s/it] +2025-02-05 17:25:07 - ERROR - stderr - +2025-02-05 17:25:07 - ERROR - stderr - +2025-02-05 17:25:07 - INFO - stdout - {'loss': 0.9559, 'grad_norm': 1.0134265422821045, 'learning_rate': 1.6845050069474234e-05, 'epoch': 0.85} +2025-02-05 17:25:07 - ERROR - stderr - 28%|██▊ | 6332/22434 [7:17:27<11:23:15, 2.55s/it] +2025-02-05 17:25:10 - ERROR - stderr - 28%|██▊ | 6333/22434 [7:17:29<11:15:45, 2.52s/it] +2025-02-05 17:25:10 - ERROR - stderr - +2025-02-05 17:25:10 - ERROR - stderr - +2025-02-05 17:25:10 - INFO - stdout - {'loss': 1.001, 'grad_norm': 1.1101819276809692, 'learning_rate': 1.6843997495070702e-05, 'epoch': 0.85} +2025-02-05 17:25:10 - ERROR - stderr - 28%|██▊ | 6333/22434 [7:17:29<11:15:45, 2.52s/it] +2025-02-05 17:25:12 - ERROR - stderr - 28%|██▊ | 6334/22434 [7:17:32<11:12:07, 2.50s/it] +2025-02-05 17:25:12 - ERROR - stderr - +2025-02-05 17:25:12 - ERROR - stderr - +2025-02-05 17:25:12 - INFO - stdout - {'loss': 0.9079, 'grad_norm': 1.129840612411499, 'learning_rate': 1.68429447780106e-05, 'epoch': 0.85} +2025-02-05 17:25:12 - ERROR - stderr - 28%|██▊ | 6334/22434 [7:17:32<11:12:07, 2.50s/it] +2025-02-05 17:25:15 - ERROR - stderr - 28%|██▊ | 6335/22434 [7:17:34<11:11:14, 2.50s/it] +2025-02-05 17:25:15 - ERROR - stderr - +2025-02-05 17:25:15 - ERROR - stderr - +2025-02-05 17:25:15 - INFO - stdout - {'loss': 0.9264, 'grad_norm': 1.0620453357696533, 'learning_rate': 1.6841891918315853e-05, 'epoch': 0.85} +2025-02-05 17:25:15 - ERROR - stderr - 28%|██▊ | 6335/22434 [7:17:34<11:11:14, 
2.50s/it] +2025-02-05 17:25:17 - ERROR - stderr - 28%|██▊ | 6336/22434 [7:17:37<11:09:21, 2.49s/it] +2025-02-05 17:25:17 - ERROR - stderr - +2025-02-05 17:25:17 - ERROR - stderr - +2025-02-05 17:25:17 - INFO - stdout - {'loss': 0.8918, 'grad_norm': 1.1281931400299072, 'learning_rate': 1.684083891600842e-05, 'epoch': 0.85} +2025-02-05 17:25:17 - ERROR - stderr - 28%|██▊ | 6336/22434 [7:17:37<11:09:21, 2.49s/it] +2025-02-05 17:25:20 - ERROR - stderr - 28%|██▊ | 6337/22434 [7:17:39<11:08:53, 2.49s/it] +2025-02-05 17:25:20 - ERROR - stderr - +2025-02-05 17:25:20 - ERROR - stderr - +2025-02-05 17:25:20 - INFO - stdout - {'loss': 0.8917, 'grad_norm': 1.1712507009506226, 'learning_rate': 1.6839785771110247e-05, 'epoch': 0.85} +2025-02-05 17:25:20 - ERROR - stderr - 28%|██▊ | 6337/22434 [7:17:39<11:08:53, 2.49s/it] +2025-02-05 17:25:22 - ERROR - stderr - 28%|██▊ | 6338/22434 [7:17:42<11:07:51, 2.49s/it] +2025-02-05 17:25:22 - ERROR - stderr - +2025-02-05 17:25:22 - ERROR - stderr - +2025-02-05 17:25:22 - INFO - stdout - {'loss': 1.0495, 'grad_norm': 1.0798373222351074, 'learning_rate': 1.683873248364328e-05, 'epoch': 0.85} +2025-02-05 17:25:22 - ERROR - stderr - 28%|██▊ | 6338/22434 [7:17:42<11:07:51, 2.49s/it] +2025-02-05 17:25:25 - ERROR - stderr - 28%|██▊ | 6339/22434 [7:17:44<11:05:33, 2.48s/it] +2025-02-05 17:25:25 - ERROR - stderr - +2025-02-05 17:25:25 - ERROR - stderr - +2025-02-05 17:25:25 - INFO - stdout - {'loss': 0.9955, 'grad_norm': 1.0146881341934204, 'learning_rate': 1.6837679053629483e-05, 'epoch': 0.85} +2025-02-05 17:25:25 - ERROR - stderr - 28%|██▊ | 6339/22434 [7:17:44<11:05:33, 2.48s/it] +2025-02-05 17:25:27 - ERROR - stderr - 28%|██▊ | 6340/22434 [7:17:47<11:40:25, 2.61s/it] +2025-02-05 17:25:27 - ERROR - stderr - +2025-02-05 17:25:27 - ERROR - stderr - +2025-02-05 17:25:27 - INFO - stdout - {'loss': 0.9478, 'grad_norm': 1.0500850677490234, 'learning_rate': 1.683662548109081e-05, 'epoch': 0.85} +2025-02-05 17:25:27 - ERROR - stderr - 28%|██▊ | 
6340/22434 [7:17:47<11:40:25, 2.61s/it] +2025-02-05 17:25:30 - ERROR - stderr - 28%|██▊ | 6341/22434 [7:17:50<11:28:03, 2.57s/it] +2025-02-05 17:25:30 - ERROR - stderr - +2025-02-05 17:25:30 - ERROR - stderr - +2025-02-05 17:25:30 - INFO - stdout - {'loss': 0.8954, 'grad_norm': 1.0305777788162231, 'learning_rate': 1.6835571766049214e-05, 'epoch': 0.85} +2025-02-05 17:25:30 - ERROR - stderr - 28%|██▊ | 6341/22434 [7:17:50<11:28:03, 2.57s/it] +2025-02-05 17:25:32 - ERROR - stderr - 28%|██▊ | 6342/22434 [7:17:52<11:26:46, 2.56s/it] +2025-02-05 17:25:32 - ERROR - stderr - +2025-02-05 17:25:32 - ERROR - stderr - +2025-02-05 17:25:32 - INFO - stdout - {'loss': 0.8995, 'grad_norm': 0.9722110033035278, 'learning_rate': 1.683451790852667e-05, 'epoch': 0.85} +2025-02-05 17:25:32 - ERROR - stderr - 28%|██▊ | 6342/22434 [7:17:52<11:26:46, 2.56s/it] +2025-02-05 17:25:35 - ERROR - stderr - 28%|██▊ | 6343/22434 [7:17:55<11:20:44, 2.54s/it] +2025-02-05 17:25:35 - ERROR - stderr - +2025-02-05 17:25:35 - ERROR - stderr - +2025-02-05 17:25:35 - INFO - stdout - {'loss': 0.9075, 'grad_norm': 0.9783356189727783, 'learning_rate': 1.683346390854514e-05, 'epoch': 0.85} +2025-02-05 17:25:35 - ERROR - stderr - 28%|██▊ | 6343/22434 [7:17:55<11:20:44, 2.54s/it] +2025-02-05 17:25:37 - ERROR - stderr - 28%|██▊ | 6344/22434 [7:17:57<11:13:08, 2.51s/it] +2025-02-05 17:25:37 - ERROR - stderr - +2025-02-05 17:25:37 - ERROR - stderr - +2025-02-05 17:25:37 - INFO - stdout - {'loss': 0.8733, 'grad_norm': 1.064634084701538, 'learning_rate': 1.6832409766126593e-05, 'epoch': 0.85} +2025-02-05 17:25:37 - ERROR - stderr - 28%|██▊ | 6344/22434 [7:17:57<11:13:08, 2.51s/it] +2025-02-05 17:25:40 - ERROR - stderr - 28%|██▊ | 6345/22434 [7:18:00<11:14:30, 2.52s/it] +2025-02-05 17:25:40 - ERROR - stderr - +2025-02-05 17:25:40 - ERROR - stderr - +2025-02-05 17:25:40 - INFO - stdout - {'loss': 0.9727, 'grad_norm': 1.0619784593582153, 'learning_rate': 1.6831355481293004e-05, 'epoch': 0.85} +2025-02-05 17:25:40 - 
ERROR - stderr - 28%|██▊ | 6345/22434 [7:18:00<11:14:30, 2.52s/it] +2025-02-05 17:25:42 - ERROR - stderr - 28%|██▊ | 6346/22434 [7:18:02<11:15:49, 2.52s/it] +2025-02-05 17:25:42 - ERROR - stderr - +2025-02-05 17:25:42 - ERROR - stderr - +2025-02-05 17:25:42 - INFO - stdout - {'loss': 0.8665, 'grad_norm': 1.1045472621917725, 'learning_rate': 1.6830301054066343e-05, 'epoch': 0.85} +2025-02-05 17:25:42 - ERROR - stderr - 28%|██▊ | 6346/22434 [7:18:02<11:15:49, 2.52s/it] +2025-02-05 17:25:45 - ERROR - stderr - 28%|██▊ | 6347/22434 [7:18:05<11:12:24, 2.51s/it] +2025-02-05 17:25:45 - ERROR - stderr - +2025-02-05 17:25:45 - ERROR - stderr - +2025-02-05 17:25:45 - INFO - stdout - {'loss': 0.9916, 'grad_norm': 1.002352237701416, 'learning_rate': 1.68292464844686e-05, 'epoch': 0.85} +2025-02-05 17:25:45 - ERROR - stderr - 28%|██▊ | 6347/22434 [7:18:05<11:12:24, 2.51s/it] +2025-02-05 17:25:47 - ERROR - stderr - 28%|██▊ | 6348/22434 [7:18:07<11:08:16, 2.49s/it] +2025-02-05 17:25:47 - ERROR - stderr - +2025-02-05 17:25:47 - ERROR - stderr - +2025-02-05 17:25:47 - INFO - stdout - {'loss': 0.8624, 'grad_norm': 1.0003159046173096, 'learning_rate': 1.6828191772521744e-05, 'epoch': 0.85} +2025-02-05 17:25:47 - ERROR - stderr - 28%|██▊ | 6348/22434 [7:18:07<11:08:16, 2.49s/it] +2025-02-05 17:25:50 - ERROR - stderr - 28%|██▊ | 6349/22434 [7:18:10<11:10:22, 2.50s/it] +2025-02-05 17:25:50 - ERROR - stderr - +2025-02-05 17:25:50 - ERROR - stderr - +2025-02-05 17:25:50 - INFO - stdout - {'loss': 0.8647, 'grad_norm': 0.9276086091995239, 'learning_rate': 1.6827136918247763e-05, 'epoch': 0.85} +2025-02-05 17:25:50 - ERROR - stderr - 28%|██▊ | 6349/22434 [7:18:10<11:10:22, 2.50s/it] +2025-02-05 17:25:52 - ERROR - stderr - 28%|██▊ | 6350/22434 [7:18:12<11:12:23, 2.51s/it] +2025-02-05 17:25:52 - ERROR - stderr - +2025-02-05 17:25:52 - ERROR - stderr - +2025-02-05 17:25:52 - INFO - stdout - {'loss': 0.8793, 'grad_norm': 1.0791691541671753, 'learning_rate': 1.6826081921668645e-05, 'epoch': 0.85} 
+2025-02-05 17:25:52 - ERROR - stderr - 28%|██▊ | 6350/22434 [7:18:12<11:12:23, 2.51s/it] +2025-02-05 17:25:55 - ERROR - stderr - 28%|██▊ | 6351/22434 [7:18:15<11:12:55, 2.51s/it] +2025-02-05 17:25:55 - ERROR - stderr - +2025-02-05 17:25:55 - ERROR - stderr - +2025-02-05 17:25:55 - INFO - stdout - {'loss': 1.0109, 'grad_norm': 1.1185963153839111, 'learning_rate': 1.6825026782806383e-05, 'epoch': 0.85} +2025-02-05 17:25:55 - ERROR - stderr - 28%|██▊ | 6351/22434 [7:18:15<11:12:55, 2.51s/it] +2025-02-05 17:25:57 - ERROR - stderr - 28%|██▊ | 6352/22434 [7:18:17<11:14:42, 2.52s/it] +2025-02-05 17:25:57 - ERROR - stderr - +2025-02-05 17:25:57 - ERROR - stderr - +2025-02-05 17:25:57 - INFO - stdout - {'loss': 0.8502, 'grad_norm': 1.0141671895980835, 'learning_rate': 1.682397150168297e-05, 'epoch': 0.85} +2025-02-05 17:25:57 - ERROR - stderr - 28%|██▊ | 6352/22434 [7:18:17<11:14:42, 2.52s/it] +2025-02-05 17:26:00 - ERROR - stderr - 28%|██▊ | 6353/22434 [7:18:20<11:09:16, 2.50s/it] +2025-02-05 17:26:00 - ERROR - stderr - +2025-02-05 17:26:00 - ERROR - stderr - +2025-02-05 17:26:00 - INFO - stdout - {'loss': 0.9827, 'grad_norm': 1.0985190868377686, 'learning_rate': 1.68229160783204e-05, 'epoch': 0.85} +2025-02-05 17:26:00 - ERROR - stderr - 28%|██▊ | 6353/22434 [7:18:20<11:09:16, 2.50s/it] +2025-02-05 17:26:02 - ERROR - stderr - 28%|██▊ | 6354/22434 [7:18:22<11:11:30, 2.51s/it] +2025-02-05 17:26:02 - ERROR - stderr - +2025-02-05 17:26:02 - ERROR - stderr - +2025-02-05 17:26:02 - INFO - stdout - {'loss': 0.899, 'grad_norm': 1.115431785583496, 'learning_rate': 1.6821860512740674e-05, 'epoch': 0.85} +2025-02-05 17:26:02 - ERROR - stderr - 28%|██▊ | 6354/22434 [7:18:22<11:11:30, 2.51s/it] +2025-02-05 17:26:05 - ERROR - stderr - 28%|██▊ | 6355/22434 [7:18:25<11:10:53, 2.50s/it] +2025-02-05 17:26:05 - ERROR - stderr - +2025-02-05 17:26:05 - ERROR - stderr - +2025-02-05 17:26:05 - INFO - stdout - {'loss': 0.9981, 'grad_norm': 1.030537724494934, 'learning_rate': 
1.6820804804965792e-05, 'epoch': 0.85} +2025-02-05 17:26:05 - ERROR - stderr - 28%|██▊ | 6355/22434 [7:18:25<11:10:53, 2.50s/it] +2025-02-05 17:26:07 - ERROR - stderr - 28%|██▊ | 6356/22434 [7:18:27<11:09:24, 2.50s/it] +2025-02-05 17:26:07 - ERROR - stderr - +2025-02-05 17:26:07 - ERROR - stderr - +2025-02-05 17:26:07 - INFO - stdout - {'loss': 0.9282, 'grad_norm': 1.0183442831039429, 'learning_rate': 1.681974895501776e-05, 'epoch': 0.85} +2025-02-05 17:26:07 - ERROR - stderr - 28%|██▊ | 6356/22434 [7:18:27<11:09:24, 2.50s/it] +2025-02-05 17:26:10 - ERROR - stderr - 28%|██▊ | 6357/22434 [7:18:30<11:06:17, 2.49s/it] +2025-02-05 17:26:10 - ERROR - stderr - +2025-02-05 17:26:10 - ERROR - stderr - +2025-02-05 17:26:10 - INFO - stdout - {'loss': 0.9135, 'grad_norm': 1.0021448135375977, 'learning_rate': 1.681869296291859e-05, 'epoch': 0.85} +2025-02-05 17:26:10 - ERROR - stderr - 28%|██▊ | 6357/22434 [7:18:30<11:06:17, 2.49s/it] +2025-02-05 17:26:12 - ERROR - stderr - 28%|██▊ | 6358/22434 [7:18:32<11:05:19, 2.48s/it] +2025-02-05 17:26:12 - ERROR - stderr - +2025-02-05 17:26:12 - ERROR - stderr - +2025-02-05 17:26:12 - INFO - stdout - {'loss': 0.8565, 'grad_norm': 1.019509196281433, 'learning_rate': 1.6817636828690288e-05, 'epoch': 0.85} +2025-02-05 17:26:12 - ERROR - stderr - 28%|██▊ | 6358/22434 [7:18:32<11:05:19, 2.48s/it] +2025-02-05 17:26:15 - ERROR - stderr - 28%|██▊ | 6359/22434 [7:18:35<11:15:05, 2.52s/it] +2025-02-05 17:26:15 - ERROR - stderr - +2025-02-05 17:26:15 - ERROR - stderr - +2025-02-05 17:26:15 - INFO - stdout - {'loss': 1.0596, 'grad_norm': 1.062915563583374, 'learning_rate': 1.681658055235487e-05, 'epoch': 0.85} +2025-02-05 17:26:15 - ERROR - stderr - 28%|██▊ | 6359/22434 [7:18:35<11:15:05, 2.52s/it] +2025-02-05 17:26:18 - ERROR - stderr - 28%|██▊ | 6360/22434 [7:18:38<11:40:32, 2.61s/it] +2025-02-05 17:26:18 - ERROR - stderr - +2025-02-05 17:26:18 - ERROR - stderr - +2025-02-05 17:26:18 - INFO - stdout - {'loss': 0.9461, 'grad_norm': 
1.0293793678283691, 'learning_rate': 1.681552413393435e-05, 'epoch': 0.85} +2025-02-05 17:26:18 - ERROR - stderr - 28%|██▊ | 6360/22434 [7:18:38<11:40:32, 2.61s/it] +2025-02-05 17:26:20 - ERROR - stderr - 28%|██▊ | 6361/22434 [7:18:40<11:31:21, 2.58s/it] +2025-02-05 17:26:20 - ERROR - stderr - +2025-02-05 17:26:20 - ERROR - stderr - +2025-02-05 17:26:20 - INFO - stdout - {'loss': 1.0166, 'grad_norm': 1.0702258348464966, 'learning_rate': 1.6814467573450754e-05, 'epoch': 0.85} +2025-02-05 17:26:20 - ERROR - stderr - 28%|██▊ | 6361/22434 [7:18:40<11:31:21, 2.58s/it] +2025-02-05 17:26:23 - ERROR - stderr - 28%|██▊ | 6362/22434 [7:18:43<11:27:35, 2.57s/it] +2025-02-05 17:26:23 - ERROR - stderr - +2025-02-05 17:26:23 - ERROR - stderr - +2025-02-05 17:26:23 - INFO - stdout - {'loss': 0.9807, 'grad_norm': 1.1517055034637451, 'learning_rate': 1.6813410870926105e-05, 'epoch': 0.85} +2025-02-05 17:26:23 - ERROR - stderr - 28%|██▊ | 6362/22434 [7:18:43<11:27:35, 2.57s/it] +2025-02-05 17:26:25 - ERROR - stderr - 28%|██▊ | 6363/22434 [7:18:45<11:23:14, 2.55s/it] +2025-02-05 17:26:25 - ERROR - stderr - +2025-02-05 17:26:25 - ERROR - stderr - +2025-02-05 17:26:25 - INFO - stdout - {'loss': 0.9033, 'grad_norm': 1.0516215562820435, 'learning_rate': 1.6812354026382426e-05, 'epoch': 0.85} +2025-02-05 17:26:25 - ERROR - stderr - 28%|██▊ | 6363/22434 [7:18:45<11:23:14, 2.55s/it] +2025-02-05 17:26:28 - ERROR - stderr - 28%|██▊ | 6364/22434 [7:18:48<11:17:36, 2.53s/it] +2025-02-05 17:26:28 - ERROR - stderr - +2025-02-05 17:26:28 - ERROR - stderr - +2025-02-05 17:26:28 - INFO - stdout - {'loss': 0.9396, 'grad_norm': 1.0838863849639893, 'learning_rate': 1.681129703984174e-05, 'epoch': 0.85} +2025-02-05 17:26:28 - ERROR - stderr - 28%|██▊ | 6364/22434 [7:18:48<11:17:36, 2.53s/it] +2025-02-05 17:26:30 - ERROR - stderr - 28%|██▊ | 6365/22434 [7:18:50<11:09:41, 2.50s/it] +2025-02-05 17:26:30 - ERROR - stderr - +2025-02-05 17:26:30 - ERROR - stderr - +2025-02-05 17:26:30 - INFO - stdout - 
{'loss': 1.0545, 'grad_norm': 1.093553900718689, 'learning_rate': 1.6810239911326086e-05, 'epoch': 0.85} +2025-02-05 17:26:30 - ERROR - stderr - 28%|██▊ | 6365/22434 [7:18:50<11:09:41, 2.50s/it] +2025-02-05 17:26:33 - ERROR - stderr - 28%|██▊ | 6366/22434 [7:18:52<11:05:39, 2.49s/it] +2025-02-05 17:26:33 - ERROR - stderr - +2025-02-05 17:26:33 - ERROR - stderr - +2025-02-05 17:26:33 - INFO - stdout - {'loss': 1.0266, 'grad_norm': 1.0164642333984375, 'learning_rate': 1.6809182640857504e-05, 'epoch': 0.85} +2025-02-05 17:26:33 - ERROR - stderr - 28%|██▊ | 6366/22434 [7:18:53<11:05:39, 2.49s/it] +2025-02-05 17:26:35 - ERROR - stderr - 28%|██▊ | 6367/22434 [7:18:55<11:13:54, 2.52s/it] +2025-02-05 17:26:35 - ERROR - stderr - +2025-02-05 17:26:35 - ERROR - stderr - +2025-02-05 17:26:35 - INFO - stdout - {'loss': 0.9495, 'grad_norm': 1.0966217517852783, 'learning_rate': 1.680812522845802e-05, 'epoch': 0.85} +2025-02-05 17:26:35 - ERROR - stderr - 28%|██▊ | 6367/22434 [7:18:55<11:13:54, 2.52s/it] +2025-02-05 17:26:38 - ERROR - stderr - 28%|██▊ | 6368/22434 [7:18:58<11:21:20, 2.54s/it] +2025-02-05 17:26:38 - ERROR - stderr - +2025-02-05 17:26:38 - ERROR - stderr - +2025-02-05 17:26:38 - INFO - stdout - {'loss': 0.7678, 'grad_norm': 1.065967321395874, 'learning_rate': 1.680706767414968e-05, 'epoch': 0.85} +2025-02-05 17:26:38 - ERROR - stderr - 28%|██▊ | 6368/22434 [7:18:58<11:21:20, 2.54s/it] +2025-02-05 17:26:40 - ERROR - stderr - 28%|██▊ | 6369/22434 [7:19:00<11:17:46, 2.53s/it] +2025-02-05 17:26:40 - ERROR - stderr - +2025-02-05 17:26:40 - ERROR - stderr - +2025-02-05 17:26:40 - INFO - stdout - {'loss': 0.8639, 'grad_norm': 1.1220910549163818, 'learning_rate': 1.6806009977954533e-05, 'epoch': 0.85} +2025-02-05 17:26:40 - ERROR - stderr - 28%|██▊ | 6369/22434 [7:19:00<11:17:46, 2.53s/it] +2025-02-05 17:26:43 - ERROR - stderr - 28%|██▊ | 6370/22434 [7:19:03<11:13:00, 2.51s/it] +2025-02-05 17:26:43 - ERROR - stderr - +2025-02-05 17:26:43 - ERROR - stderr - +2025-02-05 
17:26:43 - INFO - stdout - {'loss': 0.9823, 'grad_norm': 1.0540400743484497, 'learning_rate': 1.6804952139894618e-05, 'epoch': 0.85} +2025-02-05 17:26:43 - ERROR - stderr - 28%|██▊ | 6370/22434 [7:19:03<11:13:00, 2.51s/it] +2025-02-05 17:26:45 - ERROR - stderr - 28%|██▊ | 6371/22434 [7:19:05<11:18:13, 2.53s/it] +2025-02-05 17:26:45 - ERROR - stderr - +2025-02-05 17:26:45 - ERROR - stderr - +2025-02-05 17:26:45 - INFO - stdout - {'loss': 0.9771, 'grad_norm': 1.0192756652832031, 'learning_rate': 1.6803894159991985e-05, 'epoch': 0.85} +2025-02-05 17:26:45 - ERROR - stderr - 28%|██▊ | 6371/22434 [7:19:05<11:18:13, 2.53s/it] +2025-02-05 17:26:48 - ERROR - stderr - 28%|██▊ | 6372/22434 [7:19:08<11:17:30, 2.53s/it] +2025-02-05 17:26:48 - ERROR - stderr - +2025-02-05 17:26:48 - ERROR - stderr - +2025-02-05 17:26:48 - INFO - stdout - {'loss': 0.8796, 'grad_norm': 0.9443618059158325, 'learning_rate': 1.6802836038268694e-05, 'epoch': 0.85} +2025-02-05 17:26:48 - ERROR - stderr - 28%|██▊ | 6372/22434 [7:19:08<11:17:30, 2.53s/it] +2025-02-05 17:26:51 - ERROR - stderr - 28%|██▊ | 6373/22434 [7:19:10<11:21:09, 2.54s/it] +2025-02-05 17:26:51 - ERROR - stderr - +2025-02-05 17:26:51 - ERROR - stderr - +2025-02-05 17:26:51 - INFO - stdout - {'loss': 0.8645, 'grad_norm': 1.0384531021118164, 'learning_rate': 1.680177777474679e-05, 'epoch': 0.85} +2025-02-05 17:26:51 - ERROR - stderr - 28%|██▊ | 6373/22434 [7:19:10<11:21:09, 2.54s/it] +2025-02-05 17:26:53 - ERROR - stderr - 28%|██▊ | 6374/22434 [7:19:13<11:14:26, 2.52s/it] +2025-02-05 17:26:53 - ERROR - stderr - +2025-02-05 17:26:53 - ERROR - stderr - +2025-02-05 17:26:53 - INFO - stdout - {'loss': 0.8759, 'grad_norm': 1.1033827066421509, 'learning_rate': 1.6800719369448336e-05, 'epoch': 0.85} +2025-02-05 17:26:53 - ERROR - stderr - 28%|██▊ | 6374/22434 [7:19:13<11:14:26, 2.52s/it] +2025-02-05 17:26:55 - ERROR - stderr - 28%|██▊ | 6375/22434 [7:19:15<11:11:48, 2.51s/it] +2025-02-05 17:26:56 - ERROR - stderr - +2025-02-05 17:26:56 - 
ERROR - stderr - +2025-02-05 17:26:56 - INFO - stdout - {'loss': 0.9768, 'grad_norm': 0.9726662635803223, 'learning_rate': 1.67996608223954e-05, 'epoch': 0.85} +2025-02-05 17:26:56 - ERROR - stderr - 28%|██▊ | 6375/22434 [7:19:15<11:11:48, 2.51s/it] +2025-02-05 17:26:58 - ERROR - stderr - 28%|██▊ | 6376/22434 [7:19:18<11:13:35, 2.52s/it] +2025-02-05 17:26:58 - ERROR - stderr - +2025-02-05 17:26:58 - ERROR - stderr - +2025-02-05 17:26:58 - INFO - stdout - {'loss': 0.8964, 'grad_norm': 1.1042805910110474, 'learning_rate': 1.679860213361004e-05, 'epoch': 0.85} +2025-02-05 17:26:58 - ERROR - stderr - 28%|██▊ | 6376/22434 [7:19:18<11:13:35, 2.52s/it] +2025-02-05 17:27:01 - ERROR - stderr - 28%|██▊ | 6377/22434 [7:19:20<11:12:42, 2.51s/it] +2025-02-05 17:27:01 - ERROR - stderr - +2025-02-05 17:27:01 - ERROR - stderr - +2025-02-05 17:27:01 - INFO - stdout - {'loss': 0.9579, 'grad_norm': 1.0877240896224976, 'learning_rate': 1.6797543303114322e-05, 'epoch': 0.85} +2025-02-05 17:27:01 - ERROR - stderr - 28%|██▊ | 6377/22434 [7:19:20<11:12:42, 2.51s/it] +2025-02-05 17:27:03 - ERROR - stderr - 28%|██▊ | 6378/22434 [7:19:23<11:14:25, 2.52s/it] +2025-02-05 17:27:03 - ERROR - stderr - +2025-02-05 17:27:03 - ERROR - stderr - +2025-02-05 17:27:03 - INFO - stdout - {'loss': 0.9157, 'grad_norm': 1.1410986185073853, 'learning_rate': 1.6796484330930315e-05, 'epoch': 0.85} +2025-02-05 17:27:03 - ERROR - stderr - 28%|██▊ | 6378/22434 [7:19:23<11:14:25, 2.52s/it] +2025-02-05 17:27:06 - ERROR - stderr - 28%|██▊ | 6379/22434 [7:19:25<11:16:43, 2.53s/it] +2025-02-05 17:27:06 - ERROR - stderr - +2025-02-05 17:27:06 - ERROR - stderr - +2025-02-05 17:27:06 - INFO - stdout - {'loss': 0.9624, 'grad_norm': 1.003361701965332, 'learning_rate': 1.6795425217080098e-05, 'epoch': 0.85} +2025-02-05 17:27:06 - ERROR - stderr - 28%|██▊ | 6379/22434 [7:19:25<11:16:43, 2.53s/it] +2025-02-05 17:27:08 - ERROR - stderr - 28%|██▊ | 6380/22434 [7:19:28<11:19:29, 2.54s/it] +2025-02-05 17:27:08 - ERROR - stderr - 
+2025-02-05 17:27:08 - ERROR - stderr - +2025-02-05 17:27:08 - INFO - stdout - {'loss': 1.0772, 'grad_norm': 1.067478895187378, 'learning_rate': 1.679436596158575e-05, 'epoch': 0.85} +2025-02-05 17:27:08 - ERROR - stderr - 28%|██▊ | 6380/22434 [7:19:28<11:19:29, 2.54s/it] +2025-02-05 17:27:11 - ERROR - stderr - 28%|██▊ | 6381/22434 [7:19:30<11:15:33, 2.52s/it] +2025-02-05 17:27:11 - ERROR - stderr - +2025-02-05 17:27:11 - ERROR - stderr - +2025-02-05 17:27:11 - INFO - stdout - {'loss': 0.978, 'grad_norm': 1.0158237218856812, 'learning_rate': 1.679330656446934e-05, 'epoch': 0.85} +2025-02-05 17:27:11 - ERROR - stderr - 28%|██▊ | 6381/22434 [7:19:30<11:15:33, 2.52s/it] +2025-02-05 17:27:13 - ERROR - stderr - 28%|██▊ | 6382/22434 [7:19:33<11:11:22, 2.51s/it] +2025-02-05 17:27:13 - ERROR - stderr - +2025-02-05 17:27:13 - ERROR - stderr - +2025-02-05 17:27:13 - INFO - stdout - {'loss': 0.924, 'grad_norm': 1.029374122619629, 'learning_rate': 1.6792247025752956e-05, 'epoch': 0.85} +2025-02-05 17:27:13 - ERROR - stderr - 28%|██▊ | 6382/22434 [7:19:33<11:11:22, 2.51s/it] +2025-02-05 17:27:16 - ERROR - stderr - 28%|██▊ | 6383/22434 [7:19:35<11:06:23, 2.49s/it] +2025-02-05 17:27:16 - ERROR - stderr - +2025-02-05 17:27:16 - ERROR - stderr - +2025-02-05 17:27:16 - INFO - stdout - {'loss': 1.0051, 'grad_norm': 1.111932396888733, 'learning_rate': 1.679118734545868e-05, 'epoch': 0.85} +2025-02-05 17:27:16 - ERROR - stderr - 28%|██▊ | 6383/22434 [7:19:35<11:06:23, 2.49s/it] +2025-02-05 17:27:18 - ERROR - stderr - 28%|██▊ | 6384/22434 [7:19:38<11:03:02, 2.48s/it] +2025-02-05 17:27:18 - ERROR - stderr - +2025-02-05 17:27:18 - ERROR - stderr - +2025-02-05 17:27:18 - INFO - stdout - {'loss': 0.9312, 'grad_norm': 1.0799624919891357, 'learning_rate': 1.679012752360861e-05, 'epoch': 0.85} +2025-02-05 17:27:18 - ERROR - stderr - 28%|██▊ | 6384/22434 [7:19:38<11:03:02, 2.48s/it] +2025-02-05 17:27:21 - ERROR - stderr - 28%|██▊ | 6385/22434 [7:19:40<11:01:45, 2.47s/it] +2025-02-05 17:27:21 - 
ERROR - stderr - +2025-02-05 17:27:21 - ERROR - stderr - +2025-02-05 17:27:21 - INFO - stdout - {'loss': 0.9636, 'grad_norm': 1.0726861953735352, 'learning_rate': 1.678906756022482e-05, 'epoch': 0.85} +2025-02-05 17:27:21 - ERROR - stderr - 28%|██▊ | 6385/22434 [7:19:40<11:01:45, 2.47s/it] +2025-02-05 17:27:23 - ERROR - stderr - 28%|██▊ | 6386/22434 [7:19:43<11:12:06, 2.51s/it] +2025-02-05 17:27:23 - ERROR - stderr - +2025-02-05 17:27:23 - ERROR - stderr - +2025-02-05 17:27:23 - INFO - stdout - {'loss': 0.8986, 'grad_norm': 1.075973629951477, 'learning_rate': 1.678800745532942e-05, 'epoch': 0.85} +2025-02-05 17:27:23 - ERROR - stderr - 28%|██▊ | 6386/22434 [7:19:43<11:12:06, 2.51s/it] +2025-02-05 17:27:26 - ERROR - stderr - 28%|██▊ | 6387/22434 [7:19:45<11:20:08, 2.54s/it] +2025-02-05 17:27:26 - ERROR - stderr - +2025-02-05 17:27:26 - ERROR - stderr - +2025-02-05 17:27:26 - INFO - stdout - {'loss': 1.0191, 'grad_norm': 1.0156878232955933, 'learning_rate': 1.6786947208944494e-05, 'epoch': 0.85} +2025-02-05 17:27:26 - ERROR - stderr - 28%|██▊ | 6387/22434 [7:19:46<11:20:08, 2.54s/it] +2025-02-05 17:27:29 - ERROR - stderr - 28%|██▊ | 6388/22434 [7:19:48<11:43:16, 2.63s/it] +2025-02-05 17:27:29 - ERROR - stderr - +2025-02-05 17:27:29 - ERROR - stderr - +2025-02-05 17:27:29 - INFO - stdout - {'loss': 0.8995, 'grad_norm': 0.9368893504142761, 'learning_rate': 1.6785886821092153e-05, 'epoch': 0.85} +2025-02-05 17:27:29 - ERROR - stderr - 28%|██▊ | 6388/22434 [7:19:48<11:43:16, 2.63s/it] +2025-02-05 17:27:31 - ERROR - stderr - 28%|██▊ | 6389/22434 [7:19:51<11:32:32, 2.59s/it] +2025-02-05 17:27:31 - ERROR - stderr - +2025-02-05 17:27:31 - ERROR - stderr - +2025-02-05 17:27:31 - INFO - stdout - {'loss': 0.9437, 'grad_norm': 1.0493046045303345, 'learning_rate': 1.6784826291794495e-05, 'epoch': 0.85} +2025-02-05 17:27:31 - ERROR - stderr - 28%|██▊ | 6389/22434 [7:19:51<11:32:32, 2.59s/it] +2025-02-05 17:27:34 - ERROR - stderr - 28%|██▊ | 6390/22434 [7:19:53<11:30:06, 2.58s/it] 
+2025-02-05 17:27:34 - ERROR - stderr - +2025-02-05 17:27:34 - ERROR - stderr - +2025-02-05 17:27:34 - INFO - stdout - {'loss': 0.7937, 'grad_norm': 1.1224291324615479, 'learning_rate': 1.678376562107362e-05, 'epoch': 0.85} +2025-02-05 17:27:34 - ERROR - stderr - 28%|██▊ | 6390/22434 [7:19:53<11:30:06, 2.58s/it] +2025-02-05 17:27:36 - ERROR - stderr - 28%|██▊ | 6391/22434 [7:19:56<11:19:19, 2.54s/it] +2025-02-05 17:27:36 - ERROR - stderr - +2025-02-05 17:27:36 - ERROR - stderr - +2025-02-05 17:27:36 - INFO - stdout - {'loss': 0.936, 'grad_norm': 0.9947245121002197, 'learning_rate': 1.6782704808951646e-05, 'epoch': 0.85} +2025-02-05 17:27:36 - ERROR - stderr - 28%|██▊ | 6391/22434 [7:19:56<11:19:19, 2.54s/it] +2025-02-05 17:27:39 - ERROR - stderr - 28%|██▊ | 6392/22434 [7:19:58<11:18:39, 2.54s/it] +2025-02-05 17:27:39 - ERROR - stderr - +2025-02-05 17:27:39 - ERROR - stderr - +2025-02-05 17:27:39 - INFO - stdout - {'loss': 0.9088, 'grad_norm': 0.9639949798583984, 'learning_rate': 1.678164385545068e-05, 'epoch': 0.85} +2025-02-05 17:27:39 - ERROR - stderr - 28%|██▊ | 6392/22434 [7:19:58<11:18:39, 2.54s/it] +2025-02-05 17:27:41 - ERROR - stderr - 28%|██▊ | 6393/22434 [7:20:01<11:35:50, 2.60s/it] +2025-02-05 17:27:41 - ERROR - stderr - +2025-02-05 17:27:41 - ERROR - stderr - +2025-02-05 17:27:41 - INFO - stdout - {'loss': 1.0057, 'grad_norm': 1.0433982610702515, 'learning_rate': 1.6780582760592836e-05, 'epoch': 0.85} +2025-02-05 17:27:41 - ERROR - stderr - 28%|██▊ | 6393/22434 [7:20:01<11:35:50, 2.60s/it] +2025-02-05 17:27:44 - ERROR - stderr - 29%|██▊ | 6394/22434 [7:20:04<11:29:23, 2.58s/it] +2025-02-05 17:27:44 - ERROR - stderr - +2025-02-05 17:27:44 - ERROR - stderr - +2025-02-05 17:27:44 - INFO - stdout - {'loss': 1.0163, 'grad_norm': 1.0665639638900757, 'learning_rate': 1.6779521524400234e-05, 'epoch': 0.86} +2025-02-05 17:27:44 - ERROR - stderr - 29%|██▊ | 6394/22434 [7:20:04<11:29:23, 2.58s/it] +2025-02-05 17:27:46 - ERROR - stderr - 29%|██▊ | 6395/22434 
[7:20:06<11:22:05, 2.55s/it] +2025-02-05 17:27:46 - ERROR - stderr - +2025-02-05 17:27:46 - ERROR - stderr - +2025-02-05 17:27:46 - INFO - stdout - {'loss': 0.9395, 'grad_norm': 1.0499364137649536, 'learning_rate': 1.677846014689499e-05, 'epoch': 0.86} +2025-02-05 17:27:46 - ERROR - stderr - 29%|██▊ | 6395/22434 [7:20:06<11:22:05, 2.55s/it] +2025-02-05 17:27:49 - ERROR - stderr - 29%|██▊ | 6396/22434 [7:20:09<11:11:41, 2.51s/it] +2025-02-05 17:27:49 - ERROR - stderr - +2025-02-05 17:27:49 - ERROR - stderr - +2025-02-05 17:27:49 - INFO - stdout - {'loss': 0.941, 'grad_norm': 1.201156497001648, 'learning_rate': 1.6777398628099234e-05, 'epoch': 0.86} +2025-02-05 17:27:49 - ERROR - stderr - 29%|██▊ | 6396/22434 [7:20:09<11:11:41, 2.51s/it] +2025-02-05 17:27:51 - ERROR - stderr - 29%|██▊ | 6397/22434 [7:20:11<11:10:50, 2.51s/it] +2025-02-05 17:27:51 - ERROR - stderr - +2025-02-05 17:27:51 - ERROR - stderr - +2025-02-05 17:27:51 - INFO - stdout - {'loss': 0.8268, 'grad_norm': 1.0105317831039429, 'learning_rate': 1.677633696803509e-05, 'epoch': 0.86} +2025-02-05 17:27:51 - ERROR - stderr - 29%|██▊ | 6397/22434 [7:20:11<11:10:50, 2.51s/it] +2025-02-05 17:27:54 - ERROR - stderr - 29%|██▊ | 6398/22434 [7:20:14<11:18:51, 2.54s/it] +2025-02-05 17:27:54 - ERROR - stderr - +2025-02-05 17:27:54 - ERROR - stderr - +2025-02-05 17:27:54 - INFO - stdout - {'loss': 0.9659, 'grad_norm': 0.9905195236206055, 'learning_rate': 1.677527516672468e-05, 'epoch': 0.86} +2025-02-05 17:27:54 - ERROR - stderr - 29%|██▊ | 6398/22434 [7:20:14<11:18:51, 2.54s/it] +2025-02-05 17:27:56 - ERROR - stderr - 29%|██▊ | 6399/22434 [7:20:16<11:14:19, 2.52s/it] +2025-02-05 17:27:56 - ERROR - stderr - +2025-02-05 17:27:56 - ERROR - stderr - +2025-02-05 17:27:56 - INFO - stdout - {'loss': 0.9894, 'grad_norm': 1.1213469505310059, 'learning_rate': 1.6774213224190148e-05, 'epoch': 0.86} +2025-02-05 17:27:56 - ERROR - stderr - 29%|██▊ | 6399/22434 [7:20:16<11:14:19, 2.52s/it] +2025-02-05 17:27:59 - ERROR - stderr - 
29%|██▊ | 6400/22434 [7:20:19<11:12:50, 2.52s/it] +2025-02-05 17:27:59 - ERROR - stderr - +2025-02-05 17:27:59 - ERROR - stderr - +2025-02-05 17:27:59 - INFO - stdout - {'loss': 0.931, 'grad_norm': 1.0489760637283325, 'learning_rate': 1.6773151140453624e-05, 'epoch': 0.86} +2025-02-05 17:27:59 - ERROR - stderr - 29%|██▊ | 6400/22434 [7:20:19<11:12:50, 2.52s/it] +2025-02-05 17:28:01 - ERROR - stderr - 29%|██▊ | 6401/22434 [7:20:21<11:05:25, 2.49s/it] +2025-02-05 17:28:01 - ERROR - stderr - +2025-02-05 17:28:01 - ERROR - stderr - +2025-02-05 17:28:01 - INFO - stdout - {'loss': 0.8668, 'grad_norm': 1.0773919820785522, 'learning_rate': 1.677208891553724e-05, 'epoch': 0.86} +2025-02-05 17:28:01 - ERROR - stderr - 29%|██▊ | 6401/22434 [7:20:21<11:05:25, 2.49s/it] +2025-02-05 17:28:04 - ERROR - stderr - 29%|██▊ | 6402/22434 [7:20:24<11:12:10, 2.52s/it] +2025-02-05 17:28:04 - ERROR - stderr - +2025-02-05 17:28:04 - ERROR - stderr - +2025-02-05 17:28:04 - INFO - stdout - {'loss': 0.9943, 'grad_norm': 1.2183749675750732, 'learning_rate': 1.6771026549463148e-05, 'epoch': 0.86} +2025-02-05 17:28:04 - ERROR - stderr - 29%|██▊ | 6402/22434 [7:20:24<11:12:10, 2.52s/it] +2025-02-05 17:28:06 - ERROR - stderr - 29%|██▊ | 6403/22434 [7:20:26<11:13:48, 2.52s/it] +2025-02-05 17:28:06 - ERROR - stderr - +2025-02-05 17:28:06 - ERROR - stderr - +2025-02-05 17:28:06 - INFO - stdout - {'loss': 0.9971, 'grad_norm': 0.9685238003730774, 'learning_rate': 1.6769964042253485e-05, 'epoch': 0.86} +2025-02-05 17:28:06 - ERROR - stderr - 29%|██▊ | 6403/22434 [7:20:26<11:13:48, 2.52s/it] +2025-02-05 17:28:09 - ERROR - stderr - 29%|██▊ | 6404/22434 [7:20:29<11:42:14, 2.63s/it] +2025-02-05 17:28:09 - ERROR - stderr - +2025-02-05 17:28:09 - ERROR - stderr - +2025-02-05 17:28:09 - INFO - stdout - {'loss': 0.8311, 'grad_norm': 1.0275424718856812, 'learning_rate': 1.6768901393930403e-05, 'epoch': 0.86} +2025-02-05 17:28:09 - ERROR - stderr - 29%|██▊ | 6404/22434 [7:20:29<11:42:14, 2.63s/it] +2025-02-05 
17:28:12 - ERROR - stderr - 29%|██▊ | 6405/22434 [7:20:32<11:46:43, 2.65s/it] +2025-02-05 17:28:12 - ERROR - stderr - +2025-02-05 17:28:12 - ERROR - stderr - +2025-02-05 17:28:12 - INFO - stdout - {'loss': 0.8401, 'grad_norm': 1.0167380571365356, 'learning_rate': 1.6767838604516043e-05, 'epoch': 0.86} +2025-02-05 17:28:12 - ERROR - stderr - 29%|██▊ | 6405/22434 [7:20:32<11:46:43, 2.65s/it] +2025-02-05 17:28:14 - ERROR - stderr - 29%|██▊ | 6406/22434 [7:20:34<11:35:01, 2.60s/it] +2025-02-05 17:28:15 - ERROR - stderr - +2025-02-05 17:28:15 - ERROR - stderr - +2025-02-05 17:28:15 - INFO - stdout - {'loss': 1.0043, 'grad_norm': 1.1026512384414673, 'learning_rate': 1.6766775674032565e-05, 'epoch': 0.86} +2025-02-05 17:28:15 - ERROR - stderr - 29%|██▊ | 6406/22434 [7:20:34<11:35:01, 2.60s/it] +2025-02-05 17:28:17 - ERROR - stderr - 29%|██▊ | 6407/22434 [7:20:37<11:37:48, 2.61s/it] +2025-02-05 17:28:17 - ERROR - stderr - +2025-02-05 17:28:17 - ERROR - stderr - +2025-02-05 17:28:17 - INFO - stdout - {'loss': 0.9988, 'grad_norm': 0.9721025824546814, 'learning_rate': 1.6765712602502122e-05, 'epoch': 0.86} +2025-02-05 17:28:17 - ERROR - stderr - 29%|██▊ | 6407/22434 [7:20:37<11:37:48, 2.61s/it] +2025-02-05 17:28:20 - ERROR - stderr - 29%|██▊ | 6408/22434 [7:20:39<11:35:12, 2.60s/it] +2025-02-05 17:28:20 - ERROR - stderr - +2025-02-05 17:28:20 - ERROR - stderr - +2025-02-05 17:28:20 - INFO - stdout - {'loss': 0.9182, 'grad_norm': 0.9958188533782959, 'learning_rate': 1.676464938994688e-05, 'epoch': 0.86} +2025-02-05 17:28:20 - ERROR - stderr - 29%|██▊ | 6408/22434 [7:20:40<11:35:12, 2.60s/it] +2025-02-05 17:28:22 - ERROR - stderr - 29%|██▊ | 6409/22434 [7:20:42<11:28:11, 2.58s/it] +2025-02-05 17:28:22 - ERROR - stderr - +2025-02-05 17:28:22 - ERROR - stderr - +2025-02-05 17:28:22 - INFO - stdout - {'loss': 1.0118, 'grad_norm': 1.0558589696884155, 'learning_rate': 1.6763586036388988e-05, 'epoch': 0.86} +2025-02-05 17:28:22 - ERROR - stderr - 29%|██▊ | 6409/22434 
[7:20:42<11:28:11, 2.58s/it] +2025-02-05 17:28:25 - ERROR - stderr - 29%|██▊ | 6410/22434 [7:20:45<11:30:22, 2.59s/it] +2025-02-05 17:28:25 - ERROR - stderr - +2025-02-05 17:28:25 - ERROR - stderr - +2025-02-05 17:28:25 - INFO - stdout - {'loss': 0.9108, 'grad_norm': 1.0125571489334106, 'learning_rate': 1.676252254185062e-05, 'epoch': 0.86} +2025-02-05 17:28:25 - ERROR - stderr - 29%|██▊ | 6410/22434 [7:20:45<11:30:22, 2.59s/it] +2025-02-05 17:28:27 - ERROR - stderr - 29%|██▊ | 6411/22434 [7:20:47<11:23:07, 2.56s/it] +2025-02-05 17:28:27 - ERROR - stderr - +2025-02-05 17:28:27 - ERROR - stderr - +2025-02-05 17:28:27 - INFO - stdout - {'loss': 0.9964, 'grad_norm': 1.1763077974319458, 'learning_rate': 1.676145890635394e-05, 'epoch': 0.86} +2025-02-05 17:28:27 - ERROR - stderr - 29%|██▊ | 6411/22434 [7:20:47<11:23:07, 2.56s/it] +2025-02-05 17:28:30 - ERROR - stderr - 29%|██▊ | 6412/22434 [7:20:50<11:17:22, 2.54s/it] +2025-02-05 17:28:30 - ERROR - stderr - +2025-02-05 17:28:30 - ERROR - stderr - +2025-02-05 17:28:30 - INFO - stdout - {'loss': 0.9941, 'grad_norm': 1.1250919103622437, 'learning_rate': 1.6760395129921118e-05, 'epoch': 0.86} +2025-02-05 17:28:30 - ERROR - stderr - 29%|██▊ | 6412/22434 [7:20:50<11:17:22, 2.54s/it] +2025-02-05 17:28:32 - ERROR - stderr - 29%|██▊ | 6413/22434 [7:20:52<11:12:00, 2.52s/it] +2025-02-05 17:28:32 - ERROR - stderr - +2025-02-05 17:28:32 - ERROR - stderr - +2025-02-05 17:28:32 - INFO - stdout - {'loss': 0.9758, 'grad_norm': 1.0218565464019775, 'learning_rate': 1.675933121257433e-05, 'epoch': 0.86} +2025-02-05 17:28:32 - ERROR - stderr - 29%|██▊ | 6413/22434 [7:20:52<11:12:00, 2.52s/it] +2025-02-05 17:28:35 - ERROR - stderr - 29%|██▊ | 6414/22434 [7:20:55<11:12:57, 2.52s/it] +2025-02-05 17:28:35 - ERROR - stderr - +2025-02-05 17:28:35 - ERROR - stderr - +2025-02-05 17:28:35 - INFO - stdout - {'loss': 0.9058, 'grad_norm': 0.9700666666030884, 'learning_rate': 1.675826715433575e-05, 'epoch': 0.86} +2025-02-05 17:28:35 - ERROR - stderr - 
29%|██▊ | 6414/22434 [7:20:55<11:12:57, 2.52s/it] +2025-02-05 17:28:37 - ERROR - stderr - 29%|██▊ | 6415/22434 [7:20:57<11:09:41, 2.51s/it] +2025-02-05 17:28:37 - ERROR - stderr - +2025-02-05 17:28:37 - ERROR - stderr - +2025-02-05 17:28:37 - INFO - stdout - {'loss': 1.0698, 'grad_norm': 0.958427906036377, 'learning_rate': 1.6757202955227557e-05, 'epoch': 0.86} +2025-02-05 17:28:37 - ERROR - stderr - 29%|██▊ | 6415/22434 [7:20:57<11:09:41, 2.51s/it] +2025-02-05 17:28:40 - ERROR - stderr - 29%|██▊ | 6416/22434 [7:21:00<11:12:20, 2.52s/it] +2025-02-05 17:28:40 - ERROR - stderr - +2025-02-05 17:28:40 - ERROR - stderr - +2025-02-05 17:28:40 - INFO - stdout - {'loss': 0.9571, 'grad_norm': 1.051458716392517, 'learning_rate': 1.675613861527194e-05, 'epoch': 0.86} +2025-02-05 17:28:40 - ERROR - stderr - 29%|██▊ | 6416/22434 [7:21:00<11:12:20, 2.52s/it] +2025-02-05 17:28:42 - ERROR - stderr - 29%|██▊ | 6417/22434 [7:21:02<11:04:32, 2.49s/it] +2025-02-05 17:28:42 - ERROR - stderr - +2025-02-05 17:28:42 - ERROR - stderr - +2025-02-05 17:28:42 - INFO - stdout - {'loss': 0.9619, 'grad_norm': 1.131280541419983, 'learning_rate': 1.6755074134491075e-05, 'epoch': 0.86} +2025-02-05 17:28:42 - ERROR - stderr - 29%|██▊ | 6417/22434 [7:21:02<11:04:32, 2.49s/it] +2025-02-05 17:28:45 - ERROR - stderr - 29%|██▊ | 6418/22434 [7:21:05<11:06:47, 2.50s/it] +2025-02-05 17:28:45 - ERROR - stderr - +2025-02-05 17:28:45 - ERROR - stderr - +2025-02-05 17:28:45 - INFO - stdout - {'loss': 0.9754, 'grad_norm': 1.127591609954834, 'learning_rate': 1.675400951290715e-05, 'epoch': 0.86} +2025-02-05 17:28:45 - ERROR - stderr - 29%|██▊ | 6418/22434 [7:21:05<11:06:47, 2.50s/it] +2025-02-05 17:28:47 - ERROR - stderr - 29%|██▊ | 6419/22434 [7:21:07<11:07:54, 2.50s/it] +2025-02-05 17:28:47 - ERROR - stderr - +2025-02-05 17:28:47 - ERROR - stderr - +2025-02-05 17:28:47 - INFO - stdout - {'loss': 0.8947, 'grad_norm': 1.1054295301437378, 'learning_rate': 1.6752944750542366e-05, 'epoch': 0.86} +2025-02-05 17:28:47 
- ERROR - stderr - 29%|██▊ | 6419/22434 [7:21:07<11:07:54, 2.50s/it] +2025-02-05 17:28:50 - ERROR - stderr - 29%|██▊ | 6420/22434 [7:21:10<11:08:49, 2.51s/it] +2025-02-05 17:28:50 - ERROR - stderr - +2025-02-05 17:28:50 - ERROR - stderr - +2025-02-05 17:28:50 - INFO - stdout - {'loss': 1.0274, 'grad_norm': 1.2202069759368896, 'learning_rate': 1.6751879847418907e-05, 'epoch': 0.86} +2025-02-05 17:28:50 - ERROR - stderr - 29%|██▊ | 6420/22434 [7:21:10<11:08:49, 2.51s/it] +2025-02-05 17:28:52 - ERROR - stderr - 29%|██▊ | 6421/22434 [7:21:12<11:10:46, 2.51s/it] +2025-02-05 17:28:52 - ERROR - stderr - +2025-02-05 17:28:52 - ERROR - stderr - +2025-02-05 17:28:52 - INFO - stdout - {'loss': 0.9219, 'grad_norm': 1.0476248264312744, 'learning_rate': 1.675081480355897e-05, 'epoch': 0.86} +2025-02-05 17:28:52 - ERROR - stderr - 29%|██▊ | 6421/22434 [7:21:12<11:10:46, 2.51s/it] +2025-02-05 17:28:55 - ERROR - stderr - 29%|██▊ | 6422/22434 [7:21:15<11:11:44, 2.52s/it] +2025-02-05 17:28:55 - ERROR - stderr - +2025-02-05 17:28:55 - ERROR - stderr - +2025-02-05 17:28:55 - INFO - stdout - {'loss': 0.8629, 'grad_norm': 1.0108592510223389, 'learning_rate': 1.6749749618984763e-05, 'epoch': 0.86} +2025-02-05 17:28:55 - ERROR - stderr - 29%|██▊ | 6422/22434 [7:21:15<11:11:44, 2.52s/it] +2025-02-05 17:28:57 - ERROR - stderr - 29%|██▊ | 6423/22434 [7:21:17<11:20:25, 2.55s/it] +2025-02-05 17:28:58 - ERROR - stderr - +2025-02-05 17:28:58 - ERROR - stderr - +2025-02-05 17:28:58 - INFO - stdout - {'loss': 1.0049, 'grad_norm': 1.0441325902938843, 'learning_rate': 1.6748684293718484e-05, 'epoch': 0.86} +2025-02-05 17:28:58 - ERROR - stderr - 29%|██▊ | 6423/22434 [7:21:17<11:20:25, 2.55s/it] +2025-02-05 17:29:00 - ERROR - stderr - 29%|██▊ | 6424/22434 [7:21:20<11:33:11, 2.60s/it] +2025-02-05 17:29:00 - ERROR - stderr - +2025-02-05 17:29:00 - ERROR - stderr - +2025-02-05 17:29:00 - INFO - stdout - {'loss': 0.9337, 'grad_norm': 1.100607991218567, 'learning_rate': 1.674761882778234e-05, 'epoch': 
0.86} +2025-02-05 17:29:00 - ERROR - stderr - 29%|██▊ | 6424/22434 [7:21:20<11:33:11, 2.60s/it] +2025-02-05 17:29:03 - ERROR - stderr - 29%|██▊ | 6425/22434 [7:21:23<11:38:57, 2.62s/it] +2025-02-05 17:29:03 - ERROR - stderr - +2025-02-05 17:29:03 - ERROR - stderr - +2025-02-05 17:29:03 - INFO - stdout - {'loss': 0.9714, 'grad_norm': 1.0563383102416992, 'learning_rate': 1.6746553221198532e-05, 'epoch': 0.86} +2025-02-05 17:29:03 - ERROR - stderr - 29%|██▊ | 6425/22434 [7:21:23<11:38:57, 2.62s/it] +2025-02-05 17:29:06 - ERROR - stderr - 29%|██▊ | 6426/22434 [7:21:25<11:47:24, 2.65s/it] +2025-02-05 17:29:06 - ERROR - stderr - +2025-02-05 17:29:06 - ERROR - stderr - +2025-02-05 17:29:06 - INFO - stdout - {'loss': 0.8994, 'grad_norm': 1.1651633977890015, 'learning_rate': 1.6745487473989285e-05, 'epoch': 0.86} +2025-02-05 17:29:06 - ERROR - stderr - 29%|██▊ | 6426/22434 [7:21:25<11:47:24, 2.65s/it] +2025-02-05 17:29:08 - ERROR - stderr - 29%|██▊ | 6427/22434 [7:21:28<11:43:09, 2.64s/it] +2025-02-05 17:29:08 - ERROR - stderr - +2025-02-05 17:29:08 - ERROR - stderr - +2025-02-05 17:29:08 - INFO - stdout - {'loss': 0.9515, 'grad_norm': 1.005658507347107, 'learning_rate': 1.67444215861768e-05, 'epoch': 0.86} +2025-02-05 17:29:08 - ERROR - stderr - 29%|██▊ | 6427/22434 [7:21:28<11:43:09, 2.64s/it] +2025-02-05 17:29:11 - ERROR - stderr - 29%|██▊ | 6428/22434 [7:21:31<11:39:21, 2.62s/it] +2025-02-05 17:29:11 - ERROR - stderr - +2025-02-05 17:29:11 - ERROR - stderr - +2025-02-05 17:29:11 - INFO - stdout - {'loss': 0.9155, 'grad_norm': 1.0972975492477417, 'learning_rate': 1.6743355557783308e-05, 'epoch': 0.86} +2025-02-05 17:29:11 - ERROR - stderr - 29%|██▊ | 6428/22434 [7:21:31<11:39:21, 2.62s/it] +2025-02-05 17:29:13 - ERROR - stderr - 29%|██▊ | 6429/22434 [7:21:33<11:30:00, 2.59s/it] +2025-02-05 17:29:13 - ERROR - stderr - +2025-02-05 17:29:13 - ERROR - stderr - +2025-02-05 17:29:13 - INFO - stdout - {'loss': 0.9988, 'grad_norm': 1.1275793313980103, 'learning_rate': 
1.6742289388831014e-05, 'epoch': 0.86} +2025-02-05 17:29:13 - ERROR - stderr - 29%|██▊ | 6429/22434 [7:21:33<11:30:00, 2.59s/it] +2025-02-05 17:29:16 - ERROR - stderr - 29%|██▊ | 6430/22434 [7:21:36<11:20:50, 2.55s/it] +2025-02-05 17:29:16 - ERROR - stderr - +2025-02-05 17:29:16 - ERROR - stderr - +2025-02-05 17:29:16 - INFO - stdout - {'loss': 0.9155, 'grad_norm': 1.0282682180404663, 'learning_rate': 1.6741223079342153e-05, 'epoch': 0.86} +2025-02-05 17:29:16 - ERROR - stderr - 29%|██▊ | 6430/22434 [7:21:36<11:20:50, 2.55s/it] +2025-02-05 17:29:18 - ERROR - stderr - 29%|██▊ | 6431/22434 [7:21:38<11:24:45, 2.57s/it] +2025-02-05 17:29:18 - ERROR - stderr - +2025-02-05 17:29:18 - ERROR - stderr - +2025-02-05 17:29:18 - INFO - stdout - {'loss': 0.8919, 'grad_norm': 1.0839102268218994, 'learning_rate': 1.674015662933895e-05, 'epoch': 0.86} +2025-02-05 17:29:18 - ERROR - stderr - 29%|██▊ | 6431/22434 [7:21:38<11:24:45, 2.57s/it] +2025-02-05 17:29:21 - ERROR - stderr - 29%|██▊ | 6432/22434 [7:21:41<11:18:56, 2.55s/it] +2025-02-05 17:29:21 - ERROR - stderr - +2025-02-05 17:29:21 - ERROR - stderr - +2025-02-05 17:29:21 - INFO - stdout - {'loss': 0.9458, 'grad_norm': 1.1187360286712646, 'learning_rate': 1.673909003884363e-05, 'epoch': 0.86} +2025-02-05 17:29:21 - ERROR - stderr - 29%|██▊ | 6432/22434 [7:21:41<11:18:56, 2.55s/it] +2025-02-05 17:29:23 - ERROR - stderr - 29%|██▊ | 6433/22434 [7:21:43<11:15:00, 2.53s/it] +2025-02-05 17:29:23 - ERROR - stderr - +2025-02-05 17:29:23 - ERROR - stderr - +2025-02-05 17:29:23 - INFO - stdout - {'loss': 0.8712, 'grad_norm': 0.9898458123207092, 'learning_rate': 1.6738023307878425e-05, 'epoch': 0.86} +2025-02-05 17:29:23 - ERROR - stderr - 29%|██▊ | 6433/22434 [7:21:43<11:15:00, 2.53s/it] +2025-02-05 17:29:26 - ERROR - stderr - 29%|██▊ | 6434/22434 [7:21:46<11:08:08, 2.51s/it] +2025-02-05 17:29:26 - ERROR - stderr - +2025-02-05 17:29:26 - ERROR - stderr - +2025-02-05 17:29:26 - INFO - stdout - {'loss': 0.9341, 'grad_norm': 
1.0592583417892456, 'learning_rate': 1.6736956436465573e-05, 'epoch': 0.86} +2025-02-05 17:29:26 - ERROR - stderr - 29%|██▊ | 6434/22434 [7:21:46<11:08:08, 2.51s/it] +2025-02-05 17:29:28 - ERROR - stderr - 29%|██▊ | 6435/22434 [7:21:48<11:06:35, 2.50s/it] +2025-02-05 17:29:28 - ERROR - stderr - +2025-02-05 17:29:28 - ERROR - stderr - +2025-02-05 17:29:28 - INFO - stdout - {'loss': 1.0303, 'grad_norm': 1.1703660488128662, 'learning_rate': 1.6735889424627313e-05, 'epoch': 0.86} +2025-02-05 17:29:28 - ERROR - stderr - 29%|██▊ | 6435/22434 [7:21:48<11:06:35, 2.50s/it] +2025-02-05 17:29:31 - ERROR - stderr - 29%|██▊ | 6436/22434 [7:21:51<11:08:50, 2.51s/it] +2025-02-05 17:29:31 - ERROR - stderr - +2025-02-05 17:29:31 - ERROR - stderr - +2025-02-05 17:29:31 - INFO - stdout - {'loss': 0.944, 'grad_norm': 0.9925939440727234, 'learning_rate': 1.673482227238588e-05, 'epoch': 0.86} +2025-02-05 17:29:31 - ERROR - stderr - 29%|██▊ | 6436/22434 [7:21:51<11:08:50, 2.51s/it] +2025-02-05 17:29:34 - ERROR - stderr - 29%|██▊ | 6437/22434 [7:21:53<11:35:27, 2.61s/it] +2025-02-05 17:29:34 - ERROR - stderr - +2025-02-05 17:29:34 - ERROR - stderr - +2025-02-05 17:29:34 - INFO - stdout - {'loss': 0.9736, 'grad_norm': 1.0885568857192993, 'learning_rate': 1.6733754979763525e-05, 'epoch': 0.86} +2025-02-05 17:29:34 - ERROR - stderr - 29%|██▊ | 6437/22434 [7:21:53<11:35:27, 2.61s/it] +2025-02-05 17:29:36 - ERROR - stderr - 29%|██▊ | 6438/22434 [7:21:56<11:24:48, 2.57s/it] +2025-02-05 17:29:36 - ERROR - stderr - +2025-02-05 17:29:36 - ERROR - stderr - +2025-02-05 17:29:36 - INFO - stdout - {'loss': 0.9015, 'grad_norm': 1.0746959447860718, 'learning_rate': 1.6732687546782486e-05, 'epoch': 0.86} +2025-02-05 17:29:36 - ERROR - stderr - 29%|██▊ | 6438/22434 [7:21:56<11:24:48, 2.57s/it] +2025-02-05 17:29:39 - ERROR - stderr - 29%|██▊ | 6439/22434 [7:21:58<11:16:11, 2.54s/it] +2025-02-05 17:29:39 - ERROR - stderr - +2025-02-05 17:29:39 - ERROR - stderr - +2025-02-05 17:29:39 - INFO - stdout - 
{'loss': 0.9399, 'grad_norm': 1.0241910219192505, 'learning_rate': 1.6731619973465018e-05, 'epoch': 0.86} +2025-02-05 17:29:39 - ERROR - stderr - 29%|██▊ | 6439/22434 [7:21:58<11:16:11, 2.54s/it] +2025-02-05 17:29:41 - ERROR - stderr - 29%|██▊ | 6440/22434 [7:22:01<11:15:42, 2.53s/it] +2025-02-05 17:29:41 - ERROR - stderr - +2025-02-05 17:29:41 - ERROR - stderr - +2025-02-05 17:29:41 - INFO - stdout - {'loss': 1.0284, 'grad_norm': 1.1427667140960693, 'learning_rate': 1.6730552259833378e-05, 'epoch': 0.86} +2025-02-05 17:29:41 - ERROR - stderr - 29%|██▊ | 6440/22434 [7:22:01<11:15:42, 2.53s/it] +2025-02-05 17:29:44 - ERROR - stderr - 29%|██▊ | 6441/22434 [7:22:03<11:14:12, 2.53s/it] +2025-02-05 17:29:44 - ERROR - stderr - +2025-02-05 17:29:44 - ERROR - stderr - +2025-02-05 17:29:44 - INFO - stdout - {'loss': 0.8813, 'grad_norm': 1.1317977905273438, 'learning_rate': 1.672948440590981e-05, 'epoch': 0.86} +2025-02-05 17:29:44 - ERROR - stderr - 29%|██▊ | 6441/22434 [7:22:03<11:14:12, 2.53s/it] +2025-02-05 17:29:46 - ERROR - stderr - 29%|██▊ | 6442/22434 [7:22:06<11:08:55, 2.51s/it] +2025-02-05 17:29:46 - ERROR - stderr - +2025-02-05 17:29:46 - ERROR - stderr - +2025-02-05 17:29:46 - INFO - stdout - {'loss': 0.9214, 'grad_norm': 1.0913825035095215, 'learning_rate': 1.6728416411716587e-05, 'epoch': 0.86} +2025-02-05 17:29:46 - ERROR - stderr - 29%|██▊ | 6442/22434 [7:22:06<11:08:55, 2.51s/it] +2025-02-05 17:29:49 - ERROR - stderr - 29%|██▊ | 6443/22434 [7:22:08<11:03:37, 2.49s/it] +2025-02-05 17:29:49 - ERROR - stderr - +2025-02-05 17:29:49 - ERROR - stderr - +2025-02-05 17:29:49 - INFO - stdout - {'loss': 0.8926, 'grad_norm': 1.1184508800506592, 'learning_rate': 1.6727348277275957e-05, 'epoch': 0.86} +2025-02-05 17:29:49 - ERROR - stderr - 29%|██▊ | 6443/22434 [7:22:08<11:03:37, 2.49s/it] +2025-02-05 17:29:51 - ERROR - stderr - 29%|██▊ | 6444/22434 [7:22:11<11:10:05, 2.51s/it] +2025-02-05 17:29:51 - ERROR - stderr - +2025-02-05 17:29:51 - ERROR - stderr - +2025-02-05 
17:29:51 - INFO - stdout - {'loss': 0.8701, 'grad_norm': 1.1488111019134521, 'learning_rate': 1.6726280002610188e-05, 'epoch': 0.86} +2025-02-05 17:29:51 - ERROR - stderr - 29%|██▊ | 6444/22434 [7:22:11<11:10:05, 2.51s/it] +2025-02-05 17:29:54 - ERROR - stderr - 29%|██▊ | 6445/22434 [7:22:13<11:10:51, 2.52s/it] +2025-02-05 17:29:54 - ERROR - stderr - +2025-02-05 17:29:54 - ERROR - stderr - +2025-02-05 17:29:54 - INFO - stdout - {'loss': 0.8452, 'grad_norm': 1.0850615501403809, 'learning_rate': 1.6725211587741553e-05, 'epoch': 0.86} +2025-02-05 17:29:54 - ERROR - stderr - 29%|██▊ | 6445/22434 [7:22:13<11:10:51, 2.52s/it] +2025-02-05 17:29:56 - ERROR - stderr - 29%|██▊ | 6446/22434 [7:22:16<11:10:05, 2.51s/it] +2025-02-05 17:29:56 - ERROR - stderr - +2025-02-05 17:29:56 - ERROR - stderr - +2025-02-05 17:29:56 - INFO - stdout - {'loss': 0.8968, 'grad_norm': 1.044378638267517, 'learning_rate': 1.6724143032692316e-05, 'epoch': 0.86} +2025-02-05 17:29:56 - ERROR - stderr - 29%|██▊ | 6446/22434 [7:22:16<11:10:05, 2.51s/it] +2025-02-05 17:29:59 - ERROR - stderr - 29%|██▊ | 6447/22434 [7:22:18<11:07:35, 2.51s/it] +2025-02-05 17:29:59 - ERROR - stderr - +2025-02-05 17:29:59 - ERROR - stderr - +2025-02-05 17:29:59 - INFO - stdout - {'loss': 0.7783, 'grad_norm': 0.9474478363990784, 'learning_rate': 1.672307433748475e-05, 'epoch': 0.86} +2025-02-05 17:29:59 - ERROR - stderr - 29%|██▊ | 6447/22434 [7:22:18<11:07:35, 2.51s/it] +2025-02-05 17:30:01 - ERROR - stderr - 29%|██▊ | 6448/22434 [7:22:21<11:03:39, 2.49s/it] +2025-02-05 17:30:01 - ERROR - stderr - +2025-02-05 17:30:01 - ERROR - stderr - +2025-02-05 17:30:01 - INFO - stdout - {'loss': 0.9915, 'grad_norm': 1.2427572011947632, 'learning_rate': 1.6722005502141135e-05, 'epoch': 0.86} +2025-02-05 17:30:01 - ERROR - stderr - 29%|██▊ | 6448/22434 [7:22:21<11:03:39, 2.49s/it] +2025-02-05 17:30:04 - ERROR - stderr - 29%|██▊ | 6449/22434 [7:22:23<11:07:02, 2.50s/it] +2025-02-05 17:30:04 - ERROR - stderr - +2025-02-05 17:30:04 - ERROR 
- stderr - +2025-02-05 17:30:04 - INFO - stdout - {'loss': 1.0304, 'grad_norm': 1.0530056953430176, 'learning_rate': 1.6720936526683748e-05, 'epoch': 0.86} +2025-02-05 17:30:04 - ERROR - stderr - 29%|██▊ | 6449/22434 [7:22:23<11:07:02, 2.50s/it] +2025-02-05 17:30:06 - ERROR - stderr - 29%|██▉ | 6450/22434 [7:22:26<11:06:25, 2.50s/it] +2025-02-05 17:30:06 - ERROR - stderr - +2025-02-05 17:30:06 - ERROR - stderr - +2025-02-05 17:30:06 - INFO - stdout - {'loss': 1.0461, 'grad_norm': 1.0332579612731934, 'learning_rate': 1.671986741113487e-05, 'epoch': 0.86} +2025-02-05 17:30:06 - ERROR - stderr - 29%|██▉ | 6450/22434 [7:22:26<11:06:25, 2.50s/it] +2025-02-05 17:30:09 - ERROR - stderr - 29%|██▉ | 6451/22434 [7:22:28<11:05:37, 2.50s/it] +2025-02-05 17:30:09 - ERROR - stderr - +2025-02-05 17:30:09 - ERROR - stderr - +2025-02-05 17:30:09 - INFO - stdout - {'loss': 0.956, 'grad_norm': 0.9718854427337646, 'learning_rate': 1.6718798155516785e-05, 'epoch': 0.86} +2025-02-05 17:30:09 - ERROR - stderr - 29%|██▉ | 6451/22434 [7:22:28<11:05:37, 2.50s/it] +2025-02-05 17:30:11 - ERROR - stderr - 29%|██▉ | 6452/22434 [7:22:31<11:07:58, 2.51s/it] +2025-02-05 17:30:11 - ERROR - stderr - +2025-02-05 17:30:11 - ERROR - stderr - +2025-02-05 17:30:11 - INFO - stdout - {'loss': 0.7821, 'grad_norm': 0.8710107803344727, 'learning_rate': 1.671772875985178e-05, 'epoch': 0.86} +2025-02-05 17:30:11 - ERROR - stderr - 29%|██▉ | 6452/22434 [7:22:31<11:07:58, 2.51s/it] +2025-02-05 17:30:14 - ERROR - stderr - 29%|██▉ | 6453/22434 [7:22:33<11:07:20, 2.51s/it] +2025-02-05 17:30:14 - ERROR - stderr - +2025-02-05 17:30:14 - ERROR - stderr - +2025-02-05 17:30:14 - INFO - stdout - {'loss': 0.9487, 'grad_norm': 1.0515718460083008, 'learning_rate': 1.671665922416215e-05, 'epoch': 0.86} +2025-02-05 17:30:14 - ERROR - stderr - 29%|██▉ | 6453/22434 [7:22:33<11:07:20, 2.51s/it] +2025-02-05 17:30:16 - ERROR - stderr - 29%|██▉ | 6454/22434 [7:22:36<11:07:46, 2.51s/it] +2025-02-05 17:30:16 - ERROR - stderr - 
+2025-02-05 17:30:16 - ERROR - stderr - +2025-02-05 17:30:16 - INFO - stdout - {'loss': 0.893, 'grad_norm': 1.097126841545105, 'learning_rate': 1.6715589548470187e-05, 'epoch': 0.86} +2025-02-05 17:30:16 - ERROR - stderr - 29%|██▉ | 6454/22434 [7:22:36<11:07:46, 2.51s/it] +2025-02-05 17:30:19 - ERROR - stderr - 29%|██▉ | 6455/22434 [7:22:38<11:02:49, 2.49s/it] +2025-02-05 17:30:19 - ERROR - stderr - +2025-02-05 17:30:19 - ERROR - stderr - +2025-02-05 17:30:19 - INFO - stdout - {'loss': 1.008, 'grad_norm': 1.0665756464004517, 'learning_rate': 1.6714519732798184e-05, 'epoch': 0.86} +2025-02-05 17:30:19 - ERROR - stderr - 29%|██▉ | 6455/22434 [7:22:38<11:02:49, 2.49s/it] +2025-02-05 17:30:21 - ERROR - stderr - 29%|██▉ | 6456/22434 [7:22:41<11:00:35, 2.48s/it] +2025-02-05 17:30:21 - ERROR - stderr - +2025-02-05 17:30:21 - ERROR - stderr - +2025-02-05 17:30:21 - INFO - stdout - {'loss': 0.9382, 'grad_norm': 1.2057867050170898, 'learning_rate': 1.671344977716844e-05, 'epoch': 0.86} +2025-02-05 17:30:21 - ERROR - stderr - 29%|██▉ | 6456/22434 [7:22:41<11:00:35, 2.48s/it] +2025-02-05 17:30:24 - ERROR - stderr - 29%|██▉ | 6457/22434 [7:22:43<10:59:09, 2.48s/it] +2025-02-05 17:30:24 - ERROR - stderr - +2025-02-05 17:30:24 - ERROR - stderr - +2025-02-05 17:30:24 - INFO - stdout - {'loss': 1.0681, 'grad_norm': 1.169060468673706, 'learning_rate': 1.6712379681603264e-05, 'epoch': 0.86} +2025-02-05 17:30:24 - ERROR - stderr - 29%|██▉ | 6457/22434 [7:22:43<10:59:09, 2.48s/it] +2025-02-05 17:30:26 - ERROR - stderr - 29%|██▉ | 6458/22434 [7:22:46<11:37:40, 2.62s/it] +2025-02-05 17:30:27 - ERROR - stderr - +2025-02-05 17:30:27 - ERROR - stderr - +2025-02-05 17:30:27 - INFO - stdout - {'loss': 1.0063, 'grad_norm': 1.212019920349121, 'learning_rate': 1.6711309446124954e-05, 'epoch': 0.86} +2025-02-05 17:30:27 - ERROR - stderr - 29%|██▉ | 6458/22434 [7:22:46<11:37:40, 2.62s/it] +2025-02-05 17:30:29 - ERROR - stderr - 29%|██▉ | 6459/22434 [7:22:49<11:25:47, 2.58s/it] +2025-02-05 17:30:29 
- ERROR - stderr - +2025-02-05 17:30:29 - ERROR - stderr - +2025-02-05 17:30:29 - INFO - stdout - {'loss': 0.9958, 'grad_norm': 1.14297354221344, 'learning_rate': 1.6710239070755818e-05, 'epoch': 0.86} +2025-02-05 17:30:29 - ERROR - stderr - 29%|██▉ | 6459/22434 [7:22:49<11:25:47, 2.58s/it] +2025-02-05 17:30:31 - ERROR - stderr - 29%|██▉ | 6460/22434 [7:22:51<11:19:02, 2.55s/it] +2025-02-05 17:30:31 - ERROR - stderr - +2025-02-05 17:30:31 - ERROR - stderr - +2025-02-05 17:30:31 - INFO - stdout - {'loss': 0.9495, 'grad_norm': 1.1121227741241455, 'learning_rate': 1.670916855551817e-05, 'epoch': 0.86} +2025-02-05 17:30:31 - ERROR - stderr - 29%|██▉ | 6460/22434 [7:22:51<11:19:02, 2.55s/it] +2025-02-05 17:30:34 - ERROR - stderr - 29%|██▉ | 6461/22434 [7:22:54<11:16:05, 2.54s/it] +2025-02-05 17:30:34 - ERROR - stderr - +2025-02-05 17:30:34 - ERROR - stderr - +2025-02-05 17:30:34 - INFO - stdout - {'loss': 0.8992, 'grad_norm': 1.0511651039123535, 'learning_rate': 1.6708097900434328e-05, 'epoch': 0.86} +2025-02-05 17:30:34 - ERROR - stderr - 29%|██▉ | 6461/22434 [7:22:54<11:16:05, 2.54s/it] +2025-02-05 17:30:36 - ERROR - stderr - 29%|██▉ | 6462/22434 [7:22:56<11:14:03, 2.53s/it] +2025-02-05 17:30:37 - ERROR - stderr - +2025-02-05 17:30:37 - ERROR - stderr - +2025-02-05 17:30:37 - INFO - stdout - {'loss': 0.9639, 'grad_norm': 1.0957285165786743, 'learning_rate': 1.6707027105526602e-05, 'epoch': 0.86} +2025-02-05 17:30:37 - ERROR - stderr - 29%|██▉ | 6462/22434 [7:22:56<11:14:03, 2.53s/it] +2025-02-05 17:30:39 - ERROR - stderr - 29%|██▉ | 6463/22434 [7:22:59<11:10:13, 2.52s/it] +2025-02-05 17:30:39 - ERROR - stderr - +2025-02-05 17:30:39 - ERROR - stderr - +2025-02-05 17:30:39 - INFO - stdout - {'loss': 0.8177, 'grad_norm': 0.9509884715080261, 'learning_rate': 1.6705956170817315e-05, 'epoch': 0.86} +2025-02-05 17:30:39 - ERROR - stderr - 29%|██▉ | 6463/22434 [7:22:59<11:10:13, 2.52s/it] +2025-02-05 17:30:41 - ERROR - stderr - 29%|██▉ | 6464/22434 [7:23:01<11:02:22, 
2.49s/it] +2025-02-05 17:30:41 - ERROR - stderr - +2025-02-05 17:30:41 - ERROR - stderr - +2025-02-05 17:30:41 - INFO - stdout - {'loss': 0.8999, 'grad_norm': 1.0080265998840332, 'learning_rate': 1.6704885096328787e-05, 'epoch': 0.86} +2025-02-05 17:30:41 - ERROR - stderr - 29%|██▉ | 6464/22434 [7:23:01<11:02:22, 2.49s/it] +2025-02-05 17:30:44 - ERROR - stderr - 29%|██▉ | 6465/22434 [7:23:04<11:09:27, 2.52s/it] +2025-02-05 17:30:44 - ERROR - stderr - +2025-02-05 17:30:44 - ERROR - stderr - +2025-02-05 17:30:44 - INFO - stdout - {'loss': 0.8572, 'grad_norm': 0.9609020948410034, 'learning_rate': 1.6703813882083347e-05, 'epoch': 0.86} +2025-02-05 17:30:44 - ERROR - stderr - 29%|██▉ | 6465/22434 [7:23:04<11:09:27, 2.52s/it] +2025-02-05 17:30:46 - ERROR - stderr - 29%|██▉ | 6466/22434 [7:23:06<11:03:44, 2.49s/it] +2025-02-05 17:30:46 - ERROR - stderr - +2025-02-05 17:30:46 - ERROR - stderr - +2025-02-05 17:30:46 - INFO - stdout - {'loss': 0.8913, 'grad_norm': 0.9913627505302429, 'learning_rate': 1.6702742528103318e-05, 'epoch': 0.86} +2025-02-05 17:30:46 - ERROR - stderr - 29%|██▉ | 6466/22434 [7:23:06<11:03:44, 2.49s/it] +2025-02-05 17:30:49 - ERROR - stderr - 29%|██▉ | 6467/22434 [7:23:09<11:09:31, 2.52s/it] +2025-02-05 17:30:49 - ERROR - stderr - +2025-02-05 17:30:49 - ERROR - stderr - +2025-02-05 17:30:49 - INFO - stdout - {'loss': 0.9404, 'grad_norm': 0.9418418407440186, 'learning_rate': 1.670167103441104e-05, 'epoch': 0.86} +2025-02-05 17:30:49 - ERROR - stderr - 29%|██▉ | 6467/22434 [7:23:09<11:09:31, 2.52s/it] +2025-02-05 17:30:51 - ERROR - stderr - 29%|██▉ | 6468/22434 [7:23:11<11:08:07, 2.51s/it] +2025-02-05 17:30:52 - ERROR - stderr - +2025-02-05 17:30:52 - ERROR - stderr - +2025-02-05 17:30:52 - INFO - stdout - {'loss': 0.9073, 'grad_norm': 1.016886830329895, 'learning_rate': 1.6700599401028834e-05, 'epoch': 0.86} +2025-02-05 17:30:52 - ERROR - stderr - 29%|██▉ | 6468/22434 [7:23:11<11:08:07, 2.51s/it] +2025-02-05 17:30:54 - ERROR - stderr - 29%|██▉ | 
6469/22434 [7:23:14<11:07:05, 2.51s/it] +2025-02-05 17:30:54 - ERROR - stderr - +2025-02-05 17:30:54 - ERROR - stderr - +2025-02-05 17:30:54 - INFO - stdout - {'loss': 0.8685, 'grad_norm': 1.114442229270935, 'learning_rate': 1.6699527627979052e-05, 'epoch': 0.87} +2025-02-05 17:30:54 - ERROR - stderr - 29%|██▉ | 6469/22434 [7:23:14<11:07:05, 2.51s/it] +2025-02-05 17:30:56 - ERROR - stderr - 29%|██▉ | 6470/22434 [7:23:16<11:05:25, 2.50s/it] +2025-02-05 17:30:56 - ERROR - stderr - +2025-02-05 17:30:56 - ERROR - stderr - +2025-02-05 17:30:56 - INFO - stdout - {'loss': 1.0016, 'grad_norm': 1.1099072694778442, 'learning_rate': 1.6698455715284026e-05, 'epoch': 0.87} +2025-02-05 17:30:56 - ERROR - stderr - 29%|██▉ | 6470/22434 [7:23:16<11:05:25, 2.50s/it] +2025-02-05 17:30:59 - ERROR - stderr - 29%|██▉ | 6471/22434 [7:23:19<11:27:36, 2.58s/it] +2025-02-05 17:30:59 - ERROR - stderr - +2025-02-05 17:30:59 - ERROR - stderr - +2025-02-05 17:30:59 - INFO - stdout - {'loss': 0.9098, 'grad_norm': 1.1658971309661865, 'learning_rate': 1.66973836629661e-05, 'epoch': 0.87} +2025-02-05 17:30:59 - ERROR - stderr - 29%|██▉ | 6471/22434 [7:23:19<11:27:36, 2.58s/it] +2025-02-05 17:31:02 - ERROR - stderr - 29%|██▉ | 6472/22434 [7:23:22<11:23:02, 2.57s/it] +2025-02-05 17:31:02 - ERROR - stderr - +2025-02-05 17:31:02 - ERROR - stderr - +2025-02-05 17:31:02 - INFO - stdout - {'loss': 0.8715, 'grad_norm': 0.9998052716255188, 'learning_rate': 1.669631147104762e-05, 'epoch': 0.87} +2025-02-05 17:31:02 - ERROR - stderr - 29%|██▉ | 6472/22434 [7:23:22<11:23:02, 2.57s/it] +2025-02-05 17:31:04 - ERROR - stderr - 29%|██▉ | 6473/22434 [7:23:24<11:23:57, 2.57s/it] +2025-02-05 17:31:04 - ERROR - stderr - +2025-02-05 17:31:04 - ERROR - stderr - +2025-02-05 17:31:04 - INFO - stdout - {'loss': 1.0347, 'grad_norm': 1.0649808645248413, 'learning_rate': 1.6695239139550934e-05, 'epoch': 0.87} +2025-02-05 17:31:04 - ERROR - stderr - 29%|██▉ | 6473/22434 [7:23:24<11:23:57, 2.57s/it] +2025-02-05 17:31:07 - ERROR 
- stderr - 29%|██▉ | 6474/22434 [7:23:27<11:15:02, 2.54s/it] +2025-02-05 17:31:07 - ERROR - stderr - +2025-02-05 17:31:07 - ERROR - stderr - +2025-02-05 17:31:07 - INFO - stdout - {'loss': 0.8371, 'grad_norm': 0.9812138676643372, 'learning_rate': 1.6694166668498396e-05, 'epoch': 0.87} +2025-02-05 17:31:07 - ERROR - stderr - 29%|██▉ | 6474/22434 [7:23:27<11:15:02, 2.54s/it] +2025-02-05 17:31:09 - ERROR - stderr - 29%|██▉ | 6475/22434 [7:23:29<11:10:53, 2.52s/it] +2025-02-05 17:31:09 - ERROR - stderr - +2025-02-05 17:31:09 - ERROR - stderr - +2025-02-05 17:31:09 - INFO - stdout - {'loss': 0.9774, 'grad_norm': 0.9409304261207581, 'learning_rate': 1.669309405791236e-05, 'epoch': 0.87} +2025-02-05 17:31:09 - ERROR - stderr - 29%|██▉ | 6475/22434 [7:23:29<11:10:53, 2.52s/it] +2025-02-05 17:31:12 - ERROR - stderr - 29%|██▉ | 6476/22434 [7:23:31<11:05:00, 2.50s/it] +2025-02-05 17:31:12 - ERROR - stderr - +2025-02-05 17:31:12 - ERROR - stderr - +2025-02-05 17:31:12 - INFO - stdout - {'loss': 0.8815, 'grad_norm': 1.0984230041503906, 'learning_rate': 1.669202130781518e-05, 'epoch': 0.87} +2025-02-05 17:31:12 - ERROR - stderr - 29%|██▉ | 6476/22434 [7:23:32<11:05:00, 2.50s/it] +2025-02-05 17:31:14 - ERROR - stderr - 29%|██▉ | 6477/22434 [7:23:34<11:02:38, 2.49s/it] +2025-02-05 17:31:14 - ERROR - stderr - +2025-02-05 17:31:14 - ERROR - stderr - +2025-02-05 17:31:14 - INFO - stdout - {'loss': 0.8473, 'grad_norm': 0.9248968362808228, 'learning_rate': 1.6690948418229224e-05, 'epoch': 0.87} +2025-02-05 17:31:14 - ERROR - stderr - 29%|██▉ | 6477/22434 [7:23:34<11:02:38, 2.49s/it] +2025-02-05 17:31:17 - ERROR - stderr - 29%|██▉ | 6478/22434 [7:23:36<11:04:17, 2.50s/it] +2025-02-05 17:31:17 - ERROR - stderr - +2025-02-05 17:31:17 - ERROR - stderr - +2025-02-05 17:31:17 - INFO - stdout - {'loss': 0.9668, 'grad_norm': 0.9722856879234314, 'learning_rate': 1.668987538917685e-05, 'epoch': 0.87} +2025-02-05 17:31:17 - ERROR - stderr - 29%|██▉ | 6478/22434 [7:23:37<11:04:17, 2.50s/it] 
+2025-02-05 17:31:19 - ERROR - stderr - 29%|██▉ | 6479/22434 [7:23:39<11:04:06, 2.50s/it] +2025-02-05 17:31:19 - ERROR - stderr - +2025-02-05 17:31:19 - ERROR - stderr - +2025-02-05 17:31:19 - INFO - stdout - {'loss': 1.1034, 'grad_norm': 1.1002607345581055, 'learning_rate': 1.6688802220680422e-05, 'epoch': 0.87} +2025-02-05 17:31:19 - ERROR - stderr - 29%|██▉ | 6479/22434 [7:23:39<11:04:06, 2.50s/it] +2025-02-05 17:31:22 - ERROR - stderr - 29%|██▉ | 6480/22434 [7:23:41<10:59:21, 2.48s/it] +2025-02-05 17:31:22 - ERROR - stderr - +2025-02-05 17:31:22 - ERROR - stderr - +2025-02-05 17:31:22 - INFO - stdout - {'loss': 0.8969, 'grad_norm': 1.1081945896148682, 'learning_rate': 1.6687728912762314e-05, 'epoch': 0.87} +2025-02-05 17:31:22 - ERROR - stderr - 29%|██▉ | 6480/22434 [7:23:41<10:59:21, 2.48s/it] +2025-02-05 17:31:24 - ERROR - stderr - 29%|██▉ | 6481/22434 [7:23:44<11:16:08, 2.54s/it] +2025-02-05 17:31:24 - ERROR - stderr - +2025-02-05 17:31:24 - ERROR - stderr - +2025-02-05 17:31:24 - INFO - stdout - {'loss': 0.8881, 'grad_norm': 1.0859794616699219, 'learning_rate': 1.6686655465444897e-05, 'epoch': 0.87} +2025-02-05 17:31:24 - ERROR - stderr - 29%|██▉ | 6481/22434 [7:23:44<11:16:08, 2.54s/it] +2025-02-05 17:31:27 - ERROR - stderr - 29%|██▉ | 6482/22434 [7:23:47<11:11:13, 2.52s/it] +2025-02-05 17:31:27 - ERROR - stderr - +2025-02-05 17:31:27 - ERROR - stderr - +2025-02-05 17:31:27 - INFO - stdout - {'loss': 0.9209, 'grad_norm': 0.9970587491989136, 'learning_rate': 1.6685581878750543e-05, 'epoch': 0.87} +2025-02-05 17:31:27 - ERROR - stderr - 29%|██▉ | 6482/22434 [7:23:47<11:11:13, 2.52s/it] +2025-02-05 17:31:29 - ERROR - stderr - 29%|██▉ | 6483/22434 [7:23:49<11:09:34, 2.52s/it] +2025-02-05 17:31:29 - ERROR - stderr - +2025-02-05 17:31:29 - ERROR - stderr - +2025-02-05 17:31:29 - INFO - stdout - {'loss': 0.9579, 'grad_norm': 1.078643560409546, 'learning_rate': 1.6684508152701634e-05, 'epoch': 0.87} +2025-02-05 17:31:29 - ERROR - stderr - 29%|██▉ | 6483/22434 
[7:23:49<11:09:34, 2.52s/it] +2025-02-05 17:31:32 - ERROR - stderr - 29%|██▉ | 6484/22434 [7:23:52<11:07:49, 2.51s/it] +2025-02-05 17:31:32 - ERROR - stderr - +2025-02-05 17:31:32 - ERROR - stderr - +2025-02-05 17:31:32 - INFO - stdout - {'loss': 1.0697, 'grad_norm': 1.0877625942230225, 'learning_rate': 1.668343428732055e-05, 'epoch': 0.87} +2025-02-05 17:31:32 - ERROR - stderr - 29%|██▉ | 6484/22434 [7:23:52<11:07:49, 2.51s/it] +2025-02-05 17:31:34 - ERROR - stderr - 29%|██▉ | 6485/22434 [7:23:54<11:02:19, 2.49s/it] +2025-02-05 17:31:34 - ERROR - stderr - +2025-02-05 17:31:34 - ERROR - stderr - +2025-02-05 17:31:34 - INFO - stdout - {'loss': 0.9681, 'grad_norm': 1.102967381477356, 'learning_rate': 1.6682360282629672e-05, 'epoch': 0.87} +2025-02-05 17:31:34 - ERROR - stderr - 29%|██▉ | 6485/22434 [7:23:54<11:02:19, 2.49s/it] +2025-02-05 17:31:37 - ERROR - stderr - 29%|██▉ | 6486/22434 [7:23:57<11:04:34, 2.50s/it] +2025-02-05 17:31:37 - ERROR - stderr - +2025-02-05 17:31:37 - ERROR - stderr - +2025-02-05 17:31:37 - INFO - stdout - {'loss': 0.9703, 'grad_norm': 1.1853241920471191, 'learning_rate': 1.6681286138651386e-05, 'epoch': 0.87} +2025-02-05 17:31:37 - ERROR - stderr - 29%|██▉ | 6486/22434 [7:23:57<11:04:34, 2.50s/it] +2025-02-05 17:31:39 - ERROR - stderr - 29%|██▉ | 6487/22434 [7:23:59<11:14:56, 2.54s/it] +2025-02-05 17:31:39 - ERROR - stderr - +2025-02-05 17:31:39 - ERROR - stderr - +2025-02-05 17:31:39 - INFO - stdout - {'loss': 0.9474, 'grad_norm': 1.0619043111801147, 'learning_rate': 1.6680211855408087e-05, 'epoch': 0.87} +2025-02-05 17:31:39 - ERROR - stderr - 29%|██▉ | 6487/22434 [7:23:59<11:14:56, 2.54s/it] +2025-02-05 17:31:42 - ERROR - stderr - 29%|██▉ | 6488/22434 [7:24:02<11:16:17, 2.54s/it] +2025-02-05 17:31:42 - ERROR - stderr - +2025-02-05 17:31:42 - ERROR - stderr - +2025-02-05 17:31:42 - INFO - stdout - {'loss': 0.9418, 'grad_norm': 1.1336722373962402, 'learning_rate': 1.6679137432922163e-05, 'epoch': 0.87} +2025-02-05 17:31:42 - ERROR - stderr 
- 29%|██▉ | 6488/22434 [7:24:02<11:16:17, 2.54s/it] +2025-02-05 17:31:45 - ERROR - stderr - 29%|██▉ | 6489/22434 [7:24:04<11:15:21, 2.54s/it] +2025-02-05 17:31:45 - ERROR - stderr - +2025-02-05 17:31:45 - ERROR - stderr - +2025-02-05 17:31:45 - INFO - stdout - {'loss': 1.0314, 'grad_norm': 1.1401530504226685, 'learning_rate': 1.667806287121601e-05, 'epoch': 0.87} +2025-02-05 17:31:45 - ERROR - stderr - 29%|██▉ | 6489/22434 [7:24:04<11:15:21, 2.54s/it] +2025-02-05 17:31:47 - ERROR - stderr - 29%|██▉ | 6490/22434 [7:24:07<11:14:56, 2.54s/it] +2025-02-05 17:31:47 - ERROR - stderr - +2025-02-05 17:31:47 - ERROR - stderr - +2025-02-05 17:31:47 - INFO - stdout - {'loss': 0.8481, 'grad_norm': 1.0824079513549805, 'learning_rate': 1.6676988170312027e-05, 'epoch': 0.87} +2025-02-05 17:31:47 - ERROR - stderr - 29%|██▉ | 6490/22434 [7:24:07<11:14:56, 2.54s/it] +2025-02-05 17:31:50 - ERROR - stderr - 29%|██▉ | 6491/22434 [7:24:09<11:13:06, 2.53s/it] +2025-02-05 17:31:50 - ERROR - stderr - +2025-02-05 17:31:50 - ERROR - stderr - +2025-02-05 17:31:50 - INFO - stdout - {'loss': 0.9324, 'grad_norm': 1.1097157001495361, 'learning_rate': 1.6675913330232613e-05, 'epoch': 0.87} +2025-02-05 17:31:50 - ERROR - stderr - 29%|██▉ | 6491/22434 [7:24:09<11:13:06, 2.53s/it] +2025-02-05 17:31:52 - ERROR - stderr - 29%|██▉ | 6492/22434 [7:24:12<11:10:51, 2.52s/it] +2025-02-05 17:31:52 - ERROR - stderr - +2025-02-05 17:31:52 - ERROR - stderr - +2025-02-05 17:31:52 - INFO - stdout - {'loss': 0.8745, 'grad_norm': 1.1484395265579224, 'learning_rate': 1.6674838351000176e-05, 'epoch': 0.87} +2025-02-05 17:31:52 - ERROR - stderr - 29%|██▉ | 6492/22434 [7:24:12<11:10:51, 2.52s/it] +2025-02-05 17:31:55 - ERROR - stderr - 29%|██▉ | 6493/22434 [7:24:14<11:16:40, 2.55s/it] +2025-02-05 17:31:55 - ERROR - stderr - +2025-02-05 17:31:55 - ERROR - stderr - +2025-02-05 17:31:55 - INFO - stdout - {'loss': 0.9596, 'grad_norm': 0.9537686705589294, 'learning_rate': 1.6673763232637123e-05, 'epoch': 0.87} +2025-02-05 
17:31:55 - ERROR - stderr - 29%|██▉ | 6493/22434 [7:24:14<11:16:40, 2.55s/it] +2025-02-05 17:31:57 - ERROR - stderr - 29%|██▉ | 6494/22434 [7:24:17<11:12:29, 2.53s/it] +2025-02-05 17:31:57 - ERROR - stderr - +2025-02-05 17:31:57 - ERROR - stderr - +2025-02-05 17:31:57 - INFO - stdout - {'loss': 0.904, 'grad_norm': 1.1138883829116821, 'learning_rate': 1.667268797516586e-05, 'epoch': 0.87} +2025-02-05 17:31:57 - ERROR - stderr - 29%|██▉ | 6494/22434 [7:24:17<11:12:29, 2.53s/it] +2025-02-05 17:32:00 - ERROR - stderr - 29%|██▉ | 6495/22434 [7:24:19<11:05:40, 2.51s/it] +2025-02-05 17:32:00 - ERROR - stderr - +2025-02-05 17:32:00 - ERROR - stderr - +2025-02-05 17:32:00 - INFO - stdout - {'loss': 0.9347, 'grad_norm': 1.2903140783309937, 'learning_rate': 1.66716125786088e-05, 'epoch': 0.87} +2025-02-05 17:32:00 - ERROR - stderr - 29%|██▉ | 6495/22434 [7:24:19<11:05:40, 2.51s/it] +2025-02-05 17:32:02 - ERROR - stderr - 29%|██▉ | 6496/22434 [7:24:22<11:03:22, 2.50s/it] +2025-02-05 17:32:02 - ERROR - stderr - +2025-02-05 17:32:02 - ERROR - stderr - +2025-02-05 17:32:02 - INFO - stdout - {'loss': 0.9555, 'grad_norm': 1.0341150760650635, 'learning_rate': 1.667053704298836e-05, 'epoch': 0.87} +2025-02-05 17:32:02 - ERROR - stderr - 29%|██▉ | 6496/22434 [7:24:22<11:03:22, 2.50s/it] +2025-02-05 17:32:05 - ERROR - stderr - 29%|██▉ | 6497/22434 [7:24:24<11:03:54, 2.50s/it] +2025-02-05 17:32:05 - ERROR - stderr - +2025-02-05 17:32:05 - ERROR - stderr - +2025-02-05 17:32:05 - INFO - stdout - {'loss': 0.9997, 'grad_norm': 1.029263973236084, 'learning_rate': 1.6669461368326958e-05, 'epoch': 0.87} +2025-02-05 17:32:05 - ERROR - stderr - 29%|██▉ | 6497/22434 [7:24:24<11:03:54, 2.50s/it] +2025-02-05 17:32:07 - ERROR - stderr - 29%|██▉ | 6498/22434 [7:24:27<11:16:37, 2.55s/it] +2025-02-05 17:32:07 - ERROR - stderr - +2025-02-05 17:32:07 - ERROR - stderr - +2025-02-05 17:32:07 - INFO - stdout - {'loss': 0.8084, 'grad_norm': 1.029625415802002, 'learning_rate': 1.6668385554647017e-05, 'epoch': 
0.87} +2025-02-05 17:32:07 - ERROR - stderr - 29%|██▉ | 6498/22434 [7:24:27<11:16:37, 2.55s/it] +2025-02-05 17:32:10 - ERROR - stderr - 29%|██▉ | 6499/22434 [7:24:30<11:13:40, 2.54s/it] +2025-02-05 17:32:10 - ERROR - stderr - +2025-02-05 17:32:10 - ERROR - stderr - +2025-02-05 17:32:10 - INFO - stdout - {'loss': 0.9658, 'grad_norm': 1.074678897857666, 'learning_rate': 1.6667309601970957e-05, 'epoch': 0.87} +2025-02-05 17:32:10 - ERROR - stderr - 29%|██▉ | 6499/22434 [7:24:30<11:13:40, 2.54s/it] +2025-02-05 17:32:12 - ERROR - stderr - 29%|██▉ | 6500/22434 [7:24:32<11:11:31, 2.53s/it] +2025-02-05 17:32:12 - ERROR - stderr - +2025-02-05 17:32:12 - ERROR - stderr - +2025-02-05 17:32:12 - INFO - stdout - {'loss': 0.9354, 'grad_norm': 1.1187047958374023, 'learning_rate': 1.666623351032121e-05, 'epoch': 0.87} +2025-02-05 17:32:12 - ERROR - stderr - 29%|██▉ | 6500/22434 [7:24:32<11:11:31, 2.53s/it] +2025-02-05 17:32:15 - ERROR - stderr - 29%|██▉ | 6501/22434 [7:24:35<11:10:31, 2.53s/it] +2025-02-05 17:32:15 - ERROR - stderr - +2025-02-05 17:32:15 - ERROR - stderr - +2025-02-05 17:32:15 - INFO - stdout - {'loss': 0.8596, 'grad_norm': 1.012219786643982, 'learning_rate': 1.6665157279720207e-05, 'epoch': 0.87} +2025-02-05 17:32:15 - ERROR - stderr - 29%|██▉ | 6501/22434 [7:24:35<11:10:31, 2.53s/it] +2025-02-05 17:32:17 - ERROR - stderr - 29%|██▉ | 6502/22434 [7:24:37<11:09:09, 2.52s/it] +2025-02-05 17:32:17 - ERROR - stderr - +2025-02-05 17:32:17 - ERROR - stderr - +2025-02-05 17:32:17 - INFO - stdout - {'loss': 0.966, 'grad_norm': 1.1061692237854004, 'learning_rate': 1.6664080910190374e-05, 'epoch': 0.87} +2025-02-05 17:32:17 - ERROR - stderr - 29%|██▉ | 6502/22434 [7:24:37<11:09:09, 2.52s/it] +2025-02-05 17:32:20 - ERROR - stderr - 29%|██▉ | 6503/22434 [7:24:40<11:11:39, 2.53s/it] +2025-02-05 17:32:20 - ERROR - stderr - +2025-02-05 17:32:20 - ERROR - stderr - +2025-02-05 17:32:20 - INFO - stdout - {'loss': 1.0234, 'grad_norm': 1.1396405696868896, 'learning_rate': 
1.6663004401754155e-05, 'epoch': 0.87} +2025-02-05 17:32:20 - ERROR - stderr - 29%|██▉ | 6503/22434 [7:24:40<11:11:39, 2.53s/it] +2025-02-05 17:32:22 - ERROR - stderr - 29%|██▉ | 6504/22434 [7:24:42<11:10:03, 2.52s/it] +2025-02-05 17:32:22 - ERROR - stderr - +2025-02-05 17:32:22 - ERROR - stderr - +2025-02-05 17:32:22 - INFO - stdout - {'loss': 0.9256, 'grad_norm': 1.1247122287750244, 'learning_rate': 1.6661927754433982e-05, 'epoch': 0.87} +2025-02-05 17:32:22 - ERROR - stderr - 29%|██▉ | 6504/22434 [7:24:42<11:10:03, 2.52s/it] +2025-02-05 17:32:25 - ERROR - stderr - 29%|██▉ | 6505/22434 [7:24:45<11:09:11, 2.52s/it] +2025-02-05 17:32:25 - ERROR - stderr - +2025-02-05 17:32:25 - ERROR - stderr - +2025-02-05 17:32:25 - INFO - stdout - {'loss': 0.9014, 'grad_norm': 1.0590485334396362, 'learning_rate': 1.6660850968252305e-05, 'epoch': 0.87} +2025-02-05 17:32:25 - ERROR - stderr - 29%|██▉ | 6505/22434 [7:24:45<11:09:11, 2.52s/it] +2025-02-05 17:32:27 - ERROR - stderr - 29%|██▉ | 6506/22434 [7:24:47<11:05:48, 2.51s/it] +2025-02-05 17:32:27 - ERROR - stderr - +2025-02-05 17:32:27 - ERROR - stderr - +2025-02-05 17:32:27 - INFO - stdout - {'loss': 0.9617, 'grad_norm': 1.4007304906845093, 'learning_rate': 1.6659774043231557e-05, 'epoch': 0.87} +2025-02-05 17:32:27 - ERROR - stderr - 29%|██▉ | 6506/22434 [7:24:47<11:05:48, 2.51s/it] +2025-02-05 17:32:30 - ERROR - stderr - 29%|██▉ | 6507/22434 [7:24:50<11:02:52, 2.50s/it] +2025-02-05 17:32:30 - ERROR - stderr - +2025-02-05 17:32:30 - ERROR - stderr - +2025-02-05 17:32:30 - INFO - stdout - {'loss': 1.0781, 'grad_norm': 1.2011232376098633, 'learning_rate': 1.6658696979394194e-05, 'epoch': 0.87} +2025-02-05 17:32:30 - ERROR - stderr - 29%|██▉ | 6507/22434 [7:24:50<11:02:52, 2.50s/it] +2025-02-05 17:32:32 - ERROR - stderr - 29%|██▉ | 6508/22434 [7:24:52<10:58:34, 2.48s/it] +2025-02-05 17:32:32 - ERROR - stderr - +2025-02-05 17:32:32 - ERROR - stderr - +2025-02-05 17:32:32 - INFO - stdout - {'loss': 0.8808, 'grad_norm': 
1.0361733436584473, 'learning_rate': 1.6657619776762667e-05, 'epoch': 0.87} +2025-02-05 17:32:32 - ERROR - stderr - 29%|██▉ | 6508/22434 [7:24:52<10:58:34, 2.48s/it] +2025-02-05 17:32:35 - ERROR - stderr - 29%|██▉ | 6509/22434 [7:24:55<11:00:49, 2.49s/it] +2025-02-05 17:32:35 - ERROR - stderr - +2025-02-05 17:32:35 - ERROR - stderr - +2025-02-05 17:32:35 - INFO - stdout - {'loss': 0.8449, 'grad_norm': 0.9740707874298096, 'learning_rate': 1.665654243535942e-05, 'epoch': 0.87} +2025-02-05 17:32:35 - ERROR - stderr - 29%|██▉ | 6509/22434 [7:24:55<11:00:49, 2.49s/it] +2025-02-05 17:32:37 - ERROR - stderr - 29%|██▉ | 6510/22434 [7:24:57<11:08:35, 2.52s/it] +2025-02-05 17:32:37 - ERROR - stderr - +2025-02-05 17:32:37 - ERROR - stderr - +2025-02-05 17:32:37 - INFO - stdout - {'loss': 1.0124, 'grad_norm': 1.112112283706665, 'learning_rate': 1.665546495520692e-05, 'epoch': 0.87} +2025-02-05 17:32:37 - ERROR - stderr - 29%|██▉ | 6510/22434 [7:24:57<11:08:35, 2.52s/it] +2025-02-05 17:32:40 - ERROR - stderr - 29%|██▉ | 6511/22434 [7:25:00<11:06:45, 2.51s/it] +2025-02-05 17:32:40 - ERROR - stderr - +2025-02-05 17:32:40 - ERROR - stderr - +2025-02-05 17:32:40 - INFO - stdout - {'loss': 0.9964, 'grad_norm': 1.0324573516845703, 'learning_rate': 1.665438733632762e-05, 'epoch': 0.87} +2025-02-05 17:32:40 - ERROR - stderr - 29%|██▉ | 6511/22434 [7:25:00<11:06:45, 2.51s/it] +2025-02-05 17:32:42 - ERROR - stderr - 29%|██▉ | 6512/22434 [7:25:02<11:04:21, 2.50s/it] +2025-02-05 17:32:42 - ERROR - stderr - +2025-02-05 17:32:42 - ERROR - stderr - +2025-02-05 17:32:42 - INFO - stdout - {'loss': 0.8778, 'grad_norm': 1.034432053565979, 'learning_rate': 1.6653309578743986e-05, 'epoch': 0.87} +2025-02-05 17:32:42 - ERROR - stderr - 29%|██▉ | 6512/22434 [7:25:02<11:04:21, 2.50s/it] +2025-02-05 17:32:45 - ERROR - stderr - 29%|██▉ | 6513/22434 [7:25:05<11:04:24, 2.50s/it] +2025-02-05 17:32:45 - ERROR - stderr - +2025-02-05 17:32:45 - ERROR - stderr - +2025-02-05 17:32:45 - INFO - stdout - {'loss': 
1.0163, 'grad_norm': 1.0609415769577026, 'learning_rate': 1.665223168247848e-05, 'epoch': 0.87} +2025-02-05 17:32:45 - ERROR - stderr - 29%|██▉ | 6513/22434 [7:25:05<11:04:24, 2.50s/it] +2025-02-05 17:32:47 - ERROR - stderr - 29%|██▉ | 6514/22434 [7:25:07<11:06:54, 2.51s/it] +2025-02-05 17:32:47 - ERROR - stderr - +2025-02-05 17:32:47 - ERROR - stderr - +2025-02-05 17:32:47 - INFO - stdout - {'loss': 0.8497, 'grad_norm': 1.0072652101516724, 'learning_rate': 1.665115364755357e-05, 'epoch': 0.87} +2025-02-05 17:32:47 - ERROR - stderr - 29%|██▉ | 6514/22434 [7:25:07<11:06:54, 2.51s/it] +2025-02-05 17:32:50 - ERROR - stderr - 29%|██▉ | 6515/22434 [7:25:10<11:06:23, 2.51s/it] +2025-02-05 17:32:50 - ERROR - stderr - +2025-02-05 17:32:50 - ERROR - stderr - +2025-02-05 17:32:50 - INFO - stdout - {'loss': 0.8636, 'grad_norm': 1.1178102493286133, 'learning_rate': 1.6650075473991726e-05, 'epoch': 0.87} +2025-02-05 17:32:50 - ERROR - stderr - 29%|██▉ | 6515/22434 [7:25:10<11:06:23, 2.51s/it] +2025-02-05 17:32:52 - ERROR - stderr - 29%|██▉ | 6516/22434 [7:25:12<11:06:03, 2.51s/it] +2025-02-05 17:32:52 - ERROR - stderr - +2025-02-05 17:32:52 - ERROR - stderr - +2025-02-05 17:32:52 - INFO - stdout - {'loss': 0.8286, 'grad_norm': 0.9420791268348694, 'learning_rate': 1.664899716181542e-05, 'epoch': 0.87} +2025-02-05 17:32:52 - ERROR - stderr - 29%|██▉ | 6516/22434 [7:25:12<11:06:03, 2.51s/it] +2025-02-05 17:32:55 - ERROR - stderr - 29%|██▉ | 6517/22434 [7:25:15<11:02:22, 2.50s/it] +2025-02-05 17:32:55 - ERROR - stderr - +2025-02-05 17:32:55 - ERROR - stderr - +2025-02-05 17:32:55 - INFO - stdout - {'loss': 0.8808, 'grad_norm': 1.0138992071151733, 'learning_rate': 1.6647918711047133e-05, 'epoch': 0.87} +2025-02-05 17:32:55 - ERROR - stderr - 29%|██▉ | 6517/22434 [7:25:15<11:02:22, 2.50s/it] +2025-02-05 17:32:57 - ERROR - stderr - 29%|██▉ | 6518/22434 [7:25:17<10:57:58, 2.48s/it] +2025-02-05 17:32:57 - ERROR - stderr - +2025-02-05 17:32:57 - ERROR - stderr - +2025-02-05 17:32:57 - 
INFO - stdout - {'loss': 0.9333, 'grad_norm': 1.022444486618042, 'learning_rate': 1.664684012170934e-05, 'epoch': 0.87} +2025-02-05 17:32:57 - ERROR - stderr - 29%|██▉ | 6518/22434 [7:25:17<10:57:58, 2.48s/it] +2025-02-05 17:33:00 - ERROR - stderr - 29%|██▉ | 6519/22434 [7:25:20<10:57:15, 2.48s/it] +2025-02-05 17:33:00 - ERROR - stderr - +2025-02-05 17:33:00 - ERROR - stderr - +2025-02-05 17:33:00 - INFO - stdout - {'loss': 0.9654, 'grad_norm': 1.0528024435043335, 'learning_rate': 1.6645761393824526e-05, 'epoch': 0.87} +2025-02-05 17:33:00 - ERROR - stderr - 29%|██▉ | 6519/22434 [7:25:20<10:57:15, 2.48s/it] +2025-02-05 17:33:02 - ERROR - stderr - 29%|██▉ | 6520/22434 [7:25:22<10:59:10, 2.49s/it] +2025-02-05 17:33:02 - ERROR - stderr - +2025-02-05 17:33:02 - ERROR - stderr - +2025-02-05 17:33:02 - INFO - stdout - {'loss': 0.9726, 'grad_norm': 1.107457160949707, 'learning_rate': 1.6644682527415176e-05, 'epoch': 0.87} +2025-02-05 17:33:02 - ERROR - stderr - 29%|██▉ | 6520/22434 [7:25:22<10:59:10, 2.49s/it] +2025-02-05 17:33:05 - ERROR - stderr - 29%|██▉ | 6521/22434 [7:25:25<11:01:03, 2.49s/it] +2025-02-05 17:33:05 - ERROR - stderr - +2025-02-05 17:33:05 - ERROR - stderr - +2025-02-05 17:33:05 - INFO - stdout - {'loss': 0.8374, 'grad_norm': 1.0602091550827026, 'learning_rate': 1.664360352250378e-05, 'epoch': 0.87} +2025-02-05 17:33:05 - ERROR - stderr - 29%|██▉ | 6521/22434 [7:25:25<11:01:03, 2.49s/it] +2025-02-05 17:33:07 - ERROR - stderr - 29%|██▉ | 6522/22434 [7:25:27<11:07:02, 2.52s/it] +2025-02-05 17:33:07 - ERROR - stderr - +2025-02-05 17:33:07 - ERROR - stderr - +2025-02-05 17:33:07 - INFO - stdout - {'loss': 0.9698, 'grad_norm': 1.1460821628570557, 'learning_rate': 1.664252437911282e-05, 'epoch': 0.87} +2025-02-05 17:33:07 - ERROR - stderr - 29%|██▉ | 6522/22434 [7:25:27<11:07:02, 2.52s/it] +2025-02-05 17:33:10 - ERROR - stderr - 29%|██▉ | 6523/22434 [7:25:30<11:09:13, 2.52s/it] +2025-02-05 17:33:10 - ERROR - stderr - +2025-02-05 17:33:10 - ERROR - stderr - 
+2025-02-05 17:33:10 - INFO - stdout - {'loss': 0.811, 'grad_norm': 1.0244218111038208, 'learning_rate': 1.6641445097264796e-05, 'epoch': 0.87} +2025-02-05 17:33:10 - ERROR - stderr - 29%|██▉ | 6523/22434 [7:25:30<11:09:13, 2.52s/it] +2025-02-05 17:33:13 - ERROR - stderr - 29%|██▉ | 6524/22434 [7:25:33<11:36:35, 2.63s/it] +2025-02-05 17:33:13 - ERROR - stderr - +2025-02-05 17:33:13 - ERROR - stderr - +2025-02-05 17:33:13 - INFO - stdout - {'loss': 0.9525, 'grad_norm': 1.1320558786392212, 'learning_rate': 1.6640365676982208e-05, 'epoch': 0.87} +2025-02-05 17:33:13 - ERROR - stderr - 29%|██▉ | 6524/22434 [7:25:33<11:36:35, 2.63s/it] +2025-02-05 17:33:15 - ERROR - stderr - 29%|██▉ | 6525/22434 [7:25:35<11:25:31, 2.59s/it] +2025-02-05 17:33:15 - ERROR - stderr - +2025-02-05 17:33:15 - ERROR - stderr - +2025-02-05 17:33:15 - INFO - stdout - {'loss': 1.0394, 'grad_norm': 1.0552382469177246, 'learning_rate': 1.6639286118287548e-05, 'epoch': 0.87} +2025-02-05 17:33:15 - ERROR - stderr - 29%|██▉ | 6525/22434 [7:25:35<11:25:31, 2.59s/it] +2025-02-05 17:33:18 - ERROR - stderr - 29%|██▉ | 6526/22434 [7:25:38<11:20:21, 2.57s/it] +2025-02-05 17:33:18 - ERROR - stderr - +2025-02-05 17:33:18 - ERROR - stderr - +2025-02-05 17:33:18 - INFO - stdout - {'loss': 0.7836, 'grad_norm': 0.9924234747886658, 'learning_rate': 1.6638206421203324e-05, 'epoch': 0.87} +2025-02-05 17:33:18 - ERROR - stderr - 29%|██▉ | 6526/22434 [7:25:38<11:20:21, 2.57s/it] +2025-02-05 17:33:20 - ERROR - stderr - 29%|██▉ | 6527/22434 [7:25:40<11:11:47, 2.53s/it] +2025-02-05 17:33:20 - ERROR - stderr - +2025-02-05 17:33:20 - ERROR - stderr - +2025-02-05 17:33:20 - INFO - stdout - {'loss': 0.976, 'grad_norm': 1.1913471221923828, 'learning_rate': 1.6637126585752036e-05, 'epoch': 0.87} +2025-02-05 17:33:20 - ERROR - stderr - 29%|██▉ | 6527/22434 [7:25:40<11:11:47, 2.53s/it] +2025-02-05 17:33:23 - ERROR - stderr - 29%|██▉ | 6528/22434 [7:25:43<11:09:46, 2.53s/it] +2025-02-05 17:33:23 - ERROR - stderr - +2025-02-05 
17:33:23 - ERROR - stderr - +2025-02-05 17:33:23 - INFO - stdout - {'loss': 0.9332, 'grad_norm': 1.0651968717575073, 'learning_rate': 1.66360466119562e-05, 'epoch': 0.87} +2025-02-05 17:33:23 - ERROR - stderr - 29%|██▉ | 6528/22434 [7:25:43<11:09:46, 2.53s/it] +2025-02-05 17:33:25 - ERROR - stderr - 29%|██▉ | 6529/22434 [7:25:45<11:05:10, 2.51s/it] +2025-02-05 17:33:25 - ERROR - stderr - +2025-02-05 17:33:25 - ERROR - stderr - +2025-02-05 17:33:25 - INFO - stdout - {'loss': 0.9653, 'grad_norm': 1.1777064800262451, 'learning_rate': 1.6634966499838323e-05, 'epoch': 0.87} +2025-02-05 17:33:25 - ERROR - stderr - 29%|██▉ | 6529/22434 [7:25:45<11:05:10, 2.51s/it] +2025-02-05 17:33:28 - ERROR - stderr - 29%|██▉ | 6530/22434 [7:25:47<11:02:28, 2.50s/it] +2025-02-05 17:33:28 - ERROR - stderr - +2025-02-05 17:33:28 - ERROR - stderr - +2025-02-05 17:33:28 - INFO - stdout - {'loss': 0.9182, 'grad_norm': 1.074629306793213, 'learning_rate': 1.6633886249420915e-05, 'epoch': 0.87} +2025-02-05 17:33:28 - ERROR - stderr - 29%|██▉ | 6530/22434 [7:25:48<11:02:28, 2.50s/it] +2025-02-05 17:33:30 - ERROR - stderr - 29%|██▉ | 6531/22434 [7:25:50<10:59:18, 2.49s/it] +2025-02-05 17:33:30 - ERROR - stderr - +2025-02-05 17:33:30 - ERROR - stderr - +2025-02-05 17:33:30 - INFO - stdout - {'loss': 0.9795, 'grad_norm': 1.214073896408081, 'learning_rate': 1.6632805860726497e-05, 'epoch': 0.87} +2025-02-05 17:33:30 - ERROR - stderr - 29%|██▉ | 6531/22434 [7:25:50<10:59:18, 2.49s/it] +2025-02-05 17:33:33 - ERROR - stderr - 29%|██▉ | 6532/22434 [7:25:53<11:14:39, 2.55s/it] +2025-02-05 17:33:33 - ERROR - stderr - +2025-02-05 17:33:33 - ERROR - stderr - +2025-02-05 17:33:33 - INFO - stdout - {'loss': 0.9912, 'grad_norm': 1.1148756742477417, 'learning_rate': 1.6631725333777585e-05, 'epoch': 0.87} +2025-02-05 17:33:33 - ERROR - stderr - 29%|██▉ | 6532/22434 [7:25:53<11:14:39, 2.55s/it] +2025-02-05 17:33:35 - ERROR - stderr - 29%|██▉ | 6533/22434 [7:25:55<11:11:32, 2.53s/it] +2025-02-05 17:33:35 - ERROR - 
stderr - +2025-02-05 17:33:35 - ERROR - stderr - +2025-02-05 17:33:35 - INFO - stdout - {'loss': 1.0807, 'grad_norm': 1.1715505123138428, 'learning_rate': 1.663064466859671e-05, 'epoch': 0.87} +2025-02-05 17:33:35 - ERROR - stderr - 29%|██▉ | 6533/22434 [7:25:55<11:11:32, 2.53s/it] +2025-02-05 17:33:38 - ERROR - stderr - 29%|██▉ | 6534/22434 [7:25:58<11:05:15, 2.51s/it] +2025-02-05 17:33:38 - ERROR - stderr - +2025-02-05 17:33:38 - ERROR - stderr - +2025-02-05 17:33:38 - INFO - stdout - {'loss': 1.0166, 'grad_norm': 1.1862242221832275, 'learning_rate': 1.6629563865206388e-05, 'epoch': 0.87} +2025-02-05 17:33:38 - ERROR - stderr - 29%|██▉ | 6534/22434 [7:25:58<11:05:15, 2.51s/it] +2025-02-05 17:33:40 - ERROR - stderr - 29%|██▉ | 6535/22434 [7:26:00<11:02:27, 2.50s/it] +2025-02-05 17:33:40 - ERROR - stderr - +2025-02-05 17:33:40 - ERROR - stderr - +2025-02-05 17:33:40 - INFO - stdout - {'loss': 0.9204, 'grad_norm': 1.0223350524902344, 'learning_rate': 1.6628482923629147e-05, 'epoch': 0.87} +2025-02-05 17:33:40 - ERROR - stderr - 29%|██▉ | 6535/22434 [7:26:00<11:02:27, 2.50s/it] +2025-02-05 17:33:43 - ERROR - stderr - 29%|██▉ | 6536/22434 [7:26:03<10:59:49, 2.49s/it] +2025-02-05 17:33:43 - ERROR - stderr - +2025-02-05 17:33:43 - ERROR - stderr - +2025-02-05 17:33:43 - INFO - stdout - {'loss': 0.9605, 'grad_norm': 1.1405360698699951, 'learning_rate': 1.6627401843887526e-05, 'epoch': 0.87} +2025-02-05 17:33:43 - ERROR - stderr - 29%|██▉ | 6536/22434 [7:26:03<10:59:49, 2.49s/it] +2025-02-05 17:33:45 - ERROR - stderr - 29%|██▉ | 6537/22434 [7:26:05<11:05:04, 2.51s/it] +2025-02-05 17:33:45 - ERROR - stderr - +2025-02-05 17:33:45 - ERROR - stderr - +2025-02-05 17:33:45 - INFO - stdout - {'loss': 0.842, 'grad_norm': 0.9497440457344055, 'learning_rate': 1.662632062600406e-05, 'epoch': 0.87} +2025-02-05 17:33:45 - ERROR - stderr - 29%|██▉ | 6537/22434 [7:26:05<11:05:04, 2.51s/it] +2025-02-05 17:33:48 - ERROR - stderr - 29%|██▉ | 6538/22434 [7:26:08<11:02:02, 2.50s/it] 
+2025-02-05 17:33:48 - ERROR - stderr - +2025-02-05 17:33:48 - ERROR - stderr - +2025-02-05 17:33:48 - INFO - stdout - {'loss': 0.954, 'grad_norm': 1.1066334247589111, 'learning_rate': 1.6625239270001277e-05, 'epoch': 0.87} +2025-02-05 17:33:48 - ERROR - stderr - 29%|██▉ | 6538/22434 [7:26:08<11:02:02, 2.50s/it] +2025-02-05 17:33:51 - ERROR - stderr - 29%|██▉ | 6539/22434 [7:26:10<11:23:59, 2.58s/it] +2025-02-05 17:33:51 - ERROR - stderr - +2025-02-05 17:33:51 - ERROR - stderr - +2025-02-05 17:33:51 - INFO - stdout - {'loss': 0.9546, 'grad_norm': 0.9312584400177002, 'learning_rate': 1.662415777590172e-05, 'epoch': 0.87} +2025-02-05 17:33:51 - ERROR - stderr - 29%|██▉ | 6539/22434 [7:26:10<11:23:59, 2.58s/it] +2025-02-05 17:33:53 - ERROR - stderr - 29%|██▉ | 6540/22434 [7:26:13<11:12:15, 2.54s/it] +2025-02-05 17:33:53 - ERROR - stderr - +2025-02-05 17:33:53 - ERROR - stderr - +2025-02-05 17:33:53 - INFO - stdout - {'loss': 0.989, 'grad_norm': 1.1156271696090698, 'learning_rate': 1.6623076143727933e-05, 'epoch': 0.87} +2025-02-05 17:33:53 - ERROR - stderr - 29%|██▉ | 6540/22434 [7:26:13<11:12:15, 2.54s/it] +2025-02-05 17:33:55 - ERROR - stderr - 29%|██▉ | 6541/22434 [7:26:15<11:08:51, 2.53s/it] +2025-02-05 17:33:56 - ERROR - stderr - +2025-02-05 17:33:56 - ERROR - stderr - +2025-02-05 17:33:56 - INFO - stdout - {'loss': 0.9606, 'grad_norm': 1.104783296585083, 'learning_rate': 1.6621994373502463e-05, 'epoch': 0.87} +2025-02-05 17:33:56 - ERROR - stderr - 29%|██▉ | 6541/22434 [7:26:15<11:08:51, 2.53s/it] +2025-02-05 17:33:58 - ERROR - stderr - 29%|██▉ | 6542/22434 [7:26:18<11:18:24, 2.56s/it] +2025-02-05 17:33:58 - ERROR - stderr - +2025-02-05 17:33:58 - ERROR - stderr - +2025-02-05 17:33:58 - INFO - stdout - {'loss': 1.0008, 'grad_norm': 0.9976694583892822, 'learning_rate': 1.6620912465247857e-05, 'epoch': 0.87} +2025-02-05 17:33:58 - ERROR - stderr - 29%|██▉ | 6542/22434 [7:26:18<11:18:24, 2.56s/it] +2025-02-05 17:34:01 - ERROR - stderr - 29%|██▉ | 6543/22434 
[7:26:20<11:13:20, 2.54s/it] +2025-02-05 17:34:01 - ERROR - stderr - +2025-02-05 17:34:01 - ERROR - stderr - +2025-02-05 17:34:01 - INFO - stdout - {'loss': 0.9588, 'grad_norm': 0.9949148297309875, 'learning_rate': 1.6619830418986665e-05, 'epoch': 0.87} +2025-02-05 17:34:01 - ERROR - stderr - 29%|██▉ | 6543/22434 [7:26:20<11:13:20, 2.54s/it] +2025-02-05 17:34:03 - ERROR - stderr - 29%|██▉ | 6544/22434 [7:26:23<11:08:04, 2.52s/it] +2025-02-05 17:34:03 - ERROR - stderr - +2025-02-05 17:34:03 - ERROR - stderr - +2025-02-05 17:34:03 - INFO - stdout - {'loss': 0.9878, 'grad_norm': 1.0580958127975464, 'learning_rate': 1.661874823474144e-05, 'epoch': 0.88} +2025-02-05 17:34:03 - ERROR - stderr - 29%|██▉ | 6544/22434 [7:26:23<11:08:04, 2.52s/it] +2025-02-05 17:34:06 - ERROR - stderr - 29%|██▉ | 6545/22434 [7:26:25<11:12:11, 2.54s/it] +2025-02-05 17:34:06 - ERROR - stderr - +2025-02-05 17:34:06 - ERROR - stderr - +2025-02-05 17:34:06 - INFO - stdout - {'loss': 0.9061, 'grad_norm': 0.9579415917396545, 'learning_rate': 1.6617665912534746e-05, 'epoch': 0.88} +2025-02-05 17:34:06 - ERROR - stderr - 29%|██▉ | 6545/22434 [7:26:25<11:12:11, 2.54s/it] +2025-02-05 17:34:08 - ERROR - stderr - 29%|██▉ | 6546/22434 [7:26:28<11:12:01, 2.54s/it] +2025-02-05 17:34:08 - ERROR - stderr - +2025-02-05 17:34:08 - ERROR - stderr - +2025-02-05 17:34:08 - INFO - stdout - {'loss': 0.998, 'grad_norm': 1.2294880151748657, 'learning_rate': 1.661658345238914e-05, 'epoch': 0.88} +2025-02-05 17:34:08 - ERROR - stderr - 29%|██▉ | 6546/22434 [7:26:28<11:12:01, 2.54s/it] +2025-02-05 17:34:11 - ERROR - stderr - 29%|██▉ | 6547/22434 [7:26:31<11:13:02, 2.54s/it] +2025-02-05 17:34:11 - ERROR - stderr - +2025-02-05 17:34:11 - ERROR - stderr - +2025-02-05 17:34:11 - INFO - stdout - {'loss': 0.9151, 'grad_norm': 1.0071120262145996, 'learning_rate': 1.661550085432718e-05, 'epoch': 0.88} +2025-02-05 17:34:11 - ERROR - stderr - 29%|██▉ | 6547/22434 [7:26:31<11:13:02, 2.54s/it] +2025-02-05 17:34:13 - ERROR - stderr - 
29%|██▉ | 6548/22434 [7:26:33<11:14:32, 2.55s/it] +2025-02-05 17:34:13 - ERROR - stderr - +2025-02-05 17:34:13 - ERROR - stderr - +2025-02-05 17:34:13 - INFO - stdout - {'loss': 0.986, 'grad_norm': 1.0598218441009521, 'learning_rate': 1.6614418118371435e-05, 'epoch': 0.88} +2025-02-05 17:34:13 - ERROR - stderr - 29%|██▉ | 6548/22434 [7:26:33<11:14:32, 2.55s/it] +2025-02-05 17:34:16 - ERROR - stderr - 29%|██▉ | 6549/22434 [7:26:36<11:12:51, 2.54s/it] +2025-02-05 17:34:16 - ERROR - stderr - +2025-02-05 17:34:16 - ERROR - stderr - +2025-02-05 17:34:16 - INFO - stdout - {'loss': 0.9356, 'grad_norm': 1.0410268306732178, 'learning_rate': 1.661333524454447e-05, 'epoch': 0.88} +2025-02-05 17:34:16 - ERROR - stderr - 29%|██▉ | 6549/22434 [7:26:36<11:12:51, 2.54s/it] +2025-02-05 17:34:18 - ERROR - stderr - 29%|██▉ | 6550/22434 [7:26:38<11:12:39, 2.54s/it] +2025-02-05 17:34:18 - ERROR - stderr - +2025-02-05 17:34:18 - ERROR - stderr - +2025-02-05 17:34:18 - INFO - stdout - {'loss': 1.0216, 'grad_norm': 1.0316548347473145, 'learning_rate': 1.6612252232868868e-05, 'epoch': 0.88} +2025-02-05 17:34:18 - ERROR - stderr - 29%|██▉ | 6550/22434 [7:26:38<11:12:39, 2.54s/it] +2025-02-05 17:34:21 - ERROR - stderr - 29%|██▉ | 6551/22434 [7:26:41<11:06:31, 2.52s/it] +2025-02-05 17:34:21 - ERROR - stderr - +2025-02-05 17:34:21 - ERROR - stderr - +2025-02-05 17:34:21 - INFO - stdout - {'loss': 0.9016, 'grad_norm': 1.0013291835784912, 'learning_rate': 1.6611169083367188e-05, 'epoch': 0.88} +2025-02-05 17:34:21 - ERROR - stderr - 29%|██▉ | 6551/22434 [7:26:41<11:06:31, 2.52s/it] +2025-02-05 17:34:23 - ERROR - stderr - 29%|██▉ | 6552/22434 [7:26:43<11:03:53, 2.51s/it] +2025-02-05 17:34:23 - ERROR - stderr - +2025-02-05 17:34:23 - ERROR - stderr - +2025-02-05 17:34:23 - INFO - stdout - {'loss': 0.9127, 'grad_norm': 0.9989796280860901, 'learning_rate': 1.6610085796062022e-05, 'epoch': 0.88} +2025-02-05 17:34:23 - ERROR - stderr - 29%|██▉ | 6552/22434 [7:26:43<11:03:53, 2.51s/it] +2025-02-05 
17:34:26 - ERROR - stderr - 29%|██▉ | 6553/22434 [7:26:46<11:02:54, 2.50s/it] +2025-02-05 17:34:26 - ERROR - stderr - +2025-02-05 17:34:26 - ERROR - stderr - +2025-02-05 17:34:26 - INFO - stdout - {'loss': 0.8754, 'grad_norm': 1.116627812385559, 'learning_rate': 1.6609002370975937e-05, 'epoch': 0.88} +2025-02-05 17:34:26 - ERROR - stderr - 29%|██▉ | 6553/22434 [7:26:46<11:02:54, 2.50s/it] +2025-02-05 17:34:28 - ERROR - stderr - 29%|██▉ | 6554/22434 [7:26:48<10:59:38, 2.49s/it] +2025-02-05 17:34:28 - ERROR - stderr - +2025-02-05 17:34:28 - ERROR - stderr - +2025-02-05 17:34:28 - INFO - stdout - {'loss': 0.957, 'grad_norm': 1.0261844396591187, 'learning_rate': 1.6607918808131526e-05, 'epoch': 0.88} +2025-02-05 17:34:28 - ERROR - stderr - 29%|██▉ | 6554/22434 [7:26:48<10:59:38, 2.49s/it] +2025-02-05 17:34:31 - ERROR - stderr - 29%|██▉ | 6555/22434 [7:26:51<10:59:41, 2.49s/it] +2025-02-05 17:34:31 - ERROR - stderr - +2025-02-05 17:34:31 - ERROR - stderr - +2025-02-05 17:34:31 - INFO - stdout - {'loss': 0.9624, 'grad_norm': 1.0615718364715576, 'learning_rate': 1.6606835107551365e-05, 'epoch': 0.88} +2025-02-05 17:34:31 - ERROR - stderr - 29%|██▉ | 6555/22434 [7:26:51<10:59:41, 2.49s/it] +2025-02-05 17:34:34 - ERROR - stderr - 29%|██▉ | 6556/22434 [7:26:54<11:37:00, 2.63s/it] +2025-02-05 17:34:34 - ERROR - stderr - +2025-02-05 17:34:34 - ERROR - stderr - +2025-02-05 17:34:34 - INFO - stdout - {'loss': 1.0075, 'grad_norm': 1.0716177225112915, 'learning_rate': 1.6605751269258054e-05, 'epoch': 0.88} +2025-02-05 17:34:34 - ERROR - stderr - 29%|██▉ | 6556/22434 [7:26:54<11:37:00, 2.63s/it] +2025-02-05 17:34:36 - ERROR - stderr - 29%|██▉ | 6557/22434 [7:26:56<11:31:57, 2.61s/it] +2025-02-05 17:34:36 - ERROR - stderr - +2025-02-05 17:34:36 - ERROR - stderr - +2025-02-05 17:34:36 - INFO - stdout - {'loss': 0.8836, 'grad_norm': 1.0088770389556885, 'learning_rate': 1.6604667293274174e-05, 'epoch': 0.88} +2025-02-05 17:34:36 - ERROR - stderr - 29%|██▉ | 6557/22434 
[7:26:56<11:31:57, 2.61s/it] +2025-02-05 17:34:39 - ERROR - stderr - 29%|██▉ | 6558/22434 [7:26:59<11:25:54, 2.59s/it] +2025-02-05 17:34:39 - ERROR - stderr - +2025-02-05 17:34:39 - ERROR - stderr - +2025-02-05 17:34:39 - INFO - stdout - {'loss': 0.9297, 'grad_norm': 0.9648098349571228, 'learning_rate': 1.6603583179622327e-05, 'epoch': 0.88} +2025-02-05 17:34:39 - ERROR - stderr - 29%|██▉ | 6558/22434 [7:26:59<11:25:54, 2.59s/it] +2025-02-05 17:34:41 - ERROR - stderr - 29%|██▉ | 6559/22434 [7:27:01<11:18:32, 2.56s/it] +2025-02-05 17:34:41 - ERROR - stderr - +2025-02-05 17:34:41 - ERROR - stderr - +2025-02-05 17:34:41 - INFO - stdout - {'loss': 0.8852, 'grad_norm': 1.0674529075622559, 'learning_rate': 1.6602498928325105e-05, 'epoch': 0.88} +2025-02-05 17:34:41 - ERROR - stderr - 29%|██▉ | 6559/22434 [7:27:01<11:18:32, 2.56s/it] +2025-02-05 17:34:44 - ERROR - stderr - 29%|██▉ | 6560/22434 [7:27:04<11:39:15, 2.64s/it] +2025-02-05 17:34:44 - ERROR - stderr - +2025-02-05 17:34:44 - ERROR - stderr - +2025-02-05 17:34:44 - INFO - stdout - {'loss': 0.9268, 'grad_norm': 0.9591109156608582, 'learning_rate': 1.6601414539405114e-05, 'epoch': 0.88} +2025-02-05 17:34:44 - ERROR - stderr - 29%|██▉ | 6560/22434 [7:27:04<11:39:15, 2.64s/it] +2025-02-05 17:34:47 - ERROR - stderr - 29%|██▉ | 6561/22434 [7:27:06<11:24:34, 2.59s/it] +2025-02-05 17:34:47 - ERROR - stderr - +2025-02-05 17:34:47 - ERROR - stderr - +2025-02-05 17:34:47 - INFO - stdout - {'loss': 0.9684, 'grad_norm': 1.0160752534866333, 'learning_rate': 1.660033001288495e-05, 'epoch': 0.88} +2025-02-05 17:34:47 - ERROR - stderr - 29%|██▉ | 6561/22434 [7:27:06<11:24:34, 2.59s/it] +2025-02-05 17:34:49 - ERROR - stderr - 29%|██▉ | 6562/22434 [7:27:09<11:14:09, 2.55s/it] +2025-02-05 17:34:49 - ERROR - stderr - +2025-02-05 17:34:49 - ERROR - stderr - +2025-02-05 17:34:49 - INFO - stdout - {'loss': 1.0523, 'grad_norm': 1.109384298324585, 'learning_rate': 1.659924534878723e-05, 'epoch': 0.88} +2025-02-05 17:34:49 - ERROR - stderr 
- 29%|██▉ | 6562/22434 [7:27:09<11:14:09, 2.55s/it] +2025-02-05 17:34:52 - ERROR - stderr - 29%|██▉ | 6563/22434 [7:27:11<11:11:46, 2.54s/it] +2025-02-05 17:34:52 - ERROR - stderr - +2025-02-05 17:34:52 - ERROR - stderr - +2025-02-05 17:34:52 - INFO - stdout - {'loss': 0.9346, 'grad_norm': 0.9370120763778687, 'learning_rate': 1.659816054713455e-05, 'epoch': 0.88} +2025-02-05 17:34:52 - ERROR - stderr - 29%|██▉ | 6563/22434 [7:27:11<11:11:46, 2.54s/it] +2025-02-05 17:34:54 - ERROR - stderr - 29%|██▉ | 6564/22434 [7:27:14<11:10:38, 2.54s/it] +2025-02-05 17:34:54 - ERROR - stderr - +2025-02-05 17:34:54 - ERROR - stderr - +2025-02-05 17:34:54 - INFO - stdout - {'loss': 0.8721, 'grad_norm': 1.0478984117507935, 'learning_rate': 1.6597075607949525e-05, 'epoch': 0.88} +2025-02-05 17:34:54 - ERROR - stderr - 29%|██▉ | 6564/22434 [7:27:14<11:10:38, 2.54s/it] +2025-02-05 17:34:57 - ERROR - stderr - 29%|██▉ | 6565/22434 [7:27:16<11:10:53, 2.54s/it] +2025-02-05 17:34:57 - ERROR - stderr - +2025-02-05 17:34:57 - ERROR - stderr - +2025-02-05 17:34:57 - INFO - stdout - {'loss': 0.9286, 'grad_norm': 0.9581248164176941, 'learning_rate': 1.6595990531254776e-05, 'epoch': 0.88} +2025-02-05 17:34:57 - ERROR - stderr - 29%|██▉ | 6565/22434 [7:27:17<11:10:53, 2.54s/it] +2025-02-05 17:34:59 - ERROR - stderr - 29%|██▉ | 6566/22434 [7:27:19<11:04:24, 2.51s/it] +2025-02-05 17:34:59 - ERROR - stderr - +2025-02-05 17:34:59 - ERROR - stderr - +2025-02-05 17:34:59 - INFO - stdout - {'loss': 0.8923, 'grad_norm': 0.9890875220298767, 'learning_rate': 1.6594905317072916e-05, 'epoch': 0.88} +2025-02-05 17:34:59 - ERROR - stderr - 29%|██▉ | 6566/22434 [7:27:19<11:04:24, 2.51s/it] +2025-02-05 17:35:02 - ERROR - stderr - 29%|██▉ | 6567/22434 [7:27:21<11:05:37, 2.52s/it] +2025-02-05 17:35:02 - ERROR - stderr - +2025-02-05 17:35:02 - ERROR - stderr - +2025-02-05 17:35:02 - INFO - stdout - {'loss': 0.8655, 'grad_norm': 0.9938724040985107, 'learning_rate': 1.6593819965426563e-05, 'epoch': 0.88} +2025-02-05 
17:35:02 - ERROR - stderr - 29%|██▉ | 6567/22434 [7:27:21<11:05:37, 2.52s/it] +2025-02-05 17:35:04 - ERROR - stderr - 29%|██▉ | 6568/22434 [7:27:24<11:09:49, 2.53s/it] +2025-02-05 17:35:04 - ERROR - stderr - +2025-02-05 17:35:04 - ERROR - stderr - +2025-02-05 17:35:04 - INFO - stdout - {'loss': 0.9498, 'grad_norm': 1.1764945983886719, 'learning_rate': 1.6592734476338344e-05, 'epoch': 0.88} +2025-02-05 17:35:04 - ERROR - stderr - 29%|██▉ | 6568/22434 [7:27:24<11:09:49, 2.53s/it] +2025-02-05 17:35:07 - ERROR - stderr - 29%|██▉ | 6569/22434 [7:27:27<11:07:31, 2.52s/it] +2025-02-05 17:35:07 - ERROR - stderr - +2025-02-05 17:35:07 - ERROR - stderr - +2025-02-05 17:35:07 - INFO - stdout - {'loss': 0.8936, 'grad_norm': 1.0022259950637817, 'learning_rate': 1.659164884983088e-05, 'epoch': 0.88} +2025-02-05 17:35:07 - ERROR - stderr - 29%|██▉ | 6569/22434 [7:27:27<11:07:31, 2.52s/it] +2025-02-05 17:35:09 - ERROR - stderr - 29%|██▉ | 6570/22434 [7:27:29<11:04:11, 2.51s/it] +2025-02-05 17:35:09 - ERROR - stderr - +2025-02-05 17:35:09 - ERROR - stderr - +2025-02-05 17:35:09 - INFO - stdout - {'loss': 0.9314, 'grad_norm': 1.003389835357666, 'learning_rate': 1.659056308592681e-05, 'epoch': 0.88} +2025-02-05 17:35:09 - ERROR - stderr - 29%|██▉ | 6570/22434 [7:27:29<11:04:11, 2.51s/it] +2025-02-05 17:35:12 - ERROR - stderr - 29%|██▉ | 6571/22434 [7:27:32<11:07:35, 2.53s/it] +2025-02-05 17:35:12 - ERROR - stderr - +2025-02-05 17:35:12 - ERROR - stderr - +2025-02-05 17:35:12 - INFO - stdout - {'loss': 1.0215, 'grad_norm': 0.9841601252555847, 'learning_rate': 1.6589477184648752e-05, 'epoch': 0.88} +2025-02-05 17:35:12 - ERROR - stderr - 29%|██▉ | 6571/22434 [7:27:32<11:07:35, 2.53s/it] +2025-02-05 17:35:14 - ERROR - stderr - 29%|██▉ | 6572/22434 [7:27:34<11:18:48, 2.57s/it] +2025-02-05 17:35:14 - ERROR - stderr - +2025-02-05 17:35:14 - ERROR - stderr - +2025-02-05 17:35:14 - INFO - stdout - {'loss': 0.9147, 'grad_norm': 1.0515648126602173, 'learning_rate': 1.658839114601935e-05, 
'epoch': 0.88} +2025-02-05 17:35:14 - ERROR - stderr - 29%|██▉ | 6572/22434 [7:27:34<11:18:48, 2.57s/it] +2025-02-05 17:35:17 - ERROR - stderr - 29%|██▉ | 6573/22434 [7:27:37<11:11:31, 2.54s/it] +2025-02-05 17:35:17 - ERROR - stderr - +2025-02-05 17:35:17 - ERROR - stderr - +2025-02-05 17:35:17 - INFO - stdout - {'loss': 0.9117, 'grad_norm': 1.1183761358261108, 'learning_rate': 1.658730497006124e-05, 'epoch': 0.88} +2025-02-05 17:35:17 - ERROR - stderr - 29%|██▉ | 6573/22434 [7:27:37<11:11:31, 2.54s/it] +2025-02-05 17:35:19 - ERROR - stderr - 29%|██▉ | 6574/22434 [7:27:39<11:02:50, 2.51s/it] +2025-02-05 17:35:19 - ERROR - stderr - +2025-02-05 17:35:19 - ERROR - stderr - +2025-02-05 17:35:19 - INFO - stdout - {'loss': 1.0543, 'grad_norm': 1.11531400680542, 'learning_rate': 1.658621865679706e-05, 'epoch': 0.88} +2025-02-05 17:35:19 - ERROR - stderr - 29%|██▉ | 6574/22434 [7:27:39<11:02:50, 2.51s/it] +2025-02-05 17:35:22 - ERROR - stderr - 29%|██▉ | 6575/22434 [7:27:42<11:02:52, 2.51s/it] +2025-02-05 17:35:22 - ERROR - stderr - +2025-02-05 17:35:22 - ERROR - stderr - +2025-02-05 17:35:22 - INFO - stdout - {'loss': 0.9579, 'grad_norm': 0.946401834487915, 'learning_rate': 1.6585132206249455e-05, 'epoch': 0.88} +2025-02-05 17:35:22 - ERROR - stderr - 29%|██▉ | 6575/22434 [7:27:42<11:02:52, 2.51s/it] +2025-02-05 17:35:24 - ERROR - stderr - 29%|██▉ | 6576/22434 [7:27:44<11:01:52, 2.50s/it] +2025-02-05 17:35:24 - ERROR - stderr - +2025-02-05 17:35:24 - ERROR - stderr - +2025-02-05 17:35:24 - INFO - stdout - {'loss': 1.0309, 'grad_norm': 1.1686761379241943, 'learning_rate': 1.658404561844107e-05, 'epoch': 0.88} +2025-02-05 17:35:24 - ERROR - stderr - 29%|██▉ | 6576/22434 [7:27:44<11:01:52, 2.50s/it] +2025-02-05 17:35:27 - ERROR - stderr - 29%|██▉ | 6577/22434 [7:27:47<10:58:43, 2.49s/it] +2025-02-05 17:35:27 - ERROR - stderr - +2025-02-05 17:35:27 - ERROR - stderr - +2025-02-05 17:35:27 - INFO - stdout - {'loss': 0.7868, 'grad_norm': 1.0884144306182861, 'learning_rate': 
1.6582958893394556e-05, 'epoch': 0.88} +2025-02-05 17:35:27 - ERROR - stderr - 29%|██▉ | 6577/22434 [7:27:47<10:58:43, 2.49s/it] +2025-02-05 17:35:29 - ERROR - stderr - 29%|██▉ | 6578/22434 [7:27:49<10:56:02, 2.48s/it] +2025-02-05 17:35:29 - ERROR - stderr - +2025-02-05 17:35:29 - ERROR - stderr - +2025-02-05 17:35:29 - INFO - stdout - {'loss': 0.9171, 'grad_norm': 1.0776336193084717, 'learning_rate': 1.6581872031132565e-05, 'epoch': 0.88} +2025-02-05 17:35:29 - ERROR - stderr - 29%|██▉ | 6578/22434 [7:27:49<10:56:02, 2.48s/it] +2025-02-05 17:35:32 - ERROR - stderr - 29%|██▉ | 6579/22434 [7:27:52<10:56:08, 2.48s/it] +2025-02-05 17:35:32 - ERROR - stderr - +2025-02-05 17:35:32 - ERROR - stderr - +2025-02-05 17:35:32 - INFO - stdout - {'loss': 0.9102, 'grad_norm': 1.0471206903457642, 'learning_rate': 1.6580785031677743e-05, 'epoch': 0.88} +2025-02-05 17:35:32 - ERROR - stderr - 29%|██▉ | 6579/22434 [7:27:52<10:56:08, 2.48s/it] +2025-02-05 17:35:34 - ERROR - stderr - 29%|██▉ | 6580/22434 [7:27:54<10:57:44, 2.49s/it] +2025-02-05 17:35:34 - ERROR - stderr - +2025-02-05 17:35:34 - ERROR - stderr - +2025-02-05 17:35:34 - INFO - stdout - {'loss': 0.9495, 'grad_norm': 1.0015294551849365, 'learning_rate': 1.6579697895052758e-05, 'epoch': 0.88} +2025-02-05 17:35:34 - ERROR - stderr - 29%|██▉ | 6580/22434 [7:27:54<10:57:44, 2.49s/it] +2025-02-05 17:35:37 - ERROR - stderr - 29%|██▉ | 6581/22434 [7:27:57<10:58:27, 2.49s/it] +2025-02-05 17:35:37 - ERROR - stderr - +2025-02-05 17:35:37 - ERROR - stderr - +2025-02-05 17:35:37 - INFO - stdout - {'loss': 0.9573, 'grad_norm': 1.0121917724609375, 'learning_rate': 1.6578610621280267e-05, 'epoch': 0.88} +2025-02-05 17:35:37 - ERROR - stderr - 29%|██▉ | 6581/22434 [7:27:57<10:58:27, 2.49s/it] +2025-02-05 17:35:39 - ERROR - stderr - 29%|██▉ | 6582/22434 [7:27:59<10:54:23, 2.48s/it] +2025-02-05 17:35:39 - ERROR - stderr - +2025-02-05 17:35:39 - ERROR - stderr - +2025-02-05 17:35:39 - INFO - stdout - {'loss': 0.9368, 'grad_norm': 
0.9298769235610962, 'learning_rate': 1.6577523210382935e-05, 'epoch': 0.88} +2025-02-05 17:35:39 - ERROR - stderr - 29%|██▉ | 6582/22434 [7:27:59<10:54:23, 2.48s/it] +2025-02-05 17:35:42 - ERROR - stderr - 29%|██▉ | 6583/22434 [7:28:02<11:06:43, 2.52s/it] +2025-02-05 17:35:42 - ERROR - stderr - +2025-02-05 17:35:42 - ERROR - stderr - +2025-02-05 17:35:42 - INFO - stdout - {'loss': 0.9755, 'grad_norm': 1.111396074295044, 'learning_rate': 1.657643566238342e-05, 'epoch': 0.88} +2025-02-05 17:35:42 - ERROR - stderr - 29%|██▉ | 6583/22434 [7:28:02<11:06:43, 2.52s/it] +2025-02-05 17:35:45 - ERROR - stderr - 29%|██▉ | 6584/22434 [7:28:05<11:40:29, 2.65s/it] +2025-02-05 17:35:45 - ERROR - stderr - +2025-02-05 17:35:45 - ERROR - stderr - +2025-02-05 17:35:45 - INFO - stdout - {'loss': 0.897, 'grad_norm': 0.9899436235427856, 'learning_rate': 1.6575347977304398e-05, 'epoch': 0.88} +2025-02-05 17:35:45 - ERROR - stderr - 29%|██▉ | 6584/22434 [7:28:05<11:40:29, 2.65s/it] +2025-02-05 17:35:47 - ERROR - stderr - 29%|██▉ | 6585/22434 [7:28:07<11:25:02, 2.59s/it] +2025-02-05 17:35:47 - ERROR - stderr - +2025-02-05 17:35:47 - ERROR - stderr - +2025-02-05 17:35:47 - INFO - stdout - {'loss': 0.8432, 'grad_norm': 1.1105124950408936, 'learning_rate': 1.657426015516854e-05, 'epoch': 0.88} +2025-02-05 17:35:47 - ERROR - stderr - 29%|██▉ | 6585/22434 [7:28:07<11:25:02, 2.59s/it] +2025-02-05 17:35:50 - ERROR - stderr - 29%|██▉ | 6586/22434 [7:28:10<11:19:07, 2.57s/it] +2025-02-05 17:35:50 - ERROR - stderr - +2025-02-05 17:35:50 - ERROR - stderr - +2025-02-05 17:35:50 - INFO - stdout - {'loss': 1.05, 'grad_norm': 1.1892081499099731, 'learning_rate': 1.657317219599852e-05, 'epoch': 0.88} +2025-02-05 17:35:50 - ERROR - stderr - 29%|██▉ | 6586/22434 [7:28:10<11:19:07, 2.57s/it] +2025-02-05 17:35:52 - ERROR - stderr - 29%|██▉ | 6587/22434 [7:28:12<11:15:49, 2.56s/it] +2025-02-05 17:35:52 - ERROR - stderr - +2025-02-05 17:35:52 - ERROR - stderr - +2025-02-05 17:35:52 - INFO - stdout - {'loss': 
0.9101, 'grad_norm': 0.9492790699005127, 'learning_rate': 1.657208409981702e-05, 'epoch': 0.88} +2025-02-05 17:35:52 - ERROR - stderr - 29%|██▉ | 6587/22434 [7:28:12<11:15:49, 2.56s/it] +2025-02-05 17:35:55 - ERROR - stderr - 29%|██▉ | 6588/22434 [7:28:15<11:11:06, 2.54s/it] +2025-02-05 17:35:55 - ERROR - stderr - +2025-02-05 17:35:55 - ERROR - stderr - +2025-02-05 17:35:55 - INFO - stdout - {'loss': 0.8026, 'grad_norm': 1.0121068954467773, 'learning_rate': 1.6570995866646707e-05, 'epoch': 0.88} +2025-02-05 17:35:55 - ERROR - stderr - 29%|██▉ | 6588/22434 [7:28:15<11:11:06, 2.54s/it] +2025-02-05 17:35:57 - ERROR - stderr - 29%|██▉ | 6589/22434 [7:28:17<11:02:55, 2.51s/it] +2025-02-05 17:35:57 - ERROR - stderr - +2025-02-05 17:35:57 - ERROR - stderr - +2025-02-05 17:35:57 - INFO - stdout - {'loss': 0.8982, 'grad_norm': 1.101181983947754, 'learning_rate': 1.656990749651028e-05, 'epoch': 0.88} +2025-02-05 17:35:57 - ERROR - stderr - 29%|██▉ | 6589/22434 [7:28:17<11:02:55, 2.51s/it] +2025-02-05 17:36:00 - ERROR - stderr - 29%|██▉ | 6590/22434 [7:28:20<11:02:41, 2.51s/it] +2025-02-05 17:36:00 - ERROR - stderr - +2025-02-05 17:36:00 - ERROR - stderr - +2025-02-05 17:36:00 - INFO - stdout - {'loss': 0.9951, 'grad_norm': 1.0665388107299805, 'learning_rate': 1.6568818989430416e-05, 'epoch': 0.88} +2025-02-05 17:36:00 - ERROR - stderr - 29%|██▉ | 6590/22434 [7:28:20<11:02:41, 2.51s/it] +2025-02-05 17:36:02 - ERROR - stderr - 29%|██▉ | 6591/22434 [7:28:22<11:03:38, 2.51s/it] +2025-02-05 17:36:02 - ERROR - stderr - +2025-02-05 17:36:02 - ERROR - stderr - +2025-02-05 17:36:02 - INFO - stdout - {'loss': 0.8333, 'grad_norm': 1.0103232860565186, 'learning_rate': 1.6567730345429803e-05, 'epoch': 0.88} +2025-02-05 17:36:02 - ERROR - stderr - 29%|██▉ | 6591/22434 [7:28:22<11:03:38, 2.51s/it] +2025-02-05 17:36:05 - ERROR - stderr - 29%|██▉ | 6592/22434 [7:28:25<11:05:30, 2.52s/it] +2025-02-05 17:36:05 - ERROR - stderr - +2025-02-05 17:36:05 - ERROR - stderr - +2025-02-05 17:36:05 - 
INFO - stdout - {'loss': 0.8986, 'grad_norm': 1.0503722429275513, 'learning_rate': 1.656664156453114e-05, 'epoch': 0.88} +2025-02-05 17:36:05 - ERROR - stderr - 29%|██▉ | 6592/22434 [7:28:25<11:05:30, 2.52s/it] +2025-02-05 17:36:07 - ERROR - stderr - 29%|██▉ | 6593/22434 [7:28:27<11:03:20, 2.51s/it] +2025-02-05 17:36:07 - ERROR - stderr - +2025-02-05 17:36:07 - ERROR - stderr - +2025-02-05 17:36:07 - INFO - stdout - {'loss': 0.9581, 'grad_norm': 1.1477293968200684, 'learning_rate': 1.6565552646757114e-05, 'epoch': 0.88} +2025-02-05 17:36:07 - ERROR - stderr - 29%|██▉ | 6593/22434 [7:28:27<11:03:20, 2.51s/it] +2025-02-05 17:36:10 - ERROR - stderr - 29%|██▉ | 6594/22434 [7:28:30<11:01:43, 2.51s/it] +2025-02-05 17:36:10 - ERROR - stderr - +2025-02-05 17:36:10 - ERROR - stderr - +2025-02-05 17:36:10 - INFO - stdout - {'loss': 1.0245, 'grad_norm': 0.9949456453323364, 'learning_rate': 1.656446359213043e-05, 'epoch': 0.88} +2025-02-05 17:36:10 - ERROR - stderr - 29%|██▉ | 6594/22434 [7:28:30<11:01:43, 2.51s/it] +2025-02-05 17:36:12 - ERROR - stderr - 29%|██▉ | 6595/22434 [7:28:32<11:01:30, 2.51s/it] +2025-02-05 17:36:12 - ERROR - stderr - +2025-02-05 17:36:12 - ERROR - stderr - +2025-02-05 17:36:12 - INFO - stdout - {'loss': 0.9538, 'grad_norm': 1.1275508403778076, 'learning_rate': 1.656337440067378e-05, 'epoch': 0.88} +2025-02-05 17:36:12 - ERROR - stderr - 29%|██▉ | 6595/22434 [7:28:32<11:01:30, 2.51s/it] +2025-02-05 17:36:15 - ERROR - stderr - 29%|██▉ | 6596/22434 [7:28:35<11:00:59, 2.50s/it] +2025-02-05 17:36:15 - ERROR - stderr - +2025-02-05 17:36:15 - ERROR - stderr - +2025-02-05 17:36:15 - INFO - stdout - {'loss': 0.9261, 'grad_norm': 1.0195989608764648, 'learning_rate': 1.656228507240987e-05, 'epoch': 0.88} +2025-02-05 17:36:15 - ERROR - stderr - 29%|██▉ | 6596/22434 [7:28:35<11:00:59, 2.50s/it] +2025-02-05 17:36:18 - ERROR - stderr - 29%|██▉ | 6597/22434 [7:28:37<11:26:57, 2.60s/it] +2025-02-05 17:36:18 - ERROR - stderr - +2025-02-05 17:36:18 - ERROR - stderr - 
+2025-02-05 17:36:18 - INFO - stdout - {'loss': 0.9152, 'grad_norm': 1.0239284038543701, 'learning_rate': 1.6561195607361407e-05, 'epoch': 0.88} +2025-02-05 17:36:18 - ERROR - stderr - 29%|██▉ | 6597/22434 [7:28:37<11:26:57, 2.60s/it] +2025-02-05 17:36:20 - ERROR - stderr - 29%|██▉ | 6598/22434 [7:28:40<11:21:56, 2.58s/it] +2025-02-05 17:36:20 - ERROR - stderr - +2025-02-05 17:36:20 - ERROR - stderr - +2025-02-05 17:36:20 - INFO - stdout - {'loss': 0.8932, 'grad_norm': 0.939383327960968, 'learning_rate': 1.6560106005551106e-05, 'epoch': 0.88} +2025-02-05 17:36:20 - ERROR - stderr - 29%|██▉ | 6598/22434 [7:28:40<11:21:56, 2.58s/it] +2025-02-05 17:36:23 - ERROR - stderr - 29%|██▉ | 6599/22434 [7:28:42<11:11:21, 2.54s/it] +2025-02-05 17:36:23 - ERROR - stderr - +2025-02-05 17:36:23 - ERROR - stderr - +2025-02-05 17:36:23 - INFO - stdout - {'loss': 0.8921, 'grad_norm': 0.9758705496788025, 'learning_rate': 1.6559016267001667e-05, 'epoch': 0.88} +2025-02-05 17:36:23 - ERROR - stderr - 29%|██▉ | 6599/22434 [7:28:42<11:11:21, 2.54s/it] +2025-02-05 17:36:25 - ERROR - stderr - 29%|██▉ | 6600/22434 [7:28:45<11:11:46, 2.55s/it] +2025-02-05 17:36:25 - ERROR - stderr - +2025-02-05 17:36:25 - ERROR - stderr - +2025-02-05 17:36:25 - INFO - stdout - {'loss': 1.1565, 'grad_norm': 1.225216269493103, 'learning_rate': 1.655792639173581e-05, 'epoch': 0.88} +2025-02-05 17:36:25 - ERROR - stderr - 29%|██▉ | 6600/22434 [7:28:45<11:11:46, 2.55s/it] +2025-02-05 17:36:28 - ERROR - stderr - 29%|██▉ | 6601/22434 [7:28:48<11:13:36, 2.55s/it] +2025-02-05 17:36:28 - ERROR - stderr - +2025-02-05 17:36:28 - ERROR - stderr - +2025-02-05 17:36:28 - INFO - stdout - {'loss': 0.8809, 'grad_norm': 1.187839388847351, 'learning_rate': 1.6556836379776254e-05, 'epoch': 0.88} +2025-02-05 17:36:28 - ERROR - stderr - 29%|██▉ | 6601/22434 [7:28:48<11:13:36, 2.55s/it] +2025-02-05 17:36:30 - ERROR - stderr - 29%|██▉ | 6602/22434 [7:28:50<11:06:23, 2.53s/it] +2025-02-05 17:36:30 - ERROR - stderr - +2025-02-05 
17:36:30 - ERROR - stderr - +2025-02-05 17:36:30 - INFO - stdout - {'loss': 0.924, 'grad_norm': 1.1948819160461426, 'learning_rate': 1.655574623114572e-05, 'epoch': 0.88} +2025-02-05 17:36:30 - ERROR - stderr - 29%|██▉ | 6602/22434 [7:28:50<11:06:23, 2.53s/it] +2025-02-05 17:36:33 - ERROR - stderr - 29%|██▉ | 6603/22434 [7:28:52<11:03:29, 2.51s/it] +2025-02-05 17:36:33 - ERROR - stderr - +2025-02-05 17:36:33 - ERROR - stderr - +2025-02-05 17:36:33 - INFO - stdout - {'loss': 0.9221, 'grad_norm': 1.0240799188613892, 'learning_rate': 1.6554655945866926e-05, 'epoch': 0.88} +2025-02-05 17:36:33 - ERROR - stderr - 29%|██▉ | 6603/22434 [7:28:53<11:03:29, 2.51s/it] +2025-02-05 17:36:35 - ERROR - stderr - 29%|██▉ | 6604/22434 [7:28:55<11:07:06, 2.53s/it] +2025-02-05 17:36:35 - ERROR - stderr - +2025-02-05 17:36:35 - ERROR - stderr - +2025-02-05 17:36:35 - INFO - stdout - {'loss': 0.988, 'grad_norm': 1.087586760520935, 'learning_rate': 1.6553565523962602e-05, 'epoch': 0.88} +2025-02-05 17:36:35 - ERROR - stderr - 29%|██▉ | 6604/22434 [7:28:55<11:07:06, 2.53s/it] +2025-02-05 17:36:38 - ERROR - stderr - 29%|██▉ | 6605/22434 [7:28:58<11:06:29, 2.53s/it] +2025-02-05 17:36:38 - ERROR - stderr - +2025-02-05 17:36:38 - ERROR - stderr - +2025-02-05 17:36:38 - INFO - stdout - {'loss': 0.9485, 'grad_norm': 1.1838536262512207, 'learning_rate': 1.6552474965455475e-05, 'epoch': 0.88} +2025-02-05 17:36:38 - ERROR - stderr - 29%|██▉ | 6605/22434 [7:28:58<11:06:29, 2.53s/it] +2025-02-05 17:36:40 - ERROR - stderr - 29%|██▉ | 6606/22434 [7:29:00<11:07:04, 2.53s/it] +2025-02-05 17:36:40 - ERROR - stderr - +2025-02-05 17:36:40 - ERROR - stderr - +2025-02-05 17:36:40 - INFO - stdout - {'loss': 0.8552, 'grad_norm': 0.9590768218040466, 'learning_rate': 1.6551384270368277e-05, 'epoch': 0.88} +2025-02-05 17:36:40 - ERROR - stderr - 29%|██▉ | 6606/22434 [7:29:00<11:07:04, 2.53s/it] +2025-02-05 17:36:43 - ERROR - stderr - 29%|██▉ | 6607/22434 [7:29:02<10:57:49, 2.49s/it] +2025-02-05 17:36:43 - ERROR - 
stderr - +2025-02-05 17:36:43 - ERROR - stderr - +2025-02-05 17:36:43 - INFO - stdout - {'loss': 0.8386, 'grad_norm': 1.1438350677490234, 'learning_rate': 1.6550293438723745e-05, 'epoch': 0.88} +2025-02-05 17:36:43 - ERROR - stderr - 29%|██▉ | 6607/22434 [7:29:03<10:57:49, 2.49s/it] +2025-02-05 17:36:45 - ERROR - stderr - 29%|██▉ | 6608/22434 [7:29:05<10:54:09, 2.48s/it] +2025-02-05 17:36:45 - ERROR - stderr - +2025-02-05 17:36:45 - ERROR - stderr - +2025-02-05 17:36:45 - INFO - stdout - {'loss': 0.9223, 'grad_norm': 1.0866752862930298, 'learning_rate': 1.6549202470544613e-05, 'epoch': 0.88} +2025-02-05 17:36:45 - ERROR - stderr - 29%|██▉ | 6608/22434 [7:29:05<10:54:09, 2.48s/it] +2025-02-05 17:36:48 - ERROR - stderr - 29%|██▉ | 6609/22434 [7:29:07<10:51:45, 2.47s/it] +2025-02-05 17:36:48 - ERROR - stderr - +2025-02-05 17:36:48 - ERROR - stderr - +2025-02-05 17:36:48 - INFO - stdout - {'loss': 0.9169, 'grad_norm': 0.9984356760978699, 'learning_rate': 1.6548111365853623e-05, 'epoch': 0.88} +2025-02-05 17:36:48 - ERROR - stderr - 29%|██▉ | 6609/22434 [7:29:07<10:51:45, 2.47s/it] +2025-02-05 17:36:50 - ERROR - stderr - 29%|██▉ | 6610/22434 [7:29:10<10:53:20, 2.48s/it] +2025-02-05 17:36:50 - ERROR - stderr - +2025-02-05 17:36:50 - ERROR - stderr - +2025-02-05 17:36:50 - INFO - stdout - {'loss': 0.831, 'grad_norm': 1.0741022825241089, 'learning_rate': 1.654702012467352e-05, 'epoch': 0.88} +2025-02-05 17:36:50 - ERROR - stderr - 29%|██▉ | 6610/22434 [7:29:10<10:53:20, 2.48s/it] +2025-02-05 17:36:53 - ERROR - stderr - 29%|██▉ | 6611/22434 [7:29:12<10:55:22, 2.49s/it] +2025-02-05 17:36:53 - ERROR - stderr - +2025-02-05 17:36:53 - ERROR - stderr - +2025-02-05 17:36:53 - INFO - stdout - {'loss': 0.9227, 'grad_norm': 1.0321089029312134, 'learning_rate': 1.6545928747027044e-05, 'epoch': 0.88} +2025-02-05 17:36:53 - ERROR - stderr - 29%|██▉ | 6611/22434 [7:29:12<10:55:22, 2.49s/it] +2025-02-05 17:36:55 - ERROR - stderr - 29%|██▉ | 6612/22434 [7:29:15<10:57:31, 2.49s/it] 
+2025-02-05 17:36:55 - ERROR - stderr - +2025-02-05 17:36:55 - ERROR - stderr - +2025-02-05 17:36:55 - INFO - stdout - {'loss': 0.8206, 'grad_norm': 0.9451794624328613, 'learning_rate': 1.6544837232936946e-05, 'epoch': 0.88} +2025-02-05 17:36:55 - ERROR - stderr - 29%|██▉ | 6612/22434 [7:29:15<10:57:31, 2.49s/it] +2025-02-05 17:36:58 - ERROR - stderr - 29%|██▉ | 6613/22434 [7:29:17<10:52:38, 2.48s/it] +2025-02-05 17:36:58 - ERROR - stderr - +2025-02-05 17:36:58 - ERROR - stderr - +2025-02-05 17:36:58 - INFO - stdout - {'loss': 1.1484, 'grad_norm': 1.1911232471466064, 'learning_rate': 1.654374558242598e-05, 'epoch': 0.88} +2025-02-05 17:36:58 - ERROR - stderr - 29%|██▉ | 6613/22434 [7:29:17<10:52:38, 2.48s/it] +2025-02-05 17:37:00 - ERROR - stderr - 29%|██▉ | 6614/22434 [7:29:20<10:52:26, 2.47s/it] +2025-02-05 17:37:00 - ERROR - stderr - +2025-02-05 17:37:00 - ERROR - stderr - +2025-02-05 17:37:00 - INFO - stdout - {'loss': 0.8514, 'grad_norm': 0.9948623776435852, 'learning_rate': 1.65426537955169e-05, 'epoch': 0.88} +2025-02-05 17:37:00 - ERROR - stderr - 29%|██▉ | 6614/22434 [7:29:20<10:52:26, 2.47s/it] +2025-02-05 17:37:03 - ERROR - stderr - 29%|██▉ | 6615/22434 [7:29:23<11:29:32, 2.62s/it] +2025-02-05 17:37:03 - ERROR - stderr - +2025-02-05 17:37:03 - ERROR - stderr - +2025-02-05 17:37:03 - INFO - stdout - {'loss': 0.997, 'grad_norm': 1.0557894706726074, 'learning_rate': 1.654156187223246e-05, 'epoch': 0.88} +2025-02-05 17:37:03 - ERROR - stderr - 29%|██▉ | 6615/22434 [7:29:23<11:29:32, 2.62s/it] +2025-02-05 17:37:05 - ERROR - stderr - 29%|██▉ | 6616/22434 [7:29:25<11:20:58, 2.58s/it] +2025-02-05 17:37:06 - ERROR - stderr - +2025-02-05 17:37:06 - ERROR - stderr - +2025-02-05 17:37:06 - INFO - stdout - {'loss': 0.8912, 'grad_norm': 0.9317017197608948, 'learning_rate': 1.6540469812595424e-05, 'epoch': 0.88} +2025-02-05 17:37:06 - ERROR - stderr - 29%|██▉ | 6616/22434 [7:29:25<11:20:58, 2.58s/it] +2025-02-05 17:37:08 - ERROR - stderr - 29%|██▉ | 6617/22434 
[7:29:28<11:40:40, 2.66s/it] +2025-02-05 17:37:08 - ERROR - stderr - +2025-02-05 17:37:08 - ERROR - stderr - +2025-02-05 17:37:08 - INFO - stdout - {'loss': 0.9976, 'grad_norm': 1.1089012622833252, 'learning_rate': 1.6539377616628554e-05, 'epoch': 0.88} +2025-02-05 17:37:08 - ERROR - stderr - 29%|██▉ | 6617/22434 [7:29:28<11:40:40, 2.66s/it] +2025-02-05 17:37:11 - ERROR - stderr - 29%|██▉ | 6618/22434 [7:29:31<11:28:32, 2.61s/it] +2025-02-05 17:37:11 - ERROR - stderr - +2025-02-05 17:37:11 - ERROR - stderr - +2025-02-05 17:37:11 - INFO - stdout - {'loss': 0.954, 'grad_norm': 1.0861963033676147, 'learning_rate': 1.6538285284354615e-05, 'epoch': 0.88} +2025-02-05 17:37:11 - ERROR - stderr - 29%|██▉ | 6618/22434 [7:29:31<11:28:32, 2.61s/it] +2025-02-05 17:37:13 - ERROR - stderr - 30%|██▉ | 6619/22434 [7:29:33<11:24:46, 2.60s/it] +2025-02-05 17:37:13 - ERROR - stderr - +2025-02-05 17:37:13 - ERROR - stderr - +2025-02-05 17:37:13 - INFO - stdout - {'loss': 0.8918, 'grad_norm': 1.0418671369552612, 'learning_rate': 1.653719281579637e-05, 'epoch': 0.89} +2025-02-05 17:37:13 - ERROR - stderr - 30%|██▉ | 6619/22434 [7:29:33<11:24:46, 2.60s/it] +2025-02-05 17:37:16 - ERROR - stderr - 30%|██▉ | 6620/22434 [7:29:36<11:15:12, 2.56s/it] +2025-02-05 17:37:16 - ERROR - stderr - +2025-02-05 17:37:16 - ERROR - stderr - +2025-02-05 17:37:16 - INFO - stdout - {'loss': 0.9519, 'grad_norm': 1.0051528215408325, 'learning_rate': 1.6536100210976604e-05, 'epoch': 0.89} +2025-02-05 17:37:16 - ERROR - stderr - 30%|██▉ | 6620/22434 [7:29:36<11:15:12, 2.56s/it] +2025-02-05 17:37:18 - ERROR - stderr - 30%|██▉ | 6621/22434 [7:29:38<11:09:24, 2.54s/it] +2025-02-05 17:37:18 - ERROR - stderr - +2025-02-05 17:37:18 - ERROR - stderr - +2025-02-05 17:37:18 - INFO - stdout - {'loss': 1.0263, 'grad_norm': 1.0714529752731323, 'learning_rate': 1.653500746991808e-05, 'epoch': 0.89} +2025-02-05 17:37:18 - ERROR - stderr - 30%|██▉ | 6621/22434 [7:29:38<11:09:24, 2.54s/it] +2025-02-05 17:37:21 - ERROR - stderr 
- 30%|██▉ | 6622/22434 [7:29:41<11:05:09, 2.52s/it] +2025-02-05 17:37:21 - ERROR - stderr - +2025-02-05 17:37:21 - ERROR - stderr - +2025-02-05 17:37:21 - INFO - stdout - {'loss': 1.0005, 'grad_norm': 1.13709557056427, 'learning_rate': 1.6533914592643582e-05, 'epoch': 0.89} +2025-02-05 17:37:21 - ERROR - stderr - 30%|██▉ | 6622/22434 [7:29:41<11:05:09, 2.52s/it] +2025-02-05 17:37:23 - ERROR - stderr - 30%|██▉ | 6623/22434 [7:29:43<11:07:25, 2.53s/it] +2025-02-05 17:37:23 - ERROR - stderr - +2025-02-05 17:37:23 - ERROR - stderr - +2025-02-05 17:37:23 - INFO - stdout - {'loss': 1.0461, 'grad_norm': 1.1949125528335571, 'learning_rate': 1.6532821579175884e-05, 'epoch': 0.89} +2025-02-05 17:37:23 - ERROR - stderr - 30%|██▉ | 6623/22434 [7:29:43<11:07:25, 2.53s/it] +2025-02-05 17:37:26 - ERROR - stderr - 30%|██▉ | 6624/22434 [7:29:46<11:13:56, 2.56s/it] +2025-02-05 17:37:26 - ERROR - stderr - +2025-02-05 17:37:26 - ERROR - stderr - +2025-02-05 17:37:26 - INFO - stdout - {'loss': 0.933, 'grad_norm': 1.1206669807434082, 'learning_rate': 1.6531728429537766e-05, 'epoch': 0.89} +2025-02-05 17:37:26 - ERROR - stderr - 30%|██▉ | 6624/22434 [7:29:46<11:13:56, 2.56s/it] +2025-02-05 17:37:29 - ERROR - stderr - 30%|██▉ | 6625/22434 [7:29:48<11:13:37, 2.56s/it] +2025-02-05 17:37:29 - ERROR - stderr - +2025-02-05 17:37:29 - ERROR - stderr - +2025-02-05 17:37:29 - INFO - stdout - {'loss': 0.8734, 'grad_norm': 1.0955919027328491, 'learning_rate': 1.6530635143752028e-05, 'epoch': 0.89} +2025-02-05 17:37:29 - ERROR - stderr - 30%|██▉ | 6625/22434 [7:29:48<11:13:37, 2.56s/it] +2025-02-05 17:37:31 - ERROR - stderr - 30%|██▉ | 6626/22434 [7:29:51<11:08:29, 2.54s/it] +2025-02-05 17:37:31 - ERROR - stderr - +2025-02-05 17:37:31 - ERROR - stderr - +2025-02-05 17:37:31 - INFO - stdout - {'loss': 0.9513, 'grad_norm': 1.0365923643112183, 'learning_rate': 1.6529541721841444e-05, 'epoch': 0.89} +2025-02-05 17:37:31 - ERROR - stderr - 30%|██▉ | 6626/22434 [7:29:51<11:08:29, 2.54s/it] +2025-02-05 
17:37:34 - ERROR - stderr - 30%|██▉ | 6627/22434 [7:29:54<11:19:55, 2.58s/it] +2025-02-05 17:37:34 - ERROR - stderr - +2025-02-05 17:37:34 - ERROR - stderr - +2025-02-05 17:37:34 - INFO - stdout - {'loss': 0.9823, 'grad_norm': 1.0940231084823608, 'learning_rate': 1.6528448163828814e-05, 'epoch': 0.89} +2025-02-05 17:37:34 - ERROR - stderr - 30%|██▉ | 6627/22434 [7:29:54<11:19:55, 2.58s/it] +2025-02-05 17:37:36 - ERROR - stderr - 30%|██▉ | 6628/22434 [7:29:56<11:13:42, 2.56s/it] +2025-02-05 17:37:36 - ERROR - stderr - +2025-02-05 17:37:36 - ERROR - stderr - +2025-02-05 17:37:36 - INFO - stdout - {'loss': 0.9454, 'grad_norm': 1.0482330322265625, 'learning_rate': 1.6527354469736928e-05, 'epoch': 0.89} +2025-02-05 17:37:36 - ERROR - stderr - 30%|██▉ | 6628/22434 [7:29:56<11:13:42, 2.56s/it] +2025-02-05 17:37:39 - ERROR - stderr - 30%|██▉ | 6629/22434 [7:29:59<11:13:06, 2.56s/it] +2025-02-05 17:37:39 - ERROR - stderr - +2025-02-05 17:37:39 - ERROR - stderr - +2025-02-05 17:37:39 - INFO - stdout - {'loss': 0.9386, 'grad_norm': 1.0636597871780396, 'learning_rate': 1.6526260639588583e-05, 'epoch': 0.89} +2025-02-05 17:37:39 - ERROR - stderr - 30%|██▉ | 6629/22434 [7:29:59<11:13:06, 2.56s/it] +2025-02-05 17:37:41 - ERROR - stderr - 30%|██▉ | 6630/22434 [7:30:01<11:21:15, 2.59s/it] +2025-02-05 17:37:41 - ERROR - stderr - +2025-02-05 17:37:41 - ERROR - stderr - +2025-02-05 17:37:41 - INFO - stdout - {'loss': 0.8274, 'grad_norm': 1.0186216831207275, 'learning_rate': 1.652516667340658e-05, 'epoch': 0.89} +2025-02-05 17:37:41 - ERROR - stderr - 30%|██▉ | 6630/22434 [7:30:01<11:21:15, 2.59s/it] +2025-02-05 17:37:44 - ERROR - stderr - 30%|██▉ | 6631/22434 [7:30:04<11:15:20, 2.56s/it] +2025-02-05 17:37:44 - ERROR - stderr - +2025-02-05 17:37:44 - ERROR - stderr - +2025-02-05 17:37:44 - INFO - stdout - {'loss': 0.9656, 'grad_norm': 1.1662896871566772, 'learning_rate': 1.6524072571213724e-05, 'epoch': 0.89} +2025-02-05 17:37:44 - ERROR - stderr - 30%|██▉ | 6631/22434 
[7:30:04<11:15:20, 2.56s/it] +2025-02-05 17:37:46 - ERROR - stderr - 30%|██▉ | 6632/22434 [7:30:06<11:11:19, 2.55s/it] +2025-02-05 17:37:47 - ERROR - stderr - +2025-02-05 17:37:47 - ERROR - stderr - +2025-02-05 17:37:47 - INFO - stdout - {'loss': 1.0751, 'grad_norm': 1.1851017475128174, 'learning_rate': 1.6522978333032817e-05, 'epoch': 0.89} +2025-02-05 17:37:47 - ERROR - stderr - 30%|██▉ | 6632/22434 [7:30:06<11:11:19, 2.55s/it] +2025-02-05 17:37:49 - ERROR - stderr - 30%|██▉ | 6633/22434 [7:30:09<11:04:21, 2.52s/it] +2025-02-05 17:37:49 - ERROR - stderr - +2025-02-05 17:37:49 - ERROR - stderr - +2025-02-05 17:37:49 - INFO - stdout - {'loss': 0.9998, 'grad_norm': 1.0155028104782104, 'learning_rate': 1.6521883958886665e-05, 'epoch': 0.89} +2025-02-05 17:37:49 - ERROR - stderr - 30%|██▉ | 6633/22434 [7:30:09<11:04:21, 2.52s/it] +2025-02-05 17:37:51 - ERROR - stderr - 30%|██▉ | 6634/22434 [7:30:11<11:07:13, 2.53s/it] +2025-02-05 17:37:52 - ERROR - stderr - +2025-02-05 17:37:52 - ERROR - stderr - +2025-02-05 17:37:52 - INFO - stdout - {'loss': 0.9042, 'grad_norm': 1.0203315019607544, 'learning_rate': 1.6520789448798086e-05, 'epoch': 0.89} +2025-02-05 17:37:52 - ERROR - stderr - 30%|██▉ | 6634/22434 [7:30:11<11:07:13, 2.53s/it] +2025-02-05 17:37:54 - ERROR - stderr - 30%|██▉ | 6635/22434 [7:30:14<11:04:13, 2.52s/it] +2025-02-05 17:37:54 - ERROR - stderr - +2025-02-05 17:37:54 - ERROR - stderr - +2025-02-05 17:37:54 - INFO - stdout - {'loss': 0.987, 'grad_norm': 1.0252208709716797, 'learning_rate': 1.6519694802789893e-05, 'epoch': 0.89} +2025-02-05 17:37:54 - ERROR - stderr - 30%|██▉ | 6635/22434 [7:30:14<11:04:13, 2.52s/it] +2025-02-05 17:37:56 - ERROR - stderr - 30%|██▉ | 6636/22434 [7:30:16<11:01:32, 2.51s/it] +2025-02-05 17:37:57 - ERROR - stderr - +2025-02-05 17:37:57 - ERROR - stderr - +2025-02-05 17:37:57 - INFO - stdout - {'loss': 0.9857, 'grad_norm': 1.120632529258728, 'learning_rate': 1.6518600020884896e-05, 'epoch': 0.89} +2025-02-05 17:37:57 - ERROR - stderr 
- 30%|██▉ | 6636/22434 [7:30:16<11:01:32, 2.51s/it] +2025-02-05 17:37:59 - ERROR - stderr - 30%|██▉ | 6637/22434 [7:30:19<11:05:42, 2.53s/it] +2025-02-05 17:37:59 - ERROR - stderr - +2025-02-05 17:37:59 - ERROR - stderr - +2025-02-05 17:37:59 - INFO - stdout - {'loss': 0.9361, 'grad_norm': 0.9164204597473145, 'learning_rate': 1.651750510310592e-05, 'epoch': 0.89} +2025-02-05 17:37:59 - ERROR - stderr - 30%|██▉ | 6637/22434 [7:30:19<11:05:42, 2.53s/it] +2025-02-05 17:38:02 - ERROR - stderr - 30%|██▉ | 6638/22434 [7:30:21<11:14:30, 2.56s/it] +2025-02-05 17:38:02 - ERROR - stderr - +2025-02-05 17:38:02 - ERROR - stderr - +2025-02-05 17:38:02 - INFO - stdout - {'loss': 0.9198, 'grad_norm': 0.9940130710601807, 'learning_rate': 1.6516410049475788e-05, 'epoch': 0.89} +2025-02-05 17:38:02 - ERROR - stderr - 30%|██▉ | 6638/22434 [7:30:22<11:14:30, 2.56s/it] +2025-02-05 17:38:04 - ERROR - stderr - 30%|██▉ | 6639/22434 [7:30:24<11:06:25, 2.53s/it] +2025-02-05 17:38:04 - ERROR - stderr - +2025-02-05 17:38:04 - ERROR - stderr - +2025-02-05 17:38:04 - INFO - stdout - {'loss': 0.9058, 'grad_norm': 1.0626641511917114, 'learning_rate': 1.6515314860017328e-05, 'epoch': 0.89} +2025-02-05 17:38:04 - ERROR - stderr - 30%|██▉ | 6639/22434 [7:30:24<11:06:25, 2.53s/it] +2025-02-05 17:38:07 - ERROR - stderr - 30%|██▉ | 6640/22434 [7:30:27<11:17:33, 2.57s/it] +2025-02-05 17:38:07 - ERROR - stderr - +2025-02-05 17:38:07 - ERROR - stderr - +2025-02-05 17:38:07 - INFO - stdout - {'loss': 0.936, 'grad_norm': 0.9900780916213989, 'learning_rate': 1.6514219534753357e-05, 'epoch': 0.89} +2025-02-05 17:38:07 - ERROR - stderr - 30%|██▉ | 6640/22434 [7:30:27<11:17:33, 2.57s/it] +2025-02-05 17:38:10 - ERROR - stderr - 30%|██▉ | 6641/22434 [7:30:29<11:26:07, 2.61s/it] +2025-02-05 17:38:10 - ERROR - stderr - +2025-02-05 17:38:10 - ERROR - stderr - +2025-02-05 17:38:10 - INFO - stdout - {'loss': 0.9373, 'grad_norm': 1.0205928087234497, 'learning_rate': 1.6513124073706715e-05, 'epoch': 0.89} +2025-02-05 
17:38:10 - ERROR - stderr - 30%|██▉ | 6641/22434 [7:30:29<11:26:07, 2.61s/it] +2025-02-05 17:38:12 - ERROR - stderr - 30%|██▉ | 6642/22434 [7:30:32<11:15:07, 2.57s/it] +2025-02-05 17:38:12 - ERROR - stderr - +2025-02-05 17:38:12 - ERROR - stderr - +2025-02-05 17:38:12 - INFO - stdout - {'loss': 1.0156, 'grad_norm': 1.1461232900619507, 'learning_rate': 1.6512028476900234e-05, 'epoch': 0.89} +2025-02-05 17:38:12 - ERROR - stderr - 30%|██▉ | 6642/22434 [7:30:32<11:15:07, 2.57s/it] +2025-02-05 17:38:15 - ERROR - stderr - 30%|██▉ | 6643/22434 [7:30:34<11:13:03, 2.56s/it] +2025-02-05 17:38:15 - ERROR - stderr - +2025-02-05 17:38:15 - ERROR - stderr - +2025-02-05 17:38:15 - INFO - stdout - {'loss': 0.7738, 'grad_norm': 1.0463027954101562, 'learning_rate': 1.6510932744356754e-05, 'epoch': 0.89} +2025-02-05 17:38:15 - ERROR - stderr - 30%|██▉ | 6643/22434 [7:30:34<11:13:03, 2.56s/it] +2025-02-05 17:38:17 - ERROR - stderr - 30%|██▉ | 6644/22434 [7:30:37<11:20:33, 2.59s/it] +2025-02-05 17:38:17 - ERROR - stderr - +2025-02-05 17:38:17 - ERROR - stderr - +2025-02-05 17:38:17 - INFO - stdout - {'loss': 0.8949, 'grad_norm': 1.181414246559143, 'learning_rate': 1.650983687609911e-05, 'epoch': 0.89} +2025-02-05 17:38:17 - ERROR - stderr - 30%|██▉ | 6644/22434 [7:30:37<11:20:33, 2.59s/it] +2025-02-05 17:38:20 - ERROR - stderr - 30%|██▉ | 6645/22434 [7:30:39<11:15:21, 2.57s/it] +2025-02-05 17:38:20 - ERROR - stderr - +2025-02-05 17:38:20 - ERROR - stderr - +2025-02-05 17:38:20 - INFO - stdout - {'loss': 1.0011, 'grad_norm': 1.1667274236679077, 'learning_rate': 1.6508740872150143e-05, 'epoch': 0.89} +2025-02-05 17:38:20 - ERROR - stderr - 30%|██▉ | 6645/22434 [7:30:40<11:15:21, 2.57s/it] +2025-02-05 17:38:22 - ERROR - stderr - 30%|██▉ | 6646/22434 [7:30:42<11:06:09, 2.53s/it] +2025-02-05 17:38:22 - ERROR - stderr - +2025-02-05 17:38:22 - ERROR - stderr - +2025-02-05 17:38:22 - INFO - stdout - {'loss': 1.0411, 'grad_norm': 1.177300214767456, 'learning_rate': 1.6507644732532702e-05, 
'epoch': 0.89} +2025-02-05 17:38:22 - ERROR - stderr - 30%|██▉ | 6646/22434 [7:30:42<11:06:09, 2.53s/it] +2025-02-05 17:38:25 - ERROR - stderr - 30%|██▉ | 6647/22434 [7:30:44<10:59:28, 2.51s/it] +2025-02-05 17:38:25 - ERROR - stderr - +2025-02-05 17:38:25 - ERROR - stderr - +2025-02-05 17:38:25 - INFO - stdout - {'loss': 0.9778, 'grad_norm': 1.0851504802703857, 'learning_rate': 1.6506548457269635e-05, 'epoch': 0.89} +2025-02-05 17:38:25 - ERROR - stderr - 30%|██▉ | 6647/22434 [7:30:44<10:59:28, 2.51s/it] +2025-02-05 17:38:27 - ERROR - stderr - 30%|██▉ | 6648/22434 [7:30:47<10:59:53, 2.51s/it] +2025-02-05 17:38:27 - ERROR - stderr - +2025-02-05 17:38:27 - ERROR - stderr - +2025-02-05 17:38:27 - INFO - stdout - {'loss': 0.8881, 'grad_norm': 1.0519440174102783, 'learning_rate': 1.650545204638379e-05, 'epoch': 0.89} +2025-02-05 17:38:27 - ERROR - stderr - 30%|██▉ | 6648/22434 [7:30:47<10:59:53, 2.51s/it] +2025-02-05 17:38:30 - ERROR - stderr - 30%|██▉ | 6649/22434 [7:30:49<10:54:50, 2.49s/it] +2025-02-05 17:38:30 - ERROR - stderr - +2025-02-05 17:38:30 - ERROR - stderr - +2025-02-05 17:38:30 - INFO - stdout - {'loss': 0.9898, 'grad_norm': 1.1065679788589478, 'learning_rate': 1.6504355499898023e-05, 'epoch': 0.89} +2025-02-05 17:38:30 - ERROR - stderr - 30%|██▉ | 6649/22434 [7:30:49<10:54:50, 2.49s/it] +2025-02-05 17:38:32 - ERROR - stderr - 30%|██▉ | 6650/22434 [7:30:52<10:52:03, 2.48s/it] +2025-02-05 17:38:32 - ERROR - stderr - +2025-02-05 17:38:32 - ERROR - stderr - +2025-02-05 17:38:32 - INFO - stdout - {'loss': 0.9253, 'grad_norm': 1.0286918878555298, 'learning_rate': 1.650325881783519e-05, 'epoch': 0.89} +2025-02-05 17:38:32 - ERROR - stderr - 30%|██▉ | 6650/22434 [7:30:52<10:52:03, 2.48s/it] +2025-02-05 17:38:34 - ERROR - stderr - 30%|██▉ | 6651/22434 [7:30:54<10:50:49, 2.47s/it] +2025-02-05 17:38:35 - ERROR - stderr - +2025-02-05 17:38:35 - ERROR - stderr - +2025-02-05 17:38:35 - INFO - stdout - {'loss': 0.9028, 'grad_norm': 1.0029408931732178, 'learning_rate': 
1.650216200021815e-05, 'epoch': 0.89} +2025-02-05 17:38:35 - ERROR - stderr - 30%|██▉ | 6651/22434 [7:30:54<10:50:49, 2.47s/it] +2025-02-05 17:38:37 - ERROR - stderr - 30%|██▉ | 6652/22434 [7:30:57<11:09:15, 2.54s/it] +2025-02-05 17:38:37 - ERROR - stderr - +2025-02-05 17:38:37 - ERROR - stderr - +2025-02-05 17:38:37 - INFO - stdout - {'loss': 0.9046, 'grad_norm': 1.0041744709014893, 'learning_rate': 1.6501065047069764e-05, 'epoch': 0.89} +2025-02-05 17:38:37 - ERROR - stderr - 30%|██▉ | 6652/22434 [7:30:57<11:09:15, 2.54s/it] +2025-02-05 17:38:40 - ERROR - stderr - 30%|██▉ | 6653/22434 [7:30:59<11:10:11, 2.55s/it] +2025-02-05 17:38:40 - ERROR - stderr - +2025-02-05 17:38:40 - ERROR - stderr - +2025-02-05 17:38:40 - INFO - stdout - {'loss': 0.8084, 'grad_norm': 0.9768277406692505, 'learning_rate': 1.64999679584129e-05, 'epoch': 0.89} +2025-02-05 17:38:40 - ERROR - stderr - 30%|██▉ | 6653/22434 [7:31:00<11:10:11, 2.55s/it] +2025-02-05 17:38:42 - ERROR - stderr - 30%|██▉ | 6654/22434 [7:31:02<11:01:04, 2.51s/it] +2025-02-05 17:38:42 - ERROR - stderr - +2025-02-05 17:38:42 - ERROR - stderr - +2025-02-05 17:38:42 - INFO - stdout - {'loss': 0.984, 'grad_norm': 1.1030744314193726, 'learning_rate': 1.649887073427042e-05, 'epoch': 0.89} +2025-02-05 17:38:42 - ERROR - stderr - 30%|██▉ | 6654/22434 [7:31:02<11:01:04, 2.51s/it] +2025-02-05 17:38:45 - ERROR - stderr - 30%|██▉ | 6655/22434 [7:31:04<11:05:41, 2.53s/it] +2025-02-05 17:38:45 - ERROR - stderr - +2025-02-05 17:38:45 - ERROR - stderr - +2025-02-05 17:38:45 - INFO - stdout - {'loss': 0.8216, 'grad_norm': 0.9453567862510681, 'learning_rate': 1.64977733746652e-05, 'epoch': 0.89} +2025-02-05 17:38:45 - ERROR - stderr - 30%|██▉ | 6655/22434 [7:31:05<11:05:41, 2.53s/it] +2025-02-05 17:38:47 - ERROR - stderr - 30%|██▉ | 6656/22434 [7:31:07<10:59:18, 2.51s/it] +2025-02-05 17:38:47 - ERROR - stderr - +2025-02-05 17:38:47 - ERROR - stderr - +2025-02-05 17:38:47 - INFO - stdout - {'loss': 0.917, 'grad_norm': 1.2263792753219604, 
'learning_rate': 1.6496675879620113e-05, 'epoch': 0.89} +2025-02-05 17:38:47 - ERROR - stderr - 30%|██▉ | 6656/22434 [7:31:07<10:59:18, 2.51s/it] +2025-02-05 17:38:50 - ERROR - stderr - 30%|██▉ | 6657/22434 [7:31:09<11:02:01, 2.52s/it] +2025-02-05 17:38:50 - ERROR - stderr - +2025-02-05 17:38:50 - ERROR - stderr - +2025-02-05 17:38:50 - INFO - stdout - {'loss': 0.8498, 'grad_norm': 0.9549890756607056, 'learning_rate': 1.649557824915803e-05, 'epoch': 0.89} +2025-02-05 17:38:50 - ERROR - stderr - 30%|██▉ | 6657/22434 [7:31:10<11:02:01, 2.52s/it] +2025-02-05 17:38:52 - ERROR - stderr - 30%|██▉ | 6658/22434 [7:31:12<10:59:59, 2.51s/it] +2025-02-05 17:38:52 - ERROR - stderr - +2025-02-05 17:38:52 - ERROR - stderr - +2025-02-05 17:38:52 - INFO - stdout - {'loss': 0.8399, 'grad_norm': 1.0324268341064453, 'learning_rate': 1.6494480483301836e-05, 'epoch': 0.89} +2025-02-05 17:38:52 - ERROR - stderr - 30%|██▉ | 6658/22434 [7:31:12<10:59:59, 2.51s/it] +2025-02-05 17:38:55 - ERROR - stderr - 30%|██▉ | 6659/22434 [7:31:15<11:21:16, 2.59s/it] +2025-02-05 17:38:55 - ERROR - stderr - +2025-02-05 17:38:55 - ERROR - stderr - +2025-02-05 17:38:55 - INFO - stdout - {'loss': 0.9927, 'grad_norm': 0.9723221659660339, 'learning_rate': 1.6493382582074415e-05, 'epoch': 0.89} +2025-02-05 17:38:55 - ERROR - stderr - 30%|██▉ | 6659/22434 [7:31:15<11:21:16, 2.59s/it] +2025-02-05 17:38:57 - ERROR - stderr - 30%|██▉ | 6660/22434 [7:31:17<11:12:12, 2.56s/it] +2025-02-05 17:38:58 - ERROR - stderr - +2025-02-05 17:38:58 - ERROR - stderr - +2025-02-05 17:38:58 - INFO - stdout - {'loss': 1.0311, 'grad_norm': 1.1457146406173706, 'learning_rate': 1.6492284545498645e-05, 'epoch': 0.89} +2025-02-05 17:38:58 - ERROR - stderr - 30%|██▉ | 6660/22434 [7:31:17<11:12:12, 2.56s/it] +2025-02-05 17:39:00 - ERROR - stderr - 30%|██▉ | 6661/22434 [7:31:20<11:12:27, 2.56s/it] +2025-02-05 17:39:00 - ERROR - stderr - +2025-02-05 17:39:00 - ERROR - stderr - +2025-02-05 17:39:00 - INFO - stdout - {'loss': 0.9032, 
'grad_norm': 1.1672335863113403, 'learning_rate': 1.649118637359741e-05, 'epoch': 0.89} +2025-02-05 17:39:00 - ERROR - stderr - 30%|██▉ | 6661/22434 [7:31:20<11:12:27, 2.56s/it] +2025-02-05 17:39:02 - ERROR - stderr - 30%|██▉ | 6662/22434 [7:31:22<11:03:35, 2.52s/it] +2025-02-05 17:39:03 - ERROR - stderr - +2025-02-05 17:39:03 - ERROR - stderr - +2025-02-05 17:39:03 - INFO - stdout - {'loss': 0.8185, 'grad_norm': 1.0801018476486206, 'learning_rate': 1.6490088066393614e-05, 'epoch': 0.89} +2025-02-05 17:39:03 - ERROR - stderr - 30%|██▉ | 6662/22434 [7:31:22<11:03:35, 2.52s/it] +2025-02-05 17:39:05 - ERROR - stderr - 30%|██▉ | 6663/22434 [7:31:25<11:03:42, 2.53s/it] +2025-02-05 17:39:05 - ERROR - stderr - +2025-02-05 17:39:05 - ERROR - stderr - +2025-02-05 17:39:05 - INFO - stdout - {'loss': 0.9311, 'grad_norm': 1.1729601621627808, 'learning_rate': 1.648898962391014e-05, 'epoch': 0.89} +2025-02-05 17:39:05 - ERROR - stderr - 30%|██▉ | 6663/22434 [7:31:25<11:03:42, 2.53s/it] +2025-02-05 17:39:08 - ERROR - stderr - 30%|██▉ | 6664/22434 [7:31:27<11:02:43, 2.52s/it] +2025-02-05 17:39:08 - ERROR - stderr - +2025-02-05 17:39:08 - ERROR - stderr - +2025-02-05 17:39:08 - INFO - stdout - {'loss': 0.8354, 'grad_norm': 0.871172308921814, 'learning_rate': 1.648789104616989e-05, 'epoch': 0.89} +2025-02-05 17:39:08 - ERROR - stderr - 30%|██▉ | 6664/22434 [7:31:27<11:02:43, 2.52s/it] +2025-02-05 17:39:10 - ERROR - stderr - 30%|██▉ | 6665/22434 [7:31:30<11:12:53, 2.56s/it] +2025-02-05 17:39:10 - ERROR - stderr - +2025-02-05 17:39:10 - ERROR - stderr - +2025-02-05 17:39:10 - INFO - stdout - {'loss': 0.7912, 'grad_norm': 0.9779297113418579, 'learning_rate': 1.6486792333195752e-05, 'epoch': 0.89} +2025-02-05 17:39:10 - ERROR - stderr - 30%|██▉ | 6665/22434 [7:31:30<11:12:53, 2.56s/it] +2025-02-05 17:39:13 - ERROR - stderr - 30%|██▉ | 6666/22434 [7:31:32<11:07:35, 2.54s/it] +2025-02-05 17:39:13 - ERROR - stderr - +2025-02-05 17:39:13 - ERROR - stderr - +2025-02-05 17:39:13 - INFO - 
stdout - {'loss': 0.9237, 'grad_norm': 1.0173784494400024, 'learning_rate': 1.6485693485010643e-05, 'epoch': 0.89} +2025-02-05 17:39:13 - ERROR - stderr - 30%|██▉ | 6666/22434 [7:31:32<11:07:35, 2.54s/it] +2025-02-05 17:39:15 - ERROR - stderr - 30%|██▉ | 6667/22434 [7:31:35<11:07:23, 2.54s/it] +2025-02-05 17:39:15 - ERROR - stderr - +2025-02-05 17:39:15 - ERROR - stderr - +2025-02-05 17:39:15 - INFO - stdout - {'loss': 0.9572, 'grad_norm': 1.0498394966125488, 'learning_rate': 1.6484594501637453e-05, 'epoch': 0.89} +2025-02-05 17:39:15 - ERROR - stderr - 30%|██▉ | 6667/22434 [7:31:35<11:07:23, 2.54s/it] +2025-02-05 17:39:18 - ERROR - stderr - 30%|██▉ | 6668/22434 [7:31:38<11:07:26, 2.54s/it] +2025-02-05 17:39:18 - ERROR - stderr - +2025-02-05 17:39:18 - ERROR - stderr - +2025-02-05 17:39:18 - INFO - stdout - {'loss': 0.9084, 'grad_norm': 1.0005152225494385, 'learning_rate': 1.6483495383099103e-05, 'epoch': 0.89} +2025-02-05 17:39:18 - ERROR - stderr - 30%|██▉ | 6668/22434 [7:31:38<11:07:26, 2.54s/it] +2025-02-05 17:39:20 - ERROR - stderr - 30%|██▉ | 6669/22434 [7:31:40<10:59:04, 2.51s/it] +2025-02-05 17:39:20 - ERROR - stderr - +2025-02-05 17:39:20 - ERROR - stderr - +2025-02-05 17:39:20 - INFO - stdout - {'loss': 0.9289, 'grad_norm': 1.0867047309875488, 'learning_rate': 1.6482396129418488e-05, 'epoch': 0.89} +2025-02-05 17:39:20 - ERROR - stderr - 30%|██▉ | 6669/22434 [7:31:40<10:59:04, 2.51s/it] +2025-02-05 17:39:23 - ERROR - stderr - 30%|██▉ | 6670/22434 [7:31:43<11:19:09, 2.59s/it] +2025-02-05 17:39:23 - ERROR - stderr - +2025-02-05 17:39:23 - ERROR - stderr - +2025-02-05 17:39:23 - INFO - stdout - {'loss': 1.0947, 'grad_norm': 1.1120163202285767, 'learning_rate': 1.648129674061853e-05, 'epoch': 0.89} +2025-02-05 17:39:23 - ERROR - stderr - 30%|██▉ | 6670/22434 [7:31:43<11:19:09, 2.59s/it] +2025-02-05 17:39:25 - ERROR - stderr - 30%|██▉ | 6671/22434 [7:31:45<11:11:20, 2.56s/it] +2025-02-05 17:39:25 - ERROR - stderr - +2025-02-05 17:39:25 - ERROR - stderr - 
+2025-02-05 17:39:25 - INFO - stdout - {'loss': 0.8579, 'grad_norm': 1.0160980224609375, 'learning_rate': 1.648019721672215e-05, 'epoch': 0.89} +2025-02-05 17:39:25 - ERROR - stderr - 30%|██▉ | 6671/22434 [7:31:45<11:11:20, 2.56s/it] +2025-02-05 17:39:28 - ERROR - stderr - 30%|██▉ | 6672/22434 [7:31:48<11:09:56, 2.55s/it] +2025-02-05 17:39:28 - ERROR - stderr - +2025-02-05 17:39:28 - ERROR - stderr - +2025-02-05 17:39:28 - INFO - stdout - {'loss': 0.9057, 'grad_norm': 1.051330804824829, 'learning_rate': 1.6479097557752254e-05, 'epoch': 0.89} +2025-02-05 17:39:28 - ERROR - stderr - 30%|██▉ | 6672/22434 [7:31:48<11:09:56, 2.55s/it] +2025-02-05 17:39:31 - ERROR - stderr - 30%|██▉ | 6673/22434 [7:31:50<11:10:52, 2.55s/it] +2025-02-05 17:39:31 - ERROR - stderr - +2025-02-05 17:39:31 - ERROR - stderr - +2025-02-05 17:39:31 - INFO - stdout - {'loss': 1.0916, 'grad_norm': 1.267731785774231, 'learning_rate': 1.647799776373177e-05, 'epoch': 0.89} +2025-02-05 17:39:31 - ERROR - stderr - 30%|██▉ | 6673/22434 [7:31:50<11:10:52, 2.55s/it] +2025-02-05 17:39:33 - ERROR - stderr - 30%|██▉ | 6674/22434 [7:31:53<11:07:02, 2.54s/it] +2025-02-05 17:39:33 - ERROR - stderr - +2025-02-05 17:39:33 - ERROR - stderr - +2025-02-05 17:39:33 - INFO - stdout - {'loss': 0.8878, 'grad_norm': 1.0835604667663574, 'learning_rate': 1.647689783468362e-05, 'epoch': 0.89} +2025-02-05 17:39:33 - ERROR - stderr - 30%|██▉ | 6674/22434 [7:31:53<11:07:02, 2.54s/it] +2025-02-05 17:39:36 - ERROR - stderr - 30%|██▉ | 6675/22434 [7:31:55<11:07:34, 2.54s/it] +2025-02-05 17:39:36 - ERROR - stderr - +2025-02-05 17:39:36 - ERROR - stderr - +2025-02-05 17:39:36 - INFO - stdout - {'loss': 0.7677, 'grad_norm': 1.0329865217208862, 'learning_rate': 1.6475797770630736e-05, 'epoch': 0.89} +2025-02-05 17:39:36 - ERROR - stderr - 30%|██▉ | 6675/22434 [7:31:55<11:07:34, 2.54s/it] +2025-02-05 17:39:38 - ERROR - stderr - 30%|██▉ | 6676/22434 [7:31:58<10:58:49, 2.51s/it] +2025-02-05 17:39:38 - ERROR - stderr - +2025-02-05 
17:39:38 - ERROR - stderr - +2025-02-05 17:39:38 - INFO - stdout - {'loss': 1.0187, 'grad_norm': 1.0527637004852295, 'learning_rate': 1.6474697571596042e-05, 'epoch': 0.89} +2025-02-05 17:39:38 - ERROR - stderr - 30%|██▉ | 6676/22434 [7:31:58<10:58:49, 2.51s/it] +2025-02-05 17:39:41 - ERROR - stderr - 30%|██▉ | 6677/22434 [7:32:00<11:01:00, 2.52s/it] +2025-02-05 17:39:41 - ERROR - stderr - +2025-02-05 17:39:41 - ERROR - stderr - +2025-02-05 17:39:41 - INFO - stdout - {'loss': 1.0094, 'grad_norm': 1.2311348915100098, 'learning_rate': 1.6473597237602472e-05, 'epoch': 0.89} +2025-02-05 17:39:41 - ERROR - stderr - 30%|██▉ | 6677/22434 [7:32:00<11:01:00, 2.52s/it] +2025-02-05 17:39:43 - ERROR - stderr - 30%|██▉ | 6678/22434 [7:32:03<10:58:29, 2.51s/it] +2025-02-05 17:39:43 - ERROR - stderr - +2025-02-05 17:39:43 - ERROR - stderr - +2025-02-05 17:39:43 - INFO - stdout - {'loss': 0.9126, 'grad_norm': 0.9658203125, 'learning_rate': 1.6472496768672965e-05, 'epoch': 0.89} +2025-02-05 17:39:43 - ERROR - stderr - 30%|██▉ | 6678/22434 [7:32:03<10:58:29, 2.51s/it] +2025-02-05 17:39:43 - INFO - stdout - WARNING: tokenization mismatch: 156 vs. 174. 
(ignored) +2025-02-05 17:39:46 - ERROR - stderr - 30%|██▉ | 6679/22434 [7:32:05<11:03:13, 2.53s/it] +2025-02-05 17:39:46 - ERROR - stderr - +2025-02-05 17:39:46 - ERROR - stderr - +2025-02-05 17:39:46 - INFO - stdout - {'loss': 0.9129, 'grad_norm': 1.0320508480072021, 'learning_rate': 1.6471396164830452e-05, 'epoch': 0.89} +2025-02-05 17:39:46 - ERROR - stderr - 30%|██▉ | 6679/22434 [7:32:05<11:03:13, 2.53s/it] +2025-02-05 17:39:48 - ERROR - stderr - 30%|██▉ | 6680/22434 [7:32:08<11:08:52, 2.55s/it] +2025-02-05 17:39:48 - ERROR - stderr - +2025-02-05 17:39:48 - ERROR - stderr - +2025-02-05 17:39:48 - INFO - stdout - {'loss': 0.9018, 'grad_norm': 1.1232877969741821, 'learning_rate': 1.647029542609788e-05, 'epoch': 0.89} +2025-02-05 17:39:48 - ERROR - stderr - 30%|██▉ | 6680/22434 [7:32:08<11:08:52, 2.55s/it] +2025-02-05 17:39:51 - ERROR - stderr - 30%|██▉ | 6681/22434 [7:32:10<11:05:02, 2.53s/it] +2025-02-05 17:39:51 - ERROR - stderr - +2025-02-05 17:39:51 - ERROR - stderr - +2025-02-05 17:39:51 - INFO - stdout - {'loss': 0.8608, 'grad_norm': 1.0318970680236816, 'learning_rate': 1.6469194552498194e-05, 'epoch': 0.89} +2025-02-05 17:39:51 - ERROR - stderr - 30%|██▉ | 6681/22434 [7:32:11<11:05:02, 2.53s/it] +2025-02-05 17:39:53 - ERROR - stderr - 30%|██▉ | 6682/22434 [7:32:13<11:14:56, 2.57s/it] +2025-02-05 17:39:53 - ERROR - stderr - +2025-02-05 17:39:53 - ERROR - stderr - +2025-02-05 17:39:53 - INFO - stdout - {'loss': 1.0453, 'grad_norm': 1.0595561265945435, 'learning_rate': 1.6468093544054334e-05, 'epoch': 0.89} +2025-02-05 17:39:53 - ERROR - stderr - 30%|██▉ | 6682/22434 [7:32:13<11:14:56, 2.57s/it] +2025-02-05 17:39:56 - ERROR - stderr - 30%|██▉ | 6683/22434 [7:32:16<11:10:32, 2.55s/it] +2025-02-05 17:39:56 - ERROR - stderr - +2025-02-05 17:39:56 - ERROR - stderr - +2025-02-05 17:39:56 - INFO - stdout - {'loss': 0.8809, 'grad_norm': 0.9870584011077881, 'learning_rate': 1.6466992400789256e-05, 'epoch': 0.89} +2025-02-05 17:39:56 - ERROR - stderr - 30%|██▉ | 
6683/22434 [7:32:16<11:10:32, 2.55s/it] +2025-02-05 17:39:59 - ERROR - stderr - 30%|██▉ | 6684/22434 [7:32:18<11:28:30, 2.62s/it] +2025-02-05 17:39:59 - ERROR - stderr - +2025-02-05 17:39:59 - ERROR - stderr - +2025-02-05 17:39:59 - INFO - stdout - {'loss': 1.0889, 'grad_norm': 1.1204252243041992, 'learning_rate': 1.646589112272591e-05, 'epoch': 0.89} +2025-02-05 17:39:59 - ERROR - stderr - 30%|██▉ | 6684/22434 [7:32:18<11:28:30, 2.62s/it] +2025-02-05 17:40:01 - ERROR - stderr - 30%|██▉ | 6685/22434 [7:32:21<11:18:11, 2.58s/it] +2025-02-05 17:40:01 - ERROR - stderr - +2025-02-05 17:40:01 - ERROR - stderr - +2025-02-05 17:40:01 - INFO - stdout - {'loss': 0.9022, 'grad_norm': 1.1926586627960205, 'learning_rate': 1.646478970988725e-05, 'epoch': 0.89} +2025-02-05 17:40:01 - ERROR - stderr - 30%|██▉ | 6685/22434 [7:32:21<11:18:11, 2.58s/it] +2025-02-05 17:40:04 - ERROR - stderr - 30%|██▉ | 6686/22434 [7:32:23<11:11:45, 2.56s/it] +2025-02-05 17:40:04 - ERROR - stderr - +2025-02-05 17:40:04 - ERROR - stderr - +2025-02-05 17:40:04 - INFO - stdout - {'loss': 0.9375, 'grad_norm': 1.0428320169448853, 'learning_rate': 1.6463688162296232e-05, 'epoch': 0.89} +2025-02-05 17:40:04 - ERROR - stderr - 30%|██▉ | 6686/22434 [7:32:23<11:11:45, 2.56s/it] +2025-02-05 17:40:06 - ERROR - stderr - 30%|██▉ | 6687/22434 [7:32:26<11:06:16, 2.54s/it] +2025-02-05 17:40:06 - ERROR - stderr - +2025-02-05 17:40:06 - ERROR - stderr - +2025-02-05 17:40:06 - INFO - stdout - {'loss': 1.1299, 'grad_norm': 0.989416241645813, 'learning_rate': 1.6462586479975823e-05, 'epoch': 0.89} +2025-02-05 17:40:06 - ERROR - stderr - 30%|██▉ | 6687/22434 [7:32:26<11:06:16, 2.54s/it] +2025-02-05 17:40:09 - ERROR - stderr - 30%|██▉ | 6688/22434 [7:32:28<10:59:08, 2.51s/it] +2025-02-05 17:40:09 - ERROR - stderr - +2025-02-05 17:40:09 - ERROR - stderr - +2025-02-05 17:40:09 - INFO - stdout - {'loss': 0.9408, 'grad_norm': 1.232982873916626, 'learning_rate': 1.6461484662948982e-05, 'epoch': 0.89} +2025-02-05 17:40:09 - ERROR 
- stderr - 30%|██▉ | 6688/22434 [7:32:28<10:59:08, 2.51s/it] +2025-02-05 17:40:11 - ERROR - stderr - 30%|██▉ | 6689/22434 [7:32:31<10:58:12, 2.51s/it] +2025-02-05 17:40:11 - ERROR - stderr - +2025-02-05 17:40:11 - ERROR - stderr - +2025-02-05 17:40:11 - INFO - stdout - {'loss': 0.9389, 'grad_norm': 1.0534180402755737, 'learning_rate': 1.6460382711238678e-05, 'epoch': 0.89} +2025-02-05 17:40:11 - ERROR - stderr - 30%|██▉ | 6689/22434 [7:32:31<10:58:12, 2.51s/it] +2025-02-05 17:40:14 - ERROR - stderr - 30%|██▉ | 6690/22434 [7:32:33<10:53:43, 2.49s/it] +2025-02-05 17:40:14 - ERROR - stderr - +2025-02-05 17:40:14 - ERROR - stderr - +2025-02-05 17:40:14 - INFO - stdout - {'loss': 0.9771, 'grad_norm': 1.0252068042755127, 'learning_rate': 1.6459280624867876e-05, 'epoch': 0.89} +2025-02-05 17:40:14 - ERROR - stderr - 30%|██▉ | 6690/22434 [7:32:33<10:53:43, 2.49s/it] +2025-02-05 17:40:16 - ERROR - stderr - 30%|██▉ | 6691/22434 [7:32:36<10:47:47, 2.47s/it] +2025-02-05 17:40:16 - ERROR - stderr - +2025-02-05 17:40:16 - ERROR - stderr - +2025-02-05 17:40:16 - INFO - stdout - {'loss': 0.9464, 'grad_norm': 1.0314444303512573, 'learning_rate': 1.6458178403859547e-05, 'epoch': 0.89} +2025-02-05 17:40:16 - ERROR - stderr - 30%|██▉ | 6691/22434 [7:32:36<10:47:47, 2.47s/it] +2025-02-05 17:40:18 - ERROR - stderr - 30%|██▉ | 6692/22434 [7:32:38<10:45:50, 2.46s/it] +2025-02-05 17:40:18 - ERROR - stderr - +2025-02-05 17:40:18 - ERROR - stderr - +2025-02-05 17:40:18 - INFO - stdout - {'loss': 0.9805, 'grad_norm': 0.9803935885429382, 'learning_rate': 1.6457076048236676e-05, 'epoch': 0.89} +2025-02-05 17:40:18 - ERROR - stderr - 30%|██▉ | 6692/22434 [7:32:38<10:45:50, 2.46s/it] +2025-02-05 17:40:21 - ERROR - stderr - 30%|██▉ | 6693/22434 [7:32:41<10:49:10, 2.47s/it] +2025-02-05 17:40:21 - ERROR - stderr - +2025-02-05 17:40:21 - ERROR - stderr - +2025-02-05 17:40:21 - INFO - stdout - {'loss': 0.9957, 'grad_norm': 1.0925337076187134, 'learning_rate': 1.645597355802223e-05, 'epoch': 0.9} 
+2025-02-05 17:40:21 - ERROR - stderr - 30%|██▉ | 6693/22434 [7:32:41<10:49:10, 2.47s/it] +2025-02-05 17:40:24 - ERROR - stderr - 30%|██▉ | 6694/22434 [7:32:44<11:18:41, 2.59s/it] +2025-02-05 17:40:24 - ERROR - stderr - +2025-02-05 17:40:24 - ERROR - stderr - +2025-02-05 17:40:24 - INFO - stdout - {'loss': 0.926, 'grad_norm': 1.004028081893921, 'learning_rate': 1.6454870933239192e-05, 'epoch': 0.9} +2025-02-05 17:40:24 - ERROR - stderr - 30%|██▉ | 6694/22434 [7:32:44<11:18:41, 2.59s/it] +2025-02-05 17:40:26 - ERROR - stderr - 30%|██▉ | 6695/22434 [7:32:46<11:08:44, 2.55s/it] +2025-02-05 17:40:26 - ERROR - stderr - +2025-02-05 17:40:26 - ERROR - stderr - +2025-02-05 17:40:26 - INFO - stdout - {'loss': 1.0194, 'grad_norm': 1.0104879140853882, 'learning_rate': 1.6453768173910546e-05, 'epoch': 0.9} +2025-02-05 17:40:26 - ERROR - stderr - 30%|██▉ | 6695/22434 [7:32:46<11:08:44, 2.55s/it] +2025-02-05 17:40:29 - ERROR - stderr - 30%|██▉ | 6696/22434 [7:32:49<11:10:50, 2.56s/it] +2025-02-05 17:40:29 - ERROR - stderr - +2025-02-05 17:40:29 - ERROR - stderr - +2025-02-05 17:40:29 - INFO - stdout - {'loss': 0.9793, 'grad_norm': 1.0822699069976807, 'learning_rate': 1.6452665280059277e-05, 'epoch': 0.9} +2025-02-05 17:40:29 - ERROR - stderr - 30%|██▉ | 6696/22434 [7:32:49<11:10:50, 2.56s/it] +2025-02-05 17:40:31 - ERROR - stderr - 30%|██▉ | 6697/22434 [7:32:51<11:08:50, 2.55s/it] +2025-02-05 17:40:31 - ERROR - stderr - +2025-02-05 17:40:31 - ERROR - stderr - +2025-02-05 17:40:31 - INFO - stdout - {'loss': 0.9781, 'grad_norm': 1.0720988512039185, 'learning_rate': 1.6451562251708376e-05, 'epoch': 0.9} +2025-02-05 17:40:31 - ERROR - stderr - 30%|██▉ | 6697/22434 [7:32:51<11:08:50, 2.55s/it] +2025-02-05 17:40:34 - ERROR - stderr - 30%|██▉ | 6698/22434 [7:32:54<11:08:06, 2.55s/it] +2025-02-05 17:40:34 - ERROR - stderr - +2025-02-05 17:40:34 - ERROR - stderr - +2025-02-05 17:40:34 - INFO - stdout - {'loss': 0.9052, 'grad_norm': 1.0022727251052856, 'learning_rate': 
1.6450459088880836e-05, 'epoch': 0.9} +2025-02-05 17:40:34 - ERROR - stderr - 30%|██▉ | 6698/22434 [7:32:54<11:08:06, 2.55s/it] +2025-02-05 17:40:36 - ERROR - stderr - 30%|██▉ | 6699/22434 [7:32:56<11:13:50, 2.57s/it] +2025-02-05 17:40:37 - ERROR - stderr - +2025-02-05 17:40:37 - ERROR - stderr - +2025-02-05 17:40:37 - INFO - stdout - {'loss': 0.9257, 'grad_norm': 1.0359539985656738, 'learning_rate': 1.6449355791599647e-05, 'epoch': 0.9} +2025-02-05 17:40:37 - ERROR - stderr - 30%|██▉ | 6699/22434 [7:32:56<11:13:50, 2.57s/it] +2025-02-05 17:40:39 - ERROR - stderr - 30%|██▉ | 6700/22434 [7:32:59<11:04:07, 2.53s/it] +2025-02-05 17:40:39 - ERROR - stderr - +2025-02-05 17:40:39 - ERROR - stderr - +2025-02-05 17:40:39 - INFO - stdout - {'loss': 0.8847, 'grad_norm': 1.0689456462860107, 'learning_rate': 1.6448252359887808e-05, 'epoch': 0.9} +2025-02-05 17:40:39 - ERROR - stderr - 30%|██▉ | 6700/22434 [7:32:59<11:04:07, 2.53s/it] +2025-02-05 17:40:41 - ERROR - stderr - 30%|██▉ | 6701/22434 [7:33:01<11:04:18, 2.53s/it] +2025-02-05 17:40:42 - ERROR - stderr - +2025-02-05 17:40:42 - ERROR - stderr - +2025-02-05 17:40:42 - INFO - stdout - {'loss': 0.9318, 'grad_norm': 0.9872997999191284, 'learning_rate': 1.6447148793768316e-05, 'epoch': 0.9} +2025-02-05 17:40:42 - ERROR - stderr - 30%|██▉ | 6701/22434 [7:33:01<11:04:18, 2.53s/it] +2025-02-05 17:40:44 - ERROR - stderr - 30%|██▉ | 6702/22434 [7:33:04<10:59:21, 2.51s/it] +2025-02-05 17:40:44 - ERROR - stderr - +2025-02-05 17:40:44 - ERROR - stderr - +2025-02-05 17:40:44 - INFO - stdout - {'loss': 0.7495, 'grad_norm': 0.9834489226341248, 'learning_rate': 1.644604509326418e-05, 'epoch': 0.9} +2025-02-05 17:40:44 - ERROR - stderr - 30%|██▉ | 6702/22434 [7:33:04<10:59:21, 2.51s/it] +2025-02-05 17:40:46 - ERROR - stderr - 30%|██▉ | 6703/22434 [7:33:06<10:59:48, 2.52s/it] +2025-02-05 17:40:47 - ERROR - stderr - +2025-02-05 17:40:47 - ERROR - stderr - +2025-02-05 17:40:47 - INFO - stdout - {'loss': 0.9347, 'grad_norm': 
1.0131887197494507, 'learning_rate': 1.6444941258398403e-05, 'epoch': 0.9} +2025-02-05 17:40:47 - ERROR - stderr - 30%|██▉ | 6703/22434 [7:33:06<10:59:48, 2.52s/it] +2025-02-05 17:40:49 - ERROR - stderr - 30%|██▉ | 6704/22434 [7:33:09<10:54:53, 2.50s/it] +2025-02-05 17:40:49 - ERROR - stderr - +2025-02-05 17:40:49 - ERROR - stderr - +2025-02-05 17:40:49 - INFO - stdout - {'loss': 0.964, 'grad_norm': 1.0297667980194092, 'learning_rate': 1.644383728919399e-05, 'epoch': 0.9} +2025-02-05 17:40:49 - ERROR - stderr - 30%|██▉ | 6704/22434 [7:33:09<10:54:53, 2.50s/it] +2025-02-05 17:40:51 - ERROR - stderr - 30%|██▉ | 6705/22434 [7:33:11<10:57:33, 2.51s/it] +2025-02-05 17:40:51 - ERROR - stderr - +2025-02-05 17:40:52 - ERROR - stderr - +2025-02-05 17:40:52 - INFO - stdout - {'loss': 0.8684, 'grad_norm': 1.0307282209396362, 'learning_rate': 1.6442733185673953e-05, 'epoch': 0.9} +2025-02-05 17:40:52 - ERROR - stderr - 30%|██▉ | 6705/22434 [7:33:11<10:57:33, 2.51s/it] +2025-02-05 17:40:54 - ERROR - stderr - 30%|██▉ | 6706/22434 [7:33:14<10:58:30, 2.51s/it] +2025-02-05 17:40:54 - ERROR - stderr - +2025-02-05 17:40:54 - ERROR - stderr - +2025-02-05 17:40:54 - INFO - stdout - {'loss': 0.939, 'grad_norm': 0.9437198042869568, 'learning_rate': 1.6441628947861312e-05, 'epoch': 0.9} +2025-02-05 17:40:54 - ERROR - stderr - 30%|██▉ | 6706/22434 [7:33:14<10:58:30, 2.51s/it] +2025-02-05 17:40:56 - ERROR - stderr - 30%|██▉ | 6707/22434 [7:33:16<10:59:33, 2.52s/it] +2025-02-05 17:40:57 - ERROR - stderr - +2025-02-05 17:40:57 - ERROR - stderr - +2025-02-05 17:40:57 - INFO - stdout - {'loss': 0.9214, 'grad_norm': 1.0671260356903076, 'learning_rate': 1.644052457577908e-05, 'epoch': 0.9} +2025-02-05 17:40:57 - ERROR - stderr - 30%|██▉ | 6707/22434 [7:33:16<10:59:33, 2.52s/it] +2025-02-05 17:40:59 - ERROR - stderr - 30%|██▉ | 6708/22434 [7:33:19<11:01:59, 2.53s/it] +2025-02-05 17:40:59 - ERROR - stderr - +2025-02-05 17:40:59 - ERROR - stderr - +2025-02-05 17:40:59 - INFO - stdout - {'loss': 
0.8918, 'grad_norm': 1.0547828674316406, 'learning_rate': 1.6439420069450273e-05, 'epoch': 0.9} +2025-02-05 17:40:59 - ERROR - stderr - 30%|██▉ | 6708/22434 [7:33:19<11:01:59, 2.53s/it] +2025-02-05 17:41:02 - ERROR - stderr - 30%|██▉ | 6709/22434 [7:33:21<10:59:27, 2.52s/it] +2025-02-05 17:41:02 - ERROR - stderr - +2025-02-05 17:41:02 - ERROR - stderr - +2025-02-05 17:41:02 - INFO - stdout - {'loss': 1.0493, 'grad_norm': 1.0720034837722778, 'learning_rate': 1.6438315428897914e-05, 'epoch': 0.9} +2025-02-05 17:41:02 - ERROR - stderr - 30%|██▉ | 6709/22434 [7:33:21<10:59:27, 2.52s/it] +2025-02-05 17:41:04 - ERROR - stderr - 30%|██▉ | 6710/22434 [7:33:24<11:00:37, 2.52s/it] +2025-02-05 17:41:04 - ERROR - stderr - +2025-02-05 17:41:04 - ERROR - stderr - +2025-02-05 17:41:04 - INFO - stdout - {'loss': 0.9203, 'grad_norm': 0.9499634504318237, 'learning_rate': 1.6437210654145036e-05, 'epoch': 0.9} +2025-02-05 17:41:04 - ERROR - stderr - 30%|██▉ | 6710/22434 [7:33:24<11:00:37, 2.52s/it] +2025-02-05 17:41:07 - ERROR - stderr - 30%|██▉ | 6711/22434 [7:33:27<11:12:42, 2.57s/it] +2025-02-05 17:41:07 - ERROR - stderr - +2025-02-05 17:41:07 - ERROR - stderr - +2025-02-05 17:41:07 - INFO - stdout - {'loss': 0.8987, 'grad_norm': 1.0873655080795288, 'learning_rate': 1.6436105745214658e-05, 'epoch': 0.9} +2025-02-05 17:41:07 - ERROR - stderr - 30%|██▉ | 6711/22434 [7:33:27<11:12:42, 2.57s/it] +2025-02-05 17:41:10 - ERROR - stderr - 30%|██▉ | 6712/22434 [7:33:29<11:39:28, 2.67s/it] +2025-02-05 17:41:10 - ERROR - stderr - +2025-02-05 17:41:10 - ERROR - stderr - +2025-02-05 17:41:10 - INFO - stdout - {'loss': 0.9886, 'grad_norm': 1.091537594795227, 'learning_rate': 1.6435000702129816e-05, 'epoch': 0.9} +2025-02-05 17:41:10 - ERROR - stderr - 30%|██▉ | 6712/22434 [7:33:29<11:39:28, 2.67s/it] +2025-02-05 17:41:12 - ERROR - stderr - 30%|██▉ | 6713/22434 [7:33:32<11:24:29, 2.61s/it] +2025-02-05 17:41:12 - ERROR - stderr - +2025-02-05 17:41:12 - ERROR - stderr - +2025-02-05 17:41:12 - INFO 
- stdout - {'loss': 0.8877, 'grad_norm': 1.1032395362854004, 'learning_rate': 1.6433895524913546e-05, 'epoch': 0.9} +2025-02-05 17:41:12 - ERROR - stderr - 30%|██▉ | 6713/22434 [7:33:32<11:24:29, 2.61s/it] +2025-02-05 17:41:15 - ERROR - stderr - 30%|██▉ | 6714/22434 [7:33:34<11:11:09, 2.56s/it] +2025-02-05 17:41:15 - ERROR - stderr - +2025-02-05 17:41:15 - ERROR - stderr - +2025-02-05 17:41:15 - INFO - stdout - {'loss': 0.9119, 'grad_norm': 1.0616761445999146, 'learning_rate': 1.6432790213588874e-05, 'epoch': 0.9} +2025-02-05 17:41:15 - ERROR - stderr - 30%|██▉ | 6714/22434 [7:33:34<11:11:09, 2.56s/it] +2025-02-05 17:41:17 - ERROR - stderr - 30%|██▉ | 6715/22434 [7:33:37<11:25:44, 2.62s/it] +2025-02-05 17:41:17 - ERROR - stderr - +2025-02-05 17:41:17 - ERROR - stderr - +2025-02-05 17:41:17 - INFO - stdout - {'loss': 0.8723, 'grad_norm': 1.0024023056030273, 'learning_rate': 1.643168476817885e-05, 'epoch': 0.9} +2025-02-05 17:41:17 - ERROR - stderr - 30%|██▉ | 6715/22434 [7:33:37<11:25:44, 2.62s/it] +2025-02-05 17:41:20 - ERROR - stderr - 30%|██▉ | 6716/22434 [7:33:40<11:15:45, 2.58s/it] +2025-02-05 17:41:20 - ERROR - stderr - +2025-02-05 17:41:20 - ERROR - stderr - +2025-02-05 17:41:20 - INFO - stdout - {'loss': 0.8446, 'grad_norm': 1.068844199180603, 'learning_rate': 1.643057918870651e-05, 'epoch': 0.9} +2025-02-05 17:41:20 - ERROR - stderr - 30%|██▉ | 6716/22434 [7:33:40<11:15:45, 2.58s/it] +2025-02-05 17:41:22 - ERROR - stderr - 30%|██▉ | 6717/22434 [7:33:42<11:13:41, 2.57s/it] +2025-02-05 17:41:22 - ERROR - stderr - +2025-02-05 17:41:22 - ERROR - stderr - +2025-02-05 17:41:22 - INFO - stdout - {'loss': 0.9481, 'grad_norm': 1.0865434408187866, 'learning_rate': 1.6429473475194898e-05, 'epoch': 0.9} +2025-02-05 17:41:22 - ERROR - stderr - 30%|██▉ | 6717/22434 [7:33:42<11:13:41, 2.57s/it] +2025-02-05 17:41:25 - ERROR - stderr - 30%|██▉ | 6718/22434 [7:33:45<11:05:31, 2.54s/it] +2025-02-05 17:41:25 - ERROR - stderr - +2025-02-05 17:41:25 - ERROR - stderr - 
+2025-02-05 17:41:25 - INFO - stdout - {'loss': 0.9401, 'grad_norm': 1.0358482599258423, 'learning_rate': 1.6428367627667067e-05, 'epoch': 0.9} +2025-02-05 17:41:25 - ERROR - stderr - 30%|██▉ | 6718/22434 [7:33:45<11:05:31, 2.54s/it] +2025-02-05 17:41:27 - ERROR - stderr - 30%|██▉ | 6719/22434 [7:33:47<11:13:07, 2.57s/it] +2025-02-05 17:41:28 - ERROR - stderr - +2025-02-05 17:41:28 - ERROR - stderr - +2025-02-05 17:41:28 - INFO - stdout - {'loss': 0.9841, 'grad_norm': 1.0376105308532715, 'learning_rate': 1.642726164614606e-05, 'epoch': 0.9} +2025-02-05 17:41:28 - ERROR - stderr - 30%|██▉ | 6719/22434 [7:33:47<11:13:07, 2.57s/it] +2025-02-05 17:41:30 - ERROR - stderr - 30%|██▉ | 6720/22434 [7:33:50<11:04:14, 2.54s/it] +2025-02-05 17:41:30 - ERROR - stderr - +2025-02-05 17:41:30 - ERROR - stderr - +2025-02-05 17:41:30 - INFO - stdout - {'loss': 1.0423, 'grad_norm': 1.1672828197479248, 'learning_rate': 1.6426155530654943e-05, 'epoch': 0.9} +2025-02-05 17:41:30 - ERROR - stderr - 30%|██▉ | 6720/22434 [7:33:50<11:04:14, 2.54s/it] +2025-02-05 17:41:32 - ERROR - stderr - 30%|██▉ | 6721/22434 [7:33:52<11:03:15, 2.53s/it] +2025-02-05 17:41:33 - ERROR - stderr - +2025-02-05 17:41:33 - ERROR - stderr - +2025-02-05 17:41:33 - INFO - stdout - {'loss': 0.9826, 'grad_norm': 1.1026726961135864, 'learning_rate': 1.6425049281216755e-05, 'epoch': 0.9} +2025-02-05 17:41:33 - ERROR - stderr - 30%|██▉ | 6721/22434 [7:33:52<11:03:15, 2.53s/it] +2025-02-05 17:41:35 - ERROR - stderr - 30%|██▉ | 6722/22434 [7:33:55<11:00:57, 2.52s/it] +2025-02-05 17:41:35 - ERROR - stderr - +2025-02-05 17:41:35 - ERROR - stderr - +2025-02-05 17:41:35 - INFO - stdout - {'loss': 0.8839, 'grad_norm': 1.1375296115875244, 'learning_rate': 1.642394289785456e-05, 'epoch': 0.9} +2025-02-05 17:41:35 - ERROR - stderr - 30%|██▉ | 6722/22434 [7:33:55<11:00:57, 2.52s/it] +2025-02-05 17:41:37 - ERROR - stderr - 30%|██▉ | 6723/22434 [7:33:57<11:00:14, 2.52s/it] +2025-02-05 17:41:38 - ERROR - stderr - +2025-02-05 17:41:38 
- ERROR - stderr - +2025-02-05 17:41:38 - INFO - stdout - {'loss': 0.9149, 'grad_norm': 1.045061707496643, 'learning_rate': 1.642283638059143e-05, 'epoch': 0.9} +2025-02-05 17:41:38 - ERROR - stderr - 30%|██▉ | 6723/22434 [7:33:57<11:00:14, 2.52s/it] +2025-02-05 17:41:40 - ERROR - stderr - 30%|██▉ | 6724/22434 [7:34:00<10:56:27, 2.51s/it] +2025-02-05 17:41:40 - ERROR - stderr - +2025-02-05 17:41:40 - ERROR - stderr - +2025-02-05 17:41:40 - INFO - stdout - {'loss': 0.8937, 'grad_norm': 0.9502860307693481, 'learning_rate': 1.642172972945042e-05, 'epoch': 0.9} +2025-02-05 17:41:40 - ERROR - stderr - 30%|██▉ | 6724/22434 [7:34:00<10:56:27, 2.51s/it] +2025-02-05 17:41:42 - ERROR - stderr - 30%|██▉ | 6725/22434 [7:34:02<10:52:03, 2.49s/it] +2025-02-05 17:41:42 - ERROR - stderr - +2025-02-05 17:41:42 - ERROR - stderr - +2025-02-05 17:41:42 - INFO - stdout - {'loss': 0.8316, 'grad_norm': 0.9991877675056458, 'learning_rate': 1.6420622944454598e-05, 'epoch': 0.9} +2025-02-05 17:41:42 - ERROR - stderr - 30%|██▉ | 6725/22434 [7:34:02<10:52:03, 2.49s/it] +2025-02-05 17:41:45 - ERROR - stderr - 30%|██▉ | 6726/22434 [7:34:05<10:51:25, 2.49s/it] +2025-02-05 17:41:45 - ERROR - stderr - +2025-02-05 17:41:45 - ERROR - stderr - +2025-02-05 17:41:45 - INFO - stdout - {'loss': 0.8907, 'grad_norm': 1.03340482711792, 'learning_rate': 1.641951602562703e-05, 'epoch': 0.9} +2025-02-05 17:41:45 - ERROR - stderr - 30%|██▉ | 6726/22434 [7:34:05<10:51:25, 2.49s/it] +2025-02-05 17:41:47 - ERROR - stderr - 30%|██▉ | 6727/22434 [7:34:07<10:58:07, 2.51s/it] +2025-02-05 17:41:48 - ERROR - stderr - +2025-02-05 17:41:48 - ERROR - stderr - +2025-02-05 17:41:48 - INFO - stdout - {'loss': 0.9461, 'grad_norm': 1.0601781606674194, 'learning_rate': 1.64184089729908e-05, 'epoch': 0.9} +2025-02-05 17:41:48 - ERROR - stderr - 30%|██▉ | 6727/22434 [7:34:07<10:58:07, 2.51s/it] +2025-02-05 17:41:50 - ERROR - stderr - 30%|██▉ | 6728/22434 [7:34:10<11:11:11, 2.56s/it] +2025-02-05 17:41:50 - ERROR - stderr - 
+2025-02-05 17:41:50 - ERROR - stderr - +2025-02-05 17:41:50 - INFO - stdout - {'loss': 1.0307, 'grad_norm': 1.0657267570495605, 'learning_rate': 1.6417301786568973e-05, 'epoch': 0.9} +2025-02-05 17:41:50 - ERROR - stderr - 30%|██▉ | 6728/22434 [7:34:10<11:11:11, 2.56s/it] +2025-02-05 17:41:53 - ERROR - stderr - 30%|██▉ | 6729/22434 [7:34:12<11:10:06, 2.56s/it] +2025-02-05 17:41:53 - ERROR - stderr - +2025-02-05 17:41:53 - ERROR - stderr - +2025-02-05 17:41:53 - INFO - stdout - {'loss': 0.9574, 'grad_norm': 0.9871540665626526, 'learning_rate': 1.6416194466384632e-05, 'epoch': 0.9} +2025-02-05 17:41:53 - ERROR - stderr - 30%|██▉ | 6729/22434 [7:34:13<11:10:06, 2.56s/it] +2025-02-05 17:41:55 - ERROR - stderr - 30%|██▉ | 6730/22434 [7:34:15<11:09:07, 2.56s/it] +2025-02-05 17:41:55 - ERROR - stderr - +2025-02-05 17:41:55 - ERROR - stderr - +2025-02-05 17:41:55 - INFO - stdout - {'loss': 0.8814, 'grad_norm': 0.9986724257469177, 'learning_rate': 1.6415087012460857e-05, 'epoch': 0.9} +2025-02-05 17:41:55 - ERROR - stderr - 30%|██▉ | 6730/22434 [7:34:15<11:09:07, 2.56s/it] +2025-02-05 17:41:58 - ERROR - stderr - 30%|███ | 6731/22434 [7:34:17<11:02:02, 2.53s/it] +2025-02-05 17:41:58 - ERROR - stderr - +2025-02-05 17:41:58 - ERROR - stderr - +2025-02-05 17:41:58 - INFO - stdout - {'loss': 0.8484, 'grad_norm': 1.0343241691589355, 'learning_rate': 1.6413979424820733e-05, 'epoch': 0.9} +2025-02-05 17:41:58 - ERROR - stderr - 30%|███ | 6731/22434 [7:34:18<11:02:02, 2.53s/it] +2025-02-05 17:42:00 - ERROR - stderr - 30%|███ | 6732/22434 [7:34:20<10:54:34, 2.50s/it] +2025-02-05 17:42:00 - ERROR - stderr - +2025-02-05 17:42:00 - ERROR - stderr - +2025-02-05 17:42:00 - INFO - stdout - {'loss': 0.8975, 'grad_norm': 1.114450216293335, 'learning_rate': 1.6412871703487345e-05, 'epoch': 0.9} +2025-02-05 17:42:00 - ERROR - stderr - 30%|███ | 6732/22434 [7:34:20<10:54:34, 2.50s/it] +2025-02-05 17:42:03 - ERROR - stderr - 30%|███ | 6733/22434 [7:34:22<10:53:54, 2.50s/it] +2025-02-05 17:42:03 
- ERROR - stderr - +2025-02-05 17:42:03 - ERROR - stderr - +2025-02-05 17:42:03 - INFO - stdout - {'loss': 0.9997, 'grad_norm': 1.2138824462890625, 'learning_rate': 1.6411763848483782e-05, 'epoch': 0.9} +2025-02-05 17:42:03 - ERROR - stderr - 30%|███ | 6733/22434 [7:34:22<10:53:54, 2.50s/it] +2025-02-05 17:42:05 - ERROR - stderr - 30%|███ | 6734/22434 [7:34:25<10:49:30, 2.48s/it] +2025-02-05 17:42:05 - ERROR - stderr - +2025-02-05 17:42:05 - ERROR - stderr - +2025-02-05 17:42:05 - INFO - stdout - {'loss': 0.9995, 'grad_norm': 1.0738543272018433, 'learning_rate': 1.641065585983314e-05, 'epoch': 0.9} +2025-02-05 17:42:05 - ERROR - stderr - 30%|███ | 6734/22434 [7:34:25<10:49:30, 2.48s/it] +2025-02-05 17:42:08 - ERROR - stderr - 30%|███ | 6735/22434 [7:34:27<10:52:15, 2.49s/it] +2025-02-05 17:42:08 - ERROR - stderr - +2025-02-05 17:42:08 - ERROR - stderr - +2025-02-05 17:42:08 - INFO - stdout - {'loss': 0.8487, 'grad_norm': 0.9797514081001282, 'learning_rate': 1.6409547737558504e-05, 'epoch': 0.9} +2025-02-05 17:42:08 - ERROR - stderr - 30%|███ | 6735/22434 [7:34:27<10:52:15, 2.49s/it] +2025-02-05 17:42:10 - ERROR - stderr - 30%|███ | 6736/22434 [7:34:30<11:06:36, 2.55s/it] +2025-02-05 17:42:10 - ERROR - stderr - +2025-02-05 17:42:10 - ERROR - stderr - +2025-02-05 17:42:10 - INFO - stdout - {'loss': 0.9828, 'grad_norm': 1.0873870849609375, 'learning_rate': 1.6408439481682985e-05, 'epoch': 0.9} +2025-02-05 17:42:10 - ERROR - stderr - 30%|███ | 6736/22434 [7:34:30<11:06:36, 2.55s/it] +2025-02-05 17:42:13 - ERROR - stderr - 30%|███ | 6737/22434 [7:34:33<10:59:37, 2.52s/it] +2025-02-05 17:42:13 - ERROR - stderr - +2025-02-05 17:42:13 - ERROR - stderr - +2025-02-05 17:42:13 - INFO - stdout - {'loss': 1.0074, 'grad_norm': 1.237776517868042, 'learning_rate': 1.6407331092229673e-05, 'epoch': 0.9} +2025-02-05 17:42:13 - ERROR - stderr - 30%|███ | 6737/22434 [7:34:33<10:59:37, 2.52s/it] +2025-02-05 17:42:15 - ERROR - stderr - 30%|███ | 6738/22434 [7:34:35<10:57:45, 2.51s/it] 
+2025-02-05 17:42:15 - ERROR - stderr - +2025-02-05 17:42:15 - ERROR - stderr - +2025-02-05 17:42:15 - INFO - stdout - {'loss': 0.949, 'grad_norm': 1.0938637256622314, 'learning_rate': 1.6406222569221678e-05, 'epoch': 0.9} +2025-02-05 17:42:15 - ERROR - stderr - 30%|███ | 6738/22434 [7:34:35<10:57:45, 2.51s/it] +2025-02-05 17:42:18 - ERROR - stderr - 30%|███ | 6739/22434 [7:34:38<11:13:43, 2.58s/it] +2025-02-05 17:42:18 - ERROR - stderr - +2025-02-05 17:42:18 - ERROR - stderr - +2025-02-05 17:42:18 - INFO - stdout - {'loss': 0.9336, 'grad_norm': 1.1377477645874023, 'learning_rate': 1.64051139126821e-05, 'epoch': 0.9} +2025-02-05 17:42:18 - ERROR - stderr - 30%|███ | 6739/22434 [7:34:38<11:13:43, 2.58s/it] +2025-02-05 17:42:20 - ERROR - stderr - 30%|███ | 6740/22434 [7:34:40<11:08:16, 2.55s/it] +2025-02-05 17:42:21 - ERROR - stderr - +2025-02-05 17:42:21 - ERROR - stderr - +2025-02-05 17:42:21 - INFO - stdout - {'loss': 0.9324, 'grad_norm': 1.1673563718795776, 'learning_rate': 1.6404005122634058e-05, 'epoch': 0.9} +2025-02-05 17:42:21 - ERROR - stderr - 30%|███ | 6740/22434 [7:34:40<11:08:16, 2.55s/it] +2025-02-05 17:42:23 - ERROR - stderr - 30%|███ | 6741/22434 [7:34:43<11:03:24, 2.54s/it] +2025-02-05 17:42:23 - ERROR - stderr - +2025-02-05 17:42:23 - ERROR - stderr - +2025-02-05 17:42:23 - INFO - stdout - {'loss': 0.8181, 'grad_norm': 1.0079574584960938, 'learning_rate': 1.640289619910065e-05, 'epoch': 0.9} +2025-02-05 17:42:23 - ERROR - stderr - 30%|███ | 6741/22434 [7:34:43<11:03:24, 2.54s/it] +2025-02-05 17:42:25 - ERROR - stderr - 30%|███ | 6742/22434 [7:34:45<11:01:26, 2.53s/it] +2025-02-05 17:42:26 - ERROR - stderr - +2025-02-05 17:42:26 - ERROR - stderr - +2025-02-05 17:42:26 - INFO - stdout - {'loss': 1.0669, 'grad_norm': 1.0750503540039062, 'learning_rate': 1.6401787142105004e-05, 'epoch': 0.9} +2025-02-05 17:42:26 - ERROR - stderr - 30%|███ | 6742/22434 [7:34:45<11:01:26, 2.53s/it] +2025-02-05 17:42:28 - ERROR - stderr - 30%|███ | 6743/22434 
[7:34:48<10:53:35, 2.50s/it] +2025-02-05 17:42:28 - ERROR - stderr - +2025-02-05 17:42:28 - ERROR - stderr - +2025-02-05 17:42:28 - INFO - stdout - {'loss': 0.9234, 'grad_norm': 1.024989128112793, 'learning_rate': 1.6400677951670228e-05, 'epoch': 0.9} +2025-02-05 17:42:28 - ERROR - stderr - 30%|███ | 6743/22434 [7:34:48<10:53:35, 2.50s/it] +2025-02-05 17:42:31 - ERROR - stderr - 30%|███ | 6744/22434 [7:34:50<11:02:26, 2.53s/it] +2025-02-05 17:42:31 - ERROR - stderr - +2025-02-05 17:42:31 - ERROR - stderr - +2025-02-05 17:42:31 - INFO - stdout - {'loss': 0.9521, 'grad_norm': 1.0257606506347656, 'learning_rate': 1.6399568627819445e-05, 'epoch': 0.9} +2025-02-05 17:42:31 - ERROR - stderr - 30%|███ | 6744/22434 [7:34:50<11:02:26, 2.53s/it] +2025-02-05 17:42:33 - ERROR - stderr - 30%|███ | 6745/22434 [7:34:53<10:58:49, 2.52s/it] +2025-02-05 17:42:33 - ERROR - stderr - +2025-02-05 17:42:33 - ERROR - stderr - +2025-02-05 17:42:33 - INFO - stdout - {'loss': 0.9358, 'grad_norm': 1.0672742128372192, 'learning_rate': 1.6398459170575776e-05, 'epoch': 0.9} +2025-02-05 17:42:33 - ERROR - stderr - 30%|███ | 6745/22434 [7:34:53<10:58:49, 2.52s/it] +2025-02-05 17:42:35 - ERROR - stderr - 30%|███ | 6746/22434 [7:34:55<10:54:28, 2.50s/it] +2025-02-05 17:42:36 - ERROR - stderr - +2025-02-05 17:42:36 - ERROR - stderr - +2025-02-05 17:42:36 - INFO - stdout - {'loss': 0.9811, 'grad_norm': 0.9566826820373535, 'learning_rate': 1.639734957996235e-05, 'epoch': 0.9} +2025-02-05 17:42:36 - ERROR - stderr - 30%|███ | 6746/22434 [7:34:55<10:54:28, 2.50s/it] +2025-02-05 17:42:38 - ERROR - stderr - 30%|███ | 6747/22434 [7:34:58<10:55:27, 2.51s/it] +2025-02-05 17:42:38 - ERROR - stderr - +2025-02-05 17:42:38 - ERROR - stderr - +2025-02-05 17:42:38 - INFO - stdout - {'loss': 0.9077, 'grad_norm': 1.0109277963638306, 'learning_rate': 1.6396239856002295e-05, 'epoch': 0.9} +2025-02-05 17:42:38 - ERROR - stderr - 30%|███ | 6747/22434 [7:34:58<10:55:27, 2.51s/it] +2025-02-05 17:42:41 - ERROR - stderr - 
30%|███ | 6748/22434 [7:35:00<10:58:05, 2.52s/it] +2025-02-05 17:42:41 - ERROR - stderr - +2025-02-05 17:42:41 - ERROR - stderr - +2025-02-05 17:42:41 - INFO - stdout - {'loss': 0.9854, 'grad_norm': 1.2165746688842773, 'learning_rate': 1.639512999871874e-05, 'epoch': 0.9} +2025-02-05 17:42:41 - ERROR - stderr - 30%|███ | 6748/22434 [7:35:00<10:58:05, 2.52s/it] +2025-02-05 17:42:43 - ERROR - stderr - 30%|███ | 6749/22434 [7:35:03<10:53:39, 2.50s/it] +2025-02-05 17:42:43 - ERROR - stderr - +2025-02-05 17:42:43 - ERROR - stderr - +2025-02-05 17:42:43 - INFO - stdout - {'loss': 1.0142, 'grad_norm': 1.08646559715271, 'learning_rate': 1.639402000813482e-05, 'epoch': 0.9} +2025-02-05 17:42:43 - ERROR - stderr - 30%|███ | 6749/22434 [7:35:03<10:53:39, 2.50s/it] +2025-02-05 17:42:45 - ERROR - stderr - 30%|███ | 6750/22434 [7:35:05<10:54:38, 2.50s/it] +2025-02-05 17:42:46 - ERROR - stderr - +2025-02-05 17:42:46 - ERROR - stderr - +2025-02-05 17:42:46 - INFO - stdout - {'loss': 1.005, 'grad_norm': 1.0451642274856567, 'learning_rate': 1.639290988427367e-05, 'epoch': 0.9} +2025-02-05 17:42:46 - ERROR - stderr - 30%|███ | 6750/22434 [7:35:05<10:54:38, 2.50s/it] +2025-02-05 17:42:48 - ERROR - stderr - 30%|███ | 6751/22434 [7:35:08<10:51:32, 2.49s/it] +2025-02-05 17:42:48 - ERROR - stderr - +2025-02-05 17:42:48 - ERROR - stderr - +2025-02-05 17:42:48 - INFO - stdout - {'loss': 0.8274, 'grad_norm': 1.022635579109192, 'learning_rate': 1.6391799627158432e-05, 'epoch': 0.9} +2025-02-05 17:42:48 - ERROR - stderr - 30%|███ | 6751/22434 [7:35:08<10:51:32, 2.49s/it] +2025-02-05 17:42:51 - ERROR - stderr - 30%|███ | 6752/22434 [7:35:10<10:56:07, 2.51s/it] +2025-02-05 17:42:51 - ERROR - stderr - +2025-02-05 17:42:51 - ERROR - stderr - +2025-02-05 17:42:51 - INFO - stdout - {'loss': 1.0794, 'grad_norm': 1.0998194217681885, 'learning_rate': 1.6390689236812244e-05, 'epoch': 0.9} +2025-02-05 17:42:51 - ERROR - stderr - 30%|███ | 6752/22434 [7:35:10<10:56:07, 2.51s/it] +2025-02-05 17:42:53 - 
ERROR - stderr - 30%|███ | 6753/22434 [7:35:13<10:55:12, 2.51s/it] +2025-02-05 17:42:53 - ERROR - stderr - +2025-02-05 17:42:53 - ERROR - stderr - +2025-02-05 17:42:53 - INFO - stdout - {'loss': 0.9725, 'grad_norm': 1.0877984762191772, 'learning_rate': 1.638957871325826e-05, 'epoch': 0.9} +2025-02-05 17:42:53 - ERROR - stderr - 30%|███ | 6753/22434 [7:35:13<10:55:12, 2.51s/it] +2025-02-05 17:42:55 - ERROR - stderr - 30%|███ | 6754/22434 [7:35:15<10:51:26, 2.49s/it] +2025-02-05 17:42:56 - ERROR - stderr - +2025-02-05 17:42:56 - ERROR - stderr - +2025-02-05 17:42:56 - INFO - stdout - {'loss': 0.8947, 'grad_norm': 1.0117844343185425, 'learning_rate': 1.638846805651961e-05, 'epoch': 0.9} +2025-02-05 17:42:56 - ERROR - stderr - 30%|███ | 6754/22434 [7:35:15<10:51:26, 2.49s/it] +2025-02-05 17:42:58 - ERROR - stderr - 30%|███ | 6755/22434 [7:35:18<10:51:22, 2.49s/it] +2025-02-05 17:42:58 - ERROR - stderr - +2025-02-05 17:42:58 - ERROR - stderr - +2025-02-05 17:42:58 - INFO - stdout - {'loss': 0.8768, 'grad_norm': 0.939814567565918, 'learning_rate': 1.6387357266619467e-05, 'epoch': 0.9} +2025-02-05 17:42:58 - ERROR - stderr - 30%|███ | 6755/22434 [7:35:18<10:51:22, 2.49s/it] +2025-02-05 17:43:00 - ERROR - stderr - 30%|███ | 6756/22434 [7:35:20<10:51:53, 2.49s/it] +2025-02-05 17:43:01 - ERROR - stderr - +2025-02-05 17:43:01 - ERROR - stderr - +2025-02-05 17:43:01 - INFO - stdout - {'loss': 0.9017, 'grad_norm': 1.0196579694747925, 'learning_rate': 1.6386246343580973e-05, 'epoch': 0.9} +2025-02-05 17:43:01 - ERROR - stderr - 30%|███ | 6756/22434 [7:35:20<10:51:53, 2.49s/it] +2025-02-05 17:43:03 - ERROR - stderr - 30%|███ | 6757/22434 [7:35:23<10:55:27, 2.51s/it] +2025-02-05 17:43:03 - ERROR - stderr - +2025-02-05 17:43:03 - ERROR - stderr - +2025-02-05 17:43:03 - INFO - stdout - {'loss': 0.9509, 'grad_norm': 1.1136674880981445, 'learning_rate': 1.6385135287427284e-05, 'epoch': 0.9} +2025-02-05 17:43:03 - ERROR - stderr - 30%|███ | 6757/22434 [7:35:23<10:55:27, 2.51s/it] 
+2025-02-05 17:43:05 - ERROR - stderr - 30%|███ | 6758/22434 [7:35:25<10:51:37, 2.49s/it] +2025-02-05 17:43:06 - ERROR - stderr - +2025-02-05 17:43:06 - ERROR - stderr - +2025-02-05 17:43:06 - INFO - stdout - {'loss': 0.8894, 'grad_norm': 1.0302562713623047, 'learning_rate': 1.6384024098181557e-05, 'epoch': 0.9} +2025-02-05 17:43:06 - ERROR - stderr - 30%|███ | 6758/22434 [7:35:25<10:51:37, 2.49s/it] +2025-02-05 17:43:08 - ERROR - stderr - 30%|███ | 6759/22434 [7:35:28<11:09:39, 2.56s/it] +2025-02-05 17:43:08 - ERROR - stderr - +2025-02-05 17:43:08 - ERROR - stderr - +2025-02-05 17:43:08 - INFO - stdout - {'loss': 0.8875, 'grad_norm': 0.9795793890953064, 'learning_rate': 1.638291277586696e-05, 'epoch': 0.9} +2025-02-05 17:43:08 - ERROR - stderr - 30%|███ | 6759/22434 [7:35:28<11:09:39, 2.56s/it] +2025-02-05 17:43:11 - ERROR - stderr - 30%|███ | 6760/22434 [7:35:31<11:11:48, 2.57s/it] +2025-02-05 17:43:11 - ERROR - stderr - +2025-02-05 17:43:11 - ERROR - stderr - +2025-02-05 17:43:11 - INFO - stdout - {'loss': 1.043, 'grad_norm': 1.160224199295044, 'learning_rate': 1.6381801320506655e-05, 'epoch': 0.9} +2025-02-05 17:43:11 - ERROR - stderr - 30%|███ | 6760/22434 [7:35:31<11:11:48, 2.57s/it] +2025-02-05 17:43:13 - ERROR - stderr - 30%|███ | 6761/22434 [7:35:33<11:02:59, 2.54s/it] +2025-02-05 17:43:13 - ERROR - stderr - +2025-02-05 17:43:13 - ERROR - stderr - +2025-02-05 17:43:13 - INFO - stdout - {'loss': 0.9311, 'grad_norm': 1.0704209804534912, 'learning_rate': 1.6380689732123804e-05, 'epoch': 0.9} +2025-02-05 17:43:13 - ERROR - stderr - 30%|███ | 6761/22434 [7:35:33<11:02:59, 2.54s/it] +2025-02-05 17:43:16 - ERROR - stderr - 30%|███ | 6762/22434 [7:35:36<11:01:10, 2.53s/it] +2025-02-05 17:43:16 - ERROR - stderr - +2025-02-05 17:43:16 - ERROR - stderr - +2025-02-05 17:43:16 - INFO - stdout - {'loss': 0.8842, 'grad_norm': 0.9757776856422424, 'learning_rate': 1.6379578010741582e-05, 'epoch': 0.9} +2025-02-05 17:43:16 - ERROR - stderr - 30%|███ | 6762/22434 
[7:35:36<11:01:10, 2.53s/it] +2025-02-05 17:43:18 - ERROR - stderr - 30%|███ | 6763/22434 [7:35:38<10:54:17, 2.51s/it] +2025-02-05 17:43:18 - ERROR - stderr - +2025-02-05 17:43:18 - ERROR - stderr - +2025-02-05 17:43:18 - INFO - stdout - {'loss': 0.9848, 'grad_norm': 1.0957953929901123, 'learning_rate': 1.6378466156383163e-05, 'epoch': 0.9} +2025-02-05 17:43:18 - ERROR - stderr - 30%|███ | 6763/22434 [7:35:38<10:54:17, 2.51s/it] +2025-02-05 17:43:21 - ERROR - stderr - 30%|███ | 6764/22434 [7:35:40<10:50:06, 2.49s/it] +2025-02-05 17:43:21 - ERROR - stderr - +2025-02-05 17:43:21 - ERROR - stderr - +2025-02-05 17:43:21 - INFO - stdout - {'loss': 0.7924, 'grad_norm': 0.9756842851638794, 'learning_rate': 1.637735416907172e-05, 'epoch': 0.9} +2025-02-05 17:43:21 - ERROR - stderr - 30%|███ | 6764/22434 [7:35:40<10:50:06, 2.49s/it] +2025-02-05 17:43:23 - ERROR - stderr - 30%|███ | 6765/22434 [7:35:43<10:52:24, 2.50s/it] +2025-02-05 17:43:23 - ERROR - stderr - +2025-02-05 17:43:23 - ERROR - stderr - +2025-02-05 17:43:23 - INFO - stdout - {'loss': 1.0144, 'grad_norm': 1.0808343887329102, 'learning_rate': 1.6376242048830432e-05, 'epoch': 0.9} +2025-02-05 17:43:23 - ERROR - stderr - 30%|███ | 6765/22434 [7:35:43<10:52:24, 2.50s/it] +2025-02-05 17:43:26 - ERROR - stderr - 30%|███ | 6766/22434 [7:35:45<10:57:22, 2.52s/it] +2025-02-05 17:43:26 - ERROR - stderr - +2025-02-05 17:43:26 - ERROR - stderr - +2025-02-05 17:43:26 - INFO - stdout - {'loss': 0.8754, 'grad_norm': 0.9551745653152466, 'learning_rate': 1.637512979568248e-05, 'epoch': 0.9} +2025-02-05 17:43:26 - ERROR - stderr - 30%|███ | 6766/22434 [7:35:46<10:57:22, 2.52s/it] +2025-02-05 17:43:28 - ERROR - stderr - 30%|███ | 6767/22434 [7:35:48<10:59:20, 2.53s/it] +2025-02-05 17:43:28 - ERROR - stderr - +2025-02-05 17:43:28 - ERROR - stderr - +2025-02-05 17:43:28 - INFO - stdout - {'loss': 1.0409, 'grad_norm': 1.1391195058822632, 'learning_rate': 1.6374017409651045e-05, 'epoch': 0.9} +2025-02-05 17:43:28 - ERROR - stderr - 
30%|███ | 6767/22434 [7:35:48<10:59:20, 2.53s/it] +2025-02-05 17:43:31 - ERROR - stderr - 30%|███ | 6768/22434 [7:35:51<10:55:17, 2.51s/it] +2025-02-05 17:43:31 - ERROR - stderr - +2025-02-05 17:43:31 - ERROR - stderr - +2025-02-05 17:43:31 - INFO - stdout - {'loss': 0.899, 'grad_norm': 1.0498212575912476, 'learning_rate': 1.637290489075932e-05, 'epoch': 0.91} +2025-02-05 17:43:31 - ERROR - stderr - 30%|███ | 6768/22434 [7:35:51<10:55:17, 2.51s/it] +2025-02-05 17:43:33 - ERROR - stderr - 30%|███ | 6769/22434 [7:35:53<10:54:08, 2.51s/it] +2025-02-05 17:43:33 - ERROR - stderr - +2025-02-05 17:43:33 - ERROR - stderr - +2025-02-05 17:43:33 - INFO - stdout - {'loss': 0.8813, 'grad_norm': 0.9705497026443481, 'learning_rate': 1.6371792239030488e-05, 'epoch': 0.91} +2025-02-05 17:43:33 - ERROR - stderr - 30%|███ | 6769/22434 [7:35:53<10:54:08, 2.51s/it] +2025-02-05 17:43:36 - ERROR - stderr - 30%|███ | 6770/22434 [7:35:56<10:55:22, 2.51s/it] +2025-02-05 17:43:36 - ERROR - stderr - +2025-02-05 17:43:36 - ERROR - stderr - +2025-02-05 17:43:36 - INFO - stdout - {'loss': 0.9106, 'grad_norm': 0.9668666124343872, 'learning_rate': 1.6370679454487747e-05, 'epoch': 0.91} +2025-02-05 17:43:36 - ERROR - stderr - 30%|███ | 6770/22434 [7:35:56<10:55:22, 2.51s/it] +2025-02-05 17:43:38 - ERROR - stderr - 30%|███ | 6771/22434 [7:35:58<10:52:48, 2.50s/it] +2025-02-05 17:43:38 - ERROR - stderr - +2025-02-05 17:43:38 - ERROR - stderr - +2025-02-05 17:43:38 - INFO - stdout - {'loss': 1.0491, 'grad_norm': 1.105089545249939, 'learning_rate': 1.6369566537154285e-05, 'epoch': 0.91} +2025-02-05 17:43:38 - ERROR - stderr - 30%|███ | 6771/22434 [7:35:58<10:52:48, 2.50s/it] +2025-02-05 17:43:41 - ERROR - stderr - 30%|███ | 6772/22434 [7:36:01<10:57:02, 2.52s/it] +2025-02-05 17:43:41 - ERROR - stderr - +2025-02-05 17:43:41 - ERROR - stderr - +2025-02-05 17:43:41 - INFO - stdout - {'loss': 0.8109, 'grad_norm': 1.0066843032836914, 'learning_rate': 1.6368453487053305e-05, 'epoch': 0.91} +2025-02-05 
17:43:41 - ERROR - stderr - 30%|███ | 6772/22434 [7:36:01<10:57:02, 2.52s/it] +2025-02-05 17:43:43 - ERROR - stderr - 30%|███ | 6773/22434 [7:36:03<10:58:47, 2.52s/it] +2025-02-05 17:43:43 - ERROR - stderr - +2025-02-05 17:43:43 - ERROR - stderr - +2025-02-05 17:43:43 - INFO - stdout - {'loss': 0.9952, 'grad_norm': 1.1398422718048096, 'learning_rate': 1.6367340304208008e-05, 'epoch': 0.91} +2025-02-05 17:43:43 - ERROR - stderr - 30%|███ | 6773/22434 [7:36:03<10:58:47, 2.52s/it] +2025-02-05 17:43:46 - ERROR - stderr - 30%|███ | 6774/22434 [7:36:06<10:53:39, 2.50s/it] +2025-02-05 17:43:46 - ERROR - stderr - +2025-02-05 17:43:46 - ERROR - stderr - +2025-02-05 17:43:46 - INFO - stdout - {'loss': 0.997, 'grad_norm': 1.152174949645996, 'learning_rate': 1.6366226988641593e-05, 'epoch': 0.91} +2025-02-05 17:43:46 - ERROR - stderr - 30%|███ | 6774/22434 [7:36:06<10:53:39, 2.50s/it] +2025-02-05 17:43:48 - ERROR - stderr - 30%|███ | 6775/22434 [7:36:08<10:54:51, 2.51s/it] +2025-02-05 17:43:48 - ERROR - stderr - +2025-02-05 17:43:48 - ERROR - stderr - +2025-02-05 17:43:48 - INFO - stdout - {'loss': 0.833, 'grad_norm': 0.9366565942764282, 'learning_rate': 1.6365113540377268e-05, 'epoch': 0.91} +2025-02-05 17:43:48 - ERROR - stderr - 30%|███ | 6775/22434 [7:36:08<10:54:51, 2.51s/it] +2025-02-05 17:43:51 - ERROR - stderr - 30%|███ | 6776/22434 [7:36:11<10:57:26, 2.52s/it] +2025-02-05 17:43:51 - ERROR - stderr - +2025-02-05 17:43:51 - ERROR - stderr - +2025-02-05 17:43:51 - INFO - stdout - {'loss': 0.9197, 'grad_norm': 0.9946486353874207, 'learning_rate': 1.6363999959438243e-05, 'epoch': 0.91} +2025-02-05 17:43:51 - ERROR - stderr - 30%|███ | 6776/22434 [7:36:11<10:57:26, 2.52s/it] +2025-02-05 17:43:53 - ERROR - stderr - 30%|███ | 6777/22434 [7:36:13<10:59:39, 2.53s/it] +2025-02-05 17:43:53 - ERROR - stderr - +2025-02-05 17:43:53 - ERROR - stderr - +2025-02-05 17:43:53 - INFO - stdout - {'loss': 0.9187, 'grad_norm': 1.0952951908111572, 'learning_rate': 1.6362886245847732e-05, 
'epoch': 0.91} +2025-02-05 17:43:53 - ERROR - stderr - 30%|███ | 6777/22434 [7:36:13<10:59:39, 2.53s/it] +2025-02-05 17:43:56 - ERROR - stderr - 30%|███ | 6778/22434 [7:36:16<10:51:40, 2.50s/it] +2025-02-05 17:43:56 - ERROR - stderr - +2025-02-05 17:43:56 - ERROR - stderr - +2025-02-05 17:43:56 - INFO - stdout - {'loss': 1.0242, 'grad_norm': 1.068766474723816, 'learning_rate': 1.636177239962894e-05, 'epoch': 0.91} +2025-02-05 17:43:56 - ERROR - stderr - 30%|███ | 6778/22434 [7:36:16<10:51:40, 2.50s/it] +2025-02-05 17:43:58 - ERROR - stderr - 30%|███ | 6779/22434 [7:36:18<10:52:32, 2.50s/it] +2025-02-05 17:43:58 - ERROR - stderr - +2025-02-05 17:43:58 - ERROR - stderr - +2025-02-05 17:43:58 - INFO - stdout - {'loss': 0.9773, 'grad_norm': 1.1795401573181152, 'learning_rate': 1.6360658420805093e-05, 'epoch': 0.91} +2025-02-05 17:43:58 - ERROR - stderr - 30%|███ | 6779/22434 [7:36:18<10:52:32, 2.50s/it] +2025-02-05 17:44:01 - ERROR - stderr - 30%|███ | 6780/22434 [7:36:21<11:03:36, 2.54s/it] +2025-02-05 17:44:01 - ERROR - stderr - +2025-02-05 17:44:01 - ERROR - stderr - +2025-02-05 17:44:01 - INFO - stdout - {'loss': 0.8669, 'grad_norm': 1.0513519048690796, 'learning_rate': 1.6359544309399406e-05, 'epoch': 0.91} +2025-02-05 17:44:01 - ERROR - stderr - 30%|███ | 6780/22434 [7:36:21<11:03:36, 2.54s/it] +2025-02-05 17:44:04 - ERROR - stderr - 30%|███ | 6781/22434 [7:36:23<11:03:42, 2.54s/it] +2025-02-05 17:44:04 - ERROR - stderr - +2025-02-05 17:44:04 - ERROR - stderr - +2025-02-05 17:44:04 - INFO - stdout - {'loss': 1.0301, 'grad_norm': 1.0561473369598389, 'learning_rate': 1.6358430065435106e-05, 'epoch': 0.91} +2025-02-05 17:44:04 - ERROR - stderr - 30%|███ | 6781/22434 [7:36:23<11:03:42, 2.54s/it] +2025-02-05 17:44:06 - ERROR - stderr - 30%|███ | 6782/22434 [7:36:26<10:59:16, 2.53s/it] +2025-02-05 17:44:06 - ERROR - stderr - +2025-02-05 17:44:06 - ERROR - stderr - +2025-02-05 17:44:06 - INFO - stdout - {'loss': 0.999, 'grad_norm': 1.0933724641799927, 'learning_rate': 
1.6357315688935414e-05, 'epoch': 0.91} +2025-02-05 17:44:06 - ERROR - stderr - 30%|███ | 6782/22434 [7:36:26<10:59:16, 2.53s/it] +2025-02-05 17:44:09 - ERROR - stderr - 30%|███ | 6783/22434 [7:36:28<10:58:26, 2.52s/it] +2025-02-05 17:44:09 - ERROR - stderr - +2025-02-05 17:44:09 - ERROR - stderr - +2025-02-05 17:44:09 - INFO - stdout - {'loss': 0.943, 'grad_norm': 0.9801920652389526, 'learning_rate': 1.6356201179923558e-05, 'epoch': 0.91} +2025-02-05 17:44:09 - ERROR - stderr - 30%|███ | 6783/22434 [7:36:28<10:58:26, 2.52s/it] +2025-02-05 17:44:11 - ERROR - stderr - 30%|███ | 6784/22434 [7:36:31<10:55:52, 2.51s/it] +2025-02-05 17:44:11 - ERROR - stderr - +2025-02-05 17:44:11 - ERROR - stderr - +2025-02-05 17:44:11 - INFO - stdout - {'loss': 1.0108, 'grad_norm': 1.1041316986083984, 'learning_rate': 1.6355086538422775e-05, 'epoch': 0.91} +2025-02-05 17:44:11 - ERROR - stderr - 30%|███ | 6784/22434 [7:36:31<10:55:52, 2.51s/it] +2025-02-05 17:44:14 - ERROR - stderr - 30%|███ | 6785/22434 [7:36:33<11:00:38, 2.53s/it] +2025-02-05 17:44:14 - ERROR - stderr - +2025-02-05 17:44:14 - ERROR - stderr - +2025-02-05 17:44:14 - INFO - stdout - {'loss': 0.8917, 'grad_norm': 1.0017673969268799, 'learning_rate': 1.635397176445629e-05, 'epoch': 0.91} +2025-02-05 17:44:14 - ERROR - stderr - 30%|███ | 6785/22434 [7:36:33<11:00:38, 2.53s/it] +2025-02-05 17:44:16 - ERROR - stderr - 30%|███ | 6786/22434 [7:36:36<10:58:31, 2.53s/it] +2025-02-05 17:44:16 - ERROR - stderr - +2025-02-05 17:44:16 - ERROR - stderr - +2025-02-05 17:44:16 - INFO - stdout - {'loss': 1.024, 'grad_norm': 1.0964072942733765, 'learning_rate': 1.6352856858047347e-05, 'epoch': 0.91} +2025-02-05 17:44:16 - ERROR - stderr - 30%|███ | 6786/22434 [7:36:36<10:58:31, 2.53s/it] +2025-02-05 17:44:19 - ERROR - stderr - 30%|███ | 6787/22434 [7:36:38<10:51:30, 2.50s/it] +2025-02-05 17:44:19 - ERROR - stderr - +2025-02-05 17:44:19 - ERROR - stderr - +2025-02-05 17:44:19 - INFO - stdout - {'loss': 0.9834, 'grad_norm': 
1.1336349248886108, 'learning_rate': 1.6351741819219177e-05, 'epoch': 0.91} +2025-02-05 17:44:19 - ERROR - stderr - 30%|███ | 6787/22434 [7:36:38<10:51:30, 2.50s/it] +2025-02-05 17:44:21 - ERROR - stderr - 30%|███ | 6788/22434 [7:36:41<10:53:30, 2.51s/it] +2025-02-05 17:44:21 - ERROR - stderr - +2025-02-05 17:44:21 - ERROR - stderr - +2025-02-05 17:44:21 - INFO - stdout - {'loss': 0.8408, 'grad_norm': 1.040679693222046, 'learning_rate': 1.635062664799503e-05, 'epoch': 0.91} +2025-02-05 17:44:21 - ERROR - stderr - 30%|███ | 6788/22434 [7:36:41<10:53:30, 2.51s/it] +2025-02-05 17:44:24 - ERROR - stderr - 30%|███ | 6789/22434 [7:36:44<11:17:58, 2.60s/it] +2025-02-05 17:44:24 - ERROR - stderr - +2025-02-05 17:44:24 - ERROR - stderr - +2025-02-05 17:44:24 - INFO - stdout - {'loss': 1.0029, 'grad_norm': 1.1216731071472168, 'learning_rate': 1.6349511344398148e-05, 'epoch': 0.91} +2025-02-05 17:44:24 - ERROR - stderr - 30%|███ | 6789/22434 [7:36:44<11:17:58, 2.60s/it] +2025-02-05 17:44:26 - ERROR - stderr - 30%|███ | 6790/22434 [7:36:46<11:08:35, 2.56s/it] +2025-02-05 17:44:26 - ERROR - stderr - +2025-02-05 17:44:26 - ERROR - stderr - +2025-02-05 17:44:26 - INFO - stdout - {'loss': 0.8722, 'grad_norm': 1.0857746601104736, 'learning_rate': 1.6348395908451778e-05, 'epoch': 0.91} +2025-02-05 17:44:26 - ERROR - stderr - 30%|███ | 6790/22434 [7:36:46<11:08:35, 2.56s/it] +2025-02-05 17:44:29 - ERROR - stderr - 30%|███ | 6791/22434 [7:36:49<11:03:20, 2.54s/it] +2025-02-05 17:44:29 - ERROR - stderr - +2025-02-05 17:44:29 - ERROR - stderr - +2025-02-05 17:44:29 - INFO - stdout - {'loss': 0.9195, 'grad_norm': 0.9849955439567566, 'learning_rate': 1.634728034017917e-05, 'epoch': 0.91} +2025-02-05 17:44:29 - ERROR - stderr - 30%|███ | 6791/22434 [7:36:49<11:03:20, 2.54s/it] +2025-02-05 17:44:31 - ERROR - stderr - 30%|███ | 6792/22434 [7:36:51<11:04:55, 2.55s/it] +2025-02-05 17:44:31 - ERROR - stderr - +2025-02-05 17:44:31 - ERROR - stderr - +2025-02-05 17:44:31 - INFO - stdout - 
{'loss': 0.9708, 'grad_norm': 1.0231690406799316, 'learning_rate': 1.6346164639603575e-05, 'epoch': 0.91} +2025-02-05 17:44:31 - ERROR - stderr - 30%|███ | 6792/22434 [7:36:51<11:04:55, 2.55s/it] +2025-02-05 17:44:34 - ERROR - stderr - 30%|███ | 6793/22434 [7:36:54<10:56:09, 2.52s/it] +2025-02-05 17:44:34 - ERROR - stderr - +2025-02-05 17:44:34 - ERROR - stderr - +2025-02-05 17:44:34 - INFO - stdout - {'loss': 0.9653, 'grad_norm': 1.0137907266616821, 'learning_rate': 1.6345048806748248e-05, 'epoch': 0.91} +2025-02-05 17:44:34 - ERROR - stderr - 30%|███ | 6793/22434 [7:36:54<10:56:09, 2.52s/it] +2025-02-05 17:44:36 - ERROR - stderr - 30%|███ | 6794/22434 [7:36:56<10:55:07, 2.51s/it] +2025-02-05 17:44:36 - ERROR - stderr - +2025-02-05 17:44:36 - ERROR - stderr - +2025-02-05 17:44:36 - INFO - stdout - {'loss': 0.8228, 'grad_norm': 1.0158982276916504, 'learning_rate': 1.6343932841636455e-05, 'epoch': 0.91} +2025-02-05 17:44:36 - ERROR - stderr - 30%|███ | 6794/22434 [7:36:56<10:55:07, 2.51s/it] +2025-02-05 17:44:39 - ERROR - stderr - 30%|███ | 6795/22434 [7:36:59<10:48:40, 2.49s/it] +2025-02-05 17:44:39 - ERROR - stderr - +2025-02-05 17:44:39 - ERROR - stderr - +2025-02-05 17:44:39 - INFO - stdout - {'loss': 0.9588, 'grad_norm': 1.1140466928482056, 'learning_rate': 1.634281674429145e-05, 'epoch': 0.91} +2025-02-05 17:44:39 - ERROR - stderr - 30%|███ | 6795/22434 [7:36:59<10:48:40, 2.49s/it] +2025-02-05 17:44:41 - ERROR - stderr - 30%|███ | 6796/22434 [7:37:01<10:55:27, 2.51s/it] +2025-02-05 17:44:41 - ERROR - stderr - +2025-02-05 17:44:41 - ERROR - stderr - +2025-02-05 17:44:41 - INFO - stdout - {'loss': 0.9503, 'grad_norm': 1.0771057605743408, 'learning_rate': 1.6341700514736504e-05, 'epoch': 0.91} +2025-02-05 17:44:41 - ERROR - stderr - 30%|███ | 6796/22434 [7:37:01<10:55:27, 2.51s/it] +2025-02-05 17:44:44 - ERROR - stderr - 30%|███ | 6797/22434 [7:37:04<10:58:13, 2.53s/it] +2025-02-05 17:44:44 - ERROR - stderr - +2025-02-05 17:44:44 - ERROR - stderr - +2025-02-05 
17:44:44 - INFO - stdout - {'loss': 0.8934, 'grad_norm': 0.9898585677146912, 'learning_rate': 1.6340584152994876e-05, 'epoch': 0.91} +2025-02-05 17:44:44 - ERROR - stderr - 30%|███ | 6797/22434 [7:37:04<10:58:13, 2.53s/it] +2025-02-05 17:44:46 - ERROR - stderr - 30%|███ | 6798/22434 [7:37:06<10:50:57, 2.50s/it] +2025-02-05 17:44:46 - ERROR - stderr - +2025-02-05 17:44:46 - ERROR - stderr - +2025-02-05 17:44:46 - INFO - stdout - {'loss': 0.9223, 'grad_norm': 0.9850866794586182, 'learning_rate': 1.633946765908984e-05, 'epoch': 0.91} +2025-02-05 17:44:46 - ERROR - stderr - 30%|███ | 6798/22434 [7:37:06<10:50:57, 2.50s/it] +2025-02-05 17:44:49 - ERROR - stderr - 30%|███ | 6799/22434 [7:37:09<10:47:48, 2.49s/it] +2025-02-05 17:44:49 - ERROR - stderr - +2025-02-05 17:44:49 - ERROR - stderr - +2025-02-05 17:44:49 - INFO - stdout - {'loss': 0.9486, 'grad_norm': 1.0196812152862549, 'learning_rate': 1.6338351033044665e-05, 'epoch': 0.91} +2025-02-05 17:44:49 - ERROR - stderr - 30%|███ | 6799/22434 [7:37:09<10:47:48, 2.49s/it] +2025-02-05 17:44:51 - ERROR - stderr - 30%|███ | 6800/22434 [7:37:11<10:44:51, 2.47s/it] +2025-02-05 17:44:51 - ERROR - stderr - +2025-02-05 17:44:51 - ERROR - stderr - +2025-02-05 17:44:51 - INFO - stdout - {'loss': 0.8834, 'grad_norm': 0.9686949849128723, 'learning_rate': 1.6337234274882625e-05, 'epoch': 0.91} +2025-02-05 17:44:51 - ERROR - stderr - 30%|███ | 6800/22434 [7:37:11<10:44:51, 2.47s/it] +2025-02-05 17:44:54 - ERROR - stderr - 30%|███ | 6801/22434 [7:37:14<10:51:23, 2.50s/it] +2025-02-05 17:44:54 - ERROR - stderr - +2025-02-05 17:44:54 - ERROR - stderr - +2025-02-05 17:44:54 - INFO - stdout - {'loss': 0.8873, 'grad_norm': 0.9994778037071228, 'learning_rate': 1.6336117384627007e-05, 'epoch': 0.91} +2025-02-05 17:44:54 - ERROR - stderr - 30%|███ | 6801/22434 [7:37:14<10:51:23, 2.50s/it] +2025-02-05 17:44:56 - ERROR - stderr - 30%|███ | 6802/22434 [7:37:16<10:49:51, 2.49s/it] +2025-02-05 17:44:56 - ERROR - stderr - +2025-02-05 17:44:56 - 
ERROR - stderr - +2025-02-05 17:44:56 - INFO - stdout - {'loss': 0.9371, 'grad_norm': 1.0648632049560547, 'learning_rate': 1.6335000362301083e-05, 'epoch': 0.91} +2025-02-05 17:44:56 - ERROR - stderr - 30%|███ | 6802/22434 [7:37:16<10:49:51, 2.49s/it] +2025-02-05 17:44:59 - ERROR - stderr - 30%|███ | 6803/22434 [7:37:19<10:52:33, 2.50s/it] +2025-02-05 17:44:59 - ERROR - stderr - +2025-02-05 17:44:59 - ERROR - stderr - +2025-02-05 17:44:59 - INFO - stdout - {'loss': 0.9382, 'grad_norm': 1.0102609395980835, 'learning_rate': 1.6333883207928133e-05, 'epoch': 0.91} +2025-02-05 17:44:59 - ERROR - stderr - 30%|███ | 6803/22434 [7:37:19<10:52:33, 2.50s/it] +2025-02-05 17:45:01 - ERROR - stderr - 30%|███ | 6804/22434 [7:37:21<10:52:50, 2.51s/it] +2025-02-05 17:45:01 - ERROR - stderr - +2025-02-05 17:45:01 - ERROR - stderr - +2025-02-05 17:45:01 - INFO - stdout - {'loss': 0.8768, 'grad_norm': 1.0610584020614624, 'learning_rate': 1.633276592153145e-05, 'epoch': 0.91} +2025-02-05 17:45:01 - ERROR - stderr - 30%|███ | 6804/22434 [7:37:21<10:52:50, 2.51s/it] +2025-02-05 17:45:04 - ERROR - stderr - 30%|███ | 6805/22434 [7:37:24<10:48:30, 2.49s/it] +2025-02-05 17:45:04 - ERROR - stderr - +2025-02-05 17:45:04 - ERROR - stderr - +2025-02-05 17:45:04 - INFO - stdout - {'loss': 0.9957, 'grad_norm': 0.94576096534729, 'learning_rate': 1.6331648503134327e-05, 'epoch': 0.91} +2025-02-05 17:45:04 - ERROR - stderr - 30%|███ | 6805/22434 [7:37:24<10:48:30, 2.49s/it] +2025-02-05 17:45:06 - ERROR - stderr - 30%|███ | 6806/22434 [7:37:26<10:59:19, 2.53s/it] +2025-02-05 17:45:06 - ERROR - stderr - +2025-02-05 17:45:06 - ERROR - stderr - +2025-02-05 17:45:06 - INFO - stdout - {'loss': 1.1125, 'grad_norm': 1.1003552675247192, 'learning_rate': 1.6330530952760048e-05, 'epoch': 0.91} +2025-02-05 17:45:06 - ERROR - stderr - 30%|███ | 6806/22434 [7:37:26<10:59:19, 2.53s/it] +2025-02-05 17:45:09 - ERROR - stderr - 30%|███ | 6807/22434 [7:37:29<11:03:49, 2.55s/it] +2025-02-05 17:45:09 - ERROR - stderr - 
+2025-02-05 17:45:09 - ERROR - stderr - +2025-02-05 17:45:09 - INFO - stdout - {'loss': 0.8955, 'grad_norm': 1.0401118993759155, 'learning_rate': 1.6329413270431906e-05, 'epoch': 0.91} +2025-02-05 17:45:09 - ERROR - stderr - 30%|███ | 6807/22434 [7:37:29<11:03:49, 2.55s/it] +2025-02-05 17:45:12 - ERROR - stderr - 30%|███ | 6808/22434 [7:37:31<11:09:13, 2.57s/it] +2025-02-05 17:45:12 - ERROR - stderr - +2025-02-05 17:45:12 - ERROR - stderr - +2025-02-05 17:45:12 - INFO - stdout - {'loss': 0.9357, 'grad_norm': 1.008165717124939, 'learning_rate': 1.6328295456173206e-05, 'epoch': 0.91} +2025-02-05 17:45:12 - ERROR - stderr - 30%|███ | 6808/22434 [7:37:31<11:09:13, 2.57s/it] +2025-02-05 17:45:14 - ERROR - stderr - 30%|███ | 6809/22434 [7:37:34<11:03:56, 2.55s/it] +2025-02-05 17:45:14 - ERROR - stderr - +2025-02-05 17:45:14 - ERROR - stderr - +2025-02-05 17:45:14 - INFO - stdout - {'loss': 0.9088, 'grad_norm': 1.0534212589263916, 'learning_rate': 1.6327177510007237e-05, 'epoch': 0.91} +2025-02-05 17:45:14 - ERROR - stderr - 30%|███ | 6809/22434 [7:37:34<11:03:56, 2.55s/it] +2025-02-05 17:45:17 - ERROR - stderr - 30%|███ | 6810/22434 [7:37:36<10:58:34, 2.53s/it] +2025-02-05 17:45:17 - ERROR - stderr - +2025-02-05 17:45:17 - ERROR - stderr - +2025-02-05 17:45:17 - INFO - stdout - {'loss': 0.9472, 'grad_norm': 1.0965501070022583, 'learning_rate': 1.632605943195731e-05, 'epoch': 0.91} +2025-02-05 17:45:17 - ERROR - stderr - 30%|███ | 6810/22434 [7:37:36<10:58:34, 2.53s/it] +2025-02-05 17:45:19 - ERROR - stderr - 30%|███ | 6811/22434 [7:37:39<10:57:09, 2.52s/it] +2025-02-05 17:45:19 - ERROR - stderr - +2025-02-05 17:45:19 - ERROR - stderr - +2025-02-05 17:45:19 - INFO - stdout - {'loss': 0.8664, 'grad_norm': 0.9653745889663696, 'learning_rate': 1.6324941222046725e-05, 'epoch': 0.91} +2025-02-05 17:45:19 - ERROR - stderr - 30%|███ | 6811/22434 [7:37:39<10:57:09, 2.52s/it] +2025-02-05 17:45:22 - ERROR - stderr - 30%|███ | 6812/22434 [7:37:42<11:04:58, 2.55s/it] +2025-02-05 
17:45:22 - ERROR - stderr - +2025-02-05 17:45:22 - ERROR - stderr - +2025-02-05 17:45:22 - INFO - stdout - {'loss': 1.0884, 'grad_norm': 1.0409152507781982, 'learning_rate': 1.6323822880298795e-05, 'epoch': 0.91} +2025-02-05 17:45:22 - ERROR - stderr - 30%|███ | 6812/22434 [7:37:42<11:04:58, 2.55s/it] +2025-02-05 17:45:24 - ERROR - stderr - 30%|███ | 6813/22434 [7:37:44<10:54:47, 2.52s/it] +2025-02-05 17:45:24 - ERROR - stderr - +2025-02-05 17:45:24 - ERROR - stderr - +2025-02-05 17:45:24 - INFO - stdout - {'loss': 0.9499, 'grad_norm': 1.1187019348144531, 'learning_rate': 1.632270440673683e-05, 'epoch': 0.91} +2025-02-05 17:45:24 - ERROR - stderr - 30%|███ | 6813/22434 [7:37:44<10:54:47, 2.52s/it] +2025-02-05 17:45:27 - ERROR - stderr - 30%|███ | 6814/22434 [7:37:46<10:55:39, 2.52s/it] +2025-02-05 17:45:27 - ERROR - stderr - +2025-02-05 17:45:27 - ERROR - stderr - +2025-02-05 17:45:27 - INFO - stdout - {'loss': 1.0397, 'grad_norm': 1.0470764636993408, 'learning_rate': 1.6321585801384138e-05, 'epoch': 0.91} +2025-02-05 17:45:27 - ERROR - stderr - 30%|███ | 6814/22434 [7:37:47<10:55:39, 2.52s/it] +2025-02-05 17:45:29 - ERROR - stderr - 30%|███ | 6815/22434 [7:37:49<10:51:18, 2.50s/it] +2025-02-05 17:45:29 - ERROR - stderr - +2025-02-05 17:45:29 - ERROR - stderr - +2025-02-05 17:45:29 - INFO - stdout - {'loss': 0.8911, 'grad_norm': 1.034440517425537, 'learning_rate': 1.632046706426404e-05, 'epoch': 0.91} +2025-02-05 17:45:29 - ERROR - stderr - 30%|███ | 6815/22434 [7:37:49<10:51:18, 2.50s/it] +2025-02-05 17:45:32 - ERROR - stderr - 30%|███ | 6816/22434 [7:37:51<10:55:15, 2.52s/it] +2025-02-05 17:45:32 - ERROR - stderr - +2025-02-05 17:45:32 - ERROR - stderr - +2025-02-05 17:45:32 - INFO - stdout - {'loss': 0.9586, 'grad_norm': 1.0755183696746826, 'learning_rate': 1.6319348195399855e-05, 'epoch': 0.91} +2025-02-05 17:45:32 - ERROR - stderr - 30%|███ | 6816/22434 [7:37:52<10:55:15, 2.52s/it] +2025-02-05 17:45:34 - ERROR - stderr - 30%|███ | 6817/22434 [7:37:54<10:53:36, 
2.51s/it] +2025-02-05 17:45:34 - ERROR - stderr - +2025-02-05 17:45:34 - ERROR - stderr - +2025-02-05 17:45:34 - INFO - stdout - {'loss': 0.9703, 'grad_norm': 1.0901182889938354, 'learning_rate': 1.6318229194814906e-05, 'epoch': 0.91} +2025-02-05 17:45:34 - ERROR - stderr - 30%|███ | 6817/22434 [7:37:54<10:53:36, 2.51s/it] +2025-02-05 17:45:37 - ERROR - stderr - 30%|███ | 6818/22434 [7:37:57<10:53:49, 2.51s/it] +2025-02-05 17:45:37 - ERROR - stderr - +2025-02-05 17:45:37 - ERROR - stderr - +2025-02-05 17:45:37 - INFO - stdout - {'loss': 0.9826, 'grad_norm': 1.0668545961380005, 'learning_rate': 1.631711006253251e-05, 'epoch': 0.91} +2025-02-05 17:45:37 - ERROR - stderr - 30%|███ | 6818/22434 [7:37:57<10:53:49, 2.51s/it] +2025-02-05 17:45:39 - ERROR - stderr - 30%|███ | 6819/22434 [7:37:59<10:56:18, 2.52s/it] +2025-02-05 17:45:39 - ERROR - stderr - +2025-02-05 17:45:39 - ERROR - stderr - +2025-02-05 17:45:39 - INFO - stdout - {'loss': 0.8764, 'grad_norm': 1.0639677047729492, 'learning_rate': 1.6315990798576002e-05, 'epoch': 0.91} +2025-02-05 17:45:39 - ERROR - stderr - 30%|███ | 6819/22434 [7:37:59<10:56:18, 2.52s/it] +2025-02-05 17:45:42 - ERROR - stderr - 30%|███ | 6820/22434 [7:38:02<10:54:33, 2.52s/it] +2025-02-05 17:45:42 - ERROR - stderr - +2025-02-05 17:45:42 - ERROR - stderr - +2025-02-05 17:45:42 - INFO - stdout - {'loss': 0.9688, 'grad_norm': 1.1224595308303833, 'learning_rate': 1.631487140296871e-05, 'epoch': 0.91} +2025-02-05 17:45:42 - ERROR - stderr - 30%|███ | 6820/22434 [7:38:02<10:54:33, 2.52s/it] +2025-02-05 17:45:44 - ERROR - stderr - 30%|███ | 6821/22434 [7:38:04<10:56:28, 2.52s/it] +2025-02-05 17:45:44 - ERROR - stderr - +2025-02-05 17:45:44 - ERROR - stderr - +2025-02-05 17:45:44 - INFO - stdout - {'loss': 0.8944, 'grad_norm': 1.0143686532974243, 'learning_rate': 1.6313751875733966e-05, 'epoch': 0.91} +2025-02-05 17:45:44 - ERROR - stderr - 30%|███ | 6821/22434 [7:38:04<10:56:28, 2.52s/it] +2025-02-05 17:45:47 - ERROR - stderr - 30%|███ | 
6822/22434 [7:38:07<10:54:52, 2.52s/it] +2025-02-05 17:45:47 - ERROR - stderr - +2025-02-05 17:45:47 - ERROR - stderr - +2025-02-05 17:45:47 - INFO - stdout - {'loss': 1.0189, 'grad_norm': 1.1076228618621826, 'learning_rate': 1.6312632216895107e-05, 'epoch': 0.91} +2025-02-05 17:45:47 - ERROR - stderr - 30%|███ | 6822/22434 [7:38:07<10:54:52, 2.52s/it] +2025-02-05 17:45:49 - ERROR - stderr - 30%|███ | 6823/22434 [7:38:09<10:58:54, 2.53s/it] +2025-02-05 17:45:49 - ERROR - stderr - +2025-02-05 17:45:49 - ERROR - stderr - +2025-02-05 17:45:49 - INFO - stdout - {'loss': 0.971, 'grad_norm': 1.1387327909469604, 'learning_rate': 1.6311512426475472e-05, 'epoch': 0.91} +2025-02-05 17:45:49 - ERROR - stderr - 30%|███ | 6823/22434 [7:38:09<10:58:54, 2.53s/it] +2025-02-05 17:45:52 - ERROR - stderr - 30%|███ | 6824/22434 [7:38:12<10:59:08, 2.53s/it] +2025-02-05 17:45:52 - ERROR - stderr - +2025-02-05 17:45:52 - ERROR - stderr - +2025-02-05 17:45:52 - INFO - stdout - {'loss': 0.8719, 'grad_norm': 1.0415048599243164, 'learning_rate': 1.6310392504498397e-05, 'epoch': 0.91} +2025-02-05 17:45:52 - ERROR - stderr - 30%|███ | 6824/22434 [7:38:12<10:59:08, 2.53s/it] +2025-02-05 17:45:54 - ERROR - stderr - 30%|███ | 6825/22434 [7:38:14<10:59:50, 2.54s/it] +2025-02-05 17:45:55 - ERROR - stderr - +2025-02-05 17:45:55 - ERROR - stderr - +2025-02-05 17:45:55 - INFO - stdout - {'loss': 0.9144, 'grad_norm': 1.0837432146072388, 'learning_rate': 1.6309272450987226e-05, 'epoch': 0.91} +2025-02-05 17:45:55 - ERROR - stderr - 30%|███ | 6825/22434 [7:38:14<10:59:50, 2.54s/it] +2025-02-05 17:45:57 - ERROR - stderr - 30%|███ | 6826/22434 [7:38:17<10:56:29, 2.52s/it] +2025-02-05 17:45:57 - ERROR - stderr - +2025-02-05 17:45:57 - ERROR - stderr - +2025-02-05 17:45:57 - INFO - stdout - {'loss': 0.8446, 'grad_norm': 0.958026111125946, 'learning_rate': 1.6308152265965313e-05, 'epoch': 0.91} +2025-02-05 17:45:57 - ERROR - stderr - 30%|███ | 6826/22434 [7:38:17<10:56:29, 2.52s/it] +2025-02-05 17:45:59 - 
ERROR - stderr - 30%|███ | 6827/22434 [7:38:19<10:55:03, 2.52s/it] +2025-02-05 17:46:00 - ERROR - stderr - +2025-02-05 17:46:00 - ERROR - stderr - +2025-02-05 17:46:00 - INFO - stdout - {'loss': 0.928, 'grad_norm': 1.0343043804168701, 'learning_rate': 1.6307031949455998e-05, 'epoch': 0.91} +2025-02-05 17:46:00 - ERROR - stderr - 30%|███ | 6827/22434 [7:38:19<10:55:03, 2.52s/it] +2025-02-05 17:46:02 - ERROR - stderr - 30%|███ | 6828/22434 [7:38:22<10:50:25, 2.50s/it] +2025-02-05 17:46:02 - ERROR - stderr - +2025-02-05 17:46:02 - ERROR - stderr - +2025-02-05 17:46:02 - INFO - stdout - {'loss': 1.0244, 'grad_norm': 1.097752571105957, 'learning_rate': 1.630591150148264e-05, 'epoch': 0.91} +2025-02-05 17:46:02 - ERROR - stderr - 30%|███ | 6828/22434 [7:38:22<10:50:25, 2.50s/it] +2025-02-05 17:46:04 - ERROR - stderr - 30%|███ | 6829/22434 [7:38:24<10:51:49, 2.51s/it] +2025-02-05 17:46:04 - ERROR - stderr - +2025-02-05 17:46:04 - ERROR - stderr - +2025-02-05 17:46:04 - INFO - stdout - {'loss': 1.0021, 'grad_norm': 1.1021761894226074, 'learning_rate': 1.630479092206859e-05, 'epoch': 0.91} +2025-02-05 17:46:04 - ERROR - stderr - 30%|███ | 6829/22434 [7:38:24<10:51:49, 2.51s/it] +2025-02-05 17:46:07 - ERROR - stderr - 30%|███ | 6830/22434 [7:38:27<10:50:03, 2.50s/it] +2025-02-05 17:46:07 - ERROR - stderr - +2025-02-05 17:46:07 - ERROR - stderr - +2025-02-05 17:46:07 - INFO - stdout - {'loss': 0.9519, 'grad_norm': 1.1466758251190186, 'learning_rate': 1.6303670211237206e-05, 'epoch': 0.91} +2025-02-05 17:46:07 - ERROR - stderr - 30%|███ | 6830/22434 [7:38:27<10:50:03, 2.50s/it] +2025-02-05 17:46:09 - ERROR - stderr - 30%|███ | 6831/22434 [7:38:29<10:55:13, 2.52s/it] +2025-02-05 17:46:10 - ERROR - stderr - +2025-02-05 17:46:10 - ERROR - stderr - +2025-02-05 17:46:10 - INFO - stdout - {'loss': 0.9521, 'grad_norm': 1.0269718170166016, 'learning_rate': 1.6302549369011847e-05, 'epoch': 0.91} +2025-02-05 17:46:10 - ERROR - stderr - 30%|███ | 6831/22434 [7:38:29<10:55:13, 2.52s/it] 
+2025-02-05 17:46:12 - ERROR - stderr - 30%|███ | 6832/22434 [7:38:32<10:59:36, 2.54s/it] +2025-02-05 17:46:12 - ERROR - stderr - +2025-02-05 17:46:12 - ERROR - stderr - +2025-02-05 17:46:12 - INFO - stdout - {'loss': 0.8876, 'grad_norm': 1.0369553565979004, 'learning_rate': 1.630142839541588e-05, 'epoch': 0.91} +2025-02-05 17:46:12 - ERROR - stderr - 30%|███ | 6832/22434 [7:38:32<10:59:36, 2.54s/it] +2025-02-05 17:46:15 - ERROR - stderr - 30%|███ | 6833/22434 [7:38:34<10:56:50, 2.53s/it] +2025-02-05 17:46:15 - ERROR - stderr - +2025-02-05 17:46:15 - ERROR - stderr - +2025-02-05 17:46:15 - INFO - stdout - {'loss': 0.9824, 'grad_norm': 1.0761327743530273, 'learning_rate': 1.630030729047267e-05, 'epoch': 0.91} +2025-02-05 17:46:15 - ERROR - stderr - 30%|███ | 6833/22434 [7:38:34<10:56:50, 2.53s/it] +2025-02-05 17:46:17 - ERROR - stderr - 30%|███ | 6834/22434 [7:38:37<10:50:07, 2.50s/it] +2025-02-05 17:46:17 - ERROR - stderr - +2025-02-05 17:46:17 - ERROR - stderr - +2025-02-05 17:46:17 - INFO - stdout - {'loss': 0.9437, 'grad_norm': 1.0410864353179932, 'learning_rate': 1.629918605420558e-05, 'epoch': 0.91} +2025-02-05 17:46:17 - ERROR - stderr - 30%|███ | 6834/22434 [7:38:37<10:50:07, 2.50s/it] +2025-02-05 17:46:20 - ERROR - stderr - 30%|███ | 6835/22434 [7:38:39<10:48:51, 2.50s/it] +2025-02-05 17:46:20 - ERROR - stderr - +2025-02-05 17:46:20 - ERROR - stderr - +2025-02-05 17:46:20 - INFO - stdout - {'loss': 0.8177, 'grad_norm': 1.0458101034164429, 'learning_rate': 1.6298064686637983e-05, 'epoch': 0.91} +2025-02-05 17:46:20 - ERROR - stderr - 30%|███ | 6835/22434 [7:38:39<10:48:51, 2.50s/it] +2025-02-05 17:46:22 - ERROR - stderr - 30%|███ | 6836/22434 [7:38:42<10:48:48, 2.50s/it] +2025-02-05 17:46:22 - ERROR - stderr - +2025-02-05 17:46:22 - ERROR - stderr - +2025-02-05 17:46:22 - INFO - stdout - {'loss': 0.8724, 'grad_norm': 1.0597580671310425, 'learning_rate': 1.6296943187793256e-05, 'epoch': 0.91} +2025-02-05 17:46:22 - ERROR - stderr - 30%|███ | 6836/22434 
[7:38:42<10:48:48, 2.50s/it] +2025-02-05 17:46:24 - ERROR - stderr - 30%|███ | 6837/22434 [7:38:44<10:44:14, 2.48s/it] +2025-02-05 17:46:24 - ERROR - stderr - +2025-02-05 17:46:24 - ERROR - stderr - +2025-02-05 17:46:24 - INFO - stdout - {'loss': 0.8305, 'grad_norm': 0.9621270895004272, 'learning_rate': 1.629582155769477e-05, 'epoch': 0.91} +2025-02-05 17:46:24 - ERROR - stderr - 30%|███ | 6837/22434 [7:38:44<10:44:14, 2.48s/it] +2025-02-05 17:46:27 - ERROR - stderr - 30%|███ | 6838/22434 [7:38:47<10:42:23, 2.47s/it] +2025-02-05 17:46:27 - ERROR - stderr - +2025-02-05 17:46:27 - ERROR - stderr - +2025-02-05 17:46:27 - INFO - stdout - {'loss': 1.131, 'grad_norm': 1.1643141508102417, 'learning_rate': 1.6294699796365912e-05, 'epoch': 0.91} +2025-02-05 17:46:27 - ERROR - stderr - 30%|███ | 6838/22434 [7:38:47<10:42:23, 2.47s/it] +2025-02-05 17:46:29 - ERROR - stderr - 30%|███ | 6839/22434 [7:38:49<10:47:11, 2.49s/it] +2025-02-05 17:46:29 - ERROR - stderr - +2025-02-05 17:46:29 - ERROR - stderr - +2025-02-05 17:46:29 - INFO - stdout - {'loss': 0.9581, 'grad_norm': 1.004451036453247, 'learning_rate': 1.629357790383006e-05, 'epoch': 0.91} +2025-02-05 17:46:29 - ERROR - stderr - 30%|███ | 6839/22434 [7:38:49<10:47:11, 2.49s/it] +2025-02-05 17:46:32 - ERROR - stderr - 30%|███ | 6840/22434 [7:38:52<10:56:14, 2.52s/it] +2025-02-05 17:46:32 - ERROR - stderr - +2025-02-05 17:46:32 - ERROR - stderr - +2025-02-05 17:46:32 - INFO - stdout - {'loss': 0.8862, 'grad_norm': 0.9370132684707642, 'learning_rate': 1.62924558801106e-05, 'epoch': 0.91} +2025-02-05 17:46:32 - ERROR - stderr - 30%|███ | 6840/22434 [7:38:52<10:56:14, 2.52s/it] +2025-02-05 17:46:35 - ERROR - stderr - 30%|███ | 6841/22434 [7:38:54<10:57:24, 2.53s/it] +2025-02-05 17:46:35 - ERROR - stderr - +2025-02-05 17:46:35 - ERROR - stderr - +2025-02-05 17:46:35 - INFO - stdout - {'loss': 0.9228, 'grad_norm': 1.0726017951965332, 'learning_rate': 1.629133372523092e-05, 'epoch': 0.91} +2025-02-05 17:46:35 - ERROR - stderr - 
30%|███ | 6841/22434 [7:38:54<10:57:24, 2.53s/it] +2025-02-05 17:46:37 - ERROR - stderr - 30%|███ | 6842/22434 [7:38:57<10:57:52, 2.53s/it] +2025-02-05 17:46:37 - ERROR - stderr - +2025-02-05 17:46:37 - ERROR - stderr - +2025-02-05 17:46:37 - INFO - stdout - {'loss': 0.9753, 'grad_norm': 1.070617437362671, 'learning_rate': 1.6290211439214402e-05, 'epoch': 0.91} +2025-02-05 17:46:37 - ERROR - stderr - 30%|███ | 6842/22434 [7:38:57<10:57:52, 2.53s/it] +2025-02-05 17:46:40 - ERROR - stderr - 31%|███ | 6843/22434 [7:38:59<10:58:29, 2.53s/it] +2025-02-05 17:46:40 - ERROR - stderr - +2025-02-05 17:46:40 - ERROR - stderr - +2025-02-05 17:46:40 - INFO - stdout - {'loss': 0.9042, 'grad_norm': 0.9160057306289673, 'learning_rate': 1.628908902208445e-05, 'epoch': 0.92} +2025-02-05 17:46:40 - ERROR - stderr - 31%|███ | 6843/22434 [7:38:59<10:58:29, 2.53s/it] +2025-02-05 17:46:42 - ERROR - stderr - 31%|███ | 6844/22434 [7:39:02<10:59:40, 2.54s/it] +2025-02-05 17:46:42 - ERROR - stderr - +2025-02-05 17:46:42 - ERROR - stderr - +2025-02-05 17:46:42 - INFO - stdout - {'loss': 0.9692, 'grad_norm': 0.9815974235534668, 'learning_rate': 1.6287966473864455e-05, 'epoch': 0.92} +2025-02-05 17:46:42 - ERROR - stderr - 31%|███ | 6844/22434 [7:39:02<10:59:40, 2.54s/it] +2025-02-05 17:46:45 - ERROR - stderr - 31%|███ | 6845/22434 [7:39:04<10:57:04, 2.53s/it] +2025-02-05 17:46:45 - ERROR - stderr - +2025-02-05 17:46:45 - ERROR - stderr - +2025-02-05 17:46:45 - INFO - stdout - {'loss': 1.0521, 'grad_norm': 1.0848268270492554, 'learning_rate': 1.6286843794577815e-05, 'epoch': 0.92} +2025-02-05 17:46:45 - ERROR - stderr - 31%|███ | 6845/22434 [7:39:05<10:57:04, 2.53s/it] +2025-02-05 17:46:47 - ERROR - stderr - 31%|███ | 6846/22434 [7:39:07<10:55:13, 2.52s/it] +2025-02-05 17:46:47 - ERROR - stderr - +2025-02-05 17:46:47 - ERROR - stderr - +2025-02-05 17:46:47 - INFO - stdout - {'loss': 0.9754, 'grad_norm': 1.1050935983657837, 'learning_rate': 1.628572098424793e-05, 'epoch': 0.92} +2025-02-05 
17:46:47 - ERROR - stderr - 31%|███ | 6846/22434 [7:39:07<10:55:13, 2.52s/it] +2025-02-05 17:46:50 - ERROR - stderr - 31%|███ | 6847/22434 [7:39:10<10:56:48, 2.53s/it] +2025-02-05 17:46:50 - ERROR - stderr - +2025-02-05 17:46:50 - ERROR - stderr - +2025-02-05 17:46:50 - INFO - stdout - {'loss': 0.9364, 'grad_norm': 1.190428614616394, 'learning_rate': 1.628459804289821e-05, 'epoch': 0.92} +2025-02-05 17:46:50 - ERROR - stderr - 31%|███ | 6847/22434 [7:39:10<10:56:48, 2.53s/it] +2025-02-05 17:46:52 - ERROR - stderr - 31%|███ | 6848/22434 [7:39:12<11:09:45, 2.58s/it] +2025-02-05 17:46:52 - ERROR - stderr - +2025-02-05 17:46:52 - ERROR - stderr - +2025-02-05 17:46:52 - INFO - stdout - {'loss': 0.8962, 'grad_norm': 0.9780930280685425, 'learning_rate': 1.6283474970552055e-05, 'epoch': 0.92} +2025-02-05 17:46:52 - ERROR - stderr - 31%|███ | 6848/22434 [7:39:12<11:09:45, 2.58s/it] +2025-02-05 17:46:55 - ERROR - stderr - 31%|███ | 6849/22434 [7:39:15<11:02:58, 2.55s/it] +2025-02-05 17:46:55 - ERROR - stderr - +2025-02-05 17:46:55 - ERROR - stderr - +2025-02-05 17:46:55 - INFO - stdout - {'loss': 1.0553, 'grad_norm': 1.1244094371795654, 'learning_rate': 1.628235176723288e-05, 'epoch': 0.92} +2025-02-05 17:46:55 - ERROR - stderr - 31%|███ | 6849/22434 [7:39:15<11:02:58, 2.55s/it] +2025-02-05 17:46:57 - ERROR - stderr - 31%|███ | 6850/22434 [7:39:17<10:57:10, 2.53s/it] +2025-02-05 17:46:57 - ERROR - stderr - +2025-02-05 17:46:57 - ERROR - stderr - +2025-02-05 17:46:57 - INFO - stdout - {'loss': 0.9232, 'grad_norm': 1.0004079341888428, 'learning_rate': 1.6281228432964092e-05, 'epoch': 0.92} +2025-02-05 17:46:57 - ERROR - stderr - 31%|███ | 6850/22434 [7:39:17<10:57:10, 2.53s/it] +2025-02-05 17:47:00 - ERROR - stderr - 31%|███ | 6851/22434 [7:39:20<10:53:35, 2.52s/it] +2025-02-05 17:47:00 - ERROR - stderr - +2025-02-05 17:47:00 - ERROR - stderr - +2025-02-05 17:47:00 - INFO - stdout - {'loss': 0.7968, 'grad_norm': 1.1048442125320435, 'learning_rate': 1.6280104967769106e-05, 
'epoch': 0.92} +2025-02-05 17:47:00 - ERROR - stderr - 31%|███ | 6851/22434 [7:39:20<10:53:35, 2.52s/it] +2025-02-05 17:47:02 - ERROR - stderr - 31%|███ | 6852/22434 [7:39:22<10:47:12, 2.49s/it] +2025-02-05 17:47:02 - ERROR - stderr - +2025-02-05 17:47:02 - ERROR - stderr - +2025-02-05 17:47:02 - INFO - stdout - {'loss': 0.9654, 'grad_norm': 1.1042070388793945, 'learning_rate': 1.6278981371671345e-05, 'epoch': 0.92} +2025-02-05 17:47:02 - ERROR - stderr - 31%|███ | 6852/22434 [7:39:22<10:47:12, 2.49s/it] +2025-02-05 17:47:05 - ERROR - stderr - 31%|███ | 6853/22434 [7:39:25<10:41:28, 2.47s/it] +2025-02-05 17:47:05 - ERROR - stderr - +2025-02-05 17:47:05 - ERROR - stderr - +2025-02-05 17:47:05 - INFO - stdout - {'loss': 0.9049, 'grad_norm': 1.08100163936615, 'learning_rate': 1.6277857644694223e-05, 'epoch': 0.92} +2025-02-05 17:47:05 - ERROR - stderr - 31%|███ | 6853/22434 [7:39:25<10:41:28, 2.47s/it] +2025-02-05 17:47:07 - ERROR - stderr - 31%|███ | 6854/22434 [7:39:27<10:47:37, 2.49s/it] +2025-02-05 17:47:07 - ERROR - stderr - +2025-02-05 17:47:07 - ERROR - stderr - +2025-02-05 17:47:07 - INFO - stdout - {'loss': 0.919, 'grad_norm': 1.057588815689087, 'learning_rate': 1.6276733786861166e-05, 'epoch': 0.92} +2025-02-05 17:47:07 - ERROR - stderr - 31%|███ | 6854/22434 [7:39:27<10:47:37, 2.49s/it] +2025-02-05 17:47:10 - ERROR - stderr - 31%|███ | 6855/22434 [7:39:30<10:51:16, 2.51s/it] +2025-02-05 17:47:10 - ERROR - stderr - +2025-02-05 17:47:10 - ERROR - stderr - +2025-02-05 17:47:10 - INFO - stdout - {'loss': 0.9968, 'grad_norm': 1.114198088645935, 'learning_rate': 1.6275609798195598e-05, 'epoch': 0.92} +2025-02-05 17:47:10 - ERROR - stderr - 31%|███ | 6855/22434 [7:39:30<10:51:16, 2.51s/it] +2025-02-05 17:47:12 - ERROR - stderr - 31%|███ | 6856/22434 [7:39:32<10:48:42, 2.50s/it] +2025-02-05 17:47:12 - ERROR - stderr - +2025-02-05 17:47:12 - ERROR - stderr - +2025-02-05 17:47:12 - INFO - stdout - {'loss': 0.8945, 'grad_norm': 1.0222796201705933, 'learning_rate': 
1.6274485678720952e-05, 'epoch': 0.92} +2025-02-05 17:47:12 - ERROR - stderr - 31%|███ | 6856/22434 [7:39:32<10:48:42, 2.50s/it] +2025-02-05 17:47:15 - ERROR - stderr - 31%|███ | 6857/22434 [7:39:35<10:53:21, 2.52s/it] +2025-02-05 17:47:15 - ERROR - stderr - +2025-02-05 17:47:15 - ERROR - stderr - +2025-02-05 17:47:15 - INFO - stdout - {'loss': 0.992, 'grad_norm': 1.0690758228302002, 'learning_rate': 1.627336142846065e-05, 'epoch': 0.92} +2025-02-05 17:47:15 - ERROR - stderr - 31%|███ | 6857/22434 [7:39:35<10:53:21, 2.52s/it] +2025-02-05 17:47:17 - ERROR - stderr - 31%|███ | 6858/22434 [7:39:37<10:49:22, 2.50s/it] +2025-02-05 17:47:17 - ERROR - stderr - +2025-02-05 17:47:17 - ERROR - stderr - +2025-02-05 17:47:17 - INFO - stdout - {'loss': 0.9204, 'grad_norm': 1.101946234703064, 'learning_rate': 1.627223704743814e-05, 'epoch': 0.92} +2025-02-05 17:47:17 - ERROR - stderr - 31%|███ | 6858/22434 [7:39:37<10:49:22, 2.50s/it] +2025-02-05 17:47:20 - ERROR - stderr - 31%|███ | 6859/22434 [7:39:40<10:56:00, 2.53s/it] +2025-02-05 17:47:20 - ERROR - stderr - +2025-02-05 17:47:20 - ERROR - stderr - +2025-02-05 17:47:20 - INFO - stdout - {'loss': 1.0989, 'grad_norm': 1.08036208152771, 'learning_rate': 1.6271112535676846e-05, 'epoch': 0.92} +2025-02-05 17:47:20 - ERROR - stderr - 31%|███ | 6859/22434 [7:39:40<10:56:00, 2.53s/it] +2025-02-05 17:47:22 - ERROR - stderr - 31%|███ | 6860/22434 [7:39:42<10:50:00, 2.50s/it] +2025-02-05 17:47:22 - ERROR - stderr - +2025-02-05 17:47:22 - ERROR - stderr - +2025-02-05 17:47:22 - INFO - stdout - {'loss': 0.9127, 'grad_norm': 1.1058663129806519, 'learning_rate': 1.6269987893200213e-05, 'epoch': 0.92} +2025-02-05 17:47:22 - ERROR - stderr - 31%|███ | 6860/22434 [7:39:42<10:50:00, 2.50s/it] +2025-02-05 17:47:25 - ERROR - stderr - 31%|███ | 6861/22434 [7:39:45<10:52:42, 2.51s/it] +2025-02-05 17:47:25 - ERROR - stderr - +2025-02-05 17:47:25 - ERROR - stderr - +2025-02-05 17:47:25 - INFO - stdout - {'loss': 0.9866, 'grad_norm': 
1.2037876844406128, 'learning_rate': 1.6268863120031682e-05, 'epoch': 0.92} +2025-02-05 17:47:25 - ERROR - stderr - 31%|███ | 6861/22434 [7:39:45<10:52:42, 2.51s/it] +2025-02-05 17:47:27 - ERROR - stderr - 31%|███ | 6862/22434 [7:39:47<10:51:08, 2.51s/it] +2025-02-05 17:47:27 - ERROR - stderr - +2025-02-05 17:47:27 - ERROR - stderr - +2025-02-05 17:47:27 - INFO - stdout - {'loss': 0.8618, 'grad_norm': 1.056054949760437, 'learning_rate': 1.6267738216194698e-05, 'epoch': 0.92} +2025-02-05 17:47:27 - ERROR - stderr - 31%|███ | 6862/22434 [7:39:47<10:51:08, 2.51s/it] +2025-02-05 17:47:30 - ERROR - stderr - 31%|███ | 6863/22434 [7:39:50<11:08:07, 2.57s/it] +2025-02-05 17:47:30 - ERROR - stderr - +2025-02-05 17:47:30 - ERROR - stderr - +2025-02-05 17:47:30 - INFO - stdout - {'loss': 0.9243, 'grad_norm': 1.081805944442749, 'learning_rate': 1.6266613181712708e-05, 'epoch': 0.92} +2025-02-05 17:47:30 - ERROR - stderr - 31%|███ | 6863/22434 [7:39:50<11:08:07, 2.57s/it] +2025-02-05 17:47:33 - ERROR - stderr - 31%|███ | 6864/22434 [7:39:52<11:02:09, 2.55s/it] +2025-02-05 17:47:33 - ERROR - stderr - +2025-02-05 17:47:33 - ERROR - stderr - +2025-02-05 17:47:33 - INFO - stdout - {'loss': 0.9538, 'grad_norm': 1.0295435190200806, 'learning_rate': 1.626548801660916e-05, 'epoch': 0.92} +2025-02-05 17:47:33 - ERROR - stderr - 31%|███ | 6864/22434 [7:39:52<11:02:09, 2.55s/it] +2025-02-05 17:47:35 - ERROR - stderr - 31%|███ | 6865/22434 [7:39:55<10:54:26, 2.52s/it] +2025-02-05 17:47:35 - ERROR - stderr - +2025-02-05 17:47:35 - ERROR - stderr - +2025-02-05 17:47:35 - INFO - stdout - {'loss': 0.9541, 'grad_norm': 1.0153684616088867, 'learning_rate': 1.6264362720907514e-05, 'epoch': 0.92} +2025-02-05 17:47:35 - ERROR - stderr - 31%|███ | 6865/22434 [7:39:55<10:54:26, 2.52s/it] +2025-02-05 17:47:38 - ERROR - stderr - 31%|███ | 6866/22434 [7:39:57<10:53:04, 2.52s/it] +2025-02-05 17:47:38 - ERROR - stderr - +2025-02-05 17:47:38 - ERROR - stderr - +2025-02-05 17:47:38 - INFO - stdout - 
{'loss': 0.7294, 'grad_norm': 0.9508002400398254, 'learning_rate': 1.6263237294631224e-05, 'epoch': 0.92} +2025-02-05 17:47:38 - ERROR - stderr - 31%|███ | 6866/22434 [7:39:57<10:53:04, 2.52s/it] +2025-02-05 17:47:40 - ERROR - stderr - 31%|███ | 6867/22434 [7:40:00<10:50:37, 2.51s/it] +2025-02-05 17:47:40 - ERROR - stderr - +2025-02-05 17:47:40 - ERROR - stderr - +2025-02-05 17:47:40 - INFO - stdout - {'loss': 0.9688, 'grad_norm': 1.086427927017212, 'learning_rate': 1.6262111737803737e-05, 'epoch': 0.92} +2025-02-05 17:47:40 - ERROR - stderr - 31%|███ | 6867/22434 [7:40:00<10:50:37, 2.51s/it] +2025-02-05 17:47:43 - ERROR - stderr - 31%|███ | 6868/22434 [7:40:02<10:45:50, 2.49s/it] +2025-02-05 17:47:43 - ERROR - stderr - +2025-02-05 17:47:43 - ERROR - stderr - +2025-02-05 17:47:43 - INFO - stdout - {'loss': 0.9601, 'grad_norm': 1.1007853746414185, 'learning_rate': 1.626098605044853e-05, 'epoch': 0.92} +2025-02-05 17:47:43 - ERROR - stderr - 31%|███ | 6868/22434 [7:40:02<10:45:50, 2.49s/it] +2025-02-05 17:47:45 - ERROR - stderr - 31%|███ | 6869/22434 [7:40:05<10:44:07, 2.48s/it] +2025-02-05 17:47:45 - ERROR - stderr - +2025-02-05 17:47:45 - ERROR - stderr - +2025-02-05 17:47:45 - INFO - stdout - {'loss': 0.9213, 'grad_norm': 1.064684510231018, 'learning_rate': 1.625986023258906e-05, 'epoch': 0.92} +2025-02-05 17:47:45 - ERROR - stderr - 31%|███ | 6869/22434 [7:40:05<10:44:07, 2.48s/it] +2025-02-05 17:47:48 - ERROR - stderr - 31%|███ | 6870/22434 [7:40:07<10:47:45, 2.50s/it] +2025-02-05 17:47:48 - ERROR - stderr - +2025-02-05 17:47:48 - ERROR - stderr - +2025-02-05 17:47:48 - INFO - stdout - {'loss': 0.806, 'grad_norm': 0.9863426089286804, 'learning_rate': 1.625873428424879e-05, 'epoch': 0.92} +2025-02-05 17:47:48 - ERROR - stderr - 31%|███ | 6870/22434 [7:40:07<10:47:45, 2.50s/it] +2025-02-05 17:47:50 - ERROR - stderr - 31%|███ | 6871/22434 [7:40:10<10:47:11, 2.50s/it] +2025-02-05 17:47:50 - ERROR - stderr - +2025-02-05 17:47:50 - ERROR - stderr - +2025-02-05 
17:47:50 - INFO - stdout - {'loss': 0.9432, 'grad_norm': 1.1444082260131836, 'learning_rate': 1.6257608205451192e-05, 'epoch': 0.92} +2025-02-05 17:47:50 - ERROR - stderr - 31%|███ | 6871/22434 [7:40:10<10:47:11, 2.50s/it] +2025-02-05 17:47:53 - ERROR - stderr - 31%|███ | 6872/22434 [7:40:12<10:47:13, 2.50s/it] +2025-02-05 17:47:53 - ERROR - stderr - +2025-02-05 17:47:53 - ERROR - stderr - +2025-02-05 17:47:53 - INFO - stdout - {'loss': 0.9893, 'grad_norm': 1.0987794399261475, 'learning_rate': 1.6256481996219743e-05, 'epoch': 0.92} +2025-02-05 17:47:53 - ERROR - stderr - 31%|███ | 6872/22434 [7:40:12<10:47:13, 2.50s/it] +2025-02-05 17:47:55 - ERROR - stderr - 31%|███ | 6873/22434 [7:40:15<10:53:30, 2.52s/it] +2025-02-05 17:47:55 - ERROR - stderr - +2025-02-05 17:47:55 - ERROR - stderr - +2025-02-05 17:47:55 - INFO - stdout - {'loss': 0.9716, 'grad_norm': 1.0105737447738647, 'learning_rate': 1.6255355656577915e-05, 'epoch': 0.92} +2025-02-05 17:47:55 - ERROR - stderr - 31%|███ | 6873/22434 [7:40:15<10:53:30, 2.52s/it] +2025-02-05 17:47:58 - ERROR - stderr - 31%|███ | 6874/22434 [7:40:17<11:01:11, 2.55s/it] +2025-02-05 17:47:58 - ERROR - stderr - +2025-02-05 17:47:58 - ERROR - stderr - +2025-02-05 17:47:58 - INFO - stdout - {'loss': 0.8949, 'grad_norm': 1.0987651348114014, 'learning_rate': 1.625422918654918e-05, 'epoch': 0.92} +2025-02-05 17:47:58 - ERROR - stderr - 31%|███ | 6874/22434 [7:40:18<11:01:11, 2.55s/it] +2025-02-05 17:48:00 - ERROR - stderr - 31%|███ | 6875/22434 [7:40:20<10:59:52, 2.54s/it] +2025-02-05 17:48:00 - ERROR - stderr - +2025-02-05 17:48:00 - ERROR - stderr - +2025-02-05 17:48:00 - INFO - stdout - {'loss': 0.9582, 'grad_norm': 1.0292030572891235, 'learning_rate': 1.6253102586157022e-05, 'epoch': 0.92} +2025-02-05 17:48:00 - ERROR - stderr - 31%|███ | 6875/22434 [7:40:20<10:59:52, 2.54s/it] +2025-02-05 17:48:03 - ERROR - stderr - 31%|███ | 6876/22434 [7:40:23<10:56:49, 2.53s/it] +2025-02-05 17:48:03 - ERROR - stderr - +2025-02-05 17:48:03 - 
ERROR - stderr - +2025-02-05 17:48:03 - INFO - stdout - {'loss': 0.8976, 'grad_norm': 0.965427577495575, 'learning_rate': 1.6251975855424924e-05, 'epoch': 0.92} +2025-02-05 17:48:03 - ERROR - stderr - 31%|███ | 6876/22434 [7:40:23<10:56:49, 2.53s/it] +2025-02-05 17:48:05 - ERROR - stderr - 31%|███ | 6877/22434 [7:40:25<10:51:16, 2.51s/it] +2025-02-05 17:48:05 - ERROR - stderr - +2025-02-05 17:48:05 - ERROR - stderr - +2025-02-05 17:48:05 - INFO - stdout - {'loss': 0.9965, 'grad_norm': 1.1971684694290161, 'learning_rate': 1.6250848994376377e-05, 'epoch': 0.92} +2025-02-05 17:48:05 - ERROR - stderr - 31%|███ | 6877/22434 [7:40:25<10:51:16, 2.51s/it] +2025-02-05 17:48:08 - ERROR - stderr - 31%|███ | 6878/22434 [7:40:27<10:45:45, 2.49s/it] +2025-02-05 17:48:08 - ERROR - stderr - +2025-02-05 17:48:08 - ERROR - stderr - +2025-02-05 17:48:08 - INFO - stdout - {'loss': 0.9054, 'grad_norm': 1.0184681415557861, 'learning_rate': 1.624972200303486e-05, 'epoch': 0.92} +2025-02-05 17:48:08 - ERROR - stderr - 31%|███ | 6878/22434 [7:40:27<10:45:45, 2.49s/it] +2025-02-05 17:48:10 - ERROR - stderr - 31%|███ | 6879/22434 [7:40:30<10:42:55, 2.48s/it] +2025-02-05 17:48:10 - ERROR - stderr - +2025-02-05 17:48:10 - ERROR - stderr - +2025-02-05 17:48:10 - INFO - stdout - {'loss': 1.0516, 'grad_norm': 1.0956999063491821, 'learning_rate': 1.6248594881423866e-05, 'epoch': 0.92} +2025-02-05 17:48:10 - ERROR - stderr - 31%|███ | 6879/22434 [7:40:30<10:42:55, 2.48s/it] +2025-02-05 17:48:13 - ERROR - stderr - 31%|███ | 6880/22434 [7:40:32<10:43:05, 2.48s/it] +2025-02-05 17:48:13 - ERROR - stderr - +2025-02-05 17:48:13 - ERROR - stderr - +2025-02-05 17:48:13 - INFO - stdout - {'loss': 0.8213, 'grad_norm': 1.0508352518081665, 'learning_rate': 1.624746762956689e-05, 'epoch': 0.92} +2025-02-05 17:48:13 - ERROR - stderr - 31%|███ | 6880/22434 [7:40:32<10:43:05, 2.48s/it] +2025-02-05 17:48:15 - ERROR - stderr - 31%|███ | 6881/22434 [7:40:35<10:41:34, 2.48s/it] +2025-02-05 17:48:15 - ERROR - stderr - 
+2025-02-05 17:48:15 - ERROR - stderr - +2025-02-05 17:48:15 - INFO - stdout - {'loss': 0.8521, 'grad_norm': 0.9868884682655334, 'learning_rate': 1.6246340247487435e-05, 'epoch': 0.92} +2025-02-05 17:48:15 - ERROR - stderr - 31%|███ | 6881/22434 [7:40:35<10:41:34, 2.48s/it] +2025-02-05 17:48:18 - ERROR - stderr - 31%|███ | 6882/22434 [7:40:37<10:42:57, 2.48s/it] +2025-02-05 17:48:18 - ERROR - stderr - +2025-02-05 17:48:18 - ERROR - stderr - +2025-02-05 17:48:18 - INFO - stdout - {'loss': 1.0102, 'grad_norm': 1.1002509593963623, 'learning_rate': 1.6245212735208994e-05, 'epoch': 0.92} +2025-02-05 17:48:18 - ERROR - stderr - 31%|███ | 6882/22434 [7:40:37<10:42:57, 2.48s/it] +2025-02-05 17:48:20 - ERROR - stderr - 31%|███ | 6883/22434 [7:40:40<10:47:20, 2.50s/it] +2025-02-05 17:48:20 - ERROR - stderr - +2025-02-05 17:48:20 - ERROR - stderr - +2025-02-05 17:48:20 - INFO - stdout - {'loss': 0.8298, 'grad_norm': 0.9806067943572998, 'learning_rate': 1.6244085092755066e-05, 'epoch': 0.92} +2025-02-05 17:48:20 - ERROR - stderr - 31%|███ | 6883/22434 [7:40:40<10:47:20, 2.50s/it] +2025-02-05 17:48:23 - ERROR - stderr - 31%|███ | 6884/22434 [7:40:43<11:16:11, 2.61s/it] +2025-02-05 17:48:23 - ERROR - stderr - +2025-02-05 17:48:23 - ERROR - stderr - +2025-02-05 17:48:23 - INFO - stdout - {'loss': 0.9031, 'grad_norm': 1.1113206148147583, 'learning_rate': 1.624295732014916e-05, 'epoch': 0.92} +2025-02-05 17:48:23 - ERROR - stderr - 31%|███ | 6884/22434 [7:40:43<11:16:11, 2.61s/it] +2025-02-05 17:48:25 - ERROR - stderr - 31%|███ | 6885/22434 [7:40:45<11:02:24, 2.56s/it] +2025-02-05 17:48:25 - ERROR - stderr - +2025-02-05 17:48:25 - ERROR - stderr - +2025-02-05 17:48:25 - INFO - stdout - {'loss': 0.9522, 'grad_norm': 1.0912349224090576, 'learning_rate': 1.6241829417414784e-05, 'epoch': 0.92} +2025-02-05 17:48:25 - ERROR - stderr - 31%|███ | 6885/22434 [7:40:45<11:02:24, 2.56s/it] +2025-02-05 17:48:28 - ERROR - stderr - 31%|███ | 6886/22434 [7:40:48<11:07:04, 2.57s/it] +2025-02-05 
17:48:28 - ERROR - stderr - +2025-02-05 17:48:28 - ERROR - stderr - +2025-02-05 17:48:28 - INFO - stdout - {'loss': 0.99, 'grad_norm': 1.0708122253417969, 'learning_rate': 1.6240701384575446e-05, 'epoch': 0.92} +2025-02-05 17:48:28 - ERROR - stderr - 31%|███ | 6886/22434 [7:40:48<11:07:04, 2.57s/it] +2025-02-05 17:48:31 - ERROR - stderr - 31%|███ | 6887/22434 [7:40:50<11:03:02, 2.56s/it] +2025-02-05 17:48:31 - ERROR - stderr - +2025-02-05 17:48:31 - ERROR - stderr - +2025-02-05 17:48:31 - INFO - stdout - {'loss': 0.9201, 'grad_norm': 1.0628043413162231, 'learning_rate': 1.623957322165466e-05, 'epoch': 0.92} +2025-02-05 17:48:31 - ERROR - stderr - 31%|███ | 6887/22434 [7:40:50<11:03:02, 2.56s/it] +2025-02-05 17:48:33 - ERROR - stderr - 31%|███ | 6888/22434 [7:40:53<10:54:05, 2.52s/it] +2025-02-05 17:48:33 - ERROR - stderr - +2025-02-05 17:48:33 - ERROR - stderr - +2025-02-05 17:48:33 - INFO - stdout - {'loss': 0.945, 'grad_norm': 1.1664705276489258, 'learning_rate': 1.623844492867594e-05, 'epoch': 0.92} +2025-02-05 17:48:33 - ERROR - stderr - 31%|███ | 6888/22434 [7:40:53<10:54:05, 2.52s/it] +2025-02-05 17:48:35 - ERROR - stderr - 31%|███ | 6889/22434 [7:40:55<10:51:05, 2.51s/it] +2025-02-05 17:48:36 - ERROR - stderr - +2025-02-05 17:48:36 - ERROR - stderr - +2025-02-05 17:48:36 - INFO - stdout - {'loss': 0.9173, 'grad_norm': 0.9236838221549988, 'learning_rate': 1.6237316505662808e-05, 'epoch': 0.92} +2025-02-05 17:48:36 - ERROR - stderr - 31%|███ | 6889/22434 [7:40:55<10:51:05, 2.51s/it] +2025-02-05 17:48:38 - ERROR - stderr - 31%|███ | 6890/22434 [7:40:58<10:50:20, 2.51s/it] +2025-02-05 17:48:38 - ERROR - stderr - +2025-02-05 17:48:38 - ERROR - stderr - +2025-02-05 17:48:38 - INFO - stdout - {'loss': 0.8982, 'grad_norm': 1.0149219036102295, 'learning_rate': 1.623618795263878e-05, 'epoch': 0.92} +2025-02-05 17:48:38 - ERROR - stderr - 31%|███ | 6890/22434 [7:40:58<10:50:20, 2.51s/it] +2025-02-05 17:48:40 - ERROR - stderr - 31%|███ | 6891/22434 [7:41:00<10:46:24, 
2.50s/it] +2025-02-05 17:48:40 - ERROR - stderr - +2025-02-05 17:48:40 - ERROR - stderr - +2025-02-05 17:48:40 - INFO - stdout - {'loss': 0.9214, 'grad_norm': 1.0694936513900757, 'learning_rate': 1.623505926962738e-05, 'epoch': 0.92} +2025-02-05 17:48:40 - ERROR - stderr - 31%|███ | 6891/22434 [7:41:00<10:46:24, 2.50s/it] +2025-02-05 17:48:43 - ERROR - stderr - 31%|███ | 6892/22434 [7:41:03<10:46:57, 2.50s/it] +2025-02-05 17:48:43 - ERROR - stderr - +2025-02-05 17:48:43 - ERROR - stderr - +2025-02-05 17:48:43 - INFO - stdout - {'loss': 0.9253, 'grad_norm': 1.076536774635315, 'learning_rate': 1.6233930456652138e-05, 'epoch': 0.92} +2025-02-05 17:48:43 - ERROR - stderr - 31%|███ | 6892/22434 [7:41:03<10:46:57, 2.50s/it] +2025-02-05 17:48:45 - ERROR - stderr - 31%|███ | 6893/22434 [7:41:05<10:48:03, 2.50s/it] +2025-02-05 17:48:45 - ERROR - stderr - +2025-02-05 17:48:45 - ERROR - stderr - +2025-02-05 17:48:45 - INFO - stdout - {'loss': 0.9864, 'grad_norm': 1.0527029037475586, 'learning_rate': 1.6232801513736576e-05, 'epoch': 0.92} +2025-02-05 17:48:45 - ERROR - stderr - 31%|███ | 6893/22434 [7:41:05<10:48:03, 2.50s/it] +2025-02-05 17:48:48 - ERROR - stderr - 31%|███ | 6894/22434 [7:41:08<10:46:30, 2.50s/it] +2025-02-05 17:48:48 - ERROR - stderr - +2025-02-05 17:48:48 - ERROR - stderr - +2025-02-05 17:48:48 - INFO - stdout - {'loss': 0.9317, 'grad_norm': 1.0681196451187134, 'learning_rate': 1.6231672440904236e-05, 'epoch': 0.92} +2025-02-05 17:48:48 - ERROR - stderr - 31%|███ | 6894/22434 [7:41:08<10:46:30, 2.50s/it] +2025-02-05 17:48:51 - ERROR - stderr - 31%|███ | 6895/22434 [7:41:11<11:11:48, 2.59s/it] +2025-02-05 17:48:51 - ERROR - stderr - +2025-02-05 17:48:51 - ERROR - stderr - +2025-02-05 17:48:51 - INFO - stdout - {'loss': 1.0109, 'grad_norm': 1.0799440145492554, 'learning_rate': 1.6230543238178645e-05, 'epoch': 0.92} +2025-02-05 17:48:51 - ERROR - stderr - 31%|███ | 6895/22434 [7:41:11<11:11:48, 2.59s/it] +2025-02-05 17:48:53 - ERROR - stderr - 31%|███ | 
6896/22434 [7:41:13<11:12:17, 2.60s/it] +2025-02-05 17:48:53 - ERROR - stderr - +2025-02-05 17:48:53 - ERROR - stderr - +2025-02-05 17:48:53 - INFO - stdout - {'loss': 0.8347, 'grad_norm': 1.3043550252914429, 'learning_rate': 1.622941390558334e-05, 'epoch': 0.92} +2025-02-05 17:48:53 - ERROR - stderr - 31%|███ | 6896/22434 [7:41:13<11:12:17, 2.60s/it] +2025-02-05 17:48:56 - ERROR - stderr - 31%|███ | 6897/22434 [7:41:16<11:07:31, 2.58s/it] +2025-02-05 17:48:56 - ERROR - stderr - +2025-02-05 17:48:56 - ERROR - stderr - +2025-02-05 17:48:56 - INFO - stdout - {'loss': 0.8994, 'grad_norm': 1.0095030069351196, 'learning_rate': 1.6228284443141866e-05, 'epoch': 0.92} +2025-02-05 17:48:56 - ERROR - stderr - 31%|███ | 6897/22434 [7:41:16<11:07:31, 2.58s/it] +2025-02-05 17:48:58 - ERROR - stderr - 31%|███ | 6898/22434 [7:41:18<11:04:14, 2.57s/it] +2025-02-05 17:48:58 - ERROR - stderr - +2025-02-05 17:48:58 - ERROR - stderr - +2025-02-05 17:48:58 - INFO - stdout - {'loss': 1.0404, 'grad_norm': 1.1030194759368896, 'learning_rate': 1.6227154850877762e-05, 'epoch': 0.92} +2025-02-05 17:48:58 - ERROR - stderr - 31%|███ | 6898/22434 [7:41:18<11:04:14, 2.57s/it] +2025-02-05 17:49:01 - ERROR - stderr - 31%|███ | 6899/22434 [7:41:21<10:57:01, 2.54s/it] +2025-02-05 17:49:01 - ERROR - stderr - +2025-02-05 17:49:01 - ERROR - stderr - +2025-02-05 17:49:01 - INFO - stdout - {'loss': 1.0034, 'grad_norm': 1.1107786893844604, 'learning_rate': 1.6226025128814577e-05, 'epoch': 0.92} +2025-02-05 17:49:01 - ERROR - stderr - 31%|███ | 6899/22434 [7:41:21<10:57:01, 2.54s/it] +2025-02-05 17:49:03 - ERROR - stderr - 31%|███ | 6900/22434 [7:41:23<10:53:12, 2.52s/it] +2025-02-05 17:49:03 - ERROR - stderr - +2025-02-05 17:49:03 - ERROR - stderr - +2025-02-05 17:49:03 - INFO - stdout - {'loss': 0.8769, 'grad_norm': 1.0816348791122437, 'learning_rate': 1.622489527697585e-05, 'epoch': 0.92} +2025-02-05 17:49:03 - ERROR - stderr - 31%|███ | 6900/22434 [7:41:23<10:53:12, 2.52s/it] +2025-02-05 17:49:06 - 
ERROR - stderr - 31%|███ | 6901/22434 [7:41:26<10:49:46, 2.51s/it] +2025-02-05 17:49:06 - ERROR - stderr - +2025-02-05 17:49:06 - ERROR - stderr - +2025-02-05 17:49:06 - INFO - stdout - {'loss': 0.9268, 'grad_norm': 1.0478155612945557, 'learning_rate': 1.6223765295385142e-05, 'epoch': 0.92} +2025-02-05 17:49:06 - ERROR - stderr - 31%|███ | 6901/22434 [7:41:26<10:49:46, 2.51s/it] +2025-02-05 17:49:08 - ERROR - stderr - 31%|███ | 6902/22434 [7:41:28<10:51:17, 2.52s/it] +2025-02-05 17:49:08 - ERROR - stderr - +2025-02-05 17:49:08 - ERROR - stderr - +2025-02-05 17:49:08 - INFO - stdout - {'loss': 0.8867, 'grad_norm': 1.0490728616714478, 'learning_rate': 1.6222635184065997e-05, 'epoch': 0.92} +2025-02-05 17:49:08 - ERROR - stderr - 31%|███ | 6902/22434 [7:41:28<10:51:17, 2.52s/it] +2025-02-05 17:49:11 - ERROR - stderr - 31%|███ | 6903/22434 [7:41:31<11:10:49, 2.59s/it] +2025-02-05 17:49:11 - ERROR - stderr - +2025-02-05 17:49:11 - ERROR - stderr - +2025-02-05 17:49:11 - INFO - stdout - {'loss': 0.99, 'grad_norm': 1.1856682300567627, 'learning_rate': 1.6221504943041982e-05, 'epoch': 0.92} +2025-02-05 17:49:11 - ERROR - stderr - 31%|███ | 6903/22434 [7:41:31<11:10:49, 2.59s/it] +2025-02-05 17:49:14 - ERROR - stderr - 31%|███ | 6904/22434 [7:41:33<11:00:59, 2.55s/it] +2025-02-05 17:49:14 - ERROR - stderr - +2025-02-05 17:49:14 - ERROR - stderr - +2025-02-05 17:49:14 - INFO - stdout - {'loss': 0.8888, 'grad_norm': 1.033455729484558, 'learning_rate': 1.6220374572336646e-05, 'epoch': 0.92} +2025-02-05 17:49:14 - ERROR - stderr - 31%|███ | 6904/22434 [7:41:33<11:00:59, 2.55s/it] +2025-02-05 17:49:16 - ERROR - stderr - 31%|███ | 6905/22434 [7:41:36<10:53:42, 2.53s/it] +2025-02-05 17:49:16 - ERROR - stderr - +2025-02-05 17:49:16 - ERROR - stderr - +2025-02-05 17:49:16 - INFO - stdout - {'loss': 0.8729, 'grad_norm': 1.088620901107788, 'learning_rate': 1.6219244071973554e-05, 'epoch': 0.92} +2025-02-05 17:49:16 - ERROR - stderr - 31%|███ | 6905/22434 [7:41:36<10:53:42, 2.53s/it] 
+2025-02-05 17:49:19 - ERROR - stderr - 31%|███ | 6906/22434 [7:41:38<10:48:39, 2.51s/it] +2025-02-05 17:49:19 - ERROR - stderr - +2025-02-05 17:49:19 - ERROR - stderr - +2025-02-05 17:49:19 - INFO - stdout - {'loss': 0.9952, 'grad_norm': 1.163955807685852, 'learning_rate': 1.6218113441976275e-05, 'epoch': 0.92} +2025-02-05 17:49:19 - ERROR - stderr - 31%|███ | 6906/22434 [7:41:38<10:48:39, 2.51s/it] +2025-02-05 17:49:21 - ERROR - stderr - 31%|███ | 6907/22434 [7:41:41<10:45:30, 2.49s/it] +2025-02-05 17:49:21 - ERROR - stderr - +2025-02-05 17:49:21 - ERROR - stderr - +2025-02-05 17:49:21 - INFO - stdout - {'loss': 0.8816, 'grad_norm': 1.0452533960342407, 'learning_rate': 1.6216982682368365e-05, 'epoch': 0.92} +2025-02-05 17:49:21 - ERROR - stderr - 31%|███ | 6907/22434 [7:41:41<10:45:30, 2.49s/it] +2025-02-05 17:49:24 - ERROR - stderr - 31%|███ | 6908/22434 [7:41:43<10:49:56, 2.51s/it] +2025-02-05 17:49:24 - ERROR - stderr - +2025-02-05 17:49:24 - ERROR - stderr - +2025-02-05 17:49:24 - INFO - stdout - {'loss': 0.9032, 'grad_norm': 0.9951573014259338, 'learning_rate': 1.6215851793173403e-05, 'epoch': 0.92} +2025-02-05 17:49:24 - ERROR - stderr - 31%|███ | 6908/22434 [7:41:43<10:49:56, 2.51s/it] +2025-02-05 17:49:26 - ERROR - stderr - 31%|███ | 6909/22434 [7:41:46<10:51:07, 2.52s/it] +2025-02-05 17:49:26 - ERROR - stderr - +2025-02-05 17:49:26 - ERROR - stderr - +2025-02-05 17:49:26 - INFO - stdout - {'loss': 1.0529, 'grad_norm': 1.0714987516403198, 'learning_rate': 1.6214720774414956e-05, 'epoch': 0.92} +2025-02-05 17:49:26 - ERROR - stderr - 31%|███ | 6909/22434 [7:41:46<10:51:07, 2.52s/it] +2025-02-05 17:49:29 - ERROR - stderr - 31%|███ | 6910/22434 [7:41:48<10:43:52, 2.49s/it] +2025-02-05 17:49:29 - ERROR - stderr - +2025-02-05 17:49:29 - ERROR - stderr - +2025-02-05 17:49:29 - INFO - stdout - {'loss': 0.9136, 'grad_norm': 1.290366530418396, 'learning_rate': 1.6213589626116607e-05, 'epoch': 0.92} +2025-02-05 17:49:29 - ERROR - stderr - 31%|███ | 6910/22434 
[7:41:48<10:43:52, 2.49s/it] +2025-02-05 17:49:31 - ERROR - stderr - 31%|███ | 6911/22434 [7:41:51<10:45:43, 2.50s/it] +2025-02-05 17:49:31 - ERROR - stderr - +2025-02-05 17:49:31 - ERROR - stderr - +2025-02-05 17:49:31 - INFO - stdout - {'loss': 0.8399, 'grad_norm': 0.992388904094696, 'learning_rate': 1.6212458348301926e-05, 'epoch': 0.92} +2025-02-05 17:49:31 - ERROR - stderr - 31%|███ | 6911/22434 [7:41:51<10:45:43, 2.50s/it] +2025-02-05 17:49:33 - ERROR - stderr - 31%|███ | 6912/22434 [7:41:53<10:42:53, 2.49s/it] +2025-02-05 17:49:34 - ERROR - stderr - +2025-02-05 17:49:34 - ERROR - stderr - +2025-02-05 17:49:34 - INFO - stdout - {'loss': 0.9279, 'grad_norm': 1.137168288230896, 'learning_rate': 1.621132694099449e-05, 'epoch': 0.92} +2025-02-05 17:49:34 - ERROR - stderr - 31%|███ | 6912/22434 [7:41:53<10:42:53, 2.49s/it] +2025-02-05 17:49:36 - ERROR - stderr - 31%|███ | 6913/22434 [7:41:56<10:37:40, 2.47s/it] +2025-02-05 17:49:36 - ERROR - stderr - +2025-02-05 17:49:36 - ERROR - stderr - +2025-02-05 17:49:36 - INFO - stdout - {'loss': 1.0161, 'grad_norm': 1.2961491346359253, 'learning_rate': 1.621019540421789e-05, 'epoch': 0.92} +2025-02-05 17:49:36 - ERROR - stderr - 31%|███ | 6913/22434 [7:41:56<10:37:40, 2.47s/it] +2025-02-05 17:49:38 - ERROR - stderr - 31%|███ | 6914/22434 [7:41:58<10:37:39, 2.47s/it] +2025-02-05 17:49:38 - ERROR - stderr - +2025-02-05 17:49:38 - ERROR - stderr - +2025-02-05 17:49:38 - INFO - stdout - {'loss': 0.9842, 'grad_norm': 1.0697441101074219, 'learning_rate': 1.6209063737995716e-05, 'epoch': 0.92} +2025-02-05 17:49:38 - ERROR - stderr - 31%|███ | 6914/22434 [7:41:58<10:37:39, 2.47s/it] +2025-02-05 17:49:41 - ERROR - stderr - 31%|███ | 6915/22434 [7:42:01<10:45:42, 2.50s/it] +2025-02-05 17:49:41 - ERROR - stderr - +2025-02-05 17:49:41 - ERROR - stderr - +2025-02-05 17:49:41 - INFO - stdout - {'loss': 0.8655, 'grad_norm': 1.1107462644577026, 'learning_rate': 1.6207931942351543e-05, 'epoch': 0.92} +2025-02-05 17:49:41 - ERROR - stderr - 
31%|███ | 6915/22434 [7:42:01<10:45:42, 2.50s/it] +2025-02-05 17:49:44 - ERROR - stderr - 31%|███ | 6916/22434 [7:42:04<11:14:38, 2.61s/it] +2025-02-05 17:49:44 - ERROR - stderr - +2025-02-05 17:49:44 - ERROR - stderr - +2025-02-05 17:49:44 - INFO - stdout - {'loss': 0.8821, 'grad_norm': 0.9772646427154541, 'learning_rate': 1.620680001730897e-05, 'epoch': 0.92} +2025-02-05 17:49:44 - ERROR - stderr - 31%|███ | 6916/22434 [7:42:04<11:14:38, 2.61s/it] +2025-02-05 17:49:46 - ERROR - stderr - 31%|███ | 6917/22434 [7:42:06<11:07:35, 2.58s/it] +2025-02-05 17:49:46 - ERROR - stderr - +2025-02-05 17:49:46 - ERROR - stderr - +2025-02-05 17:49:46 - INFO - stdout - {'loss': 0.9021, 'grad_norm': 1.0757473707199097, 'learning_rate': 1.620566796289159e-05, 'epoch': 0.92} +2025-02-05 17:49:46 - ERROR - stderr - 31%|███ | 6917/22434 [7:42:06<11:07:35, 2.58s/it] +2025-02-05 17:49:49 - ERROR - stderr - 31%|███ | 6918/22434 [7:42:09<10:56:01, 2.54s/it] +2025-02-05 17:49:49 - ERROR - stderr - +2025-02-05 17:49:49 - ERROR - stderr - +2025-02-05 17:49:49 - INFO - stdout - {'loss': 0.9483, 'grad_norm': 1.1896097660064697, 'learning_rate': 1.6204535779123002e-05, 'epoch': 0.93} +2025-02-05 17:49:49 - ERROR - stderr - 31%|███ | 6918/22434 [7:42:09<10:56:01, 2.54s/it] +2025-02-05 17:49:51 - ERROR - stderr - 31%|███ | 6919/22434 [7:42:11<10:51:13, 2.52s/it] +2025-02-05 17:49:51 - ERROR - stderr - +2025-02-05 17:49:51 - ERROR - stderr - +2025-02-05 17:49:51 - INFO - stdout - {'loss': 0.8985, 'grad_norm': 1.0323418378829956, 'learning_rate': 1.62034034660268e-05, 'epoch': 0.93} +2025-02-05 17:49:51 - ERROR - stderr - 31%|███ | 6919/22434 [7:42:11<10:51:13, 2.52s/it] +2025-02-05 17:49:54 - ERROR - stderr - 31%|███ | 6920/22434 [7:42:14<11:10:43, 2.59s/it] +2025-02-05 17:49:54 - ERROR - stderr - +2025-02-05 17:49:54 - ERROR - stderr - +2025-02-05 17:49:54 - INFO - stdout - {'loss': 1.0187, 'grad_norm': 1.0475860834121704, 'learning_rate': 1.620227102362659e-05, 'epoch': 0.93} +2025-02-05 
17:49:54 - ERROR - stderr - 31%|███ | 6920/22434 [7:42:14<11:10:43, 2.59s/it] +2025-02-05 17:49:57 - ERROR - stderr - 31%|███ | 6921/22434 [7:42:16<11:16:19, 2.62s/it] +2025-02-05 17:49:57 - ERROR - stderr - +2025-02-05 17:49:57 - ERROR - stderr - +2025-02-05 17:49:57 - INFO - stdout - {'loss': 0.9402, 'grad_norm': 1.021731972694397, 'learning_rate': 1.6201138451945976e-05, 'epoch': 0.93} +2025-02-05 17:49:57 - ERROR - stderr - 31%|███ | 6921/22434 [7:42:16<11:16:19, 2.62s/it] +2025-02-05 17:49:59 - ERROR - stderr - 31%|███ | 6922/22434 [7:42:19<11:05:27, 2.57s/it] +2025-02-05 17:49:59 - ERROR - stderr - +2025-02-05 17:49:59 - ERROR - stderr - +2025-02-05 17:49:59 - INFO - stdout - {'loss': 0.9872, 'grad_norm': 1.11515474319458, 'learning_rate': 1.6200005751008564e-05, 'epoch': 0.93} +2025-02-05 17:49:59 - ERROR - stderr - 31%|███ | 6922/22434 [7:42:19<11:05:27, 2.57s/it] +2025-02-05 17:50:02 - ERROR - stderr - 31%|███ | 6923/22434 [7:42:21<10:56:34, 2.54s/it] +2025-02-05 17:50:02 - ERROR - stderr - +2025-02-05 17:50:02 - ERROR - stderr - +2025-02-05 17:50:02 - INFO - stdout - {'loss': 0.9056, 'grad_norm': 1.0412979125976562, 'learning_rate': 1.6198872920837966e-05, 'epoch': 0.93} +2025-02-05 17:50:02 - ERROR - stderr - 31%|███ | 6923/22434 [7:42:21<10:56:34, 2.54s/it] +2025-02-05 17:50:04 - ERROR - stderr - 31%|███ | 6924/22434 [7:42:24<10:54:16, 2.53s/it] +2025-02-05 17:50:04 - ERROR - stderr - +2025-02-05 17:50:04 - ERROR - stderr - +2025-02-05 17:50:04 - INFO - stdout - {'loss': 0.9424, 'grad_norm': 1.0476435422897339, 'learning_rate': 1.619773996145779e-05, 'epoch': 0.93} +2025-02-05 17:50:04 - ERROR - stderr - 31%|███ | 6924/22434 [7:42:24<10:54:16, 2.53s/it] +2025-02-05 17:50:07 - ERROR - stderr - 31%|███ | 6925/22434 [7:42:26<10:50:35, 2.52s/it] +2025-02-05 17:50:07 - ERROR - stderr - +2025-02-05 17:50:07 - ERROR - stderr - +2025-02-05 17:50:07 - INFO - stdout - {'loss': 0.8298, 'grad_norm': 1.0711697340011597, 'learning_rate': 1.6196606872891657e-05, 
'epoch': 0.93} +2025-02-05 17:50:07 - ERROR - stderr - 31%|███ | 6925/22434 [7:42:26<10:50:35, 2.52s/it] +2025-02-05 17:50:09 - ERROR - stderr - 31%|███ | 6926/22434 [7:42:29<11:12:54, 2.60s/it] +2025-02-05 17:50:09 - ERROR - stderr - +2025-02-05 17:50:09 - ERROR - stderr - +2025-02-05 17:50:09 - INFO - stdout - {'loss': 1.0201, 'grad_norm': 1.055830955505371, 'learning_rate': 1.6195473655163187e-05, 'epoch': 0.93} +2025-02-05 17:50:09 - ERROR - stderr - 31%|███ | 6926/22434 [7:42:29<11:12:54, 2.60s/it] +2025-02-05 17:50:12 - ERROR - stderr - 31%|███ | 6927/22434 [7:42:32<11:08:31, 2.59s/it] +2025-02-05 17:50:12 - ERROR - stderr - +2025-02-05 17:50:12 - ERROR - stderr - +2025-02-05 17:50:12 - INFO - stdout - {'loss': 1.0027, 'grad_norm': 1.1842652559280396, 'learning_rate': 1.619434030829599e-05, 'epoch': 0.93} +2025-02-05 17:50:12 - ERROR - stderr - 31%|███ | 6927/22434 [7:42:32<11:08:31, 2.59s/it] +2025-02-05 17:50:14 - ERROR - stderr - 31%|███ | 6928/22434 [7:42:34<11:01:35, 2.56s/it] +2025-02-05 17:50:15 - ERROR - stderr - +2025-02-05 17:50:15 - ERROR - stderr - +2025-02-05 17:50:15 - INFO - stdout - {'loss': 0.8421, 'grad_norm': 0.9343481659889221, 'learning_rate': 1.6193206832313702e-05, 'epoch': 0.93} +2025-02-05 17:50:15 - ERROR - stderr - 31%|███ | 6928/22434 [7:42:34<11:01:35, 2.56s/it] +2025-02-05 17:50:17 - ERROR - stderr - 31%|███ | 6929/22434 [7:42:37<10:55:16, 2.54s/it] +2025-02-05 17:50:17 - ERROR - stderr - +2025-02-05 17:50:17 - ERROR - stderr - +2025-02-05 17:50:17 - INFO - stdout - {'loss': 0.9271, 'grad_norm': 1.092033863067627, 'learning_rate': 1.6192073227239942e-05, 'epoch': 0.93} +2025-02-05 17:50:17 - ERROR - stderr - 31%|███ | 6929/22434 [7:42:37<10:55:16, 2.54s/it] +2025-02-05 17:50:19 - ERROR - stderr - 31%|███ | 6930/22434 [7:42:39<10:56:18, 2.54s/it] +2025-02-05 17:50:20 - ERROR - stderr - +2025-02-05 17:50:20 - ERROR - stderr - +2025-02-05 17:50:20 - INFO - stdout - {'loss': 0.9572, 'grad_norm': 1.0830357074737549, 'learning_rate': 
1.6190939493098344e-05, 'epoch': 0.93} +2025-02-05 17:50:20 - ERROR - stderr - 31%|███ | 6930/22434 [7:42:39<10:56:18, 2.54s/it] +2025-02-05 17:50:22 - ERROR - stderr - 31%|███ | 6931/22434 [7:42:42<10:45:54, 2.50s/it] +2025-02-05 17:50:22 - ERROR - stderr - +2025-02-05 17:50:22 - ERROR - stderr - +2025-02-05 17:50:22 - INFO - stdout - {'loss': 0.8107, 'grad_norm': 0.9988926649093628, 'learning_rate': 1.618980562991253e-05, 'epoch': 0.93} +2025-02-05 17:50:22 - ERROR - stderr - 31%|███ | 6931/22434 [7:42:42<10:45:54, 2.50s/it] +2025-02-05 17:50:24 - ERROR - stderr - 31%|███ | 6932/22434 [7:42:44<10:44:45, 2.50s/it] +2025-02-05 17:50:24 - ERROR - stderr - +2025-02-05 17:50:24 - ERROR - stderr - +2025-02-05 17:50:24 - INFO - stdout - {'loss': 1.0304, 'grad_norm': 1.1210498809814453, 'learning_rate': 1.6188671637706143e-05, 'epoch': 0.93} +2025-02-05 17:50:24 - ERROR - stderr - 31%|███ | 6932/22434 [7:42:44<10:44:45, 2.50s/it] +2025-02-05 17:50:27 - ERROR - stderr - 31%|███ | 6933/22434 [7:42:47<10:45:07, 2.50s/it] +2025-02-05 17:50:27 - ERROR - stderr - +2025-02-05 17:50:27 - ERROR - stderr - +2025-02-05 17:50:27 - INFO - stdout - {'loss': 0.8508, 'grad_norm': 1.0100246667861938, 'learning_rate': 1.618753751650282e-05, 'epoch': 0.93} +2025-02-05 17:50:27 - ERROR - stderr - 31%|███ | 6933/22434 [7:42:47<10:45:07, 2.50s/it] +2025-02-05 17:50:29 - ERROR - stderr - 31%|███ | 6934/22434 [7:42:49<10:51:10, 2.52s/it] +2025-02-05 17:50:30 - ERROR - stderr - +2025-02-05 17:50:30 - ERROR - stderr - +2025-02-05 17:50:30 - INFO - stdout - {'loss': 0.9, 'grad_norm': 0.915416955947876, 'learning_rate': 1.61864032663262e-05, 'epoch': 0.93} +2025-02-05 17:50:30 - ERROR - stderr - 31%|███ | 6934/22434 [7:42:49<10:51:10, 2.52s/it] +2025-02-05 17:50:32 - ERROR - stderr - 31%|███ | 6935/22434 [7:42:52<10:52:33, 2.53s/it] +2025-02-05 17:50:32 - ERROR - stderr - +2025-02-05 17:50:32 - ERROR - stderr - +2025-02-05 17:50:32 - INFO - stdout - {'loss': 1.0662, 'grad_norm': 1.163590431213379, 
'learning_rate': 1.618526888719992e-05, 'epoch': 0.93} +2025-02-05 17:50:32 - ERROR - stderr - 31%|███ | 6935/22434 [7:42:52<10:52:33, 2.53s/it] +2025-02-05 17:50:35 - ERROR - stderr - 31%|███ | 6936/22434 [7:42:54<10:51:09, 2.52s/it] +2025-02-05 17:50:35 - ERROR - stderr - +2025-02-05 17:50:35 - ERROR - stderr - +2025-02-05 17:50:35 - INFO - stdout - {'loss': 0.9838, 'grad_norm': 1.1036838293075562, 'learning_rate': 1.6184134379147627e-05, 'epoch': 0.93} +2025-02-05 17:50:35 - ERROR - stderr - 31%|███ | 6936/22434 [7:42:54<10:51:09, 2.52s/it] +2025-02-05 17:50:37 - ERROR - stderr - 31%|███ | 6937/22434 [7:42:57<11:04:51, 2.57s/it] +2025-02-05 17:50:37 - ERROR - stderr - +2025-02-05 17:50:37 - ERROR - stderr - +2025-02-05 17:50:37 - INFO - stdout - {'loss': 0.7533, 'grad_norm': 1.1418052911758423, 'learning_rate': 1.6182999742192974e-05, 'epoch': 0.93} +2025-02-05 17:50:37 - ERROR - stderr - 31%|███ | 6937/22434 [7:42:57<11:04:51, 2.57s/it] +2025-02-05 17:50:40 - ERROR - stderr - 31%|███ | 6938/22434 [7:43:00<11:01:40, 2.56s/it] +2025-02-05 17:50:40 - ERROR - stderr - +2025-02-05 17:50:40 - ERROR - stderr - +2025-02-05 17:50:40 - INFO - stdout - {'loss': 0.8454, 'grad_norm': 1.008998155593872, 'learning_rate': 1.6181864976359608e-05, 'epoch': 0.93} +2025-02-05 17:50:40 - ERROR - stderr - 31%|███ | 6938/22434 [7:43:00<11:01:40, 2.56s/it] +2025-02-05 17:50:42 - ERROR - stderr - 31%|███ | 6939/22434 [7:43:02<10:59:31, 2.55s/it] +2025-02-05 17:50:42 - ERROR - stderr - +2025-02-05 17:50:42 - ERROR - stderr - +2025-02-05 17:50:42 - INFO - stdout - {'loss': 0.9303, 'grad_norm': 1.0258378982543945, 'learning_rate': 1.618073008167118e-05, 'epoch': 0.93} +2025-02-05 17:50:42 - ERROR - stderr - 31%|███ | 6939/22434 [7:43:02<10:59:31, 2.55s/it] +2025-02-05 17:50:45 - ERROR - stderr - 31%|███ | 6940/22434 [7:43:05<11:12:27, 2.60s/it] +2025-02-05 17:50:45 - ERROR - stderr - +2025-02-05 17:50:45 - ERROR - stderr - +2025-02-05 17:50:45 - INFO - stdout - {'loss': 0.9665, 
'grad_norm': 1.0755735635757446, 'learning_rate': 1.6179595058151346e-05, 'epoch': 0.93} +2025-02-05 17:50:45 - ERROR - stderr - 31%|███ | 6940/22434 [7:43:05<11:12:27, 2.60s/it] +2025-02-05 17:50:47 - ERROR - stderr - 31%|███ | 6941/22434 [7:43:07<11:03:45, 2.57s/it] +2025-02-05 17:50:48 - ERROR - stderr - +2025-02-05 17:50:48 - ERROR - stderr - +2025-02-05 17:50:48 - INFO - stdout - {'loss': 0.9464, 'grad_norm': 1.157312273979187, 'learning_rate': 1.617845990582377e-05, 'epoch': 0.93} +2025-02-05 17:50:48 - ERROR - stderr - 31%|███ | 6941/22434 [7:43:07<11:03:45, 2.57s/it] +2025-02-05 17:50:50 - ERROR - stderr - 31%|███ | 6942/22434 [7:43:10<11:01:39, 2.56s/it] +2025-02-05 17:50:50 - ERROR - stderr - +2025-02-05 17:50:50 - ERROR - stderr - +2025-02-05 17:50:50 - INFO - stdout - {'loss': 0.8372, 'grad_norm': 1.0503900051116943, 'learning_rate': 1.617732462471211e-05, 'epoch': 0.93} +2025-02-05 17:50:50 - ERROR - stderr - 31%|███ | 6942/22434 [7:43:10<11:01:39, 2.56s/it] +2025-02-05 17:50:53 - ERROR - stderr - 31%|███ | 6943/22434 [7:43:12<10:55:21, 2.54s/it] +2025-02-05 17:50:53 - ERROR - stderr - +2025-02-05 17:50:53 - ERROR - stderr - +2025-02-05 17:50:53 - INFO - stdout - {'loss': 0.8771, 'grad_norm': 0.9213406443595886, 'learning_rate': 1.6176189214840027e-05, 'epoch': 0.93} +2025-02-05 17:50:53 - ERROR - stderr - 31%|███ | 6943/22434 [7:43:12<10:55:21, 2.54s/it] +2025-02-05 17:50:55 - ERROR - stderr - 31%|███ | 6944/22434 [7:43:15<10:58:10, 2.55s/it] +2025-02-05 17:50:55 - ERROR - stderr - +2025-02-05 17:50:55 - ERROR - stderr - +2025-02-05 17:50:55 - INFO - stdout - {'loss': 0.7725, 'grad_norm': 0.9571143984794617, 'learning_rate': 1.6175053676231188e-05, 'epoch': 0.93} +2025-02-05 17:50:55 - ERROR - stderr - 31%|███ | 6944/22434 [7:43:15<10:58:10, 2.55s/it] +2025-02-05 17:50:58 - ERROR - stderr - 31%|███ | 6945/22434 [7:43:17<10:51:22, 2.52s/it] +2025-02-05 17:50:58 - ERROR - stderr - +2025-02-05 17:50:58 - ERROR - stderr - +2025-02-05 17:50:58 - INFO - 
stdout - {'loss': 1.011, 'grad_norm': 1.1020632982254028, 'learning_rate': 1.6173918008909266e-05, 'epoch': 0.93} +2025-02-05 17:50:58 - ERROR - stderr - 31%|███ | 6945/22434 [7:43:17<10:51:22, 2.52s/it] +2025-02-05 17:51:00 - ERROR - stderr - 31%|███ | 6946/22434 [7:43:20<10:49:35, 2.52s/it] +2025-02-05 17:51:00 - ERROR - stderr - +2025-02-05 17:51:00 - ERROR - stderr - +2025-02-05 17:51:00 - INFO - stdout - {'loss': 0.802, 'grad_norm': 0.9676728248596191, 'learning_rate': 1.617278221289793e-05, 'epoch': 0.93} +2025-02-05 17:51:00 - ERROR - stderr - 31%|███ | 6946/22434 [7:43:20<10:49:35, 2.52s/it] +2025-02-05 17:51:03 - ERROR - stderr - 31%|███ | 6947/22434 [7:43:23<11:02:48, 2.57s/it] +2025-02-05 17:51:03 - ERROR - stderr - +2025-02-05 17:51:03 - ERROR - stderr - +2025-02-05 17:51:03 - INFO - stdout - {'loss': 0.9476, 'grad_norm': 1.1829897165298462, 'learning_rate': 1.617164628822086e-05, 'epoch': 0.93} +2025-02-05 17:51:03 - ERROR - stderr - 31%|███ | 6947/22434 [7:43:23<11:02:48, 2.57s/it] +2025-02-05 17:51:05 - ERROR - stderr - 31%|███ | 6948/22434 [7:43:25<11:02:00, 2.56s/it] +2025-02-05 17:51:05 - ERROR - stderr - +2025-02-05 17:51:05 - ERROR - stderr - +2025-02-05 17:51:05 - INFO - stdout - {'loss': 0.9338, 'grad_norm': 1.079222321510315, 'learning_rate': 1.6170510234901723e-05, 'epoch': 0.93} +2025-02-05 17:51:05 - ERROR - stderr - 31%|███ | 6948/22434 [7:43:25<11:02:00, 2.56s/it] +2025-02-05 17:51:08 - ERROR - stderr - 31%|███ | 6949/22434 [7:43:28<10:55:06, 2.54s/it] +2025-02-05 17:51:08 - ERROR - stderr - +2025-02-05 17:51:08 - ERROR - stderr - +2025-02-05 17:51:08 - INFO - stdout - {'loss': 0.8555, 'grad_norm': 1.049131155014038, 'learning_rate': 1.6169374052964205e-05, 'epoch': 0.93} +2025-02-05 17:51:08 - ERROR - stderr - 31%|███ | 6949/22434 [7:43:28<10:55:06, 2.54s/it] +2025-02-05 17:51:10 - ERROR - stderr - 31%|███ | 6950/22434 [7:43:30<10:52:29, 2.53s/it] +2025-02-05 17:51:10 - ERROR - stderr - +2025-02-05 17:51:10 - ERROR - stderr - 
+2025-02-05 17:51:10 - INFO - stdout - {'loss': 0.8914, 'grad_norm': 1.0093390941619873, 'learning_rate': 1.616823774243199e-05, 'epoch': 0.93} +2025-02-05 17:51:10 - ERROR - stderr - 31%|███ | 6950/22434 [7:43:30<10:52:29, 2.53s/it] +2025-02-05 17:51:13 - ERROR - stderr - 31%|███ | 6951/22434 [7:43:32<10:46:57, 2.51s/it] +2025-02-05 17:51:13 - ERROR - stderr - +2025-02-05 17:51:13 - ERROR - stderr - +2025-02-05 17:51:13 - INFO - stdout - {'loss': 0.9178, 'grad_norm': 1.0331645011901855, 'learning_rate': 1.6167101303328766e-05, 'epoch': 0.93} +2025-02-05 17:51:13 - ERROR - stderr - 31%|███ | 6951/22434 [7:43:33<10:46:57, 2.51s/it] +2025-02-05 17:51:15 - ERROR - stderr - 31%|███ | 6952/22434 [7:43:35<10:49:18, 2.52s/it] +2025-02-05 17:51:15 - ERROR - stderr - +2025-02-05 17:51:15 - ERROR - stderr - +2025-02-05 17:51:15 - INFO - stdout - {'loss': 0.8872, 'grad_norm': 0.9970361590385437, 'learning_rate': 1.616596473567821e-05, 'epoch': 0.93} +2025-02-05 17:51:15 - ERROR - stderr - 31%|███ | 6952/22434 [7:43:35<10:49:18, 2.52s/it] +2025-02-05 17:51:18 - ERROR - stderr - 31%|███ | 6953/22434 [7:43:38<10:47:19, 2.51s/it] +2025-02-05 17:51:18 - ERROR - stderr - +2025-02-05 17:51:18 - ERROR - stderr - +2025-02-05 17:51:18 - INFO - stdout - {'loss': 0.9486, 'grad_norm': 0.9104267954826355, 'learning_rate': 1.6164828039504022e-05, 'epoch': 0.93} +2025-02-05 17:51:18 - ERROR - stderr - 31%|███ | 6953/22434 [7:43:38<10:47:19, 2.51s/it] +2025-02-05 17:51:20 - ERROR - stderr - 31%|███ | 6954/22434 [7:43:40<10:51:49, 2.53s/it] +2025-02-05 17:51:20 - ERROR - stderr - +2025-02-05 17:51:20 - ERROR - stderr - +2025-02-05 17:51:20 - INFO - stdout - {'loss': 1.0143, 'grad_norm': 1.0969213247299194, 'learning_rate': 1.6163691214829895e-05, 'epoch': 0.93} +2025-02-05 17:51:20 - ERROR - stderr - 31%|███ | 6954/22434 [7:43:40<10:51:49, 2.53s/it] +2025-02-05 17:51:23 - ERROR - stderr - 31%|███ | 6955/22434 [7:43:43<10:48:16, 2.51s/it] +2025-02-05 17:51:23 - ERROR - stderr - +2025-02-05 
17:51:23 - ERROR - stderr - +2025-02-05 17:51:23 - INFO - stdout - {'loss': 0.9617, 'grad_norm': 1.0657401084899902, 'learning_rate': 1.6162554261679517e-05, 'epoch': 0.93} +2025-02-05 17:51:23 - ERROR - stderr - 31%|███ | 6955/22434 [7:43:43<10:48:16, 2.51s/it] +2025-02-05 17:51:25 - ERROR - stderr - 31%|███ | 6956/22434 [7:43:45<10:45:41, 2.50s/it] +2025-02-05 17:51:25 - ERROR - stderr - +2025-02-05 17:51:25 - ERROR - stderr - +2025-02-05 17:51:25 - INFO - stdout - {'loss': 0.8382, 'grad_norm': 1.1671828031539917, 'learning_rate': 1.6161417180076596e-05, 'epoch': 0.93} +2025-02-05 17:51:25 - ERROR - stderr - 31%|███ | 6956/22434 [7:43:45<10:45:41, 2.50s/it] +2025-02-05 17:51:28 - ERROR - stderr - 31%|███ | 6957/22434 [7:43:48<10:47:35, 2.51s/it] +2025-02-05 17:51:28 - ERROR - stderr - +2025-02-05 17:51:28 - ERROR - stderr - +2025-02-05 17:51:28 - INFO - stdout - {'loss': 0.9213, 'grad_norm': 1.0025434494018555, 'learning_rate': 1.616027997004483e-05, 'epoch': 0.93} +2025-02-05 17:51:28 - ERROR - stderr - 31%|███ | 6957/22434 [7:43:48<10:47:35, 2.51s/it] +2025-02-05 17:51:30 - ERROR - stderr - 31%|███ | 6958/22434 [7:43:50<10:51:16, 2.53s/it] +2025-02-05 17:51:30 - ERROR - stderr - +2025-02-05 17:51:30 - ERROR - stderr - +2025-02-05 17:51:30 - INFO - stdout - {'loss': 0.9992, 'grad_norm': 1.061132788658142, 'learning_rate': 1.615914263160792e-05, 'epoch': 0.93} +2025-02-05 17:51:30 - ERROR - stderr - 31%|███ | 6958/22434 [7:43:50<10:51:16, 2.53s/it] +2025-02-05 17:51:33 - ERROR - stderr - 31%|███ | 6959/22434 [7:43:53<10:53:55, 2.54s/it] +2025-02-05 17:51:33 - ERROR - stderr - +2025-02-05 17:51:33 - ERROR - stderr - +2025-02-05 17:51:33 - INFO - stdout - {'loss': 0.9026, 'grad_norm': 0.9592460989952087, 'learning_rate': 1.615800516478958e-05, 'epoch': 0.93} +2025-02-05 17:51:33 - ERROR - stderr - 31%|███ | 6959/22434 [7:43:53<10:53:55, 2.54s/it] +2025-02-05 17:51:35 - ERROR - stderr - 31%|███ | 6960/22434 [7:43:55<10:49:00, 2.52s/it] +2025-02-05 17:51:35 - ERROR - 
stderr - +2025-02-05 17:51:35 - ERROR - stderr - +2025-02-05 17:51:35 - INFO - stdout - {'loss': 0.9834, 'grad_norm': 1.0587468147277832, 'learning_rate': 1.615686756961351e-05, 'epoch': 0.93} +2025-02-05 17:51:35 - ERROR - stderr - 31%|███ | 6960/22434 [7:43:55<10:49:00, 2.52s/it] +2025-02-05 17:51:38 - ERROR - stderr - 31%|███ | 6961/22434 [7:43:58<10:47:09, 2.51s/it] +2025-02-05 17:51:38 - ERROR - stderr - +2025-02-05 17:51:38 - ERROR - stderr - +2025-02-05 17:51:38 - INFO - stdout - {'loss': 1.0631, 'grad_norm': 1.0437768697738647, 'learning_rate': 1.6155729846103428e-05, 'epoch': 0.93} +2025-02-05 17:51:38 - ERROR - stderr - 31%|███ | 6961/22434 [7:43:58<10:47:09, 2.51s/it] +2025-02-05 17:51:40 - ERROR - stderr - 31%|███ | 6962/22434 [7:44:00<10:52:08, 2.53s/it] +2025-02-05 17:51:41 - ERROR - stderr - +2025-02-05 17:51:41 - ERROR - stderr - +2025-02-05 17:51:41 - INFO - stdout - {'loss': 0.8268, 'grad_norm': 0.9286686778068542, 'learning_rate': 1.615459199428305e-05, 'epoch': 0.93} +2025-02-05 17:51:41 - ERROR - stderr - 31%|███ | 6962/22434 [7:44:00<10:52:08, 2.53s/it] +2025-02-05 17:51:43 - ERROR - stderr - 31%|███ | 6963/22434 [7:44:03<10:44:17, 2.50s/it] +2025-02-05 17:51:43 - ERROR - stderr - +2025-02-05 17:51:43 - ERROR - stderr - +2025-02-05 17:51:43 - INFO - stdout - {'loss': 1.0214, 'grad_norm': 1.083432912826538, 'learning_rate': 1.615345401417609e-05, 'epoch': 0.93} +2025-02-05 17:51:43 - ERROR - stderr - 31%|███ | 6963/22434 [7:44:03<10:44:17, 2.50s/it] +2025-02-05 17:51:45 - ERROR - stderr - 31%|███ | 6964/22434 [7:44:05<10:42:48, 2.49s/it] +2025-02-05 17:51:45 - ERROR - stderr - +2025-02-05 17:51:45 - ERROR - stderr - +2025-02-05 17:51:45 - INFO - stdout - {'loss': 0.8595, 'grad_norm': 0.9935000538825989, 'learning_rate': 1.615231590580627e-05, 'epoch': 0.93} +2025-02-05 17:51:45 - ERROR - stderr - 31%|███ | 6964/22434 [7:44:05<10:42:48, 2.49s/it] +2025-02-05 17:51:48 - ERROR - stderr - 31%|███ | 6965/22434 [7:44:08<10:44:27, 2.50s/it] 
+2025-02-05 17:51:48 - ERROR - stderr - +2025-02-05 17:51:48 - ERROR - stderr - +2025-02-05 17:51:48 - INFO - stdout - {'loss': 0.8787, 'grad_norm': 1.0275914669036865, 'learning_rate': 1.6151177669197312e-05, 'epoch': 0.93} +2025-02-05 17:51:48 - ERROR - stderr - 31%|███ | 6965/22434 [7:44:08<10:44:27, 2.50s/it] +2025-02-05 17:51:50 - ERROR - stderr - 31%|███ | 6966/22434 [7:44:10<10:49:20, 2.52s/it] +2025-02-05 17:51:51 - ERROR - stderr - +2025-02-05 17:51:51 - ERROR - stderr - +2025-02-05 17:51:51 - INFO - stdout - {'loss': 0.9003, 'grad_norm': 1.0694726705551147, 'learning_rate': 1.615003930437294e-05, 'epoch': 0.93} +2025-02-05 17:51:51 - ERROR - stderr - 31%|███ | 6966/22434 [7:44:10<10:49:20, 2.52s/it] +2025-02-05 17:51:53 - ERROR - stderr - 31%|███ | 6967/22434 [7:44:13<10:49:55, 2.52s/it] +2025-02-05 17:51:53 - ERROR - stderr - +2025-02-05 17:51:53 - ERROR - stderr - +2025-02-05 17:51:53 - INFO - stdout - {'loss': 0.934, 'grad_norm': 1.1867496967315674, 'learning_rate': 1.6148900811356886e-05, 'epoch': 0.93} +2025-02-05 17:51:53 - ERROR - stderr - 31%|███ | 6967/22434 [7:44:13<10:49:55, 2.52s/it] +2025-02-05 17:51:55 - ERROR - stderr - 31%|███ | 6968/22434 [7:44:15<10:48:33, 2.52s/it] +2025-02-05 17:51:56 - ERROR - stderr - +2025-02-05 17:51:56 - ERROR - stderr - +2025-02-05 17:51:56 - INFO - stdout - {'loss': 0.9237, 'grad_norm': 1.1108530759811401, 'learning_rate': 1.6147762190172877e-05, 'epoch': 0.93} +2025-02-05 17:51:56 - ERROR - stderr - 31%|███ | 6968/22434 [7:44:15<10:48:33, 2.52s/it] +2025-02-05 17:51:58 - ERROR - stderr - 31%|███ | 6969/22434 [7:44:18<10:49:50, 2.52s/it] +2025-02-05 17:51:58 - ERROR - stderr - +2025-02-05 17:51:58 - ERROR - stderr - +2025-02-05 17:51:58 - INFO - stdout - {'loss': 0.9495, 'grad_norm': 1.0014153718948364, 'learning_rate': 1.6146623440844645e-05, 'epoch': 0.93} +2025-02-05 17:51:58 - ERROR - stderr - 31%|███ | 6969/22434 [7:44:18<10:49:50, 2.52s/it] +2025-02-05 17:52:01 - ERROR - stderr - 31%|███ | 6970/22434 
[7:44:20<10:47:07, 2.51s/it] +2025-02-05 17:52:01 - ERROR - stderr - +2025-02-05 17:52:01 - ERROR - stderr - +2025-02-05 17:52:01 - INFO - stdout - {'loss': 0.8602, 'grad_norm': 1.0120370388031006, 'learning_rate': 1.6145484563395934e-05, 'epoch': 0.93} +2025-02-05 17:52:01 - ERROR - stderr - 31%|███ | 6970/22434 [7:44:20<10:47:07, 2.51s/it] +2025-02-05 17:52:03 - ERROR - stderr - 31%|███ | 6971/22434 [7:44:23<10:50:57, 2.53s/it] +2025-02-05 17:52:03 - ERROR - stderr - +2025-02-05 17:52:03 - ERROR - stderr - +2025-02-05 17:52:03 - INFO - stdout - {'loss': 0.9375, 'grad_norm': 1.0823014974594116, 'learning_rate': 1.6144345557850475e-05, 'epoch': 0.93} +2025-02-05 17:52:03 - ERROR - stderr - 31%|███ | 6971/22434 [7:44:23<10:50:57, 2.53s/it] +2025-02-05 17:52:06 - ERROR - stderr - 31%|███ | 6972/22434 [7:44:25<10:44:50, 2.50s/it] +2025-02-05 17:52:06 - ERROR - stderr - +2025-02-05 17:52:06 - ERROR - stderr - +2025-02-05 17:52:06 - INFO - stdout - {'loss': 0.9405, 'grad_norm': 1.0309419631958008, 'learning_rate': 1.6143206424232018e-05, 'epoch': 0.93} +2025-02-05 17:52:06 - ERROR - stderr - 31%|███ | 6972/22434 [7:44:25<10:44:50, 2.50s/it] +2025-02-05 17:52:08 - ERROR - stderr - 31%|███ | 6973/22434 [7:44:28<10:44:07, 2.50s/it] +2025-02-05 17:52:08 - ERROR - stderr - +2025-02-05 17:52:08 - ERROR - stderr - +2025-02-05 17:52:08 - INFO - stdout - {'loss': 0.8849, 'grad_norm': 1.0053772926330566, 'learning_rate': 1.6142067162564293e-05, 'epoch': 0.93} +2025-02-05 17:52:08 - ERROR - stderr - 31%|███ | 6973/22434 [7:44:28<10:44:07, 2.50s/it] +2025-02-05 17:52:11 - ERROR - stderr - 31%|███ | 6974/22434 [7:44:30<10:45:57, 2.51s/it] +2025-02-05 17:52:11 - ERROR - stderr - +2025-02-05 17:52:11 - ERROR - stderr - +2025-02-05 17:52:11 - INFO - stdout - {'loss': 0.8845, 'grad_norm': 1.0059148073196411, 'learning_rate': 1.614092777287106e-05, 'epoch': 0.93} +2025-02-05 17:52:11 - ERROR - stderr - 31%|███ | 6974/22434 [7:44:30<10:45:57, 2.51s/it] +2025-02-05 17:52:13 - ERROR - 
stderr - 31%|███ | 6975/22434 [7:44:33<10:45:26, 2.51s/it] +2025-02-05 17:52:13 - ERROR - stderr - +2025-02-05 17:52:13 - ERROR - stderr - +2025-02-05 17:52:13 - INFO - stdout - {'loss': 1.0046, 'grad_norm': 1.1131207942962646, 'learning_rate': 1.6139788255176063e-05, 'epoch': 0.93} +2025-02-05 17:52:13 - ERROR - stderr - 31%|███ | 6975/22434 [7:44:33<10:45:26, 2.51s/it] +2025-02-05 17:52:16 - ERROR - stderr - 31%|███ | 6976/22434 [7:44:35<10:43:14, 2.50s/it] +2025-02-05 17:52:16 - ERROR - stderr - +2025-02-05 17:52:16 - ERROR - stderr - +2025-02-05 17:52:16 - INFO - stdout - {'loss': 0.9255, 'grad_norm': 1.1017849445343018, 'learning_rate': 1.6138648609503055e-05, 'epoch': 0.93} +2025-02-05 17:52:16 - ERROR - stderr - 31%|███ | 6976/22434 [7:44:35<10:43:14, 2.50s/it] +2025-02-05 17:52:18 - ERROR - stderr - 31%|███ | 6977/22434 [7:44:38<10:48:01, 2.52s/it] +2025-02-05 17:52:18 - ERROR - stderr - +2025-02-05 17:52:18 - ERROR - stderr - +2025-02-05 17:52:18 - INFO - stdout - {'loss': 1.0285, 'grad_norm': 1.1533608436584473, 'learning_rate': 1.613750883587579e-05, 'epoch': 0.93} +2025-02-05 17:52:18 - ERROR - stderr - 31%|███ | 6977/22434 [7:44:38<10:48:01, 2.52s/it] +2025-02-05 17:52:21 - ERROR - stderr - 31%|███ | 6978/22434 [7:44:40<10:44:40, 2.50s/it] +2025-02-05 17:52:21 - ERROR - stderr - +2025-02-05 17:52:21 - ERROR - stderr - +2025-02-05 17:52:21 - INFO - stdout - {'loss': 0.9821, 'grad_norm': 1.0690585374832153, 'learning_rate': 1.6136368934318028e-05, 'epoch': 0.93} +2025-02-05 17:52:21 - ERROR - stderr - 31%|███ | 6978/22434 [7:44:40<10:44:40, 2.50s/it] +2025-02-05 17:52:23 - ERROR - stderr - 31%|███ | 6979/22434 [7:44:43<10:45:56, 2.51s/it] +2025-02-05 17:52:23 - ERROR - stderr - +2025-02-05 17:52:23 - ERROR - stderr - +2025-02-05 17:52:23 - INFO - stdout - {'loss': 0.8647, 'grad_norm': 1.077472448348999, 'learning_rate': 1.6135228904853525e-05, 'epoch': 0.93} +2025-02-05 17:52:23 - ERROR - stderr - 31%|███ | 6979/22434 [7:44:43<10:45:56, 2.51s/it]
+2025-02-05 17:52:26 - ERROR - stderr - 31%|███ | 6980/22434 [7:44:45<10:44:45, 2.50s/it] +2025-02-05 17:52:26 - ERROR - stderr - +2025-02-05 17:52:26 - ERROR - stderr - +2025-02-05 17:52:26 - INFO - stdout - {'loss': 0.9894, 'grad_norm': 1.1467127799987793, 'learning_rate': 1.6134088747506046e-05, 'epoch': 0.93} +2025-02-05 17:52:26 - ERROR - stderr - 31%|███ | 6980/22434 [7:44:45<10:44:45, 2.50s/it] +2025-02-05 17:52:28 - ERROR - stderr - 31%|███ | 6981/22434 [7:44:48<10:40:45, 2.49s/it] +2025-02-05 17:52:28 - ERROR - stderr - +2025-02-05 17:52:28 - ERROR - stderr - +2025-02-05 17:52:28 - INFO - stdout - {'loss': 0.9001, 'grad_norm': 1.0875422954559326, 'learning_rate': 1.6132948462299362e-05, 'epoch': 0.93} +2025-02-05 17:52:28 - ERROR - stderr - 31%|███ | 6981/22434 [7:44:48<10:40:45, 2.49s/it] +2025-02-05 17:52:31 - ERROR - stderr - 31%|███ | 6982/22434 [7:44:50<10:52:11, 2.53s/it] +2025-02-05 17:52:31 - ERROR - stderr - +2025-02-05 17:52:31 - ERROR - stderr - +2025-02-05 17:52:31 - INFO - stdout - {'loss': 1.0571, 'grad_norm': 1.076904296875, 'learning_rate': 1.6131808049257228e-05, 'epoch': 0.93} +2025-02-05 17:52:31 - ERROR - stderr - 31%|███ | 6982/22434 [7:44:50<10:52:11, 2.53s/it] +2025-02-05 17:52:33 - ERROR - stderr - 31%|███ | 6983/22434 [7:44:53<10:46:46, 2.51s/it] +2025-02-05 17:52:33 - ERROR - stderr - +2025-02-05 17:52:33 - ERROR - stderr - +2025-02-05 17:52:33 - INFO - stdout - {'loss': 0.8687, 'grad_norm': 0.9852768778800964, 'learning_rate': 1.613066750840343e-05, 'epoch': 0.93} +2025-02-05 17:52:33 - ERROR - stderr - 31%|███ | 6983/22434 [7:44:53<10:46:46, 2.51s/it] +2025-02-05 17:52:36 - ERROR - stderr - 31%|███ | 6984/22434 [7:44:55<10:40:04, 2.49s/it] +2025-02-05 17:52:36 - ERROR - stderr - +2025-02-05 17:52:36 - ERROR - stderr - +2025-02-05 17:52:36 - INFO - stdout - {'loss': 0.9328, 'grad_norm': 1.1950201988220215, 'learning_rate': 1.612952683976173e-05, 'epoch': 0.93} +2025-02-05 17:52:36 - ERROR - stderr - 31%|███ | 6984/22434 
[7:44:55<10:40:04, 2.49s/it] +2025-02-05 17:52:38 - ERROR - stderr - 31%|███ | 6985/22434 [7:44:58<10:43:23, 2.50s/it] +2025-02-05 17:52:38 - ERROR - stderr - +2025-02-05 17:52:38 - ERROR - stderr - +2025-02-05 17:52:38 - INFO - stdout - {'loss': 0.8697, 'grad_norm': 0.9338102340698242, 'learning_rate': 1.612838604335591e-05, 'epoch': 0.93} +2025-02-05 17:52:38 - ERROR - stderr - 31%|███ | 6985/22434 [7:44:58<10:43:23, 2.50s/it] +2025-02-05 17:52:41 - ERROR - stderr - 31%|███ | 6986/22434 [7:45:00<10:40:37, 2.49s/it] +2025-02-05 17:52:41 - ERROR - stderr - +2025-02-05 17:52:41 - ERROR - stderr - +2025-02-05 17:52:41 - INFO - stdout - {'loss': 0.8812, 'grad_norm': 1.0609676837921143, 'learning_rate': 1.6127245119209747e-05, 'epoch': 0.93} +2025-02-05 17:52:41 - ERROR - stderr - 31%|███ | 6986/22434 [7:45:00<10:40:37, 2.49s/it] +2025-02-05 17:52:43 - ERROR - stderr - 31%|███ | 6987/22434 [7:45:03<10:40:41, 2.49s/it] +2025-02-05 17:52:43 - ERROR - stderr - +2025-02-05 17:52:43 - ERROR - stderr - +2025-02-05 17:52:43 - INFO - stdout - {'loss': 0.889, 'grad_norm': 1.0481910705566406, 'learning_rate': 1.6126104067347023e-05, 'epoch': 0.93} +2025-02-05 17:52:43 - ERROR - stderr - 31%|███ | 6987/22434 [7:45:03<10:40:41, 2.49s/it] +2025-02-05 17:52:45 - ERROR - stderr - 31%|███ | 6988/22434 [7:45:05<10:36:27, 2.47s/it] +2025-02-05 17:52:46 - ERROR - stderr - +2025-02-05 17:52:46 - ERROR - stderr - +2025-02-05 17:52:46 - INFO - stdout - {'loss': 1.1037, 'grad_norm': 1.1022799015045166, 'learning_rate': 1.612496288779152e-05, 'epoch': 0.93} +2025-02-05 17:52:46 - ERROR - stderr - 31%|███ | 6988/22434 [7:45:05<10:36:27, 2.47s/it] +2025-02-05 17:52:48 - ERROR - stderr - 31%|███ | 6989/22434 [7:45:08<10:46:04, 2.51s/it] +2025-02-05 17:52:48 - ERROR - stderr - +2025-02-05 17:52:48 - ERROR - stderr - +2025-02-05 17:52:48 - INFO - stdout - {'loss': 0.9815, 'grad_norm': 1.087249994277954, 'learning_rate': 1.6123821580567028e-05, 'epoch': 0.93} +2025-02-05 17:52:48 - ERROR - stderr - 
31%|███ | 6989/22434 [7:45:08<10:46:04, 2.51s/it] +2025-02-05 17:52:50 - ERROR - stderr - 31%|███ | 6990/22434 [7:45:10<10:41:19, 2.49s/it] +2025-02-05 17:52:51 - ERROR - stderr - +2025-02-05 17:52:51 - ERROR - stderr - +2025-02-05 17:52:51 - INFO - stdout - {'loss': 0.861, 'grad_norm': 0.9721426963806152, 'learning_rate': 1.6122680145697334e-05, 'epoch': 0.93} +2025-02-05 17:52:51 - ERROR - stderr - 31%|███ | 6990/22434 [7:45:10<10:41:19, 2.49s/it] +2025-02-05 17:52:53 - ERROR - stderr - 31%|███ | 6991/22434 [7:45:13<10:44:37, 2.50s/it] +2025-02-05 17:52:53 - ERROR - stderr - +2025-02-05 17:52:53 - ERROR - stderr - +2025-02-05 17:52:53 - INFO - stdout - {'loss': 0.8518, 'grad_norm': 0.9519912600517273, 'learning_rate': 1.6121538583206232e-05, 'epoch': 0.93} +2025-02-05 17:52:53 - ERROR - stderr - 31%|███ | 6991/22434 [7:45:13<10:44:37, 2.50s/it] +2025-02-05 17:52:56 - ERROR - stderr - 31%|███ | 6992/22434 [7:45:15<10:46:57, 2.51s/it] +2025-02-05 17:52:56 - ERROR - stderr - +2025-02-05 17:52:56 - ERROR - stderr - +2025-02-05 17:52:56 - INFO - stdout - {'loss': 0.85, 'grad_norm': 0.9744101166725159, 'learning_rate': 1.6120396893117518e-05, 'epoch': 0.94} +2025-02-05 17:52:56 - ERROR - stderr - 31%|███ | 6992/22434 [7:45:15<10:46:57, 2.51s/it] +2025-02-05 17:52:58 - ERROR - stderr - 31%|███ | 6993/22434 [7:45:18<10:51:32, 2.53s/it] +2025-02-05 17:52:58 - ERROR - stderr - +2025-02-05 17:52:58 - ERROR - stderr - +2025-02-05 17:52:58 - INFO - stdout - {'loss': 0.8196, 'grad_norm': 0.9318773746490479, 'learning_rate': 1.6119255075454986e-05, 'epoch': 0.94} +2025-02-05 17:52:58 - ERROR - stderr - 31%|███ | 6993/22434 [7:45:18<10:51:32, 2.53s/it] +2025-02-05 17:53:01 - ERROR - stderr - 31%|███ | 6994/22434 [7:45:20<10:47:10, 2.51s/it] +2025-02-05 17:53:01 - ERROR - stderr - +2025-02-05 17:53:01 - ERROR - stderr - +2025-02-05 17:53:01 - INFO - stdout - {'loss': 0.94, 'grad_norm': 1.2241122722625732, 'learning_rate': 1.6118113130242435e-05, 'epoch': 0.94} +2025-02-05 
17:53:01 - ERROR - stderr - 31%|███ | 6994/22434 [7:45:20<10:47:10, 2.51s/it] +2025-02-05 17:53:03 - ERROR - stderr - 31%|███ | 6995/22434 [7:45:23<10:52:43, 2.54s/it] +2025-02-05 17:53:03 - ERROR - stderr - +2025-02-05 17:53:03 - ERROR - stderr - +2025-02-05 17:53:03 - INFO - stdout - {'loss': 0.9284, 'grad_norm': 0.9897013902664185, 'learning_rate': 1.6116971057503673e-05, 'epoch': 0.94} +2025-02-05 17:53:03 - ERROR - stderr - 31%|███ | 6995/22434 [7:45:23<10:52:43, 2.54s/it] +2025-02-05 17:53:06 - ERROR - stderr - 31%|███ | 6996/22434 [7:45:25<10:48:55, 2.52s/it] +2025-02-05 17:53:06 - ERROR - stderr - +2025-02-05 17:53:06 - ERROR - stderr - +2025-02-05 17:53:06 - INFO - stdout - {'loss': 0.9902, 'grad_norm': 1.1496587991714478, 'learning_rate': 1.6115828857262502e-05, 'epoch': 0.94} +2025-02-05 17:53:06 - ERROR - stderr - 31%|███ | 6996/22434 [7:45:26<10:48:55, 2.52s/it] +2025-02-05 17:53:08 - ERROR - stderr - 31%|███ | 6997/22434 [7:45:28<10:44:18, 2.50s/it] +2025-02-05 17:53:08 - ERROR - stderr - +2025-02-05 17:53:08 - ERROR - stderr - +2025-02-05 17:53:08 - INFO - stdout - {'loss': 0.9193, 'grad_norm': 1.0751930475234985, 'learning_rate': 1.611468652954273e-05, 'epoch': 0.94} +2025-02-05 17:53:08 - ERROR - stderr - 31%|███ | 6997/22434 [7:45:28<10:44:18, 2.50s/it] +2025-02-05 17:53:11 - ERROR - stderr - 31%|███ | 6998/22434 [7:45:30<10:45:15, 2.51s/it] +2025-02-05 17:53:11 - ERROR - stderr - +2025-02-05 17:53:11 - ERROR - stderr - +2025-02-05 17:53:11 - INFO - stdout - {'loss': 1.0087, 'grad_norm': 1.0770121812820435, 'learning_rate': 1.6113544074368166e-05, 'epoch': 0.94} +2025-02-05 17:53:11 - ERROR - stderr - 31%|███ | 6998/22434 [7:45:30<10:45:15, 2.51s/it] +2025-02-05 17:53:13 - ERROR - stderr - 31%|███ | 6999/22434 [7:45:33<10:43:04, 2.50s/it] +2025-02-05 17:53:13 - ERROR - stderr - +2025-02-05 17:53:13 - ERROR - stderr - +2025-02-05 17:53:13 - INFO - stdout - {'loss': 0.9297, 'grad_norm': 1.0164508819580078, 'learning_rate': 1.611240149176263e-05, 
'epoch': 0.94} +2025-02-05 17:53:13 - ERROR - stderr - 31%|███ | 6999/22434 [7:45:33<10:43:04, 2.50s/it] +2025-02-05 17:53:16 - ERROR - stderr - 31%|███ | 7000/22434 [7:45:35<10:43:00, 2.50s/it] +2025-02-05 17:53:16 - ERROR - stderr - +2025-02-05 17:53:16 - ERROR - stderr - +2025-02-05 17:53:16 - INFO - stdout - {'loss': 0.9097, 'grad_norm': 1.1183191537857056, 'learning_rate': 1.6111258781749934e-05, 'epoch': 0.94} +2025-02-05 17:53:16 - ERROR - stderr - 31%|███ | 7000/22434 [7:45:35<10:43:00, 2.50s/it] +2025-02-05 17:53:18 - ERROR - stderr - 31%|███ | 7001/22434 [7:45:38<10:36:21, 2.47s/it] +2025-02-05 17:53:18 - ERROR - stderr - +2025-02-05 17:53:18 - ERROR - stderr - +2025-02-05 17:53:18 - INFO - stdout - {'loss': 0.894, 'grad_norm': 1.0159372091293335, 'learning_rate': 1.611011594435389e-05, 'epoch': 0.94} +2025-02-05 17:53:18 - ERROR - stderr - 31%|███ | 7001/22434 [7:45:38<10:36:21, 2.47s/it] +2025-02-05 17:53:21 - ERROR - stderr - 31%|███ | 7002/22434 [7:45:40<10:36:13, 2.47s/it] +2025-02-05 17:53:21 - ERROR - stderr - +2025-02-05 17:53:21 - ERROR - stderr - +2025-02-05 17:53:21 - INFO - stdout - {'loss': 0.9422, 'grad_norm': 1.1424487829208374, 'learning_rate': 1.610897297959833e-05, 'epoch': 0.94} +2025-02-05 17:53:21 - ERROR - stderr - 31%|███ | 7002/22434 [7:45:40<10:36:13, 2.47s/it] +2025-02-05 17:53:23 - ERROR - stderr - 31%|███ | 7003/22434 [7:45:43<10:36:28, 2.47s/it] +2025-02-05 17:53:23 - ERROR - stderr - +2025-02-05 17:53:23 - ERROR - stderr - +2025-02-05 17:53:23 - INFO - stdout - {'loss': 0.8775, 'grad_norm': 1.208791732788086, 'learning_rate': 1.6107829887507076e-05, 'epoch': 0.94} +2025-02-05 17:53:23 - ERROR - stderr - 31%|███ | 7003/22434 [7:45:43<10:36:28, 2.47s/it] +2025-02-05 17:53:25 - ERROR - stderr - 31%|███ | 7004/22434 [7:45:45<10:36:54, 2.48s/it] +2025-02-05 17:53:26 - ERROR - stderr - +2025-02-05 17:53:26 - ERROR - stderr - +2025-02-05 17:53:26 - INFO - stdout - {'loss': 1.0455, 'grad_norm': 1.1572887897491455, 'learning_rate': 
1.610668666810395e-05, 'epoch': 0.94} +2025-02-05 17:53:26 - ERROR - stderr - 31%|███ | 7004/22434 [7:45:45<10:36:54, 2.48s/it] +2025-02-05 17:53:28 - ERROR - stderr - 31%|███ | 7005/22434 [7:45:48<10:32:46, 2.46s/it] +2025-02-05 17:53:28 - ERROR - stderr - +2025-02-05 17:53:28 - ERROR - stderr - +2025-02-05 17:53:28 - INFO - stdout - {'loss': 1.041, 'grad_norm': 1.0640041828155518, 'learning_rate': 1.6105543321412786e-05, 'epoch': 0.94} +2025-02-05 17:53:28 - ERROR - stderr - 31%|███ | 7005/22434 [7:45:48<10:32:46, 2.46s/it] +2025-02-05 17:53:31 - ERROR - stderr - 31%|███ | 7006/22434 [7:45:50<10:58:57, 2.56s/it] +2025-02-05 17:53:31 - ERROR - stderr - +2025-02-05 17:53:31 - ERROR - stderr - +2025-02-05 17:53:31 - INFO - stdout - {'loss': 0.9387, 'grad_norm': 1.091064214706421, 'learning_rate': 1.610439984745741e-05, 'epoch': 0.94} +2025-02-05 17:53:31 - ERROR - stderr - 31%|███ | 7006/22434 [7:45:51<10:58:57, 2.56s/it] +2025-02-05 17:53:33 - ERROR - stderr - 31%|███ | 7007/22434 [7:45:53<10:57:38, 2.56s/it] +2025-02-05 17:53:33 - ERROR - stderr - +2025-02-05 17:53:33 - ERROR - stderr - +2025-02-05 17:53:33 - INFO - stdout - {'loss': 0.9314, 'grad_norm': 1.1536128520965576, 'learning_rate': 1.6103256246261665e-05, 'epoch': 0.94} +2025-02-05 17:53:33 - ERROR - stderr - 31%|███ | 7007/22434 [7:45:53<10:57:38, 2.56s/it] +2025-02-05 17:53:36 - ERROR - stderr - 31%|███ | 7008/22434 [7:45:55<10:50:04, 2.53s/it] +2025-02-05 17:53:36 - ERROR - stderr - +2025-02-05 17:53:36 - ERROR - stderr - +2025-02-05 17:53:36 - INFO - stdout - {'loss': 0.9944, 'grad_norm': 1.0467454195022583, 'learning_rate': 1.6102112517849383e-05, 'epoch': 0.94} +2025-02-05 17:53:36 - ERROR - stderr - 31%|███ | 7008/22434 [7:45:56<10:50:04, 2.53s/it] +2025-02-05 17:53:38 - ERROR - stderr - 31%|███ | 7009/22434 [7:45:58<10:46:04, 2.51s/it] +2025-02-05 17:53:38 - ERROR - stderr - +2025-02-05 17:53:38 - ERROR - stderr - +2025-02-05 17:53:38 - INFO - stdout - {'loss': 0.9626, 'grad_norm': 
1.4069218635559082, 'learning_rate': 1.6100968662244402e-05, 'epoch': 0.94} +2025-02-05 17:53:38 - ERROR - stderr - 31%|███ | 7009/22434 [7:45:58<10:46:04, 2.51s/it] +2025-02-05 17:53:41 - ERROR - stderr - 31%|███ | 7010/22434 [7:46:00<10:44:19, 2.51s/it] +2025-02-05 17:53:41 - ERROR - stderr - +2025-02-05 17:53:41 - ERROR - stderr - +2025-02-05 17:53:41 - INFO - stdout - {'loss': 0.9785, 'grad_norm': 0.951005220413208, 'learning_rate': 1.609982467947057e-05, 'epoch': 0.94} +2025-02-05 17:53:41 - ERROR - stderr - 31%|███ | 7010/22434 [7:46:01<10:44:19, 2.51s/it] +2025-02-05 17:53:43 - ERROR - stderr - 31%|███▏ | 7011/22434 [7:46:03<10:41:34, 2.50s/it] +2025-02-05 17:53:43 - ERROR - stderr - +2025-02-05 17:53:43 - ERROR - stderr - +2025-02-05 17:53:43 - INFO - stdout - {'loss': 0.849, 'grad_norm': 0.9597281813621521, 'learning_rate': 1.6098680569551727e-05, 'epoch': 0.94} +2025-02-05 17:53:43 - ERROR - stderr - 31%|███▏ | 7011/22434 [7:46:03<10:41:34, 2.50s/it] +2025-02-05 17:53:46 - ERROR - stderr - 31%|███▏ | 7012/22434 [7:46:05<10:38:54, 2.49s/it] +2025-02-05 17:53:46 - ERROR - stderr - +2025-02-05 17:53:46 - ERROR - stderr - +2025-02-05 17:53:46 - INFO - stdout - {'loss': 0.8753, 'grad_norm': 1.0189837217330933, 'learning_rate': 1.6097536332511726e-05, 'epoch': 0.94} +2025-02-05 17:53:46 - ERROR - stderr - 31%|███▏ | 7012/22434 [7:46:05<10:38:54, 2.49s/it] +2025-02-05 17:53:48 - ERROR - stderr - 31%|███▏ | 7013/22434 [7:46:08<10:37:55, 2.48s/it] +2025-02-05 17:53:48 - ERROR - stderr - +2025-02-05 17:53:48 - ERROR - stderr - +2025-02-05 17:53:48 - INFO - stdout - {'loss': 0.9114, 'grad_norm': 1.0029723644256592, 'learning_rate': 1.609639196837441e-05, 'epoch': 0.94} +2025-02-05 17:53:48 - ERROR - stderr - 31%|███▏ | 7013/22434 [7:46:08<10:37:55, 2.48s/it] +2025-02-05 17:53:51 - ERROR - stderr - 31%|███▏ | 7014/22434 [7:46:10<10:38:20, 2.48s/it] +2025-02-05 17:53:51 - ERROR - stderr - +2025-02-05 17:53:51 - ERROR - stderr - +2025-02-05 17:53:51 - INFO - stdout - 
{'loss': 0.9898, 'grad_norm': 1.0561949014663696, 'learning_rate': 1.6095247477163644e-05, 'epoch': 0.94} +2025-02-05 17:53:51 - ERROR - stderr - 31%|███▏ | 7014/22434 [7:46:10<10:38:20, 2.48s/it] +2025-02-05 17:53:53 - ERROR - stderr - 31%|███▏ | 7015/22434 [7:46:13<10:39:10, 2.49s/it] +2025-02-05 17:53:53 - ERROR - stderr - +2025-02-05 17:53:53 - ERROR - stderr - +2025-02-05 17:53:53 - INFO - stdout - {'loss': 0.9567, 'grad_norm': 1.015450119972229, 'learning_rate': 1.6094102858903275e-05, 'epoch': 0.94} +2025-02-05 17:53:53 - ERROR - stderr - 31%|███▏ | 7015/22434 [7:46:13<10:39:10, 2.49s/it] +2025-02-05 17:53:56 - ERROR - stderr - 31%|███▏ | 7016/22434 [7:46:16<11:05:57, 2.59s/it] +2025-02-05 17:53:56 - ERROR - stderr - +2025-02-05 17:53:56 - ERROR - stderr - +2025-02-05 17:53:56 - INFO - stdout - {'loss': 0.8879, 'grad_norm': 1.1168749332427979, 'learning_rate': 1.609295811361716e-05, 'epoch': 0.94} +2025-02-05 17:53:56 - ERROR - stderr - 31%|███▏ | 7016/22434 [7:46:16<11:05:57, 2.59s/it] +2025-02-05 17:53:58 - ERROR - stderr - 31%|███▏ | 7017/22434 [7:46:18<10:59:49, 2.57s/it] +2025-02-05 17:53:58 - ERROR - stderr - +2025-02-05 17:53:58 - ERROR - stderr - +2025-02-05 17:53:58 - INFO - stdout - {'loss': 1.0041, 'grad_norm': 1.0824220180511475, 'learning_rate': 1.6091813241329163e-05, 'epoch': 0.94} +2025-02-05 17:53:58 - ERROR - stderr - 31%|███▏ | 7017/22434 [7:46:18<10:59:49, 2.57s/it] +2025-02-05 17:54:01 - ERROR - stderr - 31%|███▏ | 7018/22434 [7:46:21<10:54:56, 2.55s/it] +2025-02-05 17:54:01 - ERROR - stderr - +2025-02-05 17:54:01 - ERROR - stderr - +2025-02-05 17:54:01 - INFO - stdout - {'loss': 0.8724, 'grad_norm': 1.029380202293396, 'learning_rate': 1.6090668242063152e-05, 'epoch': 0.94} +2025-02-05 17:54:01 - ERROR - stderr - 31%|███▏ | 7018/22434 [7:46:21<10:54:56, 2.55s/it] +2025-02-05 17:54:03 - ERROR - stderr - 31%|███▏ | 7019/22434 [7:46:23<10:54:49, 2.55s/it] +2025-02-05 17:54:04 - ERROR - stderr - +2025-02-05 17:54:04 - ERROR - stderr - 
+2025-02-05 17:54:04 - INFO - stdout - {'loss': 0.9717, 'grad_norm': 1.1161150932312012, 'learning_rate': 1.608952311584299e-05, 'epoch': 0.94} +2025-02-05 17:54:04 - ERROR - stderr - 31%|███▏ | 7019/22434 [7:46:23<10:54:49, 2.55s/it] +2025-02-05 17:54:06 - ERROR - stderr - 31%|███▏ | 7020/22434 [7:46:26<10:51:13, 2.53s/it] +2025-02-05 17:54:06 - ERROR - stderr - +2025-02-05 17:54:06 - ERROR - stderr - +2025-02-05 17:54:06 - INFO - stdout - {'loss': 1.0158, 'grad_norm': 1.097906231880188, 'learning_rate': 1.608837786269254e-05, 'epoch': 0.94} +2025-02-05 17:54:06 - ERROR - stderr - 31%|███▏ | 7020/22434 [7:46:26<10:51:13, 2.53s/it] +2025-02-05 17:54:08 - ERROR - stderr - 31%|███▏ | 7021/22434 [7:46:28<10:42:50, 2.50s/it] +2025-02-05 17:54:08 - ERROR - stderr - +2025-02-05 17:54:08 - ERROR - stderr - +2025-02-05 17:54:08 - INFO - stdout - {'loss': 0.824, 'grad_norm': 1.023992657661438, 'learning_rate': 1.6087232482635685e-05, 'epoch': 0.94} +2025-02-05 17:54:08 - ERROR - stderr - 31%|███▏ | 7021/22434 [7:46:28<10:42:50, 2.50s/it] +2025-02-05 17:54:11 - ERROR - stderr - 31%|███▏ | 7022/22434 [7:46:31<10:46:08, 2.52s/it] +2025-02-05 17:54:11 - ERROR - stderr - +2025-02-05 17:54:11 - ERROR - stderr - +2025-02-05 17:54:11 - INFO - stdout - {'loss': 0.8724, 'grad_norm': 1.0950909852981567, 'learning_rate': 1.608608697569629e-05, 'epoch': 0.94} +2025-02-05 17:54:11 - ERROR - stderr - 31%|███▏ | 7022/22434 [7:46:31<10:46:08, 2.52s/it] +2025-02-05 17:54:13 - ERROR - stderr - 31%|███▏ | 7023/22434 [7:46:33<10:43:04, 2.50s/it] +2025-02-05 17:54:13 - ERROR - stderr - +2025-02-05 17:54:13 - ERROR - stderr - +2025-02-05 17:54:13 - INFO - stdout - {'loss': 0.9444, 'grad_norm': 1.0345914363861084, 'learning_rate': 1.608494134189824e-05, 'epoch': 0.94} +2025-02-05 17:54:13 - ERROR - stderr - 31%|███▏ | 7023/22434 [7:46:33<10:43:04, 2.50s/it] +2025-02-05 17:54:16 - ERROR - stderr - 31%|███▏ | 7024/22434 [7:46:36<10:45:41, 2.51s/it] +2025-02-05 17:54:16 - ERROR - stderr - +2025-02-05 
17:54:16 - ERROR - stderr - +2025-02-05 17:54:16 - INFO - stdout - {'loss': 0.9527, 'grad_norm': 1.0029183626174927, 'learning_rate': 1.6083795581265406e-05, 'epoch': 0.94} +2025-02-05 17:54:16 - ERROR - stderr - 31%|███▏ | 7024/22434 [7:46:36<10:45:41, 2.51s/it] +2025-02-05 17:54:18 - ERROR - stderr - 31%|███▏ | 7025/22434 [7:46:38<10:43:58, 2.51s/it] +2025-02-05 17:54:19 - ERROR - stderr - +2025-02-05 17:54:19 - ERROR - stderr - +2025-02-05 17:54:19 - INFO - stdout - {'loss': 0.9311, 'grad_norm': 1.0698575973510742, 'learning_rate': 1.6082649693821677e-05, 'epoch': 0.94} +2025-02-05 17:54:19 - ERROR - stderr - 31%|███▏ | 7025/22434 [7:46:38<10:43:58, 2.51s/it] +2025-02-05 17:54:22 - ERROR - stderr - 31%|███▏ | 7026/22434 [7:46:42<11:42:24, 2.74s/it] +2025-02-05 17:54:22 - ERROR - stderr - +2025-02-05 17:54:22 - ERROR - stderr - +2025-02-05 17:54:22 - INFO - stdout - {'loss': 0.8298, 'grad_norm': 0.9933726787567139, 'learning_rate': 1.6081503679590932e-05, 'epoch': 0.94} +2025-02-05 17:54:22 - ERROR - stderr - 31%|███▏ | 7026/22434 [7:46:42<11:42:24, 2.74s/it] +2025-02-05 17:54:24 - ERROR - stderr - 31%|███▏ | 7027/22434 [7:46:44<11:25:41, 2.67s/it] +2025-02-05 17:54:24 - ERROR - stderr - +2025-02-05 17:54:24 - ERROR - stderr - +2025-02-05 17:54:24 - INFO - stdout - {'loss': 0.9439, 'grad_norm': 1.07218599319458, 'learning_rate': 1.608035753859707e-05, 'epoch': 0.94} +2025-02-05 17:54:24 - ERROR - stderr - 31%|███▏ | 7027/22434 [7:46:44<11:25:41, 2.67s/it] +2025-02-05 17:54:27 - ERROR - stderr - 31%|███▏ | 7028/22434 [7:46:47<11:10:41, 2.61s/it] +2025-02-05 17:54:27 - ERROR - stderr - +2025-02-05 17:54:27 - ERROR - stderr - +2025-02-05 17:54:27 - INFO - stdout - {'loss': 0.91, 'grad_norm': 1.0953887701034546, 'learning_rate': 1.6079211270863966e-05, 'epoch': 0.94} +2025-02-05 17:54:27 - ERROR - stderr - 31%|███▏ | 7028/22434 [7:46:47<11:10:41, 2.61s/it] +2025-02-05 17:54:29 - ERROR - stderr - 31%|███▏ | 7029/22434 [7:46:49<11:07:29, 2.60s/it] +2025-02-05 17:54:29 
- ERROR - stderr - +2025-02-05 17:54:29 - ERROR - stderr - +2025-02-05 17:54:29 - INFO - stdout - {'loss': 0.9468, 'grad_norm': 1.1669597625732422, 'learning_rate': 1.6078064876415523e-05, 'epoch': 0.94} +2025-02-05 17:54:29 - ERROR - stderr - 31%|███▏ | 7029/22434 [7:46:49<11:07:29, 2.60s/it] +2025-02-05 17:54:32 - ERROR - stderr - 31%|███▏ | 7030/22434 [7:46:52<11:14:33, 2.63s/it] +2025-02-05 17:54:32 - ERROR - stderr - +2025-02-05 17:54:32 - ERROR - stderr - +2025-02-05 17:54:32 - INFO - stdout - {'loss': 0.9716, 'grad_norm': 1.11777663230896, 'learning_rate': 1.607691835527563e-05, 'epoch': 0.94} +2025-02-05 17:54:32 - ERROR - stderr - 31%|███▏ | 7030/22434 [7:46:52<11:14:33, 2.63s/it] +2025-02-05 17:54:35 - ERROR - stderr - 31%|███▏ | 7031/22434 [7:46:54<11:08:29, 2.60s/it] +2025-02-05 17:54:35 - ERROR - stderr - +2025-02-05 17:54:35 - ERROR - stderr - +2025-02-05 17:54:35 - INFO - stdout - {'loss': 0.8458, 'grad_norm': 1.0105535984039307, 'learning_rate': 1.6075771707468196e-05, 'epoch': 0.94} +2025-02-05 17:54:35 - ERROR - stderr - 31%|███▏ | 7031/22434 [7:46:54<11:08:29, 2.60s/it] +2025-02-05 17:54:37 - ERROR - stderr - 31%|███▏ | 7032/22434 [7:46:57<11:01:28, 2.58s/it] +2025-02-05 17:54:37 - ERROR - stderr - +2025-02-05 17:54:37 - ERROR - stderr - +2025-02-05 17:54:37 - INFO - stdout - {'loss': 0.9395, 'grad_norm': 1.125118374824524, 'learning_rate': 1.607462493301711e-05, 'epoch': 0.94} +2025-02-05 17:54:37 - ERROR - stderr - 31%|███▏ | 7032/22434 [7:46:57<11:01:28, 2.58s/it] +2025-02-05 17:54:40 - ERROR - stderr - 31%|███▏ | 7033/22434 [7:46:59<10:58:22, 2.56s/it] +2025-02-05 17:54:40 - ERROR - stderr - +2025-02-05 17:54:40 - ERROR - stderr - +2025-02-05 17:54:40 - INFO - stdout - {'loss': 0.787, 'grad_norm': 0.9611837863922119, 'learning_rate': 1.6073478031946282e-05, 'epoch': 0.94} +2025-02-05 17:54:40 - ERROR - stderr - 31%|███▏ | 7033/22434 [7:46:59<10:58:22, 2.56s/it] +2025-02-05 17:54:42 - ERROR - stderr - 31%|███▏ | 7034/22434 [7:47:02<10:54:06, 
2.55s/it] +2025-02-05 17:54:42 - ERROR - stderr - +2025-02-05 17:54:42 - ERROR - stderr - +2025-02-05 17:54:42 - INFO - stdout - {'loss': 0.9639, 'grad_norm': 0.9582514762878418, 'learning_rate': 1.6072331004279617e-05, 'epoch': 0.94} +2025-02-05 17:54:42 - ERROR - stderr - 31%|███▏ | 7034/22434 [7:47:02<10:54:06, 2.55s/it] +2025-02-05 17:54:45 - ERROR - stderr - 31%|███▏ | 7035/22434 [7:47:04<10:48:58, 2.53s/it] +2025-02-05 17:54:45 - ERROR - stderr - +2025-02-05 17:54:45 - ERROR - stderr - +2025-02-05 17:54:45 - INFO - stdout - {'loss': 0.8635, 'grad_norm': 1.057563066482544, 'learning_rate': 1.6071183850041022e-05, 'epoch': 0.94} +2025-02-05 17:54:45 - ERROR - stderr - 31%|███▏ | 7035/22434 [7:47:04<10:48:58, 2.53s/it] +2025-02-05 17:54:47 - ERROR - stderr - 31%|███▏ | 7036/22434 [7:47:07<10:47:22, 2.52s/it] +2025-02-05 17:54:47 - ERROR - stderr - +2025-02-05 17:54:47 - ERROR - stderr - +2025-02-05 17:54:47 - INFO - stdout - {'loss': 1.0785, 'grad_norm': 1.1197491884231567, 'learning_rate': 1.6070036569254407e-05, 'epoch': 0.94} +2025-02-05 17:54:47 - ERROR - stderr - 31%|███▏ | 7036/22434 [7:47:07<10:47:22, 2.52s/it] +2025-02-05 17:54:50 - ERROR - stderr - 31%|███▏ | 7037/22434 [7:47:10<10:59:38, 2.57s/it] +2025-02-05 17:54:50 - ERROR - stderr - +2025-02-05 17:54:50 - ERROR - stderr - +2025-02-05 17:54:50 - INFO - stdout - {'loss': 0.884, 'grad_norm': 1.102464199066162, 'learning_rate': 1.606888916194369e-05, 'epoch': 0.94} +2025-02-05 17:54:50 - ERROR - stderr - 31%|███▏ | 7037/22434 [7:47:10<10:59:38, 2.57s/it] +2025-02-05 17:54:52 - ERROR - stderr - 31%|███▏ | 7038/22434 [7:47:12<10:53:25, 2.55s/it] +2025-02-05 17:54:52 - ERROR - stderr - +2025-02-05 17:54:52 - ERROR - stderr - +2025-02-05 17:54:52 - INFO - stdout - {'loss': 0.9721, 'grad_norm': 1.087418794631958, 'learning_rate': 1.6067741628132784e-05, 'epoch': 0.94} +2025-02-05 17:54:52 - ERROR - stderr - 31%|███▏ | 7038/22434 [7:47:12<10:53:25, 2.55s/it] +2025-02-05 17:54:55 - ERROR - stderr - 31%|███▏ | 
7039/22434 [7:47:15<11:00:58, 2.58s/it] +2025-02-05 17:54:55 - ERROR - stderr - +2025-02-05 17:54:55 - ERROR - stderr - +2025-02-05 17:54:55 - INFO - stdout - {'loss': 0.9675, 'grad_norm': 1.0805827379226685, 'learning_rate': 1.6066593967845613e-05, 'epoch': 0.94} +2025-02-05 17:54:55 - ERROR - stderr - 31%|███▏ | 7039/22434 [7:47:15<11:00:58, 2.58s/it] +2025-02-05 17:54:57 - ERROR - stderr - 31%|███▏ | 7040/22434 [7:47:17<10:52:03, 2.54s/it] +2025-02-05 17:54:57 - ERROR - stderr - +2025-02-05 17:54:57 - ERROR - stderr - +2025-02-05 17:54:57 - INFO - stdout - {'loss': 0.9597, 'grad_norm': 1.090860366821289, 'learning_rate': 1.6065446181106093e-05, 'epoch': 0.94} +2025-02-05 17:54:57 - ERROR - stderr - 31%|███▏ | 7040/22434 [7:47:17<10:52:03, 2.54s/it] +2025-02-05 17:55:00 - ERROR - stderr - 31%|███▏ | 7041/22434 [7:47:20<10:53:00, 2.55s/it] +2025-02-05 17:55:00 - ERROR - stderr - +2025-02-05 17:55:00 - ERROR - stderr - +2025-02-05 17:55:00 - INFO - stdout - {'loss': 0.9327, 'grad_norm': 0.9238436222076416, 'learning_rate': 1.606429826793815e-05, 'epoch': 0.94} +2025-02-05 17:55:00 - ERROR - stderr - 31%|███▏ | 7041/22434 [7:47:20<10:53:00, 2.55s/it] +2025-02-05 17:55:03 - ERROR - stderr - 31%|███▏ | 7042/22434 [7:47:22<10:54:56, 2.55s/it] +2025-02-05 17:55:03 - ERROR - stderr - +2025-02-05 17:55:03 - ERROR - stderr - +2025-02-05 17:55:03 - INFO - stdout - {'loss': 0.8568, 'grad_norm': 0.9414849877357483, 'learning_rate': 1.6063150228365712e-05, 'epoch': 0.94} +2025-02-05 17:55:03 - ERROR - stderr - 31%|███▏ | 7042/22434 [7:47:22<10:54:56, 2.55s/it] +2025-02-05 17:55:05 - ERROR - stderr - 31%|███▏ | 7043/22434 [7:47:25<10:47:24, 2.52s/it] +2025-02-05 17:55:05 - ERROR - stderr - +2025-02-05 17:55:05 - ERROR - stderr - +2025-02-05 17:55:05 - INFO - stdout - {'loss': 1.0164, 'grad_norm': 1.206019401550293, 'learning_rate': 1.6062002062412717e-05, 'epoch': 0.94} +2025-02-05 17:55:05 - ERROR - stderr - 31%|███▏ | 7043/22434 [7:47:25<10:47:24, 2.52s/it] +2025-02-05 
17:55:07 - ERROR - stderr - 31%|███▏ | 7044/22434 [7:47:27<10:44:46, 2.51s/it] +2025-02-05 17:55:07 - ERROR - stderr - +2025-02-05 17:55:07 - ERROR - stderr - +2025-02-05 17:55:07 - INFO - stdout - {'loss': 0.8606, 'grad_norm': 0.9971834421157837, 'learning_rate': 1.6060853770103083e-05, 'epoch': 0.94} +2025-02-05 17:55:07 - ERROR - stderr - 31%|███▏ | 7044/22434 [7:47:27<10:44:46, 2.51s/it] +2025-02-05 17:55:10 - ERROR - stderr - 31%|███▏ | 7045/22434 [7:47:30<10:50:13, 2.54s/it] +2025-02-05 17:55:10 - ERROR - stderr - +2025-02-05 17:55:10 - ERROR - stderr - +2025-02-05 17:55:10 - INFO - stdout - {'loss': 1.0597, 'grad_norm': 1.0779533386230469, 'learning_rate': 1.605970535146075e-05, 'epoch': 0.94} +2025-02-05 17:55:10 - ERROR - stderr - 31%|███▏ | 7045/22434 [7:47:30<10:50:13, 2.54s/it] +2025-02-05 17:55:13 - ERROR - stderr - 31%|███▏ | 7046/22434 [7:47:32<10:55:03, 2.55s/it] +2025-02-05 17:55:13 - ERROR - stderr - +2025-02-05 17:55:13 - ERROR - stderr - +2025-02-05 17:55:13 - INFO - stdout - {'loss': 0.9305, 'grad_norm': 1.0883381366729736, 'learning_rate': 1.6058556806509663e-05, 'epoch': 0.94} +2025-02-05 17:55:13 - ERROR - stderr - 31%|███▏ | 7046/22434 [7:47:32<10:55:03, 2.55s/it] +2025-02-05 17:55:15 - ERROR - stderr - 31%|███▏ | 7047/22434 [7:47:35<10:51:03, 2.54s/it] +2025-02-05 17:55:15 - ERROR - stderr - +2025-02-05 17:55:15 - ERROR - stderr - +2025-02-05 17:55:15 - INFO - stdout - {'loss': 0.9097, 'grad_norm': 1.0482873916625977, 'learning_rate': 1.605740813527376e-05, 'epoch': 0.94} +2025-02-05 17:55:15 - ERROR - stderr - 31%|███▏ | 7047/22434 [7:47:35<10:51:03, 2.54s/it] +2025-02-05 17:55:18 - ERROR - stderr - 31%|███▏ | 7048/22434 [7:47:37<10:47:23, 2.52s/it] +2025-02-05 17:55:18 - ERROR - stderr - +2025-02-05 17:55:18 - ERROR - stderr - +2025-02-05 17:55:18 - INFO - stdout - {'loss': 0.9647, 'grad_norm': 1.0759990215301514, 'learning_rate': 1.6056259337776975e-05, 'epoch': 0.94} +2025-02-05 17:55:18 - ERROR - stderr - 31%|███▏ | 7048/22434 
[7:47:37<10:47:23, 2.52s/it] +2025-02-05 17:55:20 - ERROR - stderr - 31%|███▏ | 7049/22434 [7:47:40<10:45:33, 2.52s/it] +2025-02-05 17:55:20 - ERROR - stderr - +2025-02-05 17:55:20 - ERROR - stderr - +2025-02-05 17:55:20 - INFO - stdout - {'loss': 0.9418, 'grad_norm': 1.1531344652175903, 'learning_rate': 1.605511041404326e-05, 'epoch': 0.94} +2025-02-05 17:55:20 - ERROR - stderr - 31%|███▏ | 7049/22434 [7:47:40<10:45:33, 2.52s/it] +2025-02-05 17:55:23 - ERROR - stderr - 31%|███▏ | 7050/22434 [7:47:42<10:44:42, 2.51s/it] +2025-02-05 17:55:23 - ERROR - stderr - +2025-02-05 17:55:23 - ERROR - stderr - +2025-02-05 17:55:23 - INFO - stdout - {'loss': 0.866, 'grad_norm': 1.044476866722107, 'learning_rate': 1.605396136409656e-05, 'epoch': 0.94} +2025-02-05 17:55:23 - ERROR - stderr - 31%|███▏ | 7050/22434 [7:47:42<10:44:42, 2.51s/it] +2025-02-05 17:55:25 - ERROR - stderr - 31%|███▏ | 7051/22434 [7:47:45<10:53:30, 2.55s/it] +2025-02-05 17:55:25 - ERROR - stderr - +2025-02-05 17:55:25 - ERROR - stderr - +2025-02-05 17:55:25 - INFO - stdout - {'loss': 0.9232, 'grad_norm': 1.0457857847213745, 'learning_rate': 1.605281218796083e-05, 'epoch': 0.94} +2025-02-05 17:55:25 - ERROR - stderr - 31%|███▏ | 7051/22434 [7:47:45<10:53:30, 2.55s/it] +2025-02-05 17:55:28 - ERROR - stderr - 31%|███▏ | 7052/22434 [7:47:48<10:48:37, 2.53s/it] +2025-02-05 17:55:28 - ERROR - stderr - +2025-02-05 17:55:28 - ERROR - stderr - +2025-02-05 17:55:28 - INFO - stdout - {'loss': 0.8869, 'grad_norm': 1.0963647365570068, 'learning_rate': 1.6051662885660025e-05, 'epoch': 0.94} +2025-02-05 17:55:28 - ERROR - stderr - 31%|███▏ | 7052/22434 [7:47:48<10:48:37, 2.53s/it] +2025-02-05 17:55:30 - ERROR - stderr - 31%|███▏ | 7053/22434 [7:47:50<10:47:56, 2.53s/it] +2025-02-05 17:55:30 - ERROR - stderr - +2025-02-05 17:55:30 - ERROR - stderr - +2025-02-05 17:55:30 - INFO - stdout - {'loss': 0.9139, 'grad_norm': 1.1349354982376099, 'learning_rate': 1.6050513457218092e-05, 'epoch': 0.94} +2025-02-05 17:55:30 - ERROR - 
stderr - 31%|███▏ | 7053/22434 [7:47:50<10:47:56, 2.53s/it] +2025-02-05 17:55:33 - ERROR - stderr - 31%|███▏ | 7054/22434 [7:47:53<10:50:09, 2.54s/it] +2025-02-05 17:55:33 - ERROR - stderr - +2025-02-05 17:55:33 - ERROR - stderr - +2025-02-05 17:55:33 - INFO - stdout - {'loss': 0.8333, 'grad_norm': 1.0369625091552734, 'learning_rate': 1.6049363902659e-05, 'epoch': 0.94} +2025-02-05 17:55:33 - ERROR - stderr - 31%|███▏ | 7054/22434 [7:47:53<10:50:09, 2.54s/it] +2025-02-05 17:55:35 - ERROR - stderr - 31%|███▏ | 7055/22434 [7:47:55<10:51:36, 2.54s/it] +2025-02-05 17:55:35 - ERROR - stderr - +2025-02-05 17:55:35 - ERROR - stderr - +2025-02-05 17:55:35 - INFO - stdout - {'loss': 0.8946, 'grad_norm': 0.931880533695221, 'learning_rate': 1.6048214222006703e-05, 'epoch': 0.94} +2025-02-05 17:55:35 - ERROR - stderr - 31%|███▏ | 7055/22434 [7:47:55<10:51:36, 2.54s/it] +2025-02-05 17:55:38 - ERROR - stderr - 31%|███▏ | 7056/22434 [7:47:58<10:56:18, 2.56s/it] +2025-02-05 17:55:38 - ERROR - stderr - +2025-02-05 17:55:38 - ERROR - stderr - +2025-02-05 17:55:38 - INFO - stdout - {'loss': 0.8142, 'grad_norm': 1.0355690717697144, 'learning_rate': 1.6047064415285173e-05, 'epoch': 0.94} +2025-02-05 17:55:38 - ERROR - stderr - 31%|███▏ | 7056/22434 [7:47:58<10:56:18, 2.56s/it] +2025-02-05 17:55:40 - ERROR - stderr - 31%|███▏ | 7057/22434 [7:48:00<10:48:58, 2.53s/it] +2025-02-05 17:55:41 - ERROR - stderr - +2025-02-05 17:55:41 - ERROR - stderr - +2025-02-05 17:55:41 - INFO - stdout - {'loss': 1.0132, 'grad_norm': 1.1400094032287598, 'learning_rate': 1.6045914482518366e-05, 'epoch': 0.94} +2025-02-05 17:55:41 - ERROR - stderr - 31%|███▏ | 7057/22434 [7:48:00<10:48:58, 2.53s/it] +2025-02-05 17:55:43 - ERROR - stderr - 31%|███▏ | 7058/22434 [7:48:03<10:56:37, 2.56s/it] +2025-02-05 17:55:43 - ERROR - stderr - +2025-02-05 17:55:43 - ERROR - stderr - +2025-02-05 17:55:43 - INFO - stdout - {'loss': 0.9492, 'grad_norm': 1.0273704528808594, 'learning_rate': 1.6044764423730262e-05, 'epoch': 0.94} 
+2025-02-05 17:55:43 - ERROR - stderr - 31%|███▏ | 7058/22434 [7:48:03<10:56:37, 2.56s/it] +2025-02-05 17:55:46 - ERROR - stderr - 31%|███▏ | 7059/22434 [7:48:05<10:53:13, 2.55s/it] +2025-02-05 17:55:46 - ERROR - stderr - +2025-02-05 17:55:46 - ERROR - stderr - +2025-02-05 17:55:46 - INFO - stdout - {'loss': 0.9291, 'grad_norm': 1.0778993368148804, 'learning_rate': 1.6043614238944828e-05, 'epoch': 0.94} +2025-02-05 17:55:46 - ERROR - stderr - 31%|███▏ | 7059/22434 [7:48:05<10:53:13, 2.55s/it] +2025-02-05 17:55:48 - ERROR - stderr - 31%|███▏ | 7060/22434 [7:48:08<10:47:32, 2.53s/it] +2025-02-05 17:55:48 - ERROR - stderr - +2025-02-05 17:55:48 - ERROR - stderr - +2025-02-05 17:55:48 - INFO - stdout - {'loss': 1.0442, 'grad_norm': 1.110110878944397, 'learning_rate': 1.6042463928186035e-05, 'epoch': 0.94} +2025-02-05 17:55:48 - ERROR - stderr - 31%|███▏ | 7060/22434 [7:48:08<10:47:32, 2.53s/it] +2025-02-05 17:55:51 - ERROR - stderr - 31%|███▏ | 7061/22434 [7:48:10<10:50:05, 2.54s/it] +2025-02-05 17:55:51 - ERROR - stderr - +2025-02-05 17:55:51 - ERROR - stderr - +2025-02-05 17:55:51 - INFO - stdout - {'loss': 0.9682, 'grad_norm': 1.065592885017395, 'learning_rate': 1.6041313491477865e-05, 'epoch': 0.94} +2025-02-05 17:55:51 - ERROR - stderr - 31%|███▏ | 7061/22434 [7:48:10<10:50:05, 2.54s/it] +2025-02-05 17:55:53 - ERROR - stderr - 31%|███▏ | 7062/22434 [7:48:13<10:47:40, 2.53s/it] +2025-02-05 17:55:53 - ERROR - stderr - +2025-02-05 17:55:53 - ERROR - stderr - +2025-02-05 17:55:53 - INFO - stdout - {'loss': 0.8607, 'grad_norm': 0.9232655167579651, 'learning_rate': 1.6040162928844294e-05, 'epoch': 0.94} +2025-02-05 17:55:53 - ERROR - stderr - 31%|███▏ | 7062/22434 [7:48:13<10:47:40, 2.53s/it] +2025-02-05 17:55:56 - ERROR - stderr - 31%|███▏ | 7063/22434 [7:48:15<10:43:48, 2.51s/it] +2025-02-05 17:55:56 - ERROR - stderr - +2025-02-05 17:55:56 - ERROR - stderr - +2025-02-05 17:55:56 - INFO - stdout - {'loss': 0.9665, 'grad_norm': 1.0336666107177734, 'learning_rate': 
1.6039012240309308e-05, 'epoch': 0.94} +2025-02-05 17:55:56 - ERROR - stderr - 31%|███▏ | 7063/22434 [7:48:15<10:43:48, 2.51s/it] +2025-02-05 17:55:58 - ERROR - stderr - 31%|███▏ | 7064/22434 [7:48:18<11:02:28, 2.59s/it] +2025-02-05 17:55:58 - ERROR - stderr - +2025-02-05 17:55:58 - ERROR - stderr - +2025-02-05 17:55:58 - INFO - stdout - {'loss': 0.9714, 'grad_norm': 1.0749419927597046, 'learning_rate': 1.603786142589689e-05, 'epoch': 0.94} +2025-02-05 17:55:58 - ERROR - stderr - 31%|███▏ | 7064/22434 [7:48:18<11:02:28, 2.59s/it] +2025-02-05 17:56:01 - ERROR - stderr - 31%|███▏ | 7065/22434 [7:48:21<10:58:12, 2.57s/it] +2025-02-05 17:56:01 - ERROR - stderr - +2025-02-05 17:56:01 - ERROR - stderr - +2025-02-05 17:56:01 - INFO - stdout - {'loss': 0.9622, 'grad_norm': 1.2030086517333984, 'learning_rate': 1.6036710485631032e-05, 'epoch': 0.94} +2025-02-05 17:56:01 - ERROR - stderr - 31%|███▏ | 7065/22434 [7:48:21<10:58:12, 2.57s/it] +2025-02-05 17:56:04 - ERROR - stderr - 31%|███▏ | 7066/22434 [7:48:23<11:01:45, 2.58s/it] +2025-02-05 17:56:04 - ERROR - stderr - +2025-02-05 17:56:04 - ERROR - stderr - +2025-02-05 17:56:04 - INFO - stdout - {'loss': 1.0341, 'grad_norm': 1.2007665634155273, 'learning_rate': 1.6035559419535714e-05, 'epoch': 0.94} +2025-02-05 17:56:04 - ERROR - stderr - 31%|███▏ | 7066/22434 [7:48:23<11:01:45, 2.58s/it] +2025-02-05 17:56:06 - ERROR - stderr - 32%|███▏ | 7067/22434 [7:48:26<10:58:36, 2.57s/it] +2025-02-05 17:56:06 - ERROR - stderr - +2025-02-05 17:56:06 - ERROR - stderr - +2025-02-05 17:56:06 - INFO - stdout - {'loss': 1.0252, 'grad_norm': 1.0877426862716675, 'learning_rate': 1.603440822763494e-05, 'epoch': 0.95} +2025-02-05 17:56:06 - ERROR - stderr - 32%|███▏ | 7067/22434 [7:48:26<10:58:36, 2.57s/it] +2025-02-05 17:56:09 - ERROR - stderr - 32%|███▏ | 7068/22434 [7:48:28<10:53:40, 2.55s/it] +2025-02-05 17:56:09 - ERROR - stderr - +2025-02-05 17:56:09 - ERROR - stderr - +2025-02-05 17:56:09 - INFO - stdout - {'loss': 0.8859, 'grad_norm': 
1.021519422531128, 'learning_rate': 1.60332569099527e-05, 'epoch': 0.95} +2025-02-05 17:56:09 - ERROR - stderr - 32%|███▏ | 7068/22434 [7:48:28<10:53:40, 2.55s/it] +2025-02-05 17:56:11 - ERROR - stderr - 32%|███▏ | 7069/22434 [7:48:31<10:45:54, 2.52s/it] +2025-02-05 17:56:11 - ERROR - stderr - +2025-02-05 17:56:11 - ERROR - stderr - +2025-02-05 17:56:11 - INFO - stdout - {'loss': 0.9081, 'grad_norm': 1.0105737447738647, 'learning_rate': 1.6032105466512993e-05, 'epoch': 0.95} +2025-02-05 17:56:11 - ERROR - stderr - 32%|███▏ | 7069/22434 [7:48:31<10:45:54, 2.52s/it] +2025-02-05 17:56:14 - ERROR - stderr - 32%|███▏ | 7070/22434 [7:48:33<10:43:33, 2.51s/it] +2025-02-05 17:56:14 - ERROR - stderr - +2025-02-05 17:56:14 - ERROR - stderr - +2025-02-05 17:56:14 - INFO - stdout - {'loss': 0.9759, 'grad_norm': 1.068971037864685, 'learning_rate': 1.6030953897339817e-05, 'epoch': 0.95} +2025-02-05 17:56:14 - ERROR - stderr - 32%|███▏ | 7070/22434 [7:48:33<10:43:33, 2.51s/it] +2025-02-05 17:56:16 - ERROR - stderr - 32%|███▏ | 7071/22434 [7:48:36<10:50:06, 2.54s/it] +2025-02-05 17:56:16 - ERROR - stderr - +2025-02-05 17:56:16 - ERROR - stderr - +2025-02-05 17:56:16 - INFO - stdout - {'loss': 0.9358, 'grad_norm': 1.0634922981262207, 'learning_rate': 1.602980220245718e-05, 'epoch': 0.95} +2025-02-05 17:56:16 - ERROR - stderr - 32%|███▏ | 7071/22434 [7:48:36<10:50:06, 2.54s/it] +2025-02-05 17:56:19 - ERROR - stderr - 32%|███▏ | 7072/22434 [7:48:38<10:49:57, 2.54s/it] +2025-02-05 17:56:19 - ERROR - stderr - +2025-02-05 17:56:19 - ERROR - stderr - +2025-02-05 17:56:19 - INFO - stdout - {'loss': 0.839, 'grad_norm': 1.0173265933990479, 'learning_rate': 1.6028650381889088e-05, 'epoch': 0.95} +2025-02-05 17:56:19 - ERROR - stderr - 32%|███▏ | 7072/22434 [7:48:38<10:49:57, 2.54s/it] +2025-02-05 17:56:21 - ERROR - stderr - 32%|███▏ | 7073/22434 [7:48:41<10:48:11, 2.53s/it] +2025-02-05 17:56:21 - ERROR - stderr - +2025-02-05 17:56:21 - ERROR - stderr - +2025-02-05 17:56:21 - INFO - stdout - 
{'loss': 0.9718, 'grad_norm': 1.1603729724884033, 'learning_rate': 1.6027498435659545e-05, 'epoch': 0.95} +2025-02-05 17:56:21 - ERROR - stderr - 32%|███▏ | 7073/22434 [7:48:41<10:48:11, 2.53s/it] +2025-02-05 17:56:24 - ERROR - stderr - 32%|███▏ | 7074/22434 [7:48:44<10:54:52, 2.56s/it] +2025-02-05 17:56:24 - ERROR - stderr - +2025-02-05 17:56:24 - ERROR - stderr - +2025-02-05 17:56:24 - INFO - stdout - {'loss': 0.7897, 'grad_norm': 0.8913689255714417, 'learning_rate': 1.6026346363792565e-05, 'epoch': 0.95} +2025-02-05 17:56:24 - ERROR - stderr - 32%|███▏ | 7074/22434 [7:48:44<10:54:52, 2.56s/it] +2025-02-05 17:56:26 - ERROR - stderr - 32%|███▏ | 7075/22434 [7:48:46<10:52:37, 2.55s/it] +2025-02-05 17:56:26 - ERROR - stderr - +2025-02-05 17:56:26 - ERROR - stderr - +2025-02-05 17:56:26 - INFO - stdout - {'loss': 0.9082, 'grad_norm': 1.1767996549606323, 'learning_rate': 1.6025194166312162e-05, 'epoch': 0.95} +2025-02-05 17:56:26 - ERROR - stderr - 32%|███▏ | 7075/22434 [7:48:46<10:52:37, 2.55s/it] +2025-02-05 17:56:29 - ERROR - stderr - 32%|███▏ | 7076/22434 [7:48:49<10:44:38, 2.52s/it] +2025-02-05 17:56:29 - ERROR - stderr - +2025-02-05 17:56:29 - ERROR - stderr - +2025-02-05 17:56:29 - INFO - stdout - {'loss': 1.0019, 'grad_norm': 1.076306939125061, 'learning_rate': 1.6024041843242353e-05, 'epoch': 0.95} +2025-02-05 17:56:29 - ERROR - stderr - 32%|███▏ | 7076/22434 [7:48:49<10:44:38, 2.52s/it] +2025-02-05 17:56:31 - ERROR - stderr - 32%|███▏ | 7077/22434 [7:48:51<10:44:16, 2.52s/it] +2025-02-05 17:56:31 - ERROR - stderr - +2025-02-05 17:56:31 - ERROR - stderr - +2025-02-05 17:56:31 - INFO - stdout - {'loss': 0.9902, 'grad_norm': 0.9961170554161072, 'learning_rate': 1.6022889394607156e-05, 'epoch': 0.95} +2025-02-05 17:56:31 - ERROR - stderr - 32%|███▏ | 7077/22434 [7:48:51<10:44:16, 2.52s/it] +2025-02-05 17:56:34 - ERROR - stderr - 32%|███▏ | 7078/22434 [7:48:53<10:38:35, 2.50s/it] +2025-02-05 17:56:34 - ERROR - stderr - +2025-02-05 17:56:34 - ERROR - stderr - 
+2025-02-05 17:56:34 - INFO - stdout - {'loss': 0.9623, 'grad_norm': 0.9746331572532654, 'learning_rate': 1.602173682043059e-05, 'epoch': 0.95} +2025-02-05 17:56:34 - ERROR - stderr - 32%|███▏ | 7078/22434 [7:48:54<10:38:35, 2.50s/it] +2025-02-05 17:56:36 - ERROR - stderr - 32%|███▏ | 7079/22434 [7:48:56<10:48:47, 2.54s/it] +2025-02-05 17:56:36 - ERROR - stderr - +2025-02-05 17:56:36 - ERROR - stderr - +2025-02-05 17:56:36 - INFO - stdout - {'loss': 0.9123, 'grad_norm': 1.0101170539855957, 'learning_rate': 1.6020584120736686e-05, 'epoch': 0.95} +2025-02-05 17:56:36 - ERROR - stderr - 32%|███▏ | 7079/22434 [7:48:56<10:48:47, 2.54s/it] +2025-02-05 17:56:39 - ERROR - stderr - 32%|███▏ | 7080/22434 [7:48:59<11:00:48, 2.58s/it] +2025-02-05 17:56:39 - ERROR - stderr - +2025-02-05 17:56:39 - ERROR - stderr - +2025-02-05 17:56:39 - INFO - stdout - {'loss': 1.0562, 'grad_norm': 1.105758786201477, 'learning_rate': 1.6019431295549463e-05, 'epoch': 0.95} +2025-02-05 17:56:39 - ERROR - stderr - 32%|███▏ | 7080/22434 [7:48:59<11:00:48, 2.58s/it] +2025-02-05 17:56:42 - ERROR - stderr - 32%|███▏ | 7081/22434 [7:49:01<10:59:07, 2.58s/it] +2025-02-05 17:56:42 - ERROR - stderr - +2025-02-05 17:56:42 - ERROR - stderr - +2025-02-05 17:56:42 - INFO - stdout - {'loss': 1.1044, 'grad_norm': 1.127420425415039, 'learning_rate': 1.601827834489296e-05, 'epoch': 0.95} +2025-02-05 17:56:42 - ERROR - stderr - 32%|███▏ | 7081/22434 [7:49:01<10:59:07, 2.58s/it] +2025-02-05 17:56:44 - ERROR - stderr - 32%|███▏ | 7082/22434 [7:49:04<10:55:45, 2.56s/it] +2025-02-05 17:56:44 - ERROR - stderr - +2025-02-05 17:56:44 - ERROR - stderr - +2025-02-05 17:56:44 - INFO - stdout - {'loss': 0.9414, 'grad_norm': 1.0131900310516357, 'learning_rate': 1.60171252687912e-05, 'epoch': 0.95} +2025-02-05 17:56:44 - ERROR - stderr - 32%|███▏ | 7082/22434 [7:49:04<10:55:45, 2.56s/it] +2025-02-05 17:56:47 - ERROR - stderr - 32%|███▏ | 7083/22434 [7:49:06<10:51:06, 2.54s/it] +2025-02-05 17:56:47 - ERROR - stderr - 
+2025-02-05 17:56:47 - ERROR - stderr - +2025-02-05 17:56:47 - INFO - stdout - {'loss': 0.8778, 'grad_norm': 1.0410038232803345, 'learning_rate': 1.601597206726822e-05, 'epoch': 0.95} +2025-02-05 17:56:47 - ERROR - stderr - 32%|███▏ | 7083/22434 [7:49:06<10:51:06, 2.54s/it] +2025-02-05 17:56:49 - ERROR - stderr - 32%|███▏ | 7084/22434 [7:49:09<10:54:43, 2.56s/it] +2025-02-05 17:56:49 - ERROR - stderr - +2025-02-05 17:56:49 - ERROR - stderr - +2025-02-05 17:56:49 - INFO - stdout - {'loss': 0.9463, 'grad_norm': 1.093634843826294, 'learning_rate': 1.6014818740348064e-05, 'epoch': 0.95} +2025-02-05 17:56:49 - ERROR - stderr - 32%|███▏ | 7084/22434 [7:49:09<10:54:43, 2.56s/it] +2025-02-05 17:56:52 - ERROR - stderr - 32%|███▏ | 7085/22434 [7:49:12<10:50:14, 2.54s/it] +2025-02-05 17:56:52 - ERROR - stderr - +2025-02-05 17:56:52 - ERROR - stderr - +2025-02-05 17:56:52 - INFO - stdout - {'loss': 0.8959, 'grad_norm': 1.015401005744934, 'learning_rate': 1.6013665288054767e-05, 'epoch': 0.95} +2025-02-05 17:56:52 - ERROR - stderr - 32%|███▏ | 7085/22434 [7:49:12<10:50:14, 2.54s/it] +2025-02-05 17:56:54 - ERROR - stderr - 32%|███▏ | 7086/22434 [7:49:14<10:54:30, 2.56s/it] +2025-02-05 17:56:54 - ERROR - stderr - +2025-02-05 17:56:54 - ERROR - stderr - +2025-02-05 17:56:54 - INFO - stdout - {'loss': 0.879, 'grad_norm': 0.9743746519088745, 'learning_rate': 1.6012511710412364e-05, 'epoch': 0.95} +2025-02-05 17:56:54 - ERROR - stderr - 32%|███▏ | 7086/22434 [7:49:14<10:54:30, 2.56s/it] +2025-02-05 17:56:57 - ERROR - stderr - 32%|███▏ | 7087/22434 [7:49:17<10:50:03, 2.54s/it] +2025-02-05 17:56:57 - ERROR - stderr - +2025-02-05 17:56:57 - ERROR - stderr - +2025-02-05 17:56:57 - INFO - stdout - {'loss': 0.906, 'grad_norm': 1.092085361480713, 'learning_rate': 1.6011358007444914e-05, 'epoch': 0.95} +2025-02-05 17:56:57 - ERROR - stderr - 32%|███▏ | 7087/22434 [7:49:17<10:50:03, 2.54s/it] +2025-02-05 17:56:59 - ERROR - stderr - 32%|███▏ | 7088/22434 [7:49:19<10:58:45, 2.58s/it] 
+2025-02-05 17:57:00 - ERROR - stderr - +2025-02-05 17:57:00 - ERROR - stderr - +2025-02-05 17:57:00 - INFO - stdout - {'loss': 0.9441, 'grad_norm': 1.0641460418701172, 'learning_rate': 1.6010204179176456e-05, 'epoch': 0.95} +2025-02-05 17:57:00 - ERROR - stderr - 32%|███▏ | 7088/22434 [7:49:19<10:58:45, 2.58s/it] +2025-02-05 17:57:02 - ERROR - stderr - 32%|███▏ | 7089/22434 [7:49:22<10:53:21, 2.55s/it] +2025-02-05 17:57:02 - ERROR - stderr - +2025-02-05 17:57:02 - ERROR - stderr - +2025-02-05 17:57:02 - INFO - stdout - {'loss': 1.0118, 'grad_norm': 1.1013720035552979, 'learning_rate': 1.6009050225631043e-05, 'epoch': 0.95} +2025-02-05 17:57:02 - ERROR - stderr - 32%|███▏ | 7089/22434 [7:49:22<10:53:21, 2.55s/it] +2025-02-05 17:57:05 - ERROR - stderr - 32%|███▏ | 7090/22434 [7:49:25<11:08:47, 2.62s/it] +2025-02-05 17:57:05 - ERROR - stderr - +2025-02-05 17:57:05 - ERROR - stderr - +2025-02-05 17:57:05 - INFO - stdout - {'loss': 0.9325, 'grad_norm': 1.0126711130142212, 'learning_rate': 1.600789614683273e-05, 'epoch': 0.95} +2025-02-05 17:57:05 - ERROR - stderr - 32%|███▏ | 7090/22434 [7:49:25<11:08:47, 2.62s/it] +2025-02-05 17:57:08 - ERROR - stderr - 32%|███▏ | 7091/22434 [7:49:27<11:23:55, 2.67s/it] +2025-02-05 17:57:08 - ERROR - stderr - +2025-02-05 17:57:08 - ERROR - stderr - +2025-02-05 17:57:08 - INFO - stdout - {'loss': 0.9372, 'grad_norm': 1.091216802597046, 'learning_rate': 1.600674194280557e-05, 'epoch': 0.95} +2025-02-05 17:57:08 - ERROR - stderr - 32%|███▏ | 7091/22434 [7:49:27<11:23:55, 2.67s/it] +2025-02-05 17:57:10 - ERROR - stderr - 32%|███▏ | 7092/22434 [7:49:30<11:18:44, 2.65s/it] +2025-02-05 17:57:10 - ERROR - stderr - +2025-02-05 17:57:10 - ERROR - stderr - +2025-02-05 17:57:10 - INFO - stdout - {'loss': 0.9828, 'grad_norm': 1.2878303527832031, 'learning_rate': 1.600558761357362e-05, 'epoch': 0.95} +2025-02-05 17:57:10 - ERROR - stderr - 32%|███▏ | 7092/22434 [7:49:30<11:18:44, 2.65s/it] +2025-02-05 17:57:13 - ERROR - stderr - 32%|███▏ | 
7093/22434 [7:49:32<11:06:39, 2.61s/it] +2025-02-05 17:57:13 - ERROR - stderr - +2025-02-05 17:57:13 - ERROR - stderr - +2025-02-05 17:57:13 - INFO - stdout - {'loss': 0.8716, 'grad_norm': 1.0509196519851685, 'learning_rate': 1.6004433159160946e-05, 'epoch': 0.95} +2025-02-05 17:57:13 - ERROR - stderr - 32%|███▏ | 7093/22434 [7:49:32<11:06:39, 2.61s/it] +2025-02-05 17:57:15 - ERROR - stderr - 32%|███▏ | 7094/22434 [7:49:35<10:57:21, 2.57s/it] +2025-02-05 17:57:15 - ERROR - stderr - +2025-02-05 17:57:15 - ERROR - stderr - +2025-02-05 17:57:15 - INFO - stdout - {'loss': 1.0432, 'grad_norm': 1.1290314197540283, 'learning_rate': 1.6003278579591608e-05, 'epoch': 0.95} +2025-02-05 17:57:15 - ERROR - stderr - 32%|███▏ | 7094/22434 [7:49:35<10:57:21, 2.57s/it] +2025-02-05 17:57:18 - ERROR - stderr - 32%|███▏ | 7095/22434 [7:49:37<10:51:04, 2.55s/it] +2025-02-05 17:57:18 - ERROR - stderr - +2025-02-05 17:57:18 - ERROR - stderr - +2025-02-05 17:57:18 - INFO - stdout - {'loss': 0.866, 'grad_norm': 1.0398333072662354, 'learning_rate': 1.6002123874889672e-05, 'epoch': 0.95} +2025-02-05 17:57:18 - ERROR - stderr - 32%|███▏ | 7095/22434 [7:49:37<10:51:04, 2.55s/it] +2025-02-05 17:57:20 - ERROR - stderr - 32%|███▏ | 7096/22434 [7:49:40<10:49:07, 2.54s/it] +2025-02-05 17:57:20 - ERROR - stderr - +2025-02-05 17:57:20 - ERROR - stderr - +2025-02-05 17:57:20 - INFO - stdout - {'loss': 0.978, 'grad_norm': 1.1429569721221924, 'learning_rate': 1.600096904507921e-05, 'epoch': 0.95} +2025-02-05 17:57:20 - ERROR - stderr - 32%|███▏ | 7096/22434 [7:49:40<10:49:07, 2.54s/it] +2025-02-05 17:57:23 - ERROR - stderr - 32%|███▏ | 7097/22434 [7:49:42<10:50:05, 2.54s/it] +2025-02-05 17:57:23 - ERROR - stderr - +2025-02-05 17:57:23 - ERROR - stderr - +2025-02-05 17:57:23 - INFO - stdout - {'loss': 0.9275, 'grad_norm': 1.0715259313583374, 'learning_rate': 1.5999814090184286e-05, 'epoch': 0.95} +2025-02-05 17:57:23 - ERROR - stderr - 32%|███▏ | 7097/22434 [7:49:43<10:50:05, 2.54s/it] +2025-02-05 
17:57:25 - ERROR - stderr - 32%|███▏ | 7098/22434 [7:49:45<10:42:59, 2.52s/it] +2025-02-05 17:57:25 - ERROR - stderr - +2025-02-05 17:57:25 - ERROR - stderr - +2025-02-05 17:57:25 - INFO - stdout - {'loss': 0.8453, 'grad_norm': 0.9913287162780762, 'learning_rate': 1.5998659010228978e-05, 'epoch': 0.95} +2025-02-05 17:57:25 - ERROR - stderr - 32%|███▏ | 7098/22434 [7:49:45<10:42:59, 2.52s/it] +2025-02-05 17:57:28 - ERROR - stderr - 32%|███▏ | 7099/22434 [7:49:47<10:42:01, 2.51s/it] +2025-02-05 17:57:28 - ERROR - stderr - +2025-02-05 17:57:28 - ERROR - stderr - +2025-02-05 17:57:28 - INFO - stdout - {'loss': 0.9428, 'grad_norm': 1.1056674718856812, 'learning_rate': 1.5997503805237366e-05, 'epoch': 0.95} +2025-02-05 17:57:28 - ERROR - stderr - 32%|███▏ | 7099/22434 [7:49:48<10:42:01, 2.51s/it] +2025-02-05 17:57:30 - ERROR - stderr - 32%|███▏ | 7100/22434 [7:49:50<10:45:26, 2.53s/it] +2025-02-05 17:57:30 - ERROR - stderr - +2025-02-05 17:57:30 - ERROR - stderr - +2025-02-05 17:57:30 - INFO - stdout - {'loss': 0.8141, 'grad_norm': 0.990154504776001, 'learning_rate': 1.5996348475233526e-05, 'epoch': 0.95} +2025-02-05 17:57:30 - ERROR - stderr - 32%|███▏ | 7100/22434 [7:49:50<10:45:26, 2.53s/it] +2025-02-05 17:57:33 - ERROR - stderr - 32%|███▏ | 7101/22434 [7:49:52<10:42:03, 2.51s/it] +2025-02-05 17:57:33 - ERROR - stderr - +2025-02-05 17:57:33 - ERROR - stderr - +2025-02-05 17:57:33 - INFO - stdout - {'loss': 0.9946, 'grad_norm': 0.9788251519203186, 'learning_rate': 1.5995193020241536e-05, 'epoch': 0.95} +2025-02-05 17:57:33 - ERROR - stderr - 32%|███▏ | 7101/22434 [7:49:53<10:42:03, 2.51s/it] +2025-02-05 17:57:35 - ERROR - stderr - 32%|███▏ | 7102/22434 [7:49:55<10:42:11, 2.51s/it] +2025-02-05 17:57:35 - ERROR - stderr - +2025-02-05 17:57:35 - ERROR - stderr - +2025-02-05 17:57:35 - INFO - stdout - {'loss': 0.9282, 'grad_norm': 1.018373966217041, 'learning_rate': 1.5994037440285487e-05, 'epoch': 0.95} +2025-02-05 17:57:35 - ERROR - stderr - 32%|███▏ | 7102/22434 
[7:49:55<10:42:11, 2.51s/it] +2025-02-05 17:57:38 - ERROR - stderr - 32%|███▏ | 7103/22434 [7:49:58<10:44:49, 2.52s/it] +2025-02-05 17:57:38 - ERROR - stderr - +2025-02-05 17:57:38 - ERROR - stderr - +2025-02-05 17:57:38 - INFO - stdout - {'loss': 0.9456, 'grad_norm': 1.0913234949111938, 'learning_rate': 1.5992881735389463e-05, 'epoch': 0.95} +2025-02-05 17:57:38 - ERROR - stderr - 32%|███▏ | 7103/22434 [7:49:58<10:44:49, 2.52s/it] +2025-02-05 17:57:40 - ERROR - stderr - 32%|███▏ | 7104/22434 [7:50:00<10:52:44, 2.55s/it] +2025-02-05 17:57:40 - ERROR - stderr - +2025-02-05 17:57:40 - ERROR - stderr - +2025-02-05 17:57:40 - INFO - stdout - {'loss': 0.9816, 'grad_norm': 1.19563889503479, 'learning_rate': 1.5991725905577557e-05, 'epoch': 0.95} +2025-02-05 17:57:40 - ERROR - stderr - 32%|███▏ | 7104/22434 [7:50:00<10:52:44, 2.55s/it] +2025-02-05 17:57:43 - ERROR - stderr - 32%|███▏ | 7105/22434 [7:50:03<10:52:58, 2.56s/it] +2025-02-05 17:57:43 - ERROR - stderr - +2025-02-05 17:57:43 - ERROR - stderr - +2025-02-05 17:57:43 - INFO - stdout - {'loss': 1.0286, 'grad_norm': 0.9748798608779907, 'learning_rate': 1.5990569950873855e-05, 'epoch': 0.95} +2025-02-05 17:57:43 - ERROR - stderr - 32%|███▏ | 7105/22434 [7:50:03<10:52:58, 2.56s/it] +2025-02-05 17:57:46 - ERROR - stderr - 32%|███▏ | 7106/22434 [7:50:05<10:54:14, 2.56s/it] +2025-02-05 17:57:46 - ERROR - stderr - +2025-02-05 17:57:46 - ERROR - stderr - +2025-02-05 17:57:46 - INFO - stdout - {'loss': 0.7902, 'grad_norm': 0.9589357376098633, 'learning_rate': 1.5989413871302456e-05, 'epoch': 0.95} +2025-02-05 17:57:46 - ERROR - stderr - 32%|███▏ | 7106/22434 [7:50:05<10:54:14, 2.56s/it] +2025-02-05 17:57:48 - ERROR - stderr - 32%|███▏ | 7107/22434 [7:50:08<10:56:39, 2.57s/it] +2025-02-05 17:57:48 - ERROR - stderr - +2025-02-05 17:57:48 - ERROR - stderr - +2025-02-05 17:57:48 - INFO - stdout - {'loss': 0.8276, 'grad_norm': 1.031050205230713, 'learning_rate': 1.5988257666887454e-05, 'epoch': 0.95} +2025-02-05 17:57:48 - ERROR 
- stderr - 32%|███▏ | 7107/22434 [7:50:08<10:56:39, 2.57s/it] +2025-02-05 17:57:51 - ERROR - stderr - 32%|███▏ | 7108/22434 [7:50:10<10:53:24, 2.56s/it] +2025-02-05 17:57:51 - ERROR - stderr - +2025-02-05 17:57:51 - ERROR - stderr - +2025-02-05 17:57:51 - INFO - stdout - {'loss': 0.9675, 'grad_norm': 1.0419827699661255, 'learning_rate': 1.5987101337652955e-05, 'epoch': 0.95} +2025-02-05 17:57:51 - ERROR - stderr - 32%|███▏ | 7108/22434 [7:50:10<10:53:24, 2.56s/it] +2025-02-05 17:57:53 - ERROR - stderr - 32%|███▏ | 7109/22434 [7:50:13<10:50:23, 2.55s/it] +2025-02-05 17:57:53 - ERROR - stderr - +2025-02-05 17:57:53 - ERROR - stderr - +2025-02-05 17:57:53 - INFO - stdout - {'loss': 0.9662, 'grad_norm': 1.0728776454925537, 'learning_rate': 1.5985944883623052e-05, 'epoch': 0.95} +2025-02-05 17:57:53 - ERROR - stderr - 32%|███▏ | 7109/22434 [7:50:13<10:50:23, 2.55s/it] +2025-02-05 17:57:56 - ERROR - stderr - 32%|███▏ | 7110/22434 [7:50:16<10:52:07, 2.55s/it] +2025-02-05 17:57:56 - ERROR - stderr - +2025-02-05 17:57:56 - ERROR - stderr - +2025-02-05 17:57:56 - INFO - stdout - {'loss': 0.9446, 'grad_norm': 1.013154149055481, 'learning_rate': 1.598478830482186e-05, 'epoch': 0.95} +2025-02-05 17:57:56 - ERROR - stderr - 32%|███▏ | 7110/22434 [7:50:16<10:52:07, 2.55s/it] +2025-02-05 17:57:58 - ERROR - stderr - 32%|███▏ | 7111/22434 [7:50:18<10:48:26, 2.54s/it] +2025-02-05 17:57:58 - ERROR - stderr - +2025-02-05 17:57:58 - ERROR - stderr - +2025-02-05 17:57:58 - INFO - stdout - {'loss': 0.9762, 'grad_norm': 1.0370818376541138, 'learning_rate': 1.598363160127348e-05, 'epoch': 0.95} +2025-02-05 17:57:58 - ERROR - stderr - 32%|███▏ | 7111/22434 [7:50:18<10:48:26, 2.54s/it] +2025-02-05 17:58:01 - ERROR - stderr - 32%|███▏ | 7112/22434 [7:50:21<10:54:52, 2.56s/it] +2025-02-05 17:58:01 - ERROR - stderr - +2025-02-05 17:58:01 - ERROR - stderr - +2025-02-05 17:58:01 - INFO - stdout - {'loss': 0.9293, 'grad_norm': 1.0818289518356323, 'learning_rate': 1.5982474773002028e-05, 'epoch': 
0.95} +2025-02-05 17:58:01 - ERROR - stderr - 32%|███▏ | 7112/22434 [7:50:21<10:54:52, 2.56s/it] +2025-02-05 17:58:03 - ERROR - stderr - 32%|███▏ | 7113/22434 [7:50:23<10:49:58, 2.55s/it] +2025-02-05 17:58:03 - ERROR - stderr - +2025-02-05 17:58:03 - ERROR - stderr - +2025-02-05 17:58:03 - INFO - stdout - {'loss': 0.9021, 'grad_norm': 1.0101479291915894, 'learning_rate': 1.5981317820031613e-05, 'epoch': 0.95} +2025-02-05 17:58:03 - ERROR - stderr - 32%|███▏ | 7113/22434 [7:50:23<10:49:58, 2.55s/it] +2025-02-05 17:58:06 - ERROR - stderr - 32%|███▏ | 7114/22434 [7:50:26<10:49:19, 2.54s/it] +2025-02-05 17:58:06 - ERROR - stderr - +2025-02-05 17:58:06 - ERROR - stderr - +2025-02-05 17:58:06 - INFO - stdout - {'loss': 0.9594, 'grad_norm': 1.0467801094055176, 'learning_rate': 1.598016074238635e-05, 'epoch': 0.95} +2025-02-05 17:58:06 - ERROR - stderr - 32%|███▏ | 7114/22434 [7:50:26<10:49:19, 2.54s/it] +2025-02-05 17:58:08 - ERROR - stderr - 32%|███▏ | 7115/22434 [7:50:28<10:40:01, 2.51s/it] +2025-02-05 17:58:08 - ERROR - stderr - +2025-02-05 17:58:08 - ERROR - stderr - +2025-02-05 17:58:08 - INFO - stdout - {'loss': 0.9634, 'grad_norm': 1.0949842929840088, 'learning_rate': 1.597900354009036e-05, 'epoch': 0.95} +2025-02-05 17:58:08 - ERROR - stderr - 32%|███▏ | 7115/22434 [7:50:28<10:40:01, 2.51s/it] +2025-02-05 17:58:11 - ERROR - stderr - 32%|███▏ | 7116/22434 [7:50:31<10:49:28, 2.54s/it] +2025-02-05 17:58:11 - ERROR - stderr - +2025-02-05 17:58:11 - ERROR - stderr - +2025-02-05 17:58:11 - INFO - stdout - {'loss': 0.9533, 'grad_norm': 1.0705264806747437, 'learning_rate': 1.597784621316776e-05, 'epoch': 0.95} +2025-02-05 17:58:11 - ERROR - stderr - 32%|███▏ | 7116/22434 [7:50:31<10:49:28, 2.54s/it] +2025-02-05 17:58:14 - ERROR - stderr - 32%|███▏ | 7117/22434 [7:50:33<10:49:43, 2.55s/it] +2025-02-05 17:58:14 - ERROR - stderr - +2025-02-05 17:58:14 - ERROR - stderr - +2025-02-05 17:58:14 - INFO - stdout - {'loss': 1.0464, 'grad_norm': 1.1463476419448853, 'learning_rate': 
1.597668876164268e-05, 'epoch': 0.95} +2025-02-05 17:58:14 - ERROR - stderr - 32%|███▏ | 7117/22434 [7:50:33<10:49:43, 2.55s/it] +2025-02-05 17:58:16 - ERROR - stderr - 32%|███▏ | 7118/22434 [7:50:36<10:47:27, 2.54s/it] +2025-02-05 17:58:16 - ERROR - stderr - +2025-02-05 17:58:16 - ERROR - stderr - +2025-02-05 17:58:16 - INFO - stdout - {'loss': 1.0321, 'grad_norm': 1.0410542488098145, 'learning_rate': 1.5975531185539238e-05, 'epoch': 0.95} +2025-02-05 17:58:16 - ERROR - stderr - 32%|███▏ | 7118/22434 [7:50:36<10:47:27, 2.54s/it] +2025-02-05 17:58:19 - ERROR - stderr - 32%|███▏ | 7119/22434 [7:50:38<10:42:56, 2.52s/it] +2025-02-05 17:58:19 - ERROR - stderr - +2025-02-05 17:58:19 - ERROR - stderr - +2025-02-05 17:58:19 - INFO - stdout - {'loss': 0.9515, 'grad_norm': 0.991000771522522, 'learning_rate': 1.5974373484881568e-05, 'epoch': 0.95} +2025-02-05 17:58:19 - ERROR - stderr - 32%|███▏ | 7119/22434 [7:50:38<10:42:56, 2.52s/it] +2025-02-05 17:58:21 - ERROR - stderr - 32%|███▏ | 7120/22434 [7:50:41<10:44:09, 2.52s/it] +2025-02-05 17:58:21 - ERROR - stderr - +2025-02-05 17:58:21 - ERROR - stderr - +2025-02-05 17:58:21 - INFO - stdout - {'loss': 0.9305, 'grad_norm': 0.9156233072280884, 'learning_rate': 1.5973215659693802e-05, 'epoch': 0.95} +2025-02-05 17:58:21 - ERROR - stderr - 32%|███▏ | 7120/22434 [7:50:41<10:44:09, 2.52s/it] +2025-02-05 17:58:24 - ERROR - stderr - 32%|███▏ | 7121/22434 [7:50:43<10:46:44, 2.53s/it] +2025-02-05 17:58:24 - ERROR - stderr - +2025-02-05 17:58:24 - ERROR - stderr - +2025-02-05 17:58:24 - INFO - stdout - {'loss': 0.928, 'grad_norm': 1.0865553617477417, 'learning_rate': 1.5972057710000067e-05, 'epoch': 0.95} +2025-02-05 17:58:24 - ERROR - stderr - 32%|███▏ | 7121/22434 [7:50:43<10:46:44, 2.53s/it] +2025-02-05 17:58:26 - ERROR - stderr - 32%|███▏ | 7122/22434 [7:50:46<10:52:09, 2.56s/it] +2025-02-05 17:58:26 - ERROR - stderr - +2025-02-05 17:58:26 - ERROR - stderr - +2025-02-05 17:58:26 - INFO - stdout - {'loss': 1.0142, 'grad_norm': 
1.1766663789749146, 'learning_rate': 1.5970899635824506e-05, 'epoch': 0.95} +2025-02-05 17:58:26 - ERROR - stderr - 32%|███▏ | 7122/22434 [7:50:46<10:52:09, 2.56s/it] +2025-02-05 17:58:29 - ERROR - stderr - 32%|███▏ | 7123/22434 [7:50:49<11:11:10, 2.63s/it] +2025-02-05 17:58:29 - ERROR - stderr - +2025-02-05 17:58:29 - ERROR - stderr - +2025-02-05 17:58:29 - INFO - stdout - {'loss': 0.889, 'grad_norm': 1.0699774026870728, 'learning_rate': 1.5969741437191254e-05, 'epoch': 0.95} +2025-02-05 17:58:29 - ERROR - stderr - 32%|███▏ | 7123/22434 [7:50:49<11:11:10, 2.63s/it] +2025-02-05 17:58:31 - ERROR - stderr - 32%|███▏ | 7124/22434 [7:50:51<10:58:03, 2.58s/it] +2025-02-05 17:58:32 - ERROR - stderr - +2025-02-05 17:58:32 - ERROR - stderr - +2025-02-05 17:58:32 - INFO - stdout - {'loss': 0.9452, 'grad_norm': 1.1796070337295532, 'learning_rate': 1.5968583114124457e-05, 'epoch': 0.95} +2025-02-05 17:58:32 - ERROR - stderr - 32%|███▏ | 7124/22434 [7:50:51<10:58:03, 2.58s/it] +2025-02-05 17:58:34 - ERROR - stderr - 32%|███▏ | 7125/22434 [7:50:54<11:02:34, 2.60s/it] +2025-02-05 17:58:34 - ERROR - stderr - +2025-02-05 17:58:34 - ERROR - stderr - +2025-02-05 17:58:34 - INFO - stdout - {'loss': 0.8774, 'grad_norm': 1.1045987606048584, 'learning_rate': 1.5967424666648253e-05, 'epoch': 0.95} +2025-02-05 17:58:34 - ERROR - stderr - 32%|███▏ | 7125/22434 [7:50:54<11:02:34, 2.60s/it] +2025-02-05 17:58:37 - ERROR - stderr - 32%|███▏ | 7126/22434 [7:50:56<10:56:55, 2.57s/it] +2025-02-05 17:58:37 - ERROR - stderr - +2025-02-05 17:58:37 - ERROR - stderr - +2025-02-05 17:58:37 - INFO - stdout - {'loss': 0.7952, 'grad_norm': 0.9412549734115601, 'learning_rate': 1.59662660947868e-05, 'epoch': 0.95} +2025-02-05 17:58:37 - ERROR - stderr - 32%|███▏ | 7126/22434 [7:50:56<10:56:55, 2.57s/it] +2025-02-05 17:58:39 - ERROR - stderr - 32%|███▏ | 7127/22434 [7:50:59<10:56:20, 2.57s/it] +2025-02-05 17:58:39 - ERROR - stderr - +2025-02-05 17:58:39 - ERROR - stderr - +2025-02-05 17:58:39 - INFO - stdout 
- {'loss': 0.9673, 'grad_norm': 1.1132361888885498, 'learning_rate': 1.5965107398564228e-05, 'epoch': 0.95} +2025-02-05 17:58:39 - ERROR - stderr - 32%|███▏ | 7127/22434 [7:50:59<10:56:20, 2.57s/it] +2025-02-05 17:58:42 - ERROR - stderr - 32%|███▏ | 7128/22434 [7:51:01<10:48:59, 2.54s/it] +2025-02-05 17:58:42 - ERROR - stderr - +2025-02-05 17:58:42 - ERROR - stderr - +2025-02-05 17:58:42 - INFO - stdout - {'loss': 1.0005, 'grad_norm': 1.2136839628219604, 'learning_rate': 1.5963948578004708e-05, 'epoch': 0.95} +2025-02-05 17:58:42 - ERROR - stderr - 32%|███▏ | 7128/22434 [7:51:02<10:48:59, 2.54s/it] +2025-02-05 17:58:44 - ERROR - stderr - 32%|███▏ | 7129/22434 [7:51:04<10:46:04, 2.53s/it] +2025-02-05 17:58:44 - ERROR - stderr - +2025-02-05 17:58:44 - ERROR - stderr - +2025-02-05 17:58:44 - INFO - stdout - {'loss': 0.874, 'grad_norm': 0.9749768376350403, 'learning_rate': 1.5962789633132383e-05, 'epoch': 0.95} +2025-02-05 17:58:44 - ERROR - stderr - 32%|███▏ | 7129/22434 [7:51:04<10:46:04, 2.53s/it] +2025-02-05 17:58:47 - ERROR - stderr - 32%|███▏ | 7130/22434 [7:51:07<10:46:41, 2.54s/it] +2025-02-05 17:58:47 - ERROR - stderr - +2025-02-05 17:58:47 - ERROR - stderr - +2025-02-05 17:58:47 - INFO - stdout - {'loss': 0.9864, 'grad_norm': 1.067405104637146, 'learning_rate': 1.5961630563971414e-05, 'epoch': 0.95} +2025-02-05 17:58:47 - ERROR - stderr - 32%|███▏ | 7130/22434 [7:51:07<10:46:41, 2.54s/it] +2025-02-05 17:58:49 - ERROR - stderr - 32%|███▏ | 7131/22434 [7:51:09<10:43:21, 2.52s/it] +2025-02-05 17:58:49 - ERROR - stderr - +2025-02-05 17:58:49 - ERROR - stderr - +2025-02-05 17:58:49 - INFO - stdout - {'loss': 0.8438, 'grad_norm': 1.035555362701416, 'learning_rate': 1.5960471370545962e-05, 'epoch': 0.95} +2025-02-05 17:58:49 - ERROR - stderr - 32%|███▏ | 7131/22434 [7:51:09<10:43:21, 2.52s/it] +2025-02-05 17:58:52 - ERROR - stderr - 32%|███▏ | 7132/22434 [7:51:12<10:56:51, 2.58s/it] +2025-02-05 17:58:52 - ERROR - stderr - +2025-02-05 17:58:52 - ERROR - stderr - 
+2025-02-05 17:58:52 - INFO - stdout - {'loss': 0.9101, 'grad_norm': 1.10111665725708, 'learning_rate': 1.595931205288019e-05, 'epoch': 0.95} +2025-02-05 17:58:52 - ERROR - stderr - 32%|███▏ | 7132/22434 [7:51:12<10:56:51, 2.58s/it] +2025-02-05 17:58:54 - ERROR - stderr - 32%|███▏ | 7133/22434 [7:51:14<10:52:40, 2.56s/it] +2025-02-05 17:58:54 - ERROR - stderr - +2025-02-05 17:58:54 - ERROR - stderr - +2025-02-05 17:58:54 - INFO - stdout - {'loss': 0.8642, 'grad_norm': 1.0182299613952637, 'learning_rate': 1.595815261099826e-05, 'epoch': 0.95} +2025-02-05 17:58:54 - ERROR - stderr - 32%|███▏ | 7133/22434 [7:51:14<10:52:40, 2.56s/it] +2025-02-05 17:58:57 - ERROR - stderr - 32%|███▏ | 7134/22434 [7:51:17<10:48:32, 2.54s/it] +2025-02-05 17:58:57 - ERROR - stderr - +2025-02-05 17:58:57 - ERROR - stderr - +2025-02-05 17:58:57 - INFO - stdout - {'loss': 0.7498, 'grad_norm': 0.957973837852478, 'learning_rate': 1.5956993044924334e-05, 'epoch': 0.95} +2025-02-05 17:58:57 - ERROR - stderr - 32%|███▏ | 7134/22434 [7:51:17<10:48:32, 2.54s/it] +2025-02-05 17:58:59 - ERROR - stderr - 32%|███▏ | 7135/22434 [7:51:19<10:45:42, 2.53s/it] +2025-02-05 17:59:00 - ERROR - stderr - +2025-02-05 17:59:00 - ERROR - stderr - +2025-02-05 17:59:00 - INFO - stdout - {'loss': 0.888, 'grad_norm': 0.9944035410881042, 'learning_rate': 1.5955833354682593e-05, 'epoch': 0.95} +2025-02-05 17:59:00 - ERROR - stderr - 32%|███▏ | 7135/22434 [7:51:19<10:45:42, 2.53s/it] +2025-02-05 17:59:02 - ERROR - stderr - 32%|███▏ | 7136/22434 [7:51:22<10:48:34, 2.54s/it] +2025-02-05 17:59:02 - ERROR - stderr - +2025-02-05 17:59:02 - ERROR - stderr - +2025-02-05 17:59:02 - INFO - stdout - {'loss': 0.8378, 'grad_norm': 1.026961088180542, 'learning_rate': 1.5954673540297205e-05, 'epoch': 0.95} +2025-02-05 17:59:02 - ERROR - stderr - 32%|███▏ | 7136/22434 [7:51:22<10:48:34, 2.54s/it] +2025-02-05 17:59:05 - ERROR - stderr - 32%|███▏ | 7137/22434 [7:51:24<10:54:54, 2.57s/it] +2025-02-05 17:59:05 - ERROR - stderr - +2025-02-05 
17:59:05 - ERROR - stderr - +2025-02-05 17:59:05 - INFO - stdout - {'loss': 0.8807, 'grad_norm': 1.0202935934066772, 'learning_rate': 1.5953513601792346e-05, 'epoch': 0.95} +2025-02-05 17:59:05 - ERROR - stderr - 32%|███▏ | 7137/22434 [7:51:24<10:54:54, 2.57s/it] +2025-02-05 17:59:07 - ERROR - stderr - 32%|███▏ | 7138/22434 [7:51:27<10:53:01, 2.56s/it] +2025-02-05 17:59:07 - ERROR - stderr - +2025-02-05 17:59:07 - ERROR - stderr - +2025-02-05 17:59:07 - INFO - stdout - {'loss': 0.9564, 'grad_norm': 1.1004679203033447, 'learning_rate': 1.595235353919219e-05, 'epoch': 0.95} +2025-02-05 17:59:07 - ERROR - stderr - 32%|███▏ | 7138/22434 [7:51:27<10:53:01, 2.56s/it] +2025-02-05 17:59:10 - ERROR - stderr - 32%|███▏ | 7139/22434 [7:51:29<10:48:45, 2.55s/it] +2025-02-05 17:59:10 - ERROR - stderr - +2025-02-05 17:59:10 - ERROR - stderr - +2025-02-05 17:59:10 - INFO - stdout - {'loss': 1.0089, 'grad_norm': 0.9983121156692505, 'learning_rate': 1.5951193352520918e-05, 'epoch': 0.95} +2025-02-05 17:59:10 - ERROR - stderr - 32%|███▏ | 7139/22434 [7:51:30<10:48:45, 2.55s/it] +2025-02-05 17:59:12 - ERROR - stderr - 32%|███▏ | 7140/22434 [7:51:32<10:43:54, 2.53s/it] +2025-02-05 17:59:12 - ERROR - stderr - +2025-02-05 17:59:12 - ERROR - stderr - +2025-02-05 17:59:12 - INFO - stdout - {'loss': 0.944, 'grad_norm': 1.105841040611267, 'learning_rate': 1.595003304180272e-05, 'epoch': 0.95} +2025-02-05 17:59:12 - ERROR - stderr - 32%|███▏ | 7140/22434 [7:51:32<10:43:54, 2.53s/it] +2025-02-05 17:59:15 - ERROR - stderr - 32%|███▏ | 7141/22434 [7:51:35<10:50:33, 2.55s/it] +2025-02-05 17:59:15 - ERROR - stderr - +2025-02-05 17:59:15 - ERROR - stderr - +2025-02-05 17:59:15 - INFO - stdout - {'loss': 0.8909, 'grad_norm': 0.9592905044555664, 'learning_rate': 1.5948872607061777e-05, 'epoch': 0.95} +2025-02-05 17:59:15 - ERROR - stderr - 32%|███▏ | 7141/22434 [7:51:35<10:50:33, 2.55s/it] +2025-02-05 17:59:17 - ERROR - stderr - 32%|███▏ | 7142/22434 [7:51:37<10:44:25, 2.53s/it] +2025-02-05 17:59:17 
- ERROR - stderr - +2025-02-05 17:59:17 - ERROR - stderr - +2025-02-05 17:59:17 - INFO - stdout - {'loss': 1.0262, 'grad_norm': 1.1284663677215576, 'learning_rate': 1.5947712048322273e-05, 'epoch': 0.96} +2025-02-05 17:59:17 - ERROR - stderr - 32%|███▏ | 7142/22434 [7:51:37<10:44:25, 2.53s/it] +2025-02-05 17:59:20 - ERROR - stderr - 32%|███▏ | 7143/22434 [7:51:40<10:42:34, 2.52s/it] +2025-02-05 17:59:20 - ERROR - stderr - +2025-02-05 17:59:20 - ERROR - stderr - +2025-02-05 17:59:20 - INFO - stdout - {'loss': 0.8187, 'grad_norm': 1.0334285497665405, 'learning_rate': 1.594655136560841e-05, 'epoch': 0.96} +2025-02-05 17:59:20 - ERROR - stderr - 32%|███▏ | 7143/22434 [7:51:40<10:42:34, 2.52s/it] +2025-02-05 17:59:22 - ERROR - stderr - 32%|███▏ | 7144/22434 [7:51:42<10:50:59, 2.55s/it] +2025-02-05 17:59:22 - ERROR - stderr - +2025-02-05 17:59:22 - ERROR - stderr - +2025-02-05 17:59:22 - INFO - stdout - {'loss': 0.965, 'grad_norm': 1.04921555519104, 'learning_rate': 1.5945390558944368e-05, 'epoch': 0.96} +2025-02-05 17:59:22 - ERROR - stderr - 32%|███▏ | 7144/22434 [7:51:42<10:50:59, 2.55s/it] +2025-02-05 17:59:25 - ERROR - stderr - 32%|███▏ | 7145/22434 [7:51:45<10:44:10, 2.53s/it] +2025-02-05 17:59:25 - ERROR - stderr - +2025-02-05 17:59:25 - ERROR - stderr - +2025-02-05 17:59:25 - INFO - stdout - {'loss': 0.9663, 'grad_norm': 1.1667208671569824, 'learning_rate': 1.594422962835435e-05, 'epoch': 0.96} +2025-02-05 17:59:25 - ERROR - stderr - 32%|███▏ | 7145/22434 [7:51:45<10:44:10, 2.53s/it] +2025-02-05 17:59:27 - ERROR - stderr - 32%|███▏ | 7146/22434 [7:51:47<10:44:48, 2.53s/it] +2025-02-05 17:59:27 - ERROR - stderr - +2025-02-05 17:59:27 - ERROR - stderr - +2025-02-05 17:59:27 - INFO - stdout - {'loss': 1.0056, 'grad_norm': 1.188725471496582, 'learning_rate': 1.5943068573862554e-05, 'epoch': 0.96} +2025-02-05 17:59:27 - ERROR - stderr - 32%|███▏ | 7146/22434 [7:51:47<10:44:48, 2.53s/it] +2025-02-05 17:59:30 - ERROR - stderr - 32%|███▏ | 7147/22434 [7:51:50<10:39:07, 
2.51s/it] +2025-02-05 17:59:30 - ERROR - stderr - +2025-02-05 17:59:30 - ERROR - stderr - +2025-02-05 17:59:30 - INFO - stdout - {'loss': 0.8296, 'grad_norm': 1.0738738775253296, 'learning_rate': 1.594190739549318e-05, 'epoch': 0.96} +2025-02-05 17:59:30 - ERROR - stderr - 32%|███▏ | 7147/22434 [7:51:50<10:39:07, 2.51s/it] +2025-02-05 17:59:32 - ERROR - stderr - 32%|███▏ | 7148/22434 [7:51:52<10:34:42, 2.49s/it] +2025-02-05 17:59:32 - ERROR - stderr - +2025-02-05 17:59:32 - ERROR - stderr - +2025-02-05 17:59:32 - INFO - stdout - {'loss': 0.9642, 'grad_norm': 1.0802658796310425, 'learning_rate': 1.594074609327043e-05, 'epoch': 0.96} +2025-02-05 17:59:32 - ERROR - stderr - 32%|███▏ | 7148/22434 [7:51:52<10:34:42, 2.49s/it] +2025-02-05 17:59:35 - ERROR - stderr - 32%|███▏ | 7149/22434 [7:51:55<10:36:56, 2.50s/it] +2025-02-05 17:59:35 - ERROR - stderr - +2025-02-05 17:59:35 - ERROR - stderr - +2025-02-05 17:59:35 - INFO - stdout - {'loss': 0.9603, 'grad_norm': 1.1569119691848755, 'learning_rate': 1.5939584667218517e-05, 'epoch': 0.96} +2025-02-05 17:59:35 - ERROR - stderr - 32%|███▏ | 7149/22434 [7:51:55<10:36:56, 2.50s/it] +2025-02-05 17:59:37 - ERROR - stderr - 32%|███▏ | 7150/22434 [7:51:57<10:39:59, 2.51s/it] +2025-02-05 17:59:37 - ERROR - stderr - +2025-02-05 17:59:37 - ERROR - stderr - +2025-02-05 17:59:37 - INFO - stdout - {'loss': 0.8927, 'grad_norm': 0.9737552404403687, 'learning_rate': 1.5938423117361642e-05, 'epoch': 0.96} +2025-02-05 17:59:37 - ERROR - stderr - 32%|███▏ | 7150/22434 [7:51:57<10:39:59, 2.51s/it] +2025-02-05 17:59:40 - ERROR - stderr - 32%|███▏ | 7151/22434 [7:52:00<10:42:01, 2.52s/it] +2025-02-05 17:59:40 - ERROR - stderr - +2025-02-05 17:59:40 - ERROR - stderr - +2025-02-05 17:59:40 - INFO - stdout - {'loss': 0.9489, 'grad_norm': 1.00571870803833, 'learning_rate': 1.593726144372402e-05, 'epoch': 0.96} +2025-02-05 17:59:40 - ERROR - stderr - 32%|███▏ | 7151/22434 [7:52:00<10:42:01, 2.52s/it] +2025-02-05 17:59:42 - ERROR - stderr - 32%|███▏ | 
7152/22434 [7:52:02<10:41:07, 2.52s/it] +2025-02-05 17:59:42 - ERROR - stderr - +2025-02-05 17:59:42 - ERROR - stderr - +2025-02-05 17:59:42 - INFO - stdout - {'loss': 0.8536, 'grad_norm': 1.0208015441894531, 'learning_rate': 1.5936099646329865e-05, 'epoch': 0.96} +2025-02-05 17:59:42 - ERROR - stderr - 32%|███▏ | 7152/22434 [7:52:02<10:41:07, 2.52s/it] +2025-02-05 17:59:45 - ERROR - stderr - 32%|███▏ | 7153/22434 [7:52:05<10:43:46, 2.53s/it] +2025-02-05 17:59:45 - ERROR - stderr - +2025-02-05 17:59:45 - ERROR - stderr - +2025-02-05 17:59:45 - INFO - stdout - {'loss': 1.0031, 'grad_norm': 1.0271661281585693, 'learning_rate': 1.5934937725203396e-05, 'epoch': 0.96} +2025-02-05 17:59:45 - ERROR - stderr - 32%|███▏ | 7153/22434 [7:52:05<10:43:46, 2.53s/it] +2025-02-05 17:59:47 - ERROR - stderr - 32%|███▏ | 7154/22434 [7:52:07<10:38:32, 2.51s/it] +2025-02-05 17:59:47 - ERROR - stderr - +2025-02-05 17:59:47 - ERROR - stderr - +2025-02-05 17:59:47 - INFO - stdout - {'loss': 1.0142, 'grad_norm': 1.0311570167541504, 'learning_rate': 1.5933775680368825e-05, 'epoch': 0.96} +2025-02-05 17:59:47 - ERROR - stderr - 32%|███▏ | 7154/22434 [7:52:07<10:38:32, 2.51s/it] +2025-02-05 17:59:50 - ERROR - stderr - 32%|███▏ | 7155/22434 [7:52:10<11:04:41, 2.61s/it] +2025-02-05 17:59:50 - ERROR - stderr - +2025-02-05 17:59:50 - ERROR - stderr - +2025-02-05 17:59:50 - INFO - stdout - {'loss': 0.9385, 'grad_norm': 1.034283995628357, 'learning_rate': 1.5932613511850378e-05, 'epoch': 0.96} +2025-02-05 17:59:50 - ERROR - stderr - 32%|███▏ | 7155/22434 [7:52:10<11:04:41, 2.61s/it] +2025-02-05 17:59:53 - ERROR - stderr - 32%|███▏ | 7156/22434 [7:52:13<10:52:52, 2.56s/it] +2025-02-05 17:59:53 - ERROR - stderr - +2025-02-05 17:59:53 - ERROR - stderr - +2025-02-05 17:59:53 - INFO - stdout - {'loss': 0.9521, 'grad_norm': 1.1296002864837646, 'learning_rate': 1.593145121967228e-05, 'epoch': 0.96} +2025-02-05 17:59:53 - ERROR - stderr - 32%|███▏ | 7156/22434 [7:52:13<10:52:52, 2.56s/it] +2025-02-05 
17:59:55 - ERROR - stderr - 32%|███▏ | 7157/22434 [7:52:15<10:48:05, 2.55s/it] +2025-02-05 17:59:55 - ERROR - stderr - +2025-02-05 17:59:55 - ERROR - stderr - +2025-02-05 17:59:55 - INFO - stdout - {'loss': 0.8716, 'grad_norm': 1.0069224834442139, 'learning_rate': 1.593028880385876e-05, 'epoch': 0.96} +2025-02-05 17:59:55 - ERROR - stderr - 32%|███▏ | 7157/22434 [7:52:15<10:48:05, 2.55s/it] +2025-02-05 17:59:58 - ERROR - stderr - 32%|███▏ | 7158/22434 [7:52:17<10:42:30, 2.52s/it] +2025-02-05 17:59:58 - ERROR - stderr - +2025-02-05 17:59:58 - ERROR - stderr - +2025-02-05 17:59:58 - INFO - stdout - {'loss': 0.947, 'grad_norm': 1.0381200313568115, 'learning_rate': 1.592912626443404e-05, 'epoch': 0.96} +2025-02-05 17:59:58 - ERROR - stderr - 32%|███▏ | 7158/22434 [7:52:18<10:42:30, 2.52s/it] +2025-02-05 18:00:00 - ERROR - stderr - 32%|███▏ | 7159/22434 [7:52:20<10:37:44, 2.51s/it] +2025-02-05 18:00:00 - ERROR - stderr - +2025-02-05 18:00:00 - ERROR - stderr - +2025-02-05 18:00:00 - INFO - stdout - {'loss': 0.8566, 'grad_norm': 0.9566755890846252, 'learning_rate': 1.5927963601422357e-05, 'epoch': 0.96} +2025-02-05 18:00:00 - ERROR - stderr - 32%|███▏ | 7159/22434 [7:52:20<10:37:44, 2.51s/it] +2025-02-05 18:00:03 - ERROR - stderr - 32%|███▏ | 7160/22434 [7:52:22<10:34:00, 2.49s/it] +2025-02-05 18:00:03 - ERROR - stderr - +2025-02-05 18:00:03 - ERROR - stderr - +2025-02-05 18:00:03 - INFO - stdout - {'loss': 0.9518, 'grad_norm': 1.0272274017333984, 'learning_rate': 1.5926800814847946e-05, 'epoch': 0.96} +2025-02-05 18:00:03 - ERROR - stderr - 32%|███▏ | 7160/22434 [7:52:22<10:34:00, 2.49s/it] +2025-02-05 18:00:05 - ERROR - stderr - 32%|███▏ | 7161/22434 [7:52:25<10:31:57, 2.48s/it] +2025-02-05 18:00:05 - ERROR - stderr - +2025-02-05 18:00:05 - ERROR - stderr - +2025-02-05 18:00:05 - INFO - stdout - {'loss': 0.8041, 'grad_norm': 0.9850673675537109, 'learning_rate': 1.5925637904735047e-05, 'epoch': 0.96} +2025-02-05 18:00:05 - ERROR - stderr - 32%|███▏ | 7161/22434 
[7:52:25<10:31:57, 2.48s/it] +2025-02-05 18:00:08 - ERROR - stderr - 32%|███▏ | 7162/22434 [7:52:27<10:42:12, 2.52s/it] +2025-02-05 18:00:08 - ERROR - stderr - +2025-02-05 18:00:08 - ERROR - stderr - +2025-02-05 18:00:08 - INFO - stdout - {'loss': 0.8249, 'grad_norm': 1.0681962966918945, 'learning_rate': 1.5924474871107892e-05, 'epoch': 0.96} +2025-02-05 18:00:08 - ERROR - stderr - 32%|███▏ | 7162/22434 [7:52:28<10:42:12, 2.52s/it] +2025-02-05 18:00:10 - ERROR - stderr - 32%|███▏ | 7163/22434 [7:52:30<10:49:45, 2.55s/it] +2025-02-05 18:00:10 - ERROR - stderr - +2025-02-05 18:00:10 - ERROR - stderr - +2025-02-05 18:00:10 - INFO - stdout - {'loss': 0.9851, 'grad_norm': 1.2337106466293335, 'learning_rate': 1.592331171399073e-05, 'epoch': 0.96} +2025-02-05 18:00:10 - ERROR - stderr - 32%|███▏ | 7163/22434 [7:52:30<10:49:45, 2.55s/it] +2025-02-05 18:00:13 - ERROR - stderr - 32%|███▏ | 7164/22434 [7:52:33<10:46:00, 2.54s/it] +2025-02-05 18:00:13 - ERROR - stderr - +2025-02-05 18:00:13 - ERROR - stderr - +2025-02-05 18:00:13 - INFO - stdout - {'loss': 0.9539, 'grad_norm': 1.004093885421753, 'learning_rate': 1.5922148433407802e-05, 'epoch': 0.96} +2025-02-05 18:00:13 - ERROR - stderr - 32%|███▏ | 7164/22434 [7:52:33<10:46:00, 2.54s/it] +2025-02-05 18:00:15 - ERROR - stderr - 32%|███▏ | 7165/22434 [7:52:35<10:44:58, 2.53s/it] +2025-02-05 18:00:15 - ERROR - stderr - +2025-02-05 18:00:15 - ERROR - stderr - +2025-02-05 18:00:15 - INFO - stdout - {'loss': 0.9826, 'grad_norm': 1.1471190452575684, 'learning_rate': 1.5920985029383357e-05, 'epoch': 0.96} +2025-02-05 18:00:15 - ERROR - stderr - 32%|███▏ | 7165/22434 [7:52:35<10:44:58, 2.53s/it] +2025-02-05 18:00:18 - ERROR - stderr - 32%|███▏ | 7166/22434 [7:52:38<10:36:32, 2.50s/it] +2025-02-05 18:00:18 - ERROR - stderr - +2025-02-05 18:00:18 - ERROR - stderr - +2025-02-05 18:00:18 - INFO - stdout - {'loss': 0.915, 'grad_norm': 1.1019096374511719, 'learning_rate': 1.5919821501941645e-05, 'epoch': 0.96} +2025-02-05 18:00:18 - ERROR 
- stderr - 32%|███▏ | 7166/22434 [7:52:38<10:36:32, 2.50s/it] +2025-02-05 18:00:20 - ERROR - stderr - 32%|███▏ | 7167/22434 [7:52:40<10:34:18, 2.49s/it] +2025-02-05 18:00:20 - ERROR - stderr - +2025-02-05 18:00:20 - ERROR - stderr - +2025-02-05 18:00:20 - INFO - stdout - {'loss': 0.8721, 'grad_norm': 1.000279188156128, 'learning_rate': 1.5918657851106914e-05, 'epoch': 0.96} +2025-02-05 18:00:20 - ERROR - stderr - 32%|███▏ | 7167/22434 [7:52:40<10:34:18, 2.49s/it] +2025-02-05 18:00:23 - ERROR - stderr - 32%|███▏ | 7168/22434 [7:52:43<10:32:59, 2.49s/it] +2025-02-05 18:00:23 - ERROR - stderr - +2025-02-05 18:00:23 - ERROR - stderr - +2025-02-05 18:00:23 - INFO - stdout - {'loss': 0.8033, 'grad_norm': 1.0223945379257202, 'learning_rate': 1.591749407690343e-05, 'epoch': 0.96} +2025-02-05 18:00:23 - ERROR - stderr - 32%|███▏ | 7168/22434 [7:52:43<10:32:59, 2.49s/it] +2025-02-05 18:00:25 - ERROR - stderr - 32%|███▏ | 7169/22434 [7:52:45<10:37:00, 2.50s/it] +2025-02-05 18:00:25 - ERROR - stderr - +2025-02-05 18:00:25 - ERROR - stderr - +2025-02-05 18:00:25 - INFO - stdout - {'loss': 0.9105, 'grad_norm': 1.02645742893219, 'learning_rate': 1.5916330179355443e-05, 'epoch': 0.96} +2025-02-05 18:00:25 - ERROR - stderr - 32%|███▏ | 7169/22434 [7:52:45<10:37:00, 2.50s/it] +2025-02-05 18:00:28 - ERROR - stderr - 32%|███▏ | 7170/22434 [7:52:48<10:40:49, 2.52s/it] +2025-02-05 18:00:28 - ERROR - stderr - +2025-02-05 18:00:28 - ERROR - stderr - +2025-02-05 18:00:28 - INFO - stdout - {'loss': 0.9061, 'grad_norm': 1.0516108274459839, 'learning_rate': 1.5915166158487213e-05, 'epoch': 0.96} +2025-02-05 18:00:28 - ERROR - stderr - 32%|███▏ | 7170/22434 [7:52:48<10:40:49, 2.52s/it] +2025-02-05 18:00:30 - ERROR - stderr - 32%|███▏ | 7171/22434 [7:52:50<10:42:57, 2.53s/it] +2025-02-05 18:00:30 - ERROR - stderr - +2025-02-05 18:00:30 - ERROR - stderr - +2025-02-05 18:00:30 - INFO - stdout - {'loss': 0.8048, 'grad_norm': 0.9885880947113037, 'learning_rate': 1.5914002014323004e-05, 'epoch': 
0.96} +2025-02-05 18:00:30 - ERROR - stderr - 32%|███▏ | 7171/22434 [7:52:50<10:42:57, 2.53s/it] +2025-02-05 18:00:33 - ERROR - stderr - 32%|███▏ | 7172/22434 [7:52:53<10:45:03, 2.54s/it] +2025-02-05 18:00:33 - ERROR - stderr - +2025-02-05 18:00:33 - ERROR - stderr - +2025-02-05 18:00:33 - INFO - stdout - {'loss': 0.8807, 'grad_norm': 1.195559024810791, 'learning_rate': 1.5912837746887086e-05, 'epoch': 0.96} +2025-02-05 18:00:33 - ERROR - stderr - 32%|███▏ | 7172/22434 [7:52:53<10:45:03, 2.54s/it] +2025-02-05 18:00:36 - ERROR - stderr - 32%|███▏ | 7173/22434 [7:52:55<10:54:14, 2.57s/it] +2025-02-05 18:00:36 - ERROR - stderr - +2025-02-05 18:00:36 - ERROR - stderr - +2025-02-05 18:00:36 - INFO - stdout - {'loss': 0.9439, 'grad_norm': 0.9748232364654541, 'learning_rate': 1.591167335620372e-05, 'epoch': 0.96} +2025-02-05 18:00:36 - ERROR - stderr - 32%|███▏ | 7173/22434 [7:52:55<10:54:14, 2.57s/it] +2025-02-05 18:00:38 - ERROR - stderr - 32%|███▏ | 7174/22434 [7:52:58<11:15:05, 2.65s/it] +2025-02-05 18:00:39 - ERROR - stderr - +2025-02-05 18:00:39 - ERROR - stderr - +2025-02-05 18:00:39 - INFO - stdout - {'loss': 0.984, 'grad_norm': 1.0711393356323242, 'learning_rate': 1.591050884229718e-05, 'epoch': 0.96} +2025-02-05 18:00:39 - ERROR - stderr - 32%|███▏ | 7174/22434 [7:52:58<11:15:05, 2.65s/it] +2025-02-05 18:00:41 - ERROR - stderr - 32%|███▏ | 7175/22434 [7:53:01<11:29:56, 2.71s/it] +2025-02-05 18:00:41 - ERROR - stderr - +2025-02-05 18:00:41 - ERROR - stderr - +2025-02-05 18:00:41 - INFO - stdout - {'loss': 0.8776, 'grad_norm': 1.0485951900482178, 'learning_rate': 1.590934420519174e-05, 'epoch': 0.96} +2025-02-05 18:00:41 - ERROR - stderr - 32%|███▏ | 7175/22434 [7:53:01<11:29:56, 2.71s/it] +2025-02-05 18:00:44 - ERROR - stderr - 32%|███▏ | 7176/22434 [7:53:04<11:17:40, 2.66s/it] +2025-02-05 18:00:44 - ERROR - stderr - +2025-02-05 18:00:44 - ERROR - stderr - +2025-02-05 18:00:44 - INFO - stdout - {'loss': 0.9949, 'grad_norm': 1.1060349941253662, 'learning_rate': 
1.5908179444911676e-05, 'epoch': 0.96} +2025-02-05 18:00:44 - ERROR - stderr - 32%|███▏ | 7176/22434 [7:53:04<11:17:40, 2.66s/it] +2025-02-05 18:00:46 - ERROR - stderr - 32%|███▏ | 7177/22434 [7:53:06<11:07:49, 2.63s/it] +2025-02-05 18:00:46 - ERROR - stderr - +2025-02-05 18:00:46 - ERROR - stderr - +2025-02-05 18:00:46 - INFO - stdout - {'loss': 0.8684, 'grad_norm': 1.109278917312622, 'learning_rate': 1.590701456148126e-05, 'epoch': 0.96} +2025-02-05 18:00:46 - ERROR - stderr - 32%|███▏ | 7177/22434 [7:53:06<11:07:49, 2.63s/it] +2025-02-05 18:00:49 - ERROR - stderr - 32%|███▏ | 7178/22434 [7:53:09<10:54:10, 2.57s/it] +2025-02-05 18:00:49 - ERROR - stderr - +2025-02-05 18:00:49 - ERROR - stderr - +2025-02-05 18:00:49 - INFO - stdout - {'loss': 0.9653, 'grad_norm': 1.2605009078979492, 'learning_rate': 1.5905849554924782e-05, 'epoch': 0.96} +2025-02-05 18:00:49 - ERROR - stderr - 32%|███▏ | 7178/22434 [7:53:09<10:54:10, 2.57s/it] +2025-02-05 18:00:51 - ERROR - stderr - 32%|███▏ | 7179/22434 [7:53:11<10:51:37, 2.56s/it] +2025-02-05 18:00:51 - ERROR - stderr - +2025-02-05 18:00:51 - ERROR - stderr - +2025-02-05 18:00:51 - INFO - stdout - {'loss': 0.9289, 'grad_norm': 0.9390795230865479, 'learning_rate': 1.590468442526652e-05, 'epoch': 0.96} +2025-02-05 18:00:51 - ERROR - stderr - 32%|███▏ | 7179/22434 [7:53:11<10:51:37, 2.56s/it] +2025-02-05 18:00:54 - ERROR - stderr - 32%|███▏ | 7180/22434 [7:53:14<11:04:59, 2.62s/it] +2025-02-05 18:00:54 - ERROR - stderr - +2025-02-05 18:00:54 - ERROR - stderr - +2025-02-05 18:00:54 - INFO - stdout - {'loss': 0.9498, 'grad_norm': 1.0866782665252686, 'learning_rate': 1.5903519172530762e-05, 'epoch': 0.96} +2025-02-05 18:00:54 - ERROR - stderr - 32%|███▏ | 7180/22434 [7:53:14<11:04:59, 2.62s/it] +2025-02-05 18:00:57 - ERROR - stderr - 32%|███▏ | 7181/22434 [7:53:16<10:58:34, 2.59s/it] +2025-02-05 18:00:57 - ERROR - stderr - +2025-02-05 18:00:57 - ERROR - stderr - +2025-02-05 18:00:57 - INFO - stdout - {'loss': 0.9013, 'grad_norm': 
1.0394997596740723, 'learning_rate': 1.5902353796741796e-05, 'epoch': 0.96} +2025-02-05 18:00:57 - ERROR - stderr - 32%|███▏ | 7181/22434 [7:53:16<10:58:34, 2.59s/it] +2025-02-05 18:00:59 - ERROR - stderr - 32%|███▏ | 7182/22434 [7:53:19<11:06:03, 2.62s/it] +2025-02-05 18:00:59 - ERROR - stderr - +2025-02-05 18:00:59 - ERROR - stderr - +2025-02-05 18:00:59 - INFO - stdout - {'loss': 0.918, 'grad_norm': 1.0646930932998657, 'learning_rate': 1.5901188297923914e-05, 'epoch': 0.96} +2025-02-05 18:00:59 - ERROR - stderr - 32%|███▏ | 7182/22434 [7:53:19<11:06:03, 2.62s/it] +2025-02-05 18:01:02 - ERROR - stderr - 32%|███▏ | 7183/22434 [7:53:22<11:01:19, 2.60s/it] +2025-02-05 18:01:02 - ERROR - stderr - +2025-02-05 18:01:02 - ERROR - stderr - +2025-02-05 18:01:02 - INFO - stdout - {'loss': 0.9694, 'grad_norm': 1.0803992748260498, 'learning_rate': 1.5900022676101404e-05, 'epoch': 0.96} +2025-02-05 18:01:02 - ERROR - stderr - 32%|███▏ | 7183/22434 [7:53:22<11:01:19, 2.60s/it] +2025-02-05 18:01:04 - ERROR - stderr - 32%|███▏ | 7184/22434 [7:53:24<10:59:00, 2.59s/it] +2025-02-05 18:01:05 - ERROR - stderr - +2025-02-05 18:01:05 - ERROR - stderr - +2025-02-05 18:01:05 - INFO - stdout - {'loss': 0.8975, 'grad_norm': 0.9850744009017944, 'learning_rate': 1.589885693129857e-05, 'epoch': 0.96} +2025-02-05 18:01:05 - ERROR - stderr - 32%|███▏ | 7184/22434 [7:53:24<10:59:00, 2.59s/it] +2025-02-05 18:01:07 - ERROR - stderr - 32%|███▏ | 7185/22434 [7:53:27<10:56:15, 2.58s/it] +2025-02-05 18:01:07 - ERROR - stderr - +2025-02-05 18:01:07 - ERROR - stderr - +2025-02-05 18:01:07 - INFO - stdout - {'loss': 0.8876, 'grad_norm': 1.1196023225784302, 'learning_rate': 1.589769106353971e-05, 'epoch': 0.96} +2025-02-05 18:01:07 - ERROR - stderr - 32%|███▏ | 7185/22434 [7:53:27<10:56:15, 2.58s/it] +2025-02-05 18:01:10 - ERROR - stderr - 32%|███▏ | 7186/22434 [7:53:29<10:53:39, 2.57s/it] +2025-02-05 18:01:10 - ERROR - stderr - +2025-02-05 18:01:10 - ERROR - stderr - +2025-02-05 18:01:10 - INFO - stdout 
- {'loss': 0.8198, 'grad_norm': 1.1130235195159912, 'learning_rate': 1.589652507284912e-05, 'epoch': 0.96} +2025-02-05 18:01:10 - ERROR - stderr - 32%|███▏ | 7186/22434 [7:53:29<10:53:39, 2.57s/it] +2025-02-05 18:01:12 - ERROR - stderr - 32%|███▏ | 7187/22434 [7:53:32<10:47:14, 2.55s/it] +2025-02-05 18:01:12 - ERROR - stderr - +2025-02-05 18:01:12 - ERROR - stderr - +2025-02-05 18:01:12 - INFO - stdout - {'loss': 1.0711, 'grad_norm': 1.2156789302825928, 'learning_rate': 1.5895358959251107e-05, 'epoch': 0.96} +2025-02-05 18:01:12 - ERROR - stderr - 32%|███▏ | 7187/22434 [7:53:32<10:47:14, 2.55s/it] +2025-02-05 18:01:15 - ERROR - stderr - 32%|███▏ | 7188/22434 [7:53:35<11:07:25, 2.63s/it] +2025-02-05 18:01:15 - ERROR - stderr - +2025-02-05 18:01:15 - ERROR - stderr - +2025-02-05 18:01:15 - INFO - stdout - {'loss': 0.9077, 'grad_norm': 1.0984864234924316, 'learning_rate': 1.5894192722769984e-05, 'epoch': 0.96} +2025-02-05 18:01:15 - ERROR - stderr - 32%|███▏ | 7188/22434 [7:53:35<11:07:25, 2.63s/it] +2025-02-05 18:01:17 - ERROR - stderr - 32%|███▏ | 7189/22434 [7:53:37<10:50:59, 2.56s/it] +2025-02-05 18:01:17 - ERROR - stderr - +2025-02-05 18:01:17 - ERROR - stderr - +2025-02-05 18:01:17 - INFO - stdout - {'loss': 0.9039, 'grad_norm': 1.096029281616211, 'learning_rate': 1.5893026363430046e-05, 'epoch': 0.96} +2025-02-05 18:01:17 - ERROR - stderr - 32%|███▏ | 7189/22434 [7:53:37<10:50:59, 2.56s/it] +2025-02-05 18:01:20 - ERROR - stderr - 32%|███▏ | 7190/22434 [7:53:40<10:47:19, 2.55s/it] +2025-02-05 18:01:20 - ERROR - stderr - +2025-02-05 18:01:20 - ERROR - stderr - +2025-02-05 18:01:20 - INFO - stdout - {'loss': 0.9623, 'grad_norm': 1.1097468137741089, 'learning_rate': 1.5891859881255617e-05, 'epoch': 0.96} +2025-02-05 18:01:20 - ERROR - stderr - 32%|███▏ | 7190/22434 [7:53:40<10:47:19, 2.55s/it] +2025-02-05 18:01:22 - ERROR - stderr - 32%|███▏ | 7191/22434 [7:53:42<10:37:19, 2.51s/it] +2025-02-05 18:01:22 - ERROR - stderr - +2025-02-05 18:01:22 - ERROR - stderr - 
+2025-02-05 18:01:22 - INFO - stdout - {'loss': 1.0019, 'grad_norm': 1.152674674987793, 'learning_rate': 1.5890693276271005e-05, 'epoch': 0.96} +2025-02-05 18:01:22 - ERROR - stderr - 32%|███▏ | 7191/22434 [7:53:42<10:37:19, 2.51s/it] +2025-02-05 18:01:25 - ERROR - stderr - 32%|███▏ | 7192/22434 [7:53:45<10:37:27, 2.51s/it] +2025-02-05 18:01:25 - ERROR - stderr - +2025-02-05 18:01:25 - ERROR - stderr - +2025-02-05 18:01:25 - INFO - stdout - {'loss': 0.9085, 'grad_norm': 1.0698479413986206, 'learning_rate': 1.588952654850053e-05, 'epoch': 0.96} +2025-02-05 18:01:25 - ERROR - stderr - 32%|███▏ | 7192/22434 [7:53:45<10:37:27, 2.51s/it] +2025-02-05 18:01:27 - ERROR - stderr - 32%|███▏ | 7193/22434 [7:53:47<10:34:42, 2.50s/it] +2025-02-05 18:01:27 - ERROR - stderr - +2025-02-05 18:01:27 - ERROR - stderr - +2025-02-05 18:01:27 - INFO - stdout - {'loss': 0.9388, 'grad_norm': 1.0701878070831299, 'learning_rate': 1.588835969796851e-05, 'epoch': 0.96} +2025-02-05 18:01:27 - ERROR - stderr - 32%|███▏ | 7193/22434 [7:53:47<10:34:42, 2.50s/it] +2025-02-05 18:01:30 - ERROR - stderr - 32%|███▏ | 7194/22434 [7:53:50<10:55:15, 2.58s/it] +2025-02-05 18:01:30 - ERROR - stderr - +2025-02-05 18:01:30 - ERROR - stderr - +2025-02-05 18:01:30 - INFO - stdout - {'loss': 0.9001, 'grad_norm': 1.1567819118499756, 'learning_rate': 1.5887192724699263e-05, 'epoch': 0.96} +2025-02-05 18:01:30 - ERROR - stderr - 32%|███▏ | 7194/22434 [7:53:50<10:55:15, 2.58s/it] +2025-02-05 18:01:33 - ERROR - stderr - 32%|███▏ | 7195/22434 [7:53:52<10:56:24, 2.58s/it] +2025-02-05 18:01:33 - ERROR - stderr - +2025-02-05 18:01:33 - ERROR - stderr - +2025-02-05 18:01:33 - INFO - stdout - {'loss': 0.9151, 'grad_norm': 1.0278853178024292, 'learning_rate': 1.588602562871712e-05, 'epoch': 0.96} +2025-02-05 18:01:33 - ERROR - stderr - 32%|███▏ | 7195/22434 [7:53:52<10:56:24, 2.58s/it] +2025-02-05 18:01:35 - ERROR - stderr - 32%|███▏ | 7196/22434 [7:53:55<10:48:25, 2.55s/it] +2025-02-05 18:01:35 - ERROR - stderr - 
+2025-02-05 18:01:35 - ERROR - stderr - +2025-02-05 18:01:35 - INFO - stdout - {'loss': 0.8306, 'grad_norm': 0.9728804230690002, 'learning_rate': 1.5884858410046403e-05, 'epoch': 0.96} +2025-02-05 18:01:35 - ERROR - stderr - 32%|███▏ | 7196/22434 [7:53:55<10:48:25, 2.55s/it] +2025-02-05 18:01:38 - ERROR - stderr - 32%|███▏ | 7197/22434 [7:53:58<11:21:36, 2.68s/it] +2025-02-05 18:01:38 - ERROR - stderr - +2025-02-05 18:01:38 - ERROR - stderr - +2025-02-05 18:01:38 - INFO - stdout - {'loss': 0.8942, 'grad_norm': 0.9693159461021423, 'learning_rate': 1.5883691068711445e-05, 'epoch': 0.96} +2025-02-05 18:01:38 - ERROR - stderr - 32%|███▏ | 7197/22434 [7:53:58<11:21:36, 2.68s/it] +2025-02-05 18:01:41 - ERROR - stderr - 32%|███▏ | 7198/22434 [7:54:00<11:06:30, 2.62s/it] +2025-02-05 18:01:41 - ERROR - stderr - +2025-02-05 18:01:41 - ERROR - stderr - +2025-02-05 18:01:41 - INFO - stdout - {'loss': 1.1167, 'grad_norm': 1.1105037927627563, 'learning_rate': 1.5882523604736576e-05, 'epoch': 0.96} +2025-02-05 18:01:41 - ERROR - stderr - 32%|███▏ | 7198/22434 [7:54:00<11:06:30, 2.62s/it] +2025-02-05 18:01:43 - ERROR - stderr - 32%|███▏ | 7199/22434 [7:54:03<10:54:44, 2.58s/it] +2025-02-05 18:01:43 - ERROR - stderr - +2025-02-05 18:01:43 - ERROR - stderr - +2025-02-05 18:01:43 - INFO - stdout - {'loss': 0.9998, 'grad_norm': 1.2085965871810913, 'learning_rate': 1.5881356018146132e-05, 'epoch': 0.96} +2025-02-05 18:01:43 - ERROR - stderr - 32%|███▏ | 7199/22434 [7:54:03<10:54:44, 2.58s/it] +2025-02-05 18:01:45 - ERROR - stderr - 32%|███▏ | 7200/22434 [7:54:05<10:46:43, 2.55s/it] +2025-02-05 18:01:46 - ERROR - stderr - +2025-02-05 18:01:46 - ERROR - stderr - +2025-02-05 18:01:46 - INFO - stdout - {'loss': 0.9018, 'grad_norm': 0.9997827410697937, 'learning_rate': 1.588018830896445e-05, 'epoch': 0.96} +2025-02-05 18:01:46 - ERROR - stderr - 32%|███▏ | 7200/22434 [7:54:05<10:46:43, 2.55s/it] +2025-02-05 18:01:48 - ERROR - stderr - 32%|███▏ | 7201/22434 [7:54:08<10:56:25, 2.59s/it] 
+2025-02-05 18:01:48 - ERROR - stderr - +2025-02-05 18:01:48 - ERROR - stderr - +2025-02-05 18:01:48 - INFO - stdout - {'loss': 0.9466, 'grad_norm': 0.8537271022796631, 'learning_rate': 1.587902047721587e-05, 'epoch': 0.96} +2025-02-05 18:01:48 - ERROR - stderr - 32%|███▏ | 7201/22434 [7:54:08<10:56:25, 2.59s/it] +2025-02-05 18:01:51 - ERROR - stderr - 32%|███▏ | 7202/22434 [7:54:10<10:48:23, 2.55s/it] +2025-02-05 18:01:51 - ERROR - stderr - +2025-02-05 18:01:51 - ERROR - stderr - +2025-02-05 18:01:51 - INFO - stdout - {'loss': 0.9128, 'grad_norm': 0.9956924319267273, 'learning_rate': 1.5877852522924733e-05, 'epoch': 0.96} +2025-02-05 18:01:51 - ERROR - stderr - 32%|███▏ | 7202/22434 [7:54:10<10:48:23, 2.55s/it] +2025-02-05 18:01:53 - ERROR - stderr - 32%|███▏ | 7203/22434 [7:54:13<10:47:10, 2.55s/it] +2025-02-05 18:01:53 - ERROR - stderr - +2025-02-05 18:01:53 - ERROR - stderr - +2025-02-05 18:01:53 - INFO - stdout - {'loss': 1.0041, 'grad_norm': 1.1981338262557983, 'learning_rate': 1.5876684446115383e-05, 'epoch': 0.96} +2025-02-05 18:01:53 - ERROR - stderr - 32%|███▏ | 7203/22434 [7:54:13<10:47:10, 2.55s/it] +2025-02-05 18:01:56 - ERROR - stderr - 32%|███▏ | 7204/22434 [7:54:15<10:45:32, 2.54s/it] +2025-02-05 18:01:56 - ERROR - stderr - +2025-02-05 18:01:56 - ERROR - stderr - +2025-02-05 18:01:56 - INFO - stdout - {'loss': 0.8941, 'grad_norm': 1.0202155113220215, 'learning_rate': 1.587551624681217e-05, 'epoch': 0.96} +2025-02-05 18:01:56 - ERROR - stderr - 32%|███▏ | 7204/22434 [7:54:16<10:45:32, 2.54s/it] +2025-02-05 18:01:58 - ERROR - stderr - 32%|███▏ | 7205/22434 [7:54:18<10:42:57, 2.53s/it] +2025-02-05 18:01:58 - ERROR - stderr - +2025-02-05 18:01:58 - ERROR - stderr - +2025-02-05 18:01:58 - INFO - stdout - {'loss': 0.9363, 'grad_norm': 1.1028200387954712, 'learning_rate': 1.5874347925039447e-05, 'epoch': 0.96} +2025-02-05 18:01:58 - ERROR - stderr - 32%|███▏ | 7205/22434 [7:54:18<10:42:57, 2.53s/it] +2025-02-05 18:02:01 - ERROR - stderr - 32%|███▏ | 
7206/22434 [7:54:21<10:42:48, 2.53s/it] +2025-02-05 18:02:01 - ERROR - stderr - +2025-02-05 18:02:01 - ERROR - stderr - +2025-02-05 18:02:01 - INFO - stdout - {'loss': 0.8347, 'grad_norm': 1.1392103433609009, 'learning_rate': 1.5873179480821558e-05, 'epoch': 0.96} +2025-02-05 18:02:01 - ERROR - stderr - 32%|███▏ | 7206/22434 [7:54:21<10:42:48, 2.53s/it] +2025-02-05 18:02:03 - ERROR - stderr - 32%|███▏ | 7207/22434 [7:54:23<10:37:29, 2.51s/it] +2025-02-05 18:02:03 - ERROR - stderr - +2025-02-05 18:02:03 - ERROR - stderr - +2025-02-05 18:02:03 - INFO - stdout - {'loss': 0.9658, 'grad_norm': 1.1301209926605225, 'learning_rate': 1.5872010914182864e-05, 'epoch': 0.96} +2025-02-05 18:02:03 - ERROR - stderr - 32%|███▏ | 7207/22434 [7:54:23<10:37:29, 2.51s/it] +2025-02-05 18:02:06 - ERROR - stderr - 32%|███▏ | 7208/22434 [7:54:26<10:50:52, 2.56s/it] +2025-02-05 18:02:06 - ERROR - stderr - +2025-02-05 18:02:06 - ERROR - stderr - +2025-02-05 18:02:06 - INFO - stdout - {'loss': 0.8847, 'grad_norm': 0.950319766998291, 'learning_rate': 1.5870842225147722e-05, 'epoch': 0.96} +2025-02-05 18:02:06 - ERROR - stderr - 32%|███▏ | 7208/22434 [7:54:26<10:50:52, 2.56s/it] +2025-02-05 18:02:08 - ERROR - stderr - 32%|███▏ | 7209/22434 [7:54:28<10:45:48, 2.55s/it] +2025-02-05 18:02:08 - ERROR - stderr - +2025-02-05 18:02:08 - ERROR - stderr - +2025-02-05 18:02:08 - INFO - stdout - {'loss': 0.8838, 'grad_norm': 1.048004150390625, 'learning_rate': 1.586967341374049e-05, 'epoch': 0.96} +2025-02-05 18:02:08 - ERROR - stderr - 32%|███▏ | 7209/22434 [7:54:28<10:45:48, 2.55s/it] +2025-02-05 18:02:11 - ERROR - stderr - 32%|███▏ | 7210/22434 [7:54:31<10:49:53, 2.56s/it] +2025-02-05 18:02:11 - ERROR - stderr - +2025-02-05 18:02:11 - ERROR - stderr - +2025-02-05 18:02:11 - INFO - stdout - {'loss': 0.9453, 'grad_norm': 0.9903548955917358, 'learning_rate': 1.5868504479985534e-05, 'epoch': 0.96} +2025-02-05 18:02:11 - ERROR - stderr - 32%|███▏ | 7210/22434 [7:54:31<10:49:53, 2.56s/it] +2025-02-05 
18:02:14 - ERROR - stderr - 32%|███▏ | 7211/22434 [7:54:33<10:46:52, 2.55s/it] +2025-02-05 18:02:14 - ERROR - stderr - +2025-02-05 18:02:14 - ERROR - stderr - +2025-02-05 18:02:14 - INFO - stdout - {'loss': 0.8521, 'grad_norm': 1.0336371660232544, 'learning_rate': 1.586733542390722e-05, 'epoch': 0.96} +2025-02-05 18:02:14 - ERROR - stderr - 32%|███▏ | 7211/22434 [7:54:33<10:46:52, 2.55s/it] +2025-02-05 18:02:16 - ERROR - stderr - 32%|███▏ | 7212/22434 [7:54:36<10:41:33, 2.53s/it] +2025-02-05 18:02:16 - ERROR - stderr - +2025-02-05 18:02:16 - ERROR - stderr - +2025-02-05 18:02:16 - INFO - stdout - {'loss': 0.8786, 'grad_norm': 1.1012523174285889, 'learning_rate': 1.586616624552991e-05, 'epoch': 0.96} +2025-02-05 18:02:16 - ERROR - stderr - 32%|███▏ | 7212/22434 [7:54:36<10:41:33, 2.53s/it] +2025-02-05 18:02:19 - ERROR - stderr - 32%|███▏ | 7213/22434 [7:54:38<10:46:41, 2.55s/it] +2025-02-05 18:02:19 - ERROR - stderr - +2025-02-05 18:02:19 - ERROR - stderr - +2025-02-05 18:02:19 - INFO - stdout - {'loss': 0.9128, 'grad_norm': 1.0312836170196533, 'learning_rate': 1.586499694487798e-05, 'epoch': 0.96} +2025-02-05 18:02:19 - ERROR - stderr - 32%|███▏ | 7213/22434 [7:54:38<10:46:41, 2.55s/it] +2025-02-05 18:02:21 - ERROR - stderr - 32%|███▏ | 7214/22434 [7:54:41<10:41:36, 2.53s/it] +2025-02-05 18:02:21 - ERROR - stderr - +2025-02-05 18:02:21 - ERROR - stderr - +2025-02-05 18:02:21 - INFO - stdout - {'loss': 0.8969, 'grad_norm': 1.0870500802993774, 'learning_rate': 1.58638275219758e-05, 'epoch': 0.96} +2025-02-05 18:02:21 - ERROR - stderr - 32%|███▏ | 7214/22434 [7:54:41<10:41:36, 2.53s/it] +2025-02-05 18:02:24 - ERROR - stderr - 32%|███▏ | 7215/22434 [7:54:43<10:37:40, 2.51s/it] +2025-02-05 18:02:24 - ERROR - stderr - +2025-02-05 18:02:24 - ERROR - stderr - +2025-02-05 18:02:24 - INFO - stdout - {'loss': 0.9953, 'grad_norm': 1.109235167503357, 'learning_rate': 1.5862657976847745e-05, 'epoch': 0.96} +2025-02-05 18:02:24 - ERROR - stderr - 32%|███▏ | 7215/22434 
[7:54:43<10:37:40, 2.51s/it] +2025-02-05 18:02:26 - ERROR - stderr - 32%|███▏ | 7216/22434 [7:54:46<10:49:23, 2.56s/it] +2025-02-05 18:02:26 - ERROR - stderr - +2025-02-05 18:02:26 - ERROR - stderr - +2025-02-05 18:02:26 - INFO - stdout - {'loss': 0.9205, 'grad_norm': 1.0422286987304688, 'learning_rate': 1.5861488309518193e-05, 'epoch': 0.96} +2025-02-05 18:02:26 - ERROR - stderr - 32%|███▏ | 7216/22434 [7:54:46<10:49:23, 2.56s/it] +2025-02-05 18:02:29 - ERROR - stderr - 32%|███▏ | 7217/22434 [7:54:48<10:44:36, 2.54s/it] +2025-02-05 18:02:29 - ERROR - stderr - +2025-02-05 18:02:29 - ERROR - stderr - +2025-02-05 18:02:29 - INFO - stdout - {'loss': 1.0459, 'grad_norm': 1.1091305017471313, 'learning_rate': 1.586031852001153e-05, 'epoch': 0.97} +2025-02-05 18:02:29 - ERROR - stderr - 32%|███▏ | 7217/22434 [7:54:49<10:44:36, 2.54s/it] +2025-02-05 18:02:31 - ERROR - stderr - 32%|███▏ | 7218/22434 [7:54:51<10:38:35, 2.52s/it] +2025-02-05 18:02:31 - ERROR - stderr - +2025-02-05 18:02:31 - ERROR - stderr - +2025-02-05 18:02:31 - INFO - stdout - {'loss': 0.8989, 'grad_norm': 0.9934311509132385, 'learning_rate': 1.5859148608352134e-05, 'epoch': 0.97} +2025-02-05 18:02:31 - ERROR - stderr - 32%|███▏ | 7218/22434 [7:54:51<10:38:35, 2.52s/it] +2025-02-05 18:02:34 - ERROR - stderr - 32%|███▏ | 7219/22434 [7:54:53<10:39:25, 2.52s/it] +2025-02-05 18:02:34 - ERROR - stderr - +2025-02-05 18:02:34 - ERROR - stderr - +2025-02-05 18:02:34 - INFO - stdout - {'loss': 0.8234, 'grad_norm': 0.9826712608337402, 'learning_rate': 1.585797857456439e-05, 'epoch': 0.97} +2025-02-05 18:02:34 - ERROR - stderr - 32%|███▏ | 7219/22434 [7:54:54<10:39:25, 2.52s/it] +2025-02-05 18:02:36 - ERROR - stderr - 32%|███▏ | 7220/22434 [7:54:56<10:41:59, 2.53s/it] +2025-02-05 18:02:36 - ERROR - stderr - +2025-02-05 18:02:36 - ERROR - stderr - +2025-02-05 18:02:36 - INFO - stdout - {'loss': 0.8972, 'grad_norm': 0.9722713828086853, 'learning_rate': 1.5856808418672688e-05, 'epoch': 0.97} +2025-02-05 18:02:36 - ERROR 
- stderr - 32%|███▏ | 7220/22434 [7:54:56<10:41:59, 2.53s/it] +2025-02-05 18:02:39 - ERROR - stderr - 32%|███▏ | 7221/22434 [7:54:58<10:35:48, 2.51s/it] +2025-02-05 18:02:39 - ERROR - stderr - +2025-02-05 18:02:39 - ERROR - stderr - +2025-02-05 18:02:39 - INFO - stdout - {'loss': 0.8801, 'grad_norm': 1.0086407661437988, 'learning_rate': 1.585563814070142e-05, 'epoch': 0.97} +2025-02-05 18:02:39 - ERROR - stderr - 32%|███▏ | 7221/22434 [7:54:59<10:35:48, 2.51s/it] +2025-02-05 18:02:41 - ERROR - stderr - 32%|███▏ | 7222/22434 [7:55:01<10:45:30, 2.55s/it] +2025-02-05 18:02:41 - ERROR - stderr - +2025-02-05 18:02:41 - ERROR - stderr - +2025-02-05 18:02:41 - INFO - stdout - {'loss': 0.8855, 'grad_norm': 1.0919053554534912, 'learning_rate': 1.5854467740674983e-05, 'epoch': 0.97} +2025-02-05 18:02:41 - ERROR - stderr - 32%|███▏ | 7222/22434 [7:55:01<10:45:30, 2.55s/it] +2025-02-05 18:02:44 - ERROR - stderr - 32%|███▏ | 7223/22434 [7:55:04<10:38:06, 2.52s/it] +2025-02-05 18:02:44 - ERROR - stderr - +2025-02-05 18:02:44 - ERROR - stderr - +2025-02-05 18:02:44 - INFO - stdout - {'loss': 0.8437, 'grad_norm': 1.0863865613937378, 'learning_rate': 1.585329721861776e-05, 'epoch': 0.97} +2025-02-05 18:02:44 - ERROR - stderr - 32%|███▏ | 7223/22434 [7:55:04<10:38:06, 2.52s/it] +2025-02-05 18:02:46 - ERROR - stderr - 32%|███▏ | 7224/22434 [7:55:06<10:39:33, 2.52s/it] +2025-02-05 18:02:46 - ERROR - stderr - +2025-02-05 18:02:46 - ERROR - stderr - +2025-02-05 18:02:46 - INFO - stdout - {'loss': 0.8242, 'grad_norm': 1.0866085290908813, 'learning_rate': 1.5852126574554162e-05, 'epoch': 0.97} +2025-02-05 18:02:46 - ERROR - stderr - 32%|███▏ | 7224/22434 [7:55:06<10:39:33, 2.52s/it] +2025-02-05 18:02:49 - ERROR - stderr - 32%|███▏ | 7225/22434 [7:55:09<10:34:43, 2.50s/it] +2025-02-05 18:02:49 - ERROR - stderr - +2025-02-05 18:02:49 - ERROR - stderr - +2025-02-05 18:02:49 - INFO - stdout - {'loss': 0.8387, 'grad_norm': 0.9943642616271973, 'learning_rate': 1.5850955808508582e-05, 'epoch': 
0.97} +2025-02-05 18:02:49 - ERROR - stderr - 32%|███▏ | 7225/22434 [7:55:09<10:34:43, 2.50s/it] +2025-02-05 18:02:51 - ERROR - stderr - 32%|███▏ | 7226/22434 [7:55:11<10:38:36, 2.52s/it] +2025-02-05 18:02:51 - ERROR - stderr - +2025-02-05 18:02:51 - ERROR - stderr - +2025-02-05 18:02:51 - INFO - stdout - {'loss': 0.9243, 'grad_norm': 0.9850141406059265, 'learning_rate': 1.5849784920505434e-05, 'epoch': 0.97} +2025-02-05 18:02:51 - ERROR - stderr - 32%|███▏ | 7226/22434 [7:55:11<10:38:36, 2.52s/it] +2025-02-05 18:02:54 - ERROR - stderr - 32%|███▏ | 7227/22434 [7:55:14<10:34:55, 2.51s/it] +2025-02-05 18:02:54 - ERROR - stderr - +2025-02-05 18:02:54 - ERROR - stderr - +2025-02-05 18:02:54 - INFO - stdout - {'loss': 0.9777, 'grad_norm': 1.0083836317062378, 'learning_rate': 1.584861391056911e-05, 'epoch': 0.97} +2025-02-05 18:02:54 - ERROR - stderr - 32%|███▏ | 7227/22434 [7:55:14<10:34:55, 2.51s/it] +2025-02-05 18:02:56 - ERROR - stderr - 32%|███▏ | 7228/22434 [7:55:16<10:32:11, 2.49s/it] +2025-02-05 18:02:56 - ERROR - stderr - +2025-02-05 18:02:56 - ERROR - stderr - +2025-02-05 18:02:56 - INFO - stdout - {'loss': 0.956, 'grad_norm': 1.0428203344345093, 'learning_rate': 1.5847442778724028e-05, 'epoch': 0.97} +2025-02-05 18:02:56 - ERROR - stderr - 32%|███▏ | 7228/22434 [7:55:16<10:32:11, 2.49s/it] +2025-02-05 18:02:59 - ERROR - stderr - 32%|███▏ | 7229/22434 [7:55:19<10:38:03, 2.52s/it] +2025-02-05 18:02:59 - ERROR - stderr - +2025-02-05 18:02:59 - ERROR - stderr - +2025-02-05 18:02:59 - INFO - stdout - {'loss': 0.9521, 'grad_norm': 1.1508619785308838, 'learning_rate': 1.5846271524994597e-05, 'epoch': 0.97} +2025-02-05 18:02:59 - ERROR - stderr - 32%|███▏ | 7229/22434 [7:55:19<10:38:03, 2.52s/it] +2025-02-05 18:03:01 - ERROR - stderr - 32%|███▏ | 7230/22434 [7:55:21<10:35:45, 2.51s/it] +2025-02-05 18:03:01 - ERROR - stderr - +2025-02-05 18:03:01 - ERROR - stderr - +2025-02-05 18:03:01 - INFO - stdout - {'loss': 1.0887, 'grad_norm': 1.158668041229248, 'learning_rate': 
1.584510014940523e-05, 'epoch': 0.97} +2025-02-05 18:03:01 - ERROR - stderr - 32%|███▏ | 7230/22434 [7:55:21<10:35:45, 2.51s/it] +2025-02-05 18:03:04 - ERROR - stderr - 32%|███▏ | 7231/22434 [7:55:24<10:28:32, 2.48s/it] +2025-02-05 18:03:04 - ERROR - stderr - +2025-02-05 18:03:04 - ERROR - stderr - +2025-02-05 18:03:04 - INFO - stdout - {'loss': 0.9524, 'grad_norm': 1.0060038566589355, 'learning_rate': 1.5843928651980344e-05, 'epoch': 0.97} +2025-02-05 18:03:04 - ERROR - stderr - 32%|███▏ | 7231/22434 [7:55:24<10:28:32, 2.48s/it] +2025-02-05 18:03:06 - ERROR - stderr - 32%|███▏ | 7232/22434 [7:55:26<10:28:39, 2.48s/it] +2025-02-05 18:03:06 - ERROR - stderr - +2025-02-05 18:03:06 - ERROR - stderr - +2025-02-05 18:03:06 - INFO - stdout - {'loss': 1.0009, 'grad_norm': 1.1059223413467407, 'learning_rate': 1.5842757032744355e-05, 'epoch': 0.97} +2025-02-05 18:03:06 - ERROR - stderr - 32%|███▏ | 7232/22434 [7:55:26<10:28:39, 2.48s/it] +2025-02-05 18:03:09 - ERROR - stderr - 32%|███▏ | 7233/22434 [7:55:29<10:34:01, 2.50s/it] +2025-02-05 18:03:09 - ERROR - stderr - +2025-02-05 18:03:09 - ERROR - stderr - +2025-02-05 18:03:09 - INFO - stdout - {'loss': 0.9598, 'grad_norm': 1.037505865097046, 'learning_rate': 1.5841585291721688e-05, 'epoch': 0.97} +2025-02-05 18:03:09 - ERROR - stderr - 32%|███▏ | 7233/22434 [7:55:29<10:34:01, 2.50s/it] +2025-02-05 18:03:11 - ERROR - stderr - 32%|███▏ | 7234/22434 [7:55:31<10:44:41, 2.54s/it] +2025-02-05 18:03:11 - ERROR - stderr - +2025-02-05 18:03:11 - ERROR - stderr - +2025-02-05 18:03:11 - INFO - stdout - {'loss': 1.0886, 'grad_norm': 1.0458277463912964, 'learning_rate': 1.5840413428936767e-05, 'epoch': 0.97} +2025-02-05 18:03:11 - ERROR - stderr - 32%|███▏ | 7234/22434 [7:55:31<10:44:41, 2.54s/it] +2025-02-05 18:03:14 - ERROR - stderr - 32%|███▏ | 7235/22434 [7:55:34<10:45:06, 2.55s/it] +2025-02-05 18:03:14 - ERROR - stderr - +2025-02-05 18:03:14 - ERROR - stderr - +2025-02-05 18:03:14 - INFO - stdout - {'loss': 1.03, 'grad_norm': 
1.0625215768814087, 'learning_rate': 1.5839241444414018e-05, 'epoch': 0.97} +2025-02-05 18:03:14 - ERROR - stderr - 32%|███▏ | 7235/22434 [7:55:34<10:45:06, 2.55s/it] +2025-02-05 18:03:16 - ERROR - stderr - 32%|███▏ | 7236/22434 [7:55:36<10:39:45, 2.53s/it] +2025-02-05 18:03:17 - ERROR - stderr - +2025-02-05 18:03:17 - ERROR - stderr - +2025-02-05 18:03:17 - INFO - stdout - {'loss': 0.8138, 'grad_norm': 0.8698956966400146, 'learning_rate': 1.5838069338177865e-05, 'epoch': 0.97} +2025-02-05 18:03:17 - ERROR - stderr - 32%|███▏ | 7236/22434 [7:55:36<10:39:45, 2.53s/it] +2025-02-05 18:03:19 - ERROR - stderr - 32%|███▏ | 7237/22434 [7:55:39<10:35:03, 2.51s/it] +2025-02-05 18:03:19 - ERROR - stderr - +2025-02-05 18:03:19 - ERROR - stderr - +2025-02-05 18:03:19 - INFO - stdout - {'loss': 1.0748, 'grad_norm': 1.1134084463119507, 'learning_rate': 1.5836897110252745e-05, 'epoch': 0.97} +2025-02-05 18:03:19 - ERROR - stderr - 32%|███▏ | 7237/22434 [7:55:39<10:35:03, 2.51s/it] +2025-02-05 18:03:21 - ERROR - stderr - 32%|███▏ | 7238/22434 [7:55:41<10:33:31, 2.50s/it] +2025-02-05 18:03:21 - ERROR - stderr - +2025-02-05 18:03:21 - ERROR - stderr - +2025-02-05 18:03:21 - INFO - stdout - {'loss': 0.9196, 'grad_norm': 1.0385980606079102, 'learning_rate': 1.583572476066309e-05, 'epoch': 0.97} +2025-02-05 18:03:21 - ERROR - stderr - 32%|███▏ | 7238/22434 [7:55:41<10:33:31, 2.50s/it] +2025-02-05 18:03:24 - ERROR - stderr - 32%|███▏ | 7239/22434 [7:55:44<10:30:54, 2.49s/it] +2025-02-05 18:03:24 - ERROR - stderr - +2025-02-05 18:03:24 - ERROR - stderr - +2025-02-05 18:03:24 - INFO - stdout - {'loss': 0.9359, 'grad_norm': 1.126681923866272, 'learning_rate': 1.5834552289433334e-05, 'epoch': 0.97} +2025-02-05 18:03:24 - ERROR - stderr - 32%|███▏ | 7239/22434 [7:55:44<10:30:54, 2.49s/it] +2025-02-05 18:03:26 - ERROR - stderr - 32%|███▏ | 7240/22434 [7:55:46<10:31:20, 2.49s/it] +2025-02-05 18:03:26 - ERROR - stderr - +2025-02-05 18:03:26 - ERROR - stderr - +2025-02-05 18:03:26 - INFO - 
stdout - {'loss': 0.9498, 'grad_norm': 1.0145320892333984, 'learning_rate': 1.583337969658792e-05, 'epoch': 0.97} +2025-02-05 18:03:26 - ERROR - stderr - 32%|███▏ | 7240/22434 [7:55:46<10:31:20, 2.49s/it] +2025-02-05 18:03:29 - ERROR - stderr - 32%|███▏ | 7241/22434 [7:55:49<10:39:34, 2.53s/it] +2025-02-05 18:03:29 - ERROR - stderr - +2025-02-05 18:03:29 - ERROR - stderr - +2025-02-05 18:03:29 - INFO - stdout - {'loss': 0.9218, 'grad_norm': 1.014756679534912, 'learning_rate': 1.5832206982151288e-05, 'epoch': 0.97} +2025-02-05 18:03:29 - ERROR - stderr - 32%|███▏ | 7241/22434 [7:55:49<10:39:34, 2.53s/it] +2025-02-05 18:03:32 - ERROR - stderr - 32%|███▏ | 7242/22434 [7:55:51<10:39:42, 2.53s/it] +2025-02-05 18:03:32 - ERROR - stderr - +2025-02-05 18:03:32 - ERROR - stderr - +2025-02-05 18:03:32 - INFO - stdout - {'loss': 0.9866, 'grad_norm': 1.1135059595108032, 'learning_rate': 1.5831034146147882e-05, 'epoch': 0.97} +2025-02-05 18:03:32 - ERROR - stderr - 32%|███▏ | 7242/22434 [7:55:51<10:39:42, 2.53s/it] +2025-02-05 18:03:34 - ERROR - stderr - 32%|███▏ | 7243/22434 [7:55:54<10:34:32, 2.51s/it] +2025-02-05 18:03:34 - ERROR - stderr - +2025-02-05 18:03:34 - ERROR - stderr - +2025-02-05 18:03:34 - INFO - stdout - {'loss': 0.8613, 'grad_norm': 1.0222879648208618, 'learning_rate': 1.582986118860215e-05, 'epoch': 0.97} +2025-02-05 18:03:34 - ERROR - stderr - 32%|███▏ | 7243/22434 [7:55:54<10:34:32, 2.51s/it] +2025-02-05 18:03:36 - ERROR - stderr - 32%|███▏ | 7244/22434 [7:55:56<10:35:04, 2.51s/it] +2025-02-05 18:03:37 - ERROR - stderr - +2025-02-05 18:03:37 - ERROR - stderr - +2025-02-05 18:03:37 - INFO - stdout - {'loss': 0.9161, 'grad_norm': 1.0654881000518799, 'learning_rate': 1.582868810953854e-05, 'epoch': 0.97} +2025-02-05 18:03:37 - ERROR - stderr - 32%|███▏ | 7244/22434 [7:55:56<10:35:04, 2.51s/it] +2025-02-05 18:03:39 - ERROR - stderr - 32%|███▏ | 7245/22434 [7:55:59<10:43:01, 2.54s/it] +2025-02-05 18:03:39 - ERROR - stderr - +2025-02-05 18:03:39 - ERROR - stderr 
- +2025-02-05 18:03:39 - INFO - stdout - {'loss': 0.9286, 'grad_norm': 1.0502616167068481, 'learning_rate': 1.5827514908981504e-05, 'epoch': 0.97} +2025-02-05 18:03:39 - ERROR - stderr - 32%|███▏ | 7245/22434 [7:55:59<10:43:01, 2.54s/it] +2025-02-05 18:03:42 - ERROR - stderr - 32%|███▏ | 7246/22434 [7:56:01<10:43:30, 2.54s/it] +2025-02-05 18:03:42 - ERROR - stderr - +2025-02-05 18:03:42 - ERROR - stderr - +2025-02-05 18:03:42 - INFO - stdout - {'loss': 0.9352, 'grad_norm': 1.0014979839324951, 'learning_rate': 1.58263415869555e-05, 'epoch': 0.97} +2025-02-05 18:03:42 - ERROR - stderr - 32%|███▏ | 7246/22434 [7:56:01<10:43:30, 2.54s/it] +2025-02-05 18:03:44 - ERROR - stderr - 32%|███▏ | 7247/22434 [7:56:04<10:46:18, 2.55s/it] +2025-02-05 18:03:44 - ERROR - stderr - +2025-02-05 18:03:44 - ERROR - stderr - +2025-02-05 18:03:44 - INFO - stdout - {'loss': 1.0052, 'grad_norm': 1.0085316896438599, 'learning_rate': 1.5825168143484974e-05, 'epoch': 0.97} +2025-02-05 18:03:44 - ERROR - stderr - 32%|███▏ | 7247/22434 [7:56:04<10:46:18, 2.55s/it] +2025-02-05 18:03:47 - ERROR - stderr - 32%|███▏ | 7248/22434 [7:56:06<10:41:15, 2.53s/it] +2025-02-05 18:03:47 - ERROR - stderr - +2025-02-05 18:03:47 - ERROR - stderr - +2025-02-05 18:03:47 - INFO - stdout - {'loss': 0.9846, 'grad_norm': 1.0513637065887451, 'learning_rate': 1.5823994578594396e-05, 'epoch': 0.97} +2025-02-05 18:03:47 - ERROR - stderr - 32%|███▏ | 7248/22434 [7:56:07<10:41:15, 2.53s/it] +2025-02-05 18:03:49 - ERROR - stderr - 32%|███▏ | 7249/22434 [7:56:09<10:37:57, 2.52s/it] +2025-02-05 18:03:49 - ERROR - stderr - +2025-02-05 18:03:49 - ERROR - stderr - +2025-02-05 18:03:49 - INFO - stdout - {'loss': 0.9268, 'grad_norm': 1.007477879524231, 'learning_rate': 1.5822820892308222e-05, 'epoch': 0.97} +2025-02-05 18:03:49 - ERROR - stderr - 32%|███▏ | 7249/22434 [7:56:09<10:37:57, 2.52s/it] +2025-02-05 18:03:52 - ERROR - stderr - 32%|███▏ | 7250/22434 [7:56:11<10:35:26, 2.51s/it] +2025-02-05 18:03:52 - ERROR - stderr - 
+2025-02-05 18:03:52 - ERROR - stderr - +2025-02-05 18:03:52 - INFO - stdout - {'loss': 0.9143, 'grad_norm': 1.0830000638961792, 'learning_rate': 1.5821647084650917e-05, 'epoch': 0.97} +2025-02-05 18:03:52 - ERROR - stderr - 32%|███▏ | 7250/22434 [7:56:12<10:35:26, 2.51s/it] +2025-02-05 18:03:54 - ERROR - stderr - 32%|███▏ | 7251/22434 [7:56:14<10:41:23, 2.53s/it] +2025-02-05 18:03:54 - ERROR - stderr - +2025-02-05 18:03:54 - ERROR - stderr - +2025-02-05 18:03:54 - INFO - stdout - {'loss': 0.9255, 'grad_norm': 1.082414150238037, 'learning_rate': 1.582047315564695e-05, 'epoch': 0.97} +2025-02-05 18:03:54 - ERROR - stderr - 32%|███▏ | 7251/22434 [7:56:14<10:41:23, 2.53s/it] +2025-02-05 18:03:57 - ERROR - stderr - 32%|███▏ | 7252/22434 [7:56:17<10:47:30, 2.56s/it] +2025-02-05 18:03:57 - ERROR - stderr - +2025-02-05 18:03:57 - ERROR - stderr - +2025-02-05 18:03:57 - INFO - stdout - {'loss': 0.9145, 'grad_norm': 1.0586217641830444, 'learning_rate': 1.5819299105320795e-05, 'epoch': 0.97} +2025-02-05 18:03:57 - ERROR - stderr - 32%|███▏ | 7252/22434 [7:56:17<10:47:30, 2.56s/it] +2025-02-05 18:03:59 - ERROR - stderr - 32%|███▏ | 7253/22434 [7:56:19<10:43:00, 2.54s/it] +2025-02-05 18:03:59 - ERROR - stderr - +2025-02-05 18:03:59 - ERROR - stderr - +2025-02-05 18:03:59 - INFO - stdout - {'loss': 0.9215, 'grad_norm': 1.059692621231079, 'learning_rate': 1.5818124933696912e-05, 'epoch': 0.97} +2025-02-05 18:03:59 - ERROR - stderr - 32%|███▏ | 7253/22434 [7:56:19<10:43:00, 2.54s/it] +2025-02-05 18:04:02 - ERROR - stderr - 32%|███▏ | 7254/22434 [7:56:22<10:47:28, 2.56s/it] +2025-02-05 18:04:02 - ERROR - stderr - +2025-02-05 18:04:02 - ERROR - stderr - +2025-02-05 18:04:02 - INFO - stdout - {'loss': 0.9542, 'grad_norm': 0.9990198016166687, 'learning_rate': 1.5816950640799785e-05, 'epoch': 0.97} +2025-02-05 18:04:02 - ERROR - stderr - 32%|███▏ | 7254/22434 [7:56:22<10:47:28, 2.56s/it] +2025-02-05 18:04:05 - ERROR - stderr - 32%|███▏ | 7255/22434 [7:56:25<11:02:18, 2.62s/it] 
+2025-02-05 18:04:05 - ERROR - stderr - +2025-02-05 18:04:05 - ERROR - stderr - +2025-02-05 18:04:05 - INFO - stdout - {'loss': 0.9605, 'grad_norm': 1.0688501596450806, 'learning_rate': 1.581577622665389e-05, 'epoch': 0.97} +2025-02-05 18:04:05 - ERROR - stderr - 32%|███▏ | 7255/22434 [7:56:25<11:02:18, 2.62s/it] +2025-02-05 18:04:07 - ERROR - stderr - 32%|███▏ | 7256/22434 [7:56:27<10:51:54, 2.58s/it] +2025-02-05 18:04:07 - ERROR - stderr - +2025-02-05 18:04:07 - ERROR - stderr - +2025-02-05 18:04:07 - INFO - stdout - {'loss': 0.9259, 'grad_norm': 1.0125993490219116, 'learning_rate': 1.58146016912837e-05, 'epoch': 0.97} +2025-02-05 18:04:07 - ERROR - stderr - 32%|███▏ | 7256/22434 [7:56:27<10:51:54, 2.58s/it] +2025-02-05 18:04:10 - ERROR - stderr - 32%|███▏ | 7257/22434 [7:56:30<10:47:01, 2.56s/it] +2025-02-05 18:04:10 - ERROR - stderr - +2025-02-05 18:04:10 - ERROR - stderr - +2025-02-05 18:04:10 - INFO - stdout - {'loss': 0.8929, 'grad_norm': 1.0894843339920044, 'learning_rate': 1.5813427034713705e-05, 'epoch': 0.97} +2025-02-05 18:04:10 - ERROR - stderr - 32%|███▏ | 7257/22434 [7:56:30<10:47:01, 2.56s/it] +2025-02-05 18:04:12 - ERROR - stderr - 32%|███▏ | 7258/22434 [7:56:32<10:47:18, 2.56s/it] +2025-02-05 18:04:12 - ERROR - stderr - +2025-02-05 18:04:12 - ERROR - stderr - +2025-02-05 18:04:12 - INFO - stdout - {'loss': 0.9441, 'grad_norm': 1.1687116622924805, 'learning_rate': 1.5812252256968386e-05, 'epoch': 0.97} +2025-02-05 18:04:12 - ERROR - stderr - 32%|███▏ | 7258/22434 [7:56:32<10:47:18, 2.56s/it] +2025-02-05 18:04:15 - ERROR - stderr - 32%|███▏ | 7259/22434 [7:56:35<10:49:27, 2.57s/it] +2025-02-05 18:04:15 - ERROR - stderr - +2025-02-05 18:04:15 - ERROR - stderr - +2025-02-05 18:04:15 - INFO - stdout - {'loss': 0.9526, 'grad_norm': 0.9743062853813171, 'learning_rate': 1.581107735807223e-05, 'epoch': 0.97} +2025-02-05 18:04:15 - ERROR - stderr - 32%|███▏ | 7259/22434 [7:56:35<10:49:27, 2.57s/it] +2025-02-05 18:04:17 - ERROR - stderr - 32%|███▏ | 
7260/22434 [7:56:37<10:43:38, 2.55s/it] +2025-02-05 18:04:17 - ERROR - stderr - +2025-02-05 18:04:17 - ERROR - stderr - +2025-02-05 18:04:17 - INFO - stdout - {'loss': 0.9257, 'grad_norm': 1.0323352813720703, 'learning_rate': 1.5809902338049722e-05, 'epoch': 0.97} +2025-02-05 18:04:17 - ERROR - stderr - 32%|███▏ | 7260/22434 [7:56:37<10:43:38, 2.55s/it] +2025-02-05 18:04:20 - ERROR - stderr - 32%|███▏ | 7261/22434 [7:56:40<10:39:17, 2.53s/it] +2025-02-05 18:04:20 - ERROR - stderr - +2025-02-05 18:04:20 - ERROR - stderr - +2025-02-05 18:04:20 - INFO - stdout - {'loss': 0.9672, 'grad_norm': 1.093103051185608, 'learning_rate': 1.5808727196925366e-05, 'epoch': 0.97} +2025-02-05 18:04:20 - ERROR - stderr - 32%|███▏ | 7261/22434 [7:56:40<10:39:17, 2.53s/it] +2025-02-05 18:04:22 - ERROR - stderr - 32%|███▏ | 7262/22434 [7:56:42<10:45:20, 2.55s/it] +2025-02-05 18:04:23 - ERROR - stderr - +2025-02-05 18:04:23 - ERROR - stderr - +2025-02-05 18:04:23 - INFO - stdout - {'loss': 0.9416, 'grad_norm': 1.0001981258392334, 'learning_rate': 1.580755193472365e-05, 'epoch': 0.97} +2025-02-05 18:04:23 - ERROR - stderr - 32%|███▏ | 7262/22434 [7:56:42<10:45:20, 2.55s/it] +2025-02-05 18:04:25 - ERROR - stderr - 32%|███▏ | 7263/22434 [7:56:45<10:50:05, 2.57s/it] +2025-02-05 18:04:25 - ERROR - stderr - +2025-02-05 18:04:25 - ERROR - stderr - +2025-02-05 18:04:25 - INFO - stdout - {'loss': 0.8561, 'grad_norm': 0.978725790977478, 'learning_rate': 1.580637655146907e-05, 'epoch': 0.97} +2025-02-05 18:04:25 - ERROR - stderr - 32%|███▏ | 7263/22434 [7:56:45<10:50:05, 2.57s/it] +2025-02-05 18:04:28 - ERROR - stderr - 32%|███▏ | 7264/22434 [7:56:47<10:41:37, 2.54s/it] +2025-02-05 18:04:28 - ERROR - stderr - +2025-02-05 18:04:28 - ERROR - stderr - +2025-02-05 18:04:28 - INFO - stdout - {'loss': 0.8071, 'grad_norm': 0.9973233938217163, 'learning_rate': 1.5805201047186124e-05, 'epoch': 0.97} +2025-02-05 18:04:28 - ERROR - stderr - 32%|███▏ | 7264/22434 [7:56:47<10:41:37, 2.54s/it] +2025-02-05 
18:04:30 - ERROR - stderr - 32%|███▏ | 7265/22434 [7:56:50<10:51:31, 2.58s/it] +2025-02-05 18:04:30 - ERROR - stderr - +2025-02-05 18:04:30 - ERROR - stderr - +2025-02-05 18:04:30 - INFO - stdout - {'loss': 0.9183, 'grad_norm': 1.031728982925415, 'learning_rate': 1.580402542189932e-05, 'epoch': 0.97} +2025-02-05 18:04:30 - ERROR - stderr - 32%|███▏ | 7265/22434 [7:56:50<10:51:31, 2.58s/it] +2025-02-05 18:04:33 - ERROR - stderr - 32%|███▏ | 7266/22434 [7:56:53<10:59:58, 2.61s/it] +2025-02-05 18:04:33 - ERROR - stderr - +2025-02-05 18:04:33 - ERROR - stderr - +2025-02-05 18:04:33 - INFO - stdout - {'loss': 0.9383, 'grad_norm': 0.9891021847724915, 'learning_rate': 1.580284967563316e-05, 'epoch': 0.97} +2025-02-05 18:04:33 - ERROR - stderr - 32%|███▏ | 7266/22434 [7:56:53<10:59:58, 2.61s/it] +2025-02-05 18:04:35 - ERROR - stderr - 32%|███▏ | 7267/22434 [7:56:55<10:52:00, 2.58s/it] +2025-02-05 18:04:35 - ERROR - stderr - +2025-02-05 18:04:35 - ERROR - stderr - +2025-02-05 18:04:35 - INFO - stdout - {'loss': 0.8823, 'grad_norm': 1.0382641553878784, 'learning_rate': 1.580167380841215e-05, 'epoch': 0.97} +2025-02-05 18:04:35 - ERROR - stderr - 32%|███▏ | 7267/22434 [7:56:55<10:52:00, 2.58s/it] +2025-02-05 18:04:38 - ERROR - stderr - 32%|███▏ | 7268/22434 [7:56:58<10:42:47, 2.54s/it] +2025-02-05 18:04:38 - ERROR - stderr - +2025-02-05 18:04:38 - ERROR - stderr - +2025-02-05 18:04:38 - INFO - stdout - {'loss': 0.8949, 'grad_norm': 1.1642438173294067, 'learning_rate': 1.58004978202608e-05, 'epoch': 0.97} +2025-02-05 18:04:38 - ERROR - stderr - 32%|███▏ | 7268/22434 [7:56:58<10:42:47, 2.54s/it] +2025-02-05 18:04:40 - ERROR - stderr - 32%|███▏ | 7269/22434 [7:57:00<10:40:56, 2.54s/it] +2025-02-05 18:04:40 - ERROR - stderr - +2025-02-05 18:04:40 - ERROR - stderr - +2025-02-05 18:04:40 - INFO - stdout - {'loss': 1.1033, 'grad_norm': 1.1263982057571411, 'learning_rate': 1.5799321711203622e-05, 'epoch': 0.97} +2025-02-05 18:04:40 - ERROR - stderr - 32%|███▏ | 7269/22434 
[7:57:00<10:40:56, 2.54s/it] +2025-02-05 18:04:43 - ERROR - stderr - 32%|███▏ | 7270/22434 [7:57:03<10:39:11, 2.53s/it] +2025-02-05 18:04:43 - ERROR - stderr - +2025-02-05 18:04:43 - ERROR - stderr - +2025-02-05 18:04:43 - INFO - stdout - {'loss': 0.8936, 'grad_norm': 0.9878901243209839, 'learning_rate': 1.579814548126514e-05, 'epoch': 0.97} +2025-02-05 18:04:43 - ERROR - stderr - 32%|███▏ | 7270/22434 [7:57:03<10:39:11, 2.53s/it] +2025-02-05 18:04:45 - ERROR - stderr - 32%|███▏ | 7271/22434 [7:57:05<10:41:19, 2.54s/it] +2025-02-05 18:04:46 - ERROR - stderr - +2025-02-05 18:04:46 - ERROR - stderr - +2025-02-05 18:04:46 - INFO - stdout - {'loss': 0.8792, 'grad_norm': 0.9581366181373596, 'learning_rate': 1.5796969130469857e-05, 'epoch': 0.97} +2025-02-05 18:04:46 - ERROR - stderr - 32%|███▏ | 7271/22434 [7:57:05<10:41:19, 2.54s/it] +2025-02-05 18:04:48 - ERROR - stderr - 32%|███▏ | 7272/22434 [7:57:08<10:46:45, 2.56s/it] +2025-02-05 18:04:48 - ERROR - stderr - +2025-02-05 18:04:48 - ERROR - stderr - +2025-02-05 18:04:48 - INFO - stdout - {'loss': 1.0183, 'grad_norm': 0.9988245368003845, 'learning_rate': 1.57957926588423e-05, 'epoch': 0.97} +2025-02-05 18:04:48 - ERROR - stderr - 32%|███▏ | 7272/22434 [7:57:08<10:46:45, 2.56s/it] +2025-02-05 18:04:51 - ERROR - stderr - 32%|███▏ | 7273/22434 [7:57:10<10:48:59, 2.57s/it] +2025-02-05 18:04:51 - ERROR - stderr - +2025-02-05 18:04:51 - ERROR - stderr - +2025-02-05 18:04:51 - INFO - stdout - {'loss': 0.9338, 'grad_norm': 1.1319457292556763, 'learning_rate': 1.5794616066406993e-05, 'epoch': 0.97} +2025-02-05 18:04:51 - ERROR - stderr - 32%|███▏ | 7273/22434 [7:57:11<10:48:59, 2.57s/it] +2025-02-05 18:04:53 - ERROR - stderr - 32%|███▏ | 7274/22434 [7:57:13<10:40:58, 2.54s/it] +2025-02-05 18:04:53 - ERROR - stderr - +2025-02-05 18:04:53 - ERROR - stderr - +2025-02-05 18:04:53 - INFO - stdout - {'loss': 0.9566, 'grad_norm': 1.068511724472046, 'learning_rate': 1.579343935318846e-05, 'epoch': 0.97} +2025-02-05 18:04:53 - ERROR - 
stderr - 32%|███▏ | 7274/22434 [7:57:13<10:40:58, 2.54s/it] +2025-02-05 18:04:56 - ERROR - stderr - 32%|███▏ | 7275/22434 [7:57:15<10:35:55, 2.52s/it] +2025-02-05 18:04:56 - ERROR - stderr - +2025-02-05 18:04:56 - ERROR - stderr - +2025-02-05 18:04:56 - INFO - stdout - {'loss': 0.9718, 'grad_norm': 1.039093017578125, 'learning_rate': 1.5792262519211224e-05, 'epoch': 0.97} +2025-02-05 18:04:56 - ERROR - stderr - 32%|███▏ | 7275/22434 [7:57:15<10:35:55, 2.52s/it] +2025-02-05 18:04:58 - ERROR - stderr - 32%|███▏ | 7276/22434 [7:57:18<10:37:10, 2.52s/it] +2025-02-05 18:04:58 - ERROR - stderr - +2025-02-05 18:04:58 - ERROR - stderr - +2025-02-05 18:04:58 - INFO - stdout - {'loss': 0.9138, 'grad_norm': 1.0397595167160034, 'learning_rate': 1.579108556449982e-05, 'epoch': 0.97} +2025-02-05 18:04:58 - ERROR - stderr - 32%|███▏ | 7276/22434 [7:57:18<10:37:10, 2.52s/it] +2025-02-05 18:05:01 - ERROR - stderr - 32%|███▏ | 7277/22434 [7:57:20<10:36:40, 2.52s/it] +2025-02-05 18:05:01 - ERROR - stderr - +2025-02-05 18:05:01 - ERROR - stderr - +2025-02-05 18:05:01 - INFO - stdout - {'loss': 0.9724, 'grad_norm': 1.037088394165039, 'learning_rate': 1.578990848907878e-05, 'epoch': 0.97} +2025-02-05 18:05:01 - ERROR - stderr - 32%|███▏ | 7277/22434 [7:57:20<10:36:40, 2.52s/it] +2025-02-05 18:05:03 - ERROR - stderr - 32%|███▏ | 7278/22434 [7:57:23<10:31:59, 2.50s/it] +2025-02-05 18:05:03 - ERROR - stderr - +2025-02-05 18:05:03 - ERROR - stderr - +2025-02-05 18:05:03 - INFO - stdout - {'loss': 0.8115, 'grad_norm': 0.9322558045387268, 'learning_rate': 1.578873129297264e-05, 'epoch': 0.97} +2025-02-05 18:05:03 - ERROR - stderr - 32%|███▏ | 7278/22434 [7:57:23<10:31:59, 2.50s/it] +2025-02-05 18:05:06 - ERROR - stderr - 32%|███▏ | 7279/22434 [7:57:25<10:31:28, 2.50s/it] +2025-02-05 18:05:06 - ERROR - stderr - +2025-02-05 18:05:06 - ERROR - stderr - +2025-02-05 18:05:06 - INFO - stdout - {'loss': 0.9492, 'grad_norm': 1.0813533067703247, 'learning_rate': 1.5787553976205928e-05, 'epoch': 0.97} 
+2025-02-05 18:05:06 - ERROR - stderr - 32%|███▏ | 7279/22434 [7:57:25<10:31:28, 2.50s/it] +2025-02-05 18:05:08 - ERROR - stderr - 32%|███▏ | 7280/22434 [7:57:28<10:26:55, 2.48s/it] +2025-02-05 18:05:08 - ERROR - stderr - +2025-02-05 18:05:08 - ERROR - stderr - +2025-02-05 18:05:08 - INFO - stdout - {'loss': 0.7536, 'grad_norm': 0.8945992588996887, 'learning_rate': 1.5786376538803197e-05, 'epoch': 0.97} +2025-02-05 18:05:08 - ERROR - stderr - 32%|███▏ | 7280/22434 [7:57:28<10:26:55, 2.48s/it] +2025-02-05 18:05:11 - ERROR - stderr - 32%|███▏ | 7281/22434 [7:57:30<10:36:33, 2.52s/it] +2025-02-05 18:05:11 - ERROR - stderr - +2025-02-05 18:05:11 - ERROR - stderr - +2025-02-05 18:05:11 - INFO - stdout - {'loss': 0.9726, 'grad_norm': 1.068833589553833, 'learning_rate': 1.578519898078898e-05, 'epoch': 0.97} +2025-02-05 18:05:11 - ERROR - stderr - 32%|███▏ | 7281/22434 [7:57:30<10:36:33, 2.52s/it] +2025-02-05 18:05:13 - ERROR - stderr - 32%|███▏ | 7282/22434 [7:57:33<10:56:08, 2.60s/it] +2025-02-05 18:05:14 - ERROR - stderr - +2025-02-05 18:05:14 - ERROR - stderr - +2025-02-05 18:05:14 - INFO - stdout - {'loss': 0.971, 'grad_norm': 1.0371111631393433, 'learning_rate': 1.578402130218783e-05, 'epoch': 0.97} +2025-02-05 18:05:14 - ERROR - stderr - 32%|███▏ | 7282/22434 [7:57:33<10:56:08, 2.60s/it] +2025-02-05 18:05:16 - ERROR - stderr - 32%|███▏ | 7283/22434 [7:57:36<10:53:09, 2.59s/it] +2025-02-05 18:05:16 - ERROR - stderr - +2025-02-05 18:05:16 - ERROR - stderr - +2025-02-05 18:05:16 - INFO - stdout - {'loss': 0.9343, 'grad_norm': 1.0172487497329712, 'learning_rate': 1.578284350302429e-05, 'epoch': 0.97} +2025-02-05 18:05:16 - ERROR - stderr - 32%|███▏ | 7283/22434 [7:57:36<10:53:09, 2.59s/it] +2025-02-05 18:05:19 - ERROR - stderr - 32%|███▏ | 7284/22434 [7:57:38<10:48:03, 2.57s/it] +2025-02-05 18:05:19 - ERROR - stderr - +2025-02-05 18:05:19 - ERROR - stderr - +2025-02-05 18:05:19 - INFO - stdout - {'loss': 0.8904, 'grad_norm': 0.942463219165802, 'learning_rate': 
1.5781665583322913e-05, 'epoch': 0.97} +2025-02-05 18:05:19 - ERROR - stderr - 32%|███▏ | 7284/22434 [7:57:38<10:48:03, 2.57s/it] +2025-02-05 18:05:21 - ERROR - stderr - 32%|███▏ | 7285/22434 [7:57:41<10:46:14, 2.56s/it] +2025-02-05 18:05:21 - ERROR - stderr - +2025-02-05 18:05:21 - ERROR - stderr - +2025-02-05 18:05:21 - INFO - stdout - {'loss': 0.8689, 'grad_norm': 0.9291203022003174, 'learning_rate': 1.5780487543108246e-05, 'epoch': 0.97} +2025-02-05 18:05:21 - ERROR - stderr - 32%|███▏ | 7285/22434 [7:57:41<10:46:14, 2.56s/it] +2025-02-05 18:05:24 - ERROR - stderr - 32%|███▏ | 7286/22434 [7:57:43<10:37:24, 2.52s/it] +2025-02-05 18:05:24 - ERROR - stderr - +2025-02-05 18:05:24 - ERROR - stderr - +2025-02-05 18:05:24 - INFO - stdout - {'loss': 0.8416, 'grad_norm': 1.0542680025100708, 'learning_rate': 1.577930938240485e-05, 'epoch': 0.97} +2025-02-05 18:05:24 - ERROR - stderr - 32%|███▏ | 7286/22434 [7:57:43<10:37:24, 2.52s/it] +2025-02-05 18:05:26 - ERROR - stderr - 32%|███▏ | 7287/22434 [7:57:46<10:32:02, 2.50s/it] +2025-02-05 18:05:26 - ERROR - stderr - +2025-02-05 18:05:26 - ERROR - stderr - +2025-02-05 18:05:26 - INFO - stdout - {'loss': 0.8744, 'grad_norm': 1.0547653436660767, 'learning_rate': 1.5778131101237275e-05, 'epoch': 0.97} +2025-02-05 18:05:26 - ERROR - stderr - 32%|███▏ | 7287/22434 [7:57:46<10:32:02, 2.50s/it] +2025-02-05 18:05:28 - ERROR - stderr - 32%|███▏ | 7288/22434 [7:57:48<10:30:17, 2.50s/it] +2025-02-05 18:05:29 - ERROR - stderr - +2025-02-05 18:05:29 - ERROR - stderr - +2025-02-05 18:05:29 - INFO - stdout - {'loss': 0.9741, 'grad_norm': 1.0307928323745728, 'learning_rate': 1.577695269963009e-05, 'epoch': 0.97} +2025-02-05 18:05:29 - ERROR - stderr - 32%|███▏ | 7288/22434 [7:57:48<10:30:17, 2.50s/it] +2025-02-05 18:05:31 - ERROR - stderr - 32%|███▏ | 7289/22434 [7:57:51<10:33:57, 2.51s/it] +2025-02-05 18:05:31 - ERROR - stderr - +2025-02-05 18:05:31 - ERROR - stderr - +2025-02-05 18:05:31 - INFO - stdout - {'loss': 0.9564, 'grad_norm': 
0.9972244501113892, 'learning_rate': 1.577577417760785e-05, 'epoch': 0.97} +2025-02-05 18:05:31 - ERROR - stderr - 32%|███▏ | 7289/22434 [7:57:51<10:33:57, 2.51s/it] +2025-02-05 18:05:34 - ERROR - stderr - 32%|███▏ | 7290/22434 [7:57:53<10:35:13, 2.52s/it] +2025-02-05 18:05:34 - ERROR - stderr - +2025-02-05 18:05:34 - ERROR - stderr - +2025-02-05 18:05:34 - INFO - stdout - {'loss': 0.9124, 'grad_norm': 1.143341064453125, 'learning_rate': 1.577459553519513e-05, 'epoch': 0.97} +2025-02-05 18:05:34 - ERROR - stderr - 32%|███▏ | 7290/22434 [7:57:53<10:35:13, 2.52s/it] +2025-02-05 18:05:36 - ERROR - stderr - 32%|███▏ | 7291/22434 [7:57:56<10:36:10, 2.52s/it] +2025-02-05 18:05:36 - ERROR - stderr - +2025-02-05 18:05:36 - ERROR - stderr - +2025-02-05 18:05:36 - INFO - stdout - {'loss': 0.873, 'grad_norm': 1.1272506713867188, 'learning_rate': 1.577341677241649e-05, 'epoch': 0.97} +2025-02-05 18:05:36 - ERROR - stderr - 32%|███▏ | 7291/22434 [7:57:56<10:36:10, 2.52s/it] +2025-02-05 18:05:39 - ERROR - stderr - 33%|███▎ | 7292/22434 [7:57:58<10:34:36, 2.51s/it] +2025-02-05 18:05:39 - ERROR - stderr - +2025-02-05 18:05:39 - ERROR - stderr - +2025-02-05 18:05:39 - INFO - stdout - {'loss': 0.8888, 'grad_norm': 0.9385217428207397, 'learning_rate': 1.57722378892965e-05, 'epoch': 0.98} +2025-02-05 18:05:39 - ERROR - stderr - 33%|███▎ | 7292/22434 [7:57:58<10:34:36, 2.51s/it] +2025-02-05 18:05:41 - ERROR - stderr - 33%|███▎ | 7293/22434 [7:58:01<10:37:18, 2.53s/it] +2025-02-05 18:05:41 - ERROR - stderr - +2025-02-05 18:05:41 - ERROR - stderr - +2025-02-05 18:05:41 - INFO - stdout - {'loss': 0.8609, 'grad_norm': 1.1405953168869019, 'learning_rate': 1.5771058885859735e-05, 'epoch': 0.98} +2025-02-05 18:05:41 - ERROR - stderr - 33%|███▎ | 7293/22434 [7:58:01<10:37:18, 2.53s/it] +2025-02-05 18:05:44 - ERROR - stderr - 33%|███▎ | 7294/22434 [7:58:03<10:35:58, 2.52s/it] +2025-02-05 18:05:44 - ERROR - stderr - +2025-02-05 18:05:44 - ERROR - stderr - +2025-02-05 18:05:44 - INFO - stdout - 
{'loss': 0.8865, 'grad_norm': 0.9752020835876465, 'learning_rate': 1.5769879762130775e-05, 'epoch': 0.98} +2025-02-05 18:05:44 - ERROR - stderr - 33%|███▎ | 7294/22434 [7:58:03<10:35:58, 2.52s/it] +2025-02-05 18:05:46 - ERROR - stderr - 33%|███▎ | 7295/22434 [7:58:06<10:36:10, 2.52s/it] +2025-02-05 18:05:46 - ERROR - stderr - +2025-02-05 18:05:46 - ERROR - stderr - +2025-02-05 18:05:46 - INFO - stdout - {'loss': 0.868, 'grad_norm': 1.0191373825073242, 'learning_rate': 1.5768700518134184e-05, 'epoch': 0.98} +2025-02-05 18:05:46 - ERROR - stderr - 33%|███▎ | 7295/22434 [7:58:06<10:36:10, 2.52s/it] +2025-02-05 18:05:49 - ERROR - stderr - 33%|███▎ | 7296/22434 [7:58:08<10:40:10, 2.54s/it] +2025-02-05 18:05:49 - ERROR - stderr - +2025-02-05 18:05:49 - ERROR - stderr - +2025-02-05 18:05:49 - INFO - stdout - {'loss': 0.9369, 'grad_norm': 1.1318840980529785, 'learning_rate': 1.5767521153894555e-05, 'epoch': 0.98} +2025-02-05 18:05:49 - ERROR - stderr - 33%|███▎ | 7296/22434 [7:58:09<10:40:10, 2.54s/it] +2025-02-05 18:05:52 - ERROR - stderr - 33%|███▎ | 7297/22434 [7:58:11<10:59:22, 2.61s/it] +2025-02-05 18:05:52 - ERROR - stderr - +2025-02-05 18:05:52 - ERROR - stderr - +2025-02-05 18:05:52 - INFO - stdout - {'loss': 0.9231, 'grad_norm': 1.0205382108688354, 'learning_rate': 1.5766341669436468e-05, 'epoch': 0.98} +2025-02-05 18:05:52 - ERROR - stderr - 33%|███▎ | 7297/22434 [7:58:11<10:59:22, 2.61s/it] +2025-02-05 18:05:54 - ERROR - stderr - 33%|███▎ | 7298/22434 [7:58:14<10:55:47, 2.60s/it] +2025-02-05 18:05:54 - ERROR - stderr - +2025-02-05 18:05:54 - ERROR - stderr - +2025-02-05 18:05:54 - INFO - stdout - {'loss': 0.972, 'grad_norm': 1.1478955745697021, 'learning_rate': 1.5765162064784504e-05, 'epoch': 0.98} +2025-02-05 18:05:54 - ERROR - stderr - 33%|███▎ | 7298/22434 [7:58:14<10:55:47, 2.60s/it] +2025-02-05 18:05:57 - ERROR - stderr - 33%|███▎ | 7299/22434 [7:58:16<10:51:19, 2.58s/it] +2025-02-05 18:05:57 - ERROR - stderr - +2025-02-05 18:05:57 - ERROR - stderr - 
+2025-02-05 18:05:57 - INFO - stdout - {'loss': 0.957, 'grad_norm': 1.1083965301513672, 'learning_rate': 1.5763982339963254e-05, 'epoch': 0.98} +2025-02-05 18:05:57 - ERROR - stderr - 33%|███▎ | 7299/22434 [7:58:16<10:51:19, 2.58s/it] +2025-02-05 18:05:59 - ERROR - stderr - 33%|███▎ | 7300/22434 [7:58:19<10:47:15, 2.57s/it] +2025-02-05 18:05:59 - ERROR - stderr - +2025-02-05 18:05:59 - ERROR - stderr - +2025-02-05 18:05:59 - INFO - stdout - {'loss': 0.8837, 'grad_norm': 1.1210089921951294, 'learning_rate': 1.576280249499731e-05, 'epoch': 0.98} +2025-02-05 18:05:59 - ERROR - stderr - 33%|███▎ | 7300/22434 [7:58:19<10:47:15, 2.57s/it] +2025-02-05 18:06:02 - ERROR - stderr - 33%|███▎ | 7301/22434 [7:58:21<10:42:52, 2.55s/it] +2025-02-05 18:06:02 - ERROR - stderr - +2025-02-05 18:06:02 - ERROR - stderr - +2025-02-05 18:06:02 - INFO - stdout - {'loss': 0.984, 'grad_norm': 1.0822051763534546, 'learning_rate': 1.576162252991126e-05, 'epoch': 0.98} +2025-02-05 18:06:02 - ERROR - stderr - 33%|███▎ | 7301/22434 [7:58:21<10:42:52, 2.55s/it] +2025-02-05 18:06:04 - ERROR - stderr - 33%|███▎ | 7302/22434 [7:58:24<10:41:35, 2.54s/it] +2025-02-05 18:06:04 - ERROR - stderr - +2025-02-05 18:06:04 - ERROR - stderr - +2025-02-05 18:06:04 - INFO - stdout - {'loss': 0.8801, 'grad_norm': 1.034590244293213, 'learning_rate': 1.5760442444729703e-05, 'epoch': 0.98} +2025-02-05 18:06:04 - ERROR - stderr - 33%|███▎ | 7302/22434 [7:58:24<10:41:35, 2.54s/it] +2025-02-05 18:06:07 - ERROR - stderr - 33%|███▎ | 7303/22434 [7:58:27<10:46:24, 2.56s/it] +2025-02-05 18:06:07 - ERROR - stderr - +2025-02-05 18:06:07 - ERROR - stderr - +2025-02-05 18:06:07 - INFO - stdout - {'loss': 1.0588, 'grad_norm': 1.1002167463302612, 'learning_rate': 1.5759262239477237e-05, 'epoch': 0.98} +2025-02-05 18:06:07 - ERROR - stderr - 33%|███▎ | 7303/22434 [7:58:27<10:46:24, 2.56s/it] +2025-02-05 18:06:09 - ERROR - stderr - 33%|███▎ | 7304/22434 [7:58:29<10:42:22, 2.55s/it] +2025-02-05 18:06:09 - ERROR - stderr - 
+2025-02-05 18:06:09 - ERROR - stderr - +2025-02-05 18:06:09 - INFO - stdout - {'loss': 0.8592, 'grad_norm': 0.9665080308914185, 'learning_rate': 1.5758081914178457e-05, 'epoch': 0.98} +2025-02-05 18:06:09 - ERROR - stderr - 33%|███▎ | 7304/22434 [7:58:29<10:42:22, 2.55s/it] +2025-02-05 18:06:12 - ERROR - stderr - 33%|███▎ | 7305/22434 [7:58:32<10:38:52, 2.53s/it] +2025-02-05 18:06:12 - ERROR - stderr - +2025-02-05 18:06:12 - ERROR - stderr - +2025-02-05 18:06:12 - INFO - stdout - {'loss': 0.9497, 'grad_norm': 1.0083622932434082, 'learning_rate': 1.575690146885797e-05, 'epoch': 0.98} +2025-02-05 18:06:12 - ERROR - stderr - 33%|███▎ | 7305/22434 [7:58:32<10:38:52, 2.53s/it] +2025-02-05 18:06:14 - ERROR - stderr - 33%|███▎ | 7306/22434 [7:58:34<10:33:55, 2.51s/it] +2025-02-05 18:06:14 - ERROR - stderr - +2025-02-05 18:06:14 - ERROR - stderr - +2025-02-05 18:06:14 - INFO - stdout - {'loss': 0.9841, 'grad_norm': 1.1475253105163574, 'learning_rate': 1.575572090354038e-05, 'epoch': 0.98} +2025-02-05 18:06:14 - ERROR - stderr - 33%|███▎ | 7306/22434 [7:58:34<10:33:55, 2.51s/it] +2025-02-05 18:06:17 - ERROR - stderr - 33%|███▎ | 7307/22434 [7:58:37<11:07:58, 2.65s/it] +2025-02-05 18:06:17 - ERROR - stderr - +2025-02-05 18:06:17 - ERROR - stderr - +2025-02-05 18:06:17 - INFO - stdout - {'loss': 1.0083, 'grad_norm': 1.1957204341888428, 'learning_rate': 1.5754540218250296e-05, 'epoch': 0.98} +2025-02-05 18:06:17 - ERROR - stderr - 33%|███▎ | 7307/22434 [7:58:37<11:07:58, 2.65s/it] +2025-02-05 18:06:20 - ERROR - stderr - 33%|███▎ | 7308/22434 [7:58:40<10:56:24, 2.60s/it] +2025-02-05 18:06:20 - ERROR - stderr - +2025-02-05 18:06:20 - ERROR - stderr - +2025-02-05 18:06:20 - INFO - stdout - {'loss': 0.9365, 'grad_norm': 1.1623096466064453, 'learning_rate': 1.5753359413012332e-05, 'epoch': 0.98} +2025-02-05 18:06:20 - ERROR - stderr - 33%|███▎ | 7308/22434 [7:58:40<10:56:24, 2.60s/it] +2025-02-05 18:06:22 - ERROR - stderr - 33%|███▎ | 7309/22434 [7:58:42<10:52:41, 2.59s/it] 
+2025-02-05 18:06:22 - ERROR - stderr - +2025-02-05 18:06:22 - ERROR - stderr - +2025-02-05 18:06:22 - INFO - stdout - {'loss': 0.7962, 'grad_norm': 0.9830026030540466, 'learning_rate': 1.5752178487851087e-05, 'epoch': 0.98} +2025-02-05 18:06:22 - ERROR - stderr - 33%|███▎ | 7309/22434 [7:58:42<10:52:41, 2.59s/it] +2025-02-05 18:06:25 - ERROR - stderr - 33%|███▎ | 7310/22434 [7:58:45<10:45:44, 2.56s/it] +2025-02-05 18:06:25 - ERROR - stderr - +2025-02-05 18:06:25 - ERROR - stderr - +2025-02-05 18:06:25 - INFO - stdout - {'loss': 0.8736, 'grad_norm': 0.9934231042861938, 'learning_rate': 1.575099744279119e-05, 'epoch': 0.98} +2025-02-05 18:06:25 - ERROR - stderr - 33%|███▎ | 7310/22434 [7:58:45<10:45:44, 2.56s/it] +2025-02-05 18:06:27 - ERROR - stderr - 33%|███▎ | 7311/22434 [7:58:47<10:38:02, 2.53s/it] +2025-02-05 18:06:27 - ERROR - stderr - +2025-02-05 18:06:27 - ERROR - stderr - +2025-02-05 18:06:27 - INFO - stdout - {'loss': 0.9831, 'grad_norm': 0.9317435026168823, 'learning_rate': 1.574981627785726e-05, 'epoch': 0.98} +2025-02-05 18:06:27 - ERROR - stderr - 33%|███▎ | 7311/22434 [7:58:47<10:38:02, 2.53s/it] +2025-02-05 18:06:30 - ERROR - stderr - 33%|███▎ | 7312/22434 [7:58:50<10:42:19, 2.55s/it] +2025-02-05 18:06:30 - ERROR - stderr - +2025-02-05 18:06:30 - ERROR - stderr - +2025-02-05 18:06:30 - INFO - stdout - {'loss': 0.9071, 'grad_norm': 1.062959909439087, 'learning_rate': 1.5748634993073906e-05, 'epoch': 0.98} +2025-02-05 18:06:30 - ERROR - stderr - 33%|███▎ | 7312/22434 [7:58:50<10:42:19, 2.55s/it] +2025-02-05 18:06:32 - ERROR - stderr - 33%|███▎ | 7313/22434 [7:58:52<10:39:12, 2.54s/it] +2025-02-05 18:06:32 - ERROR - stderr - +2025-02-05 18:06:32 - ERROR - stderr - +2025-02-05 18:06:32 - INFO - stdout - {'loss': 0.8875, 'grad_norm': 1.0343540906906128, 'learning_rate': 1.5747453588465758e-05, 'epoch': 0.98} +2025-02-05 18:06:32 - ERROR - stderr - 33%|███▎ | 7313/22434 [7:58:52<10:39:12, 2.54s/it] +2025-02-05 18:06:35 - ERROR - stderr - 33%|███▎ | 
7314/22434 [7:58:55<10:37:34, 2.53s/it] +2025-02-05 18:06:35 - ERROR - stderr - +2025-02-05 18:06:35 - ERROR - stderr - +2025-02-05 18:06:35 - INFO - stdout - {'loss': 0.8804, 'grad_norm': 1.0103228092193604, 'learning_rate': 1.5746272064057438e-05, 'epoch': 0.98} +2025-02-05 18:06:35 - ERROR - stderr - 33%|███▎ | 7314/22434 [7:58:55<10:37:34, 2.53s/it] +2025-02-05 18:06:37 - ERROR - stderr - 33%|███▎ | 7315/22434 [7:58:57<10:43:03, 2.55s/it] +2025-02-05 18:06:38 - ERROR - stderr - +2025-02-05 18:06:38 - ERROR - stderr - +2025-02-05 18:06:38 - INFO - stdout - {'loss': 1.019, 'grad_norm': 1.0440034866333008, 'learning_rate': 1.574509041987358e-05, 'epoch': 0.98} +2025-02-05 18:06:38 - ERROR - stderr - 33%|███▎ | 7315/22434 [7:58:57<10:43:03, 2.55s/it] +2025-02-05 18:06:40 - ERROR - stderr - 33%|███▎ | 7316/22434 [7:59:00<10:46:54, 2.57s/it] +2025-02-05 18:06:40 - ERROR - stderr - +2025-02-05 18:06:40 - ERROR - stderr - +2025-02-05 18:06:40 - INFO - stdout - {'loss': 0.9038, 'grad_norm': 1.0099412202835083, 'learning_rate': 1.5743908655938803e-05, 'epoch': 0.98} +2025-02-05 18:06:40 - ERROR - stderr - 33%|███▎ | 7316/22434 [7:59:00<10:46:54, 2.57s/it] +2025-02-05 18:06:43 - ERROR - stderr - 33%|███▎ | 7317/22434 [7:59:02<10:48:37, 2.57s/it] +2025-02-05 18:06:43 - ERROR - stderr - +2025-02-05 18:06:43 - ERROR - stderr - +2025-02-05 18:06:43 - INFO - stdout - {'loss': 0.9544, 'grad_norm': 1.095073938369751, 'learning_rate': 1.574272677227775e-05, 'epoch': 0.98} +2025-02-05 18:06:43 - ERROR - stderr - 33%|███▎ | 7317/22434 [7:59:02<10:48:37, 2.57s/it] +2025-02-05 18:06:45 - ERROR - stderr - 33%|███▎ | 7318/22434 [7:59:05<10:38:45, 2.54s/it] +2025-02-05 18:06:45 - ERROR - stderr - +2025-02-05 18:06:45 - ERROR - stderr - +2025-02-05 18:06:45 - INFO - stdout - {'loss': 0.8614, 'grad_norm': 0.9910484552383423, 'learning_rate': 1.5741544768915055e-05, 'epoch': 0.98} +2025-02-05 18:06:45 - ERROR - stderr - 33%|███▎ | 7318/22434 [7:59:05<10:38:45, 2.54s/it] +2025-02-05 
18:06:48 - ERROR - stderr - 33%|███▎ | 7319/22434 [7:59:07<10:30:53, 2.50s/it] +2025-02-05 18:06:48 - ERROR - stderr - +2025-02-05 18:06:48 - ERROR - stderr - +2025-02-05 18:06:48 - INFO - stdout - {'loss': 0.908, 'grad_norm': 1.0039806365966797, 'learning_rate': 1.574036264587535e-05, 'epoch': 0.98} +2025-02-05 18:06:48 - ERROR - stderr - 33%|███▎ | 7319/22434 [7:59:07<10:30:53, 2.50s/it] +2025-02-05 18:06:50 - ERROR - stderr - 33%|███▎ | 7320/22434 [7:59:10<10:41:56, 2.55s/it] +2025-02-05 18:06:50 - ERROR - stderr - +2025-02-05 18:06:50 - ERROR - stderr - +2025-02-05 18:06:50 - INFO - stdout - {'loss': 1.0054, 'grad_norm': 1.0282633304595947, 'learning_rate': 1.573918040318328e-05, 'epoch': 0.98} +2025-02-05 18:06:50 - ERROR - stderr - 33%|███▎ | 7320/22434 [7:59:10<10:41:56, 2.55s/it] +2025-02-05 18:06:53 - ERROR - stderr - 33%|███▎ | 7321/22434 [7:59:12<10:36:15, 2.53s/it] +2025-02-05 18:06:53 - ERROR - stderr - +2025-02-05 18:06:53 - ERROR - stderr - +2025-02-05 18:06:53 - INFO - stdout - {'loss': 0.9732, 'grad_norm': 1.112988829612732, 'learning_rate': 1.5737998040863484e-05, 'epoch': 0.98} +2025-02-05 18:06:53 - ERROR - stderr - 33%|███▎ | 7321/22434 [7:59:12<10:36:15, 2.53s/it] +2025-02-05 18:06:55 - ERROR - stderr - 33%|███▎ | 7322/22434 [7:59:15<10:42:50, 2.55s/it] +2025-02-05 18:06:55 - ERROR - stderr - +2025-02-05 18:06:55 - ERROR - stderr - +2025-02-05 18:06:55 - INFO - stdout - {'loss': 0.9111, 'grad_norm': 0.9976562857627869, 'learning_rate': 1.5736815558940612e-05, 'epoch': 0.98} +2025-02-05 18:06:55 - ERROR - stderr - 33%|███▎ | 7322/22434 [7:59:15<10:42:50, 2.55s/it] +2025-02-05 18:06:58 - ERROR - stderr - 33%|███▎ | 7323/22434 [7:59:18<10:44:53, 2.56s/it] +2025-02-05 18:06:58 - ERROR - stderr - +2025-02-05 18:06:58 - ERROR - stderr - +2025-02-05 18:06:58 - INFO - stdout - {'loss': 1.033, 'grad_norm': 1.100832462310791, 'learning_rate': 1.573563295743931e-05, 'epoch': 0.98} +2025-02-05 18:06:58 - ERROR - stderr - 33%|███▎ | 7323/22434 
[7:59:18<10:44:53, 2.56s/it] +2025-02-05 18:07:00 - ERROR - stderr - 33%|███▎ | 7324/22434 [7:59:20<10:41:05, 2.55s/it] +2025-02-05 18:07:00 - ERROR - stderr - +2025-02-05 18:07:00 - ERROR - stderr - +2025-02-05 18:07:00 - INFO - stdout - {'loss': 0.8979, 'grad_norm': 0.9580290913581848, 'learning_rate': 1.5734450236384225e-05, 'epoch': 0.98} +2025-02-05 18:07:00 - ERROR - stderr - 33%|███▎ | 7324/22434 [7:59:20<10:41:05, 2.55s/it] +2025-02-05 18:07:03 - ERROR - stderr - 33%|███▎ | 7325/22434 [7:59:23<10:41:54, 2.55s/it] +2025-02-05 18:07:03 - ERROR - stderr - +2025-02-05 18:07:03 - ERROR - stderr - +2025-02-05 18:07:03 - INFO - stdout - {'loss': 0.9109, 'grad_norm': 1.113832950592041, 'learning_rate': 1.5733267395800014e-05, 'epoch': 0.98} +2025-02-05 18:07:03 - ERROR - stderr - 33%|███▎ | 7325/22434 [7:59:23<10:41:54, 2.55s/it] +2025-02-05 18:07:06 - ERROR - stderr - 33%|███▎ | 7326/22434 [7:59:25<10:51:42, 2.59s/it] +2025-02-05 18:07:06 - ERROR - stderr - +2025-02-05 18:07:06 - ERROR - stderr - +2025-02-05 18:07:06 - INFO - stdout - {'loss': 0.9898, 'grad_norm': 1.127493977546692, 'learning_rate': 1.5732084435711326e-05, 'epoch': 0.98} +2025-02-05 18:07:06 - ERROR - stderr - 33%|███▎ | 7326/22434 [7:59:25<10:51:42, 2.59s/it] +2025-02-05 18:07:08 - ERROR - stderr - 33%|███▎ | 7327/22434 [7:59:28<10:50:08, 2.58s/it] +2025-02-05 18:07:08 - ERROR - stderr - +2025-02-05 18:07:08 - ERROR - stderr - +2025-02-05 18:07:08 - INFO - stdout - {'loss': 0.8576, 'grad_norm': 0.9688475728034973, 'learning_rate': 1.573090135614283e-05, 'epoch': 0.98} +2025-02-05 18:07:08 - ERROR - stderr - 33%|███▎ | 7327/22434 [7:59:28<10:50:08, 2.58s/it] +2025-02-05 18:07:11 - ERROR - stderr - 33%|███▎ | 7328/22434 [7:59:30<10:47:47, 2.57s/it] +2025-02-05 18:07:11 - ERROR - stderr - +2025-02-05 18:07:11 - ERROR - stderr - +2025-02-05 18:07:11 - INFO - stdout - {'loss': 0.9895, 'grad_norm': 1.1061797142028809, 'learning_rate': 1.5729718157119176e-05, 'epoch': 0.98} +2025-02-05 18:07:11 - ERROR 
- stderr - 33%|███▎ | 7328/22434 [7:59:31<10:47:47, 2.57s/it] +2025-02-05 18:07:13 - ERROR - stderr - 33%|███▎ | 7329/22434 [7:59:33<10:38:13, 2.54s/it] +2025-02-05 18:07:13 - ERROR - stderr - +2025-02-05 18:07:13 - ERROR - stderr - +2025-02-05 18:07:13 - INFO - stdout - {'loss': 0.9773, 'grad_norm': 1.051244854927063, 'learning_rate': 1.5728534838665027e-05, 'epoch': 0.98} +2025-02-05 18:07:13 - ERROR - stderr - 33%|███▎ | 7329/22434 [7:59:33<10:38:13, 2.54s/it] +2025-02-05 18:07:16 - ERROR - stderr - 33%|███▎ | 7330/22434 [7:59:35<10:36:57, 2.53s/it] +2025-02-05 18:07:16 - ERROR - stderr - +2025-02-05 18:07:16 - ERROR - stderr - +2025-02-05 18:07:16 - INFO - stdout - {'loss': 0.9508, 'grad_norm': 1.1054096221923828, 'learning_rate': 1.5727351400805054e-05, 'epoch': 0.98} +2025-02-05 18:07:16 - ERROR - stderr - 33%|███▎ | 7330/22434 [7:59:36<10:36:57, 2.53s/it] +2025-02-05 18:07:18 - ERROR - stderr - 33%|███▎ | 7331/22434 [7:59:38<10:36:30, 2.53s/it] +2025-02-05 18:07:18 - ERROR - stderr - +2025-02-05 18:07:18 - ERROR - stderr - +2025-02-05 18:07:18 - INFO - stdout - {'loss': 0.8988, 'grad_norm': 1.0104281902313232, 'learning_rate': 1.572616784356392e-05, 'epoch': 0.98} +2025-02-05 18:07:18 - ERROR - stderr - 33%|███▎ | 7331/22434 [7:59:38<10:36:30, 2.53s/it] +2025-02-05 18:07:21 - ERROR - stderr - 33%|███▎ | 7332/22434 [7:59:40<10:34:44, 2.52s/it] +2025-02-05 18:07:21 - ERROR - stderr - +2025-02-05 18:07:21 - ERROR - stderr - +2025-02-05 18:07:21 - INFO - stdout - {'loss': 1.0162, 'grad_norm': 1.1442021131515503, 'learning_rate': 1.5724984166966297e-05, 'epoch': 0.98} +2025-02-05 18:07:21 - ERROR - stderr - 33%|███▎ | 7332/22434 [7:59:41<10:34:44, 2.52s/it] +2025-02-05 18:07:23 - ERROR - stderr - 33%|███▎ | 7333/22434 [7:59:43<10:29:05, 2.50s/it] +2025-02-05 18:07:23 - ERROR - stderr - +2025-02-05 18:07:23 - ERROR - stderr - +2025-02-05 18:07:23 - INFO - stdout - {'loss': 0.9163, 'grad_norm': 1.1502915620803833, 'learning_rate': 1.572380037103686e-05, 'epoch': 
0.98} +2025-02-05 18:07:23 - ERROR - stderr - 33%|███▎ | 7333/22434 [7:59:43<10:29:05, 2.50s/it] +2025-02-05 18:07:26 - ERROR - stderr - 33%|███▎ | 7334/22434 [7:59:45<10:30:15, 2.50s/it] +2025-02-05 18:07:26 - ERROR - stderr - +2025-02-05 18:07:26 - ERROR - stderr - +2025-02-05 18:07:26 - INFO - stdout - {'loss': 0.9383, 'grad_norm': 0.9938676357269287, 'learning_rate': 1.572261645580028e-05, 'epoch': 0.98} +2025-02-05 18:07:26 - ERROR - stderr - 33%|███▎ | 7334/22434 [7:59:46<10:30:15, 2.50s/it] +2025-02-05 18:07:28 - ERROR - stderr - 33%|███▎ | 7335/22434 [7:59:48<10:29:13, 2.50s/it] +2025-02-05 18:07:28 - ERROR - stderr - +2025-02-05 18:07:28 - ERROR - stderr - +2025-02-05 18:07:28 - INFO - stdout - {'loss': 0.9633, 'grad_norm': 1.1184587478637695, 'learning_rate': 1.572143242128123e-05, 'epoch': 0.98} +2025-02-05 18:07:28 - ERROR - stderr - 33%|███▎ | 7335/22434 [7:59:48<10:29:13, 2.50s/it] +2025-02-05 18:07:31 - ERROR - stderr - 33%|███▎ | 7336/22434 [7:59:50<10:33:20, 2.52s/it] +2025-02-05 18:07:31 - ERROR - stderr - +2025-02-05 18:07:31 - ERROR - stderr - +2025-02-05 18:07:31 - INFO - stdout - {'loss': 0.8644, 'grad_norm': 0.9877696633338928, 'learning_rate': 1.57202482675044e-05, 'epoch': 0.98} +2025-02-05 18:07:31 - ERROR - stderr - 33%|███▎ | 7336/22434 [7:59:51<10:33:20, 2.52s/it] +2025-02-05 18:07:33 - ERROR - stderr - 33%|███▎ | 7337/22434 [7:59:53<10:31:24, 2.51s/it] +2025-02-05 18:07:33 - ERROR - stderr - +2025-02-05 18:07:33 - ERROR - stderr - +2025-02-05 18:07:33 - INFO - stdout - {'loss': 0.8408, 'grad_norm': 1.213194727897644, 'learning_rate': 1.5719063994494474e-05, 'epoch': 0.98} +2025-02-05 18:07:33 - ERROR - stderr - 33%|███▎ | 7337/22434 [7:59:53<10:31:24, 2.51s/it] +2025-02-05 18:07:36 - ERROR - stderr - 33%|███▎ | 7338/22434 [7:59:56<10:37:10, 2.53s/it] +2025-02-05 18:07:36 - ERROR - stderr - +2025-02-05 18:07:36 - ERROR - stderr - +2025-02-05 18:07:36 - INFO - stdout - {'loss': 0.9135, 'grad_norm': 0.9884855151176453, 'learning_rate': 
1.5717879602276123e-05, 'epoch': 0.98} +2025-02-05 18:07:36 - ERROR - stderr - 33%|███▎ | 7338/22434 [7:59:56<10:37:10, 2.53s/it] +2025-02-05 18:07:38 - ERROR - stderr - 33%|███▎ | 7339/22434 [7:59:58<10:39:27, 2.54s/it] +2025-02-05 18:07:38 - ERROR - stderr - +2025-02-05 18:07:38 - ERROR - stderr - +2025-02-05 18:07:38 - INFO - stdout - {'loss': 0.9605, 'grad_norm': 1.1477363109588623, 'learning_rate': 1.571669509087405e-05, 'epoch': 0.98} +2025-02-05 18:07:38 - ERROR - stderr - 33%|███▎ | 7339/22434 [7:59:58<10:39:27, 2.54s/it] +2025-02-05 18:07:41 - ERROR - stderr - 33%|███▎ | 7340/22434 [8:00:01<10:37:49, 2.54s/it] +2025-02-05 18:07:41 - ERROR - stderr - +2025-02-05 18:07:41 - ERROR - stderr - +2025-02-05 18:07:41 - INFO - stdout - {'loss': 0.9601, 'grad_norm': 1.0498074293136597, 'learning_rate': 1.5715510460312936e-05, 'epoch': 0.98} +2025-02-05 18:07:41 - ERROR - stderr - 33%|███▎ | 7340/22434 [8:00:01<10:37:49, 2.54s/it] +2025-02-05 18:07:43 - ERROR - stderr - 33%|███▎ | 7341/22434 [8:00:03<10:36:25, 2.53s/it] +2025-02-05 18:07:43 - ERROR - stderr - +2025-02-05 18:07:43 - ERROR - stderr - +2025-02-05 18:07:43 - INFO - stdout - {'loss': 1.0461, 'grad_norm': 1.1118855476379395, 'learning_rate': 1.571432571061747e-05, 'epoch': 0.98} +2025-02-05 18:07:43 - ERROR - stderr - 33%|███▎ | 7341/22434 [8:00:03<10:36:25, 2.53s/it] +2025-02-05 18:07:46 - ERROR - stderr - 33%|███▎ | 7342/22434 [8:00:06<10:43:14, 2.56s/it] +2025-02-05 18:07:46 - ERROR - stderr - +2025-02-05 18:07:46 - ERROR - stderr - +2025-02-05 18:07:46 - INFO - stdout - {'loss': 0.9027, 'grad_norm': 1.0796371698379517, 'learning_rate': 1.571314084181236e-05, 'epoch': 0.98} +2025-02-05 18:07:46 - ERROR - stderr - 33%|███▎ | 7342/22434 [8:00:06<10:43:14, 2.56s/it] +2025-02-05 18:07:49 - ERROR - stderr - 33%|███▎ | 7343/22434 [8:00:08<10:44:12, 2.56s/it] +2025-02-05 18:07:49 - ERROR - stderr - +2025-02-05 18:07:49 - ERROR - stderr - +2025-02-05 18:07:49 - INFO - stdout - {'loss': 0.882, 'grad_norm': 
1.021366834640503, 'learning_rate': 1.5711955853922295e-05, 'epoch': 0.98} +2025-02-05 18:07:49 - ERROR - stderr - 33%|███▎ | 7343/22434 [8:00:08<10:44:12, 2.56s/it] +2025-02-05 18:07:51 - ERROR - stderr - 33%|███▎ | 7344/22434 [8:00:11<10:59:42, 2.62s/it] +2025-02-05 18:07:51 - ERROR - stderr - +2025-02-05 18:07:51 - ERROR - stderr - +2025-02-05 18:07:51 - INFO - stdout - {'loss': 1.147, 'grad_norm': 1.1015697717666626, 'learning_rate': 1.5710770746971973e-05, 'epoch': 0.98} +2025-02-05 18:07:51 - ERROR - stderr - 33%|███▎ | 7344/22434 [8:00:11<10:59:42, 2.62s/it] +2025-02-05 18:07:54 - ERROR - stderr - 33%|███▎ | 7345/22434 [8:00:14<10:52:54, 2.60s/it] +2025-02-05 18:07:54 - ERROR - stderr - +2025-02-05 18:07:54 - ERROR - stderr - +2025-02-05 18:07:54 - INFO - stdout - {'loss': 0.896, 'grad_norm': 1.0094363689422607, 'learning_rate': 1.5709585520986098e-05, 'epoch': 0.98} +2025-02-05 18:07:54 - ERROR - stderr - 33%|███▎ | 7345/22434 [8:00:14<10:52:54, 2.60s/it] +2025-02-05 18:07:56 - ERROR - stderr - 33%|███▎ | 7346/22434 [8:00:16<10:42:41, 2.56s/it] +2025-02-05 18:07:56 - ERROR - stderr - +2025-02-05 18:07:56 - ERROR - stderr - +2025-02-05 18:07:56 - INFO - stdout - {'loss': 0.8975, 'grad_norm': 1.0328314304351807, 'learning_rate': 1.570840017598938e-05, 'epoch': 0.98} +2025-02-05 18:07:56 - ERROR - stderr - 33%|███▎ | 7346/22434 [8:00:16<10:42:41, 2.56s/it] +2025-02-05 18:07:59 - ERROR - stderr - 33%|███▎ | 7347/22434 [8:00:19<10:37:44, 2.54s/it] +2025-02-05 18:07:59 - ERROR - stderr - +2025-02-05 18:07:59 - ERROR - stderr - +2025-02-05 18:07:59 - INFO - stdout - {'loss': 0.9552, 'grad_norm': 1.023192286491394, 'learning_rate': 1.5707214712006523e-05, 'epoch': 0.98} +2025-02-05 18:07:59 - ERROR - stderr - 33%|███▎ | 7347/22434 [8:00:19<10:37:44, 2.54s/it] +2025-02-05 18:08:01 - ERROR - stderr - 33%|███▎ | 7348/22434 [8:00:21<10:39:26, 2.54s/it] +2025-02-05 18:08:01 - ERROR - stderr - +2025-02-05 18:08:01 - ERROR - stderr - +2025-02-05 18:08:01 - INFO - stdout - 
{'loss': 0.9155, 'grad_norm': 1.0753324031829834, 'learning_rate': 1.5706029129062235e-05, 'epoch': 0.98} +2025-02-05 18:08:01 - ERROR - stderr - 33%|███▎ | 7348/22434 [8:00:21<10:39:26, 2.54s/it] +2025-02-05 18:08:04 - ERROR - stderr - 33%|███▎ | 7349/22434 [8:00:24<10:40:52, 2.55s/it] +2025-02-05 18:08:04 - ERROR - stderr - +2025-02-05 18:08:04 - ERROR - stderr - +2025-02-05 18:08:04 - INFO - stdout - {'loss': 0.9115, 'grad_norm': 0.9969714283943176, 'learning_rate': 1.570484342718123e-05, 'epoch': 0.98} +2025-02-05 18:08:04 - ERROR - stderr - 33%|███▎ | 7349/22434 [8:00:24<10:40:52, 2.55s/it] +2025-02-05 18:08:06 - ERROR - stderr - 33%|███▎ | 7350/22434 [8:00:26<10:35:09, 2.53s/it] +2025-02-05 18:08:06 - ERROR - stderr - +2025-02-05 18:08:06 - ERROR - stderr - +2025-02-05 18:08:06 - INFO - stdout - {'loss': 0.8759, 'grad_norm': 0.9892032146453857, 'learning_rate': 1.570365760638822e-05, 'epoch': 0.98} +2025-02-05 18:08:06 - ERROR - stderr - 33%|███▎ | 7350/22434 [8:00:26<10:35:09, 2.53s/it] +2025-02-05 18:08:09 - ERROR - stderr - 33%|███▎ | 7351/22434 [8:00:29<10:32:48, 2.52s/it] +2025-02-05 18:08:09 - ERROR - stderr - +2025-02-05 18:08:09 - ERROR - stderr - +2025-02-05 18:08:09 - INFO - stdout - {'loss': 0.8997, 'grad_norm': 0.9327731728553772, 'learning_rate': 1.5702471666707932e-05, 'epoch': 0.98} +2025-02-05 18:08:09 - ERROR - stderr - 33%|███▎ | 7351/22434 [8:00:29<10:32:48, 2.52s/it] +2025-02-05 18:08:12 - ERROR - stderr - 33%|███▎ | 7352/22434 [8:00:31<10:42:52, 2.56s/it] +2025-02-05 18:08:12 - ERROR - stderr - +2025-02-05 18:08:12 - ERROR - stderr - +2025-02-05 18:08:12 - INFO - stdout - {'loss': 1.0087, 'grad_norm': 1.0576577186584473, 'learning_rate': 1.5701285608165073e-05, 'epoch': 0.98} +2025-02-05 18:08:12 - ERROR - stderr - 33%|███▎ | 7352/22434 [8:00:31<10:42:52, 2.56s/it] +2025-02-05 18:08:14 - ERROR - stderr - 33%|███▎ | 7353/22434 [8:00:34<10:58:37, 2.62s/it] +2025-02-05 18:08:14 - ERROR - stderr - +2025-02-05 18:08:14 - ERROR - stderr - 
+2025-02-05 18:08:14 - INFO - stdout - {'loss': 0.9294, 'grad_norm': 0.9899141788482666, 'learning_rate': 1.570009943078437e-05, 'epoch': 0.98} +2025-02-05 18:08:14 - ERROR - stderr - 33%|███▎ | 7353/22434 [8:00:34<10:58:37, 2.62s/it] +2025-02-05 18:08:17 - ERROR - stderr - 33%|███▎ | 7354/22434 [8:00:37<11:06:55, 2.65s/it] +2025-02-05 18:08:17 - ERROR - stderr - +2025-02-05 18:08:17 - ERROR - stderr - +2025-02-05 18:08:17 - INFO - stdout - {'loss': 0.9147, 'grad_norm': 1.059346079826355, 'learning_rate': 1.5698913134590552e-05, 'epoch': 0.98} +2025-02-05 18:08:17 - ERROR - stderr - 33%|███▎ | 7354/22434 [8:00:37<11:06:55, 2.65s/it] +2025-02-05 18:08:20 - ERROR - stderr - 33%|███▎ | 7355/22434 [8:00:39<10:56:45, 2.61s/it] +2025-02-05 18:08:20 - ERROR - stderr - +2025-02-05 18:08:20 - ERROR - stderr - +2025-02-05 18:08:20 - INFO - stdout - {'loss': 1.0502, 'grad_norm': 1.059155821800232, 'learning_rate': 1.5697726719608345e-05, 'epoch': 0.98} +2025-02-05 18:08:20 - ERROR - stderr - 33%|███▎ | 7355/22434 [8:00:39<10:56:45, 2.61s/it] +2025-02-05 18:08:22 - ERROR - stderr - 33%|███▎ | 7356/22434 [8:00:42<10:45:51, 2.57s/it] +2025-02-05 18:08:22 - ERROR - stderr - +2025-02-05 18:08:22 - ERROR - stderr - +2025-02-05 18:08:22 - INFO - stdout - {'loss': 0.8843, 'grad_norm': 0.9704837203025818, 'learning_rate': 1.5696540185862472e-05, 'epoch': 0.98} +2025-02-05 18:08:22 - ERROR - stderr - 33%|███▎ | 7356/22434 [8:00:42<10:45:51, 2.57s/it] +2025-02-05 18:08:25 - ERROR - stderr - 33%|███▎ | 7357/22434 [8:00:44<10:45:02, 2.57s/it] +2025-02-05 18:08:25 - ERROR - stderr - +2025-02-05 18:08:25 - ERROR - stderr - +2025-02-05 18:08:25 - INFO - stdout - {'loss': 0.8452, 'grad_norm': 0.9326636791229248, 'learning_rate': 1.5695353533377674e-05, 'epoch': 0.98} +2025-02-05 18:08:25 - ERROR - stderr - 33%|███▎ | 7357/22434 [8:00:44<10:45:02, 2.57s/it] +2025-02-05 18:08:27 - ERROR - stderr - 33%|███▎ | 7358/22434 [8:00:47<10:36:43, 2.53s/it] +2025-02-05 18:08:27 - ERROR - stderr - 
+2025-02-05 18:08:27 - ERROR - stderr - +2025-02-05 18:08:27 - INFO - stdout - {'loss': 1.0068, 'grad_norm': 1.3575596809387207, 'learning_rate': 1.5694166762178677e-05, 'epoch': 0.98} +2025-02-05 18:08:27 - ERROR - stderr - 33%|███▎ | 7358/22434 [8:00:47<10:36:43, 2.53s/it] +2025-02-05 18:08:30 - ERROR - stderr - 33%|███▎ | 7359/22434 [8:00:49<10:34:59, 2.53s/it] +2025-02-05 18:08:30 - ERROR - stderr - +2025-02-05 18:08:30 - ERROR - stderr - +2025-02-05 18:08:30 - INFO - stdout - {'loss': 0.909, 'grad_norm': 0.9865586757659912, 'learning_rate': 1.569297987229023e-05, 'epoch': 0.98} +2025-02-05 18:08:30 - ERROR - stderr - 33%|███▎ | 7359/22434 [8:00:49<10:34:59, 2.53s/it] +2025-02-05 18:08:32 - ERROR - stderr - 33%|███▎ | 7360/22434 [8:00:52<10:36:55, 2.54s/it] +2025-02-05 18:08:32 - ERROR - stderr - +2025-02-05 18:08:32 - ERROR - stderr - +2025-02-05 18:08:32 - INFO - stdout - {'loss': 0.8825, 'grad_norm': 1.0550199747085571, 'learning_rate': 1.5691792863737053e-05, 'epoch': 0.98} +2025-02-05 18:08:32 - ERROR - stderr - 33%|███▎ | 7360/22434 [8:00:52<10:36:55, 2.54s/it] +2025-02-05 18:08:35 - ERROR - stderr - 33%|███▎ | 7361/22434 [8:00:54<10:37:38, 2.54s/it] +2025-02-05 18:08:35 - ERROR - stderr - +2025-02-05 18:08:35 - ERROR - stderr - +2025-02-05 18:08:35 - INFO - stdout - {'loss': 0.9704, 'grad_norm': 1.013679027557373, 'learning_rate': 1.569060573654391e-05, 'epoch': 0.98} +2025-02-05 18:08:35 - ERROR - stderr - 33%|███▎ | 7361/22434 [8:00:55<10:37:38, 2.54s/it] +2025-02-05 18:08:37 - ERROR - stderr - 33%|███▎ | 7362/22434 [8:00:57<10:35:21, 2.53s/it] +2025-02-05 18:08:37 - ERROR - stderr - +2025-02-05 18:08:37 - ERROR - stderr - +2025-02-05 18:08:37 - INFO - stdout - {'loss': 0.8687, 'grad_norm': 0.8777495622634888, 'learning_rate': 1.5689418490735533e-05, 'epoch': 0.98} +2025-02-05 18:08:37 - ERROR - stderr - 33%|███▎ | 7362/22434 [8:00:57<10:35:21, 2.53s/it] +2025-02-05 18:08:40 - ERROR - stderr - 33%|███▎ | 7363/22434 [8:01:00<10:36:35, 2.53s/it] 
+2025-02-05 18:08:40 - ERROR - stderr - +2025-02-05 18:08:40 - ERROR - stderr - +2025-02-05 18:08:40 - INFO - stdout - {'loss': 0.9328, 'grad_norm': 1.0988759994506836, 'learning_rate': 1.568823112633667e-05, 'epoch': 0.98} +2025-02-05 18:08:40 - ERROR - stderr - 33%|███▎ | 7363/22434 [8:01:00<10:36:35, 2.53s/it] +2025-02-05 18:08:42 - ERROR - stderr - 33%|███▎ | 7364/22434 [8:01:02<10:33:19, 2.52s/it] +2025-02-05 18:08:42 - ERROR - stderr - +2025-02-05 18:08:42 - ERROR - stderr - +2025-02-05 18:08:42 - INFO - stdout - {'loss': 0.9867, 'grad_norm': 1.0776951313018799, 'learning_rate': 1.5687043643372076e-05, 'epoch': 0.98} +2025-02-05 18:08:42 - ERROR - stderr - 33%|███▎ | 7364/22434 [8:01:02<10:33:19, 2.52s/it] +2025-02-05 18:08:45 - ERROR - stderr - 33%|███▎ | 7365/22434 [8:01:05<10:37:35, 2.54s/it] +2025-02-05 18:08:45 - ERROR - stderr - +2025-02-05 18:08:45 - ERROR - stderr - +2025-02-05 18:08:45 - INFO - stdout - {'loss': 1.013, 'grad_norm': 1.1265208721160889, 'learning_rate': 1.5685856041866495e-05, 'epoch': 0.98} +2025-02-05 18:08:45 - ERROR - stderr - 33%|███▎ | 7365/22434 [8:01:05<10:37:35, 2.54s/it] +2025-02-05 18:08:47 - ERROR - stderr - 33%|███▎ | 7366/22434 [8:01:07<10:41:31, 2.55s/it] +2025-02-05 18:08:47 - ERROR - stderr - +2025-02-05 18:08:47 - ERROR - stderr - +2025-02-05 18:08:47 - INFO - stdout - {'loss': 0.8655, 'grad_norm': 1.0599254369735718, 'learning_rate': 1.5684668321844688e-05, 'epoch': 0.99} +2025-02-05 18:08:47 - ERROR - stderr - 33%|███▎ | 7366/22434 [8:01:07<10:41:31, 2.55s/it] +2025-02-05 18:08:50 - ERROR - stderr - 33%|███▎ | 7367/22434 [8:01:10<10:43:00, 2.56s/it] +2025-02-05 18:08:50 - ERROR - stderr - +2025-02-05 18:08:50 - ERROR - stderr - +2025-02-05 18:08:50 - INFO - stdout - {'loss': 0.8722, 'grad_norm': 0.9927284121513367, 'learning_rate': 1.568348048333141e-05, 'epoch': 0.99} +2025-02-05 18:08:50 - ERROR - stderr - 33%|███▎ | 7367/22434 [8:01:10<10:43:00, 2.56s/it] +2025-02-05 18:08:53 - ERROR - stderr - 33%|███▎ | 
7368/22434 [8:01:12<10:39:37, 2.55s/it] +2025-02-05 18:08:53 - ERROR - stderr - +2025-02-05 18:08:53 - ERROR - stderr - +2025-02-05 18:08:53 - INFO - stdout - {'loss': 0.8717, 'grad_norm': 1.0686157941818237, 'learning_rate': 1.568229252635142e-05, 'epoch': 0.99} +2025-02-05 18:08:53 - ERROR - stderr - 33%|███▎ | 7368/22434 [8:01:12<10:39:37, 2.55s/it] +2025-02-05 18:08:55 - ERROR - stderr - 33%|███▎ | 7369/22434 [8:01:15<10:42:16, 2.56s/it] +2025-02-05 18:08:55 - ERROR - stderr - +2025-02-05 18:08:55 - ERROR - stderr - +2025-02-05 18:08:55 - INFO - stdout - {'loss': 0.9112, 'grad_norm': 1.0806455612182617, 'learning_rate': 1.5681104450929478e-05, 'epoch': 0.99} +2025-02-05 18:08:55 - ERROR - stderr - 33%|███▎ | 7369/22434 [8:01:15<10:42:16, 2.56s/it] +2025-02-05 18:08:58 - ERROR - stderr - 33%|███▎ | 7370/22434 [8:01:17<10:37:50, 2.54s/it] +2025-02-05 18:08:58 - ERROR - stderr - +2025-02-05 18:08:58 - ERROR - stderr - +2025-02-05 18:08:58 - INFO - stdout - {'loss': 1.0006, 'grad_norm': 1.0151512622833252, 'learning_rate': 1.5679916257090352e-05, 'epoch': 0.99} +2025-02-05 18:08:58 - ERROR - stderr - 33%|███▎ | 7370/22434 [8:01:17<10:37:50, 2.54s/it] +2025-02-05 18:09:00 - ERROR - stderr - 33%|███▎ | 7371/22434 [8:01:20<10:30:34, 2.51s/it] +2025-02-05 18:09:00 - ERROR - stderr - +2025-02-05 18:09:00 - ERROR - stderr - +2025-02-05 18:09:00 - INFO - stdout - {'loss': 0.9463, 'grad_norm': 1.1096863746643066, 'learning_rate': 1.5678727944858805e-05, 'epoch': 0.99} +2025-02-05 18:09:00 - ERROR - stderr - 33%|███▎ | 7371/22434 [8:01:20<10:30:34, 2.51s/it] +2025-02-05 18:09:03 - ERROR - stderr - 33%|███▎ | 7372/22434 [8:01:22<10:43:05, 2.56s/it] +2025-02-05 18:09:03 - ERROR - stderr - +2025-02-05 18:09:03 - ERROR - stderr - +2025-02-05 18:09:03 - INFO - stdout - {'loss': 1.045, 'grad_norm': 0.9373346567153931, 'learning_rate': 1.5677539514259608e-05, 'epoch': 0.99} +2025-02-05 18:09:03 - ERROR - stderr - 33%|███▎ | 7372/22434 [8:01:23<10:43:05, 2.56s/it] +2025-02-05 
18:09:05 - ERROR - stderr - 33%|███▎ | 7373/22434 [8:01:25<10:35:42, 2.53s/it] +2025-02-05 18:09:05 - ERROR - stderr - +2025-02-05 18:09:05 - ERROR - stderr - +2025-02-05 18:09:05 - INFO - stdout - {'loss': 0.9842, 'grad_norm': 1.094415307044983, 'learning_rate': 1.5676350965317532e-05, 'epoch': 0.99} +2025-02-05 18:09:05 - ERROR - stderr - 33%|███▎ | 7373/22434 [8:01:25<10:35:42, 2.53s/it] +2025-02-05 18:09:08 - ERROR - stderr - 33%|███▎ | 7374/22434 [8:01:28<10:39:30, 2.55s/it] +2025-02-05 18:09:08 - ERROR - stderr - +2025-02-05 18:09:08 - ERROR - stderr - +2025-02-05 18:09:08 - INFO - stdout - {'loss': 1.0295, 'grad_norm': 1.0701595544815063, 'learning_rate': 1.5675162298057353e-05, 'epoch': 0.99} +2025-02-05 18:09:08 - ERROR - stderr - 33%|███▎ | 7374/22434 [8:01:28<10:39:30, 2.55s/it] +2025-02-05 18:09:10 - ERROR - stderr - 33%|███▎ | 7375/22434 [8:01:30<10:38:30, 2.54s/it] +2025-02-05 18:09:10 - ERROR - stderr - +2025-02-05 18:09:10 - ERROR - stderr - +2025-02-05 18:09:10 - INFO - stdout - {'loss': 0.9807, 'grad_norm': 1.0825715065002441, 'learning_rate': 1.5673973512503846e-05, 'epoch': 0.99} +2025-02-05 18:09:10 - ERROR - stderr - 33%|███▎ | 7375/22434 [8:01:30<10:38:30, 2.54s/it] +2025-02-05 18:09:13 - ERROR - stderr - 33%|███▎ | 7376/22434 [8:01:33<10:37:36, 2.54s/it] +2025-02-05 18:09:13 - ERROR - stderr - +2025-02-05 18:09:13 - ERROR - stderr - +2025-02-05 18:09:13 - INFO - stdout - {'loss': 0.9936, 'grad_norm': 1.0876411199569702, 'learning_rate': 1.567278460868179e-05, 'epoch': 0.99} +2025-02-05 18:09:13 - ERROR - stderr - 33%|███▎ | 7376/22434 [8:01:33<10:37:36, 2.54s/it] +2025-02-05 18:09:15 - ERROR - stderr - 33%|███▎ | 7377/22434 [8:01:35<10:38:57, 2.55s/it] +2025-02-05 18:09:15 - ERROR - stderr - +2025-02-05 18:09:15 - ERROR - stderr - +2025-02-05 18:09:15 - INFO - stdout - {'loss': 0.9066, 'grad_norm': 0.9611084461212158, 'learning_rate': 1.5671595586615968e-05, 'epoch': 0.99} +2025-02-05 18:09:15 - ERROR - stderr - 33%|███▎ | 7377/22434 
[8:01:35<10:38:57, 2.55s/it] +2025-02-05 18:09:18 - ERROR - stderr - 33%|███▎ | 7378/22434 [8:01:38<10:32:31, 2.52s/it] +2025-02-05 18:09:18 - ERROR - stderr - +2025-02-05 18:09:18 - ERROR - stderr - +2025-02-05 18:09:18 - INFO - stdout - {'loss': 0.9281, 'grad_norm': 0.972186803817749, 'learning_rate': 1.5670406446331162e-05, 'epoch': 0.99} +2025-02-05 18:09:18 - ERROR - stderr - 33%|███▎ | 7378/22434 [8:01:38<10:32:31, 2.52s/it] +2025-02-05 18:09:21 - ERROR - stderr - 33%|███▎ | 7379/22434 [8:01:40<10:42:53, 2.56s/it] +2025-02-05 18:09:21 - ERROR - stderr - +2025-02-05 18:09:21 - ERROR - stderr - +2025-02-05 18:09:21 - INFO - stdout - {'loss': 0.9863, 'grad_norm': 0.9542217254638672, 'learning_rate': 1.566921718785216e-05, 'epoch': 0.99} +2025-02-05 18:09:21 - ERROR - stderr - 33%|███▎ | 7379/22434 [8:01:40<10:42:53, 2.56s/it] +2025-02-05 18:09:23 - ERROR - stderr - 33%|███▎ | 7380/22434 [8:01:43<10:39:09, 2.55s/it] +2025-02-05 18:09:23 - ERROR - stderr - +2025-02-05 18:09:23 - ERROR - stderr - +2025-02-05 18:09:23 - INFO - stdout - {'loss': 0.8012, 'grad_norm': 0.9953468441963196, 'learning_rate': 1.5668027811203752e-05, 'epoch': 0.99} +2025-02-05 18:09:23 - ERROR - stderr - 33%|███▎ | 7380/22434 [8:01:43<10:39:09, 2.55s/it] +2025-02-05 18:09:26 - ERROR - stderr - 33%|███▎ | 7381/22434 [8:01:45<10:35:53, 2.53s/it] +2025-02-05 18:09:26 - ERROR - stderr - +2025-02-05 18:09:26 - ERROR - stderr - +2025-02-05 18:09:26 - INFO - stdout - {'loss': 0.8389, 'grad_norm': 1.2139370441436768, 'learning_rate': 1.5666838316410727e-05, 'epoch': 0.99} +2025-02-05 18:09:26 - ERROR - stderr - 33%|███▎ | 7381/22434 [8:01:45<10:35:53, 2.53s/it] +2025-02-05 18:09:28 - ERROR - stderr - 33%|███▎ | 7382/22434 [8:01:48<10:31:35, 2.52s/it] +2025-02-05 18:09:28 - ERROR - stderr - +2025-02-05 18:09:28 - ERROR - stderr - +2025-02-05 18:09:28 - INFO - stdout - {'loss': 0.8267, 'grad_norm': 0.953430712223053, 'learning_rate': 1.566564870349788e-05, 'epoch': 0.99} +2025-02-05 18:09:28 - ERROR - 
stderr - 33%|███▎ | 7382/22434 [8:01:48<10:31:35, 2.52s/it] +2025-02-05 18:09:31 - ERROR - stderr - 33%|███▎ | 7383/22434 [8:01:50<10:34:27, 2.53s/it] +2025-02-05 18:09:31 - ERROR - stderr - +2025-02-05 18:09:31 - ERROR - stderr - +2025-02-05 18:09:31 - INFO - stdout - {'loss': 0.841, 'grad_norm': 1.0456918478012085, 'learning_rate': 1.566445897249001e-05, 'epoch': 0.99} +2025-02-05 18:09:31 - ERROR - stderr - 33%|███▎ | 7383/22434 [8:01:50<10:34:27, 2.53s/it] +2025-02-05 18:09:33 - ERROR - stderr - 33%|███▎ | 7384/22434 [8:01:53<10:31:17, 2.52s/it] +2025-02-05 18:09:33 - ERROR - stderr - +2025-02-05 18:09:33 - ERROR - stderr - +2025-02-05 18:09:33 - INFO - stdout - {'loss': 0.9885, 'grad_norm': 1.1001205444335938, 'learning_rate': 1.566326912341191e-05, 'epoch': 0.99} +2025-02-05 18:09:33 - ERROR - stderr - 33%|███▎ | 7384/22434 [8:01:53<10:31:17, 2.52s/it] +2025-02-05 18:09:36 - ERROR - stderr - 33%|███▎ | 7385/22434 [8:01:55<10:37:44, 2.54s/it] +2025-02-05 18:09:36 - ERROR - stderr - +2025-02-05 18:09:36 - ERROR - stderr - +2025-02-05 18:09:36 - INFO - stdout - {'loss': 0.8151, 'grad_norm': 0.989668071269989, 'learning_rate': 1.566207915628838e-05, 'epoch': 0.99} +2025-02-05 18:09:36 - ERROR - stderr - 33%|███▎ | 7385/22434 [8:01:55<10:37:44, 2.54s/it] +2025-02-05 18:09:38 - ERROR - stderr - 33%|███▎ | 7386/22434 [8:01:58<10:31:40, 2.52s/it] +2025-02-05 18:09:38 - ERROR - stderr - +2025-02-05 18:09:38 - ERROR - stderr - +2025-02-05 18:09:38 - INFO - stdout - {'loss': 0.9882, 'grad_norm': 1.3655250072479248, 'learning_rate': 1.5660889071144233e-05, 'epoch': 0.99} +2025-02-05 18:09:38 - ERROR - stderr - 33%|███▎ | 7386/22434 [8:01:58<10:31:40, 2.52s/it] +2025-02-05 18:09:41 - ERROR - stderr - 33%|███▎ | 7387/22434 [8:02:00<10:29:21, 2.51s/it] +2025-02-05 18:09:41 - ERROR - stderr - +2025-02-05 18:09:41 - ERROR - stderr - +2025-02-05 18:09:41 - INFO - stdout - {'loss': 0.912, 'grad_norm': 1.140180230140686, 'learning_rate': 1.5659698868004273e-05, 'epoch': 0.99} 
+2025-02-05 18:09:41 - ERROR - stderr - 33%|███▎ | 7387/22434 [8:02:00<10:29:21, 2.51s/it] +2025-02-05 18:09:43 - ERROR - stderr - 33%|███▎ | 7388/22434 [8:02:03<10:29:34, 2.51s/it] +2025-02-05 18:09:43 - ERROR - stderr - +2025-02-05 18:09:43 - ERROR - stderr - +2025-02-05 18:09:43 - INFO - stdout - {'loss': 0.9008, 'grad_norm': 1.0537627935409546, 'learning_rate': 1.56585085468933e-05, 'epoch': 0.99} +2025-02-05 18:09:43 - ERROR - stderr - 33%|███▎ | 7388/22434 [8:02:03<10:29:34, 2.51s/it] +2025-02-05 18:09:46 - ERROR - stderr - 33%|███▎ | 7389/22434 [8:02:05<10:33:03, 2.52s/it] +2025-02-05 18:09:46 - ERROR - stderr - +2025-02-05 18:09:46 - ERROR - stderr - +2025-02-05 18:09:46 - INFO - stdout - {'loss': 0.977, 'grad_norm': 1.3320255279541016, 'learning_rate': 1.5657318107836133e-05, 'epoch': 0.99} +2025-02-05 18:09:46 - ERROR - stderr - 33%|███▎ | 7389/22434 [8:02:06<10:33:03, 2.52s/it] +2025-02-05 18:09:48 - ERROR - stderr - 33%|███▎ | 7390/22434 [8:02:08<10:37:48, 2.54s/it] +2025-02-05 18:09:48 - ERROR - stderr - +2025-02-05 18:09:48 - ERROR - stderr - +2025-02-05 18:09:48 - INFO - stdout - {'loss': 1.0036, 'grad_norm': 1.1631561517715454, 'learning_rate': 1.5656127550857582e-05, 'epoch': 0.99} +2025-02-05 18:09:48 - ERROR - stderr - 33%|███▎ | 7390/22434 [8:02:08<10:37:48, 2.54s/it] +2025-02-05 18:09:51 - ERROR - stderr - 33%|███▎ | 7391/22434 [8:02:11<10:37:32, 2.54s/it] +2025-02-05 18:09:51 - ERROR - stderr - +2025-02-05 18:09:51 - ERROR - stderr - +2025-02-05 18:09:51 - INFO - stdout - {'loss': 0.9014, 'grad_norm': 0.962174117565155, 'learning_rate': 1.565493687598247e-05, 'epoch': 0.99} +2025-02-05 18:09:51 - ERROR - stderr - 33%|███▎ | 7391/22434 [8:02:11<10:37:32, 2.54s/it] +2025-02-05 18:09:53 - ERROR - stderr - 33%|███▎ | 7392/22434 [8:02:13<10:34:10, 2.53s/it] +2025-02-05 18:09:53 - ERROR - stderr - +2025-02-05 18:09:53 - ERROR - stderr - +2025-02-05 18:09:53 - INFO - stdout - {'loss': 0.9889, 'grad_norm': 1.0408951044082642, 'learning_rate': 
1.5653746083235605e-05, 'epoch': 0.99} +2025-02-05 18:09:53 - ERROR - stderr - 33%|███▎ | 7392/22434 [8:02:13<10:34:10, 2.53s/it] +2025-02-05 18:09:56 - ERROR - stderr - 33%|███▎ | 7393/22434 [8:02:16<10:56:21, 2.62s/it] +2025-02-05 18:09:56 - ERROR - stderr - +2025-02-05 18:09:56 - ERROR - stderr - +2025-02-05 18:09:56 - INFO - stdout - {'loss': 0.8964, 'grad_norm': 1.0736782550811768, 'learning_rate': 1.5652555172641815e-05, 'epoch': 0.99} +2025-02-05 18:09:56 - ERROR - stderr - 33%|███▎ | 7393/22434 [8:02:16<10:56:21, 2.62s/it] +2025-02-05 18:09:59 - ERROR - stderr - 33%|███▎ | 7394/22434 [8:02:18<10:49:44, 2.59s/it] +2025-02-05 18:09:59 - ERROR - stderr - +2025-02-05 18:09:59 - ERROR - stderr - +2025-02-05 18:09:59 - INFO - stdout - {'loss': 0.8366, 'grad_norm': 0.9619301557540894, 'learning_rate': 1.565136414422592e-05, 'epoch': 0.99} +2025-02-05 18:09:59 - ERROR - stderr - 33%|███▎ | 7394/22434 [8:02:18<10:49:44, 2.59s/it] +2025-02-05 18:10:01 - ERROR - stderr - 33%|███▎ | 7395/22434 [8:02:21<10:53:26, 2.61s/it] +2025-02-05 18:10:01 - ERROR - stderr - +2025-02-05 18:10:01 - ERROR - stderr - +2025-02-05 18:10:01 - INFO - stdout - {'loss': 0.8359, 'grad_norm': 0.990788996219635, 'learning_rate': 1.5650172998012746e-05, 'epoch': 0.99} +2025-02-05 18:10:01 - ERROR - stderr - 33%|███▎ | 7395/22434 [8:02:21<10:53:26, 2.61s/it] +2025-02-05 18:10:04 - ERROR - stderr - 33%|███▎ | 7396/22434 [8:02:24<10:42:41, 2.56s/it] +2025-02-05 18:10:04 - ERROR - stderr - +2025-02-05 18:10:04 - ERROR - stderr - +2025-02-05 18:10:04 - INFO - stdout - {'loss': 0.8799, 'grad_norm': 1.0370807647705078, 'learning_rate': 1.5648981734027128e-05, 'epoch': 0.99} +2025-02-05 18:10:04 - ERROR - stderr - 33%|███▎ | 7396/22434 [8:02:24<10:42:41, 2.56s/it] +2025-02-05 18:10:06 - ERROR - stderr - 33%|███▎ | 7397/22434 [8:02:26<10:41:47, 2.56s/it] +2025-02-05 18:10:06 - ERROR - stderr - +2025-02-05 18:10:06 - ERROR - stderr - +2025-02-05 18:10:06 - INFO - stdout - {'loss': 0.943, 'grad_norm': 
0.9267067313194275, 'learning_rate': 1.5647790352293887e-05, 'epoch': 0.99} +2025-02-05 18:10:06 - ERROR - stderr - 33%|███▎ | 7397/22434 [8:02:26<10:41:47, 2.56s/it] +2025-02-05 18:10:09 - ERROR - stderr - 33%|███▎ | 7398/22434 [8:02:29<10:55:10, 2.61s/it] +2025-02-05 18:10:09 - ERROR - stderr - +2025-02-05 18:10:09 - ERROR - stderr - +2025-02-05 18:10:09 - INFO - stdout - {'loss': 0.9372, 'grad_norm': 0.9224997758865356, 'learning_rate': 1.5646598852837862e-05, 'epoch': 0.99} +2025-02-05 18:10:09 - ERROR - stderr - 33%|███▎ | 7398/22434 [8:02:29<10:55:10, 2.61s/it] +2025-02-05 18:10:12 - ERROR - stderr - 33%|███▎ | 7399/22434 [8:02:31<10:55:37, 2.62s/it] +2025-02-05 18:10:12 - ERROR - stderr - +2025-02-05 18:10:12 - ERROR - stderr - +2025-02-05 18:10:12 - INFO - stdout - {'loss': 0.9052, 'grad_norm': 1.0839706659317017, 'learning_rate': 1.5645407235683885e-05, 'epoch': 0.99} +2025-02-05 18:10:12 - ERROR - stderr - 33%|███▎ | 7399/22434 [8:02:32<10:55:37, 2.62s/it] +2025-02-05 18:10:14 - ERROR - stderr - 33%|███▎ | 7400/22434 [8:02:34<10:49:26, 2.59s/it] +2025-02-05 18:10:14 - ERROR - stderr - +2025-02-05 18:10:14 - ERROR - stderr - +2025-02-05 18:10:14 - INFO - stdout - {'loss': 0.8346, 'grad_norm': 1.0069739818572998, 'learning_rate': 1.5644215500856795e-05, 'epoch': 0.99} +2025-02-05 18:10:14 - ERROR - stderr - 33%|███▎ | 7400/22434 [8:02:34<10:49:26, 2.59s/it] +2025-02-05 18:10:17 - ERROR - stderr - 33%|███▎ | 7401/22434 [8:02:37<10:43:37, 2.57s/it] +2025-02-05 18:10:17 - ERROR - stderr - +2025-02-05 18:10:17 - ERROR - stderr - +2025-02-05 18:10:17 - INFO - stdout - {'loss': 0.8905, 'grad_norm': 0.9451998472213745, 'learning_rate': 1.564302364838144e-05, 'epoch': 0.99} +2025-02-05 18:10:17 - ERROR - stderr - 33%|███▎ | 7401/22434 [8:02:37<10:43:37, 2.57s/it] +2025-02-05 18:10:19 - ERROR - stderr - 33%|███▎ | 7402/22434 [8:02:39<10:33:41, 2.53s/it] +2025-02-05 18:10:19 - ERROR - stderr - +2025-02-05 18:10:19 - ERROR - stderr - +2025-02-05 18:10:19 - INFO - 
stdout - {'loss': 0.7804, 'grad_norm': 0.9637553095817566, 'learning_rate': 1.564183167828265e-05, 'epoch': 0.99} +2025-02-05 18:10:19 - ERROR - stderr - 33%|███▎ | 7402/22434 [8:02:39<10:33:41, 2.53s/it] +2025-02-05 18:10:22 - ERROR - stderr - 33%|███▎ | 7403/22434 [8:02:42<10:37:08, 2.54s/it] +2025-02-05 18:10:22 - ERROR - stderr - +2025-02-05 18:10:22 - ERROR - stderr - +2025-02-05 18:10:22 - INFO - stdout - {'loss': 0.8407, 'grad_norm': 1.0743519067764282, 'learning_rate': 1.5640639590585283e-05, 'epoch': 0.99} +2025-02-05 18:10:22 - ERROR - stderr - 33%|███▎ | 7403/22434 [8:02:42<10:37:08, 2.54s/it] +2025-02-05 18:10:24 - ERROR - stderr - 33%|███▎ | 7404/22434 [8:02:44<10:31:18, 2.52s/it] +2025-02-05 18:10:24 - ERROR - stderr - +2025-02-05 18:10:24 - ERROR - stderr - +2025-02-05 18:10:24 - INFO - stdout - {'loss': 0.9274, 'grad_norm': 1.0005062818527222, 'learning_rate': 1.5639447385314176e-05, 'epoch': 0.99} +2025-02-05 18:10:24 - ERROR - stderr - 33%|███▎ | 7404/22434 [8:02:44<10:31:18, 2.52s/it] +2025-02-05 18:10:27 - ERROR - stderr - 33%|███▎ | 7405/22434 [8:02:47<10:31:08, 2.52s/it] +2025-02-05 18:10:27 - ERROR - stderr - +2025-02-05 18:10:27 - ERROR - stderr - +2025-02-05 18:10:27 - INFO - stdout - {'loss': 0.889, 'grad_norm': 1.0159330368041992, 'learning_rate': 1.563825506249419e-05, 'epoch': 0.99} +2025-02-05 18:10:27 - ERROR - stderr - 33%|███▎ | 7405/22434 [8:02:47<10:31:08, 2.52s/it] +2025-02-05 18:10:30 - ERROR - stderr - 33%|███▎ | 7406/22434 [8:02:49<11:02:35, 2.65s/it] +2025-02-05 18:10:30 - ERROR - stderr - +2025-02-05 18:10:30 - ERROR - stderr - +2025-02-05 18:10:30 - INFO - stdout - {'loss': 0.9576, 'grad_norm': 1.1005687713623047, 'learning_rate': 1.5637062622150168e-05, 'epoch': 0.99} +2025-02-05 18:10:30 - ERROR - stderr - 33%|███▎ | 7406/22434 [8:02:49<11:02:35, 2.65s/it] +2025-02-05 18:10:32 - ERROR - stderr - 33%|███▎ | 7407/22434 [8:02:52<10:49:52, 2.59s/it] +2025-02-05 18:10:32 - ERROR - stderr - +2025-02-05 18:10:32 - ERROR - stderr 
- +2025-02-05 18:10:32 - INFO - stdout - {'loss': 1.0048, 'grad_norm': 1.0260437726974487, 'learning_rate': 1.563587006430697e-05, 'epoch': 0.99} +2025-02-05 18:10:32 - ERROR - stderr - 33%|███▎ | 7407/22434 [8:02:52<10:49:52, 2.59s/it] +2025-02-05 18:10:35 - ERROR - stderr - 33%|███▎ | 7408/22434 [8:02:54<10:47:00, 2.58s/it] +2025-02-05 18:10:35 - ERROR - stderr - +2025-02-05 18:10:35 - ERROR - stderr - +2025-02-05 18:10:35 - INFO - stdout - {'loss': 0.9219, 'grad_norm': 1.042299509048462, 'learning_rate': 1.5634677388989457e-05, 'epoch': 0.99} +2025-02-05 18:10:35 - ERROR - stderr - 33%|███▎ | 7408/22434 [8:02:55<10:47:00, 2.58s/it] +2025-02-05 18:10:37 - ERROR - stderr - 33%|███▎ | 7409/22434 [8:02:57<10:41:20, 2.56s/it] +2025-02-05 18:10:37 - ERROR - stderr - +2025-02-05 18:10:37 - ERROR - stderr - +2025-02-05 18:10:37 - INFO - stdout - {'loss': 0.8893, 'grad_norm': 1.1027491092681885, 'learning_rate': 1.5633484596222485e-05, 'epoch': 0.99} +2025-02-05 18:10:37 - ERROR - stderr - 33%|███▎ | 7409/22434 [8:02:57<10:41:20, 2.56s/it] +2025-02-05 18:10:40 - ERROR - stderr - 33%|███▎ | 7410/22434 [8:03:00<10:39:45, 2.55s/it] +2025-02-05 18:10:40 - ERROR - stderr - +2025-02-05 18:10:40 - ERROR - stderr - +2025-02-05 18:10:40 - INFO - stdout - {'loss': 0.9282, 'grad_norm': 1.0449978113174438, 'learning_rate': 1.5632291686030915e-05, 'epoch': 0.99} +2025-02-05 18:10:40 - ERROR - stderr - 33%|███▎ | 7410/22434 [8:03:00<10:39:45, 2.55s/it] +2025-02-05 18:10:42 - ERROR - stderr - 33%|███▎ | 7411/22434 [8:03:02<10:33:22, 2.53s/it] +2025-02-05 18:10:42 - ERROR - stderr - +2025-02-05 18:10:42 - ERROR - stderr - +2025-02-05 18:10:42 - INFO - stdout - {'loss': 0.834, 'grad_norm': 1.004031777381897, 'learning_rate': 1.5631098658439613e-05, 'epoch': 0.99} +2025-02-05 18:10:42 - ERROR - stderr - 33%|███▎ | 7411/22434 [8:03:02<10:33:22, 2.53s/it] +2025-02-05 18:10:45 - ERROR - stderr - 33%|███▎ | 7412/22434 [8:03:04<10:30:52, 2.52s/it] +2025-02-05 18:10:45 - ERROR - stderr - 
+2025-02-05 18:10:45 - ERROR - stderr - +2025-02-05 18:10:45 - INFO - stdout - {'loss': 0.9026, 'grad_norm': 0.9347333908081055, 'learning_rate': 1.562990551347345e-05, 'epoch': 0.99} +2025-02-05 18:10:45 - ERROR - stderr - 33%|███▎ | 7412/22434 [8:03:05<10:30:52, 2.52s/it] +2025-02-05 18:10:47 - ERROR - stderr - 33%|███▎ | 7413/22434 [8:03:07<10:24:11, 2.49s/it] +2025-02-05 18:10:47 - ERROR - stderr - +2025-02-05 18:10:47 - ERROR - stderr - +2025-02-05 18:10:47 - INFO - stdout - {'loss': 1.1266, 'grad_norm': 1.027057409286499, 'learning_rate': 1.5628712251157298e-05, 'epoch': 0.99} +2025-02-05 18:10:47 - ERROR - stderr - 33%|███▎ | 7413/22434 [8:03:07<10:24:11, 2.49s/it] +2025-02-05 18:10:50 - ERROR - stderr - 33%|███▎ | 7414/22434 [8:03:09<10:22:53, 2.49s/it] +2025-02-05 18:10:50 - ERROR - stderr - +2025-02-05 18:10:50 - ERROR - stderr - +2025-02-05 18:10:50 - INFO - stdout - {'loss': 0.9656, 'grad_norm': 1.117738127708435, 'learning_rate': 1.562751887151602e-05, 'epoch': 0.99} +2025-02-05 18:10:50 - ERROR - stderr - 33%|███▎ | 7414/22434 [8:03:09<10:22:53, 2.49s/it] +2025-02-05 18:10:52 - ERROR - stderr - 33%|███▎ | 7415/22434 [8:03:12<10:30:18, 2.52s/it] +2025-02-05 18:10:52 - ERROR - stderr - +2025-02-05 18:10:52 - ERROR - stderr - +2025-02-05 18:10:52 - INFO - stdout - {'loss': 0.8059, 'grad_norm': 1.0061217546463013, 'learning_rate': 1.5626325374574495e-05, 'epoch': 0.99} +2025-02-05 18:10:52 - ERROR - stderr - 33%|███▎ | 7415/22434 [8:03:12<10:30:18, 2.52s/it] +2025-02-05 18:10:55 - ERROR - stderr - 33%|███▎ | 7416/22434 [8:03:15<10:34:39, 2.54s/it] +2025-02-05 18:10:55 - ERROR - stderr - +2025-02-05 18:10:55 - ERROR - stderr - +2025-02-05 18:10:55 - INFO - stdout - {'loss': 0.9177, 'grad_norm': 1.0674973726272583, 'learning_rate': 1.5625131760357603e-05, 'epoch': 0.99} +2025-02-05 18:10:55 - ERROR - stderr - 33%|███▎ | 7416/22434 [8:03:15<10:34:39, 2.54s/it] +2025-02-05 18:10:57 - ERROR - stderr - 33%|███▎ | 7417/22434 [8:03:17<10:33:38, 2.53s/it] 
+2025-02-05 18:10:57 - ERROR - stderr - +2025-02-05 18:10:57 - ERROR - stderr - +2025-02-05 18:10:57 - INFO - stdout - {'loss': 0.9132, 'grad_norm': 0.9088889360427856, 'learning_rate': 1.5623938028890222e-05, 'epoch': 0.99} +2025-02-05 18:10:57 - ERROR - stderr - 33%|███▎ | 7417/22434 [8:03:17<10:33:38, 2.53s/it] +2025-02-05 18:11:00 - ERROR - stderr - 33%|███▎ | 7418/22434 [8:03:20<10:34:40, 2.54s/it] +2025-02-05 18:11:00 - ERROR - stderr - +2025-02-05 18:11:00 - ERROR - stderr - +2025-02-05 18:11:00 - INFO - stdout - {'loss': 0.8085, 'grad_norm': 1.1138849258422852, 'learning_rate': 1.5622744180197236e-05, 'epoch': 0.99} +2025-02-05 18:11:00 - ERROR - stderr - 33%|███▎ | 7418/22434 [8:03:20<10:34:40, 2.54s/it] +2025-02-05 18:11:02 - ERROR - stderr - 33%|███▎ | 7419/22434 [8:03:22<10:30:00, 2.52s/it] +2025-02-05 18:11:02 - ERROR - stderr - +2025-02-05 18:11:02 - ERROR - stderr - +2025-02-05 18:11:02 - INFO - stdout - {'loss': 0.8653, 'grad_norm': 0.9512990713119507, 'learning_rate': 1.5621550214303526e-05, 'epoch': 0.99} +2025-02-05 18:11:02 - ERROR - stderr - 33%|███▎ | 7419/22434 [8:03:22<10:30:00, 2.52s/it] +2025-02-05 18:11:05 - ERROR - stderr - 33%|███▎ | 7420/22434 [8:03:25<10:24:44, 2.50s/it] +2025-02-05 18:11:05 - ERROR - stderr - +2025-02-05 18:11:05 - ERROR - stderr - +2025-02-05 18:11:05 - INFO - stdout - {'loss': 0.9056, 'grad_norm': 1.0697312355041504, 'learning_rate': 1.5620356131233982e-05, 'epoch': 0.99} +2025-02-05 18:11:05 - ERROR - stderr - 33%|███▎ | 7420/22434 [8:03:25<10:24:44, 2.50s/it] +2025-02-05 18:11:07 - ERROR - stderr - 33%|███▎ | 7421/22434 [8:03:27<10:21:25, 2.48s/it] +2025-02-05 18:11:07 - ERROR - stderr - +2025-02-05 18:11:07 - ERROR - stderr - +2025-02-05 18:11:07 - INFO - stdout - {'loss': 0.954, 'grad_norm': 1.0702040195465088, 'learning_rate': 1.5619161931013494e-05, 'epoch': 0.99} +2025-02-05 18:11:07 - ERROR - stderr - 33%|███▎ | 7421/22434 [8:03:27<10:21:25, 2.48s/it] +2025-02-05 18:11:10 - ERROR - stderr - 33%|███▎ | 
7422/22434 [8:03:29<10:19:28, 2.48s/it] +2025-02-05 18:11:10 - ERROR - stderr - +2025-02-05 18:11:10 - ERROR - stderr - +2025-02-05 18:11:10 - INFO - stdout - {'loss': 0.8585, 'grad_norm': 0.9819034337997437, 'learning_rate': 1.561796761366695e-05, 'epoch': 0.99} +2025-02-05 18:11:10 - ERROR - stderr - 33%|███▎ | 7422/22434 [8:03:30<10:19:28, 2.48s/it] +2025-02-05 18:11:12 - ERROR - stderr - 33%|███▎ | 7423/22434 [8:03:32<10:23:45, 2.49s/it] +2025-02-05 18:11:12 - ERROR - stderr - +2025-02-05 18:11:12 - ERROR - stderr - +2025-02-05 18:11:12 - INFO - stdout - {'loss': 1.0147, 'grad_norm': 0.9655911326408386, 'learning_rate': 1.5616773179219248e-05, 'epoch': 0.99} +2025-02-05 18:11:12 - ERROR - stderr - 33%|███▎ | 7423/22434 [8:03:32<10:23:45, 2.49s/it] +2025-02-05 18:11:15 - ERROR - stderr - 33%|███▎ | 7424/22434 [8:03:35<10:24:49, 2.50s/it] +2025-02-05 18:11:15 - ERROR - stderr - +2025-02-05 18:11:15 - ERROR - stderr - +2025-02-05 18:11:15 - INFO - stdout - {'loss': 0.8669, 'grad_norm': 1.0780658721923828, 'learning_rate': 1.5615578627695283e-05, 'epoch': 0.99} +2025-02-05 18:11:15 - ERROR - stderr - 33%|███▎ | 7424/22434 [8:03:35<10:24:49, 2.50s/it] +2025-02-05 18:11:17 - ERROR - stderr - 33%|███▎ | 7425/22434 [8:03:37<10:23:10, 2.49s/it] +2025-02-05 18:11:17 - ERROR - stderr - +2025-02-05 18:11:17 - ERROR - stderr - +2025-02-05 18:11:17 - INFO - stdout - {'loss': 0.8619, 'grad_norm': 0.9833524227142334, 'learning_rate': 1.5614383959119958e-05, 'epoch': 0.99} +2025-02-05 18:11:17 - ERROR - stderr - 33%|███▎ | 7425/22434 [8:03:37<10:23:10, 2.49s/it] +2025-02-05 18:11:20 - ERROR - stderr - 33%|███▎ | 7426/22434 [8:03:40<10:30:03, 2.52s/it] +2025-02-05 18:11:20 - ERROR - stderr - +2025-02-05 18:11:20 - ERROR - stderr - +2025-02-05 18:11:20 - INFO - stdout - {'loss': 0.84, 'grad_norm': 0.9126155376434326, 'learning_rate': 1.5613189173518167e-05, 'epoch': 0.99} +2025-02-05 18:11:20 - ERROR - stderr - 33%|███▎ | 7426/22434 [8:03:40<10:30:03, 2.52s/it] +2025-02-05 
18:11:22 - ERROR - stderr - 33%|███▎ | 7427/22434 [8:03:42<10:32:56, 2.53s/it] +2025-02-05 18:11:22 - ERROR - stderr - +2025-02-05 18:11:22 - ERROR - stderr - +2025-02-05 18:11:22 - INFO - stdout - {'loss': 0.9353, 'grad_norm': 1.0242676734924316, 'learning_rate': 1.561199427091482e-05, 'epoch': 0.99} +2025-02-05 18:11:22 - ERROR - stderr - 33%|███▎ | 7427/22434 [8:03:42<10:32:56, 2.53s/it] +2025-02-05 18:11:25 - ERROR - stderr - 33%|███▎ | 7428/22434 [8:03:45<10:32:05, 2.53s/it] +2025-02-05 18:11:25 - ERROR - stderr - +2025-02-05 18:11:25 - ERROR - stderr - +2025-02-05 18:11:25 - INFO - stdout - {'loss': 0.9128, 'grad_norm': 1.106041669845581, 'learning_rate': 1.5610799251334825e-05, 'epoch': 0.99} +2025-02-05 18:11:25 - ERROR - stderr - 33%|███▎ | 7428/22434 [8:03:45<10:32:05, 2.53s/it] +2025-02-05 18:11:27 - ERROR - stderr - 33%|███▎ | 7429/22434 [8:03:47<10:29:18, 2.52s/it] +2025-02-05 18:11:27 - ERROR - stderr - +2025-02-05 18:11:27 - ERROR - stderr - +2025-02-05 18:11:27 - INFO - stdout - {'loss': 0.9987, 'grad_norm': 1.106690526008606, 'learning_rate': 1.5609604114803086e-05, 'epoch': 0.99} +2025-02-05 18:11:27 - ERROR - stderr - 33%|███▎ | 7429/22434 [8:03:47<10:29:18, 2.52s/it] +2025-02-05 18:11:30 - ERROR - stderr - 33%|███▎ | 7430/22434 [8:03:50<10:55:26, 2.62s/it] +2025-02-05 18:11:30 - ERROR - stderr - +2025-02-05 18:11:30 - ERROR - stderr - +2025-02-05 18:11:30 - INFO - stdout - {'loss': 0.9451, 'grad_norm': 1.1168925762176514, 'learning_rate': 1.560840886134452e-05, 'epoch': 0.99} +2025-02-05 18:11:30 - ERROR - stderr - 33%|███▎ | 7430/22434 [8:03:50<10:55:26, 2.62s/it] +2025-02-05 18:11:33 - ERROR - stderr - 33%|███▎ | 7431/22434 [8:03:53<10:51:00, 2.60s/it] +2025-02-05 18:11:33 - ERROR - stderr - +2025-02-05 18:11:33 - ERROR - stderr - +2025-02-05 18:11:33 - INFO - stdout - {'loss': 0.8905, 'grad_norm': 1.088492512702942, 'learning_rate': 1.5607213490984038e-05, 'epoch': 0.99} +2025-02-05 18:11:33 - ERROR - stderr - 33%|███▎ | 7431/22434 
[8:03:53<10:51:00, 2.60s/it] +2025-02-05 18:11:35 - ERROR - stderr - 33%|███▎ | 7432/22434 [8:03:55<10:46:43, 2.59s/it] +2025-02-05 18:11:35 - ERROR - stderr - +2025-02-05 18:11:35 - ERROR - stderr - +2025-02-05 18:11:35 - INFO - stdout - {'loss': 0.8631, 'grad_norm': 1.0752133131027222, 'learning_rate': 1.5606018003746554e-05, 'epoch': 0.99} +2025-02-05 18:11:35 - ERROR - stderr - 33%|███▎ | 7432/22434 [8:03:55<10:46:43, 2.59s/it] +2025-02-05 18:11:38 - ERROR - stderr - 33%|███▎ | 7433/22434 [8:03:58<10:35:44, 2.54s/it] +2025-02-05 18:11:38 - ERROR - stderr - +2025-02-05 18:11:38 - ERROR - stderr - +2025-02-05 18:11:38 - INFO - stdout - {'loss': 1.0006, 'grad_norm': 1.1848264932632446, 'learning_rate': 1.560482239965699e-05, 'epoch': 0.99} +2025-02-05 18:11:38 - ERROR - stderr - 33%|███▎ | 7433/22434 [8:03:58<10:35:44, 2.54s/it] +2025-02-05 18:11:40 - ERROR - stderr - 33%|███▎ | 7434/22434 [8:04:00<10:35:49, 2.54s/it] +2025-02-05 18:11:40 - ERROR - stderr - +2025-02-05 18:11:40 - ERROR - stderr - +2025-02-05 18:11:40 - INFO - stdout - {'loss': 0.7483, 'grad_norm': 1.0940881967544556, 'learning_rate': 1.5603626678740266e-05, 'epoch': 0.99} +2025-02-05 18:11:40 - ERROR - stderr - 33%|███▎ | 7434/22434 [8:04:00<10:35:49, 2.54s/it] +2025-02-05 18:11:43 - ERROR - stderr - 33%|███▎ | 7435/22434 [8:04:03<10:29:33, 2.52s/it] +2025-02-05 18:11:43 - ERROR - stderr - +2025-02-05 18:11:43 - ERROR - stderr - +2025-02-05 18:11:43 - INFO - stdout - {'loss': 1.0622, 'grad_norm': 1.1784415245056152, 'learning_rate': 1.5602430841021304e-05, 'epoch': 0.99} +2025-02-05 18:11:43 - ERROR - stderr - 33%|███▎ | 7435/22434 [8:04:03<10:29:33, 2.52s/it] +2025-02-05 18:11:45 - ERROR - stderr - 33%|███▎ | 7436/22434 [8:04:05<10:33:08, 2.53s/it] +2025-02-05 18:11:45 - ERROR - stderr - +2025-02-05 18:11:45 - ERROR - stderr - +2025-02-05 18:11:45 - INFO - stdout - {'loss': 1.0632, 'grad_norm': 1.0477439165115356, 'learning_rate': 1.5601234886525034e-05, 'epoch': 0.99} +2025-02-05 18:11:45 - 
ERROR - stderr - 33%|███▎ | 7436/22434 [8:04:05<10:33:08, 2.53s/it] +2025-02-05 18:11:48 - ERROR - stderr - 33%|███▎ | 7437/22434 [8:04:08<10:32:14, 2.53s/it] +2025-02-05 18:11:48 - ERROR - stderr - +2025-02-05 18:11:48 - ERROR - stderr - +2025-02-05 18:11:48 - INFO - stdout - {'loss': 0.869, 'grad_norm': 0.9544627666473389, 'learning_rate': 1.560003881527638e-05, 'epoch': 0.99} +2025-02-05 18:11:48 - ERROR - stderr - 33%|███▎ | 7437/22434 [8:04:08<10:32:14, 2.53s/it] +2025-02-05 18:11:51 - ERROR - stderr - 33%|███▎ | 7438/22434 [8:04:10<10:41:55, 2.57s/it] +2025-02-05 18:11:51 - ERROR - stderr - +2025-02-05 18:11:51 - ERROR - stderr - +2025-02-05 18:11:51 - INFO - stdout - {'loss': 0.8926, 'grad_norm': 0.9303823709487915, 'learning_rate': 1.559884262730028e-05, 'epoch': 0.99} +2025-02-05 18:11:51 - ERROR - stderr - 33%|███▎ | 7438/22434 [8:04:10<10:41:55, 2.57s/it] +2025-02-05 18:11:53 - ERROR - stderr - 33%|███▎ | 7439/22434 [8:04:13<10:41:55, 2.57s/it] +2025-02-05 18:11:53 - ERROR - stderr - +2025-02-05 18:11:53 - ERROR - stderr - +2025-02-05 18:11:53 - INFO - stdout - {'loss': 0.8717, 'grad_norm': 1.0375386476516724, 'learning_rate': 1.5597646322621663e-05, 'epoch': 0.99} +2025-02-05 18:11:53 - ERROR - stderr - 33%|███▎ | 7439/22434 [8:04:13<10:41:55, 2.57s/it] +2025-02-05 18:11:56 - ERROR - stderr - 33%|███▎ | 7440/22434 [8:04:15<10:44:56, 2.58s/it] +2025-02-05 18:11:56 - ERROR - stderr - +2025-02-05 18:11:56 - ERROR - stderr - +2025-02-05 18:11:56 - INFO - stdout - {'loss': 0.9887, 'grad_norm': 1.0868362188339233, 'learning_rate': 1.559644990126546e-05, 'epoch': 0.99} +2025-02-05 18:11:56 - ERROR - stderr - 33%|███▎ | 7440/22434 [8:04:16<10:44:56, 2.58s/it] +2025-02-05 18:11:58 - ERROR - stderr - 33%|███▎ | 7441/22434 [8:04:18<10:40:49, 2.56s/it] +2025-02-05 18:11:58 - ERROR - stderr - +2025-02-05 18:11:58 - ERROR - stderr - +2025-02-05 18:11:58 - INFO - stdout - {'loss': 0.845, 'grad_norm': 1.0401692390441895, 'learning_rate': 1.559525336325662e-05, 'epoch': 
1.0} +2025-02-05 18:11:58 - ERROR - stderr - 33%|███▎ | 7441/22434 [8:04:18<10:40:49, 2.56s/it] +2025-02-05 18:12:01 - ERROR - stderr - 33%|███▎ | 7442/22434 [8:04:21<10:36:04, 2.55s/it] +2025-02-05 18:12:01 - ERROR - stderr - +2025-02-05 18:12:01 - ERROR - stderr - +2025-02-05 18:12:01 - INFO - stdout - {'loss': 0.9772, 'grad_norm': 1.1592814922332764, 'learning_rate': 1.5594056708620073e-05, 'epoch': 1.0} +2025-02-05 18:12:01 - ERROR - stderr - 33%|███▎ | 7442/22434 [8:04:21<10:36:04, 2.55s/it] +2025-02-05 18:12:03 - ERROR - stderr - 33%|███▎ | 7443/22434 [8:04:23<10:32:00, 2.53s/it] +2025-02-05 18:12:03 - ERROR - stderr - +2025-02-05 18:12:03 - ERROR - stderr - +2025-02-05 18:12:03 - INFO - stdout - {'loss': 0.9896, 'grad_norm': 1.0790382623672485, 'learning_rate': 1.559285993738077e-05, 'epoch': 1.0} +2025-02-05 18:12:03 - ERROR - stderr - 33%|███▎ | 7443/22434 [8:04:23<10:32:00, 2.53s/it] +2025-02-05 18:12:06 - ERROR - stderr - 33%|███▎ | 7444/22434 [8:04:26<11:09:52, 2.68s/it] +2025-02-05 18:12:06 - ERROR - stderr - +2025-02-05 18:12:06 - ERROR - stderr - +2025-02-05 18:12:06 - INFO - stdout - {'loss': 0.9018, 'grad_norm': 0.9497143030166626, 'learning_rate': 1.559166304956365e-05, 'epoch': 1.0} +2025-02-05 18:12:06 - ERROR - stderr - 33%|███▎ | 7444/22434 [8:04:26<11:09:52, 2.68s/it] +2025-02-05 18:12:09 - ERROR - stderr - 33%|███▎ | 7445/22434 [8:04:29<10:54:53, 2.62s/it] +2025-02-05 18:12:09 - ERROR - stderr - +2025-02-05 18:12:09 - ERROR - stderr - +2025-02-05 18:12:09 - INFO - stdout - {'loss': 0.831, 'grad_norm': 0.9990165829658508, 'learning_rate': 1.5590466045193666e-05, 'epoch': 1.0} +2025-02-05 18:12:09 - ERROR - stderr - 33%|███▎ | 7445/22434 [8:04:29<10:54:53, 2.62s/it] +2025-02-05 18:12:11 - ERROR - stderr - 33%|███▎ | 7446/22434 [8:04:31<10:54:56, 2.62s/it] +2025-02-05 18:12:11 - ERROR - stderr - +2025-02-05 18:12:11 - ERROR - stderr - +2025-02-05 18:12:11 - INFO - stdout - {'loss': 0.9651, 'grad_norm': 1.0630600452423096, 'learning_rate': 
1.5589268924295768e-05, 'epoch': 1.0} +2025-02-05 18:12:11 - ERROR - stderr - 33%|███▎ | 7446/22434 [8:04:31<10:54:56, 2.62s/it] +2025-02-05 18:12:14 - ERROR - stderr - 33%|███▎ | 7447/22434 [8:04:34<10:45:39, 2.58s/it] +2025-02-05 18:12:14 - ERROR - stderr - +2025-02-05 18:12:14 - ERROR - stderr - +2025-02-05 18:12:14 - INFO - stdout - {'loss': 0.8993, 'grad_norm': 1.0353327989578247, 'learning_rate': 1.558807168689491e-05, 'epoch': 1.0} +2025-02-05 18:12:14 - ERROR - stderr - 33%|███▎ | 7447/22434 [8:04:34<10:45:39, 2.58s/it] +2025-02-05 18:12:16 - ERROR - stderr - 33%|███▎ | 7448/22434 [8:04:36<10:39:35, 2.56s/it] +2025-02-05 18:12:16 - ERROR - stderr - +2025-02-05 18:12:16 - ERROR - stderr - +2025-02-05 18:12:16 - INFO - stdout - {'loss': 1.0157, 'grad_norm': 1.175265908241272, 'learning_rate': 1.558687433301604e-05, 'epoch': 1.0} +2025-02-05 18:12:16 - ERROR - stderr - 33%|███▎ | 7448/22434 [8:04:36<10:39:35, 2.56s/it] +2025-02-05 18:12:19 - ERROR - stderr - 33%|███▎ | 7449/22434 [8:04:39<10:32:33, 2.53s/it] +2025-02-05 18:12:19 - ERROR - stderr - +2025-02-05 18:12:19 - ERROR - stderr - +2025-02-05 18:12:19 - INFO - stdout - {'loss': 0.9541, 'grad_norm': 1.012931227684021, 'learning_rate': 1.558567686268412e-05, 'epoch': 1.0} +2025-02-05 18:12:19 - ERROR - stderr - 33%|███▎ | 7449/22434 [8:04:39<10:32:33, 2.53s/it] +2025-02-05 18:12:21 - ERROR - stderr - 33%|███▎ | 7450/22434 [8:04:41<10:33:08, 2.54s/it] +2025-02-05 18:12:21 - ERROR - stderr - +2025-02-05 18:12:21 - ERROR - stderr - +2025-02-05 18:12:21 - INFO - stdout - {'loss': 0.9133, 'grad_norm': 0.9576447010040283, 'learning_rate': 1.5584479275924112e-05, 'epoch': 1.0} +2025-02-05 18:12:21 - ERROR - stderr - 33%|███▎ | 7450/22434 [8:04:41<10:33:08, 2.54s/it] +2025-02-05 18:12:24 - ERROR - stderr - 33%|███▎ | 7451/22434 [8:04:44<10:58:53, 2.64s/it] +2025-02-05 18:12:24 - ERROR - stderr - +2025-02-05 18:12:24 - ERROR - stderr - +2025-02-05 18:12:24 - INFO - stdout - {'loss': 0.9846, 'grad_norm': 
1.0208418369293213, 'learning_rate': 1.558328157276098e-05, 'epoch': 1.0} +2025-02-05 18:12:24 - ERROR - stderr - 33%|███▎ | 7451/22434 [8:04:44<10:58:53, 2.64s/it] +2025-02-05 18:12:27 - ERROR - stderr - 33%|███▎ | 7452/22434 [8:04:47<10:49:45, 2.60s/it] +2025-02-05 18:12:27 - ERROR - stderr - +2025-02-05 18:12:27 - ERROR - stderr - +2025-02-05 18:12:27 - INFO - stdout - {'loss': 0.991, 'grad_norm': 1.1926788091659546, 'learning_rate': 1.5582083753219682e-05, 'epoch': 1.0} +2025-02-05 18:12:27 - ERROR - stderr - 33%|███▎ | 7452/22434 [8:04:47<10:49:45, 2.60s/it] +2025-02-05 18:12:29 - ERROR - stderr - 33%|███▎ | 7453/22434 [8:04:49<10:47:43, 2.59s/it] +2025-02-05 18:12:29 - ERROR - stderr - +2025-02-05 18:12:29 - ERROR - stderr - +2025-02-05 18:12:29 - INFO - stdout - {'loss': 1.0054, 'grad_norm': 1.2596707344055176, 'learning_rate': 1.5580885817325192e-05, 'epoch': 1.0} +2025-02-05 18:12:29 - ERROR - stderr - 33%|███▎ | 7453/22434 [8:04:49<10:47:43, 2.59s/it] +2025-02-05 18:12:32 - ERROR - stderr - 33%|███▎ | 7454/22434 [8:04:52<10:47:35, 2.59s/it] +2025-02-05 18:12:32 - ERROR - stderr - +2025-02-05 18:12:32 - ERROR - stderr - +2025-02-05 18:12:32 - INFO - stdout - {'loss': 0.9511, 'grad_norm': 1.0863382816314697, 'learning_rate': 1.557968776510248e-05, 'epoch': 1.0} +2025-02-05 18:12:32 - ERROR - stderr - 33%|███▎ | 7454/22434 [8:04:52<10:47:35, 2.59s/it] +2025-02-05 18:12:34 - ERROR - stderr - 33%|███▎ | 7455/22434 [8:04:54<10:41:20, 2.57s/it] +2025-02-05 18:12:35 - ERROR - stderr - +2025-02-05 18:12:35 - ERROR - stderr - +2025-02-05 18:12:35 - INFO - stdout - {'loss': 0.9496, 'grad_norm': 1.040247917175293, 'learning_rate': 1.5578489596576513e-05, 'epoch': 1.0} +2025-02-05 18:12:35 - ERROR - stderr - 33%|███▎ | 7455/22434 [8:04:54<10:41:20, 2.57s/it] +2025-02-05 18:12:37 - ERROR - stderr - 33%|███▎ | 7456/22434 [8:04:57<10:34:09, 2.54s/it] +2025-02-05 18:12:37 - ERROR - stderr - +2025-02-05 18:12:37 - ERROR - stderr - +2025-02-05 18:12:37 - INFO - stdout - 
{'loss': 0.8919, 'grad_norm': 0.9456450343132019, 'learning_rate': 1.5577291311772268e-05, 'epoch': 1.0} +2025-02-05 18:12:37 - ERROR - stderr - 33%|███▎ | 7456/22434 [8:04:57<10:34:09, 2.54s/it] +2025-02-05 18:12:40 - ERROR - stderr - 33%|███▎ | 7457/22434 [8:04:59<10:38:03, 2.56s/it] +2025-02-05 18:12:40 - ERROR - stderr - +2025-02-05 18:12:40 - ERROR - stderr - +2025-02-05 18:12:40 - INFO - stdout - {'loss': 0.9092, 'grad_norm': 1.0574474334716797, 'learning_rate': 1.557609291071472e-05, 'epoch': 1.0} +2025-02-05 18:12:40 - ERROR - stderr - 33%|███▎ | 7457/22434 [8:04:59<10:38:03, 2.56s/it] +2025-02-05 18:12:42 - ERROR - stderr - 33%|███▎ | 7458/22434 [8:05:02<10:36:59, 2.55s/it] +2025-02-05 18:12:42 - ERROR - stderr - +2025-02-05 18:12:42 - ERROR - stderr - +2025-02-05 18:12:42 - INFO - stdout - {'loss': 0.9425, 'grad_norm': 0.9996885061264038, 'learning_rate': 1.5574894393428856e-05, 'epoch': 1.0} +2025-02-05 18:12:42 - ERROR - stderr - 33%|███▎ | 7458/22434 [8:05:02<10:36:59, 2.55s/it] +2025-02-05 18:12:45 - ERROR - stderr - 33%|███▎ | 7459/22434 [8:05:05<11:04:48, 2.66s/it] +2025-02-05 18:12:45 - ERROR - stderr - +2025-02-05 18:12:45 - ERROR - stderr - +2025-02-05 18:12:45 - INFO - stdout - {'loss': 0.8576, 'grad_norm': 0.9460115432739258, 'learning_rate': 1.557369575993965e-05, 'epoch': 1.0} +2025-02-05 18:12:45 - ERROR - stderr - 33%|███▎ | 7459/22434 [8:05:05<11:04:48, 2.66s/it] +2025-02-05 18:12:48 - ERROR - stderr - 33%|███▎ | 7460/22434 [8:05:08<11:31:43, 2.77s/it] +2025-02-05 18:12:48 - ERROR - stderr - +2025-02-05 18:12:48 - ERROR - stderr - +2025-02-05 18:12:48 - INFO - stdout - {'loss': 0.9487, 'grad_norm': 1.1186572313308716, 'learning_rate': 1.5572497010272093e-05, 'epoch': 1.0} +2025-02-05 18:12:48 - ERROR - stderr - 33%|███▎ | 7460/22434 [8:05:08<11:31:43, 2.77s/it] +2025-02-05 18:12:51 - ERROR - stderr - 33%|███▎ | 7461/22434 [8:05:10<11:15:26, 2.71s/it] +2025-02-05 18:12:51 - ERROR - stderr - +2025-02-05 18:12:51 - ERROR - stderr - 
+2025-02-05 18:12:51 - INFO - stdout - {'loss': 1.0133, 'grad_norm': 1.111188530921936, 'learning_rate': 1.5571298144451165e-05, 'epoch': 1.0} +2025-02-05 18:12:51 - ERROR - stderr - 33%|███▎ | 7461/22434 [8:05:10<11:15:26, 2.71s/it] +2025-02-05 18:12:53 - ERROR - stderr - 33%|███▎ | 7462/22434 [8:05:13<10:56:29, 2.63s/it] +2025-02-05 18:12:53 - ERROR - stderr - +2025-02-05 18:12:53 - ERROR - stderr - +2025-02-05 18:12:53 - INFO - stdout - {'loss': 0.8549, 'grad_norm': 1.029048204421997, 'learning_rate': 1.557009916250186e-05, 'epoch': 1.0} +2025-02-05 18:12:53 - ERROR - stderr - 33%|███▎ | 7462/22434 [8:05:13<10:56:29, 2.63s/it] +2025-02-05 18:12:56 - ERROR - stderr - 33%|███▎ | 7463/22434 [8:05:15<10:57:37, 2.64s/it] +2025-02-05 18:12:56 - ERROR - stderr - +2025-02-05 18:12:56 - ERROR - stderr - +2025-02-05 18:12:56 - INFO - stdout - {'loss': 0.8791, 'grad_norm': 0.9296038150787354, 'learning_rate': 1.5568900064449164e-05, 'epoch': 1.0} +2025-02-05 18:12:56 - ERROR - stderr - 33%|███▎ | 7463/22434 [8:05:15<10:57:37, 2.64s/it] +2025-02-05 18:12:58 - ERROR - stderr - 33%|███▎ | 7464/22434 [8:05:18<10:49:26, 2.60s/it] +2025-02-05 18:12:58 - ERROR - stderr - +2025-02-05 18:12:58 - ERROR - stderr - +2025-02-05 18:12:58 - INFO - stdout - {'loss': 0.9524, 'grad_norm': 1.2395427227020264, 'learning_rate': 1.556770085031808e-05, 'epoch': 1.0} +2025-02-05 18:12:58 - ERROR - stderr - 33%|███▎ | 7464/22434 [8:05:18<10:49:26, 2.60s/it] +2025-02-05 18:13:01 - ERROR - stderr - 33%|███▎ | 7465/22434 [8:05:21<10:45:55, 2.59s/it] +2025-02-05 18:13:01 - ERROR - stderr - +2025-02-05 18:13:01 - ERROR - stderr - +2025-02-05 18:13:01 - INFO - stdout - {'loss': 0.8514, 'grad_norm': 0.9644502997398376, 'learning_rate': 1.5566501520133595e-05, 'epoch': 1.0} +2025-02-05 18:13:01 - ERROR - stderr - 33%|███▎ | 7465/22434 [8:05:21<10:45:55, 2.59s/it] +2025-02-05 18:13:03 - ERROR - stderr - 33%|███▎ | 7466/22434 [8:05:23<10:42:38, 2.58s/it] +2025-02-05 18:13:03 - ERROR - stderr - +2025-02-05 
18:13:03 - ERROR - stderr - +2025-02-05 18:13:03 - INFO - stdout - {'loss': 0.8675, 'grad_norm': 1.0432003736495972, 'learning_rate': 1.5565302073920715e-05, 'epoch': 1.0} +2025-02-05 18:13:03 - ERROR - stderr - 33%|███▎ | 7466/22434 [8:05:23<10:42:38, 2.58s/it] +2025-02-05 18:13:06 - ERROR - stderr - 33%|███▎ | 7467/22434 [8:05:26<10:56:33, 2.63s/it] +2025-02-05 18:13:06 - ERROR - stderr - +2025-02-05 18:13:06 - ERROR - stderr - +2025-02-05 18:13:06 - INFO - stdout - {'loss': 0.8368, 'grad_norm': 0.9974961280822754, 'learning_rate': 1.5564102511704436e-05, 'epoch': 1.0} +2025-02-05 18:13:06 - ERROR - stderr - 33%|███▎ | 7467/22434 [8:05:26<10:56:33, 2.63s/it] +2025-02-05 18:13:09 - ERROR - stderr - 33%|███▎ | 7468/22434 [8:05:29<11:10:50, 2.69s/it] +2025-02-05 18:13:09 - ERROR - stderr - +2025-02-05 18:13:09 - ERROR - stderr - +2025-02-05 18:13:09 - INFO - stdout - {'loss': 1.0585, 'grad_norm': 1.2333546876907349, 'learning_rate': 1.5562902833509773e-05, 'epoch': 1.0} +2025-02-05 18:13:09 - ERROR - stderr - 33%|███▎ | 7468/22434 [8:05:29<11:10:50, 2.69s/it] +2025-02-05 18:13:11 - ERROR - stderr - 33%|███▎ | 7469/22434 [8:05:31<10:58:38, 2.64s/it] +2025-02-05 18:13:11 - ERROR - stderr - +2025-02-05 18:13:11 - ERROR - stderr - +2025-02-05 18:13:11 - INFO - stdout - {'loss': 0.8662, 'grad_norm': 1.0616283416748047, 'learning_rate': 1.5561703039361715e-05, 'epoch': 1.0} +2025-02-05 18:13:11 - ERROR - stderr - 33%|███▎ | 7469/22434 [8:05:31<10:58:38, 2.64s/it] +2025-02-05 18:13:14 - ERROR - stderr - 33%|███▎ | 7470/22434 [8:05:34<10:48:57, 2.60s/it] +2025-02-05 18:13:14 - ERROR - stderr - +2025-02-05 18:13:14 - ERROR - stderr - +2025-02-05 18:13:14 - INFO - stdout - {'loss': 0.9639, 'grad_norm': 1.1124180555343628, 'learning_rate': 1.556050312928528e-05, 'epoch': 1.0} +2025-02-05 18:13:14 - ERROR - stderr - 33%|███▎ | 7470/22434 [8:05:34<10:48:57, 2.60s/it] +2025-02-05 18:13:16 - ERROR - stderr - 33%|███▎ | 7471/22434 [8:05:36<10:43:20, 2.58s/it] +2025-02-05 18:13:17 - 
ERROR - stderr - +2025-02-05 18:13:17 - ERROR - stderr - +2025-02-05 18:13:17 - INFO - stdout - {'loss': 0.9877, 'grad_norm': 1.0908548831939697, 'learning_rate': 1.555930310330548e-05, 'epoch': 1.0} +2025-02-05 18:13:17 - ERROR - stderr - 33%|███▎ | 7471/22434 [8:05:36<10:43:20, 2.58s/it] +2025-02-05 18:13:19 - ERROR - stderr - 33%|███▎ | 7472/22434 [8:05:39<10:39:44, 2.57s/it] +2025-02-05 18:13:19 - ERROR - stderr - +2025-02-05 18:13:19 - ERROR - stderr - +2025-02-05 18:13:19 - INFO - stdout - {'loss': 0.9318, 'grad_norm': 0.9785193204879761, 'learning_rate': 1.5558102961447327e-05, 'epoch': 1.0} +2025-02-05 18:13:19 - ERROR - stderr - 33%|███▎ | 7472/22434 [8:05:39<10:39:44, 2.57s/it] +2025-02-05 18:13:22 - ERROR - stderr - 33%|███▎ | 7473/22434 [8:05:41<10:37:27, 2.56s/it] +2025-02-05 18:13:22 - ERROR - stderr - +2025-02-05 18:13:22 - ERROR - stderr - +2025-02-05 18:13:22 - INFO - stdout - {'loss': 0.8848, 'grad_norm': 1.0105390548706055, 'learning_rate': 1.5556902703735836e-05, 'epoch': 1.0} +2025-02-05 18:13:22 - ERROR - stderr - 33%|███▎ | 7473/22434 [8:05:41<10:37:27, 2.56s/it] +2025-02-05 18:13:24 - ERROR - stderr - 33%|███▎ | 7474/22434 [8:05:44<10:41:28, 2.57s/it] +2025-02-05 18:13:24 - ERROR - stderr - +2025-02-05 18:13:24 - ERROR - stderr - +2025-02-05 18:13:24 - INFO - stdout - {'loss': 0.9047, 'grad_norm': 1.0255123376846313, 'learning_rate': 1.5555702330196024e-05, 'epoch': 1.0} +2025-02-05 18:13:24 - ERROR - stderr - 33%|███▎ | 7474/22434 [8:05:44<10:41:28, 2.57s/it] +2025-02-05 18:13:27 - ERROR - stderr - 33%|███▎ | 7475/22434 [8:05:46<10:37:43, 2.56s/it] +2025-02-05 18:13:27 - ERROR - stderr - +2025-02-05 18:13:27 - ERROR - stderr - +2025-02-05 18:13:27 - INFO - stdout - {'loss': 1.1178, 'grad_norm': 1.240639090538025, 'learning_rate': 1.5554501840852915e-05, 'epoch': 1.0} +2025-02-05 18:13:27 - ERROR - stderr - 33%|███▎ | 7475/22434 [8:05:46<10:37:43, 2.56s/it] +2025-02-05 18:13:30 - ERROR - stderr - 33%|███▎ | 7476/22434 [8:05:49<11:06:19, 
2.67s/it] +2025-02-05 18:13:30 - ERROR - stderr - +2025-02-05 18:13:30 - ERROR - stderr - +2025-02-05 18:13:30 - INFO - stdout - {'loss': 0.9829, 'grad_norm': 1.174062967300415, 'learning_rate': 1.5553301235731527e-05, 'epoch': 1.0} +2025-02-05 18:13:30 - ERROR - stderr - 33%|███▎ | 7476/22434 [8:05:49<11:06:19, 2.67s/it] +2025-02-05 18:13:32 - ERROR - stderr - 33%|███▎ | 7477/22434 [8:05:52<11:02:20, 2.66s/it] +2025-02-05 18:13:32 - ERROR - stderr - +2025-02-05 18:13:32 - ERROR - stderr - +2025-02-05 18:13:32 - INFO - stdout - {'loss': 1.0286, 'grad_norm': 1.141680121421814, 'learning_rate': 1.5552100514856895e-05, 'epoch': 1.0} +2025-02-05 18:13:32 - ERROR - stderr - 33%|███▎ | 7477/22434 [8:05:52<11:02:20, 2.66s/it] +2025-02-05 18:13:35 - ERROR - stderr - 33%|███▎ | 7478/22434 [8:05:54<10:45:34, 2.59s/it] +2025-02-05 18:13:35 - ERROR - stderr - +2025-02-05 18:13:35 - ERROR - stderr - +2025-02-05 18:13:35 - INFO - stdout - {'loss': 0.967, 'grad_norm': 1.0114703178405762, 'learning_rate': 1.555089967825403e-05, 'epoch': 1.0} +2025-02-05 18:13:35 - ERROR - stderr - 33%|███▎ | 7478/22434 [8:05:54<10:45:34, 2.59s/it] +2025-02-05 18:13:37 - ERROR - stderr - 33%|███▎ | 7479/22434 [8:05:57<10:54:18, 2.63s/it] +2025-02-05 18:13:37 - ERROR - stderr - +2025-02-05 18:13:37 - ERROR - stderr - +2025-02-05 18:13:37 - INFO - stdout - {'loss': 0.6743, 'grad_norm': 1.0181385278701782, 'learning_rate': 1.554969872594798e-05, 'epoch': 1.0} +2025-02-05 18:13:37 - ERROR - stderr - 33%|███▎ | 7479/22434 [8:05:57<10:54:18, 2.63s/it] +2025-02-05 18:13:40 - ERROR - stderr - 33%|███▎ | 7480/22434 [8:06:00<10:44:58, 2.59s/it] +2025-02-05 18:13:40 - ERROR - stderr - +2025-02-05 18:13:40 - ERROR - stderr - +2025-02-05 18:13:40 - INFO - stdout - {'loss': 0.8714, 'grad_norm': 1.0982142686843872, 'learning_rate': 1.554849765796377e-05, 'epoch': 1.0} +2025-02-05 18:13:40 - ERROR - stderr - 33%|███▎ | 7480/22434 [8:06:00<10:44:58, 2.59s/it] +2025-02-05 18:13:42 - ERROR - stderr - 33%|███▎ | 
7481/22434 [8:06:02<10:33:58, 2.54s/it] +2025-02-05 18:13:42 - ERROR - stderr - +2025-02-05 18:13:42 - ERROR - stderr - +2025-02-05 18:13:42 - INFO - stdout - {'loss': 0.7681, 'grad_norm': 1.0393790006637573, 'learning_rate': 1.5547296474326438e-05, 'epoch': 1.0} +2025-02-05 18:13:42 - ERROR - stderr - 33%|███▎ | 7481/22434 [8:06:02<10:33:58, 2.54s/it] +2025-02-05 18:13:45 - ERROR - stderr - 33%|███▎ | 7482/22434 [8:06:05<10:39:38, 2.57s/it] +2025-02-05 18:13:45 - ERROR - stderr - +2025-02-05 18:13:45 - ERROR - stderr - +2025-02-05 18:13:45 - INFO - stdout - {'loss': 0.7087, 'grad_norm': 0.9464859366416931, 'learning_rate': 1.554609517506102e-05, 'epoch': 1.0} +2025-02-05 18:13:45 - ERROR - stderr - 33%|███▎ | 7482/22434 [8:06:05<10:39:38, 2.57s/it] +2025-02-05 18:13:47 - ERROR - stderr - 33%|███▎ | 7483/22434 [8:06:07<10:35:29, 2.55s/it] +2025-02-05 18:13:47 - ERROR - stderr - +2025-02-05 18:13:47 - ERROR - stderr - +2025-02-05 18:13:47 - INFO - stdout - {'loss': 0.768, 'grad_norm': 0.9220442175865173, 'learning_rate': 1.5544893760192546e-05, 'epoch': 1.0} +2025-02-05 18:13:47 - ERROR - stderr - 33%|███▎ | 7483/22434 [8:06:07<10:35:29, 2.55s/it] +2025-02-05 18:13:50 - ERROR - stderr - 33%|███▎ | 7484/22434 [8:06:10<10:31:28, 2.53s/it] +2025-02-05 18:13:50 - ERROR - stderr - +2025-02-05 18:13:50 - ERROR - stderr - +2025-02-05 18:13:50 - INFO - stdout - {'loss': 0.7483, 'grad_norm': 0.919678270816803, 'learning_rate': 1.5543692229746076e-05, 'epoch': 1.0} +2025-02-05 18:13:50 - ERROR - stderr - 33%|███▎ | 7484/22434 [8:06:10<10:31:28, 2.53s/it] +2025-02-05 18:13:53 - ERROR - stderr - 33%|███▎ | 7485/22434 [8:06:12<10:47:06, 2.60s/it] +2025-02-05 18:13:53 - ERROR - stderr - +2025-02-05 18:13:53 - ERROR - stderr - +2025-02-05 18:13:53 - INFO - stdout - {'loss': 0.811, 'grad_norm': 0.9600428938865662, 'learning_rate': 1.5542490583746642e-05, 'epoch': 1.0} +2025-02-05 18:13:53 - ERROR - stderr - 33%|███▎ | 7485/22434 [8:06:12<10:47:06, 2.60s/it] +2025-02-05 18:13:55 - 
ERROR - stderr - 33%|███▎ | 7486/22434 [8:06:15<10:36:49, 2.56s/it] +2025-02-05 18:13:55 - ERROR - stderr - +2025-02-05 18:13:55 - ERROR - stderr - +2025-02-05 18:13:55 - INFO - stdout - {'loss': 0.6771, 'grad_norm': 0.8796353340148926, 'learning_rate': 1.5541288822219297e-05, 'epoch': 1.0} +2025-02-05 18:13:55 - ERROR - stderr - 33%|███▎ | 7486/22434 [8:06:15<10:36:49, 2.56s/it] +2025-02-05 18:13:58 - ERROR - stderr - 33%|███▎ | 7487/22434 [8:06:17<10:35:28, 2.55s/it] +2025-02-05 18:13:58 - ERROR - stderr - +2025-02-05 18:13:58 - ERROR - stderr - +2025-02-05 18:13:58 - INFO - stdout - {'loss': 0.7446, 'grad_norm': 0.9270517230033875, 'learning_rate': 1.554008694518909e-05, 'epoch': 1.0} +2025-02-05 18:13:58 - ERROR - stderr - 33%|███▎ | 7487/22434 [8:06:17<10:35:28, 2.55s/it] +2025-02-05 18:14:00 - ERROR - stderr - 33%|███▎ | 7488/22434 [8:06:20<10:27:05, 2.52s/it] +2025-02-05 18:14:00 - ERROR - stderr - +2025-02-05 18:14:00 - ERROR - stderr - +2025-02-05 18:14:00 - INFO - stdout - {'loss': 0.7442, 'grad_norm': 1.0358089208602905, 'learning_rate': 1.5538884952681067e-05, 'epoch': 1.0} +2025-02-05 18:14:00 - ERROR - stderr - 33%|███▎ | 7488/22434 [8:06:20<10:27:05, 2.52s/it] +2025-02-05 18:14:03 - ERROR - stderr - 33%|███▎ | 7489/22434 [8:06:22<10:26:47, 2.52s/it] +2025-02-05 18:14:03 - ERROR - stderr - +2025-02-05 18:14:03 - ERROR - stderr - +2025-02-05 18:14:03 - INFO - stdout - {'loss': 0.7974, 'grad_norm': 1.012755274772644, 'learning_rate': 1.5537682844720296e-05, 'epoch': 1.0} +2025-02-05 18:14:03 - ERROR - stderr - 33%|███▎ | 7489/22434 [8:06:22<10:26:47, 2.52s/it] +2025-02-05 18:14:05 - ERROR - stderr - 33%|███▎ | 7490/22434 [8:06:25<10:32:08, 2.54s/it] +2025-02-05 18:14:05 - ERROR - stderr - +2025-02-05 18:14:05 - ERROR - stderr - +2025-02-05 18:14:05 - INFO - stdout - {'loss': 0.7532, 'grad_norm': 1.0328119993209839, 'learning_rate': 1.5536480621331818e-05, 'epoch': 1.0} +2025-02-05 18:14:05 - ERROR - stderr - 33%|███▎ | 7490/22434 [8:06:25<10:32:08, 
2.54s/it] +2025-02-05 18:14:08 - ERROR - stderr - 33%|███▎ | 7491/22434 [8:06:28<10:40:56, 2.57s/it] +2025-02-05 18:14:08 - ERROR - stderr - +2025-02-05 18:14:08 - ERROR - stderr - +2025-02-05 18:14:08 - INFO - stdout - {'loss': 0.7145, 'grad_norm': 0.9924728870391846, 'learning_rate': 1.55352782825407e-05, 'epoch': 1.0} +2025-02-05 18:14:08 - ERROR - stderr - 33%|███▎ | 7491/22434 [8:06:28<10:40:56, 2.57s/it] +2025-02-05 18:14:10 - ERROR - stderr - 33%|███▎ | 7492/22434 [8:06:30<10:33:19, 2.54s/it] +2025-02-05 18:14:10 - ERROR - stderr - +2025-02-05 18:14:10 - ERROR - stderr - +2025-02-05 18:14:10 - INFO - stdout - {'loss': 0.7319, 'grad_norm': 1.0950642824172974, 'learning_rate': 1.5534075828372004e-05, 'epoch': 1.0} +2025-02-05 18:14:10 - ERROR - stderr - 33%|███▎ | 7492/22434 [8:06:30<10:33:19, 2.54s/it] +2025-02-05 18:14:13 - ERROR - stderr - 33%|███▎ | 7493/22434 [8:06:33<10:25:41, 2.51s/it] +2025-02-05 18:14:13 - ERROR - stderr - +2025-02-05 18:14:13 - ERROR - stderr - +2025-02-05 18:14:13 - INFO - stdout - {'loss': 0.7377, 'grad_norm': 1.0122344493865967, 'learning_rate': 1.5532873258850796e-05, 'epoch': 1.0} +2025-02-05 18:14:13 - ERROR - stderr - 33%|███▎ | 7493/22434 [8:06:33<10:25:41, 2.51s/it] +2025-02-05 18:14:15 - ERROR - stderr - 33%|███▎ | 7494/22434 [8:06:35<10:40:29, 2.57s/it] +2025-02-05 18:14:16 - ERROR - stderr - +2025-02-05 18:14:16 - ERROR - stderr - +2025-02-05 18:14:16 - INFO - stdout - {'loss': 0.7842, 'grad_norm': 1.0589524507522583, 'learning_rate': 1.5531670574002136e-05, 'epoch': 1.0} +2025-02-05 18:14:16 - ERROR - stderr - 33%|███▎ | 7494/22434 [8:06:35<10:40:29, 2.57s/it] +2025-02-05 18:14:18 - ERROR - stderr - 33%|███▎ | 7495/22434 [8:06:38<10:30:48, 2.53s/it] +2025-02-05 18:14:18 - ERROR - stderr - +2025-02-05 18:14:18 - ERROR - stderr - +2025-02-05 18:14:18 - INFO - stdout - {'loss': 0.706, 'grad_norm': 1.0826622247695923, 'learning_rate': 1.5530467773851096e-05, 'epoch': 1.0} +2025-02-05 18:14:18 - ERROR - stderr - 33%|███▎ | 
7495/22434 [8:06:38<10:30:48, 2.53s/it] +2025-02-05 18:14:20 - ERROR - stderr - 33%|███▎ | 7496/22434 [8:06:40<10:31:15, 2.54s/it] +2025-02-05 18:14:21 - ERROR - stderr - +2025-02-05 18:14:21 - ERROR - stderr - +2025-02-05 18:14:21 - INFO - stdout - {'loss': 0.7697, 'grad_norm': 1.1649103164672852, 'learning_rate': 1.5529264858422747e-05, 'epoch': 1.0} +2025-02-05 18:14:21 - ERROR - stderr - 33%|███▎ | 7496/22434 [8:06:40<10:31:15, 2.54s/it] +2025-02-05 18:14:23 - ERROR - stderr - 33%|███▎ | 7497/22434 [8:06:43<10:28:38, 2.53s/it] +2025-02-05 18:14:23 - ERROR - stderr - +2025-02-05 18:14:23 - ERROR - stderr - +2025-02-05 18:14:23 - INFO - stdout - {'loss': 0.797, 'grad_norm': 1.091210126876831, 'learning_rate': 1.5528061827742166e-05, 'epoch': 1.0} +2025-02-05 18:14:23 - ERROR - stderr - 33%|███▎ | 7497/22434 [8:06:43<10:28:38, 2.53s/it] +2025-02-05 18:14:26 - ERROR - stderr - 33%|███▎ | 7498/22434 [8:06:45<10:40:30, 2.57s/it] +2025-02-05 18:14:26 - ERROR - stderr - +2025-02-05 18:14:26 - ERROR - stderr - +2025-02-05 18:14:26 - INFO - stdout - {'loss': 0.6992, 'grad_norm': 1.0944057703018188, 'learning_rate': 1.552685868183442e-05, 'epoch': 1.0} +2025-02-05 18:14:26 - ERROR - stderr - 33%|███▎ | 7498/22434 [8:06:45<10:40:30, 2.57s/it] +2025-02-05 18:14:28 - ERROR - stderr - 33%|███▎ | 7499/22434 [8:06:48<10:35:11, 2.55s/it] +2025-02-05 18:14:28 - ERROR - stderr - +2025-02-05 18:14:28 - ERROR - stderr - +2025-02-05 18:14:28 - INFO - stdout - {'loss': 0.78, 'grad_norm': 1.148113489151001, 'learning_rate': 1.55256554207246e-05, 'epoch': 1.0} +2025-02-05 18:14:28 - ERROR - stderr - 33%|███▎ | 7499/22434 [8:06:48<10:35:11, 2.55s/it] +2025-02-05 18:14:31 - ERROR - stderr - 33%|███▎ | 7500/22434 [8:06:50<10:33:44, 2.55s/it] +2025-02-05 18:14:31 - ERROR - stderr - +2025-02-05 18:14:31 - ERROR - stderr - +2025-02-05 18:14:31 - INFO - stdout - {'loss': 0.6956, 'grad_norm': 0.9884711503982544, 'learning_rate': 1.5524452044437777e-05, 'epoch': 1.0} +2025-02-05 18:14:31 - ERROR 
- stderr - 33%|███▎ | 7500/22434 [8:06:51<10:33:44, 2.55s/it] +2025-02-05 18:14:33 - ERROR - stderr - 33%|███▎ | 7501/22434 [8:06:53<10:40:23, 2.57s/it] +2025-02-05 18:14:33 - ERROR - stderr - +2025-02-05 18:14:33 - ERROR - stderr - +2025-02-05 18:14:33 - INFO - stdout - {'loss': 0.6796, 'grad_norm': 1.2058829069137573, 'learning_rate': 1.5523248552999038e-05, 'epoch': 1.0} +2025-02-05 18:14:33 - ERROR - stderr - 33%|███▎ | 7501/22434 [8:06:53<10:40:23, 2.57s/it] +2025-02-05 18:14:36 - ERROR - stderr - 33%|███▎ | 7502/22434 [8:06:56<10:48:02, 2.60s/it] +2025-02-05 18:14:36 - ERROR - stderr - +2025-02-05 18:14:36 - ERROR - stderr - +2025-02-05 18:14:36 - INFO - stdout - {'loss': 0.6804, 'grad_norm': 0.9653275012969971, 'learning_rate': 1.5522044946433468e-05, 'epoch': 1.0} +2025-02-05 18:14:36 - ERROR - stderr - 33%|███▎ | 7502/22434 [8:06:56<10:48:02, 2.60s/it] +2025-02-05 18:14:39 - ERROR - stderr - 33%|███▎ | 7503/22434 [8:06:58<10:47:53, 2.60s/it] +2025-02-05 18:14:39 - ERROR - stderr - +2025-02-05 18:14:39 - ERROR - stderr - +2025-02-05 18:14:39 - INFO - stdout - {'loss': 0.759, 'grad_norm': 1.1390634775161743, 'learning_rate': 1.5520841224766153e-05, 'epoch': 1.0} +2025-02-05 18:14:39 - ERROR - stderr - 33%|███▎ | 7503/22434 [8:06:58<10:47:53, 2.60s/it] +2025-02-05 18:14:41 - ERROR - stderr - 33%|███▎ | 7504/22434 [8:07:01<10:43:12, 2.58s/it] +2025-02-05 18:14:41 - ERROR - stderr - +2025-02-05 18:14:41 - ERROR - stderr - +2025-02-05 18:14:41 - INFO - stdout - {'loss': 0.8251, 'grad_norm': 1.2696011066436768, 'learning_rate': 1.551963738802219e-05, 'epoch': 1.0} +2025-02-05 18:14:41 - ERROR - stderr - 33%|███▎ | 7504/22434 [8:07:01<10:43:12, 2.58s/it] +2025-02-05 18:14:44 - ERROR - stderr - 33%|███▎ | 7505/22434 [8:07:03<10:39:57, 2.57s/it] +2025-02-05 18:14:44 - ERROR - stderr - +2025-02-05 18:14:44 - ERROR - stderr - +2025-02-05 18:14:44 - INFO - stdout - {'loss': 0.7601, 'grad_norm': 1.0664371252059937, 'learning_rate': 1.5518433436226664e-05, 'epoch': 1.0} 
+2025-02-05 18:14:44 - ERROR - stderr - 33%|███▎ | 7505/22434 [8:07:04<10:39:57, 2.57s/it] +2025-02-05 18:14:46 - ERROR - stderr - 33%|███▎ | 7506/22434 [8:07:06<10:37:40, 2.56s/it] +2025-02-05 18:14:46 - ERROR - stderr - +2025-02-05 18:14:46 - ERROR - stderr - +2025-02-05 18:14:46 - INFO - stdout - {'loss': 0.7216, 'grad_norm': 1.079526424407959, 'learning_rate': 1.5517229369404675e-05, 'epoch': 1.0} +2025-02-05 18:14:46 - ERROR - stderr - 33%|███▎ | 7506/22434 [8:07:06<10:37:40, 2.56s/it] +2025-02-05 18:14:49 - ERROR - stderr - 33%|███▎ | 7507/22434 [8:07:08<10:32:10, 2.54s/it] +2025-02-05 18:14:49 - ERROR - stderr - +2025-02-05 18:14:49 - ERROR - stderr - +2025-02-05 18:14:49 - INFO - stdout - {'loss': 0.8131, 'grad_norm': 1.3189359903335571, 'learning_rate': 1.5516025187581318e-05, 'epoch': 1.0} +2025-02-05 18:14:49 - ERROR - stderr - 33%|███▎ | 7507/22434 [8:07:09<10:32:10, 2.54s/it] +2025-02-05 18:14:51 - ERROR - stderr - 33%|███▎ | 7508/22434 [8:07:11<10:28:54, 2.53s/it] +2025-02-05 18:14:51 - ERROR - stderr - +2025-02-05 18:14:51 - ERROR - stderr - +2025-02-05 18:14:51 - INFO - stdout - {'loss': 0.6483, 'grad_norm': 1.0641002655029297, 'learning_rate': 1.5514820890781695e-05, 'epoch': 1.0} +2025-02-05 18:14:51 - ERROR - stderr - 33%|███▎ | 7508/22434 [8:07:11<10:28:54, 2.53s/it] +2025-02-05 18:14:54 - ERROR - stderr - 33%|███▎ | 7509/22434 [8:07:13<10:25:33, 2.51s/it] +2025-02-05 18:14:54 - ERROR - stderr - +2025-02-05 18:14:54 - ERROR - stderr - +2025-02-05 18:14:54 - INFO - stdout - {'loss': 0.6727, 'grad_norm': 1.0159856081008911, 'learning_rate': 1.551361647903091e-05, 'epoch': 1.0} +2025-02-05 18:14:54 - ERROR - stderr - 33%|███▎ | 7509/22434 [8:07:14<10:25:33, 2.51s/it] +2025-02-05 18:14:56 - ERROR - stderr - 33%|███▎ | 7510/22434 [8:07:16<10:23:16, 2.51s/it] +2025-02-05 18:14:56 - ERROR - stderr - +2025-02-05 18:14:56 - ERROR - stderr - +2025-02-05 18:14:56 - INFO - stdout - {'loss': 0.748, 'grad_norm': 1.1783243417739868, 'learning_rate': 
1.551241195235406e-05, 'epoch': 1.0} +2025-02-05 18:14:56 - ERROR - stderr - 33%|███▎ | 7510/22434 [8:07:16<10:23:16, 2.51s/it] +2025-02-05 18:14:59 - ERROR - stderr - 33%|███▎ | 7511/22434 [8:07:18<10:19:18, 2.49s/it] +2025-02-05 18:14:59 - ERROR - stderr - +2025-02-05 18:14:59 - ERROR - stderr - +2025-02-05 18:14:59 - INFO - stdout - {'loss': 0.79, 'grad_norm': 1.3051390647888184, 'learning_rate': 1.551120731077626e-05, 'epoch': 1.0} +2025-02-05 18:14:59 - ERROR - stderr - 33%|███▎ | 7511/22434 [8:07:18<10:19:18, 2.49s/it] +2025-02-05 18:15:01 - ERROR - stderr - 33%|███▎ | 7512/22434 [8:07:21<10:18:34, 2.49s/it] +2025-02-05 18:15:01 - ERROR - stderr - +2025-02-05 18:15:01 - ERROR - stderr - +2025-02-05 18:15:01 - INFO - stdout - {'loss': 0.7885, 'grad_norm': 1.114362359046936, 'learning_rate': 1.5510002554322617e-05, 'epoch': 1.0} +2025-02-05 18:15:01 - ERROR - stderr - 33%|███▎ | 7512/22434 [8:07:21<10:18:34, 2.49s/it] +2025-02-05 18:15:04 - ERROR - stderr - 33%|███▎ | 7513/22434 [8:07:24<10:29:42, 2.53s/it] +2025-02-05 18:15:04 - ERROR - stderr - +2025-02-05 18:15:04 - ERROR - stderr - +2025-02-05 18:15:04 - INFO - stdout - {'loss': 0.8691, 'grad_norm': 1.0880696773529053, 'learning_rate': 1.550879768301825e-05, 'epoch': 1.0} +2025-02-05 18:15:04 - ERROR - stderr - 33%|███▎ | 7513/22434 [8:07:24<10:29:42, 2.53s/it] +2025-02-05 18:15:06 - ERROR - stderr - 33%|███▎ | 7514/22434 [8:07:26<10:33:21, 2.55s/it] +2025-02-05 18:15:06 - ERROR - stderr - +2025-02-05 18:15:06 - ERROR - stderr - +2025-02-05 18:15:06 - INFO - stdout - {'loss': 0.7344, 'grad_norm': 1.113642930984497, 'learning_rate': 1.5507592696888258e-05, 'epoch': 1.0} +2025-02-05 18:15:06 - ERROR - stderr - 33%|███▎ | 7514/22434 [8:07:26<10:33:21, 2.55s/it] +2025-02-05 18:15:09 - ERROR - stderr - 33%|███▎ | 7515/22434 [8:07:29<10:26:32, 2.52s/it] +2025-02-05 18:15:09 - ERROR - stderr - +2025-02-05 18:15:09 - ERROR - stderr - +2025-02-05 18:15:09 - INFO - stdout - {'loss': 0.788, 'grad_norm': 
1.1645268201828003, 'learning_rate': 1.550638759595777e-05, 'epoch': 1.0} +2025-02-05 18:15:09 - ERROR - stderr - 33%|███▎ | 7515/22434 [8:07:29<10:26:32, 2.52s/it] +2025-02-05 18:15:09 - WARNING - transformers.tokenization_utils_base - Token indices sequence length is longer than the specified maximum sequence length for this model (2783 > 2048). Running this sequence through the model will result in indexing errors +2025-02-05 18:15:09 - WARNING - transformers.tokenization_utils_base - Token indices sequence length is longer than the specified maximum sequence length for this model (2783 > 2048). Running this sequence through the model will result in indexing errors +2025-02-05 18:15:11 - ERROR - stderr - 34%|███▎ | 7516/22434 [8:07:31<10:23:01, 2.51s/it] +2025-02-05 18:15:11 - ERROR - stderr - +2025-02-05 18:15:11 - ERROR - stderr - +2025-02-05 18:15:11 - INFO - stdout - {'loss': 0.7616, 'grad_norm': 1.0820420980453491, 'learning_rate': 1.55051823802519e-05, 'epoch': 1.01} +2025-02-05 18:15:11 - ERROR - stderr - 34%|███▎ | 7516/22434 [8:07:31<10:23:01, 2.51s/it] +2025-02-05 18:15:17 - ERROR - stderr - 34%|███▎ | 7517/22434 [8:07:37<14:21:09, 3.46s/it] +2025-02-05 18:15:17 - ERROR - stderr - +2025-02-05 18:15:17 - ERROR - stderr - +2025-02-05 18:15:17 - INFO - stdout - {'loss': 0.774, 'grad_norm': 1.120851755142212, 'learning_rate': 1.5503977049795772e-05, 'epoch': 1.01} +2025-02-05 18:15:17 - ERROR - stderr - 34%|███▎ | 7517/22434 [8:07:37<14:21:09, 3.46s/it] +2025-02-05 18:15:19 - ERROR - stderr - 34%|███▎ | 7518/22434 [8:07:39<13:08:15, 3.17s/it] +2025-02-05 18:15:20 - ERROR - stderr - +2025-02-05 18:15:20 - ERROR - stderr - +2025-02-05 18:15:20 - INFO - stdout - {'loss': 0.7642, 'grad_norm': 1.2607370615005493, 'learning_rate': 1.550277160461451e-05, 'epoch': 1.01} +2025-02-05 18:15:20 - ERROR - stderr - 34%|███▎ | 7518/22434 [8:07:39<13:08:15, 3.17s/it] +2025-02-05 18:15:22 - ERROR - stderr - 34%|███▎ | 7519/22434 [8:07:42<12:19:26, 2.97s/it] +2025-02-05 
18:15:22 - ERROR - stderr - +2025-02-05 18:15:22 - ERROR - stderr - +2025-02-05 18:15:22 - INFO - stdout - {'loss': 0.8203, 'grad_norm': 1.1562846899032593, 'learning_rate': 1.5501566044733237e-05, 'epoch': 1.01} +2025-02-05 18:15:22 - ERROR - stderr - 34%|███▎ | 7519/22434 [8:07:42<12:19:26, 2.97s/it] +2025-02-05 18:15:25 - ERROR - stderr - 34%|███▎ | 7520/22434 [8:07:44<11:47:53, 2.85s/it] +2025-02-05 18:15:25 - ERROR - stderr - +2025-02-05 18:15:25 - ERROR - stderr - +2025-02-05 18:15:25 - INFO - stdout - {'loss': 0.762, 'grad_norm': 1.116916537284851, 'learning_rate': 1.5500360370177087e-05, 'epoch': 1.01} +2025-02-05 18:15:25 - ERROR - stderr - 34%|███▎ | 7520/22434 [8:07:44<11:47:53, 2.85s/it] +2025-02-05 18:15:27 - ERROR - stderr - 34%|███▎ | 7521/22434 [8:07:47<11:22:29, 2.75s/it] +2025-02-05 18:15:27 - ERROR - stderr - +2025-02-05 18:15:27 - ERROR - stderr - +2025-02-05 18:15:27 - INFO - stdout - {'loss': 0.6953, 'grad_norm': 1.0174331665039062, 'learning_rate': 1.549915458097119e-05, 'epoch': 1.01} +2025-02-05 18:15:27 - ERROR - stderr - 34%|███▎ | 7521/22434 [8:07:47<11:22:29, 2.75s/it] +2025-02-05 18:15:30 - ERROR - stderr - 34%|███▎ | 7522/22434 [8:07:49<11:05:58, 2.68s/it] +2025-02-05 18:15:30 - ERROR - stderr - +2025-02-05 18:15:30 - ERROR - stderr - +2025-02-05 18:15:30 - INFO - stdout - {'loss': 0.667, 'grad_norm': 1.155356764793396, 'learning_rate': 1.5497948677140673e-05, 'epoch': 1.01} +2025-02-05 18:15:30 - ERROR - stderr - 34%|███▎ | 7522/22434 [8:07:49<11:05:58, 2.68s/it] +2025-02-05 18:15:32 - ERROR - stderr - 34%|███▎ | 7523/22434 [8:07:52<10:46:44, 2.60s/it] +2025-02-05 18:15:32 - ERROR - stderr - +2025-02-05 18:15:32 - ERROR - stderr - +2025-02-05 18:15:32 - INFO - stdout - {'loss': 0.823, 'grad_norm': 1.143683910369873, 'learning_rate': 1.549674265871068e-05, 'epoch': 1.01} +2025-02-05 18:15:32 - ERROR - stderr - 34%|███▎ | 7523/22434 [8:07:52<10:46:44, 2.60s/it] +2025-02-05 18:15:35 - ERROR - stderr - 34%|███▎ | 7524/22434 
[8:07:54<10:51:50, 2.62s/it] +2025-02-05 18:15:35 - ERROR - stderr - +2025-02-05 18:15:35 - ERROR - stderr - +2025-02-05 18:15:35 - INFO - stdout - {'loss': 0.7503, 'grad_norm': 1.0673437118530273, 'learning_rate': 1.5495536525706346e-05, 'epoch': 1.01} +2025-02-05 18:15:35 - ERROR - stderr - 34%|███▎ | 7524/22434 [8:07:54<10:51:50, 2.62s/it] +2025-02-05 18:15:37 - ERROR - stderr - 34%|███▎ | 7525/22434 [8:07:57<10:42:22, 2.59s/it] +2025-02-05 18:15:37 - ERROR - stderr - +2025-02-05 18:15:37 - ERROR - stderr - +2025-02-05 18:15:37 - INFO - stdout - {'loss': 0.7651, 'grad_norm': 1.1236730813980103, 'learning_rate': 1.549433027815281e-05, 'epoch': 1.01} +2025-02-05 18:15:37 - ERROR - stderr - 34%|███▎ | 7525/22434 [8:07:57<10:42:22, 2.59s/it] +2025-02-05 18:15:40 - ERROR - stderr - 34%|███▎ | 7526/22434 [8:07:59<10:37:14, 2.56s/it] +2025-02-05 18:15:40 - ERROR - stderr - +2025-02-05 18:15:40 - ERROR - stderr - +2025-02-05 18:15:40 - INFO - stdout - {'loss': 0.7805, 'grad_norm': 1.1302212476730347, 'learning_rate': 1.5493123916075218e-05, 'epoch': 1.01} +2025-02-05 18:15:40 - ERROR - stderr - 34%|███▎ | 7526/22434 [8:07:59<10:37:14, 2.56s/it] +2025-02-05 18:15:42 - ERROR - stderr - 34%|███▎ | 7527/22434 [8:08:02<10:34:49, 2.56s/it] +2025-02-05 18:15:42 - ERROR - stderr - +2025-02-05 18:15:42 - ERROR - stderr - +2025-02-05 18:15:42 - INFO - stdout - {'loss': 0.7663, 'grad_norm': 1.105208158493042, 'learning_rate': 1.5491917439498714e-05, 'epoch': 1.01} +2025-02-05 18:15:42 - ERROR - stderr - 34%|███▎ | 7527/22434 [8:08:02<10:34:49, 2.56s/it] +2025-02-05 18:15:45 - ERROR - stderr - 34%|███▎ | 7528/22434 [8:08:04<10:28:28, 2.53s/it] +2025-02-05 18:15:45 - ERROR - stderr - +2025-02-05 18:15:45 - ERROR - stderr - +2025-02-05 18:15:45 - INFO - stdout - {'loss': 0.7009, 'grad_norm': 0.9848840832710266, 'learning_rate': 1.5490710848448446e-05, 'epoch': 1.01} +2025-02-05 18:15:45 - ERROR - stderr - 34%|███▎ | 7528/22434 [8:08:04<10:28:28, 2.53s/it] +2025-02-05 18:15:47 - ERROR 
- stderr - 34%|███▎ | 7529/22434 [8:08:07<10:24:05, 2.51s/it] +2025-02-05 18:15:47 - ERROR - stderr - +2025-02-05 18:15:47 - ERROR - stderr - +2025-02-05 18:15:47 - INFO - stdout - {'loss': 0.6751, 'grad_norm': 1.0820873975753784, 'learning_rate': 1.548950414294957e-05, 'epoch': 1.01} +2025-02-05 18:15:47 - ERROR - stderr - 34%|███▎ | 7529/22434 [8:08:07<10:24:05, 2.51s/it] +2025-02-05 18:15:50 - ERROR - stderr - 34%|███▎ | 7530/22434 [8:08:09<10:21:25, 2.50s/it] +2025-02-05 18:15:50 - ERROR - stderr - +2025-02-05 18:15:50 - ERROR - stderr - +2025-02-05 18:15:50 - INFO - stdout - {'loss': 0.7154, 'grad_norm': 1.0202966928482056, 'learning_rate': 1.5488297323027223e-05, 'epoch': 1.01} +2025-02-05 18:15:50 - ERROR - stderr - 34%|███▎ | 7530/22434 [8:08:09<10:21:25, 2.50s/it] +2025-02-05 18:15:52 - ERROR - stderr - 34%|███▎ | 7531/22434 [8:08:12<10:26:59, 2.52s/it] +2025-02-05 18:15:52 - ERROR - stderr - +2025-02-05 18:15:52 - ERROR - stderr - +2025-02-05 18:15:52 - INFO - stdout - {'loss': 0.7451, 'grad_norm': 1.034687876701355, 'learning_rate': 1.5487090388706573e-05, 'epoch': 1.01} +2025-02-05 18:15:52 - ERROR - stderr - 34%|███▎ | 7531/22434 [8:08:12<10:26:59, 2.52s/it] +2025-02-05 18:15:55 - ERROR - stderr - 34%|███▎ | 7532/22434 [8:08:14<10:24:26, 2.51s/it] +2025-02-05 18:15:55 - ERROR - stderr - +2025-02-05 18:15:55 - ERROR - stderr - +2025-02-05 18:15:55 - INFO - stdout - {'loss': 0.8128, 'grad_norm': 1.3755745887756348, 'learning_rate': 1.5485883340012778e-05, 'epoch': 1.01} +2025-02-05 18:15:55 - ERROR - stderr - 34%|███▎ | 7532/22434 [8:08:15<10:24:26, 2.51s/it] +2025-02-05 18:15:57 - ERROR - stderr - 34%|███▎ | 7533/22434 [8:08:17<10:24:23, 2.51s/it] +2025-02-05 18:15:57 - ERROR - stderr - +2025-02-05 18:15:57 - ERROR - stderr - +2025-02-05 18:15:57 - INFO - stdout - {'loss': 0.6603, 'grad_norm': 1.0626695156097412, 'learning_rate': 1.5484676176970996e-05, 'epoch': 1.01} +2025-02-05 18:15:57 - ERROR - stderr - 34%|███▎ | 7533/22434 [8:08:17<10:24:23, 
2.51s/it] +2025-02-05 18:16:00 - ERROR - stderr - 34%|███▎ | 7534/22434 [8:08:19<10:22:40, 2.51s/it] +2025-02-05 18:16:00 - ERROR - stderr - +2025-02-05 18:16:00 - ERROR - stderr - +2025-02-05 18:16:00 - INFO - stdout - {'loss': 0.6209, 'grad_norm': 1.0270764827728271, 'learning_rate': 1.548346889960638e-05, 'epoch': 1.01} +2025-02-05 18:16:00 - ERROR - stderr - 34%|███▎ | 7534/22434 [8:08:20<10:22:40, 2.51s/it] +2025-02-05 18:16:02 - ERROR - stderr - 34%|███▎ | 7535/22434 [8:08:22<10:33:30, 2.55s/it] +2025-02-05 18:16:02 - ERROR - stderr - +2025-02-05 18:16:02 - ERROR - stderr - +2025-02-05 18:16:02 - INFO - stdout - {'loss': 0.8197, 'grad_norm': 1.161803960800171, 'learning_rate': 1.5482261507944106e-05, 'epoch': 1.01} +2025-02-05 18:16:02 - ERROR - stderr - 34%|███▎ | 7535/22434 [8:08:22<10:33:30, 2.55s/it] +2025-02-05 18:16:05 - ERROR - stderr - 34%|███▎ | 7536/22434 [8:08:25<10:27:53, 2.53s/it] +2025-02-05 18:16:05 - ERROR - stderr - +2025-02-05 18:16:05 - ERROR - stderr - +2025-02-05 18:16:05 - INFO - stdout - {'loss': 0.6343, 'grad_norm': 0.9051011204719543, 'learning_rate': 1.5481054002009336e-05, 'epoch': 1.01} +2025-02-05 18:16:05 - ERROR - stderr - 34%|███▎ | 7536/22434 [8:08:25<10:27:53, 2.53s/it] +2025-02-05 18:16:07 - ERROR - stderr - 34%|███▎ | 7537/22434 [8:08:27<10:22:29, 2.51s/it] +2025-02-05 18:16:07 - ERROR - stderr - +2025-02-05 18:16:07 - ERROR - stderr - +2025-02-05 18:16:07 - INFO - stdout - {'loss': 0.8111, 'grad_norm': 1.278534173965454, 'learning_rate': 1.5479846381827243e-05, 'epoch': 1.01} +2025-02-05 18:16:07 - ERROR - stderr - 34%|███▎ | 7537/22434 [8:08:27<10:22:29, 2.51s/it] +2025-02-05 18:16:10 - ERROR - stderr - 34%|███▎ | 7538/22434 [8:08:30<10:44:17, 2.60s/it] +2025-02-05 18:16:10 - ERROR - stderr - +2025-02-05 18:16:10 - ERROR - stderr - +2025-02-05 18:16:10 - INFO - stdout - {'loss': 0.7163, 'grad_norm': 1.13411283493042, 'learning_rate': 1.547863864742299e-05, 'epoch': 1.01} +2025-02-05 18:16:10 - ERROR - stderr - 34%|███▎ | 
7538/22434 [8:08:30<10:44:17, 2.60s/it] +2025-02-05 18:16:13 - ERROR - stderr - 34%|███▎ | 7539/22434 [8:08:32<10:36:50, 2.57s/it] +2025-02-05 18:16:13 - ERROR - stderr - +2025-02-05 18:16:13 - ERROR - stderr - +2025-02-05 18:16:13 - INFO - stdout - {'loss': 0.8011, 'grad_norm': 1.230907678604126, 'learning_rate': 1.547743079882176e-05, 'epoch': 1.01} +2025-02-05 18:16:13 - ERROR - stderr - 34%|███▎ | 7539/22434 [8:08:32<10:36:50, 2.57s/it] +2025-02-05 18:16:15 - ERROR - stderr - 34%|███▎ | 7540/22434 [8:08:35<10:29:56, 2.54s/it] +2025-02-05 18:16:15 - ERROR - stderr - +2025-02-05 18:16:15 - ERROR - stderr - +2025-02-05 18:16:15 - INFO - stdout - {'loss': 0.7482, 'grad_norm': 1.104331612586975, 'learning_rate': 1.5476222836048725e-05, 'epoch': 1.01} +2025-02-05 18:16:15 - ERROR - stderr - 34%|███▎ | 7540/22434 [8:08:35<10:29:56, 2.54s/it] +2025-02-05 18:16:18 - ERROR - stderr - 34%|███▎ | 7541/22434 [8:08:37<10:34:58, 2.56s/it] +2025-02-05 18:16:18 - ERROR - stderr - +2025-02-05 18:16:18 - ERROR - stderr - +2025-02-05 18:16:18 - INFO - stdout - {'loss': 0.7563, 'grad_norm': 1.0554298162460327, 'learning_rate': 1.547501475912907e-05, 'epoch': 1.01} +2025-02-05 18:16:18 - ERROR - stderr - 34%|███▎ | 7541/22434 [8:08:37<10:34:58, 2.56s/it] +2025-02-05 18:16:20 - ERROR - stderr - 34%|███▎ | 7542/22434 [8:08:40<10:36:28, 2.56s/it] +2025-02-05 18:16:20 - ERROR - stderr - +2025-02-05 18:16:20 - ERROR - stderr - +2025-02-05 18:16:20 - INFO - stdout - {'loss': 0.7814, 'grad_norm': 1.1173374652862549, 'learning_rate': 1.547380656808797e-05, 'epoch': 1.01} +2025-02-05 18:16:20 - ERROR - stderr - 34%|███▎ | 7542/22434 [8:08:40<10:36:28, 2.56s/it] +2025-02-05 18:16:23 - ERROR - stderr - 34%|███▎ | 7543/22434 [8:08:43<10:31:04, 2.54s/it] +2025-02-05 18:16:23 - ERROR - stderr - +2025-02-05 18:16:23 - ERROR - stderr - +2025-02-05 18:16:23 - INFO - stdout - {'loss': 0.7358, 'grad_norm': 0.9938890337944031, 'learning_rate': 1.5472598262950604e-05, 'epoch': 1.01} +2025-02-05 18:16:23 
- ERROR - stderr - 34%|███▎ | 7543/22434 [8:08:43<10:31:04, 2.54s/it] +2025-02-05 18:16:26 - ERROR - stderr - 34%|███▎ | 7544/22434 [8:08:45<10:48:10, 2.61s/it] +2025-02-05 18:16:26 - ERROR - stderr - +2025-02-05 18:16:26 - ERROR - stderr - +2025-02-05 18:16:26 - INFO - stdout - {'loss': 0.828, 'grad_norm': 1.2468020915985107, 'learning_rate': 1.547138984374217e-05, 'epoch': 1.01} +2025-02-05 18:16:26 - ERROR - stderr - 34%|███▎ | 7544/22434 [8:08:45<10:48:10, 2.61s/it] +2025-02-05 18:16:28 - ERROR - stderr - 34%|███▎ | 7545/22434 [8:08:48<10:49:44, 2.62s/it] +2025-02-05 18:16:28 - ERROR - stderr - +2025-02-05 18:16:28 - ERROR - stderr - +2025-02-05 18:16:28 - INFO - stdout - {'loss': 0.7864, 'grad_norm': 1.267331838607788, 'learning_rate': 1.547018131048785e-05, 'epoch': 1.01} +2025-02-05 18:16:28 - ERROR - stderr - 34%|███▎ | 7545/22434 [8:08:48<10:49:44, 2.62s/it] +2025-02-05 18:16:31 - ERROR - stderr - 34%|███▎ | 7546/22434 [8:08:50<10:43:43, 2.59s/it] +2025-02-05 18:16:31 - ERROR - stderr - +2025-02-05 18:16:31 - ERROR - stderr - +2025-02-05 18:16:31 - INFO - stdout - {'loss': 0.7876, 'grad_norm': 1.1960279941558838, 'learning_rate': 1.5468972663212832e-05, 'epoch': 1.01} +2025-02-05 18:16:31 - ERROR - stderr - 34%|███▎ | 7546/22434 [8:08:50<10:43:43, 2.59s/it] +2025-02-05 18:16:33 - ERROR - stderr - 34%|███▎ | 7547/22434 [8:08:53<10:36:26, 2.57s/it] +2025-02-05 18:16:33 - ERROR - stderr - +2025-02-05 18:16:33 - ERROR - stderr - +2025-02-05 18:16:33 - INFO - stdout - {'loss': 0.7413, 'grad_norm': 1.0529667139053345, 'learning_rate': 1.5467763901942312e-05, 'epoch': 1.01} +2025-02-05 18:16:33 - ERROR - stderr - 34%|███▎ | 7547/22434 [8:08:53<10:36:26, 2.57s/it] +2025-02-05 18:16:36 - ERROR - stderr - 34%|███▎ | 7548/22434 [8:08:55<10:33:11, 2.55s/it] +2025-02-05 18:16:36 - ERROR - stderr - +2025-02-05 18:16:36 - ERROR - stderr - +2025-02-05 18:16:36 - INFO - stdout - {'loss': 0.6867, 'grad_norm': 0.991306483745575, 'learning_rate': 1.5466555026701486e-05, 
'epoch': 1.01} +2025-02-05 18:16:36 - ERROR - stderr - 34%|███▎ | 7548/22434 [8:08:56<10:33:11, 2.55s/it] +2025-02-05 18:16:38 - ERROR - stderr - 34%|███▎ | 7549/22434 [8:08:58<10:33:28, 2.55s/it] +2025-02-05 18:16:38 - ERROR - stderr - +2025-02-05 18:16:38 - ERROR - stderr - +2025-02-05 18:16:38 - INFO - stdout - {'loss': 0.768, 'grad_norm': 1.0380163192749023, 'learning_rate': 1.5465346037515555e-05, 'epoch': 1.01} +2025-02-05 18:16:38 - ERROR - stderr - 34%|███▎ | 7549/22434 [8:08:58<10:33:28, 2.55s/it] +2025-02-05 18:16:41 - ERROR - stderr - 34%|███▎ | 7550/22434 [8:09:01<10:36:47, 2.57s/it] +2025-02-05 18:16:41 - ERROR - stderr - +2025-02-05 18:16:41 - ERROR - stderr - +2025-02-05 18:16:41 - INFO - stdout - {'loss': 0.8219, 'grad_norm': 1.1175163984298706, 'learning_rate': 1.546413693440971e-05, 'epoch': 1.01} +2025-02-05 18:16:41 - ERROR - stderr - 34%|███▎ | 7550/22434 [8:09:01<10:36:47, 2.57s/it] +2025-02-05 18:16:43 - ERROR - stderr - 34%|███▎ | 7551/22434 [8:09:03<10:30:01, 2.54s/it] +2025-02-05 18:16:43 - ERROR - stderr - +2025-02-05 18:16:43 - ERROR - stderr - +2025-02-05 18:16:43 - INFO - stdout - {'loss': 0.688, 'grad_norm': 1.0123742818832397, 'learning_rate': 1.5462927717409165e-05, 'epoch': 1.01} +2025-02-05 18:16:43 - ERROR - stderr - 34%|███▎ | 7551/22434 [8:09:03<10:30:01, 2.54s/it] +2025-02-05 18:16:46 - ERROR - stderr - 34%|███▎ | 7552/22434 [8:09:06<10:29:07, 2.54s/it] +2025-02-05 18:16:46 - ERROR - stderr - +2025-02-05 18:16:46 - ERROR - stderr - +2025-02-05 18:16:46 - INFO - stdout - {'loss': 0.6891, 'grad_norm': 1.1075096130371094, 'learning_rate': 1.5461718386539115e-05, 'epoch': 1.01} +2025-02-05 18:16:46 - ERROR - stderr - 34%|███▎ | 7552/22434 [8:09:06<10:29:07, 2.54s/it] +2025-02-05 18:16:49 - ERROR - stderr - 34%|███▎ | 7553/22434 [8:09:08<10:38:41, 2.58s/it] +2025-02-05 18:16:49 - ERROR - stderr - +2025-02-05 18:16:49 - ERROR - stderr - +2025-02-05 18:16:49 - INFO - stdout - {'loss': 0.7164, 'grad_norm': 1.1292492151260376, 
'learning_rate': 1.546050894182477e-05, 'epoch': 1.01} +2025-02-05 18:16:49 - ERROR - stderr - 34%|███▎ | 7553/22434 [8:09:08<10:38:41, 2.58s/it] +2025-02-05 18:16:51 - ERROR - stderr - 34%|███▎ | 7554/22434 [8:09:11<10:34:55, 2.56s/it] +2025-02-05 18:16:51 - ERROR - stderr - +2025-02-05 18:16:51 - ERROR - stderr - +2025-02-05 18:16:51 - INFO - stdout - {'loss': 0.7824, 'grad_norm': 1.0311020612716675, 'learning_rate': 1.5459299383291347e-05, 'epoch': 1.01} +2025-02-05 18:16:51 - ERROR - stderr - 34%|███▎ | 7554/22434 [8:09:11<10:34:55, 2.56s/it] +2025-02-05 18:16:54 - ERROR - stderr - 34%|███▎ | 7555/22434 [8:09:13<10:34:02, 2.56s/it] +2025-02-05 18:16:54 - ERROR - stderr - +2025-02-05 18:16:54 - ERROR - stderr - +2025-02-05 18:16:54 - INFO - stdout - {'loss': 0.6792, 'grad_norm': 1.192772626876831, 'learning_rate': 1.5458089710964047e-05, 'epoch': 1.01} +2025-02-05 18:16:54 - ERROR - stderr - 34%|███▎ | 7555/22434 [8:09:13<10:34:02, 2.56s/it] +2025-02-05 18:16:56 - ERROR - stderr - 34%|███▎ | 7556/22434 [8:09:16<10:55:40, 2.64s/it] +2025-02-05 18:16:56 - ERROR - stderr - +2025-02-05 18:16:56 - ERROR - stderr - +2025-02-05 18:16:56 - INFO - stdout - {'loss': 0.7361, 'grad_norm': 1.165932536125183, 'learning_rate': 1.5456879924868093e-05, 'epoch': 1.01} +2025-02-05 18:16:56 - ERROR - stderr - 34%|███▎ | 7556/22434 [8:09:16<10:55:40, 2.64s/it] +2025-02-05 18:16:59 - ERROR - stderr - 34%|███▎ | 7557/22434 [8:09:19<11:15:33, 2.72s/it] +2025-02-05 18:16:59 - ERROR - stderr - +2025-02-05 18:16:59 - ERROR - stderr - +2025-02-05 18:16:59 - INFO - stdout - {'loss': 0.596, 'grad_norm': 0.9414857029914856, 'learning_rate': 1.54556700250287e-05, 'epoch': 1.01} +2025-02-05 18:16:59 - ERROR - stderr - 34%|███▎ | 7557/22434 [8:09:19<11:15:33, 2.72s/it] +2025-02-05 18:17:02 - ERROR - stderr - 34%|███▎ | 7558/22434 [8:09:22<10:57:20, 2.65s/it] +2025-02-05 18:17:02 - ERROR - stderr - +2025-02-05 18:17:02 - ERROR - stderr - +2025-02-05 18:17:02 - INFO - stdout - {'loss': 0.6773, 
'grad_norm': 1.0853465795516968, 'learning_rate': 1.5454460011471082e-05, 'epoch': 1.01} +2025-02-05 18:17:02 - ERROR - stderr - 34%|███▎ | 7558/22434 [8:09:22<10:57:20, 2.65s/it] +2025-02-05 18:17:04 - ERROR - stderr - 34%|███▎ | 7559/22434 [8:09:24<10:48:25, 2.62s/it] +2025-02-05 18:17:04 - ERROR - stderr - +2025-02-05 18:17:04 - ERROR - stderr - +2025-02-05 18:17:04 - INFO - stdout - {'loss': 0.715, 'grad_norm': 1.103758454322815, 'learning_rate': 1.5453249884220466e-05, 'epoch': 1.01} +2025-02-05 18:17:04 - ERROR - stderr - 34%|███▎ | 7559/22434 [8:09:24<10:48:25, 2.62s/it] +2025-02-05 18:17:07 - ERROR - stderr - 34%|███▎ | 7560/22434 [8:09:27<10:50:29, 2.62s/it] +2025-02-05 18:17:07 - ERROR - stderr - +2025-02-05 18:17:07 - ERROR - stderr - +2025-02-05 18:17:07 - INFO - stdout - {'loss': 0.8082, 'grad_norm': 1.095453143119812, 'learning_rate': 1.5452039643302073e-05, 'epoch': 1.01} +2025-02-05 18:17:07 - ERROR - stderr - 34%|███▎ | 7560/22434 [8:09:27<10:50:29, 2.62s/it] +2025-02-05 18:17:09 - ERROR - stderr - 34%|███▎ | 7561/22434 [8:09:29<10:36:39, 2.57s/it] +2025-02-05 18:17:09 - ERROR - stderr - +2025-02-05 18:17:10 - ERROR - stderr - +2025-02-05 18:17:10 - INFO - stdout - {'loss': 0.7326, 'grad_norm': 1.1548391580581665, 'learning_rate': 1.545082928874113e-05, 'epoch': 1.01} +2025-02-05 18:17:10 - ERROR - stderr - 34%|███▎ | 7561/22434 [8:09:29<10:36:39, 2.57s/it] +2025-02-05 18:17:12 - ERROR - stderr - 34%|███▎ | 7562/22434 [8:09:32<10:36:53, 2.57s/it] +2025-02-05 18:17:12 - ERROR - stderr - +2025-02-05 18:17:12 - ERROR - stderr - +2025-02-05 18:17:12 - INFO - stdout - {'loss': 0.7373, 'grad_norm': 1.137392282485962, 'learning_rate': 1.5449618820562874e-05, 'epoch': 1.01} +2025-02-05 18:17:12 - ERROR - stderr - 34%|███▎ | 7562/22434 [8:09:32<10:36:53, 2.57s/it] +2025-02-05 18:17:15 - ERROR - stderr - 34%|███▎ | 7563/22434 [8:09:34<10:30:14, 2.54s/it] +2025-02-05 18:17:15 - ERROR - stderr - +2025-02-05 18:17:15 - ERROR - stderr - +2025-02-05 18:17:15 - 
INFO - stdout - {'loss': 0.715, 'grad_norm': 1.1104589700698853, 'learning_rate': 1.544840823879252e-05, 'epoch': 1.01} +2025-02-05 18:17:15 - ERROR - stderr - 34%|███▎ | 7563/22434 [8:09:34<10:30:14, 2.54s/it] +2025-02-05 18:17:17 - ERROR - stderr - 34%|███▎ | 7564/22434 [8:09:37<10:30:01, 2.54s/it] +2025-02-05 18:17:17 - ERROR - stderr - +2025-02-05 18:17:17 - ERROR - stderr - +2025-02-05 18:17:17 - INFO - stdout - {'loss': 0.8533, 'grad_norm': 1.2356964349746704, 'learning_rate': 1.544719754345531e-05, 'epoch': 1.01} +2025-02-05 18:17:17 - ERROR - stderr - 34%|███▎ | 7564/22434 [8:09:37<10:30:01, 2.54s/it] +2025-02-05 18:17:20 - ERROR - stderr - 34%|███▎ | 7565/22434 [8:09:39<10:28:52, 2.54s/it] +2025-02-05 18:17:20 - ERROR - stderr - +2025-02-05 18:17:20 - ERROR - stderr - +2025-02-05 18:17:20 - INFO - stdout - {'loss': 0.7558, 'grad_norm': 1.0686579942703247, 'learning_rate': 1.5445986734576485e-05, 'epoch': 1.01} +2025-02-05 18:17:20 - ERROR - stderr - 34%|███▎ | 7565/22434 [8:09:39<10:28:52, 2.54s/it] +2025-02-05 18:17:22 - ERROR - stderr - 34%|███▎ | 7566/22434 [8:09:42<10:34:27, 2.56s/it] +2025-02-05 18:17:22 - ERROR - stderr - +2025-02-05 18:17:22 - ERROR - stderr - +2025-02-05 18:17:22 - INFO - stdout - {'loss': 0.7313, 'grad_norm': 1.0532584190368652, 'learning_rate': 1.5444775812181275e-05, 'epoch': 1.01} +2025-02-05 18:17:22 - ERROR - stderr - 34%|███▎ | 7566/22434 [8:09:42<10:34:27, 2.56s/it] +2025-02-05 18:17:25 - ERROR - stderr - 34%|███▎ | 7567/22434 [8:09:44<10:27:48, 2.53s/it] +2025-02-05 18:17:25 - ERROR - stderr - +2025-02-05 18:17:25 - ERROR - stderr - +2025-02-05 18:17:25 - INFO - stdout - {'loss': 0.7965, 'grad_norm': 1.0541143417358398, 'learning_rate': 1.5443564776294922e-05, 'epoch': 1.01} +2025-02-05 18:17:25 - ERROR - stderr - 34%|███▎ | 7567/22434 [8:09:44<10:27:48, 2.53s/it] +2025-02-05 18:17:27 - ERROR - stderr - 34%|███▎ | 7568/22434 [8:09:47<10:33:14, 2.56s/it] +2025-02-05 18:17:27 - ERROR - stderr - +2025-02-05 18:17:27 - ERROR - 
stderr - +2025-02-05 18:17:27 - INFO - stdout - {'loss': 0.8002, 'grad_norm': 1.0703556537628174, 'learning_rate': 1.5442353626942672e-05, 'epoch': 1.01} +2025-02-05 18:17:27 - ERROR - stderr - 34%|███▎ | 7568/22434 [8:09:47<10:33:14, 2.56s/it] +2025-02-05 18:17:30 - ERROR - stderr - 34%|███▎ | 7569/22434 [8:09:50<10:27:09, 2.53s/it] +2025-02-05 18:17:30 - ERROR - stderr - +2025-02-05 18:17:30 - ERROR - stderr - +2025-02-05 18:17:30 - INFO - stdout - {'loss': 0.7316, 'grad_norm': 1.0438132286071777, 'learning_rate': 1.544114236414977e-05, 'epoch': 1.01} +2025-02-05 18:17:30 - ERROR - stderr - 34%|███▎ | 7569/22434 [8:09:50<10:27:09, 2.53s/it] +2025-02-05 18:17:32 - ERROR - stderr - 34%|███▎ | 7570/22434 [8:09:52<10:25:24, 2.52s/it] +2025-02-05 18:17:32 - ERROR - stderr - +2025-02-05 18:17:32 - ERROR - stderr - +2025-02-05 18:17:32 - INFO - stdout - {'loss': 0.7714, 'grad_norm': 1.0834113359451294, 'learning_rate': 1.543993098794146e-05, 'epoch': 1.01} +2025-02-05 18:17:32 - ERROR - stderr - 34%|███▎ | 7570/22434 [8:09:52<10:25:24, 2.52s/it] +2025-02-05 18:17:35 - ERROR - stderr - 34%|███▎ | 7571/22434 [8:09:55<10:28:33, 2.54s/it] +2025-02-05 18:17:35 - ERROR - stderr - +2025-02-05 18:17:35 - ERROR - stderr - +2025-02-05 18:17:35 - INFO - stdout - {'loss': 0.7676, 'grad_norm': 1.1129871606826782, 'learning_rate': 1.5438719498342992e-05, 'epoch': 1.01} +2025-02-05 18:17:35 - ERROR - stderr - 34%|███▎ | 7571/22434 [8:09:55<10:28:33, 2.54s/it] +2025-02-05 18:17:37 - ERROR - stderr - 34%|███▍ | 7572/22434 [8:09:57<10:28:58, 2.54s/it] +2025-02-05 18:17:37 - ERROR - stderr - +2025-02-05 18:17:37 - ERROR - stderr - +2025-02-05 18:17:37 - INFO - stdout - {'loss': 0.7749, 'grad_norm': 1.1400810480117798, 'learning_rate': 1.5437507895379624e-05, 'epoch': 1.01} +2025-02-05 18:17:37 - ERROR - stderr - 34%|███▍ | 7572/22434 [8:09:57<10:28:58, 2.54s/it] +2025-02-05 18:17:40 - ERROR - stderr - 34%|███▍ | 7573/22434 [8:10:00<10:24:28, 2.52s/it] +2025-02-05 18:17:40 - ERROR - stderr 
- +2025-02-05 18:17:40 - ERROR - stderr - +2025-02-05 18:17:40 - INFO - stdout - {'loss': 0.7155, 'grad_norm': 1.0076345205307007, 'learning_rate': 1.5436296179076605e-05, 'epoch': 1.01} +2025-02-05 18:17:40 - ERROR - stderr - 34%|███▍ | 7573/22434 [8:10:00<10:24:28, 2.52s/it] +2025-02-05 18:17:42 - ERROR - stderr - 34%|███▍ | 7574/22434 [8:10:02<10:19:59, 2.50s/it] +2025-02-05 18:17:42 - ERROR - stderr - +2025-02-05 18:17:42 - ERROR - stderr - +2025-02-05 18:17:42 - INFO - stdout - {'loss': 0.752, 'grad_norm': 1.1146738529205322, 'learning_rate': 1.5435084349459194e-05, 'epoch': 1.01} +2025-02-05 18:17:42 - ERROR - stderr - 34%|███▍ | 7574/22434 [8:10:02<10:19:59, 2.50s/it] +2025-02-05 18:17:45 - ERROR - stderr - 34%|███▍ | 7575/22434 [8:10:05<10:21:22, 2.51s/it] +2025-02-05 18:17:45 - ERROR - stderr - +2025-02-05 18:17:45 - ERROR - stderr - +2025-02-05 18:17:45 - INFO - stdout - {'loss': 0.7397, 'grad_norm': 1.04277503490448, 'learning_rate': 1.543387240655265e-05, 'epoch': 1.01} +2025-02-05 18:17:45 - ERROR - stderr - 34%|███▍ | 7575/22434 [8:10:05<10:21:22, 2.51s/it] +2025-02-05 18:17:47 - ERROR - stderr - 34%|███▍ | 7576/22434 [8:10:07<10:29:50, 2.54s/it] +2025-02-05 18:17:47 - ERROR - stderr - +2025-02-05 18:17:47 - ERROR - stderr - +2025-02-05 18:17:47 - INFO - stdout - {'loss': 0.7451, 'grad_norm': 1.0737124681472778, 'learning_rate': 1.5432660350382235e-05, 'epoch': 1.01} +2025-02-05 18:17:47 - ERROR - stderr - 34%|███▍ | 7576/22434 [8:10:07<10:29:50, 2.54s/it] +2025-02-05 18:17:50 - ERROR - stderr - 34%|███▍ | 7577/22434 [8:10:10<10:31:10, 2.55s/it] +2025-02-05 18:17:50 - ERROR - stderr - +2025-02-05 18:17:50 - ERROR - stderr - +2025-02-05 18:17:50 - INFO - stdout - {'loss': 0.7692, 'grad_norm': 1.223185658454895, 'learning_rate': 1.5431448180973218e-05, 'epoch': 1.01} +2025-02-05 18:17:50 - ERROR - stderr - 34%|███▍ | 7577/22434 [8:10:10<10:31:10, 2.55s/it] +2025-02-05 18:17:52 - ERROR - stderr - 34%|███▍ | 7578/22434 [8:10:12<10:25:28, 2.53s/it] 
+2025-02-05 18:17:53 - ERROR - stderr - +2025-02-05 18:17:53 - ERROR - stderr - +2025-02-05 18:17:53 - INFO - stdout - {'loss': 0.7072, 'grad_norm': 1.0702157020568848, 'learning_rate': 1.5430235898350858e-05, 'epoch': 1.01} +2025-02-05 18:17:53 - ERROR - stderr - 34%|███▍ | 7578/22434 [8:10:12<10:25:28, 2.53s/it] +2025-02-05 18:17:55 - ERROR - stderr - 34%|███▍ | 7579/22434 [8:10:15<10:22:08, 2.51s/it] +2025-02-05 18:17:55 - ERROR - stderr - +2025-02-05 18:17:55 - ERROR - stderr - +2025-02-05 18:17:55 - INFO - stdout - {'loss': 0.7642, 'grad_norm': 1.0938825607299805, 'learning_rate': 1.5429023502540426e-05, 'epoch': 1.01} +2025-02-05 18:17:55 - ERROR - stderr - 34%|███▍ | 7579/22434 [8:10:15<10:22:08, 2.51s/it] +2025-02-05 18:17:57 - ERROR - stderr - 34%|███▍ | 7580/22434 [8:10:17<10:21:17, 2.51s/it] +2025-02-05 18:17:58 - ERROR - stderr - +2025-02-05 18:17:58 - ERROR - stderr - +2025-02-05 18:17:58 - INFO - stdout - {'loss': 0.7874, 'grad_norm': 1.2363417148590088, 'learning_rate': 1.5427810993567193e-05, 'epoch': 1.01} +2025-02-05 18:17:58 - ERROR - stderr - 34%|███▍ | 7580/22434 [8:10:17<10:21:17, 2.51s/it] +2025-02-05 18:18:00 - ERROR - stderr - 34%|███▍ | 7581/22434 [8:10:20<10:20:14, 2.51s/it] +2025-02-05 18:18:00 - ERROR - stderr - +2025-02-05 18:18:00 - ERROR - stderr - +2025-02-05 18:18:00 - INFO - stdout - {'loss': 0.8263, 'grad_norm': 1.1881234645843506, 'learning_rate': 1.5426598371456436e-05, 'epoch': 1.01} +2025-02-05 18:18:00 - ERROR - stderr - 34%|███▍ | 7581/22434 [8:10:20<10:20:14, 2.51s/it] +2025-02-05 18:18:02 - ERROR - stderr - 34%|███▍ | 7582/22434 [8:10:22<10:21:26, 2.51s/it] +2025-02-05 18:18:03 - ERROR - stderr - +2025-02-05 18:18:03 - ERROR - stderr - +2025-02-05 18:18:03 - INFO - stdout - {'loss': 0.6876, 'grad_norm': 0.9601733088493347, 'learning_rate': 1.542538563623343e-05, 'epoch': 1.01} +2025-02-05 18:18:03 - ERROR - stderr - 34%|███▍ | 7582/22434 [8:10:22<10:21:26, 2.51s/it] +2025-02-05 18:18:05 - ERROR - stderr - 34%|███▍ | 
7583/22434 [8:10:25<10:17:06, 2.49s/it] +2025-02-05 18:18:05 - ERROR - stderr - +2025-02-05 18:18:05 - ERROR - stderr - +2025-02-05 18:18:05 - INFO - stdout - {'loss': 0.7318, 'grad_norm': 1.1035581827163696, 'learning_rate': 1.5424172787923448e-05, 'epoch': 1.01} +2025-02-05 18:18:05 - ERROR - stderr - 34%|███▍ | 7583/22434 [8:10:25<10:17:06, 2.49s/it] +2025-02-05 18:18:07 - ERROR - stderr - 34%|███▍ | 7584/22434 [8:10:27<10:12:45, 2.48s/it] +2025-02-05 18:18:07 - ERROR - stderr - +2025-02-05 18:18:07 - ERROR - stderr - +2025-02-05 18:18:07 - INFO - stdout - {'loss': 0.6184, 'grad_norm': 1.0428141355514526, 'learning_rate': 1.5422959826551778e-05, 'epoch': 1.01} +2025-02-05 18:18:07 - ERROR - stderr - 34%|███▍ | 7584/22434 [8:10:27<10:12:45, 2.48s/it] +2025-02-05 18:18:10 - ERROR - stderr - 34%|███▍ | 7585/22434 [8:10:30<10:14:28, 2.48s/it] +2025-02-05 18:18:10 - ERROR - stderr - +2025-02-05 18:18:10 - ERROR - stderr - +2025-02-05 18:18:10 - INFO - stdout - {'loss': 0.8123, 'grad_norm': 1.1401928663253784, 'learning_rate': 1.5421746752143696e-05, 'epoch': 1.01} +2025-02-05 18:18:10 - ERROR - stderr - 34%|███▍ | 7585/22434 [8:10:30<10:14:28, 2.48s/it] +2025-02-05 18:18:13 - ERROR - stderr - 34%|███▍ | 7586/22434 [8:10:33<10:47:26, 2.62s/it] +2025-02-05 18:18:13 - ERROR - stderr - +2025-02-05 18:18:13 - ERROR - stderr - +2025-02-05 18:18:13 - INFO - stdout - {'loss': 0.749, 'grad_norm': 1.137660264968872, 'learning_rate': 1.5420533564724495e-05, 'epoch': 1.01} +2025-02-05 18:18:13 - ERROR - stderr - 34%|███▍ | 7586/22434 [8:10:33<10:47:26, 2.62s/it] +2025-02-05 18:18:15 - ERROR - stderr - 34%|███▍ | 7587/22434 [8:10:35<10:41:34, 2.59s/it] +2025-02-05 18:18:15 - ERROR - stderr - +2025-02-05 18:18:15 - ERROR - stderr - +2025-02-05 18:18:15 - INFO - stdout - {'loss': 0.7213, 'grad_norm': 1.0200623273849487, 'learning_rate': 1.5419320264319458e-05, 'epoch': 1.01} +2025-02-05 18:18:15 - ERROR - stderr - 34%|███▍ | 7587/22434 [8:10:35<10:41:34, 2.59s/it] +2025-02-05 
18:18:18 - ERROR - stderr - 34%|███▍ | 7588/22434 [8:10:38<10:34:18, 2.56s/it] +2025-02-05 18:18:18 - ERROR - stderr - +2025-02-05 18:18:18 - ERROR - stderr - +2025-02-05 18:18:18 - INFO - stdout - {'loss': 0.6638, 'grad_norm': 1.086226224899292, 'learning_rate': 1.5418106850953877e-05, 'epoch': 1.01} +2025-02-05 18:18:18 - ERROR - stderr - 34%|███▍ | 7588/22434 [8:10:38<10:34:18, 2.56s/it] +2025-02-05 18:18:20 - ERROR - stderr - 34%|███▍ | 7589/22434 [8:10:40<10:35:29, 2.57s/it] +2025-02-05 18:18:20 - ERROR - stderr - +2025-02-05 18:18:20 - ERROR - stderr - +2025-02-05 18:18:20 - INFO - stdout - {'loss': 0.7647, 'grad_norm': 1.1296967267990112, 'learning_rate': 1.5416893324653037e-05, 'epoch': 1.01} +2025-02-05 18:18:20 - ERROR - stderr - 34%|███▍ | 7589/22434 [8:10:40<10:35:29, 2.57s/it] +2025-02-05 18:18:23 - ERROR - stderr - 34%|███▍ | 7590/22434 [8:10:43<10:30:08, 2.55s/it] +2025-02-05 18:18:23 - ERROR - stderr - +2025-02-05 18:18:23 - ERROR - stderr - +2025-02-05 18:18:23 - INFO - stdout - {'loss': 0.7331, 'grad_norm': 1.1705378293991089, 'learning_rate': 1.5415679685442247e-05, 'epoch': 1.01} +2025-02-05 18:18:23 - ERROR - stderr - 34%|███▍ | 7590/22434 [8:10:43<10:30:08, 2.55s/it] +2025-02-05 18:18:25 - ERROR - stderr - 34%|███▍ | 7591/22434 [8:10:45<10:22:29, 2.52s/it] +2025-02-05 18:18:25 - ERROR - stderr - +2025-02-05 18:18:25 - ERROR - stderr - +2025-02-05 18:18:25 - INFO - stdout - {'loss': 0.7277, 'grad_norm': 1.1013715267181396, 'learning_rate': 1.541446593334679e-05, 'epoch': 1.02} +2025-02-05 18:18:25 - ERROR - stderr - 34%|███▍ | 7591/22434 [8:10:45<10:22:29, 2.52s/it] +2025-02-05 18:18:28 - ERROR - stderr - 34%|███▍ | 7592/22434 [8:10:48<10:41:53, 2.59s/it] +2025-02-05 18:18:28 - ERROR - stderr - +2025-02-05 18:18:28 - ERROR - stderr - +2025-02-05 18:18:28 - INFO - stdout - {'loss': 0.7372, 'grad_norm': 1.140773892402649, 'learning_rate': 1.5413252068391973e-05, 'epoch': 1.02} +2025-02-05 18:18:28 - ERROR - stderr - 34%|███▍ | 7592/22434 
[8:10:48<10:41:53, 2.59s/it] +2025-02-05 18:18:31 - ERROR - stderr - 34%|███▍ | 7593/22434 [8:10:50<10:38:13, 2.58s/it] +2025-02-05 18:18:31 - ERROR - stderr - +2025-02-05 18:18:31 - ERROR - stderr - +2025-02-05 18:18:31 - INFO - stdout - {'loss': 0.631, 'grad_norm': 1.0598537921905518, 'learning_rate': 1.5412038090603098e-05, 'epoch': 1.02} +2025-02-05 18:18:31 - ERROR - stderr - 34%|███▍ | 7593/22434 [8:10:50<10:38:13, 2.58s/it] +2025-02-05 18:18:33 - ERROR - stderr - 34%|███▍ | 7594/22434 [8:10:53<10:36:27, 2.57s/it] +2025-02-05 18:18:33 - ERROR - stderr - +2025-02-05 18:18:33 - ERROR - stderr - +2025-02-05 18:18:33 - INFO - stdout - {'loss': 0.7277, 'grad_norm': 1.075382947921753, 'learning_rate': 1.541082400000547e-05, 'epoch': 1.02} +2025-02-05 18:18:33 - ERROR - stderr - 34%|███▍ | 7594/22434 [8:10:53<10:36:27, 2.57s/it] +2025-02-05 18:18:36 - ERROR - stderr - 34%|███▍ | 7595/22434 [8:10:56<10:34:34, 2.57s/it] +2025-02-05 18:18:36 - ERROR - stderr - +2025-02-05 18:18:36 - ERROR - stderr - +2025-02-05 18:18:36 - INFO - stdout - {'loss': 0.6812, 'grad_norm': 1.1742466688156128, 'learning_rate': 1.5409609796624387e-05, 'epoch': 1.02} +2025-02-05 18:18:36 - ERROR - stderr - 34%|███▍ | 7595/22434 [8:10:56<10:34:34, 2.57s/it] +2025-02-05 18:18:38 - ERROR - stderr - 34%|███▍ | 7596/22434 [8:10:58<10:33:20, 2.56s/it] +2025-02-05 18:18:38 - ERROR - stderr - +2025-02-05 18:18:38 - ERROR - stderr - +2025-02-05 18:18:38 - INFO - stdout - {'loss': 0.7777, 'grad_norm': 1.1625468730926514, 'learning_rate': 1.540839548048517e-05, 'epoch': 1.02} +2025-02-05 18:18:38 - ERROR - stderr - 34%|███▍ | 7596/22434 [8:10:58<10:33:20, 2.56s/it] +2025-02-05 18:18:41 - ERROR - stderr - 34%|███▍ | 7597/22434 [8:11:01<10:31:54, 2.56s/it] +2025-02-05 18:18:41 - ERROR - stderr - +2025-02-05 18:18:41 - ERROR - stderr - +2025-02-05 18:18:41 - INFO - stdout - {'loss': 0.7278, 'grad_norm': 1.2257791757583618, 'learning_rate': 1.540718105161312e-05, 'epoch': 1.02} +2025-02-05 18:18:41 - ERROR - 
stderr - 34%|███▍ | 7597/22434 [8:11:01<10:31:54, 2.56s/it] +2025-02-05 18:18:44 - ERROR - stderr - 34%|███▍ | 7598/22434 [8:11:03<10:48:23, 2.62s/it] +2025-02-05 18:18:44 - ERROR - stderr - +2025-02-05 18:18:44 - ERROR - stderr - +2025-02-05 18:18:44 - INFO - stdout - {'loss': 0.8606, 'grad_norm': 1.2902549505233765, 'learning_rate': 1.540596651003356e-05, 'epoch': 1.02} +2025-02-05 18:18:44 - ERROR - stderr - 34%|███▍ | 7598/22434 [8:11:03<10:48:23, 2.62s/it] +2025-02-05 18:18:46 - ERROR - stderr - 34%|███▍ | 7599/22434 [8:11:06<10:35:22, 2.57s/it] +2025-02-05 18:18:46 - ERROR - stderr - +2025-02-05 18:18:46 - ERROR - stderr - +2025-02-05 18:18:46 - INFO - stdout - {'loss': 0.7524, 'grad_norm': 1.1143171787261963, 'learning_rate': 1.5404751855771798e-05, 'epoch': 1.02} +2025-02-05 18:18:46 - ERROR - stderr - 34%|███▍ | 7599/22434 [8:11:06<10:35:22, 2.57s/it] +2025-02-05 18:18:49 - ERROR - stderr - 34%|███▍ | 7600/22434 [8:11:08<10:27:16, 2.54s/it] +2025-02-05 18:18:49 - ERROR - stderr - +2025-02-05 18:18:49 - ERROR - stderr - +2025-02-05 18:18:49 - INFO - stdout - {'loss': 0.7269, 'grad_norm': 1.1104305982589722, 'learning_rate': 1.5403537088853157e-05, 'epoch': 1.02} +2025-02-05 18:18:49 - ERROR - stderr - 34%|███▍ | 7600/22434 [8:11:08<10:27:16, 2.54s/it] +2025-02-05 18:18:51 - ERROR - stderr - 34%|███▍ | 7601/22434 [8:11:11<10:24:39, 2.53s/it] +2025-02-05 18:18:51 - ERROR - stderr - +2025-02-05 18:18:51 - ERROR - stderr - +2025-02-05 18:18:51 - INFO - stdout - {'loss': 0.8392, 'grad_norm': 1.1554456949234009, 'learning_rate': 1.5402322209302953e-05, 'epoch': 1.02} +2025-02-05 18:18:51 - ERROR - stderr - 34%|███▍ | 7601/22434 [8:11:11<10:24:39, 2.53s/it] +2025-02-05 18:18:54 - ERROR - stderr - 34%|███▍ | 7602/22434 [8:11:13<10:31:38, 2.56s/it] +2025-02-05 18:18:54 - ERROR - stderr - +2025-02-05 18:18:54 - ERROR - stderr - +2025-02-05 18:18:54 - INFO - stdout - {'loss': 0.6679, 'grad_norm': 1.096725344657898, 'learning_rate': 1.5401107217146515e-05, 'epoch': 
1.02} +2025-02-05 18:18:54 - ERROR - stderr - 34%|███▍ | 7602/22434 [8:11:14<10:31:38, 2.56s/it] +2025-02-05 18:18:56 - ERROR - stderr - 34%|███▍ | 7603/22434 [8:11:16<10:28:21, 2.54s/it] +2025-02-05 18:18:56 - ERROR - stderr - +2025-02-05 18:18:56 - ERROR - stderr - +2025-02-05 18:18:56 - INFO - stdout - {'loss': 0.6948, 'grad_norm': 1.0478761196136475, 'learning_rate': 1.5399892112409163e-05, 'epoch': 1.02} +2025-02-05 18:18:56 - ERROR - stderr - 34%|███▍ | 7603/22434 [8:11:16<10:28:21, 2.54s/it] +2025-02-05 18:18:59 - ERROR - stderr - 34%|███▍ | 7604/22434 [8:11:19<10:28:57, 2.54s/it] +2025-02-05 18:18:59 - ERROR - stderr - +2025-02-05 18:18:59 - ERROR - stderr - +2025-02-05 18:18:59 - INFO - stdout - {'loss': 0.6586, 'grad_norm': 1.0522668361663818, 'learning_rate': 1.539867689511623e-05, 'epoch': 1.02} +2025-02-05 18:18:59 - ERROR - stderr - 34%|███▍ | 7604/22434 [8:11:19<10:28:57, 2.54s/it] +2025-02-05 18:19:01 - ERROR - stderr - 34%|███▍ | 7605/22434 [8:11:21<10:29:17, 2.55s/it] +2025-02-05 18:19:01 - ERROR - stderr - +2025-02-05 18:19:01 - ERROR - stderr - +2025-02-05 18:19:01 - INFO - stdout - {'loss': 0.7066, 'grad_norm': 1.0767230987548828, 'learning_rate': 1.5397461565293038e-05, 'epoch': 1.02} +2025-02-05 18:19:01 - ERROR - stderr - 34%|███▍ | 7605/22434 [8:11:21<10:29:17, 2.55s/it] +2025-02-05 18:19:04 - ERROR - stderr - 34%|███▍ | 7606/22434 [8:11:24<10:26:20, 2.53s/it] +2025-02-05 18:19:04 - ERROR - stderr - +2025-02-05 18:19:04 - ERROR - stderr - +2025-02-05 18:19:04 - INFO - stdout - {'loss': 0.7615, 'grad_norm': 1.323459506034851, 'learning_rate': 1.539624612296493e-05, 'epoch': 1.02} +2025-02-05 18:19:04 - ERROR - stderr - 34%|███▍ | 7606/22434 [8:11:24<10:26:20, 2.53s/it] +2025-02-05 18:19:06 - ERROR - stderr - 34%|███▍ | 7607/22434 [8:11:26<10:24:48, 2.53s/it] +2025-02-05 18:19:06 - ERROR - stderr - +2025-02-05 18:19:06 - ERROR - stderr - +2025-02-05 18:19:06 - INFO - stdout - {'loss': 0.7037, 'grad_norm': 1.0911426544189453, 'learning_rate': 
1.5395030568157232e-05, 'epoch': 1.02} +2025-02-05 18:19:06 - ERROR - stderr - 34%|███▍ | 7607/22434 [8:11:26<10:24:48, 2.53s/it] +2025-02-05 18:19:09 - ERROR - stderr - 34%|███▍ | 7608/22434 [8:11:29<10:17:05, 2.50s/it] +2025-02-05 18:19:09 - ERROR - stderr - +2025-02-05 18:19:09 - ERROR - stderr - +2025-02-05 18:19:09 - INFO - stdout - {'loss': 0.6774, 'grad_norm': 1.0339018106460571, 'learning_rate': 1.5393814900895284e-05, 'epoch': 1.02} +2025-02-05 18:19:09 - ERROR - stderr - 34%|███▍ | 7608/22434 [8:11:29<10:17:05, 2.50s/it] +2025-02-05 18:19:12 - ERROR - stderr - 34%|███▍ | 7609/22434 [8:11:31<10:36:34, 2.58s/it] +2025-02-05 18:19:12 - ERROR - stderr - +2025-02-05 18:19:12 - ERROR - stderr - +2025-02-05 18:19:12 - INFO - stdout - {'loss': 0.7507, 'grad_norm': 1.1799553632736206, 'learning_rate': 1.5392599121204427e-05, 'epoch': 1.02} +2025-02-05 18:19:12 - ERROR - stderr - 34%|███▍ | 7609/22434 [8:11:31<10:36:34, 2.58s/it] +2025-02-05 18:19:14 - ERROR - stderr - 34%|███▍ | 7610/22434 [8:11:34<10:36:20, 2.58s/it] +2025-02-05 18:19:14 - ERROR - stderr - +2025-02-05 18:19:14 - ERROR - stderr - +2025-02-05 18:19:14 - INFO - stdout - {'loss': 0.6394, 'grad_norm': 1.0164538621902466, 'learning_rate': 1.5391383229110005e-05, 'epoch': 1.02} +2025-02-05 18:19:14 - ERROR - stderr - 34%|███▍ | 7610/22434 [8:11:34<10:36:20, 2.58s/it] +2025-02-05 18:19:17 - ERROR - stderr - 34%|███▍ | 7611/22434 [8:11:36<10:31:19, 2.56s/it] +2025-02-05 18:19:17 - ERROR - stderr - +2025-02-05 18:19:17 - ERROR - stderr - +2025-02-05 18:19:17 - INFO - stdout - {'loss': 0.7171, 'grad_norm': 1.103979229927063, 'learning_rate': 1.5390167224637353e-05, 'epoch': 1.02} +2025-02-05 18:19:17 - ERROR - stderr - 34%|███▍ | 7611/22434 [8:11:36<10:31:19, 2.56s/it] +2025-02-05 18:19:19 - ERROR - stderr - 34%|███▍ | 7612/22434 [8:11:39<10:32:05, 2.56s/it] +2025-02-05 18:19:19 - ERROR - stderr - +2025-02-05 18:19:19 - ERROR - stderr - +2025-02-05 18:19:19 - INFO - stdout - {'loss': 0.6823, 'grad_norm': 
1.038739800453186, 'learning_rate': 1.5388951107811828e-05, 'epoch': 1.02} +2025-02-05 18:19:19 - ERROR - stderr - 34%|███▍ | 7612/22434 [8:11:39<10:32:05, 2.56s/it] +2025-02-05 18:19:22 - ERROR - stderr - 34%|███▍ | 7613/22434 [8:11:41<10:30:42, 2.55s/it] +2025-02-05 18:19:22 - ERROR - stderr - +2025-02-05 18:19:22 - ERROR - stderr - +2025-02-05 18:19:22 - INFO - stdout - {'loss': 0.8267, 'grad_norm': 1.2042981386184692, 'learning_rate': 1.538773487865877e-05, 'epoch': 1.02} +2025-02-05 18:19:22 - ERROR - stderr - 34%|███▍ | 7613/22434 [8:11:42<10:30:42, 2.55s/it] +2025-02-05 18:19:24 - ERROR - stderr - 34%|███▍ | 7614/22434 [8:11:44<10:35:29, 2.57s/it] +2025-02-05 18:19:24 - ERROR - stderr - +2025-02-05 18:19:24 - ERROR - stderr - +2025-02-05 18:19:24 - INFO - stdout - {'loss': 0.8447, 'grad_norm': 1.1565076112747192, 'learning_rate': 1.5386518537203533e-05, 'epoch': 1.02} +2025-02-05 18:19:24 - ERROR - stderr - 34%|███▍ | 7614/22434 [8:11:44<10:35:29, 2.57s/it] +2025-02-05 18:19:27 - ERROR - stderr - 34%|███▍ | 7615/22434 [8:11:47<10:30:44, 2.55s/it] +2025-02-05 18:19:27 - ERROR - stderr - +2025-02-05 18:19:27 - ERROR - stderr - +2025-02-05 18:19:27 - INFO - stdout - {'loss': 0.736, 'grad_norm': 1.2337570190429688, 'learning_rate': 1.5385302083471474e-05, 'epoch': 1.02} +2025-02-05 18:19:27 - ERROR - stderr - 34%|███▍ | 7615/22434 [8:11:47<10:30:44, 2.55s/it] +2025-02-05 18:19:29 - ERROR - stderr - 34%|███▍ | 7616/22434 [8:11:49<10:24:47, 2.53s/it] +2025-02-05 18:19:29 - ERROR - stderr - +2025-02-05 18:19:29 - ERROR - stderr - +2025-02-05 18:19:29 - INFO - stdout - {'loss': 0.7016, 'grad_norm': 1.091556191444397, 'learning_rate': 1.5384085517487948e-05, 'epoch': 1.02} +2025-02-05 18:19:29 - ERROR - stderr - 34%|███▍ | 7616/22434 [8:11:49<10:24:47, 2.53s/it] +2025-02-05 18:19:32 - ERROR - stderr - 34%|███▍ | 7617/22434 [8:11:52<10:55:59, 2.66s/it] +2025-02-05 18:19:32 - ERROR - stderr - +2025-02-05 18:19:32 - ERROR - stderr - +2025-02-05 18:19:32 - INFO - stdout 
- {'loss': 0.7836, 'grad_norm': 1.1633453369140625, 'learning_rate': 1.5382868839278307e-05, 'epoch': 1.02} +2025-02-05 18:19:32 - ERROR - stderr - 34%|███▍ | 7617/22434 [8:11:52<10:55:59, 2.66s/it] +2025-02-05 18:19:35 - ERROR - stderr - 34%|███▍ | 7618/22434 [8:11:55<10:45:27, 2.61s/it] +2025-02-05 18:19:35 - ERROR - stderr - +2025-02-05 18:19:35 - ERROR - stderr - +2025-02-05 18:19:35 - INFO - stdout - {'loss': 0.6738, 'grad_norm': 1.1030980348587036, 'learning_rate': 1.538165204886792e-05, 'epoch': 1.02} +2025-02-05 18:19:35 - ERROR - stderr - 34%|███▍ | 7618/22434 [8:11:55<10:45:27, 2.61s/it] +2025-02-05 18:19:37 - ERROR - stderr - 34%|███▍ | 7619/22434 [8:11:57<10:36:48, 2.58s/it] +2025-02-05 18:19:37 - ERROR - stderr - +2025-02-05 18:19:37 - ERROR - stderr - +2025-02-05 18:19:37 - INFO - stdout - {'loss': 0.6532, 'grad_norm': 1.052276372909546, 'learning_rate': 1.538043514628214e-05, 'epoch': 1.02} +2025-02-05 18:19:37 - ERROR - stderr - 34%|███▍ | 7619/22434 [8:11:57<10:36:48, 2.58s/it] +2025-02-05 18:19:40 - ERROR - stderr - 34%|███▍ | 7620/22434 [8:12:00<10:35:40, 2.57s/it] +2025-02-05 18:19:40 - ERROR - stderr - +2025-02-05 18:19:40 - ERROR - stderr - +2025-02-05 18:19:40 - INFO - stdout - {'loss': 0.7828, 'grad_norm': 1.1536833047866821, 'learning_rate': 1.5379218131546344e-05, 'epoch': 1.02} +2025-02-05 18:19:40 - ERROR - stderr - 34%|███▍ | 7620/22434 [8:12:00<10:35:40, 2.57s/it] +2025-02-05 18:19:42 - ERROR - stderr - 34%|███▍ | 7621/22434 [8:12:02<10:24:51, 2.53s/it] +2025-02-05 18:19:42 - ERROR - stderr - +2025-02-05 18:19:42 - ERROR - stderr - +2025-02-05 18:19:42 - INFO - stdout - {'loss': 0.7369, 'grad_norm': 1.1853820085525513, 'learning_rate': 1.5378001004685888e-05, 'epoch': 1.02} +2025-02-05 18:19:42 - ERROR - stderr - 34%|███▍ | 7621/22434 [8:12:02<10:24:51, 2.53s/it] +2025-02-05 18:19:45 - ERROR - stderr - 34%|███▍ | 7622/22434 [8:12:05<10:26:46, 2.54s/it] +2025-02-05 18:19:45 - ERROR - stderr - +2025-02-05 18:19:45 - ERROR - stderr - 
+2025-02-05 18:19:45 - INFO - stdout - {'loss': 0.7696, 'grad_norm': 1.1158623695373535, 'learning_rate': 1.5376783765726155e-05, 'epoch': 1.02} +2025-02-05 18:19:45 - ERROR - stderr - 34%|███▍ | 7622/22434 [8:12:05<10:26:46, 2.54s/it] +2025-02-05 18:19:47 - ERROR - stderr - 34%|███▍ | 7623/22434 [8:12:07<10:24:00, 2.53s/it] +2025-02-05 18:19:47 - ERROR - stderr - +2025-02-05 18:19:47 - ERROR - stderr - +2025-02-05 18:19:47 - INFO - stdout - {'loss': 0.7385, 'grad_norm': 1.1290605068206787, 'learning_rate': 1.5375566414692504e-05, 'epoch': 1.02} +2025-02-05 18:19:47 - ERROR - stderr - 34%|███▍ | 7623/22434 [8:12:07<10:24:00, 2.53s/it] +2025-02-05 18:19:50 - ERROR - stderr - 34%|███▍ | 7624/22434 [8:12:10<11:03:12, 2.69s/it] +2025-02-05 18:19:50 - ERROR - stderr - +2025-02-05 18:19:50 - ERROR - stderr - +2025-02-05 18:19:50 - INFO - stdout - {'loss': 0.7714, 'grad_norm': 1.227765679359436, 'learning_rate': 1.5374348951610312e-05, 'epoch': 1.02} +2025-02-05 18:19:50 - ERROR - stderr - 34%|███▍ | 7624/22434 [8:12:10<11:03:12, 2.69s/it] +2025-02-05 18:19:53 - ERROR - stderr - 34%|███▍ | 7625/22434 [8:12:13<10:48:44, 2.63s/it] +2025-02-05 18:19:53 - ERROR - stderr - +2025-02-05 18:19:53 - ERROR - stderr - +2025-02-05 18:19:53 - INFO - stdout - {'loss': 0.8793, 'grad_norm': 1.2493822574615479, 'learning_rate': 1.5373131376504964e-05, 'epoch': 1.02} +2025-02-05 18:19:53 - ERROR - stderr - 34%|███▍ | 7625/22434 [8:12:13<10:48:44, 2.63s/it] +2025-02-05 18:19:55 - ERROR - stderr - 34%|███▍ | 7626/22434 [8:12:15<10:39:21, 2.59s/it] +2025-02-05 18:19:55 - ERROR - stderr - +2025-02-05 18:19:55 - ERROR - stderr - +2025-02-05 18:19:55 - INFO - stdout - {'loss': 0.7588, 'grad_norm': 1.0335562229156494, 'learning_rate': 1.5371913689401833e-05, 'epoch': 1.02} +2025-02-05 18:19:55 - ERROR - stderr - 34%|███▍ | 7626/22434 [8:12:15<10:39:21, 2.59s/it] +2025-02-05 18:19:58 - ERROR - stderr - 34%|███▍ | 7627/22434 [8:12:18<10:28:49, 2.55s/it] +2025-02-05 18:19:58 - ERROR - stderr - 
+2025-02-05 18:19:58 - ERROR - stderr - +2025-02-05 18:19:58 - INFO - stdout - {'loss': 0.7408, 'grad_norm': 1.1311569213867188, 'learning_rate': 1.53706958903263e-05, 'epoch': 1.02} +2025-02-05 18:19:58 - ERROR - stderr - 34%|███▍ | 7627/22434 [8:12:18<10:28:49, 2.55s/it] +2025-02-05 18:20:00 - ERROR - stderr - 34%|███▍ | 7628/22434 [8:12:20<10:34:19, 2.57s/it] +2025-02-05 18:20:00 - ERROR - stderr - +2025-02-05 18:20:00 - ERROR - stderr - +2025-02-05 18:20:00 - INFO - stdout - {'loss': 0.7611, 'grad_norm': 1.2213736772537231, 'learning_rate': 1.5369477979303752e-05, 'epoch': 1.02} +2025-02-05 18:20:00 - ERROR - stderr - 34%|███▍ | 7628/22434 [8:12:20<10:34:19, 2.57s/it] +2025-02-05 18:20:03 - ERROR - stderr - 34%|███▍ | 7629/22434 [8:12:23<11:01:51, 2.68s/it] +2025-02-05 18:20:03 - ERROR - stderr - +2025-02-05 18:20:03 - ERROR - stderr - +2025-02-05 18:20:03 - INFO - stdout - {'loss': 0.7809, 'grad_norm': 1.1411844491958618, 'learning_rate': 1.5368259956359572e-05, 'epoch': 1.02} +2025-02-05 18:20:03 - ERROR - stderr - 34%|███▍ | 7629/22434 [8:12:23<11:01:51, 2.68s/it] +2025-02-05 18:20:06 - ERROR - stderr - 34%|███▍ | 7630/22434 [8:12:26<10:47:39, 2.62s/it] +2025-02-05 18:20:06 - ERROR - stderr - +2025-02-05 18:20:06 - ERROR - stderr - +2025-02-05 18:20:06 - INFO - stdout - {'loss': 0.7484, 'grad_norm': 1.0908231735229492, 'learning_rate': 1.5367041821519152e-05, 'epoch': 1.02} +2025-02-05 18:20:06 - ERROR - stderr - 34%|███▍ | 7630/22434 [8:12:26<10:47:39, 2.62s/it] +2025-02-05 18:20:08 - ERROR - stderr - 34%|███▍ | 7631/22434 [8:12:28<10:33:17, 2.57s/it] +2025-02-05 18:20:08 - ERROR - stderr - +2025-02-05 18:20:08 - ERROR - stderr - +2025-02-05 18:20:08 - INFO - stdout - {'loss': 0.6735, 'grad_norm': 1.0617804527282715, 'learning_rate': 1.536582357480788e-05, 'epoch': 1.02} +2025-02-05 18:20:08 - ERROR - stderr - 34%|███▍ | 7631/22434 [8:12:28<10:33:17, 2.57s/it] +2025-02-05 18:20:11 - ERROR - stderr - 34%|███▍ | 7632/22434 [8:12:31<10:29:29, 2.55s/it] 
+2025-02-05 18:20:11 - ERROR - stderr - +2025-02-05 18:20:11 - ERROR - stderr - +2025-02-05 18:20:11 - INFO - stdout - {'loss': 0.6922, 'grad_norm': 1.186063289642334, 'learning_rate': 1.5364605216251146e-05, 'epoch': 1.02} +2025-02-05 18:20:11 - ERROR - stderr - 34%|███▍ | 7632/22434 [8:12:31<10:29:29, 2.55s/it] +2025-02-05 18:20:13 - ERROR - stderr - 34%|███▍ | 7633/22434 [8:12:33<10:20:33, 2.52s/it] +2025-02-05 18:20:13 - ERROR - stderr - +2025-02-05 18:20:13 - ERROR - stderr - +2025-02-05 18:20:13 - INFO - stdout - {'loss': 0.8077, 'grad_norm': 1.1263413429260254, 'learning_rate': 1.5363386745874355e-05, 'epoch': 1.02} +2025-02-05 18:20:13 - ERROR - stderr - 34%|███▍ | 7633/22434 [8:12:33<10:20:33, 2.52s/it] +2025-02-05 18:20:16 - ERROR - stderr - 34%|███▍ | 7634/22434 [8:12:35<10:18:10, 2.51s/it] +2025-02-05 18:20:16 - ERROR - stderr - +2025-02-05 18:20:16 - ERROR - stderr - +2025-02-05 18:20:16 - INFO - stdout - {'loss': 0.7518, 'grad_norm': 1.1766207218170166, 'learning_rate': 1.53621681637029e-05, 'epoch': 1.02} +2025-02-05 18:20:16 - ERROR - stderr - 34%|███▍ | 7634/22434 [8:12:36<10:18:10, 2.51s/it] +2025-02-05 18:20:18 - ERROR - stderr - 34%|███▍ | 7635/22434 [8:12:38<10:13:46, 2.49s/it] +2025-02-05 18:20:18 - ERROR - stderr - +2025-02-05 18:20:18 - ERROR - stderr - +2025-02-05 18:20:18 - INFO - stdout - {'loss': 0.7889, 'grad_norm': 1.2635776996612549, 'learning_rate': 1.536094946976218e-05, 'epoch': 1.02} +2025-02-05 18:20:18 - ERROR - stderr - 34%|███▍ | 7635/22434 [8:12:38<10:13:46, 2.49s/it] +2025-02-05 18:20:21 - ERROR - stderr - 34%|███▍ | 7636/22434 [8:12:40<10:16:49, 2.50s/it] +2025-02-05 18:20:21 - ERROR - stderr - +2025-02-05 18:20:21 - ERROR - stderr - +2025-02-05 18:20:21 - INFO - stdout - {'loss': 0.7264, 'grad_norm': 1.2026598453521729, 'learning_rate': 1.53597306640776e-05, 'epoch': 1.02} +2025-02-05 18:20:21 - ERROR - stderr - 34%|███▍ | 7636/22434 [8:12:41<10:16:49, 2.50s/it] +2025-02-05 18:20:23 - ERROR - stderr - 34%|███▍ | 7637/22434 
[8:12:43<10:31:55, 2.56s/it] +2025-02-05 18:20:23 - ERROR - stderr - +2025-02-05 18:20:23 - ERROR - stderr - +2025-02-05 18:20:23 - INFO - stdout - {'loss': 0.6773, 'grad_norm': 1.0593008995056152, 'learning_rate': 1.5358511746674555e-05, 'epoch': 1.02} +2025-02-05 18:20:23 - ERROR - stderr - 34%|███▍ | 7637/22434 [8:12:43<10:31:55, 2.56s/it] +2025-02-05 18:20:26 - ERROR - stderr - 34%|███▍ | 7638/22434 [8:12:46<10:34:48, 2.57s/it] +2025-02-05 18:20:26 - ERROR - stderr - +2025-02-05 18:20:26 - ERROR - stderr - +2025-02-05 18:20:26 - INFO - stdout - {'loss': 0.6721, 'grad_norm': 1.080504298210144, 'learning_rate': 1.5357292717578463e-05, 'epoch': 1.02} +2025-02-05 18:20:26 - ERROR - stderr - 34%|███▍ | 7638/22434 [8:12:46<10:34:48, 2.57s/it] +2025-02-05 18:20:29 - ERROR - stderr - 34%|███▍ | 7639/22434 [8:12:48<10:34:49, 2.57s/it] +2025-02-05 18:20:29 - ERROR - stderr - +2025-02-05 18:20:29 - ERROR - stderr - +2025-02-05 18:20:29 - INFO - stdout - {'loss': 0.724, 'grad_norm': 1.1393852233886719, 'learning_rate': 1.5356073576814732e-05, 'epoch': 1.02} +2025-02-05 18:20:29 - ERROR - stderr - 34%|███▍ | 7639/22434 [8:12:48<10:34:49, 2.57s/it] +2025-02-05 18:20:31 - ERROR - stderr - 34%|███▍ | 7640/22434 [8:12:51<10:38:34, 2.59s/it] +2025-02-05 18:20:31 - ERROR - stderr - +2025-02-05 18:20:31 - ERROR - stderr - +2025-02-05 18:20:31 - INFO - stdout - {'loss': 0.7754, 'grad_norm': 1.1669518947601318, 'learning_rate': 1.5354854324408776e-05, 'epoch': 1.02} +2025-02-05 18:20:31 - ERROR - stderr - 34%|███▍ | 7640/22434 [8:12:51<10:38:34, 2.59s/it] +2025-02-05 18:20:34 - ERROR - stderr - 34%|███▍ | 7641/22434 [8:12:54<10:33:22, 2.57s/it] +2025-02-05 18:20:34 - ERROR - stderr - +2025-02-05 18:20:34 - ERROR - stderr - +2025-02-05 18:20:34 - INFO - stdout - {'loss': 0.7728, 'grad_norm': 1.0751136541366577, 'learning_rate': 1.5353634960386004e-05, 'epoch': 1.02} +2025-02-05 18:20:34 - ERROR - stderr - 34%|███▍ | 7641/22434 [8:12:54<10:33:22, 2.57s/it] +2025-02-05 18:20:36 - 
ERROR - stderr - 34%|███▍ | 7642/22434 [8:12:56<10:24:06, 2.53s/it] +2025-02-05 18:20:36 - ERROR - stderr - +2025-02-05 18:20:36 - ERROR - stderr - +2025-02-05 18:20:36 - INFO - stdout - {'loss': 0.7812, 'grad_norm': 1.187941312789917, 'learning_rate': 1.5352415484771833e-05, 'epoch': 1.02} +2025-02-05 18:20:36 - ERROR - stderr - 34%|███▍ | 7642/22434 [8:12:56<10:24:06, 2.53s/it] +2025-02-05 18:20:39 - ERROR - stderr - 34%|███▍ | 7643/22434 [8:12:59<10:36:22, 2.58s/it] +2025-02-05 18:20:39 - ERROR - stderr - +2025-02-05 18:20:39 - ERROR - stderr - +2025-02-05 18:20:39 - INFO - stdout - {'loss': 0.7483, 'grad_norm': 1.2098135948181152, 'learning_rate': 1.5351195897591683e-05, 'epoch': 1.02} +2025-02-05 18:20:39 - ERROR - stderr - 34%|███▍ | 7643/22434 [8:12:59<10:36:22, 2.58s/it] +2025-02-05 18:20:41 - ERROR - stderr - 34%|███▍ | 7644/22434 [8:13:01<10:34:28, 2.57s/it] +2025-02-05 18:20:41 - ERROR - stderr - +2025-02-05 18:20:41 - ERROR - stderr - +2025-02-05 18:20:41 - INFO - stdout - {'loss': 0.7814, 'grad_norm': 1.0966928005218506, 'learning_rate': 1.5349976198870974e-05, 'epoch': 1.02} +2025-02-05 18:20:41 - ERROR - stderr - 34%|███▍ | 7644/22434 [8:13:01<10:34:28, 2.57s/it] +2025-02-05 18:20:44 - ERROR - stderr - 34%|███▍ | 7645/22434 [8:13:04<10:25:41, 2.54s/it] +2025-02-05 18:20:44 - ERROR - stderr - +2025-02-05 18:20:44 - ERROR - stderr - +2025-02-05 18:20:44 - INFO - stdout - {'loss': 0.7782, 'grad_norm': 1.2695571184158325, 'learning_rate': 1.5348756388635133e-05, 'epoch': 1.02} +2025-02-05 18:20:44 - ERROR - stderr - 34%|███▍ | 7645/22434 [8:13:04<10:25:41, 2.54s/it] +2025-02-05 18:20:46 - ERROR - stderr - 34%|███▍ | 7646/22434 [8:13:06<10:24:58, 2.54s/it] +2025-02-05 18:20:46 - ERROR - stderr - +2025-02-05 18:20:46 - ERROR - stderr - +2025-02-05 18:20:46 - INFO - stdout - {'loss': 0.6869, 'grad_norm': 1.1409188508987427, 'learning_rate': 1.534753646690958e-05, 'epoch': 1.02} +2025-02-05 18:20:46 - ERROR - stderr - 34%|███▍ | 7646/22434 [8:13:06<10:24:58, 
2.54s/it] +2025-02-05 18:20:49 - ERROR - stderr - 34%|███▍ | 7647/22434 [8:13:09<10:49:07, 2.63s/it] +2025-02-05 18:20:49 - ERROR - stderr - +2025-02-05 18:20:49 - ERROR - stderr - +2025-02-05 18:20:49 - INFO - stdout - {'loss': 0.6972, 'grad_norm': 1.007348895072937, 'learning_rate': 1.5346316433719747e-05, 'epoch': 1.02} +2025-02-05 18:20:49 - ERROR - stderr - 34%|███▍ | 7647/22434 [8:13:09<10:49:07, 2.63s/it] +2025-02-05 18:20:52 - ERROR - stderr - 34%|███▍ | 7648/22434 [8:13:12<10:43:31, 2.61s/it] +2025-02-05 18:20:52 - ERROR - stderr - +2025-02-05 18:20:52 - ERROR - stderr - +2025-02-05 18:20:52 - INFO - stdout - {'loss': 0.6395, 'grad_norm': 1.023591160774231, 'learning_rate': 1.5345096289091066e-05, 'epoch': 1.02} +2025-02-05 18:20:52 - ERROR - stderr - 34%|███▍ | 7648/22434 [8:13:12<10:43:31, 2.61s/it] +2025-02-05 18:20:54 - ERROR - stderr - 34%|███▍ | 7649/22434 [8:13:14<10:40:24, 2.60s/it] +2025-02-05 18:20:54 - ERROR - stderr - +2025-02-05 18:20:54 - ERROR - stderr - +2025-02-05 18:20:54 - INFO - stdout - {'loss': 0.706, 'grad_norm': 1.0743900537490845, 'learning_rate': 1.5343876033048964e-05, 'epoch': 1.02} +2025-02-05 18:20:54 - ERROR - stderr - 34%|███▍ | 7649/22434 [8:13:14<10:40:24, 2.60s/it] +2025-02-05 18:20:57 - ERROR - stderr - 34%|███▍ | 7650/22434 [8:13:17<11:05:05, 2.70s/it] +2025-02-05 18:20:57 - ERROR - stderr - +2025-02-05 18:20:57 - ERROR - stderr - +2025-02-05 18:20:57 - INFO - stdout - {'loss': 0.7874, 'grad_norm': 1.2269963026046753, 'learning_rate': 1.5342655665618885e-05, 'epoch': 1.02} +2025-02-05 18:20:57 - ERROR - stderr - 34%|███▍ | 7650/22434 [8:13:17<11:05:05, 2.70s/it] +2025-02-05 18:21:00 - ERROR - stderr - 34%|███▍ | 7651/22434 [8:13:20<10:54:02, 2.65s/it] +2025-02-05 18:21:00 - ERROR - stderr - +2025-02-05 18:21:00 - ERROR - stderr - +2025-02-05 18:21:00 - INFO - stdout - {'loss': 0.7155, 'grad_norm': 1.0078877210617065, 'learning_rate': 1.5341435186826257e-05, 'epoch': 1.02} +2025-02-05 18:21:00 - ERROR - stderr - 34%|███▍ 
| 7651/22434 [8:13:20<10:54:02, 2.65s/it] +2025-02-05 18:21:02 - ERROR - stderr - 34%|███▍ | 7652/22434 [8:13:22<10:46:12, 2.62s/it] +2025-02-05 18:21:02 - ERROR - stderr - +2025-02-05 18:21:02 - ERROR - stderr - +2025-02-05 18:21:02 - INFO - stdout - {'loss': 0.7526, 'grad_norm': 1.1455148458480835, 'learning_rate': 1.5340214596696525e-05, 'epoch': 1.02} +2025-02-05 18:21:02 - ERROR - stderr - 34%|███▍ | 7652/22434 [8:13:22<10:46:12, 2.62s/it] +2025-02-05 18:21:05 - ERROR - stderr - 34%|███▍ | 7653/22434 [8:13:25<10:38:59, 2.59s/it] +2025-02-05 18:21:05 - ERROR - stderr - +2025-02-05 18:21:05 - ERROR - stderr - +2025-02-05 18:21:05 - INFO - stdout - {'loss': 0.8031, 'grad_norm': 1.3359558582305908, 'learning_rate': 1.533899389525513e-05, 'epoch': 1.02} +2025-02-05 18:21:05 - ERROR - stderr - 34%|███▍ | 7653/22434 [8:13:25<10:38:59, 2.59s/it] +2025-02-05 18:21:08 - ERROR - stderr - 34%|███▍ | 7654/22434 [8:13:27<10:48:13, 2.63s/it] +2025-02-05 18:21:08 - ERROR - stderr - +2025-02-05 18:21:08 - ERROR - stderr - +2025-02-05 18:21:08 - INFO - stdout - {'loss': 0.7555, 'grad_norm': 1.1383705139160156, 'learning_rate': 1.5337773082527515e-05, 'epoch': 1.02} +2025-02-05 18:21:08 - ERROR - stderr - 34%|███▍ | 7654/22434 [8:13:28<10:48:13, 2.63s/it] +2025-02-05 18:21:10 - ERROR - stderr - 34%|███▍ | 7655/22434 [8:13:30<10:47:49, 2.63s/it] +2025-02-05 18:21:10 - ERROR - stderr - +2025-02-05 18:21:10 - ERROR - stderr - +2025-02-05 18:21:10 - INFO - stdout - {'loss': 0.7358, 'grad_norm': 1.149391531944275, 'learning_rate': 1.533655215853913e-05, 'epoch': 1.02} +2025-02-05 18:21:10 - ERROR - stderr - 34%|███▍ | 7655/22434 [8:13:30<10:47:49, 2.63s/it] +2025-02-05 18:21:13 - ERROR - stderr - 34%|███▍ | 7656/22434 [8:13:33<10:56:15, 2.66s/it] +2025-02-05 18:21:13 - ERROR - stderr - +2025-02-05 18:21:13 - ERROR - stderr - +2025-02-05 18:21:13 - INFO - stdout - {'loss': 0.7288, 'grad_norm': 1.0689034461975098, 'learning_rate': 1.5335331123315424e-05, 'epoch': 1.02} +2025-02-05 
18:21:13 - ERROR - stderr - 34%|███▍ | 7656/22434 [8:13:33<10:56:15, 2.66s/it] +2025-02-05 18:21:16 - ERROR - stderr - 34%|███▍ | 7657/22434 [8:13:35<10:48:30, 2.63s/it] +2025-02-05 18:21:16 - ERROR - stderr - +2025-02-05 18:21:16 - ERROR - stderr - +2025-02-05 18:21:16 - INFO - stdout - {'loss': 0.7397, 'grad_norm': 1.1047368049621582, 'learning_rate': 1.533410997688184e-05, 'epoch': 1.02} +2025-02-05 18:21:16 - ERROR - stderr - 34%|███▍ | 7657/22434 [8:13:35<10:48:30, 2.63s/it] +2025-02-05 18:21:18 - ERROR - stderr - 34%|███▍ | 7658/22434 [8:13:38<10:42:53, 2.61s/it] +2025-02-05 18:21:18 - ERROR - stderr - +2025-02-05 18:21:18 - ERROR - stderr - +2025-02-05 18:21:18 - INFO - stdout - {'loss': 0.7762, 'grad_norm': 1.2034196853637695, 'learning_rate': 1.533288871926384e-05, 'epoch': 1.02} +2025-02-05 18:21:18 - ERROR - stderr - 34%|███▍ | 7658/22434 [8:13:38<10:42:53, 2.61s/it] +2025-02-05 18:21:21 - ERROR - stderr - 34%|███▍ | 7659/22434 [8:13:40<10:34:25, 2.58s/it] +2025-02-05 18:21:21 - ERROR - stderr - +2025-02-05 18:21:21 - ERROR - stderr - +2025-02-05 18:21:21 - INFO - stdout - {'loss': 0.7732, 'grad_norm': 1.112215518951416, 'learning_rate': 1.5331667350486876e-05, 'epoch': 1.02} +2025-02-05 18:21:21 - ERROR - stderr - 34%|███▍ | 7659/22434 [8:13:41<10:34:25, 2.58s/it] +2025-02-05 18:21:23 - ERROR - stderr - 34%|███▍ | 7660/22434 [8:13:43<10:34:02, 2.57s/it] +2025-02-05 18:21:23 - ERROR - stderr - +2025-02-05 18:21:23 - ERROR - stderr - +2025-02-05 18:21:23 - INFO - stdout - {'loss': 0.7539, 'grad_norm': 1.161926507949829, 'learning_rate': 1.5330445870576412e-05, 'epoch': 1.02} +2025-02-05 18:21:23 - ERROR - stderr - 34%|███▍ | 7660/22434 [8:13:43<10:34:02, 2.57s/it] +2025-02-05 18:21:26 - ERROR - stderr - 34%|███▍ | 7661/22434 [8:13:45<10:26:16, 2.54s/it] +2025-02-05 18:21:26 - ERROR - stderr - +2025-02-05 18:21:26 - ERROR - stderr - +2025-02-05 18:21:26 - INFO - stdout - {'loss': 0.6833, 'grad_norm': 1.1224719285964966, 'learning_rate': 
1.5329224279557903e-05, 'epoch': 1.02} +2025-02-05 18:21:26 - ERROR - stderr - 34%|███▍ | 7661/22434 [8:13:46<10:26:16, 2.54s/it] +2025-02-05 18:21:28 - ERROR - stderr - 34%|███▍ | 7662/22434 [8:13:48<10:36:56, 2.59s/it] +2025-02-05 18:21:28 - ERROR - stderr - +2025-02-05 18:21:28 - ERROR - stderr - +2025-02-05 18:21:28 - INFO - stdout - {'loss': 0.7793, 'grad_norm': 1.143916130065918, 'learning_rate': 1.532800257745681e-05, 'epoch': 1.02} +2025-02-05 18:21:28 - ERROR - stderr - 34%|███▍ | 7662/22434 [8:13:48<10:36:56, 2.59s/it] +2025-02-05 18:21:31 - ERROR - stderr - 34%|███▍ | 7663/22434 [8:13:51<10:32:51, 2.57s/it] +2025-02-05 18:21:31 - ERROR - stderr - +2025-02-05 18:21:31 - ERROR - stderr - +2025-02-05 18:21:31 - INFO - stdout - {'loss': 0.7608, 'grad_norm': 1.106979489326477, 'learning_rate': 1.5326780764298607e-05, 'epoch': 1.02} +2025-02-05 18:21:31 - ERROR - stderr - 34%|███▍ | 7663/22434 [8:13:51<10:32:51, 2.57s/it] +2025-02-05 18:21:34 - ERROR - stderr - 34%|███▍ | 7664/22434 [8:13:53<10:34:36, 2.58s/it] +2025-02-05 18:21:34 - ERROR - stderr - +2025-02-05 18:21:34 - ERROR - stderr - +2025-02-05 18:21:34 - INFO - stdout - {'loss': 0.7445, 'grad_norm': 1.0660679340362549, 'learning_rate': 1.532555884010875e-05, 'epoch': 1.02} +2025-02-05 18:21:34 - ERROR - stderr - 34%|███▍ | 7664/22434 [8:13:53<10:34:36, 2.58s/it] +2025-02-05 18:21:36 - ERROR - stderr - 34%|███▍ | 7665/22434 [8:13:56<10:37:30, 2.59s/it] +2025-02-05 18:21:36 - ERROR - stderr - +2025-02-05 18:21:36 - ERROR - stderr - +2025-02-05 18:21:36 - INFO - stdout - {'loss': 0.6868, 'grad_norm': 1.19538414478302, 'learning_rate': 1.532433680491272e-05, 'epoch': 1.03} +2025-02-05 18:21:36 - ERROR - stderr - 34%|███▍ | 7665/22434 [8:13:56<10:37:30, 2.59s/it] +2025-02-05 18:21:39 - ERROR - stderr - 34%|███▍ | 7666/22434 [8:13:58<10:34:21, 2.58s/it] +2025-02-05 18:21:39 - ERROR - stderr - +2025-02-05 18:21:39 - ERROR - stderr - +2025-02-05 18:21:39 - INFO - stdout - {'loss': 0.7196, 'grad_norm': 
1.1730625629425049, 'learning_rate': 1.532311465873598e-05, 'epoch': 1.03} +2025-02-05 18:21:39 - ERROR - stderr - 34%|███▍ | 7666/22434 [8:13:59<10:34:21, 2.58s/it] +2025-02-05 18:21:41 - ERROR - stderr - 34%|███▍ | 7667/22434 [8:14:01<10:31:16, 2.56s/it] +2025-02-05 18:21:41 - ERROR - stderr - +2025-02-05 18:21:41 - ERROR - stderr - +2025-02-05 18:21:41 - INFO - stdout - {'loss': 0.7392, 'grad_norm': 1.1953115463256836, 'learning_rate': 1.5321892401604014e-05, 'epoch': 1.03} +2025-02-05 18:21:41 - ERROR - stderr - 34%|███▍ | 7667/22434 [8:14:01<10:31:16, 2.56s/it] +2025-02-05 18:21:44 - ERROR - stderr - 34%|███▍ | 7668/22434 [8:14:04<10:29:55, 2.56s/it] +2025-02-05 18:21:44 - ERROR - stderr - +2025-02-05 18:21:44 - ERROR - stderr - +2025-02-05 18:21:44 - INFO - stdout - {'loss': 0.7688, 'grad_norm': 1.1381534337997437, 'learning_rate': 1.532067003354229e-05, 'epoch': 1.03} +2025-02-05 18:21:44 - ERROR - stderr - 34%|███▍ | 7668/22434 [8:14:04<10:29:55, 2.56s/it] +2025-02-05 18:21:46 - ERROR - stderr - 34%|███▍ | 7669/22434 [8:14:06<10:26:11, 2.54s/it] +2025-02-05 18:21:46 - ERROR - stderr - +2025-02-05 18:21:46 - ERROR - stderr - +2025-02-05 18:21:46 - INFO - stdout - {'loss': 0.6737, 'grad_norm': 1.0532985925674438, 'learning_rate': 1.5319447554576292e-05, 'epoch': 1.03} +2025-02-05 18:21:46 - ERROR - stderr - 34%|███▍ | 7669/22434 [8:14:06<10:26:11, 2.54s/it] +2025-02-05 18:21:49 - ERROR - stderr - 34%|███▍ | 7670/22434 [8:14:09<10:25:41, 2.54s/it] +2025-02-05 18:21:49 - ERROR - stderr - +2025-02-05 18:21:49 - ERROR - stderr - +2025-02-05 18:21:49 - INFO - stdout - {'loss': 0.7649, 'grad_norm': 1.024971842765808, 'learning_rate': 1.53182249647315e-05, 'epoch': 1.03} +2025-02-05 18:21:49 - ERROR - stderr - 34%|███▍ | 7670/22434 [8:14:09<10:25:41, 2.54s/it] +2025-02-05 18:21:52 - ERROR - stderr - 34%|███▍ | 7671/22434 [8:14:11<10:35:28, 2.58s/it] +2025-02-05 18:21:52 - ERROR - stderr - +2025-02-05 18:21:52 - ERROR - stderr - +2025-02-05 18:21:52 - INFO - stdout - 
{'loss': 0.681, 'grad_norm': 1.0704389810562134, 'learning_rate': 1.5317002264033395e-05, 'epoch': 1.03} +2025-02-05 18:21:52 - ERROR - stderr - 34%|███▍ | 7671/22434 [8:14:11<10:35:28, 2.58s/it] +2025-02-05 18:21:54 - ERROR - stderr - 34%|███▍ | 7672/22434 [8:14:14<10:38:05, 2.59s/it] +2025-02-05 18:21:54 - ERROR - stderr - +2025-02-05 18:21:54 - ERROR - stderr - +2025-02-05 18:21:54 - INFO - stdout - {'loss': 0.6796, 'grad_norm': 0.9812400341033936, 'learning_rate': 1.5315779452507466e-05, 'epoch': 1.03} +2025-02-05 18:21:54 - ERROR - stderr - 34%|███▍ | 7672/22434 [8:14:14<10:38:05, 2.59s/it] +2025-02-05 18:21:57 - ERROR - stderr - 34%|███▍ | 7673/22434 [8:14:16<10:32:19, 2.57s/it] +2025-02-05 18:21:57 - ERROR - stderr - +2025-02-05 18:21:57 - ERROR - stderr - +2025-02-05 18:21:57 - INFO - stdout - {'loss': 0.7788, 'grad_norm': 1.1281324625015259, 'learning_rate': 1.53145565301792e-05, 'epoch': 1.03} +2025-02-05 18:21:57 - ERROR - stderr - 34%|███▍ | 7673/22434 [8:14:16<10:32:19, 2.57s/it] +2025-02-05 18:21:59 - ERROR - stderr - 34%|███▍ | 7674/22434 [8:14:19<10:28:34, 2.56s/it] +2025-02-05 18:21:59 - ERROR - stderr - +2025-02-05 18:21:59 - ERROR - stderr - +2025-02-05 18:21:59 - INFO - stdout - {'loss': 0.7108, 'grad_norm': 1.0692516565322876, 'learning_rate': 1.5313333497074094e-05, 'epoch': 1.03} +2025-02-05 18:21:59 - ERROR - stderr - 34%|███▍ | 7674/22434 [8:14:19<10:28:34, 2.56s/it] +2025-02-05 18:22:02 - ERROR - stderr - 34%|███▍ | 7675/22434 [8:14:21<10:28:09, 2.55s/it] +2025-02-05 18:22:02 - ERROR - stderr - +2025-02-05 18:22:02 - ERROR - stderr - +2025-02-05 18:22:02 - INFO - stdout - {'loss': 0.8069, 'grad_norm': 1.0706753730773926, 'learning_rate': 1.5312110353217634e-05, 'epoch': 1.03} +2025-02-05 18:22:02 - ERROR - stderr - 34%|███▍ | 7675/22434 [8:14:22<10:28:09, 2.55s/it] +2025-02-05 18:22:04 - ERROR - stderr - 34%|███▍ | 7676/22434 [8:14:24<10:24:07, 2.54s/it] +2025-02-05 18:22:04 - ERROR - stderr - +2025-02-05 18:22:04 - ERROR - stderr - 
+2025-02-05 18:22:04 - INFO - stdout - {'loss': 0.7323, 'grad_norm': 1.0035525560379028, 'learning_rate': 1.5310887098635313e-05, 'epoch': 1.03} +2025-02-05 18:22:04 - ERROR - stderr - 34%|███▍ | 7676/22434 [8:14:24<10:24:07, 2.54s/it] +2025-02-05 18:22:07 - ERROR - stderr - 34%|███▍ | 7677/22434 [8:14:26<10:20:55, 2.52s/it] +2025-02-05 18:22:07 - ERROR - stderr - +2025-02-05 18:22:07 - ERROR - stderr - +2025-02-05 18:22:07 - INFO - stdout - {'loss': 0.709, 'grad_norm': 1.1789295673370361, 'learning_rate': 1.5309663733352634e-05, 'epoch': 1.03} +2025-02-05 18:22:07 - ERROR - stderr - 34%|███▍ | 7677/22434 [8:14:27<10:20:55, 2.52s/it] +2025-02-05 18:22:09 - ERROR - stderr - 34%|███▍ | 7678/22434 [8:14:29<10:19:40, 2.52s/it] +2025-02-05 18:22:09 - ERROR - stderr - +2025-02-05 18:22:09 - ERROR - stderr - +2025-02-05 18:22:09 - INFO - stdout - {'loss': 0.7297, 'grad_norm': 1.0701169967651367, 'learning_rate': 1.5308440257395095e-05, 'epoch': 1.03} +2025-02-05 18:22:09 - ERROR - stderr - 34%|███▍ | 7678/22434 [8:14:29<10:19:40, 2.52s/it] +2025-02-05 18:22:12 - ERROR - stderr - 34%|███▍ | 7679/22434 [8:14:31<10:16:56, 2.51s/it] +2025-02-05 18:22:12 - ERROR - stderr - +2025-02-05 18:22:12 - ERROR - stderr - +2025-02-05 18:22:12 - INFO - stdout - {'loss': 0.8357, 'grad_norm': 1.2434269189834595, 'learning_rate': 1.5307216670788202e-05, 'epoch': 1.03} +2025-02-05 18:22:12 - ERROR - stderr - 34%|███▍ | 7679/22434 [8:14:32<10:16:56, 2.51s/it] +2025-02-05 18:22:14 - ERROR - stderr - 34%|███▍ | 7680/22434 [8:14:34<10:12:45, 2.49s/it] +2025-02-05 18:22:14 - ERROR - stderr - +2025-02-05 18:22:14 - ERROR - stderr - +2025-02-05 18:22:14 - INFO - stdout - {'loss': 0.6805, 'grad_norm': 1.068922519683838, 'learning_rate': 1.530599297355745e-05, 'epoch': 1.03} +2025-02-05 18:22:14 - ERROR - stderr - 34%|███▍ | 7680/22434 [8:14:34<10:12:45, 2.49s/it] +2025-02-05 18:22:17 - ERROR - stderr - 34%|███▍ | 7681/22434 [8:14:36<10:15:52, 2.50s/it] +2025-02-05 18:22:17 - ERROR - stderr - 
+2025-02-05 18:22:17 - ERROR - stderr - +2025-02-05 18:22:17 - INFO - stdout - {'loss': 0.7124, 'grad_norm': 1.0845290422439575, 'learning_rate': 1.5304769165728357e-05, 'epoch': 1.03} +2025-02-05 18:22:17 - ERROR - stderr - 34%|███▍ | 7681/22434 [8:14:37<10:15:52, 2.50s/it] +2025-02-05 18:22:19 - ERROR - stderr - 34%|███▍ | 7682/22434 [8:14:39<10:17:21, 2.51s/it] +2025-02-05 18:22:19 - ERROR - stderr - +2025-02-05 18:22:19 - ERROR - stderr - +2025-02-05 18:22:19 - INFO - stdout - {'loss': 0.7761, 'grad_norm': 1.2109038829803467, 'learning_rate': 1.5303545247326424e-05, 'epoch': 1.03} +2025-02-05 18:22:19 - ERROR - stderr - 34%|███▍ | 7682/22434 [8:14:39<10:17:21, 2.51s/it] +2025-02-05 18:22:22 - ERROR - stderr - 34%|███▍ | 7683/22434 [8:14:41<10:17:50, 2.51s/it] +2025-02-05 18:22:22 - ERROR - stderr - +2025-02-05 18:22:22 - ERROR - stderr - +2025-02-05 18:22:22 - INFO - stdout - {'loss': 0.7426, 'grad_norm': 1.1997913122177124, 'learning_rate': 1.5302321218377167e-05, 'epoch': 1.03} +2025-02-05 18:22:22 - ERROR - stderr - 34%|███▍ | 7683/22434 [8:14:42<10:17:50, 2.51s/it] +2025-02-05 18:22:24 - ERROR - stderr - 34%|███▍ | 7684/22434 [8:14:44<10:14:39, 2.50s/it] +2025-02-05 18:22:24 - ERROR - stderr - +2025-02-05 18:22:24 - ERROR - stderr - +2025-02-05 18:22:24 - INFO - stdout - {'loss': 0.7871, 'grad_norm': 1.1841999292373657, 'learning_rate': 1.5301097078906096e-05, 'epoch': 1.03} +2025-02-05 18:22:24 - ERROR - stderr - 34%|███▍ | 7684/22434 [8:14:44<10:14:39, 2.50s/it] +2025-02-05 18:22:27 - ERROR - stderr - 34%|███▍ | 7685/22434 [8:14:46<10:15:00, 2.50s/it] +2025-02-05 18:22:27 - ERROR - stderr - +2025-02-05 18:22:27 - ERROR - stderr - +2025-02-05 18:22:27 - INFO - stdout - {'loss': 0.7479, 'grad_norm': 1.0841727256774902, 'learning_rate': 1.529987282893873e-05, 'epoch': 1.03} +2025-02-05 18:22:27 - ERROR - stderr - 34%|███▍ | 7685/22434 [8:14:47<10:15:00, 2.50s/it] +2025-02-05 18:22:29 - ERROR - stderr - 34%|███▍ | 7686/22434 [8:14:49<10:15:48, 2.51s/it] 
+2025-02-05 18:22:29 - ERROR - stderr - +2025-02-05 18:22:29 - ERROR - stderr - +2025-02-05 18:22:29 - INFO - stdout - {'loss': 0.7296, 'grad_norm': 1.135014533996582, 'learning_rate': 1.5298648468500585e-05, 'epoch': 1.03} +2025-02-05 18:22:29 - ERROR - stderr - 34%|███▍ | 7686/22434 [8:14:49<10:15:48, 2.51s/it] +2025-02-05 18:22:32 - ERROR - stderr - 34%|███▍ | 7687/22434 [8:14:51<10:14:24, 2.50s/it] +2025-02-05 18:22:32 - ERROR - stderr - +2025-02-05 18:22:32 - ERROR - stderr - +2025-02-05 18:22:32 - INFO - stdout - {'loss': 0.7498, 'grad_norm': 1.0995436906814575, 'learning_rate': 1.5297423997617187e-05, 'epoch': 1.03} +2025-02-05 18:22:32 - ERROR - stderr - 34%|███▍ | 7687/22434 [8:14:52<10:14:24, 2.50s/it] +2025-02-05 18:22:34 - ERROR - stderr - 34%|███▍ | 7688/22434 [8:14:54<10:17:54, 2.51s/it] +2025-02-05 18:22:34 - ERROR - stderr - +2025-02-05 18:22:34 - ERROR - stderr - +2025-02-05 18:22:34 - INFO - stdout - {'loss': 0.7266, 'grad_norm': 1.1187154054641724, 'learning_rate': 1.5296199416314052e-05, 'epoch': 1.03} +2025-02-05 18:22:34 - ERROR - stderr - 34%|███▍ | 7688/22434 [8:14:54<10:17:54, 2.51s/it] +2025-02-05 18:22:37 - ERROR - stderr - 34%|███▍ | 7689/22434 [8:14:57<10:28:32, 2.56s/it] +2025-02-05 18:22:37 - ERROR - stderr - +2025-02-05 18:22:37 - ERROR - stderr - +2025-02-05 18:22:37 - INFO - stdout - {'loss': 0.692, 'grad_norm': 1.1194250583648682, 'learning_rate': 1.529497472461671e-05, 'epoch': 1.03} +2025-02-05 18:22:37 - ERROR - stderr - 34%|███▍ | 7689/22434 [8:14:57<10:28:32, 2.56s/it] +2025-02-05 18:22:39 - ERROR - stderr - 34%|███▍ | 7690/22434 [8:14:59<10:29:46, 2.56s/it] +2025-02-05 18:22:40 - ERROR - stderr - +2025-02-05 18:22:40 - ERROR - stderr - +2025-02-05 18:22:40 - INFO - stdout - {'loss': 0.7743, 'grad_norm': 1.1201798915863037, 'learning_rate': 1.529374992255068e-05, 'epoch': 1.03} +2025-02-05 18:22:40 - ERROR - stderr - 34%|███▍ | 7690/22434 [8:14:59<10:29:46, 2.56s/it] +2025-02-05 18:22:42 - ERROR - stderr - 34%|███▍ | 
7691/22434 [8:15:02<10:45:47, 2.63s/it] +2025-02-05 18:22:42 - ERROR - stderr - +2025-02-05 18:22:42 - ERROR - stderr - +2025-02-05 18:22:42 - INFO - stdout - {'loss': 0.7895, 'grad_norm': 1.2152721881866455, 'learning_rate': 1.5292525010141507e-05, 'epoch': 1.03} +2025-02-05 18:22:42 - ERROR - stderr - 34%|███▍ | 7691/22434 [8:15:02<10:45:47, 2.63s/it] +2025-02-05 18:22:45 - ERROR - stderr - 34%|███▍ | 7692/22434 [8:15:05<10:36:32, 2.59s/it] +2025-02-05 18:22:45 - ERROR - stderr - +2025-02-05 18:22:45 - ERROR - stderr - +2025-02-05 18:22:45 - INFO - stdout - {'loss': 0.7759, 'grad_norm': 1.1908994913101196, 'learning_rate': 1.529129998741471e-05, 'epoch': 1.03} +2025-02-05 18:22:45 - ERROR - stderr - 34%|███▍ | 7692/22434 [8:15:05<10:36:32, 2.59s/it] +2025-02-05 18:22:47 - ERROR - stderr - 34%|███▍ | 7693/22434 [8:15:07<10:32:13, 2.57s/it] +2025-02-05 18:22:47 - ERROR - stderr - +2025-02-05 18:22:47 - ERROR - stderr - +2025-02-05 18:22:47 - INFO - stdout - {'loss': 0.7094, 'grad_norm': 1.0199247598648071, 'learning_rate': 1.529007485439583e-05, 'epoch': 1.03} +2025-02-05 18:22:47 - ERROR - stderr - 34%|███▍ | 7693/22434 [8:15:07<10:32:13, 2.57s/it] +2025-02-05 18:22:50 - ERROR - stderr - 34%|███▍ | 7694/22434 [8:15:10<10:25:26, 2.55s/it] +2025-02-05 18:22:50 - ERROR - stderr - +2025-02-05 18:22:50 - ERROR - stderr - +2025-02-05 18:22:50 - INFO - stdout - {'loss': 0.7611, 'grad_norm': 1.1857454776763916, 'learning_rate': 1.5288849611110398e-05, 'epoch': 1.03} +2025-02-05 18:22:50 - ERROR - stderr - 34%|███▍ | 7694/22434 [8:15:10<10:25:26, 2.55s/it] +2025-02-05 18:22:52 - ERROR - stderr - 34%|███▍ | 7695/22434 [8:15:12<10:31:07, 2.57s/it] +2025-02-05 18:22:52 - ERROR - stderr - +2025-02-05 18:22:52 - ERROR - stderr - +2025-02-05 18:22:52 - INFO - stdout - {'loss': 0.6766, 'grad_norm': 1.0865466594696045, 'learning_rate': 1.528762425758396e-05, 'epoch': 1.03} +2025-02-05 18:22:52 - ERROR - stderr - 34%|███▍ | 7695/22434 [8:15:12<10:31:07, 2.57s/it] +2025-02-05 
18:22:55 - ERROR - stderr - 34%|███▍ | 7696/22434 [8:15:15<10:36:35, 2.59s/it] +2025-02-05 18:22:55 - ERROR - stderr - +2025-02-05 18:22:55 - ERROR - stderr - +2025-02-05 18:22:55 - INFO - stdout - {'loss': 0.7611, 'grad_norm': 1.1344951391220093, 'learning_rate': 1.5286398793842054e-05, 'epoch': 1.03} +2025-02-05 18:22:55 - ERROR - stderr - 34%|███▍ | 7696/22434 [8:15:15<10:36:35, 2.59s/it] +2025-02-05 18:22:58 - ERROR - stderr - 34%|███▍ | 7697/22434 [8:15:17<10:29:31, 2.56s/it] +2025-02-05 18:22:58 - ERROR - stderr - +2025-02-05 18:22:58 - ERROR - stderr - +2025-02-05 18:22:58 - INFO - stdout - {'loss': 0.7794, 'grad_norm': 1.2493953704833984, 'learning_rate': 1.528517321991022e-05, 'epoch': 1.03} +2025-02-05 18:22:58 - ERROR - stderr - 34%|███▍ | 7697/22434 [8:15:17<10:29:31, 2.56s/it] +2025-02-05 18:23:00 - ERROR - stderr - 34%|███▍ | 7698/22434 [8:15:20<10:26:16, 2.55s/it] +2025-02-05 18:23:00 - ERROR - stderr - +2025-02-05 18:23:00 - ERROR - stderr - +2025-02-05 18:23:00 - INFO - stdout - {'loss': 0.7271, 'grad_norm': 1.1999006271362305, 'learning_rate': 1.528394753581401e-05, 'epoch': 1.03} +2025-02-05 18:23:00 - ERROR - stderr - 34%|███▍ | 7698/22434 [8:15:20<10:26:16, 2.55s/it] +2025-02-05 18:23:03 - ERROR - stderr - 34%|███▍ | 7699/22434 [8:15:22<10:25:37, 2.55s/it] +2025-02-05 18:23:03 - ERROR - stderr - +2025-02-05 18:23:03 - ERROR - stderr - +2025-02-05 18:23:03 - INFO - stdout - {'loss': 0.7138, 'grad_norm': 0.9770349860191345, 'learning_rate': 1.5282721741578974e-05, 'epoch': 1.03} +2025-02-05 18:23:03 - ERROR - stderr - 34%|███▍ | 7699/22434 [8:15:22<10:25:37, 2.55s/it] +2025-02-05 18:23:05 - ERROR - stderr - 34%|███▍ | 7700/22434 [8:15:25<10:24:48, 2.54s/it] +2025-02-05 18:23:05 - ERROR - stderr - +2025-02-05 18:23:05 - ERROR - stderr - +2025-02-05 18:23:05 - INFO - stdout - {'loss': 0.7632, 'grad_norm': 1.1624467372894287, 'learning_rate': 1.5281495837230654e-05, 'epoch': 1.03} +2025-02-05 18:23:05 - ERROR - stderr - 34%|███▍ | 7700/22434 
[8:15:25<10:24:48, 2.54s/it] +2025-02-05 18:23:08 - ERROR - stderr - 34%|███▍ | 7701/22434 [8:15:27<10:25:45, 2.55s/it] +2025-02-05 18:23:08 - ERROR - stderr - +2025-02-05 18:23:08 - ERROR - stderr - +2025-02-05 18:23:08 - INFO - stdout - {'loss': 0.6974, 'grad_norm': 1.1058036088943481, 'learning_rate': 1.5280269822794607e-05, 'epoch': 1.03} +2025-02-05 18:23:08 - ERROR - stderr - 34%|███▍ | 7701/22434 [8:15:28<10:25:45, 2.55s/it] +2025-02-05 18:23:11 - ERROR - stderr - 34%|███▍ | 7702/22434 [8:15:30<10:51:08, 2.65s/it] +2025-02-05 18:23:11 - ERROR - stderr - +2025-02-05 18:23:11 - ERROR - stderr - +2025-02-05 18:23:11 - INFO - stdout - {'loss': 0.6077, 'grad_norm': 1.0494451522827148, 'learning_rate': 1.527904369829639e-05, 'epoch': 1.03} +2025-02-05 18:23:11 - ERROR - stderr - 34%|███▍ | 7702/22434 [8:15:30<10:51:08, 2.65s/it] +2025-02-05 18:23:13 - ERROR - stderr - 34%|███▍ | 7703/22434 [8:15:33<10:46:27, 2.63s/it] +2025-02-05 18:23:13 - ERROR - stderr - +2025-02-05 18:23:13 - ERROR - stderr - +2025-02-05 18:23:13 - INFO - stdout - {'loss': 0.6866, 'grad_norm': 1.0553343296051025, 'learning_rate': 1.5277817463761558e-05, 'epoch': 1.03} +2025-02-05 18:23:13 - ERROR - stderr - 34%|███▍ | 7703/22434 [8:15:33<10:46:27, 2.63s/it] +2025-02-05 18:23:16 - ERROR - stderr - 34%|███▍ | 7704/22434 [8:15:35<10:37:16, 2.60s/it] +2025-02-05 18:23:16 - ERROR - stderr - +2025-02-05 18:23:16 - ERROR - stderr - +2025-02-05 18:23:16 - INFO - stdout - {'loss': 0.8065, 'grad_norm': 1.1266316175460815, 'learning_rate': 1.527659111921567e-05, 'epoch': 1.03} +2025-02-05 18:23:16 - ERROR - stderr - 34%|███▍ | 7704/22434 [8:15:36<10:37:16, 2.60s/it] +2025-02-05 18:23:18 - ERROR - stderr - 34%|███▍ | 7705/22434 [8:15:38<10:29:42, 2.57s/it] +2025-02-05 18:23:18 - ERROR - stderr - +2025-02-05 18:23:18 - ERROR - stderr - +2025-02-05 18:23:18 - INFO - stdout - {'loss': 0.7601, 'grad_norm': 1.2385667562484741, 'learning_rate': 1.527536466468429e-05, 'epoch': 1.03} +2025-02-05 18:23:18 - ERROR 
- stderr - 34%|███▍ | 7705/22434 [8:15:38<10:29:42, 2.57s/it] +2025-02-05 18:23:21 - ERROR - stderr - 34%|███▍ | 7706/22434 [8:15:40<10:27:38, 2.56s/it] +2025-02-05 18:23:21 - ERROR - stderr - +2025-02-05 18:23:21 - ERROR - stderr - +2025-02-05 18:23:21 - INFO - stdout - {'loss': 0.7167, 'grad_norm': 1.1131343841552734, 'learning_rate': 1.527413810019298e-05, 'epoch': 1.03} +2025-02-05 18:23:21 - ERROR - stderr - 34%|███▍ | 7706/22434 [8:15:41<10:27:38, 2.56s/it] +2025-02-05 18:23:23 - ERROR - stderr - 34%|███▍ | 7707/22434 [8:15:43<10:24:34, 2.54s/it] +2025-02-05 18:23:23 - ERROR - stderr - +2025-02-05 18:23:23 - ERROR - stderr - +2025-02-05 18:23:23 - INFO - stdout - {'loss': 0.7108, 'grad_norm': 1.1246212720870972, 'learning_rate': 1.5272911425767315e-05, 'epoch': 1.03} +2025-02-05 18:23:23 - ERROR - stderr - 34%|███▍ | 7707/22434 [8:15:43<10:24:34, 2.54s/it] +2025-02-05 18:23:26 - ERROR - stderr - 34%|███▍ | 7708/22434 [8:15:46<10:24:58, 2.55s/it] +2025-02-05 18:23:26 - ERROR - stderr - +2025-02-05 18:23:26 - ERROR - stderr - +2025-02-05 18:23:26 - INFO - stdout - {'loss': 0.7182, 'grad_norm': 1.1344150304794312, 'learning_rate': 1.5271684641432848e-05, 'epoch': 1.03} +2025-02-05 18:23:26 - ERROR - stderr - 34%|███▍ | 7708/22434 [8:15:46<10:24:58, 2.55s/it] +2025-02-05 18:23:28 - ERROR - stderr - 34%|███▍ | 7709/22434 [8:15:48<10:21:01, 2.53s/it] +2025-02-05 18:23:28 - ERROR - stderr - +2025-02-05 18:23:28 - ERROR - stderr - +2025-02-05 18:23:28 - INFO - stdout - {'loss': 0.7624, 'grad_norm': 1.1381834745407104, 'learning_rate': 1.5270457747215164e-05, 'epoch': 1.03} +2025-02-05 18:23:28 - ERROR - stderr - 34%|███▍ | 7709/22434 [8:15:48<10:21:01, 2.53s/it] +2025-02-05 18:23:31 - ERROR - stderr - 34%|███▍ | 7710/22434 [8:15:51<10:28:55, 2.56s/it] +2025-02-05 18:23:31 - ERROR - stderr - +2025-02-05 18:23:31 - ERROR - stderr - +2025-02-05 18:23:31 - INFO - stdout - {'loss': 0.7766, 'grad_norm': 1.2702410221099854, 'learning_rate': 1.5269230743139828e-05, 'epoch':
1.03} +2025-02-05 18:23:31 - ERROR - stderr - 34%|███▍ | 7710/22434 [8:15:51<10:28:55, 2.56s/it] +2025-02-05 18:23:33 - ERROR - stderr - 34%|███▍ | 7711/22434 [8:15:53<10:24:54, 2.55s/it] +2025-02-05 18:23:33 - ERROR - stderr - +2025-02-05 18:23:33 - ERROR - stderr - +2025-02-05 18:23:33 - INFO - stdout - {'loss': 0.705, 'grad_norm': 1.3508851528167725, 'learning_rate': 1.5268003629232423e-05, 'epoch': 1.03} +2025-02-05 18:23:33 - ERROR - stderr - 34%|███▍ | 7711/22434 [8:15:53<10:24:54, 2.55s/it] +2025-02-05 18:23:36 - ERROR - stderr - 34%|███▍ | 7712/22434 [8:15:56<10:25:26, 2.55s/it] +2025-02-05 18:23:36 - ERROR - stderr - +2025-02-05 18:23:36 - ERROR - stderr - +2025-02-05 18:23:36 - INFO - stdout - {'loss': 0.6907, 'grad_norm': 1.0799938440322876, 'learning_rate': 1.5266776405518523e-05, 'epoch': 1.03} +2025-02-05 18:23:36 - ERROR - stderr - 34%|███▍ | 7712/22434 [8:15:56<10:25:26, 2.55s/it] +2025-02-05 18:23:38 - ERROR - stderr - 34%|███▍ | 7713/22434 [8:15:58<10:18:12, 2.52s/it] +2025-02-05 18:23:38 - ERROR - stderr - +2025-02-05 18:23:38 - ERROR - stderr - +2025-02-05 18:23:38 - INFO - stdout - {'loss': 0.6693, 'grad_norm': 1.0558677911758423, 'learning_rate': 1.5265549072023705e-05, 'epoch': 1.03} +2025-02-05 18:23:38 - ERROR - stderr - 34%|███▍ | 7713/22434 [8:15:58<10:18:12, 2.52s/it] +2025-02-05 18:23:41 - ERROR - stderr - 34%|███▍ | 7714/22434 [8:16:01<10:18:52, 2.52s/it] +2025-02-05 18:23:41 - ERROR - stderr - +2025-02-05 18:23:41 - ERROR - stderr - +2025-02-05 18:23:41 - INFO - stdout - {'loss': 0.8498, 'grad_norm': 1.340999722480774, 'learning_rate': 1.526432162877356e-05, 'epoch': 1.03} +2025-02-05 18:23:41 - ERROR - stderr - 34%|███▍ | 7714/22434 [8:16:01<10:18:52, 2.52s/it] +2025-02-05 18:23:44 - ERROR - stderr - 34%|███▍ | 7715/22434 [8:16:03<10:22:30, 2.54s/it] +2025-02-05 18:23:44 - ERROR - stderr - +2025-02-05 18:23:44 - ERROR - stderr - +2025-02-05 18:23:44 - INFO - stdout - {'loss': 0.7542, 'grad_norm': 1.1497763395309448, 'learning_rate': 
1.5263094075793667e-05, 'epoch': 1.03} +2025-02-05 18:23:44 - ERROR - stderr - 34%|███▍ | 7715/22434 [8:16:03<10:22:30, 2.54s/it] +2025-02-05 18:23:46 - ERROR - stderr - 34%|███▍ | 7716/22434 [8:16:06<10:18:34, 2.52s/it] +2025-02-05 18:23:46 - ERROR - stderr - +2025-02-05 18:23:46 - ERROR - stderr - +2025-02-05 18:23:46 - INFO - stdout - {'loss': 0.7285, 'grad_norm': 1.188692569732666, 'learning_rate': 1.526186641310961e-05, 'epoch': 1.03} +2025-02-05 18:23:46 - ERROR - stderr - 34%|███▍ | 7716/22434 [8:16:06<10:18:34, 2.52s/it] +2025-02-05 18:23:49 - ERROR - stderr - 34%|███▍ | 7717/22434 [8:16:08<10:20:23, 2.53s/it] +2025-02-05 18:23:49 - ERROR - stderr - +2025-02-05 18:23:49 - ERROR - stderr - +2025-02-05 18:23:49 - INFO - stdout - {'loss': 0.7877, 'grad_norm': 1.1685937643051147, 'learning_rate': 1.526063864074699e-05, 'epoch': 1.03} +2025-02-05 18:23:49 - ERROR - stderr - 34%|███▍ | 7717/22434 [8:16:08<10:20:23, 2.53s/it] +2025-02-05 18:23:51 - ERROR - stderr - 34%|███▍ | 7718/22434 [8:16:11<10:31:21, 2.57s/it] +2025-02-05 18:23:51 - ERROR - stderr - +2025-02-05 18:23:51 - ERROR - stderr - +2025-02-05 18:23:51 - INFO - stdout - {'loss': 0.6436, 'grad_norm': 0.9850782155990601, 'learning_rate': 1.5259410758731384e-05, 'epoch': 1.03} +2025-02-05 18:23:51 - ERROR - stderr - 34%|███▍ | 7718/22434 [8:16:11<10:31:21, 2.57s/it] +2025-02-05 18:23:54 - ERROR - stderr - 34%|███▍ | 7719/22434 [8:16:13<10:23:49, 2.54s/it] +2025-02-05 18:23:54 - ERROR - stderr - +2025-02-05 18:23:54 - ERROR - stderr - +2025-02-05 18:23:54 - INFO - stdout - {'loss': 0.7555, 'grad_norm': 1.1593655347824097, 'learning_rate': 1.5258182767088397e-05, 'epoch': 1.03} +2025-02-05 18:23:54 - ERROR - stderr - 34%|███▍ | 7719/22434 [8:16:14<10:23:49, 2.54s/it] +2025-02-05 18:23:56 - ERROR - stderr - 34%|███▍ | 7720/22434 [8:16:16<10:26:43, 2.56s/it] +2025-02-05 18:23:56 - ERROR - stderr - +2025-02-05 18:23:56 - ERROR - stderr - +2025-02-05 18:23:56 - INFO - stdout - {'loss': 0.6778, 'grad_norm': 
1.1131411790847778, 'learning_rate': 1.5256954665843622e-05, 'epoch': 1.03} +2025-02-05 18:23:56 - ERROR - stderr - 34%|███▍ | 7720/22434 [8:16:16<10:26:43, 2.56s/it] +2025-02-05 18:23:59 - ERROR - stderr - 34%|███▍ | 7721/22434 [8:16:19<10:22:37, 2.54s/it] +2025-02-05 18:23:59 - ERROR - stderr - +2025-02-05 18:23:59 - ERROR - stderr - +2025-02-05 18:23:59 - INFO - stdout - {'loss': 0.7417, 'grad_norm': 1.0048941373825073, 'learning_rate': 1.5255726455022655e-05, 'epoch': 1.03} +2025-02-05 18:23:59 - ERROR - stderr - 34%|███▍ | 7721/22434 [8:16:19<10:22:37, 2.54s/it] +2025-02-05 18:24:01 - ERROR - stderr - 34%|███▍ | 7722/22434 [8:16:21<10:21:43, 2.54s/it] +2025-02-05 18:24:01 - ERROR - stderr - +2025-02-05 18:24:01 - ERROR - stderr - +2025-02-05 18:24:01 - INFO - stdout - {'loss': 0.7162, 'grad_norm': 1.0125313997268677, 'learning_rate': 1.5254498134651102e-05, 'epoch': 1.03} +2025-02-05 18:24:01 - ERROR - stderr - 34%|███▍ | 7722/22434 [8:16:21<10:21:43, 2.54s/it] +2025-02-05 18:24:04 - ERROR - stderr - 34%|███▍ | 7723/22434 [8:16:24<10:15:14, 2.51s/it] +2025-02-05 18:24:04 - ERROR - stderr - +2025-02-05 18:24:04 - ERROR - stderr - +2025-02-05 18:24:04 - INFO - stdout - {'loss': 0.775, 'grad_norm': 1.1664342880249023, 'learning_rate': 1.5253269704754564e-05, 'epoch': 1.03} +2025-02-05 18:24:04 - ERROR - stderr - 34%|███▍ | 7723/22434 [8:16:24<10:15:14, 2.51s/it] +2025-02-05 18:24:06 - ERROR - stderr - 34%|███▍ | 7724/22434 [8:16:26<10:14:18, 2.51s/it] +2025-02-05 18:24:06 - ERROR - stderr - +2025-02-05 18:24:06 - ERROR - stderr - +2025-02-05 18:24:06 - INFO - stdout - {'loss': 0.7525, 'grad_norm': 1.1665126085281372, 'learning_rate': 1.5252041165358642e-05, 'epoch': 1.03} +2025-02-05 18:24:06 - ERROR - stderr - 34%|███▍ | 7724/22434 [8:16:26<10:14:18, 2.51s/it] +2025-02-05 18:24:09 - ERROR - stderr - 34%|███▍ | 7725/22434 [8:16:29<10:28:05, 2.56s/it] +2025-02-05 18:24:09 - ERROR - stderr - +2025-02-05 18:24:09 - ERROR - stderr - +2025-02-05 18:24:09 - INFO - 
stdout - {'loss': 0.8173, 'grad_norm': 1.1431002616882324, 'learning_rate': 1.5250812516488949e-05, 'epoch': 1.03} +2025-02-05 18:24:09 - ERROR - stderr - 34%|███▍ | 7725/22434 [8:16:29<10:28:05, 2.56s/it] +2025-02-05 18:24:11 - ERROR - stderr - 34%|███▍ | 7726/22434 [8:16:31<10:25:01, 2.55s/it] +2025-02-05 18:24:12 - ERROR - stderr - +2025-02-05 18:24:12 - ERROR - stderr - +2025-02-05 18:24:12 - INFO - stdout - {'loss': 0.7813, 'grad_norm': 1.1822288036346436, 'learning_rate': 1.5249583758171094e-05, 'epoch': 1.03} +2025-02-05 18:24:12 - ERROR - stderr - 34%|███▍ | 7726/22434 [8:16:31<10:25:01, 2.55s/it] +2025-02-05 18:24:14 - ERROR - stderr - 34%|███▍ | 7727/22434 [8:16:34<10:21:13, 2.53s/it] +2025-02-05 18:24:14 - ERROR - stderr - +2025-02-05 18:24:14 - ERROR - stderr - +2025-02-05 18:24:14 - INFO - stdout - {'loss': 0.8046, 'grad_norm': 1.2135124206542969, 'learning_rate': 1.5248354890430693e-05, 'epoch': 1.03} +2025-02-05 18:24:14 - ERROR - stderr - 34%|███▍ | 7727/22434 [8:16:34<10:21:13, 2.53s/it] +2025-02-05 18:24:16 - ERROR - stderr - 34%|███▍ | 7728/22434 [8:16:36<10:15:27, 2.51s/it] +2025-02-05 18:24:17 - ERROR - stderr - +2025-02-05 18:24:17 - ERROR - stderr - +2025-02-05 18:24:17 - INFO - stdout - {'loss': 0.7193, 'grad_norm': 1.020592212677002, 'learning_rate': 1.524712591329335e-05, 'epoch': 1.03} +2025-02-05 18:24:17 - ERROR - stderr - 34%|███▍ | 7728/22434 [8:16:36<10:15:27, 2.51s/it] +2025-02-05 18:24:19 - ERROR - stderr - 34%|███▍ | 7729/22434 [8:16:39<10:18:37, 2.52s/it] +2025-02-05 18:24:19 - ERROR - stderr - +2025-02-05 18:24:19 - ERROR - stderr - +2025-02-05 18:24:19 - INFO - stdout - {'loss': 0.7367, 'grad_norm': 1.0275360345840454, 'learning_rate': 1.5245896826784689e-05, 'epoch': 1.03} +2025-02-05 18:24:19 - ERROR - stderr - 34%|███▍ | 7729/22434 [8:16:39<10:18:37, 2.52s/it] +2025-02-05 18:24:22 - ERROR - stderr - 34%|███▍ | 7730/22434 [8:16:41<10:29:38, 2.57s/it] +2025-02-05 18:24:22 - ERROR - stderr - +2025-02-05 18:24:22 - ERROR - 
stderr - +2025-02-05 18:24:22 - INFO - stdout - {'loss': 0.7077, 'grad_norm': 1.2109256982803345, 'learning_rate': 1.5244667630930332e-05, 'epoch': 1.03} +2025-02-05 18:24:22 - ERROR - stderr - 34%|███▍ | 7730/22434 [8:16:42<10:29:38, 2.57s/it] +2025-02-05 18:24:24 - ERROR - stderr - 34%|███▍ | 7731/22434 [8:16:44<10:23:37, 2.54s/it] +2025-02-05 18:24:24 - ERROR - stderr - +2025-02-05 18:24:24 - ERROR - stderr - +2025-02-05 18:24:24 - INFO - stdout - {'loss': 0.7654, 'grad_norm': 1.0550034046173096, 'learning_rate': 1.5243438325755894e-05, 'epoch': 1.03} +2025-02-05 18:24:24 - ERROR - stderr - 34%|███▍ | 7731/22434 [8:16:44<10:23:37, 2.54s/it] +2025-02-05 18:24:27 - ERROR - stderr - 34%|███▍ | 7732/22434 [8:16:46<10:23:41, 2.55s/it] +2025-02-05 18:24:27 - ERROR - stderr - +2025-02-05 18:24:27 - ERROR - stderr - +2025-02-05 18:24:27 - INFO - stdout - {'loss': 0.7312, 'grad_norm': 1.1888256072998047, 'learning_rate': 1.5242208911287005e-05, 'epoch': 1.03} +2025-02-05 18:24:27 - ERROR - stderr - 34%|███▍ | 7732/22434 [8:16:47<10:23:41, 2.55s/it] +2025-02-05 18:24:29 - ERROR - stderr - 34%|███▍ | 7733/22434 [8:16:49<10:19:53, 2.53s/it] +2025-02-05 18:24:29 - ERROR - stderr - +2025-02-05 18:24:29 - ERROR - stderr - +2025-02-05 18:24:29 - INFO - stdout - {'loss': 0.9253, 'grad_norm': 1.3184868097305298, 'learning_rate': 1.5240979387549284e-05, 'epoch': 1.03} +2025-02-05 18:24:29 - ERROR - stderr - 34%|███▍ | 7733/22434 [8:16:49<10:19:53, 2.53s/it] +2025-02-05 18:24:32 - ERROR - stderr - 34%|███▍ | 7734/22434 [8:16:52<10:20:00, 2.53s/it] +2025-02-05 18:24:32 - ERROR - stderr - +2025-02-05 18:24:32 - ERROR - stderr - +2025-02-05 18:24:32 - INFO - stdout - {'loss': 0.7321, 'grad_norm': 1.164965033531189, 'learning_rate': 1.5239749754568362e-05, 'epoch': 1.03} +2025-02-05 18:24:32 - ERROR - stderr - 34%|███▍ | 7734/22434 [8:16:52<10:20:00, 2.53s/it] +2025-02-05 18:24:34 - ERROR - stderr - 34%|███▍ | 7735/22434 [8:16:54<10:19:41, 2.53s/it] +2025-02-05 18:24:34 - ERROR -
stderr - +2025-02-05 18:24:34 - ERROR - stderr - +2025-02-05 18:24:34 - INFO - stdout - {'loss': 0.7279, 'grad_norm': 1.0995858907699585, 'learning_rate': 1.5238520012369872e-05, 'epoch': 1.03} +2025-02-05 18:24:34 - ERROR - stderr - 34%|███▍ | 7735/22434 [8:16:54<10:19:41, 2.53s/it] +2025-02-05 18:24:37 - ERROR - stderr - 34%|███▍ | 7736/22434 [8:16:57<10:21:05, 2.54s/it] +2025-02-05 18:24:37 - ERROR - stderr - +2025-02-05 18:24:37 - ERROR - stderr - +2025-02-05 18:24:37 - INFO - stdout - {'loss': 0.8036, 'grad_norm': 1.1932474374771118, 'learning_rate': 1.5237290160979448e-05, 'epoch': 1.03} +2025-02-05 18:24:37 - ERROR - stderr - 34%|███▍ | 7736/22434 [8:16:57<10:21:05, 2.54s/it] +2025-02-05 18:24:39 - ERROR - stderr - 34%|███▍ | 7737/22434 [8:16:59<10:19:40, 2.53s/it] +2025-02-05 18:24:39 - ERROR - stderr - +2025-02-05 18:24:39 - ERROR - stderr - +2025-02-05 18:24:39 - INFO - stdout - {'loss': 0.7739, 'grad_norm': 1.1842924356460571, 'learning_rate': 1.523606020042272e-05, 'epoch': 1.03} +2025-02-05 18:24:39 - ERROR - stderr - 34%|███▍ | 7737/22434 [8:16:59<10:19:40, 2.53s/it] +2025-02-05 18:24:42 - ERROR - stderr - 34%|███▍ | 7738/22434 [8:17:02<10:13:49, 2.51s/it] +2025-02-05 18:24:42 - ERROR - stderr - +2025-02-05 18:24:42 - ERROR - stderr - +2025-02-05 18:24:42 - INFO - stdout - {'loss': 0.823, 'grad_norm': 1.1720792055130005, 'learning_rate': 1.5234830130725325e-05, 'epoch': 1.03} +2025-02-05 18:24:42 - ERROR - stderr - 34%|███▍ | 7738/22434 [8:17:02<10:13:49, 2.51s/it] +2025-02-05 18:24:45 - ERROR - stderr - 34%|███▍ | 7739/22434 [8:17:04<10:37:55, 2.60s/it] +2025-02-05 18:24:45 - ERROR - stderr - +2025-02-05 18:24:45 - ERROR - stderr - +2025-02-05 18:24:45 - INFO - stdout - {'loss': 0.7501, 'grad_norm': 1.2058141231536865, 'learning_rate': 1.5233599951912905e-05, 'epoch': 1.03} +2025-02-05 18:24:45 - ERROR - stderr - 34%|███▍ | 7739/22434 [8:17:04<10:37:55, 2.60s/it] +2025-02-05 18:24:47 - ERROR - stderr - 35%|███▍ | 7740/22434 [8:17:07<10:35:30, 
2.60s/it] +2025-02-05 18:24:47 - ERROR - stderr - +2025-02-05 18:24:47 - ERROR - stderr - +2025-02-05 18:24:47 - INFO - stdout - {'loss': 0.7775, 'grad_norm': 1.0340498685836792, 'learning_rate': 1.5232369664011106e-05, 'epoch': 1.04} +2025-02-05 18:24:47 - ERROR - stderr - 35%|███▍ | 7740/22434 [8:17:07<10:35:30, 2.60s/it] +2025-02-05 18:24:50 - ERROR - stderr - 35%|███▍ | 7741/22434 [8:17:09<10:27:43, 2.56s/it] +2025-02-05 18:24:50 - ERROR - stderr - +2025-02-05 18:24:50 - ERROR - stderr - +2025-02-05 18:24:50 - INFO - stdout - {'loss': 0.7607, 'grad_norm': 1.2109304666519165, 'learning_rate': 1.5231139267045567e-05, 'epoch': 1.04} +2025-02-05 18:24:50 - ERROR - stderr - 35%|███▍ | 7741/22434 [8:17:10<10:27:43, 2.56s/it] +2025-02-05 18:24:52 - ERROR - stderr - 35%|███▍ | 7742/22434 [8:17:12<10:24:39, 2.55s/it] +2025-02-05 18:24:52 - ERROR - stderr - +2025-02-05 18:24:52 - ERROR - stderr - +2025-02-05 18:24:52 - INFO - stdout - {'loss': 0.8178, 'grad_norm': 1.3134193420410156, 'learning_rate': 1.5229908761041934e-05, 'epoch': 1.04} +2025-02-05 18:24:52 - ERROR - stderr - 35%|███▍ | 7742/22434 [8:17:12<10:24:39, 2.55s/it] +2025-02-05 18:24:55 - ERROR - stderr - 35%|███▍ | 7743/22434 [8:17:14<10:15:04, 2.51s/it] +2025-02-05 18:24:55 - ERROR - stderr - +2025-02-05 18:24:55 - ERROR - stderr - +2025-02-05 18:24:55 - INFO - stdout - {'loss': 0.7065, 'grad_norm': 1.170034408569336, 'learning_rate': 1.5228678146025856e-05, 'epoch': 1.04} +2025-02-05 18:24:55 - ERROR - stderr - 35%|███▍ | 7743/22434 [8:17:14<10:15:04, 2.51s/it] +2025-02-05 18:24:57 - ERROR - stderr - 35%|███▍ | 7744/22434 [8:17:17<10:17:18, 2.52s/it] +2025-02-05 18:24:57 - ERROR - stderr - +2025-02-05 18:24:57 - ERROR - stderr - +2025-02-05 18:24:57 - INFO - stdout - {'loss': 0.6384, 'grad_norm': 1.1636693477630615, 'learning_rate': 1.5227447422022991e-05, 'epoch': 1.04} +2025-02-05 18:24:57 - ERROR - stderr - 35%|███▍ | 7744/22434 [8:17:17<10:17:18, 2.52s/it] +2025-02-05 18:25:00 - ERROR - stderr - 
35%|███▍ | 7745/22434 [8:17:19<10:16:27, 2.52s/it] +2025-02-05 18:25:00 - ERROR - stderr - +2025-02-05 18:25:00 - ERROR - stderr - +2025-02-05 18:25:00 - INFO - stdout - {'loss': 0.7299, 'grad_norm': 1.1211869716644287, 'learning_rate': 1.5226216589058982e-05, 'epoch': 1.04} +2025-02-05 18:25:00 - ERROR - stderr - 35%|███▍ | 7745/22434 [8:17:19<10:16:27, 2.52s/it] +2025-02-05 18:25:02 - ERROR - stderr - 35%|███▍ | 7746/22434 [8:17:22<10:20:44, 2.54s/it] +2025-02-05 18:25:02 - ERROR - stderr - +2025-02-05 18:25:02 - ERROR - stderr - +2025-02-05 18:25:02 - INFO - stdout - {'loss': 0.826, 'grad_norm': 1.2007683515548706, 'learning_rate': 1.5224985647159489e-05, 'epoch': 1.04} +2025-02-05 18:25:02 - ERROR - stderr - 35%|███▍ | 7746/22434 [8:17:22<10:20:44, 2.54s/it] +2025-02-05 18:25:05 - ERROR - stderr - 35%|███▍ | 7747/22434 [8:17:25<10:21:08, 2.54s/it] +2025-02-05 18:25:05 - ERROR - stderr - +2025-02-05 18:25:05 - ERROR - stderr - +2025-02-05 18:25:05 - INFO - stdout - {'loss': 0.8739, 'grad_norm': 1.2099323272705078, 'learning_rate': 1.5223754596350171e-05, 'epoch': 1.04} +2025-02-05 18:25:05 - ERROR - stderr - 35%|███▍ | 7747/22434 [8:17:25<10:21:08, 2.54s/it] +2025-02-05 18:25:07 - ERROR - stderr - 35%|███▍ | 7748/22434 [8:17:27<10:21:22, 2.54s/it] +2025-02-05 18:25:07 - ERROR - stderr - +2025-02-05 18:25:07 - ERROR - stderr - +2025-02-05 18:25:07 - INFO - stdout - {'loss': 0.6837, 'grad_norm': 1.0444656610488892, 'learning_rate': 1.5222523436656689e-05, 'epoch': 1.04} +2025-02-05 18:25:07 - ERROR - stderr - 35%|███▍ | 7748/22434 [8:17:27<10:21:22, 2.54s/it] +2025-02-05 18:25:10 - ERROR - stderr - 35%|███▍ | 7749/22434 [8:17:30<10:21:33, 2.54s/it] +2025-02-05 18:25:10 - ERROR - stderr - +2025-02-05 18:25:10 - ERROR - stderr - +2025-02-05 18:25:10 - INFO - stdout - {'loss': 0.7844, 'grad_norm': 1.2492817640304565, 'learning_rate': 1.5221292168104702e-05, 'epoch': 1.04} +2025-02-05 18:25:10 - ERROR - stderr - 35%|███▍ | 7749/22434 [8:17:30<10:21:33, 2.54s/it] 
+2025-02-05 18:25:12 - ERROR - stderr - 35%|███▍ | 7750/22434 [8:17:32<10:20:34, 2.54s/it] +2025-02-05 18:25:12 - ERROR - stderr - +2025-02-05 18:25:12 - ERROR - stderr - +2025-02-05 18:25:12 - INFO - stdout - {'loss': 0.7288, 'grad_norm': 1.225478172302246, 'learning_rate': 1.5220060790719875e-05, 'epoch': 1.04} +2025-02-05 18:25:12 - ERROR - stderr - 35%|███▍ | 7750/22434 [8:17:32<10:20:34, 2.54s/it] +2025-02-05 18:25:15 - ERROR - stderr - 35%|███▍ | 7751/22434 [8:17:35<10:18:23, 2.53s/it] +2025-02-05 18:25:15 - ERROR - stderr - +2025-02-05 18:25:15 - ERROR - stderr - +2025-02-05 18:25:15 - INFO - stdout - {'loss': 0.7512, 'grad_norm': 1.169121265411377, 'learning_rate': 1.5218829304527875e-05, 'epoch': 1.04} +2025-02-05 18:25:15 - ERROR - stderr - 35%|███▍ | 7751/22434 [8:17:35<10:18:23, 2.53s/it] +2025-02-05 18:25:18 - ERROR - stderr - 35%|███▍ | 7752/22434 [8:17:37<10:23:48, 2.55s/it] +2025-02-05 18:25:18 - ERROR - stderr - +2025-02-05 18:25:18 - ERROR - stderr - +2025-02-05 18:25:18 - INFO - stdout - {'loss': 0.7318, 'grad_norm': 1.0508670806884766, 'learning_rate': 1.5217597709554377e-05, 'epoch': 1.04} +2025-02-05 18:25:18 - ERROR - stderr - 35%|███▍ | 7752/22434 [8:17:37<10:23:48, 2.55s/it] +2025-02-05 18:25:20 - ERROR - stderr - 35%|███▍ | 7753/22434 [8:17:40<10:17:44, 2.52s/it] +2025-02-05 18:25:20 - ERROR - stderr - +2025-02-05 18:25:20 - ERROR - stderr - +2025-02-05 18:25:20 - INFO - stdout - {'loss': 0.7097, 'grad_norm': 1.0857781171798706, 'learning_rate': 1.5216366005825043e-05, 'epoch': 1.04} +2025-02-05 18:25:20 - ERROR - stderr - 35%|███▍ | 7753/22434 [8:17:40<10:17:44, 2.52s/it] +2025-02-05 18:25:22 - ERROR - stderr - 35%|███▍ | 7754/22434 [8:17:42<10:09:23, 2.49s/it] +2025-02-05 18:25:22 - ERROR - stderr - +2025-02-05 18:25:22 - ERROR - stderr - +2025-02-05 18:25:22 - INFO - stdout - {'loss': 0.6957, 'grad_norm': 1.1035232543945312, 'learning_rate': 1.521513419336555e-05, 'epoch': 1.04} +2025-02-05 18:25:22 - ERROR - stderr - 35%|███▍ | 
7754/22434 [8:17:42<10:09:23, 2.49s/it] +2025-02-05 18:25:25 - ERROR - stderr - 35%|███▍ | 7755/22434 [8:17:45<10:09:17, 2.49s/it] +2025-02-05 18:25:25 - ERROR - stderr - +2025-02-05 18:25:25 - ERROR - stderr - +2025-02-05 18:25:25 - INFO - stdout - {'loss': 0.7067, 'grad_norm': 1.1231297254562378, 'learning_rate': 1.5213902272201577e-05, 'epoch': 1.04} +2025-02-05 18:25:25 - ERROR - stderr - 35%|███▍ | 7755/22434 [8:17:45<10:09:17, 2.49s/it] +2025-02-05 18:25:27 - ERROR - stderr - 35%|███▍ | 7756/22434 [8:17:47<10:05:03, 2.47s/it] +2025-02-05 18:25:27 - ERROR - stderr - +2025-02-05 18:25:27 - ERROR - stderr - +2025-02-05 18:25:27 - INFO - stdout - {'loss': 0.7353, 'grad_norm': 1.0912600755691528, 'learning_rate': 1.52126702423588e-05, 'epoch': 1.04} +2025-02-05 18:25:27 - ERROR - stderr - 35%|███▍ | 7756/22434 [8:17:47<10:05:03, 2.47s/it] +2025-02-05 18:25:30 - ERROR - stderr - 35%|███▍ | 7757/22434 [8:17:50<10:09:00, 2.49s/it] +2025-02-05 18:25:30 - ERROR - stderr - +2025-02-05 18:25:30 - ERROR - stderr - +2025-02-05 18:25:30 - INFO - stdout - {'loss': 0.6715, 'grad_norm': 1.0572762489318848, 'learning_rate': 1.52114381038629e-05, 'epoch': 1.04} +2025-02-05 18:25:30 - ERROR - stderr - 35%|███▍ | 7757/22434 [8:17:50<10:09:00, 2.49s/it] +2025-02-05 18:25:32 - ERROR - stderr - 35%|███▍ | 7758/22434 [8:17:52<10:10:00, 2.49s/it] +2025-02-05 18:25:32 - ERROR - stderr - +2025-02-05 18:25:32 - ERROR - stderr - +2025-02-05 18:25:32 - INFO - stdout - {'loss': 0.6999, 'grad_norm': 1.1576606035232544, 'learning_rate': 1.5210205856739561e-05, 'epoch': 1.04} +2025-02-05 18:25:32 - ERROR - stderr - 35%|███▍ | 7758/22434 [8:17:52<10:10:00, 2.49s/it] +2025-02-05 18:25:35 - ERROR - stderr - 35%|███▍ | 7759/22434 [8:17:55<10:16:54, 2.52s/it] +2025-02-05 18:25:35 - ERROR - stderr - +2025-02-05 18:25:35 - ERROR - stderr - +2025-02-05 18:25:35 - INFO - stdout - {'loss': 0.7705, 'grad_norm': 1.1035009622573853, 'learning_rate': 1.5208973501014466e-05, 'epoch': 1.04} +2025-02-05 
18:25:35 - ERROR - stderr - 35%|███▍ | 7759/22434 [8:17:55<10:16:54, 2.52s/it] +2025-02-05 18:25:37 - ERROR - stderr - 35%|███▍ | 7760/22434 [8:17:57<10:15:41, 2.52s/it] +2025-02-05 18:25:37 - ERROR - stderr - +2025-02-05 18:25:37 - ERROR - stderr - +2025-02-05 18:25:37 - INFO - stdout - {'loss': 0.728, 'grad_norm': 1.160325050354004, 'learning_rate': 1.5207741036713304e-05, 'epoch': 1.04} +2025-02-05 18:25:37 - ERROR - stderr - 35%|███▍ | 7760/22434 [8:17:57<10:15:41, 2.52s/it] +2025-02-05 18:25:40 - ERROR - stderr - 35%|███▍ | 7761/22434 [8:18:00<10:16:29, 2.52s/it] +2025-02-05 18:25:40 - ERROR - stderr - +2025-02-05 18:25:40 - ERROR - stderr - +2025-02-05 18:25:40 - INFO - stdout - {'loss': 0.7592, 'grad_norm': 1.1255453824996948, 'learning_rate': 1.5206508463861759e-05, 'epoch': 1.04} +2025-02-05 18:25:40 - ERROR - stderr - 35%|███▍ | 7761/22434 [8:18:00<10:16:29, 2.52s/it] +2025-02-05 18:25:42 - ERROR - stderr - 35%|███▍ | 7762/22434 [8:18:02<10:09:33, 2.49s/it] +2025-02-05 18:25:42 - ERROR - stderr - +2025-02-05 18:25:42 - ERROR - stderr - +2025-02-05 18:25:42 - INFO - stdout - {'loss': 0.7424, 'grad_norm': 1.2259798049926758, 'learning_rate': 1.520527578248553e-05, 'epoch': 1.04} +2025-02-05 18:25:42 - ERROR - stderr - 35%|███▍ | 7762/22434 [8:18:02<10:09:33, 2.49s/it] +2025-02-05 18:25:45 - ERROR - stderr - 35%|███▍ | 7763/22434 [8:18:05<10:07:59, 2.49s/it] +2025-02-05 18:25:45 - ERROR - stderr - +2025-02-05 18:25:45 - ERROR - stderr - +2025-02-05 18:25:45 - INFO - stdout - {'loss': 0.6897, 'grad_norm': 1.0097345113754272, 'learning_rate': 1.5204042992610308e-05, 'epoch': 1.04} +2025-02-05 18:25:45 - ERROR - stderr - 35%|███▍ | 7763/22434 [8:18:05<10:07:59, 2.49s/it] +2025-02-05 18:25:47 - ERROR - stderr - 35%|███▍ | 7764/22434 [8:18:07<10:11:40, 2.50s/it] +2025-02-05 18:25:47 - ERROR - stderr - +2025-02-05 18:25:47 - ERROR - stderr - +2025-02-05 18:25:47 - INFO - stdout - {'loss': 0.6248, 'grad_norm': 1.0380107164382935, 'learning_rate': 
1.520281009426179e-05, 'epoch': 1.04} +2025-02-05 18:25:47 - ERROR - stderr - 35%|███▍ | 7764/22434 [8:18:07<10:11:40, 2.50s/it] +2025-02-05 18:25:50 - ERROR - stderr - 35%|███▍ | 7765/22434 [8:18:10<10:09:07, 2.49s/it] +2025-02-05 18:25:50 - ERROR - stderr - +2025-02-05 18:25:50 - ERROR - stderr - +2025-02-05 18:25:50 - INFO - stdout - {'loss': 0.713, 'grad_norm': 1.164419174194336, 'learning_rate': 1.5201577087465673e-05, 'epoch': 1.04} +2025-02-05 18:25:50 - ERROR - stderr - 35%|███▍ | 7765/22434 [8:18:10<10:09:07, 2.49s/it] +2025-02-05 18:25:52 - ERROR - stderr - 35%|███▍ | 7766/22434 [8:18:12<10:10:43, 2.50s/it] +2025-02-05 18:25:52 - ERROR - stderr - +2025-02-05 18:25:52 - ERROR - stderr - +2025-02-05 18:25:52 - INFO - stdout - {'loss': 0.6972, 'grad_norm': 1.202943205833435, 'learning_rate': 1.520034397224766e-05, 'epoch': 1.04} +2025-02-05 18:25:52 - ERROR - stderr - 35%|███▍ | 7766/22434 [8:18:12<10:10:43, 2.50s/it] +2025-02-05 18:25:55 - ERROR - stderr - 35%|███▍ | 7767/22434 [8:18:15<10:20:18, 2.54s/it] +2025-02-05 18:25:55 - ERROR - stderr - +2025-02-05 18:25:55 - ERROR - stderr - +2025-02-05 18:25:55 - INFO - stdout - {'loss': 0.7052, 'grad_norm': 1.2013190984725952, 'learning_rate': 1.5199110748633452e-05, 'epoch': 1.04} +2025-02-05 18:25:55 - ERROR - stderr - 35%|███▍ | 7767/22434 [8:18:15<10:20:18, 2.54s/it] +2025-02-05 18:25:58 - ERROR - stderr - 35%|███▍ | 7768/22434 [8:18:17<10:16:04, 2.52s/it] +2025-02-05 18:25:58 - ERROR - stderr - +2025-02-05 18:25:58 - ERROR - stderr - +2025-02-05 18:25:58 - INFO - stdout - {'loss': 0.6821, 'grad_norm': 1.022189974784851, 'learning_rate': 1.5197877416648757e-05, 'epoch': 1.04} +2025-02-05 18:25:58 - ERROR - stderr - 35%|███▍ | 7768/22434 [8:18:17<10:16:04, 2.52s/it] +2025-02-05 18:26:00 - ERROR - stderr - 35%|███▍ | 7769/22434 [8:18:20<10:18:16, 2.53s/it] +2025-02-05 18:26:00 - ERROR - stderr - +2025-02-05 18:26:00 - ERROR - stderr - +2025-02-05 18:26:00 - INFO - stdout - {'loss': 0.7354, 'grad_norm': 
1.1417109966278076, 'learning_rate': 1.5196643976319281e-05, 'epoch': 1.04} +2025-02-05 18:26:00 - ERROR - stderr - 35%|███▍ | 7769/22434 [8:18:20<10:18:16, 2.53s/it] +2025-02-05 18:26:03 - ERROR - stderr - 35%|███▍ | 7770/22434 [8:18:23<10:50:15, 2.66s/it] +2025-02-05 18:26:03 - ERROR - stderr - +2025-02-05 18:26:03 - ERROR - stderr - +2025-02-05 18:26:03 - INFO - stdout - {'loss': 0.7125, 'grad_norm': 1.1212451457977295, 'learning_rate': 1.519541042767073e-05, 'epoch': 1.04} +2025-02-05 18:26:03 - ERROR - stderr - 35%|███▍ | 7770/22434 [8:18:23<10:50:15, 2.66s/it] +2025-02-05 18:26:06 - ERROR - stderr - 35%|███▍ | 7771/22434 [8:18:25<10:37:51, 2.61s/it] +2025-02-05 18:26:06 - ERROR - stderr - +2025-02-05 18:26:06 - ERROR - stderr - +2025-02-05 18:26:06 - INFO - stdout - {'loss': 0.7927, 'grad_norm': 1.1309548616409302, 'learning_rate': 1.5194176770728826e-05, 'epoch': 1.04} +2025-02-05 18:26:06 - ERROR - stderr - 35%|███▍ | 7771/22434 [8:18:25<10:37:51, 2.61s/it] +2025-02-05 18:26:08 - ERROR - stderr - 35%|███▍ | 7772/22434 [8:18:28<10:33:47, 2.59s/it] +2025-02-05 18:26:08 - ERROR - stderr - +2025-02-05 18:26:08 - ERROR - stderr - +2025-02-05 18:26:08 - INFO - stdout - {'loss': 0.7471, 'grad_norm': 1.1652313470840454, 'learning_rate': 1.5192943005519274e-05, 'epoch': 1.04} +2025-02-05 18:26:08 - ERROR - stderr - 35%|███▍ | 7772/22434 [8:18:28<10:33:47, 2.59s/it] +2025-02-05 18:26:11 - ERROR - stderr - 35%|███▍ | 7773/22434 [8:18:30<10:32:43, 2.59s/it] +2025-02-05 18:26:11 - ERROR - stderr - +2025-02-05 18:26:11 - ERROR - stderr - +2025-02-05 18:26:11 - INFO - stdout - {'loss': 0.7548, 'grad_norm': 1.1374988555908203, 'learning_rate': 1.5191709132067795e-05, 'epoch': 1.04} +2025-02-05 18:26:11 - ERROR - stderr - 35%|███▍ | 7773/22434 [8:18:30<10:32:43, 2.59s/it] +2025-02-05 18:26:13 - ERROR - stderr - 35%|███▍ | 7774/22434 [8:18:33<10:44:46, 2.64s/it] +2025-02-05 18:26:13 - ERROR - stderr - +2025-02-05 18:26:13 - ERROR - stderr - +2025-02-05 18:26:13 - INFO - 
stdout - {'loss': 0.7564, 'grad_norm': 1.2150187492370605, 'learning_rate': 1.5190475150400107e-05, 'epoch': 1.04} +2025-02-05 18:26:13 - ERROR - stderr - 35%|███▍ | 7774/22434 [8:18:33<10:44:46, 2.64s/it] +2025-02-05 18:26:16 - ERROR - stderr - 35%|███▍ | 7775/22434 [8:18:36<10:38:59, 2.62s/it] +2025-02-05 18:26:16 - ERROR - stderr - +2025-02-05 18:26:16 - ERROR - stderr - +2025-02-05 18:26:16 - INFO - stdout - {'loss': 0.6141, 'grad_norm': 1.0521843433380127, 'learning_rate': 1.5189241060541928e-05, 'epoch': 1.04} +2025-02-05 18:26:16 - ERROR - stderr - 35%|███▍ | 7775/22434 [8:18:36<10:38:59, 2.62s/it] +2025-02-05 18:26:18 - ERROR - stderr - 35%|███▍ | 7776/22434 [8:18:38<10:26:03, 2.56s/it] +2025-02-05 18:26:18 - ERROR - stderr - +2025-02-05 18:26:18 - ERROR - stderr - +2025-02-05 18:26:18 - INFO - stdout - {'loss': 0.7199, 'grad_norm': 1.165489912033081, 'learning_rate': 1.5188006862518992e-05, 'epoch': 1.04} +2025-02-05 18:26:18 - ERROR - stderr - 35%|███▍ | 7776/22434 [8:18:38<10:26:03, 2.56s/it] +2025-02-05 18:26:21 - ERROR - stderr - 35%|███▍ | 7777/22434 [8:18:41<10:16:31, 2.52s/it] +2025-02-05 18:26:21 - ERROR - stderr - +2025-02-05 18:26:21 - ERROR - stderr - +2025-02-05 18:26:21 - INFO - stdout - {'loss': 0.7287, 'grad_norm': 1.1734412908554077, 'learning_rate': 1.5186772556357012e-05, 'epoch': 1.04} +2025-02-05 18:26:21 - ERROR - stderr - 35%|███▍ | 7777/22434 [8:18:41<10:16:31, 2.52s/it] +2025-02-05 18:26:23 - ERROR - stderr - 35%|███▍ | 7778/22434 [8:18:43<10:12:12, 2.51s/it] +2025-02-05 18:26:23 - ERROR - stderr - +2025-02-05 18:26:23 - ERROR - stderr - +2025-02-05 18:26:23 - INFO - stdout - {'loss': 0.77, 'grad_norm': 1.1646074056625366, 'learning_rate': 1.5185538142081721e-05, 'epoch': 1.04} +2025-02-05 18:26:23 - ERROR - stderr - 35%|███▍ | 7778/22434 [8:18:43<10:12:12, 2.51s/it] +2025-02-05 18:26:26 - ERROR - stderr - 35%|███▍ | 7779/22434 [8:18:46<10:14:21, 2.52s/it] +2025-02-05 18:26:26 - ERROR - stderr - +2025-02-05 18:26:26 - ERROR - stderr 
- +2025-02-05 18:26:26 - INFO - stdout - {'loss': 0.9134, 'grad_norm': 1.2943426370620728, 'learning_rate': 1.5184303619718852e-05, 'epoch': 1.04} +2025-02-05 18:26:26 - ERROR - stderr - 35%|███▍ | 7779/22434 [8:18:46<10:14:21, 2.52s/it] +2025-02-05 18:26:28 - ERROR - stderr - 35%|███▍ | 7780/22434 [8:18:48<10:15:54, 2.52s/it] +2025-02-05 18:26:28 - ERROR - stderr - +2025-02-05 18:26:28 - ERROR - stderr - +2025-02-05 18:26:28 - INFO - stdout - {'loss': 0.767, 'grad_norm': 1.1749294996261597, 'learning_rate': 1.5183068989294133e-05, 'epoch': 1.04} +2025-02-05 18:26:28 - ERROR - stderr - 35%|███▍ | 7780/22434 [8:18:48<10:15:54, 2.52s/it] +2025-02-05 18:26:31 - ERROR - stderr - 35%|███▍ | 7781/22434 [8:18:51<10:09:36, 2.50s/it] +2025-02-05 18:26:31 - ERROR - stderr - +2025-02-05 18:26:31 - ERROR - stderr - +2025-02-05 18:26:31 - INFO - stdout - {'loss': 0.7161, 'grad_norm': 1.081661581993103, 'learning_rate': 1.51818342508333e-05, 'epoch': 1.04} +2025-02-05 18:26:31 - ERROR - stderr - 35%|███▍ | 7781/22434 [8:18:51<10:09:36, 2.50s/it] +2025-02-05 18:26:33 - ERROR - stderr - 35%|███▍ | 7782/22434 [8:18:53<10:07:57, 2.49s/it] +2025-02-05 18:26:33 - ERROR - stderr - +2025-02-05 18:26:33 - ERROR - stderr - +2025-02-05 18:26:33 - INFO - stdout - {'loss': 0.7882, 'grad_norm': 1.343120813369751, 'learning_rate': 1.5180599404362093e-05, 'epoch': 1.04} +2025-02-05 18:26:33 - ERROR - stderr - 35%|███▍ | 7782/22434 [8:18:53<10:07:57, 2.49s/it] +2025-02-05 18:26:36 - ERROR - stderr - 35%|███▍ | 7783/22434 [8:18:56<10:06:43, 2.48s/it] +2025-02-05 18:26:36 - ERROR - stderr - +2025-02-05 18:26:36 - ERROR - stderr - +2025-02-05 18:26:36 - INFO - stdout - {'loss': 0.794, 'grad_norm': 1.3312262296676636, 'learning_rate': 1.5179364449906246e-05, 'epoch': 1.04} +2025-02-05 18:26:36 - ERROR - stderr - 35%|███▍ | 7783/22434 [8:18:56<10:06:43, 2.48s/it] +2025-02-05 18:26:38 - ERROR - stderr - 35%|███▍ | 7784/22434 [8:18:58<10:08:05, 2.49s/it] +2025-02-05 18:26:38 - ERROR - stderr - 
+2025-02-05 18:26:38 - ERROR - stderr - +2025-02-05 18:26:38 - INFO - stdout - {'loss': 0.7671, 'grad_norm': 1.0863099098205566, 'learning_rate': 1.5178129387491507e-05, 'epoch': 1.04} +2025-02-05 18:26:38 - ERROR - stderr - 35%|███▍ | 7784/22434 [8:18:58<10:08:05, 2.49s/it] +2025-02-05 18:26:41 - ERROR - stderr - 35%|███▍ | 7785/22434 [8:19:01<10:11:13, 2.50s/it] +2025-02-05 18:26:41 - ERROR - stderr - +2025-02-05 18:26:41 - ERROR - stderr - +2025-02-05 18:26:41 - INFO - stdout - {'loss': 0.6912, 'grad_norm': 1.0764286518096924, 'learning_rate': 1.5176894217143617e-05, 'epoch': 1.04} +2025-02-05 18:26:41 - ERROR - stderr - 35%|███▍ | 7785/22434 [8:19:01<10:11:13, 2.50s/it] +2025-02-05 18:26:43 - ERROR - stderr - 35%|███▍ | 7786/22434 [8:19:03<10:11:29, 2.50s/it] +2025-02-05 18:26:43 - ERROR - stderr - +2025-02-05 18:26:43 - ERROR - stderr - +2025-02-05 18:26:43 - INFO - stdout - {'loss': 0.8214, 'grad_norm': 1.2228187322616577, 'learning_rate': 1.5175658938888313e-05, 'epoch': 1.04} +2025-02-05 18:26:43 - ERROR - stderr - 35%|███▍ | 7786/22434 [8:19:03<10:11:29, 2.50s/it] +2025-02-05 18:26:46 - ERROR - stderr - 35%|███▍ | 7787/22434 [8:19:06<10:11:55, 2.51s/it] +2025-02-05 18:26:46 - ERROR - stderr - +2025-02-05 18:26:46 - ERROR - stderr - +2025-02-05 18:26:46 - INFO - stdout - {'loss': 0.7335, 'grad_norm': 1.062856674194336, 'learning_rate': 1.5174423552751356e-05, 'epoch': 1.04} +2025-02-05 18:26:46 - ERROR - stderr - 35%|███▍ | 7787/22434 [8:19:06<10:11:55, 2.51s/it] +2025-02-05 18:26:48 - ERROR - stderr - 35%|███▍ | 7788/22434 [8:19:08<10:07:17, 2.49s/it] +2025-02-05 18:26:48 - ERROR - stderr - +2025-02-05 18:26:48 - ERROR - stderr - +2025-02-05 18:26:48 - INFO - stdout - {'loss': 0.8234, 'grad_norm': 1.2317777872085571, 'learning_rate': 1.5173188058758492e-05, 'epoch': 1.04} +2025-02-05 18:26:48 - ERROR - stderr - 35%|███▍ | 7788/22434 [8:19:08<10:07:17, 2.49s/it] +2025-02-05 18:26:51 - ERROR - stderr - 35%|███▍ | 7789/22434 [8:19:10<10:05:00, 2.48s/it] 
+2025-02-05 18:26:51 - ERROR - stderr - +2025-02-05 18:26:51 - ERROR - stderr - +2025-02-05 18:26:51 - INFO - stdout - {'loss': 0.6809, 'grad_norm': 1.2182952165603638, 'learning_rate': 1.5171952456935471e-05, 'epoch': 1.04} +2025-02-05 18:26:51 - ERROR - stderr - 35%|███▍ | 7789/22434 [8:19:11<10:05:00, 2.48s/it] +2025-02-05 18:26:53 - ERROR - stderr - 35%|███▍ | 7790/22434 [8:19:13<10:05:32, 2.48s/it] +2025-02-05 18:26:53 - ERROR - stderr - +2025-02-05 18:26:53 - ERROR - stderr - +2025-02-05 18:26:53 - INFO - stdout - {'loss': 0.7095, 'grad_norm': 1.1359412670135498, 'learning_rate': 1.5170716747308052e-05, 'epoch': 1.04} +2025-02-05 18:26:53 - ERROR - stderr - 35%|███▍ | 7790/22434 [8:19:13<10:05:32, 2.48s/it] +2025-02-05 18:26:56 - ERROR - stderr - 35%|███▍ | 7791/22434 [8:19:15<10:07:22, 2.49s/it] +2025-02-05 18:26:56 - ERROR - stderr - +2025-02-05 18:26:56 - ERROR - stderr - +2025-02-05 18:26:56 - INFO - stdout - {'loss': 0.7312, 'grad_norm': 1.0818493366241455, 'learning_rate': 1.516948092990199e-05, 'epoch': 1.04} +2025-02-05 18:26:56 - ERROR - stderr - 35%|███▍ | 7791/22434 [8:19:16<10:07:22, 2.49s/it] +2025-02-05 18:26:58 - ERROR - stderr - 35%|███▍ | 7792/22434 [8:19:18<10:29:02, 2.58s/it] +2025-02-05 18:26:59 - ERROR - stderr - +2025-02-05 18:26:59 - ERROR - stderr - +2025-02-05 18:26:59 - INFO - stdout - {'loss': 0.8068, 'grad_norm': 1.005029559135437, 'learning_rate': 1.5168245004743045e-05, 'epoch': 1.04} +2025-02-05 18:26:59 - ERROR - stderr - 35%|███▍ | 7792/22434 [8:19:18<10:29:02, 2.58s/it] +2025-02-05 18:27:01 - ERROR - stderr - 35%|███▍ | 7793/22434 [8:19:21<10:31:56, 2.59s/it] +2025-02-05 18:27:01 - ERROR - stderr - +2025-02-05 18:27:01 - ERROR - stderr - +2025-02-05 18:27:01 - INFO - stdout - {'loss': 0.7285, 'grad_norm': 1.084871768951416, 'learning_rate': 1.5167008971856977e-05, 'epoch': 1.04} +2025-02-05 18:27:01 - ERROR - stderr - 35%|███▍ | 7793/22434 [8:19:21<10:31:56, 2.59s/it] +2025-02-05 18:27:04 - ERROR - stderr - 35%|███▍ | 
7794/22434 [8:19:23<10:27:40, 2.57s/it] +2025-02-05 18:27:04 - ERROR - stderr - +2025-02-05 18:27:04 - ERROR - stderr - +2025-02-05 18:27:04 - INFO - stdout - {'loss': 0.7741, 'grad_norm': 1.1375077962875366, 'learning_rate': 1.5165772831269547e-05, 'epoch': 1.04} +2025-02-05 18:27:04 - ERROR - stderr - 35%|███▍ | 7794/22434 [8:19:23<10:27:40, 2.57s/it] +2025-02-05 18:27:06 - ERROR - stderr - 35%|███▍ | 7795/22434 [8:19:26<10:26:05, 2.57s/it] +2025-02-05 18:27:06 - ERROR - stderr - +2025-02-05 18:27:06 - ERROR - stderr - +2025-02-05 18:27:06 - INFO - stdout - {'loss': 0.8046, 'grad_norm': 1.1234840154647827, 'learning_rate': 1.516453658300653e-05, 'epoch': 1.04} +2025-02-05 18:27:06 - ERROR - stderr - 35%|███▍ | 7795/22434 [8:19:26<10:26:05, 2.57s/it] +2025-02-05 18:27:09 - ERROR - stderr - 35%|███▍ | 7796/22434 [8:19:28<10:20:52, 2.54s/it] +2025-02-05 18:27:09 - ERROR - stderr - +2025-02-05 18:27:09 - ERROR - stderr - +2025-02-05 18:27:09 - INFO - stdout - {'loss': 0.846, 'grad_norm': 1.2515041828155518, 'learning_rate': 1.5163300227093691e-05, 'epoch': 1.04} +2025-02-05 18:27:09 - ERROR - stderr - 35%|███▍ | 7796/22434 [8:19:29<10:20:52, 2.54s/it] +2025-02-05 18:27:11 - ERROR - stderr - 35%|███▍ | 7797/22434 [8:19:31<10:19:38, 2.54s/it] +2025-02-05 18:27:11 - ERROR - stderr - +2025-02-05 18:27:11 - ERROR - stderr - +2025-02-05 18:27:11 - INFO - stdout - {'loss': 0.7578, 'grad_norm': 1.2193487882614136, 'learning_rate': 1.51620637635568e-05, 'epoch': 1.04} +2025-02-05 18:27:11 - ERROR - stderr - 35%|███▍ | 7797/22434 [8:19:31<10:19:38, 2.54s/it] +2025-02-05 18:27:14 - ERROR - stderr - 35%|███▍ | 7798/22434 [8:19:33<10:13:53, 2.52s/it] +2025-02-05 18:27:14 - ERROR - stderr - +2025-02-05 18:27:14 - ERROR - stderr - +2025-02-05 18:27:14 - INFO - stdout - {'loss': 0.6683, 'grad_norm': 1.0596857070922852, 'learning_rate': 1.5160827192421628e-05, 'epoch': 1.04} +2025-02-05 18:27:14 - ERROR - stderr - 35%|███▍ | 7798/22434 [8:19:33<10:13:53, 2.52s/it] +2025-02-05 
18:27:16 - ERROR - stderr - 35%|███▍ | 7799/22434 [8:19:36<10:19:01, 2.54s/it] +2025-02-05 18:27:16 - ERROR - stderr - +2025-02-05 18:27:16 - ERROR - stderr - +2025-02-05 18:27:16 - INFO - stdout - {'loss': 0.6882, 'grad_norm': 1.1342637538909912, 'learning_rate': 1.5159590513713952e-05, 'epoch': 1.04} +2025-02-05 18:27:16 - ERROR - stderr - 35%|███▍ | 7799/22434 [8:19:36<10:19:01, 2.54s/it] +2025-02-05 18:27:19 - ERROR - stderr - 35%|███▍ | 7800/22434 [8:19:39<10:15:07, 2.52s/it] +2025-02-05 18:27:19 - ERROR - stderr - +2025-02-05 18:27:19 - ERROR - stderr - +2025-02-05 18:27:19 - INFO - stdout - {'loss': 0.838, 'grad_norm': 1.1775076389312744, 'learning_rate': 1.5158353727459548e-05, 'epoch': 1.04} +2025-02-05 18:27:19 - ERROR - stderr - 35%|███▍ | 7800/22434 [8:19:39<10:15:07, 2.52s/it] +2025-02-05 18:27:21 - ERROR - stderr - 35%|███▍ | 7801/22434 [8:19:41<10:17:32, 2.53s/it] +2025-02-05 18:27:21 - ERROR - stderr - +2025-02-05 18:27:21 - ERROR - stderr - +2025-02-05 18:27:21 - INFO - stdout - {'loss': 0.7315, 'grad_norm': 1.1239255666732788, 'learning_rate': 1.5157116833684196e-05, 'epoch': 1.04} +2025-02-05 18:27:21 - ERROR - stderr - 35%|███▍ | 7801/22434 [8:19:41<10:17:32, 2.53s/it] +2025-02-05 18:27:24 - ERROR - stderr - 35%|███▍ | 7802/22434 [8:19:44<10:16:36, 2.53s/it] +2025-02-05 18:27:24 - ERROR - stderr - +2025-02-05 18:27:24 - ERROR - stderr - +2025-02-05 18:27:24 - INFO - stdout - {'loss': 0.7447, 'grad_norm': 1.1853537559509277, 'learning_rate': 1.5155879832413678e-05, 'epoch': 1.04} +2025-02-05 18:27:24 - ERROR - stderr - 35%|███▍ | 7802/22434 [8:19:44<10:16:36, 2.53s/it] +2025-02-05 18:27:26 - ERROR - stderr - 35%|███▍ | 7803/22434 [8:19:46<10:12:07, 2.51s/it] +2025-02-05 18:27:26 - ERROR - stderr - +2025-02-05 18:27:26 - ERROR - stderr - +2025-02-05 18:27:26 - INFO - stdout - {'loss': 0.7479, 'grad_norm': 1.15453040599823, 'learning_rate': 1.515464272367378e-05, 'epoch': 1.04} +2025-02-05 18:27:26 - ERROR - stderr - 35%|███▍ | 7803/22434
[8:19:46<10:12:07, 2.51s/it] +2025-02-05 18:27:29 - ERROR - stderr - 35%|███▍ | 7804/22434 [8:19:49<10:11:09, 2.51s/it] +2025-02-05 18:27:29 - ERROR - stderr - +2025-02-05 18:27:29 - ERROR - stderr - +2025-02-05 18:27:29 - INFO - stdout - {'loss': 0.722, 'grad_norm': 1.0728952884674072, 'learning_rate': 1.5153405507490288e-05, 'epoch': 1.04} +2025-02-05 18:27:29 - ERROR - stderr - 35%|███▍ | 7804/22434 [8:19:49<10:11:09, 2.51s/it] +2025-02-05 18:27:31 - ERROR - stderr - 35%|███▍ | 7805/22434 [8:19:51<10:10:48, 2.51s/it] +2025-02-05 18:27:31 - ERROR - stderr - +2025-02-05 18:27:31 - ERROR - stderr - +2025-02-05 18:27:31 - INFO - stdout - {'loss': 0.6686, 'grad_norm': 1.1377720832824707, 'learning_rate': 1.5152168183888987e-05, 'epoch': 1.04} +2025-02-05 18:27:31 - ERROR - stderr - 35%|███▍ | 7805/22434 [8:19:51<10:10:48, 2.51s/it] +2025-02-05 18:27:34 - ERROR - stderr - 35%|███▍ | 7806/22434 [8:19:54<10:14:16, 2.52s/it] +2025-02-05 18:27:34 - ERROR - stderr - +2025-02-05 18:27:34 - ERROR - stderr - +2025-02-05 18:27:34 - INFO - stdout - {'loss': 0.6818, 'grad_norm': 1.210998296737671, 'learning_rate': 1.515093075289567e-05, 'epoch': 1.04} +2025-02-05 18:27:34 - ERROR - stderr - 35%|███▍ | 7806/22434 [8:19:54<10:14:16, 2.52s/it] +2025-02-05 18:27:36 - ERROR - stderr - 35%|███▍ | 7807/22434 [8:19:56<10:12:05, 2.51s/it] +2025-02-05 18:27:36 - ERROR - stderr - +2025-02-05 18:27:36 - ERROR - stderr - +2025-02-05 18:27:36 - INFO - stdout - {'loss': 0.748, 'grad_norm': 1.1565691232681274, 'learning_rate': 1.5149693214536131e-05, 'epoch': 1.04} +2025-02-05 18:27:36 - ERROR - stderr - 35%|███▍ | 7807/22434 [8:19:56<10:12:05, 2.51s/it] +2025-02-05 18:27:39 - ERROR - stderr - 35%|███▍ | 7808/22434 [8:19:59<10:12:18, 2.51s/it] +2025-02-05 18:27:39 - ERROR - stderr - +2025-02-05 18:27:39 - ERROR - stderr - +2025-02-05 18:27:39 - INFO - stdout - {'loss': 0.6781, 'grad_norm': 0.973939061164856, 'learning_rate': 1.514845556883617e-05, 'epoch': 1.04} +2025-02-05 18:27:39 - ERROR - 
stderr - 35%|███▍ | 7808/22434 [8:19:59<10:12:18, 2.51s/it] +2025-02-05 18:27:42 - ERROR - stderr - 35%|███▍ | 7809/22434 [8:20:01<10:27:46, 2.58s/it] +2025-02-05 18:27:42 - ERROR - stderr - +2025-02-05 18:27:42 - ERROR - stderr - +2025-02-05 18:27:42 - INFO - stdout - {'loss': 0.6567, 'grad_norm': 1.0522398948669434, 'learning_rate': 1.5147217815821571e-05, 'epoch': 1.04} +2025-02-05 18:27:42 - ERROR - stderr - 35%|███▍ | 7809/22434 [8:20:01<10:27:46, 2.58s/it] +2025-02-05 18:27:44 - ERROR - stderr - 35%|███▍ | 7810/22434 [8:20:04<10:18:41, 2.54s/it] +2025-02-05 18:27:44 - ERROR - stderr - +2025-02-05 18:27:44 - ERROR - stderr - +2025-02-05 18:27:44 - INFO - stdout - {'loss': 0.6557, 'grad_norm': 1.0849666595458984, 'learning_rate': 1.5145979955518147e-05, 'epoch': 1.04} +2025-02-05 18:27:44 - ERROR - stderr - 35%|███▍ | 7810/22434 [8:20:04<10:18:41, 2.54s/it] +2025-02-05 18:27:47 - ERROR - stderr - 35%|███▍ | 7811/22434 [8:20:06<10:20:50, 2.55s/it] +2025-02-05 18:27:47 - ERROR - stderr - +2025-02-05 18:27:47 - ERROR - stderr - +2025-02-05 18:27:47 - INFO - stdout - {'loss': 0.7564, 'grad_norm': 1.2479661703109741, 'learning_rate': 1.5144741987951692e-05, 'epoch': 1.04} +2025-02-05 18:27:47 - ERROR - stderr - 35%|███▍ | 7811/22434 [8:20:06<10:20:50, 2.55s/it] +2025-02-05 18:27:49 - ERROR - stderr - 35%|███▍ | 7812/22434 [8:20:09<10:14:45, 2.52s/it] +2025-02-05 18:27:49 - ERROR - stderr - +2025-02-05 18:27:49 - ERROR - stderr - +2025-02-05 18:27:49 - INFO - stdout - {'loss': 0.7227, 'grad_norm': 1.0706498622894287, 'learning_rate': 1.5143503913148017e-05, 'epoch': 1.04} +2025-02-05 18:27:49 - ERROR - stderr - 35%|███▍ | 7812/22434 [8:20:09<10:14:45, 2.52s/it] +2025-02-05 18:27:52 - ERROR - stderr - 35%|███▍ | 7813/22434 [8:20:11<10:20:55, 2.55s/it] +2025-02-05 18:27:52 - ERROR - stderr - +2025-02-05 18:27:52 - ERROR - stderr - +2025-02-05 18:27:52 - INFO - stdout - {'loss': 0.7438, 'grad_norm': 1.3072758913040161, 'learning_rate': 1.514226573113292e-05, 'epoch': 
1.04} +2025-02-05 18:27:52 - ERROR - stderr - 35%|███▍ | 7813/22434 [8:20:11<10:20:55, 2.55s/it] +2025-02-05 18:27:54 - ERROR - stderr - 35%|███▍ | 7814/22434 [8:20:14<10:18:41, 2.54s/it] +2025-02-05 18:27:54 - ERROR - stderr - +2025-02-05 18:27:54 - ERROR - stderr - +2025-02-05 18:27:54 - INFO - stdout - {'loss': 0.7254, 'grad_norm': 1.2006278038024902, 'learning_rate': 1.5141027441932217e-05, 'epoch': 1.04} +2025-02-05 18:27:54 - ERROR - stderr - 35%|███▍ | 7814/22434 [8:20:14<10:18:41, 2.54s/it] +2025-02-05 18:27:57 - ERROR - stderr - 35%|███▍ | 7815/22434 [8:20:16<10:11:33, 2.51s/it] +2025-02-05 18:27:57 - ERROR - stderr - +2025-02-05 18:27:57 - ERROR - stderr - +2025-02-05 18:27:57 - INFO - stdout - {'loss': 0.7583, 'grad_norm': 1.2566536664962769, 'learning_rate': 1.5139789045571718e-05, 'epoch': 1.05} +2025-02-05 18:27:57 - ERROR - stderr - 35%|███▍ | 7815/22434 [8:20:16<10:11:33, 2.51s/it] +2025-02-05 18:27:59 - ERROR - stderr - 35%|███▍ | 7816/22434 [8:20:19<10:30:42, 2.59s/it] +2025-02-05 18:27:59 - ERROR - stderr - +2025-02-05 18:27:59 - ERROR - stderr - +2025-02-05 18:27:59 - INFO - stdout - {'loss': 0.7164, 'grad_norm': 1.1450823545455933, 'learning_rate': 1.5138550542077233e-05, 'epoch': 1.05} +2025-02-05 18:27:59 - ERROR - stderr - 35%|███▍ | 7816/22434 [8:20:19<10:30:42, 2.59s/it] +2025-02-05 18:28:02 - ERROR - stderr - 35%|███▍ | 7817/22434 [8:20:22<10:20:23, 2.55s/it] +2025-02-05 18:28:02 - ERROR - stderr - +2025-02-05 18:28:02 - ERROR - stderr - +2025-02-05 18:28:02 - INFO - stdout - {'loss': 0.663, 'grad_norm': 1.298461675643921, 'learning_rate': 1.5137311931474582e-05, 'epoch': 1.05} +2025-02-05 18:28:02 - ERROR - stderr - 35%|███▍ | 7817/22434 [8:20:22<10:20:23, 2.55s/it] +2025-02-05 18:28:04 - ERROR - stderr - 35%|███▍ | 7818/22434 [8:20:24<10:18:38, 2.54s/it] +2025-02-05 18:28:04 - ERROR - stderr - +2025-02-05 18:28:04 - ERROR - stderr - +2025-02-05 18:28:04 - INFO - stdout - {'loss': 0.6659, 'grad_norm': 1.2287315130233765, 'learning_rate': 
1.5136073213789574e-05, 'epoch': 1.05} +2025-02-05 18:28:04 - ERROR - stderr - 35%|███▍ | 7818/22434 [8:20:24<10:18:38, 2.54s/it] +2025-02-05 18:28:07 - ERROR - stderr - 35%|███▍ | 7819/22434 [8:20:27<10:12:48, 2.52s/it] +2025-02-05 18:28:07 - ERROR - stderr - +2025-02-05 18:28:07 - ERROR - stderr - +2025-02-05 18:28:07 - INFO - stdout - {'loss': 0.6467, 'grad_norm': 1.0969955921173096, 'learning_rate': 1.5134834389048036e-05, 'epoch': 1.05} +2025-02-05 18:28:07 - ERROR - stderr - 35%|███▍ | 7819/22434 [8:20:27<10:12:48, 2.52s/it] +2025-02-05 18:28:09 - ERROR - stderr - 35%|███▍ | 7820/22434 [8:20:29<10:21:48, 2.55s/it] +2025-02-05 18:28:10 - ERROR - stderr - +2025-02-05 18:28:10 - ERROR - stderr - +2025-02-05 18:28:10 - INFO - stdout - {'loss': 0.7212, 'grad_norm': 1.0878665447235107, 'learning_rate': 1.513359545727579e-05, 'epoch': 1.05} +2025-02-05 18:28:10 - ERROR - stderr - 35%|███▍ | 7820/22434 [8:20:29<10:21:48, 2.55s/it] +2025-02-05 18:28:12 - ERROR - stderr - 35%|███▍ | 7821/22434 [8:20:32<10:24:40, 2.56s/it] +2025-02-05 18:28:12 - ERROR - stderr - +2025-02-05 18:28:12 - ERROR - stderr - +2025-02-05 18:28:12 - INFO - stdout - {'loss': 0.7916, 'grad_norm': 1.1976404190063477, 'learning_rate': 1.5132356418498661e-05, 'epoch': 1.05} +2025-02-05 18:28:12 - ERROR - stderr - 35%|███▍ | 7821/22434 [8:20:32<10:24:40, 2.56s/it] +2025-02-05 18:28:15 - ERROR - stderr - 35%|███▍ | 7822/22434 [8:20:34<10:19:35, 2.54s/it] +2025-02-05 18:28:15 - ERROR - stderr - +2025-02-05 18:28:15 - ERROR - stderr - +2025-02-05 18:28:15 - INFO - stdout - {'loss': 0.7156, 'grad_norm': 1.1810678243637085, 'learning_rate': 1.513111727274247e-05, 'epoch': 1.05} +2025-02-05 18:28:15 - ERROR - stderr - 35%|███▍ | 7822/22434 [8:20:34<10:19:35, 2.54s/it] +2025-02-05 18:28:17 - ERROR - stderr - 35%|███▍ | 7823/22434 [8:20:37<10:14:26, 2.52s/it] +2025-02-05 18:28:17 - ERROR - stderr - +2025-02-05 18:28:17 - ERROR - stderr - +2025-02-05 18:28:17 - INFO - stdout - {'loss': 0.7306, 'grad_norm': 
1.1508187055587769, 'learning_rate': 1.5129878020033051e-05, 'epoch': 1.05} +2025-02-05 18:28:17 - ERROR - stderr - 35%|███▍ | 7823/22434 [8:20:37<10:14:26, 2.52s/it] +2025-02-05 18:28:20 - ERROR - stderr - 35%|███▍ | 7824/22434 [8:20:39<10:14:19, 2.52s/it] +2025-02-05 18:28:20 - ERROR - stderr - +2025-02-05 18:28:20 - ERROR - stderr - +2025-02-05 18:28:20 - INFO - stdout - {'loss': 0.7661, 'grad_norm': 1.2293113470077515, 'learning_rate': 1.5128638660396234e-05, 'epoch': 1.05} +2025-02-05 18:28:20 - ERROR - stderr - 35%|███▍ | 7824/22434 [8:20:39<10:14:19, 2.52s/it] +2025-02-05 18:28:22 - ERROR - stderr - 35%|███▍ | 7825/22434 [8:20:42<10:15:24, 2.53s/it] +2025-02-05 18:28:22 - ERROR - stderr - +2025-02-05 18:28:22 - ERROR - stderr - +2025-02-05 18:28:22 - INFO - stdout - {'loss': 0.6897, 'grad_norm': 1.0373139381408691, 'learning_rate': 1.512739919385785e-05, 'epoch': 1.05} +2025-02-05 18:28:22 - ERROR - stderr - 35%|███▍ | 7825/22434 [8:20:42<10:15:24, 2.53s/it] +2025-02-05 18:28:25 - ERROR - stderr - 35%|███▍ | 7826/22434 [8:20:44<10:10:15, 2.51s/it] +2025-02-05 18:28:25 - ERROR - stderr - +2025-02-05 18:28:25 - ERROR - stderr - +2025-02-05 18:28:25 - INFO - stdout - {'loss': 0.7035, 'grad_norm': 1.0706820487976074, 'learning_rate': 1.5126159620443738e-05, 'epoch': 1.05} +2025-02-05 18:28:25 - ERROR - stderr - 35%|███▍ | 7826/22434 [8:20:44<10:10:15, 2.51s/it] +2025-02-05 18:28:27 - ERROR - stderr - 35%|███▍ | 7827/22434 [8:20:47<10:14:18, 2.52s/it] +2025-02-05 18:28:27 - ERROR - stderr - +2025-02-05 18:28:27 - ERROR - stderr - +2025-02-05 18:28:27 - INFO - stdout - {'loss': 0.6626, 'grad_norm': 1.0648683309555054, 'learning_rate': 1.5124919940179732e-05, 'epoch': 1.05} +2025-02-05 18:28:27 - ERROR - stderr - 35%|███▍ | 7827/22434 [8:20:47<10:14:18, 2.52s/it] +2025-02-05 18:28:30 - ERROR - stderr - 35%|███▍ | 7828/22434 [8:20:49<10:19:11, 2.54s/it] +2025-02-05 18:28:30 - ERROR - stderr - +2025-02-05 18:28:30 - ERROR - stderr - +2025-02-05 18:28:30 - INFO -
stdout - {'loss': 0.7532, 'grad_norm': 1.1726741790771484, 'learning_rate': 1.5123680153091675e-05, 'epoch': 1.05} +2025-02-05 18:28:30 - ERROR - stderr - 35%|███▍ | 7828/22434 [8:20:50<10:19:11, 2.54s/it] +2025-02-05 18:28:32 - ERROR - stderr - 35%|███▍ | 7829/22434 [8:20:52<10:20:06, 2.55s/it] +2025-02-05 18:28:32 - ERROR - stderr - +2025-02-05 18:28:32 - ERROR - stderr - +2025-02-05 18:28:32 - INFO - stdout - {'loss': 0.6436, 'grad_norm': 1.1182183027267456, 'learning_rate': 1.5122440259205408e-05, 'epoch': 1.05} +2025-02-05 18:28:32 - ERROR - stderr - 35%|███▍ | 7829/22434 [8:20:52<10:20:06, 2.55s/it] +2025-02-05 18:28:35 - ERROR - stderr - 35%|███▍ | 7830/22434 [8:20:55<10:14:56, 2.53s/it] +2025-02-05 18:28:35 - ERROR - stderr - +2025-02-05 18:28:35 - ERROR - stderr - +2025-02-05 18:28:35 - INFO - stdout - {'loss': 0.6762, 'grad_norm': 1.0258846282958984, 'learning_rate': 1.5121200258546778e-05, 'epoch': 1.05} +2025-02-05 18:28:35 - ERROR - stderr - 35%|███▍ | 7830/22434 [8:20:55<10:14:56, 2.53s/it] +2025-02-05 18:28:37 - ERROR - stderr - 35%|███▍ | 7831/22434 [8:20:57<10:09:35, 2.50s/it] +2025-02-05 18:28:37 - ERROR - stderr - +2025-02-05 18:28:37 - ERROR - stderr - +2025-02-05 18:28:37 - INFO - stdout - {'loss': 0.7016, 'grad_norm': 1.1166654825210571, 'learning_rate': 1.5119960151141627e-05, 'epoch': 1.05} +2025-02-05 18:28:37 - ERROR - stderr - 35%|███▍ | 7831/22434 [8:20:57<10:09:35, 2.50s/it] +2025-02-05 18:28:40 - ERROR - stderr - 35%|███▍ | 7832/22434 [8:20:59<10:06:58, 2.49s/it] +2025-02-05 18:28:40 - ERROR - stderr - +2025-02-05 18:28:40 - ERROR - stderr - +2025-02-05 18:28:40 - INFO - stdout - {'loss': 0.7406, 'grad_norm': 1.0509123802185059, 'learning_rate': 1.5118719937015805e-05, 'epoch': 1.05} +2025-02-05 18:28:40 - ERROR - stderr - 35%|███▍ | 7832/22434 [8:20:59<10:06:58, 2.49s/it] +2025-02-05 18:28:42 - ERROR - stderr - 35%|███▍ | 7833/22434 [8:21:02<10:09:46, 2.51s/it] +2025-02-05 18:28:42 - ERROR - stderr - +2025-02-05 18:28:42 - ERROR - 
stderr - +2025-02-05 18:28:42 - INFO - stdout - {'loss': 0.7557, 'grad_norm': 1.086255431175232, 'learning_rate': 1.5117479616195163e-05, 'epoch': 1.05} +2025-02-05 18:28:42 - ERROR - stderr - 35%|███▍ | 7833/22434 [8:21:02<10:09:46, 2.51s/it] +2025-02-05 18:28:45 - ERROR - stderr - 35%|███▍ | 7834/22434 [8:21:04<10:04:20, 2.48s/it] +2025-02-05 18:28:45 - ERROR - stderr - +2025-02-05 18:28:45 - ERROR - stderr - +2025-02-05 18:28:45 - INFO - stdout - {'loss': 0.716, 'grad_norm': 1.1505647897720337, 'learning_rate': 1.5116239188705557e-05, 'epoch': 1.05} +2025-02-05 18:28:45 - ERROR - stderr - 35%|███▍ | 7834/22434 [8:21:04<10:04:20, 2.48s/it] +2025-02-05 18:28:47 - ERROR - stderr - 35%|███▍ | 7835/22434 [8:21:07<10:11:32, 2.51s/it] +2025-02-05 18:28:47 - ERROR - stderr - +2025-02-05 18:28:47 - ERROR - stderr - +2025-02-05 18:28:47 - INFO - stdout - {'loss': 0.7549, 'grad_norm': 1.1314702033996582, 'learning_rate': 1.511499865457284e-05, 'epoch': 1.05} +2025-02-05 18:28:47 - ERROR - stderr - 35%|███▍ | 7835/22434 [8:21:07<10:11:32, 2.51s/it] +2025-02-05 18:28:50 - ERROR - stderr - 35%|███▍ | 7836/22434 [8:21:10<10:38:11, 2.62s/it] +2025-02-05 18:28:50 - ERROR - stderr - +2025-02-05 18:28:50 - ERROR - stderr - +2025-02-05 18:28:50 - INFO - stdout - {'loss': 0.7928, 'grad_norm': 1.0411655902862549, 'learning_rate': 1.511375801382287e-05, 'epoch': 1.05} +2025-02-05 18:28:50 - ERROR - stderr - 35%|███▍ | 7836/22434 [8:21:10<10:38:11, 2.62s/it] +2025-02-05 18:28:53 - ERROR - stderr - 35%|███▍ | 7837/22434 [8:21:12<10:30:40, 2.59s/it] +2025-02-05 18:28:53 - ERROR - stderr - +2025-02-05 18:28:53 - ERROR - stderr - +2025-02-05 18:28:53 - INFO - stdout - {'loss': 0.6924, 'grad_norm': 1.027052402496338, 'learning_rate': 1.5112517266481513e-05, 'epoch': 1.05} +2025-02-05 18:28:53 - ERROR - stderr - 35%|███▍ | 7837/22434 [8:21:12<10:30:40, 2.59s/it] +2025-02-05 18:28:55 - ERROR - stderr - 35%|███▍ | 7838/22434 [8:21:15<10:25:50, 2.57s/it] +2025-02-05 18:28:55 - ERROR - stderr - 
+2025-02-05 18:28:55 - ERROR - stderr - +2025-02-05 18:28:55 - INFO - stdout - {'loss': 0.7465, 'grad_norm': 1.249130368232727, 'learning_rate': 1.511127641257462e-05, 'epoch': 1.05} +2025-02-05 18:28:55 - ERROR - stderr - 35%|███▍ | 7838/22434 [8:21:15<10:25:50, 2.57s/it] +2025-02-05 18:28:58 - ERROR - stderr - 35%|███▍ | 7839/22434 [8:21:17<10:21:47, 2.56s/it] +2025-02-05 18:28:58 - ERROR - stderr - +2025-02-05 18:28:58 - ERROR - stderr - +2025-02-05 18:28:58 - INFO - stdout - {'loss': 0.7807, 'grad_norm': 1.0493065118789673, 'learning_rate': 1.511003545212806e-05, 'epoch': 1.05} +2025-02-05 18:28:58 - ERROR - stderr - 35%|███▍ | 7839/22434 [8:21:17<10:21:47, 2.56s/it] +2025-02-05 18:29:00 - ERROR - stderr - 35%|███▍ | 7840/22434 [8:21:20<10:20:29, 2.55s/it] +2025-02-05 18:29:00 - ERROR - stderr - +2025-02-05 18:29:00 - ERROR - stderr - +2025-02-05 18:29:00 - INFO - stdout - {'loss': 0.6785, 'grad_norm': 1.100597858428955, 'learning_rate': 1.5108794385167703e-05, 'epoch': 1.05} +2025-02-05 18:29:00 - ERROR - stderr - 35%|███▍ | 7840/22434 [8:21:20<10:20:29, 2.55s/it] +2025-02-05 18:29:03 - ERROR - stderr - 35%|███▍ | 7841/22434 [8:21:22<10:14:29, 2.53s/it] +2025-02-05 18:29:03 - ERROR - stderr - +2025-02-05 18:29:03 - ERROR - stderr - +2025-02-05 18:29:03 - INFO - stdout - {'loss': 0.7264, 'grad_norm': 1.1343402862548828, 'learning_rate': 1.5107553211719416e-05, 'epoch': 1.05} +2025-02-05 18:29:03 - ERROR - stderr - 35%|███▍ | 7841/22434 [8:21:22<10:14:29, 2.53s/it] +2025-02-05 18:29:05 - ERROR - stderr - 35%|███▍ | 7842/22434 [8:21:25<10:07:43, 2.50s/it] +2025-02-05 18:29:05 - ERROR - stderr - +2025-02-05 18:29:05 - ERROR - stderr - +2025-02-05 18:29:05 - INFO - stdout - {'loss': 0.707, 'grad_norm': 1.166481852531433, 'learning_rate': 1.510631193180907e-05, 'epoch': 1.05} +2025-02-05 18:29:05 - ERROR - stderr - 35%|███▍ | 7842/22434 [8:21:25<10:07:43, 2.50s/it] +2025-02-05 18:29:08 - ERROR - stderr - 35%|███▍ | 7843/22434 [8:21:27<10:04:20, 2.49s/it] +2025-02-05 
18:29:08 - ERROR - stderr - +2025-02-05 18:29:08 - ERROR - stderr - +2025-02-05 18:29:08 - INFO - stdout - {'loss': 0.7043, 'grad_norm': 1.2873793840408325, 'learning_rate': 1.5105070545462538e-05, 'epoch': 1.05} +2025-02-05 18:29:08 - ERROR - stderr - 35%|███▍ | 7843/22434 [8:21:27<10:04:20, 2.49s/it] +2025-02-05 18:29:10 - ERROR - stderr - 35%|███▍ | 7844/22434 [8:21:30<10:01:19, 2.47s/it] +2025-02-05 18:29:10 - ERROR - stderr - +2025-02-05 18:29:10 - ERROR - stderr - +2025-02-05 18:29:10 - INFO - stdout - {'loss': 0.7241, 'grad_norm': 1.241848111152649, 'learning_rate': 1.5103829052705697e-05, 'epoch': 1.05} +2025-02-05 18:29:10 - ERROR - stderr - 35%|███▍ | 7844/22434 [8:21:30<10:01:19, 2.47s/it] +2025-02-05 18:29:12 - ERROR - stderr - 35%|███▍ | 7845/22434 [8:21:32<10:00:11, 2.47s/it] +2025-02-05 18:29:13 - ERROR - stderr - +2025-02-05 18:29:13 - ERROR - stderr - +2025-02-05 18:29:13 - INFO - stdout - {'loss': 0.7788, 'grad_norm': 1.2145333290100098, 'learning_rate': 1.510258745356442e-05, 'epoch': 1.05} +2025-02-05 18:29:13 - ERROR - stderr - 35%|███▍ | 7845/22434 [8:21:32<10:00:11, 2.47s/it] +2025-02-05 18:29:15 - ERROR - stderr - 35%|███▍ | 7846/22434 [8:21:35<10:00:49, 2.47s/it] +2025-02-05 18:29:15 - ERROR - stderr - +2025-02-05 18:29:15 - ERROR - stderr - +2025-02-05 18:29:15 - INFO - stdout - {'loss': 0.6481, 'grad_norm': 1.0562639236450195, 'learning_rate': 1.5101345748064593e-05, 'epoch': 1.05} +2025-02-05 18:29:15 - ERROR - stderr - 35%|███▍ | 7846/22434 [8:21:35<10:00:49, 2.47s/it] +2025-02-05 18:29:17 - ERROR - stderr - 35%|███▍ | 7847/22434 [8:21:37<10:01:40, 2.47s/it] +2025-02-05 18:29:17 - ERROR - stderr - +2025-02-05 18:29:17 - ERROR - stderr - +2025-02-05 18:29:17 - INFO - stdout - {'loss': 0.7426, 'grad_norm': 1.022594928741455, 'learning_rate': 1.510010393623209e-05, 'epoch': 1.05} +2025-02-05 18:29:17 - ERROR - stderr - 35%|███▍ | 7847/22434 [8:21:37<10:01:40, 2.47s/it] +2025-02-05 18:29:20 - ERROR - stderr - 35%|███▍ | 7848/22434 
[8:21:40<10:05:44, 2.49s/it] +2025-02-05 18:29:20 - ERROR - stderr - +2025-02-05 18:29:20 - ERROR - stderr - +2025-02-05 18:29:20 - INFO - stdout - {'loss': 0.761, 'grad_norm': 1.0815496444702148, 'learning_rate': 1.5098862018092808e-05, 'epoch': 1.05} +2025-02-05 18:29:20 - ERROR - stderr - 35%|███▍ | 7848/22434 [8:21:40<10:05:44, 2.49s/it] +2025-02-05 18:29:22 - ERROR - stderr - 35%|███▍ | 7849/22434 [8:21:42<10:03:09, 2.48s/it] +2025-02-05 18:29:22 - ERROR - stderr - +2025-02-05 18:29:22 - ERROR - stderr - +2025-02-05 18:29:22 - INFO - stdout - {'loss': 0.8225, 'grad_norm': 1.113710880279541, 'learning_rate': 1.5097619993672624e-05, 'epoch': 1.05} +2025-02-05 18:29:22 - ERROR - stderr - 35%|███▍ | 7849/22434 [8:21:42<10:03:09, 2.48s/it] +2025-02-05 18:29:25 - ERROR - stderr - 35%|███▍ | 7850/22434 [8:21:45<10:00:50, 2.47s/it] +2025-02-05 18:29:25 - ERROR - stderr - +2025-02-05 18:29:25 - ERROR - stderr - +2025-02-05 18:29:25 - INFO - stdout - {'loss': 0.7019, 'grad_norm': 1.1431163549423218, 'learning_rate': 1.5096377862997428e-05, 'epoch': 1.05} +2025-02-05 18:29:25 - ERROR - stderr - 35%|███▍ | 7850/22434 [8:21:45<10:00:50, 2.47s/it] +2025-02-05 18:29:27 - ERROR - stderr - 35%|███▍ | 7851/22434 [8:21:47<9:57:18, 2.46s/it] +2025-02-05 18:29:27 - ERROR - stderr - +2025-02-05 18:29:27 - ERROR - stderr - +2025-02-05 18:29:27 - INFO - stdout - {'loss': 0.7537, 'grad_norm': 1.0646350383758545, 'learning_rate': 1.5095135626093112e-05, 'epoch': 1.05} +2025-02-05 18:29:27 - ERROR - stderr - 35%|███▍ | 7851/22434 [8:21:47<9:57:18, 2.46s/it] +2025-02-05 18:29:30 - ERROR - stderr - 35%|███▌ | 7852/22434 [8:21:50<10:00:55, 2.47s/it] +2025-02-05 18:29:30 - ERROR - stderr - +2025-02-05 18:29:30 - ERROR - stderr - +2025-02-05 18:29:30 - INFO - stdout - {'loss': 0.696, 'grad_norm': 1.1504433155059814, 'learning_rate': 1.5093893282985565e-05, 'epoch': 1.05} +2025-02-05 18:29:30 - ERROR - stderr - 35%|███▌ | 7852/22434 [8:21:50<10:00:55, 2.47s/it] +2025-02-05 18:29:33 - ERROR - 
stderr - 35%|███▌ | 7853/22434 [8:21:53<10:35:25, 2.61s/it] +2025-02-05 18:29:33 - ERROR - stderr - +2025-02-05 18:29:33 - ERROR - stderr - +2025-02-05 18:29:33 - INFO - stdout - {'loss': 0.735, 'grad_norm': 1.3994109630584717, 'learning_rate': 1.5092650833700695e-05, 'epoch': 1.05} +2025-02-05 18:29:33 - ERROR - stderr - 35%|███▌ | 7853/22434 [8:21:53<10:35:25, 2.61s/it] +2025-02-05 18:29:35 - ERROR - stderr - 35%|███▌ | 7854/22434 [8:21:55<10:23:44, 2.57s/it] +2025-02-05 18:29:35 - ERROR - stderr - +2025-02-05 18:29:35 - ERROR - stderr - +2025-02-05 18:29:35 - INFO - stdout - {'loss': 0.7855, 'grad_norm': 1.2312335968017578, 'learning_rate': 1.5091408278264388e-05, 'epoch': 1.05} +2025-02-05 18:29:35 - ERROR - stderr - 35%|███▌ | 7854/22434 [8:21:55<10:23:44, 2.57s/it] +2025-02-05 18:29:38 - ERROR - stderr - 35%|███▌ | 7855/22434 [8:21:57<10:16:47, 2.54s/it] +2025-02-05 18:29:38 - ERROR - stderr - +2025-02-05 18:29:38 - ERROR - stderr - +2025-02-05 18:29:38 - INFO - stdout - {'loss': 0.7748, 'grad_norm': 1.2383403778076172, 'learning_rate': 1.5090165616702548e-05, 'epoch': 1.05} +2025-02-05 18:29:38 - ERROR - stderr - 35%|███▌ | 7855/22434 [8:21:57<10:16:47, 2.54s/it] +2025-02-05 18:29:40 - ERROR - stderr - 35%|███▌ | 7856/22434 [8:22:00<10:27:01, 2.58s/it] +2025-02-05 18:29:40 - ERROR - stderr - +2025-02-05 18:29:40 - ERROR - stderr - +2025-02-05 18:29:40 - INFO - stdout - {'loss': 0.7354, 'grad_norm': 1.2579381465911865, 'learning_rate': 1.5088922849041075e-05, 'epoch': 1.05} +2025-02-05 18:29:40 - ERROR - stderr - 35%|███▌ | 7856/22434 [8:22:00<10:27:01, 2.58s/it] +2025-02-05 18:29:43 - ERROR - stderr - 35%|███▌ | 7857/22434 [8:22:03<10:18:21, 2.55s/it] +2025-02-05 18:29:43 - ERROR - stderr - +2025-02-05 18:29:43 - ERROR - stderr - +2025-02-05 18:29:43 - INFO - stdout - {'loss': 0.8059, 'grad_norm': 1.356278419494629, 'learning_rate': 1.5087679975305876e-05, 'epoch': 1.05} +2025-02-05 18:29:43 - ERROR - stderr - 35%|███▌ | 7857/22434 [8:22:03<10:18:21, 
2.55s/it] +2025-02-05 18:29:45 - ERROR - stderr - 35%|███▌ | 7858/22434 [8:22:05<10:17:15, 2.54s/it] +2025-02-05 18:29:45 - ERROR - stderr - +2025-02-05 18:29:45 - ERROR - stderr - +2025-02-05 18:29:45 - INFO - stdout - {'loss': 0.7977, 'grad_norm': 1.2162450551986694, 'learning_rate': 1.5086436995522855e-05, 'epoch': 1.05} +2025-02-05 18:29:45 - ERROR - stderr - 35%|███▌ | 7858/22434 [8:22:05<10:17:15, 2.54s/it] +2025-02-05 18:29:48 - ERROR - stderr - 35%|███▌ | 7859/22434 [8:22:08<10:14:04, 2.53s/it] +2025-02-05 18:29:48 - ERROR - stderr - +2025-02-05 18:29:48 - ERROR - stderr - +2025-02-05 18:29:48 - INFO - stdout - {'loss': 0.6465, 'grad_norm': 1.0081366300582886, 'learning_rate': 1.508519390971792e-05, 'epoch': 1.05} +2025-02-05 18:29:48 - ERROR - stderr - 35%|███▌ | 7859/22434 [8:22:08<10:14:04, 2.53s/it] +2025-02-05 18:29:50 - ERROR - stderr - 35%|███▌ | 7860/22434 [8:22:10<10:20:03, 2.55s/it] +2025-02-05 18:29:50 - ERROR - stderr - +2025-02-05 18:29:50 - ERROR - stderr - +2025-02-05 18:29:50 - INFO - stdout - {'loss': 0.7264, 'grad_norm': 1.260623574256897, 'learning_rate': 1.5083950717916991e-05, 'epoch': 1.05} +2025-02-05 18:29:50 - ERROR - stderr - 35%|███▌ | 7860/22434 [8:22:10<10:20:03, 2.55s/it] +2025-02-05 18:29:53 - ERROR - stderr - 35%|███▌ | 7861/22434 [8:22:13<10:13:33, 2.53s/it] +2025-02-05 18:29:53 - ERROR - stderr - +2025-02-05 18:29:53 - ERROR - stderr - +2025-02-05 18:29:53 - INFO - stdout - {'loss': 0.7399, 'grad_norm': 1.275819182395935, 'learning_rate': 1.508270742014597e-05, 'epoch': 1.05} +2025-02-05 18:29:53 - ERROR - stderr - 35%|███▌ | 7861/22434 [8:22:13<10:13:33, 2.53s/it] +2025-02-05 18:29:55 - ERROR - stderr - 35%|███▌ | 7862/22434 [8:22:15<10:09:25, 2.51s/it] +2025-02-05 18:29:55 - ERROR - stderr - +2025-02-05 18:29:55 - ERROR - stderr - +2025-02-05 18:29:55 - INFO - stdout - {'loss': 0.6674, 'grad_norm': 1.0182470083236694, 'learning_rate': 1.5081464016430775e-05, 'epoch': 1.05} +2025-02-05 18:29:55 - ERROR - stderr - 35%|███▌ 
| 7862/22434 [8:22:15<10:09:25, 2.51s/it] +2025-02-05 18:29:58 - ERROR - stderr - 35%|███▌ | 7863/22434 [8:22:18<10:03:39, 2.49s/it] +2025-02-05 18:29:58 - ERROR - stderr - +2025-02-05 18:29:58 - ERROR - stderr - +2025-02-05 18:29:58 - INFO - stdout - {'loss': 0.7498, 'grad_norm': 1.3782334327697754, 'learning_rate': 1.5080220506797327e-05, 'epoch': 1.05} +2025-02-05 18:29:58 - ERROR - stderr - 35%|███▌ | 7863/22434 [8:22:18<10:03:39, 2.49s/it] +2025-02-05 18:30:00 - ERROR - stderr - 35%|███▌ | 7864/22434 [8:22:20<10:06:26, 2.50s/it] +2025-02-05 18:30:00 - ERROR - stderr - +2025-02-05 18:30:00 - ERROR - stderr - +2025-02-05 18:30:00 - INFO - stdout - {'loss': 0.7779, 'grad_norm': 1.2058110237121582, 'learning_rate': 1.5078976891271544e-05, 'epoch': 1.05} +2025-02-05 18:30:00 - ERROR - stderr - 35%|███▌ | 7864/22434 [8:22:20<10:06:26, 2.50s/it] +2025-02-05 18:30:03 - ERROR - stderr - 35%|███▌ | 7865/22434 [8:22:23<10:12:36, 2.52s/it] +2025-02-05 18:30:03 - ERROR - stderr - +2025-02-05 18:30:03 - ERROR - stderr - +2025-02-05 18:30:03 - INFO - stdout - {'loss': 0.8611, 'grad_norm': 1.1041940450668335, 'learning_rate': 1.5077733169879346e-05, 'epoch': 1.05} +2025-02-05 18:30:03 - ERROR - stderr - 35%|███▌ | 7865/22434 [8:22:23<10:12:36, 2.52s/it] +2025-02-05 18:30:05 - ERROR - stderr - 35%|███▌ | 7866/22434 [8:22:25<10:10:41, 2.52s/it] +2025-02-05 18:30:05 - ERROR - stderr - +2025-02-05 18:30:05 - ERROR - stderr - +2025-02-05 18:30:05 - INFO - stdout - {'loss': 0.8007, 'grad_norm': 1.1350985765457153, 'learning_rate': 1.5076489342646659e-05, 'epoch': 1.05} +2025-02-05 18:30:05 - ERROR - stderr - 35%|███▌ | 7866/22434 [8:22:25<10:10:41, 2.52s/it] +2025-02-05 18:30:08 - ERROR - stderr - 35%|███▌ | 7867/22434 [8:22:28<10:11:54, 2.52s/it] +2025-02-05 18:30:08 - ERROR - stderr - +2025-02-05 18:30:08 - ERROR - stderr - +2025-02-05 18:30:08 - INFO - stdout - {'loss': 0.7812, 'grad_norm': 1.2510885000228882, 'learning_rate': 1.5075245409599411e-05, 'epoch': 1.05} +2025-02-05 
18:30:08 - ERROR - stderr - 35%|███▌ | 7867/22434 [8:22:28<10:11:54, 2.52s/it] +2025-02-05 18:30:10 - ERROR - stderr - 35%|███▌ | 7868/22434 [8:22:30<10:07:35, 2.50s/it] +2025-02-05 18:30:10 - ERROR - stderr - +2025-02-05 18:30:10 - ERROR - stderr - +2025-02-05 18:30:10 - INFO - stdout - {'loss': 0.7637, 'grad_norm': 1.117118000984192, 'learning_rate': 1.5074001370763527e-05, 'epoch': 1.05} +2025-02-05 18:30:10 - ERROR - stderr - 35%|███▌ | 7868/22434 [8:22:30<10:07:35, 2.50s/it] +2025-02-05 18:30:13 - ERROR - stderr - 35%|███▌ | 7869/22434 [8:22:33<10:24:41, 2.57s/it] +2025-02-05 18:30:13 - ERROR - stderr - +2025-02-05 18:30:13 - ERROR - stderr - +2025-02-05 18:30:13 - INFO - stdout - {'loss': 0.6722, 'grad_norm': 1.0593831539154053, 'learning_rate': 1.5072757226164942e-05, 'epoch': 1.05} +2025-02-05 18:30:13 - ERROR - stderr - 35%|███▌ | 7869/22434 [8:22:33<10:24:41, 2.57s/it] +2025-02-05 18:30:16 - ERROR - stderr - 35%|███▌ | 7870/22434 [8:22:35<10:21:49, 2.56s/it] +2025-02-05 18:30:16 - ERROR - stderr - +2025-02-05 18:30:16 - ERROR - stderr - +2025-02-05 18:30:16 - INFO - stdout - {'loss': 0.6929, 'grad_norm': 1.0556825399398804, 'learning_rate': 1.5071512975829588e-05, 'epoch': 1.05} +2025-02-05 18:30:16 - ERROR - stderr - 35%|███▌ | 7870/22434 [8:22:35<10:21:49, 2.56s/it] +2025-02-05 18:30:18 - ERROR - stderr - 35%|███▌ | 7871/22434 [8:22:38<10:24:09, 2.57s/it] +2025-02-05 18:30:18 - ERROR - stderr - +2025-02-05 18:30:18 - ERROR - stderr - +2025-02-05 18:30:18 - INFO - stdout - {'loss': 0.7828, 'grad_norm': 1.0788562297821045, 'learning_rate': 1.5070268619783392e-05, 'epoch': 1.05} +2025-02-05 18:30:18 - ERROR - stderr - 35%|███▌ | 7871/22434 [8:22:38<10:24:09, 2.57s/it] +2025-02-05 18:30:21 - ERROR - stderr - 35%|███▌ | 7872/22434 [8:22:41<10:21:07, 2.56s/it] +2025-02-05 18:30:21 - ERROR - stderr - +2025-02-05 18:30:21 - ERROR - stderr - +2025-02-05 18:30:21 - INFO - stdout - {'loss': 0.7238, 'grad_norm': 1.1442049741744995, 'learning_rate': 
1.5069024158052306e-05, 'epoch': 1.05} +2025-02-05 18:30:21 - ERROR - stderr - 35%|███▌ | 7872/22434 [8:22:41<10:21:07, 2.56s/it] +2025-02-05 18:30:23 - ERROR - stderr - 35%|███▌ | 7873/22434 [8:22:43<10:14:49, 2.53s/it] +2025-02-05 18:30:23 - ERROR - stderr - +2025-02-05 18:30:23 - ERROR - stderr - +2025-02-05 18:30:23 - INFO - stdout - {'loss': 0.6995, 'grad_norm': 1.1112186908721924, 'learning_rate': 1.5067779590662258e-05, 'epoch': 1.05} +2025-02-05 18:30:23 - ERROR - stderr - 35%|███▌ | 7873/22434 [8:22:43<10:14:49, 2.53s/it] +2025-02-05 18:30:26 - ERROR - stderr - 35%|███▌ | 7874/22434 [8:22:46<10:11:48, 2.52s/it] +2025-02-05 18:30:26 - ERROR - stderr - +2025-02-05 18:30:26 - ERROR - stderr - +2025-02-05 18:30:26 - INFO - stdout - {'loss': 0.724, 'grad_norm': 1.0672895908355713, 'learning_rate': 1.5066534917639195e-05, 'epoch': 1.05} +2025-02-05 18:30:26 - ERROR - stderr - 35%|███▌ | 7874/22434 [8:22:46<10:11:48, 2.52s/it] +2025-02-05 18:30:28 - ERROR - stderr - 35%|███▌ | 7875/22434 [8:22:48<10:10:48, 2.52s/it] +2025-02-05 18:30:28 - ERROR - stderr - +2025-02-05 18:30:28 - ERROR - stderr - +2025-02-05 18:30:28 - INFO - stdout - {'loss': 0.706, 'grad_norm': 0.9480273723602295, 'learning_rate': 1.506529013900906e-05, 'epoch': 1.05} +2025-02-05 18:30:28 - ERROR - stderr - 35%|███▌ | 7875/22434 [8:22:48<10:10:48, 2.52s/it] +2025-02-05 18:30:31 - ERROR - stderr - 35%|███▌ | 7876/22434 [8:22:51<10:24:53, 2.58s/it] +2025-02-05 18:30:31 - ERROR - stderr - +2025-02-05 18:30:31 - ERROR - stderr - +2025-02-05 18:30:31 - INFO - stdout - {'loss': 0.8082, 'grad_norm': 1.1259511709213257, 'learning_rate': 1.5064045254797797e-05, 'epoch': 1.05} +2025-02-05 18:30:31 - ERROR - stderr - 35%|███▌ | 7876/22434 [8:22:51<10:24:53, 2.58s/it] +2025-02-05 18:30:33 - ERROR - stderr - 35%|███▌ | 7877/22434 [8:22:53<10:15:08, 2.54s/it] +2025-02-05 18:30:33 - ERROR - stderr - +2025-02-05 18:30:33 - ERROR - stderr - +2025-02-05 18:30:33 - INFO - stdout - {'loss': 0.7976, 'grad_norm': 
1.2826861143112183, 'learning_rate': 1.5062800265031358e-05, 'epoch': 1.05} +2025-02-05 18:30:33 - ERROR - stderr - 35%|███▌ | 7877/22434 [8:22:53<10:15:08, 2.54s/it] +2025-02-05 18:30:36 - ERROR - stderr - 35%|███▌ | 7878/22434 [8:22:56<10:12:04, 2.52s/it] +2025-02-05 18:30:36 - ERROR - stderr - +2025-02-05 18:30:36 - ERROR - stderr - +2025-02-05 18:30:36 - INFO - stdout - {'loss': 0.6445, 'grad_norm': 1.0068809986114502, 'learning_rate': 1.506155516973569e-05, 'epoch': 1.05} +2025-02-05 18:30:36 - ERROR - stderr - 35%|███▌ | 7878/22434 [8:22:56<10:12:04, 2.52s/it] +2025-02-05 18:30:39 - ERROR - stderr - 35%|███▌ | 7879/22434 [8:22:58<10:27:19, 2.59s/it] +2025-02-05 18:30:39 - ERROR - stderr - +2025-02-05 18:30:39 - ERROR - stderr - +2025-02-05 18:30:39 - INFO - stdout - {'loss': 0.6528, 'grad_norm': 1.1160383224487305, 'learning_rate': 1.5060309968936753e-05, 'epoch': 1.05} +2025-02-05 18:30:39 - ERROR - stderr - 35%|███▌ | 7879/22434 [8:22:58<10:27:19, 2.59s/it] +2025-02-05 18:30:41 - ERROR - stderr - 35%|███▌ | 7880/22434 [8:23:01<10:19:37, 2.55s/it] +2025-02-05 18:30:41 - ERROR - stderr - +2025-02-05 18:30:41 - ERROR - stderr - +2025-02-05 18:30:41 - INFO - stdout - {'loss': 0.7247, 'grad_norm': 1.177682638168335, 'learning_rate': 1.5059064662660491e-05, 'epoch': 1.05} +2025-02-05 18:30:41 - ERROR - stderr - 35%|███▌ | 7880/22434 [8:23:01<10:19:37, 2.55s/it] +2025-02-05 18:30:44 - ERROR - stderr - 35%|███▌ | 7881/22434 [8:23:04<10:26:38, 2.58s/it] +2025-02-05 18:30:44 - ERROR - stderr - +2025-02-05 18:30:44 - ERROR - stderr - +2025-02-05 18:30:44 - INFO - stdout - {'loss': 0.7115, 'grad_norm': 1.1462388038635254, 'learning_rate': 1.5057819250932872e-05, 'epoch': 1.05} +2025-02-05 18:30:44 - ERROR - stderr - 35%|███▌ | 7881/22434 [8:23:04<10:26:38, 2.58s/it] +2025-02-05 18:30:46 - ERROR - stderr - 35%|███▌ | 7882/22434 [8:23:06<10:17:03, 2.54s/it] +2025-02-05 18:30:46 - ERROR - stderr - +2025-02-05 18:30:46 - ERROR - stderr - +2025-02-05 18:30:46 - INFO - 
stdout - {'loss': 0.82, 'grad_norm': 1.2165483236312866, 'learning_rate': 1.5056573733779848e-05, 'epoch': 1.05} +2025-02-05 18:30:46 - ERROR - stderr - 35%|███▌ | 7882/22434 [8:23:06<10:17:03, 2.54s/it] +2025-02-05 18:30:49 - ERROR - stderr - 35%|███▌ | 7883/22434 [8:23:09<10:15:26, 2.54s/it] +2025-02-05 18:30:49 - ERROR - stderr - +2025-02-05 18:30:49 - ERROR - stderr - +2025-02-05 18:30:49 - INFO - stdout - {'loss': 0.7073, 'grad_norm': 1.0892783403396606, 'learning_rate': 1.5055328111227386e-05, 'epoch': 1.05} +2025-02-05 18:30:49 - ERROR - stderr - 35%|███▌ | 7883/22434 [8:23:09<10:15:26, 2.54s/it] +2025-02-05 18:30:51 - ERROR - stderr - 35%|███▌ | 7884/22434 [8:23:11<10:14:51, 2.54s/it] +2025-02-05 18:30:51 - ERROR - stderr - +2025-02-05 18:30:51 - ERROR - stderr - +2025-02-05 18:30:51 - INFO - stdout - {'loss': 0.7064, 'grad_norm': 1.057525396347046, 'learning_rate': 1.5054082383301441e-05, 'epoch': 1.05} +2025-02-05 18:30:51 - ERROR - stderr - 35%|███▌ | 7884/22434 [8:23:11<10:14:51, 2.54s/it] +2025-02-05 18:30:54 - ERROR - stderr - 35%|███▌ | 7885/22434 [8:23:13<10:07:27, 2.51s/it] +2025-02-05 18:30:54 - ERROR - stderr - +2025-02-05 18:30:54 - ERROR - stderr - +2025-02-05 18:30:54 - INFO - stdout - {'loss': 0.7184, 'grad_norm': 1.0444414615631104, 'learning_rate': 1.505283655002799e-05, 'epoch': 1.05} +2025-02-05 18:30:54 - ERROR - stderr - 35%|███▌ | 7885/22434 [8:23:14<10:07:27, 2.51s/it] +2025-02-05 18:30:56 - ERROR - stderr - 35%|███▌ | 7886/22434 [8:23:16<10:02:38, 2.49s/it] +2025-02-05 18:30:56 - ERROR - stderr - +2025-02-05 18:30:56 - ERROR - stderr - +2025-02-05 18:30:56 - INFO - stdout - {'loss': 0.666, 'grad_norm': 1.2299485206604004, 'learning_rate': 1.5051590611432994e-05, 'epoch': 1.05} +2025-02-05 18:30:56 - ERROR - stderr - 35%|███▌ | 7886/22434 [8:23:16<10:02:38, 2.49s/it] +2025-02-05 18:30:59 - ERROR - stderr - 35%|███▌ | 7887/22434 [8:23:19<10:14:31, 2.53s/it] +2025-02-05 18:30:59 - ERROR - stderr - +2025-02-05 18:30:59 - ERROR - stderr - 
+2025-02-05 18:30:59 - INFO - stdout - {'loss': 0.8481, 'grad_norm': 1.2642645835876465, 'learning_rate': 1.5050344567542425e-05, 'epoch': 1.05} +2025-02-05 18:30:59 - ERROR - stderr - 35%|███▌ | 7887/22434 [8:23:19<10:14:31, 2.53s/it] +2025-02-05 18:31:02 - ERROR - stderr - 35%|███▌ | 7888/22434 [8:23:21<10:40:32, 2.64s/it] +2025-02-05 18:31:02 - ERROR - stderr - +2025-02-05 18:31:02 - ERROR - stderr - +2025-02-05 18:31:02 - INFO - stdout - {'loss': 0.6449, 'grad_norm': 1.0813515186309814, 'learning_rate': 1.5049098418382257e-05, 'epoch': 1.05} +2025-02-05 18:31:02 - ERROR - stderr - 35%|███▌ | 7888/22434 [8:23:22<10:40:32, 2.64s/it] +2025-02-05 18:31:04 - ERROR - stderr - 35%|███▌ | 7889/22434 [8:23:24<10:28:17, 2.59s/it] +2025-02-05 18:31:04 - ERROR - stderr - +2025-02-05 18:31:04 - ERROR - stderr - +2025-02-05 18:31:04 - INFO - stdout - {'loss': 0.7718, 'grad_norm': 1.2013095617294312, 'learning_rate': 1.5047852163978464e-05, 'epoch': 1.05} +2025-02-05 18:31:04 - ERROR - stderr - 35%|███▌ | 7889/22434 [8:23:24<10:28:17, 2.59s/it] +2025-02-05 18:31:07 - ERROR - stderr - 35%|███▌ | 7890/22434 [8:23:26<10:19:44, 2.56s/it] +2025-02-05 18:31:07 - ERROR - stderr - +2025-02-05 18:31:07 - ERROR - stderr - +2025-02-05 18:31:07 - INFO - stdout - {'loss': 0.6732, 'grad_norm': 1.0395716428756714, 'learning_rate': 1.5046605804357021e-05, 'epoch': 1.06} +2025-02-05 18:31:07 - ERROR - stderr - 35%|███▌ | 7890/22434 [8:23:26<10:19:44, 2.56s/it] +2025-02-05 18:31:09 - ERROR - stderr - 35%|███▌ | 7891/22434 [8:23:29<10:16:12, 2.54s/it] +2025-02-05 18:31:09 - ERROR - stderr - +2025-02-05 18:31:09 - ERROR - stderr - +2025-02-05 18:31:09 - INFO - stdout - {'loss': 0.7364, 'grad_norm': 1.2394373416900635, 'learning_rate': 1.5045359339543912e-05, 'epoch': 1.06} +2025-02-05 18:31:09 - ERROR - stderr - 35%|███▌ | 7891/22434 [8:23:29<10:16:12, 2.54s/it] +2025-02-05 18:31:12 - ERROR - stderr - 35%|███▌ | 7892/22434 [8:23:31<10:12:29, 2.53s/it] +2025-02-05 18:31:12 - ERROR - stderr - 
+2025-02-05 18:31:12 - ERROR - stderr - +2025-02-05 18:31:12 - INFO - stdout - {'loss': 0.8007, 'grad_norm': 1.1406208276748657, 'learning_rate': 1.5044112769565113e-05, 'epoch': 1.06} +2025-02-05 18:31:12 - ERROR - stderr - 35%|███▌ | 7892/22434 [8:23:31<10:12:29, 2.53s/it] +2025-02-05 18:31:14 - ERROR - stderr - 35%|███▌ | 7893/22434 [8:23:34<10:12:37, 2.53s/it] +2025-02-05 18:31:14 - ERROR - stderr - +2025-02-05 18:31:14 - ERROR - stderr - +2025-02-05 18:31:14 - INFO - stdout - {'loss': 0.7638, 'grad_norm': 1.0284279584884644, 'learning_rate': 1.5042866094446615e-05, 'epoch': 1.06} +2025-02-05 18:31:14 - ERROR - stderr - 35%|███▌ | 7893/22434 [8:23:34<10:12:37, 2.53s/it] +2025-02-05 18:31:17 - ERROR - stderr - 35%|███▌ | 7894/22434 [8:23:36<10:09:16, 2.51s/it] +2025-02-05 18:31:17 - ERROR - stderr - +2025-02-05 18:31:17 - ERROR - stderr - +2025-02-05 18:31:17 - INFO - stdout - {'loss': 0.692, 'grad_norm': 1.0506573915481567, 'learning_rate': 1.5041619314214396e-05, 'epoch': 1.06} +2025-02-05 18:31:17 - ERROR - stderr - 35%|███▌ | 7894/22434 [8:23:36<10:09:16, 2.51s/it] +2025-02-05 18:31:19 - ERROR - stderr - 35%|███▌ | 7895/22434 [8:23:39<10:10:28, 2.52s/it] +2025-02-05 18:31:19 - ERROR - stderr - +2025-02-05 18:31:19 - ERROR - stderr - +2025-02-05 18:31:19 - INFO - stdout - {'loss': 0.6897, 'grad_norm': 1.2404394149780273, 'learning_rate': 1.5040372428894446e-05, 'epoch': 1.06} +2025-02-05 18:31:19 - ERROR - stderr - 35%|███▌ | 7895/22434 [8:23:39<10:10:28, 2.52s/it] +2025-02-05 18:31:22 - ERROR - stderr - 35%|███▌ | 7896/22434 [8:23:42<10:19:29, 2.56s/it] +2025-02-05 18:31:22 - ERROR - stderr - +2025-02-05 18:31:22 - ERROR - stderr - +2025-02-05 18:31:22 - INFO - stdout - {'loss': 0.7774, 'grad_norm': 1.128794550895691, 'learning_rate': 1.5039125438512755e-05, 'epoch': 1.06} +2025-02-05 18:31:22 - ERROR - stderr - 35%|███▌ | 7896/22434 [8:23:42<10:19:29, 2.56s/it] +2025-02-05 18:31:24 - ERROR - stderr - 35%|███▌ | 7897/22434 [8:23:44<10:15:50, 2.54s/it]
+2025-02-05 18:31:24 - ERROR - stderr - +2025-02-05 18:31:24 - ERROR - stderr - +2025-02-05 18:31:24 - INFO - stdout - {'loss': 0.6927, 'grad_norm': 1.0648314952850342, 'learning_rate': 1.5037878343095319e-05, 'epoch': 1.06} +2025-02-05 18:31:24 - ERROR - stderr - 35%|███▌ | 7897/22434 [8:23:44<10:15:50, 2.54s/it] +2025-02-05 18:31:27 - ERROR - stderr - 35%|███▌ | 7898/22434 [8:23:47<10:08:44, 2.51s/it] +2025-02-05 18:31:27 - ERROR - stderr - +2025-02-05 18:31:27 - ERROR - stderr - +2025-02-05 18:31:27 - INFO - stdout - {'loss': 0.7425, 'grad_norm': 1.1216869354248047, 'learning_rate': 1.5036631142668125e-05, 'epoch': 1.06} +2025-02-05 18:31:27 - ERROR - stderr - 35%|███▌ | 7898/22434 [8:23:47<10:08:44, 2.51s/it] +2025-02-05 18:31:29 - ERROR - stderr - 35%|███▌ | 7899/22434 [8:23:49<10:06:17, 2.50s/it] +2025-02-05 18:31:29 - ERROR - stderr - +2025-02-05 18:31:29 - ERROR - stderr - +2025-02-05 18:31:29 - INFO - stdout - {'loss': 0.6363, 'grad_norm': 0.9986578226089478, 'learning_rate': 1.5035383837257178e-05, 'epoch': 1.06} +2025-02-05 18:31:29 - ERROR - stderr - 35%|███▌ | 7899/22434 [8:23:49<10:06:17, 2.50s/it] +2025-02-05 18:31:32 - ERROR - stderr - 35%|███▌ | 7900/22434 [8:23:52<10:07:13, 2.51s/it] +2025-02-05 18:31:32 - ERROR - stderr - +2025-02-05 18:31:32 - ERROR - stderr - +2025-02-05 18:31:32 - INFO - stdout - {'loss': 0.739, 'grad_norm': 1.1233991384506226, 'learning_rate': 1.5034136426888472e-05, 'epoch': 1.06} +2025-02-05 18:31:32 - ERROR - stderr - 35%|███▌ | 7900/22434 [8:23:52<10:07:13, 2.51s/it] +2025-02-05 18:31:34 - ERROR - stderr - 35%|███▌ | 7901/22434 [8:23:54<10:08:06, 2.51s/it] +2025-02-05 18:31:34 - ERROR - stderr - +2025-02-05 18:31:34 - ERROR - stderr - +2025-02-05 18:31:34 - INFO - stdout - {'loss': 0.7462, 'grad_norm': 1.2780122756958008, 'learning_rate': 1.5032888911588008e-05, 'epoch': 1.06} +2025-02-05 18:31:34 - ERROR - stderr - 35%|███▌ | 7901/22434 [8:23:54<10:08:06, 2.51s/it] +2025-02-05 18:31:37 - ERROR - stderr - 35%|███▌ | 
7902/22434 [8:23:57<10:07:12, 2.51s/it] +2025-02-05 18:31:37 - ERROR - stderr - +2025-02-05 18:31:37 - ERROR - stderr - +2025-02-05 18:31:37 - INFO - stdout - {'loss': 0.7972, 'grad_norm': 1.181603193283081, 'learning_rate': 1.5031641291381793e-05, 'epoch': 1.06} +2025-02-05 18:31:37 - ERROR - stderr - 35%|███▌ | 7902/22434 [8:23:57<10:07:12, 2.51s/it] +2025-02-05 18:31:39 - ERROR - stderr - 35%|███▌ | 7903/22434 [8:23:59<10:04:29, 2.50s/it] +2025-02-05 18:31:39 - ERROR - stderr - +2025-02-05 18:31:39 - ERROR - stderr - +2025-02-05 18:31:39 - INFO - stdout - {'loss': 0.6787, 'grad_norm': 1.107285737991333, 'learning_rate': 1.5030393566295829e-05, 'epoch': 1.06} +2025-02-05 18:31:39 - ERROR - stderr - 35%|███▌ | 7903/22434 [8:23:59<10:04:29, 2.50s/it] +2025-02-05 18:31:42 - ERROR - stderr - 35%|███▌ | 7904/22434 [8:24:02<10:09:13, 2.52s/it] +2025-02-05 18:31:42 - ERROR - stderr - +2025-02-05 18:31:42 - ERROR - stderr - +2025-02-05 18:31:42 - INFO - stdout - {'loss': 0.7457, 'grad_norm': 1.1563758850097656, 'learning_rate': 1.5029145736356125e-05, 'epoch': 1.06} +2025-02-05 18:31:42 - ERROR - stderr - 35%|███▌ | 7904/22434 [8:24:02<10:09:13, 2.52s/it] +2025-02-05 18:31:44 - ERROR - stderr - 35%|███▌ | 7905/22434 [8:24:04<10:11:24, 2.52s/it] +2025-02-05 18:31:44 - ERROR - stderr - +2025-02-05 18:31:44 - ERROR - stderr - +2025-02-05 18:31:44 - INFO - stdout - {'loss': 0.8484, 'grad_norm': 1.2809792757034302, 'learning_rate': 1.5027897801588692e-05, 'epoch': 1.06} +2025-02-05 18:31:44 - ERROR - stderr - 35%|███▌ | 7905/22434 [8:24:04<10:11:24, 2.52s/it] +2025-02-05 18:31:47 - ERROR - stderr - 35%|███▌ | 7906/22434 [8:24:07<10:08:22, 2.51s/it] +2025-02-05 18:31:47 - ERROR - stderr - +2025-02-05 18:31:47 - ERROR - stderr - +2025-02-05 18:31:47 - INFO - stdout - {'loss': 0.6895, 'grad_norm': 1.0872395038604736, 'learning_rate': 1.5026649762019539e-05, 'epoch': 1.06} +2025-02-05 18:31:47 - ERROR - stderr - 35%|███▌ | 7906/22434 [8:24:07<10:08:22, 2.51s/it] +2025-02-05 
18:31:49 - ERROR - stderr - 35%|███▌ | 7907/22434 [8:24:09<10:07:30, 2.51s/it] +2025-02-05 18:31:49 - ERROR - stderr - +2025-02-05 18:31:49 - ERROR - stderr - +2025-02-05 18:31:49 - INFO - stdout - {'loss': 0.7008, 'grad_norm': 1.1085779666900635, 'learning_rate': 1.5025401617674682e-05, 'epoch': 1.06} +2025-02-05 18:31:49 - ERROR - stderr - 35%|███▌ | 7907/22434 [8:24:09<10:07:30, 2.51s/it] +2025-02-05 18:31:52 - ERROR - stderr - 35%|███▌ | 7908/22434 [8:24:12<10:05:33, 2.50s/it] +2025-02-05 18:31:52 - ERROR - stderr - +2025-02-05 18:31:52 - ERROR - stderr - +2025-02-05 18:31:52 - INFO - stdout - {'loss': 0.7664, 'grad_norm': 1.1455177068710327, 'learning_rate': 1.5024153368580137e-05, 'epoch': 1.06} +2025-02-05 18:31:52 - ERROR - stderr - 35%|███▌ | 7908/22434 [8:24:12<10:05:33, 2.50s/it] +2025-02-05 18:31:54 - ERROR - stderr - 35%|███▌ | 7909/22434 [8:24:14<10:14:05, 2.54s/it] +2025-02-05 18:31:55 - ERROR - stderr - +2025-02-05 18:31:55 - ERROR - stderr - +2025-02-05 18:31:55 - INFO - stdout - {'loss': 0.6531, 'grad_norm': 1.0578618049621582, 'learning_rate': 1.5022905014761921e-05, 'epoch': 1.06} +2025-02-05 18:31:55 - ERROR - stderr - 35%|███▌ | 7909/22434 [8:24:14<10:14:05, 2.54s/it] +2025-02-05 18:31:57 - ERROR - stderr - 35%|███▌ | 7910/22434 [8:24:17<10:37:57, 2.64s/it] +2025-02-05 18:31:57 - ERROR - stderr - +2025-02-05 18:31:57 - ERROR - stderr - +2025-02-05 18:31:57 - INFO - stdout - {'loss': 0.7683, 'grad_norm': 1.330992341041565, 'learning_rate': 1.5021656556246056e-05, 'epoch': 1.06} +2025-02-05 18:31:57 - ERROR - stderr - 35%|███▌ | 7910/22434 [8:24:17<10:37:57, 2.64s/it] +2025-02-05 18:32:00 - ERROR - stderr - 35%|███▌ | 7911/22434 [8:24:20<10:33:38, 2.62s/it] +2025-02-05 18:32:00 - ERROR - stderr - +2025-02-05 18:32:00 - ERROR - stderr - +2025-02-05 18:32:00 - INFO - stdout - {'loss': 0.6146, 'grad_norm': 1.0405502319335938, 'learning_rate': 1.5020407993058568e-05, 'epoch': 1.06} +2025-02-05 18:32:00 - ERROR - stderr - 35%|███▌ | 7911/22434 
[8:24:20<10:33:38, 2.62s/it] +2025-02-05 18:32:02 - ERROR - stderr - 35%|███▌ | 7912/22434 [8:24:22<10:22:24, 2.57s/it] +2025-02-05 18:32:02 - ERROR - stderr - +2025-02-05 18:32:02 - ERROR - stderr - +2025-02-05 18:32:02 - INFO - stdout - {'loss': 0.7217, 'grad_norm': 1.152622938156128, 'learning_rate': 1.5019159325225476e-05, 'epoch': 1.06} +2025-02-05 18:32:02 - ERROR - stderr - 35%|███▌ | 7912/22434 [8:24:22<10:22:24, 2.57s/it] +2025-02-05 18:32:05 - ERROR - stderr - 35%|███▌ | 7913/22434 [8:24:25<10:12:25, 2.53s/it] +2025-02-05 18:32:05 - ERROR - stderr - +2025-02-05 18:32:05 - ERROR - stderr - +2025-02-05 18:32:05 - INFO - stdout - {'loss': 0.665, 'grad_norm': 1.235860824584961, 'learning_rate': 1.5017910552772813e-05, 'epoch': 1.06} +2025-02-05 18:32:05 - ERROR - stderr - 35%|███▌ | 7913/22434 [8:24:25<10:12:25, 2.53s/it] +2025-02-05 18:32:07 - ERROR - stderr - 35%|███▌ | 7914/22434 [8:24:27<10:07:35, 2.51s/it] +2025-02-05 18:32:07 - ERROR - stderr - +2025-02-05 18:32:07 - ERROR - stderr - +2025-02-05 18:32:07 - INFO - stdout - {'loss': 0.7759, 'grad_norm': 1.179604172706604, 'learning_rate': 1.501666167572661e-05, 'epoch': 1.06} +2025-02-05 18:32:07 - ERROR - stderr - 35%|███▌ | 7914/22434 [8:24:27<10:07:35, 2.51s/it] +2025-02-05 18:32:10 - ERROR - stderr - 35%|███▌ | 7915/22434 [8:24:30<10:10:53, 2.52s/it] +2025-02-05 18:32:10 - ERROR - stderr - +2025-02-05 18:32:10 - ERROR - stderr - +2025-02-05 18:32:10 - INFO - stdout - {'loss': 0.7098, 'grad_norm': 1.1920925378799438, 'learning_rate': 1.501541269411289e-05, 'epoch': 1.06} +2025-02-05 18:32:10 - ERROR - stderr - 35%|███▌ | 7915/22434 [8:24:30<10:10:53, 2.52s/it] +2025-02-05 18:32:12 - ERROR - stderr - 35%|███▌ | 7916/22434 [8:24:32<10:06:06, 2.50s/it] +2025-02-05 18:32:12 - ERROR - stderr - +2025-02-05 18:32:12 - ERROR - stderr - +2025-02-05 18:32:12 - INFO - stdout - {'loss': 0.7255, 'grad_norm': 1.165855050086975, 'learning_rate': 1.5014163607957691e-05, 'epoch': 1.06} +2025-02-05 18:32:12 - ERROR - 
stderr - 35%|███▌ | 7916/22434 [8:24:32<10:06:06, 2.50s/it] +2025-02-05 18:32:15 - ERROR - stderr - 35%|███▌ | 7917/22434 [8:24:35<10:05:30, 2.50s/it] +2025-02-05 18:32:15 - ERROR - stderr - +2025-02-05 18:32:15 - ERROR - stderr - +2025-02-05 18:32:15 - INFO - stdout - {'loss': 0.7119, 'grad_norm': 1.313413381576538, 'learning_rate': 1.501291441728705e-05, 'epoch': 1.06} +2025-02-05 18:32:15 - ERROR - stderr - 35%|███▌ | 7917/22434 [8:24:35<10:05:30, 2.50s/it] +2025-02-05 18:32:17 - ERROR - stderr - 35%|███▌ | 7918/22434 [8:24:37<10:11:30, 2.53s/it] +2025-02-05 18:32:17 - ERROR - stderr - +2025-02-05 18:32:17 - ERROR - stderr - +2025-02-05 18:32:17 - INFO - stdout - {'loss': 0.7101, 'grad_norm': 1.2340948581695557, 'learning_rate': 1.5011665122127008e-05, 'epoch': 1.06} +2025-02-05 18:32:17 - ERROR - stderr - 35%|███▌ | 7918/22434 [8:24:37<10:11:30, 2.53s/it] +2025-02-05 18:32:20 - ERROR - stderr - 35%|███▌ | 7919/22434 [8:24:40<10:07:21, 2.51s/it] +2025-02-05 18:32:20 - ERROR - stderr - +2025-02-05 18:32:20 - ERROR - stderr - +2025-02-05 18:32:20 - INFO - stdout - {'loss': 0.7976, 'grad_norm': 1.2666867971420288, 'learning_rate': 1.5010415722503599e-05, 'epoch': 1.06} +2025-02-05 18:32:20 - ERROR - stderr - 35%|███▌ | 7919/22434 [8:24:40<10:07:21, 2.51s/it] +2025-02-05 18:32:22 - ERROR - stderr - 35%|███▌ | 7920/22434 [8:24:42<10:06:50, 2.51s/it] +2025-02-05 18:32:22 - ERROR - stderr - +2025-02-05 18:32:22 - ERROR - stderr - +2025-02-05 18:32:22 - INFO - stdout - {'loss': 0.7337, 'grad_norm': 1.1065138578414917, 'learning_rate': 1.500916621844287e-05, 'epoch': 1.06} +2025-02-05 18:32:22 - ERROR - stderr - 35%|███▌ | 7920/22434 [8:24:42<10:06:50, 2.51s/it] +2025-02-05 18:32:25 - ERROR - stderr - 35%|███▌ | 7921/22434 [8:24:45<10:04:59, 2.50s/it] +2025-02-05 18:32:25 - ERROR - stderr - +2025-02-05 18:32:25 - ERROR - stderr - +2025-02-05 18:32:25 - INFO - stdout - {'loss': 0.7428, 'grad_norm': 1.1073086261749268, 'learning_rate': 1.5007916609970864e-05, 'epoch': 
1.06} +2025-02-05 18:32:25 - ERROR - stderr - 35%|███▌ | 7921/22434 [8:24:45<10:04:59, 2.50s/it] +2025-02-05 18:32:27 - ERROR - stderr - 35%|███▌ | 7922/22434 [8:24:47<10:04:20, 2.50s/it] +2025-02-05 18:32:27 - ERROR - stderr - +2025-02-05 18:32:27 - ERROR - stderr - +2025-02-05 18:32:27 - INFO - stdout - {'loss': 0.7266, 'grad_norm': 1.1413471698760986, 'learning_rate': 1.5006666897113632e-05, 'epoch': 1.06} +2025-02-05 18:32:27 - ERROR - stderr - 35%|███▌ | 7922/22434 [8:24:47<10:04:20, 2.50s/it] +2025-02-05 18:32:30 - ERROR - stderr - 35%|███▌ | 7923/22434 [8:24:50<10:02:33, 2.49s/it] +2025-02-05 18:32:30 - ERROR - stderr - +2025-02-05 18:32:30 - ERROR - stderr - +2025-02-05 18:32:30 - INFO - stdout - {'loss': 0.7063, 'grad_norm': 1.0360164642333984, 'learning_rate': 1.5005417079897213e-05, 'epoch': 1.06} +2025-02-05 18:32:30 - ERROR - stderr - 35%|███▌ | 7923/22434 [8:24:50<10:02:33, 2.49s/it] +2025-02-05 18:32:32 - ERROR - stderr - 35%|███▌ | 7924/22434 [8:24:52<10:02:49, 2.49s/it] +2025-02-05 18:32:32 - ERROR - stderr - +2025-02-05 18:32:32 - ERROR - stderr - +2025-02-05 18:32:32 - INFO - stdout - {'loss': 0.6272, 'grad_norm': 1.0270274877548218, 'learning_rate': 1.5004167158347667e-05, 'epoch': 1.06} +2025-02-05 18:32:32 - ERROR - stderr - 35%|███▌ | 7924/22434 [8:24:52<10:02:49, 2.49s/it] +2025-02-05 18:32:35 - ERROR - stderr - 35%|███▌ | 7925/22434 [8:24:55<10:06:03, 2.51s/it] +2025-02-05 18:32:35 - ERROR - stderr - +2025-02-05 18:32:35 - ERROR - stderr - +2025-02-05 18:32:35 - INFO - stdout - {'loss': 0.7247, 'grad_norm': 1.1838332414627075, 'learning_rate': 1.5002917132491047e-05, 'epoch': 1.06} +2025-02-05 18:32:35 - ERROR - stderr - 35%|███▌ | 7925/22434 [8:24:55<10:06:03, 2.51s/it] +2025-02-05 18:32:37 - ERROR - stderr - 35%|███▌ | 7926/22434 [8:24:57<10:06:47, 2.51s/it] +2025-02-05 18:32:37 - ERROR - stderr - +2025-02-05 18:32:37 - ERROR - stderr - +2025-02-05 18:32:37 - INFO - stdout - {'loss': 0.8115, 'grad_norm': 1.1557414531707764, 
'learning_rate': 1.5001667002353407e-05, 'epoch': 1.06} +2025-02-05 18:32:37 - ERROR - stderr - 35%|███▌ | 7926/22434 [8:24:57<10:06:47, 2.51s/it] +2025-02-05 18:32:40 - ERROR - stderr - 35%|███▌ | 7927/22434 [8:25:00<10:04:19, 2.50s/it] +2025-02-05 18:32:40 - ERROR - stderr - +2025-02-05 18:32:40 - ERROR - stderr - +2025-02-05 18:32:40 - INFO - stdout - {'loss': 0.6706, 'grad_norm': 1.0948350429534912, 'learning_rate': 1.5000416767960802e-05, 'epoch': 1.06} +2025-02-05 18:32:40 - ERROR - stderr - 35%|███▌ | 7927/22434 [8:25:00<10:04:19, 2.50s/it] +2025-02-05 18:32:42 - ERROR - stderr - 35%|███▌ | 7928/22434 [8:25:02<10:05:46, 2.51s/it] +2025-02-05 18:32:42 - ERROR - stderr - +2025-02-05 18:32:42 - ERROR - stderr - +2025-02-05 18:32:42 - INFO - stdout - {'loss': 0.7422, 'grad_norm': 1.0772641897201538, 'learning_rate': 1.4999166429339296e-05, 'epoch': 1.06} +2025-02-05 18:32:42 - ERROR - stderr - 35%|███▌ | 7928/22434 [8:25:02<10:05:46, 2.51s/it] +2025-02-05 18:32:45 - ERROR - stderr - 35%|███▌ | 7929/22434 [8:25:05<10:06:37, 2.51s/it] +2025-02-05 18:32:45 - ERROR - stderr - +2025-02-05 18:32:45 - ERROR - stderr - +2025-02-05 18:32:45 - INFO - stdout - {'loss': 0.7016, 'grad_norm': 1.1993083953857422, 'learning_rate': 1.4997915986514945e-05, 'epoch': 1.06} +2025-02-05 18:32:45 - ERROR - stderr - 35%|███▌ | 7929/22434 [8:25:05<10:06:37, 2.51s/it] +2025-02-05 18:32:47 - ERROR - stderr - 35%|███▌ | 7930/22434 [8:25:07<10:06:58, 2.51s/it] +2025-02-05 18:32:47 - ERROR - stderr - +2025-02-05 18:32:47 - ERROR - stderr - +2025-02-05 18:32:47 - INFO - stdout - {'loss': 0.8047, 'grad_norm': 1.3113874197006226, 'learning_rate': 1.4996665439513825e-05, 'epoch': 1.06} +2025-02-05 18:32:47 - ERROR - stderr - 35%|███▌ | 7930/22434 [8:25:07<10:06:58, 2.51s/it] +2025-02-05 18:32:50 - ERROR - stderr - 35%|███▌ | 7931/22434 [8:25:10<10:05:50, 2.51s/it] +2025-02-05 18:32:50 - ERROR - stderr - +2025-02-05 18:32:50 - ERROR - stderr - +2025-02-05 18:32:50 - INFO - stdout - {'loss': 
0.6063, 'grad_norm': 0.9327312707901001, 'learning_rate': 1.4995414788361991e-05, 'epoch': 1.06} +2025-02-05 18:32:50 - ERROR - stderr - 35%|███▌ | 7931/22434 [8:25:10<10:05:50, 2.51s/it] +2025-02-05 18:32:52 - ERROR - stderr - 35%|███▌ | 7932/22434 [8:25:12<10:00:14, 2.48s/it] +2025-02-05 18:32:52 - ERROR - stderr - +2025-02-05 18:32:52 - ERROR - stderr - +2025-02-05 18:32:52 - INFO - stdout - {'loss': 0.7428, 'grad_norm': 1.0853824615478516, 'learning_rate': 1.4994164033085516e-05, 'epoch': 1.06} +2025-02-05 18:32:52 - ERROR - stderr - 35%|███▌ | 7932/22434 [8:25:12<10:00:14, 2.48s/it] +2025-02-05 18:32:55 - ERROR - stderr - 35%|███▌ | 7933/22434 [8:25:15<9:56:58, 2.47s/it] +2025-02-05 18:32:55 - ERROR - stderr - +2025-02-05 18:32:55 - ERROR - stderr - +2025-02-05 18:32:55 - INFO - stdout - {'loss': 0.7379, 'grad_norm': 1.1677602529525757, 'learning_rate': 1.4992913173710471e-05, 'epoch': 1.06} +2025-02-05 18:32:55 - ERROR - stderr - 35%|███▌ | 7933/22434 [8:25:15<9:56:58, 2.47s/it] +2025-02-05 18:32:57 - ERROR - stderr - 35%|███▌ | 7934/22434 [8:25:17<10:00:54, 2.49s/it] +2025-02-05 18:32:57 - ERROR - stderr - +2025-02-05 18:32:57 - ERROR - stderr - +2025-02-05 18:32:57 - INFO - stdout - {'loss': 0.7372, 'grad_norm': 1.313225507736206, 'learning_rate': 1.4991662210262929e-05, 'epoch': 1.06} +2025-02-05 18:32:57 - ERROR - stderr - 35%|███▌ | 7934/22434 [8:25:17<10:00:54, 2.49s/it] +2025-02-05 18:33:00 - ERROR - stderr - 35%|███▌ | 7935/22434 [8:25:20<10:00:19, 2.48s/it] +2025-02-05 18:33:00 - ERROR - stderr - +2025-02-05 18:33:00 - ERROR - stderr - +2025-02-05 18:33:00 - INFO - stdout - {'loss': 0.6436, 'grad_norm': 1.1834877729415894, 'learning_rate': 1.4990411142768963e-05, 'epoch': 1.06} +2025-02-05 18:33:00 - ERROR - stderr - 35%|███▌ | 7935/22434 [8:25:20<10:00:19, 2.48s/it] +2025-02-05 18:33:02 - ERROR - stderr - 35%|███▌ | 7936/22434 [8:25:22<9:56:24, 2.47s/it] +2025-02-05 18:33:02 - ERROR - stderr - +2025-02-05 18:33:02 - ERROR - stderr - +2025-02-05 
18:33:02 - INFO - stdout - {'loss': 0.7257, 'grad_norm': 1.192625880241394, 'learning_rate': 1.4989159971254652e-05, 'epoch': 1.06} +2025-02-05 18:33:02 - ERROR - stderr - 35%|███▌ | 7936/22434 [8:25:22<9:56:24, 2.47s/it] +2025-02-05 18:33:05 - ERROR - stderr - 35%|███▌ | 7937/22434 [8:25:24<9:58:50, 2.48s/it] +2025-02-05 18:33:05 - ERROR - stderr - +2025-02-05 18:33:05 - ERROR - stderr - +2025-02-05 18:33:05 - INFO - stdout - {'loss': 0.7332, 'grad_norm': 1.0600441694259644, 'learning_rate': 1.4987908695746078e-05, 'epoch': 1.06} +2025-02-05 18:33:05 - ERROR - stderr - 35%|███▌ | 7937/22434 [8:25:25<9:58:50, 2.48s/it] +2025-02-05 18:33:07 - ERROR - stderr - 35%|███▌ | 7938/22434 [8:25:27<10:05:07, 2.50s/it] +2025-02-05 18:33:07 - ERROR - stderr - +2025-02-05 18:33:07 - ERROR - stderr - +2025-02-05 18:33:07 - INFO - stdout - {'loss': 0.6785, 'grad_norm': 1.1197410821914673, 'learning_rate': 1.498665731626932e-05, 'epoch': 1.06} +2025-02-05 18:33:07 - ERROR - stderr - 35%|███▌ | 7938/22434 [8:25:27<10:05:07, 2.50s/it] +2025-02-05 18:33:10 - ERROR - stderr - 35%|███▌ | 7939/22434 [8:25:30<10:06:02, 2.51s/it] +2025-02-05 18:33:10 - ERROR - stderr - +2025-02-05 18:33:10 - ERROR - stderr - +2025-02-05 18:33:10 - INFO - stdout - {'loss': 0.6885, 'grad_norm': 1.2340052127838135, 'learning_rate': 1.4985405832850462e-05, 'epoch': 1.06} +2025-02-05 18:33:10 - ERROR - stderr - 35%|███▌ | 7939/22434 [8:25:30<10:06:02, 2.51s/it] +2025-02-05 18:33:12 - ERROR - stderr - 35%|███▌ | 7940/22434 [8:25:32<10:04:28, 2.50s/it] +2025-02-05 18:33:12 - ERROR - stderr - +2025-02-05 18:33:12 - ERROR - stderr - +2025-02-05 18:33:12 - INFO - stdout - {'loss': 0.7914, 'grad_norm': 1.1823999881744385, 'learning_rate': 1.4984154245515587e-05, 'epoch': 1.06} +2025-02-05 18:33:12 - ERROR - stderr - 35%|███▌ | 7940/22434 [8:25:32<10:04:28, 2.50s/it] +2025-02-05 18:33:15 - ERROR - stderr - 35%|███▌ | 7941/22434 [8:25:35<10:08:59, 2.52s/it] +2025-02-05 18:33:15 - ERROR - stderr - +2025-02-05 18:33:15 
- ERROR - stderr - +2025-02-05 18:33:15 - INFO - stdout - {'loss': 0.7062, 'grad_norm': 1.3604391813278198, 'learning_rate': 1.4982902554290787e-05, 'epoch': 1.06} +2025-02-05 18:33:15 - ERROR - stderr - 35%|███▌ | 7941/22434 [8:25:35<10:08:59, 2.52s/it] +2025-02-05 18:33:17 - ERROR - stderr - 35%|███▌ | 7942/22434 [8:25:37<10:13:47, 2.54s/it] +2025-02-05 18:33:17 - ERROR - stderr - +2025-02-05 18:33:17 - ERROR - stderr - +2025-02-05 18:33:17 - INFO - stdout - {'loss': 0.703, 'grad_norm': 1.12320876121521, 'learning_rate': 1.4981650759202154e-05, 'epoch': 1.06} +2025-02-05 18:33:17 - ERROR - stderr - 35%|███▌ | 7942/22434 [8:25:37<10:13:47, 2.54s/it] +2025-02-05 18:33:20 - ERROR - stderr - 35%|███▌ | 7943/22434 [8:25:40<10:10:52, 2.53s/it] +2025-02-05 18:33:20 - ERROR - stderr - +2025-02-05 18:33:20 - ERROR - stderr - +2025-02-05 18:33:20 - INFO - stdout - {'loss': 0.6569, 'grad_norm': 1.0019127130508423, 'learning_rate': 1.4980398860275775e-05, 'epoch': 1.06} +2025-02-05 18:33:20 - ERROR - stderr - 35%|███▌ | 7943/22434 [8:25:40<10:10:52, 2.53s/it] +2025-02-05 18:33:22 - ERROR - stderr - 35%|███▌ | 7944/22434 [8:25:42<10:10:23, 2.53s/it] +2025-02-05 18:33:22 - ERROR - stderr - +2025-02-05 18:33:22 - ERROR - stderr - +2025-02-05 18:33:22 - INFO - stdout - {'loss': 0.7438, 'grad_norm': 1.2252477407455444, 'learning_rate': 1.497914685753775e-05, 'epoch': 1.06} +2025-02-05 18:33:22 - ERROR - stderr - 35%|███▌ | 7944/22434 [8:25:42<10:10:23, 2.53s/it] +2025-02-05 18:33:25 - ERROR - stderr - 35%|███▌ | 7945/22434 [8:25:45<10:03:55, 2.50s/it] +2025-02-05 18:33:25 - ERROR - stderr - +2025-02-05 18:33:25 - ERROR - stderr - +2025-02-05 18:33:25 - INFO - stdout - {'loss': 0.8165, 'grad_norm': 1.2669659852981567, 'learning_rate': 1.4977894751014171e-05, 'epoch': 1.06} +2025-02-05 18:33:25 - ERROR - stderr - 35%|███▌ | 7945/22434 [8:25:45<10:03:55, 2.50s/it] +2025-02-05 18:33:27 - ERROR - stderr - 35%|███▌ | 7946/22434 [8:25:47<10:00:21, 2.49s/it] +2025-02-05 18:33:27 - ERROR 
- stderr - +2025-02-05 18:33:27 - ERROR - stderr - +2025-02-05 18:33:27 - INFO - stdout - {'loss': 0.7416, 'grad_norm': 1.1917272806167603, 'learning_rate': 1.497664254073114e-05, 'epoch': 1.06} +2025-02-05 18:33:27 - ERROR - stderr - 35%|███▌ | 7946/22434 [8:25:47<10:00:21, 2.49s/it] +2025-02-05 18:33:30 - ERROR - stderr - 35%|███▌ | 7947/22434 [8:25:50<10:01:48, 2.49s/it] +2025-02-05 18:33:30 - ERROR - stderr - +2025-02-05 18:33:30 - ERROR - stderr - +2025-02-05 18:33:30 - INFO - stdout - {'loss': 0.7177, 'grad_norm': 1.1570379734039307, 'learning_rate': 1.4975390226714762e-05, 'epoch': 1.06} +2025-02-05 18:33:30 - ERROR - stderr - 35%|███▌ | 7947/22434 [8:25:50<10:01:48, 2.49s/it] +2025-02-05 18:33:33 - ERROR - stderr - 35%|███▌ | 7948/22434 [8:25:52<10:16:31, 2.55s/it] +2025-02-05 18:33:33 - ERROR - stderr - +2025-02-05 18:33:33 - ERROR - stderr - +2025-02-05 18:33:33 - INFO - stdout - {'loss': 0.6859, 'grad_norm': 1.1439061164855957, 'learning_rate': 1.4974137808991128e-05, 'epoch': 1.06} +2025-02-05 18:33:33 - ERROR - stderr - 35%|███▌ | 7948/22434 [8:25:52<10:16:31, 2.55s/it] +2025-02-05 18:33:35 - ERROR - stderr - 35%|███▌ | 7949/22434 [8:25:55<10:06:57, 2.51s/it] +2025-02-05 18:33:35 - ERROR - stderr - +2025-02-05 18:33:35 - ERROR - stderr - +2025-02-05 18:33:35 - INFO - stdout - {'loss': 0.8295, 'grad_norm': 1.3195852041244507, 'learning_rate': 1.4972885287586353e-05, 'epoch': 1.06} +2025-02-05 18:33:35 - ERROR - stderr - 35%|███▌ | 7949/22434 [8:25:55<10:06:57, 2.51s/it] +2025-02-05 18:33:37 - ERROR - stderr - 35%|███▌ | 7950/22434 [8:25:57<10:03:52, 2.50s/it] +2025-02-05 18:33:37 - ERROR - stderr - +2025-02-05 18:33:37 - ERROR - stderr - +2025-02-05 18:33:37 - INFO - stdout - {'loss': 0.7172, 'grad_norm': 1.20064115524292, 'learning_rate': 1.4971632662526545e-05, 'epoch': 1.06} +2025-02-05 18:33:37 - ERROR - stderr - 35%|███▌ | 7950/22434 [8:25:57<10:03:52, 2.50s/it] +2025-02-05 18:33:40 - ERROR - stderr - 35%|███▌ | 7951/22434 [8:26:00<10:35:54, 
2.63s/it] +2025-02-05 18:33:40 - ERROR - stderr - +2025-02-05 18:33:40 - ERROR - stderr - +2025-02-05 18:33:40 - INFO - stdout - {'loss': 0.7198, 'grad_norm': 1.0977063179016113, 'learning_rate': 1.4970379933837811e-05, 'epoch': 1.06} +2025-02-05 18:33:40 - ERROR - stderr - 35%|███▌ | 7951/22434 [8:26:00<10:35:54, 2.63s/it] +2025-02-05 18:33:43 - ERROR - stderr - 35%|███▌ | 7952/22434 [8:26:03<10:24:47, 2.59s/it] +2025-02-05 18:33:43 - ERROR - stderr - +2025-02-05 18:33:43 - ERROR - stderr - +2025-02-05 18:33:43 - INFO - stdout - {'loss': 0.7118, 'grad_norm': 1.128766417503357, 'learning_rate': 1.4969127101546263e-05, 'epoch': 1.06} +2025-02-05 18:33:43 - ERROR - stderr - 35%|███▌ | 7952/22434 [8:26:03<10:24:47, 2.59s/it] +2025-02-05 18:33:46 - ERROR - stderr - 35%|███▌ | 7953/22434 [8:26:06<10:47:35, 2.68s/it] +2025-02-05 18:33:46 - ERROR - stderr - +2025-02-05 18:33:46 - ERROR - stderr - +2025-02-05 18:33:46 - INFO - stdout - {'loss': 0.6831, 'grad_norm': 1.0214426517486572, 'learning_rate': 1.4967874165678016e-05, 'epoch': 1.06} +2025-02-05 18:33:46 - ERROR - stderr - 35%|███▌ | 7953/22434 [8:26:06<10:47:35, 2.68s/it] +2025-02-05 18:33:48 - ERROR - stderr - 35%|███▌ | 7954/22434 [8:26:08<10:36:32, 2.64s/it] +2025-02-05 18:33:48 - ERROR - stderr - +2025-02-05 18:33:48 - ERROR - stderr - +2025-02-05 18:33:48 - INFO - stdout - {'loss': 0.7759, 'grad_norm': 1.2549512386322021, 'learning_rate': 1.4966621126259184e-05, 'epoch': 1.06} +2025-02-05 18:33:48 - ERROR - stderr - 35%|███▌ | 7954/22434 [8:26:08<10:36:32, 2.64s/it] +2025-02-05 18:33:51 - ERROR - stderr - 35%|███▌ | 7955/22434 [8:26:11<10:31:42, 2.62s/it] +2025-02-05 18:33:51 - ERROR - stderr - +2025-02-05 18:33:51 - ERROR - stderr - +2025-02-05 18:33:51 - INFO - stdout - {'loss': 0.7533, 'grad_norm': 1.146721363067627, 'learning_rate': 1.4965367983315889e-05, 'epoch': 1.06} +2025-02-05 18:33:51 - ERROR - stderr - 35%|███▌ | 7955/22434 [8:26:11<10:31:42, 2.62s/it] +2025-02-05 18:33:54 - ERROR - stderr - 
35%|███▌ | 7956/22434 [8:26:13<10:36:21, 2.64s/it] +2025-02-05 18:33:54 - ERROR - stderr - +2025-02-05 18:33:54 - ERROR - stderr - +2025-02-05 18:33:54 - INFO - stdout - {'loss': 0.7325, 'grad_norm': 1.2159944772720337, 'learning_rate': 1.4964114736874249e-05, 'epoch': 1.06} +2025-02-05 18:33:54 - ERROR - stderr - 35%|███▌ | 7956/22434 [8:26:13<10:36:21, 2.64s/it] +2025-02-05 18:33:56 - ERROR - stderr - 35%|███▌ | 7957/22434 [8:26:16<10:29:57, 2.61s/it] +2025-02-05 18:33:56 - ERROR - stderr - +2025-02-05 18:33:56 - ERROR - stderr - +2025-02-05 18:33:56 - INFO - stdout - {'loss': 0.7394, 'grad_norm': 1.1576440334320068, 'learning_rate': 1.4962861386960389e-05, 'epoch': 1.06} +2025-02-05 18:33:56 - ERROR - stderr - 35%|███▌ | 7957/22434 [8:26:16<10:29:57, 2.61s/it] +2025-02-05 18:33:59 - ERROR - stderr - 35%|███▌ | 7958/22434 [8:26:18<10:18:23, 2.56s/it] +2025-02-05 18:33:59 - ERROR - stderr - +2025-02-05 18:33:59 - ERROR - stderr - +2025-02-05 18:33:59 - INFO - stdout - {'loss': 0.7831, 'grad_norm': 1.2468523979187012, 'learning_rate': 1.4961607933600431e-05, 'epoch': 1.06} +2025-02-05 18:33:59 - ERROR - stderr - 35%|███▌ | 7958/22434 [8:26:18<10:18:23, 2.56s/it] +2025-02-05 18:34:01 - ERROR - stderr - 35%|███▌ | 7959/22434 [8:26:21<10:12:02, 2.54s/it] +2025-02-05 18:34:01 - ERROR - stderr - +2025-02-05 18:34:01 - ERROR - stderr - +2025-02-05 18:34:01 - INFO - stdout - {'loss': 0.6631, 'grad_norm': 1.1124165058135986, 'learning_rate': 1.4960354376820503e-05, 'epoch': 1.06} +2025-02-05 18:34:01 - ERROR - stderr - 35%|███▌ | 7959/22434 [8:26:21<10:12:02, 2.54s/it] +2025-02-05 18:34:04 - ERROR - stderr - 35%|███▌ | 7960/22434 [8:26:23<10:17:41, 2.56s/it] +2025-02-05 18:34:04 - ERROR - stderr - +2025-02-05 18:34:04 - ERROR - stderr - +2025-02-05 18:34:04 - INFO - stdout - {'loss': 0.695, 'grad_norm': 1.00448477268219, 'learning_rate': 1.4959100716646733e-05, 'epoch': 1.06} +2025-02-05 18:34:04 - ERROR - stderr - 35%|███▌ | 7960/22434 [8:26:23<10:17:41, 2.56s/it] 
+2025-02-05 18:34:06 - ERROR - stderr - 35%|███▌ | 7961/22434 [8:26:26<10:12:04, 2.54s/it] +2025-02-05 18:34:06 - ERROR - stderr - +2025-02-05 18:34:06 - ERROR - stderr - +2025-02-05 18:34:06 - INFO - stdout - {'loss': 0.6546, 'grad_norm': 0.9752576351165771, 'learning_rate': 1.4957846953105257e-05, 'epoch': 1.06} +2025-02-05 18:34:06 - ERROR - stderr - 35%|███▌ | 7961/22434 [8:26:26<10:12:04, 2.54s/it] +2025-02-05 18:34:09 - ERROR - stderr - 35%|███▌ | 7962/22434 [8:26:28<10:08:31, 2.52s/it] +2025-02-05 18:34:09 - ERROR - stderr - +2025-02-05 18:34:09 - ERROR - stderr - +2025-02-05 18:34:09 - INFO - stdout - {'loss': 0.7418, 'grad_norm': 1.0262619256973267, 'learning_rate': 1.4956593086222204e-05, 'epoch': 1.06} +2025-02-05 18:34:09 - ERROR - stderr - 35%|███▌ | 7962/22434 [8:26:28<10:08:31, 2.52s/it] +2025-02-05 18:34:11 - ERROR - stderr - 35%|███▌ | 7963/22434 [8:26:31<10:06:53, 2.52s/it] +2025-02-05 18:34:11 - ERROR - stderr - +2025-02-05 18:34:11 - ERROR - stderr - +2025-02-05 18:34:11 - INFO - stdout - {'loss': 0.6765, 'grad_norm': 1.0726574659347534, 'learning_rate': 1.495533911602371e-05, 'epoch': 1.06} +2025-02-05 18:34:11 - ERROR - stderr - 35%|███▌ | 7963/22434 [8:26:31<10:06:53, 2.52s/it] +2025-02-05 18:34:14 - ERROR - stderr - 35%|███▌ | 7964/22434 [8:26:33<10:07:06, 2.52s/it] +2025-02-05 18:34:14 - ERROR - stderr - +2025-02-05 18:34:14 - ERROR - stderr - +2025-02-05 18:34:14 - INFO - stdout - {'loss': 0.7203, 'grad_norm': 1.15571928024292, 'learning_rate': 1.4954085042535915e-05, 'epoch': 1.06} +2025-02-05 18:34:14 - ERROR - stderr - 35%|███▌ | 7964/22434 [8:26:33<10:07:06, 2.52s/it] +2025-02-05 18:34:16 - ERROR - stderr - 36%|███▌ | 7965/22434 [8:26:36<10:02:59, 2.50s/it] +2025-02-05 18:34:16 - ERROR - stderr - +2025-02-05 18:34:16 - ERROR - stderr - +2025-02-05 18:34:16 - INFO - stdout - {'loss': 0.7127, 'grad_norm': 1.2726913690567017, 'learning_rate': 1.4952830865784958e-05, 'epoch': 1.07} +2025-02-05 18:34:16 - ERROR - stderr - 36%|███▌ | 
7965/22434 [8:26:36<10:02:59, 2.50s/it] +2025-02-05 18:34:19 - ERROR - stderr - 36%|███▌ | 7966/22434 [8:26:38<10:01:31, 2.49s/it] +2025-02-05 18:34:19 - ERROR - stderr - +2025-02-05 18:34:19 - ERROR - stderr - +2025-02-05 18:34:19 - INFO - stdout - {'loss': 0.7437, 'grad_norm': 1.2412673234939575, 'learning_rate': 1.4951576585796984e-05, 'epoch': 1.07} +2025-02-05 18:34:19 - ERROR - stderr - 36%|███▌ | 7966/22434 [8:26:38<10:01:31, 2.49s/it] +2025-02-05 18:34:21 - ERROR - stderr - 36%|███▌ | 7967/22434 [8:26:41<10:05:13, 2.51s/it] +2025-02-05 18:34:21 - ERROR - stderr - +2025-02-05 18:34:21 - ERROR - stderr - +2025-02-05 18:34:21 - INFO - stdout - {'loss': 0.816, 'grad_norm': 1.3505381345748901, 'learning_rate': 1.495032220259813e-05, 'epoch': 1.07} +2025-02-05 18:34:21 - ERROR - stderr - 36%|███▌ | 7967/22434 [8:26:41<10:05:13, 2.51s/it] +2025-02-05 18:34:24 - ERROR - stderr - 36%|███▌ | 7968/22434 [8:26:43<10:11:42, 2.54s/it] +2025-02-05 18:34:24 - ERROR - stderr - +2025-02-05 18:34:24 - ERROR - stderr - +2025-02-05 18:34:24 - INFO - stdout - {'loss': 0.8216, 'grad_norm': 1.26339852809906, 'learning_rate': 1.4949067716214545e-05, 'epoch': 1.07} +2025-02-05 18:34:24 - ERROR - stderr - 36%|███▌ | 7968/22434 [8:26:44<10:11:42, 2.54s/it] +2025-02-05 18:34:26 - ERROR - stderr - 36%|███▌ | 7969/22434 [8:26:46<10:25:43, 2.60s/it] +2025-02-05 18:34:26 - ERROR - stderr - +2025-02-05 18:34:26 - ERROR - stderr - +2025-02-05 18:34:26 - INFO - stdout - {'loss': 0.736, 'grad_norm': 1.1283643245697021, 'learning_rate': 1.4947813126672381e-05, 'epoch': 1.07} +2025-02-05 18:34:26 - ERROR - stderr - 36%|███▌ | 7969/22434 [8:26:46<10:25:43, 2.60s/it] +2025-02-05 18:34:29 - ERROR - stderr - 36%|███▌ | 7970/22434 [8:26:49<10:19:22, 2.57s/it] +2025-02-05 18:34:29 - ERROR - stderr - +2025-02-05 18:34:29 - ERROR - stderr - +2025-02-05 18:34:29 - INFO - stdout - {'loss': 0.7229, 'grad_norm': 1.1084765195846558, 'learning_rate': 1.4946558433997792e-05, 'epoch': 1.07} +2025-02-05 18:34:29 
- ERROR - stderr - 36%|███▌ | 7970/22434 [8:26:49<10:19:22, 2.57s/it] +2025-02-05 18:34:31 - ERROR - stderr - 36%|███▌ | 7971/22434 [8:26:51<10:13:19, 2.54s/it] +2025-02-05 18:34:31 - ERROR - stderr - +2025-02-05 18:34:31 - ERROR - stderr - +2025-02-05 18:34:31 - INFO - stdout - {'loss': 0.8169, 'grad_norm': 1.316051721572876, 'learning_rate': 1.494530363821692e-05, 'epoch': 1.07} +2025-02-05 18:34:31 - ERROR - stderr - 36%|███▌ | 7971/22434 [8:26:51<10:13:19, 2.54s/it] +2025-02-05 18:34:34 - ERROR - stderr - 36%|███▌ | 7972/22434 [8:26:54<10:11:01, 2.53s/it] +2025-02-05 18:34:34 - ERROR - stderr - +2025-02-05 18:34:34 - ERROR - stderr - +2025-02-05 18:34:34 - INFO - stdout - {'loss': 0.7305, 'grad_norm': 1.1914138793945312, 'learning_rate': 1.4944048739355928e-05, 'epoch': 1.07} +2025-02-05 18:34:34 - ERROR - stderr - 36%|███▌ | 7972/22434 [8:26:54<10:11:01, 2.53s/it] +2025-02-05 18:34:37 - ERROR - stderr - 36%|███▌ | 7973/22434 [8:26:56<10:14:30, 2.55s/it] +2025-02-05 18:34:37 - ERROR - stderr - +2025-02-05 18:34:37 - ERROR - stderr - +2025-02-05 18:34:37 - INFO - stdout - {'loss': 0.7483, 'grad_norm': 1.071157693862915, 'learning_rate': 1.4942793737440968e-05, 'epoch': 1.07} +2025-02-05 18:34:37 - ERROR - stderr - 36%|███▌ | 7973/22434 [8:26:56<10:14:30, 2.55s/it] +2025-02-05 18:34:39 - ERROR - stderr - 36%|███▌ | 7974/22434 [8:26:59<10:11:26, 2.54s/it] +2025-02-05 18:34:39 - ERROR - stderr - +2025-02-05 18:34:39 - ERROR - stderr - +2025-02-05 18:34:39 - INFO - stdout - {'loss': 0.7611, 'grad_norm': 1.2871873378753662, 'learning_rate': 1.4941538632498204e-05, 'epoch': 1.07} +2025-02-05 18:34:39 - ERROR - stderr - 36%|███▌ | 7974/22434 [8:26:59<10:11:26, 2.54s/it] +2025-02-05 18:34:42 - ERROR - stderr - 36%|███▌ | 7975/22434 [8:27:02<10:35:37, 2.64s/it] +2025-02-05 18:34:42 - ERROR - stderr - +2025-02-05 18:34:42 - ERROR - stderr - +2025-02-05 18:34:42 - INFO - stdout - {'loss': 0.7416, 'grad_norm': 1.0527194738388062, 'learning_rate': 1.49402834245538e-05, 
'epoch': 1.07} +2025-02-05 18:34:42 - ERROR - stderr - 36%|███▌ | 7975/22434 [8:27:02<10:35:37, 2.64s/it] +2025-02-05 18:34:45 - ERROR - stderr - 36%|███▌ | 7976/22434 [8:27:05<10:55:40, 2.72s/it] +2025-02-05 18:34:45 - ERROR - stderr - +2025-02-05 18:34:45 - ERROR - stderr - +2025-02-05 18:34:45 - INFO - stdout - {'loss': 0.726, 'grad_norm': 1.2433918714523315, 'learning_rate': 1.493902811363391e-05, 'epoch': 1.07} +2025-02-05 18:34:45 - ERROR - stderr - 36%|███▌ | 7976/22434 [8:27:05<10:55:40, 2.72s/it] +2025-02-05 18:34:47 - ERROR - stderr - 36%|███▌ | 7977/22434 [8:27:07<10:44:10, 2.67s/it] +2025-02-05 18:34:47 - ERROR - stderr - +2025-02-05 18:34:47 - ERROR - stderr - +2025-02-05 18:34:47 - INFO - stdout - {'loss': 0.6955, 'grad_norm': 1.064510464668274, 'learning_rate': 1.4937772699764707e-05, 'epoch': 1.07} +2025-02-05 18:34:47 - ERROR - stderr - 36%|███▌ | 7977/22434 [8:27:07<10:44:10, 2.67s/it] +2025-02-05 18:34:50 - ERROR - stderr - 36%|███▌ | 7978/22434 [8:27:10<10:26:28, 2.60s/it] +2025-02-05 18:34:50 - ERROR - stderr - +2025-02-05 18:34:50 - ERROR - stderr - +2025-02-05 18:34:50 - INFO - stdout - {'loss': 0.8231, 'grad_norm': 1.1654877662658691, 'learning_rate': 1.4936517182972359e-05, 'epoch': 1.07} +2025-02-05 18:34:50 - ERROR - stderr - 36%|███▌ | 7978/22434 [8:27:10<10:26:28, 2.60s/it] +2025-02-05 18:34:52 - ERROR - stderr - 36%|███▌ | 7979/22434 [8:27:12<10:15:52, 2.56s/it] +2025-02-05 18:34:52 - ERROR - stderr - +2025-02-05 18:34:52 - ERROR - stderr - +2025-02-05 18:34:52 - INFO - stdout - {'loss': 0.7221, 'grad_norm': 1.1321035623550415, 'learning_rate': 1.493526156328303e-05, 'epoch': 1.07} +2025-02-05 18:34:52 - ERROR - stderr - 36%|███▌ | 7979/22434 [8:27:12<10:15:52, 2.56s/it] +2025-02-05 18:34:55 - ERROR - stderr - 36%|███▌ | 7980/22434 [8:27:15<10:11:51, 2.54s/it] +2025-02-05 18:34:55 - ERROR - stderr - +2025-02-05 18:34:55 - ERROR - stderr - +2025-02-05 18:34:55 - INFO - stdout - {'loss': 0.6988, 'grad_norm': 1.1151471138000488, 
'learning_rate': 1.4934005840722896e-05, 'epoch': 1.07} +2025-02-05 18:34:55 - ERROR - stderr - 36%|███▌ | 7980/22434 [8:27:15<10:11:51, 2.54s/it] +2025-02-05 18:34:57 - ERROR - stderr - 36%|███▌ | 7981/22434 [8:27:17<10:08:30, 2.53s/it] +2025-02-05 18:34:57 - ERROR - stderr - +2025-02-05 18:34:57 - ERROR - stderr - +2025-02-05 18:34:57 - INFO - stdout - {'loss': 0.6893, 'grad_norm': 1.085904836654663, 'learning_rate': 1.4932750015318134e-05, 'epoch': 1.07} +2025-02-05 18:34:57 - ERROR - stderr - 36%|███▌ | 7981/22434 [8:27:17<10:08:30, 2.53s/it] +2025-02-05 18:35:00 - ERROR - stderr - 36%|███▌ | 7982/22434 [8:27:20<10:08:04, 2.52s/it] +2025-02-05 18:35:00 - ERROR - stderr - +2025-02-05 18:35:00 - ERROR - stderr - +2025-02-05 18:35:00 - INFO - stdout - {'loss': 0.7431, 'grad_norm': 1.0934933423995972, 'learning_rate': 1.493149408709492e-05, 'epoch': 1.07} +2025-02-05 18:35:00 - ERROR - stderr - 36%|███▌ | 7982/22434 [8:27:20<10:08:04, 2.52s/it] +2025-02-05 18:35:02 - ERROR - stderr - 36%|███▌ | 7983/22434 [8:27:22<10:01:03, 2.50s/it] +2025-02-05 18:35:02 - ERROR - stderr - +2025-02-05 18:35:02 - ERROR - stderr - +2025-02-05 18:35:02 - INFO - stdout - {'loss': 0.8035, 'grad_norm': 1.1888419389724731, 'learning_rate': 1.493023805607943e-05, 'epoch': 1.07} +2025-02-05 18:35:02 - ERROR - stderr - 36%|███▌ | 7983/22434 [8:27:22<10:01:03, 2.50s/it] +2025-02-05 18:35:05 - ERROR - stderr - 36%|███▌ | 7984/22434 [8:27:24<9:59:52, 2.49s/it] +2025-02-05 18:35:05 - ERROR - stderr - +2025-02-05 18:35:05 - ERROR - stderr - +2025-02-05 18:35:05 - INFO - stdout - {'loss': 0.6947, 'grad_norm': 1.2238980531692505, 'learning_rate': 1.4928981922297842e-05, 'epoch': 1.07} +2025-02-05 18:35:05 - ERROR - stderr - 36%|███▌ | 7984/22434 [8:27:25<9:59:52, 2.49s/it] +2025-02-05 18:35:07 - ERROR - stderr - 36%|███▌ | 7985/22434 [8:27:27<10:00:28, 2.49s/it] +2025-02-05 18:35:07 - ERROR - stderr - +2025-02-05 18:35:07 - ERROR - stderr - +2025-02-05 18:35:07 - INFO - stdout - {'loss': 0.6843, 
'grad_norm': 1.1276289224624634, 'learning_rate': 1.4927725685776344e-05, 'epoch': 1.07} +2025-02-05 18:35:07 - ERROR - stderr - 36%|███▌ | 7985/22434 [8:27:27<10:00:28, 2.49s/it] +2025-02-05 18:35:10 - ERROR - stderr - 36%|███▌ | 7986/22434 [8:27:29<9:54:11, 2.47s/it] +2025-02-05 18:35:10 - ERROR - stderr - +2025-02-05 18:35:10 - ERROR - stderr - +2025-02-05 18:35:10 - INFO - stdout - {'loss': 0.6745, 'grad_norm': 1.111707091331482, 'learning_rate': 1.492646934654112e-05, 'epoch': 1.07} +2025-02-05 18:35:10 - ERROR - stderr - 36%|███▌ | 7986/22434 [8:27:29<9:54:11, 2.47s/it] +2025-02-05 18:35:12 - ERROR - stderr - 36%|███▌ | 7987/22434 [8:27:32<10:18:06, 2.57s/it] +2025-02-05 18:35:12 - ERROR - stderr - +2025-02-05 18:35:12 - ERROR - stderr - +2025-02-05 18:35:12 - INFO - stdout - {'loss': 0.6081, 'grad_norm': 1.0623356103897095, 'learning_rate': 1.4925212904618355e-05, 'epoch': 1.07} +2025-02-05 18:35:12 - ERROR - stderr - 36%|███▌ | 7987/22434 [8:27:32<10:18:06, 2.57s/it] +2025-02-05 18:35:15 - ERROR - stderr - 36%|███▌ | 7988/22434 [8:27:35<10:12:58, 2.55s/it] +2025-02-05 18:35:15 - ERROR - stderr - +2025-02-05 18:35:15 - ERROR - stderr - +2025-02-05 18:35:15 - INFO - stdout - {'loss': 0.5927, 'grad_norm': 1.073350191116333, 'learning_rate': 1.4923956360034242e-05, 'epoch': 1.07} +2025-02-05 18:35:15 - ERROR - stderr - 36%|███▌ | 7988/22434 [8:27:35<10:12:58, 2.55s/it] +2025-02-05 18:35:18 - ERROR - stderr - 36%|███▌ | 7989/22434 [8:27:37<10:23:39, 2.59s/it] +2025-02-05 18:35:18 - ERROR - stderr - +2025-02-05 18:35:18 - ERROR - stderr - +2025-02-05 18:35:18 - INFO - stdout - {'loss': 0.6867, 'grad_norm': 1.1215358972549438, 'learning_rate': 1.492269971281497e-05, 'epoch': 1.07} +2025-02-05 18:35:18 - ERROR - stderr - 36%|███▌ | 7989/22434 [8:27:37<10:23:39, 2.59s/it] +2025-02-05 18:35:20 - ERROR - stderr - 36%|███▌ | 7990/22434 [8:27:40<10:19:09, 2.57s/it] +2025-02-05 18:35:20 - ERROR - stderr - +2025-02-05 18:35:20 - ERROR - stderr - +2025-02-05 18:35:20 -
INFO - stdout - {'loss': 0.7181, 'grad_norm': 1.220745325088501, 'learning_rate': 1.4921442962986732e-05, 'epoch': 1.07} +2025-02-05 18:35:20 - ERROR - stderr - 36%|███▌ | 7990/22434 [8:27:40<10:19:09, 2.57s/it] +2025-02-05 18:35:23 - ERROR - stderr - 36%|███▌ | 7991/22434 [8:27:42<10:15:52, 2.56s/it] +2025-02-05 18:35:23 - ERROR - stderr - +2025-02-05 18:35:23 - ERROR - stderr - +2025-02-05 18:35:23 - INFO - stdout - {'loss': 0.7144, 'grad_norm': 1.0404924154281616, 'learning_rate': 1.4920186110575728e-05, 'epoch': 1.07} +2025-02-05 18:35:23 - ERROR - stderr - 36%|███▌ | 7991/22434 [8:27:42<10:15:52, 2.56s/it] +2025-02-05 18:35:25 - ERROR - stderr - 36%|███▌ | 7992/22434 [8:27:45<10:09:06, 2.53s/it] +2025-02-05 18:35:25 - ERROR - stderr - +2025-02-05 18:35:25 - ERROR - stderr - +2025-02-05 18:35:25 - INFO - stdout - {'loss': 0.726, 'grad_norm': 1.3028619289398193, 'learning_rate': 1.4918929155608148e-05, 'epoch': 1.07} +2025-02-05 18:35:25 - ERROR - stderr - 36%|███▌ | 7992/22434 [8:27:45<10:09:06, 2.53s/it] +2025-02-05 18:35:28 - ERROR - stderr - 36%|███▌ | 7993/22434 [8:27:47<10:06:41, 2.52s/it] +2025-02-05 18:35:28 - ERROR - stderr - +2025-02-05 18:35:28 - ERROR - stderr - +2025-02-05 18:35:28 - INFO - stdout - {'loss': 0.7601, 'grad_norm': 1.1932357549667358, 'learning_rate': 1.4917672098110198e-05, 'epoch': 1.07} +2025-02-05 18:35:28 - ERROR - stderr - 36%|███▌ | 7993/22434 [8:27:47<10:06:41, 2.52s/it] +2025-02-05 18:35:30 - ERROR - stderr - 36%|███▌ | 7994/22434 [8:27:50<10:02:39, 2.50s/it] +2025-02-05 18:35:30 - ERROR - stderr - +2025-02-05 18:35:30 - ERROR - stderr - +2025-02-05 18:35:30 - INFO - stdout - {'loss': 0.6799, 'grad_norm': 1.099233627319336, 'learning_rate': 1.491641493810808e-05, 'epoch': 1.07} +2025-02-05 18:35:30 - ERROR - stderr - 36%|███▌ | 7994/22434 [8:27:50<10:02:39, 2.50s/it] +2025-02-05 18:35:33 - ERROR - stderr - 36%|███▌ | 7995/22434 [8:27:52<9:59:41, 2.49s/it] +2025-02-05 18:35:33 - ERROR - stderr - +2025-02-05 18:35:33 - ERROR - 
stderr - +2025-02-05 18:35:33 - INFO - stdout - {'loss': 0.686, 'grad_norm': 1.0862386226654053, 'learning_rate': 1.4915157675627999e-05, 'epoch': 1.07} +2025-02-05 18:35:33 - ERROR - stderr - 36%|███▌ | 7995/22434 [8:27:52<9:59:41, 2.49s/it] +2025-02-05 18:35:35 - ERROR - stderr - 36%|███▌ | 7996/22434 [8:27:55<9:58:58, 2.49s/it] +2025-02-05 18:35:35 - ERROR - stderr - +2025-02-05 18:35:35 - ERROR - stderr - +2025-02-05 18:35:35 - INFO - stdout - {'loss': 0.6449, 'grad_norm': 1.113158941268921, 'learning_rate': 1.4913900310696154e-05, 'epoch': 1.07} +2025-02-05 18:35:35 - ERROR - stderr - 36%|███▌ | 7996/22434 [8:27:55<9:58:58, 2.49s/it] +2025-02-05 18:35:37 - ERROR - stderr - 36%|███▌ | 7997/22434 [8:27:57<9:55:55, 2.48s/it] +2025-02-05 18:35:38 - ERROR - stderr - +2025-02-05 18:35:38 - ERROR - stderr - +2025-02-05 18:35:38 - INFO - stdout - {'loss': 0.703, 'grad_norm': 1.1517256498336792, 'learning_rate': 1.4912642843338762e-05, 'epoch': 1.07} +2025-02-05 18:35:38 - ERROR - stderr - 36%|███▌ | 7997/22434 [8:27:57<9:55:55, 2.48s/it] +2025-02-05 18:35:40 - ERROR - stderr - 36%|███▌ | 7998/22434 [8:28:00<10:21:47, 2.58s/it] +2025-02-05 18:35:40 - ERROR - stderr - +2025-02-05 18:35:40 - ERROR - stderr - +2025-02-05 18:35:40 - INFO - stdout - {'loss': 0.7619, 'grad_norm': 1.1889827251434326, 'learning_rate': 1.4911385273582033e-05, 'epoch': 1.07} +2025-02-05 18:35:40 - ERROR - stderr - 36%|███▌ | 7998/22434 [8:28:00<10:21:47, 2.58s/it] +2025-02-05 18:35:43 - ERROR - stderr - 36%|███▌ | 7999/22434 [8:28:03<10:19:25, 2.57s/it] +2025-02-05 18:35:43 - ERROR - stderr - +2025-02-05 18:35:43 - ERROR - stderr - +2025-02-05 18:35:43 - INFO - stdout - {'loss': 0.6199, 'grad_norm': 1.0371898412704468, 'learning_rate': 1.4910127601452175e-05, 'epoch': 1.07} +2025-02-05 18:35:43 - ERROR - stderr - 36%|███▌ | 7999/22434 [8:28:03<10:19:25, 2.57s/it] +2025-02-05 18:35:45 - ERROR - stderr - 36%|███▌ | 8000/22434 [8:28:05<10:15:58, 2.56s/it] +2025-02-05 18:35:45 - ERROR - stderr - 
+2025-02-05 18:35:45 - ERROR - stderr - +2025-02-05 18:35:45 - INFO - stdout - {'loss': 0.7884, 'grad_norm': 1.2458781003952026, 'learning_rate': 1.4908869826975404e-05, 'epoch': 1.07} +2025-02-05 18:35:45 - ERROR - stderr - 36%|███▌ | 8000/22434 [8:28:05<10:15:58, 2.56s/it] +2025-02-05 18:35:48 - ERROR - stderr - 36%|███▌ | 8001/22434 [8:28:08<10:15:34, 2.56s/it] +2025-02-05 18:35:48 - ERROR - stderr - +2025-02-05 18:35:48 - ERROR - stderr - +2025-02-05 18:35:48 - INFO - stdout - {'loss': 0.7183, 'grad_norm': 1.1779608726501465, 'learning_rate': 1.4907611950177943e-05, 'epoch': 1.07} +2025-02-05 18:35:48 - ERROR - stderr - 36%|███▌ | 8001/22434 [8:28:08<10:15:34, 2.56s/it] +2025-02-05 18:35:50 - ERROR - stderr - 36%|███▌ | 8002/22434 [8:28:10<10:09:01, 2.53s/it] +2025-02-05 18:35:50 - ERROR - stderr - +2025-02-05 18:35:50 - ERROR - stderr - +2025-02-05 18:35:50 - INFO - stdout - {'loss': 0.6898, 'grad_norm': 1.0234830379486084, 'learning_rate': 1.4906353971086004e-05, 'epoch': 1.07} +2025-02-05 18:35:50 - ERROR - stderr - 36%|███▌ | 8002/22434 [8:28:10<10:09:01, 2.53s/it] +2025-02-05 18:35:53 - ERROR - stderr - 36%|███▌ | 8003/22434 [8:28:13<10:03:24, 2.51s/it] +2025-02-05 18:35:53 - ERROR - stderr - +2025-02-05 18:35:53 - ERROR - stderr - +2025-02-05 18:35:53 - INFO - stdout - {'loss': 0.6996, 'grad_norm': 1.1388604640960693, 'learning_rate': 1.4905095889725814e-05, 'epoch': 1.07} +2025-02-05 18:35:53 - ERROR - stderr - 36%|███▌ | 8003/22434 [8:28:13<10:03:24, 2.51s/it] +2025-02-05 18:35:55 - ERROR - stderr - 36%|███▌ | 8004/22434 [8:28:15<10:05:11, 2.52s/it] +2025-02-05 18:35:55 - ERROR - stderr - +2025-02-05 18:35:55 - ERROR - stderr - +2025-02-05 18:35:55 - INFO - stdout - {'loss': 0.7371, 'grad_norm': 1.0985702276229858, 'learning_rate': 1.4903837706123591e-05, 'epoch': 1.07} +2025-02-05 18:35:55 - ERROR - stderr - 36%|███▌ | 8004/22434 [8:28:15<10:05:11, 2.52s/it] +2025-02-05 18:35:58 - ERROR - stderr - 36%|███▌ | 8005/22434 [8:28:18<10:04:52, 2.52s/it] 
+2025-02-05 18:35:58 - ERROR - stderr - +2025-02-05 18:35:58 - ERROR - stderr - +2025-02-05 18:35:58 - INFO - stdout - {'loss': 0.6736, 'grad_norm': 1.0438344478607178, 'learning_rate': 1.4902579420305564e-05, 'epoch': 1.07} +2025-02-05 18:35:58 - ERROR - stderr - 36%|███▌ | 8005/22434 [8:28:18<10:04:52, 2.52s/it] +2025-02-05 18:36:00 - ERROR - stderr - 36%|███▌ | 8006/22434 [8:28:20<10:04:49, 2.52s/it] +2025-02-05 18:36:00 - ERROR - stderr - +2025-02-05 18:36:00 - ERROR - stderr - +2025-02-05 18:36:00 - INFO - stdout - {'loss': 0.7527, 'grad_norm': 1.1711899042129517, 'learning_rate': 1.4901321032297964e-05, 'epoch': 1.07} +2025-02-05 18:36:00 - ERROR - stderr - 36%|███▌ | 8006/22434 [8:28:20<10:04:49, 2.52s/it] +2025-02-05 18:36:03 - ERROR - stderr - 36%|███▌ | 8007/22434 [8:28:23<10:10:34, 2.54s/it] +2025-02-05 18:36:03 - ERROR - stderr - +2025-02-05 18:36:03 - ERROR - stderr - +2025-02-05 18:36:03 - INFO - stdout - {'loss': 0.7944, 'grad_norm': 1.2331076860427856, 'learning_rate': 1.4900062542127013e-05, 'epoch': 1.07} +2025-02-05 18:36:03 - ERROR - stderr - 36%|███▌ | 8007/22434 [8:28:23<10:10:34, 2.54s/it] +2025-02-05 18:36:06 - ERROR - stderr - 36%|███▌ | 8008/22434 [8:28:25<10:10:48, 2.54s/it] +2025-02-05 18:36:06 - ERROR - stderr - +2025-02-05 18:36:06 - ERROR - stderr - +2025-02-05 18:36:06 - INFO - stdout - {'loss': 0.8149, 'grad_norm': 1.193842887878418, 'learning_rate': 1.4898803949818947e-05, 'epoch': 1.07} +2025-02-05 18:36:06 - ERROR - stderr - 36%|███▌ | 8008/22434 [8:28:25<10:10:48, 2.54s/it] +2025-02-05 18:36:08 - ERROR - stderr - 36%|███▌ | 8009/22434 [8:28:28<10:04:22, 2.51s/it] +2025-02-05 18:36:08 - ERROR - stderr - +2025-02-05 18:36:08 - ERROR - stderr - +2025-02-05 18:36:08 - INFO - stdout - {'loss': 0.7538, 'grad_norm': 1.1356539726257324, 'learning_rate': 1.48975452554e-05, 'epoch': 1.07} +2025-02-05 18:36:08 - ERROR - stderr - 36%|███▌ | 8009/22434 [8:28:28<10:04:22, 2.51s/it] +2025-02-05 18:36:10 - ERROR - stderr - 36%|███▌ | 8010/22434 
[8:28:30<9:58:05, 2.49s/it] +2025-02-05 18:36:11 - ERROR - stderr - +2025-02-05 18:36:11 - ERROR - stderr - +2025-02-05 18:36:11 - INFO - stdout - {'loss': 0.8581, 'grad_norm': 1.2428690195083618, 'learning_rate': 1.4896286458896411e-05, 'epoch': 1.07} +2025-02-05 18:36:11 - ERROR - stderr - 36%|███▌ | 8010/22434 [8:28:30<9:58:05, 2.49s/it] +2025-02-05 18:36:13 - ERROR - stderr - 36%|███▌ | 8011/22434 [8:28:33<10:04:44, 2.52s/it] +2025-02-05 18:36:13 - ERROR - stderr - +2025-02-05 18:36:13 - ERROR - stderr - +2025-02-05 18:36:13 - INFO - stdout - {'loss': 0.7538, 'grad_norm': 1.129564642906189, 'learning_rate': 1.4895027560334418e-05, 'epoch': 1.07} +2025-02-05 18:36:13 - ERROR - stderr - 36%|███▌ | 8011/22434 [8:28:33<10:04:44, 2.52s/it] +2025-02-05 18:36:15 - ERROR - stderr - 36%|███▌ | 8012/22434 [8:28:35<9:59:42, 2.49s/it] +2025-02-05 18:36:16 - ERROR - stderr - +2025-02-05 18:36:16 - ERROR - stderr - +2025-02-05 18:36:16 - INFO - stdout - {'loss': 0.7199, 'grad_norm': 1.2056195735931396, 'learning_rate': 1.4893768559740256e-05, 'epoch': 1.07} +2025-02-05 18:36:16 - ERROR - stderr - 36%|███▌ | 8012/22434 [8:28:35<9:59:42, 2.49s/it] +2025-02-05 18:36:18 - ERROR - stderr - 36%|███▌ | 8013/22434 [8:28:38<9:59:16, 2.49s/it] +2025-02-05 18:36:18 - ERROR - stderr - +2025-02-05 18:36:18 - ERROR - stderr - +2025-02-05 18:36:18 - INFO - stdout - {'loss': 0.743, 'grad_norm': 1.0560389757156372, 'learning_rate': 1.4892509457140171e-05, 'epoch': 1.07} +2025-02-05 18:36:18 - ERROR - stderr - 36%|███▌ | 8013/22434 [8:28:38<9:59:16, 2.49s/it] +2025-02-05 18:36:20 - ERROR - stderr - 36%|███▌ | 8014/22434 [8:28:40<9:59:50, 2.50s/it] +2025-02-05 18:36:21 - ERROR - stderr - +2025-02-05 18:36:21 - ERROR - stderr - +2025-02-05 18:36:21 - INFO - stdout - {'loss': 0.6776, 'grad_norm': 1.0318557024002075, 'learning_rate': 1.4891250252560408e-05, 'epoch': 1.07} +2025-02-05 18:36:21 - ERROR - stderr - 36%|███▌ | 8014/22434 [8:28:40<9:59:50, 2.50s/it] +2025-02-05 18:36:23 - ERROR - 
stderr - 36%|███▌ | 8015/22434 [8:28:43<10:21:58, 2.59s/it] +2025-02-05 18:36:23 - ERROR - stderr - +2025-02-05 18:36:23 - ERROR - stderr - +2025-02-05 18:36:23 - INFO - stdout - {'loss': 0.7679, 'grad_norm': 1.140726089477539, 'learning_rate': 1.4889990946027217e-05, 'epoch': 1.07} +2025-02-05 18:36:23 - ERROR - stderr - 36%|███▌ | 8015/22434 [8:28:43<10:21:58, 2.59s/it] +2025-02-05 18:36:26 - ERROR - stderr - 36%|███▌ | 8016/22434 [8:28:46<10:12:42, 2.55s/it] +2025-02-05 18:36:26 - ERROR - stderr - +2025-02-05 18:36:26 - ERROR - stderr - +2025-02-05 18:36:26 - INFO - stdout - {'loss': 0.827, 'grad_norm': 1.1712989807128906, 'learning_rate': 1.4888731537566841e-05, 'epoch': 1.07} +2025-02-05 18:36:26 - ERROR - stderr - 36%|███▌ | 8016/22434 [8:28:46<10:12:42, 2.55s/it] +2025-02-05 18:36:28 - ERROR - stderr - 36%|███▌ | 8017/22434 [8:28:48<10:07:34, 2.53s/it] +2025-02-05 18:36:28 - ERROR - stderr - +2025-02-05 18:36:28 - ERROR - stderr - +2025-02-05 18:36:28 - INFO - stdout - {'loss': 0.709, 'grad_norm': 1.0172204971313477, 'learning_rate': 1.4887472027205534e-05, 'epoch': 1.07} +2025-02-05 18:36:28 - ERROR - stderr - 36%|███▌ | 8017/22434 [8:28:48<10:07:34, 2.53s/it] +2025-02-05 18:36:31 - ERROR - stderr - 36%|███▌ | 8018/22434 [8:28:50<10:06:24, 2.52s/it] +2025-02-05 18:36:31 - ERROR - stderr - +2025-02-05 18:36:31 - ERROR - stderr - +2025-02-05 18:36:31 - INFO - stdout - {'loss': 0.7133, 'grad_norm': 1.14127779006958, 'learning_rate': 1.4886212414969551e-05, 'epoch': 1.07} +2025-02-05 18:36:31 - ERROR - stderr - 36%|███▌ | 8018/22434 [8:28:51<10:06:24, 2.52s/it] +2025-02-05 18:36:33 - ERROR - stderr - 36%|███▌ | 8019/22434 [8:28:53<10:05:53, 2.52s/it] +2025-02-05 18:36:33 - ERROR - stderr - +2025-02-05 18:36:33 - ERROR - stderr - +2025-02-05 18:36:33 - INFO - stdout - {'loss': 0.7841, 'grad_norm': 1.075136423110962, 'learning_rate': 1.4884952700885145e-05, 'epoch': 1.07} +2025-02-05 18:36:33 - ERROR - stderr - 36%|███▌ | 8019/22434 [8:28:53<10:05:53, 2.52s/it] 
+2025-02-05 18:36:36 - ERROR - stderr - 36%|███▌ | 8020/22434 [8:28:56<10:13:10, 2.55s/it] +2025-02-05 18:36:36 - ERROR - stderr - +2025-02-05 18:36:36 - ERROR - stderr - +2025-02-05 18:36:36 - INFO - stdout - {'loss': 0.6831, 'grad_norm': 1.0378270149230957, 'learning_rate': 1.4883692884978574e-05, 'epoch': 1.07} +2025-02-05 18:36:36 - ERROR - stderr - 36%|███▌ | 8020/22434 [8:28:56<10:13:10, 2.55s/it] +2025-02-05 18:36:39 - ERROR - stderr - 36%|███▌ | 8021/22434 [8:28:59<10:46:30, 2.69s/it] +2025-02-05 18:36:39 - ERROR - stderr - +2025-02-05 18:36:39 - ERROR - stderr - +2025-02-05 18:36:39 - INFO - stdout - {'loss': 0.7693, 'grad_norm': 1.1945972442626953, 'learning_rate': 1.4882432967276099e-05, 'epoch': 1.07} +2025-02-05 18:36:39 - ERROR - stderr - 36%|███▌ | 8021/22434 [8:28:59<10:46:30, 2.69s/it] +2025-02-05 18:36:42 - ERROR - stderr - 36%|███▌ | 8022/22434 [8:29:02<10:59:56, 2.75s/it] +2025-02-05 18:36:42 - ERROR - stderr - +2025-02-05 18:36:42 - ERROR - stderr - +2025-02-05 18:36:42 - INFO - stdout - {'loss': 0.8187, 'grad_norm': 1.317657232284546, 'learning_rate': 1.4881172947803978e-05, 'epoch': 1.07} +2025-02-05 18:36:42 - ERROR - stderr - 36%|███▌ | 8022/22434 [8:29:02<10:59:56, 2.75s/it] +2025-02-05 18:36:44 - ERROR - stderr - 36%|███▌ | 8023/22434 [8:29:04<10:46:42, 2.69s/it] +2025-02-05 18:36:44 - ERROR - stderr - +2025-02-05 18:36:44 - ERROR - stderr - +2025-02-05 18:36:44 - INFO - stdout - {'loss': 0.8914, 'grad_norm': 1.1680006980895996, 'learning_rate': 1.4879912826588483e-05, 'epoch': 1.07} +2025-02-05 18:36:44 - ERROR - stderr - 36%|███▌ | 8023/22434 [8:29:04<10:46:42, 2.69s/it] +2025-02-05 18:36:47 - ERROR - stderr - 36%|███▌ | 8024/22434 [8:29:07<10:28:54, 2.62s/it] +2025-02-05 18:36:47 - ERROR - stderr - +2025-02-05 18:36:47 - ERROR - stderr - +2025-02-05 18:36:47 - INFO - stdout - {'loss': 0.6712, 'grad_norm': 1.0466971397399902, 'learning_rate': 1.4878652603655873e-05, 'epoch': 1.07} +2025-02-05 18:36:47 - ERROR - stderr - 36%|███▌ | 
8024/22434 [8:29:07<10:28:54, 2.62s/it] +2025-02-05 18:36:49 - ERROR - stderr - 36%|███▌ | 8025/22434 [8:29:09<10:21:54, 2.59s/it] +2025-02-05 18:36:49 - ERROR - stderr - +2025-02-05 18:36:49 - ERROR - stderr - +2025-02-05 18:36:49 - INFO - stdout - {'loss': 0.7747, 'grad_norm': 1.1777764558792114, 'learning_rate': 1.4877392279032415e-05, 'epoch': 1.07} +2025-02-05 18:36:49 - ERROR - stderr - 36%|███▌ | 8025/22434 [8:29:09<10:21:54, 2.59s/it] +2025-02-05 18:36:52 - ERROR - stderr - 36%|███▌ | 8026/22434 [8:29:12<10:17:35, 2.57s/it] +2025-02-05 18:36:52 - ERROR - stderr - +2025-02-05 18:36:52 - ERROR - stderr - +2025-02-05 18:36:52 - INFO - stdout - {'loss': 0.7119, 'grad_norm': 1.130785346031189, 'learning_rate': 1.4876131852744382e-05, 'epoch': 1.07} +2025-02-05 18:36:52 - ERROR - stderr - 36%|███▌ | 8026/22434 [8:29:12<10:17:35, 2.57s/it] +2025-02-05 18:36:54 - ERROR - stderr - 36%|███▌ | 8027/22434 [8:29:14<10:18:36, 2.58s/it] +2025-02-05 18:36:54 - ERROR - stderr - +2025-02-05 18:36:54 - ERROR - stderr - +2025-02-05 18:36:54 - INFO - stdout - {'loss': 0.7873, 'grad_norm': 1.1580387353897095, 'learning_rate': 1.487487132481805e-05, 'epoch': 1.07} +2025-02-05 18:36:54 - ERROR - stderr - 36%|███▌ | 8027/22434 [8:29:14<10:18:36, 2.58s/it] +2025-02-05 18:36:57 - ERROR - stderr - 36%|███▌ | 8028/22434 [8:29:17<10:16:42, 2.57s/it] +2025-02-05 18:36:57 - ERROR - stderr - +2025-02-05 18:36:57 - ERROR - stderr - +2025-02-05 18:36:57 - INFO - stdout - {'loss': 0.7553, 'grad_norm': 1.142564296722412, 'learning_rate': 1.4873610695279688e-05, 'epoch': 1.07} +2025-02-05 18:36:57 - ERROR - stderr - 36%|███▌ | 8028/22434 [8:29:17<10:16:42, 2.57s/it] +2025-02-05 18:36:57 - INFO - stdout - WARNING: tokenization mismatch: 156 vs. 174. 
(ignored) +2025-02-05 18:37:00 - ERROR - stderr - 36%|███▌ | 8029/22434 [8:29:19<10:15:56, 2.57s/it] +2025-02-05 18:37:00 - ERROR - stderr - +2025-02-05 18:37:00 - ERROR - stderr - +2025-02-05 18:37:00 - INFO - stdout - {'loss': 0.7562, 'grad_norm': 1.1370848417282104, 'learning_rate': 1.4872349964155573e-05, 'epoch': 1.07} +2025-02-05 18:37:00 - ERROR - stderr - 36%|███▌ | 8029/22434 [8:29:19<10:15:56, 2.57s/it] +2025-02-05 18:37:02 - ERROR - stderr - 36%|███▌ | 8030/22434 [8:29:22<10:06:32, 2.53s/it] +2025-02-05 18:37:02 - ERROR - stderr - +2025-02-05 18:37:02 - ERROR - stderr - +2025-02-05 18:37:02 - INFO - stdout - {'loss': 0.7632, 'grad_norm': 1.124147653579712, 'learning_rate': 1.4871089131471987e-05, 'epoch': 1.07} +2025-02-05 18:37:02 - ERROR - stderr - 36%|███▌ | 8030/22434 [8:29:22<10:06:32, 2.53s/it] +2025-02-05 18:37:04 - ERROR - stderr - 36%|███▌ | 8031/22434 [8:29:24<10:06:26, 2.53s/it] +2025-02-05 18:37:05 - ERROR - stderr - +2025-02-05 18:37:05 - ERROR - stderr - +2025-02-05 18:37:05 - INFO - stdout - {'loss': 0.7407, 'grad_norm': 1.145578145980835, 'learning_rate': 1.4869828197255208e-05, 'epoch': 1.07} +2025-02-05 18:37:05 - ERROR - stderr - 36%|███▌ | 8031/22434 [8:29:24<10:06:26, 2.53s/it] +2025-02-05 18:37:07 - ERROR - stderr - 36%|███▌ | 8032/22434 [8:29:27<10:18:23, 2.58s/it] +2025-02-05 18:37:07 - ERROR - stderr - +2025-02-05 18:37:07 - ERROR - stderr - +2025-02-05 18:37:07 - INFO - stdout - {'loss': 0.6926, 'grad_norm': 1.069765567779541, 'learning_rate': 1.4868567161531523e-05, 'epoch': 1.07} +2025-02-05 18:37:07 - ERROR - stderr - 36%|███▌ | 8032/22434 [8:29:27<10:18:23, 2.58s/it] +2025-02-05 18:37:10 - ERROR - stderr - 36%|███▌ | 8033/22434 [8:29:30<10:41:33, 2.67s/it] +2025-02-05 18:37:10 - ERROR - stderr - +2025-02-05 18:37:10 - ERROR - stderr - +2025-02-05 18:37:10 - INFO - stdout - {'loss': 0.7774, 'grad_norm': 1.128211259841919, 'learning_rate': 1.486730602432721e-05, 'epoch': 1.07} +2025-02-05 18:37:10 - ERROR - stderr - 36%|███▌ | 
8033/22434 [8:29:30<10:41:33, 2.67s/it] +2025-02-05 18:37:13 - ERROR - stderr - 36%|███▌ | 8034/22434 [8:29:32<10:31:55, 2.63s/it] +2025-02-05 18:37:13 - ERROR - stderr - +2025-02-05 18:37:13 - ERROR - stderr - +2025-02-05 18:37:13 - INFO - stdout - {'loss': 0.7169, 'grad_norm': 1.1276506185531616, 'learning_rate': 1.4866044785668563e-05, 'epoch': 1.07} +2025-02-05 18:37:13 - ERROR - stderr - 36%|███▌ | 8034/22434 [8:29:32<10:31:55, 2.63s/it] +2025-02-05 18:37:15 - ERROR - stderr - 36%|███▌ | 8035/22434 [8:29:35<10:34:46, 2.65s/it] +2025-02-05 18:37:15 - ERROR - stderr - +2025-02-05 18:37:15 - ERROR - stderr - +2025-02-05 18:37:15 - INFO - stdout - {'loss': 0.7352, 'grad_norm': 1.181768536567688, 'learning_rate': 1.4864783445581869e-05, 'epoch': 1.07} +2025-02-05 18:37:15 - ERROR - stderr - 36%|███▌ | 8035/22434 [8:29:35<10:34:46, 2.65s/it] +2025-02-05 18:37:18 - ERROR - stderr - 36%|███▌ | 8036/22434 [8:29:38<10:55:45, 2.73s/it] +2025-02-05 18:37:18 - ERROR - stderr - +2025-02-05 18:37:18 - ERROR - stderr - +2025-02-05 18:37:18 - INFO - stdout - {'loss': 0.7638, 'grad_norm': 1.1667119264602661, 'learning_rate': 1.486352200409342e-05, 'epoch': 1.07} +2025-02-05 18:37:18 - ERROR - stderr - 36%|███▌ | 8036/22434 [8:29:38<10:55:45, 2.73s/it] +2025-02-05 18:37:21 - ERROR - stderr - 36%|███▌ | 8037/22434 [8:29:41<10:42:51, 2.68s/it] +2025-02-05 18:37:21 - ERROR - stderr - +2025-02-05 18:37:21 - ERROR - stderr - +2025-02-05 18:37:21 - INFO - stdout - {'loss': 0.7129, 'grad_norm': 1.1565440893173218, 'learning_rate': 1.4862260461229507e-05, 'epoch': 1.07} +2025-02-05 18:37:21 - ERROR - stderr - 36%|███▌ | 8037/22434 [8:29:41<10:42:51, 2.68s/it] +2025-02-05 18:37:23 - ERROR - stderr - 36%|███▌ | 8038/22434 [8:29:43<10:31:05, 2.63s/it] +2025-02-05 18:37:23 - ERROR - stderr - +2025-02-05 18:37:23 - ERROR - stderr - +2025-02-05 18:37:23 - INFO - stdout - {'loss': 0.7316, 'grad_norm': 1.0787787437438965, 'learning_rate': 1.4860998817016427e-05, 'epoch': 1.07} +2025-02-05 
18:37:23 - ERROR - stderr - 36%|███▌ | 8038/22434 [8:29:43<10:31:05, 2.63s/it] +2025-02-05 18:37:26 - ERROR - stderr - 36%|███▌ | 8039/22434 [8:29:45<10:15:59, 2.57s/it] +2025-02-05 18:37:26 - ERROR - stderr - +2025-02-05 18:37:26 - ERROR - stderr - +2025-02-05 18:37:26 - INFO - stdout - {'loss': 0.7993, 'grad_norm': 1.3282588720321655, 'learning_rate': 1.485973707148048e-05, 'epoch': 1.08} +2025-02-05 18:37:26 - ERROR - stderr - 36%|███▌ | 8039/22434 [8:29:46<10:15:59, 2.57s/it] +2025-02-05 18:37:28 - ERROR - stderr - 36%|███▌ | 8040/22434 [8:29:48<10:12:18, 2.55s/it] +2025-02-05 18:37:28 - ERROR - stderr - +2025-02-05 18:37:28 - ERROR - stderr - +2025-02-05 18:37:28 - INFO - stdout - {'loss': 0.7518, 'grad_norm': 1.2888188362121582, 'learning_rate': 1.4858475224647964e-05, 'epoch': 1.08} +2025-02-05 18:37:28 - ERROR - stderr - 36%|███▌ | 8040/22434 [8:29:48<10:12:18, 2.55s/it] +2025-02-05 18:37:31 - ERROR - stderr - 36%|███▌ | 8041/22434 [8:29:50<10:06:46, 2.53s/it] +2025-02-05 18:37:31 - ERROR - stderr - +2025-02-05 18:37:31 - ERROR - stderr - +2025-02-05 18:37:31 - INFO - stdout - {'loss': 0.664, 'grad_norm': 1.0674335956573486, 'learning_rate': 1.485721327654518e-05, 'epoch': 1.08} +2025-02-05 18:37:31 - ERROR - stderr - 36%|███▌ | 8041/22434 [8:29:51<10:06:46, 2.53s/it] +2025-02-05 18:37:33 - ERROR - stderr - 36%|███▌ | 8042/22434 [8:29:53<10:06:08, 2.53s/it] +2025-02-05 18:37:33 - ERROR - stderr - +2025-02-05 18:37:33 - ERROR - stderr - +2025-02-05 18:37:33 - INFO - stdout - {'loss': 0.7982, 'grad_norm': 1.2739207744598389, 'learning_rate': 1.4855951227198433e-05, 'epoch': 1.08} +2025-02-05 18:37:33 - ERROR - stderr - 36%|███▌ | 8042/22434 [8:29:53<10:06:08, 2.53s/it] +2025-02-05 18:37:36 - ERROR - stderr - 36%|███▌ | 8043/22434 [8:29:56<10:05:09, 2.52s/it] +2025-02-05 18:37:36 - ERROR - stderr - +2025-02-05 18:37:36 - ERROR - stderr - +2025-02-05 18:37:36 - INFO - stdout - {'loss': 0.7283, 'grad_norm': 1.1310818195343018, 'learning_rate': 
1.485468907663403e-05, 'epoch': 1.08} +2025-02-05 18:37:36 - ERROR - stderr - 36%|███▌ | 8043/22434 [8:29:56<10:05:09, 2.52s/it] +2025-02-05 18:37:38 - ERROR - stderr - 36%|███▌ | 8044/22434 [8:29:58<10:07:15, 2.53s/it] +2025-02-05 18:37:38 - ERROR - stderr - +2025-02-05 18:37:38 - ERROR - stderr - +2025-02-05 18:37:38 - INFO - stdout - {'loss': 0.7179, 'grad_norm': 1.0344246625900269, 'learning_rate': 1.4853426824878279e-05, 'epoch': 1.08} +2025-02-05 18:37:38 - ERROR - stderr - 36%|███▌ | 8044/22434 [8:29:58<10:07:15, 2.53s/it] +2025-02-05 18:37:41 - ERROR - stderr - 36%|███▌ | 8045/22434 [8:30:01<10:04:17, 2.52s/it] +2025-02-05 18:37:41 - ERROR - stderr - +2025-02-05 18:37:41 - ERROR - stderr - +2025-02-05 18:37:41 - INFO - stdout - {'loss': 0.6618, 'grad_norm': 1.0455644130706787, 'learning_rate': 1.4852164471957486e-05, 'epoch': 1.08} +2025-02-05 18:37:41 - ERROR - stderr - 36%|███▌ | 8045/22434 [8:30:01<10:04:17, 2.52s/it] +2025-02-05 18:37:43 - ERROR - stderr - 36%|███▌ | 8046/22434 [8:30:03<10:01:16, 2.51s/it] +2025-02-05 18:37:43 - ERROR - stderr - +2025-02-05 18:37:43 - ERROR - stderr - +2025-02-05 18:37:43 - INFO - stdout - {'loss': 0.7691, 'grad_norm': 1.2927911281585693, 'learning_rate': 1.485090201789797e-05, 'epoch': 1.08} +2025-02-05 18:37:43 - ERROR - stderr - 36%|███▌ | 8046/22434 [8:30:03<10:01:16, 2.51s/it] +2025-02-05 18:37:46 - ERROR - stderr - 36%|███▌ | 8047/22434 [8:30:06<10:04:35, 2.52s/it] +2025-02-05 18:37:46 - ERROR - stderr - +2025-02-05 18:37:46 - ERROR - stderr - +2025-02-05 18:37:46 - INFO - stdout - {'loss': 0.6866, 'grad_norm': 1.1126906871795654, 'learning_rate': 1.4849639462726046e-05, 'epoch': 1.08} +2025-02-05 18:37:46 - ERROR - stderr - 36%|███▌ | 8047/22434 [8:30:06<10:04:35, 2.52s/it] +2025-02-05 18:37:48 - ERROR - stderr - 36%|███▌ | 8048/22434 [8:30:08<10:05:05, 2.52s/it] +2025-02-05 18:37:48 - ERROR - stderr - +2025-02-05 18:37:48 - ERROR - stderr - +2025-02-05 18:37:48 - INFO - stdout - {'loss': 0.7977, 'grad_norm': 
1.204754114151001, 'learning_rate': 1.4848376806468025e-05, 'epoch': 1.08} +2025-02-05 18:37:48 - ERROR - stderr - 36%|███▌ | 8048/22434 [8:30:08<10:05:05, 2.52s/it] +2025-02-05 18:37:51 - ERROR - stderr - 36%|███▌ | 8049/22434 [8:30:11<10:06:22, 2.53s/it] +2025-02-05 18:37:51 - ERROR - stderr - +2025-02-05 18:37:51 - ERROR - stderr - +2025-02-05 18:37:51 - INFO - stdout - {'loss': 0.6615, 'grad_norm': 0.9937276244163513, 'learning_rate': 1.484711404915023e-05, 'epoch': 1.08} +2025-02-05 18:37:51 - ERROR - stderr - 36%|███▌ | 8049/22434 [8:30:11<10:06:22, 2.53s/it] +2025-02-05 18:37:53 - ERROR - stderr - 36%|███▌ | 8050/22434 [8:30:13<10:04:02, 2.52s/it] +2025-02-05 18:37:53 - ERROR - stderr - +2025-02-05 18:37:53 - ERROR - stderr - +2025-02-05 18:37:53 - INFO - stdout - {'loss': 0.752, 'grad_norm': 1.0892422199249268, 'learning_rate': 1.4845851190798981e-05, 'epoch': 1.08} +2025-02-05 18:37:53 - ERROR - stderr - 36%|███▌ | 8050/22434 [8:30:13<10:04:02, 2.52s/it] +2025-02-05 18:37:56 - ERROR - stderr - 36%|███▌ | 8051/22434 [8:30:16<10:05:24, 2.53s/it] +2025-02-05 18:37:56 - ERROR - stderr - +2025-02-05 18:37:56 - ERROR - stderr - +2025-02-05 18:37:56 - INFO - stdout - {'loss': 0.7497, 'grad_norm': 1.1484980583190918, 'learning_rate': 1.48445882314406e-05, 'epoch': 1.08} +2025-02-05 18:37:56 - ERROR - stderr - 36%|███▌ | 8051/22434 [8:30:16<10:05:24, 2.53s/it] +2025-02-05 18:37:58 - ERROR - stderr - 36%|███▌ | 8052/22434 [8:30:18<10:07:25, 2.53s/it] +2025-02-05 18:37:59 - ERROR - stderr - +2025-02-05 18:37:59 - ERROR - stderr - +2025-02-05 18:37:59 - INFO - stdout - {'loss': 0.8083, 'grad_norm': 1.1781418323516846, 'learning_rate': 1.4843325171101413e-05, 'epoch': 1.08} +2025-02-05 18:37:59 - ERROR - stderr - 36%|███▌ | 8052/22434 [8:30:18<10:07:25, 2.53s/it] +2025-02-05 18:38:01 - ERROR - stderr - 36%|███▌ | 8053/22434 [8:30:21<10:13:41, 2.56s/it] +2025-02-05 18:38:01 - ERROR - stderr - +2025-02-05 18:38:01 - ERROR - stderr - +2025-02-05 18:38:01 - INFO - stdout - 
{'loss': 0.7941, 'grad_norm': 1.185939908027649, 'learning_rate': 1.484206200980775e-05, 'epoch': 1.08} +2025-02-05 18:38:01 - ERROR - stderr - 36%|███▌ | 8053/22434 [8:30:21<10:13:41, 2.56s/it] +2025-02-05 18:38:04 - ERROR - stderr - 36%|███▌ | 8054/22434 [8:30:23<10:09:23, 2.54s/it] +2025-02-05 18:38:04 - ERROR - stderr - +2025-02-05 18:38:04 - ERROR - stderr - +2025-02-05 18:38:04 - INFO - stdout - {'loss': 0.7479, 'grad_norm': 1.1410280466079712, 'learning_rate': 1.4840798747585934e-05, 'epoch': 1.08} +2025-02-05 18:38:04 - ERROR - stderr - 36%|███▌ | 8054/22434 [8:30:23<10:09:23, 2.54s/it] +2025-02-05 18:38:06 - ERROR - stderr - 36%|███▌ | 8055/22434 [8:30:26<10:05:26, 2.53s/it] +2025-02-05 18:38:06 - ERROR - stderr - +2025-02-05 18:38:06 - ERROR - stderr - +2025-02-05 18:38:06 - INFO - stdout - {'loss': 0.6584, 'grad_norm': 1.0887832641601562, 'learning_rate': 1.4839535384462305e-05, 'epoch': 1.08} +2025-02-05 18:38:06 - ERROR - stderr - 36%|███▌ | 8055/22434 [8:30:26<10:05:26, 2.53s/it] +2025-02-05 18:38:09 - ERROR - stderr - 36%|███▌ | 8056/22434 [8:30:28<10:03:03, 2.52s/it] +2025-02-05 18:38:09 - ERROR - stderr - +2025-02-05 18:38:09 - ERROR - stderr - +2025-02-05 18:38:09 - INFO - stdout - {'loss': 0.6967, 'grad_norm': 1.120153784751892, 'learning_rate': 1.4838271920463188e-05, 'epoch': 1.08} +2025-02-05 18:38:09 - ERROR - stderr - 36%|███▌ | 8056/22434 [8:30:28<10:03:03, 2.52s/it] +2025-02-05 18:38:11 - ERROR - stderr - 36%|███▌ | 8057/22434 [8:30:31<10:01:34, 2.51s/it] +2025-02-05 18:38:11 - ERROR - stderr - +2025-02-05 18:38:11 - ERROR - stderr - +2025-02-05 18:38:11 - INFO - stdout - {'loss': 0.7097, 'grad_norm': 1.2411237955093384, 'learning_rate': 1.4837008355614923e-05, 'epoch': 1.08} +2025-02-05 18:38:11 - ERROR - stderr - 36%|███▌ | 8057/22434 [8:30:31<10:01:34, 2.51s/it] +2025-02-05 18:38:14 - ERROR - stderr - 36%|███▌ | 8058/22434 [8:30:33<10:00:23, 2.51s/it] +2025-02-05 18:38:14 - ERROR - stderr - +2025-02-05 18:38:14 - ERROR - stderr - 
+2025-02-05 18:38:14 - INFO - stdout - {'loss': 0.73, 'grad_norm': 1.2029176950454712, 'learning_rate': 1.4835744689943844e-05, 'epoch': 1.08} +2025-02-05 18:38:14 - ERROR - stderr - 36%|███▌ | 8058/22434 [8:30:33<10:00:23, 2.51s/it] +2025-02-05 18:38:16 - ERROR - stderr - 36%|███▌ | 8059/22434 [8:30:36<9:58:48, 2.50s/it] +2025-02-05 18:38:16 - ERROR - stderr - +2025-02-05 18:38:16 - ERROR - stderr - +2025-02-05 18:38:16 - INFO - stdout - {'loss': 0.7253, 'grad_norm': 1.1196104288101196, 'learning_rate': 1.4834480923476302e-05, 'epoch': 1.08} +2025-02-05 18:38:16 - ERROR - stderr - 36%|███▌ | 8059/22434 [8:30:36<9:58:48, 2.50s/it] +2025-02-05 18:38:19 - ERROR - stderr - 36%|███▌ | 8060/22434 [8:30:38<9:54:31, 2.48s/it] +2025-02-05 18:38:19 - ERROR - stderr - +2025-02-05 18:38:19 - ERROR - stderr - +2025-02-05 18:38:19 - INFO - stdout - {'loss': 0.7388, 'grad_norm': 1.145012617111206, 'learning_rate': 1.4833217056238628e-05, 'epoch': 1.08} +2025-02-05 18:38:19 - ERROR - stderr - 36%|███▌ | 8060/22434 [8:30:38<9:54:31, 2.48s/it] +2025-02-05 18:38:21 - ERROR - stderr - 36%|███▌ | 8061/22434 [8:30:41<9:50:47, 2.47s/it] +2025-02-05 18:38:21 - ERROR - stderr - +2025-02-05 18:38:21 - ERROR - stderr - +2025-02-05 18:38:21 - INFO - stdout - {'loss': 0.7989, 'grad_norm': 1.3130682706832886, 'learning_rate': 1.4831953088257167e-05, 'epoch': 1.08} +2025-02-05 18:38:21 - ERROR - stderr - 36%|███▌ | 8061/22434 [8:30:41<9:50:47, 2.47s/it] +2025-02-05 18:38:23 - ERROR - stderr - 36%|███▌ | 8062/22434 [8:30:43<9:50:56, 2.47s/it] +2025-02-05 18:38:23 - ERROR - stderr - +2025-02-05 18:38:23 - ERROR - stderr - +2025-02-05 18:38:23 - INFO - stdout - {'loss': 0.7565, 'grad_norm': 1.0504564046859741, 'learning_rate': 1.4830689019558269e-05, 'epoch': 1.08} +2025-02-05 18:38:23 - ERROR - stderr - 36%|███▌ | 8062/22434 [8:30:43<9:50:56, 2.47s/it] +2025-02-05 18:38:26 - ERROR - stderr - 36%|███▌ | 8063/22434 [8:30:46<9:52:40, 2.47s/it] +2025-02-05 18:38:26 - ERROR - stderr - +2025-02-05 
18:38:26 - ERROR - stderr - +2025-02-05 18:38:26 - INFO - stdout - {'loss': 0.7663, 'grad_norm': 1.3305295705795288, 'learning_rate': 1.4829424850168282e-05, 'epoch': 1.08} +2025-02-05 18:38:26 - ERROR - stderr - 36%|███▌ | 8063/22434 [8:30:46<9:52:40, 2.47s/it] +2025-02-05 18:38:28 - ERROR - stderr - 36%|███▌ | 8064/22434 [8:30:48<10:00:08, 2.51s/it] +2025-02-05 18:38:29 - ERROR - stderr - +2025-02-05 18:38:29 - ERROR - stderr - +2025-02-05 18:38:29 - INFO - stdout - {'loss': 0.748, 'grad_norm': 1.146509051322937, 'learning_rate': 1.4828160580113554e-05, 'epoch': 1.08} +2025-02-05 18:38:29 - ERROR - stderr - 36%|███▌ | 8064/22434 [8:30:48<10:00:08, 2.51s/it] +2025-02-05 18:38:31 - ERROR - stderr - 36%|███▌ | 8065/22434 [8:30:51<10:08:43, 2.54s/it] +2025-02-05 18:38:31 - ERROR - stderr - +2025-02-05 18:38:31 - ERROR - stderr - +2025-02-05 18:38:31 - INFO - stdout - {'loss': 0.767, 'grad_norm': 1.2264225482940674, 'learning_rate': 1.4826896209420439e-05, 'epoch': 1.08} +2025-02-05 18:38:31 - ERROR - stderr - 36%|███▌ | 8065/22434 [8:30:51<10:08:43, 2.54s/it] +2025-02-05 18:38:34 - ERROR - stderr - 36%|███▌ | 8066/22434 [8:30:53<10:10:33, 2.55s/it] +2025-02-05 18:38:34 - ERROR - stderr - +2025-02-05 18:38:34 - ERROR - stderr - +2025-02-05 18:38:34 - INFO - stdout - {'loss': 0.7268, 'grad_norm': 1.201645016670227, 'learning_rate': 1.4825631738115289e-05, 'epoch': 1.08} +2025-02-05 18:38:34 - ERROR - stderr - 36%|███▌ | 8066/22434 [8:30:53<10:10:33, 2.55s/it] +2025-02-05 18:38:36 - ERROR - stderr - 36%|███▌ | 8067/22434 [8:30:56<10:09:29, 2.55s/it] +2025-02-05 18:38:36 - ERROR - stderr - +2025-02-05 18:38:36 - ERROR - stderr - +2025-02-05 18:38:36 - INFO - stdout - {'loss': 0.7099, 'grad_norm': 1.1194539070129395, 'learning_rate': 1.4824367166224468e-05, 'epoch': 1.08} +2025-02-05 18:38:36 - ERROR - stderr - 36%|███▌ | 8067/22434 [8:30:56<10:09:29, 2.55s/it] +2025-02-05 18:38:39 - ERROR - stderr - 36%|███▌ | 8068/22434 [8:30:59<10:08:36, 2.54s/it] +2025-02-05
18:38:39 - ERROR - stderr - +2025-02-05 18:38:39 - ERROR - stderr - +2025-02-05 18:38:39 - INFO - stdout - {'loss': 0.7231, 'grad_norm': 1.1586989164352417, 'learning_rate': 1.4823102493774325e-05, 'epoch': 1.08} +2025-02-05 18:38:39 - ERROR - stderr - 36%|███▌ | 8068/22434 [8:30:59<10:08:36, 2.54s/it] +2025-02-05 18:38:41 - ERROR - stderr - 36%|███▌ | 8069/22434 [8:31:01<10:09:59, 2.55s/it] +2025-02-05 18:38:41 - ERROR - stderr - +2025-02-05 18:38:41 - ERROR - stderr - +2025-02-05 18:38:41 - INFO - stdout - {'loss': 0.731, 'grad_norm': 1.1162248849868774, 'learning_rate': 1.482183772079123e-05, 'epoch': 1.08} +2025-02-05 18:38:41 - ERROR - stderr - 36%|███▌ | 8069/22434 [8:31:01<10:09:59, 2.55s/it] +2025-02-05 18:38:44 - ERROR - stderr - 36%|███▌ | 8070/22434 [8:31:04<10:05:58, 2.53s/it] +2025-02-05 18:38:44 - ERROR - stderr - +2025-02-05 18:38:44 - ERROR - stderr - +2025-02-05 18:38:44 - INFO - stdout - {'loss': 0.7975, 'grad_norm': 1.174980878829956, 'learning_rate': 1.482057284730154e-05, 'epoch': 1.08} +2025-02-05 18:38:44 - ERROR - stderr - 36%|███▌ | 8070/22434 [8:31:04<10:05:58, 2.53s/it] +2025-02-05 18:38:46 - ERROR - stderr - 36%|███▌ | 8071/22434 [8:31:06<10:09:43, 2.55s/it] +2025-02-05 18:38:46 - ERROR - stderr - +2025-02-05 18:38:46 - ERROR - stderr - +2025-02-05 18:38:46 - INFO - stdout - {'loss': 0.6958, 'grad_norm': 1.1753500699996948, 'learning_rate': 1.4819307873331619e-05, 'epoch': 1.08} +2025-02-05 18:38:46 - ERROR - stderr - 36%|███▌ | 8071/22434 [8:31:06<10:09:43, 2.55s/it] +2025-02-05 18:38:49 - ERROR - stderr - 36%|███▌ | 8072/22434 [8:31:09<10:08:22, 2.54s/it] +2025-02-05 18:38:49 - ERROR - stderr - +2025-02-05 18:38:49 - ERROR - stderr - +2025-02-05 18:38:49 - INFO - stdout - {'loss': 0.7167, 'grad_norm': 1.130003809928894, 'learning_rate': 1.4818042798907841e-05, 'epoch': 1.08} +2025-02-05 18:38:49 - ERROR - stderr - 36%|███▌ | 8072/22434 [8:31:09<10:08:22, 2.54s/it] +2025-02-05 18:38:52 - ERROR - stderr - 36%|███▌ | 8073/22434 
[8:31:11<10:12:40, 2.56s/it] +2025-02-05 18:38:52 - ERROR - stderr - +2025-02-05 18:38:52 - ERROR - stderr - +2025-02-05 18:38:52 - INFO - stdout - {'loss': 0.7385, 'grad_norm': 1.296520471572876, 'learning_rate': 1.481677762405657e-05, 'epoch': 1.08} +2025-02-05 18:38:52 - ERROR - stderr - 36%|███▌ | 8073/22434 [8:31:11<10:12:40, 2.56s/it] +2025-02-05 18:38:54 - ERROR - stderr - 36%|███▌ | 8074/22434 [8:31:14<10:07:10, 2.54s/it] +2025-02-05 18:38:54 - ERROR - stderr - +2025-02-05 18:38:54 - ERROR - stderr - +2025-02-05 18:38:54 - INFO - stdout - {'loss': 0.7438, 'grad_norm': 1.1674833297729492, 'learning_rate': 1.4815512348804177e-05, 'epoch': 1.08} +2025-02-05 18:38:54 - ERROR - stderr - 36%|███▌ | 8074/22434 [8:31:14<10:07:10, 2.54s/it] +2025-02-05 18:38:56 - ERROR - stderr - 36%|███▌ | 8075/22434 [8:31:16<9:59:08, 2.50s/it] +2025-02-05 18:38:56 - ERROR - stderr - +2025-02-05 18:38:56 - ERROR - stderr - +2025-02-05 18:38:56 - INFO - stdout - {'loss': 0.6894, 'grad_norm': 1.1946903467178345, 'learning_rate': 1.4814246973177038e-05, 'epoch': 1.08} +2025-02-05 18:38:56 - ERROR - stderr - 36%|███▌ | 8075/22434 [8:31:16<9:59:08, 2.50s/it] +2025-02-05 18:38:59 - ERROR - stderr - 36%|███▌ | 8076/22434 [8:31:19<9:58:58, 2.50s/it] +2025-02-05 18:38:59 - ERROR - stderr - +2025-02-05 18:38:59 - ERROR - stderr - +2025-02-05 18:38:59 - INFO - stdout - {'loss': 0.7705, 'grad_norm': 1.1635011434555054, 'learning_rate': 1.481298149720153e-05, 'epoch': 1.08} +2025-02-05 18:38:59 - ERROR - stderr - 36%|███▌ | 8076/22434 [8:31:19<9:58:58, 2.50s/it] +2025-02-05 18:39:01 - ERROR - stderr - 36%|███▌ | 8077/22434 [8:31:21<9:59:38, 2.51s/it] +2025-02-05 18:39:01 - ERROR - stderr - +2025-02-05 18:39:01 - ERROR - stderr - +2025-02-05 18:39:01 - INFO - stdout - {'loss': 0.6686, 'grad_norm': 1.1581525802612305, 'learning_rate': 1.4811715920904024e-05, 'epoch': 1.08} +2025-02-05 18:39:01 - ERROR - stderr - 36%|███▌ | 8077/22434 [8:31:21<9:59:38, 2.51s/it] +2025-02-05 18:39:04 - ERROR - 
stderr - 36%|███▌ | 8078/22434 [8:31:24<10:07:52, 2.54s/it] +2025-02-05 18:39:04 - ERROR - stderr - +2025-02-05 18:39:04 - ERROR - stderr - +2025-02-05 18:39:04 - INFO - stdout - {'loss': 0.8075, 'grad_norm': 1.2597030401229858, 'learning_rate': 1.4810450244310905e-05, 'epoch': 1.08} +2025-02-05 18:39:04 - ERROR - stderr - 36%|███▌ | 8078/22434 [8:31:24<10:07:52, 2.54s/it] +2025-02-05 18:39:07 - ERROR - stderr - 36%|███▌ | 8079/22434 [8:31:26<10:08:22, 2.54s/it] +2025-02-05 18:39:07 - ERROR - stderr - +2025-02-05 18:39:07 - ERROR - stderr - +2025-02-05 18:39:07 - INFO - stdout - {'loss': 0.7388, 'grad_norm': 1.0820128917694092, 'learning_rate': 1.4809184467448554e-05, 'epoch': 1.08} +2025-02-05 18:39:07 - ERROR - stderr - 36%|███▌ | 8079/22434 [8:31:26<10:08:22, 2.54s/it] +2025-02-05 18:39:09 - ERROR - stderr - 36%|███▌ | 8080/22434 [8:31:29<10:06:57, 2.54s/it] +2025-02-05 18:39:09 - ERROR - stderr - +2025-02-05 18:39:09 - ERROR - stderr - +2025-02-05 18:39:09 - INFO - stdout - {'loss': 0.7591, 'grad_norm': 1.1963951587677002, 'learning_rate': 1.4807918590343358e-05, 'epoch': 1.08} +2025-02-05 18:39:09 - ERROR - stderr - 36%|███▌ | 8080/22434 [8:31:29<10:06:57, 2.54s/it] +2025-02-05 18:39:12 - ERROR - stderr - 36%|███▌ | 8081/22434 [8:31:31<10:04:35, 2.53s/it] +2025-02-05 18:39:12 - ERROR - stderr - +2025-02-05 18:39:12 - ERROR - stderr - +2025-02-05 18:39:12 - INFO - stdout - {'loss': 0.7629, 'grad_norm': 1.13186776638031, 'learning_rate': 1.4806652613021697e-05, 'epoch': 1.08} +2025-02-05 18:39:12 - ERROR - stderr - 36%|███▌ | 8081/22434 [8:31:31<10:04:35, 2.53s/it] +2025-02-05 18:39:14 - ERROR - stderr - 36%|███▌ | 8082/22434 [8:31:34<10:06:13, 2.53s/it] +2025-02-05 18:39:14 - ERROR - stderr - +2025-02-05 18:39:14 - ERROR - stderr - +2025-02-05 18:39:14 - INFO - stdout - {'loss': 0.6952, 'grad_norm': 1.1139552593231201, 'learning_rate': 1.4805386535509963e-05, 'epoch': 1.08} +2025-02-05 18:39:14 - ERROR - stderr - 36%|███▌ | 8082/22434 [8:31:34<10:06:13, 
2.53s/it] +2025-02-05 18:39:17 - ERROR - stderr - 36%|███▌ | 8083/22434 [8:31:37<10:08:10, 2.54s/it] +2025-02-05 18:39:17 - ERROR - stderr - +2025-02-05 18:39:17 - ERROR - stderr - +2025-02-05 18:39:17 - INFO - stdout - {'loss': 0.6826, 'grad_norm': 1.1565749645233154, 'learning_rate': 1.4804120357834545e-05, 'epoch': 1.08} +2025-02-05 18:39:17 - ERROR - stderr - 36%|███▌ | 8083/22434 [8:31:37<10:08:10, 2.54s/it] +2025-02-05 18:39:19 - ERROR - stderr - 36%|███▌ | 8084/22434 [8:31:39<10:04:43, 2.53s/it] +2025-02-05 18:39:19 - ERROR - stderr - +2025-02-05 18:39:19 - ERROR - stderr - +2025-02-05 18:39:19 - INFO - stdout - {'loss': 0.8055, 'grad_norm': 1.1168111562728882, 'learning_rate': 1.4802854080021831e-05, 'epoch': 1.08} +2025-02-05 18:39:19 - ERROR - stderr - 36%|███▌ | 8084/22434 [8:31:39<10:04:43, 2.53s/it] +2025-02-05 18:39:22 - ERROR - stderr - 36%|███▌ | 8085/22434 [8:31:41<10:01:26, 2.51s/it] +2025-02-05 18:39:22 - ERROR - stderr - +2025-02-05 18:39:22 - ERROR - stderr - +2025-02-05 18:39:22 - INFO - stdout - {'loss': 0.7753, 'grad_norm': 1.1635884046554565, 'learning_rate': 1.480158770209822e-05, 'epoch': 1.08} +2025-02-05 18:39:22 - ERROR - stderr - 36%|███▌ | 8085/22434 [8:31:42<10:01:26, 2.51s/it] +2025-02-05 18:39:24 - ERROR - stderr - 36%|███▌ | 8086/22434 [8:31:44<9:59:28, 2.51s/it] +2025-02-05 18:39:24 - ERROR - stderr - +2025-02-05 18:39:24 - ERROR - stderr - +2025-02-05 18:39:24 - INFO - stdout - {'loss': 0.7215, 'grad_norm': 1.1757169961929321, 'learning_rate': 1.4800321224090114e-05, 'epoch': 1.08} +2025-02-05 18:39:24 - ERROR - stderr - 36%|███▌ | 8086/22434 [8:31:44<9:59:28, 2.51s/it] +2025-02-05 18:39:27 - ERROR - stderr - 36%|███▌ | 8087/22434 [8:31:47<10:02:37, 2.52s/it] +2025-02-05 18:39:27 - ERROR - stderr - +2025-02-05 18:39:27 - ERROR - stderr - +2025-02-05 18:39:27 - INFO - stdout - {'loss': 0.6968, 'grad_norm': 1.0689467191696167, 'learning_rate': 1.47990546460239e-05, 'epoch': 1.08} +2025-02-05 18:39:27 - ERROR - stderr - 36%|███▌ | 
8087/22434 [8:31:47<10:02:37, 2.52s/it] +2025-02-05 18:39:29 - ERROR - stderr - 36%|███▌ | 8088/22434 [8:31:49<9:57:56, 2.50s/it] +2025-02-05 18:39:29 - ERROR - stderr - +2025-02-05 18:39:29 - ERROR - stderr - +2025-02-05 18:39:29 - INFO - stdout - {'loss': 0.7091, 'grad_norm': 1.1055799722671509, 'learning_rate': 1.4797787967925988e-05, 'epoch': 1.08} +2025-02-05 18:39:29 - ERROR - stderr - 36%|███▌ | 8088/22434 [8:31:49<9:57:56, 2.50s/it] +2025-02-05 18:39:32 - ERROR - stderr - 36%|███▌ | 8089/22434 [8:31:52<10:12:09, 2.56s/it] +2025-02-05 18:39:32 - ERROR - stderr - +2025-02-05 18:39:32 - ERROR - stderr - +2025-02-05 18:39:32 - INFO - stdout - {'loss': 0.6615, 'grad_norm': 1.1361255645751953, 'learning_rate': 1.4796521189822774e-05, 'epoch': 1.08} +2025-02-05 18:39:32 - ERROR - stderr - 36%|███▌ | 8089/22434 [8:31:52<10:12:09, 2.56s/it] +2025-02-05 18:39:34 - ERROR - stderr - 36%|███▌ | 8090/22434 [8:31:54<10:07:19, 2.54s/it] +2025-02-05 18:39:34 - ERROR - stderr - +2025-02-05 18:39:34 - ERROR - stderr - +2025-02-05 18:39:34 - INFO - stdout - {'loss': 0.7231, 'grad_norm': 1.2079881429672241, 'learning_rate': 1.4795254311740666e-05, 'epoch': 1.08} +2025-02-05 18:39:34 - ERROR - stderr - 36%|███▌ | 8090/22434 [8:31:54<10:07:19, 2.54s/it] +2025-02-05 18:39:37 - ERROR - stderr - 36%|███▌ | 8091/22434 [8:31:57<10:03:15, 2.52s/it] +2025-02-05 18:39:37 - ERROR - stderr - +2025-02-05 18:39:37 - ERROR - stderr - +2025-02-05 18:39:37 - INFO - stdout - {'loss': 0.7818, 'grad_norm': 1.0947825908660889, 'learning_rate': 1.479398733370607e-05, 'epoch': 1.08} +2025-02-05 18:39:37 - ERROR - stderr - 36%|███▌ | 8091/22434 [8:31:57<10:03:15, 2.52s/it] +2025-02-05 18:39:39 - ERROR - stderr - 36%|███▌ | 8092/22434 [8:31:59<10:01:41, 2.52s/it] +2025-02-05 18:39:39 - ERROR - stderr - +2025-02-05 18:39:39 - ERROR - stderr - +2025-02-05 18:39:39 - INFO - stdout - {'loss': 0.7583, 'grad_norm': 1.1490260362625122, 'learning_rate': 1.47927202557454e-05, 'epoch': 1.08} +2025-02-05 18:39:39 
- ERROR - stderr - 36%|███▌ | 8092/22434 [8:31:59<10:01:41, 2.52s/it] +2025-02-05 18:39:42 - ERROR - stderr - 36%|███▌ | 8093/22434 [8:32:02<10:17:38, 2.58s/it] +2025-02-05 18:39:42 - ERROR - stderr - +2025-02-05 18:39:42 - ERROR - stderr - +2025-02-05 18:39:42 - INFO - stdout - {'loss': 0.7376, 'grad_norm': 1.160922884941101, 'learning_rate': 1.4791453077885056e-05, 'epoch': 1.08} +2025-02-05 18:39:42 - ERROR - stderr - 36%|███▌ | 8093/22434 [8:32:02<10:17:38, 2.58s/it] +2025-02-05 18:39:45 - ERROR - stderr - 36%|███▌ | 8094/22434 [8:32:04<10:05:48, 2.53s/it] +2025-02-05 18:39:45 - ERROR - stderr - +2025-02-05 18:39:45 - ERROR - stderr - +2025-02-05 18:39:45 - INFO - stdout - {'loss': 0.8051, 'grad_norm': 1.2203446626663208, 'learning_rate': 1.479018580015146e-05, 'epoch': 1.08} +2025-02-05 18:39:45 - ERROR - stderr - 36%|███▌ | 8094/22434 [8:32:04<10:05:48, 2.53s/it] +2025-02-05 18:39:47 - ERROR - stderr - 36%|███▌ | 8095/22434 [8:32:07<10:02:38, 2.52s/it] +2025-02-05 18:39:47 - ERROR - stderr - +2025-02-05 18:39:47 - ERROR - stderr - +2025-02-05 18:39:47 - INFO - stdout - {'loss': 0.8003, 'grad_norm': 1.2284289598464966, 'learning_rate': 1.4788918422571023e-05, 'epoch': 1.08} +2025-02-05 18:39:47 - ERROR - stderr - 36%|███▌ | 8095/22434 [8:32:07<10:02:38, 2.52s/it] +2025-02-05 18:39:50 - ERROR - stderr - 36%|███▌ | 8096/22434 [8:32:09<10:03:18, 2.52s/it] +2025-02-05 18:39:50 - ERROR - stderr - +2025-02-05 18:39:50 - ERROR - stderr - +2025-02-05 18:39:50 - INFO - stdout - {'loss': 0.7464, 'grad_norm': 1.1419718265533447, 'learning_rate': 1.4787650945170167e-05, 'epoch': 1.08} +2025-02-05 18:39:50 - ERROR - stderr - 36%|███▌ | 8096/22434 [8:32:09<10:03:18, 2.52s/it] +2025-02-05 18:39:52 - ERROR - stderr - 36%|███▌ | 8097/22434 [8:32:12<10:03:09, 2.52s/it] +2025-02-05 18:39:52 - ERROR - stderr - +2025-02-05 18:39:52 - ERROR - stderr - +2025-02-05 18:39:52 - INFO - stdout - {'loss': 0.7531, 'grad_norm': 1.1263338327407837, 'learning_rate': 1.4786383367975308e-05, 
'epoch': 1.08} +2025-02-05 18:39:52 - ERROR - stderr - 36%|███▌ | 8097/22434 [8:32:12<10:03:09, 2.52s/it] +2025-02-05 18:39:55 - ERROR - stderr - 36%|███▌ | 8098/22434 [8:32:15<10:16:08, 2.58s/it] +2025-02-05 18:39:55 - ERROR - stderr - +2025-02-05 18:39:55 - ERROR - stderr - +2025-02-05 18:39:55 - INFO - stdout - {'loss': 0.74, 'grad_norm': 1.1092720031738281, 'learning_rate': 1.4785115691012866e-05, 'epoch': 1.08} +2025-02-05 18:39:55 - ERROR - stderr - 36%|███▌ | 8098/22434 [8:32:15<10:16:08, 2.58s/it] +2025-02-05 18:39:57 - ERROR - stderr - 36%|███▌ | 8099/22434 [8:32:17<10:13:14, 2.57s/it] +2025-02-05 18:39:57 - ERROR - stderr - +2025-02-05 18:39:57 - ERROR - stderr - +2025-02-05 18:39:57 - INFO - stdout - {'loss': 0.7782, 'grad_norm': 1.1124712228775024, 'learning_rate': 1.4783847914309268e-05, 'epoch': 1.08} +2025-02-05 18:39:57 - ERROR - stderr - 36%|███▌ | 8099/22434 [8:32:17<10:13:14, 2.57s/it] +2025-02-05 18:40:00 - ERROR - stderr - 36%|███▌ | 8100/22434 [8:32:20<10:16:49, 2.58s/it] +2025-02-05 18:40:00 - ERROR - stderr - +2025-02-05 18:40:00 - ERROR - stderr - +2025-02-05 18:40:00 - INFO - stdout - {'loss': 0.7335, 'grad_norm': 1.1575204133987427, 'learning_rate': 1.478258003789094e-05, 'epoch': 1.08} +2025-02-05 18:40:00 - ERROR - stderr - 36%|███▌ | 8100/22434 [8:32:20<10:16:49, 2.58s/it] +2025-02-05 18:40:03 - ERROR - stderr - 36%|███▌ | 8101/22434 [8:32:22<10:15:22, 2.58s/it] +2025-02-05 18:40:03 - ERROR - stderr - +2025-02-05 18:40:03 - ERROR - stderr - +2025-02-05 18:40:03 - INFO - stdout - {'loss': 0.6824, 'grad_norm': 1.0983413457870483, 'learning_rate': 1.4781312061784302e-05, 'epoch': 1.08} +2025-02-05 18:40:03 - ERROR - stderr - 36%|███▌ | 8101/22434 [8:32:22<10:15:22, 2.58s/it] +2025-02-05 18:40:05 - ERROR - stderr - 36%|███▌ | 8102/22434 [8:32:25<10:10:19, 2.56s/it] +2025-02-05 18:40:05 - ERROR - stderr - +2025-02-05 18:40:05 - ERROR - stderr - +2025-02-05 18:40:05 - INFO - stdout - {'loss': 0.6741, 'grad_norm': 1.0680220127105713, 
'learning_rate': 1.4780043986015792e-05, 'epoch': 1.08} +2025-02-05 18:40:05 - ERROR - stderr - 36%|███▌ | 8102/22434 [8:32:25<10:10:19, 2.56s/it] +2025-02-05 18:40:07 - ERROR - stderr - 36%|███▌ | 8103/22434 [8:32:27<10:03:39, 2.53s/it] +2025-02-05 18:40:08 - ERROR - stderr - +2025-02-05 18:40:08 - ERROR - stderr - +2025-02-05 18:40:08 - INFO - stdout - {'loss': 0.6589, 'grad_norm': 0.989000678062439, 'learning_rate': 1.4778775810611836e-05, 'epoch': 1.08} +2025-02-05 18:40:08 - ERROR - stderr - 36%|███▌ | 8103/22434 [8:32:27<10:03:39, 2.53s/it] +2025-02-05 18:40:10 - ERROR - stderr - 36%|███▌ | 8104/22434 [8:32:30<10:01:24, 2.52s/it] +2025-02-05 18:40:10 - ERROR - stderr - +2025-02-05 18:40:10 - ERROR - stderr - +2025-02-05 18:40:10 - INFO - stdout - {'loss': 0.7461, 'grad_norm': 1.1988558769226074, 'learning_rate': 1.4777507535598878e-05, 'epoch': 1.08} +2025-02-05 18:40:10 - ERROR - stderr - 36%|███▌ | 8104/22434 [8:32:30<10:01:24, 2.52s/it] +2025-02-05 18:40:13 - ERROR - stderr - 36%|███▌ | 8105/22434 [8:32:32<10:03:05, 2.53s/it] +2025-02-05 18:40:13 - ERROR - stderr - +2025-02-05 18:40:13 - ERROR - stderr - +2025-02-05 18:40:13 - INFO - stdout - {'loss': 0.7594, 'grad_norm': 1.2828068733215332, 'learning_rate': 1.4776239161003343e-05, 'epoch': 1.08} +2025-02-05 18:40:13 - ERROR - stderr - 36%|███▌ | 8105/22434 [8:32:32<10:03:05, 2.53s/it] +2025-02-05 18:40:15 - ERROR - stderr - 36%|███▌ | 8106/22434 [8:32:35<10:07:29, 2.54s/it] +2025-02-05 18:40:15 - ERROR - stderr - +2025-02-05 18:40:15 - ERROR - stderr - +2025-02-05 18:40:15 - INFO - stdout - {'loss': 0.706, 'grad_norm': 1.1348973512649536, 'learning_rate': 1.4774970686851671e-05, 'epoch': 1.08} +2025-02-05 18:40:15 - ERROR - stderr - 36%|███▌ | 8106/22434 [8:32:35<10:07:29, 2.54s/it] +2025-02-05 18:40:18 - ERROR - stderr - 36%|███▌ | 8107/22434 [8:32:37<10:06:34, 2.54s/it] +2025-02-05 18:40:18 - ERROR - stderr - +2025-02-05 18:40:18 - ERROR - stderr - +2025-02-05 18:40:18 - INFO - stdout - {'loss': 0.704, 
'grad_norm': 1.1082526445388794, 'learning_rate': 1.4773702113170308e-05, 'epoch': 1.08} +2025-02-05 18:40:18 - ERROR - stderr - 36%|███▌ | 8107/22434 [8:32:37<10:06:34, 2.54s/it] +2025-02-05 18:40:20 - ERROR - stderr - 36%|███▌ | 8108/22434 [8:32:40<10:05:30, 2.54s/it] +2025-02-05 18:40:20 - ERROR - stderr - +2025-02-05 18:40:20 - ERROR - stderr - +2025-02-05 18:40:20 - INFO - stdout - {'loss': 0.7319, 'grad_norm': 1.1594127416610718, 'learning_rate': 1.4772433439985692e-05, 'epoch': 1.08} +2025-02-05 18:40:20 - ERROR - stderr - 36%|███▌ | 8108/22434 [8:32:40<10:05:30, 2.54s/it] +2025-02-05 18:40:23 - ERROR - stderr - 36%|███▌ | 8109/22434 [8:32:42<10:01:42, 2.52s/it] +2025-02-05 18:40:23 - ERROR - stderr - +2025-02-05 18:40:23 - ERROR - stderr - +2025-02-05 18:40:23 - INFO - stdout - {'loss': 0.7362, 'grad_norm': 1.0741583108901978, 'learning_rate': 1.4771164667324262e-05, 'epoch': 1.08} +2025-02-05 18:40:23 - ERROR - stderr - 36%|███▌ | 8109/22434 [8:32:42<10:01:42, 2.52s/it] +2025-02-05 18:40:25 - ERROR - stderr - 36%|███▌ | 8110/22434 [8:32:45<10:01:26, 2.52s/it] +2025-02-05 18:40:25 - ERROR - stderr - +2025-02-05 18:40:25 - ERROR - stderr - +2025-02-05 18:40:25 - INFO - stdout - {'loss': 0.7232, 'grad_norm': 1.0652552843093872, 'learning_rate': 1.4769895795212476e-05, 'epoch': 1.08} +2025-02-05 18:40:25 - ERROR - stderr - 36%|███▌ | 8110/22434 [8:32:45<10:01:26, 2.52s/it] +2025-02-05 18:40:28 - ERROR - stderr - 36%|███▌ | 8111/22434 [8:32:47<9:57:51, 2.50s/it] +2025-02-05 18:40:28 - ERROR - stderr - +2025-02-05 18:40:28 - ERROR - stderr - +2025-02-05 18:40:28 - INFO - stdout - {'loss': 0.7287, 'grad_norm': 1.0738643407821655, 'learning_rate': 1.4768626823676775e-05, 'epoch': 1.08} +2025-02-05 18:40:28 - ERROR - stderr - 36%|███▌ | 8111/22434 [8:32:47<9:57:51, 2.50s/it] +2025-02-05 18:40:31 - ERROR - stderr - 36%|███▌ | 8112/22434 [8:32:50<10:32:03, 2.65s/it] +2025-02-05 18:40:31 - ERROR - stderr - +2025-02-05 18:40:31 - ERROR - stderr - +2025-02-05 18:40:31 - 
INFO - stdout - {'loss': 0.7438, 'grad_norm': 0.9982830286026001, 'learning_rate': 1.4767357752743612e-05, 'epoch': 1.08} +2025-02-05 18:40:31 - ERROR - stderr - 36%|███▌ | 8112/22434 [8:32:50<10:32:03, 2.65s/it] +2025-02-05 18:40:33 - ERROR - stderr - 36%|███▌ | 8113/22434 [8:32:53<10:24:08, 2.61s/it] +2025-02-05 18:40:33 - ERROR - stderr - +2025-02-05 18:40:33 - ERROR - stderr - +2025-02-05 18:40:33 - INFO - stdout - {'loss': 0.7127, 'grad_norm': 1.0571751594543457, 'learning_rate': 1.4766088582439438e-05, 'epoch': 1.08} +2025-02-05 18:40:33 - ERROR - stderr - 36%|███▌ | 8113/22434 [8:32:53<10:24:08, 2.61s/it] +2025-02-05 18:40:36 - ERROR - stderr - 36%|███▌ | 8114/22434 [8:32:55<10:18:24, 2.59s/it] +2025-02-05 18:40:36 - ERROR - stderr - +2025-02-05 18:40:36 - ERROR - stderr - +2025-02-05 18:40:36 - INFO - stdout - {'loss': 0.7474, 'grad_norm': 1.2180893421173096, 'learning_rate': 1.4764819312790706e-05, 'epoch': 1.09} +2025-02-05 18:40:36 - ERROR - stderr - 36%|███▌ | 8114/22434 [8:32:56<10:18:24, 2.59s/it] +2025-02-05 18:40:38 - ERROR - stderr - 36%|███▌ | 8115/22434 [8:32:58<10:10:16, 2.56s/it] +2025-02-05 18:40:38 - ERROR - stderr - +2025-02-05 18:40:38 - ERROR - stderr - +2025-02-05 18:40:38 - INFO - stdout - {'loss': 0.7408, 'grad_norm': 1.1505693197250366, 'learning_rate': 1.4763549943823876e-05, 'epoch': 1.09} +2025-02-05 18:40:38 - ERROR - stderr - 36%|███▌ | 8115/22434 [8:32:58<10:10:16, 2.56s/it] +2025-02-05 18:40:41 - ERROR - stderr - 36%|███▌ | 8116/22434 [8:33:00<10:06:27, 2.54s/it] +2025-02-05 18:40:41 - ERROR - stderr - +2025-02-05 18:40:41 - ERROR - stderr - +2025-02-05 18:40:41 - INFO - stdout - {'loss': 0.7685, 'grad_norm': 1.306916356086731, 'learning_rate': 1.4762280475565404e-05, 'epoch': 1.09} +2025-02-05 18:40:41 - ERROR - stderr - 36%|███▌ | 8116/22434 [8:33:01<10:06:27, 2.54s/it] +2025-02-05 18:40:43 - ERROR - stderr - 36%|███▌ | 8117/22434 [8:33:03<10:04:28, 2.53s/it] +2025-02-05 18:40:43 - ERROR - stderr - +2025-02-05 18:40:43 - ERROR 
- stderr - +2025-02-05 18:40:43 - INFO - stdout - {'loss': 0.8804, 'grad_norm': 1.18074631690979, 'learning_rate': 1.4761010908041758e-05, 'epoch': 1.09} +2025-02-05 18:40:43 - ERROR - stderr - 36%|███▌ | 8117/22434 [8:33:03<10:04:28, 2.53s/it] +2025-02-05 18:40:46 - ERROR - stderr - 36%|███▌ | 8118/22434 [8:33:06<10:13:35, 2.57s/it] +2025-02-05 18:40:46 - ERROR - stderr - +2025-02-05 18:40:46 - ERROR - stderr - +2025-02-05 18:40:46 - INFO - stdout - {'loss': 0.7791, 'grad_norm': 1.1360831260681152, 'learning_rate': 1.475974124127939e-05, 'epoch': 1.09} +2025-02-05 18:40:46 - ERROR - stderr - 36%|███▌ | 8118/22434 [8:33:06<10:13:35, 2.57s/it] +2025-02-05 18:40:49 - ERROR - stderr - 36%|███▌ | 8119/22434 [8:33:09<10:34:57, 2.66s/it] +2025-02-05 18:40:49 - ERROR - stderr - +2025-02-05 18:40:49 - ERROR - stderr - +2025-02-05 18:40:49 - INFO - stdout - {'loss': 0.719, 'grad_norm': 1.148830771446228, 'learning_rate': 1.4758471475304773e-05, 'epoch': 1.09} +2025-02-05 18:40:49 - ERROR - stderr - 36%|███▌ | 8119/22434 [8:33:09<10:34:57, 2.66s/it] +2025-02-05 18:40:51 - ERROR - stderr - 36%|███▌ | 8120/22434 [8:33:11<10:21:08, 2.60s/it] +2025-02-05 18:40:51 - ERROR - stderr - +2025-02-05 18:40:51 - ERROR - stderr - +2025-02-05 18:40:51 - INFO - stdout - {'loss': 0.7466, 'grad_norm': 1.187147855758667, 'learning_rate': 1.4757201610144372e-05, 'epoch': 1.09} +2025-02-05 18:40:51 - ERROR - stderr - 36%|███▌ | 8120/22434 [8:33:11<10:21:08, 2.60s/it] +2025-02-05 18:40:54 - ERROR - stderr - 36%|███▌ | 8121/22434 [8:33:13<10:15:41, 2.58s/it] +2025-02-05 18:40:54 - ERROR - stderr - +2025-02-05 18:40:54 - ERROR - stderr - +2025-02-05 18:40:54 - INFO - stdout - {'loss': 0.7766, 'grad_norm': 1.1305124759674072, 'learning_rate': 1.4755931645824653e-05, 'epoch': 1.09} +2025-02-05 18:40:54 - ERROR - stderr - 36%|███▌ | 8121/22434 [8:33:14<10:15:41, 2.58s/it] +2025-02-05 18:40:56 - ERROR - stderr - 36%|███▌ | 8122/22434 [8:33:16<10:10:30, 2.56s/it] +2025-02-05 18:40:56 - ERROR - stderr - 
+2025-02-05 18:40:56 - ERROR - stderr - +2025-02-05 18:40:56 - INFO - stdout - {'loss': 0.7616, 'grad_norm': 1.1587939262390137, 'learning_rate': 1.475466158237209e-05, 'epoch': 1.09} +2025-02-05 18:40:56 - ERROR - stderr - 36%|███▌ | 8122/22434 [8:33:16<10:10:30, 2.56s/it] +2025-02-05 18:40:59 - ERROR - stderr - 36%|███▌ | 8123/22434 [8:33:19<10:07:49, 2.55s/it] +2025-02-05 18:40:59 - ERROR - stderr - +2025-02-05 18:40:59 - ERROR - stderr - +2025-02-05 18:40:59 - INFO - stdout - {'loss': 0.7082, 'grad_norm': 1.1204460859298706, 'learning_rate': 1.4753391419813156e-05, 'epoch': 1.09} +2025-02-05 18:40:59 - ERROR - stderr - 36%|███▌ | 8123/22434 [8:33:19<10:07:49, 2.55s/it] +2025-02-05 18:41:01 - ERROR - stderr - 36%|███▌ | 8124/22434 [8:33:21<10:17:10, 2.59s/it] +2025-02-05 18:41:01 - ERROR - stderr - +2025-02-05 18:41:01 - ERROR - stderr - +2025-02-05 18:41:01 - INFO - stdout - {'loss': 0.7621, 'grad_norm': 1.2031095027923584, 'learning_rate': 1.4752121158174331e-05, 'epoch': 1.09} +2025-02-05 18:41:01 - ERROR - stderr - 36%|███▌ | 8124/22434 [8:33:21<10:17:10, 2.59s/it] +2025-02-05 18:41:04 - ERROR - stderr - 36%|███▌ | 8125/22434 [8:33:24<10:17:00, 2.59s/it] +2025-02-05 18:41:04 - ERROR - stderr - +2025-02-05 18:41:04 - ERROR - stderr - +2025-02-05 18:41:04 - INFO - stdout - {'loss': 0.8265, 'grad_norm': 1.2159233093261719, 'learning_rate': 1.4750850797482082e-05, 'epoch': 1.09} +2025-02-05 18:41:04 - ERROR - stderr - 36%|███▌ | 8125/22434 [8:33:24<10:17:00, 2.59s/it] +2025-02-05 18:41:06 - ERROR - stderr - 36%|███▌ | 8126/22434 [8:33:26<10:08:17, 2.55s/it] +2025-02-05 18:41:07 - ERROR - stderr - +2025-02-05 18:41:07 - ERROR - stderr - +2025-02-05 18:41:07 - INFO - stdout - {'loss': 0.7418, 'grad_norm': 1.2648773193359375, 'learning_rate': 1.4749580337762896e-05, 'epoch': 1.09} +2025-02-05 18:41:07 - ERROR - stderr - 36%|███▌ | 8126/22434 [8:33:26<10:08:17, 2.55s/it] +2025-02-05 18:41:10 - ERROR - stderr - 36%|███▌ | 8127/22434 [8:33:29<10:46:37, 2.71s/it] 
+2025-02-05 18:41:10 - ERROR - stderr - +2025-02-05 18:41:10 - ERROR - stderr - +2025-02-05 18:41:10 - INFO - stdout - {'loss': 0.712, 'grad_norm': 1.0170738697052002, 'learning_rate': 1.4748309779043253e-05, 'epoch': 1.09} +2025-02-05 18:41:10 - ERROR - stderr - 36%|███▌ | 8127/22434 [8:33:29<10:46:37, 2.71s/it] +2025-02-05 18:41:12 - ERROR - stderr - 36%|███▌ | 8128/22434 [8:33:32<10:28:22, 2.64s/it] +2025-02-05 18:41:12 - ERROR - stderr - +2025-02-05 18:41:12 - ERROR - stderr - +2025-02-05 18:41:12 - INFO - stdout - {'loss': 0.7049, 'grad_norm': 1.3066020011901855, 'learning_rate': 1.4747039121349636e-05, 'epoch': 1.09} +2025-02-05 18:41:12 - ERROR - stderr - 36%|███▌ | 8128/22434 [8:33:32<10:28:22, 2.64s/it] +2025-02-05 18:41:15 - ERROR - stderr - 36%|███▌ | 8129/22434 [8:33:34<10:16:49, 2.59s/it] +2025-02-05 18:41:15 - ERROR - stderr - +2025-02-05 18:41:15 - ERROR - stderr - +2025-02-05 18:41:15 - INFO - stdout - {'loss': 0.7926, 'grad_norm': 1.2325260639190674, 'learning_rate': 1.4745768364708532e-05, 'epoch': 1.09} +2025-02-05 18:41:15 - ERROR - stderr - 36%|███▌ | 8129/22434 [8:33:34<10:16:49, 2.59s/it] +2025-02-05 18:41:17 - ERROR - stderr - 36%|███▌ | 8130/22434 [8:33:37<10:10:43, 2.56s/it] +2025-02-05 18:41:17 - ERROR - stderr - +2025-02-05 18:41:17 - ERROR - stderr - +2025-02-05 18:41:17 - INFO - stdout - {'loss': 0.6746, 'grad_norm': 1.176430106163025, 'learning_rate': 1.4744497509146427e-05, 'epoch': 1.09} +2025-02-05 18:41:17 - ERROR - stderr - 36%|███▌ | 8130/22434 [8:33:37<10:10:43, 2.56s/it] +2025-02-05 18:41:20 - ERROR - stderr - 36%|███▌ | 8131/22434 [8:33:39<10:07:15, 2.55s/it] +2025-02-05 18:41:20 - ERROR - stderr - +2025-02-05 18:41:20 - ERROR - stderr - +2025-02-05 18:41:20 - INFO - stdout - {'loss': 0.7296, 'grad_norm': 1.1593271493911743, 'learning_rate': 1.4743226554689811e-05, 'epoch': 1.09} +2025-02-05 18:41:20 - ERROR - stderr - 36%|███▌ | 8131/22434 [8:33:39<10:07:15, 2.55s/it] +2025-02-05 18:41:22 - ERROR - stderr - 36%|███▌ | 
8132/22434 [8:33:42<10:11:53, 2.57s/it] +2025-02-05 18:41:22 - ERROR - stderr - +2025-02-05 18:41:22 - ERROR - stderr - +2025-02-05 18:41:22 - INFO - stdout - {'loss': 0.7083, 'grad_norm': 1.1588596105575562, 'learning_rate': 1.4741955501365177e-05, 'epoch': 1.09} +2025-02-05 18:41:22 - ERROR - stderr - 36%|███▌ | 8132/22434 [8:33:42<10:11:53, 2.57s/it] +2025-02-05 18:41:25 - ERROR - stderr - 36%|███▋ | 8133/22434 [8:33:44<10:13:28, 2.57s/it] +2025-02-05 18:41:25 - ERROR - stderr - +2025-02-05 18:41:25 - ERROR - stderr - +2025-02-05 18:41:25 - INFO - stdout - {'loss': 0.6524, 'grad_norm': 1.0420947074890137, 'learning_rate': 1.474068434919902e-05, 'epoch': 1.09} +2025-02-05 18:41:25 - ERROR - stderr - 36%|███▋ | 8133/22434 [8:33:45<10:13:28, 2.57s/it] +2025-02-05 18:41:27 - ERROR - stderr - 36%|███▋ | 8134/22434 [8:33:47<10:15:06, 2.58s/it] +2025-02-05 18:41:27 - ERROR - stderr - +2025-02-05 18:41:27 - ERROR - stderr - +2025-02-05 18:41:27 - INFO - stdout - {'loss': 0.7209, 'grad_norm': 1.1558109521865845, 'learning_rate': 1.473941309821783e-05, 'epoch': 1.09} +2025-02-05 18:41:27 - ERROR - stderr - 36%|███▋ | 8134/22434 [8:33:47<10:15:06, 2.58s/it] +2025-02-05 18:41:30 - ERROR - stderr - 36%|███▋ | 8135/22434 [8:33:50<10:06:09, 2.54s/it] +2025-02-05 18:41:30 - ERROR - stderr - +2025-02-05 18:41:30 - ERROR - stderr - +2025-02-05 18:41:30 - INFO - stdout - {'loss': 0.7218, 'grad_norm': 1.224700927734375, 'learning_rate': 1.4738141748448112e-05, 'epoch': 1.09} +2025-02-05 18:41:30 - ERROR - stderr - 36%|███▋ | 8135/22434 [8:33:50<10:06:09, 2.54s/it] +2025-02-05 18:41:32 - ERROR - stderr - 36%|███▋ | 8136/22434 [8:33:52<10:01:00, 2.52s/it] +2025-02-05 18:41:32 - ERROR - stderr - +2025-02-05 18:41:32 - ERROR - stderr - +2025-02-05 18:41:32 - INFO - stdout - {'loss': 0.7305, 'grad_norm': 1.0838958024978638, 'learning_rate': 1.4736870299916361e-05, 'epoch': 1.09} +2025-02-05 18:41:32 - ERROR - stderr - 36%|███▋ | 8136/22434 [8:33:52<10:01:00, 2.52s/it] +2025-02-05 
18:41:35 - ERROR - stderr - 36%|███▋ | 8137/22434 [8:33:54<9:56:29, 2.50s/it] +2025-02-05 18:41:35 - ERROR - stderr - +2025-02-05 18:41:35 - ERROR - stderr - +2025-02-05 18:41:35 - INFO - stdout - {'loss': 0.632, 'grad_norm': 1.0932518243789673, 'learning_rate': 1.4735598752649084e-05, 'epoch': 1.09} +2025-02-05 18:41:35 - ERROR - stderr - 36%|███▋ | 8137/22434 [8:33:55<9:56:29, 2.50s/it] +2025-02-05 18:41:37 - ERROR - stderr - 36%|███▋ | 8138/22434 [8:33:57<9:56:27, 2.50s/it] +2025-02-05 18:41:37 - ERROR - stderr - +2025-02-05 18:41:37 - ERROR - stderr - +2025-02-05 18:41:37 - INFO - stdout - {'loss': 0.6958, 'grad_norm': 1.052201747894287, 'learning_rate': 1.473432710667278e-05, 'epoch': 1.09} +2025-02-05 18:41:37 - ERROR - stderr - 36%|███▋ | 8138/22434 [8:33:57<9:56:27, 2.50s/it] +2025-02-05 18:41:40 - ERROR - stderr - 36%|███▋ | 8139/22434 [8:34:00<10:07:48, 2.55s/it] +2025-02-05 18:41:40 - ERROR - stderr - +2025-02-05 18:41:40 - ERROR - stderr - +2025-02-05 18:41:40 - INFO - stdout - {'loss': 0.7268, 'grad_norm': 1.2122379541397095, 'learning_rate': 1.4733055362013957e-05, 'epoch': 1.09} +2025-02-05 18:41:40 - ERROR - stderr - 36%|███▋ | 8139/22434 [8:34:00<10:07:48, 2.55s/it] +2025-02-05 18:41:42 - ERROR - stderr - 36%|███▋ | 8140/22434 [8:34:02<10:07:01, 2.55s/it] +2025-02-05 18:41:42 - ERROR - stderr - +2025-02-05 18:41:42 - ERROR - stderr - +2025-02-05 18:41:42 - INFO - stdout - {'loss': 0.7608, 'grad_norm': 1.193186640739441, 'learning_rate': 1.4731783518699128e-05, 'epoch': 1.09} +2025-02-05 18:41:42 - ERROR - stderr - 36%|███▋ | 8140/22434 [8:34:02<10:07:01, 2.55s/it] +2025-02-05 18:41:45 - ERROR - stderr - 36%|███▋ | 8141/22434 [8:34:05<9:58:31, 2.51s/it] +2025-02-05 18:41:45 - ERROR - stderr - +2025-02-05 18:41:45 - ERROR - stderr - +2025-02-05 18:41:45 - INFO - stdout - {'loss': 0.7338, 'grad_norm': 1.11224365234375, 'learning_rate': 1.4730511576754794e-05, 'epoch': 1.09} +2025-02-05 18:41:45 - ERROR - stderr - 36%|███▋ | 8141/22434 
[8:34:05<9:58:31, 2.51s/it] +2025-02-05 18:41:47 - ERROR - stderr - 36%|███▋ | 8142/22434 [8:34:07<9:55:09, 2.50s/it] +2025-02-05 18:41:47 - ERROR - stderr - +2025-02-05 18:41:47 - ERROR - stderr - +2025-02-05 18:41:47 - INFO - stdout - {'loss': 0.7144, 'grad_norm': 1.2209076881408691, 'learning_rate': 1.4729239536207476e-05, 'epoch': 1.09} +2025-02-05 18:41:47 - ERROR - stderr - 36%|███▋ | 8142/22434 [8:34:07<9:55:09, 2.50s/it] +2025-02-05 18:41:50 - ERROR - stderr - 36%|███▋ | 8143/22434 [8:34:10<10:01:06, 2.52s/it] +2025-02-05 18:41:50 - ERROR - stderr - +2025-02-05 18:41:50 - ERROR - stderr - +2025-02-05 18:41:50 - INFO - stdout - {'loss': 0.7481, 'grad_norm': 1.338446021080017, 'learning_rate': 1.4727967397083684e-05, 'epoch': 1.09} +2025-02-05 18:41:50 - ERROR - stderr - 36%|███▋ | 8143/22434 [8:34:10<10:01:06, 2.52s/it] +2025-02-05 18:41:52 - ERROR - stderr - 36%|███▋ | 8144/22434 [8:34:12<9:56:30, 2.50s/it] +2025-02-05 18:41:52 - ERROR - stderr - +2025-02-05 18:41:52 - ERROR - stderr - +2025-02-05 18:41:52 - INFO - stdout - {'loss': 0.6898, 'grad_norm': 1.1219849586486816, 'learning_rate': 1.4726695159409938e-05, 'epoch': 1.09} +2025-02-05 18:41:52 - ERROR - stderr - 36%|███▋ | 8144/22434 [8:34:12<9:56:30, 2.50s/it] +2025-02-05 18:41:55 - ERROR - stderr - 36%|███▋ | 8145/22434 [8:34:15<10:06:36, 2.55s/it] +2025-02-05 18:41:55 - ERROR - stderr - +2025-02-05 18:41:55 - ERROR - stderr - +2025-02-05 18:41:55 - INFO - stdout - {'loss': 0.6844, 'grad_norm': 1.0940457582473755, 'learning_rate': 1.4725422823212754e-05, 'epoch': 1.09} +2025-02-05 18:41:55 - ERROR - stderr - 36%|███▋ | 8145/22434 [8:34:15<10:06:36, 2.55s/it] +2025-02-05 18:41:58 - ERROR - stderr - 36%|███▋ | 8146/22434 [8:34:17<10:15:49, 2.59s/it] +2025-02-05 18:41:58 - ERROR - stderr - +2025-02-05 18:41:58 - ERROR - stderr - +2025-02-05 18:41:58 - INFO - stdout - {'loss': 0.6175, 'grad_norm': 1.1348212957382202, 'learning_rate': 1.4724150388518651e-05, 'epoch': 1.09} +2025-02-05 18:41:58 - ERROR - 
stderr - 36%|███▋ | 8146/22434 [8:34:18<10:15:49, 2.59s/it] +2025-02-05 18:42:00 - ERROR - stderr - 36%|███▋ | 8147/22434 [8:34:20<10:16:39, 2.59s/it] +2025-02-05 18:42:00 - ERROR - stderr - +2025-02-05 18:42:00 - ERROR - stderr - +2025-02-05 18:42:00 - INFO - stdout - {'loss': 0.7989, 'grad_norm': 1.2818306684494019, 'learning_rate': 1.4722877855354156e-05, 'epoch': 1.09} +2025-02-05 18:42:00 - ERROR - stderr - 36%|███▋ | 8147/22434 [8:34:20<10:16:39, 2.59s/it] +2025-02-05 18:42:03 - ERROR - stderr - 36%|███▋ | 8148/22434 [8:34:23<10:12:28, 2.57s/it] +2025-02-05 18:42:03 - ERROR - stderr - +2025-02-05 18:42:03 - ERROR - stderr - +2025-02-05 18:42:03 - INFO - stdout - {'loss': 0.8142, 'grad_norm': 1.2465180158615112, 'learning_rate': 1.472160522374579e-05, 'epoch': 1.09} +2025-02-05 18:42:03 - ERROR - stderr - 36%|███▋ | 8148/22434 [8:34:23<10:12:28, 2.57s/it] +2025-02-05 18:42:05 - ERROR - stderr - 36%|███▋ | 8149/22434 [8:34:25<10:06:00, 2.55s/it] +2025-02-05 18:42:05 - ERROR - stderr - +2025-02-05 18:42:05 - ERROR - stderr - +2025-02-05 18:42:05 - INFO - stdout - {'loss': 0.7122, 'grad_norm': 1.0677372217178345, 'learning_rate': 1.4720332493720082e-05, 'epoch': 1.09} +2025-02-05 18:42:05 - ERROR - stderr - 36%|███▋ | 8149/22434 [8:34:25<10:06:00, 2.55s/it] +2025-02-05 18:42:08 - ERROR - stderr - 36%|███▋ | 8150/22434 [8:34:28<10:00:57, 2.52s/it] +2025-02-05 18:42:08 - ERROR - stderr - +2025-02-05 18:42:08 - ERROR - stderr - +2025-02-05 18:42:08 - INFO - stdout - {'loss': 0.682, 'grad_norm': 1.032468318939209, 'learning_rate': 1.4719059665303559e-05, 'epoch': 1.09} +2025-02-05 18:42:08 - ERROR - stderr - 36%|███▋ | 8150/22434 [8:34:28<10:00:57, 2.52s/it] +2025-02-05 18:42:10 - ERROR - stderr - 36%|███▋ | 8151/22434 [8:34:30<9:56:15, 2.50s/it] +2025-02-05 18:42:10 - ERROR - stderr - +2025-02-05 18:42:10 - ERROR - stderr - +2025-02-05 18:42:10 - INFO - stdout - {'loss': 0.7498, 'grad_norm': 1.2742773294448853, 'learning_rate': 1.4717786738522753e-05, 'epoch': 1.09} 
+2025-02-05 18:42:10 - ERROR - stderr - 36%|███▋ | 8151/22434 [8:34:30<9:56:15, 2.50s/it] +2025-02-05 18:42:13 - ERROR - stderr - 36%|███▋ | 8152/22434 [8:34:33<9:56:57, 2.51s/it] +2025-02-05 18:42:13 - ERROR - stderr - +2025-02-05 18:42:13 - ERROR - stderr - +2025-02-05 18:42:13 - INFO - stdout - {'loss': 0.7706, 'grad_norm': 1.2955206632614136, 'learning_rate': 1.4716513713404199e-05, 'epoch': 1.09} +2025-02-05 18:42:13 - ERROR - stderr - 36%|███▋ | 8152/22434 [8:34:33<9:56:57, 2.51s/it] +2025-02-05 18:42:15 - ERROR - stderr - 36%|███▋ | 8153/22434 [8:34:35<9:53:31, 2.49s/it] +2025-02-05 18:42:15 - ERROR - stderr - +2025-02-05 18:42:15 - ERROR - stderr - +2025-02-05 18:42:15 - INFO - stdout - {'loss': 0.7016, 'grad_norm': 1.1426101922988892, 'learning_rate': 1.4715240589974428e-05, 'epoch': 1.09} +2025-02-05 18:42:15 - ERROR - stderr - 36%|███▋ | 8153/22434 [8:34:35<9:53:31, 2.49s/it] +2025-02-05 18:42:18 - ERROR - stderr - 36%|███▋ | 8154/22434 [8:34:37<9:54:12, 2.50s/it] +2025-02-05 18:42:18 - ERROR - stderr - +2025-02-05 18:42:18 - ERROR - stderr - +2025-02-05 18:42:18 - INFO - stdout - {'loss': 0.6999, 'grad_norm': 1.1886787414550781, 'learning_rate': 1.4713967368259981e-05, 'epoch': 1.09} +2025-02-05 18:42:18 - ERROR - stderr - 36%|███▋ | 8154/22434 [8:34:38<9:54:12, 2.50s/it] +2025-02-05 18:42:20 - ERROR - stderr - 36%|███▋ | 8155/22434 [8:34:40<10:05:28, 2.54s/it] +2025-02-05 18:42:20 - ERROR - stderr - +2025-02-05 18:42:20 - ERROR - stderr - +2025-02-05 18:42:20 - INFO - stdout - {'loss': 0.7448, 'grad_norm': 1.1136610507965088, 'learning_rate': 1.4712694048287387e-05, 'epoch': 1.09} +2025-02-05 18:42:20 - ERROR - stderr - 36%|███▋ | 8155/22434 [8:34:40<10:05:28, 2.54s/it] +2025-02-05 18:42:23 - ERROR - stderr - 36%|███▋ | 8156/22434 [8:34:43<10:00:37, 2.52s/it] +2025-02-05 18:42:23 - ERROR - stderr - +2025-02-05 18:42:23 - ERROR - stderr - +2025-02-05 18:42:23 - INFO - stdout - {'loss': 0.7783, 'grad_norm': 1.1471967697143555, 'learning_rate': 
1.4711420630083204e-05, 'epoch': 1.09} +2025-02-05 18:42:23 - ERROR - stderr - 36%|███▋ | 8156/22434 [8:34:43<10:00:37, 2.52s/it] +2025-02-05 18:42:25 - ERROR - stderr - 36%|███▋ | 8157/22434 [8:34:45<10:08:25, 2.56s/it] +2025-02-05 18:42:26 - ERROR - stderr - +2025-02-05 18:42:26 - ERROR - stderr - +2025-02-05 18:42:26 - INFO - stdout - {'loss': 0.7361, 'grad_norm': 1.2274174690246582, 'learning_rate': 1.4710147113673965e-05, 'epoch': 1.09} +2025-02-05 18:42:26 - ERROR - stderr - 36%|███▋ | 8157/22434 [8:34:45<10:08:25, 2.56s/it] +2025-02-05 18:42:29 - ERROR - stderr - 36%|███▋ | 8158/22434 [8:34:49<11:23:59, 2.87s/it] +2025-02-05 18:42:29 - ERROR - stderr - +2025-02-05 18:42:29 - ERROR - stderr - +2025-02-05 18:42:29 - INFO - stdout - {'loss': 0.6595, 'grad_norm': 0.9566587209701538, 'learning_rate': 1.4708873499086214e-05, 'epoch': 1.09} +2025-02-05 18:42:29 - ERROR - stderr - 36%|███▋ | 8158/22434 [8:34:49<11:23:59, 2.87s/it] +2025-02-05 18:42:32 - ERROR - stderr - 36%|███▋ | 8159/22434 [8:34:52<11:57:44, 3.02s/it] +2025-02-05 18:42:32 - ERROR - stderr - +2025-02-05 18:42:32 - ERROR - stderr - +2025-02-05 18:42:32 - INFO - stdout - {'loss': 0.7283, 'grad_norm': 1.1610045433044434, 'learning_rate': 1.4707599786346501e-05, 'epoch': 1.09} +2025-02-05 18:42:32 - ERROR - stderr - 36%|███▋ | 8159/22434 [8:34:52<11:57:44, 3.02s/it] +2025-02-05 18:42:36 - ERROR - stderr - 36%|███▋ | 8160/22434 [8:34:56<12:22:18, 3.12s/it] +2025-02-05 18:42:36 - ERROR - stderr - +2025-02-05 18:42:36 - ERROR - stderr - +2025-02-05 18:42:36 - INFO - stdout - {'loss': 0.7235, 'grad_norm': 1.1392569541931152, 'learning_rate': 1.4706325975481377e-05, 'epoch': 1.09} +2025-02-05 18:42:36 - ERROR - stderr - 36%|███▋ | 8160/22434 [8:34:56<12:22:18, 3.12s/it] +2025-02-05 18:42:39 - ERROR - stderr - 36%|███▋ | 8161/22434 [8:34:59<12:26:41, 3.14s/it] +2025-02-05 18:42:39 - ERROR - stderr - +2025-02-05 18:42:39 - ERROR - stderr - +2025-02-05 18:42:39 - INFO - stdout - {'loss': 0.7693, 'grad_norm': 
1.1950937509536743, 'learning_rate': 1.4705052066517388e-05, 'epoch': 1.09} +2025-02-05 18:42:39 - ERROR - stderr - 36%|███▋ | 8161/22434 [8:34:59<12:26:41, 3.14s/it] +2025-02-05 18:42:42 - ERROR - stderr - 36%|███▋ | 8162/22434 [8:35:02<12:09:43, 3.07s/it] +2025-02-05 18:42:42 - ERROR - stderr - +2025-02-05 18:42:42 - ERROR - stderr - +2025-02-05 18:42:42 - INFO - stdout - {'loss': 0.7151, 'grad_norm': 1.1389201879501343, 'learning_rate': 1.4703778059481096e-05, 'epoch': 1.09} +2025-02-05 18:42:42 - ERROR - stderr - 36%|███▋ | 8162/22434 [8:35:02<12:09:43, 3.07s/it] +2025-02-05 18:42:44 - ERROR - stderr - 36%|███▋ | 8163/22434 [8:35:04<11:29:11, 2.90s/it] +2025-02-05 18:42:44 - ERROR - stderr - +2025-02-05 18:42:44 - ERROR - stderr - +2025-02-05 18:42:44 - INFO - stdout - {'loss': 0.8371, 'grad_norm': 1.4349377155303955, 'learning_rate': 1.4702503954399047e-05, 'epoch': 1.09} +2025-02-05 18:42:44 - ERROR - stderr - 36%|███▋ | 8163/22434 [8:35:04<11:29:11, 2.90s/it] +2025-02-05 18:42:47 - ERROR - stderr - 36%|███▋ | 8164/22434 [8:35:07<11:25:31, 2.88s/it] +2025-02-05 18:42:47 - ERROR - stderr - +2025-02-05 18:42:47 - ERROR - stderr - +2025-02-05 18:42:47 - INFO - stdout - {'loss': 0.7257, 'grad_norm': 1.0885009765625, 'learning_rate': 1.4701229751297806e-05, 'epoch': 1.09} +2025-02-05 18:42:47 - ERROR - stderr - 36%|███▋ | 8164/22434 [8:35:07<11:25:31, 2.88s/it] +2025-02-05 18:42:50 - ERROR - stderr - 36%|███▋ | 8165/22434 [8:35:10<11:08:50, 2.81s/it] +2025-02-05 18:42:50 - ERROR - stderr - +2025-02-05 18:42:50 - ERROR - stderr - +2025-02-05 18:42:50 - INFO - stdout - {'loss': 0.7088, 'grad_norm': 1.1161704063415527, 'learning_rate': 1.4699955450203929e-05, 'epoch': 1.09} +2025-02-05 18:42:50 - ERROR - stderr - 36%|███▋ | 8165/22434 [8:35:10<11:08:50, 2.81s/it] +2025-02-05 18:42:53 - ERROR - stderr - 36%|███▋ | 8166/22434 [8:35:13<11:14:08, 2.83s/it] +2025-02-05 18:42:53 - ERROR - stderr - +2025-02-05 18:42:53 - ERROR - stderr - +2025-02-05 18:42:53 - INFO - stdout 
- {'loss': 0.7665, 'grad_norm': 1.15769624710083, 'learning_rate': 1.4698681051143976e-05, 'epoch': 1.09} +2025-02-05 18:42:53 - ERROR - stderr - 36%|███▋ | 8166/22434 [8:35:13<11:14:08, 2.83s/it] +2025-02-05 18:42:55 - ERROR - stderr - 36%|███▋ | 8167/22434 [8:35:15<10:50:48, 2.74s/it] +2025-02-05 18:42:55 - ERROR - stderr - +2025-02-05 18:42:55 - ERROR - stderr - +2025-02-05 18:42:55 - INFO - stdout - {'loss': 0.763, 'grad_norm': 1.1866463422775269, 'learning_rate': 1.4697406554144513e-05, 'epoch': 1.09} +2025-02-05 18:42:55 - ERROR - stderr - 36%|███▋ | 8167/22434 [8:35:15<10:50:48, 2.74s/it] +2025-02-05 18:42:58 - ERROR - stderr - 36%|███▋ | 8168/22434 [8:35:18<10:33:25, 2.66s/it] +2025-02-05 18:42:58 - ERROR - stderr - +2025-02-05 18:42:58 - ERROR - stderr - +2025-02-05 18:42:58 - INFO - stdout - {'loss': 0.7819, 'grad_norm': 1.27335786819458, 'learning_rate': 1.4696131959232105e-05, 'epoch': 1.09} +2025-02-05 18:42:58 - ERROR - stderr - 36%|███▋ | 8168/22434 [8:35:18<10:33:25, 2.66s/it] +2025-02-05 18:43:00 - ERROR - stderr - 36%|███▋ | 8169/22434 [8:35:20<10:24:03, 2.62s/it] +2025-02-05 18:43:00 - ERROR - stderr - +2025-02-05 18:43:00 - ERROR - stderr - +2025-02-05 18:43:00 - INFO - stdout - {'loss': 0.7255, 'grad_norm': 1.2271827459335327, 'learning_rate': 1.4694857266433322e-05, 'epoch': 1.09} +2025-02-05 18:43:00 - ERROR - stderr - 36%|███▋ | 8169/22434 [8:35:20<10:24:03, 2.62s/it] +2025-02-05 18:43:03 - ERROR - stderr - 36%|███▋ | 8170/22434 [8:35:23<10:11:04, 2.57s/it] +2025-02-05 18:43:03 - ERROR - stderr - +2025-02-05 18:43:03 - ERROR - stderr - +2025-02-05 18:43:03 - INFO - stdout - {'loss': 0.7435, 'grad_norm': 1.1655311584472656, 'learning_rate': 1.469358247577473e-05, 'epoch': 1.09} +2025-02-05 18:43:03 - ERROR - stderr - 36%|███▋ | 8170/22434 [8:35:23<10:11:04, 2.57s/it] +2025-02-05 18:43:06 - ERROR - stderr - 36%|███▋ | 8171/22434 [8:35:25<10:28:11, 2.64s/it] +2025-02-05 18:43:06 - ERROR - stderr - +2025-02-05 18:43:06 - ERROR - stderr - 
+2025-02-05 18:43:06 - INFO - stdout - {'loss': 0.6415, 'grad_norm': 1.06745183467865, 'learning_rate': 1.4692307587282905e-05, 'epoch': 1.09} +2025-02-05 18:43:06 - ERROR - stderr - 36%|███▋ | 8171/22434 [8:35:25<10:28:11, 2.64s/it] +2025-02-05 18:43:08 - ERROR - stderr - 36%|███▋ | 8172/22434 [8:35:28<10:19:23, 2.61s/it] +2025-02-05 18:43:08 - ERROR - stderr - +2025-02-05 18:43:08 - ERROR - stderr - +2025-02-05 18:43:08 - INFO - stdout - {'loss': 0.7624, 'grad_norm': 1.1530661582946777, 'learning_rate': 1.4691032600984416e-05, 'epoch': 1.09} +2025-02-05 18:43:08 - ERROR - stderr - 36%|███▋ | 8172/22434 [8:35:28<10:19:23, 2.61s/it] +2025-02-05 18:43:11 - ERROR - stderr - 36%|███▋ | 8173/22434 [8:35:30<10:16:23, 2.59s/it] +2025-02-05 18:43:11 - ERROR - stderr - +2025-02-05 18:43:11 - ERROR - stderr - +2025-02-05 18:43:11 - INFO - stdout - {'loss': 0.7125, 'grad_norm': 1.2113919258117676, 'learning_rate': 1.4689757516905842e-05, 'epoch': 1.09} +2025-02-05 18:43:11 - ERROR - stderr - 36%|███▋ | 8173/22434 [8:35:30<10:16:23, 2.59s/it] +2025-02-05 18:43:14 - ERROR - stderr - 36%|███▋ | 8174/22434 [8:35:34<11:10:56, 2.82s/it] +2025-02-05 18:43:14 - ERROR - stderr - +2025-02-05 18:43:14 - ERROR - stderr - +2025-02-05 18:43:14 - INFO - stdout - {'loss': 0.7488, 'grad_norm': 1.3119593858718872, 'learning_rate': 1.468848233507376e-05, 'epoch': 1.09} +2025-02-05 18:43:14 - ERROR - stderr - 36%|███▋ | 8174/22434 [8:35:34<11:10:56, 2.82s/it] +2025-02-05 18:43:17 - ERROR - stderr - 36%|███▋ | 8175/22434 [8:35:36<10:48:25, 2.73s/it] +2025-02-05 18:43:17 - ERROR - stderr - +2025-02-05 18:43:17 - ERROR - stderr - +2025-02-05 18:43:17 - INFO - stdout - {'loss': 0.7328, 'grad_norm': 1.217664361000061, 'learning_rate': 1.468720705551475e-05, 'epoch': 1.09} +2025-02-05 18:43:17 - ERROR - stderr - 36%|███▋ | 8175/22434 [8:35:36<10:48:25, 2.73s/it] +2025-02-05 18:43:19 - ERROR - stderr - 36%|███▋ | 8176/22434 [8:35:39<10:33:58, 2.67s/it] +2025-02-05 18:43:19 - ERROR - stderr - 
+2025-02-05 18:43:19 - ERROR - stderr - +2025-02-05 18:43:19 - INFO - stdout - {'loss': 0.7015, 'grad_norm': 1.1345393657684326, 'learning_rate': 1.4685931678255394e-05, 'epoch': 1.09} +2025-02-05 18:43:19 - ERROR - stderr - 36%|███▋ | 8176/22434 [8:35:39<10:33:58, 2.67s/it] +2025-02-05 18:43:22 - ERROR - stderr - 36%|███▋ | 8177/22434 [8:35:41<10:22:31, 2.62s/it] +2025-02-05 18:43:22 - ERROR - stderr - +2025-02-05 18:43:22 - ERROR - stderr - +2025-02-05 18:43:22 - INFO - stdout - {'loss': 0.7373, 'grad_norm': 1.1055500507354736, 'learning_rate': 1.4684656203322278e-05, 'epoch': 1.09} +2025-02-05 18:43:22 - ERROR - stderr - 36%|███▋ | 8177/22434 [8:35:41<10:22:31, 2.62s/it] +2025-02-05 18:43:24 - ERROR - stderr - 36%|███▋ | 8178/22434 [8:35:44<10:14:47, 2.59s/it] +2025-02-05 18:43:24 - ERROR - stderr - +2025-02-05 18:43:24 - ERROR - stderr - +2025-02-05 18:43:24 - INFO - stdout - {'loss': 0.689, 'grad_norm': 1.232519268989563, 'learning_rate': 1.4683380630741986e-05, 'epoch': 1.09} +2025-02-05 18:43:24 - ERROR - stderr - 36%|███▋ | 8178/22434 [8:35:44<10:14:47, 2.59s/it] +2025-02-05 18:43:27 - ERROR - stderr - 36%|███▋ | 8179/22434 [8:35:47<10:27:51, 2.64s/it] +2025-02-05 18:43:27 - ERROR - stderr - +2025-02-05 18:43:27 - ERROR - stderr - +2025-02-05 18:43:27 - INFO - stdout - {'loss': 0.7696, 'grad_norm': 1.2854101657867432, 'learning_rate': 1.4682104960541104e-05, 'epoch': 1.09} +2025-02-05 18:43:27 - ERROR - stderr - 36%|███▋ | 8179/22434 [8:35:47<10:27:51, 2.64s/it] +2025-02-05 18:43:29 - ERROR - stderr - 36%|███▋ | 8180/22434 [8:35:49<10:22:48, 2.62s/it] +2025-02-05 18:43:29 - ERROR - stderr - +2025-02-05 18:43:29 - ERROR - stderr - +2025-02-05 18:43:29 - INFO - stdout - {'loss': 0.7974, 'grad_norm': 1.1046152114868164, 'learning_rate': 1.4680829192746224e-05, 'epoch': 1.09} +2025-02-05 18:43:29 - ERROR - stderr - 36%|███▋ | 8180/22434 [8:35:49<10:22:48, 2.62s/it] +2025-02-05 18:43:32 - ERROR - stderr - 36%|███▋ | 8181/22434 [8:35:52<10:17:06, 2.60s/it] 
+2025-02-05 18:43:32 - ERROR - stderr - +2025-02-05 18:43:32 - ERROR - stderr - +2025-02-05 18:43:32 - INFO - stdout - {'loss': 0.7228, 'grad_norm': 1.0964359045028687, 'learning_rate': 1.4679553327383942e-05, 'epoch': 1.09} +2025-02-05 18:43:32 - ERROR - stderr - 36%|███▋ | 8181/22434 [8:35:52<10:17:06, 2.60s/it] +2025-02-05 18:43:34 - ERROR - stderr - 36%|███▋ | 8182/22434 [8:35:54<10:13:19, 2.58s/it] +2025-02-05 18:43:35 - ERROR - stderr - +2025-02-05 18:43:35 - ERROR - stderr - +2025-02-05 18:43:35 - INFO - stdout - {'loss': 0.762, 'grad_norm': 1.1129666566848755, 'learning_rate': 1.4678277364480846e-05, 'epoch': 1.09} +2025-02-05 18:43:35 - ERROR - stderr - 36%|███▋ | 8182/22434 [8:35:54<10:13:19, 2.58s/it] +2025-02-05 18:43:37 - ERROR - stderr - 36%|███▋ | 8183/22434 [8:35:57<10:04:36, 2.55s/it] +2025-02-05 18:43:37 - ERROR - stderr - +2025-02-05 18:43:37 - ERROR - stderr - +2025-02-05 18:43:37 - INFO - stdout - {'loss': 0.7522, 'grad_norm': 1.1247106790542603, 'learning_rate': 1.4677001304063533e-05, 'epoch': 1.09} +2025-02-05 18:43:37 - ERROR - stderr - 36%|███▋ | 8183/22434 [8:35:57<10:04:36, 2.55s/it] +2025-02-05 18:43:39 - ERROR - stderr - 36%|███▋ | 8184/22434 [8:35:59<10:03:35, 2.54s/it] +2025-02-05 18:43:40 - ERROR - stderr - +2025-02-05 18:43:40 - ERROR - stderr - +2025-02-05 18:43:40 - INFO - stdout - {'loss': 0.7204, 'grad_norm': 1.1707801818847656, 'learning_rate': 1.4675725146158609e-05, 'epoch': 1.09} +2025-02-05 18:43:40 - ERROR - stderr - 36%|███▋ | 8184/22434 [8:35:59<10:03:35, 2.54s/it] +2025-02-05 18:43:42 - ERROR - stderr - 36%|███▋ | 8185/22434 [8:36:02<10:04:58, 2.55s/it] +2025-02-05 18:43:42 - ERROR - stderr - +2025-02-05 18:43:42 - ERROR - stderr - +2025-02-05 18:43:42 - INFO - stdout - {'loss': 0.7254, 'grad_norm': 1.1898560523986816, 'learning_rate': 1.4674448890792666e-05, 'epoch': 1.09} +2025-02-05 18:43:42 - ERROR - stderr - 36%|███▋ | 8185/22434 [8:36:02<10:04:58, 2.55s/it] +2025-02-05 18:43:44 - ERROR - stderr - 36%|███▋ | 
8186/22434 [8:36:04<9:56:13, 2.51s/it] +2025-02-05 18:43:45 - ERROR - stderr - +2025-02-05 18:43:45 - ERROR - stderr - +2025-02-05 18:43:45 - INFO - stdout - {'loss': 0.7744, 'grad_norm': 1.277867078781128, 'learning_rate': 1.4673172537992306e-05, 'epoch': 1.09} +2025-02-05 18:43:45 - ERROR - stderr - 36%|███▋ | 8186/22434 [8:36:04<9:56:13, 2.51s/it] +2025-02-05 18:43:47 - ERROR - stderr - 36%|███▋ | 8187/22434 [8:36:07<9:56:36, 2.51s/it] +2025-02-05 18:43:47 - ERROR - stderr - +2025-02-05 18:43:47 - ERROR - stderr - +2025-02-05 18:43:47 - INFO - stdout - {'loss': 0.7232, 'grad_norm': 1.2944467067718506, 'learning_rate': 1.4671896087784136e-05, 'epoch': 1.09} +2025-02-05 18:43:47 - ERROR - stderr - 36%|███▋ | 8187/22434 [8:36:07<9:56:36, 2.51s/it] +2025-02-05 18:43:50 - ERROR - stderr - 36%|███▋ | 8188/22434 [8:36:09<9:58:20, 2.52s/it] +2025-02-05 18:43:50 - ERROR - stderr - +2025-02-05 18:43:50 - ERROR - stderr - +2025-02-05 18:43:50 - INFO - stdout - {'loss': 0.7709, 'grad_norm': 1.2902640104293823, 'learning_rate': 1.4670619540194766e-05, 'epoch': 1.09} +2025-02-05 18:43:50 - ERROR - stderr - 36%|███▋ | 8188/22434 [8:36:09<9:58:20, 2.52s/it] +2025-02-05 18:43:52 - ERROR - stderr - 37%|███▋ | 8189/22434 [8:36:12<10:00:21, 2.53s/it] +2025-02-05 18:43:52 - ERROR - stderr - +2025-02-05 18:43:52 - ERROR - stderr - +2025-02-05 18:43:52 - INFO - stdout - {'loss': 0.7152, 'grad_norm': 1.0623537302017212, 'learning_rate': 1.4669342895250803e-05, 'epoch': 1.1} +2025-02-05 18:43:52 - ERROR - stderr - 37%|███▋ | 8189/22434 [8:36:12<10:00:21, 2.53s/it] +2025-02-05 18:43:55 - ERROR - stderr - 37%|███▋ | 8190/22434 [8:36:14<9:55:15, 2.51s/it] +2025-02-05 18:43:55 - ERROR - stderr - +2025-02-05 18:43:55 - ERROR - stderr - +2025-02-05 18:43:55 - INFO - stdout - {'loss': 0.6926, 'grad_norm': 1.0780636072158813, 'learning_rate': 1.4668066152978851e-05, 'epoch': 1.1} +2025-02-05 18:43:55 - ERROR - stderr - 37%|███▋ | 8190/22434 [8:36:14<9:55:15, 2.51s/it] +2025-02-05 18:43:57 - 
ERROR - stderr - 37%|███▋ | 8191/22434 [8:36:17<9:55:51, 2.51s/it] +2025-02-05 18:43:57 - ERROR - stderr - +2025-02-05 18:43:57 - ERROR - stderr - +2025-02-05 18:43:57 - INFO - stdout - {'loss': 0.793, 'grad_norm': 1.3469547033309937, 'learning_rate': 1.4666789313405528e-05, 'epoch': 1.1} +2025-02-05 18:43:57 - ERROR - stderr - 37%|███▋ | 8191/22434 [8:36:17<9:55:51, 2.51s/it] +2025-02-05 18:43:59 - ERROR - stderr - 37%|███▋ | 8192/22434 [8:36:19<9:50:15, 2.49s/it] +2025-02-05 18:44:00 - ERROR - stderr - +2025-02-05 18:44:00 - ERROR - stderr - +2025-02-05 18:44:00 - INFO - stdout - {'loss': 0.6815, 'grad_norm': 1.2358331680297852, 'learning_rate': 1.4665512376557446e-05, 'epoch': 1.1} +2025-02-05 18:44:00 - ERROR - stderr - 37%|███▋ | 8192/22434 [8:36:19<9:50:15, 2.49s/it] +2025-02-05 18:44:02 - ERROR - stderr - 37%|███▋ | 8193/22434 [8:36:22<9:57:59, 2.52s/it] +2025-02-05 18:44:02 - ERROR - stderr - +2025-02-05 18:44:02 - ERROR - stderr - +2025-02-05 18:44:02 - INFO - stdout - {'loss': 0.6968, 'grad_norm': 1.0827410221099854, 'learning_rate': 1.4664235342461226e-05, 'epoch': 1.1} +2025-02-05 18:44:02 - ERROR - stderr - 37%|███▋ | 8193/22434 [8:36:22<9:57:59, 2.52s/it] +2025-02-05 18:44:05 - ERROR - stderr - 37%|███▋ | 8194/22434 [8:36:24<9:56:46, 2.51s/it] +2025-02-05 18:44:05 - ERROR - stderr - +2025-02-05 18:44:05 - ERROR - stderr - +2025-02-05 18:44:05 - INFO - stdout - {'loss': 0.6783, 'grad_norm': 1.1867256164550781, 'learning_rate': 1.466295821114348e-05, 'epoch': 1.1} +2025-02-05 18:44:05 - ERROR - stderr - 37%|███▋ | 8194/22434 [8:36:24<9:56:46, 2.51s/it] +2025-02-05 18:44:07 - ERROR - stderr - 37%|███▋ | 8195/22434 [8:36:27<9:52:09, 2.50s/it] +2025-02-05 18:44:07 - ERROR - stderr - +2025-02-05 18:44:07 - ERROR - stderr - +2025-02-05 18:44:07 - INFO - stdout - {'loss': 0.7491, 'grad_norm': 1.1024630069732666, 'learning_rate': 1.4661680982630834e-05, 'epoch': 1.1} +2025-02-05 18:44:07 - ERROR - stderr - 37%|███▋ | 8195/22434 [8:36:27<9:52:09, 2.50s/it] 
+2025-02-05 18:44:10 - ERROR - stderr - 37%|███▋ | 8196/22434 [8:36:29<9:54:48, 2.51s/it] +2025-02-05 18:44:10 - ERROR - stderr - +2025-02-05 18:44:10 - ERROR - stderr - +2025-02-05 18:44:10 - INFO - stdout - {'loss': 0.76, 'grad_norm': 1.1474690437316895, 'learning_rate': 1.4660403656949908e-05, 'epoch': 1.1} +2025-02-05 18:44:10 - ERROR - stderr - 37%|███▋ | 8196/22434 [8:36:29<9:54:48, 2.51s/it] +2025-02-05 18:44:12 - ERROR - stderr - 37%|███▋ | 8197/22434 [8:36:32<9:58:42, 2.52s/it] +2025-02-05 18:44:12 - ERROR - stderr - +2025-02-05 18:44:12 - ERROR - stderr - +2025-02-05 18:44:12 - INFO - stdout - {'loss': 0.7312, 'grad_norm': 1.1353682279586792, 'learning_rate': 1.4659126234127333e-05, 'epoch': 1.1} +2025-02-05 18:44:12 - ERROR - stderr - 37%|███▋ | 8197/22434 [8:36:32<9:58:42, 2.52s/it] +2025-02-05 18:44:15 - ERROR - stderr - 37%|███▋ | 8198/22434 [8:36:34<9:59:30, 2.53s/it] +2025-02-05 18:44:15 - ERROR - stderr - +2025-02-05 18:44:15 - ERROR - stderr - +2025-02-05 18:44:15 - INFO - stdout - {'loss': 0.7491, 'grad_norm': 1.1524615287780762, 'learning_rate': 1.4657848714189724e-05, 'epoch': 1.1} +2025-02-05 18:44:15 - ERROR - stderr - 37%|███▋ | 8198/22434 [8:36:34<9:59:30, 2.53s/it] +2025-02-05 18:44:17 - ERROR - stderr - 37%|███▋ | 8199/22434 [8:36:37<10:07:50, 2.56s/it] +2025-02-05 18:44:17 - ERROR - stderr - +2025-02-05 18:44:17 - ERROR - stderr - +2025-02-05 18:44:17 - INFO - stdout - {'loss': 0.7293, 'grad_norm': 1.2165710926055908, 'learning_rate': 1.4656571097163717e-05, 'epoch': 1.1} +2025-02-05 18:44:17 - ERROR - stderr - 37%|███▋ | 8199/22434 [8:36:37<10:07:50, 2.56s/it] +2025-02-05 18:44:20 - ERROR - stderr - 37%|███▋ | 8200/22434 [8:36:40<10:14:50, 2.59s/it] +2025-02-05 18:44:20 - ERROR - stderr - +2025-02-05 18:44:20 - ERROR - stderr - +2025-02-05 18:44:20 - INFO - stdout - {'loss': 0.7938, 'grad_norm': 1.2023200988769531, 'learning_rate': 1.4655293383075937e-05, 'epoch': 1.1} +2025-02-05 18:44:20 - ERROR - stderr - 37%|███▋ | 8200/22434
[8:36:40<10:14:50, 2.59s/it] +2025-02-05 18:44:23 - ERROR - stderr - 37%|███▋ | 8201/22434 [8:36:42<10:15:06, 2.59s/it] +2025-02-05 18:44:23 - ERROR - stderr - +2025-02-05 18:44:23 - ERROR - stderr - +2025-02-05 18:44:23 - INFO - stdout - {'loss': 0.7369, 'grad_norm': 1.2271883487701416, 'learning_rate': 1.465401557195303e-05, 'epoch': 1.1} +2025-02-05 18:44:23 - ERROR - stderr - 37%|███▋ | 8201/22434 [8:36:42<10:15:06, 2.59s/it] +2025-02-05 18:44:25 - ERROR - stderr - 37%|███▋ | 8202/22434 [8:36:45<10:13:29, 2.59s/it] +2025-02-05 18:44:25 - ERROR - stderr - +2025-02-05 18:44:25 - ERROR - stderr - +2025-02-05 18:44:25 - INFO - stdout - {'loss': 0.7004, 'grad_norm': 1.1714974641799927, 'learning_rate': 1.4652737663821614e-05, 'epoch': 1.1} +2025-02-05 18:44:25 - ERROR - stderr - 37%|███▋ | 8202/22434 [8:36:45<10:13:29, 2.59s/it] +2025-02-05 18:44:28 - ERROR - stderr - 37%|███▋ | 8203/22434 [8:36:47<10:11:03, 2.58s/it] +2025-02-05 18:44:28 - ERROR - stderr - +2025-02-05 18:44:28 - ERROR - stderr - +2025-02-05 18:44:28 - INFO - stdout - {'loss': 0.7297, 'grad_norm': 1.1790149211883545, 'learning_rate': 1.4651459658708336e-05, 'epoch': 1.1} +2025-02-05 18:44:28 - ERROR - stderr - 37%|███▋ | 8203/22434 [8:36:47<10:11:03, 2.58s/it] +2025-02-05 18:44:30 - ERROR - stderr - 37%|███▋ | 8204/22434 [8:36:50<10:07:22, 2.56s/it] +2025-02-05 18:44:30 - ERROR - stderr - +2025-02-05 18:44:30 - ERROR - stderr - +2025-02-05 18:44:30 - INFO - stdout - {'loss': 0.746, 'grad_norm': 1.1862670183181763, 'learning_rate': 1.4650181556639833e-05, 'epoch': 1.1} +2025-02-05 18:44:30 - ERROR - stderr - 37%|███▋ | 8204/22434 [8:36:50<10:07:22, 2.56s/it] +2025-02-05 18:44:33 - ERROR - stderr - 37%|███▋ | 8205/22434 [8:36:53<10:21:19, 2.62s/it] +2025-02-05 18:44:33 - ERROR - stderr - +2025-02-05 18:44:33 - ERROR - stderr - +2025-02-05 18:44:33 - INFO - stdout - {'loss': 0.7171, 'grad_norm': 1.123826503753662, 'learning_rate': 1.4648903357642748e-05, 'epoch': 1.1} +2025-02-05 18:44:33 - ERROR - 
stderr - 37%|███▋ | 8205/22434 [8:36:53<10:21:19, 2.62s/it] +2025-02-05 18:44:35 - ERROR - stderr - 37%|███▋ | 8206/22434 [8:36:55<10:13:53, 2.59s/it] +2025-02-05 18:44:36 - ERROR - stderr - +2025-02-05 18:44:36 - ERROR - stderr - +2025-02-05 18:44:36 - INFO - stdout - {'loss': 0.6997, 'grad_norm': 1.2132987976074219, 'learning_rate': 1.4647625061743713e-05, 'epoch': 1.1} +2025-02-05 18:44:36 - ERROR - stderr - 37%|███▋ | 8206/22434 [8:36:55<10:13:53, 2.59s/it] +2025-02-05 18:44:38 - ERROR - stderr - 37%|███▋ | 8207/22434 [8:36:58<10:04:57, 2.55s/it] +2025-02-05 18:44:38 - ERROR - stderr - +2025-02-05 18:44:38 - ERROR - stderr - +2025-02-05 18:44:38 - INFO - stdout - {'loss': 0.6823, 'grad_norm': 1.106748104095459, 'learning_rate': 1.4646346668969386e-05, 'epoch': 1.1} +2025-02-05 18:44:38 - ERROR - stderr - 37%|███▋ | 8207/22434 [8:36:58<10:04:57, 2.55s/it] +2025-02-05 18:44:40 - ERROR - stderr - 37%|███▋ | 8208/22434 [8:37:00<9:57:48, 2.52s/it] +2025-02-05 18:44:40 - ERROR - stderr - +2025-02-05 18:44:40 - ERROR - stderr - +2025-02-05 18:44:40 - INFO - stdout - {'loss': 0.7117, 'grad_norm': 1.1480863094329834, 'learning_rate': 1.4645068179346408e-05, 'epoch': 1.1} +2025-02-05 18:44:40 - ERROR - stderr - 37%|███▋ | 8208/22434 [8:37:00<9:57:48, 2.52s/it] +2025-02-05 18:44:43 - ERROR - stderr - 37%|███▋ | 8209/22434 [8:37:03<9:56:08, 2.51s/it] +2025-02-05 18:44:43 - ERROR - stderr - +2025-02-05 18:44:43 - ERROR - stderr - +2025-02-05 18:44:43 - INFO - stdout - {'loss': 0.773, 'grad_norm': 1.254892110824585, 'learning_rate': 1.4643789592901433e-05, 'epoch': 1.1} +2025-02-05 18:44:43 - ERROR - stderr - 37%|███▋ | 8209/22434 [8:37:03<9:56:08, 2.51s/it] +2025-02-05 18:44:45 - ERROR - stderr - 37%|███▋ | 8210/22434 [8:37:05<9:55:28, 2.51s/it] +2025-02-05 18:44:45 - ERROR - stderr - +2025-02-05 18:44:45 - ERROR - stderr - +2025-02-05 18:44:45 - INFO - stdout - {'loss': 0.7485, 'grad_norm': 1.1178590059280396, 'learning_rate': 1.4642510909661103e-05, 'epoch': 1.1} 
+2025-02-05 18:44:45 - ERROR - stderr - 37%|███▋ | 8210/22434 [8:37:05<9:55:28, 2.51s/it] +2025-02-05 18:44:48 - ERROR - stderr - 37%|███▋ | 8211/22434 [8:37:08<10:01:06, 2.54s/it] +2025-02-05 18:44:48 - ERROR - stderr - +2025-02-05 18:44:48 - ERROR - stderr - +2025-02-05 18:44:48 - INFO - stdout - {'loss': 0.8698, 'grad_norm': 1.2524044513702393, 'learning_rate': 1.4641232129652076e-05, 'epoch': 1.1} +2025-02-05 18:44:48 - ERROR - stderr - 37%|███▋ | 8211/22434 [8:37:08<10:01:06, 2.54s/it] +2025-02-05 18:44:51 - ERROR - stderr - 37%|███▋ | 8212/22434 [8:37:10<10:02:23, 2.54s/it] +2025-02-05 18:44:51 - ERROR - stderr - +2025-02-05 18:44:51 - ERROR - stderr - +2025-02-05 18:44:51 - INFO - stdout - {'loss': 0.7251, 'grad_norm': 1.1347885131835938, 'learning_rate': 1.4639953252901007e-05, 'epoch': 1.1} +2025-02-05 18:44:51 - ERROR - stderr - 37%|███▋ | 8212/22434 [8:37:10<10:02:23, 2.54s/it] +2025-02-05 18:44:53 - ERROR - stderr - 37%|███▋ | 8213/22434 [8:37:13<10:03:17, 2.55s/it] +2025-02-05 18:44:53 - ERROR - stderr - +2025-02-05 18:44:53 - ERROR - stderr - +2025-02-05 18:44:53 - INFO - stdout - {'loss': 0.6958, 'grad_norm': 1.1306626796722412, 'learning_rate': 1.4638674279434553e-05, 'epoch': 1.1} +2025-02-05 18:44:53 - ERROR - stderr - 37%|███▋ | 8213/22434 [8:37:13<10:03:17, 2.55s/it] +2025-02-05 18:44:56 - ERROR - stderr - 37%|███▋ | 8214/22434 [8:37:15<9:58:04, 2.52s/it] +2025-02-05 18:44:56 - ERROR - stderr - +2025-02-05 18:44:56 - ERROR - stderr - +2025-02-05 18:44:56 - INFO - stdout - {'loss': 0.8246, 'grad_norm': 1.3701748847961426, 'learning_rate': 1.463739520927937e-05, 'epoch': 1.1} +2025-02-05 18:44:56 - ERROR - stderr - 37%|███▋ | 8214/22434 [8:37:15<9:58:04, 2.52s/it] +2025-02-05 18:44:58 - ERROR - stderr - 37%|███▋ | 8215/22434 [8:37:18<10:00:32, 2.53s/it] +2025-02-05 18:44:58 - ERROR - stderr - +2025-02-05 18:44:58 - ERROR - stderr - +2025-02-05 18:44:58 - INFO - stdout - {'loss': 0.6576, 'grad_norm': 1.06728994846344, 'learning_rate': 
1.4636116042462123e-05, 'epoch': 1.1} +2025-02-05 18:44:58 - ERROR - stderr - 37%|███▋ | 8215/22434 [8:37:18<10:00:32, 2.53s/it] +2025-02-05 18:45:01 - ERROR - stderr - 37%|███▋ | 8216/22434 [8:37:20<9:54:40, 2.51s/it] +2025-02-05 18:45:01 - ERROR - stderr - +2025-02-05 18:45:01 - ERROR - stderr - +2025-02-05 18:45:01 - INFO - stdout - {'loss': 0.6693, 'grad_norm': 1.1241544485092163, 'learning_rate': 1.4634836779009474e-05, 'epoch': 1.1} +2025-02-05 18:45:01 - ERROR - stderr - 37%|███▋ | 8216/22434 [8:37:20<9:54:40, 2.51s/it] +2025-02-05 18:45:03 - ERROR - stderr - 37%|███▋ | 8217/22434 [8:37:23<9:56:16, 2.52s/it] +2025-02-05 18:45:03 - ERROR - stderr - +2025-02-05 18:45:03 - ERROR - stderr - +2025-02-05 18:45:03 - INFO - stdout - {'loss': 0.6872, 'grad_norm': 1.1395597457885742, 'learning_rate': 1.4633557418948089e-05, 'epoch': 1.1} +2025-02-05 18:45:03 - ERROR - stderr - 37%|███▋ | 8217/22434 [8:37:23<9:56:16, 2.52s/it] +2025-02-05 18:45:06 - ERROR - stderr - 37%|███▋ | 8218/22434 [8:37:25<9:50:53, 2.49s/it] +2025-02-05 18:45:06 - ERROR - stderr - +2025-02-05 18:45:06 - ERROR - stderr - +2025-02-05 18:45:06 - INFO - stdout - {'loss': 0.6689, 'grad_norm': 1.0658247470855713, 'learning_rate': 1.4632277962304629e-05, 'epoch': 1.1} +2025-02-05 18:45:06 - ERROR - stderr - 37%|███▋ | 8218/22434 [8:37:25<9:50:53, 2.49s/it] +2025-02-05 18:45:08 - ERROR - stderr - 37%|███▋ | 8219/22434 [8:37:28<9:47:50, 2.48s/it] +2025-02-05 18:45:08 - ERROR - stderr - +2025-02-05 18:45:08 - ERROR - stderr - +2025-02-05 18:45:08 - INFO - stdout - {'loss': 0.6485, 'grad_norm': 1.2273157835006714, 'learning_rate': 1.4630998409105767e-05, 'epoch': 1.1} +2025-02-05 18:45:08 - ERROR - stderr - 37%|███▋ | 8219/22434 [8:37:28<9:47:50, 2.48s/it] +2025-02-05 18:45:10 - ERROR - stderr - 37%|███▋ | 8220/22434 [8:37:30<9:46:55, 2.48s/it] +2025-02-05 18:45:11 - ERROR - stderr - +2025-02-05 18:45:11 - ERROR - stderr - +2025-02-05 18:45:11 - INFO - stdout - {'loss': 0.7606, 'grad_norm': 
1.1881983280181885, 'learning_rate': 1.4629718759378177e-05, 'epoch': 1.1} +2025-02-05 18:45:11 - ERROR - stderr - 37%|███▋ | 8220/22434 [8:37:30<9:46:55, 2.48s/it] +2025-02-05 18:45:13 - ERROR - stderr - 37%|███▋ | 8221/22434 [8:37:33<9:45:49, 2.47s/it] +2025-02-05 18:45:13 - ERROR - stderr - +2025-02-05 18:45:13 - ERROR - stderr - +2025-02-05 18:45:13 - INFO - stdout - {'loss': 0.7571, 'grad_norm': 1.2353265285491943, 'learning_rate': 1.4628439013148532e-05, 'epoch': 1.1} +2025-02-05 18:45:13 - ERROR - stderr - 37%|███▋ | 8221/22434 [8:37:33<9:45:49, 2.47s/it] +2025-02-05 18:45:15 - ERROR - stderr - 37%|███▋ | 8222/22434 [8:37:35<9:48:23, 2.48s/it] +2025-02-05 18:45:15 - ERROR - stderr - +2025-02-05 18:45:16 - ERROR - stderr - +2025-02-05 18:45:16 - INFO - stdout - {'loss': 0.6894, 'grad_norm': 1.1384950876235962, 'learning_rate': 1.4627159170443504e-05, 'epoch': 1.1} +2025-02-05 18:45:16 - ERROR - stderr - 37%|███▋ | 8222/22434 [8:37:35<9:48:23, 2.48s/it] +2025-02-05 18:45:18 - ERROR - stderr - 37%|███▋ | 8223/22434 [8:37:38<9:53:30, 2.51s/it] +2025-02-05 18:45:18 - ERROR - stderr - +2025-02-05 18:45:18 - ERROR - stderr - +2025-02-05 18:45:18 - INFO - stdout - {'loss': 0.7109, 'grad_norm': 1.159988284111023, 'learning_rate': 1.4625879231289767e-05, 'epoch': 1.1} +2025-02-05 18:45:18 - ERROR - stderr - 37%|███▋ | 8223/22434 [8:37:38<9:53:30, 2.51s/it] +2025-02-05 18:45:21 - ERROR - stderr - 37%|███▋ | 8224/22434 [8:37:40<9:55:07, 2.51s/it] +2025-02-05 18:45:21 - ERROR - stderr - +2025-02-05 18:45:21 - ERROR - stderr - +2025-02-05 18:45:21 - INFO - stdout - {'loss': 0.7693, 'grad_norm': 1.1748900413513184, 'learning_rate': 1.4624599195714006e-05, 'epoch': 1.1} +2025-02-05 18:45:21 - ERROR - stderr - 37%|███▋ | 8224/22434 [8:37:40<9:55:07, 2.51s/it] +2025-02-05 18:45:23 - ERROR - stderr - 37%|███▋ | 8225/22434 [8:37:43<9:49:29, 2.49s/it] +2025-02-05 18:45:23 - ERROR - stderr - +2025-02-05 18:45:23 - ERROR - stderr - +2025-02-05 18:45:23 - INFO - stdout - {'loss': 
0.7272, 'grad_norm': 1.2397748231887817, 'learning_rate': 1.4623319063742902e-05, 'epoch': 1.1} +2025-02-05 18:45:23 - ERROR - stderr - 37%|███▋ | 8225/22434 [8:37:43<9:49:29, 2.49s/it] +2025-02-05 18:45:26 - ERROR - stderr - 37%|███▋ | 8226/22434 [8:37:45<9:52:34, 2.50s/it] +2025-02-05 18:45:26 - ERROR - stderr - +2025-02-05 18:45:26 - ERROR - stderr - +2025-02-05 18:45:26 - INFO - stdout - {'loss': 0.7664, 'grad_norm': 1.3209439516067505, 'learning_rate': 1.4622038835403135e-05, 'epoch': 1.1} +2025-02-05 18:45:26 - ERROR - stderr - 37%|███▋ | 8226/22434 [8:37:45<9:52:34, 2.50s/it] +2025-02-05 18:45:28 - ERROR - stderr - 37%|███▋ | 8227/22434 [8:37:48<9:51:07, 2.50s/it] +2025-02-05 18:45:28 - ERROR - stderr - +2025-02-05 18:45:28 - ERROR - stderr - +2025-02-05 18:45:28 - INFO - stdout - {'loss': 0.7401, 'grad_norm': 1.043557047843933, 'learning_rate': 1.462075851072139e-05, 'epoch': 1.1} +2025-02-05 18:45:28 - ERROR - stderr - 37%|███▋ | 8227/22434 [8:37:48<9:51:07, 2.50s/it] +2025-02-05 18:45:31 - ERROR - stderr - 37%|███▋ | 8228/22434 [8:37:51<10:20:53, 2.62s/it] +2025-02-05 18:45:31 - ERROR - stderr - +2025-02-05 18:45:31 - ERROR - stderr - +2025-02-05 18:45:31 - INFO - stdout - {'loss': 0.674, 'grad_norm': 1.1530739068984985, 'learning_rate': 1.4619478089724355e-05, 'epoch': 1.1} +2025-02-05 18:45:31 - ERROR - stderr - 37%|███▋ | 8228/22434 [8:37:51<10:20:53, 2.62s/it] +2025-02-05 18:45:33 - ERROR - stderr - 37%|███▋ | 8229/22434 [8:37:53<10:10:11, 2.58s/it] +2025-02-05 18:45:33 - ERROR - stderr - +2025-02-05 18:45:33 - ERROR - stderr - +2025-02-05 18:45:33 - INFO - stdout - {'loss': 0.6545, 'grad_norm': 1.008626937866211, 'learning_rate': 1.4618197572438722e-05, 'epoch': 1.1} +2025-02-05 18:45:33 - ERROR - stderr - 37%|███▋ | 8229/22434 [8:37:53<10:10:11, 2.58s/it] +2025-02-05 18:45:36 - ERROR - stderr - 37%|███▋ | 8230/22434 [8:37:56<9:59:57, 2.53s/it] +2025-02-05 18:45:36 - ERROR - stderr - +2025-02-05 18:45:36 - ERROR - stderr - +2025-02-05 18:45:36 - INFO 
- stdout - {'loss': 0.7613, 'grad_norm': 1.319429874420166, 'learning_rate': 1.4616916958891179e-05, 'epoch': 1.1} +2025-02-05 18:45:36 - ERROR - stderr - 37%|███▋ | 8230/22434 [8:37:56<9:59:57, 2.53s/it] +2025-02-05 18:45:38 - ERROR - stderr - 37%|███▋ | 8231/22434 [8:37:58<10:09:39, 2.58s/it] +2025-02-05 18:45:39 - ERROR - stderr - +2025-02-05 18:45:39 - ERROR - stderr - +2025-02-05 18:45:39 - INFO - stdout - {'loss': 0.7826, 'grad_norm': 1.1527820825576782, 'learning_rate': 1.4615636249108418e-05, 'epoch': 1.1} +2025-02-05 18:45:39 - ERROR - stderr - 37%|███▋ | 8231/22434 [8:37:58<10:09:39, 2.58s/it] +2025-02-05 18:45:41 - ERROR - stderr - 37%|███▋ | 8232/22434 [8:38:01<10:04:39, 2.55s/it] +2025-02-05 18:45:41 - ERROR - stderr - +2025-02-05 18:45:41 - ERROR - stderr - +2025-02-05 18:45:41 - INFO - stdout - {'loss': 0.6993, 'grad_norm': 1.0154234170913696, 'learning_rate': 1.4614355443117137e-05, 'epoch': 1.1} +2025-02-05 18:45:41 - ERROR - stderr - 37%|███▋ | 8232/22434 [8:38:01<10:04:39, 2.55s/it] +2025-02-05 18:45:43 - ERROR - stderr - 37%|███▋ | 8233/22434 [8:38:03<10:01:22, 2.54s/it] +2025-02-05 18:45:44 - ERROR - stderr - +2025-02-05 18:45:44 - ERROR - stderr - +2025-02-05 18:45:44 - INFO - stdout - {'loss': 0.7573, 'grad_norm': 1.0166356563568115, 'learning_rate': 1.4613074540944032e-05, 'epoch': 1.1} +2025-02-05 18:45:44 - ERROR - stderr - 37%|███▋ | 8233/22434 [8:38:03<10:01:22, 2.54s/it] +2025-02-05 18:45:46 - ERROR - stderr - 37%|███▋ | 8234/22434 [8:38:06<9:58:08, 2.53s/it] +2025-02-05 18:45:46 - ERROR - stderr - +2025-02-05 18:45:46 - ERROR - stderr - +2025-02-05 18:45:46 - INFO - stdout - {'loss': 0.6703, 'grad_norm': 1.1730951070785522, 'learning_rate': 1.4611793542615805e-05, 'epoch': 1.1} +2025-02-05 18:45:46 - ERROR - stderr - 37%|███▋ | 8234/22434 [8:38:06<9:58:08, 2.53s/it] +2025-02-05 18:45:48 - ERROR - stderr - 37%|███▋ | 8235/22434 [8:38:08<9:56:28, 2.52s/it] +2025-02-05 18:45:49 - ERROR - stderr - +2025-02-05 18:45:49 - ERROR - stderr - 
+2025-02-05 18:45:49 - INFO - stdout - {'loss': 0.6953, 'grad_norm': 1.1418660879135132, 'learning_rate': 1.461051244815915e-05, 'epoch': 1.1} +2025-02-05 18:45:49 - ERROR - stderr - 37%|███▋ | 8235/22434 [8:38:08<9:56:28, 2.52s/it] +2025-02-05 18:45:51 - ERROR - stderr - 37%|███▋ | 8236/22434 [8:38:11<9:56:57, 2.52s/it] +2025-02-05 18:45:51 - ERROR - stderr - +2025-02-05 18:45:51 - ERROR - stderr - +2025-02-05 18:45:51 - INFO - stdout - {'loss': 0.6838, 'grad_norm': 1.1512385606765747, 'learning_rate': 1.4609231257600778e-05, 'epoch': 1.1} +2025-02-05 18:45:51 - ERROR - stderr - 37%|███▋ | 8236/22434 [8:38:11<9:56:57, 2.52s/it] +2025-02-05 18:45:54 - ERROR - stderr - 37%|███▋ | 8237/22434 [8:38:13<9:55:11, 2.52s/it] +2025-02-05 18:45:54 - ERROR - stderr - +2025-02-05 18:45:54 - ERROR - stderr - +2025-02-05 18:45:54 - INFO - stdout - {'loss': 0.7357, 'grad_norm': 1.1949220895767212, 'learning_rate': 1.4607949970967391e-05, 'epoch': 1.1} +2025-02-05 18:45:54 - ERROR - stderr - 37%|███▋ | 8237/22434 [8:38:13<9:55:11, 2.52s/it] +2025-02-05 18:45:56 - ERROR - stderr - 37%|███▋ | 8238/22434 [8:38:16<9:58:48, 2.53s/it] +2025-02-05 18:45:56 - ERROR - stderr - +2025-02-05 18:45:56 - ERROR - stderr - +2025-02-05 18:45:56 - INFO - stdout - {'loss': 0.8132, 'grad_norm': 1.188032627105713, 'learning_rate': 1.4606668588285694e-05, 'epoch': 1.1} +2025-02-05 18:45:56 - ERROR - stderr - 37%|███▋ | 8238/22434 [8:38:16<9:58:48, 2.53s/it] +2025-02-05 18:45:59 - ERROR - stderr - 37%|███▋ | 8239/22434 [8:38:18<10:02:21, 2.55s/it] +2025-02-05 18:45:59 - ERROR - stderr - +2025-02-05 18:45:59 - ERROR - stderr - +2025-02-05 18:45:59 - INFO - stdout - {'loss': 0.6925, 'grad_norm': 1.0157471895217896, 'learning_rate': 1.4605387109582401e-05, 'epoch': 1.1} +2025-02-05 18:45:59 - ERROR - stderr - 37%|███▋ | 8239/22434 [8:38:18<10:02:21, 2.55s/it] +2025-02-05 18:46:01 - ERROR - stderr - 37%|███▋ | 8240/22434 [8:38:21<10:06:14, 2.56s/it] +2025-02-05 18:46:01 - ERROR - stderr - +2025-02-05 
18:46:01 - ERROR - stderr - +2025-02-05 18:46:01 - INFO - stdout - {'loss': 0.6577, 'grad_norm': 1.0028070211410522, 'learning_rate': 1.4604105534884218e-05, 'epoch': 1.1} +2025-02-05 18:46:01 - ERROR - stderr - 37%|███▋ | 8240/22434 [8:38:21<10:06:14, 2.56s/it] +2025-02-05 18:46:04 - ERROR - stderr - 37%|███▋ | 8241/22434 [8:38:24<10:05:51, 2.56s/it] +2025-02-05 18:46:04 - ERROR - stderr - +2025-02-05 18:46:04 - ERROR - stderr - +2025-02-05 18:46:04 - INFO - stdout - {'loss': 0.7842, 'grad_norm': 1.1497137546539307, 'learning_rate': 1.4602823864217863e-05, 'epoch': 1.1} +2025-02-05 18:46:04 - ERROR - stderr - 37%|███▋ | 8241/22434 [8:38:24<10:05:51, 2.56s/it] +2025-02-05 18:46:06 - ERROR - stderr - 37%|███▋ | 8242/22434 [8:38:26<10:02:06, 2.55s/it] +2025-02-05 18:46:06 - ERROR - stderr - +2025-02-05 18:46:06 - ERROR - stderr - +2025-02-05 18:46:06 - INFO - stdout - {'loss': 0.7361, 'grad_norm': 1.1665130853652954, 'learning_rate': 1.4601542097610051e-05, 'epoch': 1.1} +2025-02-05 18:46:06 - ERROR - stderr - 37%|███▋ | 8242/22434 [8:38:26<10:02:06, 2.55s/it] +2025-02-05 18:46:09 - ERROR - stderr - 37%|███▋ | 8243/22434 [8:38:29<10:13:03, 2.59s/it] +2025-02-05 18:46:09 - ERROR - stderr - +2025-02-05 18:46:09 - ERROR - stderr - +2025-02-05 18:46:09 - INFO - stdout - {'loss': 0.7859, 'grad_norm': 1.2151650190353394, 'learning_rate': 1.4600260235087493e-05, 'epoch': 1.1} +2025-02-05 18:46:09 - ERROR - stderr - 37%|███▋ | 8243/22434 [8:38:29<10:13:03, 2.59s/it] +2025-02-05 18:46:12 - ERROR - stderr - 37%|███▋ | 8244/22434 [8:38:32<10:41:21, 2.71s/it] +2025-02-05 18:46:12 - ERROR - stderr - +2025-02-05 18:46:12 - ERROR - stderr - +2025-02-05 18:46:12 - INFO - stdout - {'loss': 0.7165, 'grad_norm': 1.1079157590866089, 'learning_rate': 1.4598978276676916e-05, 'epoch': 1.1} +2025-02-05 18:46:12 - ERROR - stderr - 37%|███▋ | 8244/22434 [8:38:32<10:41:21, 2.71s/it] +2025-02-05 18:46:15 - ERROR - stderr - 37%|███▋ | 8245/22434 [8:38:34<10:27:20, 2.65s/it] +2025-02-05 18:46:15 
- ERROR - stderr - +2025-02-05 18:46:15 - ERROR - stderr - +2025-02-05 18:46:15 - INFO - stdout - {'loss': 0.7482, 'grad_norm': 1.131780743598938, 'learning_rate': 1.4597696222405033e-05, 'epoch': 1.1} +2025-02-05 18:46:15 - ERROR - stderr - 37%|███▋ | 8245/22434 [8:38:34<10:27:20, 2.65s/it] +2025-02-05 18:46:17 - ERROR - stderr - 37%|███▋ | 8246/22434 [8:38:37<10:26:53, 2.65s/it] +2025-02-05 18:46:17 - ERROR - stderr - +2025-02-05 18:46:17 - ERROR - stderr - +2025-02-05 18:46:17 - INFO - stdout - {'loss': 0.6226, 'grad_norm': 0.9700050354003906, 'learning_rate': 1.4596414072298575e-05, 'epoch': 1.1} +2025-02-05 18:46:17 - ERROR - stderr - 37%|███▋ | 8246/22434 [8:38:37<10:26:53, 2.65s/it] +2025-02-05 18:46:20 - ERROR - stderr - 37%|███▋ | 8247/22434 [8:38:39<10:11:56, 2.59s/it] +2025-02-05 18:46:20 - ERROR - stderr - +2025-02-05 18:46:20 - ERROR - stderr - +2025-02-05 18:46:20 - INFO - stdout - {'loss': 0.7452, 'grad_norm': 1.5398085117340088, 'learning_rate': 1.4595131826384263e-05, 'epoch': 1.1} +2025-02-05 18:46:20 - ERROR - stderr - 37%|███▋ | 8247/22434 [8:38:39<10:11:56, 2.59s/it] +2025-02-05 18:46:22 - ERROR - stderr - 37%|███▋ | 8248/22434 [8:38:42<10:20:51, 2.63s/it] +2025-02-05 18:46:22 - ERROR - stderr - +2025-02-05 18:46:22 - ERROR - stderr - +2025-02-05 18:46:22 - INFO - stdout - {'loss': 0.7169, 'grad_norm': 1.0944463014602661, 'learning_rate': 1.4593849484688827e-05, 'epoch': 1.1} +2025-02-05 18:46:22 - ERROR - stderr - 37%|███▋ | 8248/22434 [8:38:42<10:20:51, 2.63s/it] +2025-02-05 18:46:25 - ERROR - stderr - 37%|███▋ | 8249/22434 [8:38:45<10:08:54, 2.58s/it] +2025-02-05 18:46:25 - ERROR - stderr - +2025-02-05 18:46:25 - ERROR - stderr - +2025-02-05 18:46:25 - INFO - stdout - {'loss': 0.7613, 'grad_norm': 1.1606014966964722, 'learning_rate': 1.459256704723899e-05, 'epoch': 1.1} +2025-02-05 18:46:25 - ERROR - stderr - 37%|███▋ | 8249/22434 [8:38:45<10:08:54, 2.58s/it] +2025-02-05 18:46:27 - ERROR - stderr - 37%|███▋ | 8250/22434 [8:38:47<10:03:54, 
2.55s/it] +2025-02-05 18:46:27 - ERROR - stderr - +2025-02-05 18:46:27 - ERROR - stderr - +2025-02-05 18:46:27 - INFO - stdout - {'loss': 0.7593, 'grad_norm': 1.2230435609817505, 'learning_rate': 1.4591284514061492e-05, 'epoch': 1.1} +2025-02-05 18:46:27 - ERROR - stderr - 37%|███▋ | 8250/22434 [8:38:47<10:03:54, 2.55s/it] +2025-02-05 18:46:30 - ERROR - stderr - 37%|███▋ | 8251/22434 [8:38:50<9:59:31, 2.54s/it] +2025-02-05 18:46:30 - ERROR - stderr - +2025-02-05 18:46:30 - ERROR - stderr - +2025-02-05 18:46:30 - INFO - stdout - {'loss': 0.6618, 'grad_norm': 1.139914631843567, 'learning_rate': 1.4590001885183059e-05, 'epoch': 1.1} +2025-02-05 18:46:30 - ERROR - stderr - 37%|███▋ | 8251/22434 [8:38:50<9:59:31, 2.54s/it] +2025-02-05 18:46:32 - ERROR - stderr - 37%|███▋ | 8252/22434 [8:38:52<9:55:56, 2.52s/it] +2025-02-05 18:46:32 - ERROR - stderr - +2025-02-05 18:46:32 - ERROR - stderr - +2025-02-05 18:46:32 - INFO - stdout - {'loss': 0.7045, 'grad_norm': 1.1277605295181274, 'learning_rate': 1.4588719160630429e-05, 'epoch': 1.1} +2025-02-05 18:46:32 - ERROR - stderr - 37%|███▋ | 8252/22434 [8:38:52<9:55:56, 2.52s/it] +2025-02-05 18:46:35 - ERROR - stderr - 37%|███▋ | 8253/22434 [8:38:55<9:53:30, 2.51s/it] +2025-02-05 18:46:35 - ERROR - stderr - +2025-02-05 18:46:35 - ERROR - stderr - +2025-02-05 18:46:35 - INFO - stdout - {'loss': 0.7168, 'grad_norm': 1.1307988166809082, 'learning_rate': 1.4587436340430338e-05, 'epoch': 1.1} +2025-02-05 18:46:35 - ERROR - stderr - 37%|███▋ | 8253/22434 [8:38:55<9:53:30, 2.51s/it] +2025-02-05 18:46:37 - ERROR - stderr - 37%|███▋ | 8254/22434 [8:38:57<9:55:07, 2.52s/it] +2025-02-05 18:46:37 - ERROR - stderr - +2025-02-05 18:46:37 - ERROR - stderr - +2025-02-05 18:46:37 - INFO - stdout - {'loss': 0.8023, 'grad_norm': 1.2383873462677002, 'learning_rate': 1.458615342460953e-05, 'epoch': 1.1} +2025-02-05 18:46:37 - ERROR - stderr - 37%|███▋ | 8254/22434 [8:38:57<9:55:07, 2.52s/it] +2025-02-05 18:46:40 - ERROR - stderr - 37%|███▋ | 
8255/22434 [8:39:00<9:57:15, 2.53s/it] +2025-02-05 18:46:40 - ERROR - stderr - +2025-02-05 18:46:40 - ERROR - stderr - +2025-02-05 18:46:40 - INFO - stdout - {'loss': 0.6998, 'grad_norm': 1.1057050228118896, 'learning_rate': 1.458487041319474e-05, 'epoch': 1.1} +2025-02-05 18:46:40 - ERROR - stderr - 37%|███▋ | 8255/22434 [8:39:00<9:57:15, 2.53s/it] +2025-02-05 18:46:42 - ERROR - stderr - 37%|███▋ | 8256/22434 [8:39:02<9:53:20, 2.51s/it] +2025-02-05 18:46:42 - ERROR - stderr - +2025-02-05 18:46:42 - ERROR - stderr - +2025-02-05 18:46:42 - INFO - stdout - {'loss': 0.7997, 'grad_norm': 1.3270267248153687, 'learning_rate': 1.4583587306212714e-05, 'epoch': 1.1} +2025-02-05 18:46:42 - ERROR - stderr - 37%|███▋ | 8256/22434 [8:39:02<9:53:20, 2.51s/it] +2025-02-05 18:46:45 - ERROR - stderr - 37%|███▋ | 8257/22434 [8:39:05<9:55:19, 2.52s/it] +2025-02-05 18:46:45 - ERROR - stderr - +2025-02-05 18:46:45 - ERROR - stderr - +2025-02-05 18:46:45 - INFO - stdout - {'loss': 0.7124, 'grad_norm': 1.100528359413147, 'learning_rate': 1.4582304103690197e-05, 'epoch': 1.1} +2025-02-05 18:46:45 - ERROR - stderr - 37%|███▋ | 8257/22434 [8:39:05<9:55:19, 2.52s/it] +2025-02-05 18:46:47 - ERROR - stderr - 37%|███▋ | 8258/22434 [8:39:07<9:52:58, 2.51s/it] +2025-02-05 18:46:47 - ERROR - stderr - +2025-02-05 18:46:47 - ERROR - stderr - +2025-02-05 18:46:47 - INFO - stdout - {'loss': 0.737, 'grad_norm': 1.4334498643875122, 'learning_rate': 1.4581020805653934e-05, 'epoch': 1.1} +2025-02-05 18:46:47 - ERROR - stderr - 37%|███▋ | 8258/22434 [8:39:07<9:52:58, 2.51s/it] +2025-02-05 18:46:50 - ERROR - stderr - 37%|███▋ | 8259/22434 [8:39:10<9:47:36, 2.49s/it] +2025-02-05 18:46:50 - ERROR - stderr - +2025-02-05 18:46:50 - ERROR - stderr - +2025-02-05 18:46:50 - INFO - stdout - {'loss': 0.7779, 'grad_norm': 1.4105724096298218, 'learning_rate': 1.4579737412130679e-05, 'epoch': 1.1} +2025-02-05 18:46:50 - ERROR - stderr - 37%|███▋ | 8259/22434 [8:39:10<9:47:36, 2.49s/it] +2025-02-05 18:46:52 - ERROR - 
stderr - 37%|███▋ | 8260/22434 [8:39:12<9:43:14, 2.47s/it] +2025-02-05 18:46:52 - ERROR - stderr - +2025-02-05 18:46:52 - ERROR - stderr - +2025-02-05 18:46:52 - INFO - stdout - {'loss': 0.8912, 'grad_norm': 1.3646095991134644, 'learning_rate': 1.4578453923147176e-05, 'epoch': 1.1} +2025-02-05 18:46:52 - ERROR - stderr - 37%|███▋ | 8260/22434 [8:39:12<9:43:14, 2.47s/it] +2025-02-05 18:46:55 - ERROR - stderr - 37%|███▋ | 8261/22434 [8:39:14<9:40:55, 2.46s/it] +2025-02-05 18:46:55 - ERROR - stderr - +2025-02-05 18:46:55 - ERROR - stderr - +2025-02-05 18:46:55 - INFO - stdout - {'loss': 0.7781, 'grad_norm': 1.2176775932312012, 'learning_rate': 1.4577170338730184e-05, 'epoch': 1.1} +2025-02-05 18:46:55 - ERROR - stderr - 37%|███▋ | 8261/22434 [8:39:14<9:40:55, 2.46s/it] +2025-02-05 18:46:57 - ERROR - stderr - 37%|███▋ | 8262/22434 [8:39:17<9:42:27, 2.47s/it] +2025-02-05 18:46:57 - ERROR - stderr - +2025-02-05 18:46:57 - ERROR - stderr - +2025-02-05 18:46:57 - INFO - stdout - {'loss': 0.7208, 'grad_norm': 1.172537088394165, 'learning_rate': 1.4575886658906458e-05, 'epoch': 1.1} +2025-02-05 18:46:57 - ERROR - stderr - 37%|███▋ | 8262/22434 [8:39:17<9:42:27, 2.47s/it] +2025-02-05 18:47:00 - ERROR - stderr - 37%|███▋ | 8263/22434 [8:39:20<9:51:24, 2.50s/it] +2025-02-05 18:47:00 - ERROR - stderr - +2025-02-05 18:47:00 - ERROR - stderr - +2025-02-05 18:47:00 - INFO - stdout - {'loss': 0.6205, 'grad_norm': 1.0549992322921753, 'learning_rate': 1.4574602883702752e-05, 'epoch': 1.1} +2025-02-05 18:47:00 - ERROR - stderr - 37%|███▋ | 8263/22434 [8:39:20<9:51:24, 2.50s/it] +2025-02-05 18:47:02 - ERROR - stderr - 37%|███▋ | 8264/22434 [8:39:22<9:52:49, 2.51s/it] +2025-02-05 18:47:02 - ERROR - stderr - +2025-02-05 18:47:02 - ERROR - stderr - +2025-02-05 18:47:02 - INFO - stdout - {'loss': 0.6961, 'grad_norm': 1.1880916357040405, 'learning_rate': 1.4573319013145823e-05, 'epoch': 1.11} +2025-02-05 18:47:02 - ERROR - stderr - 37%|███▋ | 8264/22434 [8:39:22<9:52:49, 2.51s/it] 
+2025-02-05 18:47:05 - ERROR - stderr - 37%|███▋ | 8265/22434 [8:39:25<9:56:55, 2.53s/it] +2025-02-05 18:47:05 - ERROR - stderr - +2025-02-05 18:47:05 - ERROR - stderr - +2025-02-05 18:47:05 - INFO - stdout - {'loss': 0.7052, 'grad_norm': 1.0007102489471436, 'learning_rate': 1.4572035047262439e-05, 'epoch': 1.11} +2025-02-05 18:47:05 - ERROR - stderr - 37%|███▋ | 8265/22434 [8:39:25<9:56:55, 2.53s/it] +2025-02-05 18:47:07 - ERROR - stderr - 37%|███▋ | 8266/22434 [8:39:27<9:56:24, 2.53s/it] +2025-02-05 18:47:07 - ERROR - stderr - +2025-02-05 18:47:07 - ERROR - stderr - +2025-02-05 18:47:07 - INFO - stdout - {'loss': 0.7653, 'grad_norm': 1.376042366027832, 'learning_rate': 1.4570750986079358e-05, 'epoch': 1.11} +2025-02-05 18:47:07 - ERROR - stderr - 37%|███▋ | 8266/22434 [8:39:27<9:56:24, 2.53s/it] +2025-02-05 18:47:10 - ERROR - stderr - 37%|███▋ | 8267/22434 [8:39:30<9:57:02, 2.53s/it] +2025-02-05 18:47:10 - ERROR - stderr - +2025-02-05 18:47:10 - ERROR - stderr - +2025-02-05 18:47:10 - INFO - stdout - {'loss': 0.7688, 'grad_norm': 1.09882652759552, 'learning_rate': 1.456946682962335e-05, 'epoch': 1.11} +2025-02-05 18:47:10 - ERROR - stderr - 37%|███▋ | 8267/22434 [8:39:30<9:57:02, 2.53s/it] +2025-02-05 18:47:13 - ERROR - stderr - 37%|███▋ | 8268/22434 [8:39:32<10:08:07, 2.58s/it] +2025-02-05 18:47:13 - ERROR - stderr - +2025-02-05 18:47:13 - ERROR - stderr - +2025-02-05 18:47:13 - INFO - stdout - {'loss': 0.7083, 'grad_norm': 1.2237251996994019, 'learning_rate': 1.4568182577921172e-05, 'epoch': 1.11} +2025-02-05 18:47:13 - ERROR - stderr - 37%|███▋ | 8268/22434 [8:39:32<10:08:07, 2.58s/it] +2025-02-05 18:47:15 - ERROR - stderr - 37%|███▋ | 8269/22434 [8:39:35<10:03:15, 2.56s/it] +2025-02-05 18:47:15 - ERROR - stderr - +2025-02-05 18:47:15 - ERROR - stderr - +2025-02-05 18:47:15 - INFO - stdout - {'loss': 0.7236, 'grad_norm': 1.0586533546447754, 'learning_rate': 1.4566898230999604e-05, 'epoch': 1.11} +2025-02-05 18:47:15 - ERROR - stderr - 37%|███▋ | 8269/22434 
[8:39:35<10:03:15, 2.56s/it] +2025-02-05 18:47:18 - ERROR - stderr - 37%|███▋ | 8270/22434 [8:39:37<10:02:52, 2.55s/it] +2025-02-05 18:47:18 - ERROR - stderr - +2025-02-05 18:47:18 - ERROR - stderr - +2025-02-05 18:47:18 - INFO - stdout - {'loss': 0.6302, 'grad_norm': 1.1571077108383179, 'learning_rate': 1.4565613788885412e-05, 'epoch': 1.11} +2025-02-05 18:47:18 - ERROR - stderr - 37%|███▋ | 8270/22434 [8:39:37<10:02:52, 2.55s/it] +2025-02-05 18:47:20 - ERROR - stderr - 37%|███▋ | 8271/22434 [8:39:40<9:58:03, 2.53s/it] +2025-02-05 18:47:20 - ERROR - stderr - +2025-02-05 18:47:20 - ERROR - stderr - +2025-02-05 18:47:20 - INFO - stdout - {'loss': 0.7421, 'grad_norm': 1.2033671140670776, 'learning_rate': 1.4564329251605367e-05, 'epoch': 1.11} +2025-02-05 18:47:20 - ERROR - stderr - 37%|███▋ | 8271/22434 [8:39:40<9:58:03, 2.53s/it] +2025-02-05 18:47:23 - ERROR - stderr - 37%|███▋ | 8272/22434 [8:39:42<9:56:08, 2.53s/it] +2025-02-05 18:47:23 - ERROR - stderr - +2025-02-05 18:47:23 - ERROR - stderr - +2025-02-05 18:47:23 - INFO - stdout - {'loss': 0.7263, 'grad_norm': 1.1124781370162964, 'learning_rate': 1.4563044619186248e-05, 'epoch': 1.11} +2025-02-05 18:47:23 - ERROR - stderr - 37%|███▋ | 8272/22434 [8:39:42<9:56:08, 2.53s/it] +2025-02-05 18:47:25 - ERROR - stderr - 37%|███▋ | 8273/22434 [8:39:45<10:12:14, 2.59s/it] +2025-02-05 18:47:25 - ERROR - stderr - +2025-02-05 18:47:25 - ERROR - stderr - +2025-02-05 18:47:25 - INFO - stdout - {'loss': 0.7322, 'grad_norm': 1.1049301624298096, 'learning_rate': 1.456175989165483e-05, 'epoch': 1.11} +2025-02-05 18:47:25 - ERROR - stderr - 37%|███▋ | 8273/22434 [8:39:45<10:12:14, 2.59s/it] +2025-02-05 18:47:28 - ERROR - stderr - 37%|███▋ | 8274/22434 [8:39:48<10:10:20, 2.59s/it] +2025-02-05 18:47:28 - ERROR - stderr - +2025-02-05 18:47:28 - ERROR - stderr - +2025-02-05 18:47:28 - INFO - stdout - {'loss': 0.7968, 'grad_norm': 1.2442961931228638, 'learning_rate': 1.4560475069037895e-05, 'epoch': 1.11} +2025-02-05 18:47:28 - ERROR - 
stderr - 37%|███▋ | 8274/22434 [8:39:48<10:10:20, 2.59s/it] +2025-02-05 18:47:30 - ERROR - stderr - 37%|███▋ | 8275/22434 [8:39:50<10:01:22, 2.55s/it] +2025-02-05 18:47:30 - ERROR - stderr - +2025-02-05 18:47:30 - ERROR - stderr - +2025-02-05 18:47:30 - INFO - stdout - {'loss': 0.7821, 'grad_norm': 1.1149489879608154, 'learning_rate': 1.455919015136222e-05, 'epoch': 1.11} +2025-02-05 18:47:30 - ERROR - stderr - 37%|███▋ | 8275/22434 [8:39:50<10:01:22, 2.55s/it] +2025-02-05 18:47:33 - ERROR - stderr - 37%|███▋ | 8276/22434 [8:39:53<10:00:06, 2.54s/it] +2025-02-05 18:47:33 - ERROR - stderr - +2025-02-05 18:47:33 - ERROR - stderr - +2025-02-05 18:47:33 - INFO - stdout - {'loss': 0.7163, 'grad_norm': 1.1606643199920654, 'learning_rate': 1.4557905138654586e-05, 'epoch': 1.11} +2025-02-05 18:47:33 - ERROR - stderr - 37%|███▋ | 8276/22434 [8:39:53<10:00:06, 2.54s/it] +2025-02-05 18:47:36 - ERROR - stderr - 37%|███▋ | 8277/22434 [8:39:55<10:08:40, 2.58s/it] +2025-02-05 18:47:36 - ERROR - stderr - +2025-02-05 18:47:36 - ERROR - stderr - +2025-02-05 18:47:36 - INFO - stdout - {'loss': 0.8299, 'grad_norm': 1.1662418842315674, 'learning_rate': 1.4556620030941782e-05, 'epoch': 1.11} +2025-02-05 18:47:36 - ERROR - stderr - 37%|███▋ | 8277/22434 [8:39:55<10:08:40, 2.58s/it] +2025-02-05 18:47:38 - ERROR - stderr - 37%|███▋ | 8278/22434 [8:39:58<10:01:53, 2.55s/it] +2025-02-05 18:47:38 - ERROR - stderr - +2025-02-05 18:47:38 - ERROR - stderr - +2025-02-05 18:47:38 - INFO - stdout - {'loss': 0.6927, 'grad_norm': 1.170881748199463, 'learning_rate': 1.4555334828250594e-05, 'epoch': 1.11} +2025-02-05 18:47:38 - ERROR - stderr - 37%|███▋ | 8278/22434 [8:39:58<10:01:53, 2.55s/it] +2025-02-05 18:47:41 - ERROR - stderr - 37%|███▋ | 8279/22434 [8:40:01<10:08:47, 2.58s/it] +2025-02-05 18:47:41 - ERROR - stderr - +2025-02-05 18:47:41 - ERROR - stderr - +2025-02-05 18:47:41 - INFO - stdout - {'loss': 0.6625, 'grad_norm': 1.1598966121673584, 'learning_rate': 1.455404953060781e-05, 'epoch': 
1.11} +2025-02-05 18:47:41 - ERROR - stderr - 37%|███▋ | 8279/22434 [8:40:01<10:08:47, 2.58s/it] +2025-02-05 18:47:43 - ERROR - stderr - 37%|███▋ | 8280/22434 [8:40:03<10:06:41, 2.57s/it] +2025-02-05 18:47:43 - ERROR - stderr - +2025-02-05 18:47:43 - ERROR - stderr - +2025-02-05 18:47:43 - INFO - stdout - {'loss': 0.7153, 'grad_norm': 1.1514116525650024, 'learning_rate': 1.4552764138040221e-05, 'epoch': 1.11} +2025-02-05 18:47:43 - ERROR - stderr - 37%|███▋ | 8280/22434 [8:40:03<10:06:41, 2.57s/it] +2025-02-05 18:47:46 - ERROR - stderr - 37%|███▋ | 8281/22434 [8:40:06<10:03:43, 2.56s/it] +2025-02-05 18:47:46 - ERROR - stderr - +2025-02-05 18:47:46 - ERROR - stderr - +2025-02-05 18:47:46 - INFO - stdout - {'loss': 0.7577, 'grad_norm': 1.0893383026123047, 'learning_rate': 1.455147865057462e-05, 'epoch': 1.11} +2025-02-05 18:47:46 - ERROR - stderr - 37%|███▋ | 8281/22434 [8:40:06<10:03:43, 2.56s/it] +2025-02-05 18:47:48 - ERROR - stderr - 37%|███▋ | 8282/22434 [8:40:08<10:02:24, 2.55s/it] +2025-02-05 18:47:48 - ERROR - stderr - +2025-02-05 18:47:48 - ERROR - stderr - +2025-02-05 18:47:48 - INFO - stdout - {'loss': 0.6728, 'grad_norm': 1.0285717248916626, 'learning_rate': 1.4550193068237805e-05, 'epoch': 1.11} +2025-02-05 18:47:48 - ERROR - stderr - 37%|███▋ | 8282/22434 [8:40:08<10:02:24, 2.55s/it] +2025-02-05 18:47:51 - ERROR - stderr - 37%|███▋ | 8283/22434 [8:40:11<10:00:48, 2.55s/it] +2025-02-05 18:47:51 - ERROR - stderr - +2025-02-05 18:47:51 - ERROR - stderr - +2025-02-05 18:47:51 - INFO - stdout - {'loss': 0.7945, 'grad_norm': 1.1683778762817383, 'learning_rate': 1.4548907391056567e-05, 'epoch': 1.11} +2025-02-05 18:47:51 - ERROR - stderr - 37%|███▋ | 8283/22434 [8:40:11<10:00:48, 2.55s/it] +2025-02-05 18:47:53 - ERROR - stderr - 37%|███▋ | 8284/22434 [8:40:13<9:51:18, 2.51s/it] +2025-02-05 18:47:53 - ERROR - stderr - +2025-02-05 18:47:53 - ERROR - stderr - +2025-02-05 18:47:53 - INFO - stdout - {'loss': 0.6435, 'grad_norm': 1.0316197872161865, 'learning_rate': 
1.4547621619057706e-05, 'epoch': 1.11} +2025-02-05 18:47:53 - ERROR - stderr - 37%|███▋ | 8284/22434 [8:40:13<9:51:18, 2.51s/it] +2025-02-05 18:47:56 - ERROR - stderr - 37%|███▋ | 8285/22434 [8:40:16<9:48:42, 2.50s/it] +2025-02-05 18:47:56 - ERROR - stderr - +2025-02-05 18:47:56 - ERROR - stderr - +2025-02-05 18:47:56 - INFO - stdout - {'loss': 0.7115, 'grad_norm': 1.1161266565322876, 'learning_rate': 1.4546335752268027e-05, 'epoch': 1.11} +2025-02-05 18:47:56 - ERROR - stderr - 37%|███▋ | 8285/22434 [8:40:16<9:48:42, 2.50s/it] +2025-02-05 18:47:58 - ERROR - stderr - 37%|███▋ | 8286/22434 [8:40:18<9:49:53, 2.50s/it] +2025-02-05 18:47:58 - ERROR - stderr - +2025-02-05 18:47:58 - ERROR - stderr - +2025-02-05 18:47:58 - INFO - stdout - {'loss': 0.6633, 'grad_norm': 1.156480073928833, 'learning_rate': 1.4545049790714328e-05, 'epoch': 1.11} +2025-02-05 18:47:58 - ERROR - stderr - 37%|███▋ | 8286/22434 [8:40:18<9:49:53, 2.50s/it] +2025-02-05 18:48:01 - ERROR - stderr - 37%|███▋ | 8287/22434 [8:40:21<9:51:06, 2.51s/it] +2025-02-05 18:48:01 - ERROR - stderr - +2025-02-05 18:48:01 - ERROR - stderr - +2025-02-05 18:48:01 - INFO - stdout - {'loss': 0.775, 'grad_norm': 1.200325846672058, 'learning_rate': 1.4543763734423415e-05, 'epoch': 1.11} +2025-02-05 18:48:01 - ERROR - stderr - 37%|███▋ | 8287/22434 [8:40:21<9:51:06, 2.51s/it] +2025-02-05 18:48:03 - ERROR - stderr - 37%|███▋ | 8288/22434 [8:40:23<9:50:47, 2.51s/it] +2025-02-05 18:48:03 - ERROR - stderr - +2025-02-05 18:48:03 - ERROR - stderr - +2025-02-05 18:48:03 - INFO - stdout - {'loss': 0.7337, 'grad_norm': 1.1742380857467651, 'learning_rate': 1.4542477583422095e-05, 'epoch': 1.11} +2025-02-05 18:48:03 - ERROR - stderr - 37%|███▋ | 8288/22434 [8:40:23<9:50:47, 2.51s/it] +2025-02-05 18:48:06 - ERROR - stderr - 37%|███▋ | 8289/22434 [8:40:26<10:16:17, 2.61s/it] +2025-02-05 18:48:06 - ERROR - stderr - +2025-02-05 18:48:06 - ERROR - stderr - +2025-02-05 18:48:06 - INFO - stdout - {'loss': 0.7902, 'grad_norm': 
1.2142648696899414, 'learning_rate': 1.4541191337737175e-05, 'epoch': 1.11} +2025-02-05 18:48:06 - ERROR - stderr - 37%|███▋ | 8289/22434 [8:40:26<10:16:17, 2.61s/it] +2025-02-05 18:48:09 - ERROR - stderr - 37%|███▋ | 8290/22434 [8:40:29<10:13:40, 2.60s/it] +2025-02-05 18:48:09 - ERROR - stderr - +2025-02-05 18:48:09 - ERROR - stderr - +2025-02-05 18:48:09 - INFO - stdout - {'loss': 0.7547, 'grad_norm': 1.0733712911605835, 'learning_rate': 1.4539904997395468e-05, 'epoch': 1.11} +2025-02-05 18:48:09 - ERROR - stderr - 37%|███▋ | 8290/22434 [8:40:29<10:13:40, 2.60s/it] +2025-02-05 18:48:11 - ERROR - stderr - 37%|███▋ | 8291/22434 [8:40:31<10:02:29, 2.56s/it] +2025-02-05 18:48:11 - ERROR - stderr - +2025-02-05 18:48:11 - ERROR - stderr - +2025-02-05 18:48:11 - INFO - stdout - {'loss': 0.6661, 'grad_norm': 1.1593356132507324, 'learning_rate': 1.4538618562423788e-05, 'epoch': 1.11} +2025-02-05 18:48:11 - ERROR - stderr - 37%|███▋ | 8291/22434 [8:40:31<10:02:29, 2.56s/it] +2025-02-05 18:48:14 - ERROR - stderr - 37%|███▋ | 8292/22434 [8:40:33<9:54:21, 2.52s/it] +2025-02-05 18:48:14 - ERROR - stderr - +2025-02-05 18:48:14 - ERROR - stderr - +2025-02-05 18:48:14 - INFO - stdout - {'loss': 0.6722, 'grad_norm': 1.0731009244918823, 'learning_rate': 1.4537332032848945e-05, 'epoch': 1.11} +2025-02-05 18:48:14 - ERROR - stderr - 37%|███▋ | 8292/22434 [8:40:33<9:54:21, 2.52s/it] +2025-02-05 18:48:16 - ERROR - stderr - 37%|███▋ | 8293/22434 [8:40:36<9:51:02, 2.51s/it] +2025-02-05 18:48:16 - ERROR - stderr - +2025-02-05 18:48:16 - ERROR - stderr - +2025-02-05 18:48:16 - INFO - stdout - {'loss': 0.7378, 'grad_norm': 1.2825267314910889, 'learning_rate': 1.4536045408697757e-05, 'epoch': 1.11} +2025-02-05 18:48:16 - ERROR - stderr - 37%|███▋ | 8293/22434 [8:40:36<9:51:02, 2.51s/it] +2025-02-05 18:48:19 - ERROR - stderr - 37%|███▋ | 8294/22434 [8:40:38<9:49:03, 2.50s/it] +2025-02-05 18:48:19 - ERROR - stderr - +2025-02-05 18:48:19 - ERROR - stderr - +2025-02-05 18:48:19 - INFO - stdout - 
{'loss': 0.6775, 'grad_norm': 1.0678948163986206, 'learning_rate': 1.4534758689997046e-05, 'epoch': 1.11} +2025-02-05 18:48:19 - ERROR - stderr - 37%|███▋ | 8294/22434 [8:40:38<9:49:03, 2.50s/it] +2025-02-05 18:48:21 - ERROR - stderr - 37%|███▋ | 8295/22434 [8:40:41<9:45:48, 2.49s/it] +2025-02-05 18:48:21 - ERROR - stderr - +2025-02-05 18:48:21 - ERROR - stderr - +2025-02-05 18:48:21 - INFO - stdout - {'loss': 0.7898, 'grad_norm': 1.2530276775360107, 'learning_rate': 1.4533471876773626e-05, 'epoch': 1.11} +2025-02-05 18:48:21 - ERROR - stderr - 37%|███▋ | 8295/22434 [8:40:41<9:45:48, 2.49s/it] +2025-02-05 18:48:24 - ERROR - stderr - 37%|███▋ | 8296/22434 [8:40:43<9:46:04, 2.49s/it] +2025-02-05 18:48:24 - ERROR - stderr - +2025-02-05 18:48:24 - ERROR - stderr - +2025-02-05 18:48:24 - INFO - stdout - {'loss': 0.6566, 'grad_norm': 1.1837270259857178, 'learning_rate': 1.4532184969054322e-05, 'epoch': 1.11} +2025-02-05 18:48:24 - ERROR - stderr - 37%|███▋ | 8296/22434 [8:40:43<9:46:04, 2.49s/it] +2025-02-05 18:48:26 - ERROR - stderr - 37%|███▋ | 8297/22434 [8:40:46<9:47:10, 2.49s/it] +2025-02-05 18:48:26 - ERROR - stderr - +2025-02-05 18:48:26 - ERROR - stderr - +2025-02-05 18:48:26 - INFO - stdout - {'loss': 0.6923, 'grad_norm': 1.2918622493743896, 'learning_rate': 1.4530897966865963e-05, 'epoch': 1.11} +2025-02-05 18:48:26 - ERROR - stderr - 37%|███▋ | 8297/22434 [8:40:46<9:47:10, 2.49s/it] +2025-02-05 18:48:29 - ERROR - stderr - 37%|███▋ | 8298/22434 [8:40:48<9:44:53, 2.48s/it] +2025-02-05 18:48:29 - ERROR - stderr - +2025-02-05 18:48:29 - ERROR - stderr - +2025-02-05 18:48:29 - INFO - stdout - {'loss': 0.7413, 'grad_norm': 1.0833202600479126, 'learning_rate': 1.4529610870235368e-05, 'epoch': 1.11} +2025-02-05 18:48:29 - ERROR - stderr - 37%|███▋ | 8298/22434 [8:40:48<9:44:53, 2.48s/it] +2025-02-05 18:48:31 - ERROR - stderr - 37%|███▋ | 8299/22434 [8:40:51<9:51:51, 2.51s/it] +2025-02-05 18:48:31 - ERROR - stderr - +2025-02-05 18:48:31 - ERROR - stderr - +2025-02-05 
18:48:31 - INFO - stdout - {'loss': 0.6814, 'grad_norm': 1.0432302951812744, 'learning_rate': 1.4528323679189371e-05, 'epoch': 1.11} +2025-02-05 18:48:31 - ERROR - stderr - 37%|███▋ | 8299/22434 [8:40:51<9:51:51, 2.51s/it] +2025-02-05 18:48:34 - ERROR - stderr - 37%|███▋ | 8300/22434 [8:40:53<9:59:42, 2.55s/it] +2025-02-05 18:48:34 - ERROR - stderr - +2025-02-05 18:48:34 - ERROR - stderr - +2025-02-05 18:48:34 - INFO - stdout - {'loss': 0.7264, 'grad_norm': 1.0792577266693115, 'learning_rate': 1.4527036393754799e-05, 'epoch': 1.11} +2025-02-05 18:48:34 - ERROR - stderr - 37%|███▋ | 8300/22434 [8:40:54<9:59:42, 2.55s/it] +2025-02-05 18:48:36 - ERROR - stderr - 37%|███▋ | 8301/22434 [8:40:56<9:56:00, 2.53s/it] +2025-02-05 18:48:36 - ERROR - stderr - +2025-02-05 18:48:36 - ERROR - stderr - +2025-02-05 18:48:36 - INFO - stdout - {'loss': 0.7297, 'grad_norm': 1.1741794347763062, 'learning_rate': 1.4525749013958486e-05, 'epoch': 1.11} +2025-02-05 18:48:36 - ERROR - stderr - 37%|███▋ | 8301/22434 [8:40:56<9:56:00, 2.53s/it] +2025-02-05 18:48:39 - ERROR - stderr - 37%|███▋ | 8302/22434 [8:40:58<9:50:52, 2.51s/it] +2025-02-05 18:48:39 - ERROR - stderr - +2025-02-05 18:48:39 - ERROR - stderr - +2025-02-05 18:48:39 - INFO - stdout - {'loss': 0.8102, 'grad_norm': 1.230660319328308, 'learning_rate': 1.4524461539827267e-05, 'epoch': 1.11} +2025-02-05 18:48:39 - ERROR - stderr - 37%|███▋ | 8302/22434 [8:40:58<9:50:52, 2.51s/it] +2025-02-05 18:48:41 - ERROR - stderr - 37%|███▋ | 8303/22434 [8:41:01<9:49:35, 2.50s/it] +2025-02-05 18:48:41 - ERROR - stderr - +2025-02-05 18:48:41 - ERROR - stderr - +2025-02-05 18:48:41 - INFO - stdout - {'loss': 0.7796, 'grad_norm': 1.0732176303863525, 'learning_rate': 1.4523173971387973e-05, 'epoch': 1.11} +2025-02-05 18:48:41 - ERROR - stderr - 37%|███▋ | 8303/22434 [8:41:01<9:49:35, 2.50s/it] +2025-02-05 18:48:44 - ERROR - stderr - 37%|███▋ | 8304/22434 [8:41:03<9:46:38, 2.49s/it] +2025-02-05 18:48:44 - ERROR - stderr - +2025-02-05 18:48:44 - 
ERROR - stderr - +2025-02-05 18:48:44 - INFO - stdout - {'loss': 0.7563, 'grad_norm': 1.2005423307418823, 'learning_rate': 1.4521886308667448e-05, 'epoch': 1.11} +2025-02-05 18:48:44 - ERROR - stderr - 37%|███▋ | 8304/22434 [8:41:03<9:46:38, 2.49s/it] +2025-02-05 18:48:46 - ERROR - stderr - 37%|███▋ | 8305/22434 [8:41:06<9:44:05, 2.48s/it] +2025-02-05 18:48:46 - ERROR - stderr - +2025-02-05 18:48:46 - ERROR - stderr - +2025-02-05 18:48:46 - INFO - stdout - {'loss': 0.7337, 'grad_norm': 1.11322021484375, 'learning_rate': 1.4520598551692529e-05, 'epoch': 1.11} +2025-02-05 18:48:46 - ERROR - stderr - 37%|███▋ | 8305/22434 [8:41:06<9:44:05, 2.48s/it] +2025-02-05 18:48:49 - ERROR - stderr - 37%|███▋ | 8306/22434 [8:41:08<9:46:23, 2.49s/it] +2025-02-05 18:48:49 - ERROR - stderr - +2025-02-05 18:48:49 - ERROR - stderr - +2025-02-05 18:48:49 - INFO - stdout - {'loss': 0.7094, 'grad_norm': 1.1368488073349, 'learning_rate': 1.4519310700490061e-05, 'epoch': 1.11} +2025-02-05 18:48:49 - ERROR - stderr - 37%|███▋ | 8306/22434 [8:41:08<9:46:23, 2.49s/it] +2025-02-05 18:48:51 - ERROR - stderr - 37%|███▋ | 8307/22434 [8:41:11<9:55:43, 2.53s/it] +2025-02-05 18:48:51 - ERROR - stderr - +2025-02-05 18:48:51 - ERROR - stderr - +2025-02-05 18:48:51 - INFO - stdout - {'loss': 0.7403, 'grad_norm': 1.2020680904388428, 'learning_rate': 1.4518022755086883e-05, 'epoch': 1.11} +2025-02-05 18:48:51 - ERROR - stderr - 37%|███▋ | 8307/22434 [8:41:11<9:55:43, 2.53s/it] +2025-02-05 18:48:54 - ERROR - stderr - 37%|███▋ | 8308/22434 [8:41:14<10:21:27, 2.64s/it] +2025-02-05 18:48:54 - ERROR - stderr - +2025-02-05 18:48:54 - ERROR - stderr - +2025-02-05 18:48:54 - INFO - stdout - {'loss': 0.6498, 'grad_norm': 1.1665252447128296, 'learning_rate': 1.4516734715509846e-05, 'epoch': 1.11} +2025-02-05 18:48:54 - ERROR - stderr - 37%|███▋ | 8308/22434 [8:41:14<10:21:27, 2.64s/it] +2025-02-05 18:48:57 - ERROR - stderr - 37%|███▋ | 8309/22434 [8:41:16<10:14:05, 2.61s/it] +2025-02-05 18:48:57 - ERROR - stderr - 
+2025-02-05 18:48:57 - ERROR - stderr - +2025-02-05 18:48:57 - INFO - stdout - {'loss': 0.6541, 'grad_norm': 1.1584941148757935, 'learning_rate': 1.4515446581785795e-05, 'epoch': 1.11} +2025-02-05 18:48:57 - ERROR - stderr - 37%|███▋ | 8309/22434 [8:41:16<10:14:05, 2.61s/it] +2025-02-05 18:48:59 - ERROR - stderr - 37%|███▋ | 8310/22434 [8:41:19<10:10:39, 2.59s/it] +2025-02-05 18:48:59 - ERROR - stderr - +2025-02-05 18:48:59 - ERROR - stderr - +2025-02-05 18:48:59 - INFO - stdout - {'loss': 0.6904, 'grad_norm': 1.1055808067321777, 'learning_rate': 1.4514158353941581e-05, 'epoch': 1.11} +2025-02-05 18:48:59 - ERROR - stderr - 37%|███▋ | 8310/22434 [8:41:19<10:10:39, 2.59s/it] +2025-02-05 18:49:02 - ERROR - stderr - 37%|███▋ | 8311/22434 [8:41:21<10:02:52, 2.56s/it] +2025-02-05 18:49:02 - ERROR - stderr - +2025-02-05 18:49:02 - ERROR - stderr - +2025-02-05 18:49:02 - INFO - stdout - {'loss': 0.7516, 'grad_norm': 1.1115529537200928, 'learning_rate': 1.4512870032004057e-05, 'epoch': 1.11} +2025-02-05 18:49:02 - ERROR - stderr - 37%|███▋ | 8311/22434 [8:41:22<10:02:52, 2.56s/it] +2025-02-05 18:49:04 - ERROR - stderr - 37%|███▋ | 8312/22434 [8:41:24<9:57:10, 2.54s/it] +2025-02-05 18:49:04 - ERROR - stderr - +2025-02-05 18:49:04 - ERROR - stderr - +2025-02-05 18:49:04 - INFO - stdout - {'loss': 0.675, 'grad_norm': 1.1111924648284912, 'learning_rate': 1.4511581616000072e-05, 'epoch': 1.11} +2025-02-05 18:49:04 - ERROR - stderr - 37%|███▋ | 8312/22434 [8:41:24<9:57:10, 2.54s/it] +2025-02-05 18:49:07 - ERROR - stderr - 37%|███▋ | 8313/22434 [8:41:26<9:53:34, 2.52s/it] +2025-02-05 18:49:07 - ERROR - stderr - +2025-02-05 18:49:07 - ERROR - stderr - +2025-02-05 18:49:07 - INFO - stdout - {'loss': 0.797, 'grad_norm': 1.2660021781921387, 'learning_rate': 1.4510293105956488e-05, 'epoch': 1.11} +2025-02-05 18:49:07 - ERROR - stderr - 37%|███▋ | 8313/22434 [8:41:26<9:53:34, 2.52s/it] +2025-02-05 18:49:09 - ERROR - stderr - 37%|███▋ | 8314/22434 [8:41:29<9:49:17, 2.50s/it] +2025-02-05 
18:49:09 - ERROR - stderr - +2025-02-05 18:49:09 - ERROR - stderr - +2025-02-05 18:49:09 - INFO - stdout - {'loss': 0.6675, 'grad_norm': 1.1217706203460693, 'learning_rate': 1.4509004501900161e-05, 'epoch': 1.11} +2025-02-05 18:49:09 - ERROR - stderr - 37%|███▋ | 8314/22434 [8:41:29<9:49:17, 2.50s/it] +2025-02-05 18:49:12 - ERROR - stderr - 37%|███▋ | 8315/22434 [8:41:31<9:46:42, 2.49s/it] +2025-02-05 18:49:12 - ERROR - stderr - +2025-02-05 18:49:12 - ERROR - stderr - +2025-02-05 18:49:12 - INFO - stdout - {'loss': 0.7535, 'grad_norm': 1.1810978651046753, 'learning_rate': 1.4507715803857948e-05, 'epoch': 1.11} +2025-02-05 18:49:12 - ERROR - stderr - 37%|███▋ | 8315/22434 [8:41:31<9:46:42, 2.49s/it] +2025-02-05 18:49:14 - ERROR - stderr - 37%|███▋ | 8316/22434 [8:41:34<9:52:58, 2.52s/it] +2025-02-05 18:49:14 - ERROR - stderr - +2025-02-05 18:49:14 - ERROR - stderr - +2025-02-05 18:49:14 - INFO - stdout - {'loss': 0.6926, 'grad_norm': 1.2078564167022705, 'learning_rate': 1.4506427011856712e-05, 'epoch': 1.11} +2025-02-05 18:49:14 - ERROR - stderr - 37%|███▋ | 8316/22434 [8:41:34<9:52:58, 2.52s/it] +2025-02-05 18:49:17 - ERROR - stderr - 37%|███▋ | 8317/22434 [8:41:37<10:02:42, 2.56s/it] +2025-02-05 18:49:17 - ERROR - stderr - +2025-02-05 18:49:17 - ERROR - stderr - +2025-02-05 18:49:17 - INFO - stdout - {'loss': 0.7591, 'grad_norm': 1.1513147354125977, 'learning_rate': 1.4505138125923316e-05, 'epoch': 1.11} +2025-02-05 18:49:17 - ERROR - stderr - 37%|███▋ | 8317/22434 [8:41:37<10:02:42, 2.56s/it] +2025-02-05 18:49:19 - ERROR - stderr - 37%|███▋ | 8318/22434 [8:41:39<10:03:56, 2.57s/it] +2025-02-05 18:49:19 - ERROR - stderr - +2025-02-05 18:49:19 - ERROR - stderr - +2025-02-05 18:49:19 - INFO - stdout - {'loss': 0.6512, 'grad_norm': 1.1595839262008667, 'learning_rate': 1.450384914608463e-05, 'epoch': 1.11} +2025-02-05 18:49:19 - ERROR - stderr - 37%|███▋ | 8318/22434 [8:41:39<10:03:56, 2.57s/it] +2025-02-05 18:49:22 - ERROR - stderr - 37%|███▋ | 8319/22434 
[8:41:42<9:57:50, 2.54s/it] +2025-02-05 18:49:22 - ERROR - stderr - +2025-02-05 18:49:22 - ERROR - stderr - +2025-02-05 18:49:22 - INFO - stdout - {'loss': 0.8134, 'grad_norm': 1.2681446075439453, 'learning_rate': 1.4502560072367518e-05, 'epoch': 1.11} +2025-02-05 18:49:22 - ERROR - stderr - 37%|███▋ | 8319/22434 [8:41:42<9:57:50, 2.54s/it] +2025-02-05 18:49:24 - ERROR - stderr - 37%|███▋ | 8320/22434 [8:41:44<9:54:34, 2.53s/it] +2025-02-05 18:49:24 - ERROR - stderr - +2025-02-05 18:49:24 - ERROR - stderr - +2025-02-05 18:49:24 - INFO - stdout - {'loss': 0.7156, 'grad_norm': 1.0839310884475708, 'learning_rate': 1.4501270904798847e-05, 'epoch': 1.11} +2025-02-05 18:49:24 - ERROR - stderr - 37%|███▋ | 8320/22434 [8:41:44<9:54:34, 2.53s/it] +2025-02-05 18:49:27 - ERROR - stderr - 37%|███▋ | 8321/22434 [8:41:47<9:51:36, 2.52s/it] +2025-02-05 18:49:27 - ERROR - stderr - +2025-02-05 18:49:27 - ERROR - stderr - +2025-02-05 18:49:27 - INFO - stdout - {'loss': 0.7017, 'grad_norm': 1.0616414546966553, 'learning_rate': 1.4499981643405495e-05, 'epoch': 1.11} +2025-02-05 18:49:27 - ERROR - stderr - 37%|███▋ | 8321/22434 [8:41:47<9:51:36, 2.52s/it] +2025-02-05 18:49:29 - ERROR - stderr - 37%|███▋ | 8322/22434 [8:41:49<9:48:38, 2.50s/it] +2025-02-05 18:49:29 - ERROR - stderr - +2025-02-05 18:49:29 - ERROR - stderr - +2025-02-05 18:49:29 - INFO - stdout - {'loss': 0.7014, 'grad_norm': 1.2882949113845825, 'learning_rate': 1.449869228821433e-05, 'epoch': 1.11} +2025-02-05 18:49:29 - ERROR - stderr - 37%|███▋ | 8322/22434 [8:41:49<9:48:38, 2.50s/it] +2025-02-05 18:49:32 - ERROR - stderr - 37%|███▋ | 8323/22434 [8:41:52<9:53:36, 2.52s/it] +2025-02-05 18:49:32 - ERROR - stderr - +2025-02-05 18:49:32 - ERROR - stderr - +2025-02-05 18:49:32 - INFO - stdout - {'loss': 0.7571, 'grad_norm': 1.213078260421753, 'learning_rate': 1.4497402839252228e-05, 'epoch': 1.11} +2025-02-05 18:49:32 - ERROR - stderr - 37%|███▋ | 8323/22434 [8:41:52<9:53:36, 2.52s/it] +2025-02-05 18:49:34 - ERROR - stderr 
- 37%|███▋ | 8324/22434 [8:41:54<9:49:37, 2.51s/it] +2025-02-05 18:49:34 - ERROR - stderr - +2025-02-05 18:49:34 - ERROR - stderr - +2025-02-05 18:49:34 - INFO - stdout - {'loss': 0.727, 'grad_norm': 1.1187180280685425, 'learning_rate': 1.4496113296546068e-05, 'epoch': 1.11} +2025-02-05 18:49:34 - ERROR - stderr - 37%|███▋ | 8324/22434 [8:41:54<9:49:37, 2.51s/it] +2025-02-05 18:49:37 - ERROR - stderr - 37%|███▋ | 8325/22434 [8:41:57<9:54:10, 2.53s/it] +2025-02-05 18:49:37 - ERROR - stderr - +2025-02-05 18:49:37 - ERROR - stderr - +2025-02-05 18:49:37 - INFO - stdout - {'loss': 0.6898, 'grad_norm': 1.1193311214447021, 'learning_rate': 1.4494823660122727e-05, 'epoch': 1.11} +2025-02-05 18:49:37 - ERROR - stderr - 37%|███▋ | 8325/22434 [8:41:57<9:54:10, 2.53s/it] +2025-02-05 18:49:39 - ERROR - stderr - 37%|███▋ | 8326/22434 [8:41:59<9:53:33, 2.52s/it] +2025-02-05 18:49:40 - ERROR - stderr - +2025-02-05 18:49:40 - ERROR - stderr - +2025-02-05 18:49:40 - INFO - stdout - {'loss': 0.751, 'grad_norm': 1.1341887712478638, 'learning_rate': 1.4493533930009092e-05, 'epoch': 1.11} +2025-02-05 18:49:40 - ERROR - stderr - 37%|███▋ | 8326/22434 [8:41:59<9:53:33, 2.52s/it] +2025-02-05 18:49:42 - ERROR - stderr - 37%|███▋ | 8327/22434 [8:42:02<10:02:55, 2.56s/it] +2025-02-05 18:49:42 - ERROR - stderr - +2025-02-05 18:49:42 - ERROR - stderr - +2025-02-05 18:49:42 - INFO - stdout - {'loss': 0.7927, 'grad_norm': 1.1902804374694824, 'learning_rate': 1.449224410623204e-05, 'epoch': 1.11} +2025-02-05 18:49:42 - ERROR - stderr - 37%|███▋ | 8327/22434 [8:42:02<10:02:55, 2.56s/it] +2025-02-05 18:49:45 - ERROR - stderr - 37%|███▋ | 8328/22434 [8:42:04<10:04:08, 2.57s/it] +2025-02-05 18:49:45 - ERROR - stderr - +2025-02-05 18:49:45 - ERROR - stderr - +2025-02-05 18:49:45 - INFO - stdout - {'loss': 0.8212, 'grad_norm': 1.1972242593765259, 'learning_rate': 1.4490954188818458e-05, 'epoch': 1.11} +2025-02-05 18:49:45 - ERROR - stderr - 37%|███▋ | 8328/22434 [8:42:05<10:04:08, 2.57s/it] +2025-02-05 
18:49:47 - ERROR - stderr - 37%|███▋ | 8329/22434 [8:42:07<10:02:31, 2.56s/it] +2025-02-05 18:49:47 - ERROR - stderr - +2025-02-05 18:49:47 - ERROR - stderr - +2025-02-05 18:49:47 - INFO - stdout - {'loss': 0.7361, 'grad_norm': 1.1225979328155518, 'learning_rate': 1.448966417779523e-05, 'epoch': 1.11} +2025-02-05 18:49:47 - ERROR - stderr - 37%|███▋ | 8329/22434 [8:42:07<10:02:31, 2.56s/it] +2025-02-05 18:49:50 - ERROR - stderr - 37%|███▋ | 8330/22434 [8:42:10<10:00:05, 2.55s/it] +2025-02-05 18:49:50 - ERROR - stderr - +2025-02-05 18:49:50 - ERROR - stderr - +2025-02-05 18:49:50 - INFO - stdout - {'loss': 0.8213, 'grad_norm': 1.1491978168487549, 'learning_rate': 1.4488374073189251e-05, 'epoch': 1.11} +2025-02-05 18:49:50 - ERROR - stderr - 37%|███▋ | 8330/22434 [8:42:10<10:00:05, 2.55s/it] +2025-02-05 18:49:52 - ERROR - stderr - 37%|███▋ | 8331/22434 [8:42:12<9:56:59, 2.54s/it] +2025-02-05 18:49:52 - ERROR - stderr - +2025-02-05 18:49:52 - ERROR - stderr - +2025-02-05 18:49:52 - INFO - stdout - {'loss': 0.7643, 'grad_norm': 1.0951247215270996, 'learning_rate': 1.4487083875027412e-05, 'epoch': 1.11} +2025-02-05 18:49:52 - ERROR - stderr - 37%|███▋ | 8331/22434 [8:42:12<9:56:59, 2.54s/it] +2025-02-05 18:49:55 - ERROR - stderr - 37%|███▋ | 8332/22434 [8:42:15<9:50:36, 2.51s/it] +2025-02-05 18:49:55 - ERROR - stderr - +2025-02-05 18:49:55 - ERROR - stderr - +2025-02-05 18:49:55 - INFO - stdout - {'loss': 0.6175, 'grad_norm': 1.0306428670883179, 'learning_rate': 1.4485793583336602e-05, 'epoch': 1.11} +2025-02-05 18:49:55 - ERROR - stderr - 37%|███▋ | 8332/22434 [8:42:15<9:50:36, 2.51s/it] +2025-02-05 18:49:57 - ERROR - stderr - 37%|███▋ | 8333/22434 [8:42:17<9:54:50, 2.53s/it] +2025-02-05 18:49:57 - ERROR - stderr - +2025-02-05 18:49:57 - ERROR - stderr - +2025-02-05 18:49:57 - INFO - stdout - {'loss': 0.723, 'grad_norm': 1.1568228006362915, 'learning_rate': 1.4484503198143715e-05, 'epoch': 1.11} +2025-02-05 18:49:57 - ERROR - stderr - 37%|███▋ | 8333/22434 
[8:42:17<9:54:50, 2.53s/it] +2025-02-05 18:50:00 - ERROR - stderr - 37%|███▋ | 8334/22434 [8:42:20<10:02:02, 2.56s/it] +2025-02-05 18:50:00 - ERROR - stderr - +2025-02-05 18:50:00 - ERROR - stderr - +2025-02-05 18:50:00 - INFO - stdout - {'loss': 0.7117, 'grad_norm': 1.059244990348816, 'learning_rate': 1.4483212719475652e-05, 'epoch': 1.11} +2025-02-05 18:50:00 - ERROR - stderr - 37%|███▋ | 8334/22434 [8:42:20<10:02:02, 2.56s/it] +2025-02-05 18:50:03 - ERROR - stderr - 37%|███▋ | 8335/22434 [8:42:22<10:06:34, 2.58s/it] +2025-02-05 18:50:03 - ERROR - stderr - +2025-02-05 18:50:03 - ERROR - stderr - +2025-02-05 18:50:03 - INFO - stdout - {'loss': 0.8079, 'grad_norm': 1.1356573104858398, 'learning_rate': 1.4481922147359309e-05, 'epoch': 1.11} +2025-02-05 18:50:03 - ERROR - stderr - 37%|███▋ | 8335/22434 [8:42:22<10:06:34, 2.58s/it] +2025-02-05 18:50:05 - ERROR - stderr - 37%|███▋ | 8336/22434 [8:42:25<10:06:34, 2.58s/it] +2025-02-05 18:50:05 - ERROR - stderr - +2025-02-05 18:50:05 - ERROR - stderr - +2025-02-05 18:50:05 - INFO - stdout - {'loss': 0.7734, 'grad_norm': 1.1072382926940918, 'learning_rate': 1.4480631481821588e-05, 'epoch': 1.11} +2025-02-05 18:50:05 - ERROR - stderr - 37%|███▋ | 8336/22434 [8:42:25<10:06:34, 2.58s/it] +2025-02-05 18:50:08 - ERROR - stderr - 37%|███▋ | 8337/22434 [8:42:27<9:57:10, 2.54s/it] +2025-02-05 18:50:08 - ERROR - stderr - +2025-02-05 18:50:08 - ERROR - stderr - +2025-02-05 18:50:08 - INFO - stdout - {'loss': 0.765, 'grad_norm': 1.1229808330535889, 'learning_rate': 1.447934072288939e-05, 'epoch': 1.11} +2025-02-05 18:50:08 - ERROR - stderr - 37%|███▋ | 8337/22434 [8:42:27<9:57:10, 2.54s/it] +2025-02-05 18:50:10 - ERROR - stderr - 37%|███▋ | 8338/22434 [8:42:30<9:49:59, 2.51s/it] +2025-02-05 18:50:10 - ERROR - stderr - +2025-02-05 18:50:10 - ERROR - stderr - +2025-02-05 18:50:10 - INFO - stdout - {'loss': 0.7218, 'grad_norm': 1.1347295045852661, 'learning_rate': 1.4478049870589623e-05, 'epoch': 1.12} +2025-02-05 18:50:10 - ERROR - 
stderr - 37%|███▋ | 8338/22434 [8:42:30<9:49:59, 2.51s/it] +2025-02-05 18:50:13 - ERROR - stderr - 37%|███▋ | 8339/22434 [8:42:32<9:51:26, 2.52s/it] +2025-02-05 18:50:13 - ERROR - stderr - +2025-02-05 18:50:13 - ERROR - stderr - +2025-02-05 18:50:13 - INFO - stdout - {'loss': 0.7806, 'grad_norm': 1.1059898138046265, 'learning_rate': 1.4476758924949192e-05, 'epoch': 1.12} +2025-02-05 18:50:13 - ERROR - stderr - 37%|███▋ | 8339/22434 [8:42:32<9:51:26, 2.52s/it] +2025-02-05 18:50:15 - ERROR - stderr - 37%|███▋ | 8340/22434 [8:42:35<9:57:37, 2.54s/it] +2025-02-05 18:50:15 - ERROR - stderr - +2025-02-05 18:50:15 - ERROR - stderr - +2025-02-05 18:50:15 - INFO - stdout - {'loss': 0.7562, 'grad_norm': 1.1295607089996338, 'learning_rate': 1.4475467885995003e-05, 'epoch': 1.12} +2025-02-05 18:50:15 - ERROR - stderr - 37%|███▋ | 8340/22434 [8:42:35<9:57:37, 2.54s/it] +2025-02-05 18:50:18 - ERROR - stderr - 37%|███▋ | 8341/22434 [8:42:38<10:07:10, 2.59s/it] +2025-02-05 18:50:18 - ERROR - stderr - +2025-02-05 18:50:18 - ERROR - stderr - +2025-02-05 18:50:18 - INFO - stdout - {'loss': 0.8163, 'grad_norm': 1.2315341234207153, 'learning_rate': 1.4474176753753968e-05, 'epoch': 1.12} +2025-02-05 18:50:18 - ERROR - stderr - 37%|███▋ | 8341/22434 [8:42:38<10:07:10, 2.59s/it] +2025-02-05 18:50:20 - ERROR - stderr - 37%|███▋ | 8342/22434 [8:42:40<10:03:41, 2.57s/it] +2025-02-05 18:50:20 - ERROR - stderr - +2025-02-05 18:50:20 - ERROR - stderr - +2025-02-05 18:50:20 - INFO - stdout - {'loss': 0.6845, 'grad_norm': 1.1378015279769897, 'learning_rate': 1.4472885528253e-05, 'epoch': 1.12} +2025-02-05 18:50:20 - ERROR - stderr - 37%|███▋ | 8342/22434 [8:42:40<10:03:41, 2.57s/it] +2025-02-05 18:50:23 - ERROR - stderr - 37%|███▋ | 8343/22434 [8:42:43<9:58:24, 2.55s/it] +2025-02-05 18:50:23 - ERROR - stderr - +2025-02-05 18:50:23 - ERROR - stderr - +2025-02-05 18:50:23 - INFO - stdout - {'loss': 0.7279, 'grad_norm': 1.080583930015564, 'learning_rate': 1.4471594209519016e-05, 'epoch': 1.12} 
+2025-02-05 18:50:23 - ERROR - stderr - 37%|███▋ | 8343/22434 [8:42:43<9:58:24, 2.55s/it] +2025-02-05 18:50:25 - ERROR - stderr - 37%|███▋ | 8344/22434 [8:42:45<9:50:28, 2.51s/it] +2025-02-05 18:50:25 - ERROR - stderr - +2025-02-05 18:50:25 - ERROR - stderr - +2025-02-05 18:50:25 - INFO - stdout - {'loss': 0.7383, 'grad_norm': 1.1425526142120361, 'learning_rate': 1.4470302797578928e-05, 'epoch': 1.12} +2025-02-05 18:50:25 - ERROR - stderr - 37%|███▋ | 8344/22434 [8:42:45<9:50:28, 2.51s/it] +2025-02-05 18:50:28 - ERROR - stderr - 37%|███▋ | 8345/22434 [8:42:48<9:58:21, 2.55s/it] +2025-02-05 18:50:28 - ERROR - stderr - +2025-02-05 18:50:28 - ERROR - stderr - +2025-02-05 18:50:28 - INFO - stdout - {'loss': 0.6816, 'grad_norm': 1.086901068687439, 'learning_rate': 1.4469011292459653e-05, 'epoch': 1.12} +2025-02-05 18:50:28 - ERROR - stderr - 37%|███▋ | 8345/22434 [8:42:48<9:58:21, 2.55s/it] +2025-02-05 18:50:31 - ERROR - stderr - 37%|███▋ | 8346/22434 [8:42:51<10:20:41, 2.64s/it] +2025-02-05 18:50:31 - ERROR - stderr - +2025-02-05 18:50:31 - ERROR - stderr - +2025-02-05 18:50:31 - INFO - stdout - {'loss': 0.7934, 'grad_norm': 1.2208452224731445, 'learning_rate': 1.4467719694188118e-05, 'epoch': 1.12} +2025-02-05 18:50:31 - ERROR - stderr - 37%|███▋ | 8346/22434 [8:42:51<10:20:41, 2.64s/it] +2025-02-05 18:50:33 - ERROR - stderr - 37%|███▋ | 8347/22434 [8:42:53<10:10:26, 2.60s/it] +2025-02-05 18:50:33 - ERROR - stderr - +2025-02-05 18:50:33 - ERROR - stderr - +2025-02-05 18:50:33 - INFO - stdout - {'loss': 0.7203, 'grad_norm': 1.1572463512420654, 'learning_rate': 1.446642800279124e-05, 'epoch': 1.12} +2025-02-05 18:50:33 - ERROR - stderr - 37%|███▋ | 8347/22434 [8:42:53<10:10:26, 2.60s/it] +2025-02-05 18:50:36 - ERROR - stderr - 37%|███▋ | 8348/22434 [8:42:56<10:11:46, 2.61s/it] +2025-02-05 18:50:36 - ERROR - stderr - +2025-02-05 18:50:36 - ERROR - stderr - +2025-02-05 18:50:36 - INFO - stdout - {'loss': 0.7683, 'grad_norm': 1.181697964668274, 'learning_rate': 
1.4465136218295944e-05, 'epoch': 1.12} +2025-02-05 18:50:36 - ERROR - stderr - 37%|███▋ | 8348/22434 [8:42:56<10:11:46, 2.61s/it] +2025-02-05 18:50:38 - ERROR - stderr - 37%|███▋ | 8349/22434 [8:42:58<10:00:47, 2.56s/it] +2025-02-05 18:50:38 - ERROR - stderr - +2025-02-05 18:50:38 - ERROR - stderr - +2025-02-05 18:50:38 - INFO - stdout - {'loss': 0.6594, 'grad_norm': 1.0777674913406372, 'learning_rate': 1.4463844340729155e-05, 'epoch': 1.12} +2025-02-05 18:50:38 - ERROR - stderr - 37%|███▋ | 8349/22434 [8:42:58<10:00:47, 2.56s/it] +2025-02-05 18:50:41 - ERROR - stderr - 37%|███▋ | 8350/22434 [8:43:01<9:56:06, 2.54s/it] +2025-02-05 18:50:41 - ERROR - stderr - +2025-02-05 18:50:41 - ERROR - stderr - +2025-02-05 18:50:41 - INFO - stdout - {'loss': 0.64, 'grad_norm': 1.0535820722579956, 'learning_rate': 1.4462552370117802e-05, 'epoch': 1.12} +2025-02-05 18:50:41 - ERROR - stderr - 37%|███▋ | 8350/22434 [8:43:01<9:56:06, 2.54s/it] +2025-02-05 18:50:43 - ERROR - stderr - 37%|███▋ | 8351/22434 [8:43:03<9:59:26, 2.55s/it] +2025-02-05 18:50:44 - ERROR - stderr - +2025-02-05 18:50:44 - ERROR - stderr - +2025-02-05 18:50:44 - INFO - stdout - {'loss': 0.7339, 'grad_norm': 1.0885950326919556, 'learning_rate': 1.4461260306488818e-05, 'epoch': 1.12} +2025-02-05 18:50:44 - ERROR - stderr - 37%|███▋ | 8351/22434 [8:43:03<9:59:26, 2.55s/it] +2025-02-05 18:50:46 - ERROR - stderr - 37%|███▋ | 8352/22434 [8:43:06<10:02:34, 2.57s/it] +2025-02-05 18:50:46 - ERROR - stderr - +2025-02-05 18:50:46 - ERROR - stderr - +2025-02-05 18:50:46 - INFO - stdout - {'loss': 0.7829, 'grad_norm': 1.2883414030075073, 'learning_rate': 1.445996814986913e-05, 'epoch': 1.12} +2025-02-05 18:50:46 - ERROR - stderr - 37%|███▋ | 8352/22434 [8:43:06<10:02:34, 2.57s/it] +2025-02-05 18:50:49 - ERROR - stderr - 37%|███▋ | 8353/22434 [8:43:08<9:59:54, 2.56s/it] +2025-02-05 18:50:49 - ERROR - stderr - +2025-02-05 18:50:49 - ERROR - stderr - +2025-02-05 18:50:49 - INFO - stdout - {'loss': 0.6413, 'grad_norm': 
1.0322016477584839, 'learning_rate': 1.4458675900285672e-05, 'epoch': 1.12} +2025-02-05 18:50:49 - ERROR - stderr - 37%|███▋ | 8353/22434 [8:43:08<9:59:54, 2.56s/it] +2025-02-05 18:50:51 - ERROR - stderr - 37%|███▋ | 8354/22434 [8:43:11<9:54:03, 2.53s/it] +2025-02-05 18:50:51 - ERROR - stderr - +2025-02-05 18:50:51 - ERROR - stderr - +2025-02-05 18:50:51 - INFO - stdout - {'loss': 0.7373, 'grad_norm': 1.219429612159729, 'learning_rate': 1.4457383557765385e-05, 'epoch': 1.12} +2025-02-05 18:50:51 - ERROR - stderr - 37%|███▋ | 8354/22434 [8:43:11<9:54:03, 2.53s/it] +2025-02-05 18:50:54 - ERROR - stderr - 37%|███▋ | 8355/22434 [8:43:13<9:54:27, 2.53s/it] +2025-02-05 18:50:54 - ERROR - stderr - +2025-02-05 18:50:54 - ERROR - stderr - +2025-02-05 18:50:54 - INFO - stdout - {'loss': 0.7141, 'grad_norm': 1.2747584581375122, 'learning_rate': 1.44560911223352e-05, 'epoch': 1.12} +2025-02-05 18:50:54 - ERROR - stderr - 37%|███▋ | 8355/22434 [8:43:13<9:54:27, 2.53s/it] +2025-02-05 18:50:56 - ERROR - stderr - 37%|███▋ | 8356/22434 [8:43:16<9:53:41, 2.53s/it] +2025-02-05 18:50:56 - ERROR - stderr - +2025-02-05 18:50:56 - ERROR - stderr - +2025-02-05 18:50:56 - INFO - stdout - {'loss': 0.6584, 'grad_norm': 1.1081844568252563, 'learning_rate': 1.4454798594022062e-05, 'epoch': 1.12} +2025-02-05 18:50:56 - ERROR - stderr - 37%|███▋ | 8356/22434 [8:43:16<9:53:41, 2.53s/it] +2025-02-05 18:50:59 - ERROR - stderr - 37%|███▋ | 8357/22434 [8:43:19<9:57:30, 2.55s/it] +2025-02-05 18:50:59 - ERROR - stderr - +2025-02-05 18:50:59 - ERROR - stderr - +2025-02-05 18:50:59 - INFO - stdout - {'loss': 0.7685, 'grad_norm': 1.1719917058944702, 'learning_rate': 1.4453505972852905e-05, 'epoch': 1.12} +2025-02-05 18:50:59 - ERROR - stderr - 37%|███▋ | 8357/22434 [8:43:19<9:57:30, 2.55s/it] +2025-02-05 18:51:01 - ERROR - stderr - 37%|███▋ | 8358/22434 [8:43:21<10:05:34, 2.58s/it] +2025-02-05 18:51:01 - ERROR - stderr - +2025-02-05 18:51:01 - ERROR - stderr - +2025-02-05 18:51:01 - INFO - stdout - 
{'loss': 0.7295, 'grad_norm': 1.1642959117889404, 'learning_rate': 1.4452213258854684e-05, 'epoch': 1.12} +2025-02-05 18:51:01 - ERROR - stderr - 37%|███▋ | 8358/22434 [8:43:21<10:05:34, 2.58s/it] +2025-02-05 18:51:04 - ERROR - stderr - 37%|███▋ | 8359/22434 [8:43:24<10:26:06, 2.67s/it] +2025-02-05 18:51:04 - ERROR - stderr - +2025-02-05 18:51:04 - ERROR - stderr - +2025-02-05 18:51:04 - INFO - stdout - {'loss': 0.7802, 'grad_norm': 1.1818652153015137, 'learning_rate': 1.4450920452054336e-05, 'epoch': 1.12} +2025-02-05 18:51:04 - ERROR - stderr - 37%|███▋ | 8359/22434 [8:43:24<10:26:06, 2.67s/it] +2025-02-05 18:51:07 - ERROR - stderr - 37%|███▋ | 8360/22434 [8:43:26<10:10:29, 2.60s/it] +2025-02-05 18:51:07 - ERROR - stderr - +2025-02-05 18:51:07 - ERROR - stderr - +2025-02-05 18:51:07 - INFO - stdout - {'loss': 0.7147, 'grad_norm': 1.1717503070831299, 'learning_rate': 1.4449627552478809e-05, 'epoch': 1.12} +2025-02-05 18:51:07 - ERROR - stderr - 37%|███▋ | 8360/22434 [8:43:27<10:10:29, 2.60s/it] +2025-02-05 18:51:10 - ERROR - stderr - 37%|███▋ | 8361/22434 [8:43:29<10:26:25, 2.67s/it] +2025-02-05 18:51:10 - ERROR - stderr - +2025-02-05 18:51:10 - ERROR - stderr - +2025-02-05 18:51:10 - INFO - stdout - {'loss': 0.7007, 'grad_norm': 1.0776591300964355, 'learning_rate': 1.4448334560155053e-05, 'epoch': 1.12} +2025-02-05 18:51:10 - ERROR - stderr - 37%|███▋ | 8361/22434 [8:43:29<10:26:25, 2.67s/it] +2025-02-05 18:51:12 - ERROR - stderr - 37%|███▋ | 8362/22434 [8:43:32<10:14:02, 2.62s/it] +2025-02-05 18:51:12 - ERROR - stderr - +2025-02-05 18:51:12 - ERROR - stderr - +2025-02-05 18:51:12 - INFO - stdout - {'loss': 0.7576, 'grad_norm': 1.359649658203125, 'learning_rate': 1.4447041475110019e-05, 'epoch': 1.12} +2025-02-05 18:51:12 - ERROR - stderr - 37%|███▋ | 8362/22434 [8:43:32<10:14:02, 2.62s/it] +2025-02-05 18:51:15 - ERROR - stderr - 37%|███▋ | 8363/22434 [8:43:34<10:03:26, 2.57s/it] +2025-02-05 18:51:15 - ERROR - stderr - +2025-02-05 18:51:15 - ERROR - stderr - 
+2025-02-05 18:51:15 - INFO - stdout - {'loss': 0.672, 'grad_norm': 1.134954810142517, 'learning_rate': 1.4445748297370665e-05, 'epoch': 1.12} +2025-02-05 18:51:15 - ERROR - stderr - 37%|███▋ | 8363/22434 [8:43:34<10:03:26, 2.57s/it] +2025-02-05 18:51:17 - ERROR - stderr - 37%|███▋ | 8364/22434 [8:43:37<10:07:57, 2.59s/it] +2025-02-05 18:51:17 - ERROR - stderr - +2025-02-05 18:51:17 - ERROR - stderr - +2025-02-05 18:51:17 - INFO - stdout - {'loss': 0.8165, 'grad_norm': 1.1859557628631592, 'learning_rate': 1.444445502696394e-05, 'epoch': 1.12} +2025-02-05 18:51:17 - ERROR - stderr - 37%|███▋ | 8364/22434 [8:43:37<10:07:57, 2.59s/it] +2025-02-05 18:51:20 - ERROR - stderr - 37%|███▋ | 8365/22434 [8:43:39<10:02:27, 2.57s/it] +2025-02-05 18:51:20 - ERROR - stderr - +2025-02-05 18:51:20 - ERROR - stderr - +2025-02-05 18:51:20 - INFO - stdout - {'loss': 0.7425, 'grad_norm': 1.228947401046753, 'learning_rate': 1.44431616639168e-05, 'epoch': 1.12} +2025-02-05 18:51:20 - ERROR - stderr - 37%|███▋ | 8365/22434 [8:43:39<10:02:27, 2.57s/it] +2025-02-05 18:51:22 - ERROR - stderr - 37%|███▋ | 8366/22434 [8:43:42<9:53:39, 2.53s/it] +2025-02-05 18:51:22 - ERROR - stderr - +2025-02-05 18:51:22 - ERROR - stderr - +2025-02-05 18:51:22 - INFO - stdout - {'loss': 0.6699, 'grad_norm': 1.187402606010437, 'learning_rate': 1.4441868208256208e-05, 'epoch': 1.12} +2025-02-05 18:51:22 - ERROR - stderr - 37%|███▋ | 8366/22434 [8:43:42<9:53:39, 2.53s/it] +2025-02-05 18:51:25 - ERROR - stderr - 37%|███▋ | 8367/22434 [8:43:44<9:51:29, 2.52s/it] +2025-02-05 18:51:25 - ERROR - stderr - +2025-02-05 18:51:25 - ERROR - stderr - +2025-02-05 18:51:25 - INFO - stdout - {'loss': 0.6688, 'grad_norm': 1.0500737428665161, 'learning_rate': 1.4440574660009125e-05, 'epoch': 1.12} +2025-02-05 18:51:25 - ERROR - stderr - 37%|███▋ | 8367/22434 [8:43:44<9:51:29, 2.52s/it] +2025-02-05 18:51:27 - ERROR - stderr - 37%|███▋ | 8368/22434 [8:43:47<9:58:06, 2.55s/it] +2025-02-05 18:51:27 - ERROR - stderr - +2025-02-05 
18:51:27 - ERROR - stderr - +2025-02-05 18:51:27 - INFO - stdout - {'loss': 0.6732, 'grad_norm': 1.0190798044204712, 'learning_rate': 1.4439281019202512e-05, 'epoch': 1.12} +2025-02-05 18:51:27 - ERROR - stderr - 37%|███▋ | 8368/22434 [8:43:47<9:58:06, 2.55s/it] +2025-02-05 18:51:30 - ERROR - stderr - 37%|███▋ | 8369/22434 [8:43:50<9:55:28, 2.54s/it] +2025-02-05 18:51:30 - ERROR - stderr - +2025-02-05 18:51:30 - ERROR - stderr - +2025-02-05 18:51:30 - INFO - stdout - {'loss': 0.6954, 'grad_norm': 1.1765137910842896, 'learning_rate': 1.4437987285863332e-05, 'epoch': 1.12} +2025-02-05 18:51:30 - ERROR - stderr - 37%|███▋ | 8369/22434 [8:43:50<9:55:28, 2.54s/it] +2025-02-05 18:51:32 - ERROR - stderr - 37%|███▋ | 8370/22434 [8:43:52<9:55:34, 2.54s/it] +2025-02-05 18:51:32 - ERROR - stderr - +2025-02-05 18:51:32 - ERROR - stderr - +2025-02-05 18:51:32 - INFO - stdout - {'loss': 0.6741, 'grad_norm': 1.012528896331787, 'learning_rate': 1.4436693460018558e-05, 'epoch': 1.12} +2025-02-05 18:51:32 - ERROR - stderr - 37%|███▋ | 8370/22434 [8:43:52<9:55:34, 2.54s/it] +2025-02-05 18:51:35 - ERROR - stderr - 37%|███▋ | 8371/22434 [8:43:55<9:58:11, 2.55s/it] +2025-02-05 18:51:35 - ERROR - stderr - +2025-02-05 18:51:35 - ERROR - stderr - +2025-02-05 18:51:35 - INFO - stdout - {'loss': 0.7568, 'grad_norm': 1.071714162826538, 'learning_rate': 1.4435399541695154e-05, 'epoch': 1.12} +2025-02-05 18:51:35 - ERROR - stderr - 37%|███▋ | 8371/22434 [8:43:55<9:58:11, 2.55s/it] +2025-02-05 18:51:37 - ERROR - stderr - 37%|███▋ | 8372/22434 [8:43:57<10:00:12, 2.56s/it] +2025-02-05 18:51:37 - ERROR - stderr - +2025-02-05 18:51:37 - ERROR - stderr - +2025-02-05 18:51:37 - INFO - stdout - {'loss': 0.6664, 'grad_norm': 1.0907952785491943, 'learning_rate': 1.4434105530920089e-05, 'epoch': 1.12} +2025-02-05 18:51:37 - ERROR - stderr - 37%|███▋ | 8372/22434 [8:43:57<10:00:12, 2.56s/it] +2025-02-05 18:51:40 - ERROR - stderr - 37%|███▋ | 8373/22434 [8:44:00<9:59:47, 2.56s/it] +2025-02-05 18:51:40 - 
ERROR - stderr - +2025-02-05 18:51:40 - ERROR - stderr - +2025-02-05 18:51:40 - INFO - stdout - {'loss': 0.7514, 'grad_norm': 1.081235408782959, 'learning_rate': 1.4432811427720334e-05, 'epoch': 1.12} +2025-02-05 18:51:40 - ERROR - stderr - 37%|███▋ | 8373/22434 [8:44:00<9:59:47, 2.56s/it] +2025-02-05 18:51:42 - ERROR - stderr - 37%|███▋ | 8374/22434 [8:44:02<9:51:01, 2.52s/it] +2025-02-05 18:51:42 - ERROR - stderr - +2025-02-05 18:51:42 - ERROR - stderr - +2025-02-05 18:51:42 - INFO - stdout - {'loss': 0.6899, 'grad_norm': 1.1514976024627686, 'learning_rate': 1.443151723212287e-05, 'epoch': 1.12} +2025-02-05 18:51:42 - ERROR - stderr - 37%|███▋ | 8374/22434 [8:44:02<9:51:01, 2.52s/it] +2025-02-05 18:51:45 - ERROR - stderr - 37%|███▋ | 8375/22434 [8:44:05<9:48:28, 2.51s/it] +2025-02-05 18:51:45 - ERROR - stderr - +2025-02-05 18:51:45 - ERROR - stderr - +2025-02-05 18:51:45 - INFO - stdout - {'loss': 0.7072, 'grad_norm': 1.0804319381713867, 'learning_rate': 1.4430222944154668e-05, 'epoch': 1.12} +2025-02-05 18:51:45 - ERROR - stderr - 37%|███▋ | 8375/22434 [8:44:05<9:48:28, 2.51s/it] +2025-02-05 18:51:47 - ERROR - stderr - 37%|███▋ | 8376/22434 [8:44:07<9:45:41, 2.50s/it] +2025-02-05 18:51:47 - ERROR - stderr - +2025-02-05 18:51:47 - ERROR - stderr - +2025-02-05 18:51:47 - INFO - stdout - {'loss': 0.7858, 'grad_norm': 1.145990014076233, 'learning_rate': 1.4428928563842711e-05, 'epoch': 1.12} +2025-02-05 18:51:47 - ERROR - stderr - 37%|███▋ | 8376/22434 [8:44:07<9:45:41, 2.50s/it] +2025-02-05 18:51:50 - ERROR - stderr - 37%|███▋ | 8377/22434 [8:44:10<9:44:09, 2.49s/it] +2025-02-05 18:51:50 - ERROR - stderr - +2025-02-05 18:51:50 - ERROR - stderr - +2025-02-05 18:51:50 - INFO - stdout - {'loss': 0.7551, 'grad_norm': 1.2893422842025757, 'learning_rate': 1.4427634091213973e-05, 'epoch': 1.12} +2025-02-05 18:51:50 - ERROR - stderr - 37%|███▋ | 8377/22434 [8:44:10<9:44:09, 2.49s/it] +2025-02-05 18:51:52 - ERROR - stderr - 37%|███▋ | 8378/22434 [8:44:12<9:43:29, 2.49s/it] 
+2025-02-05 18:51:52 - ERROR - stderr - +2025-02-05 18:51:52 - ERROR - stderr - +2025-02-05 18:51:52 - INFO - stdout - {'loss': 0.6657, 'grad_norm': 1.1194885969161987, 'learning_rate': 1.442633952629544e-05, 'epoch': 1.12} +2025-02-05 18:51:52 - ERROR - stderr - 37%|███▋ | 8378/22434 [8:44:12<9:43:29, 2.49s/it] +2025-02-05 18:51:55 - ERROR - stderr - 37%|███▋ | 8379/22434 [8:44:15<9:41:53, 2.48s/it] +2025-02-05 18:51:55 - ERROR - stderr - +2025-02-05 18:51:55 - ERROR - stderr - +2025-02-05 18:51:55 - INFO - stdout - {'loss': 0.711, 'grad_norm': 1.1008917093276978, 'learning_rate': 1.4425044869114097e-05, 'epoch': 1.12} +2025-02-05 18:51:55 - ERROR - stderr - 37%|███▋ | 8379/22434 [8:44:15<9:41:53, 2.48s/it] +2025-02-05 18:51:57 - ERROR - stderr - 37%|███▋ | 8380/22434 [8:44:17<9:39:14, 2.47s/it] +2025-02-05 18:51:57 - ERROR - stderr - +2025-02-05 18:51:57 - ERROR - stderr - +2025-02-05 18:51:57 - INFO - stdout - {'loss': 0.6324, 'grad_norm': 1.0503158569335938, 'learning_rate': 1.4423750119696927e-05, 'epoch': 1.12} +2025-02-05 18:51:57 - ERROR - stderr - 37%|███▋ | 8380/22434 [8:44:17<9:39:14, 2.47s/it] +2025-02-05 18:52:00 - ERROR - stderr - 37%|███▋ | 8381/22434 [8:44:20<9:43:06, 2.49s/it] +2025-02-05 18:52:00 - ERROR - stderr - +2025-02-05 18:52:00 - ERROR - stderr - +2025-02-05 18:52:00 - INFO - stdout - {'loss': 0.703, 'grad_norm': 1.1039241552352905, 'learning_rate': 1.4422455278070916e-05, 'epoch': 1.12} +2025-02-05 18:52:00 - ERROR - stderr - 37%|███▋ | 8381/22434 [8:44:20<9:43:06, 2.49s/it] +2025-02-05 18:52:02 - ERROR - stderr - 37%|███▋ | 8382/22434 [8:44:22<9:43:46, 2.49s/it] +2025-02-05 18:52:02 - ERROR - stderr - +2025-02-05 18:52:02 - ERROR - stderr - +2025-02-05 18:52:02 - INFO - stdout - {'loss': 0.7403, 'grad_norm': 1.1159162521362305, 'learning_rate': 1.4421160344263059e-05, 'epoch': 1.12} +2025-02-05 18:52:02 - ERROR - stderr - 37%|███▋ | 8382/22434 [8:44:22<9:43:46, 2.49s/it] +2025-02-05 18:52:05 - ERROR - stderr - 37%|███▋ | 8383/22434 
[8:44:25<9:41:48, 2.48s/it] +2025-02-05 18:52:05 - ERROR - stderr - +2025-02-05 18:52:05 - ERROR - stderr - +2025-02-05 18:52:05 - INFO - stdout - {'loss': 0.8027, 'grad_norm': 1.3382800817489624, 'learning_rate': 1.4419865318300348e-05, 'epoch': 1.12} +2025-02-05 18:52:05 - ERROR - stderr - 37%|███▋ | 8383/22434 [8:44:25<9:41:48, 2.48s/it] +2025-02-05 18:52:07 - ERROR - stderr - 37%|███▋ | 8384/22434 [8:44:27<9:40:54, 2.48s/it] +2025-02-05 18:52:07 - ERROR - stderr - +2025-02-05 18:52:07 - ERROR - stderr - +2025-02-05 18:52:07 - INFO - stdout - {'loss': 0.7523, 'grad_norm': 1.2750722169876099, 'learning_rate': 1.4418570200209772e-05, 'epoch': 1.12} +2025-02-05 18:52:07 - ERROR - stderr - 37%|███▋ | 8384/22434 [8:44:27<9:40:54, 2.48s/it] +2025-02-05 18:52:10 - ERROR - stderr - 37%|███▋ | 8385/22434 [8:44:29<9:39:34, 2.48s/it] +2025-02-05 18:52:10 - ERROR - stderr - +2025-02-05 18:52:10 - ERROR - stderr - +2025-02-05 18:52:10 - INFO - stdout - {'loss': 0.7078, 'grad_norm': 1.199256420135498, 'learning_rate': 1.4417274990018327e-05, 'epoch': 1.12} +2025-02-05 18:52:10 - ERROR - stderr - 37%|███▋ | 8385/22434 [8:44:30<9:39:34, 2.48s/it] +2025-02-05 18:52:12 - ERROR - stderr - 37%|███▋ | 8386/22434 [8:44:32<9:41:23, 2.48s/it] +2025-02-05 18:52:12 - ERROR - stderr - +2025-02-05 18:52:12 - ERROR - stderr - +2025-02-05 18:52:12 - INFO - stdout - {'loss': 0.7445, 'grad_norm': 1.1698222160339355, 'learning_rate': 1.441597968775301e-05, 'epoch': 1.12} +2025-02-05 18:52:12 - ERROR - stderr - 37%|███▋ | 8386/22434 [8:44:32<9:41:23, 2.48s/it] +2025-02-05 18:52:15 - ERROR - stderr - 37%|███▋ | 8387/22434 [8:44:34<9:42:33, 2.49s/it] +2025-02-05 18:52:15 - ERROR - stderr - +2025-02-05 18:52:15 - ERROR - stderr - +2025-02-05 18:52:15 - INFO - stdout - {'loss': 0.7751, 'grad_norm': 1.2105101346969604, 'learning_rate': 1.4414684293440823e-05, 'epoch': 1.12} +2025-02-05 18:52:15 - ERROR - stderr - 37%|███▋ | 8387/22434 [8:44:35<9:42:33, 2.49s/it] +2025-02-05 18:52:17 - ERROR - stderr 
- 37%|███▋ | 8388/22434 [8:44:37<9:41:02, 2.48s/it] +2025-02-05 18:52:17 - ERROR - stderr - +2025-02-05 18:52:17 - ERROR - stderr - +2025-02-05 18:52:17 - INFO - stdout - {'loss': 0.7006, 'grad_norm': 1.0842067003250122, 'learning_rate': 1.4413388807108768e-05, 'epoch': 1.12} +2025-02-05 18:52:17 - ERROR - stderr - 37%|███▋ | 8388/22434 [8:44:37<9:41:02, 2.48s/it] +2025-02-05 18:52:20 - ERROR - stderr - 37%|███▋ | 8389/22434 [8:44:39<9:43:14, 2.49s/it] +2025-02-05 18:52:20 - ERROR - stderr - +2025-02-05 18:52:20 - ERROR - stderr - +2025-02-05 18:52:20 - INFO - stdout - {'loss': 0.676, 'grad_norm': 1.197487711906433, 'learning_rate': 1.4412093228783846e-05, 'epoch': 1.12} +2025-02-05 18:52:20 - ERROR - stderr - 37%|███▋ | 8389/22434 [8:44:40<9:43:14, 2.49s/it] +2025-02-05 18:52:22 - ERROR - stderr - 37%|███▋ | 8390/22434 [8:44:42<9:47:04, 2.51s/it] +2025-02-05 18:52:22 - ERROR - stderr - +2025-02-05 18:52:22 - ERROR - stderr - +2025-02-05 18:52:22 - INFO - stdout - {'loss': 0.7219, 'grad_norm': 1.181753396987915, 'learning_rate': 1.4410797558493062e-05, 'epoch': 1.12} +2025-02-05 18:52:22 - ERROR - stderr - 37%|███▋ | 8390/22434 [8:44:42<9:47:04, 2.51s/it] +2025-02-05 18:52:25 - ERROR - stderr - 37%|███▋ | 8391/22434 [8:44:44<9:44:58, 2.50s/it] +2025-02-05 18:52:25 - ERROR - stderr - +2025-02-05 18:52:25 - ERROR - stderr - +2025-02-05 18:52:25 - INFO - stdout - {'loss': 0.8266, 'grad_norm': 1.3178125619888306, 'learning_rate': 1.4409501796263425e-05, 'epoch': 1.12} +2025-02-05 18:52:25 - ERROR - stderr - 37%|███▋ | 8391/22434 [8:44:45<9:44:58, 2.50s/it] +2025-02-05 18:52:27 - ERROR - stderr - 37%|███▋ | 8392/22434 [8:44:47<9:47:17, 2.51s/it] +2025-02-05 18:52:27 - ERROR - stderr - +2025-02-05 18:52:27 - ERROR - stderr - +2025-02-05 18:52:27 - INFO - stdout - {'loss': 0.8319, 'grad_norm': 1.3144235610961914, 'learning_rate': 1.4408205942121942e-05, 'epoch': 1.12} +2025-02-05 18:52:27 - ERROR - stderr - 37%|███▋ | 8392/22434 [8:44:47<9:47:17, 2.51s/it] +2025-02-05 
18:52:30 - ERROR - stderr - 37%|███▋ | 8393/22434 [8:44:50<9:53:02, 2.53s/it] +2025-02-05 18:52:30 - ERROR - stderr - +2025-02-05 18:52:30 - ERROR - stderr - +2025-02-05 18:52:30 - INFO - stdout - {'loss': 0.7775, 'grad_norm': 1.3660727739334106, 'learning_rate': 1.4406909996095622e-05, 'epoch': 1.12} +2025-02-05 18:52:30 - ERROR - stderr - 37%|███▋ | 8393/22434 [8:44:50<9:53:02, 2.53s/it] +2025-02-05 18:52:32 - ERROR - stderr - 37%|███▋ | 8394/22434 [8:44:52<9:52:17, 2.53s/it] +2025-02-05 18:52:32 - ERROR - stderr - +2025-02-05 18:52:32 - ERROR - stderr - +2025-02-05 18:52:32 - INFO - stdout - {'loss': 0.7697, 'grad_norm': 1.1368908882141113, 'learning_rate': 1.4405613958211482e-05, 'epoch': 1.12} +2025-02-05 18:52:32 - ERROR - stderr - 37%|███▋ | 8394/22434 [8:44:52<9:52:17, 2.53s/it] +2025-02-05 18:52:35 - ERROR - stderr - 37%|███▋ | 8395/22434 [8:44:55<9:48:00, 2.51s/it] +2025-02-05 18:52:35 - ERROR - stderr - +2025-02-05 18:52:35 - ERROR - stderr - +2025-02-05 18:52:35 - INFO - stdout - {'loss': 0.6978, 'grad_norm': 1.2406312227249146, 'learning_rate': 1.4404317828496534e-05, 'epoch': 1.12} +2025-02-05 18:52:35 - ERROR - stderr - 37%|███▋ | 8395/22434 [8:44:55<9:48:00, 2.51s/it] +2025-02-05 18:52:37 - ERROR - stderr - 37%|███▋ | 8396/22434 [8:44:57<9:45:04, 2.50s/it] +2025-02-05 18:52:37 - ERROR - stderr - +2025-02-05 18:52:37 - ERROR - stderr - +2025-02-05 18:52:37 - INFO - stdout - {'loss': 0.7224, 'grad_norm': 1.209076166152954, 'learning_rate': 1.4403021606977798e-05, 'epoch': 1.12} +2025-02-05 18:52:37 - ERROR - stderr - 37%|███▋ | 8396/22434 [8:44:57<9:45:04, 2.50s/it] +2025-02-05 18:52:40 - ERROR - stderr - 37%|███▋ | 8397/22434 [8:45:00<9:51:52, 2.53s/it] +2025-02-05 18:52:40 - ERROR - stderr - +2025-02-05 18:52:40 - ERROR - stderr - +2025-02-05 18:52:40 - INFO - stdout - {'loss': 0.7328, 'grad_norm': 1.1883964538574219, 'learning_rate': 1.4401725293682287e-05, 'epoch': 1.12} +2025-02-05 18:52:40 - ERROR - stderr - 37%|███▋ | 8397/22434 
[8:45:00<9:51:52, 2.53s/it] +2025-02-05 18:52:42 - ERROR - stderr - 37%|███▋ | 8398/22434 [8:45:02<9:53:30, 2.54s/it] +2025-02-05 18:52:43 - ERROR - stderr - +2025-02-05 18:52:43 - ERROR - stderr - +2025-02-05 18:52:43 - INFO - stdout - {'loss': 0.7264, 'grad_norm': 1.0854756832122803, 'learning_rate': 1.4400428888637026e-05, 'epoch': 1.12} +2025-02-05 18:52:43 - ERROR - stderr - 37%|███▋ | 8398/22434 [8:45:02<9:53:30, 2.54s/it] +2025-02-05 18:52:45 - ERROR - stderr - 37%|███▋ | 8399/22434 [8:45:05<9:58:25, 2.56s/it] +2025-02-05 18:52:45 - ERROR - stderr - +2025-02-05 18:52:45 - ERROR - stderr - +2025-02-05 18:52:45 - INFO - stdout - {'loss': 0.7885, 'grad_norm': 1.1856213808059692, 'learning_rate': 1.4399132391869032e-05, 'epoch': 1.12} +2025-02-05 18:52:45 - ERROR - stderr - 37%|███▋ | 8399/22434 [8:45:05<9:58:25, 2.56s/it] +2025-02-05 18:52:48 - ERROR - stderr - 37%|███▋ | 8400/22434 [8:45:07<9:55:33, 2.55s/it] +2025-02-05 18:52:48 - ERROR - stderr - +2025-02-05 18:52:48 - ERROR - stderr - +2025-02-05 18:52:48 - INFO - stdout - {'loss': 0.7398, 'grad_norm': 1.1723554134368896, 'learning_rate': 1.4397835803405338e-05, 'epoch': 1.12} +2025-02-05 18:52:48 - ERROR - stderr - 37%|███▋ | 8400/22434 [8:45:07<9:55:33, 2.55s/it] +2025-02-05 18:52:50 - ERROR - stderr - 37%|███▋ | 8401/22434 [8:45:10<9:53:24, 2.54s/it] +2025-02-05 18:52:50 - ERROR - stderr - +2025-02-05 18:52:50 - ERROR - stderr - +2025-02-05 18:52:50 - INFO - stdout - {'loss': 0.7272, 'grad_norm': 1.0858168601989746, 'learning_rate': 1.439653912327296e-05, 'epoch': 1.12} +2025-02-05 18:52:50 - ERROR - stderr - 37%|███▋ | 8401/22434 [8:45:10<9:53:24, 2.54s/it] +2025-02-05 18:52:53 - ERROR - stderr - 37%|███▋ | 8402/22434 [8:45:12<9:46:22, 2.51s/it] +2025-02-05 18:52:53 - ERROR - stderr - +2025-02-05 18:52:53 - ERROR - stderr - +2025-02-05 18:52:53 - INFO - stdout - {'loss': 0.7228, 'grad_norm': 1.1664848327636719, 'learning_rate': 1.4395242351498934e-05, 'epoch': 1.12} +2025-02-05 18:52:53 - ERROR - stderr 
- 37%|███▋ | 8402/22434 [8:45:12<9:46:22, 2.51s/it] +2025-02-05 18:52:55 - ERROR - stderr - 37%|███▋ | 8403/22434 [8:45:15<9:45:58, 2.51s/it] +2025-02-05 18:52:55 - ERROR - stderr - +2025-02-05 18:52:55 - ERROR - stderr - +2025-02-05 18:52:55 - INFO - stdout - {'loss': 0.7125, 'grad_norm': 1.0693929195404053, 'learning_rate': 1.4393945488110287e-05, 'epoch': 1.12} +2025-02-05 18:52:55 - ERROR - stderr - 37%|███▋ | 8403/22434 [8:45:15<9:45:58, 2.51s/it] +2025-02-05 18:52:58 - ERROR - stderr - 37%|███▋ | 8404/22434 [8:45:17<9:43:45, 2.50s/it] +2025-02-05 18:52:58 - ERROR - stderr - +2025-02-05 18:52:58 - ERROR - stderr - +2025-02-05 18:52:58 - INFO - stdout - {'loss': 0.6363, 'grad_norm': 0.954888105392456, 'learning_rate': 1.4392648533134051e-05, 'epoch': 1.12} +2025-02-05 18:52:58 - ERROR - stderr - 37%|███▋ | 8404/22434 [8:45:17<9:43:45, 2.50s/it] +2025-02-05 18:53:00 - ERROR - stderr - 37%|███▋ | 8405/22434 [8:45:20<9:51:08, 2.53s/it] +2025-02-05 18:53:00 - ERROR - stderr - +2025-02-05 18:53:00 - ERROR - stderr - +2025-02-05 18:53:00 - INFO - stdout - {'loss': 0.7303, 'grad_norm': 1.2978016138076782, 'learning_rate': 1.4391351486597259e-05, 'epoch': 1.12} +2025-02-05 18:53:00 - ERROR - stderr - 37%|███▋ | 8405/22434 [8:45:20<9:51:08, 2.53s/it] +2025-02-05 18:53:03 - ERROR - stderr - 37%|███▋ | 8406/22434 [8:45:22<9:53:46, 2.54s/it] +2025-02-05 18:53:03 - ERROR - stderr - +2025-02-05 18:53:03 - ERROR - stderr - +2025-02-05 18:53:03 - INFO - stdout - {'loss': 0.6367, 'grad_norm': 1.1401885747909546, 'learning_rate': 1.4390054348526945e-05, 'epoch': 1.12} +2025-02-05 18:53:03 - ERROR - stderr - 37%|███▋ | 8406/22434 [8:45:22<9:53:46, 2.54s/it] +2025-02-05 18:53:05 - ERROR - stderr - 37%|███▋ | 8407/22434 [8:45:25<9:48:33, 2.52s/it] +2025-02-05 18:53:05 - ERROR - stderr - +2025-02-05 18:53:05 - ERROR - stderr - +2025-02-05 18:53:05 - INFO - stdout - {'loss': 0.6753, 'grad_norm': 1.0839036703109741, 'learning_rate': 1.4388757118950152e-05, 'epoch': 1.12} +2025-02-05 
18:53:05 - ERROR - stderr - 37%|███▋ | 8407/22434 [8:45:25<9:48:33, 2.52s/it] +2025-02-05 18:53:08 - ERROR - stderr - 37%|███▋ | 8408/22434 [8:45:27<9:45:47, 2.51s/it] +2025-02-05 18:53:08 - ERROR - stderr - +2025-02-05 18:53:08 - ERROR - stderr - +2025-02-05 18:53:08 - INFO - stdout - {'loss': 0.707, 'grad_norm': 1.1497244834899902, 'learning_rate': 1.4387459797893915e-05, 'epoch': 1.12} +2025-02-05 18:53:08 - ERROR - stderr - 37%|███▋ | 8408/22434 [8:45:27<9:45:47, 2.51s/it] +2025-02-05 18:53:10 - ERROR - stderr - 37%|███▋ | 8409/22434 [8:45:30<9:47:33, 2.51s/it] +2025-02-05 18:53:10 - ERROR - stderr - +2025-02-05 18:53:10 - ERROR - stderr - +2025-02-05 18:53:10 - INFO - stdout - {'loss': 0.7331, 'grad_norm': 1.0623220205307007, 'learning_rate': 1.4386162385385279e-05, 'epoch': 1.12} +2025-02-05 18:53:10 - ERROR - stderr - 37%|███▋ | 8409/22434 [8:45:30<9:47:33, 2.51s/it] +2025-02-05 18:53:13 - ERROR - stderr - 37%|███▋ | 8410/22434 [8:45:32<9:48:55, 2.52s/it] +2025-02-05 18:53:13 - ERROR - stderr - +2025-02-05 18:53:13 - ERROR - stderr - +2025-02-05 18:53:13 - INFO - stdout - {'loss': 0.6237, 'grad_norm': 1.031921148300171, 'learning_rate': 1.438486488145128e-05, 'epoch': 1.12} +2025-02-05 18:53:13 - ERROR - stderr - 37%|███▋ | 8410/22434 [8:45:33<9:48:55, 2.52s/it] +2025-02-05 18:53:15 - ERROR - stderr - 37%|███▋ | 8411/22434 [8:45:35<9:48:05, 2.52s/it] +2025-02-05 18:53:15 - ERROR - stderr - +2025-02-05 18:53:15 - ERROR - stderr - +2025-02-05 18:53:15 - INFO - stdout - {'loss': 0.8239, 'grad_norm': 1.1798107624053955, 'learning_rate': 1.4383567286118973e-05, 'epoch': 1.12} +2025-02-05 18:53:15 - ERROR - stderr - 37%|███▋ | 8411/22434 [8:45:35<9:48:05, 2.52s/it] +2025-02-05 18:53:18 - ERROR - stderr - 37%|███▋ | 8412/22434 [8:45:38<9:50:57, 2.53s/it] +2025-02-05 18:53:18 - ERROR - stderr - +2025-02-05 18:53:18 - ERROR - stderr - +2025-02-05 18:53:18 - INFO - stdout - {'loss': 0.7122, 'grad_norm': 1.1528325080871582, 'learning_rate': 1.43822695994154e-05, 
'epoch': 1.12} +2025-02-05 18:53:18 - ERROR - stderr - 37%|███▋ | 8412/22434 [8:45:38<9:50:57, 2.53s/it] +2025-02-05 18:53:20 - ERROR - stderr - 38%|███▊ | 8413/22434 [8:45:40<9:53:04, 2.54s/it] +2025-02-05 18:53:20 - ERROR - stderr - +2025-02-05 18:53:20 - ERROR - stderr - +2025-02-05 18:53:20 - INFO - stdout - {'loss': 0.6808, 'grad_norm': 1.156791090965271, 'learning_rate': 1.438097182136761e-05, 'epoch': 1.13} +2025-02-05 18:53:20 - ERROR - stderr - 38%|███▊ | 8413/22434 [8:45:40<9:53:04, 2.54s/it] +2025-02-05 18:53:23 - ERROR - stderr - 38%|███▊ | 8414/22434 [8:45:43<9:53:45, 2.54s/it] +2025-02-05 18:53:23 - ERROR - stderr - +2025-02-05 18:53:23 - ERROR - stderr - +2025-02-05 18:53:23 - INFO - stdout - {'loss': 0.6498, 'grad_norm': 1.048363208770752, 'learning_rate': 1.4379673952002656e-05, 'epoch': 1.13} +2025-02-05 18:53:23 - ERROR - stderr - 38%|███▊ | 8414/22434 [8:45:43<9:53:45, 2.54s/it] +2025-02-05 18:53:25 - ERROR - stderr - 38%|███▊ | 8415/22434 [8:45:45<9:53:35, 2.54s/it] +2025-02-05 18:53:25 - ERROR - stderr - +2025-02-05 18:53:25 - ERROR - stderr - +2025-02-05 18:53:25 - INFO - stdout - {'loss': 0.7571, 'grad_norm': 1.2286697626113892, 'learning_rate': 1.4378375991347586e-05, 'epoch': 1.13} +2025-02-05 18:53:25 - ERROR - stderr - 38%|███▊ | 8415/22434 [8:45:45<9:53:35, 2.54s/it] +2025-02-05 18:53:28 - ERROR - stderr - 38%|███▊ | 8416/22434 [8:45:48<9:55:47, 2.55s/it] +2025-02-05 18:53:28 - ERROR - stderr - +2025-02-05 18:53:28 - ERROR - stderr - +2025-02-05 18:53:28 - INFO - stdout - {'loss': 0.708, 'grad_norm': 1.14577054977417, 'learning_rate': 1.4377077939429463e-05, 'epoch': 1.13} +2025-02-05 18:53:28 - ERROR - stderr - 38%|███▊ | 8416/22434 [8:45:48<9:55:47, 2.55s/it] +2025-02-05 18:53:31 - ERROR - stderr - 38%|███▊ | 8417/22434 [8:45:50<9:55:45, 2.55s/it] +2025-02-05 18:53:31 - ERROR - stderr - +2025-02-05 18:53:31 - ERROR - stderr - +2025-02-05 18:53:31 - INFO - stdout - {'loss': 0.7086, 'grad_norm': 1.0847517251968384, 'learning_rate': 
1.4375779796275336e-05, 'epoch': 1.13} +2025-02-05 18:53:31 - ERROR - stderr - 38%|███▊ | 8417/22434 [8:45:50<9:55:45, 2.55s/it] +2025-02-05 18:53:33 - ERROR - stderr - 38%|███▊ | 8418/22434 [8:45:53<9:53:58, 2.54s/it] +2025-02-05 18:53:33 - ERROR - stderr - +2025-02-05 18:53:33 - ERROR - stderr - +2025-02-05 18:53:33 - INFO - stdout - {'loss': 0.7373, 'grad_norm': 1.1408618688583374, 'learning_rate': 1.4374481561912266e-05, 'epoch': 1.13} +2025-02-05 18:53:33 - ERROR - stderr - 38%|███▊ | 8418/22434 [8:45:53<9:53:58, 2.54s/it] +2025-02-05 18:53:35 - ERROR - stderr - 38%|███▊ | 8419/22434 [8:45:55<9:46:03, 2.51s/it] +2025-02-05 18:53:36 - ERROR - stderr - +2025-02-05 18:53:36 - ERROR - stderr - +2025-02-05 18:53:36 - INFO - stdout - {'loss': 0.6288, 'grad_norm': 1.1123744249343872, 'learning_rate': 1.4373183236367312e-05, 'epoch': 1.13} +2025-02-05 18:53:36 - ERROR - stderr - 38%|███▊ | 8419/22434 [8:45:55<9:46:03, 2.51s/it] +2025-02-05 18:53:38 - ERROR - stderr - 38%|███▊ | 8420/22434 [8:45:58<9:42:01, 2.49s/it] +2025-02-05 18:53:38 - ERROR - stderr - +2025-02-05 18:53:38 - ERROR - stderr - +2025-02-05 18:53:38 - INFO - stdout - {'loss': 0.743, 'grad_norm': 1.0524399280548096, 'learning_rate': 1.437188481966754e-05, 'epoch': 1.13} +2025-02-05 18:53:38 - ERROR - stderr - 38%|███▊ | 8420/22434 [8:45:58<9:42:01, 2.49s/it] +2025-02-05 18:53:41 - ERROR - stderr - 38%|███▊ | 8421/22434 [8:46:00<9:48:18, 2.52s/it] +2025-02-05 18:53:41 - ERROR - stderr - +2025-02-05 18:53:41 - ERROR - stderr - +2025-02-05 18:53:41 - INFO - stdout - {'loss': 0.7414, 'grad_norm': 1.3016549348831177, 'learning_rate': 1.4370586311840014e-05, 'epoch': 1.13} +2025-02-05 18:53:41 - ERROR - stderr - 38%|███▊ | 8421/22434 [8:46:00<9:48:18, 2.52s/it] +2025-02-05 18:53:43 - ERROR - stderr - 38%|███▊ | 8422/22434 [8:46:03<9:44:14, 2.50s/it] +2025-02-05 18:53:43 - ERROR - stderr - +2025-02-05 18:53:43 - ERROR - stderr - +2025-02-05 18:53:43 - INFO - stdout - {'loss': 0.6367, 'grad_norm': 
1.1124123334884644, 'learning_rate': 1.4369287712911795e-05, 'epoch': 1.13} +2025-02-05 18:53:43 - ERROR - stderr - 38%|███▊ | 8422/22434 [8:46:03<9:44:14, 2.50s/it] +2025-02-05 18:53:45 - ERROR - stderr - 38%|███▊ | 8423/22434 [8:46:05<9:42:54, 2.50s/it] +2025-02-05 18:53:46 - ERROR - stderr - +2025-02-05 18:53:46 - ERROR - stderr - +2025-02-05 18:53:46 - INFO - stdout - {'loss': 0.7009, 'grad_norm': 1.0690258741378784, 'learning_rate': 1.4367989022909956e-05, 'epoch': 1.13} +2025-02-05 18:53:46 - ERROR - stderr - 38%|███▊ | 8423/22434 [8:46:05<9:42:54, 2.50s/it] +2025-02-05 18:53:48 - ERROR - stderr - 38%|███▊ | 8424/22434 [8:46:08<9:40:31, 2.49s/it] +2025-02-05 18:53:48 - ERROR - stderr - +2025-02-05 18:53:48 - ERROR - stderr - +2025-02-05 18:53:48 - INFO - stdout - {'loss': 0.7075, 'grad_norm': 1.1233340501785278, 'learning_rate': 1.436669024186157e-05, 'epoch': 1.13} +2025-02-05 18:53:48 - ERROR - stderr - 38%|███▊ | 8424/22434 [8:46:08<9:40:31, 2.49s/it] +2025-02-05 18:53:51 - ERROR - stderr - 38%|███▊ | 8425/22434 [8:46:10<9:53:59, 2.54s/it] +2025-02-05 18:53:51 - ERROR - stderr - +2025-02-05 18:53:51 - ERROR - stderr - +2025-02-05 18:53:51 - INFO - stdout - {'loss': 0.7889, 'grad_norm': 1.3396233320236206, 'learning_rate': 1.4365391369793697e-05, 'epoch': 1.13} +2025-02-05 18:53:51 - ERROR - stderr - 38%|███▊ | 8425/22434 [8:46:10<9:53:59, 2.54s/it] +2025-02-05 18:53:53 - ERROR - stderr - 38%|███▊ | 8426/22434 [8:46:13<9:49:49, 2.53s/it] +2025-02-05 18:53:53 - ERROR - stderr - +2025-02-05 18:53:53 - ERROR - stderr - +2025-02-05 18:53:53 - INFO - stdout - {'loss': 0.7138, 'grad_norm': 1.0610556602478027, 'learning_rate': 1.436409240673342e-05, 'epoch': 1.13} +2025-02-05 18:53:53 - ERROR - stderr - 38%|███▊ | 8426/22434 [8:46:13<9:49:49, 2.53s/it] +2025-02-05 18:53:56 - ERROR - stderr - 38%|███▊ | 8427/22434 [8:46:15<9:48:21, 2.52s/it] +2025-02-05 18:53:56 - ERROR - stderr - +2025-02-05 18:53:56 - ERROR - stderr - +2025-02-05 18:53:56 - INFO - stdout - 
{'loss': 0.799, 'grad_norm': 1.215868353843689, 'learning_rate': 1.4362793352707816e-05, 'epoch': 1.13} +2025-02-05 18:53:56 - ERROR - stderr - 38%|███▊ | 8427/22434 [8:46:15<9:48:21, 2.52s/it] +2025-02-05 18:53:58 - ERROR - stderr - 38%|███▊ | 8428/22434 [8:46:18<9:50:24, 2.53s/it] +2025-02-05 18:53:58 - ERROR - stderr - +2025-02-05 18:53:58 - ERROR - stderr - +2025-02-05 18:53:58 - INFO - stdout - {'loss': 0.5782, 'grad_norm': 0.9768369197845459, 'learning_rate': 1.4361494207743958e-05, 'epoch': 1.13} +2025-02-05 18:53:58 - ERROR - stderr - 38%|███▊ | 8428/22434 [8:46:18<9:50:24, 2.53s/it] +2025-02-05 18:54:01 - ERROR - stderr - 38%|███▊ | 8429/22434 [8:46:20<9:52:53, 2.54s/it] +2025-02-05 18:54:01 - ERROR - stderr - +2025-02-05 18:54:01 - ERROR - stderr - +2025-02-05 18:54:01 - INFO - stdout - {'loss': 0.7511, 'grad_norm': 1.1867334842681885, 'learning_rate': 1.4360194971868926e-05, 'epoch': 1.13} +2025-02-05 18:54:01 - ERROR - stderr - 38%|███▊ | 8429/22434 [8:46:21<9:52:53, 2.54s/it] +2025-02-05 18:54:03 - ERROR - stderr - 38%|███▊ | 8430/22434 [8:46:23<9:51:30, 2.53s/it] +2025-02-05 18:54:03 - ERROR - stderr - +2025-02-05 18:54:03 - ERROR - stderr - +2025-02-05 18:54:03 - INFO - stdout - {'loss': 0.7167, 'grad_norm': 1.1280359029769897, 'learning_rate': 1.4358895645109803e-05, 'epoch': 1.13} +2025-02-05 18:54:03 - ERROR - stderr - 38%|███▊ | 8430/22434 [8:46:23<9:51:30, 2.53s/it] +2025-02-05 18:54:06 - ERROR - stderr - 38%|███▊ | 8431/22434 [8:46:26<10:01:39, 2.58s/it] +2025-02-05 18:54:06 - ERROR - stderr - +2025-02-05 18:54:06 - ERROR - stderr - +2025-02-05 18:54:06 - INFO - stdout - {'loss': 0.7527, 'grad_norm': 1.242020845413208, 'learning_rate': 1.4357596227493672e-05, 'epoch': 1.13} +2025-02-05 18:54:06 - ERROR - stderr - 38%|███▊ | 8431/22434 [8:46:26<10:01:39, 2.58s/it] +2025-02-05 18:54:08 - ERROR - stderr - 38%|███▊ | 8432/22434 [8:46:28<9:59:10, 2.57s/it] +2025-02-05 18:54:08 - ERROR - stderr - +2025-02-05 18:54:08 - ERROR - stderr - +2025-02-05 
18:54:08 - INFO - stdout - {'loss': 0.7791, 'grad_norm': 1.1730339527130127, 'learning_rate': 1.4356296719047615e-05, 'epoch': 1.13} +2025-02-05 18:54:08 - ERROR - stderr - 38%|███▊ | 8432/22434 [8:46:28<9:59:10, 2.57s/it] +2025-02-05 18:54:11 - ERROR - stderr - 38%|███▊ | 8433/22434 [8:46:31<9:56:28, 2.56s/it] +2025-02-05 18:54:11 - ERROR - stderr - +2025-02-05 18:54:11 - ERROR - stderr - +2025-02-05 18:54:11 - INFO - stdout - {'loss': 0.7372, 'grad_norm': 1.1614855527877808, 'learning_rate': 1.4354997119798722e-05, 'epoch': 1.13} +2025-02-05 18:54:11 - ERROR - stderr - 38%|███▊ | 8433/22434 [8:46:31<9:56:28, 2.56s/it] +2025-02-05 18:54:14 - ERROR - stderr - 38%|███▊ | 8434/22434 [8:46:33<9:57:07, 2.56s/it] +2025-02-05 18:54:14 - ERROR - stderr - +2025-02-05 18:54:14 - ERROR - stderr - +2025-02-05 18:54:14 - INFO - stdout - {'loss': 0.8149, 'grad_norm': 1.2613730430603027, 'learning_rate': 1.4353697429774083e-05, 'epoch': 1.13} +2025-02-05 18:54:14 - ERROR - stderr - 38%|███▊ | 8434/22434 [8:46:33<9:57:07, 2.56s/it] +2025-02-05 18:54:16 - ERROR - stderr - 38%|███▊ | 8435/22434 [8:46:36<9:48:04, 2.52s/it] +2025-02-05 18:54:16 - ERROR - stderr - +2025-02-05 18:54:16 - ERROR - stderr - +2025-02-05 18:54:16 - INFO - stdout - {'loss': 0.7142, 'grad_norm': 1.1981089115142822, 'learning_rate': 1.4352397649000785e-05, 'epoch': 1.13} +2025-02-05 18:54:16 - ERROR - stderr - 38%|███▊ | 8435/22434 [8:46:36<9:48:04, 2.52s/it] +2025-02-05 18:54:19 - ERROR - stderr - 38%|███▊ | 8436/22434 [8:46:38<9:49:39, 2.53s/it] +2025-02-05 18:54:19 - ERROR - stderr - +2025-02-05 18:54:19 - ERROR - stderr - +2025-02-05 18:54:19 - INFO - stdout - {'loss': 0.7167, 'grad_norm': 1.134832739830017, 'learning_rate': 1.4351097777505924e-05, 'epoch': 1.13} +2025-02-05 18:54:19 - ERROR - stderr - 38%|███▊ | 8436/22434 [8:46:38<9:49:39, 2.53s/it] +2025-02-05 18:54:21 - ERROR - stderr - 38%|███▊ | 8437/22434 [8:46:41<9:46:43, 2.52s/it] +2025-02-05 18:54:21 - ERROR - stderr - +2025-02-05 18:54:21 - 
ERROR - stderr - +2025-02-05 18:54:21 - INFO - stdout - {'loss': 0.7638, 'grad_norm': 1.1421397924423218, 'learning_rate': 1.4349797815316593e-05, 'epoch': 1.13} +2025-02-05 18:54:21 - ERROR - stderr - 38%|███▊ | 8437/22434 [8:46:41<9:46:43, 2.52s/it] +2025-02-05 18:54:24 - ERROR - stderr - 38%|███▊ | 8438/22434 [8:46:43<9:55:16, 2.55s/it] +2025-02-05 18:54:24 - ERROR - stderr - +2025-02-05 18:54:24 - ERROR - stderr - +2025-02-05 18:54:24 - INFO - stdout - {'loss': 0.7985, 'grad_norm': 1.1641876697540283, 'learning_rate': 1.4348497762459887e-05, 'epoch': 1.13} +2025-02-05 18:54:24 - ERROR - stderr - 38%|███▊ | 8438/22434 [8:46:43<9:55:16, 2.55s/it] +2025-02-05 18:54:26 - ERROR - stderr - 38%|███▊ | 8439/22434 [8:46:46<9:49:14, 2.53s/it] +2025-02-05 18:54:26 - ERROR - stderr - +2025-02-05 18:54:26 - ERROR - stderr - +2025-02-05 18:54:26 - INFO - stdout - {'loss': 0.8014, 'grad_norm': 1.3237162828445435, 'learning_rate': 1.434719761896291e-05, 'epoch': 1.13} +2025-02-05 18:54:26 - ERROR - stderr - 38%|███▊ | 8439/22434 [8:46:46<9:49:14, 2.53s/it] +2025-02-05 18:54:29 - ERROR - stderr - 38%|███▊ | 8440/22434 [8:46:48<9:45:17, 2.51s/it] +2025-02-05 18:54:29 - ERROR - stderr - +2025-02-05 18:54:29 - ERROR - stderr - +2025-02-05 18:54:29 - INFO - stdout - {'loss': 0.7299, 'grad_norm': 1.1083369255065918, 'learning_rate': 1.4345897384852756e-05, 'epoch': 1.13} +2025-02-05 18:54:29 - ERROR - stderr - 38%|███▊ | 8440/22434 [8:46:48<9:45:17, 2.51s/it] +2025-02-05 18:54:31 - ERROR - stderr - 38%|███▊ | 8441/22434 [8:46:51<9:49:22, 2.53s/it] +2025-02-05 18:54:31 - ERROR - stderr - +2025-02-05 18:54:31 - ERROR - stderr - +2025-02-05 18:54:31 - INFO - stdout - {'loss': 0.6939, 'grad_norm': 1.1464701890945435, 'learning_rate': 1.434459706015653e-05, 'epoch': 1.13} +2025-02-05 18:54:31 - ERROR - stderr - 38%|███▊ | 8441/22434 [8:46:51<9:49:22, 2.53s/it] +2025-02-05 18:54:34 - ERROR - stderr - 38%|███▊ | 8442/22434 [8:46:53<9:43:00, 2.50s/it] +2025-02-05 18:54:34 - ERROR - stderr - 
+2025-02-05 18:54:34 - ERROR - stderr - +2025-02-05 18:54:34 - INFO - stdout - {'loss': 0.7446, 'grad_norm': 1.221779704093933, 'learning_rate': 1.4343296644901336e-05, 'epoch': 1.13} +2025-02-05 18:54:34 - ERROR - stderr - 38%|███▊ | 8442/22434 [8:46:53<9:43:00, 2.50s/it] +2025-02-05 18:54:36 - ERROR - stderr - 38%|███▊ | 8443/22434 [8:46:56<9:48:02, 2.52s/it] +2025-02-05 18:54:36 - ERROR - stderr - +2025-02-05 18:54:36 - ERROR - stderr - +2025-02-05 18:54:36 - INFO - stdout - {'loss': 0.7296, 'grad_norm': 1.1120717525482178, 'learning_rate': 1.434199613911428e-05, 'epoch': 1.13} +2025-02-05 18:54:36 - ERROR - stderr - 38%|███▊ | 8443/22434 [8:46:56<9:48:02, 2.52s/it] +2025-02-05 18:54:39 - ERROR - stderr - 38%|███▊ | 8444/22434 [8:46:58<9:44:29, 2.51s/it] +2025-02-05 18:54:39 - ERROR - stderr - +2025-02-05 18:54:39 - ERROR - stderr - +2025-02-05 18:54:39 - INFO - stdout - {'loss': 0.7336, 'grad_norm': 1.1548521518707275, 'learning_rate': 1.434069554282247e-05, 'epoch': 1.13} +2025-02-05 18:54:39 - ERROR - stderr - 38%|███▊ | 8444/22434 [8:46:58<9:44:29, 2.51s/it] +2025-02-05 18:54:42 - ERROR - stderr - 38%|███▊ | 8445/22434 [8:47:01<10:14:30, 2.64s/it] +2025-02-05 18:54:42 - ERROR - stderr - +2025-02-05 18:54:42 - ERROR - stderr - +2025-02-05 18:54:42 - INFO - stdout - {'loss': 0.7568, 'grad_norm': 1.1772726774215698, 'learning_rate': 1.433939485605301e-05, 'epoch': 1.13} +2025-02-05 18:54:42 - ERROR - stderr - 38%|███▊ | 8445/22434 [8:47:01<10:14:30, 2.64s/it] +2025-02-05 18:54:44 - ERROR - stderr - 38%|███▊ | 8446/22434 [8:47:04<10:02:27, 2.58s/it] +2025-02-05 18:54:44 - ERROR - stderr - +2025-02-05 18:54:44 - ERROR - stderr - +2025-02-05 18:54:44 - INFO - stdout - {'loss': 0.746, 'grad_norm': 1.1283254623413086, 'learning_rate': 1.4338094078833022e-05, 'epoch': 1.13} +2025-02-05 18:54:44 - ERROR - stderr - 38%|███▊ | 8446/22434 [8:47:04<10:02:27, 2.58s/it] +2025-02-05 18:54:47 - ERROR - stderr - 38%|███▊ | 8447/22434 [8:47:06<9:59:03, 2.57s/it] +2025-02-05 
18:54:47 - ERROR - stderr - +2025-02-05 18:54:47 - ERROR - stderr - +2025-02-05 18:54:47 - INFO - stdout - {'loss': 0.6375, 'grad_norm': 1.069089651107788, 'learning_rate': 1.4336793211189612e-05, 'epoch': 1.13} +2025-02-05 18:54:47 - ERROR - stderr - 38%|███▊ | 8447/22434 [8:47:06<9:59:03, 2.57s/it] +2025-02-05 18:54:49 - ERROR - stderr - 38%|███▊ | 8448/22434 [8:47:09<9:54:14, 2.55s/it] +2025-02-05 18:54:49 - ERROR - stderr - +2025-02-05 18:54:49 - ERROR - stderr - +2025-02-05 18:54:49 - INFO - stdout - {'loss': 0.7728, 'grad_norm': 1.1044714450836182, 'learning_rate': 1.4335492253149901e-05, 'epoch': 1.13} +2025-02-05 18:54:49 - ERROR - stderr - 38%|███▊ | 8448/22434 [8:47:09<9:54:14, 2.55s/it] +2025-02-05 18:54:52 - ERROR - stderr - 38%|███▊ | 8449/22434 [8:47:12<10:17:36, 2.65s/it] +2025-02-05 18:54:52 - ERROR - stderr - +2025-02-05 18:54:52 - ERROR - stderr - +2025-02-05 18:54:52 - INFO - stdout - {'loss': 0.7272, 'grad_norm': 1.101341724395752, 'learning_rate': 1.4334191204740997e-05, 'epoch': 1.13} +2025-02-05 18:54:52 - ERROR - stderr - 38%|███▊ | 8449/22434 [8:47:12<10:17:36, 2.65s/it] +2025-02-05 18:54:55 - ERROR - stderr - 38%|███▊ | 8450/22434 [8:47:15<10:29:43, 2.70s/it] +2025-02-05 18:54:55 - ERROR - stderr - +2025-02-05 18:54:55 - ERROR - stderr - +2025-02-05 18:54:55 - INFO - stdout - {'loss': 0.8056, 'grad_norm': 1.1498056650161743, 'learning_rate': 1.4332890065990027e-05, 'epoch': 1.13} +2025-02-05 18:54:55 - ERROR - stderr - 38%|███▊ | 8450/22434 [8:47:15<10:29:43, 2.70s/it] +2025-02-05 18:54:57 - ERROR - stderr - 38%|███▊ | 8451/22434 [8:47:17<10:14:24, 2.64s/it] +2025-02-05 18:54:57 - ERROR - stderr - +2025-02-05 18:54:57 - ERROR - stderr - +2025-02-05 18:54:57 - INFO - stdout - {'loss': 0.7414, 'grad_norm': 1.107475996017456, 'learning_rate': 1.4331588836924111e-05, 'epoch': 1.13} +2025-02-05 18:54:57 - ERROR - stderr - 38%|███▊ | 8451/22434 [8:47:17<10:14:24, 2.64s/it] +2025-02-05 18:55:00 - ERROR - stderr - 38%|███▊ | 8452/22434 
[8:47:20<10:13:17, 2.63s/it] +2025-02-05 18:55:00 - ERROR - stderr - +2025-02-05 18:55:00 - ERROR - stderr - +2025-02-05 18:55:00 - INFO - stdout - {'loss': 0.7202, 'grad_norm': 1.1508893966674805, 'learning_rate': 1.4330287517570367e-05, 'epoch': 1.13} +2025-02-05 18:55:00 - ERROR - stderr - 38%|███▊ | 8452/22434 [8:47:20<10:13:17, 2.63s/it] +2025-02-05 18:55:02 - ERROR - stderr - 38%|███▊ | 8453/22434 [8:47:22<10:02:55, 2.59s/it] +2025-02-05 18:55:02 - ERROR - stderr - +2025-02-05 18:55:02 - ERROR - stderr - +2025-02-05 18:55:02 - INFO - stdout - {'loss': 0.8509, 'grad_norm': 1.3827824592590332, 'learning_rate': 1.4328986107955926e-05, 'epoch': 1.13} +2025-02-05 18:55:02 - ERROR - stderr - 38%|███▊ | 8453/22434 [8:47:22<10:02:55, 2.59s/it] +2025-02-05 18:55:05 - ERROR - stderr - 38%|███▊ | 8454/22434 [8:47:25<9:56:57, 2.56s/it] +2025-02-05 18:55:05 - ERROR - stderr - +2025-02-05 18:55:05 - ERROR - stderr - +2025-02-05 18:55:05 - INFO - stdout - {'loss': 0.7052, 'grad_norm': 1.0723426342010498, 'learning_rate': 1.4327684608107912e-05, 'epoch': 1.13} +2025-02-05 18:55:05 - ERROR - stderr - 38%|███▊ | 8454/22434 [8:47:25<9:56:57, 2.56s/it] +2025-02-05 18:55:07 - ERROR - stderr - 38%|███▊ | 8455/22434 [8:47:27<9:59:41, 2.57s/it] +2025-02-05 18:55:08 - ERROR - stderr - +2025-02-05 18:55:08 - ERROR - stderr - +2025-02-05 18:55:08 - INFO - stdout - {'loss': 0.7492, 'grad_norm': 1.161230206489563, 'learning_rate': 1.4326383018053451e-05, 'epoch': 1.13} +2025-02-05 18:55:08 - ERROR - stderr - 38%|███▊ | 8455/22434 [8:47:27<9:59:41, 2.57s/it] +2025-02-05 18:55:10 - ERROR - stderr - 38%|███▊ | 8456/22434 [8:47:30<10:01:16, 2.58s/it] +2025-02-05 18:55:10 - ERROR - stderr - +2025-02-05 18:55:10 - ERROR - stderr - +2025-02-05 18:55:10 - INFO - stdout - {'loss': 0.6881, 'grad_norm': 1.2608108520507812, 'learning_rate': 1.4325081337819681e-05, 'epoch': 1.13} +2025-02-05 18:55:10 - ERROR - stderr - 38%|███▊ | 8456/22434 [8:47:30<10:01:16, 2.58s/it] +2025-02-05 18:55:13 - ERROR - 
stderr - 38%|███▊ | 8457/22434 [8:47:32<9:57:19, 2.56s/it] +2025-02-05 18:55:13 - ERROR - stderr - +2025-02-05 18:55:13 - ERROR - stderr - +2025-02-05 18:55:13 - INFO - stdout - {'loss': 0.7075, 'grad_norm': 1.0458427667617798, 'learning_rate': 1.4323779567433725e-05, 'epoch': 1.13} +2025-02-05 18:55:13 - ERROR - stderr - 38%|███▊ | 8457/22434 [8:47:32<9:57:19, 2.56s/it] +2025-02-05 18:55:15 - ERROR - stderr - 38%|███▊ | 8458/22434 [8:47:35<9:51:48, 2.54s/it] +2025-02-05 18:55:15 - ERROR - stderr - +2025-02-05 18:55:15 - ERROR - stderr - +2025-02-05 18:55:15 - INFO - stdout - {'loss': 0.7117, 'grad_norm': 1.1560204029083252, 'learning_rate': 1.4322477706922721e-05, 'epoch': 1.13} +2025-02-05 18:55:15 - ERROR - stderr - 38%|███▊ | 8458/22434 [8:47:35<9:51:48, 2.54s/it] +2025-02-05 18:55:18 - ERROR - stderr - 38%|███▊ | 8459/22434 [8:47:37<9:46:59, 2.52s/it] +2025-02-05 18:55:18 - ERROR - stderr - +2025-02-05 18:55:18 - ERROR - stderr - +2025-02-05 18:55:18 - INFO - stdout - {'loss': 0.6497, 'grad_norm': 1.1166564226150513, 'learning_rate': 1.4321175756313807e-05, 'epoch': 1.13} +2025-02-05 18:55:18 - ERROR - stderr - 38%|███▊ | 8459/22434 [8:47:37<9:46:59, 2.52s/it] +2025-02-05 18:55:20 - ERROR - stderr - 38%|███▊ | 8460/22434 [8:47:40<9:40:47, 2.49s/it] +2025-02-05 18:55:20 - ERROR - stderr - +2025-02-05 18:55:20 - ERROR - stderr - +2025-02-05 18:55:20 - INFO - stdout - {'loss': 0.7421, 'grad_norm': 1.0914586782455444, 'learning_rate': 1.431987371563412e-05, 'epoch': 1.13} +2025-02-05 18:55:20 - ERROR - stderr - 38%|███▊ | 8460/22434 [8:47:40<9:40:47, 2.49s/it] +2025-02-05 18:55:23 - ERROR - stderr - 38%|███▊ | 8461/22434 [8:47:42<9:43:06, 2.50s/it] +2025-02-05 18:55:23 - ERROR - stderr - +2025-02-05 18:55:23 - ERROR - stderr - +2025-02-05 18:55:23 - INFO - stdout - {'loss': 0.6725, 'grad_norm': 1.208411693572998, 'learning_rate': 1.4318571584910798e-05, 'epoch': 1.13} +2025-02-05 18:55:23 - ERROR - stderr - 38%|███▊ | 8461/22434 [8:47:42<9:43:06, 2.50s/it] 
+2025-02-05 18:55:25 - ERROR - stderr - 38%|███▊ | 8462/22434 [8:47:45<9:46:33, 2.52s/it] +2025-02-05 18:55:25 - ERROR - stderr - +2025-02-05 18:55:25 - ERROR - stderr - +2025-02-05 18:55:25 - INFO - stdout - {'loss': 0.6785, 'grad_norm': 1.0694564580917358, 'learning_rate': 1.4317269364170985e-05, 'epoch': 1.13} +2025-02-05 18:55:25 - ERROR - stderr - 38%|███▊ | 8462/22434 [8:47:45<9:46:33, 2.52s/it] +2025-02-05 18:55:28 - ERROR - stderr - 38%|███▊ | 8463/22434 [8:47:47<9:44:06, 2.51s/it] +2025-02-05 18:55:28 - ERROR - stderr - +2025-02-05 18:55:28 - ERROR - stderr - +2025-02-05 18:55:28 - INFO - stdout - {'loss': 0.7433, 'grad_norm': 1.507333517074585, 'learning_rate': 1.4315967053441822e-05, 'epoch': 1.13} +2025-02-05 18:55:28 - ERROR - stderr - 38%|███▊ | 8463/22434 [8:47:47<9:44:06, 2.51s/it] +2025-02-05 18:55:30 - ERROR - stderr - 38%|███▊ | 8464/22434 [8:47:50<9:43:34, 2.51s/it] +2025-02-05 18:55:30 - ERROR - stderr - +2025-02-05 18:55:30 - ERROR - stderr - +2025-02-05 18:55:30 - INFO - stdout - {'loss': 0.7372, 'grad_norm': 1.075852394104004, 'learning_rate': 1.4314664652750454e-05, 'epoch': 1.13} +2025-02-05 18:55:30 - ERROR - stderr - 38%|███▊ | 8464/22434 [8:47:50<9:43:34, 2.51s/it] +2025-02-05 18:55:33 - ERROR - stderr - 38%|███▊ | 8465/22434 [8:47:52<9:45:30, 2.51s/it] +2025-02-05 18:55:33 - ERROR - stderr - +2025-02-05 18:55:33 - ERROR - stderr - +2025-02-05 18:55:33 - INFO - stdout - {'loss': 0.7077, 'grad_norm': 1.0763435363769531, 'learning_rate': 1.431336216212403e-05, 'epoch': 1.13} +2025-02-05 18:55:33 - ERROR - stderr - 38%|███▊ | 8465/22434 [8:47:52<9:45:30, 2.51s/it] +2025-02-05 18:55:35 - ERROR - stderr - 38%|███▊ | 8466/22434 [8:47:55<9:47:10, 2.52s/it] +2025-02-05 18:55:35 - ERROR - stderr - +2025-02-05 18:55:35 - ERROR - stderr - +2025-02-05 18:55:35 - INFO - stdout - {'loss': 0.7248, 'grad_norm': 1.2975422143936157, 'learning_rate': 1.4312059581589704e-05, 'epoch': 1.13} +2025-02-05 18:55:35 - ERROR - stderr - 38%|███▊ | 8466/22434 
[8:47:55<9:47:10, 2.52s/it] +2025-02-05 18:55:38 - ERROR - stderr - 38%|███▊ | 8467/22434 [8:47:57<9:46:52, 2.52s/it] +2025-02-05 18:55:38 - ERROR - stderr - +2025-02-05 18:55:38 - ERROR - stderr - +2025-02-05 18:55:38 - INFO - stdout - {'loss': 0.7831, 'grad_norm': 1.3175022602081299, 'learning_rate': 1.4310756911174619e-05, 'epoch': 1.13} +2025-02-05 18:55:38 - ERROR - stderr - 38%|███▊ | 8467/22434 [8:47:57<9:46:52, 2.52s/it] +2025-02-05 18:55:40 - ERROR - stderr - 38%|███▊ | 8468/22434 [8:48:00<9:44:52, 2.51s/it] +2025-02-05 18:55:40 - ERROR - stderr - +2025-02-05 18:55:40 - ERROR - stderr - +2025-02-05 18:55:40 - INFO - stdout - {'loss': 0.8005, 'grad_norm': 1.2464368343353271, 'learning_rate': 1.4309454150905933e-05, 'epoch': 1.13} +2025-02-05 18:55:40 - ERROR - stderr - 38%|███▊ | 8468/22434 [8:48:00<9:44:52, 2.51s/it] +2025-02-05 18:55:43 - ERROR - stderr - 38%|███▊ | 8469/22434 [8:48:02<9:41:35, 2.50s/it] +2025-02-05 18:55:43 - ERROR - stderr - +2025-02-05 18:55:43 - ERROR - stderr - +2025-02-05 18:55:43 - INFO - stdout - {'loss': 0.7271, 'grad_norm': 1.1143320798873901, 'learning_rate': 1.4308151300810797e-05, 'epoch': 1.13} +2025-02-05 18:55:43 - ERROR - stderr - 38%|███▊ | 8469/22434 [8:48:02<9:41:35, 2.50s/it] +2025-02-05 18:55:45 - ERROR - stderr - 38%|███▊ | 8470/22434 [8:48:05<9:41:47, 2.50s/it] +2025-02-05 18:55:45 - ERROR - stderr - +2025-02-05 18:55:45 - ERROR - stderr - +2025-02-05 18:55:45 - INFO - stdout - {'loss': 0.7555, 'grad_norm': 1.313994288444519, 'learning_rate': 1.4306848360916368e-05, 'epoch': 1.13} +2025-02-05 18:55:45 - ERROR - stderr - 38%|███▊ | 8470/22434 [8:48:05<9:41:47, 2.50s/it] +2025-02-05 18:55:48 - ERROR - stderr - 38%|███▊ | 8471/22434 [8:48:07<9:40:24, 2.49s/it] +2025-02-05 18:55:48 - ERROR - stderr - +2025-02-05 18:55:48 - ERROR - stderr - +2025-02-05 18:55:48 - INFO - stdout - {'loss': 0.7659, 'grad_norm': 1.2605969905853271, 'learning_rate': 1.4305545331249807e-05, 'epoch': 1.13} +2025-02-05 18:55:48 - ERROR - stderr 
- 38%|███▊ | 8471/22434 [8:48:07<9:40:24, 2.49s/it] +2025-02-05 18:55:50 - ERROR - stderr - 38%|███▊ | 8472/22434 [8:48:10<9:43:33, 2.51s/it] +2025-02-05 18:55:50 - ERROR - stderr - +2025-02-05 18:55:50 - ERROR - stderr - +2025-02-05 18:55:50 - INFO - stdout - {'loss': 0.7244, 'grad_norm': 1.1556403636932373, 'learning_rate': 1.4304242211838277e-05, 'epoch': 1.13} +2025-02-05 18:55:50 - ERROR - stderr - 38%|███▊ | 8472/22434 [8:48:10<9:43:33, 2.51s/it] +2025-02-05 18:55:53 - ERROR - stderr - 38%|███▊ | 8473/22434 [8:48:12<9:48:30, 2.53s/it] +2025-02-05 18:55:53 - ERROR - stderr - +2025-02-05 18:55:53 - ERROR - stderr - +2025-02-05 18:55:53 - INFO - stdout - {'loss': 0.7212, 'grad_norm': 1.0986402034759521, 'learning_rate': 1.4302939002708933e-05, 'epoch': 1.13} +2025-02-05 18:55:53 - ERROR - stderr - 38%|███▊ | 8473/22434 [8:48:13<9:48:30, 2.53s/it] +2025-02-05 18:55:55 - ERROR - stderr - 38%|███▊ | 8474/22434 [8:48:15<9:49:50, 2.54s/it] +2025-02-05 18:55:55 - ERROR - stderr - +2025-02-05 18:55:55 - ERROR - stderr - +2025-02-05 18:55:55 - INFO - stdout - {'loss': 0.7807, 'grad_norm': 1.2066212892532349, 'learning_rate': 1.4301635703888946e-05, 'epoch': 1.13} +2025-02-05 18:55:55 - ERROR - stderr - 38%|███▊ | 8474/22434 [8:48:15<9:49:50, 2.54s/it] +2025-02-05 18:55:58 - ERROR - stderr - 38%|███▊ | 8475/22434 [8:48:17<9:43:53, 2.51s/it] +2025-02-05 18:55:58 - ERROR - stderr - +2025-02-05 18:55:58 - ERROR - stderr - +2025-02-05 18:55:58 - INFO - stdout - {'loss': 0.7074, 'grad_norm': 1.251274824142456, 'learning_rate': 1.4300332315405476e-05, 'epoch': 1.13} +2025-02-05 18:55:58 - ERROR - stderr - 38%|███▊ | 8475/22434 [8:48:18<9:43:53, 2.51s/it] +2025-02-05 18:56:00 - ERROR - stderr - 38%|███▊ | 8476/22434 [8:48:20<9:57:11, 2.57s/it] +2025-02-05 18:56:00 - ERROR - stderr - +2025-02-05 18:56:00 - ERROR - stderr - +2025-02-05 18:56:00 - INFO - stdout - {'loss': 0.7749, 'grad_norm': 1.191537618637085, 'learning_rate': 1.4299028837285693e-05, 'epoch': 1.13} +2025-02-05 
18:56:00 - ERROR - stderr - 38%|███▊ | 8476/22434 [8:48:20<9:57:11, 2.57s/it] +2025-02-05 18:56:03 - ERROR - stderr - 38%|███▊ | 8477/22434 [8:48:23<9:56:09, 2.56s/it] +2025-02-05 18:56:03 - ERROR - stderr - +2025-02-05 18:56:03 - ERROR - stderr - +2025-02-05 18:56:03 - INFO - stdout - {'loss': 0.7906, 'grad_norm': 1.2016936540603638, 'learning_rate': 1.429772526955677e-05, 'epoch': 1.13} +2025-02-05 18:56:03 - ERROR - stderr - 38%|███▊ | 8477/22434 [8:48:23<9:56:09, 2.56s/it] +2025-02-05 18:56:05 - ERROR - stderr - 38%|███▊ | 8478/22434 [8:48:25<9:54:31, 2.56s/it] +2025-02-05 18:56:06 - ERROR - stderr - +2025-02-05 18:56:06 - ERROR - stderr - +2025-02-05 18:56:06 - INFO - stdout - {'loss': 0.7332, 'grad_norm': 1.271976351737976, 'learning_rate': 1.4296421612245877e-05, 'epoch': 1.13} +2025-02-05 18:56:06 - ERROR - stderr - 38%|███▊ | 8478/22434 [8:48:25<9:54:31, 2.56s/it] +2025-02-05 18:56:08 - ERROR - stderr - 38%|███▊ | 8479/22434 [8:48:28<9:49:03, 2.53s/it] +2025-02-05 18:56:08 - ERROR - stderr - +2025-02-05 18:56:08 - ERROR - stderr - +2025-02-05 18:56:08 - INFO - stdout - {'loss': 0.6792, 'grad_norm': 1.0917885303497314, 'learning_rate': 1.4295117865380185e-05, 'epoch': 1.13} +2025-02-05 18:56:08 - ERROR - stderr - 38%|███▊ | 8479/22434 [8:48:28<9:49:03, 2.53s/it] +2025-02-05 18:56:11 - ERROR - stderr - 38%|███▊ | 8480/22434 [8:48:30<9:52:38, 2.55s/it] +2025-02-05 18:56:11 - ERROR - stderr - +2025-02-05 18:56:11 - ERROR - stderr - +2025-02-05 18:56:11 - INFO - stdout - {'loss': 0.8026, 'grad_norm': 1.1485202312469482, 'learning_rate': 1.4293814028986874e-05, 'epoch': 1.13} +2025-02-05 18:56:11 - ERROR - stderr - 38%|███▊ | 8480/22434 [8:48:30<9:52:38, 2.55s/it] +2025-02-05 18:56:13 - ERROR - stderr - 38%|███▊ | 8481/22434 [8:48:33<9:50:27, 2.54s/it] +2025-02-05 18:56:13 - ERROR - stderr - +2025-02-05 18:56:13 - ERROR - stderr - +2025-02-05 18:56:13 - INFO - stdout - {'loss': 0.7674, 'grad_norm': 1.1456812620162964, 'learning_rate': 1.4292510103093115e-05, 
'epoch': 1.13} +2025-02-05 18:56:13 - ERROR - stderr - 38%|███▊ | 8481/22434 [8:48:33<9:50:27, 2.54s/it] +2025-02-05 18:56:16 - ERROR - stderr - 38%|███▊ | 8482/22434 [8:48:35<9:45:17, 2.52s/it] +2025-02-05 18:56:16 - ERROR - stderr - +2025-02-05 18:56:16 - ERROR - stderr - +2025-02-05 18:56:16 - INFO - stdout - {'loss': 0.7397, 'grad_norm': 1.106204867362976, 'learning_rate': 1.429120608772609e-05, 'epoch': 1.13} +2025-02-05 18:56:16 - ERROR - stderr - 38%|███▊ | 8482/22434 [8:48:35<9:45:17, 2.52s/it] +2025-02-05 18:56:18 - ERROR - stderr - 38%|███▊ | 8483/22434 [8:48:38<9:50:34, 2.54s/it] +2025-02-05 18:56:18 - ERROR - stderr - +2025-02-05 18:56:18 - ERROR - stderr - +2025-02-05 18:56:18 - INFO - stdout - {'loss': 0.7568, 'grad_norm': 1.2597233057022095, 'learning_rate': 1.4289901982912983e-05, 'epoch': 1.13} +2025-02-05 18:56:18 - ERROR - stderr - 38%|███▊ | 8483/22434 [8:48:38<9:50:34, 2.54s/it] +2025-02-05 18:56:21 - ERROR - stderr - 38%|███▊ | 8484/22434 [8:48:40<9:52:58, 2.55s/it] +2025-02-05 18:56:21 - ERROR - stderr - +2025-02-05 18:56:21 - ERROR - stderr - +2025-02-05 18:56:21 - INFO - stdout - {'loss': 0.6722, 'grad_norm': 1.0477244853973389, 'learning_rate': 1.4288597788680974e-05, 'epoch': 1.13} +2025-02-05 18:56:21 - ERROR - stderr - 38%|███▊ | 8484/22434 [8:48:41<9:52:58, 2.55s/it] +2025-02-05 18:56:23 - ERROR - stderr - 38%|███▊ | 8485/22434 [8:48:43<9:51:03, 2.54s/it] +2025-02-05 18:56:23 - ERROR - stderr - +2025-02-05 18:56:23 - ERROR - stderr - +2025-02-05 18:56:23 - INFO - stdout - {'loss': 0.8004, 'grad_norm': 1.1919598579406738, 'learning_rate': 1.4287293505057248e-05, 'epoch': 1.13} +2025-02-05 18:56:23 - ERROR - stderr - 38%|███▊ | 8485/22434 [8:48:43<9:51:03, 2.54s/it] +2025-02-05 18:56:26 - ERROR - stderr - 38%|███▊ | 8486/22434 [8:48:46<10:07:51, 2.61s/it] +2025-02-05 18:56:26 - ERROR - stderr - +2025-02-05 18:56:26 - ERROR - stderr - +2025-02-05 18:56:26 - INFO - stdout - {'loss': 0.7372, 'grad_norm': 1.2121471166610718, 'learning_rate': 
1.4285989132068988e-05, 'epoch': 1.13} +2025-02-05 18:56:26 - ERROR - stderr - 38%|███▊ | 8486/22434 [8:48:46<10:07:51, 2.61s/it] +2025-02-05 18:56:29 - ERROR - stderr - 38%|███▊ | 8487/22434 [8:48:48<10:11:21, 2.63s/it] +2025-02-05 18:56:29 - ERROR - stderr - +2025-02-05 18:56:29 - ERROR - stderr - +2025-02-05 18:56:29 - INFO - stdout - {'loss': 0.7802, 'grad_norm': 1.2106009721755981, 'learning_rate': 1.4284684669743387e-05, 'epoch': 1.13} +2025-02-05 18:56:29 - ERROR - stderr - 38%|███▊ | 8487/22434 [8:48:49<10:11:21, 2.63s/it] +2025-02-05 18:56:31 - ERROR - stderr - 38%|███▊ | 8488/22434 [8:48:51<10:07:01, 2.61s/it] +2025-02-05 18:56:31 - ERROR - stderr - +2025-02-05 18:56:31 - ERROR - stderr - +2025-02-05 18:56:31 - INFO - stdout - {'loss': 0.75, 'grad_norm': 1.272891879081726, 'learning_rate': 1.4283380118107636e-05, 'epoch': 1.14} +2025-02-05 18:56:31 - ERROR - stderr - 38%|███▊ | 8488/22434 [8:48:51<10:07:01, 2.61s/it] +2025-02-05 18:56:34 - ERROR - stderr - 38%|███▊ | 8489/22434 [8:48:54<9:58:39, 2.58s/it] +2025-02-05 18:56:34 - ERROR - stderr - +2025-02-05 18:56:34 - ERROR - stderr - +2025-02-05 18:56:34 - INFO - stdout - {'loss': 0.625, 'grad_norm': 1.025378704071045, 'learning_rate': 1.4282075477188923e-05, 'epoch': 1.14} +2025-02-05 18:56:34 - ERROR - stderr - 38%|███▊ | 8489/22434 [8:48:54<9:58:39, 2.58s/it] +2025-02-05 18:56:36 - ERROR - stderr - 38%|███▊ | 8490/22434 [8:48:56<10:04:48, 2.60s/it] +2025-02-05 18:56:36 - ERROR - stderr - +2025-02-05 18:56:36 - ERROR - stderr - +2025-02-05 18:56:36 - INFO - stdout - {'loss': 0.6792, 'grad_norm': 1.1192607879638672, 'learning_rate': 1.4280770747014445e-05, 'epoch': 1.14} +2025-02-05 18:56:36 - ERROR - stderr - 38%|███▊ | 8490/22434 [8:48:56<10:04:48, 2.60s/it] +2025-02-05 18:56:39 - ERROR - stderr - 38%|███▊ | 8491/22434 [8:48:59<10:08:55, 2.62s/it] +2025-02-05 18:56:39 - ERROR - stderr - +2025-02-05 18:56:39 - ERROR - stderr - +2025-02-05 18:56:39 - INFO - stdout - {'loss': 0.7806, 'grad_norm': 
1.1058380603790283, 'learning_rate': 1.4279465927611399e-05, 'epoch': 1.14} +2025-02-05 18:56:39 - ERROR - stderr - 38%|███▊ | 8491/22434 [8:48:59<10:08:55, 2.62s/it] +2025-02-05 18:56:42 - ERROR - stderr - 38%|███▊ | 8492/22434 [8:49:01<10:00:54, 2.59s/it] +2025-02-05 18:56:42 - ERROR - stderr - +2025-02-05 18:56:42 - ERROR - stderr - +2025-02-05 18:56:42 - INFO - stdout - {'loss': 0.7497, 'grad_norm': 1.272824764251709, 'learning_rate': 1.427816101900698e-05, 'epoch': 1.14} +2025-02-05 18:56:42 - ERROR - stderr - 38%|███▊ | 8492/22434 [8:49:01<10:00:54, 2.59s/it] +2025-02-05 18:56:44 - ERROR - stderr - 38%|███▊ | 8493/22434 [8:49:04<9:53:54, 2.56s/it] +2025-02-05 18:56:44 - ERROR - stderr - +2025-02-05 18:56:44 - ERROR - stderr - +2025-02-05 18:56:44 - INFO - stdout - {'loss': 0.7275, 'grad_norm': 1.1337007284164429, 'learning_rate': 1.4276856021228387e-05, 'epoch': 1.14} +2025-02-05 18:56:44 - ERROR - stderr - 38%|███▊ | 8493/22434 [8:49:04<9:53:54, 2.56s/it] +2025-02-05 18:56:47 - ERROR - stderr - 38%|███▊ | 8494/22434 [8:49:06<9:48:06, 2.53s/it] +2025-02-05 18:56:47 - ERROR - stderr - +2025-02-05 18:56:47 - ERROR - stderr - +2025-02-05 18:56:47 - INFO - stdout - {'loss': 0.7108, 'grad_norm': 1.1484978199005127, 'learning_rate': 1.4275550934302822e-05, 'epoch': 1.14} +2025-02-05 18:56:47 - ERROR - stderr - 38%|███▊ | 8494/22434 [8:49:06<9:48:06, 2.53s/it] +2025-02-05 18:56:49 - ERROR - stderr - 38%|███▊ | 8495/22434 [8:49:09<9:45:41, 2.52s/it] +2025-02-05 18:56:49 - ERROR - stderr - +2025-02-05 18:56:49 - ERROR - stderr - +2025-02-05 18:56:49 - INFO - stdout - {'loss': 0.7246, 'grad_norm': 1.343007206916809, 'learning_rate': 1.4274245758257492e-05, 'epoch': 1.14} +2025-02-05 18:56:49 - ERROR - stderr - 38%|███▊ | 8495/22434 [8:49:09<9:45:41, 2.52s/it] +2025-02-05 18:56:52 - ERROR - stderr - 38%|███▊ | 8496/22434 [8:49:11<9:45:02, 2.52s/it] +2025-02-05 18:56:52 - ERROR - stderr - +2025-02-05 18:56:52 - ERROR - stderr - +2025-02-05 18:56:52 - INFO - stdout - 
{'loss': 0.6656, 'grad_norm': 1.1766449213027954, 'learning_rate': 1.4272940493119596e-05, 'epoch': 1.14} +2025-02-05 18:56:52 - ERROR - stderr - 38%|███▊ | 8496/22434 [8:49:11<9:45:02, 2.52s/it] +2025-02-05 18:56:54 - ERROR - stderr - 38%|███▊ | 8497/22434 [8:49:14<9:47:28, 2.53s/it] +2025-02-05 18:56:54 - ERROR - stderr - +2025-02-05 18:56:54 - ERROR - stderr - +2025-02-05 18:56:54 - INFO - stdout - {'loss': 0.7072, 'grad_norm': 1.1537761688232422, 'learning_rate': 1.4271635138916344e-05, 'epoch': 1.14} +2025-02-05 18:56:54 - ERROR - stderr - 38%|███▊ | 8497/22434 [8:49:14<9:47:28, 2.53s/it] +2025-02-05 18:56:57 - ERROR - stderr - 38%|███▊ | 8498/22434 [8:49:17<10:04:42, 2.60s/it] +2025-02-05 18:56:57 - ERROR - stderr - +2025-02-05 18:56:57 - ERROR - stderr - +2025-02-05 18:56:57 - INFO - stdout - {'loss': 0.7172, 'grad_norm': 1.1074657440185547, 'learning_rate': 1.427032969567495e-05, 'epoch': 1.14} +2025-02-05 18:56:57 - ERROR - stderr - 38%|███▊ | 8498/22434 [8:49:17<10:04:42, 2.60s/it] +2025-02-05 18:56:59 - ERROR - stderr - 38%|███▊ | 8499/22434 [8:49:19<9:53:26, 2.56s/it] +2025-02-05 18:56:59 - ERROR - stderr - +2025-02-05 18:56:59 - ERROR - stderr - +2025-02-05 18:56:59 - INFO - stdout - {'loss': 0.7512, 'grad_norm': 1.1687147617340088, 'learning_rate': 1.4269024163422614e-05, 'epoch': 1.14} +2025-02-05 18:56:59 - ERROR - stderr - 38%|███▊ | 8499/22434 [8:49:19<9:53:26, 2.56s/it] +2025-02-05 18:57:02 - ERROR - stderr - 38%|███▊ | 8500/22434 [8:49:22<9:52:44, 2.55s/it] +2025-02-05 18:57:02 - ERROR - stderr - +2025-02-05 18:57:02 - ERROR - stderr - +2025-02-05 18:57:02 - INFO - stdout - {'loss': 0.7302, 'grad_norm': 1.1800237894058228, 'learning_rate': 1.4267718542186557e-05, 'epoch': 1.14} +2025-02-05 18:57:02 - ERROR - stderr - 38%|███▊ | 8500/22434 [8:49:22<9:52:44, 2.55s/it] +2025-02-05 18:57:04 - ERROR - stderr - 38%|███▊ | 8501/22434 [8:49:24<9:45:46, 2.52s/it] +2025-02-05 18:57:04 - ERROR - stderr - +2025-02-05 18:57:04 - ERROR - stderr - +2025-02-05 
18:57:04 - INFO - stdout - {'loss': 0.7327, 'grad_norm': 1.1873095035552979, 'learning_rate': 1.4266412831993991e-05, 'epoch': 1.14} +2025-02-05 18:57:04 - ERROR - stderr - 38%|███▊ | 8501/22434 [8:49:24<9:45:46, 2.52s/it] +2025-02-05 18:57:07 - ERROR - stderr - 38%|███▊ | 8502/22434 [8:49:27<9:42:07, 2.51s/it] +2025-02-05 18:57:07 - ERROR - stderr - +2025-02-05 18:57:07 - ERROR - stderr - +2025-02-05 18:57:07 - INFO - stdout - {'loss': 0.7678, 'grad_norm': 1.192323088645935, 'learning_rate': 1.4265107032872131e-05, 'epoch': 1.14} +2025-02-05 18:57:07 - ERROR - stderr - 38%|███▊ | 8502/22434 [8:49:27<9:42:07, 2.51s/it] +2025-02-05 18:57:09 - ERROR - stderr - 38%|███▊ | 8503/22434 [8:49:29<9:40:27, 2.50s/it] +2025-02-05 18:57:09 - ERROR - stderr - +2025-02-05 18:57:09 - ERROR - stderr - +2025-02-05 18:57:09 - INFO - stdout - {'loss': 0.7472, 'grad_norm': 1.2861554622650146, 'learning_rate': 1.4263801144848196e-05, 'epoch': 1.14} +2025-02-05 18:57:09 - ERROR - stderr - 38%|███▊ | 8503/22434 [8:49:29<9:40:27, 2.50s/it] +2025-02-05 18:57:12 - ERROR - stderr - 38%|███▊ | 8504/22434 [8:49:32<9:44:36, 2.52s/it] +2025-02-05 18:57:12 - ERROR - stderr - +2025-02-05 18:57:12 - ERROR - stderr - +2025-02-05 18:57:12 - INFO - stdout - {'loss': 0.7541, 'grad_norm': 1.296046495437622, 'learning_rate': 1.4262495167949406e-05, 'epoch': 1.14} +2025-02-05 18:57:12 - ERROR - stderr - 38%|███▊ | 8504/22434 [8:49:32<9:44:36, 2.52s/it] +2025-02-05 18:57:14 - ERROR - stderr - 38%|███▊ | 8505/22434 [8:49:34<9:45:05, 2.52s/it] +2025-02-05 18:57:14 - ERROR - stderr - +2025-02-05 18:57:14 - ERROR - stderr - +2025-02-05 18:57:14 - INFO - stdout - {'loss': 0.7721, 'grad_norm': 1.3652756214141846, 'learning_rate': 1.4261189102202985e-05, 'epoch': 1.14} +2025-02-05 18:57:14 - ERROR - stderr - 38%|███▊ | 8505/22434 [8:49:34<9:45:05, 2.52s/it] +2025-02-05 18:57:17 - ERROR - stderr - 38%|███▊ | 8506/22434 [8:49:37<9:43:34, 2.51s/it] +2025-02-05 18:57:17 - ERROR - stderr - +2025-02-05 18:57:17 - ERROR 
- stderr - +2025-02-05 18:57:17 - INFO - stdout - {'loss': 0.7946, 'grad_norm': 1.1960608959197998, 'learning_rate': 1.4259882947636154e-05, 'epoch': 1.14} +2025-02-05 18:57:17 - ERROR - stderr - 38%|███▊ | 8506/22434 [8:49:37<9:43:34, 2.51s/it] +2025-02-05 18:57:19 - ERROR - stderr - 38%|███▊ | 8507/22434 [8:49:39<9:42:31, 2.51s/it] +2025-02-05 18:57:19 - ERROR - stderr - +2025-02-05 18:57:19 - ERROR - stderr - +2025-02-05 18:57:19 - INFO - stdout - {'loss': 0.7292, 'grad_norm': 1.2483481168746948, 'learning_rate': 1.4258576704276139e-05, 'epoch': 1.14} +2025-02-05 18:57:19 - ERROR - stderr - 38%|███▊ | 8507/22434 [8:49:39<9:42:31, 2.51s/it] +2025-02-05 18:57:22 - ERROR - stderr - 38%|███▊ | 8508/22434 [8:49:42<9:38:16, 2.49s/it] +2025-02-05 18:57:22 - ERROR - stderr - +2025-02-05 18:57:22 - ERROR - stderr - +2025-02-05 18:57:22 - INFO - stdout - {'loss': 0.6636, 'grad_norm': 1.0186744928359985, 'learning_rate': 1.4257270372150167e-05, 'epoch': 1.14} +2025-02-05 18:57:22 - ERROR - stderr - 38%|███▊ | 8508/22434 [8:49:42<9:38:16, 2.49s/it] +2025-02-05 18:57:24 - ERROR - stderr - 38%|███▊ | 8509/22434 [8:49:44<9:44:34, 2.52s/it] +2025-02-05 18:57:24 - ERROR - stderr - +2025-02-05 18:57:24 - ERROR - stderr - +2025-02-05 18:57:24 - INFO - stdout - {'loss': 0.73, 'grad_norm': 1.134162187576294, 'learning_rate': 1.4255963951285467e-05, 'epoch': 1.14} +2025-02-05 18:57:24 - ERROR - stderr - 38%|███▊ | 8509/22434 [8:49:44<9:44:34, 2.52s/it] +2025-02-05 18:57:27 - ERROR - stderr - 38%|███▊ | 8510/22434 [8:49:47<9:43:31, 2.51s/it] +2025-02-05 18:57:27 - ERROR - stderr - +2025-02-05 18:57:27 - ERROR - stderr - +2025-02-05 18:57:27 - INFO - stdout - {'loss': 0.7301, 'grad_norm': 1.1401047706604004, 'learning_rate': 1.4254657441709273e-05, 'epoch': 1.14} +2025-02-05 18:57:27 - ERROR - stderr - 38%|███▊ | 8510/22434 [8:49:47<9:43:31, 2.51s/it] +2025-02-05 18:57:29 - ERROR - stderr - 38%|███▊ | 8511/22434 [8:49:49<9:44:13, 2.52s/it] +2025-02-05 18:57:29 - ERROR - stderr - 
+2025-02-05 18:57:29 - ERROR - stderr - +2025-02-05 18:57:29 - INFO - stdout - {'loss': 0.6777, 'grad_norm': 1.195168375968933, 'learning_rate': 1.4253350843448815e-05, 'epoch': 1.14} +2025-02-05 18:57:29 - ERROR - stderr - 38%|███▊ | 8511/22434 [8:49:49<9:44:13, 2.52s/it] +2025-02-05 18:57:32 - ERROR - stderr - 38%|███▊ | 8512/22434 [8:49:52<9:55:01, 2.56s/it] +2025-02-05 18:57:32 - ERROR - stderr - +2025-02-05 18:57:32 - ERROR - stderr - +2025-02-05 18:57:32 - INFO - stdout - {'loss': 0.7897, 'grad_norm': 1.2958393096923828, 'learning_rate': 1.4252044156531328e-05, 'epoch': 1.14} +2025-02-05 18:57:32 - ERROR - stderr - 38%|███▊ | 8512/22434 [8:49:52<9:55:01, 2.56s/it] +2025-02-05 18:57:35 - ERROR - stderr - 38%|███▊ | 8513/22434 [8:49:55<10:03:52, 2.60s/it] +2025-02-05 18:57:35 - ERROR - stderr - +2025-02-05 18:57:35 - ERROR - stderr - +2025-02-05 18:57:35 - INFO - stdout - {'loss': 0.7024, 'grad_norm': 1.1841095685958862, 'learning_rate': 1.4250737380984053e-05, 'epoch': 1.14} +2025-02-05 18:57:35 - ERROR - stderr - 38%|███▊ | 8513/22434 [8:49:55<10:03:52, 2.60s/it] +2025-02-05 18:57:37 - ERROR - stderr - 38%|███▊ | 8514/22434 [8:49:57<9:55:19, 2.57s/it] +2025-02-05 18:57:37 - ERROR - stderr - +2025-02-05 18:57:37 - ERROR - stderr - +2025-02-05 18:57:37 - INFO - stdout - {'loss': 0.7659, 'grad_norm': 1.1955180168151855, 'learning_rate': 1.4249430516834222e-05, 'epoch': 1.14} +2025-02-05 18:57:37 - ERROR - stderr - 38%|███▊ | 8514/22434 [8:49:57<9:55:19, 2.57s/it] +2025-02-05 18:57:40 - ERROR - stderr - 38%|███▊ | 8515/22434 [8:50:00<9:55:07, 2.57s/it] +2025-02-05 18:57:40 - ERROR - stderr - +2025-02-05 18:57:40 - ERROR - stderr - +2025-02-05 18:57:40 - INFO - stdout - {'loss': 0.6739, 'grad_norm': 1.1328061819076538, 'learning_rate': 1.4248123564109077e-05, 'epoch': 1.14} +2025-02-05 18:57:40 - ERROR - stderr - 38%|███▊ | 8515/22434 [8:50:00<9:55:07, 2.57s/it] +2025-02-05 18:57:42 - ERROR - stderr - 38%|███▊ | 8516/22434 [8:50:02<9:50:25, 2.55s/it] +2025-02-05 
18:57:42 - ERROR - stderr - +2025-02-05 18:57:42 - ERROR - stderr - +2025-02-05 18:57:42 - INFO - stdout - {'loss': 0.6403, 'grad_norm': 1.1669927835464478, 'learning_rate': 1.424681652283586e-05, 'epoch': 1.14} +2025-02-05 18:57:42 - ERROR - stderr - 38%|███▊ | 8516/22434 [8:50:02<9:50:25, 2.55s/it] +2025-02-05 18:57:45 - ERROR - stderr - 38%|███▊ | 8517/22434 [8:50:05<9:42:58, 2.51s/it] +2025-02-05 18:57:45 - ERROR - stderr - +2025-02-05 18:57:45 - ERROR - stderr - +2025-02-05 18:57:45 - INFO - stdout - {'loss': 0.7513, 'grad_norm': 1.1896038055419922, 'learning_rate': 1.4245509393041821e-05, 'epoch': 1.14} +2025-02-05 18:57:45 - ERROR - stderr - 38%|███▊ | 8517/22434 [8:50:05<9:42:58, 2.51s/it] +2025-02-05 18:57:47 - ERROR - stderr - 38%|███▊ | 8518/22434 [8:50:07<9:38:11, 2.49s/it] +2025-02-05 18:57:47 - ERROR - stderr - +2025-02-05 18:57:47 - ERROR - stderr - +2025-02-05 18:57:47 - INFO - stdout - {'loss': 0.6886, 'grad_norm': 1.1753462553024292, 'learning_rate': 1.4244202174754199e-05, 'epoch': 1.14} +2025-02-05 18:57:47 - ERROR - stderr - 38%|███▊ | 8518/22434 [8:50:07<9:38:11, 2.49s/it] +2025-02-05 18:57:50 - ERROR - stderr - 38%|███▊ | 8519/22434 [8:50:10<9:57:27, 2.58s/it] +2025-02-05 18:57:50 - ERROR - stderr - +2025-02-05 18:57:50 - ERROR - stderr - +2025-02-05 18:57:50 - INFO - stdout - {'loss': 0.7288, 'grad_norm': 1.0358129739761353, 'learning_rate': 1.4242894868000244e-05, 'epoch': 1.14} +2025-02-05 18:57:50 - ERROR - stderr - 38%|███▊ | 8519/22434 [8:50:10<9:57:27, 2.58s/it] +2025-02-05 18:57:52 - ERROR - stderr - 38%|███▊ | 8520/22434 [8:50:12<9:52:57, 2.56s/it] +2025-02-05 18:57:53 - ERROR - stderr - +2025-02-05 18:57:53 - ERROR - stderr - +2025-02-05 18:57:53 - INFO - stdout - {'loss': 0.7129, 'grad_norm': 1.273347020149231, 'learning_rate': 1.4241587472807203e-05, 'epoch': 1.14} +2025-02-05 18:57:53 - ERROR - stderr - 38%|███▊ | 8520/22434 [8:50:12<9:52:57, 2.56s/it] +2025-02-05 18:57:55 - ERROR - stderr - 38%|███▊ | 8521/22434 
[8:50:15<9:46:46, 2.53s/it] +2025-02-05 18:57:55 - ERROR - stderr - +2025-02-05 18:57:55 - ERROR - stderr - +2025-02-05 18:57:55 - INFO - stdout - {'loss': 0.7281, 'grad_norm': 1.164568305015564, 'learning_rate': 1.4240279989202332e-05, 'epoch': 1.14} +2025-02-05 18:57:55 - ERROR - stderr - 38%|███▊ | 8521/22434 [8:50:15<9:46:46, 2.53s/it] +2025-02-05 18:57:57 - ERROR - stderr - 38%|███▊ | 8522/22434 [8:50:17<9:44:56, 2.52s/it] +2025-02-05 18:57:58 - ERROR - stderr - +2025-02-05 18:57:58 - ERROR - stderr - +2025-02-05 18:57:58 - INFO - stdout - {'loss': 0.8326, 'grad_norm': 1.291204810142517, 'learning_rate': 1.4238972417212882e-05, 'epoch': 1.14} +2025-02-05 18:57:58 - ERROR - stderr - 38%|███▊ | 8522/22434 [8:50:17<9:44:56, 2.52s/it] +2025-02-05 18:58:00 - ERROR - stderr - 38%|███▊ | 8523/22434 [8:50:20<9:42:08, 2.51s/it] +2025-02-05 18:58:00 - ERROR - stderr - +2025-02-05 18:58:00 - ERROR - stderr - +2025-02-05 18:58:00 - INFO - stdout - {'loss': 0.7998, 'grad_norm': 1.2569732666015625, 'learning_rate': 1.423766475686611e-05, 'epoch': 1.14} +2025-02-05 18:58:00 - ERROR - stderr - 38%|███▊ | 8523/22434 [8:50:20<9:42:08, 2.51s/it] +2025-02-05 18:58:02 - ERROR - stderr - 38%|███▊ | 8524/22434 [8:50:22<9:42:06, 2.51s/it] +2025-02-05 18:58:03 - ERROR - stderr - +2025-02-05 18:58:03 - ERROR - stderr - +2025-02-05 18:58:03 - INFO - stdout - {'loss': 0.7371, 'grad_norm': 1.1659945249557495, 'learning_rate': 1.423635700818927e-05, 'epoch': 1.14} +2025-02-05 18:58:03 - ERROR - stderr - 38%|███▊ | 8524/22434 [8:50:22<9:42:06, 2.51s/it] +2025-02-05 18:58:05 - ERROR - stderr - 38%|███▊ | 8525/22434 [8:50:25<9:41:11, 2.51s/it] +2025-02-05 18:58:05 - ERROR - stderr - +2025-02-05 18:58:05 - ERROR - stderr - +2025-02-05 18:58:05 - INFO - stdout - {'loss': 0.7012, 'grad_norm': 1.0587611198425293, 'learning_rate': 1.4235049171209624e-05, 'epoch': 1.14} +2025-02-05 18:58:05 - ERROR - stderr - 38%|███▊ | 8525/22434 [8:50:25<9:41:11, 2.51s/it] +2025-02-05 18:58:08 - ERROR - stderr - 
38%|███▊ | 8526/22434 [8:50:27<9:44:21, 2.52s/it] +2025-02-05 18:58:08 - ERROR - stderr - +2025-02-05 18:58:08 - ERROR - stderr - +2025-02-05 18:58:08 - INFO - stdout - {'loss': 0.6941, 'grad_norm': 1.0336602926254272, 'learning_rate': 1.4233741245954427e-05, 'epoch': 1.14} +2025-02-05 18:58:08 - ERROR - stderr - 38%|███▊ | 8526/22434 [8:50:27<9:44:21, 2.52s/it] +2025-02-05 18:58:10 - ERROR - stderr - 38%|███▊ | 8527/22434 [8:50:30<9:43:58, 2.52s/it] +2025-02-05 18:58:10 - ERROR - stderr - +2025-02-05 18:58:10 - ERROR - stderr - +2025-02-05 18:58:10 - INFO - stdout - {'loss': 0.6497, 'grad_norm': 0.9703884124755859, 'learning_rate': 1.4232433232450945e-05, 'epoch': 1.14} +2025-02-05 18:58:10 - ERROR - stderr - 38%|███▊ | 8527/22434 [8:50:30<9:43:58, 2.52s/it] +2025-02-05 18:58:13 - ERROR - stderr - 38%|███▊ | 8528/22434 [8:50:32<9:41:20, 2.51s/it] +2025-02-05 18:58:13 - ERROR - stderr - +2025-02-05 18:58:13 - ERROR - stderr - +2025-02-05 18:58:13 - INFO - stdout - {'loss': 0.662, 'grad_norm': 1.0044950246810913, 'learning_rate': 1.4231125130726442e-05, 'epoch': 1.14} +2025-02-05 18:58:13 - ERROR - stderr - 38%|███▊ | 8528/22434 [8:50:32<9:41:20, 2.51s/it] +2025-02-05 18:58:15 - ERROR - stderr - 38%|███▊ | 8529/22434 [8:50:35<9:43:53, 2.52s/it] +2025-02-05 18:58:15 - ERROR - stderr - +2025-02-05 18:58:15 - ERROR - stderr - +2025-02-05 18:58:15 - INFO - stdout - {'loss': 0.7495, 'grad_norm': 1.0813864469528198, 'learning_rate': 1.4229816940808188e-05, 'epoch': 1.14} +2025-02-05 18:58:15 - ERROR - stderr - 38%|███▊ | 8529/22434 [8:50:35<9:43:53, 2.52s/it] +2025-02-05 18:58:18 - ERROR - stderr - 38%|███▊ | 8530/22434 [8:50:37<9:40:48, 2.51s/it] +2025-02-05 18:58:18 - ERROR - stderr - +2025-02-05 18:58:18 - ERROR - stderr - +2025-02-05 18:58:18 - INFO - stdout - {'loss': 0.7073, 'grad_norm': 1.0305730104446411, 'learning_rate': 1.4228508662723443e-05, 'epoch': 1.14} +2025-02-05 18:58:18 - ERROR - stderr - 38%|███▊ | 8530/22434 [8:50:37<9:40:48, 2.51s/it] +2025-02-05 
18:58:20 - ERROR - stderr - 38%|███▊ | 8531/22434 [8:50:40<9:40:00, 2.50s/it] +2025-02-05 18:58:20 - ERROR - stderr - +2025-02-05 18:58:20 - ERROR - stderr - +2025-02-05 18:58:20 - INFO - stdout - {'loss': 0.743, 'grad_norm': 1.1615597009658813, 'learning_rate': 1.4227200296499484e-05, 'epoch': 1.14} +2025-02-05 18:58:20 - ERROR - stderr - 38%|███▊ | 8531/22434 [8:50:40<9:40:00, 2.50s/it] +2025-02-05 18:58:22 - ERROR - stderr - 38%|███▊ | 8532/22434 [8:50:42<9:33:46, 2.48s/it] +2025-02-05 18:58:22 - ERROR - stderr - +2025-02-05 18:58:22 - ERROR - stderr - +2025-02-05 18:58:22 - INFO - stdout - {'loss': 0.7721, 'grad_norm': 1.1701550483703613, 'learning_rate': 1.4225891842163578e-05, 'epoch': 1.14} +2025-02-05 18:58:22 - ERROR - stderr - 38%|███▊ | 8532/22434 [8:50:42<9:33:46, 2.48s/it] +2025-02-05 18:58:25 - ERROR - stderr - 38%|███▊ | 8533/22434 [8:50:45<9:36:33, 2.49s/it] +2025-02-05 18:58:25 - ERROR - stderr - +2025-02-05 18:58:25 - ERROR - stderr - +2025-02-05 18:58:25 - INFO - stdout - {'loss': 0.704, 'grad_norm': 1.1151748895645142, 'learning_rate': 1.4224583299743004e-05, 'epoch': 1.14} +2025-02-05 18:58:25 - ERROR - stderr - 38%|███▊ | 8533/22434 [8:50:45<9:36:33, 2.49s/it] +2025-02-05 18:58:27 - ERROR - stderr - 38%|███▊ | 8534/22434 [8:50:47<9:33:33, 2.48s/it] +2025-02-05 18:58:27 - ERROR - stderr - +2025-02-05 18:58:27 - ERROR - stderr - +2025-02-05 18:58:27 - INFO - stdout - {'loss': 0.7313, 'grad_norm': 1.1730639934539795, 'learning_rate': 1.422327466926503e-05, 'epoch': 1.14} +2025-02-05 18:58:27 - ERROR - stderr - 38%|███▊ | 8534/22434 [8:50:47<9:33:33, 2.48s/it] +2025-02-05 18:58:30 - ERROR - stderr - 38%|███▊ | 8535/22434 [8:50:50<9:34:24, 2.48s/it] +2025-02-05 18:58:30 - ERROR - stderr - +2025-02-05 18:58:30 - ERROR - stderr - +2025-02-05 18:58:30 - INFO - stdout - {'loss': 0.6938, 'grad_norm': 1.0634881258010864, 'learning_rate': 1.4221965950756937e-05, 'epoch': 1.14} +2025-02-05 18:58:30 - ERROR - stderr - 38%|███▊ | 8535/22434 [8:50:50<9:34:24, 
2.48s/it] +2025-02-05 18:58:32 - ERROR - stderr - 38%|███▊ | 8536/22434 [8:50:52<9:36:16, 2.49s/it] +2025-02-05 18:58:32 - ERROR - stderr - +2025-02-05 18:58:32 - ERROR - stderr - +2025-02-05 18:58:32 - INFO - stdout - {'loss': 0.7519, 'grad_norm': 1.0702561140060425, 'learning_rate': 1.4220657144246004e-05, 'epoch': 1.14} +2025-02-05 18:58:32 - ERROR - stderr - 38%|███▊ | 8536/22434 [8:50:52<9:36:16, 2.49s/it] +2025-02-05 18:58:35 - ERROR - stderr - 38%|███▊ | 8537/22434 [8:50:55<9:35:57, 2.49s/it] +2025-02-05 18:58:35 - ERROR - stderr - +2025-02-05 18:58:35 - ERROR - stderr - +2025-02-05 18:58:35 - INFO - stdout - {'loss': 0.6488, 'grad_norm': 0.9902053475379944, 'learning_rate': 1.4219348249759512e-05, 'epoch': 1.14} +2025-02-05 18:58:35 - ERROR - stderr - 38%|███▊ | 8537/22434 [8:50:55<9:35:57, 2.49s/it] +2025-02-05 18:58:37 - ERROR - stderr - 38%|███▊ | 8538/22434 [8:50:57<9:42:49, 2.52s/it] +2025-02-05 18:58:38 - ERROR - stderr - +2025-02-05 18:58:38 - ERROR - stderr - +2025-02-05 18:58:38 - INFO - stdout - {'loss': 0.7919, 'grad_norm': 1.128320574760437, 'learning_rate': 1.4218039267324743e-05, 'epoch': 1.14} +2025-02-05 18:58:38 - ERROR - stderr - 38%|███▊ | 8538/22434 [8:50:57<9:42:49, 2.52s/it] +2025-02-05 18:58:40 - ERROR - stderr - 38%|███▊ | 8539/22434 [8:51:00<9:44:24, 2.52s/it] +2025-02-05 18:58:40 - ERROR - stderr - +2025-02-05 18:58:40 - ERROR - stderr - +2025-02-05 18:58:40 - INFO - stdout - {'loss': 0.7355, 'grad_norm': 1.1763845682144165, 'learning_rate': 1.4216730196968982e-05, 'epoch': 1.14} +2025-02-05 18:58:40 - ERROR - stderr - 38%|███▊ | 8539/22434 [8:51:00<9:44:24, 2.52s/it] +2025-02-05 18:58:43 - ERROR - stderr - 38%|███▊ | 8540/22434 [8:51:02<9:49:07, 2.54s/it] +2025-02-05 18:58:43 - ERROR - stderr - +2025-02-05 18:58:43 - ERROR - stderr - +2025-02-05 18:58:43 - INFO - stdout - {'loss': 0.7291, 'grad_norm': 1.1773360967636108, 'learning_rate': 1.4215421038719516e-05, 'epoch': 1.14} +2025-02-05 18:58:43 - ERROR - stderr - 38%|███▊ | 
8540/22434 [8:51:02<9:49:07, 2.54s/it] +2025-02-05 18:58:45 - ERROR - stderr - 38%|███▊ | 8541/22434 [8:51:05<9:48:33, 2.54s/it] +2025-02-05 18:58:45 - ERROR - stderr - +2025-02-05 18:58:45 - ERROR - stderr - +2025-02-05 18:58:45 - INFO - stdout - {'loss': 0.7341, 'grad_norm': 1.1140855550765991, 'learning_rate': 1.4214111792603632e-05, 'epoch': 1.14} +2025-02-05 18:58:45 - ERROR - stderr - 38%|███▊ | 8541/22434 [8:51:05<9:48:33, 2.54s/it] +2025-02-05 18:58:48 - ERROR - stderr - 38%|███▊ | 8542/22434 [8:51:07<9:44:48, 2.53s/it] +2025-02-05 18:58:48 - ERROR - stderr - +2025-02-05 18:58:48 - ERROR - stderr - +2025-02-05 18:58:48 - INFO - stdout - {'loss': 0.814, 'grad_norm': 1.2221038341522217, 'learning_rate': 1.4212802458648618e-05, 'epoch': 1.14} +2025-02-05 18:58:48 - ERROR - stderr - 38%|███▊ | 8542/22434 [8:51:07<9:44:48, 2.53s/it] +2025-02-05 18:58:50 - ERROR - stderr - 38%|███▊ | 8543/22434 [8:51:10<9:47:00, 2.54s/it] +2025-02-05 18:58:50 - ERROR - stderr - +2025-02-05 18:58:50 - ERROR - stderr - +2025-02-05 18:58:50 - INFO - stdout - {'loss': 0.7601, 'grad_norm': 1.1990512609481812, 'learning_rate': 1.421149303688177e-05, 'epoch': 1.14} +2025-02-05 18:58:50 - ERROR - stderr - 38%|███▊ | 8543/22434 [8:51:10<9:47:00, 2.54s/it] +2025-02-05 18:58:53 - ERROR - stderr - 38%|███▊ | 8544/22434 [8:51:12<9:42:01, 2.51s/it] +2025-02-05 18:58:53 - ERROR - stderr - +2025-02-05 18:58:53 - ERROR - stderr - +2025-02-05 18:58:53 - INFO - stdout - {'loss': 0.79, 'grad_norm': 1.2618746757507324, 'learning_rate': 1.4210183527330377e-05, 'epoch': 1.14} +2025-02-05 18:58:53 - ERROR - stderr - 38%|███▊ | 8544/22434 [8:51:12<9:42:01, 2.51s/it] +2025-02-05 18:58:55 - ERROR - stderr - 38%|███▊ | 8545/22434 [8:51:15<9:52:15, 2.56s/it] +2025-02-05 18:58:55 - ERROR - stderr - +2025-02-05 18:58:55 - ERROR - stderr - +2025-02-05 18:58:55 - INFO - stdout - {'loss': 0.6646, 'grad_norm': 0.9865466952323914, 'learning_rate': 1.420887393002174e-05, 'epoch': 1.14} +2025-02-05 18:58:55 - ERROR - 
stderr - 38%|███▊ | 8545/22434 [8:51:15<9:52:15, 2.56s/it] +2025-02-05 18:58:58 - ERROR - stderr - 38%|███▊ | 8546/22434 [8:51:18<9:50:37, 2.55s/it] +2025-02-05 18:58:58 - ERROR - stderr - +2025-02-05 18:58:58 - ERROR - stderr - +2025-02-05 18:58:58 - INFO - stdout - {'loss': 0.7054, 'grad_norm': 1.2747303247451782, 'learning_rate': 1.4207564244983154e-05, 'epoch': 1.14} +2025-02-05 18:58:58 - ERROR - stderr - 38%|███▊ | 8546/22434 [8:51:18<9:50:37, 2.55s/it] +2025-02-05 18:59:00 - ERROR - stderr - 38%|███▊ | 8547/22434 [8:51:20<9:48:04, 2.54s/it] +2025-02-05 18:59:00 - ERROR - stderr - +2025-02-05 18:59:00 - ERROR - stderr - +2025-02-05 18:59:00 - INFO - stdout - {'loss': 0.6914, 'grad_norm': 1.0761216878890991, 'learning_rate': 1.4206254472241916e-05, 'epoch': 1.14} +2025-02-05 18:59:00 - ERROR - stderr - 38%|███▊ | 8547/22434 [8:51:20<9:48:04, 2.54s/it] +2025-02-05 18:59:03 - ERROR - stderr - 38%|███▊ | 8548/22434 [8:51:23<10:09:44, 2.63s/it] +2025-02-05 18:59:03 - ERROR - stderr - +2025-02-05 18:59:03 - ERROR - stderr - +2025-02-05 18:59:03 - INFO - stdout - {'loss': 0.7068, 'grad_norm': 1.1071749925613403, 'learning_rate': 1.4204944611825324e-05, 'epoch': 1.14} +2025-02-05 18:59:03 - ERROR - stderr - 38%|███▊ | 8548/22434 [8:51:23<10:09:44, 2.63s/it] +2025-02-05 18:59:06 - ERROR - stderr - 38%|███▊ | 8549/22434 [8:51:26<10:06:56, 2.62s/it] +2025-02-05 18:59:06 - ERROR - stderr - +2025-02-05 18:59:06 - ERROR - stderr - +2025-02-05 18:59:06 - INFO - stdout - {'loss': 0.7523, 'grad_norm': 1.145937204360962, 'learning_rate': 1.4203634663760693e-05, 'epoch': 1.14} +2025-02-05 18:59:06 - ERROR - stderr - 38%|███▊ | 8549/22434 [8:51:26<10:06:56, 2.62s/it] +2025-02-05 18:59:08 - ERROR - stderr - 38%|███▊ | 8550/22434 [8:51:28<9:58:59, 2.59s/it] +2025-02-05 18:59:08 - ERROR - stderr - +2025-02-05 18:59:08 - ERROR - stderr - +2025-02-05 18:59:08 - INFO - stdout - {'loss': 0.7225, 'grad_norm': 1.0613558292388916, 'learning_rate': 1.4202324628075317e-05, 'epoch': 1.14}
+2025-02-05 18:59:08 - ERROR - stderr - 38%|███▊ | 8550/22434 [8:51:28<9:58:59, 2.59s/it] +2025-02-05 18:59:11 - ERROR - stderr - 38%|███▊ | 8551/22434 [8:51:31<9:59:08, 2.59s/it] +2025-02-05 18:59:11 - ERROR - stderr - +2025-02-05 18:59:11 - ERROR - stderr - +2025-02-05 18:59:11 - INFO - stdout - {'loss': 0.7011, 'grad_norm': 1.1577098369598389, 'learning_rate': 1.4201014504796505e-05, 'epoch': 1.14} +2025-02-05 18:59:11 - ERROR - stderr - 38%|███▊ | 8551/22434 [8:51:31<9:59:08, 2.59s/it] +2025-02-05 18:59:13 - ERROR - stderr - 38%|███▊ | 8552/22434 [8:51:33<9:55:12, 2.57s/it] +2025-02-05 18:59:13 - ERROR - stderr - +2025-02-05 18:59:13 - ERROR - stderr - +2025-02-05 18:59:13 - INFO - stdout - {'loss': 0.7621, 'grad_norm': 1.2799396514892578, 'learning_rate': 1.4199704293951564e-05, 'epoch': 1.14} +2025-02-05 18:59:13 - ERROR - stderr - 38%|███▊ | 8552/22434 [8:51:33<9:55:12, 2.57s/it] +2025-02-05 18:59:16 - ERROR - stderr - 38%|███▊ | 8553/22434 [8:51:36<9:48:08, 2.54s/it] +2025-02-05 18:59:16 - ERROR - stderr - +2025-02-05 18:59:16 - ERROR - stderr - +2025-02-05 18:59:16 - INFO - stdout - {'loss': 0.6998, 'grad_norm': 1.19056236743927, 'learning_rate': 1.4198393995567807e-05, 'epoch': 1.14} +2025-02-05 18:59:16 - ERROR - stderr - 38%|███▊ | 8553/22434 [8:51:36<9:48:08, 2.54s/it] +2025-02-05 18:59:18 - ERROR - stderr - 38%|███▊ | 8554/22434 [8:51:38<9:46:25, 2.53s/it] +2025-02-05 18:59:18 - ERROR - stderr - +2025-02-05 18:59:18 - ERROR - stderr - +2025-02-05 18:59:18 - INFO - stdout - {'loss': 0.7416, 'grad_norm': 1.2797596454620361, 'learning_rate': 1.4197083609672543e-05, 'epoch': 1.14} +2025-02-05 18:59:18 - ERROR - stderr - 38%|███▊ | 8554/22434 [8:51:38<9:46:25, 2.53s/it] +2025-02-05 18:59:21 - ERROR - stderr - 38%|███▊ | 8555/22434 [8:51:41<9:44:06, 2.53s/it] +2025-02-05 18:59:21 - ERROR - stderr - +2025-02-05 18:59:21 - ERROR - stderr - +2025-02-05 18:59:21 - INFO - stdout - {'loss': 0.6601, 'grad_norm': 1.1415300369262695, 'learning_rate': 
1.419577313629309e-05, 'epoch': 1.14} +2025-02-05 18:59:21 - ERROR - stderr - 38%|███▊ | 8555/22434 [8:51:41<9:44:06, 2.53s/it] +2025-02-05 18:59:23 - ERROR - stderr - 38%|███▊ | 8556/22434 [8:51:43<9:42:52, 2.52s/it] +2025-02-05 18:59:23 - ERROR - stderr - +2025-02-05 18:59:23 - ERROR - stderr - +2025-02-05 18:59:23 - INFO - stdout - {'loss': 0.6756, 'grad_norm': 1.204128623008728, 'learning_rate': 1.419446257545676e-05, 'epoch': 1.14} +2025-02-05 18:59:23 - ERROR - stderr - 38%|███▊ | 8556/22434 [8:51:43<9:42:52, 2.52s/it] +2025-02-05 18:59:26 - ERROR - stderr - 38%|███▊ | 8557/22434 [8:51:46<9:45:55, 2.53s/it] +2025-02-05 18:59:26 - ERROR - stderr - +2025-02-05 18:59:26 - ERROR - stderr - +2025-02-05 18:59:26 - INFO - stdout - {'loss': 0.7287, 'grad_norm': 1.330568790435791, 'learning_rate': 1.4193151927190871e-05, 'epoch': 1.14} +2025-02-05 18:59:26 - ERROR - stderr - 38%|███▊ | 8557/22434 [8:51:46<9:45:55, 2.53s/it] +2025-02-05 18:59:29 - ERROR - stderr - 38%|███▊ | 8558/22434 [8:51:48<9:44:04, 2.53s/it] +2025-02-05 18:59:29 - ERROR - stderr - +2025-02-05 18:59:29 - ERROR - stderr - +2025-02-05 18:59:29 - INFO - stdout - {'loss': 0.7485, 'grad_norm': 1.2889591455459595, 'learning_rate': 1.4191841191522744e-05, 'epoch': 1.14} +2025-02-05 18:59:29 - ERROR - stderr - 38%|███▊ | 8558/22434 [8:51:48<9:44:04, 2.53s/it] +2025-02-05 18:59:31 - ERROR - stderr - 38%|███▊ | 8559/22434 [8:51:51<9:44:30, 2.53s/it] +2025-02-05 18:59:31 - ERROR - stderr - +2025-02-05 18:59:31 - ERROR - stderr - +2025-02-05 18:59:31 - INFO - stdout - {'loss': 0.7496, 'grad_norm': 1.069575548171997, 'learning_rate': 1.4190530368479696e-05, 'epoch': 1.14} +2025-02-05 18:59:31 - ERROR - stderr - 38%|███▊ | 8559/22434 [8:51:51<9:44:30, 2.53s/it] +2025-02-05 18:59:34 - ERROR - stderr - 38%|███▊ | 8560/22434 [8:51:53<9:42:46, 2.52s/it] +2025-02-05 18:59:34 - ERROR - stderr - +2025-02-05 18:59:34 - ERROR - stderr - +2025-02-05 18:59:34 - INFO - stdout - {'loss': 0.6813, 'grad_norm': 
1.1083292961120605, 'learning_rate': 1.4189219458089053e-05, 'epoch': 1.14} +2025-02-05 18:59:34 - ERROR - stderr - 38%|███▊ | 8560/22434 [8:51:53<9:42:46, 2.52s/it] +2025-02-05 18:59:36 - ERROR - stderr - 38%|███▊ | 8561/22434 [8:51:56<9:42:25, 2.52s/it] +2025-02-05 18:59:36 - ERROR - stderr - +2025-02-05 18:59:36 - ERROR - stderr - +2025-02-05 18:59:36 - INFO - stdout - {'loss': 0.7337, 'grad_norm': 1.1319963932037354, 'learning_rate': 1.4187908460378142e-05, 'epoch': 1.14} +2025-02-05 18:59:36 - ERROR - stderr - 38%|███▊ | 8561/22434 [8:51:56<9:42:25, 2.52s/it] +2025-02-05 18:59:39 - ERROR - stderr - 38%|███▊ | 8562/22434 [8:51:58<9:40:04, 2.51s/it] +2025-02-05 18:59:39 - ERROR - stderr - +2025-02-05 18:59:39 - ERROR - stderr - +2025-02-05 18:59:39 - INFO - stdout - {'loss': 0.6399, 'grad_norm': 1.1391440629959106, 'learning_rate': 1.4186597375374283e-05, 'epoch': 1.14} +2025-02-05 18:59:39 - ERROR - stderr - 38%|███▊ | 8562/22434 [8:51:58<9:40:04, 2.51s/it] +2025-02-05 18:59:42 - ERROR - stderr - 38%|███▊ | 8563/22434 [8:52:01<10:13:28, 2.65s/it] +2025-02-05 18:59:42 - ERROR - stderr - +2025-02-05 18:59:42 - ERROR - stderr - +2025-02-05 18:59:42 - INFO - stdout - {'loss': 0.7703, 'grad_norm': 1.2806825637817383, 'learning_rate': 1.4185286203104809e-05, 'epoch': 1.15} +2025-02-05 18:59:42 - ERROR - stderr - 38%|███▊ | 8563/22434 [8:52:01<10:13:28, 2.65s/it] +2025-02-05 18:59:44 - ERROR - stderr - 38%|███▊ | 8564/22434 [8:52:04<10:00:57, 2.60s/it] +2025-02-05 18:59:44 - ERROR - stderr - +2025-02-05 18:59:44 - ERROR - stderr - +2025-02-05 18:59:44 - INFO - stdout - {'loss': 0.689, 'grad_norm': 1.1351758241653442, 'learning_rate': 1.4183974943597047e-05, 'epoch': 1.15} +2025-02-05 18:59:44 - ERROR - stderr - 38%|███▊ | 8564/22434 [8:52:04<10:00:57, 2.60s/it] +2025-02-05 18:59:47 - ERROR - stderr - 38%|███▊ | 8565/22434 [8:52:06<10:00:17, 2.60s/it] +2025-02-05 18:59:47 - ERROR - stderr - +2025-02-05 18:59:47 - ERROR - stderr - +2025-02-05 18:59:47 - INFO - stdout - 
{'loss': 0.6758, 'grad_norm': 1.1527087688446045, 'learning_rate': 1.4182663596878334e-05, 'epoch': 1.15} +2025-02-05 18:59:47 - ERROR - stderr - 38%|███▊ | 8565/22434 [8:52:06<10:00:17, 2.60s/it] +2025-02-05 18:59:49 - ERROR - stderr - 38%|███▊ | 8566/22434 [8:52:09<10:01:32, 2.60s/it] +2025-02-05 18:59:49 - ERROR - stderr - +2025-02-05 18:59:49 - ERROR - stderr - +2025-02-05 18:59:49 - INFO - stdout - {'loss': 0.8237, 'grad_norm': 1.2319458723068237, 'learning_rate': 1.4181352162976002e-05, 'epoch': 1.15} +2025-02-05 18:59:49 - ERROR - stderr - 38%|███▊ | 8566/22434 [8:52:09<10:01:32, 2.60s/it] +2025-02-05 18:59:52 - ERROR - stderr - 38%|███▊ | 8567/22434 [8:52:11<9:52:40, 2.56s/it] +2025-02-05 18:59:52 - ERROR - stderr - +2025-02-05 18:59:52 - ERROR - stderr - +2025-02-05 18:59:52 - INFO - stdout - {'loss': 0.7409, 'grad_norm': 1.25574791431427, 'learning_rate': 1.4180040641917381e-05, 'epoch': 1.15} +2025-02-05 18:59:52 - ERROR - stderr - 38%|███▊ | 8567/22434 [8:52:12<9:52:40, 2.56s/it] +2025-02-05 18:59:54 - ERROR - stderr - 38%|███▊ | 8568/22434 [8:52:14<9:50:18, 2.55s/it] +2025-02-05 18:59:54 - ERROR - stderr - +2025-02-05 18:59:54 - ERROR - stderr - +2025-02-05 18:59:54 - INFO - stdout - {'loss': 0.6227, 'grad_norm': 1.024193525314331, 'learning_rate': 1.4178729033729812e-05, 'epoch': 1.15} +2025-02-05 18:59:54 - ERROR - stderr - 38%|███▊ | 8568/22434 [8:52:14<9:50:18, 2.55s/it] +2025-02-05 18:59:57 - ERROR - stderr - 38%|███▊ | 8569/22434 [8:52:17<9:49:01, 2.55s/it] +2025-02-05 18:59:57 - ERROR - stderr - +2025-02-05 18:59:57 - ERROR - stderr - +2025-02-05 18:59:57 - INFO - stdout - {'loss': 0.7083, 'grad_norm': 1.2052526473999023, 'learning_rate': 1.417741733844064e-05, 'epoch': 1.15} +2025-02-05 18:59:57 - ERROR - stderr - 38%|███▊ | 8569/22434 [8:52:17<9:49:01, 2.55s/it] +2025-02-05 18:59:59 - ERROR - stderr - 38%|███▊ | 8570/22434 [8:52:19<9:50:04, 2.55s/it] +2025-02-05 18:59:59 - ERROR - stderr - +2025-02-05 18:59:59 - ERROR - stderr - +2025-02-05 
18:59:59 - INFO - stdout - {'loss': 0.6438, 'grad_norm': 1.0336365699768066, 'learning_rate': 1.4176105556077198e-05, 'epoch': 1.15} +2025-02-05 18:59:59 - ERROR - stderr - 38%|███▊ | 8570/22434 [8:52:19<9:50:04, 2.55s/it] +2025-02-05 19:00:02 - ERROR - stderr - 38%|███▊ | 8571/22434 [8:52:22<9:46:48, 2.54s/it] +2025-02-05 19:00:02 - ERROR - stderr - +2025-02-05 19:00:02 - ERROR - stderr - +2025-02-05 19:00:02 - INFO - stdout - {'loss': 0.7706, 'grad_norm': 1.300750732421875, 'learning_rate': 1.4174793686666833e-05, 'epoch': 1.15} +2025-02-05 19:00:02 - ERROR - stderr - 38%|███▊ | 8571/22434 [8:52:22<9:46:48, 2.54s/it] +2025-02-05 19:00:04 - ERROR - stderr - 38%|███▊ | 8572/22434 [8:52:24<9:42:57, 2.52s/it] +2025-02-05 19:00:04 - ERROR - stderr - +2025-02-05 19:00:04 - ERROR - stderr - +2025-02-05 19:00:04 - INFO - stdout - {'loss': 0.7218, 'grad_norm': 1.1572023630142212, 'learning_rate': 1.4173481730236886e-05, 'epoch': 1.15} +2025-02-05 19:00:04 - ERROR - stderr - 38%|███▊ | 8572/22434 [8:52:24<9:42:57, 2.52s/it] +2025-02-05 19:00:07 - ERROR - stderr - 38%|███▊ | 8573/22434 [8:52:27<9:37:55, 2.50s/it] +2025-02-05 19:00:07 - ERROR - stderr - +2025-02-05 19:00:07 - ERROR - stderr - +2025-02-05 19:00:07 - INFO - stdout - {'loss': 0.6434, 'grad_norm': 1.131834626197815, 'learning_rate': 1.4172169686814707e-05, 'epoch': 1.15} +2025-02-05 19:00:07 - ERROR - stderr - 38%|███▊ | 8573/22434 [8:52:27<9:37:55, 2.50s/it] +2025-02-05 19:00:09 - ERROR - stderr - 38%|███▊ | 8574/22434 [8:52:29<9:36:17, 2.49s/it] +2025-02-05 19:00:09 - ERROR - stderr - +2025-02-05 19:00:09 - ERROR - stderr - +2025-02-05 19:00:09 - INFO - stdout - {'loss': 0.857, 'grad_norm': 1.2664762735366821, 'learning_rate': 1.4170857556427645e-05, 'epoch': 1.15} +2025-02-05 19:00:09 - ERROR - stderr - 38%|███▊ | 8574/22434 [8:52:29<9:36:17, 2.49s/it] +2025-02-05 19:00:12 - ERROR - stderr - 38%|███▊ | 8575/22434 [8:52:32<9:42:58, 2.52s/it] +2025-02-05 19:00:12 - ERROR - stderr - +2025-02-05 19:00:12 - ERROR 
- stderr - +2025-02-05 19:00:12 - INFO - stdout - {'loss': 0.7402, 'grad_norm': 1.1545804738998413, 'learning_rate': 1.4169545339103046e-05, 'epoch': 1.15} +2025-02-05 19:00:12 - ERROR - stderr - 38%|███▊ | 8575/22434 [8:52:32<9:42:58, 2.52s/it] +2025-02-05 19:00:14 - ERROR - stderr - 38%|███▊ | 8576/22434 [8:52:34<9:45:40, 2.54s/it] +2025-02-05 19:00:14 - ERROR - stderr - +2025-02-05 19:00:14 - ERROR - stderr - +2025-02-05 19:00:14 - INFO - stdout - {'loss': 0.7207, 'grad_norm': 1.1405287981033325, 'learning_rate': 1.4168233034868267e-05, 'epoch': 1.15} +2025-02-05 19:00:14 - ERROR - stderr - 38%|███▊ | 8576/22434 [8:52:34<9:45:40, 2.54s/it] +2025-02-05 19:00:17 - ERROR - stderr - 38%|███▊ | 8577/22434 [8:52:37<9:46:30, 2.54s/it] +2025-02-05 19:00:17 - ERROR - stderr - +2025-02-05 19:00:17 - ERROR - stderr - +2025-02-05 19:00:17 - INFO - stdout - {'loss': 0.6793, 'grad_norm': 1.0656049251556396, 'learning_rate': 1.4166920643750657e-05, 'epoch': 1.15} +2025-02-05 19:00:17 - ERROR - stderr - 38%|███▊ | 8577/22434 [8:52:37<9:46:30, 2.54s/it] +2025-02-05 19:00:19 - ERROR - stderr - 38%|███▊ | 8578/22434 [8:52:39<9:45:09, 2.53s/it] +2025-02-05 19:00:20 - ERROR - stderr - +2025-02-05 19:00:20 - ERROR - stderr - +2025-02-05 19:00:20 - INFO - stdout - {'loss': 0.7566, 'grad_norm': 1.1234911680221558, 'learning_rate': 1.4165608165777574e-05, 'epoch': 1.15} +2025-02-05 19:00:20 - ERROR - stderr - 38%|███▊ | 8578/22434 [8:52:39<9:45:09, 2.53s/it] +2025-02-05 19:00:22 - ERROR - stderr - 38%|███▊ | 8579/22434 [8:52:42<9:39:52, 2.51s/it] +2025-02-05 19:00:22 - ERROR - stderr - +2025-02-05 19:00:22 - ERROR - stderr - +2025-02-05 19:00:22 - INFO - stdout - {'loss': 0.7283, 'grad_norm': 1.0855439901351929, 'learning_rate': 1.4164295600976375e-05, 'epoch': 1.15} +2025-02-05 19:00:22 - ERROR - stderr - 38%|███▊ | 8579/22434 [8:52:42<9:39:52, 2.51s/it] +2025-02-05 19:00:25 - ERROR - stderr - 38%|███▊ | 8580/22434 [8:52:44<9:54:36, 2.58s/it] +2025-02-05 19:00:25 - ERROR - stderr - 
+2025-02-05 19:00:25 - ERROR - stderr - +2025-02-05 19:00:25 - INFO - stdout - {'loss': 0.677, 'grad_norm': 1.0276319980621338, 'learning_rate': 1.4162982949374416e-05, 'epoch': 1.15} +2025-02-05 19:00:25 - ERROR - stderr - 38%|███▊ | 8580/22434 [8:52:44<9:54:36, 2.58s/it] +2025-02-05 19:00:27 - ERROR - stderr - 38%|███▊ | 8581/22434 [8:52:47<9:45:55, 2.54s/it] +2025-02-05 19:00:27 - ERROR - stderr - +2025-02-05 19:00:27 - ERROR - stderr - +2025-02-05 19:00:27 - INFO - stdout - {'loss': 0.738, 'grad_norm': 1.1346845626831055, 'learning_rate': 1.4161670210999063e-05, 'epoch': 1.15} +2025-02-05 19:00:27 - ERROR - stderr - 38%|███▊ | 8581/22434 [8:52:47<9:45:55, 2.54s/it] +2025-02-05 19:00:30 - ERROR - stderr - 38%|███▊ | 8582/22434 [8:52:49<9:45:52, 2.54s/it] +2025-02-05 19:00:30 - ERROR - stderr - +2025-02-05 19:00:30 - ERROR - stderr - +2025-02-05 19:00:30 - INFO - stdout - {'loss': 0.8415, 'grad_norm': 1.270289421081543, 'learning_rate': 1.4160357385877678e-05, 'epoch': 1.15} +2025-02-05 19:00:30 - ERROR - stderr - 38%|███▊ | 8582/22434 [8:52:49<9:45:52, 2.54s/it] +2025-02-05 19:00:32 - ERROR - stderr - 38%|███▊ | 8583/22434 [8:52:52<9:51:33, 2.56s/it] +2025-02-05 19:00:32 - ERROR - stderr - +2025-02-05 19:00:32 - ERROR - stderr - +2025-02-05 19:00:32 - INFO - stdout - {'loss': 0.7379, 'grad_norm': 1.16777765750885, 'learning_rate': 1.4159044474037625e-05, 'epoch': 1.15} +2025-02-05 19:00:32 - ERROR - stderr - 38%|███▊ | 8583/22434 [8:52:52<9:51:33, 2.56s/it] +2025-02-05 19:00:35 - ERROR - stderr - 38%|███▊ | 8584/22434 [8:52:55<9:53:05, 2.57s/it] +2025-02-05 19:00:35 - ERROR - stderr - +2025-02-05 19:00:35 - ERROR - stderr - +2025-02-05 19:00:35 - INFO - stdout - {'loss': 0.7636, 'grad_norm': 1.1192541122436523, 'learning_rate': 1.4157731475506266e-05, 'epoch': 1.15} +2025-02-05 19:00:35 - ERROR - stderr - 38%|███▊ | 8584/22434 [8:52:55<9:53:05, 2.57s/it] +2025-02-05 19:00:38 - ERROR - stderr - 38%|███▊ | 8585/22434 [8:52:57<10:00:21, 2.60s/it] +2025-02-05
19:00:38 - ERROR - stderr - +2025-02-05 19:00:38 - ERROR - stderr - +2025-02-05 19:00:38 - INFO - stdout - {'loss': 0.6861, 'grad_norm': 1.1054061651229858, 'learning_rate': 1.4156418390310976e-05, 'epoch': 1.15} +2025-02-05 19:00:38 - ERROR - stderr - 38%|███▊ | 8585/22434 [8:52:57<10:00:21, 2.60s/it] +2025-02-05 19:00:40 - ERROR - stderr - 38%|███▊ | 8586/22434 [8:53:00<9:59:21, 2.60s/it] +2025-02-05 19:00:40 - ERROR - stderr - +2025-02-05 19:00:40 - ERROR - stderr - +2025-02-05 19:00:40 - INFO - stdout - {'loss': 0.6597, 'grad_norm': 1.2601593732833862, 'learning_rate': 1.4155105218479121e-05, 'epoch': 1.15} +2025-02-05 19:00:40 - ERROR - stderr - 38%|███▊ | 8586/22434 [8:53:00<9:59:21, 2.60s/it] +2025-02-05 19:00:43 - ERROR - stderr - 38%|███▊ | 8587/22434 [8:53:02<9:55:28, 2.58s/it] +2025-02-05 19:00:43 - ERROR - stderr - +2025-02-05 19:00:43 - ERROR - stderr - +2025-02-05 19:00:43 - INFO - stdout - {'loss': 0.708, 'grad_norm': 1.1186531782150269, 'learning_rate': 1.4153791960038075e-05, 'epoch': 1.15} +2025-02-05 19:00:43 - ERROR - stderr - 38%|███▊ | 8587/22434 [8:53:02<9:55:28, 2.58s/it] +2025-02-05 19:00:45 - ERROR - stderr - 38%|███▊ | 8588/22434 [8:53:05<9:48:32, 2.55s/it] +2025-02-05 19:00:45 - ERROR - stderr - +2025-02-05 19:00:45 - ERROR - stderr - +2025-02-05 19:00:45 - INFO - stdout - {'loss': 0.8372, 'grad_norm': 1.1941864490509033, 'learning_rate': 1.4152478615015209e-05, 'epoch': 1.15} +2025-02-05 19:00:45 - ERROR - stderr - 38%|███▊ | 8588/22434 [8:53:05<9:48:32, 2.55s/it] +2025-02-05 19:00:48 - ERROR - stderr - 38%|███▊ | 8589/22434 [8:53:07<9:46:39, 2.54s/it] +2025-02-05 19:00:48 - ERROR - stderr - +2025-02-05 19:00:48 - ERROR - stderr - +2025-02-05 19:00:48 - INFO - stdout - {'loss': 0.7483, 'grad_norm': 1.214064359664917, 'learning_rate': 1.4151165183437899e-05, 'epoch': 1.15} +2025-02-05 19:00:48 - ERROR - stderr - 38%|███▊ | 8589/22434 [8:53:07<9:46:39, 2.54s/it] +2025-02-05 19:00:50 - ERROR - stderr - 38%|███▊ | 8590/22434 
[8:53:10<9:43:05, 2.53s/it] +2025-02-05 19:00:50 - ERROR - stderr - +2025-02-05 19:00:50 - ERROR - stderr - +2025-02-05 19:00:50 - INFO - stdout - {'loss': 0.6955, 'grad_norm': 1.2008904218673706, 'learning_rate': 1.4149851665333525e-05, 'epoch': 1.15} +2025-02-05 19:00:50 - ERROR - stderr - 38%|███▊ | 8590/22434 [8:53:10<9:43:05, 2.53s/it] +2025-02-05 19:00:53 - ERROR - stderr - 38%|███▊ | 8591/22434 [8:53:12<9:38:15, 2.51s/it] +2025-02-05 19:00:53 - ERROR - stderr - +2025-02-05 19:00:53 - ERROR - stderr - +2025-02-05 19:00:53 - INFO - stdout - {'loss': 0.6934, 'grad_norm': 1.094260573387146, 'learning_rate': 1.4148538060729463e-05, 'epoch': 1.15} +2025-02-05 19:00:53 - ERROR - stderr - 38%|███▊ | 8591/22434 [8:53:12<9:38:15, 2.51s/it] +2025-02-05 19:00:55 - ERROR - stderr - 38%|███▊ | 8592/22434 [8:53:15<9:37:31, 2.50s/it] +2025-02-05 19:00:55 - ERROR - stderr - +2025-02-05 19:00:55 - ERROR - stderr - +2025-02-05 19:00:55 - INFO - stdout - {'loss': 0.7467, 'grad_norm': 1.1319067478179932, 'learning_rate': 1.4147224369653094e-05, 'epoch': 1.15} +2025-02-05 19:00:55 - ERROR - stderr - 38%|███▊ | 8592/22434 [8:53:15<9:37:31, 2.50s/it] +2025-02-05 19:00:58 - ERROR - stderr - 38%|███▊ | 8593/22434 [8:53:17<9:33:15, 2.49s/it] +2025-02-05 19:00:58 - ERROR - stderr - +2025-02-05 19:00:58 - ERROR - stderr - +2025-02-05 19:00:58 - INFO - stdout - {'loss': 0.715, 'grad_norm': 1.1795001029968262, 'learning_rate': 1.4145910592131799e-05, 'epoch': 1.15} +2025-02-05 19:00:58 - ERROR - stderr - 38%|███▊ | 8593/22434 [8:53:17<9:33:15, 2.49s/it] +2025-02-05 19:01:00 - ERROR - stderr - 38%|███▊ | 8594/22434 [8:53:20<9:33:35, 2.49s/it] +2025-02-05 19:01:00 - ERROR - stderr - +2025-02-05 19:01:00 - ERROR - stderr - +2025-02-05 19:01:00 - INFO - stdout - {'loss': 0.6834, 'grad_norm': 1.0578622817993164, 'learning_rate': 1.4144596728192972e-05, 'epoch': 1.15} +2025-02-05 19:01:00 - ERROR - stderr - 38%|███▊ | 8594/22434 [8:53:20<9:33:35, 2.49s/it] +2025-02-05 19:01:03 - ERROR - stderr 
- 38%|███▊ | 8595/22434 [8:53:22<9:33:31, 2.49s/it] +2025-02-05 19:01:03 - ERROR - stderr - +2025-02-05 19:01:03 - ERROR - stderr - +2025-02-05 19:01:03 - INFO - stdout - {'loss': 0.6848, 'grad_norm': 1.2457751035690308, 'learning_rate': 1.4143282777863987e-05, 'epoch': 1.15} +2025-02-05 19:01:03 - ERROR - stderr - 38%|███▊ | 8595/22434 [8:53:22<9:33:31, 2.49s/it] +2025-02-05 19:01:05 - ERROR - stderr - 38%|███▊ | 8596/22434 [8:53:25<9:30:25, 2.47s/it] +2025-02-05 19:01:05 - ERROR - stderr - +2025-02-05 19:01:05 - ERROR - stderr - +2025-02-05 19:01:05 - INFO - stdout - {'loss': 0.7924, 'grad_norm': 1.2480480670928955, 'learning_rate': 1.4141968741172239e-05, 'epoch': 1.15} +2025-02-05 19:01:05 - ERROR - stderr - 38%|███▊ | 8596/22434 [8:53:25<9:30:25, 2.47s/it] +2025-02-05 19:01:07 - ERROR - stderr - 38%|███▊ | 8597/22434 [8:53:27<9:33:29, 2.49s/it] +2025-02-05 19:01:08 - ERROR - stderr - +2025-02-05 19:01:08 - ERROR - stderr - +2025-02-05 19:01:08 - INFO - stdout - {'loss': 0.7697, 'grad_norm': 1.327481746673584, 'learning_rate': 1.4140654618145115e-05, 'epoch': 1.15} +2025-02-05 19:01:08 - ERROR - stderr - 38%|███▊ | 8597/22434 [8:53:27<9:33:29, 2.49s/it] +2025-02-05 19:01:10 - ERROR - stderr - 38%|███▊ | 8598/22434 [8:53:30<9:31:57, 2.48s/it] +2025-02-05 19:01:10 - ERROR - stderr - +2025-02-05 19:01:10 - ERROR - stderr - +2025-02-05 19:01:10 - INFO - stdout - {'loss': 0.7421, 'grad_norm': 1.156846284866333, 'learning_rate': 1.4139340408810011e-05, 'epoch': 1.15} +2025-02-05 19:01:10 - ERROR - stderr - 38%|███▊ | 8598/22434 [8:53:30<9:31:57, 2.48s/it] +2025-02-05 19:01:13 - ERROR - stderr - 38%|███▊ | 8599/22434 [8:53:32<9:39:23, 2.51s/it] +2025-02-05 19:01:13 - ERROR - stderr - +2025-02-05 19:01:13 - ERROR - stderr - +2025-02-05 19:01:13 - INFO - stdout - {'loss': 0.6874, 'grad_norm': 1.1139090061187744, 'learning_rate': 1.4138026113194312e-05, 'epoch': 1.15} +2025-02-05 19:01:13 - ERROR - stderr - 38%|███▊ | 8599/22434 [8:53:32<9:39:23, 2.51s/it] +2025-02-05 
19:01:15 - ERROR - stderr - 38%|███▊ | 8600/22434 [8:53:35<9:41:39, 2.52s/it] +2025-02-05 19:01:15 - ERROR - stderr - +2025-02-05 19:01:15 - ERROR - stderr - +2025-02-05 19:01:15 - INFO - stdout - {'loss': 0.755, 'grad_norm': 1.0461763143539429, 'learning_rate': 1.413671173132542e-05, 'epoch': 1.15} +2025-02-05 19:01:15 - ERROR - stderr - 38%|███▊ | 8600/22434 [8:53:35<9:41:39, 2.52s/it] +2025-02-05 19:01:18 - ERROR - stderr - 38%|███▊ | 8601/22434 [8:53:38<9:51:32, 2.57s/it] +2025-02-05 19:01:18 - ERROR - stderr - +2025-02-05 19:01:18 - ERROR - stderr - +2025-02-05 19:01:18 - INFO - stdout - {'loss': 0.8022, 'grad_norm': 1.1975128650665283, 'learning_rate': 1.413539726323073e-05, 'epoch': 1.15} +2025-02-05 19:01:18 - ERROR - stderr - 38%|███▊ | 8601/22434 [8:53:38<9:51:32, 2.57s/it] +2025-02-05 19:01:20 - ERROR - stderr - 38%|███▊ | 8602/22434 [8:53:40<9:53:13, 2.57s/it] +2025-02-05 19:01:20 - ERROR - stderr - +2025-02-05 19:01:20 - ERROR - stderr - +2025-02-05 19:01:20 - INFO - stdout - {'loss': 0.6982, 'grad_norm': 1.1955822706222534, 'learning_rate': 1.4134082708937644e-05, 'epoch': 1.15} +2025-02-05 19:01:20 - ERROR - stderr - 38%|███▊ | 8602/22434 [8:53:40<9:53:13, 2.57s/it] +2025-02-05 19:01:23 - ERROR - stderr - 38%|███▊ | 8603/22434 [8:53:43<9:44:27, 2.54s/it] +2025-02-05 19:01:23 - ERROR - stderr - +2025-02-05 19:01:23 - ERROR - stderr - +2025-02-05 19:01:23 - INFO - stdout - {'loss': 0.8329, 'grad_norm': 1.3038724660873413, 'learning_rate': 1.413276806847356e-05, 'epoch': 1.15} +2025-02-05 19:01:23 - ERROR - stderr - 38%|███▊ | 8603/22434 [8:53:43<9:44:27, 2.54s/it] +2025-02-05 19:01:25 - ERROR - stderr - 38%|███▊ | 8604/22434 [8:53:45<9:41:34, 2.52s/it] +2025-02-05 19:01:25 - ERROR - stderr - +2025-02-05 19:01:25 - ERROR - stderr - +2025-02-05 19:01:25 - INFO - stdout - {'loss': 0.8053, 'grad_norm': 1.2546051740646362, 'learning_rate': 1.4131453341865877e-05, 'epoch': 1.15} +2025-02-05 19:01:25 - ERROR - stderr - 38%|███▊ | 8604/22434 [8:53:45<9:41:34, 
2.52s/it] +2025-02-05 19:01:28 - ERROR - stderr - 38%|███▊ | 8605/22434 [8:53:47<9:35:53, 2.50s/it] +2025-02-05 19:01:28 - ERROR - stderr - +2025-02-05 19:01:28 - ERROR - stderr - +2025-02-05 19:01:28 - INFO - stdout - {'loss': 0.7438, 'grad_norm': 1.122104287147522, 'learning_rate': 1.4130138529142003e-05, 'epoch': 1.15} +2025-02-05 19:01:28 - ERROR - stderr - 38%|███▊ | 8605/22434 [8:53:48<9:35:53, 2.50s/it] +2025-02-05 19:01:30 - ERROR - stderr - 38%|███▊ | 8606/22434 [8:53:50<9:42:06, 2.53s/it] +2025-02-05 19:01:30 - ERROR - stderr - +2025-02-05 19:01:30 - ERROR - stderr - +2025-02-05 19:01:30 - INFO - stdout - {'loss': 0.7037, 'grad_norm': 1.0660064220428467, 'learning_rate': 1.4128823630329345e-05, 'epoch': 1.15} +2025-02-05 19:01:30 - ERROR - stderr - 38%|███▊ | 8606/22434 [8:53:50<9:42:06, 2.53s/it] +2025-02-05 19:01:33 - ERROR - stderr - 38%|███▊ | 8607/22434 [8:53:53<9:41:46, 2.52s/it] +2025-02-05 19:01:33 - ERROR - stderr - +2025-02-05 19:01:33 - ERROR - stderr - +2025-02-05 19:01:33 - INFO - stdout - {'loss': 0.8063, 'grad_norm': 1.1916389465332031, 'learning_rate': 1.4127508645455308e-05, 'epoch': 1.15} +2025-02-05 19:01:33 - ERROR - stderr - 38%|███▊ | 8607/22434 [8:53:53<9:41:46, 2.52s/it] +2025-02-05 19:01:35 - ERROR - stderr - 38%|███▊ | 8608/22434 [8:53:55<9:37:27, 2.51s/it] +2025-02-05 19:01:35 - ERROR - stderr - +2025-02-05 19:01:35 - ERROR - stderr - +2025-02-05 19:01:35 - INFO - stdout - {'loss': 0.785, 'grad_norm': 1.2864009141921997, 'learning_rate': 1.4126193574547303e-05, 'epoch': 1.15} +2025-02-05 19:01:35 - ERROR - stderr - 38%|███▊ | 8608/22434 [8:53:55<9:37:27, 2.51s/it] +2025-02-05 19:01:38 - ERROR - stderr - 38%|███▊ | 8609/22434 [8:53:58<9:37:03, 2.50s/it] +2025-02-05 19:01:38 - ERROR - stderr - +2025-02-05 19:01:38 - ERROR - stderr - +2025-02-05 19:01:38 - INFO - stdout - {'loss': 0.7277, 'grad_norm': 1.157697081565857, 'learning_rate': 1.4124878417632741e-05, 'epoch': 1.15} +2025-02-05 19:01:38 - ERROR - stderr - 38%|███▊ | 
8609/22434 [8:53:58<9:37:03, 2.50s/it] +2025-02-05 19:01:40 - ERROR - stderr - 38%|███▊ | 8610/22434 [8:54:00<9:40:54, 2.52s/it] +2025-02-05 19:01:40 - ERROR - stderr - +2025-02-05 19:01:40 - ERROR - stderr - +2025-02-05 19:01:40 - INFO - stdout - {'loss': 0.7306, 'grad_norm': 1.3003860712051392, 'learning_rate': 1.4123563174739036e-05, 'epoch': 1.15} +2025-02-05 19:01:40 - ERROR - stderr - 38%|███▊ | 8610/22434 [8:54:00<9:40:54, 2.52s/it] +2025-02-05 19:01:43 - ERROR - stderr - 38%|███▊ | 8611/22434 [8:54:03<9:36:45, 2.50s/it] +2025-02-05 19:01:43 - ERROR - stderr - +2025-02-05 19:01:43 - ERROR - stderr - +2025-02-05 19:01:43 - INFO - stdout - {'loss': 0.6919, 'grad_norm': 1.0955166816711426, 'learning_rate': 1.4122247845893604e-05, 'epoch': 1.15} +2025-02-05 19:01:43 - ERROR - stderr - 38%|███▊ | 8611/22434 [8:54:03<9:36:45, 2.50s/it] +2025-02-05 19:01:45 - ERROR - stderr - 38%|███▊ | 8612/22434 [8:54:05<9:34:19, 2.49s/it] +2025-02-05 19:01:45 - ERROR - stderr - +2025-02-05 19:01:45 - ERROR - stderr - +2025-02-05 19:01:45 - INFO - stdout - {'loss': 0.6377, 'grad_norm': 0.992414653301239, 'learning_rate': 1.4120932431123858e-05, 'epoch': 1.15} +2025-02-05 19:01:45 - ERROR - stderr - 38%|███▊ | 8612/22434 [8:54:05<9:34:19, 2.49s/it] +2025-02-05 19:01:48 - ERROR - stderr - 38%|███▊ | 8613/22434 [8:54:08<9:36:58, 2.50s/it] +2025-02-05 19:01:48 - ERROR - stderr - +2025-02-05 19:01:48 - ERROR - stderr - +2025-02-05 19:01:48 - INFO - stdout - {'loss': 0.6997, 'grad_norm': 1.1841528415679932, 'learning_rate': 1.4119616930457219e-05, 'epoch': 1.15} +2025-02-05 19:01:48 - ERROR - stderr - 38%|███▊ | 8613/22434 [8:54:08<9:36:58, 2.50s/it] +2025-02-05 19:01:51 - ERROR - stderr - 38%|███▊ | 8614/22434 [8:54:10<9:54:25, 2.58s/it] +2025-02-05 19:01:51 - ERROR - stderr - +2025-02-05 19:01:51 - ERROR - stderr - +2025-02-05 19:01:51 - INFO - stdout - {'loss': 0.7506, 'grad_norm': 1.1276887655258179, 'learning_rate': 1.4118301343921109e-05, 'epoch': 1.15} +2025-02-05 19:01:51 - 
ERROR - stderr - 38%|███▊ | 8614/22434 [8:54:10<9:54:25, 2.58s/it] +2025-02-05 19:01:53 - ERROR - stderr - 38%|███▊ | 8615/22434 [8:54:13<9:53:50, 2.58s/it] +2025-02-05 19:01:53 - ERROR - stderr - +2025-02-05 19:01:53 - ERROR - stderr - +2025-02-05 19:01:53 - INFO - stdout - {'loss': 0.7302, 'grad_norm': 1.2013367414474487, 'learning_rate': 1.4116985671542946e-05, 'epoch': 1.15} +2025-02-05 19:01:53 - ERROR - stderr - 38%|███▊ | 8615/22434 [8:54:13<9:53:50, 2.58s/it] +2025-02-05 19:01:56 - ERROR - stderr - 38%|███▊ | 8616/22434 [8:54:15<9:51:45, 2.57s/it] +2025-02-05 19:01:56 - ERROR - stderr - +2025-02-05 19:01:56 - ERROR - stderr - +2025-02-05 19:01:56 - INFO - stdout - {'loss': 0.7535, 'grad_norm': 1.2439512014389038, 'learning_rate': 1.4115669913350156e-05, 'epoch': 1.15} +2025-02-05 19:01:56 - ERROR - stderr - 38%|███▊ | 8616/22434 [8:54:16<9:51:45, 2.57s/it] +2025-02-05 19:01:58 - ERROR - stderr - 38%|███▊ | 8617/22434 [8:54:18<9:49:38, 2.56s/it] +2025-02-05 19:01:58 - ERROR - stderr - +2025-02-05 19:01:58 - ERROR - stderr - +2025-02-05 19:01:58 - INFO - stdout - {'loss': 0.7438, 'grad_norm': 1.1796938180923462, 'learning_rate': 1.4114354069370166e-05, 'epoch': 1.15} +2025-02-05 19:01:58 - ERROR - stderr - 38%|███▊ | 8617/22434 [8:54:18<9:49:38, 2.56s/it] +2025-02-05 19:02:01 - ERROR - stderr - 38%|███▊ | 8618/22434 [8:54:20<9:42:01, 2.53s/it] +2025-02-05 19:02:01 - ERROR - stderr - +2025-02-05 19:02:01 - ERROR - stderr - +2025-02-05 19:02:01 - INFO - stdout - {'loss': 0.696, 'grad_norm': 1.0779612064361572, 'learning_rate': 1.4113038139630404e-05, 'epoch': 1.15} +2025-02-05 19:02:01 - ERROR - stderr - 38%|███▊ | 8618/22434 [8:54:21<9:42:01, 2.53s/it] +2025-02-05 19:02:03 - ERROR - stderr - 38%|███▊ | 8619/22434 [8:54:23<9:39:14, 2.52s/it] +2025-02-05 19:02:03 - ERROR - stderr - +2025-02-05 19:02:03 - ERROR - stderr - +2025-02-05 19:02:03 - INFO - stdout - {'loss': 0.7136, 'grad_norm': 1.2216672897338867, 'learning_rate': 1.4111722124158295e-05, 'epoch': 
1.15} +2025-02-05 19:02:03 - ERROR - stderr - 38%|███▊ | 8619/22434 [8:54:23<9:39:14, 2.52s/it] +2025-02-05 19:02:06 - ERROR - stderr - 38%|███▊ | 8620/22434 [8:54:26<9:41:33, 2.53s/it] +2025-02-05 19:02:06 - ERROR - stderr - +2025-02-05 19:02:06 - ERROR - stderr - +2025-02-05 19:02:06 - INFO - stdout - {'loss': 0.6657, 'grad_norm': 1.1894123554229736, 'learning_rate': 1.4110406022981274e-05, 'epoch': 1.15} +2025-02-05 19:02:06 - ERROR - stderr - 38%|███▊ | 8620/22434 [8:54:26<9:41:33, 2.53s/it] +2025-02-05 19:02:08 - ERROR - stderr - 38%|███▊ | 8621/22434 [8:54:28<9:43:13, 2.53s/it] +2025-02-05 19:02:08 - ERROR - stderr - +2025-02-05 19:02:08 - ERROR - stderr - +2025-02-05 19:02:08 - INFO - stdout - {'loss': 0.6761, 'grad_norm': 1.1297942399978638, 'learning_rate': 1.4109089836126773e-05, 'epoch': 1.15} +2025-02-05 19:02:08 - ERROR - stderr - 38%|███▊ | 8621/22434 [8:54:28<9:43:13, 2.53s/it] +2025-02-05 19:02:11 - ERROR - stderr - 38%|███▊ | 8622/22434 [8:54:31<9:51:02, 2.57s/it] +2025-02-05 19:02:11 - ERROR - stderr - +2025-02-05 19:02:11 - ERROR - stderr - +2025-02-05 19:02:11 - INFO - stdout - {'loss': 0.8332, 'grad_norm': 1.3036282062530518, 'learning_rate': 1.4107773563622227e-05, 'epoch': 1.15} +2025-02-05 19:02:11 - ERROR - stderr - 38%|███▊ | 8622/22434 [8:54:31<9:51:02, 2.57s/it] +2025-02-05 19:02:13 - ERROR - stderr - 38%|███▊ | 8623/22434 [8:54:33<9:48:31, 2.56s/it] +2025-02-05 19:02:14 - ERROR - stderr - +2025-02-05 19:02:14 - ERROR - stderr - +2025-02-05 19:02:14 - INFO - stdout - {'loss': 0.8326, 'grad_norm': 1.208046317100525, 'learning_rate': 1.410645720549507e-05, 'epoch': 1.15} +2025-02-05 19:02:14 - ERROR - stderr - 38%|███▊ | 8623/22434 [8:54:33<9:48:31, 2.56s/it] +2025-02-05 19:02:16 - ERROR - stderr - 38%|███▊ | 8624/22434 [8:54:36<9:48:10, 2.56s/it] +2025-02-05 19:02:16 - ERROR - stderr - +2025-02-05 19:02:16 - ERROR - stderr - +2025-02-05 19:02:16 - INFO - stdout - {'loss': 0.6624, 'grad_norm': 1.1304584741592407, 'learning_rate': 
1.4105140761772745e-05, 'epoch': 1.15} +2025-02-05 19:02:16 - ERROR - stderr - 38%|███▊ | 8624/22434 [8:54:36<9:48:10, 2.56s/it] +2025-02-05 19:02:19 - ERROR - stderr - 38%|███▊ | 8625/22434 [8:54:38<9:52:25, 2.57s/it] +2025-02-05 19:02:19 - ERROR - stderr - +2025-02-05 19:02:19 - ERROR - stderr - +2025-02-05 19:02:19 - INFO - stdout - {'loss': 0.7277, 'grad_norm': 1.1081061363220215, 'learning_rate': 1.4103824232482686e-05, 'epoch': 1.15} +2025-02-05 19:02:19 - ERROR - stderr - 38%|███▊ | 8625/22434 [8:54:38<9:52:25, 2.57s/it] +2025-02-05 19:02:21 - ERROR - stderr - 38%|███▊ | 8626/22434 [8:54:41<9:49:31, 2.56s/it] +2025-02-05 19:02:21 - ERROR - stderr - +2025-02-05 19:02:21 - ERROR - stderr - +2025-02-05 19:02:21 - INFO - stdout - {'loss': 0.6668, 'grad_norm': 1.1539075374603271, 'learning_rate': 1.4102507617652337e-05, 'epoch': 1.15} +2025-02-05 19:02:21 - ERROR - stderr - 38%|███▊ | 8626/22434 [8:54:41<9:49:31, 2.56s/it] +2025-02-05 19:02:24 - ERROR - stderr - 38%|███▊ | 8627/22434 [8:54:43<9:41:07, 2.53s/it] +2025-02-05 19:02:24 - ERROR - stderr - +2025-02-05 19:02:24 - ERROR - stderr - +2025-02-05 19:02:24 - INFO - stdout - {'loss': 0.7189, 'grad_norm': 1.263707160949707, 'learning_rate': 1.4101190917309144e-05, 'epoch': 1.15} +2025-02-05 19:02:24 - ERROR - stderr - 38%|███▊ | 8627/22434 [8:54:43<9:41:07, 2.53s/it] +2025-02-05 19:02:26 - ERROR - stderr - 38%|███▊ | 8628/22434 [8:54:46<9:42:41, 2.53s/it] +2025-02-05 19:02:26 - ERROR - stderr - +2025-02-05 19:02:26 - ERROR - stderr - +2025-02-05 19:02:26 - INFO - stdout - {'loss': 0.7739, 'grad_norm': 1.2170530557632446, 'learning_rate': 1.4099874131480551e-05, 'epoch': 1.15} +2025-02-05 19:02:26 - ERROR - stderr - 38%|███▊ | 8628/22434 [8:54:46<9:42:41, 2.53s/it] +2025-02-05 19:02:29 - ERROR - stderr - 38%|███▊ | 8629/22434 [8:54:48<9:38:49, 2.52s/it] +2025-02-05 19:02:29 - ERROR - stderr - +2025-02-05 19:02:29 - ERROR - stderr - +2025-02-05 19:02:29 - INFO - stdout - {'loss': 0.7092, 'grad_norm': 
1.2079498767852783, 'learning_rate': 1.4098557260194007e-05, 'epoch': 1.15} +2025-02-05 19:02:29 - ERROR - stderr - 38%|███▊ | 8629/22434 [8:54:48<9:38:49, 2.52s/it] +2025-02-05 19:02:31 - ERROR - stderr - 38%|███▊ | 8630/22434 [8:54:51<9:37:26, 2.51s/it] +2025-02-05 19:02:31 - ERROR - stderr - +2025-02-05 19:02:31 - ERROR - stderr - +2025-02-05 19:02:31 - INFO - stdout - {'loss': 0.6964, 'grad_norm': 1.214830756187439, 'learning_rate': 1.4097240303476955e-05, 'epoch': 1.15} +2025-02-05 19:02:31 - ERROR - stderr - 38%|███▊ | 8630/22434 [8:54:51<9:37:26, 2.51s/it] +2025-02-05 19:02:34 - ERROR - stderr - 38%|███▊ | 8631/22434 [8:54:53<9:32:59, 2.49s/it] +2025-02-05 19:02:34 - ERROR - stderr - +2025-02-05 19:02:34 - ERROR - stderr - +2025-02-05 19:02:34 - INFO - stdout - {'loss': 0.7115, 'grad_norm': 1.134867787361145, 'learning_rate': 1.409592326135685e-05, 'epoch': 1.15} +2025-02-05 19:02:34 - ERROR - stderr - 38%|███▊ | 8631/22434 [8:54:53<9:32:59, 2.49s/it] +2025-02-05 19:02:36 - ERROR - stderr - 38%|███▊ | 8632/22434 [8:54:56<9:33:20, 2.49s/it] +2025-02-05 19:02:36 - ERROR - stderr - +2025-02-05 19:02:36 - ERROR - stderr - +2025-02-05 19:02:36 - INFO - stdout - {'loss': 0.7004, 'grad_norm': 1.0751886367797852, 'learning_rate': 1.4094606133861143e-05, 'epoch': 1.15} +2025-02-05 19:02:36 - ERROR - stderr - 38%|███▊ | 8632/22434 [8:54:56<9:33:20, 2.49s/it] +2025-02-05 19:02:39 - ERROR - stderr - 38%|███▊ | 8633/22434 [8:54:58<9:34:47, 2.50s/it] +2025-02-05 19:02:39 - ERROR - stderr - +2025-02-05 19:02:39 - ERROR - stderr - +2025-02-05 19:02:39 - INFO - stdout - {'loss': 0.8383, 'grad_norm': 1.2977218627929688, 'learning_rate': 1.4093288921017292e-05, 'epoch': 1.15} +2025-02-05 19:02:39 - ERROR - stderr - 38%|███▊ | 8633/22434 [8:54:58<9:34:47, 2.50s/it] +2025-02-05 19:02:41 - ERROR - stderr - 38%|███▊ | 8634/22434 [8:55:01<9:36:23, 2.51s/it] +2025-02-05 19:02:41 - ERROR - stderr - +2025-02-05 19:02:41 - ERROR - stderr - +2025-02-05 19:02:41 - INFO - stdout - 
{'loss': 0.7448, 'grad_norm': 1.180029273033142, 'learning_rate': 1.4091971622852751e-05, 'epoch': 1.15} +2025-02-05 19:02:41 - ERROR - stderr - 38%|███▊ | 8634/22434 [8:55:01<9:36:23, 2.51s/it] +2025-02-05 19:02:44 - ERROR - stderr - 38%|███▊ | 8635/22434 [8:55:03<9:39:29, 2.52s/it] +2025-02-05 19:02:44 - ERROR - stderr - +2025-02-05 19:02:44 - ERROR - stderr - +2025-02-05 19:02:44 - INFO - stdout - {'loss': 0.7136, 'grad_norm': 1.1232386827468872, 'learning_rate': 1.4090654239394977e-05, 'epoch': 1.15} +2025-02-05 19:02:44 - ERROR - stderr - 38%|███▊ | 8635/22434 [8:55:03<9:39:29, 2.52s/it] +2025-02-05 19:02:46 - ERROR - stderr - 38%|███▊ | 8636/22434 [8:55:06<9:34:35, 2.50s/it] +2025-02-05 19:02:46 - ERROR - stderr - +2025-02-05 19:02:46 - ERROR - stderr - +2025-02-05 19:02:46 - INFO - stdout - {'loss': 0.6868, 'grad_norm': 1.1235896348953247, 'learning_rate': 1.4089336770671427e-05, 'epoch': 1.15} +2025-02-05 19:02:46 - ERROR - stderr - 38%|███▊ | 8636/22434 [8:55:06<9:34:35, 2.50s/it] +2025-02-05 19:02:49 - ERROR - stderr - 38%|███▊ | 8637/22434 [8:55:08<9:33:58, 2.50s/it] +2025-02-05 19:02:49 - ERROR - stderr - +2025-02-05 19:02:49 - ERROR - stderr - +2025-02-05 19:02:49 - INFO - stdout - {'loss': 0.8026, 'grad_norm': 1.216508388519287, 'learning_rate': 1.4088019216709568e-05, 'epoch': 1.15} +2025-02-05 19:02:49 - ERROR - stderr - 38%|███▊ | 8637/22434 [8:55:08<9:33:58, 2.50s/it] +2025-02-05 19:02:51 - ERROR - stderr - 39%|███▊ | 8638/22434 [8:55:11<9:56:44, 2.60s/it] +2025-02-05 19:02:51 - ERROR - stderr - +2025-02-05 19:02:51 - ERROR - stderr - +2025-02-05 19:02:51 - INFO - stdout - {'loss': 0.6738, 'grad_norm': 1.2002564668655396, 'learning_rate': 1.4086701577536857e-05, 'epoch': 1.16} +2025-02-05 19:02:51 - ERROR - stderr - 39%|███▊ | 8638/22434 [8:55:11<9:56:44, 2.60s/it] +2025-02-05 19:02:54 - ERROR - stderr - 39%|███▊ | 8639/22434 [8:55:14<9:52:35, 2.58s/it] +2025-02-05 19:02:54 - ERROR - stderr - +2025-02-05 19:02:54 - ERROR - stderr - +2025-02-05 
19:02:54 - INFO - stdout - {'loss': 0.7282, 'grad_norm': 1.101381540298462, 'learning_rate': 1.4085383853180762e-05, 'epoch': 1.16} +2025-02-05 19:02:54 - ERROR - stderr - 39%|███▊ | 8639/22434 [8:55:14<9:52:35, 2.58s/it] +2025-02-05 19:02:56 - ERROR - stderr - 39%|███▊ | 8640/22434 [8:55:16<9:44:46, 2.54s/it] +2025-02-05 19:02:56 - ERROR - stderr - +2025-02-05 19:02:56 - ERROR - stderr - +2025-02-05 19:02:56 - INFO - stdout - {'loss': 0.6756, 'grad_norm': 1.0521728992462158, 'learning_rate': 1.4084066043668753e-05, 'epoch': 1.16} +2025-02-05 19:02:56 - ERROR - stderr - 39%|███▊ | 8640/22434 [8:55:16<9:44:46, 2.54s/it] +2025-02-05 19:02:59 - ERROR - stderr - 39%|███▊ | 8641/22434 [8:55:19<9:43:06, 2.54s/it] +2025-02-05 19:02:59 - ERROR - stderr - +2025-02-05 19:02:59 - ERROR - stderr - +2025-02-05 19:02:59 - INFO - stdout - {'loss': 0.6464, 'grad_norm': 1.0231339931488037, 'learning_rate': 1.4082748149028294e-05, 'epoch': 1.16} +2025-02-05 19:02:59 - ERROR - stderr - 39%|███▊ | 8641/22434 [8:55:19<9:43:06, 2.54s/it] +2025-02-05 19:03:01 - ERROR - stderr - 39%|███▊ | 8642/22434 [8:55:21<9:41:35, 2.53s/it] +2025-02-05 19:03:02 - ERROR - stderr - +2025-02-05 19:03:02 - ERROR - stderr - +2025-02-05 19:03:02 - INFO - stdout - {'loss': 0.7474, 'grad_norm': 1.219417929649353, 'learning_rate': 1.4081430169286859e-05, 'epoch': 1.16} +2025-02-05 19:03:02 - ERROR - stderr - 39%|███▊ | 8642/22434 [8:55:21<9:41:35, 2.53s/it] +2025-02-05 19:03:04 - ERROR - stderr - 39%|███▊ | 8643/22434 [8:55:24<9:42:49, 2.54s/it] +2025-02-05 19:03:04 - ERROR - stderr - +2025-02-05 19:03:04 - ERROR - stderr - +2025-02-05 19:03:04 - INFO - stdout - {'loss': 0.6851, 'grad_norm': 1.141446590423584, 'learning_rate': 1.4080112104471914e-05, 'epoch': 1.16} +2025-02-05 19:03:04 - ERROR - stderr - 39%|███▊ | 8643/22434 [8:55:24<9:42:49, 2.54s/it] +2025-02-05 19:03:07 - ERROR - stderr - 39%|███▊ | 8644/22434 [8:55:26<9:44:38, 2.54s/it] +2025-02-05 19:03:07 - ERROR - stderr - +2025-02-05 19:03:07 - ERROR 
- stderr - +2025-02-05 19:03:07 - INFO - stdout - {'loss': 0.6654, 'grad_norm': 1.0480694770812988, 'learning_rate': 1.4078793954610937e-05, 'epoch': 1.16} +2025-02-05 19:03:07 - ERROR - stderr - 39%|███▊ | 8644/22434 [8:55:26<9:44:38, 2.54s/it] +2025-02-05 19:03:09 - ERROR - stderr - 39%|███▊ | 8645/22434 [8:55:29<9:40:32, 2.53s/it] +2025-02-05 19:03:09 - ERROR - stderr - +2025-02-05 19:03:09 - ERROR - stderr - +2025-02-05 19:03:09 - INFO - stdout - {'loss': 0.833, 'grad_norm': 1.254658818244934, 'learning_rate': 1.4077475719731402e-05, 'epoch': 1.16} +2025-02-05 19:03:09 - ERROR - stderr - 39%|███▊ | 8645/22434 [8:55:29<9:40:32, 2.53s/it] +2025-02-05 19:03:12 - ERROR - stderr - 39%|███▊ | 8646/22434 [8:55:31<9:35:53, 2.51s/it] +2025-02-05 19:03:12 - ERROR - stderr - +2025-02-05 19:03:12 - ERROR - stderr - +2025-02-05 19:03:12 - INFO - stdout - {'loss': 0.6883, 'grad_norm': 1.148121953010559, 'learning_rate': 1.407615739986079e-05, 'epoch': 1.16} +2025-02-05 19:03:12 - ERROR - stderr - 39%|███▊ | 8646/22434 [8:55:31<9:35:53, 2.51s/it] +2025-02-05 19:03:14 - ERROR - stderr - 39%|███▊ | 8647/22434 [8:55:34<9:31:37, 2.49s/it] +2025-02-05 19:03:14 - ERROR - stderr - +2025-02-05 19:03:14 - ERROR - stderr - +2025-02-05 19:03:14 - INFO - stdout - {'loss': 0.6765, 'grad_norm': 1.070624589920044, 'learning_rate': 1.4074838995026578e-05, 'epoch': 1.16} +2025-02-05 19:03:14 - ERROR - stderr - 39%|███▊ | 8647/22434 [8:55:34<9:31:37, 2.49s/it] +2025-02-05 19:03:16 - ERROR - stderr - 39%|███▊ | 8648/22434 [8:55:36<9:33:30, 2.50s/it] +2025-02-05 19:03:17 - ERROR - stderr - +2025-02-05 19:03:17 - ERROR - stderr - +2025-02-05 19:03:17 - INFO - stdout - {'loss': 0.7207, 'grad_norm': 1.2298760414123535, 'learning_rate': 1.4073520505256244e-05, 'epoch': 1.16} +2025-02-05 19:03:17 - ERROR - stderr - 39%|███▊ | 8648/22434 [8:55:36<9:33:30, 2.50s/it] +2025-02-05 19:03:19 - ERROR - stderr - 39%|███▊ | 8649/22434 [8:55:39<9:28:53, 2.48s/it] +2025-02-05 19:03:19 - ERROR - stderr - 
+2025-02-05 19:03:19 - ERROR - stderr - +2025-02-05 19:03:19 - INFO - stdout - {'loss': 0.7162, 'grad_norm': 1.193013310432434, 'learning_rate': 1.4072201930577274e-05, 'epoch': 1.16} +2025-02-05 19:03:19 - ERROR - stderr - 39%|███▊ | 8649/22434 [8:55:39<9:28:53, 2.48s/it] +2025-02-05 19:03:22 - ERROR - stderr - 39%|███▊ | 8650/22434 [8:55:41<9:37:48, 2.52s/it] +2025-02-05 19:03:22 - ERROR - stderr - +2025-02-05 19:03:22 - ERROR - stderr - +2025-02-05 19:03:22 - INFO - stdout - {'loss': 0.7471, 'grad_norm': 1.144661545753479, 'learning_rate': 1.4070883271017151e-05, 'epoch': 1.16} +2025-02-05 19:03:22 - ERROR - stderr - 39%|███▊ | 8650/22434 [8:55:41<9:37:48, 2.52s/it] +2025-02-05 19:03:24 - ERROR - stderr - 39%|███▊ | 8651/22434 [8:55:44<9:36:29, 2.51s/it] +2025-02-05 19:03:24 - ERROR - stderr - +2025-02-05 19:03:24 - ERROR - stderr - +2025-02-05 19:03:24 - INFO - stdout - {'loss': 0.7606, 'grad_norm': 1.1560523509979248, 'learning_rate': 1.4069564526603361e-05, 'epoch': 1.16} +2025-02-05 19:03:24 - ERROR - stderr - 39%|███▊ | 8651/22434 [8:55:44<9:36:29, 2.51s/it] +2025-02-05 19:03:26 - ERROR - stderr - 39%|███▊ | 8652/22434 [8:55:46<9:31:50, 2.49s/it] +2025-02-05 19:03:27 - ERROR - stderr - +2025-02-05 19:03:27 - ERROR - stderr - +2025-02-05 19:03:27 - INFO - stdout - {'loss': 0.6867, 'grad_norm': 1.1969174146652222, 'learning_rate': 1.4068245697363394e-05, 'epoch': 1.16} +2025-02-05 19:03:27 - ERROR - stderr - 39%|███▊ | 8652/22434 [8:55:46<9:31:50, 2.49s/it] +2025-02-05 19:03:29 - ERROR - stderr - 39%|███▊ | 8653/22434 [8:55:49<9:33:08, 2.50s/it] +2025-02-05 19:03:29 - ERROR - stderr - +2025-02-05 19:03:29 - ERROR - stderr - +2025-02-05 19:03:29 - INFO - stdout - {'loss': 0.6963, 'grad_norm': 1.0731309652328491, 'learning_rate': 1.406692678332474e-05, 'epoch': 1.16} +2025-02-05 19:03:29 - ERROR - stderr - 39%|███▊ | 8653/22434 [8:55:49<9:33:08, 2.50s/it] +2025-02-05 19:03:32 - ERROR - stderr - 39%|███▊ | 8654/22434 [8:55:51<9:48:45, 2.56s/it] +2025-02-05 
19:03:32 - ERROR - stderr - +2025-02-05 19:03:32 - ERROR - stderr - +2025-02-05 19:03:32 - INFO - stdout - {'loss': 0.7123, 'grad_norm': 1.1388859748840332, 'learning_rate': 1.4065607784514886e-05, 'epoch': 1.16} +2025-02-05 19:03:32 - ERROR - stderr - 39%|███▊ | 8654/22434 [8:55:52<9:48:45, 2.56s/it] +2025-02-05 19:03:34 - ERROR - stderr - 39%|███▊ | 8655/22434 [8:55:54<9:44:37, 2.55s/it] +2025-02-05 19:03:34 - ERROR - stderr - +2025-02-05 19:03:34 - ERROR - stderr - +2025-02-05 19:03:34 - INFO - stdout - {'loss': 0.706, 'grad_norm': 1.0731945037841797, 'learning_rate': 1.4064288700961328e-05, 'epoch': 1.16} +2025-02-05 19:03:34 - ERROR - stderr - 39%|███▊ | 8655/22434 [8:55:54<9:44:37, 2.55s/it] +2025-02-05 19:03:37 - ERROR - stderr - 39%|███▊ | 8656/22434 [8:55:56<9:39:20, 2.52s/it] +2025-02-05 19:03:37 - ERROR - stderr - +2025-02-05 19:03:37 - ERROR - stderr - +2025-02-05 19:03:37 - INFO - stdout - {'loss': 0.7287, 'grad_norm': 1.2505344152450562, 'learning_rate': 1.4062969532691564e-05, 'epoch': 1.16} +2025-02-05 19:03:37 - ERROR - stderr - 39%|███▊ | 8656/22434 [8:55:56<9:39:20, 2.52s/it] +2025-02-05 19:03:39 - ERROR - stderr - 39%|███▊ | 8657/22434 [8:55:59<9:36:23, 2.51s/it] +2025-02-05 19:03:39 - ERROR - stderr - +2025-02-05 19:03:39 - ERROR - stderr - +2025-02-05 19:03:39 - INFO - stdout - {'loss': 0.6929, 'grad_norm': 1.2416878938674927, 'learning_rate': 1.4061650279733083e-05, 'epoch': 1.16} +2025-02-05 19:03:39 - ERROR - stderr - 39%|███▊ | 8657/22434 [8:55:59<9:36:23, 2.51s/it] +2025-02-05 19:03:42 - ERROR - stderr - 39%|███▊ | 8658/22434 [8:56:02<9:58:55, 2.61s/it] +2025-02-05 19:03:42 - ERROR - stderr - +2025-02-05 19:03:42 - ERROR - stderr - +2025-02-05 19:03:42 - INFO - stdout - {'loss': 0.6635, 'grad_norm': 1.1526620388031006, 'learning_rate': 1.4060330942113392e-05, 'epoch': 1.16} +2025-02-05 19:03:42 - ERROR - stderr - 39%|███▊ | 8658/22434 [8:56:02<9:58:55, 2.61s/it] +2025-02-05 19:03:44 - ERROR - stderr - 39%|███▊ | 8659/22434 
[8:56:04<9:52:03, 2.58s/it] +2025-02-05 19:03:45 - ERROR - stderr - +2025-02-05 19:03:45 - ERROR - stderr - +2025-02-05 19:03:45 - INFO - stdout - {'loss': 0.7361, 'grad_norm': 1.119500756263733, 'learning_rate': 1.4059011519859987e-05, 'epoch': 1.16} +2025-02-05 19:03:45 - ERROR - stderr - 39%|███▊ | 8659/22434 [8:56:04<9:52:03, 2.58s/it] +2025-02-05 19:03:47 - ERROR - stderr - 39%|███▊ | 8660/22434 [8:56:07<9:49:18, 2.57s/it] +2025-02-05 19:03:47 - ERROR - stderr - +2025-02-05 19:03:47 - ERROR - stderr - +2025-02-05 19:03:47 - INFO - stdout - {'loss': 0.7927, 'grad_norm': 1.2432773113250732, 'learning_rate': 1.405769201300037e-05, 'epoch': 1.16} +2025-02-05 19:03:47 - ERROR - stderr - 39%|███▊ | 8660/22434 [8:56:07<9:49:18, 2.57s/it] +2025-02-05 19:03:49 - ERROR - stderr - 39%|███▊ | 8661/22434 [8:56:09<9:40:21, 2.53s/it] +2025-02-05 19:03:50 - ERROR - stderr - +2025-02-05 19:03:50 - ERROR - stderr - +2025-02-05 19:03:50 - INFO - stdout - {'loss': 0.6717, 'grad_norm': 1.1736907958984375, 'learning_rate': 1.4056372421562048e-05, 'epoch': 1.16} +2025-02-05 19:03:50 - ERROR - stderr - 39%|███▊ | 8661/22434 [8:56:09<9:40:21, 2.53s/it] +2025-02-05 19:03:52 - ERROR - stderr - 39%|███▊ | 8662/22434 [8:56:12<9:44:36, 2.55s/it] +2025-02-05 19:03:52 - ERROR - stderr - +2025-02-05 19:03:52 - ERROR - stderr - +2025-02-05 19:03:52 - INFO - stdout - {'loss': 0.7063, 'grad_norm': 1.2128117084503174, 'learning_rate': 1.4055052745572524e-05, 'epoch': 1.16} +2025-02-05 19:03:52 - ERROR - stderr - 39%|███▊ | 8662/22434 [8:56:12<9:44:36, 2.55s/it] +2025-02-05 19:03:55 - ERROR - stderr - 39%|███▊ | 8663/22434 [8:56:14<9:41:02, 2.53s/it] +2025-02-05 19:03:55 - ERROR - stderr - +2025-02-05 19:03:55 - ERROR - stderr - +2025-02-05 19:03:55 - INFO - stdout - {'loss': 0.6941, 'grad_norm': 1.1113380193710327, 'learning_rate': 1.4053732985059304e-05, 'epoch': 1.16} +2025-02-05 19:03:55 - ERROR - stderr - 39%|███▊ | 8663/22434 [8:56:14<9:41:02, 2.53s/it] +2025-02-05 19:03:57 - ERROR - stderr 
- 39%|███▊ | 8664/22434 [8:56:17<9:46:14, 2.55s/it] +2025-02-05 19:03:57 - ERROR - stderr - +2025-02-05 19:03:57 - ERROR - stderr - +2025-02-05 19:03:57 - INFO - stdout - {'loss': 0.6819, 'grad_norm': 1.0834118127822876, 'learning_rate': 1.4052413140049898e-05, 'epoch': 1.16} +2025-02-05 19:03:57 - ERROR - stderr - 39%|███▊ | 8664/22434 [8:56:17<9:46:14, 2.55s/it] +2025-02-05 19:04:00 - ERROR - stderr - 39%|███▊ | 8665/22434 [8:56:19<9:40:59, 2.53s/it] +2025-02-05 19:04:00 - ERROR - stderr - +2025-02-05 19:04:00 - ERROR - stderr - +2025-02-05 19:04:00 - INFO - stdout - {'loss': 0.5555, 'grad_norm': 0.9484673738479614, 'learning_rate': 1.4051093210571822e-05, 'epoch': 1.16} +2025-02-05 19:04:00 - ERROR - stderr - 39%|███▊ | 8665/22434 [8:56:19<9:40:59, 2.53s/it] +2025-02-05 19:04:02 - ERROR - stderr - 39%|███▊ | 8666/22434 [8:56:22<9:36:45, 2.51s/it] +2025-02-05 19:04:02 - ERROR - stderr - +2025-02-05 19:04:02 - ERROR - stderr - +2025-02-05 19:04:02 - INFO - stdout - {'loss': 0.7367, 'grad_norm': 1.2186510562896729, 'learning_rate': 1.4049773196652582e-05, 'epoch': 1.16} +2025-02-05 19:04:02 - ERROR - stderr - 39%|███▊ | 8666/22434 [8:56:22<9:36:45, 2.51s/it] +2025-02-05 19:04:05 - ERROR - stderr - 39%|███▊ | 8667/22434 [8:56:24<9:32:18, 2.49s/it] +2025-02-05 19:04:05 - ERROR - stderr - +2025-02-05 19:04:05 - ERROR - stderr - +2025-02-05 19:04:05 - INFO - stdout - {'loss': 0.7132, 'grad_norm': 1.1992262601852417, 'learning_rate': 1.4048453098319696e-05, 'epoch': 1.16} +2025-02-05 19:04:05 - ERROR - stderr - 39%|███▊ | 8667/22434 [8:56:24<9:32:18, 2.49s/it] +2025-02-05 19:04:07 - ERROR - stderr - 39%|███▊ | 8668/22434 [8:56:27<9:30:11, 2.49s/it] +2025-02-05 19:04:07 - ERROR - stderr - +2025-02-05 19:04:07 - ERROR - stderr - +2025-02-05 19:04:07 - INFO - stdout - {'loss': 0.7855, 'grad_norm': 1.2447351217269897, 'learning_rate': 1.4047132915600678e-05, 'epoch': 1.16} +2025-02-05 19:04:07 - ERROR - stderr - 39%|███▊ | 8668/22434 [8:56:27<9:30:11, 2.49s/it] +2025-02-05 
19:04:10 - ERROR - stderr - 39%|███▊ | 8669/22434 [8:56:29<9:31:21, 2.49s/it] +2025-02-05 19:04:10 - ERROR - stderr - +2025-02-05 19:04:10 - ERROR - stderr - +2025-02-05 19:04:10 - INFO - stdout - {'loss': 0.6872, 'grad_norm': 1.0706075429916382, 'learning_rate': 1.4045812648523047e-05, 'epoch': 1.16} +2025-02-05 19:04:10 - ERROR - stderr - 39%|███▊ | 8669/22434 [8:56:29<9:31:21, 2.49s/it] +2025-02-05 19:04:12 - ERROR - stderr - 39%|███▊ | 8670/22434 [8:56:32<9:29:50, 2.48s/it] +2025-02-05 19:04:12 - ERROR - stderr - +2025-02-05 19:04:12 - ERROR - stderr - +2025-02-05 19:04:12 - INFO - stdout - {'loss': 0.7446, 'grad_norm': 1.1923249959945679, 'learning_rate': 1.4044492297114323e-05, 'epoch': 1.16} +2025-02-05 19:04:12 - ERROR - stderr - 39%|███▊ | 8670/22434 [8:56:32<9:29:50, 2.48s/it] +2025-02-05 19:04:14 - ERROR - stderr - 39%|███▊ | 8671/22434 [8:56:34<9:30:40, 2.49s/it] +2025-02-05 19:04:15 - ERROR - stderr - +2025-02-05 19:04:15 - ERROR - stderr - +2025-02-05 19:04:15 - INFO - stdout - {'loss': 0.7262, 'grad_norm': 1.0739163160324097, 'learning_rate': 1.4043171861402028e-05, 'epoch': 1.16} +2025-02-05 19:04:15 - ERROR - stderr - 39%|███▊ | 8671/22434 [8:56:34<9:30:40, 2.49s/it] +2025-02-05 19:04:17 - ERROR - stderr - 39%|███▊ | 8672/22434 [8:56:37<9:31:44, 2.49s/it] +2025-02-05 19:04:17 - ERROR - stderr - +2025-02-05 19:04:17 - ERROR - stderr - +2025-02-05 19:04:17 - INFO - stdout - {'loss': 0.7452, 'grad_norm': 1.2027435302734375, 'learning_rate': 1.4041851341413683e-05, 'epoch': 1.16} +2025-02-05 19:04:17 - ERROR - stderr - 39%|███▊ | 8672/22434 [8:56:37<9:31:44, 2.49s/it] +2025-02-05 19:04:19 - ERROR - stderr - 39%|███▊ | 8673/22434 [8:56:39<9:29:59, 2.49s/it] +2025-02-05 19:04:20 - ERROR - stderr - +2025-02-05 19:04:20 - ERROR - stderr - +2025-02-05 19:04:20 - INFO - stdout - {'loss': 0.6828, 'grad_norm': 1.1391794681549072, 'learning_rate': 1.4040530737176817e-05, 'epoch': 1.16} +2025-02-05 19:04:20 - ERROR - stderr - 39%|███▊ | 8673/22434 
[8:56:39<9:29:59, 2.49s/it] +2025-02-05 19:04:22 - ERROR - stderr - 39%|███▊ | 8674/22434 [8:56:42<9:29:42, 2.48s/it] +2025-02-05 19:04:22 - ERROR - stderr - +2025-02-05 19:04:22 - ERROR - stderr - +2025-02-05 19:04:22 - INFO - stdout - {'loss': 0.7224, 'grad_norm': 1.1397111415863037, 'learning_rate': 1.403921004871895e-05, 'epoch': 1.16} +2025-02-05 19:04:22 - ERROR - stderr - 39%|███▊ | 8674/22434 [8:56:42<9:29:42, 2.48s/it] +2025-02-05 19:04:25 - ERROR - stderr - 39%|███▊ | 8675/22434 [8:56:44<9:35:14, 2.51s/it] +2025-02-05 19:04:25 - ERROR - stderr - +2025-02-05 19:04:25 - ERROR - stderr - +2025-02-05 19:04:25 - INFO - stdout - {'loss': 0.7314, 'grad_norm': 1.0840952396392822, 'learning_rate': 1.403788927606762e-05, 'epoch': 1.16} +2025-02-05 19:04:25 - ERROR - stderr - 39%|███▊ | 8675/22434 [8:56:44<9:35:14, 2.51s/it] +2025-02-05 19:04:27 - ERROR - stderr - 39%|███▊ | 8676/22434 [8:56:47<9:35:45, 2.51s/it] +2025-02-05 19:04:27 - ERROR - stderr - +2025-02-05 19:04:27 - ERROR - stderr - +2025-02-05 19:04:27 - INFO - stdout - {'loss': 0.734, 'grad_norm': 1.2008872032165527, 'learning_rate': 1.403656841925035e-05, 'epoch': 1.16} +2025-02-05 19:04:27 - ERROR - stderr - 39%|███▊ | 8676/22434 [8:56:47<9:35:45, 2.51s/it] +2025-02-05 19:04:30 - ERROR - stderr - 39%|███▊ | 8677/22434 [8:56:49<9:48:45, 2.57s/it] +2025-02-05 19:04:30 - ERROR - stderr - +2025-02-05 19:04:30 - ERROR - stderr - +2025-02-05 19:04:30 - INFO - stdout - {'loss': 0.7213, 'grad_norm': 1.2115564346313477, 'learning_rate': 1.403524747829467e-05, 'epoch': 1.16} +2025-02-05 19:04:30 - ERROR - stderr - 39%|███▊ | 8677/22434 [8:56:50<9:48:45, 2.57s/it] +2025-02-05 19:04:32 - ERROR - stderr - 39%|███▊ | 8678/22434 [8:56:52<9:46:54, 2.56s/it] +2025-02-05 19:04:32 - ERROR - stderr - +2025-02-05 19:04:32 - ERROR - stderr - +2025-02-05 19:04:32 - INFO - stdout - {'loss': 0.7151, 'grad_norm': 1.1864066123962402, 'learning_rate': 1.403392645322812e-05, 'epoch': 1.16} +2025-02-05 19:04:32 - ERROR - stderr - 
39%|███▊ | 8678/22434 [8:56:52<9:46:54, 2.56s/it] +2025-02-05 19:04:35 - ERROR - stderr - 39%|███▊ | 8679/22434 [8:56:55<9:47:42, 2.56s/it] +2025-02-05 19:04:35 - ERROR - stderr - +2025-02-05 19:04:35 - ERROR - stderr - +2025-02-05 19:04:35 - INFO - stdout - {'loss': 0.6875, 'grad_norm': 0.9672351479530334, 'learning_rate': 1.4032605344078235e-05, 'epoch': 1.16} +2025-02-05 19:04:35 - ERROR - stderr - 39%|███▊ | 8679/22434 [8:56:55<9:47:42, 2.56s/it] +2025-02-05 19:04:37 - ERROR - stderr - 39%|███▊ | 8680/22434 [8:56:57<9:46:45, 2.56s/it] +2025-02-05 19:04:37 - ERROR - stderr - +2025-02-05 19:04:37 - ERROR - stderr - +2025-02-05 19:04:37 - INFO - stdout - {'loss': 0.8084, 'grad_norm': 1.1373335123062134, 'learning_rate': 1.4031284150872548e-05, 'epoch': 1.16} +2025-02-05 19:04:37 - ERROR - stderr - 39%|███▊ | 8680/22434 [8:56:57<9:46:45, 2.56s/it] +2025-02-05 19:04:40 - ERROR - stderr - 39%|███▊ | 8681/22434 [8:57:00<9:51:01, 2.58s/it] +2025-02-05 19:04:40 - ERROR - stderr - +2025-02-05 19:04:40 - ERROR - stderr - +2025-02-05 19:04:40 - INFO - stdout - {'loss': 0.6975, 'grad_norm': 1.0992237329483032, 'learning_rate': 1.40299628736386e-05, 'epoch': 1.16} +2025-02-05 19:04:40 - ERROR - stderr - 39%|███▊ | 8681/22434 [8:57:00<9:51:01, 2.58s/it] +2025-02-05 19:04:43 - ERROR - stderr - 39%|███▊ | 8682/22434 [8:57:02<9:48:45, 2.57s/it] +2025-02-05 19:04:43 - ERROR - stderr - +2025-02-05 19:04:43 - ERROR - stderr - +2025-02-05 19:04:43 - INFO - stdout - {'loss': 0.6568, 'grad_norm': 1.0596797466278076, 'learning_rate': 1.4028641512403934e-05, 'epoch': 1.16} +2025-02-05 19:04:43 - ERROR - stderr - 39%|███▊ | 8682/22434 [8:57:02<9:48:45, 2.57s/it] +2025-02-05 19:04:45 - ERROR - stderr - 39%|███▊ | 8683/22434 [8:57:05<9:50:35, 2.58s/it] +2025-02-05 19:04:45 - ERROR - stderr - +2025-02-05 19:04:45 - ERROR - stderr - +2025-02-05 19:04:45 - INFO - stdout - {'loss': 0.7983, 'grad_norm': 1.263583779335022, 'learning_rate': 1.4027320067196091e-05, 'epoch': 1.16} +2025-02-05 
19:04:45 - ERROR - stderr - 39%|███▊ | 8683/22434 [8:57:05<9:50:35, 2.58s/it] +2025-02-05 19:04:48 - ERROR - stderr - 39%|███▊ | 8684/22434 [8:57:07<9:46:59, 2.56s/it] +2025-02-05 19:04:48 - ERROR - stderr - +2025-02-05 19:04:48 - ERROR - stderr - +2025-02-05 19:04:48 - INFO - stdout - {'loss': 0.6975, 'grad_norm': 1.2121177911758423, 'learning_rate': 1.4025998538042613e-05, 'epoch': 1.16} +2025-02-05 19:04:48 - ERROR - stderr - 39%|███▊ | 8684/22434 [8:57:07<9:46:59, 2.56s/it] +2025-02-05 19:04:50 - ERROR - stderr - 39%|███▊ | 8685/22434 [8:57:10<9:44:37, 2.55s/it] +2025-02-05 19:04:50 - ERROR - stderr - +2025-02-05 19:04:50 - ERROR - stderr - +2025-02-05 19:04:50 - INFO - stdout - {'loss': 0.7743, 'grad_norm': 1.1451551914215088, 'learning_rate': 1.4024676924971048e-05, 'epoch': 1.16} +2025-02-05 19:04:50 - ERROR - stderr - 39%|███▊ | 8685/22434 [8:57:10<9:44:37, 2.55s/it] +2025-02-05 19:04:53 - ERROR - stderr - 39%|███▊ | 8686/22434 [8:57:12<9:38:10, 2.52s/it] +2025-02-05 19:04:53 - ERROR - stderr - +2025-02-05 19:04:53 - ERROR - stderr - +2025-02-05 19:04:53 - INFO - stdout - {'loss': 0.6739, 'grad_norm': 1.0530674457550049, 'learning_rate': 1.4023355228008946e-05, 'epoch': 1.16} +2025-02-05 19:04:53 - ERROR - stderr - 39%|███▊ | 8686/22434 [8:57:12<9:38:10, 2.52s/it] +2025-02-05 19:04:55 - ERROR - stderr - 39%|███▊ | 8687/22434 [8:57:15<9:37:19, 2.52s/it] +2025-02-05 19:04:55 - ERROR - stderr - +2025-02-05 19:04:55 - ERROR - stderr - +2025-02-05 19:04:55 - INFO - stdout - {'loss': 0.7105, 'grad_norm': 1.1487675905227661, 'learning_rate': 1.4022033447183854e-05, 'epoch': 1.16} +2025-02-05 19:04:55 - ERROR - stderr - 39%|███▊ | 8687/22434 [8:57:15<9:37:19, 2.52s/it] +2025-02-05 19:04:58 - ERROR - stderr - 39%|███▊ | 8688/22434 [8:57:17<9:33:46, 2.50s/it] +2025-02-05 19:04:58 - ERROR - stderr - +2025-02-05 19:04:58 - ERROR - stderr - +2025-02-05 19:04:58 - INFO - stdout - {'loss': 0.8047, 'grad_norm': 1.1700356006622314, 'learning_rate': 1.4020711582523323e-05, 
'epoch': 1.16} +2025-02-05 19:04:58 - ERROR - stderr - 39%|███▊ | 8688/22434 [8:57:17<9:33:46, 2.50s/it] +2025-02-05 19:05:00 - ERROR - stderr - 39%|███▊ | 8689/22434 [8:57:20<9:29:54, 2.49s/it] +2025-02-05 19:05:00 - ERROR - stderr - +2025-02-05 19:05:00 - ERROR - stderr - +2025-02-05 19:05:00 - INFO - stdout - {'loss': 0.7741, 'grad_norm': 1.0820037126541138, 'learning_rate': 1.4019389634054905e-05, 'epoch': 1.16} +2025-02-05 19:05:00 - ERROR - stderr - 39%|███▊ | 8689/22434 [8:57:20<9:29:54, 2.49s/it] +2025-02-05 19:05:03 - ERROR - stderr - 39%|███▊ | 8690/22434 [8:57:23<9:52:30, 2.59s/it] +2025-02-05 19:05:03 - ERROR - stderr - +2025-02-05 19:05:03 - ERROR - stderr - +2025-02-05 19:05:03 - INFO - stdout - {'loss': 0.7896, 'grad_norm': 1.2940515279769897, 'learning_rate': 1.4018067601806155e-05, 'epoch': 1.16} +2025-02-05 19:05:03 - ERROR - stderr - 39%|███▊ | 8690/22434 [8:57:23<9:52:30, 2.59s/it] +2025-02-05 19:05:05 - ERROR - stderr - 39%|███▊ | 8691/22434 [8:57:25<9:51:30, 2.58s/it] +2025-02-05 19:05:06 - ERROR - stderr - +2025-02-05 19:05:06 - ERROR - stderr - +2025-02-05 19:05:06 - INFO - stdout - {'loss': 0.7523, 'grad_norm': 1.2776328325271606, 'learning_rate': 1.4016745485804634e-05, 'epoch': 1.16} +2025-02-05 19:05:06 - ERROR - stderr - 39%|███▊ | 8691/22434 [8:57:25<9:51:30, 2.58s/it] +2025-02-05 19:05:08 - ERROR - stderr - 39%|███▊ | 8692/22434 [8:57:28<9:48:08, 2.57s/it] +2025-02-05 19:05:08 - ERROR - stderr - +2025-02-05 19:05:08 - ERROR - stderr - +2025-02-05 19:05:08 - INFO - stdout - {'loss': 0.7907, 'grad_norm': 1.1768579483032227, 'learning_rate': 1.4015423286077896e-05, 'epoch': 1.16} +2025-02-05 19:05:08 - ERROR - stderr - 39%|███▊ | 8692/22434 [8:57:28<9:48:08, 2.57s/it] +2025-02-05 19:05:11 - ERROR - stderr - 39%|███▊ | 8693/22434 [8:57:30<9:53:40, 2.59s/it] +2025-02-05 19:05:11 - ERROR - stderr - +2025-02-05 19:05:11 - ERROR - stderr - +2025-02-05 19:05:11 - INFO - stdout - {'loss': 0.7546, 'grad_norm': 1.1278249025344849, 
'learning_rate': 1.4014101002653501e-05, 'epoch': 1.16} +2025-02-05 19:05:11 - ERROR - stderr - 39%|███▊ | 8693/22434 [8:57:30<9:53:40, 2.59s/it] +2025-02-05 19:05:13 - ERROR - stderr - 39%|███▉ | 8694/22434 [8:57:33<9:45:46, 2.56s/it] +2025-02-05 19:05:13 - ERROR - stderr - +2025-02-05 19:05:13 - ERROR - stderr - +2025-02-05 19:05:13 - INFO - stdout - {'loss': 0.6894, 'grad_norm': 1.07382333278656, 'learning_rate': 1.4012778635559013e-05, 'epoch': 1.16} +2025-02-05 19:05:13 - ERROR - stderr - 39%|███▉ | 8694/22434 [8:57:33<9:45:46, 2.56s/it] +2025-02-05 19:05:16 - ERROR - stderr - 39%|███▉ | 8695/22434 [8:57:35<9:40:13, 2.53s/it] +2025-02-05 19:05:16 - ERROR - stderr - +2025-02-05 19:05:16 - ERROR - stderr - +2025-02-05 19:05:16 - INFO - stdout - {'loss': 0.7577, 'grad_norm': 1.120728850364685, 'learning_rate': 1.4011456184821994e-05, 'epoch': 1.16} +2025-02-05 19:05:16 - ERROR - stderr - 39%|███▉ | 8695/22434 [8:57:35<9:40:13, 2.53s/it] +2025-02-05 19:05:18 - ERROR - stderr - 39%|███▉ | 8696/22434 [8:57:38<9:37:22, 2.52s/it] +2025-02-05 19:05:18 - ERROR - stderr - +2025-02-05 19:05:18 - ERROR - stderr - +2025-02-05 19:05:18 - INFO - stdout - {'loss': 0.7175, 'grad_norm': 1.1440843343734741, 'learning_rate': 1.4010133650470007e-05, 'epoch': 1.16} +2025-02-05 19:05:18 - ERROR - stderr - 39%|███▉ | 8696/22434 [8:57:38<9:37:22, 2.52s/it] +2025-02-05 19:05:21 - ERROR - stderr - 39%|███▉ | 8697/22434 [8:57:40<9:36:46, 2.52s/it] +2025-02-05 19:05:21 - ERROR - stderr - +2025-02-05 19:05:21 - ERROR - stderr - +2025-02-05 19:05:21 - INFO - stdout - {'loss': 0.6454, 'grad_norm': 1.0723897218704224, 'learning_rate': 1.4008811032530624e-05, 'epoch': 1.16} +2025-02-05 19:05:21 - ERROR - stderr - 39%|███▉ | 8697/22434 [8:57:40<9:36:46, 2.52s/it] +2025-02-05 19:05:23 - ERROR - stderr - 39%|███▉ | 8698/22434 [8:57:43<9:38:32, 2.53s/it] +2025-02-05 19:05:23 - ERROR - stderr - +2025-02-05 19:05:23 - ERROR - stderr - +2025-02-05 19:05:23 - INFO - stdout - {'loss': 0.7125, 
'grad_norm': 1.1582127809524536, 'learning_rate': 1.4007488331031409e-05, 'epoch': 1.16} +2025-02-05 19:05:23 - ERROR - stderr - 39%|███▉ | 8698/22434 [8:57:43<9:38:32, 2.53s/it] +2025-02-05 19:05:26 - ERROR - stderr - 39%|███▉ | 8699/22434 [8:57:45<9:40:00, 2.53s/it] +2025-02-05 19:05:26 - ERROR - stderr - +2025-02-05 19:05:26 - ERROR - stderr - +2025-02-05 19:05:26 - INFO - stdout - {'loss': 0.7638, 'grad_norm': 1.0734220743179321, 'learning_rate': 1.4006165545999939e-05, 'epoch': 1.16} +2025-02-05 19:05:26 - ERROR - stderr - 39%|███▉ | 8699/22434 [8:57:46<9:40:00, 2.53s/it] +2025-02-05 19:05:28 - ERROR - stderr - 39%|███▉ | 8700/22434 [8:57:48<9:40:26, 2.54s/it] +2025-02-05 19:05:28 - ERROR - stderr - +2025-02-05 19:05:28 - ERROR - stderr - +2025-02-05 19:05:28 - INFO - stdout - {'loss': 0.6616, 'grad_norm': 1.1487008333206177, 'learning_rate': 1.4004842677463777e-05, 'epoch': 1.16} +2025-02-05 19:05:28 - ERROR - stderr - 39%|███▉ | 8700/22434 [8:57:48<9:40:26, 2.54s/it] +2025-02-05 19:05:31 - ERROR - stderr - 39%|███▉ | 8701/22434 [8:57:51<9:41:44, 2.54s/it] +2025-02-05 19:05:31 - ERROR - stderr - +2025-02-05 19:05:31 - ERROR - stderr - +2025-02-05 19:05:31 - INFO - stdout - {'loss': 0.6921, 'grad_norm': 1.1554994583129883, 'learning_rate': 1.4003519725450505e-05, 'epoch': 1.16} +2025-02-05 19:05:31 - ERROR - stderr - 39%|███▉ | 8701/22434 [8:57:51<9:41:44, 2.54s/it] +2025-02-05 19:05:33 - ERROR - stderr - 39%|███▉ | 8702/22434 [8:57:53<9:35:39, 2.52s/it] +2025-02-05 19:05:33 - ERROR - stderr - +2025-02-05 19:05:33 - ERROR - stderr - +2025-02-05 19:05:33 - INFO - stdout - {'loss': 0.7351, 'grad_norm': 1.2108056545257568, 'learning_rate': 1.4002196689987693e-05, 'epoch': 1.16} +2025-02-05 19:05:33 - ERROR - stderr - 39%|███▉ | 8702/22434 [8:57:53<9:35:39, 2.52s/it] +2025-02-05 19:05:36 - ERROR - stderr - 39%|███▉ | 8703/22434 [8:57:56<9:37:18, 2.52s/it] +2025-02-05 19:05:36 - ERROR - stderr - +2025-02-05 19:05:36 - ERROR - stderr - +2025-02-05 19:05:36 - INFO - 
stdout - {'loss': 0.7992, 'grad_norm': 1.1940504312515259, 'learning_rate': 1.400087357110292e-05, 'epoch': 1.16} +2025-02-05 19:05:36 - ERROR - stderr - 39%|███▉ | 8703/22434 [8:57:56<9:37:18, 2.52s/it] +2025-02-05 19:05:38 - ERROR - stderr - 39%|███▉ | 8704/22434 [8:57:58<9:34:11, 2.51s/it] +2025-02-05 19:05:38 - ERROR - stderr - +2025-02-05 19:05:38 - ERROR - stderr - +2025-02-05 19:05:38 - INFO - stdout - {'loss': 0.7041, 'grad_norm': 1.223440170288086, 'learning_rate': 1.3999550368823767e-05, 'epoch': 1.16} +2025-02-05 19:05:38 - ERROR - stderr - 39%|███▉ | 8704/22434 [8:57:58<9:34:11, 2.51s/it] +2025-02-05 19:05:41 - ERROR - stderr - 39%|███▉ | 8705/22434 [8:58:01<9:36:00, 2.52s/it] +2025-02-05 19:05:41 - ERROR - stderr - +2025-02-05 19:05:41 - ERROR - stderr - +2025-02-05 19:05:41 - INFO - stdout - {'loss': 0.6334, 'grad_norm': 1.0549988746643066, 'learning_rate': 1.3998227083177814e-05, 'epoch': 1.16} +2025-02-05 19:05:41 - ERROR - stderr - 39%|███▉ | 8705/22434 [8:58:01<9:36:00, 2.52s/it] +2025-02-05 19:05:43 - ERROR - stderr - 39%|███▉ | 8706/22434 [8:58:03<9:35:44, 2.52s/it] +2025-02-05 19:05:43 - ERROR - stderr - +2025-02-05 19:05:43 - ERROR - stderr - +2025-02-05 19:05:43 - INFO - stdout - {'loss': 0.7877, 'grad_norm': 1.1634438037872314, 'learning_rate': 1.3996903714192643e-05, 'epoch': 1.16} +2025-02-05 19:05:43 - ERROR - stderr - 39%|███▉ | 8706/22434 [8:58:03<9:35:44, 2.52s/it] +2025-02-05 19:05:46 - ERROR - stderr - 39%|███▉ | 8707/22434 [8:58:06<9:33:44, 2.51s/it] +2025-02-05 19:05:46 - ERROR - stderr - +2025-02-05 19:05:46 - ERROR - stderr - +2025-02-05 19:05:46 - INFO - stdout - {'loss': 0.7276, 'grad_norm': 1.215009093284607, 'learning_rate': 1.3995580261895839e-05, 'epoch': 1.16} +2025-02-05 19:05:46 - ERROR - stderr - 39%|███▉ | 8707/22434 [8:58:06<9:33:44, 2.51s/it] +2025-02-05 19:05:48 - ERROR - stderr - 39%|███▉ | 8708/22434 [8:58:08<9:32:06, 2.50s/it] +2025-02-05 19:05:48 - ERROR - stderr - +2025-02-05 19:05:48 - ERROR - stderr - 
+2025-02-05 19:05:48 - INFO - stdout - {'loss': 0.7073, 'grad_norm': 1.0690982341766357, 'learning_rate': 1.3994256726314988e-05, 'epoch': 1.16} +2025-02-05 19:05:48 - ERROR - stderr - 39%|███▉ | 8708/22434 [8:58:08<9:32:06, 2.50s/it] +2025-02-05 19:05:51 - ERROR - stderr - 39%|███▉ | 8709/22434 [8:58:11<9:29:48, 2.49s/it] +2025-02-05 19:05:51 - ERROR - stderr - +2025-02-05 19:05:51 - ERROR - stderr - +2025-02-05 19:05:51 - INFO - stdout - {'loss': 0.7298, 'grad_norm': 1.0872148275375366, 'learning_rate': 1.3992933107477673e-05, 'epoch': 1.16} +2025-02-05 19:05:51 - ERROR - stderr - 39%|███▉ | 8709/22434 [8:58:11<9:29:48, 2.49s/it] +2025-02-05 19:05:53 - ERROR - stderr - 39%|███▉ | 8710/22434 [8:58:13<9:38:56, 2.53s/it] +2025-02-05 19:05:53 - ERROR - stderr - +2025-02-05 19:05:53 - ERROR - stderr - +2025-02-05 19:05:53 - INFO - stdout - {'loss': 0.7087, 'grad_norm': 1.160008192062378, 'learning_rate': 1.3991609405411493e-05, 'epoch': 1.16} +2025-02-05 19:05:53 - ERROR - stderr - 39%|███▉ | 8710/22434 [8:58:13<9:38:56, 2.53s/it] +2025-02-05 19:05:56 - ERROR - stderr - 39%|███▉ | 8711/22434 [8:58:16<9:37:50, 2.53s/it] +2025-02-05 19:05:56 - ERROR - stderr - +2025-02-05 19:05:56 - ERROR - stderr - +2025-02-05 19:05:56 - INFO - stdout - {'loss': 0.7353, 'grad_norm': 1.1520310640335083, 'learning_rate': 1.3990285620144035e-05, 'epoch': 1.16} +2025-02-05 19:05:56 - ERROR - stderr - 39%|███▉ | 8711/22434 [8:58:16<9:37:50, 2.53s/it] +2025-02-05 19:05:59 - ERROR - stderr - 39%|███▉ | 8712/22434 [8:58:18<9:41:40, 2.54s/it] +2025-02-05 19:05:59 - ERROR - stderr - +2025-02-05 19:05:59 - ERROR - stderr - +2025-02-05 19:05:59 - INFO - stdout - {'loss': 0.7309, 'grad_norm': 1.1476534605026245, 'learning_rate': 1.398896175170289e-05, 'epoch': 1.17} +2025-02-05 19:05:59 - ERROR - stderr - 39%|███▉ | 8712/22434 [8:58:18<9:41:40, 2.54s/it] +2025-02-05 19:06:01 - ERROR - stderr - 39%|███▉ | 8713/22434 [8:58:21<9:39:28, 2.53s/it] +2025-02-05 19:06:01 - ERROR - stderr - +2025-02-05 
19:06:01 - ERROR - stderr - +2025-02-05 19:06:01 - INFO - stdout - {'loss': 0.7307, 'grad_norm': 1.276505470275879, 'learning_rate': 1.3987637800115654e-05, 'epoch': 1.17} +2025-02-05 19:06:01 - ERROR - stderr - 39%|███▉ | 8713/22434 [8:58:21<9:39:28, 2.53s/it] +2025-02-05 19:06:04 - ERROR - stderr - 39%|███▉ | 8714/22434 [8:58:23<9:39:17, 2.53s/it] +2025-02-05 19:06:04 - ERROR - stderr - +2025-02-05 19:06:04 - ERROR - stderr - +2025-02-05 19:06:04 - INFO - stdout - {'loss': 0.7366, 'grad_norm': 1.2760090827941895, 'learning_rate': 1.3986313765409924e-05, 'epoch': 1.17} +2025-02-05 19:06:04 - ERROR - stderr - 39%|███▉ | 8714/22434 [8:58:23<9:39:17, 2.53s/it] +2025-02-05 19:06:06 - ERROR - stderr - 39%|███▉ | 8715/22434 [8:58:26<9:44:21, 2.56s/it] +2025-02-05 19:06:06 - ERROR - stderr - +2025-02-05 19:06:06 - ERROR - stderr - +2025-02-05 19:06:06 - INFO - stdout - {'loss': 0.701, 'grad_norm': 1.2144317626953125, 'learning_rate': 1.3984989647613301e-05, 'epoch': 1.17} +2025-02-05 19:06:06 - ERROR - stderr - 39%|███▉ | 8715/22434 [8:58:26<9:44:21, 2.56s/it] +2025-02-05 19:06:09 - ERROR - stderr - 39%|███▉ | 8716/22434 [8:58:29<10:07:13, 2.66s/it] +2025-02-05 19:06:09 - ERROR - stderr - +2025-02-05 19:06:09 - ERROR - stderr - +2025-02-05 19:06:09 - INFO - stdout - {'loss': 0.6982, 'grad_norm': 1.1092779636383057, 'learning_rate': 1.3983665446753378e-05, 'epoch': 1.17} +2025-02-05 19:06:09 - ERROR - stderr - 39%|███▉ | 8716/22434 [8:58:29<10:07:13, 2.66s/it] +2025-02-05 19:06:12 - ERROR - stderr - 39%|███▉ | 8717/22434 [8:58:31<9:59:29, 2.62s/it] +2025-02-05 19:06:12 - ERROR - stderr - +2025-02-05 19:06:12 - ERROR - stderr - +2025-02-05 19:06:12 - INFO - stdout - {'loss': 0.728, 'grad_norm': 1.281235933303833, 'learning_rate': 1.3982341162857761e-05, 'epoch': 1.17} +2025-02-05 19:06:12 - ERROR - stderr - 39%|███▉ | 8717/22434 [8:58:31<9:59:29, 2.62s/it] +2025-02-05 19:06:14 - ERROR - stderr - 39%|███▉ | 8718/22434 [8:58:34<9:49:36, 2.58s/it] +2025-02-05 19:06:14 - ERROR 
- stderr - +2025-02-05 19:06:14 - ERROR - stderr - +2025-02-05 19:06:14 - INFO - stdout - {'loss': 0.802, 'grad_norm': 1.2920124530792236, 'learning_rate': 1.3981016795954054e-05, 'epoch': 1.17} +2025-02-05 19:06:14 - ERROR - stderr - 39%|███▉ | 8718/22434 [8:58:34<9:49:36, 2.58s/it] +2025-02-05 19:06:17 - ERROR - stderr - 39%|███▉ | 8719/22434 [8:58:36<9:41:27, 2.54s/it] +2025-02-05 19:06:17 - ERROR - stderr - +2025-02-05 19:06:17 - ERROR - stderr - +2025-02-05 19:06:17 - INFO - stdout - {'loss': 0.6693, 'grad_norm': 1.1635991334915161, 'learning_rate': 1.3979692346069863e-05, 'epoch': 1.17} +2025-02-05 19:06:17 - ERROR - stderr - 39%|███▉ | 8719/22434 [8:58:36<9:41:27, 2.54s/it] +2025-02-05 19:06:19 - ERROR - stderr - 39%|███▉ | 8720/22434 [8:58:39<9:38:47, 2.53s/it] +2025-02-05 19:06:19 - ERROR - stderr - +2025-02-05 19:06:19 - ERROR - stderr - +2025-02-05 19:06:19 - INFO - stdout - {'loss': 0.6353, 'grad_norm': 1.0509663820266724, 'learning_rate': 1.3978367813232793e-05, 'epoch': 1.17} +2025-02-05 19:06:19 - ERROR - stderr - 39%|███▉ | 8720/22434 [8:58:39<9:38:47, 2.53s/it] +2025-02-05 19:06:22 - ERROR - stderr - 39%|███▉ | 8721/22434 [8:58:41<9:43:47, 2.55s/it] +2025-02-05 19:06:22 - ERROR - stderr - +2025-02-05 19:06:22 - ERROR - stderr - +2025-02-05 19:06:22 - INFO - stdout - {'loss': 0.6493, 'grad_norm': 1.1317791938781738, 'learning_rate': 1.397704319747045e-05, 'epoch': 1.17} +2025-02-05 19:06:22 - ERROR - stderr - 39%|███▉ | 8721/22434 [8:58:41<9:43:47, 2.55s/it] +2025-02-05 19:06:24 - ERROR - stderr - 39%|███▉ | 8722/22434 [8:58:44<9:48:48, 2.58s/it] +2025-02-05 19:06:24 - ERROR - stderr - +2025-02-05 19:06:24 - ERROR - stderr - +2025-02-05 19:06:24 - INFO - stdout - {'loss': 0.6789, 'grad_norm': 0.9960140585899353, 'learning_rate': 1.3975718498810449e-05, 'epoch': 1.17} +2025-02-05 19:06:24 - ERROR - stderr - 39%|███▉ | 8722/22434 [8:58:44<9:48:48, 2.58s/it] +2025-02-05 19:06:27 - ERROR - stderr - 39%|███▉ | 8723/22434 [8:58:47<9:42:48, 2.55s/it] 
+2025-02-05 19:06:27 - ERROR - stderr - +2025-02-05 19:06:27 - ERROR - stderr - +2025-02-05 19:06:27 - INFO - stdout - {'loss': 0.7647, 'grad_norm': 1.185448169708252, 'learning_rate': 1.39743937172804e-05, 'epoch': 1.17} +2025-02-05 19:06:27 - ERROR - stderr - 39%|███▉ | 8723/22434 [8:58:47<9:42:48, 2.55s/it] +2025-02-05 19:06:29 - ERROR - stderr - 39%|███▉ | 8724/22434 [8:58:49<9:42:07, 2.55s/it] +2025-02-05 19:06:29 - ERROR - stderr - +2025-02-05 19:06:29 - ERROR - stderr - +2025-02-05 19:06:29 - INFO - stdout - {'loss': 0.7471, 'grad_norm': 1.1212080717086792, 'learning_rate': 1.3973068852907918e-05, 'epoch': 1.17} +2025-02-05 19:06:29 - ERROR - stderr - 39%|███▉ | 8724/22434 [8:58:49<9:42:07, 2.55s/it] +2025-02-05 19:06:32 - ERROR - stderr - 39%|███▉ | 8725/22434 [8:58:52<9:45:29, 2.56s/it] +2025-02-05 19:06:32 - ERROR - stderr - +2025-02-05 19:06:32 - ERROR - stderr - +2025-02-05 19:06:32 - INFO - stdout - {'loss': 0.6986, 'grad_norm': 1.0912054777145386, 'learning_rate': 1.3971743905720616e-05, 'epoch': 1.17} +2025-02-05 19:06:32 - ERROR - stderr - 39%|███▉ | 8725/22434 [8:58:52<9:45:29, 2.56s/it] +2025-02-05 19:06:34 - ERROR - stderr - 39%|███▉ | 8726/22434 [8:58:54<9:41:43, 2.55s/it] +2025-02-05 19:06:34 - ERROR - stderr - +2025-02-05 19:06:34 - ERROR - stderr - +2025-02-05 19:06:34 - INFO - stdout - {'loss': 0.7313, 'grad_norm': 1.0776363611221313, 'learning_rate': 1.3970418875746114e-05, 'epoch': 1.17} +2025-02-05 19:06:34 - ERROR - stderr - 39%|███▉ | 8726/22434 [8:58:54<9:41:43, 2.55s/it] +2025-02-05 19:06:37 - ERROR - stderr - 39%|███▉ | 8727/22434 [8:58:57<9:35:47, 2.52s/it] +2025-02-05 19:06:37 - ERROR - stderr - +2025-02-05 19:06:37 - ERROR - stderr - +2025-02-05 19:06:37 - INFO - stdout - {'loss': 0.7524, 'grad_norm': 1.2628676891326904, 'learning_rate': 1.3969093763012031e-05, 'epoch': 1.17} +2025-02-05 19:06:37 - ERROR - stderr - 39%|███▉ | 8727/22434 [8:58:57<9:35:47, 2.52s/it] +2025-02-05 19:06:39 - ERROR - stderr - 39%|███▉ | 8728/22434 
[8:58:59<9:30:34, 2.50s/it] +2025-02-05 19:06:39 - ERROR - stderr - +2025-02-05 19:06:39 - ERROR - stderr - +2025-02-05 19:06:39 - INFO - stdout - {'loss': 0.7586, 'grad_norm': 1.319196105003357, 'learning_rate': 1.396776856754598e-05, 'epoch': 1.17} +2025-02-05 19:06:39 - ERROR - stderr - 39%|███▉ | 8728/22434 [8:58:59<9:30:34, 2.50s/it] +2025-02-05 19:06:42 - ERROR - stderr - 39%|███▉ | 8729/22434 [8:59:02<9:32:02, 2.50s/it] +2025-02-05 19:06:42 - ERROR - stderr - +2025-02-05 19:06:42 - ERROR - stderr - +2025-02-05 19:06:42 - INFO - stdout - {'loss': 0.7214, 'grad_norm': 1.131996512413025, 'learning_rate': 1.3966443289375598e-05, 'epoch': 1.17} +2025-02-05 19:06:42 - ERROR - stderr - 39%|███▉ | 8729/22434 [8:59:02<9:32:02, 2.50s/it] +2025-02-05 19:06:44 - ERROR - stderr - 39%|███▉ | 8730/22434 [8:59:04<9:34:48, 2.52s/it] +2025-02-05 19:06:44 - ERROR - stderr - +2025-02-05 19:06:44 - ERROR - stderr - +2025-02-05 19:06:44 - INFO - stdout - {'loss': 0.7005, 'grad_norm': 1.1569792032241821, 'learning_rate': 1.3965117928528495e-05, 'epoch': 1.17} +2025-02-05 19:06:44 - ERROR - stderr - 39%|███▉ | 8730/22434 [8:59:04<9:34:48, 2.52s/it] +2025-02-05 19:06:47 - ERROR - stderr - 39%|███▉ | 8731/22434 [8:59:07<9:33:39, 2.51s/it] +2025-02-05 19:06:47 - ERROR - stderr - +2025-02-05 19:06:47 - ERROR - stderr - +2025-02-05 19:06:47 - INFO - stdout - {'loss': 0.7837, 'grad_norm': 1.1557785272598267, 'learning_rate': 1.396379248503231e-05, 'epoch': 1.17} +2025-02-05 19:06:47 - ERROR - stderr - 39%|███▉ | 8731/22434 [8:59:07<9:33:39, 2.51s/it] +2025-02-05 19:06:49 - ERROR - stderr - 39%|███▉ | 8732/22434 [8:59:09<9:34:31, 2.52s/it] +2025-02-05 19:06:49 - ERROR - stderr - +2025-02-05 19:06:49 - ERROR - stderr - +2025-02-05 19:06:49 - INFO - stdout - {'loss': 0.7104, 'grad_norm': 1.1009104251861572, 'learning_rate': 1.3962466958914657e-05, 'epoch': 1.17} +2025-02-05 19:06:49 - ERROR - stderr - 39%|███▉ | 8732/22434 [8:59:09<9:34:31, 2.52s/it] +2025-02-05 19:06:52 - ERROR - stderr - 
39%|███▉ | 8733/22434 [8:59:12<9:33:37, 2.51s/it] +2025-02-05 19:06:52 - ERROR - stderr - +2025-02-05 19:06:52 - ERROR - stderr - +2025-02-05 19:06:52 - INFO - stdout - {'loss': 0.7004, 'grad_norm': 1.1699339151382446, 'learning_rate': 1.3961141350203176e-05, 'epoch': 1.17} +2025-02-05 19:06:52 - ERROR - stderr - 39%|███▉ | 8733/22434 [8:59:12<9:33:37, 2.51s/it] +2025-02-05 19:06:54 - ERROR - stderr - 39%|███▉ | 8734/22434 [8:59:14<9:28:15, 2.49s/it] +2025-02-05 19:06:54 - ERROR - stderr - +2025-02-05 19:06:54 - ERROR - stderr - +2025-02-05 19:06:54 - INFO - stdout - {'loss': 0.6736, 'grad_norm': 1.296496033668518, 'learning_rate': 1.395981565892549e-05, 'epoch': 1.17} +2025-02-05 19:06:54 - ERROR - stderr - 39%|███▉ | 8734/22434 [8:59:14<9:28:15, 2.49s/it] +2025-02-05 19:06:57 - ERROR - stderr - 39%|███▉ | 8735/22434 [8:59:17<9:38:05, 2.53s/it] +2025-02-05 19:06:57 - ERROR - stderr - +2025-02-05 19:06:57 - ERROR - stderr - +2025-02-05 19:06:57 - INFO - stdout - {'loss': 0.6537, 'grad_norm': 1.052441120147705, 'learning_rate': 1.3958489885109238e-05, 'epoch': 1.17} +2025-02-05 19:06:57 - ERROR - stderr - 39%|███▉ | 8735/22434 [8:59:17<9:38:05, 2.53s/it] +2025-02-05 19:07:00 - ERROR - stderr - 39%|███▉ | 8736/22434 [8:59:19<9:38:41, 2.53s/it] +2025-02-05 19:07:00 - ERROR - stderr - +2025-02-05 19:07:00 - ERROR - stderr - +2025-02-05 19:07:00 - INFO - stdout - {'loss': 0.7055, 'grad_norm': 1.2211965322494507, 'learning_rate': 1.3957164028782053e-05, 'epoch': 1.17} +2025-02-05 19:07:00 - ERROR - stderr - 39%|███▉ | 8736/22434 [8:59:19<9:38:41, 2.53s/it] +2025-02-05 19:07:02 - ERROR - stderr - 39%|███▉ | 8737/22434 [8:59:22<9:36:44, 2.53s/it] +2025-02-05 19:07:02 - ERROR - stderr - +2025-02-05 19:07:02 - ERROR - stderr - +2025-02-05 19:07:02 - INFO - stdout - {'loss': 0.7142, 'grad_norm': 1.1071503162384033, 'learning_rate': 1.395583808997157e-05, 'epoch': 1.17} +2025-02-05 19:07:02 - ERROR - stderr - 39%|███▉ | 8737/22434 [8:59:22<9:36:44, 2.53s/it] +2025-02-05 
19:07:05 - ERROR - stderr - 39%|███▉ | 8738/22434 [8:59:24<9:38:06, 2.53s/it] +2025-02-05 19:07:05 - ERROR - stderr - +2025-02-05 19:07:05 - ERROR - stderr - +2025-02-05 19:07:05 - INFO - stdout - {'loss': 0.7937, 'grad_norm': 1.1510775089263916, 'learning_rate': 1.3954512068705425e-05, 'epoch': 1.17} +2025-02-05 19:07:05 - ERROR - stderr - 39%|███▉ | 8738/22434 [8:59:24<9:38:06, 2.53s/it] +2025-02-05 19:07:07 - ERROR - stderr - 39%|███▉ | 8739/22434 [8:59:27<9:50:11, 2.59s/it] +2025-02-05 19:07:07 - ERROR - stderr - +2025-02-05 19:07:07 - ERROR - stderr - +2025-02-05 19:07:07 - INFO - stdout - {'loss': 0.7228, 'grad_norm': 1.0915801525115967, 'learning_rate': 1.3953185965011265e-05, 'epoch': 1.17} +2025-02-05 19:07:07 - ERROR - stderr - 39%|███▉ | 8739/22434 [8:59:27<9:50:11, 2.59s/it] +2025-02-05 19:07:10 - ERROR - stderr - 39%|███▉ | 8740/22434 [8:59:30<9:44:59, 2.56s/it] +2025-02-05 19:07:10 - ERROR - stderr - +2025-02-05 19:07:10 - ERROR - stderr - +2025-02-05 19:07:10 - INFO - stdout - {'loss': 0.7526, 'grad_norm': 1.116228461265564, 'learning_rate': 1.3951859778916723e-05, 'epoch': 1.17} +2025-02-05 19:07:10 - ERROR - stderr - 39%|███▉ | 8740/22434 [8:59:30<9:44:59, 2.56s/it] +2025-02-05 19:07:12 - ERROR - stderr - 39%|███▉ | 8741/22434 [8:59:32<9:51:01, 2.59s/it] +2025-02-05 19:07:12 - ERROR - stderr - +2025-02-05 19:07:12 - ERROR - stderr - +2025-02-05 19:07:12 - INFO - stdout - {'loss': 0.7557, 'grad_norm': 1.1896651983261108, 'learning_rate': 1.3950533510449444e-05, 'epoch': 1.17} +2025-02-05 19:07:12 - ERROR - stderr - 39%|███▉ | 8741/22434 [8:59:32<9:51:01, 2.59s/it] +2025-02-05 19:07:15 - ERROR - stderr - 39%|███▉ | 8742/22434 [8:59:35<9:49:28, 2.58s/it] +2025-02-05 19:07:15 - ERROR - stderr - +2025-02-05 19:07:15 - ERROR - stderr - +2025-02-05 19:07:15 - INFO - stdout - {'loss': 0.7198, 'grad_norm': 1.1185009479522705, 'learning_rate': 1.3949207159637075e-05, 'epoch': 1.17} +2025-02-05 19:07:15 - ERROR - stderr - 39%|███▉ | 8742/22434 
[8:59:35<9:49:28, 2.58s/it] +2025-02-05 19:07:17 - ERROR - stderr - 39%|███▉ | 8743/22434 [8:59:37<9:42:43, 2.55s/it] +2025-02-05 19:07:18 - ERROR - stderr - +2025-02-05 19:07:18 - ERROR - stderr - +2025-02-05 19:07:18 - INFO - stdout - {'loss': 0.6541, 'grad_norm': 1.0864053964614868, 'learning_rate': 1.3947880726507267e-05, 'epoch': 1.17} +2025-02-05 19:07:18 - ERROR - stderr - 39%|███▉ | 8743/22434 [8:59:37<9:42:43, 2.55s/it] +2025-02-05 19:07:20 - ERROR - stderr - 39%|███▉ | 8744/22434 [8:59:40<9:40:29, 2.54s/it] +2025-02-05 19:07:20 - ERROR - stderr - +2025-02-05 19:07:20 - ERROR - stderr - +2025-02-05 19:07:20 - INFO - stdout - {'loss': 0.6605, 'grad_norm': 1.2050734758377075, 'learning_rate': 1.3946554211087657e-05, 'epoch': 1.17} +2025-02-05 19:07:20 - ERROR - stderr - 39%|███▉ | 8744/22434 [8:59:40<9:40:29, 2.54s/it] +2025-02-05 19:07:23 - ERROR - stderr - 39%|███▉ | 8745/22434 [8:59:42<9:40:02, 2.54s/it] +2025-02-05 19:07:23 - ERROR - stderr - +2025-02-05 19:07:23 - ERROR - stderr - +2025-02-05 19:07:23 - INFO - stdout - {'loss': 0.7107, 'grad_norm': 1.1356236934661865, 'learning_rate': 1.3945227613405902e-05, 'epoch': 1.17} +2025-02-05 19:07:23 - ERROR - stderr - 39%|███▉ | 8745/22434 [8:59:42<9:40:02, 2.54s/it] +2025-02-05 19:07:25 - ERROR - stderr - 39%|███▉ | 8746/22434 [8:59:45<9:32:19, 2.51s/it] +2025-02-05 19:07:25 - ERROR - stderr - +2025-02-05 19:07:25 - ERROR - stderr - +2025-02-05 19:07:25 - INFO - stdout - {'loss': 0.687, 'grad_norm': 1.0730347633361816, 'learning_rate': 1.3943900933489653e-05, 'epoch': 1.17} +2025-02-05 19:07:25 - ERROR - stderr - 39%|███▉ | 8746/22434 [8:59:45<9:32:19, 2.51s/it] +2025-02-05 19:07:28 - ERROR - stderr - 39%|███▉ | 8747/22434 [8:59:47<9:33:11, 2.51s/it] +2025-02-05 19:07:28 - ERROR - stderr - +2025-02-05 19:07:28 - ERROR - stderr - +2025-02-05 19:07:28 - INFO - stdout - {'loss': 0.6412, 'grad_norm': 1.0903875827789307, 'learning_rate': 1.3942574171366563e-05, 'epoch': 1.17} +2025-02-05 19:07:28 - ERROR - stderr 
- 39%|███▉ | 8747/22434 [8:59:47<9:33:11, 2.51s/it] +2025-02-05 19:07:30 - ERROR - stderr - 39%|███▉ | 8748/22434 [8:59:50<9:32:17, 2.51s/it] +2025-02-05 19:07:30 - ERROR - stderr - +2025-02-05 19:07:30 - ERROR - stderr - +2025-02-05 19:07:30 - INFO - stdout - {'loss': 0.706, 'grad_norm': 1.2559584379196167, 'learning_rate': 1.3941247327064286e-05, 'epoch': 1.17} +2025-02-05 19:07:30 - ERROR - stderr - 39%|███▉ | 8748/22434 [8:59:50<9:32:17, 2.51s/it] +2025-02-05 19:07:33 - ERROR - stderr - 39%|███▉ | 8749/22434 [8:59:52<9:33:35, 2.51s/it] +2025-02-05 19:07:33 - ERROR - stderr - +2025-02-05 19:07:33 - ERROR - stderr - +2025-02-05 19:07:33 - INFO - stdout - {'loss': 0.6505, 'grad_norm': 1.115257740020752, 'learning_rate': 1.3939920400610483e-05, 'epoch': 1.17} +2025-02-05 19:07:33 - ERROR - stderr - 39%|███▉ | 8749/22434 [8:59:52<9:33:35, 2.51s/it] +2025-02-05 19:07:35 - ERROR - stderr - 39%|███▉ | 8750/22434 [8:59:55<9:36:32, 2.53s/it] +2025-02-05 19:07:35 - ERROR - stderr - +2025-02-05 19:07:35 - ERROR - stderr - +2025-02-05 19:07:35 - INFO - stdout - {'loss': 0.7214, 'grad_norm': 1.0530403852462769, 'learning_rate': 1.3938593392032806e-05, 'epoch': 1.17} +2025-02-05 19:07:35 - ERROR - stderr - 39%|███▉ | 8750/22434 [8:59:55<9:36:32, 2.53s/it] +2025-02-05 19:07:38 - ERROR - stderr - 39%|███▉ | 8751/22434 [8:59:58<9:55:33, 2.61s/it] +2025-02-05 19:07:38 - ERROR - stderr - +2025-02-05 19:07:38 - ERROR - stderr - +2025-02-05 19:07:38 - INFO - stdout - {'loss': 0.7458, 'grad_norm': 1.0763143301010132, 'learning_rate': 1.393726630135892e-05, 'epoch': 1.17} +2025-02-05 19:07:38 - ERROR - stderr - 39%|███▉ | 8751/22434 [8:59:58<9:55:33, 2.61s/it] +2025-02-05 19:07:40 - ERROR - stderr - 39%|███▉ | 8752/22434 [9:00:00<9:45:53, 2.57s/it] +2025-02-05 19:07:40 - ERROR - stderr - +2025-02-05 19:07:40 - ERROR - stderr - +2025-02-05 19:07:40 - INFO - stdout - {'loss': 0.6818, 'grad_norm': 1.0420470237731934, 'learning_rate': 1.3935939128616486e-05, 'epoch': 1.17} +2025-02-05 
19:07:40 - ERROR - stderr - 39%|███▉ | 8752/22434 [9:00:00<9:45:53, 2.57s/it] +2025-02-05 19:07:43 - ERROR - stderr - 39%|███▉ | 8753/22434 [9:00:03<9:41:32, 2.55s/it] +2025-02-05 19:07:43 - ERROR - stderr - +2025-02-05 19:07:43 - ERROR - stderr - +2025-02-05 19:07:43 - INFO - stdout - {'loss': 0.8303, 'grad_norm': 1.3923213481903076, 'learning_rate': 1.3934611873833168e-05, 'epoch': 1.17} +2025-02-05 19:07:43 - ERROR - stderr - 39%|███▉ | 8753/22434 [9:00:03<9:41:32, 2.55s/it] +2025-02-05 19:07:45 - ERROR - stderr - 39%|███▉ | 8754/22434 [9:00:05<9:33:36, 2.52s/it] +2025-02-05 19:07:45 - ERROR - stderr - +2025-02-05 19:07:45 - ERROR - stderr - +2025-02-05 19:07:45 - INFO - stdout - {'loss': 0.6855, 'grad_norm': 1.255861520767212, 'learning_rate': 1.3933284537036626e-05, 'epoch': 1.17} +2025-02-05 19:07:45 - ERROR - stderr - 39%|███▉ | 8754/22434 [9:00:05<9:33:36, 2.52s/it] +2025-02-05 19:07:48 - ERROR - stderr - 39%|███▉ | 8755/22434 [9:00:08<9:33:32, 2.52s/it] +2025-02-05 19:07:48 - ERROR - stderr - +2025-02-05 19:07:48 - ERROR - stderr - +2025-02-05 19:07:48 - INFO - stdout - {'loss': 0.6539, 'grad_norm': 1.1773043870925903, 'learning_rate': 1.3931957118254536e-05, 'epoch': 1.17} +2025-02-05 19:07:48 - ERROR - stderr - 39%|███▉ | 8755/22434 [9:00:08<9:33:32, 2.52s/it] +2025-02-05 19:07:50 - ERROR - stderr - 39%|███▉ | 8756/22434 [9:00:10<9:39:49, 2.54s/it] +2025-02-05 19:07:50 - ERROR - stderr - +2025-02-05 19:07:50 - ERROR - stderr - +2025-02-05 19:07:50 - INFO - stdout - {'loss': 0.6991, 'grad_norm': 1.0987216234207153, 'learning_rate': 1.3930629617514562e-05, 'epoch': 1.17} +2025-02-05 19:07:50 - ERROR - stderr - 39%|███▉ | 8756/22434 [9:00:10<9:39:49, 2.54s/it] +2025-02-05 19:07:53 - ERROR - stderr - 39%|███▉ | 8757/22434 [9:00:13<9:47:36, 2.58s/it] +2025-02-05 19:07:53 - ERROR - stderr - +2025-02-05 19:07:53 - ERROR - stderr - +2025-02-05 19:07:53 - INFO - stdout - {'loss': 0.7427, 'grad_norm': 1.2814117670059204, 'learning_rate': 1.3929302034844373e-05, 
'epoch': 1.17} +2025-02-05 19:07:53 - ERROR - stderr - 39%|███▉ | 8757/22434 [9:00:13<9:47:36, 2.58s/it] +2025-02-05 19:07:56 - ERROR - stderr - 39%|███▉ | 8758/22434 [9:00:15<9:45:24, 2.57s/it] +2025-02-05 19:07:56 - ERROR - stderr - +2025-02-05 19:07:56 - ERROR - stderr - +2025-02-05 19:07:56 - INFO - stdout - {'loss': 0.7938, 'grad_norm': 1.229577660560608, 'learning_rate': 1.3927974370271644e-05, 'epoch': 1.17} +2025-02-05 19:07:56 - ERROR - stderr - 39%|███▉ | 8758/22434 [9:00:15<9:45:24, 2.57s/it] +2025-02-05 19:07:58 - ERROR - stderr - 39%|███▉ | 8759/22434 [9:00:18<9:37:46, 2.53s/it] +2025-02-05 19:07:58 - ERROR - stderr - +2025-02-05 19:07:58 - ERROR - stderr - +2025-02-05 19:07:58 - INFO - stdout - {'loss': 0.7589, 'grad_norm': 1.1454765796661377, 'learning_rate': 1.3926646623824047e-05, 'epoch': 1.17} +2025-02-05 19:07:58 - ERROR - stderr - 39%|███▉ | 8759/22434 [9:00:18<9:37:46, 2.53s/it] +2025-02-05 19:08:01 - ERROR - stderr - 39%|███▉ | 8760/22434 [9:00:20<9:37:01, 2.53s/it] +2025-02-05 19:08:01 - ERROR - stderr - +2025-02-05 19:08:01 - ERROR - stderr - +2025-02-05 19:08:01 - INFO - stdout - {'loss': 0.7225, 'grad_norm': 1.0660130977630615, 'learning_rate': 1.392531879552926e-05, 'epoch': 1.17} +2025-02-05 19:08:01 - ERROR - stderr - 39%|███▉ | 8760/22434 [9:00:20<9:37:01, 2.53s/it] +2025-02-05 19:08:03 - ERROR - stderr - 39%|███▉ | 8761/22434 [9:00:23<9:33:12, 2.52s/it] +2025-02-05 19:08:03 - ERROR - stderr - +2025-02-05 19:08:03 - ERROR - stderr - +2025-02-05 19:08:03 - INFO - stdout - {'loss': 0.6179, 'grad_norm': 1.0706727504730225, 'learning_rate': 1.3923990885414958e-05, 'epoch': 1.17} +2025-02-05 19:08:03 - ERROR - stderr - 39%|███▉ | 8761/22434 [9:00:23<9:33:12, 2.52s/it] +2025-02-05 19:08:06 - ERROR - stderr - 39%|███▉ | 8762/22434 [9:00:25<9:29:36, 2.50s/it] +2025-02-05 19:08:06 - ERROR - stderr - +2025-02-05 19:08:06 - ERROR - stderr - +2025-02-05 19:08:06 - INFO - stdout - {'loss': 0.7456, 'grad_norm': 1.1329622268676758, 'learning_rate': 
1.392266289350882e-05, 'epoch': 1.17} +2025-02-05 19:08:06 - ERROR - stderr - 39%|███▉ | 8762/22434 [9:00:25<9:29:36, 2.50s/it] +2025-02-05 19:08:08 - ERROR - stderr - 39%|███▉ | 8763/22434 [9:00:28<9:37:26, 2.53s/it] +2025-02-05 19:08:08 - ERROR - stderr - +2025-02-05 19:08:08 - ERROR - stderr - +2025-02-05 19:08:08 - INFO - stdout - {'loss': 0.7453, 'grad_norm': 1.2155379056930542, 'learning_rate': 1.3921334819838527e-05, 'epoch': 1.17} +2025-02-05 19:08:08 - ERROR - stderr - 39%|███▉ | 8763/22434 [9:00:28<9:37:26, 2.53s/it] +2025-02-05 19:08:11 - ERROR - stderr - 39%|███▉ | 8764/22434 [9:00:30<9:36:15, 2.53s/it] +2025-02-05 19:08:11 - ERROR - stderr - +2025-02-05 19:08:11 - ERROR - stderr - +2025-02-05 19:08:11 - INFO - stdout - {'loss': 0.7464, 'grad_norm': 1.1920222043991089, 'learning_rate': 1.3920006664431767e-05, 'epoch': 1.17} +2025-02-05 19:08:11 - ERROR - stderr - 39%|███▉ | 8764/22434 [9:00:31<9:36:15, 2.53s/it] +2025-02-05 19:08:13 - ERROR - stderr - 39%|███▉ | 8765/22434 [9:00:33<9:32:26, 2.51s/it] +2025-02-05 19:08:13 - ERROR - stderr - +2025-02-05 19:08:13 - ERROR - stderr - +2025-02-05 19:08:13 - INFO - stdout - {'loss': 0.6758, 'grad_norm': 1.190022587776184, 'learning_rate': 1.3918678427316215e-05, 'epoch': 1.17} +2025-02-05 19:08:13 - ERROR - stderr - 39%|███▉ | 8765/22434 [9:00:33<9:32:26, 2.51s/it] +2025-02-05 19:08:16 - ERROR - stderr - 39%|███▉ | 8766/22434 [9:00:36<9:37:02, 2.53s/it] +2025-02-05 19:08:16 - ERROR - stderr - +2025-02-05 19:08:16 - ERROR - stderr - +2025-02-05 19:08:16 - INFO - stdout - {'loss': 0.8224, 'grad_norm': 1.1319959163665771, 'learning_rate': 1.391735010851956e-05, 'epoch': 1.17} +2025-02-05 19:08:16 - ERROR - stderr - 39%|███▉ | 8766/22434 [9:00:36<9:37:02, 2.53s/it] +2025-02-05 19:08:18 - ERROR - stderr - 39%|███▉ | 8767/22434 [9:00:38<9:30:18, 2.50s/it] +2025-02-05 19:08:18 - ERROR - stderr - +2025-02-05 19:08:18 - ERROR - stderr - +2025-02-05 19:08:18 - INFO - stdout - {'loss': 0.7751, 'grad_norm': 
1.1790599822998047, 'learning_rate': 1.3916021708069492e-05, 'epoch': 1.17} +2025-02-05 19:08:18 - ERROR - stderr - 39%|███▉ | 8767/22434 [9:00:38<9:30:18, 2.50s/it] +2025-02-05 19:08:21 - ERROR - stderr - 39%|███▉ | 8768/22434 [9:00:41<9:33:51, 2.52s/it] +2025-02-05 19:08:21 - ERROR - stderr - +2025-02-05 19:08:21 - ERROR - stderr - +2025-02-05 19:08:21 - INFO - stdout - {'loss': 0.761, 'grad_norm': 1.3080675601959229, 'learning_rate': 1.3914693225993701e-05, 'epoch': 1.17} +2025-02-05 19:08:21 - ERROR - stderr - 39%|███▉ | 8768/22434 [9:00:41<9:33:51, 2.52s/it] +2025-02-05 19:08:23 - ERROR - stderr - 39%|███▉ | 8769/22434 [9:00:43<9:29:42, 2.50s/it] +2025-02-05 19:08:23 - ERROR - stderr - +2025-02-05 19:08:23 - ERROR - stderr - +2025-02-05 19:08:23 - INFO - stdout - {'loss': 0.748, 'grad_norm': 1.1109412908554077, 'learning_rate': 1.3913364662319872e-05, 'epoch': 1.17} +2025-02-05 19:08:23 - ERROR - stderr - 39%|███▉ | 8769/22434 [9:00:43<9:29:42, 2.50s/it] +2025-02-05 19:08:26 - ERROR - stderr - 39%|███▉ | 8770/22434 [9:00:46<9:43:42, 2.56s/it] +2025-02-05 19:08:26 - ERROR - stderr - +2025-02-05 19:08:26 - ERROR - stderr - +2025-02-05 19:08:26 - INFO - stdout - {'loss': 0.7127, 'grad_norm': 1.0980364084243774, 'learning_rate': 1.3912036017075703e-05, 'epoch': 1.17} +2025-02-05 19:08:26 - ERROR - stderr - 39%|███▉ | 8770/22434 [9:00:46<9:43:42, 2.56s/it] +2025-02-05 19:08:28 - ERROR - stderr - 39%|███▉ | 8771/22434 [9:00:48<9:34:45, 2.52s/it] +2025-02-05 19:08:28 - ERROR - stderr - +2025-02-05 19:08:28 - ERROR - stderr - +2025-02-05 19:08:28 - INFO - stdout - {'loss': 0.6909, 'grad_norm': 1.385385513305664, 'learning_rate': 1.3910707290288885e-05, 'epoch': 1.17} +2025-02-05 19:08:28 - ERROR - stderr - 39%|███▉ | 8771/22434 [9:00:48<9:34:45, 2.52s/it] +2025-02-05 19:08:31 - ERROR - stderr - 39%|███▉ | 8772/22434 [9:00:51<9:32:30, 2.51s/it] +2025-02-05 19:08:31 - ERROR - stderr - +2025-02-05 19:08:31 - ERROR - stderr - +2025-02-05 19:08:31 - INFO - stdout - 
{'loss': 0.6841, 'grad_norm': 1.1751184463500977, 'learning_rate': 1.390937848198712e-05, 'epoch': 1.17} +2025-02-05 19:08:31 - ERROR - stderr - 39%|███▉ | 8772/22434 [9:00:51<9:32:30, 2.51s/it] +2025-02-05 19:08:33 - ERROR - stderr - 39%|███▉ | 8773/22434 [9:00:53<9:35:32, 2.53s/it] +2025-02-05 19:08:33 - ERROR - stderr - +2025-02-05 19:08:33 - ERROR - stderr - +2025-02-05 19:08:33 - INFO - stdout - {'loss': 0.7163, 'grad_norm': 1.1100140810012817, 'learning_rate': 1.3908049592198096e-05, 'epoch': 1.17} +2025-02-05 19:08:33 - ERROR - stderr - 39%|███▉ | 8773/22434 [9:00:53<9:35:32, 2.53s/it] +2025-02-05 19:08:36 - ERROR - stderr - 39%|███▉ | 8774/22434 [9:00:56<9:42:58, 2.56s/it] +2025-02-05 19:08:36 - ERROR - stderr - +2025-02-05 19:08:36 - ERROR - stderr - +2025-02-05 19:08:36 - INFO - stdout - {'loss': 0.7079, 'grad_norm': 1.091443419456482, 'learning_rate': 1.3906720620949521e-05, 'epoch': 1.17} +2025-02-05 19:08:36 - ERROR - stderr - 39%|███▉ | 8774/22434 [9:00:56<9:42:58, 2.56s/it] +2025-02-05 19:08:39 - ERROR - stderr - 39%|███▉ | 8775/22434 [9:00:58<9:44:54, 2.57s/it] +2025-02-05 19:08:39 - ERROR - stderr - +2025-02-05 19:08:39 - ERROR - stderr - +2025-02-05 19:08:39 - INFO - stdout - {'loss': 0.6854, 'grad_norm': 1.1175602674484253, 'learning_rate': 1.3905391568269091e-05, 'epoch': 1.17} +2025-02-05 19:08:39 - ERROR - stderr - 39%|███▉ | 8775/22434 [9:00:58<9:44:54, 2.57s/it] +2025-02-05 19:08:41 - ERROR - stderr - 39%|███▉ | 8776/22434 [9:01:01<9:40:33, 2.55s/it] +2025-02-05 19:08:41 - ERROR - stderr - +2025-02-05 19:08:41 - ERROR - stderr - +2025-02-05 19:08:41 - INFO - stdout - {'loss': 0.7469, 'grad_norm': 1.3727306127548218, 'learning_rate': 1.3904062434184514e-05, 'epoch': 1.17} +2025-02-05 19:08:41 - ERROR - stderr - 39%|███▉ | 8776/22434 [9:01:01<9:40:33, 2.55s/it] +2025-02-05 19:08:44 - ERROR - stderr - 39%|███▉ | 8777/22434 [9:01:03<9:36:17, 2.53s/it] +2025-02-05 19:08:44 - ERROR - stderr - +2025-02-05 19:08:44 - ERROR - stderr - +2025-02-05 
19:08:44 - INFO - stdout - {'loss': 0.7488, 'grad_norm': 1.154465913772583, 'learning_rate': 1.390273321872349e-05, 'epoch': 1.17} +2025-02-05 19:08:44 - ERROR - stderr - 39%|███▉ | 8777/22434 [9:01:03<9:36:17, 2.53s/it] +2025-02-05 19:08:46 - ERROR - stderr - 39%|███▉ | 8778/22434 [9:01:06<9:54:26, 2.61s/it] +2025-02-05 19:08:46 - ERROR - stderr - +2025-02-05 19:08:46 - ERROR - stderr - +2025-02-05 19:08:46 - INFO - stdout - {'loss': 0.7196, 'grad_norm': 1.2908077239990234, 'learning_rate': 1.3901403921913725e-05, 'epoch': 1.17} +2025-02-05 19:08:46 - ERROR - stderr - 39%|███▉ | 8778/22434 [9:01:06<9:54:26, 2.61s/it] +2025-02-05 19:08:49 - ERROR - stderr - 39%|███▉ | 8779/22434 [9:01:09<9:47:55, 2.58s/it] +2025-02-05 19:08:49 - ERROR - stderr - +2025-02-05 19:08:49 - ERROR - stderr - +2025-02-05 19:08:49 - INFO - stdout - {'loss': 0.7052, 'grad_norm': 1.1978435516357422, 'learning_rate': 1.3900074543782931e-05, 'epoch': 1.17} +2025-02-05 19:08:49 - ERROR - stderr - 39%|███▉ | 8779/22434 [9:01:09<9:47:55, 2.58s/it] +2025-02-05 19:08:51 - ERROR - stderr - 39%|███▉ | 8780/22434 [9:01:11<9:44:58, 2.57s/it] +2025-02-05 19:08:52 - ERROR - stderr - +2025-02-05 19:08:52 - ERROR - stderr - +2025-02-05 19:08:52 - INFO - stdout - {'loss': 0.7444, 'grad_norm': 1.1991652250289917, 'learning_rate': 1.3898745084358814e-05, 'epoch': 1.17} +2025-02-05 19:08:52 - ERROR - stderr - 39%|███▉ | 8780/22434 [9:01:11<9:44:58, 2.57s/it] +2025-02-05 19:08:54 - ERROR - stderr - 39%|███▉ | 8781/22434 [9:01:14<9:42:11, 2.56s/it] +2025-02-05 19:08:54 - ERROR - stderr - +2025-02-05 19:08:54 - ERROR - stderr - +2025-02-05 19:08:54 - INFO - stdout - {'loss': 0.7453, 'grad_norm': 1.1526405811309814, 'learning_rate': 1.3897415543669084e-05, 'epoch': 1.17} +2025-02-05 19:08:54 - ERROR - stderr - 39%|███▉ | 8781/22434 [9:01:14<9:42:11, 2.56s/it] +2025-02-05 19:08:57 - ERROR - stderr - 39%|███▉ | 8782/22434 [9:01:16<9:42:07, 2.56s/it] +2025-02-05 19:08:57 - ERROR - stderr - +2025-02-05 19:08:57 - ERROR 
- stderr - +2025-02-05 19:08:57 - INFO - stdout - {'loss': 0.641, 'grad_norm': 1.0260189771652222, 'learning_rate': 1.3896085921741458e-05, 'epoch': 1.17} +2025-02-05 19:08:57 - ERROR - stderr - 39%|███▉ | 8782/22434 [9:01:16<9:42:07, 2.56s/it] +2025-02-05 19:08:59 - ERROR - stderr - 39%|███▉ | 8783/22434 [9:01:19<9:38:16, 2.54s/it] +2025-02-05 19:08:59 - ERROR - stderr - +2025-02-05 19:08:59 - ERROR - stderr - +2025-02-05 19:08:59 - INFO - stdout - {'loss': 0.7085, 'grad_norm': 1.1377290487289429, 'learning_rate': 1.389475621860365e-05, 'epoch': 1.17} +2025-02-05 19:08:59 - ERROR - stderr - 39%|███▉ | 8783/22434 [9:01:19<9:38:16, 2.54s/it] +2025-02-05 19:09:02 - ERROR - stderr - 39%|███▉ | 8784/22434 [9:01:21<9:37:12, 2.54s/it] +2025-02-05 19:09:02 - ERROR - stderr - +2025-02-05 19:09:02 - ERROR - stderr - +2025-02-05 19:09:02 - INFO - stdout - {'loss': 0.6937, 'grad_norm': 1.068386435508728, 'learning_rate': 1.3893426434283376e-05, 'epoch': 1.17} +2025-02-05 19:09:02 - ERROR - stderr - 39%|███▉ | 8784/22434 [9:01:21<9:37:12, 2.54s/it] +2025-02-05 19:09:04 - ERROR - stderr - 39%|███▉ | 8785/22434 [9:01:24<9:35:05, 2.53s/it] +2025-02-05 19:09:04 - ERROR - stderr - +2025-02-05 19:09:04 - ERROR - stderr - +2025-02-05 19:09:04 - INFO - stdout - {'loss': 0.7389, 'grad_norm': 1.2375776767730713, 'learning_rate': 1.3892096568808353e-05, 'epoch': 1.17} +2025-02-05 19:09:04 - ERROR - stderr - 39%|███▉ | 8785/22434 [9:01:24<9:35:05, 2.53s/it] +2025-02-05 19:09:07 - ERROR - stderr - 39%|███▉ | 8786/22434 [9:01:26<9:32:06, 2.52s/it] +2025-02-05 19:09:07 - ERROR - stderr - +2025-02-05 19:09:07 - ERROR - stderr - +2025-02-05 19:09:07 - INFO - stdout - {'loss': 0.6776, 'grad_norm': 1.1360684633255005, 'learning_rate': 1.3890766622206298e-05, 'epoch': 1.17} +2025-02-05 19:09:07 - ERROR - stderr - 39%|███▉ | 8786/22434 [9:01:26<9:32:06, 2.52s/it] +2025-02-05 19:09:09 - ERROR - stderr - 39%|███▉ | 8787/22434 [9:01:29<9:30:26, 2.51s/it] +2025-02-05 19:09:09 - ERROR - stderr - 
+2025-02-05 19:09:09 - ERROR - stderr - +2025-02-05 19:09:09 - INFO - stdout - {'loss': 0.7427, 'grad_norm': 1.1640294790267944, 'learning_rate': 1.3889436594504939e-05, 'epoch': 1.18} +2025-02-05 19:09:09 - ERROR - stderr - 39%|███▉ | 8787/22434 [9:01:29<9:30:26, 2.51s/it] +2025-02-05 19:09:12 - ERROR - stderr - 39%|███▉ | 8788/22434 [9:01:31<9:35:18, 2.53s/it] +2025-02-05 19:09:12 - ERROR - stderr - +2025-02-05 19:09:12 - ERROR - stderr - +2025-02-05 19:09:12 - INFO - stdout - {'loss': 0.7745, 'grad_norm': 1.2284187078475952, 'learning_rate': 1.3888106485731999e-05, 'epoch': 1.18} +2025-02-05 19:09:12 - ERROR - stderr - 39%|███▉ | 8788/22434 [9:01:31<9:35:18, 2.53s/it] +2025-02-05 19:09:14 - ERROR - stderr - 39%|███▉ | 8789/22434 [9:01:34<9:31:16, 2.51s/it] +2025-02-05 19:09:14 - ERROR - stderr - +2025-02-05 19:09:14 - ERROR - stderr - +2025-02-05 19:09:14 - INFO - stdout - {'loss': 0.7296, 'grad_norm': 1.2578758001327515, 'learning_rate': 1.3886776295915194e-05, 'epoch': 1.18} +2025-02-05 19:09:14 - ERROR - stderr - 39%|███▉ | 8789/22434 [9:01:34<9:31:16, 2.51s/it] +2025-02-05 19:09:17 - ERROR - stderr - 39%|███▉ | 8790/22434 [9:01:36<9:27:36, 2.50s/it] +2025-02-05 19:09:17 - ERROR - stderr - +2025-02-05 19:09:17 - ERROR - stderr - +2025-02-05 19:09:17 - INFO - stdout - {'loss': 0.65, 'grad_norm': 1.1694920063018799, 'learning_rate': 1.388544602508226e-05, 'epoch': 1.18} +2025-02-05 19:09:17 - ERROR - stderr - 39%|███▉ | 8790/22434 [9:01:36<9:27:36, 2.50s/it] +2025-02-05 19:09:19 - ERROR - stderr - 39%|███▉ | 8791/22434 [9:01:39<9:27:55, 2.50s/it] +2025-02-05 19:09:19 - ERROR - stderr - +2025-02-05 19:09:19 - ERROR - stderr - +2025-02-05 19:09:19 - INFO - stdout - {'loss': 0.6418, 'grad_norm': 1.0403350591659546, 'learning_rate': 1.388411567326092e-05, 'epoch': 1.18} +2025-02-05 19:09:19 - ERROR - stderr - 39%|███▉ | 8791/22434 [9:01:39<9:27:55, 2.50s/it] +2025-02-05 19:09:22 - ERROR - stderr - 39%|███▉ | 8792/22434 [9:01:41<9:27:40, 2.50s/it] +2025-02-05 
19:09:22 - ERROR - stderr - +2025-02-05 19:09:22 - ERROR - stderr - +2025-02-05 19:09:22 - INFO - stdout - {'loss': 0.7248, 'grad_norm': 1.112365961074829, 'learning_rate': 1.3882785240478906e-05, 'epoch': 1.18} +2025-02-05 19:09:22 - ERROR - stderr - 39%|███▉ | 8792/22434 [9:01:41<9:27:40, 2.50s/it] +2025-02-05 19:09:24 - ERROR - stderr - 39%|███▉ | 8793/22434 [9:01:44<9:42:05, 2.56s/it] +2025-02-05 19:09:24 - ERROR - stderr - +2025-02-05 19:09:24 - ERROR - stderr - +2025-02-05 19:09:24 - INFO - stdout - {'loss': 0.7454, 'grad_norm': 1.2182716131210327, 'learning_rate': 1.3881454726763947e-05, 'epoch': 1.18} +2025-02-05 19:09:24 - ERROR - stderr - 39%|███▉ | 8793/22434 [9:01:44<9:42:05, 2.56s/it] +2025-02-05 19:09:27 - ERROR - stderr - 39%|███▉ | 8794/22434 [9:01:47<9:39:03, 2.55s/it] +2025-02-05 19:09:27 - ERROR - stderr - +2025-02-05 19:09:27 - ERROR - stderr - +2025-02-05 19:09:27 - INFO - stdout - {'loss': 0.8102, 'grad_norm': 1.1608079671859741, 'learning_rate': 1.3880124132143782e-05, 'epoch': 1.18} +2025-02-05 19:09:27 - ERROR - stderr - 39%|███▉ | 8794/22434 [9:01:47<9:39:03, 2.55s/it] +2025-02-05 19:09:29 - ERROR - stderr - 39%|███▉ | 8795/22434 [9:01:49<9:38:48, 2.55s/it] +2025-02-05 19:09:29 - ERROR - stderr - +2025-02-05 19:09:29 - ERROR - stderr - +2025-02-05 19:09:29 - INFO - stdout - {'loss': 0.7546, 'grad_norm': 1.1517155170440674, 'learning_rate': 1.387879345664614e-05, 'epoch': 1.18} +2025-02-05 19:09:29 - ERROR - stderr - 39%|███▉ | 8795/22434 [9:01:49<9:38:48, 2.55s/it] +2025-02-05 19:09:32 - ERROR - stderr - 39%|███▉ | 8796/22434 [9:01:52<9:35:41, 2.53s/it] +2025-02-05 19:09:32 - ERROR - stderr - +2025-02-05 19:09:32 - ERROR - stderr - +2025-02-05 19:09:32 - INFO - stdout - {'loss': 0.777, 'grad_norm': 1.1888848543167114, 'learning_rate': 1.3877462700298763e-05, 'epoch': 1.18} +2025-02-05 19:09:32 - ERROR - stderr - 39%|███▉ | 8796/22434 [9:01:52<9:35:41, 2.53s/it] +2025-02-05 19:09:34 - ERROR - stderr - 39%|███▉ | 8797/22434 [9:01:54<9:28:57, 
2.50s/it] +2025-02-05 19:09:34 - ERROR - stderr - +2025-02-05 19:09:34 - ERROR - stderr - +2025-02-05 19:09:34 - INFO - stdout - {'loss': 0.7521, 'grad_norm': 1.1409714221954346, 'learning_rate': 1.3876131863129384e-05, 'epoch': 1.18} +2025-02-05 19:09:34 - ERROR - stderr - 39%|███▉ | 8797/22434 [9:01:54<9:28:57, 2.50s/it] +2025-02-05 19:09:37 - ERROR - stderr - 39%|███▉ | 8798/22434 [9:01:57<9:26:17, 2.49s/it] +2025-02-05 19:09:37 - ERROR - stderr - +2025-02-05 19:09:37 - ERROR - stderr - +2025-02-05 19:09:37 - INFO - stdout - {'loss': 0.6729, 'grad_norm': 1.151750087738037, 'learning_rate': 1.3874800945165746e-05, 'epoch': 1.18} +2025-02-05 19:09:37 - ERROR - stderr - 39%|███▉ | 8798/22434 [9:01:57<9:26:17, 2.49s/it] +2025-02-05 19:09:39 - ERROR - stderr - 39%|███▉ | 8799/22434 [9:01:59<9:27:32, 2.50s/it] +2025-02-05 19:09:39 - ERROR - stderr - +2025-02-05 19:09:39 - ERROR - stderr - +2025-02-05 19:09:39 - INFO - stdout - {'loss': 0.6946, 'grad_norm': 1.1697924137115479, 'learning_rate': 1.387346994643559e-05, 'epoch': 1.18} +2025-02-05 19:09:39 - ERROR - stderr - 39%|███▉ | 8799/22434 [9:01:59<9:27:32, 2.50s/it] +2025-02-05 19:09:42 - ERROR - stderr - 39%|███▉ | 8800/22434 [9:02:02<9:29:19, 2.51s/it] +2025-02-05 19:09:42 - ERROR - stderr - +2025-02-05 19:09:42 - ERROR - stderr - +2025-02-05 19:09:42 - INFO - stdout - {'loss': 0.738, 'grad_norm': 1.2502949237823486, 'learning_rate': 1.3872138866966658e-05, 'epoch': 1.18} +2025-02-05 19:09:42 - ERROR - stderr - 39%|███▉ | 8800/22434 [9:02:02<9:29:19, 2.51s/it] +2025-02-05 19:09:44 - ERROR - stderr - 39%|███▉ | 8801/22434 [9:02:04<9:25:54, 2.49s/it] +2025-02-05 19:09:44 - ERROR - stderr - +2025-02-05 19:09:44 - ERROR - stderr - +2025-02-05 19:09:44 - INFO - stdout - {'loss': 0.8431, 'grad_norm': 1.328881859779358, 'learning_rate': 1.3870807706786697e-05, 'epoch': 1.18} +2025-02-05 19:09:44 - ERROR - stderr - 39%|███▉ | 8801/22434 [9:02:04<9:25:54, 2.49s/it] +2025-02-05 19:09:47 - ERROR - stderr - 39%|███▉ | 
8802/22434 [9:02:07<9:28:25, 2.50s/it] +2025-02-05 19:09:47 - ERROR - stderr - +2025-02-05 19:09:47 - ERROR - stderr - +2025-02-05 19:09:47 - INFO - stdout - {'loss': 0.7061, 'grad_norm': 1.194585919380188, 'learning_rate': 1.3869476465923455e-05, 'epoch': 1.18} +2025-02-05 19:09:47 - ERROR - stderr - 39%|███▉ | 8802/22434 [9:02:07<9:28:25, 2.50s/it] +2025-02-05 19:09:49 - ERROR - stderr - 39%|███▉ | 8803/22434 [9:02:09<9:29:32, 2.51s/it] +2025-02-05 19:09:49 - ERROR - stderr - +2025-02-05 19:09:49 - ERROR - stderr - +2025-02-05 19:09:49 - INFO - stdout - {'loss': 0.791, 'grad_norm': 1.397708773612976, 'learning_rate': 1.3868145144404677e-05, 'epoch': 1.18} +2025-02-05 19:09:49 - ERROR - stderr - 39%|███▉ | 8803/22434 [9:02:09<9:29:32, 2.51s/it] +2025-02-05 19:09:52 - ERROR - stderr - 39%|███▉ | 8804/22434 [9:02:11<9:24:56, 2.49s/it] +2025-02-05 19:09:52 - ERROR - stderr - +2025-02-05 19:09:52 - ERROR - stderr - +2025-02-05 19:09:52 - INFO - stdout - {'loss': 0.7434, 'grad_norm': 1.1802654266357422, 'learning_rate': 1.3866813742258116e-05, 'epoch': 1.18} +2025-02-05 19:09:52 - ERROR - stderr - 39%|███▉ | 8804/22434 [9:02:12<9:24:56, 2.49s/it] +2025-02-05 19:09:54 - ERROR - stderr - 39%|███▉ | 8805/22434 [9:02:14<9:43:21, 2.57s/it] +2025-02-05 19:09:55 - ERROR - stderr - +2025-02-05 19:09:55 - ERROR - stderr - +2025-02-05 19:09:55 - INFO - stdout - {'loss': 0.6187, 'grad_norm': 0.9789415001869202, 'learning_rate': 1.386548225951152e-05, 'epoch': 1.18} +2025-02-05 19:09:55 - ERROR - stderr - 39%|███▉ | 8805/22434 [9:02:14<9:43:21, 2.57s/it] +2025-02-05 19:09:57 - ERROR - stderr - 39%|███▉ | 8806/22434 [9:02:17<9:34:29, 2.53s/it] +2025-02-05 19:09:57 - ERROR - stderr - +2025-02-05 19:09:57 - ERROR - stderr - +2025-02-05 19:09:57 - INFO - stdout - {'loss': 0.7305, 'grad_norm': 1.1203033924102783, 'learning_rate': 1.386415069619265e-05, 'epoch': 1.18} +2025-02-05 19:09:57 - ERROR - stderr - 39%|███▉ | 8806/22434 [9:02:17<9:34:29, 2.53s/it] +2025-02-05 19:09:59 - ERROR - 
stderr - 39%|███▉ | 8807/22434 [9:02:19<9:31:42, 2.52s/it] +2025-02-05 19:09:59 - ERROR - stderr - +2025-02-05 19:09:59 - ERROR - stderr - +2025-02-05 19:09:59 - INFO - stdout - {'loss': 0.777, 'grad_norm': 1.1714390516281128, 'learning_rate': 1.386281905232925e-05, 'epoch': 1.18} +2025-02-05 19:09:59 - ERROR - stderr - 39%|███▉ | 8807/22434 [9:02:19<9:31:42, 2.52s/it] +2025-02-05 19:10:02 - ERROR - stderr - 39%|███▉ | 8808/22434 [9:02:22<9:30:40, 2.51s/it] +2025-02-05 19:10:02 - ERROR - stderr - +2025-02-05 19:10:02 - ERROR - stderr - +2025-02-05 19:10:02 - INFO - stdout - {'loss': 0.6869, 'grad_norm': 1.0305331945419312, 'learning_rate': 1.386148732794909e-05, 'epoch': 1.18} +2025-02-05 19:10:02 - ERROR - stderr - 39%|███▉ | 8808/22434 [9:02:22<9:30:40, 2.51s/it] +2025-02-05 19:10:04 - ERROR - stderr - 39%|███▉ | 8809/22434 [9:02:24<9:26:15, 2.49s/it] +2025-02-05 19:10:04 - ERROR - stderr - +2025-02-05 19:10:04 - ERROR - stderr - +2025-02-05 19:10:04 - INFO - stdout - {'loss': 0.657, 'grad_norm': 1.1210215091705322, 'learning_rate': 1.386015552307992e-05, 'epoch': 1.18} +2025-02-05 19:10:04 - ERROR - stderr - 39%|███▉ | 8809/22434 [9:02:24<9:26:15, 2.49s/it] +2025-02-05 19:10:07 - ERROR - stderr - 39%|███▉ | 8810/22434 [9:02:27<9:29:54, 2.51s/it] +2025-02-05 19:10:07 - ERROR - stderr - +2025-02-05 19:10:07 - ERROR - stderr - +2025-02-05 19:10:07 - INFO - stdout - {'loss': 0.6542, 'grad_norm': 1.1202268600463867, 'learning_rate': 1.3858823637749498e-05, 'epoch': 1.18} +2025-02-05 19:10:07 - ERROR - stderr - 39%|███▉ | 8810/22434 [9:02:27<9:29:54, 2.51s/it] +2025-02-05 19:10:09 - ERROR - stderr - 39%|███▉ | 8811/22434 [9:02:29<9:35:26, 2.53s/it] +2025-02-05 19:10:10 - ERROR - stderr - +2025-02-05 19:10:10 - ERROR - stderr - +2025-02-05 19:10:10 - INFO - stdout - {'loss': 0.7153, 'grad_norm': 1.1044254302978516, 'learning_rate': 1.3857491671985592e-05, 'epoch': 1.18} +2025-02-05 19:10:10 - ERROR - stderr - 39%|███▉ | 8811/22434 [9:02:29<9:35:26, 2.53s/it] 
+2025-02-05 19:10:12 - ERROR - stderr - 39%|███▉ | 8812/22434 [9:02:32<9:31:42, 2.52s/it] +2025-02-05 19:10:12 - ERROR - stderr - +2025-02-05 19:10:12 - ERROR - stderr - +2025-02-05 19:10:12 - INFO - stdout - {'loss': 0.7749, 'grad_norm': 1.404664158821106, 'learning_rate': 1.3856159625815964e-05, 'epoch': 1.18} +2025-02-05 19:10:12 - ERROR - stderr - 39%|███▉ | 8812/22434 [9:02:32<9:31:42, 2.52s/it] +2025-02-05 19:10:14 - ERROR - stderr - 39%|███▉ | 8813/22434 [9:02:34<9:27:33, 2.50s/it] +2025-02-05 19:10:14 - ERROR - stderr - +2025-02-05 19:10:14 - ERROR - stderr - +2025-02-05 19:10:14 - INFO - stdout - {'loss': 0.7063, 'grad_norm': 1.0921473503112793, 'learning_rate': 1.3854827499268377e-05, 'epoch': 1.18} +2025-02-05 19:10:14 - ERROR - stderr - 39%|███▉ | 8813/22434 [9:02:34<9:27:33, 2.50s/it] +2025-02-05 19:10:17 - ERROR - stderr - 39%|███▉ | 8814/22434 [9:02:37<9:30:41, 2.51s/it] +2025-02-05 19:10:17 - ERROR - stderr - +2025-02-05 19:10:17 - ERROR - stderr - +2025-02-05 19:10:17 - INFO - stdout - {'loss': 0.6862, 'grad_norm': 1.0977472066879272, 'learning_rate': 1.3853495292370603e-05, 'epoch': 1.18} +2025-02-05 19:10:17 - ERROR - stderr - 39%|███▉ | 8814/22434 [9:02:37<9:30:41, 2.51s/it] +2025-02-05 19:10:20 - ERROR - stderr - 39%|███▉ | 8815/22434 [9:02:39<9:31:36, 2.52s/it] +2025-02-05 19:10:20 - ERROR - stderr - +2025-02-05 19:10:20 - ERROR - stderr - +2025-02-05 19:10:20 - INFO - stdout - {'loss': 0.7437, 'grad_norm': 1.143293857574463, 'learning_rate': 1.3852163005150402e-05, 'epoch': 1.18} +2025-02-05 19:10:20 - ERROR - stderr - 39%|███▉ | 8815/22434 [9:02:39<9:31:36, 2.52s/it] +2025-02-05 19:10:22 - ERROR - stderr - 39%|███▉ | 8816/22434 [9:02:42<9:32:27, 2.52s/it] +2025-02-05 19:10:22 - ERROR - stderr - +2025-02-05 19:10:22 - ERROR - stderr - +2025-02-05 19:10:22 - INFO - stdout - {'loss': 0.698, 'grad_norm': 1.1174218654632568, 'learning_rate': 1.3850830637635556e-05, 'epoch': 1.18} +2025-02-05 19:10:22 - ERROR - stderr - 39%|███▉ | 8816/22434 
[9:02:42<9:32:27, 2.52s/it] +2025-02-05 19:10:25 - ERROR - stderr - 39%|███▉ | 8817/22434 [9:02:44<9:36:17, 2.54s/it] +2025-02-05 19:10:25 - ERROR - stderr - +2025-02-05 19:10:25 - ERROR - stderr - +2025-02-05 19:10:25 - INFO - stdout - {'loss': 0.7469, 'grad_norm': 1.2731053829193115, 'learning_rate': 1.3849498189853826e-05, 'epoch': 1.18} +2025-02-05 19:10:25 - ERROR - stderr - 39%|███▉ | 8817/22434 [9:02:44<9:36:17, 2.54s/it] +2025-02-05 19:10:27 - ERROR - stderr - 39%|███▉ | 8818/22434 [9:02:47<9:58:51, 2.64s/it] +2025-02-05 19:10:28 - ERROR - stderr - +2025-02-05 19:10:28 - ERROR - stderr - +2025-02-05 19:10:28 - INFO - stdout - {'loss': 0.7478, 'grad_norm': 1.2739965915679932, 'learning_rate': 1.3848165661832986e-05, 'epoch': 1.18} +2025-02-05 19:10:28 - ERROR - stderr - 39%|███▉ | 8818/22434 [9:02:47<9:58:51, 2.64s/it] +2025-02-05 19:10:30 - ERROR - stderr - 39%|███▉ | 8819/22434 [9:02:50<9:53:36, 2.62s/it] +2025-02-05 19:10:30 - ERROR - stderr - +2025-02-05 19:10:30 - ERROR - stderr - +2025-02-05 19:10:30 - INFO - stdout - {'loss': 0.7355, 'grad_norm': 1.0636004209518433, 'learning_rate': 1.3846833053600819e-05, 'epoch': 1.18} +2025-02-05 19:10:30 - ERROR - stderr - 39%|███▉ | 8819/22434 [9:02:50<9:53:36, 2.62s/it] +2025-02-05 19:10:33 - ERROR - stderr - 39%|███▉ | 8820/22434 [9:02:52<9:43:53, 2.57s/it] +2025-02-05 19:10:33 - ERROR - stderr - +2025-02-05 19:10:33 - ERROR - stderr - +2025-02-05 19:10:33 - INFO - stdout - {'loss': 0.7518, 'grad_norm': 1.1283109188079834, 'learning_rate': 1.38455003651851e-05, 'epoch': 1.18} +2025-02-05 19:10:33 - ERROR - stderr - 39%|███▉ | 8820/22434 [9:02:52<9:43:53, 2.57s/it] +2025-02-05 19:10:35 - ERROR - stderr - 39%|███▉ | 8821/22434 [9:02:55<9:40:39, 2.56s/it] +2025-02-05 19:10:35 - ERROR - stderr - +2025-02-05 19:10:35 - ERROR - stderr - +2025-02-05 19:10:35 - INFO - stdout - {'loss': 0.7001, 'grad_norm': 1.2054831981658936, 'learning_rate': 1.3844167596613604e-05, 'epoch': 1.18} +2025-02-05 19:10:35 - ERROR - stderr 
- 39%|███▉ | 8821/22434 [9:02:55<9:40:39, 2.56s/it] +2025-02-05 19:10:38 - ERROR - stderr - 39%|███▉ | 8822/22434 [9:02:57<9:36:07, 2.54s/it] +2025-02-05 19:10:38 - ERROR - stderr - +2025-02-05 19:10:38 - ERROR - stderr - +2025-02-05 19:10:38 - INFO - stdout - {'loss': 0.7324, 'grad_norm': 1.2663980722427368, 'learning_rate': 1.3842834747914111e-05, 'epoch': 1.18} +2025-02-05 19:10:38 - ERROR - stderr - 39%|███▉ | 8822/22434 [9:02:57<9:36:07, 2.54s/it] +2025-02-05 19:10:40 - ERROR - stderr - 39%|███▉ | 8823/22434 [9:03:00<9:35:30, 2.54s/it] +2025-02-05 19:10:40 - ERROR - stderr - +2025-02-05 19:10:40 - ERROR - stderr - +2025-02-05 19:10:40 - INFO - stdout - {'loss': 0.7871, 'grad_norm': 1.1540831327438354, 'learning_rate': 1.3841501819114407e-05, 'epoch': 1.18} +2025-02-05 19:10:40 - ERROR - stderr - 39%|███▉ | 8823/22434 [9:03:00<9:35:30, 2.54s/it] +2025-02-05 19:10:43 - ERROR - stderr - 39%|███▉ | 8824/22434 [9:03:02<9:32:41, 2.52s/it] +2025-02-05 19:10:43 - ERROR - stderr - +2025-02-05 19:10:43 - ERROR - stderr - +2025-02-05 19:10:43 - INFO - stdout - {'loss': 0.6513, 'grad_norm': 1.0857833623886108, 'learning_rate': 1.3840168810242274e-05, 'epoch': 1.18} +2025-02-05 19:10:43 - ERROR - stderr - 39%|███▉ | 8824/22434 [9:03:02<9:32:41, 2.52s/it] +2025-02-05 19:10:45 - ERROR - stderr - 39%|███▉ | 8825/22434 [9:03:05<9:37:40, 2.55s/it] +2025-02-05 19:10:45 - ERROR - stderr - +2025-02-05 19:10:45 - ERROR - stderr - +2025-02-05 19:10:45 - INFO - stdout - {'loss': 0.7216, 'grad_norm': 1.1277796030044556, 'learning_rate': 1.3838835721325493e-05, 'epoch': 1.18} +2025-02-05 19:10:45 - ERROR - stderr - 39%|███▉ | 8825/22434 [9:03:05<9:37:40, 2.55s/it] +2025-02-05 19:10:48 - ERROR - stderr - 39%|███▉ | 8826/22434 [9:03:08<9:47:57, 2.59s/it] +2025-02-05 19:10:48 - ERROR - stderr - +2025-02-05 19:10:48 - ERROR - stderr - +2025-02-05 19:10:48 - INFO - stdout - {'loss': 0.7467, 'grad_norm': 1.3190909624099731, 'learning_rate': 1.3837502552391859e-05, 'epoch': 1.18} +2025-02-05 
19:10:48 - ERROR - stderr - 39%|███▉ | 8826/22434 [9:03:08<9:47:57, 2.59s/it] +2025-02-05 19:10:51 - ERROR - stderr - 39%|███▉ | 8827/22434 [9:03:10<9:50:43, 2.60s/it] +2025-02-05 19:10:51 - ERROR - stderr - +2025-02-05 19:10:51 - ERROR - stderr - +2025-02-05 19:10:51 - INFO - stdout - {'loss': 0.6277, 'grad_norm': 1.152365803718567, 'learning_rate': 1.3836169303469154e-05, 'epoch': 1.18} +2025-02-05 19:10:51 - ERROR - stderr - 39%|███▉ | 8827/22434 [9:03:10<9:50:43, 2.60s/it] +2025-02-05 19:10:53 - ERROR - stderr - 39%|███▉ | 8828/22434 [9:03:13<9:43:57, 2.58s/it] +2025-02-05 19:10:53 - ERROR - stderr - +2025-02-05 19:10:53 - ERROR - stderr - +2025-02-05 19:10:53 - INFO - stdout - {'loss': 0.7012, 'grad_norm': 1.2193844318389893, 'learning_rate': 1.3834835974585175e-05, 'epoch': 1.18} +2025-02-05 19:10:53 - ERROR - stderr - 39%|███▉ | 8828/22434 [9:03:13<9:43:57, 2.58s/it] +2025-02-05 19:10:56 - ERROR - stderr - 39%|███▉ | 8829/22434 [9:03:15<9:41:28, 2.56s/it] +2025-02-05 19:10:56 - ERROR - stderr - +2025-02-05 19:10:56 - ERROR - stderr - +2025-02-05 19:10:56 - INFO - stdout - {'loss': 0.7062, 'grad_norm': 1.150195837020874, 'learning_rate': 1.3833502565767705e-05, 'epoch': 1.18} +2025-02-05 19:10:56 - ERROR - stderr - 39%|███▉ | 8829/22434 [9:03:15<9:41:28, 2.56s/it] +2025-02-05 19:10:58 - ERROR - stderr - 39%|███▉ | 8830/22434 [9:03:18<9:37:33, 2.55s/it] +2025-02-05 19:10:58 - ERROR - stderr - +2025-02-05 19:10:58 - ERROR - stderr - +2025-02-05 19:10:58 - INFO - stdout - {'loss': 0.6639, 'grad_norm': 1.2230052947998047, 'learning_rate': 1.3832169077044544e-05, 'epoch': 1.18} +2025-02-05 19:10:58 - ERROR - stderr - 39%|███▉ | 8830/22434 [9:03:18<9:37:33, 2.55s/it] +2025-02-05 19:11:01 - ERROR - stderr - 39%|███▉ | 8831/22434 [9:03:20<9:34:46, 2.54s/it] +2025-02-05 19:11:01 - ERROR - stderr - +2025-02-05 19:11:01 - ERROR - stderr - +2025-02-05 19:11:01 - INFO - stdout - {'loss': 0.7204, 'grad_norm': 1.1692253351211548, 'learning_rate': 1.3830835508443484e-05, 
'epoch': 1.18} +2025-02-05 19:11:01 - ERROR - stderr - 39%|███▉ | 8831/22434 [9:03:20<9:34:46, 2.54s/it] +2025-02-05 19:11:03 - ERROR - stderr - 39%|███▉ | 8832/22434 [9:03:23<9:37:31, 2.55s/it] +2025-02-05 19:11:03 - ERROR - stderr - +2025-02-05 19:11:03 - ERROR - stderr - +2025-02-05 19:11:03 - INFO - stdout - {'loss': 0.7721, 'grad_norm': 1.3504698276519775, 'learning_rate': 1.3829501859992322e-05, 'epoch': 1.18} +2025-02-05 19:11:03 - ERROR - stderr - 39%|███▉ | 8832/22434 [9:03:23<9:37:31, 2.55s/it] +2025-02-05 19:11:06 - ERROR - stderr - 39%|███▉ | 8833/22434 [9:03:25<9:39:12, 2.56s/it] +2025-02-05 19:11:06 - ERROR - stderr - +2025-02-05 19:11:06 - ERROR - stderr - +2025-02-05 19:11:06 - INFO - stdout - {'loss': 0.7388, 'grad_norm': 1.1660438776016235, 'learning_rate': 1.3828168131718861e-05, 'epoch': 1.18} +2025-02-05 19:11:06 - ERROR - stderr - 39%|███▉ | 8833/22434 [9:03:26<9:39:12, 2.56s/it] +2025-02-05 19:11:08 - ERROR - stderr - 39%|███▉ | 8834/22434 [9:03:28<9:35:09, 2.54s/it] +2025-02-05 19:11:08 - ERROR - stderr - +2025-02-05 19:11:08 - ERROR - stderr - +2025-02-05 19:11:08 - INFO - stdout - {'loss': 0.7192, 'grad_norm': 1.2020562887191772, 'learning_rate': 1.3826834323650899e-05, 'epoch': 1.18} +2025-02-05 19:11:08 - ERROR - stderr - 39%|███▉ | 8834/22434 [9:03:28<9:35:09, 2.54s/it] +2025-02-05 19:11:11 - ERROR - stderr - 39%|███▉ | 8835/22434 [9:03:30<9:30:40, 2.52s/it] +2025-02-05 19:11:11 - ERROR - stderr - +2025-02-05 19:11:11 - ERROR - stderr - +2025-02-05 19:11:11 - INFO - stdout - {'loss': 0.7328, 'grad_norm': 1.1986483335494995, 'learning_rate': 1.3825500435816237e-05, 'epoch': 1.18} +2025-02-05 19:11:11 - ERROR - stderr - 39%|███▉ | 8835/22434 [9:03:31<9:30:40, 2.52s/it] +2025-02-05 19:11:13 - ERROR - stderr - 39%|███▉ | 8836/22434 [9:03:33<9:27:26, 2.50s/it] +2025-02-05 19:11:13 - ERROR - stderr - +2025-02-05 19:11:13 - ERROR - stderr - +2025-02-05 19:11:13 - INFO - stdout - {'loss': 0.7271, 'grad_norm': 1.162795901298523, 'learning_rate': 
1.3824166468242677e-05, 'epoch': 1.18} +2025-02-05 19:11:13 - ERROR - stderr - 39%|███▉ | 8836/22434 [9:03:33<9:27:26, 2.50s/it] +2025-02-05 19:11:16 - ERROR - stderr - 39%|███▉ | 8837/22434 [9:03:35<9:21:13, 2.48s/it] +2025-02-05 19:11:16 - ERROR - stderr - +2025-02-05 19:11:16 - ERROR - stderr - +2025-02-05 19:11:16 - INFO - stdout - {'loss': 0.6605, 'grad_norm': 0.9895807504653931, 'learning_rate': 1.3822832420958028e-05, 'epoch': 1.18} +2025-02-05 19:11:16 - ERROR - stderr - 39%|███▉ | 8837/22434 [9:03:35<9:21:13, 2.48s/it] +2025-02-05 19:11:18 - ERROR - stderr - 39%|███▉ | 8838/22434 [9:03:38<9:27:00, 2.50s/it] +2025-02-05 19:11:18 - ERROR - stderr - +2025-02-05 19:11:18 - ERROR - stderr - +2025-02-05 19:11:18 - INFO - stdout - {'loss': 0.6813, 'grad_norm': 1.1690711975097656, 'learning_rate': 1.3821498293990097e-05, 'epoch': 1.18} +2025-02-05 19:11:18 - ERROR - stderr - 39%|███▉ | 8838/22434 [9:03:38<9:27:00, 2.50s/it] +2025-02-05 19:11:21 - ERROR - stderr - 39%|███▉ | 8839/22434 [9:03:40<9:25:03, 2.49s/it] +2025-02-05 19:11:21 - ERROR - stderr - +2025-02-05 19:11:21 - ERROR - stderr - +2025-02-05 19:11:21 - INFO - stdout - {'loss': 0.7283, 'grad_norm': 1.1499677896499634, 'learning_rate': 1.3820164087366688e-05, 'epoch': 1.18} +2025-02-05 19:11:21 - ERROR - stderr - 39%|███▉ | 8839/22434 [9:03:40<9:25:03, 2.49s/it] +2025-02-05 19:11:23 - ERROR - stderr - 39%|███▉ | 8840/22434 [9:03:43<9:28:40, 2.51s/it] +2025-02-05 19:11:23 - ERROR - stderr - +2025-02-05 19:11:23 - ERROR - stderr - +2025-02-05 19:11:23 - INFO - stdout - {'loss': 0.7385, 'grad_norm': 1.1874727010726929, 'learning_rate': 1.3818829801115615e-05, 'epoch': 1.18} +2025-02-05 19:11:23 - ERROR - stderr - 39%|███▉ | 8840/22434 [9:03:43<9:28:40, 2.51s/it] +2025-02-05 19:11:26 - ERROR - stderr - 39%|███▉ | 8841/22434 [9:03:45<9:30:16, 2.52s/it] +2025-02-05 19:11:26 - ERROR - stderr - +2025-02-05 19:11:26 - ERROR - stderr - +2025-02-05 19:11:26 - INFO - stdout - {'loss': 0.8257, 'grad_norm': 
1.2177965641021729, 'learning_rate': 1.381749543526469e-05, 'epoch': 1.18} +2025-02-05 19:11:26 - ERROR - stderr - 39%|███▉ | 8841/22434 [9:03:46<9:30:16, 2.52s/it] +2025-02-05 19:11:28 - ERROR - stderr - 39%|███▉ | 8842/22434 [9:03:48<9:28:57, 2.51s/it] +2025-02-05 19:11:28 - ERROR - stderr - +2025-02-05 19:11:28 - ERROR - stderr - +2025-02-05 19:11:28 - INFO - stdout - {'loss': 0.7723, 'grad_norm': 1.1914085149765015, 'learning_rate': 1.3816160989841725e-05, 'epoch': 1.18} +2025-02-05 19:11:28 - ERROR - stderr - 39%|███▉ | 8842/22434 [9:03:48<9:28:57, 2.51s/it] +2025-02-05 19:11:31 - ERROR - stderr - 39%|███▉ | 8843/22434 [9:03:50<9:27:36, 2.51s/it] +2025-02-05 19:11:31 - ERROR - stderr - +2025-02-05 19:11:31 - ERROR - stderr - +2025-02-05 19:11:31 - INFO - stdout - {'loss': 0.7329, 'grad_norm': 1.1730551719665527, 'learning_rate': 1.3814826464874536e-05, 'epoch': 1.18} +2025-02-05 19:11:31 - ERROR - stderr - 39%|███▉ | 8843/22434 [9:03:50<9:27:36, 2.51s/it] +2025-02-05 19:11:33 - ERROR - stderr - 39%|███▉ | 8844/22434 [9:03:53<9:29:20, 2.51s/it] +2025-02-05 19:11:33 - ERROR - stderr - +2025-02-05 19:11:33 - ERROR - stderr - +2025-02-05 19:11:33 - INFO - stdout - {'loss': 0.7369, 'grad_norm': 1.1350566148757935, 'learning_rate': 1.3813491860390938e-05, 'epoch': 1.18} +2025-02-05 19:11:33 - ERROR - stderr - 39%|███▉ | 8844/22434 [9:03:53<9:29:20, 2.51s/it] +2025-02-05 19:11:36 - ERROR - stderr - 39%|███▉ | 8845/22434 [9:03:55<9:29:11, 2.51s/it] +2025-02-05 19:11:36 - ERROR - stderr - +2025-02-05 19:11:36 - ERROR - stderr - +2025-02-05 19:11:36 - INFO - stdout - {'loss': 0.6378, 'grad_norm': 1.006437063217163, 'learning_rate': 1.3812157176418752e-05, 'epoch': 1.18} +2025-02-05 19:11:36 - ERROR - stderr - 39%|███▉ | 8845/22434 [9:03:56<9:29:11, 2.51s/it] +2025-02-05 19:11:38 - ERROR - stderr - 39%|███▉ | 8846/22434 [9:03:58<9:25:21, 2.50s/it] +2025-02-05 19:11:38 - ERROR - stderr - +2025-02-05 19:11:38 - ERROR - stderr - +2025-02-05 19:11:38 - INFO - stdout - 
{'loss': 0.751, 'grad_norm': 1.1954995393753052, 'learning_rate': 1.3810822412985798e-05, 'epoch': 1.18} +2025-02-05 19:11:38 - ERROR - stderr - 39%|███▉ | 8846/22434 [9:03:58<9:25:21, 2.50s/it] +2025-02-05 19:11:41 - ERROR - stderr - 39%|███▉ | 8847/22434 [9:04:00<9:22:58, 2.49s/it] +2025-02-05 19:11:41 - ERROR - stderr - +2025-02-05 19:11:41 - ERROR - stderr - +2025-02-05 19:11:41 - INFO - stdout - {'loss': 0.7354, 'grad_norm': 1.178267478942871, 'learning_rate': 1.3809487570119898e-05, 'epoch': 1.18} +2025-02-05 19:11:41 - ERROR - stderr - 39%|███▉ | 8847/22434 [9:04:00<9:22:58, 2.49s/it] +2025-02-05 19:11:43 - ERROR - stderr - 39%|███▉ | 8848/22434 [9:04:03<9:38:51, 2.56s/it] +2025-02-05 19:11:43 - ERROR - stderr - +2025-02-05 19:11:43 - ERROR - stderr - +2025-02-05 19:11:43 - INFO - stdout - {'loss': 0.787, 'grad_norm': 1.1975408792495728, 'learning_rate': 1.3808152647848874e-05, 'epoch': 1.18} +2025-02-05 19:11:43 - ERROR - stderr - 39%|███▉ | 8848/22434 [9:04:03<9:38:51, 2.56s/it] +2025-02-05 19:11:46 - ERROR - stderr - 39%|███▉ | 8849/22434 [9:04:06<9:41:18, 2.57s/it] +2025-02-05 19:11:46 - ERROR - stderr - +2025-02-05 19:11:46 - ERROR - stderr - +2025-02-05 19:11:46 - INFO - stdout - {'loss': 0.7524, 'grad_norm': 1.2115012407302856, 'learning_rate': 1.3806817646200554e-05, 'epoch': 1.18} +2025-02-05 19:11:46 - ERROR - stderr - 39%|███▉ | 8849/22434 [9:04:06<9:41:18, 2.57s/it] +2025-02-05 19:11:49 - ERROR - stderr - 39%|███▉ | 8850/22434 [9:04:08<9:45:45, 2.59s/it] +2025-02-05 19:11:49 - ERROR - stderr - +2025-02-05 19:11:49 - ERROR - stderr - +2025-02-05 19:11:49 - INFO - stdout - {'loss': 0.6741, 'grad_norm': 1.059596061706543, 'learning_rate': 1.380548256520276e-05, 'epoch': 1.18} +2025-02-05 19:11:49 - ERROR - stderr - 39%|███▉ | 8850/22434 [9:04:08<9:45:45, 2.59s/it] +2025-02-05 19:11:51 - ERROR - stderr - 39%|███▉ | 8851/22434 [9:04:11<9:38:58, 2.56s/it] +2025-02-05 19:11:51 - ERROR - stderr - +2025-02-05 19:11:51 - ERROR - stderr - +2025-02-05 
19:11:51 - INFO - stdout - {'loss': 0.6924, 'grad_norm': 1.1211072206497192, 'learning_rate': 1.3804147404883323e-05, 'epoch': 1.18} +2025-02-05 19:11:51 - ERROR - stderr - 39%|███▉ | 8851/22434 [9:04:11<9:38:58, 2.56s/it] +2025-02-05 19:11:54 - ERROR - stderr - 39%|███▉ | 8852/22434 [9:04:13<9:34:12, 2.54s/it] +2025-02-05 19:11:54 - ERROR - stderr - +2025-02-05 19:11:54 - ERROR - stderr - +2025-02-05 19:11:54 - INFO - stdout - {'loss': 0.7666, 'grad_norm': 1.1713043451309204, 'learning_rate': 1.3802812165270076e-05, 'epoch': 1.18} +2025-02-05 19:11:54 - ERROR - stderr - 39%|███▉ | 8852/22434 [9:04:13<9:34:12, 2.54s/it] +2025-02-05 19:11:56 - ERROR - stderr - 39%|███▉ | 8853/22434 [9:04:16<9:33:42, 2.53s/it] +2025-02-05 19:11:56 - ERROR - stderr - +2025-02-05 19:11:56 - ERROR - stderr - +2025-02-05 19:11:56 - INFO - stdout - {'loss': 0.6866, 'grad_norm': 1.132416844367981, 'learning_rate': 1.3801476846390848e-05, 'epoch': 1.18} +2025-02-05 19:11:56 - ERROR - stderr - 39%|███▉ | 8853/22434 [9:04:16<9:33:42, 2.53s/it] +2025-02-05 19:11:59 - ERROR - stderr - 39%|███▉ | 8854/22434 [9:04:18<9:32:17, 2.53s/it] +2025-02-05 19:11:59 - ERROR - stderr - +2025-02-05 19:11:59 - ERROR - stderr - +2025-02-05 19:11:59 - INFO - stdout - {'loss': 0.7049, 'grad_norm': 1.156435489654541, 'learning_rate': 1.3800141448273472e-05, 'epoch': 1.18} +2025-02-05 19:11:59 - ERROR - stderr - 39%|███▉ | 8854/22434 [9:04:18<9:32:17, 2.53s/it] +2025-02-05 19:12:01 - ERROR - stderr - 39%|███▉ | 8855/22434 [9:04:21<9:46:06, 2.59s/it] +2025-02-05 19:12:01 - ERROR - stderr - +2025-02-05 19:12:01 - ERROR - stderr - +2025-02-05 19:12:01 - INFO - stdout - {'loss': 0.7547, 'grad_norm': 1.0945433378219604, 'learning_rate': 1.3798805970945783e-05, 'epoch': 1.18} +2025-02-05 19:12:01 - ERROR - stderr - 39%|███▉ | 8855/22434 [9:04:21<9:46:06, 2.59s/it] +2025-02-05 19:12:04 - ERROR - stderr - 39%|███▉ | 8856/22434 [9:04:24<9:39:45, 2.56s/it] +2025-02-05 19:12:04 - ERROR - stderr - +2025-02-05 19:12:04 - ERROR 
- stderr - +2025-02-05 19:12:04 - INFO - stdout - {'loss': 0.7394, 'grad_norm': 1.171900749206543, 'learning_rate': 1.379747041443562e-05, 'epoch': 1.18} +2025-02-05 19:12:04 - ERROR - stderr - 39%|███▉ | 8856/22434 [9:04:24<9:39:45, 2.56s/it] +2025-02-05 19:12:06 - ERROR - stderr - 39%|███▉ | 8857/22434 [9:04:26<9:34:22, 2.54s/it] +2025-02-05 19:12:06 - ERROR - stderr - +2025-02-05 19:12:06 - ERROR - stderr - +2025-02-05 19:12:06 - INFO - stdout - {'loss': 0.6695, 'grad_norm': 1.1396409273147583, 'learning_rate': 1.3796134778770819e-05, 'epoch': 1.18} +2025-02-05 19:12:06 - ERROR - stderr - 39%|███▉ | 8857/22434 [9:04:26<9:34:22, 2.54s/it] +2025-02-05 19:12:09 - ERROR - stderr - 39%|███▉ | 8858/22434 [9:04:29<9:28:26, 2.51s/it] +2025-02-05 19:12:09 - ERROR - stderr - +2025-02-05 19:12:09 - ERROR - stderr - +2025-02-05 19:12:09 - INFO - stdout - {'loss': 0.7491, 'grad_norm': 1.2525348663330078, 'learning_rate': 1.3794799063979224e-05, 'epoch': 1.18} +2025-02-05 19:12:09 - ERROR - stderr - 39%|███▉ | 8858/22434 [9:04:29<9:28:26, 2.51s/it] +2025-02-05 19:12:11 - ERROR - stderr - 39%|███▉ | 8859/22434 [9:04:31<9:29:39, 2.52s/it] +2025-02-05 19:12:11 - ERROR - stderr - +2025-02-05 19:12:11 - ERROR - stderr - +2025-02-05 19:12:11 - INFO - stdout - {'loss': 0.6549, 'grad_norm': 0.999947726726532, 'learning_rate': 1.379346327008867e-05, 'epoch': 1.18} +2025-02-05 19:12:11 - ERROR - stderr - 39%|███▉ | 8859/22434 [9:04:31<9:29:39, 2.52s/it] +2025-02-05 19:12:14 - ERROR - stderr - 39%|███▉ | 8860/22434 [9:04:34<9:29:20, 2.52s/it] +2025-02-05 19:12:14 - ERROR - stderr - +2025-02-05 19:12:14 - ERROR - stderr - +2025-02-05 19:12:14 - INFO - stdout - {'loss': 0.7409, 'grad_norm': 1.0630241632461548, 'learning_rate': 1.3792127397127006e-05, 'epoch': 1.18} +2025-02-05 19:12:14 - ERROR - stderr - 39%|███▉ | 8860/22434 [9:04:34<9:29:20, 2.52s/it] +2025-02-05 19:12:16 - ERROR - stderr - 39%|███▉ | 8861/22434 [9:04:36<9:26:22, 2.50s/it] +2025-02-05 19:12:16 - ERROR - stderr - 
+2025-02-05 19:12:16 - ERROR - stderr - +2025-02-05 19:12:16 - INFO - stdout - {'loss': 0.737, 'grad_norm': 1.2196308374404907, 'learning_rate': 1.3790791445122076e-05, 'epoch': 1.18} +2025-02-05 19:12:16 - ERROR - stderr - 39%|███▉ | 8861/22434 [9:04:36<9:26:22, 2.50s/it] +2025-02-05 19:12:19 - ERROR - stderr - 40%|███▉ | 8862/22434 [9:04:39<9:45:32, 2.59s/it] +2025-02-05 19:12:19 - ERROR - stderr - +2025-02-05 19:12:19 - ERROR - stderr - +2025-02-05 19:12:19 - INFO - stdout - {'loss': 0.6795, 'grad_norm': 1.088104009628296, 'learning_rate': 1.3789455414101724e-05, 'epoch': 1.19} +2025-02-05 19:12:19 - ERROR - stderr - 40%|███▉ | 8862/22434 [9:04:39<9:45:32, 2.59s/it] +2025-02-05 19:12:22 - ERROR - stderr - 40%|███▉ | 8863/22434 [9:04:41<9:42:00, 2.57s/it] +2025-02-05 19:12:22 - ERROR - stderr - +2025-02-05 19:12:22 - ERROR - stderr - +2025-02-05 19:12:22 - INFO - stdout - {'loss': 0.7401, 'grad_norm': 1.2695367336273193, 'learning_rate': 1.3788119304093801e-05, 'epoch': 1.19} +2025-02-05 19:12:22 - ERROR - stderr - 40%|███▉ | 8863/22434 [9:04:41<9:42:00, 2.57s/it] +2025-02-05 19:12:24 - ERROR - stderr - 40%|███▉ | 8864/22434 [9:04:44<9:37:36, 2.55s/it] +2025-02-05 19:12:24 - ERROR - stderr - +2025-02-05 19:12:24 - ERROR - stderr - +2025-02-05 19:12:24 - INFO - stdout - {'loss': 0.6798, 'grad_norm': 1.137142539024353, 'learning_rate': 1.3786783115126152e-05, 'epoch': 1.19} +2025-02-05 19:12:24 - ERROR - stderr - 40%|███▉ | 8864/22434 [9:04:44<9:37:36, 2.55s/it] +2025-02-05 19:12:27 - ERROR - stderr - 40%|███▉ | 8865/22434 [9:04:46<9:34:01, 2.54s/it] +2025-02-05 19:12:27 - ERROR - stderr - +2025-02-05 19:12:27 - ERROR - stderr - +2025-02-05 19:12:27 - INFO - stdout - {'loss': 0.6312, 'grad_norm': 0.982507050037384, 'learning_rate': 1.3785446847226638e-05, 'epoch': 1.19} +2025-02-05 19:12:27 - ERROR - stderr - 40%|███▉ | 8865/22434 [9:04:46<9:34:01, 2.54s/it] +2025-02-05 19:12:29 - ERROR - stderr - 40%|███▉ | 8866/22434 [9:04:49<9:30:55, 2.52s/it] +2025-02-05 
19:12:29 - ERROR - stderr - +2025-02-05 19:12:29 - ERROR - stderr - +2025-02-05 19:12:29 - INFO - stdout - {'loss': 0.6972, 'grad_norm': 1.1282432079315186, 'learning_rate': 1.3784110500423104e-05, 'epoch': 1.19} +2025-02-05 19:12:29 - ERROR - stderr - 40%|███▉ | 8866/22434 [9:04:49<9:30:55, 2.52s/it] +2025-02-05 19:12:32 - ERROR - stderr - 40%|███▉ | 8867/22434 [9:04:51<9:29:36, 2.52s/it] +2025-02-05 19:12:32 - ERROR - stderr - +2025-02-05 19:12:32 - ERROR - stderr - +2025-02-05 19:12:32 - INFO - stdout - {'loss': 0.7179, 'grad_norm': 1.0812016725540161, 'learning_rate': 1.3782774074743409e-05, 'epoch': 1.19} +2025-02-05 19:12:32 - ERROR - stderr - 40%|███▉ | 8867/22434 [9:04:51<9:29:36, 2.52s/it] +2025-02-05 19:12:34 - ERROR - stderr - 40%|███▉ | 8868/22434 [9:04:54<9:32:35, 2.53s/it] +2025-02-05 19:12:34 - ERROR - stderr - +2025-02-05 19:12:34 - ERROR - stderr - +2025-02-05 19:12:34 - INFO - stdout - {'loss': 0.7363, 'grad_norm': 1.2406963109970093, 'learning_rate': 1.3781437570215405e-05, 'epoch': 1.19} +2025-02-05 19:12:34 - ERROR - stderr - 40%|███▉ | 8868/22434 [9:04:54<9:32:35, 2.53s/it] +2025-02-05 19:12:37 - ERROR - stderr - 40%|███▉ | 8869/22434 [9:04:57<9:33:32, 2.54s/it] +2025-02-05 19:12:37 - ERROR - stderr - +2025-02-05 19:12:37 - ERROR - stderr - +2025-02-05 19:12:37 - INFO - stdout - {'loss': 0.705, 'grad_norm': 1.6887593269348145, 'learning_rate': 1.3780100986866957e-05, 'epoch': 1.19} +2025-02-05 19:12:37 - ERROR - stderr - 40%|███▉ | 8869/22434 [9:04:57<9:33:32, 2.54s/it] +2025-02-05 19:12:39 - ERROR - stderr - 40%|███▉ | 8870/22434 [9:04:59<9:27:47, 2.51s/it] +2025-02-05 19:12:39 - ERROR - stderr - +2025-02-05 19:12:39 - ERROR - stderr - +2025-02-05 19:12:39 - INFO - stdout - {'loss': 0.7388, 'grad_norm': 1.120638132095337, 'learning_rate': 1.3778764324725919e-05, 'epoch': 1.19} +2025-02-05 19:12:39 - ERROR - stderr - 40%|███▉ | 8870/22434 [9:04:59<9:27:47, 2.51s/it] +2025-02-05 19:12:42 - ERROR - stderr - 40%|███▉ | 8871/22434 
[9:05:02<9:30:28, 2.52s/it] +2025-02-05 19:12:42 - ERROR - stderr - +2025-02-05 19:12:42 - ERROR - stderr - +2025-02-05 19:12:42 - INFO - stdout - {'loss': 0.6708, 'grad_norm': 1.109403371810913, 'learning_rate': 1.3777427583820156e-05, 'epoch': 1.19} +2025-02-05 19:12:42 - ERROR - stderr - 40%|███▉ | 8871/22434 [9:05:02<9:30:28, 2.52s/it] +2025-02-05 19:12:44 - ERROR - stderr - 40%|███▉ | 8872/22434 [9:05:04<9:29:43, 2.52s/it] +2025-02-05 19:12:44 - ERROR - stderr - +2025-02-05 19:12:44 - ERROR - stderr - +2025-02-05 19:12:44 - INFO - stdout - {'loss': 0.6973, 'grad_norm': 1.0980756282806396, 'learning_rate': 1.3776090764177527e-05, 'epoch': 1.19} +2025-02-05 19:12:44 - ERROR - stderr - 40%|███▉ | 8872/22434 [9:05:04<9:29:43, 2.52s/it] +2025-02-05 19:12:47 - ERROR - stderr - 40%|███▉ | 8873/22434 [9:05:07<9:30:58, 2.53s/it] +2025-02-05 19:12:47 - ERROR - stderr - +2025-02-05 19:12:47 - ERROR - stderr - +2025-02-05 19:12:47 - INFO - stdout - {'loss': 0.7233, 'grad_norm': 1.2522748708724976, 'learning_rate': 1.3774753865825905e-05, 'epoch': 1.19} +2025-02-05 19:12:47 - ERROR - stderr - 40%|███▉ | 8873/22434 [9:05:07<9:30:58, 2.53s/it] +2025-02-05 19:12:49 - ERROR - stderr - 40%|███▉ | 8874/22434 [9:05:09<9:31:59, 2.53s/it] +2025-02-05 19:12:49 - ERROR - stderr - +2025-02-05 19:12:49 - ERROR - stderr - +2025-02-05 19:12:49 - INFO - stdout - {'loss': 0.6981, 'grad_norm': 1.0618726015090942, 'learning_rate': 1.3773416888793145e-05, 'epoch': 1.19} +2025-02-05 19:12:49 - ERROR - stderr - 40%|███▉ | 8874/22434 [9:05:09<9:31:59, 2.53s/it] +2025-02-05 19:12:52 - ERROR - stderr - 40%|███▉ | 8875/22434 [9:05:12<9:33:05, 2.54s/it] +2025-02-05 19:12:52 - ERROR - stderr - +2025-02-05 19:12:52 - ERROR - stderr - +2025-02-05 19:12:52 - INFO - stdout - {'loss': 0.7874, 'grad_norm': 1.1258240938186646, 'learning_rate': 1.3772079833107123e-05, 'epoch': 1.19} +2025-02-05 19:12:52 - ERROR - stderr - 40%|███▉ | 8875/22434 [9:05:12<9:33:05, 2.54s/it] +2025-02-05 19:12:54 - ERROR - stderr 
- 40%|███▉ | 8876/22434 [9:05:14<9:36:43, 2.55s/it] +2025-02-05 19:12:55 - ERROR - stderr - +2025-02-05 19:12:55 - ERROR - stderr - +2025-02-05 19:12:55 - INFO - stdout - {'loss': 0.7259, 'grad_norm': 1.2461109161376953, 'learning_rate': 1.3770742698795707e-05, 'epoch': 1.19} +2025-02-05 19:12:55 - ERROR - stderr - 40%|███▉ | 8876/22434 [9:05:14<9:36:43, 2.55s/it] +2025-02-05 19:12:57 - ERROR - stderr - 40%|███▉ | 8877/22434 [9:05:17<9:36:48, 2.55s/it] +2025-02-05 19:12:57 - ERROR - stderr - +2025-02-05 19:12:57 - ERROR - stderr - +2025-02-05 19:12:57 - INFO - stdout - {'loss': 0.6422, 'grad_norm': 1.091899037361145, 'learning_rate': 1.3769405485886767e-05, 'epoch': 1.19} +2025-02-05 19:12:57 - ERROR - stderr - 40%|███▉ | 8877/22434 [9:05:17<9:36:48, 2.55s/it] +2025-02-05 19:12:59 - ERROR - stderr - 40%|███▉ | 8878/22434 [9:05:19<9:28:34, 2.52s/it] +2025-02-05 19:13:00 - ERROR - stderr - +2025-02-05 19:13:00 - ERROR - stderr - +2025-02-05 19:13:00 - INFO - stdout - {'loss': 0.7763, 'grad_norm': 1.3187233209609985, 'learning_rate': 1.3768068194408175e-05, 'epoch': 1.19} +2025-02-05 19:13:00 - ERROR - stderr - 40%|███▉ | 8878/22434 [9:05:19<9:28:34, 2.52s/it] +2025-02-05 19:13:02 - ERROR - stderr - 40%|███▉ | 8879/22434 [9:05:22<9:27:15, 2.51s/it] +2025-02-05 19:13:02 - ERROR - stderr - +2025-02-05 19:13:02 - ERROR - stderr - +2025-02-05 19:13:02 - INFO - stdout - {'loss': 0.7885, 'grad_norm': 1.1897106170654297, 'learning_rate': 1.3766730824387808e-05, 'epoch': 1.19} +2025-02-05 19:13:02 - ERROR - stderr - 40%|███▉ | 8879/22434 [9:05:22<9:27:15, 2.51s/it] +2025-02-05 19:13:04 - ERROR - stderr - 40%|███▉ | 8880/22434 [9:05:24<9:22:01, 2.49s/it] +2025-02-05 19:13:04 - ERROR - stderr - +2025-02-05 19:13:04 - ERROR - stderr - +2025-02-05 19:13:04 - INFO - stdout - {'loss': 0.7061, 'grad_norm': 1.1752712726593018, 'learning_rate': 1.3765393375853541e-05, 'epoch': 1.19} +2025-02-05 19:13:04 - ERROR - stderr - 40%|███▉ | 8880/22434 [9:05:24<9:22:01, 2.49s/it] +2025-02-05 
19:13:07 - ERROR - stderr - 40%|███▉ | 8881/22434 [9:05:27<9:25:52, 2.51s/it] +2025-02-05 19:13:07 - ERROR - stderr - +2025-02-05 19:13:07 - ERROR - stderr - +2025-02-05 19:13:07 - INFO - stdout - {'loss': 0.686, 'grad_norm': 1.2943899631500244, 'learning_rate': 1.3764055848833256e-05, 'epoch': 1.19} +2025-02-05 19:13:07 - ERROR - stderr - 40%|███▉ | 8881/22434 [9:05:27<9:25:52, 2.51s/it] +2025-02-05 19:13:10 - ERROR - stderr - 40%|███▉ | 8882/22434 [9:05:29<9:30:23, 2.53s/it] +2025-02-05 19:13:10 - ERROR - stderr - +2025-02-05 19:13:10 - ERROR - stderr - +2025-02-05 19:13:10 - INFO - stdout - {'loss': 0.6728, 'grad_norm': 1.155847430229187, 'learning_rate': 1.3762718243354824e-05, 'epoch': 1.19} +2025-02-05 19:13:10 - ERROR - stderr - 40%|███▉ | 8882/22434 [9:05:29<9:30:23, 2.53s/it] +2025-02-05 19:13:12 - ERROR - stderr - 40%|███▉ | 8883/22434 [9:05:32<9:31:09, 2.53s/it] +2025-02-05 19:13:12 - ERROR - stderr - +2025-02-05 19:13:12 - ERROR - stderr - +2025-02-05 19:13:12 - INFO - stdout - {'loss': 0.7771, 'grad_norm': 1.12455415725708, 'learning_rate': 1.3761380559446131e-05, 'epoch': 1.19} +2025-02-05 19:13:12 - ERROR - stderr - 40%|███▉ | 8883/22434 [9:05:32<9:31:09, 2.53s/it] +2025-02-05 19:13:15 - ERROR - stderr - 40%|███▉ | 8884/22434 [9:05:34<9:28:31, 2.52s/it] +2025-02-05 19:13:15 - ERROR - stderr - +2025-02-05 19:13:15 - ERROR - stderr - +2025-02-05 19:13:15 - INFO - stdout - {'loss': 0.7042, 'grad_norm': 1.1706955432891846, 'learning_rate': 1.376004279713506e-05, 'epoch': 1.19} +2025-02-05 19:13:15 - ERROR - stderr - 40%|███▉ | 8884/22434 [9:05:34<9:28:31, 2.52s/it] +2025-02-05 19:13:17 - ERROR - stderr - 40%|███▉ | 8885/22434 [9:05:37<9:28:42, 2.52s/it] +2025-02-05 19:13:17 - ERROR - stderr - +2025-02-05 19:13:17 - ERROR - stderr - +2025-02-05 19:13:17 - INFO - stdout - {'loss': 0.8019, 'grad_norm': 1.286660075187683, 'learning_rate': 1.3758704956449497e-05, 'epoch': 1.19} +2025-02-05 19:13:17 - ERROR - stderr - 40%|███▉ | 8885/22434 [9:05:37<9:28:42, 
2.52s/it] +2025-02-05 19:13:20 - ERROR - stderr - 40%|███▉ | 8886/22434 [9:05:39<9:28:30, 2.52s/it] +2025-02-05 19:13:20 - ERROR - stderr - +2025-02-05 19:13:20 - ERROR - stderr - +2025-02-05 19:13:20 - INFO - stdout - {'loss': 0.7099, 'grad_norm': 1.1379283666610718, 'learning_rate': 1.3757367037417324e-05, 'epoch': 1.19} +2025-02-05 19:13:20 - ERROR - stderr - 40%|███▉ | 8886/22434 [9:05:39<9:28:30, 2.52s/it] +2025-02-05 19:13:22 - ERROR - stderr - 40%|███▉ | 8887/22434 [9:05:42<9:30:05, 2.52s/it] +2025-02-05 19:13:22 - ERROR - stderr - +2025-02-05 19:13:22 - ERROR - stderr - +2025-02-05 19:13:22 - INFO - stdout - {'loss': 0.729, 'grad_norm': 1.2692679166793823, 'learning_rate': 1.3756029040066432e-05, 'epoch': 1.19} +2025-02-05 19:13:22 - ERROR - stderr - 40%|███▉ | 8887/22434 [9:05:42<9:30:05, 2.52s/it] +2025-02-05 19:13:25 - ERROR - stderr - 40%|███▉ | 8888/22434 [9:05:45<9:39:03, 2.56s/it] +2025-02-05 19:13:25 - ERROR - stderr - +2025-02-05 19:13:25 - ERROR - stderr - +2025-02-05 19:13:25 - INFO - stdout - {'loss': 0.7416, 'grad_norm': 1.0977730751037598, 'learning_rate': 1.3754690964424709e-05, 'epoch': 1.19} +2025-02-05 19:13:25 - ERROR - stderr - 40%|███▉ | 8888/22434 [9:05:45<9:39:03, 2.56s/it] +2025-02-05 19:13:27 - ERROR - stderr - 40%|███▉ | 8889/22434 [9:05:47<9:32:48, 2.54s/it] +2025-02-05 19:13:27 - ERROR - stderr - +2025-02-05 19:13:27 - ERROR - stderr - +2025-02-05 19:13:27 - INFO - stdout - {'loss': 0.7673, 'grad_norm': 1.1407732963562012, 'learning_rate': 1.3753352810520042e-05, 'epoch': 1.19} +2025-02-05 19:13:27 - ERROR - stderr - 40%|███▉ | 8889/22434 [9:05:47<9:32:48, 2.54s/it] +2025-02-05 19:13:30 - ERROR - stderr - 40%|███▉ | 8890/22434 [9:05:50<9:30:50, 2.53s/it] +2025-02-05 19:13:30 - ERROR - stderr - +2025-02-05 19:13:30 - ERROR - stderr - +2025-02-05 19:13:30 - INFO - stdout - {'loss': 0.7578, 'grad_norm': 1.2526460886001587, 'learning_rate': 1.375201457838033e-05, 'epoch': 1.19} +2025-02-05 19:13:30 - ERROR - stderr - 40%|███▉ | 
8890/22434 [9:05:50<9:30:50, 2.53s/it] +2025-02-05 19:13:32 - ERROR - stderr - 40%|███▉ | 8891/22434 [9:05:52<9:24:49, 2.50s/it] +2025-02-05 19:13:32 - ERROR - stderr - +2025-02-05 19:13:32 - ERROR - stderr - +2025-02-05 19:13:32 - INFO - stdout - {'loss': 0.6452, 'grad_norm': 1.1501094102859497, 'learning_rate': 1.3750676268033462e-05, 'epoch': 1.19} +2025-02-05 19:13:32 - ERROR - stderr - 40%|███▉ | 8891/22434 [9:05:52<9:24:49, 2.50s/it] +2025-02-05 19:13:35 - ERROR - stderr - 40%|███▉ | 8892/22434 [9:05:54<9:23:50, 2.50s/it] +2025-02-05 19:13:35 - ERROR - stderr - +2025-02-05 19:13:35 - ERROR - stderr - +2025-02-05 19:13:35 - INFO - stdout - {'loss': 0.7067, 'grad_norm': 1.1355444192886353, 'learning_rate': 1.374933787950734e-05, 'epoch': 1.19} +2025-02-05 19:13:35 - ERROR - stderr - 40%|███▉ | 8892/22434 [9:05:55<9:23:50, 2.50s/it] +2025-02-05 19:13:37 - ERROR - stderr - 40%|███▉ | 8893/22434 [9:05:57<9:29:45, 2.52s/it] +2025-02-05 19:13:37 - ERROR - stderr - +2025-02-05 19:13:37 - ERROR - stderr - +2025-02-05 19:13:37 - INFO - stdout - {'loss': 0.745, 'grad_norm': 1.235040307044983, 'learning_rate': 1.3747999412829857e-05, 'epoch': 1.19} +2025-02-05 19:13:37 - ERROR - stderr - 40%|███▉ | 8893/22434 [9:05:57<9:29:45, 2.52s/it] +2025-02-05 19:13:40 - ERROR - stderr - 40%|███▉ | 8894/22434 [9:06:00<9:31:31, 2.53s/it] +2025-02-05 19:13:40 - ERROR - stderr - +2025-02-05 19:13:40 - ERROR - stderr - +2025-02-05 19:13:40 - INFO - stdout - {'loss': 0.7526, 'grad_norm': 1.1710262298583984, 'learning_rate': 1.3746660868028911e-05, 'epoch': 1.19} +2025-02-05 19:13:40 - ERROR - stderr - 40%|███▉ | 8894/22434 [9:06:00<9:31:31, 2.53s/it] +2025-02-05 19:13:42 - ERROR - stderr - 40%|███▉ | 8895/22434 [9:06:02<9:26:01, 2.51s/it] +2025-02-05 19:13:42 - ERROR - stderr - +2025-02-05 19:13:42 - ERROR - stderr - +2025-02-05 19:13:42 - INFO - stdout - {'loss': 0.7782, 'grad_norm': 1.306667685508728, 'learning_rate': 1.3745322245132406e-05, 'epoch': 1.19} +2025-02-05 19:13:42 - ERROR 
- stderr - 40%|███▉ | 8895/22434 [9:06:02<9:26:01, 2.51s/it] +2025-02-05 19:13:45 - ERROR - stderr - 40%|███▉ | 8896/22434 [9:06:05<9:28:04, 2.52s/it] +2025-02-05 19:13:45 - ERROR - stderr - +2025-02-05 19:13:45 - ERROR - stderr - +2025-02-05 19:13:45 - INFO - stdout - {'loss': 0.6467, 'grad_norm': 1.0348923206329346, 'learning_rate': 1.374398354416824e-05, 'epoch': 1.19} +2025-02-05 19:13:45 - ERROR - stderr - 40%|███▉ | 8896/22434 [9:06:05<9:28:04, 2.52s/it] +2025-02-05 19:13:47 - ERROR - stderr - 40%|███▉ | 8897/22434 [9:06:07<9:23:14, 2.50s/it] +2025-02-05 19:13:47 - ERROR - stderr - +2025-02-05 19:13:47 - ERROR - stderr - +2025-02-05 19:13:47 - INFO - stdout - {'loss': 0.6854, 'grad_norm': 1.1548538208007812, 'learning_rate': 1.3742644765164324e-05, 'epoch': 1.19} +2025-02-05 19:13:47 - ERROR - stderr - 40%|███▉ | 8897/22434 [9:06:07<9:23:14, 2.50s/it] +2025-02-05 19:13:50 - ERROR - stderr - 40%|███▉ | 8898/22434 [9:06:10<9:24:28, 2.50s/it] +2025-02-05 19:13:50 - ERROR - stderr - +2025-02-05 19:13:50 - ERROR - stderr - +2025-02-05 19:13:50 - INFO - stdout - {'loss': 0.6799, 'grad_norm': 1.0818729400634766, 'learning_rate': 1.3741305908148555e-05, 'epoch': 1.19} +2025-02-05 19:13:50 - ERROR - stderr - 40%|███▉ | 8898/22434 [9:06:10<9:24:28, 2.50s/it] +2025-02-05 19:13:52 - ERROR - stderr - 40%|███▉ | 8899/22434 [9:06:12<9:24:17, 2.50s/it] +2025-02-05 19:13:52 - ERROR - stderr - +2025-02-05 19:13:52 - ERROR - stderr - +2025-02-05 19:13:52 - INFO - stdout - {'loss': 0.8208, 'grad_norm': 1.2441802024841309, 'learning_rate': 1.3739966973148846e-05, 'epoch': 1.19} +2025-02-05 19:13:52 - ERROR - stderr - 40%|███▉ | 8899/22434 [9:06:12<9:24:17, 2.50s/it] +2025-02-05 19:13:55 - ERROR - stderr - 40%|███▉ | 8900/22434 [9:06:15<9:30:48, 2.53s/it] +2025-02-05 19:13:55 - ERROR - stderr - +2025-02-05 19:13:55 - ERROR - stderr - +2025-02-05 19:13:55 - INFO - stdout - {'loss': 0.7952, 'grad_norm': 1.2848570346832275, 'learning_rate': 1.3738627960193105e-05, 'epoch': 1.19} 
+2025-02-05 19:13:55 - ERROR - stderr - 40%|███▉ | 8900/22434 [9:06:15<9:30:48, 2.53s/it] +2025-02-05 19:13:57 - ERROR - stderr - 40%|███▉ | 8901/22434 [9:06:17<9:24:37, 2.50s/it] +2025-02-05 19:13:57 - ERROR - stderr - +2025-02-05 19:13:57 - ERROR - stderr - +2025-02-05 19:13:57 - INFO - stdout - {'loss': 0.7042, 'grad_norm': 1.1952954530715942, 'learning_rate': 1.3737288869309241e-05, 'epoch': 1.19} +2025-02-05 19:13:57 - ERROR - stderr - 40%|███▉ | 8901/22434 [9:06:17<9:24:37, 2.50s/it] +2025-02-05 19:14:00 - ERROR - stderr - 40%|███▉ | 8902/22434 [9:06:20<9:57:04, 2.65s/it] +2025-02-05 19:14:00 - ERROR - stderr - +2025-02-05 19:14:00 - ERROR - stderr - +2025-02-05 19:14:00 - INFO - stdout - {'loss': 0.7044, 'grad_norm': 1.1041487455368042, 'learning_rate': 1.3735949700525164e-05, 'epoch': 1.19} +2025-02-05 19:14:00 - ERROR - stderr - 40%|███▉ | 8902/22434 [9:06:20<9:57:04, 2.65s/it] +2025-02-05 19:14:03 - ERROR - stderr - 40%|███▉ | 8903/22434 [9:06:23<9:47:25, 2.60s/it] +2025-02-05 19:14:03 - ERROR - stderr - +2025-02-05 19:14:03 - ERROR - stderr - +2025-02-05 19:14:03 - INFO - stdout - {'loss': 0.7313, 'grad_norm': 1.1584585905075073, 'learning_rate': 1.3734610453868793e-05, 'epoch': 1.19} +2025-02-05 19:14:03 - ERROR - stderr - 40%|███▉ | 8903/22434 [9:06:23<9:47:25, 2.60s/it] +2025-02-05 19:14:05 - ERROR - stderr - 40%|███▉ | 8904/22434 [9:06:25<9:43:08, 2.59s/it] +2025-02-05 19:14:05 - ERROR - stderr - +2025-02-05 19:14:05 - ERROR - stderr - +2025-02-05 19:14:05 - INFO - stdout - {'loss': 0.728, 'grad_norm': 1.2655977010726929, 'learning_rate': 1.3733271129368042e-05, 'epoch': 1.19} +2025-02-05 19:14:05 - ERROR - stderr - 40%|███▉ | 8904/22434 [9:06:25<9:43:08, 2.59s/it] +2025-02-05 19:14:08 - ERROR - stderr - 40%|███▉ | 8905/22434 [9:06:28<10:05:29, 2.69s/it] +2025-02-05 19:14:08 - ERROR - stderr - +2025-02-05 19:14:08 - ERROR - stderr - +2025-02-05 19:14:08 - INFO - stdout - {'loss': 0.746, 'grad_norm': 1.1579521894454956, 'learning_rate': 
1.3731931727050826e-05, 'epoch': 1.19} +2025-02-05 19:14:08 - ERROR - stderr - 40%|███▉ | 8905/22434 [9:06:28<10:05:29, 2.69s/it] +2025-02-05 19:14:11 - ERROR - stderr - 40%|███▉ | 8906/22434 [9:06:31<10:11:03, 2.71s/it] +2025-02-05 19:14:11 - ERROR - stderr - +2025-02-05 19:14:11 - ERROR - stderr - +2025-02-05 19:14:11 - INFO - stdout - {'loss': 0.7423, 'grad_norm': 1.2433384656906128, 'learning_rate': 1.3730592246945063e-05, 'epoch': 1.19} +2025-02-05 19:14:11 - ERROR - stderr - 40%|███▉ | 8906/22434 [9:06:31<10:11:03, 2.71s/it] +2025-02-05 19:14:14 - ERROR - stderr - 40%|███▉ | 8907/22434 [9:06:33<10:01:34, 2.67s/it] +2025-02-05 19:14:14 - ERROR - stderr - +2025-02-05 19:14:14 - ERROR - stderr - +2025-02-05 19:14:14 - INFO - stdout - {'loss': 0.7611, 'grad_norm': 1.284486174583435, 'learning_rate': 1.3729252689078676e-05, 'epoch': 1.19} +2025-02-05 19:14:14 - ERROR - stderr - 40%|███▉ | 8907/22434 [9:06:33<10:01:34, 2.67s/it] +2025-02-05 19:14:17 - ERROR - stderr - 40%|███▉ | 8908/22434 [9:06:36<10:29:05, 2.79s/it] +2025-02-05 19:14:17 - ERROR - stderr - +2025-02-05 19:14:17 - ERROR - stderr - +2025-02-05 19:14:17 - INFO - stdout - {'loss': 0.7506, 'grad_norm': 1.1267811059951782, 'learning_rate': 1.3727913053479582e-05, 'epoch': 1.19} +2025-02-05 19:14:17 - ERROR - stderr - 40%|███▉ | 8908/22434 [9:06:36<10:29:05, 2.79s/it] +2025-02-05 19:14:19 - ERROR - stderr - 40%|███▉ | 8909/22434 [9:06:39<10:12:46, 2.72s/it] +2025-02-05 19:14:19 - ERROR - stderr - +2025-02-05 19:14:19 - ERROR - stderr - +2025-02-05 19:14:19 - INFO - stdout - {'loss': 0.7413, 'grad_norm': 1.261964201927185, 'learning_rate': 1.372657334017571e-05, 'epoch': 1.19} +2025-02-05 19:14:19 - ERROR - stderr - 40%|███▉ | 8909/22434 [9:06:39<10:12:46, 2.72s/it] +2025-02-05 19:14:22 - ERROR - stderr - 40%|███▉ | 8910/22434 [9:06:42<9:59:46, 2.66s/it] +2025-02-05 19:14:22 - ERROR - stderr - +2025-02-05 19:14:22 - ERROR - stderr - +2025-02-05 19:14:22 - INFO - stdout - {'loss': 0.7656, 'grad_norm': 
1.1986085176467896, 'learning_rate': 1.3725233549194983e-05, 'epoch': 1.19} +2025-02-05 19:14:22 - ERROR - stderr - 40%|███▉ | 8910/22434 [9:06:42<9:59:46, 2.66s/it] +2025-02-05 19:14:25 - ERROR - stderr - 40%|███▉ | 8911/22434 [9:06:45<10:22:05, 2.76s/it] +2025-02-05 19:14:25 - ERROR - stderr - +2025-02-05 19:14:25 - ERROR - stderr - +2025-02-05 19:14:25 - INFO - stdout - {'loss': 0.7527, 'grad_norm': 1.2177395820617676, 'learning_rate': 1.3723893680565325e-05, 'epoch': 1.19} +2025-02-05 19:14:25 - ERROR - stderr - 40%|███▉ | 8911/22434 [9:06:45<10:22:05, 2.76s/it] +2025-02-05 19:14:27 - ERROR - stderr - 40%|███▉ | 8912/22434 [9:06:47<10:03:43, 2.68s/it] +2025-02-05 19:14:27 - ERROR - stderr - +2025-02-05 19:14:27 - ERROR - stderr - +2025-02-05 19:14:27 - INFO - stdout - {'loss': 0.7769, 'grad_norm': 1.1732949018478394, 'learning_rate': 1.3722553734314669e-05, 'epoch': 1.19} +2025-02-05 19:14:27 - ERROR - stderr - 40%|███▉ | 8912/22434 [9:06:47<10:03:43, 2.68s/it] +2025-02-05 19:14:30 - ERROR - stderr - 40%|███▉ | 8913/22434 [9:06:50<9:52:08, 2.63s/it] +2025-02-05 19:14:30 - ERROR - stderr - +2025-02-05 19:14:30 - ERROR - stderr - +2025-02-05 19:14:30 - INFO - stdout - {'loss': 0.7731, 'grad_norm': 1.258568525314331, 'learning_rate': 1.3721213710470944e-05, 'epoch': 1.19} +2025-02-05 19:14:30 - ERROR - stderr - 40%|███▉ | 8913/22434 [9:06:50<9:52:08, 2.63s/it] +2025-02-05 19:14:32 - ERROR - stderr - 40%|███▉ | 8914/22434 [9:06:52<9:47:08, 2.61s/it] +2025-02-05 19:14:32 - ERROR - stderr - +2025-02-05 19:14:32 - ERROR - stderr - +2025-02-05 19:14:32 - INFO - stdout - {'loss': 0.657, 'grad_norm': 1.1091668605804443, 'learning_rate': 1.3719873609062078e-05, 'epoch': 1.19} +2025-02-05 19:14:32 - ERROR - stderr - 40%|███▉ | 8914/22434 [9:06:52<9:47:08, 2.61s/it] +2025-02-05 19:14:35 - ERROR - stderr - 40%|███▉ | 8915/22434 [9:06:55<9:39:57, 2.57s/it] +2025-02-05 19:14:35 - ERROR - stderr - +2025-02-05 19:14:35 - ERROR - stderr - +2025-02-05 19:14:35 - INFO - stdout - 
{'loss': 0.6743, 'grad_norm': 1.11782968044281, 'learning_rate': 1.3718533430116003e-05, 'epoch': 1.19} +2025-02-05 19:14:35 - ERROR - stderr - 40%|███▉ | 8915/22434 [9:06:55<9:39:57, 2.57s/it] +2025-02-05 19:14:37 - ERROR - stderr - 40%|███▉ | 8916/22434 [9:06:57<9:31:30, 2.54s/it] +2025-02-05 19:14:37 - ERROR - stderr - +2025-02-05 19:14:37 - ERROR - stderr - +2025-02-05 19:14:37 - INFO - stdout - {'loss': 0.8165, 'grad_norm': 1.2586084604263306, 'learning_rate': 1.371719317366066e-05, 'epoch': 1.19} +2025-02-05 19:14:37 - ERROR - stderr - 40%|███▉ | 8916/22434 [9:06:57<9:31:30, 2.54s/it] +2025-02-05 19:14:40 - ERROR - stderr - 40%|███▉ | 8917/22434 [9:07:00<9:31:18, 2.54s/it] +2025-02-05 19:14:40 - ERROR - stderr - +2025-02-05 19:14:40 - ERROR - stderr - +2025-02-05 19:14:40 - INFO - stdout - {'loss': 0.709, 'grad_norm': 1.2471113204956055, 'learning_rate': 1.3715852839723984e-05, 'epoch': 1.19} +2025-02-05 19:14:40 - ERROR - stderr - 40%|███▉ | 8917/22434 [9:07:00<9:31:18, 2.54s/it] +2025-02-05 19:14:42 - ERROR - stderr - 40%|███▉ | 8918/22434 [9:07:02<9:30:43, 2.53s/it] +2025-02-05 19:14:42 - ERROR - stderr - +2025-02-05 19:14:42 - ERROR - stderr - +2025-02-05 19:14:42 - INFO - stdout - {'loss': 0.6364, 'grad_norm': 1.0984491109848022, 'learning_rate': 1.3714512428333908e-05, 'epoch': 1.19} +2025-02-05 19:14:42 - ERROR - stderr - 40%|███▉ | 8918/22434 [9:07:02<9:30:43, 2.53s/it] +2025-02-05 19:14:45 - ERROR - stderr - 40%|███▉ | 8919/22434 [9:07:05<9:33:02, 2.54s/it] +2025-02-05 19:14:45 - ERROR - stderr - +2025-02-05 19:14:45 - ERROR - stderr - +2025-02-05 19:14:45 - INFO - stdout - {'loss': 0.658, 'grad_norm': 1.070490837097168, 'learning_rate': 1.3713171939518378e-05, 'epoch': 1.19} +2025-02-05 19:14:45 - ERROR - stderr - 40%|███▉ | 8919/22434 [9:07:05<9:33:02, 2.54s/it] +2025-02-05 19:14:47 - ERROR - stderr - 40%|███▉ | 8920/22434 [9:07:07<9:33:56, 2.55s/it] +2025-02-05 19:14:47 - ERROR - stderr - +2025-02-05 19:14:47 - ERROR - stderr - +2025-02-05 
19:14:47 - INFO - stdout - {'loss': 0.74, 'grad_norm': 1.169248342514038, 'learning_rate': 1.3711831373305329e-05, 'epoch': 1.19} +2025-02-05 19:14:47 - ERROR - stderr - 40%|███▉ | 8920/22434 [9:07:07<9:33:56, 2.55s/it] +2025-02-05 19:14:50 - ERROR - stderr - 40%|███▉ | 8921/22434 [9:07:10<9:30:15, 2.53s/it] +2025-02-05 19:14:50 - ERROR - stderr - +2025-02-05 19:14:50 - ERROR - stderr - +2025-02-05 19:14:50 - INFO - stdout - {'loss': 0.7456, 'grad_norm': 1.1993176937103271, 'learning_rate': 1.3710490729722707e-05, 'epoch': 1.19} +2025-02-05 19:14:50 - ERROR - stderr - 40%|███▉ | 8921/22434 [9:07:10<9:30:15, 2.53s/it] +2025-02-05 19:14:52 - ERROR - stderr - 40%|███▉ | 8922/22434 [9:07:12<9:28:30, 2.52s/it] +2025-02-05 19:14:52 - ERROR - stderr - +2025-02-05 19:14:52 - ERROR - stderr - +2025-02-05 19:14:52 - INFO - stdout - {'loss': 0.6742, 'grad_norm': 1.0816614627838135, 'learning_rate': 1.3709150008798457e-05, 'epoch': 1.19} +2025-02-05 19:14:52 - ERROR - stderr - 40%|███▉ | 8922/22434 [9:07:12<9:28:30, 2.52s/it] +2025-02-05 19:14:55 - ERROR - stderr - 40%|███▉ | 8923/22434 [9:07:15<9:31:31, 2.54s/it] +2025-02-05 19:14:55 - ERROR - stderr - +2025-02-05 19:14:55 - ERROR - stderr - +2025-02-05 19:14:55 - INFO - stdout - {'loss': 0.7109, 'grad_norm': 1.1516257524490356, 'learning_rate': 1.3707809210560528e-05, 'epoch': 1.19} +2025-02-05 19:14:55 - ERROR - stderr - 40%|███▉ | 8923/22434 [9:07:15<9:31:31, 2.54s/it] +2025-02-05 19:14:58 - ERROR - stderr - 40%|███▉ | 8924/22434 [9:07:17<9:29:24, 2.53s/it] +2025-02-05 19:14:58 - ERROR - stderr - +2025-02-05 19:14:58 - ERROR - stderr - +2025-02-05 19:14:58 - INFO - stdout - {'loss': 0.6871, 'grad_norm': 1.61143159866333, 'learning_rate': 1.370646833503686e-05, 'epoch': 1.19} +2025-02-05 19:14:58 - ERROR - stderr - 40%|███▉ | 8924/22434 [9:07:17<9:29:24, 2.53s/it] +2025-02-05 19:15:00 - ERROR - stderr - 40%|███▉ | 8925/22434 [9:07:20<9:28:27, 2.52s/it] +2025-02-05 19:15:00 - ERROR - stderr - +2025-02-05 19:15:00 - ERROR - 
stderr - +2025-02-05 19:15:00 - INFO - stdout - {'loss': 0.7097, 'grad_norm': 1.145888090133667, 'learning_rate': 1.3705127382255406e-05, 'epoch': 1.19} +2025-02-05 19:15:00 - ERROR - stderr - 40%|███▉ | 8925/22434 [9:07:20<9:28:27, 2.52s/it] +2025-02-05 19:15:03 - ERROR - stderr - 40%|███▉ | 8926/22434 [9:07:23<9:43:01, 2.59s/it] +2025-02-05 19:15:03 - ERROR - stderr - +2025-02-05 19:15:03 - ERROR - stderr - +2025-02-05 19:15:03 - INFO - stdout - {'loss': 0.7333, 'grad_norm': 1.2164641618728638, 'learning_rate': 1.3703786352244119e-05, 'epoch': 1.19} +2025-02-05 19:15:03 - ERROR - stderr - 40%|███▉ | 8926/22434 [9:07:23<9:43:01, 2.59s/it] +2025-02-05 19:15:05 - ERROR - stderr - 40%|███▉ | 8927/22434 [9:07:25<9:48:44, 2.62s/it] +2025-02-05 19:15:05 - ERROR - stderr - +2025-02-05 19:15:05 - ERROR - stderr - +2025-02-05 19:15:05 - INFO - stdout - {'loss': 0.7242, 'grad_norm': 1.1248600482940674, 'learning_rate': 1.3702445245030949e-05, 'epoch': 1.19} +2025-02-05 19:15:05 - ERROR - stderr - 40%|███▉ | 8927/22434 [9:07:25<9:48:44, 2.62s/it] +2025-02-05 19:15:08 - ERROR - stderr - 40%|███▉ | 8928/22434 [9:07:28<9:36:34, 2.56s/it] +2025-02-05 19:15:08 - ERROR - stderr - +2025-02-05 19:15:08 - ERROR - stderr - +2025-02-05 19:15:08 - INFO - stdout - {'loss': 0.7204, 'grad_norm': 1.1660642623901367, 'learning_rate': 1.3701104060643848e-05, 'epoch': 1.19} +2025-02-05 19:15:08 - ERROR - stderr - 40%|███▉ | 8928/22434 [9:07:28<9:36:34, 2.56s/it] +2025-02-05 19:15:11 - ERROR - stderr - 40%|███▉ | 8929/22434 [9:07:30<9:53:33, 2.64s/it] +2025-02-05 19:15:11 - ERROR - stderr - +2025-02-05 19:15:11 - ERROR - stderr - +2025-02-05 19:15:11 - INFO - stdout - {'loss': 0.7675, 'grad_norm': 1.250727891921997, 'learning_rate': 1.3699762799110779e-05, 'epoch': 1.19} +2025-02-05 19:15:11 - ERROR - stderr - 40%|███▉ | 8929/22434 [9:07:31<9:53:33, 2.64s/it] +2025-02-05 19:15:13 - ERROR - stderr - 40%|███▉ | 8930/22434 [9:07:33<9:45:52, 2.60s/it] +2025-02-05 19:15:13 - ERROR - stderr - 
+2025-02-05 19:15:13 - ERROR - stderr - +2025-02-05 19:15:13 - INFO - stdout - {'loss': 0.687, 'grad_norm': 1.1870901584625244, 'learning_rate': 1.3698421460459692e-05, 'epoch': 1.19} +2025-02-05 19:15:13 - ERROR - stderr - 40%|███▉ | 8930/22434 [9:07:33<9:45:52, 2.60s/it] +2025-02-05 19:15:16 - ERROR - stderr - 40%|███▉ | 8931/22434 [9:07:35<9:34:09, 2.55s/it] +2025-02-05 19:15:16 - ERROR - stderr - +2025-02-05 19:15:16 - ERROR - stderr - +2025-02-05 19:15:16 - INFO - stdout - {'loss': 0.7632, 'grad_norm': 1.1625667810440063, 'learning_rate': 1.3697080044718549e-05, 'epoch': 1.19} +2025-02-05 19:15:16 - ERROR - stderr - 40%|███▉ | 8931/22434 [9:07:35<9:34:09, 2.55s/it] +2025-02-05 19:15:18 - ERROR - stderr - 40%|███▉ | 8932/22434 [9:07:38<9:27:40, 2.52s/it] +2025-02-05 19:15:18 - ERROR - stderr - +2025-02-05 19:15:18 - ERROR - stderr - +2025-02-05 19:15:18 - INFO - stdout - {'loss': 0.7545, 'grad_norm': 1.2416410446166992, 'learning_rate': 1.3695738551915312e-05, 'epoch': 1.19} +2025-02-05 19:15:18 - ERROR - stderr - 40%|███▉ | 8932/22434 [9:07:38<9:27:40, 2.52s/it] +2025-02-05 19:15:21 - ERROR - stderr - 40%|███▉ | 8933/22434 [9:07:40<9:34:21, 2.55s/it] +2025-02-05 19:15:21 - ERROR - stderr - +2025-02-05 19:15:21 - ERROR - stderr - +2025-02-05 19:15:21 - INFO - stdout - {'loss': 0.7099, 'grad_norm': 1.2610715627670288, 'learning_rate': 1.369439698207794e-05, 'epoch': 1.19} +2025-02-05 19:15:21 - ERROR - stderr - 40%|███▉ | 8933/22434 [9:07:41<9:34:21, 2.55s/it] +2025-02-05 19:15:23 - ERROR - stderr - 40%|███▉ | 8934/22434 [9:07:43<9:30:58, 2.54s/it] +2025-02-05 19:15:23 - ERROR - stderr - +2025-02-05 19:15:23 - ERROR - stderr - +2025-02-05 19:15:23 - INFO - stdout - {'loss': 0.795, 'grad_norm': 1.1958414316177368, 'learning_rate': 1.3693055335234398e-05, 'epoch': 1.19} +2025-02-05 19:15:23 - ERROR - stderr - 40%|███▉ | 8934/22434 [9:07:43<9:30:58, 2.54s/it] +2025-02-05 19:15:26 - ERROR - stderr - 40%|███▉ | 8935/22434 [9:07:46<9:30:24, 2.54s/it] +2025-02-05 
19:15:26 - ERROR - stderr - +2025-02-05 19:15:26 - ERROR - stderr - +2025-02-05 19:15:26 - INFO - stdout - {'loss': 0.7491, 'grad_norm': 1.1745306253433228, 'learning_rate': 1.3691713611412649e-05, 'epoch': 1.19} +2025-02-05 19:15:26 - ERROR - stderr - 40%|███▉ | 8935/22434 [9:07:46<9:30:24, 2.54s/it] +2025-02-05 19:15:28 - ERROR - stderr - 40%|███▉ | 8936/22434 [9:07:48<9:30:48, 2.54s/it] +2025-02-05 19:15:28 - ERROR - stderr - +2025-02-05 19:15:28 - ERROR - stderr - +2025-02-05 19:15:28 - INFO - stdout - {'loss': 0.7107, 'grad_norm': 0.9663350582122803, 'learning_rate': 1.3690371810640665e-05, 'epoch': 1.19} +2025-02-05 19:15:28 - ERROR - stderr - 40%|███▉ | 8936/22434 [9:07:48<9:30:48, 2.54s/it] +2025-02-05 19:15:31 - ERROR - stderr - 40%|███▉ | 8937/22434 [9:07:51<9:31:08, 2.54s/it] +2025-02-05 19:15:31 - ERROR - stderr - +2025-02-05 19:15:31 - ERROR - stderr - +2025-02-05 19:15:31 - INFO - stdout - {'loss': 0.6118, 'grad_norm': 1.056146264076233, 'learning_rate': 1.3689029932946411e-05, 'epoch': 1.2} +2025-02-05 19:15:31 - ERROR - stderr - 40%|███▉ | 8937/22434 [9:07:51<9:31:08, 2.54s/it] +2025-02-05 19:15:33 - ERROR - stderr - 40%|███▉ | 8938/22434 [9:07:53<9:29:57, 2.53s/it] +2025-02-05 19:15:33 - ERROR - stderr - +2025-02-05 19:15:33 - ERROR - stderr - +2025-02-05 19:15:33 - INFO - stdout - {'loss': 0.7755, 'grad_norm': 1.1266562938690186, 'learning_rate': 1.3687687978357863e-05, 'epoch': 1.2} +2025-02-05 19:15:33 - ERROR - stderr - 40%|███▉ | 8938/22434 [9:07:53<9:29:57, 2.53s/it] +2025-02-05 19:15:36 - ERROR - stderr - 40%|███▉ | 8939/22434 [9:07:56<9:34:50, 2.56s/it] +2025-02-05 19:15:36 - ERROR - stderr - +2025-02-05 19:15:36 - ERROR - stderr - +2025-02-05 19:15:36 - INFO - stdout - {'loss': 0.6906, 'grad_norm': 1.261659026145935, 'learning_rate': 1.3686345946902981e-05, 'epoch': 1.2} +2025-02-05 19:15:36 - ERROR - stderr - 40%|███▉ | 8939/22434 [9:07:56<9:34:50, 2.56s/it] +2025-02-05 19:15:38 - ERROR - stderr - 40%|███▉ | 8940/22434 [9:07:58<9:28:53, 
2.53s/it] +2025-02-05 19:15:38 - ERROR - stderr - +2025-02-05 19:15:38 - ERROR - stderr - +2025-02-05 19:15:38 - INFO - stdout - {'loss': 0.7062, 'grad_norm': 1.150908350944519, 'learning_rate': 1.3685003838609747e-05, 'epoch': 1.2} +2025-02-05 19:15:38 - ERROR - stderr - 40%|███▉ | 8940/22434 [9:07:58<9:28:53, 2.53s/it] +2025-02-05 19:15:41 - ERROR - stderr - 40%|███▉ | 8941/22434 [9:08:01<9:32:52, 2.55s/it] +2025-02-05 19:15:41 - ERROR - stderr - +2025-02-05 19:15:41 - ERROR - stderr - +2025-02-05 19:15:41 - INFO - stdout - {'loss': 0.6716, 'grad_norm': 1.129273533821106, 'learning_rate': 1.3683661653506133e-05, 'epoch': 1.2} +2025-02-05 19:15:41 - ERROR - stderr - 40%|███▉ | 8941/22434 [9:08:01<9:32:52, 2.55s/it] +2025-02-05 19:15:44 - ERROR - stderr - 40%|███▉ | 8942/22434 [9:08:03<9:33:27, 2.55s/it] +2025-02-05 19:15:44 - ERROR - stderr - +2025-02-05 19:15:44 - ERROR - stderr - +2025-02-05 19:15:44 - INFO - stdout - {'loss': 0.7274, 'grad_norm': 1.2441515922546387, 'learning_rate': 1.368231939162012e-05, 'epoch': 1.2} +2025-02-05 19:15:44 - ERROR - stderr - 40%|███▉ | 8942/22434 [9:08:03<9:33:27, 2.55s/it] +2025-02-05 19:15:46 - ERROR - stderr - 40%|███▉ | 8943/22434 [9:08:06<9:28:26, 2.53s/it] +2025-02-05 19:15:46 - ERROR - stderr - +2025-02-05 19:15:46 - ERROR - stderr - +2025-02-05 19:15:46 - INFO - stdout - {'loss': 0.6808, 'grad_norm': 1.1343867778778076, 'learning_rate': 1.3680977052979682e-05, 'epoch': 1.2} +2025-02-05 19:15:46 - ERROR - stderr - 40%|███▉ | 8943/22434 [9:08:06<9:28:26, 2.53s/it] +2025-02-05 19:15:49 - ERROR - stderr - 40%|███▉ | 8944/22434 [9:08:08<9:31:32, 2.54s/it] +2025-02-05 19:15:49 - ERROR - stderr - +2025-02-05 19:15:49 - ERROR - stderr - +2025-02-05 19:15:49 - INFO - stdout - {'loss': 0.7791, 'grad_norm': 1.164981722831726, 'learning_rate': 1.3679634637612799e-05, 'epoch': 1.2} +2025-02-05 19:15:49 - ERROR - stderr - 40%|███▉ | 8944/22434 [9:08:08<9:31:32, 2.54s/it] +2025-02-05 19:15:51 - ERROR - stderr - 40%|███▉ | 8945/22434 
[9:08:11<9:34:34, 2.56s/it] +2025-02-05 19:15:51 - ERROR - stderr - +2025-02-05 19:15:51 - ERROR - stderr - +2025-02-05 19:15:51 - INFO - stdout - {'loss': 0.7834, 'grad_norm': 1.155806064605713, 'learning_rate': 1.3678292145547454e-05, 'epoch': 1.2} +2025-02-05 19:15:51 - ERROR - stderr - 40%|███▉ | 8945/22434 [9:08:11<9:34:34, 2.56s/it] +2025-02-05 19:15:54 - ERROR - stderr - 40%|███▉ | 8946/22434 [9:08:14<9:31:51, 2.54s/it] +2025-02-05 19:15:54 - ERROR - stderr - +2025-02-05 19:15:54 - ERROR - stderr - +2025-02-05 19:15:54 - INFO - stdout - {'loss': 0.6326, 'grad_norm': 1.058268666267395, 'learning_rate': 1.367694957681163e-05, 'epoch': 1.2} +2025-02-05 19:15:54 - ERROR - stderr - 40%|███▉ | 8946/22434 [9:08:14<9:31:51, 2.54s/it] +2025-02-05 19:15:56 - ERROR - stderr - 40%|███▉ | 8947/22434 [9:08:16<9:34:40, 2.56s/it] +2025-02-05 19:15:56 - ERROR - stderr - +2025-02-05 19:15:56 - ERROR - stderr - +2025-02-05 19:15:56 - INFO - stdout - {'loss': 0.7254, 'grad_norm': 1.2116761207580566, 'learning_rate': 1.3675606931433305e-05, 'epoch': 1.2} +2025-02-05 19:15:56 - ERROR - stderr - 40%|███▉ | 8947/22434 [9:08:16<9:34:40, 2.56s/it] +2025-02-05 19:15:59 - ERROR - stderr - 40%|███▉ | 8948/22434 [9:08:19<9:34:40, 2.56s/it] +2025-02-05 19:15:59 - ERROR - stderr - +2025-02-05 19:15:59 - ERROR - stderr - +2025-02-05 19:15:59 - INFO - stdout - {'loss': 0.769, 'grad_norm': 1.1323367357254028, 'learning_rate': 1.3674264209440474e-05, 'epoch': 1.2} +2025-02-05 19:15:59 - ERROR - stderr - 40%|███▉ | 8948/22434 [9:08:19<9:34:40, 2.56s/it] +2025-02-05 19:16:01 - ERROR - stderr - 40%|███▉ | 8949/22434 [9:08:21<9:32:51, 2.55s/it] +2025-02-05 19:16:01 - ERROR - stderr - +2025-02-05 19:16:01 - ERROR - stderr - +2025-02-05 19:16:01 - INFO - stdout - {'loss': 0.7264, 'grad_norm': 1.143675446510315, 'learning_rate': 1.3672921410861122e-05, 'epoch': 1.2} +2025-02-05 19:16:01 - ERROR - stderr - 40%|███▉ | 8949/22434 [9:08:21<9:32:51, 2.55s/it] +2025-02-05 19:16:04 - ERROR - stderr - 
40%|███▉ | 8950/22434 [9:08:24<9:44:11, 2.60s/it] +2025-02-05 19:16:04 - ERROR - stderr - +2025-02-05 19:16:04 - ERROR - stderr - +2025-02-05 19:16:04 - INFO - stdout - {'loss': 0.7323, 'grad_norm': 1.0681509971618652, 'learning_rate': 1.367157853572324e-05, 'epoch': 1.2} +2025-02-05 19:16:04 - ERROR - stderr - 40%|███▉ | 8950/22434 [9:08:24<9:44:11, 2.60s/it] +2025-02-05 19:16:07 - ERROR - stderr - 40%|███▉ | 8951/22434 [9:08:27<9:45:13, 2.60s/it] +2025-02-05 19:16:07 - ERROR - stderr - +2025-02-05 19:16:07 - ERROR - stderr - +2025-02-05 19:16:07 - INFO - stdout - {'loss': 0.729, 'grad_norm': 1.111977219581604, 'learning_rate': 1.3670235584054814e-05, 'epoch': 1.2} +2025-02-05 19:16:07 - ERROR - stderr - 40%|███▉ | 8951/22434 [9:08:27<9:45:13, 2.60s/it] +2025-02-05 19:16:10 - ERROR - stderr - 40%|███▉ | 8952/22434 [9:08:29<10:03:31, 2.69s/it] +2025-02-05 19:16:10 - ERROR - stderr - +2025-02-05 19:16:10 - ERROR - stderr - +2025-02-05 19:16:10 - INFO - stdout - {'loss': 0.7327, 'grad_norm': 1.2315517663955688, 'learning_rate': 1.3668892555883839e-05, 'epoch': 1.2} +2025-02-05 19:16:10 - ERROR - stderr - 40%|███▉ | 8952/22434 [9:08:29<10:03:31, 2.69s/it] +2025-02-05 19:16:12 - ERROR - stderr - 40%|███▉ | 8953/22434 [9:08:32<9:48:39, 2.62s/it] +2025-02-05 19:16:12 - ERROR - stderr - +2025-02-05 19:16:12 - ERROR - stderr - +2025-02-05 19:16:12 - INFO - stdout - {'loss': 0.6144, 'grad_norm': 1.2540357112884521, 'learning_rate': 1.3667549451238308e-05, 'epoch': 1.2} +2025-02-05 19:16:12 - ERROR - stderr - 40%|███▉ | 8953/22434 [9:08:32<9:48:39, 2.62s/it] +2025-02-05 19:16:15 - ERROR - stderr - 40%|███▉ | 8954/22434 [9:08:34<9:47:02, 2.61s/it] +2025-02-05 19:16:15 - ERROR - stderr - +2025-02-05 19:16:15 - ERROR - stderr - +2025-02-05 19:16:15 - INFO - stdout - {'loss': 0.755, 'grad_norm': 1.1160366535186768, 'learning_rate': 1.3666206270146223e-05, 'epoch': 1.2} +2025-02-05 19:16:15 - ERROR - stderr - 40%|███▉ | 8954/22434 [9:08:35<9:47:02, 2.61s/it] +2025-02-05 19:16:17 
- ERROR - stderr - 40%|███▉ | 8955/22434 [9:08:37<9:43:20, 2.60s/it] +2025-02-05 19:16:17 - ERROR - stderr - +2025-02-05 19:16:17 - ERROR - stderr - +2025-02-05 19:16:17 - INFO - stdout - {'loss': 0.802, 'grad_norm': 1.1704756021499634, 'learning_rate': 1.3664863012635572e-05, 'epoch': 1.2} +2025-02-05 19:16:17 - ERROR - stderr - 40%|███▉ | 8955/22434 [9:08:37<9:43:20, 2.60s/it] +2025-02-05 19:16:20 - ERROR - stderr - 40%|███▉ | 8956/22434 [9:08:39<9:33:47, 2.55s/it] +2025-02-05 19:16:20 - ERROR - stderr - +2025-02-05 19:16:20 - ERROR - stderr - +2025-02-05 19:16:20 - INFO - stdout - {'loss': 0.7207, 'grad_norm': 1.2126598358154297, 'learning_rate': 1.366351967873436e-05, 'epoch': 1.2} +2025-02-05 19:16:20 - ERROR - stderr - 40%|███▉ | 8956/22434 [9:08:40<9:33:47, 2.55s/it] +2025-02-05 19:16:22 - ERROR - stderr - 40%|███▉ | 8957/22434 [9:08:42<9:32:03, 2.55s/it] +2025-02-05 19:16:22 - ERROR - stderr - +2025-02-05 19:16:22 - ERROR - stderr - +2025-02-05 19:16:22 - INFO - stdout - {'loss': 0.7634, 'grad_norm': 1.2041130065917969, 'learning_rate': 1.3662176268470586e-05, 'epoch': 1.2} +2025-02-05 19:16:22 - ERROR - stderr - 40%|███▉ | 8957/22434 [9:08:42<9:32:03, 2.55s/it] +2025-02-05 19:16:25 - ERROR - stderr - 40%|███▉ | 8958/22434 [9:08:45<9:30:59, 2.54s/it] +2025-02-05 19:16:25 - ERROR - stderr - +2025-02-05 19:16:25 - ERROR - stderr - +2025-02-05 19:16:25 - INFO - stdout - {'loss': 0.7265, 'grad_norm': 1.1671525239944458, 'learning_rate': 1.3660832781872253e-05, 'epoch': 1.2} +2025-02-05 19:16:25 - ERROR - stderr - 40%|███▉ | 8958/22434 [9:08:45<9:30:59, 2.54s/it] +2025-02-05 19:16:27 - ERROR - stderr - 40%|███▉ | 8959/22434 [9:08:47<9:29:51, 2.54s/it] +2025-02-05 19:16:27 - ERROR - stderr - +2025-02-05 19:16:27 - ERROR - stderr - +2025-02-05 19:16:27 - INFO - stdout - {'loss': 0.7515, 'grad_norm': 1.162858247756958, 'learning_rate': 1.3659489218967363e-05, 'epoch': 1.2} +2025-02-05 19:16:27 - ERROR - stderr - 40%|███▉ | 8959/22434 [9:08:47<9:29:51, 2.54s/it] 
+2025-02-05 19:16:30 - ERROR - stderr - 40%|███▉ | 8960/22434 [9:08:50<9:29:27, 2.54s/it] +2025-02-05 19:16:30 - ERROR - stderr - +2025-02-05 19:16:30 - ERROR - stderr - +2025-02-05 19:16:30 - INFO - stdout - {'loss': 0.7519, 'grad_norm': 1.2248237133026123, 'learning_rate': 1.3658145579783919e-05, 'epoch': 1.2} +2025-02-05 19:16:30 - ERROR - stderr - 40%|███▉ | 8960/22434 [9:08:50<9:29:27, 2.54s/it] +2025-02-05 19:16:32 - ERROR - stderr - 40%|███▉ | 8961/22434 [9:08:52<9:32:46, 2.55s/it] +2025-02-05 19:16:32 - ERROR - stderr - +2025-02-05 19:16:32 - ERROR - stderr - +2025-02-05 19:16:32 - INFO - stdout - {'loss': 0.6069, 'grad_norm': 1.0587517023086548, 'learning_rate': 1.3656801864349933e-05, 'epoch': 1.2} +2025-02-05 19:16:32 - ERROR - stderr - 40%|███▉ | 8961/22434 [9:08:52<9:32:46, 2.55s/it] +2025-02-05 19:16:35 - ERROR - stderr - 40%|███▉ | 8962/22434 [9:08:55<9:31:05, 2.54s/it] +2025-02-05 19:16:35 - ERROR - stderr - +2025-02-05 19:16:35 - ERROR - stderr - +2025-02-05 19:16:35 - INFO - stdout - {'loss': 0.7751, 'grad_norm': 1.4005359411239624, 'learning_rate': 1.3655458072693413e-05, 'epoch': 1.2} +2025-02-05 19:16:35 - ERROR - stderr - 40%|███▉ | 8962/22434 [9:08:55<9:31:05, 2.54s/it] +2025-02-05 19:16:37 - ERROR - stderr - 40%|███▉ | 8963/22434 [9:08:57<9:24:35, 2.51s/it] +2025-02-05 19:16:37 - ERROR - stderr - +2025-02-05 19:16:37 - ERROR - stderr - +2025-02-05 19:16:37 - INFO - stdout - {'loss': 0.802, 'grad_norm': 1.2449924945831299, 'learning_rate': 1.3654114204842369e-05, 'epoch': 1.2} +2025-02-05 19:16:37 - ERROR - stderr - 40%|███▉ | 8963/22434 [9:08:57<9:24:35, 2.51s/it] +2025-02-05 19:16:40 - ERROR - stderr - 40%|███▉ | 8964/22434 [9:09:00<9:24:00, 2.51s/it] +2025-02-05 19:16:40 - ERROR - stderr - +2025-02-05 19:16:40 - ERROR - stderr - +2025-02-05 19:16:40 - INFO - stdout - {'loss': 0.7239, 'grad_norm': 1.1139699220657349, 'learning_rate': 1.3652770260824806e-05, 'epoch': 1.2} +2025-02-05 19:16:40 - ERROR - stderr - 40%|███▉ | 8964/22434 
[9:09:00<9:24:00, 2.51s/it] +2025-02-05 19:16:42 - ERROR - stderr - 40%|███▉ | 8965/22434 [9:09:02<9:27:48, 2.53s/it] +2025-02-05 19:16:43 - ERROR - stderr - +2025-02-05 19:16:43 - ERROR - stderr - +2025-02-05 19:16:43 - INFO - stdout - {'loss': 0.7572, 'grad_norm': 1.2092784643173218, 'learning_rate': 1.3651426240668744e-05, 'epoch': 1.2} +2025-02-05 19:16:43 - ERROR - stderr - 40%|███▉ | 8965/22434 [9:09:02<9:27:48, 2.53s/it] +2025-02-05 19:16:45 - ERROR - stderr - 40%|███▉ | 8966/22434 [9:09:05<9:26:00, 2.52s/it] +2025-02-05 19:16:45 - ERROR - stderr - +2025-02-05 19:16:45 - ERROR - stderr - +2025-02-05 19:16:45 - INFO - stdout - {'loss': 0.7116, 'grad_norm': 1.0910764932632446, 'learning_rate': 1.3650082144402195e-05, 'epoch': 1.2} +2025-02-05 19:16:45 - ERROR - stderr - 40%|███▉ | 8966/22434 [9:09:05<9:26:00, 2.52s/it] +2025-02-05 19:16:48 - ERROR - stderr - 40%|███▉ | 8967/22434 [9:09:07<9:38:18, 2.58s/it] +2025-02-05 19:16:48 - ERROR - stderr - +2025-02-05 19:16:48 - ERROR - stderr - +2025-02-05 19:16:48 - INFO - stdout - {'loss': 0.7197, 'grad_norm': 1.0360510349273682, 'learning_rate': 1.3648737972053179e-05, 'epoch': 1.2} +2025-02-05 19:16:48 - ERROR - stderr - 40%|███▉ | 8967/22434 [9:09:07<9:38:18, 2.58s/it] +2025-02-05 19:16:50 - ERROR - stderr - 40%|███▉ | 8968/22434 [9:09:10<9:36:22, 2.57s/it] +2025-02-05 19:16:50 - ERROR - stderr - +2025-02-05 19:16:50 - ERROR - stderr - +2025-02-05 19:16:50 - INFO - stdout - {'loss': 0.6281, 'grad_norm': 1.0435012578964233, 'learning_rate': 1.3647393723649708e-05, 'epoch': 1.2} +2025-02-05 19:16:50 - ERROR - stderr - 40%|███▉ | 8968/22434 [9:09:10<9:36:22, 2.57s/it] +2025-02-05 19:16:53 - ERROR - stderr - 40%|███▉ | 8969/22434 [9:09:13<9:35:54, 2.57s/it] +2025-02-05 19:16:53 - ERROR - stderr - +2025-02-05 19:16:53 - ERROR - stderr - +2025-02-05 19:16:53 - INFO - stdout - {'loss': 0.7659, 'grad_norm': 1.2182564735412598, 'learning_rate': 1.364604939921981e-05, 'epoch': 1.2} +2025-02-05 19:16:53 - ERROR - stderr - 
40%|███▉ | 8969/22434 [9:09:13<9:35:54, 2.57s/it] +2025-02-05 19:16:55 - ERROR - stderr - 40%|███▉ | 8970/22434 [9:09:15<9:39:03, 2.58s/it] +2025-02-05 19:16:55 - ERROR - stderr - +2025-02-05 19:16:55 - ERROR - stderr - +2025-02-05 19:16:55 - INFO - stdout - {'loss': 0.7183, 'grad_norm': 1.2867982387542725, 'learning_rate': 1.3644704998791501e-05, 'epoch': 1.2} +2025-02-05 19:16:55 - ERROR - stderr - 40%|███▉ | 8970/22434 [9:09:15<9:39:03, 2.58s/it] +2025-02-05 19:16:58 - ERROR - stderr - 40%|███▉ | 8971/22434 [9:09:18<9:32:09, 2.55s/it] +2025-02-05 19:16:58 - ERROR - stderr - +2025-02-05 19:16:58 - ERROR - stderr - +2025-02-05 19:16:58 - INFO - stdout - {'loss': 0.7158, 'grad_norm': 1.1223868131637573, 'learning_rate': 1.3643360522392799e-05, 'epoch': 1.2} +2025-02-05 19:16:58 - ERROR - stderr - 40%|███▉ | 8971/22434 [9:09:18<9:32:09, 2.55s/it] +2025-02-05 19:17:00 - ERROR - stderr - 40%|███▉ | 8972/22434 [9:09:20<9:27:26, 2.53s/it] +2025-02-05 19:17:00 - ERROR - stderr - +2025-02-05 19:17:00 - ERROR - stderr - +2025-02-05 19:17:00 - INFO - stdout - {'loss': 0.6995, 'grad_norm': 1.2870337963104248, 'learning_rate': 1.3642015970051737e-05, 'epoch': 1.2} +2025-02-05 19:17:00 - ERROR - stderr - 40%|███▉ | 8972/22434 [9:09:20<9:27:26, 2.53s/it] +2025-02-05 19:17:03 - ERROR - stderr - 40%|███▉ | 8973/22434 [9:09:23<9:25:49, 2.52s/it] +2025-02-05 19:17:03 - ERROR - stderr - +2025-02-05 19:17:03 - ERROR - stderr - +2025-02-05 19:17:03 - INFO - stdout - {'loss': 0.7711, 'grad_norm': 1.2040070295333862, 'learning_rate': 1.3640671341796334e-05, 'epoch': 1.2} +2025-02-05 19:17:03 - ERROR - stderr - 40%|███▉ | 8973/22434 [9:09:23<9:25:49, 2.52s/it] +2025-02-05 19:17:06 - ERROR - stderr - 40%|████ | 8974/22434 [9:09:25<9:42:24, 2.60s/it] +2025-02-05 19:17:06 - ERROR - stderr - +2025-02-05 19:17:06 - ERROR - stderr - +2025-02-05 19:17:06 - INFO - stdout - {'loss': 0.7578, 'grad_norm': 1.2238504886627197, 'learning_rate': 1.3639326637654622e-05, 'epoch': 1.2} +2025-02-05 
19:17:06 - ERROR - stderr - 40%|████ | 8974/22434 [9:09:25<9:42:24, 2.60s/it] +2025-02-05 19:17:08 - ERROR - stderr - 40%|████ | 8975/22434 [9:09:28<9:42:57, 2.60s/it] +2025-02-05 19:17:08 - ERROR - stderr - +2025-02-05 19:17:08 - ERROR - stderr - +2025-02-05 19:17:08 - INFO - stdout - {'loss': 0.7657, 'grad_norm': 1.4510940313339233, 'learning_rate': 1.3637981857654629e-05, 'epoch': 1.2} +2025-02-05 19:17:08 - ERROR - stderr - 40%|████ | 8975/22434 [9:09:28<9:42:57, 2.60s/it] +2025-02-05 19:17:11 - ERROR - stderr - 40%|████ | 8976/22434 [9:09:31<9:41:13, 2.59s/it] +2025-02-05 19:17:11 - ERROR - stderr - +2025-02-05 19:17:11 - ERROR - stderr - +2025-02-05 19:17:11 - INFO - stdout - {'loss': 0.7903, 'grad_norm': 1.2163002490997314, 'learning_rate': 1.3636637001824386e-05, 'epoch': 1.2} +2025-02-05 19:17:11 - ERROR - stderr - 40%|████ | 8976/22434 [9:09:31<9:41:13, 2.59s/it] +2025-02-05 19:17:13 - ERROR - stderr - 40%|████ | 8977/22434 [9:09:33<9:38:02, 2.58s/it] +2025-02-05 19:17:13 - ERROR - stderr - +2025-02-05 19:17:13 - ERROR - stderr - +2025-02-05 19:17:13 - INFO - stdout - {'loss': 0.738, 'grad_norm': 1.2259807586669922, 'learning_rate': 1.3635292070191924e-05, 'epoch': 1.2} +2025-02-05 19:17:13 - ERROR - stderr - 40%|████ | 8977/22434 [9:09:33<9:38:02, 2.58s/it] +2025-02-05 19:17:16 - ERROR - stderr - 40%|████ | 8978/22434 [9:09:36<9:42:00, 2.60s/it] +2025-02-05 19:17:16 - ERROR - stderr - +2025-02-05 19:17:16 - ERROR - stderr - +2025-02-05 19:17:16 - INFO - stdout - {'loss': 0.7273, 'grad_norm': 1.1371212005615234, 'learning_rate': 1.3633947062785277e-05, 'epoch': 1.2} +2025-02-05 19:17:16 - ERROR - stderr - 40%|████ | 8978/22434 [9:09:36<9:42:00, 2.60s/it] +2025-02-05 19:17:19 - ERROR - stderr - 40%|████ | 8979/22434 [9:09:38<9:38:02, 2.58s/it] +2025-02-05 19:17:19 - ERROR - stderr - +2025-02-05 19:17:19 - ERROR - stderr - +2025-02-05 19:17:19 - INFO - stdout - {'loss': 0.749, 'grad_norm': 1.1110987663269043, 'learning_rate': 1.363260197963248e-05, 'epoch': 
1.2} +2025-02-05 19:17:19 - ERROR - stderr - 40%|████ | 8979/22434 [9:09:38<9:38:02, 2.58s/it] +2025-02-05 19:17:21 - ERROR - stderr - 40%|████ | 8980/22434 [9:09:41<9:29:33, 2.54s/it] +2025-02-05 19:17:21 - ERROR - stderr - +2025-02-05 19:17:21 - ERROR - stderr - +2025-02-05 19:17:21 - INFO - stdout - {'loss': 0.7306, 'grad_norm': 1.1727619171142578, 'learning_rate': 1.363125682076157e-05, 'epoch': 1.2} +2025-02-05 19:17:21 - ERROR - stderr - 40%|████ | 8980/22434 [9:09:41<9:29:33, 2.54s/it] +2025-02-05 19:17:23 - ERROR - stderr - 40%|████ | 8981/22434 [9:09:43<9:27:15, 2.53s/it] +2025-02-05 19:17:24 - ERROR - stderr - +2025-02-05 19:17:24 - ERROR - stderr - +2025-02-05 19:17:24 - INFO - stdout - {'loss': 0.7505, 'grad_norm': 1.148292899131775, 'learning_rate': 1.3629911586200591e-05, 'epoch': 1.2} +2025-02-05 19:17:24 - ERROR - stderr - 40%|████ | 8981/22434 [9:09:43<9:27:15, 2.53s/it] +2025-02-05 19:17:26 - ERROR - stderr - 40%|████ | 8982/22434 [9:09:46<9:28:47, 2.54s/it] +2025-02-05 19:17:26 - ERROR - stderr - +2025-02-05 19:17:26 - ERROR - stderr - +2025-02-05 19:17:26 - INFO - stdout - {'loss': 0.7412, 'grad_norm': 1.1893748044967651, 'learning_rate': 1.3628566275977577e-05, 'epoch': 1.2} +2025-02-05 19:17:26 - ERROR - stderr - 40%|████ | 8982/22434 [9:09:46<9:28:47, 2.54s/it] +2025-02-05 19:17:29 - ERROR - stderr - 40%|████ | 8983/22434 [9:09:48<9:32:07, 2.55s/it] +2025-02-05 19:17:29 - ERROR - stderr - +2025-02-05 19:17:29 - ERROR - stderr - +2025-02-05 19:17:29 - INFO - stdout - {'loss': 0.7256, 'grad_norm': 1.1427544355392456, 'learning_rate': 1.362722089012057e-05, 'epoch': 1.2} +2025-02-05 19:17:29 - ERROR - stderr - 40%|████ | 8983/22434 [9:09:48<9:32:07, 2.55s/it] +2025-02-05 19:17:31 - ERROR - stderr - 40%|████ | 8984/22434 [9:09:51<9:30:56, 2.55s/it] +2025-02-05 19:17:31 - ERROR - stderr - +2025-02-05 19:17:31 - ERROR - stderr - +2025-02-05 19:17:31 - INFO - stdout - {'loss': 0.7795, 'grad_norm': 1.1609103679656982, 'learning_rate': 
1.3625875428657614e-05, 'epoch': 1.2} +2025-02-05 19:17:31 - ERROR - stderr - 40%|████ | 8984/22434 [9:09:51<9:30:56, 2.55s/it] +2025-02-05 19:17:34 - ERROR - stderr - 40%|████ | 8985/22434 [9:09:54<9:33:25, 2.56s/it] +2025-02-05 19:17:34 - ERROR - stderr - +2025-02-05 19:17:34 - ERROR - stderr - +2025-02-05 19:17:34 - INFO - stdout - {'loss': 0.7574, 'grad_norm': 1.2255394458770752, 'learning_rate': 1.3624529891616754e-05, 'epoch': 1.2} +2025-02-05 19:17:34 - ERROR - stderr - 40%|████ | 8985/22434 [9:09:54<9:33:25, 2.56s/it] +2025-02-05 19:17:36 - ERROR - stderr - 40%|████ | 8986/22434 [9:09:56<9:31:26, 2.55s/it] +2025-02-05 19:17:36 - ERROR - stderr - +2025-02-05 19:17:36 - ERROR - stderr - +2025-02-05 19:17:36 - INFO - stdout - {'loss': 0.7157, 'grad_norm': 1.2172746658325195, 'learning_rate': 1.3623184279026036e-05, 'epoch': 1.2} +2025-02-05 19:17:36 - ERROR - stderr - 40%|████ | 8986/22434 [9:09:56<9:31:26, 2.55s/it] +2025-02-05 19:17:39 - ERROR - stderr - 40%|████ | 8987/22434 [9:09:59<9:26:38, 2.53s/it] +2025-02-05 19:17:39 - ERROR - stderr - +2025-02-05 19:17:39 - ERROR - stderr - +2025-02-05 19:17:39 - INFO - stdout - {'loss': 0.7531, 'grad_norm': 1.1932650804519653, 'learning_rate': 1.3621838590913509e-05, 'epoch': 1.2} +2025-02-05 19:17:39 - ERROR - stderr - 40%|████ | 8987/22434 [9:09:59<9:26:38, 2.53s/it] +2025-02-05 19:17:41 - ERROR - stderr - 40%|████ | 8988/22434 [9:10:01<9:30:27, 2.55s/it] +2025-02-05 19:17:41 - ERROR - stderr - +2025-02-05 19:17:41 - ERROR - stderr - +2025-02-05 19:17:41 - INFO - stdout - {'loss': 0.6543, 'grad_norm': 1.0475637912750244, 'learning_rate': 1.3620492827307223e-05, 'epoch': 1.2} +2025-02-05 19:17:41 - ERROR - stderr - 40%|████ | 8988/22434 [9:10:01<9:30:27, 2.55s/it] +2025-02-05 19:17:44 - ERROR - stderr - 40%|████ | 8989/22434 [9:10:04<9:45:19, 2.61s/it] +2025-02-05 19:17:44 - ERROR - stderr - +2025-02-05 19:17:44 - ERROR - stderr - +2025-02-05 19:17:44 - INFO - stdout - {'loss': 0.7041, 'grad_norm': 
1.1967012882232666, 'learning_rate': 1.361914698823523e-05, 'epoch': 1.2} +2025-02-05 19:17:44 - ERROR - stderr - 40%|████ | 8989/22434 [9:10:04<9:45:19, 2.61s/it] +2025-02-05 19:17:47 - ERROR - stderr - 40%|████ | 8990/22434 [9:10:06<9:34:24, 2.56s/it] +2025-02-05 19:17:47 - ERROR - stderr - +2025-02-05 19:17:47 - ERROR - stderr - +2025-02-05 19:17:47 - INFO - stdout - {'loss': 0.733, 'grad_norm': 1.1847583055496216, 'learning_rate': 1.3617801073725581e-05, 'epoch': 1.2} +2025-02-05 19:17:47 - ERROR - stderr - 40%|████ | 8990/22434 [9:10:06<9:34:24, 2.56s/it] +2025-02-05 19:17:49 - ERROR - stderr - 40%|████ | 8991/22434 [9:10:09<9:30:48, 2.55s/it] +2025-02-05 19:17:49 - ERROR - stderr - +2025-02-05 19:17:49 - ERROR - stderr - +2025-02-05 19:17:49 - INFO - stdout - {'loss': 0.6143, 'grad_norm': 1.0628321170806885, 'learning_rate': 1.361645508380633e-05, 'epoch': 1.2} +2025-02-05 19:17:49 - ERROR - stderr - 40%|████ | 8991/22434 [9:10:09<9:30:48, 2.55s/it] +2025-02-05 19:17:52 - ERROR - stderr - 40%|████ | 8992/22434 [9:10:11<9:24:41, 2.52s/it] +2025-02-05 19:17:52 - ERROR - stderr - +2025-02-05 19:17:52 - ERROR - stderr - +2025-02-05 19:17:52 - INFO - stdout - {'loss': 0.7869, 'grad_norm': 1.2126635313034058, 'learning_rate': 1.361510901850553e-05, 'epoch': 1.2} +2025-02-05 19:17:52 - ERROR - stderr - 40%|████ | 8992/22434 [9:10:11<9:24:41, 2.52s/it] +2025-02-05 19:17:54 - ERROR - stderr - 40%|████ | 8993/22434 [9:10:14<9:20:58, 2.50s/it] +2025-02-05 19:17:54 - ERROR - stderr - +2025-02-05 19:17:54 - ERROR - stderr - +2025-02-05 19:17:54 - INFO - stdout - {'loss': 0.5897, 'grad_norm': 0.9837111830711365, 'learning_rate': 1.3613762877851244e-05, 'epoch': 1.2} +2025-02-05 19:17:54 - ERROR - stderr - 40%|████ | 8993/22434 [9:10:14<9:20:58, 2.50s/it] +2025-02-05 19:17:56 - ERROR - stderr - 40%|████ | 8994/22434 [9:10:16<9:18:38, 2.49s/it] +2025-02-05 19:17:57 - ERROR - stderr - +2025-02-05 19:17:57 - ERROR - stderr - +2025-02-05 19:17:57 - INFO - stdout - {'loss': 
0.6758, 'grad_norm': 1.1172072887420654, 'learning_rate': 1.3612416661871532e-05, 'epoch': 1.2} +2025-02-05 19:17:57 - ERROR - stderr - 40%|████ | 8994/22434 [9:10:16<9:18:38, 2.49s/it] +2025-02-05 19:17:59 - ERROR - stderr - 40%|████ | 8995/22434 [9:10:19<9:14:35, 2.48s/it] +2025-02-05 19:17:59 - ERROR - stderr - +2025-02-05 19:17:59 - ERROR - stderr - +2025-02-05 19:17:59 - INFO - stdout - {'loss': 0.7022, 'grad_norm': 1.1416600942611694, 'learning_rate': 1.3611070370594448e-05, 'epoch': 1.2} +2025-02-05 19:17:59 - ERROR - stderr - 40%|████ | 8995/22434 [9:10:19<9:14:35, 2.48s/it] +2025-02-05 19:18:02 - ERROR - stderr - 40%|████ | 8996/22434 [9:10:21<9:35:41, 2.57s/it] +2025-02-05 19:18:02 - ERROR - stderr - +2025-02-05 19:18:02 - ERROR - stderr - +2025-02-05 19:18:02 - INFO - stdout - {'loss': 0.784, 'grad_norm': 1.2693506479263306, 'learning_rate': 1.3609724004048057e-05, 'epoch': 1.2} +2025-02-05 19:18:02 - ERROR - stderr - 40%|████ | 8996/22434 [9:10:21<9:35:41, 2.57s/it] +2025-02-05 19:18:04 - ERROR - stderr - 40%|████ | 8997/22434 [9:10:24<9:29:14, 2.54s/it] +2025-02-05 19:18:04 - ERROR - stderr - +2025-02-05 19:18:04 - ERROR - stderr - +2025-02-05 19:18:04 - INFO - stdout - {'loss': 0.7454, 'grad_norm': 1.1978055238723755, 'learning_rate': 1.3608377562260423e-05, 'epoch': 1.2} +2025-02-05 19:18:04 - ERROR - stderr - 40%|████ | 8997/22434 [9:10:24<9:29:14, 2.54s/it] +2025-02-05 19:18:07 - ERROR - stderr - 40%|████ | 8998/22434 [9:10:26<9:28:00, 2.54s/it] +2025-02-05 19:18:07 - ERROR - stderr - +2025-02-05 19:18:07 - ERROR - stderr - +2025-02-05 19:18:07 - INFO - stdout - {'loss': 0.6797, 'grad_norm': 1.13164484500885, 'learning_rate': 1.3607031045259615e-05, 'epoch': 1.2} +2025-02-05 19:18:07 - ERROR - stderr - 40%|████ | 8998/22434 [9:10:26<9:28:00, 2.54s/it] +2025-02-05 19:18:10 - ERROR - stderr - 40%|████ | 8999/22434 [9:10:29<9:56:35, 2.66s/it] +2025-02-05 19:18:10 - ERROR - stderr - +2025-02-05 19:18:10 - ERROR - stderr - +2025-02-05 19:18:10 - INFO - 
stdout - {'loss': 0.7519, 'grad_norm': 1.2723455429077148, 'learning_rate': 1.3605684453073696e-05, 'epoch': 1.2} +2025-02-05 19:18:10 - ERROR - stderr - 40%|████ | 8999/22434 [9:10:29<9:56:35, 2.66s/it] +2025-02-05 19:18:12 - ERROR - stderr - 40%|████ | 9000/22434 [9:10:32<9:41:21, 2.60s/it] +2025-02-05 19:18:12 - ERROR - stderr - +2025-02-05 19:18:12 - ERROR - stderr - +2025-02-05 19:18:12 - INFO - stdout - {'loss': 0.6732, 'grad_norm': 1.1814661026000977, 'learning_rate': 1.3604337785730732e-05, 'epoch': 1.2} +2025-02-05 19:18:12 - ERROR - stderr - 40%|████ | 9000/22434 [9:10:32<9:41:21, 2.60s/it] +2025-02-05 19:18:15 - ERROR - stderr - 40%|████ | 9001/22434 [9:10:34<9:34:46, 2.57s/it] +2025-02-05 19:18:15 - ERROR - stderr - +2025-02-05 19:18:15 - ERROR - stderr - +2025-02-05 19:18:15 - INFO - stdout - {'loss': 0.8782, 'grad_norm': 1.364017367362976, 'learning_rate': 1.3602991043258795e-05, 'epoch': 1.2} +2025-02-05 19:18:15 - ERROR - stderr - 40%|████ | 9001/22434 [9:10:34<9:34:46, 2.57s/it] +2025-02-05 19:18:17 - ERROR - stderr - 40%|████ | 9002/22434 [9:10:37<9:45:40, 2.62s/it] +2025-02-05 19:18:17 - ERROR - stderr - +2025-02-05 19:18:17 - ERROR - stderr - +2025-02-05 19:18:17 - INFO - stdout - {'loss': 0.6955, 'grad_norm': 1.2103763818740845, 'learning_rate': 1.3601644225685963e-05, 'epoch': 1.2} +2025-02-05 19:18:17 - ERROR - stderr - 40%|████ | 9002/22434 [9:10:37<9:45:40, 2.62s/it] +2025-02-05 19:18:20 - ERROR - stderr - 40%|████ | 9003/22434 [9:10:40<9:58:45, 2.67s/it] +2025-02-05 19:18:20 - ERROR - stderr - +2025-02-05 19:18:20 - ERROR - stderr - +2025-02-05 19:18:20 - INFO - stdout - {'loss': 0.6409, 'grad_norm': 1.098138451576233, 'learning_rate': 1.36002973330403e-05, 'epoch': 1.2} +2025-02-05 19:18:20 - ERROR - stderr - 40%|████ | 9003/22434 [9:10:40<9:58:45, 2.67s/it] +2025-02-05 19:18:23 - ERROR - stderr - 40%|████ | 9004/22434 [9:10:42<9:48:34, 2.63s/it] +2025-02-05 19:18:23 - ERROR - stderr - +2025-02-05 19:18:23 - ERROR - stderr - +2025-02-05 
19:18:23 - INFO - stdout - {'loss': 0.7057, 'grad_norm': 1.1424403190612793, 'learning_rate': 1.3598950365349884e-05, 'epoch': 1.2} +2025-02-05 19:18:23 - ERROR - stderr - 40%|████ | 9004/22434 [9:10:42<9:48:34, 2.63s/it] +2025-02-05 19:18:25 - ERROR - stderr - 40%|████ | 9005/22434 [9:10:45<9:38:23, 2.58s/it] +2025-02-05 19:18:25 - ERROR - stderr - +2025-02-05 19:18:25 - ERROR - stderr - +2025-02-05 19:18:25 - INFO - stdout - {'loss': 0.6874, 'grad_norm': 1.1962815523147583, 'learning_rate': 1.3597603322642791e-05, 'epoch': 1.2} +2025-02-05 19:18:25 - ERROR - stderr - 40%|████ | 9005/22434 [9:10:45<9:38:23, 2.58s/it] +2025-02-05 19:18:28 - ERROR - stderr - 40%|████ | 9006/22434 [9:10:47<9:29:26, 2.54s/it] +2025-02-05 19:18:28 - ERROR - stderr - +2025-02-05 19:18:28 - ERROR - stderr - +2025-02-05 19:18:28 - INFO - stdout - {'loss': 0.7199, 'grad_norm': 1.0450505018234253, 'learning_rate': 1.3596256204947098e-05, 'epoch': 1.2} +2025-02-05 19:18:28 - ERROR - stderr - 40%|████ | 9006/22434 [9:10:47<9:29:26, 2.54s/it] +2025-02-05 19:18:30 - ERROR - stderr - 40%|████ | 9007/22434 [9:10:50<9:27:29, 2.54s/it] +2025-02-05 19:18:30 - ERROR - stderr - +2025-02-05 19:18:30 - ERROR - stderr - +2025-02-05 19:18:30 - INFO - stdout - {'loss': 0.6696, 'grad_norm': 1.111109972000122, 'learning_rate': 1.3594909012290889e-05, 'epoch': 1.2} +2025-02-05 19:18:30 - ERROR - stderr - 40%|████ | 9007/22434 [9:10:50<9:27:29, 2.54s/it] +2025-02-05 19:18:33 - ERROR - stderr - 40%|████ | 9008/22434 [9:10:52<9:27:42, 2.54s/it] +2025-02-05 19:18:33 - ERROR - stderr - +2025-02-05 19:18:33 - ERROR - stderr - +2025-02-05 19:18:33 - INFO - stdout - {'loss': 0.7348, 'grad_norm': 1.2168430089950562, 'learning_rate': 1.3593561744702241e-05, 'epoch': 1.2} +2025-02-05 19:18:33 - ERROR - stderr - 40%|████ | 9008/22434 [9:10:52<9:27:42, 2.54s/it] +2025-02-05 19:18:35 - ERROR - stderr - 40%|████ | 9009/22434 [9:10:55<9:19:17, 2.50s/it] +2025-02-05 19:18:35 - ERROR - stderr - +2025-02-05 19:18:35 - ERROR - 
stderr - +2025-02-05 19:18:35 - INFO - stdout - {'loss': 0.6228, 'grad_norm': 1.294054388999939, 'learning_rate': 1.3592214402209236e-05, 'epoch': 1.2} +2025-02-05 19:18:35 - ERROR - stderr - 40%|████ | 9009/22434 [9:10:55<9:19:17, 2.50s/it] +2025-02-05 19:18:38 - ERROR - stderr - 40%|████ | 9010/22434 [9:10:57<9:19:30, 2.50s/it] +2025-02-05 19:18:38 - ERROR - stderr - +2025-02-05 19:18:38 - ERROR - stderr - +2025-02-05 19:18:38 - INFO - stdout - {'loss': 0.6602, 'grad_norm': 0.956149160861969, 'learning_rate': 1.3590866984839959e-05, 'epoch': 1.2} +2025-02-05 19:18:38 - ERROR - stderr - 40%|████ | 9010/22434 [9:10:57<9:19:30, 2.50s/it] +2025-02-05 19:18:40 - ERROR - stderr - 40%|████ | 9011/22434 [9:11:00<9:23:56, 2.52s/it] +2025-02-05 19:18:40 - ERROR - stderr - +2025-02-05 19:18:40 - ERROR - stderr - +2025-02-05 19:18:40 - INFO - stdout - {'loss': 0.6943, 'grad_norm': 1.1104375123977661, 'learning_rate': 1.3589519492622496e-05, 'epoch': 1.21} +2025-02-05 19:18:40 - ERROR - stderr - 40%|████ | 9011/22434 [9:11:00<9:23:56, 2.52s/it] +2025-02-05 19:18:43 - ERROR - stderr - 40%|████ | 9012/22434 [9:11:02<9:18:32, 2.50s/it] +2025-02-05 19:18:43 - ERROR - stderr - +2025-02-05 19:18:43 - ERROR - stderr - +2025-02-05 19:18:43 - INFO - stdout - {'loss': 0.6798, 'grad_norm': 1.0774385929107666, 'learning_rate': 1.3588171925584935e-05, 'epoch': 1.21} +2025-02-05 19:18:43 - ERROR - stderr - 40%|████ | 9012/22434 [9:11:02<9:18:32, 2.50s/it] +2025-02-05 19:18:45 - ERROR - stderr - 40%|████ | 9013/22434 [9:11:05<9:23:06, 2.52s/it] +2025-02-05 19:18:45 - ERROR - stderr - +2025-02-05 19:18:45 - ERROR - stderr - +2025-02-05 19:18:45 - INFO - stdout - {'loss': 0.7376, 'grad_norm': 1.214463233947754, 'learning_rate': 1.3586824283755362e-05, 'epoch': 1.21} +2025-02-05 19:18:45 - ERROR - stderr - 40%|████ | 9013/22434 [9:11:05<9:23:06, 2.52s/it] +2025-02-05 19:18:48 - ERROR - stderr - 40%|████ | 9014/22434 [9:11:07<9:22:56, 2.52s/it] +2025-02-05 19:18:48 - ERROR - stderr - 
+2025-02-05 19:18:48 - ERROR - stderr - +2025-02-05 19:18:48 - INFO - stdout - {'loss': 0.7964, 'grad_norm': 1.1363368034362793, 'learning_rate': 1.358547656716187e-05, 'epoch': 1.21} +2025-02-05 19:18:48 - ERROR - stderr - 40%|████ | 9014/22434 [9:11:07<9:22:56, 2.52s/it] +2025-02-05 19:18:50 - ERROR - stderr - 40%|████ | 9015/22434 [9:11:10<9:20:03, 2.50s/it] +2025-02-05 19:18:50 - ERROR - stderr - +2025-02-05 19:18:50 - ERROR - stderr - +2025-02-05 19:18:50 - INFO - stdout - {'loss': 0.7599, 'grad_norm': 1.2589733600616455, 'learning_rate': 1.358412877583255e-05, 'epoch': 1.21} +2025-02-05 19:18:50 - ERROR - stderr - 40%|████ | 9015/22434 [9:11:10<9:20:03, 2.50s/it] +2025-02-05 19:18:53 - ERROR - stderr - 40%|████ | 9016/22434 [9:11:12<9:24:36, 2.52s/it] +2025-02-05 19:18:53 - ERROR - stderr - +2025-02-05 19:18:53 - ERROR - stderr - +2025-02-05 19:18:53 - INFO - stdout - {'loss': 0.6943, 'grad_norm': 1.1610214710235596, 'learning_rate': 1.3582780909795497e-05, 'epoch': 1.21} +2025-02-05 19:18:53 - ERROR - stderr - 40%|████ | 9016/22434 [9:11:12<9:24:36, 2.52s/it] +2025-02-05 19:18:55 - ERROR - stderr - 40%|████ | 9017/22434 [9:11:15<9:38:38, 2.59s/it] +2025-02-05 19:18:55 - ERROR - stderr - +2025-02-05 19:18:55 - ERROR - stderr - +2025-02-05 19:18:55 - INFO - stdout - {'loss': 0.7636, 'grad_norm': 1.2416789531707764, 'learning_rate': 1.3581432969078803e-05, 'epoch': 1.21} +2025-02-05 19:18:55 - ERROR - stderr - 40%|████ | 9017/22434 [9:11:15<9:38:38, 2.59s/it] +2025-02-05 19:18:58 - ERROR - stderr - 40%|████ | 9018/22434 [9:11:18<9:28:01, 2.54s/it] +2025-02-05 19:18:58 - ERROR - stderr - +2025-02-05 19:18:58 - ERROR - stderr - +2025-02-05 19:18:58 - INFO - stdout - {'loss': 0.6766, 'grad_norm': 1.1754951477050781, 'learning_rate': 1.3580084953710564e-05, 'epoch': 1.21} +2025-02-05 19:18:58 - ERROR - stderr - 40%|████ | 9018/22434 [9:11:18<9:28:01, 2.54s/it] +2025-02-05 19:19:00 - ERROR - stderr - 40%|████ | 9019/22434 [9:11:20<9:27:29, 2.54s/it] +2025-02-05 
19:19:00 - ERROR - stderr - +2025-02-05 19:19:00 - ERROR - stderr - +2025-02-05 19:19:00 - INFO - stdout - {'loss': 0.7567, 'grad_norm': 1.2357126474380493, 'learning_rate': 1.3578736863718879e-05, 'epoch': 1.21} +2025-02-05 19:19:00 - ERROR - stderr - 40%|████ | 9019/22434 [9:11:20<9:27:29, 2.54s/it] +2025-02-05 19:19:03 - ERROR - stderr - 40%|████ | 9020/22434 [9:11:23<9:23:01, 2.52s/it] +2025-02-05 19:19:03 - ERROR - stderr - +2025-02-05 19:19:03 - ERROR - stderr - +2025-02-05 19:19:03 - INFO - stdout - {'loss': 0.6947, 'grad_norm': 1.049296498298645, 'learning_rate': 1.3577388699131852e-05, 'epoch': 1.21} +2025-02-05 19:19:03 - ERROR - stderr - 40%|████ | 9020/22434 [9:11:23<9:23:01, 2.52s/it] +2025-02-05 19:19:05 - ERROR - stderr - 40%|████ | 9021/22434 [9:11:25<9:23:56, 2.52s/it] +2025-02-05 19:19:05 - ERROR - stderr - +2025-02-05 19:19:05 - ERROR - stderr - +2025-02-05 19:19:05 - INFO - stdout - {'loss': 0.8127, 'grad_norm': 1.3287372589111328, 'learning_rate': 1.3576040459977579e-05, 'epoch': 1.21} +2025-02-05 19:19:05 - ERROR - stderr - 40%|████ | 9021/22434 [9:11:25<9:23:56, 2.52s/it] +2025-02-05 19:19:08 - ERROR - stderr - 40%|████ | 9022/22434 [9:11:28<9:28:46, 2.54s/it] +2025-02-05 19:19:08 - ERROR - stderr - +2025-02-05 19:19:08 - ERROR - stderr - +2025-02-05 19:19:08 - INFO - stdout - {'loss': 0.814, 'grad_norm': 1.2081409692764282, 'learning_rate': 1.3574692146284166e-05, 'epoch': 1.21} +2025-02-05 19:19:08 - ERROR - stderr - 40%|████ | 9022/22434 [9:11:28<9:28:46, 2.54s/it] +2025-02-05 19:19:10 - ERROR - stderr - 40%|████ | 9023/22434 [9:11:30<9:25:54, 2.53s/it] +2025-02-05 19:19:11 - ERROR - stderr - +2025-02-05 19:19:11 - ERROR - stderr - +2025-02-05 19:19:11 - INFO - stdout - {'loss': 0.8072, 'grad_norm': 1.1873949766159058, 'learning_rate': 1.3573343758079716e-05, 'epoch': 1.21} +2025-02-05 19:19:11 - ERROR - stderr - 40%|████ | 9023/22434 [9:11:30<9:25:54, 2.53s/it] +2025-02-05 19:19:13 - ERROR - stderr - 40%|████ | 9024/22434 
[9:11:33<9:26:27, 2.53s/it] +2025-02-05 19:19:13 - ERROR - stderr - +2025-02-05 19:19:13 - ERROR - stderr - +2025-02-05 19:19:13 - INFO - stdout - {'loss': 0.8152, 'grad_norm': 1.3341482877731323, 'learning_rate': 1.3571995295392333e-05, 'epoch': 1.21} +2025-02-05 19:19:13 - ERROR - stderr - 40%|████ | 9024/22434 [9:11:33<9:26:27, 2.53s/it] +2025-02-05 19:19:16 - ERROR - stderr - 40%|████ | 9025/22434 [9:11:35<9:26:39, 2.54s/it] +2025-02-05 19:19:16 - ERROR - stderr - +2025-02-05 19:19:16 - ERROR - stderr - +2025-02-05 19:19:16 - INFO - stdout - {'loss': 0.6571, 'grad_norm': 1.1328319311141968, 'learning_rate': 1.3570646758250123e-05, 'epoch': 1.21} +2025-02-05 19:19:16 - ERROR - stderr - 40%|████ | 9025/22434 [9:11:35<9:26:39, 2.54s/it] +2025-02-05 19:19:18 - ERROR - stderr - 40%|████ | 9026/22434 [9:11:38<9:22:59, 2.52s/it] +2025-02-05 19:19:18 - ERROR - stderr - +2025-02-05 19:19:18 - ERROR - stderr - +2025-02-05 19:19:18 - INFO - stdout - {'loss': 0.737, 'grad_norm': 1.2272077798843384, 'learning_rate': 1.3569298146681202e-05, 'epoch': 1.21} +2025-02-05 19:19:18 - ERROR - stderr - 40%|████ | 9026/22434 [9:11:38<9:22:59, 2.52s/it] +2025-02-05 19:19:21 - ERROR - stderr - 40%|████ | 9027/22434 [9:11:40<9:23:46, 2.52s/it] +2025-02-05 19:19:21 - ERROR - stderr - +2025-02-05 19:19:21 - ERROR - stderr - +2025-02-05 19:19:21 - INFO - stdout - {'loss': 0.7397, 'grad_norm': 1.1021692752838135, 'learning_rate': 1.3567949460713678e-05, 'epoch': 1.21} +2025-02-05 19:19:21 - ERROR - stderr - 40%|████ | 9027/22434 [9:11:40<9:23:46, 2.52s/it] +2025-02-05 19:19:23 - ERROR - stderr - 40%|████ | 9028/22434 [9:11:43<9:21:37, 2.51s/it] +2025-02-05 19:19:23 - ERROR - stderr - +2025-02-05 19:19:23 - ERROR - stderr - +2025-02-05 19:19:23 - INFO - stdout - {'loss': 0.7179, 'grad_norm': 1.2565913200378418, 'learning_rate': 1.356660070037566e-05, 'epoch': 1.21} +2025-02-05 19:19:23 - ERROR - stderr - 40%|████ | 9028/22434 [9:11:43<9:21:37, 2.51s/it] +2025-02-05 19:19:26 - ERROR - stderr 
- 40%|████ | 9029/22434 [9:11:45<9:20:35, 2.51s/it] +2025-02-05 19:19:26 - ERROR - stderr - +2025-02-05 19:19:26 - ERROR - stderr - +2025-02-05 19:19:26 - INFO - stdout - {'loss': 0.6684, 'grad_norm': 1.0973520278930664, 'learning_rate': 1.3565251865695263e-05, 'epoch': 1.21} +2025-02-05 19:19:26 - ERROR - stderr - 40%|████ | 9029/22434 [9:11:45<9:20:35, 2.51s/it] +2025-02-05 19:19:28 - ERROR - stderr - 40%|████ | 9030/22434 [9:11:48<9:27:17, 2.54s/it] +2025-02-05 19:19:28 - ERROR - stderr - +2025-02-05 19:19:28 - ERROR - stderr - +2025-02-05 19:19:28 - INFO - stdout - {'loss': 0.7182, 'grad_norm': 1.0415103435516357, 'learning_rate': 1.3563902956700603e-05, 'epoch': 1.21} +2025-02-05 19:19:28 - ERROR - stderr - 40%|████ | 9030/22434 [9:11:48<9:27:17, 2.54s/it] +2025-02-05 19:19:31 - ERROR - stderr - 40%|████ | 9031/22434 [9:11:50<9:27:21, 2.54s/it] +2025-02-05 19:19:31 - ERROR - stderr - +2025-02-05 19:19:31 - ERROR - stderr - +2025-02-05 19:19:31 - INFO - stdout - {'loss': 0.7067, 'grad_norm': 1.2298235893249512, 'learning_rate': 1.3562553973419796e-05, 'epoch': 1.21} +2025-02-05 19:19:31 - ERROR - stderr - 40%|████ | 9031/22434 [9:11:51<9:27:21, 2.54s/it] +2025-02-05 19:19:33 - ERROR - stderr - 40%|████ | 9032/22434 [9:11:53<9:24:12, 2.53s/it] +2025-02-05 19:19:33 - ERROR - stderr - +2025-02-05 19:19:33 - ERROR - stderr - +2025-02-05 19:19:33 - INFO - stdout - {'loss': 0.713, 'grad_norm': 1.2295136451721191, 'learning_rate': 1.3561204915880958e-05, 'epoch': 1.21} +2025-02-05 19:19:33 - ERROR - stderr - 40%|████ | 9032/22434 [9:11:53<9:24:12, 2.53s/it] +2025-02-05 19:19:36 - ERROR - stderr - 40%|████ | 9033/22434 [9:11:56<9:25:46, 2.53s/it] +2025-02-05 19:19:36 - ERROR - stderr - +2025-02-05 19:19:36 - ERROR - stderr - +2025-02-05 19:19:36 - INFO - stdout - {'loss': 0.6372, 'grad_norm': 1.1624665260314941, 'learning_rate': 1.3559855784112215e-05, 'epoch': 1.21} +2025-02-05 19:19:36 - ERROR - stderr - 40%|████ | 9033/22434 [9:11:56<9:25:46, 2.53s/it] +2025-02-05 
19:19:38 - ERROR - stderr - 40%|████ | 9034/22434 [9:11:58<9:24:26, 2.53s/it] +2025-02-05 19:19:38 - ERROR - stderr - +2025-02-05 19:19:38 - ERROR - stderr - +2025-02-05 19:19:38 - INFO - stdout - {'loss': 0.7635, 'grad_norm': 1.2671111822128296, 'learning_rate': 1.3558506578141683e-05, 'epoch': 1.21} +2025-02-05 19:19:38 - ERROR - stderr - 40%|████ | 9034/22434 [9:11:58<9:24:26, 2.53s/it] +2025-02-05 19:19:41 - ERROR - stderr - 40%|████ | 9035/22434 [9:12:01<9:23:16, 2.52s/it] +2025-02-05 19:19:41 - ERROR - stderr - +2025-02-05 19:19:41 - ERROR - stderr - +2025-02-05 19:19:41 - INFO - stdout - {'loss': 0.7431, 'grad_norm': 1.1694306135177612, 'learning_rate': 1.3557157297997487e-05, 'epoch': 1.21} +2025-02-05 19:19:41 - ERROR - stderr - 40%|████ | 9035/22434 [9:12:01<9:23:16, 2.52s/it] +2025-02-05 19:19:43 - ERROR - stderr - 40%|████ | 9036/22434 [9:12:03<9:24:26, 2.53s/it] +2025-02-05 19:19:43 - ERROR - stderr - +2025-02-05 19:19:43 - ERROR - stderr - +2025-02-05 19:19:43 - INFO - stdout - {'loss': 0.6743, 'grad_norm': 1.1366825103759766, 'learning_rate': 1.3555807943707752e-05, 'epoch': 1.21} +2025-02-05 19:19:43 - ERROR - stderr - 40%|████ | 9036/22434 [9:12:03<9:24:26, 2.53s/it] +2025-02-05 19:19:46 - ERROR - stderr - 40%|████ | 9037/22434 [9:12:06<9:22:22, 2.52s/it] +2025-02-05 19:19:46 - ERROR - stderr - +2025-02-05 19:19:46 - ERROR - stderr - +2025-02-05 19:19:46 - INFO - stdout - {'loss': 0.6857, 'grad_norm': 1.1026197671890259, 'learning_rate': 1.3554458515300602e-05, 'epoch': 1.21} +2025-02-05 19:19:46 - ERROR - stderr - 40%|████ | 9037/22434 [9:12:06<9:22:22, 2.52s/it] +2025-02-05 19:19:48 - ERROR - stderr - 40%|████ | 9038/22434 [9:12:08<9:19:17, 2.51s/it] +2025-02-05 19:19:48 - ERROR - stderr - +2025-02-05 19:19:48 - ERROR - stderr - +2025-02-05 19:19:48 - INFO - stdout - {'loss': 0.6505, 'grad_norm': 1.066704511642456, 'learning_rate': 1.3553109012804162e-05, 'epoch': 1.21} +2025-02-05 19:19:48 - ERROR - stderr - 40%|████ | 9038/22434 
[9:12:08<9:19:17, 2.51s/it] +2025-02-05 19:19:51 - ERROR - stderr - 40%|████ | 9039/22434 [9:12:11<9:20:34, 2.51s/it] +2025-02-05 19:19:51 - ERROR - stderr - +2025-02-05 19:19:51 - ERROR - stderr - +2025-02-05 19:19:51 - INFO - stdout - {'loss': 0.7318, 'grad_norm': 1.0710009336471558, 'learning_rate': 1.3551759436246568e-05, 'epoch': 1.21} +2025-02-05 19:19:51 - ERROR - stderr - 40%|████ | 9039/22434 [9:12:11<9:20:34, 2.51s/it] +2025-02-05 19:19:53 - ERROR - stderr - 40%|████ | 9040/22434 [9:12:13<9:15:37, 2.49s/it] +2025-02-05 19:19:53 - ERROR - stderr - +2025-02-05 19:19:53 - ERROR - stderr - +2025-02-05 19:19:53 - INFO - stdout - {'loss': 0.7892, 'grad_norm': 1.268571138381958, 'learning_rate': 1.3550409785655947e-05, 'epoch': 1.21} +2025-02-05 19:19:53 - ERROR - stderr - 40%|████ | 9040/22434 [9:12:13<9:15:37, 2.49s/it] +2025-02-05 19:19:56 - ERROR - stderr - 40%|████ | 9041/22434 [9:12:15<9:12:09, 2.47s/it] +2025-02-05 19:19:56 - ERROR - stderr - +2025-02-05 19:19:56 - ERROR - stderr - +2025-02-05 19:19:56 - INFO - stdout - {'loss': 0.7881, 'grad_norm': 1.2657946348190308, 'learning_rate': 1.3549060061060431e-05, 'epoch': 1.21} +2025-02-05 19:19:56 - ERROR - stderr - 40%|████ | 9041/22434 [9:12:16<9:12:09, 2.47s/it] +2025-02-05 19:19:58 - ERROR - stderr - 40%|████ | 9042/22434 [9:12:18<9:14:58, 2.49s/it] +2025-02-05 19:19:58 - ERROR - stderr - +2025-02-05 19:19:58 - ERROR - stderr - +2025-02-05 19:19:58 - INFO - stdout - {'loss': 0.7144, 'grad_norm': 1.124334454536438, 'learning_rate': 1.3547710262488154e-05, 'epoch': 1.21} +2025-02-05 19:19:58 - ERROR - stderr - 40%|████ | 9042/22434 [9:12:18<9:14:58, 2.49s/it] +2025-02-05 19:20:01 - ERROR - stderr - 40%|████ | 9043/22434 [9:12:21<9:18:46, 2.50s/it] +2025-02-05 19:20:01 - ERROR - stderr - +2025-02-05 19:20:01 - ERROR - stderr - +2025-02-05 19:20:01 - INFO - stdout - {'loss': 0.6834, 'grad_norm': 1.1629618406295776, 'learning_rate': 1.3546360389967252e-05, 'epoch': 1.21} +2025-02-05 19:20:01 - ERROR - stderr 
- 40%|████ | 9043/22434 [9:12:21<9:18:46, 2.50s/it] +2025-02-05 19:20:03 - ERROR - stderr - 40%|████ | 9044/22434 [9:12:23<9:20:15, 2.51s/it] +2025-02-05 19:20:03 - ERROR - stderr - +2025-02-05 19:20:03 - ERROR - stderr - +2025-02-05 19:20:03 - INFO - stdout - {'loss': 0.7259, 'grad_norm': 1.185330867767334, 'learning_rate': 1.354501044352586e-05, 'epoch': 1.21} +2025-02-05 19:20:03 - ERROR - stderr - 40%|████ | 9044/22434 [9:12:23<9:20:15, 2.51s/it] +2025-02-05 19:20:06 - ERROR - stderr - 40%|████ | 9045/22434 [9:12:25<9:15:25, 2.49s/it] +2025-02-05 19:20:06 - ERROR - stderr - +2025-02-05 19:20:06 - ERROR - stderr - +2025-02-05 19:20:06 - INFO - stdout - {'loss': 0.6733, 'grad_norm': 1.1155431270599365, 'learning_rate': 1.3543660423192117e-05, 'epoch': 1.21} +2025-02-05 19:20:06 - ERROR - stderr - 40%|████ | 9045/22434 [9:12:26<9:15:25, 2.49s/it] +2025-02-05 19:20:08 - ERROR - stderr - 40%|████ | 9046/22434 [9:12:28<9:16:29, 2.49s/it] +2025-02-05 19:20:08 - ERROR - stderr - +2025-02-05 19:20:08 - ERROR - stderr - +2025-02-05 19:20:08 - INFO - stdout - {'loss': 0.6995, 'grad_norm': 1.1219710111618042, 'learning_rate': 1.3542310328994166e-05, 'epoch': 1.21} +2025-02-05 19:20:08 - ERROR - stderr - 40%|████ | 9046/22434 [9:12:28<9:16:29, 2.49s/it] +2025-02-05 19:20:11 - ERROR - stderr - 40%|████ | 9047/22434 [9:12:30<9:13:48, 2.48s/it] +2025-02-05 19:20:11 - ERROR - stderr - +2025-02-05 19:20:11 - ERROR - stderr - +2025-02-05 19:20:11 - INFO - stdout - {'loss': 0.7953, 'grad_norm': 1.2864099740982056, 'learning_rate': 1.3540960160960147e-05, 'epoch': 1.21} +2025-02-05 19:20:11 - ERROR - stderr - 40%|████ | 9047/22434 [9:12:30<9:13:48, 2.48s/it] +2025-02-05 19:20:13 - ERROR - stderr - 40%|████ | 9048/22434 [9:12:33<9:13:24, 2.48s/it] +2025-02-05 19:20:13 - ERROR - stderr - +2025-02-05 19:20:13 - ERROR - stderr - +2025-02-05 19:20:13 - INFO - stdout - {'loss': 0.6712, 'grad_norm': 1.0919420719146729, 'learning_rate': 1.3539609919118197e-05, 'epoch': 1.21} +2025-02-05 
19:20:13 - ERROR - stderr - 40%|████ | 9048/22434 [9:12:33<9:13:24, 2.48s/it] +2025-02-05 19:20:16 - ERROR - stderr - 40%|████ | 9049/22434 [9:12:36<9:34:02, 2.57s/it] +2025-02-05 19:20:16 - ERROR - stderr - +2025-02-05 19:20:16 - ERROR - stderr - +2025-02-05 19:20:16 - INFO - stdout - {'loss': 0.7563, 'grad_norm': 1.3224000930786133, 'learning_rate': 1.3538259603496469e-05, 'epoch': 1.21} +2025-02-05 19:20:16 - ERROR - stderr - 40%|████ | 9049/22434 [9:12:36<9:34:02, 2.57s/it] +2025-02-05 19:20:18 - ERROR - stderr - 40%|████ | 9050/22434 [9:12:38<9:28:30, 2.55s/it] +2025-02-05 19:20:18 - ERROR - stderr - +2025-02-05 19:20:18 - ERROR - stderr - +2025-02-05 19:20:18 - INFO - stdout - {'loss': 0.7004, 'grad_norm': 1.1616071462631226, 'learning_rate': 1.3536909214123104e-05, 'epoch': 1.21} +2025-02-05 19:20:18 - ERROR - stderr - 40%|████ | 9050/22434 [9:12:38<9:28:30, 2.55s/it] +2025-02-05 19:20:21 - ERROR - stderr - 40%|████ | 9051/22434 [9:12:41<9:24:22, 2.53s/it] +2025-02-05 19:20:21 - ERROR - stderr - +2025-02-05 19:20:21 - ERROR - stderr - +2025-02-05 19:20:21 - INFO - stdout - {'loss': 0.7057, 'grad_norm': 1.0560840368270874, 'learning_rate': 1.353555875102625e-05, 'epoch': 1.21} +2025-02-05 19:20:21 - ERROR - stderr - 40%|████ | 9051/22434 [9:12:41<9:24:22, 2.53s/it] +2025-02-05 19:20:23 - ERROR - stderr - 40%|████ | 9052/22434 [9:12:43<9:23:03, 2.52s/it] +2025-02-05 19:20:23 - ERROR - stderr - +2025-02-05 19:20:23 - ERROR - stderr - +2025-02-05 19:20:23 - INFO - stdout - {'loss': 0.7171, 'grad_norm': 1.0857654809951782, 'learning_rate': 1.3534208214234057e-05, 'epoch': 1.21} +2025-02-05 19:20:23 - ERROR - stderr - 40%|████ | 9052/22434 [9:12:43<9:23:03, 2.52s/it] +2025-02-05 19:20:26 - ERROR - stderr - 40%|████ | 9053/22434 [9:12:46<9:26:58, 2.54s/it] +2025-02-05 19:20:26 - ERROR - stderr - +2025-02-05 19:20:26 - ERROR - stderr - +2025-02-05 19:20:26 - INFO - stdout - {'loss': 0.7329, 'grad_norm': 1.1566907167434692, 'learning_rate': 1.3532857603774676e-05, 
'epoch': 1.21} +2025-02-05 19:20:26 - ERROR - stderr - 40%|████ | 9053/22434 [9:12:46<9:26:58, 2.54s/it] +2025-02-05 19:20:29 - ERROR - stderr - 40%|████ | 9054/22434 [9:12:48<9:28:01, 2.55s/it] +2025-02-05 19:20:29 - ERROR - stderr - +2025-02-05 19:20:29 - ERROR - stderr - +2025-02-05 19:20:29 - INFO - stdout - {'loss': 0.7174, 'grad_norm': 1.2094645500183105, 'learning_rate': 1.3531506919676259e-05, 'epoch': 1.21} +2025-02-05 19:20:29 - ERROR - stderr - 40%|████ | 9054/22434 [9:12:48<9:28:01, 2.55s/it] +2025-02-05 19:20:31 - ERROR - stderr - 40%|████ | 9055/22434 [9:12:51<9:27:36, 2.55s/it] +2025-02-05 19:20:31 - ERROR - stderr - +2025-02-05 19:20:31 - ERROR - stderr - +2025-02-05 19:20:31 - INFO - stdout - {'loss': 0.7195, 'grad_norm': 1.1702481508255005, 'learning_rate': 1.3530156161966961e-05, 'epoch': 1.21} +2025-02-05 19:20:31 - ERROR - stderr - 40%|████ | 9055/22434 [9:12:51<9:27:36, 2.55s/it] +2025-02-05 19:20:34 - ERROR - stderr - 40%|████ | 9056/22434 [9:12:53<9:25:44, 2.54s/it] +2025-02-05 19:20:34 - ERROR - stderr - +2025-02-05 19:20:34 - ERROR - stderr - +2025-02-05 19:20:34 - INFO - stdout - {'loss': 0.6701, 'grad_norm': 1.174026608467102, 'learning_rate': 1.3528805330674934e-05, 'epoch': 1.21} +2025-02-05 19:20:34 - ERROR - stderr - 40%|████ | 9056/22434 [9:12:53<9:25:44, 2.54s/it] +2025-02-05 19:20:36 - ERROR - stderr - 40%|████ | 9057/22434 [9:12:56<9:40:19, 2.60s/it] +2025-02-05 19:20:36 - ERROR - stderr - +2025-02-05 19:20:36 - ERROR - stderr - +2025-02-05 19:20:36 - INFO - stdout - {'loss': 0.909, 'grad_norm': 1.3334025144577026, 'learning_rate': 1.3527454425828336e-05, 'epoch': 1.21} +2025-02-05 19:20:36 - ERROR - stderr - 40%|████ | 9057/22434 [9:12:56<9:40:19, 2.60s/it] +2025-02-05 19:20:39 - ERROR - stderr - 40%|████ | 9058/22434 [9:12:59<10:00:01, 2.69s/it] +2025-02-05 19:20:39 - ERROR - stderr - +2025-02-05 19:20:39 - ERROR - stderr - +2025-02-05 19:20:39 - INFO - stdout - {'loss': 0.6869, 'grad_norm': 1.1262354850769043, 'learning_rate': 
1.3526103447455326e-05, 'epoch': 1.21} +2025-02-05 19:20:39 - ERROR - stderr - 40%|████ | 9058/22434 [9:12:59<10:00:01, 2.69s/it] +2025-02-05 19:20:42 - ERROR - stderr - 40%|████ | 9059/22434 [9:13:02<9:49:20, 2.64s/it] +2025-02-05 19:20:42 - ERROR - stderr - +2025-02-05 19:20:42 - ERROR - stderr - +2025-02-05 19:20:42 - INFO - stdout - {'loss': 0.7785, 'grad_norm': 1.2559438943862915, 'learning_rate': 1.3524752395584066e-05, 'epoch': 1.21} +2025-02-05 19:20:42 - ERROR - stderr - 40%|████ | 9059/22434 [9:13:02<9:49:20, 2.64s/it] +2025-02-05 19:20:44 - ERROR - stderr - 40%|████ | 9060/22434 [9:13:04<9:41:57, 2.61s/it] +2025-02-05 19:20:44 - ERROR - stderr - +2025-02-05 19:20:44 - ERROR - stderr - +2025-02-05 19:20:44 - INFO - stdout - {'loss': 0.7159, 'grad_norm': 1.1379504203796387, 'learning_rate': 1.3523401270242715e-05, 'epoch': 1.21} +2025-02-05 19:20:44 - ERROR - stderr - 40%|████ | 9060/22434 [9:13:04<9:41:57, 2.61s/it] +2025-02-05 19:20:47 - ERROR - stderr - 40%|████ | 9061/22434 [9:13:07<9:39:21, 2.60s/it] +2025-02-05 19:20:47 - ERROR - stderr - +2025-02-05 19:20:47 - ERROR - stderr - +2025-02-05 19:20:47 - INFO - stdout - {'loss': 0.652, 'grad_norm': 1.147474765777588, 'learning_rate': 1.3522050071459434e-05, 'epoch': 1.21} +2025-02-05 19:20:47 - ERROR - stderr - 40%|████ | 9061/22434 [9:13:07<9:39:21, 2.60s/it] +2025-02-05 19:20:50 - ERROR - stderr - 40%|████ | 9062/22434 [9:13:09<9:39:51, 2.60s/it] +2025-02-05 19:20:50 - ERROR - stderr - +2025-02-05 19:20:50 - ERROR - stderr - +2025-02-05 19:20:50 - INFO - stdout - {'loss': 0.7718, 'grad_norm': 1.308051347732544, 'learning_rate': 1.352069879926239e-05, 'epoch': 1.21} +2025-02-05 19:20:50 - ERROR - stderr - 40%|████ | 9062/22434 [9:13:09<9:39:51, 2.60s/it] +2025-02-05 19:20:52 - ERROR - stderr - 40%|████ | 9063/22434 [9:13:12<9:50:02, 2.65s/it] +2025-02-05 19:20:52 - ERROR - stderr - +2025-02-05 19:20:52 - ERROR - stderr - +2025-02-05 19:20:52 - INFO - stdout - {'loss': 0.6957, 'grad_norm': 
1.209902048110962, 'learning_rate': 1.351934745367975e-05, 'epoch': 1.21} +2025-02-05 19:20:52 - ERROR - stderr - 40%|████ | 9063/22434 [9:13:12<9:50:02, 2.65s/it] +2025-02-05 19:20:55 - ERROR - stderr - 40%|████ | 9064/22434 [9:13:15<10:04:29, 2.71s/it] +2025-02-05 19:20:55 - ERROR - stderr - +2025-02-05 19:20:55 - ERROR - stderr - +2025-02-05 19:20:55 - INFO - stdout - {'loss': 0.7052, 'grad_norm': 1.087220549583435, 'learning_rate': 1.3517996034739678e-05, 'epoch': 1.21} +2025-02-05 19:20:55 - ERROR - stderr - 40%|████ | 9064/22434 [9:13:15<10:04:29, 2.71s/it] +2025-02-05 19:20:58 - ERROR - stderr - 40%|████ | 9065/22434 [9:13:17<9:53:06, 2.66s/it] +2025-02-05 19:20:58 - ERROR - stderr - +2025-02-05 19:20:58 - ERROR - stderr - +2025-02-05 19:20:58 - INFO - stdout - {'loss': 0.7367, 'grad_norm': 1.1273982524871826, 'learning_rate': 1.3516644542470346e-05, 'epoch': 1.21} +2025-02-05 19:20:58 - ERROR - stderr - 40%|████ | 9065/22434 [9:13:18<9:53:06, 2.66s/it] +2025-02-05 19:21:00 - ERROR - stderr - 40%|████ | 9066/22434 [9:13:20<9:42:20, 2.61s/it] +2025-02-05 19:21:00 - ERROR - stderr - +2025-02-05 19:21:00 - ERROR - stderr - +2025-02-05 19:21:00 - INFO - stdout - {'loss': 0.6827, 'grad_norm': 1.0759533643722534, 'learning_rate': 1.3515292976899922e-05, 'epoch': 1.21} +2025-02-05 19:21:00 - ERROR - stderr - 40%|████ | 9066/22434 [9:13:20<9:42:20, 2.61s/it] +2025-02-05 19:21:03 - ERROR - stderr - 40%|████ | 9067/22434 [9:13:23<9:44:29, 2.62s/it] +2025-02-05 19:21:03 - ERROR - stderr - +2025-02-05 19:21:03 - ERROR - stderr - +2025-02-05 19:21:03 - INFO - stdout - {'loss': 0.6992, 'grad_norm': 1.2487528324127197, 'learning_rate': 1.3513941338056584e-05, 'epoch': 1.21} +2025-02-05 19:21:03 - ERROR - stderr - 40%|████ | 9067/22434 [9:13:23<9:44:29, 2.62s/it] +2025-02-05 19:21:05 - ERROR - stderr - 40%|████ | 9068/22434 [9:13:25<9:33:08, 2.57s/it] +2025-02-05 19:21:05 - ERROR - stderr - +2025-02-05 19:21:05 - ERROR - stderr - +2025-02-05 19:21:05 - INFO - stdout - 
{'loss': 0.7265, 'grad_norm': 1.2670923471450806, 'learning_rate': 1.35125896259685e-05, 'epoch': 1.21} +2025-02-05 19:21:05 - ERROR - stderr - 40%|████ | 9068/22434 [9:13:25<9:33:08, 2.57s/it] +2025-02-05 19:21:08 - ERROR - stderr - 40%|████ | 9069/22434 [9:13:28<9:49:23, 2.65s/it] +2025-02-05 19:21:08 - ERROR - stderr - +2025-02-05 19:21:08 - ERROR - stderr - +2025-02-05 19:21:08 - INFO - stdout - {'loss': 0.7012, 'grad_norm': 1.2367582321166992, 'learning_rate': 1.3511237840663842e-05, 'epoch': 1.21} +2025-02-05 19:21:08 - ERROR - stderr - 40%|████ | 9069/22434 [9:13:28<9:49:23, 2.65s/it] +2025-02-05 19:21:11 - ERROR - stderr - 40%|████ | 9070/22434 [9:13:30<9:45:16, 2.63s/it] +2025-02-05 19:21:11 - ERROR - stderr - +2025-02-05 19:21:11 - ERROR - stderr - +2025-02-05 19:21:11 - INFO - stdout - {'loss': 0.7631, 'grad_norm': 1.2065536975860596, 'learning_rate': 1.3509885982170793e-05, 'epoch': 1.21} +2025-02-05 19:21:11 - ERROR - stderr - 40%|████ | 9070/22434 [9:13:31<9:45:16, 2.63s/it] +2025-02-05 19:21:13 - ERROR - stderr - 40%|████ | 9071/22434 [9:13:33<9:34:46, 2.58s/it] +2025-02-05 19:21:13 - ERROR - stderr - +2025-02-05 19:21:13 - ERROR - stderr - +2025-02-05 19:21:13 - INFO - stdout - {'loss': 0.7024, 'grad_norm': 1.2679996490478516, 'learning_rate': 1.3508534050517532e-05, 'epoch': 1.21} +2025-02-05 19:21:13 - ERROR - stderr - 40%|████ | 9071/22434 [9:13:33<9:34:46, 2.58s/it] +2025-02-05 19:21:16 - ERROR - stderr - 40%|████ | 9072/22434 [9:13:35<9:27:50, 2.55s/it] +2025-02-05 19:21:16 - ERROR - stderr - +2025-02-05 19:21:16 - ERROR - stderr - +2025-02-05 19:21:16 - INFO - stdout - {'loss': 0.6703, 'grad_norm': 1.0177645683288574, 'learning_rate': 1.3507182045732235e-05, 'epoch': 1.21} +2025-02-05 19:21:16 - ERROR - stderr - 40%|████ | 9072/22434 [9:13:35<9:27:50, 2.55s/it] +2025-02-05 19:21:18 - ERROR - stderr - 40%|████ | 9073/22434 [9:13:38<9:26:42, 2.54s/it] +2025-02-05 19:21:18 - ERROR - stderr - +2025-02-05 19:21:18 - ERROR - stderr - +2025-02-05 
19:21:18 - INFO - stdout - {'loss': 0.6614, 'grad_norm': 1.1901986598968506, 'learning_rate': 1.3505829967843083e-05, 'epoch': 1.21} +2025-02-05 19:21:18 - ERROR - stderr - 40%|████ | 9073/22434 [9:13:38<9:26:42, 2.54s/it] +2025-02-05 19:21:21 - ERROR - stderr - 40%|████ | 9074/22434 [9:13:40<9:26:09, 2.54s/it] +2025-02-05 19:21:21 - ERROR - stderr - +2025-02-05 19:21:21 - ERROR - stderr - +2025-02-05 19:21:21 - INFO - stdout - {'loss': 0.7073, 'grad_norm': 1.222050666809082, 'learning_rate': 1.350447781687826e-05, 'epoch': 1.21} +2025-02-05 19:21:21 - ERROR - stderr - 40%|████ | 9074/22434 [9:13:41<9:26:09, 2.54s/it] +2025-02-05 19:21:23 - ERROR - stderr - 40%|████ | 9075/22434 [9:13:43<9:21:50, 2.52s/it] +2025-02-05 19:21:23 - ERROR - stderr - +2025-02-05 19:21:23 - ERROR - stderr - +2025-02-05 19:21:23 - INFO - stdout - {'loss': 0.6836, 'grad_norm': 1.0645278692245483, 'learning_rate': 1.3503125592865954e-05, 'epoch': 1.21} +2025-02-05 19:21:23 - ERROR - stderr - 40%|████ | 9075/22434 [9:13:43<9:21:50, 2.52s/it] +2025-02-05 19:21:26 - ERROR - stderr - 40%|████ | 9076/22434 [9:13:46<9:23:32, 2.53s/it] +2025-02-05 19:21:26 - ERROR - stderr - +2025-02-05 19:21:26 - ERROR - stderr - +2025-02-05 19:21:26 - INFO - stdout - {'loss': 0.7145, 'grad_norm': 1.238612174987793, 'learning_rate': 1.3501773295834339e-05, 'epoch': 1.21} +2025-02-05 19:21:26 - ERROR - stderr - 40%|████ | 9076/22434 [9:13:46<9:23:32, 2.53s/it] +2025-02-05 19:21:28 - ERROR - stderr - 40%|████ | 9077/22434 [9:13:48<9:27:02, 2.55s/it] +2025-02-05 19:21:28 - ERROR - stderr - +2025-02-05 19:21:28 - ERROR - stderr - +2025-02-05 19:21:28 - INFO - stdout - {'loss': 0.699, 'grad_norm': 1.1346899271011353, 'learning_rate': 1.3500420925811618e-05, 'epoch': 1.21} +2025-02-05 19:21:28 - ERROR - stderr - 40%|████ | 9077/22434 [9:13:48<9:27:02, 2.55s/it] +2025-02-05 19:21:31 - ERROR - stderr - 40%|████ | 9078/22434 [9:13:51<9:48:10, 2.64s/it] +2025-02-05 19:21:31 - ERROR - stderr - +2025-02-05 19:21:31 - ERROR - 
stderr - +2025-02-05 19:21:31 - INFO - stdout - {'loss': 0.8005, 'grad_norm': 1.2261466979980469, 'learning_rate': 1.3499068482825968e-05, 'epoch': 1.21} +2025-02-05 19:21:31 - ERROR - stderr - 40%|████ | 9078/22434 [9:13:51<9:48:10, 2.64s/it] +2025-02-05 19:21:34 - ERROR - stderr - 40%|████ | 9079/22434 [9:13:53<9:38:53, 2.60s/it] +2025-02-05 19:21:34 - ERROR - stderr - +2025-02-05 19:21:34 - ERROR - stderr - +2025-02-05 19:21:34 - INFO - stdout - {'loss': 0.6888, 'grad_norm': 1.122787356376648, 'learning_rate': 1.349771596690559e-05, 'epoch': 1.21} +2025-02-05 19:21:34 - ERROR - stderr - 40%|████ | 9079/22434 [9:13:54<9:38:53, 2.60s/it] +2025-02-05 19:21:36 - ERROR - stderr - 40%|████ | 9080/22434 [9:13:56<9:32:28, 2.57s/it] +2025-02-05 19:21:36 - ERROR - stderr - +2025-02-05 19:21:36 - ERROR - stderr - +2025-02-05 19:21:36 - INFO - stdout - {'loss': 0.7441, 'grad_norm': 1.140896201133728, 'learning_rate': 1.3496363378078662e-05, 'epoch': 1.21} +2025-02-05 19:21:36 - ERROR - stderr - 40%|████ | 9080/22434 [9:13:56<9:32:28, 2.57s/it] +2025-02-05 19:21:39 - ERROR - stderr - 40%|████ | 9081/22434 [9:13:58<9:25:33, 2.54s/it] +2025-02-05 19:21:39 - ERROR - stderr - +2025-02-05 19:21:39 - ERROR - stderr - +2025-02-05 19:21:39 - INFO - stdout - {'loss': 0.7188, 'grad_norm': 1.0653181076049805, 'learning_rate': 1.349501071637339e-05, 'epoch': 1.21} +2025-02-05 19:21:39 - ERROR - stderr - 40%|████ | 9081/22434 [9:13:58<9:25:33, 2.54s/it] +2025-02-05 19:21:41 - ERROR - stderr - 40%|████ | 9082/22434 [9:14:01<9:38:07, 2.60s/it] +2025-02-05 19:21:41 - ERROR - stderr - +2025-02-05 19:21:41 - ERROR - stderr - +2025-02-05 19:21:41 - INFO - stdout - {'loss': 0.689, 'grad_norm': 1.1413154602050781, 'learning_rate': 1.3493657981817961e-05, 'epoch': 1.21} +2025-02-05 19:21:41 - ERROR - stderr - 40%|████ | 9082/22434 [9:14:01<9:38:07, 2.60s/it] +2025-02-05 19:21:44 - ERROR - stderr - 40%|████ | 9083/22434 [9:14:04<9:44:48, 2.63s/it] +2025-02-05 19:21:44 - ERROR - stderr - 
+2025-02-05 19:21:44 - ERROR - stderr - +2025-02-05 19:21:44 - INFO - stdout - {'loss': 0.6383, 'grad_norm': 1.17963707447052, 'learning_rate': 1.3492305174440574e-05, 'epoch': 1.21} +2025-02-05 19:21:44 - ERROR - stderr - 40%|████ | 9083/22434 [9:14:04<9:44:48, 2.63s/it] +2025-02-05 19:21:47 - ERROR - stderr - 40%|████ | 9084/22434 [9:14:06<9:39:29, 2.60s/it] +2025-02-05 19:21:47 - ERROR - stderr - +2025-02-05 19:21:47 - ERROR - stderr - +2025-02-05 19:21:47 - INFO - stdout - {'loss': 0.6496, 'grad_norm': 1.1084611415863037, 'learning_rate': 1.3490952294269431e-05, 'epoch': 1.21} +2025-02-05 19:21:47 - ERROR - stderr - 40%|████ | 9084/22434 [9:14:06<9:39:29, 2.60s/it] +2025-02-05 19:21:49 - ERROR - stderr - 40%|████ | 9085/22434 [9:14:09<9:35:01, 2.58s/it] +2025-02-05 19:21:49 - ERROR - stderr - +2025-02-05 19:21:49 - ERROR - stderr - +2025-02-05 19:21:49 - INFO - stdout - {'loss': 0.6105, 'grad_norm': 1.0990934371948242, 'learning_rate': 1.3489599341332723e-05, 'epoch': 1.21} +2025-02-05 19:21:49 - ERROR - stderr - 40%|████ | 9085/22434 [9:14:09<9:35:01, 2.58s/it] +2025-02-05 19:21:52 - ERROR - stderr - 41%|████ | 9086/22434 [9:14:12<9:37:28, 2.60s/it] +2025-02-05 19:21:52 - ERROR - stderr - +2025-02-05 19:21:52 - ERROR - stderr - +2025-02-05 19:21:52 - INFO - stdout - {'loss': 0.6539, 'grad_norm': 1.055248498916626, 'learning_rate': 1.3488246315658659e-05, 'epoch': 1.22} +2025-02-05 19:21:52 - ERROR - stderr - 41%|████ | 9086/22434 [9:14:12<9:37:28, 2.60s/it] +2025-02-05 19:21:54 - ERROR - stderr - 41%|████ | 9087/22434 [9:14:14<9:31:35, 2.57s/it] +2025-02-05 19:21:54 - ERROR - stderr - +2025-02-05 19:21:54 - ERROR - stderr - +2025-02-05 19:21:54 - INFO - stdout - {'loss': 0.6938, 'grad_norm': 1.2267736196517944, 'learning_rate': 1.348689321727544e-05, 'epoch': 1.22} +2025-02-05 19:21:54 - ERROR - stderr - 41%|████ | 9087/22434 [9:14:14<9:31:35, 2.57s/it] +2025-02-05 19:21:57 - ERROR - stderr - 41%|████ | 9088/22434 [9:14:17<9:46:29, 2.64s/it] +2025-02-05 
19:21:57 - ERROR - stderr - +2025-02-05 19:21:57 - ERROR - stderr - +2025-02-05 19:21:57 - INFO - stdout - {'loss': 0.5853, 'grad_norm': 1.0088655948638916, 'learning_rate': 1.348554004621127e-05, 'epoch': 1.22} +2025-02-05 19:21:57 - ERROR - stderr - 41%|████ | 9088/22434 [9:14:17<9:46:29, 2.64s/it] +2025-02-05 19:22:00 - ERROR - stderr - 41%|████ | 9089/22434 [9:14:19<9:34:06, 2.58s/it] +2025-02-05 19:22:00 - ERROR - stderr - +2025-02-05 19:22:00 - ERROR - stderr - +2025-02-05 19:22:00 - INFO - stdout - {'loss': 0.7804, 'grad_norm': 1.2697254419326782, 'learning_rate': 1.3484186802494346e-05, 'epoch': 1.22} +2025-02-05 19:22:00 - ERROR - stderr - 41%|████ | 9089/22434 [9:14:19<9:34:06, 2.58s/it] +2025-02-05 19:22:02 - ERROR - stderr - 41%|████ | 9090/22434 [9:14:22<9:29:03, 2.56s/it] +2025-02-05 19:22:02 - ERROR - stderr - +2025-02-05 19:22:02 - ERROR - stderr - +2025-02-05 19:22:02 - INFO - stdout - {'loss': 0.7508, 'grad_norm': 1.2124444246292114, 'learning_rate': 1.3482833486152886e-05, 'epoch': 1.22} +2025-02-05 19:22:02 - ERROR - stderr - 41%|████ | 9090/22434 [9:14:22<9:29:03, 2.56s/it] +2025-02-05 19:22:05 - ERROR - stderr - 41%|████ | 9091/22434 [9:14:24<9:23:39, 2.53s/it] +2025-02-05 19:22:05 - ERROR - stderr - +2025-02-05 19:22:05 - ERROR - stderr - +2025-02-05 19:22:05 - INFO - stdout - {'loss': 0.687, 'grad_norm': 1.224016547203064, 'learning_rate': 1.3481480097215094e-05, 'epoch': 1.22} +2025-02-05 19:22:05 - ERROR - stderr - 41%|████ | 9091/22434 [9:14:24<9:23:39, 2.53s/it] +2025-02-05 19:22:07 - ERROR - stderr - 41%|████ | 9092/22434 [9:14:27<9:25:19, 2.54s/it] +2025-02-05 19:22:07 - ERROR - stderr - +2025-02-05 19:22:07 - ERROR - stderr - +2025-02-05 19:22:07 - INFO - stdout - {'loss': 0.6957, 'grad_norm': 1.0906602144241333, 'learning_rate': 1.3480126635709183e-05, 'epoch': 1.22} +2025-02-05 19:22:07 - ERROR - stderr - 41%|████ | 9092/22434 [9:14:27<9:25:19, 2.54s/it] +2025-02-05 19:22:10 - ERROR - stderr - 41%|████ | 9093/22434 [9:14:29<9:21:37, 
2.53s/it] +2025-02-05 19:22:10 - ERROR - stderr - +2025-02-05 19:22:10 - ERROR - stderr - +2025-02-05 19:22:10 - INFO - stdout - {'loss': 0.6999, 'grad_norm': 1.2782318592071533, 'learning_rate': 1.3478773101663362e-05, 'epoch': 1.22} +2025-02-05 19:22:10 - ERROR - stderr - 41%|████ | 9093/22434 [9:14:29<9:21:37, 2.53s/it] +2025-02-05 19:22:12 - ERROR - stderr - 41%|████ | 9094/22434 [9:14:32<9:16:33, 2.50s/it] +2025-02-05 19:22:12 - ERROR - stderr - +2025-02-05 19:22:12 - ERROR - stderr - +2025-02-05 19:22:12 - INFO - stdout - {'loss': 0.7243, 'grad_norm': 1.2403485774993896, 'learning_rate': 1.3477419495105843e-05, 'epoch': 1.22} +2025-02-05 19:22:12 - ERROR - stderr - 41%|████ | 9094/22434 [9:14:32<9:16:33, 2.50s/it] +2025-02-05 19:22:15 - ERROR - stderr - 41%|████ | 9095/22434 [9:14:34<9:14:35, 2.49s/it] +2025-02-05 19:22:15 - ERROR - stderr - +2025-02-05 19:22:15 - ERROR - stderr - +2025-02-05 19:22:15 - INFO - stdout - {'loss': 0.7872, 'grad_norm': 1.317039132118225, 'learning_rate': 1.3476065816064842e-05, 'epoch': 1.22} +2025-02-05 19:22:15 - ERROR - stderr - 41%|████ | 9095/22434 [9:14:34<9:14:35, 2.49s/it] +2025-02-05 19:22:17 - ERROR - stderr - 41%|████ | 9096/22434 [9:14:37<9:12:10, 2.48s/it] +2025-02-05 19:22:17 - ERROR - stderr - +2025-02-05 19:22:17 - ERROR - stderr - +2025-02-05 19:22:17 - INFO - stdout - {'loss': 0.6878, 'grad_norm': 1.1456726789474487, 'learning_rate': 1.3474712064568576e-05, 'epoch': 1.22} +2025-02-05 19:22:17 - ERROR - stderr - 41%|████ | 9096/22434 [9:14:37<9:12:10, 2.48s/it] +2025-02-05 19:22:19 - ERROR - stderr - 41%|████ | 9097/22434 [9:14:39<9:08:57, 2.47s/it] +2025-02-05 19:22:19 - ERROR - stderr - +2025-02-05 19:22:19 - ERROR - stderr - +2025-02-05 19:22:19 - INFO - stdout - {'loss': 0.746, 'grad_norm': 1.1739799976348877, 'learning_rate': 1.3473358240645263e-05, 'epoch': 1.22} +2025-02-05 19:22:19 - ERROR - stderr - 41%|████ | 9097/22434 [9:14:39<9:08:57, 2.47s/it] +2025-02-05 19:22:22 - ERROR - stderr - 41%|████ | 
9098/22434 [9:14:42<9:37:25, 2.60s/it] +2025-02-05 19:22:22 - ERROR - stderr - +2025-02-05 19:22:22 - ERROR - stderr - +2025-02-05 19:22:22 - INFO - stdout - {'loss': 0.7282, 'grad_norm': 1.1599738597869873, 'learning_rate': 1.3472004344323118e-05, 'epoch': 1.22} +2025-02-05 19:22:22 - ERROR - stderr - 41%|████ | 9098/22434 [9:14:42<9:37:25, 2.60s/it] +2025-02-05 19:22:25 - ERROR - stderr - 41%|████ | 9099/22434 [9:14:45<9:29:45, 2.56s/it] +2025-02-05 19:22:25 - ERROR - stderr - +2025-02-05 19:22:25 - ERROR - stderr - +2025-02-05 19:22:25 - INFO - stdout - {'loss': 0.6601, 'grad_norm': 1.1250706911087036, 'learning_rate': 1.3470650375630365e-05, 'epoch': 1.22} +2025-02-05 19:22:25 - ERROR - stderr - 41%|████ | 9099/22434 [9:14:45<9:29:45, 2.56s/it] +2025-02-05 19:22:28 - ERROR - stderr - 41%|████ | 9100/22434 [9:14:47<9:50:10, 2.66s/it] +2025-02-05 19:22:28 - ERROR - stderr - +2025-02-05 19:22:28 - ERROR - stderr - +2025-02-05 19:22:28 - INFO - stdout - {'loss': 0.6367, 'grad_norm': 1.1711138486862183, 'learning_rate': 1.346929633459523e-05, 'epoch': 1.22} +2025-02-05 19:22:28 - ERROR - stderr - 41%|████ | 9100/22434 [9:14:47<9:50:10, 2.66s/it] +2025-02-05 19:22:30 - ERROR - stderr - 41%|████ | 9101/22434 [9:14:50<9:46:16, 2.64s/it] +2025-02-05 19:22:30 - ERROR - stderr - +2025-02-05 19:22:30 - ERROR - stderr - +2025-02-05 19:22:30 - INFO - stdout - {'loss': 0.8214, 'grad_norm': 1.2530628442764282, 'learning_rate': 1.3467942221245931e-05, 'epoch': 1.22} +2025-02-05 19:22:30 - ERROR - stderr - 41%|████ | 9101/22434 [9:14:50<9:46:16, 2.64s/it] +2025-02-05 19:22:33 - ERROR - stderr - 41%|████ | 9102/22434 [9:14:53<9:37:22, 2.60s/it] +2025-02-05 19:22:33 - ERROR - stderr - +2025-02-05 19:22:33 - ERROR - stderr - +2025-02-05 19:22:33 - INFO - stdout - {'loss': 0.732, 'grad_norm': 1.1551121473312378, 'learning_rate': 1.3466588035610693e-05, 'epoch': 1.22} +2025-02-05 19:22:33 - ERROR - stderr - 41%|████ | 9102/22434 [9:14:53<9:37:22, 2.60s/it] +2025-02-05 19:22:35 - 
ERROR - stderr - 41%|████ | 9103/22434 [9:14:55<9:28:17, 2.56s/it] +2025-02-05 19:22:35 - ERROR - stderr - +2025-02-05 19:22:35 - ERROR - stderr - +2025-02-05 19:22:35 - INFO - stdout - {'loss': 0.7339, 'grad_norm': 1.2006908655166626, 'learning_rate': 1.3465233777717744e-05, 'epoch': 1.22} +2025-02-05 19:22:35 - ERROR - stderr - 41%|████ | 9103/22434 [9:14:55<9:28:17, 2.56s/it] +2025-02-05 19:22:38 - ERROR - stderr - 41%|████ | 9104/22434 [9:14:57<9:22:38, 2.53s/it] +2025-02-05 19:22:38 - ERROR - stderr - +2025-02-05 19:22:38 - ERROR - stderr - +2025-02-05 19:22:38 - INFO - stdout - {'loss': 0.6394, 'grad_norm': 1.1822844743728638, 'learning_rate': 1.3463879447595316e-05, 'epoch': 1.22} +2025-02-05 19:22:38 - ERROR - stderr - 41%|████ | 9104/22434 [9:14:58<9:22:38, 2.53s/it] +2025-02-05 19:22:40 - ERROR - stderr - 41%|████ | 9105/22434 [9:15:00<9:20:01, 2.52s/it] +2025-02-05 19:22:40 - ERROR - stderr - +2025-02-05 19:22:40 - ERROR - stderr - +2025-02-05 19:22:40 - INFO - stdout - {'loss': 0.7074, 'grad_norm': 1.252432942390442, 'learning_rate': 1.3462525045271635e-05, 'epoch': 1.22} +2025-02-05 19:22:40 - ERROR - stderr - 41%|████ | 9105/22434 [9:15:00<9:20:01, 2.52s/it] +2025-02-05 19:22:43 - ERROR - stderr - 41%|████ | 9106/22434 [9:15:03<9:20:08, 2.52s/it] +2025-02-05 19:22:43 - ERROR - stderr - +2025-02-05 19:22:43 - ERROR - stderr - +2025-02-05 19:22:43 - INFO - stdout - {'loss': 0.7078, 'grad_norm': 1.1828771829605103, 'learning_rate': 1.346117057077493e-05, 'epoch': 1.22} +2025-02-05 19:22:43 - ERROR - stderr - 41%|████ | 9106/22434 [9:15:03<9:20:08, 2.52s/it] +2025-02-05 19:22:45 - ERROR - stderr - 41%|████ | 9107/22434 [9:15:05<9:18:03, 2.51s/it] +2025-02-05 19:22:45 - ERROR - stderr - +2025-02-05 19:22:45 - ERROR - stderr - +2025-02-05 19:22:45 - INFO - stdout - {'loss': 0.6621, 'grad_norm': 1.2033642530441284, 'learning_rate': 1.3459816024133439e-05, 'epoch': 1.22} +2025-02-05 19:22:45 - ERROR - stderr - 41%|████ | 9107/22434 [9:15:05<9:18:03, 2.51s/it] 
+2025-02-05 19:22:48 - ERROR - stderr - 41%|████ | 9108/22434 [9:15:08<9:24:13, 2.54s/it] +2025-02-05 19:22:48 - ERROR - stderr - +2025-02-05 19:22:48 - ERROR - stderr - +2025-02-05 19:22:48 - INFO - stdout - {'loss': 0.726, 'grad_norm': 1.2897650003433228, 'learning_rate': 1.3458461405375394e-05, 'epoch': 1.22} +2025-02-05 19:22:48 - ERROR - stderr - 41%|████ | 9108/22434 [9:15:08<9:24:13, 2.54s/it] +2025-02-05 19:22:50 - ERROR - stderr - 41%|████ | 9109/22434 [9:15:10<9:21:35, 2.53s/it] +2025-02-05 19:22:50 - ERROR - stderr - +2025-02-05 19:22:50 - ERROR - stderr - +2025-02-05 19:22:50 - INFO - stdout - {'loss': 0.7295, 'grad_norm': 1.0923179388046265, 'learning_rate': 1.3457106714529027e-05, 'epoch': 1.22} +2025-02-05 19:22:50 - ERROR - stderr - 41%|████ | 9109/22434 [9:15:10<9:21:35, 2.53s/it] +2025-02-05 19:22:53 - ERROR - stderr - 41%|████ | 9110/22434 [9:15:13<9:21:12, 2.53s/it] +2025-02-05 19:22:53 - ERROR - stderr - +2025-02-05 19:22:53 - ERROR - stderr - +2025-02-05 19:22:53 - INFO - stdout - {'loss': 0.7442, 'grad_norm': 1.2467091083526611, 'learning_rate': 1.3455751951622582e-05, 'epoch': 1.22} +2025-02-05 19:22:53 - ERROR - stderr - 41%|████ | 9110/22434 [9:15:13<9:21:12, 2.53s/it] +2025-02-05 19:22:56 - ERROR - stderr - 41%|████ | 9111/22434 [9:15:15<9:32:29, 2.58s/it] +2025-02-05 19:22:56 - ERROR - stderr - +2025-02-05 19:22:56 - ERROR - stderr - +2025-02-05 19:22:56 - INFO - stdout - {'loss': 0.7542, 'grad_norm': 1.2819690704345703, 'learning_rate': 1.3454397116684292e-05, 'epoch': 1.22} +2025-02-05 19:22:56 - ERROR - stderr - 41%|████ | 9111/22434 [9:15:15<9:32:29, 2.58s/it] +2025-02-05 19:22:58 - ERROR - stderr - 41%|████ | 9112/22434 [9:15:18<9:29:05, 2.56s/it] +2025-02-05 19:22:58 - ERROR - stderr - +2025-02-05 19:22:58 - ERROR - stderr - +2025-02-05 19:22:58 - INFO - stdout - {'loss': 0.6177, 'grad_norm': 1.0074762105941772, 'learning_rate': 1.3453042209742405e-05, 'epoch': 1.22} +2025-02-05 19:22:58 - ERROR - stderr - 41%|████ | 9112/22434 
[9:15:18<9:29:05, 2.56s/it] +2025-02-05 19:23:01 - ERROR - stderr - 41%|████ | 9113/22434 [9:15:20<9:24:32, 2.54s/it] +2025-02-05 19:23:01 - ERROR - stderr - +2025-02-05 19:23:01 - ERROR - stderr - +2025-02-05 19:23:01 - INFO - stdout - {'loss': 0.7797, 'grad_norm': 1.2283340692520142, 'learning_rate': 1.345168723082515e-05, 'epoch': 1.22} +2025-02-05 19:23:01 - ERROR - stderr - 41%|████ | 9113/22434 [9:15:20<9:24:32, 2.54s/it] +2025-02-05 19:23:03 - ERROR - stderr - 41%|████ | 9114/22434 [9:15:23<9:18:18, 2.51s/it] +2025-02-05 19:23:03 - ERROR - stderr - +2025-02-05 19:23:03 - ERROR - stderr - +2025-02-05 19:23:03 - INFO - stdout - {'loss': 0.715, 'grad_norm': 1.2426378726959229, 'learning_rate': 1.345033217996078e-05, 'epoch': 1.22} +2025-02-05 19:23:03 - ERROR - stderr - 41%|████ | 9114/22434 [9:15:23<9:18:18, 2.51s/it] +2025-02-05 19:23:06 - ERROR - stderr - 41%|████ | 9115/22434 [9:15:25<9:20:06, 2.52s/it] +2025-02-05 19:23:06 - ERROR - stderr - +2025-02-05 19:23:06 - ERROR - stderr - +2025-02-05 19:23:06 - INFO - stdout - {'loss': 0.7304, 'grad_norm': 1.1323283910751343, 'learning_rate': 1.3448977057177538e-05, 'epoch': 1.22} +2025-02-05 19:23:06 - ERROR - stderr - 41%|████ | 9115/22434 [9:15:25<9:20:06, 2.52s/it] +2025-02-05 19:23:08 - ERROR - stderr - 41%|████ | 9116/22434 [9:15:28<9:19:25, 2.52s/it] +2025-02-05 19:23:08 - ERROR - stderr - +2025-02-05 19:23:08 - ERROR - stderr - +2025-02-05 19:23:08 - INFO - stdout - {'loss': 0.7563, 'grad_norm': 1.1827670335769653, 'learning_rate': 1.3447621862503671e-05, 'epoch': 1.22} +2025-02-05 19:23:08 - ERROR - stderr - 41%|████ | 9116/22434 [9:15:28<9:19:25, 2.52s/it] +2025-02-05 19:23:11 - ERROR - stderr - 41%|████ | 9117/22434 [9:15:30<9:17:46, 2.51s/it] +2025-02-05 19:23:11 - ERROR - stderr - +2025-02-05 19:23:11 - ERROR - stderr - +2025-02-05 19:23:11 - INFO - stdout - {'loss': 0.7837, 'grad_norm': 1.2890276908874512, 'learning_rate': 1.3446266595967424e-05, 'epoch': 1.22} +2025-02-05 19:23:11 - ERROR - stderr - 
41%|████ | 9117/22434 [9:15:30<9:17:46, 2.51s/it] +2025-02-05 19:23:13 - ERROR - stderr - 41%|████ | 9118/22434 [9:15:33<9:17:52, 2.51s/it] +2025-02-05 19:23:13 - ERROR - stderr - +2025-02-05 19:23:13 - ERROR - stderr - +2025-02-05 19:23:13 - INFO - stdout - {'loss': 0.7578, 'grad_norm': 1.2116713523864746, 'learning_rate': 1.3444911257597047e-05, 'epoch': 1.22} +2025-02-05 19:23:13 - ERROR - stderr - 41%|████ | 9118/22434 [9:15:33<9:17:52, 2.51s/it] +2025-02-05 19:23:16 - ERROR - stderr - 41%|████ | 9119/22434 [9:15:35<9:19:59, 2.52s/it] +2025-02-05 19:23:16 - ERROR - stderr - +2025-02-05 19:23:16 - ERROR - stderr - +2025-02-05 19:23:16 - INFO - stdout - {'loss': 0.7385, 'grad_norm': 1.3103309869766235, 'learning_rate': 1.344355584742079e-05, 'epoch': 1.22} +2025-02-05 19:23:16 - ERROR - stderr - 41%|████ | 9119/22434 [9:15:35<9:19:59, 2.52s/it] +2025-02-05 19:23:18 - ERROR - stderr - 41%|████ | 9120/22434 [9:15:38<9:18:53, 2.52s/it] +2025-02-05 19:23:18 - ERROR - stderr - +2025-02-05 19:23:18 - ERROR - stderr - +2025-02-05 19:23:18 - INFO - stdout - {'loss': 0.7058, 'grad_norm': 1.0817703008651733, 'learning_rate': 1.344220036546691e-05, 'epoch': 1.22} +2025-02-05 19:23:18 - ERROR - stderr - 41%|████ | 9120/22434 [9:15:38<9:18:53, 2.52s/it] +2025-02-05 19:23:21 - ERROR - stderr - 41%|████ | 9121/22434 [9:15:41<9:40:37, 2.62s/it] +2025-02-05 19:23:21 - ERROR - stderr - +2025-02-05 19:23:21 - ERROR - stderr - +2025-02-05 19:23:21 - INFO - stdout - {'loss': 0.7341, 'grad_norm': 1.1764171123504639, 'learning_rate': 1.3440844811763653e-05, 'epoch': 1.22} +2025-02-05 19:23:21 - ERROR - stderr - 41%|████ | 9121/22434 [9:15:41<9:40:37, 2.62s/it] +2025-02-05 19:23:24 - ERROR - stderr - 41%|████ | 9122/22434 [9:15:43<9:34:00, 2.59s/it] +2025-02-05 19:23:24 - ERROR - stderr - +2025-02-05 19:23:24 - ERROR - stderr - +2025-02-05 19:23:24 - INFO - stdout - {'loss': 0.7853, 'grad_norm': 1.31882905960083, 'learning_rate': 1.3439489186339283e-05, 'epoch': 1.22} +2025-02-05 
19:23:24 - ERROR - stderr - 41%|████ | 9122/22434 [9:15:43<9:34:00, 2.59s/it] +2025-02-05 19:23:26 - ERROR - stderr - 41%|████ | 9123/22434 [9:15:46<9:25:22, 2.55s/it] +2025-02-05 19:23:26 - ERROR - stderr - +2025-02-05 19:23:26 - ERROR - stderr - +2025-02-05 19:23:26 - INFO - stdout - {'loss': 0.6659, 'grad_norm': 1.2082699537277222, 'learning_rate': 1.3438133489222049e-05, 'epoch': 1.22} +2025-02-05 19:23:26 - ERROR - stderr - 41%|████ | 9123/22434 [9:15:46<9:25:22, 2.55s/it] +2025-02-05 19:23:28 - ERROR - stderr - 41%|████ | 9124/22434 [9:15:48<9:18:21, 2.52s/it] +2025-02-05 19:23:28 - ERROR - stderr - +2025-02-05 19:23:28 - ERROR - stderr - +2025-02-05 19:23:28 - INFO - stdout - {'loss': 0.6724, 'grad_norm': 1.1116266250610352, 'learning_rate': 1.3436777720440214e-05, 'epoch': 1.22} +2025-02-05 19:23:28 - ERROR - stderr - 41%|████ | 9124/22434 [9:15:48<9:18:21, 2.52s/it] +2025-02-05 19:23:31 - ERROR - stderr - 41%|████ | 9125/22434 [9:15:51<9:15:56, 2.51s/it] +2025-02-05 19:23:31 - ERROR - stderr - +2025-02-05 19:23:31 - ERROR - stderr - +2025-02-05 19:23:31 - INFO - stdout - {'loss': 0.6575, 'grad_norm': 1.1878135204315186, 'learning_rate': 1.3435421880022035e-05, 'epoch': 1.22} +2025-02-05 19:23:31 - ERROR - stderr - 41%|████ | 9125/22434 [9:15:51<9:15:56, 2.51s/it] +2025-02-05 19:23:33 - ERROR - stderr - 41%|████ | 9126/22434 [9:15:53<9:15:12, 2.50s/it] +2025-02-05 19:23:33 - ERROR - stderr - +2025-02-05 19:23:33 - ERROR - stderr - +2025-02-05 19:23:33 - INFO - stdout - {'loss': 0.6926, 'grad_norm': 0.9940921664237976, 'learning_rate': 1.3434065967995776e-05, 'epoch': 1.22} +2025-02-05 19:23:33 - ERROR - stderr - 41%|████ | 9126/22434 [9:15:53<9:15:12, 2.50s/it] +2025-02-05 19:23:36 - ERROR - stderr - 41%|████ | 9127/22434 [9:15:56<9:14:37, 2.50s/it] +2025-02-05 19:23:36 - ERROR - stderr - +2025-02-05 19:23:36 - ERROR - stderr - +2025-02-05 19:23:36 - INFO - stdout - {'loss': 0.7586, 'grad_norm': 1.1991914510726929, 'learning_rate': 1.3432709984389696e-05, 
'epoch': 1.22} +2025-02-05 19:23:36 - ERROR - stderr - 41%|████ | 9127/22434 [9:15:56<9:14:37, 2.50s/it] +2025-02-05 19:23:38 - ERROR - stderr - 41%|████ | 9128/22434 [9:15:58<9:15:55, 2.51s/it] +2025-02-05 19:23:38 - ERROR - stderr - +2025-02-05 19:23:38 - ERROR - stderr - +2025-02-05 19:23:38 - INFO - stdout - {'loss': 0.6719, 'grad_norm': 1.0793712139129639, 'learning_rate': 1.343135392923206e-05, 'epoch': 1.22} +2025-02-05 19:23:38 - ERROR - stderr - 41%|████ | 9128/22434 [9:15:58<9:15:55, 2.51s/it] +2025-02-05 19:23:41 - ERROR - stderr - 41%|████ | 9129/22434 [9:16:01<9:16:51, 2.51s/it] +2025-02-05 19:23:41 - ERROR - stderr - +2025-02-05 19:23:41 - ERROR - stderr - +2025-02-05 19:23:41 - INFO - stdout - {'loss': 0.8594, 'grad_norm': 1.337990164756775, 'learning_rate': 1.3429997802551138e-05, 'epoch': 1.22} +2025-02-05 19:23:41 - ERROR - stderr - 41%|████ | 9129/22434 [9:16:01<9:16:51, 2.51s/it] +2025-02-05 19:23:43 - ERROR - stderr - 41%|████ | 9130/22434 [9:16:03<9:16:28, 2.51s/it] +2025-02-05 19:23:43 - ERROR - stderr - +2025-02-05 19:23:43 - ERROR - stderr - +2025-02-05 19:23:43 - INFO - stdout - {'loss': 0.7154, 'grad_norm': 1.0496231317520142, 'learning_rate': 1.3428641604375192e-05, 'epoch': 1.22} +2025-02-05 19:23:43 - ERROR - stderr - 41%|████ | 9130/22434 [9:16:03<9:16:28, 2.51s/it] +2025-02-05 19:23:46 - ERROR - stderr - 41%|████ | 9131/22434 [9:16:06<9:18:08, 2.52s/it] +2025-02-05 19:23:46 - ERROR - stderr - +2025-02-05 19:23:46 - ERROR - stderr - +2025-02-05 19:23:46 - INFO - stdout - {'loss': 0.7992, 'grad_norm': 1.3032883405685425, 'learning_rate': 1.3427285334732494e-05, 'epoch': 1.22} +2025-02-05 19:23:46 - ERROR - stderr - 41%|████ | 9131/22434 [9:16:06<9:18:08, 2.52s/it] +2025-02-05 19:23:48 - ERROR - stderr - 41%|████ | 9132/22434 [9:16:08<9:17:41, 2.52s/it] +2025-02-05 19:23:49 - ERROR - stderr - +2025-02-05 19:23:49 - ERROR - stderr - +2025-02-05 19:23:49 - INFO - stdout - {'loss': 0.7823, 'grad_norm': 1.2036288976669312, 'learning_rate': 
1.342592899365131e-05, 'epoch': 1.22} +2025-02-05 19:23:49 - ERROR - stderr - 41%|████ | 9132/22434 [9:16:08<9:17:41, 2.52s/it] +2025-02-05 19:23:51 - ERROR - stderr - 41%|████ | 9133/22434 [9:16:11<9:13:09, 2.50s/it] +2025-02-05 19:23:51 - ERROR - stderr - +2025-02-05 19:23:51 - ERROR - stderr - +2025-02-05 19:23:51 - INFO - stdout - {'loss': 0.7215, 'grad_norm': 1.2072639465332031, 'learning_rate': 1.3424572581159919e-05, 'epoch': 1.22} +2025-02-05 19:23:51 - ERROR - stderr - 41%|████ | 9133/22434 [9:16:11<9:13:09, 2.50s/it] +2025-02-05 19:23:53 - ERROR - stderr - 41%|████ | 9134/22434 [9:16:13<9:16:27, 2.51s/it] +2025-02-05 19:23:54 - ERROR - stderr - +2025-02-05 19:23:54 - ERROR - stderr - +2025-02-05 19:23:54 - INFO - stdout - {'loss': 0.705, 'grad_norm': 1.1556113958358765, 'learning_rate': 1.3423216097286585e-05, 'epoch': 1.22} +2025-02-05 19:23:54 - ERROR - stderr - 41%|████ | 9134/22434 [9:16:13<9:16:27, 2.51s/it] +2025-02-05 19:23:56 - ERROR - stderr - 41%|████ | 9135/22434 [9:16:16<9:20:33, 2.53s/it] +2025-02-05 19:23:56 - ERROR - stderr - +2025-02-05 19:23:56 - ERROR - stderr - +2025-02-05 19:23:56 - INFO - stdout - {'loss': 0.7115, 'grad_norm': 1.1503335237503052, 'learning_rate': 1.3421859542059587e-05, 'epoch': 1.22} +2025-02-05 19:23:56 - ERROR - stderr - 41%|████ | 9135/22434 [9:16:16<9:20:33, 2.53s/it] +2025-02-05 19:23:59 - ERROR - stderr - 41%|████ | 9136/22434 [9:16:18<9:20:26, 2.53s/it] +2025-02-05 19:23:59 - ERROR - stderr - +2025-02-05 19:23:59 - ERROR - stderr - +2025-02-05 19:23:59 - INFO - stdout - {'loss': 0.7228, 'grad_norm': 1.197332501411438, 'learning_rate': 1.3420502915507206e-05, 'epoch': 1.22} +2025-02-05 19:23:59 - ERROR - stderr - 41%|████ | 9136/22434 [9:16:18<9:20:26, 2.53s/it] +2025-02-05 19:24:01 - ERROR - stderr - 41%|████ | 9137/22434 [9:16:21<9:32:56, 2.59s/it] +2025-02-05 19:24:01 - ERROR - stderr - +2025-02-05 19:24:01 - ERROR - stderr - +2025-02-05 19:24:01 - INFO - stdout - {'loss': 0.7537, 'grad_norm': 
1.1593574285507202, 'learning_rate': 1.341914621765771e-05, 'epoch': 1.22} +2025-02-05 19:24:01 - ERROR - stderr - 41%|████ | 9137/22434 [9:16:21<9:32:56, 2.59s/it] +2025-02-05 19:24:04 - ERROR - stderr - 41%|████ | 9138/22434 [9:16:24<9:25:54, 2.55s/it] +2025-02-05 19:24:04 - ERROR - stderr - +2025-02-05 19:24:04 - ERROR - stderr - +2025-02-05 19:24:04 - INFO - stdout - {'loss': 0.6726, 'grad_norm': 1.1598589420318604, 'learning_rate': 1.3417789448539384e-05, 'epoch': 1.22} +2025-02-05 19:24:04 - ERROR - stderr - 41%|████ | 9138/22434 [9:16:24<9:25:54, 2.55s/it] +2025-02-05 19:24:06 - ERROR - stderr - 41%|████ | 9139/22434 [9:16:26<9:20:09, 2.53s/it] +2025-02-05 19:24:06 - ERROR - stderr - +2025-02-05 19:24:06 - ERROR - stderr - +2025-02-05 19:24:06 - INFO - stdout - {'loss': 0.6937, 'grad_norm': 1.1910388469696045, 'learning_rate': 1.341643260818051e-05, 'epoch': 1.22} +2025-02-05 19:24:06 - ERROR - stderr - 41%|████ | 9139/22434 [9:16:26<9:20:09, 2.53s/it] +2025-02-05 19:24:09 - ERROR - stderr - 41%|████ | 9140/22434 [9:16:29<9:36:49, 2.60s/it] +2025-02-05 19:24:09 - ERROR - stderr - +2025-02-05 19:24:09 - ERROR - stderr - +2025-02-05 19:24:09 - INFO - stdout - {'loss': 0.7449, 'grad_norm': 1.2916746139526367, 'learning_rate': 1.3415075696609364e-05, 'epoch': 1.22} +2025-02-05 19:24:09 - ERROR - stderr - 41%|████ | 9140/22434 [9:16:29<9:36:49, 2.60s/it] +2025-02-05 19:24:12 - ERROR - stderr - 41%|████ | 9141/22434 [9:16:31<9:33:03, 2.59s/it] +2025-02-05 19:24:12 - ERROR - stderr - +2025-02-05 19:24:12 - ERROR - stderr - +2025-02-05 19:24:12 - INFO - stdout - {'loss': 0.7664, 'grad_norm': 1.1323336362838745, 'learning_rate': 1.3413718713854236e-05, 'epoch': 1.22} +2025-02-05 19:24:12 - ERROR - stderr - 41%|████ | 9141/22434 [9:16:31<9:33:03, 2.59s/it] +2025-02-05 19:24:14 - ERROR - stderr - 41%|████ | 9142/22434 [9:16:34<9:25:05, 2.55s/it] +2025-02-05 19:24:14 - ERROR - stderr - +2025-02-05 19:24:14 - ERROR - stderr - +2025-02-05 19:24:14 - INFO - stdout - 
{'loss': 0.6398, 'grad_norm': 1.0582225322723389, 'learning_rate': 1.3412361659943405e-05, 'epoch': 1.22} +2025-02-05 19:24:14 - ERROR - stderr - 41%|████ | 9142/22434 [9:16:34<9:25:05, 2.55s/it] +2025-02-05 19:24:16 - ERROR - stderr - 41%|████ | 9143/22434 [9:16:36<9:18:23, 2.52s/it] +2025-02-05 19:24:17 - ERROR - stderr - +2025-02-05 19:24:17 - ERROR - stderr - +2025-02-05 19:24:17 - INFO - stdout - {'loss': 0.7474, 'grad_norm': 1.2325347661972046, 'learning_rate': 1.341100453490516e-05, 'epoch': 1.22} +2025-02-05 19:24:17 - ERROR - stderr - 41%|████ | 9143/22434 [9:16:36<9:18:23, 2.52s/it] +2025-02-05 19:24:19 - ERROR - stderr - 41%|████ | 9144/22434 [9:16:39<9:18:40, 2.52s/it] +2025-02-05 19:24:19 - ERROR - stderr - +2025-02-05 19:24:19 - ERROR - stderr - +2025-02-05 19:24:19 - INFO - stdout - {'loss': 0.8142, 'grad_norm': 1.207645058631897, 'learning_rate': 1.3409647338767795e-05, 'epoch': 1.22} +2025-02-05 19:24:19 - ERROR - stderr - 41%|████ | 9144/22434 [9:16:39<9:18:40, 2.52s/it] +2025-02-05 19:24:22 - ERROR - stderr - 41%|████ | 9145/22434 [9:16:41<9:17:51, 2.52s/it] +2025-02-05 19:24:22 - ERROR - stderr - +2025-02-05 19:24:22 - ERROR - stderr - +2025-02-05 19:24:22 - INFO - stdout - {'loss': 0.741, 'grad_norm': 1.185514211654663, 'learning_rate': 1.3408290071559589e-05, 'epoch': 1.22} +2025-02-05 19:24:22 - ERROR - stderr - 41%|████ | 9145/22434 [9:16:41<9:17:51, 2.52s/it] +2025-02-05 19:24:24 - ERROR - stderr - 41%|████ | 9146/22434 [9:16:44<9:21:24, 2.53s/it] +2025-02-05 19:24:24 - ERROR - stderr - +2025-02-05 19:24:24 - ERROR - stderr - +2025-02-05 19:24:24 - INFO - stdout - {'loss': 0.6867, 'grad_norm': 1.2109448909759521, 'learning_rate': 1.340693273330884e-05, 'epoch': 1.22} +2025-02-05 19:24:24 - ERROR - stderr - 41%|████ | 9146/22434 [9:16:44<9:21:24, 2.53s/it] +2025-02-05 19:24:27 - ERROR - stderr - 41%|████ | 9147/22434 [9:16:46<9:18:27, 2.52s/it] +2025-02-05 19:24:27 - ERROR - stderr - +2025-02-05 19:24:27 - ERROR - stderr - +2025-02-05 
19:24:27 - INFO - stdout - {'loss': 0.7042, 'grad_norm': 1.2844654321670532, 'learning_rate': 1.3405575324043837e-05, 'epoch': 1.22} +2025-02-05 19:24:27 - ERROR - stderr - 41%|████ | 9147/22434 [9:16:46<9:18:27, 2.52s/it] +2025-02-05 19:24:29 - ERROR - stderr - 41%|████ | 9148/22434 [9:16:49<9:20:48, 2.53s/it] +2025-02-05 19:24:29 - ERROR - stderr - +2025-02-05 19:24:29 - ERROR - stderr - +2025-02-05 19:24:29 - INFO - stdout - {'loss': 0.7876, 'grad_norm': 1.2015427350997925, 'learning_rate': 1.3404217843792874e-05, 'epoch': 1.22} +2025-02-05 19:24:29 - ERROR - stderr - 41%|████ | 9148/22434 [9:16:49<9:20:48, 2.53s/it] +2025-02-05 19:24:32 - ERROR - stderr - 41%|████ | 9149/22434 [9:16:51<9:17:55, 2.52s/it] +2025-02-05 19:24:32 - ERROR - stderr - +2025-02-05 19:24:32 - ERROR - stderr - +2025-02-05 19:24:32 - INFO - stdout - {'loss': 0.7641, 'grad_norm': 1.1601043939590454, 'learning_rate': 1.340286029258425e-05, 'epoch': 1.22} +2025-02-05 19:24:32 - ERROR - stderr - 41%|████ | 9149/22434 [9:16:51<9:17:55, 2.52s/it] +2025-02-05 19:24:34 - ERROR - stderr - 41%|████ | 9150/22434 [9:16:54<9:14:42, 2.51s/it] +2025-02-05 19:24:34 - ERROR - stderr - +2025-02-05 19:24:34 - ERROR - stderr - +2025-02-05 19:24:34 - INFO - stdout - {'loss': 0.7631, 'grad_norm': 1.2381569147109985, 'learning_rate': 1.3401502670446259e-05, 'epoch': 1.22} +2025-02-05 19:24:34 - ERROR - stderr - 41%|████ | 9150/22434 [9:16:54<9:14:42, 2.51s/it] +2025-02-05 19:24:37 - ERROR - stderr - 41%|████ | 9151/22434 [9:16:56<9:19:41, 2.53s/it] +2025-02-05 19:24:37 - ERROR - stderr - +2025-02-05 19:24:37 - ERROR - stderr - +2025-02-05 19:24:37 - INFO - stdout - {'loss': 0.7778, 'grad_norm': 1.28840970993042, 'learning_rate': 1.3400144977407199e-05, 'epoch': 1.22} +2025-02-05 19:24:37 - ERROR - stderr - 41%|████ | 9151/22434 [9:16:57<9:19:41, 2.53s/it] +2025-02-05 19:24:39 - ERROR - stderr - 41%|████ | 9152/22434 [9:16:59<9:20:19, 2.53s/it] +2025-02-05 19:24:39 - ERROR - stderr - +2025-02-05 19:24:39 - ERROR 
- stderr - +2025-02-05 19:24:39 - INFO - stdout - {'loss': 0.656, 'grad_norm': 1.1501911878585815, 'learning_rate': 1.3398787213495372e-05, 'epoch': 1.22} +2025-02-05 19:24:39 - ERROR - stderr - 41%|████ | 9152/22434 [9:16:59<9:20:19, 2.53s/it] +2025-02-05 19:24:42 - ERROR - stderr - 41%|████ | 9153/22434 [9:17:01<9:14:17, 2.50s/it] +2025-02-05 19:24:42 - ERROR - stderr - +2025-02-05 19:24:42 - ERROR - stderr - +2025-02-05 19:24:42 - INFO - stdout - {'loss': 0.6894, 'grad_norm': 1.1074639558792114, 'learning_rate': 1.3397429378739076e-05, 'epoch': 1.22} +2025-02-05 19:24:42 - ERROR - stderr - 41%|████ | 9153/22434 [9:17:01<9:14:17, 2.50s/it] +2025-02-05 19:24:44 - ERROR - stderr - 41%|████ | 9154/22434 [9:17:04<9:17:12, 2.52s/it] +2025-02-05 19:24:44 - ERROR - stderr - +2025-02-05 19:24:44 - ERROR - stderr - +2025-02-05 19:24:44 - INFO - stdout - {'loss': 0.7477, 'grad_norm': 1.1977347135543823, 'learning_rate': 1.3396071473166614e-05, 'epoch': 1.22} +2025-02-05 19:24:44 - ERROR - stderr - 41%|████ | 9154/22434 [9:17:04<9:17:12, 2.52s/it] +2025-02-05 19:24:47 - ERROR - stderr - 41%|████ | 9155/22434 [9:17:06<9:16:26, 2.51s/it] +2025-02-05 19:24:47 - ERROR - stderr - +2025-02-05 19:24:47 - ERROR - stderr - +2025-02-05 19:24:47 - INFO - stdout - {'loss': 0.7884, 'grad_norm': 1.2164485454559326, 'learning_rate': 1.3394713496806295e-05, 'epoch': 1.22} +2025-02-05 19:24:47 - ERROR - stderr - 41%|████ | 9155/22434 [9:17:07<9:16:26, 2.51s/it] +2025-02-05 19:24:49 - ERROR - stderr - 41%|████ | 9156/22434 [9:17:09<9:18:28, 2.52s/it] +2025-02-05 19:24:49 - ERROR - stderr - +2025-02-05 19:24:49 - ERROR - stderr - +2025-02-05 19:24:49 - INFO - stdout - {'loss': 0.772, 'grad_norm': 1.2040317058563232, 'learning_rate': 1.339335544968642e-05, 'epoch': 1.22} +2025-02-05 19:24:49 - ERROR - stderr - 41%|████ | 9156/22434 [9:17:09<9:18:28, 2.52s/it] +2025-02-05 19:24:52 - ERROR - stderr - 41%|████ | 9157/22434 [9:17:12<9:18:39, 2.52s/it] +2025-02-05 19:24:52 - ERROR - stderr - 
+2025-02-05 19:24:52 - ERROR - stderr - +2025-02-05 19:24:52 - INFO - stdout - {'loss': 0.6443, 'grad_norm': 1.065250277519226, 'learning_rate': 1.33919973318353e-05, 'epoch': 1.22} +2025-02-05 19:24:52 - ERROR - stderr - 41%|████ | 9157/22434 [9:17:12<9:18:39, 2.52s/it] +2025-02-05 19:24:54 - ERROR - stderr - 41%|████ | 9158/22434 [9:17:14<9:16:43, 2.52s/it] +2025-02-05 19:24:54 - ERROR - stderr - +2025-02-05 19:24:54 - ERROR - stderr - +2025-02-05 19:24:54 - INFO - stdout - {'loss': 0.6742, 'grad_norm': 1.2347534894943237, 'learning_rate': 1.3390639143281239e-05, 'epoch': 1.22} +2025-02-05 19:24:54 - ERROR - stderr - 41%|████ | 9158/22434 [9:17:14<9:16:43, 2.52s/it] +2025-02-05 19:24:57 - ERROR - stderr - 41%|████ | 9159/22434 [9:17:17<9:30:30, 2.58s/it] +2025-02-05 19:24:57 - ERROR - stderr - +2025-02-05 19:24:57 - ERROR - stderr - +2025-02-05 19:24:57 - INFO - stdout - {'loss': 0.6806, 'grad_norm': 1.0989460945129395, 'learning_rate': 1.3389280884052549e-05, 'epoch': 1.22} +2025-02-05 19:24:57 - ERROR - stderr - 41%|████ | 9159/22434 [9:17:17<9:30:30, 2.58s/it] +2025-02-05 19:25:00 - ERROR - stderr - 41%|████ | 9160/22434 [9:17:19<9:28:01, 2.57s/it] +2025-02-05 19:25:00 - ERROR - stderr - +2025-02-05 19:25:00 - ERROR - stderr - +2025-02-05 19:25:00 - INFO - stdout - {'loss': 0.6876, 'grad_norm': 1.1263281106948853, 'learning_rate': 1.3387922554177545e-05, 'epoch': 1.22} +2025-02-05 19:25:00 - ERROR - stderr - 41%|████ | 9160/22434 [9:17:19<9:28:01, 2.57s/it] +2025-02-05 19:25:02 - ERROR - stderr - 41%|████ | 9161/22434 [9:17:22<9:22:18, 2.54s/it] +2025-02-05 19:25:02 - ERROR - stderr - +2025-02-05 19:25:02 - ERROR - stderr - +2025-02-05 19:25:02 - INFO - stdout - {'loss': 0.7451, 'grad_norm': 1.2382155656814575, 'learning_rate': 1.3386564153684533e-05, 'epoch': 1.23} +2025-02-05 19:25:02 - ERROR - stderr - 41%|████ | 9161/22434 [9:17:22<9:22:18, 2.54s/it] +2025-02-05 19:25:05 - ERROR - stderr - 41%|████ | 9162/22434 [9:17:24<9:25:42, 2.56s/it] +2025-02-05 
19:25:05 - ERROR - stderr - +2025-02-05 19:25:05 - ERROR - stderr - +2025-02-05 19:25:05 - INFO - stdout - {'loss': 0.7174, 'grad_norm': 1.1751782894134521, 'learning_rate': 1.3385205682601837e-05, 'epoch': 1.23} +2025-02-05 19:25:05 - ERROR - stderr - 41%|████ | 9162/22434 [9:17:24<9:25:42, 2.56s/it] +2025-02-05 19:25:07 - ERROR - stderr - 41%|████ | 9163/22434 [9:17:27<9:20:23, 2.53s/it] +2025-02-05 19:25:07 - ERROR - stderr - +2025-02-05 19:25:07 - ERROR - stderr - +2025-02-05 19:25:07 - INFO - stdout - {'loss': 0.7191, 'grad_norm': 1.2271381616592407, 'learning_rate': 1.3383847140957764e-05, 'epoch': 1.23} +2025-02-05 19:25:07 - ERROR - stderr - 41%|████ | 9163/22434 [9:17:27<9:20:23, 2.53s/it] +2025-02-05 19:25:10 - ERROR - stderr - 41%|████ | 9164/22434 [9:17:30<9:31:11, 2.58s/it] +2025-02-05 19:25:10 - ERROR - stderr - +2025-02-05 19:25:10 - ERROR - stderr - +2025-02-05 19:25:10 - INFO - stdout - {'loss': 0.7538, 'grad_norm': 1.199062466621399, 'learning_rate': 1.338248852878064e-05, 'epoch': 1.23} +2025-02-05 19:25:10 - ERROR - stderr - 41%|████ | 9164/22434 [9:17:30<9:31:11, 2.58s/it] +2025-02-05 19:25:12 - ERROR - stderr - 41%|████ | 9165/22434 [9:17:32<9:29:16, 2.57s/it] +2025-02-05 19:25:12 - ERROR - stderr - +2025-02-05 19:25:12 - ERROR - stderr - +2025-02-05 19:25:12 - INFO - stdout - {'loss': 0.6909, 'grad_norm': 1.2072817087173462, 'learning_rate': 1.3381129846098776e-05, 'epoch': 1.23} +2025-02-05 19:25:12 - ERROR - stderr - 41%|████ | 9165/22434 [9:17:32<9:29:16, 2.57s/it] +2025-02-05 19:25:15 - ERROR - stderr - 41%|████ | 9166/22434 [9:17:35<9:30:39, 2.58s/it] +2025-02-05 19:25:15 - ERROR - stderr - +2025-02-05 19:25:15 - ERROR - stderr - +2025-02-05 19:25:15 - INFO - stdout - {'loss': 0.6783, 'grad_norm': 1.1895650625228882, 'learning_rate': 1.3379771092940493e-05, 'epoch': 1.23} +2025-02-05 19:25:15 - ERROR - stderr - 41%|████ | 9166/22434 [9:17:35<9:30:39, 2.58s/it] +2025-02-05 19:25:17 - ERROR - stderr - 41%|████ | 9167/22434 
[9:17:37<9:26:20, 2.56s/it] +2025-02-05 19:25:18 - ERROR - stderr - +2025-02-05 19:25:18 - ERROR - stderr - +2025-02-05 19:25:18 - INFO - stdout - {'loss': 0.6686, 'grad_norm': 1.0856274366378784, 'learning_rate': 1.3378412269334117e-05, 'epoch': 1.23} +2025-02-05 19:25:18 - ERROR - stderr - 41%|████ | 9167/22434 [9:17:37<9:26:20, 2.56s/it] +2025-02-05 19:25:20 - ERROR - stderr - 41%|████ | 9168/22434 [9:17:40<9:27:41, 2.57s/it] +2025-02-05 19:25:20 - ERROR - stderr - +2025-02-05 19:25:20 - ERROR - stderr - +2025-02-05 19:25:20 - INFO - stdout - {'loss': 0.7101, 'grad_norm': 1.1868494749069214, 'learning_rate': 1.3377053375307974e-05, 'epoch': 1.23} +2025-02-05 19:25:20 - ERROR - stderr - 41%|████ | 9168/22434 [9:17:40<9:27:41, 2.57s/it] +2025-02-05 19:25:23 - ERROR - stderr - 41%|████ | 9169/22434 [9:17:43<9:35:23, 2.60s/it] +2025-02-05 19:25:23 - ERROR - stderr - +2025-02-05 19:25:23 - ERROR - stderr - +2025-02-05 19:25:23 - INFO - stdout - {'loss': 0.7359, 'grad_norm': 1.0710448026657104, 'learning_rate': 1.337569441089038e-05, 'epoch': 1.23} +2025-02-05 19:25:23 - ERROR - stderr - 41%|████ | 9169/22434 [9:17:43<9:35:23, 2.60s/it] +2025-02-05 19:25:25 - ERROR - stderr - 41%|████ | 9170/22434 [9:17:45<9:31:21, 2.58s/it] +2025-02-05 19:25:25 - ERROR - stderr - +2025-02-05 19:25:25 - ERROR - stderr - +2025-02-05 19:25:25 - INFO - stdout - {'loss': 0.6601, 'grad_norm': 1.1362768411636353, 'learning_rate': 1.3374335376109668e-05, 'epoch': 1.23} +2025-02-05 19:25:25 - ERROR - stderr - 41%|████ | 9170/22434 [9:17:45<9:31:21, 2.58s/it] +2025-02-05 19:25:28 - ERROR - stderr - 41%|████ | 9171/22434 [9:17:48<9:28:04, 2.57s/it] +2025-02-05 19:25:28 - ERROR - stderr - +2025-02-05 19:25:28 - ERROR - stderr - +2025-02-05 19:25:28 - INFO - stdout - {'loss': 0.6771, 'grad_norm': 1.091895580291748, 'learning_rate': 1.3372976270994164e-05, 'epoch': 1.23} +2025-02-05 19:25:28 - ERROR - stderr - 41%|████ | 9171/22434 [9:17:48<9:28:04, 2.57s/it] +2025-02-05 19:25:31 - ERROR - stderr 
- 41%|████ | 9172/22434 [9:17:50<9:37:44, 2.61s/it] +2025-02-05 19:25:31 - ERROR - stderr - +2025-02-05 19:25:31 - ERROR - stderr - +2025-02-05 19:25:31 - INFO - stdout - {'loss': 0.6542, 'grad_norm': 1.042287826538086, 'learning_rate': 1.3371617095572199e-05, 'epoch': 1.23} +2025-02-05 19:25:31 - ERROR - stderr - 41%|████ | 9172/22434 [9:17:50<9:37:44, 2.61s/it] +2025-02-05 19:25:33 - ERROR - stderr - 41%|████ | 9173/22434 [9:17:53<9:33:06, 2.59s/it] +2025-02-05 19:25:33 - ERROR - stderr - +2025-02-05 19:25:33 - ERROR - stderr - +2025-02-05 19:25:33 - INFO - stdout - {'loss': 0.6042, 'grad_norm': 1.0992852449417114, 'learning_rate': 1.3370257849872102e-05, 'epoch': 1.23} +2025-02-05 19:25:33 - ERROR - stderr - 41%|████ | 9173/22434 [9:17:53<9:33:06, 2.59s/it] +2025-02-05 19:25:36 - ERROR - stderr - 41%|████ | 9174/22434 [9:17:55<9:24:24, 2.55s/it] +2025-02-05 19:25:36 - ERROR - stderr - +2025-02-05 19:25:36 - ERROR - stderr - +2025-02-05 19:25:36 - INFO - stdout - {'loss': 0.7921, 'grad_norm': 1.1706446409225464, 'learning_rate': 1.3368898533922202e-05, 'epoch': 1.23} +2025-02-05 19:25:36 - ERROR - stderr - 41%|████ | 9174/22434 [9:17:55<9:24:24, 2.55s/it] +2025-02-05 19:25:38 - ERROR - stderr - 41%|████ | 9175/22434 [9:17:58<9:20:46, 2.54s/it] +2025-02-05 19:25:38 - ERROR - stderr - +2025-02-05 19:25:38 - ERROR - stderr - +2025-02-05 19:25:38 - INFO - stdout - {'loss': 0.6841, 'grad_norm': 1.0392574071884155, 'learning_rate': 1.3367539147750837e-05, 'epoch': 1.23} +2025-02-05 19:25:38 - ERROR - stderr - 41%|████ | 9175/22434 [9:17:58<9:20:46, 2.54s/it] +2025-02-05 19:25:41 - ERROR - stderr - 41%|████ | 9176/22434 [9:18:00<9:24:06, 2.55s/it] +2025-02-05 19:25:41 - ERROR - stderr - +2025-02-05 19:25:41 - ERROR - stderr - +2025-02-05 19:25:41 - INFO - stdout - {'loss': 0.7552, 'grad_norm': 1.2831294536590576, 'learning_rate': 1.336617969138634e-05, 'epoch': 1.23} +2025-02-05 19:25:41 - ERROR - stderr - 41%|████ | 9176/22434 [9:18:00<9:24:06, 2.55s/it] +2025-02-05 
19:25:43 - ERROR - stderr - 41%|████ | 9177/22434 [9:18:03<9:18:31, 2.53s/it] +2025-02-05 19:25:43 - ERROR - stderr - +2025-02-05 19:25:43 - ERROR - stderr - +2025-02-05 19:25:43 - INFO - stdout - {'loss': 0.6336, 'grad_norm': 1.11100435256958, 'learning_rate': 1.3364820164857053e-05, 'epoch': 1.23} +2025-02-05 19:25:43 - ERROR - stderr - 41%|████ | 9177/22434 [9:18:03<9:18:31, 2.53s/it] +2025-02-05 19:25:46 - ERROR - stderr - 41%|████ | 9178/22434 [9:18:05<9:20:48, 2.54s/it] +2025-02-05 19:25:46 - ERROR - stderr - +2025-02-05 19:25:46 - ERROR - stderr - +2025-02-05 19:25:46 - INFO - stdout - {'loss': 0.6763, 'grad_norm': 1.1974517107009888, 'learning_rate': 1.3363460568191306e-05, 'epoch': 1.23} +2025-02-05 19:25:46 - ERROR - stderr - 41%|████ | 9178/22434 [9:18:05<9:20:48, 2.54s/it] +2025-02-05 19:25:48 - ERROR - stderr - 41%|████ | 9179/22434 [9:18:08<9:18:35, 2.53s/it] +2025-02-05 19:25:48 - ERROR - stderr - +2025-02-05 19:25:48 - ERROR - stderr - +2025-02-05 19:25:48 - INFO - stdout - {'loss': 0.7089, 'grad_norm': 1.1916143894195557, 'learning_rate': 1.336210090141744e-05, 'epoch': 1.23} +2025-02-05 19:25:48 - ERROR - stderr - 41%|████ | 9179/22434 [9:18:08<9:18:35, 2.53s/it] +2025-02-05 19:25:51 - ERROR - stderr - 41%|████ | 9180/22434 [9:18:10<9:13:13, 2.50s/it] +2025-02-05 19:25:51 - ERROR - stderr - +2025-02-05 19:25:51 - ERROR - stderr - +2025-02-05 19:25:51 - INFO - stdout - {'loss': 0.7639, 'grad_norm': 1.3191468715667725, 'learning_rate': 1.3360741164563797e-05, 'epoch': 1.23} +2025-02-05 19:25:51 - ERROR - stderr - 41%|████ | 9180/22434 [9:18:10<9:13:13, 2.50s/it] +2025-02-05 19:25:53 - ERROR - stderr - 41%|████ | 9181/22434 [9:18:13<9:10:04, 2.49s/it] +2025-02-05 19:25:53 - ERROR - stderr - +2025-02-05 19:25:53 - ERROR - stderr - +2025-02-05 19:25:53 - INFO - stdout - {'loss': 0.7305, 'grad_norm': 1.2226437330245972, 'learning_rate': 1.3359381357658728e-05, 'epoch': 1.23} +2025-02-05 19:25:53 - ERROR - stderr - 41%|████ | 9181/22434 [9:18:13<9:10:04, 
2.49s/it] +2025-02-05 19:25:56 - ERROR - stderr - 41%|████ | 9182/22434 [9:18:15<9:17:05, 2.52s/it] +2025-02-05 19:25:56 - ERROR - stderr - +2025-02-05 19:25:56 - ERROR - stderr - +2025-02-05 19:25:56 - INFO - stdout - {'loss': 0.7693, 'grad_norm': 1.098572015762329, 'learning_rate': 1.3358021480730563e-05, 'epoch': 1.23} +2025-02-05 19:25:56 - ERROR - stderr - 41%|████ | 9182/22434 [9:18:15<9:17:05, 2.52s/it] +2025-02-05 19:25:58 - ERROR - stderr - 41%|████ | 9183/22434 [9:18:18<9:13:40, 2.51s/it] +2025-02-05 19:25:58 - ERROR - stderr - +2025-02-05 19:25:58 - ERROR - stderr - +2025-02-05 19:25:58 - INFO - stdout - {'loss': 0.7106, 'grad_norm': 1.2061750888824463, 'learning_rate': 1.3356661533807655e-05, 'epoch': 1.23} +2025-02-05 19:25:58 - ERROR - stderr - 41%|████ | 9183/22434 [9:18:18<9:13:40, 2.51s/it] +2025-02-05 19:26:01 - ERROR - stderr - 41%|████ | 9184/22434 [9:18:20<9:19:13, 2.53s/it] +2025-02-05 19:26:01 - ERROR - stderr - +2025-02-05 19:26:01 - ERROR - stderr - +2025-02-05 19:26:01 - INFO - stdout - {'loss': 0.7862, 'grad_norm': 1.1974519491195679, 'learning_rate': 1.3355301516918348e-05, 'epoch': 1.23} +2025-02-05 19:26:01 - ERROR - stderr - 41%|████ | 9184/22434 [9:18:21<9:19:13, 2.53s/it] +2025-02-05 19:26:03 - ERROR - stderr - 41%|████ | 9185/22434 [9:18:23<9:16:33, 2.52s/it] +2025-02-05 19:26:03 - ERROR - stderr - +2025-02-05 19:26:03 - ERROR - stderr - +2025-02-05 19:26:03 - INFO - stdout - {'loss': 0.7608, 'grad_norm': 1.2106926441192627, 'learning_rate': 1.3353941430090992e-05, 'epoch': 1.23} +2025-02-05 19:26:03 - ERROR - stderr - 41%|████ | 9185/22434 [9:18:23<9:16:33, 2.52s/it] +2025-02-05 19:26:06 - ERROR - stderr - 41%|████ | 9186/22434 [9:18:25<9:15:00, 2.51s/it] +2025-02-05 19:26:06 - ERROR - stderr - +2025-02-05 19:26:06 - ERROR - stderr - +2025-02-05 19:26:06 - INFO - stdout - {'loss': 0.7511, 'grad_norm': 1.2067160606384277, 'learning_rate': 1.335258127335394e-05, 'epoch': 1.23} +2025-02-05 19:26:06 - ERROR - stderr - 41%|████ | 
9186/22434 [9:18:26<9:15:00, 2.51s/it] +2025-02-05 19:26:08 - ERROR - stderr - 41%|████ | 9187/22434 [9:18:28<9:23:37, 2.55s/it] +2025-02-05 19:26:08 - ERROR - stderr - +2025-02-05 19:26:08 - ERROR - stderr - +2025-02-05 19:26:08 - INFO - stdout - {'loss': 0.6509, 'grad_norm': 1.0389653444290161, 'learning_rate': 1.3351221046735533e-05, 'epoch': 1.23} +2025-02-05 19:26:08 - ERROR - stderr - 41%|████ | 9187/22434 [9:18:28<9:23:37, 2.55s/it] +2025-02-05 19:26:11 - ERROR - stderr - 41%|████ | 9188/22434 [9:18:31<9:18:34, 2.53s/it] +2025-02-05 19:26:11 - ERROR - stderr - +2025-02-05 19:26:11 - ERROR - stderr - +2025-02-05 19:26:11 - INFO - stdout - {'loss': 0.708, 'grad_norm': 1.1797090768814087, 'learning_rate': 1.3349860750264134e-05, 'epoch': 1.23} +2025-02-05 19:26:11 - ERROR - stderr - 41%|████ | 9188/22434 [9:18:31<9:18:34, 2.53s/it] +2025-02-05 19:26:13 - ERROR - stderr - 41%|████ | 9189/22434 [9:18:33<9:15:15, 2.52s/it] +2025-02-05 19:26:13 - ERROR - stderr - +2025-02-05 19:26:13 - ERROR - stderr - +2025-02-05 19:26:13 - INFO - stdout - {'loss': 0.7475, 'grad_norm': 1.164821982383728, 'learning_rate': 1.3348500383968095e-05, 'epoch': 1.23} +2025-02-05 19:26:13 - ERROR - stderr - 41%|████ | 9189/22434 [9:18:33<9:15:15, 2.52s/it] +2025-02-05 19:26:16 - ERROR - stderr - 41%|████ | 9190/22434 [9:18:36<9:12:08, 2.50s/it] +2025-02-05 19:26:16 - ERROR - stderr - +2025-02-05 19:26:16 - ERROR - stderr - +2025-02-05 19:26:16 - INFO - stdout - {'loss': 0.7659, 'grad_norm': 1.190679669380188, 'learning_rate': 1.3347139947875767e-05, 'epoch': 1.23} +2025-02-05 19:26:16 - ERROR - stderr - 41%|████ | 9190/22434 [9:18:36<9:12:08, 2.50s/it] +2025-02-05 19:26:18 - ERROR - stderr - 41%|████ | 9191/22434 [9:18:38<9:12:09, 2.50s/it] +2025-02-05 19:26:18 - ERROR - stderr - +2025-02-05 19:26:18 - ERROR - stderr - +2025-02-05 19:26:18 - INFO - stdout - {'loss': 0.7308, 'grad_norm': 1.1989288330078125, 'learning_rate': 1.3345779442015512e-05, 'epoch': 1.23} +2025-02-05 19:26:18 - ERROR 
- stderr - 41%|████ | 9191/22434 [9:18:38<9:12:09, 2.50s/it] +2025-02-05 19:26:21 - ERROR - stderr - 41%|████ | 9192/22434 [9:18:41<9:12:22, 2.50s/it] +2025-02-05 19:26:21 - ERROR - stderr - +2025-02-05 19:26:21 - ERROR - stderr - +2025-02-05 19:26:21 - INFO - stdout - {'loss': 0.7444, 'grad_norm': 1.3162899017333984, 'learning_rate': 1.3344418866415683e-05, 'epoch': 1.23} +2025-02-05 19:26:21 - ERROR - stderr - 41%|████ | 9192/22434 [9:18:41<9:12:22, 2.50s/it] +2025-02-05 19:26:23 - ERROR - stderr - 41%|████ | 9193/22434 [9:18:43<9:10:51, 2.50s/it] +2025-02-05 19:26:23 - ERROR - stderr - +2025-02-05 19:26:23 - ERROR - stderr - +2025-02-05 19:26:23 - INFO - stdout - {'loss': 0.7566, 'grad_norm': 1.1718229055404663, 'learning_rate': 1.3343058221104643e-05, 'epoch': 1.23} +2025-02-05 19:26:23 - ERROR - stderr - 41%|████ | 9193/22434 [9:18:43<9:10:51, 2.50s/it] +2025-02-05 19:26:26 - ERROR - stderr - 41%|████ | 9194/22434 [9:18:46<9:12:13, 2.50s/it] +2025-02-05 19:26:26 - ERROR - stderr - +2025-02-05 19:26:26 - ERROR - stderr - +2025-02-05 19:26:26 - INFO - stdout - {'loss': 0.7782, 'grad_norm': 1.1812809705734253, 'learning_rate': 1.3341697506110753e-05, 'epoch': 1.23} +2025-02-05 19:26:26 - ERROR - stderr - 41%|████ | 9194/22434 [9:18:46<9:12:13, 2.50s/it] +2025-02-05 19:26:28 - ERROR - stderr - 41%|████ | 9195/22434 [9:18:48<9:17:12, 2.53s/it] +2025-02-05 19:26:28 - ERROR - stderr - +2025-02-05 19:26:28 - ERROR - stderr - +2025-02-05 19:26:28 - INFO - stdout - {'loss': 0.6426, 'grad_norm': 0.9140693545341492, 'learning_rate': 1.334033672146238e-05, 'epoch': 1.23} +2025-02-05 19:26:28 - ERROR - stderr - 41%|████ | 9195/22434 [9:18:48<9:17:12, 2.53s/it] +2025-02-05 19:26:31 - ERROR - stderr - 41%|████ | 9196/22434 [9:18:51<9:43:35, 2.65s/it] +2025-02-05 19:26:31 - ERROR - stderr - +2025-02-05 19:26:31 - ERROR - stderr - +2025-02-05 19:26:31 - INFO - stdout - {'loss': 0.6872, 'grad_norm': 1.1095671653747559, 'learning_rate': 1.333897586718788e-05, 'epoch': 1.23} 
+2025-02-05 19:26:31 - ERROR - stderr - 41%|████ | 9196/22434 [9:18:51<9:43:35, 2.65s/it] +2025-02-05 19:26:34 - ERROR - stderr - 41%|████ | 9197/22434 [9:18:54<9:45:25, 2.65s/it] +2025-02-05 19:26:34 - ERROR - stderr - +2025-02-05 19:26:34 - ERROR - stderr - +2025-02-05 19:26:34 - INFO - stdout - {'loss': 0.6941, 'grad_norm': 0.9914871454238892, 'learning_rate': 1.3337614943315629e-05, 'epoch': 1.23} +2025-02-05 19:26:34 - ERROR - stderr - 41%|████ | 9197/22434 [9:18:54<9:45:25, 2.65s/it] +2025-02-05 19:26:36 - ERROR - stderr - 41%|████ | 9198/22434 [9:18:56<9:34:54, 2.61s/it] +2025-02-05 19:26:37 - ERROR - stderr - +2025-02-05 19:26:37 - ERROR - stderr - +2025-02-05 19:26:37 - INFO - stdout - {'loss': 0.7923, 'grad_norm': 1.3530595302581787, 'learning_rate': 1.3336253949873983e-05, 'epoch': 1.23} +2025-02-05 19:26:37 - ERROR - stderr - 41%|████ | 9198/22434 [9:18:56<9:34:54, 2.61s/it] +2025-02-05 19:26:39 - ERROR - stderr - 41%|████ | 9199/22434 [9:18:59<9:29:45, 2.58s/it] +2025-02-05 19:26:39 - ERROR - stderr - +2025-02-05 19:26:39 - ERROR - stderr - +2025-02-05 19:26:39 - INFO - stdout - {'loss': 0.7561, 'grad_norm': 1.1554523706436157, 'learning_rate': 1.3334892886891316e-05, 'epoch': 1.23} +2025-02-05 19:26:39 - ERROR - stderr - 41%|████ | 9199/22434 [9:18:59<9:29:45, 2.58s/it] +2025-02-05 19:26:42 - ERROR - stderr - 41%|████ | 9200/22434 [9:19:01<9:27:28, 2.57s/it] +2025-02-05 19:26:42 - ERROR - stderr - +2025-02-05 19:26:42 - ERROR - stderr - +2025-02-05 19:26:42 - INFO - stdout - {'loss': 0.715, 'grad_norm': 1.1662774085998535, 'learning_rate': 1.3333531754395996e-05, 'epoch': 1.23} +2025-02-05 19:26:42 - ERROR - stderr - 41%|████ | 9200/22434 [9:19:01<9:27:28, 2.57s/it] +2025-02-05 19:26:44 - ERROR - stderr - 41%|████ | 9201/22434 [9:19:04<9:19:02, 2.53s/it] +2025-02-05 19:26:44 - ERROR - stderr - +2025-02-05 19:26:44 - ERROR - stderr - +2025-02-05 19:26:44 - INFO - stdout - {'loss': 0.786, 'grad_norm': 1.3221659660339355, 'learning_rate': 
1.3332170552416403e-05, 'epoch': 1.23} +2025-02-05 19:26:44 - ERROR - stderr - 41%|████ | 9201/22434 [9:19:04<9:19:02, 2.53s/it] +2025-02-05 19:26:47 - ERROR - stderr - 41%|████ | 9202/22434 [9:19:06<9:18:01, 2.53s/it] +2025-02-05 19:26:47 - ERROR - stderr - +2025-02-05 19:26:47 - ERROR - stderr - +2025-02-05 19:26:47 - INFO - stdout - {'loss': 0.7722, 'grad_norm': 1.153968334197998, 'learning_rate': 1.3330809280980899e-05, 'epoch': 1.23} +2025-02-05 19:26:47 - ERROR - stderr - 41%|████ | 9202/22434 [9:19:06<9:18:01, 2.53s/it] +2025-02-05 19:26:49 - ERROR - stderr - 41%|████ | 9203/22434 [9:19:09<9:17:07, 2.53s/it] +2025-02-05 19:26:49 - ERROR - stderr - +2025-02-05 19:26:49 - ERROR - stderr - +2025-02-05 19:26:49 - INFO - stdout - {'loss': 0.7505, 'grad_norm': 1.2599009275436401, 'learning_rate': 1.3329447940117863e-05, 'epoch': 1.23} +2025-02-05 19:26:49 - ERROR - stderr - 41%|████ | 9203/22434 [9:19:09<9:17:07, 2.53s/it] +2025-02-05 19:26:52 - ERROR - stderr - 41%|████ | 9204/22434 [9:19:11<9:13:52, 2.51s/it] +2025-02-05 19:26:52 - ERROR - stderr - +2025-02-05 19:26:52 - ERROR - stderr - +2025-02-05 19:26:52 - INFO - stdout - {'loss': 0.7665, 'grad_norm': 1.263197898864746, 'learning_rate': 1.3328086529855672e-05, 'epoch': 1.23} +2025-02-05 19:26:52 - ERROR - stderr - 41%|████ | 9204/22434 [9:19:11<9:13:52, 2.51s/it] +2025-02-05 19:26:54 - ERROR - stderr - 41%|████ | 9205/22434 [9:19:14<9:08:22, 2.49s/it] +2025-02-05 19:26:54 - ERROR - stderr - +2025-02-05 19:26:54 - ERROR - stderr - +2025-02-05 19:26:54 - INFO - stdout - {'loss': 0.6569, 'grad_norm': 1.2896510362625122, 'learning_rate': 1.33267250502227e-05, 'epoch': 1.23} +2025-02-05 19:26:54 - ERROR - stderr - 41%|████ | 9205/22434 [9:19:14<9:08:22, 2.49s/it] +2025-02-05 19:26:56 - ERROR - stderr - 41%|████ | 9206/22434 [9:19:16<9:08:53, 2.49s/it] +2025-02-05 19:26:56 - ERROR - stderr - +2025-02-05 19:26:56 - ERROR - stderr - +2025-02-05 19:26:56 - INFO - stdout - {'loss': 0.6312, 'grad_norm': 
1.1006722450256348, 'learning_rate': 1.332536350124733e-05, 'epoch': 1.23} +2025-02-05 19:26:56 - ERROR - stderr - 41%|████ | 9206/22434 [9:19:16<9:08:53, 2.49s/it] +2025-02-05 19:26:59 - ERROR - stderr - 41%|████ | 9207/22434 [9:19:19<9:09:42, 2.49s/it] +2025-02-05 19:26:59 - ERROR - stderr - +2025-02-05 19:26:59 - ERROR - stderr - +2025-02-05 19:26:59 - INFO - stdout - {'loss': 0.7693, 'grad_norm': 1.158721923828125, 'learning_rate': 1.3324001882957938e-05, 'epoch': 1.23} +2025-02-05 19:26:59 - ERROR - stderr - 41%|████ | 9207/22434 [9:19:19<9:09:42, 2.49s/it] +2025-02-05 19:27:02 - ERROR - stderr - 41%|████ | 9208/22434 [9:19:21<9:16:44, 2.53s/it] +2025-02-05 19:27:02 - ERROR - stderr - +2025-02-05 19:27:02 - ERROR - stderr - +2025-02-05 19:27:02 - INFO - stdout - {'loss': 0.7002, 'grad_norm': 1.136100172996521, 'learning_rate': 1.3322640195382908e-05, 'epoch': 1.23} +2025-02-05 19:27:02 - ERROR - stderr - 41%|████ | 9208/22434 [9:19:21<9:16:44, 2.53s/it] +2025-02-05 19:27:04 - ERROR - stderr - 41%|████ | 9209/22434 [9:19:24<9:10:41, 2.50s/it] +2025-02-05 19:27:04 - ERROR - stderr - +2025-02-05 19:27:04 - ERROR - stderr - +2025-02-05 19:27:04 - INFO - stdout - {'loss': 0.8045, 'grad_norm': 1.2908204793930054, 'learning_rate': 1.3321278438550625e-05, 'epoch': 1.23} +2025-02-05 19:27:04 - ERROR - stderr - 41%|████ | 9209/22434 [9:19:24<9:10:41, 2.50s/it] +2025-02-05 19:27:06 - ERROR - stderr - 41%|████ | 9210/22434 [9:19:26<9:06:35, 2.48s/it] +2025-02-05 19:27:06 - ERROR - stderr - +2025-02-05 19:27:06 - ERROR - stderr - +2025-02-05 19:27:06 - INFO - stdout - {'loss': 0.7081, 'grad_norm': 1.2968108654022217, 'learning_rate': 1.3319916612489468e-05, 'epoch': 1.23} +2025-02-05 19:27:06 - ERROR - stderr - 41%|████ | 9210/22434 [9:19:26<9:06:35, 2.48s/it] +2025-02-05 19:27:09 - ERROR - stderr - 41%|████ | 9211/22434 [9:19:29<9:10:57, 2.50s/it] +2025-02-05 19:27:09 - ERROR - stderr - +2025-02-05 19:27:09 - ERROR - stderr - +2025-02-05 19:27:09 - INFO - stdout - 
{'loss': 0.7008, 'grad_norm': 1.1575469970703125, 'learning_rate': 1.3318554717227827e-05, 'epoch': 1.23} +2025-02-05 19:27:09 - ERROR - stderr - 41%|████ | 9211/22434 [9:19:29<9:10:57, 2.50s/it] +2025-02-05 19:27:11 - ERROR - stderr - 41%|████ | 9212/22434 [9:19:31<9:11:53, 2.50s/it] +2025-02-05 19:27:12 - ERROR - stderr - +2025-02-05 19:27:12 - ERROR - stderr - +2025-02-05 19:27:12 - INFO - stdout - {'loss': 0.7049, 'grad_norm': 1.1811562776565552, 'learning_rate': 1.3317192752794086e-05, 'epoch': 1.23} +2025-02-05 19:27:12 - ERROR - stderr - 41%|████ | 9212/22434 [9:19:31<9:11:53, 2.50s/it] +2025-02-05 19:27:14 - ERROR - stderr - 41%|████ | 9213/22434 [9:19:34<9:12:07, 2.51s/it] +2025-02-05 19:27:14 - ERROR - stderr - +2025-02-05 19:27:14 - ERROR - stderr - +2025-02-05 19:27:14 - INFO - stdout - {'loss': 0.7106, 'grad_norm': 1.0972161293029785, 'learning_rate': 1.331583071921664e-05, 'epoch': 1.23} +2025-02-05 19:27:14 - ERROR - stderr - 41%|████ | 9213/22434 [9:19:34<9:12:07, 2.51s/it] +2025-02-05 19:27:16 - ERROR - stderr - 41%|████ | 9214/22434 [9:19:36<9:07:58, 2.49s/it] +2025-02-05 19:27:16 - ERROR - stderr - +2025-02-05 19:27:16 - ERROR - stderr - +2025-02-05 19:27:16 - INFO - stdout - {'loss': 0.7063, 'grad_norm': 1.245848298072815, 'learning_rate': 1.3314468616523874e-05, 'epoch': 1.23} +2025-02-05 19:27:16 - ERROR - stderr - 41%|████ | 9214/22434 [9:19:36<9:07:58, 2.49s/it] +2025-02-05 19:27:19 - ERROR - stderr - 41%|████ | 9215/22434 [9:19:39<9:08:01, 2.49s/it] +2025-02-05 19:27:19 - ERROR - stderr - +2025-02-05 19:27:19 - ERROR - stderr - +2025-02-05 19:27:19 - INFO - stdout - {'loss': 0.718, 'grad_norm': 1.20819890499115, 'learning_rate': 1.3313106444744181e-05, 'epoch': 1.23} +2025-02-05 19:27:19 - ERROR - stderr - 41%|████ | 9215/22434 [9:19:39<9:08:01, 2.49s/it] +2025-02-05 19:27:21 - ERROR - stderr - 41%|████ | 9216/22434 [9:19:41<9:09:47, 2.50s/it] +2025-02-05 19:27:21 - ERROR - stderr - +2025-02-05 19:27:21 - ERROR - stderr - +2025-02-05 
19:27:21 - INFO - stdout - {'loss': 0.671, 'grad_norm': 1.063406229019165, 'learning_rate': 1.3311744203905957e-05, 'epoch': 1.23} +2025-02-05 19:27:21 - ERROR - stderr - 41%|████ | 9216/22434 [9:19:41<9:09:47, 2.50s/it] +2025-02-05 19:27:24 - ERROR - stderr - 41%|████ | 9217/22434 [9:19:44<9:07:20, 2.48s/it] +2025-02-05 19:27:24 - ERROR - stderr - +2025-02-05 19:27:24 - ERROR - stderr - +2025-02-05 19:27:24 - INFO - stdout - {'loss': 0.6896, 'grad_norm': 1.1256377696990967, 'learning_rate': 1.3310381894037589e-05, 'epoch': 1.23} +2025-02-05 19:27:24 - ERROR - stderr - 41%|████ | 9217/22434 [9:19:44<9:07:20, 2.48s/it] +2025-02-05 19:27:26 - ERROR - stderr - 41%|████ | 9218/22434 [9:19:46<9:07:18, 2.48s/it] +2025-02-05 19:27:26 - ERROR - stderr - +2025-02-05 19:27:26 - ERROR - stderr - +2025-02-05 19:27:26 - INFO - stdout - {'loss': 0.8054, 'grad_norm': 1.3882501125335693, 'learning_rate': 1.3309019515167481e-05, 'epoch': 1.23} +2025-02-05 19:27:26 - ERROR - stderr - 41%|████ | 9218/22434 [9:19:46<9:07:18, 2.48s/it] +2025-02-05 19:27:29 - ERROR - stderr - 41%|████ | 9219/22434 [9:19:49<9:10:34, 2.50s/it] +2025-02-05 19:27:29 - ERROR - stderr - +2025-02-05 19:27:29 - ERROR - stderr - +2025-02-05 19:27:29 - INFO - stdout - {'loss': 0.7419, 'grad_norm': 1.1527929306030273, 'learning_rate': 1.3307657067324029e-05, 'epoch': 1.23} +2025-02-05 19:27:29 - ERROR - stderr - 41%|████ | 9219/22434 [9:19:49<9:10:34, 2.50s/it] +2025-02-05 19:27:31 - ERROR - stderr - 41%|████ | 9220/22434 [9:19:51<9:12:25, 2.51s/it] +2025-02-05 19:27:31 - ERROR - stderr - +2025-02-05 19:27:31 - ERROR - stderr - +2025-02-05 19:27:31 - INFO - stdout - {'loss': 0.8162, 'grad_norm': 1.3470364809036255, 'learning_rate': 1.3306294550535627e-05, 'epoch': 1.23} +2025-02-05 19:27:31 - ERROR - stderr - 41%|████ | 9220/22434 [9:19:51<9:12:25, 2.51s/it] +2025-02-05 19:27:34 - ERROR - stderr - 41%|████ | 9221/22434 [9:19:54<9:07:50, 2.49s/it] +2025-02-05 19:27:34 - ERROR - stderr - +2025-02-05 19:27:34 - ERROR 
- stderr - +2025-02-05 19:27:34 - INFO - stdout - {'loss': 0.7286, 'grad_norm': 1.2529911994934082, 'learning_rate': 1.3304931964830683e-05, 'epoch': 1.23} +2025-02-05 19:27:34 - ERROR - stderr - 41%|████ | 9221/22434 [9:19:54<9:07:50, 2.49s/it] +2025-02-05 19:27:36 - ERROR - stderr - 41%|████ | 9222/22434 [9:19:56<9:06:59, 2.48s/it] +2025-02-05 19:27:36 - ERROR - stderr - +2025-02-05 19:27:36 - ERROR - stderr - +2025-02-05 19:27:36 - INFO - stdout - {'loss': 0.7296, 'grad_norm': 1.2772401571273804, 'learning_rate': 1.3303569310237593e-05, 'epoch': 1.23} +2025-02-05 19:27:36 - ERROR - stderr - 41%|████ | 9222/22434 [9:19:56<9:06:59, 2.48s/it] +2025-02-05 19:27:39 - ERROR - stderr - 41%|████ | 9223/22434 [9:19:59<9:06:21, 2.48s/it] +2025-02-05 19:27:39 - ERROR - stderr - +2025-02-05 19:27:39 - ERROR - stderr - +2025-02-05 19:27:39 - INFO - stdout - {'loss': 0.7089, 'grad_norm': 1.30802321434021, 'learning_rate': 1.3302206586784762e-05, 'epoch': 1.23} +2025-02-05 19:27:39 - ERROR - stderr - 41%|████ | 9223/22434 [9:19:59<9:06:21, 2.48s/it] +2025-02-05 19:27:41 - ERROR - stderr - 41%|████ | 9224/22434 [9:20:01<9:06:09, 2.48s/it] +2025-02-05 19:27:41 - ERROR - stderr - +2025-02-05 19:27:41 - ERROR - stderr - +2025-02-05 19:27:41 - INFO - stdout - {'loss': 0.7568, 'grad_norm': 1.2973982095718384, 'learning_rate': 1.3300843794500593e-05, 'epoch': 1.23} +2025-02-05 19:27:41 - ERROR - stderr - 41%|████ | 9224/22434 [9:20:01<9:06:09, 2.48s/it] +2025-02-05 19:27:44 - ERROR - stderr - 41%|████ | 9225/22434 [9:20:03<9:01:46, 2.46s/it] +2025-02-05 19:27:44 - ERROR - stderr - +2025-02-05 19:27:44 - ERROR - stderr - +2025-02-05 19:27:44 - INFO - stdout - {'loss': 0.8263, 'grad_norm': 1.3099365234375, 'learning_rate': 1.3299480933413495e-05, 'epoch': 1.23} +2025-02-05 19:27:44 - ERROR - stderr - 41%|████ | 9225/22434 [9:20:04<9:01:46, 2.46s/it] +2025-02-05 19:27:46 - ERROR - stderr - 41%|████ | 9226/22434 [9:20:06<9:03:10, 2.47s/it] +2025-02-05 19:27:46 - ERROR - stderr - 
+2025-02-05 19:27:46 - ERROR - stderr - +2025-02-05 19:27:46 - INFO - stdout - {'loss': 0.7334, 'grad_norm': 1.1525856256484985, 'learning_rate': 1.3298118003551875e-05, 'epoch': 1.23} +2025-02-05 19:27:46 - ERROR - stderr - 41%|████ | 9226/22434 [9:20:06<9:03:10, 2.47s/it] +2025-02-05 19:27:49 - ERROR - stderr - 41%|████ | 9227/22434 [9:20:09<9:33:25, 2.61s/it] +2025-02-05 19:27:49 - ERROR - stderr - +2025-02-05 19:27:49 - ERROR - stderr - +2025-02-05 19:27:49 - INFO - stdout - {'loss': 0.7283, 'grad_norm': 1.1708300113677979, 'learning_rate': 1.329675500494414e-05, 'epoch': 1.23} +2025-02-05 19:27:49 - ERROR - stderr - 41%|████ | 9227/22434 [9:20:09<9:33:25, 2.61s/it] +2025-02-05 19:27:52 - ERROR - stderr - 41%|████ | 9228/22434 [9:20:11<9:27:24, 2.58s/it] +2025-02-05 19:27:52 - ERROR - stderr - +2025-02-05 19:27:52 - ERROR - stderr - +2025-02-05 19:27:52 - INFO - stdout - {'loss': 0.7685, 'grad_norm': 1.1763737201690674, 'learning_rate': 1.32953919376187e-05, 'epoch': 1.23} +2025-02-05 19:27:52 - ERROR - stderr - 41%|████ | 9228/22434 [9:20:11<9:27:24, 2.58s/it] +2025-02-05 19:27:54 - ERROR - stderr - 41%|████ | 9229/22434 [9:20:14<9:21:32, 2.55s/it] +2025-02-05 19:27:54 - ERROR - stderr - +2025-02-05 19:27:54 - ERROR - stderr - +2025-02-05 19:27:54 - INFO - stdout - {'loss': 0.6866, 'grad_norm': 1.0706772804260254, 'learning_rate': 1.3294028801603973e-05, 'epoch': 1.23} +2025-02-05 19:27:54 - ERROR - stderr - 41%|████ | 9229/22434 [9:20:14<9:21:32, 2.55s/it] +2025-02-05 19:27:57 - ERROR - stderr - 41%|████ | 9230/22434 [9:20:16<9:17:38, 2.53s/it] +2025-02-05 19:27:57 - ERROR - stderr - +2025-02-05 19:27:57 - ERROR - stderr - +2025-02-05 19:27:57 - INFO - stdout - {'loss': 0.7378, 'grad_norm': 1.225495457649231, 'learning_rate': 1.3292665596928365e-05, 'epoch': 1.23} +2025-02-05 19:27:57 - ERROR - stderr - 41%|████ | 9230/22434 [9:20:16<9:17:38, 2.53s/it] +2025-02-05 19:27:59 - ERROR - stderr - 41%|████ | 9231/22434 [9:20:19<9:15:15, 2.52s/it] +2025-02-05 
19:27:59 - ERROR - stderr - +2025-02-05 19:27:59 - ERROR - stderr - +2025-02-05 19:27:59 - INFO - stdout - {'loss': 0.6813, 'grad_norm': 1.171221137046814, 'learning_rate': 1.329130232362029e-05, 'epoch': 1.23} +2025-02-05 19:27:59 - ERROR - stderr - 41%|████ | 9231/22434 [9:20:19<9:15:15, 2.52s/it] +2025-02-05 19:28:02 - ERROR - stderr - 41%|████ | 9232/22434 [9:20:21<9:16:53, 2.53s/it] +2025-02-05 19:28:02 - ERROR - stderr - +2025-02-05 19:28:02 - ERROR - stderr - +2025-02-05 19:28:02 - INFO - stdout - {'loss': 0.6696, 'grad_norm': 1.1999200582504272, 'learning_rate': 1.328993898170817e-05, 'epoch': 1.23} +2025-02-05 19:28:02 - ERROR - stderr - 41%|████ | 9232/22434 [9:20:21<9:16:53, 2.53s/it] +2025-02-05 19:28:04 - ERROR - stderr - 41%|████ | 9233/22434 [9:20:24<9:15:44, 2.53s/it] +2025-02-05 19:28:04 - ERROR - stderr - +2025-02-05 19:28:04 - ERROR - stderr - +2025-02-05 19:28:04 - INFO - stdout - {'loss': 0.6923, 'grad_norm': 1.2723852396011353, 'learning_rate': 1.3288575571220424e-05, 'epoch': 1.23} +2025-02-05 19:28:04 - ERROR - stderr - 41%|████ | 9233/22434 [9:20:24<9:15:44, 2.53s/it] +2025-02-05 19:28:07 - ERROR - stderr - 41%|████ | 9234/22434 [9:20:27<9:18:17, 2.54s/it] +2025-02-05 19:28:07 - ERROR - stderr - +2025-02-05 19:28:07 - ERROR - stderr - +2025-02-05 19:28:07 - INFO - stdout - {'loss': 0.7733, 'grad_norm': 1.3295897245407104, 'learning_rate': 1.3287212092185464e-05, 'epoch': 1.23} +2025-02-05 19:28:07 - ERROR - stderr - 41%|████ | 9234/22434 [9:20:27<9:18:17, 2.54s/it] +2025-02-05 19:28:09 - ERROR - stderr - 41%|████ | 9235/22434 [9:20:29<9:20:28, 2.55s/it] +2025-02-05 19:28:09 - ERROR - stderr - +2025-02-05 19:28:09 - ERROR - stderr - +2025-02-05 19:28:09 - INFO - stdout - {'loss': 0.6253, 'grad_norm': 0.9353153109550476, 'learning_rate': 1.3285848544631713e-05, 'epoch': 1.23} +2025-02-05 19:28:09 - ERROR - stderr - 41%|████ | 9235/22434 [9:20:29<9:20:28, 2.55s/it] +2025-02-05 19:28:12 - ERROR - stderr - 41%|████ | 9236/22434
[9:20:32<9:17:13, 2.53s/it] +2025-02-05 19:28:12 - ERROR - stderr - +2025-02-05 19:28:12 - ERROR - stderr - +2025-02-05 19:28:12 - INFO - stdout - {'loss': 0.7198, 'grad_norm': 1.2121052742004395, 'learning_rate': 1.3284484928587593e-05, 'epoch': 1.24} +2025-02-05 19:28:12 - ERROR - stderr - 41%|████ | 9236/22434 [9:20:32<9:17:13, 2.53s/it] +2025-02-05 19:28:14 - ERROR - stderr - 41%|████ | 9237/22434 [9:20:34<9:11:38, 2.51s/it] +2025-02-05 19:28:14 - ERROR - stderr - +2025-02-05 19:28:14 - ERROR - stderr - +2025-02-05 19:28:14 - INFO - stdout - {'loss': 0.6829, 'grad_norm': 1.2026607990264893, 'learning_rate': 1.3283121244081526e-05, 'epoch': 1.24} +2025-02-05 19:28:14 - ERROR - stderr - 41%|████ | 9237/22434 [9:20:34<9:11:38, 2.51s/it] +2025-02-05 19:28:17 - ERROR - stderr - 41%|████ | 9238/22434 [9:20:37<9:09:48, 2.50s/it] +2025-02-05 19:28:17 - ERROR - stderr - +2025-02-05 19:28:17 - ERROR - stderr - +2025-02-05 19:28:17 - INFO - stdout - {'loss': 0.7276, 'grad_norm': 1.1836035251617432, 'learning_rate': 1.3281757491141942e-05, 'epoch': 1.24} +2025-02-05 19:28:17 - ERROR - stderr - 41%|████ | 9238/22434 [9:20:37<9:09:48, 2.50s/it] +2025-02-05 19:28:19 - ERROR - stderr - 41%|████ | 9239/22434 [9:20:39<9:06:12, 2.48s/it] +2025-02-05 19:28:19 - ERROR - stderr - +2025-02-05 19:28:19 - ERROR - stderr - +2025-02-05 19:28:19 - INFO - stdout - {'loss': 0.7099, 'grad_norm': 1.0486087799072266, 'learning_rate': 1.3280393669797263e-05, 'epoch': 1.24} +2025-02-05 19:28:19 - ERROR - stderr - 41%|████ | 9239/22434 [9:20:39<9:06:12, 2.48s/it] +2025-02-05 19:28:22 - ERROR - stderr - 41%|████ | 9240/22434 [9:20:41<9:07:59, 2.49s/it] +2025-02-05 19:28:22 - ERROR - stderr - +2025-02-05 19:28:22 - ERROR - stderr - +2025-02-05 19:28:22 - INFO - stdout - {'loss': 0.679, 'grad_norm': 1.0635490417480469, 'learning_rate': 1.3279029780075913e-05, 'epoch': 1.24} +2025-02-05 19:28:22 - ERROR - stderr - 41%|████ | 9240/22434 [9:20:42<9:07:59, 2.49s/it] +2025-02-05 19:28:24 - ERROR - stderr 
- 41%|████ | 9241/22434 [9:20:44<9:07:08, 2.49s/it] +2025-02-05 19:28:24 - ERROR - stderr - +2025-02-05 19:28:24 - ERROR - stderr - +2025-02-05 19:28:24 - INFO - stdout - {'loss': 0.7168, 'grad_norm': 1.1884044408798218, 'learning_rate': 1.3277665822006331e-05, 'epoch': 1.24} +2025-02-05 19:28:24 - ERROR - stderr - 41%|████ | 9241/22434 [9:20:44<9:07:08, 2.49s/it] +2025-02-05 19:28:27 - ERROR - stderr - 41%|████ | 9242/22434 [9:20:46<9:07:01, 2.49s/it] +2025-02-05 19:28:27 - ERROR - stderr - +2025-02-05 19:28:27 - ERROR - stderr - +2025-02-05 19:28:27 - INFO - stdout - {'loss': 0.7674, 'grad_norm': 1.0989524126052856, 'learning_rate': 1.3276301795616937e-05, 'epoch': 1.24} +2025-02-05 19:28:27 - ERROR - stderr - 41%|████ | 9242/22434 [9:20:46<9:07:01, 2.49s/it] +2025-02-05 19:28:29 - ERROR - stderr - 41%|████ | 9243/22434 [9:20:49<9:08:29, 2.49s/it] +2025-02-05 19:28:29 - ERROR - stderr - +2025-02-05 19:28:29 - ERROR - stderr - +2025-02-05 19:28:29 - INFO - stdout - {'loss': 0.7421, 'grad_norm': 1.1169859170913696, 'learning_rate': 1.3274937700936168e-05, 'epoch': 1.24} +2025-02-05 19:28:29 - ERROR - stderr - 41%|████ | 9243/22434 [9:20:49<9:08:29, 2.49s/it] +2025-02-05 19:28:32 - ERROR - stderr - 41%|████ | 9244/22434 [9:20:51<9:07:25, 2.49s/it] +2025-02-05 19:28:32 - ERROR - stderr - +2025-02-05 19:28:32 - ERROR - stderr - +2025-02-05 19:28:32 - INFO - stdout - {'loss': 0.697, 'grad_norm': 1.234826683998108, 'learning_rate': 1.3273573537992455e-05, 'epoch': 1.24} +2025-02-05 19:28:32 - ERROR - stderr - 41%|████ | 9244/22434 [9:20:51<9:07:25, 2.49s/it] +2025-02-05 19:28:34 - ERROR - stderr - 41%|████ | 9245/22434 [9:20:54<9:14:18, 2.52s/it] +2025-02-05 19:28:34 - ERROR - stderr - +2025-02-05 19:28:34 - ERROR - stderr - +2025-02-05 19:28:34 - INFO - stdout - {'loss': 0.7121, 'grad_norm': 1.1430209875106812, 'learning_rate': 1.3272209306814237e-05, 'epoch': 1.24} +2025-02-05 19:28:34 - ERROR - stderr - 41%|████ | 9245/22434 [9:20:54<9:14:18, 2.52s/it] +2025-02-05 
19:28:37 - ERROR - stderr - 41%|████ | 9246/22434 [9:20:56<9:10:43, 2.51s/it] +2025-02-05 19:28:37 - ERROR - stderr - +2025-02-05 19:28:37 - ERROR - stderr - +2025-02-05 19:28:37 - INFO - stdout - {'loss': 0.7298, 'grad_norm': 1.1017210483551025, 'learning_rate': 1.3270845007429946e-05, 'epoch': 1.24} +2025-02-05 19:28:37 - ERROR - stderr - 41%|████ | 9246/22434 [9:20:57<9:10:43, 2.51s/it] +2025-02-05 19:28:39 - ERROR - stderr - 41%|████ | 9247/22434 [9:20:59<9:10:55, 2.51s/it] +2025-02-05 19:28:39 - ERROR - stderr - +2025-02-05 19:28:39 - ERROR - stderr - +2025-02-05 19:28:39 - INFO - stdout - {'loss': 0.7704, 'grad_norm': 1.278051495552063, 'learning_rate': 1.326948063986802e-05, 'epoch': 1.24} +2025-02-05 19:28:39 - ERROR - stderr - 41%|████ | 9247/22434 [9:20:59<9:10:55, 2.51s/it] +2025-02-05 19:28:42 - ERROR - stderr - 41%|████ | 9248/22434 [9:21:02<9:10:29, 2.50s/it] +2025-02-05 19:28:42 - ERROR - stderr - +2025-02-05 19:28:42 - ERROR - stderr - +2025-02-05 19:28:42 - INFO - stdout - {'loss': 0.7175, 'grad_norm': 1.1119502782821655, 'learning_rate': 1.32681162041569e-05, 'epoch': 1.24} +2025-02-05 19:28:42 - ERROR - stderr - 41%|████ | 9248/22434 [9:21:02<9:10:29, 2.50s/it] +2025-02-05 19:28:44 - ERROR - stderr - 41%|████ | 9249/22434 [9:21:04<9:10:24, 2.50s/it] +2025-02-05 19:28:44 - ERROR - stderr - +2025-02-05 19:28:44 - ERROR - stderr - +2025-02-05 19:28:44 - INFO - stdout - {'loss': 0.7816, 'grad_norm': 1.1583329439163208, 'learning_rate': 1.3266751700325027e-05, 'epoch': 1.24} +2025-02-05 19:28:44 - ERROR - stderr - 41%|████ | 9249/22434 [9:21:04<9:10:24, 2.50s/it] +2025-02-05 19:28:47 - ERROR - stderr - 41%|████ | 9250/22434 [9:21:06<9:06:10, 2.49s/it] +2025-02-05 19:28:47 - ERROR - stderr - +2025-02-05 19:28:47 - ERROR - stderr - +2025-02-05 19:28:47 - INFO - stdout - {'loss': 0.714, 'grad_norm': 1.142851710319519, 'learning_rate': 1.3265387128400833e-05, 'epoch': 1.24} +2025-02-05 19:28:47 - ERROR - stderr - 41%|████ | 9250/22434 [9:21:07<9:06:10, 
2.49s/it] +2025-02-05 19:28:49 - ERROR - stderr - 41%|████ | 9251/22434 [9:21:09<9:15:41, 2.53s/it] +2025-02-05 19:28:49 - ERROR - stderr - +2025-02-05 19:28:49 - ERROR - stderr - +2025-02-05 19:28:49 - INFO - stdout - {'loss': 0.7698, 'grad_norm': 1.2522541284561157, 'learning_rate': 1.3264022488412773e-05, 'epoch': 1.24} +2025-02-05 19:28:49 - ERROR - stderr - 41%|████ | 9251/22434 [9:21:09<9:15:41, 2.53s/it] +2025-02-05 19:28:52 - ERROR - stderr - 41%|████ | 9252/22434 [9:21:12<9:14:06, 2.52s/it] +2025-02-05 19:28:52 - ERROR - stderr - +2025-02-05 19:28:52 - ERROR - stderr - +2025-02-05 19:28:52 - INFO - stdout - {'loss': 0.7105, 'grad_norm': 1.1853128671646118, 'learning_rate': 1.326265778038929e-05, 'epoch': 1.24} +2025-02-05 19:28:52 - ERROR - stderr - 41%|████ | 9252/22434 [9:21:12<9:14:06, 2.52s/it] +2025-02-05 19:28:54 - ERROR - stderr - 41%|████ | 9253/22434 [9:21:14<9:10:15, 2.50s/it] +2025-02-05 19:28:54 - ERROR - stderr - +2025-02-05 19:28:54 - ERROR - stderr - +2025-02-05 19:28:54 - INFO - stdout - {'loss': 0.7971, 'grad_norm': 1.306074857711792, 'learning_rate': 1.3261293004358829e-05, 'epoch': 1.24} +2025-02-05 19:28:54 - ERROR - stderr - 41%|████ | 9253/22434 [9:21:14<9:10:15, 2.50s/it] +2025-02-05 19:28:57 - ERROR - stderr - 41%|████ | 9254/22434 [9:21:17<9:09:18, 2.50s/it] +2025-02-05 19:28:57 - ERROR - stderr - +2025-02-05 19:28:57 - ERROR - stderr - +2025-02-05 19:28:57 - INFO - stdout - {'loss': 0.7788, 'grad_norm': 1.2058424949645996, 'learning_rate': 1.325992816034983e-05, 'epoch': 1.24} +2025-02-05 19:28:57 - ERROR - stderr - 41%|████ | 9254/22434 [9:21:17<9:09:18, 2.50s/it] +2025-02-05 19:28:59 - ERROR - stderr - 41%|████▏ | 9255/22434 [9:21:19<9:13:37, 2.52s/it] +2025-02-05 19:28:59 - ERROR - stderr - +2025-02-05 19:28:59 - ERROR - stderr - +2025-02-05 19:28:59 - INFO - stdout - {'loss': 0.8256, 'grad_norm': 1.2013771533966064, 'learning_rate': 1.3258563248390752e-05, 'epoch': 1.24} +2025-02-05 19:28:59 - ERROR - stderr - 41%|████▏ | 
9255/22434 [9:21:19<9:13:37, 2.52s/it] +2025-02-05 19:29:02 - ERROR - stderr - 41%|████▏ | 9256/22434 [9:21:22<9:09:34, 2.50s/it] +2025-02-05 19:29:02 - ERROR - stderr - +2025-02-05 19:29:02 - ERROR - stderr - +2025-02-05 19:29:02 - INFO - stdout - {'loss': 0.6362, 'grad_norm': 1.1267105340957642, 'learning_rate': 1.3257198268510041e-05, 'epoch': 1.24} +2025-02-05 19:29:02 - ERROR - stderr - 41%|████▏ | 9256/22434 [9:21:22<9:09:34, 2.50s/it] +2025-02-05 19:29:04 - ERROR - stderr - 41%|████▏ | 9257/22434 [9:21:24<9:19:13, 2.55s/it] +2025-02-05 19:29:04 - ERROR - stderr - +2025-02-05 19:29:04 - ERROR - stderr - +2025-02-05 19:29:04 - INFO - stdout - {'loss': 0.7099, 'grad_norm': 1.1598784923553467, 'learning_rate': 1.3255833220736147e-05, 'epoch': 1.24} +2025-02-05 19:29:04 - ERROR - stderr - 41%|████▏ | 9257/22434 [9:21:24<9:19:13, 2.55s/it] +2025-02-05 19:29:07 - ERROR - stderr - 41%|████▏ | 9258/22434 [9:21:27<9:25:02, 2.57s/it] +2025-02-05 19:29:07 - ERROR - stderr - +2025-02-05 19:29:07 - ERROR - stderr - +2025-02-05 19:29:07 - INFO - stdout - {'loss': 0.7537, 'grad_norm': 1.3934515714645386, 'learning_rate': 1.3254468105097526e-05, 'epoch': 1.24} +2025-02-05 19:29:07 - ERROR - stderr - 41%|████▏ | 9258/22434 [9:21:27<9:25:02, 2.57s/it] +2025-02-05 19:29:10 - ERROR - stderr - 41%|████▏ | 9259/22434 [9:21:29<9:15:30, 2.53s/it] +2025-02-05 19:29:10 - ERROR - stderr - +2025-02-05 19:29:10 - ERROR - stderr - +2025-02-05 19:29:10 - INFO - stdout - {'loss': 0.6936, 'grad_norm': 1.1538817882537842, 'learning_rate': 1.3253102921622632e-05, 'epoch': 1.24} +2025-02-05 19:29:10 - ERROR - stderr - 41%|████▏ | 9259/22434 [9:21:29<9:15:30, 2.53s/it] +2025-02-05 19:29:12 - ERROR - stderr - 41%|████▏ | 9260/22434 [9:21:32<9:12:58, 2.52s/it] +2025-02-05 19:29:12 - ERROR - stderr - +2025-02-05 19:29:12 - ERROR - stderr - +2025-02-05 19:29:12 - INFO - stdout - {'loss': 0.6696, 'grad_norm': 1.1210070848464966, 'learning_rate': 1.325173767033992e-05, 'epoch': 1.24} +2025-02-05 
19:29:12 - ERROR - stderr - 41%|████▏ | 9260/22434 [9:21:32<9:12:58, 2.52s/it] +2025-02-05 19:29:15 - ERROR - stderr - 41%|████▏ | 9261/22434 [9:21:34<9:12:50, 2.52s/it] +2025-02-05 19:29:15 - ERROR - stderr - +2025-02-05 19:29:15 - ERROR - stderr - +2025-02-05 19:29:15 - INFO - stdout - {'loss': 0.6525, 'grad_norm': 1.0543426275253296, 'learning_rate': 1.3250372351277844e-05, 'epoch': 1.24} +2025-02-05 19:29:15 - ERROR - stderr - 41%|████▏ | 9261/22434 [9:21:34<9:12:50, 2.52s/it] +2025-02-05 19:29:17 - ERROR - stderr - 41%|████▏ | 9262/22434 [9:21:37<9:08:52, 2.50s/it] +2025-02-05 19:29:17 - ERROR - stderr - +2025-02-05 19:29:17 - ERROR - stderr - +2025-02-05 19:29:17 - INFO - stdout - {'loss': 0.7233, 'grad_norm': 1.1687966585159302, 'learning_rate': 1.3249006964464875e-05, 'epoch': 1.24} +2025-02-05 19:29:17 - ERROR - stderr - 41%|████▏ | 9262/22434 [9:21:37<9:08:52, 2.50s/it] +2025-02-05 19:29:20 - ERROR - stderr - 41%|████▏ | 9263/22434 [9:21:39<9:11:23, 2.51s/it] +2025-02-05 19:29:20 - ERROR - stderr - +2025-02-05 19:29:20 - ERROR - stderr - +2025-02-05 19:29:20 - INFO - stdout - {'loss': 0.7454, 'grad_norm': 1.2662804126739502, 'learning_rate': 1.3247641509929459e-05, 'epoch': 1.24} +2025-02-05 19:29:20 - ERROR - stderr - 41%|████▏ | 9263/22434 [9:21:39<9:11:23, 2.51s/it] +2025-02-05 19:29:23 - ERROR - stderr - 41%|████▏ | 9264/22434 [9:21:42<9:46:36, 2.67s/it] +2025-02-05 19:29:23 - ERROR - stderr - +2025-02-05 19:29:23 - ERROR - stderr - +2025-02-05 19:29:23 - INFO - stdout - {'loss': 0.7152, 'grad_norm': 1.197374701499939, 'learning_rate': 1.3246275987700063e-05, 'epoch': 1.24} +2025-02-05 19:29:23 - ERROR - stderr - 41%|████▏ | 9264/22434 [9:21:42<9:46:36, 2.67s/it] +2025-02-05 19:29:25 - ERROR - stderr - 41%|████▏ | 9265/22434 [9:21:45<9:33:53, 2.61s/it] +2025-02-05 19:29:25 - ERROR - stderr - +2025-02-05 19:29:25 - ERROR - stderr - +2025-02-05 19:29:25 - INFO - stdout - {'loss': 0.7913, 'grad_norm': 1.321302056312561, 'learning_rate': 
1.3244910397805151e-05, 'epoch': 1.24} +2025-02-05 19:29:25 - ERROR - stderr - 41%|████▏ | 9265/22434 [9:21:45<9:33:53, 2.61s/it] +2025-02-05 19:29:28 - ERROR - stderr - 41%|████▏ | 9266/22434 [9:21:47<9:27:59, 2.59s/it] +2025-02-05 19:29:28 - ERROR - stderr - +2025-02-05 19:29:28 - ERROR - stderr - +2025-02-05 19:29:28 - INFO - stdout - {'loss': 0.6837, 'grad_norm': 1.1644649505615234, 'learning_rate': 1.324354474027319e-05, 'epoch': 1.24} +2025-02-05 19:29:28 - ERROR - stderr - 41%|████▏ | 9266/22434 [9:21:47<9:27:59, 2.59s/it] +2025-02-05 19:29:30 - ERROR - stderr - 41%|████▏ | 9267/22434 [9:21:50<9:23:56, 2.57s/it] +2025-02-05 19:29:30 - ERROR - stderr - +2025-02-05 19:29:30 - ERROR - stderr - +2025-02-05 19:29:30 - INFO - stdout - {'loss': 0.7244, 'grad_norm': 1.0506736040115356, 'learning_rate': 1.3242179015132641e-05, 'epoch': 1.24} +2025-02-05 19:29:30 - ERROR - stderr - 41%|████▏ | 9267/22434 [9:21:50<9:23:56, 2.57s/it] +2025-02-05 19:29:33 - ERROR - stderr - 41%|████▏ | 9268/22434 [9:21:52<9:19:42, 2.55s/it] +2025-02-05 19:29:33 - ERROR - stderr - +2025-02-05 19:29:33 - ERROR - stderr - +2025-02-05 19:29:33 - INFO - stdout - {'loss': 0.6767, 'grad_norm': 1.1492433547973633, 'learning_rate': 1.3240813222411973e-05, 'epoch': 1.24} +2025-02-05 19:29:33 - ERROR - stderr - 41%|████▏ | 9268/22434 [9:21:52<9:19:42, 2.55s/it] +2025-02-05 19:29:35 - ERROR - stderr - 41%|████▏ | 9269/22434 [9:21:55<9:21:54, 2.56s/it] +2025-02-05 19:29:35 - ERROR - stderr - +2025-02-05 19:29:35 - ERROR - stderr - +2025-02-05 19:29:35 - INFO - stdout - {'loss': 0.6765, 'grad_norm': 1.0844646692276, 'learning_rate': 1.3239447362139652e-05, 'epoch': 1.24} +2025-02-05 19:29:35 - ERROR - stderr - 41%|████▏ | 9269/22434 [9:21:55<9:21:54, 2.56s/it] +2025-02-05 19:29:38 - ERROR - stderr - 41%|████▏ | 9270/22434 [9:21:57<9:15:06, 2.53s/it] +2025-02-05 19:29:38 - ERROR - stderr - +2025-02-05 19:29:38 - ERROR - stderr - +2025-02-05 19:29:38 - INFO - stdout - {'loss': 0.7226, 'grad_norm': 
1.3227527141571045, 'learning_rate': 1.3238081434344153e-05, 'epoch': 1.24} +2025-02-05 19:29:38 - ERROR - stderr - 41%|████▏ | 9270/22434 [9:21:57<9:15:06, 2.53s/it] +2025-02-05 19:29:40 - ERROR - stderr - 41%|████▏ | 9271/22434 [9:22:00<9:14:40, 2.53s/it] +2025-02-05 19:29:40 - ERROR - stderr - +2025-02-05 19:29:40 - ERROR - stderr - +2025-02-05 19:29:40 - INFO - stdout - {'loss': 0.7885, 'grad_norm': 1.3782129287719727, 'learning_rate': 1.3236715439053944e-05, 'epoch': 1.24} +2025-02-05 19:29:40 - ERROR - stderr - 41%|████▏ | 9271/22434 [9:22:00<9:14:40, 2.53s/it] +2025-02-05 19:29:43 - ERROR - stderr - 41%|████▏ | 9272/22434 [9:22:03<9:16:48, 2.54s/it] +2025-02-05 19:29:43 - ERROR - stderr - +2025-02-05 19:29:43 - ERROR - stderr - +2025-02-05 19:29:43 - INFO - stdout - {'loss': 0.7089, 'grad_norm': 1.1319572925567627, 'learning_rate': 1.32353493762975e-05, 'epoch': 1.24} +2025-02-05 19:29:43 - ERROR - stderr - 41%|████▏ | 9272/22434 [9:22:03<9:16:48, 2.54s/it] +2025-02-05 19:29:45 - ERROR - stderr - 41%|████▏ | 9273/22434 [9:22:05<9:09:45, 2.51s/it] +2025-02-05 19:29:45 - ERROR - stderr - +2025-02-05 19:29:45 - ERROR - stderr - +2025-02-05 19:29:45 - INFO - stdout - {'loss': 0.6998, 'grad_norm': 1.1389429569244385, 'learning_rate': 1.3233983246103293e-05, 'epoch': 1.24} +2025-02-05 19:29:45 - ERROR - stderr - 41%|████▏ | 9273/22434 [9:22:05<9:09:45, 2.51s/it] +2025-02-05 19:29:48 - ERROR - stderr - 41%|████▏ | 9274/22434 [9:22:07<9:05:51, 2.49s/it] +2025-02-05 19:29:48 - ERROR - stderr - +2025-02-05 19:29:48 - ERROR - stderr - +2025-02-05 19:29:48 - INFO - stdout - {'loss': 0.7485, 'grad_norm': 1.2254799604415894, 'learning_rate': 1.3232617048499801e-05, 'epoch': 1.24} +2025-02-05 19:29:48 - ERROR - stderr - 41%|████▏ | 9274/22434 [9:22:07<9:05:51, 2.49s/it] +2025-02-05 19:29:50 - ERROR - stderr - 41%|████▏ | 9275/22434 [9:22:10<9:10:10, 2.51s/it] +2025-02-05 19:29:50 - ERROR - stderr - +2025-02-05 19:29:50 - ERROR - stderr - +2025-02-05 19:29:50 - INFO - 
stdout - {'loss': 0.6911, 'grad_norm': 1.0978336334228516, 'learning_rate': 1.32312507835155e-05, 'epoch': 1.24} +2025-02-05 19:29:50 - ERROR - stderr - 41%|████▏ | 9275/22434 [9:22:10<9:10:10, 2.51s/it] +2025-02-05 19:29:53 - ERROR - stderr - 41%|████▏ | 9276/22434 [9:22:12<9:07:57, 2.50s/it] +2025-02-05 19:29:53 - ERROR - stderr - +2025-02-05 19:29:53 - ERROR - stderr - +2025-02-05 19:29:53 - INFO - stdout - {'loss': 0.6381, 'grad_norm': 1.2059299945831299, 'learning_rate': 1.3229884451178863e-05, 'epoch': 1.24} +2025-02-05 19:29:53 - ERROR - stderr - 41%|████▏ | 9276/22434 [9:22:12<9:07:57, 2.50s/it] +2025-02-05 19:29:55 - ERROR - stderr - 41%|████▏ | 9277/22434 [9:22:15<9:06:34, 2.49s/it] +2025-02-05 19:29:55 - ERROR - stderr - +2025-02-05 19:29:55 - ERROR - stderr - +2025-02-05 19:29:55 - INFO - stdout - {'loss': 0.6559, 'grad_norm': 1.1060365438461304, 'learning_rate': 1.322851805151838e-05, 'epoch': 1.24} +2025-02-05 19:29:55 - ERROR - stderr - 41%|████▏ | 9277/22434 [9:22:15<9:06:34, 2.49s/it] +2025-02-05 19:29:58 - ERROR - stderr - 41%|████▏ | 9278/22434 [9:22:17<9:03:58, 2.48s/it] +2025-02-05 19:29:58 - ERROR - stderr - +2025-02-05 19:29:58 - ERROR - stderr - +2025-02-05 19:29:58 - INFO - stdout - {'loss': 0.6968, 'grad_norm': 1.1741812229156494, 'learning_rate': 1.322715158456253e-05, 'epoch': 1.24} +2025-02-05 19:29:58 - ERROR - stderr - 41%|████▏ | 9278/22434 [9:22:17<9:03:58, 2.48s/it] +2025-02-05 19:30:00 - ERROR - stderr - 41%|████▏ | 9279/22434 [9:22:20<9:06:42, 2.49s/it] +2025-02-05 19:30:00 - ERROR - stderr - +2025-02-05 19:30:00 - ERROR - stderr - +2025-02-05 19:30:00 - INFO - stdout - {'loss': 0.7588, 'grad_norm': 1.3146891593933105, 'learning_rate': 1.322578505033979e-05, 'epoch': 1.24} +2025-02-05 19:30:00 - ERROR - stderr - 41%|████▏ | 9279/22434 [9:22:20<9:06:42, 2.49s/it] +2025-02-05 19:30:03 - ERROR - stderr - 41%|████▏ | 9280/22434 [9:22:22<9:10:39, 2.51s/it] +2025-02-05 19:30:03 - ERROR - stderr - +2025-02-05 19:30:03 - ERROR - stderr - 
+2025-02-05 19:30:03 - INFO - stdout - {'loss': 0.7669, 'grad_norm': 1.289953589439392, 'learning_rate': 1.3224418448878648e-05, 'epoch': 1.24} +2025-02-05 19:30:03 - ERROR - stderr - 41%|████▏ | 9280/22434 [9:22:22<9:10:39, 2.51s/it] +2025-02-05 19:30:05 - ERROR - stderr - 41%|████▏ | 9281/22434 [9:22:25<9:17:24, 2.54s/it] +2025-02-05 19:30:05 - ERROR - stderr - +2025-02-05 19:30:05 - ERROR - stderr - +2025-02-05 19:30:05 - INFO - stdout - {'loss': 0.656, 'grad_norm': 1.1532399654388428, 'learning_rate': 1.3223051780207587e-05, 'epoch': 1.24} +2025-02-05 19:30:05 - ERROR - stderr - 41%|████▏ | 9281/22434 [9:22:25<9:17:24, 2.54s/it] +2025-02-05 19:30:08 - ERROR - stderr - 41%|████▏ | 9282/22434 [9:22:28<9:17:34, 2.54s/it] +2025-02-05 19:30:08 - ERROR - stderr - +2025-02-05 19:30:08 - ERROR - stderr - +2025-02-05 19:30:08 - INFO - stdout - {'loss': 0.658, 'grad_norm': 1.1366627216339111, 'learning_rate': 1.3221685044355099e-05, 'epoch': 1.24} +2025-02-05 19:30:08 - ERROR - stderr - 41%|████▏ | 9282/22434 [9:22:28<9:17:34, 2.54s/it] +2025-02-05 19:30:10 - ERROR - stderr - 41%|████▏ | 9283/22434 [9:22:30<9:12:27, 2.52s/it] +2025-02-05 19:30:10 - ERROR - stderr - +2025-02-05 19:30:10 - ERROR - stderr - +2025-02-05 19:30:10 - INFO - stdout - {'loss': 0.7605, 'grad_norm': 1.22013521194458, 'learning_rate': 1.3220318241349669e-05, 'epoch': 1.24} +2025-02-05 19:30:10 - ERROR - stderr - 41%|████▏ | 9283/22434 [9:22:30<9:12:27, 2.52s/it] +2025-02-05 19:30:13 - ERROR - stderr - 41%|████▏ | 9284/22434 [9:22:33<9:23:17, 2.57s/it] +2025-02-05 19:30:13 - ERROR - stderr - +2025-02-05 19:30:13 - ERROR - stderr - +2025-02-05 19:30:13 - INFO - stdout - {'loss': 0.795, 'grad_norm': 1.179509162902832, 'learning_rate': 1.3218951371219783e-05, 'epoch': 1.24} +2025-02-05 19:30:13 - ERROR - stderr - 41%|████▏ | 9284/22434 [9:22:33<9:23:17, 2.57s/it] +2025-02-05 19:30:15 - ERROR - stderr - 41%|████▏ | 9285/22434 [9:22:35<9:18:42, 2.55s/it] +2025-02-05 19:30:16 - ERROR - stderr - +2025-02-05 
19:30:16 - ERROR - stderr - +2025-02-05 19:30:16 - INFO - stdout - {'loss': 0.66, 'grad_norm': 1.107421875, 'learning_rate': 1.3217584433993937e-05, 'epoch': 1.24} +2025-02-05 19:30:16 - ERROR - stderr - 41%|████▏ | 9285/22434 [9:22:35<9:18:42, 2.55s/it] +2025-02-05 19:30:18 - ERROR - stderr - 41%|████▏ | 9286/22434 [9:22:38<9:18:48, 2.55s/it] +2025-02-05 19:30:18 - ERROR - stderr - +2025-02-05 19:30:18 - ERROR - stderr - +2025-02-05 19:30:18 - INFO - stdout - {'loss': 0.7543, 'grad_norm': 1.2991619110107422, 'learning_rate': 1.3216217429700628e-05, 'epoch': 1.24} +2025-02-05 19:30:18 - ERROR - stderr - 41%|████▏ | 9286/22434 [9:22:38<9:18:48, 2.55s/it] +2025-02-05 19:30:20 - ERROR - stderr - 41%|████▏ | 9287/22434 [9:22:40<9:11:43, 2.52s/it] +2025-02-05 19:30:21 - ERROR - stderr - +2025-02-05 19:30:21 - ERROR - stderr - +2025-02-05 19:30:21 - INFO - stdout - {'loss': 0.7119, 'grad_norm': 1.1613503694534302, 'learning_rate': 1.3214850358368338e-05, 'epoch': 1.24} +2025-02-05 19:30:21 - ERROR - stderr - 41%|████▏ | 9287/22434 [9:22:40<9:11:43, 2.52s/it] +2025-02-05 19:30:23 - ERROR - stderr - 41%|████▏ | 9288/22434 [9:22:43<9:08:27, 2.50s/it] +2025-02-05 19:30:23 - ERROR - stderr - +2025-02-05 19:30:23 - ERROR - stderr - +2025-02-05 19:30:23 - INFO - stdout - {'loss': 0.6948, 'grad_norm': 1.139907956123352, 'learning_rate': 1.3213483220025571e-05, 'epoch': 1.24} +2025-02-05 19:30:23 - ERROR - stderr - 41%|████▏ | 9288/22434 [9:22:43<9:08:27, 2.50s/it] +2025-02-05 19:30:25 - ERROR - stderr - 41%|████▏ | 9289/22434 [9:22:45<9:05:24, 2.49s/it] +2025-02-05 19:30:25 - ERROR - stderr - +2025-02-05 19:30:25 - ERROR - stderr - +2025-02-05 19:30:25 - INFO - stdout - {'loss': 0.725, 'grad_norm': 1.1426881551742554, 'learning_rate': 1.3212116014700818e-05, 'epoch': 1.24} +2025-02-05 19:30:25 - ERROR - stderr - 41%|████▏ | 9289/22434 [9:22:45<9:05:24, 2.49s/it] +2025-02-05 19:30:28 - ERROR - stderr - 41%|████▏ | 9290/22434 [9:22:48<9:03:21, 2.48s/it] +2025-02-05 19:30:28 - 
ERROR - stderr - +2025-02-05 19:30:28 - ERROR - stderr - +2025-02-05 19:30:28 - INFO - stdout - {'loss': 0.7925, 'grad_norm': 1.3799493312835693, 'learning_rate': 1.3210748742422586e-05, 'epoch': 1.24} +2025-02-05 19:30:28 - ERROR - stderr - 41%|████▏ | 9290/22434 [9:22:48<9:03:21, 2.48s/it] +2025-02-05 19:30:30 - ERROR - stderr - 41%|████▏ | 9291/22434 [9:22:50<9:02:33, 2.48s/it] +2025-02-05 19:30:30 - ERROR - stderr - +2025-02-05 19:30:30 - ERROR - stderr - +2025-02-05 19:30:30 - INFO - stdout - {'loss': 0.7153, 'grad_norm': 1.1253262758255005, 'learning_rate': 1.3209381403219366e-05, 'epoch': 1.24} +2025-02-05 19:30:30 - ERROR - stderr - 41%|████▏ | 9291/22434 [9:22:50<9:02:33, 2.48s/it] +2025-02-05 19:30:33 - ERROR - stderr - 41%|████▏ | 9292/22434 [9:22:53<9:05:00, 2.49s/it] +2025-02-05 19:30:33 - ERROR - stderr - +2025-02-05 19:30:33 - ERROR - stderr - +2025-02-05 19:30:33 - INFO - stdout - {'loss': 0.6129, 'grad_norm': 1.1465344429016113, 'learning_rate': 1.3208013997119662e-05, 'epoch': 1.24} +2025-02-05 19:30:33 - ERROR - stderr - 41%|████▏ | 9292/22434 [9:22:53<9:05:00, 2.49s/it] +2025-02-05 19:30:35 - ERROR - stderr - 41%|████▏ | 9293/22434 [9:22:55<9:03:33, 2.48s/it] +2025-02-05 19:30:35 - ERROR - stderr - +2025-02-05 19:30:35 - ERROR - stderr - +2025-02-05 19:30:35 - INFO - stdout - {'loss': 0.7359, 'grad_norm': 1.139330267906189, 'learning_rate': 1.3206646524151974e-05, 'epoch': 1.24} +2025-02-05 19:30:35 - ERROR - stderr - 41%|████▏ | 9293/22434 [9:22:55<9:03:33, 2.48s/it] +2025-02-05 19:30:38 - ERROR - stderr - 41%|████▏ | 9294/22434 [9:22:58<9:00:51, 2.47s/it] +2025-02-05 19:30:38 - ERROR - stderr - +2025-02-05 19:30:38 - ERROR - stderr - +2025-02-05 19:30:38 - INFO - stdout - {'loss': 0.7501, 'grad_norm': 1.1804602146148682, 'learning_rate': 1.3205278984344811e-05, 'epoch': 1.24} +2025-02-05 19:30:38 - ERROR - stderr - 41%|████▏ | 9294/22434 [9:22:58<9:00:51, 2.47s/it] +2025-02-05 19:30:40 - ERROR - stderr - 41%|████▏ | 9295/22434 
[9:23:00<9:01:21, 2.47s/it] +2025-02-05 19:30:40 - ERROR - stderr - +2025-02-05 19:30:40 - ERROR - stderr - +2025-02-05 19:30:40 - INFO - stdout - {'loss': 0.7217, 'grad_norm': 1.1529501676559448, 'learning_rate': 1.320391137772667e-05, 'epoch': 1.24} +2025-02-05 19:30:40 - ERROR - stderr - 41%|████▏ | 9295/22434 [9:23:00<9:01:21, 2.47s/it] +2025-02-05 19:30:43 - ERROR - stderr - 41%|████▏ | 9296/22434 [9:23:02<8:58:26, 2.46s/it] +2025-02-05 19:30:43 - ERROR - stderr - +2025-02-05 19:30:43 - ERROR - stderr - +2025-02-05 19:30:43 - INFO - stdout - {'loss': 0.7963, 'grad_norm': 1.2482224702835083, 'learning_rate': 1.3202543704326065e-05, 'epoch': 1.24} +2025-02-05 19:30:43 - ERROR - stderr - 41%|████▏ | 9296/22434 [9:23:02<8:58:26, 2.46s/it] +2025-02-05 19:30:45 - ERROR - stderr - 41%|████▏ | 9297/22434 [9:23:05<9:00:20, 2.47s/it] +2025-02-05 19:30:45 - ERROR - stderr - +2025-02-05 19:30:45 - ERROR - stderr - +2025-02-05 19:30:45 - INFO - stdout - {'loss': 0.7683, 'grad_norm': 1.4613351821899414, 'learning_rate': 1.3201175964171502e-05, 'epoch': 1.24} +2025-02-05 19:30:45 - ERROR - stderr - 41%|████▏ | 9297/22434 [9:23:05<9:00:20, 2.47s/it] +2025-02-05 19:30:48 - ERROR - stderr - 41%|████▏ | 9298/22434 [9:23:07<9:08:15, 2.50s/it] +2025-02-05 19:30:48 - ERROR - stderr - +2025-02-05 19:30:48 - ERROR - stderr - +2025-02-05 19:30:48 - INFO - stdout - {'loss': 0.7234, 'grad_norm': 1.223037838935852, 'learning_rate': 1.319980815729149e-05, 'epoch': 1.24} +2025-02-05 19:30:48 - ERROR - stderr - 41%|████▏ | 9298/22434 [9:23:08<9:08:15, 2.50s/it] +2025-02-05 19:30:50 - ERROR - stderr - 41%|████▏ | 9299/22434 [9:23:10<9:06:58, 2.50s/it] +2025-02-05 19:30:50 - ERROR - stderr - +2025-02-05 19:30:50 - ERROR - stderr - +2025-02-05 19:30:50 - INFO - stdout - {'loss': 0.7322, 'grad_norm': 1.12281334400177, 'learning_rate': 1.3198440283714536e-05, 'epoch': 1.24} +2025-02-05 19:30:50 - ERROR - stderr - 41%|████▏ | 9299/22434 [9:23:10<9:06:58, 2.50s/it] +2025-02-05 19:30:53 - ERROR - 
stderr - 41%|████▏ | 9300/22434 [9:23:13<9:15:23, 2.54s/it] +2025-02-05 19:30:53 - ERROR - stderr - +2025-02-05 19:30:53 - ERROR - stderr - +2025-02-05 19:30:53 - INFO - stdout - {'loss': 0.7941, 'grad_norm': 1.2032722234725952, 'learning_rate': 1.3197072343469154e-05, 'epoch': 1.24} +2025-02-05 19:30:53 - ERROR - stderr - 41%|████▏ | 9300/22434 [9:23:13<9:15:23, 2.54s/it] +2025-02-05 19:30:55 - ERROR - stderr - 41%|████▏ | 9301/22434 [9:23:15<9:14:31, 2.53s/it] +2025-02-05 19:30:55 - ERROR - stderr - +2025-02-05 19:30:55 - ERROR - stderr - +2025-02-05 19:30:55 - INFO - stdout - {'loss': 0.7415, 'grad_norm': 1.3838211297988892, 'learning_rate': 1.3195704336583863e-05, 'epoch': 1.24} +2025-02-05 19:30:55 - ERROR - stderr - 41%|████▏ | 9301/22434 [9:23:15<9:14:31, 2.53s/it] +2025-02-05 19:30:58 - ERROR - stderr - 41%|████▏ | 9302/22434 [9:23:18<9:17:39, 2.55s/it] +2025-02-05 19:30:58 - ERROR - stderr - +2025-02-05 19:30:58 - ERROR - stderr - +2025-02-05 19:30:58 - INFO - stdout - {'loss': 0.7552, 'grad_norm': 1.3007405996322632, 'learning_rate': 1.3194336263087168e-05, 'epoch': 1.24} +2025-02-05 19:30:58 - ERROR - stderr - 41%|████▏ | 9302/22434 [9:23:18<9:17:39, 2.55s/it] +2025-02-05 19:31:00 - ERROR - stderr - 41%|████▏ | 9303/22434 [9:23:20<9:13:44, 2.53s/it] +2025-02-05 19:31:00 - ERROR - stderr - +2025-02-05 19:31:00 - ERROR - stderr - +2025-02-05 19:31:00 - INFO - stdout - {'loss': 0.7305, 'grad_norm': 1.3206770420074463, 'learning_rate': 1.3192968123007593e-05, 'epoch': 1.24} +2025-02-05 19:31:00 - ERROR - stderr - 41%|████▏ | 9303/22434 [9:23:20<9:13:44, 2.53s/it] +2025-02-05 19:31:03 - ERROR - stderr - 41%|████▏ | 9304/22434 [9:23:23<9:08:33, 2.51s/it] +2025-02-05 19:31:03 - ERROR - stderr - +2025-02-05 19:31:03 - ERROR - stderr - +2025-02-05 19:31:03 - INFO - stdout - {'loss': 0.7301, 'grad_norm': 1.13156259059906, 'learning_rate': 1.3191599916373653e-05, 'epoch': 1.24} +2025-02-05 19:31:03 - ERROR - stderr - 41%|████▏ | 9304/22434 [9:23:23<9:08:33, 
2.51s/it] +2025-02-05 19:31:05 - ERROR - stderr - 41%|████▏ | 9305/22434 [9:23:25<9:08:11, 2.51s/it] +2025-02-05 19:31:05 - ERROR - stderr - +2025-02-05 19:31:05 - ERROR - stderr - +2025-02-05 19:31:05 - INFO - stdout - {'loss': 0.6582, 'grad_norm': 1.1350882053375244, 'learning_rate': 1.3190231643213865e-05, 'epoch': 1.24} +2025-02-05 19:31:05 - ERROR - stderr - 41%|████▏ | 9305/22434 [9:23:25<9:08:11, 2.51s/it] +2025-02-05 19:31:08 - ERROR - stderr - 41%|████▏ | 9306/22434 [9:23:28<9:07:34, 2.50s/it] +2025-02-05 19:31:08 - ERROR - stderr - +2025-02-05 19:31:08 - ERROR - stderr - +2025-02-05 19:31:08 - INFO - stdout - {'loss': 0.6799, 'grad_norm': 1.3048738241195679, 'learning_rate': 1.3188863303556754e-05, 'epoch': 1.24} +2025-02-05 19:31:08 - ERROR - stderr - 41%|████▏ | 9306/22434 [9:23:28<9:07:34, 2.50s/it] +2025-02-05 19:31:11 - ERROR - stderr - 41%|████▏ | 9307/22434 [9:23:30<9:21:46, 2.57s/it] +2025-02-05 19:31:11 - ERROR - stderr - +2025-02-05 19:31:11 - ERROR - stderr - +2025-02-05 19:31:11 - INFO - stdout - {'loss': 0.6845, 'grad_norm': 1.1378577947616577, 'learning_rate': 1.3187494897430837e-05, 'epoch': 1.24} +2025-02-05 19:31:11 - ERROR - stderr - 41%|████▏ | 9307/22434 [9:23:30<9:21:46, 2.57s/it] +2025-02-05 19:31:13 - ERROR - stderr - 41%|████▏ | 9308/22434 [9:23:33<9:17:06, 2.55s/it] +2025-02-05 19:31:13 - ERROR - stderr - +2025-02-05 19:31:13 - ERROR - stderr - +2025-02-05 19:31:13 - INFO - stdout - {'loss': 0.7878, 'grad_norm': 1.1637569665908813, 'learning_rate': 1.3186126424864639e-05, 'epoch': 1.24} +2025-02-05 19:31:13 - ERROR - stderr - 41%|████▏ | 9308/22434 [9:23:33<9:17:06, 2.55s/it] +2025-02-05 19:31:16 - ERROR - stderr - 41%|████▏ | 9309/22434 [9:23:35<9:13:42, 2.53s/it] +2025-02-05 19:31:16 - ERROR - stderr - +2025-02-05 19:31:16 - ERROR - stderr - +2025-02-05 19:31:16 - INFO - stdout - {'loss': 0.7848, 'grad_norm': 1.2447539567947388, 'learning_rate': 1.3184757885886683e-05, 'epoch': 1.24} +2025-02-05 19:31:16 - ERROR - stderr - 
41%|████▏ | 9309/22434 [9:23:35<9:13:42, 2.53s/it] +2025-02-05 19:31:18 - ERROR - stderr - 41%|████▏ | 9310/22434 [9:23:38<9:12:00, 2.52s/it] +2025-02-05 19:31:18 - ERROR - stderr - +2025-02-05 19:31:18 - ERROR - stderr - +2025-02-05 19:31:18 - INFO - stdout - {'loss': 0.8088, 'grad_norm': 1.377752423286438, 'learning_rate': 1.3183389280525497e-05, 'epoch': 1.24} +2025-02-05 19:31:18 - ERROR - stderr - 41%|████▏ | 9310/22434 [9:23:38<9:12:00, 2.52s/it] +2025-02-05 19:31:21 - ERROR - stderr - 42%|████▏ | 9311/22434 [9:23:40<9:10:25, 2.52s/it] +2025-02-05 19:31:21 - ERROR - stderr - +2025-02-05 19:31:21 - ERROR - stderr - +2025-02-05 19:31:21 - INFO - stdout - {'loss': 0.6947, 'grad_norm': 1.1519241333007812, 'learning_rate': 1.3182020608809611e-05, 'epoch': 1.25} +2025-02-05 19:31:21 - ERROR - stderr - 42%|████▏ | 9311/22434 [9:23:40<9:10:25, 2.52s/it] +2025-02-05 19:31:23 - ERROR - stderr - 42%|████▏ | 9312/22434 [9:23:43<9:09:25, 2.51s/it] +2025-02-05 19:31:23 - ERROR - stderr - +2025-02-05 19:31:23 - ERROR - stderr - +2025-02-05 19:31:23 - INFO - stdout - {'loss': 0.7188, 'grad_norm': 1.2067630290985107, 'learning_rate': 1.3180651870767547e-05, 'epoch': 1.25} +2025-02-05 19:31:23 - ERROR - stderr - 42%|████▏ | 9312/22434 [9:23:43<9:09:25, 2.51s/it] +2025-02-05 19:31:26 - ERROR - stderr - 42%|████▏ | 9313/22434 [9:23:45<9:13:24, 2.53s/it] +2025-02-05 19:31:26 - ERROR - stderr - +2025-02-05 19:31:26 - ERROR - stderr - +2025-02-05 19:31:26 - INFO - stdout - {'loss': 0.6723, 'grad_norm': 1.1547857522964478, 'learning_rate': 1.317928306642784e-05, 'epoch': 1.25} +2025-02-05 19:31:26 - ERROR - stderr - 42%|████▏ | 9313/22434 [9:23:45<9:13:24, 2.53s/it] +2025-02-05 19:31:28 - ERROR - stderr - 42%|████▏ | 9314/22434 [9:23:48<9:11:43, 2.52s/it] +2025-02-05 19:31:28 - ERROR - stderr - +2025-02-05 19:31:28 - ERROR - stderr - +2025-02-05 19:31:28 - INFO - stdout - {'loss': 0.7964, 'grad_norm': 1.290090799331665, 'learning_rate': 1.3177914195819018e-05, 'epoch': 1.25} 
+2025-02-05 19:31:28 - ERROR - stderr - 42%|████▏ | 9314/22434 [9:23:48<9:11:43, 2.52s/it] +2025-02-05 19:31:31 - ERROR - stderr - 42%|████▏ | 9315/22434 [9:23:50<9:13:16, 2.53s/it] +2025-02-05 19:31:31 - ERROR - stderr - +2025-02-05 19:31:31 - ERROR - stderr - +2025-02-05 19:31:31 - INFO - stdout - {'loss': 0.7114, 'grad_norm': 1.139615535736084, 'learning_rate': 1.3176545258969615e-05, 'epoch': 1.25} +2025-02-05 19:31:31 - ERROR - stderr - 42%|████▏ | 9315/22434 [9:23:51<9:13:16, 2.53s/it] +2025-02-05 19:31:33 - ERROR - stderr - 42%|████▏ | 9316/22434 [9:23:53<9:10:26, 2.52s/it] +2025-02-05 19:31:33 - ERROR - stderr - +2025-02-05 19:31:33 - ERROR - stderr - +2025-02-05 19:31:33 - INFO - stdout - {'loss': 0.6651, 'grad_norm': 1.134974718093872, 'learning_rate': 1.3175176255908167e-05, 'epoch': 1.25} +2025-02-05 19:31:33 - ERROR - stderr - 42%|████▏ | 9316/22434 [9:23:53<9:10:26, 2.52s/it] +2025-02-05 19:31:36 - ERROR - stderr - 42%|████▏ | 9317/22434 [9:23:56<9:17:37, 2.55s/it] +2025-02-05 19:31:36 - ERROR - stderr - +2025-02-05 19:31:36 - ERROR - stderr - +2025-02-05 19:31:36 - INFO - stdout - {'loss': 0.708, 'grad_norm': 1.0861785411834717, 'learning_rate': 1.3173807186663209e-05, 'epoch': 1.25} +2025-02-05 19:31:36 - ERROR - stderr - 42%|████▏ | 9317/22434 [9:23:56<9:17:37, 2.55s/it] +2025-02-05 19:31:38 - ERROR - stderr - 42%|████▏ | 9318/22434 [9:23:58<9:22:36, 2.57s/it] +2025-02-05 19:31:39 - ERROR - stderr - +2025-02-05 19:31:39 - ERROR - stderr - +2025-02-05 19:31:39 - INFO - stdout - {'loss': 0.6808, 'grad_norm': 1.2182151079177856, 'learning_rate': 1.317243805126328e-05, 'epoch': 1.25} +2025-02-05 19:31:39 - ERROR - stderr - 42%|████▏ | 9318/22434 [9:23:58<9:22:36, 2.57s/it] +2025-02-05 19:31:41 - ERROR - stderr - 42%|████▏ | 9319/22434 [9:24:01<9:19:06, 2.56s/it] +2025-02-05 19:31:41 - ERROR - stderr - +2025-02-05 19:31:41 - ERROR - stderr - +2025-02-05 19:31:41 - INFO - stdout - {'loss': 0.6521, 'grad_norm': 1.0341705083847046, 'learning_rate': 
1.317106884973691e-05, 'epoch': 1.25} +2025-02-05 19:31:41 - ERROR - stderr - 42%|████▏ | 9319/22434 [9:24:01<9:19:06, 2.56s/it] +2025-02-05 19:31:44 - ERROR - stderr - 42%|████▏ | 9320/22434 [9:24:03<9:16:13, 2.54s/it] +2025-02-05 19:31:44 - ERROR - stderr - +2025-02-05 19:31:44 - ERROR - stderr - +2025-02-05 19:31:44 - INFO - stdout - {'loss': 0.8218, 'grad_norm': 1.2140824794769287, 'learning_rate': 1.3169699582112645e-05, 'epoch': 1.25} +2025-02-05 19:31:44 - ERROR - stderr - 42%|████▏ | 9320/22434 [9:24:03<9:16:13, 2.54s/it] +2025-02-05 19:31:46 - ERROR - stderr - 42%|████▏ | 9321/22434 [9:24:06<9:15:09, 2.54s/it] +2025-02-05 19:31:46 - ERROR - stderr - +2025-02-05 19:31:46 - ERROR - stderr - +2025-02-05 19:31:46 - INFO - stdout - {'loss': 0.6771, 'grad_norm': 1.1120494604110718, 'learning_rate': 1.3168330248419028e-05, 'epoch': 1.25} +2025-02-05 19:31:46 - ERROR - stderr - 42%|████▏ | 9321/22434 [9:24:06<9:15:09, 2.54s/it] +2025-02-05 19:31:48 - ERROR - stderr - 42%|████▏ | 9322/22434 [9:24:08<9:08:10, 2.51s/it] +2025-02-05 19:31:49 - ERROR - stderr - +2025-02-05 19:31:49 - ERROR - stderr - +2025-02-05 19:31:49 - INFO - stdout - {'loss': 0.6359, 'grad_norm': 1.0463011264801025, 'learning_rate': 1.3166960848684595e-05, 'epoch': 1.25} +2025-02-05 19:31:49 - ERROR - stderr - 42%|████▏ | 9322/22434 [9:24:08<9:08:10, 2.51s/it] +2025-02-05 19:31:51 - ERROR - stderr - 42%|████▏ | 9323/22434 [9:24:11<9:10:49, 2.52s/it] +2025-02-05 19:31:51 - ERROR - stderr - +2025-02-05 19:31:51 - ERROR - stderr - +2025-02-05 19:31:51 - INFO - stdout - {'loss': 0.7699, 'grad_norm': 1.2577452659606934, 'learning_rate': 1.3165591382937897e-05, 'epoch': 1.25} +2025-02-05 19:31:51 - ERROR - stderr - 42%|████▏ | 9323/22434 [9:24:11<9:10:49, 2.52s/it] +2025-02-05 19:31:54 - ERROR - stderr - 42%|████▏ | 9324/22434 [9:24:13<9:13:40, 2.53s/it] +2025-02-05 19:31:54 - ERROR - stderr - +2025-02-05 19:31:54 - ERROR - stderr - +2025-02-05 19:31:54 - INFO - stdout - {'loss': 0.7183, 'grad_norm': 
1.156151533126831, 'learning_rate': 1.3164221851207475e-05, 'epoch': 1.25} +2025-02-05 19:31:54 - ERROR - stderr - 42%|████▏ | 9324/22434 [9:24:13<9:13:40, 2.53s/it] +2025-02-05 19:31:56 - ERROR - stderr - 42%|████▏ | 9325/22434 [9:24:16<9:09:26, 2.51s/it] +2025-02-05 19:31:56 - ERROR - stderr - +2025-02-05 19:31:56 - ERROR - stderr - +2025-02-05 19:31:56 - INFO - stdout - {'loss': 0.7294, 'grad_norm': 1.1722558736801147, 'learning_rate': 1.3162852253521873e-05, 'epoch': 1.25} +2025-02-05 19:31:56 - ERROR - stderr - 42%|████▏ | 9325/22434 [9:24:16<9:09:26, 2.51s/it] +2025-02-05 19:31:59 - ERROR - stderr - 42%|████▏ | 9326/22434 [9:24:19<9:39:51, 2.65s/it] +2025-02-05 19:31:59 - ERROR - stderr - +2025-02-05 19:31:59 - ERROR - stderr - +2025-02-05 19:31:59 - INFO - stdout - {'loss': 0.8322, 'grad_norm': 1.297115445137024, 'learning_rate': 1.3161482589909649e-05, 'epoch': 1.25} +2025-02-05 19:31:59 - ERROR - stderr - 42%|████▏ | 9326/22434 [9:24:19<9:39:51, 2.65s/it] +2025-02-05 19:32:02 - ERROR - stderr - 42%|████▏ | 9327/22434 [9:24:21<9:27:45, 2.60s/it] +2025-02-05 19:32:02 - ERROR - stderr - +2025-02-05 19:32:02 - ERROR - stderr - +2025-02-05 19:32:02 - INFO - stdout - {'loss': 0.7529, 'grad_norm': 1.2446820735931396, 'learning_rate': 1.316011286039934e-05, 'epoch': 1.25} +2025-02-05 19:32:02 - ERROR - stderr - 42%|████▏ | 9327/22434 [9:24:21<9:27:45, 2.60s/it] +2025-02-05 19:32:04 - ERROR - stderr - 42%|████▏ | 9328/22434 [9:24:24<9:22:23, 2.57s/it] +2025-02-05 19:32:04 - ERROR - stderr - +2025-02-05 19:32:04 - ERROR - stderr - +2025-02-05 19:32:04 - INFO - stdout - {'loss': 0.7114, 'grad_norm': 1.1087275743484497, 'learning_rate': 1.3158743065019504e-05, 'epoch': 1.25} +2025-02-05 19:32:04 - ERROR - stderr - 42%|████▏ | 9328/22434 [9:24:24<9:22:23, 2.57s/it] +2025-02-05 19:32:07 - ERROR - stderr - 42%|████▏ | 9329/22434 [9:24:26<9:18:24, 2.56s/it] +2025-02-05 19:32:07 - ERROR - stderr - +2025-02-05 19:32:07 - ERROR - stderr - +2025-02-05 19:32:07 - INFO - stdout 
- {'loss': 0.7263, 'grad_norm': 1.2273108959197998, 'learning_rate': 1.3157373203798688e-05, 'epoch': 1.25} +2025-02-05 19:32:07 - ERROR - stderr - 42%|████▏ | 9329/22434 [9:24:26<9:18:24, 2.56s/it] +2025-02-05 19:32:09 - ERROR - stderr - 42%|████▏ | 9330/22434 [9:24:29<9:14:22, 2.54s/it] +2025-02-05 19:32:09 - ERROR - stderr - +2025-02-05 19:32:09 - ERROR - stderr - +2025-02-05 19:32:09 - INFO - stdout - {'loss': 0.6974, 'grad_norm': 1.1508930921554565, 'learning_rate': 1.3156003276765456e-05, 'epoch': 1.25} +2025-02-05 19:32:09 - ERROR - stderr - 42%|████▏ | 9330/22434 [9:24:29<9:14:22, 2.54s/it] +2025-02-05 19:32:12 - ERROR - stderr - 42%|████▏ | 9331/22434 [9:24:31<9:12:19, 2.53s/it] +2025-02-05 19:32:12 - ERROR - stderr - +2025-02-05 19:32:12 - ERROR - stderr - +2025-02-05 19:32:12 - INFO - stdout - {'loss': 0.6989, 'grad_norm': 1.2422845363616943, 'learning_rate': 1.3154633283948352e-05, 'epoch': 1.25} +2025-02-05 19:32:12 - ERROR - stderr - 42%|████▏ | 9331/22434 [9:24:31<9:12:19, 2.53s/it] +2025-02-05 19:32:14 - ERROR - stderr - 42%|████▏ | 9332/22434 [9:24:34<9:13:42, 2.54s/it] +2025-02-05 19:32:14 - ERROR - stderr - +2025-02-05 19:32:14 - ERROR - stderr - +2025-02-05 19:32:14 - INFO - stdout - {'loss': 0.753, 'grad_norm': 1.0530104637145996, 'learning_rate': 1.3153263225375937e-05, 'epoch': 1.25} +2025-02-05 19:32:14 - ERROR - stderr - 42%|████▏ | 9332/22434 [9:24:34<9:13:42, 2.54s/it] +2025-02-05 19:32:17 - ERROR - stderr - 42%|████▏ | 9333/22434 [9:24:36<9:10:23, 2.52s/it] +2025-02-05 19:32:17 - ERROR - stderr - +2025-02-05 19:32:17 - ERROR - stderr - +2025-02-05 19:32:17 - INFO - stdout - {'loss': 0.8212, 'grad_norm': 1.3269013166427612, 'learning_rate': 1.3151893101076765e-05, 'epoch': 1.25} +2025-02-05 19:32:17 - ERROR - stderr - 42%|████▏ | 9333/22434 [9:24:36<9:10:23, 2.52s/it] +2025-02-05 19:32:19 - ERROR - stderr - 42%|████▏ | 9334/22434 [9:24:39<9:15:14, 2.54s/it] +2025-02-05 19:32:19 - ERROR - stderr - +2025-02-05 19:32:19 - ERROR - stderr - 
+2025-02-05 19:32:19 - INFO - stdout - {'loss': 0.7829, 'grad_norm': 1.2658700942993164, 'learning_rate': 1.3150522911079398e-05, 'epoch': 1.25} +2025-02-05 19:32:19 - ERROR - stderr - 42%|████▏ | 9334/22434 [9:24:39<9:15:14, 2.54s/it] +2025-02-05 19:32:22 - ERROR - stderr - 42%|████▏ | 9335/22434 [9:24:42<9:23:26, 2.58s/it] +2025-02-05 19:32:22 - ERROR - stderr - +2025-02-05 19:32:22 - ERROR - stderr - +2025-02-05 19:32:22 - INFO - stdout - {'loss': 0.7149, 'grad_norm': 1.2770941257476807, 'learning_rate': 1.3149152655412397e-05, 'epoch': 1.25} +2025-02-05 19:32:22 - ERROR - stderr - 42%|████▏ | 9335/22434 [9:24:42<9:23:26, 2.58s/it] +2025-02-05 19:32:24 - ERROR - stderr - 42%|████▏ | 9336/22434 [9:24:44<9:16:54, 2.55s/it] +2025-02-05 19:32:24 - ERROR - stderr - +2025-02-05 19:32:24 - ERROR - stderr - +2025-02-05 19:32:24 - INFO - stdout - {'loss': 0.7872, 'grad_norm': 1.3781917095184326, 'learning_rate': 1.314778233410432e-05, 'epoch': 1.25} +2025-02-05 19:32:24 - ERROR - stderr - 42%|████▏ | 9336/22434 [9:24:44<9:16:54, 2.55s/it] +2025-02-05 19:32:27 - ERROR - stderr - 42%|████▏ | 9337/22434 [9:24:47<9:13:42, 2.54s/it] +2025-02-05 19:32:27 - ERROR - stderr - +2025-02-05 19:32:27 - ERROR - stderr - +2025-02-05 19:32:27 - INFO - stdout - {'loss': 0.6849, 'grad_norm': 1.1750023365020752, 'learning_rate': 1.3146411947183734e-05, 'epoch': 1.25} +2025-02-05 19:32:27 - ERROR - stderr - 42%|████▏ | 9337/22434 [9:24:47<9:13:42, 2.54s/it] +2025-02-05 19:32:29 - ERROR - stderr - 42%|████▏ | 9338/22434 [9:24:49<9:10:17, 2.52s/it] +2025-02-05 19:32:29 - ERROR - stderr - +2025-02-05 19:32:29 - ERROR - stderr - +2025-02-05 19:32:29 - INFO - stdout - {'loss': 0.7026, 'grad_norm': 1.1494070291519165, 'learning_rate': 1.3145041494679206e-05, 'epoch': 1.25} +2025-02-05 19:32:29 - ERROR - stderr - 42%|████▏ | 9338/22434 [9:24:49<9:10:17, 2.52s/it] +2025-02-05 19:32:32 - ERROR - stderr - 42%|████▏ | 9339/22434 [9:24:52<9:10:00, 2.52s/it] +2025-02-05 19:32:32 - ERROR - stderr - 
+2025-02-05 19:32:32 - ERROR - stderr - +2025-02-05 19:32:32 - INFO - stdout - {'loss': 0.6995, 'grad_norm': 1.070863127708435, 'learning_rate': 1.3143670976619292e-05, 'epoch': 1.25} +2025-02-05 19:32:32 - ERROR - stderr - 42%|████▏ | 9339/22434 [9:24:52<9:10:00, 2.52s/it] +2025-02-05 19:32:34 - ERROR - stderr - 42%|████▏ | 9340/22434 [9:24:54<9:05:26, 2.50s/it] +2025-02-05 19:32:34 - ERROR - stderr - +2025-02-05 19:32:34 - ERROR - stderr - +2025-02-05 19:32:34 - INFO - stdout - {'loss': 0.6478, 'grad_norm': 1.1500813961029053, 'learning_rate': 1.3142300393032564e-05, 'epoch': 1.25} +2025-02-05 19:32:34 - ERROR - stderr - 42%|████▏ | 9340/22434 [9:24:54<9:05:26, 2.50s/it] +2025-02-05 19:32:37 - ERROR - stderr - 42%|████▏ | 9341/22434 [9:24:56<9:01:14, 2.48s/it] +2025-02-05 19:32:37 - ERROR - stderr - +2025-02-05 19:32:37 - ERROR - stderr - +2025-02-05 19:32:37 - INFO - stdout - {'loss': 0.7334, 'grad_norm': 1.1829400062561035, 'learning_rate': 1.3140929743947592e-05, 'epoch': 1.25} +2025-02-05 19:32:37 - ERROR - stderr - 42%|████▏ | 9341/22434 [9:24:57<9:01:14, 2.48s/it] +2025-02-05 19:32:39 - ERROR - stderr - 42%|████▏ | 9342/22434 [9:24:59<8:59:39, 2.47s/it] +2025-02-05 19:32:39 - ERROR - stderr - +2025-02-05 19:32:39 - ERROR - stderr - +2025-02-05 19:32:39 - INFO - stdout - {'loss': 0.7043, 'grad_norm': 1.088805913925171, 'learning_rate': 1.3139559029392948e-05, 'epoch': 1.25} +2025-02-05 19:32:39 - ERROR - stderr - 42%|████▏ | 9342/22434 [9:24:59<8:59:39, 2.47s/it] +2025-02-05 19:32:42 - ERROR - stderr - 42%|████▏ | 9343/22434 [9:25:02<9:08:10, 2.51s/it] +2025-02-05 19:32:42 - ERROR - stderr - +2025-02-05 19:32:42 - ERROR - stderr - +2025-02-05 19:32:42 - INFO - stdout - {'loss': 0.6951, 'grad_norm': 1.1109436750411987, 'learning_rate': 1.3138188249397197e-05, 'epoch': 1.25} +2025-02-05 19:32:42 - ERROR - stderr - 42%|████▏ | 9343/22434 [9:25:02<9:08:10, 2.51s/it] +2025-02-05 19:32:45 - ERROR - stderr - 42%|████▏ | 9344/22434 [9:25:04<9:24:07, 2.59s/it] 
+2025-02-05 19:32:45 - ERROR - stderr - +2025-02-05 19:32:45 - ERROR - stderr - +2025-02-05 19:32:45 - INFO - stdout - {'loss': 0.7784, 'grad_norm': 1.2694209814071655, 'learning_rate': 1.3136817403988918e-05, 'epoch': 1.25} +2025-02-05 19:32:45 - ERROR - stderr - 42%|████▏ | 9344/22434 [9:25:04<9:24:07, 2.59s/it] +2025-02-05 19:32:47 - ERROR - stderr - 42%|████▏ | 9345/22434 [9:25:07<9:22:09, 2.58s/it] +2025-02-05 19:32:47 - ERROR - stderr - +2025-02-05 19:32:47 - ERROR - stderr - +2025-02-05 19:32:47 - INFO - stdout - {'loss': 0.7354, 'grad_norm': 1.1016857624053955, 'learning_rate': 1.3135446493196677e-05, 'epoch': 1.25} +2025-02-05 19:32:47 - ERROR - stderr - 42%|████▏ | 9345/22434 [9:25:07<9:22:09, 2.58s/it] +2025-02-05 19:32:50 - ERROR - stderr - 42%|████▏ | 9346/22434 [9:25:10<9:47:13, 2.69s/it] +2025-02-05 19:32:50 - ERROR - stderr - +2025-02-05 19:32:50 - ERROR - stderr - +2025-02-05 19:32:50 - INFO - stdout - {'loss': 0.7319, 'grad_norm': 1.153218150138855, 'learning_rate': 1.3134075517049059e-05, 'epoch': 1.25} +2025-02-05 19:32:50 - ERROR - stderr - 42%|████▏ | 9346/22434 [9:25:10<9:47:13, 2.69s/it] +2025-02-05 19:32:53 - ERROR - stderr - 42%|████▏ | 9347/22434 [9:25:12<9:37:10, 2.65s/it] +2025-02-05 19:32:53 - ERROR - stderr - +2025-02-05 19:32:53 - ERROR - stderr - +2025-02-05 19:32:53 - INFO - stdout - {'loss': 0.7702, 'grad_norm': 1.300937294960022, 'learning_rate': 1.3132704475574634e-05, 'epoch': 1.25} +2025-02-05 19:32:53 - ERROR - stderr - 42%|████▏ | 9347/22434 [9:25:12<9:37:10, 2.65s/it] +2025-02-05 19:32:55 - ERROR - stderr - 42%|████▏ | 9348/22434 [9:25:15<9:25:17, 2.59s/it] +2025-02-05 19:32:55 - ERROR - stderr - +2025-02-05 19:32:55 - ERROR - stderr - +2025-02-05 19:32:55 - INFO - stdout - {'loss': 0.7152, 'grad_norm': 1.02394700050354, 'learning_rate': 1.3131333368801982e-05, 'epoch': 1.25} +2025-02-05 19:32:55 - ERROR - stderr - 42%|████▏ | 9348/22434 [9:25:15<9:25:17, 2.59s/it] +2025-02-05 19:32:58 - ERROR - stderr - 42%|████▏ | 
9349/22434 [9:25:17<9:22:21, 2.58s/it] +2025-02-05 19:32:58 - ERROR - stderr - +2025-02-05 19:32:58 - ERROR - stderr - +2025-02-05 19:32:58 - INFO - stdout - {'loss': 0.7381, 'grad_norm': 1.262675166130066, 'learning_rate': 1.312996219675968e-05, 'epoch': 1.25} +2025-02-05 19:32:58 - ERROR - stderr - 42%|████▏ | 9349/22434 [9:25:17<9:22:21, 2.58s/it] +2025-02-05 19:33:00 - ERROR - stderr - 42%|████▏ | 9350/22434 [9:25:20<9:18:30, 2.56s/it] +2025-02-05 19:33:00 - ERROR - stderr - +2025-02-05 19:33:00 - ERROR - stderr - +2025-02-05 19:33:00 - INFO - stdout - {'loss': 0.7902, 'grad_norm': 1.123143196105957, 'learning_rate': 1.3128590959476313e-05, 'epoch': 1.25} +2025-02-05 19:33:00 - ERROR - stderr - 42%|████▏ | 9350/22434 [9:25:20<9:18:30, 2.56s/it] +2025-02-05 19:33:03 - ERROR - stderr - 42%|████▏ | 9351/22434 [9:25:22<9:09:28, 2.52s/it] +2025-02-05 19:33:03 - ERROR - stderr - +2025-02-05 19:33:03 - ERROR - stderr - +2025-02-05 19:33:03 - INFO - stdout - {'loss': 0.7393, 'grad_norm': 1.2047656774520874, 'learning_rate': 1.3127219656980464e-05, 'epoch': 1.25} +2025-02-05 19:33:03 - ERROR - stderr - 42%|████▏ | 9351/22434 [9:25:22<9:09:28, 2.52s/it] +2025-02-05 19:33:05 - ERROR - stderr - 42%|████▏ | 9352/22434 [9:25:25<9:07:29, 2.51s/it] +2025-02-05 19:33:05 - ERROR - stderr - +2025-02-05 19:33:05 - ERROR - stderr - +2025-02-05 19:33:05 - INFO - stdout - {'loss': 0.7254, 'grad_norm': 1.1314489841461182, 'learning_rate': 1.3125848289300712e-05, 'epoch': 1.25} +2025-02-05 19:33:05 - ERROR - stderr - 42%|████▏ | 9352/22434 [9:25:25<9:07:29, 2.51s/it] +2025-02-05 19:33:07 - ERROR - stderr - 42%|████▏ | 9353/22434 [9:25:27<9:03:30, 2.49s/it] +2025-02-05 19:33:08 - ERROR - stderr - +2025-02-05 19:33:08 - ERROR - stderr - +2025-02-05 19:33:08 - INFO - stdout - {'loss': 0.6892, 'grad_norm': 0.9788809418678284, 'learning_rate': 1.3124476856465642e-05, 'epoch': 1.25} +2025-02-05 19:33:08 - ERROR - stderr - 42%|████▏ | 9353/22434 [9:25:27<9:03:30, 2.49s/it] +2025-02-05 
19:33:10 - ERROR - stderr - 42%|████▏ | 9354/22434 [9:25:30<9:18:46, 2.56s/it] +2025-02-05 19:33:10 - ERROR - stderr - +2025-02-05 19:33:10 - ERROR - stderr - +2025-02-05 19:33:10 - INFO - stdout - {'loss': 0.6575, 'grad_norm': 1.252990484237671, 'learning_rate': 1.3123105358503839e-05, 'epoch': 1.25} +2025-02-05 19:33:10 - ERROR - stderr - 42%|████▏ | 9354/22434 [9:25:30<9:18:46, 2.56s/it] +2025-02-05 19:33:13 - ERROR - stderr - 42%|████▏ | 9355/22434 [9:25:32<9:10:37, 2.53s/it] +2025-02-05 19:33:13 - ERROR - stderr - +2025-02-05 19:33:13 - ERROR - stderr - +2025-02-05 19:33:13 - INFO - stdout - {'loss': 0.7507, 'grad_norm': 1.2373435497283936, 'learning_rate': 1.3121733795443898e-05, 'epoch': 1.25} +2025-02-05 19:33:13 - ERROR - stderr - 42%|████▏ | 9355/22434 [9:25:32<9:10:37, 2.53s/it] +2025-02-05 19:33:15 - ERROR - stderr - 42%|████▏ | 9356/22434 [9:25:35<9:26:02, 2.60s/it] +2025-02-05 19:33:15 - ERROR - stderr - +2025-02-05 19:33:15 - ERROR - stderr - +2025-02-05 19:33:15 - INFO - stdout - {'loss': 0.6792, 'grad_norm': 1.158713936805725, 'learning_rate': 1.3120362167314403e-05, 'epoch': 1.25} +2025-02-05 19:33:15 - ERROR - stderr - 42%|████▏ | 9356/22434 [9:25:35<9:26:02, 2.60s/it] +2025-02-05 19:33:18 - ERROR - stderr - 42%|████▏ | 9357/22434 [9:25:38<9:25:42, 2.60s/it] +2025-02-05 19:33:18 - ERROR - stderr - +2025-02-05 19:33:18 - ERROR - stderr - +2025-02-05 19:33:18 - INFO - stdout - {'loss': 0.8392, 'grad_norm': 1.293854832649231, 'learning_rate': 1.3118990474143941e-05, 'epoch': 1.25} +2025-02-05 19:33:18 - ERROR - stderr - 42%|████▏ | 9357/22434 [9:25:38<9:25:42, 2.60s/it] +2025-02-05 19:33:21 - ERROR - stderr - 42%|████▏ | 9358/22434 [9:25:40<9:20:27, 2.57s/it] +2025-02-05 19:33:21 - ERROR - stderr - +2025-02-05 19:33:21 - ERROR - stderr - +2025-02-05 19:33:21 - INFO - stdout - {'loss': 0.6534, 'grad_norm': 1.1044549942016602, 'learning_rate': 1.3117618715961111e-05, 'epoch': 1.25} +2025-02-05 19:33:21 - ERROR - stderr - 42%|████▏ | 9358/22434 
[9:25:40<9:20:27, 2.57s/it] +2025-02-05 19:33:23 - ERROR - stderr - 42%|████▏ | 9359/22434 [9:25:43<9:12:38, 2.54s/it] +2025-02-05 19:33:23 - ERROR - stderr - +2025-02-05 19:33:23 - ERROR - stderr - +2025-02-05 19:33:23 - INFO - stdout - {'loss': 0.6731, 'grad_norm': 1.1490275859832764, 'learning_rate': 1.31162468927945e-05, 'epoch': 1.25} +2025-02-05 19:33:23 - ERROR - stderr - 42%|████▏ | 9359/22434 [9:25:43<9:12:38, 2.54s/it] +2025-02-05 19:33:25 - ERROR - stderr - 42%|████▏ | 9360/22434 [9:25:45<9:09:47, 2.52s/it] +2025-02-05 19:33:26 - ERROR - stderr - +2025-02-05 19:33:26 - ERROR - stderr - +2025-02-05 19:33:26 - INFO - stdout - {'loss': 0.8295, 'grad_norm': 1.2425954341888428, 'learning_rate': 1.3114875004672705e-05, 'epoch': 1.25} +2025-02-05 19:33:26 - ERROR - stderr - 42%|████▏ | 9360/22434 [9:25:45<9:09:47, 2.52s/it] +2025-02-05 19:33:28 - ERROR - stderr - 42%|████▏ | 9361/22434 [9:25:48<9:27:45, 2.61s/it] +2025-02-05 19:33:28 - ERROR - stderr - +2025-02-05 19:33:28 - ERROR - stderr - +2025-02-05 19:33:28 - INFO - stdout - {'loss': 0.6408, 'grad_norm': 1.1567577123641968, 'learning_rate': 1.3113503051624321e-05, 'epoch': 1.25} +2025-02-05 19:33:28 - ERROR - stderr - 42%|████▏ | 9361/22434 [9:25:48<9:27:45, 2.61s/it] +2025-02-05 19:33:31 - ERROR - stderr - 42%|████▏ | 9362/22434 [9:25:51<9:24:58, 2.59s/it] +2025-02-05 19:33:31 - ERROR - stderr - +2025-02-05 19:33:31 - ERROR - stderr - +2025-02-05 19:33:31 - INFO - stdout - {'loss': 0.7306, 'grad_norm': 1.1190743446350098, 'learning_rate': 1.3112131033677944e-05, 'epoch': 1.25} +2025-02-05 19:33:31 - ERROR - stderr - 42%|████▏ | 9362/22434 [9:25:51<9:24:58, 2.59s/it] +2025-02-05 19:33:33 - ERROR - stderr - 42%|████▏ | 9363/22434 [9:25:53<9:16:48, 2.56s/it] +2025-02-05 19:33:33 - ERROR - stderr - +2025-02-05 19:33:33 - ERROR - stderr - +2025-02-05 19:33:33 - INFO - stdout - {'loss': 0.6627, 'grad_norm': 1.057592749595642, 'learning_rate': 1.3110758950862176e-05, 'epoch': 1.25} +2025-02-05 19:33:33 - ERROR - 
stderr - 42%|████▏ | 9363/22434 [9:25:53<9:16:48, 2.56s/it] +2025-02-05 19:33:36 - ERROR - stderr - 42%|████▏ | 9364/22434 [9:25:56<9:16:30, 2.55s/it] +2025-02-05 19:33:36 - ERROR - stderr - +2025-02-05 19:33:36 - ERROR - stderr - +2025-02-05 19:33:36 - INFO - stdout - {'loss': 0.73, 'grad_norm': 1.2360124588012695, 'learning_rate': 1.3109386803205615e-05, 'epoch': 1.25} +2025-02-05 19:33:36 - ERROR - stderr - 42%|████▏ | 9364/22434 [9:25:56<9:16:30, 2.55s/it] +2025-02-05 19:33:39 - ERROR - stderr - 42%|████▏ | 9365/22434 [9:25:58<9:25:45, 2.60s/it] +2025-02-05 19:33:39 - ERROR - stderr - +2025-02-05 19:33:39 - ERROR - stderr - +2025-02-05 19:33:39 - INFO - stdout - {'loss': 0.7904, 'grad_norm': 1.266740083694458, 'learning_rate': 1.310801459073686e-05, 'epoch': 1.25} +2025-02-05 19:33:39 - ERROR - stderr - 42%|████▏ | 9365/22434 [9:25:58<9:25:45, 2.60s/it] +2025-02-05 19:33:41 - ERROR - stderr - 42%|████▏ | 9366/22434 [9:26:01<9:38:10, 2.65s/it] +2025-02-05 19:33:41 - ERROR - stderr - +2025-02-05 19:33:41 - ERROR - stderr - +2025-02-05 19:33:41 - INFO - stdout - {'loss': 0.6869, 'grad_norm': 1.1983699798583984, 'learning_rate': 1.3106642313484513e-05, 'epoch': 1.25} +2025-02-05 19:33:41 - ERROR - stderr - 42%|████▏ | 9366/22434 [9:26:01<9:38:10, 2.65s/it] +2025-02-05 19:33:44 - ERROR - stderr - 42%|████▏ | 9367/22434 [9:26:04<9:28:02, 2.61s/it] +2025-02-05 19:33:44 - ERROR - stderr - +2025-02-05 19:33:44 - ERROR - stderr - +2025-02-05 19:33:44 - INFO - stdout - {'loss': 0.8036, 'grad_norm': 1.2528026103973389, 'learning_rate': 1.3105269971477181e-05, 'epoch': 1.25} +2025-02-05 19:33:44 - ERROR - stderr - 42%|████▏ | 9367/22434 [9:26:04<9:28:02, 2.61s/it] +2025-02-05 19:33:46 - ERROR - stderr - 42%|████▏ | 9368/22434 [9:26:06<9:17:17, 2.56s/it] +2025-02-05 19:33:46 - ERROR - stderr - +2025-02-05 19:33:46 - ERROR - stderr - +2025-02-05 19:33:46 - INFO - stdout - {'loss': 0.6797, 'grad_norm': 1.3098112344741821, 'learning_rate': 1.3103897564743468e-05, 'epoch': 1.25} 
+2025-02-05 19:33:46 - ERROR - stderr - 42%|████▏ | 9368/22434 [9:26:06<9:17:17, 2.56s/it] +2025-02-05 19:33:49 - ERROR - stderr - 42%|████▏ | 9369/22434 [9:26:09<9:12:37, 2.54s/it] +2025-02-05 19:33:49 - ERROR - stderr - +2025-02-05 19:33:49 - ERROR - stderr - +2025-02-05 19:33:49 - INFO - stdout - {'loss': 0.7617, 'grad_norm': 1.141838788986206, 'learning_rate': 1.3102525093311979e-05, 'epoch': 1.25} +2025-02-05 19:33:49 - ERROR - stderr - 42%|████▏ | 9369/22434 [9:26:09<9:12:37, 2.54s/it] +2025-02-05 19:33:51 - ERROR - stderr - 42%|████▏ | 9370/22434 [9:26:11<9:08:19, 2.52s/it] +2025-02-05 19:33:51 - ERROR - stderr - +2025-02-05 19:33:51 - ERROR - stderr - +2025-02-05 19:33:51 - INFO - stdout - {'loss': 0.627, 'grad_norm': 1.2870361804962158, 'learning_rate': 1.3101152557211325e-05, 'epoch': 1.25} +2025-02-05 19:33:51 - ERROR - stderr - 42%|████▏ | 9370/22434 [9:26:11<9:08:19, 2.52s/it] +2025-02-05 19:33:54 - ERROR - stderr - 42%|████▏ | 9371/22434 [9:26:14<9:07:47, 2.52s/it] +2025-02-05 19:33:54 - ERROR - stderr - +2025-02-05 19:33:54 - ERROR - stderr - +2025-02-05 19:33:54 - INFO - stdout - {'loss': 0.7245, 'grad_norm': 1.0638781785964966, 'learning_rate': 1.3099779956470116e-05, 'epoch': 1.25} +2025-02-05 19:33:54 - ERROR - stderr - 42%|████▏ | 9371/22434 [9:26:14<9:07:47, 2.52s/it] +2025-02-05 19:33:56 - ERROR - stderr - 42%|████▏ | 9372/22434 [9:26:16<9:10:58, 2.53s/it] +2025-02-05 19:33:56 - ERROR - stderr - +2025-02-05 19:33:56 - ERROR - stderr - +2025-02-05 19:33:56 - INFO - stdout - {'loss': 0.677, 'grad_norm': 1.3090872764587402, 'learning_rate': 1.3098407291116958e-05, 'epoch': 1.25} +2025-02-05 19:33:56 - ERROR - stderr - 42%|████▏ | 9372/22434 [9:26:16<9:10:58, 2.53s/it] +2025-02-05 19:33:59 - ERROR - stderr - 42%|████▏ | 9373/22434 [9:26:19<9:17:16, 2.56s/it] +2025-02-05 19:33:59 - ERROR - stderr - +2025-02-05 19:33:59 - ERROR - stderr - +2025-02-05 19:33:59 - INFO - stdout - {'loss': 0.7655, 'grad_norm': 1.3152028322219849, 'learning_rate': 
1.3097034561180463e-05, 'epoch': 1.25} +2025-02-05 19:33:59 - ERROR - stderr - 42%|████▏ | 9373/22434 [9:26:19<9:17:16, 2.56s/it] +2025-02-05 19:34:01 - ERROR - stderr - 42%|████▏ | 9374/22434 [9:26:21<9:11:31, 2.53s/it] +2025-02-05 19:34:01 - ERROR - stderr - +2025-02-05 19:34:01 - ERROR - stderr - +2025-02-05 19:34:01 - INFO - stdout - {'loss': 0.767, 'grad_norm': 1.2338777780532837, 'learning_rate': 1.3095661766689245e-05, 'epoch': 1.25} +2025-02-05 19:34:01 - ERROR - stderr - 42%|████▏ | 9374/22434 [9:26:21<9:11:31, 2.53s/it] +2025-02-05 19:34:04 - ERROR - stderr - 42%|████▏ | 9375/22434 [9:26:24<9:14:58, 2.55s/it] +2025-02-05 19:34:04 - ERROR - stderr - +2025-02-05 19:34:04 - ERROR - stderr - +2025-02-05 19:34:04 - INFO - stdout - {'loss': 0.7378, 'grad_norm': 1.2580115795135498, 'learning_rate': 1.3094288907671924e-05, 'epoch': 1.25} +2025-02-05 19:34:04 - ERROR - stderr - 42%|████▏ | 9375/22434 [9:26:24<9:14:58, 2.55s/it] +2025-02-05 19:34:07 - ERROR - stderr - 42%|████▏ | 9376/22434 [9:26:26<9:12:59, 2.54s/it] +2025-02-05 19:34:07 - ERROR - stderr - +2025-02-05 19:34:07 - ERROR - stderr - +2025-02-05 19:34:07 - INFO - stdout - {'loss': 0.7224, 'grad_norm': 1.122197151184082, 'learning_rate': 1.3092915984157108e-05, 'epoch': 1.25} +2025-02-05 19:34:07 - ERROR - stderr - 42%|████▏ | 9376/22434 [9:26:26<9:12:59, 2.54s/it] +2025-02-05 19:34:09 - ERROR - stderr - 42%|████▏ | 9377/22434 [9:26:29<9:12:35, 2.54s/it] +2025-02-05 19:34:09 - ERROR - stderr - +2025-02-05 19:34:09 - ERROR - stderr - +2025-02-05 19:34:09 - INFO - stdout - {'loss': 0.6683, 'grad_norm': 1.1007460355758667, 'learning_rate': 1.3091542996173421e-05, 'epoch': 1.25} +2025-02-05 19:34:09 - ERROR - stderr - 42%|████▏ | 9377/22434 [9:26:29<9:12:35, 2.54s/it] +2025-02-05 19:34:12 - ERROR - stderr - 42%|████▏ | 9378/22434 [9:26:31<9:07:36, 2.52s/it] +2025-02-05 19:34:12 - ERROR - stderr - +2025-02-05 19:34:12 - ERROR - stderr - +2025-02-05 19:34:12 - INFO - stdout - {'loss': 0.6999, 'grad_norm': 
1.2023651599884033, 'learning_rate': 1.3090169943749475e-05, 'epoch': 1.25} +2025-02-05 19:34:12 - ERROR - stderr - 42%|████▏ | 9378/22434 [9:26:31<9:07:36, 2.52s/it] +2025-02-05 19:34:14 - ERROR - stderr - 42%|████▏ | 9379/22434 [9:26:34<9:24:26, 2.59s/it] +2025-02-05 19:34:14 - ERROR - stderr - +2025-02-05 19:34:14 - ERROR - stderr - +2025-02-05 19:34:14 - INFO - stdout - {'loss': 0.6954, 'grad_norm': 1.0603456497192383, 'learning_rate': 1.3088796826913897e-05, 'epoch': 1.25} +2025-02-05 19:34:14 - ERROR - stderr - 42%|████▏ | 9379/22434 [9:26:34<9:24:26, 2.59s/it] +2025-02-05 19:34:17 - ERROR - stderr - 42%|████▏ | 9380/22434 [9:26:37<9:16:49, 2.56s/it] +2025-02-05 19:34:17 - ERROR - stderr - +2025-02-05 19:34:17 - ERROR - stderr - +2025-02-05 19:34:17 - INFO - stdout - {'loss': 0.7247, 'grad_norm': 1.1769858598709106, 'learning_rate': 1.3087423645695303e-05, 'epoch': 1.25} +2025-02-05 19:34:17 - ERROR - stderr - 42%|████▏ | 9380/22434 [9:26:37<9:16:49, 2.56s/it] +2025-02-05 19:34:19 - ERROR - stderr - 42%|████▏ | 9381/22434 [9:26:39<9:18:37, 2.57s/it] +2025-02-05 19:34:19 - ERROR - stderr - +2025-02-05 19:34:19 - ERROR - stderr - +2025-02-05 19:34:19 - INFO - stdout - {'loss': 0.6792, 'grad_norm': 1.1224919557571411, 'learning_rate': 1.3086050400122316e-05, 'epoch': 1.25} +2025-02-05 19:34:19 - ERROR - stderr - 42%|████▏ | 9381/22434 [9:26:39<9:18:37, 2.57s/it] +2025-02-05 19:34:22 - ERROR - stderr - 42%|████▏ | 9382/22434 [9:26:42<9:16:06, 2.56s/it] +2025-02-05 19:34:22 - ERROR - stderr - +2025-02-05 19:34:22 - ERROR - stderr - +2025-02-05 19:34:22 - INFO - stdout - {'loss': 0.7262, 'grad_norm': 1.1778157949447632, 'learning_rate': 1.3084677090223563e-05, 'epoch': 1.25} +2025-02-05 19:34:22 - ERROR - stderr - 42%|████▏ | 9382/22434 [9:26:42<9:16:06, 2.56s/it] +2025-02-05 19:34:24 - ERROR - stderr - 42%|████▏ | 9383/22434 [9:26:44<9:15:50, 2.56s/it] +2025-02-05 19:34:25 - ERROR - stderr - +2025-02-05 19:34:25 - ERROR - stderr - +2025-02-05 19:34:25 - INFO - 
stdout - {'loss': 0.7012, 'grad_norm': 1.1295745372772217, 'learning_rate': 1.3083303716027671e-05, 'epoch': 1.25} +2025-02-05 19:34:25 - ERROR - stderr - 42%|████▏ | 9383/22434 [9:26:44<9:15:50, 2.56s/it] +2025-02-05 19:34:27 - ERROR - stderr - 42%|████▏ | 9384/22434 [9:26:47<9:09:29, 2.53s/it] +2025-02-05 19:34:27 - ERROR - stderr - +2025-02-05 19:34:27 - ERROR - stderr - +2025-02-05 19:34:27 - INFO - stdout - {'loss': 0.6946, 'grad_norm': 1.119362473487854, 'learning_rate': 1.3081930277563259e-05, 'epoch': 1.25} +2025-02-05 19:34:27 - ERROR - stderr - 42%|████▏ | 9384/22434 [9:26:47<9:09:29, 2.53s/it] +2025-02-05 19:34:29 - ERROR - stderr - 42%|████▏ | 9385/22434 [9:26:49<9:06:14, 2.51s/it] +2025-02-05 19:34:29 - ERROR - stderr - +2025-02-05 19:34:29 - ERROR - stderr - +2025-02-05 19:34:29 - INFO - stdout - {'loss': 0.722, 'grad_norm': 1.1715532541275024, 'learning_rate': 1.3080556774858962e-05, 'epoch': 1.26} +2025-02-05 19:34:29 - ERROR - stderr - 42%|████▏ | 9385/22434 [9:26:49<9:06:14, 2.51s/it] +2025-02-05 19:34:32 - ERROR - stderr - 42%|████▏ | 9386/22434 [9:26:52<9:01:57, 2.49s/it] +2025-02-05 19:34:32 - ERROR - stderr - +2025-02-05 19:34:32 - ERROR - stderr - +2025-02-05 19:34:32 - INFO - stdout - {'loss': 0.7933, 'grad_norm': 1.231411099433899, 'learning_rate': 1.3079183207943402e-05, 'epoch': 1.26} +2025-02-05 19:34:32 - ERROR - stderr - 42%|████▏ | 9386/22434 [9:26:52<9:01:57, 2.49s/it] +2025-02-05 19:34:34 - ERROR - stderr - 42%|████▏ | 9387/22434 [9:26:54<9:06:10, 2.51s/it] +2025-02-05 19:34:34 - ERROR - stderr - +2025-02-05 19:34:34 - ERROR - stderr - +2025-02-05 19:34:34 - INFO - stdout - {'loss': 0.7002, 'grad_norm': 1.2970679998397827, 'learning_rate': 1.3077809576845219e-05, 'epoch': 1.26} +2025-02-05 19:34:34 - ERROR - stderr - 42%|████▏ | 9387/22434 [9:26:54<9:06:10, 2.51s/it] +2025-02-05 19:34:37 - ERROR - stderr - 42%|████▏ | 9388/22434 [9:26:57<9:15:00, 2.55s/it] +2025-02-05 19:34:37 - ERROR - stderr - +2025-02-05 19:34:37 - ERROR - stderr 
- +2025-02-05 19:34:37 - INFO - stdout - {'loss': 0.7611, 'grad_norm': 1.1691569089889526, 'learning_rate': 1.3076435881593042e-05, 'epoch': 1.26} +2025-02-05 19:34:37 - ERROR - stderr - 42%|████▏ | 9388/22434 [9:26:57<9:15:00, 2.55s/it] +2025-02-05 19:34:40 - ERROR - stderr - 42%|████▏ | 9389/22434 [9:26:59<9:15:22, 2.55s/it] +2025-02-05 19:34:40 - ERROR - stderr - +2025-02-05 19:34:40 - ERROR - stderr - +2025-02-05 19:34:40 - INFO - stdout - {'loss': 0.7141, 'grad_norm': 1.243012547492981, 'learning_rate': 1.3075062122215498e-05, 'epoch': 1.26} +2025-02-05 19:34:40 - ERROR - stderr - 42%|████▏ | 9389/22434 [9:26:59<9:15:22, 2.55s/it] +2025-02-05 19:34:42 - ERROR - stderr - 42%|████▏ | 9390/22434 [9:27:02<9:10:37, 2.53s/it] +2025-02-05 19:34:42 - ERROR - stderr - +2025-02-05 19:34:42 - ERROR - stderr - +2025-02-05 19:34:42 - INFO - stdout - {'loss': 0.7417, 'grad_norm': 1.1296825408935547, 'learning_rate': 1.307368829874123e-05, 'epoch': 1.26} +2025-02-05 19:34:42 - ERROR - stderr - 42%|████▏ | 9390/22434 [9:27:02<9:10:37, 2.53s/it] +2025-02-05 19:34:45 - ERROR - stderr - 42%|████▏ | 9391/22434 [9:27:04<9:07:36, 2.52s/it] +2025-02-05 19:34:45 - ERROR - stderr - +2025-02-05 19:34:45 - ERROR - stderr - +2025-02-05 19:34:45 - INFO - stdout - {'loss': 0.7318, 'grad_norm': 1.1206094026565552, 'learning_rate': 1.3072314411198868e-05, 'epoch': 1.26} +2025-02-05 19:34:45 - ERROR - stderr - 42%|████▏ | 9391/22434 [9:27:04<9:07:36, 2.52s/it] +2025-02-05 19:34:47 - ERROR - stderr - 42%|████▏ | 9392/22434 [9:27:07<9:10:10, 2.53s/it] +2025-02-05 19:34:47 - ERROR - stderr - +2025-02-05 19:34:47 - ERROR - stderr - +2025-02-05 19:34:47 - INFO - stdout - {'loss': 0.6675, 'grad_norm': 1.0847147703170776, 'learning_rate': 1.3070940459617053e-05, 'epoch': 1.26} +2025-02-05 19:34:47 - ERROR - stderr - 42%|████▏ | 9392/22434 [9:27:07<9:10:10, 2.53s/it] +2025-02-05 19:34:50 - ERROR - stderr - 42%|████▏ | 9393/22434 [9:27:09<9:10:59, 2.54s/it] +2025-02-05 19:34:50 - ERROR - stderr - 
+2025-02-05 19:34:50 - ERROR - stderr - +2025-02-05 19:34:50 - INFO - stdout - {'loss': 0.6377, 'grad_norm': 1.01374351978302, 'learning_rate': 1.3069566444024423e-05, 'epoch': 1.26} +2025-02-05 19:34:50 - ERROR - stderr - 42%|████▏ | 9393/22434 [9:27:09<9:10:59, 2.54s/it] +2025-02-05 19:34:52 - ERROR - stderr - 42%|████▏ | 9394/22434 [9:27:12<9:04:01, 2.50s/it] +2025-02-05 19:34:52 - ERROR - stderr - +2025-02-05 19:34:52 - ERROR - stderr - +2025-02-05 19:34:52 - INFO - stdout - {'loss': 0.6765, 'grad_norm': 1.1284648180007935, 'learning_rate': 1.3068192364449618e-05, 'epoch': 1.26} +2025-02-05 19:34:52 - ERROR - stderr - 42%|████▏ | 9394/22434 [9:27:12<9:04:01, 2.50s/it] +2025-02-05 19:34:55 - ERROR - stderr - 42%|████▏ | 9395/22434 [9:27:15<9:17:42, 2.57s/it] +2025-02-05 19:34:55 - ERROR - stderr - +2025-02-05 19:34:55 - ERROR - stderr - +2025-02-05 19:34:55 - INFO - stdout - {'loss': 0.6768, 'grad_norm': 1.135588526725769, 'learning_rate': 1.3066818220921283e-05, 'epoch': 1.26} +2025-02-05 19:34:55 - ERROR - stderr - 42%|████▏ | 9395/22434 [9:27:15<9:17:42, 2.57s/it] +2025-02-05 19:34:58 - ERROR - stderr - 42%|████▏ | 9396/22434 [9:27:18<9:54:58, 2.74s/it] +2025-02-05 19:34:58 - ERROR - stderr - +2025-02-05 19:34:58 - ERROR - stderr - +2025-02-05 19:34:58 - INFO - stdout - {'loss': 0.7227, 'grad_norm': 1.1649090051651, 'learning_rate': 1.3065444013468052e-05, 'epoch': 1.26} +2025-02-05 19:34:58 - ERROR - stderr - 42%|████▏ | 9396/22434 [9:27:18<9:54:58, 2.74s/it] +2025-02-05 19:35:01 - ERROR - stderr - 42%|████▏ | 9397/22434 [9:27:20<9:44:06, 2.69s/it] +2025-02-05 19:35:01 - ERROR - stderr - +2025-02-05 19:35:01 - ERROR - stderr - +2025-02-05 19:35:01 - INFO - stdout - {'loss': 0.7394, 'grad_norm': 1.1343060731887817, 'learning_rate': 1.3064069742118575e-05, 'epoch': 1.26} +2025-02-05 19:35:01 - ERROR - stderr - 42%|████▏ | 9397/22434 [9:27:20<9:44:06, 2.69s/it] +2025-02-05 19:35:03 - ERROR - stderr - 42%|████▏ | 9398/22434 [9:27:23<9:45:20, 2.69s/it] 
+2025-02-05 19:35:03 - ERROR - stderr - +2025-02-05 19:35:03 - ERROR - stderr - +2025-02-05 19:35:03 - INFO - stdout - {'loss': 0.7554, 'grad_norm': 1.199958086013794, 'learning_rate': 1.3062695406901496e-05, 'epoch': 1.26} +2025-02-05 19:35:03 - ERROR - stderr - 42%|████▏ | 9398/22434 [9:27:23<9:45:20, 2.69s/it] +2025-02-05 19:35:06 - ERROR - stderr - 42%|████▏ | 9399/22434 [9:27:26<9:34:58, 2.65s/it] +2025-02-05 19:35:06 - ERROR - stderr - +2025-02-05 19:35:06 - ERROR - stderr - +2025-02-05 19:35:06 - INFO - stdout - {'loss': 0.7025, 'grad_norm': 1.285776138305664, 'learning_rate': 1.306132100784546e-05, 'epoch': 1.26} +2025-02-05 19:35:06 - ERROR - stderr - 42%|████▏ | 9399/22434 [9:27:26<9:34:58, 2.65s/it] +2025-02-05 19:35:08 - ERROR - stderr - 42%|████▏ | 9400/22434 [9:27:28<9:31:03, 2.63s/it] +2025-02-05 19:35:08 - ERROR - stderr - +2025-02-05 19:35:08 - ERROR - stderr - +2025-02-05 19:35:08 - INFO - stdout - {'loss': 0.644, 'grad_norm': 1.2102628946304321, 'learning_rate': 1.305994654497912e-05, 'epoch': 1.26} +2025-02-05 19:35:08 - ERROR - stderr - 42%|████▏ | 9400/22434 [9:27:28<9:31:03, 2.63s/it] +2025-02-05 19:35:11 - ERROR - stderr - 42%|████▏ | 9401/22434 [9:27:31<9:25:24, 2.60s/it] +2025-02-05 19:35:11 - ERROR - stderr - +2025-02-05 19:35:11 - ERROR - stderr - +2025-02-05 19:35:11 - INFO - stdout - {'loss': 0.6759, 'grad_norm': 1.2030905485153198, 'learning_rate': 1.3058572018331122e-05, 'epoch': 1.26} +2025-02-05 19:35:11 - ERROR - stderr - 42%|████▏ | 9401/22434 [9:27:31<9:25:24, 2.60s/it] +2025-02-05 19:35:13 - ERROR - stderr - 42%|████▏ | 9402/22434 [9:27:33<9:17:39, 2.57s/it] +2025-02-05 19:35:13 - ERROR - stderr - +2025-02-05 19:35:13 - ERROR - stderr - +2025-02-05 19:35:13 - INFO - stdout - {'loss': 0.7913, 'grad_norm': 1.270838737487793, 'learning_rate': 1.3057197427930114e-05, 'epoch': 1.26} +2025-02-05 19:35:13 - ERROR - stderr - 42%|████▏ | 9402/22434 [9:27:33<9:17:39, 2.57s/it] +2025-02-05 19:35:16 - ERROR - stderr - 42%|████▏ | 
9403/22434 [9:27:36<9:28:01, 2.62s/it] +2025-02-05 19:35:16 - ERROR - stderr - +2025-02-05 19:35:16 - ERROR - stderr - +2025-02-05 19:35:16 - INFO - stdout - {'loss': 0.7765, 'grad_norm': 1.2724796533584595, 'learning_rate': 1.3055822773804757e-05, 'epoch': 1.26} +2025-02-05 19:35:16 - ERROR - stderr - 42%|████▏ | 9403/22434 [9:27:36<9:28:01, 2.62s/it] +2025-02-05 19:35:19 - ERROR - stderr - 42%|████▏ | 9404/22434 [9:27:38<9:20:13, 2.58s/it] +2025-02-05 19:35:19 - ERROR - stderr - +2025-02-05 19:35:19 - ERROR - stderr - +2025-02-05 19:35:19 - INFO - stdout - {'loss': 0.6444, 'grad_norm': 0.9970998167991638, 'learning_rate': 1.3054448055983694e-05, 'epoch': 1.26} +2025-02-05 19:35:19 - ERROR - stderr - 42%|████▏ | 9404/22434 [9:27:38<9:20:13, 2.58s/it] +2025-02-05 19:35:21 - ERROR - stderr - 42%|████▏ | 9405/22434 [9:27:41<9:12:35, 2.54s/it] +2025-02-05 19:35:21 - ERROR - stderr - +2025-02-05 19:35:21 - ERROR - stderr - +2025-02-05 19:35:21 - INFO - stdout - {'loss': 0.7297, 'grad_norm': 1.1529046297073364, 'learning_rate': 1.3053073274495582e-05, 'epoch': 1.26} +2025-02-05 19:35:21 - ERROR - stderr - 42%|████▏ | 9405/22434 [9:27:41<9:12:35, 2.54s/it] +2025-02-05 19:35:24 - ERROR - stderr - 42%|████▏ | 9406/22434 [9:27:43<9:07:48, 2.52s/it] +2025-02-05 19:35:24 - ERROR - stderr - +2025-02-05 19:35:24 - ERROR - stderr - +2025-02-05 19:35:24 - INFO - stdout - {'loss': 0.8698, 'grad_norm': 1.4148932695388794, 'learning_rate': 1.3051698429369082e-05, 'epoch': 1.26} +2025-02-05 19:35:24 - ERROR - stderr - 42%|████▏ | 9406/22434 [9:27:43<9:07:48, 2.52s/it] +2025-02-05 19:35:26 - ERROR - stderr - 42%|████▏ | 9407/22434 [9:27:46<9:03:48, 2.50s/it] +2025-02-05 19:35:26 - ERROR - stderr - +2025-02-05 19:35:26 - ERROR - stderr - +2025-02-05 19:35:26 - INFO - stdout - {'loss': 0.7856, 'grad_norm': 1.2366430759429932, 'learning_rate': 1.305032352063285e-05, 'epoch': 1.26} +2025-02-05 19:35:26 - ERROR - stderr - 42%|████▏ | 9407/22434 [9:27:46<9:03:48, 2.50s/it] +2025-02-05 
19:35:29 - ERROR - stderr - 42%|████▏ | 9408/22434 [9:27:49<9:40:58, 2.68s/it] +2025-02-05 19:35:29 - ERROR - stderr - +2025-02-05 19:35:29 - ERROR - stderr - +2025-02-05 19:35:29 - INFO - stdout - {'loss': 0.6646, 'grad_norm': 1.1664479970932007, 'learning_rate': 1.3048948548315541e-05, 'epoch': 1.26} +2025-02-05 19:35:29 - ERROR - stderr - 42%|████▏ | 9408/22434 [9:27:49<9:40:58, 2.68s/it] +2025-02-05 19:35:32 - ERROR - stderr - 42%|████▏ | 9409/22434 [9:27:51<9:31:13, 2.63s/it] +2025-02-05 19:35:32 - ERROR - stderr - +2025-02-05 19:35:32 - ERROR - stderr - +2025-02-05 19:35:32 - INFO - stdout - {'loss': 0.7002, 'grad_norm': 1.0505180358886719, 'learning_rate': 1.3047573512445817e-05, 'epoch': 1.26} +2025-02-05 19:35:32 - ERROR - stderr - 42%|████▏ | 9409/22434 [9:27:51<9:31:13, 2.63s/it] +2025-02-05 19:35:34 - ERROR - stderr - 42%|████▏ | 9410/22434 [9:27:54<9:26:36, 2.61s/it] +2025-02-05 19:35:34 - ERROR - stderr - +2025-02-05 19:35:34 - ERROR - stderr - +2025-02-05 19:35:34 - INFO - stdout - {'loss': 0.755, 'grad_norm': 1.2695668935775757, 'learning_rate': 1.3046198413052337e-05, 'epoch': 1.26} +2025-02-05 19:35:34 - ERROR - stderr - 42%|████▏ | 9410/22434 [9:27:54<9:26:36, 2.61s/it] +2025-02-05 19:35:37 - ERROR - stderr - 42%|████▏ | 9411/22434 [9:27:56<9:18:10, 2.57s/it] +2025-02-05 19:35:37 - ERROR - stderr - +2025-02-05 19:35:37 - ERROR - stderr - +2025-02-05 19:35:37 - INFO - stdout - {'loss': 0.636, 'grad_norm': 1.1096209287643433, 'learning_rate': 1.3044823250163772e-05, 'epoch': 1.26} +2025-02-05 19:35:37 - ERROR - stderr - 42%|████▏ | 9411/22434 [9:27:56<9:18:10, 2.57s/it] +2025-02-05 19:35:39 - ERROR - stderr - 42%|████▏ | 9412/22434 [9:27:59<9:15:59, 2.56s/it] +2025-02-05 19:35:39 - ERROR - stderr - +2025-02-05 19:35:39 - ERROR - stderr - +2025-02-05 19:35:39 - INFO - stdout - {'loss': 0.8366, 'grad_norm': 1.2991843223571777, 'learning_rate': 1.3043448023808774e-05, 'epoch': 1.26} +2025-02-05 19:35:39 - ERROR - stderr - 42%|████▏ | 9412/22434 
[9:27:59<9:15:59, 2.56s/it] +2025-02-05 19:35:42 - ERROR - stderr - 42%|████▏ | 9413/22434 [9:28:01<9:09:51, 2.53s/it] +2025-02-05 19:35:42 - ERROR - stderr - +2025-02-05 19:35:42 - ERROR - stderr - +2025-02-05 19:35:42 - INFO - stdout - {'loss': 0.7478, 'grad_norm': 1.1352345943450928, 'learning_rate': 1.3042072734016018e-05, 'epoch': 1.26} +2025-02-05 19:35:42 - ERROR - stderr - 42%|████▏ | 9413/22434 [9:28:01<9:09:51, 2.53s/it] +2025-02-05 19:35:44 - ERROR - stderr - 42%|████▏ | 9414/22434 [9:28:04<9:07:32, 2.52s/it] +2025-02-05 19:35:44 - ERROR - stderr - +2025-02-05 19:35:44 - ERROR - stderr - +2025-02-05 19:35:44 - INFO - stdout - {'loss': 0.6572, 'grad_norm': 1.119526743888855, 'learning_rate': 1.3040697380814165e-05, 'epoch': 1.26} +2025-02-05 19:35:44 - ERROR - stderr - 42%|████▏ | 9414/22434 [9:28:04<9:07:32, 2.52s/it] +2025-02-05 19:35:47 - ERROR - stderr - 42%|████▏ | 9415/22434 [9:28:07<9:12:38, 2.55s/it] +2025-02-05 19:35:47 - ERROR - stderr - +2025-02-05 19:35:47 - ERROR - stderr - +2025-02-05 19:35:47 - INFO - stdout - {'loss': 0.7258, 'grad_norm': 1.1450073719024658, 'learning_rate': 1.3039321964231887e-05, 'epoch': 1.26} +2025-02-05 19:35:47 - ERROR - stderr - 42%|████▏ | 9415/22434 [9:28:07<9:12:38, 2.55s/it] +2025-02-05 19:35:49 - ERROR - stderr - 42%|████▏ | 9416/22434 [9:28:09<9:15:12, 2.56s/it] +2025-02-05 19:35:49 - ERROR - stderr - +2025-02-05 19:35:49 - ERROR - stderr - +2025-02-05 19:35:49 - INFO - stdout - {'loss': 0.6861, 'grad_norm': 1.1701388359069824, 'learning_rate': 1.303794648429785e-05, 'epoch': 1.26} +2025-02-05 19:35:49 - ERROR - stderr - 42%|████▏ | 9416/22434 [9:28:09<9:15:12, 2.56s/it] +2025-02-05 19:35:52 - ERROR - stderr - 42%|████▏ | 9417/22434 [9:28:12<9:07:29, 2.52s/it] +2025-02-05 19:35:52 - ERROR - stderr - +2025-02-05 19:35:52 - ERROR - stderr - +2025-02-05 19:35:52 - INFO - stdout - {'loss': 0.65, 'grad_norm': 1.1093292236328125, 'learning_rate': 1.3036570941040722e-05, 'epoch': 1.26} +2025-02-05 19:35:52 - ERROR - 
stderr - 42%|████▏ | 9417/22434 [9:28:12<9:07:29, 2.52s/it] +2025-02-05 19:35:54 - ERROR - stderr - 42%|████▏ | 9418/22434 [9:28:14<9:06:52, 2.52s/it] +2025-02-05 19:35:54 - ERROR - stderr - +2025-02-05 19:35:54 - ERROR - stderr - +2025-02-05 19:35:54 - INFO - stdout - {'loss': 0.7536, 'grad_norm': 1.1471846103668213, 'learning_rate': 1.303519533448918e-05, 'epoch': 1.26} +2025-02-05 19:35:54 - ERROR - stderr - 42%|████▏ | 9418/22434 [9:28:14<9:06:52, 2.52s/it] +2025-02-05 19:35:57 - ERROR - stderr - 42%|████▏ | 9419/22434 [9:28:17<9:41:51, 2.68s/it] +2025-02-05 19:35:57 - ERROR - stderr - +2025-02-05 19:35:57 - ERROR - stderr - +2025-02-05 19:35:57 - INFO - stdout - {'loss': 0.7343, 'grad_norm': 1.1413859128952026, 'learning_rate': 1.3033819664671898e-05, 'epoch': 1.26} +2025-02-05 19:35:57 - ERROR - stderr - 42%|████▏ | 9419/22434 [9:28:17<9:41:51, 2.68s/it] +2025-02-05 19:36:00 - ERROR - stderr - 42%|████▏ | 9420/22434 [9:28:20<9:29:50, 2.63s/it] +2025-02-05 19:36:00 - ERROR - stderr - +2025-02-05 19:36:00 - ERROR - stderr - +2025-02-05 19:36:00 - INFO - stdout - {'loss': 0.6884, 'grad_norm': 1.2191188335418701, 'learning_rate': 1.3032443931617547e-05, 'epoch': 1.26} +2025-02-05 19:36:00 - ERROR - stderr - 42%|████▏ | 9420/22434 [9:28:20<9:29:50, 2.63s/it] +2025-02-05 19:36:02 - ERROR - stderr - 42%|████▏ | 9421/22434 [9:28:22<9:22:22, 2.59s/it] +2025-02-05 19:36:02 - ERROR - stderr - +2025-02-05 19:36:02 - ERROR - stderr - +2025-02-05 19:36:02 - INFO - stdout - {'loss': 0.6976, 'grad_norm': 1.1447863578796387, 'learning_rate': 1.3031068135354805e-05, 'epoch': 1.26} +2025-02-05 19:36:02 - ERROR - stderr - 42%|████▏ | 9421/22434 [9:28:22<9:22:22, 2.59s/it] +2025-02-05 19:36:05 - ERROR - stderr - 42%|████▏ | 9422/22434 [9:28:25<9:24:19, 2.60s/it] +2025-02-05 19:36:05 - ERROR - stderr - +2025-02-05 19:36:05 - ERROR - stderr - +2025-02-05 19:36:05 - INFO - stdout - {'loss': 0.6479, 'grad_norm': 1.0656601190567017, 'learning_rate': 1.3029692275912346e-05, 'epoch': 
1.26} +2025-02-05 19:36:05 - ERROR - stderr - 42%|████▏ | 9422/22434 [9:28:25<9:24:19, 2.60s/it] +2025-02-05 19:36:08 - ERROR - stderr - 42%|████▏ | 9423/22434 [9:28:27<9:18:47, 2.58s/it] +2025-02-05 19:36:08 - ERROR - stderr - +2025-02-05 19:36:08 - ERROR - stderr - +2025-02-05 19:36:08 - INFO - stdout - {'loss': 0.7593, 'grad_norm': 1.2403727769851685, 'learning_rate': 1.3028316353318853e-05, 'epoch': 1.26} +2025-02-05 19:36:08 - ERROR - stderr - 42%|████▏ | 9423/22434 [9:28:27<9:18:47, 2.58s/it] +2025-02-05 19:36:10 - ERROR - stderr - 42%|████▏ | 9424/22434 [9:28:30<9:11:16, 2.54s/it] +2025-02-05 19:36:10 - ERROR - stderr - +2025-02-05 19:36:10 - ERROR - stderr - +2025-02-05 19:36:10 - INFO - stdout - {'loss': 0.7513, 'grad_norm': 1.1008222103118896, 'learning_rate': 1.3026940367603e-05, 'epoch': 1.26} +2025-02-05 19:36:10 - ERROR - stderr - 42%|████▏ | 9424/22434 [9:28:30<9:11:16, 2.54s/it] +2025-02-05 19:36:12 - ERROR - stderr - 42%|████▏ | 9425/22434 [9:28:32<9:08:41, 2.53s/it] +2025-02-05 19:36:13 - ERROR - stderr - +2025-02-05 19:36:13 - ERROR - stderr - +2025-02-05 19:36:13 - INFO - stdout - {'loss': 0.7907, 'grad_norm': 1.1596697568893433, 'learning_rate': 1.3025564318793473e-05, 'epoch': 1.26} +2025-02-05 19:36:13 - ERROR - stderr - 42%|████▏ | 9425/22434 [9:28:32<9:08:41, 2.53s/it] +2025-02-05 19:36:15 - ERROR - stderr - 42%|████▏ | 9426/22434 [9:28:35<9:14:19, 2.56s/it] +2025-02-05 19:36:15 - ERROR - stderr - +2025-02-05 19:36:15 - ERROR - stderr - +2025-02-05 19:36:15 - INFO - stdout - {'loss': 0.6599, 'grad_norm': 1.0629615783691406, 'learning_rate': 1.3024188206918955e-05, 'epoch': 1.26} +2025-02-05 19:36:15 - ERROR - stderr - 42%|████▏ | 9426/22434 [9:28:35<9:14:19, 2.56s/it] +2025-02-05 19:36:18 - ERROR - stderr - 42%|████▏ | 9427/22434 [9:28:37<9:08:44, 2.53s/it] +2025-02-05 19:36:18 - ERROR - stderr - +2025-02-05 19:36:18 - ERROR - stderr - +2025-02-05 19:36:18 - INFO - stdout - {'loss': 0.6271, 'grad_norm': 1.0667093992233276, 'learning_rate': 
1.3022812032008128e-05, 'epoch': 1.26} +2025-02-05 19:36:18 - ERROR - stderr - 42%|████▏ | 9427/22434 [9:28:37<9:08:44, 2.53s/it] +2025-02-05 19:36:20 - ERROR - stderr - 42%|████▏ | 9428/22434 [9:28:40<9:34:08, 2.65s/it] +2025-02-05 19:36:21 - ERROR - stderr - +2025-02-05 19:36:21 - ERROR - stderr - +2025-02-05 19:36:21 - INFO - stdout - {'loss': 0.7216, 'grad_norm': 1.111937403678894, 'learning_rate': 1.3021435794089674e-05, 'epoch': 1.26} +2025-02-05 19:36:21 - ERROR - stderr - 42%|████▏ | 9428/22434 [9:28:40<9:34:08, 2.65s/it] +2025-02-05 19:36:23 - ERROR - stderr - 42%|████▏ | 9429/22434 [9:28:43<9:26:06, 2.61s/it] +2025-02-05 19:36:23 - ERROR - stderr - +2025-02-05 19:36:23 - ERROR - stderr - +2025-02-05 19:36:23 - INFO - stdout - {'loss': 0.718, 'grad_norm': 1.231778860092163, 'learning_rate': 1.3020059493192283e-05, 'epoch': 1.26} +2025-02-05 19:36:23 - ERROR - stderr - 42%|████▏ | 9429/22434 [9:28:43<9:26:06, 2.61s/it] +2025-02-05 19:36:25 - ERROR - stderr - 42%|████▏ | 9430/22434 [9:28:45<9:14:08, 2.56s/it] +2025-02-05 19:36:25 - ERROR - stderr - +2025-02-05 19:36:25 - ERROR - stderr - +2025-02-05 19:36:25 - INFO - stdout - {'loss': 0.7428, 'grad_norm': 1.161985993385315, 'learning_rate': 1.301868312934464e-05, 'epoch': 1.26} +2025-02-05 19:36:25 - ERROR - stderr - 42%|████▏ | 9430/22434 [9:28:45<9:14:08, 2.56s/it] +2025-02-05 19:36:28 - ERROR - stderr - 42%|████▏ | 9431/22434 [9:28:48<9:12:03, 2.55s/it] +2025-02-05 19:36:28 - ERROR - stderr - +2025-02-05 19:36:28 - ERROR - stderr - +2025-02-05 19:36:28 - INFO - stdout - {'loss': 0.703, 'grad_norm': 1.14393949508667, 'learning_rate': 1.3017306702575437e-05, 'epoch': 1.26} +2025-02-05 19:36:28 - ERROR - stderr - 42%|████▏ | 9431/22434 [9:28:48<9:12:03, 2.55s/it] +2025-02-05 19:36:31 - ERROR - stderr - 42%|████▏ | 9432/22434 [9:28:50<9:19:21, 2.58s/it] +2025-02-05 19:36:31 - ERROR - stderr - +2025-02-05 19:36:31 - ERROR - stderr - +2025-02-05 19:36:31 - INFO - stdout - {'loss': 0.7935, 'grad_norm': 
1.1766678094863892, 'learning_rate': 1.3015930212913363e-05, 'epoch': 1.26} +2025-02-05 19:36:31 - ERROR - stderr - 42%|████▏ | 9432/22434 [9:28:50<9:19:21, 2.58s/it] +2025-02-05 19:36:33 - ERROR - stderr - 42%|████▏ | 9433/22434 [9:28:53<9:15:53, 2.57s/it] +2025-02-05 19:36:33 - ERROR - stderr - +2025-02-05 19:36:33 - ERROR - stderr - +2025-02-05 19:36:33 - INFO - stdout - {'loss': 0.6116, 'grad_norm': 1.0971862077713013, 'learning_rate': 1.3014553660387112e-05, 'epoch': 1.26} +2025-02-05 19:36:33 - ERROR - stderr - 42%|████▏ | 9433/22434 [9:28:53<9:15:53, 2.57s/it] +2025-02-05 19:36:36 - ERROR - stderr - 42%|████▏ | 9434/22434 [9:28:55<9:12:56, 2.55s/it] +2025-02-05 19:36:36 - ERROR - stderr - +2025-02-05 19:36:36 - ERROR - stderr - +2025-02-05 19:36:36 - INFO - stdout - {'loss': 0.8033, 'grad_norm': 1.2852641344070435, 'learning_rate': 1.3013177045025374e-05, 'epoch': 1.26} +2025-02-05 19:36:36 - ERROR - stderr - 42%|████▏ | 9434/22434 [9:28:55<9:12:56, 2.55s/it] +2025-02-05 19:36:38 - ERROR - stderr - 42%|████▏ | 9435/22434 [9:28:58<9:09:08, 2.53s/it] +2025-02-05 19:36:38 - ERROR - stderr - +2025-02-05 19:36:38 - ERROR - stderr - +2025-02-05 19:36:38 - INFO - stdout - {'loss': 0.76, 'grad_norm': 1.2458208799362183, 'learning_rate': 1.3011800366856839e-05, 'epoch': 1.26} +2025-02-05 19:36:38 - ERROR - stderr - 42%|████▏ | 9435/22434 [9:28:58<9:09:08, 2.53s/it] +2025-02-05 19:36:41 - ERROR - stderr - 42%|████▏ | 9436/22434 [9:29:00<9:05:14, 2.52s/it] +2025-02-05 19:36:41 - ERROR - stderr - +2025-02-05 19:36:41 - ERROR - stderr - +2025-02-05 19:36:41 - INFO - stdout - {'loss': 0.6344, 'grad_norm': 1.1096546649932861, 'learning_rate': 1.3010423625910214e-05, 'epoch': 1.26} +2025-02-05 19:36:41 - ERROR - stderr - 42%|████▏ | 9436/22434 [9:29:00<9:05:14, 2.52s/it] +2025-02-05 19:36:43 - ERROR - stderr - 42%|████▏ | 9437/22434 [9:29:03<9:07:06, 2.53s/it] +2025-02-05 19:36:43 - ERROR - stderr - +2025-02-05 19:36:43 - ERROR - stderr - +2025-02-05 19:36:43 - INFO - 
stdout - {'loss': 0.7995, 'grad_norm': 1.2686545848846436, 'learning_rate': 1.3009046822214183e-05, 'epoch': 1.26} +2025-02-05 19:36:43 - ERROR - stderr - 42%|████▏ | 9437/22434 [9:29:03<9:07:06, 2.53s/it] +2025-02-05 19:36:46 - ERROR - stderr - 42%|████▏ | 9438/22434 [9:29:05<9:02:24, 2.50s/it] +2025-02-05 19:36:46 - ERROR - stderr - +2025-02-05 19:36:46 - ERROR - stderr - +2025-02-05 19:36:46 - INFO - stdout - {'loss': 0.7495, 'grad_norm': 1.2088152170181274, 'learning_rate': 1.3007669955797452e-05, 'epoch': 1.26} +2025-02-05 19:36:46 - ERROR - stderr - 42%|████▏ | 9438/22434 [9:29:05<9:02:24, 2.50s/it] +2025-02-05 19:36:48 - ERROR - stderr - 42%|████▏ | 9439/22434 [9:29:08<8:56:48, 2.48s/it] +2025-02-05 19:36:48 - ERROR - stderr - +2025-02-05 19:36:48 - ERROR - stderr - +2025-02-05 19:36:48 - INFO - stdout - {'loss': 0.7417, 'grad_norm': 1.2632966041564941, 'learning_rate': 1.3006293026688721e-05, 'epoch': 1.26} +2025-02-05 19:36:48 - ERROR - stderr - 42%|████▏ | 9439/22434 [9:29:08<8:56:48, 2.48s/it] +2025-02-05 19:36:51 - ERROR - stderr - 42%|████▏ | 9440/22434 [9:29:10<8:56:47, 2.48s/it] +2025-02-05 19:36:51 - ERROR - stderr - +2025-02-05 19:36:51 - ERROR - stderr - +2025-02-05 19:36:51 - INFO - stdout - {'loss': 0.7325, 'grad_norm': 1.1630315780639648, 'learning_rate': 1.300491603491669e-05, 'epoch': 1.26} +2025-02-05 19:36:51 - ERROR - stderr - 42%|████▏ | 9440/22434 [9:29:10<8:56:47, 2.48s/it] +2025-02-05 19:36:53 - ERROR - stderr - 42%|████▏ | 9441/22434 [9:29:13<8:57:28, 2.48s/it] +2025-02-05 19:36:53 - ERROR - stderr - +2025-02-05 19:36:53 - ERROR - stderr - +2025-02-05 19:36:53 - INFO - stdout - {'loss': 0.6239, 'grad_norm': 1.141762614250183, 'learning_rate': 1.3003538980510058e-05, 'epoch': 1.26} +2025-02-05 19:36:53 - ERROR - stderr - 42%|████▏ | 9441/22434 [9:29:13<8:57:28, 2.48s/it] +2025-02-05 19:36:56 - ERROR - stderr - 42%|████▏ | 9442/22434 [9:29:15<8:56:38, 2.48s/it] +2025-02-05 19:36:56 - ERROR - stderr - +2025-02-05 19:36:56 - ERROR - 
stderr - +2025-02-05 19:36:56 - INFO - stdout - {'loss': 0.8626, 'grad_norm': 1.2121704816818237, 'learning_rate': 1.3002161863497529e-05, 'epoch': 1.26} +2025-02-05 19:36:56 - ERROR - stderr - 42%|████▏ | 9442/22434 [9:29:15<8:56:38, 2.48s/it] +2025-02-05 19:36:58 - ERROR - stderr - 42%|████▏ | 9443/22434 [9:29:18<9:03:24, 2.51s/it] +2025-02-05 19:36:58 - ERROR - stderr - +2025-02-05 19:36:58 - ERROR - stderr - +2025-02-05 19:36:58 - INFO - stdout - {'loss': 0.7921, 'grad_norm': 1.3186142444610596, 'learning_rate': 1.300078468390781e-05, 'epoch': 1.26} +2025-02-05 19:36:58 - ERROR - stderr - 42%|████▏ | 9443/22434 [9:29:18<9:03:24, 2.51s/it] +2025-02-05 19:37:01 - ERROR - stderr - 42%|████▏ | 9444/22434 [9:29:20<9:03:25, 2.51s/it] +2025-02-05 19:37:01 - ERROR - stderr - +2025-02-05 19:37:01 - ERROR - stderr - +2025-02-05 19:37:01 - INFO - stdout - {'loss': 0.7051, 'grad_norm': 1.0591927766799927, 'learning_rate': 1.2999407441769602e-05, 'epoch': 1.26} +2025-02-05 19:37:01 - ERROR - stderr - 42%|████▏ | 9444/22434 [9:29:20<9:03:25, 2.51s/it] +2025-02-05 19:37:03 - ERROR - stderr - 42%|████▏ | 9445/22434 [9:29:23<8:58:23, 2.49s/it] +2025-02-05 19:37:03 - ERROR - stderr - +2025-02-05 19:37:03 - ERROR - stderr - +2025-02-05 19:37:03 - INFO - stdout - {'loss': 0.7587, 'grad_norm': 1.2458487749099731, 'learning_rate': 1.2998030137111619e-05, 'epoch': 1.26} +2025-02-05 19:37:03 - ERROR - stderr - 42%|████▏ | 9445/22434 [9:29:23<8:58:23, 2.49s/it] +2025-02-05 19:37:06 - ERROR - stderr - 42%|████▏ | 9446/22434 [9:29:25<9:00:38, 2.50s/it] +2025-02-05 19:37:06 - ERROR - stderr - +2025-02-05 19:37:06 - ERROR - stderr - +2025-02-05 19:37:06 - INFO - stdout - {'loss': 0.7183, 'grad_norm': 1.0992887020111084, 'learning_rate': 1.2996652769962567e-05, 'epoch': 1.26} +2025-02-05 19:37:06 - ERROR - stderr - 42%|████▏ | 9446/22434 [9:29:25<9:00:38, 2.50s/it] +2025-02-05 19:37:08 - ERROR - stderr - 42%|████▏ | 9447/22434 [9:29:28<9:05:28, 2.52s/it] +2025-02-05 19:37:08 - ERROR - 
stderr - +2025-02-05 19:37:08 - ERROR - stderr - +2025-02-05 19:37:08 - INFO - stdout - {'loss': 0.6569, 'grad_norm': 1.2170807123184204, 'learning_rate': 1.2995275340351154e-05, 'epoch': 1.26} +2025-02-05 19:37:08 - ERROR - stderr - 42%|████▏ | 9447/22434 [9:29:28<9:05:28, 2.52s/it] +2025-02-05 19:37:11 - ERROR - stderr - 42%|████▏ | 9448/22434 [9:29:30<9:09:42, 2.54s/it] +2025-02-05 19:37:11 - ERROR - stderr - +2025-02-05 19:37:11 - ERROR - stderr - +2025-02-05 19:37:11 - INFO - stdout - {'loss': 0.7209, 'grad_norm': 1.279603362083435, 'learning_rate': 1.2993897848306097e-05, 'epoch': 1.26} +2025-02-05 19:37:11 - ERROR - stderr - 42%|████▏ | 9448/22434 [9:29:31<9:09:42, 2.54s/it] +2025-02-05 19:37:13 - ERROR - stderr - 42%|████▏ | 9449/22434 [9:29:33<9:12:22, 2.55s/it] +2025-02-05 19:37:13 - ERROR - stderr - +2025-02-05 19:37:13 - ERROR - stderr - +2025-02-05 19:37:13 - INFO - stdout - {'loss': 0.7441, 'grad_norm': 1.243025541305542, 'learning_rate': 1.2992520293856098e-05, 'epoch': 1.26} +2025-02-05 19:37:13 - ERROR - stderr - 42%|████▏ | 9449/22434 [9:29:33<9:12:22, 2.55s/it] +2025-02-05 19:37:16 - ERROR - stderr - 42%|████▏ | 9450/22434 [9:29:36<9:08:57, 2.54s/it] +2025-02-05 19:37:16 - ERROR - stderr - +2025-02-05 19:37:16 - ERROR - stderr - +2025-02-05 19:37:16 - INFO - stdout - {'loss': 0.7127, 'grad_norm': 1.14859938621521, 'learning_rate': 1.299114267702988e-05, 'epoch': 1.26} +2025-02-05 19:37:16 - ERROR - stderr - 42%|████▏ | 9450/22434 [9:29:36<9:08:57, 2.54s/it] +2025-02-05 19:37:18 - ERROR - stderr - 42%|████▏ | 9451/22434 [9:29:38<9:05:17, 2.52s/it] +2025-02-05 19:37:18 - ERROR - stderr - +2025-02-05 19:37:18 - ERROR - stderr - +2025-02-05 19:37:18 - INFO - stdout - {'loss': 0.6994, 'grad_norm': 1.1952840089797974, 'learning_rate': 1.2989764997856154e-05, 'epoch': 1.26} +2025-02-05 19:37:18 - ERROR - stderr - 42%|████▏ | 9451/22434 [9:29:38<9:05:17, 2.52s/it] +2025-02-05 19:37:21 - ERROR - stderr - 42%|████▏ | 9452/22434 [9:29:41<9:06:09, 2.52s/it] 
+2025-02-05 19:37:21 - ERROR - stderr - +2025-02-05 19:37:21 - ERROR - stderr - +2025-02-05 19:37:21 - INFO - stdout - {'loss': 0.746, 'grad_norm': 1.1644119024276733, 'learning_rate': 1.298838725636364e-05, 'epoch': 1.26} +2025-02-05 19:37:21 - ERROR - stderr - 42%|████▏ | 9452/22434 [9:29:41<9:06:09, 2.52s/it] +2025-02-05 19:37:23 - ERROR - stderr - 42%|████▏ | 9453/22434 [9:29:43<9:02:32, 2.51s/it] +2025-02-05 19:37:23 - ERROR - stderr - +2025-02-05 19:37:23 - ERROR - stderr - +2025-02-05 19:37:23 - INFO - stdout - {'loss': 0.716, 'grad_norm': 1.1415284872055054, 'learning_rate': 1.2987009452581051e-05, 'epoch': 1.26} +2025-02-05 19:37:23 - ERROR - stderr - 42%|████▏ | 9453/22434 [9:29:43<9:02:32, 2.51s/it] +2025-02-05 19:37:26 - ERROR - stderr - 42%|████▏ | 9454/22434 [9:29:46<9:02:45, 2.51s/it] +2025-02-05 19:37:26 - ERROR - stderr - +2025-02-05 19:37:26 - ERROR - stderr - +2025-02-05 19:37:26 - INFO - stdout - {'loss': 0.7858, 'grad_norm': 1.2003577947616577, 'learning_rate': 1.2985631586537109e-05, 'epoch': 1.26} +2025-02-05 19:37:26 - ERROR - stderr - 42%|████▏ | 9454/22434 [9:29:46<9:02:45, 2.51s/it] +2025-02-05 19:37:28 - ERROR - stderr - 42%|████▏ | 9455/22434 [9:29:48<9:05:31, 2.52s/it] +2025-02-05 19:37:28 - ERROR - stderr - +2025-02-05 19:37:28 - ERROR - stderr - +2025-02-05 19:37:28 - INFO - stdout - {'loss': 0.6908, 'grad_norm': 1.0465545654296875, 'learning_rate': 1.2984253658260534e-05, 'epoch': 1.26} +2025-02-05 19:37:28 - ERROR - stderr - 42%|████▏ | 9455/22434 [9:29:48<9:05:31, 2.52s/it] +2025-02-05 19:37:31 - ERROR - stderr - 42%|████▏ | 9456/22434 [9:29:51<9:05:59, 2.52s/it] +2025-02-05 19:37:31 - ERROR - stderr - +2025-02-05 19:37:31 - ERROR - stderr - +2025-02-05 19:37:31 - INFO - stdout - {'loss': 0.613, 'grad_norm': 0.940658450126648, 'learning_rate': 1.2982875667780046e-05, 'epoch': 1.26} +2025-02-05 19:37:31 - ERROR - stderr - 42%|████▏ | 9456/22434 [9:29:51<9:05:59, 2.52s/it] +2025-02-05 19:37:33 - ERROR - stderr - 42%|████▏ | 
9457/22434 [9:29:53<9:02:39, 2.51s/it] +2025-02-05 19:37:33 - ERROR - stderr - +2025-02-05 19:37:33 - ERROR - stderr - +2025-02-05 19:37:33 - INFO - stdout - {'loss': 0.6837, 'grad_norm': 1.227908968925476, 'learning_rate': 1.2981497615124367e-05, 'epoch': 1.26} +2025-02-05 19:37:33 - ERROR - stderr - 42%|████▏ | 9457/22434 [9:29:53<9:02:39, 2.51s/it] +2025-02-05 19:37:36 - ERROR - stderr - 42%|████▏ | 9458/22434 [9:29:56<9:00:08, 2.50s/it] +2025-02-05 19:37:36 - ERROR - stderr - +2025-02-05 19:37:36 - ERROR - stderr - +2025-02-05 19:37:36 - INFO - stdout - {'loss': 0.6837, 'grad_norm': 1.1605809926986694, 'learning_rate': 1.2980119500322228e-05, 'epoch': 1.26} +2025-02-05 19:37:36 - ERROR - stderr - 42%|████▏ | 9458/22434 [9:29:56<9:00:08, 2.50s/it] +2025-02-05 19:37:38 - ERROR - stderr - 42%|████▏ | 9459/22434 [9:29:58<8:56:54, 2.48s/it] +2025-02-05 19:37:38 - ERROR - stderr - +2025-02-05 19:37:38 - ERROR - stderr - +2025-02-05 19:37:38 - INFO - stdout - {'loss': 0.7774, 'grad_norm': 1.1619807481765747, 'learning_rate': 1.2978741323402347e-05, 'epoch': 1.26} +2025-02-05 19:37:38 - ERROR - stderr - 42%|████▏ | 9459/22434 [9:29:58<8:56:54, 2.48s/it] +2025-02-05 19:37:41 - ERROR - stderr - 42%|████▏ | 9460/22434 [9:30:00<8:53:57, 2.47s/it] +2025-02-05 19:37:41 - ERROR - stderr - +2025-02-05 19:37:41 - ERROR - stderr - +2025-02-05 19:37:41 - INFO - stdout - {'loss': 0.7192, 'grad_norm': 1.2946507930755615, 'learning_rate': 1.2977363084393454e-05, 'epoch': 1.27} +2025-02-05 19:37:41 - ERROR - stderr - 42%|████▏ | 9460/22434 [9:30:01<8:53:57, 2.47s/it] +2025-02-05 19:37:43 - ERROR - stderr - 42%|████▏ | 9461/22434 [9:30:03<8:53:07, 2.47s/it] +2025-02-05 19:37:43 - ERROR - stderr - +2025-02-05 19:37:43 - ERROR - stderr - +2025-02-05 19:37:43 - INFO - stdout - {'loss': 0.7853, 'grad_norm': 1.0990961790084839, 'learning_rate': 1.2975984783324278e-05, 'epoch': 1.27} +2025-02-05 19:37:43 - ERROR - stderr - 42%|████▏ | 9461/22434 [9:30:03<8:53:07, 2.47s/it] +2025-02-05 
19:37:46 - ERROR - stderr - 42%|████▏ | 9462/22434 [9:30:05<8:57:06, 2.48s/it] +2025-02-05 19:37:46 - ERROR - stderr - +2025-02-05 19:37:46 - ERROR - stderr - +2025-02-05 19:37:46 - INFO - stdout - {'loss': 0.7914, 'grad_norm': 1.3292585611343384, 'learning_rate': 1.2974606420223546e-05, 'epoch': 1.27} +2025-02-05 19:37:46 - ERROR - stderr - 42%|████▏ | 9462/22434 [9:30:06<8:57:06, 2.48s/it] +2025-02-05 19:37:48 - ERROR - stderr - 42%|████▏ | 9463/22434 [9:30:08<9:14:05, 2.56s/it] +2025-02-05 19:37:48 - ERROR - stderr - +2025-02-05 19:37:48 - ERROR - stderr - +2025-02-05 19:37:48 - INFO - stdout - {'loss': 0.7069, 'grad_norm': 1.1760728359222412, 'learning_rate': 1.2973227995119985e-05, 'epoch': 1.27} +2025-02-05 19:37:48 - ERROR - stderr - 42%|████▏ | 9463/22434 [9:30:08<9:14:05, 2.56s/it] +2025-02-05 19:37:51 - ERROR - stderr - 42%|████▏ | 9464/22434 [9:30:11<9:07:12, 2.53s/it] +2025-02-05 19:37:51 - ERROR - stderr - +2025-02-05 19:37:51 - ERROR - stderr - +2025-02-05 19:37:51 - INFO - stdout - {'loss': 0.7772, 'grad_norm': 1.3283166885375977, 'learning_rate': 1.2971849508042338e-05, 'epoch': 1.27} +2025-02-05 19:37:51 - ERROR - stderr - 42%|████▏ | 9464/22434 [9:30:11<9:07:12, 2.53s/it] +2025-02-05 19:37:53 - ERROR - stderr - 42%|████▏ | 9465/22434 [9:30:13<9:06:09, 2.53s/it] +2025-02-05 19:37:53 - ERROR - stderr - +2025-02-05 19:37:53 - ERROR - stderr - +2025-02-05 19:37:53 - INFO - stdout - {'loss': 0.7292, 'grad_norm': 1.1950656175613403, 'learning_rate': 1.2970470959019328e-05, 'epoch': 1.27} +2025-02-05 19:37:53 - ERROR - stderr - 42%|████▏ | 9465/22434 [9:30:13<9:06:09, 2.53s/it] +2025-02-05 19:37:56 - ERROR - stderr - 42%|████▏ | 9466/22434 [9:30:16<9:08:50, 2.54s/it] +2025-02-05 19:37:56 - ERROR - stderr - +2025-02-05 19:37:56 - ERROR - stderr - +2025-02-05 19:37:56 - INFO - stdout - {'loss': 0.6858, 'grad_norm': 1.1041823625564575, 'learning_rate': 1.2969092348079695e-05, 'epoch': 1.27} +2025-02-05 19:37:56 - ERROR - stderr - 42%|████▏ | 9466/22434 
[9:30:16<9:08:50, 2.54s/it] +2025-02-05 19:37:59 - ERROR - stderr - 42%|████▏ | 9467/22434 [9:30:18<9:08:38, 2.54s/it] +2025-02-05 19:37:59 - ERROR - stderr - +2025-02-05 19:37:59 - ERROR - stderr - +2025-02-05 19:37:59 - INFO - stdout - {'loss': 0.7002, 'grad_norm': 1.1594486236572266, 'learning_rate': 1.2967713675252172e-05, 'epoch': 1.27} +2025-02-05 19:37:59 - ERROR - stderr - 42%|████▏ | 9467/22434 [9:30:18<9:08:38, 2.54s/it] +2025-02-05 19:38:01 - ERROR - stderr - 42%|████▏ | 9468/22434 [9:30:21<9:11:54, 2.55s/it] +2025-02-05 19:38:01 - ERROR - stderr - +2025-02-05 19:38:01 - ERROR - stderr - +2025-02-05 19:38:01 - INFO - stdout - {'loss': 0.7321, 'grad_norm': 1.1467301845550537, 'learning_rate': 1.29663349405655e-05, 'epoch': 1.27} +2025-02-05 19:38:01 - ERROR - stderr - 42%|████▏ | 9468/22434 [9:30:21<9:11:54, 2.55s/it] +2025-02-05 19:38:04 - ERROR - stderr - 42%|████▏ | 9469/22434 [9:30:23<9:08:18, 2.54s/it] +2025-02-05 19:38:04 - ERROR - stderr - +2025-02-05 19:38:04 - ERROR - stderr - +2025-02-05 19:38:04 - INFO - stdout - {'loss': 0.7266, 'grad_norm': 1.1792594194412231, 'learning_rate': 1.2964956144048408e-05, 'epoch': 1.27} +2025-02-05 19:38:04 - ERROR - stderr - 42%|████▏ | 9469/22434 [9:30:23<9:08:18, 2.54s/it] +2025-02-05 19:38:06 - ERROR - stderr - 42%|████▏ | 9470/22434 [9:30:26<9:05:17, 2.52s/it] +2025-02-05 19:38:06 - ERROR - stderr - +2025-02-05 19:38:06 - ERROR - stderr - +2025-02-05 19:38:06 - INFO - stdout - {'loss': 0.691, 'grad_norm': 1.096909999847412, 'learning_rate': 1.2963577285729647e-05, 'epoch': 1.27} +2025-02-05 19:38:06 - ERROR - stderr - 42%|████▏ | 9470/22434 [9:30:26<9:05:17, 2.52s/it] +2025-02-05 19:38:09 - ERROR - stderr - 42%|████▏ | 9471/22434 [9:30:28<9:02:49, 2.51s/it] +2025-02-05 19:38:09 - ERROR - stderr - +2025-02-05 19:38:09 - ERROR - stderr - +2025-02-05 19:38:09 - INFO - stdout - {'loss': 0.7072, 'grad_norm': 1.116920828819275, 'learning_rate': 1.2962198365637954e-05, 'epoch': 1.27} +2025-02-05 19:38:09 - ERROR - 
stderr - 42%|████▏ | 9471/22434 [9:30:28<9:02:49, 2.51s/it] +2025-02-05 19:38:11 - ERROR - stderr - 42%|████▏ | 9472/22434 [9:30:31<9:04:39, 2.52s/it] +2025-02-05 19:38:11 - ERROR - stderr - +2025-02-05 19:38:11 - ERROR - stderr - +2025-02-05 19:38:11 - INFO - stdout - {'loss': 0.6529, 'grad_norm': 1.0858980417251587, 'learning_rate': 1.296081938380207e-05, 'epoch': 1.27} +2025-02-05 19:38:11 - ERROR - stderr - 42%|████▏ | 9472/22434 [9:30:31<9:04:39, 2.52s/it] +2025-02-05 19:38:14 - ERROR - stderr - 42%|████▏ | 9473/22434 [9:30:33<9:05:44, 2.53s/it] +2025-02-05 19:38:14 - ERROR - stderr - +2025-02-05 19:38:14 - ERROR - stderr - +2025-02-05 19:38:14 - INFO - stdout - {'loss': 0.6703, 'grad_norm': 1.162699818611145, 'learning_rate': 1.2959440340250739e-05, 'epoch': 1.27} +2025-02-05 19:38:14 - ERROR - stderr - 42%|████▏ | 9473/22434 [9:30:33<9:05:44, 2.53s/it] +2025-02-05 19:38:16 - ERROR - stderr - 42%|████▏ | 9474/22434 [9:30:36<9:04:59, 2.52s/it] +2025-02-05 19:38:16 - ERROR - stderr - +2025-02-05 19:38:16 - ERROR - stderr - +2025-02-05 19:38:16 - INFO - stdout - {'loss': 0.7424, 'grad_norm': 1.2131328582763672, 'learning_rate': 1.2958061235012707e-05, 'epoch': 1.27} +2025-02-05 19:38:16 - ERROR - stderr - 42%|████▏ | 9474/22434 [9:30:36<9:04:59, 2.52s/it] +2025-02-05 19:38:19 - ERROR - stderr - 42%|████▏ | 9475/22434 [9:30:38<9:06:28, 2.53s/it] +2025-02-05 19:38:19 - ERROR - stderr - +2025-02-05 19:38:19 - ERROR - stderr - +2025-02-05 19:38:19 - INFO - stdout - {'loss': 0.6202, 'grad_norm': 1.0403214693069458, 'learning_rate': 1.2956682068116717e-05, 'epoch': 1.27} +2025-02-05 19:38:19 - ERROR - stderr - 42%|████▏ | 9475/22434 [9:30:39<9:06:28, 2.53s/it] +2025-02-05 19:38:21 - ERROR - stderr - 42%|████▏ | 9476/22434 [9:30:41<9:07:12, 2.53s/it] +2025-02-05 19:38:21 - ERROR - stderr - +2025-02-05 19:38:21 - ERROR - stderr - +2025-02-05 19:38:21 - INFO - stdout - {'loss': 0.6178, 'grad_norm': 1.0294089317321777, 'learning_rate': 1.2955302839591519e-05, 'epoch': 
1.27} +2025-02-05 19:38:21 - ERROR - stderr - 42%|████▏ | 9476/22434 [9:30:41<9:07:12, 2.53s/it] +2025-02-05 19:38:24 - ERROR - stderr - 42%|████▏ | 9477/22434 [9:30:44<9:05:42, 2.53s/it] +2025-02-05 19:38:24 - ERROR - stderr - +2025-02-05 19:38:24 - ERROR - stderr - +2025-02-05 19:38:24 - INFO - stdout - {'loss': 0.6804, 'grad_norm': 1.049277663230896, 'learning_rate': 1.2953923549465861e-05, 'epoch': 1.27} +2025-02-05 19:38:24 - ERROR - stderr - 42%|████▏ | 9477/22434 [9:30:44<9:05:42, 2.53s/it] +2025-02-05 19:38:26 - ERROR - stderr - 42%|████▏ | 9478/22434 [9:30:46<9:00:30, 2.50s/it] +2025-02-05 19:38:26 - ERROR - stderr - +2025-02-05 19:38:26 - ERROR - stderr - +2025-02-05 19:38:26 - INFO - stdout - {'loss': 0.7672, 'grad_norm': 1.1760743856430054, 'learning_rate': 1.2952544197768494e-05, 'epoch': 1.27} +2025-02-05 19:38:26 - ERROR - stderr - 42%|████▏ | 9478/22434 [9:30:46<9:00:30, 2.50s/it] +2025-02-05 19:38:29 - ERROR - stderr - 42%|████▏ | 9479/22434 [9:30:48<8:59:33, 2.50s/it] +2025-02-05 19:38:29 - ERROR - stderr - +2025-02-05 19:38:29 - ERROR - stderr - +2025-02-05 19:38:29 - INFO - stdout - {'loss': 0.6307, 'grad_norm': 1.0840741395950317, 'learning_rate': 1.2951164784528167e-05, 'epoch': 1.27} +2025-02-05 19:38:29 - ERROR - stderr - 42%|████▏ | 9479/22434 [9:30:49<8:59:33, 2.50s/it] +2025-02-05 19:38:31 - ERROR - stderr - 42%|████▏ | 9480/22434 [9:30:51<8:55:27, 2.48s/it] +2025-02-05 19:38:31 - ERROR - stderr - +2025-02-05 19:38:31 - ERROR - stderr - +2025-02-05 19:38:31 - INFO - stdout - {'loss': 0.7465, 'grad_norm': 1.1807917356491089, 'learning_rate': 1.2949785309773638e-05, 'epoch': 1.27} +2025-02-05 19:38:31 - ERROR - stderr - 42%|████▏ | 9480/22434 [9:30:51<8:55:27, 2.48s/it] +2025-02-05 19:38:34 - ERROR - stderr - 42%|████▏ | 9481/22434 [9:30:53<8:56:39, 2.49s/it] +2025-02-05 19:38:34 - ERROR - stderr - +2025-02-05 19:38:34 - ERROR - stderr - +2025-02-05 19:38:34 - INFO - stdout - {'loss': 0.7694, 'grad_norm': 1.3061667680740356, 
'learning_rate': 1.2948405773533654e-05, 'epoch': 1.27} +2025-02-05 19:38:34 - ERROR - stderr - 42%|████▏ | 9481/22434 [9:30:53<8:56:39, 2.49s/it] +2025-02-05 19:38:36 - ERROR - stderr - 42%|████▏ | 9482/22434 [9:30:56<9:19:10, 2.59s/it] +2025-02-05 19:38:37 - ERROR - stderr - +2025-02-05 19:38:37 - ERROR - stderr - +2025-02-05 19:38:37 - INFO - stdout - {'loss': 0.6318, 'grad_norm': 1.0992333889007568, 'learning_rate': 1.2947026175836972e-05, 'epoch': 1.27} +2025-02-05 19:38:37 - ERROR - stderr - 42%|████▏ | 9482/22434 [9:30:56<9:19:10, 2.59s/it] +2025-02-05 19:38:39 - ERROR - stderr - 42%|████▏ | 9483/22434 [9:30:59<9:13:20, 2.56s/it] +2025-02-05 19:38:39 - ERROR - stderr - +2025-02-05 19:38:39 - ERROR - stderr - +2025-02-05 19:38:39 - INFO - stdout - {'loss': 0.6638, 'grad_norm': 1.0805842876434326, 'learning_rate': 1.2945646516712349e-05, 'epoch': 1.27} +2025-02-05 19:38:39 - ERROR - stderr - 42%|████▏ | 9483/22434 [9:30:59<9:13:20, 2.56s/it] +2025-02-05 19:38:41 - ERROR - stderr - 42%|████▏ | 9484/22434 [9:31:01<9:03:50, 2.52s/it] +2025-02-05 19:38:41 - ERROR - stderr - +2025-02-05 19:38:41 - ERROR - stderr - +2025-02-05 19:38:41 - INFO - stdout - {'loss': 0.765, 'grad_norm': 1.3306224346160889, 'learning_rate': 1.2944266796188547e-05, 'epoch': 1.27} +2025-02-05 19:38:41 - ERROR - stderr - 42%|████▏ | 9484/22434 [9:31:01<9:03:50, 2.52s/it] +2025-02-05 19:38:44 - ERROR - stderr - 42%|████▏ | 9485/22434 [9:31:04<8:57:23, 2.49s/it] +2025-02-05 19:38:44 - ERROR - stderr - +2025-02-05 19:38:44 - ERROR - stderr - +2025-02-05 19:38:44 - INFO - stdout - {'loss': 0.6429, 'grad_norm': 1.0997247695922852, 'learning_rate': 1.2942887014294318e-05, 'epoch': 1.27} +2025-02-05 19:38:44 - ERROR - stderr - 42%|████▏ | 9485/22434 [9:31:04<8:57:23, 2.49s/it] +2025-02-05 19:38:46 - ERROR - stderr - 42%|████▏ | 9486/22434 [9:31:06<8:56:24, 2.49s/it] +2025-02-05 19:38:46 - ERROR - stderr - +2025-02-05 19:38:46 - ERROR - stderr - +2025-02-05 19:38:46 - INFO - stdout - {'loss': 
0.6815, 'grad_norm': 1.0569506883621216, 'learning_rate': 1.2941507171058424e-05, 'epoch': 1.27} +2025-02-05 19:38:46 - ERROR - stderr - 42%|████▏ | 9486/22434 [9:31:06<8:56:24, 2.49s/it] +2025-02-05 19:38:49 - ERROR - stderr - 42%|████▏ | 9487/22434 [9:31:08<8:52:28, 2.47s/it] +2025-02-05 19:38:49 - ERROR - stderr - +2025-02-05 19:38:49 - ERROR - stderr - +2025-02-05 19:38:49 - INFO - stdout - {'loss': 0.6801, 'grad_norm': 1.0984492301940918, 'learning_rate': 1.294012726650963e-05, 'epoch': 1.27} +2025-02-05 19:38:49 - ERROR - stderr - 42%|████▏ | 9487/22434 [9:31:09<8:52:28, 2.47s/it] +2025-02-05 19:38:51 - ERROR - stderr - 42%|████▏ | 9488/22434 [9:31:11<8:52:49, 2.47s/it] +2025-02-05 19:38:51 - ERROR - stderr - +2025-02-05 19:38:51 - ERROR - stderr - +2025-02-05 19:38:51 - INFO - stdout - {'loss': 0.6958, 'grad_norm': 1.1420230865478516, 'learning_rate': 1.2938747300676697e-05, 'epoch': 1.27} +2025-02-05 19:38:51 - ERROR - stderr - 42%|████▏ | 9488/22434 [9:31:11<8:52:49, 2.47s/it] +2025-02-05 19:38:54 - ERROR - stderr - 42%|████▏ | 9489/22434 [9:31:13<8:52:07, 2.47s/it] +2025-02-05 19:38:54 - ERROR - stderr - +2025-02-05 19:38:54 - ERROR - stderr - +2025-02-05 19:38:54 - INFO - stdout - {'loss': 0.7575, 'grad_norm': 1.1520344018936157, 'learning_rate': 1.2937367273588387e-05, 'epoch': 1.27} +2025-02-05 19:38:54 - ERROR - stderr - 42%|████▏ | 9489/22434 [9:31:13<8:52:07, 2.47s/it] +2025-02-05 19:38:56 - ERROR - stderr - 42%|████▏ | 9490/22434 [9:31:16<8:50:58, 2.46s/it] +2025-02-05 19:38:56 - ERROR - stderr - +2025-02-05 19:38:56 - ERROR - stderr - +2025-02-05 19:38:56 - INFO - stdout - {'loss': 0.6952, 'grad_norm': 1.2222524881362915, 'learning_rate': 1.2935987185273467e-05, 'epoch': 1.27} +2025-02-05 19:38:56 - ERROR - stderr - 42%|████▏ | 9490/22434 [9:31:16<8:50:58, 2.46s/it] +2025-02-05 19:38:59 - ERROR - stderr - 42%|████▏ | 9491/22434 [9:31:18<8:50:11, 2.46s/it] +2025-02-05 19:38:59 - ERROR - stderr - +2025-02-05 19:38:59 - ERROR - stderr - +2025-02-05 
19:38:59 - INFO - stdout - {'loss': 0.6638, 'grad_norm': 1.303871989250183, 'learning_rate': 1.2934607035760705e-05, 'epoch': 1.27} +2025-02-05 19:38:59 - ERROR - stderr - 42%|████▏ | 9491/22434 [9:31:18<8:50:11, 2.46s/it] +2025-02-05 19:39:01 - ERROR - stderr - 42%|████▏ | 9492/22434 [9:31:21<8:50:12, 2.46s/it] +2025-02-05 19:39:01 - ERROR - stderr - +2025-02-05 19:39:01 - ERROR - stderr - +2025-02-05 19:39:01 - INFO - stdout - {'loss': 0.7282, 'grad_norm': 1.1603704690933228, 'learning_rate': 1.2933226825078866e-05, 'epoch': 1.27} +2025-02-05 19:39:01 - ERROR - stderr - 42%|████▏ | 9492/22434 [9:31:21<8:50:12, 2.46s/it] +2025-02-05 19:39:04 - ERROR - stderr - 42%|████▏ | 9493/22434 [9:31:23<8:59:57, 2.50s/it] +2025-02-05 19:39:04 - ERROR - stderr - +2025-02-05 19:39:04 - ERROR - stderr - +2025-02-05 19:39:04 - INFO - stdout - {'loss': 0.8046, 'grad_norm': 1.253767490386963, 'learning_rate': 1.2931846553256721e-05, 'epoch': 1.27} +2025-02-05 19:39:04 - ERROR - stderr - 42%|████▏ | 9493/22434 [9:31:23<8:59:57, 2.50s/it] +2025-02-05 19:39:06 - ERROR - stderr - 42%|████▏ | 9494/22434 [9:31:26<8:59:36, 2.50s/it] +2025-02-05 19:39:06 - ERROR - stderr - +2025-02-05 19:39:06 - ERROR - stderr - +2025-02-05 19:39:06 - INFO - stdout - {'loss': 0.8114, 'grad_norm': 1.229962706565857, 'learning_rate': 1.293046622032304e-05, 'epoch': 1.27} +2025-02-05 19:39:06 - ERROR - stderr - 42%|████▏ | 9494/22434 [9:31:26<8:59:36, 2.50s/it] +2025-02-05 19:39:09 - ERROR - stderr - 42%|████▏ | 9495/22434 [9:31:28<9:02:40, 2.52s/it] +2025-02-05 19:39:09 - ERROR - stderr - +2025-02-05 19:39:09 - ERROR - stderr - +2025-02-05 19:39:09 - INFO - stdout - {'loss': 0.8298, 'grad_norm': 1.3764369487762451, 'learning_rate': 1.2929085826306595e-05, 'epoch': 1.27} +2025-02-05 19:39:09 - ERROR - stderr - 42%|████▏ | 9495/22434 [9:31:28<9:02:40, 2.52s/it] +2025-02-05 19:39:11 - ERROR - stderr - 42%|████▏ | 9496/22434 [9:31:31<9:04:22, 2.52s/it] +2025-02-05 19:39:11 - ERROR - stderr - +2025-02-05 19:39:11 
- ERROR - stderr - +2025-02-05 19:39:11 - INFO - stdout - {'loss': 0.7739, 'grad_norm': 1.35489821434021, 'learning_rate': 1.2927705371236159e-05, 'epoch': 1.27} +2025-02-05 19:39:11 - ERROR - stderr - 42%|████▏ | 9496/22434 [9:31:31<9:04:22, 2.52s/it] +2025-02-05 19:39:14 - ERROR - stderr - 42%|████▏ | 9497/22434 [9:31:33<9:00:39, 2.51s/it] +2025-02-05 19:39:14 - ERROR - stderr - +2025-02-05 19:39:14 - ERROR - stderr - +2025-02-05 19:39:14 - INFO - stdout - {'loss': 0.6751, 'grad_norm': 1.1609470844268799, 'learning_rate': 1.2926324855140507e-05, 'epoch': 1.27} +2025-02-05 19:39:14 - ERROR - stderr - 42%|████▏ | 9497/22434 [9:31:34<9:00:39, 2.51s/it] +2025-02-05 19:39:16 - ERROR - stderr - 42%|████▏ | 9498/22434 [9:31:36<9:00:33, 2.51s/it] +2025-02-05 19:39:16 - ERROR - stderr - +2025-02-05 19:39:16 - ERROR - stderr - +2025-02-05 19:39:16 - INFO - stdout - {'loss': 0.691, 'grad_norm': 1.0934573411941528, 'learning_rate': 1.2924944278048412e-05, 'epoch': 1.27} +2025-02-05 19:39:16 - ERROR - stderr - 42%|████▏ | 9498/22434 [9:31:36<9:00:33, 2.51s/it] +2025-02-05 19:39:19 - ERROR - stderr - 42%|████▏ | 9499/22434 [9:31:39<9:04:05, 2.52s/it] +2025-02-05 19:39:19 - ERROR - stderr - +2025-02-05 19:39:19 - ERROR - stderr - +2025-02-05 19:39:19 - INFO - stdout - {'loss': 0.7398, 'grad_norm': 1.052471399307251, 'learning_rate': 1.2923563639988652e-05, 'epoch': 1.27} +2025-02-05 19:39:19 - ERROR - stderr - 42%|████▏ | 9499/22434 [9:31:39<9:04:05, 2.52s/it] +2025-02-05 19:39:21 - ERROR - stderr - 42%|████▏ | 9500/22434 [9:31:41<9:06:19, 2.53s/it] +2025-02-05 19:39:21 - ERROR - stderr - +2025-02-05 19:39:21 - ERROR - stderr - +2025-02-05 19:39:21 - INFO - stdout - {'loss': 0.6857, 'grad_norm': 1.1274904012680054, 'learning_rate': 1.292218294099001e-05, 'epoch': 1.27} +2025-02-05 19:39:21 - ERROR - stderr - 42%|████▏ | 9500/22434 [9:31:41<9:06:19, 2.53s/it] +2025-02-05 19:39:24 - ERROR - stderr - 42%|████▏ | 9501/22434 [9:31:44<8:59:52, 2.50s/it] +2025-02-05 19:39:24 - ERROR - 
stderr - +2025-02-05 19:39:24 - ERROR - stderr - +2025-02-05 19:39:24 - INFO - stdout - {'loss': 0.7205, 'grad_norm': 1.258703351020813, 'learning_rate': 1.2920802181081254e-05, 'epoch': 1.27} +2025-02-05 19:39:24 - ERROR - stderr - 42%|████▏ | 9501/22434 [9:31:44<8:59:52, 2.50s/it] +2025-02-05 19:39:26 - ERROR - stderr - 42%|████▏ | 9502/22434 [9:31:46<8:58:53, 2.50s/it] +2025-02-05 19:39:26 - ERROR - stderr - +2025-02-05 19:39:26 - ERROR - stderr - +2025-02-05 19:39:26 - INFO - stdout - {'loss': 0.622, 'grad_norm': 1.197588562965393, 'learning_rate': 1.2919421360291173e-05, 'epoch': 1.27} +2025-02-05 19:39:26 - ERROR - stderr - 42%|████▏ | 9502/22434 [9:31:46<8:58:53, 2.50s/it] +2025-02-05 19:39:29 - ERROR - stderr - 42%|████▏ | 9503/22434 [9:31:49<8:58:40, 2.50s/it] +2025-02-05 19:39:29 - ERROR - stderr - +2025-02-05 19:39:29 - ERROR - stderr - +2025-02-05 19:39:29 - INFO - stdout - {'loss': 0.729, 'grad_norm': 1.1211081743240356, 'learning_rate': 1.2918040478648549e-05, 'epoch': 1.27} +2025-02-05 19:39:29 - ERROR - stderr - 42%|████▏ | 9503/22434 [9:31:49<8:58:40, 2.50s/it] +2025-02-05 19:39:31 - ERROR - stderr - 42%|████▏ | 9504/22434 [9:31:51<9:02:00, 2.52s/it] +2025-02-05 19:39:31 - ERROR - stderr - +2025-02-05 19:39:31 - ERROR - stderr - +2025-02-05 19:39:31 - INFO - stdout - {'loss': 0.7598, 'grad_norm': 1.2332813739776611, 'learning_rate': 1.2916659536182166e-05, 'epoch': 1.27} +2025-02-05 19:39:31 - ERROR - stderr - 42%|████▏ | 9504/22434 [9:31:51<9:02:00, 2.52s/it] +2025-02-05 19:39:34 - ERROR - stderr - 42%|████▏ | 9505/22434 [9:31:54<8:59:27, 2.50s/it] +2025-02-05 19:39:34 - ERROR - stderr - +2025-02-05 19:39:34 - ERROR - stderr - +2025-02-05 19:39:34 - INFO - stdout - {'loss': 0.6957, 'grad_norm': 1.0994535684585571, 'learning_rate': 1.2915278532920802e-05, 'epoch': 1.27} +2025-02-05 19:39:34 - ERROR - stderr - 42%|████▏ | 9505/22434 [9:31:54<8:59:27, 2.50s/it] +2025-02-05 19:39:36 - ERROR - stderr - 42%|████▏ | 9506/22434 [9:31:56<8:56:42, 2.49s/it] 
+2025-02-05 19:39:36 - ERROR - stderr - +2025-02-05 19:39:36 - ERROR - stderr - +2025-02-05 19:39:36 - INFO - stdout - {'loss': 0.6937, 'grad_norm': 1.1468127965927124, 'learning_rate': 1.2913897468893249e-05, 'epoch': 1.27} +2025-02-05 19:39:36 - ERROR - stderr - 42%|████▏ | 9506/22434 [9:31:56<8:56:42, 2.49s/it] +2025-02-05 19:39:39 - ERROR - stderr - 42%|████▏ | 9507/22434 [9:31:58<8:57:23, 2.49s/it] +2025-02-05 19:39:39 - ERROR - stderr - +2025-02-05 19:39:39 - ERROR - stderr - +2025-02-05 19:39:39 - INFO - stdout - {'loss': 0.6953, 'grad_norm': 1.0861032009124756, 'learning_rate': 1.291251634412829e-05, 'epoch': 1.27} +2025-02-05 19:39:39 - ERROR - stderr - 42%|████▏ | 9507/22434 [9:31:59<8:57:23, 2.49s/it] +2025-02-05 19:39:41 - ERROR - stderr - 42%|████▏ | 9508/22434 [9:32:01<9:03:38, 2.52s/it] +2025-02-05 19:39:41 - ERROR - stderr - +2025-02-05 19:39:41 - ERROR - stderr - +2025-02-05 19:39:41 - INFO - stdout - {'loss': 0.7665, 'grad_norm': 1.2880622148513794, 'learning_rate': 1.2911135158654716e-05, 'epoch': 1.27} +2025-02-05 19:39:41 - ERROR - stderr - 42%|████▏ | 9508/22434 [9:32:01<9:03:38, 2.52s/it] +2025-02-05 19:39:44 - ERROR - stderr - 42%|████▏ | 9509/22434 [9:32:04<9:08:41, 2.55s/it] +2025-02-05 19:39:44 - ERROR - stderr - +2025-02-05 19:39:44 - ERROR - stderr - +2025-02-05 19:39:44 - INFO - stdout - {'loss': 0.696, 'grad_norm': 1.1230720281600952, 'learning_rate': 1.2909753912501312e-05, 'epoch': 1.27} +2025-02-05 19:39:44 - ERROR - stderr - 42%|████▏ | 9509/22434 [9:32:04<9:08:41, 2.55s/it] +2025-02-05 19:39:46 - ERROR - stderr - 42%|████▏ | 9510/22434 [9:32:06<9:06:24, 2.54s/it] +2025-02-05 19:39:46 - ERROR - stderr - +2025-02-05 19:39:46 - ERROR - stderr - +2025-02-05 19:39:46 - INFO - stdout - {'loss': 0.6976, 'grad_norm': 1.2851399183273315, 'learning_rate': 1.2908372605696876e-05, 'epoch': 1.27} +2025-02-05 19:39:46 - ERROR - stderr - 42%|████▏ | 9510/22434 [9:32:06<9:06:24, 2.54s/it] +2025-02-05 19:39:49 - ERROR - stderr - 42%|████▏ | 
9511/22434 [9:32:09<9:06:03, 2.54s/it] +2025-02-05 19:39:49 - ERROR - stderr - +2025-02-05 19:39:49 - ERROR - stderr - +2025-02-05 19:39:49 - INFO - stdout - {'loss': 0.7441, 'grad_norm': 1.1694047451019287, 'learning_rate': 1.2906991238270194e-05, 'epoch': 1.27} +2025-02-05 19:39:49 - ERROR - stderr - 42%|████▏ | 9511/22434 [9:32:09<9:06:03, 2.54s/it] +2025-02-05 19:39:52 - ERROR - stderr - 42%|████▏ | 9512/22434 [9:32:11<9:08:44, 2.55s/it] +2025-02-05 19:39:52 - ERROR - stderr - +2025-02-05 19:39:52 - ERROR - stderr - +2025-02-05 19:39:52 - INFO - stdout - {'loss': 0.6924, 'grad_norm': 1.053795576095581, 'learning_rate': 1.2905609810250064e-05, 'epoch': 1.27} +2025-02-05 19:39:52 - ERROR - stderr - 42%|████▏ | 9512/22434 [9:32:11<9:08:44, 2.55s/it] +2025-02-05 19:39:54 - ERROR - stderr - 42%|████▏ | 9513/22434 [9:32:14<9:04:15, 2.53s/it] +2025-02-05 19:39:54 - ERROR - stderr - +2025-02-05 19:39:54 - ERROR - stderr - +2025-02-05 19:39:54 - INFO - stdout - {'loss': 0.6966, 'grad_norm': 1.2144666910171509, 'learning_rate': 1.2904228321665276e-05, 'epoch': 1.27} +2025-02-05 19:39:54 - ERROR - stderr - 42%|████▏ | 9513/22434 [9:32:14<9:04:15, 2.53s/it] +2025-02-05 19:39:56 - ERROR - stderr - 42%|████▏ | 9514/22434 [9:32:16<9:00:25, 2.51s/it] +2025-02-05 19:39:57 - ERROR - stderr - +2025-02-05 19:39:57 - ERROR - stderr - +2025-02-05 19:39:57 - INFO - stdout - {'loss': 0.7114, 'grad_norm': 1.15086829662323, 'learning_rate': 1.2902846772544625e-05, 'epoch': 1.27} +2025-02-05 19:39:57 - ERROR - stderr - 42%|████▏ | 9514/22434 [9:32:16<9:00:25, 2.51s/it] +2025-02-05 19:39:59 - ERROR - stderr - 42%|████▏ | 9515/22434 [9:32:19<9:02:14, 2.52s/it] +2025-02-05 19:39:59 - ERROR - stderr - +2025-02-05 19:39:59 - ERROR - stderr - +2025-02-05 19:39:59 - INFO - stdout - {'loss': 0.6516, 'grad_norm': 1.0648542642593384, 'learning_rate': 1.2901465162916914e-05, 'epoch': 1.27} +2025-02-05 19:39:59 - ERROR - stderr - 42%|████▏ | 9515/22434 [9:32:19<9:02:14, 2.52s/it] +2025-02-05 
19:40:01 - ERROR - stderr - 42%|████▏ | 9516/22434 [9:32:21<8:56:32, 2.49s/it] +2025-02-05 19:40:02 - ERROR - stderr - +2025-02-05 19:40:02 - ERROR - stderr - +2025-02-05 19:40:02 - INFO - stdout - {'loss': 0.6738, 'grad_norm': 1.17129385471344, 'learning_rate': 1.2900083492810935e-05, 'epoch': 1.27} +2025-02-05 19:40:02 - ERROR - stderr - 42%|████▏ | 9516/22434 [9:32:21<8:56:32, 2.49s/it] +2025-02-05 19:40:04 - ERROR - stderr - 42%|████▏ | 9517/22434 [9:32:24<8:59:15, 2.50s/it] +2025-02-05 19:40:04 - ERROR - stderr - +2025-02-05 19:40:04 - ERROR - stderr - +2025-02-05 19:40:04 - INFO - stdout - {'loss': 0.7766, 'grad_norm': 1.227160096168518, 'learning_rate': 1.2898701762255495e-05, 'epoch': 1.27} +2025-02-05 19:40:04 - ERROR - stderr - 42%|████▏ | 9517/22434 [9:32:24<8:59:15, 2.50s/it] +2025-02-05 19:40:07 - ERROR - stderr - 42%|████▏ | 9518/22434 [9:32:26<8:59:50, 2.51s/it] +2025-02-05 19:40:07 - ERROR - stderr - +2025-02-05 19:40:07 - ERROR - stderr - +2025-02-05 19:40:07 - INFO - stdout - {'loss': 0.723, 'grad_norm': 1.0876532793045044, 'learning_rate': 1.2897319971279387e-05, 'epoch': 1.27} +2025-02-05 19:40:07 - ERROR - stderr - 42%|████▏ | 9518/22434 [9:32:26<8:59:50, 2.51s/it] +2025-02-05 19:40:09 - ERROR - stderr - 42%|████▏ | 9519/22434 [9:32:29<9:10:38, 2.56s/it] +2025-02-05 19:40:09 - ERROR - stderr - +2025-02-05 19:40:09 - ERROR - stderr - +2025-02-05 19:40:09 - INFO - stdout - {'loss': 0.7423, 'grad_norm': 1.1414062976837158, 'learning_rate': 1.289593811991142e-05, 'epoch': 1.27} +2025-02-05 19:40:09 - ERROR - stderr - 42%|████▏ | 9519/22434 [9:32:29<9:10:38, 2.56s/it] +2025-02-05 19:40:12 - ERROR - stderr - 42%|████▏ | 9520/22434 [9:32:31<9:08:43, 2.55s/it] +2025-02-05 19:40:12 - ERROR - stderr - +2025-02-05 19:40:12 - ERROR - stderr - +2025-02-05 19:40:12 - INFO - stdout - {'loss': 0.6736, 'grad_norm': 1.1948473453521729, 'learning_rate': 1.2894556208180391e-05, 'epoch': 1.27} +2025-02-05 19:40:12 - ERROR - stderr - 42%|████▏ | 9520/22434 
[9:32:32<9:08:43, 2.55s/it] +2025-02-05 19:40:15 - ERROR - stderr - 42%|████▏ | 9521/22434 [9:32:34<9:28:27, 2.64s/it] +2025-02-05 19:40:15 - ERROR - stderr - +2025-02-05 19:40:15 - ERROR - stderr - +2025-02-05 19:40:15 - INFO - stdout - {'loss': 0.7121, 'grad_norm': 1.2639120817184448, 'learning_rate': 1.2893174236115109e-05, 'epoch': 1.27} +2025-02-05 19:40:15 - ERROR - stderr - 42%|████▏ | 9521/22434 [9:32:34<9:28:27, 2.64s/it] +2025-02-05 19:40:17 - ERROR - stderr - 42%|████▏ | 9522/22434 [9:32:37<9:16:26, 2.59s/it] +2025-02-05 19:40:17 - ERROR - stderr - +2025-02-05 19:40:17 - ERROR - stderr - +2025-02-05 19:40:17 - INFO - stdout - {'loss': 0.8544, 'grad_norm': 1.4424244165420532, 'learning_rate': 1.2891792203744377e-05, 'epoch': 1.27} +2025-02-05 19:40:17 - ERROR - stderr - 42%|████▏ | 9522/22434 [9:32:37<9:16:26, 2.59s/it] +2025-02-05 19:40:20 - ERROR - stderr - 42%|████▏ | 9523/22434 [9:32:40<9:38:25, 2.69s/it] +2025-02-05 19:40:20 - ERROR - stderr - +2025-02-05 19:40:20 - ERROR - stderr - +2025-02-05 19:40:20 - INFO - stdout - {'loss': 0.7861, 'grad_norm': 1.1673781871795654, 'learning_rate': 1.2890410111097004e-05, 'epoch': 1.27} +2025-02-05 19:40:20 - ERROR - stderr - 42%|████▏ | 9523/22434 [9:32:40<9:38:25, 2.69s/it] +2025-02-05 19:40:22 - ERROR - stderr - 42%|████▏ | 9524/22434 [9:32:42<9:27:59, 2.64s/it] +2025-02-05 19:40:23 - ERROR - stderr - +2025-02-05 19:40:23 - ERROR - stderr - +2025-02-05 19:40:23 - INFO - stdout - {'loss': 0.6936, 'grad_norm': 1.125425100326538, 'learning_rate': 1.28890279582018e-05, 'epoch': 1.27} +2025-02-05 19:40:23 - ERROR - stderr - 42%|████▏ | 9524/22434 [9:32:42<9:27:59, 2.64s/it] +2025-02-05 19:40:25 - ERROR - stderr - 42%|████▏ | 9525/22434 [9:32:45<9:18:47, 2.60s/it] +2025-02-05 19:40:25 - ERROR - stderr - +2025-02-05 19:40:25 - ERROR - stderr - +2025-02-05 19:40:25 - INFO - stdout - {'loss': 0.7225, 'grad_norm': 1.1461714506149292, 'learning_rate': 1.2887645745087573e-05, 'epoch': 1.27} +2025-02-05 19:40:25 - ERROR - 
stderr - 42%|████▏ | 9525/22434 [9:32:45<9:18:47, 2.60s/it] +2025-02-05 19:40:28 - ERROR - stderr - 42%|████▏ | 9526/22434 [9:32:47<9:15:52, 2.58s/it] +2025-02-05 19:40:28 - ERROR - stderr - +2025-02-05 19:40:28 - ERROR - stderr - +2025-02-05 19:40:28 - INFO - stdout - {'loss': 0.7165, 'grad_norm': 1.1426209211349487, 'learning_rate': 1.2886263471783134e-05, 'epoch': 1.27} +2025-02-05 19:40:28 - ERROR - stderr - 42%|████▏ | 9526/22434 [9:32:47<9:15:52, 2.58s/it] +2025-02-05 19:40:30 - ERROR - stderr - 42%|████▏ | 9527/22434 [9:32:50<9:09:43, 2.56s/it] +2025-02-05 19:40:30 - ERROR - stderr - +2025-02-05 19:40:30 - ERROR - stderr - +2025-02-05 19:40:30 - INFO - stdout - {'loss': 0.7322, 'grad_norm': 1.1742935180664062, 'learning_rate': 1.2884881138317291e-05, 'epoch': 1.27} +2025-02-05 19:40:30 - ERROR - stderr - 42%|████▏ | 9527/22434 [9:32:50<9:09:43, 2.56s/it] +2025-02-05 19:40:33 - ERROR - stderr - 42%|████▏ | 9528/22434 [9:32:53<9:21:16, 2.61s/it] +2025-02-05 19:40:33 - ERROR - stderr - +2025-02-05 19:40:33 - ERROR - stderr - +2025-02-05 19:40:33 - INFO - stdout - {'loss': 0.7656, 'grad_norm': 1.1461148262023926, 'learning_rate': 1.2883498744718861e-05, 'epoch': 1.27} +2025-02-05 19:40:33 - ERROR - stderr - 42%|████▏ | 9528/22434 [9:32:53<9:21:16, 2.61s/it] +2025-02-05 19:40:35 - ERROR - stderr - 42%|████▏ | 9529/22434 [9:32:55<9:28:32, 2.64s/it] +2025-02-05 19:40:36 - ERROR - stderr - +2025-02-05 19:40:36 - ERROR - stderr - +2025-02-05 19:40:36 - INFO - stdout - {'loss': 0.7402, 'grad_norm': 1.1775208711624146, 'learning_rate': 1.2882116291016663e-05, 'epoch': 1.27} +2025-02-05 19:40:36 - ERROR - stderr - 42%|████▏ | 9529/22434 [9:32:55<9:28:32, 2.64s/it] +2025-02-05 19:40:38 - ERROR - stderr - 42%|████▏ | 9530/22434 [9:32:58<9:18:12, 2.60s/it] +2025-02-05 19:40:38 - ERROR - stderr - +2025-02-05 19:40:38 - ERROR - stderr - +2025-02-05 19:40:38 - INFO - stdout - {'loss': 0.6718, 'grad_norm': 1.1097184419631958, 'learning_rate': 1.2880733777239506e-05, 'epoch': 
1.27} +2025-02-05 19:40:38 - ERROR - stderr - 42%|████▏ | 9530/22434 [9:32:58<9:18:12, 2.60s/it] +2025-02-05 19:40:40 - ERROR - stderr - 42%|████▏ | 9531/22434 [9:33:00<9:12:47, 2.57s/it] +2025-02-05 19:40:41 - ERROR - stderr - +2025-02-05 19:40:41 - ERROR - stderr - +2025-02-05 19:40:41 - INFO - stdout - {'loss': 0.7459, 'grad_norm': 1.1424474716186523, 'learning_rate': 1.2879351203416213e-05, 'epoch': 1.27} +2025-02-05 19:40:41 - ERROR - stderr - 42%|████▏ | 9531/22434 [9:33:00<9:12:47, 2.57s/it] +2025-02-05 19:40:43 - ERROR - stderr - 42%|████▏ | 9532/22434 [9:33:03<9:05:40, 2.54s/it] +2025-02-05 19:40:43 - ERROR - stderr - +2025-02-05 19:40:43 - ERROR - stderr - +2025-02-05 19:40:43 - INFO - stdout - {'loss': 0.7285, 'grad_norm': 1.224245309829712, 'learning_rate': 1.2877968569575596e-05, 'epoch': 1.27} +2025-02-05 19:40:43 - ERROR - stderr - 42%|████▏ | 9532/22434 [9:33:03<9:05:40, 2.54s/it] +2025-02-05 19:40:45 - ERROR - stderr - 42%|████▏ | 9533/22434 [9:33:05<9:04:45, 2.53s/it] +2025-02-05 19:40:46 - ERROR - stderr - +2025-02-05 19:40:46 - ERROR - stderr - +2025-02-05 19:40:46 - INFO - stdout - {'loss': 0.7081, 'grad_norm': 1.0536460876464844, 'learning_rate': 1.2876585875746478e-05, 'epoch': 1.27} +2025-02-05 19:40:46 - ERROR - stderr - 42%|████▏ | 9533/22434 [9:33:05<9:04:45, 2.53s/it] +2025-02-05 19:40:48 - ERROR - stderr - 42%|████▏ | 9534/22434 [9:33:08<9:00:40, 2.51s/it] +2025-02-05 19:40:48 - ERROR - stderr - +2025-02-05 19:40:48 - ERROR - stderr - +2025-02-05 19:40:48 - INFO - stdout - {'loss': 0.7127, 'grad_norm': 1.1879181861877441, 'learning_rate': 1.2875203121957682e-05, 'epoch': 1.27} +2025-02-05 19:40:48 - ERROR - stderr - 42%|████▏ | 9534/22434 [9:33:08<9:00:40, 2.51s/it] +2025-02-05 19:40:50 - ERROR - stderr - 43%|████▎ | 9535/22434 [9:33:10<8:58:53, 2.51s/it] +2025-02-05 19:40:50 - ERROR - stderr - +2025-02-05 19:40:50 - ERROR - stderr - +2025-02-05 19:40:50 - INFO - stdout - {'loss': 0.7394, 'grad_norm': 1.0664161443710327, 
'learning_rate': 1.2873820308238027e-05, 'epoch': 1.28} +2025-02-05 19:40:50 - ERROR - stderr - 43%|████▎ | 9535/22434 [9:33:10<8:58:53, 2.51s/it] +2025-02-05 19:40:53 - ERROR - stderr - 43%|████▎ | 9536/22434 [9:33:13<8:54:20, 2.49s/it] +2025-02-05 19:40:53 - ERROR - stderr - +2025-02-05 19:40:53 - ERROR - stderr - +2025-02-05 19:40:53 - INFO - stdout - {'loss': 0.7448, 'grad_norm': 1.194688320159912, 'learning_rate': 1.2872437434616339e-05, 'epoch': 1.28} +2025-02-05 19:40:53 - ERROR - stderr - 43%|████▎ | 9536/22434 [9:33:13<8:54:20, 2.49s/it] +2025-02-05 19:40:55 - ERROR - stderr - 43%|████▎ | 9537/22434 [9:33:15<8:56:26, 2.50s/it] +2025-02-05 19:40:55 - ERROR - stderr - +2025-02-05 19:40:55 - ERROR - stderr - +2025-02-05 19:40:55 - INFO - stdout - {'loss': 0.8864, 'grad_norm': 1.3280386924743652, 'learning_rate': 1.2871054501121443e-05, 'epoch': 1.28} +2025-02-05 19:40:55 - ERROR - stderr - 43%|████▎ | 9537/22434 [9:33:15<8:56:26, 2.50s/it] +2025-02-05 19:40:58 - ERROR - stderr - 43%|████▎ | 9538/22434 [9:33:18<8:58:42, 2.51s/it] +2025-02-05 19:40:58 - ERROR - stderr - +2025-02-05 19:40:58 - ERROR - stderr - +2025-02-05 19:40:58 - INFO - stdout - {'loss': 0.686, 'grad_norm': 1.2681570053100586, 'learning_rate': 1.286967150778216e-05, 'epoch': 1.28} +2025-02-05 19:40:58 - ERROR - stderr - 43%|████▎ | 9538/22434 [9:33:18<8:58:42, 2.51s/it] +2025-02-05 19:41:01 - ERROR - stderr - 43%|████▎ | 9539/22434 [9:33:20<9:05:49, 2.54s/it] +2025-02-05 19:41:01 - ERROR - stderr - +2025-02-05 19:41:01 - ERROR - stderr - +2025-02-05 19:41:01 - INFO - stdout - {'loss': 0.6948, 'grad_norm': 1.118194818496704, 'learning_rate': 1.2868288454627322e-05, 'epoch': 1.28} +2025-02-05 19:41:01 - ERROR - stderr - 43%|████▎ | 9539/22434 [9:33:20<9:05:49, 2.54s/it] +2025-02-05 19:41:03 - ERROR - stderr - 43%|████▎ | 9540/22434 [9:33:23<9:08:35, 2.55s/it] +2025-02-05 19:41:03 - ERROR - stderr - +2025-02-05 19:41:03 - ERROR - stderr - +2025-02-05 19:41:03 - INFO - stdout - {'loss': 0.702, 
'grad_norm': 1.2404524087905884, 'learning_rate': 1.2866905341685753e-05, 'epoch': 1.28} +2025-02-05 19:41:03 - ERROR - stderr - 43%|████▎ | 9540/22434 [9:33:23<9:08:35, 2.55s/it] +2025-02-05 19:41:06 - ERROR - stderr - 43%|████▎ | 9541/22434 [9:33:26<9:24:35, 2.63s/it] +2025-02-05 19:41:06 - ERROR - stderr - +2025-02-05 19:41:06 - ERROR - stderr - +2025-02-05 19:41:06 - INFO - stdout - {'loss': 0.6935, 'grad_norm': 1.1704304218292236, 'learning_rate': 1.286552216898629e-05, 'epoch': 1.28} +2025-02-05 19:41:06 - ERROR - stderr - 43%|████▎ | 9541/22434 [9:33:26<9:24:35, 2.63s/it] +2025-02-05 19:41:08 - ERROR - stderr - 43%|████▎ | 9542/22434 [9:33:28<9:20:54, 2.61s/it] +2025-02-05 19:41:09 - ERROR - stderr - +2025-02-05 19:41:09 - ERROR - stderr - +2025-02-05 19:41:09 - INFO - stdout - {'loss': 0.735, 'grad_norm': 1.0573474168777466, 'learning_rate': 1.2864138936557755e-05, 'epoch': 1.28} +2025-02-05 19:41:09 - ERROR - stderr - 43%|████▎ | 9542/22434 [9:33:28<9:20:54, 2.61s/it] +2025-02-05 19:41:11 - ERROR - stderr - 43%|████▎ | 9543/22434 [9:33:31<9:32:15, 2.66s/it] +2025-02-05 19:41:11 - ERROR - stderr - +2025-02-05 19:41:11 - ERROR - stderr - +2025-02-05 19:41:11 - INFO - stdout - {'loss': 0.6517, 'grad_norm': 1.009398102760315, 'learning_rate': 1.2862755644428985e-05, 'epoch': 1.28} +2025-02-05 19:41:11 - ERROR - stderr - 43%|████▎ | 9543/22434 [9:33:31<9:32:15, 2.66s/it] +2025-02-05 19:41:14 - ERROR - stderr - 43%|████▎ | 9544/22434 [9:33:33<9:17:56, 2.60s/it] +2025-02-05 19:41:14 - ERROR - stderr - +2025-02-05 19:41:14 - ERROR - stderr - +2025-02-05 19:41:14 - INFO - stdout - {'loss': 0.6886, 'grad_norm': 1.145957112312317, 'learning_rate': 1.2861372292628816e-05, 'epoch': 1.28} +2025-02-05 19:41:14 - ERROR - stderr - 43%|████▎ | 9544/22434 [9:33:34<9:17:56, 2.60s/it] +2025-02-05 19:41:16 - ERROR - stderr - 43%|████▎ | 9545/22434 [9:33:36<9:27:04, 2.64s/it] +2025-02-05 19:41:16 - ERROR - stderr - +2025-02-05 19:41:16 - ERROR - stderr - +2025-02-05 19:41:16 - 
INFO - stdout - {'loss': 0.7945, 'grad_norm': 1.1343228816986084, 'learning_rate': 1.2859988881186079e-05, 'epoch': 1.28} +2025-02-05 19:41:16 - ERROR - stderr - 43%|████▎ | 9545/22434 [9:33:36<9:27:04, 2.64s/it] +2025-02-05 19:41:19 - ERROR - stderr - 43%|████▎ | 9546/22434 [9:33:39<9:13:40, 2.58s/it] +2025-02-05 19:41:19 - ERROR - stderr - +2025-02-05 19:41:19 - ERROR - stderr - +2025-02-05 19:41:19 - INFO - stdout - {'loss': 0.6899, 'grad_norm': 1.103018045425415, 'learning_rate': 1.285860541012961e-05, 'epoch': 1.28} +2025-02-05 19:41:19 - ERROR - stderr - 43%|████▎ | 9546/22434 [9:33:39<9:13:40, 2.58s/it] +2025-02-05 19:41:21 - ERROR - stderr - 43%|████▎ | 9547/22434 [9:33:41<9:08:25, 2.55s/it] +2025-02-05 19:41:21 - ERROR - stderr - +2025-02-05 19:41:21 - ERROR - stderr - +2025-02-05 19:41:21 - INFO - stdout - {'loss': 0.6099, 'grad_norm': 1.0970325469970703, 'learning_rate': 1.2857221879488245e-05, 'epoch': 1.28} +2025-02-05 19:41:21 - ERROR - stderr - 43%|████▎ | 9547/22434 [9:33:41<9:08:25, 2.55s/it] +2025-02-05 19:41:24 - ERROR - stderr - 43%|████▎ | 9548/22434 [9:33:44<9:01:10, 2.52s/it] +2025-02-05 19:41:24 - ERROR - stderr - +2025-02-05 19:41:24 - ERROR - stderr - +2025-02-05 19:41:24 - INFO - stdout - {'loss': 0.7649, 'grad_norm': 1.2206907272338867, 'learning_rate': 1.2855838289290822e-05, 'epoch': 1.28} +2025-02-05 19:41:24 - ERROR - stderr - 43%|████▎ | 9548/22434 [9:33:44<9:01:10, 2.52s/it] +2025-02-05 19:41:26 - ERROR - stderr - 43%|████▎ | 9549/22434 [9:33:46<8:55:41, 2.49s/it] +2025-02-05 19:41:26 - ERROR - stderr - +2025-02-05 19:41:26 - ERROR - stderr - +2025-02-05 19:41:26 - INFO - stdout - {'loss': 0.8172, 'grad_norm': 1.1753188371658325, 'learning_rate': 1.2854454639566189e-05, 'epoch': 1.28} +2025-02-05 19:41:26 - ERROR - stderr - 43%|████▎ | 9549/22434 [9:33:46<8:55:41, 2.49s/it] +2025-02-05 19:41:29 - ERROR - stderr - 43%|████▎ | 9550/22434 [9:33:49<8:54:55, 2.49s/it] +2025-02-05 19:41:29 - ERROR - stderr - +2025-02-05 19:41:29 - ERROR 
- stderr - +2025-02-05 19:41:29 - INFO - stdout - {'loss': 0.7382, 'grad_norm': 1.236785650253296, 'learning_rate': 1.2853070930343176e-05, 'epoch': 1.28} +2025-02-05 19:41:29 - ERROR - stderr - 43%|████▎ | 9550/22434 [9:33:49<8:54:55, 2.49s/it] +2025-02-05 19:41:31 - ERROR - stderr - 43%|████▎ | 9551/22434 [9:33:51<8:53:46, 2.49s/it] +2025-02-05 19:41:31 - ERROR - stderr - +2025-02-05 19:41:31 - ERROR - stderr - +2025-02-05 19:41:31 - INFO - stdout - {'loss': 0.7146, 'grad_norm': 1.0794576406478882, 'learning_rate': 1.285168716165063e-05, 'epoch': 1.28} +2025-02-05 19:41:31 - ERROR - stderr - 43%|████▎ | 9551/22434 [9:33:51<8:53:46, 2.49s/it] +2025-02-05 19:41:34 - ERROR - stderr - 43%|████▎ | 9552/22434 [9:33:53<8:50:12, 2.47s/it] +2025-02-05 19:41:34 - ERROR - stderr - +2025-02-05 19:41:34 - ERROR - stderr - +2025-02-05 19:41:34 - INFO - stdout - {'loss': 0.6988, 'grad_norm': 1.1871787309646606, 'learning_rate': 1.2850303333517396e-05, 'epoch': 1.28} +2025-02-05 19:41:34 - ERROR - stderr - 43%|████▎ | 9552/22434 [9:33:53<8:50:12, 2.47s/it] +2025-02-05 19:41:36 - ERROR - stderr - 43%|████▎ | 9553/22434 [9:33:56<8:49:30, 2.47s/it] +2025-02-05 19:41:36 - ERROR - stderr - +2025-02-05 19:41:36 - ERROR - stderr - +2025-02-05 19:41:36 - INFO - stdout - {'loss': 0.6563, 'grad_norm': 1.1199374198913574, 'learning_rate': 1.2848919445972315e-05, 'epoch': 1.28} +2025-02-05 19:41:36 - ERROR - stderr - 43%|████▎ | 9553/22434 [9:33:56<8:49:30, 2.47s/it] +2025-02-05 19:41:39 - ERROR - stderr - 43%|████▎ | 9554/22434 [9:33:58<8:53:38, 2.49s/it] +2025-02-05 19:41:39 - ERROR - stderr - +2025-02-05 19:41:39 - ERROR - stderr - +2025-02-05 19:41:39 - INFO - stdout - {'loss': 0.6022, 'grad_norm': 1.0298348665237427, 'learning_rate': 1.2847535499044232e-05, 'epoch': 1.28} +2025-02-05 19:41:39 - ERROR - stderr - 43%|████▎ | 9554/22434 [9:33:58<8:53:38, 2.49s/it] +2025-02-05 19:41:41 - ERROR - stderr - 43%|████▎ | 9555/22434 [9:34:01<9:12:59, 2.58s/it] +2025-02-05 19:41:41 - ERROR - 
stderr - +2025-02-05 19:41:41 - ERROR - stderr - +2025-02-05 19:41:41 - INFO - stdout - {'loss': 0.6475, 'grad_norm': 1.2932682037353516, 'learning_rate': 1.2846151492762e-05, 'epoch': 1.28} +2025-02-05 19:41:41 - ERROR - stderr - 43%|████▎ | 9555/22434 [9:34:01<9:12:59, 2.58s/it] +2025-02-05 19:41:44 - ERROR - stderr - 43%|████▎ | 9556/22434 [9:34:04<9:06:13, 2.54s/it] +2025-02-05 19:41:44 - ERROR - stderr - +2025-02-05 19:41:44 - ERROR - stderr - +2025-02-05 19:41:44 - INFO - stdout - {'loss': 0.6811, 'grad_norm': 1.27214515209198, 'learning_rate': 1.2844767427154462e-05, 'epoch': 1.28} +2025-02-05 19:41:44 - ERROR - stderr - 43%|████▎ | 9556/22434 [9:34:04<9:06:13, 2.54s/it] +2025-02-05 19:41:46 - ERROR - stderr - 43%|████▎ | 9557/22434 [9:34:06<9:04:21, 2.54s/it] +2025-02-05 19:41:46 - ERROR - stderr - +2025-02-05 19:41:46 - ERROR - stderr - +2025-02-05 19:41:46 - INFO - stdout - {'loss': 0.7056, 'grad_norm': 1.1253668069839478, 'learning_rate': 1.2843383302250471e-05, 'epoch': 1.28} +2025-02-05 19:41:46 - ERROR - stderr - 43%|████▎ | 9557/22434 [9:34:06<9:04:21, 2.54s/it] +2025-02-05 19:41:49 - ERROR - stderr - 43%|████▎ | 9558/22434 [9:34:09<9:02:08, 2.53s/it] +2025-02-05 19:41:49 - ERROR - stderr - +2025-02-05 19:41:49 - ERROR - stderr - +2025-02-05 19:41:49 - INFO - stdout - {'loss': 0.6427, 'grad_norm': 1.2528895139694214, 'learning_rate': 1.2841999118078874e-05, 'epoch': 1.28} +2025-02-05 19:41:49 - ERROR - stderr - 43%|████▎ | 9558/22434 [9:34:09<9:02:08, 2.53s/it] +2025-02-05 19:41:51 - ERROR - stderr - 43%|████▎ | 9559/22434 [9:34:11<8:58:46, 2.51s/it] +2025-02-05 19:41:51 - ERROR - stderr - +2025-02-05 19:41:51 - ERROR - stderr - +2025-02-05 19:41:51 - INFO - stdout - {'loss': 0.7606, 'grad_norm': 1.3314491510391235, 'learning_rate': 1.2840614874668524e-05, 'epoch': 1.28} +2025-02-05 19:41:51 - ERROR - stderr - 43%|████▎ | 9559/22434 [9:34:11<8:58:46, 2.51s/it] +2025-02-05 19:41:54 - ERROR - stderr - 43%|████▎ | 9560/22434 [9:34:14<8:57:10, 2.50s/it] 
+2025-02-05 19:41:54 - ERROR - stderr - +2025-02-05 19:41:54 - ERROR - stderr - +2025-02-05 19:41:54 - INFO - stdout - {'loss': 0.6223, 'grad_norm': 1.0440125465393066, 'learning_rate': 1.2839230572048274e-05, 'epoch': 1.28} +2025-02-05 19:41:54 - ERROR - stderr - 43%|████▎ | 9560/22434 [9:34:14<8:57:10, 2.50s/it] +2025-02-05 19:41:56 - ERROR - stderr - 43%|████▎ | 9561/22434 [9:34:16<8:55:24, 2.50s/it] +2025-02-05 19:41:56 - ERROR - stderr - +2025-02-05 19:41:56 - ERROR - stderr - +2025-02-05 19:41:56 - INFO - stdout - {'loss': 0.7396, 'grad_norm': 1.2579035758972168, 'learning_rate': 1.2837846210246984e-05, 'epoch': 1.28} +2025-02-05 19:41:56 - ERROR - stderr - 43%|████▎ | 9561/22434 [9:34:16<8:55:24, 2.50s/it] +2025-02-05 19:41:59 - ERROR - stderr - 43%|████▎ | 9562/22434 [9:34:19<8:53:07, 2.49s/it] +2025-02-05 19:41:59 - ERROR - stderr - +2025-02-05 19:41:59 - ERROR - stderr - +2025-02-05 19:41:59 - INFO - stdout - {'loss': 0.7188, 'grad_norm': 1.0947684049606323, 'learning_rate': 1.2836461789293505e-05, 'epoch': 1.28} +2025-02-05 19:41:59 - ERROR - stderr - 43%|████▎ | 9562/22434 [9:34:19<8:53:07, 2.49s/it] +2025-02-05 19:42:01 - ERROR - stderr - 43%|████▎ | 9563/22434 [9:34:21<8:53:56, 2.49s/it] +2025-02-05 19:42:01 - ERROR - stderr - +2025-02-05 19:42:01 - ERROR - stderr - +2025-02-05 19:42:01 - INFO - stdout - {'loss': 0.6997, 'grad_norm': 1.0589631795883179, 'learning_rate': 1.283507730921669e-05, 'epoch': 1.28} +2025-02-05 19:42:01 - ERROR - stderr - 43%|████▎ | 9563/22434 [9:34:21<8:53:56, 2.49s/it] +2025-02-05 19:42:04 - ERROR - stderr - 43%|████▎ | 9564/22434 [9:34:24<8:59:13, 2.51s/it] +2025-02-05 19:42:04 - ERROR - stderr - +2025-02-05 19:42:04 - ERROR - stderr - +2025-02-05 19:42:04 - INFO - stdout - {'loss': 0.6717, 'grad_norm': 1.1962766647338867, 'learning_rate': 1.2833692770045403e-05, 'epoch': 1.28} +2025-02-05 19:42:04 - ERROR - stderr - 43%|████▎ | 9564/22434 [9:34:24<8:59:13, 2.51s/it] +2025-02-05 19:42:06 - ERROR - stderr - 43%|████▎ | 
9565/22434 [9:34:26<8:59:55, 2.52s/it] +2025-02-05 19:42:06 - ERROR - stderr - +2025-02-05 19:42:06 - ERROR - stderr - +2025-02-05 19:42:06 - INFO - stdout - {'loss': 0.6489, 'grad_norm': 1.074391484260559, 'learning_rate': 1.2832308171808505e-05, 'epoch': 1.28} +2025-02-05 19:42:06 - ERROR - stderr - 43%|████▎ | 9565/22434 [9:34:26<8:59:55, 2.52s/it] +2025-02-05 19:42:09 - ERROR - stderr - 43%|████▎ | 9566/22434 [9:34:29<9:01:02, 2.52s/it] +2025-02-05 19:42:09 - ERROR - stderr - +2025-02-05 19:42:09 - ERROR - stderr - +2025-02-05 19:42:09 - INFO - stdout - {'loss': 0.7279, 'grad_norm': 1.2596200704574585, 'learning_rate': 1.283092351453485e-05, 'epoch': 1.28} +2025-02-05 19:42:09 - ERROR - stderr - 43%|████▎ | 9566/22434 [9:34:29<9:01:02, 2.52s/it] +2025-02-05 19:42:11 - ERROR - stderr - 43%|████▎ | 9567/22434 [9:34:31<8:58:46, 2.51s/it] +2025-02-05 19:42:11 - ERROR - stderr - +2025-02-05 19:42:11 - ERROR - stderr - +2025-02-05 19:42:11 - INFO - stdout - {'loss': 0.7113, 'grad_norm': 1.2978917360305786, 'learning_rate': 1.2829538798253303e-05, 'epoch': 1.28} +2025-02-05 19:42:11 - ERROR - stderr - 43%|████▎ | 9567/22434 [9:34:31<8:58:46, 2.51s/it] +2025-02-05 19:42:14 - ERROR - stderr - 43%|████▎ | 9568/22434 [9:34:34<8:59:42, 2.52s/it] +2025-02-05 19:42:14 - ERROR - stderr - +2025-02-05 19:42:14 - ERROR - stderr - +2025-02-05 19:42:14 - INFO - stdout - {'loss': 0.6672, 'grad_norm': 1.2259624004364014, 'learning_rate': 1.2828154022992727e-05, 'epoch': 1.28} +2025-02-05 19:42:14 - ERROR - stderr - 43%|████▎ | 9568/22434 [9:34:34<8:59:42, 2.52s/it] +2025-02-05 19:42:16 - ERROR - stderr - 43%|████▎ | 9569/22434 [9:34:36<8:56:26, 2.50s/it] +2025-02-05 19:42:16 - ERROR - stderr - +2025-02-05 19:42:16 - ERROR - stderr - +2025-02-05 19:42:16 - INFO - stdout - {'loss': 0.6618, 'grad_norm': 1.1117417812347412, 'learning_rate': 1.2826769188781991e-05, 'epoch': 1.28} +2025-02-05 19:42:16 - ERROR - stderr - 43%|████▎ | 9569/22434 [9:34:36<8:56:26, 2.50s/it] +2025-02-05 
19:42:19 - ERROR - stderr - 43%|████▎ | 9570/22434 [9:34:39<8:51:39, 2.48s/it] +2025-02-05 19:42:19 - ERROR - stderr - +2025-02-05 19:42:19 - ERROR - stderr - +2025-02-05 19:42:19 - INFO - stdout - {'loss': 0.7347, 'grad_norm': 1.1610409021377563, 'learning_rate': 1.2825384295649952e-05, 'epoch': 1.28} +2025-02-05 19:42:19 - ERROR - stderr - 43%|████▎ | 9570/22434 [9:34:39<8:51:39, 2.48s/it] +2025-02-05 19:42:21 - ERROR - stderr - 43%|████▎ | 9571/22434 [9:34:41<8:54:28, 2.49s/it] +2025-02-05 19:42:21 - ERROR - stderr - +2025-02-05 19:42:21 - ERROR - stderr - +2025-02-05 19:42:21 - INFO - stdout - {'loss': 0.748, 'grad_norm': 1.3092174530029297, 'learning_rate': 1.2823999343625482e-05, 'epoch': 1.28} +2025-02-05 19:42:21 - ERROR - stderr - 43%|████▎ | 9571/22434 [9:34:41<8:54:28, 2.49s/it] +2025-02-05 19:42:24 - ERROR - stderr - 43%|████▎ | 9572/22434 [9:34:44<9:00:07, 2.52s/it] +2025-02-05 19:42:24 - ERROR - stderr - +2025-02-05 19:42:24 - ERROR - stderr - +2025-02-05 19:42:24 - INFO - stdout - {'loss': 0.7977, 'grad_norm': 1.452265977859497, 'learning_rate': 1.2822614332737449e-05, 'epoch': 1.28} +2025-02-05 19:42:24 - ERROR - stderr - 43%|████▎ | 9572/22434 [9:34:44<9:00:07, 2.52s/it] +2025-02-05 19:42:26 - ERROR - stderr - 43%|████▎ | 9573/22434 [9:34:46<8:53:35, 2.49s/it] +2025-02-05 19:42:26 - ERROR - stderr - +2025-02-05 19:42:26 - ERROR - stderr - +2025-02-05 19:42:26 - INFO - stdout - {'loss': 0.7278, 'grad_norm': 1.2148412466049194, 'learning_rate': 1.2821229263014719e-05, 'epoch': 1.28} +2025-02-05 19:42:26 - ERROR - stderr - 43%|████▎ | 9573/22434 [9:34:46<8:53:35, 2.49s/it] +2025-02-05 19:42:29 - ERROR - stderr - 43%|████▎ | 9574/22434 [9:34:49<8:53:44, 2.49s/it] +2025-02-05 19:42:29 - ERROR - stderr - +2025-02-05 19:42:29 - ERROR - stderr - +2025-02-05 19:42:29 - INFO - stdout - {'loss': 0.7936, 'grad_norm': 1.2608327865600586, 'learning_rate': 1.2819844134486166e-05, 'epoch': 1.28} +2025-02-05 19:42:29 - ERROR - stderr - 43%|████▎ | 9574/22434 
[9:34:49<8:53:44, 2.49s/it] +2025-02-05 19:42:31 - ERROR - stderr - 43%|████▎ | 9575/22434 [9:34:51<8:58:41, 2.51s/it] +2025-02-05 19:42:31 - ERROR - stderr - +2025-02-05 19:42:31 - ERROR - stderr - +2025-02-05 19:42:31 - INFO - stdout - {'loss': 0.7199, 'grad_norm': 1.228591799736023, 'learning_rate': 1.281845894718066e-05, 'epoch': 1.28} +2025-02-05 19:42:31 - ERROR - stderr - 43%|████▎ | 9575/22434 [9:34:51<8:58:41, 2.51s/it] +2025-02-05 19:42:34 - ERROR - stderr - 43%|████▎ | 9576/22434 [9:34:54<8:54:17, 2.49s/it] +2025-02-05 19:42:34 - ERROR - stderr - +2025-02-05 19:42:34 - ERROR - stderr - +2025-02-05 19:42:34 - INFO - stdout - {'loss': 0.7539, 'grad_norm': 1.1814494132995605, 'learning_rate': 1.2817073701127074e-05, 'epoch': 1.28} +2025-02-05 19:42:34 - ERROR - stderr - 43%|████▎ | 9576/22434 [9:34:54<8:54:17, 2.49s/it] +2025-02-05 19:42:36 - ERROR - stderr - 43%|████▎ | 9577/22434 [9:34:56<8:51:04, 2.48s/it] +2025-02-05 19:42:36 - ERROR - stderr - +2025-02-05 19:42:36 - ERROR - stderr - +2025-02-05 19:42:36 - INFO - stdout - {'loss': 0.7229, 'grad_norm': 1.0683388710021973, 'learning_rate': 1.2815688396354284e-05, 'epoch': 1.28} +2025-02-05 19:42:36 - ERROR - stderr - 43%|████▎ | 9577/22434 [9:34:56<8:51:04, 2.48s/it] +2025-02-05 19:42:39 - ERROR - stderr - 43%|████▎ | 9578/22434 [9:34:59<9:18:28, 2.61s/it] +2025-02-05 19:42:39 - ERROR - stderr - +2025-02-05 19:42:39 - ERROR - stderr - +2025-02-05 19:42:39 - INFO - stdout - {'loss': 0.7938, 'grad_norm': 1.244971752166748, 'learning_rate': 1.2814303032891162e-05, 'epoch': 1.28} +2025-02-05 19:42:39 - ERROR - stderr - 43%|████▎ | 9578/22434 [9:34:59<9:18:28, 2.61s/it] +2025-02-05 19:42:42 - ERROR - stderr - 43%|████▎ | 9579/22434 [9:35:02<9:13:45, 2.58s/it] +2025-02-05 19:42:42 - ERROR - stderr - +2025-02-05 19:42:42 - ERROR - stderr - +2025-02-05 19:42:42 - INFO - stdout - {'loss': 0.7233, 'grad_norm': 1.1436121463775635, 'learning_rate': 1.2812917610766587e-05, 'epoch': 1.28} +2025-02-05 19:42:42 - ERROR - 
stderr - 43%|████▎ | 9579/22434 [9:35:02<9:13:45, 2.58s/it] +2025-02-05 19:42:44 - ERROR - stderr - 43%|████▎ | 9580/22434 [9:35:04<9:05:09, 2.54s/it] +2025-02-05 19:42:44 - ERROR - stderr - +2025-02-05 19:42:44 - ERROR - stderr - +2025-02-05 19:42:44 - INFO - stdout - {'loss': 0.8012, 'grad_norm': 1.2176181077957153, 'learning_rate': 1.2811532130009434e-05, 'epoch': 1.28} +2025-02-05 19:42:44 - ERROR - stderr - 43%|████▎ | 9580/22434 [9:35:04<9:05:09, 2.54s/it] +2025-02-05 19:42:47 - ERROR - stderr - 43%|████▎ | 9581/22434 [9:35:07<9:08:30, 2.56s/it] +2025-02-05 19:42:47 - ERROR - stderr - +2025-02-05 19:42:47 - ERROR - stderr - +2025-02-05 19:42:47 - INFO - stdout - {'loss': 0.6931, 'grad_norm': 1.1085972785949707, 'learning_rate': 1.2810146590648587e-05, 'epoch': 1.28} +2025-02-05 19:42:47 - ERROR - stderr - 43%|████▎ | 9581/22434 [9:35:07<9:08:30, 2.56s/it] +2025-02-05 19:42:49 - ERROR - stderr - 43%|████▎ | 9582/22434 [9:35:09<9:05:38, 2.55s/it] +2025-02-05 19:42:49 - ERROR - stderr - +2025-02-05 19:42:49 - ERROR - stderr - +2025-02-05 19:42:49 - INFO - stdout - {'loss': 0.7264, 'grad_norm': 1.1544116735458374, 'learning_rate': 1.2808760992712923e-05, 'epoch': 1.28} +2025-02-05 19:42:49 - ERROR - stderr - 43%|████▎ | 9582/22434 [9:35:09<9:05:38, 2.55s/it] +2025-02-05 19:42:52 - ERROR - stderr - 43%|████▎ | 9583/22434 [9:35:12<9:01:43, 2.53s/it] +2025-02-05 19:42:52 - ERROR - stderr - +2025-02-05 19:42:52 - ERROR - stderr - +2025-02-05 19:42:52 - INFO - stdout - {'loss': 0.794, 'grad_norm': 1.2515429258346558, 'learning_rate': 1.2807375336231323e-05, 'epoch': 1.28} +2025-02-05 19:42:52 - ERROR - stderr - 43%|████▎ | 9583/22434 [9:35:12<9:01:43, 2.53s/it] +2025-02-05 19:42:54 - ERROR - stderr - 43%|████▎ | 9584/22434 [9:35:14<9:00:56, 2.53s/it] +2025-02-05 19:42:54 - ERROR - stderr - +2025-02-05 19:42:54 - ERROR - stderr - +2025-02-05 19:42:54 - INFO - stdout - {'loss': 0.8488, 'grad_norm': 1.2227753400802612, 'learning_rate': 1.280598962123267e-05, 'epoch': 
1.28} +2025-02-05 19:42:54 - ERROR - stderr - 43%|████▎ | 9584/22434 [9:35:14<9:00:56, 2.53s/it] +2025-02-05 19:42:57 - ERROR - stderr - 43%|████▎ | 9585/22434 [9:35:17<8:58:20, 2.51s/it] +2025-02-05 19:42:57 - ERROR - stderr - +2025-02-05 19:42:57 - ERROR - stderr - +2025-02-05 19:42:57 - INFO - stdout - {'loss': 0.6915, 'grad_norm': 1.1010059118270874, 'learning_rate': 1.2804603847745848e-05, 'epoch': 1.28} +2025-02-05 19:42:57 - ERROR - stderr - 43%|████▎ | 9585/22434 [9:35:17<8:58:20, 2.51s/it] +2025-02-05 19:42:59 - ERROR - stderr - 43%|████▎ | 9586/22434 [9:35:19<8:53:42, 2.49s/it] +2025-02-05 19:42:59 - ERROR - stderr - +2025-02-05 19:42:59 - ERROR - stderr - +2025-02-05 19:42:59 - INFO - stdout - {'loss': 0.7874, 'grad_norm': 1.2636703252792358, 'learning_rate': 1.2803218015799743e-05, 'epoch': 1.28} +2025-02-05 19:42:59 - ERROR - stderr - 43%|████▎ | 9586/22434 [9:35:19<8:53:42, 2.49s/it] +2025-02-05 19:43:02 - ERROR - stderr - 43%|████▎ | 9587/22434 [9:35:22<8:54:12, 2.49s/it] +2025-02-05 19:43:02 - ERROR - stderr - +2025-02-05 19:43:02 - ERROR - stderr - +2025-02-05 19:43:02 - INFO - stdout - {'loss': 0.7702, 'grad_norm': 1.1862380504608154, 'learning_rate': 1.280183212542324e-05, 'epoch': 1.28} +2025-02-05 19:43:02 - ERROR - stderr - 43%|████▎ | 9587/22434 [9:35:22<8:54:12, 2.49s/it] +2025-02-05 19:43:04 - ERROR - stderr - 43%|████▎ | 9588/22434 [9:35:24<8:54:00, 2.49s/it] +2025-02-05 19:43:04 - ERROR - stderr - +2025-02-05 19:43:04 - ERROR - stderr - +2025-02-05 19:43:04 - INFO - stdout - {'loss': 0.7599, 'grad_norm': 1.0793447494506836, 'learning_rate': 1.2800446176645229e-05, 'epoch': 1.28} +2025-02-05 19:43:04 - ERROR - stderr - 43%|████▎ | 9588/22434 [9:35:24<8:54:00, 2.49s/it] +2025-02-05 19:43:07 - ERROR - stderr - 43%|████▎ | 9589/22434 [9:35:27<8:58:18, 2.51s/it] +2025-02-05 19:43:07 - ERROR - stderr - +2025-02-05 19:43:07 - ERROR - stderr - +2025-02-05 19:43:07 - INFO - stdout - {'loss': 0.6553, 'grad_norm': 1.1803622245788574, 
'learning_rate': 1.2799060169494601e-05, 'epoch': 1.28} +2025-02-05 19:43:07 - ERROR - stderr - 43%|████▎ | 9589/22434 [9:35:27<8:58:18, 2.51s/it] +2025-02-05 19:43:09 - ERROR - stderr - 43%|████▎ | 9590/22434 [9:35:29<8:56:24, 2.51s/it] +2025-02-05 19:43:09 - ERROR - stderr - +2025-02-05 19:43:09 - ERROR - stderr - +2025-02-05 19:43:09 - INFO - stdout - {'loss': 0.7146, 'grad_norm': 1.168172836303711, 'learning_rate': 1.2797674104000237e-05, 'epoch': 1.28} +2025-02-05 19:43:09 - ERROR - stderr - 43%|████▎ | 9590/22434 [9:35:29<8:56:24, 2.51s/it] +2025-02-05 19:43:12 - ERROR - stderr - 43%|████▎ | 9591/22434 [9:35:32<8:57:55, 2.51s/it] +2025-02-05 19:43:12 - ERROR - stderr - +2025-02-05 19:43:12 - ERROR - stderr - +2025-02-05 19:43:12 - INFO - stdout - {'loss': 0.6724, 'grad_norm': 1.053512692451477, 'learning_rate': 1.2796287980191035e-05, 'epoch': 1.28} +2025-02-05 19:43:12 - ERROR - stderr - 43%|████▎ | 9591/22434 [9:35:32<8:57:55, 2.51s/it] +2025-02-05 19:43:14 - ERROR - stderr - 43%|████▎ | 9592/22434 [9:35:34<9:00:38, 2.53s/it] +2025-02-05 19:43:14 - ERROR - stderr - +2025-02-05 19:43:14 - ERROR - stderr - +2025-02-05 19:43:14 - INFO - stdout - {'loss': 0.7409, 'grad_norm': 1.1048084497451782, 'learning_rate': 1.2794901798095882e-05, 'epoch': 1.28} +2025-02-05 19:43:14 - ERROR - stderr - 43%|████▎ | 9592/22434 [9:35:34<9:00:38, 2.53s/it] +2025-02-05 19:43:17 - ERROR - stderr - 43%|████▎ | 9593/22434 [9:35:37<9:03:11, 2.54s/it] +2025-02-05 19:43:17 - ERROR - stderr - +2025-02-05 19:43:17 - ERROR - stderr - +2025-02-05 19:43:17 - INFO - stdout - {'loss': 0.6656, 'grad_norm': 1.09452486038208, 'learning_rate': 1.279351555774368e-05, 'epoch': 1.28} +2025-02-05 19:43:17 - ERROR - stderr - 43%|████▎ | 9593/22434 [9:35:37<9:03:11, 2.54s/it] +2025-02-05 19:43:19 - ERROR - stderr - 43%|████▎ | 9594/22434 [9:35:39<9:00:30, 2.53s/it] +2025-02-05 19:43:20 - ERROR - stderr - +2025-02-05 19:43:20 - ERROR - stderr - +2025-02-05 19:43:20 - INFO - stdout - {'loss': 0.6764, 
'grad_norm': 1.139316201210022, 'learning_rate': 1.279212925916332e-05, 'epoch': 1.28} +2025-02-05 19:43:20 - ERROR - stderr - 43%|████▎ | 9594/22434 [9:35:39<9:00:30, 2.53s/it] +2025-02-05 19:43:22 - ERROR - stderr - 43%|████▎ | 9595/22434 [9:35:42<8:56:42, 2.51s/it] +2025-02-05 19:43:22 - ERROR - stderr - +2025-02-05 19:43:22 - ERROR - stderr - +2025-02-05 19:43:22 - INFO - stdout - {'loss': 0.6981, 'grad_norm': 1.1722959280014038, 'learning_rate': 1.2790742902383695e-05, 'epoch': 1.28} +2025-02-05 19:43:22 - ERROR - stderr - 43%|████▎ | 9595/22434 [9:35:42<8:56:42, 2.51s/it] +2025-02-05 19:43:24 - ERROR - stderr - 43%|████▎ | 9596/22434 [9:35:44<8:54:24, 2.50s/it] +2025-02-05 19:43:24 - ERROR - stderr - +2025-02-05 19:43:24 - ERROR - stderr - +2025-02-05 19:43:24 - INFO - stdout - {'loss': 0.6704, 'grad_norm': 1.354641079902649, 'learning_rate': 1.2789356487433705e-05, 'epoch': 1.28} +2025-02-05 19:43:24 - ERROR - stderr - 43%|████▎ | 9596/22434 [9:35:44<8:54:24, 2.50s/it] +2025-02-05 19:43:27 - ERROR - stderr - 43%|████▎ | 9597/22434 [9:35:47<8:53:35, 2.49s/it] +2025-02-05 19:43:27 - ERROR - stderr - +2025-02-05 19:43:27 - ERROR - stderr - +2025-02-05 19:43:27 - INFO - stdout - {'loss': 0.8785, 'grad_norm': 1.436334490776062, 'learning_rate': 1.2787970014342248e-05, 'epoch': 1.28} +2025-02-05 19:43:27 - ERROR - stderr - 43%|████▎ | 9597/22434 [9:35:47<8:53:35, 2.49s/it] +2025-02-05 19:43:29 - ERROR - stderr - 43%|████▎ | 9598/22434 [9:35:49<8:52:52, 2.49s/it] +2025-02-05 19:43:29 - ERROR - stderr - +2025-02-05 19:43:29 - ERROR - stderr - +2025-02-05 19:43:29 - INFO - stdout - {'loss': 0.6304, 'grad_norm': 1.1137406826019287, 'learning_rate': 1.2786583483138222e-05, 'epoch': 1.28} +2025-02-05 19:43:29 - ERROR - stderr - 43%|████▎ | 9598/22434 [9:35:49<8:52:52, 2.49s/it] +2025-02-05 19:43:32 - ERROR - stderr - 43%|████▎ | 9599/22434 [9:35:52<8:49:46, 2.48s/it] +2025-02-05 19:43:32 - ERROR - stderr - +2025-02-05 19:43:32 - ERROR - stderr - +2025-02-05 19:43:32 - 
INFO - stdout - {'loss': 0.651, 'grad_norm': 1.0410254001617432, 'learning_rate': 1.2785196893850532e-05, 'epoch': 1.28} +2025-02-05 19:43:32 - ERROR - stderr - 43%|████▎ | 9599/22434 [9:35:52<8:49:46, 2.48s/it] +2025-02-05 19:43:34 - ERROR - stderr - 43%|████▎ | 9600/22434 [9:35:54<8:54:37, 2.50s/it] +2025-02-05 19:43:34 - ERROR - stderr - +2025-02-05 19:43:34 - ERROR - stderr - +2025-02-05 19:43:34 - INFO - stdout - {'loss': 0.7489, 'grad_norm': 1.3171939849853516, 'learning_rate': 1.2783810246508077e-05, 'epoch': 1.28} +2025-02-05 19:43:34 - ERROR - stderr - 43%|████▎ | 9600/22434 [9:35:54<8:54:37, 2.50s/it] +2025-02-05 19:43:37 - ERROR - stderr - 43%|████▎ | 9601/22434 [9:35:57<8:51:31, 2.49s/it] +2025-02-05 19:43:37 - ERROR - stderr - +2025-02-05 19:43:37 - ERROR - stderr - +2025-02-05 19:43:37 - INFO - stdout - {'loss': 0.7332, 'grad_norm': 1.2629326581954956, 'learning_rate': 1.278242354113976e-05, 'epoch': 1.28} +2025-02-05 19:43:37 - ERROR - stderr - 43%|████▎ | 9601/22434 [9:35:57<8:51:31, 2.49s/it] +2025-02-05 19:43:39 - ERROR - stderr - 43%|████▎ | 9602/22434 [9:35:59<8:53:32, 2.49s/it] +2025-02-05 19:43:39 - ERROR - stderr - +2025-02-05 19:43:39 - ERROR - stderr - +2025-02-05 19:43:39 - INFO - stdout - {'loss': 0.6598, 'grad_norm': 1.0753047466278076, 'learning_rate': 1.2781036777774492e-05, 'epoch': 1.28} +2025-02-05 19:43:39 - ERROR - stderr - 43%|████▎ | 9602/22434 [9:35:59<8:53:32, 2.49s/it] +2025-02-05 19:43:42 - ERROR - stderr - 43%|████▎ | 9603/22434 [9:36:02<8:53:54, 2.50s/it] +2025-02-05 19:43:42 - ERROR - stderr - +2025-02-05 19:43:42 - ERROR - stderr - +2025-02-05 19:43:42 - INFO - stdout - {'loss': 0.6693, 'grad_norm': 1.0636957883834839, 'learning_rate': 1.2779649956441172e-05, 'epoch': 1.28} +2025-02-05 19:43:42 - ERROR - stderr - 43%|████▎ | 9603/22434 [9:36:02<8:53:54, 2.50s/it] +2025-02-05 19:43:44 - ERROR - stderr - 43%|████▎ | 9604/22434 [9:36:04<8:52:32, 2.49s/it] +2025-02-05 19:43:44 - ERROR - stderr - +2025-02-05 19:43:44 - ERROR 
- stderr - +2025-02-05 19:43:44 - INFO - stdout - {'loss': 0.7663, 'grad_norm': 1.2351280450820923, 'learning_rate': 1.2778263077168704e-05, 'epoch': 1.28} +2025-02-05 19:43:44 - ERROR - stderr - 43%|████▎ | 9604/22434 [9:36:04<8:52:32, 2.49s/it] +2025-02-05 19:43:47 - ERROR - stderr - 43%|████▎ | 9605/22434 [9:36:07<8:49:36, 2.48s/it] +2025-02-05 19:43:47 - ERROR - stderr - +2025-02-05 19:43:47 - ERROR - stderr - +2025-02-05 19:43:47 - INFO - stdout - {'loss': 0.6487, 'grad_norm': 1.1845741271972656, 'learning_rate': 1.2776876139986003e-05, 'epoch': 1.28} +2025-02-05 19:43:47 - ERROR - stderr - 43%|████▎ | 9605/22434 [9:36:07<8:49:36, 2.48s/it] +2025-02-05 19:43:49 - ERROR - stderr - 43%|████▎ | 9606/22434 [9:36:09<8:47:45, 2.47s/it] +2025-02-05 19:43:49 - ERROR - stderr - +2025-02-05 19:43:49 - ERROR - stderr - +2025-02-05 19:43:49 - INFO - stdout - {'loss': 0.7546, 'grad_norm': 1.307610034942627, 'learning_rate': 1.2775489144921977e-05, 'epoch': 1.28} +2025-02-05 19:43:49 - ERROR - stderr - 43%|████▎ | 9606/22434 [9:36:09<8:47:45, 2.47s/it] +2025-02-05 19:43:52 - ERROR - stderr - 43%|████▎ | 9607/22434 [9:36:11<8:50:55, 2.48s/it] +2025-02-05 19:43:52 - ERROR - stderr - +2025-02-05 19:43:52 - ERROR - stderr - +2025-02-05 19:43:52 - INFO - stdout - {'loss': 0.7264, 'grad_norm': 1.0826590061187744, 'learning_rate': 1.2774102092005536e-05, 'epoch': 1.28} +2025-02-05 19:43:52 - ERROR - stderr - 43%|████▎ | 9607/22434 [9:36:12<8:50:55, 2.48s/it] +2025-02-05 19:43:54 - ERROR - stderr - 43%|████▎ | 9608/22434 [9:36:14<8:51:49, 2.49s/it] +2025-02-05 19:43:54 - ERROR - stderr - +2025-02-05 19:43:54 - ERROR - stderr - +2025-02-05 19:43:54 - INFO - stdout - {'loss': 0.721, 'grad_norm': 1.311576008796692, 'learning_rate': 1.2772714981265591e-05, 'epoch': 1.28} +2025-02-05 19:43:54 - ERROR - stderr - 43%|████▎ | 9608/22434 [9:36:14<8:51:49, 2.49s/it] +2025-02-05 19:43:57 - ERROR - stderr - 43%|████▎ | 9609/22434 [9:36:16<8:53:15, 2.49s/it] +2025-02-05 19:43:57 - ERROR - 
stderr - +2025-02-05 19:43:57 - ERROR - stderr - +2025-02-05 19:43:57 - INFO - stdout - {'loss': 0.7731, 'grad_norm': 1.1081477403640747, 'learning_rate': 1.2771327812731053e-05, 'epoch': 1.28} +2025-02-05 19:43:57 - ERROR - stderr - 43%|████▎ | 9609/22434 [9:36:17<8:53:15, 2.49s/it] +2025-02-05 19:43:59 - ERROR - stderr - 43%|████▎ | 9610/22434 [9:36:19<8:52:40, 2.49s/it] +2025-02-05 19:43:59 - ERROR - stderr - +2025-02-05 19:43:59 - ERROR - stderr - +2025-02-05 19:43:59 - INFO - stdout - {'loss': 0.7516, 'grad_norm': 1.2137136459350586, 'learning_rate': 1.2769940586430842e-05, 'epoch': 1.29} +2025-02-05 19:43:59 - ERROR - stderr - 43%|████▎ | 9610/22434 [9:36:19<8:52:40, 2.49s/it] +2025-02-05 19:44:02 - ERROR - stderr - 43%|████▎ | 9611/22434 [9:36:22<9:16:53, 2.61s/it] +2025-02-05 19:44:02 - ERROR - stderr - +2025-02-05 19:44:02 - ERROR - stderr - +2025-02-05 19:44:02 - INFO - stdout - {'loss': 0.6851, 'grad_norm': 1.127432942390442, 'learning_rate': 1.2768553302393867e-05, 'epoch': 1.29} +2025-02-05 19:44:02 - ERROR - stderr - 43%|████▎ | 9611/22434 [9:36:22<9:16:53, 2.61s/it] +2025-02-05 19:44:05 - ERROR - stderr - 43%|████▎ | 9612/22434 [9:36:24<9:09:56, 2.57s/it] +2025-02-05 19:44:05 - ERROR - stderr - +2025-02-05 19:44:05 - ERROR - stderr - +2025-02-05 19:44:05 - INFO - stdout - {'loss': 0.6641, 'grad_norm': 1.0564912557601929, 'learning_rate': 1.2767165960649049e-05, 'epoch': 1.29} +2025-02-05 19:44:05 - ERROR - stderr - 43%|████▎ | 9612/22434 [9:36:24<9:09:56, 2.57s/it] +2025-02-05 19:44:07 - ERROR - stderr - 43%|████▎ | 9613/22434 [9:36:27<9:03:32, 2.54s/it] +2025-02-05 19:44:07 - ERROR - stderr - +2025-02-05 19:44:07 - ERROR - stderr - +2025-02-05 19:44:07 - INFO - stdout - {'loss': 0.6638, 'grad_norm': 1.1290284395217896, 'learning_rate': 1.2765778561225303e-05, 'epoch': 1.29} +2025-02-05 19:44:07 - ERROR - stderr - 43%|████▎ | 9613/22434 [9:36:27<9:03:32, 2.54s/it] +2025-02-05 19:44:10 - ERROR - stderr - 43%|████▎ | 9614/22434 [9:36:29<9:00:10, 
2.53s/it] +2025-02-05 19:44:10 - ERROR - stderr - +2025-02-05 19:44:10 - ERROR - stderr - +2025-02-05 19:44:10 - INFO - stdout - {'loss': 0.684, 'grad_norm': 1.1460851430892944, 'learning_rate': 1.2764391104151554e-05, 'epoch': 1.29} +2025-02-05 19:44:10 - ERROR - stderr - 43%|████▎ | 9614/22434 [9:36:29<9:00:10, 2.53s/it] +2025-02-05 19:44:12 - ERROR - stderr - 43%|████▎ | 9615/22434 [9:36:32<8:56:32, 2.51s/it] +2025-02-05 19:44:12 - ERROR - stderr - +2025-02-05 19:44:12 - ERROR - stderr - +2025-02-05 19:44:12 - INFO - stdout - {'loss': 0.6796, 'grad_norm': 1.251197338104248, 'learning_rate': 1.2763003589456716e-05, 'epoch': 1.29} +2025-02-05 19:44:12 - ERROR - stderr - 43%|████▎ | 9615/22434 [9:36:32<8:56:32, 2.51s/it] +2025-02-05 19:44:15 - ERROR - stderr - 43%|████▎ | 9616/22434 [9:36:34<8:54:44, 2.50s/it] +2025-02-05 19:44:15 - ERROR - stderr - +2025-02-05 19:44:15 - ERROR - stderr - +2025-02-05 19:44:15 - INFO - stdout - {'loss': 0.7374, 'grad_norm': 1.1249823570251465, 'learning_rate': 1.2761616017169709e-05, 'epoch': 1.29} +2025-02-05 19:44:15 - ERROR - stderr - 43%|████▎ | 9616/22434 [9:36:34<8:54:44, 2.50s/it] +2025-02-05 19:44:17 - ERROR - stderr - 43%|████▎ | 9617/22434 [9:36:37<8:52:46, 2.49s/it] +2025-02-05 19:44:17 - ERROR - stderr - +2025-02-05 19:44:17 - ERROR - stderr - +2025-02-05 19:44:17 - INFO - stdout - {'loss': 0.7883, 'grad_norm': 1.4155443906784058, 'learning_rate': 1.276022838731946e-05, 'epoch': 1.29} +2025-02-05 19:44:17 - ERROR - stderr - 43%|████▎ | 9617/22434 [9:36:37<8:52:46, 2.49s/it] +2025-02-05 19:44:20 - ERROR - stderr - 43%|████▎ | 9618/22434 [9:36:39<8:58:14, 2.52s/it] +2025-02-05 19:44:20 - ERROR - stderr - +2025-02-05 19:44:20 - ERROR - stderr - +2025-02-05 19:44:20 - INFO - stdout - {'loss': 0.6404, 'grad_norm': 1.1178494691848755, 'learning_rate': 1.2758840699934893e-05, 'epoch': 1.29} +2025-02-05 19:44:20 - ERROR - stderr - 43%|████▎ | 9618/22434 [9:36:39<8:58:14, 2.52s/it] +2025-02-05 19:44:22 - ERROR - stderr - 
43%|████▎ | 9619/22434 [9:36:42<9:18:09, 2.61s/it] +2025-02-05 19:44:22 - ERROR - stderr - +2025-02-05 19:44:22 - ERROR - stderr - +2025-02-05 19:44:22 - INFO - stdout - {'loss': 0.8131, 'grad_norm': 1.2892897129058838, 'learning_rate': 1.2757452955044928e-05, 'epoch': 1.29} +2025-02-05 19:44:22 - ERROR - stderr - 43%|████▎ | 9619/22434 [9:36:42<9:18:09, 2.61s/it] +2025-02-05 19:44:25 - ERROR - stderr - 43%|████▎ | 9620/22434 [9:36:45<9:11:47, 2.58s/it] +2025-02-05 19:44:25 - ERROR - stderr - +2025-02-05 19:44:25 - ERROR - stderr - +2025-02-05 19:44:25 - INFO - stdout - {'loss': 0.6935, 'grad_norm': 1.1258512735366821, 'learning_rate': 1.27560651526785e-05, 'epoch': 1.29} +2025-02-05 19:44:25 - ERROR - stderr - 43%|████▎ | 9620/22434 [9:36:45<9:11:47, 2.58s/it] +2025-02-05 19:44:27 - ERROR - stderr - 43%|████▎ | 9621/22434 [9:36:47<9:07:16, 2.56s/it] +2025-02-05 19:44:27 - ERROR - stderr - +2025-02-05 19:44:27 - ERROR - stderr - +2025-02-05 19:44:27 - INFO - stdout - {'loss': 0.7623, 'grad_norm': 1.181854009628296, 'learning_rate': 1.2754677292864525e-05, 'epoch': 1.29} +2025-02-05 19:44:27 - ERROR - stderr - 43%|████▎ | 9621/22434 [9:36:47<9:07:16, 2.56s/it] +2025-02-05 19:44:30 - ERROR - stderr - 43%|████▎ | 9622/22434 [9:36:50<9:03:35, 2.55s/it] +2025-02-05 19:44:30 - ERROR - stderr - +2025-02-05 19:44:30 - ERROR - stderr - +2025-02-05 19:44:30 - INFO - stdout - {'loss': 0.7519, 'grad_norm': 1.1584722995758057, 'learning_rate': 1.2753289375631945e-05, 'epoch': 1.29} +2025-02-05 19:44:30 - ERROR - stderr - 43%|████▎ | 9622/22434 [9:36:50<9:03:35, 2.55s/it] +2025-02-05 19:44:32 - ERROR - stderr - 43%|████▎ | 9623/22434 [9:36:52<8:57:32, 2.52s/it] +2025-02-05 19:44:32 - ERROR - stderr - +2025-02-05 19:44:32 - ERROR - stderr - +2025-02-05 19:44:32 - INFO - stdout - {'loss': 0.8311, 'grad_norm': 1.2618217468261719, 'learning_rate': 1.2751901401009678e-05, 'epoch': 1.29} +2025-02-05 19:44:32 - ERROR - stderr - 43%|████▎ | 9623/22434 [9:36:52<8:57:32, 2.52s/it] 
+2025-02-05 19:44:35 - ERROR - stderr - 43%|████▎ | 9624/22434 [9:36:55<8:53:02, 2.50s/it] +2025-02-05 19:44:35 - ERROR - stderr - +2025-02-05 19:44:35 - ERROR - stderr - +2025-02-05 19:44:35 - INFO - stdout - {'loss': 0.8387, 'grad_norm': 1.0831125974655151, 'learning_rate': 1.2750513369026658e-05, 'epoch': 1.29} +2025-02-05 19:44:35 - ERROR - stderr - 43%|████▎ | 9624/22434 [9:36:55<8:53:02, 2.50s/it] +2025-02-05 19:44:37 - ERROR - stderr - 43%|████▎ | 9625/22434 [9:36:57<8:51:57, 2.49s/it] +2025-02-05 19:44:37 - ERROR - stderr - +2025-02-05 19:44:37 - ERROR - stderr - +2025-02-05 19:44:37 - INFO - stdout - {'loss': 0.6753, 'grad_norm': 1.1415061950683594, 'learning_rate': 1.274912527971182e-05, 'epoch': 1.29} +2025-02-05 19:44:37 - ERROR - stderr - 43%|████▎ | 9625/22434 [9:36:57<8:51:57, 2.49s/it] +2025-02-05 19:44:40 - ERROR - stderr - 43%|████▎ | 9626/22434 [9:37:00<8:53:21, 2.50s/it] +2025-02-05 19:44:40 - ERROR - stderr - +2025-02-05 19:44:40 - ERROR - stderr - +2025-02-05 19:44:40 - INFO - stdout - {'loss': 0.7007, 'grad_norm': 1.0522689819335938, 'learning_rate': 1.27477371330941e-05, 'epoch': 1.29} +2025-02-05 19:44:40 - ERROR - stderr - 43%|████▎ | 9626/22434 [9:37:00<8:53:21, 2.50s/it] +2025-02-05 19:44:42 - ERROR - stderr - 43%|████▎ | 9627/22434 [9:37:02<8:53:13, 2.50s/it] +2025-02-05 19:44:42 - ERROR - stderr - +2025-02-05 19:44:42 - ERROR - stderr - +2025-02-05 19:44:42 - INFO - stdout - {'loss': 0.7308, 'grad_norm': 1.0634231567382812, 'learning_rate': 1.2746348929202426e-05, 'epoch': 1.29} +2025-02-05 19:44:42 - ERROR - stderr - 43%|████▎ | 9627/22434 [9:37:02<8:53:13, 2.50s/it] +2025-02-05 19:44:45 - ERROR - stderr - 43%|████▎ | 9628/22434 [9:37:05<8:55:46, 2.51s/it] +2025-02-05 19:44:45 - ERROR - stderr - +2025-02-05 19:44:45 - ERROR - stderr - +2025-02-05 19:44:45 - INFO - stdout - {'loss': 0.6801, 'grad_norm': 1.123031735420227, 'learning_rate': 1.2744960668065737e-05, 'epoch': 1.29} +2025-02-05 19:44:45 - ERROR - stderr - 43%|████▎ | 
9628/22434 [9:37:05<8:55:46, 2.51s/it] +2025-02-05 19:44:47 - ERROR - stderr - 43%|████▎ | 9629/22434 [9:37:07<8:54:56, 2.51s/it] +2025-02-05 19:44:47 - ERROR - stderr - +2025-02-05 19:44:47 - ERROR - stderr - +2025-02-05 19:44:47 - INFO - stdout - {'loss': 0.7098, 'grad_norm': 1.2298609018325806, 'learning_rate': 1.274357234971297e-05, 'epoch': 1.29} +2025-02-05 19:44:47 - ERROR - stderr - 43%|████▎ | 9629/22434 [9:37:07<8:54:56, 2.51s/it] +2025-02-05 19:44:50 - ERROR - stderr - 43%|████▎ | 9630/22434 [9:37:10<8:54:50, 2.51s/it] +2025-02-05 19:44:50 - ERROR - stderr - +2025-02-05 19:44:50 - ERROR - stderr - +2025-02-05 19:44:50 - INFO - stdout - {'loss': 0.7296, 'grad_norm': 1.1345815658569336, 'learning_rate': 1.2742183974173062e-05, 'epoch': 1.29} +2025-02-05 19:44:50 - ERROR - stderr - 43%|████▎ | 9630/22434 [9:37:10<8:54:50, 2.51s/it] +2025-02-05 19:44:53 - ERROR - stderr - 43%|████▎ | 9631/22434 [9:37:12<9:15:17, 2.60s/it] +2025-02-05 19:44:53 - ERROR - stderr - +2025-02-05 19:44:53 - ERROR - stderr - +2025-02-05 19:44:53 - INFO - stdout - {'loss': 0.7089, 'grad_norm': 1.254126787185669, 'learning_rate': 1.274079554147495e-05, 'epoch': 1.29} +2025-02-05 19:44:53 - ERROR - stderr - 43%|████▎ | 9631/22434 [9:37:13<9:15:17, 2.60s/it] +2025-02-05 19:44:55 - ERROR - stderr - 43%|████▎ | 9632/22434 [9:37:15<9:10:13, 2.58s/it] +2025-02-05 19:44:55 - ERROR - stderr - +2025-02-05 19:44:55 - ERROR - stderr - +2025-02-05 19:44:55 - INFO - stdout - {'loss': 0.6983, 'grad_norm': 1.1598751544952393, 'learning_rate': 1.2739407051647581e-05, 'epoch': 1.29} +2025-02-05 19:44:55 - ERROR - stderr - 43%|████▎ | 9632/22434 [9:37:15<9:10:13, 2.58s/it] +2025-02-05 19:44:58 - ERROR - stderr - 43%|████▎ | 9633/22434 [9:37:17<9:04:27, 2.55s/it] +2025-02-05 19:44:58 - ERROR - stderr - +2025-02-05 19:44:58 - ERROR - stderr - +2025-02-05 19:44:58 - INFO - stdout - {'loss': 0.6925, 'grad_norm': 1.1939847469329834, 'learning_rate': 1.2738018504719894e-05, 'epoch': 1.29} +2025-02-05 
19:44:58 - ERROR - stderr - 43%|████▎ | 9633/22434 [9:37:18<9:04:27, 2.55s/it] +2025-02-05 19:45:00 - ERROR - stderr - 43%|████▎ | 9634/22434 [9:37:20<9:05:34, 2.56s/it] +2025-02-05 19:45:00 - ERROR - stderr - +2025-02-05 19:45:00 - ERROR - stderr - +2025-02-05 19:45:00 - INFO - stdout - {'loss': 0.7636, 'grad_norm': 1.2101213932037354, 'learning_rate': 1.2736629900720832e-05, 'epoch': 1.29} +2025-02-05 19:45:00 - ERROR - stderr - 43%|████▎ | 9634/22434 [9:37:20<9:05:34, 2.56s/it] +2025-02-05 19:45:03 - ERROR - stderr - 43%|████▎ | 9635/22434 [9:37:23<9:00:57, 2.54s/it] +2025-02-05 19:45:03 - ERROR - stderr - +2025-02-05 19:45:03 - ERROR - stderr - +2025-02-05 19:45:03 - INFO - stdout - {'loss': 0.7592, 'grad_norm': 1.0956484079360962, 'learning_rate': 1.2735241239679335e-05, 'epoch': 1.29} +2025-02-05 19:45:03 - ERROR - stderr - 43%|████▎ | 9635/22434 [9:37:23<9:00:57, 2.54s/it] +2025-02-05 19:45:05 - ERROR - stderr - 43%|████▎ | 9636/22434 [9:37:25<9:00:34, 2.53s/it] +2025-02-05 19:45:05 - ERROR - stderr - +2025-02-05 19:45:05 - ERROR - stderr - +2025-02-05 19:45:05 - INFO - stdout - {'loss': 0.8213, 'grad_norm': 1.2897831201553345, 'learning_rate': 1.2733852521624353e-05, 'epoch': 1.29} +2025-02-05 19:45:05 - ERROR - stderr - 43%|████▎ | 9636/22434 [9:37:25<9:00:34, 2.53s/it] +2025-02-05 19:45:08 - ERROR - stderr - 43%|████▎ | 9637/22434 [9:37:28<9:08:01, 2.57s/it] +2025-02-05 19:45:08 - ERROR - stderr - +2025-02-05 19:45:08 - ERROR - stderr - +2025-02-05 19:45:08 - INFO - stdout - {'loss': 0.7514, 'grad_norm': 1.3319885730743408, 'learning_rate': 1.273246374658483e-05, 'epoch': 1.29} +2025-02-05 19:45:08 - ERROR - stderr - 43%|████▎ | 9637/22434 [9:37:28<9:08:01, 2.57s/it] +2025-02-05 19:45:10 - ERROR - stderr - 43%|████▎ | 9638/22434 [9:37:30<9:02:32, 2.54s/it] +2025-02-05 19:45:10 - ERROR - stderr - +2025-02-05 19:45:10 - ERROR - stderr - +2025-02-05 19:45:10 - INFO - stdout - {'loss': 0.7175, 'grad_norm': 1.1608487367630005, 'learning_rate': 
1.2731074914589718e-05, 'epoch': 1.29} +2025-02-05 19:45:10 - ERROR - stderr - 43%|████▎ | 9638/22434 [9:37:30<9:02:32, 2.54s/it] +2025-02-05 19:45:13 - ERROR - stderr - 43%|████▎ | 9639/22434 [9:37:33<8:57:33, 2.52s/it] +2025-02-05 19:45:13 - ERROR - stderr - +2025-02-05 19:45:13 - ERROR - stderr - +2025-02-05 19:45:13 - INFO - stdout - {'loss': 0.6822, 'grad_norm': 1.121476173400879, 'learning_rate': 1.272968602566796e-05, 'epoch': 1.29} +2025-02-05 19:45:13 - ERROR - stderr - 43%|████▎ | 9639/22434 [9:37:33<8:57:33, 2.52s/it] +2025-02-05 19:45:15 - ERROR - stderr - 43%|████▎ | 9640/22434 [9:37:35<8:53:17, 2.50s/it] +2025-02-05 19:45:15 - ERROR - stderr - +2025-02-05 19:45:15 - ERROR - stderr - +2025-02-05 19:45:15 - INFO - stdout - {'loss': 0.7443, 'grad_norm': 1.1441428661346436, 'learning_rate': 1.272829707984851e-05, 'epoch': 1.29} +2025-02-05 19:45:15 - ERROR - stderr - 43%|████▎ | 9640/22434 [9:37:35<8:53:17, 2.50s/it] +2025-02-05 19:45:18 - ERROR - stderr - 43%|████▎ | 9641/22434 [9:37:38<8:54:20, 2.51s/it] +2025-02-05 19:45:18 - ERROR - stderr - +2025-02-05 19:45:18 - ERROR - stderr - +2025-02-05 19:45:18 - INFO - stdout - {'loss': 0.7289, 'grad_norm': 1.1241437196731567, 'learning_rate': 1.2726908077160318e-05, 'epoch': 1.29} +2025-02-05 19:45:18 - ERROR - stderr - 43%|████▎ | 9641/22434 [9:37:38<8:54:20, 2.51s/it] +2025-02-05 19:45:20 - ERROR - stderr - 43%|████▎ | 9642/22434 [9:37:40<8:58:49, 2.53s/it] +2025-02-05 19:45:20 - ERROR - stderr - +2025-02-05 19:45:20 - ERROR - stderr - +2025-02-05 19:45:20 - INFO - stdout - {'loss': 0.6968, 'grad_norm': 1.216529130935669, 'learning_rate': 1.2725519017632337e-05, 'epoch': 1.29} +2025-02-05 19:45:20 - ERROR - stderr - 43%|████▎ | 9642/22434 [9:37:40<8:58:49, 2.53s/it] +2025-02-05 19:45:23 - ERROR - stderr - 43%|████▎ | 9643/22434 [9:37:43<8:59:01, 2.53s/it] +2025-02-05 19:45:23 - ERROR - stderr - +2025-02-05 19:45:23 - ERROR - stderr - +2025-02-05 19:45:23 - INFO - stdout - {'loss': 0.7527, 'grad_norm': 
1.1122649908065796, 'learning_rate': 1.2724129901293519e-05, 'epoch': 1.29} +2025-02-05 19:45:23 - ERROR - stderr - 43%|████▎ | 9643/22434 [9:37:43<8:59:01, 2.53s/it] +2025-02-05 19:45:25 - ERROR - stderr - 43%|████▎ | 9644/22434 [9:37:45<8:55:41, 2.51s/it] +2025-02-05 19:45:25 - ERROR - stderr - +2025-02-05 19:45:25 - ERROR - stderr - +2025-02-05 19:45:25 - INFO - stdout - {'loss': 0.7642, 'grad_norm': 1.1345137357711792, 'learning_rate': 1.272274072817282e-05, 'epoch': 1.29} +2025-02-05 19:45:25 - ERROR - stderr - 43%|████▎ | 9644/22434 [9:37:45<8:55:41, 2.51s/it] +2025-02-05 19:45:28 - ERROR - stderr - 43%|████▎ | 9645/22434 [9:37:48<8:56:19, 2.52s/it] +2025-02-05 19:45:28 - ERROR - stderr - +2025-02-05 19:45:28 - ERROR - stderr - +2025-02-05 19:45:28 - INFO - stdout - {'loss': 0.6494, 'grad_norm': 1.126294493675232, 'learning_rate': 1.2721351498299194e-05, 'epoch': 1.29} +2025-02-05 19:45:28 - ERROR - stderr - 43%|████▎ | 9645/22434 [9:37:48<8:56:19, 2.52s/it] +2025-02-05 19:45:30 - ERROR - stderr - 43%|████▎ | 9646/22434 [9:37:50<8:53:19, 2.50s/it] +2025-02-05 19:45:30 - ERROR - stderr - +2025-02-05 19:45:30 - ERROR - stderr - +2025-02-05 19:45:30 - INFO - stdout - {'loss': 0.7854, 'grad_norm': 1.163415789604187, 'learning_rate': 1.2719962211701607e-05, 'epoch': 1.29} +2025-02-05 19:45:30 - ERROR - stderr - 43%|████▎ | 9646/22434 [9:37:50<8:53:19, 2.50s/it] +2025-02-05 19:45:33 - ERROR - stderr - 43%|████▎ | 9647/22434 [9:37:53<8:54:14, 2.51s/it] +2025-02-05 19:45:33 - ERROR - stderr - +2025-02-05 19:45:33 - ERROR - stderr - +2025-02-05 19:45:33 - INFO - stdout - {'loss': 0.7057, 'grad_norm': 1.1121639013290405, 'learning_rate': 1.2718572868409005e-05, 'epoch': 1.29} +2025-02-05 19:45:33 - ERROR - stderr - 43%|████▎ | 9647/22434 [9:37:53<8:54:14, 2.51s/it] +2025-02-05 19:45:35 - ERROR - stderr - 43%|████▎ | 9648/22434 [9:37:55<8:52:23, 2.50s/it] +2025-02-05 19:45:35 - ERROR - stderr - +2025-02-05 19:45:35 - ERROR - stderr - +2025-02-05 19:45:35 - INFO - stdout 
- {'loss': 0.688, 'grad_norm': 1.1295735836029053, 'learning_rate': 1.2717183468450354e-05, 'epoch': 1.29} +2025-02-05 19:45:35 - ERROR - stderr - 43%|████▎ | 9648/22434 [9:37:55<8:52:23, 2.50s/it] +2025-02-05 19:45:38 - ERROR - stderr - 43%|████▎ | 9649/22434 [9:37:58<8:55:16, 2.51s/it] +2025-02-05 19:45:38 - ERROR - stderr - +2025-02-05 19:45:38 - ERROR - stderr - +2025-02-05 19:45:38 - INFO - stdout - {'loss': 0.6512, 'grad_norm': 1.1596299409866333, 'learning_rate': 1.2715794011854612e-05, 'epoch': 1.29} +2025-02-05 19:45:38 - ERROR - stderr - 43%|████▎ | 9649/22434 [9:37:58<8:55:16, 2.51s/it] +2025-02-05 19:45:41 - ERROR - stderr - 43%|████▎ | 9650/22434 [9:38:00<9:07:48, 2.57s/it] +2025-02-05 19:45:41 - ERROR - stderr - +2025-02-05 19:45:41 - ERROR - stderr - +2025-02-05 19:45:41 - INFO - stdout - {'loss': 0.7101, 'grad_norm': 1.045924425125122, 'learning_rate': 1.2714404498650743e-05, 'epoch': 1.29} +2025-02-05 19:45:41 - ERROR - stderr - 43%|████▎ | 9650/22434 [9:38:01<9:07:48, 2.57s/it] +2025-02-05 19:45:43 - ERROR - stderr - 43%|████▎ | 9651/22434 [9:38:03<9:06:26, 2.56s/it] +2025-02-05 19:45:43 - ERROR - stderr - +2025-02-05 19:45:43 - ERROR - stderr - +2025-02-05 19:45:43 - INFO - stdout - {'loss': 0.6668, 'grad_norm': 1.0680131912231445, 'learning_rate': 1.271301492886771e-05, 'epoch': 1.29} +2025-02-05 19:45:43 - ERROR - stderr - 43%|████▎ | 9651/22434 [9:38:03<9:06:26, 2.56s/it] +2025-02-05 19:45:46 - ERROR - stderr - 43%|████▎ | 9652/22434 [9:38:05<8:57:12, 2.52s/it] +2025-02-05 19:45:46 - ERROR - stderr - +2025-02-05 19:45:46 - ERROR - stderr - +2025-02-05 19:45:46 - INFO - stdout - {'loss': 0.6902, 'grad_norm': 1.1398775577545166, 'learning_rate': 1.2711625302534479e-05, 'epoch': 1.29} +2025-02-05 19:45:46 - ERROR - stderr - 43%|████▎ | 9652/22434 [9:38:05<8:57:12, 2.52s/it] +2025-02-05 19:45:48 - ERROR - stderr - 43%|████▎ | 9653/22434 [9:38:08<8:57:06, 2.52s/it] +2025-02-05 19:45:48 - ERROR - stderr - +2025-02-05 19:45:48 - ERROR - stderr - 
+2025-02-05 19:45:48 - INFO - stdout - {'loss': 0.683, 'grad_norm': 1.0867266654968262, 'learning_rate': 1.2710235619680012e-05, 'epoch': 1.29} +2025-02-05 19:45:48 - ERROR - stderr - 43%|████▎ | 9653/22434 [9:38:08<8:57:06, 2.52s/it] +2025-02-05 19:45:51 - ERROR - stderr - 43%|████▎ | 9654/22434 [9:38:10<8:58:26, 2.53s/it] +2025-02-05 19:45:51 - ERROR - stderr - +2025-02-05 19:45:51 - ERROR - stderr - +2025-02-05 19:45:51 - INFO - stdout - {'loss': 0.6894, 'grad_norm': 1.0483216047286987, 'learning_rate': 1.2708845880333278e-05, 'epoch': 1.29} +2025-02-05 19:45:51 - ERROR - stderr - 43%|████▎ | 9654/22434 [9:38:11<8:58:26, 2.53s/it] +2025-02-05 19:45:53 - ERROR - stderr - 43%|████▎ | 9655/22434 [9:38:13<8:59:33, 2.53s/it] +2025-02-05 19:45:53 - ERROR - stderr - +2025-02-05 19:45:53 - ERROR - stderr - +2025-02-05 19:45:53 - INFO - stdout - {'loss': 0.6983, 'grad_norm': 1.255963921546936, 'learning_rate': 1.2707456084523242e-05, 'epoch': 1.29} +2025-02-05 19:45:53 - ERROR - stderr - 43%|████▎ | 9655/22434 [9:38:13<8:59:33, 2.53s/it] +2025-02-05 19:45:56 - ERROR - stderr - 43%|████▎ | 9656/22434 [9:38:16<9:03:21, 2.55s/it] +2025-02-05 19:45:56 - ERROR - stderr - +2025-02-05 19:45:56 - ERROR - stderr - +2025-02-05 19:45:56 - INFO - stdout - {'loss': 0.7786, 'grad_norm': 1.2016394138336182, 'learning_rate': 1.2706066232278873e-05, 'epoch': 1.29} +2025-02-05 19:45:56 - ERROR - stderr - 43%|████▎ | 9656/22434 [9:38:16<9:03:21, 2.55s/it] +2025-02-05 19:45:58 - ERROR - stderr - 43%|████▎ | 9657/22434 [9:38:18<8:58:29, 2.53s/it] +2025-02-05 19:45:58 - ERROR - stderr - +2025-02-05 19:45:58 - ERROR - stderr - +2025-02-05 19:45:58 - INFO - stdout - {'loss': 0.7597, 'grad_norm': 1.3941806554794312, 'learning_rate': 1.2704676323629146e-05, 'epoch': 1.29} +2025-02-05 19:45:58 - ERROR - stderr - 43%|████▎ | 9657/22434 [9:38:18<8:58:29, 2.53s/it] +2025-02-05 19:46:01 - ERROR - stderr - 43%|████▎ | 9658/22434 [9:38:21<9:01:42, 2.54s/it] +2025-02-05 19:46:01 - ERROR - stderr - 
+2025-02-05 19:46:01 - ERROR - stderr - +2025-02-05 19:46:01 - INFO - stdout - {'loss': 0.7022, 'grad_norm': 1.060990571975708, 'learning_rate': 1.2703286358603029e-05, 'epoch': 1.29} +2025-02-05 19:46:01 - ERROR - stderr - 43%|████▎ | 9658/22434 [9:38:21<9:01:42, 2.54s/it] +2025-02-05 19:46:03 - ERROR - stderr - 43%|████▎ | 9659/22434 [9:38:23<9:03:30, 2.55s/it] +2025-02-05 19:46:04 - ERROR - stderr - +2025-02-05 19:46:04 - ERROR - stderr - +2025-02-05 19:46:04 - INFO - stdout - {'loss': 0.657, 'grad_norm': 1.0121026039123535, 'learning_rate': 1.2701896337229493e-05, 'epoch': 1.29} +2025-02-05 19:46:04 - ERROR - stderr - 43%|████▎ | 9659/22434 [9:38:23<9:03:30, 2.55s/it] +2025-02-05 19:46:06 - ERROR - stderr - 43%|████▎ | 9660/22434 [9:38:26<8:55:53, 2.52s/it] +2025-02-05 19:46:06 - ERROR - stderr - +2025-02-05 19:46:06 - ERROR - stderr - +2025-02-05 19:46:06 - INFO - stdout - {'loss': 0.7061, 'grad_norm': 1.1033008098602295, 'learning_rate': 1.2700506259537515e-05, 'epoch': 1.29} +2025-02-05 19:46:06 - ERROR - stderr - 43%|████▎ | 9660/22434 [9:38:26<8:55:53, 2.52s/it] +2025-02-05 19:46:08 - ERROR - stderr - 43%|████▎ | 9661/22434 [9:38:28<8:53:36, 2.51s/it] +2025-02-05 19:46:08 - ERROR - stderr - +2025-02-05 19:46:08 - ERROR - stderr - +2025-02-05 19:46:08 - INFO - stdout - {'loss': 0.7438, 'grad_norm': 1.2020244598388672, 'learning_rate': 1.2699116125556065e-05, 'epoch': 1.29} +2025-02-05 19:46:08 - ERROR - stderr - 43%|████▎ | 9661/22434 [9:38:28<8:53:36, 2.51s/it] +2025-02-05 19:46:11 - ERROR - stderr - 43%|████▎ | 9662/22434 [9:38:31<9:22:28, 2.64s/it] +2025-02-05 19:46:11 - ERROR - stderr - +2025-02-05 19:46:11 - ERROR - stderr - +2025-02-05 19:46:11 - INFO - stdout - {'loss': 0.6942, 'grad_norm': 1.0644489526748657, 'learning_rate': 1.2697725935314125e-05, 'epoch': 1.29} +2025-02-05 19:46:11 - ERROR - stderr - 43%|████▎ | 9662/22434 [9:38:31<9:22:28, 2.64s/it] +2025-02-05 19:46:14 - ERROR - stderr - 43%|████▎ | 9663/22434 [9:38:34<9:17:06, 2.62s/it] 
+2025-02-05 19:46:14 - ERROR - stderr - +2025-02-05 19:46:14 - ERROR - stderr - +2025-02-05 19:46:14 - INFO - stdout - {'loss': 0.7063, 'grad_norm': 1.1937388181686401, 'learning_rate': 1.2696335688840669e-05, 'epoch': 1.29} +2025-02-05 19:46:14 - ERROR - stderr - 43%|████▎ | 9663/22434 [9:38:34<9:17:06, 2.62s/it] +2025-02-05 19:46:16 - ERROR - stderr - 43%|████▎ | 9664/22434 [9:38:36<9:06:18, 2.57s/it] +2025-02-05 19:46:16 - ERROR - stderr - +2025-02-05 19:46:16 - ERROR - stderr - +2025-02-05 19:46:16 - INFO - stdout - {'loss': 0.7222, 'grad_norm': 1.2846183776855469, 'learning_rate': 1.2694945386164675e-05, 'epoch': 1.29} +2025-02-05 19:46:16 - ERROR - stderr - 43%|████▎ | 9664/22434 [9:38:36<9:06:18, 2.57s/it] +2025-02-05 19:46:19 - ERROR - stderr - 43%|████▎ | 9665/22434 [9:38:39<9:04:15, 2.56s/it] +2025-02-05 19:46:19 - ERROR - stderr - +2025-02-05 19:46:19 - ERROR - stderr - +2025-02-05 19:46:19 - INFO - stdout - {'loss': 0.7392, 'grad_norm': 1.1278648376464844, 'learning_rate': 1.2693555027315124e-05, 'epoch': 1.29} +2025-02-05 19:46:19 - ERROR - stderr - 43%|████▎ | 9665/22434 [9:38:39<9:04:15, 2.56s/it] +2025-02-05 19:46:21 - ERROR - stderr - 43%|████▎ | 9666/22434 [9:38:41<9:00:30, 2.54s/it] +2025-02-05 19:46:21 - ERROR - stderr - +2025-02-05 19:46:21 - ERROR - stderr - +2025-02-05 19:46:21 - INFO - stdout - {'loss': 0.7626, 'grad_norm': 1.3033103942871094, 'learning_rate': 1.2692164612320997e-05, 'epoch': 1.29} +2025-02-05 19:46:21 - ERROR - stderr - 43%|████▎ | 9666/22434 [9:38:41<9:00:30, 2.54s/it] +2025-02-05 19:46:24 - ERROR - stderr - 43%|████▎ | 9667/22434 [9:38:44<8:56:34, 2.52s/it] +2025-02-05 19:46:24 - ERROR - stderr - +2025-02-05 19:46:24 - ERROR - stderr - +2025-02-05 19:46:24 - INFO - stdout - {'loss': 0.7298, 'grad_norm': 1.0802661180496216, 'learning_rate': 1.2690774141211271e-05, 'epoch': 1.29} +2025-02-05 19:46:24 - ERROR - stderr - 43%|████▎ | 9667/22434 [9:38:44<8:56:34, 2.52s/it] +2025-02-05 19:46:26 - ERROR - stderr - 43%|████▎ | 
9668/22434 [9:38:46<8:53:16, 2.51s/it] +2025-02-05 19:46:26 - ERROR - stderr - +2025-02-05 19:46:26 - ERROR - stderr - +2025-02-05 19:46:26 - INFO - stdout - {'loss': 0.7406, 'grad_norm': 1.0877751111984253, 'learning_rate': 1.2689383614014937e-05, 'epoch': 1.29} +2025-02-05 19:46:26 - ERROR - stderr - 43%|████▎ | 9668/22434 [9:38:46<8:53:16, 2.51s/it] +2025-02-05 19:46:29 - ERROR - stderr - 43%|████▎ | 9669/22434 [9:38:49<8:48:15, 2.48s/it] +2025-02-05 19:46:29 - ERROR - stderr - +2025-02-05 19:46:29 - ERROR - stderr - +2025-02-05 19:46:29 - INFO - stdout - {'loss': 0.7958, 'grad_norm': 1.339814305305481, 'learning_rate': 1.2687993030760973e-05, 'epoch': 1.29} +2025-02-05 19:46:29 - ERROR - stderr - 43%|████▎ | 9669/22434 [9:38:49<8:48:15, 2.48s/it] +2025-02-05 19:46:31 - ERROR - stderr - 43%|████▎ | 9670/22434 [9:38:51<8:49:09, 2.49s/it] +2025-02-05 19:46:31 - ERROR - stderr - +2025-02-05 19:46:31 - ERROR - stderr - +2025-02-05 19:46:31 - INFO - stdout - {'loss': 0.6492, 'grad_norm': 1.0392295122146606, 'learning_rate': 1.2686602391478364e-05, 'epoch': 1.29} +2025-02-05 19:46:31 - ERROR - stderr - 43%|████▎ | 9670/22434 [9:38:51<8:49:09, 2.49s/it] +2025-02-05 19:46:34 - ERROR - stderr - 43%|████▎ | 9671/22434 [9:38:54<8:49:42, 2.49s/it] +2025-02-05 19:46:34 - ERROR - stderr - +2025-02-05 19:46:34 - ERROR - stderr - +2025-02-05 19:46:34 - INFO - stdout - {'loss': 0.7585, 'grad_norm': 1.2615996599197388, 'learning_rate': 1.2685211696196102e-05, 'epoch': 1.29} +2025-02-05 19:46:34 - ERROR - stderr - 43%|████▎ | 9671/22434 [9:38:54<8:49:42, 2.49s/it] +2025-02-05 19:46:36 - ERROR - stderr - 43%|████▎ | 9672/22434 [9:38:56<8:49:35, 2.49s/it] +2025-02-05 19:46:36 - ERROR - stderr - +2025-02-05 19:46:36 - ERROR - stderr - +2025-02-05 19:46:36 - INFO - stdout - {'loss': 0.7118, 'grad_norm': 1.1853387355804443, 'learning_rate': 1.268382094494317e-05, 'epoch': 1.29} +2025-02-05 19:46:36 - ERROR - stderr - 43%|████▎ | 9672/22434 [9:38:56<8:49:35, 2.49s/it] +2025-02-05 
19:46:39 - ERROR - stderr - 43%|████▎ | 9673/22434 [9:38:59<8:55:58, 2.52s/it] +2025-02-05 19:46:39 - ERROR - stderr - +2025-02-05 19:46:39 - ERROR - stderr - +2025-02-05 19:46:39 - INFO - stdout - {'loss': 0.76, 'grad_norm': 1.2066714763641357, 'learning_rate': 1.268243013774856e-05, 'epoch': 1.29} +2025-02-05 19:46:39 - ERROR - stderr - 43%|████▎ | 9673/22434 [9:38:59<8:55:58, 2.52s/it] +2025-02-05 19:46:42 - ERROR - stderr - 43%|████▎ | 9674/22434 [9:39:01<9:15:33, 2.61s/it] +2025-02-05 19:46:42 - ERROR - stderr - +2025-02-05 19:46:42 - ERROR - stderr - +2025-02-05 19:46:42 - INFO - stdout - {'loss': 0.7646, 'grad_norm': 1.2250553369522095, 'learning_rate': 1.2681039274641261e-05, 'epoch': 1.29} +2025-02-05 19:46:42 - ERROR - stderr - 43%|████▎ | 9674/22434 [9:39:02<9:15:33, 2.61s/it] +2025-02-05 19:46:45 - ERROR - stderr - 43%|████▎ | 9675/22434 [9:39:04<9:31:56, 2.69s/it] +2025-02-05 19:46:45 - ERROR - stderr - +2025-02-05 19:46:45 - ERROR - stderr - +2025-02-05 19:46:45 - INFO - stdout - {'loss': 0.767, 'grad_norm': 1.318450927734375, 'learning_rate': 1.267964835565026e-05, 'epoch': 1.29} +2025-02-05 19:46:45 - ERROR - stderr - 43%|████▎ | 9675/22434 [9:39:04<9:31:56, 2.69s/it] +2025-02-05 19:46:47 - ERROR - stderr - 43%|████▎ | 9676/22434 [9:39:07<9:14:25, 2.61s/it] +2025-02-05 19:46:47 - ERROR - stderr - +2025-02-05 19:46:47 - ERROR - stderr - +2025-02-05 19:46:47 - INFO - stdout - {'loss': 0.6883, 'grad_norm': 1.198045253753662, 'learning_rate': 1.2678257380804557e-05, 'epoch': 1.29} +2025-02-05 19:46:47 - ERROR - stderr - 43%|████▎ | 9676/22434 [9:39:07<9:14:25, 2.61s/it] +2025-02-05 19:46:50 - ERROR - stderr - 43%|████▎ | 9677/22434 [9:39:09<9:12:39, 2.60s/it] +2025-02-05 19:46:50 - ERROR - stderr - +2025-02-05 19:46:50 - ERROR - stderr - +2025-02-05 19:46:50 - INFO - stdout - {'loss': 0.7543, 'grad_norm': 1.1542186737060547, 'learning_rate': 1.2676866350133142e-05, 'epoch': 1.29} +2025-02-05 19:46:50 - ERROR - stderr - 43%|████▎ | 9677/22434 
[9:39:09<9:12:39, 2.60s/it] +2025-02-05 19:46:52 - ERROR - stderr - 43%|████▎ | 9678/22434 [9:39:12<9:05:35, 2.57s/it] +2025-02-05 19:46:52 - ERROR - stderr - +2025-02-05 19:46:52 - ERROR - stderr - +2025-02-05 19:46:52 - INFO - stdout - {'loss': 0.7494, 'grad_norm': 1.2106938362121582, 'learning_rate': 1.267547526366501e-05, 'epoch': 1.29} +2025-02-05 19:46:52 - ERROR - stderr - 43%|████▎ | 9678/22434 [9:39:12<9:05:35, 2.57s/it] +2025-02-05 19:46:55 - ERROR - stderr - 43%|████▎ | 9679/22434 [9:39:15<9:34:33, 2.70s/it] +2025-02-05 19:46:55 - ERROR - stderr - +2025-02-05 19:46:55 - ERROR - stderr - +2025-02-05 19:46:55 - INFO - stdout - {'loss': 0.744, 'grad_norm': 1.0944880247116089, 'learning_rate': 1.2674084121429153e-05, 'epoch': 1.29} +2025-02-05 19:46:55 - ERROR - stderr - 43%|████▎ | 9679/22434 [9:39:15<9:34:33, 2.70s/it] +2025-02-05 19:46:58 - ERROR - stderr - 43%|████▎ | 9680/22434 [9:39:17<9:21:09, 2.64s/it] +2025-02-05 19:46:58 - ERROR - stderr - +2025-02-05 19:46:58 - ERROR - stderr - +2025-02-05 19:46:58 - INFO - stdout - {'loss': 0.734, 'grad_norm': 1.1708085536956787, 'learning_rate': 1.2672692923454572e-05, 'epoch': 1.29} +2025-02-05 19:46:58 - ERROR - stderr - 43%|████▎ | 9680/22434 [9:39:17<9:21:09, 2.64s/it] +2025-02-05 19:47:00 - ERROR - stderr - 43%|████▎ | 9681/22434 [9:39:20<9:17:29, 2.62s/it] +2025-02-05 19:47:00 - ERROR - stderr - +2025-02-05 19:47:00 - ERROR - stderr - +2025-02-05 19:47:00 - INFO - stdout - {'loss': 0.6945, 'grad_norm': 1.069841980934143, 'learning_rate': 1.2671301669770266e-05, 'epoch': 1.29} +2025-02-05 19:47:00 - ERROR - stderr - 43%|████▎ | 9681/22434 [9:39:20<9:17:29, 2.62s/it] +2025-02-05 19:47:03 - ERROR - stderr - 43%|████▎ | 9682/22434 [9:39:22<9:12:49, 2.60s/it] +2025-02-05 19:47:03 - ERROR - stderr - +2025-02-05 19:47:03 - ERROR - stderr - +2025-02-05 19:47:03 - INFO - stdout - {'loss': 0.6381, 'grad_norm': 1.1405390501022339, 'learning_rate': 1.266991036040523e-05, 'epoch': 1.29} +2025-02-05 19:47:03 - ERROR - 
stderr - 43%|████▎ | 9682/22434 [9:39:23<9:12:49, 2.60s/it] +2025-02-05 19:47:05 - ERROR - stderr - 43%|████▎ | 9683/22434 [9:39:25<9:08:37, 2.58s/it] +2025-02-05 19:47:05 - ERROR - stderr - +2025-02-05 19:47:05 - ERROR - stderr - +2025-02-05 19:47:05 - INFO - stdout - {'loss': 0.6849, 'grad_norm': 1.1604965925216675, 'learning_rate': 1.266851899538847e-05, 'epoch': 1.29} +2025-02-05 19:47:05 - ERROR - stderr - 43%|████▎ | 9683/22434 [9:39:25<9:08:37, 2.58s/it] +2025-02-05 19:47:08 - ERROR - stderr - 43%|████▎ | 9684/22434 [9:39:28<9:04:56, 2.56s/it] +2025-02-05 19:47:08 - ERROR - stderr - +2025-02-05 19:47:08 - ERROR - stderr - +2025-02-05 19:47:08 - INFO - stdout - {'loss': 0.7366, 'grad_norm': 1.1917328834533691, 'learning_rate': 1.2667127574748985e-05, 'epoch': 1.29} +2025-02-05 19:47:08 - ERROR - stderr - 43%|████▎ | 9684/22434 [9:39:28<9:04:56, 2.56s/it] +2025-02-05 19:47:10 - ERROR - stderr - 43%|████▎ | 9685/22434 [9:39:30<9:06:02, 2.57s/it] +2025-02-05 19:47:10 - ERROR - stderr - +2025-02-05 19:47:10 - ERROR - stderr - +2025-02-05 19:47:10 - INFO - stdout - {'loss': 0.7704, 'grad_norm': 1.2936955690383911, 'learning_rate': 1.2665736098515778e-05, 'epoch': 1.3} +2025-02-05 19:47:10 - ERROR - stderr - 43%|████▎ | 9685/22434 [9:39:30<9:06:02, 2.57s/it] +2025-02-05 19:47:13 - ERROR - stderr - 43%|████▎ | 9686/22434 [9:39:33<9:01:27, 2.55s/it] +2025-02-05 19:47:13 - ERROR - stderr - +2025-02-05 19:47:13 - ERROR - stderr - +2025-02-05 19:47:13 - INFO - stdout - {'loss': 0.8212, 'grad_norm': 1.1507152318954468, 'learning_rate': 1.2664344566717853e-05, 'epoch': 1.3} +2025-02-05 19:47:13 - ERROR - stderr - 43%|████▎ | 9686/22434 [9:39:33<9:01:27, 2.55s/it] +2025-02-05 19:47:15 - ERROR - stderr - 43%|████▎ | 9687/22434 [9:39:35<8:59:14, 2.54s/it] +2025-02-05 19:47:15 - ERROR - stderr - +2025-02-05 19:47:15 - ERROR - stderr - +2025-02-05 19:47:15 - INFO - stdout - {'loss': 0.732, 'grad_norm': 1.3287267684936523, 'learning_rate': 1.2662952979384216e-05, 'epoch': 1.3} 
+2025-02-05 19:47:15 - ERROR - stderr - 43%|████▎ | 9687/22434 [9:39:35<8:59:14, 2.54s/it] +2025-02-05 19:47:18 - ERROR - stderr - 43%|████▎ | 9688/22434 [9:39:38<8:52:42, 2.51s/it] +2025-02-05 19:47:18 - ERROR - stderr - +2025-02-05 19:47:18 - ERROR - stderr - +2025-02-05 19:47:18 - INFO - stdout - {'loss': 0.779, 'grad_norm': 1.199951410293579, 'learning_rate': 1.2661561336543868e-05, 'epoch': 1.3} +2025-02-05 19:47:18 - ERROR - stderr - 43%|████▎ | 9688/22434 [9:39:38<8:52:42, 2.51s/it] +2025-02-05 19:47:20 - ERROR - stderr - 43%|████▎ | 9689/22434 [9:39:40<8:52:11, 2.51s/it] +2025-02-05 19:47:20 - ERROR - stderr - +2025-02-05 19:47:20 - ERROR - stderr - +2025-02-05 19:47:20 - INFO - stdout - {'loss': 0.7696, 'grad_norm': 1.3043510913848877, 'learning_rate': 1.2660169638225824e-05, 'epoch': 1.3} +2025-02-05 19:47:20 - ERROR - stderr - 43%|████▎ | 9689/22434 [9:39:40<8:52:11, 2.51s/it] +2025-02-05 19:47:23 - ERROR - stderr - 43%|████▎ | 9690/22434 [9:39:43<8:57:21, 2.53s/it] +2025-02-05 19:47:23 - ERROR - stderr - +2025-02-05 19:47:23 - ERROR - stderr - +2025-02-05 19:47:23 - INFO - stdout - {'loss': 0.7398, 'grad_norm': 1.227268099784851, 'learning_rate': 1.2658777884459086e-05, 'epoch': 1.3} +2025-02-05 19:47:23 - ERROR - stderr - 43%|████▎ | 9690/22434 [9:39:43<8:57:21, 2.53s/it] +2025-02-05 19:47:25 - ERROR - stderr - 43%|████▎ | 9691/22434 [9:39:45<8:56:40, 2.53s/it] +2025-02-05 19:47:25 - ERROR - stderr - +2025-02-05 19:47:25 - ERROR - stderr - +2025-02-05 19:47:25 - INFO - stdout - {'loss': 0.7371, 'grad_norm': 1.3260715007781982, 'learning_rate': 1.2657386075272672e-05, 'epoch': 1.3} +2025-02-05 19:47:25 - ERROR - stderr - 43%|████▎ | 9691/22434 [9:39:45<8:56:40, 2.53s/it] +2025-02-05 19:47:28 - ERROR - stderr - 43%|████▎ | 9692/22434 [9:39:48<9:00:26, 2.54s/it] +2025-02-05 19:47:28 - ERROR - stderr - +2025-02-05 19:47:28 - ERROR - stderr - +2025-02-05 19:47:28 - INFO - stdout - {'loss': 0.6553, 'grad_norm': 1.0832947492599487, 'learning_rate': 
1.2655994210695586e-05, 'epoch': 1.3} +2025-02-05 19:47:28 - ERROR - stderr - 43%|████▎ | 9692/22434 [9:39:48<9:00:26, 2.54s/it] +2025-02-05 19:47:31 - ERROR - stderr - 43%|████▎ | 9693/22434 [9:39:50<9:12:03, 2.60s/it] +2025-02-05 19:47:31 - ERROR - stderr - +2025-02-05 19:47:31 - ERROR - stderr - +2025-02-05 19:47:31 - INFO - stdout - {'loss': 0.7061, 'grad_norm': 1.1742264032363892, 'learning_rate': 1.2654602290756844e-05, 'epoch': 1.3} +2025-02-05 19:47:31 - ERROR - stderr - 43%|████▎ | 9693/22434 [9:39:51<9:12:03, 2.60s/it] +2025-02-05 19:47:33 - ERROR - stderr - 43%|████▎ | 9694/22434 [9:39:53<9:07:31, 2.58s/it] +2025-02-05 19:47:33 - ERROR - stderr - +2025-02-05 19:47:33 - ERROR - stderr - +2025-02-05 19:47:33 - INFO - stdout - {'loss': 0.757, 'grad_norm': 1.2155097723007202, 'learning_rate': 1.2653210315485453e-05, 'epoch': 1.3} +2025-02-05 19:47:33 - ERROR - stderr - 43%|████▎ | 9694/22434 [9:39:53<9:07:31, 2.58s/it] +2025-02-05 19:47:36 - ERROR - stderr - 43%|████▎ | 9695/22434 [9:39:55<8:58:07, 2.53s/it] +2025-02-05 19:47:36 - ERROR - stderr - +2025-02-05 19:47:36 - ERROR - stderr - +2025-02-05 19:47:36 - INFO - stdout - {'loss': 0.7112, 'grad_norm': 1.244540810585022, 'learning_rate': 1.2651818284910435e-05, 'epoch': 1.3} +2025-02-05 19:47:36 - ERROR - stderr - 43%|████▎ | 9695/22434 [9:39:55<8:58:07, 2.53s/it] +2025-02-05 19:47:38 - ERROR - stderr - 43%|████▎ | 9696/22434 [9:39:58<8:59:21, 2.54s/it] +2025-02-05 19:47:38 - ERROR - stderr - +2025-02-05 19:47:38 - ERROR - stderr - +2025-02-05 19:47:38 - INFO - stdout - {'loss': 0.7048, 'grad_norm': 1.0440651178359985, 'learning_rate': 1.26504261990608e-05, 'epoch': 1.3} +2025-02-05 19:47:38 - ERROR - stderr - 43%|████▎ | 9696/22434 [9:39:58<8:59:21, 2.54s/it] +2025-02-05 19:47:41 - ERROR - stderr - 43%|████▎ | 9697/22434 [9:40:00<8:54:08, 2.52s/it] +2025-02-05 19:47:41 - ERROR - stderr - +2025-02-05 19:47:41 - ERROR - stderr - +2025-02-05 19:47:41 - INFO - stdout - {'loss': 0.6747, 'grad_norm': 
1.126236081123352, 'learning_rate': 1.264903405796557e-05, 'epoch': 1.3} +2025-02-05 19:47:41 - ERROR - stderr - 43%|████▎ | 9697/22434 [9:40:00<8:54:08, 2.52s/it] +2025-02-05 19:47:43 - ERROR - stderr - 43%|████▎ | 9698/22434 [9:40:03<8:54:47, 2.52s/it] +2025-02-05 19:47:43 - ERROR - stderr - +2025-02-05 19:47:43 - ERROR - stderr - +2025-02-05 19:47:43 - INFO - stdout - {'loss': 0.7553, 'grad_norm': 1.1871329545974731, 'learning_rate': 1.2647641861653759e-05, 'epoch': 1.3} +2025-02-05 19:47:43 - ERROR - stderr - 43%|████▎ | 9698/22434 [9:40:03<8:54:47, 2.52s/it] +2025-02-05 19:47:46 - ERROR - stderr - 43%|████▎ | 9699/22434 [9:40:06<9:14:12, 2.61s/it] +2025-02-05 19:47:46 - ERROR - stderr - +2025-02-05 19:47:46 - ERROR - stderr - +2025-02-05 19:47:46 - INFO - stdout - {'loss': 0.6326, 'grad_norm': 1.2088154554367065, 'learning_rate': 1.2646249610154388e-05, 'epoch': 1.3} +2025-02-05 19:47:46 - ERROR - stderr - 43%|████▎ | 9699/22434 [9:40:06<9:14:12, 2.61s/it] +2025-02-05 19:47:48 - ERROR - stderr - 43%|████▎ | 9700/22434 [9:40:08<9:03:04, 2.56s/it] +2025-02-05 19:47:49 - ERROR - stderr - +2025-02-05 19:47:49 - ERROR - stderr - +2025-02-05 19:47:49 - INFO - stdout - {'loss': 0.7422, 'grad_norm': 1.2028765678405762, 'learning_rate': 1.2644857303496476e-05, 'epoch': 1.3} +2025-02-05 19:47:49 - ERROR - stderr - 43%|████▎ | 9700/22434 [9:40:08<9:03:04, 2.56s/it] +2025-02-05 19:47:51 - ERROR - stderr - 43%|████▎ | 9701/22434 [9:40:11<9:04:23, 2.57s/it] +2025-02-05 19:47:51 - ERROR - stderr - +2025-02-05 19:47:51 - ERROR - stderr - +2025-02-05 19:47:51 - INFO - stdout - {'loss': 0.703, 'grad_norm': 1.2238471508026123, 'learning_rate': 1.2643464941709042e-05, 'epoch': 1.3} +2025-02-05 19:47:51 - ERROR - stderr - 43%|████▎ | 9701/22434 [9:40:11<9:04:23, 2.57s/it] +2025-02-05 19:47:54 - ERROR - stderr - 43%|████▎ | 9702/22434 [9:40:13<9:04:17, 2.57s/it] +2025-02-05 19:47:54 - ERROR - stderr - +2025-02-05 19:47:54 - ERROR - stderr - +2025-02-05 19:47:54 - INFO - stdout - 
{'loss': 0.653, 'grad_norm': 1.0195839405059814, 'learning_rate': 1.264207252482111e-05, 'epoch': 1.3} +2025-02-05 19:47:54 - ERROR - stderr - 43%|████▎ | 9702/22434 [9:40:13<9:04:17, 2.57s/it] +2025-02-05 19:47:56 - ERROR - stderr - 43%|████▎ | 9703/22434 [9:40:16<8:58:55, 2.54s/it] +2025-02-05 19:47:56 - ERROR - stderr - +2025-02-05 19:47:56 - ERROR - stderr - +2025-02-05 19:47:56 - INFO - stdout - {'loss': 0.7205, 'grad_norm': 1.0800600051879883, 'learning_rate': 1.2640680052861706e-05, 'epoch': 1.3} +2025-02-05 19:47:56 - ERROR - stderr - 43%|████▎ | 9703/22434 [9:40:16<8:58:55, 2.54s/it] +2025-02-05 19:47:59 - ERROR - stderr - 43%|████▎ | 9704/22434 [9:40:18<8:55:43, 2.53s/it] +2025-02-05 19:47:59 - ERROR - stderr - +2025-02-05 19:47:59 - ERROR - stderr - +2025-02-05 19:47:59 - INFO - stdout - {'loss': 0.7158, 'grad_norm': 1.1276752948760986, 'learning_rate': 1.2639287525859855e-05, 'epoch': 1.3} +2025-02-05 19:47:59 - ERROR - stderr - 43%|████▎ | 9704/22434 [9:40:18<8:55:43, 2.53s/it] +2025-02-05 19:48:01 - ERROR - stderr - 43%|████▎ | 9705/22434 [9:40:21<9:03:53, 2.56s/it] +2025-02-05 19:48:01 - ERROR - stderr - +2025-02-05 19:48:01 - ERROR - stderr - +2025-02-05 19:48:01 - INFO - stdout - {'loss': 0.6955, 'grad_norm': 1.1186134815216064, 'learning_rate': 1.263789494384458e-05, 'epoch': 1.3} +2025-02-05 19:48:01 - ERROR - stderr - 43%|████▎ | 9705/22434 [9:40:21<9:03:53, 2.56s/it] +2025-02-05 19:48:04 - ERROR - stderr - 43%|████▎ | 9706/22434 [9:40:24<9:00:23, 2.55s/it] +2025-02-05 19:48:04 - ERROR - stderr - +2025-02-05 19:48:04 - ERROR - stderr - +2025-02-05 19:48:04 - INFO - stdout - {'loss': 0.6979, 'grad_norm': 1.180485486984253, 'learning_rate': 1.263650230684491e-05, 'epoch': 1.3} +2025-02-05 19:48:04 - ERROR - stderr - 43%|████▎ | 9706/22434 [9:40:24<9:00:23, 2.55s/it] +2025-02-05 19:48:06 - ERROR - stderr - 43%|████▎ | 9707/22434 [9:40:26<9:01:47, 2.55s/it] +2025-02-05 19:48:06 - ERROR - stderr - +2025-02-05 19:48:06 - ERROR - stderr - +2025-02-05 
19:48:06 - INFO - stdout - {'loss': 0.6352, 'grad_norm': 1.137648344039917, 'learning_rate': 1.2635109614889868e-05, 'epoch': 1.3} +2025-02-05 19:48:06 - ERROR - stderr - 43%|████▎ | 9707/22434 [9:40:26<9:01:47, 2.55s/it] +2025-02-05 19:48:09 - ERROR - stderr - 43%|████▎ | 9708/22434 [9:40:29<8:57:10, 2.53s/it] +2025-02-05 19:48:09 - ERROR - stderr - +2025-02-05 19:48:09 - ERROR - stderr - +2025-02-05 19:48:09 - INFO - stdout - {'loss': 0.7115, 'grad_norm': 1.2170674800872803, 'learning_rate': 1.2633716868008493e-05, 'epoch': 1.3} +2025-02-05 19:48:09 - ERROR - stderr - 43%|████▎ | 9708/22434 [9:40:29<8:57:10, 2.53s/it] +2025-02-05 19:48:11 - ERROR - stderr - 43%|████▎ | 9709/22434 [9:40:31<8:54:06, 2.52s/it] +2025-02-05 19:48:11 - ERROR - stderr - +2025-02-05 19:48:11 - ERROR - stderr - +2025-02-05 19:48:11 - INFO - stdout - {'loss': 0.7699, 'grad_norm': 1.2233610153198242, 'learning_rate': 1.2632324066229806e-05, 'epoch': 1.3} +2025-02-05 19:48:11 - ERROR - stderr - 43%|████▎ | 9709/22434 [9:40:31<8:54:06, 2.52s/it] +2025-02-05 19:48:14 - ERROR - stderr - 43%|████▎ | 9710/22434 [9:40:34<8:50:59, 2.50s/it] +2025-02-05 19:48:14 - ERROR - stderr - +2025-02-05 19:48:14 - ERROR - stderr - +2025-02-05 19:48:14 - INFO - stdout - {'loss': 0.7058, 'grad_norm': 1.1107714176177979, 'learning_rate': 1.2630931209582844e-05, 'epoch': 1.3} +2025-02-05 19:48:14 - ERROR - stderr - 43%|████▎ | 9710/22434 [9:40:34<8:50:59, 2.50s/it] +2025-02-05 19:48:16 - ERROR - stderr - 43%|████▎ | 9711/22434 [9:40:36<8:50:20, 2.50s/it] +2025-02-05 19:48:16 - ERROR - stderr - +2025-02-05 19:48:16 - ERROR - stderr - +2025-02-05 19:48:16 - INFO - stdout - {'loss': 0.7793, 'grad_norm': 1.1467139720916748, 'learning_rate': 1.2629538298096641e-05, 'epoch': 1.3} +2025-02-05 19:48:16 - ERROR - stderr - 43%|████▎ | 9711/22434 [9:40:36<8:50:20, 2.50s/it] +2025-02-05 19:48:19 - ERROR - stderr - 43%|████▎ | 9712/22434 [9:40:38<8:46:15, 2.48s/it] +2025-02-05 19:48:19 - ERROR - stderr - +2025-02-05 19:48:19 - 
ERROR - stderr - +2025-02-05 19:48:19 - INFO - stdout - {'loss': 0.7095, 'grad_norm': 1.1824299097061157, 'learning_rate': 1.2628145331800226e-05, 'epoch': 1.3} +2025-02-05 19:48:19 - ERROR - stderr - 43%|████▎ | 9712/22434 [9:40:39<8:46:15, 2.48s/it] +2025-02-05 19:48:21 - ERROR - stderr - 43%|████▎ | 9713/22434 [9:40:41<8:46:49, 2.48s/it] +2025-02-05 19:48:21 - ERROR - stderr - +2025-02-05 19:48:21 - ERROR - stderr - +2025-02-05 19:48:21 - INFO - stdout - {'loss': 0.7512, 'grad_norm': 1.2506284713745117, 'learning_rate': 1.2626752310722637e-05, 'epoch': 1.3} +2025-02-05 19:48:21 - ERROR - stderr - 43%|████▎ | 9713/22434 [9:40:41<8:46:49, 2.48s/it] +2025-02-05 19:48:24 - ERROR - stderr - 43%|████▎ | 9714/22434 [9:40:43<8:42:58, 2.47s/it] +2025-02-05 19:48:24 - ERROR - stderr - +2025-02-05 19:48:24 - ERROR - stderr - +2025-02-05 19:48:24 - INFO - stdout - {'loss': 0.7968, 'grad_norm': 1.1447522640228271, 'learning_rate': 1.2625359234892906e-05, 'epoch': 1.3} +2025-02-05 19:48:24 - ERROR - stderr - 43%|████▎ | 9714/22434 [9:40:43<8:42:58, 2.47s/it] +2025-02-05 19:48:26 - ERROR - stderr - 43%|████▎ | 9715/22434 [9:40:46<8:41:53, 2.46s/it] +2025-02-05 19:48:26 - ERROR - stderr - +2025-02-05 19:48:26 - ERROR - stderr - +2025-02-05 19:48:26 - INFO - stdout - {'loss': 0.7642, 'grad_norm': 1.2814137935638428, 'learning_rate': 1.262396610434008e-05, 'epoch': 1.3} +2025-02-05 19:48:26 - ERROR - stderr - 43%|████▎ | 9715/22434 [9:40:46<8:41:53, 2.46s/it] +2025-02-05 19:48:29 - ERROR - stderr - 43%|████▎ | 9716/22434 [9:40:48<8:44:26, 2.47s/it] +2025-02-05 19:48:29 - ERROR - stderr - +2025-02-05 19:48:29 - ERROR - stderr - +2025-02-05 19:48:29 - INFO - stdout - {'loss': 0.7714, 'grad_norm': 1.371208667755127, 'learning_rate': 1.2622572919093188e-05, 'epoch': 1.3} +2025-02-05 19:48:29 - ERROR - stderr - 43%|████▎ | 9716/22434 [9:40:48<8:44:26, 2.47s/it] +2025-02-05 19:48:31 - ERROR - stderr - 43%|████▎ | 9717/22434 [9:40:51<8:48:36, 2.49s/it] +2025-02-05 19:48:31 - ERROR - 
stderr - +2025-02-05 19:48:31 - ERROR - stderr - +2025-02-05 19:48:31 - INFO - stdout - {'loss': 0.6613, 'grad_norm': 1.1869853734970093, 'learning_rate': 1.2621179679181273e-05, 'epoch': 1.3} +2025-02-05 19:48:31 - ERROR - stderr - 43%|████▎ | 9717/22434 [9:40:51<8:48:36, 2.49s/it] +2025-02-05 19:48:34 - ERROR - stderr - 43%|████▎ | 9718/22434 [9:40:53<8:43:52, 2.47s/it] +2025-02-05 19:48:34 - ERROR - stderr - +2025-02-05 19:48:34 - ERROR - stderr - +2025-02-05 19:48:34 - INFO - stdout - {'loss': 0.6795, 'grad_norm': 1.1784124374389648, 'learning_rate': 1.2619786384633374e-05, 'epoch': 1.3} +2025-02-05 19:48:34 - ERROR - stderr - 43%|████▎ | 9718/22434 [9:40:53<8:43:52, 2.47s/it] +2025-02-05 19:48:36 - ERROR - stderr - 43%|████▎ | 9719/22434 [9:40:56<8:43:10, 2.47s/it] +2025-02-05 19:48:36 - ERROR - stderr - +2025-02-05 19:48:36 - ERROR - stderr - +2025-02-05 19:48:36 - INFO - stdout - {'loss': 0.6611, 'grad_norm': 1.1134071350097656, 'learning_rate': 1.261839303547854e-05, 'epoch': 1.3} +2025-02-05 19:48:36 - ERROR - stderr - 43%|████▎ | 9719/22434 [9:40:56<8:43:10, 2.47s/it] +2025-02-05 19:48:38 - ERROR - stderr - 43%|████▎ | 9720/22434 [9:40:58<8:44:53, 2.48s/it] +2025-02-05 19:48:39 - ERROR - stderr - +2025-02-05 19:48:39 - ERROR - stderr - +2025-02-05 19:48:39 - INFO - stdout - {'loss': 0.7883, 'grad_norm': 1.1958562135696411, 'learning_rate': 1.2616999631745807e-05, 'epoch': 1.3} +2025-02-05 19:48:39 - ERROR - stderr - 43%|████▎ | 9720/22434 [9:40:58<8:44:53, 2.48s/it] +2025-02-05 19:48:41 - ERROR - stderr - 43%|████▎ | 9721/22434 [9:41:01<8:47:10, 2.49s/it] +2025-02-05 19:48:41 - ERROR - stderr - +2025-02-05 19:48:41 - ERROR - stderr - +2025-02-05 19:48:41 - INFO - stdout - {'loss': 0.7567, 'grad_norm': 1.264787197113037, 'learning_rate': 1.2615606173464216e-05, 'epoch': 1.3} +2025-02-05 19:48:41 - ERROR - stderr - 43%|████▎ | 9721/22434 [9:41:01<8:47:10, 2.49s/it] +2025-02-05 19:48:43 - ERROR - stderr - 43%|████▎ | 9722/22434 [9:41:03<8:47:08, 2.49s/it] 
+2025-02-05 19:48:44 - ERROR - stderr - +2025-02-05 19:48:44 - ERROR - stderr - +2025-02-05 19:48:44 - INFO - stdout - {'loss': 0.8026, 'grad_norm': 1.2790603637695312, 'learning_rate': 1.2614212660662822e-05, 'epoch': 1.3} +2025-02-05 19:48:44 - ERROR - stderr - 43%|████▎ | 9722/22434 [9:41:03<8:47:08, 2.49s/it] +2025-02-05 19:48:46 - ERROR - stderr - 43%|████▎ | 9723/22434 [9:41:06<8:46:32, 2.49s/it] +2025-02-05 19:48:46 - ERROR - stderr - +2025-02-05 19:48:46 - ERROR - stderr - +2025-02-05 19:48:46 - INFO - stdout - {'loss': 0.626, 'grad_norm': 1.1667567491531372, 'learning_rate': 1.2612819093370667e-05, 'epoch': 1.3} +2025-02-05 19:48:46 - ERROR - stderr - 43%|████▎ | 9723/22434 [9:41:06<8:46:32, 2.49s/it] +2025-02-05 19:48:49 - ERROR - stderr - 43%|████▎ | 9724/22434 [9:41:08<8:52:40, 2.51s/it] +2025-02-05 19:48:49 - ERROR - stderr - +2025-02-05 19:48:49 - ERROR - stderr - +2025-02-05 19:48:49 - INFO - stdout - {'loss': 0.7583, 'grad_norm': 1.2687195539474487, 'learning_rate': 1.2611425471616796e-05, 'epoch': 1.3} +2025-02-05 19:48:49 - ERROR - stderr - 43%|████▎ | 9724/22434 [9:41:08<8:52:40, 2.51s/it] +2025-02-05 19:48:51 - ERROR - stderr - 43%|████▎ | 9725/22434 [9:41:11<8:54:39, 2.52s/it] +2025-02-05 19:48:51 - ERROR - stderr - +2025-02-05 19:48:51 - ERROR - stderr - +2025-02-05 19:48:51 - INFO - stdout - {'loss': 0.7527, 'grad_norm': 1.2207969427108765, 'learning_rate': 1.261003179543026e-05, 'epoch': 1.3} +2025-02-05 19:48:51 - ERROR - stderr - 43%|████▎ | 9725/22434 [9:41:11<8:54:39, 2.52s/it] +2025-02-05 19:48:54 - ERROR - stderr - 43%|████▎ | 9726/22434 [9:41:14<9:05:50, 2.58s/it] +2025-02-05 19:48:54 - ERROR - stderr - +2025-02-05 19:48:54 - ERROR - stderr - +2025-02-05 19:48:54 - INFO - stdout - {'loss': 0.7364, 'grad_norm': 1.2442119121551514, 'learning_rate': 1.2608638064840108e-05, 'epoch': 1.3} +2025-02-05 19:48:54 - ERROR - stderr - 43%|████▎ | 9726/22434 [9:41:14<9:05:50, 2.58s/it] +2025-02-05 19:48:56 - ERROR - stderr - 43%|████▎ | 9727/22434 
[9:41:16<9:09:54, 2.60s/it] +2025-02-05 19:48:56 - ERROR - stderr - +2025-02-05 19:48:56 - ERROR - stderr - +2025-02-05 19:48:56 - INFO - stdout - {'loss': 0.7548, 'grad_norm': 1.244952917098999, 'learning_rate': 1.2607244279875395e-05, 'epoch': 1.3} +2025-02-05 19:48:56 - ERROR - stderr - 43%|████▎ | 9727/22434 [9:41:16<9:09:54, 2.60s/it] +2025-02-05 19:48:59 - ERROR - stderr - 43%|████▎ | 9728/22434 [9:41:19<9:05:06, 2.57s/it] +2025-02-05 19:48:59 - ERROR - stderr - +2025-02-05 19:48:59 - ERROR - stderr - +2025-02-05 19:48:59 - INFO - stdout - {'loss': 0.6898, 'grad_norm': 1.164206624031067, 'learning_rate': 1.2605850440565165e-05, 'epoch': 1.3} +2025-02-05 19:48:59 - ERROR - stderr - 43%|████▎ | 9728/22434 [9:41:19<9:05:06, 2.57s/it] +2025-02-05 19:49:02 - ERROR - stderr - 43%|████▎ | 9729/22434 [9:41:21<9:05:40, 2.58s/it] +2025-02-05 19:49:02 - ERROR - stderr - +2025-02-05 19:49:02 - ERROR - stderr - +2025-02-05 19:49:02 - INFO - stdout - {'loss': 0.8108, 'grad_norm': 1.3708223104476929, 'learning_rate': 1.260445654693848e-05, 'epoch': 1.3} +2025-02-05 19:49:02 - ERROR - stderr - 43%|████▎ | 9729/22434 [9:41:21<9:05:40, 2.58s/it] +2025-02-05 19:49:04 - ERROR - stderr - 43%|████▎ | 9730/22434 [9:41:24<9:04:44, 2.57s/it] +2025-02-05 19:49:04 - ERROR - stderr - +2025-02-05 19:49:04 - ERROR - stderr - +2025-02-05 19:49:04 - INFO - stdout - {'loss': 0.6961, 'grad_norm': 1.189103364944458, 'learning_rate': 1.260306259902439e-05, 'epoch': 1.3} +2025-02-05 19:49:04 - ERROR - stderr - 43%|████▎ | 9730/22434 [9:41:24<9:04:44, 2.57s/it] +2025-02-05 19:49:07 - ERROR - stderr - 43%|████▎ | 9731/22434 [9:41:27<9:11:22, 2.60s/it] +2025-02-05 19:49:07 - ERROR - stderr - +2025-02-05 19:49:07 - ERROR - stderr - +2025-02-05 19:49:07 - INFO - stdout - {'loss': 0.7406, 'grad_norm': 1.233698844909668, 'learning_rate': 1.2601668596851953e-05, 'epoch': 1.3} +2025-02-05 19:49:07 - ERROR - stderr - 43%|████▎ | 9731/22434 [9:41:27<9:11:22, 2.60s/it] +2025-02-05 19:49:10 - ERROR - stderr 
- 43%|████▎ | 9732/22434 [9:41:29<9:19:45, 2.64s/it] +2025-02-05 19:49:10 - ERROR - stderr - +2025-02-05 19:49:10 - ERROR - stderr - +2025-02-05 19:49:10 - INFO - stdout - {'loss': 0.7634, 'grad_norm': 1.2103241682052612, 'learning_rate': 1.2600274540450222e-05, 'epoch': 1.3} +2025-02-05 19:49:10 - ERROR - stderr - 43%|████▎ | 9732/22434 [9:41:29<9:19:45, 2.64s/it] +2025-02-05 19:49:12 - ERROR - stderr - 43%|████▎ | 9733/22434 [9:41:32<9:15:31, 2.62s/it] +2025-02-05 19:49:12 - ERROR - stderr - +2025-02-05 19:49:12 - ERROR - stderr - +2025-02-05 19:49:12 - INFO - stdout - {'loss': 0.765, 'grad_norm': 1.1698404550552368, 'learning_rate': 1.2598880429848252e-05, 'epoch': 1.3} +2025-02-05 19:49:12 - ERROR - stderr - 43%|████▎ | 9733/22434 [9:41:32<9:15:31, 2.62s/it] +2025-02-05 19:49:15 - ERROR - stderr - 43%|████▎ | 9734/22434 [9:41:34<9:10:57, 2.60s/it] +2025-02-05 19:49:15 - ERROR - stderr - +2025-02-05 19:49:15 - ERROR - stderr - +2025-02-05 19:49:15 - INFO - stdout - {'loss': 0.7123, 'grad_norm': 1.324569821357727, 'learning_rate': 1.259748626507511e-05, 'epoch': 1.3} +2025-02-05 19:49:15 - ERROR - stderr - 43%|████▎ | 9734/22434 [9:41:34<9:10:57, 2.60s/it] +2025-02-05 19:49:17 - ERROR - stderr - 43%|████▎ | 9735/22434 [9:41:37<9:12:36, 2.61s/it] +2025-02-05 19:49:17 - ERROR - stderr - +2025-02-05 19:49:17 - ERROR - stderr - +2025-02-05 19:49:17 - INFO - stdout - {'loss': 0.6729, 'grad_norm': 1.2169783115386963, 'learning_rate': 1.2596092046159854e-05, 'epoch': 1.3} +2025-02-05 19:49:17 - ERROR - stderr - 43%|████▎ | 9735/22434 [9:41:37<9:12:36, 2.61s/it] +2025-02-05 19:49:20 - ERROR - stderr - 43%|████▎ | 9736/22434 [9:41:40<9:05:22, 2.58s/it] +2025-02-05 19:49:20 - ERROR - stderr - +2025-02-05 19:49:20 - ERROR - stderr - +2025-02-05 19:49:20 - INFO - stdout - {'loss': 0.7708, 'grad_norm': 1.226935625076294, 'learning_rate': 1.2594697773131542e-05, 'epoch': 1.3} +2025-02-05 19:49:20 - ERROR - stderr - 43%|████▎ | 9736/22434 [9:41:40<9:05:22, 2.58s/it] +2025-02-05 
19:49:22 - ERROR - stderr - 43%|████▎ | 9737/22434 [9:41:42<9:01:50, 2.56s/it] +2025-02-05 19:49:22 - ERROR - stderr - +2025-02-05 19:49:22 - ERROR - stderr - +2025-02-05 19:49:22 - INFO - stdout - {'loss': 0.7841, 'grad_norm': 1.1610592603683472, 'learning_rate': 1.2593303446019234e-05, 'epoch': 1.3} +2025-02-05 19:49:22 - ERROR - stderr - 43%|████▎ | 9737/22434 [9:41:42<9:01:50, 2.56s/it] +2025-02-05 19:49:25 - ERROR - stderr - 43%|████▎ | 9738/22434 [9:41:44<8:54:08, 2.52s/it] +2025-02-05 19:49:25 - ERROR - stderr - +2025-02-05 19:49:25 - ERROR - stderr - +2025-02-05 19:49:25 - INFO - stdout - {'loss': 0.7388, 'grad_norm': 1.2631546258926392, 'learning_rate': 1.2591909064852002e-05, 'epoch': 1.3} +2025-02-05 19:49:25 - ERROR - stderr - 43%|████▎ | 9738/22434 [9:41:45<8:54:08, 2.52s/it] +2025-02-05 19:49:27 - ERROR - stderr - 43%|████▎ | 9739/22434 [9:41:47<8:56:28, 2.54s/it] +2025-02-05 19:49:27 - ERROR - stderr - +2025-02-05 19:49:27 - ERROR - stderr - +2025-02-05 19:49:27 - INFO - stdout - {'loss': 0.7302, 'grad_norm': 1.096208095550537, 'learning_rate': 1.2590514629658905e-05, 'epoch': 1.3} +2025-02-05 19:49:27 - ERROR - stderr - 43%|████▎ | 9739/22434 [9:41:47<8:56:28, 2.54s/it] +2025-02-05 19:49:30 - ERROR - stderr - 43%|████▎ | 9740/22434 [9:41:50<8:59:12, 2.55s/it] +2025-02-05 19:49:30 - ERROR - stderr - +2025-02-05 19:49:30 - ERROR - stderr - +2025-02-05 19:49:30 - INFO - stdout - {'loss': 0.7587, 'grad_norm': 1.2070401906967163, 'learning_rate': 1.2589120140469007e-05, 'epoch': 1.3} +2025-02-05 19:49:30 - ERROR - stderr - 43%|████▎ | 9740/22434 [9:41:50<8:59:12, 2.55s/it] +2025-02-05 19:49:32 - ERROR - stderr - 43%|████▎ | 9741/22434 [9:41:52<9:02:26, 2.56s/it] +2025-02-05 19:49:33 - ERROR - stderr - +2025-02-05 19:49:33 - ERROR - stderr - +2025-02-05 19:49:33 - INFO - stdout - {'loss': 0.8041, 'grad_norm': 1.188755989074707, 'learning_rate': 1.258772559731138e-05, 'epoch': 1.3} +2025-02-05 19:49:33 - ERROR - stderr - 43%|████▎ | 9741/22434 
[9:41:52<9:02:26, 2.56s/it] +2025-02-05 19:49:35 - ERROR - stderr - 43%|████▎ | 9742/22434 [9:41:55<8:58:02, 2.54s/it] +2025-02-05 19:49:35 - ERROR - stderr - +2025-02-05 19:49:35 - ERROR - stderr - +2025-02-05 19:49:35 - INFO - stdout - {'loss': 0.6858, 'grad_norm': 1.168511152267456, 'learning_rate': 1.2586331000215087e-05, 'epoch': 1.3} +2025-02-05 19:49:35 - ERROR - stderr - 43%|████▎ | 9742/22434 [9:41:55<8:58:02, 2.54s/it] +2025-02-05 19:49:38 - ERROR - stderr - 43%|████▎ | 9743/22434 [9:41:57<8:58:47, 2.55s/it] +2025-02-05 19:49:38 - ERROR - stderr - +2025-02-05 19:49:38 - ERROR - stderr - +2025-02-05 19:49:38 - INFO - stdout - {'loss': 0.6667, 'grad_norm': 1.1929455995559692, 'learning_rate': 1.2584936349209201e-05, 'epoch': 1.3} +2025-02-05 19:49:38 - ERROR - stderr - 43%|████▎ | 9743/22434 [9:41:57<8:58:47, 2.55s/it] +2025-02-05 19:49:40 - ERROR - stderr - 43%|████▎ | 9744/22434 [9:42:00<8:53:29, 2.52s/it] +2025-02-05 19:49:40 - ERROR - stderr - +2025-02-05 19:49:40 - ERROR - stderr - +2025-02-05 19:49:40 - INFO - stdout - {'loss': 0.6325, 'grad_norm': 1.147537350654602, 'learning_rate': 1.258354164432279e-05, 'epoch': 1.3} +2025-02-05 19:49:40 - ERROR - stderr - 43%|████▎ | 9744/22434 [9:42:00<8:53:29, 2.52s/it] +2025-02-05 19:49:42 - ERROR - stderr - 43%|████▎ | 9745/22434 [9:42:02<8:51:46, 2.51s/it] +2025-02-05 19:49:43 - ERROR - stderr - +2025-02-05 19:49:43 - ERROR - stderr - +2025-02-05 19:49:43 - INFO - stdout - {'loss': 0.6657, 'grad_norm': 1.1671128273010254, 'learning_rate': 1.2582146885584925e-05, 'epoch': 1.3} +2025-02-05 19:49:43 - ERROR - stderr - 43%|████▎ | 9745/22434 [9:42:02<8:51:46, 2.51s/it] +2025-02-05 19:49:45 - ERROR - stderr - 43%|████▎ | 9746/22434 [9:42:05<8:49:32, 2.50s/it] +2025-02-05 19:49:45 - ERROR - stderr - +2025-02-05 19:49:45 - ERROR - stderr - +2025-02-05 19:49:45 - INFO - stdout - {'loss': 0.7265, 'grad_norm': 1.2589455842971802, 'learning_rate': 1.2580752073024677e-05, 'epoch': 1.3} +2025-02-05 19:49:45 - ERROR - 
stderr - 43%|████▎ | 9746/22434 [9:42:05<8:49:32, 2.50s/it] +2025-02-05 19:49:48 - ERROR - stderr - 43%|████▎ | 9747/22434 [9:42:07<8:56:55, 2.54s/it] +2025-02-05 19:49:48 - ERROR - stderr - +2025-02-05 19:49:48 - ERROR - stderr - +2025-02-05 19:49:48 - INFO - stdout - {'loss': 0.7022, 'grad_norm': 1.1517754793167114, 'learning_rate': 1.2579357206671126e-05, 'epoch': 1.3} +2025-02-05 19:49:48 - ERROR - stderr - 43%|████▎ | 9747/22434 [9:42:07<8:56:55, 2.54s/it] +2025-02-05 19:49:50 - ERROR - stderr - 43%|████▎ | 9748/22434 [9:42:10<9:07:43, 2.59s/it] +2025-02-05 19:49:50 - ERROR - stderr - +2025-02-05 19:49:50 - ERROR - stderr - +2025-02-05 19:49:50 - INFO - stdout - {'loss': 0.7386, 'grad_norm': 1.2010449171066284, 'learning_rate': 1.2577962286553338e-05, 'epoch': 1.3} +2025-02-05 19:49:50 - ERROR - stderr - 43%|████▎ | 9748/22434 [9:42:10<9:07:43, 2.59s/it] +2025-02-05 19:49:53 - ERROR - stderr - 43%|████▎ | 9749/22434 [9:42:13<9:03:01, 2.57s/it] +2025-02-05 19:49:53 - ERROR - stderr - +2025-02-05 19:49:53 - ERROR - stderr - +2025-02-05 19:49:53 - INFO - stdout - {'loss': 0.644, 'grad_norm': 1.0198960304260254, 'learning_rate': 1.2576567312700394e-05, 'epoch': 1.3} +2025-02-05 19:49:53 - ERROR - stderr - 43%|████▎ | 9749/22434 [9:42:13<9:03:01, 2.57s/it] +2025-02-05 19:49:55 - ERROR - stderr - 43%|████▎ | 9750/22434 [9:42:15<8:56:07, 2.54s/it] +2025-02-05 19:49:55 - ERROR - stderr - +2025-02-05 19:49:55 - ERROR - stderr - +2025-02-05 19:49:55 - INFO - stdout - {'loss': 0.7121, 'grad_norm': 1.1896487474441528, 'learning_rate': 1.2575172285141371e-05, 'epoch': 1.3} +2025-02-05 19:49:55 - ERROR - stderr - 43%|████▎ | 9750/22434 [9:42:15<8:56:07, 2.54s/it] +2025-02-05 19:49:58 - ERROR - stderr - 43%|████▎ | 9751/22434 [9:42:18<8:54:30, 2.53s/it] +2025-02-05 19:49:58 - ERROR - stderr - +2025-02-05 19:49:58 - ERROR - stderr - +2025-02-05 19:49:58 - INFO - stdout - {'loss': 0.7076, 'grad_norm': 1.1681534051895142, 'learning_rate': 1.2573777203905349e-05, 'epoch': 1.3} 
+2025-02-05 19:49:58 - ERROR - stderr - 43%|████▎ | 9751/22434 [9:42:18<8:54:30, 2.53s/it] +2025-02-05 19:50:00 - ERROR - stderr - 43%|████▎ | 9752/22434 [9:42:20<9:05:30, 2.58s/it] +2025-02-05 19:50:01 - ERROR - stderr - +2025-02-05 19:50:01 - ERROR - stderr - +2025-02-05 19:50:01 - INFO - stdout - {'loss': 0.7556, 'grad_norm': 1.150976300239563, 'learning_rate': 1.25723820690214e-05, 'epoch': 1.3} +2025-02-05 19:50:01 - ERROR - stderr - 43%|████▎ | 9752/22434 [9:42:20<9:05:30, 2.58s/it] +2025-02-05 19:50:03 - ERROR - stderr - 43%|████▎ | 9753/22434 [9:42:23<8:57:44, 2.54s/it] +2025-02-05 19:50:03 - ERROR - stderr - +2025-02-05 19:50:03 - ERROR - stderr - +2025-02-05 19:50:03 - INFO - stdout - {'loss': 0.7323, 'grad_norm': 1.247734546661377, 'learning_rate': 1.2570986880518605e-05, 'epoch': 1.3} +2025-02-05 19:50:03 - ERROR - stderr - 43%|████▎ | 9753/22434 [9:42:23<8:57:44, 2.54s/it] +2025-02-05 19:50:05 - ERROR - stderr - 43%|████▎ | 9754/22434 [9:42:25<8:56:05, 2.54s/it] +2025-02-05 19:50:06 - ERROR - stderr - +2025-02-05 19:50:06 - ERROR - stderr - +2025-02-05 19:50:06 - INFO - stdout - {'loss': 0.6584, 'grad_norm': 1.0402568578720093, 'learning_rate': 1.2569591638426054e-05, 'epoch': 1.3} +2025-02-05 19:50:06 - ERROR - stderr - 43%|████▎ | 9754/22434 [9:42:25<8:56:05, 2.54s/it] +2025-02-05 19:50:08 - ERROR - stderr - 43%|████▎ | 9755/22434 [9:42:28<8:53:14, 2.52s/it] +2025-02-05 19:50:08 - ERROR - stderr - +2025-02-05 19:50:08 - ERROR - stderr - +2025-02-05 19:50:08 - INFO - stdout - {'loss': 0.7317, 'grad_norm': 1.172666311264038, 'learning_rate': 1.2568196342772823e-05, 'epoch': 1.3} +2025-02-05 19:50:08 - ERROR - stderr - 43%|████▎ | 9755/22434 [9:42:28<8:53:14, 2.52s/it] +2025-02-05 19:50:11 - ERROR - stderr - 43%|████▎ | 9756/22434 [9:42:30<8:55:16, 2.53s/it] +2025-02-05 19:50:11 - ERROR - stderr - +2025-02-05 19:50:11 - ERROR - stderr - +2025-02-05 19:50:11 - INFO - stdout - {'loss': 0.674, 'grad_norm': 1.2080496549606323, 'learning_rate': 
1.2566800993587997e-05, 'epoch': 1.3} +2025-02-05 19:50:11 - ERROR - stderr - 43%|████▎ | 9756/22434 [9:42:30<8:55:16, 2.53s/it] +2025-02-05 19:50:13 - ERROR - stderr - 43%|████▎ | 9757/22434 [9:42:33<8:48:18, 2.50s/it] +2025-02-05 19:50:13 - ERROR - stderr - +2025-02-05 19:50:13 - ERROR - stderr - +2025-02-05 19:50:13 - INFO - stdout - {'loss': 0.7222, 'grad_norm': 1.1687921285629272, 'learning_rate': 1.2565405590900659e-05, 'epoch': 1.3} +2025-02-05 19:50:13 - ERROR - stderr - 43%|████▎ | 9757/22434 [9:42:33<8:48:18, 2.50s/it] +2025-02-05 19:50:16 - ERROR - stderr - 43%|████▎ | 9758/22434 [9:42:36<9:09:11, 2.60s/it] +2025-02-05 19:50:16 - ERROR - stderr - +2025-02-05 19:50:16 - ERROR - stderr - +2025-02-05 19:50:16 - INFO - stdout - {'loss': 0.7102, 'grad_norm': 1.1584309339523315, 'learning_rate': 1.2564010134739897e-05, 'epoch': 1.3} +2025-02-05 19:50:16 - ERROR - stderr - 43%|████▎ | 9758/22434 [9:42:36<9:09:11, 2.60s/it] +2025-02-05 19:50:19 - ERROR - stderr - 44%|████▎ | 9759/22434 [9:42:38<9:18:21, 2.64s/it] +2025-02-05 19:50:19 - ERROR - stderr - +2025-02-05 19:50:19 - ERROR - stderr - +2025-02-05 19:50:19 - INFO - stdout - {'loss': 0.7514, 'grad_norm': 1.1695126295089722, 'learning_rate': 1.2562614625134797e-05, 'epoch': 1.31} +2025-02-05 19:50:19 - ERROR - stderr - 44%|████▎ | 9759/22434 [9:42:38<9:18:21, 2.64s/it] +2025-02-05 19:50:21 - ERROR - stderr - 44%|████▎ | 9760/22434 [9:42:41<9:08:55, 2.60s/it] +2025-02-05 19:50:21 - ERROR - stderr - +2025-02-05 19:50:21 - ERROR - stderr - +2025-02-05 19:50:21 - INFO - stdout - {'loss': 0.6675, 'grad_norm': 1.1632521152496338, 'learning_rate': 1.2561219062114447e-05, 'epoch': 1.31} +2025-02-05 19:50:21 - ERROR - stderr - 44%|████▎ | 9760/22434 [9:42:41<9:08:55, 2.60s/it] +2025-02-05 19:50:24 - ERROR - stderr - 44%|████▎ | 9761/22434 [9:42:44<9:23:36, 2.67s/it] +2025-02-05 19:50:24 - ERROR - stderr - +2025-02-05 19:50:24 - ERROR - stderr - +2025-02-05 19:50:24 - INFO - stdout - {'loss': 0.7198, 'grad_norm': 
1.182298183441162, 'learning_rate': 1.2559823445707936e-05, 'epoch': 1.31} +2025-02-05 19:50:24 - ERROR - stderr - 44%|████▎ | 9761/22434 [9:42:44<9:23:36, 2.67s/it] +2025-02-05 19:50:26 - ERROR - stderr - 44%|████▎ | 9762/22434 [9:42:46<9:11:51, 2.61s/it] +2025-02-05 19:50:26 - ERROR - stderr - +2025-02-05 19:50:26 - ERROR - stderr - +2025-02-05 19:50:26 - INFO - stdout - {'loss': 0.6733, 'grad_norm': 1.1981295347213745, 'learning_rate': 1.2558427775944357e-05, 'epoch': 1.31} +2025-02-05 19:50:26 - ERROR - stderr - 44%|████▎ | 9762/22434 [9:42:46<9:11:51, 2.61s/it] +2025-02-05 19:50:29 - ERROR - stderr - 44%|████▎ | 9763/22434 [9:42:49<9:00:52, 2.56s/it] +2025-02-05 19:50:29 - ERROR - stderr - +2025-02-05 19:50:29 - ERROR - stderr - +2025-02-05 19:50:29 - INFO - stdout - {'loss': 0.7103, 'grad_norm': 1.0663963556289673, 'learning_rate': 1.25570320528528e-05, 'epoch': 1.31} +2025-02-05 19:50:29 - ERROR - stderr - 44%|████▎ | 9763/22434 [9:42:49<9:00:52, 2.56s/it] +2025-02-05 19:50:31 - ERROR - stderr - 44%|████▎ | 9764/22434 [9:42:51<8:58:06, 2.55s/it] +2025-02-05 19:50:31 - ERROR - stderr - +2025-02-05 19:50:31 - ERROR - stderr - +2025-02-05 19:50:31 - INFO - stdout - {'loss': 0.7443, 'grad_norm': 1.2121855020523071, 'learning_rate': 1.2555636276462356e-05, 'epoch': 1.31} +2025-02-05 19:50:31 - ERROR - stderr - 44%|████▎ | 9764/22434 [9:42:51<8:58:06, 2.55s/it] +2025-02-05 19:50:34 - ERROR - stderr - 44%|████▎ | 9765/22434 [9:42:54<8:57:02, 2.54s/it] +2025-02-05 19:50:34 - ERROR - stderr - +2025-02-05 19:50:34 - ERROR - stderr - +2025-02-05 19:50:34 - INFO - stdout - {'loss': 0.7468, 'grad_norm': 1.295791506767273, 'learning_rate': 1.2554240446802118e-05, 'epoch': 1.31} +2025-02-05 19:50:34 - ERROR - stderr - 44%|████▎ | 9765/22434 [9:42:54<8:57:02, 2.54s/it] +2025-02-05 19:50:36 - ERROR - stderr - 44%|████▎ | 9766/22434 [9:42:56<8:50:40, 2.51s/it] +2025-02-05 19:50:36 - ERROR - stderr - +2025-02-05 19:50:36 - ERROR - stderr - +2025-02-05 19:50:36 - INFO - stdout 
- {'loss': 0.7321, 'grad_norm': 1.2298610210418701, 'learning_rate': 1.2552844563901178e-05, 'epoch': 1.31} +2025-02-05 19:50:36 - ERROR - stderr - 44%|████▎ | 9766/22434 [9:42:56<8:50:40, 2.51s/it] +2025-02-05 19:50:39 - ERROR - stderr - 44%|████▎ | 9767/22434 [9:42:58<8:47:58, 2.50s/it] +2025-02-05 19:50:39 - ERROR - stderr - +2025-02-05 19:50:39 - ERROR - stderr - +2025-02-05 19:50:39 - INFO - stdout - {'loss': 0.6218, 'grad_norm': 1.0248095989227295, 'learning_rate': 1.2551448627788641e-05, 'epoch': 1.31} +2025-02-05 19:50:39 - ERROR - stderr - 44%|████▎ | 9767/22434 [9:42:59<8:47:58, 2.50s/it] +2025-02-05 19:50:41 - ERROR - stderr - 44%|████▎ | 9768/22434 [9:43:01<8:46:59, 2.50s/it] +2025-02-05 19:50:41 - ERROR - stderr - +2025-02-05 19:50:41 - ERROR - stderr - +2025-02-05 19:50:41 - INFO - stdout - {'loss': 0.7273, 'grad_norm': 1.1197268962860107, 'learning_rate': 1.2550052638493597e-05, 'epoch': 1.31} +2025-02-05 19:50:41 - ERROR - stderr - 44%|████▎ | 9768/22434 [9:43:01<8:46:59, 2.50s/it] +2025-02-05 19:50:44 - ERROR - stderr - 44%|████▎ | 9769/22434 [9:43:03<8:42:44, 2.48s/it] +2025-02-05 19:50:44 - ERROR - stderr - +2025-02-05 19:50:44 - ERROR - stderr - +2025-02-05 19:50:44 - INFO - stdout - {'loss': 0.6861, 'grad_norm': 1.0497872829437256, 'learning_rate': 1.2548656596045147e-05, 'epoch': 1.31} +2025-02-05 19:50:44 - ERROR - stderr - 44%|████▎ | 9769/22434 [9:43:03<8:42:44, 2.48s/it] +2025-02-05 19:50:46 - ERROR - stderr - 44%|████▎ | 9770/22434 [9:43:06<8:47:03, 2.50s/it] +2025-02-05 19:50:46 - ERROR - stderr - +2025-02-05 19:50:46 - ERROR - stderr - +2025-02-05 19:50:46 - INFO - stdout - {'loss': 0.6677, 'grad_norm': 1.1599812507629395, 'learning_rate': 1.254726050047239e-05, 'epoch': 1.31} +2025-02-05 19:50:46 - ERROR - stderr - 44%|████▎ | 9770/22434 [9:43:06<8:47:03, 2.50s/it] +2025-02-05 19:50:49 - ERROR - stderr - 44%|████▎ | 9771/22434 [9:43:08<8:49:30, 2.51s/it] +2025-02-05 19:50:49 - ERROR - stderr - +2025-02-05 19:50:49 - ERROR - stderr - 
+2025-02-05 19:50:49 - INFO - stdout - {'loss': 0.7195, 'grad_norm': 1.0988116264343262, 'learning_rate': 1.2545864351804423e-05, 'epoch': 1.31} +2025-02-05 19:50:49 - ERROR - stderr - 44%|████▎ | 9771/22434 [9:43:09<8:49:30, 2.51s/it] +2025-02-05 19:50:51 - ERROR - stderr - 44%|████▎ | 9772/22434 [9:43:11<8:50:16, 2.51s/it] +2025-02-05 19:50:51 - ERROR - stderr - +2025-02-05 19:50:51 - ERROR - stderr - +2025-02-05 19:50:51 - INFO - stdout - {'loss': 0.7278, 'grad_norm': 1.0831265449523926, 'learning_rate': 1.2544468150070351e-05, 'epoch': 1.31} +2025-02-05 19:50:51 - ERROR - stderr - 44%|████▎ | 9772/22434 [9:43:11<8:50:16, 2.51s/it] +2025-02-05 19:50:54 - ERROR - stderr - 44%|████▎ | 9773/22434 [9:43:13<8:46:54, 2.50s/it] +2025-02-05 19:50:54 - ERROR - stderr - +2025-02-05 19:50:54 - ERROR - stderr - +2025-02-05 19:50:54 - INFO - stdout - {'loss': 0.6905, 'grad_norm': 1.0655025243759155, 'learning_rate': 1.2543071895299272e-05, 'epoch': 1.31} +2025-02-05 19:50:54 - ERROR - stderr - 44%|████▎ | 9773/22434 [9:43:14<8:46:54, 2.50s/it] +2025-02-05 19:50:56 - ERROR - stderr - 44%|████▎ | 9774/22434 [9:43:16<8:44:04, 2.48s/it] +2025-02-05 19:50:56 - ERROR - stderr - +2025-02-05 19:50:56 - ERROR - stderr - +2025-02-05 19:50:56 - INFO - stdout - {'loss': 0.7488, 'grad_norm': 1.089412808418274, 'learning_rate': 1.2541675587520296e-05, 'epoch': 1.31} +2025-02-05 19:50:56 - ERROR - stderr - 44%|████▎ | 9774/22434 [9:43:16<8:44:04, 2.48s/it] +2025-02-05 19:50:59 - ERROR - stderr - 44%|████▎ | 9775/22434 [9:43:18<8:46:51, 2.50s/it] +2025-02-05 19:50:59 - ERROR - stderr - +2025-02-05 19:50:59 - ERROR - stderr - +2025-02-05 19:50:59 - INFO - stdout - {'loss': 0.6901, 'grad_norm': 1.0837054252624512, 'learning_rate': 1.2540279226762526e-05, 'epoch': 1.31} +2025-02-05 19:50:59 - ERROR - stderr - 44%|████▎ | 9775/22434 [9:43:19<8:46:51, 2.50s/it] +2025-02-05 19:51:01 - ERROR - stderr - 44%|████▎ | 9776/22434 [9:43:21<8:47:34, 2.50s/it] +2025-02-05 19:51:01 - ERROR - stderr - 
+2025-02-05 19:51:01 - ERROR - stderr - +2025-02-05 19:51:01 - INFO - stdout - {'loss': 0.6739, 'grad_norm': 1.1585474014282227, 'learning_rate': 1.2538882813055064e-05, 'epoch': 1.31} +2025-02-05 19:51:01 - ERROR - stderr - 44%|████▎ | 9776/22434 [9:43:21<8:47:34, 2.50s/it] +2025-02-05 19:51:04 - ERROR - stderr - 44%|████▎ | 9777/22434 [9:43:23<8:48:21, 2.50s/it] +2025-02-05 19:51:04 - ERROR - stderr - +2025-02-05 19:51:04 - ERROR - stderr - +2025-02-05 19:51:04 - INFO - stdout - {'loss': 0.7617, 'grad_norm': 1.3716362714767456, 'learning_rate': 1.253748634642702e-05, 'epoch': 1.31} +2025-02-05 19:51:04 - ERROR - stderr - 44%|████▎ | 9777/22434 [9:43:24<8:48:21, 2.50s/it] +2025-02-05 19:51:06 - ERROR - stderr - 44%|████▎ | 9778/22434 [9:43:26<8:43:56, 2.48s/it] +2025-02-05 19:51:06 - ERROR - stderr - +2025-02-05 19:51:06 - ERROR - stderr - +2025-02-05 19:51:06 - INFO - stdout - {'loss': 0.72, 'grad_norm': 1.1717309951782227, 'learning_rate': 1.25360898269075e-05, 'epoch': 1.31} +2025-02-05 19:51:06 - ERROR - stderr - 44%|████▎ | 9778/22434 [9:43:26<8:43:56, 2.48s/it] +2025-02-05 19:51:09 - ERROR - stderr - 44%|████▎ | 9779/22434 [9:43:28<8:40:45, 2.47s/it] +2025-02-05 19:51:09 - ERROR - stderr - +2025-02-05 19:51:09 - ERROR - stderr - +2025-02-05 19:51:09 - INFO - stdout - {'loss': 0.6551, 'grad_norm': 1.1438543796539307, 'learning_rate': 1.2534693254525614e-05, 'epoch': 1.31} +2025-02-05 19:51:09 - ERROR - stderr - 44%|████▎ | 9779/22434 [9:43:28<8:40:45, 2.47s/it] +2025-02-05 19:51:11 - ERROR - stderr - 44%|████▎ | 9780/22434 [9:43:31<8:50:43, 2.52s/it] +2025-02-05 19:51:11 - ERROR - stderr - +2025-02-05 19:51:11 - ERROR - stderr - +2025-02-05 19:51:11 - INFO - stdout - {'loss': 0.7289, 'grad_norm': 1.1696605682373047, 'learning_rate': 1.2533296629310477e-05, 'epoch': 1.31} +2025-02-05 19:51:11 - ERROR - stderr - 44%|████▎ | 9780/22434 [9:43:31<8:50:43, 2.52s/it] +2025-02-05 19:51:14 - ERROR - stderr - 44%|████▎ | 9781/22434 [9:43:33<8:50:00, 2.51s/it] 
+2025-02-05 19:51:14 - ERROR - stderr - +2025-02-05 19:51:14 - ERROR - stderr - +2025-02-05 19:51:14 - INFO - stdout - {'loss': 0.7452, 'grad_norm': 1.108705997467041, 'learning_rate': 1.253189995129119e-05, 'epoch': 1.31} +2025-02-05 19:51:14 - ERROR - stderr - 44%|████▎ | 9781/22434 [9:43:34<8:50:00, 2.51s/it] +2025-02-05 19:51:16 - ERROR - stderr - 44%|████▎ | 9782/22434 [9:43:36<8:46:35, 2.50s/it] +2025-02-05 19:51:16 - ERROR - stderr - +2025-02-05 19:51:16 - ERROR - stderr - +2025-02-05 19:51:16 - INFO - stdout - {'loss': 0.6944, 'grad_norm': 1.2557648420333862, 'learning_rate': 1.2530503220496875e-05, 'epoch': 1.31} +2025-02-05 19:51:16 - ERROR - stderr - 44%|████▎ | 9782/22434 [9:43:36<8:46:35, 2.50s/it] +2025-02-05 19:51:19 - ERROR - stderr - 44%|████▎ | 9783/22434 [9:43:39<8:50:41, 2.52s/it] +2025-02-05 19:51:19 - ERROR - stderr - +2025-02-05 19:51:19 - ERROR - stderr - +2025-02-05 19:51:19 - INFO - stdout - {'loss': 0.8241, 'grad_norm': 1.1566818952560425, 'learning_rate': 1.2529106436956642e-05, 'epoch': 1.31} +2025-02-05 19:51:19 - ERROR - stderr - 44%|████▎ | 9783/22434 [9:43:39<8:50:41, 2.52s/it] +2025-02-05 19:51:21 - ERROR - stderr - 44%|████▎ | 9784/22434 [9:43:41<8:53:04, 2.53s/it] +2025-02-05 19:51:21 - ERROR - stderr - +2025-02-05 19:51:21 - ERROR - stderr - +2025-02-05 19:51:21 - INFO - stdout - {'loss': 0.7291, 'grad_norm': 1.4061435461044312, 'learning_rate': 1.2527709600699605e-05, 'epoch': 1.31} +2025-02-05 19:51:21 - ERROR - stderr - 44%|████▎ | 9784/22434 [9:43:41<8:53:04, 2.53s/it] +2025-02-05 19:51:24 - ERROR - stderr - 44%|████▎ | 9785/22434 [9:43:44<8:52:09, 2.52s/it] +2025-02-05 19:51:24 - ERROR - stderr - +2025-02-05 19:51:24 - ERROR - stderr - +2025-02-05 19:51:24 - INFO - stdout - {'loss': 0.7503, 'grad_norm': 1.1407550573349, 'learning_rate': 1.2526312711754877e-05, 'epoch': 1.31} +2025-02-05 19:51:24 - ERROR - stderr - 44%|████▎ | 9785/22434 [9:43:44<8:52:09, 2.52s/it] +2025-02-05 19:51:26 - ERROR - stderr - 44%|████▎ | 
9786/22434 [9:43:46<8:45:42, 2.49s/it] +2025-02-05 19:51:26 - ERROR - stderr - +2025-02-05 19:51:26 - ERROR - stderr - +2025-02-05 19:51:26 - INFO - stdout - {'loss': 0.7478, 'grad_norm': 1.1367970705032349, 'learning_rate': 1.252491577015158e-05, 'epoch': 1.31} +2025-02-05 19:51:26 - ERROR - stderr - 44%|████▎ | 9786/22434 [9:43:46<8:45:42, 2.49s/it] +2025-02-05 19:51:29 - ERROR - stderr - 44%|████▎ | 9787/22434 [9:43:49<8:46:56, 2.50s/it] +2025-02-05 19:51:29 - ERROR - stderr - +2025-02-05 19:51:29 - ERROR - stderr - +2025-02-05 19:51:29 - INFO - stdout - {'loss': 0.7855, 'grad_norm': 1.2011934518814087, 'learning_rate': 1.252351877591883e-05, 'epoch': 1.31} +2025-02-05 19:51:29 - ERROR - stderr - 44%|████▎ | 9787/22434 [9:43:49<8:46:56, 2.50s/it] +2025-02-05 19:51:31 - ERROR - stderr - 44%|████▎ | 9788/22434 [9:43:51<8:46:59, 2.50s/it] +2025-02-05 19:51:31 - ERROR - stderr - +2025-02-05 19:51:31 - ERROR - stderr - +2025-02-05 19:51:31 - INFO - stdout - {'loss': 0.714, 'grad_norm': 1.316261887550354, 'learning_rate': 1.2522121729085748e-05, 'epoch': 1.31} +2025-02-05 19:51:31 - ERROR - stderr - 44%|████▎ | 9788/22434 [9:43:51<8:46:59, 2.50s/it] +2025-02-05 19:51:34 - ERROR - stderr - 44%|████▎ | 9789/22434 [9:43:54<8:57:55, 2.55s/it] +2025-02-05 19:51:34 - ERROR - stderr - +2025-02-05 19:51:34 - ERROR - stderr - +2025-02-05 19:51:34 - INFO - stdout - {'loss': 0.7108, 'grad_norm': 1.2697792053222656, 'learning_rate': 1.252072462968145e-05, 'epoch': 1.31} +2025-02-05 19:51:34 - ERROR - stderr - 44%|████▎ | 9789/22434 [9:43:54<8:57:55, 2.55s/it] +2025-02-05 19:51:36 - ERROR - stderr - 44%|████▎ | 9790/22434 [9:43:56<8:54:45, 2.54s/it] +2025-02-05 19:51:36 - ERROR - stderr - +2025-02-05 19:51:36 - ERROR - stderr - +2025-02-05 19:51:36 - INFO - stdout - {'loss': 0.6427, 'grad_norm': 1.1292520761489868, 'learning_rate': 1.2519327477735059e-05, 'epoch': 1.31} +2025-02-05 19:51:36 - ERROR - stderr - 44%|████▎ | 9790/22434 [9:43:56<8:54:45, 2.54s/it] +2025-02-05 19:51:39 
- ERROR - stderr - 44%|████▎ | 9791/22434 [9:43:59<8:48:03, 2.51s/it] +2025-02-05 19:51:39 - ERROR - stderr - +2025-02-05 19:51:39 - ERROR - stderr - +2025-02-05 19:51:39 - INFO - stdout - {'loss': 0.6429, 'grad_norm': 1.155401349067688, 'learning_rate': 1.2517930273275698e-05, 'epoch': 1.31} +2025-02-05 19:51:39 - ERROR - stderr - 44%|████▎ | 9791/22434 [9:43:59<8:48:03, 2.51s/it] +2025-02-05 19:51:41 - ERROR - stderr - 44%|████▎ | 9792/22434 [9:44:01<8:56:16, 2.55s/it] +2025-02-05 19:51:42 - ERROR - stderr - +2025-02-05 19:51:42 - ERROR - stderr - +2025-02-05 19:51:42 - INFO - stdout - {'loss': 0.7268, 'grad_norm': 1.116864800453186, 'learning_rate': 1.2516533016332489e-05, 'epoch': 1.31} +2025-02-05 19:51:42 - ERROR - stderr - 44%|████▎ | 9792/22434 [9:44:01<8:56:16, 2.55s/it] +2025-02-05 19:51:44 - ERROR - stderr - 44%|████▎ | 9793/22434 [9:44:04<8:49:40, 2.51s/it] +2025-02-05 19:51:44 - ERROR - stderr - +2025-02-05 19:51:44 - ERROR - stderr - +2025-02-05 19:51:44 - INFO - stdout - {'loss': 0.6833, 'grad_norm': 1.2376480102539062, 'learning_rate': 1.2515135706934556e-05, 'epoch': 1.31} +2025-02-05 19:51:44 - ERROR - stderr - 44%|████▎ | 9793/22434 [9:44:04<8:49:40, 2.51s/it] +2025-02-05 19:51:46 - ERROR - stderr - 44%|████▎ | 9794/22434 [9:44:06<8:48:26, 2.51s/it] +2025-02-05 19:51:46 - ERROR - stderr - +2025-02-05 19:51:46 - ERROR - stderr - +2025-02-05 19:51:46 - INFO - stdout - {'loss': 0.7223, 'grad_norm': 1.1512469053268433, 'learning_rate': 1.2513738345111029e-05, 'epoch': 1.31} +2025-02-05 19:51:46 - ERROR - stderr - 44%|████▎ | 9794/22434 [9:44:06<8:48:26, 2.51s/it] +2025-02-05 19:51:49 - ERROR - stderr - 44%|████▎ | 9795/22434 [9:44:09<9:14:48, 2.63s/it] +2025-02-05 19:51:49 - ERROR - stderr - +2025-02-05 19:51:49 - ERROR - stderr - +2025-02-05 19:51:49 - INFO - stdout - {'loss': 0.7514, 'grad_norm': 1.230810284614563, 'learning_rate': 1.251234093089103e-05, 'epoch': 1.31} +2025-02-05 19:51:49 - ERROR - stderr - 44%|████▎ | 9795/22434 [9:44:09<9:14:48, 
2.63s/it] +2025-02-05 19:51:52 - ERROR - stderr - 44%|████▎ | 9796/22434 [9:44:12<9:02:24, 2.58s/it] +2025-02-05 19:51:52 - ERROR - stderr - +2025-02-05 19:51:52 - ERROR - stderr - +2025-02-05 19:51:52 - INFO - stdout - {'loss': 0.7205, 'grad_norm': 1.206926703453064, 'learning_rate': 1.2510943464303688e-05, 'epoch': 1.31} +2025-02-05 19:51:52 - ERROR - stderr - 44%|████▎ | 9796/22434 [9:44:12<9:02:24, 2.58s/it] +2025-02-05 19:51:54 - ERROR - stderr - 44%|████▎ | 9797/22434 [9:44:14<9:00:30, 2.57s/it] +2025-02-05 19:51:54 - ERROR - stderr - +2025-02-05 19:51:54 - ERROR - stderr - +2025-02-05 19:51:54 - INFO - stdout - {'loss': 0.8289, 'grad_norm': 1.2250057458877563, 'learning_rate': 1.2509545945378134e-05, 'epoch': 1.31} +2025-02-05 19:51:54 - ERROR - stderr - 44%|████▎ | 9797/22434 [9:44:14<9:00:30, 2.57s/it] +2025-02-05 19:51:57 - ERROR - stderr - 44%|████▎ | 9798/22434 [9:44:17<8:52:05, 2.53s/it] +2025-02-05 19:51:57 - ERROR - stderr - +2025-02-05 19:51:57 - ERROR - stderr - +2025-02-05 19:51:57 - INFO - stdout - {'loss': 0.6943, 'grad_norm': 1.0991623401641846, 'learning_rate': 1.2508148374143492e-05, 'epoch': 1.31} +2025-02-05 19:51:57 - ERROR - stderr - 44%|████▎ | 9798/22434 [9:44:17<8:52:05, 2.53s/it] +2025-02-05 19:51:59 - ERROR - stderr - 44%|████▎ | 9799/22434 [9:44:19<8:47:25, 2.50s/it] +2025-02-05 19:51:59 - ERROR - stderr - +2025-02-05 19:51:59 - ERROR - stderr - +2025-02-05 19:51:59 - INFO - stdout - {'loss': 0.6736, 'grad_norm': 1.1039295196533203, 'learning_rate': 1.25067507506289e-05, 'epoch': 1.31} +2025-02-05 19:51:59 - ERROR - stderr - 44%|████▎ | 9799/22434 [9:44:19<8:47:25, 2.50s/it] +2025-02-05 19:52:02 - ERROR - stderr - 44%|████▎ | 9800/22434 [9:44:22<8:48:38, 2.51s/it] +2025-02-05 19:52:02 - ERROR - stderr - +2025-02-05 19:52:02 - ERROR - stderr - +2025-02-05 19:52:02 - INFO - stdout - {'loss': 0.7658, 'grad_norm': 1.2919847965240479, 'learning_rate': 1.250535307486349e-05, 'epoch': 1.31} +2025-02-05 19:52:02 - ERROR - stderr - 44%|████▎ 
| 9800/22434 [9:44:22<8:48:38, 2.51s/it] +2025-02-05 19:52:04 - ERROR - stderr - 44%|████▎ | 9801/22434 [9:44:24<8:43:56, 2.49s/it] +2025-02-05 19:52:04 - ERROR - stderr - +2025-02-05 19:52:04 - ERROR - stderr - +2025-02-05 19:52:04 - INFO - stdout - {'loss': 0.6968, 'grad_norm': 1.1397085189819336, 'learning_rate': 1.2503955346876388e-05, 'epoch': 1.31} +2025-02-05 19:52:04 - ERROR - stderr - 44%|████▎ | 9801/22434 [9:44:24<8:43:56, 2.49s/it] +2025-02-05 19:52:07 - ERROR - stderr - 44%|████▎ | 9802/22434 [9:44:27<8:50:34, 2.52s/it] +2025-02-05 19:52:07 - ERROR - stderr - +2025-02-05 19:52:07 - ERROR - stderr - +2025-02-05 19:52:07 - INFO - stdout - {'loss': 0.8343, 'grad_norm': 1.3238701820373535, 'learning_rate': 1.2502557566696736e-05, 'epoch': 1.31} +2025-02-05 19:52:07 - ERROR - stderr - 44%|████▎ | 9802/22434 [9:44:27<8:50:34, 2.52s/it] +2025-02-05 19:52:09 - ERROR - stderr - 44%|████▎ | 9803/22434 [9:44:29<8:45:32, 2.50s/it] +2025-02-05 19:52:09 - ERROR - stderr - +2025-02-05 19:52:09 - ERROR - stderr - +2025-02-05 19:52:09 - INFO - stdout - {'loss': 0.7589, 'grad_norm': 1.286534309387207, 'learning_rate': 1.2501159734353665e-05, 'epoch': 1.31} +2025-02-05 19:52:09 - ERROR - stderr - 44%|████▎ | 9803/22434 [9:44:29<8:45:32, 2.50s/it] +2025-02-05 19:52:12 - ERROR - stderr - 44%|████▎ | 9804/22434 [9:44:32<8:56:27, 2.55s/it] +2025-02-05 19:52:12 - ERROR - stderr - +2025-02-05 19:52:12 - ERROR - stderr - +2025-02-05 19:52:12 - INFO - stdout - {'loss': 0.7938, 'grad_norm': 1.209022045135498, 'learning_rate': 1.2499761849876313e-05, 'epoch': 1.31} +2025-02-05 19:52:12 - ERROR - stderr - 44%|████▎ | 9804/22434 [9:44:32<8:56:27, 2.55s/it] +2025-02-05 19:52:14 - ERROR - stderr - 44%|████▎ | 9805/22434 [9:44:34<8:49:27, 2.52s/it] +2025-02-05 19:52:14 - ERROR - stderr - +2025-02-05 19:52:14 - ERROR - stderr - +2025-02-05 19:52:14 - INFO - stdout - {'loss': 0.7335, 'grad_norm': 1.2577706575393677, 'learning_rate': 1.2498363913293817e-05, 'epoch': 1.31} +2025-02-05 
19:52:14 - ERROR - stderr - 44%|████▎ | 9805/22434 [9:44:34<8:49:27, 2.52s/it] +2025-02-05 19:52:17 - ERROR - stderr - 44%|████▎ | 9806/22434 [9:44:37<8:47:35, 2.51s/it] +2025-02-05 19:52:17 - ERROR - stderr - +2025-02-05 19:52:17 - ERROR - stderr - +2025-02-05 19:52:17 - INFO - stdout - {'loss': 0.6639, 'grad_norm': 1.0877881050109863, 'learning_rate': 1.2496965924635314e-05, 'epoch': 1.31} +2025-02-05 19:52:17 - ERROR - stderr - 44%|████▎ | 9806/22434 [9:44:37<8:47:35, 2.51s/it] +2025-02-05 19:52:19 - ERROR - stderr - 44%|████▎ | 9807/22434 [9:44:39<8:44:06, 2.49s/it] +2025-02-05 19:52:19 - ERROR - stderr - +2025-02-05 19:52:19 - ERROR - stderr - +2025-02-05 19:52:19 - INFO - stdout - {'loss': 0.6619, 'grad_norm': 1.244131326675415, 'learning_rate': 1.2495567883929947e-05, 'epoch': 1.31} +2025-02-05 19:52:19 - ERROR - stderr - 44%|████▎ | 9807/22434 [9:44:39<8:44:06, 2.49s/it] +2025-02-05 19:52:22 - ERROR - stderr - 44%|████▎ | 9808/22434 [9:44:42<8:48:31, 2.51s/it] +2025-02-05 19:52:22 - ERROR - stderr - +2025-02-05 19:52:22 - ERROR - stderr - +2025-02-05 19:52:22 - INFO - stdout - {'loss': 0.6612, 'grad_norm': 1.164082407951355, 'learning_rate': 1.2494169791206859e-05, 'epoch': 1.31} +2025-02-05 19:52:22 - ERROR - stderr - 44%|████▎ | 9808/22434 [9:44:42<8:48:31, 2.51s/it] +2025-02-05 19:52:24 - ERROR - stderr - 44%|████▎ | 9809/22434 [9:44:44<8:46:18, 2.50s/it] +2025-02-05 19:52:24 - ERROR - stderr - +2025-02-05 19:52:24 - ERROR - stderr - +2025-02-05 19:52:24 - INFO - stdout - {'loss': 0.6682, 'grad_norm': 1.091600775718689, 'learning_rate': 1.2492771646495184e-05, 'epoch': 1.31} +2025-02-05 19:52:24 - ERROR - stderr - 44%|████▎ | 9809/22434 [9:44:44<8:46:18, 2.50s/it] +2025-02-05 19:52:27 - ERROR - stderr - 44%|████▎ | 9810/22434 [9:44:47<8:41:40, 2.48s/it] +2025-02-05 19:52:27 - ERROR - stderr - +2025-02-05 19:52:27 - ERROR - stderr - +2025-02-05 19:52:27 - INFO - stdout - {'loss': 0.6888, 'grad_norm': 1.1382920742034912, 'learning_rate': 
1.2491373449824072e-05, 'epoch': 1.31} +2025-02-05 19:52:27 - ERROR - stderr - 44%|████▎ | 9810/22434 [9:44:47<8:41:40, 2.48s/it] +2025-02-05 19:52:29 - ERROR - stderr - 44%|████▎ | 9811/22434 [9:44:49<8:40:55, 2.48s/it] +2025-02-05 19:52:29 - ERROR - stderr - +2025-02-05 19:52:29 - ERROR - stderr - +2025-02-05 19:52:29 - INFO - stdout - {'loss': 0.6971, 'grad_norm': 1.0887612104415894, 'learning_rate': 1.2489975201222662e-05, 'epoch': 1.31} +2025-02-05 19:52:29 - ERROR - stderr - 44%|████▎ | 9811/22434 [9:44:49<8:40:55, 2.48s/it] +2025-02-05 19:52:32 - ERROR - stderr - 44%|████▎ | 9812/22434 [9:44:51<8:43:08, 2.49s/it] +2025-02-05 19:52:32 - ERROR - stderr - +2025-02-05 19:52:32 - ERROR - stderr - +2025-02-05 19:52:32 - INFO - stdout - {'loss': 0.6909, 'grad_norm': 1.125855565071106, 'learning_rate': 1.2488576900720101e-05, 'epoch': 1.31} +2025-02-05 19:52:32 - ERROR - stderr - 44%|████▎ | 9812/22434 [9:44:52<8:43:08, 2.49s/it] +2025-02-05 19:52:34 - ERROR - stderr - 44%|████▎ | 9813/22434 [9:44:54<8:42:24, 2.48s/it] +2025-02-05 19:52:34 - ERROR - stderr - +2025-02-05 19:52:34 - ERROR - stderr - +2025-02-05 19:52:34 - INFO - stdout - {'loss': 0.7382, 'grad_norm': 1.3042161464691162, 'learning_rate': 1.2487178548345538e-05, 'epoch': 1.31} +2025-02-05 19:52:34 - ERROR - stderr - 44%|████▎ | 9813/22434 [9:44:54<8:42:24, 2.48s/it] +2025-02-05 19:52:37 - ERROR - stderr - 44%|████▎ | 9814/22434 [9:44:56<8:42:20, 2.48s/it] +2025-02-05 19:52:37 - ERROR - stderr - +2025-02-05 19:52:37 - ERROR - stderr - +2025-02-05 19:52:37 - INFO - stdout - {'loss': 0.5926, 'grad_norm': 0.9865109920501709, 'learning_rate': 1.2485780144128116e-05, 'epoch': 1.31} +2025-02-05 19:52:37 - ERROR - stderr - 44%|████▎ | 9814/22434 [9:44:56<8:42:20, 2.48s/it] +2025-02-05 19:52:39 - ERROR - stderr - 44%|████▍ | 9815/22434 [9:44:59<8:46:30, 2.50s/it] +2025-02-05 19:52:39 - ERROR - stderr - +2025-02-05 19:52:39 - ERROR - stderr - +2025-02-05 19:52:39 - INFO - stdout - {'loss': 0.6421, 'grad_norm': 
1.1443166732788086, 'learning_rate': 1.2484381688096988e-05, 'epoch': 1.31} +2025-02-05 19:52:39 - ERROR - stderr - 44%|████▍ | 9815/22434 [9:44:59<8:46:30, 2.50s/it] +2025-02-05 19:52:42 - ERROR - stderr - 44%|████▍ | 9816/22434 [9:45:02<8:48:08, 2.51s/it] +2025-02-05 19:52:42 - ERROR - stderr - +2025-02-05 19:52:42 - ERROR - stderr - +2025-02-05 19:52:42 - INFO - stdout - {'loss': 0.7637, 'grad_norm': 1.1826109886169434, 'learning_rate': 1.2482983180281302e-05, 'epoch': 1.31} +2025-02-05 19:52:42 - ERROR - stderr - 44%|████▍ | 9816/22434 [9:45:02<8:48:08, 2.51s/it] +2025-02-05 19:52:44 - ERROR - stderr - 44%|████▍ | 9817/22434 [9:45:04<8:44:44, 2.50s/it] +2025-02-05 19:52:44 - ERROR - stderr - +2025-02-05 19:52:44 - ERROR - stderr - +2025-02-05 19:52:44 - INFO - stdout - {'loss': 0.7438, 'grad_norm': 1.1959513425827026, 'learning_rate': 1.2481584620710203e-05, 'epoch': 1.31} +2025-02-05 19:52:44 - ERROR - stderr - 44%|████▍ | 9817/22434 [9:45:04<8:44:44, 2.50s/it] +2025-02-05 19:52:47 - ERROR - stderr - 44%|████▍ | 9818/22434 [9:45:07<8:47:53, 2.51s/it] +2025-02-05 19:52:47 - ERROR - stderr - +2025-02-05 19:52:47 - ERROR - stderr - +2025-02-05 19:52:47 - INFO - stdout - {'loss': 0.8757, 'grad_norm': 1.32578444480896, 'learning_rate': 1.248018600941285e-05, 'epoch': 1.31} +2025-02-05 19:52:47 - ERROR - stderr - 44%|████▍ | 9818/22434 [9:45:07<8:47:53, 2.51s/it] +2025-02-05 19:52:49 - ERROR - stderr - 44%|████▍ | 9819/22434 [9:45:09<8:44:44, 2.50s/it] +2025-02-05 19:52:49 - ERROR - stderr - +2025-02-05 19:52:49 - ERROR - stderr - +2025-02-05 19:52:49 - INFO - stdout - {'loss': 0.6241, 'grad_norm': 1.0088437795639038, 'learning_rate': 1.2478787346418392e-05, 'epoch': 1.31} +2025-02-05 19:52:49 - ERROR - stderr - 44%|████▍ | 9819/22434 [9:45:09<8:44:44, 2.50s/it] +2025-02-05 19:52:52 - ERROR - stderr - 44%|████▍ | 9820/22434 [9:45:11<8:43:06, 2.49s/it] +2025-02-05 19:52:52 - ERROR - stderr - +2025-02-05 19:52:52 - ERROR - stderr - +2025-02-05 19:52:52 - INFO - stdout 
- {'loss': 0.6507, 'grad_norm': 1.1620514392852783, 'learning_rate': 1.2477388631755987e-05, 'epoch': 1.31} +2025-02-05 19:52:52 - ERROR - stderr - 44%|████▍ | 9820/22434 [9:45:12<8:43:06, 2.49s/it] +2025-02-05 19:52:54 - ERROR - stderr - 44%|████▍ | 9821/22434 [9:45:14<8:48:49, 2.52s/it] +2025-02-05 19:52:54 - ERROR - stderr - +2025-02-05 19:52:54 - ERROR - stderr - +2025-02-05 19:52:54 - INFO - stdout - {'loss': 0.6384, 'grad_norm': 1.1568121910095215, 'learning_rate': 1.2475989865454783e-05, 'epoch': 1.31} +2025-02-05 19:52:54 - ERROR - stderr - 44%|████▍ | 9821/22434 [9:45:14<8:48:49, 2.52s/it] +2025-02-05 19:52:57 - ERROR - stderr - 44%|████▍ | 9822/22434 [9:45:17<8:49:44, 2.52s/it] +2025-02-05 19:52:57 - ERROR - stderr - +2025-02-05 19:52:57 - ERROR - stderr - +2025-02-05 19:52:57 - INFO - stdout - {'loss': 0.7374, 'grad_norm': 1.160117268562317, 'learning_rate': 1.247459104754394e-05, 'epoch': 1.31} +2025-02-05 19:52:57 - ERROR - stderr - 44%|████▍ | 9822/22434 [9:45:17<8:49:44, 2.52s/it] +2025-02-05 19:52:59 - ERROR - stderr - 44%|████▍ | 9823/22434 [9:45:19<8:58:09, 2.56s/it] +2025-02-05 19:53:00 - ERROR - stderr - +2025-02-05 19:53:00 - ERROR - stderr - +2025-02-05 19:53:00 - INFO - stdout - {'loss': 0.7731, 'grad_norm': 1.2202023267745972, 'learning_rate': 1.2473192178052615e-05, 'epoch': 1.31} +2025-02-05 19:53:00 - ERROR - stderr - 44%|████▍ | 9823/22434 [9:45:19<8:58:09, 2.56s/it] +2025-02-05 19:53:02 - ERROR - stderr - 44%|████▍ | 9824/22434 [9:45:22<8:50:41, 2.53s/it] +2025-02-05 19:53:02 - ERROR - stderr - +2025-02-05 19:53:02 - ERROR - stderr - +2025-02-05 19:53:02 - INFO - stdout - {'loss': 0.7048, 'grad_norm': 1.1822270154953003, 'learning_rate': 1.2471793257009965e-05, 'epoch': 1.31} +2025-02-05 19:53:02 - ERROR - stderr - 44%|████▍ | 9824/22434 [9:45:22<8:50:41, 2.53s/it] +2025-02-05 19:53:04 - ERROR - stderr - 44%|████▍ | 9825/22434 [9:45:24<8:52:37, 2.53s/it] +2025-02-05 19:53:05 - ERROR - stderr - +2025-02-05 19:53:05 - ERROR - stderr - 
+2025-02-05 19:53:05 - INFO - stdout - {'loss': 0.7624, 'grad_norm': 1.2593294382095337, 'learning_rate': 1.2470394284445151e-05, 'epoch': 1.31} +2025-02-05 19:53:05 - ERROR - stderr - 44%|████▍ | 9825/22434 [9:45:24<8:52:37, 2.53s/it] +2025-02-05 19:53:07 - ERROR - stderr - 44%|████▍ | 9826/22434 [9:45:27<8:48:05, 2.51s/it] +2025-02-05 19:53:07 - ERROR - stderr - +2025-02-05 19:53:07 - ERROR - stderr - +2025-02-05 19:53:07 - INFO - stdout - {'loss': 0.8416, 'grad_norm': 1.3090794086456299, 'learning_rate': 1.2468995260387332e-05, 'epoch': 1.31} +2025-02-05 19:53:07 - ERROR - stderr - 44%|████▍ | 9826/22434 [9:45:27<8:48:05, 2.51s/it] +2025-02-05 19:53:09 - ERROR - stderr - 44%|████▍ | 9827/22434 [9:45:29<8:50:50, 2.53s/it] +2025-02-05 19:53:10 - ERROR - stderr - +2025-02-05 19:53:10 - ERROR - stderr - +2025-02-05 19:53:10 - INFO - stdout - {'loss': 0.721, 'grad_norm': 1.183261513710022, 'learning_rate': 1.2467596184865669e-05, 'epoch': 1.31} +2025-02-05 19:53:10 - ERROR - stderr - 44%|████▍ | 9827/22434 [9:45:29<8:50:50, 2.53s/it] +2025-02-05 19:53:12 - ERROR - stderr - 44%|████▍ | 9828/22434 [9:45:32<8:48:33, 2.52s/it] +2025-02-05 19:53:12 - ERROR - stderr - +2025-02-05 19:53:12 - ERROR - stderr - +2025-02-05 19:53:12 - INFO - stdout - {'loss': 0.7182, 'grad_norm': 1.3361262083053589, 'learning_rate': 1.2466197057909326e-05, 'epoch': 1.31} +2025-02-05 19:53:12 - ERROR - stderr - 44%|████▍ | 9828/22434 [9:45:32<8:48:33, 2.52s/it] +2025-02-05 19:53:14 - ERROR - stderr - 44%|████▍ | 9829/22434 [9:45:34<8:48:26, 2.52s/it] +2025-02-05 19:53:15 - ERROR - stderr - +2025-02-05 19:53:15 - ERROR - stderr - +2025-02-05 19:53:15 - INFO - stdout - {'loss': 0.7378, 'grad_norm': 1.190143346786499, 'learning_rate': 1.2464797879547464e-05, 'epoch': 1.31} +2025-02-05 19:53:15 - ERROR - stderr - 44%|████▍ | 9829/22434 [9:45:34<8:48:26, 2.52s/it] +2025-02-05 19:53:17 - ERROR - stderr - 44%|████▍ | 9830/22434 [9:45:37<8:42:47, 2.49s/it] +2025-02-05 19:53:17 - ERROR - stderr - 
+2025-02-05 19:53:17 - ERROR - stderr - +2025-02-05 19:53:17 - INFO - stdout - {'loss': 0.7069, 'grad_norm': 1.142507553100586, 'learning_rate': 1.2463398649809246e-05, 'epoch': 1.31} +2025-02-05 19:53:17 - ERROR - stderr - 44%|████▍ | 9830/22434 [9:45:37<8:42:47, 2.49s/it] +2025-02-05 19:53:19 - ERROR - stderr - 44%|████▍ | 9831/22434 [9:45:39<8:44:00, 2.49s/it] +2025-02-05 19:53:19 - ERROR - stderr - +2025-02-05 19:53:19 - ERROR - stderr - +2025-02-05 19:53:19 - INFO - stdout - {'loss': 0.7262, 'grad_norm': 1.1049679517745972, 'learning_rate': 1.2461999368723843e-05, 'epoch': 1.31} +2025-02-05 19:53:19 - ERROR - stderr - 44%|████▍ | 9831/22434 [9:45:39<8:44:00, 2.49s/it] +2025-02-05 19:53:22 - ERROR - stderr - 44%|████▍ | 9832/22434 [9:45:42<8:45:48, 2.50s/it] +2025-02-05 19:53:22 - ERROR - stderr - +2025-02-05 19:53:22 - ERROR - stderr - +2025-02-05 19:53:22 - INFO - stdout - {'loss': 0.6741, 'grad_norm': 1.120949149131775, 'learning_rate': 1.2460600036320421e-05, 'epoch': 1.31} +2025-02-05 19:53:22 - ERROR - stderr - 44%|████▍ | 9832/22434 [9:45:42<8:45:48, 2.50s/it] +2025-02-05 19:53:24 - ERROR - stderr - 44%|████▍ | 9833/22434 [9:45:44<8:44:22, 2.50s/it] +2025-02-05 19:53:24 - ERROR - stderr - +2025-02-05 19:53:24 - ERROR - stderr - +2025-02-05 19:53:24 - INFO - stdout - {'loss': 0.7437, 'grad_norm': 1.3308773040771484, 'learning_rate': 1.2459200652628143e-05, 'epoch': 1.31} +2025-02-05 19:53:24 - ERROR - stderr - 44%|████▍ | 9833/22434 [9:45:44<8:44:22, 2.50s/it] +2025-02-05 19:53:27 - ERROR - stderr - 44%|████▍ | 9834/22434 [9:45:47<8:44:44, 2.50s/it] +2025-02-05 19:53:27 - ERROR - stderr - +2025-02-05 19:53:27 - ERROR - stderr - +2025-02-05 19:53:27 - INFO - stdout - {'loss': 0.6741, 'grad_norm': 1.2362589836120605, 'learning_rate': 1.2457801217676182e-05, 'epoch': 1.32} +2025-02-05 19:53:27 - ERROR - stderr - 44%|████▍ | 9834/22434 [9:45:47<8:44:44, 2.50s/it] +2025-02-05 19:53:29 - ERROR - stderr - 44%|████▍ | 9835/22434 [9:45:49<8:45:52, 2.50s/it] 
+2025-02-05 19:53:29 - ERROR - stderr - +2025-02-05 19:53:29 - ERROR - stderr - +2025-02-05 19:53:29 - INFO - stdout - {'loss': 0.798, 'grad_norm': 1.239372968673706, 'learning_rate': 1.2456401731493705e-05, 'epoch': 1.32} +2025-02-05 19:53:29 - ERROR - stderr - 44%|████▍ | 9835/22434 [9:45:49<8:45:52, 2.50s/it] +2025-02-05 19:53:32 - ERROR - stderr - 44%|████▍ | 9836/22434 [9:45:52<8:48:34, 2.52s/it] +2025-02-05 19:53:32 - ERROR - stderr - +2025-02-05 19:53:32 - ERROR - stderr - +2025-02-05 19:53:32 - INFO - stdout - {'loss': 0.6919, 'grad_norm': 1.1307982206344604, 'learning_rate': 1.2455002194109886e-05, 'epoch': 1.32} +2025-02-05 19:53:32 - ERROR - stderr - 44%|████▍ | 9836/22434 [9:45:52<8:48:34, 2.52s/it] +2025-02-05 19:53:35 - ERROR - stderr - 44%|████▍ | 9837/22434 [9:45:54<8:48:13, 2.52s/it] +2025-02-05 19:53:35 - ERROR - stderr - +2025-02-05 19:53:35 - ERROR - stderr - +2025-02-05 19:53:35 - INFO - stdout - {'loss': 0.7402, 'grad_norm': 1.173709511756897, 'learning_rate': 1.2453602605553894e-05, 'epoch': 1.32} +2025-02-05 19:53:35 - ERROR - stderr - 44%|████▍ | 9837/22434 [9:45:54<8:48:13, 2.52s/it] +2025-02-05 19:53:37 - ERROR - stderr - 44%|████▍ | 9838/22434 [9:45:57<8:48:30, 2.52s/it] +2025-02-05 19:53:37 - ERROR - stderr - +2025-02-05 19:53:37 - ERROR - stderr - +2025-02-05 19:53:37 - INFO - stdout - {'loss': 0.7754, 'grad_norm': 1.1248339414596558, 'learning_rate': 1.2452202965854905e-05, 'epoch': 1.32} +2025-02-05 19:53:37 - ERROR - stderr - 44%|████▍ | 9838/22434 [9:45:57<8:48:30, 2.52s/it] +2025-02-05 19:53:40 - ERROR - stderr - 44%|████▍ | 9839/22434 [9:45:59<8:55:43, 2.55s/it] +2025-02-05 19:53:40 - ERROR - stderr - +2025-02-05 19:53:40 - ERROR - stderr - +2025-02-05 19:53:40 - INFO - stdout - {'loss': 0.7174, 'grad_norm': 1.1756579875946045, 'learning_rate': 1.2450803275042092e-05, 'epoch': 1.32} +2025-02-05 19:53:40 - ERROR - stderr - 44%|████▍ | 9839/22434 [9:45:59<8:55:43, 2.55s/it] +2025-02-05 19:53:42 - ERROR - stderr - 44%|████▍ | 
9840/22434 [9:46:02<8:55:31, 2.55s/it] +2025-02-05 19:53:42 - ERROR - stderr - +2025-02-05 19:53:42 - ERROR - stderr - +2025-02-05 19:53:42 - INFO - stdout - {'loss': 0.7065, 'grad_norm': 1.192704439163208, 'learning_rate': 1.2449403533144629e-05, 'epoch': 1.32} +2025-02-05 19:53:42 - ERROR - stderr - 44%|████▍ | 9840/22434 [9:46:02<8:55:31, 2.55s/it] +2025-02-05 19:53:45 - ERROR - stderr - 44%|████▍ | 9841/22434 [9:46:04<8:51:41, 2.53s/it] +2025-02-05 19:53:45 - ERROR - stderr - +2025-02-05 19:53:45 - ERROR - stderr - +2025-02-05 19:53:45 - INFO - stdout - {'loss': 0.6468, 'grad_norm': 1.1554477214813232, 'learning_rate': 1.2448003740191694e-05, 'epoch': 1.32} +2025-02-05 19:53:45 - ERROR - stderr - 44%|████▍ | 9841/22434 [9:46:05<8:51:41, 2.53s/it] +2025-02-05 19:53:47 - ERROR - stderr - 44%|████▍ | 9842/22434 [9:46:07<8:45:48, 2.51s/it] +2025-02-05 19:53:47 - ERROR - stderr - +2025-02-05 19:53:47 - ERROR - stderr - +2025-02-05 19:53:47 - INFO - stdout - {'loss': 0.7548, 'grad_norm': 1.3115088939666748, 'learning_rate': 1.2446603896212461e-05, 'epoch': 1.32} +2025-02-05 19:53:47 - ERROR - stderr - 44%|████▍ | 9842/22434 [9:46:07<8:45:48, 2.51s/it] +2025-02-05 19:53:50 - ERROR - stderr - 44%|████▍ | 9843/22434 [9:46:09<8:46:46, 2.51s/it] +2025-02-05 19:53:50 - ERROR - stderr - +2025-02-05 19:53:50 - ERROR - stderr - +2025-02-05 19:53:50 - INFO - stdout - {'loss': 0.6631, 'grad_norm': 1.1918281316757202, 'learning_rate': 1.2445204001236112e-05, 'epoch': 1.32} +2025-02-05 19:53:50 - ERROR - stderr - 44%|████▍ | 9843/22434 [9:46:09<8:46:46, 2.51s/it] +2025-02-05 19:53:52 - ERROR - stderr - 44%|████▍ | 9844/22434 [9:46:12<8:49:19, 2.52s/it] +2025-02-05 19:53:52 - ERROR - stderr - +2025-02-05 19:53:52 - ERROR - stderr - +2025-02-05 19:53:52 - INFO - stdout - {'loss': 0.6651, 'grad_norm': 1.0965884923934937, 'learning_rate': 1.2443804055291826e-05, 'epoch': 1.32} +2025-02-05 19:53:52 - ERROR - stderr - 44%|████▍ | 9844/22434 [9:46:12<8:49:19, 2.52s/it] +2025-02-05 
19:53:55 - ERROR - stderr - 44%|████▍ | 9845/22434 [9:46:14<8:46:19, 2.51s/it] +2025-02-05 19:53:55 - ERROR - stderr - +2025-02-05 19:53:55 - ERROR - stderr - +2025-02-05 19:53:55 - INFO - stdout - {'loss': 0.7715, 'grad_norm': 1.3044089078903198, 'learning_rate': 1.2442404058408784e-05, 'epoch': 1.32} +2025-02-05 19:53:55 - ERROR - stderr - 44%|████▍ | 9845/22434 [9:46:15<8:46:19, 2.51s/it] +2025-02-05 19:53:57 - ERROR - stderr - 44%|████▍ | 9846/22434 [9:46:17<8:42:35, 2.49s/it] +2025-02-05 19:53:57 - ERROR - stderr - +2025-02-05 19:53:57 - ERROR - stderr - +2025-02-05 19:53:57 - INFO - stdout - {'loss': 0.6628, 'grad_norm': 1.1114118099212646, 'learning_rate': 1.2441004010616165e-05, 'epoch': 1.32} +2025-02-05 19:53:57 - ERROR - stderr - 44%|████▍ | 9846/22434 [9:46:17<8:42:35, 2.49s/it] +2025-02-05 19:54:00 - ERROR - stderr - 44%|████▍ | 9847/22434 [9:46:19<8:43:06, 2.49s/it] +2025-02-05 19:54:00 - ERROR - stderr - +2025-02-05 19:54:00 - ERROR - stderr - +2025-02-05 19:54:00 - INFO - stdout - {'loss': 0.6393, 'grad_norm': 1.1898798942565918, 'learning_rate': 1.2439603911943152e-05, 'epoch': 1.32} +2025-02-05 19:54:00 - ERROR - stderr - 44%|████▍ | 9847/22434 [9:46:19<8:43:06, 2.49s/it] +2025-02-05 19:54:02 - ERROR - stderr - 44%|████▍ | 9848/22434 [9:46:22<8:40:23, 2.48s/it] +2025-02-05 19:54:02 - ERROR - stderr - +2025-02-05 19:54:02 - ERROR - stderr - +2025-02-05 19:54:02 - INFO - stdout - {'loss': 0.6895, 'grad_norm': 1.1954336166381836, 'learning_rate': 1.2438203762418934e-05, 'epoch': 1.32} +2025-02-05 19:54:02 - ERROR - stderr - 44%|████▍ | 9848/22434 [9:46:22<8:40:23, 2.48s/it] +2025-02-05 19:54:05 - ERROR - stderr - 44%|████▍ | 9849/22434 [9:46:24<8:39:55, 2.48s/it] +2025-02-05 19:54:05 - ERROR - stderr - +2025-02-05 19:54:05 - ERROR - stderr - +2025-02-05 19:54:05 - INFO - stdout - {'loss': 0.7728, 'grad_norm': 1.3290241956710815, 'learning_rate': 1.2436803562072687e-05, 'epoch': 1.32} +2025-02-05 19:54:05 - ERROR - stderr - 44%|████▍ | 9849/22434 
[9:46:24<8:39:55, 2.48s/it] +2025-02-05 19:54:07 - ERROR - stderr - 44%|████▍ | 9850/22434 [9:46:27<8:47:46, 2.52s/it] +2025-02-05 19:54:07 - ERROR - stderr - +2025-02-05 19:54:07 - ERROR - stderr - +2025-02-05 19:54:07 - INFO - stdout - {'loss': 0.7267, 'grad_norm': 1.2889747619628906, 'learning_rate': 1.2435403310933606e-05, 'epoch': 1.32} +2025-02-05 19:54:07 - ERROR - stderr - 44%|████▍ | 9850/22434 [9:46:27<8:47:46, 2.52s/it] +2025-02-05 19:54:10 - ERROR - stderr - 44%|████▍ | 9851/22434 [9:46:30<8:51:09, 2.53s/it] +2025-02-05 19:54:10 - ERROR - stderr - +2025-02-05 19:54:10 - ERROR - stderr - +2025-02-05 19:54:10 - INFO - stdout - {'loss': 0.7662, 'grad_norm': 1.4065557718276978, 'learning_rate': 1.2434003009030869e-05, 'epoch': 1.32} +2025-02-05 19:54:10 - ERROR - stderr - 44%|████▍ | 9851/22434 [9:46:30<8:51:09, 2.53s/it] +2025-02-05 19:54:12 - ERROR - stderr - 44%|████▍ | 9852/22434 [9:46:32<8:58:46, 2.57s/it] +2025-02-05 19:54:12 - ERROR - stderr - +2025-02-05 19:54:12 - ERROR - stderr - +2025-02-05 19:54:12 - INFO - stdout - {'loss': 0.7468, 'grad_norm': 1.2182893753051758, 'learning_rate': 1.2432602656393673e-05, 'epoch': 1.32} +2025-02-05 19:54:12 - ERROR - stderr - 44%|████▍ | 9852/22434 [9:46:32<8:58:46, 2.57s/it] +2025-02-05 19:54:15 - ERROR - stderr - 44%|████▍ | 9853/22434 [9:46:35<8:54:55, 2.55s/it] +2025-02-05 19:54:15 - ERROR - stderr - +2025-02-05 19:54:15 - ERROR - stderr - +2025-02-05 19:54:15 - INFO - stdout - {'loss': 0.7328, 'grad_norm': 1.262793779373169, 'learning_rate': 1.2431202253051197e-05, 'epoch': 1.32} +2025-02-05 19:54:15 - ERROR - stderr - 44%|████▍ | 9853/22434 [9:46:35<8:54:55, 2.55s/it] +2025-02-05 19:54:17 - ERROR - stderr - 44%|████▍ | 9854/22434 [9:46:37<8:53:32, 2.54s/it] +2025-02-05 19:54:17 - ERROR - stderr - +2025-02-05 19:54:17 - ERROR - stderr - +2025-02-05 19:54:17 - INFO - stdout - {'loss': 0.7342, 'grad_norm': 1.2878621816635132, 'learning_rate': 1.242980179903264e-05, 'epoch': 1.32} +2025-02-05 19:54:17 - ERROR 
- stderr - 44%|████▍ | 9854/22434 [9:46:37<8:53:32, 2.54s/it] +2025-02-05 19:54:20 - ERROR - stderr - 44%|████▍ | 9855/22434 [9:46:40<8:49:26, 2.53s/it] +2025-02-05 19:54:20 - ERROR - stderr - +2025-02-05 19:54:20 - ERROR - stderr - +2025-02-05 19:54:20 - INFO - stdout - {'loss': 0.691, 'grad_norm': 1.4076778888702393, 'learning_rate': 1.2428401294367189e-05, 'epoch': 1.32} +2025-02-05 19:54:20 - ERROR - stderr - 44%|████▍ | 9855/22434 [9:46:40<8:49:26, 2.53s/it] +2025-02-05 19:54:23 - ERROR - stderr - 44%|████▍ | 9856/22434 [9:46:43<9:15:17, 2.65s/it] +2025-02-05 19:54:23 - ERROR - stderr - +2025-02-05 19:54:23 - ERROR - stderr - +2025-02-05 19:54:23 - INFO - stdout - {'loss': 0.6708, 'grad_norm': 1.144184947013855, 'learning_rate': 1.2427000739084036e-05, 'epoch': 1.32} +2025-02-05 19:54:23 - ERROR - stderr - 44%|████▍ | 9856/22434 [9:46:43<9:15:17, 2.65s/it] +2025-02-05 19:54:25 - ERROR - stderr - 44%|████▍ | 9857/22434 [9:46:45<9:04:48, 2.60s/it] +2025-02-05 19:54:25 - ERROR - stderr - +2025-02-05 19:54:25 - ERROR - stderr - +2025-02-05 19:54:25 - INFO - stdout - {'loss': 0.7349, 'grad_norm': 1.481675386428833, 'learning_rate': 1.2425600133212377e-05, 'epoch': 1.32} +2025-02-05 19:54:25 - ERROR - stderr - 44%|████▍ | 9857/22434 [9:46:45<9:04:48, 2.60s/it] +2025-02-05 19:54:28 - ERROR - stderr - 44%|████▍ | 9858/22434 [9:46:48<8:59:35, 2.57s/it] +2025-02-05 19:54:28 - ERROR - stderr - +2025-02-05 19:54:28 - ERROR - stderr - +2025-02-05 19:54:28 - INFO - stdout - {'loss': 0.635, 'grad_norm': 1.1425468921661377, 'learning_rate': 1.2424199476781403e-05, 'epoch': 1.32} +2025-02-05 19:54:28 - ERROR - stderr - 44%|████▍ | 9858/22434 [9:46:48<8:59:35, 2.57s/it] +2025-02-05 19:54:30 - ERROR - stderr - 44%|████▍ | 9859/22434 [9:46:50<8:56:25, 2.56s/it] +2025-02-05 19:54:30 - ERROR - stderr - +2025-02-05 19:54:30 - ERROR - stderr - +2025-02-05 19:54:30 - INFO - stdout - {'loss': 0.7103, 'grad_norm': 1.2624248266220093, 'learning_rate': 1.242279876982031e-05, 'epoch': 
1.32} +2025-02-05 19:54:30 - ERROR - stderr - 44%|████▍ | 9859/22434 [9:46:50<8:56:25, 2.56s/it] +2025-02-05 19:54:33 - ERROR - stderr - 44%|████▍ | 9860/22434 [9:46:53<8:55:07, 2.55s/it] +2025-02-05 19:54:33 - ERROR - stderr - +2025-02-05 19:54:33 - ERROR - stderr - +2025-02-05 19:54:33 - INFO - stdout - {'loss': 0.7427, 'grad_norm': 1.2202231884002686, 'learning_rate': 1.2421398012358294e-05, 'epoch': 1.32} +2025-02-05 19:54:33 - ERROR - stderr - 44%|████▍ | 9860/22434 [9:46:53<8:55:07, 2.55s/it] +2025-02-05 19:54:35 - ERROR - stderr - 44%|████▍ | 9861/22434 [9:46:55<8:57:05, 2.56s/it] +2025-02-05 19:54:36 - ERROR - stderr - +2025-02-05 19:54:36 - ERROR - stderr - +2025-02-05 19:54:36 - INFO - stdout - {'loss': 0.7292, 'grad_norm': 1.3206868171691895, 'learning_rate': 1.241999720442456e-05, 'epoch': 1.32} +2025-02-05 19:54:36 - ERROR - stderr - 44%|████▍ | 9861/22434 [9:46:55<8:57:05, 2.56s/it] +2025-02-05 19:54:38 - ERROR - stderr - 44%|████▍ | 9862/22434 [9:46:58<8:50:38, 2.53s/it] +2025-02-05 19:54:38 - ERROR - stderr - +2025-02-05 19:54:38 - ERROR - stderr - +2025-02-05 19:54:38 - INFO - stdout - {'loss': 0.7164, 'grad_norm': 1.1631275415420532, 'learning_rate': 1.2418596346048293e-05, 'epoch': 1.32} +2025-02-05 19:54:38 - ERROR - stderr - 44%|████▍ | 9862/22434 [9:46:58<8:50:38, 2.53s/it] +2025-02-05 19:54:41 - ERROR - stderr - 44%|████▍ | 9863/22434 [9:47:00<8:52:20, 2.54s/it] +2025-02-05 19:54:41 - ERROR - stderr - +2025-02-05 19:54:41 - ERROR - stderr - +2025-02-05 19:54:41 - INFO - stdout - {'loss': 0.7227, 'grad_norm': 1.165019154548645, 'learning_rate': 1.2417195437258697e-05, 'epoch': 1.32} +2025-02-05 19:54:41 - ERROR - stderr - 44%|████▍ | 9863/22434 [9:47:00<8:52:20, 2.54s/it] +2025-02-05 19:54:43 - ERROR - stderr - 44%|████▍ | 9864/22434 [9:47:03<8:47:36, 2.52s/it] +2025-02-05 19:54:43 - ERROR - stderr - +2025-02-05 19:54:43 - ERROR - stderr - +2025-02-05 19:54:43 - INFO - stdout - {'loss': 0.6549, 'grad_norm': 1.1370221376419067, 'learning_rate': 
1.2415794478084981e-05, 'epoch': 1.32} +2025-02-05 19:54:43 - ERROR - stderr - 44%|████▍ | 9864/22434 [9:47:03<8:47:36, 2.52s/it] +2025-02-05 19:54:46 - ERROR - stderr - 44%|████▍ | 9865/22434 [9:47:05<8:51:28, 2.54s/it] +2025-02-05 19:54:46 - ERROR - stderr - +2025-02-05 19:54:46 - ERROR - stderr - +2025-02-05 19:54:46 - INFO - stdout - {'loss': 0.7154, 'grad_norm': 1.2088977098464966, 'learning_rate': 1.2414393468556341e-05, 'epoch': 1.32} +2025-02-05 19:54:46 - ERROR - stderr - 44%|████▍ | 9865/22434 [9:47:05<8:51:28, 2.54s/it] +2025-02-05 19:54:48 - ERROR - stderr - 44%|████▍ | 9866/22434 [9:47:08<9:00:48, 2.58s/it] +2025-02-05 19:54:48 - ERROR - stderr - +2025-02-05 19:54:48 - ERROR - stderr - +2025-02-05 19:54:48 - INFO - stdout - {'loss': 0.7219, 'grad_norm': 1.113718867301941, 'learning_rate': 1.2412992408701979e-05, 'epoch': 1.32} +2025-02-05 19:54:48 - ERROR - stderr - 44%|████▍ | 9866/22434 [9:47:08<9:00:48, 2.58s/it] +2025-02-05 19:54:51 - ERROR - stderr - 44%|████▍ | 9867/22434 [9:47:11<8:58:19, 2.57s/it] +2025-02-05 19:54:51 - ERROR - stderr - +2025-02-05 19:54:51 - ERROR - stderr - +2025-02-05 19:54:51 - INFO - stdout - {'loss': 0.6798, 'grad_norm': 1.08150053024292, 'learning_rate': 1.2411591298551096e-05, 'epoch': 1.32} +2025-02-05 19:54:51 - ERROR - stderr - 44%|████▍ | 9867/22434 [9:47:11<8:58:19, 2.57s/it] +2025-02-05 19:54:53 - ERROR - stderr - 44%|████▍ | 9868/22434 [9:47:13<9:03:04, 2.59s/it] +2025-02-05 19:54:53 - ERROR - stderr - +2025-02-05 19:54:53 - ERROR - stderr - +2025-02-05 19:54:53 - INFO - stdout - {'loss': 0.7019, 'grad_norm': 1.22684907913208, 'learning_rate': 1.2410190138132903e-05, 'epoch': 1.32} +2025-02-05 19:54:53 - ERROR - stderr - 44%|████▍ | 9868/22434 [9:47:13<9:03:04, 2.59s/it] +2025-02-05 19:54:56 - ERROR - stderr - 44%|████▍ | 9869/22434 [9:47:16<8:58:33, 2.57s/it] +2025-02-05 19:54:56 - ERROR - stderr - +2025-02-05 19:54:56 - ERROR - stderr - +2025-02-05 19:54:56 - INFO - stdout - {'loss': 0.6658, 'grad_norm': 
1.1325994729995728, 'learning_rate': 1.24087889274766e-05, 'epoch': 1.32} +2025-02-05 19:54:56 - ERROR - stderr - 44%|████▍ | 9869/22434 [9:47:16<8:58:33, 2.57s/it] +2025-02-05 19:54:59 - ERROR - stderr - 44%|████▍ | 9870/22434 [9:47:18<8:59:50, 2.58s/it] +2025-02-05 19:54:59 - ERROR - stderr - +2025-02-05 19:54:59 - ERROR - stderr - +2025-02-05 19:54:59 - INFO - stdout - {'loss': 0.7082, 'grad_norm': 1.0684702396392822, 'learning_rate': 1.24073876666114e-05, 'epoch': 1.32} +2025-02-05 19:54:59 - ERROR - stderr - 44%|████▍ | 9870/22434 [9:47:18<8:59:50, 2.58s/it] +2025-02-05 19:55:01 - ERROR - stderr - 44%|████▍ | 9871/22434 [9:47:21<9:03:19, 2.59s/it] +2025-02-05 19:55:01 - ERROR - stderr - +2025-02-05 19:55:01 - ERROR - stderr - +2025-02-05 19:55:01 - INFO - stdout - {'loss': 0.7856, 'grad_norm': 1.250662922859192, 'learning_rate': 1.2405986355566506e-05, 'epoch': 1.32} +2025-02-05 19:55:01 - ERROR - stderr - 44%|████▍ | 9871/22434 [9:47:21<9:03:19, 2.59s/it] +2025-02-05 19:55:04 - ERROR - stderr - 44%|████▍ | 9872/22434 [9:47:23<8:59:00, 2.57s/it] +2025-02-05 19:55:04 - ERROR - stderr - +2025-02-05 19:55:04 - ERROR - stderr - +2025-02-05 19:55:04 - INFO - stdout - {'loss': 0.6737, 'grad_norm': 1.1444483995437622, 'learning_rate': 1.2404584994371128e-05, 'epoch': 1.32} +2025-02-05 19:55:04 - ERROR - stderr - 44%|████▍ | 9872/22434 [9:47:24<8:59:00, 2.57s/it] +2025-02-05 19:55:06 - ERROR - stderr - 44%|████▍ | 9873/22434 [9:47:26<8:58:32, 2.57s/it] +2025-02-05 19:55:06 - ERROR - stderr - +2025-02-05 19:55:06 - ERROR - stderr - +2025-02-05 19:55:06 - INFO - stdout - {'loss': 0.7029, 'grad_norm': 1.1788280010223389, 'learning_rate': 1.2403183583054479e-05, 'epoch': 1.32} +2025-02-05 19:55:06 - ERROR - stderr - 44%|████▍ | 9873/22434 [9:47:26<8:58:32, 2.57s/it] +2025-02-05 19:55:09 - ERROR - stderr - 44%|████▍ | 9874/22434 [9:47:29<8:53:35, 2.55s/it] +2025-02-05 19:55:09 - ERROR - stderr - +2025-02-05 19:55:09 - ERROR - stderr - +2025-02-05 19:55:09 - INFO - stdout - 
{'loss': 0.7459, 'grad_norm': 1.2111122608184814, 'learning_rate': 1.2401782121645767e-05, 'epoch': 1.32} +2025-02-05 19:55:09 - ERROR - stderr - 44%|████▍ | 9874/22434 [9:47:29<8:53:35, 2.55s/it] +2025-02-05 19:55:11 - ERROR - stderr - 44%|████▍ | 9875/22434 [9:47:31<8:53:34, 2.55s/it] +2025-02-05 19:55:11 - ERROR - stderr - +2025-02-05 19:55:11 - ERROR - stderr - +2025-02-05 19:55:11 - INFO - stdout - {'loss': 0.6452, 'grad_norm': 1.0028976202011108, 'learning_rate': 1.2400380610174205e-05, 'epoch': 1.32} +2025-02-05 19:55:11 - ERROR - stderr - 44%|████▍ | 9875/22434 [9:47:31<8:53:34, 2.55s/it] +2025-02-05 19:55:14 - ERROR - stderr - 44%|████▍ | 9876/22434 [9:47:34<8:50:22, 2.53s/it] +2025-02-05 19:55:14 - ERROR - stderr - +2025-02-05 19:55:14 - ERROR - stderr - +2025-02-05 19:55:14 - INFO - stdout - {'loss': 0.7348, 'grad_norm': 1.227378487586975, 'learning_rate': 1.2398979048669002e-05, 'epoch': 1.32} +2025-02-05 19:55:14 - ERROR - stderr - 44%|████▍ | 9876/22434 [9:47:34<8:50:22, 2.53s/it] +2025-02-05 19:55:16 - ERROR - stderr - 44%|████▍ | 9877/22434 [9:47:36<8:58:39, 2.57s/it] +2025-02-05 19:55:17 - ERROR - stderr - +2025-02-05 19:55:17 - ERROR - stderr - +2025-02-05 19:55:17 - INFO - stdout - {'loss': 0.781, 'grad_norm': 1.2527941465377808, 'learning_rate': 1.2397577437159383e-05, 'epoch': 1.32} +2025-02-05 19:55:17 - ERROR - stderr - 44%|████▍ | 9877/22434 [9:47:36<8:58:39, 2.57s/it] +2025-02-05 19:55:19 - ERROR - stderr - 44%|████▍ | 9878/22434 [9:47:39<9:07:39, 2.62s/it] +2025-02-05 19:55:19 - ERROR - stderr - +2025-02-05 19:55:19 - ERROR - stderr - +2025-02-05 19:55:19 - INFO - stdout - {'loss': 0.7482, 'grad_norm': 1.2534083127975464, 'learning_rate': 1.2396175775674553e-05, 'epoch': 1.32} +2025-02-05 19:55:19 - ERROR - stderr - 44%|████▍ | 9878/22434 [9:47:39<9:07:39, 2.62s/it] +2025-02-05 19:55:22 - ERROR - stderr - 44%|████▍ | 9879/22434 [9:47:42<9:08:13, 2.62s/it] +2025-02-05 19:55:22 - ERROR - stderr - +2025-02-05 19:55:22 - ERROR - stderr - 
+2025-02-05 19:55:22 - INFO - stdout - {'loss': 0.6907, 'grad_norm': 1.2838736772537231, 'learning_rate': 1.2394774064243733e-05, 'epoch': 1.32} +2025-02-05 19:55:22 - ERROR - stderr - 44%|████▍ | 9879/22434 [9:47:42<9:08:13, 2.62s/it] +2025-02-05 19:55:25 - ERROR - stderr - 44%|████▍ | 9880/22434 [9:47:44<9:19:35, 2.67s/it] +2025-02-05 19:55:25 - ERROR - stderr - +2025-02-05 19:55:25 - ERROR - stderr - +2025-02-05 19:55:25 - INFO - stdout - {'loss': 0.7969, 'grad_norm': 1.3349031209945679, 'learning_rate': 1.2393372302896138e-05, 'epoch': 1.32} +2025-02-05 19:55:25 - ERROR - stderr - 44%|████▍ | 9880/22434 [9:47:44<9:19:35, 2.67s/it] +2025-02-05 19:55:27 - ERROR - stderr - 44%|████▍ | 9881/22434 [9:47:47<9:13:00, 2.64s/it] +2025-02-05 19:55:27 - ERROR - stderr - +2025-02-05 19:55:27 - ERROR - stderr - +2025-02-05 19:55:27 - INFO - stdout - {'loss': 0.7507, 'grad_norm': 1.2055402994155884, 'learning_rate': 1.2391970491660988e-05, 'epoch': 1.32} +2025-02-05 19:55:27 - ERROR - stderr - 44%|████▍ | 9881/22434 [9:47:47<9:13:00, 2.64s/it] +2025-02-05 19:55:30 - ERROR - stderr - 44%|████▍ | 9882/22434 [9:47:49<9:03:40, 2.60s/it] +2025-02-05 19:55:30 - ERROR - stderr - +2025-02-05 19:55:30 - ERROR - stderr - +2025-02-05 19:55:30 - INFO - stdout - {'loss': 0.701, 'grad_norm': 1.1701058149337769, 'learning_rate': 1.2390568630567501e-05, 'epoch': 1.32} +2025-02-05 19:55:30 - ERROR - stderr - 44%|████▍ | 9882/22434 [9:47:50<9:03:40, 2.60s/it] +2025-02-05 19:55:32 - ERROR - stderr - 44%|████▍ | 9883/22434 [9:47:52<9:06:12, 2.61s/it] +2025-02-05 19:55:32 - ERROR - stderr - +2025-02-05 19:55:32 - ERROR - stderr - +2025-02-05 19:55:32 - INFO - stdout - {'loss': 0.7477, 'grad_norm': 1.0695701837539673, 'learning_rate': 1.2389166719644901e-05, 'epoch': 1.32} +2025-02-05 19:55:32 - ERROR - stderr - 44%|████▍ | 9883/22434 [9:47:52<9:06:12, 2.61s/it] +2025-02-05 19:55:35 - ERROR - stderr - 44%|████▍ | 9884/22434 [9:47:55<9:02:23, 2.59s/it] +2025-02-05 19:55:35 - ERROR - stderr - 
+2025-02-05 19:55:35 - ERROR - stderr - +2025-02-05 19:55:35 - INFO - stdout - {'loss': 0.7071, 'grad_norm': 1.1811188459396362, 'learning_rate': 1.2387764758922405e-05, 'epoch': 1.32} +2025-02-05 19:55:35 - ERROR - stderr - 44%|████▍ | 9884/22434 [9:47:55<9:02:23, 2.59s/it] +2025-02-05 19:55:37 - ERROR - stderr - 44%|████▍ | 9885/22434 [9:47:57<8:55:10, 2.56s/it] +2025-02-05 19:55:37 - ERROR - stderr - +2025-02-05 19:55:37 - ERROR - stderr - +2025-02-05 19:55:37 - INFO - stdout - {'loss': 0.6892, 'grad_norm': 1.0748162269592285, 'learning_rate': 1.2386362748429239e-05, 'epoch': 1.32} +2025-02-05 19:55:37 - ERROR - stderr - 44%|████▍ | 9885/22434 [9:47:57<8:55:10, 2.56s/it] +2025-02-05 19:55:40 - ERROR - stderr - 44%|████▍ | 9886/22434 [9:48:00<8:50:46, 2.54s/it] +2025-02-05 19:55:40 - ERROR - stderr - +2025-02-05 19:55:40 - ERROR - stderr - +2025-02-05 19:55:40 - INFO - stdout - {'loss': 0.7351, 'grad_norm': 1.3554185628890991, 'learning_rate': 1.2384960688194623e-05, 'epoch': 1.32} +2025-02-05 19:55:40 - ERROR - stderr - 44%|████▍ | 9886/22434 [9:48:00<8:50:46, 2.54s/it] +2025-02-05 19:55:42 - ERROR - stderr - 44%|████▍ | 9887/22434 [9:48:02<8:48:01, 2.53s/it] +2025-02-05 19:55:42 - ERROR - stderr - +2025-02-05 19:55:42 - ERROR - stderr - +2025-02-05 19:55:42 - INFO - stdout - {'loss': 0.7176, 'grad_norm': 1.2020564079284668, 'learning_rate': 1.2383558578247785e-05, 'epoch': 1.32} +2025-02-05 19:55:42 - ERROR - stderr - 44%|████▍ | 9887/22434 [9:48:02<8:48:01, 2.53s/it] +2025-02-05 19:55:45 - ERROR - stderr - 44%|████▍ | 9888/22434 [9:48:05<8:44:38, 2.51s/it] +2025-02-05 19:55:45 - ERROR - stderr - +2025-02-05 19:55:45 - ERROR - stderr - +2025-02-05 19:55:45 - INFO - stdout - {'loss': 0.6737, 'grad_norm': 1.2024348974227905, 'learning_rate': 1.2382156418617948e-05, 'epoch': 1.32} +2025-02-05 19:55:45 - ERROR - stderr - 44%|████▍ | 9888/22434 [9:48:05<8:44:38, 2.51s/it] +2025-02-05 19:55:47 - ERROR - stderr - 44%|████▍ | 9889/22434 [9:48:07<8:42:23, 2.50s/it] 
+2025-02-05 19:55:47 - ERROR - stderr - +2025-02-05 19:55:47 - ERROR - stderr - +2025-02-05 19:55:47 - INFO - stdout - {'loss': 0.6789, 'grad_norm': 1.1332571506500244, 'learning_rate': 1.238075420933434e-05, 'epoch': 1.32} +2025-02-05 19:55:47 - ERROR - stderr - 44%|████▍ | 9889/22434 [9:48:07<8:42:23, 2.50s/it] +2025-02-05 19:55:50 - ERROR - stderr - 44%|████▍ | 9890/22434 [9:48:10<8:52:58, 2.55s/it] +2025-02-05 19:55:50 - ERROR - stderr - +2025-02-05 19:55:50 - ERROR - stderr - +2025-02-05 19:55:50 - INFO - stdout - {'loss': 0.744, 'grad_norm': 1.180061936378479, 'learning_rate': 1.2379351950426188e-05, 'epoch': 1.32} +2025-02-05 19:55:50 - ERROR - stderr - 44%|████▍ | 9890/22434 [9:48:10<8:52:58, 2.55s/it] +2025-02-05 19:55:52 - ERROR - stderr - 44%|████▍ | 9891/22434 [9:48:12<8:47:09, 2.52s/it] +2025-02-05 19:55:52 - ERROR - stderr - +2025-02-05 19:55:52 - ERROR - stderr - +2025-02-05 19:55:52 - INFO - stdout - {'loss': 0.7487, 'grad_norm': 1.1906039714813232, 'learning_rate': 1.2377949641922724e-05, 'epoch': 1.32} +2025-02-05 19:55:52 - ERROR - stderr - 44%|████▍ | 9891/22434 [9:48:12<8:47:09, 2.52s/it] +2025-02-05 19:55:55 - ERROR - stderr - 44%|████▍ | 9892/22434 [9:48:15<8:42:22, 2.50s/it] +2025-02-05 19:55:55 - ERROR - stderr - +2025-02-05 19:55:55 - ERROR - stderr - +2025-02-05 19:55:55 - INFO - stdout - {'loss': 0.7147, 'grad_norm': 1.2536375522613525, 'learning_rate': 1.2376547283853173e-05, 'epoch': 1.32} +2025-02-05 19:55:55 - ERROR - stderr - 44%|████▍ | 9892/22434 [9:48:15<8:42:22, 2.50s/it] +2025-02-05 19:55:58 - ERROR - stderr - 44%|████▍ | 9893/22434 [9:48:17<8:56:29, 2.57s/it] +2025-02-05 19:55:58 - ERROR - stderr - +2025-02-05 19:55:58 - ERROR - stderr - +2025-02-05 19:55:58 - INFO - stdout - {'loss': 0.7792, 'grad_norm': 1.2329235076904297, 'learning_rate': 1.2375144876246771e-05, 'epoch': 1.32} +2025-02-05 19:55:58 - ERROR - stderr - 44%|████▍ | 9893/22434 [9:48:17<8:56:29, 2.57s/it] +2025-02-05 19:56:00 - ERROR - stderr - 44%|████▍ | 
9894/22434 [9:48:20<8:51:58, 2.55s/it] +2025-02-05 19:56:00 - ERROR - stderr - +2025-02-05 19:56:00 - ERROR - stderr - +2025-02-05 19:56:00 - INFO - stdout - {'loss': 0.6915, 'grad_norm': 1.1096115112304688, 'learning_rate': 1.2373742419132744e-05, 'epoch': 1.32} +2025-02-05 19:56:00 - ERROR - stderr - 44%|████▍ | 9894/22434 [9:48:20<8:51:58, 2.55s/it] +2025-02-05 19:56:03 - ERROR - stderr - 44%|████▍ | 9895/22434 [9:48:22<8:46:02, 2.52s/it] +2025-02-05 19:56:03 - ERROR - stderr - +2025-02-05 19:56:03 - ERROR - stderr - +2025-02-05 19:56:03 - INFO - stdout - {'loss': 0.7412, 'grad_norm': 1.2646775245666504, 'learning_rate': 1.2372339912540326e-05, 'epoch': 1.32} +2025-02-05 19:56:03 - ERROR - stderr - 44%|████▍ | 9895/22434 [9:48:22<8:46:02, 2.52s/it] +2025-02-05 19:56:05 - ERROR - stderr - 44%|████▍ | 9896/22434 [9:48:25<8:52:30, 2.55s/it] +2025-02-05 19:56:05 - ERROR - stderr - +2025-02-05 19:56:05 - ERROR - stderr - +2025-02-05 19:56:05 - INFO - stdout - {'loss': 0.7046, 'grad_norm': 1.1520344018936157, 'learning_rate': 1.2370937356498756e-05, 'epoch': 1.32} +2025-02-05 19:56:05 - ERROR - stderr - 44%|████▍ | 9896/22434 [9:48:25<8:52:30, 2.55s/it] +2025-02-05 19:56:08 - ERROR - stderr - 44%|████▍ | 9897/22434 [9:48:27<8:52:38, 2.55s/it] +2025-02-05 19:56:08 - ERROR - stderr - +2025-02-05 19:56:08 - ERROR - stderr - +2025-02-05 19:56:08 - INFO - stdout - {'loss': 0.7025, 'grad_norm': 1.079518437385559, 'learning_rate': 1.2369534751037267e-05, 'epoch': 1.32} +2025-02-05 19:56:08 - ERROR - stderr - 44%|████▍ | 9897/22434 [9:48:28<8:52:38, 2.55s/it] +2025-02-05 19:56:11 - ERROR - stderr - 44%|████▍ | 9898/22434 [9:48:31<9:23:44, 2.70s/it] +2025-02-05 19:56:11 - ERROR - stderr - +2025-02-05 19:56:11 - ERROR - stderr - +2025-02-05 19:56:11 - INFO - stdout - {'loss': 0.7132, 'grad_norm': 1.090664267539978, 'learning_rate': 1.2368132096185091e-05, 'epoch': 1.32} +2025-02-05 19:56:11 - ERROR - stderr - 44%|████▍ | 9898/22434 [9:48:31<9:23:44, 2.70s/it] +2025-02-05
19:56:13 - ERROR - stderr - 44%|████▍ | 9899/22434 [9:48:33<9:13:17, 2.65s/it] +2025-02-05 19:56:13 - ERROR - stderr - +2025-02-05 19:56:13 - ERROR - stderr - +2025-02-05 19:56:13 - INFO - stdout - {'loss': 0.7124, 'grad_norm': 1.2134525775909424, 'learning_rate': 1.2366729391971466e-05, 'epoch': 1.32} +2025-02-05 19:56:13 - ERROR - stderr - 44%|████▍ | 9899/22434 [9:48:33<9:13:17, 2.65s/it] +2025-02-05 19:56:16 - ERROR - stderr - 44%|████▍ | 9900/22434 [9:48:36<9:02:54, 2.60s/it] +2025-02-05 19:56:16 - ERROR - stderr - +2025-02-05 19:56:16 - ERROR - stderr - +2025-02-05 19:56:16 - INFO - stdout - {'loss': 0.6939, 'grad_norm': 1.0870977640151978, 'learning_rate': 1.2365326638425632e-05, 'epoch': 1.32} +2025-02-05 19:56:16 - ERROR - stderr - 44%|████▍ | 9900/22434 [9:48:36<9:02:54, 2.60s/it] +2025-02-05 19:56:18 - ERROR - stderr - 44%|████▍ | 9901/22434 [9:48:38<8:56:25, 2.57s/it] +2025-02-05 19:56:18 - ERROR - stderr - +2025-02-05 19:56:18 - ERROR - stderr - +2025-02-05 19:56:18 - INFO - stdout - {'loss': 0.6478, 'grad_norm': 1.1053849458694458, 'learning_rate': 1.236392383557683e-05, 'epoch': 1.32} +2025-02-05 19:56:18 - ERROR - stderr - 44%|████▍ | 9901/22434 [9:48:38<8:56:25, 2.57s/it] +2025-02-05 19:56:21 - ERROR - stderr - 44%|████▍ | 9902/22434 [9:48:41<9:12:39, 2.65s/it] +2025-02-05 19:56:21 - ERROR - stderr - +2025-02-05 19:56:21 - ERROR - stderr - +2025-02-05 19:56:21 - INFO - stdout - {'loss': 0.706, 'grad_norm': 1.1693007946014404, 'learning_rate': 1.2362520983454295e-05, 'epoch': 1.32} +2025-02-05 19:56:21 - ERROR - stderr - 44%|████▍ | 9902/22434 [9:48:41<9:12:39, 2.65s/it] +2025-02-05 19:56:24 - ERROR - stderr - 44%|████▍ | 9903/22434 [9:48:44<9:11:40, 2.64s/it] +2025-02-05 19:56:24 - ERROR - stderr - +2025-02-05 19:56:24 - ERROR - stderr - +2025-02-05 19:56:24 - INFO - stdout - {'loss': 0.7167, 'grad_norm': 1.253507375717163, 'learning_rate': 1.2361118082087271e-05, 'epoch': 1.32} +2025-02-05 19:56:24 - ERROR - stderr - 44%|████▍ | 9903/22434 
[9:48:44<9:11:40, 2.64s/it] +2025-02-05 19:56:26 - ERROR - stderr - 44%|████▍ | 9904/22434 [9:48:46<8:59:00, 2.58s/it] +2025-02-05 19:56:26 - ERROR - stderr - +2025-02-05 19:56:26 - ERROR - stderr - +2025-02-05 19:56:26 - INFO - stdout - {'loss': 0.7083, 'grad_norm': 1.2455089092254639, 'learning_rate': 1.2359715131505001e-05, 'epoch': 1.32} +2025-02-05 19:56:26 - ERROR - stderr - 44%|████▍ | 9904/22434 [9:48:46<8:59:00, 2.58s/it] +2025-02-05 19:56:29 - ERROR - stderr - 44%|████▍ | 9905/22434 [9:48:48<8:52:03, 2.55s/it] +2025-02-05 19:56:29 - ERROR - stderr - +2025-02-05 19:56:29 - ERROR - stderr - +2025-02-05 19:56:29 - INFO - stdout - {'loss': 0.5995, 'grad_norm': 1.2626858949661255, 'learning_rate': 1.235831213173673e-05, 'epoch': 1.32} +2025-02-05 19:56:29 - ERROR - stderr - 44%|████▍ | 9905/22434 [9:48:48<8:52:03, 2.55s/it] +2025-02-05 19:56:31 - ERROR - stderr - 44%|████▍ | 9906/22434 [9:48:51<9:06:30, 2.62s/it] +2025-02-05 19:56:31 - ERROR - stderr - +2025-02-05 19:56:31 - ERROR - stderr - +2025-02-05 19:56:31 - INFO - stdout - {'loss': 0.8383, 'grad_norm': 1.2876940965652466, 'learning_rate': 1.2356909082811697e-05, 'epoch': 1.32} +2025-02-05 19:56:31 - ERROR - stderr - 44%|████▍ | 9906/22434 [9:48:51<9:06:30, 2.62s/it] +2025-02-05 19:56:34 - ERROR - stderr - 44%|████▍ | 9907/22434 [9:48:54<9:00:41, 2.59s/it] +2025-02-05 19:56:34 - ERROR - stderr - +2025-02-05 19:56:34 - ERROR - stderr - +2025-02-05 19:56:34 - INFO - stdout - {'loss': 0.6272, 'grad_norm': 1.0716735124588013, 'learning_rate': 1.2355505984759148e-05, 'epoch': 1.32} +2025-02-05 19:56:34 - ERROR - stderr - 44%|████▍ | 9907/22434 [9:48:54<9:00:41, 2.59s/it] +2025-02-05 19:56:36 - ERROR - stderr - 44%|████▍ | 9908/22434 [9:48:56<8:55:16, 2.56s/it] +2025-02-05 19:56:37 - ERROR - stderr - +2025-02-05 19:56:37 - ERROR - stderr - +2025-02-05 19:56:37 - INFO - stdout - {'loss': 0.7381, 'grad_norm': 1.3178489208221436, 'learning_rate': 1.2354102837608328e-05, 'epoch': 1.32} +2025-02-05 19:56:37 - ERROR 
- stderr - 44%|████▍ | 9908/22434 [9:48:56<8:55:16, 2.56s/it] +2025-02-05 19:56:39 - ERROR - stderr - 44%|████▍ | 9909/22434 [9:48:59<8:53:19, 2.55s/it] +2025-02-05 19:56:39 - ERROR - stderr - +2025-02-05 19:56:39 - ERROR - stderr - +2025-02-05 19:56:39 - INFO - stdout - {'loss': 0.6908, 'grad_norm': 1.1800323724746704, 'learning_rate': 1.2352699641388493e-05, 'epoch': 1.33} +2025-02-05 19:56:39 - ERROR - stderr - 44%|████▍ | 9909/22434 [9:48:59<8:53:19, 2.55s/it] +2025-02-05 19:56:41 - ERROR - stderr - 44%|████▍ | 9910/22434 [9:49:01<8:47:00, 2.52s/it] +2025-02-05 19:56:41 - ERROR - stderr - +2025-02-05 19:56:41 - ERROR - stderr - +2025-02-05 19:56:41 - INFO - stdout - {'loss': 0.7214, 'grad_norm': 1.3301548957824707, 'learning_rate': 1.2351296396128882e-05, 'epoch': 1.33} +2025-02-05 19:56:41 - ERROR - stderr - 44%|████▍ | 9910/22434 [9:49:01<8:47:00, 2.52s/it] +2025-02-05 19:56:44 - ERROR - stderr - 44%|████▍ | 9911/22434 [9:49:04<8:42:36, 2.50s/it] +2025-02-05 19:56:44 - ERROR - stderr - +2025-02-05 19:56:44 - ERROR - stderr - +2025-02-05 19:56:44 - INFO - stdout - {'loss': 0.7731, 'grad_norm': 1.2469210624694824, 'learning_rate': 1.234989310185875e-05, 'epoch': 1.33} +2025-02-05 19:56:44 - ERROR - stderr - 44%|████▍ | 9911/22434 [9:49:04<8:42:36, 2.50s/it] +2025-02-05 19:56:46 - ERROR - stderr - 44%|████▍ | 9912/22434 [9:49:06<8:43:16, 2.51s/it] +2025-02-05 19:56:46 - ERROR - stderr - +2025-02-05 19:56:46 - ERROR - stderr - +2025-02-05 19:56:46 - INFO - stdout - {'loss': 0.7548, 'grad_norm': 1.2375925779342651, 'learning_rate': 1.2348489758607343e-05, 'epoch': 1.33} +2025-02-05 19:56:46 - ERROR - stderr - 44%|████▍ | 9912/22434 [9:49:06<8:43:16, 2.51s/it] +2025-02-05 19:56:49 - ERROR - stderr - 44%|████▍ | 9913/22434 [9:49:09<8:49:59, 2.54s/it] +2025-02-05 19:56:49 - ERROR - stderr - +2025-02-05 19:56:49 - ERROR - stderr - +2025-02-05 19:56:49 - INFO - stdout - {'loss': 0.7618, 'grad_norm': 1.3275505304336548, 'learning_rate': 1.2347086366403916e-05, 'epoch': 
1.33} +2025-02-05 19:56:49 - ERROR - stderr - 44%|████▍ | 9913/22434 [9:49:09<8:49:59, 2.54s/it] +2025-02-05 19:56:51 - ERROR - stderr - 44%|████▍ | 9914/22434 [9:49:11<8:45:35, 2.52s/it] +2025-02-05 19:56:52 - ERROR - stderr - +2025-02-05 19:56:52 - ERROR - stderr - +2025-02-05 19:56:52 - INFO - stdout - {'loss': 0.6206, 'grad_norm': 1.0136315822601318, 'learning_rate': 1.2345682925277716e-05, 'epoch': 1.33} +2025-02-05 19:56:52 - ERROR - stderr - 44%|████▍ | 9914/22434 [9:49:11<8:45:35, 2.52s/it] +2025-02-05 19:56:54 - ERROR - stderr - 44%|████▍ | 9915/22434 [9:49:14<8:46:26, 2.52s/it] +2025-02-05 19:56:54 - ERROR - stderr - +2025-02-05 19:56:54 - ERROR - stderr - +2025-02-05 19:56:54 - INFO - stdout - {'loss': 0.6523, 'grad_norm': 1.0719951391220093, 'learning_rate': 1.2344279435258003e-05, 'epoch': 1.33} +2025-02-05 19:56:54 - ERROR - stderr - 44%|████▍ | 9915/22434 [9:49:14<8:46:26, 2.52s/it] +2025-02-05 19:56:57 - ERROR - stderr - 44%|████▍ | 9916/22434 [9:49:16<8:53:27, 2.56s/it] +2025-02-05 19:56:57 - ERROR - stderr - +2025-02-05 19:56:57 - ERROR - stderr - +2025-02-05 19:56:57 - INFO - stdout - {'loss': 0.7175, 'grad_norm': 1.3816823959350586, 'learning_rate': 1.2342875896374028e-05, 'epoch': 1.33} +2025-02-05 19:56:57 - ERROR - stderr - 44%|████▍ | 9916/22434 [9:49:16<8:53:27, 2.56s/it] +2025-02-05 19:56:59 - ERROR - stderr - 44%|████▍ | 9917/22434 [9:49:19<8:47:12, 2.53s/it] +2025-02-05 19:56:59 - ERROR - stderr - +2025-02-05 19:56:59 - ERROR - stderr - +2025-02-05 19:56:59 - INFO - stdout - {'loss': 0.6774, 'grad_norm': 1.1184625625610352, 'learning_rate': 1.2341472308655047e-05, 'epoch': 1.33} +2025-02-05 19:56:59 - ERROR - stderr - 44%|████▍ | 9917/22434 [9:49:19<8:47:12, 2.53s/it] +2025-02-05 19:57:02 - ERROR - stderr - 44%|████▍ | 9918/22434 [9:49:22<8:57:46, 2.58s/it] +2025-02-05 19:57:02 - ERROR - stderr - +2025-02-05 19:57:02 - ERROR - stderr - +2025-02-05 19:57:02 - INFO - stdout - {'loss': 0.7036, 'grad_norm': 1.2849780321121216, 
'learning_rate': 1.2340068672130315e-05, 'epoch': 1.33} +2025-02-05 19:57:02 - ERROR - stderr - 44%|████▍ | 9918/22434 [9:49:22<8:57:46, 2.58s/it] +2025-02-05 19:57:04 - ERROR - stderr - 44%|████▍ | 9919/22434 [9:49:24<8:58:24, 2.58s/it] +2025-02-05 19:57:04 - ERROR - stderr - +2025-02-05 19:57:04 - ERROR - stderr - +2025-02-05 19:57:04 - INFO - stdout - {'loss': 0.6618, 'grad_norm': 1.1214709281921387, 'learning_rate': 1.2338664986829092e-05, 'epoch': 1.33} +2025-02-05 19:57:04 - ERROR - stderr - 44%|████▍ | 9919/22434 [9:49:24<8:58:24, 2.58s/it] +2025-02-05 19:57:07 - ERROR - stderr - 44%|████▍ | 9920/22434 [9:49:27<8:54:49, 2.56s/it] +2025-02-05 19:57:07 - ERROR - stderr - +2025-02-05 19:57:07 - ERROR - stderr - +2025-02-05 19:57:07 - INFO - stdout - {'loss': 0.6743, 'grad_norm': 1.1513007879257202, 'learning_rate': 1.2337261252780632e-05, 'epoch': 1.33} +2025-02-05 19:57:07 - ERROR - stderr - 44%|████▍ | 9920/22434 [9:49:27<8:54:49, 2.56s/it] +2025-02-05 19:57:10 - ERROR - stderr - 44%|████▍ | 9921/22434 [9:49:29<8:56:00, 2.57s/it] +2025-02-05 19:57:10 - ERROR - stderr - +2025-02-05 19:57:10 - ERROR - stderr - +2025-02-05 19:57:10 - INFO - stdout - {'loss': 0.6844, 'grad_norm': 1.2045342922210693, 'learning_rate': 1.23358574700142e-05, 'epoch': 1.33} +2025-02-05 19:57:10 - ERROR - stderr - 44%|████▍ | 9921/22434 [9:49:29<8:56:00, 2.57s/it] +2025-02-05 19:57:12 - ERROR - stderr - 44%|████▍ | 9922/22434 [9:49:32<8:46:52, 2.53s/it] +2025-02-05 19:57:12 - ERROR - stderr - +2025-02-05 19:57:12 - ERROR - stderr - +2025-02-05 19:57:12 - INFO - stdout - {'loss': 0.7418, 'grad_norm': 1.112687110900879, 'learning_rate': 1.2334453638559057e-05, 'epoch': 1.33} +2025-02-05 19:57:12 - ERROR - stderr - 44%|████▍ | 9922/22434 [9:49:32<8:46:52, 2.53s/it] +2025-02-05 19:57:14 - ERROR - stderr - 44%|████▍ | 9923/22434 [9:49:34<8:48:00, 2.53s/it] +2025-02-05 19:57:15 - ERROR - stderr - +2025-02-05 19:57:15 - ERROR - stderr - +2025-02-05 19:57:15 - INFO - stdout - {'loss': 0.7144, 
'grad_norm': 1.3229628801345825, 'learning_rate': 1.2333049758444457e-05, 'epoch': 1.33} +2025-02-05 19:57:15 - ERROR - stderr - 44%|████▍ | 9923/22434 [9:49:34<8:48:00, 2.53s/it] +2025-02-05 19:57:17 - ERROR - stderr - 44%|████▍ | 9924/22434 [9:49:37<8:53:32, 2.56s/it] +2025-02-05 19:57:17 - ERROR - stderr - +2025-02-05 19:57:17 - ERROR - stderr - +2025-02-05 19:57:17 - INFO - stdout - {'loss': 0.6736, 'grad_norm': 1.1406444311141968, 'learning_rate': 1.233164582969967e-05, 'epoch': 1.33} +2025-02-05 19:57:17 - ERROR - stderr - 44%|████▍ | 9924/22434 [9:49:37<8:53:32, 2.56s/it] +2025-02-05 19:57:20 - ERROR - stderr - 44%|████▍ | 9925/22434 [9:49:39<8:47:28, 2.53s/it] +2025-02-05 19:57:20 - ERROR - stderr - +2025-02-05 19:57:20 - ERROR - stderr - +2025-02-05 19:57:20 - INFO - stdout - {'loss': 0.702, 'grad_norm': 1.2382177114486694, 'learning_rate': 1.2330241852353959e-05, 'epoch': 1.33} +2025-02-05 19:57:20 - ERROR - stderr - 44%|████▍ | 9925/22434 [9:49:39<8:47:28, 2.53s/it] +2025-02-05 19:57:22 - ERROR - stderr - 44%|████▍ | 9926/22434 [9:49:42<8:46:44, 2.53s/it] +2025-02-05 19:57:22 - ERROR - stderr - +2025-02-05 19:57:22 - ERROR - stderr - +2025-02-05 19:57:22 - INFO - stdout - {'loss': 0.7194, 'grad_norm': 1.1028498411178589, 'learning_rate': 1.2328837826436581e-05, 'epoch': 1.33} +2025-02-05 19:57:22 - ERROR - stderr - 44%|████▍ | 9926/22434 [9:49:42<8:46:44, 2.53s/it] +2025-02-05 19:57:25 - ERROR - stderr - 44%|████▍ | 9927/22434 [9:49:44<8:50:11, 2.54s/it] +2025-02-05 19:57:25 - ERROR - stderr - +2025-02-05 19:57:25 - ERROR - stderr - +2025-02-05 19:57:25 - INFO - stdout - {'loss': 0.7987, 'grad_norm': 1.2120846509933472, 'learning_rate': 1.232743375197681e-05, 'epoch': 1.33} +2025-02-05 19:57:25 - ERROR - stderr - 44%|████▍ | 9927/22434 [9:49:44<8:50:11, 2.54s/it] +2025-02-05 19:57:27 - ERROR - stderr - 44%|████▍ | 9928/22434 [9:49:47<8:49:58, 2.54s/it] +2025-02-05 19:57:27 - ERROR - stderr - +2025-02-05 19:57:27 - ERROR - stderr - +2025-02-05 19:57:27 - 
INFO - stdout - {'loss': 0.812, 'grad_norm': 1.3482658863067627, 'learning_rate': 1.2326029629003908e-05, 'epoch': 1.33} +2025-02-05 19:57:27 - ERROR - stderr - 44%|████▍ | 9928/22434 [9:49:47<8:49:58, 2.54s/it] +2025-02-05 19:57:30 - ERROR - stderr - 44%|████▍ | 9929/22434 [9:49:49<8:47:05, 2.53s/it] +2025-02-05 19:57:30 - ERROR - stderr - +2025-02-05 19:57:30 - ERROR - stderr - +2025-02-05 19:57:30 - INFO - stdout - {'loss': 0.7148, 'grad_norm': 1.215316653251648, 'learning_rate': 1.2324625457547148e-05, 'epoch': 1.33} +2025-02-05 19:57:30 - ERROR - stderr - 44%|████▍ | 9929/22434 [9:49:50<8:47:05, 2.53s/it] +2025-02-05 19:57:32 - ERROR - stderr - 44%|████▍ | 9930/22434 [9:49:52<8:46:29, 2.53s/it] +2025-02-05 19:57:32 - ERROR - stderr - +2025-02-05 19:57:32 - ERROR - stderr - +2025-02-05 19:57:32 - INFO - stdout - {'loss': 0.7102, 'grad_norm': 1.3712847232818604, 'learning_rate': 1.2323221237635791e-05, 'epoch': 1.33} +2025-02-05 19:57:32 - ERROR - stderr - 44%|████▍ | 9930/22434 [9:49:52<8:46:29, 2.53s/it] +2025-02-05 19:57:35 - ERROR - stderr - 44%|████▍ | 9931/22434 [9:49:55<8:46:17, 2.53s/it] +2025-02-05 19:57:35 - ERROR - stderr - +2025-02-05 19:57:35 - ERROR - stderr - +2025-02-05 19:57:35 - INFO - stdout - {'loss': 0.6588, 'grad_norm': 1.1967591047286987, 'learning_rate': 1.2321816969299112e-05, 'epoch': 1.33} +2025-02-05 19:57:35 - ERROR - stderr - 44%|████▍ | 9931/22434 [9:49:55<8:46:17, 2.53s/it] +2025-02-05 19:57:37 - ERROR - stderr - 44%|████▍ | 9932/22434 [9:49:57<8:41:30, 2.50s/it] +2025-02-05 19:57:37 - ERROR - stderr - +2025-02-05 19:57:37 - ERROR - stderr - +2025-02-05 19:57:37 - INFO - stdout - {'loss': 0.6579, 'grad_norm': 1.1134616136550903, 'learning_rate': 1.2320412652566377e-05, 'epoch': 1.33} +2025-02-05 19:57:37 - ERROR - stderr - 44%|████▍ | 9932/22434 [9:49:57<8:41:30, 2.50s/it] +2025-02-05 19:57:40 - ERROR - stderr - 44%|████▍ | 9933/22434 [9:49:59<8:36:35, 2.48s/it] +2025-02-05 19:57:40 - ERROR - stderr - +2025-02-05 19:57:40 - ERROR 
- stderr - +2025-02-05 19:57:40 - INFO - stdout - {'loss': 0.6781, 'grad_norm': 1.2588043212890625, 'learning_rate': 1.2319008287466865e-05, 'epoch': 1.33} +2025-02-05 19:57:40 - ERROR - stderr - 44%|████▍ | 9933/22434 [9:49:59<8:36:35, 2.48s/it] +2025-02-05 19:57:42 - ERROR - stderr - 44%|████▍ | 9934/22434 [9:50:02<8:36:41, 2.48s/it] +2025-02-05 19:57:42 - ERROR - stderr - +2025-02-05 19:57:42 - ERROR - stderr - +2025-02-05 19:57:42 - INFO - stdout - {'loss': 0.704, 'grad_norm': 1.054167628288269, 'learning_rate': 1.2317603874029843e-05, 'epoch': 1.33} +2025-02-05 19:57:42 - ERROR - stderr - 44%|████▍ | 9934/22434 [9:50:02<8:36:41, 2.48s/it] +2025-02-05 19:57:45 - ERROR - stderr - 44%|████▍ | 9935/22434 [9:50:04<8:35:41, 2.48s/it] +2025-02-05 19:57:45 - ERROR - stderr - +2025-02-05 19:57:45 - ERROR - stderr - +2025-02-05 19:57:45 - INFO - stdout - {'loss': 0.7859, 'grad_norm': 1.2699534893035889, 'learning_rate': 1.2316199412284584e-05, 'epoch': 1.33} +2025-02-05 19:57:45 - ERROR - stderr - 44%|████▍ | 9935/22434 [9:50:04<8:35:41, 2.48s/it] +2025-02-05 19:57:47 - ERROR - stderr - 44%|████▍ | 9936/22434 [9:50:07<8:35:13, 2.47s/it] +2025-02-05 19:57:47 - ERROR - stderr - +2025-02-05 19:57:47 - ERROR - stderr - +2025-02-05 19:57:47 - INFO - stdout - {'loss': 0.6493, 'grad_norm': 1.1028108596801758, 'learning_rate': 1.2314794902260368e-05, 'epoch': 1.33} +2025-02-05 19:57:47 - ERROR - stderr - 44%|████▍ | 9936/22434 [9:50:07<8:35:13, 2.47s/it] +2025-02-05 19:57:49 - ERROR - stderr - 44%|████▍ | 9937/22434 [9:50:09<8:32:46, 2.46s/it] +2025-02-05 19:57:50 - ERROR - stderr - +2025-02-05 19:57:50 - ERROR - stderr - +2025-02-05 19:57:50 - INFO - stdout - {'loss': 0.69, 'grad_norm': 1.266394019126892, 'learning_rate': 1.2313390343986467e-05, 'epoch': 1.33} +2025-02-05 19:57:50 - ERROR - stderr - 44%|████▍ | 9937/22434 [9:50:09<8:32:46, 2.46s/it] +2025-02-05 19:57:52 - ERROR - stderr - 44%|████▍ | 9938/22434 [9:50:12<8:35:30, 2.48s/it] +2025-02-05 19:57:52 - ERROR - stderr 
- +2025-02-05 19:57:52 - ERROR - stderr - +2025-02-05 19:57:52 - INFO - stdout - {'loss': 0.6659, 'grad_norm': 1.08072030544281, 'learning_rate': 1.2311985737492155e-05, 'epoch': 1.33} +2025-02-05 19:57:52 - ERROR - stderr - 44%|████▍ | 9938/22434 [9:50:12<8:35:30, 2.48s/it] +2025-02-05 19:57:54 - ERROR - stderr - 44%|████▍ | 9939/22434 [9:50:14<8:37:17, 2.48s/it] +2025-02-05 19:57:55 - ERROR - stderr - +2025-02-05 19:57:55 - ERROR - stderr - +2025-02-05 19:57:55 - INFO - stdout - {'loss': 0.7271, 'grad_norm': 1.2309006452560425, 'learning_rate': 1.2310581082806713e-05, 'epoch': 1.33} +2025-02-05 19:57:55 - ERROR - stderr - 44%|████▍ | 9939/22434 [9:50:14<8:37:17, 2.48s/it] +2025-02-05 19:57:57 - ERROR - stderr - 44%|████▍ | 9940/22434 [9:50:17<8:33:27, 2.47s/it] +2025-02-05 19:57:57 - ERROR - stderr - +2025-02-05 19:57:57 - ERROR - stderr - +2025-02-05 19:57:57 - INFO - stdout - {'loss': 0.8177, 'grad_norm': 1.2696044445037842, 'learning_rate': 1.2309176379959417e-05, 'epoch': 1.33} +2025-02-05 19:57:57 - ERROR - stderr - 44%|████▍ | 9940/22434 [9:50:17<8:33:27, 2.47s/it] +2025-02-05 19:57:59 - ERROR - stderr - 44%|████▍ | 9941/22434 [9:50:19<8:37:53, 2.49s/it] +2025-02-05 19:58:00 - ERROR - stderr - +2025-02-05 19:58:00 - ERROR - stderr - +2025-02-05 19:58:00 - INFO - stdout - {'loss': 0.7734, 'grad_norm': 1.214389443397522, 'learning_rate': 1.2307771628979555e-05, 'epoch': 1.33} +2025-02-05 19:58:00 - ERROR - stderr - 44%|████▍ | 9941/22434 [9:50:19<8:37:53, 2.49s/it] +2025-02-05 19:58:02 - ERROR - stderr - 44%|████▍ | 9942/22434 [9:50:22<8:36:20, 2.48s/it] +2025-02-05 19:58:02 - ERROR - stderr - +2025-02-05 19:58:02 - ERROR - stderr - +2025-02-05 19:58:02 - INFO - stdout - {'loss': 0.6796, 'grad_norm': 1.1185178756713867, 'learning_rate': 1.2306366829896398e-05, 'epoch': 1.33} +2025-02-05 19:58:02 - ERROR - stderr - 44%|████▍ | 9942/22434 [9:50:22<8:36:20, 2.48s/it] +2025-02-05 19:58:05 - ERROR - stderr - 44%|████▍ | 9943/22434 [9:50:24<8:44:07, 2.52s/it] 
+2025-02-05 19:58:05 - ERROR - stderr - +2025-02-05 19:58:05 - ERROR - stderr - +2025-02-05 19:58:05 - INFO - stdout - {'loss': 0.6372, 'grad_norm': 1.2123051881790161, 'learning_rate': 1.2304961982739235e-05, 'epoch': 1.33} +2025-02-05 19:58:05 - ERROR - stderr - 44%|████▍ | 9943/22434 [9:50:24<8:44:07, 2.52s/it] +2025-02-05 19:58:07 - ERROR - stderr - 44%|████▍ | 9944/22434 [9:50:27<8:43:21, 2.51s/it] +2025-02-05 19:58:07 - ERROR - stderr - +2025-02-05 19:58:07 - ERROR - stderr - +2025-02-05 19:58:07 - INFO - stdout - {'loss': 0.6825, 'grad_norm': 1.0923655033111572, 'learning_rate': 1.2303557087537341e-05, 'epoch': 1.33} +2025-02-05 19:58:07 - ERROR - stderr - 44%|████▍ | 9944/22434 [9:50:27<8:43:21, 2.51s/it] +2025-02-05 19:58:10 - ERROR - stderr - 44%|████▍ | 9945/22434 [9:50:29<8:43:37, 2.52s/it] +2025-02-05 19:58:10 - ERROR - stderr - +2025-02-05 19:58:10 - ERROR - stderr - +2025-02-05 19:58:10 - INFO - stdout - {'loss': 0.7774, 'grad_norm': 1.4313167333602905, 'learning_rate': 1.2302152144320005e-05, 'epoch': 1.33} +2025-02-05 19:58:10 - ERROR - stderr - 44%|████▍ | 9945/22434 [9:50:29<8:43:37, 2.52s/it] +2025-02-05 19:58:12 - ERROR - stderr - 44%|████▍ | 9946/22434 [9:50:32<8:42:28, 2.51s/it] +2025-02-05 19:58:12 - ERROR - stderr - +2025-02-05 19:58:12 - ERROR - stderr - +2025-02-05 19:58:12 - INFO - stdout - {'loss': 0.6922, 'grad_norm': 1.140202283859253, 'learning_rate': 1.230074715311651e-05, 'epoch': 1.33} +2025-02-05 19:58:12 - ERROR - stderr - 44%|████▍ | 9946/22434 [9:50:32<8:42:28, 2.51s/it] +2025-02-05 19:58:15 - ERROR - stderr - 44%|████▍ | 9947/22434 [9:50:34<8:41:45, 2.51s/it] +2025-02-05 19:58:15 - ERROR - stderr - +2025-02-05 19:58:15 - ERROR - stderr - +2025-02-05 19:58:15 - INFO - stdout - {'loss': 0.7476, 'grad_norm': 1.2694644927978516, 'learning_rate': 1.2299342113956143e-05, 'epoch': 1.33} +2025-02-05 19:58:15 - ERROR - stderr - 44%|████▍ | 9947/22434 [9:50:34<8:41:45, 2.51s/it] +2025-02-05 19:58:17 - ERROR - stderr - 44%|████▍ | 
9948/22434 [9:50:37<8:48:49, 2.54s/it] +2025-02-05 19:58:17 - ERROR - stderr - +2025-02-05 19:58:17 - ERROR - stderr - +2025-02-05 19:58:17 - INFO - stdout - {'loss': 0.6552, 'grad_norm': 1.143731951713562, 'learning_rate': 1.229793702686819e-05, 'epoch': 1.33} +2025-02-05 19:58:17 - ERROR - stderr - 44%|████▍ | 9948/22434 [9:50:37<8:48:49, 2.54s/it] +2025-02-05 19:58:20 - ERROR - stderr - 44%|████▍ | 9949/22434 [9:50:40<8:52:36, 2.56s/it] +2025-02-05 19:58:20 - ERROR - stderr - +2025-02-05 19:58:20 - ERROR - stderr - +2025-02-05 19:58:20 - INFO - stdout - {'loss': 0.6604, 'grad_norm': 1.2158550024032593, 'learning_rate': 1.2296531891881937e-05, 'epoch': 1.33} +2025-02-05 19:58:20 - ERROR - stderr - 44%|████▍ | 9949/22434 [9:50:40<8:52:36, 2.56s/it] +2025-02-05 19:58:22 - ERROR - stderr - 44%|████▍ | 9950/22434 [9:50:42<8:59:15, 2.59s/it] +2025-02-05 19:58:22 - ERROR - stderr - +2025-02-05 19:58:22 - ERROR - stderr - +2025-02-05 19:58:22 - INFO - stdout - {'loss': 0.6641, 'grad_norm': 1.205003023147583, 'learning_rate': 1.2295126709026679e-05, 'epoch': 1.33} +2025-02-05 19:58:22 - ERROR - stderr - 44%|████▍ | 9950/22434 [9:50:42<8:59:15, 2.59s/it] +2025-02-05 19:58:25 - ERROR - stderr - 44%|████▍ | 9951/22434 [9:50:45<8:57:02, 2.58s/it] +2025-02-05 19:58:25 - ERROR - stderr - +2025-02-05 19:58:25 - ERROR - stderr - +2025-02-05 19:58:25 - INFO - stdout - {'loss': 0.6911, 'grad_norm': 1.2294111251831055, 'learning_rate': 1.2293721478331695e-05, 'epoch': 1.33} +2025-02-05 19:58:25 - ERROR - stderr - 44%|████▍ | 9951/22434 [9:50:45<8:57:02, 2.58s/it] +2025-02-05 19:58:27 - ERROR - stderr - 44%|████▍ | 9952/22434 [9:50:47<8:46:48, 2.53s/it] +2025-02-05 19:58:27 - ERROR - stderr - +2025-02-05 19:58:27 - ERROR - stderr - +2025-02-05 19:58:27 - INFO - stdout - {'loss': 0.7307, 'grad_norm': 1.201937198638916, 'learning_rate': 1.2292316199826285e-05, 'epoch': 1.33} +2025-02-05 19:58:27 - ERROR - stderr - 44%|████▍ | 9952/22434 [9:50:47<8:46:48, 2.53s/it] +2025-02-05 19:58:30 
- ERROR - stderr - 44%|████▍ | 9953/22434 [9:50:50<8:45:25, 2.53s/it] +2025-02-05 19:58:30 - ERROR - stderr - +2025-02-05 19:58:30 - ERROR - stderr - +2025-02-05 19:58:30 - INFO - stdout - {'loss': 0.6972, 'grad_norm': 1.223040223121643, 'learning_rate': 1.2290910873539734e-05, 'epoch': 1.33} +2025-02-05 19:58:30 - ERROR - stderr - 44%|████▍ | 9953/22434 [9:50:50<8:45:25, 2.53s/it] +2025-02-05 19:58:32 - ERROR - stderr - 44%|████▍ | 9954/22434 [9:50:52<8:41:30, 2.51s/it] +2025-02-05 19:58:32 - ERROR - stderr - +2025-02-05 19:58:32 - ERROR - stderr - +2025-02-05 19:58:32 - INFO - stdout - {'loss': 0.6867, 'grad_norm': 1.185958743095398, 'learning_rate': 1.2289505499501341e-05, 'epoch': 1.33} +2025-02-05 19:58:32 - ERROR - stderr - 44%|████▍ | 9954/22434 [9:50:52<8:41:30, 2.51s/it] +2025-02-05 19:58:35 - ERROR - stderr - 44%|████▍ | 9955/22434 [9:50:55<8:41:21, 2.51s/it] +2025-02-05 19:58:35 - ERROR - stderr - +2025-02-05 19:58:35 - ERROR - stderr - +2025-02-05 19:58:35 - INFO - stdout - {'loss': 0.7329, 'grad_norm': 1.09072744846344, 'learning_rate': 1.2288100077740398e-05, 'epoch': 1.33} +2025-02-05 19:58:35 - ERROR - stderr - 44%|████▍ | 9955/22434 [9:50:55<8:41:21, 2.51s/it] +2025-02-05 19:58:37 - ERROR - stderr - 44%|████▍ | 9956/22434 [9:50:57<8:40:37, 2.50s/it] +2025-02-05 19:58:37 - ERROR - stderr - +2025-02-05 19:58:37 - ERROR - stderr - +2025-02-05 19:58:37 - INFO - stdout - {'loss': 0.7516, 'grad_norm': 1.4063736200332642, 'learning_rate': 1.2286694608286197e-05, 'epoch': 1.33} +2025-02-05 19:58:37 - ERROR - stderr - 44%|████▍ | 9956/22434 [9:50:57<8:40:37, 2.50s/it] +2025-02-05 19:58:40 - ERROR - stderr - 44%|████▍ | 9957/22434 [9:51:00<8:40:12, 2.50s/it] +2025-02-05 19:58:40 - ERROR - stderr - +2025-02-05 19:58:40 - ERROR - stderr - +2025-02-05 19:58:40 - INFO - stdout - {'loss': 0.6729, 'grad_norm': 1.1567846536636353, 'learning_rate': 1.2285289091168034e-05, 'epoch': 1.33} +2025-02-05 19:58:40 - ERROR - stderr - 44%|████▍ | 9957/22434 [9:51:00<8:40:12, 
2.50s/it] +2025-02-05 19:58:42 - ERROR - stderr - 44%|████▍ | 9958/22434 [9:51:02<8:39:14, 2.50s/it] +2025-02-05 19:58:42 - ERROR - stderr - +2025-02-05 19:58:42 - ERROR - stderr - +2025-02-05 19:58:42 - INFO - stdout - {'loss': 0.7302, 'grad_norm': 1.10630464553833, 'learning_rate': 1.2283883526415208e-05, 'epoch': 1.33} +2025-02-05 19:58:42 - ERROR - stderr - 44%|████▍ | 9958/22434 [9:51:02<8:39:14, 2.50s/it] +2025-02-05 19:58:45 - ERROR - stderr - 44%|████▍ | 9959/22434 [9:51:05<8:38:12, 2.49s/it] +2025-02-05 19:58:45 - ERROR - stderr - +2025-02-05 19:58:45 - ERROR - stderr - +2025-02-05 19:58:45 - INFO - stdout - {'loss': 0.699, 'grad_norm': 1.1214838027954102, 'learning_rate': 1.2282477914057011e-05, 'epoch': 1.33} +2025-02-05 19:58:45 - ERROR - stderr - 44%|████▍ | 9959/22434 [9:51:05<8:38:12, 2.49s/it] +2025-02-05 19:58:47 - ERROR - stderr - 44%|████▍ | 9960/22434 [9:51:07<8:44:13, 2.52s/it] +2025-02-05 19:58:47 - ERROR - stderr - +2025-02-05 19:58:47 - ERROR - stderr - +2025-02-05 19:58:47 - INFO - stdout - {'loss': 0.6998, 'grad_norm': 1.1347246170043945, 'learning_rate': 1.228107225412275e-05, 'epoch': 1.33} +2025-02-05 19:58:47 - ERROR - stderr - 44%|████▍ | 9960/22434 [9:51:07<8:44:13, 2.52s/it] +2025-02-05 19:58:50 - ERROR - stderr - 44%|████▍ | 9961/22434 [9:51:10<8:44:35, 2.52s/it] +2025-02-05 19:58:50 - ERROR - stderr - +2025-02-05 19:58:50 - ERROR - stderr - +2025-02-05 19:58:50 - INFO - stdout - {'loss': 0.7044, 'grad_norm': 1.2258824110031128, 'learning_rate': 1.227966654664172e-05, 'epoch': 1.33} +2025-02-05 19:58:50 - ERROR - stderr - 44%|████▍ | 9961/22434 [9:51:10<8:44:35, 2.52s/it] +2025-02-05 19:58:53 - ERROR - stderr - 44%|████▍ | 9962/22434 [9:51:12<8:47:51, 2.54s/it] +2025-02-05 19:58:53 - ERROR - stderr - +2025-02-05 19:58:53 - ERROR - stderr - +2025-02-05 19:58:53 - INFO - stdout - {'loss': 0.717, 'grad_norm': 1.1314369440078735, 'learning_rate': 1.2278260791643225e-05, 'epoch': 1.33} +2025-02-05 19:58:53 - ERROR - stderr - 44%|████▍ | 
9962/22434 [9:51:12<8:47:51, 2.54s/it] +2025-02-05 19:58:55 - ERROR - stderr - 44%|████▍ | 9963/22434 [9:51:15<8:49:03, 2.55s/it] +2025-02-05 19:58:55 - ERROR - stderr - +2025-02-05 19:58:55 - ERROR - stderr - +2025-02-05 19:58:55 - INFO - stdout - {'loss': 0.7743, 'grad_norm': 1.336971640586853, 'learning_rate': 1.2276854989156562e-05, 'epoch': 1.33} +2025-02-05 19:58:55 - ERROR - stderr - 44%|████▍ | 9963/22434 [9:51:15<8:49:03, 2.55s/it] +2025-02-05 19:58:58 - ERROR - stderr - 44%|████▍ | 9964/22434 [9:51:17<8:44:22, 2.52s/it] +2025-02-05 19:58:58 - ERROR - stderr - +2025-02-05 19:58:58 - ERROR - stderr - +2025-02-05 19:58:58 - INFO - stdout - {'loss': 0.7857, 'grad_norm': 1.2942824363708496, 'learning_rate': 1.2275449139211034e-05, 'epoch': 1.33} +2025-02-05 19:58:58 - ERROR - stderr - 44%|████▍ | 9964/22434 [9:51:17<8:44:22, 2.52s/it] +2025-02-05 19:59:00 - ERROR - stderr - 44%|████▍ | 9965/22434 [9:51:20<8:44:11, 2.52s/it] +2025-02-05 19:59:00 - ERROR - stderr - +2025-02-05 19:59:00 - ERROR - stderr - +2025-02-05 19:59:00 - INFO - stdout - {'loss': 0.6634, 'grad_norm': 1.0937530994415283, 'learning_rate': 1.2274043241835944e-05, 'epoch': 1.33} +2025-02-05 19:59:00 - ERROR - stderr - 44%|████▍ | 9965/22434 [9:51:20<8:44:11, 2.52s/it] +2025-02-05 19:59:03 - ERROR - stderr - 44%|████▍ | 9966/22434 [9:51:22<8:38:07, 2.49s/it] +2025-02-05 19:59:03 - ERROR - stderr - +2025-02-05 19:59:03 - ERROR - stderr - +2025-02-05 19:59:03 - INFO - stdout - {'loss': 0.7603, 'grad_norm': 1.2330195903778076, 'learning_rate': 1.2272637297060604e-05, 'epoch': 1.33} +2025-02-05 19:59:03 - ERROR - stderr - 44%|████▍ | 9966/22434 [9:51:22<8:38:07, 2.49s/it] +2025-02-05 19:59:05 - ERROR - stderr - 44%|████▍ | 9967/22434 [9:51:25<8:35:30, 2.48s/it] +2025-02-05 19:59:05 - ERROR - stderr - +2025-02-05 19:59:05 - ERROR - stderr - +2025-02-05 19:59:05 - INFO - stdout - {'loss': 0.772, 'grad_norm': 1.3019753694534302, 'learning_rate': 1.227123130491431e-05, 'epoch': 1.33} +2025-02-05 
19:59:05 - ERROR - stderr - 44%|████▍ | 9967/22434 [9:51:25<8:35:30, 2.48s/it] +2025-02-05 19:59:07 - ERROR - stderr - 44%|████▍ | 9968/22434 [9:51:27<8:35:29, 2.48s/it] +2025-02-05 19:59:08 - ERROR - stderr - +2025-02-05 19:59:08 - ERROR - stderr - +2025-02-05 19:59:08 - INFO - stdout - {'loss': 0.7319, 'grad_norm': 1.2161818742752075, 'learning_rate': 1.2269825265426374e-05, 'epoch': 1.33} +2025-02-05 19:59:08 - ERROR - stderr - 44%|████▍ | 9968/22434 [9:51:27<8:35:29, 2.48s/it] +2025-02-05 19:59:10 - ERROR - stderr - 44%|████▍ | 9969/22434 [9:51:30<8:41:54, 2.51s/it] +2025-02-05 19:59:10 - ERROR - stderr - +2025-02-05 19:59:10 - ERROR - stderr - +2025-02-05 19:59:10 - INFO - stdout - {'loss': 0.7614, 'grad_norm': 1.2585318088531494, 'learning_rate': 1.2268419178626104e-05, 'epoch': 1.33} +2025-02-05 19:59:10 - ERROR - stderr - 44%|████▍ | 9969/22434 [9:51:30<8:41:54, 2.51s/it] +2025-02-05 19:59:13 - ERROR - stderr - 44%|████▍ | 9970/22434 [9:51:33<8:54:02, 2.57s/it] +2025-02-05 19:59:13 - ERROR - stderr - +2025-02-05 19:59:13 - ERROR - stderr - +2025-02-05 19:59:13 - INFO - stdout - {'loss': 0.7228, 'grad_norm': 1.1786071062088013, 'learning_rate': 1.2267013044542807e-05, 'epoch': 1.33} +2025-02-05 19:59:13 - ERROR - stderr - 44%|████▍ | 9970/22434 [9:51:33<8:54:02, 2.57s/it] +2025-02-05 19:59:15 - ERROR - stderr - 44%|████▍ | 9971/22434 [9:51:35<8:52:08, 2.56s/it] +2025-02-05 19:59:15 - ERROR - stderr - +2025-02-05 19:59:15 - ERROR - stderr - +2025-02-05 19:59:15 - INFO - stdout - {'loss': 0.6908, 'grad_norm': 1.220621943473816, 'learning_rate': 1.226560686320579e-05, 'epoch': 1.33} +2025-02-05 19:59:15 - ERROR - stderr - 44%|████▍ | 9971/22434 [9:51:35<8:52:08, 2.56s/it] +2025-02-05 19:59:18 - ERROR - stderr - 44%|████▍ | 9972/22434 [9:51:38<9:01:05, 2.61s/it] +2025-02-05 19:59:18 - ERROR - stderr - +2025-02-05 19:59:18 - ERROR - stderr - +2025-02-05 19:59:18 - INFO - stdout - {'loss': 0.6999, 'grad_norm': 1.1607320308685303, 'learning_rate': 
1.2264200634644366e-05, 'epoch': 1.33} +2025-02-05 19:59:18 - ERROR - stderr - 44%|████▍ | 9972/22434 [9:51:38<9:01:05, 2.61s/it] +2025-02-05 19:59:21 - ERROR - stderr - 44%|████▍ | 9973/22434 [9:51:40<8:57:05, 2.59s/it] +2025-02-05 19:59:21 - ERROR - stderr - +2025-02-05 19:59:21 - ERROR - stderr - +2025-02-05 19:59:21 - INFO - stdout - {'loss': 0.8436, 'grad_norm': 1.3242387771606445, 'learning_rate': 1.2262794358887847e-05, 'epoch': 1.33} +2025-02-05 19:59:21 - ERROR - stderr - 44%|████▍ | 9973/22434 [9:51:40<8:57:05, 2.59s/it] +2025-02-05 19:59:23 - ERROR - stderr - 44%|████▍ | 9974/22434 [9:51:43<8:52:27, 2.56s/it] +2025-02-05 19:59:23 - ERROR - stderr - +2025-02-05 19:59:23 - ERROR - stderr - +2025-02-05 19:59:23 - INFO - stdout - {'loss': 0.7423, 'grad_norm': 1.1421105861663818, 'learning_rate': 1.2261388035965544e-05, 'epoch': 1.33} +2025-02-05 19:59:23 - ERROR - stderr - 44%|████▍ | 9974/22434 [9:51:43<8:52:27, 2.56s/it] +2025-02-05 19:59:26 - ERROR - stderr - 44%|████▍ | 9975/22434 [9:51:45<8:48:09, 2.54s/it] +2025-02-05 19:59:26 - ERROR - stderr - +2025-02-05 19:59:26 - ERROR - stderr - +2025-02-05 19:59:26 - INFO - stdout - {'loss': 0.7147, 'grad_norm': 1.1449205875396729, 'learning_rate': 1.2259981665906774e-05, 'epoch': 1.33} +2025-02-05 19:59:26 - ERROR - stderr - 44%|████▍ | 9975/22434 [9:51:45<8:48:09, 2.54s/it] +2025-02-05 19:59:28 - ERROR - stderr - 44%|████▍ | 9976/22434 [9:51:48<8:47:06, 2.54s/it] +2025-02-05 19:59:28 - ERROR - stderr - +2025-02-05 19:59:28 - ERROR - stderr - +2025-02-05 19:59:28 - INFO - stdout - {'loss': 0.7055, 'grad_norm': 1.1216039657592773, 'learning_rate': 1.2258575248740847e-05, 'epoch': 1.33} +2025-02-05 19:59:28 - ERROR - stderr - 44%|████▍ | 9976/22434 [9:51:48<8:47:06, 2.54s/it] +2025-02-05 19:59:31 - ERROR - stderr - 44%|████▍ | 9977/22434 [9:51:50<8:45:34, 2.53s/it] +2025-02-05 19:59:31 - ERROR - stderr - +2025-02-05 19:59:31 - ERROR - stderr - +2025-02-05 19:59:31 - INFO - stdout - {'loss': 0.7334, 'grad_norm':
1.251924991607666, 'learning_rate': 1.225716878449708e-05, 'epoch': 1.33} +2025-02-05 19:59:31 - ERROR - stderr - 44%|████▍ | 9977/22434 [9:51:50<8:45:34, 2.53s/it] +2025-02-05 19:59:33 - ERROR - stderr - 44%|████▍ | 9978/22434 [9:51:53<8:46:03, 2.53s/it] +2025-02-05 19:59:33 - ERROR - stderr - +2025-02-05 19:59:33 - ERROR - stderr - +2025-02-05 19:59:33 - INFO - stdout - {'loss': 0.6734, 'grad_norm': 1.2540733814239502, 'learning_rate': 1.2255762273204788e-05, 'epoch': 1.33} +2025-02-05 19:59:33 - ERROR - stderr - 44%|████▍ | 9978/22434 [9:51:53<8:46:03, 2.53s/it] +2025-02-05 19:59:36 - ERROR - stderr - 44%|████▍ | 9979/22434 [9:51:55<8:48:31, 2.55s/it] +2025-02-05 19:59:36 - ERROR - stderr - +2025-02-05 19:59:36 - ERROR - stderr - +2025-02-05 19:59:36 - INFO - stdout - {'loss': 0.6457, 'grad_norm': 1.111703872680664, 'learning_rate': 1.2254355714893293e-05, 'epoch': 1.33} +2025-02-05 19:59:36 - ERROR - stderr - 44%|████▍ | 9979/22434 [9:51:56<8:48:31, 2.55s/it] +2025-02-05 19:59:38 - ERROR - stderr - 44%|████▍ | 9980/22434 [9:51:58<8:46:42, 2.54s/it] +2025-02-05 19:59:38 - ERROR - stderr - +2025-02-05 19:59:38 - ERROR - stderr - +2025-02-05 19:59:38 - INFO - stdout - {'loss': 0.7308, 'grad_norm': 1.0919688940048218, 'learning_rate': 1.2252949109591908e-05, 'epoch': 1.33} +2025-02-05 19:59:38 - ERROR - stderr - 44%|████▍ | 9980/22434 [9:51:58<8:46:42, 2.54s/it] +2025-02-05 19:59:41 - ERROR - stderr - 44%|████▍ | 9981/22434 [9:52:00<8:43:37, 2.52s/it] +2025-02-05 19:59:41 - ERROR - stderr - +2025-02-05 19:59:41 - ERROR - stderr - +2025-02-05 19:59:41 - INFO - stdout - {'loss': 0.7669, 'grad_norm': 1.1726315021514893, 'learning_rate': 1.2251542457329957e-05, 'epoch': 1.33} +2025-02-05 19:59:41 - ERROR - stderr - 44%|████▍ | 9981/22434 [9:52:01<8:43:37, 2.52s/it] +2025-02-05 19:59:43 - ERROR - stderr - 44%|████▍ | 9982/22434 [9:52:03<8:40:00, 2.51s/it] +2025-02-05 19:59:43 - ERROR - stderr - +2025-02-05 19:59:43 - ERROR - stderr - +2025-02-05 19:59:43 - INFO - stdout 
- {'loss': 0.6425, 'grad_norm': 1.1115158796310425, 'learning_rate': 1.2250135758136757e-05, 'epoch': 1.33} +2025-02-05 19:59:43 - ERROR - stderr - 44%|████▍ | 9982/22434 [9:52:03<8:40:00, 2.51s/it] +2025-02-05 19:59:46 - ERROR - stderr - 44%|████▍ | 9983/22434 [9:52:05<8:41:25, 2.51s/it] +2025-02-05 19:59:46 - ERROR - stderr - +2025-02-05 19:59:46 - ERROR - stderr - +2025-02-05 19:59:46 - INFO - stdout - {'loss': 0.8132, 'grad_norm': 1.1724599599838257, 'learning_rate': 1.224872901204163e-05, 'epoch': 1.33} +2025-02-05 19:59:46 - ERROR - stderr - 44%|████▍ | 9983/22434 [9:52:06<8:41:25, 2.51s/it] +2025-02-05 19:59:48 - ERROR - stderr - 45%|████▍ | 9984/22434 [9:52:08<8:36:22, 2.49s/it] +2025-02-05 19:59:48 - ERROR - stderr - +2025-02-05 19:59:48 - ERROR - stderr - +2025-02-05 19:59:48 - INFO - stdout - {'loss': 0.7509, 'grad_norm': 1.1603496074676514, 'learning_rate': 1.2247322219073898e-05, 'epoch': 1.34} +2025-02-05 19:59:48 - ERROR - stderr - 45%|████▍ | 9984/22434 [9:52:08<8:36:22, 2.49s/it] +2025-02-05 19:59:51 - ERROR - stderr - 45%|████▍ | 9985/22434 [9:52:10<8:40:41, 2.51s/it] +2025-02-05 19:59:51 - ERROR - stderr - +2025-02-05 19:59:51 - ERROR - stderr - +2025-02-05 19:59:51 - INFO - stdout - {'loss': 0.7034, 'grad_norm': 1.2477036714553833, 'learning_rate': 1.2245915379262885e-05, 'epoch': 1.34} +2025-02-05 19:59:51 - ERROR - stderr - 45%|████▍ | 9985/22434 [9:52:11<8:40:41, 2.51s/it] +2025-02-05 19:59:53 - ERROR - stderr - 45%|████▍ | 9986/22434 [9:52:13<8:42:02, 2.52s/it] +2025-02-05 19:59:53 - ERROR - stderr - +2025-02-05 19:59:53 - ERROR - stderr - +2025-02-05 19:59:53 - INFO - stdout - {'loss': 0.7213, 'grad_norm': 1.1882050037384033, 'learning_rate': 1.2244508492637914e-05, 'epoch': 1.34} +2025-02-05 19:59:53 - ERROR - stderr - 45%|████▍ | 9986/22434 [9:52:13<8:42:02, 2.52s/it] +2025-02-05 19:59:56 - ERROR - stderr - 45%|████▍ | 9987/22434 [9:52:15<8:38:03, 2.50s/it] +2025-02-05 19:59:56 - ERROR - stderr - +2025-02-05 19:59:56 - ERROR - stderr - 
+2025-02-05 19:59:56 - INFO - stdout - {'loss': 0.6487, 'grad_norm': 1.2126102447509766, 'learning_rate': 1.2243101559228313e-05, 'epoch': 1.34} +2025-02-05 19:59:56 - ERROR - stderr - 45%|████▍ | 9987/22434 [9:52:15<8:38:03, 2.50s/it] +2025-02-05 19:59:58 - ERROR - stderr - 45%|████▍ | 9988/22434 [9:52:18<8:35:30, 2.49s/it] +2025-02-05 19:59:58 - ERROR - stderr - +2025-02-05 19:59:58 - ERROR - stderr - +2025-02-05 19:59:58 - INFO - stdout - {'loss': 0.7488, 'grad_norm': 1.1055957078933716, 'learning_rate': 1.2241694579063407e-05, 'epoch': 1.34} +2025-02-05 19:59:58 - ERROR - stderr - 45%|████▍ | 9988/22434 [9:52:18<8:35:30, 2.49s/it] +2025-02-05 20:00:01 - ERROR - stderr - 45%|████▍ | 9989/22434 [9:52:20<8:42:18, 2.52s/it] +2025-02-05 20:00:01 - ERROR - stderr - +2025-02-05 20:00:01 - ERROR - stderr - +2025-02-05 20:00:01 - INFO - stdout - {'loss': 0.691, 'grad_norm': 1.0847002267837524, 'learning_rate': 1.2240287552172521e-05, 'epoch': 1.34} +2025-02-05 20:00:01 - ERROR - stderr - 45%|████▍ | 9989/22434 [9:52:21<8:42:18, 2.52s/it] +2025-02-05 20:00:03 - ERROR - stderr - 45%|████▍ | 9990/22434 [9:52:23<8:39:08, 2.50s/it] +2025-02-05 20:00:03 - ERROR - stderr - +2025-02-05 20:00:03 - ERROR - stderr - +2025-02-05 20:00:03 - INFO - stdout - {'loss': 0.7366, 'grad_norm': 1.087023138999939, 'learning_rate': 1.2238880478584987e-05, 'epoch': 1.34} +2025-02-05 20:00:03 - ERROR - stderr - 45%|████▍ | 9990/22434 [9:52:23<8:39:08, 2.50s/it] +2025-02-05 20:00:06 - ERROR - stderr - 45%|████▍ | 9991/22434 [9:52:25<8:37:42, 2.50s/it] +2025-02-05 20:00:06 - ERROR - stderr - +2025-02-05 20:00:06 - ERROR - stderr - +2025-02-05 20:00:06 - INFO - stdout - {'loss': 0.7638, 'grad_norm': 1.242891550064087, 'learning_rate': 1.2237473358330128e-05, 'epoch': 1.34} +2025-02-05 20:00:06 - ERROR - stderr - 45%|████▍ | 9991/22434 [9:52:25<8:37:42, 2.50s/it] +2025-02-05 20:00:08 - ERROR - stderr - 45%|████▍ | 9992/22434 [9:52:28<8:38:16, 2.50s/it] +2025-02-05 20:00:08 - ERROR - stderr - 
+2025-02-05 20:00:08 - ERROR - stderr - +2025-02-05 20:00:08 - INFO - stdout - {'loss': 0.723, 'grad_norm': 1.1691824197769165, 'learning_rate': 1.223606619143728e-05, 'epoch': 1.34} +2025-02-05 20:00:08 - ERROR - stderr - 45%|████▍ | 9992/22434 [9:52:28<8:38:16, 2.50s/it] +2025-02-05 20:00:11 - ERROR - stderr - 45%|████▍ | 9993/22434 [9:52:31<8:43:20, 2.52s/it] +2025-02-05 20:00:11 - ERROR - stderr - +2025-02-05 20:00:11 - ERROR - stderr - +2025-02-05 20:00:11 - INFO - stdout - {'loss': 0.7504, 'grad_norm': 1.1096733808517456, 'learning_rate': 1.2234658977935772e-05, 'epoch': 1.34} +2025-02-05 20:00:11 - ERROR - stderr - 45%|████▍ | 9993/22434 [9:52:31<8:43:20, 2.52s/it] +2025-02-05 20:00:13 - ERROR - stderr - 45%|████▍ | 9994/22434 [9:52:33<8:40:21, 2.51s/it] +2025-02-05 20:00:13 - ERROR - stderr - +2025-02-05 20:00:13 - ERROR - stderr - +2025-02-05 20:00:13 - INFO - stdout - {'loss': 0.624, 'grad_norm': 1.083448886871338, 'learning_rate': 1.2233251717854937e-05, 'epoch': 1.34} +2025-02-05 20:00:13 - ERROR - stderr - 45%|████▍ | 9994/22434 [9:52:33<8:40:21, 2.51s/it] +2025-02-05 20:00:16 - ERROR - stderr - 45%|████▍ | 9995/22434 [9:52:36<9:21:39, 2.71s/it] +2025-02-05 20:00:16 - ERROR - stderr - +2025-02-05 20:00:16 - ERROR - stderr - +2025-02-05 20:00:16 - INFO - stdout - {'loss': 0.7592, 'grad_norm': 1.219514012336731, 'learning_rate': 1.2231844411224105e-05, 'epoch': 1.34} +2025-02-05 20:00:16 - ERROR - stderr - 45%|████▍ | 9995/22434 [9:52:36<9:21:39, 2.71s/it] +2025-02-05 20:00:19 - ERROR - stderr - 45%|████▍ | 9996/22434 [9:52:39<9:13:48, 2.67s/it] +2025-02-05 20:00:19 - ERROR - stderr - +2025-02-05 20:00:19 - ERROR - stderr - +2025-02-05 20:00:19 - INFO - stdout - {'loss': 0.7118, 'grad_norm': 1.1211715936660767, 'learning_rate': 1.2230437058072613e-05, 'epoch': 1.34} +2025-02-05 20:00:19 - ERROR - stderr - 45%|████▍ | 9996/22434 [9:52:39<9:13:48, 2.67s/it] +2025-02-05 20:00:21 - ERROR - stderr - 45%|████▍ | 9997/22434 [9:52:41<8:59:05, 2.60s/it] 
+2025-02-05 20:00:21 - ERROR - stderr - +2025-02-05 20:00:21 - ERROR - stderr - +2025-02-05 20:00:21 - INFO - stdout - {'loss': 0.7098, 'grad_norm': 1.1520240306854248, 'learning_rate': 1.2229029658429795e-05, 'epoch': 1.34} +2025-02-05 20:00:21 - ERROR - stderr - 45%|████▍ | 9997/22434 [9:52:41<8:59:05, 2.60s/it] +2025-02-05 20:00:24 - ERROR - stderr - 45%|████▍ | 9998/22434 [9:52:44<8:52:17, 2.57s/it] +2025-02-05 20:00:24 - ERROR - stderr - +2025-02-05 20:00:24 - ERROR - stderr - +2025-02-05 20:00:24 - INFO - stdout - {'loss': 0.7353, 'grad_norm': 1.2420533895492554, 'learning_rate': 1.2227622212324985e-05, 'epoch': 1.34} +2025-02-05 20:00:24 - ERROR - stderr - 45%|████▍ | 9998/22434 [9:52:44<8:52:17, 2.57s/it] +2025-02-05 20:00:26 - ERROR - stderr - 45%|████▍ | 9999/22434 [9:52:46<8:46:07, 2.54s/it] +2025-02-05 20:00:26 - ERROR - stderr - +2025-02-05 20:00:26 - ERROR - stderr - +2025-02-05 20:00:26 - INFO - stdout - {'loss': 0.7306, 'grad_norm': 1.287726640701294, 'learning_rate': 1.2226214719787524e-05, 'epoch': 1.34} +2025-02-05 20:00:26 - ERROR - stderr - 45%|████▍ | 9999/22434 [9:52:46<8:46:07, 2.54s/it] +2025-02-05 20:00:29 - ERROR - stderr - 45%|████▍ | 10000/22434 [9:52:49<8:42:20, 2.52s/it] +2025-02-05 20:00:29 - ERROR - stderr - +2025-02-05 20:00:29 - ERROR - stderr - +2025-02-05 20:00:29 - INFO - stdout - {'loss': 0.7163, 'grad_norm': 1.2552980184555054, 'learning_rate': 1.2224807180846745e-05, 'epoch': 1.34} +2025-02-05 20:00:29 - ERROR - stderr - 45%|████▍ | 10000/22434 [9:52:49<8:42:20, 2.52s/it] +2025-02-05 20:00:31 - ERROR - stderr - 45%|████▍ | 10001/22434 [9:52:51<8:40:24, 2.51s/it] +2025-02-05 20:00:31 - ERROR - stderr - +2025-02-05 20:00:31 - ERROR - stderr - +2025-02-05 20:00:31 - INFO - stdout - {'loss': 0.6433, 'grad_norm': 1.126163363456726, 'learning_rate': 1.222339959553199e-05, 'epoch': 1.34} +2025-02-05 20:00:31 - ERROR - stderr - 45%|████▍ | 10001/22434 [9:52:51<8:40:24, 2.51s/it] +2025-02-05 20:00:34 - ERROR - stderr - 45%|████▍ | 
10002/22434 [9:52:54<8:50:20, 2.56s/it] +2025-02-05 20:00:34 - ERROR - stderr - +2025-02-05 20:00:34 - ERROR - stderr - +2025-02-05 20:00:34 - INFO - stdout - {'loss': 0.6508, 'grad_norm': 1.1581871509552002, 'learning_rate': 1.2221991963872599e-05, 'epoch': 1.34} +2025-02-05 20:00:34 - ERROR - stderr - 45%|████▍ | 10002/22434 [9:52:54<8:50:20, 2.56s/it] +2025-02-05 20:00:36 - ERROR - stderr - 45%|████▍ | 10003/22434 [9:52:56<8:42:20, 2.52s/it] +2025-02-05 20:00:37 - ERROR - stderr - +2025-02-05 20:00:37 - ERROR - stderr - +2025-02-05 20:00:37 - INFO - stdout - {'loss': 0.7462, 'grad_norm': 1.2027941942214966, 'learning_rate': 1.2220584285897912e-05, 'epoch': 1.34} +2025-02-05 20:00:37 - ERROR - stderr - 45%|████▍ | 10003/22434 [9:52:56<8:42:20, 2.52s/it] +2025-02-05 20:00:39 - ERROR - stderr - 45%|████▍ | 10004/22434 [9:52:59<8:43:47, 2.53s/it] +2025-02-05 20:00:39 - ERROR - stderr - +2025-02-05 20:00:39 - ERROR - stderr - +2025-02-05 20:00:39 - INFO - stdout - {'loss': 0.7032, 'grad_norm': 1.0569887161254883, 'learning_rate': 1.2219176561637267e-05, 'epoch': 1.34} +2025-02-05 20:00:39 - ERROR - stderr - 45%|████▍ | 10004/22434 [9:52:59<8:43:47, 2.53s/it] +2025-02-05 20:00:42 - ERROR - stderr - 45%|████▍ | 10005/22434 [9:53:01<8:41:28, 2.52s/it] +2025-02-05 20:00:42 - ERROR - stderr - +2025-02-05 20:00:42 - ERROR - stderr - +2025-02-05 20:00:42 - INFO - stdout - {'loss': 0.7276, 'grad_norm': 1.1866272687911987, 'learning_rate': 1.2217768791120012e-05, 'epoch': 1.34} +2025-02-05 20:00:42 - ERROR - stderr - 45%|████▍ | 10005/22434 [9:53:01<8:41:28, 2.52s/it] +2025-02-05 20:00:44 - ERROR - stderr - 45%|████▍ | 10006/22434 [9:53:04<8:43:05, 2.53s/it] +2025-02-05 20:00:44 - ERROR - stderr - +2025-02-05 20:00:44 - ERROR - stderr - +2025-02-05 20:00:44 - INFO - stdout - {'loss': 0.7032, 'grad_norm': 1.1435871124267578, 'learning_rate': 1.2216360974375492e-05, 'epoch': 1.34} +2025-02-05 20:00:44 - ERROR - stderr - 45%|████▍ | 10006/22434 [9:53:04<8:43:05, 2.53s/it] 
+2025-02-05 20:00:47 - ERROR - stderr - 45%|████▍ | 10007/22434 [9:53:07<9:05:36, 2.63s/it] +2025-02-05 20:00:47 - ERROR - stderr - +2025-02-05 20:00:47 - ERROR - stderr - +2025-02-05 20:00:47 - INFO - stdout - {'loss': 0.6677, 'grad_norm': 1.259946584701538, 'learning_rate': 1.2214953111433046e-05, 'epoch': 1.34} +2025-02-05 20:00:47 - ERROR - stderr - 45%|████▍ | 10007/22434 [9:53:07<9:05:36, 2.63s/it] +2025-02-05 20:00:49 - ERROR - stderr - 45%|████▍ | 10008/22434 [9:53:09<8:55:43, 2.59s/it] +2025-02-05 20:00:49 - ERROR - stderr - +2025-02-05 20:00:49 - ERROR - stderr - +2025-02-05 20:00:49 - INFO - stdout - {'loss': 0.7667, 'grad_norm': 1.1676573753356934, 'learning_rate': 1.2213545202322021e-05, 'epoch': 1.34} +2025-02-05 20:00:49 - ERROR - stderr - 45%|████▍ | 10008/22434 [9:53:09<8:55:43, 2.59s/it] +2025-02-05 20:00:52 - ERROR - stderr - 45%|████▍ | 10009/22434 [9:53:12<8:50:47, 2.56s/it] +2025-02-05 20:00:52 - ERROR - stderr - +2025-02-05 20:00:52 - ERROR - stderr - +2025-02-05 20:00:52 - INFO - stdout - {'loss': 0.6765, 'grad_norm': 1.0351556539535522, 'learning_rate': 1.2212137247071764e-05, 'epoch': 1.34} +2025-02-05 20:00:52 - ERROR - stderr - 45%|████▍ | 10009/22434 [9:53:12<8:50:47, 2.56s/it] +2025-02-05 20:00:55 - ERROR - stderr - 45%|████▍ | 10010/22434 [9:53:14<8:59:37, 2.61s/it] +2025-02-05 20:00:55 - ERROR - stderr - +2025-02-05 20:00:55 - ERROR - stderr - +2025-02-05 20:00:55 - INFO - stdout - {'loss': 0.6837, 'grad_norm': 1.1760621070861816, 'learning_rate': 1.2210729245711623e-05, 'epoch': 1.34} +2025-02-05 20:00:55 - ERROR - stderr - 45%|████▍ | 10010/22434 [9:53:14<8:59:37, 2.61s/it] +2025-02-05 20:00:58 - ERROR - stderr - 45%|████▍ | 10011/22434 [9:53:17<9:21:11, 2.71s/it] +2025-02-05 20:00:58 - ERROR - stderr - +2025-02-05 20:00:58 - ERROR - stderr - +2025-02-05 20:00:58 - INFO - stdout - {'loss': 0.7661, 'grad_norm': 1.3379417657852173, 'learning_rate': 1.2209321198270947e-05, 'epoch': 1.34} +2025-02-05 20:00:58 - ERROR - stderr - 
45%|████▍ | 10011/22434 [9:53:17<9:21:11, 2.71s/it] +2025-02-05 20:01:00 - ERROR - stderr - 45%|████▍ | 10012/22434 [9:53:20<9:05:00, 2.63s/it] +2025-02-05 20:01:00 - ERROR - stderr - +2025-02-05 20:01:00 - ERROR - stderr - +2025-02-05 20:01:00 - INFO - stdout - {'loss': 0.6646, 'grad_norm': 1.2333307266235352, 'learning_rate': 1.2207913104779086e-05, 'epoch': 1.34} +2025-02-05 20:01:00 - ERROR - stderr - 45%|████▍ | 10012/22434 [9:53:20<9:05:00, 2.63s/it] +2025-02-05 20:01:03 - ERROR - stderr - 45%|████▍ | 10013/22434 [9:53:22<8:59:20, 2.61s/it] +2025-02-05 20:01:03 - ERROR - stderr - +2025-02-05 20:01:03 - ERROR - stderr - +2025-02-05 20:01:03 - INFO - stdout - {'loss': 0.7808, 'grad_norm': 1.216335415840149, 'learning_rate': 1.2206504965265387e-05, 'epoch': 1.34} +2025-02-05 20:01:03 - ERROR - stderr - 45%|████▍ | 10013/22434 [9:53:22<8:59:20, 2.61s/it] +2025-02-05 20:01:05 - ERROR - stderr - 45%|████▍ | 10014/22434 [9:53:25<8:52:51, 2.57s/it] +2025-02-05 20:01:05 - ERROR - stderr - +2025-02-05 20:01:05 - ERROR - stderr - +2025-02-05 20:01:05 - INFO - stdout - {'loss': 0.7606, 'grad_norm': 1.2376459836959839, 'learning_rate': 1.2205096779759207e-05, 'epoch': 1.34} +2025-02-05 20:01:05 - ERROR - stderr - 45%|████▍ | 10014/22434 [9:53:25<8:52:51, 2.57s/it] +2025-02-05 20:01:08 - ERROR - stderr - 45%|████▍ | 10015/22434 [9:53:27<8:45:27, 2.54s/it] +2025-02-05 20:01:08 - ERROR - stderr - +2025-02-05 20:01:08 - ERROR - stderr - +2025-02-05 20:01:08 - INFO - stdout - {'loss': 0.6747, 'grad_norm': 1.0902397632598877, 'learning_rate': 1.2203688548289892e-05, 'epoch': 1.34} +2025-02-05 20:01:08 - ERROR - stderr - 45%|████▍ | 10015/22434 [9:53:27<8:45:27, 2.54s/it] +2025-02-05 20:01:10 - ERROR - stderr - 45%|████▍ | 10016/22434 [9:53:30<8:45:43, 2.54s/it] +2025-02-05 20:01:10 - ERROR - stderr - +2025-02-05 20:01:10 - ERROR - stderr - +2025-02-05 20:01:10 - INFO - stdout - {'loss': 0.6388, 'grad_norm': 1.3385555744171143, 'learning_rate': 1.2202280270886797e-05, 'epoch': 
1.34} +2025-02-05 20:01:10 - ERROR - stderr - 45%|████▍ | 10016/22434 [9:53:30<8:45:43, 2.54s/it] +2025-02-05 20:01:13 - ERROR - stderr - 45%|████▍ | 10017/22434 [9:53:32<8:51:06, 2.57s/it] +2025-02-05 20:01:13 - ERROR - stderr - +2025-02-05 20:01:13 - ERROR - stderr - +2025-02-05 20:01:13 - INFO - stdout - {'loss': 0.7357, 'grad_norm': 1.2003036737442017, 'learning_rate': 1.2200871947579278e-05, 'epoch': 1.34} +2025-02-05 20:01:13 - ERROR - stderr - 45%|████▍ | 10017/22434 [9:53:33<8:51:06, 2.57s/it] +2025-02-05 20:01:15 - ERROR - stderr - 45%|████▍ | 10018/22434 [9:53:35<8:48:23, 2.55s/it] +2025-02-05 20:01:15 - ERROR - stderr - +2025-02-05 20:01:15 - ERROR - stderr - +2025-02-05 20:01:15 - INFO - stdout - {'loss': 0.7085, 'grad_norm': 1.1177383661270142, 'learning_rate': 1.2199463578396688e-05, 'epoch': 1.34} +2025-02-05 20:01:15 - ERROR - stderr - 45%|████▍ | 10018/22434 [9:53:35<8:48:23, 2.55s/it] +2025-02-05 20:01:18 - ERROR - stderr - 45%|████▍ | 10019/22434 [9:53:37<8:40:23, 2.51s/it] +2025-02-05 20:01:18 - ERROR - stderr - +2025-02-05 20:01:18 - ERROR - stderr - +2025-02-05 20:01:18 - INFO - stdout - {'loss': 0.7323, 'grad_norm': 1.393141508102417, 'learning_rate': 1.2198055163368386e-05, 'epoch': 1.34} +2025-02-05 20:01:18 - ERROR - stderr - 45%|████▍ | 10019/22434 [9:53:37<8:40:23, 2.51s/it] +2025-02-05 20:01:20 - ERROR - stderr - 45%|████▍ | 10020/22434 [9:53:40<8:40:19, 2.51s/it] +2025-02-05 20:01:20 - ERROR - stderr - +2025-02-05 20:01:20 - ERROR - stderr - +2025-02-05 20:01:20 - INFO - stdout - {'loss': 0.7566, 'grad_norm': 1.1044131517410278, 'learning_rate': 1.2196646702523726e-05, 'epoch': 1.34} +2025-02-05 20:01:20 - ERROR - stderr - 45%|████▍ | 10020/22434 [9:53:40<8:40:19, 2.51s/it] +2025-02-05 20:01:23 - ERROR - stderr - 45%|████▍ | 10021/22434 [9:53:43<8:46:26, 2.54s/it] +2025-02-05 20:01:23 - ERROR - stderr - +2025-02-05 20:01:23 - ERROR - stderr - +2025-02-05 20:01:23 - INFO - stdout - {'loss': 0.764, 'grad_norm': 1.273234248161316, 
'learning_rate': 1.219523819589207e-05, 'epoch': 1.34} +2025-02-05 20:01:23 - ERROR - stderr - 45%|████▍ | 10021/22434 [9:53:43<8:46:26, 2.54s/it] +2025-02-05 20:01:25 - ERROR - stderr - 45%|████▍ | 10022/22434 [9:53:45<8:42:24, 2.53s/it] +2025-02-05 20:01:25 - ERROR - stderr - +2025-02-05 20:01:25 - ERROR - stderr - +2025-02-05 20:01:25 - INFO - stdout - {'loss': 0.7115, 'grad_norm': 1.2831190824508667, 'learning_rate': 1.2193829643502774e-05, 'epoch': 1.34} +2025-02-05 20:01:25 - ERROR - stderr - 45%|████▍ | 10022/22434 [9:53:45<8:42:24, 2.53s/it] +2025-02-05 20:01:28 - ERROR - stderr - 45%|████▍ | 10023/22434 [9:53:48<8:40:07, 2.51s/it] +2025-02-05 20:01:28 - ERROR - stderr - +2025-02-05 20:01:28 - ERROR - stderr - +2025-02-05 20:01:28 - INFO - stdout - {'loss': 0.711, 'grad_norm': 1.1941145658493042, 'learning_rate': 1.2192421045385194e-05, 'epoch': 1.34} +2025-02-05 20:01:28 - ERROR - stderr - 45%|████▍ | 10023/22434 [9:53:48<8:40:07, 2.51s/it] +2025-02-05 20:01:30 - ERROR - stderr - 45%|████▍ | 10024/22434 [9:53:50<8:40:29, 2.52s/it] +2025-02-05 20:01:30 - ERROR - stderr - +2025-02-05 20:01:30 - ERROR - stderr - +2025-02-05 20:01:30 - INFO - stdout - {'loss': 0.6609, 'grad_norm': 1.0071264505386353, 'learning_rate': 1.2191012401568698e-05, 'epoch': 1.34} +2025-02-05 20:01:30 - ERROR - stderr - 45%|████▍ | 10024/22434 [9:53:50<8:40:29, 2.52s/it] +2025-02-05 20:01:33 - ERROR - stderr - 45%|████▍ | 10025/22434 [9:53:53<8:39:00, 2.51s/it] +2025-02-05 20:01:33 - ERROR - stderr - +2025-02-05 20:01:33 - ERROR - stderr - +2025-02-05 20:01:33 - INFO - stdout - {'loss': 0.7499, 'grad_norm': 1.3105251789093018, 'learning_rate': 1.2189603712082648e-05, 'epoch': 1.34} +2025-02-05 20:01:33 - ERROR - stderr - 45%|████▍ | 10025/22434 [9:53:53<8:39:00, 2.51s/it] +2025-02-05 20:01:35 - ERROR - stderr - 45%|████▍ | 10026/22434 [9:53:55<8:35:03, 2.49s/it] +2025-02-05 20:01:35 - ERROR - stderr - +2025-02-05 20:01:35 - ERROR - stderr - +2025-02-05 20:01:35 - INFO - stdout - 
{'loss': 0.6558, 'grad_norm': 1.1234333515167236, 'learning_rate': 1.21881949769564e-05, 'epoch': 1.34} +2025-02-05 20:01:35 - ERROR - stderr - 45%|████▍ | 10026/22434 [9:53:55<8:35:03, 2.49s/it] +2025-02-05 20:01:38 - ERROR - stderr - 45%|████▍ | 10027/22434 [9:53:57<8:32:44, 2.48s/it] +2025-02-05 20:01:38 - ERROR - stderr - +2025-02-05 20:01:38 - ERROR - stderr - +2025-02-05 20:01:38 - INFO - stdout - {'loss': 0.5798, 'grad_norm': 1.0120511054992676, 'learning_rate': 1.2186786196219324e-05, 'epoch': 1.34} +2025-02-05 20:01:38 - ERROR - stderr - 45%|████▍ | 10027/22434 [9:53:57<8:32:44, 2.48s/it] +2025-02-05 20:01:40 - ERROR - stderr - 45%|████▍ | 10028/22434 [9:54:00<8:33:24, 2.48s/it] +2025-02-05 20:01:40 - ERROR - stderr - +2025-02-05 20:01:40 - ERROR - stderr - +2025-02-05 20:01:40 - INFO - stdout - {'loss': 0.7194, 'grad_norm': 1.2481564283370972, 'learning_rate': 1.2185377369900781e-05, 'epoch': 1.34} +2025-02-05 20:01:40 - ERROR - stderr - 45%|████▍ | 10028/22434 [9:54:00<8:33:24, 2.48s/it] +2025-02-05 20:01:43 - ERROR - stderr - 45%|████▍ | 10029/22434 [9:54:03<8:54:43, 2.59s/it] +2025-02-05 20:01:43 - ERROR - stderr - +2025-02-05 20:01:43 - ERROR - stderr - +2025-02-05 20:01:43 - INFO - stdout - {'loss': 0.7642, 'grad_norm': 1.1985450983047485, 'learning_rate': 1.2183968498030138e-05, 'epoch': 1.34} +2025-02-05 20:01:43 - ERROR - stderr - 45%|████▍ | 10029/22434 [9:54:03<8:54:43, 2.59s/it] +2025-02-05 20:01:45 - ERROR - stderr - 45%|████▍ | 10030/22434 [9:54:05<8:48:13, 2.56s/it] +2025-02-05 20:01:46 - ERROR - stderr - +2025-02-05 20:01:46 - ERROR - stderr - +2025-02-05 20:01:46 - INFO - stdout - {'loss': 0.7396, 'grad_norm': 1.221891164779663, 'learning_rate': 1.218255958063676e-05, 'epoch': 1.34} +2025-02-05 20:01:46 - ERROR - stderr - 45%|████▍ | 10030/22434 [9:54:05<8:48:13, 2.56s/it] +2025-02-05 20:01:48 - ERROR - stderr - 45%|████▍ | 10031/22434 [9:54:08<8:43:44, 2.53s/it] +2025-02-05 20:01:48 - ERROR - stderr - +2025-02-05 20:01:48 - ERROR - stderr 
- +2025-02-05 20:01:48 - INFO - stdout - {'loss': 0.7816, 'grad_norm': 1.2291136980056763, 'learning_rate': 1.218115061775002e-05, 'epoch': 1.34} +2025-02-05 20:01:48 - ERROR - stderr - 45%|████▍ | 10031/22434 [9:54:08<8:43:44, 2.53s/it] +2025-02-05 20:01:48 - INFO - stdout - WARNING: tokenization mismatch: 96 vs. 114. (ignored) +2025-02-05 20:01:50 - ERROR - stderr - 45%|████▍ | 10032/22434 [9:54:10<8:40:40, 2.52s/it] +2025-02-05 20:01:50 - ERROR - stderr - +2025-02-05 20:01:50 - ERROR - stderr - +2025-02-05 20:01:50 - INFO - stdout - {'loss': 0.6409, 'grad_norm': 1.200584053993225, 'learning_rate': 1.2179741609399279e-05, 'epoch': 1.34} +2025-02-05 20:01:50 - ERROR - stderr - 45%|████▍ | 10032/22434 [9:54:10<8:40:40, 2.52s/it] +2025-02-05 20:01:53 - ERROR - stderr - 45%|████▍ | 10033/22434 [9:54:13<8:37:53, 2.51s/it] +2025-02-05 20:01:53 - ERROR - stderr - +2025-02-05 20:01:53 - ERROR - stderr - +2025-02-05 20:01:53 - INFO - stdout - {'loss': 0.8773, 'grad_norm': 1.3587855100631714, 'learning_rate': 1.217833255561391e-05, 'epoch': 1.34} +2025-02-05 20:01:53 - ERROR - stderr - 45%|████▍ | 10033/22434 [9:54:13<8:37:53, 2.51s/it] +2025-02-05 20:01:55 - ERROR - stderr - 45%|████▍ | 10034/22434 [9:54:15<8:37:18, 2.50s/it] +2025-02-05 20:01:55 - ERROR - stderr - +2025-02-05 20:01:55 - ERROR - stderr - +2025-02-05 20:01:55 - INFO - stdout - {'loss': 0.6939, 'grad_norm': 1.1163244247436523, 'learning_rate': 1.2176923456423283e-05, 'epoch': 1.34} +2025-02-05 20:01:55 - ERROR - stderr - 45%|████▍ | 10034/22434 [9:54:15<8:37:18, 2.50s/it] +2025-02-05 20:01:58 - ERROR - stderr - 45%|████▍ | 10035/22434 [9:54:18<8:34:41, 2.49s/it] +2025-02-05 20:01:58 - ERROR - stderr - +2025-02-05 20:01:58 - ERROR - stderr - +2025-02-05 20:01:58 - INFO - stdout - {'loss': 0.6379, 'grad_norm': 1.0339083671569824, 'learning_rate': 1.2175514311856776e-05, 'epoch': 1.34} +2025-02-05 20:01:58 - ERROR - stderr - 45%|████▍ | 10035/22434 [9:54:18<8:34:41, 2.49s/it] +2025-02-05 20:02:00 - ERROR - 
stderr - 45%|████▍ | 10036/22434 [9:54:20<8:34:34, 2.49s/it] +2025-02-05 20:02:00 - ERROR - stderr - +2025-02-05 20:02:00 - ERROR - stderr - +2025-02-05 20:02:00 - INFO - stdout - {'loss': 0.7052, 'grad_norm': 1.0852417945861816, 'learning_rate': 1.2174105121943748e-05, 'epoch': 1.34} +2025-02-05 20:02:00 - ERROR - stderr - 45%|████▍ | 10036/22434 [9:54:20<8:34:34, 2.49s/it] +2025-02-05 20:02:03 - ERROR - stderr - 45%|████▍ | 10037/22434 [9:54:23<8:35:04, 2.49s/it] +2025-02-05 20:02:03 - ERROR - stderr - +2025-02-05 20:02:03 - ERROR - stderr - +2025-02-05 20:02:03 - INFO - stdout - {'loss': 0.7501, 'grad_norm': 1.2457945346832275, 'learning_rate': 1.2172695886713579e-05, 'epoch': 1.34} +2025-02-05 20:02:03 - ERROR - stderr - 45%|████▍ | 10037/22434 [9:54:23<8:35:04, 2.49s/it] +2025-02-05 20:02:05 - ERROR - stderr - 45%|████▍ | 10038/22434 [9:54:25<8:35:45, 2.50s/it] +2025-02-05 20:02:05 - ERROR - stderr - +2025-02-05 20:02:05 - ERROR - stderr - +2025-02-05 20:02:05 - INFO - stdout - {'loss': 0.716, 'grad_norm': 1.1165032386779785, 'learning_rate': 1.2171286606195644e-05, 'epoch': 1.34} +2025-02-05 20:02:05 - ERROR - stderr - 45%|████▍ | 10038/22434 [9:54:25<8:35:45, 2.50s/it] +2025-02-05 20:02:08 - ERROR - stderr - 45%|████▍ | 10039/22434 [9:54:28<8:32:24, 2.48s/it] +2025-02-05 20:02:08 - ERROR - stderr - +2025-02-05 20:02:08 - ERROR - stderr - +2025-02-05 20:02:08 - INFO - stdout - {'loss': 0.7444, 'grad_norm': 1.2431137561798096, 'learning_rate': 1.2169877280419323e-05, 'epoch': 1.34} +2025-02-05 20:02:08 - ERROR - stderr - 45%|████▍ | 10039/22434 [9:54:28<8:32:24, 2.48s/it] +2025-02-05 20:02:10 - ERROR - stderr - 45%|████▍ | 10040/22434 [9:54:30<8:30:11, 2.47s/it] +2025-02-05 20:02:10 - ERROR - stderr - +2025-02-05 20:02:10 - ERROR - stderr - +2025-02-05 20:02:10 - INFO - stdout - {'loss': 0.7077, 'grad_norm': 1.134765386581421, 'learning_rate': 1.2168467909413981e-05, 'epoch': 1.34} +2025-02-05 20:02:10 - ERROR - stderr - 45%|████▍ | 10040/22434 
[9:54:30<8:30:11, 2.47s/it] +2025-02-05 20:02:13 - ERROR - stderr - 45%|████▍ | 10041/22434 [9:54:33<8:32:28, 2.48s/it] +2025-02-05 20:02:13 - ERROR - stderr - +2025-02-05 20:02:13 - ERROR - stderr - +2025-02-05 20:02:13 - INFO - stdout - {'loss': 0.7395, 'grad_norm': 1.2699694633483887, 'learning_rate': 1.2167058493209e-05, 'epoch': 1.34} +2025-02-05 20:02:13 - ERROR - stderr - 45%|████▍ | 10041/22434 [9:54:33<8:32:28, 2.48s/it] +2025-02-05 20:02:15 - ERROR - stderr - 45%|████▍ | 10042/22434 [9:54:35<8:30:11, 2.47s/it] +2025-02-05 20:02:15 - ERROR - stderr - +2025-02-05 20:02:15 - ERROR - stderr - +2025-02-05 20:02:15 - INFO - stdout - {'loss': 0.7448, 'grad_norm': 1.3371946811676025, 'learning_rate': 1.2165649031833761e-05, 'epoch': 1.34} +2025-02-05 20:02:15 - ERROR - stderr - 45%|████▍ | 10042/22434 [9:54:35<8:30:11, 2.47s/it] +2025-02-05 20:02:18 - ERROR - stderr - 45%|████▍ | 10043/22434 [9:54:37<8:30:42, 2.47s/it] +2025-02-05 20:02:18 - ERROR - stderr - +2025-02-05 20:02:18 - ERROR - stderr - +2025-02-05 20:02:18 - INFO - stdout - {'loss': 0.7252, 'grad_norm': 1.2126374244689941, 'learning_rate': 1.2164239525317641e-05, 'epoch': 1.34} +2025-02-05 20:02:18 - ERROR - stderr - 45%|████▍ | 10043/22434 [9:54:37<8:30:42, 2.47s/it] +2025-02-05 20:02:20 - ERROR - stderr - 45%|████▍ | 10044/22434 [9:54:40<8:38:04, 2.51s/it] +2025-02-05 20:02:20 - ERROR - stderr - +2025-02-05 20:02:20 - ERROR - stderr - +2025-02-05 20:02:20 - INFO - stdout - {'loss': 0.6703, 'grad_norm': 1.342572808265686, 'learning_rate': 1.2162829973690015e-05, 'epoch': 1.34} +2025-02-05 20:02:20 - ERROR - stderr - 45%|████▍ | 10044/22434 [9:54:40<8:38:04, 2.51s/it] +2025-02-05 20:02:23 - ERROR - stderr - 45%|████▍ | 10045/22434 [9:54:43<8:37:04, 2.50s/it] +2025-02-05 20:02:23 - ERROR - stderr - +2025-02-05 20:02:23 - ERROR - stderr - +2025-02-05 20:02:23 - INFO - stdout - {'loss': 0.7153, 'grad_norm': 1.1401748657226562, 'learning_rate': 1.2161420376980272e-05, 'epoch': 1.34} +2025-02-05 20:02:23
- ERROR - stderr - 45%|████▍ | 10045/22434 [9:54:43<8:37:04, 2.50s/it] +2025-02-05 20:02:25 - ERROR - stderr - 45%|████▍ | 10046/22434 [9:54:45<8:40:52, 2.52s/it] +2025-02-05 20:02:25 - ERROR - stderr - +2025-02-05 20:02:25 - ERROR - stderr - +2025-02-05 20:02:25 - INFO - stdout - {'loss': 0.7551, 'grad_norm': 1.2594683170318604, 'learning_rate': 1.2160010735217786e-05, 'epoch': 1.34} +2025-02-05 20:02:25 - ERROR - stderr - 45%|████▍ | 10046/22434 [9:54:45<8:40:52, 2.52s/it] +2025-02-05 20:02:28 - ERROR - stderr - 45%|████▍ | 10047/22434 [9:54:48<8:48:37, 2.56s/it] +2025-02-05 20:02:28 - ERROR - stderr - +2025-02-05 20:02:28 - ERROR - stderr - +2025-02-05 20:02:28 - INFO - stdout - {'loss': 0.772, 'grad_norm': 1.3072881698608398, 'learning_rate': 1.2158601048431946e-05, 'epoch': 1.34} +2025-02-05 20:02:28 - ERROR - stderr - 45%|████▍ | 10047/22434 [9:54:48<8:48:37, 2.56s/it] +2025-02-05 20:02:30 - ERROR - stderr - 45%|████▍ | 10048/22434 [9:54:50<8:42:01, 2.53s/it] +2025-02-05 20:02:30 - ERROR - stderr - +2025-02-05 20:02:30 - ERROR - stderr - +2025-02-05 20:02:30 - INFO - stdout - {'loss': 0.6808, 'grad_norm': 1.0567703247070312, 'learning_rate': 1.215719131665213e-05, 'epoch': 1.34} +2025-02-05 20:02:30 - ERROR - stderr - 45%|████▍ | 10048/22434 [9:54:50<8:42:01, 2.53s/it] +2025-02-05 20:02:33 - ERROR - stderr - 45%|████▍ | 10049/22434 [9:54:53<8:38:48, 2.51s/it] +2025-02-05 20:02:33 - ERROR - stderr - +2025-02-05 20:02:33 - ERROR - stderr - +2025-02-05 20:02:33 - INFO - stdout - {'loss': 0.657, 'grad_norm': 1.2708537578582764, 'learning_rate': 1.2155781539907728e-05, 'epoch': 1.34} +2025-02-05 20:02:33 - ERROR - stderr - 45%|████▍ | 10049/22434 [9:54:53<8:38:48, 2.51s/it] +2025-02-05 20:02:35 - ERROR - stderr - 45%|████▍ | 10050/22434 [9:54:55<8:41:09, 2.52s/it] +2025-02-05 20:02:36 - ERROR - stderr - +2025-02-05 20:02:36 - ERROR - stderr - +2025-02-05 20:02:36 - INFO - stdout - {'loss': 0.6643, 'grad_norm': 1.1334589719772339, 'learning_rate': 
1.2154371718228119e-05, 'epoch': 1.34} +2025-02-05 20:02:36 - ERROR - stderr - 45%|████▍ | 10050/22434 [9:54:55<8:41:09, 2.52s/it] +2025-02-05 20:02:38 - ERROR - stderr - 45%|████▍ | 10051/22434 [9:54:58<8:42:20, 2.53s/it] +2025-02-05 20:02:38 - ERROR - stderr - +2025-02-05 20:02:38 - ERROR - stderr - +2025-02-05 20:02:38 - INFO - stdout - {'loss': 0.7279, 'grad_norm': 1.1364762783050537, 'learning_rate': 1.2152961851642697e-05, 'epoch': 1.34} +2025-02-05 20:02:38 - ERROR - stderr - 45%|████▍ | 10051/22434 [9:54:58<8:42:20, 2.53s/it] +2025-02-05 20:02:41 - ERROR - stderr - 45%|████▍ | 10052/22434 [9:55:00<8:40:17, 2.52s/it] +2025-02-05 20:02:41 - ERROR - stderr - +2025-02-05 20:02:41 - ERROR - stderr - +2025-02-05 20:02:41 - INFO - stdout - {'loss': 0.7658, 'grad_norm': 1.359859824180603, 'learning_rate': 1.2151551940180844e-05, 'epoch': 1.34} +2025-02-05 20:02:41 - ERROR - stderr - 45%|████▍ | 10052/22434 [9:55:00<8:40:17, 2.52s/it] +2025-02-05 20:02:43 - ERROR - stderr - 45%|████▍ | 10053/22434 [9:55:03<9:00:13, 2.62s/it] +2025-02-05 20:02:43 - ERROR - stderr - +2025-02-05 20:02:43 - ERROR - stderr - +2025-02-05 20:02:43 - INFO - stdout - {'loss': 0.8151, 'grad_norm': 1.2539238929748535, 'learning_rate': 1.2150141983871948e-05, 'epoch': 1.34} +2025-02-05 20:02:43 - ERROR - stderr - 45%|████▍ | 10053/22434 [9:55:03<9:00:13, 2.62s/it] +2025-02-05 20:02:46 - ERROR - stderr - 45%|████▍ | 10054/22434 [9:55:06<8:51:32, 2.58s/it] +2025-02-05 20:02:46 - ERROR - stderr - +2025-02-05 20:02:46 - ERROR - stderr - +2025-02-05 20:02:46 - INFO - stdout - {'loss': 0.7452, 'grad_norm': 1.0668305158615112, 'learning_rate': 1.21487319827454e-05, 'epoch': 1.34} +2025-02-05 20:02:46 - ERROR - stderr - 45%|████▍ | 10054/22434 [9:55:06<8:51:32, 2.58s/it] +2025-02-05 20:02:48 - ERROR - stderr - 45%|████▍ | 10055/22434 [9:55:08<8:47:13, 2.56s/it] +2025-02-05 20:02:48 - ERROR - stderr - +2025-02-05 20:02:48 - ERROR - stderr - +2025-02-05 20:02:48 - INFO - stdout - {'loss': 0.6928, 
'grad_norm': 1.2387019395828247, 'learning_rate': 1.2147321936830592e-05, 'epoch': 1.34} +2025-02-05 20:02:48 - ERROR - stderr - 45%|████▍ | 10055/22434 [9:55:08<8:47:13, 2.56s/it] +2025-02-05 20:02:51 - ERROR - stderr - 45%|████▍ | 10056/22434 [9:55:11<8:41:51, 2.53s/it] +2025-02-05 20:02:51 - ERROR - stderr - +2025-02-05 20:02:51 - ERROR - stderr - +2025-02-05 20:02:51 - INFO - stdout - {'loss': 0.6606, 'grad_norm': 1.1377508640289307, 'learning_rate': 1.2145911846156912e-05, 'epoch': 1.34} +2025-02-05 20:02:51 - ERROR - stderr - 45%|████▍ | 10056/22434 [9:55:11<8:41:51, 2.53s/it] +2025-02-05 20:02:53 - ERROR - stderr - 45%|████▍ | 10057/22434 [9:55:13<8:40:45, 2.52s/it] +2025-02-05 20:02:53 - ERROR - stderr - +2025-02-05 20:02:53 - ERROR - stderr - +2025-02-05 20:02:53 - INFO - stdout - {'loss': 0.7529, 'grad_norm': 1.2309672832489014, 'learning_rate': 1.2144501710753753e-05, 'epoch': 1.34} +2025-02-05 20:02:53 - ERROR - stderr - 45%|████▍ | 10057/22434 [9:55:13<8:40:45, 2.52s/it] +2025-02-05 20:02:56 - ERROR - stderr - 45%|████▍ | 10058/22434 [9:55:16<8:36:43, 2.51s/it] +2025-02-05 20:02:56 - ERROR - stderr - +2025-02-05 20:02:56 - ERROR - stderr - +2025-02-05 20:02:56 - INFO - stdout - {'loss': 0.6986, 'grad_norm': 1.160562515258789, 'learning_rate': 1.2143091530650508e-05, 'epoch': 1.35} +2025-02-05 20:02:56 - ERROR - stderr - 45%|████▍ | 10058/22434 [9:55:16<8:36:43, 2.51s/it] +2025-02-05 20:02:58 - ERROR - stderr - 45%|████▍ | 10059/22434 [9:55:18<8:36:16, 2.50s/it] +2025-02-05 20:02:58 - ERROR - stderr - +2025-02-05 20:02:58 - ERROR - stderr - +2025-02-05 20:02:58 - INFO - stdout - {'loss': 0.6901, 'grad_norm': 1.0796853303909302, 'learning_rate': 1.2141681305876571e-05, 'epoch': 1.35} +2025-02-05 20:02:58 - ERROR - stderr - 45%|████▍ | 10059/22434 [9:55:18<8:36:16, 2.50s/it] +2025-02-05 20:03:01 - ERROR - stderr - 45%|████▍ | 10060/22434 [9:55:21<8:43:48, 2.54s/it] +2025-02-05 20:03:01 - ERROR - stderr - +2025-02-05 20:03:01 - ERROR - stderr - +2025-02-05 
20:03:01 - INFO - stdout - {'loss': 0.8194, 'grad_norm': 1.256422519683838, 'learning_rate': 1.2140271036461338e-05, 'epoch': 1.35} +2025-02-05 20:03:01 - ERROR - stderr - 45%|████▍ | 10060/22434 [9:55:21<8:43:48, 2.54s/it] +2025-02-05 20:03:03 - ERROR - stderr - 45%|████▍ | 10061/22434 [9:55:23<8:38:09, 2.51s/it] +2025-02-05 20:03:03 - ERROR - stderr - +2025-02-05 20:03:03 - ERROR - stderr - +2025-02-05 20:03:03 - INFO - stdout - {'loss': 0.6154, 'grad_norm': 1.0610862970352173, 'learning_rate': 1.21388607224342e-05, 'epoch': 1.35} +2025-02-05 20:03:03 - ERROR - stderr - 45%|████▍ | 10061/22434 [9:55:23<8:38:09, 2.51s/it] +2025-02-05 20:03:06 - ERROR - stderr - 45%|████▍ | 10062/22434 [9:55:26<8:39:12, 2.52s/it] +2025-02-05 20:03:06 - ERROR - stderr - +2025-02-05 20:03:06 - ERROR - stderr - +2025-02-05 20:03:06 - INFO - stdout - {'loss': 0.71, 'grad_norm': 1.1614086627960205, 'learning_rate': 1.213745036382456e-05, 'epoch': 1.35} +2025-02-05 20:03:06 - ERROR - stderr - 45%|████▍ | 10062/22434 [9:55:26<8:39:12, 2.52s/it] +2025-02-05 20:03:08 - ERROR - stderr - 45%|████▍ | 10063/22434 [9:55:28<8:42:44, 2.54s/it] +2025-02-05 20:03:09 - ERROR - stderr - +2025-02-05 20:03:09 - ERROR - stderr - +2025-02-05 20:03:09 - INFO - stdout - {'loss': 0.746, 'grad_norm': 1.119407057762146, 'learning_rate': 1.213603996066181e-05, 'epoch': 1.35} +2025-02-05 20:03:09 - ERROR - stderr - 45%|████▍ | 10063/22434 [9:55:28<8:42:44, 2.54s/it] +2025-02-05 20:03:11 - ERROR - stderr - 45%|████▍ | 10064/22434 [9:55:31<8:42:52, 2.54s/it] +2025-02-05 20:03:11 - ERROR - stderr - +2025-02-05 20:03:11 - ERROR - stderr - +2025-02-05 20:03:11 - INFO - stdout - {'loss': 0.7542, 'grad_norm': 1.2506989240646362, 'learning_rate': 1.2134629512975352e-05, 'epoch': 1.35} +2025-02-05 20:03:11 - ERROR - stderr - 45%|████▍ | 10064/22434 [9:55:31<8:42:52, 2.54s/it] +2025-02-05 20:03:13 - ERROR - stderr - 45%|████▍ | 10065/22434 [9:55:33<8:36:27, 2.51s/it] +2025-02-05 20:03:13 - ERROR - stderr - +2025-02-05 
20:03:13 - ERROR - stderr - +2025-02-05 20:03:13 - INFO - stdout - {'loss': 0.6634, 'grad_norm': 1.027877688407898, 'learning_rate': 1.2133219020794584e-05, 'epoch': 1.35} +2025-02-05 20:03:13 - ERROR - stderr - 45%|████▍ | 10065/22434 [9:55:33<8:36:27, 2.51s/it] +2025-02-05 20:03:16 - ERROR - stderr - 45%|████▍ | 10066/22434 [9:55:36<8:37:50, 2.51s/it] +2025-02-05 20:03:16 - ERROR - stderr - +2025-02-05 20:03:16 - ERROR - stderr - +2025-02-05 20:03:16 - INFO - stdout - {'loss': 0.6316, 'grad_norm': 1.1118401288986206, 'learning_rate': 1.2131808484148906e-05, 'epoch': 1.35} +2025-02-05 20:03:16 - ERROR - stderr - 45%|████▍ | 10066/22434 [9:55:36<8:37:50, 2.51s/it] +2025-02-05 20:03:19 - ERROR - stderr - 45%|████▍ | 10067/22434 [9:55:38<8:45:49, 2.55s/it] +2025-02-05 20:03:19 - ERROR - stderr - +2025-02-05 20:03:19 - ERROR - stderr - +2025-02-05 20:03:19 - INFO - stdout - {'loss': 0.6814, 'grad_norm': 1.125792145729065, 'learning_rate': 1.2130397903067722e-05, 'epoch': 1.35} +2025-02-05 20:03:19 - ERROR - stderr - 45%|████▍ | 10067/22434 [9:55:38<8:45:49, 2.55s/it] +2025-02-05 20:03:21 - ERROR - stderr - 45%|████▍ | 10068/22434 [9:55:41<8:49:08, 2.57s/it] +2025-02-05 20:03:21 - ERROR - stderr - +2025-02-05 20:03:21 - ERROR - stderr - +2025-02-05 20:03:21 - INFO - stdout - {'loss': 0.715, 'grad_norm': 1.089645504951477, 'learning_rate': 1.2128987277580433e-05, 'epoch': 1.35} +2025-02-05 20:03:21 - ERROR - stderr - 45%|████▍ | 10068/22434 [9:55:41<8:49:08, 2.57s/it] +2025-02-05 20:03:24 - ERROR - stderr - 45%|████▍ | 10069/22434 [9:55:44<8:47:55, 2.56s/it] +2025-02-05 20:03:24 - ERROR - stderr - +2025-02-05 20:03:24 - ERROR - stderr - +2025-02-05 20:03:24 - INFO - stdout - {'loss': 0.7558, 'grad_norm': 1.1538852453231812, 'learning_rate': 1.2127576607716436e-05, 'epoch': 1.35} +2025-02-05 20:03:24 - ERROR - stderr - 45%|████▍ | 10069/22434 [9:55:44<8:47:55, 2.56s/it] +2025-02-05 20:03:26 - ERROR - stderr - 45%|████▍ | 10070/22434 [9:55:46<8:42:37, 2.54s/it] 
+2025-02-05 20:03:26 - ERROR - stderr - +2025-02-05 20:03:26 - ERROR - stderr - +2025-02-05 20:03:26 - INFO - stdout - {'loss': 0.75, 'grad_norm': 1.2567024230957031, 'learning_rate': 1.2126165893505144e-05, 'epoch': 1.35} +2025-02-05 20:03:26 - ERROR - stderr - 45%|████▍ | 10070/22434 [9:55:46<8:42:37, 2.54s/it] +2025-02-05 20:03:29 - ERROR - stderr - 45%|████▍ | 10071/22434 [9:55:49<8:42:13, 2.53s/it] +2025-02-05 20:03:29 - ERROR - stderr - +2025-02-05 20:03:29 - ERROR - stderr - +2025-02-05 20:03:29 - INFO - stdout - {'loss': 0.6881, 'grad_norm': 1.1922539472579956, 'learning_rate': 1.212475513497596e-05, 'epoch': 1.35} +2025-02-05 20:03:29 - ERROR - stderr - 45%|████▍ | 10071/22434 [9:55:49<8:42:13, 2.53s/it] +2025-02-05 20:03:31 - ERROR - stderr - 45%|████▍ | 10072/22434 [9:55:51<8:40:37, 2.53s/it] +2025-02-05 20:03:31 - ERROR - stderr - +2025-02-05 20:03:31 - ERROR - stderr - +2025-02-05 20:03:31 - INFO - stdout - {'loss': 0.6454, 'grad_norm': 1.1519092321395874, 'learning_rate': 1.2123344332158288e-05, 'epoch': 1.35} +2025-02-05 20:03:31 - ERROR - stderr - 45%|████▍ | 10072/22434 [9:55:51<8:40:37, 2.53s/it] +2025-02-05 20:03:34 - ERROR - stderr - 45%|████▍ | 10073/22434 [9:55:54<8:53:03, 2.59s/it] +2025-02-05 20:03:34 - ERROR - stderr - +2025-02-05 20:03:34 - ERROR - stderr - +2025-02-05 20:03:34 - INFO - stdout - {'loss': 0.7817, 'grad_norm': 1.2882055044174194, 'learning_rate': 1.2121933485081536e-05, 'epoch': 1.35} +2025-02-05 20:03:34 - ERROR - stderr - 45%|████▍ | 10073/22434 [9:55:54<8:53:03, 2.59s/it] +2025-02-05 20:03:37 - ERROR - stderr - 45%|████▍ | 10074/22434 [9:55:56<8:49:47, 2.57s/it] +2025-02-05 20:03:37 - ERROR - stderr - +2025-02-05 20:03:37 - ERROR - stderr - +2025-02-05 20:03:37 - INFO - stdout - {'loss': 0.701, 'grad_norm': 1.1061348915100098, 'learning_rate': 1.2120522593775108e-05, 'epoch': 1.35} +2025-02-05 20:03:37 - ERROR - stderr - 45%|████▍ | 10074/22434 [9:55:56<8:49:47, 2.57s/it] +2025-02-05 20:03:39 - ERROR - stderr - 45%|████▍ 
| 10075/22434 [9:55:59<8:45:01, 2.55s/it] +2025-02-05 20:03:39 - ERROR - stderr - +2025-02-05 20:03:39 - ERROR - stderr - +2025-02-05 20:03:39 - INFO - stdout - {'loss': 0.8133, 'grad_norm': 1.2565011978149414, 'learning_rate': 1.2119111658268417e-05, 'epoch': 1.35} +2025-02-05 20:03:39 - ERROR - stderr - 45%|████▍ | 10075/22434 [9:55:59<8:45:01, 2.55s/it] +2025-02-05 20:03:41 - ERROR - stderr - 45%|████▍ | 10076/22434 [9:56:01<8:40:02, 2.52s/it] +2025-02-05 20:03:42 - ERROR - stderr - +2025-02-05 20:03:42 - ERROR - stderr - +2025-02-05 20:03:42 - INFO - stdout - {'loss': 0.7085, 'grad_norm': 1.1008533239364624, 'learning_rate': 1.2117700678590872e-05, 'epoch': 1.35} +2025-02-05 20:03:42 - ERROR - stderr - 45%|████▍ | 10076/22434 [9:56:01<8:40:02, 2.52s/it] +2025-02-05 20:03:44 - ERROR - stderr - 45%|████▍ | 10077/22434 [9:56:04<8:36:28, 2.51s/it] +2025-02-05 20:03:44 - ERROR - stderr - +2025-02-05 20:03:44 - ERROR - stderr - +2025-02-05 20:03:44 - INFO - stdout - {'loss': 0.7531, 'grad_norm': 1.347006916999817, 'learning_rate': 1.211628965477188e-05, 'epoch': 1.35} +2025-02-05 20:03:44 - ERROR - stderr - 45%|████▍ | 10077/22434 [9:56:04<8:36:28, 2.51s/it] +2025-02-05 20:03:46 - ERROR - stderr - 45%|████▍ | 10078/22434 [9:56:06<8:34:30, 2.50s/it] +2025-02-05 20:03:46 - ERROR - stderr - +2025-02-05 20:03:46 - ERROR - stderr - +2025-02-05 20:03:46 - INFO - stdout - {'loss': 0.6955, 'grad_norm': 1.2499759197235107, 'learning_rate': 1.2114878586840856e-05, 'epoch': 1.35} +2025-02-05 20:03:46 - ERROR - stderr - 45%|████▍ | 10078/22434 [9:56:06<8:34:30, 2.50s/it] +2025-02-05 20:03:49 - ERROR - stderr - 45%|████▍ | 10079/22434 [9:56:09<8:34:10, 2.50s/it] +2025-02-05 20:03:49 - ERROR - stderr - +2025-02-05 20:03:49 - ERROR - stderr - +2025-02-05 20:03:49 - INFO - stdout - {'loss': 0.8189, 'grad_norm': 1.1910834312438965, 'learning_rate': 1.2113467474827217e-05, 'epoch': 1.35} +2025-02-05 20:03:49 - ERROR - stderr - 45%|████▍ | 10079/22434 [9:56:09<8:34:10, 2.50s/it] 
+2025-02-05 20:03:51 - ERROR - stderr - 45%|████▍ | 10080/22434 [9:56:11<8:35:27, 2.50s/it] +2025-02-05 20:03:52 - ERROR - stderr - +2025-02-05 20:03:52 - ERROR - stderr - +2025-02-05 20:03:52 - INFO - stdout - {'loss': 0.7567, 'grad_norm': 1.201168179512024, 'learning_rate': 1.2112056318760365e-05, 'epoch': 1.35} +2025-02-05 20:03:52 - ERROR - stderr - 45%|████▍ | 10080/22434 [9:56:11<8:35:27, 2.50s/it] +2025-02-05 20:03:54 - ERROR - stderr - 45%|████▍ | 10081/22434 [9:56:14<8:34:41, 2.50s/it] +2025-02-05 20:03:54 - ERROR - stderr - +2025-02-05 20:03:54 - ERROR - stderr - +2025-02-05 20:03:54 - INFO - stdout - {'loss': 0.6889, 'grad_norm': 1.2161935567855835, 'learning_rate': 1.2110645118669725e-05, 'epoch': 1.35} +2025-02-05 20:03:54 - ERROR - stderr - 45%|████▍ | 10081/22434 [9:56:14<8:34:41, 2.50s/it] +2025-02-05 20:03:57 - ERROR - stderr - 45%|████▍ | 10082/22434 [9:56:16<8:39:05, 2.52s/it] +2025-02-05 20:03:57 - ERROR - stderr - +2025-02-05 20:03:57 - ERROR - stderr - +2025-02-05 20:03:57 - INFO - stdout - {'loss': 0.7523, 'grad_norm': 1.3054873943328857, 'learning_rate': 1.21092338745847e-05, 'epoch': 1.35} +2025-02-05 20:03:57 - ERROR - stderr - 45%|████▍ | 10082/22434 [9:56:16<8:39:05, 2.52s/it] +2025-02-05 20:03:59 - ERROR - stderr - 45%|████▍ | 10083/22434 [9:56:19<8:36:25, 2.51s/it] +2025-02-05 20:03:59 - ERROR - stderr - +2025-02-05 20:03:59 - ERROR - stderr - +2025-02-05 20:03:59 - INFO - stdout - {'loss': 0.6337, 'grad_norm': 1.0133250951766968, 'learning_rate': 1.2107822586534718e-05, 'epoch': 1.35} +2025-02-05 20:03:59 - ERROR - stderr - 45%|████▍ | 10083/22434 [9:56:19<8:36:25, 2.51s/it] +2025-02-05 20:04:01 - ERROR - stderr - 45%|████▍ | 10084/22434 [9:56:21<8:32:00, 2.49s/it] +2025-02-05 20:04:01 - ERROR - stderr - +2025-02-05 20:04:01 - ERROR - stderr - +2025-02-05 20:04:01 - INFO - stdout - {'loss': 0.7759, 'grad_norm': 1.2997405529022217, 'learning_rate': 1.2106411254549191e-05, 'epoch': 1.35} +2025-02-05 20:04:01 - ERROR - stderr - 45%|████▍ 
| 10084/22434 [9:56:21<8:32:00, 2.49s/it] +2025-02-05 20:04:04 - ERROR - stderr - 45%|████▍ | 10085/22434 [9:56:24<8:28:13, 2.47s/it] +2025-02-05 20:04:04 - ERROR - stderr - +2025-02-05 20:04:04 - ERROR - stderr - +2025-02-05 20:04:04 - INFO - stdout - {'loss': 0.6414, 'grad_norm': 1.143696665763855, 'learning_rate': 1.2104999878657535e-05, 'epoch': 1.35} +2025-02-05 20:04:04 - ERROR - stderr - 45%|████▍ | 10085/22434 [9:56:24<8:28:13, 2.47s/it] +2025-02-05 20:04:06 - ERROR - stderr - 45%|████▍ | 10086/22434 [9:56:26<8:25:04, 2.45s/it] +2025-02-05 20:04:06 - ERROR - stderr - +2025-02-05 20:04:06 - ERROR - stderr - +2025-02-05 20:04:06 - INFO - stdout - {'loss': 0.6562, 'grad_norm': 1.2062244415283203, 'learning_rate': 1.2103588458889174e-05, 'epoch': 1.35} +2025-02-05 20:04:06 - ERROR - stderr - 45%|████▍ | 10086/22434 [9:56:26<8:25:04, 2.45s/it] +2025-02-05 20:04:09 - ERROR - stderr - 45%|████▍ | 10087/22434 [9:56:29<8:30:03, 2.48s/it] +2025-02-05 20:04:09 - ERROR - stderr - +2025-02-05 20:04:09 - ERROR - stderr - +2025-02-05 20:04:09 - INFO - stdout - {'loss': 0.779, 'grad_norm': 1.1555273532867432, 'learning_rate': 1.2102176995273522e-05, 'epoch': 1.35} +2025-02-05 20:04:09 - ERROR - stderr - 45%|████▍ | 10087/22434 [9:56:29<8:30:03, 2.48s/it] +2025-02-05 20:04:11 - ERROR - stderr - 45%|████▍ | 10088/22434 [9:56:31<8:30:58, 2.48s/it] +2025-02-05 20:04:11 - ERROR - stderr - +2025-02-05 20:04:11 - ERROR - stderr - +2025-02-05 20:04:11 - INFO - stdout - {'loss': 0.6208, 'grad_norm': 1.207604169845581, 'learning_rate': 1.210076548784e-05, 'epoch': 1.35} +2025-02-05 20:04:11 - ERROR - stderr - 45%|████▍ | 10088/22434 [9:56:31<8:30:58, 2.48s/it] +2025-02-05 20:04:14 - ERROR - stderr - 45%|████▍ | 10089/22434 [9:56:34<8:51:03, 2.58s/it] +2025-02-05 20:04:14 - ERROR - stderr - +2025-02-05 20:04:14 - ERROR - stderr - +2025-02-05 20:04:14 - INFO - stdout - {'loss': 0.7747, 'grad_norm': 1.211506724357605, 'learning_rate': 1.2099353936618035e-05, 'epoch': 1.35} +2025-02-05 
20:04:14 - ERROR - stderr - 45%|████▍ | 10089/22434 [9:56:34<8:51:03, 2.58s/it] +2025-02-05 20:04:17 - ERROR - stderr - 45%|████▍ | 10090/22434 [9:56:36<8:44:49, 2.55s/it] +2025-02-05 20:04:17 - ERROR - stderr - +2025-02-05 20:04:17 - ERROR - stderr - +2025-02-05 20:04:17 - INFO - stdout - {'loss': 0.666, 'grad_norm': 1.1382546424865723, 'learning_rate': 1.2097942341637046e-05, 'epoch': 1.35} +2025-02-05 20:04:17 - ERROR - stderr - 45%|████▍ | 10090/22434 [9:56:36<8:44:49, 2.55s/it] +2025-02-05 20:04:19 - ERROR - stderr - 45%|████▍ | 10091/22434 [9:56:39<9:00:09, 2.63s/it] +2025-02-05 20:04:19 - ERROR - stderr - +2025-02-05 20:04:19 - ERROR - stderr - +2025-02-05 20:04:19 - INFO - stdout - {'loss': 0.8604, 'grad_norm': 1.4062260389328003, 'learning_rate': 1.2096530702926457e-05, 'epoch': 1.35} +2025-02-05 20:04:19 - ERROR - stderr - 45%|████▍ | 10091/22434 [9:56:39<9:00:09, 2.63s/it] +2025-02-05 20:04:22 - ERROR - stderr - 45%|████▍ | 10092/22434 [9:56:42<8:50:51, 2.58s/it] +2025-02-05 20:04:22 - ERROR - stderr - +2025-02-05 20:04:22 - ERROR - stderr - +2025-02-05 20:04:22 - INFO - stdout - {'loss': 0.7143, 'grad_norm': 1.2554432153701782, 'learning_rate': 1.2095119020515691e-05, 'epoch': 1.35} +2025-02-05 20:04:22 - ERROR - stderr - 45%|████▍ | 10092/22434 [9:56:42<8:50:51, 2.58s/it] +2025-02-05 20:04:24 - ERROR - stderr - 45%|████▍ | 10093/22434 [9:56:44<8:44:57, 2.55s/it] +2025-02-05 20:04:24 - ERROR - stderr - +2025-02-05 20:04:24 - ERROR - stderr - +2025-02-05 20:04:24 - INFO - stdout - {'loss': 0.6458, 'grad_norm': 1.175278663635254, 'learning_rate': 1.2093707294434172e-05, 'epoch': 1.35} +2025-02-05 20:04:24 - ERROR - stderr - 45%|████▍ | 10093/22434 [9:56:44<8:44:57, 2.55s/it] +2025-02-05 20:04:27 - ERROR - stderr - 45%|████▍ | 10094/22434 [9:56:47<8:40:32, 2.53s/it] +2025-02-05 20:04:27 - ERROR - stderr - +2025-02-05 20:04:27 - ERROR - stderr - +2025-02-05 20:04:27 - INFO - stdout - {'loss': 0.7555, 'grad_norm': 1.2217447757720947, 'learning_rate': 
1.2092295524711331e-05, 'epoch': 1.35} +2025-02-05 20:04:27 - ERROR - stderr - 45%|████▍ | 10094/22434 [9:56:47<8:40:32, 2.53s/it] +2025-02-05 20:04:29 - ERROR - stderr - 45%|████▍ | 10095/22434 [9:56:49<8:38:48, 2.52s/it] +2025-02-05 20:04:29 - ERROR - stderr - +2025-02-05 20:04:29 - ERROR - stderr - +2025-02-05 20:04:29 - INFO - stdout - {'loss': 0.679, 'grad_norm': 1.1111235618591309, 'learning_rate': 1.2090883711376589e-05, 'epoch': 1.35} +2025-02-05 20:04:29 - ERROR - stderr - 45%|████▍ | 10095/22434 [9:56:49<8:38:48, 2.52s/it] +2025-02-05 20:04:32 - ERROR - stderr - 45%|████▌ | 10096/22434 [9:56:52<8:50:29, 2.58s/it] +2025-02-05 20:04:32 - ERROR - stderr - +2025-02-05 20:04:32 - ERROR - stderr - +2025-02-05 20:04:32 - INFO - stdout - {'loss': 0.6631, 'grad_norm': 1.062126636505127, 'learning_rate': 1.2089471854459375e-05, 'epoch': 1.35} +2025-02-05 20:04:32 - ERROR - stderr - 45%|████▌ | 10096/22434 [9:56:52<8:50:29, 2.58s/it] +2025-02-05 20:04:35 - ERROR - stderr - 45%|████▌ | 10097/22434 [9:56:54<8:45:53, 2.56s/it] +2025-02-05 20:04:35 - ERROR - stderr - +2025-02-05 20:04:35 - ERROR - stderr - +2025-02-05 20:04:35 - INFO - stdout - {'loss': 0.7416, 'grad_norm': 1.0881530046463013, 'learning_rate': 1.2088059953989124e-05, 'epoch': 1.35} +2025-02-05 20:04:35 - ERROR - stderr - 45%|████▌ | 10097/22434 [9:56:54<8:45:53, 2.56s/it] +2025-02-05 20:04:37 - ERROR - stderr - 45%|████▌ | 10098/22434 [9:56:57<8:41:09, 2.53s/it] +2025-02-05 20:04:37 - ERROR - stderr - +2025-02-05 20:04:37 - ERROR - stderr - +2025-02-05 20:04:37 - INFO - stdout - {'loss': 0.7947, 'grad_norm': 1.3496240377426147, 'learning_rate': 1.2086648009995258e-05, 'epoch': 1.35} +2025-02-05 20:04:37 - ERROR - stderr - 45%|████▌ | 10098/22434 [9:56:57<8:41:09, 2.53s/it] +2025-02-05 20:04:40 - ERROR - stderr - 45%|████▌ | 10099/22434 [9:57:00<9:03:53, 2.65s/it] +2025-02-05 20:04:40 - ERROR - stderr - +2025-02-05 20:04:40 - ERROR - stderr - +2025-02-05 20:04:40 - INFO - stdout - {'loss': 0.667, 
'grad_norm': 1.121634840965271, 'learning_rate': 1.2085236022507216e-05, 'epoch': 1.35} +2025-02-05 20:04:40 - ERROR - stderr - 45%|████▌ | 10099/22434 [9:57:00<9:03:53, 2.65s/it] +2025-02-05 20:04:42 - ERROR - stderr - 45%|████▌ | 10100/22434 [9:57:02<8:53:20, 2.59s/it] +2025-02-05 20:04:42 - ERROR - stderr - +2025-02-05 20:04:42 - ERROR - stderr - +2025-02-05 20:04:42 - INFO - stdout - {'loss': 0.6922, 'grad_norm': 1.3094840049743652, 'learning_rate': 1.2083823991554423e-05, 'epoch': 1.35} +2025-02-05 20:04:42 - ERROR - stderr - 45%|████▌ | 10100/22434 [9:57:02<8:53:20, 2.59s/it] +2025-02-05 20:04:45 - ERROR - stderr - 45%|████▌ | 10101/22434 [9:57:05<8:49:00, 2.57s/it] +2025-02-05 20:04:45 - ERROR - stderr - +2025-02-05 20:04:45 - ERROR - stderr - +2025-02-05 20:04:45 - INFO - stdout - {'loss': 0.7694, 'grad_norm': 1.2325226068496704, 'learning_rate': 1.2082411917166308e-05, 'epoch': 1.35} +2025-02-05 20:04:45 - ERROR - stderr - 45%|████▌ | 10101/22434 [9:57:05<8:49:00, 2.57s/it] +2025-02-05 20:04:47 - ERROR - stderr - 45%|████▌ | 10102/22434 [9:57:07<8:43:24, 2.55s/it] +2025-02-05 20:04:47 - ERROR - stderr - +2025-02-05 20:04:47 - ERROR - stderr - +2025-02-05 20:04:47 - INFO - stdout - {'loss': 0.7827, 'grad_norm': 1.30870521068573, 'learning_rate': 1.208099979937231e-05, 'epoch': 1.35} +2025-02-05 20:04:47 - ERROR - stderr - 45%|████▌ | 10102/22434 [9:57:07<8:43:24, 2.55s/it] +2025-02-05 20:04:50 - ERROR - stderr - 45%|████▌ | 10103/22434 [9:57:10<8:45:42, 2.56s/it] +2025-02-05 20:04:50 - ERROR - stderr - +2025-02-05 20:04:50 - ERROR - stderr - +2025-02-05 20:04:50 - INFO - stdout - {'loss': 0.6442, 'grad_norm': 1.1953870058059692, 'learning_rate': 1.2079587638201868e-05, 'epoch': 1.35} +2025-02-05 20:04:50 - ERROR - stderr - 45%|████▌ | 10103/22434 [9:57:10<8:45:42, 2.56s/it] +2025-02-05 20:04:53 - ERROR - stderr - 45%|████▌ | 10104/22434 [9:57:12<8:42:50, 2.54s/it] +2025-02-05 20:04:53 - ERROR - stderr - +2025-02-05 20:04:53 - ERROR - stderr - +2025-02-05 
20:04:53 - INFO - stdout - {'loss': 0.7573, 'grad_norm': 1.248350739479065, 'learning_rate': 1.2078175433684407e-05, 'epoch': 1.35} +2025-02-05 20:04:53 - ERROR - stderr - 45%|████▌ | 10104/22434 [9:57:12<8:42:50, 2.54s/it] +2025-02-05 20:04:55 - ERROR - stderr - 45%|████▌ | 10105/22434 [9:57:15<8:36:29, 2.51s/it] +2025-02-05 20:04:55 - ERROR - stderr - +2025-02-05 20:04:55 - ERROR - stderr - +2025-02-05 20:04:55 - INFO - stdout - {'loss': 0.7686, 'grad_norm': 1.270564079284668, 'learning_rate': 1.2076763185849369e-05, 'epoch': 1.35} +2025-02-05 20:04:55 - ERROR - stderr - 45%|████▌ | 10105/22434 [9:57:15<8:36:29, 2.51s/it] +2025-02-05 20:04:57 - ERROR - stderr - 45%|████▌ | 10106/22434 [9:57:17<8:31:26, 2.49s/it] +2025-02-05 20:04:57 - ERROR - stderr - +2025-02-05 20:04:57 - ERROR - stderr - +2025-02-05 20:04:57 - INFO - stdout - {'loss': 0.8006, 'grad_norm': 1.2915006875991821, 'learning_rate': 1.207535089472619e-05, 'epoch': 1.35} +2025-02-05 20:04:57 - ERROR - stderr - 45%|████▌ | 10106/22434 [9:57:17<8:31:26, 2.49s/it] +2025-02-05 20:05:00 - ERROR - stderr - 45%|████▌ | 10107/22434 [9:57:20<8:33:11, 2.50s/it] +2025-02-05 20:05:00 - ERROR - stderr - +2025-02-05 20:05:00 - ERROR - stderr - +2025-02-05 20:05:00 - INFO - stdout - {'loss': 0.7342, 'grad_norm': 1.159336805343628, 'learning_rate': 1.2073938560344308e-05, 'epoch': 1.35} +2025-02-05 20:05:00 - ERROR - stderr - 45%|████▌ | 10107/22434 [9:57:20<8:33:11, 2.50s/it] +2025-02-05 20:05:03 - ERROR - stderr - 45%|████▌ | 10108/22434 [9:57:22<8:41:05, 2.54s/it] +2025-02-05 20:05:03 - ERROR - stderr - +2025-02-05 20:05:03 - ERROR - stderr - +2025-02-05 20:05:03 - INFO - stdout - {'loss': 0.6829, 'grad_norm': 1.2063605785369873, 'learning_rate': 1.207252618273316e-05, 'epoch': 1.35} +2025-02-05 20:05:03 - ERROR - stderr - 45%|████▌ | 10108/22434 [9:57:22<8:41:05, 2.54s/it] +2025-02-05 20:05:05 - ERROR - stderr - 45%|████▌ | 10109/22434 [9:57:25<8:41:18, 2.54s/it] +2025-02-05 20:05:05 - ERROR - stderr - +2025-02-05 
20:05:05 - ERROR - stderr - +2025-02-05 20:05:05 - INFO - stdout - {'loss': 0.7762, 'grad_norm': 1.2224453687667847, 'learning_rate': 1.2071113761922187e-05, 'epoch': 1.35} +2025-02-05 20:05:05 - ERROR - stderr - 45%|████▌ | 10109/22434 [9:57:25<8:41:18, 2.54s/it] +2025-02-05 20:05:08 - ERROR - stderr - 45%|████▌ | 10110/22434 [9:57:27<8:37:45, 2.52s/it] +2025-02-05 20:05:08 - ERROR - stderr - +2025-02-05 20:05:08 - ERROR - stderr - +2025-02-05 20:05:08 - INFO - stdout - {'loss': 0.7375, 'grad_norm': 1.2698389291763306, 'learning_rate': 1.206970129794083e-05, 'epoch': 1.35} +2025-02-05 20:05:08 - ERROR - stderr - 45%|████▌ | 10110/22434 [9:57:27<8:37:45, 2.52s/it] +2025-02-05 20:05:10 - ERROR - stderr - 45%|████▌ | 10111/22434 [9:57:30<8:33:21, 2.50s/it] +2025-02-05 20:05:10 - ERROR - stderr - +2025-02-05 20:05:10 - ERROR - stderr - +2025-02-05 20:05:10 - INFO - stdout - {'loss': 0.6884, 'grad_norm': 1.2487092018127441, 'learning_rate': 1.206828879081853e-05, 'epoch': 1.35} +2025-02-05 20:05:10 - ERROR - stderr - 45%|████▌ | 10111/22434 [9:57:30<8:33:21, 2.50s/it] +2025-02-05 20:05:13 - ERROR - stderr - 45%|████▌ | 10112/22434 [9:57:32<8:43:50, 2.55s/it] +2025-02-05 20:05:13 - ERROR - stderr - +2025-02-05 20:05:13 - ERROR - stderr - +2025-02-05 20:05:13 - INFO - stdout - {'loss': 0.6999, 'grad_norm': 1.1894068717956543, 'learning_rate': 1.206687624058473e-05, 'epoch': 1.35} +2025-02-05 20:05:13 - ERROR - stderr - 45%|████▌ | 10112/22434 [9:57:33<8:43:50, 2.55s/it] +2025-02-05 20:05:15 - ERROR - stderr - 45%|████▌ | 10113/22434 [9:57:35<8:39:50, 2.53s/it] +2025-02-05 20:05:15 - ERROR - stderr - +2025-02-05 20:05:15 - ERROR - stderr - +2025-02-05 20:05:15 - INFO - stdout - {'loss': 0.6796, 'grad_norm': 1.1236110925674438, 'learning_rate': 1.2065463647268872e-05, 'epoch': 1.35} +2025-02-05 20:05:15 - ERROR - stderr - 45%|████▌ | 10113/22434 [9:57:35<8:39:50, 2.53s/it] +2025-02-05 20:05:18 - ERROR - stderr - 45%|████▌ | 10114/22434 [9:57:38<8:43:19, 2.55s/it] 
+2025-02-05 20:05:18 - ERROR - stderr - +2025-02-05 20:05:18 - ERROR - stderr - +2025-02-05 20:05:18 - INFO - stdout - {'loss': 0.7304, 'grad_norm': 1.1820405721664429, 'learning_rate': 1.2064051010900397e-05, 'epoch': 1.35} +2025-02-05 20:05:18 - ERROR - stderr - 45%|████▌ | 10114/22434 [9:57:38<8:43:19, 2.55s/it] +2025-02-05 20:05:20 - ERROR - stderr - 45%|████▌ | 10115/22434 [9:57:40<8:50:27, 2.58s/it] +2025-02-05 20:05:20 - ERROR - stderr - +2025-02-05 20:05:20 - ERROR - stderr - +2025-02-05 20:05:20 - INFO - stdout - {'loss': 0.7668, 'grad_norm': 1.167473316192627, 'learning_rate': 1.2062638331508757e-05, 'epoch': 1.35} +2025-02-05 20:05:20 - ERROR - stderr - 45%|████▌ | 10115/22434 [9:57:40<8:50:27, 2.58s/it] +2025-02-05 20:05:23 - ERROR - stderr - 45%|████▌ | 10116/22434 [9:57:43<8:58:27, 2.62s/it] +2025-02-05 20:05:23 - ERROR - stderr - +2025-02-05 20:05:23 - ERROR - stderr - +2025-02-05 20:05:23 - INFO - stdout - {'loss': 0.7662, 'grad_norm': 1.275689721107483, 'learning_rate': 1.2061225609123397e-05, 'epoch': 1.35} +2025-02-05 20:05:23 - ERROR - stderr - 45%|████▌ | 10116/22434 [9:57:43<8:58:27, 2.62s/it] +2025-02-05 20:05:26 - ERROR - stderr - 45%|████▌ | 10117/22434 [9:57:45<8:53:37, 2.60s/it] +2025-02-05 20:05:26 - ERROR - stderr - +2025-02-05 20:05:26 - ERROR - stderr - +2025-02-05 20:05:26 - INFO - stdout - {'loss': 0.7624, 'grad_norm': 1.28383469581604, 'learning_rate': 1.205981284377376e-05, 'epoch': 1.35} +2025-02-05 20:05:26 - ERROR - stderr - 45%|████▌ | 10117/22434 [9:57:46<8:53:37, 2.60s/it] +2025-02-05 20:05:28 - ERROR - stderr - 45%|████▌ | 10118/22434 [9:57:48<8:46:50, 2.57s/it] +2025-02-05 20:05:28 - ERROR - stderr - +2025-02-05 20:05:28 - ERROR - stderr - +2025-02-05 20:05:28 - INFO - stdout - {'loss': 0.7694, 'grad_norm': 1.2344011068344116, 'learning_rate': 1.2058400035489293e-05, 'epoch': 1.35} +2025-02-05 20:05:28 - ERROR - stderr - 45%|████▌ | 10118/22434 [9:57:48<8:46:50, 2.57s/it] +2025-02-05 20:05:31 - ERROR - stderr - 45%|████▌ | 
10119/22434 [9:57:50<8:41:16, 2.54s/it] +2025-02-05 20:05:31 - ERROR - stderr - +2025-02-05 20:05:31 - ERROR - stderr - +2025-02-05 20:05:31 - INFO - stdout - {'loss': 0.6648, 'grad_norm': 1.1365541219711304, 'learning_rate': 1.2056987184299449e-05, 'epoch': 1.35} +2025-02-05 20:05:31 - ERROR - stderr - 45%|████▌ | 10119/22434 [9:57:50<8:41:16, 2.54s/it] +2025-02-05 20:05:33 - ERROR - stderr - 45%|████▌ | 10120/22434 [9:57:53<8:59:18, 2.63s/it] +2025-02-05 20:05:34 - ERROR - stderr - +2025-02-05 20:05:34 - ERROR - stderr - +2025-02-05 20:05:34 - INFO - stdout - {'loss': 0.7238, 'grad_norm': 1.1914833784103394, 'learning_rate': 1.2055574290233673e-05, 'epoch': 1.35} +2025-02-05 20:05:34 - ERROR - stderr - 45%|████▌ | 10120/22434 [9:57:53<8:59:18, 2.63s/it] +2025-02-05 20:05:36 - ERROR - stderr - 45%|████▌ | 10121/22434 [9:57:56<8:56:50, 2.62s/it] +2025-02-05 20:05:36 - ERROR - stderr - +2025-02-05 20:05:36 - ERROR - stderr - +2025-02-05 20:05:36 - INFO - stdout - {'loss': 0.7894, 'grad_norm': 1.2482595443725586, 'learning_rate': 1.205416135332142e-05, 'epoch': 1.35} +2025-02-05 20:05:36 - ERROR - stderr - 45%|████▌ | 10121/22434 [9:57:56<8:56:50, 2.62s/it] +2025-02-05 20:05:39 - ERROR - stderr - 45%|████▌ | 10122/22434 [9:57:58<8:46:40, 2.57s/it] +2025-02-05 20:05:39 - ERROR - stderr - +2025-02-05 20:05:39 - ERROR - stderr - +2025-02-05 20:05:39 - INFO - stdout - {'loss': 0.7372, 'grad_norm': 1.226043701171875, 'learning_rate': 1.205274837359214e-05, 'epoch': 1.35} +2025-02-05 20:05:39 - ERROR - stderr - 45%|████▌ | 10122/22434 [9:57:58<8:46:40, 2.57s/it] +2025-02-05 20:05:41 - ERROR - stderr - 45%|████▌ | 10123/22434 [9:58:01<8:49:51, 2.58s/it] +2025-02-05 20:05:41 - ERROR - stderr - +2025-02-05 20:05:41 - ERROR - stderr - +2025-02-05 20:05:41 - INFO - stdout - {'loss': 0.6894, 'grad_norm': 1.1266580820083618, 'learning_rate': 1.2051335351075284e-05, 'epoch': 1.35} +2025-02-05 20:05:41 - ERROR - stderr - 45%|████▌ | 10123/22434 [9:58:01<8:49:51, 2.58s/it] 
+2025-02-05 20:05:44 - ERROR - stderr - 45%|████▌ | 10124/22434 [9:58:03<8:44:13, 2.56s/it] +2025-02-05 20:05:44 - ERROR - stderr - +2025-02-05 20:05:44 - ERROR - stderr - +2025-02-05 20:05:44 - INFO - stdout - {'loss': 0.7243, 'grad_norm': 1.185328722000122, 'learning_rate': 1.2049922285800305e-05, 'epoch': 1.35} +2025-02-05 20:05:44 - ERROR - stderr - 45%|████▌ | 10124/22434 [9:58:03<8:44:13, 2.56s/it] +2025-02-05 20:05:46 - ERROR - stderr - 45%|████▌ | 10125/22434 [9:58:06<8:45:51, 2.56s/it] +2025-02-05 20:05:46 - ERROR - stderr - +2025-02-05 20:05:46 - ERROR - stderr - +2025-02-05 20:05:46 - INFO - stdout - {'loss': 0.6767, 'grad_norm': 1.0669310092926025, 'learning_rate': 1.2048509177796659e-05, 'epoch': 1.35} +2025-02-05 20:05:46 - ERROR - stderr - 45%|████▌ | 10125/22434 [9:58:06<8:45:51, 2.56s/it] +2025-02-05 20:05:49 - ERROR - stderr - 45%|████▌ | 10126/22434 [9:58:09<8:48:41, 2.58s/it] +2025-02-05 20:05:49 - ERROR - stderr - +2025-02-05 20:05:49 - ERROR - stderr - +2025-02-05 20:05:49 - INFO - stdout - {'loss': 0.7126, 'grad_norm': 1.078006386756897, 'learning_rate': 1.2047096027093798e-05, 'epoch': 1.35} +2025-02-05 20:05:49 - ERROR - stderr - 45%|████▌ | 10126/22434 [9:58:09<8:48:41, 2.58s/it] +2025-02-05 20:05:51 - ERROR - stderr - 45%|████▌ | 10127/22434 [9:58:11<8:45:25, 2.56s/it] +2025-02-05 20:05:51 - ERROR - stderr - +2025-02-05 20:05:51 - ERROR - stderr - +2025-02-05 20:05:51 - INFO - stdout - {'loss': 0.6252, 'grad_norm': 1.1265978813171387, 'learning_rate': 1.2045682833721177e-05, 'epoch': 1.35} +2025-02-05 20:05:51 - ERROR - stderr - 45%|████▌ | 10127/22434 [9:58:11<8:45:25, 2.56s/it] +2025-02-05 20:05:54 - ERROR - stderr - 45%|████▌ | 10128/22434 [9:58:14<8:43:04, 2.55s/it] +2025-02-05 20:05:54 - ERROR - stderr - +2025-02-05 20:05:54 - ERROR - stderr - +2025-02-05 20:05:54 - INFO - stdout - {'loss': 0.7137, 'grad_norm': 1.2035911083221436, 'learning_rate': 1.2044269597708258e-05, 'epoch': 1.35} +2025-02-05 20:05:54 - ERROR - stderr - 
45%|████▌ | 10128/22434 [9:58:14<8:43:04, 2.55s/it] +2025-02-05 20:05:56 - ERROR - stderr - 45%|████▌ | 10129/22434 [9:58:16<8:45:48, 2.56s/it] +2025-02-05 20:05:57 - ERROR - stderr - +2025-02-05 20:05:57 - ERROR - stderr - +2025-02-05 20:05:57 - INFO - stdout - {'loss': 0.755, 'grad_norm': 1.1704684495925903, 'learning_rate': 1.2042856319084495e-05, 'epoch': 1.35} +2025-02-05 20:05:57 - ERROR - stderr - 45%|████▌ | 10129/22434 [9:58:16<8:45:48, 2.56s/it] +2025-02-05 20:05:59 - ERROR - stderr - 45%|████▌ | 10130/22434 [9:58:19<8:51:07, 2.59s/it] +2025-02-05 20:05:59 - ERROR - stderr - +2025-02-05 20:05:59 - ERROR - stderr - +2025-02-05 20:05:59 - INFO - stdout - {'loss': 0.6375, 'grad_norm': 1.2008923292160034, 'learning_rate': 1.2041442997879347e-05, 'epoch': 1.35} +2025-02-05 20:05:59 - ERROR - stderr - 45%|████▌ | 10130/22434 [9:58:19<8:51:07, 2.59s/it] +2025-02-05 20:06:02 - ERROR - stderr - 45%|████▌ | 10131/22434 [9:58:21<8:48:21, 2.58s/it] +2025-02-05 20:06:02 - ERROR - stderr - +2025-02-05 20:06:02 - ERROR - stderr - +2025-02-05 20:06:02 - INFO - stdout - {'loss': 0.5944, 'grad_norm': 0.9947749376296997, 'learning_rate': 1.2040029634122272e-05, 'epoch': 1.35} +2025-02-05 20:06:02 - ERROR - stderr - 45%|████▌ | 10131/22434 [9:58:21<8:48:21, 2.58s/it] +2025-02-05 20:06:04 - ERROR - stderr - 45%|████▌ | 10132/22434 [9:58:24<8:43:35, 2.55s/it] +2025-02-05 20:06:04 - ERROR - stderr - +2025-02-05 20:06:04 - ERROR - stderr - +2025-02-05 20:06:04 - INFO - stdout - {'loss': 0.8595, 'grad_norm': 1.4556466341018677, 'learning_rate': 1.2038616227842734e-05, 'epoch': 1.35} +2025-02-05 20:06:04 - ERROR - stderr - 45%|████▌ | 10132/22434 [9:58:24<8:43:35, 2.55s/it] +2025-02-05 20:06:07 - ERROR - stderr - 45%|████▌ | 10133/22434 [9:58:26<8:43:06, 2.55s/it] +2025-02-05 20:06:07 - ERROR - stderr - +2025-02-05 20:06:07 - ERROR - stderr - +2025-02-05 20:06:07 - INFO - stdout - {'loss': 0.6282, 'grad_norm': 1.0000073909759521, 'learning_rate': 1.2037202779070186e-05, 'epoch': 
1.36} +2025-02-05 20:06:07 - ERROR - stderr - 45%|████▌ | 10133/22434 [9:58:27<8:43:06, 2.55s/it] +2025-02-05 20:06:09 - ERROR - stderr - 45%|████▌ | 10134/22434 [9:58:29<8:44:33, 2.56s/it] +2025-02-05 20:06:09 - ERROR - stderr - +2025-02-05 20:06:09 - ERROR - stderr - +2025-02-05 20:06:09 - INFO - stdout - {'loss': 0.7719, 'grad_norm': 1.2447948455810547, 'learning_rate': 1.2035789287834099e-05, 'epoch': 1.36} +2025-02-05 20:06:09 - ERROR - stderr - 45%|████▌ | 10134/22434 [9:58:29<8:44:33, 2.56s/it] +2025-02-05 20:06:12 - ERROR - stderr - 45%|████▌ | 10135/22434 [9:58:32<8:41:43, 2.55s/it] +2025-02-05 20:06:12 - ERROR - stderr - +2025-02-05 20:06:12 - ERROR - stderr - +2025-02-05 20:06:12 - INFO - stdout - {'loss': 0.7779, 'grad_norm': 1.2689791917800903, 'learning_rate': 1.2034375754163932e-05, 'epoch': 1.36} +2025-02-05 20:06:12 - ERROR - stderr - 45%|████▌ | 10135/22434 [9:58:32<8:41:43, 2.55s/it] +2025-02-05 20:06:14 - ERROR - stderr - 45%|████▌ | 10136/22434 [9:58:34<8:39:08, 2.53s/it] +2025-02-05 20:06:14 - ERROR - stderr - +2025-02-05 20:06:14 - ERROR - stderr - +2025-02-05 20:06:14 - INFO - stdout - {'loss': 0.7528, 'grad_norm': 1.2218483686447144, 'learning_rate': 1.203296217808915e-05, 'epoch': 1.36} +2025-02-05 20:06:14 - ERROR - stderr - 45%|████▌ | 10136/22434 [9:58:34<8:39:08, 2.53s/it] +2025-02-05 20:06:17 - ERROR - stderr - 45%|████▌ | 10137/22434 [9:58:37<8:35:52, 2.52s/it] +2025-02-05 20:06:17 - ERROR - stderr - +2025-02-05 20:06:17 - ERROR - stderr - +2025-02-05 20:06:17 - INFO - stdout - {'loss': 0.6363, 'grad_norm': 1.1591823101043701, 'learning_rate': 1.2031548559639216e-05, 'epoch': 1.36} +2025-02-05 20:06:17 - ERROR - stderr - 45%|████▌ | 10137/22434 [9:58:37<8:35:52, 2.52s/it] +2025-02-05 20:06:19 - ERROR - stderr - 45%|████▌ | 10138/22434 [9:58:39<8:36:02, 2.52s/it] +2025-02-05 20:06:19 - ERROR - stderr - +2025-02-05 20:06:19 - ERROR - stderr - +2025-02-05 20:06:19 - INFO - stdout - {'loss': 0.7595, 'grad_norm': 1.1840147972106934, 
'learning_rate': 1.2030134898843598e-05, 'epoch': 1.36} +2025-02-05 20:06:19 - ERROR - stderr - 45%|████▌ | 10138/22434 [9:58:39<8:36:02, 2.52s/it] +2025-02-05 20:06:22 - ERROR - stderr - 45%|████▌ | 10139/22434 [9:58:42<8:38:04, 2.53s/it] +2025-02-05 20:06:22 - ERROR - stderr - +2025-02-05 20:06:22 - ERROR - stderr - +2025-02-05 20:06:22 - INFO - stdout - {'loss': 0.7322, 'grad_norm': 1.1650559902191162, 'learning_rate': 1.2028721195731756e-05, 'epoch': 1.36} +2025-02-05 20:06:22 - ERROR - stderr - 45%|████▌ | 10139/22434 [9:58:42<8:38:04, 2.53s/it] +2025-02-05 20:06:24 - ERROR - stderr - 45%|████▌ | 10140/22434 [9:58:44<8:35:07, 2.51s/it] +2025-02-05 20:06:24 - ERROR - stderr - +2025-02-05 20:06:24 - ERROR - stderr - +2025-02-05 20:06:24 - INFO - stdout - {'loss': 0.7352, 'grad_norm': 1.1595571041107178, 'learning_rate': 1.2027307450333166e-05, 'epoch': 1.36} +2025-02-05 20:06:24 - ERROR - stderr - 45%|████▌ | 10140/22434 [9:58:44<8:35:07, 2.51s/it] +2025-02-05 20:06:27 - ERROR - stderr - 45%|████▌ | 10141/22434 [9:58:47<8:35:34, 2.52s/it] +2025-02-05 20:06:27 - ERROR - stderr - +2025-02-05 20:06:27 - ERROR - stderr - +2025-02-05 20:06:27 - INFO - stdout - {'loss': 0.7587, 'grad_norm': 1.1970895528793335, 'learning_rate': 1.202589366267729e-05, 'epoch': 1.36} +2025-02-05 20:06:27 - ERROR - stderr - 45%|████▌ | 10141/22434 [9:58:47<8:35:34, 2.52s/it] +2025-02-05 20:06:29 - ERROR - stderr - 45%|████▌ | 10142/22434 [9:58:49<8:35:49, 2.52s/it] +2025-02-05 20:06:29 - ERROR - stderr - +2025-02-05 20:06:29 - ERROR - stderr - +2025-02-05 20:06:29 - INFO - stdout - {'loss': 0.6018, 'grad_norm': 1.032342791557312, 'learning_rate': 1.20244798327936e-05, 'epoch': 1.36} +2025-02-05 20:06:29 - ERROR - stderr - 45%|████▌ | 10142/22434 [9:58:49<8:35:49, 2.52s/it] +2025-02-05 20:06:32 - ERROR - stderr - 45%|████▌ | 10143/22434 [9:58:52<8:34:43, 2.51s/it] +2025-02-05 20:06:32 - ERROR - stderr - +2025-02-05 20:06:32 - ERROR - stderr - +2025-02-05 20:06:32 - INFO - stdout - {'loss': 
0.7543, 'grad_norm': 1.218293309211731, 'learning_rate': 1.2023065960711565e-05, 'epoch': 1.36} +2025-02-05 20:06:32 - ERROR - stderr - 45%|████▌ | 10143/22434 [9:58:52<8:34:43, 2.51s/it] +2025-02-05 20:06:34 - ERROR - stderr - 45%|████▌ | 10144/22434 [9:58:54<8:38:48, 2.53s/it] +2025-02-05 20:06:35 - ERROR - stderr - +2025-02-05 20:06:35 - ERROR - stderr - +2025-02-05 20:06:35 - INFO - stdout - {'loss': 0.65, 'grad_norm': 1.1026887893676758, 'learning_rate': 1.202165204646066e-05, 'epoch': 1.36} +2025-02-05 20:06:35 - ERROR - stderr - 45%|████▌ | 10144/22434 [9:58:54<8:38:48, 2.53s/it] +2025-02-05 20:06:37 - ERROR - stderr - 45%|████▌ | 10145/22434 [9:58:57<8:33:17, 2.51s/it] +2025-02-05 20:06:37 - ERROR - stderr - +2025-02-05 20:06:37 - ERROR - stderr - +2025-02-05 20:06:37 - INFO - stdout - {'loss': 0.7572, 'grad_norm': 1.3123043775558472, 'learning_rate': 1.2020238090070346e-05, 'epoch': 1.36} +2025-02-05 20:06:37 - ERROR - stderr - 45%|████▌ | 10145/22434 [9:58:57<8:33:17, 2.51s/it] +2025-02-05 20:06:39 - ERROR - stderr - 45%|████▌ | 10146/22434 [9:58:59<8:33:37, 2.51s/it] +2025-02-05 20:06:39 - ERROR - stderr - +2025-02-05 20:06:39 - ERROR - stderr - +2025-02-05 20:06:39 - INFO - stdout - {'loss': 0.7184, 'grad_norm': 1.2147557735443115, 'learning_rate': 1.2018824091570103e-05, 'epoch': 1.36} +2025-02-05 20:06:39 - ERROR - stderr - 45%|████▌ | 10146/22434 [9:58:59<8:33:37, 2.51s/it] +2025-02-05 20:06:42 - ERROR - stderr - 45%|████▌ | 10147/22434 [9:59:02<8:36:20, 2.52s/it] +2025-02-05 20:06:42 - ERROR - stderr - +2025-02-05 20:06:42 - ERROR - stderr - +2025-02-05 20:06:42 - INFO - stdout - {'loss': 0.7494, 'grad_norm': 1.2220582962036133, 'learning_rate': 1.2017410050989405e-05, 'epoch': 1.36} +2025-02-05 20:06:42 - ERROR - stderr - 45%|████▌ | 10147/22434 [9:59:02<8:36:20, 2.52s/it] +2025-02-05 20:06:45 - ERROR - stderr - 45%|████▌ | 10148/22434 [9:59:04<8:40:56, 2.54s/it] +2025-02-05 20:06:45 - ERROR - stderr - +2025-02-05 20:06:45 - ERROR - stderr - 
+2025-02-05 20:06:45 - INFO - stdout - {'loss': 0.7532, 'grad_norm': 1.2740370035171509, 'learning_rate': 1.2015995968357728e-05, 'epoch': 1.36} +2025-02-05 20:06:45 - ERROR - stderr - 45%|████▌ | 10148/22434 [9:59:04<8:40:56, 2.54s/it] +2025-02-05 20:06:47 - ERROR - stderr - 45%|████▌ | 10149/22434 [9:59:07<8:36:01, 2.52s/it] +2025-02-05 20:06:47 - ERROR - stderr - +2025-02-05 20:06:47 - ERROR - stderr - +2025-02-05 20:06:47 - INFO - stdout - {'loss': 0.6513, 'grad_norm': 1.1707720756530762, 'learning_rate': 1.201458184370454e-05, 'epoch': 1.36} +2025-02-05 20:06:47 - ERROR - stderr - 45%|████▌ | 10149/22434 [9:59:07<8:36:01, 2.52s/it] +2025-02-05 20:06:50 - ERROR - stderr - 45%|████▌ | 10150/22434 [9:59:09<8:36:19, 2.52s/it] +2025-02-05 20:06:50 - ERROR - stderr - +2025-02-05 20:06:50 - ERROR - stderr - +2025-02-05 20:06:50 - INFO - stdout - {'loss': 0.65, 'grad_norm': 1.0780820846557617, 'learning_rate': 1.2013167677059324e-05, 'epoch': 1.36} +2025-02-05 20:06:50 - ERROR - stderr - 45%|████▌ | 10150/22434 [9:59:09<8:36:19, 2.52s/it] +2025-02-05 20:06:52 - ERROR - stderr - 45%|████▌ | 10151/22434 [9:59:12<8:51:50, 2.60s/it] +2025-02-05 20:06:52 - ERROR - stderr - +2025-02-05 20:06:52 - ERROR - stderr - +2025-02-05 20:06:52 - INFO - stdout - {'loss': 0.6926, 'grad_norm': 1.2585344314575195, 'learning_rate': 1.2011753468451552e-05, 'epoch': 1.36} +2025-02-05 20:06:52 - ERROR - stderr - 45%|████▌ | 10151/22434 [9:59:12<8:51:50, 2.60s/it] +2025-02-05 20:06:55 - ERROR - stderr - 45%|████▌ | 10152/22434 [9:59:15<9:18:41, 2.73s/it] +2025-02-05 20:06:55 - ERROR - stderr - +2025-02-05 20:06:55 - ERROR - stderr - +2025-02-05 20:06:55 - INFO - stdout - {'loss': 0.7375, 'grad_norm': 1.0727747678756714, 'learning_rate': 1.2010339217910706e-05, 'epoch': 1.36} +2025-02-05 20:06:55 - ERROR - stderr - 45%|████▌ | 10152/22434 [9:59:15<9:18:41, 2.73s/it] +2025-02-05 20:06:58 - ERROR - stderr - 45%|████▌ | 10153/22434 [9:59:18<9:03:33, 2.66s/it] +2025-02-05 20:06:58 - ERROR - stderr 
- +2025-02-05 20:06:58 - ERROR - stderr - +2025-02-05 20:06:58 - INFO - stdout - {'loss': 0.6141, 'grad_norm': 1.0936923027038574, 'learning_rate': 1.200892492546626e-05, 'epoch': 1.36} +2025-02-05 20:06:58 - ERROR - stderr - 45%|████▌ | 10153/22434 [9:59:18<9:03:33, 2.66s/it] +2025-02-05 20:07:00 - ERROR - stderr - 45%|████▌ | 10154/22434 [9:59:20<8:57:50, 2.63s/it] +2025-02-05 20:07:00 - ERROR - stderr - +2025-02-05 20:07:00 - ERROR - stderr - +2025-02-05 20:07:00 - INFO - stdout - {'loss': 0.6382, 'grad_norm': 1.1288864612579346, 'learning_rate': 1.2007510591147698e-05, 'epoch': 1.36} +2025-02-05 20:07:00 - ERROR - stderr - 45%|████▌ | 10154/22434 [9:59:20<8:57:50, 2.63s/it] +2025-02-05 20:07:03 - ERROR - stderr - 45%|████▌ | 10155/22434 [9:59:23<8:50:28, 2.59s/it] +2025-02-05 20:07:03 - ERROR - stderr - +2025-02-05 20:07:03 - ERROR - stderr - +2025-02-05 20:07:03 - INFO - stdout - {'loss': 0.8149, 'grad_norm': 1.198479175567627, 'learning_rate': 1.2006096214984498e-05, 'epoch': 1.36} +2025-02-05 20:07:03 - ERROR - stderr - 45%|████▌ | 10155/22434 [9:59:23<8:50:28, 2.59s/it] +2025-02-05 20:07:06 - ERROR - stderr - 45%|████▌ | 10156/22434 [9:59:25<8:50:27, 2.59s/it] +2025-02-05 20:07:06 - ERROR - stderr - +2025-02-05 20:07:06 - ERROR - stderr - +2025-02-05 20:07:06 - INFO - stdout - {'loss': 0.6612, 'grad_norm': 1.260659098625183, 'learning_rate': 1.2004681797006143e-05, 'epoch': 1.36} +2025-02-05 20:07:06 - ERROR - stderr - 45%|████▌ | 10156/22434 [9:59:25<8:50:27, 2.59s/it] +2025-02-05 20:07:08 - ERROR - stderr - 45%|████▌ | 10157/22434 [9:59:28<8:45:22, 2.57s/it] +2025-02-05 20:07:08 - ERROR - stderr - +2025-02-05 20:07:08 - ERROR - stderr - +2025-02-05 20:07:08 - INFO - stdout - {'loss': 0.7405, 'grad_norm': 1.1443016529083252, 'learning_rate': 1.2003267337242115e-05, 'epoch': 1.36} +2025-02-05 20:07:08 - ERROR - stderr - 45%|████▌ | 10157/22434 [9:59:28<8:45:22, 2.57s/it] +2025-02-05 20:07:11 - ERROR - stderr - 45%|████▌ | 10158/22434 [9:59:30<8:42:52, 
2.56s/it] +2025-02-05 20:07:11 - ERROR - stderr - +2025-02-05 20:07:11 - ERROR - stderr - +2025-02-05 20:07:11 - INFO - stdout - {'loss': 0.7778, 'grad_norm': 1.2879217863082886, 'learning_rate': 1.2001852835721894e-05, 'epoch': 1.36} +2025-02-05 20:07:11 - ERROR - stderr - 45%|████▌ | 10158/22434 [9:59:30<8:42:52, 2.56s/it] +2025-02-05 20:07:13 - ERROR - stderr - 45%|████▌ | 10159/22434 [9:59:33<8:39:30, 2.54s/it] +2025-02-05 20:07:13 - ERROR - stderr - +2025-02-05 20:07:13 - ERROR - stderr - +2025-02-05 20:07:13 - INFO - stdout - {'loss': 0.7577, 'grad_norm': 1.2178672552108765, 'learning_rate': 1.2000438292474968e-05, 'epoch': 1.36} +2025-02-05 20:07:13 - ERROR - stderr - 45%|████▌ | 10159/22434 [9:59:33<8:39:30, 2.54s/it] +2025-02-05 20:07:16 - ERROR - stderr - 45%|████▌ | 10160/22434 [9:59:35<8:38:00, 2.53s/it] +2025-02-05 20:07:16 - ERROR - stderr - +2025-02-05 20:07:16 - ERROR - stderr - +2025-02-05 20:07:16 - INFO - stdout - {'loss': 0.6048, 'grad_norm': 1.0373649597167969, 'learning_rate': 1.199902370753082e-05, 'epoch': 1.36} +2025-02-05 20:07:16 - ERROR - stderr - 45%|████▌ | 10160/22434 [9:59:35<8:38:00, 2.53s/it] +2025-02-05 20:07:18 - ERROR - stderr - 45%|████▌ | 10161/22434 [9:59:38<8:36:32, 2.53s/it] +2025-02-05 20:07:18 - ERROR - stderr - +2025-02-05 20:07:18 - ERROR - stderr - +2025-02-05 20:07:18 - INFO - stdout - {'loss': 0.6773, 'grad_norm': 1.1186918020248413, 'learning_rate': 1.1997609080918933e-05, 'epoch': 1.36} +2025-02-05 20:07:18 - ERROR - stderr - 45%|████▌ | 10161/22434 [9:59:38<8:36:32, 2.53s/it] +2025-02-05 20:07:21 - ERROR - stderr - 45%|████▌ | 10162/22434 [9:59:41<8:44:35, 2.56s/it] +2025-02-05 20:07:21 - ERROR - stderr - +2025-02-05 20:07:21 - ERROR - stderr - +2025-02-05 20:07:21 - INFO - stdout - {'loss': 0.8011, 'grad_norm': 1.2100756168365479, 'learning_rate': 1.1996194412668798e-05, 'epoch': 1.36} +2025-02-05 20:07:21 - ERROR - stderr - 45%|████▌ | 10162/22434 [9:59:41<8:44:35, 2.56s/it] +2025-02-05 20:07:21 - INFO - stdout 
- WARNING: tokenization mismatch: 1 vs. 55. (ignored) +2025-02-05 20:07:23 - ERROR - stderr - 45%|████▌ | 10163/22434 [9:59:43<8:38:50, 2.54s/it] +2025-02-05 20:07:23 - ERROR - stderr - +2025-02-05 20:07:23 - ERROR - stderr - +2025-02-05 20:07:23 - INFO - stdout - {'loss': 0.7342, 'grad_norm': 1.1768124103546143, 'learning_rate': 1.1994779702809903e-05, 'epoch': 1.36} +2025-02-05 20:07:23 - ERROR - stderr - 45%|████▌ | 10163/22434 [9:59:43<8:38:50, 2.54s/it] +2025-02-05 20:07:26 - ERROR - stderr - 45%|████▌ | 10164/22434 [9:59:46<8:39:41, 2.54s/it] +2025-02-05 20:07:26 - ERROR - stderr - +2025-02-05 20:07:26 - ERROR - stderr - +2025-02-05 20:07:26 - INFO - stdout - {'loss': 0.6437, 'grad_norm': 1.0588281154632568, 'learning_rate': 1.1993364951371734e-05, 'epoch': 1.36} +2025-02-05 20:07:26 - ERROR - stderr - 45%|████▌ | 10164/22434 [9:59:46<8:39:41, 2.54s/it] +2025-02-05 20:07:28 - ERROR - stderr - 45%|████▌ | 10165/22434 [9:59:48<8:38:09, 2.53s/it] +2025-02-05 20:07:28 - ERROR - stderr - +2025-02-05 20:07:28 - ERROR - stderr - +2025-02-05 20:07:28 - INFO - stdout - {'loss': 0.6474, 'grad_norm': 1.1727943420410156, 'learning_rate': 1.1991950158383773e-05, 'epoch': 1.36} +2025-02-05 20:07:28 - ERROR - stderr - 45%|████▌ | 10165/22434 [9:59:48<8:38:09, 2.53s/it] +2025-02-05 20:07:31 - ERROR - stderr - 45%|████▌ | 10166/22434 [9:59:51<8:39:56, 2.54s/it] +2025-02-05 20:07:31 - ERROR - stderr - +2025-02-05 20:07:31 - ERROR - stderr - +2025-02-05 20:07:31 - INFO - stdout - {'loss': 0.6967, 'grad_norm': 1.1928704977035522, 'learning_rate': 1.1990535323875521e-05, 'epoch': 1.36} +2025-02-05 20:07:31 - ERROR - stderr - 45%|████▌ | 10166/22434 [9:59:51<8:39:56, 2.54s/it] +2025-02-05 20:07:33 - ERROR - stderr - 45%|████▌ | 10167/22434 [9:59:53<8:41:14, 2.55s/it] +2025-02-05 20:07:33 - ERROR - stderr - +2025-02-05 20:07:33 - ERROR - stderr - +2025-02-05 20:07:33 - INFO - stdout - {'loss': 0.7419, 'grad_norm': 1.2454696893692017, 'learning_rate': 1.1989120447876465e-05, 
'epoch': 1.36} +2025-02-05 20:07:33 - ERROR - stderr - 45%|████▌ | 10167/22434 [9:59:53<8:41:14, 2.55s/it] +2025-02-05 20:07:36 - ERROR - stderr - 45%|████▌ | 10168/22434 [9:59:56<8:40:14, 2.54s/it] +2025-02-05 20:07:36 - ERROR - stderr - +2025-02-05 20:07:36 - ERROR - stderr - +2025-02-05 20:07:36 - INFO - stdout - {'loss': 0.782, 'grad_norm': 1.137209415435791, 'learning_rate': 1.19877055304161e-05, 'epoch': 1.36} +2025-02-05 20:07:36 - ERROR - stderr - 45%|████▌ | 10168/22434 [9:59:56<8:40:14, 2.54s/it] +2025-02-05 20:07:38 - ERROR - stderr - 45%|████▌ | 10169/22434 [9:59:58<8:36:23, 2.53s/it] +2025-02-05 20:07:38 - ERROR - stderr - +2025-02-05 20:07:38 - ERROR - stderr - +2025-02-05 20:07:38 - INFO - stdout - {'loss': 0.721, 'grad_norm': 1.338990569114685, 'learning_rate': 1.1986290571523912e-05, 'epoch': 1.36} +2025-02-05 20:07:38 - ERROR - stderr - 45%|████▌ | 10169/22434 [9:59:58<8:36:23, 2.53s/it] +2025-02-05 20:07:41 - ERROR - stderr - 45%|████▌ | 10170/22434 [10:00:01<8:42:36, 2.56s/it] +2025-02-05 20:07:41 - ERROR - stderr - +2025-02-05 20:07:41 - ERROR - stderr - +2025-02-05 20:07:41 - INFO - stdout - {'loss': 0.7078, 'grad_norm': 1.1938974857330322, 'learning_rate': 1.19848755712294e-05, 'epoch': 1.36} +2025-02-05 20:07:41 - ERROR - stderr - 45%|████▌ | 10170/22434 [10:00:01<8:42:36, 2.56s/it] +2025-02-05 20:07:44 - ERROR - stderr - 45%|████▌ | 10171/22434 [10:00:03<8:39:39, 2.54s/it] +2025-02-05 20:07:44 - ERROR - stderr - +2025-02-05 20:07:44 - ERROR - stderr - +2025-02-05 20:07:44 - INFO - stdout - {'loss': 0.6854, 'grad_norm': 1.2825438976287842, 'learning_rate': 1.1983460529562051e-05, 'epoch': 1.36} +2025-02-05 20:07:44 - ERROR - stderr - 45%|████▌ | 10171/22434 [10:00:03<8:39:39, 2.54s/it] +2025-02-05 20:07:46 - ERROR - stderr - 45%|████▌ | 10172/22434 [10:00:06<8:36:23, 2.53s/it] +2025-02-05 20:07:46 - ERROR - stderr - +2025-02-05 20:07:46 - ERROR - stderr - +2025-02-05 20:07:46 - INFO - stdout - {'loss': 0.7213, 'grad_norm': 
1.3444875478744507, 'learning_rate': 1.1982045446551372e-05, 'epoch': 1.36} +2025-02-05 20:07:46 - ERROR - stderr - 45%|████▌ | 10172/22434 [10:00:06<8:36:23, 2.53s/it] +2025-02-05 20:07:49 - ERROR - stderr - 45%|████▌ | 10173/22434 [10:00:08<8:33:06, 2.51s/it] +2025-02-05 20:07:49 - ERROR - stderr - +2025-02-05 20:07:49 - ERROR - stderr - +2025-02-05 20:07:49 - INFO - stdout - {'loss': 0.6693, 'grad_norm': 1.09755277633667, 'learning_rate': 1.1980630322226848e-05, 'epoch': 1.36} +2025-02-05 20:07:49 - ERROR - stderr - 45%|████▌ | 10173/22434 [10:00:08<8:33:06, 2.51s/it] +2025-02-05 20:07:51 - ERROR - stderr - 45%|████▌ | 10174/22434 [10:00:11<8:36:47, 2.53s/it] +2025-02-05 20:07:51 - ERROR - stderr - +2025-02-05 20:07:51 - ERROR - stderr - +2025-02-05 20:07:51 - INFO - stdout - {'loss': 0.6957, 'grad_norm': 1.0746098756790161, 'learning_rate': 1.197921515661798e-05, 'epoch': 1.36} +2025-02-05 20:07:51 - ERROR - stderr - 45%|████▌ | 10174/22434 [10:00:11<8:36:47, 2.53s/it] +2025-02-05 20:07:54 - ERROR - stderr - 45%|████▌ | 10175/22434 [10:00:13<8:40:04, 2.55s/it] +2025-02-05 20:07:54 - ERROR - stderr - +2025-02-05 20:07:54 - ERROR - stderr - +2025-02-05 20:07:54 - INFO - stdout - {'loss': 0.6462, 'grad_norm': 1.0708236694335938, 'learning_rate': 1.1977799949754267e-05, 'epoch': 1.36} +2025-02-05 20:07:54 - ERROR - stderr - 45%|████▌ | 10175/22434 [10:00:14<8:40:04, 2.55s/it] +2025-02-05 20:07:56 - ERROR - stderr - 45%|████▌ | 10176/22434 [10:00:16<8:38:18, 2.54s/it] +2025-02-05 20:07:56 - ERROR - stderr - +2025-02-05 20:07:56 - ERROR - stderr - +2025-02-05 20:07:56 - INFO - stdout - {'loss': 0.5998, 'grad_norm': 1.1177432537078857, 'learning_rate': 1.197638470166521e-05, 'epoch': 1.36} +2025-02-05 20:07:56 - ERROR - stderr - 45%|████▌ | 10176/22434 [10:00:16<8:38:18, 2.54s/it] +2025-02-05 20:07:59 - ERROR - stderr - 45%|████▌ | 10177/22434 [10:00:18<8:34:21, 2.52s/it] +2025-02-05 20:07:59 - ERROR - stderr - +2025-02-05 20:07:59 - ERROR - stderr - +2025-02-05 
20:07:59 - INFO - stdout - {'loss': 0.6392, 'grad_norm': 1.1892383098602295, 'learning_rate': 1.19749694123803e-05, 'epoch': 1.36} +2025-02-05 20:07:59 - ERROR - stderr - 45%|████▌ | 10177/22434 [10:00:19<8:34:21, 2.52s/it] +2025-02-05 20:08:01 - ERROR - stderr - 45%|████▌ | 10178/22434 [10:00:21<8:30:12, 2.50s/it] +2025-02-05 20:08:01 - ERROR - stderr - +2025-02-05 20:08:01 - ERROR - stderr - +2025-02-05 20:08:01 - INFO - stdout - {'loss': 0.6998, 'grad_norm': 1.1515694856643677, 'learning_rate': 1.1973554081929042e-05, 'epoch': 1.36} +2025-02-05 20:08:01 - ERROR - stderr - 45%|████▌ | 10178/22434 [10:00:21<8:30:12, 2.50s/it] +2025-02-05 20:08:04 - ERROR - stderr - 45%|████▌ | 10179/22434 [10:00:23<8:28:41, 2.49s/it] +2025-02-05 20:08:04 - ERROR - stderr - +2025-02-05 20:08:04 - ERROR - stderr - +2025-02-05 20:08:04 - INFO - stdout - {'loss': 0.7541, 'grad_norm': 1.243503212928772, 'learning_rate': 1.197213871034094e-05, 'epoch': 1.36} +2025-02-05 20:08:04 - ERROR - stderr - 45%|████▌ | 10179/22434 [10:00:23<8:28:41, 2.49s/it] +2025-02-05 20:08:06 - ERROR - stderr - 45%|████▌ | 10180/22434 [10:00:26<8:28:23, 2.49s/it] +2025-02-05 20:08:06 - ERROR - stderr - +2025-02-05 20:08:06 - ERROR - stderr - +2025-02-05 20:08:06 - INFO - stdout - {'loss': 0.7347, 'grad_norm': 1.2338383197784424, 'learning_rate': 1.1970723297645494e-05, 'epoch': 1.36} +2025-02-05 20:08:06 - ERROR - stderr - 45%|████▌ | 10180/22434 [10:00:26<8:28:23, 2.49s/it] +2025-02-05 20:08:09 - ERROR - stderr - 45%|████▌ | 10181/22434 [10:00:28<8:32:00, 2.51s/it] +2025-02-05 20:08:09 - ERROR - stderr - +2025-02-05 20:08:09 - ERROR - stderr - +2025-02-05 20:08:09 - INFO - stdout - {'loss': 0.7583, 'grad_norm': 1.262148141860962, 'learning_rate': 1.1969307843872206e-05, 'epoch': 1.36} +2025-02-05 20:08:09 - ERROR - stderr - 45%|████▌ | 10181/22434 [10:00:28<8:32:00, 2.51s/it] +2025-02-05 20:08:12 - ERROR - stderr - 45%|████▌ | 10182/22434 [10:00:31<8:59:38, 2.64s/it] +2025-02-05 20:08:12 - ERROR - stderr - 
+2025-02-05 20:08:12 - ERROR - stderr - +2025-02-05 20:08:12 - INFO - stdout - {'loss': 0.6301, 'grad_norm': 1.1674898862838745, 'learning_rate': 1.1967892349050581e-05, 'epoch': 1.36} +2025-02-05 20:08:12 - ERROR - stderr - 45%|████▌ | 10182/22434 [10:00:31<8:59:38, 2.64s/it] +2025-02-05 20:08:14 - ERROR - stderr - 45%|████▌ | 10183/22434 [10:00:34<8:51:04, 2.60s/it] +2025-02-05 20:08:14 - ERROR - stderr - +2025-02-05 20:08:14 - ERROR - stderr - +2025-02-05 20:08:14 - INFO - stdout - {'loss': 0.6208, 'grad_norm': 1.027660846710205, 'learning_rate': 1.1966476813210121e-05, 'epoch': 1.36} +2025-02-05 20:08:14 - ERROR - stderr - 45%|████▌ | 10183/22434 [10:00:34<8:51:04, 2.60s/it] +2025-02-05 20:08:17 - ERROR - stderr - 45%|████▌ | 10184/22434 [10:00:36<8:41:40, 2.56s/it] +2025-02-05 20:08:17 - ERROR - stderr - +2025-02-05 20:08:17 - ERROR - stderr - +2025-02-05 20:08:17 - INFO - stdout - {'loss': 0.7563, 'grad_norm': 1.3393902778625488, 'learning_rate': 1.1965061236380336e-05, 'epoch': 1.36} +2025-02-05 20:08:17 - ERROR - stderr - 45%|████▌ | 10184/22434 [10:00:36<8:41:40, 2.56s/it] +2025-02-05 20:08:19 - ERROR - stderr - 45%|████▌ | 10185/22434 [10:00:39<8:38:27, 2.54s/it] +2025-02-05 20:08:19 - ERROR - stderr - +2025-02-05 20:08:19 - ERROR - stderr - +2025-02-05 20:08:19 - INFO - stdout - {'loss': 0.7031, 'grad_norm': 1.1425881385803223, 'learning_rate': 1.196364561859073e-05, 'epoch': 1.36} +2025-02-05 20:08:19 - ERROR - stderr - 45%|████▌ | 10185/22434 [10:00:39<8:38:27, 2.54s/it] +2025-02-05 20:08:21 - ERROR - stderr - 45%|████▌ | 10186/22434 [10:00:41<8:32:24, 2.51s/it] +2025-02-05 20:08:22 - ERROR - stderr - +2025-02-05 20:08:22 - ERROR - stderr - +2025-02-05 20:08:22 - INFO - stdout - {'loss': 0.6975, 'grad_norm': 1.1585972309112549, 'learning_rate': 1.1962229959870805e-05, 'epoch': 1.36} +2025-02-05 20:08:22 - ERROR - stderr - 45%|████▌ | 10186/22434 [10:00:41<8:32:24, 2.51s/it] +2025-02-05 20:08:24 - ERROR - stderr - 45%|████▌ | 10187/22434 
[10:00:44<8:29:54, 2.50s/it] +2025-02-05 20:08:24 - ERROR - stderr - +2025-02-05 20:08:24 - ERROR - stderr - +2025-02-05 20:08:24 - INFO - stdout - {'loss': 0.6353, 'grad_norm': 1.101199984550476, 'learning_rate': 1.196081426025008e-05, 'epoch': 1.36} +2025-02-05 20:08:24 - ERROR - stderr - 45%|████▌ | 10187/22434 [10:00:44<8:29:54, 2.50s/it] +2025-02-05 20:08:27 - ERROR - stderr - 45%|████▌ | 10188/22434 [10:00:47<9:13:28, 2.71s/it] +2025-02-05 20:08:27 - ERROR - stderr - +2025-02-05 20:08:27 - ERROR - stderr - +2025-02-05 20:08:27 - INFO - stdout - {'loss': 0.6245, 'grad_norm': 1.1224530935287476, 'learning_rate': 1.1959398519758059e-05, 'epoch': 1.36} +2025-02-05 20:08:27 - ERROR - stderr - 45%|████▌ | 10188/22434 [10:00:47<9:13:28, 2.71s/it] +2025-02-05 20:08:30 - ERROR - stderr - 45%|████▌ | 10189/22434 [10:00:50<9:25:23, 2.77s/it] +2025-02-05 20:08:30 - ERROR - stderr - +2025-02-05 20:08:30 - ERROR - stderr - +2025-02-05 20:08:30 - INFO - stdout - {'loss': 0.6601, 'grad_norm': 1.2043191194534302, 'learning_rate': 1.1957982738424247e-05, 'epoch': 1.36} +2025-02-05 20:08:30 - ERROR - stderr - 45%|████▌ | 10189/22434 [10:00:50<9:25:23, 2.77s/it] +2025-02-05 20:08:33 - ERROR - stderr - 45%|████▌ | 10190/22434 [10:00:52<9:13:22, 2.71s/it] +2025-02-05 20:08:33 - ERROR - stderr - +2025-02-05 20:08:33 - ERROR - stderr - +2025-02-05 20:08:33 - INFO - stdout - {'loss': 0.7057, 'grad_norm': 1.1529829502105713, 'learning_rate': 1.1956566916278159e-05, 'epoch': 1.36} +2025-02-05 20:08:33 - ERROR - stderr - 45%|████▌ | 10190/22434 [10:00:52<9:13:22, 2.71s/it] +2025-02-05 20:08:35 - ERROR - stderr - 45%|████▌ | 10191/22434 [10:00:55<8:55:40, 2.63s/it] +2025-02-05 20:08:35 - ERROR - stderr - +2025-02-05 20:08:35 - ERROR - stderr - +2025-02-05 20:08:35 - INFO - stdout - {'loss': 0.7038, 'grad_norm': 1.2066937685012817, 'learning_rate': 1.1955151053349306e-05, 'epoch': 1.36} +2025-02-05 20:08:35 - ERROR - stderr - 45%|████▌ | 10191/22434 [10:00:55<8:55:40, 2.63s/it] 
+2025-02-05 20:08:38 - ERROR - stderr - 45%|████▌ | 10192/22434 [10:00:57<8:51:22, 2.60s/it] +2025-02-05 20:08:38 - ERROR - stderr - +2025-02-05 20:08:38 - ERROR - stderr - +2025-02-05 20:08:38 - INFO - stdout - {'loss': 0.6623, 'grad_norm': 1.1664913892745972, 'learning_rate': 1.1953735149667201e-05, 'epoch': 1.36} +2025-02-05 20:08:38 - ERROR - stderr - 45%|████▌ | 10192/22434 [10:00:57<8:51:22, 2.60s/it] +2025-02-05 20:08:40 - ERROR - stderr - 45%|████▌ | 10193/22434 [10:01:00<8:51:48, 2.61s/it] +2025-02-05 20:08:40 - ERROR - stderr - +2025-02-05 20:08:40 - ERROR - stderr - +2025-02-05 20:08:40 - INFO - stdout - {'loss': 0.7716, 'grad_norm': 1.3087974786758423, 'learning_rate': 1.1952319205261356e-05, 'epoch': 1.36} +2025-02-05 20:08:40 - ERROR - stderr - 45%|████▌ | 10193/22434 [10:01:00<8:51:48, 2.61s/it] +2025-02-05 20:08:43 - ERROR - stderr - 45%|████▌ | 10194/22434 [10:01:02<8:39:45, 2.55s/it] +2025-02-05 20:08:43 - ERROR - stderr - +2025-02-05 20:08:43 - ERROR - stderr - +2025-02-05 20:08:43 - INFO - stdout - {'loss': 0.7635, 'grad_norm': 1.252387523651123, 'learning_rate': 1.1950903220161286e-05, 'epoch': 1.36} +2025-02-05 20:08:43 - ERROR - stderr - 45%|████▌ | 10194/22434 [10:01:02<8:39:45, 2.55s/it] +2025-02-05 20:08:45 - ERROR - stderr - 45%|████▌ | 10195/22434 [10:01:05<8:35:33, 2.53s/it] +2025-02-05 20:08:45 - ERROR - stderr - +2025-02-05 20:08:45 - ERROR - stderr - +2025-02-05 20:08:45 - INFO - stdout - {'loss': 0.6589, 'grad_norm': 1.1942683458328247, 'learning_rate': 1.1949487194396503e-05, 'epoch': 1.36} +2025-02-05 20:08:45 - ERROR - stderr - 45%|████▌ | 10195/22434 [10:01:05<8:35:33, 2.53s/it] +2025-02-05 20:08:48 - ERROR - stderr - 45%|████▌ | 10196/22434 [10:01:07<8:29:33, 2.50s/it] +2025-02-05 20:08:48 - ERROR - stderr - +2025-02-05 20:08:48 - ERROR - stderr - +2025-02-05 20:08:48 - INFO - stdout - {'loss': 0.6624, 'grad_norm': 1.1268057823181152, 'learning_rate': 1.1948071127996525e-05, 'epoch': 1.36} +2025-02-05 20:08:48 - ERROR - stderr 
- 45%|████▌ | 10196/22434 [10:01:07<8:29:33, 2.50s/it] +2025-02-05 20:08:50 - ERROR - stderr - 45%|████▌ | 10197/22434 [10:01:10<8:29:23, 2.50s/it] +2025-02-05 20:08:50 - ERROR - stderr - +2025-02-05 20:08:50 - ERROR - stderr - +2025-02-05 20:08:50 - INFO - stdout - {'loss': 0.6746, 'grad_norm': 1.1349005699157715, 'learning_rate': 1.194665502099087e-05, 'epoch': 1.36} +2025-02-05 20:08:50 - ERROR - stderr - 45%|████▌ | 10197/22434 [10:01:10<8:29:23, 2.50s/it] +2025-02-05 20:08:53 - ERROR - stderr - 45%|████▌ | 10198/22434 [10:01:12<8:32:52, 2.51s/it] +2025-02-05 20:08:53 - ERROR - stderr - +2025-02-05 20:08:53 - ERROR - stderr - +2025-02-05 20:08:53 - INFO - stdout - {'loss': 0.6439, 'grad_norm': 1.134196400642395, 'learning_rate': 1.1945238873409053e-05, 'epoch': 1.36} +2025-02-05 20:08:53 - ERROR - stderr - 45%|████▌ | 10198/22434 [10:01:12<8:32:52, 2.51s/it] +2025-02-05 20:08:55 - ERROR - stderr - 45%|████▌ | 10199/22434 [10:01:15<8:34:34, 2.52s/it] +2025-02-05 20:08:55 - ERROR - stderr - +2025-02-05 20:08:55 - ERROR - stderr - +2025-02-05 20:08:55 - INFO - stdout - {'loss': 0.6387, 'grad_norm': 1.173986792564392, 'learning_rate': 1.1943822685280592e-05, 'epoch': 1.36} +2025-02-05 20:08:55 - ERROR - stderr - 45%|████▌ | 10199/22434 [10:01:15<8:34:34, 2.52s/it] +2025-02-05 20:08:58 - ERROR - stderr - 45%|████▌ | 10200/22434 [10:01:17<8:36:16, 2.53s/it] +2025-02-05 20:08:58 - ERROR - stderr - +2025-02-05 20:08:58 - ERROR - stderr - +2025-02-05 20:08:58 - INFO - stdout - {'loss': 0.7187, 'grad_norm': 1.1811243295669556, 'learning_rate': 1.194240645663501e-05, 'epoch': 1.36} +2025-02-05 20:08:58 - ERROR - stderr - 45%|████▌ | 10200/22434 [10:01:18<8:36:16, 2.53s/it] +2025-02-05 20:09:00 - ERROR - stderr - 45%|████▌ | 10201/22434 [10:01:20<8:35:18, 2.53s/it] +2025-02-05 20:09:00 - ERROR - stderr - +2025-02-05 20:09:00 - ERROR - stderr - +2025-02-05 20:09:00 - INFO - stdout - {'loss': 0.7216, 'grad_norm': 1.1912455558776855, 'learning_rate': 1.1940990187501824e-05, 
'epoch': 1.36} +2025-02-05 20:09:00 - ERROR - stderr - 45%|████▌ | 10201/22434 [10:01:20<8:35:18, 2.53s/it] +2025-02-05 20:09:03 - ERROR - stderr - 45%|████▌ | 10202/22434 [10:01:23<8:37:49, 2.54s/it] +2025-02-05 20:09:03 - ERROR - stderr - +2025-02-05 20:09:03 - ERROR - stderr - +2025-02-05 20:09:03 - INFO - stdout - {'loss': 0.6738, 'grad_norm': 1.2071850299835205, 'learning_rate': 1.1939573877910555e-05, 'epoch': 1.36} +2025-02-05 20:09:03 - ERROR - stderr - 45%|████▌ | 10202/22434 [10:01:23<8:37:49, 2.54s/it] +2025-02-05 20:09:05 - ERROR - stderr - 45%|████▌ | 10203/22434 [10:01:25<8:31:06, 2.51s/it] +2025-02-05 20:09:05 - ERROR - stderr - +2025-02-05 20:09:05 - ERROR - stderr - +2025-02-05 20:09:05 - INFO - stdout - {'loss': 0.6719, 'grad_norm': 1.2127255201339722, 'learning_rate': 1.1938157527890722e-05, 'epoch': 1.36} +2025-02-05 20:09:05 - ERROR - stderr - 45%|████▌ | 10203/22434 [10:01:25<8:31:06, 2.51s/it] +2025-02-05 20:09:08 - ERROR - stderr - 45%|████▌ | 10204/22434 [10:01:27<8:30:10, 2.50s/it] +2025-02-05 20:09:08 - ERROR - stderr - +2025-02-05 20:09:08 - ERROR - stderr - +2025-02-05 20:09:08 - INFO - stdout - {'loss': 0.6812, 'grad_norm': 1.2086740732192993, 'learning_rate': 1.193674113747185e-05, 'epoch': 1.36} +2025-02-05 20:09:08 - ERROR - stderr - 45%|████▌ | 10204/22434 [10:01:28<8:30:10, 2.50s/it] +2025-02-05 20:09:10 - ERROR - stderr - 45%|████▌ | 10205/22434 [10:01:30<8:40:29, 2.55s/it] +2025-02-05 20:09:10 - ERROR - stderr - +2025-02-05 20:09:10 - ERROR - stderr - +2025-02-05 20:09:10 - INFO - stdout - {'loss': 0.6954, 'grad_norm': 1.2231475114822388, 'learning_rate': 1.1935324706683464e-05, 'epoch': 1.36} +2025-02-05 20:09:10 - ERROR - stderr - 45%|████▌ | 10205/22434 [10:01:30<8:40:29, 2.55s/it] +2025-02-05 20:09:13 - ERROR - stderr - 45%|████▌ | 10206/22434 [10:01:33<8:45:09, 2.58s/it] +2025-02-05 20:09:13 - ERROR - stderr - +2025-02-05 20:09:13 - ERROR - stderr - +2025-02-05 20:09:13 - INFO - stdout - {'loss': 0.6976, 'grad_norm': 
1.2310230731964111, 'learning_rate': 1.1933908235555085e-05, 'epoch': 1.36} +2025-02-05 20:09:13 - ERROR - stderr - 45%|████▌ | 10206/22434 [10:01:33<8:45:09, 2.58s/it] +2025-02-05 20:09:16 - ERROR - stderr - 45%|████▌ | 10207/22434 [10:01:35<8:49:28, 2.60s/it] +2025-02-05 20:09:16 - ERROR - stderr - +2025-02-05 20:09:16 - ERROR - stderr - +2025-02-05 20:09:16 - INFO - stdout - {'loss': 0.637, 'grad_norm': 1.1294760704040527, 'learning_rate': 1.1932491724116239e-05, 'epoch': 1.36} +2025-02-05 20:09:16 - ERROR - stderr - 45%|████▌ | 10207/22434 [10:01:35<8:49:28, 2.60s/it] +2025-02-05 20:09:18 - ERROR - stderr - 46%|████▌ | 10208/22434 [10:01:38<8:43:25, 2.57s/it] +2025-02-05 20:09:18 - ERROR - stderr - +2025-02-05 20:09:18 - ERROR - stderr - +2025-02-05 20:09:18 - INFO - stdout - {'loss': 0.6757, 'grad_norm': 1.237724781036377, 'learning_rate': 1.1931075172396453e-05, 'epoch': 1.37} +2025-02-05 20:09:18 - ERROR - stderr - 46%|████▌ | 10208/22434 [10:01:38<8:43:25, 2.57s/it] +2025-02-05 20:09:21 - ERROR - stderr - 46%|████▌ | 10209/22434 [10:01:40<8:40:35, 2.56s/it] +2025-02-05 20:09:21 - ERROR - stderr - +2025-02-05 20:09:21 - ERROR - stderr - +2025-02-05 20:09:21 - INFO - stdout - {'loss': 0.6946, 'grad_norm': 1.0759931802749634, 'learning_rate': 1.1929658580425257e-05, 'epoch': 1.37} +2025-02-05 20:09:21 - ERROR - stderr - 46%|████▌ | 10209/22434 [10:01:41<8:40:35, 2.56s/it] +2025-02-05 20:09:23 - ERROR - stderr - 46%|████▌ | 10210/22434 [10:01:43<8:33:18, 2.52s/it] +2025-02-05 20:09:23 - ERROR - stderr - +2025-02-05 20:09:23 - ERROR - stderr - +2025-02-05 20:09:23 - INFO - stdout - {'loss': 0.8021, 'grad_norm': 1.2478537559509277, 'learning_rate': 1.192824194823217e-05, 'epoch': 1.37} +2025-02-05 20:09:23 - ERROR - stderr - 46%|████▌ | 10210/22434 [10:01:43<8:33:18, 2.52s/it] +2025-02-05 20:09:26 - ERROR - stderr - 46%|████▌ | 10211/22434 [10:01:45<8:33:53, 2.52s/it] +2025-02-05 20:09:26 - ERROR - stderr - +2025-02-05 20:09:26 - ERROR - stderr - +2025-02-05 
20:09:26 - INFO - stdout - {'loss': 0.6443, 'grad_norm': 1.1422759294509888, 'learning_rate': 1.1926825275846722e-05, 'epoch': 1.37} +2025-02-05 20:09:26 - ERROR - stderr - 46%|████▌ | 10211/22434 [10:01:45<8:33:53, 2.52s/it] +2025-02-05 20:09:28 - ERROR - stderr - 46%|████▌ | 10212/22434 [10:01:48<8:36:28, 2.54s/it] +2025-02-05 20:09:28 - ERROR - stderr - +2025-02-05 20:09:28 - ERROR - stderr - +2025-02-05 20:09:28 - INFO - stdout - {'loss': 0.6729, 'grad_norm': 1.1099671125411987, 'learning_rate': 1.1925408563298448e-05, 'epoch': 1.37} +2025-02-05 20:09:28 - ERROR - stderr - 46%|████▌ | 10212/22434 [10:01:48<8:36:28, 2.54s/it] +2025-02-05 20:09:31 - ERROR - stderr - 46%|████▌ | 10213/22434 [10:01:51<8:38:17, 2.54s/it] +2025-02-05 20:09:31 - ERROR - stderr - +2025-02-05 20:09:31 - ERROR - stderr - +2025-02-05 20:09:31 - INFO - stdout - {'loss': 0.7681, 'grad_norm': 1.241811990737915, 'learning_rate': 1.192399181061688e-05, 'epoch': 1.37} +2025-02-05 20:09:31 - ERROR - stderr - 46%|████▌ | 10213/22434 [10:01:51<8:38:17, 2.54s/it] +2025-02-05 20:09:33 - ERROR - stderr - 46%|████▌ | 10214/22434 [10:01:53<8:33:54, 2.52s/it] +2025-02-05 20:09:33 - ERROR - stderr - +2025-02-05 20:09:33 - ERROR - stderr - +2025-02-05 20:09:33 - INFO - stdout - {'loss': 0.7192, 'grad_norm': 1.4407131671905518, 'learning_rate': 1.1922575017831538e-05, 'epoch': 1.37} +2025-02-05 20:09:33 - ERROR - stderr - 46%|████▌ | 10214/22434 [10:01:53<8:33:54, 2.52s/it] +2025-02-05 20:09:36 - ERROR - stderr - 46%|████▌ | 10215/22434 [10:01:55<8:28:24, 2.50s/it] +2025-02-05 20:09:36 - ERROR - stderr - +2025-02-05 20:09:36 - ERROR - stderr - +2025-02-05 20:09:36 - INFO - stdout - {'loss': 0.7019, 'grad_norm': 1.166901707649231, 'learning_rate': 1.1921158184971959e-05, 'epoch': 1.37} +2025-02-05 20:09:36 - ERROR - stderr - 46%|████▌ | 10215/22434 [10:01:56<8:28:24, 2.50s/it] +2025-02-05 20:09:39 - ERROR - stderr - 46%|████▌ | 10216/22434 [10:01:58<8:48:47, 2.60s/it] +2025-02-05 20:09:39 - ERROR - stderr - 
+2025-02-05 20:09:39 - ERROR - stderr - +2025-02-05 20:09:39 - INFO - stdout - {'loss': 0.7095, 'grad_norm': 1.1612164974212646, 'learning_rate': 1.1919741312067676e-05, 'epoch': 1.37} +2025-02-05 20:09:39 - ERROR - stderr - 46%|████▌ | 10216/22434 [10:01:58<8:48:47, 2.60s/it] +2025-02-05 20:09:41 - ERROR - stderr - 46%|████▌ | 10217/22434 [10:02:01<8:40:18, 2.56s/it] +2025-02-05 20:09:41 - ERROR - stderr - +2025-02-05 20:09:41 - ERROR - stderr - +2025-02-05 20:09:41 - INFO - stdout - {'loss': 0.6672, 'grad_norm': 1.1855413913726807, 'learning_rate': 1.1918324399148225e-05, 'epoch': 1.37} +2025-02-05 20:09:41 - ERROR - stderr - 46%|████▌ | 10217/22434 [10:02:01<8:40:18, 2.56s/it] +2025-02-05 20:09:43 - ERROR - stderr - 46%|████▌ | 10218/22434 [10:02:03<8:34:22, 2.53s/it] +2025-02-05 20:09:44 - ERROR - stderr - +2025-02-05 20:09:44 - ERROR - stderr - +2025-02-05 20:09:44 - INFO - stdout - {'loss': 0.6942, 'grad_norm': 1.2104783058166504, 'learning_rate': 1.1916907446243135e-05, 'epoch': 1.37} +2025-02-05 20:09:44 - ERROR - stderr - 46%|████▌ | 10218/22434 [10:02:03<8:34:22, 2.53s/it] +2025-02-05 20:09:46 - ERROR - stderr - 46%|████▌ | 10219/22434 [10:02:06<8:30:43, 2.51s/it] +2025-02-05 20:09:46 - ERROR - stderr - +2025-02-05 20:09:46 - ERROR - stderr - +2025-02-05 20:09:46 - INFO - stdout - {'loss': 0.7738, 'grad_norm': 1.2359639406204224, 'learning_rate': 1.1915490453381946e-05, 'epoch': 1.37} +2025-02-05 20:09:46 - ERROR - stderr - 46%|████▌ | 10219/22434 [10:02:06<8:30:43, 2.51s/it] +2025-02-05 20:09:48 - ERROR - stderr - 46%|████▌ | 10220/22434 [10:02:08<8:28:05, 2.50s/it] +2025-02-05 20:09:48 - ERROR - stderr - +2025-02-05 20:09:48 - ERROR - stderr - +2025-02-05 20:09:48 - INFO - stdout - {'loss': 0.6798, 'grad_norm': 1.1119080781936646, 'learning_rate': 1.1914073420594189e-05, 'epoch': 1.37} +2025-02-05 20:09:48 - ERROR - stderr - 46%|████▌ | 10220/22434 [10:02:08<8:28:05, 2.50s/it] +2025-02-05 20:09:51 - ERROR - stderr - 46%|████▌ | 10221/22434 
[10:02:11<8:25:36, 2.48s/it] +2025-02-05 20:09:51 - ERROR - stderr - +2025-02-05 20:09:51 - ERROR - stderr - +2025-02-05 20:09:51 - INFO - stdout - {'loss': 0.6575, 'grad_norm': 1.133814811706543, 'learning_rate': 1.1912656347909406e-05, 'epoch': 1.37} +2025-02-05 20:09:51 - ERROR - stderr - 46%|████▌ | 10221/22434 [10:02:11<8:25:36, 2.48s/it] +2025-02-05 20:09:53 - ERROR - stderr - 46%|████▌ | 10222/22434 [10:02:13<8:24:14, 2.48s/it] +2025-02-05 20:09:53 - ERROR - stderr - +2025-02-05 20:09:53 - ERROR - stderr - +2025-02-05 20:09:53 - INFO - stdout - {'loss': 0.7183, 'grad_norm': 1.2471286058425903, 'learning_rate': 1.191123923535713e-05, 'epoch': 1.37} +2025-02-05 20:09:53 - ERROR - stderr - 46%|████▌ | 10222/22434 [10:02:13<8:24:14, 2.48s/it] +2025-02-05 20:09:56 - ERROR - stderr - 46%|████▌ | 10223/22434 [10:02:16<8:27:48, 2.50s/it] +2025-02-05 20:09:56 - ERROR - stderr - +2025-02-05 20:09:56 - ERROR - stderr - +2025-02-05 20:09:56 - INFO - stdout - {'loss': 0.6879, 'grad_norm': 1.2019598484039307, 'learning_rate': 1.1909822082966902e-05, 'epoch': 1.37} +2025-02-05 20:09:56 - ERROR - stderr - 46%|████▌ | 10223/22434 [10:02:16<8:27:48, 2.50s/it] +2025-02-05 20:09:58 - ERROR - stderr - 46%|████▌ | 10224/22434 [10:02:18<8:28:27, 2.50s/it] +2025-02-05 20:09:58 - ERROR - stderr - +2025-02-05 20:09:58 - ERROR - stderr - +2025-02-05 20:09:58 - INFO - stdout - {'loss': 0.6975, 'grad_norm': 1.1864873170852661, 'learning_rate': 1.1908404890768255e-05, 'epoch': 1.37} +2025-02-05 20:09:58 - ERROR - stderr - 46%|████▌ | 10224/22434 [10:02:18<8:28:27, 2.50s/it] +2025-02-05 20:10:01 - ERROR - stderr - 46%|████▌ | 10225/22434 [10:02:21<8:27:49, 2.50s/it] +2025-02-05 20:10:01 - ERROR - stderr - +2025-02-05 20:10:01 - ERROR - stderr - +2025-02-05 20:10:01 - INFO - stdout - {'loss': 0.8002, 'grad_norm': 1.288870096206665, 'learning_rate': 1.1906987658790741e-05, 'epoch': 1.37} +2025-02-05 20:10:01 - ERROR - stderr - 46%|████▌ | 10225/22434 [10:02:21<8:27:49, 2.50s/it] +2025-02-05 
20:10:03 - ERROR - stderr - 46%|████▌ | 10226/22434 [10:02:23<8:27:48, 2.50s/it] +2025-02-05 20:10:03 - ERROR - stderr - +2025-02-05 20:10:03 - ERROR - stderr - +2025-02-05 20:10:03 - INFO - stdout - {'loss': 0.7189, 'grad_norm': 1.2178617715835571, 'learning_rate': 1.1905570387063892e-05, 'epoch': 1.37} +2025-02-05 20:10:03 - ERROR - stderr - 46%|████▌ | 10226/22434 [10:02:23<8:27:48, 2.50s/it] +2025-02-05 20:10:06 - ERROR - stderr - 46%|████▌ | 10227/22434 [10:02:26<8:28:30, 2.50s/it] +2025-02-05 20:10:06 - ERROR - stderr - +2025-02-05 20:10:06 - ERROR - stderr - +2025-02-05 20:10:06 - INFO - stdout - {'loss': 0.7178, 'grad_norm': 1.2314642667770386, 'learning_rate': 1.190415307561725e-05, 'epoch': 1.37} +2025-02-05 20:10:06 - ERROR - stderr - 46%|████▌ | 10227/22434 [10:02:26<8:28:30, 2.50s/it] +2025-02-05 20:10:08 - ERROR - stderr - 46%|████▌ | 10228/22434 [10:02:28<8:37:22, 2.54s/it] +2025-02-05 20:10:09 - ERROR - stderr - +2025-02-05 20:10:09 - ERROR - stderr - +2025-02-05 20:10:09 - INFO - stdout - {'loss': 0.7753, 'grad_norm': 1.2320245504379272, 'learning_rate': 1.190273572448036e-05, 'epoch': 1.37} +2025-02-05 20:10:09 - ERROR - stderr - 46%|████▌ | 10228/22434 [10:02:28<8:37:22, 2.54s/it] +2025-02-05 20:10:11 - ERROR - stderr - 46%|████▌ | 10229/22434 [10:02:31<8:32:40, 2.52s/it] +2025-02-05 20:10:11 - ERROR - stderr - +2025-02-05 20:10:11 - ERROR - stderr - +2025-02-05 20:10:11 - INFO - stdout - {'loss': 0.6797, 'grad_norm': 1.1743957996368408, 'learning_rate': 1.1901318333682765e-05, 'epoch': 1.37} +2025-02-05 20:10:11 - ERROR - stderr - 46%|████▌ | 10229/22434 [10:02:31<8:32:40, 2.52s/it] +2025-02-05 20:10:14 - ERROR - stderr - 46%|████▌ | 10230/22434 [10:02:33<8:37:15, 2.54s/it] +2025-02-05 20:10:14 - ERROR - stderr - +2025-02-05 20:10:14 - ERROR - stderr - +2025-02-05 20:10:14 - INFO - stdout - {'loss': 0.6625, 'grad_norm': 1.3338135480880737, 'learning_rate': 1.189990090325401e-05, 'epoch': 1.37} +2025-02-05 20:10:14 - ERROR - stderr - 46%|████▌ | 
10230/22434 [10:02:33<8:37:15, 2.54s/it] +2025-02-05 20:10:16 - ERROR - stderr - 46%|████▌ | 10231/22434 [10:02:36<8:43:49, 2.58s/it] +2025-02-05 20:10:16 - ERROR - stderr - +2025-02-05 20:10:16 - ERROR - stderr - +2025-02-05 20:10:16 - INFO - stdout - {'loss': 0.689, 'grad_norm': 1.2401553392410278, 'learning_rate': 1.1898483433223635e-05, 'epoch': 1.37} +2025-02-05 20:10:16 - ERROR - stderr - 46%|████▌ | 10231/22434 [10:02:36<8:43:49, 2.58s/it] +2025-02-05 20:10:19 - ERROR - stderr - 46%|████▌ | 10232/22434 [10:02:38<8:38:55, 2.55s/it] +2025-02-05 20:10:19 - ERROR - stderr - +2025-02-05 20:10:19 - ERROR - stderr - +2025-02-05 20:10:19 - INFO - stdout - {'loss': 0.74, 'grad_norm': 1.1727200746536255, 'learning_rate': 1.1897065923621191e-05, 'epoch': 1.37} +2025-02-05 20:10:19 - ERROR - stderr - 46%|████▌ | 10232/22434 [10:02:39<8:38:55, 2.55s/it] +2025-02-05 20:10:21 - ERROR - stderr - 46%|████▌ | 10233/22434 [10:02:41<8:36:58, 2.54s/it] +2025-02-05 20:10:21 - ERROR - stderr - +2025-02-05 20:10:21 - ERROR - stderr - +2025-02-05 20:10:21 - INFO - stdout - {'loss': 0.6962, 'grad_norm': 1.1763224601745605, 'learning_rate': 1.1895648374476227e-05, 'epoch': 1.37} +2025-02-05 20:10:21 - ERROR - stderr - 46%|████▌ | 10233/22434 [10:02:41<8:36:58, 2.54s/it] +2025-02-05 20:10:24 - ERROR - stderr - 46%|████▌ | 10234/22434 [10:02:44<8:38:47, 2.55s/it] +2025-02-05 20:10:24 - ERROR - stderr - +2025-02-05 20:10:24 - ERROR - stderr - +2025-02-05 20:10:24 - INFO - stdout - {'loss': 0.7223, 'grad_norm': 1.1437729597091675, 'learning_rate': 1.1894230785818284e-05, 'epoch': 1.37} +2025-02-05 20:10:24 - ERROR - stderr - 46%|████▌ | 10234/22434 [10:02:44<8:38:47, 2.55s/it] +2025-02-05 20:10:26 - ERROR - stderr - 46%|████▌ | 10235/22434 [10:02:46<8:36:04, 2.54s/it] +2025-02-05 20:10:26 - ERROR - stderr - +2025-02-05 20:10:26 - ERROR - stderr - +2025-02-05 20:10:26 - INFO - stdout - {'loss': 0.7544, 'grad_norm': 1.1838178634643555, 'learning_rate': 1.189281315767691e-05, 'epoch': 1.37} 
+2025-02-05 20:10:26 - ERROR - stderr - 46%|████▌ | 10235/22434 [10:02:46<8:36:04, 2.54s/it] +2025-02-05 20:10:29 - ERROR - stderr - 46%|████▌ | 10236/22434 [10:02:49<8:33:40, 2.53s/it] +2025-02-05 20:10:29 - ERROR - stderr - +2025-02-05 20:10:29 - ERROR - stderr - +2025-02-05 20:10:29 - INFO - stdout - {'loss': 0.6548, 'grad_norm': 1.1428289413452148, 'learning_rate': 1.1891395490081661e-05, 'epoch': 1.37} +2025-02-05 20:10:29 - ERROR - stderr - 46%|████▌ | 10236/22434 [10:02:49<8:33:40, 2.53s/it] +2025-02-05 20:10:31 - ERROR - stderr - 46%|████▌ | 10237/22434 [10:02:51<8:32:48, 2.52s/it] +2025-02-05 20:10:31 - ERROR - stderr - +2025-02-05 20:10:31 - ERROR - stderr - +2025-02-05 20:10:31 - INFO - stdout - {'loss': 0.7257, 'grad_norm': 1.4124630689620972, 'learning_rate': 1.1889977783062078e-05, 'epoch': 1.37} +2025-02-05 20:10:31 - ERROR - stderr - 46%|████▌ | 10237/22434 [10:02:51<8:32:48, 2.52s/it] +2025-02-05 20:10:34 - ERROR - stderr - 46%|████▌ | 10238/22434 [10:02:54<8:28:43, 2.50s/it] +2025-02-05 20:10:34 - ERROR - stderr - +2025-02-05 20:10:34 - ERROR - stderr - +2025-02-05 20:10:34 - INFO - stdout - {'loss': 0.6664, 'grad_norm': 1.2611563205718994, 'learning_rate': 1.1888560036647721e-05, 'epoch': 1.37} +2025-02-05 20:10:34 - ERROR - stderr - 46%|████▌ | 10238/22434 [10:02:54<8:28:43, 2.50s/it] +2025-02-05 20:10:36 - ERROR - stderr - 46%|████▌ | 10239/22434 [10:02:56<8:31:25, 2.52s/it] +2025-02-05 20:10:36 - ERROR - stderr - +2025-02-05 20:10:36 - ERROR - stderr - +2025-02-05 20:10:36 - INFO - stdout - {'loss': 0.7019, 'grad_norm': 1.153427243232727, 'learning_rate': 1.1887142250868135e-05, 'epoch': 1.37} +2025-02-05 20:10:36 - ERROR - stderr - 46%|████▌ | 10239/22434 [10:02:56<8:31:25, 2.52s/it] +2025-02-05 20:10:39 - ERROR - stderr - 46%|████▌ | 10240/22434 [10:02:59<8:29:18, 2.51s/it] +2025-02-05 20:10:39 - ERROR - stderr - +2025-02-05 20:10:39 - ERROR - stderr - +2025-02-05 20:10:39 - INFO - stdout - {'loss': 0.6219, 'grad_norm': 1.2976081371307373, 
'learning_rate': 1.1885724425752875e-05, 'epoch': 1.37} +2025-02-05 20:10:39 - ERROR - stderr - 46%|████▌ | 10240/22434 [10:02:59<8:29:18, 2.51s/it] +2025-02-05 20:10:41 - ERROR - stderr - 46%|████▌ | 10241/22434 [10:03:01<8:28:42, 2.50s/it] +2025-02-05 20:10:41 - ERROR - stderr - +2025-02-05 20:10:41 - ERROR - stderr - +2025-02-05 20:10:41 - INFO - stdout - {'loss': 0.7173, 'grad_norm': 1.2516354322433472, 'learning_rate': 1.1884306561331498e-05, 'epoch': 1.37} +2025-02-05 20:10:41 - ERROR - stderr - 46%|████▌ | 10241/22434 [10:03:01<8:28:42, 2.50s/it] +2025-02-05 20:10:44 - ERROR - stderr - 46%|████▌ | 10242/22434 [10:03:04<8:27:58, 2.50s/it] +2025-02-05 20:10:44 - ERROR - stderr - +2025-02-05 20:10:44 - ERROR - stderr - +2025-02-05 20:10:44 - INFO - stdout - {'loss': 0.7175, 'grad_norm': 1.3219366073608398, 'learning_rate': 1.188288865763355e-05, 'epoch': 1.37} +2025-02-05 20:10:44 - ERROR - stderr - 46%|████▌ | 10242/22434 [10:03:04<8:27:58, 2.50s/it] +2025-02-05 20:10:46 - ERROR - stderr - 46%|████▌ | 10243/22434 [10:03:06<8:29:37, 2.51s/it] +2025-02-05 20:10:46 - ERROR - stderr - +2025-02-05 20:10:46 - ERROR - stderr - +2025-02-05 20:10:46 - INFO - stdout - {'loss': 0.6155, 'grad_norm': 1.0133330821990967, 'learning_rate': 1.1881470714688585e-05, 'epoch': 1.37} +2025-02-05 20:10:46 - ERROR - stderr - 46%|████▌ | 10243/22434 [10:03:06<8:29:37, 2.51s/it] +2025-02-05 20:10:49 - ERROR - stderr - 46%|████▌ | 10244/22434 [10:03:09<8:25:04, 2.49s/it] +2025-02-05 20:10:49 - ERROR - stderr - +2025-02-05 20:10:49 - ERROR - stderr - +2025-02-05 20:10:49 - INFO - stdout - {'loss': 0.6971, 'grad_norm': 1.2487989664077759, 'learning_rate': 1.188005273252617e-05, 'epoch': 1.37} +2025-02-05 20:10:49 - ERROR - stderr - 46%|████▌ | 10244/22434 [10:03:09<8:25:04, 2.49s/it] +2025-02-05 20:10:51 - ERROR - stderr - 46%|████▌ | 10245/22434 [10:03:11<8:25:36, 2.49s/it] +2025-02-05 20:10:51 - ERROR - stderr - +2025-02-05 20:10:51 - ERROR - stderr - +2025-02-05 20:10:51 - INFO - 
stdout - {'loss': 0.6423, 'grad_norm': 1.1328601837158203, 'learning_rate': 1.1878634711175854e-05, 'epoch': 1.37} +2025-02-05 20:10:51 - ERROR - stderr - 46%|████▌ | 10245/22434 [10:03:11<8:25:36, 2.49s/it] +2025-02-05 20:10:54 - ERROR - stderr - 46%|████▌ | 10246/22434 [10:03:13<8:25:14, 2.49s/it] +2025-02-05 20:10:54 - ERROR - stderr - +2025-02-05 20:10:54 - ERROR - stderr - +2025-02-05 20:10:54 - INFO - stdout - {'loss': 0.7512, 'grad_norm': 1.2758080959320068, 'learning_rate': 1.1877216650667194e-05, 'epoch': 1.37} +2025-02-05 20:10:54 - ERROR - stderr - 46%|████▌ | 10246/22434 [10:03:14<8:25:14, 2.49s/it] +2025-02-05 20:10:56 - ERROR - stderr - 46%|████▌ | 10247/22434 [10:03:16<8:24:56, 2.49s/it] +2025-02-05 20:10:56 - ERROR - stderr - +2025-02-05 20:10:56 - ERROR - stderr - +2025-02-05 20:10:56 - INFO - stdout - {'loss': 0.7434, 'grad_norm': 1.2373908758163452, 'learning_rate': 1.1875798551029749e-05, 'epoch': 1.37} +2025-02-05 20:10:56 - ERROR - stderr - 46%|████▌ | 10247/22434 [10:03:16<8:24:56, 2.49s/it] +2025-02-05 20:10:59 - ERROR - stderr - 46%|████▌ | 10248/22434 [10:03:18<8:23:53, 2.48s/it] +2025-02-05 20:10:59 - ERROR - stderr - +2025-02-05 20:10:59 - ERROR - stderr - +2025-02-05 20:10:59 - INFO - stdout - {'loss': 0.7142, 'grad_norm': 1.1997580528259277, 'learning_rate': 1.1874380412293078e-05, 'epoch': 1.37} +2025-02-05 20:10:59 - ERROR - stderr - 46%|████▌ | 10248/22434 [10:03:19<8:23:53, 2.48s/it] +2025-02-05 20:11:01 - ERROR - stderr - 46%|████▌ | 10249/22434 [10:03:21<8:39:29, 2.56s/it] +2025-02-05 20:11:01 - ERROR - stderr - +2025-02-05 20:11:01 - ERROR - stderr - +2025-02-05 20:11:01 - INFO - stdout - {'loss': 0.7165, 'grad_norm': 1.1408528089523315, 'learning_rate': 1.187296223448674e-05, 'epoch': 1.37} +2025-02-05 20:11:01 - ERROR - stderr - 46%|████▌ | 10249/22434 [10:03:21<8:39:29, 2.56s/it] +2025-02-05 20:11:04 - ERROR - stderr - 46%|████▌ | 10250/22434 [10:03:24<8:34:39, 2.53s/it] +2025-02-05 20:11:04 - ERROR - stderr - +2025-02-05 
20:11:04 - ERROR - stderr - +2025-02-05 20:11:04 - INFO - stdout - {'loss': 0.6836, 'grad_norm': 1.1677137613296509, 'learning_rate': 1.1871544017640298e-05, 'epoch': 1.37} +2025-02-05 20:11:04 - ERROR - stderr - 46%|████▌ | 10250/22434 [10:03:24<8:34:39, 2.53s/it] +2025-02-05 20:11:06 - ERROR - stderr - 46%|████▌ | 10251/22434 [10:03:26<8:30:03, 2.51s/it] +2025-02-05 20:11:06 - ERROR - stderr - +2025-02-05 20:11:06 - ERROR - stderr - +2025-02-05 20:11:06 - INFO - stdout - {'loss': 0.7455, 'grad_norm': 1.2518094778060913, 'learning_rate': 1.1870125761783311e-05, 'epoch': 1.37} +2025-02-05 20:11:06 - ERROR - stderr - 46%|████▌ | 10251/22434 [10:03:26<8:30:03, 2.51s/it] +2025-02-05 20:11:09 - ERROR - stderr - 46%|████▌ | 10252/22434 [10:03:29<8:30:04, 2.51s/it] +2025-02-05 20:11:09 - ERROR - stderr - +2025-02-05 20:11:09 - ERROR - stderr - +2025-02-05 20:11:09 - INFO - stdout - {'loss': 0.7335, 'grad_norm': 1.2905768156051636, 'learning_rate': 1.1868707466945343e-05, 'epoch': 1.37} +2025-02-05 20:11:09 - ERROR - stderr - 46%|████▌ | 10252/22434 [10:03:29<8:30:04, 2.51s/it] +2025-02-05 20:11:11 - ERROR - stderr - 46%|████▌ | 10253/22434 [10:03:31<8:32:53, 2.53s/it] +2025-02-05 20:11:11 - ERROR - stderr - +2025-02-05 20:11:11 - ERROR - stderr - +2025-02-05 20:11:11 - INFO - stdout - {'loss': 0.6254, 'grad_norm': 1.076263427734375, 'learning_rate': 1.1867289133155957e-05, 'epoch': 1.37} +2025-02-05 20:11:11 - ERROR - stderr - 46%|████▌ | 10253/22434 [10:03:31<8:32:53, 2.53s/it] +2025-02-05 20:11:14 - ERROR - stderr - 46%|████▌ | 10254/22434 [10:03:34<8:33:56, 2.53s/it] +2025-02-05 20:11:14 - ERROR - stderr - +2025-02-05 20:11:14 - ERROR - stderr - +2025-02-05 20:11:14 - INFO - stdout - {'loss': 0.6416, 'grad_norm': 1.127852439880371, 'learning_rate': 1.1865870760444715e-05, 'epoch': 1.37} +2025-02-05 20:11:14 - ERROR - stderr - 46%|████▌ | 10254/22434 [10:03:34<8:33:56, 2.53s/it] +2025-02-05 20:11:16 - ERROR - stderr - 46%|████▌ | 10255/22434 [10:03:36<8:30:59, 
2.52s/it] +2025-02-05 20:11:17 - ERROR - stderr - +2025-02-05 20:11:17 - ERROR - stderr - +2025-02-05 20:11:17 - INFO - stdout - {'loss': 0.7284, 'grad_norm': 1.1369438171386719, 'learning_rate': 1.1864452348841182e-05, 'epoch': 1.37} +2025-02-05 20:11:17 - ERROR - stderr - 46%|████▌ | 10255/22434 [10:03:36<8:30:59, 2.52s/it] +2025-02-05 20:11:19 - ERROR - stderr - 46%|████▌ | 10256/22434 [10:03:39<8:34:39, 2.54s/it] +2025-02-05 20:11:19 - ERROR - stderr - +2025-02-05 20:11:19 - ERROR - stderr - +2025-02-05 20:11:19 - INFO - stdout - {'loss': 0.6851, 'grad_norm': 1.1914016008377075, 'learning_rate': 1.1863033898374921e-05, 'epoch': 1.37} +2025-02-05 20:11:19 - ERROR - stderr - 46%|████▌ | 10256/22434 [10:03:39<8:34:39, 2.54s/it] +2025-02-05 20:11:22 - ERROR - stderr - 46%|████▌ | 10257/22434 [10:03:41<8:34:46, 2.54s/it] +2025-02-05 20:11:22 - ERROR - stderr - +2025-02-05 20:11:22 - ERROR - stderr - +2025-02-05 20:11:22 - INFO - stdout - {'loss': 0.6197, 'grad_norm': 1.1593722105026245, 'learning_rate': 1.1861615409075507e-05, 'epoch': 1.37} +2025-02-05 20:11:22 - ERROR - stderr - 46%|████▌ | 10257/22434 [10:03:41<8:34:46, 2.54s/it] +2025-02-05 20:11:24 - ERROR - stderr - 46%|████▌ | 10258/22434 [10:03:44<8:33:48, 2.53s/it] +2025-02-05 20:11:24 - ERROR - stderr - +2025-02-05 20:11:24 - ERROR - stderr - +2025-02-05 20:11:24 - INFO - stdout - {'loss': 0.6785, 'grad_norm': 1.0651557445526123, 'learning_rate': 1.1860196880972496e-05, 'epoch': 1.37} +2025-02-05 20:11:24 - ERROR - stderr - 46%|████▌ | 10258/22434 [10:03:44<8:33:48, 2.53s/it] +2025-02-05 20:11:27 - ERROR - stderr - 46%|████▌ | 10259/22434 [10:03:47<8:49:51, 2.61s/it] +2025-02-05 20:11:27 - ERROR - stderr - +2025-02-05 20:11:27 - ERROR - stderr - +2025-02-05 20:11:27 - INFO - stdout - {'loss': 0.6963, 'grad_norm': 1.2098373174667358, 'learning_rate': 1.1858778314095462e-05, 'epoch': 1.37} +2025-02-05 20:11:27 - ERROR - stderr - 46%|████▌ | 10259/22434 [10:03:47<8:49:51, 2.61s/it] +2025-02-05 20:11:29 - 
ERROR - stderr - 46%|████▌ | 10260/22434 [10:03:49<8:42:22, 2.57s/it] +2025-02-05 20:11:29 - ERROR - stderr - +2025-02-05 20:11:29 - ERROR - stderr - +2025-02-05 20:11:29 - INFO - stdout - {'loss': 0.7039, 'grad_norm': 1.1660557985305786, 'learning_rate': 1.1857359708473975e-05, 'epoch': 1.37} +2025-02-05 20:11:29 - ERROR - stderr - 46%|████▌ | 10260/22434 [10:03:49<8:42:22, 2.57s/it] +2025-02-05 20:11:32 - ERROR - stderr - 46%|████▌ | 10261/22434 [10:03:52<8:39:09, 2.56s/it] +2025-02-05 20:11:32 - ERROR - stderr - +2025-02-05 20:11:32 - ERROR - stderr - +2025-02-05 20:11:32 - INFO - stdout - {'loss': 0.7796, 'grad_norm': 1.2848864793777466, 'learning_rate': 1.1855941064137602e-05, 'epoch': 1.37} +2025-02-05 20:11:32 - ERROR - stderr - 46%|████▌ | 10261/22434 [10:03:52<8:39:09, 2.56s/it] +2025-02-05 20:11:34 - ERROR - stderr - 46%|████▌ | 10262/22434 [10:03:54<8:30:25, 2.52s/it] +2025-02-05 20:11:34 - ERROR - stderr - +2025-02-05 20:11:34 - ERROR - stderr - +2025-02-05 20:11:34 - INFO - stdout - {'loss': 0.7129, 'grad_norm': 1.2703560590744019, 'learning_rate': 1.185452238111591e-05, 'epoch': 1.37} +2025-02-05 20:11:34 - ERROR - stderr - 46%|████▌ | 10262/22434 [10:03:54<8:30:25, 2.52s/it] +2025-02-05 20:11:37 - ERROR - stderr - 46%|████▌ | 10263/22434 [10:03:57<8:37:42, 2.55s/it] +2025-02-05 20:11:37 - ERROR - stderr - +2025-02-05 20:11:37 - ERROR - stderr - +2025-02-05 20:11:37 - INFO - stdout - {'loss': 0.6344, 'grad_norm': 1.0441081523895264, 'learning_rate': 1.1853103659438477e-05, 'epoch': 1.37} +2025-02-05 20:11:37 - ERROR - stderr - 46%|████▌ | 10263/22434 [10:03:57<8:37:42, 2.55s/it] +2025-02-05 20:11:39 - ERROR - stderr - 46%|████▌ | 10264/22434 [10:03:59<8:31:34, 2.52s/it] +2025-02-05 20:11:39 - ERROR - stderr - +2025-02-05 20:11:39 - ERROR - stderr - +2025-02-05 20:11:39 - INFO - stdout - {'loss': 0.6759, 'grad_norm': 1.1877859830856323, 'learning_rate': 1.185168489913487e-05, 'epoch': 1.37} +2025-02-05 20:11:39 - ERROR - stderr - 46%|████▌ | 
10264/22434 [10:03:59<8:31:34, 2.52s/it] +2025-02-05 20:11:42 - ERROR - stderr - 46%|████▌ | 10265/22434 [10:04:02<8:30:46, 2.52s/it] +2025-02-05 20:11:42 - ERROR - stderr - +2025-02-05 20:11:42 - ERROR - stderr - +2025-02-05 20:11:42 - INFO - stdout - {'loss': 0.7183, 'grad_norm': 1.348563313484192, 'learning_rate': 1.1850266100234665e-05, 'epoch': 1.37} +2025-02-05 20:11:42 - ERROR - stderr - 46%|████▌ | 10265/22434 [10:04:02<8:30:46, 2.52s/it] +2025-02-05 20:11:44 - ERROR - stderr - 46%|████▌ | 10266/22434 [10:04:04<8:29:36, 2.51s/it] +2025-02-05 20:11:44 - ERROR - stderr - +2025-02-05 20:11:44 - ERROR - stderr - +2025-02-05 20:11:44 - INFO - stdout - {'loss': 0.8149, 'grad_norm': 1.2906465530395508, 'learning_rate': 1.1848847262767431e-05, 'epoch': 1.37} +2025-02-05 20:11:44 - ERROR - stderr - 46%|████▌ | 10266/22434 [10:04:04<8:29:36, 2.51s/it] +2025-02-05 20:11:47 - ERROR - stderr - 46%|████▌ | 10267/22434 [10:04:07<8:37:02, 2.55s/it] +2025-02-05 20:11:47 - ERROR - stderr - +2025-02-05 20:11:47 - ERROR - stderr - +2025-02-05 20:11:47 - INFO - stdout - {'loss': 0.6751, 'grad_norm': 1.2016907930374146, 'learning_rate': 1.1847428386762748e-05, 'epoch': 1.37} +2025-02-05 20:11:47 - ERROR - stderr - 46%|████▌ | 10267/22434 [10:04:07<8:37:02, 2.55s/it] +2025-02-05 20:11:50 - ERROR - stderr - 46%|████▌ | 10268/22434 [10:04:09<8:32:20, 2.53s/it] +2025-02-05 20:11:50 - ERROR - stderr - +2025-02-05 20:11:50 - ERROR - stderr - +2025-02-05 20:11:50 - INFO - stdout - {'loss': 0.7459, 'grad_norm': 1.2858937978744507, 'learning_rate': 1.1846009472250183e-05, 'epoch': 1.37} +2025-02-05 20:11:50 - ERROR - stderr - 46%|████▌ | 10268/22434 [10:04:09<8:32:20, 2.53s/it] +2025-02-05 20:11:52 - ERROR - stderr - 46%|████▌ | 10269/22434 [10:04:12<8:30:45, 2.52s/it] +2025-02-05 20:11:52 - ERROR - stderr - +2025-02-05 20:11:52 - ERROR - stderr - +2025-02-05 20:11:52 - INFO - stdout - {'loss': 0.6663, 'grad_norm': 1.0750868320465088, 'learning_rate': 1.1844590519259321e-05, 'epoch': 
1.37} +2025-02-05 20:11:52 - ERROR - stderr - 46%|████▌ | 10269/22434 [10:04:12<8:30:45, 2.52s/it] +2025-02-05 20:11:54 - ERROR - stderr - 46%|████▌ | 10270/22434 [10:04:14<8:26:10, 2.50s/it] +2025-02-05 20:11:55 - ERROR - stderr - +2025-02-05 20:11:55 - ERROR - stderr - +2025-02-05 20:11:55 - INFO - stdout - {'loss': 0.7597, 'grad_norm': 1.2467623949050903, 'learning_rate': 1.1843171527819734e-05, 'epoch': 1.37} +2025-02-05 20:11:55 - ERROR - stderr - 46%|████▌ | 10270/22434 [10:04:14<8:26:10, 2.50s/it] +2025-02-05 20:11:57 - ERROR - stderr - 46%|████▌ | 10271/22434 [10:04:17<8:26:30, 2.50s/it] +2025-02-05 20:11:57 - ERROR - stderr - +2025-02-05 20:11:57 - ERROR - stderr - +2025-02-05 20:11:57 - INFO - stdout - {'loss': 0.7193, 'grad_norm': 1.2384566068649292, 'learning_rate': 1.1841752497961001e-05, 'epoch': 1.37} +2025-02-05 20:11:57 - ERROR - stderr - 46%|████▌ | 10271/22434 [10:04:17<8:26:30, 2.50s/it] +2025-02-05 20:12:00 - ERROR - stderr - 46%|████▌ | 10272/22434 [10:04:19<8:28:53, 2.51s/it] +2025-02-05 20:12:00 - ERROR - stderr - +2025-02-05 20:12:00 - ERROR - stderr - +2025-02-05 20:12:00 - INFO - stdout - {'loss': 0.7314, 'grad_norm': 1.1998809576034546, 'learning_rate': 1.1840333429712699e-05, 'epoch': 1.37} +2025-02-05 20:12:00 - ERROR - stderr - 46%|████▌ | 10272/22434 [10:04:19<8:28:53, 2.51s/it] +2025-02-05 20:12:02 - ERROR - stderr - 46%|████▌ | 10273/22434 [10:04:22<8:26:20, 2.50s/it] +2025-02-05 20:12:02 - ERROR - stderr - +2025-02-05 20:12:02 - ERROR - stderr - +2025-02-05 20:12:02 - INFO - stdout - {'loss': 0.7097, 'grad_norm': 1.2076008319854736, 'learning_rate': 1.1838914323104407e-05, 'epoch': 1.37} +2025-02-05 20:12:02 - ERROR - stderr - 46%|████▌ | 10273/22434 [10:04:22<8:26:20, 2.50s/it] +2025-02-05 20:12:04 - ERROR - stderr - 46%|████▌ | 10274/22434 [10:04:24<8:24:20, 2.49s/it] +2025-02-05 20:12:04 - ERROR - stderr - +2025-02-05 20:12:04 - ERROR - stderr - +2025-02-05 20:12:04 - INFO - stdout - {'loss': 0.6766, 'grad_norm': 
1.2304364442825317, 'learning_rate': 1.1837495178165706e-05, 'epoch': 1.37} +2025-02-05 20:12:05 - ERROR - stderr - 46%|████▌ | 10274/22434 [10:04:24<8:24:20, 2.49s/it] +2025-02-05 20:12:07 - ERROR - stderr - 46%|████▌ | 10275/22434 [10:04:27<8:21:09, 2.47s/it] +2025-02-05 20:12:07 - ERROR - stderr - +2025-02-05 20:12:07 - ERROR - stderr - +2025-02-05 20:12:07 - INFO - stdout - {'loss': 0.8148, 'grad_norm': 1.3354172706604004, 'learning_rate': 1.1836075994926175e-05, 'epoch': 1.37} +2025-02-05 20:12:07 - ERROR - stderr - 46%|████▌ | 10275/22434 [10:04:27<8:21:09, 2.47s/it] +2025-02-05 20:12:09 - ERROR - stderr - 46%|████▌ | 10276/22434 [10:04:29<8:24:39, 2.49s/it] +2025-02-05 20:12:09 - ERROR - stderr - +2025-02-05 20:12:09 - ERROR - stderr - +2025-02-05 20:12:09 - INFO - stdout - {'loss': 0.7507, 'grad_norm': 1.2624297142028809, 'learning_rate': 1.1834656773415396e-05, 'epoch': 1.37} +2025-02-05 20:12:09 - ERROR - stderr - 46%|████▌ | 10276/22434 [10:04:29<8:24:39, 2.49s/it] +2025-02-05 20:12:12 - ERROR - stderr - 46%|████▌ | 10277/22434 [10:04:32<8:28:22, 2.51s/it] +2025-02-05 20:12:12 - ERROR - stderr - +2025-02-05 20:12:12 - ERROR - stderr - +2025-02-05 20:12:12 - INFO - stdout - {'loss': 0.6153, 'grad_norm': 1.1481683254241943, 'learning_rate': 1.1833237513662956e-05, 'epoch': 1.37} +2025-02-05 20:12:12 - ERROR - stderr - 46%|████▌ | 10277/22434 [10:04:32<8:28:22, 2.51s/it] +2025-02-05 20:12:14 - ERROR - stderr - 46%|████▌ | 10278/22434 [10:04:34<8:28:11, 2.51s/it] +2025-02-05 20:12:15 - ERROR - stderr - +2025-02-05 20:12:15 - ERROR - stderr - +2025-02-05 20:12:15 - INFO - stdout - {'loss': 0.7899, 'grad_norm': 1.1723748445510864, 'learning_rate': 1.1831818215698434e-05, 'epoch': 1.37} +2025-02-05 20:12:15 - ERROR - stderr - 46%|████▌ | 10278/22434 [10:04:34<8:28:11, 2.51s/it] +2025-02-05 20:12:17 - ERROR - stderr - 46%|████▌ | 10279/22434 [10:04:37<8:25:13, 2.49s/it] +2025-02-05 20:12:17 - ERROR - stderr - +2025-02-05 20:12:17 - ERROR - stderr - +2025-02-05 
20:12:17 - INFO - stdout - {'loss': 0.6765, 'grad_norm': 1.1131445169448853, 'learning_rate': 1.1830398879551412e-05, 'epoch': 1.37} +2025-02-05 20:12:17 - ERROR - stderr - 46%|████▌ | 10279/22434 [10:04:37<8:25:13, 2.49s/it] +2025-02-05 20:12:19 - ERROR - stderr - 46%|████▌ | 10280/22434 [10:04:39<8:26:29, 2.50s/it] +2025-02-05 20:12:20 - ERROR - stderr - +2025-02-05 20:12:20 - ERROR - stderr - +2025-02-05 20:12:20 - INFO - stdout - {'loss': 0.6567, 'grad_norm': 1.1286929845809937, 'learning_rate': 1.1828979505251476e-05, 'epoch': 1.37} +2025-02-05 20:12:20 - ERROR - stderr - 46%|████▌ | 10280/22434 [10:04:39<8:26:29, 2.50s/it] +2025-02-05 20:12:22 - ERROR - stderr - 46%|████▌ | 10281/22434 [10:04:42<8:23:40, 2.49s/it] +2025-02-05 20:12:22 - ERROR - stderr - +2025-02-05 20:12:22 - ERROR - stderr - +2025-02-05 20:12:22 - INFO - stdout - {'loss': 0.7466, 'grad_norm': 1.2521553039550781, 'learning_rate': 1.1827560092828215e-05, 'epoch': 1.37} +2025-02-05 20:12:22 - ERROR - stderr - 46%|████▌ | 10281/22434 [10:04:42<8:23:40, 2.49s/it] +2025-02-05 20:12:24 - ERROR - stderr - 46%|████▌ | 10282/22434 [10:04:44<8:21:15, 2.47s/it] +2025-02-05 20:12:24 - ERROR - stderr - +2025-02-05 20:12:24 - ERROR - stderr - +2025-02-05 20:12:24 - INFO - stdout - {'loss': 0.6765, 'grad_norm': 1.1224563121795654, 'learning_rate': 1.1826140642311211e-05, 'epoch': 1.37} +2025-02-05 20:12:24 - ERROR - stderr - 46%|████▌ | 10282/22434 [10:04:44<8:21:15, 2.47s/it] +2025-02-05 20:12:27 - ERROR - stderr - 46%|████▌ | 10283/22434 [10:04:47<8:22:52, 2.48s/it] +2025-02-05 20:12:27 - ERROR - stderr - +2025-02-05 20:12:27 - ERROR - stderr - +2025-02-05 20:12:27 - INFO - stdout - {'loss': 0.6323, 'grad_norm': 1.0615402460098267, 'learning_rate': 1.1824721153730052e-05, 'epoch': 1.38} +2025-02-05 20:12:27 - ERROR - stderr - 46%|████▌ | 10283/22434 [10:04:47<8:22:52, 2.48s/it] +2025-02-05 20:12:29 - ERROR - stderr - 46%|████▌ | 10284/22434 [10:04:49<8:21:46, 2.48s/it] +2025-02-05 20:12:29 - ERROR - 
stderr - +2025-02-05 20:12:29 - ERROR - stderr - +2025-02-05 20:12:29 - INFO - stdout - {'loss': 0.6851, 'grad_norm': 1.1387630701065063, 'learning_rate': 1.1823301627114327e-05, 'epoch': 1.38} +2025-02-05 20:12:29 - ERROR - stderr - 46%|████▌ | 10284/22434 [10:04:49<8:21:46, 2.48s/it] +2025-02-05 20:12:32 - ERROR - stderr - 46%|████▌ | 10285/22434 [10:04:52<8:30:02, 2.52s/it] +2025-02-05 20:12:32 - ERROR - stderr - +2025-02-05 20:12:32 - ERROR - stderr - +2025-02-05 20:12:32 - INFO - stdout - {'loss': 0.7696, 'grad_norm': 1.1740139722824097, 'learning_rate': 1.1821882062493625e-05, 'epoch': 1.38} +2025-02-05 20:12:32 - ERROR - stderr - 46%|████▌ | 10285/22434 [10:04:52<8:30:02, 2.52s/it] +2025-02-05 20:12:34 - ERROR - stderr - 46%|████▌ | 10286/22434 [10:04:54<8:26:42, 2.50s/it] +2025-02-05 20:12:34 - ERROR - stderr - +2025-02-05 20:12:34 - ERROR - stderr - +2025-02-05 20:12:34 - INFO - stdout - {'loss': 0.6315, 'grad_norm': 1.0665405988693237, 'learning_rate': 1.1820462459897537e-05, 'epoch': 1.38} +2025-02-05 20:12:34 - ERROR - stderr - 46%|████▌ | 10286/22434 [10:04:54<8:26:42, 2.50s/it] +2025-02-05 20:12:37 - ERROR - stderr - 46%|████▌ | 10287/22434 [10:04:57<8:25:46, 2.50s/it] +2025-02-05 20:12:37 - ERROR - stderr - +2025-02-05 20:12:37 - ERROR - stderr - +2025-02-05 20:12:37 - INFO - stdout - {'loss': 0.7425, 'grad_norm': 1.3269743919372559, 'learning_rate': 1.1819042819355649e-05, 'epoch': 1.38} +2025-02-05 20:12:37 - ERROR - stderr - 46%|████▌ | 10287/22434 [10:04:57<8:25:46, 2.50s/it] +2025-02-05 20:12:39 - ERROR - stderr - 46%|████▌ | 10288/22434 [10:04:59<8:21:06, 2.48s/it] +2025-02-05 20:12:39 - ERROR - stderr - +2025-02-05 20:12:39 - ERROR - stderr - +2025-02-05 20:12:39 - INFO - stdout - {'loss': 0.7271, 'grad_norm': 1.1500425338745117, 'learning_rate': 1.1817623140897552e-05, 'epoch': 1.38} +2025-02-05 20:12:39 - ERROR - stderr - 46%|████▌ | 10288/22434 [10:04:59<8:21:06, 2.48s/it] +2025-02-05 20:12:42 - ERROR - stderr - 46%|████▌ | 10289/22434 
[10:05:02<8:23:01, 2.49s/it] +2025-02-05 20:12:42 - ERROR - stderr - +2025-02-05 20:12:42 - ERROR - stderr - +2025-02-05 20:12:42 - INFO - stdout - {'loss': 0.789, 'grad_norm': 1.2580466270446777, 'learning_rate': 1.181620342455284e-05, 'epoch': 1.38} +2025-02-05 20:12:42 - ERROR - stderr - 46%|████▌ | 10289/22434 [10:05:02<8:23:01, 2.49s/it] +2025-02-05 20:12:44 - ERROR - stderr - 46%|████▌ | 10290/22434 [10:05:04<8:22:25, 2.48s/it] +2025-02-05 20:12:44 - ERROR - stderr - +2025-02-05 20:12:44 - ERROR - stderr - +2025-02-05 20:12:44 - INFO - stdout - {'loss': 0.8122, 'grad_norm': 1.2586510181427002, 'learning_rate': 1.1814783670351111e-05, 'epoch': 1.38} +2025-02-05 20:12:44 - ERROR - stderr - 46%|████▌ | 10290/22434 [10:05:04<8:22:25, 2.48s/it] +2025-02-05 20:12:47 - ERROR - stderr - 46%|████▌ | 10291/22434 [10:05:07<8:21:34, 2.48s/it] +2025-02-05 20:12:47 - ERROR - stderr - +2025-02-05 20:12:47 - ERROR - stderr - +2025-02-05 20:12:47 - INFO - stdout - {'loss': 0.8484, 'grad_norm': 1.2869205474853516, 'learning_rate': 1.1813363878321948e-05, 'epoch': 1.38} +2025-02-05 20:12:47 - ERROR - stderr - 46%|████▌ | 10291/22434 [10:05:07<8:21:34, 2.48s/it] +2025-02-05 20:12:49 - ERROR - stderr - 46%|████▌ | 10292/22434 [10:05:09<8:25:47, 2.50s/it] +2025-02-05 20:12:49 - ERROR - stderr - +2025-02-05 20:12:49 - ERROR - stderr - +2025-02-05 20:12:49 - INFO - stdout - {'loss': 0.691, 'grad_norm': 1.1745719909667969, 'learning_rate': 1.1811944048494952e-05, 'epoch': 1.38} +2025-02-05 20:12:49 - ERROR - stderr - 46%|████▌ | 10292/22434 [10:05:09<8:25:47, 2.50s/it] +2025-02-05 20:12:52 - ERROR - stderr - 46%|████▌ | 10293/22434 [10:05:12<8:28:37, 2.51s/it] +2025-02-05 20:12:52 - ERROR - stderr - +2025-02-05 20:12:52 - ERROR - stderr - +2025-02-05 20:12:52 - INFO - stdout - {'loss': 0.6828, 'grad_norm': 1.0377514362335205, 'learning_rate': 1.1810524180899716e-05, 'epoch': 1.38} +2025-02-05 20:12:52 - ERROR - stderr - 46%|████▌ | 10293/22434 [10:05:12<8:28:37, 2.51s/it] +2025-02-05 
20:12:54 - ERROR - stderr - 46%|████▌ | 10294/22434 [10:05:14<8:24:37, 2.49s/it] +2025-02-05 20:12:54 - ERROR - stderr - +2025-02-05 20:12:54 - ERROR - stderr - +2025-02-05 20:12:54 - INFO - stdout - {'loss': 0.6657, 'grad_norm': 1.106729507446289, 'learning_rate': 1.1809104275565835e-05, 'epoch': 1.38} +2025-02-05 20:12:54 - ERROR - stderr - 46%|████▌ | 10294/22434 [10:05:14<8:24:37, 2.49s/it] +2025-02-05 20:12:57 - ERROR - stderr - 46%|████▌ | 10295/22434 [10:05:17<8:25:35, 2.50s/it] +2025-02-05 20:12:57 - ERROR - stderr - +2025-02-05 20:12:57 - ERROR - stderr - +2025-02-05 20:12:57 - INFO - stdout - {'loss': 0.6978, 'grad_norm': 1.1703206300735474, 'learning_rate': 1.1807684332522906e-05, 'epoch': 1.38} +2025-02-05 20:12:57 - ERROR - stderr - 46%|████▌ | 10295/22434 [10:05:17<8:25:35, 2.50s/it] +2025-02-05 20:12:59 - ERROR - stderr - 46%|████▌ | 10296/22434 [10:05:19<8:25:48, 2.50s/it] +2025-02-05 20:12:59 - ERROR - stderr - +2025-02-05 20:12:59 - ERROR - stderr - +2025-02-05 20:12:59 - INFO - stdout - {'loss': 0.7048, 'grad_norm': 1.1567052602767944, 'learning_rate': 1.1806264351800527e-05, 'epoch': 1.38} +2025-02-05 20:12:59 - ERROR - stderr - 46%|████▌ | 10296/22434 [10:05:19<8:25:48, 2.50s/it] +2025-02-05 20:13:02 - ERROR - stderr - 46%|████▌ | 10297/22434 [10:05:22<8:26:40, 2.50s/it] +2025-02-05 20:13:02 - ERROR - stderr - +2025-02-05 20:13:02 - ERROR - stderr - +2025-02-05 20:13:02 - INFO - stdout - {'loss': 0.7305, 'grad_norm': 1.1369904279708862, 'learning_rate': 1.1804844333428299e-05, 'epoch': 1.38} +2025-02-05 20:13:02 - ERROR - stderr - 46%|████▌ | 10297/22434 [10:05:22<8:26:40, 2.50s/it] +2025-02-05 20:13:04 - ERROR - stderr - 46%|████▌ | 10298/22434 [10:05:24<8:20:49, 2.48s/it] +2025-02-05 20:13:04 - ERROR - stderr - +2025-02-05 20:13:04 - ERROR - stderr - +2025-02-05 20:13:04 - INFO - stdout - {'loss': 0.5883, 'grad_norm': 1.182319164276123, 'learning_rate': 1.1803424277435818e-05, 'epoch': 1.38} +2025-02-05 20:13:04 - ERROR - stderr - 46%|████▌ | 
10298/22434 [10:05:24<8:20:49, 2.48s/it] +2025-02-05 20:13:07 - ERROR - stderr - 46%|████▌ | 10299/22434 [10:05:26<8:21:04, 2.48s/it] +2025-02-05 20:13:07 - ERROR - stderr - +2025-02-05 20:13:07 - ERROR - stderr - +2025-02-05 20:13:07 - INFO - stdout - {'loss': 0.7143, 'grad_norm': 1.2064143419265747, 'learning_rate': 1.180200418385268e-05, 'epoch': 1.38} +2025-02-05 20:13:07 - ERROR - stderr - 46%|████▌ | 10299/22434 [10:05:27<8:21:04, 2.48s/it] +2025-02-05 20:13:09 - ERROR - stderr - 46%|████▌ | 10300/22434 [10:05:29<8:24:16, 2.49s/it] +2025-02-05 20:13:09 - ERROR - stderr - +2025-02-05 20:13:09 - ERROR - stderr - +2025-02-05 20:13:09 - INFO - stdout - {'loss': 0.6539, 'grad_norm': 1.1199012994766235, 'learning_rate': 1.180058405270849e-05, 'epoch': 1.38} +2025-02-05 20:13:09 - ERROR - stderr - 46%|████▌ | 10300/22434 [10:05:29<8:24:16, 2.49s/it] +2025-02-05 20:13:12 - ERROR - stderr - 46%|████▌ | 10301/22434 [10:05:31<8:21:53, 2.48s/it] +2025-02-05 20:13:12 - ERROR - stderr - +2025-02-05 20:13:12 - ERROR - stderr - +2025-02-05 20:13:12 - INFO - stdout - {'loss': 0.739, 'grad_norm': 1.131047248840332, 'learning_rate': 1.1799163884032847e-05, 'epoch': 1.38} +2025-02-05 20:13:12 - ERROR - stderr - 46%|████▌ | 10301/22434 [10:05:32<8:21:53, 2.48s/it] +2025-02-05 20:13:14 - ERROR - stderr - 46%|████▌ | 10302/22434 [10:05:34<8:22:28, 2.49s/it] +2025-02-05 20:13:14 - ERROR - stderr - +2025-02-05 20:13:14 - ERROR - stderr - +2025-02-05 20:13:14 - INFO - stdout - {'loss': 0.7465, 'grad_norm': 1.173695683479309, 'learning_rate': 1.1797743677855358e-05, 'epoch': 1.38} +2025-02-05 20:13:14 - ERROR - stderr - 46%|████▌ | 10302/22434 [10:05:34<8:22:28, 2.49s/it] +2025-02-05 20:13:17 - ERROR - stderr - 46%|████▌ | 10303/22434 [10:05:36<8:21:05, 2.48s/it] +2025-02-05 20:13:17 - ERROR - stderr - +2025-02-05 20:13:17 - ERROR - stderr - +2025-02-05 20:13:17 - INFO - stdout - {'loss': 0.7075, 'grad_norm': 1.143878698348999, 'learning_rate': 1.1796323434205622e-05, 'epoch': 1.38} 
+2025-02-05 20:13:17 - ERROR - stderr - 46%|████▌ | 10303/22434 [10:05:36<8:21:05, 2.48s/it] +2025-02-05 20:13:19 - ERROR - stderr - 46%|████▌ | 10304/22434 [10:05:39<8:24:23, 2.49s/it] +2025-02-05 20:13:19 - ERROR - stderr - +2025-02-05 20:13:19 - ERROR - stderr - +2025-02-05 20:13:19 - INFO - stdout - {'loss': 0.7586, 'grad_norm': 1.195981502532959, 'learning_rate': 1.179490315311324e-05, 'epoch': 1.38} +2025-02-05 20:13:19 - ERROR - stderr - 46%|████▌ | 10304/22434 [10:05:39<8:24:23, 2.49s/it] +2025-02-05 20:13:22 - ERROR - stderr - 46%|████▌ | 10305/22434 [10:05:41<8:22:52, 2.49s/it] +2025-02-05 20:13:22 - ERROR - stderr - +2025-02-05 20:13:22 - ERROR - stderr - +2025-02-05 20:13:22 - INFO - stdout - {'loss': 0.7788, 'grad_norm': 1.3747605085372925, 'learning_rate': 1.1793482834607822e-05, 'epoch': 1.38} +2025-02-05 20:13:22 - ERROR - stderr - 46%|████▌ | 10305/22434 [10:05:41<8:22:52, 2.49s/it] +2025-02-05 20:13:24 - ERROR - stderr - 46%|████▌ | 10306/22434 [10:05:44<8:21:38, 2.48s/it] +2025-02-05 20:13:24 - ERROR - stderr - +2025-02-05 20:13:24 - ERROR - stderr - +2025-02-05 20:13:24 - INFO - stdout - {'loss': 0.7699, 'grad_norm': 1.3642431497573853, 'learning_rate': 1.179206247871897e-05, 'epoch': 1.38} +2025-02-05 20:13:24 - ERROR - stderr - 46%|████▌ | 10306/22434 [10:05:44<8:21:38, 2.48s/it] +2025-02-05 20:13:27 - ERROR - stderr - 46%|████▌ | 10307/22434 [10:05:46<8:25:31, 2.50s/it] +2025-02-05 20:13:27 - ERROR - stderr - +2025-02-05 20:13:27 - ERROR - stderr - +2025-02-05 20:13:27 - INFO - stdout - {'loss': 0.7003, 'grad_norm': 1.3034253120422363, 'learning_rate': 1.1790642085476287e-05, 'epoch': 1.38} +2025-02-05 20:13:27 - ERROR - stderr - 46%|████▌ | 10307/22434 [10:05:47<8:25:31, 2.50s/it] +2025-02-05 20:13:29 - ERROR - stderr - 46%|████▌ | 10308/22434 [10:05:49<8:25:24, 2.50s/it] +2025-02-05 20:13:29 - ERROR - stderr - +2025-02-05 20:13:29 - ERROR - stderr - +2025-02-05 20:13:29 - INFO - stdout - {'loss': 0.8, 'grad_norm': 1.3358523845672607, 
'learning_rate': 1.1789221654909386e-05, 'epoch': 1.38} +2025-02-05 20:13:29 - ERROR - stderr - 46%|████▌ | 10308/22434 [10:05:49<8:25:24, 2.50s/it] +2025-02-05 20:13:32 - ERROR - stderr - 46%|████▌ | 10309/22434 [10:05:51<8:25:12, 2.50s/it] +2025-02-05 20:13:32 - ERROR - stderr - +2025-02-05 20:13:32 - ERROR - stderr - +2025-02-05 20:13:32 - INFO - stdout - {'loss': 0.7155, 'grad_norm': 1.1389260292053223, 'learning_rate': 1.1787801187047872e-05, 'epoch': 1.38} +2025-02-05 20:13:32 - ERROR - stderr - 46%|████▌ | 10309/22434 [10:05:52<8:25:12, 2.50s/it] +2025-02-05 20:13:34 - ERROR - stderr - 46%|████▌ | 10310/22434 [10:05:54<8:24:09, 2.50s/it] +2025-02-05 20:13:34 - ERROR - stderr - +2025-02-05 20:13:34 - ERROR - stderr - +2025-02-05 20:13:34 - INFO - stdout - {'loss': 0.7649, 'grad_norm': 1.2290832996368408, 'learning_rate': 1.1786380681921355e-05, 'epoch': 1.38} +2025-02-05 20:13:34 - ERROR - stderr - 46%|████▌ | 10310/22434 [10:05:54<8:24:09, 2.50s/it] +2025-02-05 20:13:37 - ERROR - stderr - 46%|████▌ | 10311/22434 [10:05:56<8:26:02, 2.50s/it] +2025-02-05 20:13:37 - ERROR - stderr - +2025-02-05 20:13:37 - ERROR - stderr - +2025-02-05 20:13:37 - INFO - stdout - {'loss': 0.7452, 'grad_norm': 1.311579704284668, 'learning_rate': 1.1784960139559441e-05, 'epoch': 1.38} +2025-02-05 20:13:37 - ERROR - stderr - 46%|████▌ | 10311/22434 [10:05:57<8:26:02, 2.50s/it] +2025-02-05 20:13:39 - ERROR - stderr - 46%|████▌ | 10312/22434 [10:05:59<8:27:05, 2.51s/it] +2025-02-05 20:13:39 - ERROR - stderr - +2025-02-05 20:13:39 - ERROR - stderr - +2025-02-05 20:13:39 - INFO - stdout - {'loss': 0.7387, 'grad_norm': 1.252864956855774, 'learning_rate': 1.1783539559991737e-05, 'epoch': 1.38} +2025-02-05 20:13:39 - ERROR - stderr - 46%|████▌ | 10312/22434 [10:05:59<8:27:05, 2.51s/it] +2025-02-05 20:13:42 - ERROR - stderr - 46%|████▌ | 10313/22434 [10:06:01<8:23:28, 2.49s/it] +2025-02-05 20:13:42 - ERROR - stderr - +2025-02-05 20:13:42 - ERROR - stderr - +2025-02-05 20:13:42 - INFO - 
stdout - {'loss': 0.7219, 'grad_norm': 1.2025372982025146, 'learning_rate': 1.178211894324786e-05, 'epoch': 1.38} +2025-02-05 20:13:42 - ERROR - stderr - 46%|████▌ | 10313/22434 [10:06:01<8:23:28, 2.49s/it] +2025-02-05 20:13:44 - ERROR - stderr - 46%|████▌ | 10314/22434 [10:06:04<8:20:46, 2.48s/it] +2025-02-05 20:13:44 - ERROR - stderr - +2025-02-05 20:13:44 - ERROR - stderr - +2025-02-05 20:13:44 - INFO - stdout - {'loss': 0.7064, 'grad_norm': 1.226413607597351, 'learning_rate': 1.1780698289357419e-05, 'epoch': 1.38} +2025-02-05 20:13:44 - ERROR - stderr - 46%|████▌ | 10314/22434 [10:06:04<8:20:46, 2.48s/it] +2025-02-05 20:13:47 - ERROR - stderr - 46%|████▌ | 10315/22434 [10:06:06<8:27:00, 2.51s/it] +2025-02-05 20:13:47 - ERROR - stderr - +2025-02-05 20:13:47 - ERROR - stderr - +2025-02-05 20:13:47 - INFO - stdout - {'loss': 0.7633, 'grad_norm': 1.3026734590530396, 'learning_rate': 1.1779277598350028e-05, 'epoch': 1.38} +2025-02-05 20:13:47 - ERROR - stderr - 46%|████▌ | 10315/22434 [10:06:07<8:27:00, 2.51s/it] +2025-02-05 20:13:50 - ERROR - stderr - 46%|████▌ | 10316/22434 [10:06:09<8:56:18, 2.66s/it] +2025-02-05 20:13:50 - ERROR - stderr - +2025-02-05 20:13:50 - ERROR - stderr - +2025-02-05 20:13:50 - INFO - stdout - {'loss': 0.6596, 'grad_norm': 1.1103025674819946, 'learning_rate': 1.1777856870255295e-05, 'epoch': 1.38} +2025-02-05 20:13:50 - ERROR - stderr - 46%|████▌ | 10316/22434 [10:06:10<8:56:18, 2.66s/it] +2025-02-05 20:13:52 - ERROR - stderr - 46%|████▌ | 10317/22434 [10:06:12<8:46:02, 2.60s/it] +2025-02-05 20:13:52 - ERROR - stderr - +2025-02-05 20:13:52 - ERROR - stderr - +2025-02-05 20:13:52 - INFO - stdout - {'loss': 0.7621, 'grad_norm': 1.1582976579666138, 'learning_rate': 1.1776436105102838e-05, 'epoch': 1.38} +2025-02-05 20:13:52 - ERROR - stderr - 46%|████▌ | 10317/22434 [10:06:12<8:46:02, 2.60s/it] +2025-02-05 20:13:55 - ERROR - stderr - 46%|████▌ | 10318/22434 [10:06:14<8:38:10, 2.57s/it] +2025-02-05 20:13:55 - ERROR - stderr - +2025-02-05 
20:13:55 - ERROR - stderr - +2025-02-05 20:13:55 - INFO - stdout - {'loss': 0.6599, 'grad_norm': 1.2690963745117188, 'learning_rate': 1.1775015302922273e-05, 'epoch': 1.38} +2025-02-05 20:13:55 - ERROR - stderr - 46%|████▌ | 10318/22434 [10:06:14<8:38:10, 2.57s/it] +2025-02-05 20:13:57 - ERROR - stderr - 46%|████▌ | 10319/22434 [10:06:17<8:33:24, 2.54s/it] +2025-02-05 20:13:57 - ERROR - stderr - +2025-02-05 20:13:57 - ERROR - stderr - +2025-02-05 20:13:57 - INFO - stdout - {'loss': 0.6629, 'grad_norm': 1.1598347425460815, 'learning_rate': 1.1773594463743207e-05, 'epoch': 1.38} +2025-02-05 20:13:57 - ERROR - stderr - 46%|████▌ | 10319/22434 [10:06:17<8:33:24, 2.54s/it] +2025-02-05 20:14:00 - ERROR - stderr - 46%|████▌ | 10320/22434 [10:06:19<8:32:16, 2.54s/it] +2025-02-05 20:14:00 - ERROR - stderr - +2025-02-05 20:14:00 - ERROR - stderr - +2025-02-05 20:14:00 - INFO - stdout - {'loss': 0.6953, 'grad_norm': 1.1277376413345337, 'learning_rate': 1.1772173587595263e-05, 'epoch': 1.38} +2025-02-05 20:14:00 - ERROR - stderr - 46%|████▌ | 10320/22434 [10:06:19<8:32:16, 2.54s/it] +2025-02-05 20:14:02 - ERROR - stderr - 46%|████▌ | 10321/22434 [10:06:22<8:29:58, 2.53s/it] +2025-02-05 20:14:02 - ERROR - stderr - +2025-02-05 20:14:02 - ERROR - stderr - +2025-02-05 20:14:02 - INFO - stdout - {'loss': 0.6657, 'grad_norm': 1.106000304222107, 'learning_rate': 1.177075267450806e-05, 'epoch': 1.38} +2025-02-05 20:14:02 - ERROR - stderr - 46%|████▌ | 10321/22434 [10:06:22<8:29:58, 2.53s/it] +2025-02-05 20:14:05 - ERROR - stderr - 46%|████▌ | 10322/22434 [10:06:24<8:25:32, 2.50s/it] +2025-02-05 20:14:05 - ERROR - stderr - +2025-02-05 20:14:05 - ERROR - stderr - +2025-02-05 20:14:05 - INFO - stdout - {'loss': 0.6789, 'grad_norm': 1.133597731590271, 'learning_rate': 1.1769331724511211e-05, 'epoch': 1.38} +2025-02-05 20:14:05 - ERROR - stderr - 46%|████▌ | 10322/22434 [10:06:24<8:25:32, 2.50s/it] +2025-02-05 20:14:07 - ERROR - stderr - 46%|████▌ | 10323/22434 [10:06:27<8:22:10, 2.49s/it] 
+2025-02-05 20:14:07 - ERROR - stderr - +2025-02-05 20:14:07 - ERROR - stderr - +2025-02-05 20:14:07 - INFO - stdout - {'loss': 0.6696, 'grad_norm': 1.1937872171401978, 'learning_rate': 1.1767910737634334e-05, 'epoch': 1.38} +2025-02-05 20:14:07 - ERROR - stderr - 46%|████▌ | 10323/22434 [10:06:27<8:22:10, 2.49s/it] +2025-02-05 20:14:10 - ERROR - stderr - 46%|████▌ | 10324/22434 [10:06:29<8:26:23, 2.51s/it] +2025-02-05 20:14:10 - ERROR - stderr - +2025-02-05 20:14:10 - ERROR - stderr - +2025-02-05 20:14:10 - INFO - stdout - {'loss': 0.7452, 'grad_norm': 1.1425434350967407, 'learning_rate': 1.1766489713907047e-05, 'epoch': 1.38} +2025-02-05 20:14:10 - ERROR - stderr - 46%|████▌ | 10324/22434 [10:06:29<8:26:23, 2.51s/it] +2025-02-05 20:14:12 - ERROR - stderr - 46%|████▌ | 10325/22434 [10:06:32<8:23:26, 2.49s/it] +2025-02-05 20:14:12 - ERROR - stderr - +2025-02-05 20:14:12 - ERROR - stderr - +2025-02-05 20:14:12 - INFO - stdout - {'loss': 0.6665, 'grad_norm': 1.12587571144104, 'learning_rate': 1.1765068653358975e-05, 'epoch': 1.38} +2025-02-05 20:14:12 - ERROR - stderr - 46%|████▌ | 10325/22434 [10:06:32<8:23:26, 2.49s/it] +2025-02-05 20:14:15 - ERROR - stderr - 46%|████▌ | 10326/22434 [10:06:34<8:31:05, 2.53s/it] +2025-02-05 20:14:15 - ERROR - stderr - +2025-02-05 20:14:15 - ERROR - stderr - +2025-02-05 20:14:15 - INFO - stdout - {'loss': 0.7606, 'grad_norm': 1.0703985691070557, 'learning_rate': 1.1763647556019735e-05, 'epoch': 1.38} +2025-02-05 20:14:15 - ERROR - stderr - 46%|████▌ | 10326/22434 [10:06:35<8:31:05, 2.53s/it] +2025-02-05 20:14:17 - ERROR - stderr - 46%|████▌ | 10327/22434 [10:06:37<8:28:28, 2.52s/it] +2025-02-05 20:14:17 - ERROR - stderr - +2025-02-05 20:14:17 - ERROR - stderr - +2025-02-05 20:14:17 - INFO - stdout - {'loss': 0.694, 'grad_norm': 1.0838770866394043, 'learning_rate': 1.176222642191895e-05, 'epoch': 1.38} +2025-02-05 20:14:17 - ERROR - stderr - 46%|████▌ | 10327/22434 [10:06:37<8:28:28, 2.52s/it] +2025-02-05 20:14:20 - ERROR - stderr - 
46%|████▌ | 10328/22434 [10:06:39<8:22:58, 2.49s/it] +2025-02-05 20:14:20 - ERROR - stderr - +2025-02-05 20:14:20 - ERROR - stderr - +2025-02-05 20:14:20 - INFO - stdout - {'loss': 0.7611, 'grad_norm': 1.2649205923080444, 'learning_rate': 1.176080525108624e-05, 'epoch': 1.38} +2025-02-05 20:14:20 - ERROR - stderr - 46%|████▌ | 10328/22434 [10:06:39<8:22:58, 2.49s/it] +2025-02-05 20:14:22 - ERROR - stderr - 46%|████▌ | 10329/22434 [10:06:42<8:22:19, 2.49s/it] +2025-02-05 20:14:22 - ERROR - stderr - +2025-02-05 20:14:22 - ERROR - stderr - +2025-02-05 20:14:22 - INFO - stdout - {'loss': 0.6768, 'grad_norm': 1.19253408908844, 'learning_rate': 1.1759384043551232e-05, 'epoch': 1.38} +2025-02-05 20:14:22 - ERROR - stderr - 46%|████▌ | 10329/22434 [10:06:42<8:22:19, 2.49s/it] +2025-02-05 20:14:25 - ERROR - stderr - 46%|████▌ | 10330/22434 [10:06:45<8:30:52, 2.53s/it] +2025-02-05 20:14:25 - ERROR - stderr - +2025-02-05 20:14:25 - ERROR - stderr - +2025-02-05 20:14:25 - INFO - stdout - {'loss': 0.6508, 'grad_norm': 1.1661680936813354, 'learning_rate': 1.1757962799343548e-05, 'epoch': 1.38} +2025-02-05 20:14:25 - ERROR - stderr - 46%|████▌ | 10330/22434 [10:06:45<8:30:52, 2.53s/it] +2025-02-05 20:14:27 - ERROR - stderr - 46%|████▌ | 10331/22434 [10:06:47<8:29:10, 2.52s/it] +2025-02-05 20:14:27 - ERROR - stderr - +2025-02-05 20:14:27 - ERROR - stderr - +2025-02-05 20:14:27 - INFO - stdout - {'loss': 0.7297, 'grad_norm': 1.1784833669662476, 'learning_rate': 1.175654151849281e-05, 'epoch': 1.38} +2025-02-05 20:14:27 - ERROR - stderr - 46%|████▌ | 10331/22434 [10:06:47<8:29:10, 2.52s/it] +2025-02-05 20:14:30 - ERROR - stderr - 46%|████▌ | 10332/22434 [10:06:50<8:26:45, 2.51s/it] +2025-02-05 20:14:30 - ERROR - stderr - +2025-02-05 20:14:30 - ERROR - stderr - +2025-02-05 20:14:30 - INFO - stdout - {'loss': 0.6632, 'grad_norm': 1.1571674346923828, 'learning_rate': 1.1755120201028642e-05, 'epoch': 1.38} +2025-02-05 20:14:30 - ERROR - stderr - 46%|████▌ | 10332/22434 
[10:06:50<8:26:45, 2.51s/it] +2025-02-05 20:14:32 - ERROR - stderr - 46%|████▌ | 10333/22434 [10:06:52<8:28:21, 2.52s/it] +2025-02-05 20:14:32 - ERROR - stderr - +2025-02-05 20:14:32 - ERROR - stderr - +2025-02-05 20:14:32 - INFO - stdout - {'loss': 0.6959, 'grad_norm': 1.2020539045333862, 'learning_rate': 1.1753698846980677e-05, 'epoch': 1.38} +2025-02-05 20:14:32 - ERROR - stderr - 46%|████▌ | 10333/22434 [10:06:52<8:28:21, 2.52s/it] +2025-02-05 20:14:35 - ERROR - stderr - 46%|████▌ | 10334/22434 [10:06:55<8:27:04, 2.51s/it] +2025-02-05 20:14:35 - ERROR - stderr - +2025-02-05 20:14:35 - ERROR - stderr - +2025-02-05 20:14:35 - INFO - stdout - {'loss': 0.6462, 'grad_norm': 1.0686465501785278, 'learning_rate': 1.1752277456378536e-05, 'epoch': 1.38} +2025-02-05 20:14:35 - ERROR - stderr - 46%|████▌ | 10334/22434 [10:06:55<8:27:04, 2.51s/it] +2025-02-05 20:14:37 - ERROR - stderr - 46%|████▌ | 10335/22434 [10:06:57<8:20:07, 2.48s/it] +2025-02-05 20:14:37 - ERROR - stderr - +2025-02-05 20:14:37 - ERROR - stderr - +2025-02-05 20:14:37 - INFO - stdout - {'loss': 0.6715, 'grad_norm': 1.1543594598770142, 'learning_rate': 1.1750856029251847e-05, 'epoch': 1.38} +2025-02-05 20:14:37 - ERROR - stderr - 46%|████▌ | 10335/22434 [10:06:57<8:20:07, 2.48s/it] +2025-02-05 20:14:40 - ERROR - stderr - 46%|████▌ | 10336/22434 [10:07:00<8:27:39, 2.52s/it] +2025-02-05 20:14:40 - ERROR - stderr - +2025-02-05 20:14:40 - ERROR - stderr - +2025-02-05 20:14:40 - INFO - stdout - {'loss': 0.7385, 'grad_norm': 1.1921883821487427, 'learning_rate': 1.174943456563024e-05, 'epoch': 1.38} +2025-02-05 20:14:40 - ERROR - stderr - 46%|████▌ | 10336/22434 [10:07:00<8:27:39, 2.52s/it] +2025-02-05 20:14:42 - ERROR - stderr - 46%|████▌ | 10337/22434 [10:07:02<8:23:25, 2.50s/it] +2025-02-05 20:14:42 - ERROR - stderr - +2025-02-05 20:14:42 - ERROR - stderr - +2025-02-05 20:14:42 - INFO - stdout - {'loss': 0.7386, 'grad_norm': 1.2841330766677856, 'learning_rate': 1.1748013065543344e-05, 'epoch': 1.38} 
+2025-02-05 20:14:42 - ERROR - stderr - 46%|████▌ | 10337/22434 [10:07:02<8:23:25, 2.50s/it] +2025-02-05 20:14:45 - ERROR - stderr - 46%|████▌ | 10338/22434 [10:07:05<8:24:28, 2.50s/it] +2025-02-05 20:14:45 - ERROR - stderr - +2025-02-05 20:14:45 - ERROR - stderr - +2025-02-05 20:14:45 - INFO - stdout - {'loss': 0.6218, 'grad_norm': 1.1827079057693481, 'learning_rate': 1.1746591529020789e-05, 'epoch': 1.38} +2025-02-05 20:14:45 - ERROR - stderr - 46%|████▌ | 10338/22434 [10:07:05<8:24:28, 2.50s/it] +2025-02-05 20:14:47 - ERROR - stderr - 46%|████▌ | 10339/22434 [10:07:07<8:25:54, 2.51s/it] +2025-02-05 20:14:47 - ERROR - stderr - +2025-02-05 20:14:47 - ERROR - stderr - +2025-02-05 20:14:47 - INFO - stdout - {'loss': 0.7056, 'grad_norm': 1.093856692314148, 'learning_rate': 1.1745169956092204e-05, 'epoch': 1.38} +2025-02-05 20:14:47 - ERROR - stderr - 46%|████▌ | 10339/22434 [10:07:07<8:25:54, 2.51s/it] +2025-02-05 20:14:50 - ERROR - stderr - 46%|████▌ | 10340/22434 [10:07:10<8:34:34, 2.55s/it] +2025-02-05 20:14:50 - ERROR - stderr - +2025-02-05 20:14:50 - ERROR - stderr - +2025-02-05 20:14:50 - INFO - stdout - {'loss': 0.7121, 'grad_norm': 1.128021478652954, 'learning_rate': 1.174374834678722e-05, 'epoch': 1.38} +2025-02-05 20:14:50 - ERROR - stderr - 46%|████▌ | 10340/22434 [10:07:10<8:34:34, 2.55s/it] +2025-02-05 20:14:52 - ERROR - stderr - 46%|████▌ | 10341/22434 [10:07:12<8:29:10, 2.53s/it] +2025-02-05 20:14:52 - ERROR - stderr - +2025-02-05 20:14:52 - ERROR - stderr - +2025-02-05 20:14:52 - INFO - stdout - {'loss': 0.7339, 'grad_norm': 1.2980906963348389, 'learning_rate': 1.1742326701135473e-05, 'epoch': 1.38} +2025-02-05 20:14:52 - ERROR - stderr - 46%|████▌ | 10341/22434 [10:07:12<8:29:10, 2.53s/it] +2025-02-05 20:14:55 - ERROR - stderr - 46%|████▌ | 10342/22434 [10:07:15<8:25:51, 2.51s/it] +2025-02-05 20:14:55 - ERROR - stderr - +2025-02-05 20:14:55 - ERROR - stderr - +2025-02-05 20:14:55 - INFO - stdout - {'loss': 0.7134, 'grad_norm': 1.387661099433899, 
'learning_rate': 1.1740905019166594e-05, 'epoch': 1.38} +2025-02-05 20:14:55 - ERROR - stderr - 46%|████▌ | 10342/22434 [10:07:15<8:25:51, 2.51s/it] +2025-02-05 20:14:57 - ERROR - stderr - 46%|████▌ | 10343/22434 [10:07:17<8:23:03, 2.50s/it] +2025-02-05 20:14:57 - ERROR - stderr - +2025-02-05 20:14:57 - ERROR - stderr - +2025-02-05 20:14:57 - INFO - stdout - {'loss': 0.7705, 'grad_norm': 1.3027377128601074, 'learning_rate': 1.1739483300910213e-05, 'epoch': 1.38} +2025-02-05 20:14:57 - ERROR - stderr - 46%|████▌ | 10343/22434 [10:07:17<8:23:03, 2.50s/it] +2025-02-05 20:15:00 - ERROR - stderr - 46%|████▌ | 10344/22434 [10:07:20<8:26:34, 2.51s/it] +2025-02-05 20:15:00 - ERROR - stderr - +2025-02-05 20:15:00 - ERROR - stderr - +2025-02-05 20:15:00 - INFO - stdout - {'loss': 0.6934, 'grad_norm': 1.1753196716308594, 'learning_rate': 1.1738061546395967e-05, 'epoch': 1.38} +2025-02-05 20:15:00 - ERROR - stderr - 46%|████▌ | 10344/22434 [10:07:20<8:26:34, 2.51s/it] +2025-02-05 20:15:02 - ERROR - stderr - 46%|████▌ | 10345/22434 [10:07:22<8:21:59, 2.49s/it] +2025-02-05 20:15:02 - ERROR - stderr - +2025-02-05 20:15:02 - ERROR - stderr - +2025-02-05 20:15:02 - INFO - stdout - {'loss': 0.7607, 'grad_norm': 1.255450963973999, 'learning_rate': 1.1736639755653492e-05, 'epoch': 1.38} +2025-02-05 20:15:02 - ERROR - stderr - 46%|████▌ | 10345/22434 [10:07:22<8:21:59, 2.49s/it] +2025-02-05 20:15:05 - ERROR - stderr - 46%|████▌ | 10346/22434 [10:07:25<8:20:04, 2.48s/it] +2025-02-05 20:15:05 - ERROR - stderr - +2025-02-05 20:15:05 - ERROR - stderr - +2025-02-05 20:15:05 - INFO - stdout - {'loss': 0.7238, 'grad_norm': 1.2707215547561646, 'learning_rate': 1.1735217928712423e-05, 'epoch': 1.38} +2025-02-05 20:15:05 - ERROR - stderr - 46%|████▌ | 10346/22434 [10:07:25<8:20:04, 2.48s/it] +2025-02-05 20:15:07 - ERROR - stderr - 46%|████▌ | 10347/22434 [10:07:27<8:23:32, 2.50s/it] +2025-02-05 20:15:07 - ERROR - stderr - +2025-02-05 20:15:07 - ERROR - stderr - +2025-02-05 20:15:07 - INFO - 
stdout - {'loss': 0.7781, 'grad_norm': 1.229047417640686, 'learning_rate': 1.1733796065602397e-05, 'epoch': 1.38} +2025-02-05 20:15:07 - ERROR - stderr - 46%|████▌ | 10347/22434 [10:07:27<8:23:32, 2.50s/it] +2025-02-05 20:15:10 - ERROR - stderr - 46%|████▌ | 10348/22434 [10:07:30<8:26:36, 2.52s/it] +2025-02-05 20:15:10 - ERROR - stderr - +2025-02-05 20:15:10 - ERROR - stderr - +2025-02-05 20:15:10 - INFO - stdout - {'loss': 0.6732, 'grad_norm': 1.1879738569259644, 'learning_rate': 1.1732374166353051e-05, 'epoch': 1.38} +2025-02-05 20:15:10 - ERROR - stderr - 46%|████▌ | 10348/22434 [10:07:30<8:26:36, 2.52s/it] +2025-02-05 20:15:12 - ERROR - stderr - 46%|████▌ | 10349/22434 [10:07:32<8:29:48, 2.53s/it] +2025-02-05 20:15:12 - ERROR - stderr - +2025-02-05 20:15:12 - ERROR - stderr - +2025-02-05 20:15:12 - INFO - stdout - {'loss': 0.7634, 'grad_norm': 1.1346478462219238, 'learning_rate': 1.1730952230994022e-05, 'epoch': 1.38} +2025-02-05 20:15:12 - ERROR - stderr - 46%|████▌ | 10349/22434 [10:07:32<8:29:48, 2.53s/it] +2025-02-05 20:15:15 - ERROR - stderr - 46%|████▌ | 10350/22434 [10:07:35<8:26:14, 2.51s/it] +2025-02-05 20:15:15 - ERROR - stderr - +2025-02-05 20:15:15 - ERROR - stderr - +2025-02-05 20:15:15 - INFO - stdout - {'loss': 0.6875, 'grad_norm': 1.2419096231460571, 'learning_rate': 1.1729530259554953e-05, 'epoch': 1.38} +2025-02-05 20:15:15 - ERROR - stderr - 46%|████▌ | 10350/22434 [10:07:35<8:26:14, 2.51s/it] +2025-02-05 20:15:17 - ERROR - stderr - 46%|████▌ | 10351/22434 [10:07:37<8:25:01, 2.51s/it] +2025-02-05 20:15:17 - ERROR - stderr - +2025-02-05 20:15:17 - ERROR - stderr - +2025-02-05 20:15:17 - INFO - stdout - {'loss': 0.7693, 'grad_norm': 1.0874700546264648, 'learning_rate': 1.172810825206548e-05, 'epoch': 1.38} +2025-02-05 20:15:17 - ERROR - stderr - 46%|████▌ | 10351/22434 [10:07:37<8:25:01, 2.51s/it] +2025-02-05 20:15:20 - ERROR - stderr - 46%|████▌ | 10352/22434 [10:07:40<8:22:10, 2.49s/it] +2025-02-05 20:15:20 - ERROR - stderr - +2025-02-05 
20:15:20 - ERROR - stderr - +2025-02-05 20:15:20 - INFO - stdout - {'loss': 0.7107, 'grad_norm': 1.2425285577774048, 'learning_rate': 1.172668620855524e-05, 'epoch': 1.38} +2025-02-05 20:15:20 - ERROR - stderr - 46%|████▌ | 10352/22434 [10:07:40<8:22:10, 2.49s/it] +2025-02-05 20:15:22 - ERROR - stderr - 46%|████▌ | 10353/22434 [10:07:42<8:24:03, 2.50s/it] +2025-02-05 20:15:22 - ERROR - stderr - +2025-02-05 20:15:22 - ERROR - stderr - +2025-02-05 20:15:22 - INFO - stdout - {'loss': 0.7262, 'grad_norm': 1.1933974027633667, 'learning_rate': 1.1725264129053881e-05, 'epoch': 1.38} +2025-02-05 20:15:22 - ERROR - stderr - 46%|████▌ | 10353/22434 [10:07:42<8:24:03, 2.50s/it] +2025-02-05 20:15:25 - ERROR - stderr - 46%|████▌ | 10354/22434 [10:07:45<8:22:09, 2.49s/it] +2025-02-05 20:15:25 - ERROR - stderr - +2025-02-05 20:15:25 - ERROR - stderr - +2025-02-05 20:15:25 - INFO - stdout - {'loss': 0.7386, 'grad_norm': 1.3124704360961914, 'learning_rate': 1.1723842013591044e-05, 'epoch': 1.38} +2025-02-05 20:15:25 - ERROR - stderr - 46%|████▌ | 10354/22434 [10:07:45<8:22:09, 2.49s/it] +2025-02-05 20:15:27 - ERROR - stderr - 46%|████▌ | 10355/22434 [10:07:47<8:21:35, 2.49s/it] +2025-02-05 20:15:27 - ERROR - stderr - +2025-02-05 20:15:27 - ERROR - stderr - +2025-02-05 20:15:27 - INFO - stdout - {'loss': 0.7168, 'grad_norm': 1.1542627811431885, 'learning_rate': 1.1722419862196369e-05, 'epoch': 1.38} +2025-02-05 20:15:27 - ERROR - stderr - 46%|████▌ | 10355/22434 [10:07:47<8:21:35, 2.49s/it] +2025-02-05 20:15:30 - ERROR - stderr - 46%|████▌ | 10356/22434 [10:07:50<8:29:03, 2.53s/it] +2025-02-05 20:15:30 - ERROR - stderr - +2025-02-05 20:15:30 - ERROR - stderr - +2025-02-05 20:15:30 - INFO - stdout - {'loss': 0.6703, 'grad_norm': 1.0263744592666626, 'learning_rate': 1.1720997674899496e-05, 'epoch': 1.38} +2025-02-05 20:15:30 - ERROR - stderr - 46%|████▌ | 10356/22434 [10:07:50<8:29:03, 2.53s/it] +2025-02-05 20:15:32 - ERROR - stderr - 46%|████▌ | 10357/22434 [10:07:52<8:28:00, 
2.52s/it] +2025-02-05 20:15:33 - ERROR - stderr - +2025-02-05 20:15:33 - ERROR - stderr - +2025-02-05 20:15:33 - INFO - stdout - {'loss': 0.6586, 'grad_norm': 1.1206023693084717, 'learning_rate': 1.171957545173008e-05, 'epoch': 1.38} +2025-02-05 20:15:33 - ERROR - stderr - 46%|████▌ | 10357/22434 [10:07:52<8:28:00, 2.52s/it] +2025-02-05 20:15:35 - ERROR - stderr - 46%|████▌ | 10358/22434 [10:07:55<8:23:18, 2.50s/it] +2025-02-05 20:15:35 - ERROR - stderr - +2025-02-05 20:15:35 - ERROR - stderr - +2025-02-05 20:15:35 - INFO - stdout - {'loss': 0.7298, 'grad_norm': 1.2792408466339111, 'learning_rate': 1.1718153192717753e-05, 'epoch': 1.39} +2025-02-05 20:15:35 - ERROR - stderr - 46%|████▌ | 10358/22434 [10:07:55<8:23:18, 2.50s/it] +2025-02-05 20:15:38 - ERROR - stderr - 46%|████▌ | 10359/22434 [10:07:57<8:30:19, 2.54s/it] +2025-02-05 20:15:38 - ERROR - stderr - +2025-02-05 20:15:38 - ERROR - stderr - +2025-02-05 20:15:38 - INFO - stdout - {'loss': 0.7213, 'grad_norm': 1.2007086277008057, 'learning_rate': 1.171673089789217e-05, 'epoch': 1.39} +2025-02-05 20:15:38 - ERROR - stderr - 46%|████▌ | 10359/22434 [10:07:57<8:30:19, 2.54s/it] +2025-02-05 20:15:40 - ERROR - stderr - 46%|████▌ | 10360/22434 [10:08:00<8:24:07, 2.51s/it] +2025-02-05 20:15:40 - ERROR - stderr - +2025-02-05 20:15:40 - ERROR - stderr - +2025-02-05 20:15:40 - INFO - stdout - {'loss': 0.7705, 'grad_norm': 1.1382535696029663, 'learning_rate': 1.1715308567282972e-05, 'epoch': 1.39} +2025-02-05 20:15:40 - ERROR - stderr - 46%|████▌ | 10360/22434 [10:08:00<8:24:07, 2.51s/it] +2025-02-05 20:15:42 - ERROR - stderr - 46%|████▌ | 10361/22434 [10:08:02<8:21:32, 2.49s/it] +2025-02-05 20:15:42 - ERROR - stderr - +2025-02-05 20:15:42 - ERROR - stderr - +2025-02-05 20:15:42 - INFO - stdout - {'loss': 0.7531, 'grad_norm': 1.2086182832717896, 'learning_rate': 1.1713886200919811e-05, 'epoch': 1.39} +2025-02-05 20:15:42 - ERROR - stderr - 46%|████▌ | 10361/22434 [10:08:02<8:21:32, 2.49s/it] +2025-02-05 20:15:45 - ERROR 
- stderr - 46%|████▌ | 10362/22434 [10:08:05<8:31:38, 2.54s/it] +2025-02-05 20:15:45 - ERROR - stderr - +2025-02-05 20:15:45 - ERROR - stderr - +2025-02-05 20:15:45 - INFO - stdout - {'loss': 0.758, 'grad_norm': 1.2057385444641113, 'learning_rate': 1.1712463798832335e-05, 'epoch': 1.39} +2025-02-05 20:15:45 - ERROR - stderr - 46%|████▌ | 10362/22434 [10:08:05<8:31:38, 2.54s/it] +2025-02-05 20:15:48 - ERROR - stderr - 46%|████▌ | 10363/22434 [10:08:07<8:29:42, 2.53s/it] +2025-02-05 20:15:48 - ERROR - stderr - +2025-02-05 20:15:48 - ERROR - stderr - +2025-02-05 20:15:48 - INFO - stdout - {'loss': 0.695, 'grad_norm': 1.204099416732788, 'learning_rate': 1.1711041361050183e-05, 'epoch': 1.39} +2025-02-05 20:15:48 - ERROR - stderr - 46%|████▌ | 10363/22434 [10:08:07<8:29:42, 2.53s/it] +2025-02-05 20:15:50 - ERROR - stderr - 46%|████▌ | 10364/22434 [10:08:10<8:24:17, 2.51s/it] +2025-02-05 20:15:50 - ERROR - stderr - +2025-02-05 20:15:50 - ERROR - stderr - +2025-02-05 20:15:50 - INFO - stdout - {'loss': 0.7131, 'grad_norm': 1.2312778234481812, 'learning_rate': 1.1709618887603013e-05, 'epoch': 1.39} +2025-02-05 20:15:50 - ERROR - stderr - 46%|████▌ | 10364/22434 [10:08:10<8:24:17, 2.51s/it] +2025-02-05 20:15:53 - ERROR - stderr - 46%|████▌ | 10365/22434 [10:08:12<8:22:48, 2.50s/it] +2025-02-05 20:15:53 - ERROR - stderr - +2025-02-05 20:15:53 - ERROR - stderr - +2025-02-05 20:15:53 - INFO - stdout - {'loss': 0.8469, 'grad_norm': 1.306699275970459, 'learning_rate': 1.1708196378520476e-05, 'epoch': 1.39} +2025-02-05 20:15:53 - ERROR - stderr - 46%|████▌ | 10365/22434 [10:08:12<8:22:48, 2.50s/it] +2025-02-05 20:15:55 - ERROR - stderr - 46%|████▌ | 10366/22434 [10:08:15<8:24:00, 2.51s/it] +2025-02-05 20:15:55 - ERROR - stderr - +2025-02-05 20:15:55 - ERROR - stderr - +2025-02-05 20:15:55 - INFO - stdout - {'loss': 0.7263, 'grad_norm': 1.324061393737793, 'learning_rate': 1.1706773833832214e-05, 'epoch': 1.39} +2025-02-05 20:15:55 - ERROR - stderr - 46%|████▌ | 10366/22434 
[10:08:15<8:24:00, 2.51s/it] +2025-02-05 20:15:58 - ERROR - stderr - 46%|████▌ | 10367/22434 [10:08:17<8:20:50, 2.49s/it] +2025-02-05 20:15:58 - ERROR - stderr - +2025-02-05 20:15:58 - ERROR - stderr - +2025-02-05 20:15:58 - INFO - stdout - {'loss': 0.6619, 'grad_norm': 1.1715503931045532, 'learning_rate': 1.1705351253567892e-05, 'epoch': 1.39} +2025-02-05 20:15:58 - ERROR - stderr - 46%|████▌ | 10367/22434 [10:08:17<8:20:50, 2.49s/it] +2025-02-05 20:16:00 - ERROR - stderr - 46%|████▌ | 10368/22434 [10:08:20<8:24:01, 2.51s/it] +2025-02-05 20:16:00 - ERROR - stderr - +2025-02-05 20:16:00 - ERROR - stderr - +2025-02-05 20:16:00 - INFO - stdout - {'loss': 0.6123, 'grad_norm': 1.2502814531326294, 'learning_rate': 1.1703928637757152e-05, 'epoch': 1.39} +2025-02-05 20:16:00 - ERROR - stderr - 46%|████▌ | 10368/22434 [10:08:20<8:24:01, 2.51s/it] +2025-02-05 20:16:02 - ERROR - stderr - 46%|████▌ | 10369/22434 [10:08:22<8:19:05, 2.48s/it] +2025-02-05 20:16:03 - ERROR - stderr - +2025-02-05 20:16:03 - ERROR - stderr - +2025-02-05 20:16:03 - INFO - stdout - {'loss': 0.7197, 'grad_norm': 1.3710945844650269, 'learning_rate': 1.1702505986429648e-05, 'epoch': 1.39} +2025-02-05 20:16:03 - ERROR - stderr - 46%|████▌ | 10369/22434 [10:08:22<8:19:05, 2.48s/it] +2025-02-05 20:16:05 - ERROR - stderr - 46%|████▌ | 10370/22434 [10:08:25<8:17:40, 2.48s/it] +2025-02-05 20:16:05 - ERROR - stderr - +2025-02-05 20:16:05 - ERROR - stderr - +2025-02-05 20:16:05 - INFO - stdout - {'loss': 0.7071, 'grad_norm': 1.2241703271865845, 'learning_rate': 1.170108329961504e-05, 'epoch': 1.39} +2025-02-05 20:16:05 - ERROR - stderr - 46%|████▌ | 10370/22434 [10:08:25<8:17:40, 2.48s/it] +2025-02-05 20:16:07 - ERROR - stderr - 46%|████▌ | 10371/22434 [10:08:27<8:18:20, 2.48s/it] +2025-02-05 20:16:07 - ERROR - stderr - +2025-02-05 20:16:07 - ERROR - stderr - +2025-02-05 20:16:07 - INFO - stdout - {'loss': 0.6809, 'grad_norm': 1.2022857666015625, 'learning_rate': 1.1699660577342974e-05, 'epoch': 1.39} 
+2025-02-05 20:16:07 - ERROR - stderr - 46%|████▌ | 10371/22434 [10:08:27<8:18:20, 2.48s/it] +2025-02-05 20:16:10 - ERROR - stderr - 46%|████▌ | 10372/22434 [10:08:30<8:21:40, 2.50s/it] +2025-02-05 20:16:10 - ERROR - stderr - +2025-02-05 20:16:10 - ERROR - stderr - +2025-02-05 20:16:10 - INFO - stdout - {'loss': 0.718, 'grad_norm': 1.2323219776153564, 'learning_rate': 1.1698237819643112e-05, 'epoch': 1.39} +2025-02-05 20:16:10 - ERROR - stderr - 46%|████▌ | 10372/22434 [10:08:30<8:21:40, 2.50s/it] +2025-02-05 20:16:12 - ERROR - stderr - 46%|████▌ | 10373/22434 [10:08:32<8:20:16, 2.49s/it] +2025-02-05 20:16:12 - ERROR - stderr - +2025-02-05 20:16:12 - ERROR - stderr - +2025-02-05 20:16:12 - INFO - stdout - {'loss': 0.6962, 'grad_norm': 1.1654260158538818, 'learning_rate': 1.1696815026545107e-05, 'epoch': 1.39} +2025-02-05 20:16:12 - ERROR - stderr - 46%|████▌ | 10373/22434 [10:08:32<8:20:16, 2.49s/it] +2025-02-05 20:16:15 - ERROR - stderr - 46%|████▌ | 10374/22434 [10:08:35<8:15:12, 2.46s/it] +2025-02-05 20:16:15 - ERROR - stderr - +2025-02-05 20:16:15 - ERROR - stderr - +2025-02-05 20:16:15 - INFO - stdout - {'loss': 0.6485, 'grad_norm': 1.1183232069015503, 'learning_rate': 1.1695392198078617e-05, 'epoch': 1.39} +2025-02-05 20:16:15 - ERROR - stderr - 46%|████▌ | 10374/22434 [10:08:35<8:15:12, 2.46s/it] +2025-02-05 20:16:17 - ERROR - stderr - 46%|████▌ | 10375/22434 [10:08:37<8:18:44, 2.48s/it] +2025-02-05 20:16:17 - ERROR - stderr - +2025-02-05 20:16:17 - ERROR - stderr - +2025-02-05 20:16:17 - INFO - stdout - {'loss': 0.6248, 'grad_norm': 1.1446477174758911, 'learning_rate': 1.1693969334273301e-05, 'epoch': 1.39} +2025-02-05 20:16:17 - ERROR - stderr - 46%|████▌ | 10375/22434 [10:08:37<8:18:44, 2.48s/it] +2025-02-05 20:16:20 - ERROR - stderr - 46%|████▋ | 10376/22434 [10:08:40<8:24:03, 2.51s/it] +2025-02-05 20:16:20 - ERROR - stderr - +2025-02-05 20:16:20 - ERROR - stderr - +2025-02-05 20:16:20 - INFO - stdout - {'loss': 0.751, 'grad_norm': 1.2017310857772827, 
'learning_rate': 1.1692546435158814e-05, 'epoch': 1.39} +2025-02-05 20:16:20 - ERROR - stderr - 46%|████▋ | 10376/22434 [10:08:40<8:24:03, 2.51s/it] +2025-02-05 20:16:23 - ERROR - stderr - 46%|████▋ | 10377/22434 [10:08:42<8:27:18, 2.52s/it] +2025-02-05 20:16:23 - ERROR - stderr - +2025-02-05 20:16:23 - ERROR - stderr - +2025-02-05 20:16:23 - INFO - stdout - {'loss': 0.7715, 'grad_norm': 1.3004542589187622, 'learning_rate': 1.1691123500764813e-05, 'epoch': 1.39} +2025-02-05 20:16:23 - ERROR - stderr - 46%|████▋ | 10377/22434 [10:08:42<8:27:18, 2.52s/it] +2025-02-05 20:16:25 - ERROR - stderr - 46%|████▋ | 10378/22434 [10:08:45<8:26:05, 2.52s/it] +2025-02-05 20:16:25 - ERROR - stderr - +2025-02-05 20:16:25 - ERROR - stderr - +2025-02-05 20:16:25 - INFO - stdout - {'loss': 0.7635, 'grad_norm': 1.2739020586013794, 'learning_rate': 1.1689700531120965e-05, 'epoch': 1.39} +2025-02-05 20:16:25 - ERROR - stderr - 46%|████▋ | 10378/22434 [10:08:45<8:26:05, 2.52s/it] +2025-02-05 20:16:27 - ERROR - stderr - 46%|████▋ | 10379/22434 [10:08:47<8:21:25, 2.50s/it] +2025-02-05 20:16:27 - ERROR - stderr - +2025-02-05 20:16:27 - ERROR - stderr - +2025-02-05 20:16:27 - INFO - stdout - {'loss': 0.7797, 'grad_norm': 1.2735795974731445, 'learning_rate': 1.1688277526256923e-05, 'epoch': 1.39} +2025-02-05 20:16:27 - ERROR - stderr - 46%|████▋ | 10379/22434 [10:08:47<8:21:25, 2.50s/it] +2025-02-05 20:16:30 - ERROR - stderr - 46%|████▋ | 10380/22434 [10:08:50<8:25:04, 2.51s/it] +2025-02-05 20:16:30 - ERROR - stderr - +2025-02-05 20:16:30 - ERROR - stderr - +2025-02-05 20:16:30 - INFO - stdout - {'loss': 0.735, 'grad_norm': 1.2582001686096191, 'learning_rate': 1.1686854486202352e-05, 'epoch': 1.39} +2025-02-05 20:16:30 - ERROR - stderr - 46%|████▋ | 10380/22434 [10:08:50<8:25:04, 2.51s/it] +2025-02-05 20:16:32 - ERROR - stderr - 46%|████▋ | 10381/22434 [10:08:52<8:21:10, 2.49s/it] +2025-02-05 20:16:33 - ERROR - stderr - +2025-02-05 20:16:33 - ERROR - stderr - +2025-02-05 20:16:33 - INFO - 
stdout - {'loss': 0.639, 'grad_norm': 1.1086448431015015, 'learning_rate': 1.1685431410986913e-05, 'epoch': 1.39} +2025-02-05 20:16:33 - ERROR - stderr - 46%|████▋ | 10381/22434 [10:08:52<8:21:10, 2.49s/it] +2025-02-05 20:16:35 - ERROR - stderr - 46%|████▋ | 10382/22434 [10:08:55<8:21:50, 2.50s/it] +2025-02-05 20:16:35 - ERROR - stderr - +2025-02-05 20:16:35 - ERROR - stderr - +2025-02-05 20:16:35 - INFO - stdout - {'loss': 0.7643, 'grad_norm': 1.215226173400879, 'learning_rate': 1.168400830064027e-05, 'epoch': 1.39} +2025-02-05 20:16:35 - ERROR - stderr - 46%|████▋ | 10382/22434 [10:08:55<8:21:50, 2.50s/it] +2025-02-05 20:16:37 - ERROR - stderr - 46%|████▋ | 10383/22434 [10:08:57<8:20:53, 2.49s/it] +2025-02-05 20:16:37 - ERROR - stderr - +2025-02-05 20:16:37 - ERROR - stderr - +2025-02-05 20:16:37 - INFO - stdout - {'loss': 0.724, 'grad_norm': 1.1814804077148438, 'learning_rate': 1.168258515519209e-05, 'epoch': 1.39} +2025-02-05 20:16:37 - ERROR - stderr - 46%|████▋ | 10383/22434 [10:08:57<8:20:53, 2.49s/it] +2025-02-05 20:16:40 - ERROR - stderr - 46%|████▋ | 10384/22434 [10:09:00<8:25:57, 2.52s/it] +2025-02-05 20:16:40 - ERROR - stderr - +2025-02-05 20:16:40 - ERROR - stderr - +2025-02-05 20:16:40 - INFO - stdout - {'loss': 0.7121, 'grad_norm': 1.2276791334152222, 'learning_rate': 1.1681161974672026e-05, 'epoch': 1.39} +2025-02-05 20:16:40 - ERROR - stderr - 46%|████▋ | 10384/22434 [10:09:00<8:25:57, 2.52s/it] +2025-02-05 20:16:43 - ERROR - stderr - 46%|████▋ | 10385/22434 [10:09:02<8:27:21, 2.53s/it] +2025-02-05 20:16:43 - ERROR - stderr - +2025-02-05 20:16:43 - ERROR - stderr - +2025-02-05 20:16:43 - INFO - stdout - {'loss': 0.6977, 'grad_norm': 1.1327016353607178, 'learning_rate': 1.1679738759109748e-05, 'epoch': 1.39} +2025-02-05 20:16:43 - ERROR - stderr - 46%|████▋ | 10385/22434 [10:09:02<8:27:21, 2.53s/it] +2025-02-05 20:16:45 - ERROR - stderr - 46%|████▋ | 10386/22434 [10:09:05<8:26:20, 2.52s/it] +2025-02-05 20:16:45 - ERROR - stderr - +2025-02-05 
20:16:45 - ERROR - stderr - +2025-02-05 20:16:45 - INFO - stdout - {'loss': 0.6898, 'grad_norm': 1.219773769378662, 'learning_rate': 1.1678315508534928e-05, 'epoch': 1.39} +2025-02-05 20:16:45 - ERROR - stderr - 46%|████▋ | 10386/22434 [10:09:05<8:26:20, 2.52s/it] +2025-02-05 20:16:48 - ERROR - stderr - 46%|████▋ | 10387/22434 [10:09:07<8:28:44, 2.53s/it] +2025-02-05 20:16:48 - ERROR - stderr - +2025-02-05 20:16:48 - ERROR - stderr - +2025-02-05 20:16:48 - INFO - stdout - {'loss': 0.7137, 'grad_norm': 1.2139183282852173, 'learning_rate': 1.1676892222977227e-05, 'epoch': 1.39} +2025-02-05 20:16:48 - ERROR - stderr - 46%|████▋ | 10387/22434 [10:09:07<8:28:44, 2.53s/it] +2025-02-05 20:16:50 - ERROR - stderr - 46%|████▋ | 10388/22434 [10:09:10<8:24:12, 2.51s/it] +2025-02-05 20:16:50 - ERROR - stderr - +2025-02-05 20:16:50 - ERROR - stderr - +2025-02-05 20:16:50 - INFO - stdout - {'loss': 0.7419, 'grad_norm': 1.1520743370056152, 'learning_rate': 1.1675468902466311e-05, 'epoch': 1.39} +2025-02-05 20:16:50 - ERROR - stderr - 46%|████▋ | 10388/22434 [10:09:10<8:24:12, 2.51s/it] +2025-02-05 20:16:53 - ERROR - stderr - 46%|████▋ | 10389/22434 [10:09:12<8:24:04, 2.51s/it] +2025-02-05 20:16:53 - ERROR - stderr - +2025-02-05 20:16:53 - ERROR - stderr - +2025-02-05 20:16:53 - INFO - stdout - {'loss': 0.696, 'grad_norm': 1.0907866954803467, 'learning_rate': 1.167404554703185e-05, 'epoch': 1.39} +2025-02-05 20:16:53 - ERROR - stderr - 46%|████▋ | 10389/22434 [10:09:12<8:24:04, 2.51s/it] +2025-02-05 20:16:55 - ERROR - stderr - 46%|████▋ | 10390/22434 [10:09:15<8:19:07, 2.49s/it] +2025-02-05 20:16:55 - ERROR - stderr - +2025-02-05 20:16:55 - ERROR - stderr - +2025-02-05 20:16:55 - INFO - stdout - {'loss': 0.6937, 'grad_norm': 1.1469650268554688, 'learning_rate': 1.1672622156703508e-05, 'epoch': 1.39} +2025-02-05 20:16:55 - ERROR - stderr - 46%|████▋ | 10390/22434 [10:09:15<8:19:07, 2.49s/it] +2025-02-05 20:16:58 - ERROR - stderr - 46%|████▋ | 10391/22434 [10:09:17<8:20:45, 2.49s/it] 
+2025-02-05 20:16:58 - ERROR - stderr - +2025-02-05 20:16:58 - ERROR - stderr - +2025-02-05 20:16:58 - INFO - stdout - {'loss': 0.7037, 'grad_norm': 1.1694732904434204, 'learning_rate': 1.167119873151096e-05, 'epoch': 1.39} +2025-02-05 20:16:58 - ERROR - stderr - 46%|████▋ | 10391/22434 [10:09:17<8:20:45, 2.49s/it] +2025-02-05 20:17:00 - ERROR - stderr - 46%|████▋ | 10392/22434 [10:09:20<8:20:23, 2.49s/it] +2025-02-05 20:17:00 - ERROR - stderr - +2025-02-05 20:17:00 - ERROR - stderr - +2025-02-05 20:17:00 - INFO - stdout - {'loss': 0.7302, 'grad_norm': 1.2636549472808838, 'learning_rate': 1.1669775271483875e-05, 'epoch': 1.39} +2025-02-05 20:17:00 - ERROR - stderr - 46%|████▋ | 10392/22434 [10:09:20<8:20:23, 2.49s/it] +2025-02-05 20:17:03 - ERROR - stderr - 46%|████▋ | 10393/22434 [10:09:22<8:22:28, 2.50s/it] +2025-02-05 20:17:03 - ERROR - stderr - +2025-02-05 20:17:03 - ERROR - stderr - +2025-02-05 20:17:03 - INFO - stdout - {'loss': 0.7414, 'grad_norm': 1.1828047037124634, 'learning_rate': 1.1668351776651918e-05, 'epoch': 1.39} +2025-02-05 20:17:03 - ERROR - stderr - 46%|████▋ | 10393/22434 [10:09:22<8:22:28, 2.50s/it] +2025-02-05 20:17:05 - ERROR - stderr - 46%|████▋ | 10394/22434 [10:09:25<8:19:12, 2.49s/it] +2025-02-05 20:17:05 - ERROR - stderr - +2025-02-05 20:17:05 - ERROR - stderr - +2025-02-05 20:17:05 - INFO - stdout - {'loss': 0.7062, 'grad_norm': 1.1118900775909424, 'learning_rate': 1.1666928247044769e-05, 'epoch': 1.39} +2025-02-05 20:17:05 - ERROR - stderr - 46%|████▋ | 10394/22434 [10:09:25<8:19:12, 2.49s/it] +2025-02-05 20:17:07 - ERROR - stderr - 46%|████▋ | 10395/22434 [10:09:27<8:15:08, 2.47s/it] +2025-02-05 20:17:07 - ERROR - stderr - +2025-02-05 20:17:07 - ERROR - stderr - +2025-02-05 20:17:07 - INFO - stdout - {'loss': 0.7323, 'grad_norm': 1.1836761236190796, 'learning_rate': 1.1665504682692096e-05, 'epoch': 1.39} +2025-02-05 20:17:07 - ERROR - stderr - 46%|████▋ | 10395/22434 [10:09:27<8:15:08, 2.47s/it] +2025-02-05 20:17:10 - ERROR - stderr 
- 46%|████▋ | 10396/22434 [10:09:30<8:13:46, 2.46s/it] +2025-02-05 20:17:10 - ERROR - stderr - +2025-02-05 20:17:10 - ERROR - stderr - +2025-02-05 20:17:10 - INFO - stdout - {'loss': 0.694, 'grad_norm': 1.2827930450439453, 'learning_rate': 1.1664081083623569e-05, 'epoch': 1.39} +2025-02-05 20:17:10 - ERROR - stderr - 46%|████▋ | 10396/22434 [10:09:30<8:13:46, 2.46s/it] +2025-02-05 20:17:12 - ERROR - stderr - 46%|████▋ | 10397/22434 [10:09:32<8:10:25, 2.44s/it] +2025-02-05 20:17:12 - ERROR - stderr - +2025-02-05 20:17:12 - ERROR - stderr - +2025-02-05 20:17:12 - INFO - stdout - {'loss': 0.6873, 'grad_norm': 1.0846987962722778, 'learning_rate': 1.1662657449868865e-05, 'epoch': 1.39} +2025-02-05 20:17:12 - ERROR - stderr - 46%|████▋ | 10397/22434 [10:09:32<8:10:25, 2.44s/it] +2025-02-05 20:17:15 - ERROR - stderr - 46%|████▋ | 10398/22434 [10:09:35<8:12:13, 2.45s/it] +2025-02-05 20:17:15 - ERROR - stderr - +2025-02-05 20:17:15 - ERROR - stderr - +2025-02-05 20:17:15 - INFO - stdout - {'loss': 0.6278, 'grad_norm': 1.0482357740402222, 'learning_rate': 1.1661233781457655e-05, 'epoch': 1.39} +2025-02-05 20:17:15 - ERROR - stderr - 46%|████▋ | 10398/22434 [10:09:35<8:12:13, 2.45s/it] +2025-02-05 20:17:17 - ERROR - stderr - 46%|████▋ | 10399/22434 [10:09:37<8:14:48, 2.47s/it] +2025-02-05 20:17:17 - ERROR - stderr - +2025-02-05 20:17:17 - ERROR - stderr - +2025-02-05 20:17:17 - INFO - stdout - {'loss': 0.8333, 'grad_norm': 1.2821825742721558, 'learning_rate': 1.165981007841962e-05, 'epoch': 1.39} +2025-02-05 20:17:17 - ERROR - stderr - 46%|████▋ | 10399/22434 [10:09:37<8:14:48, 2.47s/it] +2025-02-05 20:17:20 - ERROR - stderr - 46%|████▋ | 10400/22434 [10:09:39<8:12:10, 2.45s/it] +2025-02-05 20:17:20 - ERROR - stderr - +2025-02-05 20:17:20 - ERROR - stderr - +2025-02-05 20:17:20 - INFO - stdout - {'loss': 0.7476, 'grad_norm': 1.354382872581482, 'learning_rate': 1.1658386340784431e-05, 'epoch': 1.39} +2025-02-05 20:17:20 - ERROR - stderr - 46%|████▋ | 10400/22434 
[10:09:40<8:12:10, 2.45s/it] +2025-02-05 20:17:22 - ERROR - stderr - 46%|████▋ | 10401/22434 [10:09:42<8:09:37, 2.44s/it] +2025-02-05 20:17:22 - ERROR - stderr - +2025-02-05 20:17:22 - ERROR - stderr - +2025-02-05 20:17:22 - INFO - stdout - {'loss': 0.6552, 'grad_norm': 1.040104866027832, 'learning_rate': 1.1656962568581767e-05, 'epoch': 1.39} +2025-02-05 20:17:22 - ERROR - stderr - 46%|████▋ | 10401/22434 [10:09:42<8:09:37, 2.44s/it] +2025-02-05 20:17:25 - ERROR - stderr - 46%|████▋ | 10402/22434 [10:09:44<8:15:54, 2.47s/it] +2025-02-05 20:17:25 - ERROR - stderr - +2025-02-05 20:17:25 - ERROR - stderr - +2025-02-05 20:17:25 - INFO - stdout - {'loss': 0.7513, 'grad_norm': 1.144014596939087, 'learning_rate': 1.16555387618413e-05, 'epoch': 1.39} +2025-02-05 20:17:25 - ERROR - stderr - 46%|████▋ | 10402/22434 [10:09:44<8:15:54, 2.47s/it] +2025-02-05 20:17:27 - ERROR - stderr - 46%|████▋ | 10403/22434 [10:09:47<8:18:31, 2.49s/it] +2025-02-05 20:17:27 - ERROR - stderr - +2025-02-05 20:17:27 - ERROR - stderr - +2025-02-05 20:17:27 - INFO - stdout - {'loss': 0.7119, 'grad_norm': 1.3031235933303833, 'learning_rate': 1.1654114920592715e-05, 'epoch': 1.39} +2025-02-05 20:17:27 - ERROR - stderr - 46%|████▋ | 10403/22434 [10:09:47<8:18:31, 2.49s/it] +2025-02-05 20:17:30 - ERROR - stderr - 46%|████▋ | 10404/22434 [10:09:49<8:15:17, 2.47s/it] +2025-02-05 20:17:30 - ERROR - stderr - +2025-02-05 20:17:30 - ERROR - stderr - +2025-02-05 20:17:30 - INFO - stdout - {'loss': 0.614, 'grad_norm': 1.069855809211731, 'learning_rate': 1.1652691044865687e-05, 'epoch': 1.39} +2025-02-05 20:17:30 - ERROR - stderr - 46%|████▋ | 10404/22434 [10:09:49<8:15:17, 2.47s/it] +2025-02-05 20:17:32 - ERROR - stderr - 46%|████▋ | 10405/22434 [10:09:52<8:27:53, 2.53s/it] +2025-02-05 20:17:32 - ERROR - stderr - +2025-02-05 20:17:32 - ERROR - stderr - +2025-02-05 20:17:32 - INFO - stdout - {'loss': 0.6868, 'grad_norm': 1.1170841455459595, 'learning_rate': 1.1651267134689895e-05, 'epoch': 1.39} +2025-02-05 
20:17:32 - ERROR - stderr - 46%|████▋ | 10405/22434 [10:09:52<8:27:53, 2.53s/it] +2025-02-05 20:17:35 - ERROR - stderr - 46%|████▋ | 10406/22434 [10:09:55<8:25:41, 2.52s/it] +2025-02-05 20:17:35 - ERROR - stderr - +2025-02-05 20:17:35 - ERROR - stderr - +2025-02-05 20:17:35 - INFO - stdout - {'loss': 0.7182, 'grad_norm': 1.2767812013626099, 'learning_rate': 1.1649843190095018e-05, 'epoch': 1.39} +2025-02-05 20:17:35 - ERROR - stderr - 46%|████▋ | 10406/22434 [10:09:55<8:25:41, 2.52s/it] +2025-02-05 20:17:37 - ERROR - stderr - 46%|████▋ | 10407/22434 [10:09:57<8:20:57, 2.50s/it] +2025-02-05 20:17:37 - ERROR - stderr - +2025-02-05 20:17:37 - ERROR - stderr - +2025-02-05 20:17:37 - INFO - stdout - {'loss': 0.5809, 'grad_norm': 1.0920031070709229, 'learning_rate': 1.1648419211110742e-05, 'epoch': 1.39} +2025-02-05 20:17:37 - ERROR - stderr - 46%|████▋ | 10407/22434 [10:09:57<8:20:57, 2.50s/it] +2025-02-05 20:17:40 - ERROR - stderr - 46%|████▋ | 10408/22434 [10:10:00<8:23:09, 2.51s/it] +2025-02-05 20:17:40 - ERROR - stderr - +2025-02-05 20:17:40 - ERROR - stderr - +2025-02-05 20:17:40 - INFO - stdout - {'loss': 0.7666, 'grad_norm': 1.2283834218978882, 'learning_rate': 1.1646995197766743e-05, 'epoch': 1.39} +2025-02-05 20:17:40 - ERROR - stderr - 46%|████▋ | 10408/22434 [10:10:00<8:23:09, 2.51s/it] +2025-02-05 20:17:42 - ERROR - stderr - 46%|████▋ | 10409/22434 [10:10:02<8:18:16, 2.49s/it] +2025-02-05 20:17:42 - ERROR - stderr - +2025-02-05 20:17:42 - ERROR - stderr - +2025-02-05 20:17:42 - INFO - stdout - {'loss': 0.7647, 'grad_norm': 1.1616506576538086, 'learning_rate': 1.1645571150092705e-05, 'epoch': 1.39} +2025-02-05 20:17:42 - ERROR - stderr - 46%|████▋ | 10409/22434 [10:10:02<8:18:16, 2.49s/it] +2025-02-05 20:17:45 - ERROR - stderr - 46%|████▋ | 10410/22434 [10:10:04<8:20:37, 2.50s/it] +2025-02-05 20:17:45 - ERROR - stderr - +2025-02-05 20:17:45 - ERROR - stderr - +2025-02-05 20:17:45 - INFO - stdout - {'loss': 0.7814, 'grad_norm': 1.1822274923324585, 
'learning_rate': 1.1644147068118313e-05, 'epoch': 1.39} +2025-02-05 20:17:45 - ERROR - stderr - 46%|████▋ | 10410/22434 [10:10:05<8:20:37, 2.50s/it] +2025-02-05 20:17:47 - ERROR - stderr - 46%|████▋ | 10411/22434 [10:10:07<8:17:52, 2.48s/it] +2025-02-05 20:17:47 - ERROR - stderr - +2025-02-05 20:17:47 - ERROR - stderr - +2025-02-05 20:17:47 - INFO - stdout - {'loss': 0.8343, 'grad_norm': 1.3648608922958374, 'learning_rate': 1.1642722951873244e-05, 'epoch': 1.39} +2025-02-05 20:17:47 - ERROR - stderr - 46%|████▋ | 10411/22434 [10:10:07<8:17:52, 2.48s/it] +2025-02-05 20:17:50 - ERROR - stderr - 46%|████▋ | 10412/22434 [10:10:10<8:34:18, 2.57s/it] +2025-02-05 20:17:50 - ERROR - stderr - +2025-02-05 20:17:50 - ERROR - stderr - +2025-02-05 20:17:50 - INFO - stdout - {'loss': 0.7261, 'grad_norm': 1.2371116876602173, 'learning_rate': 1.1641298801387191e-05, 'epoch': 1.39} +2025-02-05 20:17:50 - ERROR - stderr - 46%|████▋ | 10412/22434 [10:10:10<8:34:18, 2.57s/it] +2025-02-05 20:17:52 - ERROR - stderr - 46%|████▋ | 10413/22434 [10:10:12<8:24:34, 2.52s/it] +2025-02-05 20:17:52 - ERROR - stderr - +2025-02-05 20:17:52 - ERROR - stderr - +2025-02-05 20:17:52 - INFO - stdout - {'loss': 0.7393, 'grad_norm': 1.3450381755828857, 'learning_rate': 1.1639874616689832e-05, 'epoch': 1.39} +2025-02-05 20:17:52 - ERROR - stderr - 46%|████▋ | 10413/22434 [10:10:12<8:24:34, 2.52s/it] +2025-02-05 20:17:55 - ERROR - stderr - 46%|████▋ | 10414/22434 [10:10:15<8:21:18, 2.50s/it] +2025-02-05 20:17:55 - ERROR - stderr - +2025-02-05 20:17:55 - ERROR - stderr - +2025-02-05 20:17:55 - INFO - stdout - {'loss': 0.7624, 'grad_norm': 1.193926215171814, 'learning_rate': 1.1638450397810859e-05, 'epoch': 1.39} +2025-02-05 20:17:55 - ERROR - stderr - 46%|████▋ | 10414/22434 [10:10:15<8:21:18, 2.50s/it] +2025-02-05 20:17:57 - ERROR - stderr - 46%|████▋ | 10415/22434 [10:10:17<8:25:32, 2.52s/it] +2025-02-05 20:17:57 - ERROR - stderr - +2025-02-05 20:17:57 - ERROR - stderr - +2025-02-05 20:17:57 - INFO - 
stdout - {'loss': 0.7088, 'grad_norm': 1.2264595031738281, 'learning_rate': 1.1637026144779955e-05, 'epoch': 1.39} +2025-02-05 20:17:57 - ERROR - stderr - 46%|████▋ | 10415/22434 [10:10:17<8:25:32, 2.52s/it] +2025-02-05 20:18:00 - ERROR - stderr - 46%|████▋ | 10416/22434 [10:10:20<8:46:51, 2.63s/it] +2025-02-05 20:18:00 - ERROR - stderr - +2025-02-05 20:18:00 - ERROR - stderr - +2025-02-05 20:18:00 - INFO - stdout - {'loss': 0.687, 'grad_norm': 1.0404021739959717, 'learning_rate': 1.1635601857626806e-05, 'epoch': 1.39} +2025-02-05 20:18:00 - ERROR - stderr - 46%|████▋ | 10416/22434 [10:10:20<8:46:51, 2.63s/it] +2025-02-05 20:18:03 - ERROR - stderr - 46%|████▋ | 10417/22434 [10:10:22<8:34:12, 2.57s/it] +2025-02-05 20:18:03 - ERROR - stderr - +2025-02-05 20:18:03 - ERROR - stderr - +2025-02-05 20:18:03 - INFO - stdout - {'loss': 0.6787, 'grad_norm': 1.0510411262512207, 'learning_rate': 1.16341775363811e-05, 'epoch': 1.39} +2025-02-05 20:18:03 - ERROR - stderr - 46%|████▋ | 10417/22434 [10:10:22<8:34:12, 2.57s/it] +2025-02-05 20:18:05 - ERROR - stderr - 46%|████▋ | 10418/22434 [10:10:25<8:38:00, 2.59s/it] +2025-02-05 20:18:05 - ERROR - stderr - +2025-02-05 20:18:05 - ERROR - stderr - +2025-02-05 20:18:05 - INFO - stdout - {'loss': 0.7888, 'grad_norm': 1.2562861442565918, 'learning_rate': 1.163275318107253e-05, 'epoch': 1.39} +2025-02-05 20:18:05 - ERROR - stderr - 46%|████▋ | 10418/22434 [10:10:25<8:38:00, 2.59s/it] +2025-02-05 20:18:08 - ERROR - stderr - 46%|████▋ | 10419/22434 [10:10:27<8:27:05, 2.53s/it] +2025-02-05 20:18:08 - ERROR - stderr - +2025-02-05 20:18:08 - ERROR - stderr - +2025-02-05 20:18:08 - INFO - stdout - {'loss': 0.6849, 'grad_norm': 1.365065574645996, 'learning_rate': 1.1631328791730781e-05, 'epoch': 1.39} +2025-02-05 20:18:08 - ERROR - stderr - 46%|████▋ | 10419/22434 [10:10:28<8:27:05, 2.53s/it] +2025-02-05 20:18:10 - ERROR - stderr - 46%|████▋ | 10420/22434 [10:10:30<8:27:35, 2.53s/it] +2025-02-05 20:18:10 - ERROR - stderr - +2025-02-05 
20:18:10 - ERROR - stderr - +2025-02-05 20:18:10 - INFO - stdout - {'loss': 0.6181, 'grad_norm': 1.1708908081054688, 'learning_rate': 1.1629904368385545e-05, 'epoch': 1.39} +2025-02-05 20:18:10 - ERROR - stderr - 46%|████▋ | 10420/22434 [10:10:30<8:27:35, 2.53s/it] +2025-02-05 20:18:13 - ERROR - stderr - 46%|████▋ | 10421/22434 [10:10:32<8:22:10, 2.51s/it] +2025-02-05 20:18:13 - ERROR - stderr - +2025-02-05 20:18:13 - ERROR - stderr - +2025-02-05 20:18:13 - INFO - stdout - {'loss': 0.6669, 'grad_norm': 1.1322797536849976, 'learning_rate': 1.162847991106651e-05, 'epoch': 1.39} +2025-02-05 20:18:13 - ERROR - stderr - 46%|████▋ | 10421/22434 [10:10:33<8:22:10, 2.51s/it] +2025-02-05 20:18:15 - ERROR - stderr - 46%|████▋ | 10422/22434 [10:10:35<8:21:24, 2.50s/it] +2025-02-05 20:18:15 - ERROR - stderr - +2025-02-05 20:18:15 - ERROR - stderr - +2025-02-05 20:18:15 - INFO - stdout - {'loss': 0.6936, 'grad_norm': 1.2137596607208252, 'learning_rate': 1.1627055419803372e-05, 'epoch': 1.39} +2025-02-05 20:18:15 - ERROR - stderr - 46%|████▋ | 10422/22434 [10:10:35<8:21:24, 2.50s/it] +2025-02-05 20:18:18 - ERROR - stderr - 46%|████▋ | 10423/22434 [10:10:37<8:21:43, 2.51s/it] +2025-02-05 20:18:18 - ERROR - stderr - +2025-02-05 20:18:18 - ERROR - stderr - +2025-02-05 20:18:18 - INFO - stdout - {'loss': 0.8471, 'grad_norm': 1.450652837753296, 'learning_rate': 1.1625630894625819e-05, 'epoch': 1.39} +2025-02-05 20:18:18 - ERROR - stderr - 46%|████▋ | 10423/22434 [10:10:38<8:21:43, 2.51s/it] +2025-02-05 20:18:20 - ERROR - stderr - 46%|████▋ | 10424/22434 [10:10:40<8:25:25, 2.53s/it] +2025-02-05 20:18:20 - ERROR - stderr - +2025-02-05 20:18:20 - ERROR - stderr - +2025-02-05 20:18:20 - INFO - stdout - {'loss': 0.7039, 'grad_norm': 1.2751837968826294, 'learning_rate': 1.1624206335563547e-05, 'epoch': 1.39} +2025-02-05 20:18:20 - ERROR - stderr - 46%|████▋ | 10424/22434 [10:10:40<8:25:25, 2.53s/it] +2025-02-05 20:18:23 - ERROR - stderr - 46%|████▋ | 10425/22434 [10:10:43<8:23:10, 
2.51s/it] +2025-02-05 20:18:23 - ERROR - stderr - +2025-02-05 20:18:23 - ERROR - stderr - +2025-02-05 20:18:23 - INFO - stdout - {'loss': 0.7327, 'grad_norm': 1.1801493167877197, 'learning_rate': 1.1622781742646248e-05, 'epoch': 1.39} +2025-02-05 20:18:23 - ERROR - stderr - 46%|████▋ | 10425/22434 [10:10:43<8:23:10, 2.51s/it] +2025-02-05 20:18:25 - ERROR - stderr - 46%|████▋ | 10426/22434 [10:10:45<8:24:36, 2.52s/it] +2025-02-05 20:18:25 - ERROR - stderr - +2025-02-05 20:18:25 - ERROR - stderr - +2025-02-05 20:18:25 - INFO - stdout - {'loss': 0.7745, 'grad_norm': 1.1296132802963257, 'learning_rate': 1.1621357115903615e-05, 'epoch': 1.39} +2025-02-05 20:18:25 - ERROR - stderr - 46%|████▋ | 10426/22434 [10:10:45<8:24:36, 2.52s/it] +2025-02-05 20:18:28 - ERROR - stderr - 46%|████▋ | 10427/22434 [10:10:48<8:24:52, 2.52s/it] +2025-02-05 20:18:28 - ERROR - stderr - +2025-02-05 20:18:28 - ERROR - stderr - +2025-02-05 20:18:28 - INFO - stdout - {'loss': 0.7566, 'grad_norm': 1.184929370880127, 'learning_rate': 1.1619932455365346e-05, 'epoch': 1.39} +2025-02-05 20:18:28 - ERROR - stderr - 46%|████▋ | 10427/22434 [10:10:48<8:24:52, 2.52s/it] +2025-02-05 20:18:30 - ERROR - stderr - 46%|████▋ | 10428/22434 [10:10:50<8:24:07, 2.52s/it] +2025-02-05 20:18:30 - ERROR - stderr - +2025-02-05 20:18:30 - ERROR - stderr - +2025-02-05 20:18:30 - INFO - stdout - {'loss': 0.7303, 'grad_norm': 1.3677117824554443, 'learning_rate': 1.1618507761061136e-05, 'epoch': 1.39} +2025-02-05 20:18:30 - ERROR - stderr - 46%|████▋ | 10428/22434 [10:10:50<8:24:07, 2.52s/it] +2025-02-05 20:18:33 - ERROR - stderr - 46%|████▋ | 10429/22434 [10:10:53<8:20:40, 2.50s/it] +2025-02-05 20:18:33 - ERROR - stderr - +2025-02-05 20:18:33 - ERROR - stderr - +2025-02-05 20:18:33 - INFO - stdout - {'loss': 0.7569, 'grad_norm': 1.2666159868240356, 'learning_rate': 1.1617083033020678e-05, 'epoch': 1.39} +2025-02-05 20:18:33 - ERROR - stderr - 46%|████▋ | 10429/22434 [10:10:53<8:20:40, 2.50s/it] +2025-02-05 20:18:35 - ERROR 
- stderr - 46%|████▋ | 10430/22434 [10:10:55<8:22:46, 2.51s/it] +2025-02-05 20:18:35 - ERROR - stderr - +2025-02-05 20:18:35 - ERROR - stderr - +2025-02-05 20:18:35 - INFO - stdout - {'loss': 0.7069, 'grad_norm': 1.1321218013763428, 'learning_rate': 1.1615658271273668e-05, 'epoch': 1.39} +2025-02-05 20:18:35 - ERROR - stderr - 46%|████▋ | 10430/22434 [10:10:55<8:22:46, 2.51s/it] +2025-02-05 20:18:38 - ERROR - stderr - 46%|████▋ | 10431/22434 [10:10:58<8:20:32, 2.50s/it] +2025-02-05 20:18:38 - ERROR - stderr - +2025-02-05 20:18:38 - ERROR - stderr - +2025-02-05 20:18:38 - INFO - stdout - {'loss': 0.6681, 'grad_norm': 1.1485258340835571, 'learning_rate': 1.1614233475849815e-05, 'epoch': 1.39} +2025-02-05 20:18:38 - ERROR - stderr - 46%|████▋ | 10431/22434 [10:10:58<8:20:32, 2.50s/it] +2025-02-05 20:18:40 - ERROR - stderr - 47%|████▋ | 10432/22434 [10:11:00<8:19:46, 2.50s/it] +2025-02-05 20:18:40 - ERROR - stderr - +2025-02-05 20:18:40 - ERROR - stderr - +2025-02-05 20:18:40 - INFO - stdout - {'loss': 0.7367, 'grad_norm': 1.227471113204956, 'learning_rate': 1.1612808646778806e-05, 'epoch': 1.4} +2025-02-05 20:18:40 - ERROR - stderr - 47%|████▋ | 10432/22434 [10:11:00<8:19:46, 2.50s/it] +2025-02-05 20:18:43 - ERROR - stderr - 47%|████▋ | 10433/22434 [10:11:02<8:14:46, 2.47s/it] +2025-02-05 20:18:43 - ERROR - stderr - +2025-02-05 20:18:43 - ERROR - stderr - +2025-02-05 20:18:43 - INFO - stdout - {'loss': 0.6271, 'grad_norm': 1.1490963697433472, 'learning_rate': 1.1611383784090344e-05, 'epoch': 1.4} +2025-02-05 20:18:43 - ERROR - stderr - 47%|████▋ | 10433/22434 [10:11:03<8:14:46, 2.47s/it] +2025-02-05 20:18:45 - ERROR - stderr - 47%|████▋ | 10434/22434 [10:11:05<8:16:24, 2.48s/it] +2025-02-05 20:18:45 - ERROR - stderr - +2025-02-05 20:18:45 - ERROR - stderr - +2025-02-05 20:18:45 - INFO - stdout - {'loss': 0.6329, 'grad_norm': 1.0161354541778564, 'learning_rate': 1.160995888781413e-05, 'epoch': 1.4} +2025-02-05 20:18:45 - ERROR - stderr - 47%|████▋ | 10434/22434 
[10:11:05<8:16:24, 2.48s/it] +2025-02-05 20:18:48 - ERROR - stderr - 47%|████▋ | 10435/22434 [10:11:07<8:12:46, 2.46s/it] +2025-02-05 20:18:48 - ERROR - stderr - +2025-02-05 20:18:48 - ERROR - stderr - +2025-02-05 20:18:48 - INFO - stdout - {'loss': 0.7235, 'grad_norm': 1.1661683320999146, 'learning_rate': 1.1608533957979867e-05, 'epoch': 1.4} +2025-02-05 20:18:48 - ERROR - stderr - 47%|████▋ | 10435/22434 [10:11:07<8:12:46, 2.46s/it] +2025-02-05 20:18:50 - ERROR - stderr - 47%|████▋ | 10436/22434 [10:11:10<8:25:35, 2.53s/it] +2025-02-05 20:18:50 - ERROR - stderr - +2025-02-05 20:18:50 - ERROR - stderr - +2025-02-05 20:18:50 - INFO - stdout - {'loss': 0.7422, 'grad_norm': 1.1211094856262207, 'learning_rate': 1.1607108994617245e-05, 'epoch': 1.4} +2025-02-05 20:18:50 - ERROR - stderr - 47%|████▋ | 10436/22434 [10:11:10<8:25:35, 2.53s/it] +2025-02-05 20:18:53 - ERROR - stderr - 47%|████▋ | 10437/22434 [10:11:13<8:19:34, 2.50s/it] +2025-02-05 20:18:53 - ERROR - stderr - +2025-02-05 20:18:53 - ERROR - stderr - +2025-02-05 20:18:53 - INFO - stdout - {'loss': 0.712, 'grad_norm': 1.2231959104537964, 'learning_rate': 1.1605683997755977e-05, 'epoch': 1.4} +2025-02-05 20:18:53 - ERROR - stderr - 47%|████▋ | 10437/22434 [10:11:13<8:19:34, 2.50s/it] +2025-02-05 20:18:55 - ERROR - stderr - 47%|████▋ | 10438/22434 [10:11:15<8:19:33, 2.50s/it] +2025-02-05 20:18:55 - ERROR - stderr - +2025-02-05 20:18:55 - ERROR - stderr - +2025-02-05 20:18:55 - INFO - stdout - {'loss': 0.7079, 'grad_norm': 1.1116641759872437, 'learning_rate': 1.1604258967425764e-05, 'epoch': 1.4} +2025-02-05 20:18:55 - ERROR - stderr - 47%|████▋ | 10438/22434 [10:11:15<8:19:33, 2.50s/it] +2025-02-05 20:18:58 - ERROR - stderr - 47%|████▋ | 10439/22434 [10:11:17<8:15:28, 2.48s/it] +2025-02-05 20:18:58 - ERROR - stderr - +2025-02-05 20:18:58 - ERROR - stderr - +2025-02-05 20:18:58 - INFO - stdout - {'loss': 0.7265, 'grad_norm': 1.303560733795166, 'learning_rate': 1.1602833903656309e-05, 'epoch': 1.4} +2025-02-05 
20:18:58 - ERROR - stderr - 47%|████▋ | 10439/22434 [10:11:17<8:15:28, 2.48s/it] +2025-02-05 20:19:00 - ERROR - stderr - 47%|████▋ | 10440/22434 [10:11:20<8:12:39, 2.46s/it] +2025-02-05 20:19:00 - ERROR - stderr - +2025-02-05 20:19:00 - ERROR - stderr - +2025-02-05 20:19:00 - INFO - stdout - {'loss': 0.7229, 'grad_norm': 1.1787686347961426, 'learning_rate': 1.1601408806477312e-05, 'epoch': 1.4} +2025-02-05 20:19:00 - ERROR - stderr - 47%|████▋ | 10440/22434 [10:11:20<8:12:39, 2.46s/it] +2025-02-05 20:19:03 - ERROR - stderr - 47%|████▋ | 10441/22434 [10:11:22<8:12:44, 2.47s/it] +2025-02-05 20:19:03 - ERROR - stderr - +2025-02-05 20:19:03 - ERROR - stderr - +2025-02-05 20:19:03 - INFO - stdout - {'loss': 0.7649, 'grad_norm': 1.2804287672042847, 'learning_rate': 1.1599983675918483e-05, 'epoch': 1.4} +2025-02-05 20:19:03 - ERROR - stderr - 47%|████▋ | 10441/22434 [10:11:22<8:12:44, 2.47s/it] +2025-02-05 20:19:05 - ERROR - stderr - 47%|████▋ | 10442/22434 [10:11:25<8:13:53, 2.47s/it] +2025-02-05 20:19:05 - ERROR - stderr - +2025-02-05 20:19:05 - ERROR - stderr - +2025-02-05 20:19:05 - INFO - stdout - {'loss': 0.6726, 'grad_norm': 1.0167394876480103, 'learning_rate': 1.1598558512009524e-05, 'epoch': 1.4} +2025-02-05 20:19:05 - ERROR - stderr - 47%|████▋ | 10442/22434 [10:11:25<8:13:53, 2.47s/it] +2025-02-05 20:19:08 - ERROR - stderr - 47%|████▋ | 10443/22434 [10:11:27<8:18:31, 2.49s/it] +2025-02-05 20:19:08 - ERROR - stderr - +2025-02-05 20:19:08 - ERROR - stderr - +2025-02-05 20:19:08 - INFO - stdout - {'loss': 0.7198, 'grad_norm': 1.196326732635498, 'learning_rate': 1.1597133314780142e-05, 'epoch': 1.4} +2025-02-05 20:19:08 - ERROR - stderr - 47%|████▋ | 10443/22434 [10:11:27<8:18:31, 2.49s/it] +2025-02-05 20:19:10 - ERROR - stderr - 47%|████▋ | 10444/22434 [10:11:30<8:18:25, 2.49s/it] +2025-02-05 20:19:10 - ERROR - stderr - +2025-02-05 20:19:10 - ERROR - stderr - +2025-02-05 20:19:10 - INFO - stdout - {'loss': 0.6787, 'grad_norm': 1.1013567447662354, 'learning_rate': 
1.1595708084260044e-05, 'epoch': 1.4} +2025-02-05 20:19:10 - ERROR - stderr - 47%|████▋ | 10444/22434 [10:11:30<8:18:25, 2.49s/it] +2025-02-05 20:19:13 - ERROR - stderr - 47%|████▋ | 10445/22434 [10:11:32<8:19:30, 2.50s/it] +2025-02-05 20:19:13 - ERROR - stderr - +2025-02-05 20:19:13 - ERROR - stderr - +2025-02-05 20:19:13 - INFO - stdout - {'loss': 0.648, 'grad_norm': 1.0910524129867554, 'learning_rate': 1.1594282820478941e-05, 'epoch': 1.4} +2025-02-05 20:19:13 - ERROR - stderr - 47%|████▋ | 10445/22434 [10:11:32<8:19:30, 2.50s/it] +2025-02-05 20:19:15 - ERROR - stderr - 47%|████▋ | 10446/22434 [10:11:35<8:30:34, 2.56s/it] +2025-02-05 20:19:15 - ERROR - stderr - +2025-02-05 20:19:15 - ERROR - stderr - +2025-02-05 20:19:15 - INFO - stdout - {'loss': 0.6959, 'grad_norm': 1.166200041770935, 'learning_rate': 1.1592857523466537e-05, 'epoch': 1.4} +2025-02-05 20:19:15 - ERROR - stderr - 47%|████▋ | 10446/22434 [10:11:35<8:30:34, 2.56s/it] +2025-02-05 20:19:18 - ERROR - stderr - 47%|████▋ | 10447/22434 [10:11:38<8:41:22, 2.61s/it] +2025-02-05 20:19:18 - ERROR - stderr - +2025-02-05 20:19:18 - ERROR - stderr - +2025-02-05 20:19:18 - INFO - stdout - {'loss': 0.6, 'grad_norm': 1.1874009370803833, 'learning_rate': 1.1591432193252544e-05, 'epoch': 1.4} +2025-02-05 20:19:18 - ERROR - stderr - 47%|████▋ | 10447/22434 [10:11:38<8:41:22, 2.61s/it] +2025-02-05 20:19:21 - ERROR - stderr - 47%|████▋ | 10448/22434 [10:11:40<8:34:28, 2.58s/it] +2025-02-05 20:19:21 - ERROR - stderr - +2025-02-05 20:19:21 - ERROR - stderr - +2025-02-05 20:19:21 - INFO - stdout - {'loss': 0.7398, 'grad_norm': 1.1876559257507324, 'learning_rate': 1.1590006829866665e-05, 'epoch': 1.4} +2025-02-05 20:19:21 - ERROR - stderr - 47%|████▋ | 10448/22434 [10:11:40<8:34:28, 2.58s/it] +2025-02-05 20:19:23 - ERROR - stderr - 47%|████▋ | 10449/22434 [10:11:43<8:30:32, 2.56s/it] +2025-02-05 20:19:23 - ERROR - stderr - +2025-02-05 20:19:23 - ERROR - stderr - +2025-02-05 20:19:23 - INFO - stdout - {'loss': 0.6535, 
'grad_norm': 1.2209651470184326, 'learning_rate': 1.1588581433338614e-05, 'epoch': 1.4} +2025-02-05 20:19:23 - ERROR - stderr - 47%|████▋ | 10449/22434 [10:11:43<8:30:32, 2.56s/it] +2025-02-05 20:19:26 - ERROR - stderr - 47%|████▋ | 10450/22434 [10:11:45<8:26:58, 2.54s/it] +2025-02-05 20:19:26 - ERROR - stderr - +2025-02-05 20:19:26 - ERROR - stderr - +2025-02-05 20:19:26 - INFO - stdout - {'loss': 0.7661, 'grad_norm': 1.2398382425308228, 'learning_rate': 1.1587156003698108e-05, 'epoch': 1.4} +2025-02-05 20:19:26 - ERROR - stderr - 47%|████▋ | 10450/22434 [10:11:45<8:26:58, 2.54s/it] +2025-02-05 20:19:28 - ERROR - stderr - 47%|████▋ | 10451/22434 [10:11:48<8:24:35, 2.53s/it] +2025-02-05 20:19:28 - ERROR - stderr - +2025-02-05 20:19:28 - ERROR - stderr - +2025-02-05 20:19:28 - INFO - stdout - {'loss': 0.6363, 'grad_norm': 1.1994364261627197, 'learning_rate': 1.1585730540974851e-05, 'epoch': 1.4} +2025-02-05 20:19:28 - ERROR - stderr - 47%|████▋ | 10451/22434 [10:11:48<8:24:35, 2.53s/it] +2025-02-05 20:19:31 - ERROR - stderr - 47%|████▋ | 10452/22434 [10:11:50<8:23:21, 2.52s/it] +2025-02-05 20:19:31 - ERROR - stderr - +2025-02-05 20:19:31 - ERROR - stderr - +2025-02-05 20:19:31 - INFO - stdout - {'loss': 0.7149, 'grad_norm': 1.2190515995025635, 'learning_rate': 1.1584305045198563e-05, 'epoch': 1.4} +2025-02-05 20:19:31 - ERROR - stderr - 47%|████▋ | 10452/22434 [10:11:50<8:23:21, 2.52s/it] +2025-02-05 20:19:33 - ERROR - stderr - 47%|████▋ | 10453/22434 [10:11:53<8:17:56, 2.49s/it] +2025-02-05 20:19:33 - ERROR - stderr - +2025-02-05 20:19:33 - ERROR - stderr - +2025-02-05 20:19:33 - INFO - stdout - {'loss': 0.5812, 'grad_norm': 1.1928738355636597, 'learning_rate': 1.1582879516398949e-05, 'epoch': 1.4} +2025-02-05 20:19:33 - ERROR - stderr - 47%|████▋ | 10453/22434 [10:11:53<8:17:56, 2.49s/it] +2025-02-05 20:19:35 - ERROR - stderr - 47%|████▋ | 10454/22434 [10:11:55<8:18:13, 2.50s/it] +2025-02-05 20:19:36 - ERROR - stderr - +2025-02-05 20:19:36 - ERROR - stderr - 
+2025-02-05 20:19:36 - INFO - stdout - {'loss': 0.7372, 'grad_norm': 1.3220523595809937, 'learning_rate': 1.1581453954605724e-05, 'epoch': 1.4} +2025-02-05 20:19:36 - ERROR - stderr - 47%|████▋ | 10454/22434 [10:11:55<8:18:13, 2.50s/it] +2025-02-05 20:19:38 - ERROR - stderr - 47%|████▋ | 10455/22434 [10:11:58<8:14:45, 2.48s/it] +2025-02-05 20:19:38 - ERROR - stderr - +2025-02-05 20:19:38 - ERROR - stderr - +2025-02-05 20:19:38 - INFO - stdout - {'loss': 0.6771, 'grad_norm': 1.0939383506774902, 'learning_rate': 1.1580028359848608e-05, 'epoch': 1.4} +2025-02-05 20:19:38 - ERROR - stderr - 47%|████▋ | 10455/22434 [10:11:58<8:14:45, 2.48s/it] +2025-02-05 20:19:40 - ERROR - stderr - 47%|████▋ | 10456/22434 [10:12:00<8:12:22, 2.47s/it] +2025-02-05 20:19:40 - ERROR - stderr - +2025-02-05 20:19:40 - ERROR - stderr - +2025-02-05 20:19:40 - INFO - stdout - {'loss': 0.7647, 'grad_norm': 1.2797682285308838, 'learning_rate': 1.1578602732157309e-05, 'epoch': 1.4} +2025-02-05 20:19:40 - ERROR - stderr - 47%|████▋ | 10456/22434 [10:12:00<8:12:22, 2.47s/it] +2025-02-05 20:19:43 - ERROR - stderr - 47%|████▋ | 10457/22434 [10:12:03<8:30:49, 2.56s/it] +2025-02-05 20:19:43 - ERROR - stderr - +2025-02-05 20:19:43 - ERROR - stderr - +2025-02-05 20:19:43 - INFO - stdout - {'loss': 0.6134, 'grad_norm': 1.193174958229065, 'learning_rate': 1.157717707156155e-05, 'epoch': 1.4} +2025-02-05 20:19:43 - ERROR - stderr - 47%|████▋ | 10457/22434 [10:12:03<8:30:49, 2.56s/it] +2025-02-05 20:19:46 - ERROR - stderr - 47%|████▋ | 10458/22434 [10:12:05<8:30:01, 2.56s/it] +2025-02-05 20:19:46 - ERROR - stderr - +2025-02-05 20:19:46 - ERROR - stderr - +2025-02-05 20:19:46 - INFO - stdout - {'loss': 0.7773, 'grad_norm': 1.2477015256881714, 'learning_rate': 1.1575751378091043e-05, 'epoch': 1.4} +2025-02-05 20:19:46 - ERROR - stderr - 47%|████▋ | 10458/22434 [10:12:05<8:30:01, 2.56s/it] +2025-02-05 20:19:48 - ERROR - stderr - 47%|████▋ | 10459/22434 [10:12:08<8:26:53, 2.54s/it] +2025-02-05 20:19:48 - ERROR - 
stderr - +2025-02-05 20:19:48 - ERROR - stderr - +2025-02-05 20:19:48 - INFO - stdout - {'loss': 0.6842, 'grad_norm': 1.2169758081436157, 'learning_rate': 1.1574325651775507e-05, 'epoch': 1.4} +2025-02-05 20:19:48 - ERROR - stderr - 47%|████▋ | 10459/22434 [10:12:08<8:26:53, 2.54s/it] +2025-02-05 20:19:51 - ERROR - stderr - 47%|████▋ | 10460/22434 [10:12:10<8:26:05, 2.54s/it] +2025-02-05 20:19:51 - ERROR - stderr - +2025-02-05 20:19:51 - ERROR - stderr - +2025-02-05 20:19:51 - INFO - stdout - {'loss': 0.7388, 'grad_norm': 1.237100601196289, 'learning_rate': 1.157289989264466e-05, 'epoch': 1.4} +2025-02-05 20:19:51 - ERROR - stderr - 47%|████▋ | 10460/22434 [10:12:11<8:26:05, 2.54s/it] +2025-02-05 20:19:53 - ERROR - stderr - 47%|████▋ | 10461/22434 [10:12:13<8:23:16, 2.52s/it] +2025-02-05 20:19:53 - ERROR - stderr - +2025-02-05 20:19:53 - ERROR - stderr - +2025-02-05 20:19:53 - INFO - stdout - {'loss': 0.8051, 'grad_norm': 1.3609181642532349, 'learning_rate': 1.1571474100728218e-05, 'epoch': 1.4} +2025-02-05 20:19:53 - ERROR - stderr - 47%|████▋ | 10461/22434 [10:12:13<8:23:16, 2.52s/it] +2025-02-05 20:19:56 - ERROR - stderr - 47%|████▋ | 10462/22434 [10:12:16<8:26:36, 2.54s/it] +2025-02-05 20:19:56 - ERROR - stderr - +2025-02-05 20:19:56 - ERROR - stderr - +2025-02-05 20:19:56 - INFO - stdout - {'loss': 0.7485, 'grad_norm': 1.2711882591247559, 'learning_rate': 1.15700482760559e-05, 'epoch': 1.4} +2025-02-05 20:19:56 - ERROR - stderr - 47%|████▋ | 10462/22434 [10:12:16<8:26:36, 2.54s/it] +2025-02-05 20:19:58 - ERROR - stderr - 47%|████▋ | 10463/22434 [10:12:18<8:28:36, 2.55s/it] +2025-02-05 20:19:58 - ERROR - stderr - +2025-02-05 20:19:58 - ERROR - stderr - +2025-02-05 20:19:58 - INFO - stdout - {'loss': 0.6726, 'grad_norm': 1.261265754699707, 'learning_rate': 1.156862241865743e-05, 'epoch': 1.4} +2025-02-05 20:19:58 - ERROR - stderr - 47%|████▋ | 10463/22434 [10:12:18<8:28:36, 2.55s/it] +2025-02-05 20:20:01 - ERROR - stderr - 47%|████▋ | 10464/22434 
[10:12:21<8:24:18, 2.53s/it] +2025-02-05 20:20:01 - ERROR - stderr - +2025-02-05 20:20:01 - ERROR - stderr - +2025-02-05 20:20:01 - INFO - stdout - {'loss': 0.7001, 'grad_norm': 1.21962571144104, 'learning_rate': 1.1567196528562529e-05, 'epoch': 1.4} +2025-02-05 20:20:01 - ERROR - stderr - 47%|████▋ | 10464/22434 [10:12:21<8:24:18, 2.53s/it] +2025-02-05 20:20:03 - ERROR - stderr - 47%|████▋ | 10465/22434 [10:12:23<8:21:01, 2.51s/it] +2025-02-05 20:20:03 - ERROR - stderr - +2025-02-05 20:20:03 - ERROR - stderr - +2025-02-05 20:20:03 - INFO - stdout - {'loss': 0.7008, 'grad_norm': 1.1329017877578735, 'learning_rate': 1.1565770605800915e-05, 'epoch': 1.4} +2025-02-05 20:20:03 - ERROR - stderr - 47%|████▋ | 10465/22434 [10:12:23<8:21:01, 2.51s/it] +2025-02-05 20:20:06 - ERROR - stderr - 47%|████▋ | 10466/22434 [10:12:26<8:17:43, 2.50s/it] +2025-02-05 20:20:06 - ERROR - stderr - +2025-02-05 20:20:06 - ERROR - stderr - +2025-02-05 20:20:06 - INFO - stdout - {'loss': 0.7413, 'grad_norm': 1.1051579713821411, 'learning_rate': 1.156434465040231e-05, 'epoch': 1.4} +2025-02-05 20:20:06 - ERROR - stderr - 47%|████▋ | 10466/22434 [10:12:26<8:17:43, 2.50s/it] +2025-02-05 20:20:08 - ERROR - stderr - 47%|████▋ | 10467/22434 [10:12:28<8:16:43, 2.49s/it] +2025-02-05 20:20:08 - ERROR - stderr - +2025-02-05 20:20:08 - ERROR - stderr - +2025-02-05 20:20:08 - INFO - stdout - {'loss': 0.667, 'grad_norm': 1.1910037994384766, 'learning_rate': 1.1562918662396438e-05, 'epoch': 1.4} +2025-02-05 20:20:08 - ERROR - stderr - 47%|████▋ | 10467/22434 [10:12:28<8:16:43, 2.49s/it] +2025-02-05 20:20:11 - ERROR - stderr - 47%|████▋ | 10468/22434 [10:12:31<8:30:32, 2.56s/it] +2025-02-05 20:20:11 - ERROR - stderr - +2025-02-05 20:20:11 - ERROR - stderr - +2025-02-05 20:20:11 - INFO - stdout - {'loss': 0.7743, 'grad_norm': 1.1898396015167236, 'learning_rate': 1.1561492641813021e-05, 'epoch': 1.4} +2025-02-05 20:20:11 - ERROR - stderr - 47%|████▋ | 10468/22434 [10:12:31<8:30:32, 2.56s/it] +2025-02-05 
20:20:13 - ERROR - stderr - 47%|████▋ | 10469/22434 [10:12:33<8:22:57, 2.52s/it] +2025-02-05 20:20:13 - ERROR - stderr - +2025-02-05 20:20:13 - ERROR - stderr - +2025-02-05 20:20:13 - INFO - stdout - {'loss': 0.6901, 'grad_norm': 1.1269909143447876, 'learning_rate': 1.1560066588681786e-05, 'epoch': 1.4} +2025-02-05 20:20:13 - ERROR - stderr - 47%|████▋ | 10469/22434 [10:12:33<8:22:57, 2.52s/it] +2025-02-05 20:20:16 - ERROR - stderr - 47%|████▋ | 10470/22434 [10:12:36<8:19:47, 2.51s/it] +2025-02-05 20:20:16 - ERROR - stderr - +2025-02-05 20:20:16 - ERROR - stderr - +2025-02-05 20:20:16 - INFO - stdout - {'loss': 0.6791, 'grad_norm': 1.1019412279129028, 'learning_rate': 1.1558640503032455e-05, 'epoch': 1.4} +2025-02-05 20:20:16 - ERROR - stderr - 47%|████▋ | 10470/22434 [10:12:36<8:19:47, 2.51s/it] +2025-02-05 20:20:18 - ERROR - stderr - 47%|████▋ | 10471/22434 [10:12:38<8:15:47, 2.49s/it] +2025-02-05 20:20:18 - ERROR - stderr - +2025-02-05 20:20:18 - ERROR - stderr - +2025-02-05 20:20:18 - INFO - stdout - {'loss': 0.8373, 'grad_norm': 1.3726661205291748, 'learning_rate': 1.1557214384894753e-05, 'epoch': 1.4} +2025-02-05 20:20:18 - ERROR - stderr - 47%|████▋ | 10471/22434 [10:12:38<8:15:47, 2.49s/it] +2025-02-05 20:20:21 - ERROR - stderr - 47%|████▋ | 10472/22434 [10:12:41<8:13:48, 2.48s/it] +2025-02-05 20:20:21 - ERROR - stderr - +2025-02-05 20:20:21 - ERROR - stderr - +2025-02-05 20:20:21 - INFO - stdout - {'loss': 0.7966, 'grad_norm': 1.335279107093811, 'learning_rate': 1.1555788234298411e-05, 'epoch': 1.4} +2025-02-05 20:20:21 - ERROR - stderr - 47%|████▋ | 10472/22434 [10:12:41<8:13:48, 2.48s/it] +2025-02-05 20:20:23 - ERROR - stderr - 47%|████▋ | 10473/22434 [10:12:43<8:13:55, 2.48s/it] +2025-02-05 20:20:23 - ERROR - stderr - +2025-02-05 20:20:23 - ERROR - stderr - +2025-02-05 20:20:23 - INFO - stdout - {'loss': 0.7342, 'grad_norm': 1.2123539447784424, 'learning_rate': 1.1554362051273149e-05, 'epoch': 1.4} +2025-02-05 20:20:23 - ERROR - stderr - 47%|████▋ | 
10473/22434 [10:12:43<8:13:55, 2.48s/it] +2025-02-05 20:20:26 - ERROR - stderr - 47%|████▋ | 10474/22434 [10:12:46<8:17:21, 2.50s/it] +2025-02-05 20:20:26 - ERROR - stderr - +2025-02-05 20:20:26 - ERROR - stderr - +2025-02-05 20:20:26 - INFO - stdout - {'loss': 0.6612, 'grad_norm': 1.1895947456359863, 'learning_rate': 1.1552935835848697e-05, 'epoch': 1.4} +2025-02-05 20:20:26 - ERROR - stderr - 47%|████▋ | 10474/22434 [10:12:46<8:17:21, 2.50s/it] +2025-02-05 20:20:28 - ERROR - stderr - 47%|████▋ | 10475/22434 [10:12:48<8:21:34, 2.52s/it] +2025-02-05 20:20:28 - ERROR - stderr - +2025-02-05 20:20:28 - ERROR - stderr - +2025-02-05 20:20:28 - INFO - stdout - {'loss': 0.6336, 'grad_norm': 1.1981195211410522, 'learning_rate': 1.1551509588054783e-05, 'epoch': 1.4} +2025-02-05 20:20:28 - ERROR - stderr - 47%|████▋ | 10475/22434 [10:12:48<8:21:34, 2.52s/it] +2025-02-05 20:20:31 - ERROR - stderr - 47%|████▋ | 10476/22434 [10:12:51<8:17:18, 2.50s/it] +2025-02-05 20:20:31 - ERROR - stderr - +2025-02-05 20:20:31 - ERROR - stderr - +2025-02-05 20:20:31 - INFO - stdout - {'loss': 0.6427, 'grad_norm': 1.076019525527954, 'learning_rate': 1.1550083307921138e-05, 'epoch': 1.4} +2025-02-05 20:20:31 - ERROR - stderr - 47%|████▋ | 10476/22434 [10:12:51<8:17:18, 2.50s/it] +2025-02-05 20:20:33 - ERROR - stderr - 47%|████▋ | 10477/22434 [10:12:53<8:21:39, 2.52s/it] +2025-02-05 20:20:33 - ERROR - stderr - +2025-02-05 20:20:33 - ERROR - stderr - +2025-02-05 20:20:33 - INFO - stdout - {'loss': 0.825, 'grad_norm': 1.1917961835861206, 'learning_rate': 1.154865699547749e-05, 'epoch': 1.4} +2025-02-05 20:20:33 - ERROR - stderr - 47%|████▋ | 10477/22434 [10:12:53<8:21:39, 2.52s/it] +2025-02-05 20:20:36 - ERROR - stderr - 47%|████▋ | 10478/22434 [10:12:56<8:33:27, 2.58s/it] +2025-02-05 20:20:36 - ERROR - stderr - +2025-02-05 20:20:36 - ERROR - stderr - +2025-02-05 20:20:36 - INFO - stdout - {'loss': 0.8008, 'grad_norm': 1.1850403547286987, 'learning_rate': 1.1547230650753569e-05, 'epoch': 1.4} 
+2025-02-05 20:20:36 - ERROR - stderr - 47%|████▋ | 10478/22434 [10:12:56<8:33:27, 2.58s/it] +2025-02-05 20:20:39 - ERROR - stderr - 47%|████▋ | 10479/22434 [10:12:58<8:29:27, 2.56s/it] +2025-02-05 20:20:39 - ERROR - stderr - +2025-02-05 20:20:39 - ERROR - stderr - +2025-02-05 20:20:39 - INFO - stdout - {'loss': 0.6887, 'grad_norm': 1.2097023725509644, 'learning_rate': 1.1545804273779104e-05, 'epoch': 1.4} +2025-02-05 20:20:39 - ERROR - stderr - 47%|████▋ | 10479/22434 [10:12:58<8:29:27, 2.56s/it] +2025-02-05 20:20:41 - ERROR - stderr - 47%|████▋ | 10480/22434 [10:13:01<8:26:25, 2.54s/it] +2025-02-05 20:20:41 - ERROR - stderr - +2025-02-05 20:20:41 - ERROR - stderr - +2025-02-05 20:20:41 - INFO - stdout - {'loss': 0.6989, 'grad_norm': 1.1313683986663818, 'learning_rate': 1.1544377864583832e-05, 'epoch': 1.4} +2025-02-05 20:20:41 - ERROR - stderr - 47%|████▋ | 10480/22434 [10:13:01<8:26:25, 2.54s/it] +2025-02-05 20:20:44 - ERROR - stderr - 47%|████▋ | 10481/22434 [10:13:03<8:22:20, 2.52s/it] +2025-02-05 20:20:44 - ERROR - stderr - +2025-02-05 20:20:44 - ERROR - stderr - +2025-02-05 20:20:44 - INFO - stdout - {'loss': 0.6375, 'grad_norm': 1.1132298707962036, 'learning_rate': 1.1542951423197475e-05, 'epoch': 1.4} +2025-02-05 20:20:44 - ERROR - stderr - 47%|████▋ | 10481/22434 [10:13:03<8:22:20, 2.52s/it] +2025-02-05 20:20:46 - ERROR - stderr - 47%|████▋ | 10482/22434 [10:13:06<8:18:05, 2.50s/it] +2025-02-05 20:20:46 - ERROR - stderr - +2025-02-05 20:20:46 - ERROR - stderr - +2025-02-05 20:20:46 - INFO - stdout - {'loss': 0.6912, 'grad_norm': 1.294676423072815, 'learning_rate': 1.1541524949649774e-05, 'epoch': 1.4} +2025-02-05 20:20:46 - ERROR - stderr - 47%|████▋ | 10482/22434 [10:13:06<8:18:05, 2.50s/it] +2025-02-05 20:20:49 - ERROR - stderr - 47%|████▋ | 10483/22434 [10:13:08<8:18:36, 2.50s/it] +2025-02-05 20:20:49 - ERROR - stderr - +2025-02-05 20:20:49 - ERROR - stderr - +2025-02-05 20:20:49 - INFO - stdout - {'loss': 0.7701, 'grad_norm': 1.3265748023986816, 
'learning_rate': 1.1540098443970462e-05, 'epoch': 1.4} +2025-02-05 20:20:49 - ERROR - stderr - 47%|████▋ | 10483/22434 [10:13:08<8:18:36, 2.50s/it] +2025-02-05 20:20:51 - ERROR - stderr - 47%|████▋ | 10484/22434 [10:13:11<8:21:59, 2.52s/it] +2025-02-05 20:20:51 - ERROR - stderr - +2025-02-05 20:20:51 - ERROR - stderr - +2025-02-05 20:20:51 - INFO - stdout - {'loss': 0.6719, 'grad_norm': 1.0388612747192383, 'learning_rate': 1.1538671906189272e-05, 'epoch': 1.4} +2025-02-05 20:20:51 - ERROR - stderr - 47%|████▋ | 10484/22434 [10:13:11<8:21:59, 2.52s/it] +2025-02-05 20:20:54 - ERROR - stderr - 47%|████▋ | 10485/22434 [10:13:13<8:23:17, 2.53s/it] +2025-02-05 20:20:54 - ERROR - stderr - +2025-02-05 20:20:54 - ERROR - stderr - +2025-02-05 20:20:54 - INFO - stdout - {'loss': 0.7094, 'grad_norm': 1.2976186275482178, 'learning_rate': 1.1537245336335938e-05, 'epoch': 1.4} +2025-02-05 20:20:54 - ERROR - stderr - 47%|████▋ | 10485/22434 [10:13:13<8:23:17, 2.53s/it] +2025-02-05 20:20:56 - ERROR - stderr - 47%|████▋ | 10486/22434 [10:13:16<8:23:11, 2.53s/it] +2025-02-05 20:20:56 - ERROR - stderr - +2025-02-05 20:20:56 - ERROR - stderr - +2025-02-05 20:20:56 - INFO - stdout - {'loss': 0.6894, 'grad_norm': 1.105157494544983, 'learning_rate': 1.1535818734440196e-05, 'epoch': 1.4} +2025-02-05 20:20:56 - ERROR - stderr - 47%|████▋ | 10486/22434 [10:13:16<8:23:11, 2.53s/it] +2025-02-05 20:20:59 - ERROR - stderr - 47%|████▋ | 10487/22434 [10:13:18<8:26:13, 2.54s/it] +2025-02-05 20:20:59 - ERROR - stderr - +2025-02-05 20:20:59 - ERROR - stderr - +2025-02-05 20:20:59 - INFO - stdout - {'loss': 0.645, 'grad_norm': 1.1709946393966675, 'learning_rate': 1.1534392100531781e-05, 'epoch': 1.4} +2025-02-05 20:20:59 - ERROR - stderr - 47%|████▋ | 10487/22434 [10:13:19<8:26:13, 2.54s/it] +2025-02-05 20:21:01 - ERROR - stderr - 47%|████▋ | 10488/22434 [10:13:21<8:26:16, 2.54s/it] +2025-02-05 20:21:01 - ERROR - stderr - +2025-02-05 20:21:01 - ERROR - stderr - +2025-02-05 20:21:01 - INFO - stdout - 
{'loss': 0.6916, 'grad_norm': 1.2792648077011108, 'learning_rate': 1.153296543464043e-05, 'epoch': 1.4} +2025-02-05 20:21:01 - ERROR - stderr - 47%|████▋ | 10488/22434 [10:13:21<8:26:16, 2.54s/it] +2025-02-05 20:21:04 - ERROR - stderr - 47%|████▋ | 10489/22434 [10:13:24<8:22:32, 2.52s/it] +2025-02-05 20:21:04 - ERROR - stderr - +2025-02-05 20:21:04 - ERROR - stderr - +2025-02-05 20:21:04 - INFO - stdout - {'loss': 0.6743, 'grad_norm': 1.194143533706665, 'learning_rate': 1.1531538736795884e-05, 'epoch': 1.4} +2025-02-05 20:21:04 - ERROR - stderr - 47%|████▋ | 10489/22434 [10:13:24<8:22:32, 2.52s/it] +2025-02-05 20:21:06 - ERROR - stderr - 47%|████▋ | 10490/22434 [10:13:26<8:17:52, 2.50s/it] +2025-02-05 20:21:06 - ERROR - stderr - +2025-02-05 20:21:06 - ERROR - stderr - +2025-02-05 20:21:06 - INFO - stdout - {'loss': 0.6959, 'grad_norm': 1.1946803331375122, 'learning_rate': 1.1530112007027878e-05, 'epoch': 1.4} +2025-02-05 20:21:06 - ERROR - stderr - 47%|████▋ | 10490/22434 [10:13:26<8:17:52, 2.50s/it] +2025-02-05 20:21:09 - ERROR - stderr - 47%|████▋ | 10491/22434 [10:13:29<8:39:13, 2.61s/it] +2025-02-05 20:21:09 - ERROR - stderr - +2025-02-05 20:21:09 - ERROR - stderr - +2025-02-05 20:21:09 - INFO - stdout - {'loss': 0.6864, 'grad_norm': 1.1878280639648438, 'learning_rate': 1.1528685245366149e-05, 'epoch': 1.4} +2025-02-05 20:21:09 - ERROR - stderr - 47%|████▋ | 10491/22434 [10:13:29<8:39:13, 2.61s/it] +2025-02-05 20:21:12 - ERROR - stderr - 47%|████▋ | 10492/22434 [10:13:31<8:34:09, 2.58s/it] +2025-02-05 20:21:12 - ERROR - stderr - +2025-02-05 20:21:12 - ERROR - stderr - +2025-02-05 20:21:12 - INFO - stdout - {'loss': 0.74, 'grad_norm': 1.1840901374816895, 'learning_rate': 1.1527258451840445e-05, 'epoch': 1.4} +2025-02-05 20:21:12 - ERROR - stderr - 47%|████▋ | 10492/22434 [10:13:31<8:34:09, 2.58s/it] +2025-02-05 20:21:14 - ERROR - stderr - 47%|████▋ | 10493/22434 [10:13:34<8:41:42, 2.62s/it] +2025-02-05 20:21:14 - ERROR - stderr - +2025-02-05 20:21:14 - ERROR - 
stderr - +2025-02-05 20:21:14 - INFO - stdout - {'loss': 0.695, 'grad_norm': 1.1465567350387573, 'learning_rate': 1.1525831626480495e-05, 'epoch': 1.4} +2025-02-05 20:21:14 - ERROR - stderr - 47%|████▋ | 10493/22434 [10:13:34<8:41:42, 2.62s/it] +2025-02-05 20:21:17 - ERROR - stderr - 47%|████▋ | 10494/22434 [10:13:37<8:36:37, 2.60s/it] +2025-02-05 20:21:17 - ERROR - stderr - +2025-02-05 20:21:17 - ERROR - stderr - +2025-02-05 20:21:17 - INFO - stdout - {'loss': 0.7687, 'grad_norm': 1.2945810556411743, 'learning_rate': 1.1524404769316042e-05, 'epoch': 1.4} +2025-02-05 20:21:17 - ERROR - stderr - 47%|████▋ | 10494/22434 [10:13:37<8:36:37, 2.60s/it] +2025-02-05 20:21:19 - ERROR - stderr - 47%|████▋ | 10495/22434 [10:13:39<8:29:37, 2.56s/it] +2025-02-05 20:21:19 - ERROR - stderr - +2025-02-05 20:21:19 - ERROR - stderr - +2025-02-05 20:21:19 - INFO - stdout - {'loss': 0.7259, 'grad_norm': 1.217054843902588, 'learning_rate': 1.1522977880376836e-05, 'epoch': 1.4} +2025-02-05 20:21:19 - ERROR - stderr - 47%|████▋ | 10495/22434 [10:13:39<8:29:37, 2.56s/it] +2025-02-05 20:21:22 - ERROR - stderr - 47%|████▋ | 10496/22434 [10:13:42<8:37:57, 2.60s/it] +2025-02-05 20:21:22 - ERROR - stderr - +2025-02-05 20:21:22 - ERROR - stderr - +2025-02-05 20:21:22 - INFO - stdout - {'loss': 0.7274, 'grad_norm': 1.1944928169250488, 'learning_rate': 1.1521550959692612e-05, 'epoch': 1.4} +2025-02-05 20:21:22 - ERROR - stderr - 47%|████▋ | 10496/22434 [10:13:42<8:37:57, 2.60s/it] +2025-02-05 20:21:25 - ERROR - stderr - 47%|████▋ | 10497/22434 [10:13:44<8:31:13, 2.57s/it] +2025-02-05 20:21:25 - ERROR - stderr - +2025-02-05 20:21:25 - ERROR - stderr - +2025-02-05 20:21:25 - INFO - stdout - {'loss': 0.6221, 'grad_norm': 1.1552131175994873, 'learning_rate': 1.1520124007293114e-05, 'epoch': 1.4} +2025-02-05 20:21:25 - ERROR - stderr - 47%|████▋ | 10497/22434 [10:13:44<8:31:13, 2.57s/it] +2025-02-05 20:21:27 - ERROR - stderr - 47%|████▋ | 10498/22434 [10:13:47<8:26:48, 2.55s/it] +2025-02-05 20:21:27 - 
ERROR - stderr - +2025-02-05 20:21:27 - ERROR - stderr - +2025-02-05 20:21:27 - INFO - stdout - {'loss': 0.7214, 'grad_norm': 1.2658562660217285, 'learning_rate': 1.1518697023208085e-05, 'epoch': 1.4} +2025-02-05 20:21:27 - ERROR - stderr - 47%|████▋ | 10498/22434 [10:13:47<8:26:48, 2.55s/it] +2025-02-05 20:21:30 - ERROR - stderr - 47%|████▋ | 10499/22434 [10:13:49<8:25:52, 2.54s/it] +2025-02-05 20:21:30 - ERROR - stderr - +2025-02-05 20:21:30 - ERROR - stderr - +2025-02-05 20:21:30 - INFO - stdout - {'loss': 0.76, 'grad_norm': 1.32713782787323, 'learning_rate': 1.151727000746727e-05, 'epoch': 1.4} +2025-02-05 20:21:30 - ERROR - stderr - 47%|████▋ | 10499/22434 [10:13:49<8:25:52, 2.54s/it] +2025-02-05 20:21:32 - ERROR - stderr - 47%|████▋ | 10500/22434 [10:13:52<8:39:17, 2.61s/it] +2025-02-05 20:21:32 - ERROR - stderr - +2025-02-05 20:21:32 - ERROR - stderr - +2025-02-05 20:21:32 - INFO - stdout - {'loss': 0.6276, 'grad_norm': 1.1041321754455566, 'learning_rate': 1.1515842960100411e-05, 'epoch': 1.4} +2025-02-05 20:21:32 - ERROR - stderr - 47%|████▋ | 10500/22434 [10:13:52<8:39:17, 2.61s/it] +2025-02-05 20:21:35 - ERROR - stderr - 47%|████▋ | 10501/22434 [10:13:55<8:32:42, 2.58s/it] +2025-02-05 20:21:35 - ERROR - stderr - +2025-02-05 20:21:35 - ERROR - stderr - +2025-02-05 20:21:35 - INFO - stdout - {'loss': 0.7332, 'grad_norm': 1.2578433752059937, 'learning_rate': 1.151441588113726e-05, 'epoch': 1.4} +2025-02-05 20:21:35 - ERROR - stderr - 47%|████▋ | 10501/22434 [10:13:55<8:32:42, 2.58s/it] +2025-02-05 20:21:37 - ERROR - stderr - 47%|████▋ | 10502/22434 [10:13:57<8:39:28, 2.61s/it] +2025-02-05 20:21:38 - ERROR - stderr - +2025-02-05 20:21:38 - ERROR - stderr - +2025-02-05 20:21:38 - INFO - stdout - {'loss': 0.6808, 'grad_norm': 1.4034364223480225, 'learning_rate': 1.1512988770607558e-05, 'epoch': 1.4} +2025-02-05 20:21:38 - ERROR - stderr - 47%|████▋ | 10502/22434 [10:13:57<8:39:28, 2.61s/it] +2025-02-05 20:21:40 - ERROR - stderr - 47%|████▋ | 10503/22434 
[10:14:00<8:34:50, 2.59s/it] +2025-02-05 20:21:40 - ERROR - stderr - +2025-02-05 20:21:40 - ERROR - stderr - +2025-02-05 20:21:40 - INFO - stdout - {'loss': 0.6867, 'grad_norm': 1.2231868505477905, 'learning_rate': 1.1511561628541053e-05, 'epoch': 1.4} +2025-02-05 20:21:40 - ERROR - stderr - 47%|████▋ | 10503/22434 [10:14:00<8:34:50, 2.59s/it] +2025-02-05 20:21:42 - ERROR - stderr - 47%|████▋ | 10504/22434 [10:14:02<8:25:22, 2.54s/it] +2025-02-05 20:21:43 - ERROR - stderr - +2025-02-05 20:21:43 - ERROR - stderr - +2025-02-05 20:21:43 - INFO - stdout - {'loss': 0.657, 'grad_norm': 1.2082515954971313, 'learning_rate': 1.1510134454967493e-05, 'epoch': 1.4} +2025-02-05 20:21:43 - ERROR - stderr - 47%|████▋ | 10504/22434 [10:14:02<8:25:22, 2.54s/it] +2025-02-05 20:21:45 - ERROR - stderr - 47%|████▋ | 10505/22434 [10:14:05<8:24:24, 2.54s/it] +2025-02-05 20:21:45 - ERROR - stderr - +2025-02-05 20:21:45 - ERROR - stderr - +2025-02-05 20:21:45 - INFO - stdout - {'loss': 0.687, 'grad_norm': 1.1602566242218018, 'learning_rate': 1.1508707249916623e-05, 'epoch': 1.4} +2025-02-05 20:21:45 - ERROR - stderr - 47%|████▋ | 10505/22434 [10:14:05<8:24:24, 2.54s/it] +2025-02-05 20:21:47 - ERROR - stderr - 47%|████▋ | 10506/22434 [10:14:07<8:21:32, 2.52s/it] +2025-02-05 20:21:48 - ERROR - stderr - +2025-02-05 20:21:48 - ERROR - stderr - +2025-02-05 20:21:48 - INFO - stdout - {'loss': 0.6219, 'grad_norm': 1.0312881469726562, 'learning_rate': 1.1507280013418196e-05, 'epoch': 1.4} +2025-02-05 20:21:48 - ERROR - stderr - 47%|████▋ | 10506/22434 [10:14:07<8:21:32, 2.52s/it] +2025-02-05 20:21:50 - ERROR - stderr - 47%|████▋ | 10507/22434 [10:14:10<8:20:40, 2.52s/it] +2025-02-05 20:21:50 - ERROR - stderr - +2025-02-05 20:21:50 - ERROR - stderr - +2025-02-05 20:21:50 - INFO - stdout - {'loss': 0.6178, 'grad_norm': 1.1029127836227417, 'learning_rate': 1.1505852745501957e-05, 'epoch': 1.41} +2025-02-05 20:21:50 - ERROR - stderr - 47%|████▋ | 10507/22434 [10:14:10<8:20:40, 2.52s/it] +2025-02-05 
20:21:52 - ERROR - stderr - 47%|████▋ | 10508/22434 [10:14:12<8:19:50, 2.51s/it] +2025-02-05 20:21:53 - ERROR - stderr - +2025-02-05 20:21:53 - ERROR - stderr - +2025-02-05 20:21:53 - INFO - stdout - {'loss': 0.6061, 'grad_norm': 1.090996503829956, 'learning_rate': 1.150442544619766e-05, 'epoch': 1.41} +2025-02-05 20:21:53 - ERROR - stderr - 47%|████▋ | 10508/22434 [10:14:12<8:19:50, 2.51s/it] +2025-02-05 20:21:55 - ERROR - stderr - 47%|████▋ | 10509/22434 [10:14:15<8:22:16, 2.53s/it] +2025-02-05 20:21:55 - ERROR - stderr - +2025-02-05 20:21:55 - ERROR - stderr - +2025-02-05 20:21:55 - INFO - stdout - {'loss': 0.7227, 'grad_norm': 1.250545620918274, 'learning_rate': 1.1502998115535053e-05, 'epoch': 1.41} +2025-02-05 20:21:55 - ERROR - stderr - 47%|████▋ | 10509/22434 [10:14:15<8:22:16, 2.53s/it] +2025-02-05 20:21:57 - ERROR - stderr - 47%|████▋ | 10510/22434 [10:14:17<8:15:02, 2.49s/it] +2025-02-05 20:21:58 - ERROR - stderr - +2025-02-05 20:21:58 - ERROR - stderr - +2025-02-05 20:21:58 - INFO - stdout - {'loss': 0.6279, 'grad_norm': 1.2194857597351074, 'learning_rate': 1.1501570753543891e-05, 'epoch': 1.41} +2025-02-05 20:21:58 - ERROR - stderr - 47%|████▋ | 10510/22434 [10:14:17<8:15:02, 2.49s/it] +2025-02-05 20:22:00 - ERROR - stderr - 47%|████▋ | 10511/22434 [10:14:20<8:13:21, 2.48s/it] +2025-02-05 20:22:00 - ERROR - stderr - +2025-02-05 20:22:00 - ERROR - stderr - +2025-02-05 20:22:00 - INFO - stdout - {'loss': 0.6902, 'grad_norm': 1.1679712533950806, 'learning_rate': 1.1500143360253922e-05, 'epoch': 1.41} +2025-02-05 20:22:00 - ERROR - stderr - 47%|████▋ | 10511/22434 [10:14:20<8:13:21, 2.48s/it] +2025-02-05 20:22:03 - ERROR - stderr - 47%|████▋ | 10512/22434 [10:14:23<8:34:04, 2.59s/it] +2025-02-05 20:22:03 - ERROR - stderr - +2025-02-05 20:22:03 - ERROR - stderr - +2025-02-05 20:22:03 - INFO - stdout - {'loss': 0.6571, 'grad_norm': 1.0470558404922485, 'learning_rate': 1.1498715935694901e-05, 'epoch': 1.41} +2025-02-05 20:22:03 - ERROR - stderr - 47%|████▋ | 
10512/22434 [10:14:23<8:34:04, 2.59s/it] +2025-02-05 20:22:05 - ERROR - stderr - 47%|████▋ | 10513/22434 [10:14:25<8:31:27, 2.57s/it] +2025-02-05 20:22:05 - ERROR - stderr - +2025-02-05 20:22:05 - ERROR - stderr - +2025-02-05 20:22:05 - INFO - stdout - {'loss': 0.6788, 'grad_norm': 1.1473331451416016, 'learning_rate': 1.1497288479896577e-05, 'epoch': 1.41} +2025-02-05 20:22:05 - ERROR - stderr - 47%|████▋ | 10513/22434 [10:14:25<8:31:27, 2.57s/it] +2025-02-05 20:22:08 - ERROR - stderr - 47%|████▋ | 10514/22434 [10:14:28<8:27:22, 2.55s/it] +2025-02-05 20:22:08 - ERROR - stderr - +2025-02-05 20:22:08 - ERROR - stderr - +2025-02-05 20:22:08 - INFO - stdout - {'loss': 0.7734, 'grad_norm': 1.370267391204834, 'learning_rate': 1.1495860992888712e-05, 'epoch': 1.41} +2025-02-05 20:22:08 - ERROR - stderr - 47%|████▋ | 10514/22434 [10:14:28<8:27:22, 2.55s/it] +2025-02-05 20:22:10 - ERROR - stderr - 47%|████▋ | 10515/22434 [10:14:30<8:20:32, 2.52s/it] +2025-02-05 20:22:10 - ERROR - stderr - +2025-02-05 20:22:10 - ERROR - stderr - +2025-02-05 20:22:10 - INFO - stdout - {'loss': 0.7163, 'grad_norm': 1.181649088859558, 'learning_rate': 1.1494433474701055e-05, 'epoch': 1.41} +2025-02-05 20:22:10 - ERROR - stderr - 47%|████▋ | 10515/22434 [10:14:30<8:20:32, 2.52s/it] +2025-02-05 20:22:13 - ERROR - stderr - 47%|████▋ | 10516/22434 [10:14:33<8:19:38, 2.52s/it] +2025-02-05 20:22:13 - ERROR - stderr - +2025-02-05 20:22:13 - ERROR - stderr - +2025-02-05 20:22:13 - INFO - stdout - {'loss': 0.6447, 'grad_norm': 1.3100179433822632, 'learning_rate': 1.1493005925363361e-05, 'epoch': 1.41} +2025-02-05 20:22:13 - ERROR - stderr - 47%|████▋ | 10516/22434 [10:14:33<8:19:38, 2.52s/it] +2025-02-05 20:22:15 - ERROR - stderr - 47%|████▋ | 10517/22434 [10:14:35<8:16:38, 2.50s/it] +2025-02-05 20:22:15 - ERROR - stderr - +2025-02-05 20:22:15 - ERROR - stderr - +2025-02-05 20:22:15 - INFO - stdout - {'loss': 0.6436, 'grad_norm': 1.1937938928604126, 'learning_rate': 1.1491578344905387e-05, 'epoch': 
1.41} +2025-02-05 20:22:15 - ERROR - stderr - 47%|████▋ | 10517/22434 [10:14:35<8:16:38, 2.50s/it] +2025-02-05 20:22:18 - ERROR - stderr - 47%|████▋ | 10518/22434 [10:14:37<8:15:25, 2.49s/it] +2025-02-05 20:22:18 - ERROR - stderr - +2025-02-05 20:22:18 - ERROR - stderr - +2025-02-05 20:22:18 - INFO - stdout - {'loss': 0.6766, 'grad_norm': 1.1605963706970215, 'learning_rate': 1.1490150733356891e-05, 'epoch': 1.41} +2025-02-05 20:22:18 - ERROR - stderr - 47%|████▋ | 10518/22434 [10:14:38<8:15:25, 2.49s/it] +2025-02-05 20:22:20 - ERROR - stderr - 47%|████▋ | 10519/22434 [10:14:40<8:18:57, 2.51s/it] +2025-02-05 20:22:20 - ERROR - stderr - +2025-02-05 20:22:20 - ERROR - stderr - +2025-02-05 20:22:20 - INFO - stdout - {'loss': 0.7956, 'grad_norm': 1.4393471479415894, 'learning_rate': 1.1488723090747627e-05, 'epoch': 1.41} +2025-02-05 20:22:20 - ERROR - stderr - 47%|████▋ | 10519/22434 [10:14:40<8:18:57, 2.51s/it] +2025-02-05 20:22:23 - ERROR - stderr - 47%|████▋ | 10520/22434 [10:14:42<8:13:52, 2.49s/it] +2025-02-05 20:22:23 - ERROR - stderr - +2025-02-05 20:22:23 - ERROR - stderr - +2025-02-05 20:22:23 - INFO - stdout - {'loss': 0.6176, 'grad_norm': 1.0958194732666016, 'learning_rate': 1.1487295417107355e-05, 'epoch': 1.41} +2025-02-05 20:22:23 - ERROR - stderr - 47%|████▋ | 10520/22434 [10:14:42<8:13:52, 2.49s/it] +2025-02-05 20:22:25 - ERROR - stderr - 47%|████▋ | 10521/22434 [10:14:45<8:26:39, 2.55s/it] +2025-02-05 20:22:25 - ERROR - stderr - +2025-02-05 20:22:25 - ERROR - stderr - +2025-02-05 20:22:25 - INFO - stdout - {'loss': 0.7007, 'grad_norm': 1.2444887161254883, 'learning_rate': 1.1485867712465835e-05, 'epoch': 1.41} +2025-02-05 20:22:25 - ERROR - stderr - 47%|████▋ | 10521/22434 [10:14:45<8:26:39, 2.55s/it] +2025-02-05 20:22:28 - ERROR - stderr - 47%|████▋ | 10522/22434 [10:14:48<8:35:38, 2.60s/it] +2025-02-05 20:22:28 - ERROR - stderr - +2025-02-05 20:22:28 - ERROR - stderr - +2025-02-05 20:22:28 - INFO - stdout - {'loss': 0.6972, 'grad_norm': 
1.1932224035263062, 'learning_rate': 1.1484439976852823e-05, 'epoch': 1.41} +2025-02-05 20:22:28 - ERROR - stderr - 47%|████▋ | 10522/22434 [10:14:48<8:35:38, 2.60s/it] +2025-02-05 20:22:31 - ERROR - stderr - 47%|████▋ | 10523/22434 [10:14:50<8:32:07, 2.58s/it] +2025-02-05 20:22:31 - ERROR - stderr - +2025-02-05 20:22:31 - ERROR - stderr - +2025-02-05 20:22:31 - INFO - stdout - {'loss': 0.6603, 'grad_norm': 1.0152866840362549, 'learning_rate': 1.1483012210298082e-05, 'epoch': 1.41} +2025-02-05 20:22:31 - ERROR - stderr - 47%|████▋ | 10523/22434 [10:14:50<8:32:07, 2.58s/it] +2025-02-05 20:22:34 - ERROR - stderr - 47%|████▋ | 10524/22434 [10:14:53<8:51:07, 2.68s/it] +2025-02-05 20:22:34 - ERROR - stderr - +2025-02-05 20:22:34 - ERROR - stderr - +2025-02-05 20:22:34 - INFO - stdout - {'loss': 0.687, 'grad_norm': 1.1710230112075806, 'learning_rate': 1.148158441283137e-05, 'epoch': 1.41} +2025-02-05 20:22:34 - ERROR - stderr - 47%|████▋ | 10524/22434 [10:14:53<8:51:07, 2.68s/it] +2025-02-05 20:22:36 - ERROR - stderr - 47%|████▋ | 10525/22434 [10:14:56<8:38:30, 2.61s/it] +2025-02-05 20:22:36 - ERROR - stderr - +2025-02-05 20:22:36 - ERROR - stderr - +2025-02-05 20:22:36 - INFO - stdout - {'loss': 0.6765, 'grad_norm': 1.258752465248108, 'learning_rate': 1.1480156584482448e-05, 'epoch': 1.41} +2025-02-05 20:22:36 - ERROR - stderr - 47%|████▋ | 10525/22434 [10:14:56<8:38:30, 2.61s/it] +2025-02-05 20:22:38 - ERROR - stderr - 47%|████▋ | 10526/22434 [10:14:58<8:32:19, 2.58s/it] +2025-02-05 20:22:39 - ERROR - stderr - +2025-02-05 20:22:39 - ERROR - stderr - +2025-02-05 20:22:39 - INFO - stdout - {'loss': 0.6724, 'grad_norm': 1.1693685054779053, 'learning_rate': 1.1478728725281074e-05, 'epoch': 1.41} +2025-02-05 20:22:39 - ERROR - stderr - 47%|████▋ | 10526/22434 [10:14:58<8:32:19, 2.58s/it] +2025-02-05 20:22:41 - ERROR - stderr - 47%|████▋ | 10527/22434 [10:15:01<8:31:36, 2.58s/it] +2025-02-05 20:22:41 - ERROR - stderr - +2025-02-05 20:22:41 - ERROR - stderr - +2025-02-05 
20:22:41 - INFO - stdout - {'loss': 0.689, 'grad_norm': 1.113629937171936, 'learning_rate': 1.1477300835257019e-05, 'epoch': 1.41} +2025-02-05 20:22:41 - ERROR - stderr - 47%|████▋ | 10527/22434 [10:15:01<8:31:36, 2.58s/it] +2025-02-05 20:22:44 - ERROR - stderr - 47%|████▋ | 10528/22434 [10:15:03<8:23:56, 2.54s/it] +2025-02-05 20:22:44 - ERROR - stderr - +2025-02-05 20:22:44 - ERROR - stderr - +2025-02-05 20:22:44 - INFO - stdout - {'loss': 0.6146, 'grad_norm': 1.1784099340438843, 'learning_rate': 1.1475872914440042e-05, 'epoch': 1.41} +2025-02-05 20:22:44 - ERROR - stderr - 47%|████▋ | 10528/22434 [10:15:03<8:23:56, 2.54s/it] +2025-02-05 20:22:46 - ERROR - stderr - 47%|████▋ | 10529/22434 [10:15:06<8:20:16, 2.52s/it] +2025-02-05 20:22:46 - ERROR - stderr - +2025-02-05 20:22:46 - ERROR - stderr - +2025-02-05 20:22:46 - INFO - stdout - {'loss': 0.6692, 'grad_norm': 1.1649372577667236, 'learning_rate': 1.1474444962859907e-05, 'epoch': 1.41} +2025-02-05 20:22:46 - ERROR - stderr - 47%|████▋ | 10529/22434 [10:15:06<8:20:16, 2.52s/it] +2025-02-05 20:22:49 - ERROR - stderr - 47%|████▋ | 10530/22434 [10:15:08<8:20:03, 2.52s/it] +2025-02-05 20:22:49 - ERROR - stderr - +2025-02-05 20:22:49 - ERROR - stderr - +2025-02-05 20:22:49 - INFO - stdout - {'loss': 0.7042, 'grad_norm': 1.1371971368789673, 'learning_rate': 1.1473016980546377e-05, 'epoch': 1.41} +2025-02-05 20:22:49 - ERROR - stderr - 47%|████▋ | 10530/22434 [10:15:08<8:20:03, 2.52s/it] +2025-02-05 20:22:51 - ERROR - stderr - 47%|████▋ | 10531/22434 [10:15:11<8:20:06, 2.52s/it] +2025-02-05 20:22:51 - ERROR - stderr - +2025-02-05 20:22:51 - ERROR - stderr - +2025-02-05 20:22:51 - INFO - stdout - {'loss': 0.672, 'grad_norm': 1.0474406480789185, 'learning_rate': 1.1471588967529218e-05, 'epoch': 1.41} +2025-02-05 20:22:51 - ERROR - stderr - 47%|████▋ | 10531/22434 [10:15:11<8:20:06, 2.52s/it] +2025-02-05 20:22:54 - ERROR - stderr - 47%|████▋ | 10532/22434 [10:15:13<8:20:14, 2.52s/it] +2025-02-05 20:22:54 - ERROR - stderr - 
+2025-02-05 20:22:54 - ERROR - stderr - +2025-02-05 20:22:54 - INFO - stdout - {'loss': 0.714, 'grad_norm': 1.2140933275222778, 'learning_rate': 1.1470160923838191e-05, 'epoch': 1.41} +2025-02-05 20:22:54 - ERROR - stderr - 47%|████▋ | 10532/22434 [10:15:13<8:20:14, 2.52s/it] +2025-02-05 20:22:56 - ERROR - stderr - 47%|████▋ | 10533/22434 [10:15:16<8:16:21, 2.50s/it] +2025-02-05 20:22:56 - ERROR - stderr - +2025-02-05 20:22:56 - ERROR - stderr - +2025-02-05 20:22:56 - INFO - stdout - {'loss': 0.6531, 'grad_norm': 1.1206984519958496, 'learning_rate': 1.146873284950307e-05, 'epoch': 1.41} +2025-02-05 20:22:56 - ERROR - stderr - 47%|████▋ | 10533/22434 [10:15:16<8:16:21, 2.50s/it] +2025-02-05 20:22:58 - ERROR - stderr - 47%|████▋ | 10534/22434 [10:15:18<8:13:25, 2.49s/it] +2025-02-05 20:22:59 - ERROR - stderr - +2025-02-05 20:22:59 - ERROR - stderr - +2025-02-05 20:22:59 - INFO - stdout - {'loss': 0.6517, 'grad_norm': 1.125379204750061, 'learning_rate': 1.1467304744553618e-05, 'epoch': 1.41} +2025-02-05 20:22:59 - ERROR - stderr - 47%|████▋ | 10534/22434 [10:15:18<8:13:25, 2.49s/it] +2025-02-05 20:23:01 - ERROR - stderr - 47%|████▋ | 10535/22434 [10:15:21<8:12:29, 2.48s/it] +2025-02-05 20:23:01 - ERROR - stderr - +2025-02-05 20:23:01 - ERROR - stderr - +2025-02-05 20:23:01 - INFO - stdout - {'loss': 0.7375, 'grad_norm': 1.1715943813323975, 'learning_rate': 1.1465876609019602e-05, 'epoch': 1.41} +2025-02-05 20:23:01 - ERROR - stderr - 47%|████▋ | 10535/22434 [10:15:21<8:12:29, 2.48s/it] +2025-02-05 20:23:04 - ERROR - stderr - 47%|████▋ | 10536/22434 [10:15:23<8:27:56, 2.56s/it] +2025-02-05 20:23:04 - ERROR - stderr - +2025-02-05 20:23:04 - ERROR - stderr - +2025-02-05 20:23:04 - INFO - stdout - {'loss': 0.6702, 'grad_norm': 1.229430913925171, 'learning_rate': 1.1464448442930792e-05, 'epoch': 1.41} +2025-02-05 20:23:04 - ERROR - stderr - 47%|████▋ | 10536/22434 [10:15:23<8:27:56, 2.56s/it] +2025-02-05 20:23:06 - ERROR - stderr - 47%|████▋ | 10537/22434 
[10:15:26<8:25:19, 2.55s/it] +2025-02-05 20:23:06 - ERROR - stderr - +2025-02-05 20:23:06 - ERROR - stderr - +2025-02-05 20:23:06 - INFO - stdout - {'loss': 0.6732, 'grad_norm': 1.1022083759307861, 'learning_rate': 1.1463020246316956e-05, 'epoch': 1.41} +2025-02-05 20:23:06 - ERROR - stderr - 47%|████▋ | 10537/22434 [10:15:26<8:25:19, 2.55s/it] +2025-02-05 20:23:09 - ERROR - stderr - 47%|████▋ | 10538/22434 [10:15:28<8:21:42, 2.53s/it] +2025-02-05 20:23:09 - ERROR - stderr - +2025-02-05 20:23:09 - ERROR - stderr - +2025-02-05 20:23:09 - INFO - stdout - {'loss': 0.6264, 'grad_norm': 1.07551908493042, 'learning_rate': 1.1461592019207862e-05, 'epoch': 1.41} +2025-02-05 20:23:09 - ERROR - stderr - 47%|████▋ | 10538/22434 [10:15:29<8:21:42, 2.53s/it] +2025-02-05 20:23:11 - ERROR - stderr - 47%|████▋ | 10539/22434 [10:15:31<8:17:32, 2.51s/it] +2025-02-05 20:23:11 - ERROR - stderr - +2025-02-05 20:23:11 - ERROR - stderr - +2025-02-05 20:23:11 - INFO - stdout - {'loss': 0.7845, 'grad_norm': 1.332484483718872, 'learning_rate': 1.1460163761633281e-05, 'epoch': 1.41} +2025-02-05 20:23:11 - ERROR - stderr - 47%|████▋ | 10539/22434 [10:15:31<8:17:32, 2.51s/it] +2025-02-05 20:23:14 - ERROR - stderr - 47%|████▋ | 10540/22434 [10:15:33<8:15:44, 2.50s/it] +2025-02-05 20:23:14 - ERROR - stderr - +2025-02-05 20:23:14 - ERROR - stderr - +2025-02-05 20:23:14 - INFO - stdout - {'loss': 0.7154, 'grad_norm': 1.2024420499801636, 'learning_rate': 1.1458735473622979e-05, 'epoch': 1.41} +2025-02-05 20:23:14 - ERROR - stderr - 47%|████▋ | 10540/22434 [10:15:33<8:15:44, 2.50s/it] +2025-02-05 20:23:16 - ERROR - stderr - 47%|████▋ | 10541/22434 [10:15:36<8:11:45, 2.48s/it] +2025-02-05 20:23:16 - ERROR - stderr - +2025-02-05 20:23:16 - ERROR - stderr - +2025-02-05 20:23:16 - INFO - stdout - {'loss': 0.6946, 'grad_norm': 1.18008291721344, 'learning_rate': 1.1457307155206738e-05, 'epoch': 1.41} +2025-02-05 20:23:16 - ERROR - stderr - 47%|████▋ | 10541/22434 [10:15:36<8:11:45, 2.48s/it] +2025-02-05 
20:23:18 - ERROR - stderr - 47%|████▋ | 10542/22434 [10:15:38<8:07:47, 2.46s/it] +2025-02-05 20:23:19 - ERROR - stderr - +2025-02-05 20:23:19 - ERROR - stderr - +2025-02-05 20:23:19 - INFO - stdout - {'loss': 0.6598, 'grad_norm': 1.2743057012557983, 'learning_rate': 1.1455878806414322e-05, 'epoch': 1.41} +2025-02-05 20:23:19 - ERROR - stderr - 47%|████▋ | 10542/22434 [10:15:38<8:07:47, 2.46s/it] +2025-02-05 20:23:21 - ERROR - stderr - 47%|████▋ | 10543/22434 [10:15:41<8:06:54, 2.46s/it] +2025-02-05 20:23:21 - ERROR - stderr - +2025-02-05 20:23:21 - ERROR - stderr - +2025-02-05 20:23:21 - INFO - stdout - {'loss': 0.7111, 'grad_norm': 1.2505279779434204, 'learning_rate': 1.1454450427275506e-05, 'epoch': 1.41} +2025-02-05 20:23:21 - ERROR - stderr - 47%|████▋ | 10543/22434 [10:15:41<8:06:54, 2.46s/it] +2025-02-05 20:23:23 - ERROR - stderr - 47%|████▋ | 10544/22434 [10:15:43<8:11:35, 2.48s/it] +2025-02-05 20:23:24 - ERROR - stderr - +2025-02-05 20:23:24 - ERROR - stderr - +2025-02-05 20:23:24 - INFO - stdout - {'loss': 0.7124, 'grad_norm': 1.1035597324371338, 'learning_rate': 1.1453022017820061e-05, 'epoch': 1.41} +2025-02-05 20:23:24 - ERROR - stderr - 47%|████▋ | 10544/22434 [10:15:43<8:11:35, 2.48s/it] +2025-02-05 20:23:26 - ERROR - stderr - 47%|████▋ | 10545/22434 [10:15:46<8:11:50, 2.48s/it] +2025-02-05 20:23:26 - ERROR - stderr - +2025-02-05 20:23:26 - ERROR - stderr - +2025-02-05 20:23:26 - INFO - stdout - {'loss': 0.674, 'grad_norm': 1.082471251487732, 'learning_rate': 1.1451593578077764e-05, 'epoch': 1.41} +2025-02-05 20:23:26 - ERROR - stderr - 47%|████▋ | 10545/22434 [10:15:46<8:11:50, 2.48s/it] +2025-02-05 20:23:28 - ERROR - stderr - 47%|████▋ | 10546/22434 [10:15:48<8:12:59, 2.49s/it] +2025-02-05 20:23:28 - ERROR - stderr - +2025-02-05 20:23:28 - ERROR - stderr - +2025-02-05 20:23:28 - INFO - stdout - {'loss': 0.7018, 'grad_norm': 1.3133602142333984, 'learning_rate': 1.1450165108078385e-05, 'epoch': 1.41} +2025-02-05 20:23:28 - ERROR - stderr - 47%|████▋ | 
10546/22434 [10:15:48<8:12:59, 2.49s/it] +2025-02-05 20:23:31 - ERROR - stderr - 47%|████▋ | 10547/22434 [10:15:51<8:16:10, 2.50s/it] +2025-02-05 20:23:31 - ERROR - stderr - +2025-02-05 20:23:31 - ERROR - stderr - +2025-02-05 20:23:31 - INFO - stdout - {'loss': 0.7349, 'grad_norm': 1.2281855344772339, 'learning_rate': 1.1448736607851705e-05, 'epoch': 1.41} +2025-02-05 20:23:31 - ERROR - stderr - 47%|████▋ | 10547/22434 [10:15:51<8:16:10, 2.50s/it] +2025-02-05 20:23:34 - ERROR - stderr - 47%|████▋ | 10548/22434 [10:15:53<8:17:03, 2.51s/it] +2025-02-05 20:23:34 - ERROR - stderr - +2025-02-05 20:23:34 - ERROR - stderr - +2025-02-05 20:23:34 - INFO - stdout - {'loss': 0.7649, 'grad_norm': 1.1657018661499023, 'learning_rate': 1.1447308077427497e-05, 'epoch': 1.41} +2025-02-05 20:23:34 - ERROR - stderr - 47%|████▋ | 10548/22434 [10:15:53<8:17:03, 2.51s/it] +2025-02-05 20:23:36 - ERROR - stderr - 47%|████▋ | 10549/22434 [10:15:56<8:15:54, 2.50s/it] +2025-02-05 20:23:36 - ERROR - stderr - +2025-02-05 20:23:36 - ERROR - stderr - +2025-02-05 20:23:36 - INFO - stdout - {'loss': 0.6371, 'grad_norm': 1.1242061853408813, 'learning_rate': 1.1445879516835536e-05, 'epoch': 1.41} +2025-02-05 20:23:36 - ERROR - stderr - 47%|████▋ | 10549/22434 [10:15:56<8:15:54, 2.50s/it] +2025-02-05 20:23:38 - ERROR - stderr - 47%|████▋ | 10550/22434 [10:15:58<8:12:25, 2.49s/it] +2025-02-05 20:23:38 - ERROR - stderr - +2025-02-05 20:23:38 - ERROR - stderr - +2025-02-05 20:23:38 - INFO - stdout - {'loss': 0.8151, 'grad_norm': 1.2457032203674316, 'learning_rate': 1.14444509261056e-05, 'epoch': 1.41} +2025-02-05 20:23:38 - ERROR - stderr - 47%|████▋ | 10550/22434 [10:15:58<8:12:25, 2.49s/it] +2025-02-05 20:23:41 - ERROR - stderr - 47%|████▋ | 10551/22434 [10:16:01<8:10:13, 2.48s/it] +2025-02-05 20:23:41 - ERROR - stderr - +2025-02-05 20:23:41 - ERROR - stderr - +2025-02-05 20:23:41 - INFO - stdout - {'loss': 0.6701, 'grad_norm': 1.1794532537460327, 'learning_rate': 1.1443022305267468e-05, 'epoch': 
1.41} +2025-02-05 20:23:41 - ERROR - stderr - 47%|████▋ | 10551/22434 [10:16:01<8:10:13, 2.48s/it] +2025-02-05 20:23:43 - ERROR - stderr - 47%|████▋ | 10552/22434 [10:16:03<8:11:38, 2.48s/it] +2025-02-05 20:23:43 - ERROR - stderr - +2025-02-05 20:23:43 - ERROR - stderr - +2025-02-05 20:23:43 - INFO - stdout - {'loss': 0.7785, 'grad_norm': 1.2247318029403687, 'learning_rate': 1.1441593654350914e-05, 'epoch': 1.41} +2025-02-05 20:23:43 - ERROR - stderr - 47%|████▋ | 10552/22434 [10:16:03<8:11:38, 2.48s/it] +2025-02-05 20:23:46 - ERROR - stderr - 47%|████▋ | 10553/22434 [10:16:06<8:18:19, 2.52s/it] +2025-02-05 20:23:46 - ERROR - stderr - +2025-02-05 20:23:46 - ERROR - stderr - +2025-02-05 20:23:46 - INFO - stdout - {'loss': 0.6168, 'grad_norm': 1.2065447568893433, 'learning_rate': 1.1440164973385722e-05, 'epoch': 1.41} +2025-02-05 20:23:46 - ERROR - stderr - 47%|████▋ | 10553/22434 [10:16:06<8:18:19, 2.52s/it] +2025-02-05 20:23:49 - ERROR - stderr - 47%|████▋ | 10554/22434 [10:16:08<8:18:16, 2.52s/it] +2025-02-05 20:23:49 - ERROR - stderr - +2025-02-05 20:23:49 - ERROR - stderr - +2025-02-05 20:23:49 - INFO - stdout - {'loss': 0.6999, 'grad_norm': 1.1936233043670654, 'learning_rate': 1.1438736262401669e-05, 'epoch': 1.41} +2025-02-05 20:23:49 - ERROR - stderr - 47%|████▋ | 10554/22434 [10:16:08<8:18:16, 2.52s/it] +2025-02-05 20:23:51 - ERROR - stderr - 47%|████▋ | 10555/22434 [10:16:11<8:17:07, 2.51s/it] +2025-02-05 20:23:51 - ERROR - stderr - +2025-02-05 20:23:51 - ERROR - stderr - +2025-02-05 20:23:51 - INFO - stdout - {'loss': 0.7568, 'grad_norm': 1.0986779928207397, 'learning_rate': 1.1437307521428533e-05, 'epoch': 1.41} +2025-02-05 20:23:51 - ERROR - stderr - 47%|████▋ | 10555/22434 [10:16:11<8:17:07, 2.51s/it] +2025-02-05 20:23:54 - ERROR - stderr - 47%|████▋ | 10556/22434 [10:16:13<8:16:16, 2.51s/it] +2025-02-05 20:23:54 - ERROR - stderr - +2025-02-05 20:23:54 - ERROR - stderr - +2025-02-05 20:23:54 - INFO - stdout - {'loss': 0.7522, 'grad_norm': 
1.2485164403915405, 'learning_rate': 1.1435878750496099e-05, 'epoch': 1.41} +2025-02-05 20:23:54 - ERROR - stderr - 47%|████▋ | 10556/22434 [10:16:13<8:16:16, 2.51s/it] +2025-02-05 20:23:56 - ERROR - stderr - 47%|████▋ | 10557/22434 [10:16:16<8:16:32, 2.51s/it] +2025-02-05 20:23:56 - ERROR - stderr - +2025-02-05 20:23:56 - ERROR - stderr - +2025-02-05 20:23:56 - INFO - stdout - {'loss': 0.7179, 'grad_norm': 1.1732702255249023, 'learning_rate': 1.1434449949634147e-05, 'epoch': 1.41} +2025-02-05 20:23:56 - ERROR - stderr - 47%|████▋ | 10557/22434 [10:16:16<8:16:32, 2.51s/it] +2025-02-05 20:23:59 - ERROR - stderr - 47%|████▋ | 10558/22434 [10:16:18<8:16:24, 2.51s/it] +2025-02-05 20:23:59 - ERROR - stderr - +2025-02-05 20:23:59 - ERROR - stderr - +2025-02-05 20:23:59 - INFO - stdout - {'loss': 0.7342, 'grad_norm': 1.1857529878616333, 'learning_rate': 1.1433021118872458e-05, 'epoch': 1.41} +2025-02-05 20:23:59 - ERROR - stderr - 47%|████▋ | 10558/22434 [10:16:18<8:16:24, 2.51s/it] +2025-02-05 20:24:01 - ERROR - stderr - 47%|████▋ | 10559/22434 [10:16:21<8:15:57, 2.51s/it] +2025-02-05 20:24:01 - ERROR - stderr - +2025-02-05 20:24:01 - ERROR - stderr - +2025-02-05 20:24:01 - INFO - stdout - {'loss': 0.7086, 'grad_norm': 1.1212129592895508, 'learning_rate': 1.1431592258240814e-05, 'epoch': 1.41} +2025-02-05 20:24:01 - ERROR - stderr - 47%|████▋ | 10559/22434 [10:16:21<8:15:57, 2.51s/it] +2025-02-05 20:24:03 - ERROR - stderr - 47%|████▋ | 10560/22434 [10:16:23<8:10:49, 2.48s/it] +2025-02-05 20:24:03 - ERROR - stderr - +2025-02-05 20:24:03 - ERROR - stderr - +2025-02-05 20:24:03 - INFO - stdout - {'loss': 0.7516, 'grad_norm': 1.4557336568832397, 'learning_rate': 1.1430163367768998e-05, 'epoch': 1.41} +2025-02-05 20:24:03 - ERROR - stderr - 47%|████▋ | 10560/22434 [10:16:23<8:10:49, 2.48s/it] +2025-02-05 20:24:06 - ERROR - stderr - 47%|████▋ | 10561/22434 [10:16:26<8:08:23, 2.47s/it] +2025-02-05 20:24:06 - ERROR - stderr - +2025-02-05 20:24:06 - ERROR - stderr - +2025-02-05 
20:24:06 - INFO - stdout - {'loss': 0.664, 'grad_norm': 1.1804993152618408, 'learning_rate': 1.14287344474868e-05, 'epoch': 1.41} +2025-02-05 20:24:06 - ERROR - stderr - 47%|████▋ | 10561/22434 [10:16:26<8:08:23, 2.47s/it] +2025-02-05 20:24:08 - ERROR - stderr - 47%|████▋ | 10562/22434 [10:16:28<8:14:13, 2.50s/it] +2025-02-05 20:24:09 - ERROR - stderr - +2025-02-05 20:24:09 - ERROR - stderr - +2025-02-05 20:24:09 - INFO - stdout - {'loss': 0.815, 'grad_norm': 1.3473013639450073, 'learning_rate': 1.1427305497423995e-05, 'epoch': 1.41} +2025-02-05 20:24:09 - ERROR - stderr - 47%|████▋ | 10562/22434 [10:16:28<8:14:13, 2.50s/it] +2025-02-05 20:24:11 - ERROR - stderr - 47%|████▋ | 10563/22434 [10:16:31<8:40:23, 2.63s/it] +2025-02-05 20:24:11 - ERROR - stderr - +2025-02-05 20:24:11 - ERROR - stderr - +2025-02-05 20:24:11 - INFO - stdout - {'loss': 0.6309, 'grad_norm': 1.0304821729660034, 'learning_rate': 1.1425876517610375e-05, 'epoch': 1.41} +2025-02-05 20:24:11 - ERROR - stderr - 47%|████▋ | 10563/22434 [10:16:31<8:40:23, 2.63s/it] +2025-02-05 20:24:14 - ERROR - stderr - 47%|████▋ | 10564/22434 [10:16:34<8:29:55, 2.58s/it] +2025-02-05 20:24:14 - ERROR - stderr - +2025-02-05 20:24:14 - ERROR - stderr - +2025-02-05 20:24:14 - INFO - stdout - {'loss': 0.6792, 'grad_norm': 1.1462628841400146, 'learning_rate': 1.1424447508075722e-05, 'epoch': 1.41} +2025-02-05 20:24:14 - ERROR - stderr - 47%|████▋ | 10564/22434 [10:16:34<8:29:55, 2.58s/it] +2025-02-05 20:24:16 - ERROR - stderr - 47%|████▋ | 10565/22434 [10:16:36<8:27:55, 2.57s/it] +2025-02-05 20:24:16 - ERROR - stderr - +2025-02-05 20:24:16 - ERROR - stderr - +2025-02-05 20:24:16 - INFO - stdout - {'loss': 0.6808, 'grad_norm': 1.1305737495422363, 'learning_rate': 1.1423018468849824e-05, 'epoch': 1.41} +2025-02-05 20:24:16 - ERROR - stderr - 47%|████▋ | 10565/22434 [10:16:36<8:27:55, 2.57s/it] +2025-02-05 20:24:19 - ERROR - stderr - 47%|████▋ | 10566/22434 [10:16:39<8:25:44, 2.56s/it] +2025-02-05 20:24:19 - ERROR - stderr - 
+2025-02-05 20:24:19 - ERROR - stderr - +2025-02-05 20:24:19 - INFO - stdout - {'loss': 0.8142, 'grad_norm': 1.261242389678955, 'learning_rate': 1.142158939996247e-05, 'epoch': 1.41} +2025-02-05 20:24:19 - ERROR - stderr - 47%|████▋ | 10566/22434 [10:16:39<8:25:44, 2.56s/it] +2025-02-05 20:24:21 - ERROR - stderr - 47%|████▋ | 10567/22434 [10:16:41<8:19:36, 2.53s/it] +2025-02-05 20:24:21 - ERROR - stderr - +2025-02-05 20:24:21 - ERROR - stderr - +2025-02-05 20:24:21 - INFO - stdout - {'loss': 0.7008, 'grad_norm': 1.2355860471725464, 'learning_rate': 1.1420160301443444e-05, 'epoch': 1.41} +2025-02-05 20:24:21 - ERROR - stderr - 47%|████▋ | 10567/22434 [10:16:41<8:19:36, 2.53s/it] +2025-02-05 20:24:24 - ERROR - stderr - 47%|████▋ | 10568/22434 [10:16:44<8:18:04, 2.52s/it] +2025-02-05 20:24:24 - ERROR - stderr - +2025-02-05 20:24:24 - ERROR - stderr - +2025-02-05 20:24:24 - INFO - stdout - {'loss': 0.6619, 'grad_norm': 1.073434829711914, 'learning_rate': 1.1418731173322532e-05, 'epoch': 1.41} +2025-02-05 20:24:24 - ERROR - stderr - 47%|████▋ | 10568/22434 [10:16:44<8:18:04, 2.52s/it] +2025-02-05 20:24:26 - ERROR - stderr - 47%|████▋ | 10569/22434 [10:16:46<8:23:26, 2.55s/it] +2025-02-05 20:24:27 - ERROR - stderr - +2025-02-05 20:24:27 - ERROR - stderr - +2025-02-05 20:24:27 - INFO - stdout - {'loss': 0.7137, 'grad_norm': 1.24872887134552, 'learning_rate': 1.1417302015629532e-05, 'epoch': 1.41} +2025-02-05 20:24:27 - ERROR - stderr - 47%|████▋ | 10569/22434 [10:16:46<8:23:26, 2.55s/it] +2025-02-05 20:24:29 - ERROR - stderr - 47%|████▋ | 10570/22434 [10:16:49<8:19:32, 2.53s/it] +2025-02-05 20:24:29 - ERROR - stderr - +2025-02-05 20:24:29 - ERROR - stderr - +2025-02-05 20:24:29 - INFO - stdout - {'loss': 0.7591, 'grad_norm': 1.3223756551742554, 'learning_rate': 1.1415872828394225e-05, 'epoch': 1.41} +2025-02-05 20:24:29 - ERROR - stderr - 47%|████▋ | 10570/22434 [10:16:49<8:19:32, 2.53s/it] +2025-02-05 20:24:31 - ERROR - stderr - 47%|████▋ | 10571/22434 [10:16:51<8:15:39, 
2.51s/it] +2025-02-05 20:24:31 - ERROR - stderr - +2025-02-05 20:24:31 - ERROR - stderr - +2025-02-05 20:24:31 - INFO - stdout - {'loss': 0.7014, 'grad_norm': 1.17685067653656, 'learning_rate': 1.1414443611646404e-05, 'epoch': 1.41} +2025-02-05 20:24:31 - ERROR - stderr - 47%|████▋ | 10571/22434 [10:16:51<8:15:39, 2.51s/it] +2025-02-05 20:24:34 - ERROR - stderr - 47%|████▋ | 10572/22434 [10:16:54<8:12:04, 2.49s/it] +2025-02-05 20:24:34 - ERROR - stderr - +2025-02-05 20:24:34 - ERROR - stderr - +2025-02-05 20:24:34 - INFO - stdout - {'loss': 0.7739, 'grad_norm': 1.1834352016448975, 'learning_rate': 1.1413014365415855e-05, 'epoch': 1.41} +2025-02-05 20:24:34 - ERROR - stderr - 47%|████▋ | 10572/22434 [10:16:54<8:12:04, 2.49s/it] +2025-02-05 20:24:36 - ERROR - stderr - 47%|████▋ | 10573/22434 [10:16:56<8:16:28, 2.51s/it] +2025-02-05 20:24:36 - ERROR - stderr - +2025-02-05 20:24:36 - ERROR - stderr - +2025-02-05 20:24:36 - INFO - stdout - {'loss': 0.7575, 'grad_norm': 1.2675681114196777, 'learning_rate': 1.1411585089732382e-05, 'epoch': 1.41} +2025-02-05 20:24:36 - ERROR - stderr - 47%|████▋ | 10573/22434 [10:16:56<8:16:28, 2.51s/it] +2025-02-05 20:24:39 - ERROR - stderr - 47%|████▋ | 10574/22434 [10:16:59<8:14:27, 2.50s/it] +2025-02-05 20:24:39 - ERROR - stderr - +2025-02-05 20:24:39 - ERROR - stderr - +2025-02-05 20:24:39 - INFO - stdout - {'loss': 0.6689, 'grad_norm': 1.1324913501739502, 'learning_rate': 1.1410155784625762e-05, 'epoch': 1.41} +2025-02-05 20:24:39 - ERROR - stderr - 47%|████▋ | 10574/22434 [10:16:59<8:14:27, 2.50s/it] +2025-02-05 20:24:41 - ERROR - stderr - 47%|████▋ | 10575/22434 [10:17:01<8:16:51, 2.51s/it] +2025-02-05 20:24:42 - ERROR - stderr - +2025-02-05 20:24:42 - ERROR - stderr - +2025-02-05 20:24:42 - INFO - stdout - {'loss': 0.6617, 'grad_norm': 1.1453560590744019, 'learning_rate': 1.1408726450125798e-05, 'epoch': 1.41} +2025-02-05 20:24:42 - ERROR - stderr - 47%|████▋ | 10575/22434 [10:17:01<8:16:51, 2.51s/it] +2025-02-05 20:24:44 - ERROR 
- stderr - 47%|████▋ | 10576/22434 [10:17:04<8:13:38, 2.50s/it] +2025-02-05 20:24:44 - ERROR - stderr - +2025-02-05 20:24:44 - ERROR - stderr - +2025-02-05 20:24:44 - INFO - stdout - {'loss': 0.7695, 'grad_norm': 1.3069463968276978, 'learning_rate': 1.1407297086262276e-05, 'epoch': 1.41} +2025-02-05 20:24:44 - ERROR - stderr - 47%|████▋ | 10576/22434 [10:17:04<8:13:38, 2.50s/it] +2025-02-05 20:24:46 - ERROR - stderr - 47%|████▋ | 10577/22434 [10:17:06<8:15:08, 2.51s/it] +2025-02-05 20:24:46 - ERROR - stderr - +2025-02-05 20:24:46 - ERROR - stderr - +2025-02-05 20:24:46 - INFO - stdout - {'loss': 0.72, 'grad_norm': 1.2291260957717896, 'learning_rate': 1.1405867693064994e-05, 'epoch': 1.41} +2025-02-05 20:24:46 - ERROR - stderr - 47%|████▋ | 10577/22434 [10:17:06<8:15:08, 2.51s/it] +2025-02-05 20:24:49 - ERROR - stderr - 47%|████▋ | 10578/22434 [10:17:09<8:14:16, 2.50s/it] +2025-02-05 20:24:49 - ERROR - stderr - +2025-02-05 20:24:49 - ERROR - stderr - +2025-02-05 20:24:49 - INFO - stdout - {'loss': 0.665, 'grad_norm': 1.1886711120605469, 'learning_rate': 1.1404438270563744e-05, 'epoch': 1.41} +2025-02-05 20:24:49 - ERROR - stderr - 47%|████▋ | 10578/22434 [10:17:09<8:14:16, 2.50s/it] +2025-02-05 20:24:51 - ERROR - stderr - 47%|████▋ | 10579/22434 [10:17:11<8:16:43, 2.51s/it] +2025-02-05 20:24:52 - ERROR - stderr - +2025-02-05 20:24:52 - ERROR - stderr - +2025-02-05 20:24:52 - INFO - stdout - {'loss': 0.743, 'grad_norm': 1.3096486330032349, 'learning_rate': 1.1403008818788326e-05, 'epoch': 1.41} +2025-02-05 20:24:52 - ERROR - stderr - 47%|████▋ | 10579/22434 [10:17:11<8:16:43, 2.51s/it] +2025-02-05 20:24:54 - ERROR - stderr - 47%|████▋ | 10580/22434 [10:17:14<8:13:23, 2.50s/it] +2025-02-05 20:24:54 - ERROR - stderr - +2025-02-05 20:24:54 - ERROR - stderr - +2025-02-05 20:24:54 - INFO - stdout - {'loss': 0.7226, 'grad_norm': 1.1347885131835938, 'learning_rate': 1.1401579337768528e-05, 'epoch': 1.41} +2025-02-05 20:24:54 - ERROR - stderr - 47%|████▋ | 10580/22434 
[10:17:14<8:13:23, 2.50s/it] +2025-02-05 20:24:56 - ERROR - stderr - 47%|████▋ | 10581/22434 [10:17:16<8:10:44, 2.48s/it] +2025-02-05 20:24:56 - ERROR - stderr - +2025-02-05 20:24:56 - ERROR - stderr - +2025-02-05 20:24:56 - INFO - stdout - {'loss': 0.6237, 'grad_norm': 1.2470582723617554, 'learning_rate': 1.1400149827534154e-05, 'epoch': 1.41} +2025-02-05 20:24:56 - ERROR - stderr - 47%|████▋ | 10581/22434 [10:17:16<8:10:44, 2.48s/it] +2025-02-05 20:24:59 - ERROR - stderr - 47%|████▋ | 10582/22434 [10:17:19<8:10:37, 2.48s/it] +2025-02-05 20:24:59 - ERROR - stderr - +2025-02-05 20:24:59 - ERROR - stderr - +2025-02-05 20:24:59 - INFO - stdout - {'loss': 0.6454, 'grad_norm': 1.2683314085006714, 'learning_rate': 1.1398720288114992e-05, 'epoch': 1.42} +2025-02-05 20:24:59 - ERROR - stderr - 47%|████▋ | 10582/22434 [10:17:19<8:10:37, 2.48s/it] +2025-02-05 20:25:01 - ERROR - stderr - 47%|████▋ | 10583/22434 [10:17:21<8:10:36, 2.48s/it] +2025-02-05 20:25:01 - ERROR - stderr - +2025-02-05 20:25:01 - ERROR - stderr - +2025-02-05 20:25:01 - INFO - stdout - {'loss': 0.7271, 'grad_norm': 1.332045078277588, 'learning_rate': 1.1397290719540848e-05, 'epoch': 1.42} +2025-02-05 20:25:01 - ERROR - stderr - 47%|████▋ | 10583/22434 [10:17:21<8:10:36, 2.48s/it] +2025-02-05 20:25:04 - ERROR - stderr - 47%|████▋ | 10584/22434 [10:17:24<8:11:15, 2.49s/it] +2025-02-05 20:25:04 - ERROR - stderr - +2025-02-05 20:25:04 - ERROR - stderr - +2025-02-05 20:25:04 - INFO - stdout - {'loss': 0.7871, 'grad_norm': 1.2583454847335815, 'learning_rate': 1.1395861121841514e-05, 'epoch': 1.42} +2025-02-05 20:25:04 - ERROR - stderr - 47%|████▋ | 10584/22434 [10:17:24<8:11:15, 2.49s/it] +2025-02-05 20:25:06 - ERROR - stderr - 47%|████▋ | 10585/22434 [10:17:26<8:12:22, 2.49s/it] +2025-02-05 20:25:06 - ERROR - stderr - +2025-02-05 20:25:06 - ERROR - stderr - +2025-02-05 20:25:06 - INFO - stdout - {'loss': 0.7032, 'grad_norm': 1.2215043306350708, 'learning_rate': 1.1394431495046789e-05, 'epoch': 1.42} 
+2025-02-05 20:25:06 - ERROR - stderr - 47%|████▋ | 10585/22434 [10:17:26<8:12:22, 2.49s/it] +2025-02-05 20:25:09 - ERROR - stderr - 47%|████▋ | 10586/22434 [10:17:29<8:09:37, 2.48s/it] +2025-02-05 20:25:09 - ERROR - stderr - +2025-02-05 20:25:09 - ERROR - stderr - +2025-02-05 20:25:09 - INFO - stdout - {'loss': 0.7392, 'grad_norm': 1.1740665435791016, 'learning_rate': 1.1393001839186475e-05, 'epoch': 1.42} +2025-02-05 20:25:09 - ERROR - stderr - 47%|████▋ | 10586/22434 [10:17:29<8:09:37, 2.48s/it] +2025-02-05 20:25:11 - ERROR - stderr - 47%|████▋ | 10587/22434 [10:17:31<8:10:28, 2.48s/it] +2025-02-05 20:25:11 - ERROR - stderr - +2025-02-05 20:25:11 - ERROR - stderr - +2025-02-05 20:25:11 - INFO - stdout - {'loss': 0.7668, 'grad_norm': 1.2061184644699097, 'learning_rate': 1.1391572154290371e-05, 'epoch': 1.42} +2025-02-05 20:25:11 - ERROR - stderr - 47%|████▋ | 10587/22434 [10:17:31<8:10:28, 2.48s/it] +2025-02-05 20:25:14 - ERROR - stderr - 47%|████▋ | 10588/22434 [10:17:34<8:07:12, 2.47s/it] +2025-02-05 20:25:14 - ERROR - stderr - +2025-02-05 20:25:14 - ERROR - stderr - +2025-02-05 20:25:14 - INFO - stdout - {'loss': 0.656, 'grad_norm': 1.1267472505569458, 'learning_rate': 1.1390142440388277e-05, 'epoch': 1.42} +2025-02-05 20:25:14 - ERROR - stderr - 47%|████▋ | 10588/22434 [10:17:34<8:07:12, 2.47s/it] +2025-02-05 20:25:16 - ERROR - stderr - 47%|████▋ | 10589/22434 [10:17:36<8:07:33, 2.47s/it] +2025-02-05 20:25:16 - ERROR - stderr - +2025-02-05 20:25:16 - ERROR - stderr - +2025-02-05 20:25:16 - INFO - stdout - {'loss': 0.7215, 'grad_norm': 1.1811354160308838, 'learning_rate': 1.1388712697509997e-05, 'epoch': 1.42} +2025-02-05 20:25:16 - ERROR - stderr - 47%|████▋ | 10589/22434 [10:17:36<8:07:33, 2.47s/it] +2025-02-05 20:25:19 - ERROR - stderr - 47%|████▋ | 10590/22434 [10:17:39<8:10:53, 2.49s/it] +2025-02-05 20:25:19 - ERROR - stderr - +2025-02-05 20:25:19 - ERROR - stderr - +2025-02-05 20:25:19 - INFO - stdout - {'loss': 0.7037, 'grad_norm': 1.2539303302764893, 
'learning_rate': 1.1387282925685326e-05, 'epoch': 1.42} +2025-02-05 20:25:19 - ERROR - stderr - 47%|████▋ | 10590/22434 [10:17:39<8:10:53, 2.49s/it] +2025-02-05 20:25:21 - ERROR - stderr - 47%|████▋ | 10591/22434 [10:17:41<8:17:44, 2.52s/it] +2025-02-05 20:25:21 - ERROR - stderr - +2025-02-05 20:25:21 - ERROR - stderr - +2025-02-05 20:25:21 - INFO - stdout - {'loss': 0.798, 'grad_norm': 1.3207405805587769, 'learning_rate': 1.1385853124944069e-05, 'epoch': 1.42} +2025-02-05 20:25:21 - ERROR - stderr - 47%|████▋ | 10591/22434 [10:17:41<8:17:44, 2.52s/it] +2025-02-05 20:25:24 - ERROR - stderr - 47%|████▋ | 10592/22434 [10:17:44<8:14:25, 2.51s/it] +2025-02-05 20:25:24 - ERROR - stderr - +2025-02-05 20:25:24 - ERROR - stderr - +2025-02-05 20:25:24 - INFO - stdout - {'loss': 0.6604, 'grad_norm': 1.113406777381897, 'learning_rate': 1.138442329531603e-05, 'epoch': 1.42} +2025-02-05 20:25:24 - ERROR - stderr - 47%|████▋ | 10592/22434 [10:17:44<8:14:25, 2.51s/it] +2025-02-05 20:25:26 - ERROR - stderr - 47%|████▋ | 10593/22434 [10:17:46<8:19:39, 2.53s/it] +2025-02-05 20:25:26 - ERROR - stderr - +2025-02-05 20:25:26 - ERROR - stderr - +2025-02-05 20:25:26 - INFO - stdout - {'loss': 0.74, 'grad_norm': 1.1999109983444214, 'learning_rate': 1.1382993436831015e-05, 'epoch': 1.42} +2025-02-05 20:25:26 - ERROR - stderr - 47%|████▋ | 10593/22434 [10:17:46<8:19:39, 2.53s/it] +2025-02-05 20:25:29 - ERROR - stderr - 47%|████▋ | 10594/22434 [10:17:49<8:14:25, 2.51s/it] +2025-02-05 20:25:29 - ERROR - stderr - +2025-02-05 20:25:29 - ERROR - stderr - +2025-02-05 20:25:29 - INFO - stdout - {'loss': 0.7552, 'grad_norm': 1.1537039279937744, 'learning_rate': 1.1381563549518823e-05, 'epoch': 1.42} +2025-02-05 20:25:29 - ERROR - stderr - 47%|████▋ | 10594/22434 [10:17:49<8:14:25, 2.51s/it] +2025-02-05 20:25:31 - ERROR - stderr - 47%|████▋ | 10595/22434 [10:17:51<8:16:28, 2.52s/it] +2025-02-05 20:25:31 - ERROR - stderr - +2025-02-05 20:25:31 - ERROR - stderr - +2025-02-05 20:25:31 - INFO - stdout - 
{'loss': 0.7614, 'grad_norm': 1.0739426612854004, 'learning_rate': 1.1380133633409263e-05, 'epoch': 1.42} +2025-02-05 20:25:31 - ERROR - stderr - 47%|████▋ | 10595/22434 [10:17:51<8:16:28, 2.52s/it] +2025-02-05 20:25:34 - ERROR - stderr - 47%|████▋ | 10596/22434 [10:17:54<8:16:17, 2.52s/it] +2025-02-05 20:25:34 - ERROR - stderr - +2025-02-05 20:25:34 - ERROR - stderr - +2025-02-05 20:25:34 - INFO - stdout - {'loss': 0.6624, 'grad_norm': 1.1287152767181396, 'learning_rate': 1.1378703688532136e-05, 'epoch': 1.42} +2025-02-05 20:25:34 - ERROR - stderr - 47%|████▋ | 10596/22434 [10:17:54<8:16:17, 2.52s/it] +2025-02-05 20:25:36 - ERROR - stderr - 47%|████▋ | 10597/22434 [10:17:56<8:20:44, 2.54s/it] +2025-02-05 20:25:37 - ERROR - stderr - +2025-02-05 20:25:37 - ERROR - stderr - +2025-02-05 20:25:37 - INFO - stdout - {'loss': 0.59, 'grad_norm': 0.9782724380493164, 'learning_rate': 1.1377273714917249e-05, 'epoch': 1.42} +2025-02-05 20:25:37 - ERROR - stderr - 47%|████▋ | 10597/22434 [10:17:56<8:20:44, 2.54s/it] +2025-02-05 20:25:39 - ERROR - stderr - 47%|████▋ | 10598/22434 [10:17:59<8:19:49, 2.53s/it] +2025-02-05 20:25:39 - ERROR - stderr - +2025-02-05 20:25:39 - ERROR - stderr - +2025-02-05 20:25:39 - INFO - stdout - {'loss': 0.722, 'grad_norm': 1.2062641382217407, 'learning_rate': 1.1375843712594412e-05, 'epoch': 1.42} +2025-02-05 20:25:39 - ERROR - stderr - 47%|████▋ | 10598/22434 [10:17:59<8:19:49, 2.53s/it] +2025-02-05 20:25:42 - ERROR - stderr - 47%|████▋ | 10599/22434 [10:18:01<8:28:19, 2.58s/it] +2025-02-05 20:25:42 - ERROR - stderr - +2025-02-05 20:25:42 - ERROR - stderr - +2025-02-05 20:25:42 - INFO - stdout - {'loss': 0.6641, 'grad_norm': 1.1900321245193481, 'learning_rate': 1.1374413681593428e-05, 'epoch': 1.42} +2025-02-05 20:25:42 - ERROR - stderr - 47%|████▋ | 10599/22434 [10:18:02<8:28:19, 2.58s/it] +2025-02-05 20:25:44 - ERROR - stderr - 47%|████▋ | 10600/22434 [10:18:04<8:33:29, 2.60s/it] +2025-02-05 20:25:44 - ERROR - stderr - +2025-02-05 20:25:44 - 
ERROR - stderr - +2025-02-05 20:25:44 - INFO - stdout - {'loss': 0.7572, 'grad_norm': 1.223484992980957, 'learning_rate': 1.1372983621944105e-05, 'epoch': 1.42} +2025-02-05 20:25:44 - ERROR - stderr - 47%|████▋ | 10600/22434 [10:18:04<8:33:29, 2.60s/it] +2025-02-05 20:25:47 - ERROR - stderr - 47%|████▋ | 10601/22434 [10:18:07<8:36:04, 2.62s/it] +2025-02-05 20:25:47 - ERROR - stderr - +2025-02-05 20:25:47 - ERROR - stderr - +2025-02-05 20:25:47 - INFO - stdout - {'loss': 0.7681, 'grad_norm': 1.2040348052978516, 'learning_rate': 1.1371553533676255e-05, 'epoch': 1.42} +2025-02-05 20:25:47 - ERROR - stderr - 47%|████▋ | 10601/22434 [10:18:07<8:36:04, 2.62s/it] +2025-02-05 20:25:50 - ERROR - stderr - 47%|████▋ | 10602/22434 [10:18:09<8:30:46, 2.59s/it] +2025-02-05 20:25:50 - ERROR - stderr - +2025-02-05 20:25:50 - ERROR - stderr - +2025-02-05 20:25:50 - INFO - stdout - {'loss': 0.6975, 'grad_norm': 1.2161526679992676, 'learning_rate': 1.1370123416819683e-05, 'epoch': 1.42} +2025-02-05 20:25:50 - ERROR - stderr - 47%|████▋ | 10602/22434 [10:18:09<8:30:46, 2.59s/it] +2025-02-05 20:25:52 - ERROR - stderr - 47%|████▋ | 10603/22434 [10:18:12<8:21:33, 2.54s/it] +2025-02-05 20:25:52 - ERROR - stderr - +2025-02-05 20:25:52 - ERROR - stderr - +2025-02-05 20:25:52 - INFO - stdout - {'loss': 0.7875, 'grad_norm': 1.2322206497192383, 'learning_rate': 1.1368693271404199e-05, 'epoch': 1.42} +2025-02-05 20:25:52 - ERROR - stderr - 47%|████▋ | 10603/22434 [10:18:12<8:21:33, 2.54s/it] +2025-02-05 20:25:54 - ERROR - stderr - 47%|████▋ | 10604/22434 [10:18:14<8:15:32, 2.51s/it] +2025-02-05 20:25:54 - ERROR - stderr - +2025-02-05 20:25:54 - ERROR - stderr - +2025-02-05 20:25:54 - INFO - stdout - {'loss': 0.7053, 'grad_norm': 1.1602782011032104, 'learning_rate': 1.1367263097459612e-05, 'epoch': 1.42} +2025-02-05 20:25:54 - ERROR - stderr - 47%|████▋ | 10604/22434 [10:18:14<8:15:32, 2.51s/it] +2025-02-05 20:25:57 - ERROR - stderr - 47%|████▋ | 10605/22434 [10:18:17<8:18:47, 2.53s/it] 
+2025-02-05 20:25:57 - ERROR - stderr - +2025-02-05 20:25:57 - ERROR - stderr - +2025-02-05 20:25:57 - INFO - stdout - {'loss': 0.7326, 'grad_norm': 1.2035529613494873, 'learning_rate': 1.1365832895015735e-05, 'epoch': 1.42} +2025-02-05 20:25:57 - ERROR - stderr - 47%|████▋ | 10605/22434 [10:18:17<8:18:47, 2.53s/it] +2025-02-05 20:25:59 - ERROR - stderr - 47%|████▋ | 10606/22434 [10:18:19<8:12:33, 2.50s/it] +2025-02-05 20:25:59 - ERROR - stderr - +2025-02-05 20:25:59 - ERROR - stderr - +2025-02-05 20:25:59 - INFO - stdout - {'loss': 0.8074, 'grad_norm': 1.2299703359603882, 'learning_rate': 1.1364402664102379e-05, 'epoch': 1.42} +2025-02-05 20:25:59 - ERROR - stderr - 47%|████▋ | 10606/22434 [10:18:19<8:12:33, 2.50s/it] +2025-02-05 20:26:02 - ERROR - stderr - 47%|████▋ | 10607/22434 [10:18:22<8:10:49, 2.49s/it] +2025-02-05 20:26:02 - ERROR - stderr - +2025-02-05 20:26:02 - ERROR - stderr - +2025-02-05 20:26:02 - INFO - stdout - {'loss': 0.6562, 'grad_norm': 1.166016697883606, 'learning_rate': 1.1362972404749355e-05, 'epoch': 1.42} +2025-02-05 20:26:02 - ERROR - stderr - 47%|████▋ | 10607/22434 [10:18:22<8:10:49, 2.49s/it] +2025-02-05 20:26:04 - ERROR - stderr - 47%|████▋ | 10608/22434 [10:18:24<8:17:15, 2.52s/it] +2025-02-05 20:26:05 - ERROR - stderr - +2025-02-05 20:26:05 - ERROR - stderr - +2025-02-05 20:26:05 - INFO - stdout - {'loss': 0.7447, 'grad_norm': 1.2602362632751465, 'learning_rate': 1.1361542116986474e-05, 'epoch': 1.42} +2025-02-05 20:26:05 - ERROR - stderr - 47%|████▋ | 10608/22434 [10:18:24<8:17:15, 2.52s/it] +2025-02-05 20:26:07 - ERROR - stderr - 47%|████▋ | 10609/22434 [10:18:27<8:12:19, 2.50s/it] +2025-02-05 20:26:07 - ERROR - stderr - +2025-02-05 20:26:07 - ERROR - stderr - +2025-02-05 20:26:07 - INFO - stdout - {'loss': 0.5786, 'grad_norm': 1.1421501636505127, 'learning_rate': 1.1360111800843555e-05, 'epoch': 1.42} +2025-02-05 20:26:07 - ERROR - stderr - 47%|████▋ | 10609/22434 [10:18:27<8:12:19, 2.50s/it] +2025-02-05 20:26:09 - ERROR - stderr 
- 47%|████▋ | 10610/22434 [10:18:29<8:09:15, 2.48s/it] +2025-02-05 20:26:09 - ERROR - stderr - +2025-02-05 20:26:09 - ERROR - stderr - +2025-02-05 20:26:09 - INFO - stdout - {'loss': 0.7147, 'grad_norm': 1.200738787651062, 'learning_rate': 1.13586814563504e-05, 'epoch': 1.42} +2025-02-05 20:26:09 - ERROR - stderr - 47%|████▋ | 10610/22434 [10:18:29<8:09:15, 2.48s/it] +2025-02-05 20:26:12 - ERROR - stderr - 47%|████▋ | 10611/22434 [10:18:32<8:13:19, 2.50s/it] +2025-02-05 20:26:12 - ERROR - stderr - +2025-02-05 20:26:12 - ERROR - stderr - +2025-02-05 20:26:12 - INFO - stdout - {'loss': 0.5874, 'grad_norm': 1.1282882690429688, 'learning_rate': 1.1357251083536834e-05, 'epoch': 1.42} +2025-02-05 20:26:12 - ERROR - stderr - 47%|████▋ | 10611/22434 [10:18:32<8:13:19, 2.50s/it] +2025-02-05 20:26:14 - ERROR - stderr - 47%|████▋ | 10612/22434 [10:18:34<8:14:13, 2.51s/it] +2025-02-05 20:26:14 - ERROR - stderr - +2025-02-05 20:26:14 - ERROR - stderr - +2025-02-05 20:26:14 - INFO - stdout - {'loss': 0.6684, 'grad_norm': 1.213294506072998, 'learning_rate': 1.1355820682432667e-05, 'epoch': 1.42} +2025-02-05 20:26:14 - ERROR - stderr - 47%|████▋ | 10612/22434 [10:18:34<8:14:13, 2.51s/it] +2025-02-05 20:26:17 - ERROR - stderr - 47%|████▋ | 10613/22434 [10:18:37<8:22:59, 2.55s/it] +2025-02-05 20:26:17 - ERROR - stderr - +2025-02-05 20:26:17 - ERROR - stderr - +2025-02-05 20:26:17 - INFO - stdout - {'loss': 0.7514, 'grad_norm': 1.3889409303665161, 'learning_rate': 1.1354390253067717e-05, 'epoch': 1.42} +2025-02-05 20:26:17 - ERROR - stderr - 47%|████▋ | 10613/22434 [10:18:37<8:22:59, 2.55s/it] +2025-02-05 20:26:20 - ERROR - stderr - 47%|████▋ | 10614/22434 [10:18:40<8:32:32, 2.60s/it] +2025-02-05 20:26:20 - ERROR - stderr - +2025-02-05 20:26:20 - ERROR - stderr - +2025-02-05 20:26:20 - INFO - stdout - {'loss': 0.6829, 'grad_norm': 1.1848816871643066, 'learning_rate': 1.1352959795471798e-05, 'epoch': 1.42} +2025-02-05 20:26:20 - ERROR - stderr - 47%|████▋ | 10614/22434 
[10:18:40<8:32:32, 2.60s/it] +2025-02-05 20:26:22 - ERROR - stderr - 47%|████▋ | 10615/22434 [10:18:42<8:30:46, 2.59s/it] +2025-02-05 20:26:22 - ERROR - stderr - +2025-02-05 20:26:22 - ERROR - stderr - +2025-02-05 20:26:22 - INFO - stdout - {'loss': 0.7397, 'grad_norm': 1.3111480474472046, 'learning_rate': 1.1351529309674724e-05, 'epoch': 1.42} +2025-02-05 20:26:22 - ERROR - stderr - 47%|████▋ | 10615/22434 [10:18:42<8:30:46, 2.59s/it] +2025-02-05 20:26:25 - ERROR - stderr - 47%|████▋ | 10616/22434 [10:18:45<8:22:52, 2.55s/it] +2025-02-05 20:26:25 - ERROR - stderr - +2025-02-05 20:26:25 - ERROR - stderr - +2025-02-05 20:26:25 - INFO - stdout - {'loss': 0.7037, 'grad_norm': 1.2366198301315308, 'learning_rate': 1.1350098795706316e-05, 'epoch': 1.42} +2025-02-05 20:26:25 - ERROR - stderr - 47%|████▋ | 10616/22434 [10:18:45<8:22:52, 2.55s/it] +2025-02-05 20:26:27 - ERROR - stderr - 47%|████▋ | 10617/22434 [10:18:47<8:19:32, 2.54s/it] +2025-02-05 20:26:27 - ERROR - stderr - +2025-02-05 20:26:27 - ERROR - stderr - +2025-02-05 20:26:27 - INFO - stdout - {'loss': 0.7101, 'grad_norm': 1.172921895980835, 'learning_rate': 1.1348668253596394e-05, 'epoch': 1.42} +2025-02-05 20:26:27 - ERROR - stderr - 47%|████▋ | 10617/22434 [10:18:47<8:19:32, 2.54s/it] +2025-02-05 20:26:30 - ERROR - stderr - 47%|████▋ | 10618/22434 [10:18:50<8:16:50, 2.52s/it] +2025-02-05 20:26:30 - ERROR - stderr - +2025-02-05 20:26:30 - ERROR - stderr - +2025-02-05 20:26:30 - INFO - stdout - {'loss': 0.6365, 'grad_norm': 1.1612149477005005, 'learning_rate': 1.1347237683374767e-05, 'epoch': 1.42} +2025-02-05 20:26:30 - ERROR - stderr - 47%|████▋ | 10618/22434 [10:18:50<8:16:50, 2.52s/it] +2025-02-05 20:26:32 - ERROR - stderr - 47%|████▋ | 10619/22434 [10:18:52<8:18:43, 2.53s/it] +2025-02-05 20:26:32 - ERROR - stderr - +2025-02-05 20:26:32 - ERROR - stderr - +2025-02-05 20:26:32 - INFO - stdout - {'loss': 0.7582, 'grad_norm': 1.3267265558242798, 'learning_rate': 1.1345807085071263e-05, 'epoch': 1.42} 
+2025-02-05 20:26:32 - ERROR - stderr - 47%|████▋ | 10619/22434 [10:18:52<8:18:43, 2.53s/it] +2025-02-05 20:26:35 - ERROR - stderr - 47%|████▋ | 10620/22434 [10:18:55<8:15:49, 2.52s/it] +2025-02-05 20:26:35 - ERROR - stderr - +2025-02-05 20:26:35 - ERROR - stderr - +2025-02-05 20:26:35 - INFO - stdout - {'loss': 0.7186, 'grad_norm': 1.2563718557357788, 'learning_rate': 1.1344376458715697e-05, 'epoch': 1.42} +2025-02-05 20:26:35 - ERROR - stderr - 47%|████▋ | 10620/22434 [10:18:55<8:15:49, 2.52s/it] +2025-02-05 20:26:37 - ERROR - stderr - 47%|████▋ | 10621/22434 [10:18:57<8:12:23, 2.50s/it] +2025-02-05 20:26:37 - ERROR - stderr - +2025-02-05 20:26:37 - ERROR - stderr - +2025-02-05 20:26:37 - INFO - stdout - {'loss': 0.7981, 'grad_norm': 1.2076612710952759, 'learning_rate': 1.134294580433789e-05, 'epoch': 1.42} +2025-02-05 20:26:37 - ERROR - stderr - 47%|████▋ | 10621/22434 [10:18:57<8:12:23, 2.50s/it] +2025-02-05 20:26:40 - ERROR - stderr - 47%|████▋ | 10622/22434 [10:19:00<8:10:50, 2.49s/it] +2025-02-05 20:26:40 - ERROR - stderr - +2025-02-05 20:26:40 - ERROR - stderr - +2025-02-05 20:26:40 - INFO - stdout - {'loss': 0.7727, 'grad_norm': 1.1642730236053467, 'learning_rate': 1.1341515121967666e-05, 'epoch': 1.42} +2025-02-05 20:26:40 - ERROR - stderr - 47%|████▋ | 10622/22434 [10:19:00<8:10:50, 2.49s/it] +2025-02-05 20:26:42 - ERROR - stderr - 47%|████▋ | 10623/22434 [10:19:02<8:07:33, 2.48s/it] +2025-02-05 20:26:42 - ERROR - stderr - +2025-02-05 20:26:42 - ERROR - stderr - +2025-02-05 20:26:42 - INFO - stdout - {'loss': 0.5688, 'grad_norm': 1.108965277671814, 'learning_rate': 1.1340084411634839e-05, 'epoch': 1.42} +2025-02-05 20:26:42 - ERROR - stderr - 47%|████▋ | 10623/22434 [10:19:02<8:07:33, 2.48s/it] +2025-02-05 20:26:45 - ERROR - stderr - 47%|████▋ | 10624/22434 [10:19:05<8:11:18, 2.50s/it] +2025-02-05 20:26:45 - ERROR - stderr - +2025-02-05 20:26:45 - ERROR - stderr - +2025-02-05 20:26:45 - INFO - stdout - {'loss': 0.735, 'grad_norm': 1.247994065284729, 
'learning_rate': 1.1338653673369235e-05, 'epoch': 1.42} +2025-02-05 20:26:45 - ERROR - stderr - 47%|████▋ | 10624/22434 [10:19:05<8:11:18, 2.50s/it] +2025-02-05 20:26:47 - ERROR - stderr - 47%|████▋ | 10625/22434 [10:19:07<8:15:38, 2.52s/it] +2025-02-05 20:26:47 - ERROR - stderr - +2025-02-05 20:26:47 - ERROR - stderr - +2025-02-05 20:26:47 - INFO - stdout - {'loss': 0.6724, 'grad_norm': 1.289122462272644, 'learning_rate': 1.1337222907200678e-05, 'epoch': 1.42} +2025-02-05 20:26:47 - ERROR - stderr - 47%|████▋ | 10625/22434 [10:19:07<8:15:38, 2.52s/it] +2025-02-05 20:26:50 - ERROR - stderr - 47%|████▋ | 10626/22434 [10:19:10<8:19:23, 2.54s/it] +2025-02-05 20:26:50 - ERROR - stderr - +2025-02-05 20:26:50 - ERROR - stderr - +2025-02-05 20:26:50 - INFO - stdout - {'loss': 0.7486, 'grad_norm': 1.221063256263733, 'learning_rate': 1.133579211315899e-05, 'epoch': 1.42} +2025-02-05 20:26:50 - ERROR - stderr - 47%|████▋ | 10626/22434 [10:19:10<8:19:23, 2.54s/it] +2025-02-05 20:26:52 - ERROR - stderr - 47%|████▋ | 10627/22434 [10:19:12<8:18:52, 2.54s/it] +2025-02-05 20:26:53 - ERROR - stderr - +2025-02-05 20:26:53 - ERROR - stderr - +2025-02-05 20:26:53 - INFO - stdout - {'loss': 0.7797, 'grad_norm': 1.2657474279403687, 'learning_rate': 1.1334361291273991e-05, 'epoch': 1.42} +2025-02-05 20:26:53 - ERROR - stderr - 47%|████▋ | 10627/22434 [10:19:12<8:18:52, 2.54s/it] +2025-02-05 20:26:55 - ERROR - stderr - 47%|████▋ | 10628/22434 [10:19:15<8:12:16, 2.50s/it] +2025-02-05 20:26:55 - ERROR - stderr - +2025-02-05 20:26:55 - ERROR - stderr - +2025-02-05 20:26:55 - INFO - stdout - {'loss': 0.7377, 'grad_norm': 1.3419336080551147, 'learning_rate': 1.1332930441575509e-05, 'epoch': 1.42} +2025-02-05 20:26:55 - ERROR - stderr - 47%|████▋ | 10628/22434 [10:19:15<8:12:16, 2.50s/it] +2025-02-05 20:26:57 - ERROR - stderr - 47%|████▋ | 10629/22434 [10:19:17<8:11:27, 2.50s/it] +2025-02-05 20:26:57 - ERROR - stderr - +2025-02-05 20:26:57 - ERROR - stderr - +2025-02-05 20:26:57 - INFO - stdout 
- {'loss': 0.6662, 'grad_norm': 1.0919914245605469, 'learning_rate': 1.1331499564093369e-05, 'epoch': 1.42} +2025-02-05 20:26:57 - ERROR - stderr - 47%|████▋ | 10629/22434 [10:19:17<8:11:27, 2.50s/it] +2025-02-05 20:27:00 - ERROR - stderr - 47%|████▋ | 10630/22434 [10:19:20<8:06:29, 2.47s/it] +2025-02-05 20:27:00 - ERROR - stderr - +2025-02-05 20:27:00 - ERROR - stderr - +2025-02-05 20:27:00 - INFO - stdout - {'loss': 0.71, 'grad_norm': 1.4153435230255127, 'learning_rate': 1.1330068658857391e-05, 'epoch': 1.42} +2025-02-05 20:27:00 - ERROR - stderr - 47%|████▋ | 10630/22434 [10:19:20<8:06:29, 2.47s/it] +2025-02-05 20:27:02 - ERROR - stderr - 47%|████▋ | 10631/22434 [10:19:22<8:09:20, 2.49s/it] +2025-02-05 20:27:02 - ERROR - stderr - +2025-02-05 20:27:02 - ERROR - stderr - +2025-02-05 20:27:02 - INFO - stdout - {'loss': 0.7355, 'grad_norm': 1.184780240058899, 'learning_rate': 1.1328637725897407e-05, 'epoch': 1.42} +2025-02-05 20:27:02 - ERROR - stderr - 47%|████▋ | 10631/22434 [10:19:22<8:09:20, 2.49s/it] +2025-02-05 20:27:05 - ERROR - stderr - 47%|████▋ | 10632/22434 [10:19:25<8:11:54, 2.50s/it] +2025-02-05 20:27:05 - ERROR - stderr - +2025-02-05 20:27:05 - ERROR - stderr - +2025-02-05 20:27:05 - INFO - stdout - {'loss': 0.6934, 'grad_norm': 1.1008286476135254, 'learning_rate': 1.132720676524324e-05, 'epoch': 1.42} +2025-02-05 20:27:05 - ERROR - stderr - 47%|████▋ | 10632/22434 [10:19:25<8:11:54, 2.50s/it] +2025-02-05 20:27:07 - ERROR - stderr - 47%|████▋ | 10633/22434 [10:19:27<8:19:09, 2.54s/it] +2025-02-05 20:27:08 - ERROR - stderr - +2025-02-05 20:27:08 - ERROR - stderr - +2025-02-05 20:27:08 - INFO - stdout - {'loss': 0.6831, 'grad_norm': 1.0425958633422852, 'learning_rate': 1.1325775776924719e-05, 'epoch': 1.42} +2025-02-05 20:27:08 - ERROR - stderr - 47%|████▋ | 10633/22434 [10:19:27<8:19:09, 2.54s/it] +2025-02-05 20:27:10 - ERROR - stderr - 47%|████▋ | 10634/22434 [10:19:30<8:21:21, 2.55s/it] +2025-02-05 20:27:10 - ERROR - stderr - +2025-02-05 20:27:10 - 
ERROR - stderr - +2025-02-05 20:27:10 - INFO - stdout - {'loss': 0.6018, 'grad_norm': 1.1017608642578125, 'learning_rate': 1.132434476097167e-05, 'epoch': 1.42} +2025-02-05 20:27:10 - ERROR - stderr - 47%|████▋ | 10634/22434 [10:19:30<8:21:21, 2.55s/it] +2025-02-05 20:27:13 - ERROR - stderr - 47%|████▋ | 10635/22434 [10:19:32<8:20:33, 2.55s/it] +2025-02-05 20:27:13 - ERROR - stderr - +2025-02-05 20:27:13 - ERROR - stderr - +2025-02-05 20:27:13 - INFO - stdout - {'loss': 0.6858, 'grad_norm': 1.0533000230789185, 'learning_rate': 1.1322913717413923e-05, 'epoch': 1.42} +2025-02-05 20:27:13 - ERROR - stderr - 47%|████▋ | 10635/22434 [10:19:32<8:20:33, 2.55s/it] +2025-02-05 20:27:15 - ERROR - stderr - 47%|████▋ | 10636/22434 [10:19:35<8:15:49, 2.52s/it] +2025-02-05 20:27:15 - ERROR - stderr - +2025-02-05 20:27:15 - ERROR - stderr - +2025-02-05 20:27:15 - INFO - stdout - {'loss': 0.6656, 'grad_norm': 1.2206239700317383, 'learning_rate': 1.1321482646281301e-05, 'epoch': 1.42} +2025-02-05 20:27:15 - ERROR - stderr - 47%|████▋ | 10636/22434 [10:19:35<8:15:49, 2.52s/it] +2025-02-05 20:27:18 - ERROR - stderr - 47%|████▋ | 10637/22434 [10:19:37<8:19:34, 2.54s/it] +2025-02-05 20:27:18 - ERROR - stderr - +2025-02-05 20:27:18 - ERROR - stderr - +2025-02-05 20:27:18 - INFO - stdout - {'loss': 0.6984, 'grad_norm': 1.127484917640686, 'learning_rate': 1.132005154760364e-05, 'epoch': 1.42} +2025-02-05 20:27:18 - ERROR - stderr - 47%|████▋ | 10637/22434 [10:19:37<8:19:34, 2.54s/it] +2025-02-05 20:27:20 - ERROR - stderr - 47%|████▋ | 10638/22434 [10:19:40<8:18:28, 2.54s/it] +2025-02-05 20:27:20 - ERROR - stderr - +2025-02-05 20:27:20 - ERROR - stderr - +2025-02-05 20:27:20 - INFO - stdout - {'loss': 0.791, 'grad_norm': 1.2618334293365479, 'learning_rate': 1.1318620421410773e-05, 'epoch': 1.42} +2025-02-05 20:27:20 - ERROR - stderr - 47%|████▋ | 10638/22434 [10:19:40<8:18:28, 2.54s/it] +2025-02-05 20:27:23 - ERROR - stderr - 47%|████▋ | 10639/22434 [10:19:42<8:15:51, 2.52s/it] +2025-02-05 
20:27:23 - ERROR - stderr - +2025-02-05 20:27:23 - ERROR - stderr - +2025-02-05 20:27:23 - INFO - stdout - {'loss': 0.6387, 'grad_norm': 1.209006667137146, 'learning_rate': 1.131718926773252e-05, 'epoch': 1.42} +2025-02-05 20:27:23 - ERROR - stderr - 47%|████▋ | 10639/22434 [10:19:42<8:15:51, 2.52s/it] +2025-02-05 20:27:25 - ERROR - stderr - 47%|████▋ | 10640/22434 [10:19:45<8:27:24, 2.58s/it] +2025-02-05 20:27:25 - ERROR - stderr - +2025-02-05 20:27:25 - ERROR - stderr - +2025-02-05 20:27:25 - INFO - stdout - {'loss': 0.826, 'grad_norm': 1.3916287422180176, 'learning_rate': 1.1315758086598717e-05, 'epoch': 1.42} +2025-02-05 20:27:25 - ERROR - stderr - 47%|████▋ | 10640/22434 [10:19:45<8:27:24, 2.58s/it] +2025-02-05 20:27:28 - ERROR - stderr - 47%|████▋ | 10641/22434 [10:19:48<8:21:51, 2.55s/it] +2025-02-05 20:27:28 - ERROR - stderr - +2025-02-05 20:27:28 - ERROR - stderr - +2025-02-05 20:27:28 - INFO - stdout - {'loss': 0.7687, 'grad_norm': 1.3079981803894043, 'learning_rate': 1.1314326878039197e-05, 'epoch': 1.42} +2025-02-05 20:27:28 - ERROR - stderr - 47%|████▋ | 10641/22434 [10:19:48<8:21:51, 2.55s/it] +2025-02-05 20:27:31 - ERROR - stderr - 47%|████▋ | 10642/22434 [10:19:51<8:40:59, 2.65s/it] +2025-02-05 20:27:31 - ERROR - stderr - +2025-02-05 20:27:31 - ERROR - stderr - +2025-02-05 20:27:31 - INFO - stdout - {'loss': 0.7113, 'grad_norm': 1.3276675939559937, 'learning_rate': 1.1312895642083789e-05, 'epoch': 1.42} +2025-02-05 20:27:31 - ERROR - stderr - 47%|████▋ | 10642/22434 [10:19:51<8:40:59, 2.65s/it] +2025-02-05 20:27:33 - ERROR - stderr - 47%|████▋ | 10643/22434 [10:19:53<8:27:47, 2.58s/it] +2025-02-05 20:27:33 - ERROR - stderr - +2025-02-05 20:27:33 - ERROR - stderr - +2025-02-05 20:27:33 - INFO - stdout - {'loss': 0.6985, 'grad_norm': 1.2755855321884155, 'learning_rate': 1.1311464378762329e-05, 'epoch': 1.42} +2025-02-05 20:27:33 - ERROR - stderr - 47%|████▋ | 10643/22434 [10:19:53<8:27:47, 2.58s/it] +2025-02-05 20:27:36 - ERROR - stderr - 47%|████▋ | 
10644/22434 [10:19:55<8:22:06, 2.56s/it] +2025-02-05 20:27:36 - ERROR - stderr - +2025-02-05 20:27:36 - ERROR - stderr - +2025-02-05 20:27:36 - INFO - stdout - {'loss': 0.7106, 'grad_norm': 1.199749231338501, 'learning_rate': 1.1310033088104649e-05, 'epoch': 1.42} +2025-02-05 20:27:36 - ERROR - stderr - 47%|████▋ | 10644/22434 [10:19:55<8:22:06, 2.56s/it] +2025-02-05 20:27:38 - ERROR - stderr - 47%|████▋ | 10645/22434 [10:19:58<8:18:46, 2.54s/it] +2025-02-05 20:27:38 - ERROR - stderr - +2025-02-05 20:27:38 - ERROR - stderr - +2025-02-05 20:27:38 - INFO - stdout - {'loss': 0.7697, 'grad_norm': 1.5804774761199951, 'learning_rate': 1.1308601770140584e-05, 'epoch': 1.42} +2025-02-05 20:27:38 - ERROR - stderr - 47%|████▋ | 10645/22434 [10:19:58<8:18:46, 2.54s/it] +2025-02-05 20:27:41 - ERROR - stderr - 47%|████▋ | 10646/22434 [10:20:00<8:14:51, 2.52s/it] +2025-02-05 20:27:41 - ERROR - stderr - +2025-02-05 20:27:41 - ERROR - stderr - +2025-02-05 20:27:41 - INFO - stdout - {'loss': 0.6212, 'grad_norm': 1.0354292392730713, 'learning_rate': 1.1307170424899967e-05, 'epoch': 1.42} +2025-02-05 20:27:41 - ERROR - stderr - 47%|████▋ | 10646/22434 [10:20:00<8:14:51, 2.52s/it] +2025-02-05 20:27:43 - ERROR - stderr - 47%|████▋ | 10647/22434 [10:20:03<8:14:47, 2.52s/it] +2025-02-05 20:27:43 - ERROR - stderr - +2025-02-05 20:27:43 - ERROR - stderr - +2025-02-05 20:27:43 - INFO - stdout - {'loss': 0.7717, 'grad_norm': 1.1773607730865479, 'learning_rate': 1.1305739052412633e-05, 'epoch': 1.42} +2025-02-05 20:27:43 - ERROR - stderr - 47%|████▋ | 10647/22434 [10:20:03<8:14:47, 2.52s/it] +2025-02-05 20:27:43 - WARNING - transformers.tokenization_utils_base - Token indices sequence length is longer than the specified maximum sequence length for this model (2736 > 2048). 
Running this sequence through the model will result in indexing errors +2025-02-05 20:27:43 - WARNING - transformers.tokenization_utils_base - Token indices sequence length is longer than the specified maximum sequence length for this model (2736 > 2048). Running this sequence through the model will result in indexing errors +2025-02-05 20:27:46 - ERROR - stderr - 47%|████▋ | 10648/22434 [10:20:05<8:16:45, 2.53s/it] +2025-02-05 20:27:46 - ERROR - stderr - +2025-02-05 20:27:46 - ERROR - stderr - +2025-02-05 20:27:46 - INFO - stdout - {'loss': 0.6524, 'grad_norm': 1.0961626768112183, 'learning_rate': 1.1304307652708417e-05, 'epoch': 1.42} +2025-02-05 20:27:46 - ERROR - stderr - 47%|████▋ | 10648/22434 [10:20:06<8:16:45, 2.53s/it] +2025-02-05 20:27:51 - ERROR - stderr - 47%|████▋ | 10649/22434 [10:20:11<11:24:36, 3.49s/it] +2025-02-05 20:27:51 - ERROR - stderr - +2025-02-05 20:27:51 - ERROR - stderr - +2025-02-05 20:27:51 - INFO - stdout - {'loss': 0.6265, 'grad_norm': 1.0505746603012085, 'learning_rate': 1.1302876225817155e-05, 'epoch': 1.42} +2025-02-05 20:27:51 - ERROR - stderr - 47%|████▋ | 10649/22434 [10:20:11<11:24:36, 3.49s/it] +2025-02-05 20:27:54 - ERROR - stderr - 47%|████▋ | 10650/22434 [10:20:14<10:29:45, 3.21s/it] +2025-02-05 20:27:54 - ERROR - stderr - +2025-02-05 20:27:54 - ERROR - stderr - +2025-02-05 20:27:54 - INFO - stdout - {'loss': 0.8076, 'grad_norm': 1.3203015327453613, 'learning_rate': 1.1301444771768686e-05, 'epoch': 1.42} +2025-02-05 20:27:54 - ERROR - stderr - 47%|████▋ | 10650/22434 [10:20:14<10:29:45, 3.21s/it] +2025-02-05 20:27:56 - ERROR - stderr - 47%|████▋ | 10651/22434 [10:20:16<9:45:41, 2.98s/it] +2025-02-05 20:27:56 - ERROR - stderr - +2025-02-05 20:27:56 - ERROR - stderr - +2025-02-05 20:27:56 - INFO - stdout - {'loss': 0.6649, 'grad_norm': 1.0896475315093994, 'learning_rate': 1.1300013290592846e-05, 'epoch': 1.42} +2025-02-05 20:27:56 - ERROR - stderr - 47%|████▋ | 10651/22434 [10:20:16<9:45:41, 2.98s/it] +2025-02-05 20:27:59 - 
ERROR - stderr - 47%|████▋ | 10652/22434 [10:20:19<9:21:01, 2.86s/it] +2025-02-05 20:27:59 - ERROR - stderr - +2025-02-05 20:27:59 - ERROR - stderr - +2025-02-05 20:27:59 - INFO - stdout - {'loss': 0.729, 'grad_norm': 1.1877771615982056, 'learning_rate': 1.1298581782319473e-05, 'epoch': 1.42} +2025-02-05 20:27:59 - ERROR - stderr - 47%|████▋ | 10652/22434 [10:20:19<9:21:01, 2.86s/it] +2025-02-05 20:28:02 - ERROR - stderr - 47%|████▋ | 10653/22434 [10:20:21<9:09:29, 2.80s/it] +2025-02-05 20:28:02 - ERROR - stderr - +2025-02-05 20:28:02 - ERROR - stderr - +2025-02-05 20:28:02 - INFO - stdout - {'loss': 0.7031, 'grad_norm': 1.1570836305618286, 'learning_rate': 1.1297150246978406e-05, 'epoch': 1.42} +2025-02-05 20:28:02 - ERROR - stderr - 47%|████▋ | 10653/22434 [10:20:21<9:09:29, 2.80s/it] +2025-02-05 20:28:04 - ERROR - stderr - 47%|████▋ | 10654/22434 [10:20:24<8:49:53, 2.70s/it] +2025-02-05 20:28:04 - ERROR - stderr - +2025-02-05 20:28:04 - ERROR - stderr - +2025-02-05 20:28:04 - INFO - stdout - {'loss': 0.7046, 'grad_norm': 1.1245529651641846, 'learning_rate': 1.1295718684599486e-05, 'epoch': 1.42} +2025-02-05 20:28:04 - ERROR - stderr - 47%|████▋ | 10654/22434 [10:20:24<8:49:53, 2.70s/it] +2025-02-05 20:28:07 - ERROR - stderr - 47%|████▋ | 10655/22434 [10:20:26<8:40:04, 2.65s/it] +2025-02-05 20:28:07 - ERROR - stderr - +2025-02-05 20:28:07 - ERROR - stderr - +2025-02-05 20:28:07 - INFO - stdout - {'loss': 0.7298, 'grad_norm': 1.210593581199646, 'learning_rate': 1.1294287095212543e-05, 'epoch': 1.42} +2025-02-05 20:28:07 - ERROR - stderr - 47%|████▋ | 10655/22434 [10:20:26<8:40:04, 2.65s/it] +2025-02-05 20:28:09 - ERROR - stderr - 47%|████▋ | 10656/22434 [10:20:29<8:33:04, 2.61s/it] +2025-02-05 20:28:09 - ERROR - stderr - +2025-02-05 20:28:09 - ERROR - stderr - +2025-02-05 20:28:09 - INFO - stdout - {'loss': 0.7241, 'grad_norm': 1.226955771446228, 'learning_rate': 1.1292855478847429e-05, 'epoch': 1.42} +2025-02-05 20:28:09 - ERROR - stderr - 47%|████▋ | 10656/22434 
[10:20:29<8:33:04, 2.61s/it] +2025-02-05 20:28:12 - ERROR - stderr - 48%|████▊ | 10657/22434 [10:20:31<8:23:21, 2.56s/it] +2025-02-05 20:28:12 - ERROR - stderr - +2025-02-05 20:28:12 - ERROR - stderr - +2025-02-05 20:28:12 - INFO - stdout - {'loss': 0.699, 'grad_norm': 1.2139356136322021, 'learning_rate': 1.1291423835533977e-05, 'epoch': 1.43} +2025-02-05 20:28:12 - ERROR - stderr - 48%|████▊ | 10657/22434 [10:20:31<8:23:21, 2.56s/it] +2025-02-05 20:28:14 - ERROR - stderr - 48%|████▊ | 10658/22434 [10:20:34<8:19:02, 2.54s/it] +2025-02-05 20:28:14 - ERROR - stderr - +2025-02-05 20:28:14 - ERROR - stderr - +2025-02-05 20:28:14 - INFO - stdout - {'loss': 0.6864, 'grad_norm': 1.1558961868286133, 'learning_rate': 1.1289992165302036e-05, 'epoch': 1.43} +2025-02-05 20:28:14 - ERROR - stderr - 48%|████▊ | 10658/22434 [10:20:34<8:19:02, 2.54s/it] +2025-02-05 20:28:17 - ERROR - stderr - 48%|████▊ | 10659/22434 [10:20:36<8:21:11, 2.55s/it] +2025-02-05 20:28:17 - ERROR - stderr - +2025-02-05 20:28:17 - ERROR - stderr - +2025-02-05 20:28:17 - INFO - stdout - {'loss': 0.6994, 'grad_norm': 1.1971807479858398, 'learning_rate': 1.1288560468181437e-05, 'epoch': 1.43} +2025-02-05 20:28:17 - ERROR - stderr - 48%|████▊ | 10659/22434 [10:20:37<8:21:11, 2.55s/it] +2025-02-05 20:28:19 - ERROR - stderr - 48%|████▊ | 10660/22434 [10:20:39<8:18:16, 2.54s/it] +2025-02-05 20:28:19 - ERROR - stderr - +2025-02-05 20:28:19 - ERROR - stderr - +2025-02-05 20:28:19 - INFO - stdout - {'loss': 0.7557, 'grad_norm': 1.1828196048736572, 'learning_rate': 1.1287128744202032e-05, 'epoch': 1.43} +2025-02-05 20:28:19 - ERROR - stderr - 48%|████▊ | 10660/22434 [10:20:39<8:18:16, 2.54s/it] +2025-02-05 20:28:22 - ERROR - stderr - 48%|████▊ | 10661/22434 [10:20:42<8:33:09, 2.62s/it] +2025-02-05 20:28:22 - ERROR - stderr - +2025-02-05 20:28:22 - ERROR - stderr - +2025-02-05 20:28:22 - INFO - stdout - {'loss': 0.7398, 'grad_norm': 1.2036288976669312, 'learning_rate': 1.1285696993393658e-05, 'epoch': 1.43} 
+2025-02-05 20:28:22 - ERROR - stderr - 48%|████▊ | 10661/22434 [10:20:42<8:33:09, 2.62s/it] +2025-02-05 20:28:25 - ERROR - stderr - 48%|████▊ | 10662/22434 [10:20:44<8:28:10, 2.59s/it] +2025-02-05 20:28:25 - ERROR - stderr - +2025-02-05 20:28:25 - ERROR - stderr - +2025-02-05 20:28:25 - INFO - stdout - {'loss': 0.7035, 'grad_norm': 1.0965896844863892, 'learning_rate': 1.1284265215786159e-05, 'epoch': 1.43} +2025-02-05 20:28:25 - ERROR - stderr - 48%|████▊ | 10662/22434 [10:20:44<8:28:10, 2.59s/it] +2025-02-05 20:28:27 - ERROR - stderr - 48%|████▊ | 10663/22434 [10:20:47<8:23:52, 2.57s/it] +2025-02-05 20:28:27 - ERROR - stderr - +2025-02-05 20:28:27 - ERROR - stderr - +2025-02-05 20:28:27 - INFO - stdout - {'loss': 0.676, 'grad_norm': 1.1354644298553467, 'learning_rate': 1.1282833411409381e-05, 'epoch': 1.43} +2025-02-05 20:28:27 - ERROR - stderr - 48%|████▊ | 10663/22434 [10:20:47<8:23:52, 2.57s/it] +2025-02-05 20:28:30 - ERROR - stderr - 48%|████▊ | 10664/22434 [10:20:49<8:19:21, 2.55s/it] +2025-02-05 20:28:30 - ERROR - stderr - +2025-02-05 20:28:30 - ERROR - stderr - +2025-02-05 20:28:30 - INFO - stdout - {'loss': 0.8148, 'grad_norm': 1.2472926378250122, 'learning_rate': 1.128140158029317e-05, 'epoch': 1.43} +2025-02-05 20:28:30 - ERROR - stderr - 48%|████▊ | 10664/22434 [10:20:49<8:19:21, 2.55s/it] +2025-02-05 20:28:32 - ERROR - stderr - 48%|████▊ | 10665/22434 [10:20:52<8:15:19, 2.53s/it] +2025-02-05 20:28:32 - ERROR - stderr - +2025-02-05 20:28:32 - ERROR - stderr - +2025-02-05 20:28:32 - INFO - stdout - {'loss': 0.6758, 'grad_norm': 1.1744377613067627, 'learning_rate': 1.1279969722467368e-05, 'epoch': 1.43} +2025-02-05 20:28:32 - ERROR - stderr - 48%|████▊ | 10665/22434 [10:20:52<8:15:19, 2.53s/it] +2025-02-05 20:28:35 - ERROR - stderr - 48%|████▊ | 10666/22434 [10:20:54<8:14:20, 2.52s/it] +2025-02-05 20:28:35 - ERROR - stderr - +2025-02-05 20:28:35 - ERROR - stderr - +2025-02-05 20:28:35 - INFO - stdout - {'loss': 0.7386, 'grad_norm': 1.1020431518554688, 
'learning_rate': 1.1278537837961824e-05, 'epoch': 1.43} +2025-02-05 20:28:35 - ERROR - stderr - 48%|████▊ | 10666/22434 [10:20:54<8:14:20, 2.52s/it] +2025-02-05 20:28:37 - ERROR - stderr - 48%|████▊ | 10667/22434 [10:20:57<8:11:44, 2.51s/it] +2025-02-05 20:28:37 - ERROR - stderr - +2025-02-05 20:28:37 - ERROR - stderr - +2025-02-05 20:28:37 - INFO - stdout - {'loss': 0.6183, 'grad_norm': 1.129422903060913, 'learning_rate': 1.127710592680638e-05, 'epoch': 1.43} +2025-02-05 20:28:37 - ERROR - stderr - 48%|████▊ | 10667/22434 [10:20:57<8:11:44, 2.51s/it] +2025-02-05 20:28:40 - ERROR - stderr - 48%|████▊ | 10668/22434 [10:20:59<8:12:09, 2.51s/it] +2025-02-05 20:28:40 - ERROR - stderr - +2025-02-05 20:28:40 - ERROR - stderr - +2025-02-05 20:28:40 - INFO - stdout - {'loss': 0.759, 'grad_norm': 1.3803656101226807, 'learning_rate': 1.1275673989030884e-05, 'epoch': 1.43} +2025-02-05 20:28:40 - ERROR - stderr - 48%|████▊ | 10668/22434 [10:20:59<8:12:09, 2.51s/it] +2025-02-05 20:28:42 - ERROR - stderr - 48%|████▊ | 10669/22434 [10:21:02<8:11:54, 2.51s/it] +2025-02-05 20:28:42 - ERROR - stderr - +2025-02-05 20:28:42 - ERROR - stderr - +2025-02-05 20:28:42 - INFO - stdout - {'loss': 0.7261, 'grad_norm': 1.0879089832305908, 'learning_rate': 1.1274242024665186e-05, 'epoch': 1.43} +2025-02-05 20:28:42 - ERROR - stderr - 48%|████▊ | 10669/22434 [10:21:02<8:11:54, 2.51s/it] +2025-02-05 20:28:45 - ERROR - stderr - 48%|████▊ | 10670/22434 [10:21:04<8:10:44, 2.50s/it] +2025-02-05 20:28:45 - ERROR - stderr - +2025-02-05 20:28:45 - ERROR - stderr - +2025-02-05 20:28:45 - INFO - stdout - {'loss': 0.7476, 'grad_norm': 1.179068684577942, 'learning_rate': 1.1272810033739134e-05, 'epoch': 1.43} +2025-02-05 20:28:45 - ERROR - stderr - 48%|████▊ | 10670/22434 [10:21:04<8:10:44, 2.50s/it] +2025-02-05 20:28:47 - ERROR - stderr - 48%|████▊ | 10671/22434 [10:21:07<8:06:16, 2.48s/it] +2025-02-05 20:28:47 - ERROR - stderr - +2025-02-05 20:28:47 - ERROR - stderr - +2025-02-05 20:28:47 - INFO - stdout 
- {'loss': 0.7125, 'grad_norm': 1.1347124576568604, 'learning_rate': 1.1271378016282572e-05, 'epoch': 1.43} +2025-02-05 20:28:47 - ERROR - stderr - 48%|████▊ | 10671/22434 [10:21:07<8:06:16, 2.48s/it] +2025-02-05 20:28:49 - ERROR - stderr - 48%|████▊ | 10672/22434 [10:21:09<8:09:30, 2.50s/it] +2025-02-05 20:28:50 - ERROR - stderr - +2025-02-05 20:28:50 - ERROR - stderr - +2025-02-05 20:28:50 - INFO - stdout - {'loss': 0.6176, 'grad_norm': 1.2245893478393555, 'learning_rate': 1.1269945972325353e-05, 'epoch': 1.43} +2025-02-05 20:28:50 - ERROR - stderr - 48%|████▊ | 10672/22434 [10:21:09<8:09:30, 2.50s/it] +2025-02-05 20:28:52 - ERROR - stderr - 48%|████▊ | 10673/22434 [10:21:12<8:05:29, 2.48s/it] +2025-02-05 20:28:52 - ERROR - stderr - +2025-02-05 20:28:52 - ERROR - stderr - +2025-02-05 20:28:52 - INFO - stdout - {'loss': 0.6791, 'grad_norm': 1.1761186122894287, 'learning_rate': 1.1268513901897324e-05, 'epoch': 1.43} +2025-02-05 20:28:52 - ERROR - stderr - 48%|████▊ | 10673/22434 [10:21:12<8:05:29, 2.48s/it] +2025-02-05 20:28:54 - ERROR - stderr - 48%|████▊ | 10674/22434 [10:21:14<8:10:15, 2.50s/it] +2025-02-05 20:28:55 - ERROR - stderr - +2025-02-05 20:28:55 - ERROR - stderr - +2025-02-05 20:28:55 - INFO - stdout - {'loss': 0.7068, 'grad_norm': 1.14299476146698, 'learning_rate': 1.126708180502834e-05, 'epoch': 1.43} +2025-02-05 20:28:55 - ERROR - stderr - 48%|████▊ | 10674/22434 [10:21:14<8:10:15, 2.50s/it] +2025-02-05 20:28:57 - ERROR - stderr - 48%|████▊ | 10675/22434 [10:21:17<8:23:47, 2.57s/it] +2025-02-05 20:28:57 - ERROR - stderr - +2025-02-05 20:28:57 - ERROR - stderr - +2025-02-05 20:28:57 - INFO - stdout - {'loss': 0.6542, 'grad_norm': 1.1090683937072754, 'learning_rate': 1.1265649681748245e-05, 'epoch': 1.43} +2025-02-05 20:28:57 - ERROR - stderr - 48%|████▊ | 10675/22434 [10:21:17<8:23:47, 2.57s/it] +2025-02-05 20:29:00 - ERROR - stderr - 48%|████▊ | 10676/22434 [10:21:19<8:19:59, 2.55s/it] +2025-02-05 20:29:00 - ERROR - stderr - +2025-02-05 20:29:00 - 
ERROR - stderr - +2025-02-05 20:29:00 - INFO - stdout - {'loss': 0.6656, 'grad_norm': 1.1149985790252686, 'learning_rate': 1.1264217532086895e-05, 'epoch': 1.43} +2025-02-05 20:29:00 - ERROR - stderr - 48%|████▊ | 10676/22434 [10:21:20<8:19:59, 2.55s/it] +2025-02-05 20:29:02 - ERROR - stderr - 48%|████▊ | 10677/22434 [10:21:22<8:16:49, 2.54s/it] +2025-02-05 20:29:02 - ERROR - stderr - +2025-02-05 20:29:02 - ERROR - stderr - +2025-02-05 20:29:02 - INFO - stdout - {'loss': 0.701, 'grad_norm': 1.129122257232666, 'learning_rate': 1.1262785356074139e-05, 'epoch': 1.43} +2025-02-05 20:29:02 - ERROR - stderr - 48%|████▊ | 10677/22434 [10:21:22<8:16:49, 2.54s/it] +2025-02-05 20:29:05 - ERROR - stderr - 48%|████▊ | 10678/22434 [10:21:24<8:15:30, 2.53s/it] +2025-02-05 20:29:05 - ERROR - stderr - +2025-02-05 20:29:05 - ERROR - stderr - +2025-02-05 20:29:05 - INFO - stdout - {'loss': 0.6461, 'grad_norm': 1.1855891942977905, 'learning_rate': 1.1261353153739834e-05, 'epoch': 1.43} +2025-02-05 20:29:05 - ERROR - stderr - 48%|████▊ | 10678/22434 [10:21:25<8:15:30, 2.53s/it] +2025-02-05 20:29:07 - ERROR - stderr - 48%|████▊ | 10679/22434 [10:21:27<8:10:50, 2.51s/it] +2025-02-05 20:29:07 - ERROR - stderr - +2025-02-05 20:29:07 - ERROR - stderr - +2025-02-05 20:29:07 - INFO - stdout - {'loss': 0.7406, 'grad_norm': 1.3090617656707764, 'learning_rate': 1.1259920925113825e-05, 'epoch': 1.43} +2025-02-05 20:29:07 - ERROR - stderr - 48%|████▊ | 10679/22434 [10:21:27<8:10:50, 2.51s/it] +2025-02-05 20:29:10 - ERROR - stderr - 48%|████▊ | 10680/22434 [10:21:30<8:22:22, 2.56s/it] +2025-02-05 20:29:10 - ERROR - stderr - +2025-02-05 20:29:10 - ERROR - stderr - +2025-02-05 20:29:10 - INFO - stdout - {'loss': 0.6944, 'grad_norm': 1.1408298015594482, 'learning_rate': 1.1258488670225973e-05, 'epoch': 1.43} +2025-02-05 20:29:10 - ERROR - stderr - 48%|████▊ | 10680/22434 [10:21:30<8:22:22, 2.56s/it] +2025-02-05 20:29:12 - ERROR - stderr - 48%|████▊ | 10681/22434 [10:21:32<8:15:36, 2.53s/it] 
+2025-02-05 20:29:12 - ERROR - stderr - +2025-02-05 20:29:12 - ERROR - stderr - +2025-02-05 20:29:12 - INFO - stdout - {'loss': 0.7444, 'grad_norm': 1.2820547819137573, 'learning_rate': 1.1257056389106127e-05, 'epoch': 1.43} +2025-02-05 20:29:12 - ERROR - stderr - 48%|████▊ | 10681/22434 [10:21:32<8:15:36, 2.53s/it] +2025-02-05 20:29:15 - ERROR - stderr - 48%|████▊ | 10682/22434 [10:21:35<8:19:36, 2.55s/it] +2025-02-05 20:29:15 - ERROR - stderr - +2025-02-05 20:29:15 - ERROR - stderr - +2025-02-05 20:29:15 - INFO - stdout - {'loss': 0.6906, 'grad_norm': 1.3175113201141357, 'learning_rate': 1.1255624081784145e-05, 'epoch': 1.43} +2025-02-05 20:29:15 - ERROR - stderr - 48%|████▊ | 10682/22434 [10:21:35<8:19:36, 2.55s/it] +2025-02-05 20:29:17 - ERROR - stderr - 48%|████▊ | 10683/22434 [10:21:37<8:16:10, 2.53s/it] +2025-02-05 20:29:17 - ERROR - stderr - +2025-02-05 20:29:17 - ERROR - stderr - +2025-02-05 20:29:17 - INFO - stdout - {'loss': 0.6695, 'grad_norm': 1.1378902196884155, 'learning_rate': 1.1254191748289878e-05, 'epoch': 1.43} +2025-02-05 20:29:17 - ERROR - stderr - 48%|████▊ | 10683/22434 [10:21:37<8:16:10, 2.53s/it] +2025-02-05 20:29:20 - ERROR - stderr - 48%|████▊ | 10684/22434 [10:21:40<8:11:00, 2.51s/it] +2025-02-05 20:29:20 - ERROR - stderr - +2025-02-05 20:29:20 - ERROR - stderr - +2025-02-05 20:29:20 - INFO - stdout - {'loss': 0.7539, 'grad_norm': 1.2713255882263184, 'learning_rate': 1.1252759388653187e-05, 'epoch': 1.43} +2025-02-05 20:29:20 - ERROR - stderr - 48%|████▊ | 10684/22434 [10:21:40<8:11:00, 2.51s/it] +2025-02-05 20:29:22 - ERROR - stderr - 48%|████▊ | 10685/22434 [10:21:42<8:09:21, 2.50s/it] +2025-02-05 20:29:22 - ERROR - stderr - +2025-02-05 20:29:22 - ERROR - stderr - +2025-02-05 20:29:22 - INFO - stdout - {'loss': 0.7071, 'grad_norm': 1.2364473342895508, 'learning_rate': 1.1251327002903923e-05, 'epoch': 1.43} +2025-02-05 20:29:22 - ERROR - stderr - 48%|████▊ | 10685/22434 [10:21:42<8:09:21, 2.50s/it] +2025-02-05 20:29:25 - ERROR - stderr 
- 48%|████▊ | 10686/22434 [10:21:45<8:07:34, 2.49s/it] +2025-02-05 20:29:25 - ERROR - stderr - +2025-02-05 20:29:25 - ERROR - stderr - +2025-02-05 20:29:25 - INFO - stdout - {'loss': 0.7236, 'grad_norm': 1.288007140159607, 'learning_rate': 1.1249894591071948e-05, 'epoch': 1.43} +2025-02-05 20:29:25 - ERROR - stderr - 48%|████▊ | 10686/22434 [10:21:45<8:07:34, 2.49s/it] +2025-02-05 20:29:27 - ERROR - stderr - 48%|████▊ | 10687/22434 [10:21:47<8:11:57, 2.51s/it] +2025-02-05 20:29:27 - ERROR - stderr - +2025-02-05 20:29:27 - ERROR - stderr - +2025-02-05 20:29:27 - INFO - stdout - {'loss': 0.7051, 'grad_norm': 1.1951751708984375, 'learning_rate': 1.1248462153187111e-05, 'epoch': 1.43} +2025-02-05 20:29:27 - ERROR - stderr - 48%|████▊ | 10687/22434 [10:21:47<8:11:57, 2.51s/it] +2025-02-05 20:29:30 - ERROR - stderr - 48%|████▊ | 10688/22434 [10:21:50<8:17:06, 2.54s/it] +2025-02-05 20:29:30 - ERROR - stderr - +2025-02-05 20:29:30 - ERROR - stderr - +2025-02-05 20:29:30 - INFO - stdout - {'loss': 0.7284, 'grad_norm': 1.2279226779937744, 'learning_rate': 1.124702968927928e-05, 'epoch': 1.43} +2025-02-05 20:29:30 - ERROR - stderr - 48%|████▊ | 10688/22434 [10:21:50<8:17:06, 2.54s/it] +2025-02-05 20:29:32 - ERROR - stderr - 48%|████▊ | 10689/22434 [10:21:52<8:12:15, 2.51s/it] +2025-02-05 20:29:32 - ERROR - stderr - +2025-02-05 20:29:32 - ERROR - stderr - +2025-02-05 20:29:32 - INFO - stdout - {'loss': 0.6743, 'grad_norm': 1.2034598588943481, 'learning_rate': 1.1245597199378306e-05, 'epoch': 1.43} +2025-02-05 20:29:32 - ERROR - stderr - 48%|████▊ | 10689/22434 [10:21:52<8:12:15, 2.51s/it] +2025-02-05 20:29:35 - ERROR - stderr - 48%|████▊ | 10690/22434 [10:21:55<8:29:30, 2.60s/it] +2025-02-05 20:29:35 - ERROR - stderr - +2025-02-05 20:29:35 - ERROR - stderr - +2025-02-05 20:29:35 - INFO - stdout - {'loss': 0.7107, 'grad_norm': 1.2431162595748901, 'learning_rate': 1.1244164683514055e-05, 'epoch': 1.43} +2025-02-05 20:29:35 - ERROR - stderr - 48%|████▊ | 10690/22434 
[10:21:55<8:29:30, 2.60s/it] +2025-02-05 20:29:38 - ERROR - stderr - 48%|████▊ | 10691/22434 [10:21:58<8:27:14, 2.59s/it] +2025-02-05 20:29:38 - ERROR - stderr - +2025-02-05 20:29:38 - ERROR - stderr - +2025-02-05 20:29:38 - INFO - stdout - {'loss': 0.6559, 'grad_norm': 1.111336588859558, 'learning_rate': 1.1242732141716377e-05, 'epoch': 1.43} +2025-02-05 20:29:38 - ERROR - stderr - 48%|████▊ | 10691/22434 [10:21:58<8:27:14, 2.59s/it] +2025-02-05 20:29:40 - ERROR - stderr - 48%|████▊ | 10692/22434 [10:22:00<8:31:56, 2.62s/it] +2025-02-05 20:29:41 - ERROR - stderr - +2025-02-05 20:29:41 - ERROR - stderr - +2025-02-05 20:29:41 - INFO - stdout - {'loss': 0.7089, 'grad_norm': 1.2910313606262207, 'learning_rate': 1.1241299574015137e-05, 'epoch': 1.43} +2025-02-05 20:29:41 - ERROR - stderr - 48%|████▊ | 10692/22434 [10:22:00<8:31:56, 2.62s/it] +2025-02-05 20:29:43 - ERROR - stderr - 48%|████▊ | 10693/22434 [10:22:03<8:20:28, 2.56s/it] +2025-02-05 20:29:43 - ERROR - stderr - +2025-02-05 20:29:43 - ERROR - stderr - +2025-02-05 20:29:43 - INFO - stdout - {'loss': 0.7564, 'grad_norm': 1.2933851480484009, 'learning_rate': 1.1239866980440195e-05, 'epoch': 1.43} +2025-02-05 20:29:43 - ERROR - stderr - 48%|████▊ | 10693/22434 [10:22:03<8:20:28, 2.56s/it] +2025-02-05 20:29:45 - ERROR - stderr - 48%|████▊ | 10694/22434 [10:22:05<8:12:27, 2.52s/it] +2025-02-05 20:29:45 - ERROR - stderr - +2025-02-05 20:29:45 - ERROR - stderr - +2025-02-05 20:29:45 - INFO - stdout - {'loss': 0.7337, 'grad_norm': 1.175847053527832, 'learning_rate': 1.1238434361021412e-05, 'epoch': 1.43} +2025-02-05 20:29:45 - ERROR - stderr - 48%|████▊ | 10694/22434 [10:22:05<8:12:27, 2.52s/it] +2025-02-05 20:29:48 - ERROR - stderr - 48%|████▊ | 10695/22434 [10:22:08<8:25:07, 2.58s/it] +2025-02-05 20:29:48 - ERROR - stderr - +2025-02-05 20:29:48 - ERROR - stderr - +2025-02-05 20:29:48 - INFO - stdout - {'loss': 0.791, 'grad_norm': 1.2050317525863647, 'learning_rate': 1.1237001715788652e-05, 'epoch': 1.43} +2025-02-05 
20:29:48 - ERROR - stderr - 48%|████▊ | 10695/22434 [10:22:08<8:25:07, 2.58s/it] +2025-02-05 20:29:51 - ERROR - stderr - 48%|████▊ | 10696/22434 [10:22:10<8:23:59, 2.58s/it] +2025-02-05 20:29:51 - ERROR - stderr - +2025-02-05 20:29:51 - ERROR - stderr - +2025-02-05 20:29:51 - INFO - stdout - {'loss': 0.7755, 'grad_norm': 1.2653065919876099, 'learning_rate': 1.1235569044771773e-05, 'epoch': 1.43} +2025-02-05 20:29:51 - ERROR - stderr - 48%|████▊ | 10696/22434 [10:22:10<8:23:59, 2.58s/it] +2025-02-05 20:29:53 - ERROR - stderr - 48%|████▊ | 10697/22434 [10:22:13<8:25:25, 2.58s/it] +2025-02-05 20:29:53 - ERROR - stderr - +2025-02-05 20:29:53 - ERROR - stderr - +2025-02-05 20:29:53 - INFO - stdout - {'loss': 0.7662, 'grad_norm': 1.1402374505996704, 'learning_rate': 1.1234136348000639e-05, 'epoch': 1.43} +2025-02-05 20:29:53 - ERROR - stderr - 48%|████▊ | 10697/22434 [10:22:13<8:25:25, 2.58s/it] +2025-02-05 20:29:56 - ERROR - stderr - 48%|████▊ | 10698/22434 [10:22:15<8:20:55, 2.56s/it] +2025-02-05 20:29:56 - ERROR - stderr - +2025-02-05 20:29:56 - ERROR - stderr - +2025-02-05 20:29:56 - INFO - stdout - {'loss': 0.6337, 'grad_norm': 1.1254568099975586, 'learning_rate': 1.1232703625505119e-05, 'epoch': 1.43} +2025-02-05 20:29:56 - ERROR - stderr - 48%|████▊ | 10698/22434 [10:22:16<8:20:55, 2.56s/it] +2025-02-05 20:29:58 - ERROR - stderr - 48%|████▊ | 10699/22434 [10:22:18<8:18:29, 2.55s/it] +2025-02-05 20:29:58 - ERROR - stderr - +2025-02-05 20:29:58 - ERROR - stderr - +2025-02-05 20:29:58 - INFO - stdout - {'loss': 0.7591, 'grad_norm': 1.20777428150177, 'learning_rate': 1.1231270877315066e-05, 'epoch': 1.43} +2025-02-05 20:29:58 - ERROR - stderr - 48%|████▊ | 10699/22434 [10:22:18<8:18:29, 2.55s/it] +2025-02-05 20:30:01 - ERROR - stderr - 48%|████▊ | 10700/22434 [10:22:20<8:12:58, 2.52s/it] +2025-02-05 20:30:01 - ERROR - stderr - +2025-02-05 20:30:01 - ERROR - stderr - +2025-02-05 20:30:01 - INFO - stdout - {'loss': 0.7261, 'grad_norm': 1.1707218885421753, 
'learning_rate': 1.1229838103460349e-05, 'epoch': 1.43} +2025-02-05 20:30:01 - ERROR - stderr - 48%|████▊ | 10700/22434 [10:22:21<8:12:58, 2.52s/it] +2025-02-05 20:30:03 - ERROR - stderr - 48%|████▊ | 10701/22434 [10:22:23<8:15:54, 2.54s/it] +2025-02-05 20:30:03 - ERROR - stderr - +2025-02-05 20:30:03 - ERROR - stderr - +2025-02-05 20:30:03 - INFO - stdout - {'loss': 0.5823, 'grad_norm': 1.0493338108062744, 'learning_rate': 1.1228405303970837e-05, 'epoch': 1.43} +2025-02-05 20:30:03 - ERROR - stderr - 48%|████▊ | 10701/22434 [10:22:23<8:15:54, 2.54s/it] +2025-02-05 20:30:06 - ERROR - stderr - 48%|████▊ | 10702/22434 [10:22:26<8:19:41, 2.56s/it] +2025-02-05 20:30:06 - ERROR - stderr - +2025-02-05 20:30:06 - ERROR - stderr - +2025-02-05 20:30:06 - INFO - stdout - {'loss': 0.6731, 'grad_norm': 1.1962717771530151, 'learning_rate': 1.1226972478876392e-05, 'epoch': 1.43} +2025-02-05 20:30:06 - ERROR - stderr - 48%|████▊ | 10702/22434 [10:22:26<8:19:41, 2.56s/it] +2025-02-05 20:30:08 - ERROR - stderr - 48%|████▊ | 10703/22434 [10:22:28<8:13:06, 2.52s/it] +2025-02-05 20:30:08 - ERROR - stderr - +2025-02-05 20:30:08 - ERROR - stderr - +2025-02-05 20:30:08 - INFO - stdout - {'loss': 0.7353, 'grad_norm': 1.2757611274719238, 'learning_rate': 1.1225539628206879e-05, 'epoch': 1.43} +2025-02-05 20:30:08 - ERROR - stderr - 48%|████▊ | 10703/22434 [10:22:28<8:13:06, 2.52s/it] +2025-02-05 20:30:11 - ERROR - stderr - 48%|████▊ | 10704/22434 [10:22:31<8:16:13, 2.54s/it] +2025-02-05 20:30:11 - ERROR - stderr - +2025-02-05 20:30:11 - ERROR - stderr - +2025-02-05 20:30:11 - INFO - stdout - {'loss': 0.7985, 'grad_norm': 1.3193750381469727, 'learning_rate': 1.1224106751992164e-05, 'epoch': 1.43} +2025-02-05 20:30:11 - ERROR - stderr - 48%|████▊ | 10704/22434 [10:22:31<8:16:13, 2.54s/it] +2025-02-05 20:30:13 - ERROR - stderr - 48%|████▊ | 10705/22434 [10:22:33<8:11:36, 2.51s/it] +2025-02-05 20:30:13 - ERROR - stderr - +2025-02-05 20:30:13 - ERROR - stderr - +2025-02-05 20:30:13 - INFO - 
stdout - {'loss': 0.7081, 'grad_norm': 1.161144495010376, 'learning_rate': 1.1222673850262116e-05, 'epoch': 1.43} +2025-02-05 20:30:13 - ERROR - stderr - 48%|████▊ | 10705/22434 [10:22:33<8:11:36, 2.51s/it] +2025-02-05 20:30:16 - ERROR - stderr - 48%|████▊ | 10706/22434 [10:22:36<8:13:17, 2.52s/it] +2025-02-05 20:30:16 - ERROR - stderr - +2025-02-05 20:30:16 - ERROR - stderr - +2025-02-05 20:30:16 - INFO - stdout - {'loss': 0.747, 'grad_norm': 1.176624059677124, 'learning_rate': 1.1221240923046602e-05, 'epoch': 1.43} +2025-02-05 20:30:16 - ERROR - stderr - 48%|████▊ | 10706/22434 [10:22:36<8:13:17, 2.52s/it] +2025-02-05 20:30:18 - ERROR - stderr - 48%|████▊ | 10707/22434 [10:22:38<8:14:30, 2.53s/it] +2025-02-05 20:30:18 - ERROR - stderr - +2025-02-05 20:30:18 - ERROR - stderr - +2025-02-05 20:30:18 - INFO - stdout - {'loss': 0.7303, 'grad_norm': 1.1809161901474, 'learning_rate': 1.1219807970375488e-05, 'epoch': 1.43} +2025-02-05 20:30:18 - ERROR - stderr - 48%|████▊ | 10707/22434 [10:22:38<8:14:30, 2.53s/it] +2025-02-05 20:30:21 - ERROR - stderr - 48%|████▊ | 10708/22434 [10:22:41<8:13:59, 2.53s/it] +2025-02-05 20:30:21 - ERROR - stderr - +2025-02-05 20:30:21 - ERROR - stderr - +2025-02-05 20:30:21 - INFO - stdout - {'loss': 0.7318, 'grad_norm': 1.1982601881027222, 'learning_rate': 1.1218374992278645e-05, 'epoch': 1.43} +2025-02-05 20:30:21 - ERROR - stderr - 48%|████▊ | 10708/22434 [10:22:41<8:13:59, 2.53s/it] +2025-02-05 20:30:24 - ERROR - stderr - 48%|████▊ | 10709/22434 [10:22:43<8:17:26, 2.55s/it] +2025-02-05 20:30:24 - ERROR - stderr - +2025-02-05 20:30:24 - ERROR - stderr - +2025-02-05 20:30:24 - INFO - stdout - {'loss': 0.6817, 'grad_norm': 1.183390736579895, 'learning_rate': 1.1216941988785939e-05, 'epoch': 1.43} +2025-02-05 20:30:24 - ERROR - stderr - 48%|████▊ | 10709/22434 [10:22:43<8:17:26, 2.55s/it] +2025-02-05 20:30:26 - ERROR - stderr - 48%|████▊ | 10710/22434 [10:22:46<8:15:20, 2.53s/it] +2025-02-05 20:30:26 - ERROR - stderr - +2025-02-05 20:30:26 
- ERROR - stderr - +2025-02-05 20:30:26 - INFO - stdout - {'loss': 0.7542, 'grad_norm': 1.1115597486495972, 'learning_rate': 1.1215508959927243e-05, 'epoch': 1.43} +2025-02-05 20:30:26 - ERROR - stderr - 48%|████▊ | 10710/22434 [10:22:46<8:15:20, 2.53s/it] +2025-02-05 20:30:29 - ERROR - stderr - 48%|████▊ | 10711/22434 [10:22:48<8:14:56, 2.53s/it] +2025-02-05 20:30:29 - ERROR - stderr - +2025-02-05 20:30:29 - ERROR - stderr - +2025-02-05 20:30:29 - INFO - stdout - {'loss': 0.6991, 'grad_norm': 1.1997588872909546, 'learning_rate': 1.121407590573243e-05, 'epoch': 1.43} +2025-02-05 20:30:29 - ERROR - stderr - 48%|████▊ | 10711/22434 [10:22:48<8:14:56, 2.53s/it] +2025-02-05 20:30:31 - ERROR - stderr - 48%|████▊ | 10712/22434 [10:22:51<8:12:13, 2.52s/it] +2025-02-05 20:30:31 - ERROR - stderr - +2025-02-05 20:30:31 - ERROR - stderr - +2025-02-05 20:30:31 - INFO - stdout - {'loss': 0.6766, 'grad_norm': 1.2250940799713135, 'learning_rate': 1.1212642826231363e-05, 'epoch': 1.43} +2025-02-05 20:30:31 - ERROR - stderr - 48%|████▊ | 10712/22434 [10:22:51<8:12:13, 2.52s/it] +2025-02-05 20:30:34 - ERROR - stderr - 48%|████▊ | 10713/22434 [10:22:53<8:10:46, 2.51s/it] +2025-02-05 20:30:34 - ERROR - stderr - +2025-02-05 20:30:34 - ERROR - stderr - +2025-02-05 20:30:34 - INFO - stdout - {'loss': 0.7062, 'grad_norm': 1.1667989492416382, 'learning_rate': 1.1211209721453918e-05, 'epoch': 1.43} +2025-02-05 20:30:34 - ERROR - stderr - 48%|████▊ | 10713/22434 [10:22:53<8:10:46, 2.51s/it] +2025-02-05 20:30:36 - ERROR - stderr - 48%|████▊ | 10714/22434 [10:22:56<8:13:30, 2.53s/it] +2025-02-05 20:30:36 - ERROR - stderr - +2025-02-05 20:30:36 - ERROR - stderr - +2025-02-05 20:30:36 - INFO - stdout - {'loss': 0.6828, 'grad_norm': 1.3029757738113403, 'learning_rate': 1.120977659142996e-05, 'epoch': 1.43} +2025-02-05 20:30:36 - ERROR - stderr - 48%|████▊ | 10714/22434 [10:22:56<8:13:30, 2.53s/it] +2025-02-05 20:30:39 - ERROR - stderr - 48%|████▊ | 10715/22434 [10:22:58<8:09:40, 2.51s/it] 
+2025-02-05 20:30:39 - ERROR - stderr - +2025-02-05 20:30:39 - ERROR - stderr - +2025-02-05 20:30:39 - INFO - stdout - {'loss': 0.6757, 'grad_norm': 1.182138204574585, 'learning_rate': 1.1208343436189372e-05, 'epoch': 1.43} +2025-02-05 20:30:39 - ERROR - stderr - 48%|████▊ | 10715/22434 [10:22:58<8:09:40, 2.51s/it] +2025-02-05 20:30:41 - ERROR - stderr - 48%|████▊ | 10716/22434 [10:23:01<8:07:59, 2.50s/it] +2025-02-05 20:30:41 - ERROR - stderr - +2025-02-05 20:30:41 - ERROR - stderr - +2025-02-05 20:30:41 - INFO - stdout - {'loss': 0.7224, 'grad_norm': 1.1919487714767456, 'learning_rate': 1.120691025576202e-05, 'epoch': 1.43} +2025-02-05 20:30:41 - ERROR - stderr - 48%|████▊ | 10716/22434 [10:23:01<8:07:59, 2.50s/it] +2025-02-05 20:30:44 - ERROR - stderr - 48%|████▊ | 10717/22434 [10:23:03<8:07:27, 2.50s/it] +2025-02-05 20:30:44 - ERROR - stderr - +2025-02-05 20:30:44 - ERROR - stderr - +2025-02-05 20:30:44 - INFO - stdout - {'loss': 0.8007, 'grad_norm': 1.2977871894836426, 'learning_rate': 1.120547705017778e-05, 'epoch': 1.43} +2025-02-05 20:30:44 - ERROR - stderr - 48%|████▊ | 10717/22434 [10:23:03<8:07:27, 2.50s/it] +2025-02-05 20:30:46 - ERROR - stderr - 48%|████▊ | 10718/22434 [10:23:06<8:20:29, 2.56s/it] +2025-02-05 20:30:46 - ERROR - stderr - +2025-02-05 20:30:46 - ERROR - stderr - +2025-02-05 20:30:46 - INFO - stdout - {'loss': 0.7301, 'grad_norm': 1.0938711166381836, 'learning_rate': 1.1204043819466523e-05, 'epoch': 1.43} +2025-02-05 20:30:46 - ERROR - stderr - 48%|████▊ | 10718/22434 [10:23:06<8:20:29, 2.56s/it] +2025-02-05 20:30:49 - ERROR - stderr - 48%|████▊ | 10719/22434 [10:23:09<8:20:05, 2.56s/it] +2025-02-05 20:30:49 - ERROR - stderr - +2025-02-05 20:30:49 - ERROR - stderr - +2025-02-05 20:30:49 - INFO - stdout - {'loss': 0.7276, 'grad_norm': 1.3124492168426514, 'learning_rate': 1.1202610563658125e-05, 'epoch': 1.43} +2025-02-05 20:30:49 - ERROR - stderr - 48%|████▊ | 10719/22434 [10:23:09<8:20:05, 2.56s/it] +2025-02-05 20:30:51 - ERROR - stderr - 
48%|████▊ | 10720/22434 [10:23:11<8:19:11, 2.56s/it] +2025-02-05 20:30:51 - ERROR - stderr - +2025-02-05 20:30:51 - ERROR - stderr - +2025-02-05 20:30:51 - INFO - stdout - {'loss': 0.7059, 'grad_norm': 1.1059037446975708, 'learning_rate': 1.120117728278246e-05, 'epoch': 1.43} +2025-02-05 20:30:51 - ERROR - stderr - 48%|████▊ | 10720/22434 [10:23:11<8:19:11, 2.56s/it] +2025-02-05 20:30:54 - ERROR - stderr - 48%|████▊ | 10721/22434 [10:23:14<8:31:49, 2.62s/it] +2025-02-05 20:30:54 - ERROR - stderr - +2025-02-05 20:30:54 - ERROR - stderr - +2025-02-05 20:30:54 - INFO - stdout - {'loss': 0.6824, 'grad_norm': 1.2210850715637207, 'learning_rate': 1.1199743976869403e-05, 'epoch': 1.43} +2025-02-05 20:30:54 - ERROR - stderr - 48%|████▊ | 10721/22434 [10:23:14<8:31:49, 2.62s/it] +2025-02-05 20:30:57 - ERROR - stderr - 48%|████▊ | 10722/22434 [10:23:16<8:26:08, 2.59s/it] +2025-02-05 20:30:57 - ERROR - stderr - +2025-02-05 20:30:57 - ERROR - stderr - +2025-02-05 20:30:57 - INFO - stdout - {'loss': 0.7605, 'grad_norm': 1.2350088357925415, 'learning_rate': 1.1198310645948833e-05, 'epoch': 1.43} +2025-02-05 20:30:57 - ERROR - stderr - 48%|████▊ | 10722/22434 [10:23:17<8:26:08, 2.59s/it] +2025-02-05 20:30:59 - ERROR - stderr - 48%|████▊ | 10723/22434 [10:23:19<8:20:14, 2.56s/it] +2025-02-05 20:30:59 - ERROR - stderr - +2025-02-05 20:30:59 - ERROR - stderr - +2025-02-05 20:30:59 - INFO - stdout - {'loss': 0.6971, 'grad_norm': 1.1967185735702515, 'learning_rate': 1.1196877290050625e-05, 'epoch': 1.43} +2025-02-05 20:30:59 - ERROR - stderr - 48%|████▊ | 10723/22434 [10:23:19<8:20:14, 2.56s/it] +2025-02-05 20:31:02 - ERROR - stderr - 48%|████▊ | 10724/22434 [10:23:21<8:18:43, 2.56s/it] +2025-02-05 20:31:02 - ERROR - stderr - +2025-02-05 20:31:02 - ERROR - stderr - +2025-02-05 20:31:02 - INFO - stdout - {'loss': 0.6801, 'grad_norm': 1.116580843925476, 'learning_rate': 1.1195443909204653e-05, 'epoch': 1.43} +2025-02-05 20:31:02 - ERROR - stderr - 48%|████▊ | 10724/22434 
[10:23:22<8:18:43, 2.56s/it] +2025-02-05 20:31:04 - ERROR - stderr - 48%|████▊ | 10725/22434 [10:23:24<8:17:07, 2.55s/it] +2025-02-05 20:31:04 - ERROR - stderr - +2025-02-05 20:31:04 - ERROR - stderr - +2025-02-05 20:31:04 - INFO - stdout - {'loss': 0.7495, 'grad_norm': 1.138337254524231, 'learning_rate': 1.1194010503440797e-05, 'epoch': 1.43} +2025-02-05 20:31:04 - ERROR - stderr - 48%|████▊ | 10725/22434 [10:23:24<8:17:07, 2.55s/it] +2025-02-05 20:31:07 - ERROR - stderr - 48%|████▊ | 10726/22434 [10:23:27<8:23:01, 2.58s/it] +2025-02-05 20:31:07 - ERROR - stderr - +2025-02-05 20:31:07 - ERROR - stderr - +2025-02-05 20:31:07 - INFO - stdout - {'loss': 0.6909, 'grad_norm': 1.125596046447754, 'learning_rate': 1.1192577072788935e-05, 'epoch': 1.43} +2025-02-05 20:31:07 - ERROR - stderr - 48%|████▊ | 10726/22434 [10:23:27<8:23:01, 2.58s/it] +2025-02-05 20:31:10 - ERROR - stderr - 48%|████▊ | 10727/22434 [10:23:29<8:26:14, 2.59s/it] +2025-02-05 20:31:10 - ERROR - stderr - +2025-02-05 20:31:10 - ERROR - stderr - +2025-02-05 20:31:10 - INFO - stdout - {'loss': 0.7311, 'grad_norm': 1.1832541227340698, 'learning_rate': 1.1191143617278946e-05, 'epoch': 1.43} +2025-02-05 20:31:10 - ERROR - stderr - 48%|████▊ | 10727/22434 [10:23:29<8:26:14, 2.59s/it] +2025-02-05 20:31:12 - ERROR - stderr - 48%|████▊ | 10728/22434 [10:23:32<8:21:48, 2.57s/it] +2025-02-05 20:31:12 - ERROR - stderr - +2025-02-05 20:31:12 - ERROR - stderr - +2025-02-05 20:31:12 - INFO - stdout - {'loss': 0.7213, 'grad_norm': 1.2062422037124634, 'learning_rate': 1.1189710136940706e-05, 'epoch': 1.43} +2025-02-05 20:31:12 - ERROR - stderr - 48%|████▊ | 10728/22434 [10:23:32<8:21:48, 2.57s/it] +2025-02-05 20:31:14 - ERROR - stderr - 48%|████▊ | 10729/22434 [10:23:34<8:14:01, 2.53s/it] +2025-02-05 20:31:15 - ERROR - stderr - +2025-02-05 20:31:15 - ERROR - stderr - +2025-02-05 20:31:15 - INFO - stdout - {'loss': 0.6836, 'grad_norm': 1.1302896738052368, 'learning_rate': 1.1188276631804098e-05, 'epoch': 1.43} 
+2025-02-05 20:31:15 - ERROR - stderr - 48%|████▊ | 10729/22434 [10:23:34<8:14:01, 2.53s/it] +2025-02-05 20:31:17 - ERROR - stderr - 48%|████▊ | 10730/22434 [10:23:37<8:10:14, 2.51s/it] +2025-02-05 20:31:17 - ERROR - stderr - +2025-02-05 20:31:17 - ERROR - stderr - +2025-02-05 20:31:17 - INFO - stdout - {'loss': 0.7241, 'grad_norm': 1.2193200588226318, 'learning_rate': 1.1186843101898999e-05, 'epoch': 1.43} +2025-02-05 20:31:17 - ERROR - stderr - 48%|████▊ | 10730/22434 [10:23:37<8:10:14, 2.51s/it] +2025-02-05 20:31:19 - ERROR - stderr - 48%|████▊ | 10731/22434 [10:23:39<8:05:58, 2.49s/it] +2025-02-05 20:31:19 - ERROR - stderr - +2025-02-05 20:31:19 - ERROR - stderr - +2025-02-05 20:31:19 - INFO - stdout - {'loss': 0.6676, 'grad_norm': 1.1650595664978027, 'learning_rate': 1.1185409547255295e-05, 'epoch': 1.44} +2025-02-05 20:31:19 - ERROR - stderr - 48%|████▊ | 10731/22434 [10:23:39<8:05:58, 2.49s/it] +2025-02-05 20:31:22 - ERROR - stderr - 48%|████▊ | 10732/22434 [10:23:42<8:04:57, 2.49s/it] +2025-02-05 20:31:22 - ERROR - stderr - +2025-02-05 20:31:22 - ERROR - stderr - +2025-02-05 20:31:22 - INFO - stdout - {'loss': 0.6955, 'grad_norm': 1.1004188060760498, 'learning_rate': 1.118397596790286e-05, 'epoch': 1.44} +2025-02-05 20:31:22 - ERROR - stderr - 48%|████▊ | 10732/22434 [10:23:42<8:04:57, 2.49s/it] +2025-02-05 20:31:24 - ERROR - stderr - 48%|████▊ | 10733/22434 [10:23:44<8:03:37, 2.48s/it] +2025-02-05 20:31:24 - ERROR - stderr - +2025-02-05 20:31:24 - ERROR - stderr - +2025-02-05 20:31:24 - INFO - stdout - {'loss': 0.6594, 'grad_norm': 1.0259772539138794, 'learning_rate': 1.1182542363871578e-05, 'epoch': 1.44} +2025-02-05 20:31:24 - ERROR - stderr - 48%|████▊ | 10733/22434 [10:23:44<8:03:37, 2.48s/it] +2025-02-05 20:31:27 - ERROR - stderr - 48%|████▊ | 10734/22434 [10:23:47<8:11:51, 2.52s/it] +2025-02-05 20:31:27 - ERROR - stderr - +2025-02-05 20:31:27 - ERROR - stderr - +2025-02-05 20:31:27 - INFO - stdout - {'loss': 0.7104, 'grad_norm': 1.1806586980819702, 
'learning_rate': 1.1181108735191332e-05, 'epoch': 1.44} +2025-02-05 20:31:27 - ERROR - stderr - 48%|████▊ | 10734/22434 [10:23:47<8:11:51, 2.52s/it] +2025-02-05 20:31:29 - ERROR - stderr - 48%|████▊ | 10735/22434 [10:23:49<8:07:17, 2.50s/it] +2025-02-05 20:31:29 - ERROR - stderr - +2025-02-05 20:31:29 - ERROR - stderr - +2025-02-05 20:31:29 - INFO - stdout - {'loss': 0.7018, 'grad_norm': 1.122730016708374, 'learning_rate': 1.117967508189201e-05, 'epoch': 1.44} +2025-02-05 20:31:29 - ERROR - stderr - 48%|████▊ | 10735/22434 [10:23:49<8:07:17, 2.50s/it] +2025-02-05 20:31:32 - ERROR - stderr - 48%|████▊ | 10736/22434 [10:23:52<8:10:03, 2.51s/it] +2025-02-05 20:31:32 - ERROR - stderr - +2025-02-05 20:31:32 - ERROR - stderr - +2025-02-05 20:31:32 - INFO - stdout - {'loss': 0.748, 'grad_norm': 1.2305117845535278, 'learning_rate': 1.1178241404003485e-05, 'epoch': 1.44} +2025-02-05 20:31:32 - ERROR - stderr - 48%|████▊ | 10736/22434 [10:23:52<8:10:03, 2.51s/it] +2025-02-05 20:31:34 - ERROR - stderr - 48%|████▊ | 10737/22434 [10:23:54<8:09:06, 2.51s/it] +2025-02-05 20:31:35 - ERROR - stderr - +2025-02-05 20:31:35 - ERROR - stderr - +2025-02-05 20:31:35 - INFO - stdout - {'loss': 0.6805, 'grad_norm': 1.1225221157073975, 'learning_rate': 1.1176807701555647e-05, 'epoch': 1.44} +2025-02-05 20:31:35 - ERROR - stderr - 48%|████▊ | 10737/22434 [10:23:54<8:09:06, 2.51s/it] +2025-02-05 20:31:37 - ERROR - stderr - 48%|████▊ | 10738/22434 [10:23:57<8:08:42, 2.51s/it] +2025-02-05 20:31:37 - ERROR - stderr - +2025-02-05 20:31:37 - ERROR - stderr - +2025-02-05 20:31:37 - INFO - stdout - {'loss': 0.7101, 'grad_norm': 1.1370915174484253, 'learning_rate': 1.1175373974578378e-05, 'epoch': 1.44} +2025-02-05 20:31:37 - ERROR - stderr - 48%|████▊ | 10738/22434 [10:23:57<8:08:42, 2.51s/it] +2025-02-05 20:31:40 - ERROR - stderr - 48%|████▊ | 10739/22434 [10:23:59<8:20:26, 2.57s/it] +2025-02-05 20:31:40 - ERROR - stderr - +2025-02-05 20:31:40 - ERROR - stderr - +2025-02-05 20:31:40 - INFO - stdout 
- {'loss': 0.7469, 'grad_norm': 1.371666431427002, 'learning_rate': 1.1173940223101562e-05, 'epoch': 1.44} +2025-02-05 20:31:40 - ERROR - stderr - 48%|████▊ | 10739/22434 [10:23:59<8:20:26, 2.57s/it] +2025-02-05 20:31:42 - ERROR - stderr - 48%|████▊ | 10740/22434 [10:24:02<8:14:39, 2.54s/it] +2025-02-05 20:31:42 - ERROR - stderr - +2025-02-05 20:31:42 - ERROR - stderr - +2025-02-05 20:31:42 - INFO - stdout - {'loss': 0.7168, 'grad_norm': 1.3087252378463745, 'learning_rate': 1.1172506447155088e-05, 'epoch': 1.44} +2025-02-05 20:31:42 - ERROR - stderr - 48%|████▊ | 10740/22434 [10:24:02<8:14:39, 2.54s/it] +2025-02-05 20:31:45 - ERROR - stderr - 48%|████▊ | 10741/22434 [10:24:04<8:14:14, 2.54s/it] +2025-02-05 20:31:45 - ERROR - stderr - +2025-02-05 20:31:45 - ERROR - stderr - +2025-02-05 20:31:45 - INFO - stdout - {'loss': 0.7223, 'grad_norm': 1.1941101551055908, 'learning_rate': 1.1171072646768836e-05, 'epoch': 1.44} +2025-02-05 20:31:45 - ERROR - stderr - 48%|████▊ | 10741/22434 [10:24:04<8:14:14, 2.54s/it] +2025-02-05 20:31:47 - ERROR - stderr - 48%|████▊ | 10742/22434 [10:24:07<8:15:29, 2.54s/it] +2025-02-05 20:31:47 - ERROR - stderr - +2025-02-05 20:31:47 - ERROR - stderr - +2025-02-05 20:31:47 - INFO - stdout - {'loss': 0.7951, 'grad_norm': 1.3124561309814453, 'learning_rate': 1.1169638821972698e-05, 'epoch': 1.44} +2025-02-05 20:31:47 - ERROR - stderr - 48%|████▊ | 10742/22434 [10:24:07<8:15:29, 2.54s/it] +2025-02-05 20:31:50 - ERROR - stderr - 48%|████▊ | 10743/22434 [10:24:10<8:16:37, 2.55s/it] +2025-02-05 20:31:50 - ERROR - stderr - +2025-02-05 20:31:50 - ERROR - stderr - +2025-02-05 20:31:50 - INFO - stdout - {'loss': 0.7362, 'grad_norm': 1.3282922506332397, 'learning_rate': 1.1168204972796559e-05, 'epoch': 1.44} +2025-02-05 20:31:50 - ERROR - stderr - 48%|████▊ | 10743/22434 [10:24:10<8:16:37, 2.55s/it] +2025-02-05 20:31:52 - ERROR - stderr - 48%|████▊ | 10744/22434 [10:24:12<8:16:33, 2.55s/it] +2025-02-05 20:31:52 - ERROR - stderr - +2025-02-05 20:31:52 - 
ERROR - stderr - +2025-02-05 20:31:52 - INFO - stdout - {'loss': 0.6825, 'grad_norm': 1.277417778968811, 'learning_rate': 1.1166771099270303e-05, 'epoch': 1.44} +2025-02-05 20:31:52 - ERROR - stderr - 48%|████▊ | 10744/22434 [10:24:12<8:16:33, 2.55s/it] +2025-02-05 20:31:55 - ERROR - stderr - 48%|████▊ | 10745/22434 [10:24:15<8:25:57, 2.60s/it] +2025-02-05 20:31:55 - ERROR - stderr - +2025-02-05 20:31:55 - ERROR - stderr - +2025-02-05 20:31:55 - INFO - stdout - {'loss': 0.7135, 'grad_norm': 1.3105156421661377, 'learning_rate': 1.116533720142382e-05, 'epoch': 1.44} +2025-02-05 20:31:55 - ERROR - stderr - 48%|████▊ | 10745/22434 [10:24:15<8:25:57, 2.60s/it] +2025-02-05 20:31:58 - ERROR - stderr - 48%|████▊ | 10746/22434 [10:24:17<8:21:43, 2.58s/it] +2025-02-05 20:31:58 - ERROR - stderr - +2025-02-05 20:31:58 - ERROR - stderr - +2025-02-05 20:31:58 - INFO - stdout - {'loss': 0.7709, 'grad_norm': 1.3703376054763794, 'learning_rate': 1.1163903279286996e-05, 'epoch': 1.44} +2025-02-05 20:31:58 - ERROR - stderr - 48%|████▊ | 10746/22434 [10:24:17<8:21:43, 2.58s/it] +2025-02-05 20:32:00 - ERROR - stderr - 48%|████▊ | 10747/22434 [10:24:20<8:18:34, 2.56s/it] +2025-02-05 20:32:00 - ERROR - stderr - +2025-02-05 20:32:00 - ERROR - stderr - +2025-02-05 20:32:00 - INFO - stdout - {'loss': 0.7257, 'grad_norm': 1.2666243314743042, 'learning_rate': 1.1162469332889726e-05, 'epoch': 1.44} +2025-02-05 20:32:00 - ERROR - stderr - 48%|████▊ | 10747/22434 [10:24:20<8:18:34, 2.56s/it] +2025-02-05 20:32:03 - ERROR - stderr - 48%|████▊ | 10748/22434 [10:24:22<8:17:16, 2.55s/it] +2025-02-05 20:32:03 - ERROR - stderr - +2025-02-05 20:32:03 - ERROR - stderr - +2025-02-05 20:32:03 - INFO - stdout - {'loss': 0.7716, 'grad_norm': 1.351369857788086, 'learning_rate': 1.1161035362261891e-05, 'epoch': 1.44} +2025-02-05 20:32:03 - ERROR - stderr - 48%|████▊ | 10748/22434 [10:24:22<8:17:16, 2.55s/it] +2025-02-05 20:32:05 - ERROR - stderr - 48%|████▊ | 10749/22434 [10:24:25<8:12:13, 2.53s/it] 
+2025-02-05 20:32:05 - ERROR - stderr - +2025-02-05 20:32:05 - ERROR - stderr - +2025-02-05 20:32:05 - INFO - stdout - {'loss': 0.7361, 'grad_norm': 1.214685320854187, 'learning_rate': 1.1159601367433389e-05, 'epoch': 1.44} +2025-02-05 20:32:05 - ERROR - stderr - 48%|████▊ | 10749/22434 [10:24:25<8:12:13, 2.53s/it] +2025-02-05 20:32:08 - ERROR - stderr - 48%|████▊ | 10750/22434 [10:24:27<8:09:30, 2.51s/it] +2025-02-05 20:32:08 - ERROR - stderr - +2025-02-05 20:32:08 - ERROR - stderr - +2025-02-05 20:32:08 - INFO - stdout - {'loss': 0.6321, 'grad_norm': 1.0977617502212524, 'learning_rate': 1.1158167348434103e-05, 'epoch': 1.44} +2025-02-05 20:32:08 - ERROR - stderr - 48%|████▊ | 10750/22434 [10:24:27<8:09:30, 2.51s/it] +2025-02-05 20:32:10 - ERROR - stderr - 48%|████▊ | 10751/22434 [10:24:30<8:09:25, 2.51s/it] +2025-02-05 20:32:10 - ERROR - stderr - +2025-02-05 20:32:10 - ERROR - stderr - +2025-02-05 20:32:10 - INFO - stdout - {'loss': 0.6667, 'grad_norm': 1.1413968801498413, 'learning_rate': 1.1156733305293928e-05, 'epoch': 1.44} +2025-02-05 20:32:10 - ERROR - stderr - 48%|████▊ | 10751/22434 [10:24:30<8:09:25, 2.51s/it] +2025-02-05 20:32:13 - ERROR - stderr - 48%|████▊ | 10752/22434 [10:24:32<8:10:16, 2.52s/it] +2025-02-05 20:32:13 - ERROR - stderr - +2025-02-05 20:32:13 - ERROR - stderr - +2025-02-05 20:32:13 - INFO - stdout - {'loss': 0.7224, 'grad_norm': 1.2798453569412231, 'learning_rate': 1.1155299238042754e-05, 'epoch': 1.44} +2025-02-05 20:32:13 - ERROR - stderr - 48%|████▊ | 10752/22434 [10:24:32<8:10:16, 2.52s/it] +2025-02-05 20:32:15 - ERROR - stderr - 48%|████▊ | 10753/22434 [10:24:35<8:07:48, 2.51s/it] +2025-02-05 20:32:15 - ERROR - stderr - +2025-02-05 20:32:15 - ERROR - stderr - +2025-02-05 20:32:15 - INFO - stdout - {'loss': 0.6471, 'grad_norm': 1.2000062465667725, 'learning_rate': 1.1153865146710471e-05, 'epoch': 1.44} +2025-02-05 20:32:15 - ERROR - stderr - 48%|████▊ | 10753/22434 [10:24:35<8:07:48, 2.51s/it] +2025-02-05 20:32:18 - ERROR - stderr 
- 48%|████▊ | 10754/22434 [10:24:37<8:07:29, 2.50s/it] +2025-02-05 20:32:18 - ERROR - stderr - +2025-02-05 20:32:18 - ERROR - stderr - +2025-02-05 20:32:18 - INFO - stdout - {'loss': 0.7592, 'grad_norm': 1.2271296977996826, 'learning_rate': 1.1152431031326978e-05, 'epoch': 1.44} +2025-02-05 20:32:18 - ERROR - stderr - 48%|████▊ | 10754/22434 [10:24:37<8:07:29, 2.50s/it] +2025-02-05 20:32:20 - ERROR - stderr - 48%|████▊ | 10755/22434 [10:24:40<8:05:01, 2.49s/it] +2025-02-05 20:32:20 - ERROR - stderr - +2025-02-05 20:32:20 - ERROR - stderr - +2025-02-05 20:32:20 - INFO - stdout - {'loss': 0.724, 'grad_norm': 1.1931766271591187, 'learning_rate': 1.115099689192216e-05, 'epoch': 1.44} +2025-02-05 20:32:20 - ERROR - stderr - 48%|████▊ | 10755/22434 [10:24:40<8:05:01, 2.49s/it] +2025-02-05 20:32:23 - ERROR - stderr - 48%|████▊ | 10756/22434 [10:24:42<8:07:36, 2.51s/it] +2025-02-05 20:32:23 - ERROR - stderr - +2025-02-05 20:32:23 - ERROR - stderr - +2025-02-05 20:32:23 - INFO - stdout - {'loss': 0.7347, 'grad_norm': 1.2668206691741943, 'learning_rate': 1.1149562728525913e-05, 'epoch': 1.44} +2025-02-05 20:32:23 - ERROR - stderr - 48%|████▊ | 10756/22434 [10:24:42<8:07:36, 2.51s/it] +2025-02-05 20:32:25 - ERROR - stderr - 48%|████▊ | 10757/22434 [10:24:45<8:06:41, 2.50s/it] +2025-02-05 20:32:25 - ERROR - stderr - +2025-02-05 20:32:25 - ERROR - stderr - +2025-02-05 20:32:25 - INFO - stdout - {'loss': 0.6349, 'grad_norm': 1.1446300745010376, 'learning_rate': 1.1148128541168133e-05, 'epoch': 1.44} +2025-02-05 20:32:25 - ERROR - stderr - 48%|████▊ | 10757/22434 [10:24:45<8:06:41, 2.50s/it] +2025-02-05 20:32:28 - ERROR - stderr - 48%|████▊ | 10758/22434 [10:24:47<8:08:09, 2.51s/it] +2025-02-05 20:32:28 - ERROR - stderr - +2025-02-05 20:32:28 - ERROR - stderr - +2025-02-05 20:32:28 - INFO - stdout - {'loss': 0.7141, 'grad_norm': 1.1940571069717407, 'learning_rate': 1.1146694329878709e-05, 'epoch': 1.44} +2025-02-05 20:32:28 - ERROR - stderr - 48%|████▊ | 10758/22434 
[10:24:47<8:08:09, 2.51s/it] +2025-02-05 20:32:30 - ERROR - stderr - 48%|████▊ | 10759/22434 [10:24:50<8:04:56, 2.49s/it] +2025-02-05 20:32:30 - ERROR - stderr - +2025-02-05 20:32:30 - ERROR - stderr - +2025-02-05 20:32:30 - INFO - stdout - {'loss': 0.7112, 'grad_norm': 1.339685320854187, 'learning_rate': 1.114526009468754e-05, 'epoch': 1.44} +2025-02-05 20:32:30 - ERROR - stderr - 48%|████▊ | 10759/22434 [10:24:50<8:04:56, 2.49s/it] +2025-02-05 20:32:33 - ERROR - stderr - 48%|████▊ | 10760/22434 [10:24:52<8:05:56, 2.50s/it] +2025-02-05 20:32:33 - ERROR - stderr - +2025-02-05 20:32:33 - ERROR - stderr - +2025-02-05 20:32:33 - INFO - stdout - {'loss': 0.6617, 'grad_norm': 1.1376460790634155, 'learning_rate': 1.1143825835624521e-05, 'epoch': 1.44} +2025-02-05 20:32:33 - ERROR - stderr - 48%|████▊ | 10760/22434 [10:24:52<8:05:56, 2.50s/it] +2025-02-05 20:32:35 - ERROR - stderr - 48%|████▊ | 10761/22434 [10:24:55<8:06:49, 2.50s/it] +2025-02-05 20:32:35 - ERROR - stderr - +2025-02-05 20:32:35 - ERROR - stderr - +2025-02-05 20:32:35 - INFO - stdout - {'loss': 0.6487, 'grad_norm': 1.1699087619781494, 'learning_rate': 1.1142391552719548e-05, 'epoch': 1.44} +2025-02-05 20:32:35 - ERROR - stderr - 48%|████▊ | 10761/22434 [10:24:55<8:06:49, 2.50s/it] +2025-02-05 20:32:38 - ERROR - stderr - 48%|████▊ | 10762/22434 [10:24:57<8:14:19, 2.54s/it] +2025-02-05 20:32:38 - ERROR - stderr - +2025-02-05 20:32:38 - ERROR - stderr - +2025-02-05 20:32:38 - INFO - stdout - {'loss': 0.6502, 'grad_norm': 1.1421691179275513, 'learning_rate': 1.1140957246002513e-05, 'epoch': 1.44} +2025-02-05 20:32:38 - ERROR - stderr - 48%|████▊ | 10762/22434 [10:24:58<8:14:19, 2.54s/it] +2025-02-05 20:32:40 - ERROR - stderr - 48%|████▊ | 10763/22434 [10:25:00<8:13:04, 2.53s/it] +2025-02-05 20:32:40 - ERROR - stderr - +2025-02-05 20:32:40 - ERROR - stderr - +2025-02-05 20:32:40 - INFO - stdout - {'loss': 0.6997, 'grad_norm': 1.2844340801239014, 'learning_rate': 1.113952291550332e-05, 'epoch': 1.44} +2025-02-05 
20:32:40 - ERROR - stderr - 48%|████▊ | 10763/22434 [10:25:00<8:13:04, 2.53s/it] +2025-02-05 20:32:43 - ERROR - stderr - 48%|████▊ | 10764/22434 [10:25:03<8:18:20, 2.56s/it] +2025-02-05 20:32:43 - ERROR - stderr - +2025-02-05 20:32:43 - ERROR - stderr - +2025-02-05 20:32:43 - INFO - stdout - {'loss': 0.6734, 'grad_norm': 1.1924713850021362, 'learning_rate': 1.113808856125186e-05, 'epoch': 1.44} +2025-02-05 20:32:43 - ERROR - stderr - 48%|████▊ | 10764/22434 [10:25:03<8:18:20, 2.56s/it] +2025-02-05 20:32:45 - ERROR - stderr - 48%|████▊ | 10765/22434 [10:25:05<8:13:06, 2.54s/it] +2025-02-05 20:32:45 - ERROR - stderr - +2025-02-05 20:32:45 - ERROR - stderr - +2025-02-05 20:32:45 - INFO - stdout - {'loss': 0.7421, 'grad_norm': 1.2711251974105835, 'learning_rate': 1.113665418327803e-05, 'epoch': 1.44} +2025-02-05 20:32:45 - ERROR - stderr - 48%|████▊ | 10765/22434 [10:25:05<8:13:06, 2.54s/it] +2025-02-05 20:32:48 - ERROR - stderr - 48%|████▊ | 10766/22434 [10:25:08<8:11:27, 2.53s/it] +2025-02-05 20:32:48 - ERROR - stderr - +2025-02-05 20:32:48 - ERROR - stderr - +2025-02-05 20:32:48 - INFO - stdout - {'loss': 0.6213, 'grad_norm': 1.0167975425720215, 'learning_rate': 1.1135219781611734e-05, 'epoch': 1.44} +2025-02-05 20:32:48 - ERROR - stderr - 48%|████▊ | 10766/22434 [10:25:08<8:11:27, 2.53s/it] +2025-02-05 20:32:50 - ERROR - stderr - 48%|████▊ | 10767/22434 [10:25:10<8:13:04, 2.54s/it] +2025-02-05 20:32:50 - ERROR - stderr - +2025-02-05 20:32:50 - ERROR - stderr - +2025-02-05 20:32:50 - INFO - stdout - {'loss': 0.726, 'grad_norm': 1.1935372352600098, 'learning_rate': 1.1133785356282872e-05, 'epoch': 1.44} +2025-02-05 20:32:50 - ERROR - stderr - 48%|████▊ | 10767/22434 [10:25:10<8:13:04, 2.54s/it] +2025-02-05 20:32:53 - ERROR - stderr - 48%|████▊ | 10768/22434 [10:25:13<8:10:25, 2.52s/it] +2025-02-05 20:32:53 - ERROR - stderr - +2025-02-05 20:32:53 - ERROR - stderr - +2025-02-05 20:32:53 - INFO - stdout - {'loss': 0.7531, 'grad_norm': 1.314681053161621, 'learning_rate': 
1.1132350907321334e-05, 'epoch': 1.44} +2025-02-05 20:32:53 - ERROR - stderr - 48%|████▊ | 10768/22434 [10:25:13<8:10:25, 2.52s/it] +2025-02-05 20:32:55 - ERROR - stderr - 48%|████▊ | 10769/22434 [10:25:15<8:13:29, 2.54s/it] +2025-02-05 20:32:56 - ERROR - stderr - +2025-02-05 20:32:56 - ERROR - stderr - +2025-02-05 20:32:56 - INFO - stdout - {'loss': 0.8185, 'grad_norm': 1.3542203903198242, 'learning_rate': 1.1130916434757027e-05, 'epoch': 1.44} +2025-02-05 20:32:56 - ERROR - stderr - 48%|████▊ | 10769/22434 [10:25:15<8:13:29, 2.54s/it] +2025-02-05 20:32:58 - ERROR - stderr - 48%|████▊ | 10770/22434 [10:25:18<8:06:20, 2.50s/it] +2025-02-05 20:32:58 - ERROR - stderr - +2025-02-05 20:32:58 - ERROR - stderr - +2025-02-05 20:32:58 - INFO - stdout - {'loss': 0.6039, 'grad_norm': 1.0340464115142822, 'learning_rate': 1.1129481938619845e-05, 'epoch': 1.44} +2025-02-05 20:32:58 - ERROR - stderr - 48%|████▊ | 10770/22434 [10:25:18<8:06:20, 2.50s/it] +2025-02-05 20:33:00 - ERROR - stderr - 48%|████▊ | 10771/22434 [10:25:20<8:06:07, 2.50s/it] +2025-02-05 20:33:00 - ERROR - stderr - +2025-02-05 20:33:00 - ERROR - stderr - +2025-02-05 20:33:00 - INFO - stdout - {'loss': 0.7762, 'grad_norm': 1.237444519996643, 'learning_rate': 1.1128047418939698e-05, 'epoch': 1.44} +2025-02-05 20:33:00 - ERROR - stderr - 48%|████▊ | 10771/22434 [10:25:20<8:06:07, 2.50s/it] +2025-02-05 20:33:03 - ERROR - stderr - 48%|████▊ | 10772/22434 [10:25:23<8:10:47, 2.53s/it] +2025-02-05 20:33:03 - ERROR - stderr - +2025-02-05 20:33:03 - ERROR - stderr - +2025-02-05 20:33:03 - INFO - stdout - {'loss': 0.6438, 'grad_norm': 1.1049175262451172, 'learning_rate': 1.1126612875746479e-05, 'epoch': 1.44} +2025-02-05 20:33:03 - ERROR - stderr - 48%|████▊ | 10772/22434 [10:25:23<8:10:47, 2.53s/it] +2025-02-05 20:33:06 - ERROR - stderr - 48%|████▊ | 10773/22434 [10:25:25<8:13:07, 2.54s/it] +2025-02-05 20:33:06 - ERROR - stderr - +2025-02-05 20:33:06 - ERROR - stderr - +2025-02-05 20:33:06 - INFO - stdout - {'loss': 
0.668, 'grad_norm': 1.2687772512435913, 'learning_rate': 1.1125178309070094e-05, 'epoch': 1.44} +2025-02-05 20:33:06 - ERROR - stderr - 48%|████▊ | 10773/22434 [10:25:25<8:13:07, 2.54s/it] +2025-02-05 20:33:09 - ERROR - stderr - 48%|████▊ | 10774/22434 [10:25:28<8:46:10, 2.71s/it] +2025-02-05 20:33:09 - ERROR - stderr - +2025-02-05 20:33:09 - ERROR - stderr - +2025-02-05 20:33:09 - INFO - stdout - {'loss': 0.6637, 'grad_norm': 1.2224650382995605, 'learning_rate': 1.1123743718940443e-05, 'epoch': 1.44} +2025-02-05 20:33:09 - ERROR - stderr - 48%|████▊ | 10774/22434 [10:25:28<8:46:10, 2.71s/it] +2025-02-05 20:33:11 - ERROR - stderr - 48%|████▊ | 10775/22434 [10:25:31<8:32:37, 2.64s/it] +2025-02-05 20:33:11 - ERROR - stderr - +2025-02-05 20:33:11 - ERROR - stderr - +2025-02-05 20:33:11 - INFO - stdout - {'loss': 0.6647, 'grad_norm': 1.3154124021530151, 'learning_rate': 1.1122309105387433e-05, 'epoch': 1.44} +2025-02-05 20:33:11 - ERROR - stderr - 48%|████▊ | 10775/22434 [10:25:31<8:32:37, 2.64s/it] +2025-02-05 20:33:14 - ERROR - stderr - 48%|████▊ | 10776/22434 [10:25:33<8:22:02, 2.58s/it] +2025-02-05 20:33:14 - ERROR - stderr - +2025-02-05 20:33:14 - ERROR - stderr - +2025-02-05 20:33:14 - INFO - stdout - {'loss': 0.6574, 'grad_norm': 1.0423606634140015, 'learning_rate': 1.112087446844096e-05, 'epoch': 1.44} +2025-02-05 20:33:14 - ERROR - stderr - 48%|████▊ | 10776/22434 [10:25:33<8:22:02, 2.58s/it] +2025-02-05 20:33:16 - ERROR - stderr - 48%|████▊ | 10777/22434 [10:25:36<8:17:28, 2.56s/it] +2025-02-05 20:33:16 - ERROR - stderr - +2025-02-05 20:33:16 - ERROR - stderr - +2025-02-05 20:33:16 - INFO - stdout - {'loss': 0.8095, 'grad_norm': 1.3187575340270996, 'learning_rate': 1.1119439808130932e-05, 'epoch': 1.44} +2025-02-05 20:33:16 - ERROR - stderr - 48%|████▊ | 10777/22434 [10:25:36<8:17:28, 2.56s/it] +2025-02-05 20:33:19 - ERROR - stderr - 48%|████▊ | 10778/22434 [10:25:38<8:09:39, 2.52s/it] +2025-02-05 20:33:19 - ERROR - stderr - +2025-02-05 20:33:19 - ERROR - 
stderr - +2025-02-05 20:33:19 - INFO - stdout - {'loss': 0.8306, 'grad_norm': 1.2069398164749146, 'learning_rate': 1.111800512448725e-05, 'epoch': 1.44} +2025-02-05 20:33:19 - ERROR - stderr - 48%|████▊ | 10778/22434 [10:25:38<8:09:39, 2.52s/it] +2025-02-05 20:33:21 - ERROR - stderr - 48%|████▊ | 10779/22434 [10:25:41<8:20:49, 2.58s/it] +2025-02-05 20:33:21 - ERROR - stderr - +2025-02-05 20:33:21 - ERROR - stderr - +2025-02-05 20:33:21 - INFO - stdout - {'loss': 0.6885, 'grad_norm': 1.268981695175171, 'learning_rate': 1.1116570417539825e-05, 'epoch': 1.44} +2025-02-05 20:33:21 - ERROR - stderr - 48%|████▊ | 10779/22434 [10:25:41<8:20:49, 2.58s/it] +2025-02-05 20:33:24 - ERROR - stderr - 48%|████▊ | 10780/22434 [10:25:44<8:24:30, 2.60s/it] +2025-02-05 20:33:24 - ERROR - stderr - +2025-02-05 20:33:24 - ERROR - stderr - +2025-02-05 20:33:24 - INFO - stdout - {'loss': 0.7204, 'grad_norm': 1.2179815769195557, 'learning_rate': 1.1115135687318556e-05, 'epoch': 1.44} +2025-02-05 20:33:24 - ERROR - stderr - 48%|████▊ | 10780/22434 [10:25:44<8:24:30, 2.60s/it] +2025-02-05 20:33:26 - ERROR - stderr - 48%|████▊ | 10781/22434 [10:25:46<8:19:25, 2.57s/it] +2025-02-05 20:33:26 - ERROR - stderr - +2025-02-05 20:33:26 - ERROR - stderr - +2025-02-05 20:33:26 - INFO - stdout - {'loss': 0.6206, 'grad_norm': 1.058647871017456, 'learning_rate': 1.111370093385335e-05, 'epoch': 1.44} +2025-02-05 20:33:26 - ERROR - stderr - 48%|████▊ | 10781/22434 [10:25:46<8:19:25, 2.57s/it] +2025-02-05 20:33:29 - ERROR - stderr - 48%|████▊ | 10782/22434 [10:25:49<8:15:54, 2.55s/it] +2025-02-05 20:33:29 - ERROR - stderr - +2025-02-05 20:33:29 - ERROR - stderr - +2025-02-05 20:33:29 - INFO - stdout - {'loss': 0.7116, 'grad_norm': 1.2202389240264893, 'learning_rate': 1.1112266157174116e-05, 'epoch': 1.44} +2025-02-05 20:33:29 - ERROR - stderr - 48%|████▊ | 10782/22434 [10:25:49<8:15:54, 2.55s/it] +2025-02-05 20:33:32 - ERROR - stderr - 48%|████▊ | 10783/22434 [10:25:52<8:39:11, 2.67s/it] +2025-02-05 
20:33:32 - ERROR - stderr - +2025-02-05 20:33:32 - ERROR - stderr - +2025-02-05 20:33:32 - INFO - stdout - {'loss': 0.723, 'grad_norm': 1.2138216495513916, 'learning_rate': 1.111083135731076e-05, 'epoch': 1.44} +2025-02-05 20:33:32 - ERROR - stderr - 48%|████▊ | 10783/22434 [10:25:52<8:39:11, 2.67s/it] +2025-02-05 20:33:35 - ERROR - stderr - 48%|████▊ | 10784/22434 [10:25:55<8:53:49, 2.75s/it] +2025-02-05 20:33:35 - ERROR - stderr - +2025-02-05 20:33:35 - ERROR - stderr - +2025-02-05 20:33:35 - INFO - stdout - {'loss': 0.7454, 'grad_norm': 1.2124340534210205, 'learning_rate': 1.110939653429318e-05, 'epoch': 1.44} +2025-02-05 20:33:35 - ERROR - stderr - 48%|████▊ | 10784/22434 [10:25:55<8:53:49, 2.75s/it] +2025-02-05 20:33:37 - ERROR - stderr - 48%|████▊ | 10785/22434 [10:25:57<8:39:39, 2.68s/it] +2025-02-05 20:33:37 - ERROR - stderr - +2025-02-05 20:33:37 - ERROR - stderr - +2025-02-05 20:33:37 - INFO - stdout - {'loss': 0.7326, 'grad_norm': 1.2436857223510742, 'learning_rate': 1.1107961688151297e-05, 'epoch': 1.44} +2025-02-05 20:33:37 - ERROR - stderr - 48%|████▊ | 10785/22434 [10:25:57<8:39:39, 2.68s/it] +2025-02-05 20:33:40 - ERROR - stderr - 48%|████▊ | 10786/22434 [10:26:00<8:28:01, 2.62s/it] +2025-02-05 20:33:40 - ERROR - stderr - +2025-02-05 20:33:40 - ERROR - stderr - +2025-02-05 20:33:40 - INFO - stdout - {'loss': 0.8119, 'grad_norm': 1.2278088331222534, 'learning_rate': 1.1106526818915008e-05, 'epoch': 1.44} +2025-02-05 20:33:40 - ERROR - stderr - 48%|████▊ | 10786/22434 [10:26:00<8:28:01, 2.62s/it] +2025-02-05 20:33:42 - ERROR - stderr - 48%|████▊ | 10787/22434 [10:26:02<8:21:20, 2.58s/it] +2025-02-05 20:33:42 - ERROR - stderr - +2025-02-05 20:33:42 - ERROR - stderr - +2025-02-05 20:33:42 - INFO - stdout - {'loss': 0.7593, 'grad_norm': 1.0857634544372559, 'learning_rate': 1.1105091926614234e-05, 'epoch': 1.44} +2025-02-05 20:33:42 - ERROR - stderr - 48%|████▊ | 10787/22434 [10:26:02<8:21:20, 2.58s/it] +2025-02-05 20:33:45 - ERROR - stderr - 48%|████▊ | 
10788/22434 [10:26:05<8:34:24, 2.65s/it] +2025-02-05 20:33:45 - ERROR - stderr - +2025-02-05 20:33:45 - ERROR - stderr - +2025-02-05 20:33:45 - INFO - stdout - {'loss': 0.6385, 'grad_norm': 1.0535091161727905, 'learning_rate': 1.110365701127887e-05, 'epoch': 1.44} +2025-02-05 20:33:45 - ERROR - stderr - 48%|████▊ | 10788/22434 [10:26:05<8:34:24, 2.65s/it] +2025-02-05 20:33:48 - ERROR - stderr - 48%|████▊ | 10789/22434 [10:26:07<8:34:34, 2.65s/it] +2025-02-05 20:33:48 - ERROR - stderr - +2025-02-05 20:33:48 - ERROR - stderr - +2025-02-05 20:33:48 - INFO - stdout - {'loss': 0.6395, 'grad_norm': 1.256480097770691, 'learning_rate': 1.1102222072938832e-05, 'epoch': 1.44} +2025-02-05 20:33:48 - ERROR - stderr - 48%|████▊ | 10789/22434 [10:26:08<8:34:34, 2.65s/it] +2025-02-05 20:33:50 - ERROR - stderr - 48%|████▊ | 10790/22434 [10:26:10<8:41:18, 2.69s/it] +2025-02-05 20:33:51 - ERROR - stderr - +2025-02-05 20:33:51 - ERROR - stderr - +2025-02-05 20:33:51 - INFO - stdout - {'loss': 0.7478, 'grad_norm': 1.1999760866165161, 'learning_rate': 1.1100787111624031e-05, 'epoch': 1.44} +2025-02-05 20:33:51 - ERROR - stderr - 48%|████▊ | 10790/22434 [10:26:10<8:41:18, 2.69s/it] +2025-02-05 20:33:53 - ERROR - stderr - 48%|████▊ | 10791/22434 [10:26:13<8:32:05, 2.64s/it] +2025-02-05 20:33:53 - ERROR - stderr - +2025-02-05 20:33:53 - ERROR - stderr - +2025-02-05 20:33:53 - INFO - stdout - {'loss': 0.7965, 'grad_norm': 1.2586785554885864, 'learning_rate': 1.1099352127364373e-05, 'epoch': 1.44} +2025-02-05 20:33:53 - ERROR - stderr - 48%|████▊ | 10791/22434 [10:26:13<8:32:05, 2.64s/it] +2025-02-05 20:33:55 - ERROR - stderr - 48%|████▊ | 10792/22434 [10:26:15<8:19:40, 2.58s/it] +2025-02-05 20:33:55 - ERROR - stderr - +2025-02-05 20:33:55 - ERROR - stderr - +2025-02-05 20:33:55 - INFO - stdout - {'loss': 0.7308, 'grad_norm': 1.2181401252746582, 'learning_rate': 1.1097917120189778e-05, 'epoch': 1.44} +2025-02-05 20:33:55 - ERROR - stderr - 48%|████▊ | 10792/22434 [10:26:15<8:19:40, 
2.58s/it] +2025-02-05 20:33:58 - ERROR - stderr - 48%|████▊ | 10793/22434 [10:26:18<8:13:21, 2.54s/it] +2025-02-05 20:33:58 - ERROR - stderr - +2025-02-05 20:33:58 - ERROR - stderr - +2025-02-05 20:33:58 - INFO - stdout - {'loss': 0.6782, 'grad_norm': 1.1153844594955444, 'learning_rate': 1.1096482090130147e-05, 'epoch': 1.44} +2025-02-05 20:33:58 - ERROR - stderr - 48%|████▊ | 10793/22434 [10:26:18<8:13:21, 2.54s/it] +2025-02-05 20:34:00 - ERROR - stderr - 48%|████▊ | 10794/22434 [10:26:20<8:15:31, 2.55s/it] +2025-02-05 20:34:01 - ERROR - stderr - +2025-02-05 20:34:01 - ERROR - stderr - +2025-02-05 20:34:01 - INFO - stdout - {'loss': 0.712, 'grad_norm': 1.1111050844192505, 'learning_rate': 1.1095047037215397e-05, 'epoch': 1.44} +2025-02-05 20:34:01 - ERROR - stderr - 48%|████▊ | 10794/22434 [10:26:20<8:15:31, 2.55s/it] +2025-02-05 20:34:03 - ERROR - stderr - 48%|████▊ | 10795/22434 [10:26:23<8:13:52, 2.55s/it] +2025-02-05 20:34:03 - ERROR - stderr - +2025-02-05 20:34:03 - ERROR - stderr - +2025-02-05 20:34:03 - INFO - stdout - {'loss': 0.6933, 'grad_norm': 1.078020691871643, 'learning_rate': 1.1093611961475438e-05, 'epoch': 1.44} +2025-02-05 20:34:03 - ERROR - stderr - 48%|████▊ | 10795/22434 [10:26:23<8:13:52, 2.55s/it] +2025-02-05 20:34:06 - ERROR - stderr - 48%|████▊ | 10796/22434 [10:26:25<8:17:25, 2.56s/it] +2025-02-05 20:34:06 - ERROR - stderr - +2025-02-05 20:34:06 - ERROR - stderr - +2025-02-05 20:34:06 - INFO - stdout - {'loss': 0.6868, 'grad_norm': 1.074216604232788, 'learning_rate': 1.109217686294019e-05, 'epoch': 1.44} +2025-02-05 20:34:06 - ERROR - stderr - 48%|████▊ | 10796/22434 [10:26:25<8:17:25, 2.56s/it] +2025-02-05 20:34:08 - ERROR - stderr - 48%|████▊ | 10797/22434 [10:26:28<8:12:53, 2.54s/it] +2025-02-05 20:34:08 - ERROR - stderr - +2025-02-05 20:34:08 - ERROR - stderr - +2025-02-05 20:34:08 - INFO - stdout - {'loss': 0.7729, 'grad_norm': 1.1970863342285156, 'learning_rate': 1.1090741741639552e-05, 'epoch': 1.44} +2025-02-05 20:34:08 - ERROR - 
stderr - 48%|████▊ | 10797/22434 [10:26:28<8:12:53, 2.54s/it] +2025-02-05 20:34:11 - ERROR - stderr - 48%|████▊ | 10798/22434 [10:26:30<8:14:33, 2.55s/it] +2025-02-05 20:34:11 - ERROR - stderr - +2025-02-05 20:34:11 - ERROR - stderr - +2025-02-05 20:34:11 - INFO - stdout - {'loss': 0.6107, 'grad_norm': 1.0879454612731934, 'learning_rate': 1.108930659760345e-05, 'epoch': 1.44} +2025-02-05 20:34:11 - ERROR - stderr - 48%|████▊ | 10798/22434 [10:26:30<8:14:33, 2.55s/it] +2025-02-05 20:34:13 - ERROR - stderr - 48%|████▊ | 10799/22434 [10:26:33<8:11:30, 2.53s/it] +2025-02-05 20:34:13 - ERROR - stderr - +2025-02-05 20:34:13 - ERROR - stderr - +2025-02-05 20:34:13 - INFO - stdout - {'loss': 0.6551, 'grad_norm': 1.1212656497955322, 'learning_rate': 1.1087871430861794e-05, 'epoch': 1.44} +2025-02-05 20:34:13 - ERROR - stderr - 48%|████▊ | 10799/22434 [10:26:33<8:11:30, 2.53s/it] +2025-02-05 20:34:16 - ERROR - stderr - 48%|████▊ | 10800/22434 [10:26:36<8:13:37, 2.55s/it] +2025-02-05 20:34:16 - ERROR - stderr - +2025-02-05 20:34:16 - ERROR - stderr - +2025-02-05 20:34:16 - INFO - stdout - {'loss': 0.6537, 'grad_norm': 1.083728551864624, 'learning_rate': 1.10864362414445e-05, 'epoch': 1.44} +2025-02-05 20:34:16 - ERROR - stderr - 48%|████▊ | 10800/22434 [10:26:36<8:13:37, 2.55s/it] +2025-02-05 20:34:18 - ERROR - stderr - 48%|████▊ | 10801/22434 [10:26:38<8:08:51, 2.52s/it] +2025-02-05 20:34:18 - ERROR - stderr - +2025-02-05 20:34:18 - ERROR - stderr - +2025-02-05 20:34:18 - INFO - stdout - {'loss': 0.727, 'grad_norm': 1.0738023519515991, 'learning_rate': 1.1085001029381482e-05, 'epoch': 1.44} +2025-02-05 20:34:18 - ERROR - stderr - 48%|████▊ | 10801/22434 [10:26:38<8:08:51, 2.52s/it] +2025-02-05 20:34:21 - ERROR - stderr - 48%|████▊ | 10802/22434 [10:26:41<8:09:43, 2.53s/it] +2025-02-05 20:34:21 - ERROR - stderr - +2025-02-05 20:34:21 - ERROR - stderr - +2025-02-05 20:34:21 - INFO - stdout - {'loss': 0.6675, 'grad_norm': 1.180895447731018, 'learning_rate': 
1.1083565794702655e-05, 'epoch': 1.44} +2025-02-05 20:34:21 - ERROR - stderr - 48%|████▊ | 10802/22434 [10:26:41<8:09:43, 2.53s/it] +2025-02-05 20:34:23 - ERROR - stderr - 48%|████▊ | 10803/22434 [10:26:43<8:05:05, 2.50s/it] +2025-02-05 20:34:23 - ERROR - stderr - +2025-02-05 20:34:23 - ERROR - stderr - +2025-02-05 20:34:23 - INFO - stdout - {'loss': 0.6798, 'grad_norm': 1.1423380374908447, 'learning_rate': 1.1082130537437937e-05, 'epoch': 1.44} +2025-02-05 20:34:23 - ERROR - stderr - 48%|████▊ | 10803/22434 [10:26:43<8:05:05, 2.50s/it] +2025-02-05 20:34:26 - ERROR - stderr - 48%|████▊ | 10804/22434 [10:26:46<8:06:55, 2.51s/it] +2025-02-05 20:34:26 - ERROR - stderr - +2025-02-05 20:34:26 - ERROR - stderr - +2025-02-05 20:34:26 - INFO - stdout - {'loss': 0.6708, 'grad_norm': 1.09044349193573, 'learning_rate': 1.1080695257617243e-05, 'epoch': 1.44} +2025-02-05 20:34:26 - ERROR - stderr - 48%|████▊ | 10804/22434 [10:26:46<8:06:55, 2.51s/it] +2025-02-05 20:34:28 - ERROR - stderr - 48%|████▊ | 10805/22434 [10:26:48<8:04:19, 2.50s/it] +2025-02-05 20:34:28 - ERROR - stderr - +2025-02-05 20:34:28 - ERROR - stderr - +2025-02-05 20:34:28 - INFO - stdout - {'loss': 0.6974, 'grad_norm': 1.8647102117538452, 'learning_rate': 1.1079259955270489e-05, 'epoch': 1.44} +2025-02-05 20:34:28 - ERROR - stderr - 48%|████▊ | 10805/22434 [10:26:48<8:04:19, 2.50s/it] +2025-02-05 20:34:31 - ERROR - stderr - 48%|████▊ | 10806/22434 [10:26:50<8:03:40, 2.50s/it] +2025-02-05 20:34:31 - ERROR - stderr - +2025-02-05 20:34:31 - ERROR - stderr - +2025-02-05 20:34:31 - INFO - stdout - {'loss': 0.7045, 'grad_norm': 1.2608208656311035, 'learning_rate': 1.1077824630427593e-05, 'epoch': 1.45} +2025-02-05 20:34:31 - ERROR - stderr - 48%|████▊ | 10806/22434 [10:26:51<8:03:40, 2.50s/it] +2025-02-05 20:34:33 - ERROR - stderr - 48%|████▊ | 10807/22434 [10:26:53<8:06:14, 2.51s/it] +2025-02-05 20:34:33 - ERROR - stderr - +2025-02-05 20:34:33 - ERROR - stderr - +2025-02-05 20:34:33 - INFO - stdout - {'loss': 
0.7256, 'grad_norm': 1.2280246019363403, 'learning_rate': 1.1076389283118477e-05, 'epoch': 1.45} +2025-02-05 20:34:33 - ERROR - stderr - 48%|████▊ | 10807/22434 [10:26:53<8:06:14, 2.51s/it] +2025-02-05 20:34:36 - ERROR - stderr - 48%|████▊ | 10808/22434 [10:26:56<8:09:38, 2.53s/it] +2025-02-05 20:34:36 - ERROR - stderr - +2025-02-05 20:34:36 - ERROR - stderr - +2025-02-05 20:34:36 - INFO - stdout - {'loss': 0.7089, 'grad_norm': 1.280097246170044, 'learning_rate': 1.1074953913373057e-05, 'epoch': 1.45} +2025-02-05 20:34:36 - ERROR - stderr - 48%|████▊ | 10808/22434 [10:26:56<8:09:38, 2.53s/it] +2025-02-05 20:34:38 - ERROR - stderr - 48%|████▊ | 10809/22434 [10:26:58<8:05:56, 2.51s/it] +2025-02-05 20:34:38 - ERROR - stderr - +2025-02-05 20:34:38 - ERROR - stderr - +2025-02-05 20:34:38 - INFO - stdout - {'loss': 0.764, 'grad_norm': 1.3396214246749878, 'learning_rate': 1.1073518521221249e-05, 'epoch': 1.45} +2025-02-05 20:34:38 - ERROR - stderr - 48%|████▊ | 10809/22434 [10:26:58<8:05:56, 2.51s/it] +2025-02-05 20:34:41 - ERROR - stderr - 48%|████▊ | 10810/22434 [10:27:01<8:04:47, 2.50s/it] +2025-02-05 20:34:41 - ERROR - stderr - +2025-02-05 20:34:41 - ERROR - stderr - +2025-02-05 20:34:41 - INFO - stdout - {'loss': 0.7252, 'grad_norm': 1.3328964710235596, 'learning_rate': 1.1072083106692975e-05, 'epoch': 1.45} +2025-02-05 20:34:41 - ERROR - stderr - 48%|████▊ | 10810/22434 [10:27:01<8:04:47, 2.50s/it] +2025-02-05 20:34:44 - ERROR - stderr - 48%|████▊ | 10811/22434 [10:27:04<8:47:16, 2.72s/it] +2025-02-05 20:34:44 - ERROR - stderr - +2025-02-05 20:34:44 - ERROR - stderr - +2025-02-05 20:34:44 - INFO - stdout - {'loss': 0.7159, 'grad_norm': 1.2853786945343018, 'learning_rate': 1.1070647669818153e-05, 'epoch': 1.45} +2025-02-05 20:34:44 - ERROR - stderr - 48%|████▊ | 10811/22434 [10:27:04<8:47:16, 2.72s/it] +2025-02-05 20:34:46 - ERROR - stderr - 48%|████▊ | 10812/22434 [10:27:06<8:30:35, 2.64s/it] +2025-02-05 20:34:46 - ERROR - stderr - +2025-02-05 20:34:46 - ERROR - 
stderr - +2025-02-05 20:34:46 - INFO - stdout - {'loss': 0.7538, 'grad_norm': 1.2761359214782715, 'learning_rate': 1.106921221062671e-05, 'epoch': 1.45} +2025-02-05 20:34:46 - ERROR - stderr - 48%|████▊ | 10812/22434 [10:27:06<8:30:35, 2.64s/it] +2025-02-05 20:34:49 - ERROR - stderr - 48%|████▊ | 10813/22434 [10:27:09<8:17:19, 2.57s/it] +2025-02-05 20:34:49 - ERROR - stderr - +2025-02-05 20:34:49 - ERROR - stderr - +2025-02-05 20:34:49 - INFO - stdout - {'loss': 0.649, 'grad_norm': 1.2368756532669067, 'learning_rate': 1.1067776729148557e-05, 'epoch': 1.45} +2025-02-05 20:34:49 - ERROR - stderr - 48%|████▊ | 10813/22434 [10:27:09<8:17:19, 2.57s/it] +2025-02-05 20:34:51 - ERROR - stderr - 48%|████▊ | 10814/22434 [10:27:11<8:16:25, 2.56s/it] +2025-02-05 20:34:51 - ERROR - stderr - +2025-02-05 20:34:51 - ERROR - stderr - +2025-02-05 20:34:51 - INFO - stdout - {'loss': 0.7037, 'grad_norm': 1.203789234161377, 'learning_rate': 1.106634122541362e-05, 'epoch': 1.45} +2025-02-05 20:34:51 - ERROR - stderr - 48%|████▊ | 10814/22434 [10:27:11<8:16:25, 2.56s/it] +2025-02-05 20:34:54 - ERROR - stderr - 48%|████▊ | 10815/22434 [10:27:14<8:15:17, 2.56s/it] +2025-02-05 20:34:54 - ERROR - stderr - +2025-02-05 20:34:54 - ERROR - stderr - +2025-02-05 20:34:54 - INFO - stdout - {'loss': 0.7117, 'grad_norm': 1.1252245903015137, 'learning_rate': 1.1064905699451822e-05, 'epoch': 1.45} +2025-02-05 20:34:54 - ERROR - stderr - 48%|████▊ | 10815/22434 [10:27:14<8:15:17, 2.56s/it] +2025-02-05 20:34:56 - ERROR - stderr - 48%|████▊ | 10816/22434 [10:27:16<8:07:52, 2.52s/it] +2025-02-05 20:34:56 - ERROR - stderr - +2025-02-05 20:34:56 - ERROR - stderr - +2025-02-05 20:34:56 - INFO - stdout - {'loss': 0.6723, 'grad_norm': 1.1826584339141846, 'learning_rate': 1.1063470151293083e-05, 'epoch': 1.45} +2025-02-05 20:34:56 - ERROR - stderr - 48%|████▊ | 10816/22434 [10:27:16<8:07:52, 2.52s/it] +2025-02-05 20:34:59 - ERROR - stderr - 48%|████▊ | 10817/22434 [10:27:19<8:07:59, 2.52s/it] +2025-02-05 
20:34:59 - ERROR - stderr - +2025-02-05 20:34:59 - ERROR - stderr - +2025-02-05 20:34:59 - INFO - stdout - {'loss': 0.7461, 'grad_norm': 1.2672951221466064, 'learning_rate': 1.1062034580967327e-05, 'epoch': 1.45} +2025-02-05 20:34:59 - ERROR - stderr - 48%|████▊ | 10817/22434 [10:27:19<8:07:59, 2.52s/it] +2025-02-05 20:35:01 - ERROR - stderr - 48%|████▊ | 10818/22434 [10:27:21<8:12:28, 2.54s/it] +2025-02-05 20:35:02 - ERROR - stderr - +2025-02-05 20:35:02 - ERROR - stderr - +2025-02-05 20:35:02 - INFO - stdout - {'loss': 0.7408, 'grad_norm': 1.2064985036849976, 'learning_rate': 1.1060598988504476e-05, 'epoch': 1.45} +2025-02-05 20:35:02 - ERROR - stderr - 48%|████▊ | 10818/22434 [10:27:21<8:12:28, 2.54s/it] +2025-02-05 20:35:04 - ERROR - stderr - 48%|████▊ | 10819/22434 [10:27:24<8:05:12, 2.51s/it] +2025-02-05 20:35:04 - ERROR - stderr - +2025-02-05 20:35:04 - ERROR - stderr - +2025-02-05 20:35:04 - INFO - stdout - {'loss': 0.6734, 'grad_norm': 1.2159812450408936, 'learning_rate': 1.1059163373934454e-05, 'epoch': 1.45} +2025-02-05 20:35:04 - ERROR - stderr - 48%|████▊ | 10819/22434 [10:27:24<8:05:12, 2.51s/it] +2025-02-05 20:35:06 - ERROR - stderr - 48%|████▊ | 10820/22434 [10:27:26<8:02:23, 2.49s/it] +2025-02-05 20:35:06 - ERROR - stderr - +2025-02-05 20:35:06 - ERROR - stderr - +2025-02-05 20:35:06 - INFO - stdout - {'loss': 0.719, 'grad_norm': 1.2581053972244263, 'learning_rate': 1.1057727737287184e-05, 'epoch': 1.45} +2025-02-05 20:35:06 - ERROR - stderr - 48%|████▊ | 10820/22434 [10:27:26<8:02:23, 2.49s/it] +2025-02-05 20:35:09 - ERROR - stderr - 48%|████▊ | 10821/22434 [10:27:29<8:09:19, 2.53s/it] +2025-02-05 20:35:09 - ERROR - stderr - +2025-02-05 20:35:09 - ERROR - stderr - +2025-02-05 20:35:09 - INFO - stdout - {'loss': 0.6992, 'grad_norm': 1.281354308128357, 'learning_rate': 1.1056292078592595e-05, 'epoch': 1.45} +2025-02-05 20:35:09 - ERROR - stderr - 48%|████▊ | 10821/22434 [10:27:29<8:09:19, 2.53s/it] +2025-02-05 20:35:11 - ERROR - stderr - 48%|████▊ | 
10822/22434 [10:27:31<8:06:02, 2.51s/it] +2025-02-05 20:35:11 - ERROR - stderr - +2025-02-05 20:35:11 - ERROR - stderr - +2025-02-05 20:35:11 - INFO - stdout - {'loss': 0.7948, 'grad_norm': 1.2291339635849, 'learning_rate': 1.1054856397880604e-05, 'epoch': 1.45} +2025-02-05 20:35:11 - ERROR - stderr - 48%|████▊ | 10822/22434 [10:27:31<8:06:02, 2.51s/it] +2025-02-05 20:35:14 - ERROR - stderr - 48%|████▊ | 10823/22434 [10:27:34<8:04:32, 2.50s/it] +2025-02-05 20:35:14 - ERROR - stderr - +2025-02-05 20:35:14 - ERROR - stderr - +2025-02-05 20:35:14 - INFO - stdout - {'loss': 0.6203, 'grad_norm': 1.0835392475128174, 'learning_rate': 1.105342069518114e-05, 'epoch': 1.45} +2025-02-05 20:35:14 - ERROR - stderr - 48%|████▊ | 10823/22434 [10:27:34<8:04:32, 2.50s/it] +2025-02-05 20:35:16 - ERROR - stderr - 48%|████▊ | 10824/22434 [10:27:36<8:01:51, 2.49s/it] +2025-02-05 20:35:16 - ERROR - stderr - +2025-02-05 20:35:16 - ERROR - stderr - +2025-02-05 20:35:16 - INFO - stdout - {'loss': 0.769, 'grad_norm': 1.3118091821670532, 'learning_rate': 1.1051984970524135e-05, 'epoch': 1.45} +2025-02-05 20:35:16 - ERROR - stderr - 48%|████▊ | 10824/22434 [10:27:36<8:01:51, 2.49s/it] +2025-02-05 20:35:19 - ERROR - stderr - 48%|████▊ | 10825/22434 [10:27:39<8:05:42, 2.51s/it] +2025-02-05 20:35:19 - ERROR - stderr - +2025-02-05 20:35:19 - ERROR - stderr - +2025-02-05 20:35:19 - INFO - stdout - {'loss': 0.7215, 'grad_norm': 1.1601808071136475, 'learning_rate': 1.1050549223939507e-05, 'epoch': 1.45} +2025-02-05 20:35:19 - ERROR - stderr - 48%|████▊ | 10825/22434 [10:27:39<8:05:42, 2.51s/it] +2025-02-05 20:35:21 - ERROR - stderr - 48%|████▊ | 10826/22434 [10:27:41<8:02:21, 2.49s/it] +2025-02-05 20:35:21 - ERROR - stderr - +2025-02-05 20:35:21 - ERROR - stderr - +2025-02-05 20:35:21 - INFO - stdout - {'loss': 0.7495, 'grad_norm': 1.2609903812408447, 'learning_rate': 1.1049113455457186e-05, 'epoch': 1.45} +2025-02-05 20:35:21 - ERROR - stderr - 48%|████▊ | 10826/22434 [10:27:41<8:02:21, 2.49s/it] 
+2025-02-05 20:35:24 - ERROR - stderr - 48%|████▊ | 10827/22434 [10:27:44<8:08:00, 2.52s/it] +2025-02-05 20:35:24 - ERROR - stderr - +2025-02-05 20:35:24 - ERROR - stderr - +2025-02-05 20:35:24 - INFO - stdout - {'loss': 0.7355, 'grad_norm': 1.1831955909729004, 'learning_rate': 1.1047677665107099e-05, 'epoch': 1.45} +2025-02-05 20:35:24 - ERROR - stderr - 48%|████▊ | 10827/22434 [10:27:44<8:08:00, 2.52s/it] +2025-02-05 20:35:26 - ERROR - stderr - 48%|████▊ | 10828/22434 [10:27:46<8:03:16, 2.50s/it] +2025-02-05 20:35:26 - ERROR - stderr - +2025-02-05 20:35:26 - ERROR - stderr - +2025-02-05 20:35:26 - INFO - stdout - {'loss': 0.722, 'grad_norm': 1.2387874126434326, 'learning_rate': 1.1046241852919176e-05, 'epoch': 1.45} +2025-02-05 20:35:26 - ERROR - stderr - 48%|████▊ | 10828/22434 [10:27:46<8:03:16, 2.50s/it] +2025-02-05 20:35:29 - ERROR - stderr - 48%|████▊ | 10829/22434 [10:27:49<8:04:18, 2.50s/it] +2025-02-05 20:35:29 - ERROR - stderr - +2025-02-05 20:35:29 - ERROR - stderr - +2025-02-05 20:35:29 - INFO - stdout - {'loss': 0.7012, 'grad_norm': 1.275311827659607, 'learning_rate': 1.1044806018923336e-05, 'epoch': 1.45} +2025-02-05 20:35:29 - ERROR - stderr - 48%|████▊ | 10829/22434 [10:27:49<8:04:18, 2.50s/it] +2025-02-05 20:35:31 - ERROR - stderr - 48%|████▊ | 10830/22434 [10:27:51<8:03:59, 2.50s/it] +2025-02-05 20:35:31 - ERROR - stderr - +2025-02-05 20:35:31 - ERROR - stderr - +2025-02-05 20:35:31 - INFO - stdout - {'loss': 0.7647, 'grad_norm': 1.249427080154419, 'learning_rate': 1.1043370163149518e-05, 'epoch': 1.45} +2025-02-05 20:35:31 - ERROR - stderr - 48%|████▊ | 10830/22434 [10:27:51<8:03:59, 2.50s/it] +2025-02-05 20:35:34 - ERROR - stderr - 48%|████▊ | 10831/22434 [10:27:54<8:00:31, 2.48s/it] +2025-02-05 20:35:34 - ERROR - stderr - +2025-02-05 20:35:34 - ERROR - stderr - +2025-02-05 20:35:34 - INFO - stdout - {'loss': 0.7466, 'grad_norm': 1.195416808128357, 'learning_rate': 1.104193428562765e-05, 'epoch': 1.45} +2025-02-05 20:35:34 - ERROR - stderr - 
48%|████▊ | 10831/22434 [10:27:54<8:00:31, 2.48s/it] +2025-02-05 20:35:36 - ERROR - stderr - 48%|████▊ | 10832/22434 [10:27:56<8:01:31, 2.49s/it] +2025-02-05 20:35:36 - ERROR - stderr - +2025-02-05 20:35:36 - ERROR - stderr - +2025-02-05 20:35:36 - INFO - stdout - {'loss': 0.7015, 'grad_norm': 1.1396962404251099, 'learning_rate': 1.1040498386387657e-05, 'epoch': 1.45} +2025-02-05 20:35:36 - ERROR - stderr - 48%|████▊ | 10832/22434 [10:27:56<8:01:31, 2.49s/it] +2025-02-05 20:35:39 - ERROR - stderr - 48%|████▊ | 10833/22434 [10:27:59<7:58:46, 2.48s/it] +2025-02-05 20:35:39 - ERROR - stderr - +2025-02-05 20:35:39 - ERROR - stderr - +2025-02-05 20:35:39 - INFO - stdout - {'loss': 0.6956, 'grad_norm': 1.3026492595672607, 'learning_rate': 1.1039062465459468e-05, 'epoch': 1.45} +2025-02-05 20:35:39 - ERROR - stderr - 48%|████▊ | 10833/22434 [10:27:59<7:58:46, 2.48s/it] +2025-02-05 20:35:41 - ERROR - stderr - 48%|████▊ | 10834/22434 [10:28:01<8:02:47, 2.50s/it] +2025-02-05 20:35:41 - ERROR - stderr - +2025-02-05 20:35:41 - ERROR - stderr - +2025-02-05 20:35:41 - INFO - stdout - {'loss': 0.7582, 'grad_norm': 1.2944220304489136, 'learning_rate': 1.103762652287302e-05, 'epoch': 1.45} +2025-02-05 20:35:41 - ERROR - stderr - 48%|████▊ | 10834/22434 [10:28:01<8:02:47, 2.50s/it] +2025-02-05 20:35:44 - ERROR - stderr - 48%|████▊ | 10835/22434 [10:28:04<8:05:42, 2.51s/it] +2025-02-05 20:35:44 - ERROR - stderr - +2025-02-05 20:35:44 - ERROR - stderr - +2025-02-05 20:35:44 - INFO - stdout - {'loss': 0.6257, 'grad_norm': 1.0840996503829956, 'learning_rate': 1.1036190558658238e-05, 'epoch': 1.45} +2025-02-05 20:35:44 - ERROR - stderr - 48%|████▊ | 10835/22434 [10:28:04<8:05:42, 2.51s/it] +2025-02-05 20:35:46 - ERROR - stderr - 48%|████▊ | 10836/22434 [10:28:06<8:08:11, 2.53s/it] +2025-02-05 20:35:47 - ERROR - stderr - +2025-02-05 20:35:47 - ERROR - stderr - +2025-02-05 20:35:47 - INFO - stdout - {'loss': 0.8033, 'grad_norm': 1.2793140411376953, 'learning_rate': 1.1034754572845057e-05, 
'epoch': 1.45} +2025-02-05 20:35:47 - ERROR - stderr - 48%|████▊ | 10836/22434 [10:28:06<8:08:11, 2.53s/it] +2025-02-05 20:35:49 - ERROR - stderr - 48%|████▊ | 10837/22434 [10:28:09<8:11:41, 2.54s/it] +2025-02-05 20:35:49 - ERROR - stderr - +2025-02-05 20:35:49 - ERROR - stderr - +2025-02-05 20:35:49 - INFO - stdout - {'loss': 0.7059, 'grad_norm': 1.2749674320220947, 'learning_rate': 1.1033318565463404e-05, 'epoch': 1.45} +2025-02-05 20:35:49 - ERROR - stderr - 48%|████▊ | 10837/22434 [10:28:09<8:11:41, 2.54s/it] +2025-02-05 20:35:52 - ERROR - stderr - 48%|████▊ | 10838/22434 [10:28:11<8:08:49, 2.53s/it] +2025-02-05 20:35:52 - ERROR - stderr - +2025-02-05 20:35:52 - ERROR - stderr - +2025-02-05 20:35:52 - INFO - stdout - {'loss': 0.6956, 'grad_norm': 1.1647595167160034, 'learning_rate': 1.1031882536543216e-05, 'epoch': 1.45} +2025-02-05 20:35:52 - ERROR - stderr - 48%|████▊ | 10838/22434 [10:28:11<8:08:49, 2.53s/it] +2025-02-05 20:35:54 - ERROR - stderr - 48%|████▊ | 10839/22434 [10:28:14<8:04:01, 2.50s/it] +2025-02-05 20:35:54 - ERROR - stderr - +2025-02-05 20:35:54 - ERROR - stderr - +2025-02-05 20:35:54 - INFO - stdout - {'loss': 0.7086, 'grad_norm': 1.147619366645813, 'learning_rate': 1.1030446486114425e-05, 'epoch': 1.45} +2025-02-05 20:35:54 - ERROR - stderr - 48%|████▊ | 10839/22434 [10:28:14<8:04:01, 2.50s/it] +2025-02-05 20:35:56 - ERROR - stderr - 48%|████▊ | 10840/22434 [10:28:16<8:00:11, 2.49s/it] +2025-02-05 20:35:57 - ERROR - stderr - +2025-02-05 20:35:57 - ERROR - stderr - +2025-02-05 20:35:57 - INFO - stdout - {'loss': 0.6785, 'grad_norm': 1.3113161325454712, 'learning_rate': 1.1029010414206965e-05, 'epoch': 1.45} +2025-02-05 20:35:57 - ERROR - stderr - 48%|████▊ | 10840/22434 [10:28:16<8:00:11, 2.49s/it] +2025-02-05 20:35:59 - ERROR - stderr - 48%|████▊ | 10841/22434 [10:28:19<7:57:48, 2.47s/it] +2025-02-05 20:35:59 - ERROR - stderr - +2025-02-05 20:35:59 - ERROR - stderr - +2025-02-05 20:35:59 - INFO - stdout - {'loss': 0.8283, 'grad_norm': 
1.3291889429092407, 'learning_rate': 1.1027574320850763e-05, 'epoch': 1.45} +2025-02-05 20:35:59 - ERROR - stderr - 48%|████▊ | 10841/22434 [10:28:19<7:57:48, 2.47s/it] +2025-02-05 20:36:01 - ERROR - stderr - 48%|████▊ | 10842/22434 [10:28:21<7:59:35, 2.48s/it] +2025-02-05 20:36:01 - ERROR - stderr - +2025-02-05 20:36:01 - ERROR - stderr - +2025-02-05 20:36:01 - INFO - stdout - {'loss': 0.7279, 'grad_norm': 1.1831458806991577, 'learning_rate': 1.1026138206075759e-05, 'epoch': 1.45} +2025-02-05 20:36:01 - ERROR - stderr - 48%|████▊ | 10842/22434 [10:28:21<7:59:35, 2.48s/it] +2025-02-05 20:36:04 - ERROR - stderr - 48%|████▊ | 10843/22434 [10:28:24<8:01:43, 2.49s/it] +2025-02-05 20:36:04 - ERROR - stderr - +2025-02-05 20:36:04 - ERROR - stderr - +2025-02-05 20:36:04 - INFO - stdout - {'loss': 0.7449, 'grad_norm': 1.2279443740844727, 'learning_rate': 1.1024702069911885e-05, 'epoch': 1.45} +2025-02-05 20:36:04 - ERROR - stderr - 48%|████▊ | 10843/22434 [10:28:24<8:01:43, 2.49s/it] +2025-02-05 20:36:06 - ERROR - stderr - 48%|████▊ | 10844/22434 [10:28:26<8:06:15, 2.52s/it] +2025-02-05 20:36:07 - ERROR - stderr - +2025-02-05 20:36:07 - ERROR - stderr - +2025-02-05 20:36:07 - INFO - stdout - {'loss': 0.743, 'grad_norm': 1.307448148727417, 'learning_rate': 1.102326591238908e-05, 'epoch': 1.45} +2025-02-05 20:36:07 - ERROR - stderr - 48%|████▊ | 10844/22434 [10:28:26<8:06:15, 2.52s/it] +2025-02-05 20:36:09 - ERROR - stderr - 48%|████▊ | 10845/22434 [10:28:29<8:06:29, 2.52s/it] +2025-02-05 20:36:09 - ERROR - stderr - +2025-02-05 20:36:09 - ERROR - stderr - +2025-02-05 20:36:09 - INFO - stdout - {'loss': 0.6852, 'grad_norm': 1.1759616136550903, 'learning_rate': 1.1021829733537274e-05, 'epoch': 1.45} +2025-02-05 20:36:09 - ERROR - stderr - 48%|████▊ | 10845/22434 [10:28:29<8:06:29, 2.52s/it] +2025-02-05 20:36:12 - ERROR - stderr - 48%|████▊ | 10846/22434 [10:28:31<8:16:10, 2.57s/it] +2025-02-05 20:36:12 - ERROR - stderr - +2025-02-05 20:36:12 - ERROR - stderr - +2025-02-05 
20:36:12 - INFO - stdout - {'loss': 0.8268, 'grad_norm': 1.260444164276123, 'learning_rate': 1.1020393533386404e-05, 'epoch': 1.45} +2025-02-05 20:36:12 - ERROR - stderr - 48%|████▊ | 10846/22434 [10:28:32<8:16:10, 2.57s/it] +2025-02-05 20:36:14 - ERROR - stderr - 48%|████▊ | 10847/22434 [10:28:34<8:14:16, 2.56s/it] +2025-02-05 20:36:14 - ERROR - stderr - +2025-02-05 20:36:14 - ERROR - stderr - +2025-02-05 20:36:14 - INFO - stdout - {'loss': 0.735, 'grad_norm': 1.310232162475586, 'learning_rate': 1.101895731196641e-05, 'epoch': 1.45} +2025-02-05 20:36:14 - ERROR - stderr - 48%|████▊ | 10847/22434 [10:28:34<8:14:16, 2.56s/it] +2025-02-05 20:36:17 - ERROR - stderr - 48%|████▊ | 10848/22434 [10:28:36<8:08:08, 2.53s/it] +2025-02-05 20:36:17 - ERROR - stderr - +2025-02-05 20:36:17 - ERROR - stderr - +2025-02-05 20:36:17 - INFO - stdout - {'loss': 0.6976, 'grad_norm': 1.2217086553573608, 'learning_rate': 1.1017521069307224e-05, 'epoch': 1.45} +2025-02-05 20:36:17 - ERROR - stderr - 48%|████▊ | 10848/22434 [10:28:37<8:08:08, 2.53s/it] +2025-02-05 20:36:19 - ERROR - stderr - 48%|████▊ | 10849/22434 [10:28:39<8:05:48, 2.52s/it] +2025-02-05 20:36:19 - ERROR - stderr - +2025-02-05 20:36:19 - ERROR - stderr - +2025-02-05 20:36:19 - INFO - stdout - {'loss': 0.6894, 'grad_norm': 1.2213094234466553, 'learning_rate': 1.1016084805438785e-05, 'epoch': 1.45} +2025-02-05 20:36:19 - ERROR - stderr - 48%|████▊ | 10849/22434 [10:28:39<8:05:48, 2.52s/it] +2025-02-05 20:36:22 - ERROR - stderr - 48%|████▊ | 10850/22434 [10:28:41<8:05:27, 2.51s/it] +2025-02-05 20:36:22 - ERROR - stderr - +2025-02-05 20:36:22 - ERROR - stderr - +2025-02-05 20:36:22 - INFO - stdout - {'loss': 0.6675, 'grad_norm': 1.4285427331924438, 'learning_rate': 1.1014648520391031e-05, 'epoch': 1.45} +2025-02-05 20:36:22 - ERROR - stderr - 48%|████▊ | 10850/22434 [10:28:42<8:05:27, 2.51s/it] +2025-02-05 20:36:24 - ERROR - stderr - 48%|████▊ | 10851/22434 [10:28:44<7:59:27, 2.48s/it] +2025-02-05 20:36:24 - ERROR - stderr - 
+2025-02-05 20:36:24 - ERROR - stderr - +2025-02-05 20:36:24 - INFO - stdout - {'loss': 0.7371, 'grad_norm': 1.1831865310668945, 'learning_rate': 1.10132122141939e-05, 'epoch': 1.45} +2025-02-05 20:36:24 - ERROR - stderr - 48%|████▊ | 10851/22434 [10:28:44<7:59:27, 2.48s/it] +2025-02-05 20:36:27 - ERROR - stderr - 48%|████▊ | 10852/22434 [10:28:46<8:01:46, 2.50s/it] +2025-02-05 20:36:27 - ERROR - stderr - +2025-02-05 20:36:27 - ERROR - stderr - +2025-02-05 20:36:27 - INFO - stdout - {'loss': 0.6975, 'grad_norm': 1.2192161083221436, 'learning_rate': 1.1011775886877331e-05, 'epoch': 1.45} +2025-02-05 20:36:27 - ERROR - stderr - 48%|████▊ | 10852/22434 [10:28:46<8:01:46, 2.50s/it] +2025-02-05 20:36:29 - ERROR - stderr - 48%|████▊ | 10853/22434 [10:28:49<8:04:25, 2.51s/it] +2025-02-05 20:36:29 - ERROR - stderr - +2025-02-05 20:36:29 - ERROR - stderr - +2025-02-05 20:36:29 - INFO - stdout - {'loss': 0.7039, 'grad_norm': 1.1997629404067993, 'learning_rate': 1.1010339538471259e-05, 'epoch': 1.45} +2025-02-05 20:36:29 - ERROR - stderr - 48%|████▊ | 10853/22434 [10:28:49<8:04:25, 2.51s/it] +2025-02-05 20:36:32 - ERROR - stderr - 48%|████▊ | 10854/22434 [10:28:51<8:02:27, 2.50s/it] +2025-02-05 20:36:32 - ERROR - stderr - +2025-02-05 20:36:32 - ERROR - stderr - +2025-02-05 20:36:32 - INFO - stdout - {'loss': 0.7916, 'grad_norm': 1.1829913854599, 'learning_rate': 1.1008903169005627e-05, 'epoch': 1.45} +2025-02-05 20:36:32 - ERROR - stderr - 48%|████▊ | 10854/22434 [10:28:51<8:02:27, 2.50s/it] +2025-02-05 20:36:34 - ERROR - stderr - 48%|████▊ | 10855/22434 [10:28:54<8:03:53, 2.51s/it] +2025-02-05 20:36:34 - ERROR - stderr - +2025-02-05 20:36:34 - ERROR - stderr - +2025-02-05 20:36:34 - INFO - stdout - {'loss': 0.7159, 'grad_norm': 1.2070574760437012, 'learning_rate': 1.1007466778510373e-05, 'epoch': 1.45} +2025-02-05 20:36:34 - ERROR - stderr - 48%|████▊ | 10855/22434 [10:28:54<8:03:53, 2.51s/it] +2025-02-05 20:36:37 - ERROR - stderr - 48%|████▊ | 10856/22434 [10:28:57<8:20:53, 
2.60s/it] +2025-02-05 20:36:37 - ERROR - stderr - +2025-02-05 20:36:37 - ERROR - stderr - +2025-02-05 20:36:37 - INFO - stdout - {'loss': 0.7384, 'grad_norm': 1.2767226696014404, 'learning_rate': 1.100603036701544e-05, 'epoch': 1.45} +2025-02-05 20:36:37 - ERROR - stderr - 48%|████▊ | 10856/22434 [10:28:57<8:20:53, 2.60s/it] +2025-02-05 20:36:40 - ERROR - stderr - 48%|████▊ | 10857/22434 [10:28:59<8:23:45, 2.61s/it] +2025-02-05 20:36:40 - ERROR - stderr - +2025-02-05 20:36:40 - ERROR - stderr - +2025-02-05 20:36:40 - INFO - stdout - {'loss': 0.6996, 'grad_norm': 1.1423934698104858, 'learning_rate': 1.1004593934550767e-05, 'epoch': 1.45} +2025-02-05 20:36:40 - ERROR - stderr - 48%|████▊ | 10857/22434 [10:28:59<8:23:45, 2.61s/it] +2025-02-05 20:36:42 - ERROR - stderr - 48%|████▊ | 10858/22434 [10:29:02<8:19:07, 2.59s/it] +2025-02-05 20:36:42 - ERROR - stderr - +2025-02-05 20:36:42 - ERROR - stderr - +2025-02-05 20:36:42 - INFO - stdout - {'loss': 0.6516, 'grad_norm': 1.1036198139190674, 'learning_rate': 1.1003157481146294e-05, 'epoch': 1.45} +2025-02-05 20:36:42 - ERROR - stderr - 48%|████▊ | 10858/22434 [10:29:02<8:19:07, 2.59s/it] +2025-02-05 20:36:45 - ERROR - stderr - 48%|████▊ | 10859/22434 [10:29:04<8:17:28, 2.58s/it] +2025-02-05 20:36:45 - ERROR - stderr - +2025-02-05 20:36:45 - ERROR - stderr - +2025-02-05 20:36:45 - INFO - stdout - {'loss': 0.7181, 'grad_norm': 1.1072052717208862, 'learning_rate': 1.1001721006831962e-05, 'epoch': 1.45} +2025-02-05 20:36:45 - ERROR - stderr - 48%|████▊ | 10859/22434 [10:29:05<8:17:28, 2.58s/it] +2025-02-05 20:36:47 - ERROR - stderr - 48%|████▊ | 10860/22434 [10:29:07<8:11:46, 2.55s/it] +2025-02-05 20:36:47 - ERROR - stderr - +2025-02-05 20:36:47 - ERROR - stderr - +2025-02-05 20:36:47 - INFO - stdout - {'loss': 0.6596, 'grad_norm': 1.1262147426605225, 'learning_rate': 1.1000284511637717e-05, 'epoch': 1.45} +2025-02-05 20:36:47 - ERROR - stderr - 48%|████▊ | 10860/22434 [10:29:07<8:11:46, 2.55s/it] +2025-02-05 20:36:50 - ERROR 
- stderr - 48%|████▊ | 10861/22434 [10:29:09<8:05:47, 2.52s/it] +2025-02-05 20:36:50 - ERROR - stderr - +2025-02-05 20:36:50 - ERROR - stderr - +2025-02-05 20:36:50 - INFO - stdout - {'loss': 0.5408, 'grad_norm': 1.0688198804855347, 'learning_rate': 1.0998847995593494e-05, 'epoch': 1.45} +2025-02-05 20:36:50 - ERROR - stderr - 48%|████▊ | 10861/22434 [10:29:09<8:05:47, 2.52s/it] +2025-02-05 20:36:52 - ERROR - stderr - 48%|████▊ | 10862/22434 [10:29:12<8:01:11, 2.49s/it] +2025-02-05 20:36:52 - ERROR - stderr - +2025-02-05 20:36:52 - ERROR - stderr - +2025-02-05 20:36:52 - INFO - stdout - {'loss': 0.6279, 'grad_norm': 1.2367618083953857, 'learning_rate': 1.0997411458729243e-05, 'epoch': 1.45} +2025-02-05 20:36:52 - ERROR - stderr - 48%|████▊ | 10862/22434 [10:29:12<8:01:11, 2.49s/it] +2025-02-05 20:36:54 - ERROR - stderr - 48%|████▊ | 10863/22434 [10:29:14<7:56:27, 2.47s/it] +2025-02-05 20:36:55 - ERROR - stderr - +2025-02-05 20:36:55 - ERROR - stderr - +2025-02-05 20:36:55 - INFO - stdout - {'loss': 0.6771, 'grad_norm': 1.2538025379180908, 'learning_rate': 1.0995974901074905e-05, 'epoch': 1.45} +2025-02-05 20:36:55 - ERROR - stderr - 48%|████▊ | 10863/22434 [10:29:14<7:56:27, 2.47s/it] +2025-02-05 20:36:57 - ERROR - stderr - 48%|████▊ | 10864/22434 [10:29:17<8:07:05, 2.53s/it] +2025-02-05 20:36:57 - ERROR - stderr - +2025-02-05 20:36:57 - ERROR - stderr - +2025-02-05 20:36:57 - INFO - stdout - {'loss': 0.6847, 'grad_norm': 1.1458609104156494, 'learning_rate': 1.0994538322660423e-05, 'epoch': 1.45} +2025-02-05 20:36:57 - ERROR - stderr - 48%|████▊ | 10864/22434 [10:29:17<8:07:05, 2.53s/it] +2025-02-05 20:37:00 - ERROR - stderr - 48%|████▊ | 10865/22434 [10:29:19<8:04:04, 2.51s/it] +2025-02-05 20:37:00 - ERROR - stderr - +2025-02-05 20:37:00 - ERROR - stderr - +2025-02-05 20:37:00 - INFO - stdout - {'loss': 0.7399, 'grad_norm': 1.275004267692566, 'learning_rate': 1.099310172351574e-05, 'epoch': 1.45} +2025-02-05 20:37:00 - ERROR - stderr - 48%|████▊ | 10865/22434 
[10:29:19<8:04:04, 2.51s/it] +2025-02-05 20:37:02 - ERROR - stderr - 48%|████▊ | 10866/22434 [10:29:22<8:05:20, 2.52s/it] +2025-02-05 20:37:02 - ERROR - stderr - +2025-02-05 20:37:02 - ERROR - stderr - +2025-02-05 20:37:02 - INFO - stdout - {'loss': 0.7518, 'grad_norm': 1.3751593828201294, 'learning_rate': 1.0991665103670803e-05, 'epoch': 1.45} +2025-02-05 20:37:02 - ERROR - stderr - 48%|████▊ | 10866/22434 [10:29:22<8:05:20, 2.52s/it] +2025-02-05 20:37:05 - ERROR - stderr - 48%|████▊ | 10867/22434 [10:29:24<8:01:28, 2.50s/it] +2025-02-05 20:37:05 - ERROR - stderr - +2025-02-05 20:37:05 - ERROR - stderr - +2025-02-05 20:37:05 - INFO - stdout - {'loss': 0.6702, 'grad_norm': 1.2564325332641602, 'learning_rate': 1.0990228463155557e-05, 'epoch': 1.45} +2025-02-05 20:37:05 - ERROR - stderr - 48%|████▊ | 10867/22434 [10:29:24<8:01:28, 2.50s/it] +2025-02-05 20:37:07 - ERROR - stderr - 48%|████▊ | 10868/22434 [10:29:27<8:00:00, 2.49s/it] +2025-02-05 20:37:07 - ERROR - stderr - +2025-02-05 20:37:07 - ERROR - stderr - +2025-02-05 20:37:07 - INFO - stdout - {'loss': 0.7953, 'grad_norm': 1.285187840461731, 'learning_rate': 1.0988791801999944e-05, 'epoch': 1.45} +2025-02-05 20:37:07 - ERROR - stderr - 48%|████▊ | 10868/22434 [10:29:27<8:00:00, 2.49s/it] +2025-02-05 20:37:10 - ERROR - stderr - 48%|████▊ | 10869/22434 [10:29:29<8:01:02, 2.50s/it] +2025-02-05 20:37:10 - ERROR - stderr - +2025-02-05 20:37:10 - ERROR - stderr - +2025-02-05 20:37:10 - INFO - stdout - {'loss': 0.7205, 'grad_norm': 1.1962530612945557, 'learning_rate': 1.0987355120233914e-05, 'epoch': 1.45} +2025-02-05 20:37:10 - ERROR - stderr - 48%|████▊ | 10869/22434 [10:29:29<8:01:02, 2.50s/it] +2025-02-05 20:37:12 - ERROR - stderr - 48%|████▊ | 10870/22434 [10:29:32<7:57:27, 2.48s/it] +2025-02-05 20:37:12 - ERROR - stderr - +2025-02-05 20:37:12 - ERROR - stderr - +2025-02-05 20:37:12 - INFO - stdout - {'loss': 0.6678, 'grad_norm': 1.1755759716033936, 'learning_rate': 1.098591841788741e-05, 'epoch': 1.45} 
+2025-02-05 20:37:12 - ERROR - stderr - 48%|████▊ | 10870/22434 [10:29:32<7:57:27, 2.48s/it] +2025-02-05 20:37:15 - ERROR - stderr - 48%|████▊ | 10871/22434 [10:29:34<7:58:37, 2.48s/it] +2025-02-05 20:37:15 - ERROR - stderr - +2025-02-05 20:37:15 - ERROR - stderr - +2025-02-05 20:37:15 - INFO - stdout - {'loss': 0.7188, 'grad_norm': 1.2317090034484863, 'learning_rate': 1.0984481694990378e-05, 'epoch': 1.45} +2025-02-05 20:37:15 - ERROR - stderr - 48%|████▊ | 10871/22434 [10:29:34<7:58:37, 2.48s/it] +2025-02-05 20:37:17 - ERROR - stderr - 48%|████▊ | 10872/22434 [10:29:37<7:58:56, 2.49s/it] +2025-02-05 20:37:17 - ERROR - stderr - +2025-02-05 20:37:17 - ERROR - stderr - +2025-02-05 20:37:17 - INFO - stdout - {'loss': 0.746, 'grad_norm': 1.302388310432434, 'learning_rate': 1.0983044951572773e-05, 'epoch': 1.45} +2025-02-05 20:37:17 - ERROR - stderr - 48%|████▊ | 10872/22434 [10:29:37<7:58:56, 2.49s/it] +2025-02-05 20:37:20 - ERROR - stderr - 48%|████▊ | 10873/22434 [10:29:39<8:02:54, 2.51s/it] +2025-02-05 20:37:20 - ERROR - stderr - +2025-02-05 20:37:20 - ERROR - stderr - +2025-02-05 20:37:20 - INFO - stdout - {'loss': 0.7175, 'grad_norm': 1.2712892293930054, 'learning_rate': 1.0981608187664532e-05, 'epoch': 1.45} +2025-02-05 20:37:20 - ERROR - stderr - 48%|████▊ | 10873/22434 [10:29:39<8:02:54, 2.51s/it] +2025-02-05 20:37:22 - ERROR - stderr - 48%|████▊ | 10874/22434 [10:29:42<8:03:14, 2.51s/it] +2025-02-05 20:37:22 - ERROR - stderr - +2025-02-05 20:37:22 - ERROR - stderr - +2025-02-05 20:37:22 - INFO - stdout - {'loss': 0.6222, 'grad_norm': 1.0935871601104736, 'learning_rate': 1.098017140329561e-05, 'epoch': 1.45} +2025-02-05 20:37:22 - ERROR - stderr - 48%|████▊ | 10874/22434 [10:29:42<8:03:14, 2.51s/it] +2025-02-05 20:37:25 - ERROR - stderr - 48%|████▊ | 10875/22434 [10:29:44<8:02:02, 2.50s/it] +2025-02-05 20:37:25 - ERROR - stderr - +2025-02-05 20:37:25 - ERROR - stderr - +2025-02-05 20:37:25 - INFO - stdout - {'loss': 0.7481, 'grad_norm': 1.2833107709884644, 
'learning_rate': 1.0978734598495949e-05, 'epoch': 1.45} +2025-02-05 20:37:25 - ERROR - stderr - 48%|████▊ | 10875/22434 [10:29:44<8:02:02, 2.50s/it] +2025-02-05 20:37:27 - ERROR - stderr - 48%|████▊ | 10876/22434 [10:29:47<8:09:21, 2.54s/it] +2025-02-05 20:37:27 - ERROR - stderr - +2025-02-05 20:37:27 - ERROR - stderr - +2025-02-05 20:37:27 - INFO - stdout - {'loss': 0.6602, 'grad_norm': 1.1621891260147095, 'learning_rate': 1.0977297773295503e-05, 'epoch': 1.45} +2025-02-05 20:37:27 - ERROR - stderr - 48%|████▊ | 10876/22434 [10:29:47<8:09:21, 2.54s/it] +2025-02-05 20:37:30 - ERROR - stderr - 48%|████▊ | 10877/22434 [10:29:49<8:07:35, 2.53s/it] +2025-02-05 20:37:30 - ERROR - stderr - +2025-02-05 20:37:30 - ERROR - stderr - +2025-02-05 20:37:30 - INFO - stdout - {'loss': 0.6989, 'grad_norm': 1.095976710319519, 'learning_rate': 1.0975860927724225e-05, 'epoch': 1.45} +2025-02-05 20:37:30 - ERROR - stderr - 48%|████▊ | 10877/22434 [10:29:50<8:07:35, 2.53s/it] +2025-02-05 20:37:32 - ERROR - stderr - 48%|████▊ | 10878/22434 [10:29:52<8:08:50, 2.54s/it] +2025-02-05 20:37:32 - ERROR - stderr - +2025-02-05 20:37:32 - ERROR - stderr - +2025-02-05 20:37:32 - INFO - stdout - {'loss': 0.7454, 'grad_norm': 1.3705198764801025, 'learning_rate': 1.0974424061812055e-05, 'epoch': 1.45} +2025-02-05 20:37:32 - ERROR - stderr - 48%|████▊ | 10878/22434 [10:29:52<8:08:50, 2.54s/it] +2025-02-05 20:37:35 - ERROR - stderr - 48%|████▊ | 10879/22434 [10:29:54<8:02:17, 2.50s/it] +2025-02-05 20:37:35 - ERROR - stderr - +2025-02-05 20:37:35 - ERROR - stderr - +2025-02-05 20:37:35 - INFO - stdout - {'loss': 0.7585, 'grad_norm': 1.216985821723938, 'learning_rate': 1.097298717558895e-05, 'epoch': 1.45} +2025-02-05 20:37:35 - ERROR - stderr - 48%|████▊ | 10879/22434 [10:29:55<8:02:17, 2.50s/it] +2025-02-05 20:37:37 - ERROR - stderr - 48%|████▊ | 10880/22434 [10:29:57<7:59:58, 2.49s/it] +2025-02-05 20:37:37 - ERROR - stderr - +2025-02-05 20:37:37 - ERROR - stderr - +2025-02-05 20:37:37 - INFO - stdout 
- {'loss': 0.6528, 'grad_norm': 1.095146894454956, 'learning_rate': 1.0971550269084856e-05, 'epoch': 1.45} +2025-02-05 20:37:37 - ERROR - stderr - 48%|████▊ | 10880/22434 [10:29:57<7:59:58, 2.49s/it] +2025-02-05 20:37:40 - ERROR - stderr - 49%|████▊ | 10881/22434 [10:29:59<8:01:45, 2.50s/it] +2025-02-05 20:37:40 - ERROR - stderr - +2025-02-05 20:37:40 - ERROR - stderr - +2025-02-05 20:37:40 - INFO - stdout - {'loss': 0.67, 'grad_norm': 1.2245376110076904, 'learning_rate': 1.0970113342329728e-05, 'epoch': 1.46} +2025-02-05 20:37:40 - ERROR - stderr - 49%|████▊ | 10881/22434 [10:29:59<8:01:45, 2.50s/it] +2025-02-05 20:37:42 - ERROR - stderr - 49%|████▊ | 10882/22434 [10:30:02<8:11:51, 2.55s/it] +2025-02-05 20:37:42 - ERROR - stderr - +2025-02-05 20:37:42 - ERROR - stderr - +2025-02-05 20:37:42 - INFO - stdout - {'loss': 0.7452, 'grad_norm': 1.1592437028884888, 'learning_rate': 1.0968676395353514e-05, 'epoch': 1.46} +2025-02-05 20:37:42 - ERROR - stderr - 49%|████▊ | 10882/22434 [10:30:02<8:11:51, 2.55s/it] +2025-02-05 20:37:45 - ERROR - stderr - 49%|████▊ | 10883/22434 [10:30:05<8:05:22, 2.52s/it] +2025-02-05 20:37:45 - ERROR - stderr - +2025-02-05 20:37:45 - ERROR - stderr - +2025-02-05 20:37:45 - INFO - stdout - {'loss': 0.6984, 'grad_norm': 1.2051292657852173, 'learning_rate': 1.0967239428186172e-05, 'epoch': 1.46} +2025-02-05 20:37:45 - ERROR - stderr - 49%|████▊ | 10883/22434 [10:30:05<8:05:22, 2.52s/it] +2025-02-05 20:37:47 - ERROR - stderr - 49%|████▊ | 10884/22434 [10:30:07<8:07:35, 2.53s/it] +2025-02-05 20:37:47 - ERROR - stderr - +2025-02-05 20:37:47 - ERROR - stderr - +2025-02-05 20:37:47 - INFO - stdout - {'loss': 0.6942, 'grad_norm': 1.3012681007385254, 'learning_rate': 1.0965802440857645e-05, 'epoch': 1.46} +2025-02-05 20:37:47 - ERROR - stderr - 49%|████▊ | 10884/22434 [10:30:07<8:07:35, 2.53s/it] +2025-02-05 20:37:50 - ERROR - stderr - 49%|████▊ | 10885/22434 [10:30:10<8:10:29, 2.55s/it] +2025-02-05 20:37:50 - ERROR - stderr - +2025-02-05 20:37:50 - 
ERROR - stderr - +2025-02-05 20:37:50 - INFO - stdout - {'loss': 0.7148, 'grad_norm': 1.1714789867401123, 'learning_rate': 1.0964365433397894e-05, 'epoch': 1.46} +2025-02-05 20:37:50 - ERROR - stderr - 49%|████▊ | 10885/22434 [10:30:10<8:10:29, 2.55s/it] +2025-02-05 20:37:53 - ERROR - stderr - 49%|████▊ | 10886/22434 [10:30:12<8:14:05, 2.57s/it] +2025-02-05 20:37:53 - ERROR - stderr - +2025-02-05 20:37:53 - ERROR - stderr - +2025-02-05 20:37:53 - INFO - stdout - {'loss': 0.6489, 'grad_norm': 1.1794650554656982, 'learning_rate': 1.0962928405836866e-05, 'epoch': 1.46} +2025-02-05 20:37:53 - ERROR - stderr - 49%|████▊ | 10886/22434 [10:30:12<8:14:05, 2.57s/it] +2025-02-05 20:37:55 - ERROR - stderr - 49%|████▊ | 10887/22434 [10:30:15<8:06:48, 2.53s/it] +2025-02-05 20:37:55 - ERROR - stderr - +2025-02-05 20:37:55 - ERROR - stderr - +2025-02-05 20:37:55 - INFO - stdout - {'loss': 0.607, 'grad_norm': 1.0465161800384521, 'learning_rate': 1.0961491358204516e-05, 'epoch': 1.46} +2025-02-05 20:37:55 - ERROR - stderr - 49%|████▊ | 10887/22434 [10:30:15<8:06:48, 2.53s/it] +2025-02-05 20:37:58 - ERROR - stderr - 49%|████▊ | 10888/22434 [10:30:17<8:08:01, 2.54s/it] +2025-02-05 20:37:58 - ERROR - stderr - +2025-02-05 20:37:58 - ERROR - stderr - +2025-02-05 20:37:58 - INFO - stdout - {'loss': 0.6573, 'grad_norm': 1.1192132234573364, 'learning_rate': 1.09600542905308e-05, 'epoch': 1.46} +2025-02-05 20:37:58 - ERROR - stderr - 49%|████▊ | 10888/22434 [10:30:17<8:08:01, 2.54s/it] +2025-02-05 20:38:00 - ERROR - stderr - 49%|████▊ | 10889/22434 [10:30:20<8:06:23, 2.53s/it] +2025-02-05 20:38:00 - ERROR - stderr - +2025-02-05 20:38:00 - ERROR - stderr - +2025-02-05 20:38:00 - INFO - stdout - {'loss': 0.7362, 'grad_norm': 1.3643540143966675, 'learning_rate': 1.0958617202845672e-05, 'epoch': 1.46} +2025-02-05 20:38:00 - ERROR - stderr - 49%|████▊ | 10889/22434 [10:30:20<8:06:23, 2.53s/it] +2025-02-05 20:38:03 - ERROR - stderr - 49%|████▊ | 10890/22434 [10:30:22<8:05:34, 2.52s/it] 
+2025-02-05 20:38:03 - ERROR - stderr - +2025-02-05 20:38:03 - ERROR - stderr - +2025-02-05 20:38:03 - INFO - stdout - {'loss': 0.7694, 'grad_norm': 1.2710552215576172, 'learning_rate': 1.0957180095179082e-05, 'epoch': 1.46} +2025-02-05 20:38:03 - ERROR - stderr - 49%|████▊ | 10890/22434 [10:30:22<8:05:34, 2.52s/it] +2025-02-05 20:38:05 - ERROR - stderr - 49%|████▊ | 10891/22434 [10:30:25<8:05:41, 2.52s/it] +2025-02-05 20:38:05 - ERROR - stderr - +2025-02-05 20:38:05 - ERROR - stderr - +2025-02-05 20:38:05 - INFO - stdout - {'loss': 0.7078, 'grad_norm': 1.2066421508789062, 'learning_rate': 1.0955742967560995e-05, 'epoch': 1.46} +2025-02-05 20:38:05 - ERROR - stderr - 49%|████▊ | 10891/22434 [10:30:25<8:05:41, 2.52s/it] +2025-02-05 20:38:08 - ERROR - stderr - 49%|████▊ | 10892/22434 [10:30:27<8:00:06, 2.50s/it] +2025-02-05 20:38:08 - ERROR - stderr - +2025-02-05 20:38:08 - ERROR - stderr - +2025-02-05 20:38:08 - INFO - stdout - {'loss': 0.6936, 'grad_norm': 1.24596107006073, 'learning_rate': 1.0954305820021354e-05, 'epoch': 1.46} +2025-02-05 20:38:08 - ERROR - stderr - 49%|████▊ | 10892/22434 [10:30:27<8:00:06, 2.50s/it] +2025-02-05 20:38:10 - ERROR - stderr - 49%|████▊ | 10893/22434 [10:30:30<8:04:23, 2.52s/it] +2025-02-05 20:38:10 - ERROR - stderr - +2025-02-05 20:38:10 - ERROR - stderr - +2025-02-05 20:38:10 - INFO - stdout - {'loss': 0.6869, 'grad_norm': 1.1464096307754517, 'learning_rate': 1.0952868652590124e-05, 'epoch': 1.46} +2025-02-05 20:38:10 - ERROR - stderr - 49%|████▊ | 10893/22434 [10:30:30<8:04:23, 2.52s/it] +2025-02-05 20:38:13 - ERROR - stderr - 49%|████▊ | 10894/22434 [10:30:32<8:00:35, 2.50s/it] +2025-02-05 20:38:13 - ERROR - stderr - +2025-02-05 20:38:13 - ERROR - stderr - +2025-02-05 20:38:13 - INFO - stdout - {'loss': 0.7024, 'grad_norm': 1.36152184009552, 'learning_rate': 1.095143146529726e-05, 'epoch': 1.46} +2025-02-05 20:38:13 - ERROR - stderr - 49%|████▊ | 10894/22434 [10:30:32<8:00:35, 2.50s/it] +2025-02-05 20:38:15 - ERROR - stderr - 
49%|████▊ | 10895/22434 [10:30:35<8:18:34, 2.59s/it] +2025-02-05 20:38:15 - ERROR - stderr - +2025-02-05 20:38:15 - ERROR - stderr - +2025-02-05 20:38:15 - INFO - stdout - {'loss': 0.6805, 'grad_norm': 1.184718370437622, 'learning_rate': 1.0949994258172715e-05, 'epoch': 1.46} +2025-02-05 20:38:15 - ERROR - stderr - 49%|████▊ | 10895/22434 [10:30:35<8:18:34, 2.59s/it] +2025-02-05 20:38:18 - ERROR - stderr - 49%|████▊ | 10896/22434 [10:30:38<8:14:59, 2.57s/it] +2025-02-05 20:38:18 - ERROR - stderr - +2025-02-05 20:38:18 - ERROR - stderr - +2025-02-05 20:38:18 - INFO - stdout - {'loss': 0.7404, 'grad_norm': 1.233087182044983, 'learning_rate': 1.094855703124645e-05, 'epoch': 1.46} +2025-02-05 20:38:18 - ERROR - stderr - 49%|████▊ | 10896/22434 [10:30:38<8:14:59, 2.57s/it] +2025-02-05 20:38:20 - ERROR - stderr - 49%|████▊ | 10897/22434 [10:30:40<8:12:54, 2.56s/it] +2025-02-05 20:38:20 - ERROR - stderr - +2025-02-05 20:38:20 - ERROR - stderr - +2025-02-05 20:38:20 - INFO - stdout - {'loss': 0.7065, 'grad_norm': 1.1632511615753174, 'learning_rate': 1.0947119784548424e-05, 'epoch': 1.46} +2025-02-05 20:38:20 - ERROR - stderr - 49%|████▊ | 10897/22434 [10:30:40<8:12:54, 2.56s/it] +2025-02-05 20:38:23 - ERROR - stderr - 49%|████▊ | 10898/22434 [10:30:43<8:13:07, 2.56s/it] +2025-02-05 20:38:23 - ERROR - stderr - +2025-02-05 20:38:23 - ERROR - stderr - +2025-02-05 20:38:23 - INFO - stdout - {'loss': 0.665, 'grad_norm': 1.0786553621292114, 'learning_rate': 1.0945682518108588e-05, 'epoch': 1.46} +2025-02-05 20:38:23 - ERROR - stderr - 49%|████▊ | 10898/22434 [10:30:43<8:13:07, 2.56s/it] +2025-02-05 20:38:25 - ERROR - stderr - 49%|████▊ | 10899/22434 [10:30:45<8:07:35, 2.54s/it] +2025-02-05 20:38:26 - ERROR - stderr - +2025-02-05 20:38:26 - ERROR - stderr - +2025-02-05 20:38:26 - INFO - stdout - {'loss': 0.7845, 'grad_norm': 1.2024211883544922, 'learning_rate': 1.0944245231956909e-05, 'epoch': 1.46} +2025-02-05 20:38:26 - ERROR - stderr - 49%|████▊ | 10899/22434 
[10:30:45<8:07:35, 2.54s/it] +2025-02-05 20:38:28 - ERROR - stderr - 49%|████▊ | 10900/22434 [10:30:48<8:06:29, 2.53s/it] +2025-02-05 20:38:28 - ERROR - stderr - +2025-02-05 20:38:28 - ERROR - stderr - +2025-02-05 20:38:28 - INFO - stdout - {'loss': 0.7382, 'grad_norm': 1.239782691001892, 'learning_rate': 1.0942807926123338e-05, 'epoch': 1.46} +2025-02-05 20:38:28 - ERROR - stderr - 49%|████▊ | 10900/22434 [10:30:48<8:06:29, 2.53s/it] +2025-02-05 20:38:30 - ERROR - stderr - 49%|████▊ | 10901/22434 [10:30:50<8:02:26, 2.51s/it] +2025-02-05 20:38:30 - ERROR - stderr - +2025-02-05 20:38:30 - ERROR - stderr - +2025-02-05 20:38:30 - INFO - stdout - {'loss': 0.7409, 'grad_norm': 1.159354567527771, 'learning_rate': 1.0941370600637839e-05, 'epoch': 1.46} +2025-02-05 20:38:30 - ERROR - stderr - 49%|████▊ | 10901/22434 [10:30:50<8:02:26, 2.51s/it] +2025-02-05 20:38:33 - ERROR - stderr - 49%|████▊ | 10902/22434 [10:30:53<7:59:07, 2.49s/it] +2025-02-05 20:38:33 - ERROR - stderr - +2025-02-05 20:38:33 - ERROR - stderr - +2025-02-05 20:38:33 - INFO - stdout - {'loss': 0.7273, 'grad_norm': 1.3095420598983765, 'learning_rate': 1.093993325553037e-05, 'epoch': 1.46} +2025-02-05 20:38:33 - ERROR - stderr - 49%|████▊ | 10902/22434 [10:30:53<7:59:07, 2.49s/it] +2025-02-05 20:38:35 - ERROR - stderr - 49%|████▊ | 10903/22434 [10:30:55<7:57:42, 2.49s/it] +2025-02-05 20:38:35 - ERROR - stderr - +2025-02-05 20:38:35 - ERROR - stderr - +2025-02-05 20:38:35 - INFO - stdout - {'loss': 0.7473, 'grad_norm': 1.2642269134521484, 'learning_rate': 1.0938495890830893e-05, 'epoch': 1.46} +2025-02-05 20:38:35 - ERROR - stderr - 49%|████▊ | 10903/22434 [10:30:55<7:57:42, 2.49s/it] +2025-02-05 20:38:38 - ERROR - stderr - 49%|████▊ | 10904/22434 [10:30:58<8:20:19, 2.60s/it] +2025-02-05 20:38:38 - ERROR - stderr - +2025-02-05 20:38:38 - ERROR - stderr - +2025-02-05 20:38:38 - INFO - stdout - {'loss': 0.7746, 'grad_norm': 1.286946415901184, 'learning_rate': 1.0937058506569366e-05, 'epoch': 1.46} +2025-02-05 
20:38:38 - ERROR - stderr - 49%|████▊ | 10904/22434 [10:30:58<8:20:19, 2.60s/it] +2025-02-05 20:38:41 - ERROR - stderr - 49%|████▊ | 10905/22434 [10:31:01<8:15:13, 2.58s/it] +2025-02-05 20:38:41 - ERROR - stderr - +2025-02-05 20:38:41 - ERROR - stderr - +2025-02-05 20:38:41 - INFO - stdout - {'loss': 0.696, 'grad_norm': 1.1595613956451416, 'learning_rate': 1.0935621102775756e-05, 'epoch': 1.46} +2025-02-05 20:38:41 - ERROR - stderr - 49%|████▊ | 10905/22434 [10:31:01<8:15:13, 2.58s/it] +2025-02-05 20:38:43 - ERROR - stderr - 49%|████▊ | 10906/22434 [10:31:03<8:12:44, 2.56s/it] +2025-02-05 20:38:43 - ERROR - stderr - +2025-02-05 20:38:43 - ERROR - stderr - +2025-02-05 20:38:43 - INFO - stdout - {'loss': 0.6981, 'grad_norm': 1.2030659914016724, 'learning_rate': 1.0934183679480014e-05, 'epoch': 1.46} +2025-02-05 20:38:43 - ERROR - stderr - 49%|████▊ | 10906/22434 [10:31:03<8:12:44, 2.56s/it] +2025-02-05 20:38:46 - ERROR - stderr - 49%|████▊ | 10907/22434 [10:31:06<8:13:24, 2.57s/it] +2025-02-05 20:38:46 - ERROR - stderr - +2025-02-05 20:38:46 - ERROR - stderr - +2025-02-05 20:38:46 - INFO - stdout - {'loss': 0.6035, 'grad_norm': 1.1344598531723022, 'learning_rate': 1.0932746236712106e-05, 'epoch': 1.46} +2025-02-05 20:38:46 - ERROR - stderr - 49%|████▊ | 10907/22434 [10:31:06<8:13:24, 2.57s/it] +2025-02-05 20:38:48 - ERROR - stderr - 49%|████▊ | 10908/22434 [10:31:08<8:09:50, 2.55s/it] +2025-02-05 20:38:48 - ERROR - stderr - +2025-02-05 20:38:48 - ERROR - stderr - +2025-02-05 20:38:48 - INFO - stdout - {'loss': 0.7401, 'grad_norm': 1.2621605396270752, 'learning_rate': 1.0931308774501999e-05, 'epoch': 1.46} +2025-02-05 20:38:48 - ERROR - stderr - 49%|████▊ | 10908/22434 [10:31:08<8:09:50, 2.55s/it] +2025-02-05 20:38:51 - ERROR - stderr - 49%|████▊ | 10909/22434 [10:31:11<8:07:07, 2.54s/it] +2025-02-05 20:38:51 - ERROR - stderr - +2025-02-05 20:38:51 - ERROR - stderr - +2025-02-05 20:38:51 - INFO - stdout - {'loss': 0.7147, 'grad_norm': 1.152124285697937, 
'learning_rate': 1.0929871292879652e-05, 'epoch': 1.46} +2025-02-05 20:38:51 - ERROR - stderr - 49%|████▊ | 10909/22434 [10:31:11<8:07:07, 2.54s/it] +2025-02-05 20:38:53 - ERROR - stderr - 49%|████▊ | 10910/22434 [10:31:13<8:05:33, 2.53s/it] +2025-02-05 20:38:53 - ERROR - stderr - +2025-02-05 20:38:53 - ERROR - stderr - +2025-02-05 20:38:53 - INFO - stdout - {'loss': 0.7069, 'grad_norm': 1.2316539287567139, 'learning_rate': 1.0928433791875026e-05, 'epoch': 1.46} +2025-02-05 20:38:53 - ERROR - stderr - 49%|████▊ | 10910/22434 [10:31:13<8:05:33, 2.53s/it] +2025-02-05 20:38:56 - ERROR - stderr - 49%|████▊ | 10911/22434 [10:31:16<8:02:41, 2.51s/it] +2025-02-05 20:38:56 - ERROR - stderr - +2025-02-05 20:38:56 - ERROR - stderr - +2025-02-05 20:38:56 - INFO - stdout - {'loss': 0.7863, 'grad_norm': 1.3426806926727295, 'learning_rate': 1.0926996271518085e-05, 'epoch': 1.46} +2025-02-05 20:38:56 - ERROR - stderr - 49%|████▊ | 10911/22434 [10:31:16<8:02:41, 2.51s/it] +2025-02-05 20:38:58 - ERROR - stderr - 49%|████▊ | 10912/22434 [10:31:18<7:58:06, 2.49s/it] +2025-02-05 20:38:58 - ERROR - stderr - +2025-02-05 20:38:58 - ERROR - stderr - +2025-02-05 20:38:58 - INFO - stdout - {'loss': 0.7831, 'grad_norm': 1.3194398880004883, 'learning_rate': 1.0925558731838795e-05, 'epoch': 1.46} +2025-02-05 20:38:58 - ERROR - stderr - 49%|████▊ | 10912/22434 [10:31:18<7:58:06, 2.49s/it] +2025-02-05 20:39:01 - ERROR - stderr - 49%|████▊ | 10913/22434 [10:31:21<7:56:58, 2.48s/it] +2025-02-05 20:39:01 - ERROR - stderr - +2025-02-05 20:39:01 - ERROR - stderr - +2025-02-05 20:39:01 - INFO - stdout - {'loss': 0.6343, 'grad_norm': 1.0863442420959473, 'learning_rate': 1.0924121172867119e-05, 'epoch': 1.46} +2025-02-05 20:39:01 - ERROR - stderr - 49%|████▊ | 10913/22434 [10:31:21<7:56:58, 2.48s/it] +2025-02-05 20:39:03 - ERROR - stderr - 49%|████▊ | 10914/22434 [10:31:23<7:52:35, 2.46s/it] +2025-02-05 20:39:03 - ERROR - stderr - +2025-02-05 20:39:03 - ERROR - stderr - +2025-02-05 20:39:03 - INFO - 
stdout - {'loss': 0.6221, 'grad_norm': 1.1019114255905151, 'learning_rate': 1.092268359463302e-05, 'epoch': 1.46} +2025-02-05 20:39:03 - ERROR - stderr - 49%|████▊ | 10914/22434 [10:31:23<7:52:35, 2.46s/it] +2025-02-05 20:39:06 - ERROR - stderr - 49%|████▊ | 10915/22434 [10:31:25<7:53:04, 2.46s/it] +2025-02-05 20:39:06 - ERROR - stderr - +2025-02-05 20:39:06 - ERROR - stderr - +2025-02-05 20:39:06 - INFO - stdout - {'loss': 0.6801, 'grad_norm': 1.1885719299316406, 'learning_rate': 1.0921245997166467e-05, 'epoch': 1.46} +2025-02-05 20:39:06 - ERROR - stderr - 49%|████▊ | 10915/22434 [10:31:25<7:53:04, 2.46s/it] +2025-02-05 20:39:08 - ERROR - stderr - 49%|████▊ | 10916/22434 [10:31:28<7:53:22, 2.47s/it] +2025-02-05 20:39:08 - ERROR - stderr - +2025-02-05 20:39:08 - ERROR - stderr - +2025-02-05 20:39:08 - INFO - stdout - {'loss': 0.7189, 'grad_norm': 1.1551927328109741, 'learning_rate': 1.091980838049742e-05, 'epoch': 1.46} +2025-02-05 20:39:08 - ERROR - stderr - 49%|████▊ | 10916/22434 [10:31:28<7:53:22, 2.47s/it] +2025-02-05 20:39:11 - ERROR - stderr - 49%|████▊ | 10917/22434 [10:31:31<8:16:48, 2.59s/it] +2025-02-05 20:39:11 - ERROR - stderr - +2025-02-05 20:39:11 - ERROR - stderr - +2025-02-05 20:39:11 - INFO - stdout - {'loss': 0.821, 'grad_norm': 1.293895959854126, 'learning_rate': 1.0918370744655851e-05, 'epoch': 1.46} +2025-02-05 20:39:11 - ERROR - stderr - 49%|████▊ | 10917/22434 [10:31:31<8:16:48, 2.59s/it] +2025-02-05 20:39:14 - ERROR - stderr - 49%|████▊ | 10918/22434 [10:31:33<8:12:58, 2.57s/it] +2025-02-05 20:39:14 - ERROR - stderr - +2025-02-05 20:39:14 - ERROR - stderr - +2025-02-05 20:39:14 - INFO - stdout - {'loss': 0.706, 'grad_norm': 1.3254491090774536, 'learning_rate': 1.0916933089671721e-05, 'epoch': 1.46} +2025-02-05 20:39:14 - ERROR - stderr - 49%|████▊ | 10918/22434 [10:31:33<8:12:58, 2.57s/it] +2025-02-05 20:39:16 - ERROR - stderr - 49%|████▊ | 10919/22434 [10:31:36<8:10:26, 2.56s/it] +2025-02-05 20:39:16 - ERROR - stderr - +2025-02-05 
20:39:16 - ERROR - stderr - +2025-02-05 20:39:16 - INFO - stdout - {'loss': 0.6509, 'grad_norm': 1.1972441673278809, 'learning_rate': 1.0915495415574996e-05, 'epoch': 1.46} +2025-02-05 20:39:16 - ERROR - stderr - 49%|████▊ | 10919/22434 [10:31:36<8:10:26, 2.56s/it] +2025-02-05 20:39:19 - ERROR - stderr - 49%|████▊ | 10920/22434 [10:31:38<8:06:46, 2.54s/it] +2025-02-05 20:39:19 - ERROR - stderr - +2025-02-05 20:39:19 - ERROR - stderr - +2025-02-05 20:39:19 - INFO - stdout - {'loss': 0.7529, 'grad_norm': 1.3287737369537354, 'learning_rate': 1.0914057722395646e-05, 'epoch': 1.46} +2025-02-05 20:39:19 - ERROR - stderr - 49%|████▊ | 10920/22434 [10:31:38<8:06:46, 2.54s/it] +2025-02-05 20:39:21 - ERROR - stderr - 49%|████▊ | 10921/22434 [10:31:41<8:06:05, 2.53s/it] +2025-02-05 20:39:21 - ERROR - stderr - +2025-02-05 20:39:21 - ERROR - stderr - +2025-02-05 20:39:21 - INFO - stdout - {'loss': 0.6995, 'grad_norm': 1.1431668996810913, 'learning_rate': 1.0912620010163639e-05, 'epoch': 1.46} +2025-02-05 20:39:21 - ERROR - stderr - 49%|████▊ | 10921/22434 [10:31:41<8:06:05, 2.53s/it] +2025-02-05 20:39:24 - ERROR - stderr - 49%|████▊ | 10922/22434 [10:31:43<8:03:53, 2.52s/it] +2025-02-05 20:39:24 - ERROR - stderr - +2025-02-05 20:39:24 - ERROR - stderr - +2025-02-05 20:39:24 - INFO - stdout - {'loss': 0.6481, 'grad_norm': 1.1322269439697266, 'learning_rate': 1.0911182278908941e-05, 'epoch': 1.46} +2025-02-05 20:39:24 - ERROR - stderr - 49%|████▊ | 10922/22434 [10:31:43<8:03:53, 2.52s/it] +2025-02-05 20:39:26 - ERROR - stderr - 49%|████▊ | 10923/22434 [10:31:46<7:59:19, 2.50s/it] +2025-02-05 20:39:26 - ERROR - stderr - +2025-02-05 20:39:26 - ERROR - stderr - +2025-02-05 20:39:26 - INFO - stdout - {'loss': 0.7249, 'grad_norm': 1.1951957941055298, 'learning_rate': 1.090974452866152e-05, 'epoch': 1.46} +2025-02-05 20:39:26 - ERROR - stderr - 49%|████▊ | 10923/22434 [10:31:46<7:59:19, 2.50s/it] +2025-02-05 20:39:29 - ERROR - stderr - 49%|████▊ | 10924/22434 [10:31:48<8:02:15, 
2.51s/it] +2025-02-05 20:39:29 - ERROR - stderr - +2025-02-05 20:39:29 - ERROR - stderr - +2025-02-05 20:39:29 - INFO - stdout - {'loss': 0.7235, 'grad_norm': 1.4419785737991333, 'learning_rate': 1.0908306759451343e-05, 'epoch': 1.46} +2025-02-05 20:39:29 - ERROR - stderr - 49%|████▊ | 10924/22434 [10:31:48<8:02:15, 2.51s/it] +2025-02-05 20:39:31 - ERROR - stderr - 49%|████▊ | 10925/22434 [10:31:51<7:56:27, 2.48s/it] +2025-02-05 20:39:31 - ERROR - stderr - +2025-02-05 20:39:31 - ERROR - stderr - +2025-02-05 20:39:31 - INFO - stdout - {'loss': 0.6238, 'grad_norm': 1.215969443321228, 'learning_rate': 1.0906868971308384e-05, 'epoch': 1.46} +2025-02-05 20:39:31 - ERROR - stderr - 49%|████▊ | 10925/22434 [10:31:51<7:56:27, 2.48s/it] +2025-02-05 20:39:33 - ERROR - stderr - 49%|████▊ | 10926/22434 [10:31:53<7:55:09, 2.48s/it] +2025-02-05 20:39:33 - ERROR - stderr - +2025-02-05 20:39:33 - ERROR - stderr - +2025-02-05 20:39:33 - INFO - stdout - {'loss': 0.6625, 'grad_norm': 2.5027265548706055, 'learning_rate': 1.0905431164262605e-05, 'epoch': 1.46} +2025-02-05 20:39:33 - ERROR - stderr - 49%|████▊ | 10926/22434 [10:31:53<7:55:09, 2.48s/it] +2025-02-05 20:39:36 - ERROR - stderr - 49%|████▊ | 10927/22434 [10:31:56<7:53:36, 2.47s/it] +2025-02-05 20:39:36 - ERROR - stderr - +2025-02-05 20:39:36 - ERROR - stderr - +2025-02-05 20:39:36 - INFO - stdout - {'loss': 0.7467, 'grad_norm': 1.279018759727478, 'learning_rate': 1.0903993338343984e-05, 'epoch': 1.46} +2025-02-05 20:39:36 - ERROR - stderr - 49%|████▊ | 10927/22434 [10:31:56<7:53:36, 2.47s/it] +2025-02-05 20:39:38 - ERROR - stderr - 49%|████▊ | 10928/22434 [10:31:58<7:51:43, 2.46s/it] +2025-02-05 20:39:38 - ERROR - stderr - +2025-02-05 20:39:38 - ERROR - stderr - +2025-02-05 20:39:38 - INFO - stdout - {'loss': 0.6448, 'grad_norm': 1.1036295890808105, 'learning_rate': 1.0902555493582483e-05, 'epoch': 1.46} +2025-02-05 20:39:38 - ERROR - stderr - 49%|████▊ | 10928/22434 [10:31:58<7:51:43, 2.46s/it] +2025-02-05 20:39:41 - ERROR 
- stderr - 49%|████▊ | 10929/22434 [10:32:01<7:52:21, 2.46s/it] +2025-02-05 20:39:41 - ERROR - stderr - +2025-02-05 20:39:41 - ERROR - stderr - +2025-02-05 20:39:41 - INFO - stdout - {'loss': 0.7496, 'grad_norm': 1.3221873044967651, 'learning_rate': 1.090111763000808e-05, 'epoch': 1.46} +2025-02-05 20:39:41 - ERROR - stderr - 49%|████▊ | 10929/22434 [10:32:01<7:52:21, 2.46s/it] +2025-02-05 20:39:43 - ERROR - stderr - 49%|████▊ | 10930/22434 [10:32:03<7:53:03, 2.47s/it] +2025-02-05 20:39:43 - ERROR - stderr - +2025-02-05 20:39:43 - ERROR - stderr - +2025-02-05 20:39:43 - INFO - stdout - {'loss': 0.7526, 'grad_norm': 1.2621976137161255, 'learning_rate': 1.0899679747650742e-05, 'epoch': 1.46} +2025-02-05 20:39:43 - ERROR - stderr - 49%|████▊ | 10930/22434 [10:32:03<7:53:03, 2.47s/it] +2025-02-05 20:39:46 - ERROR - stderr - 49%|████▊ | 10931/22434 [10:32:06<7:53:09, 2.47s/it] +2025-02-05 20:39:46 - ERROR - stderr - +2025-02-05 20:39:46 - ERROR - stderr - +2025-02-05 20:39:46 - INFO - stdout - {'loss': 0.7906, 'grad_norm': 1.3293044567108154, 'learning_rate': 1.0898241846540439e-05, 'epoch': 1.46} +2025-02-05 20:39:46 - ERROR - stderr - 49%|████▊ | 10931/22434 [10:32:06<7:53:09, 2.47s/it] +2025-02-05 20:39:48 - ERROR - stderr - 49%|████▊ | 10932/22434 [10:32:08<7:49:25, 2.45s/it] +2025-02-05 20:39:48 - ERROR - stderr - +2025-02-05 20:39:48 - ERROR - stderr - +2025-02-05 20:39:48 - INFO - stdout - {'loss': 0.6363, 'grad_norm': 1.1575193405151367, 'learning_rate': 1.0896803926707142e-05, 'epoch': 1.46} +2025-02-05 20:39:48 - ERROR - stderr - 49%|████▊ | 10932/22434 [10:32:08<7:49:25, 2.45s/it] +2025-02-05 20:39:51 - ERROR - stderr - 49%|████▊ | 10933/22434 [10:32:10<7:50:00, 2.45s/it] +2025-02-05 20:39:51 - ERROR - stderr - +2025-02-05 20:39:51 - ERROR - stderr - +2025-02-05 20:39:51 - INFO - stdout - {'loss': 0.6524, 'grad_norm': 1.2418296337127686, 'learning_rate': 1.0895365988180829e-05, 'epoch': 1.46} +2025-02-05 20:39:51 - ERROR - stderr - 49%|████▊ | 10933/22434 
[10:32:10<7:50:00, 2.45s/it] +2025-02-05 20:39:53 - ERROR - stderr - 49%|████▊ | 10934/22434 [10:32:13<7:53:39, 2.47s/it] +2025-02-05 20:39:53 - ERROR - stderr - +2025-02-05 20:39:53 - ERROR - stderr - +2025-02-05 20:39:53 - INFO - stdout - {'loss': 0.7095, 'grad_norm': 1.318368911743164, 'learning_rate': 1.0893928030991468e-05, 'epoch': 1.46} +2025-02-05 20:39:53 - ERROR - stderr - 49%|████▊ | 10934/22434 [10:32:13<7:53:39, 2.47s/it] +2025-02-05 20:39:56 - ERROR - stderr - 49%|████▊ | 10935/22434 [10:32:16<8:02:21, 2.52s/it] +2025-02-05 20:39:56 - ERROR - stderr - +2025-02-05 20:39:56 - ERROR - stderr - +2025-02-05 20:39:56 - INFO - stdout - {'loss': 0.6393, 'grad_norm': 1.137374997138977, 'learning_rate': 1.0892490055169032e-05, 'epoch': 1.46} +2025-02-05 20:39:56 - ERROR - stderr - 49%|████▊ | 10935/22434 [10:32:16<8:02:21, 2.52s/it] +2025-02-05 20:39:58 - ERROR - stderr - 49%|████▊ | 10936/22434 [10:32:18<8:02:38, 2.52s/it] +2025-02-05 20:39:58 - ERROR - stderr - +2025-02-05 20:39:58 - ERROR - stderr - +2025-02-05 20:39:58 - INFO - stdout - {'loss': 0.7244, 'grad_norm': 1.255260705947876, 'learning_rate': 1.0891052060743494e-05, 'epoch': 1.46} +2025-02-05 20:39:58 - ERROR - stderr - 49%|████▊ | 10936/22434 [10:32:18<8:02:38, 2.52s/it] +2025-02-05 20:40:01 - ERROR - stderr - 49%|████▉ | 10937/22434 [10:32:21<7:59:55, 2.50s/it] +2025-02-05 20:40:01 - ERROR - stderr - +2025-02-05 20:40:01 - ERROR - stderr - +2025-02-05 20:40:01 - INFO - stdout - {'loss': 0.7447, 'grad_norm': 1.329527497291565, 'learning_rate': 1.0889614047744831e-05, 'epoch': 1.46} +2025-02-05 20:40:01 - ERROR - stderr - 49%|████▉ | 10937/22434 [10:32:21<7:59:55, 2.50s/it] +2025-02-05 20:40:03 - ERROR - stderr - 49%|████▉ | 10938/22434 [10:32:23<8:02:07, 2.52s/it] +2025-02-05 20:40:03 - ERROR - stderr - +2025-02-05 20:40:03 - ERROR - stderr - +2025-02-05 20:40:03 - INFO - stdout - {'loss': 0.7276, 'grad_norm': 1.2871960401535034, 'learning_rate': 1.0888176016203013e-05, 'epoch': 1.46} +2025-02-05 
20:40:03 - ERROR - stderr - 49%|████▉ | 10938/22434 [10:32:23<8:02:07, 2.52s/it] +2025-02-05 20:40:06 - ERROR - stderr - 49%|████▉ | 10939/22434 [10:32:25<7:58:20, 2.50s/it] +2025-02-05 20:40:06 - ERROR - stderr - +2025-02-05 20:40:06 - ERROR - stderr - +2025-02-05 20:40:06 - INFO - stdout - {'loss': 0.7082, 'grad_norm': 1.1659642457962036, 'learning_rate': 1.0886737966148014e-05, 'epoch': 1.46} +2025-02-05 20:40:06 - ERROR - stderr - 49%|████▉ | 10939/22434 [10:32:26<7:58:20, 2.50s/it] +2025-02-05 20:40:08 - ERROR - stderr - 49%|████▉ | 10940/22434 [10:32:28<8:00:13, 2.51s/it] +2025-02-05 20:40:08 - ERROR - stderr - +2025-02-05 20:40:08 - ERROR - stderr - +2025-02-05 20:40:08 - INFO - stdout - {'loss': 0.694, 'grad_norm': 1.0542985200881958, 'learning_rate': 1.0885299897609811e-05, 'epoch': 1.46} +2025-02-05 20:40:08 - ERROR - stderr - 49%|████▉ | 10940/22434 [10:32:28<8:00:13, 2.51s/it] +2025-02-05 20:40:11 - ERROR - stderr - 49%|████▉ | 10941/22434 [10:32:31<8:02:31, 2.52s/it] +2025-02-05 20:40:11 - ERROR - stderr - +2025-02-05 20:40:11 - ERROR - stderr - +2025-02-05 20:40:11 - INFO - stdout - {'loss': 0.739, 'grad_norm': 1.2396397590637207, 'learning_rate': 1.0883861810618382e-05, 'epoch': 1.46} +2025-02-05 20:40:11 - ERROR - stderr - 49%|████▉ | 10941/22434 [10:32:31<8:02:31, 2.52s/it] +2025-02-05 20:40:13 - ERROR - stderr - 49%|████▉ | 10942/22434 [10:32:33<8:03:15, 2.52s/it] +2025-02-05 20:40:13 - ERROR - stderr - +2025-02-05 20:40:13 - ERROR - stderr - +2025-02-05 20:40:13 - INFO - stdout - {'loss': 0.6637, 'grad_norm': 1.1542752981185913, 'learning_rate': 1.0882423705203698e-05, 'epoch': 1.46} +2025-02-05 20:40:13 - ERROR - stderr - 49%|████▉ | 10942/22434 [10:32:33<8:03:15, 2.52s/it] +2025-02-05 20:40:16 - ERROR - stderr - 49%|████▉ | 10943/22434 [10:32:36<7:57:37, 2.49s/it] +2025-02-05 20:40:16 - ERROR - stderr - +2025-02-05 20:40:16 - ERROR - stderr - +2025-02-05 20:40:16 - INFO - stdout - {'loss': 0.8444, 'grad_norm': 1.3775659799575806, 
'learning_rate': 1.0880985581395736e-05, 'epoch': 1.46} +2025-02-05 20:40:16 - ERROR - stderr - 49%|████▉ | 10943/22434 [10:32:36<7:57:37, 2.49s/it] +2025-02-05 20:40:18 - ERROR - stderr - 49%|████▉ | 10944/22434 [10:32:38<8:02:36, 2.52s/it] +2025-02-05 20:40:18 - ERROR - stderr - +2025-02-05 20:40:18 - ERROR - stderr - +2025-02-05 20:40:18 - INFO - stdout - {'loss': 0.6809, 'grad_norm': 1.2057346105575562, 'learning_rate': 1.0879547439224471e-05, 'epoch': 1.46} +2025-02-05 20:40:18 - ERROR - stderr - 49%|████▉ | 10944/22434 [10:32:38<8:02:36, 2.52s/it] +2025-02-05 20:40:21 - ERROR - stderr - 49%|████▉ | 10945/22434 [10:32:41<8:15:30, 2.59s/it] +2025-02-05 20:40:21 - ERROR - stderr - +2025-02-05 20:40:21 - ERROR - stderr - +2025-02-05 20:40:21 - INFO - stdout - {'loss': 0.6971, 'grad_norm': 1.110167384147644, 'learning_rate': 1.0878109278719882e-05, 'epoch': 1.46} +2025-02-05 20:40:21 - ERROR - stderr - 49%|████▉ | 10945/22434 [10:32:41<8:15:30, 2.59s/it] +2025-02-05 20:40:24 - ERROR - stderr - 49%|████▉ | 10946/22434 [10:32:43<8:14:46, 2.58s/it] +2025-02-05 20:40:24 - ERROR - stderr - +2025-02-05 20:40:24 - ERROR - stderr - +2025-02-05 20:40:24 - INFO - stdout - {'loss': 0.6949, 'grad_norm': 1.2810145616531372, 'learning_rate': 1.0876671099911947e-05, 'epoch': 1.46} +2025-02-05 20:40:24 - ERROR - stderr - 49%|████▉ | 10946/22434 [10:32:43<8:14:46, 2.58s/it] +2025-02-05 20:40:26 - ERROR - stderr - 49%|████▉ | 10947/22434 [10:32:46<8:09:59, 2.56s/it] +2025-02-05 20:40:26 - ERROR - stderr - +2025-02-05 20:40:26 - ERROR - stderr - +2025-02-05 20:40:26 - INFO - stdout - {'loss': 0.6403, 'grad_norm': 1.177228569984436, 'learning_rate': 1.087523290283064e-05, 'epoch': 1.46} +2025-02-05 20:40:26 - ERROR - stderr - 49%|████▉ | 10947/22434 [10:32:46<8:09:59, 2.56s/it] +2025-02-05 20:40:29 - ERROR - stderr - 49%|████▉ | 10948/22434 [10:32:49<8:11:06, 2.57s/it] +2025-02-05 20:40:29 - ERROR - stderr - +2025-02-05 20:40:29 - ERROR - stderr - +2025-02-05 20:40:29 - INFO - stdout 
- {'loss': 0.8029, 'grad_norm': 1.2795343399047852, 'learning_rate': 1.087379468750594e-05, 'epoch': 1.46} +2025-02-05 20:40:29 - ERROR - stderr - 49%|████▉ | 10948/22434 [10:32:49<8:11:06, 2.57s/it] +2025-02-05 20:40:31 - ERROR - stderr - 49%|████▉ | 10949/22434 [10:32:51<8:06:12, 2.54s/it] +2025-02-05 20:40:31 - ERROR - stderr - +2025-02-05 20:40:31 - ERROR - stderr - +2025-02-05 20:40:31 - INFO - stdout - {'loss': 0.5906, 'grad_norm': 1.0432038307189941, 'learning_rate': 1.0872356453967829e-05, 'epoch': 1.46} +2025-02-05 20:40:31 - ERROR - stderr - 49%|████▉ | 10949/22434 [10:32:51<8:06:12, 2.54s/it] +2025-02-05 20:40:34 - ERROR - stderr - 49%|████▉ | 10950/22434 [10:32:54<8:04:35, 2.53s/it] +2025-02-05 20:40:34 - ERROR - stderr - +2025-02-05 20:40:34 - ERROR - stderr - +2025-02-05 20:40:34 - INFO - stdout - {'loss': 0.6319, 'grad_norm': 1.1143854856491089, 'learning_rate': 1.087091820224628e-05, 'epoch': 1.46} +2025-02-05 20:40:34 - ERROR - stderr - 49%|████▉ | 10950/22434 [10:32:54<8:04:35, 2.53s/it] +2025-02-05 20:40:36 - ERROR - stderr - 49%|████▉ | 10951/22434 [10:32:56<7:59:58, 2.51s/it] +2025-02-05 20:40:36 - ERROR - stderr - +2025-02-05 20:40:36 - ERROR - stderr - +2025-02-05 20:40:36 - INFO - stdout - {'loss': 0.76, 'grad_norm': 1.437066674232483, 'learning_rate': 1.0869479932371274e-05, 'epoch': 1.46} +2025-02-05 20:40:36 - ERROR - stderr - 49%|████▉ | 10951/22434 [10:32:56<7:59:58, 2.51s/it] +2025-02-05 20:40:39 - ERROR - stderr - 49%|████▉ | 10952/22434 [10:32:58<7:57:34, 2.50s/it] +2025-02-05 20:40:39 - ERROR - stderr - +2025-02-05 20:40:39 - ERROR - stderr - +2025-02-05 20:40:39 - INFO - stdout - {'loss': 0.6835, 'grad_norm': 1.3062926530838013, 'learning_rate': 1.0868041644372792e-05, 'epoch': 1.46} +2025-02-05 20:40:39 - ERROR - stderr - 49%|████▉ | 10952/22434 [10:32:58<7:57:34, 2.50s/it] +2025-02-05 20:40:41 - ERROR - stderr - 49%|████▉ | 10953/22434 [10:33:01<8:03:22, 2.53s/it] +2025-02-05 20:40:41 - ERROR - stderr - +2025-02-05 20:40:41 - 
ERROR - stderr - +2025-02-05 20:40:41 - INFO - stdout - {'loss': 0.7212, 'grad_norm': 1.2938897609710693, 'learning_rate': 1.0866603338280812e-05, 'epoch': 1.46} +2025-02-05 20:40:41 - ERROR - stderr - 49%|████▉ | 10953/22434 [10:33:01<8:03:22, 2.53s/it] +2025-02-05 20:40:44 - ERROR - stderr - 49%|████▉ | 10954/22434 [10:33:04<8:02:52, 2.52s/it] +2025-02-05 20:40:44 - ERROR - stderr - +2025-02-05 20:40:44 - ERROR - stderr - +2025-02-05 20:40:44 - INFO - stdout - {'loss': 0.6775, 'grad_norm': 1.163225531578064, 'learning_rate': 1.0865165014125316e-05, 'epoch': 1.46} +2025-02-05 20:40:44 - ERROR - stderr - 49%|████▉ | 10954/22434 [10:33:04<8:02:52, 2.52s/it] +2025-02-05 20:40:46 - ERROR - stderr - 49%|████▉ | 10955/22434 [10:33:06<8:02:29, 2.52s/it] +2025-02-05 20:40:46 - ERROR - stderr - +2025-02-05 20:40:46 - ERROR - stderr - +2025-02-05 20:40:46 - INFO - stdout - {'loss': 0.7841, 'grad_norm': 1.3518708944320679, 'learning_rate': 1.086372667193628e-05, 'epoch': 1.46} +2025-02-05 20:40:46 - ERROR - stderr - 49%|████▉ | 10955/22434 [10:33:06<8:02:29, 2.52s/it] +2025-02-05 20:40:49 - ERROR - stderr - 49%|████▉ | 10956/22434 [10:33:09<8:00:24, 2.51s/it] +2025-02-05 20:40:49 - ERROR - stderr - +2025-02-05 20:40:49 - ERROR - stderr - +2025-02-05 20:40:49 - INFO - stdout - {'loss': 0.7388, 'grad_norm': 1.2324292659759521, 'learning_rate': 1.0862288311743691e-05, 'epoch': 1.47} +2025-02-05 20:40:49 - ERROR - stderr - 49%|████▉ | 10956/22434 [10:33:09<8:00:24, 2.51s/it] +2025-02-05 20:40:51 - ERROR - stderr - 49%|████▉ | 10957/22434 [10:33:11<8:04:37, 2.53s/it] +2025-02-05 20:40:51 - ERROR - stderr - +2025-02-05 20:40:51 - ERROR - stderr - +2025-02-05 20:40:51 - INFO - stdout - {'loss': 0.7113, 'grad_norm': 1.3129326105117798, 'learning_rate': 1.0860849933577529e-05, 'epoch': 1.47} +2025-02-05 20:40:51 - ERROR - stderr - 49%|████▉ | 10957/22434 [10:33:11<8:04:37, 2.53s/it] +2025-02-05 20:40:54 - ERROR - stderr - 49%|████▉ | 10958/22434 [10:33:14<8:04:40, 2.53s/it] 
+2025-02-05 20:40:54 - ERROR - stderr - +2025-02-05 20:40:54 - ERROR - stderr - +2025-02-05 20:40:54 - INFO - stdout - {'loss': 0.7119, 'grad_norm': 1.1984643936157227, 'learning_rate': 1.0859411537467768e-05, 'epoch': 1.47} +2025-02-05 20:40:54 - ERROR - stderr - 49%|████▉ | 10958/22434 [10:33:14<8:04:40, 2.53s/it] +2025-02-05 20:40:56 - ERROR - stderr - 49%|████▉ | 10959/22434 [10:33:16<8:08:06, 2.55s/it] +2025-02-05 20:40:57 - ERROR - stderr - +2025-02-05 20:40:57 - ERROR - stderr - +2025-02-05 20:40:57 - INFO - stdout - {'loss': 0.7684, 'grad_norm': 1.4695075750350952, 'learning_rate': 1.0857973123444401e-05, 'epoch': 1.47} +2025-02-05 20:40:57 - ERROR - stderr - 49%|████▉ | 10959/22434 [10:33:16<8:08:06, 2.55s/it] +2025-02-05 20:40:59 - ERROR - stderr - 49%|████▉ | 10960/22434 [10:33:19<8:00:21, 2.51s/it] +2025-02-05 20:40:59 - ERROR - stderr - +2025-02-05 20:40:59 - ERROR - stderr - +2025-02-05 20:40:59 - INFO - stdout - {'loss': 0.7778, 'grad_norm': 1.3796393871307373, 'learning_rate': 1.0856534691537402e-05, 'epoch': 1.47} +2025-02-05 20:40:59 - ERROR - stderr - 49%|████▉ | 10960/22434 [10:33:19<8:00:21, 2.51s/it] +2025-02-05 20:41:01 - ERROR - stderr - 49%|████▉ | 10961/22434 [10:33:21<7:57:04, 2.49s/it] +2025-02-05 20:41:01 - ERROR - stderr - +2025-02-05 20:41:01 - ERROR - stderr - +2025-02-05 20:41:01 - INFO - stdout - {'loss': 0.7289, 'grad_norm': 1.270552396774292, 'learning_rate': 1.0855096241776759e-05, 'epoch': 1.47} +2025-02-05 20:41:01 - ERROR - stderr - 49%|████▉ | 10961/22434 [10:33:21<7:57:04, 2.49s/it] +2025-02-05 20:41:04 - ERROR - stderr - 49%|████▉ | 10962/22434 [10:33:24<7:59:28, 2.51s/it] +2025-02-05 20:41:04 - ERROR - stderr - +2025-02-05 20:41:04 - ERROR - stderr - +2025-02-05 20:41:04 - INFO - stdout - {'loss': 0.8081, 'grad_norm': 1.4660818576812744, 'learning_rate': 1.0853657774192454e-05, 'epoch': 1.47} +2025-02-05 20:41:04 - ERROR - stderr - 49%|████▉ | 10962/22434 [10:33:24<7:59:28, 2.51s/it] +2025-02-05 20:41:06 - ERROR - stderr 
- 49%|████▉ | 10963/22434 [10:33:26<7:58:50, 2.50s/it] +2025-02-05 20:41:06 - ERROR - stderr - +2025-02-05 20:41:06 - ERROR - stderr - +2025-02-05 20:41:06 - INFO - stdout - {'loss': 0.702, 'grad_norm': 1.2011971473693848, 'learning_rate': 1.0852219288814467e-05, 'epoch': 1.47} +2025-02-05 20:41:06 - ERROR - stderr - 49%|████▉ | 10963/22434 [10:33:26<7:58:50, 2.50s/it] +2025-02-05 20:41:09 - ERROR - stderr - 49%|████▉ | 10964/22434 [10:33:29<7:57:40, 2.50s/it] +2025-02-05 20:41:09 - ERROR - stderr - +2025-02-05 20:41:09 - ERROR - stderr - +2025-02-05 20:41:09 - INFO - stdout - {'loss': 0.7373, 'grad_norm': 1.2819328308105469, 'learning_rate': 1.0850780785672786e-05, 'epoch': 1.47} +2025-02-05 20:41:09 - ERROR - stderr - 49%|████▉ | 10964/22434 [10:33:29<7:57:40, 2.50s/it] +2025-02-05 20:41:11 - ERROR - stderr - 49%|████▉ | 10965/22434 [10:33:31<7:57:45, 2.50s/it] +2025-02-05 20:41:11 - ERROR - stderr - +2025-02-05 20:41:11 - ERROR - stderr - +2025-02-05 20:41:11 - INFO - stdout - {'loss': 0.7263, 'grad_norm': 1.309237003326416, 'learning_rate': 1.0849342264797391e-05, 'epoch': 1.47} +2025-02-05 20:41:11 - ERROR - stderr - 49%|████▉ | 10965/22434 [10:33:31<7:57:45, 2.50s/it] +2025-02-05 20:41:14 - ERROR - stderr - 49%|████▉ | 10966/22434 [10:33:34<7:54:30, 2.48s/it] +2025-02-05 20:41:14 - ERROR - stderr - +2025-02-05 20:41:14 - ERROR - stderr - +2025-02-05 20:41:14 - INFO - stdout - {'loss': 0.7383, 'grad_norm': 1.2203730344772339, 'learning_rate': 1.0847903726218271e-05, 'epoch': 1.47} +2025-02-05 20:41:14 - ERROR - stderr - 49%|████▉ | 10966/22434 [10:33:34<7:54:30, 2.48s/it] +2025-02-05 20:41:16 - ERROR - stderr - 49%|████▉ | 10967/22434 [10:33:36<7:53:43, 2.48s/it] +2025-02-05 20:41:16 - ERROR - stderr - +2025-02-05 20:41:16 - ERROR - stderr - +2025-02-05 20:41:16 - INFO - stdout - {'loss': 0.6819, 'grad_norm': 1.2435630559921265, 'learning_rate': 1.084646516996541e-05, 'epoch': 1.47} +2025-02-05 20:41:16 - ERROR - stderr - 49%|████▉ | 10967/22434 
[10:33:36<7:53:43, 2.48s/it] +2025-02-05 20:41:19 - ERROR - stderr - 49%|████▉ | 10968/22434 [10:33:39<8:00:25, 2.51s/it] +2025-02-05 20:41:19 - ERROR - stderr - +2025-02-05 20:41:19 - ERROR - stderr - +2025-02-05 20:41:19 - INFO - stdout - {'loss': 0.8351, 'grad_norm': 1.3134021759033203, 'learning_rate': 1.0845026596068792e-05, 'epoch': 1.47} +2025-02-05 20:41:19 - ERROR - stderr - 49%|████▉ | 10968/22434 [10:33:39<8:00:25, 2.51s/it] +2025-02-05 20:41:21 - ERROR - stderr - 49%|████▉ | 10969/22434 [10:33:41<7:57:20, 2.50s/it] +2025-02-05 20:41:21 - ERROR - stderr - +2025-02-05 20:41:21 - ERROR - stderr - +2025-02-05 20:41:21 - INFO - stdout - {'loss': 0.7997, 'grad_norm': 1.234737753868103, 'learning_rate': 1.0843588004558402e-05, 'epoch': 1.47} +2025-02-05 20:41:21 - ERROR - stderr - 49%|████▉ | 10969/22434 [10:33:41<7:57:20, 2.50s/it] +2025-02-05 20:41:24 - ERROR - stderr - 49%|████▉ | 10970/22434 [10:33:44<7:53:45, 2.48s/it] +2025-02-05 20:41:24 - ERROR - stderr - +2025-02-05 20:41:24 - ERROR - stderr - +2025-02-05 20:41:24 - INFO - stdout - {'loss': 0.6885, 'grad_norm': 1.2755217552185059, 'learning_rate': 1.0842149395464231e-05, 'epoch': 1.47} +2025-02-05 20:41:24 - ERROR - stderr - 49%|████▉ | 10970/22434 [10:33:44<7:53:45, 2.48s/it] +2025-02-05 20:41:26 - ERROR - stderr - 49%|████▉ | 10971/22434 [10:33:46<7:53:06, 2.48s/it] +2025-02-05 20:41:26 - ERROR - stderr - +2025-02-05 20:41:26 - ERROR - stderr - +2025-02-05 20:41:26 - INFO - stdout - {'loss': 0.6986, 'grad_norm': 1.2468311786651611, 'learning_rate': 1.0840710768816258e-05, 'epoch': 1.47} +2025-02-05 20:41:26 - ERROR - stderr - 49%|████▉ | 10971/22434 [10:33:46<7:53:06, 2.48s/it] +2025-02-05 20:41:29 - ERROR - stderr - 49%|████▉ | 10972/22434 [10:33:49<7:54:17, 2.48s/it] +2025-02-05 20:41:29 - ERROR - stderr - +2025-02-05 20:41:29 - ERROR - stderr - +2025-02-05 20:41:29 - INFO - stdout - {'loss': 0.735, 'grad_norm': 1.2712064981460571, 'learning_rate': 1.0839272124644476e-05, 'epoch': 1.47} 
+2025-02-05 20:41:29 - ERROR - stderr - 49%|████▉ | 10972/22434 [10:33:49<7:54:17, 2.48s/it] +2025-02-05 20:41:31 - ERROR - stderr - 49%|████▉ | 10973/22434 [10:33:51<7:56:57, 2.50s/it] +2025-02-05 20:41:31 - ERROR - stderr - +2025-02-05 20:41:31 - ERROR - stderr - +2025-02-05 20:41:31 - INFO - stdout - {'loss': 0.7362, 'grad_norm': 1.1605310440063477, 'learning_rate': 1.0837833462978866e-05, 'epoch': 1.47} +2025-02-05 20:41:31 - ERROR - stderr - 49%|████▉ | 10973/22434 [10:33:51<7:56:57, 2.50s/it] +2025-02-05 20:41:34 - ERROR - stderr - 49%|████▉ | 10974/22434 [10:33:53<7:52:29, 2.47s/it] +2025-02-05 20:41:34 - ERROR - stderr - +2025-02-05 20:41:34 - ERROR - stderr - +2025-02-05 20:41:34 - INFO - stdout - {'loss': 0.724, 'grad_norm': 1.3182663917541504, 'learning_rate': 1.0836394783849424e-05, 'epoch': 1.47} +2025-02-05 20:41:34 - ERROR - stderr - 49%|████▉ | 10974/22434 [10:33:54<7:52:29, 2.47s/it] +2025-02-05 20:41:36 - ERROR - stderr - 49%|████▉ | 10975/22434 [10:33:56<7:54:45, 2.49s/it] +2025-02-05 20:41:36 - ERROR - stderr - +2025-02-05 20:41:36 - ERROR - stderr - +2025-02-05 20:41:36 - INFO - stdout - {'loss': 0.7195, 'grad_norm': 1.2682924270629883, 'learning_rate': 1.083495608728613e-05, 'epoch': 1.47} +2025-02-05 20:41:36 - ERROR - stderr - 49%|████▉ | 10975/22434 [10:33:56<7:54:45, 2.49s/it] +2025-02-05 20:41:39 - ERROR - stderr - 49%|████▉ | 10976/22434 [10:33:58<7:53:53, 2.48s/it] +2025-02-05 20:41:39 - ERROR - stderr - +2025-02-05 20:41:39 - ERROR - stderr - +2025-02-05 20:41:39 - INFO - stdout - {'loss': 0.7574, 'grad_norm': 1.308272361755371, 'learning_rate': 1.0833517373318976e-05, 'epoch': 1.47} +2025-02-05 20:41:39 - ERROR - stderr - 49%|████▉ | 10976/22434 [10:33:59<7:53:53, 2.48s/it] +2025-02-05 20:41:41 - ERROR - stderr - 49%|████▉ | 10977/22434 [10:34:01<7:52:36, 2.48s/it] +2025-02-05 20:41:41 - ERROR - stderr - +2025-02-05 20:41:41 - ERROR - stderr - +2025-02-05 20:41:41 - INFO - stdout - {'loss': 0.6926, 'grad_norm': 1.2059719562530518, 
'learning_rate': 1.083207864197795e-05, 'epoch': 1.47} +2025-02-05 20:41:41 - ERROR - stderr - 49%|████▉ | 10977/22434 [10:34:01<7:52:36, 2.48s/it] +2025-02-05 20:41:44 - ERROR - stderr - 49%|████▉ | 10978/22434 [10:34:03<7:51:58, 2.47s/it] +2025-02-05 20:41:44 - ERROR - stderr - +2025-02-05 20:41:44 - ERROR - stderr - +2025-02-05 20:41:44 - INFO - stdout - {'loss': 0.7569, 'grad_norm': 1.1888363361358643, 'learning_rate': 1.083063989329304e-05, 'epoch': 1.47} +2025-02-05 20:41:44 - ERROR - stderr - 49%|████▉ | 10978/22434 [10:34:03<7:51:58, 2.47s/it] +2025-02-05 20:41:46 - ERROR - stderr - 49%|████▉ | 10979/22434 [10:34:06<7:54:25, 2.48s/it] +2025-02-05 20:41:46 - ERROR - stderr - +2025-02-05 20:41:46 - ERROR - stderr - +2025-02-05 20:41:46 - INFO - stdout - {'loss': 0.724, 'grad_norm': 1.2443981170654297, 'learning_rate': 1.0829201127294238e-05, 'epoch': 1.47} +2025-02-05 20:41:46 - ERROR - stderr - 49%|████▉ | 10979/22434 [10:34:06<7:54:25, 2.48s/it] +2025-02-05 20:41:49 - ERROR - stderr - 49%|████▉ | 10980/22434 [10:34:08<7:51:57, 2.47s/it] +2025-02-05 20:41:49 - ERROR - stderr - +2025-02-05 20:41:49 - ERROR - stderr - +2025-02-05 20:41:49 - INFO - stdout - {'loss': 0.7724, 'grad_norm': 1.241886854171753, 'learning_rate': 1.082776234401153e-05, 'epoch': 1.47} +2025-02-05 20:41:49 - ERROR - stderr - 49%|████▉ | 10980/22434 [10:34:08<7:51:57, 2.47s/it] +2025-02-05 20:41:51 - ERROR - stderr - 49%|████▉ | 10981/22434 [10:34:11<7:49:09, 2.46s/it] +2025-02-05 20:41:51 - ERROR - stderr - +2025-02-05 20:41:51 - ERROR - stderr - +2025-02-05 20:41:51 - INFO - stdout - {'loss': 0.7453, 'grad_norm': 1.2591633796691895, 'learning_rate': 1.0826323543474909e-05, 'epoch': 1.47} +2025-02-05 20:41:51 - ERROR - stderr - 49%|████▉ | 10981/22434 [10:34:11<7:49:09, 2.46s/it] +2025-02-05 20:41:54 - ERROR - stderr - 49%|████▉ | 10982/22434 [10:34:13<7:51:43, 2.47s/it] +2025-02-05 20:41:54 - ERROR - stderr - +2025-02-05 20:41:54 - ERROR - stderr - +2025-02-05 20:41:54 - INFO - stdout - 
{'loss': 0.6293, 'grad_norm': 1.1055887937545776, 'learning_rate': 1.0824884725714366e-05, 'epoch': 1.47} +2025-02-05 20:41:54 - ERROR - stderr - 49%|████▉ | 10982/22434 [10:34:13<7:51:43, 2.47s/it] +2025-02-05 20:41:56 - ERROR - stderr - 49%|████▉ | 10983/22434 [10:34:16<7:49:09, 2.46s/it] +2025-02-05 20:41:56 - ERROR - stderr - +2025-02-05 20:41:56 - ERROR - stderr - +2025-02-05 20:41:56 - INFO - stdout - {'loss': 0.6131, 'grad_norm': 1.1809444427490234, 'learning_rate': 1.082344589075989e-05, 'epoch': 1.47} +2025-02-05 20:41:56 - ERROR - stderr - 49%|████▉ | 10983/22434 [10:34:16<7:49:09, 2.46s/it] +2025-02-05 20:41:58 - ERROR - stderr - 49%|████▉ | 10984/22434 [10:34:18<7:50:52, 2.47s/it] +2025-02-05 20:41:58 - ERROR - stderr - +2025-02-05 20:41:58 - ERROR - stderr - +2025-02-05 20:41:58 - INFO - stdout - {'loss': 0.6606, 'grad_norm': 1.1707051992416382, 'learning_rate': 1.0822007038641467e-05, 'epoch': 1.47} +2025-02-05 20:41:58 - ERROR - stderr - 49%|████▉ | 10984/22434 [10:34:18<7:50:52, 2.47s/it] +2025-02-05 20:42:01 - ERROR - stderr - 49%|████▉ | 10985/22434 [10:34:21<8:22:35, 2.63s/it] +2025-02-05 20:42:01 - ERROR - stderr - +2025-02-05 20:42:01 - ERROR - stderr - +2025-02-05 20:42:01 - INFO - stdout - {'loss': 0.7155, 'grad_norm': 1.2626782655715942, 'learning_rate': 1.0820568169389098e-05, 'epoch': 1.47} +2025-02-05 20:42:01 - ERROR - stderr - 49%|████▉ | 10985/22434 [10:34:21<8:22:35, 2.63s/it] +2025-02-05 20:42:04 - ERROR - stderr - 49%|████▉ | 10986/22434 [10:34:24<8:17:22, 2.61s/it] +2025-02-05 20:42:04 - ERROR - stderr - +2025-02-05 20:42:04 - ERROR - stderr - +2025-02-05 20:42:04 - INFO - stdout - {'loss': 0.6804, 'grad_norm': 1.2243694067001343, 'learning_rate': 1.0819129283032772e-05, 'epoch': 1.47} +2025-02-05 20:42:04 - ERROR - stderr - 49%|████▉ | 10986/22434 [10:34:24<8:17:22, 2.61s/it] +2025-02-05 20:42:07 - ERROR - stderr - 49%|████▉ | 10987/22434 [10:34:26<8:15:06, 2.60s/it] +2025-02-05 20:42:07 - ERROR - stderr - +2025-02-05 20:42:07 - 
ERROR - stderr - +2025-02-05 20:42:07 - INFO - stdout - {'loss': 0.7373, 'grad_norm': 1.2063841819763184, 'learning_rate': 1.081769037960248e-05, 'epoch': 1.47} +2025-02-05 20:42:07 - ERROR - stderr - 49%|████▉ | 10987/22434 [10:34:26<8:15:06, 2.60s/it] +2025-02-05 20:42:09 - ERROR - stderr - 49%|████▉ | 10988/22434 [10:34:29<8:09:48, 2.57s/it] +2025-02-05 20:42:09 - ERROR - stderr - +2025-02-05 20:42:09 - ERROR - stderr - +2025-02-05 20:42:09 - INFO - stdout - {'loss': 0.7463, 'grad_norm': 1.1805076599121094, 'learning_rate': 1.0816251459128213e-05, 'epoch': 1.47} +2025-02-05 20:42:09 - ERROR - stderr - 49%|████▉ | 10988/22434 [10:34:29<8:09:48, 2.57s/it] +2025-02-05 20:42:12 - ERROR - stderr - 49%|████▉ | 10989/22434 [10:34:31<8:03:29, 2.53s/it] +2025-02-05 20:42:12 - ERROR - stderr - +2025-02-05 20:42:12 - ERROR - stderr - +2025-02-05 20:42:12 - INFO - stdout - {'loss': 0.6635, 'grad_norm': 1.328493595123291, 'learning_rate': 1.0814812521639963e-05, 'epoch': 1.47} +2025-02-05 20:42:12 - ERROR - stderr - 49%|████▉ | 10989/22434 [10:34:31<8:03:29, 2.53s/it] +2025-02-05 20:42:14 - ERROR - stderr - 49%|████▉ | 10990/22434 [10:34:34<8:04:07, 2.54s/it] +2025-02-05 20:42:14 - ERROR - stderr - +2025-02-05 20:42:14 - ERROR - stderr - +2025-02-05 20:42:14 - INFO - stdout - {'loss': 0.6932, 'grad_norm': 1.1209900379180908, 'learning_rate': 1.0813373567167729e-05, 'epoch': 1.47} +2025-02-05 20:42:14 - ERROR - stderr - 49%|████▉ | 10990/22434 [10:34:34<8:04:07, 2.54s/it] +2025-02-05 20:42:17 - ERROR - stderr - 49%|████▉ | 10991/22434 [10:34:36<8:09:02, 2.56s/it] +2025-02-05 20:42:17 - ERROR - stderr - +2025-02-05 20:42:17 - ERROR - stderr - +2025-02-05 20:42:17 - INFO - stdout - {'loss': 0.6868, 'grad_norm': 1.2943460941314697, 'learning_rate': 1.08119345957415e-05, 'epoch': 1.47} +2025-02-05 20:42:17 - ERROR - stderr - 49%|████▉ | 10991/22434 [10:34:37<8:09:02, 2.56s/it] +2025-02-05 20:42:19 - ERROR - stderr - 49%|████▉ | 10992/22434 [10:34:39<8:10:14, 2.57s/it] +2025-02-05 
20:42:19 - ERROR - stderr - +2025-02-05 20:42:19 - ERROR - stderr - +2025-02-05 20:42:19 - INFO - stdout - {'loss': 0.7685, 'grad_norm': 1.2325923442840576, 'learning_rate': 1.081049560739127e-05, 'epoch': 1.47} +2025-02-05 20:42:19 - ERROR - stderr - 49%|████▉ | 10992/22434 [10:34:39<8:10:14, 2.57s/it] +2025-02-05 20:42:22 - ERROR - stderr - 49%|████▉ | 10993/22434 [10:34:42<8:18:57, 2.62s/it] +2025-02-05 20:42:22 - ERROR - stderr - +2025-02-05 20:42:22 - ERROR - stderr - +2025-02-05 20:42:22 - INFO - stdout - {'loss': 0.6965, 'grad_norm': 1.2439771890640259, 'learning_rate': 1.080905660214704e-05, 'epoch': 1.47} +2025-02-05 20:42:22 - ERROR - stderr - 49%|████▉ | 10993/22434 [10:34:42<8:18:57, 2.62s/it] +2025-02-05 20:42:25 - ERROR - stderr - 49%|████▉ | 10994/22434 [10:34:44<8:12:25, 2.58s/it] +2025-02-05 20:42:25 - ERROR - stderr - +2025-02-05 20:42:25 - ERROR - stderr - +2025-02-05 20:42:25 - INFO - stdout - {'loss': 0.7244, 'grad_norm': 1.3063714504241943, 'learning_rate': 1.0807617580038797e-05, 'epoch': 1.47} +2025-02-05 20:42:25 - ERROR - stderr - 49%|████▉ | 10994/22434 [10:34:44<8:12:25, 2.58s/it] +2025-02-05 20:42:27 - ERROR - stderr - 49%|████▉ | 10995/22434 [10:34:47<8:09:10, 2.57s/it] +2025-02-05 20:42:27 - ERROR - stderr - +2025-02-05 20:42:27 - ERROR - stderr - +2025-02-05 20:42:27 - INFO - stdout - {'loss': 0.656, 'grad_norm': 1.1719521284103394, 'learning_rate': 1.0806178541096535e-05, 'epoch': 1.47} +2025-02-05 20:42:27 - ERROR - stderr - 49%|████▉ | 10995/22434 [10:34:47<8:09:10, 2.57s/it] +2025-02-05 20:42:30 - ERROR - stderr - 49%|████▉ | 10996/22434 [10:34:49<8:07:56, 2.56s/it] +2025-02-05 20:42:30 - ERROR - stderr - +2025-02-05 20:42:30 - ERROR - stderr - +2025-02-05 20:42:30 - INFO - stdout - {'loss': 0.7115, 'grad_norm': 1.2490668296813965, 'learning_rate': 1.0804739485350255e-05, 'epoch': 1.47} +2025-02-05 20:42:30 - ERROR - stderr - 49%|████▉ | 10996/22434 [10:34:49<8:07:56, 2.56s/it] +2025-02-05 20:42:32 - ERROR - stderr - 49%|████▉ | 
10997/22434 [10:34:52<8:02:48, 2.53s/it] +2025-02-05 20:42:32 - ERROR - stderr - +2025-02-05 20:42:32 - ERROR - stderr - +2025-02-05 20:42:32 - INFO - stdout - {'loss': 0.758, 'grad_norm': 1.2804933786392212, 'learning_rate': 1.0803300412829949e-05, 'epoch': 1.47} +2025-02-05 20:42:32 - ERROR - stderr - 49%|████▉ | 10997/22434 [10:34:52<8:02:48, 2.53s/it] +2025-02-05 20:42:34 - ERROR - stderr - 49%|████▉ | 10998/22434 [10:34:54<7:56:26, 2.50s/it] +2025-02-05 20:42:35 - ERROR - stderr - +2025-02-05 20:42:35 - ERROR - stderr - +2025-02-05 20:42:35 - INFO - stdout - {'loss': 0.627, 'grad_norm': 1.2336697578430176, 'learning_rate': 1.0801861323565616e-05, 'epoch': 1.47} +2025-02-05 20:42:35 - ERROR - stderr - 49%|████▉ | 10998/22434 [10:34:54<7:56:26, 2.50s/it] +2025-02-05 20:42:37 - ERROR - stderr - 49%|████▉ | 10999/22434 [10:34:57<8:20:11, 2.62s/it] +2025-02-05 20:42:37 - ERROR - stderr - +2025-02-05 20:42:37 - ERROR - stderr - +2025-02-05 20:42:37 - INFO - stdout - {'loss': 0.6653, 'grad_norm': 1.2621501684188843, 'learning_rate': 1.0800422217587253e-05, 'epoch': 1.47} +2025-02-05 20:42:37 - ERROR - stderr - 49%|████▉ | 10999/22434 [10:34:57<8:20:11, 2.62s/it] +2025-02-05 20:42:40 - ERROR - stderr - 49%|████▉ | 11000/22434 [10:35:00<8:10:25, 2.57s/it] +2025-02-05 20:42:40 - ERROR - stderr - +2025-02-05 20:42:40 - ERROR - stderr - +2025-02-05 20:42:40 - INFO - stdout - {'loss': 0.5881, 'grad_norm': 1.1541835069656372, 'learning_rate': 1.0798983094924851e-05, 'epoch': 1.47} +2025-02-05 20:42:40 - ERROR - stderr - 49%|████▉ | 11000/22434 [10:35:00<8:10:25, 2.57s/it] +2025-02-05 20:42:42 - ERROR - stderr - 49%|████▉ | 11001/22434 [10:35:02<8:02:46, 2.53s/it] +2025-02-05 20:42:42 - ERROR - stderr - +2025-02-05 20:42:42 - ERROR - stderr - +2025-02-05 20:42:42 - INFO - stdout - {'loss': 0.6551, 'grad_norm': 1.1506747007369995, 'learning_rate': 1.0797543955608411e-05, 'epoch': 1.47} +2025-02-05 20:42:42 - ERROR - stderr - 49%|████▉ | 11001/22434 [10:35:02<8:02:46, 
2.53s/it] +2025-02-05 20:42:45 - ERROR - stderr - 49%|████▉ | 11002/22434 [10:35:05<8:09:30, 2.57s/it] +2025-02-05 20:42:45 - ERROR - stderr - +2025-02-05 20:42:45 - ERROR - stderr - +2025-02-05 20:42:45 - INFO - stdout - {'loss': 0.7063, 'grad_norm': 1.3766402006149292, 'learning_rate': 1.0796104799667935e-05, 'epoch': 1.47} +2025-02-05 20:42:45 - ERROR - stderr - 49%|████▉ | 11002/22434 [10:35:05<8:09:30, 2.57s/it] +2025-02-05 20:42:47 - ERROR - stderr - 49%|████▉ | 11003/22434 [10:35:07<8:06:26, 2.55s/it] +2025-02-05 20:42:47 - ERROR - stderr - +2025-02-05 20:42:47 - ERROR - stderr - +2025-02-05 20:42:47 - INFO - stdout - {'loss': 0.6897, 'grad_norm': 1.2164684534072876, 'learning_rate': 1.0794665627133409e-05, 'epoch': 1.47} +2025-02-05 20:42:47 - ERROR - stderr - 49%|████▉ | 11003/22434 [10:35:07<8:06:26, 2.55s/it] +2025-02-05 20:42:50 - ERROR - stderr - 49%|████▉ | 11004/22434 [10:35:10<8:03:40, 2.54s/it] +2025-02-05 20:42:50 - ERROR - stderr - +2025-02-05 20:42:50 - ERROR - stderr - +2025-02-05 20:42:50 - INFO - stdout - {'loss': 0.6959, 'grad_norm': 1.1555522680282593, 'learning_rate': 1.0793226438034843e-05, 'epoch': 1.47} +2025-02-05 20:42:50 - ERROR - stderr - 49%|████▉ | 11004/22434 [10:35:10<8:03:40, 2.54s/it] +2025-02-05 20:42:53 - ERROR - stderr - 49%|████▉ | 11005/22434 [10:35:12<8:04:11, 2.54s/it] +2025-02-05 20:42:53 - ERROR - stderr - +2025-02-05 20:42:53 - ERROR - stderr - +2025-02-05 20:42:53 - INFO - stdout - {'loss': 0.7559, 'grad_norm': 1.3476839065551758, 'learning_rate': 1.079178723240223e-05, 'epoch': 1.47} +2025-02-05 20:42:53 - ERROR - stderr - 49%|████▉ | 11005/22434 [10:35:12<8:04:11, 2.54s/it] +2025-02-05 20:42:55 - ERROR - stderr - 49%|████▉ | 11006/22434 [10:35:15<8:01:21, 2.53s/it] +2025-02-05 20:42:55 - ERROR - stderr - +2025-02-05 20:42:55 - ERROR - stderr - +2025-02-05 20:42:55 - INFO - stdout - {'loss': 0.6216, 'grad_norm': 1.1879136562347412, 'learning_rate': 1.0790348010265572e-05, 'epoch': 1.47} +2025-02-05 20:42:55 - ERROR 
- stderr - 49%|████▉ | 11006/22434 [10:35:15<8:01:21, 2.53s/it] +2025-02-05 20:42:58 - ERROR - stderr - 49%|████▉ | 11007/22434 [10:35:17<8:00:53, 2.53s/it] +2025-02-05 20:42:58 - ERROR - stderr - +2025-02-05 20:42:58 - ERROR - stderr - +2025-02-05 20:42:58 - INFO - stdout - {'loss': 0.6877, 'grad_norm': 1.2088356018066406, 'learning_rate': 1.0788908771654865e-05, 'epoch': 1.47} +2025-02-05 20:42:58 - ERROR - stderr - 49%|████▉ | 11007/22434 [10:35:17<8:00:53, 2.53s/it] +2025-02-05 20:43:00 - ERROR - stderr - 49%|████▉ | 11008/22434 [10:35:20<7:59:54, 2.52s/it] +2025-02-05 20:43:00 - ERROR - stderr - +2025-02-05 20:43:00 - ERROR - stderr - +2025-02-05 20:43:00 - INFO - stdout - {'loss': 0.7244, 'grad_norm': 1.17787504196167, 'learning_rate': 1.0787469516600109e-05, 'epoch': 1.47} +2025-02-05 20:43:00 - ERROR - stderr - 49%|████▉ | 11008/22434 [10:35:20<7:59:54, 2.52s/it] +2025-02-05 20:43:03 - ERROR - stderr - 49%|████▉ | 11009/22434 [10:35:22<8:00:10, 2.52s/it] +2025-02-05 20:43:03 - ERROR - stderr - +2025-02-05 20:43:03 - ERROR - stderr - +2025-02-05 20:43:03 - INFO - stdout - {'loss': 0.5867, 'grad_norm': 1.0703020095825195, 'learning_rate': 1.0786030245131305e-05, 'epoch': 1.47} +2025-02-05 20:43:03 - ERROR - stderr - 49%|████▉ | 11009/22434 [10:35:22<8:00:10, 2.52s/it] +2025-02-05 20:43:05 - ERROR - stderr - 49%|████▉ | 11010/22434 [10:35:25<8:08:33, 2.57s/it] +2025-02-05 20:43:05 - ERROR - stderr - +2025-02-05 20:43:05 - ERROR - stderr - +2025-02-05 20:43:05 - INFO - stdout - {'loss': 0.7471, 'grad_norm': 1.4556201696395874, 'learning_rate': 1.0784590957278452e-05, 'epoch': 1.47} +2025-02-05 20:43:05 - ERROR - stderr - 49%|████▉ | 11010/22434 [10:35:25<8:08:33, 2.57s/it] +2025-02-05 20:43:08 - ERROR - stderr - 49%|████▉ | 11011/22434 [10:35:28<8:10:48, 2.58s/it] +2025-02-05 20:43:08 - ERROR - stderr - +2025-02-05 20:43:08 - ERROR - stderr - +2025-02-05 20:43:08 - INFO - stdout - {'loss': 0.6816, 'grad_norm': 1.1953924894332886, 'learning_rate': 
1.078315165307155e-05, 'epoch': 1.47} +2025-02-05 20:43:08 - ERROR - stderr - 49%|████▉ | 11011/22434 [10:35:28<8:10:48, 2.58s/it] +2025-02-05 20:43:10 - ERROR - stderr - 49%|████▉ | 11012/22434 [10:35:30<8:09:03, 2.57s/it] +2025-02-05 20:43:10 - ERROR - stderr - +2025-02-05 20:43:10 - ERROR - stderr - +2025-02-05 20:43:10 - INFO - stdout - {'loss': 0.7157, 'grad_norm': 1.2086997032165527, 'learning_rate': 1.0781712332540602e-05, 'epoch': 1.47} +2025-02-05 20:43:10 - ERROR - stderr - 49%|████▉ | 11012/22434 [10:35:30<8:09:03, 2.57s/it] +2025-02-05 20:43:13 - ERROR - stderr - 49%|████▉ | 11013/22434 [10:35:33<8:09:14, 2.57s/it] +2025-02-05 20:43:13 - ERROR - stderr - +2025-02-05 20:43:13 - ERROR - stderr - +2025-02-05 20:43:13 - INFO - stdout - {'loss': 0.7183, 'grad_norm': 1.2595924139022827, 'learning_rate': 1.0780272995715608e-05, 'epoch': 1.47} +2025-02-05 20:43:13 - ERROR - stderr - 49%|████▉ | 11013/22434 [10:35:33<8:09:14, 2.57s/it] +2025-02-05 20:43:16 - ERROR - stderr - 49%|████▉ | 11014/22434 [10:35:35<8:10:19, 2.58s/it] +2025-02-05 20:43:16 - ERROR - stderr - +2025-02-05 20:43:16 - ERROR - stderr - +2025-02-05 20:43:16 - INFO - stdout - {'loss': 0.731, 'grad_norm': 1.1625953912734985, 'learning_rate': 1.0778833642626573e-05, 'epoch': 1.47} +2025-02-05 20:43:16 - ERROR - stderr - 49%|████▉ | 11014/22434 [10:35:35<8:10:19, 2.58s/it] +2025-02-05 20:43:18 - ERROR - stderr - 49%|████▉ | 11015/22434 [10:35:38<8:09:47, 2.57s/it] +2025-02-05 20:43:18 - ERROR - stderr - +2025-02-05 20:43:18 - ERROR - stderr - +2025-02-05 20:43:18 - INFO - stdout - {'loss': 0.6491, 'grad_norm': 1.2217767238616943, 'learning_rate': 1.0777394273303495e-05, 'epoch': 1.47} +2025-02-05 20:43:18 - ERROR - stderr - 49%|████▉ | 11015/22434 [10:35:38<8:09:47, 2.57s/it] +2025-02-05 20:43:21 - ERROR - stderr - 49%|████▉ | 11016/22434 [10:35:40<8:07:34, 2.56s/it] +2025-02-05 20:43:21 - ERROR - stderr - +2025-02-05 20:43:21 - ERROR - stderr - +2025-02-05 20:43:21 - INFO - stdout - {'loss': 
0.734, 'grad_norm': 1.238851547241211, 'learning_rate': 1.0775954887776374e-05, 'epoch': 1.47} +2025-02-05 20:43:21 - ERROR - stderr - 49%|████▉ | 11016/22434 [10:35:40<8:07:34, 2.56s/it] +2025-02-05 20:43:23 - ERROR - stderr - 49%|████▉ | 11017/22434 [10:35:43<8:17:01, 2.61s/it] +2025-02-05 20:43:23 - ERROR - stderr - +2025-02-05 20:43:23 - ERROR - stderr - +2025-02-05 20:43:23 - INFO - stdout - {'loss': 0.6682, 'grad_norm': 1.1740225553512573, 'learning_rate': 1.0774515486075216e-05, 'epoch': 1.47} +2025-02-05 20:43:23 - ERROR - stderr - 49%|████▉ | 11017/22434 [10:35:43<8:17:01, 2.61s/it] +2025-02-05 20:43:26 - ERROR - stderr - 49%|████▉ | 11018/22434 [10:35:46<8:21:26, 2.64s/it] +2025-02-05 20:43:26 - ERROR - stderr - +2025-02-05 20:43:26 - ERROR - stderr - +2025-02-05 20:43:26 - INFO - stdout - {'loss': 0.6931, 'grad_norm': 1.1755441427230835, 'learning_rate': 1.0773076068230028e-05, 'epoch': 1.47} +2025-02-05 20:43:26 - ERROR - stderr - 49%|████▉ | 11018/22434 [10:35:46<8:21:26, 2.64s/it] +2025-02-05 20:43:29 - ERROR - stderr - 49%|████▉ | 11019/22434 [10:35:48<8:13:02, 2.59s/it] +2025-02-05 20:43:29 - ERROR - stderr - +2025-02-05 20:43:29 - ERROR - stderr - +2025-02-05 20:43:29 - INFO - stdout - {'loss': 0.6879, 'grad_norm': 1.263599157333374, 'learning_rate': 1.0771636634270807e-05, 'epoch': 1.47} +2025-02-05 20:43:29 - ERROR - stderr - 49%|████▉ | 11019/22434 [10:35:48<8:13:02, 2.59s/it] +2025-02-05 20:43:31 - ERROR - stderr - 49%|████▉ | 11020/22434 [10:35:51<8:20:54, 2.63s/it] +2025-02-05 20:43:31 - ERROR - stderr - +2025-02-05 20:43:31 - ERROR - stderr - +2025-02-05 20:43:31 - INFO - stdout - {'loss': 0.6872, 'grad_norm': 1.1955450773239136, 'learning_rate': 1.077019718422756e-05, 'epoch': 1.47} +2025-02-05 20:43:31 - ERROR - stderr - 49%|████▉ | 11020/22434 [10:35:51<8:20:54, 2.63s/it] +2025-02-05 20:43:34 - ERROR - stderr - 49%|████▉ | 11021/22434 [10:35:54<8:17:22, 2.61s/it] +2025-02-05 20:43:34 - ERROR - stderr - +2025-02-05 20:43:34 - ERROR - 
stderr - +2025-02-05 20:43:34 - INFO - stdout - {'loss': 0.7942, 'grad_norm': 1.3294062614440918, 'learning_rate': 1.0768757718130287e-05, 'epoch': 1.47} +2025-02-05 20:43:34 - ERROR - stderr - 49%|████▉ | 11021/22434 [10:35:54<8:17:22, 2.61s/it] +2025-02-05 20:43:36 - ERROR - stderr - 49%|████▉ | 11022/22434 [10:35:56<8:11:44, 2.59s/it] +2025-02-05 20:43:36 - ERROR - stderr - +2025-02-05 20:43:36 - ERROR - stderr - +2025-02-05 20:43:36 - INFO - stdout - {'loss': 0.6454, 'grad_norm': 1.1144938468933105, 'learning_rate': 1.0767318236008997e-05, 'epoch': 1.47} +2025-02-05 20:43:36 - ERROR - stderr - 49%|████▉ | 11022/22434 [10:35:56<8:11:44, 2.59s/it] +2025-02-05 20:43:39 - ERROR - stderr - 49%|████▉ | 11023/22434 [10:35:59<8:11:22, 2.58s/it] +2025-02-05 20:43:39 - ERROR - stderr - +2025-02-05 20:43:39 - ERROR - stderr - +2025-02-05 20:43:39 - INFO - stdout - {'loss': 0.7131, 'grad_norm': 1.2441428899765015, 'learning_rate': 1.0765878737893692e-05, 'epoch': 1.47} +2025-02-05 20:43:39 - ERROR - stderr - 49%|████▉ | 11023/22434 [10:35:59<8:11:22, 2.58s/it] +2025-02-05 20:43:41 - ERROR - stderr - 49%|████▉ | 11024/22434 [10:36:01<8:06:19, 2.56s/it] +2025-02-05 20:43:41 - ERROR - stderr - +2025-02-05 20:43:41 - ERROR - stderr - +2025-02-05 20:43:41 - INFO - stdout - {'loss': 0.7286, 'grad_norm': 1.1600240468978882, 'learning_rate': 1.0764439223814378e-05, 'epoch': 1.47} +2025-02-05 20:43:41 - ERROR - stderr - 49%|████▉ | 11024/22434 [10:36:01<8:06:19, 2.56s/it] +2025-02-05 20:43:44 - ERROR - stderr - 49%|████▉ | 11025/22434 [10:36:04<8:04:17, 2.55s/it] +2025-02-05 20:43:44 - ERROR - stderr - +2025-02-05 20:43:44 - ERROR - stderr - +2025-02-05 20:43:44 - INFO - stdout - {'loss': 0.8622, 'grad_norm': 1.5299153327941895, 'learning_rate': 1.0762999693801057e-05, 'epoch': 1.47} +2025-02-05 20:43:44 - ERROR - stderr - 49%|████▉ | 11025/22434 [10:36:04<8:04:17, 2.55s/it] +2025-02-05 20:43:46 - ERROR - stderr - 49%|████▉ | 11026/22434 [10:36:06<8:00:44, 2.53s/it] +2025-02-05 
20:43:46 - ERROR - stderr - +2025-02-05 20:43:46 - ERROR - stderr - +2025-02-05 20:43:46 - INFO - stdout - {'loss': 0.7904, 'grad_norm': 1.302994966506958, 'learning_rate': 1.0761560147883742e-05, 'epoch': 1.47} +2025-02-05 20:43:46 - ERROR - stderr - 49%|████▉ | 11026/22434 [10:36:06<8:00:44, 2.53s/it] +2025-02-05 20:43:49 - ERROR - stderr - 49%|████▉ | 11027/22434 [10:36:09<8:00:09, 2.53s/it] +2025-02-05 20:43:49 - ERROR - stderr - +2025-02-05 20:43:49 - ERROR - stderr - +2025-02-05 20:43:49 - INFO - stdout - {'loss': 0.7185, 'grad_norm': 1.1514214277267456, 'learning_rate': 1.0760120586092432e-05, 'epoch': 1.47} +2025-02-05 20:43:49 - ERROR - stderr - 49%|████▉ | 11027/22434 [10:36:09<8:00:09, 2.53s/it] +2025-02-05 20:43:51 - ERROR - stderr - 49%|████▉ | 11028/22434 [10:36:11<7:55:53, 2.50s/it] +2025-02-05 20:43:51 - ERROR - stderr - +2025-02-05 20:43:51 - ERROR - stderr - +2025-02-05 20:43:51 - INFO - stdout - {'loss': 0.6692, 'grad_norm': 1.2764613628387451, 'learning_rate': 1.0758681008457137e-05, 'epoch': 1.47} +2025-02-05 20:43:51 - ERROR - stderr - 49%|████▉ | 11028/22434 [10:36:11<7:55:53, 2.50s/it] +2025-02-05 20:43:54 - ERROR - stderr - 49%|████▉ | 11029/22434 [10:36:14<7:54:07, 2.49s/it] +2025-02-05 20:43:54 - ERROR - stderr - +2025-02-05 20:43:54 - ERROR - stderr - +2025-02-05 20:43:54 - INFO - stdout - {'loss': 0.7132, 'grad_norm': 1.3437010049819946, 'learning_rate': 1.0757241415007861e-05, 'epoch': 1.47} +2025-02-05 20:43:54 - ERROR - stderr - 49%|████▉ | 11029/22434 [10:36:14<7:54:07, 2.49s/it] +2025-02-05 20:43:56 - ERROR - stderr - 49%|████▉ | 11030/22434 [10:36:16<7:55:23, 2.50s/it] +2025-02-05 20:43:56 - ERROR - stderr - +2025-02-05 20:43:56 - ERROR - stderr - +2025-02-05 20:43:56 - INFO - stdout - {'loss': 0.6757, 'grad_norm': 1.2485750913619995, 'learning_rate': 1.0755801805774613e-05, 'epoch': 1.47} +2025-02-05 20:43:56 - ERROR - stderr - 49%|████▉ | 11030/22434 [10:36:16<7:55:23, 2.50s/it] +2025-02-05 20:43:59 - ERROR - stderr - 49%|████▉ 
| 11031/22434 [10:36:19<7:57:28, 2.51s/it] +2025-02-05 20:43:59 - ERROR - stderr - +2025-02-05 20:43:59 - ERROR - stderr - +2025-02-05 20:43:59 - INFO - stdout - {'loss': 0.6756, 'grad_norm': 1.2216953039169312, 'learning_rate': 1.07543621807874e-05, 'epoch': 1.48} +2025-02-05 20:43:59 - ERROR - stderr - 49%|████▉ | 11031/22434 [10:36:19<7:57:28, 2.51s/it] +2025-02-05 20:44:01 - ERROR - stderr - 49%|████▉ | 11032/22434 [10:36:21<7:55:56, 2.50s/it] +2025-02-05 20:44:01 - ERROR - stderr - +2025-02-05 20:44:01 - ERROR - stderr - +2025-02-05 20:44:01 - INFO - stdout - {'loss': 0.6142, 'grad_norm': 1.1537508964538574, 'learning_rate': 1.0752922540076227e-05, 'epoch': 1.48} +2025-02-05 20:44:01 - ERROR - stderr - 49%|████▉ | 11032/22434 [10:36:21<7:55:56, 2.50s/it] +2025-02-05 20:44:04 - ERROR - stderr - 49%|████▉ | 11033/22434 [10:36:24<7:58:22, 2.52s/it] +2025-02-05 20:44:04 - ERROR - stderr - +2025-02-05 20:44:04 - ERROR - stderr - +2025-02-05 20:44:04 - INFO - stdout - {'loss': 0.6769, 'grad_norm': 1.3035407066345215, 'learning_rate': 1.0751482883671108e-05, 'epoch': 1.48} +2025-02-05 20:44:04 - ERROR - stderr - 49%|████▉ | 11033/22434 [10:36:24<7:58:22, 2.52s/it] +2025-02-05 20:44:07 - ERROR - stderr - 49%|████▉ | 11034/22434 [10:36:27<8:11:52, 2.59s/it] +2025-02-05 20:44:07 - ERROR - stderr - +2025-02-05 20:44:07 - ERROR - stderr - +2025-02-05 20:44:07 - INFO - stdout - {'loss': 0.6792, 'grad_norm': 1.2123315334320068, 'learning_rate': 1.0750043211602045e-05, 'epoch': 1.48} +2025-02-05 20:44:07 - ERROR - stderr - 49%|████▉ | 11034/22434 [10:36:27<8:11:52, 2.59s/it] +2025-02-05 20:44:09 - ERROR - stderr - 49%|████▉ | 11035/22434 [10:36:29<8:06:52, 2.56s/it] +2025-02-05 20:44:09 - ERROR - stderr - +2025-02-05 20:44:09 - ERROR - stderr - +2025-02-05 20:44:09 - INFO - stdout - {'loss': 0.8214, 'grad_norm': 1.4011425971984863, 'learning_rate': 1.0748603523899048e-05, 'epoch': 1.48} +2025-02-05 20:44:09 - ERROR - stderr - 49%|████▉ | 11035/22434 [10:36:29<8:06:52, 
2.56s/it] +2025-02-05 20:44:12 - ERROR - stderr - 49%|████▉ | 11036/22434 [10:36:32<8:09:20, 2.58s/it] +2025-02-05 20:44:12 - ERROR - stderr - +2025-02-05 20:44:12 - ERROR - stderr - +2025-02-05 20:44:12 - INFO - stdout - {'loss': 0.6623, 'grad_norm': 1.1624788045883179, 'learning_rate': 1.0747163820592128e-05, 'epoch': 1.48} +2025-02-05 20:44:12 - ERROR - stderr - 49%|████▉ | 11036/22434 [10:36:32<8:09:20, 2.58s/it] +2025-02-05 20:44:14 - ERROR - stderr - 49%|████▉ | 11037/22434 [10:36:34<8:09:20, 2.58s/it] +2025-02-05 20:44:14 - ERROR - stderr - +2025-02-05 20:44:14 - ERROR - stderr - +2025-02-05 20:44:14 - INFO - stdout - {'loss': 0.8074, 'grad_norm': 1.2880839109420776, 'learning_rate': 1.0745724101711293e-05, 'epoch': 1.48} +2025-02-05 20:44:14 - ERROR - stderr - 49%|████▉ | 11037/22434 [10:36:34<8:09:20, 2.58s/it] +2025-02-05 20:44:17 - ERROR - stderr - 49%|████▉ | 11038/22434 [10:36:37<8:06:20, 2.56s/it] +2025-02-05 20:44:17 - ERROR - stderr - +2025-02-05 20:44:17 - ERROR - stderr - +2025-02-05 20:44:17 - INFO - stdout - {'loss': 0.6794, 'grad_norm': 1.1198848485946655, 'learning_rate': 1.0744284367286553e-05, 'epoch': 1.48} +2025-02-05 20:44:17 - ERROR - stderr - 49%|████▉ | 11038/22434 [10:36:37<8:06:20, 2.56s/it] +2025-02-05 20:44:19 - ERROR - stderr - 49%|████▉ | 11039/22434 [10:36:39<7:59:08, 2.52s/it] +2025-02-05 20:44:19 - ERROR - stderr - +2025-02-05 20:44:19 - ERROR - stderr - +2025-02-05 20:44:19 - INFO - stdout - {'loss': 0.7242, 'grad_norm': 1.2235897779464722, 'learning_rate': 1.0742844617347919e-05, 'epoch': 1.48} +2025-02-05 20:44:19 - ERROR - stderr - 49%|████▉ | 11039/22434 [10:36:39<7:59:08, 2.52s/it] +2025-02-05 20:44:22 - ERROR - stderr - 49%|████▉ | 11040/22434 [10:36:42<8:03:21, 2.55s/it] +2025-02-05 20:44:22 - ERROR - stderr - +2025-02-05 20:44:22 - ERROR - stderr - +2025-02-05 20:44:22 - INFO - stdout - {'loss': 0.6953, 'grad_norm': 1.1562097072601318, 'learning_rate': 1.0741404851925397e-05, 'epoch': 1.48} +2025-02-05 20:44:22 - 
ERROR - stderr - 49%|████▉ | 11040/22434 [10:36:42<8:03:21, 2.55s/it] +2025-02-05 20:44:25 - ERROR - stderr - 49%|████▉ | 11041/22434 [10:36:44<8:03:58, 2.55s/it] +2025-02-05 20:44:25 - ERROR - stderr - +2025-02-05 20:44:25 - ERROR - stderr - +2025-02-05 20:44:25 - INFO - stdout - {'loss': 0.7635, 'grad_norm': 1.2664004564285278, 'learning_rate': 1.0739965071049001e-05, 'epoch': 1.48} +2025-02-05 20:44:25 - ERROR - stderr - 49%|████▉ | 11041/22434 [10:36:44<8:03:58, 2.55s/it] +2025-02-05 20:44:27 - ERROR - stderr - 49%|████▉ | 11042/22434 [10:36:47<8:00:05, 2.53s/it] +2025-02-05 20:44:27 - ERROR - stderr - +2025-02-05 20:44:27 - ERROR - stderr - +2025-02-05 20:44:27 - INFO - stdout - {'loss': 0.7494, 'grad_norm': 1.1384872198104858, 'learning_rate': 1.073852527474874e-05, 'epoch': 1.48} +2025-02-05 20:44:27 - ERROR - stderr - 49%|████▉ | 11042/22434 [10:36:47<8:00:05, 2.53s/it] +2025-02-05 20:44:29 - ERROR - stderr - 49%|████▉ | 11043/22434 [10:36:49<7:57:10, 2.51s/it] +2025-02-05 20:44:30 - ERROR - stderr - +2025-02-05 20:44:30 - ERROR - stderr - +2025-02-05 20:44:30 - INFO - stdout - {'loss': 0.7631, 'grad_norm': 1.311767339706421, 'learning_rate': 1.0737085463054628e-05, 'epoch': 1.48} +2025-02-05 20:44:30 - ERROR - stderr - 49%|████▉ | 11043/22434 [10:36:49<7:57:10, 2.51s/it] +2025-02-05 20:44:32 - ERROR - stderr - 49%|████▉ | 11044/22434 [10:36:52<7:58:13, 2.52s/it] +2025-02-05 20:44:32 - ERROR - stderr - +2025-02-05 20:44:32 - ERROR - stderr - +2025-02-05 20:44:32 - INFO - stdout - {'loss': 0.7371, 'grad_norm': 1.1707119941711426, 'learning_rate': 1.0735645635996676e-05, 'epoch': 1.48} +2025-02-05 20:44:32 - ERROR - stderr - 49%|████▉ | 11044/22434 [10:36:52<7:58:13, 2.52s/it] +2025-02-05 20:44:35 - ERROR - stderr - 49%|████▉ | 11045/22434 [10:36:54<8:00:15, 2.53s/it] +2025-02-05 20:44:35 - ERROR - stderr - +2025-02-05 20:44:35 - ERROR - stderr - +2025-02-05 20:44:35 - INFO - stdout - {'loss': 0.5966, 'grad_norm': 1.2678793668746948, 'learning_rate': 
1.0734205793604892e-05, 'epoch': 1.48} +2025-02-05 20:44:35 - ERROR - stderr - 49%|████▉ | 11045/22434 [10:36:54<8:00:15, 2.53s/it] +2025-02-05 20:44:37 - ERROR - stderr - 49%|████▉ | 11046/22434 [10:36:57<7:58:40, 2.52s/it] +2025-02-05 20:44:37 - ERROR - stderr - +2025-02-05 20:44:37 - ERROR - stderr - +2025-02-05 20:44:37 - INFO - stdout - {'loss': 0.7517, 'grad_norm': 1.1148202419281006, 'learning_rate': 1.0732765935909293e-05, 'epoch': 1.48} +2025-02-05 20:44:37 - ERROR - stderr - 49%|████▉ | 11046/22434 [10:36:57<7:58:40, 2.52s/it] +2025-02-05 20:44:40 - ERROR - stderr - 49%|████▉ | 11047/22434 [10:36:59<8:01:22, 2.54s/it] +2025-02-05 20:44:40 - ERROR - stderr - +2025-02-05 20:44:40 - ERROR - stderr - +2025-02-05 20:44:40 - INFO - stdout - {'loss': 0.7736, 'grad_norm': 1.3307673931121826, 'learning_rate': 1.073132606293989e-05, 'epoch': 1.48} +2025-02-05 20:44:40 - ERROR - stderr - 49%|████▉ | 11047/22434 [10:36:59<8:01:22, 2.54s/it] +2025-02-05 20:44:42 - ERROR - stderr - 49%|████▉ | 11048/22434 [10:37:02<7:58:55, 2.52s/it] +2025-02-05 20:44:42 - ERROR - stderr - +2025-02-05 20:44:42 - ERROR - stderr - +2025-02-05 20:44:42 - INFO - stdout - {'loss': 0.7367, 'grad_norm': 1.3070316314697266, 'learning_rate': 1.0729886174726694e-05, 'epoch': 1.48} +2025-02-05 20:44:42 - ERROR - stderr - 49%|████▉ | 11048/22434 [10:37:02<7:58:55, 2.52s/it] +2025-02-05 20:44:45 - ERROR - stderr - 49%|████▉ | 11049/22434 [10:37:04<7:57:47, 2.52s/it] +2025-02-05 20:44:45 - ERROR - stderr - +2025-02-05 20:44:45 - ERROR - stderr - +2025-02-05 20:44:45 - INFO - stdout - {'loss': 0.7185, 'grad_norm': 1.1212128400802612, 'learning_rate': 1.0728446271299714e-05, 'epoch': 1.48} +2025-02-05 20:44:45 - ERROR - stderr - 49%|████▉ | 11049/22434 [10:37:04<7:57:47, 2.52s/it] +2025-02-05 20:44:47 - ERROR - stderr - 49%|████▉ | 11050/22434 [10:37:07<7:56:11, 2.51s/it] +2025-02-05 20:44:47 - ERROR - stderr - +2025-02-05 20:44:47 - ERROR - stderr - +2025-02-05 20:44:47 - INFO - stdout - {'loss': 
0.7954, 'grad_norm': 1.2541477680206299, 'learning_rate': 1.0727006352688973e-05, 'epoch': 1.48} +2025-02-05 20:44:47 - ERROR - stderr - 49%|████▉ | 11050/22434 [10:37:07<7:56:11, 2.51s/it] +2025-02-05 20:44:50 - ERROR - stderr - 49%|████▉ | 11051/22434 [10:37:09<7:55:07, 2.50s/it] +2025-02-05 20:44:50 - ERROR - stderr - +2025-02-05 20:44:50 - ERROR - stderr - +2025-02-05 20:44:50 - INFO - stdout - {'loss': 0.6888, 'grad_norm': 1.216055989265442, 'learning_rate': 1.0725566418924484e-05, 'epoch': 1.48} +2025-02-05 20:44:50 - ERROR - stderr - 49%|████▉ | 11051/22434 [10:37:09<7:55:07, 2.50s/it] +2025-02-05 20:44:52 - ERROR - stderr - 49%|████▉ | 11052/22434 [10:37:12<7:56:31, 2.51s/it] +2025-02-05 20:44:52 - ERROR - stderr - +2025-02-05 20:44:52 - ERROR - stderr - +2025-02-05 20:44:52 - INFO - stdout - {'loss': 0.7019, 'grad_norm': 1.2605098485946655, 'learning_rate': 1.0724126470036254e-05, 'epoch': 1.48} +2025-02-05 20:44:52 - ERROR - stderr - 49%|████▉ | 11052/22434 [10:37:12<7:56:31, 2.51s/it] +2025-02-05 20:44:55 - ERROR - stderr - 49%|████▉ | 11053/22434 [10:37:14<7:57:47, 2.52s/it] +2025-02-05 20:44:55 - ERROR - stderr - +2025-02-05 20:44:55 - ERROR - stderr - +2025-02-05 20:44:55 - INFO - stdout - {'loss': 0.7027, 'grad_norm': 1.2346242666244507, 'learning_rate': 1.0722686506054298e-05, 'epoch': 1.48} +2025-02-05 20:44:55 - ERROR - stderr - 49%|████▉ | 11053/22434 [10:37:15<7:57:47, 2.52s/it] +2025-02-05 20:44:57 - ERROR - stderr - 49%|████▉ | 11054/22434 [10:37:17<8:00:05, 2.53s/it] +2025-02-05 20:44:57 - ERROR - stderr - +2025-02-05 20:44:57 - ERROR - stderr - +2025-02-05 20:44:57 - INFO - stdout - {'loss': 0.7523, 'grad_norm': 1.2767425775527954, 'learning_rate': 1.0721246527008637e-05, 'epoch': 1.48} +2025-02-05 20:44:57 - ERROR - stderr - 49%|████▉ | 11054/22434 [10:37:17<8:00:05, 2.53s/it] +2025-02-05 20:45:00 - ERROR - stderr - 49%|████▉ | 11055/22434 [10:37:20<8:02:04, 2.54s/it] +2025-02-05 20:45:00 - ERROR - stderr - +2025-02-05 20:45:00 - ERROR - 
stderr - +2025-02-05 20:45:00 - INFO - stdout - {'loss': 0.5793, 'grad_norm': 1.0733360052108765, 'learning_rate': 1.071980653292928e-05, 'epoch': 1.48} +2025-02-05 20:45:00 - ERROR - stderr - 49%|████▉ | 11055/22434 [10:37:20<8:02:04, 2.54s/it] +2025-02-05 20:45:02 - ERROR - stderr - 49%|████▉ | 11056/22434 [10:37:22<8:02:08, 2.54s/it] +2025-02-05 20:45:02 - ERROR - stderr - +2025-02-05 20:45:02 - ERROR - stderr - +2025-02-05 20:45:02 - INFO - stdout - {'loss': 0.7644, 'grad_norm': 1.279767632484436, 'learning_rate': 1.0718366523846246e-05, 'epoch': 1.48} +2025-02-05 20:45:02 - ERROR - stderr - 49%|████▉ | 11056/22434 [10:37:22<8:02:08, 2.54s/it] +2025-02-05 20:45:05 - ERROR - stderr - 49%|████▉ | 11057/22434 [10:37:25<7:59:19, 2.53s/it] +2025-02-05 20:45:05 - ERROR - stderr - +2025-02-05 20:45:05 - ERROR - stderr - +2025-02-05 20:45:05 - INFO - stdout - {'loss': 0.7585, 'grad_norm': 1.2687493562698364, 'learning_rate': 1.0716926499789548e-05, 'epoch': 1.48} +2025-02-05 20:45:05 - ERROR - stderr - 49%|████▉ | 11057/22434 [10:37:25<7:59:19, 2.53s/it] +2025-02-05 20:45:07 - ERROR - stderr - 49%|████▉ | 11058/22434 [10:37:27<7:56:48, 2.51s/it] +2025-02-05 20:45:07 - ERROR - stderr - +2025-02-05 20:45:07 - ERROR - stderr - +2025-02-05 20:45:07 - INFO - stdout - {'loss': 0.7197, 'grad_norm': 1.1347460746765137, 'learning_rate': 1.0715486460789204e-05, 'epoch': 1.48} +2025-02-05 20:45:07 - ERROR - stderr - 49%|████▉ | 11058/22434 [10:37:27<7:56:48, 2.51s/it] +2025-02-05 20:45:10 - ERROR - stderr - 49%|████▉ | 11059/22434 [10:37:30<7:57:53, 2.52s/it] +2025-02-05 20:45:10 - ERROR - stderr - +2025-02-05 20:45:10 - ERROR - stderr - +2025-02-05 20:45:10 - INFO - stdout - {'loss': 0.7027, 'grad_norm': 1.2479099035263062, 'learning_rate': 1.0714046406875231e-05, 'epoch': 1.48} +2025-02-05 20:45:10 - ERROR - stderr - 49%|████▉ | 11059/22434 [10:37:30<7:57:53, 2.52s/it] +2025-02-05 20:45:13 - ERROR - stderr - 49%|████▉ | 11060/22434 [10:37:32<8:03:15, 2.55s/it] +2025-02-05 
20:45:13 - ERROR - stderr - +2025-02-05 20:45:13 - ERROR - stderr - +2025-02-05 20:45:13 - INFO - stdout - {'loss': 0.6997, 'grad_norm': 1.1573920249938965, 'learning_rate': 1.0712606338077642e-05, 'epoch': 1.48} +2025-02-05 20:45:13 - ERROR - stderr - 49%|████▉ | 11060/22434 [10:37:32<8:03:15, 2.55s/it] +2025-02-05 20:45:15 - ERROR - stderr - 49%|████▉ | 11061/22434 [10:37:35<8:10:36, 2.59s/it] +2025-02-05 20:45:15 - ERROR - stderr - +2025-02-05 20:45:15 - ERROR - stderr - +2025-02-05 20:45:15 - INFO - stdout - {'loss': 0.7436, 'grad_norm': 1.282656192779541, 'learning_rate': 1.0711166254426455e-05, 'epoch': 1.48} +2025-02-05 20:45:15 - ERROR - stderr - 49%|████▉ | 11061/22434 [10:37:35<8:10:36, 2.59s/it] +2025-02-05 20:45:18 - ERROR - stderr - 49%|████▉ | 11062/22434 [10:37:37<8:05:07, 2.56s/it] +2025-02-05 20:45:18 - ERROR - stderr - +2025-02-05 20:45:18 - ERROR - stderr - +2025-02-05 20:45:18 - INFO - stdout - {'loss': 0.7731, 'grad_norm': 1.3173472881317139, 'learning_rate': 1.0709726155951688e-05, 'epoch': 1.48} +2025-02-05 20:45:18 - ERROR - stderr - 49%|████▉ | 11062/22434 [10:37:37<8:05:07, 2.56s/it] +2025-02-05 20:45:20 - ERROR - stderr - 49%|████▉ | 11063/22434 [10:37:40<8:09:05, 2.58s/it] +2025-02-05 20:45:20 - ERROR - stderr - +2025-02-05 20:45:20 - ERROR - stderr - +2025-02-05 20:45:20 - INFO - stdout - {'loss': 0.7727, 'grad_norm': 1.4834703207015991, 'learning_rate': 1.070828604268336e-05, 'epoch': 1.48} +2025-02-05 20:45:20 - ERROR - stderr - 49%|████▉ | 11063/22434 [10:37:40<8:09:05, 2.58s/it] +2025-02-05 20:45:23 - ERROR - stderr - 49%|████▉ | 11064/22434 [10:37:43<8:02:58, 2.55s/it] +2025-02-05 20:45:23 - ERROR - stderr - +2025-02-05 20:45:23 - ERROR - stderr - +2025-02-05 20:45:23 - INFO - stdout - {'loss': 0.6783, 'grad_norm': 1.162908911705017, 'learning_rate': 1.0706845914651486e-05, 'epoch': 1.48} +2025-02-05 20:45:23 - ERROR - stderr - 49%|████▉ | 11064/22434 [10:37:43<8:02:58, 2.55s/it] +2025-02-05 20:45:25 - ERROR - stderr - 49%|████▉ | 
11065/22434 [10:37:45<8:00:28, 2.54s/it] +2025-02-05 20:45:25 - ERROR - stderr - +2025-02-05 20:45:25 - ERROR - stderr - +2025-02-05 20:45:25 - INFO - stdout - {'loss': 0.6952, 'grad_norm': 1.130872130393982, 'learning_rate': 1.0705405771886086e-05, 'epoch': 1.48} +2025-02-05 20:45:25 - ERROR - stderr - 49%|████▉ | 11065/22434 [10:37:45<8:00:28, 2.54s/it] +2025-02-05 20:45:28 - ERROR - stderr - 49%|████▉ | 11066/22434 [10:37:48<7:59:23, 2.53s/it] +2025-02-05 20:45:28 - ERROR - stderr - +2025-02-05 20:45:28 - ERROR - stderr - +2025-02-05 20:45:28 - INFO - stdout - {'loss': 0.6402, 'grad_norm': 1.2108170986175537, 'learning_rate': 1.0703965614417178e-05, 'epoch': 1.48} +2025-02-05 20:45:28 - ERROR - stderr - 49%|████▉ | 11066/22434 [10:37:48<7:59:23, 2.53s/it] +2025-02-05 20:45:30 - ERROR - stderr - 49%|████▉ | 11067/22434 [10:37:50<7:58:56, 2.53s/it] +2025-02-05 20:45:30 - ERROR - stderr - +2025-02-05 20:45:30 - ERROR - stderr - +2025-02-05 20:45:30 - INFO - stdout - {'loss': 0.7287, 'grad_norm': 1.2727104425430298, 'learning_rate': 1.0702525442274779e-05, 'epoch': 1.48} +2025-02-05 20:45:30 - ERROR - stderr - 49%|████▉ | 11067/22434 [10:37:50<7:58:56, 2.53s/it] +2025-02-05 20:45:33 - ERROR - stderr - 49%|████▉ | 11068/22434 [10:37:53<7:53:35, 2.50s/it] +2025-02-05 20:45:33 - ERROR - stderr - +2025-02-05 20:45:33 - ERROR - stderr - +2025-02-05 20:45:33 - INFO - stdout - {'loss': 0.6615, 'grad_norm': 1.2478718757629395, 'learning_rate': 1.070108525548891e-05, 'epoch': 1.48} +2025-02-05 20:45:33 - ERROR - stderr - 49%|████▉ | 11068/22434 [10:37:53<7:53:35, 2.50s/it] +2025-02-05 20:45:35 - ERROR - stderr - 49%|████▉ | 11069/22434 [10:37:55<7:53:52, 2.50s/it] +2025-02-05 20:45:35 - ERROR - stderr - +2025-02-05 20:45:35 - ERROR - stderr - +2025-02-05 20:45:35 - INFO - stdout - {'loss': 0.6968, 'grad_norm': 1.2840837240219116, 'learning_rate': 1.069964505408959e-05, 'epoch': 1.48} +2025-02-05 20:45:35 - ERROR - stderr - 49%|████▉ | 11069/22434 [10:37:55<7:53:52, 2.50s/it] 
+2025-02-05 20:45:38 - ERROR - stderr - 49%|████▉ | 11070/22434 [10:37:57<7:51:13, 2.49s/it] +2025-02-05 20:45:38 - ERROR - stderr - +2025-02-05 20:45:38 - ERROR - stderr - +2025-02-05 20:45:38 - INFO - stdout - {'loss': 0.6541, 'grad_norm': 1.133843183517456, 'learning_rate': 1.0698204838106837e-05, 'epoch': 1.48} +2025-02-05 20:45:38 - ERROR - stderr - 49%|████▉ | 11070/22434 [10:37:58<7:51:13, 2.49s/it] +2025-02-05 20:45:40 - ERROR - stderr - 49%|████▉ | 11071/22434 [10:38:00<7:50:44, 2.49s/it] +2025-02-05 20:45:40 - ERROR - stderr - +2025-02-05 20:45:40 - ERROR - stderr - +2025-02-05 20:45:40 - INFO - stdout - {'loss': 0.6669, 'grad_norm': 1.1912472248077393, 'learning_rate': 1.0696764607570676e-05, 'epoch': 1.48} +2025-02-05 20:45:40 - ERROR - stderr - 49%|████▉ | 11071/22434 [10:38:00<7:50:44, 2.49s/it] +2025-02-05 20:45:43 - ERROR - stderr - 49%|████▉ | 11072/22434 [10:38:02<7:52:21, 2.49s/it] +2025-02-05 20:45:43 - ERROR - stderr - +2025-02-05 20:45:43 - ERROR - stderr - +2025-02-05 20:45:43 - INFO - stdout - {'loss': 0.6579, 'grad_norm': 1.1748170852661133, 'learning_rate': 1.069532436251112e-05, 'epoch': 1.48} +2025-02-05 20:45:43 - ERROR - stderr - 49%|████▉ | 11072/22434 [10:38:03<7:52:21, 2.49s/it] +2025-02-05 20:45:45 - ERROR - stderr - 49%|████▉ | 11073/22434 [10:38:05<7:49:35, 2.48s/it] +2025-02-05 20:45:45 - ERROR - stderr - +2025-02-05 20:45:45 - ERROR - stderr - +2025-02-05 20:45:45 - INFO - stdout - {'loss': 0.7132, 'grad_norm': 1.2491395473480225, 'learning_rate': 1.0693884102958194e-05, 'epoch': 1.48} +2025-02-05 20:45:45 - ERROR - stderr - 49%|████▉ | 11073/22434 [10:38:05<7:49:35, 2.48s/it] +2025-02-05 20:45:48 - ERROR - stderr - 49%|████▉ | 11074/22434 [10:38:07<7:50:36, 2.49s/it] +2025-02-05 20:45:48 - ERROR - stderr - +2025-02-05 20:45:48 - ERROR - stderr - +2025-02-05 20:45:48 - INFO - stdout - {'loss': 0.6494, 'grad_norm': 1.0071097612380981, 'learning_rate': 1.0692443828941918e-05, 'epoch': 1.48} +2025-02-05 20:45:48 - ERROR - stderr - 
49%|████▉ | 11074/22434 [10:38:07<7:50:36, 2.49s/it] +2025-02-05 20:45:50 - ERROR - stderr - 49%|████▉ | 11075/22434 [10:38:10<7:51:11, 2.49s/it] +2025-02-05 20:45:50 - ERROR - stderr - +2025-02-05 20:45:50 - ERROR - stderr - +2025-02-05 20:45:50 - INFO - stdout - {'loss': 0.6607, 'grad_norm': 1.262352705001831, 'learning_rate': 1.0691003540492313e-05, 'epoch': 1.48} +2025-02-05 20:45:50 - ERROR - stderr - 49%|████▉ | 11075/22434 [10:38:10<7:51:11, 2.49s/it] +2025-02-05 20:45:53 - ERROR - stderr - 49%|████▉ | 11076/22434 [10:38:12<7:47:26, 2.47s/it] +2025-02-05 20:45:53 - ERROR - stderr - +2025-02-05 20:45:53 - ERROR - stderr - +2025-02-05 20:45:53 - INFO - stdout - {'loss': 0.7732, 'grad_norm': 1.3789443969726562, 'learning_rate': 1.06895632376394e-05, 'epoch': 1.48} +2025-02-05 20:45:53 - ERROR - stderr - 49%|████▉ | 11076/22434 [10:38:12<7:47:26, 2.47s/it] +2025-02-05 20:45:55 - ERROR - stderr - 49%|████▉ | 11077/22434 [10:38:15<7:48:21, 2.47s/it] +2025-02-05 20:45:55 - ERROR - stderr - +2025-02-05 20:45:55 - ERROR - stderr - +2025-02-05 20:45:55 - INFO - stdout - {'loss': 0.7569, 'grad_norm': 1.296425461769104, 'learning_rate': 1.0688122920413202e-05, 'epoch': 1.48} +2025-02-05 20:45:55 - ERROR - stderr - 49%|████▉ | 11077/22434 [10:38:15<7:48:21, 2.47s/it] +2025-02-05 20:45:58 - ERROR - stderr - 49%|████▉ | 11078/22434 [10:38:18<8:10:03, 2.59s/it] +2025-02-05 20:45:58 - ERROR - stderr - +2025-02-05 20:45:58 - ERROR - stderr - +2025-02-05 20:45:58 - INFO - stdout - {'loss': 0.6727, 'grad_norm': 1.1450682878494263, 'learning_rate': 1.0686682588843737e-05, 'epoch': 1.48} +2025-02-05 20:45:58 - ERROR - stderr - 49%|████▉ | 11078/22434 [10:38:18<8:10:03, 2.59s/it] +2025-02-05 20:46:00 - ERROR - stderr - 49%|████▉ | 11079/22434 [10:38:20<8:07:07, 2.57s/it] +2025-02-05 20:46:01 - ERROR - stderr - +2025-02-05 20:46:01 - ERROR - stderr - +2025-02-05 20:46:01 - INFO - stdout - {'loss': 0.6466, 'grad_norm': 1.153116226196289, 'learning_rate': 1.0685242242961035e-05, 
'epoch': 1.48} +2025-02-05 20:46:01 - ERROR - stderr - 49%|████▉ | 11079/22434 [10:38:20<8:07:07, 2.57s/it] +2025-02-05 20:46:03 - ERROR - stderr - 49%|████▉ | 11080/22434 [10:38:23<8:07:33, 2.58s/it] +2025-02-05 20:46:03 - ERROR - stderr - +2025-02-05 20:46:03 - ERROR - stderr - +2025-02-05 20:46:03 - INFO - stdout - {'loss': 0.6658, 'grad_norm': 1.1240274906158447, 'learning_rate': 1.0683801882795112e-05, 'epoch': 1.48} +2025-02-05 20:46:03 - ERROR - stderr - 49%|████▉ | 11080/22434 [10:38:23<8:07:33, 2.58s/it] +2025-02-05 20:46:06 - ERROR - stderr - 49%|████▉ | 11081/22434 [10:38:25<8:08:12, 2.58s/it] +2025-02-05 20:46:06 - ERROR - stderr - +2025-02-05 20:46:06 - ERROR - stderr - +2025-02-05 20:46:06 - INFO - stdout - {'loss': 0.598, 'grad_norm': 1.1764883995056152, 'learning_rate': 1.0682361508375993e-05, 'epoch': 1.48} +2025-02-05 20:46:06 - ERROR - stderr - 49%|████▉ | 11081/22434 [10:38:25<8:08:12, 2.58s/it] +2025-02-05 20:46:08 - ERROR - stderr - 49%|████▉ | 11082/22434 [10:38:28<8:06:18, 2.57s/it] +2025-02-05 20:46:08 - ERROR - stderr - +2025-02-05 20:46:08 - ERROR - stderr - +2025-02-05 20:46:08 - INFO - stdout - {'loss': 0.7176, 'grad_norm': 1.2112371921539307, 'learning_rate': 1.06809211197337e-05, 'epoch': 1.48} +2025-02-05 20:46:08 - ERROR - stderr - 49%|████▉ | 11082/22434 [10:38:28<8:06:18, 2.57s/it] +2025-02-05 20:46:11 - ERROR - stderr - 49%|████▉ | 11083/22434 [10:38:31<8:08:31, 2.58s/it] +2025-02-05 20:46:11 - ERROR - stderr - +2025-02-05 20:46:11 - ERROR - stderr - +2025-02-05 20:46:11 - INFO - stdout - {'loss': 0.7287, 'grad_norm': 1.1283106803894043, 'learning_rate': 1.0679480716898263e-05, 'epoch': 1.48} +2025-02-05 20:46:11 - ERROR - stderr - 49%|████▉ | 11083/22434 [10:38:31<8:08:31, 2.58s/it] +2025-02-05 20:46:13 - ERROR - stderr - 49%|████▉ | 11084/22434 [10:38:33<8:04:00, 2.56s/it] +2025-02-05 20:46:13 - ERROR - stderr - +2025-02-05 20:46:13 - ERROR - stderr - +2025-02-05 20:46:13 - INFO - stdout - {'loss': 0.6087, 'grad_norm': 
1.1033836603164673, 'learning_rate': 1.0678040299899697e-05, 'epoch': 1.48} +2025-02-05 20:46:13 - ERROR - stderr - 49%|████▉ | 11084/22434 [10:38:33<8:04:00, 2.56s/it] +2025-02-05 20:46:16 - ERROR - stderr - 49%|████▉ | 11085/22434 [10:38:36<8:03:18, 2.56s/it] +2025-02-05 20:46:16 - ERROR - stderr - +2025-02-05 20:46:16 - ERROR - stderr - +2025-02-05 20:46:16 - INFO - stdout - {'loss': 0.6954, 'grad_norm': 1.2348387241363525, 'learning_rate': 1.0676599868768029e-05, 'epoch': 1.48} +2025-02-05 20:46:16 - ERROR - stderr - 49%|████▉ | 11085/22434 [10:38:36<8:03:18, 2.56s/it] +2025-02-05 20:46:18 - ERROR - stderr - 49%|████▉ | 11086/22434 [10:38:38<8:03:13, 2.55s/it] +2025-02-05 20:46:18 - ERROR - stderr - +2025-02-05 20:46:18 - ERROR - stderr - +2025-02-05 20:46:18 - INFO - stdout - {'loss': 0.7521, 'grad_norm': 1.2662067413330078, 'learning_rate': 1.0675159423533286e-05, 'epoch': 1.48} +2025-02-05 20:46:18 - ERROR - stderr - 49%|████▉ | 11086/22434 [10:38:38<8:03:13, 2.55s/it] +2025-02-05 20:46:21 - ERROR - stderr - 49%|████▉ | 11087/22434 [10:38:41<8:03:25, 2.56s/it] +2025-02-05 20:46:21 - ERROR - stderr - +2025-02-05 20:46:21 - ERROR - stderr - +2025-02-05 20:46:21 - INFO - stdout - {'loss': 0.635, 'grad_norm': 1.059889793395996, 'learning_rate': 1.0673718964225488e-05, 'epoch': 1.48} +2025-02-05 20:46:21 - ERROR - stderr - 49%|████▉ | 11087/22434 [10:38:41<8:03:25, 2.56s/it] +2025-02-05 20:46:23 - ERROR - stderr - 49%|████▉ | 11088/22434 [10:38:43<8:02:20, 2.55s/it] +2025-02-05 20:46:24 - ERROR - stderr - +2025-02-05 20:46:24 - ERROR - stderr - +2025-02-05 20:46:24 - INFO - stdout - {'loss': 0.7012, 'grad_norm': 1.2185277938842773, 'learning_rate': 1.0672278490874666e-05, 'epoch': 1.48} +2025-02-05 20:46:24 - ERROR - stderr - 49%|████▉ | 11088/22434 [10:38:43<8:02:20, 2.55s/it] +2025-02-05 20:46:26 - ERROR - stderr - 49%|████▉ | 11089/22434 [10:38:46<8:01:57, 2.55s/it] +2025-02-05 20:46:26 - ERROR - stderr - +2025-02-05 20:46:26 - ERROR - stderr - +2025-02-05 
20:46:26 - INFO - stdout - {'loss': 0.7051, 'grad_norm': 1.201928973197937, 'learning_rate': 1.067083800351084e-05, 'epoch': 1.48} +2025-02-05 20:46:26 - ERROR - stderr - 49%|████▉ | 11089/22434 [10:38:46<8:01:57, 2.55s/it] +2025-02-05 20:46:29 - ERROR - stderr - 49%|████▉ | 11090/22434 [10:38:48<7:59:10, 2.53s/it] +2025-02-05 20:46:29 - ERROR - stderr - +2025-02-05 20:46:29 - ERROR - stderr - +2025-02-05 20:46:29 - INFO - stdout - {'loss': 0.72, 'grad_norm': 1.1891995668411255, 'learning_rate': 1.0669397502164038e-05, 'epoch': 1.48} +2025-02-05 20:46:29 - ERROR - stderr - 49%|████▉ | 11090/22434 [10:38:48<7:59:10, 2.53s/it] +2025-02-05 20:46:31 - ERROR - stderr - 49%|████▉ | 11091/22434 [10:38:51<7:55:35, 2.52s/it] +2025-02-05 20:46:31 - ERROR - stderr - +2025-02-05 20:46:31 - ERROR - stderr - +2025-02-05 20:46:31 - INFO - stdout - {'loss': 0.5798, 'grad_norm': 1.189477801322937, 'learning_rate': 1.066795698686429e-05, 'epoch': 1.48} +2025-02-05 20:46:31 - ERROR - stderr - 49%|████▉ | 11091/22434 [10:38:51<7:55:35, 2.52s/it] +2025-02-05 20:46:34 - ERROR - stderr - 49%|████▉ | 11092/22434 [10:38:53<7:57:13, 2.52s/it] +2025-02-05 20:46:34 - ERROR - stderr - +2025-02-05 20:46:34 - ERROR - stderr - +2025-02-05 20:46:34 - INFO - stdout - {'loss': 0.7259, 'grad_norm': 1.256593108177185, 'learning_rate': 1.0666516457641614e-05, 'epoch': 1.48} +2025-02-05 20:46:34 - ERROR - stderr - 49%|████▉ | 11092/22434 [10:38:53<7:57:13, 2.52s/it] +2025-02-05 20:46:36 - ERROR - stderr - 49%|████▉ | 11093/22434 [10:38:56<7:55:01, 2.51s/it] +2025-02-05 20:46:36 - ERROR - stderr - +2025-02-05 20:46:36 - ERROR - stderr - +2025-02-05 20:46:36 - INFO - stdout - {'loss': 0.7675, 'grad_norm': 1.3148462772369385, 'learning_rate': 1.0665075914526039e-05, 'epoch': 1.48} +2025-02-05 20:46:36 - ERROR - stderr - 49%|████▉ | 11093/22434 [10:38:56<7:55:01, 2.51s/it] +2025-02-05 20:46:39 - ERROR - stderr - 49%|████▉ | 11094/22434 [10:38:58<7:58:34, 2.53s/it] +2025-02-05 20:46:39 - ERROR - stderr - 
+2025-02-05 20:46:39 - ERROR - stderr - +2025-02-05 20:46:39 - INFO - stdout - {'loss': 0.7015, 'grad_norm': 1.2321245670318604, 'learning_rate': 1.0663635357547593e-05, 'epoch': 1.48} +2025-02-05 20:46:39 - ERROR - stderr - 49%|████▉ | 11094/22434 [10:38:58<7:58:34, 2.53s/it] +2025-02-05 20:46:41 - ERROR - stderr - 49%|████▉ | 11095/22434 [10:39:01<8:01:23, 2.55s/it] +2025-02-05 20:46:41 - ERROR - stderr - +2025-02-05 20:46:41 - ERROR - stderr - +2025-02-05 20:46:41 - INFO - stdout - {'loss': 0.7716, 'grad_norm': 1.2060399055480957, 'learning_rate': 1.0662194786736307e-05, 'epoch': 1.48} +2025-02-05 20:46:41 - ERROR - stderr - 49%|████▉ | 11095/22434 [10:39:01<8:01:23, 2.55s/it] +2025-02-05 20:46:44 - ERROR - stderr - 49%|████▉ | 11096/22434 [10:39:04<8:01:40, 2.55s/it] +2025-02-05 20:46:44 - ERROR - stderr - +2025-02-05 20:46:44 - ERROR - stderr - +2025-02-05 20:46:44 - INFO - stdout - {'loss': 0.6772, 'grad_norm': 1.2887561321258545, 'learning_rate': 1.0660754202122199e-05, 'epoch': 1.48} +2025-02-05 20:46:44 - ERROR - stderr - 49%|████▉ | 11096/22434 [10:39:04<8:01:40, 2.55s/it] +2025-02-05 20:46:46 - ERROR - stderr - 49%|████▉ | 11097/22434 [10:39:06<8:04:33, 2.56s/it] +2025-02-05 20:46:46 - ERROR - stderr - +2025-02-05 20:46:46 - ERROR - stderr - +2025-02-05 20:46:46 - INFO - stdout - {'loss': 0.7529, 'grad_norm': 1.1921271085739136, 'learning_rate': 1.0659313603735307e-05, 'epoch': 1.48} +2025-02-05 20:46:46 - ERROR - stderr - 49%|████▉ | 11097/22434 [10:39:06<8:04:33, 2.56s/it] +2025-02-05 20:46:49 - ERROR - stderr - 49%|████▉ | 11098/22434 [10:39:09<8:00:58, 2.55s/it] +2025-02-05 20:46:49 - ERROR - stderr - +2025-02-05 20:46:49 - ERROR - stderr - +2025-02-05 20:46:49 - INFO - stdout - {'loss': 0.7033, 'grad_norm': 1.180107593536377, 'learning_rate': 1.0657872991605649e-05, 'epoch': 1.48} +2025-02-05 20:46:49 - ERROR - stderr - 49%|████▉ | 11098/22434 [10:39:09<8:00:58, 2.55s/it] +2025-02-05 20:46:51 - ERROR - stderr - 49%|████▉ | 11099/22434 
[10:39:11<8:04:34, 2.57s/it] +2025-02-05 20:46:52 - ERROR - stderr - +2025-02-05 20:46:52 - ERROR - stderr - +2025-02-05 20:46:52 - INFO - stdout - {'loss': 0.6462, 'grad_norm': 1.1730291843414307, 'learning_rate': 1.0656432365763263e-05, 'epoch': 1.48} +2025-02-05 20:46:52 - ERROR - stderr - 49%|████▉ | 11099/22434 [10:39:11<8:04:34, 2.57s/it] +2025-02-05 20:46:54 - ERROR - stderr - 49%|████▉ | 11100/22434 [10:39:14<8:07:42, 2.58s/it] +2025-02-05 20:46:54 - ERROR - stderr - +2025-02-05 20:46:54 - ERROR - stderr - +2025-02-05 20:46:54 - INFO - stdout - {'loss': 0.7754, 'grad_norm': 1.2790377140045166, 'learning_rate': 1.0654991726238166e-05, 'epoch': 1.48} +2025-02-05 20:46:54 - ERROR - stderr - 49%|████▉ | 11100/22434 [10:39:14<8:07:42, 2.58s/it] +2025-02-05 20:46:57 - ERROR - stderr - 49%|████▉ | 11101/22434 [10:39:16<8:08:10, 2.58s/it] +2025-02-05 20:46:57 - ERROR - stderr - +2025-02-05 20:46:57 - ERROR - stderr - +2025-02-05 20:46:57 - INFO - stdout - {'loss': 0.7825, 'grad_norm': 1.2423210144042969, 'learning_rate': 1.0653551073060397e-05, 'epoch': 1.48} +2025-02-05 20:46:57 - ERROR - stderr - 49%|████▉ | 11101/22434 [10:39:16<8:08:10, 2.58s/it] +2025-02-05 20:46:59 - ERROR - stderr - 49%|████▉ | 11102/22434 [10:39:19<8:04:35, 2.57s/it] +2025-02-05 20:46:59 - ERROR - stderr - +2025-02-05 20:46:59 - ERROR - stderr - +2025-02-05 20:46:59 - INFO - stdout - {'loss': 0.8086, 'grad_norm': 1.3042274713516235, 'learning_rate': 1.0652110406259981e-05, 'epoch': 1.48} +2025-02-05 20:46:59 - ERROR - stderr - 49%|████▉ | 11102/22434 [10:39:19<8:04:35, 2.57s/it] +2025-02-05 20:47:02 - ERROR - stderr - 49%|████▉ | 11103/22434 [10:39:22<8:02:39, 2.56s/it] +2025-02-05 20:47:02 - ERROR - stderr - +2025-02-05 20:47:02 - ERROR - stderr - +2025-02-05 20:47:02 - INFO - stdout - {'loss': 0.8353, 'grad_norm': 1.2844929695129395, 'learning_rate': 1.065066972586695e-05, 'epoch': 1.48} +2025-02-05 20:47:02 - ERROR - stderr - 49%|████▉ | 11103/22434 [10:39:22<8:02:39, 2.56s/it] 
+2025-02-05 20:47:04 - ERROR - stderr - 49%|████▉ | 11104/22434 [10:39:24<8:03:58, 2.56s/it] +2025-02-05 20:47:04 - ERROR - stderr - +2025-02-05 20:47:04 - ERROR - stderr - +2025-02-05 20:47:04 - INFO - stdout - {'loss': 0.6514, 'grad_norm': 1.0808616876602173, 'learning_rate': 1.064922903191133e-05, 'epoch': 1.48} +2025-02-05 20:47:04 - ERROR - stderr - 49%|████▉ | 11104/22434 [10:39:24<8:03:58, 2.56s/it] +2025-02-05 20:47:07 - ERROR - stderr - 50%|████▉ | 11105/22434 [10:39:27<8:02:16, 2.55s/it] +2025-02-05 20:47:07 - ERROR - stderr - +2025-02-05 20:47:07 - ERROR - stderr - +2025-02-05 20:47:07 - INFO - stdout - {'loss': 0.7744, 'grad_norm': 1.4469674825668335, 'learning_rate': 1.0647788324423152e-05, 'epoch': 1.49} +2025-02-05 20:47:07 - ERROR - stderr - 50%|████▉ | 11105/22434 [10:39:27<8:02:16, 2.55s/it] +2025-02-05 20:47:09 - ERROR - stderr - 50%|████▉ | 11106/22434 [10:39:29<7:59:49, 2.54s/it] +2025-02-05 20:47:09 - ERROR - stderr - +2025-02-05 20:47:09 - ERROR - stderr - +2025-02-05 20:47:09 - INFO - stdout - {'loss': 0.7011, 'grad_norm': 1.1879732608795166, 'learning_rate': 1.0646347603432443e-05, 'epoch': 1.49} +2025-02-05 20:47:09 - ERROR - stderr - 50%|████▉ | 11106/22434 [10:39:29<7:59:49, 2.54s/it] +2025-02-05 20:47:12 - ERROR - stderr - 50%|████▉ | 11107/22434 [10:39:32<8:00:32, 2.55s/it] +2025-02-05 20:47:12 - ERROR - stderr - +2025-02-05 20:47:12 - ERROR - stderr - +2025-02-05 20:47:12 - INFO - stdout - {'loss': 0.6848, 'grad_norm': 1.2401130199432373, 'learning_rate': 1.064490686896924e-05, 'epoch': 1.49} +2025-02-05 20:47:12 - ERROR - stderr - 50%|████▉ | 11107/22434 [10:39:32<8:00:32, 2.55s/it] +2025-02-05 20:47:15 - ERROR - stderr - 50%|████▉ | 11108/22434 [10:39:34<8:05:03, 2.57s/it] +2025-02-05 20:47:15 - ERROR - stderr - +2025-02-05 20:47:15 - ERROR - stderr - +2025-02-05 20:47:15 - INFO - stdout - {'loss': 0.6927, 'grad_norm': 1.1934558153152466, 'learning_rate': 1.064346612106357e-05, 'epoch': 1.49} +2025-02-05 20:47:15 - ERROR - stderr - 
50%|████▉ | 11108/22434 [10:39:34<8:05:03, 2.57s/it] +2025-02-05 20:47:17 - ERROR - stderr - 50%|████▉ | 11109/22434 [10:39:37<8:16:49, 2.63s/it] +2025-02-05 20:47:17 - ERROR - stderr - +2025-02-05 20:47:17 - ERROR - stderr - +2025-02-05 20:47:17 - INFO - stdout - {'loss': 0.7093, 'grad_norm': 1.217354416847229, 'learning_rate': 1.0642025359745463e-05, 'epoch': 1.49} +2025-02-05 20:47:17 - ERROR - stderr - 50%|████▉ | 11109/22434 [10:39:37<8:16:49, 2.63s/it] +2025-02-05 20:47:20 - ERROR - stderr - 50%|████▉ | 11110/22434 [10:39:40<8:08:46, 2.59s/it] +2025-02-05 20:47:20 - ERROR - stderr - +2025-02-05 20:47:20 - ERROR - stderr - +2025-02-05 20:47:20 - INFO - stdout - {'loss': 0.7072, 'grad_norm': 1.1650991439819336, 'learning_rate': 1.0640584585044953e-05, 'epoch': 1.49} +2025-02-05 20:47:20 - ERROR - stderr - 50%|████▉ | 11110/22434 [10:39:40<8:08:46, 2.59s/it] +2025-02-05 20:47:22 - ERROR - stderr - 50%|████▉ | 11111/22434 [10:39:42<8:07:46, 2.58s/it] +2025-02-05 20:47:22 - ERROR - stderr - +2025-02-05 20:47:22 - ERROR - stderr - +2025-02-05 20:47:22 - INFO - stdout - {'loss': 0.585, 'grad_norm': 1.1619435548782349, 'learning_rate': 1.0639143796992072e-05, 'epoch': 1.49} +2025-02-05 20:47:22 - ERROR - stderr - 50%|████▉ | 11111/22434 [10:39:42<8:07:46, 2.58s/it] +2025-02-05 20:47:25 - ERROR - stderr - 50%|████▉ | 11112/22434 [10:39:45<8:11:35, 2.61s/it] +2025-02-05 20:47:25 - ERROR - stderr - +2025-02-05 20:47:25 - ERROR - stderr - +2025-02-05 20:47:25 - INFO - stdout - {'loss': 0.7588, 'grad_norm': 1.2391313314437866, 'learning_rate': 1.0637702995616848e-05, 'epoch': 1.49} +2025-02-05 20:47:25 - ERROR - stderr - 50%|████▉ | 11112/22434 [10:39:45<8:11:35, 2.61s/it] +2025-02-05 20:47:28 - ERROR - stderr - 50%|████▉ | 11113/22434 [10:39:47<8:08:18, 2.59s/it] +2025-02-05 20:47:28 - ERROR - stderr - +2025-02-05 20:47:28 - ERROR - stderr - +2025-02-05 20:47:28 - INFO - stdout - {'loss': 0.7341, 'grad_norm': 1.2317885160446167, 'learning_rate': 1.0636262180949312e-05, 
'epoch': 1.49} +2025-02-05 20:47:28 - ERROR - stderr - 50%|████▉ | 11113/22434 [10:39:47<8:08:18, 2.59s/it] +2025-02-05 20:47:30 - ERROR - stderr - 50%|████▉ | 11114/22434 [10:39:50<8:06:10, 2.58s/it] +2025-02-05 20:47:30 - ERROR - stderr - +2025-02-05 20:47:30 - ERROR - stderr - +2025-02-05 20:47:30 - INFO - stdout - {'loss': 0.711, 'grad_norm': 1.1130679845809937, 'learning_rate': 1.0634821353019505e-05, 'epoch': 1.49} +2025-02-05 20:47:30 - ERROR - stderr - 50%|████▉ | 11114/22434 [10:39:50<8:06:10, 2.58s/it] +2025-02-05 20:47:33 - ERROR - stderr - 50%|████▉ | 11115/22434 [10:39:53<8:10:35, 2.60s/it] +2025-02-05 20:47:33 - ERROR - stderr - +2025-02-05 20:47:33 - ERROR - stderr - +2025-02-05 20:47:33 - INFO - stdout - {'loss': 0.7604, 'grad_norm': 1.1894334554672241, 'learning_rate': 1.0633380511857454e-05, 'epoch': 1.49} +2025-02-05 20:47:33 - ERROR - stderr - 50%|████▉ | 11115/22434 [10:39:53<8:10:35, 2.60s/it] +2025-02-05 20:47:35 - ERROR - stderr - 50%|████▉ | 11116/22434 [10:39:55<8:04:56, 2.57s/it] +2025-02-05 20:47:35 - ERROR - stderr - +2025-02-05 20:47:35 - ERROR - stderr - +2025-02-05 20:47:35 - INFO - stdout - {'loss': 0.7775, 'grad_norm': 1.2044531106948853, 'learning_rate': 1.0631939657493188e-05, 'epoch': 1.49} +2025-02-05 20:47:35 - ERROR - stderr - 50%|████▉ | 11116/22434 [10:39:55<8:04:56, 2.57s/it] +2025-02-05 20:47:38 - ERROR - stderr - 50%|████▉ | 11117/22434 [10:39:58<8:02:21, 2.56s/it] +2025-02-05 20:47:38 - ERROR - stderr - +2025-02-05 20:47:38 - ERROR - stderr - +2025-02-05 20:47:38 - INFO - stdout - {'loss': 0.6572, 'grad_norm': 1.188333511352539, 'learning_rate': 1.0630498789956749e-05, 'epoch': 1.49} +2025-02-05 20:47:38 - ERROR - stderr - 50%|████▉ | 11117/22434 [10:39:58<8:02:21, 2.56s/it] +2025-02-05 20:47:41 - ERROR - stderr - 50%|████▉ | 11118/22434 [10:40:01<8:56:41, 2.85s/it] +2025-02-05 20:47:41 - ERROR - stderr - +2025-02-05 20:47:41 - ERROR - stderr - +2025-02-05 20:47:41 - INFO - stdout - {'loss': 0.7848, 'grad_norm': 
1.267372727394104, 'learning_rate': 1.0629057909278165e-05, 'epoch': 1.49} +2025-02-05 20:47:41 - ERROR - stderr - 50%|████▉ | 11118/22434 [10:40:01<8:56:41, 2.85s/it] +2025-02-05 20:48:06 - ERROR - stderr - 50%|████▉ | 11119/22434 [10:40:26<29:41:06, 9.44s/it] +2025-02-05 20:48:06 - ERROR - stderr - +2025-02-05 20:48:06 - ERROR - stderr - +2025-02-05 20:48:06 - INFO - stdout - {'loss': 0.6821, 'grad_norm': 1.0872628688812256, 'learning_rate': 1.0627617015487468e-05, 'epoch': 1.49} +2025-02-05 20:48:06 - ERROR - stderr - 50%|████▉ | 11119/22434 [10:40:26<29:41:06, 9.44s/it] +2025-02-05 20:48:33 - ERROR - stderr - 50%|████▉ | 11120/22434 [10:40:52<45:46:11, 14.56s/it] +2025-02-05 20:48:33 - ERROR - stderr - +2025-02-05 20:48:33 - ERROR - stderr - +2025-02-05 20:48:33 - INFO - stdout - {'loss': 0.7661, 'grad_norm': 1.24944007396698, 'learning_rate': 1.0626176108614699e-05, 'epoch': 1.49} +2025-02-05 20:48:33 - ERROR - stderr - 50%|████▉ | 11120/22434 [10:40:53<45:46:11, 14.56s/it] +2025-02-05 20:48:43 - ERROR - stderr - 50%|████▉ | 11121/22434 [10:41:03<41:50:29, 13.31s/it] +2025-02-05 20:48:43 - ERROR - stderr - +2025-02-05 20:48:43 - ERROR - stderr - +2025-02-05 20:48:43 - INFO - stdout - {'loss': 0.6711, 'grad_norm': 1.4129022359848022, 'learning_rate': 1.0624735188689885e-05, 'epoch': 1.49} +2025-02-05 20:48:43 - ERROR - stderr - 50%|████▉ | 11121/22434 [10:41:03<41:50:29, 13.31s/it] +2025-02-05 20:48:46 - ERROR - stderr - 50%|████▉ | 11122/22434 [10:41:05<31:40:14, 10.08s/it] +2025-02-05 20:48:46 - ERROR - stderr - +2025-02-05 20:48:46 - ERROR - stderr - +2025-02-05 20:48:46 - INFO - stdout - {'loss': 0.6772, 'grad_norm': 1.1602057218551636, 'learning_rate': 1.0623294255743064e-05, 'epoch': 1.49} +2025-02-05 20:48:46 - ERROR - stderr - 50%|████▉ | 11122/22434 [10:41:05<31:40:14, 10.08s/it] +2025-02-05 20:48:48 - ERROR - stderr - 50%|████▉ | 11123/22434 [10:41:08<24:34:21, 7.82s/it] +2025-02-05 20:48:48 - ERROR - stderr - +2025-02-05 20:48:48 - ERROR - stderr - 
+2025-02-05 20:48:48 - INFO - stdout - {'loss': 0.7407, 'grad_norm': 1.2842772006988525, 'learning_rate': 1.0621853309804275e-05, 'epoch': 1.49} +2025-02-05 20:48:48 - ERROR - stderr - 50%|████▉ | 11123/22434 [10:41:08<24:34:21, 7.82s/it] +2025-02-05 20:48:51 - ERROR - stderr - 50%|████▉ | 11124/22434 [10:41:10<19:34:22, 6.23s/it] +2025-02-05 20:48:51 - ERROR - stderr - +2025-02-05 20:48:51 - ERROR - stderr - +2025-02-05 20:48:51 - INFO - stdout - {'loss': 0.7811, 'grad_norm': 1.3192344903945923, 'learning_rate': 1.0620412350903545e-05, 'epoch': 1.49} +2025-02-05 20:48:51 - ERROR - stderr - 50%|████▉ | 11124/22434 [10:41:11<19:34:22, 6.23s/it] +2025-02-05 20:48:55 - ERROR - stderr - 50%|████▉ | 11125/22434 [10:41:15<17:53:08, 5.69s/it] +2025-02-05 20:48:55 - ERROR - stderr - +2025-02-05 20:48:55 - ERROR - stderr - +2025-02-05 20:48:55 - INFO - stdout - {'loss': 0.6615, 'grad_norm': 1.1869572401046753, 'learning_rate': 1.0618971379070912e-05, 'epoch': 1.49} +2025-02-05 20:48:55 - ERROR - stderr - 50%|████▉ | 11125/22434 [10:41:15<17:53:08, 5.69s/it] +2025-02-05 20:49:21 - ERROR - stderr - 50%|████▉ | 11126/22434 [10:41:41<37:09:23, 11.83s/it] +2025-02-05 20:49:21 - ERROR - stderr - +2025-02-05 20:49:21 - ERROR - stderr - +2025-02-05 20:49:21 - INFO - stdout - {'loss': 0.6601, 'grad_norm': 1.1692684888839722, 'learning_rate': 1.0617530394336412e-05, 'epoch': 1.49} +2025-02-05 20:49:21 - ERROR - stderr - 50%|████▉ | 11126/22434 [10:41:41<37:09:23, 11.83s/it] +2025-02-05 20:49:24 - ERROR - stderr - 50%|████▉ | 11127/22434 [10:41:44<28:24:32, 9.05s/it] +2025-02-05 20:49:24 - ERROR - stderr - +2025-02-05 20:49:24 - ERROR - stderr - +2025-02-05 20:49:24 - INFO - stdout - {'loss': 0.7036, 'grad_norm': 1.2383116483688354, 'learning_rate': 1.0616089396730086e-05, 'epoch': 1.49} +2025-02-05 20:49:24 - ERROR - stderr - 50%|████▉ | 11127/22434 [10:41:44<28:24:32, 9.05s/it] +2025-02-05 20:49:28 - ERROR - stderr - 50%|████▉ | 11128/22434 [10:41:48<23:44:19, 7.56s/it] +2025-02-05 
20:49:28 - ERROR - stderr - +2025-02-05 20:49:28 - ERROR - stderr - +2025-02-05 20:49:28 - INFO - stdout - {'loss': 0.7453, 'grad_norm': 1.1192725896835327, 'learning_rate': 1.0614648386281967e-05, 'epoch': 1.49} +2025-02-05 20:49:28 - ERROR - stderr - 50%|████▉ | 11128/22434 [10:41:48<23:44:19, 7.56s/it] +2025-02-05 20:49:32 - ERROR - stderr - 50%|████▉ | 11129/22434 [10:41:52<20:26:42, 6.51s/it] +2025-02-05 20:49:32 - ERROR - stderr - +2025-02-05 20:49:32 - ERROR - stderr - +2025-02-05 20:49:32 - INFO - stdout - {'loss': 0.6989, 'grad_norm': 1.1129965782165527, 'learning_rate': 1.0613207363022086e-05, 'epoch': 1.49} +2025-02-05 20:49:32 - ERROR - stderr - 50%|████▉ | 11129/22434 [10:41:52<20:26:42, 6.51s/it] +2025-02-05 20:49:57 - ERROR - stderr - 50%|████▉ | 11130/22434 [10:42:17<37:56:17, 12.08s/it] +2025-02-05 20:49:57 - ERROR - stderr - +2025-02-05 20:49:57 - ERROR - stderr - +2025-02-05 20:49:57 - INFO - stdout - {'loss': 0.74, 'grad_norm': 1.1282628774642944, 'learning_rate': 1.0611766326980489e-05, 'epoch': 1.49} +2025-02-05 20:49:57 - ERROR - stderr - 50%|████▉ | 11130/22434 [10:42:17<37:56:17, 12.08s/it] +2025-02-05 20:50:02 - ERROR - stderr - 50%|████▉ | 11131/22434 [10:42:21<30:54:10, 9.84s/it] +2025-02-05 20:50:02 - ERROR - stderr - +2025-02-05 20:50:02 - ERROR - stderr - +2025-02-05 20:50:02 - INFO - stdout - {'loss': 0.6493, 'grad_norm': 1.06178617477417, 'learning_rate': 1.0610325278187203e-05, 'epoch': 1.49} +2025-02-05 20:50:02 - ERROR - stderr - 50%|████▉ | 11131/22434 [10:42:21<30:54:10, 9.84s/it] +2025-02-05 20:50:26 - ERROR - stderr - 50%|████▉ | 11132/22434 [10:42:45<44:14:57, 14.09s/it] +2025-02-05 20:50:26 - ERROR - stderr - +2025-02-05 20:50:26 - ERROR - stderr - +2025-02-05 20:50:26 - INFO - stdout - {'loss': 0.6972, 'grad_norm': 1.2385424375534058, 'learning_rate': 1.0608884216672275e-05, 'epoch': 1.49} +2025-02-05 20:50:26 - ERROR - stderr - 50%|████▉ | 11132/22434 [10:42:46<44:14:57, 14.09s/it] +2025-02-05 20:50:45 - ERROR - stderr - 
50%|████▉ | 11133/22434 [10:43:05<49:27:15, 15.75s/it] +2025-02-05 20:50:45 - ERROR - stderr - +2025-02-05 20:50:45 - ERROR - stderr - +2025-02-05 20:50:45 - INFO - stdout - {'loss': 0.672, 'grad_norm': 1.1972640752792358, 'learning_rate': 1.0607443142465735e-05, 'epoch': 1.49} +2025-02-05 20:50:45 - ERROR - stderr - 50%|████▉ | 11133/22434 [10:43:05<49:27:15, 15.75s/it] +2025-02-05 20:51:08 - ERROR - stderr - 50%|████▉ | 11134/22434 [10:43:28<55:43:24, 17.75s/it] +2025-02-05 20:51:08 - ERROR - stderr - +2025-02-05 20:51:08 - ERROR - stderr - +2025-02-05 20:51:08 - INFO - stdout - {'loss': 0.6628, 'grad_norm': 1.22037935256958, 'learning_rate': 1.0606002055597627e-05, 'epoch': 1.49} +2025-02-05 20:51:08 - ERROR - stderr - 50%|████▉ | 11134/22434 [10:43:28<55:43:24, 17.75s/it] +2025-02-05 20:51:10 - ERROR - stderr - 50%|████▉ | 11135/22434 [10:43:30<41:22:49, 13.18s/it] +2025-02-05 20:51:10 - ERROR - stderr - +2025-02-05 20:51:10 - ERROR - stderr - +2025-02-05 20:51:10 - INFO - stdout - {'loss': 0.6956, 'grad_norm': 1.1934614181518555, 'learning_rate': 1.0604560956097983e-05, 'epoch': 1.49} +2025-02-05 20:51:10 - ERROR - stderr - 50%|████▉ | 11135/22434 [10:43:30<41:22:49, 13.18s/it] +2025-02-05 20:51:13 - ERROR - stderr - 50%|████▉ | 11136/22434 [10:43:32<31:15:34, 9.96s/it] +2025-02-05 20:51:13 - ERROR - stderr - +2025-02-05 20:51:13 - ERROR - stderr - +2025-02-05 20:51:13 - INFO - stdout - {'loss': 0.6801, 'grad_norm': 1.1259020566940308, 'learning_rate': 1.0603119843996848e-05, 'epoch': 1.49} +2025-02-05 20:51:13 - ERROR - stderr - 50%|████▉ | 11136/22434 [10:43:33<31:15:34, 9.96s/it] +2025-02-05 20:51:34 - ERROR - stderr - 50%|████▉ | 11137/22434 [10:43:54<42:13:48, 13.46s/it] +2025-02-05 20:51:34 - ERROR - stderr - +2025-02-05 20:51:34 - ERROR - stderr - +2025-02-05 20:51:34 - INFO - stdout - {'loss': 0.7042, 'grad_norm': 1.1250442266464233, 'learning_rate': 1.0601678719324254e-05, 'epoch': 1.49} +2025-02-05 20:51:34 - ERROR - stderr - 50%|████▉ | 11137/22434 
[10:43:54<42:13:48, 13.46s/it] +2025-02-05 20:51:37 - ERROR - stderr - 50%|████▉ | 11138/22434 [10:43:57<31:51:17, 10.15s/it] +2025-02-05 20:51:37 - ERROR - stderr - +2025-02-05 20:51:37 - ERROR - stderr - +2025-02-05 20:51:37 - INFO - stdout - {'loss': 0.6775, 'grad_norm': 1.1000854969024658, 'learning_rate': 1.0600237582110244e-05, 'epoch': 1.49} +2025-02-05 20:51:37 - ERROR - stderr - 50%|████▉ | 11138/22434 [10:43:57<31:51:17, 10.15s/it] +2025-02-05 20:51:39 - ERROR - stderr - 50%|████▉ | 11139/22434 [10:43:59<24:36:48, 7.84s/it] +2025-02-05 20:51:39 - ERROR - stderr - +2025-02-05 20:51:39 - ERROR - stderr - +2025-02-05 20:51:39 - INFO - stdout - {'loss': 0.674, 'grad_norm': 1.297085165977478, 'learning_rate': 1.0598796432384853e-05, 'epoch': 1.49} +2025-02-05 20:51:39 - ERROR - stderr - 50%|████▉ | 11139/22434 [10:43:59<24:36:48, 7.84s/it] +2025-02-05 20:51:53 - ERROR - stderr - 50%|████▉ | 11140/22434 [10:44:13<30:06:00, 9.59s/it] +2025-02-05 20:51:53 - ERROR - stderr - +2025-02-05 20:51:53 - ERROR - stderr - +2025-02-05 20:51:53 - INFO - stdout - {'loss': 0.6743, 'grad_norm': 1.2587895393371582, 'learning_rate': 1.0597355270178126e-05, 'epoch': 1.49} +2025-02-05 20:51:53 - ERROR - stderr - 50%|████▉ | 11140/22434 [10:44:13<30:06:00, 9.59s/it] +2025-02-05 20:52:15 - ERROR - stderr - 50%|████▉ | 11141/22434 [10:44:34<41:33:17, 13.25s/it] +2025-02-05 20:52:15 - ERROR - stderr - +2025-02-05 20:52:15 - ERROR - stderr - +2025-02-05 20:52:15 - INFO - stdout - {'loss': 0.6845, 'grad_norm': 1.2848955392837524, 'learning_rate': 1.0595914095520102e-05, 'epoch': 1.49} +2025-02-05 20:52:15 - ERROR - stderr - 50%|████▉ | 11141/22434 [10:44:34<41:33:17, 13.25s/it] +2025-02-05 20:52:35 - ERROR - stderr - 50%|████▉ | 11142/22434 [10:44:55<48:38:36, 15.51s/it] +2025-02-05 20:52:35 - ERROR - stderr - +2025-02-05 20:52:35 - ERROR - stderr - +2025-02-05 20:52:35 - INFO - stdout - {'loss': 0.7049, 'grad_norm': 1.0787171125411987, 'learning_rate': 1.0594472908440817e-05, 'epoch': 
1.49} +2025-02-05 20:52:35 - ERROR - stderr - 50%|████▉ | 11142/22434 [10:44:55<48:38:36, 15.51s/it] +2025-02-05 20:52:42 - ERROR - stderr - 50%|████▉ | 11143/22434 [10:45:01<39:51:54, 12.71s/it] +2025-02-05 20:52:42 - ERROR - stderr - +2025-02-05 20:52:42 - ERROR - stderr - +2025-02-05 20:52:42 - INFO - stdout - {'loss': 0.7623, 'grad_norm': 1.3908013105392456, 'learning_rate': 1.0593031708970312e-05, 'epoch': 1.49} +2025-02-05 20:52:42 - ERROR - stderr - 50%|████▉ | 11143/22434 [10:45:01<39:51:54, 12.71s/it] +2025-02-05 20:52:48 - ERROR - stderr - 50%|████▉ | 11144/22434 [10:45:08<33:45:04, 10.76s/it] +2025-02-05 20:52:48 - ERROR - stderr - +2025-02-05 20:52:48 - ERROR - stderr - +2025-02-05 20:52:48 - INFO - stdout - {'loss': 0.6604, 'grad_norm': 1.1248219013214111, 'learning_rate': 1.059159049713863e-05, 'epoch': 1.49} +2025-02-05 20:52:48 - ERROR - stderr - 50%|████▉ | 11144/22434 [10:45:08<33:45:04, 10.76s/it] +2025-02-05 20:52:50 - ERROR - stderr - 50%|████▉ | 11145/22434 [10:45:10<25:58:51, 8.29s/it] +2025-02-05 20:52:50 - ERROR - stderr - +2025-02-05 20:52:50 - ERROR - stderr - +2025-02-05 20:52:50 - INFO - stdout - {'loss': 0.7518, 'grad_norm': 1.2079771757125854, 'learning_rate': 1.059014927297581e-05, 'epoch': 1.49} +2025-02-05 20:52:50 - ERROR - stderr - 50%|████▉ | 11145/22434 [10:45:10<25:58:51, 8.29s/it] +2025-02-05 20:52:53 - ERROR - stderr - 50%|████▉ | 11146/22434 [10:45:13<20:34:37, 6.56s/it] +2025-02-05 20:52:53 - ERROR - stderr - +2025-02-05 20:52:53 - ERROR - stderr - +2025-02-05 20:52:53 - INFO - stdout - {'loss': 0.6726, 'grad_norm': 1.3465570211410522, 'learning_rate': 1.058870803651189e-05, 'epoch': 1.49} +2025-02-05 20:52:53 - ERROR - stderr - 50%|████▉ | 11146/22434 [10:45:13<20:34:37, 6.56s/it] +2025-02-05 20:53:36 - ERROR - stderr - 50%|████▉ | 11147/22434 [10:45:56<55:00:22, 17.54s/it] +2025-02-05 20:53:36 - ERROR - stderr - +2025-02-05 20:53:36 - ERROR - stderr - +2025-02-05 20:53:36 - INFO - stdout - {'loss': 0.5953, 'grad_norm': 
1.1222517490386963, 'learning_rate': 1.0587266787776917e-05, 'epoch': 1.49} +2025-02-05 20:53:36 - ERROR - stderr - 50%|████▉ | 11147/22434 [10:45:56<55:00:22, 17.54s/it] +2025-02-05 20:53:39 - ERROR - stderr - 50%|████▉ | 11148/22434 [10:45:58<40:50:07, 13.03s/it] +2025-02-05 20:53:39 - ERROR - stderr - +2025-02-05 20:53:39 - ERROR - stderr - +2025-02-05 20:53:39 - INFO - stdout - {'loss': 0.7789, 'grad_norm': 1.4174551963806152, 'learning_rate': 1.0585825526800933e-05, 'epoch': 1.49} +2025-02-05 20:53:39 - ERROR - stderr - 50%|████▉ | 11148/22434 [10:45:58<40:50:07, 13.03s/it] +2025-02-05 20:54:00 - ERROR - stderr - 50%|████▉ | 11149/22434 [10:46:20<48:51:45, 15.59s/it] +2025-02-05 20:54:00 - ERROR - stderr - +2025-02-05 20:54:00 - ERROR - stderr - +2025-02-05 20:54:00 - INFO - stdout - {'loss': 0.7131, 'grad_norm': 1.354761004447937, 'learning_rate': 1.0584384253613973e-05, 'epoch': 1.49} +2025-02-05 20:54:00 - ERROR - stderr - 50%|████▉ | 11149/22434 [10:46:20<48:51:45, 15.59s/it] +2025-02-05 20:54:23 - ERROR - stderr - 50%|████▉ | 11150/22434 [10:46:43<56:03:20, 17.88s/it] +2025-02-05 20:54:23 - ERROR - stderr - +2025-02-05 20:54:23 - ERROR - stderr - +2025-02-05 20:54:23 - INFO - stdout - {'loss': 0.755, 'grad_norm': 1.2866826057434082, 'learning_rate': 1.058294296824608e-05, 'epoch': 1.49} +2025-02-05 20:54:23 - ERROR - stderr - 50%|████▉ | 11150/22434 [10:46:43<56:03:20, 17.88s/it] +2025-02-05 20:54:45 - ERROR - stderr - 50%|████▉ | 11151/22434 [10:47:05<59:41:03, 19.04s/it] +2025-02-05 20:54:45 - ERROR - stderr - +2025-02-05 20:54:45 - ERROR - stderr - +2025-02-05 20:54:45 - INFO - stdout - {'loss': 0.6948, 'grad_norm': 1.2255841493606567, 'learning_rate': 1.0581501670727303e-05, 'epoch': 1.49} +2025-02-05 20:54:45 - ERROR - stderr - 50%|████▉ | 11151/22434 [10:47:05<59:41:03, 19.04s/it] +2025-02-05 20:55:06 - ERROR - stderr - 50%|████▉ | 11152/22434 [10:47:26<61:24:19, 19.59s/it] +2025-02-05 20:55:06 - ERROR - stderr - +2025-02-05 20:55:06 - ERROR - 
stderr - +2025-02-05 20:55:06 - INFO - stdout - {'loss': 0.6762, 'grad_norm': 1.217775583267212, 'learning_rate': 1.0580060361087678e-05, 'epoch': 1.49} +2025-02-05 20:55:06 - ERROR - stderr - 50%|████▉ | 11152/22434 [10:47:26<61:24:19, 19.59s/it] +2025-02-05 20:55:10 - ERROR - stderr - 50%|████▉ | 11153/22434 [10:47:30<47:07:25, 15.04s/it] +2025-02-05 20:55:10 - ERROR - stderr - +2025-02-05 20:55:10 - ERROR - stderr - +2025-02-05 20:55:10 - INFO - stdout - {'loss': 0.7571, 'grad_norm': 1.2363560199737549, 'learning_rate': 1.057861903935725e-05, 'epoch': 1.49} +2025-02-05 20:55:10 - ERROR - stderr - 50%|████▉ | 11153/22434 [10:47:30<47:07:25, 15.04s/it] +2025-02-05 20:55:15 - ERROR - stderr - 50%|████▉ | 11154/22434 [10:47:34<37:03:19, 11.83s/it] +2025-02-05 20:55:15 - ERROR - stderr - +2025-02-05 20:55:15 - ERROR - stderr - +2025-02-05 20:55:15 - INFO - stdout - {'loss': 0.6373, 'grad_norm': 1.2037606239318848, 'learning_rate': 1.0577177705566061e-05, 'epoch': 1.49} +2025-02-05 20:55:15 - ERROR - stderr - 50%|████▉ | 11154/22434 [10:47:35<37:03:19, 11.83s/it] +2025-02-05 20:55:32 - ERROR - stderr - 50%|████▉ | 11155/22434 [10:47:52<42:22:51, 13.53s/it] +2025-02-05 20:55:32 - ERROR - stderr - +2025-02-05 20:55:32 - ERROR - stderr - +2025-02-05 20:55:32 - INFO - stdout - {'loss': 0.8142, 'grad_norm': 1.3599095344543457, 'learning_rate': 1.0575736359744157e-05, 'epoch': 1.49} +2025-02-05 20:55:32 - ERROR - stderr - 50%|████▉ | 11155/22434 [10:47:52<42:22:51, 13.53s/it] +2025-02-05 20:55:59 - ERROR - stderr - 50%|████▉ | 11156/22434 [10:48:18<54:34:45, 17.42s/it] +2025-02-05 20:55:59 - ERROR - stderr - +2025-02-05 20:55:59 - ERROR - stderr - +2025-02-05 20:55:59 - INFO - stdout - {'loss': 0.6027, 'grad_norm': 1.1520377397537231, 'learning_rate': 1.057429500192158e-05, 'epoch': 1.49} +2025-02-05 20:55:59 - ERROR - stderr - 50%|████▉ | 11156/22434 [10:48:19<54:34:45, 17.42s/it] +2025-02-05 20:56:14 - ERROR - stderr - 50%|████▉ | 11157/22434 [10:48:33<52:15:12, 
16.68s/it] +2025-02-05 20:56:14 - ERROR - stderr - +2025-02-05 20:56:14 - ERROR - stderr - +2025-02-05 20:56:14 - INFO - stdout - {'loss': 0.6532, 'grad_norm': 1.200454831123352, 'learning_rate': 1.0572853632128372e-05, 'epoch': 1.49} +2025-02-05 20:56:14 - ERROR - stderr - 50%|████▉ | 11157/22434 [10:48:33<52:15:12, 16.68s/it] +2025-02-05 20:56:36 - ERROR - stderr - 50%|████▉ | 11158/22434 [10:48:56<57:34:56, 18.38s/it] +2025-02-05 20:56:36 - ERROR - stderr - +2025-02-05 20:56:36 - ERROR - stderr - +2025-02-05 20:56:36 - INFO - stdout - {'loss': 0.6085, 'grad_norm': 0.9994578957557678, 'learning_rate': 1.0571412250394575e-05, 'epoch': 1.49} +2025-02-05 20:56:36 - ERROR - stderr - 50%|████▉ | 11158/22434 [10:48:56<57:34:56, 18.38s/it] +2025-02-05 20:56:42 - ERROR - stderr - 50%|████▉ | 11159/22434 [10:49:02<45:45:48, 14.61s/it] +2025-02-05 20:56:42 - ERROR - stderr - +2025-02-05 20:56:42 - ERROR - stderr - +2025-02-05 20:56:42 - INFO - stdout - {'loss': 0.7465, 'grad_norm': 1.2469974756240845, 'learning_rate': 1.056997085675024e-05, 'epoch': 1.49} +2025-02-05 20:56:42 - ERROR - stderr - 50%|████▉ | 11159/22434 [10:49:02<45:45:48, 14.61s/it] +2025-02-05 20:56:44 - ERROR - stderr - 50%|████▉ | 11160/22434 [10:49:04<34:22:17, 10.98s/it] +2025-02-05 20:56:44 - ERROR - stderr - +2025-02-05 20:56:44 - ERROR - stderr - +2025-02-05 20:56:44 - INFO - stdout - {'loss': 0.7289, 'grad_norm': 1.2766176462173462, 'learning_rate': 1.0568529451225408e-05, 'epoch': 1.49} +2025-02-05 20:56:44 - ERROR - stderr - 50%|████▉ | 11160/22434 [10:49:04<34:22:17, 10.98s/it] +2025-02-05 20:56:58 - ERROR - stderr - 50%|████▉ | 11161/22434 [10:49:17<36:35:39, 11.69s/it] +2025-02-05 20:56:58 - ERROR - stderr - +2025-02-05 20:56:58 - ERROR - stderr - +2025-02-05 20:56:58 - INFO - stdout - {'loss': 0.7384, 'grad_norm': 1.2428025007247925, 'learning_rate': 1.0567088033850123e-05, 'epoch': 1.49} +2025-02-05 20:56:58 - ERROR - stderr - 50%|████▉ | 11161/22434 [10:49:17<36:35:39, 11.69s/it] 
+2025-02-05 20:57:20 - ERROR - stderr - 50%|████▉ | 11162/22434 [10:49:40<46:51:59, 14.97s/it] +2025-02-05 20:57:20 - ERROR - stderr - +2025-02-05 20:57:20 - ERROR - stderr - +2025-02-05 20:57:20 - INFO - stdout - {'loss': 0.717, 'grad_norm': 1.173176884651184, 'learning_rate': 1.0565646604654432e-05, 'epoch': 1.49} +2025-02-05 20:57:20 - ERROR - stderr - 50%|████▉ | 11162/22434 [10:49:40<46:51:59, 14.97s/it] +2025-02-05 20:57:49 - ERROR - stderr - 50%|████▉ | 11163/22434 [10:50:08<59:27:20, 18.99s/it] +2025-02-05 20:57:49 - ERROR - stderr - +2025-02-05 20:57:49 - ERROR - stderr - +2025-02-05 20:57:49 - INFO - stdout - {'loss': 0.6614, 'grad_norm': 1.0862598419189453, 'learning_rate': 1.0564205163668377e-05, 'epoch': 1.49} +2025-02-05 20:57:49 - ERROR - stderr - 50%|████▉ | 11163/22434 [10:50:08<59:27:20, 18.99s/it] +2025-02-05 20:58:02 - ERROR - stderr - 50%|████▉ | 11164/22434 [10:50:22<54:00:28, 17.25s/it] +2025-02-05 20:58:02 - ERROR - stderr - +2025-02-05 20:58:02 - ERROR - stderr - +2025-02-05 20:58:02 - INFO - stdout - {'loss': 0.7413, 'grad_norm': 1.317094326019287, 'learning_rate': 1.0562763710922004e-05, 'epoch': 1.49} +2025-02-05 20:58:02 - ERROR - stderr - 50%|████▉ | 11164/22434 [10:50:22<54:00:28, 17.25s/it] +2025-02-05 20:58:24 - ERROR - stderr - 50%|████▉ | 11165/22434 [10:50:44<58:39:35, 18.74s/it] +2025-02-05 20:58:24 - ERROR - stderr - +2025-02-05 20:58:24 - ERROR - stderr - +2025-02-05 20:58:24 - INFO - stdout - {'loss': 0.7845, 'grad_norm': 1.2068299055099487, 'learning_rate': 1.0561322246445363e-05, 'epoch': 1.49} +2025-02-05 20:58:24 - ERROR - stderr - 50%|████▉ | 11165/22434 [10:50:44<58:39:35, 18.74s/it] +2025-02-05 20:58:40 - ERROR - stderr - 50%|████▉ | 11166/22434 [10:50:59<55:36:59, 17.77s/it] +2025-02-05 20:58:40 - ERROR - stderr - +2025-02-05 20:58:40 - ERROR - stderr - +2025-02-05 20:58:40 - INFO - stdout - {'loss': 0.7543, 'grad_norm': 1.2888822555541992, 'learning_rate': 1.0559880770268493e-05, 'epoch': 1.49} +2025-02-05 20:58:40 - 
ERROR - stderr - 50%|████▉ | 11166/22434 [10:50:59<55:36:59, 17.77s/it] +2025-02-05 20:58:55 - ERROR - stderr - 50%|████▉ | 11167/22434 [10:51:15<53:15:32, 17.02s/it] +2025-02-05 20:58:55 - ERROR - stderr - +2025-02-05 20:58:55 - ERROR - stderr - +2025-02-05 20:58:55 - INFO - stdout - {'loss': 0.7058, 'grad_norm': 1.197426676750183, 'learning_rate': 1.0558439282421446e-05, 'epoch': 1.49} +2025-02-05 20:58:55 - ERROR - stderr - 50%|████▉ | 11167/22434 [10:51:15<53:15:32, 17.02s/it] +2025-02-05 20:59:16 - ERROR - stderr - 50%|████▉ | 11168/22434 [10:51:36<57:03:41, 18.23s/it] +2025-02-05 20:59:16 - ERROR - stderr - +2025-02-05 20:59:16 - ERROR - stderr - +2025-02-05 20:59:16 - INFO - stdout - {'loss': 0.6726, 'grad_norm': 1.1670724153518677, 'learning_rate': 1.055699778293427e-05, 'epoch': 1.49} +2025-02-05 20:59:16 - ERROR - stderr - 50%|████▉ | 11168/22434 [10:51:36<57:03:41, 18.23s/it] +2025-02-05 20:59:21 - ERROR - stderr - 50%|████▉ | 11169/22434 [10:51:41<44:31:51, 14.23s/it] +2025-02-05 20:59:21 - ERROR - stderr - +2025-02-05 20:59:21 - ERROR - stderr - +2025-02-05 20:59:21 - INFO - stdout - {'loss': 0.7048, 'grad_norm': 1.3224575519561768, 'learning_rate': 1.0555556271837007e-05, 'epoch': 1.49} +2025-02-05 20:59:21 - ERROR - stderr - 50%|████▉ | 11169/22434 [10:51:41<44:31:51, 14.23s/it] +2025-02-05 20:59:23 - ERROR - stderr - 50%|████▉ | 11170/22434 [10:51:43<33:26:35, 10.69s/it] +2025-02-05 20:59:23 - ERROR - stderr - +2025-02-05 20:59:23 - ERROR - stderr - +2025-02-05 20:59:23 - INFO - stdout - {'loss': 0.6624, 'grad_norm': 1.2369978427886963, 'learning_rate': 1.05541147491597e-05, 'epoch': 1.49} +2025-02-05 20:59:23 - ERROR - stderr - 50%|████▉ | 11170/22434 [10:51:43<33:26:35, 10.69s/it] +2025-02-05 20:59:29 - ERROR - stderr - 50%|████▉ | 11171/22434 [10:51:48<28:28:23, 9.10s/it] +2025-02-05 20:59:29 - ERROR - stderr - +2025-02-05 20:59:29 - ERROR - stderr - +2025-02-05 20:59:29 - INFO - stdout - {'loss': 0.7342, 'grad_norm': 1.2266074419021606, 
'learning_rate': 1.0552673214932406e-05, 'epoch': 1.49} +2025-02-05 20:59:29 - ERROR - stderr - 50%|████▉ | 11171/22434 [10:51:48<28:28:23, 9.10s/it] +2025-02-05 20:59:34 - ERROR - stderr - 50%|████▉ | 11172/22434 [10:51:54<24:51:11, 7.94s/it] +2025-02-05 20:59:34 - ERROR - stderr - +2025-02-05 20:59:34 - ERROR - stderr - +2025-02-05 20:59:34 - INFO - stdout - {'loss': 0.7352, 'grad_norm': 1.2613096237182617, 'learning_rate': 1.0551231669185168e-05, 'epoch': 1.49} +2025-02-05 20:59:34 - ERROR - stderr - 50%|████▉ | 11172/22434 [10:51:54<24:51:11, 7.94s/it] +2025-02-05 20:59:36 - ERROR - stderr - 50%|████▉ | 11173/22434 [10:51:56<19:45:20, 6.32s/it] +2025-02-05 20:59:36 - ERROR - stderr - +2025-02-05 20:59:36 - ERROR - stderr - +2025-02-05 20:59:36 - INFO - stdout - {'loss': 0.7067, 'grad_norm': 1.2465813159942627, 'learning_rate': 1.0549790111948031e-05, 'epoch': 1.49} +2025-02-05 20:59:36 - ERROR - stderr - 50%|████▉ | 11173/22434 [10:51:56<19:45:20, 6.32s/it] +2025-02-05 20:59:39 - ERROR - stderr - 50%|████▉ | 11174/22434 [10:51:59<16:42:52, 5.34s/it] +2025-02-05 20:59:40 - ERROR - stderr - +2025-02-05 20:59:40 - ERROR - stderr - +2025-02-05 20:59:40 - INFO - stdout - {'loss': 0.7623, 'grad_norm': 1.2195369005203247, 'learning_rate': 1.0548348543251044e-05, 'epoch': 1.49} +2025-02-05 20:59:40 - ERROR - stderr - 50%|████▉ | 11174/22434 [10:51:59<16:42:52, 5.34s/it] +2025-02-05 21:00:03 - ERROR - stderr - 50%|████▉ | 11175/22434 [10:52:22<33:23:13, 10.68s/it] +2025-02-05 21:00:03 - ERROR - stderr - +2025-02-05 21:00:03 - ERROR - stderr - +2025-02-05 21:00:03 - INFO - stdout - {'loss': 0.7753, 'grad_norm': 1.2356926202774048, 'learning_rate': 1.054690696312426e-05, 'epoch': 1.49} +2025-02-05 21:00:03 - ERROR - stderr - 50%|████▉ | 11175/22434 [10:52:22<33:23:13, 10.68s/it] +2025-02-05 21:00:23 - ERROR - stderr - 50%|████▉ | 11176/22434 [10:52:43<42:39:24, 13.64s/it] +2025-02-05 21:00:23 - ERROR - stderr - +2025-02-05 21:00:23 - ERROR - stderr - +2025-02-05 21:00:23 
- INFO - stdout - {'loss': 0.6574, 'grad_norm': 1.0978771448135376, 'learning_rate': 1.0545465371597723e-05, 'epoch': 1.49} +2025-02-05 21:00:23 - ERROR - stderr - 50%|████▉ | 11176/22434 [10:52:43<42:39:24, 13.64s/it] +2025-02-05 21:01:01 - ERROR - stderr - 50%|████▉ | 11177/22434 [10:53:21<65:13:51, 20.86s/it] +2025-02-05 21:01:01 - ERROR - stderr - +2025-02-05 21:01:01 - ERROR - stderr - +2025-02-05 21:01:01 - INFO - stdout - {'loss': 0.7113, 'grad_norm': 1.2394564151763916, 'learning_rate': 1.0544023768701477e-05, 'epoch': 1.49} +2025-02-05 21:01:01 - ERROR - stderr - 50%|████▉ | 11177/22434 [10:53:21<65:13:51, 20.86s/it] +2025-02-05 21:01:17 - ERROR - stderr - 50%|████▉ | 11178/22434 [10:53:37<60:53:01, 19.47s/it] +2025-02-05 21:01:17 - ERROR - stderr - +2025-02-05 21:01:17 - ERROR - stderr - +2025-02-05 21:01:17 - INFO - stdout - {'loss': 0.7727, 'grad_norm': 1.3418971300125122, 'learning_rate': 1.0542582154465581e-05, 'epoch': 1.49} +2025-02-05 21:01:17 - ERROR - stderr - 50%|████▉ | 11178/22434 [10:53:37<60:53:01, 19.47s/it] +2025-02-05 21:01:42 - ERROR - stderr - 50%|████▉ | 11179/22434 [10:54:02<66:13:18, 21.18s/it] +2025-02-05 21:01:42 - ERROR - stderr - +2025-02-05 21:01:42 - ERROR - stderr - +2025-02-05 21:01:42 - INFO - stdout - {'loss': 0.623, 'grad_norm': 1.114583134651184, 'learning_rate': 1.0541140528920077e-05, 'epoch': 1.49} +2025-02-05 21:01:42 - ERROR - stderr - 50%|████▉ | 11179/22434 [10:54:02<66:13:18, 21.18s/it] +2025-02-05 21:01:58 - ERROR - stderr - 50%|████▉ | 11180/22434 [10:54:18<61:30:07, 19.67s/it] +2025-02-05 21:01:58 - ERROR - stderr - +2025-02-05 21:01:58 - ERROR - stderr - +2025-02-05 21:01:58 - INFO - stdout - {'loss': 0.6847, 'grad_norm': 1.278980016708374, 'learning_rate': 1.053969889209502e-05, 'epoch': 1.5} +2025-02-05 21:01:58 - ERROR - stderr - 50%|████▉ | 11180/22434 [10:54:18<61:30:07, 19.67s/it] +2025-02-05 21:02:16 - ERROR - stderr - 50%|████▉ | 11181/22434 [10:54:36<59:46:55, 19.13s/it] +2025-02-05 21:02:16 - ERROR - 
stderr - +2025-02-05 21:02:16 - ERROR - stderr - +2025-02-05 21:02:16 - INFO - stdout - {'loss': 0.7263, 'grad_norm': 1.3881422281265259, 'learning_rate': 1.0538257244020456e-05, 'epoch': 1.5} +2025-02-05 21:02:16 - ERROR - stderr - 50%|████▉ | 11181/22434 [10:54:36<59:46:55, 19.13s/it] +2025-02-05 21:02:38 - ERROR - stderr - 50%|████▉ | 11182/22434 [10:54:58<62:30:28, 20.00s/it] +2025-02-05 21:02:38 - ERROR - stderr - +2025-02-05 21:02:38 - ERROR - stderr - +2025-02-05 21:02:38 - INFO - stdout - {'loss': 0.6569, 'grad_norm': 1.1720807552337646, 'learning_rate': 1.0536815584726432e-05, 'epoch': 1.5} +2025-02-05 21:02:38 - ERROR - stderr - 50%|████▉ | 11182/22434 [10:54:58<62:30:28, 20.00s/it] +2025-02-05 21:03:20 - ERROR - stderr - 50%|████▉ | 11183/22434 [10:55:39<82:27:15, 26.38s/it] +2025-02-05 21:03:20 - ERROR - stderr - +2025-02-05 21:03:20 - ERROR - stderr - +2025-02-05 21:03:20 - INFO - stdout - {'loss': 0.6577, 'grad_norm': 1.1185722351074219, 'learning_rate': 1.0535373914243001e-05, 'epoch': 1.5} +2025-02-05 21:03:20 - ERROR - stderr - 50%|████▉ | 11183/22434 [10:55:39<82:27:15, 26.38s/it] +2025-02-05 21:03:41 - ERROR - stderr - 50%|████▉ | 11184/22434 [10:56:01<78:00:16, 24.96s/it] +2025-02-05 21:03:41 - ERROR - stderr - +2025-02-05 21:03:41 - ERROR - stderr - +2025-02-05 21:03:41 - INFO - stdout - {'loss': 0.6473, 'grad_norm': 1.0863063335418701, 'learning_rate': 1.0533932232600213e-05, 'epoch': 1.5} +2025-02-05 21:03:41 - ERROR - stderr - 50%|████▉ | 11184/22434 [10:56:01<78:00:16, 24.96s/it] +2025-02-05 21:03:45 - ERROR - stderr - 50%|████▉ | 11185/22434 [10:56:05<58:23:20, 18.69s/it] +2025-02-05 21:03:45 - ERROR - stderr - +2025-02-05 21:03:45 - ERROR - stderr - +2025-02-05 21:03:45 - INFO - stdout - {'loss': 0.7396, 'grad_norm': 1.221068024635315, 'learning_rate': 1.053249053982812e-05, 'epoch': 1.5} +2025-02-05 21:03:45 - ERROR - stderr - 50%|████▉ | 11185/22434 [10:56:05<58:23:20, 18.69s/it] +2025-02-05 21:03:48 - ERROR - stderr - 50%|████▉ | 
11186/22434 [10:56:08<43:14:48, 13.84s/it] +2025-02-05 21:03:48 - ERROR - stderr - +2025-02-05 21:03:48 - ERROR - stderr - +2025-02-05 21:03:48 - INFO - stdout - {'loss': 0.6905, 'grad_norm': 1.2321242094039917, 'learning_rate': 1.053104883595677e-05, 'epoch': 1.5} +2025-02-05 21:03:48 - ERROR - stderr - 50%|████▉ | 11186/22434 [10:56:08<43:14:48, 13.84s/it] +2025-02-05 21:04:09 - ERROR - stderr - 50%|████▉ | 11187/22434 [10:56:29<50:31:44, 16.17s/it] +2025-02-05 21:04:09 - ERROR - stderr - +2025-02-05 21:04:09 - ERROR - stderr - +2025-02-05 21:04:09 - INFO - stdout - {'loss': 0.7287, 'grad_norm': 1.2206392288208008, 'learning_rate': 1.0529607121016215e-05, 'epoch': 1.5} +2025-02-05 21:04:09 - ERROR - stderr - 50%|████▉ | 11187/22434 [10:56:29<50:31:44, 16.17s/it] +2025-02-05 21:04:12 - ERROR - stderr - 50%|████▉ | 11188/22434 [10:56:32<37:40:05, 12.06s/it] +2025-02-05 21:04:12 - ERROR - stderr - +2025-02-05 21:04:12 - ERROR - stderr - +2025-02-05 21:04:12 - INFO - stdout - {'loss': 0.631, 'grad_norm': 1.2069880962371826, 'learning_rate': 1.052816539503651e-05, 'epoch': 1.5} +2025-02-05 21:04:12 - ERROR - stderr - 50%|████▉ | 11188/22434 [10:56:32<37:40:05, 12.06s/it] +2025-02-05 21:04:28 - ERROR - stderr - 50%|████▉ | 11189/22434 [10:56:48<41:25:15, 13.26s/it] +2025-02-05 21:04:28 - ERROR - stderr - +2025-02-05 21:04:28 - ERROR - stderr - +2025-02-05 21:04:28 - INFO - stdout - {'loss': 0.6869, 'grad_norm': 1.2368944883346558, 'learning_rate': 1.0526723658047698e-05, 'epoch': 1.5} +2025-02-05 21:04:28 - ERROR - stderr - 50%|████▉ | 11189/22434 [10:56:48<41:25:15, 13.26s/it] +2025-02-05 21:04:48 - ERROR - stderr - 50%|████▉ | 11190/22434 [10:57:08<47:35:50, 15.24s/it] +2025-02-05 21:04:48 - ERROR - stderr - +2025-02-05 21:04:48 - ERROR - stderr - +2025-02-05 21:04:48 - INFO - stdout - {'loss': 0.6796, 'grad_norm': 1.193634033203125, 'learning_rate': 1.0525281910079834e-05, 'epoch': 1.5} +2025-02-05 21:04:48 - ERROR - stderr - 50%|████▉ | 11190/22434 
[10:57:08<47:35:50, 15.24s/it] +2025-02-05 21:05:10 - ERROR - stderr - 50%|████▉ | 11191/22434 [10:57:30<54:28:31, 17.44s/it] +2025-02-05 21:05:10 - ERROR - stderr - +2025-02-05 21:05:10 - ERROR - stderr - +2025-02-05 21:05:10 - INFO - stdout - {'loss': 0.6999, 'grad_norm': 1.1900726556777954, 'learning_rate': 1.0523840151162974e-05, 'epoch': 1.5} +2025-02-05 21:05:10 - ERROR - stderr - 50%|████▉ | 11191/22434 [10:57:30<54:28:31, 17.44s/it] +2025-02-05 21:05:31 - ERROR - stderr - 50%|████▉ | 11192/22434 [10:57:51<57:46:48, 18.50s/it] +2025-02-05 21:05:31 - ERROR - stderr - +2025-02-05 21:05:31 - ERROR - stderr - +2025-02-05 21:05:31 - INFO - stdout - {'loss': 0.8222, 'grad_norm': 1.2822988033294678, 'learning_rate': 1.0522398381327171e-05, 'epoch': 1.5} +2025-02-05 21:05:31 - ERROR - stderr - 50%|████▉ | 11192/22434 [10:57:51<57:46:48, 18.50s/it] +2025-02-05 21:05:36 - ERROR - stderr - 50%|████▉ | 11193/22434 [10:57:56<45:10:11, 14.47s/it] +2025-02-05 21:05:36 - ERROR - stderr - +2025-02-05 21:05:36 - ERROR - stderr - +2025-02-05 21:05:36 - INFO - stdout - {'loss': 0.6517, 'grad_norm': 1.1578625440597534, 'learning_rate': 1.052095660060247e-05, 'epoch': 1.5} +2025-02-05 21:05:36 - ERROR - stderr - 50%|████▉ | 11193/22434 [10:57:56<45:10:11, 14.47s/it] +2025-02-05 21:05:39 - ERROR - stderr - 50%|████▉ | 11194/22434 [10:57:59<34:00:53, 10.89s/it] +2025-02-05 21:05:39 - ERROR - stderr - +2025-02-05 21:05:39 - ERROR - stderr - +2025-02-05 21:05:39 - INFO - stdout - {'loss': 0.7465, 'grad_norm': 1.2446532249450684, 'learning_rate': 1.0519514809018927e-05, 'epoch': 1.5} +2025-02-05 21:05:39 - ERROR - stderr - 50%|████▉ | 11194/22434 [10:57:59<34:00:53, 10.89s/it] +2025-02-05 21:05:43 - ERROR - stderr - 50%|████▉ | 11195/22434 [10:58:03<27:51:17, 8.92s/it] +2025-02-05 21:05:43 - ERROR - stderr - +2025-02-05 21:05:43 - ERROR - stderr - +2025-02-05 21:05:43 - INFO - stdout - {'loss': 0.6599, 'grad_norm': 1.1602444648742676, 'learning_rate': 1.0518073006606596e-05, 'epoch': 
1.5} +2025-02-05 21:05:43 - ERROR - stderr - 50%|████▉ | 11195/22434 [10:58:03<27:51:17, 8.92s/it] +2025-02-05 21:06:00 - ERROR - stderr - 50%|████▉ | 11196/22434 [10:58:20<35:30:27, 11.37s/it] +2025-02-05 21:06:00 - ERROR - stderr - +2025-02-05 21:06:00 - ERROR - stderr - +2025-02-05 21:06:00 - INFO - stdout - {'loss': 0.7063, 'grad_norm': 1.3141688108444214, 'learning_rate': 1.0516631193395525e-05, 'epoch': 1.5} +2025-02-05 21:06:00 - ERROR - stderr - 50%|████▉ | 11196/22434 [10:58:20<35:30:27, 11.37s/it] +2025-02-05 21:06:03 - ERROR - stderr - 50%|████▉ | 11197/22434 [10:58:23<27:40:51, 8.87s/it] +2025-02-05 21:06:03 - ERROR - stderr - +2025-02-05 21:06:03 - ERROR - stderr - +2025-02-05 21:06:03 - INFO - stdout - {'loss': 0.6416, 'grad_norm': 1.1707797050476074, 'learning_rate': 1.0515189369415775e-05, 'epoch': 1.5} +2025-02-05 21:06:03 - ERROR - stderr - 50%|████▉ | 11197/22434 [10:58:23<27:40:51, 8.87s/it] +2025-02-05 21:06:08 - ERROR - stderr - 50%|████▉ | 11198/22434 [10:58:28<23:31:19, 7.54s/it] +2025-02-05 21:06:08 - ERROR - stderr - +2025-02-05 21:06:08 - ERROR - stderr - +2025-02-05 21:06:08 - INFO - stdout - {'loss': 0.7772, 'grad_norm': 1.3497982025146484, 'learning_rate': 1.0513747534697396e-05, 'epoch': 1.5} +2025-02-05 21:06:08 - ERROR - stderr - 50%|████▉ | 11198/22434 [10:58:28<23:31:19, 7.54s/it] +2025-02-05 21:06:12 - ERROR - stderr - 50%|████▉ | 11199/22434 [10:58:32<20:42:09, 6.63s/it] +2025-02-05 21:06:12 - ERROR - stderr - +2025-02-05 21:06:12 - ERROR - stderr - +2025-02-05 21:06:12 - INFO - stdout - {'loss': 0.6775, 'grad_norm': 1.1801602840423584, 'learning_rate': 1.051230568927044e-05, 'epoch': 1.5} +2025-02-05 21:06:12 - ERROR - stderr - 50%|████▉ | 11199/22434 [10:58:32<20:42:09, 6.63s/it] +2025-02-05 21:06:29 - ERROR - stderr - 50%|████▉ | 11200/22434 [10:58:49<30:29:30, 9.77s/it] +2025-02-05 21:06:29 - ERROR - stderr - +2025-02-05 21:06:29 - ERROR - stderr - +2025-02-05 21:06:29 - INFO - stdout - {'loss': 0.6813, 'grad_norm': 
1.3530025482177734, 'learning_rate': 1.0510863833164963e-05, 'epoch': 1.5} +2025-02-05 21:06:29 - ERROR - stderr - 50%|████▉ | 11200/22434 [10:58:49<30:29:30, 9.77s/it] +2025-02-05 21:06:54 - ERROR - stderr - 50%|████▉ | 11201/22434 [10:59:14<44:14:43, 14.18s/it] +2025-02-05 21:06:54 - ERROR - stderr - +2025-02-05 21:06:54 - ERROR - stderr - +2025-02-05 21:06:54 - INFO - stdout - {'loss': 0.6137, 'grad_norm': 1.0279252529144287, 'learning_rate': 1.0509421966411017e-05, 'epoch': 1.5} +2025-02-05 21:06:54 - ERROR - stderr - 50%|████▉ | 11201/22434 [10:59:14<44:14:43, 14.18s/it] +2025-02-05 21:07:10 - ERROR - stderr - 50%|████▉ | 11202/22434 [10:59:30<46:13:23, 14.82s/it] +2025-02-05 21:07:10 - ERROR - stderr - +2025-02-05 21:07:10 - ERROR - stderr - +2025-02-05 21:07:10 - INFO - stdout - {'loss': 0.6623, 'grad_norm': 1.176138162612915, 'learning_rate': 1.0507980089038659e-05, 'epoch': 1.5} +2025-02-05 21:07:10 - ERROR - stderr - 50%|████▉ | 11202/22434 [10:59:30<46:13:23, 14.82s/it] +2025-02-05 21:07:34 - ERROR - stderr - 50%|████▉ | 11203/22434 [10:59:53<54:18:13, 17.41s/it] +2025-02-05 21:07:34 - ERROR - stderr - +2025-02-05 21:07:34 - ERROR - stderr - +2025-02-05 21:07:34 - INFO - stdout - {'loss': 0.7327, 'grad_norm': 1.3767824172973633, 'learning_rate': 1.050653820107794e-05, 'epoch': 1.5} +2025-02-05 21:07:34 - ERROR - stderr - 50%|████▉ | 11203/22434 [10:59:53<54:18:13, 17.41s/it] +2025-02-05 21:07:58 - ERROR - stderr - 50%|████▉ | 11204/22434 [11:00:18<61:06:11, 19.59s/it] +2025-02-05 21:07:58 - ERROR - stderr - +2025-02-05 21:07:58 - ERROR - stderr - +2025-02-05 21:07:58 - INFO - stdout - {'loss': 0.8116, 'grad_norm': 1.4212448596954346, 'learning_rate': 1.050509630255892e-05, 'epoch': 1.5} +2025-02-05 21:07:58 - ERROR - stderr - 50%|████▉ | 11204/22434 [11:00:18<61:06:11, 19.59s/it] +2025-02-05 21:08:02 - ERROR - stderr - 50%|████▉ | 11205/22434 [11:00:21<45:55:12, 14.72s/it] +2025-02-05 21:08:02 - ERROR - stderr - +2025-02-05 21:08:02 - ERROR - stderr - 
+2025-02-05 21:08:02 - INFO - stdout - {'loss': 0.7032, 'grad_norm': 1.3102025985717773, 'learning_rate': 1.050365439351165e-05, 'epoch': 1.5} +2025-02-05 21:08:02 - ERROR - stderr - 50%|████▉ | 11205/22434 [11:00:22<45:55:12, 14.72s/it] +2025-02-05 21:08:18 - ERROR - stderr - 50%|████▉ | 11206/22434 [11:00:37<47:05:58, 15.10s/it] +2025-02-05 21:08:18 - ERROR - stderr - +2025-02-05 21:08:18 - ERROR - stderr - +2025-02-05 21:08:18 - INFO - stdout - {'loss': 0.7001, 'grad_norm': 1.2339673042297363, 'learning_rate': 1.0502212473966183e-05, 'epoch': 1.5} +2025-02-05 21:08:18 - ERROR - stderr - 50%|████▉ | 11206/22434 [11:00:38<47:05:58, 15.10s/it] +2025-02-05 21:08:33 - ERROR - stderr - 50%|████▉ | 11207/22434 [11:00:53<47:14:10, 15.15s/it] +2025-02-05 21:08:33 - ERROR - stderr - +2025-02-05 21:08:33 - ERROR - stderr - +2025-02-05 21:08:33 - INFO - stdout - {'loss': 0.8373, 'grad_norm': 1.3438186645507812, 'learning_rate': 1.0500770543952579e-05, 'epoch': 1.5} +2025-02-05 21:08:33 - ERROR - stderr - 50%|████▉ | 11207/22434 [11:00:53<47:14:10, 15.15s/it] +2025-02-05 21:08:35 - ERROR - stderr - 50%|████▉ | 11208/22434 [11:00:55<35:26:06, 11.36s/it] +2025-02-05 21:08:36 - ERROR - stderr - +2025-02-05 21:08:36 - ERROR - stderr - +2025-02-05 21:08:36 - INFO - stdout - {'loss': 0.7364, 'grad_norm': 1.2887126207351685, 'learning_rate': 1.0499328603500896e-05, 'epoch': 1.5} +2025-02-05 21:08:36 - ERROR - stderr - 50%|████▉ | 11208/22434 [11:00:55<35:26:06, 11.36s/it] +2025-02-05 21:08:52 - ERROR - stderr - 50%|████▉ | 11209/22434 [11:01:12<40:21:22, 12.94s/it] +2025-02-05 21:08:52 - ERROR - stderr - +2025-02-05 21:08:52 - ERROR - stderr - +2025-02-05 21:08:52 - INFO - stdout - {'loss': 0.6368, 'grad_norm': 1.1469290256500244, 'learning_rate': 1.0497886652641181e-05, 'epoch': 1.5} +2025-02-05 21:08:52 - ERROR - stderr - 50%|████▉ | 11209/22434 [11:01:12<40:21:22, 12.94s/it] +2025-02-05 21:09:10 - ERROR - stderr - 50%|████▉ | 11210/22434 [11:01:30<44:48:27, 14.37s/it] 
+2025-02-05 21:09:10 - ERROR - stderr - +2025-02-05 21:09:10 - ERROR - stderr - +2025-02-05 21:09:10 - INFO - stdout - {'loss': 0.6914, 'grad_norm': 1.2227312326431274, 'learning_rate': 1.0496444691403496e-05, 'epoch': 1.5} +2025-02-05 21:09:10 - ERROR - stderr - 50%|████▉ | 11210/22434 [11:01:30<44:48:27, 14.37s/it] +2025-02-05 21:09:26 - ERROR - stderr - 50%|████▉ | 11211/22434 [11:01:46<46:34:08, 14.94s/it] +2025-02-05 21:09:26 - ERROR - stderr - +2025-02-05 21:09:26 - ERROR - stderr - +2025-02-05 21:09:26 - INFO - stdout - {'loss': 0.7893, 'grad_norm': 1.278199315071106, 'learning_rate': 1.0495002719817896e-05, 'epoch': 1.5} +2025-02-05 21:09:26 - ERROR - stderr - 50%|████▉ | 11211/22434 [11:01:46<46:34:08, 14.94s/it] +2025-02-05 21:09:29 - ERROR - stderr - 50%|████▉ | 11212/22434 [11:01:48<34:56:54, 11.21s/it] +2025-02-05 21:09:29 - ERROR - stderr - +2025-02-05 21:09:29 - ERROR - stderr - +2025-02-05 21:09:29 - INFO - stdout - {'loss': 0.6217, 'grad_norm': 1.1027257442474365, 'learning_rate': 1.0493560737914444e-05, 'epoch': 1.5} +2025-02-05 21:09:29 - ERROR - stderr - 50%|████▉ | 11212/22434 [11:01:48<34:56:54, 11.21s/it] +2025-02-05 21:09:44 - ERROR - stderr - 50%|████▉ | 11213/22434 [11:02:04<39:14:06, 12.59s/it] +2025-02-05 21:09:44 - ERROR - stderr - +2025-02-05 21:09:44 - ERROR - stderr - +2025-02-05 21:09:44 - INFO - stdout - {'loss': 0.7271, 'grad_norm': 1.210065245628357, 'learning_rate': 1.0492118745723185e-05, 'epoch': 1.5} +2025-02-05 21:09:44 - ERROR - stderr - 50%|████▉ | 11213/22434 [11:02:04<39:14:06, 12.59s/it] +2025-02-05 21:10:08 - ERROR - stderr - 50%|████▉ | 11214/22434 [11:02:28<49:21:14, 15.84s/it] +2025-02-05 21:10:08 - ERROR - stderr - +2025-02-05 21:10:08 - ERROR - stderr - +2025-02-05 21:10:08 - INFO - stdout - {'loss': 0.6545, 'grad_norm': 1.0736790895462036, 'learning_rate': 1.0490676743274181e-05, 'epoch': 1.5} +2025-02-05 21:10:08 - ERROR - stderr - 50%|████▉ | 11214/22434 [11:02:28<49:21:14, 15.84s/it] +2025-02-05 21:10:19 - 
ERROR - stderr - 50%|████▉ | 11215/22434 [11:02:39<44:49:51, 14.39s/it] +2025-02-05 21:10:19 - ERROR - stderr - +2025-02-05 21:10:19 - ERROR - stderr - +2025-02-05 21:10:19 - INFO - stdout - {'loss': 0.7098, 'grad_norm': 1.2265375852584839, 'learning_rate': 1.0489234730597494e-05, 'epoch': 1.5} +2025-02-05 21:10:19 - ERROR - stderr - 50%|████▉ | 11215/22434 [11:02:39<44:49:51, 14.39s/it] +2025-02-05 21:10:31 - ERROR - stderr - 50%|████▉ | 11216/22434 [11:02:51<42:47:38, 13.73s/it] +2025-02-05 21:10:31 - ERROR - stderr - +2025-02-05 21:10:31 - ERROR - stderr - +2025-02-05 21:10:31 - INFO - stdout - {'loss': 0.6801, 'grad_norm': 1.2218736410140991, 'learning_rate': 1.0487792707723173e-05, 'epoch': 1.5} +2025-02-05 21:10:31 - ERROR - stderr - 50%|████▉ | 11216/22434 [11:02:51<42:47:38, 13.73s/it] +2025-02-05 21:10:53 - ERROR - stderr - 50%|█████ | 11217/22434 [11:03:13<50:26:44, 16.19s/it] +2025-02-05 21:10:53 - ERROR - stderr - +2025-02-05 21:10:53 - ERROR - stderr - +2025-02-05 21:10:53 - INFO - stdout - {'loss': 0.8272, 'grad_norm': 1.3834000825881958, 'learning_rate': 1.0486350674681282e-05, 'epoch': 1.5} +2025-02-05 21:10:53 - ERROR - stderr - 50%|█████ | 11217/22434 [11:03:13<50:26:44, 16.19s/it] +2025-02-05 21:11:07 - ERROR - stderr - 50%|█████ | 11218/22434 [11:03:27<48:24:52, 15.54s/it] +2025-02-05 21:11:07 - ERROR - stderr - +2025-02-05 21:11:07 - ERROR - stderr - +2025-02-05 21:11:07 - INFO - stdout - {'loss': 0.6238, 'grad_norm': 1.1733715534210205, 'learning_rate': 1.0484908631501875e-05, 'epoch': 1.5} +2025-02-05 21:11:07 - ERROR - stderr - 50%|█████ | 11218/22434 [11:03:27<48:24:52, 15.54s/it] +2025-02-05 21:11:16 - ERROR - stderr - 50%|█████ | 11219/22434 [11:03:36<42:34:31, 13.67s/it] +2025-02-05 21:11:16 - ERROR - stderr - +2025-02-05 21:11:16 - ERROR - stderr - +2025-02-05 21:11:16 - INFO - stdout - {'loss': 0.6833, 'grad_norm': 1.0997190475463867, 'learning_rate': 1.0483466578215013e-05, 'epoch': 1.5} +2025-02-05 21:11:16 - ERROR - stderr - 
50%|█████ | 11219/22434 [11:03:36<42:34:31, 13.67s/it] +2025-02-05 21:11:19 - ERROR - stderr - 50%|█████ | 11220/22434 [11:03:39<32:09:50, 10.33s/it] +2025-02-05 21:11:19 - ERROR - stderr - +2025-02-05 21:11:19 - ERROR - stderr - +2025-02-05 21:11:19 - INFO - stdout - {'loss': 0.6397, 'grad_norm': 1.1278554201126099, 'learning_rate': 1.0482024514850753e-05, 'epoch': 1.5} +2025-02-05 21:11:19 - ERROR - stderr - 50%|█████ | 11220/22434 [11:03:39<32:09:50, 10.33s/it] +2025-02-05 21:11:36 - ERROR - stderr - 50%|█████ | 11221/22434 [11:03:56<38:35:57, 12.39s/it] +2025-02-05 21:11:36 - ERROR - stderr - +2025-02-05 21:11:36 - ERROR - stderr - +2025-02-05 21:11:36 - INFO - stdout - {'loss': 0.791, 'grad_norm': 1.2674373388290405, 'learning_rate': 1.0480582441439155e-05, 'epoch': 1.5} +2025-02-05 21:11:36 - ERROR - stderr - 50%|█████ | 11221/22434 [11:03:56<38:35:57, 12.39s/it] +2025-02-05 21:11:50 - ERROR - stderr - 50%|█████ | 11222/22434 [11:04:09<39:47:55, 12.78s/it] +2025-02-05 21:11:50 - ERROR - stderr - +2025-02-05 21:11:50 - ERROR - stderr - +2025-02-05 21:11:50 - INFO - stdout - {'loss': 0.7469, 'grad_norm': 1.2782623767852783, 'learning_rate': 1.0479140358010273e-05, 'epoch': 1.5} +2025-02-05 21:11:50 - ERROR - stderr - 50%|█████ | 11222/22434 [11:04:10<39:47:55, 12.78s/it] +2025-02-05 21:11:59 - ERROR - stderr - 50%|█████ | 11223/22434 [11:04:19<36:49:57, 11.83s/it] +2025-02-05 21:11:59 - ERROR - stderr - +2025-02-05 21:11:59 - ERROR - stderr - +2025-02-05 21:11:59 - INFO - stdout - {'loss': 0.672, 'grad_norm': 1.1943987607955933, 'learning_rate': 1.0477698264594167e-05, 'epoch': 1.5} +2025-02-05 21:11:59 - ERROR - stderr - 50%|█████ | 11223/22434 [11:04:19<36:49:57, 11.83s/it] +2025-02-05 21:12:02 - ERROR - stderr - 50%|█████ | 11224/22434 [11:04:22<28:05:07, 9.02s/it] +2025-02-05 21:12:02 - ERROR - stderr - +2025-02-05 21:12:02 - ERROR - stderr - +2025-02-05 21:12:02 - INFO - stdout - {'loss': 0.6518, 'grad_norm': 1.269080638885498, 'learning_rate': 
1.0476256161220902e-05, 'epoch': 1.5} +2025-02-05 21:12:02 - ERROR - stderr - 50%|█████ | 11224/22434 [11:04:22<28:05:07, 9.02s/it] +2025-02-05 21:12:04 - ERROR - stderr - 50%|█████ | 11225/22434 [11:04:24<21:57:02, 7.05s/it] +2025-02-05 21:12:04 - ERROR - stderr - +2025-02-05 21:12:04 - ERROR - stderr - +2025-02-05 21:12:04 - INFO - stdout - {'loss': 0.699, 'grad_norm': 1.190590739250183, 'learning_rate': 1.0474814047920532e-05, 'epoch': 1.5} +2025-02-05 21:12:04 - ERROR - stderr - 50%|█████ | 11225/22434 [11:04:24<21:57:02, 7.05s/it] +2025-02-05 21:12:17 - ERROR - stderr - 50%|█████ | 11226/22434 [11:04:37<27:43:58, 8.91s/it] +2025-02-05 21:12:18 - ERROR - stderr - +2025-02-05 21:12:18 - ERROR - stderr - +2025-02-05 21:12:18 - INFO - stdout - {'loss': 0.6976, 'grad_norm': 1.2478607892990112, 'learning_rate': 1.0473371924723117e-05, 'epoch': 1.5} +2025-02-05 21:12:18 - ERROR - stderr - 50%|█████ | 11226/22434 [11:04:37<27:43:58, 8.91s/it] +2025-02-05 21:12:20 - ERROR - stderr - 50%|█████ | 11227/22434 [11:04:40<21:49:22, 7.01s/it] +2025-02-05 21:12:20 - ERROR - stderr - +2025-02-05 21:12:20 - ERROR - stderr - +2025-02-05 21:12:20 - INFO - stdout - {'loss': 0.654, 'grad_norm': 1.0146020650863647, 'learning_rate': 1.0471929791658717e-05, 'epoch': 1.5} +2025-02-05 21:12:20 - ERROR - stderr - 50%|█████ | 11227/22434 [11:04:40<21:49:22, 7.01s/it] +2025-02-05 21:12:30 - ERROR - stderr - 50%|█████ | 11228/22434 [11:04:50<24:29:11, 7.87s/it] +2025-02-05 21:12:30 - ERROR - stderr - +2025-02-05 21:12:30 - ERROR - stderr - +2025-02-05 21:12:30 - INFO - stdout - {'loss': 0.7468, 'grad_norm': 1.0527175664901733, 'learning_rate': 1.047048764875739e-05, 'epoch': 1.5} +2025-02-05 21:12:30 - ERROR - stderr - 50%|█████ | 11228/22434 [11:04:50<24:29:11, 7.87s/it] +2025-02-05 21:12:50 - ERROR - stderr - 50%|█████ | 11229/22434 [11:05:10<35:40:40, 11.46s/it] +2025-02-05 21:12:50 - ERROR - stderr - +2025-02-05 21:12:50 - ERROR - stderr - +2025-02-05 21:12:50 - INFO - stdout - {'loss': 
0.7115, 'grad_norm': 1.172809362411499, 'learning_rate': 1.0469045496049202e-05, 'epoch': 1.5} +2025-02-05 21:12:50 - ERROR - stderr - 50%|█████ | 11229/22434 [11:05:10<35:40:40, 11.46s/it] +2025-02-05 21:12:58 - ERROR - stderr - 50%|█████ | 11230/22434 [11:05:18<33:02:29, 10.62s/it] +2025-02-05 21:12:58 - ERROR - stderr - +2025-02-05 21:12:58 - ERROR - stderr - +2025-02-05 21:12:58 - INFO - stdout - {'loss': 0.6706, 'grad_norm': 1.151249885559082, 'learning_rate': 1.0467603333564207e-05, 'epoch': 1.5} +2025-02-05 21:12:58 - ERROR - stderr - 50%|█████ | 11230/22434 [11:05:18<33:02:29, 10.62s/it] +2025-02-05 21:13:11 - ERROR - stderr - 50%|█████ | 11231/22434 [11:05:30<34:27:16, 11.07s/it] +2025-02-05 21:13:11 - ERROR - stderr - +2025-02-05 21:13:11 - ERROR - stderr - +2025-02-05 21:13:11 - INFO - stdout - {'loss': 0.6923, 'grad_norm': 1.1829904317855835, 'learning_rate': 1.0466161161332468e-05, 'epoch': 1.5} +2025-02-05 21:13:11 - ERROR - stderr - 50%|█████ | 11231/22434 [11:05:30<34:27:16, 11.07s/it] +2025-02-05 21:13:22 - ERROR - stderr - 50%|█████ | 11232/22434 [11:05:42<35:10:03, 11.30s/it] +2025-02-05 21:13:22 - ERROR - stderr - +2025-02-05 21:13:22 - ERROR - stderr - +2025-02-05 21:13:22 - INFO - stdout - {'loss': 0.6382, 'grad_norm': 1.075018286705017, 'learning_rate': 1.0464718979384045e-05, 'epoch': 1.5} +2025-02-05 21:13:22 - ERROR - stderr - 50%|█████ | 11232/22434 [11:05:42<35:10:03, 11.30s/it] +2025-02-05 21:13:25 - ERROR - stderr - 50%|█████ | 11233/22434 [11:05:45<26:56:31, 8.66s/it] +2025-02-05 21:13:25 - ERROR - stderr - +2025-02-05 21:13:25 - ERROR - stderr - +2025-02-05 21:13:25 - INFO - stdout - {'loss': 0.7795, 'grad_norm': 1.306370496749878, 'learning_rate': 1.0463276787749004e-05, 'epoch': 1.5} +2025-02-05 21:13:25 - ERROR - stderr - 50%|█████ | 11233/22434 [11:05:45<26:56:31, 8.66s/it] +2025-02-05 21:13:28 - ERROR - stderr - 50%|█████ | 11234/22434 [11:05:47<21:21:24, 6.86s/it] +2025-02-05 21:13:28 - ERROR - stderr - +2025-02-05 21:13:28 - 
ERROR - stderr - +2025-02-05 21:13:28 - INFO - stdout - {'loss': 0.7068, 'grad_norm': 1.2223362922668457, 'learning_rate': 1.0461834586457398e-05, 'epoch': 1.5} +2025-02-05 21:13:28 - ERROR - stderr - 50%|█████ | 11234/22434 [11:05:47<21:21:24, 6.86s/it] +2025-02-05 21:13:41 - ERROR - stderr - 50%|█████ | 11235/22434 [11:06:01<27:52:00, 8.96s/it] +2025-02-05 21:13:41 - ERROR - stderr - +2025-02-05 21:13:41 - ERROR - stderr - +2025-02-05 21:13:41 - INFO - stdout - {'loss': 0.663, 'grad_norm': 1.0923806428909302, 'learning_rate': 1.0460392375539293e-05, 'epoch': 1.5} +2025-02-05 21:13:41 - ERROR - stderr - 50%|█████ | 11235/22434 [11:06:01<27:52:00, 8.96s/it] +2025-02-05 21:13:53 - ERROR - stderr - 50%|█████ | 11236/22434 [11:06:13<30:38:14, 9.85s/it] +2025-02-05 21:13:53 - ERROR - stderr - +2025-02-05 21:13:53 - ERROR - stderr - +2025-02-05 21:13:53 - INFO - stdout - {'loss': 0.6382, 'grad_norm': 1.1650400161743164, 'learning_rate': 1.0458950155024745e-05, 'epoch': 1.5} +2025-02-05 21:13:53 - ERROR - stderr - 50%|█████ | 11236/22434 [11:06:13<30:38:14, 9.85s/it] +2025-02-05 21:13:57 - ERROR - stderr - 50%|█████ | 11237/22434 [11:06:16<24:28:31, 7.87s/it] +2025-02-05 21:13:57 - ERROR - stderr - +2025-02-05 21:13:57 - ERROR - stderr - +2025-02-05 21:13:57 - INFO - stdout - {'loss': 0.812, 'grad_norm': 1.261993169784546, 'learning_rate': 1.0457507924943829e-05, 'epoch': 1.5} +2025-02-05 21:13:57 - ERROR - stderr - 50%|█████ | 11237/22434 [11:06:16<24:28:31, 7.87s/it] +2025-02-05 21:14:07 - ERROR - stderr - 50%|█████ | 11238/22434 [11:06:27<27:05:00, 8.71s/it] +2025-02-05 21:14:07 - ERROR - stderr - +2025-02-05 21:14:07 - ERROR - stderr - +2025-02-05 21:14:07 - INFO - stdout - {'loss': 0.724, 'grad_norm': 1.2129238843917847, 'learning_rate': 1.0456065685326591e-05, 'epoch': 1.5} +2025-02-05 21:14:07 - ERROR - stderr - 50%|█████ | 11238/22434 [11:06:27<27:05:00, 8.71s/it] +2025-02-05 21:14:33 - ERROR - stderr - 50%|█████ | 11239/22434 [11:06:53<43:02:15, 13.84s/it] 
+2025-02-05 21:14:33 - ERROR - stderr - +2025-02-05 21:14:33 - ERROR - stderr - +2025-02-05 21:14:33 - INFO - stdout - {'loss': 0.7663, 'grad_norm': 1.1511640548706055, 'learning_rate': 1.0454623436203102e-05, 'epoch': 1.5} +2025-02-05 21:14:33 - ERROR - stderr - 50%|█████ | 11239/22434 [11:06:53<43:02:15, 13.84s/it] +2025-02-05 21:14:35 - ERROR - stderr - 50%|█████ | 11240/22434 [11:06:55<32:22:58, 10.41s/it] +2025-02-05 21:14:36 - ERROR - stderr - +2025-02-05 21:14:36 - ERROR - stderr - +2025-02-05 21:14:36 - INFO - stdout - {'loss': 0.732, 'grad_norm': 1.4265037775039673, 'learning_rate': 1.0453181177603424e-05, 'epoch': 1.5} +2025-02-05 21:14:36 - ERROR - stderr - 50%|█████ | 11240/22434 [11:06:55<32:22:58, 10.41s/it] +2025-02-05 21:14:48 - ERROR - stderr - 50%|█████ | 11241/22434 [11:07:07<34:03:31, 10.95s/it] +2025-02-05 21:14:48 - ERROR - stderr - +2025-02-05 21:14:48 - ERROR - stderr - +2025-02-05 21:14:48 - INFO - stdout - {'loss': 0.7428, 'grad_norm': 1.3808835744857788, 'learning_rate': 1.0451738909557617e-05, 'epoch': 1.5} +2025-02-05 21:14:48 - ERROR - stderr - 50%|█████ | 11241/22434 [11:07:08<34:03:31, 10.95s/it] +2025-02-05 21:14:59 - ERROR - stderr - 50%|█████ | 11242/22434 [11:07:19<34:39:20, 11.15s/it] +2025-02-05 21:14:59 - ERROR - stderr - +2025-02-05 21:14:59 - ERROR - stderr - +2025-02-05 21:14:59 - INFO - stdout - {'loss': 0.7187, 'grad_norm': 1.3154296875, 'learning_rate': 1.0450296632095745e-05, 'epoch': 1.5} +2025-02-05 21:14:59 - ERROR - stderr - 50%|█████ | 11242/22434 [11:07:19<34:39:20, 11.15s/it] +2025-02-05 21:15:12 - ERROR - stderr - 50%|█████ | 11243/22434 [11:07:32<36:27:57, 11.73s/it] +2025-02-05 21:15:12 - ERROR - stderr - +2025-02-05 21:15:12 - ERROR - stderr - +2025-02-05 21:15:12 - INFO - stdout - {'loss': 0.7574, 'grad_norm': 1.3440579175949097, 'learning_rate': 1.044885434524787e-05, 'epoch': 1.5} +2025-02-05 21:15:12 - ERROR - stderr - 50%|█████ | 11243/22434 [11:07:32<36:27:57, 11.73s/it] +2025-02-05 21:15:15 - ERROR - 
stderr - 50%|█████ | 11244/22434 [11:07:35<27:57:53, 9.00s/it] +2025-02-05 21:15:15 - ERROR - stderr - +2025-02-05 21:15:15 - ERROR - stderr - +2025-02-05 21:15:15 - INFO - stdout - {'loss': 0.6987, 'grad_norm': 1.2270103693008423, 'learning_rate': 1.0447412049044055e-05, 'epoch': 1.5} +2025-02-05 21:15:15 - ERROR - stderr - 50%|█████ | 11244/22434 [11:07:35<27:57:53, 9.00s/it] +2025-02-05 21:15:26 - ERROR - stderr - 50%|█████ | 11245/22434 [11:07:46<30:03:07, 9.67s/it] +2025-02-05 21:15:26 - ERROR - stderr - +2025-02-05 21:15:26 - ERROR - stderr - +2025-02-05 21:15:26 - INFO - stdout - {'loss': 0.7693, 'grad_norm': 1.2899839878082275, 'learning_rate': 1.0445969743514365e-05, 'epoch': 1.5} +2025-02-05 21:15:26 - ERROR - stderr - 50%|█████ | 11245/22434 [11:07:46<30:03:07, 9.67s/it] +2025-02-05 21:15:37 - ERROR - stderr - 50%|█████ | 11246/22434 [11:07:57<31:08:20, 10.02s/it] +2025-02-05 21:15:37 - ERROR - stderr - +2025-02-05 21:15:37 - ERROR - stderr - +2025-02-05 21:15:37 - INFO - stdout - {'loss': 0.7688, 'grad_norm': 1.2557570934295654, 'learning_rate': 1.0444527428688864e-05, 'epoch': 1.5} +2025-02-05 21:15:37 - ERROR - stderr - 50%|█████ | 11246/22434 [11:07:57<31:08:20, 10.02s/it] +2025-02-05 21:15:40 - ERROR - stderr - 50%|█████ | 11247/22434 [11:07:59<24:13:01, 7.79s/it] +2025-02-05 21:15:40 - ERROR - stderr - +2025-02-05 21:15:40 - ERROR - stderr - +2025-02-05 21:15:40 - INFO - stdout - {'loss': 0.655, 'grad_norm': 1.0963035821914673, 'learning_rate': 1.0443085104597612e-05, 'epoch': 1.5} +2025-02-05 21:15:40 - ERROR - stderr - 50%|█████ | 11247/22434 [11:07:59<24:13:01, 7.79s/it] +2025-02-05 21:15:42 - ERROR - stderr - 50%|█████ | 11248/22434 [11:08:02<19:14:33, 6.19s/it] +2025-02-05 21:15:42 - ERROR - stderr - +2025-02-05 21:15:42 - ERROR - stderr - +2025-02-05 21:15:42 - INFO - stdout - {'loss': 0.7554, 'grad_norm': 1.2186487913131714, 'learning_rate': 1.0441642771270675e-05, 'epoch': 1.5} +2025-02-05 21:15:42 - ERROR - stderr - 50%|█████ | 11248/22434 
[11:08:02<19:14:33, 6.19s/it] +2025-02-05 21:15:45 - ERROR - stderr - 50%|█████ | 11249/22434 [11:08:04<15:47:45, 5.08s/it] +2025-02-05 21:15:45 - ERROR - stderr - +2025-02-05 21:15:45 - ERROR - stderr - +2025-02-05 21:15:45 - INFO - stdout - {'loss': 0.6849, 'grad_norm': 1.0940096378326416, 'learning_rate': 1.0440200428738119e-05, 'epoch': 1.5} +2025-02-05 21:15:45 - ERROR - stderr - 50%|█████ | 11249/22434 [11:08:04<15:47:45, 5.08s/it] +2025-02-05 21:16:02 - ERROR - stderr - 50%|█████ | 11250/22434 [11:08:21<26:54:12, 8.66s/it] +2025-02-05 21:16:02 - ERROR - stderr - +2025-02-05 21:16:02 - ERROR - stderr - +2025-02-05 21:16:02 - INFO - stdout - {'loss': 0.7787, 'grad_norm': 1.2495222091674805, 'learning_rate': 1.0438758077030002e-05, 'epoch': 1.5} +2025-02-05 21:16:02 - ERROR - stderr - 50%|█████ | 11250/22434 [11:08:21<26:54:12, 8.66s/it] +2025-02-05 21:16:18 - ERROR - stderr - 50%|█████ | 11251/22434 [11:08:37<33:38:30, 10.83s/it] +2025-02-05 21:16:18 - ERROR - stderr - +2025-02-05 21:16:18 - ERROR - stderr - +2025-02-05 21:16:18 - INFO - stdout - {'loss': 0.7177, 'grad_norm': 1.278853178024292, 'learning_rate': 1.0437315716176398e-05, 'epoch': 1.5} +2025-02-05 21:16:18 - ERROR - stderr - 50%|█████ | 11251/22434 [11:08:37<33:38:30, 10.83s/it] +2025-02-05 21:16:20 - ERROR - stderr - 50%|█████ | 11252/22434 [11:08:40<25:52:36, 8.33s/it] +2025-02-05 21:16:20 - ERROR - stderr - +2025-02-05 21:16:20 - ERROR - stderr - +2025-02-05 21:16:20 - INFO - stdout - {'loss': 0.6526, 'grad_norm': 1.1386044025421143, 'learning_rate': 1.0435873346207362e-05, 'epoch': 1.5} +2025-02-05 21:16:20 - ERROR - stderr - 50%|█████ | 11252/22434 [11:08:40<25:52:36, 8.33s/it] +2025-02-05 21:16:23 - ERROR - stderr - 50%|█████ | 11253/22434 [11:08:42<20:31:16, 6.61s/it] +2025-02-05 21:16:23 - ERROR - stderr - +2025-02-05 21:16:23 - ERROR - stderr - +2025-02-05 21:16:23 - INFO - stdout - {'loss': 0.7469, 'grad_norm': 1.2027910947799683, 'learning_rate': 1.0434430967152966e-05, 'epoch': 1.5} 
+2025-02-05 21:16:23 - ERROR - stderr - 50%|█████ | 11253/22434 [11:08:42<20:31:16, 6.61s/it] +2025-02-05 21:16:38 - ERROR - stderr - 50%|█████ | 11254/22434 [11:08:58<28:29:35, 9.17s/it] +2025-02-05 21:16:38 - ERROR - stderr - +2025-02-05 21:16:38 - ERROR - stderr - +2025-02-05 21:16:38 - INFO - stdout - {'loss': 0.6259, 'grad_norm': 1.0777400732040405, 'learning_rate': 1.0432988579043273e-05, 'epoch': 1.5} +2025-02-05 21:16:38 - ERROR - stderr - 50%|█████ | 11254/22434 [11:08:58<28:29:35, 9.17s/it] +2025-02-05 21:16:52 - ERROR - stderr - 50%|█████ | 11255/22434 [11:09:12<33:21:59, 10.75s/it] +2025-02-05 21:16:52 - ERROR - stderr - +2025-02-05 21:16:52 - ERROR - stderr - +2025-02-05 21:16:52 - INFO - stdout - {'loss': 0.6709, 'grad_norm': 1.1165553331375122, 'learning_rate': 1.0431546181908343e-05, 'epoch': 1.51} +2025-02-05 21:16:52 - ERROR - stderr - 50%|█████ | 11255/22434 [11:09:12<33:21:59, 10.75s/it] +2025-02-05 21:17:05 - ERROR - stderr - 50%|█████ | 11256/22434 [11:09:25<35:41:02, 11.49s/it] +2025-02-05 21:17:05 - ERROR - stderr - +2025-02-05 21:17:05 - ERROR - stderr - +2025-02-05 21:17:05 - INFO - stdout - {'loss': 0.7581, 'grad_norm': 1.298244595527649, 'learning_rate': 1.0430103775778249e-05, 'epoch': 1.51} +2025-02-05 21:17:05 - ERROR - stderr - 50%|█████ | 11256/22434 [11:09:25<35:41:02, 11.49s/it] +2025-02-05 21:17:19 - ERROR - stderr - 50%|█████ | 11257/22434 [11:09:38<37:21:44, 12.03s/it] +2025-02-05 21:17:19 - ERROR - stderr - +2025-02-05 21:17:19 - ERROR - stderr - +2025-02-05 21:17:19 - INFO - stdout - {'loss': 0.6969, 'grad_norm': 1.2060997486114502, 'learning_rate': 1.0428661360683055e-05, 'epoch': 1.51} +2025-02-05 21:17:19 - ERROR - stderr - 50%|█████ | 11257/22434 [11:09:39<37:21:44, 12.03s/it] +2025-02-05 21:17:21 - ERROR - stderr - 50%|█████ | 11258/22434 [11:09:41<28:28:49, 9.17s/it] +2025-02-05 21:17:21 - ERROR - stderr - +2025-02-05 21:17:21 - ERROR - stderr - +2025-02-05 21:17:21 - INFO - stdout - {'loss': 0.7801, 'grad_norm': 
1.2900875806808472, 'learning_rate': 1.0427218936652821e-05, 'epoch': 1.51} +2025-02-05 21:17:21 - ERROR - stderr - 50%|█████ | 11258/22434 [11:09:41<28:28:49, 9.17s/it] +2025-02-05 21:17:25 - ERROR - stderr - 50%|█████ | 11259/22434 [11:09:45<23:13:46, 7.48s/it] +2025-02-05 21:17:25 - ERROR - stderr - +2025-02-05 21:17:25 - ERROR - stderr - +2025-02-05 21:17:25 - INFO - stdout - {'loss': 0.6634, 'grad_norm': 1.1401000022888184, 'learning_rate': 1.042577650371762e-05, 'epoch': 1.51} +2025-02-05 21:17:25 - ERROR - stderr - 50%|█████ | 11259/22434 [11:09:45<23:13:46, 7.48s/it] +2025-02-05 21:17:44 - ERROR - stderr - 50%|█████ | 11260/22434 [11:10:04<34:28:57, 11.11s/it] +2025-02-05 21:17:44 - ERROR - stderr - +2025-02-05 21:17:44 - ERROR - stderr - +2025-02-05 21:17:44 - INFO - stdout - {'loss': 0.7152, 'grad_norm': 1.2181081771850586, 'learning_rate': 1.0424334061907513e-05, 'epoch': 1.51} +2025-02-05 21:17:44 - ERROR - stderr - 50%|█████ | 11260/22434 [11:10:04<34:28:57, 11.11s/it] +2025-02-05 21:17:55 - ERROR - stderr - 50%|█████ | 11261/22434 [11:10:15<34:22:10, 11.07s/it] +2025-02-05 21:17:55 - ERROR - stderr - +2025-02-05 21:17:55 - ERROR - stderr - +2025-02-05 21:17:55 - INFO - stdout - {'loss': 0.7402, 'grad_norm': 1.2649118900299072, 'learning_rate': 1.042289161125257e-05, 'epoch': 1.51} +2025-02-05 21:17:55 - ERROR - stderr - 50%|█████ | 11261/22434 [11:10:15<34:22:10, 11.07s/it] +2025-02-05 21:18:11 - ERROR - stderr - 50%|█████ | 11262/22434 [11:10:31<38:55:21, 12.54s/it] +2025-02-05 21:18:11 - ERROR - stderr - +2025-02-05 21:18:11 - ERROR - stderr - +2025-02-05 21:18:11 - INFO - stdout - {'loss': 0.6749, 'grad_norm': 1.1299681663513184, 'learning_rate': 1.0421449151782855e-05, 'epoch': 1.51} +2025-02-05 21:18:11 - ERROR - stderr - 50%|█████ | 11262/22434 [11:10:31<38:55:21, 12.54s/it] +2025-02-05 21:18:25 - ERROR - stderr - 50%|█████ | 11263/22434 [11:10:44<39:41:12, 12.79s/it] +2025-02-05 21:18:25 - ERROR - stderr - +2025-02-05 21:18:25 - ERROR - stderr 
- +2025-02-05 21:18:25 - INFO - stdout - {'loss': 0.6826, 'grad_norm': 1.0603952407836914, 'learning_rate': 1.0420006683528436e-05, 'epoch': 1.51} +2025-02-05 21:18:25 - ERROR - stderr - 50%|█████ | 11263/22434 [11:10:44<39:41:12, 12.79s/it] +2025-02-05 21:18:27 - ERROR - stderr - 50%|█████ | 11264/22434 [11:10:47<30:06:21, 9.70s/it] +2025-02-05 21:18:27 - ERROR - stderr - +2025-02-05 21:18:27 - ERROR - stderr - +2025-02-05 21:18:27 - INFO - stdout - {'loss': 0.7543, 'grad_norm': 1.2336446046829224, 'learning_rate': 1.0418564206519379e-05, 'epoch': 1.51} +2025-02-05 21:18:27 - ERROR - stderr - 50%|█████ | 11264/22434 [11:10:47<30:06:21, 9.70s/it] +2025-02-05 21:18:38 - ERROR - stderr - 50%|█████ | 11265/22434 [11:10:58<31:28:36, 10.15s/it] +2025-02-05 21:18:38 - ERROR - stderr - +2025-02-05 21:18:38 - ERROR - stderr - +2025-02-05 21:18:38 - INFO - stdout - {'loss': 0.7113, 'grad_norm': 1.2501355409622192, 'learning_rate': 1.0417121720785758e-05, 'epoch': 1.51} +2025-02-05 21:18:38 - ERROR - stderr - 50%|█████ | 11265/22434 [11:10:58<31:28:36, 10.15s/it] +2025-02-05 21:18:41 - ERROR - stderr - 50%|█████ | 11266/22434 [11:11:01<24:22:59, 7.86s/it] +2025-02-05 21:18:41 - ERROR - stderr - +2025-02-05 21:18:41 - ERROR - stderr - +2025-02-05 21:18:41 - INFO - stdout - {'loss': 0.6457, 'grad_norm': 1.0364837646484375, 'learning_rate': 1.0415679226357627e-05, 'epoch': 1.51} +2025-02-05 21:18:41 - ERROR - stderr - 50%|█████ | 11266/22434 [11:11:01<24:22:59, 7.86s/it] +2025-02-05 21:18:43 - ERROR - stderr - 50%|█████ | 11267/22434 [11:11:03<19:21:43, 6.24s/it] +2025-02-05 21:18:43 - ERROR - stderr - +2025-02-05 21:18:43 - ERROR - stderr - +2025-02-05 21:18:43 - INFO - stdout - {'loss': 0.7702, 'grad_norm': 1.3113071918487549, 'learning_rate': 1.0414236723265062e-05, 'epoch': 1.51} +2025-02-05 21:18:43 - ERROR - stderr - 50%|█████ | 11267/22434 [11:11:03<19:21:43, 6.24s/it] +2025-02-05 21:18:46 - ERROR - stderr - 50%|█████ | 11268/22434 [11:11:06<15:50:25, 5.11s/it] 
+2025-02-05 21:18:46 - ERROR - stderr - +2025-02-05 21:18:46 - ERROR - stderr - +2025-02-05 21:18:46 - INFO - stdout - {'loss': 0.7518, 'grad_norm': 1.3548494577407837, 'learning_rate': 1.0412794211538125e-05, 'epoch': 1.51} +2025-02-05 21:18:46 - ERROR - stderr - 50%|█████ | 11268/22434 [11:11:06<15:50:25, 5.11s/it] +2025-02-05 21:19:02 - ERROR - stderr - 50%|█████ | 11269/22434 [11:11:21<25:50:47, 8.33s/it] +2025-02-05 21:19:02 - ERROR - stderr - +2025-02-05 21:19:02 - ERROR - stderr - +2025-02-05 21:19:02 - INFO - stdout - {'loss': 0.7391, 'grad_norm': 1.1755337715148926, 'learning_rate': 1.0411351691206894e-05, 'epoch': 1.51} +2025-02-05 21:19:02 - ERROR - stderr - 50%|█████ | 11269/22434 [11:11:21<25:50:47, 8.33s/it] +2025-02-05 21:19:04 - ERROR - stderr - 50%|█████ | 11270/22434 [11:11:24<20:23:23, 6.57s/it] +2025-02-05 21:19:04 - ERROR - stderr - +2025-02-05 21:19:04 - ERROR - stderr - +2025-02-05 21:19:04 - INFO - stdout - {'loss': 0.661, 'grad_norm': 1.1628522872924805, 'learning_rate': 1.0409909162301428e-05, 'epoch': 1.51} +2025-02-05 21:19:04 - ERROR - stderr - 50%|█████ | 11270/22434 [11:11:24<20:23:23, 6.57s/it] +2025-02-05 21:19:07 - ERROR - stderr - 50%|█████ | 11271/22434 [11:11:26<16:35:28, 5.35s/it] +2025-02-05 21:19:07 - ERROR - stderr - +2025-02-05 21:19:07 - ERROR - stderr - +2025-02-05 21:19:07 - INFO - stdout - {'loss': 0.6269, 'grad_norm': 1.1194788217544556, 'learning_rate': 1.0408466624851796e-05, 'epoch': 1.51} +2025-02-05 21:19:07 - ERROR - stderr - 50%|█████ | 11271/22434 [11:11:26<16:35:28, 5.35s/it] +2025-02-05 21:19:18 - ERROR - stderr - 50%|█████ | 11272/22434 [11:11:37<21:55:12, 7.07s/it] +2025-02-05 21:19:18 - ERROR - stderr - +2025-02-05 21:19:18 - ERROR - stderr - +2025-02-05 21:19:18 - INFO - stdout - {'loss': 0.7609, 'grad_norm': 1.3749436140060425, 'learning_rate': 1.040702407888807e-05, 'epoch': 1.51} +2025-02-05 21:19:18 - ERROR - stderr - 50%|█████ | 11272/22434 [11:11:38<21:55:12, 7.07s/it] +2025-02-05 21:19:20 - ERROR - 
stderr - 50%|█████ | 11273/22434 [11:11:40<17:36:53, 5.68s/it] +2025-02-05 21:19:20 - ERROR - stderr - +2025-02-05 21:19:20 - ERROR - stderr - +2025-02-05 21:19:20 - INFO - stdout - {'loss': 0.7187, 'grad_norm': 1.265852928161621, 'learning_rate': 1.0405581524440318e-05, 'epoch': 1.51} +2025-02-05 21:19:20 - ERROR - stderr - 50%|█████ | 11273/22434 [11:11:40<17:36:53, 5.68s/it] +2025-02-05 21:19:23 - ERROR - stderr - 50%|█████ | 11274/22434 [11:11:42<14:36:10, 4.71s/it] +2025-02-05 21:19:23 - ERROR - stderr - +2025-02-05 21:19:23 - ERROR - stderr - +2025-02-05 21:19:23 - INFO - stdout - {'loss': 0.7428, 'grad_norm': 1.3400779962539673, 'learning_rate': 1.0404138961538603e-05, 'epoch': 1.51} +2025-02-05 21:19:23 - ERROR - stderr - 50%|█████ | 11274/22434 [11:11:42<14:36:10, 4.71s/it] +2025-02-05 21:19:41 - ERROR - stderr - 50%|█████ | 11275/22434 [11:12:01<27:19:42, 8.82s/it] +2025-02-05 21:19:41 - ERROR - stderr - +2025-02-05 21:19:41 - ERROR - stderr - +2025-02-05 21:19:41 - INFO - stdout - {'loss': 0.7566, 'grad_norm': 1.3339792490005493, 'learning_rate': 1.0402696390213e-05, 'epoch': 1.51} +2025-02-05 21:19:41 - ERROR - stderr - 50%|█████ | 11275/22434 [11:12:01<27:19:42, 8.82s/it] +2025-02-05 21:19:55 - ERROR - stderr - 50%|█████ | 11276/22434 [11:12:15<32:28:21, 10.48s/it] +2025-02-05 21:19:55 - ERROR - stderr - +2025-02-05 21:19:55 - ERROR - stderr - +2025-02-05 21:19:55 - INFO - stdout - {'loss': 0.7929, 'grad_norm': 1.449597716331482, 'learning_rate': 1.0401253810493579e-05, 'epoch': 1.51} +2025-02-05 21:19:55 - ERROR - stderr - 50%|█████ | 11276/22434 [11:12:15<32:28:21, 10.48s/it] +2025-02-05 21:19:58 - ERROR - stderr - 50%|█████ | 11277/22434 [11:12:18<25:07:53, 8.11s/it] +2025-02-05 21:19:58 - ERROR - stderr - +2025-02-05 21:19:58 - ERROR - stderr - +2025-02-05 21:19:58 - INFO - stdout - {'loss': 0.7336, 'grad_norm': 1.2467231750488281, 'learning_rate': 1.0399811222410405e-05, 'epoch': 1.51} +2025-02-05 21:19:58 - ERROR - stderr - 50%|█████ | 
11277/22434 [11:12:18<25:07:53, 8.11s/it] +2025-02-05 21:20:01 - ERROR - stderr - 50%|█████ | 11278/22434 [11:12:20<20:11:10, 6.51s/it] +2025-02-05 21:20:01 - ERROR - stderr - +2025-02-05 21:20:01 - ERROR - stderr - +2025-02-05 21:20:01 - INFO - stdout - {'loss': 0.7176, 'grad_norm': 1.3466869592666626, 'learning_rate': 1.0398368625993546e-05, 'epoch': 1.51} +2025-02-05 21:20:01 - ERROR - stderr - 50%|█████ | 11278/22434 [11:12:21<20:11:10, 6.51s/it] +2025-02-05 21:20:12 - ERROR - stderr - 50%|█████ | 11279/22434 [11:12:32<24:36:26, 7.94s/it] +2025-02-05 21:20:12 - ERROR - stderr - +2025-02-05 21:20:12 - ERROR - stderr - +2025-02-05 21:20:12 - INFO - stdout - {'loss': 0.6873, 'grad_norm': 1.1303359270095825, 'learning_rate': 1.0396926021273076e-05, 'epoch': 1.51} +2025-02-05 21:20:12 - ERROR - stderr - 50%|█████ | 11279/22434 [11:12:32<24:36:26, 7.94s/it] +2025-02-05 21:20:14 - ERROR - stderr - 50%|█████ | 11280/22434 [11:12:34<19:34:04, 6.32s/it] +2025-02-05 21:20:15 - ERROR - stderr - +2025-02-05 21:20:15 - ERROR - stderr - +2025-02-05 21:20:15 - INFO - stdout - {'loss': 0.7528, 'grad_norm': 1.2739181518554688, 'learning_rate': 1.0395483408279063e-05, 'epoch': 1.51} +2025-02-05 21:20:15 - ERROR - stderr - 50%|█████ | 11280/22434 [11:12:34<19:34:04, 6.32s/it] +2025-02-05 21:20:17 - ERROR - stderr - 50%|█████ | 11281/22434 [11:12:37<15:56:03, 5.14s/it] +2025-02-05 21:20:17 - ERROR - stderr - +2025-02-05 21:20:17 - ERROR - stderr - +2025-02-05 21:20:17 - INFO - stdout - {'loss': 0.6703, 'grad_norm': 1.331796646118164, 'learning_rate': 1.0394040787041576e-05, 'epoch': 1.51} +2025-02-05 21:20:17 - ERROR - stderr - 50%|█████ | 11281/22434 [11:12:37<15:56:03, 5.14s/it] +2025-02-05 21:20:28 - ERROR - stderr - 50%|█████ | 11282/22434 [11:12:47<21:06:41, 6.82s/it] +2025-02-05 21:20:28 - ERROR - stderr - +2025-02-05 21:20:28 - ERROR - stderr - +2025-02-05 21:20:28 - INFO - stdout - {'loss': 0.8308, 'grad_norm': 1.4136468172073364, 'learning_rate': 1.0392598157590687e-05, 
'epoch': 1.51} +2025-02-05 21:20:28 - ERROR - stderr - 50%|█████ | 11282/22434 [11:12:47<21:06:41, 6.82s/it] +2025-02-05 21:20:37 - ERROR - stderr - 50%|█████ | 11283/22434 [11:12:57<23:24:55, 7.56s/it] +2025-02-05 21:20:37 - ERROR - stderr - +2025-02-05 21:20:37 - ERROR - stderr - +2025-02-05 21:20:37 - INFO - stdout - {'loss': 0.682, 'grad_norm': 1.1866846084594727, 'learning_rate': 1.0391155519956464e-05, 'epoch': 1.51} +2025-02-05 21:20:37 - ERROR - stderr - 50%|█████ | 11283/22434 [11:12:57<23:24:55, 7.56s/it] +2025-02-05 21:20:47 - ERROR - stderr - 50%|█████ | 11284/22434 [11:13:07<26:02:15, 8.41s/it] +2025-02-05 21:20:47 - ERROR - stderr - +2025-02-05 21:20:47 - ERROR - stderr - +2025-02-05 21:20:47 - INFO - stdout - {'loss': 0.6925, 'grad_norm': 1.1944962739944458, 'learning_rate': 1.038971287416898e-05, 'epoch': 1.51} +2025-02-05 21:20:47 - ERROR - stderr - 50%|█████ | 11284/22434 [11:13:07<26:02:15, 8.41s/it] +2025-02-05 21:20:55 - ERROR - stderr - 50%|█████ | 11285/22434 [11:13:15<25:30:55, 8.24s/it] +2025-02-05 21:20:55 - ERROR - stderr - +2025-02-05 21:20:55 - ERROR - stderr - +2025-02-05 21:20:55 - INFO - stdout - {'loss': 0.6863, 'grad_norm': 1.2064961194992065, 'learning_rate': 1.0388270220258305e-05, 'epoch': 1.51} +2025-02-05 21:20:55 - ERROR - stderr - 50%|█████ | 11285/22434 [11:13:15<25:30:55, 8.24s/it] +2025-02-05 21:20:58 - ERROR - stderr - 50%|█████ | 11286/22434 [11:13:17<20:09:31, 6.51s/it] +2025-02-05 21:20:58 - ERROR - stderr - +2025-02-05 21:20:58 - ERROR - stderr - +2025-02-05 21:20:58 - INFO - stdout - {'loss': 0.7089, 'grad_norm': 1.2031103372573853, 'learning_rate': 1.0386827558254507e-05, 'epoch': 1.51} +2025-02-05 21:20:58 - ERROR - stderr - 50%|█████ | 11286/22434 [11:13:17<20:09:31, 6.51s/it] +2025-02-05 21:21:09 - ERROR - stderr - 50%|█████ | 11287/22434 [11:13:28<24:18:58, 7.85s/it] +2025-02-05 21:21:09 - ERROR - stderr - +2025-02-05 21:21:09 - ERROR - stderr - +2025-02-05 21:21:09 - INFO - stdout - {'loss': 0.6946, 
'grad_norm': 1.0480718612670898, 'learning_rate': 1.0385384888187656e-05, 'epoch': 1.51} +2025-02-05 21:21:09 - ERROR - stderr - 50%|█████ | 11287/22434 [11:13:28<24:18:58, 7.85s/it] +2025-02-05 21:21:11 - ERROR - stderr - 50%|█████ | 11288/22434 [11:13:31<19:38:40, 6.34s/it] +2025-02-05 21:21:11 - ERROR - stderr - +2025-02-05 21:21:11 - ERROR - stderr - +2025-02-05 21:21:11 - INFO - stdout - {'loss': 0.6099, 'grad_norm': 1.0089476108551025, 'learning_rate': 1.0383942210087827e-05, 'epoch': 1.51} +2025-02-05 21:21:11 - ERROR - stderr - 50%|█████ | 11288/22434 [11:13:31<19:38:40, 6.34s/it] +2025-02-05 21:21:14 - ERROR - stderr - 50%|█████ | 11289/22434 [11:13:34<16:01:40, 5.18s/it] +2025-02-05 21:21:14 - ERROR - stderr - +2025-02-05 21:21:14 - ERROR - stderr - +2025-02-05 21:21:14 - INFO - stdout - {'loss': 0.6738, 'grad_norm': 1.278743863105774, 'learning_rate': 1.0382499523985094e-05, 'epoch': 1.51} +2025-02-05 21:21:14 - ERROR - stderr - 50%|█████ | 11289/22434 [11:13:34<16:01:40, 5.18s/it] +2025-02-05 21:21:29 - ERROR - stderr - 50%|█████ | 11290/22434 [11:13:49<25:13:50, 8.15s/it] +2025-02-05 21:21:29 - ERROR - stderr - +2025-02-05 21:21:29 - ERROR - stderr - +2025-02-05 21:21:29 - INFO - stdout - {'loss': 0.7567, 'grad_norm': 1.150586485862732, 'learning_rate': 1.0381056829909522e-05, 'epoch': 1.51} +2025-02-05 21:21:29 - ERROR - stderr - 50%|█████ | 11290/22434 [11:13:49<25:13:50, 8.15s/it] +2025-02-05 21:21:38 - ERROR - stderr - 50%|█████ | 11291/22434 [11:13:58<26:26:22, 8.54s/it] +2025-02-05 21:21:38 - ERROR - stderr - +2025-02-05 21:21:38 - ERROR - stderr - +2025-02-05 21:21:38 - INFO - stdout - {'loss': 0.7024, 'grad_norm': 1.309959888458252, 'learning_rate': 1.0379614127891185e-05, 'epoch': 1.51} +2025-02-05 21:21:38 - ERROR - stderr - 50%|█████ | 11291/22434 [11:13:58<26:26:22, 8.54s/it] +2025-02-05 21:21:57 - ERROR - stderr - 50%|█████ | 11292/22434 [11:14:17<35:59:14, 11.63s/it] +2025-02-05 21:21:57 - ERROR - stderr - +2025-02-05 21:21:57 - ERROR - 
stderr - +2025-02-05 21:21:57 - INFO - stdout - {'loss': 0.7617, 'grad_norm': 1.3697991371154785, 'learning_rate': 1.0378171417960152e-05, 'epoch': 1.51} +2025-02-05 21:21:57 - ERROR - stderr - 50%|█████ | 11292/22434 [11:14:17<35:59:14, 11.63s/it] +2025-02-05 21:22:00 - ERROR - stderr - 50%|█████ | 11293/22434 [11:14:20<27:46:01, 8.97s/it] +2025-02-05 21:22:00 - ERROR - stderr - +2025-02-05 21:22:00 - ERROR - stderr - +2025-02-05 21:22:00 - INFO - stdout - {'loss': 0.7784, 'grad_norm': 1.1924794912338257, 'learning_rate': 1.03767287001465e-05, 'epoch': 1.51} +2025-02-05 21:22:00 - ERROR - stderr - 50%|█████ | 11293/22434 [11:14:20<27:46:01, 8.97s/it] +2025-02-05 21:22:09 - ERROR - stderr - 50%|█████ | 11294/22434 [11:14:29<27:57:24, 9.03s/it] +2025-02-05 21:22:09 - ERROR - stderr - +2025-02-05 21:22:09 - ERROR - stderr - +2025-02-05 21:22:09 - INFO - stdout - {'loss': 0.6579, 'grad_norm': 1.2200721502304077, 'learning_rate': 1.03752859744803e-05, 'epoch': 1.51} +2025-02-05 21:22:09 - ERROR - stderr - 50%|█████ | 11294/22434 [11:14:29<27:57:24, 9.03s/it] +2025-02-05 21:22:12 - ERROR - stderr - 50%|█████ | 11295/22434 [11:14:32<21:57:41, 7.10s/it] +2025-02-05 21:22:12 - ERROR - stderr - +2025-02-05 21:22:12 - ERROR - stderr - +2025-02-05 21:22:12 - INFO - stdout - {'loss': 0.6492, 'grad_norm': 1.1638315916061401, 'learning_rate': 1.037384324099162e-05, 'epoch': 1.51} +2025-02-05 21:22:12 - ERROR - stderr - 50%|█████ | 11295/22434 [11:14:32<21:57:41, 7.10s/it] +2025-02-05 21:22:22 - ERROR - stderr - 50%|█████ | 11296/22434 [11:14:42<24:54:07, 8.05s/it] +2025-02-05 21:22:22 - ERROR - stderr - +2025-02-05 21:22:22 - ERROR - stderr - +2025-02-05 21:22:22 - INFO - stdout - {'loss': 0.7429, 'grad_norm': 1.2186846733093262, 'learning_rate': 1.0372400499710537e-05, 'epoch': 1.51} +2025-02-05 21:22:22 - ERROR - stderr - 50%|█████ | 11296/22434 [11:14:42<24:54:07, 8.05s/it] +2025-02-05 21:22:32 - ERROR - stderr - 50%|█████ | 11297/22434 [11:14:52<26:33:34, 8.59s/it]
+2025-02-05 21:22:32 - ERROR - stderr - +2025-02-05 21:22:32 - ERROR - stderr - +2025-02-05 21:22:32 - INFO - stdout - {'loss': 0.7457, 'grad_norm': 1.2120462656021118, 'learning_rate': 1.0370957750667125e-05, 'epoch': 1.51} +2025-02-05 21:22:32 - ERROR - stderr - 50%|█████ | 11297/22434 [11:14:52<26:33:34, 8.59s/it] +2025-02-05 21:22:39 - ERROR - stderr - 50%|█████ | 11298/22434 [11:14:58<24:47:16, 8.01s/it] +2025-02-05 21:22:39 - ERROR - stderr - +2025-02-05 21:22:39 - ERROR - stderr - +2025-02-05 21:22:39 - INFO - stdout - {'loss': 0.7483, 'grad_norm': 1.1597504615783691, 'learning_rate': 1.0369514993891451e-05, 'epoch': 1.51} +2025-02-05 21:22:39 - ERROR - stderr - 50%|█████ | 11298/22434 [11:14:58<24:47:16, 8.01s/it] +2025-02-05 21:22:41 - ERROR - stderr - 50%|█████ | 11299/22434 [11:15:01<19:42:05, 6.37s/it] +2025-02-05 21:22:41 - ERROR - stderr - +2025-02-05 21:22:41 - ERROR - stderr - +2025-02-05 21:22:41 - INFO - stdout - {'loss': 0.6638, 'grad_norm': 1.1799989938735962, 'learning_rate': 1.036807222941359e-05, 'epoch': 1.51} +2025-02-05 21:22:41 - ERROR - stderr - 50%|█████ | 11299/22434 [11:15:01<19:42:05, 6.37s/it] +2025-02-05 21:22:48 - ERROR - stderr - 50%|█████ | 11300/22434 [11:15:08<20:18:56, 6.57s/it] +2025-02-05 21:22:48 - ERROR - stderr - +2025-02-05 21:22:48 - ERROR - stderr - +2025-02-05 21:22:48 - INFO - stdout - {'loss': 0.6645, 'grad_norm': 1.1815595626831055, 'learning_rate': 1.0366629457263616e-05, 'epoch': 1.51} +2025-02-05 21:22:48 - ERROR - stderr - 50%|█████ | 11300/22434 [11:15:08<20:18:56, 6.57s/it] +2025-02-05 21:22:54 - ERROR - stderr - 50%|█████ | 11301/22434 [11:15:14<20:06:19, 6.50s/it] +2025-02-05 21:22:55 - ERROR - stderr - +2025-02-05 21:22:55 - ERROR - stderr - +2025-02-05 21:22:55 - INFO - stdout - {'loss': 0.6483, 'grad_norm': 1.1958928108215332, 'learning_rate': 1.0365186677471598e-05, 'epoch': 1.51} +2025-02-05 21:22:55 - ERROR - stderr - 50%|█████ | 11301/22434 [11:15:14<20:06:19, 6.50s/it] +2025-02-05 21:22:57 - ERROR 
- stderr - 50%|█████ | 11302/22434 [11:15:17<16:23:15, 5.30s/it] +2025-02-05 21:22:57 - ERROR - stderr - +2025-02-05 21:22:57 - ERROR - stderr - +2025-02-05 21:22:57 - INFO - stdout - {'loss': 0.6653, 'grad_norm': 1.2273719310760498, 'learning_rate': 1.0363743890067621e-05, 'epoch': 1.51} +2025-02-05 21:22:57 - ERROR - stderr - 50%|█████ | 11302/22434 [11:15:17<16:23:15, 5.30s/it] +2025-02-05 21:22:59 - ERROR - stderr - 50%|█████ | 11303/22434 [11:15:19<13:47:53, 4.46s/it] +2025-02-05 21:23:00 - ERROR - stderr - +2025-02-05 21:23:00 - ERROR - stderr - +2025-02-05 21:23:00 - INFO - stdout - {'loss': 0.6473, 'grad_norm': 1.1292232275009155, 'learning_rate': 1.0362301095081746e-05, 'epoch': 1.51} +2025-02-05 21:23:00 - ERROR - stderr - 50%|█████ | 11303/22434 [11:15:19<13:47:53, 4.46s/it] +2025-02-05 21:23:02 - ERROR - stderr - 50%|█████ | 11304/22434 [11:15:22<11:58:47, 3.87s/it] +2025-02-05 21:23:02 - ERROR - stderr - +2025-02-05 21:23:02 - ERROR - stderr - +2025-02-05 21:23:02 - INFO - stdout - {'loss': 0.6732, 'grad_norm': 1.2107740640640259, 'learning_rate': 1.0360858292544051e-05, 'epoch': 1.51} +2025-02-05 21:23:02 - ERROR - stderr - 50%|█████ | 11304/22434 [11:15:22<11:58:47, 3.87s/it] +2025-02-05 21:23:04 - ERROR - stderr - 50%|█████ | 11305/22434 [11:15:24<10:39:36, 3.45s/it] +2025-02-05 21:23:04 - ERROR - stderr - +2025-02-05 21:23:04 - ERROR - stderr - +2025-02-05 21:23:04 - INFO - stdout - {'loss': 0.7699, 'grad_norm': 1.2193636894226074, 'learning_rate': 1.035941548248461e-05, 'epoch': 1.51} +2025-02-05 21:23:04 - ERROR - stderr - 50%|█████ | 11305/22434 [11:15:24<10:39:36, 3.45s/it] +2025-02-05 21:23:10 - ERROR - stderr - 50%|█████ | 11306/22434 [11:15:30<12:53:59, 4.17s/it] +2025-02-05 21:23:10 - ERROR - stderr - +2025-02-05 21:23:10 - ERROR - stderr - +2025-02-05 21:23:10 - INFO - stdout - {'loss': 0.7149, 'grad_norm': 1.1529028415679932, 'learning_rate': 1.03579726649335e-05, 'epoch': 1.51} +2025-02-05 21:23:10 - ERROR - stderr - 50%|█████ | 
11306/22434 [11:15:30<12:53:59, 4.17s/it] +2025-02-05 21:23:17 - ERROR - stderr - 50%|█████ | 11307/22434 [11:15:36<14:50:31, 4.80s/it] +2025-02-05 21:23:17 - ERROR - stderr - +2025-02-05 21:23:17 - ERROR - stderr - +2025-02-05 21:23:17 - INFO - stdout - {'loss': 0.7472, 'grad_norm': 1.3412538766860962, 'learning_rate': 1.035652983992079e-05, 'epoch': 1.51} +2025-02-05 21:23:17 - ERROR - stderr - 50%|█████ | 11307/22434 [11:15:36<14:50:31, 4.80s/it] +2025-02-05 21:23:19 - ERROR - stderr - 50%|█████ | 11308/22434 [11:15:39<12:49:23, 4.15s/it] +2025-02-05 21:23:19 - ERROR - stderr - +2025-02-05 21:23:19 - ERROR - stderr - +2025-02-05 21:23:19 - INFO - stdout - {'loss': 0.734, 'grad_norm': 1.2334516048431396, 'learning_rate': 1.0355087007476558e-05, 'epoch': 1.51} +2025-02-05 21:23:19 - ERROR - stderr - 50%|█████ | 11308/22434 [11:15:39<12:49:23, 4.15s/it] +2025-02-05 21:23:27 - ERROR - stderr - 50%|█████ | 11309/22434 [11:15:47<16:02:24, 5.19s/it] +2025-02-05 21:23:27 - ERROR - stderr - +2025-02-05 21:23:27 - ERROR - stderr - +2025-02-05 21:23:27 - INFO - stdout - {'loss': 0.8163, 'grad_norm': 1.467167615890503, 'learning_rate': 1.0353644167630877e-05, 'epoch': 1.51} +2025-02-05 21:23:27 - ERROR - stderr - 50%|█████ | 11309/22434 [11:15:47<16:02:24, 5.19s/it] +2025-02-05 21:23:33 - ERROR - stderr - 50%|█████ | 11310/22434 [11:15:53<16:43:47, 5.41s/it] +2025-02-05 21:23:33 - ERROR - stderr - +2025-02-05 21:23:33 - ERROR - stderr - +2025-02-05 21:23:33 - INFO - stdout - {'loss': 0.703, 'grad_norm': 1.1186554431915283, 'learning_rate': 1.0352201320413822e-05, 'epoch': 1.51} +2025-02-05 21:23:33 - ERROR - stderr - 50%|█████ | 11310/22434 [11:15:53<16:43:47, 5.41s/it] +2025-02-05 21:23:41 - ERROR - stderr - 50%|█████ | 11311/22434 [11:16:00<19:00:14, 6.15s/it] +2025-02-05 21:23:41 - ERROR - stderr - +2025-02-05 21:23:41 - ERROR - stderr - +2025-02-05 21:23:41 - INFO - stdout - {'loss': 0.6771, 'grad_norm': 1.172777533531189, 'learning_rate': 1.0350758465855466e-05, 
'epoch': 1.51} +2025-02-05 21:23:41 - ERROR - stderr - 50%|█████ | 11311/22434 [11:16:00<19:00:14, 6.15s/it] +2025-02-05 21:23:43 - ERROR - stderr - 50%|█████ | 11312/22434 [11:16:03<15:36:03, 5.05s/it] +2025-02-05 21:23:43 - ERROR - stderr - +2025-02-05 21:23:43 - ERROR - stderr - +2025-02-05 21:23:43 - INFO - stdout - {'loss': 0.7665, 'grad_norm': 1.2928880453109741, 'learning_rate': 1.0349315603985886e-05, 'epoch': 1.51} +2025-02-05 21:23:43 - ERROR - stderr - 50%|█████ | 11312/22434 [11:16:03<15:36:03, 5.05s/it] +2025-02-05 21:23:46 - ERROR - stderr - 50%|█████ | 11313/22434 [11:16:05<13:11:52, 4.27s/it] +2025-02-05 21:23:46 - ERROR - stderr - +2025-02-05 21:23:46 - ERROR - stderr - +2025-02-05 21:23:46 - INFO - stdout - {'loss': 0.644, 'grad_norm': 1.0531476736068726, 'learning_rate': 1.0347872734835154e-05, 'epoch': 1.51} +2025-02-05 21:23:46 - ERROR - stderr - 50%|█████ | 11313/22434 [11:16:05<13:11:52, 4.27s/it] +2025-02-05 21:23:51 - ERROR - stderr - 50%|█████ | 11314/22434 [11:16:11<14:10:43, 4.59s/it] +2025-02-05 21:23:51 - ERROR - stderr - +2025-02-05 21:23:51 - ERROR - stderr - +2025-02-05 21:23:51 - INFO - stdout - {'loss': 0.8218, 'grad_norm': 1.444922685623169, 'learning_rate': 1.0346429858433354e-05, 'epoch': 1.51} +2025-02-05 21:23:51 - ERROR - stderr - 50%|█████ | 11314/22434 [11:16:11<14:10:43, 4.59s/it] +2025-02-05 21:23:59 - ERROR - stderr - 50%|█████ | 11315/22434 [11:16:19<17:21:49, 5.62s/it] +2025-02-05 21:23:59 - ERROR - stderr - +2025-02-05 21:23:59 - ERROR - stderr - +2025-02-05 21:23:59 - INFO - stdout - {'loss': 0.6892, 'grad_norm': 1.168660044670105, 'learning_rate': 1.0344986974810549e-05, 'epoch': 1.51} +2025-02-05 21:23:59 - ERROR - stderr - 50%|█████ | 11315/22434 [11:16:19<17:21:49, 5.62s/it] +2025-02-05 21:24:09 - ERROR - stderr - 50%|█████ | 11316/22434 [11:16:29<21:21:49, 6.92s/it] +2025-02-05 21:24:09 - ERROR - stderr - +2025-02-05 21:24:09 - ERROR - stderr - +2025-02-05 21:24:09 - INFO - stdout - {'loss': 0.6661, 
'grad_norm': 1.1563942432403564, 'learning_rate': 1.0343544083996824e-05, 'epoch': 1.51} +2025-02-05 21:24:09 - ERROR - stderr - 50%|█████ | 11316/22434 [11:16:29<21:21:49, 6.92s/it] +2025-02-05 21:24:11 - ERROR - stderr - 50%|█████ | 11317/22434 [11:16:31<17:11:59, 5.57s/it] +2025-02-05 21:24:11 - ERROR - stderr - +2025-02-05 21:24:11 - ERROR - stderr - +2025-02-05 21:24:11 - INFO - stdout - {'loss': 0.7088, 'grad_norm': 1.2931989431381226, 'learning_rate': 1.034210118602225e-05, 'epoch': 1.51} +2025-02-05 21:24:11 - ERROR - stderr - 50%|█████ | 11317/22434 [11:16:31<17:11:59, 5.57s/it] +2025-02-05 21:24:14 - ERROR - stderr - 50%|█████ | 11318/22434 [11:16:34<14:21:16, 4.65s/it] +2025-02-05 21:24:14 - ERROR - stderr - +2025-02-05 21:24:14 - ERROR - stderr - +2025-02-05 21:24:14 - INFO - stdout - {'loss': 0.7089, 'grad_norm': 1.141377329826355, 'learning_rate': 1.0340658280916906e-05, 'epoch': 1.51} +2025-02-05 21:24:14 - ERROR - stderr - 50%|█████ | 11318/22434 [11:16:34<14:21:16, 4.65s/it] +2025-02-05 21:24:16 - ERROR - stderr - 50%|█████ | 11319/22434 [11:16:36<12:22:48, 4.01s/it] +2025-02-05 21:24:16 - ERROR - stderr - +2025-02-05 21:24:16 - ERROR - stderr - +2025-02-05 21:24:16 - INFO - stdout - {'loss': 0.7478, 'grad_norm': 1.3198901414871216, 'learning_rate': 1.0339215368710862e-05, 'epoch': 1.51} +2025-02-05 21:24:16 - ERROR - stderr - 50%|█████ | 11319/22434 [11:16:36<12:22:48, 4.01s/it] +2025-02-05 21:24:24 - ERROR - stderr - 50%|█████ | 11320/22434 [11:16:44<16:11:36, 5.25s/it] +2025-02-05 21:24:24 - ERROR - stderr - +2025-02-05 21:24:24 - ERROR - stderr - +2025-02-05 21:24:24 - INFO - stdout - {'loss': 0.6669, 'grad_norm': 1.1861226558685303, 'learning_rate': 1.03377724494342e-05, 'epoch': 1.51} +2025-02-05 21:24:24 - ERROR - stderr - 50%|█████ | 11320/22434 [11:16:44<16:11:36, 5.25s/it] +2025-02-05 21:24:30 - ERROR - stderr - 50%|█████ | 11321/22434 [11:16:50<16:50:51, 5.46s/it] +2025-02-05 21:24:30 - ERROR - stderr - +2025-02-05 21:24:30 - ERROR - 
stderr - +2025-02-05 21:24:30 - INFO - stdout - {'loss': 0.7096, 'grad_norm': 1.212786316871643, 'learning_rate': 1.0336329523116997e-05, 'epoch': 1.51} +2025-02-05 21:24:30 - ERROR - stderr - 50%|█████ | 11321/22434 [11:16:50<16:50:51, 5.46s/it] +2025-02-05 21:24:37 - ERROR - stderr - 50%|█████ | 11322/22434 [11:16:56<17:27:42, 5.66s/it] +2025-02-05 21:24:37 - ERROR - stderr - +2025-02-05 21:24:37 - ERROR - stderr - +2025-02-05 21:24:37 - INFO - stdout - {'loss': 0.7076, 'grad_norm': 1.109321117401123, 'learning_rate': 1.0334886589789326e-05, 'epoch': 1.51} +2025-02-05 21:24:37 - ERROR - stderr - 50%|█████ | 11322/22434 [11:16:56<17:27:42, 5.66s/it] +2025-02-05 21:24:43 - ERROR - stderr - 50%|█████ | 11323/22434 [11:17:03<18:13:37, 5.91s/it] +2025-02-05 21:24:43 - ERROR - stderr - +2025-02-05 21:24:43 - ERROR - stderr - +2025-02-05 21:24:43 - INFO - stdout - {'loss': 0.6744, 'grad_norm': 1.0888601541519165, 'learning_rate': 1.0333443649481265e-05, 'epoch': 1.51} +2025-02-05 21:24:43 - ERROR - stderr - 50%|█████ | 11323/22434 [11:17:03<18:13:37, 5.91s/it] +2025-02-05 21:24:46 - ERROR - stderr - 50%|█████ | 11324/22434 [11:17:05<15:04:58, 4.89s/it] +2025-02-05 21:24:46 - ERROR - stderr - +2025-02-05 21:24:46 - ERROR - stderr - +2025-02-05 21:24:46 - INFO - stdout - {'loss': 0.7308, 'grad_norm': 1.210271954536438, 'learning_rate': 1.0332000702222889e-05, 'epoch': 1.51} +2025-02-05 21:24:46 - ERROR - stderr - 50%|█████ | 11324/22434 [11:17:05<15:04:58, 4.89s/it] +2025-02-05 21:24:48 - ERROR - stderr - 50%|█████ | 11325/22434 [11:17:08<12:54:45, 4.18s/it] +2025-02-05 21:24:48 - ERROR - stderr - +2025-02-05 21:24:48 - ERROR - stderr - +2025-02-05 21:24:48 - INFO - stdout - {'loss': 0.7621, 'grad_norm': 1.229527235031128, 'learning_rate': 1.0330557748044274e-05, 'epoch': 1.51} +2025-02-05 21:24:48 - ERROR - stderr - 50%|█████ | 11325/22434 [11:17:08<12:54:45, 4.18s/it] +2025-02-05 21:24:51 - ERROR - stderr - 50%|█████ | 11326/22434 [11:17:10<11:25:40, 3.70s/it] 
+2025-02-05 21:24:51 - ERROR - stderr - +2025-02-05 21:24:51 - ERROR - stderr - +2025-02-05 21:24:51 - INFO - stdout - {'loss': 0.6836, 'grad_norm': 1.2199658155441284, 'learning_rate': 1.03291147869755e-05, 'epoch': 1.51} +2025-02-05 21:24:51 - ERROR - stderr - 50%|█████ | 11326/22434 [11:17:10<11:25:40, 3.70s/it] +2025-02-05 21:24:55 - ERROR - stderr - 50%|█████ | 11327/22434 [11:17:14<11:46:40, 3.82s/it] +2025-02-05 21:24:55 - ERROR - stderr - +2025-02-05 21:24:55 - ERROR - stderr - +2025-02-05 21:24:55 - INFO - stdout - {'loss': 0.7241, 'grad_norm': 1.1841131448745728, 'learning_rate': 1.0327671819046645e-05, 'epoch': 1.51} +2025-02-05 21:24:55 - ERROR - stderr - 50%|█████ | 11327/22434 [11:17:15<11:46:40, 3.82s/it] +2025-02-05 21:24:58 - ERROR - stderr - 50%|█████ | 11328/22434 [11:17:17<10:58:12, 3.56s/it] +2025-02-05 21:24:58 - ERROR - stderr - +2025-02-05 21:24:58 - ERROR - stderr - +2025-02-05 21:24:58 - INFO - stdout - {'loss': 0.6895, 'grad_norm': 1.1946063041687012, 'learning_rate': 1.0326228844287784e-05, 'epoch': 1.51} +2025-02-05 21:24:58 - ERROR - stderr - 50%|█████ | 11328/22434 [11:17:17<10:58:12, 3.56s/it] +2025-02-05 21:25:00 - ERROR - stderr - 50%|█████ | 11329/22434 [11:17:20<9:58:04, 3.23s/it] +2025-02-05 21:25:00 - ERROR - stderr - +2025-02-05 21:25:00 - ERROR - stderr - +2025-02-05 21:25:00 - INFO - stdout - {'loss': 0.7038, 'grad_norm': 1.3085778951644897, 'learning_rate': 1.0324785862728995e-05, 'epoch': 1.51} +2025-02-05 21:25:00 - ERROR - stderr - 50%|█████ | 11329/22434 [11:17:20<9:58:04, 3.23s/it] +2025-02-05 21:25:03 - ERROR - stderr - 51%|█████ | 11330/22434 [11:17:22<9:19:46, 3.02s/it] +2025-02-05 21:25:03 - ERROR - stderr - +2025-02-05 21:25:03 - ERROR - stderr - +2025-02-05 21:25:03 - INFO - stdout - {'loss': 0.681, 'grad_norm': 1.210023045539856, 'learning_rate': 1.0323342874400358e-05, 'epoch': 1.52} +2025-02-05 21:25:03 - ERROR - stderr - 51%|█████ | 11330/22434 [11:17:23<9:19:46, 3.02s/it] +2025-02-05 21:25:05 - ERROR - 
stderr - 51%|█████ | 11331/22434 [11:17:25<8:48:34, 2.86s/it] +2025-02-05 21:25:05 - ERROR - stderr - +2025-02-05 21:25:05 - ERROR - stderr - +2025-02-05 21:25:05 - INFO - stdout - {'loss': 0.6316, 'grad_norm': 1.0165055990219116, 'learning_rate': 1.0321899879331942e-05, 'epoch': 1.52} +2025-02-05 21:25:05 - ERROR - stderr - 51%|█████ | 11331/22434 [11:17:25<8:48:34, 2.86s/it] +2025-02-05 21:25:08 - ERROR - stderr - 51%|█████ | 11332/22434 [11:17:27<8:28:57, 2.75s/it] +2025-02-05 21:25:08 - ERROR - stderr - +2025-02-05 21:25:08 - ERROR - stderr - +2025-02-05 21:25:08 - INFO - stdout - {'loss': 0.6064, 'grad_norm': 1.0894322395324707, 'learning_rate': 1.0320456877553833e-05, 'epoch': 1.52} +2025-02-05 21:25:08 - ERROR - stderr - 51%|█████ | 11332/22434 [11:17:27<8:28:57, 2.75s/it] +2025-02-05 21:25:10 - ERROR - stderr - 51%|█████ | 11333/22434 [11:17:30<8:14:43, 2.67s/it] +2025-02-05 21:25:10 - ERROR - stderr - +2025-02-05 21:25:10 - ERROR - stderr - +2025-02-05 21:25:10 - INFO - stdout - {'loss': 0.6874, 'grad_norm': 1.1646851301193237, 'learning_rate': 1.0319013869096109e-05, 'epoch': 1.52} +2025-02-05 21:25:10 - ERROR - stderr - 51%|█████ | 11333/22434 [11:17:30<8:14:43, 2.67s/it] +2025-02-05 21:25:13 - ERROR - stderr - 51%|█████ | 11334/22434 [11:17:32<8:07:15, 2.63s/it] +2025-02-05 21:25:13 - ERROR - stderr - +2025-02-05 21:25:13 - ERROR - stderr - +2025-02-05 21:25:13 - INFO - stdout - {'loss': 0.714, 'grad_norm': 1.2282353639602661, 'learning_rate': 1.0317570853988847e-05, 'epoch': 1.52} +2025-02-05 21:25:13 - ERROR - stderr - 51%|█████ | 11334/22434 [11:17:33<8:07:15, 2.63s/it] +2025-02-05 21:25:15 - ERROR - stderr - 51%|█████ | 11335/22434 [11:17:35<7:58:28, 2.59s/it] +2025-02-05 21:25:15 - ERROR - stderr - +2025-02-05 21:25:15 - ERROR - stderr - +2025-02-05 21:25:15 - INFO - stdout - {'loss': 0.7109, 'grad_norm': 1.2134082317352295, 'learning_rate': 1.0316127832262124e-05, 'epoch': 1.52} +2025-02-05 21:25:15 - ERROR - stderr - 51%|█████ | 11335/22434 
[11:17:35<7:58:28, 2.59s/it] +2025-02-05 21:25:18 - ERROR - stderr - 51%|█████ | 11336/22434 [11:17:37<7:53:09, 2.56s/it] +2025-02-05 21:25:18 - ERROR - stderr - +2025-02-05 21:25:18 - ERROR - stderr - +2025-02-05 21:25:18 - INFO - stdout - {'loss': 0.7471, 'grad_norm': 1.259196400642395, 'learning_rate': 1.0314684803946015e-05, 'epoch': 1.52} +2025-02-05 21:25:18 - ERROR - stderr - 51%|█████ | 11336/22434 [11:17:37<7:53:09, 2.56s/it] +2025-02-05 21:25:20 - ERROR - stderr - 51%|█████ | 11337/22434 [11:17:40<7:49:16, 2.54s/it] +2025-02-05 21:25:20 - ERROR - stderr - +2025-02-05 21:25:20 - ERROR - stderr - +2025-02-05 21:25:20 - INFO - stdout - {'loss': 0.5871, 'grad_norm': 1.0562297105789185, 'learning_rate': 1.0313241769070605e-05, 'epoch': 1.52} +2025-02-05 21:25:20 - ERROR - stderr - 51%|█████ | 11337/22434 [11:17:40<7:49:16, 2.54s/it] +2025-02-05 21:25:23 - ERROR - stderr - 51%|█████ | 11338/22434 [11:17:42<7:46:34, 2.52s/it] +2025-02-05 21:25:23 - ERROR - stderr - +2025-02-05 21:25:23 - ERROR - stderr - +2025-02-05 21:25:23 - INFO - stdout - {'loss': 0.68, 'grad_norm': 1.2305461168289185, 'learning_rate': 1.0311798727665972e-05, 'epoch': 1.52} +2025-02-05 21:25:23 - ERROR - stderr - 51%|█████ | 11338/22434 [11:17:42<7:46:34, 2.52s/it] +2025-02-05 21:25:25 - ERROR - stderr - 51%|█████ | 11339/22434 [11:17:45<7:43:35, 2.51s/it] +2025-02-05 21:25:25 - ERROR - stderr - +2025-02-05 21:25:25 - ERROR - stderr - +2025-02-05 21:25:25 - INFO - stdout - {'loss': 0.7082, 'grad_norm': 1.251670002937317, 'learning_rate': 1.031035567976219e-05, 'epoch': 1.52} +2025-02-05 21:25:25 - ERROR - stderr - 51%|█████ | 11339/22434 [11:17:45<7:43:35, 2.51s/it] +2025-02-05 21:25:28 - ERROR - stderr - 51%|█████ | 11340/22434 [11:17:47<7:49:14, 2.54s/it] +2025-02-05 21:25:28 - ERROR - stderr - +2025-02-05 21:25:28 - ERROR - stderr - +2025-02-05 21:25:28 - INFO - stdout - {'loss': 0.6718, 'grad_norm': 1.1587879657745361, 'learning_rate': 1.0308912625389343e-05, 'epoch': 1.52} +2025-02-05 
21:25:28 - ERROR - stderr - 51%|█████ | 11340/22434 [11:17:48<7:49:14, 2.54s/it] +2025-02-05 21:25:30 - ERROR - stderr - 51%|█████ | 11341/22434 [11:17:50<7:55:31, 2.57s/it] +2025-02-05 21:25:30 - ERROR - stderr - +2025-02-05 21:25:30 - ERROR - stderr - +2025-02-05 21:25:30 - INFO - stdout - {'loss': 0.6326, 'grad_norm': 1.1916331052780151, 'learning_rate': 1.0307469564577506e-05, 'epoch': 1.52} +2025-02-05 21:25:30 - ERROR - stderr - 51%|█████ | 11341/22434 [11:17:50<7:55:31, 2.57s/it] +2025-02-05 21:25:33 - ERROR - stderr - 51%|█████ | 11342/22434 [11:17:53<7:53:12, 2.56s/it] +2025-02-05 21:25:33 - ERROR - stderr - +2025-02-05 21:25:33 - ERROR - stderr - +2025-02-05 21:25:33 - INFO - stdout - {'loss': 0.6731, 'grad_norm': 1.2042045593261719, 'learning_rate': 1.0306026497356763e-05, 'epoch': 1.52} +2025-02-05 21:25:33 - ERROR - stderr - 51%|█████ | 11342/22434 [11:17:53<7:53:12, 2.56s/it] +2025-02-05 21:25:35 - ERROR - stderr - 51%|█████ | 11343/22434 [11:17:55<7:47:42, 2.53s/it] +2025-02-05 21:25:35 - ERROR - stderr - +2025-02-05 21:25:35 - ERROR - stderr - +2025-02-05 21:25:35 - INFO - stdout - {'loss': 0.7657, 'grad_norm': 1.3651026487350464, 'learning_rate': 1.0304583423757188e-05, 'epoch': 1.52} +2025-02-05 21:25:35 - ERROR - stderr - 51%|█████ | 11343/22434 [11:17:55<7:47:42, 2.53s/it] +2025-02-05 21:25:38 - ERROR - stderr - 51%|█████ | 11344/22434 [11:17:58<7:46:49, 2.53s/it] +2025-02-05 21:25:38 - ERROR - stderr - +2025-02-05 21:25:38 - ERROR - stderr - +2025-02-05 21:25:38 - INFO - stdout - {'loss': 0.7467, 'grad_norm': 1.346718192100525, 'learning_rate': 1.0303140343808865e-05, 'epoch': 1.52} +2025-02-05 21:25:38 - ERROR - stderr - 51%|█████ | 11344/22434 [11:17:58<7:46:49, 2.53s/it] +2025-02-05 21:25:41 - ERROR - stderr - 51%|█████ | 11345/22434 [11:18:00<7:57:51, 2.59s/it] +2025-02-05 21:25:41 - ERROR - stderr - +2025-02-05 21:25:41 - ERROR - stderr - +2025-02-05 21:25:41 - INFO - stdout - {'loss': 0.6751, 'grad_norm': 1.2971117496490479, 
'learning_rate': 1.0301697257541867e-05, 'epoch': 1.52} +2025-02-05 21:25:41 - ERROR - stderr - 51%|█████ | 11345/22434 [11:18:00<7:57:51, 2.59s/it] +2025-02-05 21:25:43 - ERROR - stderr - 51%|█████ | 11346/22434 [11:18:03<7:57:53, 2.59s/it] +2025-02-05 21:25:43 - ERROR - stderr - +2025-02-05 21:25:43 - ERROR - stderr - +2025-02-05 21:25:43 - INFO - stdout - {'loss': 0.6928, 'grad_norm': 1.192138910293579, 'learning_rate': 1.0300254164986283e-05, 'epoch': 1.52} +2025-02-05 21:25:43 - ERROR - stderr - 51%|█████ | 11346/22434 [11:18:03<7:57:53, 2.59s/it] +2025-02-05 21:25:46 - ERROR - stderr - 51%|█████ | 11347/22434 [11:18:05<7:51:05, 2.55s/it] +2025-02-05 21:25:46 - ERROR - stderr - +2025-02-05 21:25:46 - ERROR - stderr - +2025-02-05 21:25:46 - INFO - stdout - {'loss': 0.6812, 'grad_norm': 1.2294753789901733, 'learning_rate': 1.0298811066172185e-05, 'epoch': 1.52} +2025-02-05 21:25:46 - ERROR - stderr - 51%|█████ | 11347/22434 [11:18:05<7:51:05, 2.55s/it] +2025-02-05 21:25:48 - ERROR - stderr - 51%|█████ | 11348/22434 [11:18:08<7:45:23, 2.52s/it] +2025-02-05 21:25:48 - ERROR - stderr - +2025-02-05 21:25:48 - ERROR - stderr - +2025-02-05 21:25:48 - INFO - stdout - {'loss': 0.7427, 'grad_norm': 1.3203856945037842, 'learning_rate': 1.0297367961129658e-05, 'epoch': 1.52} +2025-02-05 21:25:48 - ERROR - stderr - 51%|█████ | 11348/22434 [11:18:08<7:45:23, 2.52s/it] +2025-02-05 21:25:51 - ERROR - stderr - 51%|█████ | 11349/22434 [11:18:10<7:45:57, 2.52s/it] +2025-02-05 21:25:51 - ERROR - stderr - +2025-02-05 21:25:51 - ERROR - stderr - +2025-02-05 21:25:51 - INFO - stdout - {'loss': 0.7183, 'grad_norm': 1.2395155429840088, 'learning_rate': 1.0295924849888781e-05, 'epoch': 1.52} +2025-02-05 21:25:51 - ERROR - stderr - 51%|█████ | 11349/22434 [11:18:10<7:45:57, 2.52s/it] +2025-02-05 21:25:53 - ERROR - stderr - 51%|█████ | 11350/22434 [11:18:13<7:43:13, 2.51s/it] +2025-02-05 21:25:53 - ERROR - stderr - +2025-02-05 21:25:53 - ERROR - stderr - +2025-02-05 21:25:53 - INFO - 
stdout - {'loss': 0.6317, 'grad_norm': 1.1613038778305054, 'learning_rate': 1.0294481732479635e-05, 'epoch': 1.52} +2025-02-05 21:25:53 - ERROR - stderr - 51%|█████ | 11350/22434 [11:18:13<7:43:13, 2.51s/it] +2025-02-05 21:25:56 - ERROR - stderr - 51%|█████ | 11351/22434 [11:18:15<7:48:11, 2.53s/it] +2025-02-05 21:25:56 - ERROR - stderr - +2025-02-05 21:25:56 - ERROR - stderr - +2025-02-05 21:25:56 - INFO - stdout - {'loss': 0.6761, 'grad_norm': 0.9489179849624634, 'learning_rate': 1.0293038608932296e-05, 'epoch': 1.52} +2025-02-05 21:25:56 - ERROR - stderr - 51%|█████ | 11351/22434 [11:18:16<7:48:11, 2.53s/it] +2025-02-05 21:25:58 - ERROR - stderr - 51%|█████ | 11352/22434 [11:18:18<7:49:24, 2.54s/it] +2025-02-05 21:25:58 - ERROR - stderr - +2025-02-05 21:25:58 - ERROR - stderr - +2025-02-05 21:25:58 - INFO - stdout - {'loss': 0.6928, 'grad_norm': 1.1639461517333984, 'learning_rate': 1.0291595479276849e-05, 'epoch': 1.52} +2025-02-05 21:25:58 - ERROR - stderr - 51%|█████ | 11352/22434 [11:18:18<7:49:24, 2.54s/it] +2025-02-05 21:26:01 - ERROR - stderr - 51%|█████ | 11353/22434 [11:18:21<7:47:34, 2.53s/it] +2025-02-05 21:26:01 - ERROR - stderr - +2025-02-05 21:26:01 - ERROR - stderr - +2025-02-05 21:26:01 - INFO - stdout - {'loss': 0.7025, 'grad_norm': 1.2916233539581299, 'learning_rate': 1.0290152343543372e-05, 'epoch': 1.52} +2025-02-05 21:26:01 - ERROR - stderr - 51%|█████ | 11353/22434 [11:18:21<7:47:34, 2.53s/it] +2025-02-05 21:26:03 - ERROR - stderr - 51%|█████ | 11354/22434 [11:18:23<7:47:22, 2.53s/it] +2025-02-05 21:26:03 - ERROR - stderr - +2025-02-05 21:26:03 - ERROR - stderr - +2025-02-05 21:26:03 - INFO - stdout - {'loss': 0.7098, 'grad_norm': 1.358557105064392, 'learning_rate': 1.0288709201761949e-05, 'epoch': 1.52} +2025-02-05 21:26:03 - ERROR - stderr - 51%|█████ | 11354/22434 [11:18:23<7:47:22, 2.53s/it] +2025-02-05 21:26:06 - ERROR - stderr - 51%|█████ | 11355/22434 [11:18:26<7:48:49, 2.54s/it] +2025-02-05 21:26:06 - ERROR - stderr - +2025-02-05 
21:26:06 - ERROR - stderr - +2025-02-05 21:26:06 - INFO - stdout - {'loss': 0.6936, 'grad_norm': 1.143175482749939, 'learning_rate': 1.0287266053962657e-05, 'epoch': 1.52} +2025-02-05 21:26:06 - ERROR - stderr - 51%|█████ | 11355/22434 [11:18:26<7:48:49, 2.54s/it] +2025-02-05 21:26:08 - ERROR - stderr - 51%|█████ | 11356/22434 [11:18:28<7:47:35, 2.53s/it] +2025-02-05 21:26:08 - ERROR - stderr - +2025-02-05 21:26:08 - ERROR - stderr - +2025-02-05 21:26:08 - INFO - stdout - {'loss': 0.7027, 'grad_norm': 1.2170140743255615, 'learning_rate': 1.028582290017558e-05, 'epoch': 1.52} +2025-02-05 21:26:08 - ERROR - stderr - 51%|█████ | 11356/22434 [11:18:28<7:47:35, 2.53s/it] +2025-02-05 21:26:11 - ERROR - stderr - 51%|█████ | 11357/22434 [11:18:31<7:43:56, 2.51s/it] +2025-02-05 21:26:11 - ERROR - stderr - +2025-02-05 21:26:11 - ERROR - stderr - +2025-02-05 21:26:11 - INFO - stdout - {'loss': 0.7186, 'grad_norm': 1.3404967784881592, 'learning_rate': 1.0284379740430798e-05, 'epoch': 1.52} +2025-02-05 21:26:11 - ERROR - stderr - 51%|█████ | 11357/22434 [11:18:31<7:43:56, 2.51s/it] +2025-02-05 21:26:13 - ERROR - stderr - 51%|█████ | 11358/22434 [11:18:33<7:42:37, 2.51s/it] +2025-02-05 21:26:13 - ERROR - stderr - +2025-02-05 21:26:13 - ERROR - stderr - +2025-02-05 21:26:13 - INFO - stdout - {'loss': 0.606, 'grad_norm': 1.2315402030944824, 'learning_rate': 1.0282936574758394e-05, 'epoch': 1.52} +2025-02-05 21:26:13 - ERROR - stderr - 51%|█████ | 11358/22434 [11:18:33<7:42:37, 2.51s/it] +2025-02-05 21:26:16 - ERROR - stderr - 51%|█████ | 11359/22434 [11:18:36<7:40:27, 2.49s/it] +2025-02-05 21:26:16 - ERROR - stderr - +2025-02-05 21:26:16 - ERROR - stderr - +2025-02-05 21:26:16 - INFO - stdout - {'loss': 0.605, 'grad_norm': 1.065169334411621, 'learning_rate': 1.0281493403188446e-05, 'epoch': 1.52} +2025-02-05 21:26:16 - ERROR - stderr - 51%|█████ | 11359/22434 [11:18:36<7:40:27, 2.49s/it] +2025-02-05 21:26:18 - ERROR - stderr - 51%|█████ | 11360/22434 [11:18:38<7:48:50, 2.54s/it] 
+2025-02-05 21:26:18 - ERROR - stderr - +2025-02-05 21:26:18 - ERROR - stderr - +2025-02-05 21:26:18 - INFO - stdout - {'loss': 0.6978, 'grad_norm': 1.1744664907455444, 'learning_rate': 1.0280050225751036e-05, 'epoch': 1.52} +2025-02-05 21:26:18 - ERROR - stderr - 51%|█████ | 11360/22434 [11:18:38<7:48:50, 2.54s/it] +2025-02-05 21:26:21 - ERROR - stderr - 51%|█████ | 11361/22434 [11:18:41<7:44:14, 2.52s/it] +2025-02-05 21:26:21 - ERROR - stderr - +2025-02-05 21:26:21 - ERROR - stderr - +2025-02-05 21:26:21 - INFO - stdout - {'loss': 0.7485, 'grad_norm': 1.383623480796814, 'learning_rate': 1.027860704247625e-05, 'epoch': 1.52} +2025-02-05 21:26:21 - ERROR - stderr - 51%|█████ | 11361/22434 [11:18:41<7:44:14, 2.52s/it] +2025-02-05 21:26:23 - ERROR - stderr - 51%|█████ | 11362/22434 [11:18:43<7:42:09, 2.50s/it] +2025-02-05 21:26:23 - ERROR - stderr - +2025-02-05 21:26:23 - ERROR - stderr - +2025-02-05 21:26:23 - INFO - stdout - {'loss': 0.6846, 'grad_norm': 1.2623125314712524, 'learning_rate': 1.0277163853394166e-05, 'epoch': 1.52} +2025-02-05 21:26:23 - ERROR - stderr - 51%|█████ | 11362/22434 [11:18:43<7:42:09, 2.50s/it] +2025-02-05 21:26:26 - ERROR - stderr - 51%|█████ | 11363/22434 [11:18:46<7:40:30, 2.50s/it] +2025-02-05 21:26:26 - ERROR - stderr - +2025-02-05 21:26:26 - ERROR - stderr - +2025-02-05 21:26:26 - INFO - stdout - {'loss': 0.7395, 'grad_norm': 1.282300353050232, 'learning_rate': 1.0275720658534867e-05, 'epoch': 1.52} +2025-02-05 21:26:26 - ERROR - stderr - 51%|█████ | 11363/22434 [11:18:46<7:40:30, 2.50s/it] +2025-02-05 21:26:28 - ERROR - stderr - 51%|█████ | 11364/22434 [11:18:48<7:44:43, 2.52s/it] +2025-02-05 21:26:28 - ERROR - stderr - +2025-02-05 21:26:28 - ERROR - stderr - +2025-02-05 21:26:28 - INFO - stdout - {'loss': 0.6709, 'grad_norm': 1.125113606452942, 'learning_rate': 1.027427745792843e-05, 'epoch': 1.52} +2025-02-05 21:26:28 - ERROR - stderr - 51%|█████ | 11364/22434 [11:18:48<7:44:43, 2.52s/it] +2025-02-05 21:26:31 - ERROR - stderr - 
51%|█████ | 11365/22434 [11:18:51<7:41:10, 2.50s/it] +2025-02-05 21:26:31 - ERROR - stderr - +2025-02-05 21:26:31 - ERROR - stderr - +2025-02-05 21:26:31 - INFO - stdout - {'loss': 0.689, 'grad_norm': 1.3038486242294312, 'learning_rate': 1.0272834251604946e-05, 'epoch': 1.52} +2025-02-05 21:26:31 - ERROR - stderr - 51%|█████ | 11365/22434 [11:18:51<7:41:10, 2.50s/it] +2025-02-05 21:26:33 - ERROR - stderr - 51%|█████ | 11366/22434 [11:18:53<7:36:31, 2.47s/it] +2025-02-05 21:26:33 - ERROR - stderr - +2025-02-05 21:26:33 - ERROR - stderr - +2025-02-05 21:26:33 - INFO - stdout - {'loss': 0.7537, 'grad_norm': 1.2768163681030273, 'learning_rate': 1.0271391039594496e-05, 'epoch': 1.52} +2025-02-05 21:26:33 - ERROR - stderr - 51%|█████ | 11366/22434 [11:18:53<7:36:31, 2.47s/it] +2025-02-05 21:26:36 - ERROR - stderr - 51%|█████ | 11367/22434 [11:18:56<7:35:57, 2.47s/it] +2025-02-05 21:26:36 - ERROR - stderr - +2025-02-05 21:26:36 - ERROR - stderr - +2025-02-05 21:26:36 - INFO - stdout - {'loss': 0.7245, 'grad_norm': 1.3322765827178955, 'learning_rate': 1.0269947821927155e-05, 'epoch': 1.52} +2025-02-05 21:26:36 - ERROR - stderr - 51%|█████ | 11367/22434 [11:18:56<7:35:57, 2.47s/it] +2025-02-05 21:26:38 - ERROR - stderr - 51%|█████ | 11368/22434 [11:18:58<7:35:55, 2.47s/it] +2025-02-05 21:26:38 - ERROR - stderr - +2025-02-05 21:26:38 - ERROR - stderr - +2025-02-05 21:26:38 - INFO - stdout - {'loss': 0.6865, 'grad_norm': 1.2983310222625732, 'learning_rate': 1.0268504598633011e-05, 'epoch': 1.52} +2025-02-05 21:26:38 - ERROR - stderr - 51%|█████ | 11368/22434 [11:18:58<7:35:55, 2.47s/it] +2025-02-05 21:26:41 - ERROR - stderr - 51%|█████ | 11369/22434 [11:19:01<7:41:26, 2.50s/it] +2025-02-05 21:26:41 - ERROR - stderr - +2025-02-05 21:26:41 - ERROR - stderr - +2025-02-05 21:26:41 - INFO - stdout - {'loss': 0.7442, 'grad_norm': 1.180198073387146, 'learning_rate': 1.0267061369742147e-05, 'epoch': 1.52} +2025-02-05 21:26:41 - ERROR - stderr - 51%|█████ | 11369/22434 
[11:19:01<7:41:26, 2.50s/it] +2025-02-05 21:26:44 - ERROR - stderr - 51%|█████ | 11370/22434 [11:19:03<7:53:25, 2.57s/it] +2025-02-05 21:26:44 - ERROR - stderr - +2025-02-05 21:26:44 - ERROR - stderr - +2025-02-05 21:26:44 - INFO - stdout - {'loss': 0.6727, 'grad_norm': 1.1388121843338013, 'learning_rate': 1.0265618135284643e-05, 'epoch': 1.52} +2025-02-05 21:26:44 - ERROR - stderr - 51%|█████ | 11370/22434 [11:19:03<7:53:25, 2.57s/it] +2025-02-05 21:26:46 - ERROR - stderr - 51%|█████ | 11371/22434 [11:19:06<7:50:05, 2.55s/it] +2025-02-05 21:26:46 - ERROR - stderr - +2025-02-05 21:26:46 - ERROR - stderr - +2025-02-05 21:26:46 - INFO - stdout - {'loss': 0.7221, 'grad_norm': 1.09035325050354, 'learning_rate': 1.0264174895290582e-05, 'epoch': 1.52} +2025-02-05 21:26:46 - ERROR - stderr - 51%|█████ | 11371/22434 [11:19:06<7:50:05, 2.55s/it] +2025-02-05 21:26:48 - ERROR - stderr - 51%|█████ | 11372/22434 [11:19:08<7:44:32, 2.52s/it] +2025-02-05 21:26:49 - ERROR - stderr - +2025-02-05 21:26:49 - ERROR - stderr - +2025-02-05 21:26:49 - INFO - stdout - {'loss': 0.7649, 'grad_norm': 1.20558500289917, 'learning_rate': 1.026273164979005e-05, 'epoch': 1.52} +2025-02-05 21:26:49 - ERROR - stderr - 51%|█████ | 11372/22434 [11:19:08<7:44:32, 2.52s/it] +2025-02-05 21:26:51 - ERROR - stderr - 51%|█████ | 11373/22434 [11:19:11<7:39:09, 2.49s/it] +2025-02-05 21:26:51 - ERROR - stderr - +2025-02-05 21:26:51 - ERROR - stderr - +2025-02-05 21:26:51 - INFO - stdout - {'loss': 0.6119, 'grad_norm': 1.1587101221084595, 'learning_rate': 1.0261288398813127e-05, 'epoch': 1.52} +2025-02-05 21:26:51 - ERROR - stderr - 51%|█████ | 11373/22434 [11:19:11<7:39:09, 2.49s/it] +2025-02-05 21:26:53 - ERROR - stderr - 51%|█████ | 11374/22434 [11:19:13<7:37:36, 2.48s/it] +2025-02-05 21:26:53 - ERROR - stderr - +2025-02-05 21:26:53 - ERROR - stderr - +2025-02-05 21:26:53 - INFO - stdout - {'loss': 0.718, 'grad_norm': 1.2932541370391846, 'learning_rate': 1.0259845142389899e-05, 'epoch': 1.52} +2025-02-05 
21:26:53 - ERROR - stderr - 51%|█████ | 11374/22434 [11:19:13<7:37:36, 2.48s/it] +2025-02-05 21:26:56 - ERROR - stderr - 51%|█████ | 11375/22434 [11:19:16<7:42:15, 2.51s/it] +2025-02-05 21:26:56 - ERROR - stderr - +2025-02-05 21:26:56 - ERROR - stderr - +2025-02-05 21:26:56 - INFO - stdout - {'loss': 0.6807, 'grad_norm': 1.1788967847824097, 'learning_rate': 1.0258401880550449e-05, 'epoch': 1.52} +2025-02-05 21:26:56 - ERROR - stderr - 51%|█████ | 11375/22434 [11:19:16<7:42:15, 2.51s/it] +2025-02-05 21:26:58 - ERROR - stderr - 51%|█████ | 11376/22434 [11:19:18<7:39:08, 2.49s/it] +2025-02-05 21:26:58 - ERROR - stderr - +2025-02-05 21:26:58 - ERROR - stderr - +2025-02-05 21:26:58 - INFO - stdout - {'loss': 0.6778, 'grad_norm': 1.197046160697937, 'learning_rate': 1.0256958613324855e-05, 'epoch': 1.52} +2025-02-05 21:26:58 - ERROR - stderr - 51%|█████ | 11376/22434 [11:19:18<7:39:08, 2.49s/it] +2025-02-05 21:27:01 - ERROR - stderr - 51%|█████ | 11377/22434 [11:19:21<7:35:21, 2.47s/it] +2025-02-05 21:27:01 - ERROR - stderr - +2025-02-05 21:27:01 - ERROR - stderr - +2025-02-05 21:27:01 - INFO - stdout - {'loss': 0.7254, 'grad_norm': 1.2200101613998413, 'learning_rate': 1.0255515340743206e-05, 'epoch': 1.52} +2025-02-05 21:27:01 - ERROR - stderr - 51%|█████ | 11377/22434 [11:19:21<7:35:21, 2.47s/it] +2025-02-05 21:27:03 - ERROR - stderr - 51%|█████ | 11378/22434 [11:19:23<7:36:57, 2.48s/it] +2025-02-05 21:27:03 - ERROR - stderr - +2025-02-05 21:27:03 - ERROR - stderr - +2025-02-05 21:27:03 - INFO - stdout - {'loss': 0.6933, 'grad_norm': 1.1235463619232178, 'learning_rate': 1.0254072062835585e-05, 'epoch': 1.52} +2025-02-05 21:27:03 - ERROR - stderr - 51%|█████ | 11378/22434 [11:19:23<7:36:57, 2.48s/it] +2025-02-05 21:27:06 - ERROR - stderr - 51%|█████ | 11379/22434 [11:19:26<7:40:18, 2.50s/it] +2025-02-05 21:27:06 - ERROR - stderr - +2025-02-05 21:27:06 - ERROR - stderr - +2025-02-05 21:27:06 - INFO - stdout - {'loss': 0.7206, 'grad_norm': 1.2488973140716553, 
'learning_rate': 1.0252628779632075e-05, 'epoch': 1.52} +2025-02-05 21:27:06 - ERROR - stderr - 51%|█████ | 11379/22434 [11:19:26<7:40:18, 2.50s/it] +2025-02-05 21:27:08 - ERROR - stderr - 51%|█████ | 11380/22434 [11:19:28<7:40:04, 2.50s/it] +2025-02-05 21:27:08 - ERROR - stderr - +2025-02-05 21:27:08 - ERROR - stderr - +2025-02-05 21:27:08 - INFO - stdout - {'loss': 0.7618, 'grad_norm': 1.1184589862823486, 'learning_rate': 1.0251185491162758e-05, 'epoch': 1.52} +2025-02-05 21:27:08 - ERROR - stderr - 51%|█████ | 11380/22434 [11:19:28<7:40:04, 2.50s/it] +2025-02-05 21:27:11 - ERROR - stderr - 51%|█████ | 11381/22434 [11:19:31<7:43:12, 2.51s/it] +2025-02-05 21:27:11 - ERROR - stderr - +2025-02-05 21:27:11 - ERROR - stderr - +2025-02-05 21:27:11 - INFO - stdout - {'loss': 0.7185, 'grad_norm': 1.1048762798309326, 'learning_rate': 1.0249742197457721e-05, 'epoch': 1.52} +2025-02-05 21:27:11 - ERROR - stderr - 51%|█████ | 11381/22434 [11:19:31<7:43:12, 2.51s/it] +2025-02-05 21:27:13 - ERROR - stderr - 51%|█████ | 11382/22434 [11:19:33<7:42:59, 2.51s/it] +2025-02-05 21:27:13 - ERROR - stderr - +2025-02-05 21:27:13 - ERROR - stderr - +2025-02-05 21:27:13 - INFO - stdout - {'loss': 0.7089, 'grad_norm': 1.2889329195022583, 'learning_rate': 1.024829889854705e-05, 'epoch': 1.52} +2025-02-05 21:27:13 - ERROR - stderr - 51%|█████ | 11382/22434 [11:19:33<7:42:59, 2.51s/it] +2025-02-05 21:27:16 - ERROR - stderr - 51%|█████ | 11383/22434 [11:19:36<7:42:59, 2.51s/it] +2025-02-05 21:27:16 - ERROR - stderr - +2025-02-05 21:27:16 - ERROR - stderr - +2025-02-05 21:27:16 - INFO - stdout - {'loss': 0.6896, 'grad_norm': 1.1219260692596436, 'learning_rate': 1.0246855594460818e-05, 'epoch': 1.52} +2025-02-05 21:27:16 - ERROR - stderr - 51%|█████ | 11383/22434 [11:19:36<7:42:59, 2.51s/it] +2025-02-05 21:27:18 - ERROR - stderr - 51%|█████ | 11384/22434 [11:19:38<7:41:08, 2.50s/it] +2025-02-05 21:27:18 - ERROR - stderr - +2025-02-05 21:27:18 - ERROR - stderr - +2025-02-05 21:27:18 - INFO - 
stdout - {'loss': 0.6549, 'grad_norm': 1.1798728704452515, 'learning_rate': 1.0245412285229124e-05, 'epoch': 1.52} +2025-02-05 21:27:18 - ERROR - stderr - 51%|█████ | 11384/22434 [11:19:38<7:41:08, 2.50s/it] +2025-02-05 21:27:21 - ERROR - stderr - 51%|█████ | 11385/22434 [11:19:41<7:36:38, 2.48s/it] +2025-02-05 21:27:21 - ERROR - stderr - +2025-02-05 21:27:21 - ERROR - stderr - +2025-02-05 21:27:21 - INFO - stdout - {'loss': 0.6993, 'grad_norm': 1.2250559329986572, 'learning_rate': 1.0243968970882044e-05, 'epoch': 1.52} +2025-02-05 21:27:21 - ERROR - stderr - 51%|█████ | 11385/22434 [11:19:41<7:36:38, 2.48s/it] +2025-02-05 21:27:23 - ERROR - stderr - 51%|█████ | 11386/22434 [11:19:43<7:38:28, 2.49s/it] +2025-02-05 21:27:23 - ERROR - stderr - +2025-02-05 21:27:23 - ERROR - stderr - +2025-02-05 21:27:23 - INFO - stdout - {'loss': 0.5716, 'grad_norm': 1.2025673389434814, 'learning_rate': 1.0242525651449664e-05, 'epoch': 1.52} +2025-02-05 21:27:23 - ERROR - stderr - 51%|█████ | 11386/22434 [11:19:43<7:38:28, 2.49s/it] +2025-02-05 21:27:26 - ERROR - stderr - 51%|█████ | 11387/22434 [11:19:46<7:37:37, 2.49s/it] +2025-02-05 21:27:26 - ERROR - stderr - +2025-02-05 21:27:26 - ERROR - stderr - +2025-02-05 21:27:26 - INFO - stdout - {'loss': 0.6242, 'grad_norm': 1.1163078546524048, 'learning_rate': 1.024108232696207e-05, 'epoch': 1.52} +2025-02-05 21:27:26 - ERROR - stderr - 51%|█████ | 11387/22434 [11:19:46<7:37:37, 2.49s/it] +2025-02-05 21:27:28 - ERROR - stderr - 51%|█████ | 11388/22434 [11:19:48<7:36:29, 2.48s/it] +2025-02-05 21:27:28 - ERROR - stderr - +2025-02-05 21:27:28 - ERROR - stderr - +2025-02-05 21:27:28 - INFO - stdout - {'loss': 0.78, 'grad_norm': 1.3289074897766113, 'learning_rate': 1.0239638997449346e-05, 'epoch': 1.52} +2025-02-05 21:27:28 - ERROR - stderr - 51%|█████ | 11388/22434 [11:19:48<7:36:29, 2.48s/it] +2025-02-05 21:27:31 - ERROR - stderr - 51%|█████ | 11389/22434 [11:19:51<7:40:12, 2.50s/it] +2025-02-05 21:27:31 - ERROR - stderr - +2025-02-05 
21:27:31 - ERROR - stderr - +2025-02-05 21:27:31 - INFO - stdout - {'loss': 0.7035, 'grad_norm': 1.1160694360733032, 'learning_rate': 1.0238195662941574e-05, 'epoch': 1.52} +2025-02-05 21:27:31 - ERROR - stderr - 51%|█████ | 11389/22434 [11:19:51<7:40:12, 2.50s/it] +2025-02-05 21:27:33 - ERROR - stderr - 51%|█████ | 11390/22434 [11:19:53<7:48:34, 2.55s/it] +2025-02-05 21:27:34 - ERROR - stderr - +2025-02-05 21:27:34 - ERROR - stderr - +2025-02-05 21:27:34 - INFO - stdout - {'loss': 0.7427, 'grad_norm': 1.428734302520752, 'learning_rate': 1.0236752323468844e-05, 'epoch': 1.52} +2025-02-05 21:27:34 - ERROR - stderr - 51%|█████ | 11390/22434 [11:19:53<7:48:34, 2.55s/it] +2025-02-05 21:27:36 - ERROR - stderr - 51%|█████ | 11391/22434 [11:19:56<8:02:37, 2.62s/it] +2025-02-05 21:27:36 - ERROR - stderr - +2025-02-05 21:27:36 - ERROR - stderr - +2025-02-05 21:27:36 - INFO - stdout - {'loss': 0.7423, 'grad_norm': 1.2779194116592407, 'learning_rate': 1.0235308979061235e-05, 'epoch': 1.52} +2025-02-05 21:27:36 - ERROR - stderr - 51%|█████ | 11391/22434 [11:19:56<8:02:37, 2.62s/it] +2025-02-05 21:27:39 - ERROR - stderr - 51%|█████ | 11392/22434 [11:19:58<7:51:32, 2.56s/it] +2025-02-05 21:27:39 - ERROR - stderr - +2025-02-05 21:27:39 - ERROR - stderr - +2025-02-05 21:27:39 - INFO - stdout - {'loss': 0.7376, 'grad_norm': 1.2424854040145874, 'learning_rate': 1.0233865629748838e-05, 'epoch': 1.52} +2025-02-05 21:27:39 - ERROR - stderr - 51%|█████ | 11392/22434 [11:19:59<7:51:32, 2.56s/it] +2025-02-05 21:27:41 - ERROR - stderr - 51%|█████ | 11393/22434 [11:20:01<7:49:56, 2.55s/it] +2025-02-05 21:27:41 - ERROR - stderr - +2025-02-05 21:27:41 - ERROR - stderr - +2025-02-05 21:27:41 - INFO - stdout - {'loss': 0.8003, 'grad_norm': 1.4598060846328735, 'learning_rate': 1.0232422275561735e-05, 'epoch': 1.52} +2025-02-05 21:27:41 - ERROR - stderr - 51%|█████ | 11393/22434 [11:20:01<7:49:56, 2.55s/it] +2025-02-05 21:27:44 - ERROR - stderr - 51%|█████ | 11394/22434 [11:20:04<7:46:06, 
2.53s/it] +2025-02-05 21:27:44 - ERROR - stderr - +2025-02-05 21:27:44 - ERROR - stderr - +2025-02-05 21:27:44 - INFO - stdout - {'loss': 0.6988, 'grad_norm': 1.2185792922973633, 'learning_rate': 1.0230978916530012e-05, 'epoch': 1.52} +2025-02-05 21:27:44 - ERROR - stderr - 51%|█████ | 11394/22434 [11:20:04<7:46:06, 2.53s/it] +2025-02-05 21:27:46 - ERROR - stderr - 51%|█████ | 11395/22434 [11:20:06<7:47:23, 2.54s/it] +2025-02-05 21:27:46 - ERROR - stderr - +2025-02-05 21:27:46 - ERROR - stderr - +2025-02-05 21:27:46 - INFO - stdout - {'loss': 0.6794, 'grad_norm': 1.132039189338684, 'learning_rate': 1.0229535552683757e-05, 'epoch': 1.52} +2025-02-05 21:27:46 - ERROR - stderr - 51%|█████ | 11395/22434 [11:20:06<7:47:23, 2.54s/it] +2025-02-05 21:27:49 - ERROR - stderr - 51%|█████ | 11396/22434 [11:20:09<7:53:54, 2.58s/it] +2025-02-05 21:27:49 - ERROR - stderr - +2025-02-05 21:27:49 - ERROR - stderr - +2025-02-05 21:27:49 - INFO - stdout - {'loss': 0.6695, 'grad_norm': 1.1940776109695435, 'learning_rate': 1.022809218405305e-05, 'epoch': 1.52} +2025-02-05 21:27:49 - ERROR - stderr - 51%|█████ | 11396/22434 [11:20:09<7:53:54, 2.58s/it] +2025-02-05 21:27:49 - INFO - stdout - WARNING: tokenization mismatch: 110 vs. 127. 
(ignored) +2025-02-05 21:27:51 - ERROR - stderr - 51%|█████ | 11397/22434 [11:20:11<7:50:51, 2.56s/it] +2025-02-05 21:27:52 - ERROR - stderr - +2025-02-05 21:27:52 - ERROR - stderr - +2025-02-05 21:27:52 - INFO - stdout - {'loss': 0.7013, 'grad_norm': 1.2174535989761353, 'learning_rate': 1.0226648810667979e-05, 'epoch': 1.52} +2025-02-05 21:27:52 - ERROR - stderr - 51%|█████ | 11397/22434 [11:20:11<7:50:51, 2.56s/it] +2025-02-05 21:27:54 - ERROR - stderr - 51%|█████ | 11398/22434 [11:20:14<7:55:31, 2.59s/it] +2025-02-05 21:27:54 - ERROR - stderr - +2025-02-05 21:27:54 - ERROR - stderr - +2025-02-05 21:27:54 - INFO - stdout - {'loss': 0.7185, 'grad_norm': 1.1812546253204346, 'learning_rate': 1.0225205432558632e-05, 'epoch': 1.52} +2025-02-05 21:27:54 - ERROR - stderr - 51%|█████ | 11398/22434 [11:20:14<7:55:31, 2.59s/it] +2025-02-05 21:27:57 - ERROR - stderr - 51%|█████ | 11399/22434 [11:20:16<7:46:37, 2.54s/it] +2025-02-05 21:27:57 - ERROR - stderr - +2025-02-05 21:27:57 - ERROR - stderr - +2025-02-05 21:27:57 - INFO - stdout - {'loss': 0.7006, 'grad_norm': 1.2069307565689087, 'learning_rate': 1.0223762049755094e-05, 'epoch': 1.52} +2025-02-05 21:27:57 - ERROR - stderr - 51%|█████ | 11399/22434 [11:20:16<7:46:37, 2.54s/it] +2025-02-05 21:27:59 - ERROR - stderr - 51%|█████ | 11400/22434 [11:20:19<7:41:13, 2.51s/it] +2025-02-05 21:27:59 - ERROR - stderr - +2025-02-05 21:27:59 - ERROR - stderr - +2025-02-05 21:27:59 - INFO - stdout - {'loss': 0.6886, 'grad_norm': 1.1613616943359375, 'learning_rate': 1.022231866228745e-05, 'epoch': 1.52} +2025-02-05 21:27:59 - ERROR - stderr - 51%|█████ | 11400/22434 [11:20:19<7:41:13, 2.51s/it] +2025-02-05 21:28:02 - ERROR - stderr - 51%|█████ | 11401/22434 [11:20:21<7:41:47, 2.51s/it] +2025-02-05 21:28:02 - ERROR - stderr - +2025-02-05 21:28:02 - ERROR - stderr - +2025-02-05 21:28:02 - INFO - stdout - {'loss': 0.6913, 'grad_norm': 1.323214054107666, 'learning_rate': 1.0220875270185784e-05, 'epoch': 1.52} +2025-02-05 21:28:02 - ERROR 
- stderr - 51%|█████ | 11401/22434 [11:20:21<7:41:47, 2.51s/it] +2025-02-05 21:28:04 - ERROR - stderr - 51%|█████ | 11402/22434 [11:20:24<7:47:30, 2.54s/it] +2025-02-05 21:28:04 - ERROR - stderr - +2025-02-05 21:28:04 - ERROR - stderr - +2025-02-05 21:28:04 - INFO - stdout - {'loss': 0.7929, 'grad_norm': 1.2059725522994995, 'learning_rate': 1.0219431873480186e-05, 'epoch': 1.52} +2025-02-05 21:28:04 - ERROR - stderr - 51%|█████ | 11402/22434 [11:20:24<7:47:30, 2.54s/it] +2025-02-05 21:28:07 - ERROR - stderr - 51%|█████ | 11403/22434 [11:20:26<7:47:00, 2.54s/it] +2025-02-05 21:28:07 - ERROR - stderr - +2025-02-05 21:28:07 - ERROR - stderr - +2025-02-05 21:28:07 - INFO - stdout - {'loss': 0.6674, 'grad_norm': 1.0640259981155396, 'learning_rate': 1.0217988472200739e-05, 'epoch': 1.52} +2025-02-05 21:28:07 - ERROR - stderr - 51%|█████ | 11403/22434 [11:20:26<7:47:00, 2.54s/it] +2025-02-05 21:28:09 - ERROR - stderr - 51%|█████ | 11404/22434 [11:20:29<7:49:18, 2.55s/it] +2025-02-05 21:28:09 - ERROR - stderr - +2025-02-05 21:28:09 - ERROR - stderr - +2025-02-05 21:28:09 - INFO - stdout - {'loss': 0.7135, 'grad_norm': 1.3941439390182495, 'learning_rate': 1.0216545066377535e-05, 'epoch': 1.53} +2025-02-05 21:28:09 - ERROR - stderr - 51%|█████ | 11404/22434 [11:20:29<7:49:18, 2.55s/it] +2025-02-05 21:28:12 - ERROR - stderr - 51%|█████ | 11405/22434 [11:20:32<7:49:45, 2.56s/it] +2025-02-05 21:28:12 - ERROR - stderr - +2025-02-05 21:28:12 - ERROR - stderr - +2025-02-05 21:28:12 - INFO - stdout - {'loss': 0.7613, 'grad_norm': 1.2665691375732422, 'learning_rate': 1.021510165604065e-05, 'epoch': 1.53} +2025-02-05 21:28:12 - ERROR - stderr - 51%|█████ | 11405/22434 [11:20:32<7:49:45, 2.56s/it] +2025-02-05 21:28:14 - ERROR - stderr - 51%|█████ | 11406/22434 [11:20:34<7:49:19, 2.55s/it] +2025-02-05 21:28:14 - ERROR - stderr - +2025-02-05 21:28:14 - ERROR - stderr - +2025-02-05 21:28:14 - INFO - stdout - {'loss': 0.6778, 'grad_norm': 1.200862169265747, 'learning_rate': 
1.0213658241220181e-05, 'epoch': 1.53} +2025-02-05 21:28:14 - ERROR - stderr - 51%|█████ | 11406/22434 [11:20:34<7:49:19, 2.55s/it] +2025-02-05 21:28:17 - ERROR - stderr - 51%|█████ | 11407/22434 [11:20:37<7:45:39, 2.53s/it] +2025-02-05 21:28:17 - ERROR - stderr - +2025-02-05 21:28:17 - ERROR - stderr - +2025-02-05 21:28:17 - INFO - stdout - {'loss': 0.709, 'grad_norm': 1.3196154832839966, 'learning_rate': 1.0212214821946213e-05, 'epoch': 1.53} +2025-02-05 21:28:17 - ERROR - stderr - 51%|█████ | 11407/22434 [11:20:37<7:45:39, 2.53s/it] +2025-02-05 21:28:19 - ERROR - stderr - 51%|█████ | 11408/22434 [11:20:39<7:41:49, 2.51s/it] +2025-02-05 21:28:19 - ERROR - stderr - +2025-02-05 21:28:19 - ERROR - stderr - +2025-02-05 21:28:19 - INFO - stdout - {'loss': 0.7834, 'grad_norm': 1.344474196434021, 'learning_rate': 1.0210771398248826e-05, 'epoch': 1.53} +2025-02-05 21:28:19 - ERROR - stderr - 51%|█████ | 11408/22434 [11:20:39<7:41:49, 2.51s/it] +2025-02-05 21:28:22 - ERROR - stderr - 51%|█████ | 11409/22434 [11:20:42<7:39:14, 2.50s/it] +2025-02-05 21:28:22 - ERROR - stderr - +2025-02-05 21:28:22 - ERROR - stderr - +2025-02-05 21:28:22 - INFO - stdout - {'loss': 0.6684, 'grad_norm': 1.2215858697891235, 'learning_rate': 1.0209327970158113e-05, 'epoch': 1.53} +2025-02-05 21:28:22 - ERROR - stderr - 51%|█████ | 11409/22434 [11:20:42<7:39:14, 2.50s/it] +2025-02-05 21:28:24 - ERROR - stderr - 51%|█████ | 11410/22434 [11:20:44<7:36:22, 2.48s/it] +2025-02-05 21:28:24 - ERROR - stderr - +2025-02-05 21:28:24 - ERROR - stderr - +2025-02-05 21:28:24 - INFO - stdout - {'loss': 0.7324, 'grad_norm': 1.2536499500274658, 'learning_rate': 1.0207884537704156e-05, 'epoch': 1.53} +2025-02-05 21:28:24 - ERROR - stderr - 51%|█████ | 11410/22434 [11:20:44<7:36:22, 2.48s/it] +2025-02-05 21:28:27 - ERROR - stderr - 51%|█████ | 11411/22434 [11:20:46<7:34:07, 2.47s/it] +2025-02-05 21:28:27 - ERROR - stderr - +2025-02-05 21:28:27 - ERROR - stderr - +2025-02-05 21:28:27 - INFO - stdout - {'loss': 
0.7062, 'grad_norm': 1.2801861763000488, 'learning_rate': 1.0206441100917049e-05, 'epoch': 1.53} +2025-02-05 21:28:27 - ERROR - stderr - 51%|█████ | 11411/22434 [11:20:46<7:34:07, 2.47s/it] +2025-02-05 21:28:29 - ERROR - stderr - 51%|█████ | 11412/22434 [11:20:49<7:36:45, 2.49s/it] +2025-02-05 21:28:29 - ERROR - stderr - +2025-02-05 21:28:29 - ERROR - stderr - +2025-02-05 21:28:29 - INFO - stdout - {'loss': 0.5824, 'grad_norm': 1.1119688749313354, 'learning_rate': 1.020499765982687e-05, 'epoch': 1.53} +2025-02-05 21:28:29 - ERROR - stderr - 51%|█████ | 11412/22434 [11:20:49<7:36:45, 2.49s/it] +2025-02-05 21:28:32 - ERROR - stderr - 51%|█████ | 11413/22434 [11:20:51<7:37:00, 2.49s/it] +2025-02-05 21:28:32 - ERROR - stderr - +2025-02-05 21:28:32 - ERROR - stderr - +2025-02-05 21:28:32 - INFO - stdout - {'loss': 0.7377, 'grad_norm': 1.2409217357635498, 'learning_rate': 1.0203554214463713e-05, 'epoch': 1.53} +2025-02-05 21:28:32 - ERROR - stderr - 51%|█████ | 11413/22434 [11:20:51<7:37:00, 2.49s/it] +2025-02-05 21:28:34 - ERROR - stderr - 51%|█████ | 11414/22434 [11:20:54<7:36:48, 2.49s/it] +2025-02-05 21:28:34 - ERROR - stderr - +2025-02-05 21:28:34 - ERROR - stderr - +2025-02-05 21:28:34 - INFO - stdout - {'loss': 0.7202, 'grad_norm': 1.249873399734497, 'learning_rate': 1.0202110764857662e-05, 'epoch': 1.53} +2025-02-05 21:28:34 - ERROR - stderr - 51%|█████ | 11414/22434 [11:20:54<7:36:48, 2.49s/it] +2025-02-05 21:28:37 - ERROR - stderr - 51%|█████ | 11415/22434 [11:20:56<7:35:00, 2.48s/it] +2025-02-05 21:28:37 - ERROR - stderr - +2025-02-05 21:28:37 - ERROR - stderr - +2025-02-05 21:28:37 - INFO - stdout - {'loss': 0.7442, 'grad_norm': 1.2186199426651, 'learning_rate': 1.0200667311038808e-05, 'epoch': 1.53} +2025-02-05 21:28:37 - ERROR - stderr - 51%|█████ | 11415/22434 [11:20:56<7:35:00, 2.48s/it] +2025-02-05 21:28:39 - ERROR - stderr - 51%|█████ | 11416/22434 [11:20:59<7:33:08, 2.47s/it] +2025-02-05 21:28:39 - ERROR - stderr - +2025-02-05 21:28:39 - ERROR - stderr 
- +2025-02-05 21:28:39 - INFO - stdout - {'loss': 0.78, 'grad_norm': 1.3245911598205566, 'learning_rate': 1.0199223853037235e-05, 'epoch': 1.53} +2025-02-05 21:28:39 - ERROR - stderr - 51%|█████ | 11416/22434 [11:20:59<7:33:08, 2.47s/it] +2025-02-05 21:28:42 - ERROR - stderr - 51%|█████ | 11417/22434 [11:21:01<7:37:44, 2.49s/it] +2025-02-05 21:28:42 - ERROR - stderr - +2025-02-05 21:28:42 - ERROR - stderr - +2025-02-05 21:28:42 - INFO - stdout - {'loss': 0.6544, 'grad_norm': 1.1203274726867676, 'learning_rate': 1.019778039088303e-05, 'epoch': 1.53} +2025-02-05 21:28:42 - ERROR - stderr - 51%|█████ | 11417/22434 [11:21:01<7:37:44, 2.49s/it] +2025-02-05 21:28:44 - ERROR - stderr - 51%|█████ | 11418/22434 [11:21:04<7:42:01, 2.52s/it] +2025-02-05 21:28:44 - ERROR - stderr - +2025-02-05 21:28:44 - ERROR - stderr - +2025-02-05 21:28:44 - INFO - stdout - {'loss': 0.7151, 'grad_norm': 1.1356606483459473, 'learning_rate': 1.0196336924606282e-05, 'epoch': 1.53} +2025-02-05 21:28:44 - ERROR - stderr - 51%|█████ | 11418/22434 [11:21:04<7:42:01, 2.52s/it] +2025-02-05 21:28:47 - ERROR - stderr - 51%|█████ | 11419/22434 [11:21:07<7:44:36, 2.53s/it] +2025-02-05 21:28:47 - ERROR - stderr - +2025-02-05 21:28:47 - ERROR - stderr - +2025-02-05 21:28:47 - INFO - stdout - {'loss': 0.685, 'grad_norm': 1.1675573587417603, 'learning_rate': 1.0194893454237082e-05, 'epoch': 1.53} +2025-02-05 21:28:47 - ERROR - stderr - 51%|█████ | 11419/22434 [11:21:07<7:44:36, 2.53s/it] +2025-02-05 21:28:49 - ERROR - stderr - 51%|█████ | 11420/22434 [11:21:09<7:45:07, 2.53s/it] +2025-02-05 21:28:49 - ERROR - stderr - +2025-02-05 21:28:49 - ERROR - stderr - +2025-02-05 21:28:49 - INFO - stdout - {'loss': 0.6771, 'grad_norm': 1.2476993799209595, 'learning_rate': 1.0193449979805515e-05, 'epoch': 1.53} +2025-02-05 21:28:49 - ERROR - stderr - 51%|█████ | 11420/22434 [11:21:09<7:45:07, 2.53s/it] +2025-02-05 21:28:52 - ERROR - stderr - 51%|█████ | 11421/22434 [11:21:12<7:47:42, 2.55s/it] +2025-02-05 21:28:52 - 
ERROR - stderr - +2025-02-05 21:28:52 - ERROR - stderr - +2025-02-05 21:28:52 - INFO - stdout - {'loss': 0.6832, 'grad_norm': 1.2121187448501587, 'learning_rate': 1.0192006501341664e-05, 'epoch': 1.53} +2025-02-05 21:28:52 - ERROR - stderr - 51%|█████ | 11421/22434 [11:21:12<7:47:42, 2.55s/it] +2025-02-05 21:28:55 - ERROR - stderr - 51%|█████ | 11422/22434 [11:21:14<7:56:42, 2.60s/it] +2025-02-05 21:28:55 - ERROR - stderr - +2025-02-05 21:28:55 - ERROR - stderr - +2025-02-05 21:28:55 - INFO - stdout - {'loss': 0.7506, 'grad_norm': 1.1831876039505005, 'learning_rate': 1.0190563018875623e-05, 'epoch': 1.53} +2025-02-05 21:28:55 - ERROR - stderr - 51%|█████ | 11422/22434 [11:21:14<7:56:42, 2.60s/it] +2025-02-05 21:28:57 - ERROR - stderr - 51%|█████ | 11423/22434 [11:21:17<7:50:16, 2.56s/it] +2025-02-05 21:28:57 - ERROR - stderr - +2025-02-05 21:28:57 - ERROR - stderr - +2025-02-05 21:28:57 - INFO - stdout - {'loss': 0.698, 'grad_norm': 1.2036285400390625, 'learning_rate': 1.0189119532437478e-05, 'epoch': 1.53} +2025-02-05 21:28:57 - ERROR - stderr - 51%|█████ | 11423/22434 [11:21:17<7:50:16, 2.56s/it] +2025-02-05 21:29:00 - ERROR - stderr - 51%|█████ | 11424/22434 [11:21:19<7:51:27, 2.57s/it] +2025-02-05 21:29:00 - ERROR - stderr - +2025-02-05 21:29:00 - ERROR - stderr - +2025-02-05 21:29:00 - INFO - stdout - {'loss': 0.683, 'grad_norm': 1.2253434658050537, 'learning_rate': 1.0187676042057315e-05, 'epoch': 1.53} +2025-02-05 21:29:00 - ERROR - stderr - 51%|█████ | 11424/22434 [11:21:19<7:51:27, 2.57s/it] +2025-02-05 21:29:02 - ERROR - stderr - 51%|█████ | 11425/22434 [11:21:22<7:46:12, 2.54s/it] +2025-02-05 21:29:02 - ERROR - stderr - +2025-02-05 21:29:02 - ERROR - stderr - +2025-02-05 21:29:02 - INFO - stdout - {'loss': 0.6751, 'grad_norm': 1.1600944995880127, 'learning_rate': 1.0186232547765226e-05, 'epoch': 1.53} +2025-02-05 21:29:02 - ERROR - stderr - 51%|█████ | 11425/22434 [11:21:22<7:46:12, 2.54s/it] +2025-02-05 21:29:05 - ERROR - stderr - 51%|█████ | 
11426/22434 [11:21:24<7:46:43, 2.54s/it] +2025-02-05 21:29:05 - ERROR - stderr - +2025-02-05 21:29:05 - ERROR - stderr - +2025-02-05 21:29:05 - INFO - stdout - {'loss': 0.6336, 'grad_norm': 1.194593906402588, 'learning_rate': 1.01847890495913e-05, 'epoch': 1.53} +2025-02-05 21:29:05 - ERROR - stderr - 51%|█████ | 11426/22434 [11:21:24<7:46:43, 2.54s/it] +2025-02-05 21:29:07 - ERROR - stderr - 51%|█████ | 11427/22434 [11:21:27<7:41:00, 2.51s/it] +2025-02-05 21:29:07 - ERROR - stderr - +2025-02-05 21:29:07 - ERROR - stderr - +2025-02-05 21:29:07 - INFO - stdout - {'loss': 0.6623, 'grad_norm': 1.280401587486267, 'learning_rate': 1.0183345547565624e-05, 'epoch': 1.53} +2025-02-05 21:29:07 - ERROR - stderr - 51%|█████ | 11427/22434 [11:21:27<7:41:00, 2.51s/it] +2025-02-05 21:29:10 - ERROR - stderr - 51%|█████ | 11428/22434 [11:21:29<7:42:36, 2.52s/it] +2025-02-05 21:29:10 - ERROR - stderr - +2025-02-05 21:29:10 - ERROR - stderr - +2025-02-05 21:29:10 - INFO - stdout - {'loss': 0.7071, 'grad_norm': 1.1808527708053589, 'learning_rate': 1.0181902041718284e-05, 'epoch': 1.53} +2025-02-05 21:29:10 - ERROR - stderr - 51%|█████ | 11428/22434 [11:21:29<7:42:36, 2.52s/it] +2025-02-05 21:29:12 - ERROR - stderr - 51%|█████ | 11429/22434 [11:21:32<7:39:20, 2.50s/it] +2025-02-05 21:29:12 - ERROR - stderr - +2025-02-05 21:29:12 - ERROR - stderr - +2025-02-05 21:29:12 - INFO - stdout - {'loss': 0.6334, 'grad_norm': 1.059228539466858, 'learning_rate': 1.0180458532079365e-05, 'epoch': 1.53} +2025-02-05 21:29:12 - ERROR - stderr - 51%|█████ | 11429/22434 [11:21:32<7:39:20, 2.50s/it] +2025-02-05 21:29:15 - ERROR - stderr - 51%|█████ | 11430/22434 [11:21:34<7:37:57, 2.50s/it] +2025-02-05 21:29:15 - ERROR - stderr - +2025-02-05 21:29:15 - ERROR - stderr - +2025-02-05 21:29:15 - INFO - stdout - {'loss': 0.6562, 'grad_norm': 1.0922168493270874, 'learning_rate': 1.0179015018678963e-05, 'epoch': 1.53} +2025-02-05 21:29:15 - ERROR - stderr - 51%|█████ | 11430/22434 [11:21:34<7:37:57, 2.50s/it] 
+2025-02-05 21:29:17 - ERROR - stderr - 51%|█████ | 11431/22434 [11:21:37<7:39:54, 2.51s/it] +2025-02-05 21:29:17 - ERROR - stderr - +2025-02-05 21:29:17 - ERROR - stderr - +2025-02-05 21:29:17 - INFO - stdout - {'loss': 0.6996, 'grad_norm': 1.0760000944137573, 'learning_rate': 1.017757150154717e-05, 'epoch': 1.53} +2025-02-05 21:29:17 - ERROR - stderr - 51%|█████ | 11431/22434 [11:21:37<7:39:54, 2.51s/it] +2025-02-05 21:29:20 - ERROR - stderr - 51%|█████ | 11432/22434 [11:21:39<7:42:41, 2.52s/it] +2025-02-05 21:29:20 - ERROR - stderr - +2025-02-05 21:29:20 - ERROR - stderr - +2025-02-05 21:29:20 - INFO - stdout - {'loss': 0.6616, 'grad_norm': 1.1737550497055054, 'learning_rate': 1.0176127980714063e-05, 'epoch': 1.53} +2025-02-05 21:29:20 - ERROR - stderr - 51%|█████ | 11432/22434 [11:21:40<7:42:41, 2.52s/it] +2025-02-05 21:29:22 - ERROR - stderr - 51%|█████ | 11433/22434 [11:21:42<7:41:46, 2.52s/it] +2025-02-05 21:29:22 - ERROR - stderr - +2025-02-05 21:29:22 - ERROR - stderr - +2025-02-05 21:29:22 - INFO - stdout - {'loss': 0.7044, 'grad_norm': 1.1612838506698608, 'learning_rate': 1.017468445620974e-05, 'epoch': 1.53} +2025-02-05 21:29:22 - ERROR - stderr - 51%|█████ | 11433/22434 [11:21:42<7:41:46, 2.52s/it] +2025-02-05 21:29:25 - ERROR - stderr - 51%|█████ | 11434/22434 [11:21:44<7:39:47, 2.51s/it] +2025-02-05 21:29:25 - ERROR - stderr - +2025-02-05 21:29:25 - ERROR - stderr - +2025-02-05 21:29:25 - INFO - stdout - {'loss': 0.6748, 'grad_norm': 1.2250031232833862, 'learning_rate': 1.0173240928064285e-05, 'epoch': 1.53} +2025-02-05 21:29:25 - ERROR - stderr - 51%|█████ | 11434/22434 [11:21:45<7:39:47, 2.51s/it] +2025-02-05 21:29:27 - ERROR - stderr - 51%|█████ | 11435/22434 [11:21:47<7:38:52, 2.50s/it] +2025-02-05 21:29:27 - ERROR - stderr - +2025-02-05 21:29:27 - ERROR - stderr - +2025-02-05 21:29:27 - INFO - stdout - {'loss': 0.6837, 'grad_norm': 1.104472279548645, 'learning_rate': 1.017179739630779e-05, 'epoch': 1.53} +2025-02-05 21:29:27 - ERROR - stderr - 
51%|█████ | 11435/22434 [11:21:47<7:38:52, 2.50s/it] +2025-02-05 21:29:30 - ERROR - stderr - 51%|█████ | 11436/22434 [11:21:49<7:36:45, 2.49s/it] +2025-02-05 21:29:30 - ERROR - stderr - +2025-02-05 21:29:30 - ERROR - stderr - +2025-02-05 21:29:30 - INFO - stdout - {'loss': 0.72, 'grad_norm': 1.2426345348358154, 'learning_rate': 1.017035386097034e-05, 'epoch': 1.53} +2025-02-05 21:29:30 - ERROR - stderr - 51%|█████ | 11436/22434 [11:21:49<7:36:45, 2.49s/it] +2025-02-05 21:29:32 - ERROR - stderr - 51%|█████ | 11437/22434 [11:21:52<7:48:12, 2.55s/it] +2025-02-05 21:29:32 - ERROR - stderr - +2025-02-05 21:29:32 - ERROR - stderr - +2025-02-05 21:29:32 - INFO - stdout - {'loss': 0.7262, 'grad_norm': 1.2250365018844604, 'learning_rate': 1.0168910322082028e-05, 'epoch': 1.53} +2025-02-05 21:29:32 - ERROR - stderr - 51%|█████ | 11437/22434 [11:21:52<7:48:12, 2.55s/it] +2025-02-05 21:29:35 - ERROR - stderr - 51%|█████ | 11438/22434 [11:21:55<7:43:48, 2.53s/it] +2025-02-05 21:29:35 - ERROR - stderr - +2025-02-05 21:29:35 - ERROR - stderr - +2025-02-05 21:29:35 - INFO - stdout - {'loss': 0.7231, 'grad_norm': 1.3105405569076538, 'learning_rate': 1.0167466779672943e-05, 'epoch': 1.53} +2025-02-05 21:29:35 - ERROR - stderr - 51%|█████ | 11438/22434 [11:21:55<7:43:48, 2.53s/it] +2025-02-05 21:29:37 - ERROR - stderr - 51%|█████ | 11439/22434 [11:21:57<7:49:36, 2.56s/it] +2025-02-05 21:29:38 - ERROR - stderr - +2025-02-05 21:29:38 - ERROR - stderr - +2025-02-05 21:29:38 - INFO - stdout - {'loss': 0.6631, 'grad_norm': 1.1340572834014893, 'learning_rate': 1.0166023233773174e-05, 'epoch': 1.53} +2025-02-05 21:29:38 - ERROR - stderr - 51%|█████ | 11439/22434 [11:21:57<7:49:36, 2.56s/it] +2025-02-05 21:29:40 - ERROR - stderr - 51%|█████ | 11440/22434 [11:22:00<7:47:08, 2.55s/it] +2025-02-05 21:29:40 - ERROR - stderr - +2025-02-05 21:29:40 - ERROR - stderr - +2025-02-05 21:29:40 - INFO - stdout - {'loss': 0.6839, 'grad_norm': 1.0655606985092163, 'learning_rate': 1.0164579684412808e-05, 
'epoch': 1.53} +2025-02-05 21:29:40 - ERROR - stderr - 51%|█████ | 11440/22434 [11:22:00<7:47:08, 2.55s/it] +2025-02-05 21:29:42 - ERROR - stderr - 51%|█████ | 11441/22434 [11:22:02<7:42:13, 2.52s/it] +2025-02-05 21:29:42 - ERROR - stderr - +2025-02-05 21:29:42 - ERROR - stderr - +2025-02-05 21:29:42 - INFO - stdout - {'loss': 0.7004, 'grad_norm': 1.2457494735717773, 'learning_rate': 1.0163136131621937e-05, 'epoch': 1.53} +2025-02-05 21:29:42 - ERROR - stderr - 51%|█████ | 11441/22434 [11:22:02<7:42:13, 2.52s/it] +2025-02-05 21:29:45 - ERROR - stderr - 51%|█████ | 11442/22434 [11:22:05<7:58:38, 2.61s/it] +2025-02-05 21:29:45 - ERROR - stderr - +2025-02-05 21:29:45 - ERROR - stderr - +2025-02-05 21:29:45 - INFO - stdout - {'loss': 0.8023, 'grad_norm': 1.3896231651306152, 'learning_rate': 1.0161692575430646e-05, 'epoch': 1.53} +2025-02-05 21:29:45 - ERROR - stderr - 51%|█████ | 11442/22434 [11:22:05<7:58:38, 2.61s/it] +2025-02-05 21:29:48 - ERROR - stderr - 51%|█████ | 11443/22434 [11:22:07<7:50:40, 2.57s/it] +2025-02-05 21:29:48 - ERROR - stderr - +2025-02-05 21:29:48 - ERROR - stderr - +2025-02-05 21:29:48 - INFO - stdout - {'loss': 0.7615, 'grad_norm': 1.2457791566848755, 'learning_rate': 1.0160249015869032e-05, 'epoch': 1.53} +2025-02-05 21:29:48 - ERROR - stderr - 51%|█████ | 11443/22434 [11:22:08<7:50:40, 2.57s/it] +2025-02-05 21:29:50 - ERROR - stderr - 51%|█████ | 11444/22434 [11:22:10<7:48:19, 2.56s/it] +2025-02-05 21:29:50 - ERROR - stderr - +2025-02-05 21:29:50 - ERROR - stderr - +2025-02-05 21:29:50 - INFO - stdout - {'loss': 0.6902, 'grad_norm': 1.131152629852295, 'learning_rate': 1.015880545296718e-05, 'epoch': 1.53} +2025-02-05 21:29:50 - ERROR - stderr - 51%|█████ | 11444/22434 [11:22:10<7:48:19, 2.56s/it] +2025-02-05 21:29:53 - ERROR - stderr - 51%|█████ | 11445/22434 [11:22:13<7:45:01, 2.54s/it] +2025-02-05 21:29:53 - ERROR - stderr - +2025-02-05 21:29:53 - ERROR - stderr - +2025-02-05 21:29:53 - INFO - stdout - {'loss': 0.7562, 'grad_norm': 
1.2113062143325806, 'learning_rate': 1.0157361886755178e-05, 'epoch': 1.53} +2025-02-05 21:29:53 - ERROR - stderr - 51%|█████ | 11445/22434 [11:22:13<7:45:01, 2.54s/it] +2025-02-05 21:29:55 - ERROR - stderr - 51%|█████ | 11446/22434 [11:22:15<7:41:06, 2.52s/it] +2025-02-05 21:29:55 - ERROR - stderr - +2025-02-05 21:29:55 - ERROR - stderr - +2025-02-05 21:29:55 - INFO - stdout - {'loss': 0.6607, 'grad_norm': 1.165136694908142, 'learning_rate': 1.015591831726312e-05, 'epoch': 1.53} +2025-02-05 21:29:55 - ERROR - stderr - 51%|█████ | 11446/22434 [11:22:15<7:41:06, 2.52s/it] +2025-02-05 21:29:58 - ERROR - stderr - 51%|█████ | 11447/22434 [11:22:17<7:40:40, 2.52s/it] +2025-02-05 21:29:58 - ERROR - stderr - +2025-02-05 21:29:58 - ERROR - stderr - +2025-02-05 21:29:58 - INFO - stdout - {'loss': 0.6691, 'grad_norm': 1.2244077920913696, 'learning_rate': 1.0154474744521094e-05, 'epoch': 1.53} +2025-02-05 21:29:58 - ERROR - stderr - 51%|█████ | 11447/22434 [11:22:18<7:40:40, 2.52s/it] +2025-02-05 21:30:00 - ERROR - stderr - 51%|█████ | 11448/22434 [11:22:20<7:36:13, 2.49s/it] +2025-02-05 21:30:00 - ERROR - stderr - +2025-02-05 21:30:00 - ERROR - stderr - +2025-02-05 21:30:00 - INFO - stdout - {'loss': 0.6248, 'grad_norm': 1.0597703456878662, 'learning_rate': 1.0153031168559188e-05, 'epoch': 1.53} +2025-02-05 21:30:00 - ERROR - stderr - 51%|█████ | 11448/22434 [11:22:20<7:36:13, 2.49s/it] +2025-02-05 21:30:03 - ERROR - stderr - 51%|█████ | 11449/22434 [11:22:22<7:34:55, 2.48s/it] +2025-02-05 21:30:03 - ERROR - stderr - +2025-02-05 21:30:03 - ERROR - stderr - +2025-02-05 21:30:03 - INFO - stdout - {'loss': 0.7019, 'grad_norm': 1.0311192274093628, 'learning_rate': 1.0151587589407494e-05, 'epoch': 1.53} +2025-02-05 21:30:03 - ERROR - stderr - 51%|█████ | 11449/22434 [11:22:22<7:34:55, 2.48s/it] +2025-02-05 21:30:05 - ERROR - stderr - 51%|█████ | 11450/22434 [11:22:25<7:34:32, 2.48s/it] +2025-02-05 21:30:05 - ERROR - stderr - +2025-02-05 21:30:05 - ERROR - stderr - +2025-02-05 
21:30:05 - INFO - stdout - {'loss': 0.6998, 'grad_norm': 1.1861071586608887, 'learning_rate': 1.0150144007096103e-05, 'epoch': 1.53} +2025-02-05 21:30:05 - ERROR - stderr - 51%|█████ | 11450/22434 [11:22:25<7:34:32, 2.48s/it] +2025-02-05 21:30:08 - ERROR - stderr - 51%|█████ | 11451/22434 [11:22:27<7:32:02, 2.47s/it] +2025-02-05 21:30:08 - ERROR - stderr - +2025-02-05 21:30:08 - ERROR - stderr - +2025-02-05 21:30:08 - INFO - stdout - {'loss': 0.6595, 'grad_norm': 1.2244356870651245, 'learning_rate': 1.0148700421655105e-05, 'epoch': 1.53} +2025-02-05 21:30:08 - ERROR - stderr - 51%|█████ | 11451/22434 [11:22:27<7:32:02, 2.47s/it] +2025-02-05 21:30:10 - ERROR - stderr - 51%|█████ | 11452/22434 [11:22:30<7:32:17, 2.47s/it] +2025-02-05 21:30:10 - ERROR - stderr - +2025-02-05 21:30:10 - ERROR - stderr - +2025-02-05 21:30:10 - INFO - stdout - {'loss': 0.6291, 'grad_norm': 1.0164098739624023, 'learning_rate': 1.0147256833114586e-05, 'epoch': 1.53} +2025-02-05 21:30:10 - ERROR - stderr - 51%|█████ | 11452/22434 [11:22:30<7:32:17, 2.47s/it] +2025-02-05 21:30:12 - ERROR - stderr - 51%|█████ | 11453/22434 [11:22:32<7:30:02, 2.46s/it] +2025-02-05 21:30:13 - ERROR - stderr - +2025-02-05 21:30:13 - ERROR - stderr - +2025-02-05 21:30:13 - INFO - stdout - {'loss': 0.67, 'grad_norm': 1.1831355094909668, 'learning_rate': 1.0145813241504642e-05, 'epoch': 1.53} +2025-02-05 21:30:13 - ERROR - stderr - 51%|█████ | 11453/22434 [11:22:32<7:30:02, 2.46s/it] +2025-02-05 21:30:15 - ERROR - stderr - 51%|█████ | 11454/22434 [11:22:35<7:34:05, 2.48s/it] +2025-02-05 21:30:15 - ERROR - stderr - +2025-02-05 21:30:15 - ERROR - stderr - +2025-02-05 21:30:15 - INFO - stdout - {'loss': 0.6139, 'grad_norm': 1.118692398071289, 'learning_rate': 1.014436964685536e-05, 'epoch': 1.53} +2025-02-05 21:30:15 - ERROR - stderr - 51%|█████ | 11454/22434 [11:22:35<7:34:05, 2.48s/it] +2025-02-05 21:30:18 - ERROR - stderr - 51%|█████ | 11455/22434 [11:22:37<7:38:53, 2.51s/it] +2025-02-05 21:30:18 - ERROR - stderr - 
+2025-02-05 21:30:18 - ERROR - stderr - +2025-02-05 21:30:18 - INFO - stdout - {'loss': 0.6728, 'grad_norm': 1.1892640590667725, 'learning_rate': 1.0142926049196829e-05, 'epoch': 1.53} +2025-02-05 21:30:18 - ERROR - stderr - 51%|█████ | 11455/22434 [11:22:37<7:38:53, 2.51s/it] +2025-02-05 21:30:20 - ERROR - stderr - 51%|█████ | 11456/22434 [11:22:40<7:36:31, 2.50s/it] +2025-02-05 21:30:20 - ERROR - stderr - +2025-02-05 21:30:20 - ERROR - stderr - +2025-02-05 21:30:20 - INFO - stdout - {'loss': 0.7006, 'grad_norm': 1.2539464235305786, 'learning_rate': 1.0141482448559142e-05, 'epoch': 1.53} +2025-02-05 21:30:20 - ERROR - stderr - 51%|█████ | 11456/22434 [11:22:40<7:36:31, 2.50s/it] +2025-02-05 21:30:22 - ERROR - stderr - 51%|█████ | 11457/22434 [11:22:42<7:34:41, 2.49s/it] +2025-02-05 21:30:23 - ERROR - stderr - +2025-02-05 21:30:23 - ERROR - stderr - +2025-02-05 21:30:23 - INFO - stdout - {'loss': 0.6157, 'grad_norm': 1.0876903533935547, 'learning_rate': 1.0140038844972389e-05, 'epoch': 1.53} +2025-02-05 21:30:23 - ERROR - stderr - 51%|█████ | 11457/22434 [11:22:42<7:34:41, 2.49s/it] +2025-02-05 21:30:25 - ERROR - stderr - 51%|█████ | 11458/22434 [11:22:45<7:32:08, 2.47s/it] +2025-02-05 21:30:25 - ERROR - stderr - +2025-02-05 21:30:25 - ERROR - stderr - +2025-02-05 21:30:25 - INFO - stdout - {'loss': 0.793, 'grad_norm': 1.339532732963562, 'learning_rate': 1.0138595238466659e-05, 'epoch': 1.53} +2025-02-05 21:30:25 - ERROR - stderr - 51%|█████ | 11458/22434 [11:22:45<7:32:08, 2.47s/it] +2025-02-05 21:30:27 - ERROR - stderr - 51%|█████ | 11459/22434 [11:22:47<7:30:42, 2.46s/it] +2025-02-05 21:30:27 - ERROR - stderr - +2025-02-05 21:30:27 - ERROR - stderr - +2025-02-05 21:30:27 - INFO - stdout - {'loss': 0.6923, 'grad_norm': 1.134891152381897, 'learning_rate': 1.0137151629072049e-05, 'epoch': 1.53} +2025-02-05 21:30:27 - ERROR - stderr - 51%|█████ | 11459/22434 [11:22:47<7:30:42, 2.46s/it] +2025-02-05 21:30:30 - ERROR - stderr - 51%|█████ | 11460/22434 
[11:22:50<7:31:38, 2.47s/it] +2025-02-05 21:30:30 - ERROR - stderr - +2025-02-05 21:30:30 - ERROR - stderr - +2025-02-05 21:30:30 - INFO - stdout - {'loss': 0.6954, 'grad_norm': 1.2790517807006836, 'learning_rate': 1.013570801681864e-05, 'epoch': 1.53} +2025-02-05 21:30:30 - ERROR - stderr - 51%|█████ | 11460/22434 [11:22:50<7:31:38, 2.47s/it] +2025-02-05 21:30:32 - ERROR - stderr - 51%|█████ | 11461/22434 [11:22:52<7:30:09, 2.46s/it] +2025-02-05 21:30:32 - ERROR - stderr - +2025-02-05 21:30:32 - ERROR - stderr - +2025-02-05 21:30:32 - INFO - stdout - {'loss': 0.6371, 'grad_norm': 1.1411367654800415, 'learning_rate': 1.0134264401736526e-05, 'epoch': 1.53} +2025-02-05 21:30:32 - ERROR - stderr - 51%|█████ | 11461/22434 [11:22:52<7:30:09, 2.46s/it] +2025-02-05 21:30:35 - ERROR - stderr - 51%|█████ | 11462/22434 [11:22:55<7:34:03, 2.48s/it] +2025-02-05 21:30:35 - ERROR - stderr - +2025-02-05 21:30:35 - ERROR - stderr - +2025-02-05 21:30:35 - INFO - stdout - {'loss': 0.6875, 'grad_norm': 1.201743245124817, 'learning_rate': 1.0132820783855801e-05, 'epoch': 1.53} +2025-02-05 21:30:35 - ERROR - stderr - 51%|█████ | 11462/22434 [11:22:55<7:34:03, 2.48s/it] +2025-02-05 21:30:37 - ERROR - stderr - 51%|█████ | 11463/22434 [11:22:57<7:34:37, 2.49s/it] +2025-02-05 21:30:37 - ERROR - stderr - +2025-02-05 21:30:37 - ERROR - stderr - +2025-02-05 21:30:37 - INFO - stdout - {'loss': 0.6821, 'grad_norm': 1.15653395652771, 'learning_rate': 1.0131377163206555e-05, 'epoch': 1.53} +2025-02-05 21:30:37 - ERROR - stderr - 51%|█████ | 11463/22434 [11:22:57<7:34:37, 2.49s/it] +2025-02-05 21:30:40 - ERROR - stderr - 51%|█████ | 11464/22434 [11:23:00<7:32:26, 2.47s/it] +2025-02-05 21:30:40 - ERROR - stderr - +2025-02-05 21:30:40 - ERROR - stderr - +2025-02-05 21:30:40 - INFO - stdout - {'loss': 0.6963, 'grad_norm': 1.2586740255355835, 'learning_rate': 1.0129933539818878e-05, 'epoch': 1.53} +2025-02-05 21:30:40 - ERROR - stderr - 51%|█████ | 11464/22434 [11:23:00<7:32:26, 2.47s/it] +2025-02-05 
21:30:42 - ERROR - stderr - 51%|█████ | 11465/22434 [11:23:02<7:34:12, 2.48s/it] +2025-02-05 21:30:42 - ERROR - stderr - +2025-02-05 21:30:42 - ERROR - stderr - +2025-02-05 21:30:42 - INFO - stdout - {'loss': 0.7236, 'grad_norm': 1.1018822193145752, 'learning_rate': 1.012848991372286e-05, 'epoch': 1.53} +2025-02-05 21:30:42 - ERROR - stderr - 51%|█████ | 11465/22434 [11:23:02<7:34:12, 2.48s/it] +2025-02-05 21:30:45 - ERROR - stderr - 51%|█████ | 11466/22434 [11:23:05<7:37:00, 2.50s/it] +2025-02-05 21:30:45 - ERROR - stderr - +2025-02-05 21:30:45 - ERROR - stderr - +2025-02-05 21:30:45 - INFO - stdout - {'loss': 0.7039, 'grad_norm': 1.1289311647415161, 'learning_rate': 1.012704628494859e-05, 'epoch': 1.53} +2025-02-05 21:30:45 - ERROR - stderr - 51%|█████ | 11466/22434 [11:23:05<7:37:00, 2.50s/it] +2025-02-05 21:30:47 - ERROR - stderr - 51%|█████ | 11467/22434 [11:23:07<7:35:44, 2.49s/it] +2025-02-05 21:30:47 - ERROR - stderr - +2025-02-05 21:30:47 - ERROR - stderr - +2025-02-05 21:30:47 - INFO - stdout - {'loss': 0.7023, 'grad_norm': 1.230009913444519, 'learning_rate': 1.0125602653526164e-05, 'epoch': 1.53} +2025-02-05 21:30:47 - ERROR - stderr - 51%|█████ | 11467/22434 [11:23:07<7:35:44, 2.49s/it] +2025-02-05 21:30:50 - ERROR - stderr - 51%|█████ | 11468/22434 [11:23:10<7:36:24, 2.50s/it] +2025-02-05 21:30:50 - ERROR - stderr - +2025-02-05 21:30:50 - ERROR - stderr - +2025-02-05 21:30:50 - INFO - stdout - {'loss': 0.6821, 'grad_norm': 1.1485594511032104, 'learning_rate': 1.012415901948567e-05, 'epoch': 1.53} +2025-02-05 21:30:50 - ERROR - stderr - 51%|█████ | 11468/22434 [11:23:10<7:36:24, 2.50s/it] +2025-02-05 21:30:52 - ERROR - stderr - 51%|█████ | 11469/22434 [11:23:12<7:34:40, 2.49s/it] +2025-02-05 21:30:52 - ERROR - stderr - +2025-02-05 21:30:52 - ERROR - stderr - +2025-02-05 21:30:52 - INFO - stdout - {'loss': 0.6523, 'grad_norm': 1.136791467666626, 'learning_rate': 1.01227153828572e-05, 'epoch': 1.53} +2025-02-05 21:30:52 - ERROR - stderr - 51%|█████ | 
11469/22434 [11:23:12<7:34:40, 2.49s/it] +2025-02-05 21:30:55 - ERROR - stderr - 51%|█████ | 11470/22434 [11:23:15<7:36:59, 2.50s/it] +2025-02-05 21:30:55 - ERROR - stderr - +2025-02-05 21:30:55 - ERROR - stderr - +2025-02-05 21:30:55 - INFO - stdout - {'loss': 0.7133, 'grad_norm': 1.3130242824554443, 'learning_rate': 1.0121271743670846e-05, 'epoch': 1.53} +2025-02-05 21:30:55 - ERROR - stderr - 51%|█████ | 11470/22434 [11:23:15<7:36:59, 2.50s/it] +2025-02-05 21:30:57 - ERROR - stderr - 51%|█████ | 11471/22434 [11:23:17<7:33:18, 2.48s/it] +2025-02-05 21:30:57 - ERROR - stderr - +2025-02-05 21:30:57 - ERROR - stderr - +2025-02-05 21:30:57 - INFO - stdout - {'loss': 0.773, 'grad_norm': 1.3219910860061646, 'learning_rate': 1.01198281019567e-05, 'epoch': 1.53} +2025-02-05 21:30:57 - ERROR - stderr - 51%|█████ | 11471/22434 [11:23:17<7:33:18, 2.48s/it] +2025-02-05 21:31:00 - ERROR - stderr - 51%|█████ | 11472/22434 [11:23:20<7:35:41, 2.49s/it] +2025-02-05 21:31:00 - ERROR - stderr - +2025-02-05 21:31:00 - ERROR - stderr - +2025-02-05 21:31:00 - INFO - stdout - {'loss': 0.6669, 'grad_norm': 1.1638140678405762, 'learning_rate': 1.011838445774485e-05, 'epoch': 1.53} +2025-02-05 21:31:00 - ERROR - stderr - 51%|█████ | 11472/22434 [11:23:20<7:35:41, 2.49s/it] +2025-02-05 21:31:02 - ERROR - stderr - 51%|█████ | 11473/22434 [11:23:22<7:39:59, 2.52s/it] +2025-02-05 21:31:02 - ERROR - stderr - +2025-02-05 21:31:02 - ERROR - stderr - +2025-02-05 21:31:02 - INFO - stdout - {'loss': 0.6424, 'grad_norm': 1.1685967445373535, 'learning_rate': 1.011694081106539e-05, 'epoch': 1.53} +2025-02-05 21:31:02 - ERROR - stderr - 51%|█████ | 11473/22434 [11:23:22<7:39:59, 2.52s/it] +2025-02-05 21:31:05 - ERROR - stderr - 51%|█████ | 11474/22434 [11:23:25<7:47:28, 2.56s/it] +2025-02-05 21:31:05 - ERROR - stderr - +2025-02-05 21:31:05 - ERROR - stderr - +2025-02-05 21:31:05 - INFO - stdout - {'loss': 0.6351, 'grad_norm': 1.2603915929794312, 'learning_rate': 1.0115497161948409e-05, 'epoch': 1.53} 
+2025-02-05 21:31:05 - ERROR - stderr - 51%|█████ | 11474/22434 [11:23:25<7:47:28, 2.56s/it] +2025-02-05 21:31:08 - ERROR - stderr - 51%|█████ | 11475/22434 [11:23:27<7:46:18, 2.55s/it] +2025-02-05 21:31:08 - ERROR - stderr - +2025-02-05 21:31:08 - ERROR - stderr - +2025-02-05 21:31:08 - INFO - stdout - {'loss': 0.6909, 'grad_norm': 1.1106369495391846, 'learning_rate': 1.0114053510424e-05, 'epoch': 1.53} +2025-02-05 21:31:08 - ERROR - stderr - 51%|█████ | 11475/22434 [11:23:27<7:46:18, 2.55s/it] +2025-02-05 21:31:10 - ERROR - stderr - 51%|█████ | 11476/22434 [11:23:30<7:40:35, 2.52s/it] +2025-02-05 21:31:10 - ERROR - stderr - +2025-02-05 21:31:10 - ERROR - stderr - +2025-02-05 21:31:10 - INFO - stdout - {'loss': 0.7292, 'grad_norm': 1.3990757465362549, 'learning_rate': 1.0112609856522259e-05, 'epoch': 1.53} +2025-02-05 21:31:10 - ERROR - stderr - 51%|█████ | 11476/22434 [11:23:30<7:40:35, 2.52s/it] +2025-02-05 21:31:12 - ERROR - stderr - 51%|█████ | 11477/22434 [11:23:32<7:37:22, 2.50s/it] +2025-02-05 21:31:12 - ERROR - stderr - +2025-02-05 21:31:12 - ERROR - stderr - +2025-02-05 21:31:12 - INFO - stdout - {'loss': 0.6883, 'grad_norm': 1.2309085130691528, 'learning_rate': 1.011116620027327e-05, 'epoch': 1.53} +2025-02-05 21:31:12 - ERROR - stderr - 51%|█████ | 11477/22434 [11:23:32<7:37:22, 2.50s/it] +2025-02-05 21:31:15 - ERROR - stderr - 51%|█████ | 11478/22434 [11:23:35<7:37:34, 2.51s/it] +2025-02-05 21:31:15 - ERROR - stderr - +2025-02-05 21:31:15 - ERROR - stderr - +2025-02-05 21:31:15 - INFO - stdout - {'loss': 0.7455, 'grad_norm': 1.2403303384780884, 'learning_rate': 1.0109722541707127e-05, 'epoch': 1.53} +2025-02-05 21:31:15 - ERROR - stderr - 51%|█████ | 11478/22434 [11:23:35<7:37:34, 2.51s/it] +2025-02-05 21:31:18 - ERROR - stderr - 51%|█████ | 11479/22434 [11:23:38<7:52:55, 2.59s/it] +2025-02-05 21:31:18 - ERROR - stderr - +2025-02-05 21:31:18 - ERROR - stderr - +2025-02-05 21:31:18 - INFO - stdout - {'loss': 0.5785, 'grad_norm': 1.1004066467285156, 
'learning_rate': 1.0108278880853925e-05, 'epoch': 1.54} +2025-02-05 21:31:18 - ERROR - stderr - 51%|█████ | 11479/22434 [11:23:38<7:52:55, 2.59s/it] +2025-02-05 21:31:20 - ERROR - stderr - 51%|█████ | 11480/22434 [11:23:40<7:45:51, 2.55s/it] +2025-02-05 21:31:20 - ERROR - stderr - +2025-02-05 21:31:20 - ERROR - stderr - +2025-02-05 21:31:20 - INFO - stdout - {'loss': 0.7561, 'grad_norm': 1.2963147163391113, 'learning_rate': 1.0106835217743753e-05, 'epoch': 1.54} +2025-02-05 21:31:20 - ERROR - stderr - 51%|█████ | 11480/22434 [11:23:40<7:45:51, 2.55s/it] +2025-02-05 21:31:23 - ERROR - stderr - 51%|█████ | 11481/22434 [11:23:42<7:43:56, 2.54s/it] +2025-02-05 21:31:23 - ERROR - stderr - +2025-02-05 21:31:23 - ERROR - stderr - +2025-02-05 21:31:23 - INFO - stdout - {'loss': 0.8128, 'grad_norm': 1.3116780519485474, 'learning_rate': 1.0105391552406703e-05, 'epoch': 1.54} +2025-02-05 21:31:23 - ERROR - stderr - 51%|█████ | 11481/22434 [11:23:43<7:43:56, 2.54s/it] +2025-02-05 21:31:25 - ERROR - stderr - 51%|█████ | 11482/22434 [11:23:45<7:43:12, 2.54s/it] +2025-02-05 21:31:25 - ERROR - stderr - +2025-02-05 21:31:25 - ERROR - stderr - +2025-02-05 21:31:25 - INFO - stdout - {'loss': 0.6896, 'grad_norm': 1.16659414768219, 'learning_rate': 1.0103947884872865e-05, 'epoch': 1.54} +2025-02-05 21:31:25 - ERROR - stderr - 51%|█████ | 11482/22434 [11:23:45<7:43:12, 2.54s/it] +2025-02-05 21:31:28 - ERROR - stderr - 51%|█████ | 11483/22434 [11:23:47<7:36:56, 2.50s/it] +2025-02-05 21:31:28 - ERROR - stderr - +2025-02-05 21:31:28 - ERROR - stderr - +2025-02-05 21:31:28 - INFO - stdout - {'loss': 0.6757, 'grad_norm': 1.237070918083191, 'learning_rate': 1.0102504215172335e-05, 'epoch': 1.54} +2025-02-05 21:31:28 - ERROR - stderr - 51%|█████ | 11483/22434 [11:23:47<7:36:56, 2.50s/it] +2025-02-05 21:31:30 - ERROR - stderr - 51%|█████ | 11484/22434 [11:23:50<7:35:56, 2.50s/it] +2025-02-05 21:31:30 - ERROR - stderr - +2025-02-05 21:31:30 - ERROR - stderr - +2025-02-05 21:31:30 - INFO - stdout 
- {'loss': 0.7349, 'grad_norm': 1.2002946138381958, 'learning_rate': 1.0101060543335204e-05, 'epoch': 1.54} +2025-02-05 21:31:30 - ERROR - stderr - 51%|█████ | 11484/22434 [11:23:50<7:35:56, 2.50s/it] +2025-02-05 21:31:33 - ERROR - stderr - 51%|█████ | 11485/22434 [11:23:52<7:36:06, 2.50s/it] +2025-02-05 21:31:33 - ERROR - stderr - +2025-02-05 21:31:33 - ERROR - stderr - +2025-02-05 21:31:33 - INFO - stdout - {'loss': 0.7396, 'grad_norm': 1.471403956413269, 'learning_rate': 1.009961686939156e-05, 'epoch': 1.54} +2025-02-05 21:31:33 - ERROR - stderr - 51%|█████ | 11485/22434 [11:23:52<7:36:06, 2.50s/it] +2025-02-05 21:31:35 - ERROR - stderr - 51%|█████ | 11486/22434 [11:23:55<7:31:05, 2.47s/it] +2025-02-05 21:31:35 - ERROR - stderr - +2025-02-05 21:31:35 - ERROR - stderr - +2025-02-05 21:31:35 - INFO - stdout - {'loss': 0.6589, 'grad_norm': 1.091191053390503, 'learning_rate': 1.0098173193371498e-05, 'epoch': 1.54} +2025-02-05 21:31:35 - ERROR - stderr - 51%|█████ | 11486/22434 [11:23:55<7:31:05, 2.47s/it] +2025-02-05 21:31:38 - ERROR - stderr - 51%|█████ | 11487/22434 [11:23:58<7:50:36, 2.58s/it] +2025-02-05 21:31:38 - ERROR - stderr - +2025-02-05 21:31:38 - ERROR - stderr - +2025-02-05 21:31:38 - INFO - stdout - {'loss': 0.7205, 'grad_norm': 1.2951558828353882, 'learning_rate': 1.0096729515305108e-05, 'epoch': 1.54} +2025-02-05 21:31:38 - ERROR - stderr - 51%|█████ | 11487/22434 [11:23:58<7:50:36, 2.58s/it] +2025-02-05 21:31:40 - ERROR - stderr - 51%|█████ | 11488/22434 [11:24:00<7:44:57, 2.55s/it] +2025-02-05 21:31:40 - ERROR - stderr - +2025-02-05 21:31:40 - ERROR - stderr - +2025-02-05 21:31:40 - INFO - stdout - {'loss': 0.6545, 'grad_norm': 1.0998430252075195, 'learning_rate': 1.0095285835222488e-05, 'epoch': 1.54} +2025-02-05 21:31:40 - ERROR - stderr - 51%|█████ | 11488/22434 [11:24:00<7:44:57, 2.55s/it] +2025-02-05 21:31:43 - ERROR - stderr - 51%|█████ | 11489/22434 [11:24:03<7:44:25, 2.55s/it] +2025-02-05 21:31:43 - ERROR - stderr - +2025-02-05 21:31:43 - 
ERROR - stderr - +2025-02-05 21:31:43 - INFO - stdout - {'loss': 0.7418, 'grad_norm': 1.2637847661972046, 'learning_rate': 1.0093842153153723e-05, 'epoch': 1.54} +2025-02-05 21:31:43 - ERROR - stderr - 51%|█████ | 11489/22434 [11:24:03<7:44:25, 2.55s/it] +2025-02-05 21:31:45 - ERROR - stderr - 51%|█████ | 11490/22434 [11:24:05<7:39:53, 2.52s/it] +2025-02-05 21:31:45 - ERROR - stderr - +2025-02-05 21:31:45 - ERROR - stderr - +2025-02-05 21:31:45 - INFO - stdout - {'loss': 0.6821, 'grad_norm': 1.2122328281402588, 'learning_rate': 1.009239846912891e-05, 'epoch': 1.54} +2025-02-05 21:31:45 - ERROR - stderr - 51%|█████ | 11490/22434 [11:24:05<7:39:53, 2.52s/it] +2025-02-05 21:31:48 - ERROR - stderr - 51%|█████ | 11491/22434 [11:24:08<7:38:17, 2.51s/it] +2025-02-05 21:31:48 - ERROR - stderr - +2025-02-05 21:31:48 - ERROR - stderr - +2025-02-05 21:31:48 - INFO - stdout - {'loss': 0.7007, 'grad_norm': 1.2284464836120605, 'learning_rate': 1.0090954783178137e-05, 'epoch': 1.54} +2025-02-05 21:31:48 - ERROR - stderr - 51%|█████ | 11491/22434 [11:24:08<7:38:17, 2.51s/it] +2025-02-05 21:31:51 - ERROR - stderr - 51%|█████ | 11492/22434 [11:24:11<7:58:50, 2.63s/it] +2025-02-05 21:31:51 - ERROR - stderr - +2025-02-05 21:31:51 - ERROR - stderr - +2025-02-05 21:31:51 - INFO - stdout - {'loss': 0.6402, 'grad_norm': 1.08121919631958, 'learning_rate': 1.00895110953315e-05, 'epoch': 1.54} +2025-02-05 21:31:51 - ERROR - stderr - 51%|█████ | 11492/22434 [11:24:11<7:58:50, 2.63s/it] +2025-02-05 21:31:54 - ERROR - stderr - 51%|█████ | 11493/22434 [11:24:13<8:07:02, 2.67s/it] +2025-02-05 21:31:54 - ERROR - stderr - +2025-02-05 21:31:54 - ERROR - stderr - +2025-02-05 21:31:54 - INFO - stdout - {'loss': 0.6675, 'grad_norm': 1.3623188734054565, 'learning_rate': 1.0088067405619088e-05, 'epoch': 1.54} +2025-02-05 21:31:54 - ERROR - stderr - 51%|█████ | 11493/22434 [11:24:13<8:07:02, 2.67s/it] +2025-02-05 21:31:56 - ERROR - stderr - 51%|█████ | 11494/22434 [11:24:16<8:06:45, 2.67s/it] +2025-02-05 
21:31:56 - ERROR - stderr - +2025-02-05 21:31:56 - ERROR - stderr - +2025-02-05 21:31:56 - INFO - stdout - {'loss': 0.621, 'grad_norm': 1.1458266973495483, 'learning_rate': 1.0086623714070998e-05, 'epoch': 1.54} +2025-02-05 21:31:56 - ERROR - stderr - 51%|█████ | 11494/22434 [11:24:16<8:06:45, 2.67s/it] +2025-02-05 21:31:59 - ERROR - stderr - 51%|█████ | 11495/22434 [11:24:18<7:57:24, 2.62s/it] +2025-02-05 21:31:59 - ERROR - stderr - +2025-02-05 21:31:59 - ERROR - stderr - +2025-02-05 21:31:59 - INFO - stdout - {'loss': 0.6663, 'grad_norm': 1.2444700002670288, 'learning_rate': 1.0085180020717318e-05, 'epoch': 1.54} +2025-02-05 21:31:59 - ERROR - stderr - 51%|█████ | 11495/22434 [11:24:19<7:57:24, 2.62s/it] +2025-02-05 21:32:01 - ERROR - stderr - 51%|█████ | 11496/22434 [11:24:21<7:49:49, 2.58s/it] +2025-02-05 21:32:01 - ERROR - stderr - +2025-02-05 21:32:01 - ERROR - stderr - +2025-02-05 21:32:01 - INFO - stdout - {'loss': 0.7342, 'grad_norm': 1.1638400554656982, 'learning_rate': 1.0083736325588145e-05, 'epoch': 1.54} +2025-02-05 21:32:01 - ERROR - stderr - 51%|█████ | 11496/22434 [11:24:21<7:49:49, 2.58s/it] +2025-02-05 21:32:04 - ERROR - stderr - 51%|█████ | 11497/22434 [11:24:23<7:42:18, 2.54s/it] +2025-02-05 21:32:04 - ERROR - stderr - +2025-02-05 21:32:04 - ERROR - stderr - +2025-02-05 21:32:04 - INFO - stdout - {'loss': 0.6049, 'grad_norm': 1.234995722770691, 'learning_rate': 1.0082292628713566e-05, 'epoch': 1.54} +2025-02-05 21:32:04 - ERROR - stderr - 51%|█████ | 11497/22434 [11:24:23<7:42:18, 2.54s/it] +2025-02-05 21:32:06 - ERROR - stderr - 51%|█████▏ | 11498/22434 [11:24:26<7:40:59, 2.53s/it] +2025-02-05 21:32:06 - ERROR - stderr - +2025-02-05 21:32:06 - ERROR - stderr - +2025-02-05 21:32:06 - INFO - stdout - {'loss': 0.6277, 'grad_norm': 1.044304370880127, 'learning_rate': 1.0080848930123674e-05, 'epoch': 1.54} +2025-02-05 21:32:06 - ERROR - stderr - 51%|█████▏ | 11498/22434 [11:24:26<7:40:59, 2.53s/it] +2025-02-05 21:32:09 - ERROR - stderr - 51%|█████▏ 
| 11499/22434 [11:24:28<7:35:49, 2.50s/it] +2025-02-05 21:32:09 - ERROR - stderr - +2025-02-05 21:32:09 - ERROR - stderr - +2025-02-05 21:32:09 - INFO - stdout - {'loss': 0.6488, 'grad_norm': 1.1334415674209595, 'learning_rate': 1.0079405229848566e-05, 'epoch': 1.54} +2025-02-05 21:32:09 - ERROR - stderr - 51%|█████▏ | 11499/22434 [11:24:28<7:35:49, 2.50s/it] +2025-02-05 21:32:11 - ERROR - stderr - 51%|█████▏ | 11500/22434 [11:24:31<7:38:14, 2.51s/it] +2025-02-05 21:32:11 - ERROR - stderr - +2025-02-05 21:32:11 - ERROR - stderr - +2025-02-05 21:32:11 - INFO - stdout - {'loss': 0.727, 'grad_norm': 1.270363211631775, 'learning_rate': 1.0077961527918332e-05, 'epoch': 1.54} +2025-02-05 21:32:11 - ERROR - stderr - 51%|█████▏ | 11500/22434 [11:24:31<7:38:14, 2.51s/it] +2025-02-05 21:32:14 - ERROR - stderr - 51%|█████▏ | 11501/22434 [11:24:33<7:36:37, 2.51s/it] +2025-02-05 21:32:14 - ERROR - stderr - +2025-02-05 21:32:14 - ERROR - stderr - +2025-02-05 21:32:14 - INFO - stdout - {'loss': 0.7331, 'grad_norm': 1.2057033777236938, 'learning_rate': 1.0076517824363063e-05, 'epoch': 1.54} +2025-02-05 21:32:14 - ERROR - stderr - 51%|█████▏ | 11501/22434 [11:24:33<7:36:37, 2.51s/it] +2025-02-05 21:32:16 - ERROR - stderr - 51%|█████▏ | 11502/22434 [11:24:36<7:33:09, 2.49s/it] +2025-02-05 21:32:16 - ERROR - stderr - +2025-02-05 21:32:16 - ERROR - stderr - +2025-02-05 21:32:16 - INFO - stdout - {'loss': 0.6078, 'grad_norm': 1.27130925655365, 'learning_rate': 1.0075074119212854e-05, 'epoch': 1.54} +2025-02-05 21:32:16 - ERROR - stderr - 51%|█████▏ | 11502/22434 [11:24:36<7:33:09, 2.49s/it] +2025-02-05 21:32:19 - ERROR - stderr - 51%|█████▏ | 11503/22434 [11:24:38<7:32:32, 2.48s/it] +2025-02-05 21:32:19 - ERROR - stderr - +2025-02-05 21:32:19 - ERROR - stderr - +2025-02-05 21:32:19 - INFO - stdout - {'loss': 0.7093, 'grad_norm': 1.2937345504760742, 'learning_rate': 1.0073630412497796e-05, 'epoch': 1.54} +2025-02-05 21:32:19 - ERROR - stderr - 51%|█████▏ | 11503/22434 [11:24:38<7:32:32, 
2.48s/it] +2025-02-05 21:32:21 - ERROR - stderr - 51%|█████▏ | 11504/22434 [11:24:41<7:32:18, 2.48s/it] +2025-02-05 21:32:21 - ERROR - stderr - +2025-02-05 21:32:21 - ERROR - stderr - +2025-02-05 21:32:21 - INFO - stdout - {'loss': 0.665, 'grad_norm': 1.169643521308899, 'learning_rate': 1.0072186704247987e-05, 'epoch': 1.54} +2025-02-05 21:32:21 - ERROR - stderr - 51%|█████▏ | 11504/22434 [11:24:41<7:32:18, 2.48s/it] +2025-02-05 21:32:24 - ERROR - stderr - 51%|█████▏ | 11505/22434 [11:24:44<7:49:14, 2.58s/it] +2025-02-05 21:32:24 - ERROR - stderr - +2025-02-05 21:32:24 - ERROR - stderr - +2025-02-05 21:32:24 - INFO - stdout - {'loss': 0.7799, 'grad_norm': 1.4491045475006104, 'learning_rate': 1.007074299449351e-05, 'epoch': 1.54} +2025-02-05 21:32:24 - ERROR - stderr - 51%|█████▏ | 11505/22434 [11:24:44<7:49:14, 2.58s/it] +2025-02-05 21:32:26 - ERROR - stderr - 51%|█████▏ | 11506/22434 [11:24:46<7:42:53, 2.54s/it] +2025-02-05 21:32:26 - ERROR - stderr - +2025-02-05 21:32:26 - ERROR - stderr - +2025-02-05 21:32:26 - INFO - stdout - {'loss': 0.7786, 'grad_norm': 1.3527846336364746, 'learning_rate': 1.0069299283264463e-05, 'epoch': 1.54} +2025-02-05 21:32:26 - ERROR - stderr - 51%|█████▏ | 11506/22434 [11:24:46<7:42:53, 2.54s/it] +2025-02-05 21:32:29 - ERROR - stderr - 51%|█████▏ | 11507/22434 [11:24:48<7:38:42, 2.52s/it] +2025-02-05 21:32:29 - ERROR - stderr - +2025-02-05 21:32:29 - ERROR - stderr - +2025-02-05 21:32:29 - INFO - stdout - {'loss': 0.6825, 'grad_norm': 1.1357488632202148, 'learning_rate': 1.0067855570590939e-05, 'epoch': 1.54} +2025-02-05 21:32:29 - ERROR - stderr - 51%|█████▏ | 11507/22434 [11:24:49<7:38:42, 2.52s/it] +2025-02-05 21:32:31 - ERROR - stderr - 51%|█████▏ | 11508/22434 [11:24:51<7:37:54, 2.51s/it] +2025-02-05 21:32:31 - ERROR - stderr - +2025-02-05 21:32:31 - ERROR - stderr - +2025-02-05 21:32:31 - INFO - stdout - {'loss': 0.645, 'grad_norm': 1.1992802619934082, 'learning_rate': 1.0066411856503034e-05, 'epoch': 1.54} +2025-02-05 21:32:31 - 
ERROR - stderr - 51%|█████▏ | 11508/22434 [11:24:51<7:37:54, 2.51s/it] +2025-02-05 21:32:34 - ERROR - stderr - 51%|█████▏ | 11509/22434 [11:24:53<7:33:32, 2.49s/it] +2025-02-05 21:32:34 - ERROR - stderr - +2025-02-05 21:32:34 - ERROR - stderr - +2025-02-05 21:32:34 - INFO - stdout - {'loss': 0.72, 'grad_norm': 1.3136249780654907, 'learning_rate': 1.0064968141030835e-05, 'epoch': 1.54} +2025-02-05 21:32:34 - ERROR - stderr - 51%|█████▏ | 11509/22434 [11:24:53<7:33:32, 2.49s/it] +2025-02-05 21:32:36 - ERROR - stderr - 51%|█████▏ | 11510/22434 [11:24:56<7:32:35, 2.49s/it] +2025-02-05 21:32:36 - ERROR - stderr - +2025-02-05 21:32:36 - ERROR - stderr - +2025-02-05 21:32:36 - INFO - stdout - {'loss': 0.7072, 'grad_norm': 1.270641565322876, 'learning_rate': 1.0063524424204436e-05, 'epoch': 1.54} +2025-02-05 21:32:36 - ERROR - stderr - 51%|█████▏ | 11510/22434 [11:24:56<7:32:35, 2.49s/it] +2025-02-05 21:32:39 - ERROR - stderr - 51%|█████▏ | 11511/22434 [11:24:59<7:44:07, 2.55s/it] +2025-02-05 21:32:39 - ERROR - stderr - +2025-02-05 21:32:39 - ERROR - stderr - +2025-02-05 21:32:39 - INFO - stdout - {'loss': 0.7524, 'grad_norm': 1.1993701457977295, 'learning_rate': 1.0062080706053934e-05, 'epoch': 1.54} +2025-02-05 21:32:39 - ERROR - stderr - 51%|█████▏ | 11511/22434 [11:24:59<7:44:07, 2.55s/it] +2025-02-05 21:32:41 - ERROR - stderr - 51%|█████▏ | 11512/22434 [11:25:01<7:37:08, 2.51s/it] +2025-02-05 21:32:41 - ERROR - stderr - +2025-02-05 21:32:41 - ERROR - stderr - +2025-02-05 21:32:41 - INFO - stdout - {'loss': 0.7499, 'grad_norm': 1.142477035522461, 'learning_rate': 1.0060636986609418e-05, 'epoch': 1.54} +2025-02-05 21:32:41 - ERROR - stderr - 51%|█████▏ | 11512/22434 [11:25:01<7:37:08, 2.51s/it] +2025-02-05 21:32:44 - ERROR - stderr - 51%|█████▏ | 11513/22434 [11:25:03<7:34:25, 2.50s/it] +2025-02-05 21:32:44 - ERROR - stderr - +2025-02-05 21:32:44 - ERROR - stderr - +2025-02-05 21:32:44 - INFO - stdout - {'loss': 0.7129, 'grad_norm': 1.2433645725250244, 'learning_rate': 
1.005919326590098e-05, 'epoch': 1.54} +2025-02-05 21:32:44 - ERROR - stderr - 51%|█████▏ | 11513/22434 [11:25:04<7:34:25, 2.50s/it] +2025-02-05 21:32:46 - ERROR - stderr - 51%|█████▏ | 11514/22434 [11:25:06<7:36:46, 2.51s/it] +2025-02-05 21:32:46 - ERROR - stderr - +2025-02-05 21:32:46 - ERROR - stderr - +2025-02-05 21:32:46 - INFO - stdout - {'loss': 0.7215, 'grad_norm': 1.2316704988479614, 'learning_rate': 1.0057749543958717e-05, 'epoch': 1.54} +2025-02-05 21:32:46 - ERROR - stderr - 51%|█████▏ | 11514/22434 [11:25:06<7:36:46, 2.51s/it] +2025-02-05 21:32:49 - ERROR - stderr - 51%|█████▏ | 11515/22434 [11:25:08<7:32:11, 2.48s/it] +2025-02-05 21:32:49 - ERROR - stderr - +2025-02-05 21:32:49 - ERROR - stderr - +2025-02-05 21:32:49 - INFO - stdout - {'loss': 0.6057, 'grad_norm': 1.141312599182129, 'learning_rate': 1.005630582081272e-05, 'epoch': 1.54} +2025-02-05 21:32:49 - ERROR - stderr - 51%|█████▏ | 11515/22434 [11:25:08<7:32:11, 2.48s/it] +2025-02-05 21:32:51 - ERROR - stderr - 51%|█████▏ | 11516/22434 [11:25:11<7:37:14, 2.51s/it] +2025-02-05 21:32:51 - ERROR - stderr - +2025-02-05 21:32:51 - ERROR - stderr - +2025-02-05 21:32:51 - INFO - stdout - {'loss': 0.7404, 'grad_norm': 1.1960495710372925, 'learning_rate': 1.0054862096493084e-05, 'epoch': 1.54} +2025-02-05 21:32:51 - ERROR - stderr - 51%|█████▏ | 11516/22434 [11:25:11<7:37:14, 2.51s/it] +2025-02-05 21:32:54 - ERROR - stderr - 51%|█████▏ | 11517/22434 [11:25:14<7:37:35, 2.51s/it] +2025-02-05 21:32:54 - ERROR - stderr - +2025-02-05 21:32:54 - ERROR - stderr - +2025-02-05 21:32:54 - INFO - stdout - {'loss': 0.7354, 'grad_norm': 1.1864585876464844, 'learning_rate': 1.0053418371029898e-05, 'epoch': 1.54} +2025-02-05 21:32:54 - ERROR - stderr - 51%|█████▏ | 11517/22434 [11:25:14<7:37:35, 2.51s/it] +2025-02-05 21:32:56 - ERROR - stderr - 51%|█████▏ | 11518/22434 [11:25:16<7:37:14, 2.51s/it] +2025-02-05 21:32:56 - ERROR - stderr - +2025-02-05 21:32:56 - ERROR - stderr - +2025-02-05 21:32:56 - INFO - stdout - 
{'loss': 0.7188, 'grad_norm': 1.1143370866775513, 'learning_rate': 1.0051974644453255e-05, 'epoch': 1.54} +2025-02-05 21:32:56 - ERROR - stderr - 51%|█████▏ | 11518/22434 [11:25:16<7:37:14, 2.51s/it] +2025-02-05 21:32:59 - ERROR - stderr - 51%|█████▏ | 11519/22434 [11:25:19<7:34:20, 2.50s/it] +2025-02-05 21:32:59 - ERROR - stderr - +2025-02-05 21:32:59 - ERROR - stderr - +2025-02-05 21:32:59 - INFO - stdout - {'loss': 0.6594, 'grad_norm': 1.1594401597976685, 'learning_rate': 1.0050530916793253e-05, 'epoch': 1.54} +2025-02-05 21:32:59 - ERROR - stderr - 51%|█████▏ | 11519/22434 [11:25:19<7:34:20, 2.50s/it] +2025-02-05 21:33:01 - ERROR - stderr - 51%|█████▏ | 11520/22434 [11:25:21<7:30:04, 2.47s/it] +2025-02-05 21:33:01 - ERROR - stderr - +2025-02-05 21:33:01 - ERROR - stderr - +2025-02-05 21:33:01 - INFO - stdout - {'loss': 0.6958, 'grad_norm': 1.2512022256851196, 'learning_rate': 1.0049087188079983e-05, 'epoch': 1.54} +2025-02-05 21:33:01 - ERROR - stderr - 51%|█████▏ | 11520/22434 [11:25:21<7:30:04, 2.47s/it] +2025-02-05 21:33:04 - ERROR - stderr - 51%|█████▏ | 11521/22434 [11:25:23<7:29:42, 2.47s/it] +2025-02-05 21:33:04 - ERROR - stderr - +2025-02-05 21:33:04 - ERROR - stderr - +2025-02-05 21:33:04 - INFO - stdout - {'loss': 0.6753, 'grad_norm': 1.1580619812011719, 'learning_rate': 1.0047643458343534e-05, 'epoch': 1.54} +2025-02-05 21:33:04 - ERROR - stderr - 51%|█████▏ | 11521/22434 [11:25:23<7:29:42, 2.47s/it] +2025-02-05 21:33:06 - ERROR - stderr - 51%|█████▏ | 11522/22434 [11:25:26<7:26:40, 2.46s/it] +2025-02-05 21:33:06 - ERROR - stderr - +2025-02-05 21:33:06 - ERROR - stderr - +2025-02-05 21:33:06 - INFO - stdout - {'loss': 0.7697, 'grad_norm': 1.082715630531311, 'learning_rate': 1.0046199727614005e-05, 'epoch': 1.54} +2025-02-05 21:33:06 - ERROR - stderr - 51%|█████▏ | 11522/22434 [11:25:26<7:26:40, 2.46s/it] +2025-02-05 21:33:09 - ERROR - stderr - 51%|█████▏ | 11523/22434 [11:25:28<7:28:14, 2.46s/it] +2025-02-05 21:33:09 - ERROR - stderr - +2025-02-05 
21:33:09 - ERROR - stderr - +2025-02-05 21:33:09 - INFO - stdout - {'loss': 0.6974, 'grad_norm': 1.0970429182052612, 'learning_rate': 1.0044755995921488e-05, 'epoch': 1.54} +2025-02-05 21:33:09 - ERROR - stderr - 51%|█████▏ | 11523/22434 [11:25:28<7:28:14, 2.46s/it] +2025-02-05 21:33:11 - ERROR - stderr - 51%|█████▏ | 11524/22434 [11:25:31<7:32:28, 2.49s/it] +2025-02-05 21:33:11 - ERROR - stderr - +2025-02-05 21:33:11 - ERROR - stderr - +2025-02-05 21:33:11 - INFO - stdout - {'loss': 0.6377, 'grad_norm': 1.1509534120559692, 'learning_rate': 1.0043312263296074e-05, 'epoch': 1.54} +2025-02-05 21:33:11 - ERROR - stderr - 51%|█████▏ | 11524/22434 [11:25:31<7:32:28, 2.49s/it] +2025-02-05 21:33:14 - ERROR - stderr - 51%|█████▏ | 11525/22434 [11:25:33<7:32:42, 2.49s/it] +2025-02-05 21:33:14 - ERROR - stderr - +2025-02-05 21:33:14 - ERROR - stderr - +2025-02-05 21:33:14 - INFO - stdout - {'loss': 0.7892, 'grad_norm': 1.2446962594985962, 'learning_rate': 1.0041868529767855e-05, 'epoch': 1.54} +2025-02-05 21:33:14 - ERROR - stderr - 51%|█████▏ | 11525/22434 [11:25:33<7:32:42, 2.49s/it] +2025-02-05 21:33:16 - ERROR - stderr - 51%|█████▏ | 11526/22434 [11:25:36<7:33:43, 2.50s/it] +2025-02-05 21:33:16 - ERROR - stderr - +2025-02-05 21:33:16 - ERROR - stderr - +2025-02-05 21:33:16 - INFO - stdout - {'loss': 0.7052, 'grad_norm': 1.3933978080749512, 'learning_rate': 1.004042479536693e-05, 'epoch': 1.54} +2025-02-05 21:33:16 - ERROR - stderr - 51%|█████▏ | 11526/22434 [11:25:36<7:33:43, 2.50s/it] +2025-02-05 21:33:19 - ERROR - stderr - 51%|█████▏ | 11527/22434 [11:25:38<7:40:54, 2.54s/it] +2025-02-05 21:33:19 - ERROR - stderr - +2025-02-05 21:33:19 - ERROR - stderr - +2025-02-05 21:33:19 - INFO - stdout - {'loss': 0.6689, 'grad_norm': 1.1709096431732178, 'learning_rate': 1.0038981060123388e-05, 'epoch': 1.54} +2025-02-05 21:33:19 - ERROR - stderr - 51%|█████▏ | 11527/22434 [11:25:39<7:40:54, 2.54s/it] +2025-02-05 21:33:21 - ERROR - stderr - 51%|█████▏ | 11528/22434 
[11:25:41<7:39:11, 2.53s/it] +2025-02-05 21:33:21 - ERROR - stderr - +2025-02-05 21:33:21 - ERROR - stderr - +2025-02-05 21:33:21 - INFO - stdout - {'loss': 0.6476, 'grad_norm': 1.1704374551773071, 'learning_rate': 1.0037537324067324e-05, 'epoch': 1.54} +2025-02-05 21:33:21 - ERROR - stderr - 51%|█████▏ | 11528/22434 [11:25:41<7:39:11, 2.53s/it] +2025-02-05 21:33:24 - ERROR - stderr - 51%|█████▏ | 11529/22434 [11:25:43<7:35:54, 2.51s/it] +2025-02-05 21:33:24 - ERROR - stderr - +2025-02-05 21:33:24 - ERROR - stderr - +2025-02-05 21:33:24 - INFO - stdout - {'loss': 0.7775, 'grad_norm': 1.219169020652771, 'learning_rate': 1.0036093587228828e-05, 'epoch': 1.54} +2025-02-05 21:33:24 - ERROR - stderr - 51%|█████▏ | 11529/22434 [11:25:44<7:35:54, 2.51s/it] +2025-02-05 21:33:26 - ERROR - stderr - 51%|█████▏ | 11530/22434 [11:25:46<7:36:58, 2.51s/it] +2025-02-05 21:33:26 - ERROR - stderr - +2025-02-05 21:33:26 - ERROR - stderr - +2025-02-05 21:33:26 - INFO - stdout - {'loss': 0.6248, 'grad_norm': 1.2217094898223877, 'learning_rate': 1.0034649849637998e-05, 'epoch': 1.54} +2025-02-05 21:33:26 - ERROR - stderr - 51%|█████▏ | 11530/22434 [11:25:46<7:36:58, 2.51s/it] +2025-02-05 21:33:29 - ERROR - stderr - 51%|█████▏ | 11531/22434 [11:25:49<7:37:30, 2.52s/it] +2025-02-05 21:33:29 - ERROR - stderr - +2025-02-05 21:33:29 - ERROR - stderr - +2025-02-05 21:33:29 - INFO - stdout - {'loss': 0.6888, 'grad_norm': 1.1808208227157593, 'learning_rate': 1.0033206111324922e-05, 'epoch': 1.54} +2025-02-05 21:33:29 - ERROR - stderr - 51%|█████▏ | 11531/22434 [11:25:49<7:37:30, 2.52s/it] +2025-02-05 21:33:31 - ERROR - stderr - 51%|█████▏ | 11532/22434 [11:25:51<7:38:26, 2.52s/it] +2025-02-05 21:33:31 - ERROR - stderr - +2025-02-05 21:33:31 - ERROR - stderr - +2025-02-05 21:33:31 - INFO - stdout - {'loss': 0.6933, 'grad_norm': 1.1540546417236328, 'learning_rate': 1.00317623723197e-05, 'epoch': 1.54} +2025-02-05 21:33:31 - ERROR - stderr - 51%|█████▏ | 11532/22434 [11:25:51<7:38:26, 2.52s/it] 
+2025-02-05 21:33:34 - ERROR - stderr - 51%|█████▏ | 11533/22434 [11:25:54<7:36:12, 2.51s/it] +2025-02-05 21:33:34 - ERROR - stderr - +2025-02-05 21:33:34 - ERROR - stderr - +2025-02-05 21:33:34 - INFO - stdout - {'loss': 0.7306, 'grad_norm': 1.1816272735595703, 'learning_rate': 1.0030318632652419e-05, 'epoch': 1.54} +2025-02-05 21:33:34 - ERROR - stderr - 51%|█████▏ | 11533/22434 [11:25:54<7:36:12, 2.51s/it] +2025-02-05 21:33:36 - ERROR - stderr - 51%|█████▏ | 11534/22434 [11:25:56<7:35:44, 2.51s/it] +2025-02-05 21:33:36 - ERROR - stderr - +2025-02-05 21:33:36 - ERROR - stderr - +2025-02-05 21:33:36 - INFO - stdout - {'loss': 0.6833, 'grad_norm': 1.2176556587219238, 'learning_rate': 1.0028874892353176e-05, 'epoch': 1.54} +2025-02-05 21:33:36 - ERROR - stderr - 51%|█████▏ | 11534/22434 [11:25:56<7:35:44, 2.51s/it] +2025-02-05 21:33:39 - ERROR - stderr - 51%|█████▏ | 11535/22434 [11:25:59<7:49:01, 2.58s/it] +2025-02-05 21:33:39 - ERROR - stderr - +2025-02-05 21:33:39 - ERROR - stderr - +2025-02-05 21:33:39 - INFO - stdout - {'loss': 0.7066, 'grad_norm': 1.2748459577560425, 'learning_rate': 1.0027431151452062e-05, 'epoch': 1.54} +2025-02-05 21:33:39 - ERROR - stderr - 51%|█████▏ | 11535/22434 [11:25:59<7:49:01, 2.58s/it] +2025-02-05 21:33:42 - ERROR - stderr - 51%|█████▏ | 11536/22434 [11:26:01<7:45:25, 2.56s/it] +2025-02-05 21:33:42 - ERROR - stderr - +2025-02-05 21:33:42 - ERROR - stderr - +2025-02-05 21:33:42 - INFO - stdout - {'loss': 0.6956, 'grad_norm': 1.2358193397521973, 'learning_rate': 1.0025987409979176e-05, 'epoch': 1.54} +2025-02-05 21:33:42 - ERROR - stderr - 51%|█████▏ | 11536/22434 [11:26:01<7:45:25, 2.56s/it] +2025-02-05 21:33:44 - ERROR - stderr - 51%|█████▏ | 11537/22434 [11:26:04<7:42:16, 2.55s/it] +2025-02-05 21:33:44 - ERROR - stderr - +2025-02-05 21:33:44 - ERROR - stderr - +2025-02-05 21:33:44 - INFO - stdout - {'loss': 0.6108, 'grad_norm': 1.1027051210403442, 'learning_rate': 1.0024543667964605e-05, 'epoch': 1.54} +2025-02-05 21:33:44 - ERROR 
- stderr - 51%|█████▏ | 11537/22434 [11:26:04<7:42:16, 2.55s/it] +2025-02-05 21:33:46 - ERROR - stderr - 51%|█████▏ | 11538/22434 [11:26:06<7:37:37, 2.52s/it] +2025-02-05 21:33:47 - ERROR - stderr - +2025-02-05 21:33:47 - ERROR - stderr - +2025-02-05 21:33:47 - INFO - stdout - {'loss': 0.6892, 'grad_norm': 1.249271273612976, 'learning_rate': 1.0023099925438441e-05, 'epoch': 1.54} +2025-02-05 21:33:47 - ERROR - stderr - 51%|█████▏ | 11538/22434 [11:26:06<7:37:37, 2.52s/it] +2025-02-05 21:33:49 - ERROR - stderr - 51%|█████▏ | 11539/22434 [11:26:09<7:35:51, 2.51s/it] +2025-02-05 21:33:49 - ERROR - stderr - +2025-02-05 21:33:49 - ERROR - stderr - +2025-02-05 21:33:49 - INFO - stdout - {'loss': 0.7072, 'grad_norm': 1.171519160270691, 'learning_rate': 1.0021656182430785e-05, 'epoch': 1.54} +2025-02-05 21:33:49 - ERROR - stderr - 51%|█████▏ | 11539/22434 [11:26:09<7:35:51, 2.51s/it] +2025-02-05 21:33:52 - ERROR - stderr - 51%|█████▏ | 11540/22434 [11:26:11<7:37:11, 2.52s/it] +2025-02-05 21:33:52 - ERROR - stderr - +2025-02-05 21:33:52 - ERROR - stderr - +2025-02-05 21:33:52 - INFO - stdout - {'loss': 0.6914, 'grad_norm': 1.14243745803833, 'learning_rate': 1.002021243897173e-05, 'epoch': 1.54} +2025-02-05 21:33:52 - ERROR - stderr - 51%|█████▏ | 11540/22434 [11:26:11<7:37:11, 2.52s/it] +2025-02-05 21:33:54 - ERROR - stderr - 51%|█████▏ | 11541/22434 [11:26:14<7:36:37, 2.52s/it] +2025-02-05 21:33:54 - ERROR - stderr - +2025-02-05 21:33:54 - ERROR - stderr - +2025-02-05 21:33:54 - INFO - stdout - {'loss': 0.6915, 'grad_norm': 1.2589973211288452, 'learning_rate': 1.0018768695091361e-05, 'epoch': 1.54} +2025-02-05 21:33:54 - ERROR - stderr - 51%|█████▏ | 11541/22434 [11:26:14<7:36:37, 2.52s/it] +2025-02-05 21:33:57 - ERROR - stderr - 51%|█████▏ | 11542/22434 [11:26:16<7:34:50, 2.51s/it] +2025-02-05 21:33:57 - ERROR - stderr - +2025-02-05 21:33:57 - ERROR - stderr - +2025-02-05 21:33:57 - INFO - stdout - {'loss': 0.6631, 'grad_norm': 1.0428980588912964, 'learning_rate': 
1.0017324950819778e-05, 'epoch': 1.54} +2025-02-05 21:33:57 - ERROR - stderr - 51%|█████▏ | 11542/22434 [11:26:16<7:34:50, 2.51s/it] +2025-02-05 21:33:59 - ERROR - stderr - 51%|█████▏ | 11543/22434 [11:26:19<7:34:50, 2.51s/it] +2025-02-05 21:33:59 - ERROR - stderr - +2025-02-05 21:33:59 - ERROR - stderr - +2025-02-05 21:33:59 - INFO - stdout - {'loss': 0.7164, 'grad_norm': 1.3954237699508667, 'learning_rate': 1.0015881206187072e-05, 'epoch': 1.54} +2025-02-05 21:33:59 - ERROR - stderr - 51%|█████▏ | 11543/22434 [11:26:19<7:34:50, 2.51s/it] +2025-02-05 21:34:02 - ERROR - stderr - 51%|█████▏ | 11544/22434 [11:26:21<7:36:39, 2.52s/it] +2025-02-05 21:34:02 - ERROR - stderr - +2025-02-05 21:34:02 - ERROR - stderr - +2025-02-05 21:34:02 - INFO - stdout - {'loss': 0.7472, 'grad_norm': 1.211290955543518, 'learning_rate': 1.001443746122334e-05, 'epoch': 1.54} +2025-02-05 21:34:02 - ERROR - stderr - 51%|█████▏ | 11544/22434 [11:26:21<7:36:39, 2.52s/it] +2025-02-05 21:34:04 - ERROR - stderr - 51%|█████▏ | 11545/22434 [11:26:24<7:37:22, 2.52s/it] +2025-02-05 21:34:04 - ERROR - stderr - +2025-02-05 21:34:04 - ERROR - stderr - +2025-02-05 21:34:04 - INFO - stdout - {'loss': 0.705, 'grad_norm': 1.1888172626495361, 'learning_rate': 1.001299371595867e-05, 'epoch': 1.54} +2025-02-05 21:34:04 - ERROR - stderr - 51%|█████▏ | 11545/22434 [11:26:24<7:37:22, 2.52s/it] +2025-02-05 21:34:07 - ERROR - stderr - 51%|█████▏ | 11546/22434 [11:26:26<7:35:54, 2.51s/it] +2025-02-05 21:34:07 - ERROR - stderr - +2025-02-05 21:34:07 - ERROR - stderr - +2025-02-05 21:34:07 - INFO - stdout - {'loss': 0.6068, 'grad_norm': 1.1781798601150513, 'learning_rate': 1.001154997042316e-05, 'epoch': 1.54} +2025-02-05 21:34:07 - ERROR - stderr - 51%|█████▏ | 11546/22434 [11:26:26<7:35:54, 2.51s/it] +2025-02-05 21:34:09 - ERROR - stderr - 51%|█████▏ | 11547/22434 [11:26:29<7:41:02, 2.54s/it] +2025-02-05 21:34:09 - ERROR - stderr - +2025-02-05 21:34:09 - ERROR - stderr - +2025-02-05 21:34:09 - INFO - stdout - 
{'loss': 0.7251, 'grad_norm': 1.1936330795288086, 'learning_rate': 1.0010106224646901e-05, 'epoch': 1.54} +2025-02-05 21:34:09 - ERROR - stderr - 51%|█████▏ | 11547/22434 [11:26:29<7:41:02, 2.54s/it] +2025-02-05 21:34:12 - ERROR - stderr - 51%|█████▏ | 11548/22434 [11:26:32<7:48:38, 2.58s/it] +2025-02-05 21:34:12 - ERROR - stderr - +2025-02-05 21:34:12 - ERROR - stderr - +2025-02-05 21:34:12 - INFO - stdout - {'loss': 0.7275, 'grad_norm': 1.3079208135604858, 'learning_rate': 1.000866247865999e-05, 'epoch': 1.54} +2025-02-05 21:34:12 - ERROR - stderr - 51%|█████▏ | 11548/22434 [11:26:32<7:48:38, 2.58s/it] +2025-02-05 21:34:14 - ERROR - stderr - 51%|█████▏ | 11549/22434 [11:26:34<7:49:09, 2.59s/it] +2025-02-05 21:34:15 - ERROR - stderr - +2025-02-05 21:34:15 - ERROR - stderr - +2025-02-05 21:34:15 - INFO - stdout - {'loss': 0.7367, 'grad_norm': 1.2139668464660645, 'learning_rate': 1.0007218732492516e-05, 'epoch': 1.54} +2025-02-05 21:34:15 - ERROR - stderr - 51%|█████▏ | 11549/22434 [11:26:34<7:49:09, 2.59s/it] +2025-02-05 21:34:17 - ERROR - stderr - 51%|█████▏ | 11550/22434 [11:26:37<7:43:23, 2.55s/it] +2025-02-05 21:34:17 - ERROR - stderr - +2025-02-05 21:34:17 - ERROR - stderr - +2025-02-05 21:34:17 - INFO - stdout - {'loss': 0.7236, 'grad_norm': 1.2707051038742065, 'learning_rate': 1.0005774986174574e-05, 'epoch': 1.54} +2025-02-05 21:34:17 - ERROR - stderr - 51%|█████▏ | 11550/22434 [11:26:37<7:43:23, 2.55s/it] +2025-02-05 21:34:20 - ERROR - stderr - 51%|█████▏ | 11551/22434 [11:26:39<7:46:07, 2.57s/it] +2025-02-05 21:34:20 - ERROR - stderr - +2025-02-05 21:34:20 - ERROR - stderr - +2025-02-05 21:34:20 - INFO - stdout - {'loss': 0.7706, 'grad_norm': 1.3258891105651855, 'learning_rate': 1.0004331239736258e-05, 'epoch': 1.54} +2025-02-05 21:34:20 - ERROR - stderr - 51%|█████▏ | 11551/22434 [11:26:39<7:46:07, 2.57s/it] +2025-02-05 21:34:22 - ERROR - stderr - 51%|█████▏ | 11552/22434 [11:26:42<7:44:11, 2.56s/it] +2025-02-05 21:34:22 - ERROR - stderr - +2025-02-05 
21:34:22 - ERROR - stderr - +2025-02-05 21:34:22 - INFO - stdout - {'loss': 0.7538, 'grad_norm': 1.2156256437301636, 'learning_rate': 1.0002887493207663e-05, 'epoch': 1.54} +2025-02-05 21:34:22 - ERROR - stderr - 51%|█████▏ | 11552/22434 [11:26:42<7:44:11, 2.56s/it] +2025-02-05 21:34:25 - ERROR - stderr - 51%|█████▏ | 11553/22434 [11:26:44<7:43:38, 2.56s/it] +2025-02-05 21:34:25 - ERROR - stderr - +2025-02-05 21:34:25 - ERROR - stderr - +2025-02-05 21:34:25 - INFO - stdout - {'loss': 0.6632, 'grad_norm': 0.9964362978935242, 'learning_rate': 1.0001443746618877e-05, 'epoch': 1.54} +2025-02-05 21:34:25 - ERROR - stderr - 51%|█████▏ | 11553/22434 [11:26:44<7:43:38, 2.56s/it] +2025-02-05 21:34:27 - ERROR - stderr - 52%|█████▏ | 11554/22434 [11:26:47<7:38:39, 2.53s/it] +2025-02-05 21:34:27 - ERROR - stderr - +2025-02-05 21:34:27 - ERROR - stderr - +2025-02-05 21:34:27 - INFO - stdout - {'loss': 0.6294, 'grad_norm': 1.1825547218322754, 'learning_rate': 1e-05, 'epoch': 1.55} +2025-02-05 21:34:27 - ERROR - stderr - 52%|█████▏ | 11554/22434 [11:26:47<7:38:39, 2.53s/it] +2025-02-05 21:34:30 - ERROR - stderr - 52%|█████▏ | 11555/22434 [11:26:49<7:39:11, 2.53s/it] +2025-02-05 21:34:30 - ERROR - stderr - +2025-02-05 21:34:30 - ERROR - stderr - +2025-02-05 21:34:30 - INFO - stdout - {'loss': 0.6623, 'grad_norm': 1.1713074445724487, 'learning_rate': 9.998556253381127e-06, 'epoch': 1.55} +2025-02-05 21:34:30 - ERROR - stderr - 52%|█████▏ | 11555/22434 [11:26:49<7:39:11, 2.53s/it] +2025-02-05 21:34:32 - ERROR - stderr - 52%|█████▏ | 11556/22434 [11:26:52<7:46:54, 2.58s/it] +2025-02-05 21:34:32 - ERROR - stderr - +2025-02-05 21:34:32 - ERROR - stderr - +2025-02-05 21:34:32 - INFO - stdout - {'loss': 0.7194, 'grad_norm': 1.274376630783081, 'learning_rate': 9.99711250679234e-06, 'epoch': 1.55} +2025-02-05 21:34:32 - ERROR - stderr - 52%|█████▏ | 11556/22434 [11:26:52<7:46:54, 2.58s/it] +2025-02-05 21:34:35 - ERROR - stderr - 52%|█████▏ | 11557/22434 [11:26:55<7:41:28, 2.55s/it] 
+2025-02-05 21:34:35 - ERROR - stderr - +2025-02-05 21:34:35 - ERROR - stderr - +2025-02-05 21:34:35 - INFO - stdout - {'loss': 0.6939, 'grad_norm': 1.1717119216918945, 'learning_rate': 9.995668760263745e-06, 'epoch': 1.55} +2025-02-05 21:34:35 - ERROR - stderr - 52%|█████▏ | 11557/22434 [11:26:55<7:41:28, 2.55s/it] +2025-02-05 21:34:37 - ERROR - stderr - 52%|█████▏ | 11558/22434 [11:26:57<7:41:57, 2.55s/it] +2025-02-05 21:34:37 - ERROR - stderr - +2025-02-05 21:34:37 - ERROR - stderr - +2025-02-05 21:34:37 - INFO - stdout - {'loss': 0.6985, 'grad_norm': 1.1790015697479248, 'learning_rate': 9.994225013825428e-06, 'epoch': 1.55} +2025-02-05 21:34:37 - ERROR - stderr - 52%|█████▏ | 11558/22434 [11:26:57<7:41:57, 2.55s/it] +2025-02-05 21:34:40 - ERROR - stderr - 52%|█████▏ | 11559/22434 [11:27:00<7:37:24, 2.52s/it] +2025-02-05 21:34:40 - ERROR - stderr - +2025-02-05 21:34:40 - ERROR - stderr - +2025-02-05 21:34:40 - INFO - stdout - {'loss': 0.709, 'grad_norm': 1.174066424369812, 'learning_rate': 9.992781267507487e-06, 'epoch': 1.55} +2025-02-05 21:34:40 - ERROR - stderr - 52%|█████▏ | 11559/22434 [11:27:00<7:37:24, 2.52s/it] +2025-02-05 21:34:42 - ERROR - stderr - 52%|█████▏ | 11560/22434 [11:27:02<7:37:40, 2.53s/it] +2025-02-05 21:34:42 - ERROR - stderr - +2025-02-05 21:34:42 - ERROR - stderr - +2025-02-05 21:34:42 - INFO - stdout - {'loss': 0.7265, 'grad_norm': 1.3035436868667603, 'learning_rate': 9.991337521340014e-06, 'epoch': 1.55} +2025-02-05 21:34:42 - ERROR - stderr - 52%|█████▏ | 11560/22434 [11:27:02<7:37:40, 2.53s/it] +2025-02-05 21:34:45 - ERROR - stderr - 52%|█████▏ | 11561/22434 [11:27:05<7:36:12, 2.52s/it] +2025-02-05 21:34:45 - ERROR - stderr - +2025-02-05 21:34:45 - ERROR - stderr - +2025-02-05 21:34:45 - INFO - stdout - {'loss': 0.6961, 'grad_norm': 1.1840989589691162, 'learning_rate': 9.989893775353099e-06, 'epoch': 1.55} +2025-02-05 21:34:45 - ERROR - stderr - 52%|█████▏ | 11561/22434 [11:27:05<7:36:12, 2.52s/it] +2025-02-05 21:34:48 - ERROR - 
stderr - 52%|█████▏ | 11562/22434 [11:27:07<7:54:11, 2.62s/it] +2025-02-05 21:34:48 - ERROR - stderr - +2025-02-05 21:34:48 - ERROR - stderr - +2025-02-05 21:34:48 - INFO - stdout - {'loss': 0.7475, 'grad_norm': 1.2567524909973145, 'learning_rate': 9.988450029576843e-06, 'epoch': 1.55} +2025-02-05 21:34:48 - ERROR - stderr - 52%|█████▏ | 11562/22434 [11:27:08<7:54:11, 2.62s/it] +2025-02-05 21:34:50 - ERROR - stderr - 52%|█████▏ | 11563/22434 [11:27:10<7:48:17, 2.58s/it] +2025-02-05 21:34:50 - ERROR - stderr - +2025-02-05 21:34:50 - ERROR - stderr - +2025-02-05 21:34:50 - INFO - stdout - {'loss': 0.7107, 'grad_norm': 1.2457315921783447, 'learning_rate': 9.987006284041332e-06, 'epoch': 1.55} +2025-02-05 21:34:50 - ERROR - stderr - 52%|█████▏ | 11563/22434 [11:27:10<7:48:17, 2.58s/it] +2025-02-05 21:34:53 - ERROR - stderr - 52%|█████▏ | 11564/22434 [11:27:12<7:39:02, 2.53s/it] +2025-02-05 21:34:53 - ERROR - stderr - +2025-02-05 21:34:53 - ERROR - stderr - +2025-02-05 21:34:53 - INFO - stdout - {'loss': 0.7192, 'grad_norm': 1.3733962774276733, 'learning_rate': 9.985562538776662e-06, 'epoch': 1.55} +2025-02-05 21:34:53 - ERROR - stderr - 52%|█████▏ | 11564/22434 [11:27:12<7:39:02, 2.53s/it] +2025-02-05 21:34:55 - ERROR - stderr - 52%|█████▏ | 11565/22434 [11:27:15<7:36:00, 2.52s/it] +2025-02-05 21:34:55 - ERROR - stderr - +2025-02-05 21:34:55 - ERROR - stderr - +2025-02-05 21:34:55 - INFO - stdout - {'loss': 0.6882, 'grad_norm': 1.271192193031311, 'learning_rate': 9.98411879381293e-06, 'epoch': 1.55} +2025-02-05 21:34:55 - ERROR - stderr - 52%|█████▏ | 11565/22434 [11:27:15<7:36:00, 2.52s/it] +2025-02-05 21:34:58 - ERROR - stderr - 52%|█████▏ | 11566/22434 [11:27:17<7:34:34, 2.51s/it] +2025-02-05 21:34:58 - ERROR - stderr - +2025-02-05 21:34:58 - ERROR - stderr - +2025-02-05 21:34:58 - INFO - stdout - {'loss': 0.6213, 'grad_norm': 1.017574667930603, 'learning_rate': 9.982675049180222e-06, 'epoch': 1.55} +2025-02-05 21:34:58 - ERROR - stderr - 52%|█████▏ | 11566/22434 
[11:27:17<7:34:34, 2.51s/it] +2025-02-05 21:35:00 - ERROR - stderr - 52%|█████▏ | 11567/22434 [11:27:20<7:33:40, 2.50s/it] +2025-02-05 21:35:00 - ERROR - stderr - +2025-02-05 21:35:00 - ERROR - stderr - +2025-02-05 21:35:00 - INFO - stdout - {'loss': 0.7962, 'grad_norm': 1.3721671104431152, 'learning_rate': 9.98123130490864e-06, 'epoch': 1.55} +2025-02-05 21:35:00 - ERROR - stderr - 52%|█████▏ | 11567/22434 [11:27:20<7:33:40, 2.50s/it] +2025-02-05 21:35:03 - ERROR - stderr - 52%|█████▏ | 11568/22434 [11:27:23<7:42:07, 2.55s/it] +2025-02-05 21:35:03 - ERROR - stderr - +2025-02-05 21:35:03 - ERROR - stderr - +2025-02-05 21:35:03 - INFO - stdout - {'loss': 0.7386, 'grad_norm': 1.2679362297058105, 'learning_rate': 9.979787561028276e-06, 'epoch': 1.55} +2025-02-05 21:35:03 - ERROR - stderr - 52%|█████▏ | 11568/22434 [11:27:23<7:42:07, 2.55s/it] +2025-02-05 21:35:05 - ERROR - stderr - 52%|█████▏ | 11569/22434 [11:27:25<7:36:56, 2.52s/it] +2025-02-05 21:35:05 - ERROR - stderr - +2025-02-05 21:35:05 - ERROR - stderr - +2025-02-05 21:35:05 - INFO - stdout - {'loss': 0.7775, 'grad_norm': 1.199182152748108, 'learning_rate': 9.978343817569214e-06, 'epoch': 1.55} +2025-02-05 21:35:05 - ERROR - stderr - 52%|█████▏ | 11569/22434 [11:27:25<7:36:56, 2.52s/it] +2025-02-05 21:35:08 - ERROR - stderr - 52%|█████▏ | 11570/22434 [11:27:28<7:42:48, 2.56s/it] +2025-02-05 21:35:08 - ERROR - stderr - +2025-02-05 21:35:08 - ERROR - stderr - +2025-02-05 21:35:08 - INFO - stdout - {'loss': 0.6634, 'grad_norm': 1.2453573942184448, 'learning_rate': 9.97690007456156e-06, 'epoch': 1.55} +2025-02-05 21:35:08 - ERROR - stderr - 52%|█████▏ | 11570/22434 [11:27:28<7:42:48, 2.56s/it] +2025-02-05 21:35:10 - ERROR - stderr - 52%|█████▏ | 11571/22434 [11:27:30<7:38:08, 2.53s/it] +2025-02-05 21:35:10 - ERROR - stderr - +2025-02-05 21:35:10 - ERROR - stderr - +2025-02-05 21:35:10 - INFO - stdout - {'loss': 0.7324, 'grad_norm': 1.3124717473983765, 'learning_rate': 9.975456332035398e-06, 'epoch': 1.55} 
+2025-02-05 21:35:10 - ERROR - stderr - 52%|█████▏ | 11571/22434 [11:27:30<7:38:08, 2.53s/it] +2025-02-05 21:35:13 - ERROR - stderr - 52%|█████▏ | 11572/22434 [11:27:33<7:32:41, 2.50s/it] +2025-02-05 21:35:13 - ERROR - stderr - +2025-02-05 21:35:13 - ERROR - stderr - +2025-02-05 21:35:13 - INFO - stdout - {'loss': 0.7358, 'grad_norm': 1.2797060012817383, 'learning_rate': 9.974012590020826e-06, 'epoch': 1.55} +2025-02-05 21:35:13 - ERROR - stderr - 52%|█████▏ | 11572/22434 [11:27:33<7:32:41, 2.50s/it] +2025-02-05 21:35:15 - ERROR - stderr - 52%|█████▏ | 11573/22434 [11:27:35<7:32:31, 2.50s/it] +2025-02-05 21:35:15 - ERROR - stderr - +2025-02-05 21:35:15 - ERROR - stderr - +2025-02-05 21:35:15 - INFO - stdout - {'loss': 0.6429, 'grad_norm': 1.3853349685668945, 'learning_rate': 9.97256884854794e-06, 'epoch': 1.55} +2025-02-05 21:35:15 - ERROR - stderr - 52%|█████▏ | 11573/22434 [11:27:35<7:32:31, 2.50s/it] +2025-02-05 21:35:18 - ERROR - stderr - 52%|█████▏ | 11574/22434 [11:27:38<7:42:07, 2.55s/it] +2025-02-05 21:35:18 - ERROR - stderr - +2025-02-05 21:35:18 - ERROR - stderr - +2025-02-05 21:35:18 - INFO - stdout - {'loss': 0.6492, 'grad_norm': 1.2257990837097168, 'learning_rate': 9.971125107646826e-06, 'epoch': 1.55} +2025-02-05 21:35:18 - ERROR - stderr - 52%|█████▏ | 11574/22434 [11:27:38<7:42:07, 2.55s/it] +2025-02-05 21:35:20 - ERROR - stderr - 52%|█████▏ | 11575/22434 [11:27:40<7:40:49, 2.55s/it] +2025-02-05 21:35:20 - ERROR - stderr - +2025-02-05 21:35:20 - ERROR - stderr - +2025-02-05 21:35:20 - INFO - stdout - {'loss': 0.6366, 'grad_norm': 1.1092944145202637, 'learning_rate': 9.969681367347583e-06, 'epoch': 1.55} +2025-02-05 21:35:20 - ERROR - stderr - 52%|█████▏ | 11575/22434 [11:27:40<7:40:49, 2.55s/it] +2025-02-05 21:35:23 - ERROR - stderr - 52%|█████▏ | 11576/22434 [11:27:43<7:35:55, 2.52s/it] +2025-02-05 21:35:23 - ERROR - stderr - +2025-02-05 21:35:23 - ERROR - stderr - +2025-02-05 21:35:23 - INFO - stdout - {'loss': 0.7186, 'grad_norm': 
1.2266744375228882, 'learning_rate': 9.968237627680305e-06, 'epoch': 1.55} +2025-02-05 21:35:23 - ERROR - stderr - 52%|█████▏ | 11576/22434 [11:27:43<7:35:55, 2.52s/it] +2025-02-05 21:35:25 - ERROR - stderr - 52%|█████▏ | 11577/22434 [11:27:45<7:35:44, 2.52s/it] +2025-02-05 21:35:25 - ERROR - stderr - +2025-02-05 21:35:25 - ERROR - stderr - +2025-02-05 21:35:25 - INFO - stdout - {'loss': 0.6995, 'grad_norm': 1.3109180927276611, 'learning_rate': 9.96679388867508e-06, 'epoch': 1.55} +2025-02-05 21:35:25 - ERROR - stderr - 52%|█████▏ | 11577/22434 [11:27:45<7:35:44, 2.52s/it] +2025-02-05 21:35:28 - ERROR - stderr - 52%|█████▏ | 11578/22434 [11:27:48<7:35:59, 2.52s/it] +2025-02-05 21:35:28 - ERROR - stderr - +2025-02-05 21:35:28 - ERROR - stderr - +2025-02-05 21:35:28 - INFO - stdout - {'loss': 0.7348, 'grad_norm': 1.1613661050796509, 'learning_rate': 9.965350150362005e-06, 'epoch': 1.55} +2025-02-05 21:35:28 - ERROR - stderr - 52%|█████▏ | 11578/22434 [11:27:48<7:35:59, 2.52s/it] +2025-02-05 21:35:30 - ERROR - stderr - 52%|█████▏ | 11579/22434 [11:27:50<7:35:39, 2.52s/it] +2025-02-05 21:35:31 - ERROR - stderr - +2025-02-05 21:35:31 - ERROR - stderr - +2025-02-05 21:35:31 - INFO - stdout - {'loss': 0.6908, 'grad_norm': 1.1526134014129639, 'learning_rate': 9.963906412771176e-06, 'epoch': 1.55} +2025-02-05 21:35:31 - ERROR - stderr - 52%|█████▏ | 11579/22434 [11:27:50<7:35:39, 2.52s/it] +2025-02-05 21:35:33 - ERROR - stderr - 52%|█████▏ | 11580/22434 [11:27:53<7:33:03, 2.50s/it] +2025-02-05 21:35:33 - ERROR - stderr - +2025-02-05 21:35:33 - ERROR - stderr - +2025-02-05 21:35:33 - INFO - stdout - {'loss': 0.6585, 'grad_norm': 1.2256871461868286, 'learning_rate': 9.962462675932679e-06, 'epoch': 1.55} +2025-02-05 21:35:33 - ERROR - stderr - 52%|█████▏ | 11580/22434 [11:27:53<7:33:03, 2.50s/it] +2025-02-05 21:35:36 - ERROR - stderr - 52%|█████▏ | 11581/22434 [11:27:55<7:45:33, 2.57s/it] +2025-02-05 21:35:36 - ERROR - stderr - +2025-02-05 21:35:36 - ERROR - stderr - 
+2025-02-05 21:35:36 - INFO - stdout - {'loss': 0.6815, 'grad_norm': 1.391932487487793, 'learning_rate': 9.961018939876616e-06, 'epoch': 1.55} +2025-02-05 21:35:36 - ERROR - stderr - 52%|█████▏ | 11581/22434 [11:27:55<7:45:33, 2.57s/it] +2025-02-05 21:35:38 - ERROR - stderr - 52%|█████▏ | 11582/22434 [11:27:58<7:38:21, 2.53s/it] +2025-02-05 21:35:38 - ERROR - stderr - +2025-02-05 21:35:38 - ERROR - stderr - +2025-02-05 21:35:38 - INFO - stdout - {'loss': 0.682, 'grad_norm': 1.1569149494171143, 'learning_rate': 9.95957520463307e-06, 'epoch': 1.55} +2025-02-05 21:35:38 - ERROR - stderr - 52%|█████▏ | 11582/22434 [11:27:58<7:38:21, 2.53s/it] +2025-02-05 21:35:41 - ERROR - stderr - 52%|█████▏ | 11583/22434 [11:28:01<7:47:17, 2.58s/it] +2025-02-05 21:35:41 - ERROR - stderr - +2025-02-05 21:35:41 - ERROR - stderr - +2025-02-05 21:35:41 - INFO - stdout - {'loss': 0.7321, 'grad_norm': 1.1894664764404297, 'learning_rate': 9.958131470232147e-06, 'epoch': 1.55} +2025-02-05 21:35:41 - ERROR - stderr - 52%|█████▏ | 11583/22434 [11:28:01<7:47:17, 2.58s/it] +2025-02-05 21:35:43 - ERROR - stderr - 52%|█████▏ | 11584/22434 [11:28:03<7:45:43, 2.58s/it] +2025-02-05 21:35:43 - ERROR - stderr - +2025-02-05 21:35:43 - ERROR - stderr - +2025-02-05 21:35:43 - INFO - stdout - {'loss': 0.7647, 'grad_norm': 1.286178469657898, 'learning_rate': 9.956687736703931e-06, 'epoch': 1.55} +2025-02-05 21:35:43 - ERROR - stderr - 52%|█████▏ | 11584/22434 [11:28:03<7:45:43, 2.58s/it] +2025-02-05 21:35:46 - ERROR - stderr - 52%|█████▏ | 11585/22434 [11:28:06<7:41:09, 2.55s/it] +2025-02-05 21:35:46 - ERROR - stderr - +2025-02-05 21:35:46 - ERROR - stderr - +2025-02-05 21:35:46 - INFO - stdout - {'loss': 0.6885, 'grad_norm': 1.19902503490448, 'learning_rate': 9.955244004078514e-06, 'epoch': 1.55} +2025-02-05 21:35:46 - ERROR - stderr - 52%|█████▏ | 11585/22434 [11:28:06<7:41:09, 2.55s/it] +2025-02-05 21:35:49 - ERROR - stderr - 52%|█████▏ | 11586/22434 [11:28:08<7:58:36, 2.65s/it] +2025-02-05 21:35:49 - 
ERROR - stderr - +2025-02-05 21:35:49 - ERROR - stderr - +2025-02-05 21:35:49 - INFO - stdout - {'loss': 0.7455, 'grad_norm': 1.2167459726333618, 'learning_rate': 9.953800272385997e-06, 'epoch': 1.55} +2025-02-05 21:35:49 - ERROR - stderr - 52%|█████▏ | 11586/22434 [11:28:09<7:58:36, 2.65s/it] +2025-02-05 21:35:51 - ERROR - stderr - 52%|█████▏ | 11587/22434 [11:28:11<7:51:24, 2.61s/it] +2025-02-05 21:35:51 - ERROR - stderr - +2025-02-05 21:35:51 - ERROR - stderr - +2025-02-05 21:35:51 - INFO - stdout - {'loss': 0.7238, 'grad_norm': 1.3349252939224243, 'learning_rate': 9.952356541656471e-06, 'epoch': 1.55} +2025-02-05 21:35:51 - ERROR - stderr - 52%|█████▏ | 11587/22434 [11:28:11<7:51:24, 2.61s/it] +2025-02-05 21:35:54 - ERROR - stderr - 52%|█████▏ | 11588/22434 [11:28:14<7:48:15, 2.59s/it] +2025-02-05 21:35:54 - ERROR - stderr - +2025-02-05 21:35:54 - ERROR - stderr - +2025-02-05 21:35:54 - INFO - stdout - {'loss': 0.5805, 'grad_norm': 1.1347497701644897, 'learning_rate': 9.95091281192002e-06, 'epoch': 1.55} +2025-02-05 21:35:54 - ERROR - stderr - 52%|█████▏ | 11588/22434 [11:28:14<7:48:15, 2.59s/it] +2025-02-05 21:35:56 - ERROR - stderr - 52%|█████▏ | 11589/22434 [11:28:16<7:41:32, 2.55s/it] +2025-02-05 21:35:56 - ERROR - stderr - +2025-02-05 21:35:56 - ERROR - stderr - +2025-02-05 21:35:56 - INFO - stdout - {'loss': 0.6755, 'grad_norm': 1.2632615566253662, 'learning_rate': 9.94946908320675e-06, 'epoch': 1.55} +2025-02-05 21:35:56 - ERROR - stderr - 52%|█████▏ | 11589/22434 [11:28:16<7:41:32, 2.55s/it] +2025-02-05 21:35:59 - ERROR - stderr - 52%|█████▏ | 11590/22434 [11:28:19<7:38:47, 2.54s/it] +2025-02-05 21:35:59 - ERROR - stderr - +2025-02-05 21:35:59 - ERROR - stderr - +2025-02-05 21:35:59 - INFO - stdout - {'loss': 0.7083, 'grad_norm': 1.153563380241394, 'learning_rate': 9.948025355546747e-06, 'epoch': 1.55} +2025-02-05 21:35:59 - ERROR - stderr - 52%|█████▏ | 11590/22434 [11:28:19<7:38:47, 2.54s/it] +2025-02-05 21:36:01 - ERROR - stderr - 52%|█████▏ | 
11591/22434 [11:28:21<7:33:27, 2.51s/it] +2025-02-05 21:36:01 - ERROR - stderr - +2025-02-05 21:36:01 - ERROR - stderr - +2025-02-05 21:36:01 - INFO - stdout - {'loss': 0.6431, 'grad_norm': 1.2649372816085815, 'learning_rate': 9.946581628970106e-06, 'epoch': 1.55} +2025-02-05 21:36:01 - ERROR - stderr - 52%|█████▏ | 11591/22434 [11:28:21<7:33:27, 2.51s/it] +2025-02-05 21:36:04 - ERROR - stderr - 52%|█████▏ | 11592/22434 [11:28:24<7:34:57, 2.52s/it] +2025-02-05 21:36:04 - ERROR - stderr - +2025-02-05 21:36:04 - ERROR - stderr - +2025-02-05 21:36:04 - INFO - stdout - {'loss': 0.5431, 'grad_norm': 1.1538318395614624, 'learning_rate': 9.945137903506921e-06, 'epoch': 1.55} +2025-02-05 21:36:04 - ERROR - stderr - 52%|█████▏ | 11592/22434 [11:28:24<7:34:57, 2.52s/it] +2025-02-05 21:36:06 - ERROR - stderr - 52%|█████▏ | 11593/22434 [11:28:26<7:33:49, 2.51s/it] +2025-02-05 21:36:06 - ERROR - stderr - +2025-02-05 21:36:06 - ERROR - stderr - +2025-02-05 21:36:06 - INFO - stdout - {'loss': 0.6826, 'grad_norm': 1.1633721590042114, 'learning_rate': 9.94369417918728e-06, 'epoch': 1.55} +2025-02-05 21:36:06 - ERROR - stderr - 52%|█████▏ | 11593/22434 [11:28:26<7:33:49, 2.51s/it] +2025-02-05 21:36:09 - ERROR - stderr - 52%|█████▏ | 11594/22434 [11:28:28<7:32:55, 2.51s/it] +2025-02-05 21:36:09 - ERROR - stderr - +2025-02-05 21:36:09 - ERROR - stderr - +2025-02-05 21:36:09 - INFO - stdout - {'loss': 0.6369, 'grad_norm': 1.2265843152999878, 'learning_rate': 9.942250456041286e-06, 'epoch': 1.55} +2025-02-05 21:36:09 - ERROR - stderr - 52%|█████▏ | 11594/22434 [11:28:29<7:32:55, 2.51s/it] +2025-02-05 21:36:11 - ERROR - stderr - 52%|█████▏ | 11595/22434 [11:28:31<7:30:00, 2.49s/it] +2025-02-05 21:36:11 - ERROR - stderr - +2025-02-05 21:36:11 - ERROR - stderr - +2025-02-05 21:36:11 - INFO - stdout - {'loss': 0.7255, 'grad_norm': 1.3075207471847534, 'learning_rate': 9.940806734099021e-06, 'epoch': 1.55} +2025-02-05 21:36:11 - ERROR - stderr - 52%|█████▏ | 11595/22434 [11:28:31<7:30:00, 
2.49s/it] +2025-02-05 21:36:14 - ERROR - stderr - 52%|█████▏ | 11596/22434 [11:28:34<7:34:57, 2.52s/it] +2025-02-05 21:36:14 - ERROR - stderr - +2025-02-05 21:36:14 - ERROR - stderr - +2025-02-05 21:36:14 - INFO - stdout - {'loss': 0.8353, 'grad_norm': 1.3687458038330078, 'learning_rate': 9.939363013390587e-06, 'epoch': 1.55} +2025-02-05 21:36:14 - ERROR - stderr - 52%|█████▏ | 11596/22434 [11:28:34<7:34:57, 2.52s/it] +2025-02-05 21:36:16 - ERROR - stderr - 52%|█████▏ | 11597/22434 [11:28:36<7:31:43, 2.50s/it] +2025-02-05 21:36:16 - ERROR - stderr - +2025-02-05 21:36:16 - ERROR - stderr - +2025-02-05 21:36:16 - INFO - stdout - {'loss': 0.7142, 'grad_norm': 1.2239234447479248, 'learning_rate': 9.93791929394607e-06, 'epoch': 1.55} +2025-02-05 21:36:16 - ERROR - stderr - 52%|█████▏ | 11597/22434 [11:28:36<7:31:43, 2.50s/it] +2025-02-05 21:36:19 - ERROR - stderr - 52%|█████▏ | 11598/22434 [11:28:38<7:29:41, 2.49s/it] +2025-02-05 21:36:19 - ERROR - stderr - +2025-02-05 21:36:19 - ERROR - stderr - +2025-02-05 21:36:19 - INFO - stdout - {'loss': 0.6482, 'grad_norm': 1.1349965333938599, 'learning_rate': 9.936475575795563e-06, 'epoch': 1.55} +2025-02-05 21:36:19 - ERROR - stderr - 52%|█████▏ | 11598/22434 [11:28:39<7:29:41, 2.49s/it] +2025-02-05 21:36:21 - ERROR - stderr - 52%|█████▏ | 11599/22434 [11:28:41<7:29:22, 2.49s/it] +2025-02-05 21:36:21 - ERROR - stderr - +2025-02-05 21:36:21 - ERROR - stderr - +2025-02-05 21:36:21 - INFO - stdout - {'loss': 0.6786, 'grad_norm': 1.271505355834961, 'learning_rate': 9.935031858969168e-06, 'epoch': 1.55} +2025-02-05 21:36:21 - ERROR - stderr - 52%|█████▏ | 11599/22434 [11:28:41<7:29:22, 2.49s/it] +2025-02-05 21:36:24 - ERROR - stderr - 52%|█████▏ | 11600/22434 [11:28:43<7:31:41, 2.50s/it] +2025-02-05 21:36:24 - ERROR - stderr - +2025-02-05 21:36:24 - ERROR - stderr - +2025-02-05 21:36:24 - INFO - stdout - {'loss': 0.6523, 'grad_norm': 1.2784535884857178, 'learning_rate': 9.933588143496971e-06, 'epoch': 1.55} +2025-02-05 21:36:24 - 
ERROR - stderr - 52%|█████▏ | 11600/22434 [11:28:44<7:31:41, 2.50s/it] +2025-02-05 21:36:26 - ERROR - stderr - 52%|█████▏ | 11601/22434 [11:28:46<7:33:08, 2.51s/it] +2025-02-05 21:36:26 - ERROR - stderr - +2025-02-05 21:36:26 - ERROR - stderr - +2025-02-05 21:36:26 - INFO - stdout - {'loss': 0.6869, 'grad_norm': 1.0192725658416748, 'learning_rate': 9.932144429409061e-06, 'epoch': 1.55} +2025-02-05 21:36:26 - ERROR - stderr - 52%|█████▏ | 11601/22434 [11:28:46<7:33:08, 2.51s/it] +2025-02-05 21:36:29 - ERROR - stderr - 52%|█████▏ | 11602/22434 [11:28:49<7:33:21, 2.51s/it] +2025-02-05 21:36:29 - ERROR - stderr - +2025-02-05 21:36:29 - ERROR - stderr - +2025-02-05 21:36:29 - INFO - stdout - {'loss': 0.6751, 'grad_norm': 1.2094461917877197, 'learning_rate': 9.93070071673554e-06, 'epoch': 1.55} +2025-02-05 21:36:29 - ERROR - stderr - 52%|█████▏ | 11602/22434 [11:28:49<7:33:21, 2.51s/it] +2025-02-05 21:36:31 - ERROR - stderr - 52%|█████▏ | 11603/22434 [11:28:51<7:28:13, 2.48s/it] +2025-02-05 21:36:31 - ERROR - stderr - +2025-02-05 21:36:31 - ERROR - stderr - +2025-02-05 21:36:31 - INFO - stdout - {'loss': 0.7084, 'grad_norm': 1.2380319833755493, 'learning_rate': 9.929257005506496e-06, 'epoch': 1.55} +2025-02-05 21:36:31 - ERROR - stderr - 52%|█████▏ | 11603/22434 [11:28:51<7:28:13, 2.48s/it] +2025-02-05 21:36:34 - ERROR - stderr - 52%|█████▏ | 11604/22434 [11:28:53<7:29:17, 2.49s/it] +2025-02-05 21:36:34 - ERROR - stderr - +2025-02-05 21:36:34 - ERROR - stderr - +2025-02-05 21:36:34 - INFO - stdout - {'loss': 0.6987, 'grad_norm': 1.1918452978134155, 'learning_rate': 9.927813295752017e-06, 'epoch': 1.55} +2025-02-05 21:36:34 - ERROR - stderr - 52%|█████▏ | 11604/22434 [11:28:53<7:29:17, 2.49s/it] +2025-02-05 21:36:36 - ERROR - stderr - 52%|█████▏ | 11605/22434 [11:28:56<7:27:41, 2.48s/it] +2025-02-05 21:36:36 - ERROR - stderr - +2025-02-05 21:36:36 - ERROR - stderr - +2025-02-05 21:36:36 - INFO - stdout - {'loss': 0.727, 'grad_norm': 1.2153878211975098, 'learning_rate': 
9.926369587502205e-06, 'epoch': 1.55} +2025-02-05 21:36:36 - ERROR - stderr - 52%|█████▏ | 11605/22434 [11:28:56<7:27:41, 2.48s/it] +2025-02-05 21:36:39 - ERROR - stderr - 52%|█████▏ | 11606/22434 [11:28:58<7:28:11, 2.48s/it] +2025-02-05 21:36:39 - ERROR - stderr - +2025-02-05 21:36:39 - ERROR - stderr - +2025-02-05 21:36:39 - INFO - stdout - {'loss': 0.605, 'grad_norm': 1.1790846586227417, 'learning_rate': 9.924925880787146e-06, 'epoch': 1.55} +2025-02-05 21:36:39 - ERROR - stderr - 52%|█████▏ | 11606/22434 [11:28:58<7:28:11, 2.48s/it] +2025-02-05 21:36:41 - ERROR - stderr - 52%|█████▏ | 11607/22434 [11:29:01<7:30:37, 2.50s/it] +2025-02-05 21:36:41 - ERROR - stderr - +2025-02-05 21:36:41 - ERROR - stderr - +2025-02-05 21:36:41 - INFO - stdout - {'loss': 0.7303, 'grad_norm': 1.3317478895187378, 'learning_rate': 9.923482175636938e-06, 'epoch': 1.55} +2025-02-05 21:36:41 - ERROR - stderr - 52%|█████▏ | 11607/22434 [11:29:01<7:30:37, 2.50s/it] +2025-02-05 21:36:44 - ERROR - stderr - 52%|█████▏ | 11608/22434 [11:29:03<7:32:11, 2.51s/it] +2025-02-05 21:36:44 - ERROR - stderr - +2025-02-05 21:36:44 - ERROR - stderr - +2025-02-05 21:36:44 - INFO - stdout - {'loss': 0.693, 'grad_norm': 1.170379400253296, 'learning_rate': 9.922038472081672e-06, 'epoch': 1.55} +2025-02-05 21:36:44 - ERROR - stderr - 52%|█████▏ | 11608/22434 [11:29:03<7:32:11, 2.51s/it] +2025-02-05 21:36:46 - ERROR - stderr - 52%|█████▏ | 11609/22434 [11:29:06<7:33:00, 2.51s/it] +2025-02-05 21:36:46 - ERROR - stderr - +2025-02-05 21:36:46 - ERROR - stderr - +2025-02-05 21:36:46 - INFO - stdout - {'loss': 0.6755, 'grad_norm': 1.1302177906036377, 'learning_rate': 9.920594770151436e-06, 'epoch': 1.55} +2025-02-05 21:36:46 - ERROR - stderr - 52%|█████▏ | 11609/22434 [11:29:06<7:33:00, 2.51s/it] +2025-02-05 21:36:49 - ERROR - stderr - 52%|█████▏ | 11610/22434 [11:29:09<7:39:25, 2.55s/it] +2025-02-05 21:36:49 - ERROR - stderr - +2025-02-05 21:36:49 - ERROR - stderr - +2025-02-05 21:36:49 - INFO - stdout - {'loss': 
0.8085, 'grad_norm': 1.2757900953292847, 'learning_rate': 9.919151069876328e-06, 'epoch': 1.55} +2025-02-05 21:36:49 - ERROR - stderr - 52%|█████▏ | 11610/22434 [11:29:09<7:39:25, 2.55s/it] +2025-02-05 21:36:51 - ERROR - stderr - 52%|█████▏ | 11611/22434 [11:29:11<7:34:38, 2.52s/it] +2025-02-05 21:36:51 - ERROR - stderr - +2025-02-05 21:36:51 - ERROR - stderr - +2025-02-05 21:36:51 - INFO - stdout - {'loss': 0.6957, 'grad_norm': 1.2512168884277344, 'learning_rate': 9.917707371286439e-06, 'epoch': 1.55} +2025-02-05 21:36:51 - ERROR - stderr - 52%|█████▏ | 11611/22434 [11:29:11<7:34:38, 2.52s/it] +2025-02-05 21:36:51 - INFO - stdout - WARNING: tokenization mismatch: 112 vs. 138. (ignored) +2025-02-05 21:36:54 - ERROR - stderr - 52%|█████▏ | 11612/22434 [11:29:14<7:34:35, 2.52s/it] +2025-02-05 21:36:54 - ERROR - stderr - +2025-02-05 21:36:54 - ERROR - stderr - +2025-02-05 21:36:54 - INFO - stdout - {'loss': 0.7203, 'grad_norm': 1.2800650596618652, 'learning_rate': 9.916263674411858e-06, 'epoch': 1.55} +2025-02-05 21:36:54 - ERROR - stderr - 52%|█████▏ | 11612/22434 [11:29:14<7:34:35, 2.52s/it] +2025-02-05 21:36:56 - ERROR - stderr - 52%|█████▏ | 11613/22434 [11:29:16<7:34:40, 2.52s/it] +2025-02-05 21:36:56 - ERROR - stderr - +2025-02-05 21:36:56 - ERROR - stderr - +2025-02-05 21:36:56 - INFO - stdout - {'loss': 0.7234, 'grad_norm': 1.2630618810653687, 'learning_rate': 9.914819979282684e-06, 'epoch': 1.55} +2025-02-05 21:36:56 - ERROR - stderr - 52%|█████▏ | 11613/22434 [11:29:16<7:34:40, 2.52s/it] +2025-02-05 21:36:59 - ERROR - stderr - 52%|█████▏ | 11614/22434 [11:29:19<7:32:54, 2.51s/it] +2025-02-05 21:36:59 - ERROR - stderr - +2025-02-05 21:36:59 - ERROR - stderr - +2025-02-05 21:36:59 - INFO - stdout - {'loss': 0.7302, 'grad_norm': 1.2083522081375122, 'learning_rate': 9.913376285929002e-06, 'epoch': 1.55} +2025-02-05 21:36:59 - ERROR - stderr - 52%|█████▏ | 11614/22434 [11:29:19<7:32:54, 2.51s/it] +2025-02-05 21:37:01 - ERROR - stderr - 52%|█████▏ | 11615/22434 
[11:29:21<7:29:05, 2.49s/it] +2025-02-05 21:37:01 - ERROR - stderr - +2025-02-05 21:37:01 - ERROR - stderr - +2025-02-05 21:37:01 - INFO - stdout - {'loss': 0.6931, 'grad_norm': 1.2552076578140259, 'learning_rate': 9.911932594380913e-06, 'epoch': 1.55} +2025-02-05 21:37:01 - ERROR - stderr - 52%|█████▏ | 11615/22434 [11:29:21<7:29:05, 2.49s/it] +2025-02-05 21:37:04 - ERROR - stderr - 52%|█████▏ | 11616/22434 [11:29:24<7:28:35, 2.49s/it] +2025-02-05 21:37:04 - ERROR - stderr - +2025-02-05 21:37:04 - ERROR - stderr - +2025-02-05 21:37:04 - INFO - stdout - {'loss': 0.7653, 'grad_norm': 1.3146113157272339, 'learning_rate': 9.910488904668503e-06, 'epoch': 1.55} +2025-02-05 21:37:04 - ERROR - stderr - 52%|█████▏ | 11616/22434 [11:29:24<7:28:35, 2.49s/it] +2025-02-05 21:37:06 - ERROR - stderr - 52%|█████▏ | 11617/22434 [11:29:26<7:37:24, 2.54s/it] +2025-02-05 21:37:06 - ERROR - stderr - +2025-02-05 21:37:06 - ERROR - stderr - +2025-02-05 21:37:06 - INFO - stdout - {'loss': 0.7267, 'grad_norm': 1.2481141090393066, 'learning_rate': 9.909045216821863e-06, 'epoch': 1.55} +2025-02-05 21:37:06 - ERROR - stderr - 52%|█████▏ | 11617/22434 [11:29:26<7:37:24, 2.54s/it] +2025-02-05 21:37:09 - ERROR - stderr - 52%|█████▏ | 11618/22434 [11:29:29<7:31:38, 2.51s/it] +2025-02-05 21:37:09 - ERROR - stderr - +2025-02-05 21:37:09 - ERROR - stderr - +2025-02-05 21:37:09 - INFO - stdout - {'loss': 0.7343, 'grad_norm': 1.1267297267913818, 'learning_rate': 9.907601530871094e-06, 'epoch': 1.55} +2025-02-05 21:37:09 - ERROR - stderr - 52%|█████▏ | 11618/22434 [11:29:29<7:31:38, 2.51s/it] +2025-02-05 21:37:11 - ERROR - stderr - 52%|█████▏ | 11619/22434 [11:29:31<7:28:04, 2.49s/it] +2025-02-05 21:37:11 - ERROR - stderr - +2025-02-05 21:37:11 - ERROR - stderr - +2025-02-05 21:37:11 - INFO - stdout - {'loss': 0.7429, 'grad_norm': 1.3143137693405151, 'learning_rate': 9.906157846846282e-06, 'epoch': 1.55} +2025-02-05 21:37:11 - ERROR - stderr - 52%|█████▏ | 11619/22434 [11:29:31<7:28:04, 2.49s/it] 
+2025-02-05 21:37:14 - ERROR - stderr - 52%|█████▏ | 11620/22434 [11:29:34<7:29:42, 2.50s/it] +2025-02-05 21:37:14 - ERROR - stderr - +2025-02-05 21:37:14 - ERROR - stderr - +2025-02-05 21:37:14 - INFO - stdout - {'loss': 0.624, 'grad_norm': 1.2199690341949463, 'learning_rate': 9.904714164777514e-06, 'epoch': 1.55} +2025-02-05 21:37:14 - ERROR - stderr - 52%|█████▏ | 11620/22434 [11:29:34<7:29:42, 2.50s/it] +2025-02-05 21:37:17 - ERROR - stderr - 52%|█████▏ | 11621/22434 [11:29:36<7:47:53, 2.60s/it] +2025-02-05 21:37:17 - ERROR - stderr - +2025-02-05 21:37:17 - ERROR - stderr - +2025-02-05 21:37:17 - INFO - stdout - {'loss': 0.6315, 'grad_norm': 1.1053187847137451, 'learning_rate': 9.903270484694895e-06, 'epoch': 1.55} +2025-02-05 21:37:17 - ERROR - stderr - 52%|█████▏ | 11621/22434 [11:29:36<7:47:53, 2.60s/it] +2025-02-05 21:37:19 - ERROR - stderr - 52%|█████▏ | 11622/22434 [11:29:39<7:41:26, 2.56s/it] +2025-02-05 21:37:19 - ERROR - stderr - +2025-02-05 21:37:19 - ERROR - stderr - +2025-02-05 21:37:19 - INFO - stdout - {'loss': 0.6968, 'grad_norm': 1.2417516708374023, 'learning_rate': 9.901826806628505e-06, 'epoch': 1.55} +2025-02-05 21:37:19 - ERROR - stderr - 52%|█████▏ | 11622/22434 [11:29:39<7:41:26, 2.56s/it] +2025-02-05 21:37:22 - ERROR - stderr - 52%|█████▏ | 11623/22434 [11:29:41<7:35:34, 2.53s/it] +2025-02-05 21:37:22 - ERROR - stderr - +2025-02-05 21:37:22 - ERROR - stderr - +2025-02-05 21:37:22 - INFO - stdout - {'loss': 0.7123, 'grad_norm': 1.302356481552124, 'learning_rate': 9.900383130608443e-06, 'epoch': 1.55} +2025-02-05 21:37:22 - ERROR - stderr - 52%|█████▏ | 11623/22434 [11:29:41<7:35:34, 2.53s/it] +2025-02-05 21:37:24 - ERROR - stderr - 52%|█████▏ | 11624/22434 [11:29:44<7:33:20, 2.52s/it] +2025-02-05 21:37:24 - ERROR - stderr - +2025-02-05 21:37:24 - ERROR - stderr - +2025-02-05 21:37:24 - INFO - stdout - {'loss': 0.7234, 'grad_norm': 1.204300045967102, 'learning_rate': 9.8989394566648e-06, 'epoch': 1.55} +2025-02-05 21:37:24 - ERROR - stderr 
- 52%|█████▏ | 11624/22434 [11:29:44<7:33:20, 2.52s/it] +2025-02-05 21:37:26 - ERROR - stderr - 52%|█████▏ | 11625/22434 [11:29:46<7:28:44, 2.49s/it] +2025-02-05 21:37:27 - ERROR - stderr - +2025-02-05 21:37:27 - ERROR - stderr - +2025-02-05 21:37:27 - INFO - stdout - {'loss': 0.7487, 'grad_norm': 1.0882188081741333, 'learning_rate': 9.897495784827667e-06, 'epoch': 1.55} +2025-02-05 21:37:27 - ERROR - stderr - 52%|█████▏ | 11625/22434 [11:29:46<7:28:44, 2.49s/it] +2025-02-05 21:37:29 - ERROR - stderr - 52%|█████▏ | 11626/22434 [11:29:49<7:25:57, 2.48s/it] +2025-02-05 21:37:29 - ERROR - stderr - +2025-02-05 21:37:29 - ERROR - stderr - +2025-02-05 21:37:29 - INFO - stdout - {'loss': 0.7136, 'grad_norm': 1.2488876581192017, 'learning_rate': 9.896052115127136e-06, 'epoch': 1.55} +2025-02-05 21:37:29 - ERROR - stderr - 52%|█████▏ | 11626/22434 [11:29:49<7:25:57, 2.48s/it] +2025-02-05 21:37:31 - ERROR - stderr - 52%|█████▏ | 11627/22434 [11:29:51<7:32:03, 2.51s/it] +2025-02-05 21:37:32 - ERROR - stderr - +2025-02-05 21:37:32 - ERROR - stderr - +2025-02-05 21:37:32 - INFO - stdout - {'loss': 0.7369, 'grad_norm': 1.1952486038208008, 'learning_rate': 9.8946084475933e-06, 'epoch': 1.55} +2025-02-05 21:37:32 - ERROR - stderr - 52%|█████▏ | 11627/22434 [11:29:51<7:32:03, 2.51s/it] +2025-02-05 21:37:34 - ERROR - stderr - 52%|█████▏ | 11628/22434 [11:29:54<7:29:20, 2.49s/it] +2025-02-05 21:37:34 - ERROR - stderr - +2025-02-05 21:37:34 - ERROR - stderr - +2025-02-05 21:37:34 - INFO - stdout - {'loss': 0.7881, 'grad_norm': 1.3092358112335205, 'learning_rate': 9.89316478225625e-06, 'epoch': 1.55} +2025-02-05 21:37:34 - ERROR - stderr - 52%|█████▏ | 11628/22434 [11:29:54<7:29:20, 2.49s/it] +2025-02-05 21:37:36 - ERROR - stderr - 52%|█████▏ | 11629/22434 [11:29:56<7:29:30, 2.50s/it] +2025-02-05 21:37:37 - ERROR - stderr - +2025-02-05 21:37:37 - ERROR - stderr - +2025-02-05 21:37:37 - INFO - stdout - {'loss': 0.7028, 'grad_norm': 1.204134464263916, 'learning_rate': 
9.891721119146076e-06, 'epoch': 1.56} +2025-02-05 21:37:37 - ERROR - stderr - 52%|█████▏ | 11629/22434 [11:29:56<7:29:30, 2.50s/it] +2025-02-05 21:37:39 - ERROR - stderr - 52%|█████▏ | 11630/22434 [11:29:59<7:28:37, 2.49s/it] +2025-02-05 21:37:39 - ERROR - stderr - +2025-02-05 21:37:39 - ERROR - stderr - +2025-02-05 21:37:39 - INFO - stdout - {'loss': 0.6695, 'grad_norm': 1.173227310180664, 'learning_rate': 9.890277458292871e-06, 'epoch': 1.56} +2025-02-05 21:37:39 - ERROR - stderr - 52%|█████▏ | 11630/22434 [11:29:59<7:28:37, 2.49s/it] +2025-02-05 21:37:41 - ERROR - stderr - 52%|█████▏ | 11631/22434 [11:30:01<7:30:58, 2.50s/it] +2025-02-05 21:37:42 - ERROR - stderr - +2025-02-05 21:37:42 - ERROR - stderr - +2025-02-05 21:37:42 - INFO - stdout - {'loss': 0.709, 'grad_norm': 1.2467774152755737, 'learning_rate': 9.888833799726733e-06, 'epoch': 1.56} +2025-02-05 21:37:42 - ERROR - stderr - 52%|█████▏ | 11631/22434 [11:30:01<7:30:58, 2.50s/it] +2025-02-05 21:37:44 - ERROR - stderr - 52%|█████▏ | 11632/22434 [11:30:04<7:32:09, 2.51s/it] +2025-02-05 21:37:44 - ERROR - stderr - +2025-02-05 21:37:44 - ERROR - stderr - +2025-02-05 21:37:44 - INFO - stdout - {'loss': 0.6794, 'grad_norm': 1.2323771715164185, 'learning_rate': 9.887390143477746e-06, 'epoch': 1.56} +2025-02-05 21:37:44 - ERROR - stderr - 52%|█████▏ | 11632/22434 [11:30:04<7:32:09, 2.51s/it] +2025-02-05 21:37:46 - ERROR - stderr - 52%|█████▏ | 11633/22434 [11:30:06<7:28:08, 2.49s/it] +2025-02-05 21:37:46 - ERROR - stderr - +2025-02-05 21:37:46 - ERROR - stderr - +2025-02-05 21:37:46 - INFO - stdout - {'loss': 0.6282, 'grad_norm': 1.2474805116653442, 'learning_rate': 9.885946489576001e-06, 'epoch': 1.56} +2025-02-05 21:37:46 - ERROR - stderr - 52%|█████▏ | 11633/22434 [11:30:06<7:28:08, 2.49s/it] +2025-02-05 21:37:49 - ERROR - stderr - 52%|█████▏ | 11634/22434 [11:30:09<7:36:54, 2.54s/it] +2025-02-05 21:37:49 - ERROR - stderr - +2025-02-05 21:37:49 - ERROR - stderr - +2025-02-05 21:37:49 - INFO - stdout - {'loss': 
0.6278, 'grad_norm': 1.0359275341033936, 'learning_rate': 9.884502838051595e-06, 'epoch': 1.56} +2025-02-05 21:37:49 - ERROR - stderr - 52%|█████▏ | 11634/22434 [11:30:09<7:36:54, 2.54s/it] +2025-02-05 21:37:52 - ERROR - stderr - 52%|█████▏ | 11635/22434 [11:30:11<7:37:04, 2.54s/it] +2025-02-05 21:37:52 - ERROR - stderr - +2025-02-05 21:37:52 - ERROR - stderr - +2025-02-05 21:37:52 - INFO - stdout - {'loss': 0.6579, 'grad_norm': 1.1019821166992188, 'learning_rate': 9.883059188934615e-06, 'epoch': 1.56} +2025-02-05 21:37:52 - ERROR - stderr - 52%|█████▏ | 11635/22434 [11:30:11<7:37:04, 2.54s/it] +2025-02-05 21:37:54 - ERROR - stderr - 52%|█████▏ | 11636/22434 [11:30:14<7:51:10, 2.62s/it] +2025-02-05 21:37:54 - ERROR - stderr - +2025-02-05 21:37:54 - ERROR - stderr - +2025-02-05 21:37:54 - INFO - stdout - {'loss': 0.6979, 'grad_norm': 1.116276502609253, 'learning_rate': 9.881615542255151e-06, 'epoch': 1.56} +2025-02-05 21:37:54 - ERROR - stderr - 52%|█████▏ | 11636/22434 [11:30:14<7:51:10, 2.62s/it] +2025-02-05 21:37:57 - ERROR - stderr - 52%|█████▏ | 11637/22434 [11:30:17<7:46:23, 2.59s/it] +2025-02-05 21:37:57 - ERROR - stderr - +2025-02-05 21:37:57 - ERROR - stderr - +2025-02-05 21:37:57 - INFO - stdout - {'loss': 0.6786, 'grad_norm': 1.198554277420044, 'learning_rate': 9.880171898043306e-06, 'epoch': 1.56} +2025-02-05 21:37:57 - ERROR - stderr - 52%|█████▏ | 11637/22434 [11:30:17<7:46:23, 2.59s/it] +2025-02-05 21:37:59 - ERROR - stderr - 52%|█████▏ | 11638/22434 [11:30:19<7:39:51, 2.56s/it] +2025-02-05 21:37:59 - ERROR - stderr - +2025-02-05 21:37:59 - ERROR - stderr - +2025-02-05 21:37:59 - INFO - stdout - {'loss': 0.7343, 'grad_norm': 1.192000150680542, 'learning_rate': 9.878728256329154e-06, 'epoch': 1.56} +2025-02-05 21:37:59 - ERROR - stderr - 52%|█████▏ | 11638/22434 [11:30:19<7:39:51, 2.56s/it] +2025-02-05 21:38:02 - ERROR - stderr - 52%|█████▏ | 11639/22434 [11:30:22<7:40:47, 2.56s/it] +2025-02-05 21:38:02 - ERROR - stderr - +2025-02-05 21:38:02 - ERROR - 
stderr - +2025-02-05 21:38:02 - INFO - stdout - {'loss': 0.6672, 'grad_norm': 1.2033674716949463, 'learning_rate': 9.877284617142802e-06, 'epoch': 1.56} +2025-02-05 21:38:02 - ERROR - stderr - 52%|█████▏ | 11639/22434 [11:30:22<7:40:47, 2.56s/it] +2025-02-05 21:38:05 - ERROR - stderr - 52%|█████▏ | 11640/22434 [11:30:24<7:37:05, 2.54s/it] +2025-02-05 21:38:05 - ERROR - stderr - +2025-02-05 21:38:05 - ERROR - stderr - +2025-02-05 21:38:05 - INFO - stdout - {'loss': 0.7765, 'grad_norm': 1.1766128540039062, 'learning_rate': 9.875840980514332e-06, 'epoch': 1.56} +2025-02-05 21:38:05 - ERROR - stderr - 52%|█████▏ | 11640/22434 [11:30:24<7:37:05, 2.54s/it] +2025-02-05 21:38:07 - ERROR - stderr - 52%|█████▏ | 11641/22434 [11:30:27<7:31:17, 2.51s/it] +2025-02-05 21:38:07 - ERROR - stderr - +2025-02-05 21:38:07 - ERROR - stderr - +2025-02-05 21:38:07 - INFO - stdout - {'loss': 0.7248, 'grad_norm': 1.199671745300293, 'learning_rate': 9.87439734647384e-06, 'epoch': 1.56} +2025-02-05 21:38:07 - ERROR - stderr - 52%|█████▏ | 11641/22434 [11:30:27<7:31:17, 2.51s/it] +2025-02-05 21:38:09 - ERROR - stderr - 52%|█████▏ | 11642/22434 [11:30:29<7:29:37, 2.50s/it] +2025-02-05 21:38:09 - ERROR - stderr - +2025-02-05 21:38:09 - ERROR - stderr - +2025-02-05 21:38:09 - INFO - stdout - {'loss': 0.7148, 'grad_norm': 1.3016639947891235, 'learning_rate': 9.872953715051412e-06, 'epoch': 1.56} +2025-02-05 21:38:09 - ERROR - stderr - 52%|█████▏ | 11642/22434 [11:30:29<7:29:37, 2.50s/it] +2025-02-05 21:38:12 - ERROR - stderr - 52%|█████▏ | 11643/22434 [11:30:32<7:28:48, 2.50s/it] +2025-02-05 21:38:12 - ERROR - stderr - +2025-02-05 21:38:12 - ERROR - stderr - +2025-02-05 21:38:12 - INFO - stdout - {'loss': 0.6622, 'grad_norm': 1.1941275596618652, 'learning_rate': 9.871510086277142e-06, 'epoch': 1.56} +2025-02-05 21:38:12 - ERROR - stderr - 52%|█████▏ | 11643/22434 [11:30:32<7:28:48, 2.50s/it] +2025-02-05 21:38:14 - ERROR - stderr - 52%|█████▏ | 11644/22434 [11:30:34<7:31:04, 2.51s/it] +2025-02-05 
21:38:14 - ERROR - stderr - +2025-02-05 21:38:14 - ERROR - stderr - +2025-02-05 21:38:14 - INFO - stdout - {'loss': 0.718, 'grad_norm': 1.2003486156463623, 'learning_rate': 9.870066460181126e-06, 'epoch': 1.56} +2025-02-05 21:38:14 - ERROR - stderr - 52%|█████▏ | 11644/22434 [11:30:34<7:31:04, 2.51s/it] +2025-02-05 21:38:17 - ERROR - stderr - 52%|█████▏ | 11645/22434 [11:30:37<7:31:36, 2.51s/it] +2025-02-05 21:38:17 - ERROR - stderr - +2025-02-05 21:38:17 - ERROR - stderr - +2025-02-05 21:38:17 - INFO - stdout - {'loss': 0.7215, 'grad_norm': 1.3094204664230347, 'learning_rate': 9.86862283679345e-06, 'epoch': 1.56} +2025-02-05 21:38:17 - ERROR - stderr - 52%|█████▏ | 11645/22434 [11:30:37<7:31:36, 2.51s/it] +2025-02-05 21:38:20 - ERROR - stderr - 52%|█████▏ | 11646/22434 [11:30:40<7:47:42, 2.60s/it] +2025-02-05 21:38:20 - ERROR - stderr - +2025-02-05 21:38:20 - ERROR - stderr - +2025-02-05 21:38:20 - INFO - stdout - {'loss': 0.6679, 'grad_norm': 1.1726515293121338, 'learning_rate': 9.8671792161442e-06, 'epoch': 1.56} +2025-02-05 21:38:20 - ERROR - stderr - 52%|█████▏ | 11646/22434 [11:30:40<7:47:42, 2.60s/it] +2025-02-05 21:38:22 - ERROR - stderr - 52%|█████▏ | 11647/22434 [11:30:42<7:43:29, 2.58s/it] +2025-02-05 21:38:22 - ERROR - stderr - +2025-02-05 21:38:22 - ERROR - stderr - +2025-02-05 21:38:22 - INFO - stdout - {'loss': 0.6554, 'grad_norm': 1.0584392547607422, 'learning_rate': 9.865735598263477e-06, 'epoch': 1.56} +2025-02-05 21:38:22 - ERROR - stderr - 52%|█████▏ | 11647/22434 [11:30:42<7:43:29, 2.58s/it] +2025-02-05 21:38:25 - ERROR - stderr - 52%|█████▏ | 11648/22434 [11:30:45<7:38:46, 2.55s/it] +2025-02-05 21:38:25 - ERROR - stderr - +2025-02-05 21:38:25 - ERROR - stderr - +2025-02-05 21:38:25 - INFO - stdout - {'loss': 0.695, 'grad_norm': 1.1330418586730957, 'learning_rate': 9.864291983181366e-06, 'epoch': 1.56} +2025-02-05 21:38:25 - ERROR - stderr - 52%|█████▏ | 11648/22434 [11:30:45<7:38:46, 2.55s/it] +2025-02-05 21:38:27 - ERROR - stderr - 52%|█████▏ 
| 11649/22434 [11:30:47<7:32:38, 2.52s/it] +2025-02-05 21:38:27 - ERROR - stderr - +2025-02-05 21:38:27 - ERROR - stderr - +2025-02-05 21:38:27 - INFO - stdout - {'loss': 0.767, 'grad_norm': 1.2522163391113281, 'learning_rate': 9.862848370927955e-06, 'epoch': 1.56} +2025-02-05 21:38:27 - ERROR - stderr - 52%|█████▏ | 11649/22434 [11:30:47<7:32:38, 2.52s/it] +2025-02-05 21:38:30 - ERROR - stderr - 52%|█████▏ | 11650/22434 [11:30:49<7:31:04, 2.51s/it] +2025-02-05 21:38:30 - ERROR - stderr - +2025-02-05 21:38:30 - ERROR - stderr - +2025-02-05 21:38:30 - INFO - stdout - {'loss': 0.6714, 'grad_norm': 1.1301709413528442, 'learning_rate': 9.861404761533343e-06, 'epoch': 1.56} +2025-02-05 21:38:30 - ERROR - stderr - 52%|█████▏ | 11650/22434 [11:30:50<7:31:04, 2.51s/it] +2025-02-05 21:38:32 - ERROR - stderr - 52%|█████▏ | 11651/22434 [11:30:52<7:39:34, 2.56s/it] +2025-02-05 21:38:32 - ERROR - stderr - +2025-02-05 21:38:32 - ERROR - stderr - +2025-02-05 21:38:32 - INFO - stdout - {'loss': 0.6315, 'grad_norm': 1.1758971214294434, 'learning_rate': 9.859961155027613e-06, 'epoch': 1.56} +2025-02-05 21:38:32 - ERROR - stderr - 52%|█████▏ | 11651/22434 [11:30:52<7:39:34, 2.56s/it] +2025-02-05 21:38:35 - ERROR - stderr - 52%|█████▏ | 11652/22434 [11:30:55<7:40:53, 2.56s/it] +2025-02-05 21:38:35 - ERROR - stderr - +2025-02-05 21:38:35 - ERROR - stderr - +2025-02-05 21:38:35 - INFO - stdout - {'loss': 0.6749, 'grad_norm': 1.1606630086898804, 'learning_rate': 9.85851755144086e-06, 'epoch': 1.56} +2025-02-05 21:38:35 - ERROR - stderr - 52%|█████▏ | 11652/22434 [11:30:55<7:40:53, 2.56s/it] +2025-02-05 21:38:37 - ERROR - stderr - 52%|█████▏ | 11653/22434 [11:30:57<7:37:44, 2.55s/it] +2025-02-05 21:38:38 - ERROR - stderr - +2025-02-05 21:38:38 - ERROR - stderr - +2025-02-05 21:38:38 - INFO - stdout - {'loss': 0.5811, 'grad_norm': 1.082653284072876, 'learning_rate': 9.857073950803176e-06, 'epoch': 1.56} +2025-02-05 21:38:38 - ERROR - stderr - 52%|█████▏ | 11653/22434 [11:30:57<7:37:44, 
2.55s/it] +2025-02-05 21:38:40 - ERROR - stderr - 52%|█████▏ | 11654/22434 [11:31:00<7:37:36, 2.55s/it] +2025-02-05 21:38:40 - ERROR - stderr - +2025-02-05 21:38:40 - ERROR - stderr - +2025-02-05 21:38:40 - INFO - stdout - {'loss': 0.7025, 'grad_norm': 1.1764706373214722, 'learning_rate': 9.855630353144644e-06, 'epoch': 1.56} +2025-02-05 21:38:40 - ERROR - stderr - 52%|█████▏ | 11654/22434 [11:31:00<7:37:36, 2.55s/it] +2025-02-05 21:38:42 - ERROR - stderr - 52%|█████▏ | 11655/22434 [11:31:02<7:33:39, 2.53s/it] +2025-02-05 21:38:43 - ERROR - stderr - +2025-02-05 21:38:43 - ERROR - stderr - +2025-02-05 21:38:43 - INFO - stdout - {'loss': 0.6713, 'grad_norm': 1.1327965259552002, 'learning_rate': 9.854186758495361e-06, 'epoch': 1.56} +2025-02-05 21:38:43 - ERROR - stderr - 52%|█████▏ | 11655/22434 [11:31:02<7:33:39, 2.53s/it] +2025-02-05 21:38:45 - ERROR - stderr - 52%|█████▏ | 11656/22434 [11:31:05<7:31:19, 2.51s/it] +2025-02-05 21:38:45 - ERROR - stderr - +2025-02-05 21:38:45 - ERROR - stderr - +2025-02-05 21:38:45 - INFO - stdout - {'loss': 0.7021, 'grad_norm': 1.2917152643203735, 'learning_rate': 9.852743166885419e-06, 'epoch': 1.56} +2025-02-05 21:38:45 - ERROR - stderr - 52%|█████▏ | 11656/22434 [11:31:05<7:31:19, 2.51s/it] +2025-02-05 21:38:48 - ERROR - stderr - 52%|█████▏ | 11657/22434 [11:31:07<7:33:28, 2.52s/it] +2025-02-05 21:38:48 - ERROR - stderr - +2025-02-05 21:38:48 - ERROR - stderr - +2025-02-05 21:38:48 - INFO - stdout - {'loss': 0.6649, 'grad_norm': 1.122725486755371, 'learning_rate': 9.851299578344897e-06, 'epoch': 1.56} +2025-02-05 21:38:48 - ERROR - stderr - 52%|█████▏ | 11657/22434 [11:31:07<7:33:28, 2.52s/it] +2025-02-05 21:38:50 - ERROR - stderr - 52%|█████▏ | 11658/22434 [11:31:10<7:32:40, 2.52s/it] +2025-02-05 21:38:50 - ERROR - stderr - +2025-02-05 21:38:50 - ERROR - stderr - +2025-02-05 21:38:50 - INFO - stdout - {'loss': 0.6709, 'grad_norm': 1.1917108297348022, 'learning_rate': 9.8498559929039e-06, 'epoch': 1.56} +2025-02-05 21:38:50 - 
ERROR - stderr - 52%|█████▏ | 11658/22434 [11:31:10<7:32:40, 2.52s/it] +2025-02-05 21:38:53 - ERROR - stderr - 52%|█████▏ | 11659/22434 [11:31:12<7:34:31, 2.53s/it] +2025-02-05 21:38:53 - ERROR - stderr - +2025-02-05 21:38:53 - ERROR - stderr - +2025-02-05 21:38:53 - INFO - stdout - {'loss': 0.62, 'grad_norm': 1.1083738803863525, 'learning_rate': 9.848412410592506e-06, 'epoch': 1.56} +2025-02-05 21:38:53 - ERROR - stderr - 52%|█████▏ | 11659/22434 [11:31:12<7:34:31, 2.53s/it] +2025-02-05 21:38:55 - ERROR - stderr - 52%|█████▏ | 11660/22434 [11:31:15<7:36:26, 2.54s/it] +2025-02-05 21:38:55 - ERROR - stderr - +2025-02-05 21:38:55 - ERROR - stderr - +2025-02-05 21:38:55 - INFO - stdout - {'loss': 0.7216, 'grad_norm': 1.2363409996032715, 'learning_rate': 9.846968831440815e-06, 'epoch': 1.56} +2025-02-05 21:38:55 - ERROR - stderr - 52%|█████▏ | 11660/22434 [11:31:15<7:36:26, 2.54s/it] +2025-02-05 21:38:58 - ERROR - stderr - 52%|█████▏ | 11661/22434 [11:31:17<7:32:18, 2.52s/it] +2025-02-05 21:38:58 - ERROR - stderr - +2025-02-05 21:38:58 - ERROR - stderr - +2025-02-05 21:38:58 - INFO - stdout - {'loss': 0.6297, 'grad_norm': 1.4001164436340332, 'learning_rate': 9.84552525547891e-06, 'epoch': 1.56} +2025-02-05 21:38:58 - ERROR - stderr - 52%|█████▏ | 11661/22434 [11:31:17<7:32:18, 2.52s/it] +2025-02-05 21:39:00 - ERROR - stderr - 52%|█████▏ | 11662/22434 [11:31:20<7:29:28, 2.50s/it] +2025-02-05 21:39:00 - ERROR - stderr - +2025-02-05 21:39:00 - ERROR - stderr - +2025-02-05 21:39:00 - INFO - stdout - {'loss': 0.7094, 'grad_norm': 1.2242978811264038, 'learning_rate': 9.844081682736881e-06, 'epoch': 1.56} +2025-02-05 21:39:00 - ERROR - stderr - 52%|█████▏ | 11662/22434 [11:31:20<7:29:28, 2.50s/it] +2025-02-05 21:39:03 - ERROR - stderr - 52%|█████▏ | 11663/22434 [11:31:22<7:32:23, 2.52s/it] +2025-02-05 21:39:03 - ERROR - stderr - +2025-02-05 21:39:03 - ERROR - stderr - +2025-02-05 21:39:03 - INFO - stdout - {'loss': 0.7295, 'grad_norm': 1.223361611366272, 'learning_rate': 
9.842638113244824e-06, 'epoch': 1.56} +2025-02-05 21:39:03 - ERROR - stderr - 52%|█████▏ | 11663/22434 [11:31:22<7:32:23, 2.52s/it] +2025-02-05 21:39:05 - ERROR - stderr - 52%|█████▏ | 11664/22434 [11:31:25<7:31:53, 2.52s/it] +2025-02-05 21:39:05 - ERROR - stderr - +2025-02-05 21:39:05 - ERROR - stderr - +2025-02-05 21:39:05 - INFO - stdout - {'loss': 0.8019, 'grad_norm': 1.2252004146575928, 'learning_rate': 9.841194547032826e-06, 'epoch': 1.56} +2025-02-05 21:39:05 - ERROR - stderr - 52%|█████▏ | 11664/22434 [11:31:25<7:31:53, 2.52s/it] +2025-02-05 21:39:08 - ERROR - stderr - 52%|█████▏ | 11665/22434 [11:31:27<7:27:13, 2.49s/it] +2025-02-05 21:39:08 - ERROR - stderr - +2025-02-05 21:39:08 - ERROR - stderr - +2025-02-05 21:39:08 - INFO - stdout - {'loss': 0.6912, 'grad_norm': 1.165259838104248, 'learning_rate': 9.839750984130971e-06, 'epoch': 1.56} +2025-02-05 21:39:08 - ERROR - stderr - 52%|█████▏ | 11665/22434 [11:31:27<7:27:13, 2.49s/it] +2025-02-05 21:39:10 - ERROR - stderr - 52%|█████▏ | 11666/22434 [11:31:30<7:24:43, 2.48s/it] +2025-02-05 21:39:10 - ERROR - stderr - +2025-02-05 21:39:10 - ERROR - stderr - +2025-02-05 21:39:10 - INFO - stdout - {'loss': 0.7716, 'grad_norm': 1.239406704902649, 'learning_rate': 9.838307424569357e-06, 'epoch': 1.56} +2025-02-05 21:39:10 - ERROR - stderr - 52%|█████▏ | 11666/22434 [11:31:30<7:24:43, 2.48s/it] +2025-02-05 21:39:13 - ERROR - stderr - 52%|█████▏ | 11667/22434 [11:31:32<7:24:29, 2.48s/it] +2025-02-05 21:39:13 - ERROR - stderr - +2025-02-05 21:39:13 - ERROR - stderr - +2025-02-05 21:39:13 - INFO - stdout - {'loss': 0.6733, 'grad_norm': 1.240493893623352, 'learning_rate': 9.836863868378067e-06, 'epoch': 1.56} +2025-02-05 21:39:13 - ERROR - stderr - 52%|█████▏ | 11667/22434 [11:31:32<7:24:29, 2.48s/it] +2025-02-05 21:39:15 - ERROR - stderr - 52%|█████▏ | 11668/22434 [11:31:35<7:33:20, 2.53s/it] +2025-02-05 21:39:15 - ERROR - stderr - +2025-02-05 21:39:15 - ERROR - stderr - +2025-02-05 21:39:15 - INFO - stdout - {'loss': 
0.7451, 'grad_norm': 1.276371955871582, 'learning_rate': 9.835420315587194e-06, 'epoch': 1.56} +2025-02-05 21:39:15 - ERROR - stderr - 52%|█████▏ | 11668/22434 [11:31:35<7:33:20, 2.53s/it] +2025-02-05 21:39:18 - ERROR - stderr - 52%|█████▏ | 11669/22434 [11:31:37<7:31:40, 2.52s/it] +2025-02-05 21:39:18 - ERROR - stderr - +2025-02-05 21:39:18 - ERROR - stderr - +2025-02-05 21:39:18 - INFO - stdout - {'loss': 0.6196, 'grad_norm': 1.1262831687927246, 'learning_rate': 9.833976766226831e-06, 'epoch': 1.56} +2025-02-05 21:39:18 - ERROR - stderr - 52%|█████▏ | 11669/22434 [11:31:37<7:31:40, 2.52s/it] +2025-02-05 21:39:20 - ERROR - stderr - 52%|█████▏ | 11670/22434 [11:31:40<7:31:29, 2.52s/it] +2025-02-05 21:39:20 - ERROR - stderr - +2025-02-05 21:39:20 - ERROR - stderr - +2025-02-05 21:39:20 - INFO - stdout - {'loss': 0.7514, 'grad_norm': 1.2979719638824463, 'learning_rate': 9.832533220327059e-06, 'epoch': 1.56} +2025-02-05 21:39:20 - ERROR - stderr - 52%|█████▏ | 11670/22434 [11:31:40<7:31:29, 2.52s/it] +2025-02-05 21:39:23 - ERROR - stderr - 52%|█████▏ | 11671/22434 [11:31:42<7:32:31, 2.52s/it] +2025-02-05 21:39:23 - ERROR - stderr - +2025-02-05 21:39:23 - ERROR - stderr - +2025-02-05 21:39:23 - INFO - stdout - {'loss': 0.7296, 'grad_norm': 1.213478684425354, 'learning_rate': 9.831089677917974e-06, 'epoch': 1.56} +2025-02-05 21:39:23 - ERROR - stderr - 52%|█████▏ | 11671/22434 [11:31:43<7:32:31, 2.52s/it] +2025-02-05 21:39:25 - ERROR - stderr - 52%|█████▏ | 11672/22434 [11:31:45<7:29:43, 2.51s/it] +2025-02-05 21:39:25 - ERROR - stderr - +2025-02-05 21:39:25 - ERROR - stderr - +2025-02-05 21:39:25 - INFO - stdout - {'loss': 0.6923, 'grad_norm': 1.2263919115066528, 'learning_rate': 9.829646139029664e-06, 'epoch': 1.56} +2025-02-05 21:39:25 - ERROR - stderr - 52%|█████▏ | 11672/22434 [11:31:45<7:29:43, 2.51s/it] +2025-02-05 21:39:28 - ERROR - stderr - 52%|█████▏ | 11673/22434 [11:31:47<7:27:02, 2.49s/it] +2025-02-05 21:39:28 - ERROR - stderr - +2025-02-05 21:39:28 - ERROR 
- stderr - +2025-02-05 21:39:28 - INFO - stdout - {'loss': 0.7988, 'grad_norm': 1.4062761068344116, 'learning_rate': 9.828202603692214e-06, 'epoch': 1.56} +2025-02-05 21:39:28 - ERROR - stderr - 52%|█████▏ | 11673/22434 [11:31:47<7:27:02, 2.49s/it] +2025-02-05 21:39:30 - ERROR - stderr - 52%|█████▏ | 11674/22434 [11:31:50<7:31:12, 2.52s/it] +2025-02-05 21:39:30 - ERROR - stderr - +2025-02-05 21:39:30 - ERROR - stderr - +2025-02-05 21:39:30 - INFO - stdout - {'loss': 0.6466, 'grad_norm': 1.0653266906738281, 'learning_rate': 9.826759071935718e-06, 'epoch': 1.56} +2025-02-05 21:39:30 - ERROR - stderr - 52%|█████▏ | 11674/22434 [11:31:50<7:31:12, 2.52s/it] +2025-02-05 21:39:33 - ERROR - stderr - 52%|█████▏ | 11675/22434 [11:31:53<7:33:48, 2.53s/it] +2025-02-05 21:39:33 - ERROR - stderr - +2025-02-05 21:39:33 - ERROR - stderr - +2025-02-05 21:39:33 - INFO - stdout - {'loss': 0.5972, 'grad_norm': 1.1554373502731323, 'learning_rate': 9.82531554379026e-06, 'epoch': 1.56} +2025-02-05 21:39:33 - ERROR - stderr - 52%|█████▏ | 11675/22434 [11:31:53<7:33:48, 2.53s/it] +2025-02-05 21:39:35 - ERROR - stderr - 52%|█████▏ | 11676/22434 [11:31:55<7:28:49, 2.50s/it] +2025-02-05 21:39:35 - ERROR - stderr - +2025-02-05 21:39:35 - ERROR - stderr - +2025-02-05 21:39:35 - INFO - stdout - {'loss': 0.6886, 'grad_norm': 1.1816476583480835, 'learning_rate': 9.823872019285938e-06, 'epoch': 1.56} +2025-02-05 21:39:35 - ERROR - stderr - 52%|█████▏ | 11676/22434 [11:31:55<7:28:49, 2.50s/it] +2025-02-05 21:39:38 - ERROR - stderr - 52%|█████▏ | 11677/22434 [11:31:58<7:32:07, 2.52s/it] +2025-02-05 21:39:38 - ERROR - stderr - +2025-02-05 21:39:38 - ERROR - stderr - +2025-02-05 21:39:38 - INFO - stdout - {'loss': 0.7817, 'grad_norm': 1.3037949800491333, 'learning_rate': 9.822428498452836e-06, 'epoch': 1.56} +2025-02-05 21:39:38 - ERROR - stderr - 52%|█████▏ | 11677/22434 [11:31:58<7:32:07, 2.52s/it] +2025-02-05 21:39:40 - ERROR - stderr - 52%|█████▏ | 11678/22434 [11:32:00<7:29:22, 2.51s/it] 
+2025-02-05 21:39:40 - ERROR - stderr - +2025-02-05 21:39:40 - ERROR - stderr - +2025-02-05 21:39:40 - INFO - stdout - {'loss': 0.7161, 'grad_norm': 1.2093069553375244, 'learning_rate': 9.820984981321035e-06, 'epoch': 1.56} +2025-02-05 21:39:40 - ERROR - stderr - 52%|█████▏ | 11678/22434 [11:32:00<7:29:22, 2.51s/it] +2025-02-05 21:39:43 - ERROR - stderr - 52%|█████▏ | 11679/22434 [11:32:03<7:34:10, 2.53s/it] +2025-02-05 21:39:43 - ERROR - stderr - +2025-02-05 21:39:43 - ERROR - stderr - +2025-02-05 21:39:43 - INFO - stdout - {'loss': 0.7261, 'grad_norm': 1.2922788858413696, 'learning_rate': 9.819541467920638e-06, 'epoch': 1.56} +2025-02-05 21:39:43 - ERROR - stderr - 52%|█████▏ | 11679/22434 [11:32:03<7:34:10, 2.53s/it] +2025-02-05 21:39:45 - ERROR - stderr - 52%|█████▏ | 11680/22434 [11:32:05<7:29:12, 2.51s/it] +2025-02-05 21:39:45 - ERROR - stderr - +2025-02-05 21:39:45 - ERROR - stderr - +2025-02-05 21:39:45 - INFO - stdout - {'loss': 0.6736, 'grad_norm': 1.1776243448257446, 'learning_rate': 9.818097958281723e-06, 'epoch': 1.56} +2025-02-05 21:39:45 - ERROR - stderr - 52%|█████▏ | 11680/22434 [11:32:05<7:29:12, 2.51s/it] +2025-02-05 21:39:48 - ERROR - stderr - 52%|█████▏ | 11681/22434 [11:32:08<7:41:38, 2.58s/it] +2025-02-05 21:39:48 - ERROR - stderr - +2025-02-05 21:39:48 - ERROR - stderr - +2025-02-05 21:39:48 - INFO - stdout - {'loss': 0.6501, 'grad_norm': 1.125073790550232, 'learning_rate': 9.81665445243438e-06, 'epoch': 1.56} +2025-02-05 21:39:48 - ERROR - stderr - 52%|█████▏ | 11681/22434 [11:32:08<7:41:38, 2.58s/it] +2025-02-05 21:39:51 - ERROR - stderr - 52%|█████▏ | 11682/22434 [11:32:10<7:37:17, 2.55s/it] +2025-02-05 21:39:51 - ERROR - stderr - +2025-02-05 21:39:51 - ERROR - stderr - +2025-02-05 21:39:51 - INFO - stdout - {'loss': 0.7229, 'grad_norm': 1.2076047658920288, 'learning_rate': 9.815210950408703e-06, 'epoch': 1.56} +2025-02-05 21:39:51 - ERROR - stderr - 52%|█████▏ | 11682/22434 [11:32:10<7:37:17, 2.55s/it] +2025-02-05 21:39:53 - ERROR - 
stderr - 52%|█████▏ | 11683/22434 [11:32:13<7:33:25, 2.53s/it] +2025-02-05 21:39:53 - ERROR - stderr - +2025-02-05 21:39:53 - ERROR - stderr - +2025-02-05 21:39:53 - INFO - stdout - {'loss': 0.6013, 'grad_norm': 1.2347359657287598, 'learning_rate': 9.813767452234772e-06, 'epoch': 1.56} +2025-02-05 21:39:53 - ERROR - stderr - 52%|█████▏ | 11683/22434 [11:32:13<7:33:25, 2.53s/it] +2025-02-05 21:39:55 - ERROR - stderr - 52%|█████▏ | 11684/22434 [11:32:15<7:29:28, 2.51s/it] +2025-02-05 21:39:56 - ERROR - stderr - +2025-02-05 21:39:56 - ERROR - stderr - +2025-02-05 21:39:56 - INFO - stdout - {'loss': 0.6347, 'grad_norm': 1.2110868692398071, 'learning_rate': 9.812323957942686e-06, 'epoch': 1.56} +2025-02-05 21:39:56 - ERROR - stderr - 52%|█████▏ | 11684/22434 [11:32:15<7:29:28, 2.51s/it] +2025-02-05 21:39:58 - ERROR - stderr - 52%|█████▏ | 11685/22434 [11:32:18<7:33:16, 2.53s/it] +2025-02-05 21:39:58 - ERROR - stderr - +2025-02-05 21:39:58 - ERROR - stderr - +2025-02-05 21:39:58 - INFO - stdout - {'loss': 0.8649, 'grad_norm': 1.4476277828216553, 'learning_rate': 9.810880467562527e-06, 'epoch': 1.56} +2025-02-05 21:39:58 - ERROR - stderr - 52%|█████▏ | 11685/22434 [11:32:18<7:33:16, 2.53s/it] +2025-02-05 21:40:01 - ERROR - stderr - 52%|█████▏ | 11686/22434 [11:32:20<7:30:00, 2.51s/it] +2025-02-05 21:40:01 - ERROR - stderr - +2025-02-05 21:40:01 - ERROR - stderr - +2025-02-05 21:40:01 - INFO - stdout - {'loss': 0.6593, 'grad_norm': 1.2302271127700806, 'learning_rate': 9.80943698112438e-06, 'epoch': 1.56} +2025-02-05 21:40:01 - ERROR - stderr - 52%|█████▏ | 11686/22434 [11:32:20<7:30:00, 2.51s/it] +2025-02-05 21:40:03 - ERROR - stderr - 52%|█████▏ | 11687/22434 [11:32:23<7:27:05, 2.50s/it] +2025-02-05 21:40:03 - ERROR - stderr - +2025-02-05 21:40:03 - ERROR - stderr - +2025-02-05 21:40:03 - INFO - stdout - {'loss': 0.6905, 'grad_norm': 1.1797484159469604, 'learning_rate': 9.80799349865834e-06, 'epoch': 1.56} +2025-02-05 21:40:03 - ERROR - stderr - 52%|█████▏ | 11687/22434 
[11:32:23<7:27:05, 2.50s/it] +2025-02-05 21:40:05 - ERROR - stderr - 52%|█████▏ | 11688/22434 [11:32:25<7:27:03, 2.50s/it] +2025-02-05 21:40:06 - ERROR - stderr - +2025-02-05 21:40:06 - ERROR - stderr - +2025-02-05 21:40:06 - INFO - stdout - {'loss': 0.6367, 'grad_norm': 1.2235772609710693, 'learning_rate': 9.806550020194492e-06, 'epoch': 1.56} +2025-02-05 21:40:06 - ERROR - stderr - 52%|█████▏ | 11688/22434 [11:32:25<7:27:03, 2.50s/it] +2025-02-05 21:40:08 - ERROR - stderr - 52%|█████▏ | 11689/22434 [11:32:28<7:28:40, 2.51s/it] +2025-02-05 21:40:08 - ERROR - stderr - +2025-02-05 21:40:08 - ERROR - stderr - +2025-02-05 21:40:08 - INFO - stdout - {'loss': 0.6496, 'grad_norm': 1.1277586221694946, 'learning_rate': 9.80510654576292e-06, 'epoch': 1.56} +2025-02-05 21:40:08 - ERROR - stderr - 52%|█████▏ | 11689/22434 [11:32:28<7:28:40, 2.51s/it] +2025-02-05 21:40:10 - ERROR - stderr - 52%|█████▏ | 11690/22434 [11:32:30<7:25:38, 2.49s/it] +2025-02-05 21:40:10 - ERROR - stderr - +2025-02-05 21:40:10 - ERROR - stderr - +2025-02-05 21:40:10 - INFO - stdout - {'loss': 0.718, 'grad_norm': 1.2723939418792725, 'learning_rate': 9.80366307539372e-06, 'epoch': 1.56} +2025-02-05 21:40:10 - ERROR - stderr - 52%|█████▏ | 11690/22434 [11:32:30<7:25:38, 2.49s/it] +2025-02-05 21:40:13 - ERROR - stderr - 52%|█████▏ | 11691/22434 [11:32:33<7:30:22, 2.52s/it] +2025-02-05 21:40:13 - ERROR - stderr - +2025-02-05 21:40:13 - ERROR - stderr - +2025-02-05 21:40:13 - INFO - stdout - {'loss': 0.6905, 'grad_norm': 1.1371005773544312, 'learning_rate': 9.80221960911697e-06, 'epoch': 1.56} +2025-02-05 21:40:13 - ERROR - stderr - 52%|█████▏ | 11691/22434 [11:32:33<7:30:22, 2.52s/it] +2025-02-05 21:40:16 - ERROR - stderr - 52%|█████▏ | 11692/22434 [11:32:35<7:32:01, 2.52s/it] +2025-02-05 21:40:16 - ERROR - stderr - +2025-02-05 21:40:16 - ERROR - stderr - +2025-02-05 21:40:16 - INFO - stdout - {'loss': 0.6646, 'grad_norm': 1.1974263191223145, 'learning_rate': 9.800776146962768e-06, 'epoch': 1.56} 
+2025-02-05 21:40:16 - ERROR - stderr - 52%|█████▏ | 11692/22434 [11:32:35<7:32:01, 2.52s/it] +2025-02-05 21:40:18 - ERROR - stderr - 52%|█████▏ | 11693/22434 [11:32:38<7:31:50, 2.52s/it] +2025-02-05 21:40:18 - ERROR - stderr - +2025-02-05 21:40:18 - ERROR - stderr - +2025-02-05 21:40:18 - INFO - stdout - {'loss': 0.7262, 'grad_norm': 1.1156206130981445, 'learning_rate': 9.799332688961196e-06, 'epoch': 1.56} +2025-02-05 21:40:18 - ERROR - stderr - 52%|█████▏ | 11693/22434 [11:32:38<7:31:50, 2.52s/it] +2025-02-05 21:40:21 - ERROR - stderr - 52%|█████▏ | 11694/22434 [11:32:40<7:28:53, 2.51s/it] +2025-02-05 21:40:21 - ERROR - stderr - +2025-02-05 21:40:21 - ERROR - stderr - +2025-02-05 21:40:21 - INFO - stdout - {'loss': 0.597, 'grad_norm': 1.123761773109436, 'learning_rate': 9.797889235142338e-06, 'epoch': 1.56} +2025-02-05 21:40:21 - ERROR - stderr - 52%|█████▏ | 11694/22434 [11:32:40<7:28:53, 2.51s/it] +2025-02-05 21:40:23 - ERROR - stderr - 52%|█████▏ | 11695/22434 [11:32:43<7:26:56, 2.50s/it] +2025-02-05 21:40:23 - ERROR - stderr - +2025-02-05 21:40:23 - ERROR - stderr - +2025-02-05 21:40:23 - INFO - stdout - {'loss': 0.6977, 'grad_norm': 1.2224805355072021, 'learning_rate': 9.79644578553629e-06, 'epoch': 1.56} +2025-02-05 21:40:23 - ERROR - stderr - 52%|█████▏ | 11695/22434 [11:32:43<7:26:56, 2.50s/it] +2025-02-05 21:40:26 - ERROR - stderr - 52%|█████▏ | 11696/22434 [11:32:45<7:25:41, 2.49s/it] +2025-02-05 21:40:26 - ERROR - stderr - +2025-02-05 21:40:26 - ERROR - stderr - +2025-02-05 21:40:26 - INFO - stdout - {'loss': 0.7128, 'grad_norm': 1.1933468580245972, 'learning_rate': 9.795002340173135e-06, 'epoch': 1.56} +2025-02-05 21:40:26 - ERROR - stderr - 52%|█████▏ | 11696/22434 [11:32:45<7:25:41, 2.49s/it] +2025-02-05 21:40:28 - ERROR - stderr - 52%|█████▏ | 11697/22434 [11:32:48<7:24:59, 2.49s/it] +2025-02-05 21:40:28 - ERROR - stderr - +2025-02-05 21:40:28 - ERROR - stderr - +2025-02-05 21:40:28 - INFO - stdout - {'loss': 0.7225, 'grad_norm': 
1.3735162019729614, 'learning_rate': 9.793558899082955e-06, 'epoch': 1.56} +2025-02-05 21:40:28 - ERROR - stderr - 52%|█████▏ | 11697/22434 [11:32:48<7:24:59, 2.49s/it] +2025-02-05 21:40:30 - ERROR - stderr - 52%|█████▏ | 11698/22434 [11:32:50<7:24:25, 2.48s/it] +2025-02-05 21:40:31 - ERROR - stderr - +2025-02-05 21:40:31 - ERROR - stderr - +2025-02-05 21:40:31 - INFO - stdout - {'loss': 0.7139, 'grad_norm': 1.221158504486084, 'learning_rate': 9.792115462295848e-06, 'epoch': 1.56} +2025-02-05 21:40:31 - ERROR - stderr - 52%|█████▏ | 11698/22434 [11:32:50<7:24:25, 2.48s/it] +2025-02-05 21:40:33 - ERROR - stderr - 52%|█████▏ | 11699/22434 [11:32:53<7:28:21, 2.51s/it] +2025-02-05 21:40:33 - ERROR - stderr - +2025-02-05 21:40:33 - ERROR - stderr - +2025-02-05 21:40:33 - INFO - stdout - {'loss': 0.7167, 'grad_norm': 1.3197550773620605, 'learning_rate': 9.79067202984189e-06, 'epoch': 1.56} +2025-02-05 21:40:33 - ERROR - stderr - 52%|█████▏ | 11699/22434 [11:32:53<7:28:21, 2.51s/it] +2025-02-05 21:40:36 - ERROR - stderr - 52%|█████▏ | 11700/22434 [11:32:55<7:33:01, 2.53s/it] +2025-02-05 21:40:36 - ERROR - stderr - +2025-02-05 21:40:36 - ERROR - stderr - +2025-02-05 21:40:36 - INFO - stdout - {'loss': 0.7217, 'grad_norm': 1.207801103591919, 'learning_rate': 9.789228601751177e-06, 'epoch': 1.56} +2025-02-05 21:40:36 - ERROR - stderr - 52%|█████▏ | 11700/22434 [11:32:55<7:33:01, 2.53s/it] +2025-02-05 21:40:38 - ERROR - stderr - 52%|█████▏ | 11701/22434 [11:32:58<7:38:26, 2.56s/it] +2025-02-05 21:40:38 - ERROR - stderr - +2025-02-05 21:40:38 - ERROR - stderr - +2025-02-05 21:40:38 - INFO - stdout - {'loss': 0.6487, 'grad_norm': 1.2044693231582642, 'learning_rate': 9.787785178053792e-06, 'epoch': 1.56} +2025-02-05 21:40:38 - ERROR - stderr - 52%|█████▏ | 11701/22434 [11:32:58<7:38:26, 2.56s/it] +2025-02-05 21:40:41 - ERROR - stderr - 52%|█████▏ | 11702/22434 [11:33:00<7:32:29, 2.53s/it] +2025-02-05 21:40:41 - ERROR - stderr - +2025-02-05 21:40:41 - ERROR - stderr - +2025-02-05 
21:40:41 - INFO - stdout - {'loss': 0.6631, 'grad_norm': 1.3459947109222412, 'learning_rate': 9.786341758779817e-06, 'epoch': 1.56} +2025-02-05 21:40:41 - ERROR - stderr - 52%|█████▏ | 11702/22434 [11:33:01<7:32:29, 2.53s/it] +2025-02-05 21:40:43 - ERROR - stderr - 52%|█████▏ | 11703/22434 [11:33:03<7:37:07, 2.56s/it] +2025-02-05 21:40:43 - ERROR - stderr - +2025-02-05 21:40:43 - ERROR - stderr - +2025-02-05 21:40:43 - INFO - stdout - {'loss': 0.656, 'grad_norm': 1.0948126316070557, 'learning_rate': 9.784898343959351e-06, 'epoch': 1.56} +2025-02-05 21:40:43 - ERROR - stderr - 52%|█████▏ | 11703/22434 [11:33:03<7:37:07, 2.56s/it] +2025-02-05 21:40:46 - ERROR - stderr - 52%|█████▏ | 11704/22434 [11:33:06<7:36:27, 2.55s/it] +2025-02-05 21:40:46 - ERROR - stderr - +2025-02-05 21:40:46 - ERROR - stderr - +2025-02-05 21:40:46 - INFO - stdout - {'loss': 0.6748, 'grad_norm': 1.1210191249847412, 'learning_rate': 9.783454933622472e-06, 'epoch': 1.57} +2025-02-05 21:40:46 - ERROR - stderr - 52%|█████▏ | 11704/22434 [11:33:06<7:36:27, 2.55s/it] +2025-02-05 21:40:48 - ERROR - stderr - 52%|█████▏ | 11705/22434 [11:33:08<7:36:46, 2.55s/it] +2025-02-05 21:40:48 - ERROR - stderr - +2025-02-05 21:40:48 - ERROR - stderr - +2025-02-05 21:40:48 - INFO - stdout - {'loss': 0.7098, 'grad_norm': 1.2867801189422607, 'learning_rate': 9.782011527799263e-06, 'epoch': 1.57} +2025-02-05 21:40:48 - ERROR - stderr - 52%|█████▏ | 11705/22434 [11:33:08<7:36:46, 2.55s/it] +2025-02-05 21:40:51 - ERROR - stderr - 52%|█████▏ | 11706/22434 [11:33:11<7:29:55, 2.52s/it] +2025-02-05 21:40:51 - ERROR - stderr - +2025-02-05 21:40:51 - ERROR - stderr - +2025-02-05 21:40:51 - INFO - stdout - {'loss': 0.7392, 'grad_norm': 1.2693672180175781, 'learning_rate': 9.780568126519817e-06, 'epoch': 1.57} +2025-02-05 21:40:51 - ERROR - stderr - 52%|█████▏ | 11706/22434 [11:33:11<7:29:55, 2.52s/it] +2025-02-05 21:40:53 - ERROR - stderr - 52%|█████▏ | 11707/22434 [11:33:13<7:24:56, 2.49s/it] +2025-02-05 21:40:53 - ERROR - 
stderr - +2025-02-05 21:40:53 - ERROR - stderr - +2025-02-05 21:40:53 - INFO - stdout - {'loss': 0.6981, 'grad_norm': 1.150911569595337, 'learning_rate': 9.779124729814216e-06, 'epoch': 1.57} +2025-02-05 21:40:53 - ERROR - stderr - 52%|█████▏ | 11707/22434 [11:33:13<7:24:56, 2.49s/it] +2025-02-05 21:40:56 - ERROR - stderr - 52%|█████▏ | 11708/22434 [11:33:16<7:28:04, 2.51s/it] +2025-02-05 21:40:56 - ERROR - stderr - +2025-02-05 21:40:56 - ERROR - stderr - +2025-02-05 21:40:56 - INFO - stdout - {'loss': 0.7295, 'grad_norm': 1.3449972867965698, 'learning_rate': 9.777681337712554e-06, 'epoch': 1.57} +2025-02-05 21:40:56 - ERROR - stderr - 52%|█████▏ | 11708/22434 [11:33:16<7:28:04, 2.51s/it] +2025-02-05 21:40:58 - ERROR - stderr - 52%|█████▏ | 11709/22434 [11:33:18<7:26:06, 2.50s/it] +2025-02-05 21:40:58 - ERROR - stderr - +2025-02-05 21:40:58 - ERROR - stderr - +2025-02-05 21:40:58 - INFO - stdout - {'loss': 0.6645, 'grad_norm': 1.30966055393219, 'learning_rate': 9.77623795024491e-06, 'epoch': 1.57} +2025-02-05 21:40:58 - ERROR - stderr - 52%|█████▏ | 11709/22434 [11:33:18<7:26:06, 2.50s/it] +2025-02-05 21:41:01 - ERROR - stderr - 52%|█████▏ | 11710/22434 [11:33:21<7:25:56, 2.49s/it] +2025-02-05 21:41:01 - ERROR - stderr - +2025-02-05 21:41:01 - ERROR - stderr - +2025-02-05 21:41:01 - INFO - stdout - {'loss': 0.6483, 'grad_norm': 1.1972509622573853, 'learning_rate': 9.77479456744137e-06, 'epoch': 1.57} +2025-02-05 21:41:01 - ERROR - stderr - 52%|█████▏ | 11710/22434 [11:33:21<7:25:56, 2.49s/it] +2025-02-05 21:41:03 - ERROR - stderr - 52%|█████▏ | 11711/22434 [11:33:23<7:24:53, 2.49s/it] +2025-02-05 21:41:03 - ERROR - stderr - +2025-02-05 21:41:03 - ERROR - stderr - +2025-02-05 21:41:03 - INFO - stdout - {'loss': 0.7409, 'grad_norm': 1.3096901178359985, 'learning_rate': 9.773351189332024e-06, 'epoch': 1.57} +2025-02-05 21:41:03 - ERROR - stderr - 52%|█████▏ | 11711/22434 [11:33:23<7:24:53, 2.49s/it] +2025-02-05 21:41:06 - ERROR - stderr - 52%|█████▏ | 11712/22434 
[11:33:26<7:26:40, 2.50s/it] +2025-02-05 21:41:06 - ERROR - stderr - +2025-02-05 21:41:06 - ERROR - stderr - +2025-02-05 21:41:06 - INFO - stdout - {'loss': 0.635, 'grad_norm': 1.146596908569336, 'learning_rate': 9.771907815946955e-06, 'epoch': 1.57} +2025-02-05 21:41:06 - ERROR - stderr - 52%|█████▏ | 11712/22434 [11:33:26<7:26:40, 2.50s/it] +2025-02-05 21:41:08 - ERROR - stderr - 52%|█████▏ | 11713/22434 [11:33:28<7:32:27, 2.53s/it] +2025-02-05 21:41:08 - ERROR - stderr - +2025-02-05 21:41:08 - ERROR - stderr - +2025-02-05 21:41:08 - INFO - stdout - {'loss': 0.7125, 'grad_norm': 1.2024109363555908, 'learning_rate': 9.770464447316245e-06, 'epoch': 1.57} +2025-02-05 21:41:08 - ERROR - stderr - 52%|█████▏ | 11713/22434 [11:33:28<7:32:27, 2.53s/it] +2025-02-05 21:41:11 - ERROR - stderr - 52%|█████▏ | 11714/22434 [11:33:31<7:31:52, 2.53s/it] +2025-02-05 21:41:11 - ERROR - stderr - +2025-02-05 21:41:11 - ERROR - stderr - +2025-02-05 21:41:11 - INFO - stdout - {'loss': 0.653, 'grad_norm': 1.2433140277862549, 'learning_rate': 9.769021083469991e-06, 'epoch': 1.57} +2025-02-05 21:41:11 - ERROR - stderr - 52%|█████▏ | 11714/22434 [11:33:31<7:31:52, 2.53s/it] +2025-02-05 21:41:14 - ERROR - stderr - 52%|█████▏ | 11715/22434 [11:33:33<7:39:25, 2.57s/it] +2025-02-05 21:41:14 - ERROR - stderr - +2025-02-05 21:41:14 - ERROR - stderr - +2025-02-05 21:41:14 - INFO - stdout - {'loss': 0.7406, 'grad_norm': 1.385238766670227, 'learning_rate': 9.767577724438267e-06, 'epoch': 1.57} +2025-02-05 21:41:14 - ERROR - stderr - 52%|█████▏ | 11715/22434 [11:33:33<7:39:25, 2.57s/it] +2025-02-05 21:41:16 - ERROR - stderr - 52%|█████▏ | 11716/22434 [11:33:36<7:36:29, 2.56s/it] +2025-02-05 21:41:16 - ERROR - stderr - +2025-02-05 21:41:16 - ERROR - stderr - +2025-02-05 21:41:16 - INFO - stdout - {'loss': 0.7164, 'grad_norm': 1.0950173139572144, 'learning_rate': 9.766134370251165e-06, 'epoch': 1.57} +2025-02-05 21:41:16 - ERROR - stderr - 52%|█████▏ | 11716/22434 [11:33:36<7:36:29, 2.56s/it]
+2025-02-05 21:41:19 - ERROR - stderr - 52%|█████▏ | 11717/22434 [11:33:38<7:34:34, 2.55s/it] +2025-02-05 21:41:19 - ERROR - stderr - +2025-02-05 21:41:19 - ERROR - stderr - +2025-02-05 21:41:19 - INFO - stdout - {'loss': 0.7151, 'grad_norm': 1.2922570705413818, 'learning_rate': 9.76469102093877e-06, 'epoch': 1.57} +2025-02-05 21:41:19 - ERROR - stderr - 52%|█████▏ | 11717/22434 [11:33:38<7:34:34, 2.55s/it] +2025-02-05 21:41:21 - ERROR - stderr - 52%|█████▏ | 11718/22434 [11:33:41<7:29:54, 2.52s/it] +2025-02-05 21:41:21 - ERROR - stderr - +2025-02-05 21:41:21 - ERROR - stderr - +2025-02-05 21:41:21 - INFO - stdout - {'loss': 0.7037, 'grad_norm': 1.1620092391967773, 'learning_rate': 9.76324767653116e-06, 'epoch': 1.57} +2025-02-05 21:41:21 - ERROR - stderr - 52%|█████▏ | 11718/22434 [11:33:41<7:29:54, 2.52s/it] +2025-02-05 21:41:24 - ERROR - stderr - 52%|█████▏ | 11719/22434 [11:33:43<7:29:12, 2.52s/it] +2025-02-05 21:41:24 - ERROR - stderr - +2025-02-05 21:41:24 - ERROR - stderr - +2025-02-05 21:41:24 - INFO - stdout - {'loss': 0.6438, 'grad_norm': 1.2484108209609985, 'learning_rate': 9.761804337058428e-06, 'epoch': 1.57} +2025-02-05 21:41:24 - ERROR - stderr - 52%|█████▏ | 11719/22434 [11:33:43<7:29:12, 2.52s/it] +2025-02-05 21:41:26 - ERROR - stderr - 52%|█████▏ | 11720/22434 [11:33:46<7:30:21, 2.52s/it] +2025-02-05 21:41:26 - ERROR - stderr - +2025-02-05 21:41:26 - ERROR - stderr - +2025-02-05 21:41:26 - INFO - stdout - {'loss': 0.7158, 'grad_norm': 1.1119927167892456, 'learning_rate': 9.76036100255066e-06, 'epoch': 1.57} +2025-02-05 21:41:26 - ERROR - stderr - 52%|█████▏ | 11720/22434 [11:33:46<7:30:21, 2.52s/it] +2025-02-05 21:41:29 - ERROR - stderr - 52%|█████▏ | 11721/22434 [11:33:48<7:29:06, 2.52s/it] +2025-02-05 21:41:29 - ERROR - stderr - +2025-02-05 21:41:29 - ERROR - stderr - +2025-02-05 21:41:29 - INFO - stdout - {'loss': 0.6921, 'grad_norm': 1.1602082252502441, 'learning_rate': 9.758917673037932e-06, 'epoch': 1.57} +2025-02-05 21:41:29 - ERROR - 
stderr - 52%|█████▏ | 11721/22434 [11:33:48<7:29:06, 2.52s/it] +2025-02-05 21:41:31 - ERROR - stderr - 52%|█████▏ | 11722/22434 [11:33:51<7:29:45, 2.52s/it] +2025-02-05 21:41:31 - ERROR - stderr - +2025-02-05 21:41:31 - ERROR - stderr - +2025-02-05 21:41:31 - INFO - stdout - {'loss': 0.6929, 'grad_norm': 1.1938859224319458, 'learning_rate': 9.75747434855034e-06, 'epoch': 1.57} +2025-02-05 21:41:31 - ERROR - stderr - 52%|█████▏ | 11722/22434 [11:33:51<7:29:45, 2.52s/it] +2025-02-05 21:41:34 - ERROR - stderr - 52%|█████▏ | 11723/22434 [11:33:53<7:30:35, 2.52s/it] +2025-02-05 21:41:34 - ERROR - stderr - +2025-02-05 21:41:34 - ERROR - stderr - +2025-02-05 21:41:34 - INFO - stdout - {'loss': 0.6692, 'grad_norm': 1.1739352941513062, 'learning_rate': 9.756031029117958e-06, 'epoch': 1.57} +2025-02-05 21:41:34 - ERROR - stderr - 52%|█████▏ | 11723/22434 [11:33:54<7:30:35, 2.52s/it] +2025-02-05 21:41:36 - ERROR - stderr - 52%|█████▏ | 11724/22434 [11:33:56<7:34:18, 2.55s/it] +2025-02-05 21:41:36 - ERROR - stderr - +2025-02-05 21:41:36 - ERROR - stderr - +2025-02-05 21:41:36 - INFO - stdout - {'loss': 0.7373, 'grad_norm': 1.309211015701294, 'learning_rate': 9.75458771477088e-06, 'epoch': 1.57} +2025-02-05 21:41:36 - ERROR - stderr - 52%|█████▏ | 11724/22434 [11:33:56<7:34:18, 2.55s/it] +2025-02-05 21:41:39 - ERROR - stderr - 52%|█████▏ | 11725/22434 [11:33:59<7:51:05, 2.64s/it] +2025-02-05 21:41:39 - ERROR - stderr - +2025-02-05 21:41:39 - ERROR - stderr - +2025-02-05 21:41:39 - INFO - stdout - {'loss': 0.7484, 'grad_norm': 1.2607371807098389, 'learning_rate': 9.753144405539184e-06, 'epoch': 1.57} +2025-02-05 21:41:39 - ERROR - stderr - 52%|█████▏ | 11725/22434 [11:33:59<7:51:05, 2.64s/it] +2025-02-05 21:41:42 - ERROR - stderr - 52%|█████▏ | 11726/22434 [11:34:01<7:45:40, 2.61s/it] +2025-02-05 21:41:42 - ERROR - stderr - +2025-02-05 21:41:42 - ERROR - stderr - +2025-02-05 21:41:42 - INFO - stdout - {'loss': 0.6628, 'grad_norm': 1.0773786306381226, 'learning_rate': 
9.751701101452954e-06, 'epoch': 1.57} +2025-02-05 21:41:42 - ERROR - stderr - 52%|█████▏ | 11726/22434 [11:34:01<7:45:40, 2.61s/it] +2025-02-05 21:41:44 - ERROR - stderr - 52%|█████▏ | 11727/22434 [11:34:04<7:38:12, 2.57s/it] +2025-02-05 21:41:44 - ERROR - stderr - +2025-02-05 21:41:44 - ERROR - stderr - +2025-02-05 21:41:44 - INFO - stdout - {'loss': 0.7642, 'grad_norm': 1.219720482826233, 'learning_rate': 9.750257802542282e-06, 'epoch': 1.57} +2025-02-05 21:41:44 - ERROR - stderr - 52%|█████▏ | 11727/22434 [11:34:04<7:38:12, 2.57s/it] +2025-02-05 21:41:47 - ERROR - stderr - 52%|█████▏ | 11728/22434 [11:34:07<7:48:20, 2.62s/it] +2025-02-05 21:41:47 - ERROR - stderr - +2025-02-05 21:41:47 - ERROR - stderr - +2025-02-05 21:41:47 - INFO - stdout - {'loss': 0.663, 'grad_norm': 1.0714747905731201, 'learning_rate': 9.748814508837244e-06, 'epoch': 1.57} +2025-02-05 21:41:47 - ERROR - stderr - 52%|█████▏ | 11728/22434 [11:34:07<7:48:20, 2.62s/it] +2025-02-05 21:41:50 - ERROR - stderr - 52%|█████▏ | 11729/22434 [11:34:10<8:03:53, 2.71s/it] +2025-02-05 21:41:50 - ERROR - stderr - +2025-02-05 21:41:50 - ERROR - stderr - +2025-02-05 21:41:50 - INFO - stdout - {'loss': 0.7238, 'grad_norm': 1.3097397089004517, 'learning_rate': 9.74737122036793e-06, 'epoch': 1.57} +2025-02-05 21:41:50 - ERROR - stderr - 52%|█████▏ | 11729/22434 [11:34:10<8:03:53, 2.71s/it] +2025-02-05 21:41:52 - ERROR - stderr - 52%|█████▏ | 11730/22434 [11:34:12<7:55:16, 2.66s/it] +2025-02-05 21:41:52 - ERROR - stderr - +2025-02-05 21:41:52 - ERROR - stderr - +2025-02-05 21:41:52 - INFO - stdout - {'loss': 0.7673, 'grad_norm': 1.3196287155151367, 'learning_rate': 9.74592793716442e-06, 'epoch': 1.57} +2025-02-05 21:41:52 - ERROR - stderr - 52%|█████▏ | 11730/22434 [11:34:12<7:55:16, 2.66s/it] +2025-02-05 21:41:55 - ERROR - stderr - 52%|█████▏ | 11731/22434 [11:34:15<7:49:29, 2.63s/it] +2025-02-05 21:41:55 - ERROR - stderr - +2025-02-05 21:41:55 - ERROR - stderr - +2025-02-05 21:41:55 - INFO - stdout - {'loss': 
0.5918, 'grad_norm': 1.206199288368225, 'learning_rate': 9.744484659256796e-06, 'epoch': 1.57} +2025-02-05 21:41:55 - ERROR - stderr - 52%|█████▏ | 11731/22434 [11:34:15<7:49:29, 2.63s/it] +2025-02-05 21:41:58 - ERROR - stderr - 52%|█████▏ | 11732/22434 [11:34:17<7:50:54, 2.64s/it] +2025-02-05 21:41:58 - ERROR - stderr - +2025-02-05 21:41:58 - ERROR - stderr - +2025-02-05 21:41:58 - INFO - stdout - {'loss': 0.6645, 'grad_norm': 1.225818395614624, 'learning_rate': 9.743041386675147e-06, 'epoch': 1.57} +2025-02-05 21:41:58 - ERROR - stderr - 52%|█████▏ | 11732/22434 [11:34:17<7:50:54, 2.64s/it] +2025-02-05 21:42:00 - ERROR - stderr - 52%|█████▏ | 11733/22434 [11:34:20<7:46:37, 2.62s/it] +2025-02-05 21:42:00 - ERROR - stderr - +2025-02-05 21:42:00 - ERROR - stderr - +2025-02-05 21:42:00 - INFO - stdout - {'loss': 0.7317, 'grad_norm': 1.2376528978347778, 'learning_rate': 9.741598119449558e-06, 'epoch': 1.57} +2025-02-05 21:42:00 - ERROR - stderr - 52%|█████▏ | 11733/22434 [11:34:20<7:46:37, 2.62s/it] +2025-02-05 21:42:03 - ERROR - stderr - 52%|█████▏ | 11734/22434 [11:34:23<7:51:10, 2.64s/it] +2025-02-05 21:42:03 - ERROR - stderr - +2025-02-05 21:42:03 - ERROR - stderr - +2025-02-05 21:42:03 - INFO - stdout - {'loss': 0.6674, 'grad_norm': 1.3443000316619873, 'learning_rate': 9.740154857610103e-06, 'epoch': 1.57} +2025-02-05 21:42:03 - ERROR - stderr - 52%|█████▏ | 11734/22434 [11:34:23<7:51:10, 2.64s/it] +2025-02-05 21:42:05 - ERROR - stderr - 52%|█████▏ | 11735/22434 [11:34:25<7:42:48, 2.60s/it] +2025-02-05 21:42:05 - ERROR - stderr - +2025-02-05 21:42:05 - ERROR - stderr - +2025-02-05 21:42:05 - INFO - stdout - {'loss': 0.6393, 'grad_norm': 1.0988441705703735, 'learning_rate': 9.738711601186875e-06, 'epoch': 1.57} +2025-02-05 21:42:05 - ERROR - stderr - 52%|█████▏ | 11735/22434 [11:34:25<7:42:48, 2.60s/it] +2025-02-05 21:42:08 - ERROR - stderr - 52%|█████▏ | 11736/22434 [11:34:28<7:40:56, 2.59s/it] +2025-02-05 21:42:08 - ERROR - stderr - +2025-02-05 21:42:08 - ERROR 
- stderr - +2025-02-05 21:42:08 - INFO - stdout - {'loss': 0.6955, 'grad_norm': 1.3145753145217896, 'learning_rate': 9.737268350209951e-06, 'epoch': 1.57} +2025-02-05 21:42:08 - ERROR - stderr - 52%|█████▏ | 11736/22434 [11:34:28<7:40:56, 2.59s/it] +2025-02-05 21:42:11 - ERROR - stderr - 52%|█████▏ | 11737/22434 [11:34:30<7:41:48, 2.59s/it] +2025-02-05 21:42:11 - ERROR - stderr - +2025-02-05 21:42:11 - ERROR - stderr - +2025-02-05 21:42:11 - INFO - stdout - {'loss': 0.6991, 'grad_norm': 1.3256607055664062, 'learning_rate': 9.73582510470942e-06, 'epoch': 1.57} +2025-02-05 21:42:11 - ERROR - stderr - 52%|█████▏ | 11737/22434 [11:34:30<7:41:48, 2.59s/it] +2025-02-05 21:42:13 - ERROR - stderr - 52%|█████▏ | 11738/22434 [11:34:33<7:46:45, 2.62s/it] +2025-02-05 21:42:13 - ERROR - stderr - +2025-02-05 21:42:13 - ERROR - stderr - +2025-02-05 21:42:13 - INFO - stdout - {'loss': 0.7534, 'grad_norm': 1.2174677848815918, 'learning_rate': 9.73438186471536e-06, 'epoch': 1.57} +2025-02-05 21:42:13 - ERROR - stderr - 52%|█████▏ | 11738/22434 [11:34:33<7:46:45, 2.62s/it] +2025-02-05 21:42:16 - ERROR - stderr - 52%|█████▏ | 11739/22434 [11:34:35<7:41:14, 2.59s/it] +2025-02-05 21:42:16 - ERROR - stderr - +2025-02-05 21:42:16 - ERROR - stderr - +2025-02-05 21:42:16 - INFO - stdout - {'loss': 0.8335, 'grad_norm': 1.3708367347717285, 'learning_rate': 9.732938630257855e-06, 'epoch': 1.57} +2025-02-05 21:42:16 - ERROR - stderr - 52%|█████▏ | 11739/22434 [11:34:36<7:41:14, 2.59s/it] +2025-02-05 21:42:18 - ERROR - stderr - 52%|█████▏ | 11740/22434 [11:34:38<7:36:50, 2.56s/it] +2025-02-05 21:42:18 - ERROR - stderr - +2025-02-05 21:42:18 - ERROR - stderr - +2025-02-05 21:42:18 - INFO - stdout - {'loss': 0.6255, 'grad_norm': 1.1592994928359985, 'learning_rate': 9.731495401366992e-06, 'epoch': 1.57} +2025-02-05 21:42:18 - ERROR - stderr - 52%|█████▏ | 11740/22434 [11:34:38<7:36:50, 2.56s/it] +2025-02-05 21:42:21 - ERROR - stderr - 52%|█████▏ | 11741/22434 [11:34:41<7:34:32, 2.55s/it] 
+2025-02-05 21:42:21 - ERROR - stderr - +2025-02-05 21:42:21 - ERROR - stderr - +2025-02-05 21:42:21 - INFO - stdout - {'loss': 0.7256, 'grad_norm': 1.2380578517913818, 'learning_rate': 9.73005217807285e-06, 'epoch': 1.57} +2025-02-05 21:42:21 - ERROR - stderr - 52%|█████▏ | 11741/22434 [11:34:41<7:34:32, 2.55s/it] +2025-02-05 21:42:23 - ERROR - stderr - 52%|█████▏ | 11742/22434 [11:34:43<7:33:21, 2.54s/it] +2025-02-05 21:42:23 - ERROR - stderr - +2025-02-05 21:42:23 - ERROR - stderr - +2025-02-05 21:42:23 - INFO - stdout - {'loss': 0.7221, 'grad_norm': 1.2830851078033447, 'learning_rate': 9.728608960405508e-06, 'epoch': 1.57} +2025-02-05 21:42:23 - ERROR - stderr - 52%|█████▏ | 11742/22434 [11:34:43<7:33:21, 2.54s/it] +2025-02-05 21:42:26 - ERROR - stderr - 52%|█████▏ | 11743/22434 [11:34:45<7:28:42, 2.52s/it] +2025-02-05 21:42:26 - ERROR - stderr - +2025-02-05 21:42:26 - ERROR - stderr - +2025-02-05 21:42:26 - INFO - stdout - {'loss': 0.6956, 'grad_norm': 1.1336897611618042, 'learning_rate': 9.727165748395056e-06, 'epoch': 1.57} +2025-02-05 21:42:26 - ERROR - stderr - 52%|█████▏ | 11743/22434 [11:34:46<7:28:42, 2.52s/it] +2025-02-05 21:42:28 - ERROR - stderr - 52%|█████▏ | 11744/22434 [11:34:48<7:28:01, 2.51s/it] +2025-02-05 21:42:28 - ERROR - stderr - +2025-02-05 21:42:28 - ERROR - stderr - +2025-02-05 21:42:28 - INFO - stdout - {'loss': 0.6702, 'grad_norm': 1.0971282720565796, 'learning_rate': 9.72572254207157e-06, 'epoch': 1.57} +2025-02-05 21:42:28 - ERROR - stderr - 52%|█████▏ | 11744/22434 [11:34:48<7:28:01, 2.51s/it] +2025-02-05 21:42:31 - ERROR - stderr - 52%|█████▏ | 11745/22434 [11:34:51<7:29:58, 2.53s/it] +2025-02-05 21:42:31 - ERROR - stderr - +2025-02-05 21:42:31 - ERROR - stderr - +2025-02-05 21:42:31 - INFO - stdout - {'loss': 0.6961, 'grad_norm': 1.1822386980056763, 'learning_rate': 9.724279341465138e-06, 'epoch': 1.57} +2025-02-05 21:42:31 - ERROR - stderr - 52%|█████▏ | 11745/22434 [11:34:51<7:29:58, 2.53s/it] +2025-02-05 21:42:33 - ERROR - 
stderr - 52%|█████▏ | 11746/22434 [11:34:53<7:30:59, 2.53s/it] +2025-02-05 21:42:33 - ERROR - stderr - +2025-02-05 21:42:33 - ERROR - stderr - +2025-02-05 21:42:33 - INFO - stdout - {'loss': 0.6423, 'grad_norm': 1.1029486656188965, 'learning_rate': 9.722836146605838e-06, 'epoch': 1.57} +2025-02-05 21:42:33 - ERROR - stderr - 52%|█████▏ | 11746/22434 [11:34:53<7:30:59, 2.53s/it] +2025-02-05 21:42:36 - ERROR - stderr - 52%|█████▏ | 11747/22434 [11:34:56<7:53:36, 2.66s/it] +2025-02-05 21:42:36 - ERROR - stderr - +2025-02-05 21:42:36 - ERROR - stderr - +2025-02-05 21:42:36 - INFO - stdout - {'loss': 0.7294, 'grad_norm': 1.3332191705703735, 'learning_rate': 9.721392957523751e-06, 'epoch': 1.57} +2025-02-05 21:42:36 - ERROR - stderr - 52%|█████▏ | 11747/22434 [11:34:56<7:53:36, 2.66s/it] +2025-02-05 21:42:39 - ERROR - stderr - 52%|█████▏ | 11748/22434 [11:34:59<7:47:19, 2.62s/it] +2025-02-05 21:42:39 - ERROR - stderr - +2025-02-05 21:42:39 - ERROR - stderr - +2025-02-05 21:42:39 - INFO - stdout - {'loss': 0.7524, 'grad_norm': 1.1590595245361328, 'learning_rate': 9.719949774248967e-06, 'epoch': 1.57} +2025-02-05 21:42:39 - ERROR - stderr - 52%|█████▏ | 11748/22434 [11:34:59<7:47:19, 2.62s/it] +2025-02-05 21:42:41 - ERROR - stderr - 52%|█████▏ | 11749/22434 [11:35:01<7:44:16, 2.61s/it] +2025-02-05 21:42:41 - ERROR - stderr - +2025-02-05 21:42:41 - ERROR - stderr - +2025-02-05 21:42:41 - INFO - stdout - {'loss': 0.7005, 'grad_norm': 1.1736226081848145, 'learning_rate': 9.718506596811561e-06, 'epoch': 1.57} +2025-02-05 21:42:41 - ERROR - stderr - 52%|█████▏ | 11749/22434 [11:35:01<7:44:16, 2.61s/it] +2025-02-05 21:42:44 - ERROR - stderr - 52%|█████▏ | 11750/22434 [11:35:04<7:37:45, 2.57s/it] +2025-02-05 21:42:44 - ERROR - stderr - +2025-02-05 21:42:44 - ERROR - stderr - +2025-02-05 21:42:44 - INFO - stdout - {'loss': 0.6983, 'grad_norm': 1.327608346939087, 'learning_rate': 9.717063425241611e-06, 'epoch': 1.57} +2025-02-05 21:42:44 - ERROR - stderr - 52%|█████▏ | 11750/22434 
[11:35:04<7:37:45, 2.57s/it] +2025-02-05 21:42:46 - ERROR - stderr - 52%|█████▏ | 11751/22434 [11:35:06<7:34:26, 2.55s/it] +2025-02-05 21:42:46 - ERROR - stderr - +2025-02-05 21:42:46 - ERROR - stderr - +2025-02-05 21:42:46 - INFO - stdout - {'loss': 0.6386, 'grad_norm': 1.1685997247695923, 'learning_rate': 9.715620259569205e-06, 'epoch': 1.57} +2025-02-05 21:42:46 - ERROR - stderr - 52%|█████▏ | 11751/22434 [11:35:06<7:34:26, 2.55s/it] +2025-02-05 21:42:49 - ERROR - stderr - 52%|█████▏ | 11752/22434 [11:35:09<7:28:58, 2.52s/it] +2025-02-05 21:42:49 - ERROR - stderr - +2025-02-05 21:42:49 - ERROR - stderr - +2025-02-05 21:42:49 - INFO - stdout - {'loss': 0.669, 'grad_norm': 1.1421043872833252, 'learning_rate': 9.71417709982442e-06, 'epoch': 1.57} +2025-02-05 21:42:49 - ERROR - stderr - 52%|█████▏ | 11752/22434 [11:35:09<7:28:58, 2.52s/it] +2025-02-05 21:42:51 - ERROR - stderr - 52%|█████▏ | 11753/22434 [11:35:11<7:24:24, 2.50s/it] +2025-02-05 21:42:51 - ERROR - stderr - +2025-02-05 21:42:51 - ERROR - stderr - +2025-02-05 21:42:51 - INFO - stdout - {'loss': 0.7208, 'grad_norm': 1.1286109685897827, 'learning_rate': 9.712733946037344e-06, 'epoch': 1.57} +2025-02-05 21:42:51 - ERROR - stderr - 52%|█████▏ | 11753/22434 [11:35:11<7:24:24, 2.50s/it] +2025-02-05 21:42:54 - ERROR - stderr - 52%|█████▏ | 11754/22434 [11:35:14<7:24:00, 2.49s/it] +2025-02-05 21:42:54 - ERROR - stderr - +2025-02-05 21:42:54 - ERROR - stderr - +2025-02-05 21:42:54 - INFO - stdout - {'loss': 0.69, 'grad_norm': 1.1458673477172852, 'learning_rate': 9.711290798238056e-06, 'epoch': 1.57} +2025-02-05 21:42:54 - ERROR - stderr - 52%|█████▏ | 11754/22434 [11:35:14<7:24:00, 2.49s/it] +2025-02-05 21:42:56 - ERROR - stderr - 52%|█████▏ | 11755/22434 [11:35:16<7:22:17, 2.48s/it] +2025-02-05 21:42:56 - ERROR - stderr - +2025-02-05 21:42:56 - ERROR - stderr - +2025-02-05 21:42:56 - INFO - stdout - {'loss': 0.668, 'grad_norm': 1.143220067024231, 'learning_rate': 9.70984765645663e-06, 'epoch': 1.57} +2025-02-05 
21:42:56 - ERROR - stderr - 52%|█████▏ | 11755/22434 [11:35:16<7:22:17, 2.48s/it] +2025-02-05 21:42:59 - ERROR - stderr - 52%|█████▏ | 11756/22434 [11:35:18<7:20:50, 2.48s/it] +2025-02-05 21:42:59 - ERROR - stderr - +2025-02-05 21:42:59 - ERROR - stderr - +2025-02-05 21:42:59 - INFO - stdout - {'loss': 0.7589, 'grad_norm': 1.2981085777282715, 'learning_rate': 9.708404520723156e-06, 'epoch': 1.57} +2025-02-05 21:42:59 - ERROR - stderr - 52%|█████▏ | 11756/22434 [11:35:19<7:20:50, 2.48s/it] +2025-02-05 21:43:01 - ERROR - stderr - 52%|█████▏ | 11757/22434 [11:35:21<7:17:35, 2.46s/it] +2025-02-05 21:43:01 - ERROR - stderr - +2025-02-05 21:43:01 - ERROR - stderr - +2025-02-05 21:43:01 - INFO - stdout - {'loss': 0.7312, 'grad_norm': 1.313697099685669, 'learning_rate': 9.706961391067709e-06, 'epoch': 1.57} +2025-02-05 21:43:01 - ERROR - stderr - 52%|█████▏ | 11757/22434 [11:35:21<7:17:35, 2.46s/it] +2025-02-05 21:43:04 - ERROR - stderr - 52%|█████▏ | 11758/22434 [11:35:23<7:17:27, 2.46s/it] +2025-02-05 21:43:04 - ERROR - stderr - +2025-02-05 21:43:04 - ERROR - stderr - +2025-02-05 21:43:04 - INFO - stdout - {'loss': 0.6445, 'grad_norm': 1.164406180381775, 'learning_rate': 9.705518267520369e-06, 'epoch': 1.57} +2025-02-05 21:43:04 - ERROR - stderr - 52%|█████▏ | 11758/22434 [11:35:23<7:17:27, 2.46s/it] +2025-02-05 21:43:06 - ERROR - stderr - 52%|█████▏ | 11759/22434 [11:35:26<7:16:39, 2.45s/it] +2025-02-05 21:43:06 - ERROR - stderr - +2025-02-05 21:43:06 - ERROR - stderr - +2025-02-05 21:43:06 - INFO - stdout - {'loss': 0.6712, 'grad_norm': 1.0774198770523071, 'learning_rate': 9.704075150111222e-06, 'epoch': 1.57} +2025-02-05 21:43:06 - ERROR - stderr - 52%|█████▏ | 11759/22434 [11:35:26<7:16:39, 2.45s/it] +2025-02-05 21:43:09 - ERROR - stderr - 52%|█████▏ | 11760/22434 [11:35:28<7:24:14, 2.50s/it] +2025-02-05 21:43:09 - ERROR - stderr - +2025-02-05 21:43:09 - ERROR - stderr - +2025-02-05 21:43:09 - INFO - stdout - {'loss': 0.7603, 'grad_norm': 1.3268271684646606, 
'learning_rate': 9.702632038870342e-06, 'epoch': 1.57} +2025-02-05 21:43:09 - ERROR - stderr - 52%|█████▏ | 11760/22434 [11:35:28<7:24:14, 2.50s/it] +2025-02-05 21:43:11 - ERROR - stderr - 52%|█████▏ | 11761/22434 [11:35:31<7:29:06, 2.52s/it] +2025-02-05 21:43:11 - ERROR - stderr - +2025-02-05 21:43:11 - ERROR - stderr - +2025-02-05 21:43:11 - INFO - stdout - {'loss': 0.7605, 'grad_norm': 1.1782640218734741, 'learning_rate': 9.701188933827817e-06, 'epoch': 1.57} +2025-02-05 21:43:11 - ERROR - stderr - 52%|█████▏ | 11761/22434 [11:35:31<7:29:06, 2.52s/it] +2025-02-05 21:43:14 - ERROR - stderr - 52%|█████▏ | 11762/22434 [11:35:33<7:27:27, 2.52s/it] +2025-02-05 21:43:14 - ERROR - stderr - +2025-02-05 21:43:14 - ERROR - stderr - +2025-02-05 21:43:14 - INFO - stdout - {'loss': 0.71, 'grad_norm': 1.1995817422866821, 'learning_rate': 9.699745835013724e-06, 'epoch': 1.57} +2025-02-05 21:43:14 - ERROR - stderr - 52%|█████▏ | 11762/22434 [11:35:34<7:27:27, 2.52s/it] +2025-02-05 21:43:16 - ERROR - stderr - 52%|█████▏ | 11763/22434 [11:35:36<7:35:45, 2.56s/it] +2025-02-05 21:43:16 - ERROR - stderr - +2025-02-05 21:43:16 - ERROR - stderr - +2025-02-05 21:43:16 - INFO - stdout - {'loss': 0.7202, 'grad_norm': 1.2396368980407715, 'learning_rate': 9.698302742458135e-06, 'epoch': 1.57} +2025-02-05 21:43:16 - ERROR - stderr - 52%|█████▏ | 11763/22434 [11:35:36<7:35:45, 2.56s/it] +2025-02-05 21:43:19 - ERROR - stderr - 52%|█████▏ | 11764/22434 [11:35:39<7:29:38, 2.53s/it] +2025-02-05 21:43:19 - ERROR - stderr - +2025-02-05 21:43:19 - ERROR - stderr - +2025-02-05 21:43:19 - INFO - stdout - {'loss': 0.6312, 'grad_norm': 1.0194238424301147, 'learning_rate': 9.69685965619114e-06, 'epoch': 1.57} +2025-02-05 21:43:19 - ERROR - stderr - 52%|█████▏ | 11764/22434 [11:35:39<7:29:38, 2.53s/it] +2025-02-05 21:43:21 - ERROR - stderr - 52%|█████▏ | 11765/22434 [11:35:41<7:28:24, 2.52s/it] +2025-02-05 21:43:21 - ERROR - stderr - +2025-02-05 21:43:21 - ERROR - stderr - +2025-02-05 21:43:21 - INFO - 
stdout - {'loss': 0.7789, 'grad_norm': 1.3074367046356201, 'learning_rate': 9.695416576242818e-06, 'epoch': 1.57} +2025-02-05 21:43:21 - ERROR - stderr - 52%|█████▏ | 11765/22434 [11:35:41<7:28:24, 2.52s/it] +2025-02-05 21:43:24 - ERROR - stderr - 52%|█████▏ | 11766/22434 [11:35:44<7:30:31, 2.53s/it] +2025-02-05 21:43:24 - ERROR - stderr - +2025-02-05 21:43:24 - ERROR - stderr - +2025-02-05 21:43:24 - INFO - stdout - {'loss': 0.7292, 'grad_norm': 1.1553776264190674, 'learning_rate': 9.69397350264324e-06, 'epoch': 1.57} +2025-02-05 21:43:24 - ERROR - stderr - 52%|█████▏ | 11766/22434 [11:35:44<7:30:31, 2.53s/it] +2025-02-05 21:43:26 - ERROR - stderr - 52%|█████▏ | 11767/22434 [11:35:46<7:31:56, 2.54s/it] +2025-02-05 21:43:26 - ERROR - stderr - +2025-02-05 21:43:26 - ERROR - stderr - +2025-02-05 21:43:26 - INFO - stdout - {'loss': 0.7685, 'grad_norm': 1.2162641286849976, 'learning_rate': 9.692530435422497e-06, 'epoch': 1.57} +2025-02-05 21:43:26 - ERROR - stderr - 52%|█████▏ | 11767/22434 [11:35:46<7:31:56, 2.54s/it] +2025-02-05 21:43:29 - ERROR - stderr - 52%|█████▏ | 11768/22434 [11:35:49<7:27:10, 2.52s/it] +2025-02-05 21:43:29 - ERROR - stderr - +2025-02-05 21:43:29 - ERROR - stderr - +2025-02-05 21:43:29 - INFO - stdout - {'loss': 0.7143, 'grad_norm': 1.2103520631790161, 'learning_rate': 9.691087374610659e-06, 'epoch': 1.57} +2025-02-05 21:43:29 - ERROR - stderr - 52%|█████▏ | 11768/22434 [11:35:49<7:27:10, 2.52s/it] +2025-02-05 21:43:31 - ERROR - stderr - 52%|█████▏ | 11769/22434 [11:35:51<7:23:55, 2.50s/it] +2025-02-05 21:43:31 - ERROR - stderr - +2025-02-05 21:43:31 - ERROR - stderr - +2025-02-05 21:43:31 - INFO - stdout - {'loss': 0.7088, 'grad_norm': 1.3082326650619507, 'learning_rate': 9.689644320237812e-06, 'epoch': 1.57} +2025-02-05 21:43:31 - ERROR - stderr - 52%|█████▏ | 11769/22434 [11:35:51<7:23:55, 2.50s/it] +2025-02-05 21:43:34 - ERROR - stderr - 52%|█████▏ | 11770/22434 [11:35:54<7:25:09, 2.50s/it] +2025-02-05 21:43:34 - ERROR - stderr - 
+2025-02-05 21:43:34 - ERROR - stderr - +2025-02-05 21:43:34 - INFO - stdout - {'loss': 0.6572, 'grad_norm': 1.0748053789138794, 'learning_rate': 9.688201272334031e-06, 'epoch': 1.57} +2025-02-05 21:43:34 - ERROR - stderr - 52%|█████▏ | 11770/22434 [11:35:54<7:25:09, 2.50s/it] +2025-02-05 21:43:37 - ERROR - stderr - 52%|█████▏ | 11771/22434 [11:35:56<7:35:09, 2.56s/it] +2025-02-05 21:43:37 - ERROR - stderr - +2025-02-05 21:43:37 - ERROR - stderr - +2025-02-05 21:43:37 - INFO - stdout - {'loss': 0.6589, 'grad_norm': 1.2526825666427612, 'learning_rate': 9.686758230929395e-06, 'epoch': 1.57} +2025-02-05 21:43:37 - ERROR - stderr - 52%|█████▏ | 11771/22434 [11:35:56<7:35:09, 2.56s/it] +2025-02-05 21:43:39 - ERROR - stderr - 52%|█████▏ | 11772/22434 [11:35:59<7:30:23, 2.53s/it] +2025-02-05 21:43:39 - ERROR - stderr - +2025-02-05 21:43:39 - ERROR - stderr - +2025-02-05 21:43:39 - INFO - stdout - {'loss': 0.7648, 'grad_norm': 1.339076280593872, 'learning_rate': 9.685315196053986e-06, 'epoch': 1.57} +2025-02-05 21:43:39 - ERROR - stderr - 52%|█████▏ | 11772/22434 [11:35:59<7:30:23, 2.53s/it] +2025-02-05 21:43:42 - ERROR - stderr - 52%|█████▏ | 11773/22434 [11:36:01<7:30:32, 2.54s/it] +2025-02-05 21:43:42 - ERROR - stderr - +2025-02-05 21:43:42 - ERROR - stderr - +2025-02-05 21:43:42 - INFO - stdout - {'loss': 0.652, 'grad_norm': 1.0858122110366821, 'learning_rate': 9.683872167737883e-06, 'epoch': 1.57} +2025-02-05 21:43:42 - ERROR - stderr - 52%|█████▏ | 11773/22434 [11:36:01<7:30:32, 2.54s/it] +2025-02-05 21:43:44 - ERROR - stderr - 52%|█████▏ | 11774/22434 [11:36:04<7:29:13, 2.53s/it] +2025-02-05 21:43:44 - ERROR - stderr - +2025-02-05 21:43:44 - ERROR - stderr - +2025-02-05 21:43:44 - INFO - stdout - {'loss': 0.6638, 'grad_norm': 1.1740416288375854, 'learning_rate': 9.682429146011157e-06, 'epoch': 1.57} +2025-02-05 21:43:44 - ERROR - stderr - 52%|█████▏ | 11774/22434 [11:36:04<7:29:13, 2.53s/it] +2025-02-05 21:43:47 - ERROR - stderr - 52%|█████▏ | 11775/22434 
[11:36:06<7:27:42, 2.52s/it] +2025-02-05 21:43:47 - ERROR - stderr - +2025-02-05 21:43:47 - ERROR - stderr - +2025-02-05 21:43:47 - INFO - stdout - {'loss': 0.7042, 'grad_norm': 1.2719452381134033, 'learning_rate': 9.680986130903895e-06, 'epoch': 1.57} +2025-02-05 21:43:47 - ERROR - stderr - 52%|█████▏ | 11775/22434 [11:36:06<7:27:42, 2.52s/it] +2025-02-05 21:43:49 - ERROR - stderr - 52%|█████▏ | 11776/22434 [11:36:09<7:33:54, 2.56s/it] +2025-02-05 21:43:49 - ERROR - stderr - +2025-02-05 21:43:49 - ERROR - stderr - +2025-02-05 21:43:49 - INFO - stdout - {'loss': 0.6011, 'grad_norm': 1.2472749948501587, 'learning_rate': 9.679543122446167e-06, 'epoch': 1.57} +2025-02-05 21:43:49 - ERROR - stderr - 52%|█████▏ | 11776/22434 [11:36:09<7:33:54, 2.56s/it] +2025-02-05 21:43:52 - ERROR - stderr - 52%|█████▏ | 11777/22434 [11:36:12<7:34:20, 2.56s/it] +2025-02-05 21:43:52 - ERROR - stderr - +2025-02-05 21:43:52 - ERROR - stderr - +2025-02-05 21:43:52 - INFO - stdout - {'loss': 0.7186, 'grad_norm': 1.2287189960479736, 'learning_rate': 9.67810012066806e-06, 'epoch': 1.57} +2025-02-05 21:43:52 - ERROR - stderr - 52%|█████▏ | 11777/22434 [11:36:12<7:34:20, 2.56s/it] +2025-02-05 21:43:54 - ERROR - stderr - 53%|█████▎ | 11778/22434 [11:36:14<7:31:02, 2.54s/it] +2025-02-05 21:43:54 - ERROR - stderr - +2025-02-05 21:43:54 - ERROR - stderr - +2025-02-05 21:43:54 - INFO - stdout - {'loss': 0.7487, 'grad_norm': 1.19370698928833, 'learning_rate': 9.676657125599649e-06, 'epoch': 1.58} +2025-02-05 21:43:54 - ERROR - stderr - 53%|█████▎ | 11778/22434 [11:36:14<7:31:02, 2.54s/it] +2025-02-05 21:43:57 - ERROR - stderr - 53%|█████▎ | 11779/22434 [11:36:17<7:28:28, 2.53s/it] +2025-02-05 21:43:57 - ERROR - stderr - +2025-02-05 21:43:57 - ERROR - stderr - +2025-02-05 21:43:57 - INFO - stdout - {'loss': 0.6886, 'grad_norm': 1.2478740215301514, 'learning_rate': 9.675214137271007e-06, 'epoch': 1.58} +2025-02-05 21:43:57 - ERROR - stderr - 53%|█████▎ | 11779/22434 [11:36:17<7:28:28, 2.53s/it] 
+2025-02-05 21:43:59 - ERROR - stderr - 53%|█████▎ | 11780/22434 [11:36:19<7:29:14, 2.53s/it] +2025-02-05 21:43:59 - ERROR - stderr - +2025-02-05 21:43:59 - ERROR - stderr - +2025-02-05 21:43:59 - INFO - stdout - {'loss': 0.7126, 'grad_norm': 1.1382899284362793, 'learning_rate': 9.67377115571222e-06, 'epoch': 1.58} +2025-02-05 21:43:59 - ERROR - stderr - 53%|█████▎ | 11780/22434 [11:36:19<7:29:14, 2.53s/it] +2025-02-05 21:44:02 - ERROR - stderr - 53%|█████▎ | 11781/22434 [11:36:22<7:30:01, 2.53s/it] +2025-02-05 21:44:02 - ERROR - stderr - +2025-02-05 21:44:02 - ERROR - stderr - +2025-02-05 21:44:02 - INFO - stdout - {'loss': 0.693, 'grad_norm': 1.2274399995803833, 'learning_rate': 9.67232818095336e-06, 'epoch': 1.58} +2025-02-05 21:44:02 - ERROR - stderr - 53%|█████▎ | 11781/22434 [11:36:22<7:30:01, 2.53s/it] +2025-02-05 21:44:04 - ERROR - stderr - 53%|█████▎ | 11782/22434 [11:36:24<7:29:57, 2.53s/it] +2025-02-05 21:44:04 - ERROR - stderr - +2025-02-05 21:44:04 - ERROR - stderr - +2025-02-05 21:44:04 - INFO - stdout - {'loss': 0.6347, 'grad_norm': 1.177681565284729, 'learning_rate': 9.670885213024502e-06, 'epoch': 1.58} +2025-02-05 21:44:04 - ERROR - stderr - 53%|█████▎ | 11782/22434 [11:36:24<7:29:57, 2.53s/it] +2025-02-05 21:44:07 - ERROR - stderr - 53%|█████▎ | 11783/22434 [11:36:27<7:23:36, 2.50s/it] +2025-02-05 21:44:07 - ERROR - stderr - +2025-02-05 21:44:07 - ERROR - stderr - +2025-02-05 21:44:07 - INFO - stdout - {'loss': 0.7251, 'grad_norm': 1.3298187255859375, 'learning_rate': 9.669442251955728e-06, 'epoch': 1.58} +2025-02-05 21:44:07 - ERROR - stderr - 53%|█████▎ | 11783/22434 [11:36:27<7:23:36, 2.50s/it] +2025-02-05 21:44:09 - ERROR - stderr - 53%|█████▎ | 11784/22434 [11:36:29<7:21:42, 2.49s/it] +2025-02-05 21:44:09 - ERROR - stderr - +2025-02-05 21:44:09 - ERROR - stderr - +2025-02-05 21:44:09 - INFO - stdout - {'loss': 0.6272, 'grad_norm': 1.171137809753418, 'learning_rate': 9.667999297777113e-06, 'epoch': 1.58} +2025-02-05 21:44:09 - ERROR - stderr 
- 53%|█████▎ | 11784/22434 [11:36:29<7:21:42, 2.49s/it] +2025-02-05 21:44:12 - ERROR - stderr - 53%|█████▎ | 11785/22434 [11:36:32<7:36:20, 2.57s/it] +2025-02-05 21:44:12 - ERROR - stderr - +2025-02-05 21:44:12 - ERROR - stderr - +2025-02-05 21:44:12 - INFO - stdout - {'loss': 0.7075, 'grad_norm': 1.2151525020599365, 'learning_rate': 9.666556350518738e-06, 'epoch': 1.58} +2025-02-05 21:44:12 - ERROR - stderr - 53%|█████▎ | 11785/22434 [11:36:32<7:36:20, 2.57s/it] +2025-02-05 21:44:15 - ERROR - stderr - 53%|█████▎ | 11786/22434 [11:36:34<7:31:47, 2.55s/it] +2025-02-05 21:44:15 - ERROR - stderr - +2025-02-05 21:44:15 - ERROR - stderr - +2025-02-05 21:44:15 - INFO - stdout - {'loss': 0.7594, 'grad_norm': 1.174127221107483, 'learning_rate': 9.665113410210678e-06, 'epoch': 1.58} +2025-02-05 21:44:15 - ERROR - stderr - 53%|█████▎ | 11786/22434 [11:36:34<7:31:47, 2.55s/it] +2025-02-05 21:44:17 - ERROR - stderr - 53%|█████▎ | 11787/22434 [11:36:37<7:52:12, 2.66s/it] +2025-02-05 21:44:18 - ERROR - stderr - +2025-02-05 21:44:18 - ERROR - stderr - +2025-02-05 21:44:18 - INFO - stdout - {'loss': 0.781, 'grad_norm': 1.2687079906463623, 'learning_rate': 9.663670476883005e-06, 'epoch': 1.58} +2025-02-05 21:44:18 - ERROR - stderr - 53%|█████▎ | 11787/22434 [11:36:37<7:52:12, 2.66s/it] +2025-02-05 21:44:20 - ERROR - stderr - 53%|█████▎ | 11788/22434 [11:36:40<7:45:27, 2.62s/it] +2025-02-05 21:44:20 - ERROR - stderr - +2025-02-05 21:44:20 - ERROR - stderr - +2025-02-05 21:44:20 - INFO - stdout - {'loss': 0.7112, 'grad_norm': 1.1923011541366577, 'learning_rate': 9.662227550565801e-06, 'epoch': 1.58} +2025-02-05 21:44:20 - ERROR - stderr - 53%|█████▎ | 11788/22434 [11:36:40<7:45:27, 2.62s/it] +2025-02-05 21:44:23 - ERROR - stderr - 53%|█████▎ | 11789/22434 [11:36:42<7:48:23, 2.64s/it] +2025-02-05 21:44:23 - ERROR - stderr - +2025-02-05 21:44:23 - ERROR - stderr - +2025-02-05 21:44:23 - INFO - stdout - {'loss': 0.6796, 'grad_norm': 1.2337623834609985, 'learning_rate': 
9.660784631289141e-06, 'epoch': 1.58} +2025-02-05 21:44:23 - ERROR - stderr - 53%|█████▎ | 11789/22434 [11:36:42<7:48:23, 2.64s/it] +2025-02-05 21:44:25 - ERROR - stderr - 53%|█████▎ | 11790/22434 [11:36:45<7:39:15, 2.59s/it] +2025-02-05 21:44:25 - ERROR - stderr - +2025-02-05 21:44:25 - ERROR - stderr - +2025-02-05 21:44:25 - INFO - stdout - {'loss': 0.7037, 'grad_norm': 1.2290518283843994, 'learning_rate': 9.659341719083096e-06, 'epoch': 1.58} +2025-02-05 21:44:25 - ERROR - stderr - 53%|█████▎ | 11790/22434 [11:36:45<7:39:15, 2.59s/it] +2025-02-05 21:44:28 - ERROR - stderr - 53%|█████▎ | 11791/22434 [11:36:47<7:33:30, 2.56s/it] +2025-02-05 21:44:28 - ERROR - stderr - +2025-02-05 21:44:28 - ERROR - stderr - +2025-02-05 21:44:28 - INFO - stdout - {'loss': 0.6711, 'grad_norm': 1.1059620380401611, 'learning_rate': 9.657898813977753e-06, 'epoch': 1.58} +2025-02-05 21:44:28 - ERROR - stderr - 53%|█████▎ | 11791/22434 [11:36:47<7:33:30, 2.56s/it] +2025-02-05 21:44:30 - ERROR - stderr - 53%|█████▎ | 11792/22434 [11:36:50<7:33:34, 2.56s/it] +2025-02-05 21:44:30 - ERROR - stderr - +2025-02-05 21:44:30 - ERROR - stderr - +2025-02-05 21:44:30 - INFO - stdout - {'loss': 0.7091, 'grad_norm': 1.114241361618042, 'learning_rate': 9.656455916003178e-06, 'epoch': 1.58} +2025-02-05 21:44:30 - ERROR - stderr - 53%|█████▎ | 11792/22434 [11:36:50<7:33:34, 2.56s/it] +2025-02-05 21:44:33 - ERROR - stderr - 53%|█████▎ | 11793/22434 [11:36:52<7:29:22, 2.53s/it] +2025-02-05 21:44:33 - ERROR - stderr - +2025-02-05 21:44:33 - ERROR - stderr - +2025-02-05 21:44:33 - INFO - stdout - {'loss': 0.6664, 'grad_norm': 1.1974172592163086, 'learning_rate': 9.655013025189452e-06, 'epoch': 1.58} +2025-02-05 21:44:33 - ERROR - stderr - 53%|█████▎ | 11793/22434 [11:36:52<7:29:22, 2.53s/it] +2025-02-05 21:44:36 - ERROR - stderr - 53%|█████▎ | 11794/22434 [11:36:56<8:15:24, 2.79s/it] +2025-02-05 21:44:36 - ERROR - stderr - +2025-02-05 21:44:36 - ERROR - stderr - +2025-02-05 21:44:36 - INFO - stdout - 
{'loss': 0.7615, 'grad_norm': 1.433977484703064, 'learning_rate': 9.653570141566653e-06, 'epoch': 1.58} +2025-02-05 21:44:36 - ERROR - stderr - 53%|█████▎ | 11794/22434 [11:36:56<8:15:24, 2.79s/it] +2025-02-05 21:44:39 - ERROR - stderr - 53%|█████▎ | 11795/22434 [11:36:58<8:05:35, 2.74s/it] +2025-02-05 21:44:39 - ERROR - stderr - +2025-02-05 21:44:39 - ERROR - stderr - +2025-02-05 21:44:39 - INFO - stdout - {'loss': 0.6729, 'grad_norm': 1.2793447971343994, 'learning_rate': 9.652127265164846e-06, 'epoch': 1.58} +2025-02-05 21:44:39 - ERROR - stderr - 53%|█████▎ | 11795/22434 [11:36:58<8:05:35, 2.74s/it] +2025-02-05 21:44:41 - ERROR - stderr - 53%|█████▎ | 11796/22434 [11:37:01<7:56:13, 2.69s/it] +2025-02-05 21:44:41 - ERROR - stderr - +2025-02-05 21:44:41 - ERROR - stderr - +2025-02-05 21:44:41 - INFO - stdout - {'loss': 0.7261, 'grad_norm': 1.1911096572875977, 'learning_rate': 9.650684396014115e-06, 'epoch': 1.58} +2025-02-05 21:44:41 - ERROR - stderr - 53%|█████▎ | 11796/22434 [11:37:01<7:56:13, 2.69s/it] +2025-02-05 21:44:44 - ERROR - stderr - 53%|█████▎ | 11797/22434 [11:37:03<7:43:19, 2.61s/it] +2025-02-05 21:44:44 - ERROR - stderr - +2025-02-05 21:44:44 - ERROR - stderr - +2025-02-05 21:44:44 - INFO - stdout - {'loss': 0.6317, 'grad_norm': 1.1903282403945923, 'learning_rate': 9.64924153414454e-06, 'epoch': 1.58} +2025-02-05 21:44:44 - ERROR - stderr - 53%|█████▎ | 11797/22434 [11:37:03<7:43:19, 2.61s/it] +2025-02-05 21:44:46 - ERROR - stderr - 53%|█████▎ | 11798/22434 [11:37:06<7:38:22, 2.59s/it] +2025-02-05 21:44:46 - ERROR - stderr - +2025-02-05 21:44:46 - ERROR - stderr - +2025-02-05 21:44:46 - INFO - stdout - {'loss': 0.754, 'grad_norm': 1.187843680381775, 'learning_rate': 9.64779867958618e-06, 'epoch': 1.58} +2025-02-05 21:44:46 - ERROR - stderr - 53%|█████▎ | 11798/22434 [11:37:06<7:38:22, 2.59s/it] +2025-02-05 21:44:49 - ERROR - stderr - 53%|█████▎ | 11799/22434 [11:37:08<7:30:24, 2.54s/it] +2025-02-05 21:44:49 - ERROR - stderr - +2025-02-05 21:44:49
- ERROR - stderr - +2025-02-05 21:44:49 - INFO - stdout - {'loss': 0.8519, 'grad_norm': 1.3745046854019165, 'learning_rate': 9.646355832369128e-06, 'epoch': 1.58} +2025-02-05 21:44:49 - ERROR - stderr - 53%|█████▎ | 11799/22434 [11:37:08<7:30:24, 2.54s/it] +2025-02-05 21:44:51 - ERROR - stderr - 53%|█████▎ | 11800/22434 [11:37:11<7:30:16, 2.54s/it] +2025-02-05 21:44:51 - ERROR - stderr - +2025-02-05 21:44:51 - ERROR - stderr - +2025-02-05 21:44:51 - INFO - stdout - {'loss': 0.7947, 'grad_norm': 1.3015426397323608, 'learning_rate': 9.644912992523444e-06, 'epoch': 1.58} +2025-02-05 21:44:51 - ERROR - stderr - 53%|█████▎ | 11800/22434 [11:37:11<7:30:16, 2.54s/it] +2025-02-05 21:44:54 - ERROR - stderr - 53%|█████▎ | 11801/22434 [11:37:14<7:32:30, 2.55s/it] +2025-02-05 21:44:54 - ERROR - stderr - +2025-02-05 21:44:54 - ERROR - stderr - +2025-02-05 21:44:54 - INFO - stdout - {'loss': 0.7313, 'grad_norm': 1.2784847021102905, 'learning_rate': 9.643470160079213e-06, 'epoch': 1.58} +2025-02-05 21:44:54 - ERROR - stderr - 53%|█████▎ | 11801/22434 [11:37:14<7:32:30, 2.55s/it] +2025-02-05 21:44:56 - ERROR - stderr - 53%|█████▎ | 11802/22434 [11:37:16<7:31:44, 2.55s/it] +2025-02-05 21:44:56 - ERROR - stderr - +2025-02-05 21:44:56 - ERROR - stderr - +2025-02-05 21:44:56 - INFO - stdout - {'loss': 0.6432, 'grad_norm': 1.1897597312927246, 'learning_rate': 9.642027335066502e-06, 'epoch': 1.58} +2025-02-05 21:44:56 - ERROR - stderr - 53%|█████▎ | 11802/22434 [11:37:16<7:31:44, 2.55s/it] +2025-02-05 21:44:59 - ERROR - stderr - 53%|█████▎ | 11803/22434 [11:37:19<7:30:05, 2.54s/it] +2025-02-05 21:44:59 - ERROR - stderr - +2025-02-05 21:44:59 - ERROR - stderr - +2025-02-05 21:44:59 - INFO - stdout - {'loss': 0.7218, 'grad_norm': 1.1843338012695312, 'learning_rate': 9.64058451751539e-06, 'epoch': 1.58} +2025-02-05 21:44:59 - ERROR - stderr - 53%|█████▎ | 11803/22434 [11:37:19<7:30:05, 2.54s/it] +2025-02-05 21:45:01 - ERROR - stderr - 53%|█████▎ | 11804/22434 [11:37:21<7:25:25, 2.51s/it] 
+2025-02-05 21:45:01 - ERROR - stderr - +2025-02-05 21:45:01 - ERROR - stderr - +2025-02-05 21:45:01 - INFO - stdout - {'loss': 0.7277, 'grad_norm': 1.2876243591308594, 'learning_rate': 9.63914170745595e-06, 'epoch': 1.58} +2025-02-05 21:45:01 - ERROR - stderr - 53%|█████▎ | 11804/22434 [11:37:21<7:25:25, 2.51s/it] +2025-02-05 21:45:04 - ERROR - stderr - 53%|█████▎ | 11805/22434 [11:37:24<7:25:52, 2.52s/it] +2025-02-05 21:45:04 - ERROR - stderr - +2025-02-05 21:45:04 - ERROR - stderr - +2025-02-05 21:45:04 - INFO - stdout - {'loss': 0.6611, 'grad_norm': 1.217679500579834, 'learning_rate': 9.63769890491826e-06, 'epoch': 1.58} +2025-02-05 21:45:04 - ERROR - stderr - 53%|█████▎ | 11805/22434 [11:37:24<7:25:52, 2.52s/it] +2025-02-05 21:45:06 - ERROR - stderr - 53%|█████▎ | 11806/22434 [11:37:26<7:25:27, 2.51s/it] +2025-02-05 21:45:06 - ERROR - stderr - +2025-02-05 21:45:06 - ERROR - stderr - +2025-02-05 21:45:06 - INFO - stdout - {'loss': 0.7029, 'grad_norm': 1.3497886657714844, 'learning_rate': 9.636256109932382e-06, 'epoch': 1.58} +2025-02-05 21:45:06 - ERROR - stderr - 53%|█████▎ | 11806/22434 [11:37:26<7:25:27, 2.51s/it] +2025-02-05 21:45:09 - ERROR - stderr - 53%|█████▎ | 11807/22434 [11:37:29<7:22:24, 2.50s/it] +2025-02-05 21:45:09 - ERROR - stderr - +2025-02-05 21:45:09 - ERROR - stderr - +2025-02-05 21:45:09 - INFO - stdout - {'loss': 0.7257, 'grad_norm': 1.1805776357650757, 'learning_rate': 9.634813322528403e-06, 'epoch': 1.58} +2025-02-05 21:45:09 - ERROR - stderr - 53%|█████▎ | 11807/22434 [11:37:29<7:22:24, 2.50s/it] +2025-02-05 21:45:11 - ERROR - stderr - 53%|█████▎ | 11808/22434 [11:37:31<7:21:02, 2.49s/it] +2025-02-05 21:45:11 - ERROR - stderr - +2025-02-05 21:45:11 - ERROR - stderr - +2025-02-05 21:45:11 - INFO - stdout - {'loss': 0.5769, 'grad_norm': 1.070351481437683, 'learning_rate': 9.633370542736386e-06, 'epoch': 1.58} +2025-02-05 21:45:11 - ERROR - stderr - 53%|█████▎ | 11808/22434 [11:37:31<7:21:02, 2.49s/it] +2025-02-05 21:45:14 - ERROR - stderr 
- 53%|█████▎ | 11809/22434 [11:37:34<7:26:17, 2.52s/it] +2025-02-05 21:45:14 - ERROR - stderr - +2025-02-05 21:45:14 - ERROR - stderr - +2025-02-05 21:45:14 - INFO - stdout - {'loss': 0.6408, 'grad_norm': 1.1269499063491821, 'learning_rate': 9.631927770586412e-06, 'epoch': 1.58} +2025-02-05 21:45:14 - ERROR - stderr - 53%|█████▎ | 11809/22434 [11:37:34<7:26:17, 2.52s/it] +2025-02-05 21:45:16 - ERROR - stderr - 53%|█████▎ | 11810/22434 [11:37:36<7:23:28, 2.50s/it] +2025-02-05 21:45:16 - ERROR - stderr - +2025-02-05 21:45:16 - ERROR - stderr - +2025-02-05 21:45:16 - INFO - stdout - {'loss': 0.6845, 'grad_norm': 1.390510082244873, 'learning_rate': 9.630485006108554e-06, 'epoch': 1.58} +2025-02-05 21:45:16 - ERROR - stderr - 53%|█████▎ | 11810/22434 [11:37:36<7:23:28, 2.50s/it] +2025-02-05 21:45:19 - ERROR - stderr - 53%|█████▎ | 11811/22434 [11:37:39<7:26:51, 2.52s/it] +2025-02-05 21:45:19 - ERROR - stderr - +2025-02-05 21:45:19 - ERROR - stderr - +2025-02-05 21:45:19 - INFO - stdout - {'loss': 0.6793, 'grad_norm': 1.1902354955673218, 'learning_rate': 9.629042249332878e-06, 'epoch': 1.58} +2025-02-05 21:45:19 - ERROR - stderr - 53%|█████▎ | 11811/22434 [11:37:39<7:26:51, 2.52s/it] +2025-02-05 21:45:21 - ERROR - stderr - 53%|█████▎ | 11812/22434 [11:37:41<7:25:23, 2.52s/it] +2025-02-05 21:45:21 - ERROR - stderr - +2025-02-05 21:45:21 - ERROR - stderr - +2025-02-05 21:45:21 - INFO - stdout - {'loss': 0.6867, 'grad_norm': 1.2409356832504272, 'learning_rate': 9.627599500289464e-06, 'epoch': 1.58} +2025-02-05 21:45:21 - ERROR - stderr - 53%|█████▎ | 11812/22434 [11:37:41<7:25:23, 2.52s/it] +2025-02-05 21:45:24 - ERROR - stderr - 53%|█████▎ | 11813/22434 [11:37:44<7:27:03, 2.53s/it] +2025-02-05 21:45:24 - ERROR - stderr - +2025-02-05 21:45:24 - ERROR - stderr - +2025-02-05 21:45:24 - INFO - stdout - {'loss': 0.6509, 'grad_norm': 1.1123534440994263, 'learning_rate': 9.62615675900838e-06, 'epoch': 1.58} +2025-02-05 21:45:24 - ERROR - stderr - 53%|█████▎ | 11813/22434 
[11:37:44<7:27:03, 2.53s/it] +2025-02-05 21:45:26 - ERROR - stderr - 53%|█████▎ | 11814/22434 [11:37:46<7:27:49, 2.53s/it] +2025-02-05 21:45:26 - ERROR - stderr - +2025-02-05 21:45:26 - ERROR - stderr - +2025-02-05 21:45:26 - INFO - stdout - {'loss': 0.7546, 'grad_norm': 1.3697246313095093, 'learning_rate': 9.624714025519703e-06, 'epoch': 1.58} +2025-02-05 21:45:26 - ERROR - stderr - 53%|█████▎ | 11814/22434 [11:37:46<7:27:49, 2.53s/it] +2025-02-05 21:45:29 - ERROR - stderr - 53%|█████▎ | 11815/22434 [11:37:49<7:26:12, 2.52s/it] +2025-02-05 21:45:29 - ERROR - stderr - +2025-02-05 21:45:29 - ERROR - stderr - +2025-02-05 21:45:29 - INFO - stdout - {'loss': 0.6603, 'grad_norm': 1.1406394243240356, 'learning_rate': 9.623271299853501e-06, 'epoch': 1.58} +2025-02-05 21:45:29 - ERROR - stderr - 53%|█████▎ | 11815/22434 [11:37:49<7:26:12, 2.52s/it] +2025-02-05 21:45:32 - ERROR - stderr - 53%|█████▎ | 11816/22434 [11:37:51<7:31:55, 2.55s/it] +2025-02-05 21:45:32 - ERROR - stderr - +2025-02-05 21:45:32 - ERROR - stderr - +2025-02-05 21:45:32 - INFO - stdout - {'loss': 0.642, 'grad_norm': 1.0919586420059204, 'learning_rate': 9.62182858203985e-06, 'epoch': 1.58} +2025-02-05 21:45:32 - ERROR - stderr - 53%|█████▎ | 11816/22434 [11:37:51<7:31:55, 2.55s/it] +2025-02-05 21:45:34 - ERROR - stderr - 53%|█████▎ | 11817/22434 [11:37:54<7:27:38, 2.53s/it] +2025-02-05 21:45:34 - ERROR - stderr - +2025-02-05 21:45:34 - ERROR - stderr - +2025-02-05 21:45:34 - INFO - stdout - {'loss': 0.6775, 'grad_norm': 1.0920943021774292, 'learning_rate': 9.62038587210882e-06, 'epoch': 1.58} +2025-02-05 21:45:34 - ERROR - stderr - 53%|█████▎ | 11817/22434 [11:37:54<7:27:38, 2.53s/it] +2025-02-05 21:45:37 - ERROR - stderr - 53%|█████▎ | 11818/22434 [11:37:56<7:26:26, 2.52s/it] +2025-02-05 21:45:37 - ERROR - stderr - +2025-02-05 21:45:37 - ERROR - stderr - +2025-02-05 21:45:37 - INFO - stdout - {'loss': 0.7203, 'grad_norm': 1.2340935468673706, 'learning_rate': 9.618943170090483e-06, 'epoch': 1.58} 
+2025-02-05 21:45:37 - ERROR - stderr - 53%|█████▎ | 11818/22434 [11:37:56<7:26:26, 2.52s/it] +2025-02-05 21:45:39 - ERROR - stderr - 53%|█████▎ | 11819/22434 [11:37:59<7:27:33, 2.53s/it] +2025-02-05 21:45:39 - ERROR - stderr - +2025-02-05 21:45:39 - ERROR - stderr - +2025-02-05 21:45:39 - INFO - stdout - {'loss': 0.677, 'grad_norm': 1.270204782485962, 'learning_rate': 9.617500476014909e-06, 'epoch': 1.58} +2025-02-05 21:45:39 - ERROR - stderr - 53%|█████▎ | 11819/22434 [11:37:59<7:27:33, 2.53s/it] +2025-02-05 21:45:42 - ERROR - stderr - 53%|█████▎ | 11820/22434 [11:38:01<7:26:37, 2.52s/it] +2025-02-05 21:45:42 - ERROR - stderr - +2025-02-05 21:45:42 - ERROR - stderr - +2025-02-05 21:45:42 - INFO - stdout - {'loss': 0.5913, 'grad_norm': 1.1748254299163818, 'learning_rate': 9.616057789912176e-06, 'epoch': 1.58} +2025-02-05 21:45:42 - ERROR - stderr - 53%|█████▎ | 11820/22434 [11:38:01<7:26:37, 2.52s/it] +2025-02-05 21:45:44 - ERROR - stderr - 53%|█████▎ | 11821/22434 [11:38:04<7:25:32, 2.52s/it] +2025-02-05 21:45:44 - ERROR - stderr - +2025-02-05 21:45:44 - ERROR - stderr - +2025-02-05 21:45:44 - INFO - stdout - {'loss': 0.7121, 'grad_norm': 1.165730357170105, 'learning_rate': 9.614615111812346e-06, 'epoch': 1.58} +2025-02-05 21:45:44 - ERROR - stderr - 53%|█████▎ | 11821/22434 [11:38:04<7:25:32, 2.52s/it] +2025-02-05 21:45:47 - ERROR - stderr - 53%|█████▎ | 11822/22434 [11:38:06<7:28:53, 2.54s/it] +2025-02-05 21:45:47 - ERROR - stderr - +2025-02-05 21:45:47 - ERROR - stderr - +2025-02-05 21:45:47 - INFO - stdout - {'loss': 0.7946, 'grad_norm': 1.4562066793441772, 'learning_rate': 9.613172441745497e-06, 'epoch': 1.58} +2025-02-05 21:45:47 - ERROR - stderr - 53%|█████▎ | 11822/22434 [11:38:07<7:28:53, 2.54s/it] +2025-02-05 21:45:49 - ERROR - stderr - 53%|█████▎ | 11823/22434 [11:38:09<7:33:13, 2.56s/it] +2025-02-05 21:45:49 - ERROR - stderr - +2025-02-05 21:45:49 - ERROR - stderr - +2025-02-05 21:45:49 - INFO - stdout - {'loss': 0.7234, 'grad_norm': 
1.2353875637054443, 'learning_rate': 9.611729779741701e-06, 'epoch': 1.58} +2025-02-05 21:45:49 - ERROR - stderr - 53%|█████▎ | 11823/22434 [11:38:09<7:33:13, 2.56s/it] +2025-02-05 21:45:52 - ERROR - stderr - 53%|█████▎ | 11824/22434 [11:38:12<7:27:26, 2.53s/it] +2025-02-05 21:45:52 - ERROR - stderr - +2025-02-05 21:45:52 - ERROR - stderr - +2025-02-05 21:45:52 - INFO - stdout - {'loss': 0.6266, 'grad_norm': 1.1224403381347656, 'learning_rate': 9.610287125831021e-06, 'epoch': 1.58} +2025-02-05 21:45:52 - ERROR - stderr - 53%|█████▎ | 11824/22434 [11:38:12<7:27:26, 2.53s/it] +2025-02-05 21:45:54 - ERROR - stderr - 53%|█████▎ | 11825/22434 [11:38:14<7:28:37, 2.54s/it] +2025-02-05 21:45:54 - ERROR - stderr - +2025-02-05 21:45:54 - ERROR - stderr - +2025-02-05 21:45:54 - INFO - stdout - {'loss': 0.7127, 'grad_norm': 1.182630181312561, 'learning_rate': 9.608844480043538e-06, 'epoch': 1.58} +2025-02-05 21:45:54 - ERROR - stderr - 53%|█████▎ | 11825/22434 [11:38:14<7:28:37, 2.54s/it] +2025-02-05 21:45:57 - ERROR - stderr - 53%|█████▎ | 11826/22434 [11:38:17<7:38:54, 2.60s/it] +2025-02-05 21:45:57 - ERROR - stderr - +2025-02-05 21:45:57 - ERROR - stderr - +2025-02-05 21:45:57 - INFO - stdout - {'loss': 0.6399, 'grad_norm': 1.1505976915359497, 'learning_rate': 9.607401842409318e-06, 'epoch': 1.58} +2025-02-05 21:45:57 - ERROR - stderr - 53%|█████▎ | 11826/22434 [11:38:17<7:38:54, 2.60s/it] +2025-02-05 21:46:00 - ERROR - stderr - 53%|█████▎ | 11827/22434 [11:38:19<7:36:29, 2.58s/it] +2025-02-05 21:46:00 - ERROR - stderr - +2025-02-05 21:46:00 - ERROR - stderr - +2025-02-05 21:46:00 - INFO - stdout - {'loss': 0.7762, 'grad_norm': 1.2644262313842773, 'learning_rate': 9.605959212958425e-06, 'epoch': 1.58} +2025-02-05 21:46:00 - ERROR - stderr - 53%|█████▎ | 11827/22434 [11:38:19<7:36:29, 2.58s/it] +2025-02-05 21:46:02 - ERROR - stderr - 53%|█████▎ | 11828/22434 [11:38:22<7:33:14, 2.56s/it] +2025-02-05 21:46:02 - ERROR - stderr - +2025-02-05 21:46:02 - ERROR - stderr - 
+2025-02-05 21:46:02 - INFO - stdout - {'loss': 0.6393, 'grad_norm': 1.1500595808029175, 'learning_rate': 9.60451659172094e-06, 'epoch': 1.58} +2025-02-05 21:46:02 - ERROR - stderr - 53%|█████▎ | 11828/22434 [11:38:22<7:33:14, 2.56s/it] +2025-02-05 21:46:05 - ERROR - stderr - 53%|█████▎ | 11829/22434 [11:38:24<7:30:23, 2.55s/it] +2025-02-05 21:46:05 - ERROR - stderr - +2025-02-05 21:46:05 - ERROR - stderr - +2025-02-05 21:46:05 - INFO - stdout - {'loss': 0.6892, 'grad_norm': 1.2151269912719727, 'learning_rate': 9.603073978726925e-06, 'epoch': 1.58} +2025-02-05 21:46:05 - ERROR - stderr - 53%|█████▎ | 11829/22434 [11:38:24<7:30:23, 2.55s/it] +2025-02-05 21:46:07 - ERROR - stderr - 53%|█████▎ | 11830/22434 [11:38:27<7:28:21, 2.54s/it] +2025-02-05 21:46:07 - ERROR - stderr - +2025-02-05 21:46:07 - ERROR - stderr - +2025-02-05 21:46:07 - INFO - stdout - {'loss': 0.6754, 'grad_norm': 1.2002170085906982, 'learning_rate': 9.601631374006455e-06, 'epoch': 1.58} +2025-02-05 21:46:07 - ERROR - stderr - 53%|█████▎ | 11830/22434 [11:38:27<7:28:21, 2.54s/it] +2025-02-05 21:46:10 - ERROR - stderr - 53%|█████▎ | 11831/22434 [11:38:29<7:29:37, 2.54s/it] +2025-02-05 21:46:10 - ERROR - stderr - +2025-02-05 21:46:10 - ERROR - stderr - +2025-02-05 21:46:10 - INFO - stdout - {'loss': 0.7261, 'grad_norm': 1.387992024421692, 'learning_rate': 9.6001887775896e-06, 'epoch': 1.58} +2025-02-05 21:46:10 - ERROR - stderr - 53%|█████▎ | 11831/22434 [11:38:30<7:29:37, 2.54s/it] +2025-02-05 21:46:12 - ERROR - stderr - 53%|█████▎ | 11832/22434 [11:38:32<7:26:15, 2.53s/it] +2025-02-05 21:46:12 - ERROR - stderr - +2025-02-05 21:46:12 - ERROR - stderr - +2025-02-05 21:46:12 - INFO - stdout - {'loss': 0.6779, 'grad_norm': 1.219621181488037, 'learning_rate': 9.598746189506423e-06, 'epoch': 1.58} +2025-02-05 21:46:12 - ERROR - stderr - 53%|█████▎ | 11832/22434 [11:38:32<7:26:15, 2.53s/it] +2025-02-05 21:46:15 - ERROR - stderr - 53%|█████▎ | 11833/22434 [11:38:34<7:22:54, 2.51s/it] +2025-02-05 21:46:15 - 
ERROR - stderr - +2025-02-05 21:46:15 - ERROR - stderr - +2025-02-05 21:46:15 - INFO - stdout - {'loss': 0.7076, 'grad_norm': 1.1249243021011353, 'learning_rate': 9.597303609787001e-06, 'epoch': 1.58} +2025-02-05 21:46:15 - ERROR - stderr - 53%|█████▎ | 11833/22434 [11:38:34<7:22:54, 2.51s/it] +2025-02-05 21:46:17 - ERROR - stderr - 53%|█████▎ | 11834/22434 [11:38:37<7:20:28, 2.49s/it] +2025-02-05 21:46:17 - ERROR - stderr - +2025-02-05 21:46:17 - ERROR - stderr - +2025-02-05 21:46:17 - INFO - stdout - {'loss': 0.8013, 'grad_norm': 1.2674144506454468, 'learning_rate': 9.595861038461399e-06, 'epoch': 1.58} +2025-02-05 21:46:17 - ERROR - stderr - 53%|█████▎ | 11834/22434 [11:38:37<7:20:28, 2.49s/it] +2025-02-05 21:46:20 - ERROR - stderr - 53%|█████▎ | 11835/22434 [11:38:39<7:20:32, 2.49s/it] +2025-02-05 21:46:20 - ERROR - stderr - +2025-02-05 21:46:20 - ERROR - stderr - +2025-02-05 21:46:20 - INFO - stdout - {'loss': 0.725, 'grad_norm': 1.14565110206604, 'learning_rate': 9.594418475559684e-06, 'epoch': 1.58} +2025-02-05 21:46:20 - ERROR - stderr - 53%|█████▎ | 11835/22434 [11:38:39<7:20:32, 2.49s/it] +2025-02-05 21:46:22 - ERROR - stderr - 53%|█████▎ | 11836/22434 [11:38:42<7:22:17, 2.50s/it] +2025-02-05 21:46:22 - ERROR - stderr - +2025-02-05 21:46:22 - ERROR - stderr - +2025-02-05 21:46:22 - INFO - stdout - {'loss': 0.6425, 'grad_norm': 1.1068092584609985, 'learning_rate': 9.592975921111933e-06, 'epoch': 1.58} +2025-02-05 21:46:22 - ERROR - stderr - 53%|█████▎ | 11836/22434 [11:38:42<7:22:17, 2.50s/it] +2025-02-05 21:46:25 - ERROR - stderr - 53%|█████▎ | 11837/22434 [11:38:44<7:25:53, 2.52s/it] +2025-02-05 21:46:25 - ERROR - stderr - +2025-02-05 21:46:25 - ERROR - stderr - +2025-02-05 21:46:25 - INFO - stdout - {'loss': 0.7067, 'grad_norm': 1.1704996824264526, 'learning_rate': 9.591533375148204e-06, 'epoch': 1.58} +2025-02-05 21:46:25 - ERROR - stderr - 53%|█████▎ | 11837/22434 [11:38:45<7:25:53, 2.52s/it] +2025-02-05 21:46:28 - ERROR - stderr - 53%|█████▎ | 
11838/22434 [11:38:47<7:44:28, 2.63s/it] +2025-02-05 21:46:28 - ERROR - stderr - +2025-02-05 21:46:28 - ERROR - stderr - +2025-02-05 21:46:28 - INFO - stdout - {'loss': 0.6823, 'grad_norm': 1.2061281204223633, 'learning_rate': 9.590090837698576e-06, 'epoch': 1.58} +2025-02-05 21:46:28 - ERROR - stderr - 53%|█████▎ | 11838/22434 [11:38:47<7:44:28, 2.63s/it] +2025-02-05 21:46:30 - ERROR - stderr - 53%|█████▎ | 11839/22434 [11:38:50<7:41:01, 2.61s/it] +2025-02-05 21:46:30 - ERROR - stderr - +2025-02-05 21:46:30 - ERROR - stderr - +2025-02-05 21:46:30 - INFO - stdout - {'loss': 0.6636, 'grad_norm': 1.1754319667816162, 'learning_rate': 9.588648308793111e-06, 'epoch': 1.58} +2025-02-05 21:46:30 - ERROR - stderr - 53%|█████▎ | 11839/22434 [11:38:50<7:41:01, 2.61s/it] +2025-02-05 21:46:33 - ERROR - stderr - 53%|█████▎ | 11840/22434 [11:38:52<7:34:43, 2.58s/it] +2025-02-05 21:46:33 - ERROR - stderr - +2025-02-05 21:46:33 - ERROR - stderr - +2025-02-05 21:46:33 - INFO - stdout - {'loss': 0.6767, 'grad_norm': 1.2038474082946777, 'learning_rate': 9.587205788461875e-06, 'epoch': 1.58} +2025-02-05 21:46:33 - ERROR - stderr - 53%|█████▎ | 11840/22434 [11:38:52<7:34:43, 2.58s/it] +2025-02-05 21:46:35 - ERROR - stderr - 53%|█████▎ | 11841/22434 [11:38:55<7:31:41, 2.56s/it] +2025-02-05 21:46:35 - ERROR - stderr - +2025-02-05 21:46:35 - ERROR - stderr - +2025-02-05 21:46:35 - INFO - stdout - {'loss': 0.7381, 'grad_norm': 1.3270734548568726, 'learning_rate': 9.585763276734942e-06, 'epoch': 1.58} +2025-02-05 21:46:35 - ERROR - stderr - 53%|█████▎ | 11841/22434 [11:38:55<7:31:41, 2.56s/it] +2025-02-05 21:46:38 - ERROR - stderr - 53%|█████▎ | 11842/22434 [11:38:57<7:30:14, 2.55s/it] +2025-02-05 21:46:38 - ERROR - stderr - +2025-02-05 21:46:38 - ERROR - stderr - +2025-02-05 21:46:38 - INFO - stdout - {'loss': 0.6709, 'grad_norm': 1.3576314449310303, 'learning_rate': 9.58432077364238e-06, 'epoch': 1.58} +2025-02-05 21:46:38 - ERROR - stderr - 53%|█████▎ | 11842/22434 [11:38:58<7:30:14, 
2.55s/it] +2025-02-05 21:46:40 - ERROR - stderr - 53%|█████▎ | 11843/22434 [11:39:00<7:29:01, 2.54s/it] +2025-02-05 21:46:40 - ERROR - stderr - +2025-02-05 21:46:40 - ERROR - stderr - +2025-02-05 21:46:40 - INFO - stdout - {'loss': 0.5968, 'grad_norm': 1.133558750152588, 'learning_rate': 9.582878279214248e-06, 'epoch': 1.58} +2025-02-05 21:46:40 - ERROR - stderr - 53%|█████▎ | 11843/22434 [11:39:00<7:29:01, 2.54s/it] +2025-02-05 21:46:43 - ERROR - stderr - 53%|█████▎ | 11844/22434 [11:39:03<7:27:51, 2.54s/it] +2025-02-05 21:46:43 - ERROR - stderr - +2025-02-05 21:46:43 - ERROR - stderr - +2025-02-05 21:46:43 - INFO - stdout - {'loss': 0.757, 'grad_norm': 1.2783139944076538, 'learning_rate': 9.581435793480623e-06, 'epoch': 1.58} +2025-02-05 21:46:43 - ERROR - stderr - 53%|█████▎ | 11844/22434 [11:39:03<7:27:51, 2.54s/it] +2025-02-05 21:46:45 - ERROR - stderr - 53%|█████▎ | 11845/22434 [11:39:05<7:29:35, 2.55s/it] +2025-02-05 21:46:45 - ERROR - stderr - +2025-02-05 21:46:45 - ERROR - stderr - +2025-02-05 21:46:45 - INFO - stdout - {'loss': 0.7328, 'grad_norm': 1.2356040477752686, 'learning_rate': 9.579993316471564e-06, 'epoch': 1.58} +2025-02-05 21:46:45 - ERROR - stderr - 53%|█████▎ | 11845/22434 [11:39:05<7:29:35, 2.55s/it] +2025-02-05 21:46:48 - ERROR - stderr - 53%|█████▎ | 11846/22434 [11:39:08<7:29:36, 2.55s/it] +2025-02-05 21:46:48 - ERROR - stderr - +2025-02-05 21:46:48 - ERROR - stderr - +2025-02-05 21:46:48 - INFO - stdout - {'loss': 0.8072, 'grad_norm': 1.4585750102996826, 'learning_rate': 9.578550848217147e-06, 'epoch': 1.58} +2025-02-05 21:46:48 - ERROR - stderr - 53%|█████▎ | 11846/22434 [11:39:08<7:29:36, 2.55s/it] +2025-02-05 21:46:50 - ERROR - stderr - 53%|█████▎ | 11847/22434 [11:39:10<7:26:54, 2.53s/it] +2025-02-05 21:46:50 - ERROR - stderr - +2025-02-05 21:46:50 - ERROR - stderr - +2025-02-05 21:46:50 - INFO - stdout - {'loss': 0.6233, 'grad_norm': 1.1954231262207031, 'learning_rate': 9.577108388747433e-06, 'epoch': 1.58} +2025-02-05 21:46:50 - 
ERROR - stderr - 53%|█████▎ | 11847/22434 [11:39:10<7:26:54, 2.53s/it] +2025-02-05 21:46:53 - ERROR - stderr - 53%|█████▎ | 11848/22434 [11:39:13<7:23:30, 2.51s/it] +2025-02-05 21:46:53 - ERROR - stderr - +2025-02-05 21:46:53 - ERROR - stderr - +2025-02-05 21:46:53 - INFO - stdout - {'loss': 0.759, 'grad_norm': 1.344062328338623, 'learning_rate': 9.57566593809249e-06, 'epoch': 1.58} +2025-02-05 21:46:53 - ERROR - stderr - 53%|█████▎ | 11848/22434 [11:39:13<7:23:30, 2.51s/it] +2025-02-05 21:46:55 - ERROR - stderr - 53%|█████▎ | 11849/22434 [11:39:15<7:20:34, 2.50s/it] +2025-02-05 21:46:55 - ERROR - stderr - +2025-02-05 21:46:55 - ERROR - stderr - +2025-02-05 21:46:55 - INFO - stdout - {'loss': 0.6498, 'grad_norm': 1.1410038471221924, 'learning_rate': 9.574223496282382e-06, 'epoch': 1.58} +2025-02-05 21:46:55 - ERROR - stderr - 53%|█████▎ | 11849/22434 [11:39:15<7:20:34, 2.50s/it] +2025-02-05 21:46:58 - ERROR - stderr - 53%|█████▎ | 11850/22434 [11:39:18<7:20:54, 2.50s/it] +2025-02-05 21:46:58 - ERROR - stderr - +2025-02-05 21:46:58 - ERROR - stderr - +2025-02-05 21:46:58 - INFO - stdout - {'loss': 0.6757, 'grad_norm': 1.1499236822128296, 'learning_rate': 9.572781063347184e-06, 'epoch': 1.58} +2025-02-05 21:46:58 - ERROR - stderr - 53%|█████▎ | 11850/22434 [11:39:18<7:20:54, 2.50s/it] +2025-02-05 21:47:00 - ERROR - stderr - 53%|█████▎ | 11851/22434 [11:39:20<7:29:48, 2.55s/it] +2025-02-05 21:47:01 - ERROR - stderr - +2025-02-05 21:47:01 - ERROR - stderr - +2025-02-05 21:47:01 - INFO - stdout - {'loss': 0.7472, 'grad_norm': 1.2958803176879883, 'learning_rate': 9.57133863931695e-06, 'epoch': 1.58} +2025-02-05 21:47:01 - ERROR - stderr - 53%|█████▎ | 11851/22434 [11:39:20<7:29:48, 2.55s/it] +2025-02-05 21:47:03 - ERROR - stderr - 53%|█████▎ | 11852/22434 [11:39:23<7:29:12, 2.55s/it] +2025-02-05 21:47:03 - ERROR - stderr - +2025-02-05 21:47:03 - ERROR - stderr - +2025-02-05 21:47:03 - INFO - stdout - {'loss': 0.7284, 'grad_norm': 1.289383053779602, 'learning_rate': 
9.569896224221754e-06, 'epoch': 1.58} +2025-02-05 21:47:03 - ERROR - stderr - 53%|█████▎ | 11852/22434 [11:39:23<7:29:12, 2.55s/it] +2025-02-05 21:47:05 - ERROR - stderr - 53%|█████▎ | 11853/22434 [11:39:25<7:25:30, 2.53s/it] +2025-02-05 21:47:06 - ERROR - stderr - +2025-02-05 21:47:06 - ERROR - stderr - +2025-02-05 21:47:06 - INFO - stdout - {'loss': 0.6135, 'grad_norm': 1.1081980466842651, 'learning_rate': 9.568453818091659e-06, 'epoch': 1.59} +2025-02-05 21:47:06 - ERROR - stderr - 53%|█████▎ | 11853/22434 [11:39:25<7:25:30, 2.53s/it] +2025-02-05 21:47:08 - ERROR - stderr - 53%|█████▎ | 11854/22434 [11:39:28<7:26:23, 2.53s/it] +2025-02-05 21:47:08 - ERROR - stderr - +2025-02-05 21:47:08 - ERROR - stderr - +2025-02-05 21:47:08 - INFO - stdout - {'loss': 0.671, 'grad_norm': 1.3121752738952637, 'learning_rate': 9.567011420956732e-06, 'epoch': 1.59} +2025-02-05 21:47:08 - ERROR - stderr - 53%|█████▎ | 11854/22434 [11:39:28<7:26:23, 2.53s/it] +2025-02-05 21:47:10 - ERROR - stderr - 53%|█████▎ | 11855/22434 [11:39:30<7:22:43, 2.51s/it] +2025-02-05 21:47:11 - ERROR - stderr - +2025-02-05 21:47:11 - ERROR - stderr - +2025-02-05 21:47:11 - INFO - stdout - {'loss': 0.691, 'grad_norm': 1.4571441411972046, 'learning_rate': 9.565569032847037e-06, 'epoch': 1.59} +2025-02-05 21:47:11 - ERROR - stderr - 53%|█████▎ | 11855/22434 [11:39:30<7:22:43, 2.51s/it] +2025-02-05 21:47:13 - ERROR - stderr - 53%|█████▎ | 11856/22434 [11:39:33<7:37:13, 2.59s/it] +2025-02-05 21:47:13 - ERROR - stderr - +2025-02-05 21:47:13 - ERROR - stderr - +2025-02-05 21:47:13 - INFO - stdout - {'loss': 0.7111, 'grad_norm': 1.3047864437103271, 'learning_rate': 9.564126653792638e-06, 'epoch': 1.59} +2025-02-05 21:47:13 - ERROR - stderr - 53%|█████▎ | 11856/22434 [11:39:33<7:37:13, 2.59s/it] +2025-02-05 21:47:16 - ERROR - stderr - 53%|█████▎ | 11857/22434 [11:39:36<7:32:56, 2.57s/it] +2025-02-05 21:47:16 - ERROR - stderr - +2025-02-05 21:47:16 - ERROR - stderr - +2025-02-05 21:47:16 - INFO - stdout - {'loss': 
0.5999, 'grad_norm': 1.1097774505615234, 'learning_rate': 9.562684283823607e-06, 'epoch': 1.59} +2025-02-05 21:47:16 - ERROR - stderr - 53%|█████▎ | 11857/22434 [11:39:36<7:32:56, 2.57s/it] +2025-02-05 21:47:18 - ERROR - stderr - 53%|█████▎ | 11858/22434 [11:39:38<7:33:13, 2.57s/it] +2025-02-05 21:47:18 - ERROR - stderr - +2025-02-05 21:47:18 - ERROR - stderr - +2025-02-05 21:47:18 - INFO - stdout - {'loss': 0.6871, 'grad_norm': 1.1602849960327148, 'learning_rate': 9.561241922970001e-06, 'epoch': 1.59} +2025-02-05 21:47:18 - ERROR - stderr - 53%|█████▎ | 11858/22434 [11:39:38<7:33:13, 2.57s/it] +2025-02-05 21:47:21 - ERROR - stderr - 53%|█████▎ | 11859/22434 [11:39:41<7:35:01, 2.58s/it] +2025-02-05 21:47:21 - ERROR - stderr - +2025-02-05 21:47:21 - ERROR - stderr - +2025-02-05 21:47:21 - INFO - stdout - {'loss': 0.7885, 'grad_norm': 1.2042287588119507, 'learning_rate': 9.559799571261885e-06, 'epoch': 1.59} +2025-02-05 21:47:21 - ERROR - stderr - 53%|█████▎ | 11859/22434 [11:39:41<7:35:01, 2.58s/it] +2025-02-05 21:47:24 - ERROR - stderr - 53%|█████▎ | 11860/22434 [11:39:44<7:50:04, 2.67s/it] +2025-02-05 21:47:24 - ERROR - stderr - +2025-02-05 21:47:24 - ERROR - stderr - +2025-02-05 21:47:24 - INFO - stdout - {'loss': 0.7059, 'grad_norm': 1.1908502578735352, 'learning_rate': 9.558357228729329e-06, 'epoch': 1.59} +2025-02-05 21:47:24 - ERROR - stderr - 53%|█████▎ | 11860/22434 [11:39:44<7:50:04, 2.67s/it] +2025-02-05 21:47:26 - ERROR - stderr - 53%|█████▎ | 11861/22434 [11:39:46<7:44:10, 2.63s/it] +2025-02-05 21:47:26 - ERROR - stderr - +2025-02-05 21:47:26 - ERROR - stderr - +2025-02-05 21:47:26 - INFO - stdout - {'loss': 0.7181, 'grad_norm': 1.2264348268508911, 'learning_rate': 9.556914895402391e-06, 'epoch': 1.59} +2025-02-05 21:47:26 - ERROR - stderr - 53%|█████▎ | 11861/22434 [11:39:46<7:44:10, 2.63s/it] +2025-02-05 21:47:29 - ERROR - stderr - 53%|█████▎ | 11862/22434 [11:39:49<7:38:44, 2.60s/it] +2025-02-05 21:47:29 - ERROR - stderr - +2025-02-05 21:47:29 - 
ERROR - stderr - +2025-02-05 21:47:29 - INFO - stdout - {'loss': 0.6575, 'grad_norm': 1.1422291994094849, 'learning_rate': 9.55547257131114e-06, 'epoch': 1.59} +2025-02-05 21:47:29 - ERROR - stderr - 53%|█████▎ | 11862/22434 [11:39:49<7:38:44, 2.60s/it] +2025-02-05 21:47:31 - ERROR - stderr - 53%|█████▎ | 11863/22434 [11:39:51<7:35:59, 2.59s/it] +2025-02-05 21:47:32 - ERROR - stderr - +2025-02-05 21:47:32 - ERROR - stderr - +2025-02-05 21:47:32 - INFO - stdout - {'loss': 0.6541, 'grad_norm': 1.1978416442871094, 'learning_rate': 9.554030256485638e-06, 'epoch': 1.59} +2025-02-05 21:47:32 - ERROR - stderr - 53%|█████▎ | 11863/22434 [11:39:51<7:35:59, 2.59s/it] +2025-02-05 21:47:34 - ERROR - stderr - 53%|█████▎ | 11864/22434 [11:39:54<7:36:14, 2.59s/it] +2025-02-05 21:47:34 - ERROR - stderr - +2025-02-05 21:47:34 - ERROR - stderr - +2025-02-05 21:47:34 - INFO - stdout - {'loss': 0.7455, 'grad_norm': 1.255835771560669, 'learning_rate': 9.552587950955946e-06, 'epoch': 1.59} +2025-02-05 21:47:34 - ERROR - stderr - 53%|█████▎ | 11864/22434 [11:39:54<7:36:14, 2.59s/it] +2025-02-05 21:47:37 - ERROR - stderr - 53%|█████▎ | 11865/22434 [11:39:56<7:32:50, 2.57s/it] +2025-02-05 21:47:37 - ERROR - stderr - +2025-02-05 21:47:37 - ERROR - stderr - +2025-02-05 21:47:37 - INFO - stdout - {'loss': 0.7129, 'grad_norm': 1.1623185873031616, 'learning_rate': 9.551145654752134e-06, 'epoch': 1.59} +2025-02-05 21:47:37 - ERROR - stderr - 53%|█████▎ | 11865/22434 [11:39:56<7:32:50, 2.57s/it] +2025-02-05 21:47:39 - ERROR - stderr - 53%|█████▎ | 11866/22434 [11:39:59<7:29:33, 2.55s/it] +2025-02-05 21:47:39 - ERROR - stderr - +2025-02-05 21:47:39 - ERROR - stderr - +2025-02-05 21:47:39 - INFO - stdout - {'loss': 0.7511, 'grad_norm': 1.2972922325134277, 'learning_rate': 9.549703367904259e-06, 'epoch': 1.59} +2025-02-05 21:47:39 - ERROR - stderr - 53%|█████▎ | 11866/22434 [11:39:59<7:29:33, 2.55s/it] +2025-02-05 21:47:42 - ERROR - stderr - 53%|█████▎ | 11867/22434 [11:40:01<7:26:59, 2.54s/it] 
+2025-02-05 21:47:42 - ERROR - stderr - +2025-02-05 21:47:42 - ERROR - stderr - +2025-02-05 21:47:42 - INFO - stdout - {'loss': 0.6252, 'grad_norm': 1.134050965309143, 'learning_rate': 9.548261090442386e-06, 'epoch': 1.59} +2025-02-05 21:47:42 - ERROR - stderr - 53%|█████▎ | 11867/22434 [11:40:01<7:26:59, 2.54s/it] +2025-02-05 21:47:44 - ERROR - stderr - 53%|█████▎ | 11868/22434 [11:40:04<7:24:31, 2.52s/it] +2025-02-05 21:47:44 - ERROR - stderr - +2025-02-05 21:47:44 - ERROR - stderr - +2025-02-05 21:47:44 - INFO - stdout - {'loss': 0.6764, 'grad_norm': 1.1868950128555298, 'learning_rate': 9.54681882239658e-06, 'epoch': 1.59} +2025-02-05 21:47:44 - ERROR - stderr - 53%|█████▎ | 11868/22434 [11:40:04<7:24:31, 2.52s/it] +2025-02-05 21:47:47 - ERROR - stderr - 53%|█████▎ | 11869/22434 [11:40:07<7:42:11, 2.62s/it] +2025-02-05 21:47:47 - ERROR - stderr - +2025-02-05 21:47:47 - ERROR - stderr - +2025-02-05 21:47:47 - INFO - stdout - {'loss': 0.6473, 'grad_norm': 1.1928505897521973, 'learning_rate': 9.545376563796898e-06, 'epoch': 1.59} +2025-02-05 21:47:47 - ERROR - stderr - 53%|█████▎ | 11869/22434 [11:40:07<7:42:11, 2.62s/it] +2025-02-05 21:47:49 - ERROR - stderr - 53%|█████▎ | 11870/22434 [11:40:09<7:36:45, 2.59s/it] +2025-02-05 21:47:50 - ERROR - stderr - +2025-02-05 21:47:50 - ERROR - stderr - +2025-02-05 21:47:50 - INFO - stdout - {'loss': 0.5765, 'grad_norm': 1.157888650894165, 'learning_rate': 9.54393431467341e-06, 'epoch': 1.59} +2025-02-05 21:47:50 - ERROR - stderr - 53%|█████▎ | 11870/22434 [11:40:09<7:36:45, 2.59s/it] +2025-02-05 21:47:52 - ERROR - stderr - 53%|█████▎ | 11871/22434 [11:40:12<7:34:31, 2.58s/it] +2025-02-05 21:47:52 - ERROR - stderr - +2025-02-05 21:47:52 - ERROR - stderr - +2025-02-05 21:47:52 - INFO - stdout - {'loss': 0.6869, 'grad_norm': 1.1943787336349487, 'learning_rate': 9.542492075056178e-06, 'epoch': 1.59} +2025-02-05 21:47:52 - ERROR - stderr - 53%|█████▎ | 11871/22434 [11:40:12<7:34:31, 2.58s/it] +2025-02-05 21:47:55 - ERROR - stderr 
- 53%|█████▎ | 11872/22434 [11:40:14<7:35:21, 2.59s/it] +2025-02-05 21:47:55 - ERROR - stderr - +2025-02-05 21:47:55 - ERROR - stderr - +2025-02-05 21:47:55 - INFO - stdout - {'loss': 0.7536, 'grad_norm': 1.3316676616668701, 'learning_rate': 9.541049844975255e-06, 'epoch': 1.59} +2025-02-05 21:47:55 - ERROR - stderr - 53%|█████▎ | 11872/22434 [11:40:14<7:35:21, 2.59s/it] +2025-02-05 21:47:57 - ERROR - stderr - 53%|█████▎ | 11873/22434 [11:40:17<7:28:58, 2.55s/it] +2025-02-05 21:47:57 - ERROR - stderr - +2025-02-05 21:47:57 - ERROR - stderr - +2025-02-05 21:47:57 - INFO - stdout - {'loss': 0.7373, 'grad_norm': 1.2263637781143188, 'learning_rate': 9.53960762446071e-06, 'epoch': 1.59} +2025-02-05 21:47:57 - ERROR - stderr - 53%|█████▎ | 11873/22434 [11:40:17<7:28:58, 2.55s/it] +2025-02-05 21:48:00 - ERROR - stderr - 53%|█████▎ | 11874/22434 [11:40:20<7:48:54, 2.66s/it] +2025-02-05 21:48:00 - ERROR - stderr - +2025-02-05 21:48:00 - ERROR - stderr - +2025-02-05 21:48:00 - INFO - stdout - {'loss': 0.7069, 'grad_norm': 1.2653452157974243, 'learning_rate': 9.538165413542607e-06, 'epoch': 1.59} +2025-02-05 21:48:00 - ERROR - stderr - 53%|█████▎ | 11874/22434 [11:40:20<7:48:54, 2.66s/it] +2025-02-05 21:48:03 - ERROR - stderr - 53%|█████▎ | 11875/22434 [11:40:22<7:44:36, 2.64s/it] +2025-02-05 21:48:03 - ERROR - stderr - +2025-02-05 21:48:03 - ERROR - stderr - +2025-02-05 21:48:03 - INFO - stdout - {'loss': 0.6453, 'grad_norm': 1.1344497203826904, 'learning_rate': 9.536723212251e-06, 'epoch': 1.59} +2025-02-05 21:48:03 - ERROR - stderr - 53%|█████▎ | 11875/22434 [11:40:22<7:44:36, 2.64s/it] +2025-02-05 21:48:05 - ERROR - stderr - 53%|█████▎ | 11876/22434 [11:40:25<7:38:55, 2.61s/it] +2025-02-05 21:48:05 - ERROR - stderr - +2025-02-05 21:48:05 - ERROR - stderr - +2025-02-05 21:48:05 - INFO - stdout - {'loss': 0.6397, 'grad_norm': 1.2055330276489258, 'learning_rate': 9.535281020615957e-06, 'epoch': 1.59} +2025-02-05 21:48:05 - ERROR - stderr - 53%|█████▎ | 11876/22434 
[11:40:25<7:38:55, 2.61s/it] +2025-02-05 21:48:08 - ERROR - stderr - 53%|█████▎ | 11877/22434 [11:40:27<7:34:57, 2.59s/it] +2025-02-05 21:48:08 - ERROR - stderr - +2025-02-05 21:48:08 - ERROR - stderr - +2025-02-05 21:48:08 - INFO - stdout - {'loss': 0.7809, 'grad_norm': 1.3709781169891357, 'learning_rate': 9.533838838667534e-06, 'epoch': 1.59} +2025-02-05 21:48:08 - ERROR - stderr - 53%|█████▎ | 11877/22434 [11:40:28<7:34:57, 2.59s/it] +2025-02-05 21:48:10 - ERROR - stderr - 53%|█████▎ | 11878/22434 [11:40:30<7:35:13, 2.59s/it] +2025-02-05 21:48:10 - ERROR - stderr - +2025-02-05 21:48:10 - ERROR - stderr - +2025-02-05 21:48:10 - INFO - stdout - {'loss': 0.7613, 'grad_norm': 1.2785402536392212, 'learning_rate': 9.532396666435797e-06, 'epoch': 1.59} +2025-02-05 21:48:10 - ERROR - stderr - 53%|█████▎ | 11878/22434 [11:40:30<7:35:13, 2.59s/it] +2025-02-05 21:48:13 - ERROR - stderr - 53%|█████▎ | 11879/22434 [11:40:33<7:33:35, 2.58s/it] +2025-02-05 21:48:13 - ERROR - stderr - +2025-02-05 21:48:13 - ERROR - stderr - +2025-02-05 21:48:13 - INFO - stdout - {'loss': 0.743, 'grad_norm': 1.275596022605896, 'learning_rate': 9.530954503950802e-06, 'epoch': 1.59} +2025-02-05 21:48:13 - ERROR - stderr - 53%|█████▎ | 11879/22434 [11:40:33<7:33:35, 2.58s/it] +2025-02-05 21:48:16 - ERROR - stderr - 53%|█████▎ | 11880/22434 [11:40:35<7:42:10, 2.63s/it] +2025-02-05 21:48:16 - ERROR - stderr - +2025-02-05 21:48:16 - ERROR - stderr - +2025-02-05 21:48:16 - INFO - stdout - {'loss': 0.7882, 'grad_norm': 1.3141602277755737, 'learning_rate': 9.529512351242612e-06, 'epoch': 1.59} +2025-02-05 21:48:16 - ERROR - stderr - 53%|█████▎ | 11880/22434 [11:40:35<7:42:10, 2.63s/it] +2025-02-05 21:48:18 - ERROR - stderr - 53%|█████▎ | 11881/22434 [11:40:38<7:34:41, 2.59s/it] +2025-02-05 21:48:18 - ERROR - stderr - +2025-02-05 21:48:18 - ERROR - stderr - +2025-02-05 21:48:18 - INFO - stdout - {'loss': 0.7252, 'grad_norm': 1.1664420366287231, 'learning_rate': 9.528070208341286e-06, 'epoch': 1.59} 
+2025-02-05 21:48:18 - ERROR - stderr - 53%|█████▎ | 11881/22434 [11:40:38<7:34:41, 2.59s/it] +2025-02-05 21:48:21 - ERROR - stderr - 53%|█████▎ | 11882/22434 [11:40:40<7:30:05, 2.56s/it] +2025-02-05 21:48:21 - ERROR - stderr - +2025-02-05 21:48:21 - ERROR - stderr - +2025-02-05 21:48:21 - INFO - stdout - {'loss': 0.7462, 'grad_norm': 1.2321202754974365, 'learning_rate': 9.52662807527689e-06, 'epoch': 1.59} +2025-02-05 21:48:21 - ERROR - stderr - 53%|█████▎ | 11882/22434 [11:40:40<7:30:05, 2.56s/it] +2025-02-05 21:48:23 - ERROR - stderr - 53%|█████▎ | 11883/22434 [11:40:43<7:24:26, 2.53s/it] +2025-02-05 21:48:23 - ERROR - stderr - +2025-02-05 21:48:23 - ERROR - stderr - +2025-02-05 21:48:23 - INFO - stdout - {'loss': 0.7213, 'grad_norm': 1.1685811281204224, 'learning_rate': 9.525185952079472e-06, 'epoch': 1.59} +2025-02-05 21:48:23 - ERROR - stderr - 53%|█████▎ | 11883/22434 [11:40:43<7:24:26, 2.53s/it] +2025-02-05 21:48:25 - ERROR - stderr - 53%|█████▎ | 11884/22434 [11:40:45<7:20:03, 2.50s/it] +2025-02-05 21:48:26 - ERROR - stderr - +2025-02-05 21:48:26 - ERROR - stderr - +2025-02-05 21:48:26 - INFO - stdout - {'loss': 0.6913, 'grad_norm': 1.1045488119125366, 'learning_rate': 9.523743838779103e-06, 'epoch': 1.59} +2025-02-05 21:48:26 - ERROR - stderr - 53%|█████▎ | 11884/22434 [11:40:45<7:20:03, 2.50s/it] +2025-02-05 21:48:28 - ERROR - stderr - 53%|█████▎ | 11885/22434 [11:40:48<7:22:20, 2.52s/it] +2025-02-05 21:48:28 - ERROR - stderr - +2025-02-05 21:48:28 - ERROR - stderr - +2025-02-05 21:48:28 - INFO - stdout - {'loss': 0.6423, 'grad_norm': 1.1653701066970825, 'learning_rate': 9.522301735405834e-06, 'epoch': 1.59} +2025-02-05 21:48:28 - ERROR - stderr - 53%|█████▎ | 11885/22434 [11:40:48<7:22:20, 2.52s/it] +2025-02-05 21:48:31 - ERROR - stderr - 53%|█████▎ | 11886/22434 [11:40:50<7:21:47, 2.51s/it] +2025-02-05 21:48:31 - ERROR - stderr - +2025-02-05 21:48:31 - ERROR - stderr - +2025-02-05 21:48:31 - INFO - stdout - {'loss': 0.6666, 'grad_norm': 
1.221944808959961, 'learning_rate': 9.520859641989729e-06, 'epoch': 1.59} +2025-02-05 21:48:31 - ERROR - stderr - 53%|█████▎ | 11886/22434 [11:40:50<7:21:47, 2.51s/it] +2025-02-05 21:48:33 - ERROR - stderr - 53%|█████▎ | 11887/22434 [11:40:53<7:19:26, 2.50s/it] +2025-02-05 21:48:33 - ERROR - stderr - +2025-02-05 21:48:33 - ERROR - stderr - +2025-02-05 21:48:33 - INFO - stdout - {'loss': 0.7222, 'grad_norm': 1.2114801406860352, 'learning_rate': 9.519417558560851e-06, 'epoch': 1.59} +2025-02-05 21:48:33 - ERROR - stderr - 53%|█████▎ | 11887/22434 [11:40:53<7:19:26, 2.50s/it] +2025-02-05 21:48:35 - ERROR - stderr - 53%|█████▎ | 11888/22434 [11:40:55<7:18:15, 2.49s/it] +2025-02-05 21:48:36 - ERROR - stderr - +2025-02-05 21:48:36 - ERROR - stderr - +2025-02-05 21:48:36 - INFO - stdout - {'loss': 0.6986, 'grad_norm': 1.199378490447998, 'learning_rate': 9.517975485149248e-06, 'epoch': 1.59} +2025-02-05 21:48:36 - ERROR - stderr - 53%|█████▎ | 11888/22434 [11:40:55<7:18:15, 2.49s/it] +2025-02-05 21:48:38 - ERROR - stderr - 53%|█████▎ | 11889/22434 [11:40:58<7:34:13, 2.58s/it] +2025-02-05 21:48:38 - ERROR - stderr - +2025-02-05 21:48:38 - ERROR - stderr - +2025-02-05 21:48:38 - INFO - stdout - {'loss': 0.7565, 'grad_norm': 1.275043249130249, 'learning_rate': 9.516533421784989e-06, 'epoch': 1.59} +2025-02-05 21:48:38 - ERROR - stderr - 53%|█████▎ | 11889/22434 [11:40:58<7:34:13, 2.58s/it] +2025-02-05 21:48:41 - ERROR - stderr - 53%|█████▎ | 11890/22434 [11:41:01<7:33:38, 2.58s/it] +2025-02-05 21:48:41 - ERROR - stderr - +2025-02-05 21:48:41 - ERROR - stderr - +2025-02-05 21:48:41 - INFO - stdout - {'loss': 0.6531, 'grad_norm': 1.1312413215637207, 'learning_rate': 9.51509136849813e-06, 'epoch': 1.59} +2025-02-05 21:48:41 - ERROR - stderr - 53%|█████▎ | 11890/22434 [11:41:01<7:33:38, 2.58s/it] +2025-02-05 21:48:43 - ERROR - stderr - 53%|█████▎ | 11891/22434 [11:41:03<7:29:03, 2.56s/it] +2025-02-05 21:48:43 - ERROR - stderr - +2025-02-05 21:48:43 - ERROR - stderr - +2025-02-05 
21:48:43 - INFO - stdout - {'loss': 0.6163, 'grad_norm': 1.142961859703064, 'learning_rate': 9.513649325318722e-06, 'epoch': 1.59} +2025-02-05 21:48:43 - ERROR - stderr - 53%|█████▎ | 11891/22434 [11:41:03<7:29:03, 2.56s/it] +2025-02-05 21:48:46 - ERROR - stderr - 53%|█████▎ | 11892/22434 [11:41:06<7:29:29, 2.56s/it] +2025-02-05 21:48:46 - ERROR - stderr - +2025-02-05 21:48:46 - ERROR - stderr - +2025-02-05 21:48:46 - INFO - stdout - {'loss': 0.6958, 'grad_norm': 1.215346097946167, 'learning_rate': 9.512207292276829e-06, 'epoch': 1.59} +2025-02-05 21:48:46 - ERROR - stderr - 53%|█████▎ | 11892/22434 [11:41:06<7:29:29, 2.56s/it] +2025-02-05 21:48:48 - ERROR - stderr - 53%|█████▎ | 11893/22434 [11:41:08<7:25:09, 2.53s/it] +2025-02-05 21:48:48 - ERROR - stderr - +2025-02-05 21:48:48 - ERROR - stderr - +2025-02-05 21:48:48 - INFO - stdout - {'loss': 0.7114, 'grad_norm': 1.3247084617614746, 'learning_rate': 9.51076526940251e-06, 'epoch': 1.59} +2025-02-05 21:48:48 - ERROR - stderr - 53%|█████▎ | 11893/22434 [11:41:08<7:25:09, 2.53s/it] +2025-02-05 21:48:51 - ERROR - stderr - 53%|█████▎ | 11894/22434 [11:41:11<7:21:16, 2.51s/it] +2025-02-05 21:48:51 - ERROR - stderr - +2025-02-05 21:48:51 - ERROR - stderr - +2025-02-05 21:48:51 - INFO - stdout - {'loss': 0.7134, 'grad_norm': 1.229372262954712, 'learning_rate': 9.50932325672582e-06, 'epoch': 1.59} +2025-02-05 21:48:51 - ERROR - stderr - 53%|█████▎ | 11894/22434 [11:41:11<7:21:16, 2.51s/it] +2025-02-05 21:48:53 - ERROR - stderr - 53%|█████▎ | 11895/22434 [11:41:13<7:23:59, 2.53s/it] +2025-02-05 21:48:53 - ERROR - stderr - +2025-02-05 21:48:53 - ERROR - stderr - +2025-02-05 21:48:53 - INFO - stdout - {'loss': 0.6383, 'grad_norm': 1.0219619274139404, 'learning_rate': 9.507881254276821e-06, 'epoch': 1.59} +2025-02-05 21:48:53 - ERROR - stderr - 53%|█████▎ | 11895/22434 [11:41:13<7:23:59, 2.53s/it] +2025-02-05 21:48:56 - ERROR - stderr - 53%|█████▎ | 11896/22434 [11:41:16<7:23:51, 2.53s/it] +2025-02-05 21:48:56 - ERROR - 
stderr - +2025-02-05 21:48:56 - ERROR - stderr - +2025-02-05 21:48:56 - INFO - stdout - {'loss': 0.8406, 'grad_norm': 1.3593305349349976, 'learning_rate': 9.506439262085561e-06, 'epoch': 1.59} +2025-02-05 21:48:56 - ERROR - stderr - 53%|█████▎ | 11896/22434 [11:41:16<7:23:51, 2.53s/it] +2025-02-05 21:48:58 - ERROR - stderr - 53%|█████▎ | 11897/22434 [11:41:18<7:24:05, 2.53s/it] +2025-02-05 21:48:59 - ERROR - stderr - +2025-02-05 21:48:59 - ERROR - stderr - +2025-02-05 21:48:59 - INFO - stdout - {'loss': 0.7048, 'grad_norm': 1.1417534351348877, 'learning_rate': 9.504997280182105e-06, 'epoch': 1.59} +2025-02-05 21:48:59 - ERROR - stderr - 53%|█████▎ | 11897/22434 [11:41:18<7:24:05, 2.53s/it] +2025-02-05 21:49:01 - ERROR - stderr - 53%|█████▎ | 11898/22434 [11:41:21<7:19:46, 2.50s/it] +2025-02-05 21:49:01 - ERROR - stderr - +2025-02-05 21:49:01 - ERROR - stderr - +2025-02-05 21:49:01 - INFO - stdout - {'loss': 0.6521, 'grad_norm': 1.275374412536621, 'learning_rate': 9.503555308596505e-06, 'epoch': 1.59} +2025-02-05 21:49:01 - ERROR - stderr - 53%|█████▎ | 11898/22434 [11:41:21<7:19:46, 2.50s/it] +2025-02-05 21:49:03 - ERROR - stderr - 53%|█████▎ | 11899/22434 [11:41:23<7:22:15, 2.52s/it] +2025-02-05 21:49:04 - ERROR - stderr - +2025-02-05 21:49:04 - ERROR - stderr - +2025-02-05 21:49:04 - INFO - stdout - {'loss': 0.7224, 'grad_norm': 1.296343445777893, 'learning_rate': 9.502113347358824e-06, 'epoch': 1.59} +2025-02-05 21:49:04 - ERROR - stderr - 53%|█████▎ | 11899/22434 [11:41:23<7:22:15, 2.52s/it] +2025-02-05 21:49:06 - ERROR - stderr - 53%|█████▎ | 11900/22434 [11:41:26<7:22:42, 2.52s/it] +2025-02-05 21:49:06 - ERROR - stderr - +2025-02-05 21:49:06 - ERROR - stderr - +2025-02-05 21:49:06 - INFO - stdout - {'loss': 0.6958, 'grad_norm': 1.2174605131149292, 'learning_rate': 9.50067139649911e-06, 'epoch': 1.59} +2025-02-05 21:49:06 - ERROR - stderr - 53%|█████▎ | 11900/22434 [11:41:26<7:22:42, 2.52s/it] +2025-02-05 21:49:08 - ERROR - stderr - 53%|█████▎ | 11901/22434 
[11:41:28<7:21:53, 2.52s/it] +2025-02-05 21:49:09 - ERROR - stderr - +2025-02-05 21:49:09 - ERROR - stderr - +2025-02-05 21:49:09 - INFO - stdout - {'loss': 0.7636, 'grad_norm': 1.2400259971618652, 'learning_rate': 9.499229456047423e-06, 'epoch': 1.59} +2025-02-05 21:49:09 - ERROR - stderr - 53%|█████▎ | 11901/22434 [11:41:28<7:21:53, 2.52s/it] +2025-02-05 21:49:11 - ERROR - stderr - 53%|█████▎ | 11902/22434 [11:41:31<7:21:03, 2.51s/it] +2025-02-05 21:49:11 - ERROR - stderr - +2025-02-05 21:49:11 - ERROR - stderr - +2025-02-05 21:49:11 - INFO - stdout - {'loss': 0.7113, 'grad_norm': 1.2371495962142944, 'learning_rate': 9.49778752603382e-06, 'epoch': 1.59} +2025-02-05 21:49:11 - ERROR - stderr - 53%|█████▎ | 11902/22434 [11:41:31<7:21:03, 2.51s/it] +2025-02-05 21:49:14 - ERROR - stderr - 53%|█████▎ | 11903/22434 [11:41:33<7:21:14, 2.51s/it] +2025-02-05 21:49:14 - ERROR - stderr - +2025-02-05 21:49:14 - ERROR - stderr - +2025-02-05 21:49:14 - INFO - stdout - {'loss': 0.7089, 'grad_norm': 1.1945858001708984, 'learning_rate': 9.496345606488357e-06, 'epoch': 1.59} +2025-02-05 21:49:14 - ERROR - stderr - 53%|█████▎ | 11903/22434 [11:41:33<7:21:14, 2.51s/it] +2025-02-05 21:49:16 - ERROR - stderr - 53%|█████▎ | 11904/22434 [11:41:36<7:22:26, 2.52s/it] +2025-02-05 21:49:16 - ERROR - stderr - +2025-02-05 21:49:16 - ERROR - stderr - +2025-02-05 21:49:16 - INFO - stdout - {'loss': 0.7043, 'grad_norm': 1.2199798822402954, 'learning_rate': 9.494903697441084e-06, 'epoch': 1.59} +2025-02-05 21:49:16 - ERROR - stderr - 53%|█████▎ | 11904/22434 [11:41:36<7:22:26, 2.52s/it] +2025-02-05 21:49:19 - ERROR - stderr - 53%|█████▎ | 11905/22434 [11:41:38<7:20:02, 2.51s/it] +2025-02-05 21:49:19 - ERROR - stderr - +2025-02-05 21:49:19 - ERROR - stderr - +2025-02-05 21:49:19 - INFO - stdout - {'loss': 0.7226, 'grad_norm': 1.2072488069534302, 'learning_rate': 9.493461798922062e-06, 'epoch': 1.59} +2025-02-05 21:49:19 - ERROR - stderr - 53%|█████▎ | 11905/22434 [11:41:38<7:20:02, 2.51s/it] 
+2025-02-05 21:49:21 - ERROR - stderr - 53%|█████▎ | 11906/22434 [11:41:41<7:18:59, 2.50s/it] +2025-02-05 21:49:21 - ERROR - stderr - +2025-02-05 21:49:21 - ERROR - stderr - +2025-02-05 21:49:21 - INFO - stdout - {'loss': 0.655, 'grad_norm': 1.2400561571121216, 'learning_rate': 9.492019910961345e-06, 'epoch': 1.59} +2025-02-05 21:49:21 - ERROR - stderr - 53%|█████▎ | 11906/22434 [11:41:41<7:18:59, 2.50s/it] +2025-02-05 21:49:23 - ERROR - stderr - 53%|█████▎ | 11907/22434 [11:41:43<7:17:10, 2.49s/it] +2025-02-05 21:49:24 - ERROR - stderr - +2025-02-05 21:49:24 - ERROR - stderr - +2025-02-05 21:49:24 - INFO - stdout - {'loss': 0.7283, 'grad_norm': 1.304604411125183, 'learning_rate': 9.490578033588985e-06, 'epoch': 1.59} +2025-02-05 21:49:24 - ERROR - stderr - 53%|█████▎ | 11907/22434 [11:41:43<7:17:10, 2.49s/it] +2025-02-05 21:49:26 - ERROR - stderr - 53%|█████▎ | 11908/22434 [11:41:46<7:14:49, 2.48s/it] +2025-02-05 21:49:26 - ERROR - stderr - +2025-02-05 21:49:26 - ERROR - stderr - +2025-02-05 21:49:26 - INFO - stdout - {'loss': 0.7029, 'grad_norm': 1.1369304656982422, 'learning_rate': 9.489136166835042e-06, 'epoch': 1.59} +2025-02-05 21:49:26 - ERROR - stderr - 53%|█████▎ | 11908/22434 [11:41:46<7:14:49, 2.48s/it] +2025-02-05 21:49:28 - ERROR - stderr - 53%|█████▎ | 11909/22434 [11:41:48<7:15:51, 2.48s/it] +2025-02-05 21:49:28 - ERROR - stderr - +2025-02-05 21:49:28 - ERROR - stderr - +2025-02-05 21:49:28 - INFO - stdout - {'loss': 0.6797, 'grad_norm': 1.1227937936782837, 'learning_rate': 9.487694310729562e-06, 'epoch': 1.59} +2025-02-05 21:49:28 - ERROR - stderr - 53%|█████▎ | 11909/22434 [11:41:48<7:15:51, 2.48s/it] +2025-02-05 21:49:31 - ERROR - stderr - 53%|█████▎ | 11910/22434 [11:41:51<7:16:49, 2.49s/it] +2025-02-05 21:49:31 - ERROR - stderr - +2025-02-05 21:49:31 - ERROR - stderr - +2025-02-05 21:49:31 - INFO - stdout - {'loss': 0.6856, 'grad_norm': 1.233720302581787, 'learning_rate': 9.486252465302608e-06, 'epoch': 1.59} +2025-02-05 21:49:31 - ERROR - 
stderr - 53%|█████▎ | 11910/22434 [11:41:51<7:16:49, 2.49s/it] +2025-02-05 21:49:33 - ERROR - stderr - 53%|█████▎ | 11911/22434 [11:41:53<7:17:29, 2.49s/it] +2025-02-05 21:49:33 - ERROR - stderr - +2025-02-05 21:49:33 - ERROR - stderr - +2025-02-05 21:49:33 - INFO - stdout - {'loss': 0.6795, 'grad_norm': 1.2412192821502686, 'learning_rate': 9.484810630584227e-06, 'epoch': 1.59} +2025-02-05 21:49:33 - ERROR - stderr - 53%|█████▎ | 11911/22434 [11:41:53<7:17:29, 2.49s/it] +2025-02-05 21:49:36 - ERROR - stderr - 53%|█████▎ | 11912/22434 [11:41:56<7:14:01, 2.47s/it] +2025-02-05 21:49:36 - ERROR - stderr - +2025-02-05 21:49:36 - ERROR - stderr - +2025-02-05 21:49:36 - INFO - stdout - {'loss': 0.659, 'grad_norm': 1.2287447452545166, 'learning_rate': 9.483368806604477e-06, 'epoch': 1.59} +2025-02-05 21:49:36 - ERROR - stderr - 53%|█████▎ | 11912/22434 [11:41:56<7:14:01, 2.47s/it] +2025-02-05 21:49:39 - ERROR - stderr - 53%|█████▎ | 11913/22434 [11:41:58<7:29:34, 2.56s/it] +2025-02-05 21:49:39 - ERROR - stderr - +2025-02-05 21:49:39 - ERROR - stderr - +2025-02-05 21:49:39 - INFO - stdout - {'loss': 0.7103, 'grad_norm': 1.2471702098846436, 'learning_rate': 9.481926993393408e-06, 'epoch': 1.59} +2025-02-05 21:49:39 - ERROR - stderr - 53%|█████▎ | 11913/22434 [11:41:58<7:29:34, 2.56s/it] +2025-02-05 21:49:41 - ERROR - stderr - 53%|█████▎ | 11914/22434 [11:42:01<7:25:53, 2.54s/it] +2025-02-05 21:49:41 - ERROR - stderr - +2025-02-05 21:49:41 - ERROR - stderr - +2025-02-05 21:49:41 - INFO - stdout - {'loss': 0.6966, 'grad_norm': 1.3074742555618286, 'learning_rate': 9.480485190981073e-06, 'epoch': 1.59} +2025-02-05 21:49:41 - ERROR - stderr - 53%|█████▎ | 11914/22434 [11:42:01<7:25:53, 2.54s/it] +2025-02-05 21:49:44 - ERROR - stderr - 53%|█████▎ | 11915/22434 [11:42:03<7:26:42, 2.55s/it] +2025-02-05 21:49:44 - ERROR - stderr - +2025-02-05 21:49:44 - ERROR - stderr - +2025-02-05 21:49:44 - INFO - stdout - {'loss': 0.6447, 'grad_norm': 1.0657333135604858, 'learning_rate': 
9.479043399397534e-06, 'epoch': 1.59} +2025-02-05 21:49:44 - ERROR - stderr - 53%|█████▎ | 11915/22434 [11:42:04<7:26:42, 2.55s/it] +2025-02-05 21:49:46 - ERROR - stderr - 53%|█████▎ | 11916/22434 [11:42:06<7:20:24, 2.51s/it] +2025-02-05 21:49:46 - ERROR - stderr - +2025-02-05 21:49:46 - ERROR - stderr - +2025-02-05 21:49:46 - INFO - stdout - {'loss': 0.7556, 'grad_norm': 1.2848750352859497, 'learning_rate': 9.477601618672834e-06, 'epoch': 1.59} +2025-02-05 21:49:46 - ERROR - stderr - 53%|█████▎ | 11916/22434 [11:42:06<7:20:24, 2.51s/it] +2025-02-05 21:49:49 - ERROR - stderr - 53%|█████▎ | 11917/22434 [11:42:08<7:17:01, 2.49s/it] +2025-02-05 21:49:49 - ERROR - stderr - +2025-02-05 21:49:49 - ERROR - stderr - +2025-02-05 21:49:49 - INFO - stdout - {'loss': 0.7361, 'grad_norm': 1.1942635774612427, 'learning_rate': 9.476159848837026e-06, 'epoch': 1.59} +2025-02-05 21:49:49 - ERROR - stderr - 53%|█████▎ | 11917/22434 [11:42:08<7:17:01, 2.49s/it] +2025-02-05 21:49:51 - ERROR - stderr - 53%|█████▎ | 11918/22434 [11:42:11<7:14:12, 2.48s/it] +2025-02-05 21:49:51 - ERROR - stderr - +2025-02-05 21:49:51 - ERROR - stderr - +2025-02-05 21:49:51 - INFO - stdout - {'loss': 0.6931, 'grad_norm': 1.3195042610168457, 'learning_rate': 9.474718089920167e-06, 'epoch': 1.59} +2025-02-05 21:49:51 - ERROR - stderr - 53%|█████▎ | 11918/22434 [11:42:11<7:14:12, 2.48s/it] +2025-02-05 21:49:53 - ERROR - stderr - 53%|█████▎ | 11919/22434 [11:42:13<7:11:31, 2.46s/it] +2025-02-05 21:49:53 - ERROR - stderr - +2025-02-05 21:49:53 - ERROR - stderr - +2025-02-05 21:49:53 - INFO - stdout - {'loss': 0.7, 'grad_norm': 1.2585618495941162, 'learning_rate': 9.473276341952307e-06, 'epoch': 1.59} +2025-02-05 21:49:53 - ERROR - stderr - 53%|█████▎ | 11919/22434 [11:42:13<7:11:31, 2.46s/it] +2025-02-05 21:49:56 - ERROR - stderr - 53%|█████▎ | 11920/22434 [11:42:16<7:08:54, 2.45s/it] +2025-02-05 21:49:56 - ERROR - stderr - +2025-02-05 21:49:56 - ERROR - stderr - +2025-02-05 21:49:56 - INFO - stdout - {'loss': 
0.7288, 'grad_norm': 1.2516577243804932, 'learning_rate': 9.471834604963495e-06, 'epoch': 1.59} +2025-02-05 21:49:56 - ERROR - stderr - 53%|█████▎ | 11920/22434 [11:42:16<7:08:54, 2.45s/it] +2025-02-05 21:49:58 - ERROR - stderr - 53%|█████▎ | 11921/22434 [11:42:18<7:10:31, 2.46s/it] +2025-02-05 21:49:58 - ERROR - stderr - +2025-02-05 21:49:58 - ERROR - stderr - +2025-02-05 21:49:58 - INFO - stdout - {'loss': 0.6964, 'grad_norm': 1.2190533876419067, 'learning_rate': 9.470392878983789e-06, 'epoch': 1.59} +2025-02-05 21:49:58 - ERROR - stderr - 53%|█████▎ | 11921/22434 [11:42:18<7:10:31, 2.46s/it] +2025-02-05 21:50:01 - ERROR - stderr - 53%|█████▎ | 11922/22434 [11:42:21<7:14:27, 2.48s/it] +2025-02-05 21:50:01 - ERROR - stderr - +2025-02-05 21:50:01 - ERROR - stderr - +2025-02-05 21:50:01 - INFO - stdout - {'loss': 0.7509, 'grad_norm': 1.350277304649353, 'learning_rate': 9.46895116404323e-06, 'epoch': 1.59} +2025-02-05 21:50:01 - ERROR - stderr - 53%|█████▎ | 11922/22434 [11:42:21<7:14:27, 2.48s/it] +2025-02-05 21:50:04 - ERROR - stderr - 53%|█████▎ | 11923/22434 [11:42:23<7:23:09, 2.53s/it] +2025-02-05 21:50:04 - ERROR - stderr - +2025-02-05 21:50:04 - ERROR - stderr - +2025-02-05 21:50:04 - INFO - stdout - {'loss': 0.6207, 'grad_norm': 1.1902166604995728, 'learning_rate': 9.467509460171884e-06, 'epoch': 1.59} +2025-02-05 21:50:04 - ERROR - stderr - 53%|█████▎ | 11923/22434 [11:42:23<7:23:09, 2.53s/it] +2025-02-05 21:50:06 - ERROR - stderr - 53%|█████▎ | 11924/22434 [11:42:26<7:20:08, 2.51s/it] +2025-02-05 21:50:06 - ERROR - stderr - +2025-02-05 21:50:06 - ERROR - stderr - +2025-02-05 21:50:06 - INFO - stdout - {'loss': 0.7544, 'grad_norm': 1.3256665468215942, 'learning_rate': 9.466067767399789e-06, 'epoch': 1.59} +2025-02-05 21:50:06 - ERROR - stderr - 53%|█████▎ | 11924/22434 [11:42:26<7:20:08, 2.51s/it] +2025-02-05 21:50:09 - ERROR - stderr - 53%|█████▎ | 11925/22434 [11:42:28<7:25:57, 2.55s/it] +2025-02-05 21:50:09 - ERROR - stderr - +2025-02-05 21:50:09 - ERROR 
- stderr - +2025-02-05 21:50:09 - INFO - stdout - {'loss': 0.6301, 'grad_norm': 1.157500982284546, 'learning_rate': 9.464626085757002e-06, 'epoch': 1.59} +2025-02-05 21:50:09 - ERROR - stderr - 53%|█████▎ | 11925/22434 [11:42:28<7:25:57, 2.55s/it] +2025-02-05 21:50:11 - ERROR - stderr - 53%|█████▎ | 11926/22434 [11:42:31<7:20:58, 2.52s/it] +2025-02-05 21:50:11 - ERROR - stderr - +2025-02-05 21:50:11 - ERROR - stderr - +2025-02-05 21:50:11 - INFO - stdout - {'loss': 0.6678, 'grad_norm': 1.1786853075027466, 'learning_rate': 9.463184415273572e-06, 'epoch': 1.59} +2025-02-05 21:50:11 - ERROR - stderr - 53%|█████▎ | 11926/22434 [11:42:31<7:20:58, 2.52s/it] +2025-02-05 21:50:14 - ERROR - stderr - 53%|█████▎ | 11927/22434 [11:42:33<7:22:00, 2.52s/it] +2025-02-05 21:50:14 - ERROR - stderr - +2025-02-05 21:50:14 - ERROR - stderr - +2025-02-05 21:50:14 - INFO - stdout - {'loss': 0.719, 'grad_norm': 1.249001145362854, 'learning_rate': 9.461742755979551e-06, 'epoch': 1.59} +2025-02-05 21:50:14 - ERROR - stderr - 53%|█████▎ | 11927/22434 [11:42:33<7:22:00, 2.52s/it] +2025-02-05 21:50:16 - ERROR - stderr - 53%|█████▎ | 11928/22434 [11:42:36<7:20:51, 2.52s/it] +2025-02-05 21:50:16 - ERROR - stderr - +2025-02-05 21:50:16 - ERROR - stderr - +2025-02-05 21:50:16 - INFO - stdout - {'loss': 0.7154, 'grad_norm': 1.1630702018737793, 'learning_rate': 9.460301107904982e-06, 'epoch': 1.6} +2025-02-05 21:50:16 - ERROR - stderr - 53%|█████▎ | 11928/22434 [11:42:36<7:20:51, 2.52s/it] +2025-02-05 21:50:19 - ERROR - stderr - 53%|█████▎ | 11929/22434 [11:42:38<7:21:08, 2.52s/it] +2025-02-05 21:50:19 - ERROR - stderr - +2025-02-05 21:50:19 - ERROR - stderr - +2025-02-05 21:50:19 - INFO - stdout - {'loss': 0.7844, 'grad_norm': 1.4249745607376099, 'learning_rate': 9.458859471079925e-06, 'epoch': 1.6} +2025-02-05 21:50:19 - ERROR - stderr - 53%|█████▎ | 11929/22434 [11:42:38<7:21:08, 2.52s/it] +2025-02-05 21:50:21 - ERROR - stderr - 53%|█████▎ | 11930/22434 [11:42:41<7:20:45, 2.52s/it] +2025-02-05 
21:50:21 - ERROR - stderr - +2025-02-05 21:50:21 - ERROR - stderr - +2025-02-05 21:50:21 - INFO - stdout - {'loss': 0.7218, 'grad_norm': 1.330048680305481, 'learning_rate': 9.45741784553442e-06, 'epoch': 1.6} +2025-02-05 21:50:21 - ERROR - stderr - 53%|█████▎ | 11930/22434 [11:42:41<7:20:45, 2.52s/it] +2025-02-05 21:50:24 - ERROR - stderr - 53%|█████▎ | 11931/22434 [11:42:44<7:25:30, 2.55s/it] +2025-02-05 21:50:24 - ERROR - stderr - +2025-02-05 21:50:24 - ERROR - stderr - +2025-02-05 21:50:24 - INFO - stdout - {'loss': 0.751, 'grad_norm': 1.352522611618042, 'learning_rate': 9.455976231298525e-06, 'epoch': 1.6} +2025-02-05 21:50:24 - ERROR - stderr - 53%|█████▎ | 11931/22434 [11:42:44<7:25:30, 2.55s/it] +2025-02-05 21:50:26 - ERROR - stderr - 53%|█████▎ | 11932/22434 [11:42:46<7:23:26, 2.53s/it] +2025-02-05 21:50:26 - ERROR - stderr - +2025-02-05 21:50:26 - ERROR - stderr - +2025-02-05 21:50:26 - INFO - stdout - {'loss': 0.6503, 'grad_norm': 1.0899754762649536, 'learning_rate': 9.454534628402284e-06, 'epoch': 1.6} +2025-02-05 21:50:26 - ERROR - stderr - 53%|█████▎ | 11932/22434 [11:42:46<7:23:26, 2.53s/it] +2025-02-05 21:50:29 - ERROR - stderr - 53%|█████▎ | 11933/22434 [11:42:49<7:46:30, 2.67s/it] +2025-02-05 21:50:29 - ERROR - stderr - +2025-02-05 21:50:29 - ERROR - stderr - +2025-02-05 21:50:29 - INFO - stdout - {'loss': 0.6418, 'grad_norm': 1.149268388748169, 'learning_rate': 9.453093036875742e-06, 'epoch': 1.6} +2025-02-05 21:50:29 - ERROR - stderr - 53%|█████▎ | 11933/22434 [11:42:49<7:46:30, 2.67s/it] +2025-02-05 21:50:32 - ERROR - stderr - 53%|█████▎ | 11934/22434 [11:42:52<7:41:49, 2.64s/it] +2025-02-05 21:50:32 - ERROR - stderr - +2025-02-05 21:50:32 - ERROR - stderr - +2025-02-05 21:50:32 - INFO - stdout - {'loss': 0.6217, 'grad_norm': 1.0731477737426758, 'learning_rate': 9.451651456748958e-06, 'epoch': 1.6} +2025-02-05 21:50:32 - ERROR - stderr - 53%|█████▎ | 11934/22434 [11:42:52<7:41:49, 2.64s/it] +2025-02-05 21:50:34 - ERROR - stderr - 53%|█████▎ | 
11935/22434 [11:42:54<7:30:02, 2.57s/it] +2025-02-05 21:50:34 - ERROR - stderr - +2025-02-05 21:50:34 - ERROR - stderr - +2025-02-05 21:50:34 - INFO - stdout - {'loss': 0.6365, 'grad_norm': 1.1634063720703125, 'learning_rate': 9.450209888051976e-06, 'epoch': 1.6} +2025-02-05 21:50:34 - ERROR - stderr - 53%|█████▎ | 11935/22434 [11:42:54<7:30:02, 2.57s/it] +2025-02-05 21:50:37 - ERROR - stderr - 53%|█████▎ | 11936/22434 [11:42:57<7:27:45, 2.56s/it] +2025-02-05 21:50:37 - ERROR - stderr - +2025-02-05 21:50:37 - ERROR - stderr - +2025-02-05 21:50:37 - INFO - stdout - {'loss': 0.7318, 'grad_norm': 1.3404580354690552, 'learning_rate': 9.448768330814837e-06, 'epoch': 1.6} +2025-02-05 21:50:37 - ERROR - stderr - 53%|█████▎ | 11936/22434 [11:42:57<7:27:45, 2.56s/it] +2025-02-05 21:50:39 - ERROR - stderr - 53%|█████▎ | 11937/22434 [11:42:59<7:20:18, 2.52s/it] +2025-02-05 21:50:39 - ERROR - stderr - +2025-02-05 21:50:39 - ERROR - stderr - +2025-02-05 21:50:39 - INFO - stdout - {'loss': 0.635, 'grad_norm': 1.0767606496810913, 'learning_rate': 9.447326785067596e-06, 'epoch': 1.6} +2025-02-05 21:50:39 - ERROR - stderr - 53%|█████▎ | 11937/22434 [11:42:59<7:20:18, 2.52s/it] +2025-02-05 21:50:42 - ERROR - stderr - 53%|█████▎ | 11938/22434 [11:43:01<7:19:26, 2.51s/it] +2025-02-05 21:50:42 - ERROR - stderr - +2025-02-05 21:50:42 - ERROR - stderr - +2025-02-05 21:50:42 - INFO - stdout - {'loss': 0.711, 'grad_norm': 1.1781765222549438, 'learning_rate': 9.445885250840301e-06, 'epoch': 1.6} +2025-02-05 21:50:42 - ERROR - stderr - 53%|█████▎ | 11938/22434 [11:43:01<7:19:26, 2.51s/it] +2025-02-05 21:50:44 - ERROR - stderr - 53%|█████▎ | 11939/22434 [11:43:04<7:17:40, 2.50s/it] +2025-02-05 21:50:44 - ERROR - stderr - +2025-02-05 21:50:44 - ERROR - stderr - +2025-02-05 21:50:44 - INFO - stdout - {'loss': 0.7903, 'grad_norm': 1.3161048889160156, 'learning_rate': 9.444443728162998e-06, 'epoch': 1.6} +2025-02-05 21:50:44 - ERROR - stderr - 53%|█████▎ | 11939/22434 [11:43:04<7:17:40, 2.50s/it] 
+2025-02-05 21:50:47 - ERROR - stderr - 53%|█████▎ | 11940/22434 [11:43:06<7:16:30, 2.50s/it] +2025-02-05 21:50:47 - ERROR - stderr - +2025-02-05 21:50:47 - ERROR - stderr - +2025-02-05 21:50:47 - INFO - stdout - {'loss': 0.7769, 'grad_norm': 1.3514540195465088, 'learning_rate': 9.443002217065735e-06, 'epoch': 1.6} +2025-02-05 21:50:47 - ERROR - stderr - 53%|█████▎ | 11940/22434 [11:43:06<7:16:30, 2.50s/it] +2025-02-05 21:50:49 - ERROR - stderr - 53%|█████▎ | 11941/22434 [11:43:09<7:18:21, 2.51s/it] +2025-02-05 21:50:49 - ERROR - stderr - +2025-02-05 21:50:49 - ERROR - stderr - +2025-02-05 21:50:49 - INFO - stdout - {'loss': 0.6269, 'grad_norm': 1.0919398069381714, 'learning_rate': 9.441560717578552e-06, 'epoch': 1.6} +2025-02-05 21:50:49 - ERROR - stderr - 53%|█████▎ | 11941/22434 [11:43:09<7:18:21, 2.51s/it] +2025-02-05 21:50:52 - ERROR - stderr - 53%|█████▎ | 11942/22434 [11:43:11<7:20:00, 2.52s/it] +2025-02-05 21:50:52 - ERROR - stderr - +2025-02-05 21:50:52 - ERROR - stderr - +2025-02-05 21:50:52 - INFO - stdout - {'loss': 0.7168, 'grad_norm': 1.2636088132858276, 'learning_rate': 9.440119229731508e-06, 'epoch': 1.6} +2025-02-05 21:50:52 - ERROR - stderr - 53%|█████▎ | 11942/22434 [11:43:12<7:20:00, 2.52s/it] +2025-02-05 21:50:54 - ERROR - stderr - 53%|█████▎ | 11943/22434 [11:43:14<7:34:15, 2.60s/it] +2025-02-05 21:50:55 - ERROR - stderr - +2025-02-05 21:50:55 - ERROR - stderr - +2025-02-05 21:50:55 - INFO - stdout - {'loss': 0.7571, 'grad_norm': 1.208398699760437, 'learning_rate': 9.438677753554642e-06, 'epoch': 1.6} +2025-02-05 21:50:55 - ERROR - stderr - 53%|█████▎ | 11943/22434 [11:43:14<7:34:15, 2.60s/it] +2025-02-05 21:50:57 - ERROR - stderr - 53%|█████▎ | 11944/22434 [11:43:17<7:31:00, 2.58s/it] +2025-02-05 21:50:57 - ERROR - stderr - +2025-02-05 21:50:57 - ERROR - stderr - +2025-02-05 21:50:57 - INFO - stdout - {'loss': 0.5952, 'grad_norm': 1.0871726274490356, 'learning_rate': 9.437236289077998e-06, 'epoch': 1.6} +2025-02-05 21:50:57 - ERROR - stderr - 
53%|█████▎ | 11944/22434 [11:43:17<7:31:00, 2.58s/it] +2025-02-05 21:51:00 - ERROR - stderr - 53%|█████▎ | 11945/22434 [11:43:19<7:29:25, 2.57s/it] +2025-02-05 21:51:00 - ERROR - stderr - +2025-02-05 21:51:00 - ERROR - stderr - +2025-02-05 21:51:00 - INFO - stdout - {'loss': 0.7292, 'grad_norm': 1.3624670505523682, 'learning_rate': 9.435794836331627e-06, 'epoch': 1.6} +2025-02-05 21:51:00 - ERROR - stderr - 53%|█████▎ | 11945/22434 [11:43:19<7:29:25, 2.57s/it] +2025-02-05 21:51:02 - ERROR - stderr - 53%|█████▎ | 11946/22434 [11:43:22<7:24:11, 2.54s/it] +2025-02-05 21:51:02 - ERROR - stderr - +2025-02-05 21:51:02 - ERROR - stderr - +2025-02-05 21:51:02 - INFO - stdout - {'loss': 0.7283, 'grad_norm': 1.1158134937286377, 'learning_rate': 9.43435339534557e-06, 'epoch': 1.6} +2025-02-05 21:51:02 - ERROR - stderr - 53%|█████▎ | 11946/22434 [11:43:22<7:24:11, 2.54s/it] +2025-02-05 21:51:05 - ERROR - stderr - 53%|█████▎ | 11947/22434 [11:43:24<7:21:11, 2.52s/it] +2025-02-05 21:51:05 - ERROR - stderr - +2025-02-05 21:51:05 - ERROR - stderr - +2025-02-05 21:51:05 - INFO - stdout - {'loss': 0.6923, 'grad_norm': 1.343318223953247, 'learning_rate': 9.432911966149879e-06, 'epoch': 1.6} +2025-02-05 21:51:05 - ERROR - stderr - 53%|█████▎ | 11947/22434 [11:43:24<7:21:11, 2.52s/it] +2025-02-05 21:51:07 - ERROR - stderr - 53%|█████▎ | 11948/22434 [11:43:27<7:20:09, 2.52s/it] +2025-02-05 21:51:07 - ERROR - stderr - +2025-02-05 21:51:07 - ERROR - stderr - +2025-02-05 21:51:07 - INFO - stdout - {'loss': 0.663, 'grad_norm': 1.3452696800231934, 'learning_rate': 9.431470548774597e-06, 'epoch': 1.6} +2025-02-05 21:51:07 - ERROR - stderr - 53%|█████▎ | 11948/22434 [11:43:27<7:20:09, 2.52s/it] +2025-02-05 21:51:10 - ERROR - stderr - 53%|█████▎ | 11949/22434 [11:43:29<7:17:35, 2.50s/it] +2025-02-05 21:51:10 - ERROR - stderr - +2025-02-05 21:51:10 - ERROR - stderr - +2025-02-05 21:51:10 - INFO - stdout - {'loss': 0.6624, 'grad_norm': 1.219268798828125, 'learning_rate': 9.43002914324976e-06, 
'epoch': 1.6} +2025-02-05 21:51:10 - ERROR - stderr - 53%|█████▎ | 11949/22434 [11:43:29<7:17:35, 2.50s/it] +2025-02-05 21:51:12 - ERROR - stderr - 53%|█████▎ | 11950/22434 [11:43:32<7:16:17, 2.50s/it] +2025-02-05 21:51:12 - ERROR - stderr - +2025-02-05 21:51:12 - ERROR - stderr - +2025-02-05 21:51:12 - INFO - stdout - {'loss': 0.6402, 'grad_norm': 1.0390870571136475, 'learning_rate': 9.428587749605426e-06, 'epoch': 1.6} +2025-02-05 21:51:12 - ERROR - stderr - 53%|█████▎ | 11950/22434 [11:43:32<7:16:17, 2.50s/it] +2025-02-05 21:51:14 - ERROR - stderr - 53%|█████▎ | 11951/22434 [11:43:34<7:15:06, 2.49s/it] +2025-02-05 21:51:15 - ERROR - stderr - +2025-02-05 21:51:15 - ERROR - stderr - +2025-02-05 21:51:15 - INFO - stdout - {'loss': 0.6457, 'grad_norm': 1.106889009475708, 'learning_rate': 9.427146367871634e-06, 'epoch': 1.6} +2025-02-05 21:51:15 - ERROR - stderr - 53%|█████▎ | 11951/22434 [11:43:34<7:15:06, 2.49s/it] +2025-02-05 21:51:17 - ERROR - stderr - 53%|█████▎ | 11952/22434 [11:43:37<7:16:05, 2.50s/it] +2025-02-05 21:51:17 - ERROR - stderr - +2025-02-05 21:51:17 - ERROR - stderr - +2025-02-05 21:51:17 - INFO - stdout - {'loss': 0.685, 'grad_norm': 1.2984715700149536, 'learning_rate': 9.425704998078422e-06, 'epoch': 1.6} +2025-02-05 21:51:17 - ERROR - stderr - 53%|█████▎ | 11952/22434 [11:43:37<7:16:05, 2.50s/it] +2025-02-05 21:51:20 - ERROR - stderr - 53%|█████▎ | 11953/22434 [11:43:39<7:22:56, 2.54s/it] +2025-02-05 21:51:20 - ERROR - stderr - +2025-02-05 21:51:20 - ERROR - stderr - +2025-02-05 21:51:20 - INFO - stdout - {'loss': 0.6738, 'grad_norm': 1.1975985765457153, 'learning_rate': 9.424263640255846e-06, 'epoch': 1.6} +2025-02-05 21:51:20 - ERROR - stderr - 53%|█████▎ | 11953/22434 [11:43:39<7:22:56, 2.54s/it] +2025-02-05 21:51:22 - ERROR - stderr - 53%|█████▎ | 11954/22434 [11:43:42<7:22:42, 2.53s/it] +2025-02-05 21:51:22 - ERROR - stderr - +2025-02-05 21:51:22 - ERROR - stderr - +2025-02-05 21:51:22 - INFO - stdout - {'loss': 0.7368, 'grad_norm': 
1.2340142726898193, 'learning_rate': 9.422822294433939e-06, 'epoch': 1.6} +2025-02-05 21:51:22 - ERROR - stderr - 53%|█████▎ | 11954/22434 [11:43:42<7:22:42, 2.53s/it] +2025-02-05 21:51:25 - ERROR - stderr - 53%|█████▎ | 11955/22434 [11:43:44<7:18:37, 2.51s/it] +2025-02-05 21:51:25 - ERROR - stderr - +2025-02-05 21:51:25 - ERROR - stderr - +2025-02-05 21:51:25 - INFO - stdout - {'loss': 0.6825, 'grad_norm': 1.328934669494629, 'learning_rate': 9.421380960642754e-06, 'epoch': 1.6} +2025-02-05 21:51:25 - ERROR - stderr - 53%|█████▎ | 11955/22434 [11:43:44<7:18:37, 2.51s/it] +2025-02-05 21:51:27 - ERROR - stderr - 53%|█████▎ | 11956/22434 [11:43:47<7:38:17, 2.62s/it] +2025-02-05 21:51:28 - ERROR - stderr - +2025-02-05 21:51:28 - ERROR - stderr - +2025-02-05 21:51:28 - INFO - stdout - {'loss': 0.6861, 'grad_norm': 1.3042895793914795, 'learning_rate': 9.419939638912325e-06, 'epoch': 1.6} +2025-02-05 21:51:28 - ERROR - stderr - 53%|█████▎ | 11956/22434 [11:43:47<7:38:17, 2.62s/it] +2025-02-05 21:51:30 - ERROR - stderr - 53%|█████▎ | 11957/22434 [11:43:50<7:31:52, 2.59s/it] +2025-02-05 21:51:30 - ERROR - stderr - +2025-02-05 21:51:30 - ERROR - stderr - +2025-02-05 21:51:30 - INFO - stdout - {'loss': 0.7743, 'grad_norm': 1.3110930919647217, 'learning_rate': 9.4184983292727e-06, 'epoch': 1.6} +2025-02-05 21:51:30 - ERROR - stderr - 53%|█████▎ | 11957/22434 [11:43:50<7:31:52, 2.59s/it] +2025-02-05 21:51:32 - ERROR - stderr - 53%|█████▎ | 11958/22434 [11:43:52<7:24:47, 2.55s/it] +2025-02-05 21:51:32 - ERROR - stderr - +2025-02-05 21:51:32 - ERROR - stderr - +2025-02-05 21:51:32 - INFO - stdout - {'loss': 0.8152, 'grad_norm': 1.3622463941574097, 'learning_rate': 9.41705703175392e-06, 'epoch': 1.6} +2025-02-05 21:51:32 - ERROR - stderr - 53%|█████▎ | 11958/22434 [11:43:52<7:24:47, 2.55s/it] +2025-02-05 21:51:35 - ERROR - stderr - 53%|█████▎ | 11959/22434 [11:43:55<7:22:00, 2.53s/it] +2025-02-05 21:51:35 - ERROR - stderr - +2025-02-05 21:51:35 - ERROR - stderr - +2025-02-05 
21:51:35 - INFO - stdout - {'loss': 0.7853, 'grad_norm': 1.3875211477279663, 'learning_rate': 9.415615746386034e-06, 'epoch': 1.6} +2025-02-05 21:51:35 - ERROR - stderr - 53%|█████▎ | 11959/22434 [11:43:55<7:22:00, 2.53s/it] +2025-02-05 21:51:37 - ERROR - stderr - 53%|█████▎ | 11960/22434 [11:43:57<7:23:54, 2.54s/it] +2025-02-05 21:51:38 - ERROR - stderr - +2025-02-05 21:51:38 - ERROR - stderr - +2025-02-05 21:51:38 - INFO - stdout - {'loss': 0.6437, 'grad_norm': 1.251090168952942, 'learning_rate': 9.41417447319907e-06, 'epoch': 1.6} +2025-02-05 21:51:38 - ERROR - stderr - 53%|█████▎ | 11960/22434 [11:43:57<7:23:54, 2.54s/it] +2025-02-05 21:51:40 - ERROR - stderr - 53%|█████▎ | 11961/22434 [11:44:00<7:22:47, 2.54s/it] +2025-02-05 21:51:40 - ERROR - stderr - +2025-02-05 21:51:40 - ERROR - stderr - +2025-02-05 21:51:40 - INFO - stdout - {'loss': 0.7195, 'grad_norm': 1.3227801322937012, 'learning_rate': 9.412733212223086e-06, 'epoch': 1.6} +2025-02-05 21:51:40 - ERROR - stderr - 53%|█████▎ | 11961/22434 [11:44:00<7:22:47, 2.54s/it] +2025-02-05 21:51:42 - ERROR - stderr - 53%|█████▎ | 11962/22434 [11:44:02<7:19:08, 2.52s/it] +2025-02-05 21:51:43 - ERROR - stderr - +2025-02-05 21:51:43 - ERROR - stderr - +2025-02-05 21:51:43 - INFO - stdout - {'loss': 0.718, 'grad_norm': 1.2727054357528687, 'learning_rate': 9.41129196348811e-06, 'epoch': 1.6} +2025-02-05 21:51:43 - ERROR - stderr - 53%|█████▎ | 11962/22434 [11:44:02<7:19:08, 2.52s/it] +2025-02-05 21:51:45 - ERROR - stderr - 53%|█████▎ | 11963/22434 [11:44:05<7:17:58, 2.51s/it] +2025-02-05 21:51:45 - ERROR - stderr - +2025-02-05 21:51:45 - ERROR - stderr - +2025-02-05 21:51:45 - INFO - stdout - {'loss': 0.7137, 'grad_norm': 1.2788254022598267, 'learning_rate': 9.409850727024194e-06, 'epoch': 1.6} +2025-02-05 21:51:45 - ERROR - stderr - 53%|█████▎ | 11963/22434 [11:44:05<7:17:58, 2.51s/it] +2025-02-05 21:51:47 - ERROR - stderr - 53%|█████▎ | 11964/22434 [11:44:07<7:16:42, 2.50s/it] +2025-02-05 21:51:48 - ERROR - stderr - 
+2025-02-05 21:51:48 - ERROR - stderr - +2025-02-05 21:51:48 - INFO - stdout - {'loss': 0.8015, 'grad_norm': 1.2815759181976318, 'learning_rate': 9.408409502861374e-06, 'epoch': 1.6} +2025-02-05 21:51:48 - ERROR - stderr - 53%|█████▎ | 11964/22434 [11:44:07<7:16:42, 2.50s/it] +2025-02-05 21:51:50 - ERROR - stderr - 53%|█████▎ | 11965/22434 [11:44:10<7:20:48, 2.53s/it] +2025-02-05 21:51:50 - ERROR - stderr - +2025-02-05 21:51:50 - ERROR - stderr - +2025-02-05 21:51:50 - INFO - stdout - {'loss': 0.6826, 'grad_norm': 1.2300723791122437, 'learning_rate': 9.40696829102969e-06, 'epoch': 1.6} +2025-02-05 21:51:50 - ERROR - stderr - 53%|█████▎ | 11965/22434 [11:44:10<7:20:48, 2.53s/it] +2025-02-05 21:51:53 - ERROR - stderr - 53%|█████▎ | 11966/22434 [11:44:12<7:24:28, 2.55s/it] +2025-02-05 21:51:53 - ERROR - stderr - +2025-02-05 21:51:53 - ERROR - stderr - +2025-02-05 21:51:53 - INFO - stdout - {'loss': 0.7005, 'grad_norm': 1.2241438627243042, 'learning_rate': 9.405527091559187e-06, 'epoch': 1.6} +2025-02-05 21:51:53 - ERROR - stderr - 53%|█████▎ | 11966/22434 [11:44:12<7:24:28, 2.55s/it] +2025-02-05 21:51:55 - ERROR - stderr - 53%|█████▎ | 11967/22434 [11:44:15<7:18:32, 2.51s/it] +2025-02-05 21:51:55 - ERROR - stderr - +2025-02-05 21:51:55 - ERROR - stderr - +2025-02-05 21:51:55 - INFO - stdout - {'loss': 0.707, 'grad_norm': 1.3267165422439575, 'learning_rate': 9.404085904479903e-06, 'epoch': 1.6} +2025-02-05 21:51:55 - ERROR - stderr - 53%|█████▎ | 11967/22434 [11:44:15<7:18:32, 2.51s/it] +2025-02-05 21:51:58 - ERROR - stderr - 53%|█████▎ | 11968/22434 [11:44:17<7:14:11, 2.49s/it] +2025-02-05 21:51:58 - ERROR - stderr - +2025-02-05 21:51:58 - ERROR - stderr - +2025-02-05 21:51:58 - INFO - stdout - {'loss': 0.6803, 'grad_norm': 1.2079559564590454, 'learning_rate': 9.402644729821876e-06, 'epoch': 1.6} +2025-02-05 21:51:58 - ERROR - stderr - 53%|█████▎ | 11968/22434 [11:44:17<7:14:11, 2.49s/it] +2025-02-05 21:52:00 - ERROR - stderr - 53%|█████▎ | 11969/22434 
[11:44:20<7:18:41, 2.52s/it] +2025-02-05 21:52:00 - ERROR - stderr - +2025-02-05 21:52:00 - ERROR - stderr - +2025-02-05 21:52:00 - INFO - stdout - {'loss': 0.7341, 'grad_norm': 1.1369354724884033, 'learning_rate': 9.40120356761515e-06, 'epoch': 1.6} +2025-02-05 21:52:00 - ERROR - stderr - 53%|█████▎ | 11969/22434 [11:44:20<7:18:41, 2.52s/it] +2025-02-05 21:52:03 - ERROR - stderr - 53%|█████▎ | 11970/22434 [11:44:22<7:17:03, 2.51s/it] +2025-02-05 21:52:03 - ERROR - stderr - +2025-02-05 21:52:03 - ERROR - stderr - +2025-02-05 21:52:03 - INFO - stdout - {'loss': 0.682, 'grad_norm': 1.1219432353973389, 'learning_rate': 9.39976241788976e-06, 'epoch': 1.6} +2025-02-05 21:52:03 - ERROR - stderr - 53%|█████▎ | 11970/22434 [11:44:22<7:17:03, 2.51s/it] +2025-02-05 21:52:05 - ERROR - stderr - 53%|█████▎ | 11971/22434 [11:44:25<7:15:02, 2.49s/it] +2025-02-05 21:52:05 - ERROR - stderr - +2025-02-05 21:52:05 - ERROR - stderr - +2025-02-05 21:52:05 - INFO - stdout - {'loss': 0.648, 'grad_norm': 1.0331940650939941, 'learning_rate': 9.398321280675748e-06, 'epoch': 1.6} +2025-02-05 21:52:05 - ERROR - stderr - 53%|█████▎ | 11971/22434 [11:44:25<7:15:02, 2.49s/it] +2025-02-05 21:52:07 - ERROR - stderr - 53%|█████▎ | 11972/22434 [11:44:27<7:11:02, 2.47s/it] +2025-02-05 21:52:08 - ERROR - stderr - +2025-02-05 21:52:08 - ERROR - stderr - +2025-02-05 21:52:08 - INFO - stdout - {'loss': 0.7325, 'grad_norm': 1.2097536325454712, 'learning_rate': 9.396880156003157e-06, 'epoch': 1.6} +2025-02-05 21:52:08 - ERROR - stderr - 53%|█████▎ | 11972/22434 [11:44:27<7:11:02, 2.47s/it] +2025-02-05 21:52:10 - ERROR - stderr - 53%|█████▎ | 11973/22434 [11:44:30<7:15:17, 2.50s/it] +2025-02-05 21:52:10 - ERROR - stderr - +2025-02-05 21:52:10 - ERROR - stderr - +2025-02-05 21:52:10 - INFO - stdout - {'loss': 0.6673, 'grad_norm': 1.1824707984924316, 'learning_rate': 9.395439043902017e-06, 'epoch': 1.6} +2025-02-05 21:52:10 - ERROR - stderr - 53%|█████▎ | 11973/22434 [11:44:30<7:15:17, 2.50s/it] +2025-02-05 
21:52:13 - ERROR - stderr - 53%|█████▎ | 11974/22434 [11:44:32<7:15:22, 2.50s/it] +2025-02-05 21:52:13 - ERROR - stderr - +2025-02-05 21:52:13 - ERROR - stderr - +2025-02-05 21:52:13 - INFO - stdout - {'loss': 0.7126, 'grad_norm': 1.1807588338851929, 'learning_rate': 9.393997944402378e-06, 'epoch': 1.6} +2025-02-05 21:52:13 - ERROR - stderr - 53%|█████▎ | 11974/22434 [11:44:32<7:15:22, 2.50s/it] +2025-02-05 21:52:15 - ERROR - stderr - 53%|█████▎ | 11975/22434 [11:44:35<7:15:04, 2.50s/it] +2025-02-05 21:52:15 - ERROR - stderr - +2025-02-05 21:52:15 - ERROR - stderr - +2025-02-05 21:52:15 - INFO - stdout - {'loss': 0.7189, 'grad_norm': 1.2512067556381226, 'learning_rate': 9.392556857534267e-06, 'epoch': 1.6} +2025-02-05 21:52:15 - ERROR - stderr - 53%|█████▎ | 11975/22434 [11:44:35<7:15:04, 2.50s/it] +2025-02-05 21:52:18 - ERROR - stderr - 53%|█████▎ | 11976/22434 [11:44:37<7:15:59, 2.50s/it] +2025-02-05 21:52:18 - ERROR - stderr - +2025-02-05 21:52:18 - ERROR - stderr - +2025-02-05 21:52:18 - INFO - stdout - {'loss': 0.8405, 'grad_norm': 1.324547529220581, 'learning_rate': 9.39111578332773e-06, 'epoch': 1.6} +2025-02-05 21:52:18 - ERROR - stderr - 53%|█████▎ | 11976/22434 [11:44:37<7:15:59, 2.50s/it] +2025-02-05 21:52:20 - ERROR - stderr - 53%|█████▎ | 11977/22434 [11:44:40<7:19:02, 2.52s/it] +2025-02-05 21:52:20 - ERROR - stderr - +2025-02-05 21:52:20 - ERROR - stderr - +2025-02-05 21:52:20 - INFO - stdout - {'loss': 0.7111, 'grad_norm': 1.257055401802063, 'learning_rate': 9.389674721812799e-06, 'epoch': 1.6} +2025-02-05 21:52:20 - ERROR - stderr - 53%|█████▎ | 11977/22434 [11:44:40<7:19:02, 2.52s/it] +2025-02-05 21:52:23 - ERROR - stderr - 53%|█████▎ | 11978/22434 [11:44:42<7:18:21, 2.52s/it] +2025-02-05 21:52:23 - ERROR - stderr - +2025-02-05 21:52:23 - ERROR - stderr - +2025-02-05 21:52:23 - INFO - stdout - {'loss': 0.7436, 'grad_norm': 1.2335609197616577, 'learning_rate': 9.388233673019513e-06, 'epoch': 1.6} +2025-02-05 21:52:23 - ERROR - stderr - 53%|█████▎ | 
11978/22434 [11:44:42<7:18:21, 2.52s/it] +2025-02-05 21:52:25 - ERROR - stderr - 53%|█████▎ | 11979/22434 [11:44:45<7:16:18, 2.50s/it] +2025-02-05 21:52:25 - ERROR - stderr - +2025-02-05 21:52:25 - ERROR - stderr - +2025-02-05 21:52:25 - INFO - stdout - {'loss': 0.6718, 'grad_norm': 1.1616965532302856, 'learning_rate': 9.386792636977915e-06, 'epoch': 1.6} +2025-02-05 21:52:25 - ERROR - stderr - 53%|█████▎ | 11979/22434 [11:44:45<7:16:18, 2.50s/it] +2025-02-05 21:52:28 - ERROR - stderr - 53%|█████▎ | 11980/22434 [11:44:47<7:19:09, 2.52s/it] +2025-02-05 21:52:28 - ERROR - stderr - +2025-02-05 21:52:28 - ERROR - stderr - +2025-02-05 21:52:28 - INFO - stdout - {'loss': 0.6704, 'grad_norm': 1.252840518951416, 'learning_rate': 9.38535161371804e-06, 'epoch': 1.6} +2025-02-05 21:52:28 - ERROR - stderr - 53%|█████▎ | 11980/22434 [11:44:47<7:19:09, 2.52s/it] +2025-02-05 21:52:30 - ERROR - stderr - 53%|█████▎ | 11981/22434 [11:44:50<7:20:48, 2.53s/it] +2025-02-05 21:52:30 - ERROR - stderr - +2025-02-05 21:52:30 - ERROR - stderr - +2025-02-05 21:52:30 - INFO - stdout - {'loss': 0.7209, 'grad_norm': 1.2710531949996948, 'learning_rate': 9.383910603269915e-06, 'epoch': 1.6} +2025-02-05 21:52:30 - ERROR - stderr - 53%|█████▎ | 11981/22434 [11:44:50<7:20:48, 2.53s/it] +2025-02-05 21:52:33 - ERROR - stderr - 53%|█████▎ | 11982/22434 [11:44:53<7:23:15, 2.54s/it] +2025-02-05 21:52:33 - ERROR - stderr - +2025-02-05 21:52:33 - ERROR - stderr - +2025-02-05 21:52:33 - INFO - stdout - {'loss': 0.6856, 'grad_norm': 1.2172224521636963, 'learning_rate': 9.38246960566359e-06, 'epoch': 1.6} +2025-02-05 21:52:33 - ERROR - stderr - 53%|█████▎ | 11982/22434 [11:44:53<7:23:15, 2.54s/it] +2025-02-05 21:52:35 - ERROR - stderr - 53%|█████▎ | 11983/22434 [11:44:55<7:20:43, 2.53s/it] +2025-02-05 21:52:35 - ERROR - stderr - +2025-02-05 21:52:35 - ERROR - stderr - +2025-02-05 21:52:35 - INFO - stdout - {'loss': 0.6898, 'grad_norm': 1.2084267139434814, 'learning_rate': 9.38102862092909e-06, 'epoch': 1.6} 
+2025-02-05 21:52:35 - ERROR - stderr - 53%|█████▎ | 11983/22434 [11:44:55<7:20:43, 2.53s/it] +2025-02-05 21:52:38 - ERROR - stderr - 53%|█████▎ | 11984/22434 [11:44:58<7:26:54, 2.57s/it] +2025-02-05 21:52:38 - ERROR - stderr - +2025-02-05 21:52:38 - ERROR - stderr - +2025-02-05 21:52:38 - INFO - stdout - {'loss': 0.6996, 'grad_norm': 1.258206844329834, 'learning_rate': 9.379587649096457e-06, 'epoch': 1.6} +2025-02-05 21:52:38 - ERROR - stderr - 53%|█████▎ | 11984/22434 [11:44:58<7:26:54, 2.57s/it] +2025-02-05 21:52:40 - ERROR - stderr - 53%|█████▎ | 11985/22434 [11:45:00<7:24:48, 2.55s/it] +2025-02-05 21:52:40 - ERROR - stderr - +2025-02-05 21:52:40 - ERROR - stderr - +2025-02-05 21:52:40 - INFO - stdout - {'loss': 0.6485, 'grad_norm': 1.1563105583190918, 'learning_rate': 9.37814669019573e-06, 'epoch': 1.6} +2025-02-05 21:52:40 - ERROR - stderr - 53%|█████▎ | 11985/22434 [11:45:00<7:24:48, 2.55s/it] +2025-02-05 21:52:43 - ERROR - stderr - 53%|█████▎ | 11986/22434 [11:45:03<7:23:10, 2.55s/it] +2025-02-05 21:52:43 - ERROR - stderr - +2025-02-05 21:52:43 - ERROR - stderr - +2025-02-05 21:52:43 - INFO - stdout - {'loss': 0.6516, 'grad_norm': 1.1429296731948853, 'learning_rate': 9.376705744256936e-06, 'epoch': 1.6} +2025-02-05 21:52:43 - ERROR - stderr - 53%|█████▎ | 11986/22434 [11:45:03<7:23:10, 2.55s/it] +2025-02-05 21:52:45 - ERROR - stderr - 53%|█████▎ | 11987/22434 [11:45:05<7:19:36, 2.52s/it] +2025-02-05 21:52:45 - ERROR - stderr - +2025-02-05 21:52:45 - ERROR - stderr - +2025-02-05 21:52:45 - INFO - stdout - {'loss': 0.6794, 'grad_norm': 1.2605894804000854, 'learning_rate': 9.375264811310117e-06, 'epoch': 1.6} +2025-02-05 21:52:45 - ERROR - stderr - 53%|█████▎ | 11987/22434 [11:45:05<7:19:36, 2.52s/it] +2025-02-05 21:52:48 - ERROR - stderr - 53%|█████▎ | 11988/22434 [11:45:08<7:22:51, 2.54s/it] +2025-02-05 21:52:48 - ERROR - stderr - +2025-02-05 21:52:48 - ERROR - stderr - +2025-02-05 21:52:48 - INFO - stdout - {'loss': 0.6498, 'grad_norm': 1.1889954805374146, 
'learning_rate': 9.373823891385305e-06, 'epoch': 1.6} +2025-02-05 21:52:48 - ERROR - stderr - 53%|█████▎ | 11988/22434 [11:45:08<7:22:51, 2.54s/it] +2025-02-05 21:52:50 - ERROR - stderr - 53%|█████▎ | 11989/22434 [11:45:10<7:19:07, 2.52s/it] +2025-02-05 21:52:51 - ERROR - stderr - +2025-02-05 21:52:51 - ERROR - stderr - +2025-02-05 21:52:51 - INFO - stdout - {'loss': 0.7759, 'grad_norm': 1.3865443468093872, 'learning_rate': 9.372382984512533e-06, 'epoch': 1.6} +2025-02-05 21:52:51 - ERROR - stderr - 53%|█████▎ | 11989/22434 [11:45:10<7:19:07, 2.52s/it] +2025-02-05 21:52:53 - ERROR - stderr - 53%|█████▎ | 11990/22434 [11:45:13<7:15:37, 2.50s/it] +2025-02-05 21:52:53 - ERROR - stderr - +2025-02-05 21:52:53 - ERROR - stderr - +2025-02-05 21:52:53 - INFO - stdout - {'loss': 0.661, 'grad_norm': 1.1048980951309204, 'learning_rate': 9.370942090721838e-06, 'epoch': 1.6} +2025-02-05 21:52:53 - ERROR - stderr - 53%|█████▎ | 11990/22434 [11:45:13<7:15:37, 2.50s/it] +2025-02-05 21:52:55 - ERROR - stderr - 53%|█████▎ | 11991/22434 [11:45:15<7:16:24, 2.51s/it] +2025-02-05 21:52:56 - ERROR - stderr - +2025-02-05 21:52:56 - ERROR - stderr - +2025-02-05 21:52:56 - INFO - stdout - {'loss': 0.7569, 'grad_norm': 1.3671259880065918, 'learning_rate': 9.369501210043251e-06, 'epoch': 1.6} +2025-02-05 21:52:56 - ERROR - stderr - 53%|█████▎ | 11991/22434 [11:45:15<7:16:24, 2.51s/it] +2025-02-05 21:52:58 - ERROR - stderr - 53%|█████▎ | 11992/22434 [11:45:18<7:20:19, 2.53s/it] +2025-02-05 21:52:58 - ERROR - stderr - +2025-02-05 21:52:58 - ERROR - stderr - +2025-02-05 21:52:58 - INFO - stdout - {'loss': 0.6254, 'grad_norm': 1.2366061210632324, 'learning_rate': 9.368060342506813e-06, 'epoch': 1.6} +2025-02-05 21:52:58 - ERROR - stderr - 53%|█████▎ | 11992/22434 [11:45:18<7:20:19, 2.53s/it] +2025-02-05 21:53:01 - ERROR - stderr - 53%|█████▎ | 11993/22434 [11:45:20<7:20:21, 2.53s/it] +2025-02-05 21:53:01 - ERROR - stderr - +2025-02-05 21:53:01 - ERROR - stderr - +2025-02-05 21:53:01 - INFO - 
stdout - {'loss': 0.8084, 'grad_norm': 1.3501685857772827, 'learning_rate': 9.366619488142553e-06, 'epoch': 1.6} +2025-02-05 21:53:01 - ERROR - stderr - 53%|█████▎ | 11993/22434 [11:45:20<7:20:21, 2.53s/it] +2025-02-05 21:53:03 - ERROR - stderr - 53%|█████▎ | 11994/22434 [11:45:23<7:17:50, 2.52s/it] +2025-02-05 21:53:03 - ERROR - stderr - +2025-02-05 21:53:03 - ERROR - stderr - +2025-02-05 21:53:03 - INFO - stdout - {'loss': 0.6292, 'grad_norm': 1.1714917421340942, 'learning_rate': 9.365178646980497e-06, 'epoch': 1.6} +2025-02-05 21:53:03 - ERROR - stderr - 53%|█████▎ | 11994/22434 [11:45:23<7:17:50, 2.52s/it] +2025-02-05 21:53:06 - ERROR - stderr - 53%|█████▎ | 11995/22434 [11:45:25<7:25:17, 2.56s/it] +2025-02-05 21:53:06 - ERROR - stderr - +2025-02-05 21:53:06 - ERROR - stderr - +2025-02-05 21:53:06 - INFO - stdout - {'loss': 0.7336, 'grad_norm': 1.269734263420105, 'learning_rate': 9.36373781905069e-06, 'epoch': 1.6} +2025-02-05 21:53:06 - ERROR - stderr - 53%|█████▎ | 11995/22434 [11:45:26<7:25:17, 2.56s/it] +2025-02-05 21:53:08 - ERROR - stderr - 53%|█████▎ | 11996/22434 [11:45:28<7:19:27, 2.53s/it] +2025-02-05 21:53:08 - ERROR - stderr - +2025-02-05 21:53:08 - ERROR - stderr - +2025-02-05 21:53:08 - INFO - stdout - {'loss': 0.6342, 'grad_norm': 1.2444238662719727, 'learning_rate': 9.362297004383157e-06, 'epoch': 1.6} +2025-02-05 21:53:08 - ERROR - stderr - 53%|█████▎ | 11996/22434 [11:45:28<7:19:27, 2.53s/it] +2025-02-05 21:53:11 - ERROR - stderr - 53%|█████▎ | 11997/22434 [11:45:30<7:20:05, 2.53s/it] +2025-02-05 21:53:11 - ERROR - stderr - +2025-02-05 21:53:11 - ERROR - stderr - +2025-02-05 21:53:11 - INFO - stdout - {'loss': 0.7266, 'grad_norm': 1.1795012950897217, 'learning_rate': 9.36085620300793e-06, 'epoch': 1.6} +2025-02-05 21:53:11 - ERROR - stderr - 53%|█████▎ | 11997/22434 [11:45:31<7:20:05, 2.53s/it] +2025-02-05 21:53:13 - ERROR - stderr - 53%|█████▎ | 11998/22434 [11:45:33<7:16:51, 2.51s/it] +2025-02-05 21:53:13 - ERROR - stderr - +2025-02-05 
21:53:13 - ERROR - stderr - +2025-02-05 21:53:13 - INFO - stdout - {'loss': 0.6763, 'grad_norm': 1.2577054500579834, 'learning_rate': 9.359415414955049e-06, 'epoch': 1.6} +2025-02-05 21:53:13 - ERROR - stderr - 53%|█████▎ | 11998/22434 [11:45:33<7:16:51, 2.51s/it] +2025-02-05 21:53:16 - ERROR - stderr - 53%|█████▎ | 11999/22434 [11:45:36<7:19:15, 2.53s/it] +2025-02-05 21:53:16 - ERROR - stderr - +2025-02-05 21:53:16 - ERROR - stderr - +2025-02-05 21:53:16 - INFO - stdout - {'loss': 0.7404, 'grad_norm': 1.2974488735198975, 'learning_rate': 9.357974640254537e-06, 'epoch': 1.6} +2025-02-05 21:53:16 - ERROR - stderr - 53%|█████▎ | 11999/22434 [11:45:36<7:19:15, 2.53s/it] +2025-02-05 21:53:18 - ERROR - stderr - 53%|█████▎ | 12000/22434 [11:45:38<7:25:08, 2.56s/it] +2025-02-05 21:53:18 - ERROR - stderr - +2025-02-05 21:53:18 - ERROR - stderr - +2025-02-05 21:53:18 - INFO - stdout - {'loss': 0.7315, 'grad_norm': 1.3762050867080688, 'learning_rate': 9.356533878936434e-06, 'epoch': 1.6} +2025-02-05 21:53:18 - ERROR - stderr - 53%|█████▎ | 12000/22434 [11:45:38<7:25:08, 2.56s/it] +2025-02-05 21:53:21 - ERROR - stderr - 53%|█████▎ | 12001/22434 [11:45:41<7:29:18, 2.58s/it] +2025-02-05 21:53:21 - ERROR - stderr - +2025-02-05 21:53:21 - ERROR - stderr - +2025-02-05 21:53:21 - INFO - stdout - {'loss': 0.6351, 'grad_norm': 1.1575360298156738, 'learning_rate': 9.355093131030764e-06, 'epoch': 1.6} +2025-02-05 21:53:21 - ERROR - stderr - 53%|█████▎ | 12001/22434 [11:45:41<7:29:18, 2.58s/it] +2025-02-05 21:53:24 - ERROR - stderr - 53%|█████▎ | 12002/22434 [11:45:43<7:32:36, 2.60s/it] +2025-02-05 21:53:24 - ERROR - stderr - +2025-02-05 21:53:24 - ERROR - stderr - +2025-02-05 21:53:24 - INFO - stdout - {'loss': 0.7248, 'grad_norm': 1.2466273307800293, 'learning_rate': 9.353652396567558e-06, 'epoch': 1.6} +2025-02-05 21:53:24 - ERROR - stderr - 53%|█████▎ | 12002/22434 [11:45:43<7:32:36, 2.60s/it] +2025-02-05 21:53:26 - ERROR - stderr - 54%|█████▎ | 12003/22434 [11:45:46<7:28:23, 
2.58s/it] +2025-02-05 21:53:26 - ERROR - stderr - +2025-02-05 21:53:26 - ERROR - stderr - +2025-02-05 21:53:26 - INFO - stdout - {'loss': 0.7093, 'grad_norm': 1.2401421070098877, 'learning_rate': 9.352211675576852e-06, 'epoch': 1.61} +2025-02-05 21:53:26 - ERROR - stderr - 54%|█████▎ | 12003/22434 [11:45:46<7:28:23, 2.58s/it] +2025-02-05 21:53:29 - ERROR - stderr - 54%|█████▎ | 12004/22434 [11:45:49<7:29:46, 2.59s/it] +2025-02-05 21:53:29 - ERROR - stderr - +2025-02-05 21:53:29 - ERROR - stderr - +2025-02-05 21:53:29 - INFO - stdout - {'loss': 0.6597, 'grad_norm': 1.2683452367782593, 'learning_rate': 9.350770968088675e-06, 'epoch': 1.61} +2025-02-05 21:53:29 - ERROR - stderr - 54%|█████▎ | 12004/22434 [11:45:49<7:29:46, 2.59s/it] +2025-02-05 21:53:31 - ERROR - stderr - 54%|█████▎ | 12005/22434 [11:45:51<7:24:07, 2.56s/it] +2025-02-05 21:53:31 - ERROR - stderr - +2025-02-05 21:53:31 - ERROR - stderr - +2025-02-05 21:53:31 - INFO - stdout - {'loss': 0.7096, 'grad_norm': 1.2470391988754272, 'learning_rate': 9.349330274133051e-06, 'epoch': 1.61} +2025-02-05 21:53:31 - ERROR - stderr - 54%|█████▎ | 12005/22434 [11:45:51<7:24:07, 2.56s/it] +2025-02-05 21:53:34 - ERROR - stderr - 54%|█████▎ | 12006/22434 [11:45:53<7:18:26, 2.52s/it] +2025-02-05 21:53:34 - ERROR - stderr - +2025-02-05 21:53:34 - ERROR - stderr - +2025-02-05 21:53:34 - INFO - stdout - {'loss': 0.7248, 'grad_norm': 1.4760842323303223, 'learning_rate': 9.34788959374002e-06, 'epoch': 1.61} +2025-02-05 21:53:34 - ERROR - stderr - 54%|█████▎ | 12006/22434 [11:45:54<7:18:26, 2.52s/it] +2025-02-05 21:53:36 - ERROR - stderr - 54%|█████▎ | 12007/22434 [11:45:56<7:22:02, 2.54s/it] +2025-02-05 21:53:36 - ERROR - stderr - +2025-02-05 21:53:36 - ERROR - stderr - +2025-02-05 21:53:36 - INFO - stdout - {'loss': 0.8426, 'grad_norm': 1.3911807537078857, 'learning_rate': 9.346448926939603e-06, 'epoch': 1.61} +2025-02-05 21:53:36 - ERROR - stderr - 54%|█████▎ | 12007/22434 [11:45:56<7:22:02, 2.54s/it] +2025-02-05 21:53:39 - 
ERROR - stderr - 54%|█████▎ | 12008/22434 [11:45:59<7:25:07, 2.56s/it] +2025-02-05 21:53:39 - ERROR - stderr - +2025-02-05 21:53:39 - ERROR - stderr - +2025-02-05 21:53:39 - INFO - stdout - {'loss': 0.6926, 'grad_norm': 1.2453354597091675, 'learning_rate': 9.345008273761836e-06, 'epoch': 1.61} +2025-02-05 21:53:39 - ERROR - stderr - 54%|█████▎ | 12008/22434 [11:45:59<7:25:07, 2.56s/it] +2025-02-05 21:53:41 - ERROR - stderr - 54%|█████▎ | 12009/22434 [11:46:01<7:16:34, 2.51s/it] +2025-02-05 21:53:41 - ERROR - stderr - +2025-02-05 21:53:41 - ERROR - stderr - +2025-02-05 21:53:41 - INFO - stdout - {'loss': 0.7061, 'grad_norm': 1.3687275648117065, 'learning_rate': 9.343567634236742e-06, 'epoch': 1.61} +2025-02-05 21:53:41 - ERROR - stderr - 54%|█████▎ | 12009/22434 [11:46:01<7:16:34, 2.51s/it] +2025-02-05 21:53:44 - ERROR - stderr - 54%|█████▎ | 12010/22434 [11:46:04<7:12:53, 2.49s/it] +2025-02-05 21:53:44 - ERROR - stderr - +2025-02-05 21:53:44 - ERROR - stderr - +2025-02-05 21:53:44 - INFO - stdout - {'loss': 0.7058, 'grad_norm': 1.2641733884811401, 'learning_rate': 9.342127008394351e-06, 'epoch': 1.61} +2025-02-05 21:53:44 - ERROR - stderr - 54%|█████▎ | 12010/22434 [11:46:04<7:12:53, 2.49s/it] +2025-02-05 21:53:46 - ERROR - stderr - 54%|█████▎ | 12011/22434 [11:46:06<7:11:46, 2.49s/it] +2025-02-05 21:53:46 - ERROR - stderr - +2025-02-05 21:53:46 - ERROR - stderr - +2025-02-05 21:53:46 - INFO - stdout - {'loss': 0.7001, 'grad_norm': 1.227340579032898, 'learning_rate': 9.340686396264698e-06, 'epoch': 1.61} +2025-02-05 21:53:46 - ERROR - stderr - 54%|█████▎ | 12011/22434 [11:46:06<7:11:46, 2.49s/it] +2025-02-05 21:53:49 - ERROR - stderr - 54%|█████▎ | 12012/22434 [11:46:09<7:13:25, 2.50s/it] +2025-02-05 21:53:49 - ERROR - stderr - +2025-02-05 21:53:49 - ERROR - stderr - +2025-02-05 21:53:49 - INFO - stdout - {'loss': 0.6977, 'grad_norm': 1.179430603981018, 'learning_rate': 9.339245797877804e-06, 'epoch': 1.61} +2025-02-05 21:53:49 - ERROR - stderr - 54%|█████▎ | 
12012/22434 [11:46:09<7:13:25, 2.50s/it] +2025-02-05 21:53:51 - ERROR - stderr - 54%|█████▎ | 12013/22434 [11:46:11<7:11:10, 2.48s/it] +2025-02-05 21:53:51 - ERROR - stderr - +2025-02-05 21:53:51 - ERROR - stderr - +2025-02-05 21:53:51 - INFO - stdout - {'loss': 0.7306, 'grad_norm': 1.2974070310592651, 'learning_rate': 9.337805213263698e-06, 'epoch': 1.61} +2025-02-05 21:53:51 - ERROR - stderr - 54%|█████▎ | 12013/22434 [11:46:11<7:11:10, 2.48s/it] +2025-02-05 21:53:54 - ERROR - stderr - 54%|█████▎ | 12014/22434 [11:46:13<7:10:37, 2.48s/it] +2025-02-05 21:53:54 - ERROR - stderr - +2025-02-05 21:53:54 - ERROR - stderr - +2025-02-05 21:53:54 - INFO - stdout - {'loss': 0.7833, 'grad_norm': 1.2829806804656982, 'learning_rate': 9.33636464245241e-06, 'epoch': 1.61} +2025-02-05 21:53:54 - ERROR - stderr - 54%|█████▎ | 12014/22434 [11:46:13<7:10:37, 2.48s/it] +2025-02-05 21:53:56 - ERROR - stderr - 54%|█████▎ | 12015/22434 [11:46:16<7:11:05, 2.48s/it] +2025-02-05 21:53:56 - ERROR - stderr - +2025-02-05 21:53:56 - ERROR - stderr - +2025-02-05 21:53:56 - INFO - stdout - {'loss': 0.6155, 'grad_norm': 1.2561646699905396, 'learning_rate': 9.334924085473964e-06, 'epoch': 1.61} +2025-02-05 21:53:56 - ERROR - stderr - 54%|█████▎ | 12015/22434 [11:46:16<7:11:05, 2.48s/it] +2025-02-05 21:53:59 - ERROR - stderr - 54%|█████▎ | 12016/22434 [11:46:18<7:13:51, 2.50s/it] +2025-02-05 21:53:59 - ERROR - stderr - +2025-02-05 21:53:59 - ERROR - stderr - +2025-02-05 21:53:59 - INFO - stdout - {'loss': 0.7378, 'grad_norm': 1.3362137079238892, 'learning_rate': 9.333483542358391e-06, 'epoch': 1.61} +2025-02-05 21:53:59 - ERROR - stderr - 54%|█████▎ | 12016/22434 [11:46:19<7:13:51, 2.50s/it] +2025-02-05 21:54:02 - ERROR - stderr - 54%|█████▎ | 12017/22434 [11:46:22<7:46:20, 2.69s/it] +2025-02-05 21:54:02 - ERROR - stderr - +2025-02-05 21:54:02 - ERROR - stderr - +2025-02-05 21:54:02 - INFO - stdout - {'loss': 0.6869, 'grad_norm': 1.2331461906433105, 'learning_rate': 9.332043013135717e-06, 'epoch': 
1.61} +2025-02-05 21:54:02 - ERROR - stderr - 54%|█████▎ | 12017/22434 [11:46:22<7:46:20, 2.69s/it] +2025-02-05 21:54:04 - ERROR - stderr - 54%|█████▎ | 12018/22434 [11:46:24<7:42:13, 2.66s/it] +2025-02-05 21:54:04 - ERROR - stderr - +2025-02-05 21:54:04 - ERROR - stderr - +2025-02-05 21:54:04 - INFO - stdout - {'loss': 0.7059, 'grad_norm': 1.129315733909607, 'learning_rate': 9.330602497835962e-06, 'epoch': 1.61} +2025-02-05 21:54:04 - ERROR - stderr - 54%|█████▎ | 12018/22434 [11:46:24<7:42:13, 2.66s/it] +2025-02-05 21:54:07 - ERROR - stderr - 54%|█████▎ | 12019/22434 [11:46:27<7:39:05, 2.64s/it] +2025-02-05 21:54:07 - ERROR - stderr - +2025-02-05 21:54:07 - ERROR - stderr - +2025-02-05 21:54:07 - INFO - stdout - {'loss': 0.6443, 'grad_norm': 1.179667353630066, 'learning_rate': 9.329161996489162e-06, 'epoch': 1.61} +2025-02-05 21:54:07 - ERROR - stderr - 54%|█████▎ | 12019/22434 [11:46:27<7:39:05, 2.64s/it] +2025-02-05 21:54:10 - ERROR - stderr - 54%|█████▎ | 12020/22434 [11:46:29<7:30:04, 2.59s/it] +2025-02-05 21:54:10 - ERROR - stderr - +2025-02-05 21:54:10 - ERROR - stderr - +2025-02-05 21:54:10 - INFO - stdout - {'loss': 0.7746, 'grad_norm': 1.4535558223724365, 'learning_rate': 9.32772150912534e-06, 'epoch': 1.61} +2025-02-05 21:54:10 - ERROR - stderr - 54%|█████▎ | 12020/22434 [11:46:29<7:30:04, 2.59s/it] +2025-02-05 21:54:12 - ERROR - stderr - 54%|█████▎ | 12021/22434 [11:46:32<7:23:57, 2.56s/it] +2025-02-05 21:54:12 - ERROR - stderr - +2025-02-05 21:54:12 - ERROR - stderr - +2025-02-05 21:54:12 - INFO - stdout - {'loss': 0.7108, 'grad_norm': 1.175001621246338, 'learning_rate': 9.326281035774513e-06, 'epoch': 1.61} +2025-02-05 21:54:12 - ERROR - stderr - 54%|█████▎ | 12021/22434 [11:46:32<7:23:57, 2.56s/it] +2025-02-05 21:54:15 - ERROR - stderr - 54%|█████▎ | 12022/22434 [11:46:34<7:23:54, 2.56s/it] +2025-02-05 21:54:15 - ERROR - stderr - +2025-02-05 21:54:15 - ERROR - stderr - +2025-02-05 21:54:15 - INFO - stdout - {'loss': 0.6805, 'grad_norm': 
1.262392282485962, 'learning_rate': 9.324840576466718e-06, 'epoch': 1.61} +2025-02-05 21:54:15 - ERROR - stderr - 54%|█████▎ | 12022/22434 [11:46:34<7:23:54, 2.56s/it] +2025-02-05 21:54:17 - ERROR - stderr - 54%|█████▎ | 12023/22434 [11:46:37<7:17:25, 2.52s/it] +2025-02-05 21:54:17 - ERROR - stderr - +2025-02-05 21:54:17 - ERROR - stderr - +2025-02-05 21:54:17 - INFO - stdout - {'loss': 0.6432, 'grad_norm': 1.1330912113189697, 'learning_rate': 9.323400131231971e-06, 'epoch': 1.61} +2025-02-05 21:54:17 - ERROR - stderr - 54%|█████▎ | 12023/22434 [11:46:37<7:17:25, 2.52s/it] +2025-02-05 21:54:19 - ERROR - stderr - 54%|█████▎ | 12024/22434 [11:46:39<7:12:24, 2.49s/it] +2025-02-05 21:54:19 - ERROR - stderr - +2025-02-05 21:54:19 - ERROR - stderr - +2025-02-05 21:54:19 - INFO - stdout - {'loss': 0.6597, 'grad_norm': 1.2967466115951538, 'learning_rate': 9.321959700100306e-06, 'epoch': 1.61} +2025-02-05 21:54:19 - ERROR - stderr - 54%|█████▎ | 12024/22434 [11:46:39<7:12:24, 2.49s/it] +2025-02-05 21:54:22 - ERROR - stderr - 54%|█████▎ | 12025/22434 [11:46:42<7:23:06, 2.55s/it] +2025-02-05 21:54:22 - ERROR - stderr - +2025-02-05 21:54:22 - ERROR - stderr - +2025-02-05 21:54:22 - INFO - stdout - {'loss': 0.7339, 'grad_norm': 1.2467228174209595, 'learning_rate': 9.320519283101742e-06, 'epoch': 1.61} +2025-02-05 21:54:22 - ERROR - stderr - 54%|█████▎ | 12025/22434 [11:46:42<7:23:06, 2.55s/it] +2025-02-05 21:54:25 - ERROR - stderr - 54%|█████▎ | 12026/22434 [11:46:45<7:33:10, 2.61s/it] +2025-02-05 21:54:25 - ERROR - stderr - +2025-02-05 21:54:25 - ERROR - stderr - +2025-02-05 21:54:25 - INFO - stdout - {'loss': 0.7231, 'grad_norm': 1.2563709020614624, 'learning_rate': 9.319078880266299e-06, 'epoch': 1.61} +2025-02-05 21:54:25 - ERROR - stderr - 54%|█████▎ | 12026/22434 [11:46:45<7:33:10, 2.61s/it] +2025-02-05 21:54:27 - ERROR - stderr - 54%|█████▎ | 12027/22434 [11:46:47<7:30:54, 2.60s/it] +2025-02-05 21:54:27 - ERROR - stderr - +2025-02-05 21:54:27 - ERROR - stderr - 
+2025-02-05 21:54:27 - INFO - stdout - {'loss': 0.6697, 'grad_norm': 1.1622490882873535, 'learning_rate': 9.31763849162401e-06, 'epoch': 1.61} +2025-02-05 21:54:27 - ERROR - stderr - 54%|█████▎ | 12027/22434 [11:46:47<7:30:54, 2.60s/it] +2025-02-05 21:54:30 - ERROR - stderr - 54%|█████▎ | 12028/22434 [11:46:50<7:27:03, 2.58s/it] +2025-02-05 21:54:30 - ERROR - stderr - +2025-02-05 21:54:30 - ERROR - stderr - +2025-02-05 21:54:30 - INFO - stdout - {'loss': 0.7258, 'grad_norm': 1.245871901512146, 'learning_rate': 9.316198117204891e-06, 'epoch': 1.61} +2025-02-05 21:54:30 - ERROR - stderr - 54%|█████▎ | 12028/22434 [11:46:50<7:27:03, 2.58s/it] +2025-02-05 21:54:33 - ERROR - stderr - 54%|█████▎ | 12029/22434 [11:46:52<7:33:41, 2.62s/it] +2025-02-05 21:54:33 - ERROR - stderr - +2025-02-05 21:54:33 - ERROR - stderr - +2025-02-05 21:54:33 - INFO - stdout - {'loss': 0.7238, 'grad_norm': 1.226513147354126, 'learning_rate': 9.314757757038966e-06, 'epoch': 1.61} +2025-02-05 21:54:33 - ERROR - stderr - 54%|█████▎ | 12029/22434 [11:46:52<7:33:41, 2.62s/it] +2025-02-05 21:54:35 - ERROR - stderr - 54%|█████▎ | 12030/22434 [11:46:55<7:26:50, 2.58s/it] +2025-02-05 21:54:35 - ERROR - stderr - +2025-02-05 21:54:35 - ERROR - stderr - +2025-02-05 21:54:35 - INFO - stdout - {'loss': 0.6659, 'grad_norm': 1.2418453693389893, 'learning_rate': 9.313317411156265e-06, 'epoch': 1.61} +2025-02-05 21:54:35 - ERROR - stderr - 54%|█████▎ | 12030/22434 [11:46:55<7:26:50, 2.58s/it] +2025-02-05 21:54:38 - ERROR - stderr - 54%|█████▎ | 12031/22434 [11:46:57<7:18:12, 2.53s/it] +2025-02-05 21:54:38 - ERROR - stderr - +2025-02-05 21:54:38 - ERROR - stderr - +2025-02-05 21:54:38 - INFO - stdout - {'loss': 0.7141, 'grad_norm': 1.2755424976348877, 'learning_rate': 9.311877079586799e-06, 'epoch': 1.61} +2025-02-05 21:54:38 - ERROR - stderr - 54%|█████▎ | 12031/22434 [11:46:57<7:18:12, 2.53s/it] +2025-02-05 21:54:40 - ERROR - stderr - 54%|█████▎ | 12032/22434 [11:47:00<7:21:25, 2.55s/it] +2025-02-05 21:54:40 - 
ERROR - stderr - +2025-02-05 21:54:40 - ERROR - stderr - +2025-02-05 21:54:40 - INFO - stdout - {'loss': 0.7384, 'grad_norm': 1.2642672061920166, 'learning_rate': 9.310436762360603e-06, 'epoch': 1.61} +2025-02-05 21:54:40 - ERROR - stderr - 54%|█████▎ | 12032/22434 [11:47:00<7:21:25, 2.55s/it] +2025-02-05 21:54:43 - ERROR - stderr - 54%|█████▎ | 12033/22434 [11:47:03<7:36:00, 2.63s/it] +2025-02-05 21:54:43 - ERROR - stderr - +2025-02-05 21:54:43 - ERROR - stderr - +2025-02-05 21:54:43 - INFO - stdout - {'loss': 0.6932, 'grad_norm': 1.2287489175796509, 'learning_rate': 9.308996459507692e-06, 'epoch': 1.61} +2025-02-05 21:54:43 - ERROR - stderr - 54%|█████▎ | 12033/22434 [11:47:03<7:36:00, 2.63s/it] +2025-02-05 21:54:45 - ERROR - stderr - 54%|█████▎ | 12034/22434 [11:47:05<7:30:07, 2.60s/it] +2025-02-05 21:54:46 - ERROR - stderr - +2025-02-05 21:54:46 - ERROR - stderr - +2025-02-05 21:54:46 - INFO - stdout - {'loss': 0.6413, 'grad_norm': 1.1590118408203125, 'learning_rate': 9.307556171058085e-06, 'epoch': 1.61} +2025-02-05 21:54:46 - ERROR - stderr - 54%|█████▎ | 12034/22434 [11:47:05<7:30:07, 2.60s/it] +2025-02-05 21:54:48 - ERROR - stderr - 54%|█████▎ | 12035/22434 [11:47:08<7:36:26, 2.63s/it] +2025-02-05 21:54:48 - ERROR - stderr - +2025-02-05 21:54:48 - ERROR - stderr - +2025-02-05 21:54:48 - INFO - stdout - {'loss': 0.649, 'grad_norm': 1.1187297105789185, 'learning_rate': 9.306115897041808e-06, 'epoch': 1.61} +2025-02-05 21:54:48 - ERROR - stderr - 54%|█████▎ | 12035/22434 [11:47:08<7:36:26, 2.63s/it] +2025-02-05 21:54:51 - ERROR - stderr - 54%|█████▎ | 12036/22434 [11:47:10<7:29:38, 2.59s/it] +2025-02-05 21:54:51 - ERROR - stderr - +2025-02-05 21:54:51 - ERROR - stderr - +2025-02-05 21:54:51 - INFO - stdout - {'loss': 0.6241, 'grad_norm': 1.058962106704712, 'learning_rate': 9.304675637488884e-06, 'epoch': 1.61} +2025-02-05 21:54:51 - ERROR - stderr - 54%|█████▎ | 12036/22434 [11:47:11<7:29:38, 2.59s/it] +2025-02-05 21:54:53 - ERROR - stderr - 54%|█████▎ | 
12037/22434 [11:47:13<7:30:42, 2.60s/it] +2025-02-05 21:54:53 - ERROR - stderr - +2025-02-05 21:54:53 - ERROR - stderr - +2025-02-05 21:54:53 - INFO - stdout - {'loss': 0.7907, 'grad_norm': 1.2891771793365479, 'learning_rate': 9.303235392429328e-06, 'epoch': 1.61} +2025-02-05 21:54:53 - ERROR - stderr - 54%|█████▎ | 12037/22434 [11:47:13<7:30:42, 2.60s/it] +2025-02-05 21:54:56 - ERROR - stderr - 54%|█████▎ | 12038/22434 [11:47:16<7:26:22, 2.58s/it] +2025-02-05 21:54:56 - ERROR - stderr - +2025-02-05 21:54:56 - ERROR - stderr - +2025-02-05 21:54:56 - INFO - stdout - {'loss': 0.6882, 'grad_norm': 1.1527633666992188, 'learning_rate': 9.301795161893166e-06, 'epoch': 1.61} +2025-02-05 21:54:56 - ERROR - stderr - 54%|█████▎ | 12038/22434 [11:47:16<7:26:22, 2.58s/it] +2025-02-05 21:54:59 - ERROR - stderr - 54%|█████▎ | 12039/22434 [11:47:18<7:38:21, 2.65s/it] +2025-02-05 21:54:59 - ERROR - stderr - +2025-02-05 21:54:59 - ERROR - stderr - +2025-02-05 21:54:59 - INFO - stdout - {'loss': 0.7568, 'grad_norm': 1.1895396709442139, 'learning_rate': 9.30035494591041e-06, 'epoch': 1.61} +2025-02-05 21:54:59 - ERROR - stderr - 54%|█████▎ | 12039/22434 [11:47:18<7:38:21, 2.65s/it] +2025-02-05 21:55:01 - ERROR - stderr - 54%|█████▎ | 12040/22434 [11:47:21<7:27:52, 2.59s/it] +2025-02-05 21:55:01 - ERROR - stderr - +2025-02-05 21:55:01 - ERROR - stderr - +2025-02-05 21:55:01 - INFO - stdout - {'loss': 0.7377, 'grad_norm': 1.1801073551177979, 'learning_rate': 9.298914744511093e-06, 'epoch': 1.61} +2025-02-05 21:55:01 - ERROR - stderr - 54%|█████▎ | 12040/22434 [11:47:21<7:27:52, 2.59s/it] +2025-02-05 21:55:04 - ERROR - stderr - 54%|█████▎ | 12041/22434 [11:47:23<7:23:06, 2.56s/it] +2025-02-05 21:55:04 - ERROR - stderr - +2025-02-05 21:55:04 - ERROR - stderr - +2025-02-05 21:55:04 - INFO - stdout - {'loss': 0.67, 'grad_norm': 1.1962250471115112, 'learning_rate': 9.297474557725225e-06, 'epoch': 1.61} +2025-02-05 21:55:04 - ERROR - stderr - 54%|█████▎ | 12041/22434 [11:47:23<7:23:06, 
2.56s/it] +2025-02-05 21:55:06 - ERROR - stderr - 54%|█████▎ | 12042/22434 [11:47:26<7:20:45, 2.54s/it] +2025-02-05 21:55:06 - ERROR - stderr - +2025-02-05 21:55:06 - ERROR - stderr - +2025-02-05 21:55:06 - INFO - stdout - {'loss': 0.7016, 'grad_norm': 1.3731606006622314, 'learning_rate': 9.296034385582823e-06, 'epoch': 1.61} +2025-02-05 21:55:06 - ERROR - stderr - 54%|█████▎ | 12042/22434 [11:47:26<7:20:45, 2.54s/it] +2025-02-05 21:55:09 - ERROR - stderr - 54%|█████▎ | 12043/22434 [11:47:28<7:21:34, 2.55s/it] +2025-02-05 21:55:09 - ERROR - stderr - +2025-02-05 21:55:09 - ERROR - stderr - +2025-02-05 21:55:09 - INFO - stdout - {'loss': 0.7167, 'grad_norm': 1.2359511852264404, 'learning_rate': 9.294594228113917e-06, 'epoch': 1.61} +2025-02-05 21:55:09 - ERROR - stderr - 54%|█████▎ | 12043/22434 [11:47:28<7:21:34, 2.55s/it] +2025-02-05 21:55:11 - ERROR - stderr - 54%|█████▎ | 12044/22434 [11:47:31<7:18:26, 2.53s/it] +2025-02-05 21:55:11 - ERROR - stderr - +2025-02-05 21:55:11 - ERROR - stderr - +2025-02-05 21:55:11 - INFO - stdout - {'loss': 0.7258, 'grad_norm': 1.1612904071807861, 'learning_rate': 9.293154085348519e-06, 'epoch': 1.61} +2025-02-05 21:55:11 - ERROR - stderr - 54%|█████▎ | 12044/22434 [11:47:31<7:18:26, 2.53s/it] +2025-02-05 21:55:14 - ERROR - stderr - 54%|█████▎ | 12045/22434 [11:47:33<7:17:45, 2.53s/it] +2025-02-05 21:55:14 - ERROR - stderr - +2025-02-05 21:55:14 - ERROR - stderr - +2025-02-05 21:55:14 - INFO - stdout - {'loss': 0.652, 'grad_norm': 1.1835960149765015, 'learning_rate': 9.291713957316642e-06, 'epoch': 1.61} +2025-02-05 21:55:14 - ERROR - stderr - 54%|█████▎ | 12045/22434 [11:47:33<7:17:45, 2.53s/it] +2025-02-05 21:55:16 - ERROR - stderr - 54%|█████▎ | 12046/22434 [11:47:36<7:13:53, 2.51s/it] +2025-02-05 21:55:16 - ERROR - stderr - +2025-02-05 21:55:16 - ERROR - stderr - +2025-02-05 21:55:16 - INFO - stdout - {'loss': 0.7622, 'grad_norm': 1.3312981128692627, 'learning_rate': 9.290273844048316e-06, 'epoch': 1.61} +2025-02-05 21:55:16 - 
ERROR - stderr - 54%|█████▎ | 12046/22434 [11:47:36<7:13:53, 2.51s/it] +2025-02-05 21:55:19 - ERROR - stderr - 54%|█████▎ | 12047/22434 [11:47:39<7:22:20, 2.56s/it] +2025-02-05 21:55:19 - ERROR - stderr - +2025-02-05 21:55:19 - ERROR - stderr - +2025-02-05 21:55:19 - INFO - stdout - {'loss': 0.6502, 'grad_norm': 1.198354959487915, 'learning_rate': 9.288833745573547e-06, 'epoch': 1.61} +2025-02-05 21:55:19 - ERROR - stderr - 54%|█████▎ | 12047/22434 [11:47:39<7:22:20, 2.56s/it] +2025-02-05 21:55:21 - ERROR - stderr - 54%|█████▎ | 12048/22434 [11:47:41<7:17:36, 2.53s/it] +2025-02-05 21:55:21 - ERROR - stderr - +2025-02-05 21:55:21 - ERROR - stderr - +2025-02-05 21:55:21 - INFO - stdout - {'loss': 0.7136, 'grad_norm': 1.1594412326812744, 'learning_rate': 9.287393661922361e-06, 'epoch': 1.61} +2025-02-05 21:55:21 - ERROR - stderr - 54%|█████▎ | 12048/22434 [11:47:41<7:17:36, 2.53s/it] +2025-02-05 21:55:24 - ERROR - stderr - 54%|█████▎ | 12049/22434 [11:47:44<7:14:56, 2.51s/it] +2025-02-05 21:55:24 - ERROR - stderr - +2025-02-05 21:55:24 - ERROR - stderr - +2025-02-05 21:55:24 - INFO - stdout - {'loss': 0.6986, 'grad_norm': 1.2196731567382812, 'learning_rate': 9.285953593124774e-06, 'epoch': 1.61} +2025-02-05 21:55:24 - ERROR - stderr - 54%|█████▎ | 12049/22434 [11:47:44<7:14:56, 2.51s/it] +2025-02-05 21:55:26 - ERROR - stderr - 54%|█████▎ | 12050/22434 [11:47:46<7:20:32, 2.55s/it] +2025-02-05 21:55:26 - ERROR - stderr - +2025-02-05 21:55:26 - ERROR - stderr - +2025-02-05 21:55:26 - INFO - stdout - {'loss': 0.692, 'grad_norm': 1.0901768207550049, 'learning_rate': 9.284513539210798e-06, 'epoch': 1.61} +2025-02-05 21:55:26 - ERROR - stderr - 54%|█████▎ | 12050/22434 [11:47:46<7:20:32, 2.55s/it] +2025-02-05 21:55:29 - ERROR - stderr - 54%|█████▎ | 12051/22434 [11:47:49<7:19:13, 2.54s/it] +2025-02-05 21:55:29 - ERROR - stderr - +2025-02-05 21:55:29 - ERROR - stderr - +2025-02-05 21:55:29 - INFO - stdout - {'loss': 0.592, 'grad_norm': 1.1655796766281128, 'learning_rate': 
9.283073500210456e-06, 'epoch': 1.61} +2025-02-05 21:55:29 - ERROR - stderr - 54%|█████▎ | 12051/22434 [11:47:49<7:19:13, 2.54s/it] +2025-02-05 21:55:31 - ERROR - stderr - 54%|█████▎ | 12052/22434 [11:47:51<7:19:17, 2.54s/it] +2025-02-05 21:55:31 - ERROR - stderr - +2025-02-05 21:55:31 - ERROR - stderr - +2025-02-05 21:55:31 - INFO - stdout - {'loss': 0.641, 'grad_norm': 1.0902522802352905, 'learning_rate': 9.28163347615376e-06, 'epoch': 1.61} +2025-02-05 21:55:31 - ERROR - stderr - 54%|█████▎ | 12052/22434 [11:47:51<7:19:17, 2.54s/it] +2025-02-05 21:55:34 - ERROR - stderr - 54%|█████▎ | 12053/22434 [11:47:54<7:29:16, 2.60s/it] +2025-02-05 21:55:34 - ERROR - stderr - +2025-02-05 21:55:34 - ERROR - stderr - +2025-02-05 21:55:34 - INFO - stdout - {'loss': 0.6631, 'grad_norm': 1.2153087854385376, 'learning_rate': 9.280193467070722e-06, 'epoch': 1.61} +2025-02-05 21:55:34 - ERROR - stderr - 54%|█████▎ | 12053/22434 [11:47:54<7:29:16, 2.60s/it] +2025-02-05 21:55:37 - ERROR - stderr - 54%|█████▎ | 12054/22434 [11:47:57<7:28:59, 2.60s/it] +2025-02-05 21:55:37 - ERROR - stderr - +2025-02-05 21:55:37 - ERROR - stderr - +2025-02-05 21:55:37 - INFO - stdout - {'loss': 0.6138, 'grad_norm': 1.1211543083190918, 'learning_rate': 9.278753472991366e-06, 'epoch': 1.61} +2025-02-05 21:55:37 - ERROR - stderr - 54%|█████▎ | 12054/22434 [11:47:57<7:28:59, 2.60s/it] +2025-02-05 21:55:39 - ERROR - stderr - 54%|█████▎ | 12055/22434 [11:47:59<7:36:11, 2.64s/it] +2025-02-05 21:55:40 - ERROR - stderr - +2025-02-05 21:55:40 - ERROR - stderr - +2025-02-05 21:55:40 - INFO - stdout - {'loss': 0.665, 'grad_norm': 1.160091757774353, 'learning_rate': 9.2773134939457e-06, 'epoch': 1.61} +2025-02-05 21:55:40 - ERROR - stderr - 54%|█████▎ | 12055/22434 [11:47:59<7:36:11, 2.64s/it] +2025-02-05 21:55:42 - ERROR - stderr - 54%|█████▎ | 12056/22434 [11:48:02<7:36:15, 2.64s/it] +2025-02-05 21:55:42 - ERROR - stderr - +2025-02-05 21:55:42 - ERROR - stderr - +2025-02-05 21:55:42 - INFO - stdout - {'loss': 
0.6839, 'grad_norm': 1.2990282773971558, 'learning_rate': 9.275873529963751e-06, 'epoch': 1.61} +2025-02-05 21:55:42 - ERROR - stderr - 54%|█████▎ | 12056/22434 [11:48:02<7:36:15, 2.64s/it] +2025-02-05 21:55:45 - ERROR - stderr - 54%|█████▎ | 12057/22434 [11:48:04<7:28:26, 2.59s/it] +2025-02-05 21:55:45 - ERROR - stderr - +2025-02-05 21:55:45 - ERROR - stderr - +2025-02-05 21:55:45 - INFO - stdout - {'loss': 0.6482, 'grad_norm': 1.2587974071502686, 'learning_rate': 9.274433581075521e-06, 'epoch': 1.61} +2025-02-05 21:55:45 - ERROR - stderr - 54%|█████▎ | 12057/22434 [11:48:04<7:28:26, 2.59s/it] +2025-02-05 21:55:47 - ERROR - stderr - 54%|█████▎ | 12058/22434 [11:48:07<7:29:49, 2.60s/it] +2025-02-05 21:55:47 - ERROR - stderr - +2025-02-05 21:55:47 - ERROR - stderr - +2025-02-05 21:55:47 - INFO - stdout - {'loss': 0.7529, 'grad_norm': 1.3218412399291992, 'learning_rate': 9.272993647311027e-06, 'epoch': 1.61} +2025-02-05 21:55:47 - ERROR - stderr - 54%|█████▎ | 12058/22434 [11:48:07<7:29:49, 2.60s/it] +2025-02-05 21:55:50 - ERROR - stderr - 54%|█████▍ | 12059/22434 [11:48:10<7:26:39, 2.58s/it] +2025-02-05 21:55:50 - ERROR - stderr - +2025-02-05 21:55:50 - ERROR - stderr - +2025-02-05 21:55:50 - INFO - stdout - {'loss': 0.7153, 'grad_norm': 1.2248531579971313, 'learning_rate': 9.271553728700287e-06, 'epoch': 1.61} +2025-02-05 21:55:50 - ERROR - stderr - 54%|█████▍ | 12059/22434 [11:48:10<7:26:39, 2.58s/it] +2025-02-05 21:55:52 - ERROR - stderr - 54%|█████▍ | 12060/22434 [11:48:12<7:23:09, 2.56s/it] +2025-02-05 21:55:52 - ERROR - stderr - +2025-02-05 21:55:52 - ERROR - stderr - +2025-02-05 21:55:52 - INFO - stdout - {'loss': 0.6549, 'grad_norm': 1.153381109237671, 'learning_rate': 9.270113825273311e-06, 'epoch': 1.61} +2025-02-05 21:55:52 - ERROR - stderr - 54%|█████▍ | 12060/22434 [11:48:12<7:23:09, 2.56s/it] +2025-02-05 21:55:55 - ERROR - stderr - 54%|█████▍ | 12061/22434 [11:48:15<7:19:33, 2.54s/it] +2025-02-05 21:55:55 - ERROR - stderr - +2025-02-05 21:55:55 - ERROR 
- stderr - +2025-02-05 21:55:55 - INFO - stdout - {'loss': 0.7035, 'grad_norm': 1.3794678449630737, 'learning_rate': 9.268673937060113e-06, 'epoch': 1.61} +2025-02-05 21:55:55 - ERROR - stderr - 54%|█████▍ | 12061/22434 [11:48:15<7:19:33, 2.54s/it] +2025-02-05 21:55:57 - ERROR - stderr - 54%|█████▍ | 12062/22434 [11:48:17<7:17:06, 2.53s/it] +2025-02-05 21:55:57 - ERROR - stderr - +2025-02-05 21:55:57 - ERROR - stderr - +2025-02-05 21:55:57 - INFO - stdout - {'loss': 0.6875, 'grad_norm': 1.1833741664886475, 'learning_rate': 9.26723406409071e-06, 'epoch': 1.61} +2025-02-05 21:55:57 - ERROR - stderr - 54%|█████▍ | 12062/22434 [11:48:17<7:17:06, 2.53s/it] +2025-02-05 21:56:00 - ERROR - stderr - 54%|█████▍ | 12063/22434 [11:48:20<7:14:07, 2.51s/it] +2025-02-05 21:56:00 - ERROR - stderr - +2025-02-05 21:56:00 - ERROR - stderr - +2025-02-05 21:56:00 - INFO - stdout - {'loss': 0.6272, 'grad_norm': 1.1151732206344604, 'learning_rate': 9.265794206395108e-06, 'epoch': 1.61} +2025-02-05 21:56:00 - ERROR - stderr - 54%|█████▍ | 12063/22434 [11:48:20<7:14:07, 2.51s/it] +2025-02-05 21:56:02 - ERROR - stderr - 54%|█████▍ | 12064/22434 [11:48:22<7:15:01, 2.52s/it] +2025-02-05 21:56:02 - ERROR - stderr - +2025-02-05 21:56:02 - ERROR - stderr - +2025-02-05 21:56:02 - INFO - stdout - {'loss': 0.7366, 'grad_norm': 1.2501327991485596, 'learning_rate': 9.264354364003327e-06, 'epoch': 1.61} +2025-02-05 21:56:02 - ERROR - stderr - 54%|█████▍ | 12064/22434 [11:48:22<7:15:01, 2.52s/it] +2025-02-05 21:56:05 - ERROR - stderr - 54%|█████▍ | 12065/22434 [11:48:25<7:15:11, 2.52s/it] +2025-02-05 21:56:05 - ERROR - stderr - +2025-02-05 21:56:05 - ERROR - stderr - +2025-02-05 21:56:05 - INFO - stdout - {'loss': 0.6866, 'grad_norm': 1.3475669622421265, 'learning_rate': 9.262914536945377e-06, 'epoch': 1.61} +2025-02-05 21:56:05 - ERROR - stderr - 54%|█████▍ | 12065/22434 [11:48:25<7:15:11, 2.52s/it] +2025-02-05 21:56:07 - ERROR - stderr - 54%|█████▍ | 12066/22434 [11:48:27<7:11:15, 2.50s/it] 
+2025-02-05 21:56:07 - ERROR - stderr - +2025-02-05 21:56:07 - ERROR - stderr - +2025-02-05 21:56:07 - INFO - stdout - {'loss': 0.6487, 'grad_norm': 1.276896357536316, 'learning_rate': 9.261474725251261e-06, 'epoch': 1.61} +2025-02-05 21:56:07 - ERROR - stderr - 54%|█████▍ | 12066/22434 [11:48:27<7:11:15, 2.50s/it] +2025-02-05 21:56:10 - ERROR - stderr - 54%|█████▍ | 12067/22434 [11:48:30<7:12:06, 2.50s/it] +2025-02-05 21:56:10 - ERROR - stderr - +2025-02-05 21:56:10 - ERROR - stderr - +2025-02-05 21:56:10 - INFO - stdout - {'loss': 0.8003, 'grad_norm': 1.254490852355957, 'learning_rate': 9.260034928951002e-06, 'epoch': 1.61} +2025-02-05 21:56:10 - ERROR - stderr - 54%|█████▍ | 12067/22434 [11:48:30<7:12:06, 2.50s/it] +2025-02-05 21:56:13 - ERROR - stderr - 54%|█████▍ | 12068/22434 [11:48:32<7:26:10, 2.58s/it] +2025-02-05 21:56:13 - ERROR - stderr - +2025-02-05 21:56:13 - ERROR - stderr - +2025-02-05 21:56:13 - INFO - stdout - {'loss': 0.764, 'grad_norm': 1.4288378953933716, 'learning_rate': 9.258595148074604e-06, 'epoch': 1.61} +2025-02-05 21:56:13 - ERROR - stderr - 54%|█████▍ | 12068/22434 [11:48:32<7:26:10, 2.58s/it] +2025-02-05 21:56:15 - ERROR - stderr - 54%|█████▍ | 12069/22434 [11:48:35<7:22:34, 2.56s/it] +2025-02-05 21:56:15 - ERROR - stderr - +2025-02-05 21:56:15 - ERROR - stderr - +2025-02-05 21:56:15 - INFO - stdout - {'loss': 0.7145, 'grad_norm': 1.2978192567825317, 'learning_rate': 9.257155382652086e-06, 'epoch': 1.61} +2025-02-05 21:56:15 - ERROR - stderr - 54%|█████▍ | 12069/22434 [11:48:35<7:22:34, 2.56s/it] +2025-02-05 21:56:18 - ERROR - stderr - 54%|█████▍ | 12070/22434 [11:48:37<7:18:19, 2.54s/it] +2025-02-05 21:56:18 - ERROR - stderr - +2025-02-05 21:56:18 - ERROR - stderr - +2025-02-05 21:56:18 - INFO - stdout - {'loss': 0.6853, 'grad_norm': 1.259850025177002, 'learning_rate': 9.255715632713452e-06, 'epoch': 1.61} +2025-02-05 21:56:18 - ERROR - stderr - 54%|█████▍ | 12070/22434 [11:48:37<7:18:19, 2.54s/it] +2025-02-05 21:56:20 - ERROR - stderr 
- 54%|█████▍ | 12071/22434 [11:48:40<7:19:08, 2.54s/it] +2025-02-05 21:56:20 - ERROR - stderr - +2025-02-05 21:56:20 - ERROR - stderr - +2025-02-05 21:56:20 - INFO - stdout - {'loss': 0.6892, 'grad_norm': 1.2559986114501953, 'learning_rate': 9.254275898288709e-06, 'epoch': 1.61} +2025-02-05 21:56:20 - ERROR - stderr - 54%|█████▍ | 12071/22434 [11:48:40<7:19:08, 2.54s/it] +2025-02-05 21:56:23 - ERROR - stderr - 54%|█████▍ | 12072/22434 [11:48:42<7:18:50, 2.54s/it] +2025-02-05 21:56:23 - ERROR - stderr - +2025-02-05 21:56:23 - ERROR - stderr - +2025-02-05 21:56:23 - INFO - stdout - {'loss': 0.7321, 'grad_norm': 1.2673776149749756, 'learning_rate': 9.252836179407876e-06, 'epoch': 1.61} +2025-02-05 21:56:23 - ERROR - stderr - 54%|█████▍ | 12072/22434 [11:48:42<7:18:50, 2.54s/it] +2025-02-05 21:56:25 - ERROR - stderr - 54%|█████▍ | 12073/22434 [11:48:45<7:25:55, 2.58s/it] +2025-02-05 21:56:25 - ERROR - stderr - +2025-02-05 21:56:25 - ERROR - stderr - +2025-02-05 21:56:25 - INFO - stdout - {'loss': 0.6052, 'grad_norm': 1.0878359079360962, 'learning_rate': 9.251396476100955e-06, 'epoch': 1.61} +2025-02-05 21:56:25 - ERROR - stderr - 54%|█████▍ | 12073/22434 [11:48:45<7:25:55, 2.58s/it] +2025-02-05 21:56:28 - ERROR - stderr - 54%|█████▍ | 12074/22434 [11:48:48<7:26:50, 2.59s/it] +2025-02-05 21:56:28 - ERROR - stderr - +2025-02-05 21:56:28 - ERROR - stderr - +2025-02-05 21:56:28 - INFO - stdout - {'loss': 0.6998, 'grad_norm': 1.0732975006103516, 'learning_rate': 9.249956788397956e-06, 'epoch': 1.61} +2025-02-05 21:56:28 - ERROR - stderr - 54%|█████▍ | 12074/22434 [11:48:48<7:26:50, 2.59s/it] +2025-02-05 21:56:31 - ERROR - stderr - 54%|█████▍ | 12075/22434 [11:48:50<7:30:30, 2.61s/it] +2025-02-05 21:56:31 - ERROR - stderr - +2025-02-05 21:56:31 - ERROR - stderr - +2025-02-05 21:56:31 - INFO - stdout - {'loss': 0.7021, 'grad_norm': 1.3220690488815308, 'learning_rate': 9.248517116328897e-06, 'epoch': 1.61} +2025-02-05 21:56:31 - ERROR - stderr - 54%|█████▍ | 12075/22434 
[11:48:50<7:30:30, 2.61s/it] +2025-02-05 21:56:33 - ERROR - stderr - 54%|█████▍ | 12076/22434 [11:48:53<7:27:31, 2.59s/it] +2025-02-05 21:56:33 - ERROR - stderr - +2025-02-05 21:56:33 - ERROR - stderr - +2025-02-05 21:56:33 - INFO - stdout - {'loss': 0.7002, 'grad_norm': 1.2409942150115967, 'learning_rate': 9.247077459923773e-06, 'epoch': 1.61} +2025-02-05 21:56:33 - ERROR - stderr - 54%|█████▍ | 12076/22434 [11:48:53<7:27:31, 2.59s/it] +2025-02-05 21:56:36 - ERROR - stderr - 54%|█████▍ | 12077/22434 [11:48:55<7:24:44, 2.58s/it] +2025-02-05 21:56:36 - ERROR - stderr - +2025-02-05 21:56:36 - ERROR - stderr - +2025-02-05 21:56:36 - INFO - stdout - {'loss': 0.7866, 'grad_norm': 1.3409161567687988, 'learning_rate': 9.245637819212602e-06, 'epoch': 1.62} +2025-02-05 21:56:36 - ERROR - stderr - 54%|█████▍ | 12077/22434 [11:48:55<7:24:44, 2.58s/it] +2025-02-05 21:56:38 - ERROR - stderr - 54%|█████▍ | 12078/22434 [11:48:58<7:26:23, 2.59s/it] +2025-02-05 21:56:38 - ERROR - stderr - +2025-02-05 21:56:38 - ERROR - stderr - +2025-02-05 21:56:38 - INFO - stdout - {'loss': 0.7429, 'grad_norm': 1.257031798362732, 'learning_rate': 9.244198194225392e-06, 'epoch': 1.62} +2025-02-05 21:56:38 - ERROR - stderr - 54%|█████▍ | 12078/22434 [11:48:58<7:26:23, 2.59s/it] +2025-02-05 21:56:41 - ERROR - stderr - 54%|█████▍ | 12079/22434 [11:49:01<7:25:27, 2.58s/it] +2025-02-05 21:56:41 - ERROR - stderr - +2025-02-05 21:56:41 - ERROR - stderr - +2025-02-05 21:56:41 - INFO - stdout - {'loss': 0.7193, 'grad_norm': 1.286256194114685, 'learning_rate': 9.24275858499214e-06, 'epoch': 1.62} +2025-02-05 21:56:41 - ERROR - stderr - 54%|█████▍ | 12079/22434 [11:49:01<7:25:27, 2.58s/it] +2025-02-05 21:56:43 - ERROR - stderr - 54%|█████▍ | 12080/22434 [11:49:03<7:21:56, 2.56s/it] +2025-02-05 21:56:43 - ERROR - stderr - +2025-02-05 21:56:43 - ERROR - stderr - +2025-02-05 21:56:43 - INFO - stdout - {'loss': 0.7503, 'grad_norm': 1.266829252243042, 'learning_rate': 9.241318991542865e-06, 'epoch': 1.62} 
+2025-02-05 21:56:43 - ERROR - stderr - 54%|█████▍ | 12080/22434 [11:49:03<7:21:56, 2.56s/it] +2025-02-05 21:56:46 - ERROR - stderr - 54%|█████▍ | 12081/22434 [11:49:06<7:14:26, 2.52s/it] +2025-02-05 21:56:46 - ERROR - stderr - +2025-02-05 21:56:46 - ERROR - stderr - +2025-02-05 21:56:46 - INFO - stdout - {'loss': 0.6455, 'grad_norm': 1.106519341468811, 'learning_rate': 9.239879413907571e-06, 'epoch': 1.62} +2025-02-05 21:56:46 - ERROR - stderr - 54%|█████▍ | 12081/22434 [11:49:06<7:14:26, 2.52s/it] +2025-02-05 21:56:48 - ERROR - stderr - 54%|█████▍ | 12082/22434 [11:49:08<7:14:59, 2.52s/it] +2025-02-05 21:56:48 - ERROR - stderr - +2025-02-05 21:56:48 - ERROR - stderr - +2025-02-05 21:56:48 - INFO - stdout - {'loss': 0.7102, 'grad_norm': 1.3861799240112305, 'learning_rate': 9.23843985211626e-06, 'epoch': 1.62} +2025-02-05 21:56:48 - ERROR - stderr - 54%|█████▍ | 12082/22434 [11:49:08<7:14:59, 2.52s/it] +2025-02-05 21:56:51 - ERROR - stderr - 54%|█████▍ | 12083/22434 [11:49:11<7:17:27, 2.54s/it] +2025-02-05 21:56:51 - ERROR - stderr - +2025-02-05 21:56:51 - ERROR - stderr - +2025-02-05 21:56:51 - INFO - stdout - {'loss': 0.707, 'grad_norm': 1.2288228273391724, 'learning_rate': 9.237000306198944e-06, 'epoch': 1.62} +2025-02-05 21:56:51 - ERROR - stderr - 54%|█████▍ | 12083/22434 [11:49:11<7:17:27, 2.54s/it] +2025-02-05 21:56:53 - ERROR - stderr - 54%|█████▍ | 12084/22434 [11:49:13<7:17:57, 2.54s/it] +2025-02-05 21:56:53 - ERROR - stderr - +2025-02-05 21:56:53 - ERROR - stderr - +2025-02-05 21:56:53 - INFO - stdout - {'loss': 0.636, 'grad_norm': 1.1513835191726685, 'learning_rate': 9.235560776185623e-06, 'epoch': 1.62} +2025-02-05 21:56:53 - ERROR - stderr - 54%|█████▍ | 12084/22434 [11:49:13<7:17:57, 2.54s/it] +2025-02-05 21:56:56 - ERROR - stderr - 54%|█████▍ | 12085/22434 [11:49:16<7:29:19, 2.61s/it] +2025-02-05 21:56:56 - ERROR - stderr - +2025-02-05 21:56:56 - ERROR - stderr - +2025-02-05 21:56:56 - INFO - stdout - {'loss': 0.6712, 'grad_norm': 1.15359628200531, 
'learning_rate': 9.234121262106312e-06, 'epoch': 1.62} +2025-02-05 21:56:56 - ERROR - stderr - 54%|█████▍ | 12085/22434 [11:49:16<7:29:19, 2.61s/it] +2025-02-05 21:56:59 - ERROR - stderr - 54%|█████▍ | 12086/22434 [11:49:18<7:22:39, 2.57s/it] +2025-02-05 21:56:59 - ERROR - stderr - +2025-02-05 21:56:59 - ERROR - stderr - +2025-02-05 21:56:59 - INFO - stdout - {'loss': 0.6942, 'grad_norm': 1.3144891262054443, 'learning_rate': 9.232681763991006e-06, 'epoch': 1.62} +2025-02-05 21:56:59 - ERROR - stderr - 54%|█████▍ | 12086/22434 [11:49:18<7:22:39, 2.57s/it] +2025-02-05 21:57:01 - ERROR - stderr - 54%|█████▍ | 12087/22434 [11:49:21<7:16:27, 2.53s/it] +2025-02-05 21:57:01 - ERROR - stderr - +2025-02-05 21:57:01 - ERROR - stderr - +2025-02-05 21:57:01 - INFO - stdout - {'loss': 0.5859, 'grad_norm': 1.0848164558410645, 'learning_rate': 9.231242281869714e-06, 'epoch': 1.62} +2025-02-05 21:57:01 - ERROR - stderr - 54%|█████▍ | 12087/22434 [11:49:21<7:16:27, 2.53s/it] +2025-02-05 21:57:04 - ERROR - stderr - 54%|█████▍ | 12088/22434 [11:49:23<7:15:38, 2.53s/it] +2025-02-05 21:57:04 - ERROR - stderr - +2025-02-05 21:57:04 - ERROR - stderr - +2025-02-05 21:57:04 - INFO - stdout - {'loss': 0.7549, 'grad_norm': 1.2962980270385742, 'learning_rate': 9.229802815772444e-06, 'epoch': 1.62} +2025-02-05 21:57:04 - ERROR - stderr - 54%|█████▍ | 12088/22434 [11:49:23<7:15:38, 2.53s/it] +2025-02-05 21:57:06 - ERROR - stderr - 54%|█████▍ | 12089/22434 [11:49:26<7:17:33, 2.54s/it] +2025-02-05 21:57:06 - ERROR - stderr - +2025-02-05 21:57:06 - ERROR - stderr - +2025-02-05 21:57:06 - INFO - stdout - {'loss': 0.7137, 'grad_norm': 1.1979405879974365, 'learning_rate': 9.228363365729198e-06, 'epoch': 1.62} +2025-02-05 21:57:06 - ERROR - stderr - 54%|█████▍ | 12089/22434 [11:49:26<7:17:33, 2.54s/it] +2025-02-05 21:57:09 - ERROR - stderr - 54%|█████▍ | 12090/22434 [11:49:28<7:18:20, 2.54s/it] +2025-02-05 21:57:09 - ERROR - stderr - +2025-02-05 21:57:09 - ERROR - stderr - +2025-02-05 21:57:09 - INFO 
- stdout - {'loss': 0.6677, 'grad_norm': 1.1655186414718628, 'learning_rate': 9.226923931769973e-06, 'epoch': 1.62} +2025-02-05 21:57:09 - ERROR - stderr - 54%|█████▍ | 12090/22434 [11:49:29<7:18:20, 2.54s/it] +2025-02-05 21:57:11 - ERROR - stderr - 54%|█████▍ | 12091/22434 [11:49:31<7:17:13, 2.54s/it] +2025-02-05 21:57:11 - ERROR - stderr - +2025-02-05 21:57:11 - ERROR - stderr - +2025-02-05 21:57:11 - INFO - stdout - {'loss': 0.6559, 'grad_norm': 1.3142768144607544, 'learning_rate': 9.225484513924786e-06, 'epoch': 1.62} +2025-02-05 21:57:11 - ERROR - stderr - 54%|█████▍ | 12091/22434 [11:49:31<7:17:13, 2.54s/it] +2025-02-05 21:57:14 - ERROR - stderr - 54%|█████▍ | 12092/22434 [11:49:34<7:19:17, 2.55s/it] +2025-02-05 21:57:14 - ERROR - stderr - +2025-02-05 21:57:14 - ERROR - stderr - +2025-02-05 21:57:14 - INFO - stdout - {'loss': 0.7982, 'grad_norm': 1.4054341316223145, 'learning_rate': 9.224045112223627e-06, 'epoch': 1.62} +2025-02-05 21:57:14 - ERROR - stderr - 54%|█████▍ | 12092/22434 [11:49:34<7:19:17, 2.55s/it] +2025-02-05 21:57:16 - ERROR - stderr - 54%|█████▍ | 12093/22434 [11:49:36<7:18:41, 2.55s/it] +2025-02-05 21:57:16 - ERROR - stderr - +2025-02-05 21:57:16 - ERROR - stderr - +2025-02-05 21:57:16 - INFO - stdout - {'loss': 0.7677, 'grad_norm': 1.242814540863037, 'learning_rate': 9.222605726696509e-06, 'epoch': 1.62} +2025-02-05 21:57:16 - ERROR - stderr - 54%|█████▍ | 12093/22434 [11:49:36<7:18:41, 2.55s/it] +2025-02-05 21:57:19 - ERROR - stderr - 54%|█████▍ | 12094/22434 [11:49:39<7:15:28, 2.53s/it] +2025-02-05 21:57:19 - ERROR - stderr - +2025-02-05 21:57:19 - ERROR - stderr - +2025-02-05 21:57:19 - INFO - stdout - {'loss': 0.7372, 'grad_norm': 1.3603601455688477, 'learning_rate': 9.22116635737343e-06, 'epoch': 1.62} +2025-02-05 21:57:19 - ERROR - stderr - 54%|█████▍ | 12094/22434 [11:49:39<7:15:28, 2.53s/it] +2025-02-05 21:57:21 - ERROR - stderr - 54%|█████▍ | 12095/22434 [11:49:41<7:15:00, 2.52s/it] +2025-02-05 21:57:21 - ERROR - stderr - 
+2025-02-05 21:57:21 - ERROR - stderr - +2025-02-05 21:57:21 - INFO - stdout - {'loss': 0.6837, 'grad_norm': 1.1627155542373657, 'learning_rate': 9.21972700428439e-06, 'epoch': 1.62} +2025-02-05 21:57:21 - ERROR - stderr - 54%|█████▍ | 12095/22434 [11:49:41<7:15:00, 2.52s/it] +2025-02-05 21:57:24 - ERROR - stderr - 54%|█████▍ | 12096/22434 [11:49:44<7:12:23, 2.51s/it] +2025-02-05 21:57:24 - ERROR - stderr - +2025-02-05 21:57:24 - ERROR - stderr - +2025-02-05 21:57:24 - INFO - stdout - {'loss': 0.651, 'grad_norm': 1.0614711046218872, 'learning_rate': 9.2182876674594e-06, 'epoch': 1.62} +2025-02-05 21:57:24 - ERROR - stderr - 54%|█████▍ | 12096/22434 [11:49:44<7:12:23, 2.51s/it] +2025-02-05 21:57:26 - ERROR - stderr - 54%|█████▍ | 12097/22434 [11:49:46<7:12:43, 2.51s/it] +2025-02-05 21:57:26 - ERROR - stderr - +2025-02-05 21:57:26 - ERROR - stderr - +2025-02-05 21:57:26 - INFO - stdout - {'loss': 0.6873, 'grad_norm': 1.309293270111084, 'learning_rate': 9.216848346928455e-06, 'epoch': 1.62} +2025-02-05 21:57:26 - ERROR - stderr - 54%|█████▍ | 12097/22434 [11:49:46<7:12:43, 2.51s/it] +2025-02-05 21:57:29 - ERROR - stderr - 54%|█████▍ | 12098/22434 [11:49:49<7:11:55, 2.51s/it] +2025-02-05 21:57:29 - ERROR - stderr - +2025-02-05 21:57:29 - ERROR - stderr - +2025-02-05 21:57:29 - INFO - stdout - {'loss': 0.6364, 'grad_norm': 1.2187672853469849, 'learning_rate': 9.215409042721553e-06, 'epoch': 1.62} +2025-02-05 21:57:29 - ERROR - stderr - 54%|█████▍ | 12098/22434 [11:49:49<7:11:55, 2.51s/it] +2025-02-05 21:57:31 - ERROR - stderr - 54%|█████▍ | 12099/22434 [11:49:51<7:11:54, 2.51s/it] +2025-02-05 21:57:31 - ERROR - stderr - +2025-02-05 21:57:31 - ERROR - stderr - +2025-02-05 21:57:31 - INFO - stdout - {'loss': 0.6541, 'grad_norm': 1.1157554388046265, 'learning_rate': 9.213969754868699e-06, 'epoch': 1.62} +2025-02-05 21:57:31 - ERROR - stderr - 54%|█████▍ | 12099/22434 [11:49:51<7:11:54, 2.51s/it] +2025-02-05 21:57:34 - ERROR - stderr - 54%|█████▍ | 12100/22434 
[11:49:54<7:14:57, 2.53s/it] +2025-02-05 21:57:34 - ERROR - stderr - +2025-02-05 21:57:34 - ERROR - stderr - +2025-02-05 21:57:34 - INFO - stdout - {'loss': 0.6874, 'grad_norm': 1.1662760972976685, 'learning_rate': 9.212530483399891e-06, 'epoch': 1.62} +2025-02-05 21:57:34 - ERROR - stderr - 54%|█████▍ | 12100/22434 [11:49:54<7:14:57, 2.53s/it] +2025-02-05 21:57:36 - ERROR - stderr - 54%|█████▍ | 12101/22434 [11:49:56<7:14:34, 2.52s/it] +2025-02-05 21:57:36 - ERROR - stderr - +2025-02-05 21:57:36 - ERROR - stderr - +2025-02-05 21:57:36 - INFO - stdout - {'loss': 0.7848, 'grad_norm': 1.288918137550354, 'learning_rate': 9.211091228345137e-06, 'epoch': 1.62} +2025-02-05 21:57:36 - ERROR - stderr - 54%|█████▍ | 12101/22434 [11:49:56<7:14:34, 2.52s/it] +2025-02-05 21:57:39 - ERROR - stderr - 54%|█████▍ | 12102/22434 [11:49:59<7:14:55, 2.53s/it] +2025-02-05 21:57:39 - ERROR - stderr - +2025-02-05 21:57:39 - ERROR - stderr - +2025-02-05 21:57:39 - INFO - stdout - {'loss': 0.7237, 'grad_norm': 1.3652713298797607, 'learning_rate': 9.209651989734431e-06, 'epoch': 1.62} +2025-02-05 21:57:39 - ERROR - stderr - 54%|█████▍ | 12102/22434 [11:49:59<7:14:55, 2.53s/it] +2025-02-05 21:57:42 - ERROR - stderr - 54%|█████▍ | 12103/22434 [11:50:01<7:16:13, 2.53s/it] +2025-02-05 21:57:42 - ERROR - stderr - +2025-02-05 21:57:42 - ERROR - stderr - +2025-02-05 21:57:42 - INFO - stdout - {'loss': 0.6412, 'grad_norm': 1.1923158168792725, 'learning_rate': 9.20821276759777e-06, 'epoch': 1.62} +2025-02-05 21:57:42 - ERROR - stderr - 54%|█████▍ | 12103/22434 [11:50:01<7:16:13, 2.53s/it] +2025-02-05 21:57:44 - ERROR - stderr - 54%|█████▍ | 12104/22434 [11:50:04<7:21:42, 2.57s/it] +2025-02-05 21:57:44 - ERROR - stderr - +2025-02-05 21:57:44 - ERROR - stderr - +2025-02-05 21:57:44 - INFO - stdout - {'loss': 0.6046, 'grad_norm': 1.1096692085266113, 'learning_rate': 9.206773561965158e-06, 'epoch': 1.62} +2025-02-05 21:57:44 - ERROR - stderr - 54%|█████▍ | 12104/22434 [11:50:04<7:21:42, 2.57s/it] 
+2025-02-05 21:57:47 - ERROR - stderr - 54%|█████▍ | 12105/22434 [11:50:06<7:17:15, 2.54s/it] +2025-02-05 21:57:47 - ERROR - stderr - +2025-02-05 21:57:47 - ERROR - stderr - +2025-02-05 21:57:47 - INFO - stdout - {'loss': 0.6234, 'grad_norm': 1.127875804901123, 'learning_rate': 9.205334372866593e-06, 'epoch': 1.62} +2025-02-05 21:57:47 - ERROR - stderr - 54%|█████▍ | 12105/22434 [11:50:06<7:17:15, 2.54s/it] +2025-02-05 21:57:49 - ERROR - stderr - 54%|█████▍ | 12106/22434 [11:50:09<7:16:13, 2.53s/it] +2025-02-05 21:57:49 - ERROR - stderr - +2025-02-05 21:57:49 - ERROR - stderr - +2025-02-05 21:57:49 - INFO - stdout - {'loss': 0.6862, 'grad_norm': 1.257360577583313, 'learning_rate': 9.203895200332069e-06, 'epoch': 1.62} +2025-02-05 21:57:49 - ERROR - stderr - 54%|█████▍ | 12106/22434 [11:50:09<7:16:13, 2.53s/it] +2025-02-05 21:57:52 - ERROR - stderr - 54%|█████▍ | 12107/22434 [11:50:11<7:12:51, 2.51s/it] +2025-02-05 21:57:52 - ERROR - stderr - +2025-02-05 21:57:52 - ERROR - stderr - +2025-02-05 21:57:52 - INFO - stdout - {'loss': 0.7081, 'grad_norm': 1.3931533098220825, 'learning_rate': 9.20245604439159e-06, 'epoch': 1.62} +2025-02-05 21:57:52 - ERROR - stderr - 54%|█████▍ | 12107/22434 [11:50:11<7:12:51, 2.51s/it] +2025-02-05 21:57:54 - ERROR - stderr - 54%|█████▍ | 12108/22434 [11:50:14<7:12:23, 2.51s/it] +2025-02-05 21:57:54 - ERROR - stderr - +2025-02-05 21:57:54 - ERROR - stderr - +2025-02-05 21:57:54 - INFO - stdout - {'loss': 0.6241, 'grad_norm': 1.0419423580169678, 'learning_rate': 9.20101690507515e-06, 'epoch': 1.62} +2025-02-05 21:57:54 - ERROR - stderr - 54%|█████▍ | 12108/22434 [11:50:14<7:12:23, 2.51s/it] +2025-02-05 21:57:57 - ERROR - stderr - 54%|█████▍ | 12109/22434 [11:50:16<7:11:49, 2.51s/it] +2025-02-05 21:57:57 - ERROR - stderr - +2025-02-05 21:57:57 - ERROR - stderr - +2025-02-05 21:57:57 - INFO - stdout - {'loss': 0.7334, 'grad_norm': 1.205247402191162, 'learning_rate': 9.199577782412752e-06, 'epoch': 1.62} +2025-02-05 21:57:57 - ERROR - stderr 
- 54%|█████▍ | 12109/22434 [11:50:16<7:11:49, 2.51s/it] +2025-02-05 21:57:59 - ERROR - stderr - 54%|█████▍ | 12110/22434 [11:50:19<7:13:42, 2.52s/it] +2025-02-05 21:57:59 - ERROR - stderr - +2025-02-05 21:57:59 - ERROR - stderr - +2025-02-05 21:57:59 - INFO - stdout - {'loss': 0.7165, 'grad_norm': 1.338371753692627, 'learning_rate': 9.198138676434387e-06, 'epoch': 1.62} +2025-02-05 21:57:59 - ERROR - stderr - 54%|█████▍ | 12110/22434 [11:50:19<7:13:42, 2.52s/it] +2025-02-05 21:58:02 - ERROR - stderr - 54%|█████▍ | 12111/22434 [11:50:22<7:16:20, 2.54s/it] +2025-02-05 21:58:02 - ERROR - stderr - +2025-02-05 21:58:02 - ERROR - stderr - +2025-02-05 21:58:02 - INFO - stdout - {'loss': 0.7499, 'grad_norm': 1.276588797569275, 'learning_rate': 9.196699587170053e-06, 'epoch': 1.62} +2025-02-05 21:58:02 - ERROR - stderr - 54%|█████▍ | 12111/22434 [11:50:22<7:16:20, 2.54s/it] +2025-02-05 21:58:04 - ERROR - stderr - 54%|█████▍ | 12112/22434 [11:50:24<7:15:29, 2.53s/it] +2025-02-05 21:58:04 - ERROR - stderr - +2025-02-05 21:58:04 - ERROR - stderr - +2025-02-05 21:58:04 - INFO - stdout - {'loss': 0.7614, 'grad_norm': 1.4011569023132324, 'learning_rate': 9.195260514649748e-06, 'epoch': 1.62} +2025-02-05 21:58:04 - ERROR - stderr - 54%|█████▍ | 12112/22434 [11:50:24<7:15:29, 2.53s/it] +2025-02-05 21:58:07 - ERROR - stderr - 54%|█████▍ | 12113/22434 [11:50:27<7:16:32, 2.54s/it] +2025-02-05 21:58:07 - ERROR - stderr - +2025-02-05 21:58:07 - ERROR - stderr - +2025-02-05 21:58:07 - INFO - stdout - {'loss': 0.6607, 'grad_norm': 1.2182762622833252, 'learning_rate': 9.19382145890347e-06, 'epoch': 1.62} +2025-02-05 21:58:07 - ERROR - stderr - 54%|█████▍ | 12113/22434 [11:50:27<7:16:32, 2.54s/it] +2025-02-05 21:58:09 - ERROR - stderr - 54%|█████▍ | 12114/22434 [11:50:29<7:15:55, 2.53s/it] +2025-02-05 21:58:09 - ERROR - stderr - +2025-02-05 21:58:09 - ERROR - stderr - +2025-02-05 21:58:09 - INFO - stdout - {'loss': 0.6295, 'grad_norm': 1.2512187957763672, 'learning_rate': 
9.192382419961208e-06, 'epoch': 1.62} +2025-02-05 21:58:09 - ERROR - stderr - 54%|█████▍ | 12114/22434 [11:50:29<7:15:55, 2.53s/it] +2025-02-05 21:58:12 - ERROR - stderr - 54%|█████▍ | 12115/22434 [11:50:32<7:20:07, 2.56s/it] +2025-02-05 21:58:12 - ERROR - stderr - +2025-02-05 21:58:12 - ERROR - stderr - +2025-02-05 21:58:12 - INFO - stdout - {'loss': 0.7387, 'grad_norm': 1.3180410861968994, 'learning_rate': 9.190943397852966e-06, 'epoch': 1.62} +2025-02-05 21:58:12 - ERROR - stderr - 54%|█████▍ | 12115/22434 [11:50:32<7:20:07, 2.56s/it] +2025-02-05 21:58:15 - ERROR - stderr - 54%|█████▍ | 12116/22434 [11:50:34<7:20:45, 2.56s/it] +2025-02-05 21:58:15 - ERROR - stderr - +2025-02-05 21:58:15 - ERROR - stderr - +2025-02-05 21:58:15 - INFO - stdout - {'loss': 0.6517, 'grad_norm': 1.247375249862671, 'learning_rate': 9.18950439260873e-06, 'epoch': 1.62} +2025-02-05 21:58:15 - ERROR - stderr - 54%|█████▍ | 12116/22434 [11:50:34<7:20:45, 2.56s/it] +2025-02-05 21:58:17 - ERROR - stderr - 54%|█████▍ | 12117/22434 [11:50:37<7:18:05, 2.55s/it] +2025-02-05 21:58:17 - ERROR - stderr - +2025-02-05 21:58:17 - ERROR - stderr - +2025-02-05 21:58:17 - INFO - stdout - {'loss': 0.7197, 'grad_norm': 1.2085763216018677, 'learning_rate': 9.188065404258502e-06, 'epoch': 1.62} +2025-02-05 21:58:17 - ERROR - stderr - 54%|█████▍ | 12117/22434 [11:50:37<7:18:05, 2.55s/it] +2025-02-05 21:58:20 - ERROR - stderr - 54%|█████▍ | 12118/22434 [11:50:39<7:21:03, 2.57s/it] +2025-02-05 21:58:20 - ERROR - stderr - +2025-02-05 21:58:20 - ERROR - stderr - +2025-02-05 21:58:20 - INFO - stdout - {'loss': 0.6507, 'grad_norm': 1.1321659088134766, 'learning_rate': 9.186626432832275e-06, 'epoch': 1.62} +2025-02-05 21:58:20 - ERROR - stderr - 54%|█████▍ | 12118/22434 [11:50:39<7:21:03, 2.57s/it] +2025-02-05 21:58:22 - ERROR - stderr - 54%|█████▍ | 12119/22434 [11:50:42<7:18:19, 2.55s/it] +2025-02-05 21:58:22 - ERROR - stderr - +2025-02-05 21:58:22 - ERROR - stderr - +2025-02-05 21:58:22 - INFO - stdout - {'loss': 
0.6821, 'grad_norm': 1.339453935623169, 'learning_rate': 9.185187478360037e-06, 'epoch': 1.62} +2025-02-05 21:58:22 - ERROR - stderr - 54%|█████▍ | 12119/22434 [11:50:42<7:18:19, 2.55s/it] +2025-02-05 21:58:25 - ERROR - stderr - 54%|█████▍ | 12120/22434 [11:50:45<7:35:04, 2.65s/it] +2025-02-05 21:58:25 - ERROR - stderr - +2025-02-05 21:58:25 - ERROR - stderr - +2025-02-05 21:58:25 - INFO - stdout - {'loss': 0.7445, 'grad_norm': 1.2092570066452026, 'learning_rate': 9.18374854087179e-06, 'epoch': 1.62} +2025-02-05 21:58:25 - ERROR - stderr - 54%|█████▍ | 12120/22434 [11:50:45<7:35:04, 2.65s/it] +2025-02-05 21:58:28 - ERROR - stderr - 54%|█████▍ | 12121/22434 [11:50:47<7:32:27, 2.63s/it] +2025-02-05 21:58:28 - ERROR - stderr - +2025-02-05 21:58:28 - ERROR - stderr - +2025-02-05 21:58:28 - INFO - stdout - {'loss': 0.7565, 'grad_norm': 1.3333697319030762, 'learning_rate': 9.182309620397525e-06, 'epoch': 1.62} +2025-02-05 21:58:28 - ERROR - stderr - 54%|█████▍ | 12121/22434 [11:50:47<7:32:27, 2.63s/it] +2025-02-05 21:58:30 - ERROR - stderr - 54%|█████▍ | 12122/22434 [11:50:50<7:39:37, 2.67s/it] +2025-02-05 21:58:30 - ERROR - stderr - +2025-02-05 21:58:30 - ERROR - stderr - +2025-02-05 21:58:30 - INFO - stdout - {'loss': 0.6198, 'grad_norm': 1.3010880947113037, 'learning_rate': 9.18087071696723e-06, 'epoch': 1.62} +2025-02-05 21:58:30 - ERROR - stderr - 54%|█████▍ | 12122/22434 [11:50:50<7:39:37, 2.67s/it] +2025-02-05 21:58:33 - ERROR - stderr - 54%|█████▍ | 12123/22434 [11:50:53<7:41:42, 2.69s/it] +2025-02-05 21:58:33 - ERROR - stderr - +2025-02-05 21:58:33 - ERROR - stderr - +2025-02-05 21:58:33 - INFO - stdout - {'loss': 0.7412, 'grad_norm': 1.3878114223480225, 'learning_rate': 9.179431830610905e-06, 'epoch': 1.62} +2025-02-05 21:58:33 - ERROR - stderr - 54%|█████▍ | 12123/22434 [11:50:53<7:41:42, 2.69s/it] +2025-02-05 21:58:36 - ERROR - stderr - 54%|█████▍ | 12124/22434 [11:50:55<7:33:25, 2.64s/it] +2025-02-05 21:58:36 - ERROR - stderr - +2025-02-05 21:58:36 - ERROR - 
stderr - +2025-02-05 21:58:36 - INFO - stdout - {'loss': 0.6429, 'grad_norm': 1.1799821853637695, 'learning_rate': 9.177992961358533e-06, 'epoch': 1.62} +2025-02-05 21:58:36 - ERROR - stderr - 54%|█████▍ | 12124/22434 [11:50:55<7:33:25, 2.64s/it] +2025-02-05 21:58:38 - ERROR - stderr - 54%|█████▍ | 12125/22434 [11:50:58<7:28:18, 2.61s/it] +2025-02-05 21:58:38 - ERROR - stderr - +2025-02-05 21:58:38 - ERROR - stderr - +2025-02-05 21:58:38 - INFO - stdout - {'loss': 0.6978, 'grad_norm': 1.3221498727798462, 'learning_rate': 9.176554109240115e-06, 'epoch': 1.62} +2025-02-05 21:58:38 - ERROR - stderr - 54%|█████▍ | 12125/22434 [11:50:58<7:28:18, 2.61s/it] +2025-02-05 21:58:41 - ERROR - stderr - 54%|█████▍ | 12126/22434 [11:51:01<7:24:34, 2.59s/it] +2025-02-05 21:58:41 - ERROR - stderr - +2025-02-05 21:58:41 - ERROR - stderr - +2025-02-05 21:58:41 - INFO - stdout - {'loss': 0.7287, 'grad_norm': 1.2941234111785889, 'learning_rate': 9.175115274285639e-06, 'epoch': 1.62} +2025-02-05 21:58:41 - ERROR - stderr - 54%|█████▍ | 12126/22434 [11:51:01<7:24:34, 2.59s/it] +2025-02-05 21:58:43 - ERROR - stderr - 54%|█████▍ | 12127/22434 [11:51:03<7:27:45, 2.61s/it] +2025-02-05 21:58:43 - ERROR - stderr - +2025-02-05 21:58:43 - ERROR - stderr - +2025-02-05 21:58:43 - INFO - stdout - {'loss': 0.6771, 'grad_norm': 1.2035632133483887, 'learning_rate': 9.173676456525091e-06, 'epoch': 1.62} +2025-02-05 21:58:43 - ERROR - stderr - 54%|█████▍ | 12127/22434 [11:51:03<7:27:45, 2.61s/it] +2025-02-05 21:58:46 - ERROR - stderr - 54%|█████▍ | 12128/22434 [11:51:06<7:22:04, 2.57s/it] +2025-02-05 21:58:46 - ERROR - stderr - +2025-02-05 21:58:46 - ERROR - stderr - +2025-02-05 21:58:46 - INFO - stdout - {'loss': 0.6868, 'grad_norm': 1.2470849752426147, 'learning_rate': 9.172237655988472e-06, 'epoch': 1.62} +2025-02-05 21:58:46 - ERROR - stderr - 54%|█████▍ | 12128/22434 [11:51:06<7:22:04, 2.57s/it] +2025-02-05 21:58:48 - ERROR - stderr - 54%|█████▍ | 12129/22434 [11:51:08<7:16:35, 2.54s/it] 
+2025-02-05 21:58:48 - ERROR - stderr - +2025-02-05 21:58:48 - ERROR - stderr - +2025-02-05 21:58:48 - INFO - stdout - {'loss': 0.6765, 'grad_norm': 1.2868101596832275, 'learning_rate': 9.170798872705767e-06, 'epoch': 1.62} +2025-02-05 21:58:48 - ERROR - stderr - 54%|█████▍ | 12129/22434 [11:51:08<7:16:35, 2.54s/it] +2025-02-05 21:58:51 - ERROR - stderr - 54%|█████▍ | 12130/22434 [11:51:11<7:14:33, 2.53s/it] +2025-02-05 21:58:51 - ERROR - stderr - +2025-02-05 21:58:51 - ERROR - stderr - +2025-02-05 21:58:51 - INFO - stdout - {'loss': 0.8034, 'grad_norm': 1.4334930181503296, 'learning_rate': 9.169360106706962e-06, 'epoch': 1.62} +2025-02-05 21:58:51 - ERROR - stderr - 54%|█████▍ | 12130/22434 [11:51:11<7:14:33, 2.53s/it] +2025-02-05 21:58:53 - ERROR - stderr - 54%|█████▍ | 12131/22434 [11:51:13<7:14:55, 2.53s/it] +2025-02-05 21:58:53 - ERROR - stderr - +2025-02-05 21:58:53 - ERROR - stderr - +2025-02-05 21:58:53 - INFO - stdout - {'loss': 0.7017, 'grad_norm': 1.3387612104415894, 'learning_rate': 9.167921358022053e-06, 'epoch': 1.62} +2025-02-05 21:58:53 - ERROR - stderr - 54%|█████▍ | 12131/22434 [11:51:13<7:14:55, 2.53s/it] +2025-02-05 21:58:56 - ERROR - stderr - 54%|█████▍ | 12132/22434 [11:51:16<7:15:50, 2.54s/it] +2025-02-05 21:58:56 - ERROR - stderr - +2025-02-05 21:58:56 - ERROR - stderr - +2025-02-05 21:58:56 - INFO - stdout - {'loss': 0.7781, 'grad_norm': 1.3273720741271973, 'learning_rate': 9.166482626681024e-06, 'epoch': 1.62} +2025-02-05 21:58:56 - ERROR - stderr - 54%|█████▍ | 12132/22434 [11:51:16<7:15:50, 2.54s/it] +2025-02-05 21:58:58 - ERROR - stderr - 54%|█████▍ | 12133/22434 [11:51:18<7:15:32, 2.54s/it] +2025-02-05 21:58:59 - ERROR - stderr - +2025-02-05 21:58:59 - ERROR - stderr - +2025-02-05 21:58:59 - INFO - stdout - {'loss': 0.6407, 'grad_norm': 1.2694358825683594, 'learning_rate': 9.165043912713873e-06, 'epoch': 1.62} +2025-02-05 21:58:59 - ERROR - stderr - 54%|█████▍ | 12133/22434 [11:51:18<7:15:32, 2.54s/it] +2025-02-05 21:59:01 - ERROR - 
stderr - 54%|█████▍ | 12134/22434 [11:51:21<7:14:10, 2.53s/it] +2025-02-05 21:59:01 - ERROR - stderr - +2025-02-05 21:59:01 - ERROR - stderr - +2025-02-05 21:59:01 - INFO - stdout - {'loss': 0.6553, 'grad_norm': 1.1428979635238647, 'learning_rate': 9.16360521615058e-06, 'epoch': 1.62} +2025-02-05 21:59:01 - ERROR - stderr - 54%|█████▍ | 12134/22434 [11:51:21<7:14:10, 2.53s/it] +2025-02-05 21:59:04 - ERROR - stderr - 54%|█████▍ | 12135/22434 [11:51:24<7:30:40, 2.63s/it] +2025-02-05 21:59:04 - ERROR - stderr - +2025-02-05 21:59:04 - ERROR - stderr - +2025-02-05 21:59:04 - INFO - stdout - {'loss': 0.6378, 'grad_norm': 1.1809850931167603, 'learning_rate': 9.162166537021134e-06, 'epoch': 1.62} +2025-02-05 21:59:04 - ERROR - stderr - 54%|█████▍ | 12135/22434 [11:51:24<7:30:40, 2.63s/it] +2025-02-05 21:59:06 - ERROR - stderr - 54%|█████▍ | 12136/22434 [11:51:26<7:20:38, 2.57s/it] +2025-02-05 21:59:06 - ERROR - stderr - +2025-02-05 21:59:06 - ERROR - stderr - +2025-02-05 21:59:06 - INFO - stdout - {'loss': 0.6299, 'grad_norm': 1.2608532905578613, 'learning_rate': 9.16072787535553e-06, 'epoch': 1.62} +2025-02-05 21:59:06 - ERROR - stderr - 54%|█████▍ | 12136/22434 [11:51:26<7:20:38, 2.57s/it] +2025-02-05 21:59:09 - ERROR - stderr - 54%|█████▍ | 12137/22434 [11:51:29<7:17:32, 2.55s/it] +2025-02-05 21:59:09 - ERROR - stderr - +2025-02-05 21:59:09 - ERROR - stderr - +2025-02-05 21:59:09 - INFO - stdout - {'loss': 0.6872, 'grad_norm': 1.1949703693389893, 'learning_rate': 9.159289231183745e-06, 'epoch': 1.62} +2025-02-05 21:59:09 - ERROR - stderr - 54%|█████▍ | 12137/22434 [11:51:29<7:17:32, 2.55s/it] +2025-02-05 21:59:11 - ERROR - stderr - 54%|█████▍ | 12138/22434 [11:51:31<7:18:43, 2.56s/it] +2025-02-05 21:59:11 - ERROR - stderr - +2025-02-05 21:59:11 - ERROR - stderr - +2025-02-05 21:59:11 - INFO - stdout - {'loss': 0.6902, 'grad_norm': 1.221248984336853, 'learning_rate': 9.15785060453577e-06, 'epoch': 1.62} +2025-02-05 21:59:11 - ERROR - stderr - 54%|█████▍ | 12138/22434 
[11:51:31<7:18:43, 2.56s/it] +2025-02-05 21:59:14 - ERROR - stderr - 54%|█████▍ | 12139/22434 [11:51:34<7:12:04, 2.52s/it] +2025-02-05 21:59:14 - ERROR - stderr - +2025-02-05 21:59:14 - ERROR - stderr - +2025-02-05 21:59:14 - INFO - stdout - {'loss': 0.6719, 'grad_norm': 1.3226178884506226, 'learning_rate': 9.1564119954416e-06, 'epoch': 1.62} +2025-02-05 21:59:14 - ERROR - stderr - 54%|█████▍ | 12139/22434 [11:51:34<7:12:04, 2.52s/it] +2025-02-05 21:59:16 - ERROR - stderr - 54%|█████▍ | 12140/22434 [11:51:36<7:09:01, 2.50s/it] +2025-02-05 21:59:16 - ERROR - stderr - +2025-02-05 21:59:16 - ERROR - stderr - +2025-02-05 21:59:16 - INFO - stdout - {'loss': 0.6715, 'grad_norm': 1.3858745098114014, 'learning_rate': 9.154973403931207e-06, 'epoch': 1.62} +2025-02-05 21:59:16 - ERROR - stderr - 54%|█████▍ | 12140/22434 [11:51:36<7:09:01, 2.50s/it] +2025-02-05 21:59:19 - ERROR - stderr - 54%|█████▍ | 12141/22434 [11:51:39<7:09:07, 2.50s/it] +2025-02-05 21:59:19 - ERROR - stderr - +2025-02-05 21:59:19 - ERROR - stderr - +2025-02-05 21:59:19 - INFO - stdout - {'loss': 0.658, 'grad_norm': 1.1939282417297363, 'learning_rate': 9.153534830034591e-06, 'epoch': 1.62} +2025-02-05 21:59:19 - ERROR - stderr - 54%|█████▍ | 12141/22434 [11:51:39<7:09:07, 2.50s/it] +2025-02-05 21:59:21 - ERROR - stderr - 54%|█████▍ | 12142/22434 [11:51:41<7:08:38, 2.50s/it] +2025-02-05 21:59:21 - ERROR - stderr - +2025-02-05 21:59:21 - ERROR - stderr - +2025-02-05 21:59:21 - INFO - stdout - {'loss': 0.6746, 'grad_norm': 1.0602447986602783, 'learning_rate': 9.152096273781732e-06, 'epoch': 1.62} +2025-02-05 21:59:21 - ERROR - stderr - 54%|█████▍ | 12142/22434 [11:51:41<7:08:38, 2.50s/it] +2025-02-05 21:59:24 - ERROR - stderr - 54%|█████▍ | 12143/22434 [11:51:43<7:07:36, 2.49s/it] +2025-02-05 21:59:24 - ERROR - stderr - +2025-02-05 21:59:24 - ERROR - stderr - +2025-02-05 21:59:24 - INFO - stdout - {'loss': 0.6873, 'grad_norm': 1.2562229633331299, 'learning_rate': 9.15065773520261e-06, 'epoch': 1.62} 
+2025-02-05 21:59:24 - ERROR - stderr - 54%|█████▍ | 12143/22434 [11:51:44<7:07:36, 2.49s/it] +2025-02-05 21:59:26 - ERROR - stderr - 54%|█████▍ | 12144/22434 [11:51:46<7:05:44, 2.48s/it] +2025-02-05 21:59:26 - ERROR - stderr - +2025-02-05 21:59:26 - ERROR - stderr - +2025-02-05 21:59:26 - INFO - stdout - {'loss': 0.6739, 'grad_norm': 1.2610026597976685, 'learning_rate': 9.149219214327217e-06, 'epoch': 1.62} +2025-02-05 21:59:26 - ERROR - stderr - 54%|█████▍ | 12144/22434 [11:51:46<7:05:44, 2.48s/it] +2025-02-05 21:59:29 - ERROR - stderr - 54%|█████▍ | 12145/22434 [11:51:48<7:06:09, 2.49s/it] +2025-02-05 21:59:29 - ERROR - stderr - +2025-02-05 21:59:29 - ERROR - stderr - +2025-02-05 21:59:29 - INFO - stdout - {'loss': 0.6618, 'grad_norm': 1.2135074138641357, 'learning_rate': 9.147780711185538e-06, 'epoch': 1.62} +2025-02-05 21:59:29 - ERROR - stderr - 54%|█████▍ | 12145/22434 [11:51:48<7:06:09, 2.49s/it] +2025-02-05 21:59:31 - ERROR - stderr - 54%|█████▍ | 12146/22434 [11:51:51<7:07:21, 2.49s/it] +2025-02-05 21:59:31 - ERROR - stderr - +2025-02-05 21:59:31 - ERROR - stderr - +2025-02-05 21:59:31 - INFO - stdout - {'loss': 0.6789, 'grad_norm': 1.1034901142120361, 'learning_rate': 9.14634222580755e-06, 'epoch': 1.62} +2025-02-05 21:59:31 - ERROR - stderr - 54%|█████▍ | 12146/22434 [11:51:51<7:07:21, 2.49s/it] +2025-02-05 21:59:34 - ERROR - stderr - 54%|█████▍ | 12147/22434 [11:51:53<7:04:04, 2.47s/it] +2025-02-05 21:59:34 - ERROR - stderr - +2025-02-05 21:59:34 - ERROR - stderr - +2025-02-05 21:59:34 - INFO - stdout - {'loss': 0.6819, 'grad_norm': 1.2972038984298706, 'learning_rate': 9.144903758223245e-06, 'epoch': 1.62} +2025-02-05 21:59:34 - ERROR - stderr - 54%|█████▍ | 12147/22434 [11:51:53<7:04:04, 2.47s/it] +2025-02-05 21:59:36 - ERROR - stderr - 54%|█████▍ | 12148/22434 [11:51:56<7:06:18, 2.49s/it] +2025-02-05 21:59:36 - ERROR - stderr - +2025-02-05 21:59:36 - ERROR - stderr - +2025-02-05 21:59:36 - INFO - stdout - {'loss': 0.6281, 'grad_norm': 
1.1745084524154663, 'learning_rate': 9.143465308462598e-06, 'epoch': 1.62} +2025-02-05 21:59:36 - ERROR - stderr - 54%|█████▍ | 12148/22434 [11:51:56<7:06:18, 2.49s/it] +2025-02-05 21:59:39 - ERROR - stderr - 54%|█████▍ | 12149/22434 [11:51:58<7:03:58, 2.47s/it] +2025-02-05 21:59:39 - ERROR - stderr - +2025-02-05 21:59:39 - ERROR - stderr - +2025-02-05 21:59:39 - INFO - stdout - {'loss': 0.7848, 'grad_norm': 1.3178049325942993, 'learning_rate': 9.142026876555602e-06, 'epoch': 1.62} +2025-02-05 21:59:39 - ERROR - stderr - 54%|█████▍ | 12149/22434 [11:51:58<7:03:58, 2.47s/it] +2025-02-05 21:59:41 - ERROR - stderr - 54%|█████▍ | 12150/22434 [11:52:01<7:06:03, 2.49s/it] +2025-02-05 21:59:41 - ERROR - stderr - +2025-02-05 21:59:41 - ERROR - stderr - +2025-02-05 21:59:41 - INFO - stdout - {'loss': 0.7239, 'grad_norm': 1.239805817604065, 'learning_rate': 9.140588462532233e-06, 'epoch': 1.62} +2025-02-05 21:59:41 - ERROR - stderr - 54%|█████▍ | 12150/22434 [11:52:01<7:06:03, 2.49s/it] +2025-02-05 21:59:44 - ERROR - stderr - 54%|█████▍ | 12151/22434 [11:52:03<7:06:47, 2.49s/it] +2025-02-05 21:59:44 - ERROR - stderr - +2025-02-05 21:59:44 - ERROR - stderr - +2025-02-05 21:59:44 - INFO - stdout - {'loss': 0.7369, 'grad_norm': 1.2920856475830078, 'learning_rate': 9.139150066422474e-06, 'epoch': 1.62} +2025-02-05 21:59:44 - ERROR - stderr - 54%|█████▍ | 12151/22434 [11:52:03<7:06:47, 2.49s/it] +2025-02-05 21:59:46 - ERROR - stderr - 54%|█████▍ | 12152/22434 [11:52:06<7:10:44, 2.51s/it] +2025-02-05 21:59:46 - ERROR - stderr - +2025-02-05 21:59:46 - ERROR - stderr - +2025-02-05 21:59:46 - INFO - stdout - {'loss': 0.727, 'grad_norm': 1.1460797786712646, 'learning_rate': 9.137711688256312e-06, 'epoch': 1.63} +2025-02-05 21:59:46 - ERROR - stderr - 54%|█████▍ | 12152/22434 [11:52:06<7:10:44, 2.51s/it] +2025-02-05 21:59:49 - ERROR - stderr - 54%|█████▍ | 12153/22434 [11:52:08<7:09:42, 2.51s/it] +2025-02-05 21:59:49 - ERROR - stderr - +2025-02-05 21:59:49 - ERROR - stderr - 
+2025-02-05 21:59:49 - INFO - stdout - {'loss': 0.6365, 'grad_norm': 1.0860310792922974, 'learning_rate': 9.13627332806372e-06, 'epoch': 1.63} +2025-02-05 21:59:49 - ERROR - stderr - 54%|█████▍ | 12153/22434 [11:52:08<7:09:42, 2.51s/it] +2025-02-05 21:59:51 - ERROR - stderr - 54%|█████▍ | 12154/22434 [11:52:11<7:07:14, 2.49s/it] +2025-02-05 21:59:51 - ERROR - stderr - +2025-02-05 21:59:51 - ERROR - stderr - +2025-02-05 21:59:51 - INFO - stdout - {'loss': 0.6601, 'grad_norm': 1.1594702005386353, 'learning_rate': 9.134834985874687e-06, 'epoch': 1.63} +2025-02-05 21:59:51 - ERROR - stderr - 54%|█████▍ | 12154/22434 [11:52:11<7:07:14, 2.49s/it] +2025-02-05 21:59:54 - ERROR - stderr - 54%|█████▍ | 12155/22434 [11:52:13<7:08:18, 2.50s/it] +2025-02-05 21:59:54 - ERROR - stderr - +2025-02-05 21:59:54 - ERROR - stderr - +2025-02-05 21:59:54 - INFO - stdout - {'loss': 0.692, 'grad_norm': 1.1805531978607178, 'learning_rate': 9.133396661719193e-06, 'epoch': 1.63} +2025-02-05 21:59:54 - ERROR - stderr - 54%|█████▍ | 12155/22434 [11:52:13<7:08:18, 2.50s/it] +2025-02-05 21:59:56 - ERROR - stderr - 54%|█████▍ | 12156/22434 [11:52:16<7:10:34, 2.51s/it] +2025-02-05 21:59:56 - ERROR - stderr - +2025-02-05 21:59:56 - ERROR - stderr - +2025-02-05 21:59:56 - INFO - stdout - {'loss': 0.6381, 'grad_norm': 1.1558222770690918, 'learning_rate': 9.13195835562721e-06, 'epoch': 1.63} +2025-02-05 21:59:56 - ERROR - stderr - 54%|█████▍ | 12156/22434 [11:52:16<7:10:34, 2.51s/it] +2025-02-05 21:59:59 - ERROR - stderr - 54%|█████▍ | 12157/22434 [11:52:18<7:08:08, 2.50s/it] +2025-02-05 21:59:59 - ERROR - stderr - +2025-02-05 21:59:59 - ERROR - stderr - +2025-02-05 21:59:59 - INFO - stdout - {'loss': 0.7571, 'grad_norm': 1.2628341913223267, 'learning_rate': 9.130520067628728e-06, 'epoch': 1.63} +2025-02-05 21:59:59 - ERROR - stderr - 54%|█████▍ | 12157/22434 [11:52:18<7:08:08, 2.50s/it] +2025-02-05 22:00:01 - ERROR - stderr - 54%|█████▍ | 12158/22434 [11:52:21<7:10:56, 2.52s/it] +2025-02-05 22:00:01 - 
ERROR - stderr - +2025-02-05 22:00:01 - ERROR - stderr - +2025-02-05 22:00:01 - INFO - stdout - {'loss': 0.7167, 'grad_norm': 1.243841528892517, 'learning_rate': 9.129081797753724e-06, 'epoch': 1.63} +2025-02-05 22:00:01 - ERROR - stderr - 54%|█████▍ | 12158/22434 [11:52:21<7:10:56, 2.52s/it] +2025-02-05 22:00:04 - ERROR - stderr - 54%|█████▍ | 12159/22434 [11:52:23<7:07:44, 2.50s/it] +2025-02-05 22:00:04 - ERROR - stderr - +2025-02-05 22:00:04 - ERROR - stderr - +2025-02-05 22:00:04 - INFO - stdout - {'loss': 0.686, 'grad_norm': 1.1264487504959106, 'learning_rate': 9.127643546032174e-06, 'epoch': 1.63} +2025-02-05 22:00:04 - ERROR - stderr - 54%|█████▍ | 12159/22434 [11:52:23<7:07:44, 2.50s/it] +2025-02-05 22:00:06 - ERROR - stderr - 54%|█████▍ | 12160/22434 [11:52:26<7:12:48, 2.53s/it] +2025-02-05 22:00:06 - ERROR - stderr - +2025-02-05 22:00:06 - ERROR - stderr - +2025-02-05 22:00:06 - INFO - stdout - {'loss': 0.7512, 'grad_norm': 1.3356196880340576, 'learning_rate': 9.126205312494062e-06, 'epoch': 1.63} +2025-02-05 22:00:06 - ERROR - stderr - 54%|█████▍ | 12160/22434 [11:52:26<7:12:48, 2.53s/it] +2025-02-05 22:00:09 - ERROR - stderr - 54%|█████▍ | 12161/22434 [11:52:29<7:16:03, 2.55s/it] +2025-02-05 22:00:09 - ERROR - stderr - +2025-02-05 22:00:09 - ERROR - stderr - +2025-02-05 22:00:09 - INFO - stdout - {'loss': 0.7969, 'grad_norm': 1.3498576879501343, 'learning_rate': 9.124767097169362e-06, 'epoch': 1.63} +2025-02-05 22:00:09 - ERROR - stderr - 54%|█████▍ | 12161/22434 [11:52:29<7:16:03, 2.55s/it] +2025-02-05 22:00:12 - ERROR - stderr - 54%|█████▍ | 12162/22434 [11:52:31<7:23:25, 2.59s/it] +2025-02-05 22:00:12 - ERROR - stderr - +2025-02-05 22:00:12 - ERROR - stderr - +2025-02-05 22:00:12 - INFO - stdout - {'loss': 0.7067, 'grad_norm': 1.1330918073654175, 'learning_rate': 9.123328900088058e-06, 'epoch': 1.63} +2025-02-05 22:00:12 - ERROR - stderr - 54%|█████▍ | 12162/22434 [11:52:31<7:23:25, 2.59s/it] +2025-02-05 22:00:14 - ERROR - stderr - 54%|█████▍ | 
12163/22434 [11:52:34<7:19:15, 2.57s/it] +2025-02-05 22:00:14 - ERROR - stderr - +2025-02-05 22:00:14 - ERROR - stderr - +2025-02-05 22:00:14 - INFO - stdout - {'loss': 0.6979, 'grad_norm': 1.2193334102630615, 'learning_rate': 9.121890721280121e-06, 'epoch': 1.63} +2025-02-05 22:00:14 - ERROR - stderr - 54%|█████▍ | 12163/22434 [11:52:34<7:19:15, 2.57s/it] +2025-02-05 22:00:17 - ERROR - stderr - 54%|█████▍ | 12164/22434 [11:52:36<7:15:51, 2.55s/it] +2025-02-05 22:00:17 - ERROR - stderr - +2025-02-05 22:00:17 - ERROR - stderr - +2025-02-05 22:00:17 - INFO - stdout - {'loss': 0.7297, 'grad_norm': 1.4225854873657227, 'learning_rate': 9.120452560775532e-06, 'epoch': 1.63} +2025-02-05 22:00:17 - ERROR - stderr - 54%|█████▍ | 12164/22434 [11:52:36<7:15:51, 2.55s/it] +2025-02-05 22:00:19 - ERROR - stderr - 54%|█████▍ | 12165/22434 [11:52:39<7:17:11, 2.55s/it] +2025-02-05 22:00:19 - ERROR - stderr - +2025-02-05 22:00:19 - ERROR - stderr - +2025-02-05 22:00:19 - INFO - stdout - {'loss': 0.7003, 'grad_norm': 1.1815792322158813, 'learning_rate': 9.119014418604269e-06, 'epoch': 1.63} +2025-02-05 22:00:19 - ERROR - stderr - 54%|█████▍ | 12165/22434 [11:52:39<7:17:11, 2.55s/it] +2025-02-05 22:00:22 - ERROR - stderr - 54%|█████▍ | 12166/22434 [11:52:42<7:21:42, 2.58s/it] +2025-02-05 22:00:22 - ERROR - stderr - +2025-02-05 22:00:22 - ERROR - stderr - +2025-02-05 22:00:22 - INFO - stdout - {'loss': 0.6451, 'grad_norm': 1.179482340812683, 'learning_rate': 9.117576294796307e-06, 'epoch': 1.63} +2025-02-05 22:00:22 - ERROR - stderr - 54%|█████▍ | 12166/22434 [11:52:42<7:21:42, 2.58s/it] +2025-02-05 22:00:24 - ERROR - stderr - 54%|█████▍ | 12167/22434 [11:52:44<7:21:22, 2.58s/it] +2025-02-05 22:00:24 - ERROR - stderr - +2025-02-05 22:00:24 - ERROR - stderr - +2025-02-05 22:00:24 - INFO - stdout - {'loss': 0.7428, 'grad_norm': 1.2368333339691162, 'learning_rate': 9.11613818938162e-06, 'epoch': 1.63} +2025-02-05 22:00:24 - ERROR - stderr - 54%|█████▍ | 12167/22434 [11:52:44<7:21:22, 
2.58s/it] +2025-02-05 22:00:27 - ERROR - stderr - 54%|█████▍ | 12168/22434 [11:52:47<7:20:34, 2.57s/it] +2025-02-05 22:00:27 - ERROR - stderr - +2025-02-05 22:00:27 - ERROR - stderr - +2025-02-05 22:00:27 - INFO - stdout - {'loss': 0.7017, 'grad_norm': 1.2230229377746582, 'learning_rate': 9.11470010239019e-06, 'epoch': 1.63} +2025-02-05 22:00:27 - ERROR - stderr - 54%|█████▍ | 12168/22434 [11:52:47<7:20:34, 2.57s/it] +2025-02-05 22:00:29 - ERROR - stderr - 54%|█████▍ | 12169/22434 [11:52:49<7:18:36, 2.56s/it] +2025-02-05 22:00:29 - ERROR - stderr - +2025-02-05 22:00:29 - ERROR - stderr - +2025-02-05 22:00:29 - INFO - stdout - {'loss': 0.7569, 'grad_norm': 1.2191781997680664, 'learning_rate': 9.113262033851988e-06, 'epoch': 1.63} +2025-02-05 22:00:29 - ERROR - stderr - 54%|█████▍ | 12169/22434 [11:52:49<7:18:36, 2.56s/it] +2025-02-05 22:00:32 - ERROR - stderr - 54%|█████▍ | 12170/22434 [11:52:52<7:14:58, 2.54s/it] +2025-02-05 22:00:32 - ERROR - stderr - +2025-02-05 22:00:32 - ERROR - stderr - +2025-02-05 22:00:32 - INFO - stdout - {'loss': 0.801, 'grad_norm': 1.251646637916565, 'learning_rate': 9.11182398379699e-06, 'epoch': 1.63} +2025-02-05 22:00:32 - ERROR - stderr - 54%|█████▍ | 12170/22434 [11:52:52<7:14:58, 2.54s/it] +2025-02-05 22:00:34 - ERROR - stderr - 54%|█████▍ | 12171/22434 [11:52:54<7:12:24, 2.53s/it] +2025-02-05 22:00:34 - ERROR - stderr - +2025-02-05 22:00:34 - ERROR - stderr - +2025-02-05 22:00:34 - INFO - stdout - {'loss': 0.7004, 'grad_norm': 1.100760579109192, 'learning_rate': 9.110385952255174e-06, 'epoch': 1.63} +2025-02-05 22:00:34 - ERROR - stderr - 54%|█████▍ | 12171/22434 [11:52:54<7:12:24, 2.53s/it] +2025-02-05 22:00:37 - ERROR - stderr - 54%|█████▍ | 12172/22434 [11:52:57<7:10:36, 2.52s/it] +2025-02-05 22:00:37 - ERROR - stderr - +2025-02-05 22:00:37 - ERROR - stderr - +2025-02-05 22:00:37 - INFO - stdout - {'loss': 0.7105, 'grad_norm': 1.2002395391464233, 'learning_rate': 9.108947939256508e-06, 'epoch': 1.63} +2025-02-05 22:00:37 - ERROR 
- stderr - 54%|█████▍ | 12172/22434 [11:52:57<7:10:36, 2.52s/it] +2025-02-05 22:00:40 - ERROR - stderr - 54%|█████▍ | 12173/22434 [11:52:59<7:16:55, 2.55s/it] +2025-02-05 22:00:40 - ERROR - stderr - +2025-02-05 22:00:40 - ERROR - stderr - +2025-02-05 22:00:40 - INFO - stdout - {'loss': 0.7223, 'grad_norm': 1.2985812425613403, 'learning_rate': 9.107509944830972e-06, 'epoch': 1.63} +2025-02-05 22:00:40 - ERROR - stderr - 54%|█████▍ | 12173/22434 [11:52:59<7:16:55, 2.55s/it] +2025-02-05 22:00:42 - ERROR - stderr - 54%|█████▍ | 12174/22434 [11:53:02<7:17:50, 2.56s/it] +2025-02-05 22:00:42 - ERROR - stderr - +2025-02-05 22:00:42 - ERROR - stderr - +2025-02-05 22:00:42 - INFO - stdout - {'loss': 0.7727, 'grad_norm': 1.27374267578125, 'learning_rate': 9.106071969008537e-06, 'epoch': 1.63} +2025-02-05 22:00:42 - ERROR - stderr - 54%|█████▍ | 12174/22434 [11:53:02<7:17:50, 2.56s/it] +2025-02-05 22:00:45 - ERROR - stderr - 54%|█████▍ | 12175/22434 [11:53:04<7:16:31, 2.55s/it] +2025-02-05 22:00:45 - ERROR - stderr - +2025-02-05 22:00:45 - ERROR - stderr - +2025-02-05 22:00:45 - INFO - stdout - {'loss': 0.6522, 'grad_norm': 1.1897945404052734, 'learning_rate': 9.104634011819173e-06, 'epoch': 1.63} +2025-02-05 22:00:45 - ERROR - stderr - 54%|█████▍ | 12175/22434 [11:53:04<7:16:31, 2.55s/it] +2025-02-05 22:00:45 - INFO - stdout - WARNING: tokenization mismatch: 1 vs. 62. 
(ignored) +2025-02-05 22:00:47 - ERROR - stderr - 54%|█████▍ | 12176/22434 [11:53:07<7:17:23, 2.56s/it] +2025-02-05 22:00:47 - ERROR - stderr - +2025-02-05 22:00:47 - ERROR - stderr - +2025-02-05 22:00:47 - INFO - stdout - {'loss': 0.737, 'grad_norm': 1.282271385192871, 'learning_rate': 9.10319607329286e-06, 'epoch': 1.63} +2025-02-05 22:00:47 - ERROR - stderr - 54%|█████▍ | 12176/22434 [11:53:07<7:17:23, 2.56s/it] +2025-02-05 22:00:50 - ERROR - stderr - 54%|█████▍ | 12177/22434 [11:53:10<7:15:45, 2.55s/it] +2025-02-05 22:00:50 - ERROR - stderr - +2025-02-05 22:00:50 - ERROR - stderr - +2025-02-05 22:00:50 - INFO - stdout - {'loss': 0.6457, 'grad_norm': 1.1652004718780518, 'learning_rate': 9.101758153459564e-06, 'epoch': 1.63} +2025-02-05 22:00:50 - ERROR - stderr - 54%|█████▍ | 12177/22434 [11:53:10<7:15:45, 2.55s/it] +2025-02-05 22:00:53 - ERROR - stderr - 54%|█████▍ | 12178/22434 [11:53:12<7:29:55, 2.63s/it] +2025-02-05 22:00:53 - ERROR - stderr - +2025-02-05 22:00:53 - ERROR - stderr - +2025-02-05 22:00:53 - INFO - stdout - {'loss': 0.7103, 'grad_norm': 1.1747586727142334, 'learning_rate': 9.100320252349261e-06, 'epoch': 1.63} +2025-02-05 22:00:53 - ERROR - stderr - 54%|█████▍ | 12178/22434 [11:53:12<7:29:55, 2.63s/it] +2025-02-05 22:00:55 - ERROR - stderr - 54%|█████▍ | 12179/22434 [11:53:15<7:36:23, 2.67s/it] +2025-02-05 22:00:55 - ERROR - stderr - +2025-02-05 22:00:55 - ERROR - stderr - +2025-02-05 22:00:55 - INFO - stdout - {'loss': 0.7065, 'grad_norm': 1.2479718923568726, 'learning_rate': 9.098882369991924e-06, 'epoch': 1.63} +2025-02-05 22:00:55 - ERROR - stderr - 54%|█████▍ | 12179/22434 [11:53:15<7:36:23, 2.67s/it] +2025-02-05 22:00:58 - ERROR - stderr - 54%|█████▍ | 12180/22434 [11:53:18<7:29:19, 2.63s/it] +2025-02-05 22:00:58 - ERROR - stderr - +2025-02-05 22:00:58 - ERROR - stderr - +2025-02-05 22:00:58 - INFO - stdout - {'loss': 0.7267, 'grad_norm': 1.3513357639312744, 'learning_rate': 9.097444506417518e-06, 'epoch': 1.63} +2025-02-05 22:00:58 - 
ERROR - stderr - 54%|█████▍ | 12180/22434 [11:53:18<7:29:19, 2.63s/it] +2025-02-05 22:01:00 - ERROR - stderr - 54%|█████▍ | 12181/22434 [11:53:20<7:21:51, 2.59s/it] +2025-02-05 22:01:00 - ERROR - stderr - +2025-02-05 22:01:00 - ERROR - stderr - +2025-02-05 22:01:00 - INFO - stdout - {'loss': 0.6796, 'grad_norm': 1.2002671957015991, 'learning_rate': 9.096006661656021e-06, 'epoch': 1.63} +2025-02-05 22:01:00 - ERROR - stderr - 54%|█████▍ | 12181/22434 [11:53:20<7:21:51, 2.59s/it] +2025-02-05 22:01:03 - ERROR - stderr - 54%|█████▍ | 12182/22434 [11:53:23<7:16:24, 2.55s/it] +2025-02-05 22:01:03 - ERROR - stderr - +2025-02-05 22:01:03 - ERROR - stderr - +2025-02-05 22:01:03 - INFO - stdout - {'loss': 0.7156, 'grad_norm': 1.480422854423523, 'learning_rate': 9.094568835737397e-06, 'epoch': 1.63} +2025-02-05 22:01:03 - ERROR - stderr - 54%|█████▍ | 12182/22434 [11:53:23<7:16:24, 2.55s/it] +2025-02-05 22:01:05 - ERROR - stderr - 54%|█████▍ | 12183/22434 [11:53:25<7:11:55, 2.53s/it] +2025-02-05 22:01:05 - ERROR - stderr - +2025-02-05 22:01:05 - ERROR - stderr - +2025-02-05 22:01:05 - INFO - stdout - {'loss': 0.7416, 'grad_norm': 1.230607271194458, 'learning_rate': 9.093131028691617e-06, 'epoch': 1.63} +2025-02-05 22:01:05 - ERROR - stderr - 54%|█████▍ | 12183/22434 [11:53:25<7:11:55, 2.53s/it] +2025-02-05 22:01:08 - ERROR - stderr - 54%|█████▍ | 12184/22434 [11:53:28<7:10:16, 2.52s/it] +2025-02-05 22:01:08 - ERROR - stderr - +2025-02-05 22:01:08 - ERROR - stderr - +2025-02-05 22:01:08 - INFO - stdout - {'loss': 0.6386, 'grad_norm': 1.2090680599212646, 'learning_rate': 9.091693240548659e-06, 'epoch': 1.63} +2025-02-05 22:01:08 - ERROR - stderr - 54%|█████▍ | 12184/22434 [11:53:28<7:10:16, 2.52s/it] +2025-02-05 22:01:10 - ERROR - stderr - 54%|█████▍ | 12185/22434 [11:53:30<7:10:52, 2.52s/it] +2025-02-05 22:01:10 - ERROR - stderr - +2025-02-05 22:01:10 - ERROR - stderr - +2025-02-05 22:01:10 - INFO - stdout - {'loss': 0.7721, 'grad_norm': 1.2085438966751099, 'learning_rate': 
9.090255471338482e-06, 'epoch': 1.63} +2025-02-05 22:01:10 - ERROR - stderr - 54%|█████▍ | 12185/22434 [11:53:30<7:10:52, 2.52s/it] +2025-02-05 22:01:13 - ERROR - stderr - 54%|█████▍ | 12186/22434 [11:53:33<7:07:36, 2.50s/it] +2025-02-05 22:01:13 - ERROR - stderr - +2025-02-05 22:01:13 - ERROR - stderr - +2025-02-05 22:01:13 - INFO - stdout - {'loss': 0.6991, 'grad_norm': 1.1867046356201172, 'learning_rate': 9.088817721091062e-06, 'epoch': 1.63} +2025-02-05 22:01:13 - ERROR - stderr - 54%|█████▍ | 12186/22434 [11:53:33<7:07:36, 2.50s/it] +2025-02-05 22:01:15 - ERROR - stderr - 54%|█████▍ | 12187/22434 [11:53:35<7:15:31, 2.55s/it] +2025-02-05 22:01:16 - ERROR - stderr - +2025-02-05 22:01:16 - ERROR - stderr - +2025-02-05 22:01:16 - INFO - stdout - {'loss': 0.6951, 'grad_norm': 1.1910532712936401, 'learning_rate': 9.087379989836366e-06, 'epoch': 1.63} +2025-02-05 22:01:16 - ERROR - stderr - 54%|█████▍ | 12187/22434 [11:53:35<7:15:31, 2.55s/it] +2025-02-05 22:01:18 - ERROR - stderr - 54%|█████▍ | 12188/22434 [11:53:38<7:15:08, 2.55s/it] +2025-02-05 22:01:18 - ERROR - stderr - +2025-02-05 22:01:18 - ERROR - stderr - +2025-02-05 22:01:18 - INFO - stdout - {'loss': 0.6908, 'grad_norm': 1.2493771314620972, 'learning_rate': 9.085942277604354e-06, 'epoch': 1.63} +2025-02-05 22:01:18 - ERROR - stderr - 54%|█████▍ | 12188/22434 [11:53:38<7:15:08, 2.55s/it] +2025-02-05 22:01:21 - ERROR - stderr - 54%|█████▍ | 12189/22434 [11:53:40<7:18:51, 2.57s/it] +2025-02-05 22:01:21 - ERROR - stderr - +2025-02-05 22:01:21 - ERROR - stderr - +2025-02-05 22:01:21 - INFO - stdout - {'loss': 0.8063, 'grad_norm': 1.2557603120803833, 'learning_rate': 9.084504584425005e-06, 'epoch': 1.63} +2025-02-05 22:01:21 - ERROR - stderr - 54%|█████▍ | 12189/22434 [11:53:40<7:18:51, 2.57s/it] +2025-02-05 22:01:23 - ERROR - stderr - 54%|█████▍ | 12190/22434 [11:53:43<7:16:12, 2.55s/it] +2025-02-05 22:01:23 - ERROR - stderr - +2025-02-05 22:01:23 - ERROR - stderr - +2025-02-05 22:01:23 - INFO - stdout - 
{'loss': 0.6871, 'grad_norm': 1.1318848133087158, 'learning_rate': 9.083066910328284e-06, 'epoch': 1.63} +2025-02-05 22:01:23 - ERROR - stderr - 54%|█████▍ | 12190/22434 [11:53:43<7:16:12, 2.55s/it] +2025-02-05 22:01:26 - ERROR - stderr - 54%|█████▍ | 12191/22434 [11:53:46<7:22:16, 2.59s/it] +2025-02-05 22:01:26 - ERROR - stderr - +2025-02-05 22:01:26 - ERROR - stderr - +2025-02-05 22:01:26 - INFO - stdout - {'loss': 0.6554, 'grad_norm': 1.100594401359558, 'learning_rate': 9.08162925534415e-06, 'epoch': 1.63} +2025-02-05 22:01:26 - ERROR - stderr - 54%|█████▍ | 12191/22434 [11:53:46<7:22:16, 2.59s/it] +2025-02-05 22:01:29 - ERROR - stderr - 54%|█████▍ | 12192/22434 [11:53:48<7:28:41, 2.63s/it] +2025-02-05 22:01:29 - ERROR - stderr - +2025-02-05 22:01:29 - ERROR - stderr - +2025-02-05 22:01:29 - INFO - stdout - {'loss': 0.5987, 'grad_norm': 1.1576602458953857, 'learning_rate': 9.080191619502581e-06, 'epoch': 1.63} +2025-02-05 22:01:29 - ERROR - stderr - 54%|█████▍ | 12192/22434 [11:53:48<7:28:41, 2.63s/it] +2025-02-05 22:01:31 - ERROR - stderr - 54%|█████▍ | 12193/22434 [11:53:51<7:24:14, 2.60s/it] +2025-02-05 22:01:31 - ERROR - stderr - +2025-02-05 22:01:31 - ERROR - stderr - +2025-02-05 22:01:31 - INFO - stdout - {'loss': 0.7157, 'grad_norm': 1.2103257179260254, 'learning_rate': 9.078754002833535e-06, 'epoch': 1.63} +2025-02-05 22:01:31 - ERROR - stderr - 54%|█████▍ | 12193/22434 [11:53:51<7:24:14, 2.60s/it] +2025-02-05 22:01:34 - ERROR - stderr - 54%|█████▍ | 12194/22434 [11:53:53<7:24:33, 2.60s/it] +2025-02-05 22:01:34 - ERROR - stderr - +2025-02-05 22:01:34 - ERROR - stderr - +2025-02-05 22:01:34 - INFO - stdout - {'loss': 0.7749, 'grad_norm': 1.2625612020492554, 'learning_rate': 9.07731640536698e-06, 'epoch': 1.63} +2025-02-05 22:01:34 - ERROR - stderr - 54%|█████▍ | 12194/22434 [11:53:54<7:24:33, 2.60s/it] +2025-02-05 22:01:36 - ERROR - stderr - 54%|█████▍ | 12195/22434 [11:53:56<7:20:54, 2.58s/it] +2025-02-05 22:01:36 - ERROR - stderr - +2025-02-05 22:01:36 
- ERROR - stderr - +2025-02-05 22:01:36 - INFO - stdout - {'loss': 0.5939, 'grad_norm': 1.0626695156097412, 'learning_rate': 9.075878827132883e-06, 'epoch': 1.63} +2025-02-05 22:01:36 - ERROR - stderr - 54%|█████▍ | 12195/22434 [11:53:56<7:20:54, 2.58s/it] +2025-02-05 22:01:39 - ERROR - stderr - 54%|█████▍ | 12196/22434 [11:53:58<7:16:41, 2.56s/it] +2025-02-05 22:01:39 - ERROR - stderr - +2025-02-05 22:01:39 - ERROR - stderr - +2025-02-05 22:01:39 - INFO - stdout - {'loss': 0.6633, 'grad_norm': 1.1504452228546143, 'learning_rate': 9.074441268161207e-06, 'epoch': 1.63} +2025-02-05 22:01:39 - ERROR - stderr - 54%|█████▍ | 12196/22434 [11:53:59<7:16:41, 2.56s/it] +2025-02-05 22:01:41 - ERROR - stderr - 54%|█████▍ | 12197/22434 [11:54:01<7:14:10, 2.54s/it] +2025-02-05 22:01:41 - ERROR - stderr - +2025-02-05 22:01:41 - ERROR - stderr - +2025-02-05 22:01:41 - INFO - stdout - {'loss': 0.5941, 'grad_norm': 1.1651383638381958, 'learning_rate': 9.073003728481917e-06, 'epoch': 1.63} +2025-02-05 22:01:41 - ERROR - stderr - 54%|█████▍ | 12197/22434 [11:54:01<7:14:10, 2.54s/it] +2025-02-05 22:01:44 - ERROR - stderr - 54%|█████▍ | 12198/22434 [11:54:04<7:15:12, 2.55s/it] +2025-02-05 22:01:44 - ERROR - stderr - +2025-02-05 22:01:44 - ERROR - stderr - +2025-02-05 22:01:44 - INFO - stdout - {'loss': 0.7825, 'grad_norm': 1.2084946632385254, 'learning_rate': 9.07156620812498e-06, 'epoch': 1.63} +2025-02-05 22:01:44 - ERROR - stderr - 54%|█████▍ | 12198/22434 [11:54:04<7:15:12, 2.55s/it] +2025-02-05 22:01:46 - ERROR - stderr - 54%|█████▍ | 12199/22434 [11:54:06<7:09:22, 2.52s/it] +2025-02-05 22:01:46 - ERROR - stderr - +2025-02-05 22:01:46 - ERROR - stderr - +2025-02-05 22:01:46 - INFO - stdout - {'loss': 0.6454, 'grad_norm': 1.2287118434906006, 'learning_rate': 9.070128707120351e-06, 'epoch': 1.63} +2025-02-05 22:01:46 - ERROR - stderr - 54%|█████▍ | 12199/22434 [11:54:06<7:09:22, 2.52s/it] +2025-02-05 22:01:49 - ERROR - stderr - 54%|█████▍ | 12200/22434 [11:54:08<7:05:12, 2.49s/it] 
+2025-02-05 22:01:49 - ERROR - stderr - +2025-02-05 22:01:49 - ERROR - stderr - +2025-02-05 22:01:49 - INFO - stdout - {'loss': 0.6358, 'grad_norm': 1.2099499702453613, 'learning_rate': 9.068691225498004e-06, 'epoch': 1.63} +2025-02-05 22:01:49 - ERROR - stderr - 54%|█████▍ | 12200/22434 [11:54:08<7:05:12, 2.49s/it] +2025-02-05 22:01:51 - ERROR - stderr - 54%|█████▍ | 12201/22434 [11:54:11<7:05:27, 2.49s/it] +2025-02-05 22:01:51 - ERROR - stderr - +2025-02-05 22:01:51 - ERROR - stderr - +2025-02-05 22:01:51 - INFO - stdout - {'loss': 0.8125, 'grad_norm': 1.3258144855499268, 'learning_rate': 9.067253763287894e-06, 'epoch': 1.63} +2025-02-05 22:01:51 - ERROR - stderr - 54%|█████▍ | 12201/22434 [11:54:11<7:05:27, 2.49s/it] +2025-02-05 22:01:54 - ERROR - stderr - 54%|█████▍ | 12202/22434 [11:54:13<7:04:12, 2.49s/it] +2025-02-05 22:01:54 - ERROR - stderr - +2025-02-05 22:01:54 - ERROR - stderr - +2025-02-05 22:01:54 - INFO - stdout - {'loss': 0.665, 'grad_norm': 1.1682347059249878, 'learning_rate': 9.065816320519989e-06, 'epoch': 1.63} +2025-02-05 22:01:54 - ERROR - stderr - 54%|█████▍ | 12202/22434 [11:54:13<7:04:12, 2.49s/it] +2025-02-05 22:01:56 - ERROR - stderr - 54%|█████▍ | 12203/22434 [11:54:16<7:00:33, 2.47s/it] +2025-02-05 22:01:56 - ERROR - stderr - +2025-02-05 22:01:56 - ERROR - stderr - +2025-02-05 22:01:56 - INFO - stdout - {'loss': 0.7624, 'grad_norm': 1.3243006467819214, 'learning_rate': 9.06437889722425e-06, 'epoch': 1.63} +2025-02-05 22:01:56 - ERROR - stderr - 54%|█████▍ | 12203/22434 [11:54:16<7:00:33, 2.47s/it] +2025-02-05 22:01:59 - ERROR - stderr - 54%|█████▍ | 12204/22434 [11:54:18<7:05:20, 2.49s/it] +2025-02-05 22:01:59 - ERROR - stderr - +2025-02-05 22:01:59 - ERROR - stderr - +2025-02-05 22:01:59 - INFO - stdout - {'loss': 0.6392, 'grad_norm': 1.273690938949585, 'learning_rate': 9.062941493430634e-06, 'epoch': 1.63} +2025-02-05 22:01:59 - ERROR - stderr - 54%|█████▍ | 12204/22434 [11:54:18<7:05:20, 2.49s/it] +2025-02-05 22:02:01 - ERROR - 
stderr - 54%|█████▍ | 12205/22434 [11:54:21<7:06:05, 2.50s/it] +2025-02-05 22:02:01 - ERROR - stderr - +2025-02-05 22:02:01 - ERROR - stderr - +2025-02-05 22:02:01 - INFO - stdout - {'loss': 0.7321, 'grad_norm': 1.234810471534729, 'learning_rate': 9.061504109169108e-06, 'epoch': 1.63} +2025-02-05 22:02:01 - ERROR - stderr - 54%|█████▍ | 12205/22434 [11:54:21<7:06:05, 2.50s/it] +2025-02-05 22:02:04 - ERROR - stderr - 54%|█████▍ | 12206/22434 [11:54:24<7:17:40, 2.57s/it] +2025-02-05 22:02:04 - ERROR - stderr - +2025-02-05 22:02:04 - ERROR - stderr - +2025-02-05 22:02:04 - INFO - stdout - {'loss': 0.6575, 'grad_norm': 1.2182395458221436, 'learning_rate': 9.060066744469633e-06, 'epoch': 1.63} +2025-02-05 22:02:04 - ERROR - stderr - 54%|█████▍ | 12206/22434 [11:54:24<7:17:40, 2.57s/it] +2025-02-05 22:02:06 - ERROR - stderr - 54%|█████▍ | 12207/22434 [11:54:26<7:12:47, 2.54s/it] +2025-02-05 22:02:06 - ERROR - stderr - +2025-02-05 22:02:06 - ERROR - stderr - +2025-02-05 22:02:06 - INFO - stdout - {'loss': 0.7437, 'grad_norm': 1.239786148071289, 'learning_rate': 9.058629399362163e-06, 'epoch': 1.63} +2025-02-05 22:02:06 - ERROR - stderr - 54%|█████▍ | 12207/22434 [11:54:26<7:12:47, 2.54s/it] +2025-02-05 22:02:09 - ERROR - stderr - 54%|█████▍ | 12208/22434 [11:54:29<7:09:26, 2.52s/it] +2025-02-05 22:02:09 - ERROR - stderr - +2025-02-05 22:02:09 - ERROR - stderr - +2025-02-05 22:02:09 - INFO - stdout - {'loss': 0.6615, 'grad_norm': 1.1204453706741333, 'learning_rate': 9.057192073876665e-06, 'epoch': 1.63} +2025-02-05 22:02:09 - ERROR - stderr - 54%|█████▍ | 12208/22434 [11:54:29<7:09:26, 2.52s/it] +2025-02-05 22:02:11 - ERROR - stderr - 54%|█████▍ | 12209/22434 [11:54:31<7:11:31, 2.53s/it] +2025-02-05 22:02:11 - ERROR - stderr - +2025-02-05 22:02:11 - ERROR - stderr - +2025-02-05 22:02:11 - INFO - stdout - {'loss': 0.6451, 'grad_norm': 1.1716543436050415, 'learning_rate': 9.055754768043095e-06, 'epoch': 1.63} +2025-02-05 22:02:11 - ERROR - stderr - 54%|█████▍ | 12209/22434 
[11:54:31<7:11:31, 2.53s/it] +2025-02-05 22:02:14 - ERROR - stderr - 54%|█████▍ | 12210/22434 [11:54:34<7:10:54, 2.53s/it] +2025-02-05 22:02:14 - ERROR - stderr - +2025-02-05 22:02:14 - ERROR - stderr - +2025-02-05 22:02:14 - INFO - stdout - {'loss': 0.6402, 'grad_norm': 1.1134787797927856, 'learning_rate': 9.054317481891413e-06, 'epoch': 1.63} +2025-02-05 22:02:14 - ERROR - stderr - 54%|█████▍ | 12210/22434 [11:54:34<7:10:54, 2.53s/it] +2025-02-05 22:02:16 - ERROR - stderr - 54%|█████▍ | 12211/22434 [11:54:36<7:06:52, 2.51s/it] +2025-02-05 22:02:16 - ERROR - stderr - +2025-02-05 22:02:16 - ERROR - stderr - +2025-02-05 22:02:16 - INFO - stdout - {'loss': 0.747, 'grad_norm': 1.2876673936843872, 'learning_rate': 9.052880215451581e-06, 'epoch': 1.63} +2025-02-05 22:02:16 - ERROR - stderr - 54%|█████▍ | 12211/22434 [11:54:36<7:06:52, 2.51s/it] +2025-02-05 22:02:19 - ERROR - stderr - 54%|█████▍ | 12212/22434 [11:54:39<7:19:11, 2.58s/it] +2025-02-05 22:02:19 - ERROR - stderr - +2025-02-05 22:02:19 - ERROR - stderr - +2025-02-05 22:02:19 - INFO - stdout - {'loss': 0.6932, 'grad_norm': 1.3560421466827393, 'learning_rate': 9.05144296875355e-06, 'epoch': 1.63} +2025-02-05 22:02:19 - ERROR - stderr - 54%|█████▍ | 12212/22434 [11:54:39<7:19:11, 2.58s/it] +2025-02-05 22:02:22 - ERROR - stderr - 54%|█████▍ | 12213/22434 [11:54:41<7:15:38, 2.56s/it] +2025-02-05 22:02:22 - ERROR - stderr - +2025-02-05 22:02:22 - ERROR - stderr - +2025-02-05 22:02:22 - INFO - stdout - {'loss': 0.7234, 'grad_norm': 1.2528886795043945, 'learning_rate': 9.050005741827286e-06, 'epoch': 1.63} +2025-02-05 22:02:22 - ERROR - stderr - 54%|█████▍ | 12213/22434 [11:54:41<7:15:38, 2.56s/it] +2025-02-05 22:02:24 - ERROR - stderr - 54%|█████▍ | 12214/22434 [11:54:44<7:12:43, 2.54s/it] +2025-02-05 22:02:24 - ERROR - stderr - +2025-02-05 22:02:24 - ERROR - stderr - +2025-02-05 22:02:24 - INFO - stdout - {'loss': 0.8155, 'grad_norm': 1.2665976285934448, 'learning_rate': 9.048568534702744e-06, 'epoch': 1.63} 
+2025-02-05 22:02:24 - ERROR - stderr - 54%|█████▍ | 12214/22434 [11:54:44<7:12:43, 2.54s/it] +2025-02-05 22:02:27 - ERROR - stderr - 54%|█████▍ | 12215/22434 [11:54:46<7:10:40, 2.53s/it] +2025-02-05 22:02:27 - ERROR - stderr - +2025-02-05 22:02:27 - ERROR - stderr - +2025-02-05 22:02:27 - INFO - stdout - {'loss': 0.7204, 'grad_norm': 1.255212426185608, 'learning_rate': 9.047131347409879e-06, 'epoch': 1.63} +2025-02-05 22:02:27 - ERROR - stderr - 54%|█████▍ | 12215/22434 [11:54:46<7:10:40, 2.53s/it] +2025-02-05 22:02:29 - ERROR - stderr - 54%|█████▍ | 12216/22434 [11:54:49<7:16:11, 2.56s/it] +2025-02-05 22:02:29 - ERROR - stderr - +2025-02-05 22:02:29 - ERROR - stderr - +2025-02-05 22:02:29 - INFO - stdout - {'loss': 0.6883, 'grad_norm': 1.2966251373291016, 'learning_rate': 9.045694179978647e-06, 'epoch': 1.63} +2025-02-05 22:02:29 - ERROR - stderr - 54%|█████▍ | 12216/22434 [11:54:49<7:16:11, 2.56s/it] +2025-02-05 22:02:32 - ERROR - stderr - 54%|█████▍ | 12217/22434 [11:54:52<7:19:29, 2.58s/it] +2025-02-05 22:02:32 - ERROR - stderr - +2025-02-05 22:02:32 - ERROR - stderr - +2025-02-05 22:02:32 - INFO - stdout - {'loss': 0.7679, 'grad_norm': 1.3082724809646606, 'learning_rate': 9.044257032439007e-06, 'epoch': 1.63} +2025-02-05 22:02:32 - ERROR - stderr - 54%|█████▍ | 12217/22434 [11:54:52<7:19:29, 2.58s/it] +2025-02-05 22:02:34 - ERROR - stderr - 54%|█████▍ | 12218/22434 [11:54:54<7:17:07, 2.57s/it] +2025-02-05 22:02:34 - ERROR - stderr - +2025-02-05 22:02:34 - ERROR - stderr - +2025-02-05 22:02:34 - INFO - stdout - {'loss': 0.7084, 'grad_norm': 1.3585913181304932, 'learning_rate': 9.04281990482092e-06, 'epoch': 1.63} +2025-02-05 22:02:34 - ERROR - stderr - 54%|█████▍ | 12218/22434 [11:54:54<7:17:07, 2.57s/it] +2025-02-05 22:02:37 - ERROR - stderr - 54%|█████▍ | 12219/22434 [11:54:57<7:13:45, 2.55s/it] +2025-02-05 22:02:37 - ERROR - stderr - +2025-02-05 22:02:37 - ERROR - stderr - +2025-02-05 22:02:37 - INFO - stdout - {'loss': 0.6483, 'grad_norm': 
1.189323902130127, 'learning_rate': 9.041382797154333e-06, 'epoch': 1.63} +2025-02-05 22:02:37 - ERROR - stderr - 54%|█████▍ | 12219/22434 [11:54:57<7:13:45, 2.55s/it] +2025-02-05 22:02:39 - ERROR - stderr - 54%|█████▍ | 12220/22434 [11:54:59<7:10:56, 2.53s/it] +2025-02-05 22:02:39 - ERROR - stderr - +2025-02-05 22:02:39 - ERROR - stderr - +2025-02-05 22:02:39 - INFO - stdout - {'loss': 0.6801, 'grad_norm': 1.1092365980148315, 'learning_rate': 9.039945709469202e-06, 'epoch': 1.63} +2025-02-05 22:02:39 - ERROR - stderr - 54%|█████▍ | 12220/22434 [11:54:59<7:10:56, 2.53s/it] +2025-02-05 22:02:42 - ERROR - stderr - 54%|█████▍ | 12221/22434 [11:55:02<7:10:46, 2.53s/it] +2025-02-05 22:02:42 - ERROR - stderr - +2025-02-05 22:02:42 - ERROR - stderr - +2025-02-05 22:02:42 - INFO - stdout - {'loss': 0.6776, 'grad_norm': 1.1652475595474243, 'learning_rate': 9.038508641795485e-06, 'epoch': 1.63} +2025-02-05 22:02:42 - ERROR - stderr - 54%|█████▍ | 12221/22434 [11:55:02<7:10:46, 2.53s/it] +2025-02-05 22:02:44 - ERROR - stderr - 54%|█████▍ | 12222/22434 [11:55:04<7:10:37, 2.53s/it] +2025-02-05 22:02:45 - ERROR - stderr - +2025-02-05 22:02:45 - ERROR - stderr - +2025-02-05 22:02:45 - INFO - stdout - {'loss': 0.6465, 'grad_norm': 1.1675993204116821, 'learning_rate': 9.037071594163139e-06, 'epoch': 1.63} +2025-02-05 22:02:45 - ERROR - stderr - 54%|█████▍ | 12222/22434 [11:55:04<7:10:37, 2.53s/it] +2025-02-05 22:02:47 - ERROR - stderr - 54%|█████▍ | 12223/22434 [11:55:07<7:27:16, 2.63s/it] +2025-02-05 22:02:47 - ERROR - stderr - +2025-02-05 22:02:47 - ERROR - stderr - +2025-02-05 22:02:47 - INFO - stdout - {'loss': 0.6673, 'grad_norm': 1.207872748374939, 'learning_rate': 9.035634566602109e-06, 'epoch': 1.63} +2025-02-05 22:02:47 - ERROR - stderr - 54%|█████▍ | 12223/22434 [11:55:07<7:27:16, 2.63s/it] +2025-02-05 22:02:50 - ERROR - stderr - 54%|█████▍ | 12224/22434 [11:55:10<7:42:02, 2.72s/it] +2025-02-05 22:02:50 - ERROR - stderr - +2025-02-05 22:02:50 - ERROR - stderr - 
+2025-02-05 22:02:50 - INFO - stdout - {'loss': 0.7072, 'grad_norm': 1.093558430671692, 'learning_rate': 9.034197559142358e-06, 'epoch': 1.63} +2025-02-05 22:02:50 - ERROR - stderr - 54%|█████▍ | 12224/22434 [11:55:10<7:42:02, 2.72s/it] +2025-02-05 22:02:53 - ERROR - stderr - 54%|█████▍ | 12225/22434 [11:55:12<7:27:02, 2.63s/it] +2025-02-05 22:02:53 - ERROR - stderr - +2025-02-05 22:02:53 - ERROR - stderr - +2025-02-05 22:02:53 - INFO - stdout - {'loss': 0.7134, 'grad_norm': 1.2169008255004883, 'learning_rate': 9.03276057181383e-06, 'epoch': 1.63} +2025-02-05 22:02:53 - ERROR - stderr - 54%|█████▍ | 12225/22434 [11:55:12<7:27:02, 2.63s/it] +2025-02-05 22:02:56 - ERROR - stderr - 54%|█████▍ | 12226/22434 [11:55:15<7:42:32, 2.72s/it] +2025-02-05 22:02:56 - ERROR - stderr - +2025-02-05 22:02:56 - ERROR - stderr - +2025-02-05 22:02:56 - INFO - stdout - {'loss': 0.7122, 'grad_norm': 1.185446858406067, 'learning_rate': 9.031323604646488e-06, 'epoch': 1.63} +2025-02-05 22:02:56 - ERROR - stderr - 54%|█████▍ | 12226/22434 [11:55:15<7:42:32, 2.72s/it] +2025-02-05 22:02:58 - ERROR - stderr - 55%|█████▍ | 12227/22434 [11:55:18<7:31:11, 2.65s/it] +2025-02-05 22:02:58 - ERROR - stderr - +2025-02-05 22:02:58 - ERROR - stderr - +2025-02-05 22:02:58 - INFO - stdout - {'loss': 0.6924, 'grad_norm': 1.17899751663208, 'learning_rate': 9.029886657670275e-06, 'epoch': 1.64} +2025-02-05 22:02:58 - ERROR - stderr - 55%|█████▍ | 12227/22434 [11:55:18<7:31:11, 2.65s/it] +2025-02-05 22:03:01 - ERROR - stderr - 55%|█████▍ | 12228/22434 [11:55:20<7:25:15, 2.62s/it] +2025-02-05 22:03:01 - ERROR - stderr - +2025-02-05 22:03:01 - ERROR - stderr - +2025-02-05 22:03:01 - INFO - stdout - {'loss': 0.7446, 'grad_norm': 1.233406901359558, 'learning_rate': 9.028449730915146e-06, 'epoch': 1.64} +2025-02-05 22:03:01 - ERROR - stderr - 55%|█████▍ | 12228/22434 [11:55:20<7:25:15, 2.62s/it] +2025-02-05 22:03:03 - ERROR - stderr - 55%|█████▍ | 12229/22434 [11:55:23<7:19:40, 2.59s/it] +2025-02-05 22:03:03 - 
ERROR - stderr - +2025-02-05 22:03:03 - ERROR - stderr - +2025-02-05 22:03:03 - INFO - stdout - {'loss': 0.7183, 'grad_norm': 1.2549545764923096, 'learning_rate': 9.027012824411053e-06, 'epoch': 1.64} +2025-02-05 22:03:03 - ERROR - stderr - 55%|█████▍ | 12229/22434 [11:55:23<7:19:40, 2.59s/it] +2025-02-05 22:03:06 - ERROR - stderr - 55%|█████▍ | 12230/22434 [11:55:26<7:31:41, 2.66s/it] +2025-02-05 22:03:06 - ERROR - stderr - +2025-02-05 22:03:06 - ERROR - stderr - +2025-02-05 22:03:06 - INFO - stdout - {'loss': 0.7076, 'grad_norm': 1.2093199491500854, 'learning_rate': 9.02557593818795e-06, 'epoch': 1.64} +2025-02-05 22:03:06 - ERROR - stderr - 55%|█████▍ | 12230/22434 [11:55:26<7:31:41, 2.66s/it] +2025-02-05 22:03:08 - ERROR - stderr - 55%|█████▍ | 12231/22434 [11:55:28<7:21:38, 2.60s/it] +2025-02-05 22:03:08 - ERROR - stderr - +2025-02-05 22:03:08 - ERROR - stderr - +2025-02-05 22:03:08 - INFO - stdout - {'loss': 0.7126, 'grad_norm': 1.168333888053894, 'learning_rate': 9.024139072275779e-06, 'epoch': 1.64} +2025-02-05 22:03:08 - ERROR - stderr - 55%|█████▍ | 12231/22434 [11:55:28<7:21:38, 2.60s/it] +2025-02-05 22:03:11 - ERROR - stderr - 55%|█████▍ | 12232/22434 [11:55:31<7:25:35, 2.62s/it] +2025-02-05 22:03:11 - ERROR - stderr - +2025-02-05 22:03:11 - ERROR - stderr - +2025-02-05 22:03:11 - INFO - stdout - {'loss': 0.6721, 'grad_norm': 1.2756528854370117, 'learning_rate': 9.022702226704499e-06, 'epoch': 1.64} +2025-02-05 22:03:11 - ERROR - stderr - 55%|█████▍ | 12232/22434 [11:55:31<7:25:35, 2.62s/it] +2025-02-05 22:03:14 - ERROR - stderr - 55%|█████▍ | 12233/22434 [11:55:33<7:18:13, 2.58s/it] +2025-02-05 22:03:14 - ERROR - stderr - +2025-02-05 22:03:14 - ERROR - stderr - +2025-02-05 22:03:14 - INFO - stdout - {'loss': 0.6938, 'grad_norm': 1.1707675457000732, 'learning_rate': 9.021265401504053e-06, 'epoch': 1.64} +2025-02-05 22:03:14 - ERROR - stderr - 55%|█████▍ | 12233/22434 [11:55:33<7:18:13, 2.58s/it] +2025-02-05 22:03:16 - ERROR - stderr - 55%|█████▍ | 
12234/22434 [11:55:36<7:16:23, 2.57s/it] +2025-02-05 22:03:16 - ERROR - stderr - +2025-02-05 22:03:16 - ERROR - stderr - +2025-02-05 22:03:16 - INFO - stdout - {'loss': 0.7518, 'grad_norm': 1.2475755214691162, 'learning_rate': 9.019828596704394e-06, 'epoch': 1.64} +2025-02-05 22:03:16 - ERROR - stderr - 55%|█████▍ | 12234/22434 [11:55:36<7:16:23, 2.57s/it] +2025-02-05 22:03:19 - ERROR - stderr - 55%|█████▍ | 12235/22434 [11:55:38<7:09:30, 2.53s/it] +2025-02-05 22:03:19 - ERROR - stderr - +2025-02-05 22:03:19 - ERROR - stderr - +2025-02-05 22:03:19 - INFO - stdout - {'loss': 0.76, 'grad_norm': 1.2347018718719482, 'learning_rate': 9.018391812335473e-06, 'epoch': 1.64} +2025-02-05 22:03:19 - ERROR - stderr - 55%|█████▍ | 12235/22434 [11:55:38<7:09:30, 2.53s/it] +2025-02-05 22:03:21 - ERROR - stderr - 55%|█████▍ | 12236/22434 [11:55:41<7:14:42, 2.56s/it] +2025-02-05 22:03:21 - ERROR - stderr - +2025-02-05 22:03:21 - ERROR - stderr - +2025-02-05 22:03:21 - INFO - stdout - {'loss': 0.7487, 'grad_norm': 1.2987205982208252, 'learning_rate': 9.01695504842723e-06, 'epoch': 1.64} +2025-02-05 22:03:21 - ERROR - stderr - 55%|█████▍ | 12236/22434 [11:55:41<7:14:42, 2.56s/it] +2025-02-05 22:03:24 - ERROR - stderr - 55%|█████▍ | 12237/22434 [11:55:43<7:09:10, 2.53s/it] +2025-02-05 22:03:24 - ERROR - stderr - +2025-02-05 22:03:24 - ERROR - stderr - +2025-02-05 22:03:24 - INFO - stdout - {'loss': 0.7085, 'grad_norm': 1.2436975240707397, 'learning_rate': 9.015518305009623e-06, 'epoch': 1.64} +2025-02-05 22:03:24 - ERROR - stderr - 55%|█████▍ | 12237/22434 [11:55:43<7:09:10, 2.53s/it] +2025-02-05 22:03:26 - ERROR - stderr - 55%|█████▍ | 12238/22434 [11:55:46<7:17:51, 2.58s/it] +2025-02-05 22:03:26 - ERROR - stderr - +2025-02-05 22:03:26 - ERROR - stderr - +2025-02-05 22:03:26 - INFO - stdout - {'loss': 0.7111, 'grad_norm': 1.1979976892471313, 'learning_rate': 9.014081582112592e-06, 'epoch': 1.64} +2025-02-05 22:03:26 - ERROR - stderr - 55%|█████▍ | 12238/22434 [11:55:46<7:17:51, 
2.58s/it] +2025-02-05 22:03:29 - ERROR - stderr - 55%|█████▍ | 12239/22434 [11:55:49<7:14:20, 2.56s/it] +2025-02-05 22:03:29 - ERROR - stderr - +2025-02-05 22:03:29 - ERROR - stderr - +2025-02-05 22:03:29 - INFO - stdout - {'loss': 0.5911, 'grad_norm': 1.0305064916610718, 'learning_rate': 9.012644879766091e-06, 'epoch': 1.64} +2025-02-05 22:03:29 - ERROR - stderr - 55%|█████▍ | 12239/22434 [11:55:49<7:14:20, 2.56s/it] +2025-02-05 22:03:31 - ERROR - stderr - 55%|█████▍ | 12240/22434 [11:55:51<7:12:48, 2.55s/it] +2025-02-05 22:03:31 - ERROR - stderr - +2025-02-05 22:03:31 - ERROR - stderr - +2025-02-05 22:03:31 - INFO - stdout - {'loss': 0.7257, 'grad_norm': 1.2712054252624512, 'learning_rate': 9.011208198000058e-06, 'epoch': 1.64} +2025-02-05 22:03:31 - ERROR - stderr - 55%|█████▍ | 12240/22434 [11:55:51<7:12:48, 2.55s/it] +2025-02-05 22:03:34 - ERROR - stderr - 55%|█████▍ | 12241/22434 [11:55:54<7:11:59, 2.54s/it] +2025-02-05 22:03:34 - ERROR - stderr - +2025-02-05 22:03:34 - ERROR - stderr - +2025-02-05 22:03:34 - INFO - stdout - {'loss': 0.6658, 'grad_norm': 1.0698387622833252, 'learning_rate': 9.009771536844448e-06, 'epoch': 1.64} +2025-02-05 22:03:34 - ERROR - stderr - 55%|█████▍ | 12241/22434 [11:55:54<7:11:59, 2.54s/it] +2025-02-05 22:03:36 - ERROR - stderr - 55%|█████▍ | 12242/22434 [11:55:56<7:09:06, 2.53s/it] +2025-02-05 22:03:36 - ERROR - stderr - +2025-02-05 22:03:36 - ERROR - stderr - +2025-02-05 22:03:36 - INFO - stdout - {'loss': 0.7534, 'grad_norm': 1.1605802774429321, 'learning_rate': 9.008334896329199e-06, 'epoch': 1.64} +2025-02-05 22:03:36 - ERROR - stderr - 55%|█████▍ | 12242/22434 [11:55:56<7:09:06, 2.53s/it] +2025-02-05 22:03:39 - ERROR - stderr - 55%|█████▍ | 12243/22434 [11:55:59<7:04:50, 2.50s/it] +2025-02-05 22:03:39 - ERROR - stderr - +2025-02-05 22:03:39 - ERROR - stderr - +2025-02-05 22:03:39 - INFO - stdout - {'loss': 0.6507, 'grad_norm': 1.2932995557785034, 'learning_rate': 9.006898276484264e-06, 'epoch': 1.64} +2025-02-05 22:03:39 - 
ERROR - stderr - 55%|█████▍ | 12243/22434 [11:55:59<7:04:50, 2.50s/it] +2025-02-05 22:03:41 - ERROR - stderr - 55%|█████▍ | 12244/22434 [11:56:01<7:02:35, 2.49s/it] +2025-02-05 22:03:41 - ERROR - stderr - +2025-02-05 22:03:41 - ERROR - stderr - +2025-02-05 22:03:41 - INFO - stdout - {'loss': 0.6916, 'grad_norm': 1.2700107097625732, 'learning_rate': 9.00546167733958e-06, 'epoch': 1.64} +2025-02-05 22:03:41 - ERROR - stderr - 55%|█████▍ | 12244/22434 [11:56:01<7:02:35, 2.49s/it] +2025-02-05 22:03:44 - ERROR - stderr - 55%|█████▍ | 12245/22434 [11:56:04<7:04:47, 2.50s/it] +2025-02-05 22:03:44 - ERROR - stderr - +2025-02-05 22:03:44 - ERROR - stderr - +2025-02-05 22:03:44 - INFO - stdout - {'loss': 0.771, 'grad_norm': 1.2535593509674072, 'learning_rate': 9.004025098925099e-06, 'epoch': 1.64} +2025-02-05 22:03:44 - ERROR - stderr - 55%|█████▍ | 12245/22434 [11:56:04<7:04:47, 2.50s/it] +2025-02-05 22:03:46 - ERROR - stderr - 55%|█████▍ | 12246/22434 [11:56:06<7:14:26, 2.56s/it] +2025-02-05 22:03:47 - ERROR - stderr - +2025-02-05 22:03:47 - ERROR - stderr - +2025-02-05 22:03:47 - INFO - stdout - {'loss': 0.6334, 'grad_norm': 1.243652582168579, 'learning_rate': 9.002588541270758e-06, 'epoch': 1.64} +2025-02-05 22:03:47 - ERROR - stderr - 55%|█████▍ | 12246/22434 [11:56:06<7:14:26, 2.56s/it] +2025-02-05 22:03:49 - ERROR - stderr - 55%|█████▍ | 12247/22434 [11:56:09<7:13:35, 2.55s/it] +2025-02-05 22:03:49 - ERROR - stderr - +2025-02-05 22:03:49 - ERROR - stderr - +2025-02-05 22:03:49 - INFO - stdout - {'loss': 0.6161, 'grad_norm': 1.2627640962600708, 'learning_rate': 9.00115200440651e-06, 'epoch': 1.64} +2025-02-05 22:03:49 - ERROR - stderr - 55%|█████▍ | 12247/22434 [11:56:09<7:13:35, 2.55s/it] +2025-02-05 22:03:52 - ERROR - stderr - 55%|█████▍ | 12248/22434 [11:56:11<7:16:02, 2.57s/it] +2025-02-05 22:03:52 - ERROR - stderr - +2025-02-05 22:03:52 - ERROR - stderr - +2025-02-05 22:03:52 - INFO - stdout - {'loss': 0.6409, 'grad_norm': 1.1325398683547974, 'learning_rate': 
8.999715488362288e-06, 'epoch': 1.64} +2025-02-05 22:03:52 - ERROR - stderr - 55%|█████▍ | 12248/22434 [11:56:11<7:16:02, 2.57s/it] +2025-02-05 22:03:54 - ERROR - stderr - 55%|█████▍ | 12249/22434 [11:56:14<7:11:12, 2.54s/it] +2025-02-05 22:03:54 - ERROR - stderr - +2025-02-05 22:03:54 - ERROR - stderr - +2025-02-05 22:03:54 - INFO - stdout - {'loss': 0.7208, 'grad_norm': 1.186276912689209, 'learning_rate': 8.99827899316804e-06, 'epoch': 1.64} +2025-02-05 22:03:54 - ERROR - stderr - 55%|█████▍ | 12249/22434 [11:56:14<7:11:12, 2.54s/it] +2025-02-05 22:03:57 - ERROR - stderr - 55%|█████▍ | 12250/22434 [11:56:16<7:11:31, 2.54s/it] +2025-02-05 22:03:57 - ERROR - stderr - +2025-02-05 22:03:57 - ERROR - stderr - +2025-02-05 22:03:57 - INFO - stdout - {'loss': 0.6919, 'grad_norm': 1.3243136405944824, 'learning_rate': 8.99684251885371e-06, 'epoch': 1.64} +2025-02-05 22:03:57 - ERROR - stderr - 55%|█████▍ | 12250/22434 [11:56:16<7:11:31, 2.54s/it] +2025-02-05 22:03:59 - ERROR - stderr - 55%|█████▍ | 12251/22434 [11:56:19<7:08:56, 2.53s/it] +2025-02-05 22:03:59 - ERROR - stderr - +2025-02-05 22:03:59 - ERROR - stderr - +2025-02-05 22:03:59 - INFO - stdout - {'loss': 0.6747, 'grad_norm': 1.171627163887024, 'learning_rate': 8.995406065449238e-06, 'epoch': 1.64} +2025-02-05 22:03:59 - ERROR - stderr - 55%|█████▍ | 12251/22434 [11:56:19<7:08:56, 2.53s/it] +2025-02-05 22:04:02 - ERROR - stderr - 55%|█████▍ | 12252/22434 [11:56:21<7:07:50, 2.52s/it] +2025-02-05 22:04:02 - ERROR - stderr - +2025-02-05 22:04:02 - ERROR - stderr - +2025-02-05 22:04:02 - INFO - stdout - {'loss': 0.7452, 'grad_norm': 1.2558014392852783, 'learning_rate': 8.993969632984561e-06, 'epoch': 1.64} +2025-02-05 22:04:02 - ERROR - stderr - 55%|█████▍ | 12252/22434 [11:56:21<7:07:50, 2.52s/it] +2025-02-05 22:04:04 - ERROR - stderr - 55%|█████▍ | 12253/22434 [11:56:24<7:04:42, 2.50s/it] +2025-02-05 22:04:04 - ERROR - stderr - +2025-02-05 22:04:04 - ERROR - stderr - +2025-02-05 22:04:04 - INFO - stdout - {'loss': 
0.6912, 'grad_norm': 1.1844756603240967, 'learning_rate': 8.992533221489628e-06, 'epoch': 1.64} +2025-02-05 22:04:04 - ERROR - stderr - 55%|█████▍ | 12253/22434 [11:56:24<7:04:42, 2.50s/it] +2025-02-05 22:04:07 - ERROR - stderr - 55%|█████▍ | 12254/22434 [11:56:26<7:01:47, 2.49s/it] +2025-02-05 22:04:07 - ERROR - stderr - +2025-02-05 22:04:07 - ERROR - stderr - +2025-02-05 22:04:07 - INFO - stdout - {'loss': 0.7401, 'grad_norm': 1.3822500705718994, 'learning_rate': 8.991096830994375e-06, 'epoch': 1.64} +2025-02-05 22:04:07 - ERROR - stderr - 55%|█████▍ | 12254/22434 [11:56:26<7:01:47, 2.49s/it] +2025-02-05 22:04:09 - ERROR - stderr - 55%|█████▍ | 12255/22434 [11:56:29<7:09:39, 2.53s/it] +2025-02-05 22:04:09 - ERROR - stderr - +2025-02-05 22:04:09 - ERROR - stderr - +2025-02-05 22:04:09 - INFO - stdout - {'loss': 0.7836, 'grad_norm': 1.3259596824645996, 'learning_rate': 8.989660461528743e-06, 'epoch': 1.64} +2025-02-05 22:04:09 - ERROR - stderr - 55%|█████▍ | 12255/22434 [11:56:29<7:09:39, 2.53s/it] +2025-02-05 22:04:12 - ERROR - stderr - 55%|█████▍ | 12256/22434 [11:56:32<7:13:55, 2.56s/it] +2025-02-05 22:04:12 - ERROR - stderr - +2025-02-05 22:04:12 - ERROR - stderr - +2025-02-05 22:04:12 - INFO - stdout - {'loss': 0.7008, 'grad_norm': 1.1490412950515747, 'learning_rate': 8.988224113122675e-06, 'epoch': 1.64} +2025-02-05 22:04:12 - ERROR - stderr - 55%|█████▍ | 12256/22434 [11:56:32<7:13:55, 2.56s/it] +2025-02-05 22:04:14 - ERROR - stderr - 55%|█████▍ | 12257/22434 [11:56:34<7:09:57, 2.53s/it] +2025-02-05 22:04:14 - ERROR - stderr - +2025-02-05 22:04:14 - ERROR - stderr - +2025-02-05 22:04:14 - INFO - stdout - {'loss': 0.7262, 'grad_norm': 1.3420923948287964, 'learning_rate': 8.986787785806102e-06, 'epoch': 1.64} +2025-02-05 22:04:14 - ERROR - stderr - 55%|█████▍ | 12257/22434 [11:56:34<7:09:57, 2.53s/it] +2025-02-05 22:04:17 - ERROR - stderr - 55%|█████▍ | 12258/22434 [11:56:37<7:19:58, 2.59s/it] +2025-02-05 22:04:17 - ERROR - stderr - +2025-02-05 22:04:17 - 
ERROR - stderr - +2025-02-05 22:04:17 - INFO - stdout - {'loss': 0.7275, 'grad_norm': 1.2320441007614136, 'learning_rate': 8.985351479608972e-06, 'epoch': 1.64} +2025-02-05 22:04:17 - ERROR - stderr - 55%|█████▍ | 12258/22434 [11:56:37<7:19:58, 2.59s/it] +2025-02-05 22:04:20 - ERROR - stderr - 55%|█████▍ | 12259/22434 [11:56:39<7:17:12, 2.58s/it] +2025-02-05 22:04:20 - ERROR - stderr - +2025-02-05 22:04:20 - ERROR - stderr - +2025-02-05 22:04:20 - INFO - stdout - {'loss': 0.6574, 'grad_norm': 1.1409752368927002, 'learning_rate': 8.983915194561218e-06, 'epoch': 1.64} +2025-02-05 22:04:20 - ERROR - stderr - 55%|█████▍ | 12259/22434 [11:56:39<7:17:12, 2.58s/it] +2025-02-05 22:04:22 - ERROR - stderr - 55%|█████▍ | 12260/22434 [11:56:42<7:17:17, 2.58s/it] +2025-02-05 22:04:22 - ERROR - stderr - +2025-02-05 22:04:22 - ERROR - stderr - +2025-02-05 22:04:22 - INFO - stdout - {'loss': 0.6938, 'grad_norm': 1.1920838356018066, 'learning_rate': 8.98247893069278e-06, 'epoch': 1.64} +2025-02-05 22:04:22 - ERROR - stderr - 55%|█████▍ | 12260/22434 [11:56:42<7:17:17, 2.58s/it] +2025-02-05 22:04:25 - ERROR - stderr - 55%|█████▍ | 12261/22434 [11:56:44<7:16:16, 2.57s/it] +2025-02-05 22:04:25 - ERROR - stderr - +2025-02-05 22:04:25 - ERROR - stderr - +2025-02-05 22:04:25 - INFO - stdout - {'loss': 0.6736, 'grad_norm': 1.2273805141448975, 'learning_rate': 8.981042688033593e-06, 'epoch': 1.64} +2025-02-05 22:04:25 - ERROR - stderr - 55%|█████▍ | 12261/22434 [11:56:45<7:16:16, 2.57s/it] +2025-02-05 22:04:27 - ERROR - stderr - 55%|█████▍ | 12262/22434 [11:56:47<7:12:30, 2.55s/it] +2025-02-05 22:04:27 - ERROR - stderr - +2025-02-05 22:04:27 - ERROR - stderr - +2025-02-05 22:04:27 - INFO - stdout - {'loss': 0.6376, 'grad_norm': 1.1600852012634277, 'learning_rate': 8.979606466613596e-06, 'epoch': 1.64} +2025-02-05 22:04:27 - ERROR - stderr - 55%|█████▍ | 12262/22434 [11:56:47<7:12:30, 2.55s/it] +2025-02-05 22:04:30 - ERROR - stderr - 55%|█████▍ | 12263/22434 [11:56:50<7:15:15, 2.57s/it]
+2025-02-05 22:04:30 - ERROR - stderr - +2025-02-05 22:04:30 - ERROR - stderr - +2025-02-05 22:04:30 - INFO - stdout - {'loss': 0.6361, 'grad_norm': 1.200808048248291, 'learning_rate': 8.97817026646273e-06, 'epoch': 1.64} +2025-02-05 22:04:30 - ERROR - stderr - 55%|█████▍ | 12263/22434 [11:56:50<7:15:15, 2.57s/it] +2025-02-05 22:04:32 - ERROR - stderr - 55%|█████▍ | 12264/22434 [11:56:52<7:09:52, 2.54s/it] +2025-02-05 22:04:32 - ERROR - stderr - +2025-02-05 22:04:32 - ERROR - stderr - +2025-02-05 22:04:32 - INFO - stdout - {'loss': 0.6929, 'grad_norm': 1.217524528503418, 'learning_rate': 8.976734087610925e-06, 'epoch': 1.64} +2025-02-05 22:04:32 - ERROR - stderr - 55%|█████▍ | 12264/22434 [11:56:52<7:09:52, 2.54s/it] +2025-02-05 22:04:35 - ERROR - stderr - 55%|█████▍ | 12265/22434 [11:56:55<7:15:28, 2.57s/it] +2025-02-05 22:04:35 - ERROR - stderr - +2025-02-05 22:04:35 - ERROR - stderr - +2025-02-05 22:04:35 - INFO - stdout - {'loss': 0.5904, 'grad_norm': 1.1046650409698486, 'learning_rate': 8.975297930088116e-06, 'epoch': 1.64} +2025-02-05 22:04:35 - ERROR - stderr - 55%|█████▍ | 12265/22434 [11:56:55<7:15:28, 2.57s/it] +2025-02-05 22:04:35 - WARNING - transformers.tokenization_utils_base - Token indices sequence length is longer than the specified maximum sequence length for this model (2878 > 2048). Running this sequence through the model will result in indexing errors +2025-02-05 22:04:35 - WARNING - transformers.tokenization_utils_base - Token indices sequence length is longer than the specified maximum sequence length for this model (2878 > 2048). 
Running this sequence through the model will result in indexing errors +2025-02-05 22:04:37 - ERROR - stderr - 55%|█████▍ | 12266/22434 [11:56:57<7:08:08, 2.53s/it] +2025-02-05 22:04:37 - ERROR - stderr - +2025-02-05 22:04:37 - ERROR - stderr - +2025-02-05 22:04:37 - INFO - stdout - {'loss': 0.7478, 'grad_norm': 1.289227843284607, 'learning_rate': 8.973861793924246e-06, 'epoch': 1.64} +2025-02-05 22:04:37 - ERROR - stderr - 55%|█████▍ | 12266/22434 [11:56:57<7:08:08, 2.53s/it] +2025-02-05 22:04:43 - ERROR - stderr - 55%|█████▍ | 12267/22434 [11:57:03<9:50:40, 3.49s/it] +2025-02-05 22:04:43 - ERROR - stderr - +2025-02-05 22:04:43 - ERROR - stderr - +2025-02-05 22:04:43 - INFO - stdout - {'loss': 0.6179, 'grad_norm': 1.2591333389282227, 'learning_rate': 8.97242567914924e-06, 'epoch': 1.64} +2025-02-05 22:04:43 - ERROR - stderr - 55%|█████▍ | 12267/22434 [11:57:03<9:50:40, 3.49s/it] +2025-02-05 22:04:46 - ERROR - stderr - 55%|█████▍ | 12268/22434 [11:57:05<9:01:42, 3.20s/it] +2025-02-05 22:04:46 - ERROR - stderr - +2025-02-05 22:04:46 - ERROR - stderr - +2025-02-05 22:04:46 - INFO - stdout - {'loss': 0.7256, 'grad_norm': 1.3970115184783936, 'learning_rate': 8.970989585793039e-06, 'epoch': 1.64} +2025-02-05 22:04:46 - ERROR - stderr - 55%|█████▍ | 12268/22434 [11:57:05<9:01:42, 3.20s/it] +2025-02-05 22:04:48 - ERROR - stderr - 55%|█████▍ | 12269/22434 [11:57:08<8:23:55, 2.97s/it] +2025-02-05 22:04:48 - ERROR - stderr - +2025-02-05 22:04:48 - ERROR - stderr - +2025-02-05 22:04:48 - INFO - stdout - {'loss': 0.5993, 'grad_norm': 1.0967646837234497, 'learning_rate': 8.969553513885578e-06, 'epoch': 1.64} +2025-02-05 22:04:48 - ERROR - stderr - 55%|█████▍ | 12269/22434 [11:57:08<8:23:55, 2.97s/it] +2025-02-05 22:04:51 - ERROR - stderr - 55%|█████▍ | 12270/22434 [11:57:10<8:00:34, 2.84s/it] +2025-02-05 22:04:51 - ERROR - stderr - +2025-02-05 22:04:51 - ERROR - stderr - +2025-02-05 22:04:51 - INFO - stdout - {'loss': 0.6389, 'grad_norm': 1.205810546875, 'learning_rate': 
8.968117463456784e-06, 'epoch': 1.64} +2025-02-05 22:04:51 - ERROR - stderr - 55%|█████▍ | 12270/22434 [11:57:10<8:00:34, 2.84s/it] +2025-02-05 22:04:53 - ERROR - stderr - 55%|█████▍ | 12271/22434 [11:57:13<7:46:20, 2.75s/it] +2025-02-05 22:04:53 - ERROR - stderr - +2025-02-05 22:04:53 - ERROR - stderr - +2025-02-05 22:04:53 - INFO - stdout - {'loss': 0.7589, 'grad_norm': 1.2053886651992798, 'learning_rate': 8.966681434536599e-06, 'epoch': 1.64} +2025-02-05 22:04:53 - ERROR - stderr - 55%|█████▍ | 12271/22434 [11:57:13<7:46:20, 2.75s/it] +2025-02-05 22:04:56 - ERROR - stderr - 55%|█████▍ | 12272/22434 [11:57:15<7:35:52, 2.69s/it] +2025-02-05 22:04:56 - ERROR - stderr - +2025-02-05 22:04:56 - ERROR - stderr - +2025-02-05 22:04:56 - INFO - stdout - {'loss': 0.6131, 'grad_norm': 1.1467087268829346, 'learning_rate': 8.965245427154948e-06, 'epoch': 1.64} +2025-02-05 22:04:56 - ERROR - stderr - 55%|█████▍ | 12272/22434 [11:57:16<7:35:52, 2.69s/it] +2025-02-05 22:04:58 - ERROR - stderr - 55%|█████▍ | 12273/22434 [11:57:18<7:23:20, 2.62s/it] +2025-02-05 22:04:58 - ERROR - stderr - +2025-02-05 22:04:58 - ERROR - stderr - +2025-02-05 22:04:58 - INFO - stdout - {'loss': 0.7084, 'grad_norm': 1.2030466794967651, 'learning_rate': 8.963809441341764e-06, 'epoch': 1.64} +2025-02-05 22:04:58 - ERROR - stderr - 55%|█████▍ | 12273/22434 [11:57:18<7:23:20, 2.62s/it] +2025-02-05 22:05:01 - ERROR - stderr - 55%|█████▍ | 12274/22434 [11:57:20<7:16:15, 2.58s/it] +2025-02-05 22:05:01 - ERROR - stderr - +2025-02-05 22:05:01 - ERROR - stderr - +2025-02-05 22:05:01 - INFO - stdout - {'loss': 0.7696, 'grad_norm': 1.3350441455841064, 'learning_rate': 8.962373477126983e-06, 'epoch': 1.64} +2025-02-05 22:05:01 - ERROR - stderr - 55%|█████▍ | 12274/22434 [11:57:20<7:16:15, 2.58s/it] +2025-02-05 22:05:03 - ERROR - stderr - 55%|█████▍ | 12275/22434 [11:57:23<7:15:09, 2.57s/it] +2025-02-05 22:05:03 - ERROR - stderr - +2025-02-05 22:05:03 - ERROR - stderr - +2025-02-05 22:05:03 - INFO - stdout - 
{'loss': 0.7451, 'grad_norm': 1.2633978128433228, 'learning_rate': 8.960937534540537e-06, 'epoch': 1.64} +2025-02-05 22:05:03 - ERROR - stderr - 55%|█████▍ | 12275/22434 [11:57:23<7:15:09, 2.57s/it] +2025-02-05 22:05:06 - ERROR - stderr - 55%|█████▍ | 12276/22434 [11:57:25<7:12:33, 2.55s/it] +2025-02-05 22:05:06 - ERROR - stderr - +2025-02-05 22:05:06 - ERROR - stderr - +2025-02-05 22:05:06 - INFO - stdout - {'loss': 0.7369, 'grad_norm': 1.2714512348175049, 'learning_rate': 8.959501613612347e-06, 'epoch': 1.64} +2025-02-05 22:05:06 - ERROR - stderr - 55%|█████▍ | 12276/22434 [11:57:26<7:12:33, 2.55s/it] +2025-02-05 22:05:08 - ERROR - stderr - 55%|█████▍ | 12277/22434 [11:57:28<7:06:56, 2.52s/it] +2025-02-05 22:05:08 - ERROR - stderr - +2025-02-05 22:05:08 - ERROR - stderr - +2025-02-05 22:05:08 - INFO - stdout - {'loss': 0.6442, 'grad_norm': 1.2392311096191406, 'learning_rate': 8.958065714372355e-06, 'epoch': 1.64} +2025-02-05 22:05:08 - ERROR - stderr - 55%|█████▍ | 12277/22434 [11:57:28<7:06:56, 2.52s/it] +2025-02-05 22:05:11 - ERROR - stderr - 55%|█████▍ | 12278/22434 [11:57:30<7:05:58, 2.52s/it] +2025-02-05 22:05:11 - ERROR - stderr - +2025-02-05 22:05:11 - ERROR - stderr - +2025-02-05 22:05:11 - INFO - stdout - {'loss': 0.6685, 'grad_norm': 1.2752341032028198, 'learning_rate': 8.956629836850482e-06, 'epoch': 1.64} +2025-02-05 22:05:11 - ERROR - stderr - 55%|█████▍ | 12278/22434 [11:57:30<7:05:58, 2.52s/it] +2025-02-05 22:05:13 - ERROR - stderr - 55%|█████▍ | 12279/22434 [11:57:33<7:06:04, 2.52s/it] +2025-02-05 22:05:13 - ERROR - stderr - +2025-02-05 22:05:13 - ERROR - stderr - +2025-02-05 22:05:13 - INFO - stdout - {'loss': 0.7191, 'grad_norm': 1.2576552629470825, 'learning_rate': 8.955193981076666e-06, 'epoch': 1.64} +2025-02-05 22:05:13 - ERROR - stderr - 55%|█████▍ | 12279/22434 [11:57:33<7:06:04, 2.52s/it] +2025-02-05 22:05:16 - ERROR - stderr - 55%|█████▍ | 12280/22434 [11:57:35<7:03:58, 2.51s/it] +2025-02-05 22:05:16 - ERROR - stderr - +2025-02-05 
22:05:16 - ERROR - stderr - +2025-02-05 22:05:16 - INFO - stdout - {'loss': 0.7125, 'grad_norm': 1.302627444267273, 'learning_rate': 8.95375814708083e-06, 'epoch': 1.64} +2025-02-05 22:05:16 - ERROR - stderr - 55%|█████▍ | 12280/22434 [11:57:35<7:03:58, 2.51s/it] +2025-02-05 22:05:18 - ERROR - stderr - 55%|█████▍ | 12281/22434 [11:57:38<7:02:52, 2.50s/it] +2025-02-05 22:05:18 - ERROR - stderr - +2025-02-05 22:05:18 - ERROR - stderr - +2025-02-05 22:05:18 - INFO - stdout - {'loss': 0.6962, 'grad_norm': 1.2063794136047363, 'learning_rate': 8.952322334892903e-06, 'epoch': 1.64} +2025-02-05 22:05:18 - ERROR - stderr - 55%|█████▍ | 12281/22434 [11:57:38<7:02:52, 2.50s/it] +2025-02-05 22:05:21 - ERROR - stderr - 55%|█████▍ | 12282/22434 [11:57:40<7:05:58, 2.52s/it] +2025-02-05 22:05:21 - ERROR - stderr - +2025-02-05 22:05:21 - ERROR - stderr - +2025-02-05 22:05:21 - INFO - stdout - {'loss': 0.7505, 'grad_norm': 1.583531141281128, 'learning_rate': 8.950886544542817e-06, 'epoch': 1.64} +2025-02-05 22:05:21 - ERROR - stderr - 55%|█████▍ | 12282/22434 [11:57:40<7:05:58, 2.52s/it] +2025-02-05 22:05:23 - ERROR - stderr - 55%|█████▍ | 12283/22434 [11:57:43<7:03:58, 2.51s/it] +2025-02-05 22:05:23 - ERROR - stderr - +2025-02-05 22:05:23 - ERROR - stderr - +2025-02-05 22:05:23 - INFO - stdout - {'loss': 0.6695, 'grad_norm': 1.3122018575668335, 'learning_rate': 8.949450776060498e-06, 'epoch': 1.64} +2025-02-05 22:05:23 - ERROR - stderr - 55%|█████▍ | 12283/22434 [11:57:43<7:03:58, 2.51s/it] +2025-02-05 22:05:26 - ERROR - stderr - 55%|█████▍ | 12284/22434 [11:57:45<7:04:22, 2.51s/it] +2025-02-05 22:05:26 - ERROR - stderr - +2025-02-05 22:05:26 - ERROR - stderr - +2025-02-05 22:05:26 - INFO - stdout - {'loss': 0.7243, 'grad_norm': 1.2292333841323853, 'learning_rate': 8.948015029475866e-06, 'epoch': 1.64} +2025-02-05 22:05:26 - ERROR - stderr - 55%|█████▍ | 12284/22434 [11:57:45<7:04:22, 2.51s/it] +2025-02-05 22:05:28 - ERROR - stderr - 55%|█████▍ | 12285/22434 [11:57:48<7:06:54, 
2.52s/it] +2025-02-05 22:05:28 - ERROR - stderr - +2025-02-05 22:05:28 - ERROR - stderr - +2025-02-05 22:05:28 - INFO - stdout - {'loss': 0.6807, 'grad_norm': 1.124380350112915, 'learning_rate': 8.946579304818863e-06, 'epoch': 1.64} +2025-02-05 22:05:28 - ERROR - stderr - 55%|█████▍ | 12285/22434 [11:57:48<7:06:54, 2.52s/it] +2025-02-05 22:05:31 - ERROR - stderr - 55%|█████▍ | 12286/22434 [11:57:51<7:14:48, 2.57s/it] +2025-02-05 22:05:31 - ERROR - stderr - +2025-02-05 22:05:31 - ERROR - stderr - +2025-02-05 22:05:31 - INFO - stdout - {'loss': 0.6407, 'grad_norm': 1.2706211805343628, 'learning_rate': 8.945143602119397e-06, 'epoch': 1.64} +2025-02-05 22:05:31 - ERROR - stderr - 55%|█████▍ | 12286/22434 [11:57:51<7:14:48, 2.57s/it] +2025-02-05 22:05:33 - ERROR - stderr - 55%|█████▍ | 12287/22434 [11:57:53<7:08:23, 2.53s/it] +2025-02-05 22:05:33 - ERROR - stderr - +2025-02-05 22:05:33 - ERROR - stderr - +2025-02-05 22:05:33 - INFO - stdout - {'loss': 0.6725, 'grad_norm': 1.2830673456192017, 'learning_rate': 8.943707921407408e-06, 'epoch': 1.64} +2025-02-05 22:05:33 - ERROR - stderr - 55%|█████▍ | 12287/22434 [11:57:53<7:08:23, 2.53s/it] +2025-02-05 22:05:36 - ERROR - stderr - 55%|█████▍ | 12288/22434 [11:57:56<7:13:28, 2.56s/it] +2025-02-05 22:05:36 - ERROR - stderr - +2025-02-05 22:05:36 - ERROR - stderr - +2025-02-05 22:05:36 - INFO - stdout - {'loss': 0.826, 'grad_norm': 1.3369498252868652, 'learning_rate': 8.94227226271282e-06, 'epoch': 1.64} +2025-02-05 22:05:36 - ERROR - stderr - 55%|█████▍ | 12288/22434 [11:57:56<7:13:28, 2.56s/it] +2025-02-05 22:05:39 - ERROR - stderr - 55%|█████▍ | 12289/22434 [11:57:58<7:11:33, 2.55s/it] +2025-02-05 22:05:39 - ERROR - stderr - +2025-02-05 22:05:39 - ERROR - stderr - +2025-02-05 22:05:39 - INFO - stdout - {'loss': 0.7315, 'grad_norm': 1.1838973760604858, 'learning_rate': 8.940836626065547e-06, 'epoch': 1.64} +2025-02-05 22:05:39 - ERROR - stderr - 55%|█████▍ | 12289/22434 [11:57:58<7:11:33, 2.55s/it] +2025-02-05 22:05:41 - 
ERROR - stderr - 55%|█████▍ | 12290/22434 [11:58:01<7:09:04, 2.54s/it] +2025-02-05 22:05:41 - ERROR - stderr - +2025-02-05 22:05:41 - ERROR - stderr - +2025-02-05 22:05:41 - INFO - stdout - {'loss': 0.6231, 'grad_norm': 1.2631181478500366, 'learning_rate': 8.939401011495527e-06, 'epoch': 1.64} +2025-02-05 22:05:41 - ERROR - stderr - 55%|█████▍ | 12290/22434 [11:58:01<7:09:04, 2.54s/it] +2025-02-05 22:05:43 - ERROR - stderr - 55%|█████▍ | 12291/22434 [11:58:03<7:03:50, 2.51s/it] +2025-02-05 22:05:44 - ERROR - stderr - +2025-02-05 22:05:44 - ERROR - stderr - +2025-02-05 22:05:44 - INFO - stdout - {'loss': 0.6267, 'grad_norm': 1.1619325876235962, 'learning_rate': 8.937965419032677e-06, 'epoch': 1.64} +2025-02-05 22:05:44 - ERROR - stderr - 55%|█████▍ | 12291/22434 [11:58:03<7:03:50, 2.51s/it] +2025-02-05 22:05:46 - ERROR - stderr - 55%|█████▍ | 12292/22434 [11:58:06<7:03:02, 2.50s/it] +2025-02-05 22:05:46 - ERROR - stderr - +2025-02-05 22:05:46 - ERROR - stderr - +2025-02-05 22:05:46 - INFO - stdout - {'loss': 0.7537, 'grad_norm': 1.1883012056350708, 'learning_rate': 8.936529848706919e-06, 'epoch': 1.64} +2025-02-05 22:05:46 - ERROR - stderr - 55%|█████▍ | 12292/22434 [11:58:06<7:03:02, 2.50s/it] +2025-02-05 22:05:48 - ERROR - stderr - 55%|█████▍ | 12293/22434 [11:58:08<7:04:41, 2.51s/it] +2025-02-05 22:05:49 - ERROR - stderr - +2025-02-05 22:05:49 - ERROR - stderr - +2025-02-05 22:05:49 - INFO - stdout - {'loss': 0.694, 'grad_norm': 1.2511861324310303, 'learning_rate': 8.93509430054818e-06, 'epoch': 1.64} +2025-02-05 22:05:49 - ERROR - stderr - 55%|█████▍ | 12293/22434 [11:58:08<7:04:41, 2.51s/it] +2025-02-05 22:05:51 - ERROR - stderr - 55%|█████▍ | 12294/22434 [11:58:11<7:05:21, 2.52s/it] +2025-02-05 22:05:51 - ERROR - stderr - +2025-02-05 22:05:51 - ERROR - stderr - +2025-02-05 22:05:51 - INFO - stdout - {'loss': 0.6999, 'grad_norm': 1.2818245887756348, 'learning_rate': 8.933658774586381e-06, 'epoch': 1.64} +2025-02-05 22:05:51 - ERROR - stderr - 55%|█████▍ | 
12294/22434 [11:58:11<7:05:21, 2.52s/it] +2025-02-05 22:05:53 - ERROR - stderr - 55%|█████▍ | 12295/22434 [11:58:13<7:01:05, 2.49s/it] +2025-02-05 22:05:53 - ERROR - stderr - +2025-02-05 22:05:53 - ERROR - stderr - +2025-02-05 22:05:53 - INFO - stdout - {'loss': 0.6793, 'grad_norm': 1.157547116279602, 'learning_rate': 8.932223270851445e-06, 'epoch': 1.64} +2025-02-05 22:05:53 - ERROR - stderr - 55%|█████▍ | 12295/22434 [11:58:13<7:01:05, 2.49s/it] +2025-02-05 22:05:56 - ERROR - stderr - 55%|█████▍ | 12296/22434 [11:58:16<7:01:54, 2.50s/it] +2025-02-05 22:05:56 - ERROR - stderr - +2025-02-05 22:05:56 - ERROR - stderr - +2025-02-05 22:05:56 - INFO - stdout - {'loss': 0.7135, 'grad_norm': 1.2189162969589233, 'learning_rate': 8.930787789373296e-06, 'epoch': 1.64} +2025-02-05 22:05:56 - ERROR - stderr - 55%|█████▍ | 12296/22434 [11:58:16<7:01:54, 2.50s/it] +2025-02-05 22:05:59 - ERROR - stderr - 55%|█████▍ | 12297/22434 [11:58:18<7:04:40, 2.51s/it] +2025-02-05 22:05:59 - ERROR - stderr - +2025-02-05 22:05:59 - ERROR - stderr - +2025-02-05 22:05:59 - INFO - stdout - {'loss': 0.7083, 'grad_norm': 1.1448163986206055, 'learning_rate': 8.929352330181847e-06, 'epoch': 1.64} +2025-02-05 22:05:59 - ERROR - stderr - 55%|█████▍ | 12297/22434 [11:58:18<7:04:40, 2.51s/it] +2025-02-05 22:06:01 - ERROR - stderr - 55%|█████▍ | 12298/22434 [11:58:21<7:05:14, 2.52s/it] +2025-02-05 22:06:01 - ERROR - stderr - +2025-02-05 22:06:01 - ERROR - stderr - +2025-02-05 22:06:01 - INFO - stdout - {'loss': 0.6745, 'grad_norm': 1.21192467212677, 'learning_rate': 8.92791689330703e-06, 'epoch': 1.64} +2025-02-05 22:06:01 - ERROR - stderr - 55%|█████▍ | 12298/22434 [11:58:21<7:05:14, 2.52s/it] +2025-02-05 22:06:04 - ERROR - stderr - 55%|█████▍ | 12299/22434 [11:58:23<7:03:38, 2.51s/it] +2025-02-05 22:06:04 - ERROR - stderr - +2025-02-05 22:06:04 - ERROR - stderr - +2025-02-05 22:06:04 - INFO - stdout - {'loss': 0.6883, 'grad_norm': 1.4186244010925293, 'learning_rate': 8.926481478778756e-06, 'epoch': 
1.64} +2025-02-05 22:06:04 - ERROR - stderr - 55%|█████▍ | 12299/22434 [11:58:23<7:03:38, 2.51s/it] +2025-02-05 22:06:06 - ERROR - stderr - 55%|█████▍ | 12300/22434 [11:58:26<7:02:19, 2.50s/it] +2025-02-05 22:06:06 - ERROR - stderr - +2025-02-05 22:06:06 - ERROR - stderr - +2025-02-05 22:06:06 - INFO - stdout - {'loss': 0.6446, 'grad_norm': 1.0672227144241333, 'learning_rate': 8.925046086626945e-06, 'epoch': 1.64} +2025-02-05 22:06:06 - ERROR - stderr - 55%|█████▍ | 12300/22434 [11:58:26<7:02:19, 2.50s/it] +2025-02-05 22:06:08 - ERROR - stderr - 55%|█████▍ | 12301/22434 [11:58:28<6:59:58, 2.49s/it] +2025-02-05 22:06:09 - ERROR - stderr - +2025-02-05 22:06:09 - ERROR - stderr - +2025-02-05 22:06:09 - INFO - stdout - {'loss': 0.6901, 'grad_norm': 1.261681318283081, 'learning_rate': 8.923610716881525e-06, 'epoch': 1.64} +2025-02-05 22:06:09 - ERROR - stderr - 55%|█████▍ | 12301/22434 [11:58:28<6:59:58, 2.49s/it] +2025-02-05 22:06:11 - ERROR - stderr - 55%|█████▍ | 12302/22434 [11:58:31<7:03:40, 2.51s/it] +2025-02-05 22:06:11 - ERROR - stderr - +2025-02-05 22:06:11 - ERROR - stderr - +2025-02-05 22:06:11 - INFO - stdout - {'loss': 0.6448, 'grad_norm': 1.166210412979126, 'learning_rate': 8.922175369572407e-06, 'epoch': 1.65} +2025-02-05 22:06:11 - ERROR - stderr - 55%|█████▍ | 12302/22434 [11:58:31<7:03:40, 2.51s/it] +2025-02-05 22:06:14 - ERROR - stderr - 55%|█████▍ | 12303/22434 [11:58:33<7:05:50, 2.52s/it] +2025-02-05 22:06:14 - ERROR - stderr - +2025-02-05 22:06:14 - ERROR - stderr - +2025-02-05 22:06:14 - INFO - stdout - {'loss': 0.708, 'grad_norm': 1.1794824600219727, 'learning_rate': 8.920740044729515e-06, 'epoch': 1.65} +2025-02-05 22:06:14 - ERROR - stderr - 55%|█████▍ | 12303/22434 [11:58:33<7:05:50, 2.52s/it] +2025-02-05 22:06:16 - ERROR - stderr - 55%|█████▍ | 12304/22434 [11:58:36<7:03:59, 2.51s/it] +2025-02-05 22:06:16 - ERROR - stderr - +2025-02-05 22:06:16 - ERROR - stderr - +2025-02-05 22:06:16 - INFO - stdout - {'loss': 0.7075, 'grad_norm': 
1.2333112955093384, 'learning_rate': 8.919304742382762e-06, 'epoch': 1.65} +2025-02-05 22:06:16 - ERROR - stderr - 55%|█████▍ | 12304/22434 [11:58:36<7:03:59, 2.51s/it] +2025-02-05 22:06:19 - ERROR - stderr - 55%|█████▍ | 12305/22434 [11:58:38<7:01:59, 2.50s/it] +2025-02-05 22:06:19 - ERROR - stderr - +2025-02-05 22:06:19 - ERROR - stderr - +2025-02-05 22:06:19 - INFO - stdout - {'loss': 0.6641, 'grad_norm': 1.225588321685791, 'learning_rate': 8.917869462562067e-06, 'epoch': 1.65} +2025-02-05 22:06:19 - ERROR - stderr - 55%|█████▍ | 12305/22434 [11:58:38<7:01:59, 2.50s/it] +2025-02-05 22:06:21 - ERROR - stderr - 55%|█████▍ | 12306/22434 [11:58:41<7:01:56, 2.50s/it] +2025-02-05 22:06:21 - ERROR - stderr - +2025-02-05 22:06:21 - ERROR - stderr - +2025-02-05 22:06:21 - INFO - stdout - {'loss': 0.7145, 'grad_norm': 1.4158178567886353, 'learning_rate': 8.916434205297347e-06, 'epoch': 1.65} +2025-02-05 22:06:21 - ERROR - stderr - 55%|█████▍ | 12306/22434 [11:58:41<7:01:56, 2.50s/it] +2025-02-05 22:06:24 - ERROR - stderr - 55%|█████▍ | 12307/22434 [11:58:43<7:03:35, 2.51s/it] +2025-02-05 22:06:24 - ERROR - stderr - +2025-02-05 22:06:24 - ERROR - stderr - +2025-02-05 22:06:24 - INFO - stdout - {'loss': 0.7406, 'grad_norm': 1.204805612564087, 'learning_rate': 8.914998970618522e-06, 'epoch': 1.65} +2025-02-05 22:06:24 - ERROR - stderr - 55%|█████▍ | 12307/22434 [11:58:43<7:03:35, 2.51s/it] +2025-02-05 22:06:26 - ERROR - stderr - 55%|█████▍ | 12308/22434 [11:58:46<7:01:33, 2.50s/it] +2025-02-05 22:06:26 - ERROR - stderr - +2025-02-05 22:06:26 - ERROR - stderr - +2025-02-05 22:06:26 - INFO - stdout - {'loss': 0.6775, 'grad_norm': 1.284525752067566, 'learning_rate': 8.913563758555502e-06, 'epoch': 1.65} +2025-02-05 22:06:26 - ERROR - stderr - 55%|█████▍ | 12308/22434 [11:58:46<7:01:33, 2.50s/it] +2025-02-05 22:06:29 - ERROR - stderr - 55%|█████▍ | 12309/22434 [11:58:49<7:15:08, 2.58s/it] +2025-02-05 22:06:29 - ERROR - stderr - +2025-02-05 22:06:29 - ERROR - stderr - +2025-02-05 
22:06:29 - INFO - stdout - {'loss': 0.6808, 'grad_norm': 1.3572874069213867, 'learning_rate': 8.912128569138209e-06, 'epoch': 1.65} +2025-02-05 22:06:29 - ERROR - stderr - 55%|█████▍ | 12309/22434 [11:58:49<7:15:08, 2.58s/it] +2025-02-05 22:06:31 - ERROR - stderr - 55%|█████▍ | 12310/22434 [11:58:51<7:17:59, 2.60s/it] +2025-02-05 22:06:31 - ERROR - stderr - +2025-02-05 22:06:31 - ERROR - stderr - +2025-02-05 22:06:31 - INFO - stdout - {'loss': 0.6942, 'grad_norm': 1.3200709819793701, 'learning_rate': 8.91069340239655e-06, 'epoch': 1.65} +2025-02-05 22:06:31 - ERROR - stderr - 55%|█████▍ | 12310/22434 [11:58:51<7:17:59, 2.60s/it] +2025-02-05 22:06:34 - ERROR - stderr - 55%|█████▍ | 12311/22434 [11:58:54<7:22:24, 2.62s/it] +2025-02-05 22:06:34 - ERROR - stderr - +2025-02-05 22:06:34 - ERROR - stderr - +2025-02-05 22:06:34 - INFO - stdout - {'loss': 0.7759, 'grad_norm': 1.2622733116149902, 'learning_rate': 8.909258258360451e-06, 'epoch': 1.65} +2025-02-05 22:06:34 - ERROR - stderr - 55%|█████▍ | 12311/22434 [11:58:54<7:22:24, 2.62s/it] +2025-02-05 22:06:37 - ERROR - stderr - 55%|█████▍ | 12312/22434 [11:58:56<7:16:35, 2.59s/it] +2025-02-05 22:06:37 - ERROR - stderr - +2025-02-05 22:06:37 - ERROR - stderr - +2025-02-05 22:06:37 - INFO - stdout - {'loss': 0.7301, 'grad_norm': 1.2914661169052124, 'learning_rate': 8.907823137059817e-06, 'epoch': 1.65} +2025-02-05 22:06:37 - ERROR - stderr - 55%|█████▍ | 12312/22434 [11:58:56<7:16:35, 2.59s/it] +2025-02-05 22:06:39 - ERROR - stderr - 55%|█████▍ | 12313/22434 [11:58:59<7:22:56, 2.63s/it] +2025-02-05 22:06:39 - ERROR - stderr - +2025-02-05 22:06:39 - ERROR - stderr - +2025-02-05 22:06:39 - INFO - stdout - {'loss': 0.7668, 'grad_norm': 1.2797245979309082, 'learning_rate': 8.906388038524562e-06, 'epoch': 1.65} +2025-02-05 22:06:39 - ERROR - stderr - 55%|█████▍ | 12313/22434 [11:58:59<7:22:56, 2.63s/it] +2025-02-05 22:06:42 - ERROR - stderr - 55%|█████▍ | 12314/22434 [11:59:02<7:19:13, 2.60s/it] +2025-02-05 22:06:42 - ERROR - 
stderr - +2025-02-05 22:06:42 - ERROR - stderr - +2025-02-05 22:06:42 - INFO - stdout - {'loss': 0.7094, 'grad_norm': 1.1662758588790894, 'learning_rate': 8.904952962784605e-06, 'epoch': 1.65} +2025-02-05 22:06:42 - ERROR - stderr - 55%|█████▍ | 12314/22434 [11:59:02<7:19:13, 2.60s/it] +2025-02-05 22:06:44 - ERROR - stderr - 55%|█████▍ | 12315/22434 [11:59:04<7:10:29, 2.55s/it] +2025-02-05 22:06:44 - ERROR - stderr - +2025-02-05 22:06:44 - ERROR - stderr - +2025-02-05 22:06:44 - INFO - stdout - {'loss': 0.5904, 'grad_norm': 1.2073575258255005, 'learning_rate': 8.903517909869858e-06, 'epoch': 1.65} +2025-02-05 22:06:44 - ERROR - stderr - 55%|█████▍ | 12315/22434 [11:59:04<7:10:29, 2.55s/it] +2025-02-05 22:06:47 - ERROR - stderr - 55%|█████▍ | 12316/22434 [11:59:07<7:17:50, 2.60s/it] +2025-02-05 22:06:47 - ERROR - stderr - +2025-02-05 22:06:47 - ERROR - stderr - +2025-02-05 22:06:47 - INFO - stdout - {'loss': 0.7156, 'grad_norm': 1.21602463722229, 'learning_rate': 8.902082879810225e-06, 'epoch': 1.65} +2025-02-05 22:06:47 - ERROR - stderr - 55%|█████▍ | 12316/22434 [11:59:07<7:17:50, 2.60s/it] +2025-02-05 22:06:50 - ERROR - stderr - 55%|█████▍ | 12317/22434 [11:59:09<7:14:01, 2.57s/it] +2025-02-05 22:06:50 - ERROR - stderr - +2025-02-05 22:06:50 - ERROR - stderr - +2025-02-05 22:06:50 - INFO - stdout - {'loss': 0.6978, 'grad_norm': 1.2871873378753662, 'learning_rate': 8.900647872635629e-06, 'epoch': 1.65} +2025-02-05 22:06:50 - ERROR - stderr - 55%|█████▍ | 12317/22434 [11:59:09<7:14:01, 2.57s/it] +2025-02-05 22:06:52 - ERROR - stderr - 55%|█████▍ | 12318/22434 [11:59:12<7:10:27, 2.55s/it] +2025-02-05 22:06:52 - ERROR - stderr - +2025-02-05 22:06:52 - ERROR - stderr - +2025-02-05 22:06:52 - INFO - stdout - {'loss': 0.6612, 'grad_norm': 1.2348345518112183, 'learning_rate': 8.899212888375972e-06, 'epoch': 1.65} +2025-02-05 22:06:52 - ERROR - stderr - 55%|█████▍ | 12318/22434 [11:59:12<7:10:27, 2.55s/it] +2025-02-05 22:06:55 - ERROR - stderr - 55%|█████▍ | 12319/22434 
[11:59:14<7:09:22, 2.55s/it] +2025-02-05 22:06:55 - ERROR - stderr - +2025-02-05 22:06:55 - ERROR - stderr - +2025-02-05 22:06:55 - INFO - stdout - {'loss': 0.7991, 'grad_norm': 1.320090651512146, 'learning_rate': 8.89777792706117e-06, 'epoch': 1.65} +2025-02-05 22:06:55 - ERROR - stderr - 55%|█████▍ | 12319/22434 [11:59:14<7:09:22, 2.55s/it] +2025-02-05 22:06:57 - ERROR - stderr - 55%|█████▍ | 12320/22434 [11:59:17<7:09:26, 2.55s/it] +2025-02-05 22:06:57 - ERROR - stderr - +2025-02-05 22:06:57 - ERROR - stderr - +2025-02-05 22:06:57 - INFO - stdout - {'loss': 0.682, 'grad_norm': 1.278443455696106, 'learning_rate': 8.896342988721135e-06, 'epoch': 1.65} +2025-02-05 22:06:57 - ERROR - stderr - 55%|█████▍ | 12320/22434 [11:59:17<7:09:26, 2.55s/it] +2025-02-05 22:07:00 - ERROR - stderr - 55%|█████▍ | 12321/22434 [11:59:19<7:06:27, 2.53s/it] +2025-02-05 22:07:00 - ERROR - stderr - +2025-02-05 22:07:00 - ERROR - stderr - +2025-02-05 22:07:00 - INFO - stdout - {'loss': 0.7072, 'grad_norm': 1.2681987285614014, 'learning_rate': 8.894908073385771e-06, 'epoch': 1.65} +2025-02-05 22:07:00 - ERROR - stderr - 55%|█████▍ | 12321/22434 [11:59:19<7:06:27, 2.53s/it] +2025-02-05 22:07:02 - ERROR - stderr - 55%|█████▍ | 12322/22434 [11:59:22<7:03:10, 2.51s/it] +2025-02-05 22:07:02 - ERROR - stderr - +2025-02-05 22:07:02 - ERROR - stderr - +2025-02-05 22:07:02 - INFO - stdout - {'loss': 0.7199, 'grad_norm': 1.285531997680664, 'learning_rate': 8.893473181084993e-06, 'epoch': 1.65} +2025-02-05 22:07:02 - ERROR - stderr - 55%|█████▍ | 12322/22434 [11:59:22<7:03:10, 2.51s/it] +2025-02-05 22:07:05 - ERROR - stderr - 55%|█████▍ | 12323/22434 [11:59:24<7:00:16, 2.49s/it] +2025-02-05 22:07:05 - ERROR - stderr - +2025-02-05 22:07:05 - ERROR - stderr - +2025-02-05 22:07:05 - INFO - stdout - {'loss': 0.6066, 'grad_norm': 1.0385469198226929, 'learning_rate': 8.892038311848704e-06, 'epoch': 1.65} +2025-02-05 22:07:05 - ERROR - stderr - 55%|█████▍ | 12323/22434 [11:59:24<7:00:16, 2.49s/it] 
+2025-02-05 22:07:07 - ERROR - stderr - 55%|█████▍ | 12324/22434 [11:59:27<6:58:03, 2.48s/it] +2025-02-05 22:07:07 - ERROR - stderr - +2025-02-05 22:07:07 - ERROR - stderr - +2025-02-05 22:07:07 - INFO - stdout - {'loss': 0.6887, 'grad_norm': 1.1164511442184448, 'learning_rate': 8.890603465706823e-06, 'epoch': 1.65} +2025-02-05 22:07:07 - ERROR - stderr - 55%|█████▍ | 12324/22434 [11:59:27<6:58:03, 2.48s/it] +2025-02-05 22:07:09 - ERROR - stderr - 55%|█████▍ | 12325/22434 [11:59:29<6:55:03, 2.46s/it] +2025-02-05 22:07:09 - ERROR - stderr - +2025-02-05 22:07:09 - ERROR - stderr - +2025-02-05 22:07:09 - INFO - stdout - {'loss': 0.6759, 'grad_norm': 1.1552810668945312, 'learning_rate': 8.889168642689246e-06, 'epoch': 1.65} +2025-02-05 22:07:09 - ERROR - stderr - 55%|█████▍ | 12325/22434 [11:59:29<6:55:03, 2.46s/it] +2025-02-05 22:07:12 - ERROR - stderr - 55%|█████▍ | 12326/22434 [11:59:32<6:55:43, 2.47s/it] +2025-02-05 22:07:12 - ERROR - stderr - +2025-02-05 22:07:12 - ERROR - stderr - +2025-02-05 22:07:12 - INFO - stdout - {'loss': 0.6822, 'grad_norm': 1.2800675630569458, 'learning_rate': 8.887733842825885e-06, 'epoch': 1.65} +2025-02-05 22:07:12 - ERROR - stderr - 55%|█████▍ | 12326/22434 [11:59:32<6:55:43, 2.47s/it] +2025-02-05 22:07:14 - ERROR - stderr - 55%|█████▍ | 12327/22434 [11:59:34<6:58:14, 2.48s/it] +2025-02-05 22:07:14 - ERROR - stderr - +2025-02-05 22:07:14 - ERROR - stderr - +2025-02-05 22:07:14 - INFO - stdout - {'loss': 0.6344, 'grad_norm': 1.1618342399597168, 'learning_rate': 8.886299066146652e-06, 'epoch': 1.65} +2025-02-05 22:07:14 - ERROR - stderr - 55%|█████▍ | 12327/22434 [11:59:34<6:58:14, 2.48s/it] +2025-02-05 22:07:17 - ERROR - stderr - 55%|█████▍ | 12328/22434 [11:59:37<6:59:04, 2.49s/it] +2025-02-05 22:07:17 - ERROR - stderr - +2025-02-05 22:07:17 - ERROR - stderr - +2025-02-05 22:07:17 - INFO - stdout - {'loss': 0.789, 'grad_norm': 1.115425944328308, 'learning_rate': 8.884864312681449e-06, 'epoch': 1.65} +2025-02-05 22:07:17 - ERROR - 
stderr - 55%|█████▍ | 12328/22434 [11:59:37<6:59:04, 2.49s/it] +2025-02-05 22:07:19 - ERROR - stderr - 55%|█████▍ | 12329/22434 [11:59:39<6:57:19, 2.48s/it] +2025-02-05 22:07:19 - ERROR - stderr - +2025-02-05 22:07:19 - ERROR - stderr - +2025-02-05 22:07:19 - INFO - stdout - {'loss': 0.747, 'grad_norm': 1.2912665605545044, 'learning_rate': 8.883429582460178e-06, 'epoch': 1.65} +2025-02-05 22:07:19 - ERROR - stderr - 55%|█████▍ | 12329/22434 [11:59:39<6:57:19, 2.48s/it] +2025-02-05 22:07:22 - ERROR - stderr - 55%|█████▍ | 12330/22434 [11:59:42<6:57:58, 2.48s/it] +2025-02-05 22:07:22 - ERROR - stderr - +2025-02-05 22:07:22 - ERROR - stderr - +2025-02-05 22:07:22 - INFO - stdout - {'loss': 0.7079, 'grad_norm': 1.2462538480758667, 'learning_rate': 8.881994875512754e-06, 'epoch': 1.65} +2025-02-05 22:07:22 - ERROR - stderr - 55%|█████▍ | 12330/22434 [11:59:42<6:57:58, 2.48s/it] +2025-02-05 22:07:24 - ERROR - stderr - 55%|█████▍ | 12331/22434 [11:59:44<6:56:21, 2.47s/it] +2025-02-05 22:07:24 - ERROR - stderr - +2025-02-05 22:07:24 - ERROR - stderr - +2025-02-05 22:07:24 - INFO - stdout - {'loss': 0.7433, 'grad_norm': 1.2592493295669556, 'learning_rate': 8.880560191869071e-06, 'epoch': 1.65} +2025-02-05 22:07:24 - ERROR - stderr - 55%|█████▍ | 12331/22434 [11:59:44<6:56:21, 2.47s/it] +2025-02-05 22:07:27 - ERROR - stderr - 55%|█████▍ | 12332/22434 [11:59:47<6:58:13, 2.48s/it] +2025-02-05 22:07:27 - ERROR - stderr - +2025-02-05 22:07:27 - ERROR - stderr - +2025-02-05 22:07:27 - INFO - stdout - {'loss': 0.6953, 'grad_norm': 1.2509167194366455, 'learning_rate': 8.879125531559042e-06, 'epoch': 1.65} +2025-02-05 22:07:27 - ERROR - stderr - 55%|█████▍ | 12332/22434 [11:59:47<6:58:13, 2.48s/it] +2025-02-05 22:07:29 - ERROR - stderr - 55%|█████▍ | 12333/22434 [11:59:49<6:57:44, 2.48s/it] +2025-02-05 22:07:29 - ERROR - stderr - +2025-02-05 22:07:29 - ERROR - stderr - +2025-02-05 22:07:29 - INFO - stdout - {'loss': 0.7036, 'grad_norm': 1.1815496683120728, 'learning_rate': 
8.877690894612572e-06, 'epoch': 1.65} +2025-02-05 22:07:29 - ERROR - stderr - 55%|█████▍ | 12333/22434 [11:59:49<6:57:44, 2.48s/it] +2025-02-05 22:07:32 - ERROR - stderr - 55%|█████▍ | 12334/22434 [11:59:52<7:07:53, 2.54s/it] +2025-02-05 22:07:32 - ERROR - stderr - +2025-02-05 22:07:32 - ERROR - stderr - +2025-02-05 22:07:32 - INFO - stdout - {'loss': 0.7314, 'grad_norm': 1.3585467338562012, 'learning_rate': 8.876256281059558e-06, 'epoch': 1.65} +2025-02-05 22:07:32 - ERROR - stderr - 55%|█████▍ | 12334/22434 [11:59:52<7:07:53, 2.54s/it] +2025-02-05 22:07:35 - ERROR - stderr - 55%|█████▍ | 12335/22434 [11:59:54<7:11:30, 2.56s/it] +2025-02-05 22:07:35 - ERROR - stderr - +2025-02-05 22:07:35 - ERROR - stderr - +2025-02-05 22:07:35 - INFO - stdout - {'loss': 0.6453, 'grad_norm': 1.1983674764633179, 'learning_rate': 8.874821690929909e-06, 'epoch': 1.65} +2025-02-05 22:07:35 - ERROR - stderr - 55%|█████▍ | 12335/22434 [11:59:54<7:11:30, 2.56s/it] +2025-02-05 22:07:37 - ERROR - stderr - 55%|█████▍ | 12336/22434 [11:59:57<7:08:02, 2.54s/it] +2025-02-05 22:07:37 - ERROR - stderr - +2025-02-05 22:07:37 - ERROR - stderr - +2025-02-05 22:07:37 - INFO - stdout - {'loss': 0.6943, 'grad_norm': 1.1856147050857544, 'learning_rate': 8.873387124253524e-06, 'epoch': 1.65} +2025-02-05 22:07:37 - ERROR - stderr - 55%|█████▍ | 12336/22434 [11:59:57<7:08:02, 2.54s/it] +2025-02-05 22:07:40 - ERROR - stderr - 55%|█████▍ | 12337/22434 [11:59:59<7:02:30, 2.51s/it] +2025-02-05 22:07:40 - ERROR - stderr - +2025-02-05 22:07:40 - ERROR - stderr - +2025-02-05 22:07:40 - INFO - stdout - {'loss': 0.6285, 'grad_norm': 1.297411561012268, 'learning_rate': 8.871952581060305e-06, 'epoch': 1.65} +2025-02-05 22:07:40 - ERROR - stderr - 55%|█████▍ | 12337/22434 [11:59:59<7:02:30, 2.51s/it] +2025-02-05 22:07:42 - ERROR - stderr - 55%|█████▍ | 12338/22434 [12:00:02<6:57:49, 2.48s/it] +2025-02-05 22:07:42 - ERROR - stderr - +2025-02-05 22:07:42 - ERROR - stderr - +2025-02-05 22:07:42 - INFO - stdout - 
{'loss': 0.694, 'grad_norm': 1.243592381477356, 'learning_rate': 8.870518061380156e-06, 'epoch': 1.65} +2025-02-05 22:07:42 - ERROR - stderr - 55%|█████▍ | 12338/22434 [12:00:02<6:57:49, 2.48s/it] +2025-02-05 22:07:45 - ERROR - stderr - 55%|█████▌ | 12339/22434 [12:00:04<7:06:24, 2.53s/it] +2025-02-05 22:07:45 - ERROR - stderr - +2025-02-05 22:07:45 - ERROR - stderr - +2025-02-05 22:07:45 - INFO - stdout - {'loss': 0.64, 'grad_norm': 1.1155613660812378, 'learning_rate': 8.869083565242975e-06, 'epoch': 1.65} +2025-02-05 22:07:45 - ERROR - stderr - 55%|█████▌ | 12339/22434 [12:00:04<7:06:24, 2.53s/it] +2025-02-05 22:07:47 - ERROR - stderr - 55%|█████▌ | 12340/22434 [12:00:07<7:05:26, 2.53s/it] +2025-02-05 22:07:47 - ERROR - stderr - +2025-02-05 22:07:47 - ERROR - stderr - +2025-02-05 22:07:47 - INFO - stdout - {'loss': 0.7436, 'grad_norm': 1.1976444721221924, 'learning_rate': 8.86764909267867e-06, 'epoch': 1.65} +2025-02-05 22:07:47 - ERROR - stderr - 55%|█████▌ | 12340/22434 [12:00:07<7:05:26, 2.53s/it] +2025-02-05 22:07:50 - ERROR - stderr - 55%|█████▌ | 12341/22434 [12:00:09<7:04:38, 2.52s/it] +2025-02-05 22:07:50 - ERROR - stderr - +2025-02-05 22:07:50 - ERROR - stderr - +2025-02-05 22:07:50 - INFO - stdout - {'loss': 0.6885, 'grad_norm': 1.116301417350769, 'learning_rate': 8.866214643717135e-06, 'epoch': 1.65} +2025-02-05 22:07:50 - ERROR - stderr - 55%|█████▌ | 12341/22434 [12:00:09<7:04:38, 2.52s/it] +2025-02-05 22:07:52 - ERROR - stderr - 55%|█████▌ | 12342/22434 [12:00:12<7:05:24, 2.53s/it] +2025-02-05 22:07:52 - ERROR - stderr - +2025-02-05 22:07:52 - ERROR - stderr - +2025-02-05 22:07:52 - INFO - stdout - {'loss': 0.6893, 'grad_norm': 1.239696741104126, 'learning_rate': 8.864780218388267e-06, 'epoch': 1.65} +2025-02-05 22:07:52 - ERROR - stderr - 55%|█████▌ | 12342/22434 [12:00:12<7:05:24, 2.53s/it] +2025-02-05 22:07:55 - ERROR - stderr - 55%|█████▌ | 12343/22434 [12:00:14<7:02:36, 2.51s/it] +2025-02-05 22:07:55 - ERROR - stderr - +2025-02-05 22:07:55 - 
ERROR - stderr - +2025-02-05 22:07:55 - INFO - stdout - {'loss': 0.6545, 'grad_norm': 1.1754655838012695, 'learning_rate': 8.863345816721972e-06, 'epoch': 1.65} +2025-02-05 22:07:55 - ERROR - stderr - 55%|█████▌ | 12343/22434 [12:00:14<7:02:36, 2.51s/it] +2025-02-05 22:07:57 - ERROR - stderr - 55%|█████▌ | 12344/22434 [12:00:17<7:14:35, 2.58s/it] +2025-02-05 22:07:57 - ERROR - stderr - +2025-02-05 22:07:57 - ERROR - stderr - +2025-02-05 22:07:57 - INFO - stdout - {'loss': 0.6291, 'grad_norm': 1.1113002300262451, 'learning_rate': 8.861911438748146e-06, 'epoch': 1.65} +2025-02-05 22:07:57 - ERROR - stderr - 55%|█████▌ | 12344/22434 [12:00:17<7:14:35, 2.58s/it] +2025-02-05 22:08:00 - ERROR - stderr - 55%|█████▌ | 12345/22434 [12:00:20<7:10:50, 2.56s/it] +2025-02-05 22:08:00 - ERROR - stderr - +2025-02-05 22:08:00 - ERROR - stderr - +2025-02-05 22:08:00 - INFO - stdout - {'loss': 0.6466, 'grad_norm': 1.211775302886963, 'learning_rate': 8.860477084496684e-06, 'epoch': 1.65} +2025-02-05 22:08:00 - ERROR - stderr - 55%|█████▌ | 12345/22434 [12:00:20<7:10:50, 2.56s/it] +2025-02-05 22:08:02 - ERROR - stderr - 55%|█████▌ | 12346/22434 [12:00:22<7:03:13, 2.52s/it] +2025-02-05 22:08:02 - ERROR - stderr - +2025-02-05 22:08:02 - ERROR - stderr - +2025-02-05 22:08:02 - INFO - stdout - {'loss': 0.7151, 'grad_norm': 4.429222583770752, 'learning_rate': 8.85904275399749e-06, 'epoch': 1.65} +2025-02-05 22:08:02 - ERROR - stderr - 55%|█████▌ | 12346/22434 [12:00:22<7:03:13, 2.52s/it] +2025-02-05 22:08:05 - ERROR - stderr - 55%|█████▌ | 12347/22434 [12:00:25<7:05:12, 2.53s/it] +2025-02-05 22:08:05 - ERROR - stderr - +2025-02-05 22:08:05 - ERROR - stderr - +2025-02-05 22:08:05 - INFO - stdout - {'loss': 0.6853, 'grad_norm': 1.1703946590423584, 'learning_rate': 8.857608447280454e-06, 'epoch': 1.65} +2025-02-05 22:08:05 - ERROR - stderr - 55%|█████▌ | 12347/22434 [12:00:25<7:05:12, 2.53s/it] +2025-02-05 22:08:07 - ERROR - stderr - 55%|█████▌ | 12348/22434 [12:00:27<7:05:57, 2.53s/it] 
+2025-02-05 22:08:07 - ERROR - stderr - +2025-02-05 22:08:07 - ERROR - stderr - +2025-02-05 22:08:07 - INFO - stdout - {'loss': 0.7024, 'grad_norm': 1.2180061340332031, 'learning_rate': 8.856174164375482e-06, 'epoch': 1.65} +2025-02-05 22:08:07 - ERROR - stderr - 55%|█████▌ | 12348/22434 [12:00:27<7:05:57, 2.53s/it] +2025-02-05 22:08:10 - ERROR - stderr - 55%|█████▌ | 12349/22434 [12:00:30<7:05:07, 2.53s/it] +2025-02-05 22:08:10 - ERROR - stderr - +2025-02-05 22:08:10 - ERROR - stderr - +2025-02-05 22:08:10 - INFO - stdout - {'loss': 0.6754, 'grad_norm': 1.2263474464416504, 'learning_rate': 8.854739905312463e-06, 'epoch': 1.65} +2025-02-05 22:08:10 - ERROR - stderr - 55%|█████▌ | 12349/22434 [12:00:30<7:05:07, 2.53s/it] +2025-02-05 22:08:12 - ERROR - stderr - 55%|█████▌ | 12350/22434 [12:00:32<7:02:59, 2.52s/it] +2025-02-05 22:08:12 - ERROR - stderr - +2025-02-05 22:08:12 - ERROR - stderr - +2025-02-05 22:08:12 - INFO - stdout - {'loss': 0.7048, 'grad_norm': 1.2842947244644165, 'learning_rate': 8.853305670121294e-06, 'epoch': 1.65} +2025-02-05 22:08:12 - ERROR - stderr - 55%|█████▌ | 12350/22434 [12:00:32<7:02:59, 2.52s/it] +2025-02-05 22:08:15 - ERROR - stderr - 55%|█████▌ | 12351/22434 [12:00:35<7:01:18, 2.51s/it] +2025-02-05 22:08:15 - ERROR - stderr - +2025-02-05 22:08:15 - ERROR - stderr - +2025-02-05 22:08:15 - INFO - stdout - {'loss': 0.6235, 'grad_norm': 1.1028200387954712, 'learning_rate': 8.85187145883187e-06, 'epoch': 1.65} +2025-02-05 22:08:15 - ERROR - stderr - 55%|█████▌ | 12351/22434 [12:00:35<7:01:18, 2.51s/it] +2025-02-05 22:08:17 - ERROR - stderr - 55%|█████▌ | 12352/22434 [12:00:37<7:00:06, 2.50s/it] +2025-02-05 22:08:17 - ERROR - stderr - +2025-02-05 22:08:17 - ERROR - stderr - +2025-02-05 22:08:17 - INFO - stdout - {'loss': 0.6756, 'grad_norm': 1.2587285041809082, 'learning_rate': 8.85043727147409e-06, 'epoch': 1.65} +2025-02-05 22:08:17 - ERROR - stderr - 55%|█████▌ | 12352/22434 [12:00:37<7:00:06, 2.50s/it] +2025-02-05 22:08:20 - ERROR - 
stderr - 55%|█████▌ | 12353/22434 [12:00:40<6:57:07, 2.48s/it] +2025-02-05 22:08:20 - ERROR - stderr - +2025-02-05 22:08:20 - ERROR - stderr - +2025-02-05 22:08:20 - INFO - stdout - {'loss': 0.7453, 'grad_norm': 1.1952602863311768, 'learning_rate': 8.84900310807784e-06, 'epoch': 1.65} +2025-02-05 22:08:20 - ERROR - stderr - 55%|█████▌ | 12353/22434 [12:00:40<6:57:07, 2.48s/it] +2025-02-05 22:08:22 - ERROR - stderr - 55%|█████▌ | 12354/22434 [12:00:42<6:53:35, 2.46s/it] +2025-02-05 22:08:22 - ERROR - stderr - +2025-02-05 22:08:22 - ERROR - stderr - +2025-02-05 22:08:22 - INFO - stdout - {'loss': 0.6524, 'grad_norm': 1.2839983701705933, 'learning_rate': 8.847568968673025e-06, 'epoch': 1.65} +2025-02-05 22:08:22 - ERROR - stderr - 55%|█████▌ | 12354/22434 [12:00:42<6:53:35, 2.46s/it] +2025-02-05 22:08:25 - ERROR - stderr - 55%|█████▌ | 12355/22434 [12:00:45<6:58:00, 2.49s/it] +2025-02-05 22:08:25 - ERROR - stderr - +2025-02-05 22:08:25 - ERROR - stderr - +2025-02-05 22:08:25 - INFO - stdout - {'loss': 0.6554, 'grad_norm': 1.172766089439392, 'learning_rate': 8.846134853289527e-06, 'epoch': 1.65} +2025-02-05 22:08:25 - ERROR - stderr - 55%|█████▌ | 12355/22434 [12:00:45<6:58:00, 2.49s/it] +2025-02-05 22:08:27 - ERROR - stderr - 55%|█████▌ | 12356/22434 [12:00:47<6:59:51, 2.50s/it] +2025-02-05 22:08:27 - ERROR - stderr - +2025-02-05 22:08:27 - ERROR - stderr - +2025-02-05 22:08:27 - INFO - stdout - {'loss': 0.7063, 'grad_norm': 1.2719576358795166, 'learning_rate': 8.84470076195725e-06, 'epoch': 1.65} +2025-02-05 22:08:27 - ERROR - stderr - 55%|█████▌ | 12356/22434 [12:00:47<6:59:51, 2.50s/it] +2025-02-05 22:08:30 - ERROR - stderr - 55%|█████▌ | 12357/22434 [12:00:50<7:09:51, 2.56s/it] +2025-02-05 22:08:30 - ERROR - stderr - +2025-02-05 22:08:30 - ERROR - stderr - +2025-02-05 22:08:30 - INFO - stdout - {'loss': 0.7612, 'grad_norm': 1.2521767616271973, 'learning_rate': 8.843266694706075e-06, 'epoch': 1.65} +2025-02-05 22:08:30 - ERROR - stderr - 55%|█████▌ | 12357/22434 
[12:00:50<7:09:51, 2.56s/it] +2025-02-05 22:08:32 - ERROR - stderr - 55%|█████▌ | 12358/22434 [12:00:52<7:05:15, 2.53s/it] +2025-02-05 22:08:33 - ERROR - stderr - +2025-02-05 22:08:33 - ERROR - stderr - +2025-02-05 22:08:33 - INFO - stdout - {'loss': 0.6857, 'grad_norm': 1.1674854755401611, 'learning_rate': 8.841832651565897e-06, 'epoch': 1.65} +2025-02-05 22:08:33 - ERROR - stderr - 55%|█████▌ | 12358/22434 [12:00:52<7:05:15, 2.53s/it] +2025-02-05 22:08:35 - ERROR - stderr - 55%|█████▌ | 12359/22434 [12:00:55<7:04:09, 2.53s/it] +2025-02-05 22:08:35 - ERROR - stderr - +2025-02-05 22:08:35 - ERROR - stderr - +2025-02-05 22:08:35 - INFO - stdout - {'loss': 0.6828, 'grad_norm': 1.2313117980957031, 'learning_rate': 8.840398632566614e-06, 'epoch': 1.65} +2025-02-05 22:08:35 - ERROR - stderr - 55%|█████▌ | 12359/22434 [12:00:55<7:04:09, 2.53s/it] +2025-02-05 22:08:38 - ERROR - stderr - 55%|█████▌ | 12360/22434 [12:00:58<7:17:34, 2.61s/it] +2025-02-05 22:08:38 - ERROR - stderr - +2025-02-05 22:08:38 - ERROR - stderr - +2025-02-05 22:08:38 - INFO - stdout - {'loss': 0.7138, 'grad_norm': 1.2685930728912354, 'learning_rate': 8.838964637738112e-06, 'epoch': 1.65} +2025-02-05 22:08:38 - ERROR - stderr - 55%|█████▌ | 12360/22434 [12:00:58<7:17:34, 2.61s/it] +2025-02-05 22:08:40 - ERROR - stderr - 55%|█████▌ | 12361/22434 [12:01:00<7:11:54, 2.57s/it] +2025-02-05 22:08:40 - ERROR - stderr - +2025-02-05 22:08:40 - ERROR - stderr - +2025-02-05 22:08:40 - INFO - stdout - {'loss': 0.6942, 'grad_norm': 1.0916415452957153, 'learning_rate': 8.837530667110278e-06, 'epoch': 1.65} +2025-02-05 22:08:40 - ERROR - stderr - 55%|█████▌ | 12361/22434 [12:01:00<7:11:54, 2.57s/it] +2025-02-05 22:08:43 - ERROR - stderr - 55%|█████▌ | 12362/22434 [12:01:03<7:07:26, 2.55s/it] +2025-02-05 22:08:43 - ERROR - stderr - +2025-02-05 22:08:43 - ERROR - stderr - +2025-02-05 22:08:43 - INFO - stdout - {'loss': 0.6949, 'grad_norm': 1.2985621690750122, 'learning_rate': 8.836096720713009e-06, 'epoch': 1.65} 
+2025-02-05 22:08:43 - ERROR - stderr - 55%|█████▌ | 12362/22434 [12:01:03<7:07:26, 2.55s/it] +2025-02-05 22:08:45 - ERROR - stderr - 55%|█████▌ | 12363/22434 [12:01:05<7:02:51, 2.52s/it] +2025-02-05 22:08:45 - ERROR - stderr - +2025-02-05 22:08:45 - ERROR - stderr - +2025-02-05 22:08:45 - INFO - stdout - {'loss': 0.6054, 'grad_norm': 1.122735857963562, 'learning_rate': 8.834662798576184e-06, 'epoch': 1.65} +2025-02-05 22:08:45 - ERROR - stderr - 55%|█████▌ | 12363/22434 [12:01:05<7:02:51, 2.52s/it] +2025-02-05 22:08:48 - ERROR - stderr - 55%|█████▌ | 12364/22434 [12:01:08<7:18:14, 2.61s/it] +2025-02-05 22:08:48 - ERROR - stderr - +2025-02-05 22:08:48 - ERROR - stderr - +2025-02-05 22:08:48 - INFO - stdout - {'loss': 0.6262, 'grad_norm': 1.1704249382019043, 'learning_rate': 8.8332289007297e-06, 'epoch': 1.65} +2025-02-05 22:08:48 - ERROR - stderr - 55%|█████▌ | 12364/22434 [12:01:08<7:18:14, 2.61s/it] +2025-02-05 22:08:51 - ERROR - stderr - 55%|█████▌ | 12365/22434 [12:01:11<7:26:00, 2.66s/it] +2025-02-05 22:08:51 - ERROR - stderr - +2025-02-05 22:08:51 - ERROR - stderr - +2025-02-05 22:08:51 - INFO - stdout - {'loss': 0.6302, 'grad_norm': 1.235245943069458, 'learning_rate': 8.831795027203448e-06, 'epoch': 1.65} +2025-02-05 22:08:51 - ERROR - stderr - 55%|█████▌ | 12365/22434 [12:01:11<7:26:00, 2.66s/it] +2025-02-05 22:08:53 - ERROR - stderr - 55%|█████▌ | 12366/22434 [12:01:13<7:15:48, 2.60s/it] +2025-02-05 22:08:53 - ERROR - stderr - +2025-02-05 22:08:53 - ERROR - stderr - +2025-02-05 22:08:53 - INFO - stdout - {'loss': 0.6043, 'grad_norm': 1.140698790550232, 'learning_rate': 8.830361178027302e-06, 'epoch': 1.65} +2025-02-05 22:08:53 - ERROR - stderr - 55%|█████▌ | 12366/22434 [12:01:13<7:15:48, 2.60s/it] +2025-02-05 22:08:56 - ERROR - stderr - 55%|█████▌ | 12367/22434 [12:01:16<7:10:46, 2.57s/it] +2025-02-05 22:08:56 - ERROR - stderr - +2025-02-05 22:08:56 - ERROR - stderr - +2025-02-05 22:08:56 - INFO - stdout - {'loss': 0.6171, 'grad_norm': 1.205237627029419, 
'learning_rate': 8.828927353231165e-06, 'epoch': 1.65} +2025-02-05 22:08:56 - ERROR - stderr - 55%|█████▌ | 12367/22434 [12:01:16<7:10:46, 2.57s/it] +2025-02-05 22:08:58 - ERROR - stderr - 55%|█████▌ | 12368/22434 [12:01:18<7:09:55, 2.56s/it] +2025-02-05 22:08:58 - ERROR - stderr - +2025-02-05 22:08:58 - ERROR - stderr - +2025-02-05 22:08:58 - INFO - stdout - {'loss': 0.7309, 'grad_norm': 1.3161205053329468, 'learning_rate': 8.827493552844917e-06, 'epoch': 1.65} +2025-02-05 22:08:58 - ERROR - stderr - 55%|█████▌ | 12368/22434 [12:01:18<7:09:55, 2.56s/it] +2025-02-05 22:09:01 - ERROR - stderr - 55%|█████▌ | 12369/22434 [12:01:21<7:17:22, 2.61s/it] +2025-02-05 22:09:01 - ERROR - stderr - +2025-02-05 22:09:01 - ERROR - stderr - +2025-02-05 22:09:01 - INFO - stdout - {'loss': 0.6129, 'grad_norm': 1.1530934572219849, 'learning_rate': 8.826059776898441e-06, 'epoch': 1.65} +2025-02-05 22:09:01 - ERROR - stderr - 55%|█████▌ | 12369/22434 [12:01:21<7:17:22, 2.61s/it] +2025-02-05 22:09:04 - ERROR - stderr - 55%|█████▌ | 12370/22434 [12:01:23<7:12:12, 2.58s/it] +2025-02-05 22:09:04 - ERROR - stderr - +2025-02-05 22:09:04 - ERROR - stderr - +2025-02-05 22:09:04 - INFO - stdout - {'loss': 0.7404, 'grad_norm': 1.33249032497406, 'learning_rate': 8.824626025421625e-06, 'epoch': 1.65} +2025-02-05 22:09:04 - ERROR - stderr - 55%|█████▌ | 12370/22434 [12:01:23<7:12:12, 2.58s/it] +2025-02-05 22:09:06 - ERROR - stderr - 55%|█████▌ | 12371/22434 [12:01:26<7:09:57, 2.56s/it] +2025-02-05 22:09:06 - ERROR - stderr - +2025-02-05 22:09:06 - ERROR - stderr - +2025-02-05 22:09:06 - INFO - stdout - {'loss': 0.7499, 'grad_norm': 1.2091525793075562, 'learning_rate': 8.823192298444355e-06, 'epoch': 1.65} +2025-02-05 22:09:06 - ERROR - stderr - 55%|█████▌ | 12371/22434 [12:01:26<7:09:57, 2.56s/it] +2025-02-05 22:09:09 - ERROR - stderr - 55%|█████▌ | 12372/22434 [12:01:28<7:06:18, 2.54s/it] +2025-02-05 22:09:09 - ERROR - stderr - +2025-02-05 22:09:09 - ERROR - stderr - +2025-02-05 22:09:09 - INFO - 
stdout - {'loss': 0.6957, 'grad_norm': 1.2791638374328613, 'learning_rate': 8.821758595996516e-06, 'epoch': 1.65} +2025-02-05 22:09:09 - ERROR - stderr - 55%|█████▌ | 12372/22434 [12:01:28<7:06:18, 2.54s/it] +2025-02-05 22:09:11 - ERROR - stderr - 55%|█████▌ | 12373/22434 [12:01:31<7:05:00, 2.53s/it] +2025-02-05 22:09:11 - ERROR - stderr - +2025-02-05 22:09:11 - ERROR - stderr - +2025-02-05 22:09:11 - INFO - stdout - {'loss': 0.5971, 'grad_norm': 1.0910277366638184, 'learning_rate': 8.820324918107995e-06, 'epoch': 1.65} +2025-02-05 22:09:11 - ERROR - stderr - 55%|█████▌ | 12373/22434 [12:01:31<7:05:00, 2.53s/it] +2025-02-05 22:09:14 - ERROR - stderr - 55%|█████▌ | 12374/22434 [12:01:34<7:13:47, 2.59s/it] +2025-02-05 22:09:14 - ERROR - stderr - +2025-02-05 22:09:14 - ERROR - stderr - +2025-02-05 22:09:14 - INFO - stdout - {'loss': 0.6333, 'grad_norm': 1.1187313795089722, 'learning_rate': 8.818891264808667e-06, 'epoch': 1.65} +2025-02-05 22:09:14 - ERROR - stderr - 55%|█████▌ | 12374/22434 [12:01:34<7:13:47, 2.59s/it] +2025-02-05 22:09:16 - ERROR - stderr - 55%|█████▌ | 12375/22434 [12:01:36<7:04:42, 2.53s/it] +2025-02-05 22:09:16 - ERROR - stderr - +2025-02-05 22:09:16 - ERROR - stderr - +2025-02-05 22:09:16 - INFO - stdout - {'loss': 0.6955, 'grad_norm': 1.2203001976013184, 'learning_rate': 8.817457636128425e-06, 'epoch': 1.65} +2025-02-05 22:09:16 - ERROR - stderr - 55%|█████▌ | 12375/22434 [12:01:36<7:04:42, 2.53s/it] +2025-02-05 22:09:19 - ERROR - stderr - 55%|█████▌ | 12376/22434 [12:01:38<7:04:12, 2.53s/it] +2025-02-05 22:09:19 - ERROR - stderr - +2025-02-05 22:09:19 - ERROR - stderr - +2025-02-05 22:09:19 - INFO - stdout - {'loss': 0.6885, 'grad_norm': 1.0609242916107178, 'learning_rate': 8.816024032097145e-06, 'epoch': 1.65} +2025-02-05 22:09:19 - ERROR - stderr - 55%|█████▌ | 12376/22434 [12:01:39<7:04:12, 2.53s/it] +2025-02-05 22:09:21 - ERROR - stderr - 55%|█████▌ | 12377/22434 [12:01:41<7:02:38, 2.52s/it] +2025-02-05 22:09:21 - ERROR - stderr - 
+2025-02-05 22:09:21 - ERROR - stderr - +2025-02-05 22:09:21 - INFO - stdout - {'loss': 0.6549, 'grad_norm': 1.2922406196594238, 'learning_rate': 8.814590452744709e-06, 'epoch': 1.66} +2025-02-05 22:09:21 - ERROR - stderr - 55%|█████▌ | 12377/22434 [12:01:41<7:02:38, 2.52s/it] +2025-02-05 22:09:24 - ERROR - stderr - 55%|█████▌ | 12378/22434 [12:01:43<7:00:10, 2.51s/it] +2025-02-05 22:09:24 - ERROR - stderr - +2025-02-05 22:09:24 - ERROR - stderr - +2025-02-05 22:09:24 - INFO - stdout - {'loss': 0.7177, 'grad_norm': 1.3166743516921997, 'learning_rate': 8.813156898101003e-06, 'epoch': 1.66} +2025-02-05 22:09:24 - ERROR - stderr - 55%|█████▌ | 12378/22434 [12:01:44<7:00:10, 2.51s/it] +2025-02-05 22:09:26 - ERROR - stderr - 55%|█████▌ | 12379/22434 [12:01:46<7:05:22, 2.54s/it] +2025-02-05 22:09:26 - ERROR - stderr - +2025-02-05 22:09:26 - ERROR - stderr - +2025-02-05 22:09:26 - INFO - stdout - {'loss': 0.6205, 'grad_norm': 1.0977634191513062, 'learning_rate': 8.811723368195903e-06, 'epoch': 1.66} +2025-02-05 22:09:26 - ERROR - stderr - 55%|█████▌ | 12379/22434 [12:01:46<7:05:22, 2.54s/it] +2025-02-05 22:09:29 - ERROR - stderr - 55%|█████▌ | 12380/22434 [12:01:49<6:59:45, 2.50s/it] +2025-02-05 22:09:29 - ERROR - stderr - +2025-02-05 22:09:29 - ERROR - stderr - +2025-02-05 22:09:29 - INFO - stdout - {'loss': 0.7415, 'grad_norm': 1.234967827796936, 'learning_rate': 8.810289863059298e-06, 'epoch': 1.66} +2025-02-05 22:09:29 - ERROR - stderr - 55%|█████▌ | 12380/22434 [12:01:49<6:59:45, 2.50s/it] +2025-02-05 22:09:31 - ERROR - stderr - 55%|█████▌ | 12381/22434 [12:01:51<6:59:37, 2.50s/it] +2025-02-05 22:09:31 - ERROR - stderr - +2025-02-05 22:09:31 - ERROR - stderr - +2025-02-05 22:09:31 - INFO - stdout - {'loss': 0.7507, 'grad_norm': 1.2720471620559692, 'learning_rate': 8.80885638272106e-06, 'epoch': 1.66} +2025-02-05 22:09:31 - ERROR - stderr - 55%|█████▌ | 12381/22434 [12:01:51<6:59:37, 2.50s/it] +2025-02-05 22:09:34 - ERROR - stderr - 55%|█████▌ | 12382/22434 
[12:01:53<6:55:51, 2.48s/it] +2025-02-05 22:09:34 - ERROR - stderr - +2025-02-05 22:09:34 - ERROR - stderr - +2025-02-05 22:09:34 - INFO - stdout - {'loss': 0.7444, 'grad_norm': 1.246524453163147, 'learning_rate': 8.807422927211068e-06, 'epoch': 1.66} +2025-02-05 22:09:34 - ERROR - stderr - 55%|█████▌ | 12382/22434 [12:01:53<6:55:51, 2.48s/it] +2025-02-05 22:09:36 - ERROR - stderr - 55%|█████▌ | 12383/22434 [12:01:56<6:56:45, 2.49s/it] +2025-02-05 22:09:36 - ERROR - stderr - +2025-02-05 22:09:36 - ERROR - stderr - +2025-02-05 22:09:36 - INFO - stdout - {'loss': 0.6691, 'grad_norm': 1.111374020576477, 'learning_rate': 8.805989496559204e-06, 'epoch': 1.66} +2025-02-05 22:09:36 - ERROR - stderr - 55%|█████▌ | 12383/22434 [12:01:56<6:56:45, 2.49s/it] +2025-02-05 22:09:39 - ERROR - stderr - 55%|█████▌ | 12384/22434 [12:01:58<6:53:52, 2.47s/it] +2025-02-05 22:09:39 - ERROR - stderr - +2025-02-05 22:09:39 - ERROR - stderr - +2025-02-05 22:09:39 - INFO - stdout - {'loss': 0.6776, 'grad_norm': 1.1502548456192017, 'learning_rate': 8.80455609079535e-06, 'epoch': 1.66} +2025-02-05 22:09:39 - ERROR - stderr - 55%|█████▌ | 12384/22434 [12:01:58<6:53:52, 2.47s/it] +2025-02-05 22:09:41 - ERROR - stderr - 55%|█████▌ | 12385/22434 [12:02:01<6:57:22, 2.49s/it] +2025-02-05 22:09:41 - ERROR - stderr - +2025-02-05 22:09:41 - ERROR - stderr - +2025-02-05 22:09:41 - INFO - stdout - {'loss': 0.639, 'grad_norm': 1.1283316612243652, 'learning_rate': 8.803122709949378e-06, 'epoch': 1.66} +2025-02-05 22:09:41 - ERROR - stderr - 55%|█████▌ | 12385/22434 [12:02:01<6:57:22, 2.49s/it] +2025-02-05 22:09:44 - ERROR - stderr - 55%|█████▌ | 12386/22434 [12:02:03<6:55:06, 2.48s/it] +2025-02-05 22:09:44 - ERROR - stderr - +2025-02-05 22:09:44 - ERROR - stderr - +2025-02-05 22:09:44 - INFO - stdout - {'loss': 0.6787, 'grad_norm': 1.3349289894104004, 'learning_rate': 8.80168935405117e-06, 'epoch': 1.66} +2025-02-05 22:09:44 - ERROR - stderr - 55%|█████▌ | 12386/22434 [12:02:03<6:55:06, 2.48s/it] 
+2025-02-05 22:09:46 - ERROR - stderr - 55%|█████▌ | 12387/22434 [12:02:06<6:57:43, 2.49s/it] +2025-02-05 22:09:46 - ERROR - stderr - +2025-02-05 22:09:46 - ERROR - stderr - +2025-02-05 22:09:46 - INFO - stdout - {'loss': 0.6616, 'grad_norm': 1.1897867918014526, 'learning_rate': 8.800256023130597e-06, 'epoch': 1.66} +2025-02-05 22:09:46 - ERROR - stderr - 55%|█████▌ | 12387/22434 [12:02:06<6:57:43, 2.49s/it] +2025-02-05 22:09:49 - ERROR - stderr - 55%|█████▌ | 12388/22434 [12:02:08<6:57:00, 2.49s/it] +2025-02-05 22:09:49 - ERROR - stderr - +2025-02-05 22:09:49 - ERROR - stderr - +2025-02-05 22:09:49 - INFO - stdout - {'loss': 0.7981, 'grad_norm': 1.3511940240859985, 'learning_rate': 8.798822717217543e-06, 'epoch': 1.66} +2025-02-05 22:09:49 - ERROR - stderr - 55%|█████▌ | 12388/22434 [12:02:08<6:57:00, 2.49s/it] +2025-02-05 22:09:51 - ERROR - stderr - 55%|█████▌ | 12389/22434 [12:02:11<6:53:49, 2.47s/it] +2025-02-05 22:09:51 - ERROR - stderr - +2025-02-05 22:09:51 - ERROR - stderr - +2025-02-05 22:09:51 - INFO - stdout - {'loss': 0.6669, 'grad_norm': 1.1522397994995117, 'learning_rate': 8.797389436341879e-06, 'epoch': 1.66} +2025-02-05 22:09:51 - ERROR - stderr - 55%|█████▌ | 12389/22434 [12:02:11<6:53:49, 2.47s/it] +2025-02-05 22:09:54 - ERROR - stderr - 55%|█████▌ | 12390/22434 [12:02:13<6:57:17, 2.49s/it] +2025-02-05 22:09:54 - ERROR - stderr - +2025-02-05 22:09:54 - ERROR - stderr - +2025-02-05 22:09:54 - INFO - stdout - {'loss': 0.7414, 'grad_norm': 1.3934744596481323, 'learning_rate': 8.795956180533478e-06, 'epoch': 1.66} +2025-02-05 22:09:54 - ERROR - stderr - 55%|█████▌ | 12390/22434 [12:02:13<6:57:17, 2.49s/it] +2025-02-05 22:09:56 - ERROR - stderr - 55%|█████▌ | 12391/22434 [12:02:16<6:58:05, 2.50s/it] +2025-02-05 22:09:56 - ERROR - stderr - +2025-02-05 22:09:56 - ERROR - stderr - +2025-02-05 22:09:56 - INFO - stdout - {'loss': 0.6611, 'grad_norm': 1.1359480619430542, 'learning_rate': 8.794522949822222e-06, 'epoch': 1.66} +2025-02-05 22:09:56 - ERROR - 
stderr - 55%|█████▌ | 12391/22434 [12:02:16<6:58:05, 2.50s/it] +2025-02-05 22:09:59 - ERROR - stderr - 55%|█████▌ | 12392/22434 [12:02:18<6:54:10, 2.47s/it] +2025-02-05 22:09:59 - ERROR - stderr - +2025-02-05 22:09:59 - ERROR - stderr - +2025-02-05 22:09:59 - INFO - stdout - {'loss': 0.6325, 'grad_norm': 1.0843828916549683, 'learning_rate': 8.793089744237983e-06, 'epoch': 1.66} +2025-02-05 22:09:59 - ERROR - stderr - 55%|█████▌ | 12392/22434 [12:02:18<6:54:10, 2.47s/it] +2025-02-05 22:10:01 - ERROR - stderr - 55%|█████▌ | 12393/22434 [12:02:21<6:52:19, 2.46s/it] +2025-02-05 22:10:01 - ERROR - stderr - +2025-02-05 22:10:01 - ERROR - stderr - +2025-02-05 22:10:01 - INFO - stdout - {'loss': 0.655, 'grad_norm': 1.3256853818893433, 'learning_rate': 8.79165656381063e-06, 'epoch': 1.66} +2025-02-05 22:10:01 - ERROR - stderr - 55%|█████▌ | 12393/22434 [12:02:21<6:52:19, 2.46s/it] +2025-02-05 22:10:04 - ERROR - stderr - 55%|█████▌ | 12394/22434 [12:02:23<7:00:36, 2.51s/it] +2025-02-05 22:10:04 - ERROR - stderr - +2025-02-05 22:10:04 - ERROR - stderr - +2025-02-05 22:10:04 - INFO - stdout - {'loss': 0.6668, 'grad_norm': 1.2043112516403198, 'learning_rate': 8.790223408570043e-06, 'epoch': 1.66} +2025-02-05 22:10:04 - ERROR - stderr - 55%|█████▌ | 12394/22434 [12:02:23<7:00:36, 2.51s/it] +2025-02-05 22:10:06 - ERROR - stderr - 55%|█████▌ | 12395/22434 [12:02:26<6:58:29, 2.50s/it] +2025-02-05 22:10:06 - ERROR - stderr - +2025-02-05 22:10:06 - ERROR - stderr - +2025-02-05 22:10:06 - INFO - stdout - {'loss': 0.756, 'grad_norm': 1.213523268699646, 'learning_rate': 8.788790278546087e-06, 'epoch': 1.66} +2025-02-05 22:10:06 - ERROR - stderr - 55%|█████▌ | 12395/22434 [12:02:26<6:58:29, 2.50s/it] +2025-02-05 22:10:09 - ERROR - stderr - 55%|█████▌ | 12396/22434 [12:02:28<7:00:21, 2.51s/it] +2025-02-05 22:10:09 - ERROR - stderr - +2025-02-05 22:10:09 - ERROR - stderr - +2025-02-05 22:10:09 - INFO - stdout - {'loss': 0.6849, 'grad_norm': 1.0996527671813965, 'learning_rate': 
8.78735717376864e-06, 'epoch': 1.66} +2025-02-05 22:10:09 - ERROR - stderr - 55%|█████▌ | 12396/22434 [12:02:28<7:00:21, 2.51s/it] +2025-02-05 22:10:11 - ERROR - stderr - 55%|█████▌ | 12397/22434 [12:02:31<6:59:11, 2.51s/it] +2025-02-05 22:10:11 - ERROR - stderr - +2025-02-05 22:10:11 - ERROR - stderr - +2025-02-05 22:10:11 - INFO - stdout - {'loss': 0.6748, 'grad_norm': 1.214781403541565, 'learning_rate': 8.785924094267575e-06, 'epoch': 1.66} +2025-02-05 22:10:11 - ERROR - stderr - 55%|█████▌ | 12397/22434 [12:02:31<6:59:11, 2.51s/it] +2025-02-05 22:10:14 - ERROR - stderr - 55%|█████▌ | 12398/22434 [12:02:33<6:58:20, 2.50s/it] +2025-02-05 22:10:14 - ERROR - stderr - +2025-02-05 22:10:14 - ERROR - stderr - +2025-02-05 22:10:14 - INFO - stdout - {'loss': 0.717, 'grad_norm': 1.197139024734497, 'learning_rate': 8.784491040072755e-06, 'epoch': 1.66} +2025-02-05 22:10:14 - ERROR - stderr - 55%|█████▌ | 12398/22434 [12:02:33<6:58:20, 2.50s/it] +2025-02-05 22:10:16 - ERROR - stderr - 55%|█████▌ | 12399/22434 [12:02:36<6:59:04, 2.51s/it] +2025-02-05 22:10:16 - ERROR - stderr - +2025-02-05 22:10:16 - ERROR - stderr - +2025-02-05 22:10:16 - INFO - stdout - {'loss': 0.6464, 'grad_norm': 1.1408874988555908, 'learning_rate': 8.783058011214063e-06, 'epoch': 1.66} +2025-02-05 22:10:16 - ERROR - stderr - 55%|█████▌ | 12399/22434 [12:02:36<6:59:04, 2.51s/it] +2025-02-05 22:10:19 - ERROR - stderr - 55%|█████▌ | 12400/22434 [12:02:38<6:59:06, 2.51s/it] +2025-02-05 22:10:19 - ERROR - stderr - +2025-02-05 22:10:19 - ERROR - stderr - +2025-02-05 22:10:19 - INFO - stdout - {'loss': 0.6863, 'grad_norm': 1.1369574069976807, 'learning_rate': 8.781625007721362e-06, 'epoch': 1.66} +2025-02-05 22:10:19 - ERROR - stderr - 55%|█████▌ | 12400/22434 [12:02:38<6:59:06, 2.51s/it] +2025-02-05 22:10:21 - ERROR - stderr - 55%|█████▌ | 12401/22434 [12:02:41<6:58:30, 2.50s/it] +2025-02-05 22:10:21 - ERROR - stderr - +2025-02-05 22:10:21 - ERROR - stderr - +2025-02-05 22:10:21 - INFO - stdout - {'loss': 
0.6533, 'grad_norm': 1.274687647819519, 'learning_rate': 8.780192029624516e-06, 'epoch': 1.66} +2025-02-05 22:10:21 - ERROR - stderr - 55%|█████▌ | 12401/22434 [12:02:41<6:58:30, 2.50s/it] +2025-02-05 22:10:24 - ERROR - stderr - 55%|█████▌ | 12402/22434 [12:02:43<6:57:38, 2.50s/it] +2025-02-05 22:10:24 - ERROR - stderr - +2025-02-05 22:10:24 - ERROR - stderr - +2025-02-05 22:10:24 - INFO - stdout - {'loss': 0.642, 'grad_norm': 1.3065712451934814, 'learning_rate': 8.778759076953403e-06, 'epoch': 1.66} +2025-02-05 22:10:24 - ERROR - stderr - 55%|█████▌ | 12402/22434 [12:02:43<6:57:38, 2.50s/it] +2025-02-05 22:10:26 - ERROR - stderr - 55%|█████▌ | 12403/22434 [12:02:46<7:00:11, 2.51s/it] +2025-02-05 22:10:26 - ERROR - stderr - +2025-02-05 22:10:26 - ERROR - stderr - +2025-02-05 22:10:26 - INFO - stdout - {'loss': 0.5633, 'grad_norm': 1.0645710229873657, 'learning_rate': 8.777326149737886e-06, 'epoch': 1.66} +2025-02-05 22:10:26 - ERROR - stderr - 55%|█████▌ | 12403/22434 [12:02:46<7:00:11, 2.51s/it] +2025-02-05 22:10:29 - ERROR - stderr - 55%|█████▌ | 12404/22434 [12:02:48<6:57:08, 2.50s/it] +2025-02-05 22:10:29 - ERROR - stderr - +2025-02-05 22:10:29 - ERROR - stderr - +2025-02-05 22:10:29 - INFO - stdout - {'loss': 0.6942, 'grad_norm': 1.2185330390930176, 'learning_rate': 8.77589324800784e-06, 'epoch': 1.66} +2025-02-05 22:10:29 - ERROR - stderr - 55%|█████▌ | 12404/22434 [12:02:48<6:57:08, 2.50s/it] +2025-02-05 22:10:31 - ERROR - stderr - 55%|█████▌ | 12405/22434 [12:02:51<6:56:34, 2.49s/it] +2025-02-05 22:10:31 - ERROR - stderr - +2025-02-05 22:10:31 - ERROR - stderr - +2025-02-05 22:10:31 - INFO - stdout - {'loss': 0.68, 'grad_norm': 1.3028680086135864, 'learning_rate': 8.774460371793126e-06, 'epoch': 1.66} +2025-02-05 22:10:31 - ERROR - stderr - 55%|█████▌ | 12405/22434 [12:02:51<6:56:34, 2.49s/it] +2025-02-05 22:10:34 - ERROR - stderr - 55%|█████▌ | 12406/22434 [12:02:54<7:07:51, 2.56s/it] +2025-02-05 22:10:34 - ERROR - stderr - +2025-02-05 22:10:34 - ERROR - 
stderr - +2025-02-05 22:10:34 - INFO - stdout - {'loss': 0.7071, 'grad_norm': 1.3789809942245483, 'learning_rate': 8.77302752112361e-06, 'epoch': 1.66} +2025-02-05 22:10:34 - ERROR - stderr - 55%|█████▌ | 12406/22434 [12:02:54<7:07:51, 2.56s/it] +2025-02-05 22:10:36 - ERROR - stderr - 55%|█████▌ | 12407/22434 [12:02:56<7:07:46, 2.56s/it] +2025-02-05 22:10:36 - ERROR - stderr - +2025-02-05 22:10:36 - ERROR - stderr - +2025-02-05 22:10:36 - INFO - stdout - {'loss': 0.7224, 'grad_norm': 1.1797913312911987, 'learning_rate': 8.771594696029166e-06, 'epoch': 1.66} +2025-02-05 22:10:36 - ERROR - stderr - 55%|█████▌ | 12407/22434 [12:02:56<7:07:46, 2.56s/it] +2025-02-05 22:10:39 - ERROR - stderr - 55%|█████▌ | 12408/22434 [12:02:59<7:06:21, 2.55s/it] +2025-02-05 22:10:39 - ERROR - stderr - +2025-02-05 22:10:39 - ERROR - stderr - +2025-02-05 22:10:39 - INFO - stdout - {'loss': 0.6927, 'grad_norm': 1.201615571975708, 'learning_rate': 8.77016189653965e-06, 'epoch': 1.66} +2025-02-05 22:10:39 - ERROR - stderr - 55%|█████▌ | 12408/22434 [12:02:59<7:06:21, 2.55s/it] +2025-02-05 22:10:41 - ERROR - stderr - 55%|█████▌ | 12409/22434 [12:03:01<7:04:57, 2.54s/it] +2025-02-05 22:10:41 - ERROR - stderr - +2025-02-05 22:10:41 - ERROR - stderr - +2025-02-05 22:10:41 - INFO - stdout - {'loss': 0.6494, 'grad_norm': 1.2347160577774048, 'learning_rate': 8.768729122684935e-06, 'epoch': 1.66} +2025-02-05 22:10:41 - ERROR - stderr - 55%|█████▌ | 12409/22434 [12:03:01<7:04:57, 2.54s/it] +2025-02-05 22:10:44 - ERROR - stderr - 55%|█████▌ | 12410/22434 [12:03:04<7:20:50, 2.64s/it] +2025-02-05 22:10:44 - ERROR - stderr - +2025-02-05 22:10:44 - ERROR - stderr - +2025-02-05 22:10:44 - INFO - stdout - {'loss': 0.6443, 'grad_norm': 1.1201642751693726, 'learning_rate': 8.767296374494886e-06, 'epoch': 1.66} +2025-02-05 22:10:44 - ERROR - stderr - 55%|█████▌ | 12410/22434 [12:03:04<7:20:50, 2.64s/it] +2025-02-05 22:10:47 - ERROR - stderr - 55%|█████▌ | 12411/22434 [12:03:07<7:22:45, 2.65s/it] +2025-02-05 
22:10:47 - ERROR - stderr - +2025-02-05 22:10:47 - ERROR - stderr - +2025-02-05 22:10:47 - INFO - stdout - {'loss': 0.7498, 'grad_norm': 1.2633198499679565, 'learning_rate': 8.76586365199936e-06, 'epoch': 1.66} +2025-02-05 22:10:47 - ERROR - stderr - 55%|█████▌ | 12411/22434 [12:03:07<7:22:45, 2.65s/it] +2025-02-05 22:10:49 - ERROR - stderr - 55%|█████▌ | 12412/22434 [12:03:09<7:13:52, 2.60s/it] +2025-02-05 22:10:49 - ERROR - stderr - +2025-02-05 22:10:49 - ERROR - stderr - +2025-02-05 22:10:49 - INFO - stdout - {'loss': 0.6637, 'grad_norm': 1.1623989343643188, 'learning_rate': 8.764430955228229e-06, 'epoch': 1.66} +2025-02-05 22:10:49 - ERROR - stderr - 55%|█████▌ | 12412/22434 [12:03:09<7:13:52, 2.60s/it] +2025-02-05 22:10:52 - ERROR - stderr - 55%|█████▌ | 12413/22434 [12:03:12<7:05:30, 2.55s/it] +2025-02-05 22:10:52 - ERROR - stderr - +2025-02-05 22:10:52 - ERROR - stderr - +2025-02-05 22:10:52 - INFO - stdout - {'loss': 0.7059, 'grad_norm': 1.1975071430206299, 'learning_rate': 8.762998284211353e-06, 'epoch': 1.66} +2025-02-05 22:10:52 - ERROR - stderr - 55%|█████▌ | 12413/22434 [12:03:12<7:05:30, 2.55s/it] +2025-02-05 22:10:54 - ERROR - stderr - 55%|█████▌ | 12414/22434 [12:03:14<7:01:39, 2.52s/it] +2025-02-05 22:10:54 - ERROR - stderr - +2025-02-05 22:10:54 - ERROR - stderr - +2025-02-05 22:10:54 - INFO - stdout - {'loss': 0.7047, 'grad_norm': 1.2383865118026733, 'learning_rate': 8.76156563897859e-06, 'epoch': 1.66} +2025-02-05 22:10:54 - ERROR - stderr - 55%|█████▌ | 12414/22434 [12:03:14<7:01:39, 2.52s/it] +2025-02-05 22:10:57 - ERROR - stderr - 55%|█████▌ | 12415/22434 [12:03:16<6:56:21, 2.49s/it] +2025-02-05 22:10:57 - ERROR - stderr - +2025-02-05 22:10:57 - ERROR - stderr - +2025-02-05 22:10:57 - INFO - stdout - {'loss': 0.7088, 'grad_norm': 1.1941391229629517, 'learning_rate': 8.760133019559808e-06, 'epoch': 1.66} +2025-02-05 22:10:57 - ERROR - stderr - 55%|█████▌ | 12415/22434 [12:03:17<6:56:21, 2.49s/it] +2025-02-05 22:10:59 - ERROR - stderr - 
55%|█████▌ | 12416/22434 [12:03:19<6:57:16, 2.50s/it] +2025-02-05 22:10:59 - ERROR - stderr - +2025-02-05 22:10:59 - ERROR - stderr - +2025-02-05 22:10:59 - INFO - stdout - {'loss': 0.6659, 'grad_norm': 1.2723753452301025, 'learning_rate': 8.758700425984865e-06, 'epoch': 1.66} +2025-02-05 22:10:59 - ERROR - stderr - 55%|█████▌ | 12416/22434 [12:03:19<6:57:16, 2.50s/it] +2025-02-05 22:11:02 - ERROR - stderr - 55%|█████▌ | 12417/22434 [12:03:22<7:00:33, 2.52s/it] +2025-02-05 22:11:02 - ERROR - stderr - +2025-02-05 22:11:02 - ERROR - stderr - +2025-02-05 22:11:02 - INFO - stdout - {'loss': 0.6707, 'grad_norm': 1.2194710969924927, 'learning_rate': 8.757267858283627e-06, 'epoch': 1.66} +2025-02-05 22:11:02 - ERROR - stderr - 55%|█████▌ | 12417/22434 [12:03:22<7:00:33, 2.52s/it] +2025-02-05 22:11:04 - ERROR - stderr - 55%|█████▌ | 12418/22434 [12:03:24<7:05:53, 2.55s/it] +2025-02-05 22:11:04 - ERROR - stderr - +2025-02-05 22:11:04 - ERROR - stderr - +2025-02-05 22:11:04 - INFO - stdout - {'loss': 0.6915, 'grad_norm': 1.231456995010376, 'learning_rate': 8.75583531648595e-06, 'epoch': 1.66} +2025-02-05 22:11:04 - ERROR - stderr - 55%|█████▌ | 12418/22434 [12:03:24<7:05:53, 2.55s/it] +2025-02-05 22:11:07 - ERROR - stderr - 55%|█████▌ | 12419/22434 [12:03:27<7:15:59, 2.61s/it] +2025-02-05 22:11:07 - ERROR - stderr - +2025-02-05 22:11:07 - ERROR - stderr - +2025-02-05 22:11:07 - INFO - stdout - {'loss': 0.6941, 'grad_norm': 1.3471393585205078, 'learning_rate': 8.754402800621694e-06, 'epoch': 1.66} +2025-02-05 22:11:07 - ERROR - stderr - 55%|█████▌ | 12419/22434 [12:03:27<7:15:59, 2.61s/it] +2025-02-05 22:11:10 - ERROR - stderr - 55%|█████▌ | 12420/22434 [12:03:30<7:12:51, 2.59s/it] +2025-02-05 22:11:10 - ERROR - stderr - +2025-02-05 22:11:10 - ERROR - stderr - +2025-02-05 22:11:10 - INFO - stdout - {'loss': 0.6819, 'grad_norm': 1.2551910877227783, 'learning_rate': 8.752970310720723e-06, 'epoch': 1.66} +2025-02-05 22:11:10 - ERROR - stderr - 55%|█████▌ | 12420/22434 
[12:03:30<7:12:51, 2.59s/it] +2025-02-05 22:11:12 - ERROR - stderr - 55%|█████▌ | 12421/22434 [12:03:32<7:05:58, 2.55s/it] +2025-02-05 22:11:12 - ERROR - stderr - +2025-02-05 22:11:12 - ERROR - stderr - +2025-02-05 22:11:12 - INFO - stdout - {'loss': 0.7278, 'grad_norm': 1.2738603353500366, 'learning_rate': 8.75153784681289e-06, 'epoch': 1.66} +2025-02-05 22:11:12 - ERROR - stderr - 55%|█████▌ | 12421/22434 [12:03:32<7:05:58, 2.55s/it] +2025-02-05 22:11:15 - ERROR - stderr - 55%|█████▌ | 12422/22434 [12:03:35<7:08:25, 2.57s/it] +2025-02-05 22:11:15 - ERROR - stderr - +2025-02-05 22:11:15 - ERROR - stderr - +2025-02-05 22:11:15 - INFO - stdout - {'loss': 0.7152, 'grad_norm': 1.272778034210205, 'learning_rate': 8.750105408928054e-06, 'epoch': 1.66} +2025-02-05 22:11:15 - ERROR - stderr - 55%|█████▌ | 12422/22434 [12:03:35<7:08:25, 2.57s/it] +2025-02-05 22:11:17 - ERROR - stderr - 55%|█████▌ | 12423/22434 [12:03:37<7:02:52, 2.53s/it] +2025-02-05 22:11:17 - ERROR - stderr - +2025-02-05 22:11:17 - ERROR - stderr - +2025-02-05 22:11:17 - INFO - stdout - {'loss': 0.6891, 'grad_norm': 1.1494654417037964, 'learning_rate': 8.748672997096079e-06, 'epoch': 1.66} +2025-02-05 22:11:17 - ERROR - stderr - 55%|█████▌ | 12423/22434 [12:03:37<7:02:52, 2.53s/it] +2025-02-05 22:11:20 - ERROR - stderr - 55%|█████▌ | 12424/22434 [12:03:40<7:06:39, 2.56s/it] +2025-02-05 22:11:20 - ERROR - stderr - +2025-02-05 22:11:20 - ERROR - stderr - +2025-02-05 22:11:20 - INFO - stdout - {'loss': 0.6685, 'grad_norm': 1.2249614000320435, 'learning_rate': 8.747240611346815e-06, 'epoch': 1.66} +2025-02-05 22:11:20 - ERROR - stderr - 55%|█████▌ | 12424/22434 [12:03:40<7:06:39, 2.56s/it] +2025-02-05 22:11:22 - ERROR - stderr - 55%|█████▌ | 12425/22434 [12:03:42<7:02:31, 2.53s/it] +2025-02-05 22:11:22 - ERROR - stderr - +2025-02-05 22:11:22 - ERROR - stderr - +2025-02-05 22:11:22 - INFO - stdout - {'loss': 0.6816, 'grad_norm': 1.2430477142333984, 'learning_rate': 8.745808251710123e-06, 'epoch': 1.66} 
+2025-02-05 22:11:22 - ERROR - stderr - 55%|█████▌ | 12425/22434 [12:03:42<7:02:31, 2.53s/it] +2025-02-05 22:11:25 - ERROR - stderr - 55%|█████▌ | 12426/22434 [12:03:45<7:00:12, 2.52s/it] +2025-02-05 22:11:25 - ERROR - stderr - +2025-02-05 22:11:25 - ERROR - stderr - +2025-02-05 22:11:25 - INFO - stdout - {'loss': 0.6665, 'grad_norm': 1.1498692035675049, 'learning_rate': 8.74437591821586e-06, 'epoch': 1.66} +2025-02-05 22:11:25 - ERROR - stderr - 55%|█████▌ | 12426/22434 [12:03:45<7:00:12, 2.52s/it] +2025-02-05 22:11:27 - ERROR - stderr - 55%|█████▌ | 12427/22434 [12:03:47<6:55:40, 2.49s/it] +2025-02-05 22:11:27 - ERROR - stderr - +2025-02-05 22:11:27 - ERROR - stderr - +2025-02-05 22:11:27 - INFO - stdout - {'loss': 0.6778, 'grad_norm': 1.2858808040618896, 'learning_rate': 8.742943610893875e-06, 'epoch': 1.66} +2025-02-05 22:11:27 - ERROR - stderr - 55%|█████▌ | 12427/22434 [12:03:47<6:55:40, 2.49s/it] +2025-02-05 22:11:30 - ERROR - stderr - 55%|█████▌ | 12428/22434 [12:03:50<6:57:07, 2.50s/it] +2025-02-05 22:11:30 - ERROR - stderr - +2025-02-05 22:11:30 - ERROR - stderr - +2025-02-05 22:11:30 - INFO - stdout - {'loss': 0.716, 'grad_norm': 1.296128749847412, 'learning_rate': 8.74151132977403e-06, 'epoch': 1.66} +2025-02-05 22:11:30 - ERROR - stderr - 55%|█████▌ | 12428/22434 [12:03:50<6:57:07, 2.50s/it] +2025-02-05 22:11:32 - ERROR - stderr - 55%|█████▌ | 12429/22434 [12:03:52<6:56:07, 2.50s/it] +2025-02-05 22:11:32 - ERROR - stderr - +2025-02-05 22:11:32 - ERROR - stderr - +2025-02-05 22:11:32 - INFO - stdout - {'loss': 0.7402, 'grad_norm': 1.1610465049743652, 'learning_rate': 8.740079074886178e-06, 'epoch': 1.66} +2025-02-05 22:11:32 - ERROR - stderr - 55%|█████▌ | 12429/22434 [12:03:52<6:56:07, 2.50s/it] +2025-02-05 22:11:35 - ERROR - stderr - 55%|█████▌ | 12430/22434 [12:03:55<6:56:31, 2.50s/it] +2025-02-05 22:11:35 - ERROR - stderr - +2025-02-05 22:11:35 - ERROR - stderr - +2025-02-05 22:11:35 - INFO - stdout - {'loss': 0.6612, 'grad_norm': 
1.2021414041519165, 'learning_rate': 8.738646846260169e-06, 'epoch': 1.66} +2025-02-05 22:11:35 - ERROR - stderr - 55%|█████▌ | 12430/22434 [12:03:55<6:56:31, 2.50s/it] +2025-02-05 22:11:37 - ERROR - stderr - 55%|█████▌ | 12431/22434 [12:03:57<6:57:40, 2.51s/it] +2025-02-05 22:11:37 - ERROR - stderr - +2025-02-05 22:11:37 - ERROR - stderr - +2025-02-05 22:11:37 - INFO - stdout - {'loss': 0.7217, 'grad_norm': 1.2859634160995483, 'learning_rate': 8.737214643925864e-06, 'epoch': 1.66} +2025-02-05 22:11:37 - ERROR - stderr - 55%|█████▌ | 12431/22434 [12:03:57<6:57:40, 2.51s/it] +2025-02-05 22:11:40 - ERROR - stderr - 55%|█████▌ | 12432/22434 [12:04:00<6:55:46, 2.49s/it] +2025-02-05 22:11:40 - ERROR - stderr - +2025-02-05 22:11:40 - ERROR - stderr - +2025-02-05 22:11:40 - INFO - stdout - {'loss': 0.6475, 'grad_norm': 1.2385551929473877, 'learning_rate': 8.735782467913107e-06, 'epoch': 1.66} +2025-02-05 22:11:40 - ERROR - stderr - 55%|█████▌ | 12432/22434 [12:04:00<6:55:46, 2.49s/it] +2025-02-05 22:11:42 - ERROR - stderr - 55%|█████▌ | 12433/22434 [12:04:02<6:52:00, 2.47s/it] +2025-02-05 22:11:42 - ERROR - stderr - +2025-02-05 22:11:42 - ERROR - stderr - +2025-02-05 22:11:42 - INFO - stdout - {'loss': 0.7304, 'grad_norm': 1.3111885786056519, 'learning_rate': 8.734350318251758e-06, 'epoch': 1.66} +2025-02-05 22:11:42 - ERROR - stderr - 55%|█████▌ | 12433/22434 [12:04:02<6:52:00, 2.47s/it] +2025-02-05 22:11:45 - ERROR - stderr - 55%|█████▌ | 12434/22434 [12:04:04<6:53:18, 2.48s/it] +2025-02-05 22:11:45 - ERROR - stderr - +2025-02-05 22:11:45 - ERROR - stderr - +2025-02-05 22:11:45 - INFO - stdout - {'loss': 0.6707, 'grad_norm': 1.141268253326416, 'learning_rate': 8.732918194971663e-06, 'epoch': 1.66} +2025-02-05 22:11:45 - ERROR - stderr - 55%|█████▌ | 12434/22434 [12:04:04<6:53:18, 2.48s/it] +2025-02-05 22:11:47 - ERROR - stderr - 55%|█████▌ | 12435/22434 [12:04:07<6:53:57, 2.48s/it] +2025-02-05 22:11:47 - ERROR - stderr - +2025-02-05 22:11:47 - ERROR - stderr - 
+2025-02-05 22:11:47 - INFO - stdout - {'loss': 0.8004, 'grad_norm': 1.321655035018921, 'learning_rate': 8.731486098102674e-06, 'epoch': 1.66} +2025-02-05 22:11:47 - ERROR - stderr - 55%|█████▌ | 12435/22434 [12:04:07<6:53:57, 2.48s/it] +2025-02-05 22:11:50 - ERROR - stderr - 55%|█████▌ | 12436/22434 [12:04:09<6:56:21, 2.50s/it] +2025-02-05 22:11:50 - ERROR - stderr - +2025-02-05 22:11:50 - ERROR - stderr - +2025-02-05 22:11:50 - INFO - stdout - {'loss': 0.6271, 'grad_norm': 1.2179478406906128, 'learning_rate': 8.730054027674649e-06, 'epoch': 1.66} +2025-02-05 22:11:50 - ERROR - stderr - 55%|█████▌ | 12436/22434 [12:04:10<6:56:21, 2.50s/it] +2025-02-05 22:11:52 - ERROR - stderr - 55%|█████▌ | 12437/22434 [12:04:12<6:55:54, 2.50s/it] +2025-02-05 22:11:52 - ERROR - stderr - +2025-02-05 22:11:52 - ERROR - stderr - +2025-02-05 22:11:52 - INFO - stdout - {'loss': 0.7672, 'grad_norm': 1.328784704208374, 'learning_rate': 8.728621983717433e-06, 'epoch': 1.66} +2025-02-05 22:11:52 - ERROR - stderr - 55%|█████▌ | 12437/22434 [12:04:12<6:55:54, 2.50s/it] +2025-02-05 22:11:55 - ERROR - stderr - 55%|█████▌ | 12438/22434 [12:04:15<6:58:10, 2.51s/it] +2025-02-05 22:11:55 - ERROR - stderr - +2025-02-05 22:11:55 - ERROR - stderr - +2025-02-05 22:11:55 - INFO - stdout - {'loss': 0.6297, 'grad_norm': 1.1775200366973877, 'learning_rate': 8.72718996626087e-06, 'epoch': 1.66} +2025-02-05 22:11:55 - ERROR - stderr - 55%|█████▌ | 12438/22434 [12:04:15<6:58:10, 2.51s/it] +2025-02-05 22:11:57 - ERROR - stderr - 55%|█████▌ | 12439/22434 [12:04:17<7:06:06, 2.56s/it] +2025-02-05 22:11:57 - ERROR - stderr - +2025-02-05 22:11:57 - ERROR - stderr - +2025-02-05 22:11:57 - INFO - stdout - {'loss': 0.6214, 'grad_norm': 1.1654350757598877, 'learning_rate': 8.725757975334816e-06, 'epoch': 1.66} +2025-02-05 22:11:57 - ERROR - stderr - 55%|█████▌ | 12439/22434 [12:04:17<7:06:06, 2.56s/it] +2025-02-05 22:12:00 - ERROR - stderr - 55%|█████▌ | 12440/22434 [12:04:20<7:00:33, 2.52s/it] +2025-02-05 22:12:00 - 
ERROR - stderr - +2025-02-05 22:12:00 - ERROR - stderr - +2025-02-05 22:12:00 - INFO - stdout - {'loss': 0.6549, 'grad_norm': 1.2574567794799805, 'learning_rate': 8.724326010969116e-06, 'epoch': 1.66} +2025-02-05 22:12:00 - ERROR - stderr - 55%|█████▌ | 12440/22434 [12:04:20<7:00:33, 2.52s/it] +2025-02-05 22:12:02 - ERROR - stderr - 55%|█████▌ | 12441/22434 [12:04:22<6:58:06, 2.51s/it] +2025-02-05 22:12:02 - ERROR - stderr - +2025-02-05 22:12:02 - ERROR - stderr - +2025-02-05 22:12:02 - INFO - stdout - {'loss': 0.6359, 'grad_norm': 1.1157575845718384, 'learning_rate': 8.722894073193622e-06, 'epoch': 1.66} +2025-02-05 22:12:02 - ERROR - stderr - 55%|█████▌ | 12441/22434 [12:04:22<6:58:06, 2.51s/it] +2025-02-05 22:12:05 - ERROR - stderr - 55%|█████▌ | 12442/22434 [12:04:25<7:01:42, 2.53s/it] +2025-02-05 22:12:05 - ERROR - stderr - +2025-02-05 22:12:05 - ERROR - stderr - +2025-02-05 22:12:05 - INFO - stdout - {'loss': 0.7887, 'grad_norm': 1.4238977432250977, 'learning_rate': 8.721462162038181e-06, 'epoch': 1.66} +2025-02-05 22:12:05 - ERROR - stderr - 55%|█████▌ | 12442/22434 [12:04:25<7:01:42, 2.53s/it] +2025-02-05 22:12:07 - ERROR - stderr - 55%|█████▌ | 12443/22434 [12:04:27<6:58:18, 2.51s/it] +2025-02-05 22:12:07 - ERROR - stderr - +2025-02-05 22:12:07 - ERROR - stderr - +2025-02-05 22:12:07 - INFO - stdout - {'loss': 0.6746, 'grad_norm': 1.296324372291565, 'learning_rate': 8.720030277532632e-06, 'epoch': 1.66} +2025-02-05 22:12:07 - ERROR - stderr - 55%|█████▌ | 12443/22434 [12:04:27<6:58:18, 2.51s/it] +2025-02-05 22:12:10 - ERROR - stderr - 55%|█████▌ | 12444/22434 [12:04:30<6:58:55, 2.52s/it] +2025-02-05 22:12:10 - ERROR - stderr - +2025-02-05 22:12:10 - ERROR - stderr - +2025-02-05 22:12:10 - INFO - stdout - {'loss': 0.6781, 'grad_norm': 1.1258771419525146, 'learning_rate': 8.718598419706832e-06, 'epoch': 1.66} +2025-02-05 22:12:10 - ERROR - stderr - 55%|█████▌ | 12444/22434 [12:04:30<6:58:55, 2.52s/it] +2025-02-05 22:12:12 - ERROR - stderr - 55%|█████▌ | 
12445/22434 [12:04:32<6:54:29, 2.49s/it] +2025-02-05 22:12:12 - ERROR - stderr - +2025-02-05 22:12:12 - ERROR - stderr - +2025-02-05 22:12:12 - INFO - stdout - {'loss': 0.7061, 'grad_norm': 1.2695239782333374, 'learning_rate': 8.717166588590624e-06, 'epoch': 1.66} +2025-02-05 22:12:12 - ERROR - stderr - 55%|█████▌ | 12445/22434 [12:04:32<6:54:29, 2.49s/it] +2025-02-05 22:12:15 - ERROR - stderr - 55%|█████▌ | 12446/22434 [12:04:35<7:04:30, 2.55s/it] +2025-02-05 22:12:15 - ERROR - stderr - +2025-02-05 22:12:15 - ERROR - stderr - +2025-02-05 22:12:15 - INFO - stdout - {'loss': 0.7041, 'grad_norm': 1.2950083017349243, 'learning_rate': 8.715734784213843e-06, 'epoch': 1.66} +2025-02-05 22:12:15 - ERROR - stderr - 55%|█████▌ | 12446/22434 [12:04:35<7:04:30, 2.55s/it] +2025-02-05 22:12:18 - ERROR - stderr - 55%|█████▌ | 12447/22434 [12:04:37<7:01:36, 2.53s/it] +2025-02-05 22:12:18 - ERROR - stderr - +2025-02-05 22:12:18 - ERROR - stderr - +2025-02-05 22:12:18 - INFO - stdout - {'loss': 0.6618, 'grad_norm': 1.102746844291687, 'learning_rate': 8.714303006606346e-06, 'epoch': 1.66} +2025-02-05 22:12:18 - ERROR - stderr - 55%|█████▌ | 12447/22434 [12:04:37<7:01:36, 2.53s/it] +2025-02-05 22:12:20 - ERROR - stderr - 55%|█████▌ | 12448/22434 [12:04:40<6:59:02, 2.52s/it] +2025-02-05 22:12:20 - ERROR - stderr - +2025-02-05 22:12:20 - ERROR - stderr - +2025-02-05 22:12:20 - INFO - stdout - {'loss': 0.784, 'grad_norm': 1.445131540298462, 'learning_rate': 8.71287125579797e-06, 'epoch': 1.66} +2025-02-05 22:12:20 - ERROR - stderr - 55%|█████▌ | 12448/22434 [12:04:40<6:59:02, 2.52s/it] +2025-02-05 22:12:22 - ERROR - stderr - 55%|█████▌ | 12449/22434 [12:04:42<6:56:23, 2.50s/it] +2025-02-05 22:12:23 - ERROR - stderr - +2025-02-05 22:12:23 - ERROR - stderr - +2025-02-05 22:12:23 - INFO - stdout - {'loss': 0.7588, 'grad_norm': 1.3842480182647705, 'learning_rate': 8.711439531818565e-06, 'epoch': 1.66} +2025-02-05 22:12:23 - ERROR - stderr - 55%|█████▌ | 12449/22434 [12:04:42<6:56:23, 
2.50s/it] +2025-02-05 22:12:25 - ERROR - stderr - 55%|█████▌ | 12450/22434 [12:04:45<6:53:17, 2.48s/it] +2025-02-05 22:12:25 - ERROR - stderr - +2025-02-05 22:12:25 - ERROR - stderr - +2025-02-05 22:12:25 - INFO - stdout - {'loss': 0.6185, 'grad_norm': 1.1452401876449585, 'learning_rate': 8.71000783469797e-06, 'epoch': 1.66} +2025-02-05 22:12:25 - ERROR - stderr - 55%|█████▌ | 12450/22434 [12:04:45<6:53:17, 2.48s/it] +2025-02-05 22:12:27 - ERROR - stderr - 56%|█████▌ | 12451/22434 [12:04:47<6:53:54, 2.49s/it] +2025-02-05 22:12:27 - ERROR - stderr - +2025-02-05 22:12:27 - ERROR - stderr - +2025-02-05 22:12:27 - INFO - stdout - {'loss': 0.6646, 'grad_norm': 1.1969728469848633, 'learning_rate': 8.708576164466023e-06, 'epoch': 1.67} +2025-02-05 22:12:27 - ERROR - stderr - 56%|█████▌ | 12451/22434 [12:04:47<6:53:54, 2.49s/it] +2025-02-05 22:12:30 - ERROR - stderr - 56%|█████▌ | 12452/22434 [12:04:50<6:55:58, 2.50s/it] +2025-02-05 22:12:30 - ERROR - stderr - +2025-02-05 22:12:30 - ERROR - stderr - +2025-02-05 22:12:30 - INFO - stdout - {'loss': 0.7306, 'grad_norm': 1.3963426351547241, 'learning_rate': 8.707144521152574e-06, 'epoch': 1.67} +2025-02-05 22:12:30 - ERROR - stderr - 56%|█████▌ | 12452/22434 [12:04:50<6:55:58, 2.50s/it] +2025-02-05 22:12:32 - ERROR - stderr - 56%|█████▌ | 12453/22434 [12:04:52<6:56:34, 2.50s/it] +2025-02-05 22:12:32 - ERROR - stderr - +2025-02-05 22:12:32 - ERROR - stderr - +2025-02-05 22:12:32 - INFO - stdout - {'loss': 0.7085, 'grad_norm': 1.174609661102295, 'learning_rate': 8.705712904787458e-06, 'epoch': 1.67} +2025-02-05 22:12:32 - ERROR - stderr - 56%|█████▌ | 12453/22434 [12:04:52<6:56:34, 2.50s/it] +2025-02-05 22:12:35 - ERROR - stderr - 56%|█████▌ | 12454/22434 [12:04:55<6:54:00, 2.49s/it] +2025-02-05 22:12:35 - ERROR - stderr - +2025-02-05 22:12:35 - ERROR - stderr - +2025-02-05 22:12:35 - INFO - stdout - {'loss': 0.728, 'grad_norm': 1.1370718479156494, 'learning_rate': 8.704281315400518e-06, 'epoch': 1.67} +2025-02-05 22:12:35 - 
ERROR - stderr - 56%|█████▌ | 12454/22434 [12:04:55<6:54:00, 2.49s/it] +2025-02-05 22:12:37 - ERROR - stderr - 56%|█████▌ | 12455/22434 [12:04:57<6:54:00, 2.49s/it] +2025-02-05 22:12:37 - ERROR - stderr - +2025-02-05 22:12:37 - ERROR - stderr - +2025-02-05 22:12:37 - INFO - stdout - {'loss': 0.7647, 'grad_norm': 1.2526185512542725, 'learning_rate': 8.702849753021595e-06, 'epoch': 1.67} +2025-02-05 22:12:37 - ERROR - stderr - 56%|█████▌ | 12455/22434 [12:04:57<6:54:00, 2.49s/it] +2025-02-05 22:12:40 - ERROR - stderr - 56%|█████▌ | 12456/22434 [12:05:00<6:51:00, 2.47s/it] +2025-02-05 22:12:40 - ERROR - stderr - +2025-02-05 22:12:40 - ERROR - stderr - +2025-02-05 22:12:40 - INFO - stdout - {'loss': 0.6529, 'grad_norm': 1.7578057050704956, 'learning_rate': 8.701418217680525e-06, 'epoch': 1.67} +2025-02-05 22:12:40 - ERROR - stderr - 56%|█████▌ | 12456/22434 [12:05:00<6:51:00, 2.47s/it] +2025-02-05 22:12:42 - ERROR - stderr - 56%|█████▌ | 12457/22434 [12:05:02<6:51:59, 2.48s/it] +2025-02-05 22:12:42 - ERROR - stderr - +2025-02-05 22:12:42 - ERROR - stderr - +2025-02-05 22:12:42 - INFO - stdout - {'loss': 0.7633, 'grad_norm': 1.556486964225769, 'learning_rate': 8.699986709407156e-06, 'epoch': 1.67} +2025-02-05 22:12:42 - ERROR - stderr - 56%|█████▌ | 12457/22434 [12:05:02<6:51:59, 2.48s/it] +2025-02-05 22:12:45 - ERROR - stderr - 56%|█████▌ | 12458/22434 [12:05:05<6:59:58, 2.53s/it] +2025-02-05 22:12:45 - ERROR - stderr - +2025-02-05 22:12:45 - ERROR - stderr - +2025-02-05 22:12:45 - INFO - stdout - {'loss': 0.7914, 'grad_norm': 1.3589372634887695, 'learning_rate': 8.698555228231319e-06, 'epoch': 1.67} +2025-02-05 22:12:45 - ERROR - stderr - 56%|█████▌ | 12458/22434 [12:05:05<6:59:58, 2.53s/it] +2025-02-05 22:12:47 - ERROR - stderr - 56%|█████▌ | 12459/22434 [12:05:07<6:58:07, 2.51s/it] +2025-02-05 22:12:47 - ERROR - stderr - +2025-02-05 22:12:47 - ERROR - stderr - +2025-02-05 22:12:47 - INFO - stdout - {'loss': 0.7337, 'grad_norm': 1.334647536277771, 'learning_rate': 
8.697123774182847e-06, 'epoch': 1.67} +2025-02-05 22:12:47 - ERROR - stderr - 56%|█████▌ | 12459/22434 [12:05:07<6:58:07, 2.51s/it] +2025-02-05 22:12:50 - ERROR - stderr - 56%|█████▌ | 12460/22434 [12:05:10<6:57:36, 2.51s/it] +2025-02-05 22:12:50 - ERROR - stderr - +2025-02-05 22:12:50 - ERROR - stderr - +2025-02-05 22:12:50 - INFO - stdout - {'loss': 0.6154, 'grad_norm': 1.2330920696258545, 'learning_rate': 8.695692347291586e-06, 'epoch': 1.67} +2025-02-05 22:12:50 - ERROR - stderr - 56%|█████▌ | 12460/22434 [12:05:10<6:57:36, 2.51s/it] +2025-02-05 22:12:52 - ERROR - stderr - 56%|█████▌ | 12461/22434 [12:05:12<6:58:12, 2.52s/it] +2025-02-05 22:12:53 - ERROR - stderr - +2025-02-05 22:12:53 - ERROR - stderr - +2025-02-05 22:12:53 - INFO - stdout - {'loss': 0.759, 'grad_norm': 1.347893476486206, 'learning_rate': 8.694260947587372e-06, 'epoch': 1.67} +2025-02-05 22:12:53 - ERROR - stderr - 56%|█████▌ | 12461/22434 [12:05:12<6:58:12, 2.52s/it] +2025-02-05 22:12:55 - ERROR - stderr - 56%|█████▌ | 12462/22434 [12:05:15<7:20:16, 2.65s/it] +2025-02-05 22:12:55 - ERROR - stderr - +2025-02-05 22:12:55 - ERROR - stderr - +2025-02-05 22:12:55 - INFO - stdout - {'loss': 0.5989, 'grad_norm': 1.0674257278442383, 'learning_rate': 8.692829575100037e-06, 'epoch': 1.67} +2025-02-05 22:12:55 - ERROR - stderr - 56%|█████▌ | 12462/22434 [12:05:15<7:20:16, 2.65s/it] +2025-02-05 22:12:58 - ERROR - stderr - 56%|█████▌ | 12463/22434 [12:05:18<7:12:49, 2.60s/it] +2025-02-05 22:12:58 - ERROR - stderr - +2025-02-05 22:12:58 - ERROR - stderr - +2025-02-05 22:12:58 - INFO - stdout - {'loss': 0.6585, 'grad_norm': 1.1802756786346436, 'learning_rate': 8.69139822985942e-06, 'epoch': 1.67} +2025-02-05 22:12:58 - ERROR - stderr - 56%|█████▌ | 12463/22434 [12:05:18<7:12:49, 2.60s/it] +2025-02-05 22:13:00 - ERROR - stderr - 56%|█████▌ | 12464/22434 [12:05:20<7:07:49, 2.57s/it] +2025-02-05 22:13:00 - ERROR - stderr - +2025-02-05 22:13:00 - ERROR - stderr - +2025-02-05 22:13:00 - INFO - stdout - {'loss': 
0.7902, 'grad_norm': 1.2098033428192139, 'learning_rate': 8.68996691189535e-06, 'epoch': 1.67} +2025-02-05 22:13:00 - ERROR - stderr - 56%|█████▌ | 12464/22434 [12:05:20<7:07:49, 2.57s/it] +2025-02-05 22:13:03 - ERROR - stderr - 56%|█████▌ | 12465/22434 [12:05:23<7:04:54, 2.56s/it] +2025-02-05 22:13:03 - ERROR - stderr - +2025-02-05 22:13:03 - ERROR - stderr - +2025-02-05 22:13:03 - INFO - stdout - {'loss': 0.6894, 'grad_norm': 1.1803351640701294, 'learning_rate': 8.688535621237674e-06, 'epoch': 1.67} +2025-02-05 22:13:03 - ERROR - stderr - 56%|█████▌ | 12465/22434 [12:05:23<7:04:54, 2.56s/it] +2025-02-05 22:13:05 - ERROR - stderr - 56%|█████▌ | 12466/22434 [12:05:25<7:00:10, 2.53s/it] +2025-02-05 22:13:05 - ERROR - stderr - +2025-02-05 22:13:05 - ERROR - stderr - +2025-02-05 22:13:05 - INFO - stdout - {'loss': 0.7404, 'grad_norm': 1.357036828994751, 'learning_rate': 8.687104357916214e-06, 'epoch': 1.67} +2025-02-05 22:13:05 - ERROR - stderr - 56%|█████▌ | 12466/22434 [12:05:25<7:00:10, 2.53s/it] +2025-02-05 22:13:08 - ERROR - stderr - 56%|█████▌ | 12467/22434 [12:05:28<7:01:54, 2.54s/it] +2025-02-05 22:13:08 - ERROR - stderr - +2025-02-05 22:13:08 - ERROR - stderr - +2025-02-05 22:13:08 - INFO - stdout - {'loss': 0.6416, 'grad_norm': 1.1231348514556885, 'learning_rate': 8.685673121960805e-06, 'epoch': 1.67} +2025-02-05 22:13:08 - ERROR - stderr - 56%|█████▌ | 12467/22434 [12:05:28<7:01:54, 2.54s/it] +2025-02-05 22:13:10 - ERROR - stderr - 56%|█████▌ | 12468/22434 [12:05:30<7:00:22, 2.53s/it] +2025-02-05 22:13:11 - ERROR - stderr - +2025-02-05 22:13:11 - ERROR - stderr - +2025-02-05 22:13:11 - INFO - stdout - {'loss': 0.6708, 'grad_norm': 1.1997489929199219, 'learning_rate': 8.684241913401285e-06, 'epoch': 1.67} +2025-02-05 22:13:11 - ERROR - stderr - 56%|█████▌ | 12468/22434 [12:05:30<7:00:22, 2.53s/it] +2025-02-05 22:13:13 - ERROR - stderr - 56%|█████▌ | 12469/22434 [12:05:33<6:57:51, 2.52s/it] +2025-02-05 22:13:13 - ERROR - stderr - +2025-02-05 22:13:13 - ERROR 
- stderr - +2025-02-05 22:13:13 - INFO - stdout - {'loss': 0.7309, 'grad_norm': 1.3341172933578491, 'learning_rate': 8.682810732267486e-06, 'epoch': 1.67} +2025-02-05 22:13:13 - ERROR - stderr - 56%|█████▌ | 12469/22434 [12:05:33<6:57:51, 2.52s/it] +2025-02-05 22:13:15 - ERROR - stderr - 56%|█████▌ | 12470/22434 [12:05:35<6:55:55, 2.50s/it] +2025-02-05 22:13:15 - ERROR - stderr - +2025-02-05 22:13:15 - ERROR - stderr - +2025-02-05 22:13:15 - INFO - stdout - {'loss': 0.6862, 'grad_norm': 1.1997499465942383, 'learning_rate': 8.681379578589232e-06, 'epoch': 1.67} +2025-02-05 22:13:15 - ERROR - stderr - 56%|█████▌ | 12470/22434 [12:05:35<6:55:55, 2.50s/it] +2025-02-05 22:13:18 - ERROR - stderr - 56%|█████▌ | 12471/22434 [12:05:38<6:52:27, 2.48s/it] +2025-02-05 22:13:18 - ERROR - stderr - +2025-02-05 22:13:18 - ERROR - stderr - +2025-02-05 22:13:18 - INFO - stdout - {'loss': 0.653, 'grad_norm': 1.1846867799758911, 'learning_rate': 8.679948452396361e-06, 'epoch': 1.67} +2025-02-05 22:13:18 - ERROR - stderr - 56%|█████▌ | 12471/22434 [12:05:38<6:52:27, 2.48s/it] +2025-02-05 22:13:20 - ERROR - stderr - 56%|█████▌ | 12472/22434 [12:05:40<6:52:54, 2.49s/it] +2025-02-05 22:13:20 - ERROR - stderr - +2025-02-05 22:13:20 - ERROR - stderr - +2025-02-05 22:13:20 - INFO - stdout - {'loss': 0.6476, 'grad_norm': 1.1316466331481934, 'learning_rate': 8.678517353718699e-06, 'epoch': 1.67} +2025-02-05 22:13:20 - ERROR - stderr - 56%|█████▌ | 12472/22434 [12:05:40<6:52:54, 2.49s/it] +2025-02-05 22:13:23 - ERROR - stderr - 56%|█████▌ | 12473/22434 [12:05:43<6:49:16, 2.47s/it] +2025-02-05 22:13:23 - ERROR - stderr - +2025-02-05 22:13:23 - ERROR - stderr - +2025-02-05 22:13:23 - INFO - stdout - {'loss': 0.7163, 'grad_norm': 1.328794240951538, 'learning_rate': 8.67708628258608e-06, 'epoch': 1.67} +2025-02-05 22:13:23 - ERROR - stderr - 56%|█████▌ | 12473/22434 [12:05:43<6:49:16, 2.47s/it] +2025-02-05 22:13:25 - ERROR - stderr - 56%|█████▌ | 12474/22434 [12:05:45<6:50:50, 2.47s/it] +2025-02-05 
22:13:25 - ERROR - stderr - +2025-02-05 22:13:25 - ERROR - stderr - +2025-02-05 22:13:25 - INFO - stdout - {'loss': 0.7099, 'grad_norm': 1.1940380334854126, 'learning_rate': 8.675655239028333e-06, 'epoch': 1.67} +2025-02-05 22:13:25 - ERROR - stderr - 56%|█████▌ | 12474/22434 [12:05:45<6:50:50, 2.47s/it] +2025-02-05 22:13:28 - ERROR - stderr - 56%|█████▌ | 12475/22434 [12:05:48<6:52:34, 2.49s/it] +2025-02-05 22:13:28 - ERROR - stderr - +2025-02-05 22:13:28 - ERROR - stderr - +2025-02-05 22:13:28 - INFO - stdout - {'loss': 0.6675, 'grad_norm': 1.1756426095962524, 'learning_rate': 8.674224223075283e-06, 'epoch': 1.67} +2025-02-05 22:13:28 - ERROR - stderr - 56%|█████▌ | 12475/22434 [12:05:48<6:52:34, 2.49s/it] +2025-02-05 22:13:30 - ERROR - stderr - 56%|█████▌ | 12476/22434 [12:05:50<6:52:21, 2.48s/it] +2025-02-05 22:13:30 - ERROR - stderr - +2025-02-05 22:13:30 - ERROR - stderr - +2025-02-05 22:13:30 - INFO - stdout - {'loss': 0.6899, 'grad_norm': 1.33180570602417, 'learning_rate': 8.672793234756762e-06, 'epoch': 1.67} +2025-02-05 22:13:30 - ERROR - stderr - 56%|█████▌ | 12476/22434 [12:05:50<6:52:21, 2.48s/it] +2025-02-05 22:13:33 - ERROR - stderr - 56%|█████▌ | 12477/22434 [12:05:53<6:53:06, 2.49s/it] +2025-02-05 22:13:33 - ERROR - stderr - +2025-02-05 22:13:33 - ERROR - stderr - +2025-02-05 22:13:33 - INFO - stdout - {'loss': 0.6941, 'grad_norm': 1.1839414834976196, 'learning_rate': 8.671362274102598e-06, 'epoch': 1.67} +2025-02-05 22:13:33 - ERROR - stderr - 56%|█████▌ | 12477/22434 [12:05:53<6:53:06, 2.49s/it] +2025-02-05 22:13:35 - ERROR - stderr - 56%|█████▌ | 12478/22434 [12:05:55<7:04:19, 2.56s/it] +2025-02-05 22:13:36 - ERROR - stderr - +2025-02-05 22:13:36 - ERROR - stderr - +2025-02-05 22:13:36 - INFO - stdout - {'loss': 0.6536, 'grad_norm': 1.2693500518798828, 'learning_rate': 8.66993134114261e-06, 'epoch': 1.67} +2025-02-05 22:13:36 - ERROR - stderr - 56%|█████▌ | 12478/22434 [12:05:55<7:04:19, 2.56s/it] +2025-02-05 22:13:38 - ERROR - stderr - 
56%|█████▌ | 12479/22434 [12:05:58<7:00:24, 2.53s/it] +2025-02-05 22:13:38 - ERROR - stderr - +2025-02-05 22:13:38 - ERROR - stderr - +2025-02-05 22:13:38 - INFO - stdout - {'loss': 0.7355, 'grad_norm': 1.3352482318878174, 'learning_rate': 8.668500435906635e-06, 'epoch': 1.67} +2025-02-05 22:13:38 - ERROR - stderr - 56%|█████▌ | 12479/22434 [12:05:58<7:00:24, 2.53s/it] +2025-02-05 22:13:40 - ERROR - stderr - 56%|█████▌ | 12480/22434 [12:06:00<6:58:02, 2.52s/it] +2025-02-05 22:13:41 - ERROR - stderr - +2025-02-05 22:13:41 - ERROR - stderr - +2025-02-05 22:13:41 - INFO - stdout - {'loss': 0.6689, 'grad_norm': 1.1899019479751587, 'learning_rate': 8.667069558424493e-06, 'epoch': 1.67} +2025-02-05 22:13:41 - ERROR - stderr - 56%|█████▌ | 12480/22434 [12:06:00<6:58:02, 2.52s/it] +2025-02-05 22:13:43 - ERROR - stderr - 56%|█████▌ | 12481/22434 [12:06:03<6:56:51, 2.51s/it] +2025-02-05 22:13:43 - ERROR - stderr - +2025-02-05 22:13:43 - ERROR - stderr - +2025-02-05 22:13:43 - INFO - stdout - {'loss': 0.7324, 'grad_norm': 1.1978273391723633, 'learning_rate': 8.66563870872601e-06, 'epoch': 1.67} +2025-02-05 22:13:43 - ERROR - stderr - 56%|█████▌ | 12481/22434 [12:06:03<6:56:51, 2.51s/it] +2025-02-05 22:13:45 - ERROR - stderr - 56%|█████▌ | 12482/22434 [12:06:05<6:55:50, 2.51s/it] +2025-02-05 22:13:46 - ERROR - stderr - +2025-02-05 22:13:46 - ERROR - stderr - +2025-02-05 22:13:46 - INFO - stdout - {'loss': 0.7844, 'grad_norm': 1.3753151893615723, 'learning_rate': 8.664207886841014e-06, 'epoch': 1.67} +2025-02-05 22:13:46 - ERROR - stderr - 56%|█████▌ | 12482/22434 [12:06:05<6:55:50, 2.51s/it] +2025-02-05 22:13:48 - ERROR - stderr - 56%|█████▌ | 12483/22434 [12:06:08<6:52:30, 2.49s/it] +2025-02-05 22:13:48 - ERROR - stderr - +2025-02-05 22:13:48 - ERROR - stderr - +2025-02-05 22:13:48 - INFO - stdout - {'loss': 0.7412, 'grad_norm': 1.3972060680389404, 'learning_rate': 8.662777092799322e-06, 'epoch': 1.67} +2025-02-05 22:13:48 - ERROR - stderr - 56%|█████▌ | 12483/22434 
[12:06:08<6:52:30, 2.49s/it] +2025-02-05 22:13:50 - ERROR - stderr - 56%|█████▌ | 12484/22434 [12:06:10<6:49:34, 2.47s/it] +2025-02-05 22:13:50 - ERROR - stderr - +2025-02-05 22:13:50 - ERROR - stderr - +2025-02-05 22:13:50 - INFO - stdout - {'loss': 0.6088, 'grad_norm': 1.1643095016479492, 'learning_rate': 8.661346326630767e-06, 'epoch': 1.67} +2025-02-05 22:13:50 - ERROR - stderr - 56%|█████▌ | 12484/22434 [12:06:10<6:49:34, 2.47s/it] +2025-02-05 22:13:53 - ERROR - stderr - 56%|█████▌ | 12485/22434 [12:06:13<6:50:03, 2.47s/it] +2025-02-05 22:13:53 - ERROR - stderr - +2025-02-05 22:13:53 - ERROR - stderr - +2025-02-05 22:13:53 - INFO - stdout - {'loss': 0.6934, 'grad_norm': 1.2762802839279175, 'learning_rate': 8.659915588365164e-06, 'epoch': 1.67} +2025-02-05 22:13:53 - ERROR - stderr - 56%|█████▌ | 12485/22434 [12:06:13<6:50:03, 2.47s/it] +2025-02-05 22:13:55 - ERROR - stderr - 56%|█████▌ | 12486/22434 [12:06:15<6:49:45, 2.47s/it] +2025-02-05 22:13:55 - ERROR - stderr - +2025-02-05 22:13:55 - ERROR - stderr - +2025-02-05 22:13:55 - INFO - stdout - {'loss': 0.6632, 'grad_norm': 1.2050150632858276, 'learning_rate': 8.658484878032335e-06, 'epoch': 1.67} +2025-02-05 22:13:55 - ERROR - stderr - 56%|█████▌ | 12486/22434 [12:06:15<6:49:45, 2.47s/it] +2025-02-05 22:13:58 - ERROR - stderr - 56%|█████▌ | 12487/22434 [12:06:18<7:01:30, 2.54s/it] +2025-02-05 22:13:58 - ERROR - stderr - +2025-02-05 22:13:58 - ERROR - stderr - +2025-02-05 22:13:58 - INFO - stdout - {'loss': 0.6321, 'grad_norm': 1.1012870073318481, 'learning_rate': 8.657054195662112e-06, 'epoch': 1.67} +2025-02-05 22:13:58 - ERROR - stderr - 56%|█████▌ | 12487/22434 [12:06:18<7:01:30, 2.54s/it] +2025-02-05 22:14:00 - ERROR - stderr - 56%|█████▌ | 12488/22434 [12:06:20<6:57:56, 2.52s/it] +2025-02-05 22:14:01 - ERROR - stderr - +2025-02-05 22:14:01 - ERROR - stderr - +2025-02-05 22:14:01 - INFO - stdout - {'loss': 0.6685, 'grad_norm': 1.1518096923828125, 'learning_rate': 8.655623541284304e-06, 'epoch': 1.67} 
+2025-02-05 22:14:01 - ERROR - stderr - 56%|█████▌ | 12488/22434 [12:06:20<6:57:56, 2.52s/it] +2025-02-05 22:14:03 - ERROR - stderr - 56%|█████▌ | 12489/22434 [12:06:23<6:59:22, 2.53s/it] +2025-02-05 22:14:03 - ERROR - stderr - +2025-02-05 22:14:03 - ERROR - stderr - +2025-02-05 22:14:03 - INFO - stdout - {'loss': 0.6534, 'grad_norm': 1.4391409158706665, 'learning_rate': 8.654192914928739e-06, 'epoch': 1.67} +2025-02-05 22:14:03 - ERROR - stderr - 56%|█████▌ | 12489/22434 [12:06:23<6:59:22, 2.53s/it] +2025-02-05 22:14:05 - ERROR - stderr - 56%|█████▌ | 12490/22434 [12:06:25<6:56:50, 2.52s/it] +2025-02-05 22:14:06 - ERROR - stderr - +2025-02-05 22:14:06 - ERROR - stderr - +2025-02-05 22:14:06 - INFO - stdout - {'loss': 0.75, 'grad_norm': 1.343369483947754, 'learning_rate': 8.652762316625238e-06, 'epoch': 1.67} +2025-02-05 22:14:06 - ERROR - stderr - 56%|█████▌ | 12490/22434 [12:06:25<6:56:50, 2.52s/it] +2025-02-05 22:14:08 - ERROR - stderr - 56%|█████▌ | 12491/22434 [12:06:28<6:53:09, 2.49s/it] +2025-02-05 22:14:08 - ERROR - stderr - +2025-02-05 22:14:08 - ERROR - stderr - +2025-02-05 22:14:08 - INFO - stdout - {'loss': 0.7311, 'grad_norm': 1.209408164024353, 'learning_rate': 8.651331746403611e-06, 'epoch': 1.67} +2025-02-05 22:14:08 - ERROR - stderr - 56%|█████▌ | 12491/22434 [12:06:28<6:53:09, 2.49s/it] +2025-02-05 22:14:10 - ERROR - stderr - 56%|█████▌ | 12492/22434 [12:06:30<6:56:09, 2.51s/it] +2025-02-05 22:14:11 - ERROR - stderr - +2025-02-05 22:14:11 - ERROR - stderr - +2025-02-05 22:14:11 - INFO - stdout - {'loss': 0.6656, 'grad_norm': 1.0760979652404785, 'learning_rate': 8.649901204293685e-06, 'epoch': 1.67} +2025-02-05 22:14:11 - ERROR - stderr - 56%|█████▌ | 12492/22434 [12:06:30<6:56:09, 2.51s/it] +2025-02-05 22:14:13 - ERROR - stderr - 56%|█████▌ | 12493/22434 [12:06:33<6:56:53, 2.52s/it] +2025-02-05 22:14:13 - ERROR - stderr - +2025-02-05 22:14:13 - ERROR - stderr - +2025-02-05 22:14:13 - INFO - stdout - {'loss': 0.6537, 'grad_norm': 1.187767744064331, 
'learning_rate': 8.648470690325277e-06, 'epoch': 1.67} +2025-02-05 22:14:13 - ERROR - stderr - 56%|█████▌ | 12493/22434 [12:06:33<6:56:53, 2.52s/it] +2025-02-05 22:14:16 - ERROR - stderr - 56%|█████▌ | 12494/22434 [12:06:35<6:56:28, 2.51s/it] +2025-02-05 22:14:16 - ERROR - stderr - +2025-02-05 22:14:16 - ERROR - stderr - +2025-02-05 22:14:16 - INFO - stdout - {'loss': 0.6062, 'grad_norm': 1.1163438558578491, 'learning_rate': 8.647040204528206e-06, 'epoch': 1.67} +2025-02-05 22:14:16 - ERROR - stderr - 56%|█████▌ | 12494/22434 [12:06:35<6:56:28, 2.51s/it] +2025-02-05 22:14:18 - ERROR - stderr - 56%|█████▌ | 12495/22434 [12:06:38<6:53:11, 2.49s/it] +2025-02-05 22:14:18 - ERROR - stderr - +2025-02-05 22:14:18 - ERROR - stderr - +2025-02-05 22:14:18 - INFO - stdout - {'loss': 0.6554, 'grad_norm': 1.1550617218017578, 'learning_rate': 8.645609746932288e-06, 'epoch': 1.67} +2025-02-05 22:14:18 - ERROR - stderr - 56%|█████▌ | 12495/22434 [12:06:38<6:53:11, 2.49s/it] +2025-02-05 22:14:20 - ERROR - stderr - 56%|█████▌ | 12496/22434 [12:06:40<6:50:46, 2.48s/it] +2025-02-05 22:14:20 - ERROR - stderr - +2025-02-05 22:14:20 - ERROR - stderr - +2025-02-05 22:14:20 - INFO - stdout - {'loss': 0.6944, 'grad_norm': 1.3999446630477905, 'learning_rate': 8.644179317567335e-06, 'epoch': 1.67} +2025-02-05 22:14:20 - ERROR - stderr - 56%|█████▌ | 12496/22434 [12:06:40<6:50:46, 2.48s/it] +2025-02-05 22:14:23 - ERROR - stderr - 56%|█████▌ | 12497/22434 [12:06:43<6:53:40, 2.50s/it] +2025-02-05 22:14:23 - ERROR - stderr - +2025-02-05 22:14:23 - ERROR - stderr - +2025-02-05 22:14:23 - INFO - stdout - {'loss': 0.6437, 'grad_norm': 1.3315181732177734, 'learning_rate': 8.64274891646317e-06, 'epoch': 1.67} +2025-02-05 22:14:23 - ERROR - stderr - 56%|█████▌ | 12497/22434 [12:06:43<6:53:40, 2.50s/it] +2025-02-05 22:14:26 - ERROR - stderr - 56%|█████▌ | 12498/22434 [12:06:46<7:11:22, 2.60s/it] +2025-02-05 22:14:26 - ERROR - stderr - +2025-02-05 22:14:26 - ERROR - stderr - +2025-02-05 22:14:26 - INFO - 
stdout - {'loss': 0.6573, 'grad_norm': 1.067337155342102, 'learning_rate': 8.641318543649602e-06, 'epoch': 1.67} +2025-02-05 22:14:26 - ERROR - stderr - 56%|█████▌ | 12498/22434 [12:06:46<7:11:22, 2.60s/it] +2025-02-05 22:14:28 - ERROR - stderr - 56%|█████▌ | 12499/22434 [12:06:48<7:05:39, 2.57s/it] +2025-02-05 22:14:28 - ERROR - stderr - +2025-02-05 22:14:28 - ERROR - stderr - +2025-02-05 22:14:28 - INFO - stdout - {'loss': 0.6289, 'grad_norm': 1.1923857927322388, 'learning_rate': 8.639888199156449e-06, 'epoch': 1.67} +2025-02-05 22:14:28 - ERROR - stderr - 56%|█████▌ | 12499/22434 [12:06:48<7:05:39, 2.57s/it] +2025-02-05 22:14:31 - ERROR - stderr - 56%|█████▌ | 12500/22434 [12:06:51<7:03:20, 2.56s/it] +2025-02-05 22:14:31 - ERROR - stderr - +2025-02-05 22:14:31 - ERROR - stderr - +2025-02-05 22:14:31 - INFO - stdout - {'loss': 0.6731, 'grad_norm': 1.1675527095794678, 'learning_rate': 8.638457883013529e-06, 'epoch': 1.67} +2025-02-05 22:14:31 - ERROR - stderr - 56%|█████▌ | 12500/22434 [12:06:51<7:03:20, 2.56s/it] +2025-02-05 22:14:33 - ERROR - stderr - 56%|█████▌ | 12501/22434 [12:06:53<6:58:31, 2.53s/it] +2025-02-05 22:14:33 - ERROR - stderr - +2025-02-05 22:14:33 - ERROR - stderr - +2025-02-05 22:14:33 - INFO - stdout - {'loss': 0.6308, 'grad_norm': 1.086632251739502, 'learning_rate': 8.637027595250646e-06, 'epoch': 1.67} +2025-02-05 22:14:33 - ERROR - stderr - 56%|█████▌ | 12501/22434 [12:06:53<6:58:31, 2.53s/it] +2025-02-05 22:14:36 - ERROR - stderr - 56%|█████▌ | 12502/22434 [12:06:56<7:00:18, 2.54s/it] +2025-02-05 22:14:36 - ERROR - stderr - +2025-02-05 22:14:36 - ERROR - stderr - +2025-02-05 22:14:36 - INFO - stdout - {'loss': 0.7017, 'grad_norm': 1.1351196765899658, 'learning_rate': 8.635597335897623e-06, 'epoch': 1.67} +2025-02-05 22:14:36 - ERROR - stderr - 56%|█████▌ | 12502/22434 [12:06:56<7:00:18, 2.54s/it] +2025-02-05 22:14:38 - ERROR - stderr - 56%|█████▌ | 12503/22434 [12:06:58<7:00:03, 2.54s/it] +2025-02-05 22:14:38 - ERROR - stderr - +2025-02-05 
22:14:38 - ERROR - stderr - +2025-02-05 22:14:38 - INFO - stdout - {'loss': 0.7266, 'grad_norm': 1.2161800861358643, 'learning_rate': 8.63416710498427e-06, 'epoch': 1.67} +2025-02-05 22:14:38 - ERROR - stderr - 56%|█████▌ | 12503/22434 [12:06:58<7:00:03, 2.54s/it] +2025-02-05 22:14:41 - ERROR - stderr - 56%|█████▌ | 12504/22434 [12:07:01<7:07:52, 2.59s/it] +2025-02-05 22:14:41 - ERROR - stderr - +2025-02-05 22:14:41 - ERROR - stderr - +2025-02-05 22:14:41 - INFO - stdout - {'loss': 0.6823, 'grad_norm': 1.211410641670227, 'learning_rate': 8.63273690254039e-06, 'epoch': 1.67} +2025-02-05 22:14:41 - ERROR - stderr - 56%|█████▌ | 12504/22434 [12:07:01<7:07:52, 2.59s/it] +2025-02-05 22:14:44 - ERROR - stderr - 56%|█████▌ | 12505/22434 [12:07:03<7:08:46, 2.59s/it] +2025-02-05 22:14:44 - ERROR - stderr - +2025-02-05 22:14:44 - ERROR - stderr - +2025-02-05 22:14:44 - INFO - stdout - {'loss': 0.5785, 'grad_norm': 1.1712548732757568, 'learning_rate': 8.631306728595804e-06, 'epoch': 1.67} +2025-02-05 22:14:44 - ERROR - stderr - 56%|█████▌ | 12505/22434 [12:07:03<7:08:46, 2.59s/it] +2025-02-05 22:14:46 - ERROR - stderr - 56%|█████▌ | 12506/22434 [12:07:06<7:02:47, 2.56s/it] +2025-02-05 22:14:46 - ERROR - stderr - +2025-02-05 22:14:46 - ERROR - stderr - +2025-02-05 22:14:46 - INFO - stdout - {'loss': 0.7561, 'grad_norm': 1.3365895748138428, 'learning_rate': 8.629876583180322e-06, 'epoch': 1.67} +2025-02-05 22:14:46 - ERROR - stderr - 56%|█████▌ | 12506/22434 [12:07:06<7:02:47, 2.56s/it] +2025-02-05 22:14:49 - ERROR - stderr - 56%|█████▌ | 12507/22434 [12:07:08<7:01:14, 2.55s/it] +2025-02-05 22:14:49 - ERROR - stderr - +2025-02-05 22:14:49 - ERROR - stderr - +2025-02-05 22:14:49 - INFO - stdout - {'loss': 0.6249, 'grad_norm': 1.1151317358016968, 'learning_rate': 8.628446466323748e-06, 'epoch': 1.67} +2025-02-05 22:14:49 - ERROR - stderr - 56%|█████▌ | 12507/22434 [12:07:08<7:01:14, 2.55s/it] +2025-02-05 22:14:51 - ERROR - stderr - 56%|█████▌ | 12508/22434 [12:07:11<7:00:32, 
2.54s/it] +2025-02-05 22:14:51 - ERROR - stderr - +2025-02-05 22:14:51 - ERROR - stderr - +2025-02-05 22:14:51 - INFO - stdout - {'loss': 0.7013, 'grad_norm': 1.3312451839447021, 'learning_rate': 8.627016378055896e-06, 'epoch': 1.67} +2025-02-05 22:14:51 - ERROR - stderr - 56%|█████▌ | 12508/22434 [12:07:11<7:00:32, 2.54s/it] +2025-02-05 22:14:54 - ERROR - stderr - 56%|█████▌ | 12509/22434 [12:07:13<6:54:20, 2.50s/it] +2025-02-05 22:14:54 - ERROR - stderr - +2025-02-05 22:14:54 - ERROR - stderr - +2025-02-05 22:14:54 - INFO - stdout - {'loss': 0.6323, 'grad_norm': 1.1077631711959839, 'learning_rate': 8.625586318406574e-06, 'epoch': 1.67} +2025-02-05 22:14:54 - ERROR - stderr - 56%|█████▌ | 12509/22434 [12:07:13<6:54:20, 2.50s/it] +2025-02-05 22:14:56 - ERROR - stderr - 56%|█████▌ | 12510/22434 [12:07:16<6:52:44, 2.50s/it] +2025-02-05 22:14:56 - ERROR - stderr - +2025-02-05 22:14:56 - ERROR - stderr - +2025-02-05 22:14:56 - INFO - stdout - {'loss': 0.7314, 'grad_norm': 1.2349666357040405, 'learning_rate': 8.624156287405591e-06, 'epoch': 1.67} +2025-02-05 22:14:56 - ERROR - stderr - 56%|█████▌ | 12510/22434 [12:07:16<6:52:44, 2.50s/it] +2025-02-05 22:14:59 - ERROR - stderr - 56%|█████▌ | 12511/22434 [12:07:18<6:49:40, 2.48s/it] +2025-02-05 22:14:59 - ERROR - stderr - +2025-02-05 22:14:59 - ERROR - stderr - +2025-02-05 22:14:59 - INFO - stdout - {'loss': 0.7739, 'grad_norm': 1.3365390300750732, 'learning_rate': 8.622726285082753e-06, 'epoch': 1.67} +2025-02-05 22:14:59 - ERROR - stderr - 56%|█████▌ | 12511/22434 [12:07:18<6:49:40, 2.48s/it] +2025-02-05 22:15:01 - ERROR - stderr - 56%|█████▌ | 12512/22434 [12:07:21<7:01:44, 2.55s/it] +2025-02-05 22:15:01 - ERROR - stderr - +2025-02-05 22:15:01 - ERROR - stderr - +2025-02-05 22:15:01 - INFO - stdout - {'loss': 0.6535, 'grad_norm': 1.3835841417312622, 'learning_rate': 8.621296311467868e-06, 'epoch': 1.67} +2025-02-05 22:15:01 - ERROR - stderr - 56%|█████▌ | 12512/22434 [12:07:21<7:01:44, 2.55s/it] +2025-02-05 22:15:04 - 
ERROR - stderr - 56%|█████▌ | 12513/22434 [12:07:23<6:56:47, 2.52s/it] +2025-02-05 22:15:04 - ERROR - stderr - +2025-02-05 22:15:04 - ERROR - stderr - +2025-02-05 22:15:04 - INFO - stdout - {'loss': 0.7294, 'grad_norm': 1.306641936302185, 'learning_rate': 8.61986636659074e-06, 'epoch': 1.67} +2025-02-05 22:15:04 - ERROR - stderr - 56%|█████▌ | 12513/22434 [12:07:24<6:56:47, 2.52s/it] +2025-02-05 22:15:06 - ERROR - stderr - 56%|█████▌ | 12514/22434 [12:07:26<6:52:49, 2.50s/it] +2025-02-05 22:15:06 - ERROR - stderr - +2025-02-05 22:15:06 - ERROR - stderr - +2025-02-05 22:15:06 - INFO - stdout - {'loss': 0.76, 'grad_norm': 1.2207576036453247, 'learning_rate': 8.618436450481182e-06, 'epoch': 1.67} +2025-02-05 22:15:06 - ERROR - stderr - 56%|█████▌ | 12514/22434 [12:07:26<6:52:49, 2.50s/it] +2025-02-05 22:15:09 - ERROR - stderr - 56%|█████▌ | 12515/22434 [12:07:29<7:02:15, 2.55s/it] +2025-02-05 22:15:09 - ERROR - stderr - +2025-02-05 22:15:09 - ERROR - stderr - +2025-02-05 22:15:09 - INFO - stdout - {'loss': 0.6388, 'grad_norm': 1.0595773458480835, 'learning_rate': 8.617006563168986e-06, 'epoch': 1.67} +2025-02-05 22:15:09 - ERROR - stderr - 56%|█████▌ | 12515/22434 [12:07:29<7:02:15, 2.55s/it] +2025-02-05 22:15:11 - ERROR - stderr - 56%|█████▌ | 12516/22434 [12:07:31<6:58:53, 2.53s/it] +2025-02-05 22:15:11 - ERROR - stderr - +2025-02-05 22:15:11 - ERROR - stderr - +2025-02-05 22:15:11 - INFO - stdout - {'loss': 0.7508, 'grad_norm': 1.2052029371261597, 'learning_rate': 8.615576704683972e-06, 'epoch': 1.67} +2025-02-05 22:15:11 - ERROR - stderr - 56%|█████▌ | 12516/22434 [12:07:31<6:58:53, 2.53s/it] +2025-02-05 22:15:14 - ERROR - stderr - 56%|█████▌ | 12517/22434 [12:07:34<6:58:06, 2.53s/it] +2025-02-05 22:15:14 - ERROR - stderr - +2025-02-05 22:15:14 - ERROR - stderr - +2025-02-05 22:15:14 - INFO - stdout - {'loss': 0.6894, 'grad_norm': 1.2362549304962158, 'learning_rate': 8.614146875055933e-06, 'epoch': 1.67} +2025-02-05 22:15:14 - ERROR - stderr - 56%|█████▌ | 
12517/22434 [12:07:34<6:58:06, 2.53s/it] +2025-02-05 22:15:16 - ERROR - stderr - 56%|█████▌ | 12518/22434 [12:07:36<6:52:44, 2.50s/it] +2025-02-05 22:15:16 - ERROR - stderr - +2025-02-05 22:15:16 - ERROR - stderr - +2025-02-05 22:15:16 - INFO - stdout - {'loss': 0.7011, 'grad_norm': 1.2581454515457153, 'learning_rate': 8.612717074314677e-06, 'epoch': 1.67} +2025-02-05 22:15:16 - ERROR - stderr - 56%|█████▌ | 12518/22434 [12:07:36<6:52:44, 2.50s/it] +2025-02-05 22:15:19 - ERROR - stderr - 56%|█████▌ | 12519/22434 [12:07:39<6:54:09, 2.51s/it] +2025-02-05 22:15:19 - ERROR - stderr - +2025-02-05 22:15:19 - ERROR - stderr - +2025-02-05 22:15:19 - INFO - stdout - {'loss': 0.7014, 'grad_norm': 1.2816431522369385, 'learning_rate': 8.611287302490008e-06, 'epoch': 1.67} +2025-02-05 22:15:19 - ERROR - stderr - 56%|█████▌ | 12519/22434 [12:07:39<6:54:09, 2.51s/it] +2025-02-05 22:15:21 - ERROR - stderr - 56%|█████▌ | 12520/22434 [12:07:41<6:53:19, 2.50s/it] +2025-02-05 22:15:21 - ERROR - stderr - +2025-02-05 22:15:21 - ERROR - stderr - +2025-02-05 22:15:21 - INFO - stdout - {'loss': 0.6913, 'grad_norm': 1.2089641094207764, 'learning_rate': 8.609857559611723e-06, 'epoch': 1.67} +2025-02-05 22:15:21 - ERROR - stderr - 56%|█████▌ | 12520/22434 [12:07:41<6:53:19, 2.50s/it] +2025-02-05 22:15:24 - ERROR - stderr - 56%|█████▌ | 12521/22434 [12:07:44<7:01:04, 2.55s/it] +2025-02-05 22:15:24 - ERROR - stderr - +2025-02-05 22:15:24 - ERROR - stderr - +2025-02-05 22:15:24 - INFO - stdout - {'loss': 0.75, 'grad_norm': 1.356865644454956, 'learning_rate': 8.608427845709632e-06, 'epoch': 1.67} +2025-02-05 22:15:24 - ERROR - stderr - 56%|█████▌ | 12521/22434 [12:07:44<7:01:04, 2.55s/it] +2025-02-05 22:15:26 - ERROR - stderr - 56%|█████▌ | 12522/22434 [12:07:46<6:58:00, 2.53s/it] +2025-02-05 22:15:26 - ERROR - stderr - +2025-02-05 22:15:26 - ERROR - stderr - +2025-02-05 22:15:26 - INFO - stdout - {'loss': 0.7448, 'grad_norm': 1.489906668663025, 'learning_rate': 8.60699816081353e-06, 'epoch': 
1.67} +2025-02-05 22:15:26 - ERROR - stderr - 56%|█████▌ | 12522/22434 [12:07:46<6:58:00, 2.53s/it] +2025-02-05 22:15:29 - ERROR - stderr - 56%|█████▌ | 12523/22434 [12:07:49<6:54:01, 2.51s/it] +2025-02-05 22:15:29 - ERROR - stderr - +2025-02-05 22:15:29 - ERROR - stderr - +2025-02-05 22:15:29 - INFO - stdout - {'loss': 0.6496, 'grad_norm': 1.4315907955169678, 'learning_rate': 8.605568504953213e-06, 'epoch': 1.67} +2025-02-05 22:15:29 - ERROR - stderr - 56%|█████▌ | 12523/22434 [12:07:49<6:54:01, 2.51s/it] +2025-02-05 22:15:31 - ERROR - stderr - 56%|█████▌ | 12524/22434 [12:07:51<6:54:26, 2.51s/it] +2025-02-05 22:15:31 - ERROR - stderr - +2025-02-05 22:15:31 - ERROR - stderr - +2025-02-05 22:15:31 - INFO - stdout - {'loss': 0.6653, 'grad_norm': 1.2465261220932007, 'learning_rate': 8.60413887815849e-06, 'epoch': 1.67} +2025-02-05 22:15:31 - ERROR - stderr - 56%|█████▌ | 12524/22434 [12:07:51<6:54:26, 2.51s/it] +2025-02-05 22:15:34 - ERROR - stderr - 56%|█████▌ | 12525/22434 [12:07:54<6:52:05, 2.50s/it] +2025-02-05 22:15:34 - ERROR - stderr - +2025-02-05 22:15:34 - ERROR - stderr - +2025-02-05 22:15:34 - INFO - stdout - {'loss': 0.6422, 'grad_norm': 1.1621308326721191, 'learning_rate': 8.602709280459156e-06, 'epoch': 1.67} +2025-02-05 22:15:34 - ERROR - stderr - 56%|█████▌ | 12525/22434 [12:07:54<6:52:05, 2.50s/it] +2025-02-05 22:15:36 - ERROR - stderr - 56%|█████▌ | 12526/22434 [12:07:56<6:57:04, 2.53s/it] +2025-02-05 22:15:37 - ERROR - stderr - +2025-02-05 22:15:37 - ERROR - stderr - +2025-02-05 22:15:37 - INFO - stdout - {'loss': 0.7162, 'grad_norm': 1.2785048484802246, 'learning_rate': 8.60127971188501e-06, 'epoch': 1.68} +2025-02-05 22:15:37 - ERROR - stderr - 56%|█████▌ | 12526/22434 [12:07:56<6:57:04, 2.53s/it] +2025-02-05 22:15:39 - ERROR - stderr - 56%|█████▌ | 12527/22434 [12:07:59<6:55:32, 2.52s/it] +2025-02-05 22:15:39 - ERROR - stderr - +2025-02-05 22:15:39 - ERROR - stderr - +2025-02-05 22:15:39 - INFO - stdout - {'loss': 0.7648, 'grad_norm': 
1.2701060771942139, 'learning_rate': 8.599850172465851e-06, 'epoch': 1.68} +2025-02-05 22:15:39 - ERROR - stderr - 56%|█████▌ | 12527/22434 [12:07:59<6:55:32, 2.52s/it] +2025-02-05 22:15:41 - ERROR - stderr - 56%|█████▌ | 12528/22434 [12:08:01<6:54:09, 2.51s/it] +2025-02-05 22:15:41 - ERROR - stderr - +2025-02-05 22:15:41 - ERROR - stderr - +2025-02-05 22:15:41 - INFO - stdout - {'loss': 0.7635, 'grad_norm': 1.3403438329696655, 'learning_rate': 8.598420662231473e-06, 'epoch': 1.68} +2025-02-05 22:15:41 - ERROR - stderr - 56%|█████▌ | 12528/22434 [12:08:01<6:54:09, 2.51s/it] +2025-02-05 22:15:44 - ERROR - stderr - 56%|█████▌ | 12529/22434 [12:08:04<7:04:39, 2.57s/it] +2025-02-05 22:15:44 - ERROR - stderr - +2025-02-05 22:15:44 - ERROR - stderr - +2025-02-05 22:15:44 - INFO - stdout - {'loss': 0.7371, 'grad_norm': 1.293044924736023, 'learning_rate': 8.596991181211679e-06, 'epoch': 1.68} +2025-02-05 22:15:44 - ERROR - stderr - 56%|█████▌ | 12529/22434 [12:08:04<7:04:39, 2.57s/it] +2025-02-05 22:15:47 - ERROR - stderr - 56%|█████▌ | 12530/22434 [12:08:06<7:01:08, 2.55s/it] +2025-02-05 22:15:47 - ERROR - stderr - +2025-02-05 22:15:47 - ERROR - stderr - +2025-02-05 22:15:47 - INFO - stdout - {'loss': 0.7078, 'grad_norm': 1.1402117013931274, 'learning_rate': 8.595561729436257e-06, 'epoch': 1.68} +2025-02-05 22:15:47 - ERROR - stderr - 56%|█████▌ | 12530/22434 [12:08:06<7:01:08, 2.55s/it] +2025-02-05 22:15:49 - ERROR - stderr - 56%|█████▌ | 12531/22434 [12:08:09<6:58:38, 2.54s/it] +2025-02-05 22:15:49 - ERROR - stderr - +2025-02-05 22:15:49 - ERROR - stderr - +2025-02-05 22:15:49 - INFO - stdout - {'loss': 0.7872, 'grad_norm': 1.3669507503509521, 'learning_rate': 8.594132306935008e-06, 'epoch': 1.68} +2025-02-05 22:15:49 - ERROR - stderr - 56%|█████▌ | 12531/22434 [12:08:09<6:58:38, 2.54s/it] +2025-02-05 22:15:52 - ERROR - stderr - 56%|█████▌ | 12532/22434 [12:08:11<6:56:15, 2.52s/it] +2025-02-05 22:15:52 - ERROR - stderr - +2025-02-05 22:15:52 - ERROR - stderr - 
+2025-02-05 22:15:52 - INFO - stdout - {'loss': 0.5982, 'grad_norm': 1.2211463451385498, 'learning_rate': 8.592702913737727e-06, 'epoch': 1.68} +2025-02-05 22:15:52 - ERROR - stderr - 56%|█████▌ | 12532/22434 [12:08:11<6:56:15, 2.52s/it] +2025-02-05 22:15:54 - ERROR - stderr - 56%|█████▌ | 12533/22434 [12:08:14<6:55:00, 2.51s/it] +2025-02-05 22:15:54 - ERROR - stderr - +2025-02-05 22:15:54 - ERROR - stderr - +2025-02-05 22:15:54 - INFO - stdout - {'loss': 0.7535, 'grad_norm': 1.303511142730713, 'learning_rate': 8.591273549874204e-06, 'epoch': 1.68} +2025-02-05 22:15:54 - ERROR - stderr - 56%|█████▌ | 12533/22434 [12:08:14<6:55:00, 2.51s/it] +2025-02-05 22:15:57 - ERROR - stderr - 56%|█████▌ | 12534/22434 [12:08:16<6:51:59, 2.50s/it] +2025-02-05 22:15:57 - ERROR - stderr - +2025-02-05 22:15:57 - ERROR - stderr - +2025-02-05 22:15:57 - INFO - stdout - {'loss': 0.7229, 'grad_norm': 1.1747208833694458, 'learning_rate': 8.58984421537424e-06, 'epoch': 1.68} +2025-02-05 22:15:57 - ERROR - stderr - 56%|█████▌ | 12534/22434 [12:08:16<6:51:59, 2.50s/it] +2025-02-05 22:15:59 - ERROR - stderr - 56%|█████▌ | 12535/22434 [12:08:19<7:07:17, 2.59s/it] +2025-02-05 22:15:59 - ERROR - stderr - +2025-02-05 22:15:59 - ERROR - stderr - +2025-02-05 22:15:59 - INFO - stdout - {'loss': 0.6008, 'grad_norm': 1.1237784624099731, 'learning_rate': 8.588414910267623e-06, 'epoch': 1.68} +2025-02-05 22:15:59 - ERROR - stderr - 56%|█████▌ | 12535/22434 [12:08:19<7:07:17, 2.59s/it] +2025-02-05 22:16:02 - ERROR - stderr - 56%|█████▌ | 12536/22434 [12:08:22<7:02:50, 2.56s/it] +2025-02-05 22:16:02 - ERROR - stderr - +2025-02-05 22:16:02 - ERROR - stderr - +2025-02-05 22:16:02 - INFO - stdout - {'loss': 0.6962, 'grad_norm': 1.167698621749878, 'learning_rate': 8.586985634584145e-06, 'epoch': 1.68} +2025-02-05 22:16:02 - ERROR - stderr - 56%|█████▌ | 12536/22434 [12:08:22<7:02:50, 2.56s/it] +2025-02-05 22:16:04 - ERROR - stderr - 56%|█████▌ | 12537/22434 [12:08:24<7:01:18, 2.55s/it] +2025-02-05 22:16:04 - 
ERROR - stderr - +2025-02-05 22:16:04 - ERROR - stderr - +2025-02-05 22:16:04 - INFO - stdout - {'loss': 0.6244, 'grad_norm': 1.167222499847412, 'learning_rate': 8.5855563883536e-06, 'epoch': 1.68} +2025-02-05 22:16:04 - ERROR - stderr - 56%|█████▌ | 12537/22434 [12:08:24<7:01:18, 2.55s/it] +2025-02-05 22:16:07 - ERROR - stderr - 56%|█████▌ | 12538/22434 [12:08:27<6:55:33, 2.52s/it] +2025-02-05 22:16:07 - ERROR - stderr - +2025-02-05 22:16:07 - ERROR - stderr - +2025-02-05 22:16:07 - INFO - stdout - {'loss': 0.6844, 'grad_norm': 1.2443809509277344, 'learning_rate': 8.58412717160578e-06, 'epoch': 1.68} +2025-02-05 22:16:07 - ERROR - stderr - 56%|█████▌ | 12538/22434 [12:08:27<6:55:33, 2.52s/it] +2025-02-05 22:16:09 - ERROR - stderr - 56%|█████▌ | 12539/22434 [12:08:29<6:53:11, 2.51s/it] +2025-02-05 22:16:09 - ERROR - stderr - +2025-02-05 22:16:09 - ERROR - stderr - +2025-02-05 22:16:09 - INFO - stdout - {'loss': 0.5852, 'grad_norm': 1.1030668020248413, 'learning_rate': 8.582697984370471e-06, 'epoch': 1.68} +2025-02-05 22:16:09 - ERROR - stderr - 56%|█████▌ | 12539/22434 [12:08:29<6:53:11, 2.51s/it] +2025-02-05 22:16:12 - ERROR - stderr - 56%|█████▌ | 12540/22434 [12:08:32<6:50:24, 2.49s/it] +2025-02-05 22:16:12 - ERROR - stderr - +2025-02-05 22:16:12 - ERROR - stderr - +2025-02-05 22:16:12 - INFO - stdout - {'loss': 0.721, 'grad_norm': 1.244611382484436, 'learning_rate': 8.58126882667747e-06, 'epoch': 1.68} +2025-02-05 22:16:12 - ERROR - stderr - 56%|█████▌ | 12540/22434 [12:08:32<6:50:24, 2.49s/it] +2025-02-05 22:16:14 - ERROR - stderr - 56%|█████▌ | 12541/22434 [12:08:34<6:47:40, 2.47s/it] +2025-02-05 22:16:14 - ERROR - stderr - +2025-02-05 22:16:14 - ERROR - stderr - +2025-02-05 22:16:14 - INFO - stdout - {'loss': 0.7606, 'grad_norm': 1.421364665031433, 'learning_rate': 8.579839698556558e-06, 'epoch': 1.68} +2025-02-05 22:16:14 - ERROR - stderr - 56%|█████▌ | 12541/22434 [12:08:34<6:47:40, 2.47s/it] +2025-02-05 22:16:17 - ERROR - stderr - 56%|█████▌ | 12542/22434 
[12:08:37<6:49:04, 2.48s/it] +2025-02-05 22:16:17 - ERROR - stderr - +2025-02-05 22:16:17 - ERROR - stderr - +2025-02-05 22:16:17 - INFO - stdout - {'loss': 0.657, 'grad_norm': 1.2461024522781372, 'learning_rate': 8.578410600037533e-06, 'epoch': 1.68} +2025-02-05 22:16:17 - ERROR - stderr - 56%|█████▌ | 12542/22434 [12:08:37<6:49:04, 2.48s/it] +2025-02-05 22:16:19 - ERROR - stderr - 56%|█████▌ | 12543/22434 [12:08:39<6:46:34, 2.47s/it] +2025-02-05 22:16:19 - ERROR - stderr - +2025-02-05 22:16:19 - ERROR - stderr - +2025-02-05 22:16:19 - INFO - stdout - {'loss': 0.6402, 'grad_norm': 1.1580002307891846, 'learning_rate': 8.576981531150177e-06, 'epoch': 1.68} +2025-02-05 22:16:19 - ERROR - stderr - 56%|█████▌ | 12543/22434 [12:08:39<6:46:34, 2.47s/it] +2025-02-05 22:16:22 - ERROR - stderr - 56%|█████▌ | 12544/22434 [12:08:41<6:50:31, 2.49s/it] +2025-02-05 22:16:22 - ERROR - stderr - +2025-02-05 22:16:22 - ERROR - stderr - +2025-02-05 22:16:22 - INFO - stdout - {'loss': 0.6476, 'grad_norm': 1.2995128631591797, 'learning_rate': 8.57555249192428e-06, 'epoch': 1.68} +2025-02-05 22:16:22 - ERROR - stderr - 56%|█████▌ | 12544/22434 [12:08:42<6:50:31, 2.49s/it] +2025-02-05 22:16:24 - ERROR - stderr - 56%|█████▌ | 12545/22434 [12:08:44<6:53:48, 2.51s/it] +2025-02-05 22:16:24 - ERROR - stderr - +2025-02-05 22:16:24 - ERROR - stderr - +2025-02-05 22:16:24 - INFO - stdout - {'loss': 0.706, 'grad_norm': 1.1639846563339233, 'learning_rate': 8.574123482389627e-06, 'epoch': 1.68} +2025-02-05 22:16:24 - ERROR - stderr - 56%|█████▌ | 12545/22434 [12:08:44<6:53:48, 2.51s/it] +2025-02-05 22:16:27 - ERROR - stderr - 56%|█████▌ | 12546/22434 [12:08:46<6:50:30, 2.49s/it] +2025-02-05 22:16:27 - ERROR - stderr - +2025-02-05 22:16:27 - ERROR - stderr - +2025-02-05 22:16:27 - INFO - stdout - {'loss': 0.6584, 'grad_norm': 1.1439696550369263, 'learning_rate': 8.572694502576009e-06, 'epoch': 1.68} +2025-02-05 22:16:27 - ERROR - stderr - 56%|█████▌ | 12546/22434 [12:08:47<6:50:30, 2.49s/it] 
+2025-02-05 22:16:29 - ERROR - stderr - 56%|█████▌ | 12547/22434 [12:08:49<6:56:27, 2.53s/it] +2025-02-05 22:16:29 - ERROR - stderr - +2025-02-05 22:16:29 - ERROR - stderr - +2025-02-05 22:16:29 - INFO - stdout - {'loss': 0.6606, 'grad_norm': 1.180978775024414, 'learning_rate': 8.571265552513205e-06, 'epoch': 1.68} +2025-02-05 22:16:29 - ERROR - stderr - 56%|█████▌ | 12547/22434 [12:08:49<6:56:27, 2.53s/it] +2025-02-05 22:16:32 - ERROR - stderr - 56%|█████▌ | 12548/22434 [12:08:52<6:55:34, 2.52s/it] +2025-02-05 22:16:32 - ERROR - stderr - +2025-02-05 22:16:32 - ERROR - stderr - +2025-02-05 22:16:32 - INFO - stdout - {'loss': 0.6756, 'grad_norm': 1.2616256475448608, 'learning_rate': 8.569836632231005e-06, 'epoch': 1.68} +2025-02-05 22:16:32 - ERROR - stderr - 56%|█████▌ | 12548/22434 [12:08:52<6:55:34, 2.52s/it] +2025-02-05 22:16:34 - ERROR - stderr - 56%|█████▌ | 12549/22434 [12:08:54<6:55:22, 2.52s/it] +2025-02-05 22:16:34 - ERROR - stderr - +2025-02-05 22:16:34 - ERROR - stderr - +2025-02-05 22:16:34 - INFO - stdout - {'loss': 0.6823, 'grad_norm': 1.2243560552597046, 'learning_rate': 8.568407741759188e-06, 'epoch': 1.68} +2025-02-05 22:16:34 - ERROR - stderr - 56%|█████▌ | 12549/22434 [12:08:54<6:55:22, 2.52s/it] +2025-02-05 22:16:37 - ERROR - stderr - 56%|█████▌ | 12550/22434 [12:08:57<6:55:38, 2.52s/it] +2025-02-05 22:16:37 - ERROR - stderr - +2025-02-05 22:16:37 - ERROR - stderr - +2025-02-05 22:16:37 - INFO - stdout - {'loss': 0.6838, 'grad_norm': 1.2548588514328003, 'learning_rate': 8.566978881127544e-06, 'epoch': 1.68} +2025-02-05 22:16:37 - ERROR - stderr - 56%|█████▌ | 12550/22434 [12:08:57<6:55:38, 2.52s/it] +2025-02-05 22:16:39 - ERROR - stderr - 56%|█████▌ | 12551/22434 [12:08:59<6:53:03, 2.51s/it] +2025-02-05 22:16:39 - ERROR - stderr - +2025-02-05 22:16:39 - ERROR - stderr - +2025-02-05 22:16:39 - INFO - stdout - {'loss': 0.6704, 'grad_norm': 1.053565502166748, 'learning_rate': 8.565550050365858e-06, 'epoch': 1.68} +2025-02-05 22:16:39 - ERROR - 
stderr - 56%|█████▌ | 12551/22434 [12:08:59<6:53:03, 2.51s/it] +2025-02-05 22:16:42 - ERROR - stderr - 56%|█████▌ | 12552/22434 [12:09:02<6:53:26, 2.51s/it] +2025-02-05 22:16:42 - ERROR - stderr - +2025-02-05 22:16:42 - ERROR - stderr - +2025-02-05 22:16:42 - INFO - stdout - {'loss': 0.6951, 'grad_norm': 1.2140696048736572, 'learning_rate': 8.564121249503901e-06, 'epoch': 1.68} +2025-02-05 22:16:42 - ERROR - stderr - 56%|█████▌ | 12552/22434 [12:09:02<6:53:26, 2.51s/it] +2025-02-05 22:16:44 - ERROR - stderr - 56%|█████▌ | 12553/22434 [12:09:04<6:49:13, 2.48s/it] +2025-02-05 22:16:44 - ERROR - stderr - +2025-02-05 22:16:44 - ERROR - stderr - +2025-02-05 22:16:44 - INFO - stdout - {'loss': 0.6082, 'grad_norm': 1.1624490022659302, 'learning_rate': 8.562692478571469e-06, 'epoch': 1.68} +2025-02-05 22:16:44 - ERROR - stderr - 56%|█████▌ | 12553/22434 [12:09:04<6:49:13, 2.48s/it] +2025-02-05 22:16:47 - ERROR - stderr - 56%|█████▌ | 12554/22434 [12:09:07<6:51:41, 2.50s/it] +2025-02-05 22:16:47 - ERROR - stderr - +2025-02-05 22:16:47 - ERROR - stderr - +2025-02-05 22:16:47 - INFO - stdout - {'loss': 0.7459, 'grad_norm': 1.2249689102172852, 'learning_rate': 8.561263737598338e-06, 'epoch': 1.68} +2025-02-05 22:16:47 - ERROR - stderr - 56%|█████▌ | 12554/22434 [12:09:07<6:51:41, 2.50s/it] +2025-02-05 22:16:49 - ERROR - stderr - 56%|█████▌ | 12555/22434 [12:09:09<6:50:41, 2.49s/it] +2025-02-05 22:16:49 - ERROR - stderr - +2025-02-05 22:16:49 - ERROR - stderr - +2025-02-05 22:16:49 - INFO - stdout - {'loss': 0.7029, 'grad_norm': 1.3606446981430054, 'learning_rate': 8.559835026614281e-06, 'epoch': 1.68} +2025-02-05 22:16:49 - ERROR - stderr - 56%|█████▌ | 12555/22434 [12:09:09<6:50:41, 2.49s/it] +2025-02-05 22:16:52 - ERROR - stderr - 56%|█████▌ | 12556/22434 [12:09:12<7:00:32, 2.55s/it] +2025-02-05 22:16:52 - ERROR - stderr - +2025-02-05 22:16:52 - ERROR - stderr - +2025-02-05 22:16:52 - INFO - stdout - {'loss': 0.6557, 'grad_norm': 1.2434829473495483, 'learning_rate': 
8.558406345649088e-06, 'epoch': 1.68} +2025-02-05 22:16:52 - ERROR - stderr - 56%|█████▌ | 12556/22434 [12:09:12<7:00:32, 2.55s/it] +2025-02-05 22:16:54 - ERROR - stderr - 56%|█████▌ | 12557/22434 [12:09:14<6:53:59, 2.51s/it] +2025-02-05 22:16:54 - ERROR - stderr - +2025-02-05 22:16:54 - ERROR - stderr - +2025-02-05 22:16:54 - INFO - stdout - {'loss': 0.6609, 'grad_norm': 1.2610472440719604, 'learning_rate': 8.556977694732535e-06, 'epoch': 1.68} +2025-02-05 22:16:54 - ERROR - stderr - 56%|█████▌ | 12557/22434 [12:09:14<6:53:59, 2.51s/it] +2025-02-05 22:16:57 - ERROR - stderr - 56%|█████▌ | 12558/22434 [12:09:17<6:55:00, 2.52s/it] +2025-02-05 22:16:57 - ERROR - stderr - +2025-02-05 22:16:57 - ERROR - stderr - +2025-02-05 22:16:57 - INFO - stdout - {'loss': 0.7161, 'grad_norm': 1.1929535865783691, 'learning_rate': 8.555549073894403e-06, 'epoch': 1.68} +2025-02-05 22:16:57 - ERROR - stderr - 56%|█████▌ | 12558/22434 [12:09:17<6:55:00, 2.52s/it] +2025-02-05 22:16:59 - ERROR - stderr - 56%|█████▌ | 12559/22434 [12:09:19<6:54:07, 2.52s/it] +2025-02-05 22:17:00 - ERROR - stderr - +2025-02-05 22:17:00 - ERROR - stderr - +2025-02-05 22:17:00 - INFO - stdout - {'loss': 0.651, 'grad_norm': 1.270406723022461, 'learning_rate': 8.554120483164467e-06, 'epoch': 1.68} +2025-02-05 22:17:00 - ERROR - stderr - 56%|█████▌ | 12559/22434 [12:09:19<6:54:07, 2.52s/it] +2025-02-05 22:17:02 - ERROR - stderr - 56%|█████▌ | 12560/22434 [12:09:22<7:08:10, 2.60s/it] +2025-02-05 22:17:02 - ERROR - stderr - +2025-02-05 22:17:02 - ERROR - stderr - +2025-02-05 22:17:02 - INFO - stdout - {'loss': 0.6615, 'grad_norm': 1.202677607536316, 'learning_rate': 8.552691922572505e-06, 'epoch': 1.68} +2025-02-05 22:17:02 - ERROR - stderr - 56%|█████▌ | 12560/22434 [12:09:22<7:08:10, 2.60s/it] +2025-02-05 22:17:05 - ERROR - stderr - 56%|█████▌ | 12561/22434 [12:09:25<7:28:06, 2.72s/it] +2025-02-05 22:17:05 - ERROR - stderr - +2025-02-05 22:17:05 - ERROR - stderr - +2025-02-05 22:17:05 - INFO - stdout - {'loss': 
0.6596, 'grad_norm': 1.149090051651001, 'learning_rate': 8.551263392148298e-06, 'epoch': 1.68} +2025-02-05 22:17:05 - ERROR - stderr - 56%|█████▌ | 12561/22434 [12:09:25<7:28:06, 2.72s/it] +2025-02-05 22:17:08 - ERROR - stderr - 56%|█████▌ | 12562/22434 [12:09:28<7:17:06, 2.66s/it] +2025-02-05 22:17:08 - ERROR - stderr - +2025-02-05 22:17:08 - ERROR - stderr - +2025-02-05 22:17:08 - INFO - stdout - {'loss': 0.7031, 'grad_norm': 1.2523561716079712, 'learning_rate': 8.549834891921616e-06, 'epoch': 1.68} +2025-02-05 22:17:08 - ERROR - stderr - 56%|█████▌ | 12562/22434 [12:09:28<7:17:06, 2.66s/it] +2025-02-05 22:17:10 - ERROR - stderr - 56%|█████▌ | 12563/22434 [12:09:30<7:08:37, 2.61s/it] +2025-02-05 22:17:10 - ERROR - stderr - +2025-02-05 22:17:10 - ERROR - stderr - +2025-02-05 22:17:10 - INFO - stdout - {'loss': 0.6403, 'grad_norm': 1.2160288095474243, 'learning_rate': 8.54840642192224e-06, 'epoch': 1.68} +2025-02-05 22:17:10 - ERROR - stderr - 56%|█████▌ | 12563/22434 [12:09:30<7:08:37, 2.61s/it] +2025-02-05 22:17:13 - ERROR - stderr - 56%|█████▌ | 12564/22434 [12:09:33<7:02:44, 2.57s/it] +2025-02-05 22:17:13 - ERROR - stderr - +2025-02-05 22:17:13 - ERROR - stderr - +2025-02-05 22:17:13 - INFO - stdout - {'loss': 0.7507, 'grad_norm': 1.4041779041290283, 'learning_rate': 8.54697798217994e-06, 'epoch': 1.68} +2025-02-05 22:17:13 - ERROR - stderr - 56%|█████▌ | 12564/22434 [12:09:33<7:02:44, 2.57s/it] +2025-02-05 22:17:15 - ERROR - stderr - 56%|█████▌ | 12565/22434 [12:09:35<6:58:45, 2.55s/it] +2025-02-05 22:17:15 - ERROR - stderr - +2025-02-05 22:17:15 - ERROR - stderr - +2025-02-05 22:17:15 - INFO - stdout - {'loss': 0.6201, 'grad_norm': 1.1235220432281494, 'learning_rate': 8.545549572724496e-06, 'epoch': 1.68} +2025-02-05 22:17:15 - ERROR - stderr - 56%|█████▌ | 12565/22434 [12:09:35<6:58:45, 2.55s/it] +2025-02-05 22:17:18 - ERROR - stderr - 56%|█████▌ | 12566/22434 [12:09:37<6:54:00, 2.52s/it] +2025-02-05 22:17:18 - ERROR - stderr - +2025-02-05 22:17:18 - ERROR - 
stderr - +2025-02-05 22:17:18 - INFO - stdout - {'loss': 0.7784, 'grad_norm': 1.2747966051101685, 'learning_rate': 8.544121193585681e-06, 'epoch': 1.68} +2025-02-05 22:17:18 - ERROR - stderr - 56%|█████▌ | 12566/22434 [12:09:38<6:54:00, 2.52s/it] +2025-02-05 22:17:20 - ERROR - stderr - 56%|█████▌ | 12567/22434 [12:09:40<6:53:10, 2.51s/it] +2025-02-05 22:17:20 - ERROR - stderr - +2025-02-05 22:17:20 - ERROR - stderr - +2025-02-05 22:17:20 - INFO - stdout - {'loss': 0.7278, 'grad_norm': 1.2559876441955566, 'learning_rate': 8.542692844793267e-06, 'epoch': 1.68} +2025-02-05 22:17:20 - ERROR - stderr - 56%|█████▌ | 12567/22434 [12:09:40<6:53:10, 2.51s/it] +2025-02-05 22:17:23 - ERROR - stderr - 56%|█████▌ | 12568/22434 [12:09:42<6:53:30, 2.51s/it] +2025-02-05 22:17:23 - ERROR - stderr - +2025-02-05 22:17:23 - ERROR - stderr - +2025-02-05 22:17:23 - INFO - stdout - {'loss': 0.7039, 'grad_norm': 1.1781076192855835, 'learning_rate': 8.541264526377021e-06, 'epoch': 1.68} +2025-02-05 22:17:23 - ERROR - stderr - 56%|█████▌ | 12568/22434 [12:09:43<6:53:30, 2.51s/it] +2025-02-05 22:17:25 - ERROR - stderr - 56%|█████▌ | 12569/22434 [12:09:45<6:49:40, 2.49s/it] +2025-02-05 22:17:25 - ERROR - stderr - +2025-02-05 22:17:25 - ERROR - stderr - +2025-02-05 22:17:25 - INFO - stdout - {'loss': 0.6998, 'grad_norm': 1.2719155550003052, 'learning_rate': 8.539836238366724e-06, 'epoch': 1.68} +2025-02-05 22:17:25 - ERROR - stderr - 56%|█████▌ | 12569/22434 [12:09:45<6:49:40, 2.49s/it] +2025-02-05 22:17:28 - ERROR - stderr - 56%|█████▌ | 12570/22434 [12:09:47<6:49:02, 2.49s/it] +2025-02-05 22:17:28 - ERROR - stderr - +2025-02-05 22:17:28 - ERROR - stderr - +2025-02-05 22:17:28 - INFO - stdout - {'loss': 0.7458, 'grad_norm': 1.284642219543457, 'learning_rate': 8.538407980792144e-06, 'epoch': 1.68} +2025-02-05 22:17:28 - ERROR - stderr - 56%|█████▌ | 12570/22434 [12:09:47<6:49:02, 2.49s/it] +2025-02-05 22:17:30 - ERROR - stderr - 56%|█████▌ | 12571/22434 [12:09:50<6:50:07, 2.49s/it] +2025-02-05 
22:17:30 - ERROR - stderr - +2025-02-05 22:17:30 - ERROR - stderr - +2025-02-05 22:17:30 - INFO - stdout - {'loss': 0.7413, 'grad_norm': 1.3887240886688232, 'learning_rate': 8.536979753683046e-06, 'epoch': 1.68} +2025-02-05 22:17:30 - ERROR - stderr - 56%|█████▌ | 12571/22434 [12:09:50<6:50:07, 2.49s/it] +2025-02-05 22:17:33 - ERROR - stderr - 56%|█████▌ | 12572/22434 [12:09:52<6:48:36, 2.49s/it] +2025-02-05 22:17:33 - ERROR - stderr - +2025-02-05 22:17:33 - ERROR - stderr - +2025-02-05 22:17:33 - INFO - stdout - {'loss': 0.6261, 'grad_norm': 1.1581352949142456, 'learning_rate': 8.535551557069211e-06, 'epoch': 1.68} +2025-02-05 22:17:33 - ERROR - stderr - 56%|█████▌ | 12572/22434 [12:09:52<6:48:36, 2.49s/it] +2025-02-05 22:17:35 - ERROR - stderr - 56%|█████▌ | 12573/22434 [12:09:55<6:48:29, 2.49s/it] +2025-02-05 22:17:35 - ERROR - stderr - +2025-02-05 22:17:35 - ERROR - stderr - +2025-02-05 22:17:35 - INFO - stdout - {'loss': 0.6508, 'grad_norm': 1.2411030530929565, 'learning_rate': 8.534123390980398e-06, 'epoch': 1.68} +2025-02-05 22:17:35 - ERROR - stderr - 56%|█████▌ | 12573/22434 [12:09:55<6:48:29, 2.49s/it] +2025-02-05 22:17:38 - ERROR - stderr - 56%|█████▌ | 12574/22434 [12:09:58<7:06:05, 2.59s/it] +2025-02-05 22:17:38 - ERROR - stderr - +2025-02-05 22:17:38 - ERROR - stderr - +2025-02-05 22:17:38 - INFO - stdout - {'loss': 0.6787, 'grad_norm': 1.2681448459625244, 'learning_rate': 8.532695255446384e-06, 'epoch': 1.68} +2025-02-05 22:17:38 - ERROR - stderr - 56%|█████▌ | 12574/22434 [12:09:58<7:06:05, 2.59s/it] +2025-02-05 22:17:41 - ERROR - stderr - 56%|█████▌ | 12575/22434 [12:10:00<7:11:49, 2.63s/it] +2025-02-05 22:17:41 - ERROR - stderr - +2025-02-05 22:17:41 - ERROR - stderr - +2025-02-05 22:17:41 - INFO - stdout - {'loss': 0.6812, 'grad_norm': 1.2097887992858887, 'learning_rate': 8.531267150496932e-06, 'epoch': 1.68} +2025-02-05 22:17:41 - ERROR - stderr - 56%|█████▌ | 12575/22434 [12:10:00<7:11:49, 2.63s/it] +2025-02-05 22:17:43 - ERROR - stderr - 
56%|█████▌ | 12576/22434 [12:10:03<7:05:07, 2.59s/it] +2025-02-05 22:17:43 - ERROR - stderr - +2025-02-05 22:17:43 - ERROR - stderr - +2025-02-05 22:17:43 - INFO - stdout - {'loss': 0.6357, 'grad_norm': 1.1672911643981934, 'learning_rate': 8.52983907616181e-06, 'epoch': 1.68} +2025-02-05 22:17:43 - ERROR - stderr - 56%|█████▌ | 12576/22434 [12:10:03<7:05:07, 2.59s/it] +2025-02-05 22:17:46 - ERROR - stderr - 56%|█████▌ | 12577/22434 [12:10:05<7:01:20, 2.56s/it] +2025-02-05 22:17:46 - ERROR - stderr - +2025-02-05 22:17:46 - ERROR - stderr - +2025-02-05 22:17:46 - INFO - stdout - {'loss': 0.7375, 'grad_norm': 1.4323292970657349, 'learning_rate': 8.528411032470786e-06, 'epoch': 1.68} +2025-02-05 22:17:46 - ERROR - stderr - 56%|█████▌ | 12577/22434 [12:10:05<7:01:20, 2.56s/it] +2025-02-05 22:17:48 - ERROR - stderr - 56%|█████▌ | 12578/22434 [12:10:08<7:01:24, 2.57s/it] +2025-02-05 22:17:48 - ERROR - stderr - +2025-02-05 22:17:48 - ERROR - stderr - +2025-02-05 22:17:48 - INFO - stdout - {'loss': 0.6848, 'grad_norm': 1.1874772310256958, 'learning_rate': 8.526983019453624e-06, 'epoch': 1.68} +2025-02-05 22:17:48 - ERROR - stderr - 56%|█████▌ | 12578/22434 [12:10:08<7:01:24, 2.57s/it] +2025-02-05 22:17:51 - ERROR - stderr - 56%|█████▌ | 12579/22434 [12:10:11<6:58:53, 2.55s/it] +2025-02-05 22:17:51 - ERROR - stderr - +2025-02-05 22:17:51 - ERROR - stderr - +2025-02-05 22:17:51 - INFO - stdout - {'loss': 0.7648, 'grad_norm': 1.3499449491500854, 'learning_rate': 8.525555037140095e-06, 'epoch': 1.68} +2025-02-05 22:17:51 - ERROR - stderr - 56%|█████▌ | 12579/22434 [12:10:11<6:58:53, 2.55s/it] +2025-02-05 22:17:53 - ERROR - stderr - 56%|█████▌ | 12580/22434 [12:10:13<6:54:53, 2.53s/it] +2025-02-05 22:17:53 - ERROR - stderr - +2025-02-05 22:17:53 - ERROR - stderr - +2025-02-05 22:17:53 - INFO - stdout - {'loss': 0.7349, 'grad_norm': 1.2850762605667114, 'learning_rate': 8.524127085559961e-06, 'epoch': 1.68} +2025-02-05 22:17:53 - ERROR - stderr - 56%|█████▌ | 12580/22434 
[12:10:13<6:54:53, 2.53s/it] +2025-02-05 22:17:56 - ERROR - stderr - 56%|█████▌ | 12581/22434 [12:10:15<6:49:51, 2.50s/it] +2025-02-05 22:17:56 - ERROR - stderr - +2025-02-05 22:17:56 - ERROR - stderr - +2025-02-05 22:17:56 - INFO - stdout - {'loss': 0.6856, 'grad_norm': 1.1222763061523438, 'learning_rate': 8.522699164742981e-06, 'epoch': 1.68} +2025-02-05 22:17:56 - ERROR - stderr - 56%|█████▌ | 12581/22434 [12:10:15<6:49:51, 2.50s/it] +2025-02-05 22:17:58 - ERROR - stderr - 56%|█████▌ | 12582/22434 [12:10:18<6:49:04, 2.49s/it] +2025-02-05 22:17:58 - ERROR - stderr - +2025-02-05 22:17:58 - ERROR - stderr - +2025-02-05 22:17:58 - INFO - stdout - {'loss': 0.7056, 'grad_norm': 1.3772772550582886, 'learning_rate': 8.521271274718928e-06, 'epoch': 1.68} +2025-02-05 22:17:58 - ERROR - stderr - 56%|█████▌ | 12582/22434 [12:10:18<6:49:04, 2.49s/it] +2025-02-05 22:18:01 - ERROR - stderr - 56%|█████▌ | 12583/22434 [12:10:20<6:51:14, 2.50s/it] +2025-02-05 22:18:01 - ERROR - stderr - +2025-02-05 22:18:01 - ERROR - stderr - +2025-02-05 22:18:01 - INFO - stdout - {'loss': 0.5965, 'grad_norm': 1.089009404182434, 'learning_rate': 8.519843415517557e-06, 'epoch': 1.68} +2025-02-05 22:18:01 - ERROR - stderr - 56%|█████▌ | 12583/22434 [12:10:20<6:51:14, 2.50s/it] +2025-02-05 22:18:03 - ERROR - stderr - 56%|█████▌ | 12584/22434 [12:10:23<6:52:07, 2.51s/it] +2025-02-05 22:18:03 - ERROR - stderr - +2025-02-05 22:18:03 - ERROR - stderr - +2025-02-05 22:18:03 - INFO - stdout - {'loss': 0.6257, 'grad_norm': 1.1247016191482544, 'learning_rate': 8.518415587168634e-06, 'epoch': 1.68} +2025-02-05 22:18:03 - ERROR - stderr - 56%|█████▌ | 12584/22434 [12:10:23<6:52:07, 2.51s/it] +2025-02-05 22:18:06 - ERROR - stderr - 56%|█████▌ | 12585/22434 [12:10:25<6:51:27, 2.51s/it] +2025-02-05 22:18:06 - ERROR - stderr - +2025-02-05 22:18:06 - ERROR - stderr - +2025-02-05 22:18:06 - INFO - stdout - {'loss': 0.6541, 'grad_norm': 1.3030263185501099, 'learning_rate': 8.516987789701923e-06, 'epoch': 1.68} 
+2025-02-05 22:18:06 - ERROR - stderr - 56%|█████▌ | 12585/22434 [12:10:25<6:51:27, 2.51s/it] +2025-02-05 22:18:08 - ERROR - stderr - 56%|█████▌ | 12586/22434 [12:10:28<6:51:50, 2.51s/it] +2025-02-05 22:18:08 - ERROR - stderr - +2025-02-05 22:18:08 - ERROR - stderr - +2025-02-05 22:18:08 - INFO - stdout - {'loss': 0.7845, 'grad_norm': 1.338673710823059, 'learning_rate': 8.515560023147177e-06, 'epoch': 1.68} +2025-02-05 22:18:08 - ERROR - stderr - 56%|█████▌ | 12586/22434 [12:10:28<6:51:50, 2.51s/it] +2025-02-05 22:18:11 - ERROR - stderr - 56%|█████▌ | 12587/22434 [12:10:31<6:55:13, 2.53s/it] +2025-02-05 22:18:11 - ERROR - stderr - +2025-02-05 22:18:11 - ERROR - stderr - +2025-02-05 22:18:11 - INFO - stdout - {'loss': 0.6634, 'grad_norm': 1.199763298034668, 'learning_rate': 8.514132287534166e-06, 'epoch': 1.68} +2025-02-05 22:18:11 - ERROR - stderr - 56%|█████▌ | 12587/22434 [12:10:31<6:55:13, 2.53s/it] +2025-02-05 22:18:13 - ERROR - stderr - 56%|█████▌ | 12588/22434 [12:10:33<6:55:34, 2.53s/it] +2025-02-05 22:18:13 - ERROR - stderr - +2025-02-05 22:18:13 - ERROR - stderr - +2025-02-05 22:18:13 - INFO - stdout - {'loss': 0.7669, 'grad_norm': 1.4011178016662598, 'learning_rate': 8.512704582892646e-06, 'epoch': 1.68} +2025-02-05 22:18:13 - ERROR - stderr - 56%|█████▌ | 12588/22434 [12:10:33<6:55:34, 2.53s/it] +2025-02-05 22:18:16 - ERROR - stderr - 56%|█████▌ | 12589/22434 [12:10:36<6:59:15, 2.56s/it] +2025-02-05 22:18:16 - ERROR - stderr - +2025-02-05 22:18:16 - ERROR - stderr - +2025-02-05 22:18:16 - INFO - stdout - {'loss': 0.7175, 'grad_norm': 1.208861231803894, 'learning_rate': 8.511276909252374e-06, 'epoch': 1.68} +2025-02-05 22:18:16 - ERROR - stderr - 56%|█████▌ | 12589/22434 [12:10:36<6:59:15, 2.56s/it] +2025-02-05 22:18:18 - ERROR - stderr - 56%|█████▌ | 12590/22434 [12:10:38<6:56:37, 2.54s/it] +2025-02-05 22:18:18 - ERROR - stderr - +2025-02-05 22:18:18 - ERROR - stderr - +2025-02-05 22:18:18 - INFO - stdout - {'loss': 0.7273, 'grad_norm': 
1.1918777227401733, 'learning_rate': 8.509849266643112e-06, 'epoch': 1.68} +2025-02-05 22:18:18 - ERROR - stderr - 56%|█████▌ | 12590/22434 [12:10:38<6:56:37, 2.54s/it] +2025-02-05 22:18:21 - ERROR - stderr - 56%|█████▌ | 12591/22434 [12:10:41<6:58:31, 2.55s/it] +2025-02-05 22:18:21 - ERROR - stderr - +2025-02-05 22:18:21 - ERROR - stderr - +2025-02-05 22:18:21 - INFO - stdout - {'loss': 0.6954, 'grad_norm': 1.3381962776184082, 'learning_rate': 8.508421655094618e-06, 'epoch': 1.68} +2025-02-05 22:18:21 - ERROR - stderr - 56%|█████▌ | 12591/22434 [12:10:41<6:58:31, 2.55s/it] +2025-02-05 22:18:23 - ERROR - stderr - 56%|█████▌ | 12592/22434 [12:10:43<6:53:25, 2.52s/it] +2025-02-05 22:18:23 - ERROR - stderr - +2025-02-05 22:18:23 - ERROR - stderr - +2025-02-05 22:18:23 - INFO - stdout - {'loss': 0.6641, 'grad_norm': 1.1858441829681396, 'learning_rate': 8.50699407463664e-06, 'epoch': 1.68} +2025-02-05 22:18:23 - ERROR - stderr - 56%|█████▌ | 12592/22434 [12:10:43<6:53:25, 2.52s/it] +2025-02-05 22:18:26 - ERROR - stderr - 56%|█████▌ | 12593/22434 [12:10:46<6:49:07, 2.49s/it] +2025-02-05 22:18:26 - ERROR - stderr - +2025-02-05 22:18:26 - ERROR - stderr - +2025-02-05 22:18:26 - INFO - stdout - {'loss': 0.5883, 'grad_norm': 1.0837119817733765, 'learning_rate': 8.50556652529895e-06, 'epoch': 1.68} +2025-02-05 22:18:26 - ERROR - stderr - 56%|█████▌ | 12593/22434 [12:10:46<6:49:07, 2.49s/it] +2025-02-05 22:18:28 - ERROR - stderr - 56%|█████▌ | 12594/22434 [12:10:48<6:47:10, 2.48s/it] +2025-02-05 22:18:28 - ERROR - stderr - +2025-02-05 22:18:28 - ERROR - stderr - +2025-02-05 22:18:28 - INFO - stdout - {'loss': 0.689, 'grad_norm': 1.244067907333374, 'learning_rate': 8.50413900711129e-06, 'epoch': 1.68} +2025-02-05 22:18:28 - ERROR - stderr - 56%|█████▌ | 12594/22434 [12:10:48<6:47:10, 2.48s/it] +2025-02-05 22:18:31 - ERROR - stderr - 56%|█████▌ | 12595/22434 [12:10:51<6:48:34, 2.49s/it] +2025-02-05 22:18:31 - ERROR - stderr - +2025-02-05 22:18:31 - ERROR - stderr - +2025-02-05 
22:18:31 - INFO - stdout - {'loss': 0.6764, 'grad_norm': 1.254588007926941, 'learning_rate': 8.502711520103425e-06, 'epoch': 1.68} +2025-02-05 22:18:31 - ERROR - stderr - 56%|█████▌ | 12595/22434 [12:10:51<6:48:34, 2.49s/it] +2025-02-05 22:18:33 - ERROR - stderr - 56%|█████▌ | 12596/22434 [12:10:53<6:52:44, 2.52s/it] +2025-02-05 22:18:33 - ERROR - stderr - +2025-02-05 22:18:33 - ERROR - stderr - +2025-02-05 22:18:33 - INFO - stdout - {'loss': 0.7788, 'grad_norm': 1.3050950765609741, 'learning_rate': 8.501284064305104e-06, 'epoch': 1.68} +2025-02-05 22:18:33 - ERROR - stderr - 56%|█████▌ | 12596/22434 [12:10:53<6:52:44, 2.52s/it] +2025-02-05 22:18:36 - ERROR - stderr - 56%|█████▌ | 12597/22434 [12:10:56<6:49:29, 2.50s/it] +2025-02-05 22:18:36 - ERROR - stderr - +2025-02-05 22:18:36 - ERROR - stderr - +2025-02-05 22:18:36 - INFO - stdout - {'loss': 0.7262, 'grad_norm': 1.2358989715576172, 'learning_rate': 8.49985663974608e-06, 'epoch': 1.68} +2025-02-05 22:18:36 - ERROR - stderr - 56%|█████▌ | 12597/22434 [12:10:56<6:49:29, 2.50s/it] +2025-02-05 22:18:38 - ERROR - stderr - 56%|█████▌ | 12598/22434 [12:10:58<6:53:40, 2.52s/it] +2025-02-05 22:18:39 - ERROR - stderr - +2025-02-05 22:18:39 - ERROR - stderr - +2025-02-05 22:18:39 - INFO - stdout - {'loss': 0.6798, 'grad_norm': 1.1740919351577759, 'learning_rate': 8.498429246456112e-06, 'epoch': 1.68} +2025-02-05 22:18:39 - ERROR - stderr - 56%|█████▌ | 12598/22434 [12:10:58<6:53:40, 2.52s/it] +2025-02-05 22:18:41 - ERROR - stderr - 56%|█████▌ | 12599/22434 [12:11:01<6:51:49, 2.51s/it] +2025-02-05 22:18:41 - ERROR - stderr - +2025-02-05 22:18:41 - ERROR - stderr - +2025-02-05 22:18:41 - INFO - stdout - {'loss': 0.7596, 'grad_norm': 1.3470181226730347, 'learning_rate': 8.49700188446495e-06, 'epoch': 1.68} +2025-02-05 22:18:41 - ERROR - stderr - 56%|█████▌ | 12599/22434 [12:11:01<6:51:49, 2.51s/it] +2025-02-05 22:18:43 - ERROR - stderr - 56%|█████▌ | 12600/22434 [12:11:03<6:47:36, 2.49s/it] +2025-02-05 22:18:43 - ERROR - 
stderr - +2025-02-05 22:18:43 - ERROR - stderr - +2025-02-05 22:18:43 - INFO - stdout - {'loss': 0.7361, 'grad_norm': 1.2600083351135254, 'learning_rate': 8.495574553802343e-06, 'epoch': 1.68} +2025-02-05 22:18:43 - ERROR - stderr - 56%|█████▌ | 12600/22434 [12:11:03<6:47:36, 2.49s/it] +2025-02-05 22:18:46 - ERROR - stderr - 56%|█████▌ | 12601/22434 [12:11:06<6:48:20, 2.49s/it] +2025-02-05 22:18:46 - ERROR - stderr - +2025-02-05 22:18:46 - ERROR - stderr - +2025-02-05 22:18:46 - INFO - stdout - {'loss': 0.702, 'grad_norm': 1.1840137243270874, 'learning_rate': 8.494147254498045e-06, 'epoch': 1.69} +2025-02-05 22:18:46 - ERROR - stderr - 56%|█████▌ | 12601/22434 [12:11:06<6:48:20, 2.49s/it] +2025-02-05 22:18:48 - ERROR - stderr - 56%|█████▌ | 12602/22434 [12:11:08<6:45:08, 2.47s/it] +2025-02-05 22:18:48 - ERROR - stderr - +2025-02-05 22:18:48 - ERROR - stderr - +2025-02-05 22:18:48 - INFO - stdout - {'loss': 0.6086, 'grad_norm': 1.1932307481765747, 'learning_rate': 8.492719986581808e-06, 'epoch': 1.69} +2025-02-05 22:18:48 - ERROR - stderr - 56%|█████▌ | 12602/22434 [12:11:08<6:45:08, 2.47s/it] +2025-02-05 22:18:51 - ERROR - stderr - 56%|█████▌ | 12603/22434 [12:11:11<6:52:04, 2.51s/it] +2025-02-05 22:18:51 - ERROR - stderr - +2025-02-05 22:18:51 - ERROR - stderr - +2025-02-05 22:18:51 - INFO - stdout - {'loss': 0.8095, 'grad_norm': 1.312119722366333, 'learning_rate': 8.49129275008338e-06, 'epoch': 1.69} +2025-02-05 22:18:51 - ERROR - stderr - 56%|█████▌ | 12603/22434 [12:11:11<6:52:04, 2.51s/it] +2025-02-05 22:18:53 - ERROR - stderr - 56%|█████▌ | 12604/22434 [12:11:13<6:54:18, 2.53s/it] +2025-02-05 22:18:54 - ERROR - stderr - +2025-02-05 22:18:54 - ERROR - stderr - +2025-02-05 22:18:54 - INFO - stdout - {'loss': 0.7281, 'grad_norm': 1.1024538278579712, 'learning_rate': 8.489865545032512e-06, 'epoch': 1.69} +2025-02-05 22:18:54 - ERROR - stderr - 56%|█████▌ | 12604/22434 [12:11:13<6:54:18, 2.53s/it] +2025-02-05 22:18:56 - ERROR - stderr - 56%|█████▌ | 12605/22434 
[12:11:16<6:56:07, 2.54s/it] +2025-02-05 22:18:56 - ERROR - stderr - +2025-02-05 22:18:56 - ERROR - stderr - +2025-02-05 22:18:56 - INFO - stdout - {'loss': 0.6289, 'grad_norm': 1.209395408630371, 'learning_rate': 8.488438371458949e-06, 'epoch': 1.69} +2025-02-05 22:18:56 - ERROR - stderr - 56%|█████▌ | 12605/22434 [12:11:16<6:56:07, 2.54s/it] +2025-02-05 22:18:59 - ERROR - stderr - 56%|█████▌ | 12606/22434 [12:11:18<6:55:10, 2.53s/it] +2025-02-05 22:18:59 - ERROR - stderr - +2025-02-05 22:18:59 - ERROR - stderr - +2025-02-05 22:18:59 - INFO - stdout - {'loss': 0.7006, 'grad_norm': 1.4207433462142944, 'learning_rate': 8.487011229392445e-06, 'epoch': 1.69} +2025-02-05 22:18:59 - ERROR - stderr - 56%|█████▌ | 12606/22434 [12:11:18<6:55:10, 2.53s/it] +2025-02-05 22:19:01 - ERROR - stderr - 56%|█████▌ | 12607/22434 [12:11:21<6:52:23, 2.52s/it] +2025-02-05 22:19:01 - ERROR - stderr - +2025-02-05 22:19:01 - ERROR - stderr - +2025-02-05 22:19:01 - INFO - stdout - {'loss': 0.7348, 'grad_norm': 1.2793669700622559, 'learning_rate': 8.485584118862743e-06, 'epoch': 1.69} +2025-02-05 22:19:01 - ERROR - stderr - 56%|█████▌ | 12607/22434 [12:11:21<6:52:23, 2.52s/it] +2025-02-05 22:19:04 - ERROR - stderr - 56%|█████▌ | 12608/22434 [12:11:24<7:07:39, 2.61s/it] +2025-02-05 22:19:04 - ERROR - stderr - +2025-02-05 22:19:04 - ERROR - stderr - +2025-02-05 22:19:04 - INFO - stdout - {'loss': 0.6976, 'grad_norm': 1.2128007411956787, 'learning_rate': 8.48415703989959e-06, 'epoch': 1.69} +2025-02-05 22:19:04 - ERROR - stderr - 56%|█████▌ | 12608/22434 [12:11:24<7:07:39, 2.61s/it] +2025-02-05 22:19:06 - ERROR - stderr - 56%|█████▌ | 12609/22434 [12:11:26<6:57:10, 2.55s/it] +2025-02-05 22:19:06 - ERROR - stderr - +2025-02-05 22:19:06 - ERROR - stderr - +2025-02-05 22:19:06 - INFO - stdout - {'loss': 0.7244, 'grad_norm': 1.208802342414856, 'learning_rate': 8.482729992532733e-06, 'epoch': 1.69} +2025-02-05 22:19:06 - ERROR - stderr - 56%|█████▌ | 12609/22434 [12:11:26<6:57:10, 2.55s/it] 
+2025-02-05 22:19:09 - ERROR - stderr - 56%|█████▌ | 12610/22434 [12:11:29<6:55:56, 2.54s/it] +2025-02-05 22:19:09 - ERROR - stderr - +2025-02-05 22:19:09 - ERROR - stderr - +2025-02-05 22:19:09 - INFO - stdout - {'loss': 0.7273, 'grad_norm': 1.2582327127456665, 'learning_rate': 8.481302976791917e-06, 'epoch': 1.69} +2025-02-05 22:19:09 - ERROR - stderr - 56%|█████▌ | 12610/22434 [12:11:29<6:55:56, 2.54s/it] +2025-02-05 22:19:11 - ERROR - stderr - 56%|█████▌ | 12611/22434 [12:11:31<6:53:04, 2.52s/it] +2025-02-05 22:19:11 - ERROR - stderr - +2025-02-05 22:19:11 - ERROR - stderr - +2025-02-05 22:19:11 - INFO - stdout - {'loss': 0.6718, 'grad_norm': 1.1231346130371094, 'learning_rate': 8.47987599270689e-06, 'epoch': 1.69} +2025-02-05 22:19:11 - ERROR - stderr - 56%|█████▌ | 12611/22434 [12:11:31<6:53:04, 2.52s/it] +2025-02-05 22:19:14 - ERROR - stderr - 56%|█████▌ | 12612/22434 [12:11:34<6:52:22, 2.52s/it] +2025-02-05 22:19:14 - ERROR - stderr - +2025-02-05 22:19:14 - ERROR - stderr - +2025-02-05 22:19:14 - INFO - stdout - {'loss': 0.6107, 'grad_norm': 1.156197190284729, 'learning_rate': 8.478449040307393e-06, 'epoch': 1.69} +2025-02-05 22:19:14 - ERROR - stderr - 56%|█████▌ | 12612/22434 [12:11:34<6:52:22, 2.52s/it] +2025-02-05 22:19:17 - ERROR - stderr - 56%|█████▌ | 12613/22434 [12:11:36<7:05:41, 2.60s/it] +2025-02-05 22:19:17 - ERROR - stderr - +2025-02-05 22:19:17 - ERROR - stderr - +2025-02-05 22:19:17 - INFO - stdout - {'loss': 0.6636, 'grad_norm': 1.1130750179290771, 'learning_rate': 8.477022119623165e-06, 'epoch': 1.69} +2025-02-05 22:19:17 - ERROR - stderr - 56%|█████▌ | 12613/22434 [12:11:36<7:05:41, 2.60s/it] +2025-02-05 22:19:19 - ERROR - stderr - 56%|█████▌ | 12614/22434 [12:11:39<7:01:38, 2.58s/it] +2025-02-05 22:19:19 - ERROR - stderr - +2025-02-05 22:19:19 - ERROR - stderr - +2025-02-05 22:19:19 - INFO - stdout - {'loss': 0.5917, 'grad_norm': 1.0320173501968384, 'learning_rate': 8.47559523068396e-06, 'epoch': 1.69} +2025-02-05 22:19:19 - ERROR - 
stderr - 56%|█████▌ | 12614/22434 [12:11:39<7:01:38, 2.58s/it] +2025-02-05 22:19:22 - ERROR - stderr - 56%|█████▌ | 12615/22434 [12:11:41<6:56:11, 2.54s/it] +2025-02-05 22:19:22 - ERROR - stderr - +2025-02-05 22:19:22 - ERROR - stderr - +2025-02-05 22:19:22 - INFO - stdout - {'loss': 0.7342, 'grad_norm': 1.305985450744629, 'learning_rate': 8.47416837351951e-06, 'epoch': 1.69} +2025-02-05 22:19:22 - ERROR - stderr - 56%|█████▌ | 12615/22434 [12:11:41<6:56:11, 2.54s/it] +2025-02-05 22:19:24 - ERROR - stderr - 56%|█████▌ | 12616/22434 [12:11:44<6:56:10, 2.54s/it] +2025-02-05 22:19:24 - ERROR - stderr - +2025-02-05 22:19:24 - ERROR - stderr - +2025-02-05 22:19:24 - INFO - stdout - {'loss': 0.6901, 'grad_norm': 1.1145660877227783, 'learning_rate': 8.472741548159559e-06, 'epoch': 1.69} +2025-02-05 22:19:24 - ERROR - stderr - 56%|█████▌ | 12616/22434 [12:11:44<6:56:10, 2.54s/it] +2025-02-05 22:19:27 - ERROR - stderr - 56%|█████▌ | 12617/22434 [12:11:47<7:12:17, 2.64s/it] +2025-02-05 22:19:27 - ERROR - stderr - +2025-02-05 22:19:27 - ERROR - stderr - +2025-02-05 22:19:27 - INFO - stdout - {'loss': 0.7162, 'grad_norm': 1.1806029081344604, 'learning_rate': 8.471314754633853e-06, 'epoch': 1.69} +2025-02-05 22:19:27 - ERROR - stderr - 56%|█████▌ | 12617/22434 [12:11:47<7:12:17, 2.64s/it] +2025-02-05 22:19:30 - ERROR - stderr - 56%|█████▌ | 12618/22434 [12:11:49<7:07:17, 2.61s/it] +2025-02-05 22:19:30 - ERROR - stderr - +2025-02-05 22:19:30 - ERROR - stderr - +2025-02-05 22:19:30 - INFO - stdout - {'loss': 0.6856, 'grad_norm': 1.1462332010269165, 'learning_rate': 8.469887992972124e-06, 'epoch': 1.69} +2025-02-05 22:19:30 - ERROR - stderr - 56%|█████▌ | 12618/22434 [12:11:49<7:07:17, 2.61s/it] +2025-02-05 22:19:32 - ERROR - stderr - 56%|█████▌ | 12619/22434 [12:11:52<7:00:30, 2.57s/it] +2025-02-05 22:19:32 - ERROR - stderr - +2025-02-05 22:19:32 - ERROR - stderr - +2025-02-05 22:19:32 - INFO - stdout - {'loss': 0.7555, 'grad_norm': 1.2312220335006714, 'learning_rate':
8.468461263204118e-06, 'epoch': 1.69} +2025-02-05 22:19:32 - ERROR - stderr - 56%|█████▌ | 12619/22434 [12:11:52<7:00:30, 2.57s/it] +2025-02-05 22:19:34 - ERROR - stderr - 56%|█████▋ | 12620/22434 [12:11:54<6:55:52, 2.54s/it] +2025-02-05 22:19:35 - ERROR - stderr - +2025-02-05 22:19:35 - ERROR - stderr - +2025-02-05 22:19:35 - INFO - stdout - {'loss': 0.6092, 'grad_norm': 1.1212772130966187, 'learning_rate': 8.467034565359571e-06, 'epoch': 1.69} +2025-02-05 22:19:35 - ERROR - stderr - 56%|█████▋ | 12620/22434 [12:11:54<6:55:52, 2.54s/it] +2025-02-05 22:19:37 - ERROR - stderr - 56%|█████▋ | 12621/22434 [12:11:57<6:51:32, 2.52s/it] +2025-02-05 22:19:37 - ERROR - stderr - +2025-02-05 22:19:37 - ERROR - stderr - +2025-02-05 22:19:37 - INFO - stdout - {'loss': 0.6326, 'grad_norm': 1.1769607067108154, 'learning_rate': 8.465607899468222e-06, 'epoch': 1.69} +2025-02-05 22:19:37 - ERROR - stderr - 56%|█████▋ | 12621/22434 [12:11:57<6:51:32, 2.52s/it] +2025-02-05 22:19:39 - ERROR - stderr - 56%|█████▋ | 12622/22434 [12:11:59<6:49:39, 2.51s/it] +2025-02-05 22:19:39 - ERROR - stderr - +2025-02-05 22:19:39 - ERROR - stderr - +2025-02-05 22:19:39 - INFO - stdout - {'loss': 0.6821, 'grad_norm': 1.2133753299713135, 'learning_rate': 8.464181265559807e-06, 'epoch': 1.69} +2025-02-05 22:19:39 - ERROR - stderr - 56%|█████▋ | 12622/22434 [12:11:59<6:49:39, 2.51s/it] +2025-02-05 22:19:42 - ERROR - stderr - 56%|█████▋ | 12623/22434 [12:12:02<6:50:04, 2.51s/it] +2025-02-05 22:19:42 - ERROR - stderr - +2025-02-05 22:19:42 - ERROR - stderr - +2025-02-05 22:19:42 - INFO - stdout - {'loss': 0.7102, 'grad_norm': 1.3872541189193726, 'learning_rate': 8.462754663664067e-06, 'epoch': 1.69} +2025-02-05 22:19:42 - ERROR - stderr - 56%|█████▋ | 12623/22434 [12:12:02<6:50:04, 2.51s/it] +2025-02-05 22:19:44 - ERROR - stderr - 56%|█████▋ | 12624/22434 [12:12:04<6:52:33, 2.52s/it] +2025-02-05 22:19:45 - ERROR - stderr - +2025-02-05 22:19:45 - ERROR - stderr - +2025-02-05 22:19:45 - INFO - stdout - 
{'loss': 0.6554, 'grad_norm': 1.1456184387207031, 'learning_rate': 8.46132809381073e-06, 'epoch': 1.69} +2025-02-05 22:19:45 - ERROR - stderr - 56%|█████▋ | 12624/22434 [12:12:04<6:52:33, 2.52s/it] +2025-02-05 22:19:47 - ERROR - stderr - 56%|█████▋ | 12625/22434 [12:12:07<6:47:07, 2.49s/it] +2025-02-05 22:19:47 - ERROR - stderr - +2025-02-05 22:19:47 - ERROR - stderr - +2025-02-05 22:19:47 - INFO - stdout - {'loss': 0.7308, 'grad_norm': 1.2878984212875366, 'learning_rate': 8.459901556029541e-06, 'epoch': 1.69} +2025-02-05 22:19:47 - ERROR - stderr - 56%|█████▋ | 12625/22434 [12:12:07<6:47:07, 2.49s/it] +2025-02-05 22:19:49 - ERROR - stderr - 56%|█████▋ | 12626/22434 [12:12:09<6:49:54, 2.51s/it] +2025-02-05 22:19:49 - ERROR - stderr - +2025-02-05 22:19:49 - ERROR - stderr - +2025-02-05 22:19:49 - INFO - stdout - {'loss': 0.7562, 'grad_norm': 1.4174041748046875, 'learning_rate': 8.458475050350227e-06, 'epoch': 1.69} +2025-02-05 22:19:49 - ERROR - stderr - 56%|█████▋ | 12626/22434 [12:12:09<6:49:54, 2.51s/it] +2025-02-05 22:19:52 - ERROR - stderr - 56%|█████▋ | 12627/22434 [12:12:12<6:52:45, 2.53s/it] +2025-02-05 22:19:52 - ERROR - stderr - +2025-02-05 22:19:52 - ERROR - stderr - +2025-02-05 22:19:52 - INFO - stdout - {'loss': 0.6233, 'grad_norm': 1.1257339715957642, 'learning_rate': 8.457048576802529e-06, 'epoch': 1.69} +2025-02-05 22:19:52 - ERROR - stderr - 56%|█████▋ | 12627/22434 [12:12:12<6:52:45, 2.53s/it] +2025-02-05 22:19:54 - ERROR - stderr - 56%|█████▋ | 12628/22434 [12:12:14<6:50:35, 2.51s/it] +2025-02-05 22:19:55 - ERROR - stderr - +2025-02-05 22:19:55 - ERROR - stderr - +2025-02-05 22:19:55 - INFO - stdout - {'loss': 0.7294, 'grad_norm': 1.346755027770996, 'learning_rate': 8.455622135416175e-06, 'epoch': 1.69} +2025-02-05 22:19:55 - ERROR - stderr - 56%|█████▋ | 12628/22434 [12:12:14<6:50:35, 2.51s/it] +2025-02-05 22:19:57 - ERROR - stderr - 56%|█████▋ | 12629/22434 [12:12:17<6:50:50, 2.51s/it] +2025-02-05 22:19:57 - ERROR - stderr - +2025-02-05 22:19:57 
- ERROR - stderr - +2025-02-05 22:19:57 - INFO - stdout - {'loss': 0.7373, 'grad_norm': 1.2853269577026367, 'learning_rate': 8.454195726220898e-06, 'epoch': 1.69} +2025-02-05 22:19:57 - ERROR - stderr - 56%|█████▋ | 12629/22434 [12:12:17<6:50:50, 2.51s/it] +2025-02-05 22:20:00 - ERROR - stderr - 56%|█████▋ | 12630/22434 [12:12:19<6:52:57, 2.53s/it] +2025-02-05 22:20:00 - ERROR - stderr - +2025-02-05 22:20:00 - ERROR - stderr - +2025-02-05 22:20:00 - INFO - stdout - {'loss': 0.7396, 'grad_norm': 1.1935268640518188, 'learning_rate': 8.452769349246434e-06, 'epoch': 1.69} +2025-02-05 22:20:00 - ERROR - stderr - 56%|█████▋ | 12630/22434 [12:12:19<6:52:57, 2.53s/it] +2025-02-05 22:20:02 - ERROR - stderr - 56%|█████▋ | 12631/22434 [12:12:22<7:04:07, 2.60s/it] +2025-02-05 22:20:02 - ERROR - stderr - +2025-02-05 22:20:02 - ERROR - stderr - +2025-02-05 22:20:02 - INFO - stdout - {'loss': 0.8071, 'grad_norm': 1.438927173614502, 'learning_rate': 8.451343004522515e-06, 'epoch': 1.69} +2025-02-05 22:20:02 - ERROR - stderr - 56%|█████▋ | 12631/22434 [12:12:22<7:04:07, 2.60s/it] +2025-02-05 22:20:05 - ERROR - stderr - 56%|█████▋ | 12632/22434 [12:12:25<7:01:48, 2.58s/it] +2025-02-05 22:20:05 - ERROR - stderr - +2025-02-05 22:20:05 - ERROR - stderr - +2025-02-05 22:20:05 - INFO - stdout - {'loss': 0.6644, 'grad_norm': 1.1363568305969238, 'learning_rate': 8.449916692078863e-06, 'epoch': 1.69} +2025-02-05 22:20:05 - ERROR - stderr - 56%|█████▋ | 12632/22434 [12:12:25<7:01:48, 2.58s/it] +2025-02-05 22:20:07 - ERROR - stderr - 56%|█████▋ | 12633/22434 [12:12:27<6:53:42, 2.53s/it] +2025-02-05 22:20:07 - ERROR - stderr - +2025-02-05 22:20:07 - ERROR - stderr - +2025-02-05 22:20:07 - INFO - stdout - {'loss': 0.6382, 'grad_norm': 1.1978378295898438, 'learning_rate': 8.44849041194522e-06, 'epoch': 1.69} +2025-02-05 22:20:07 - ERROR - stderr - 56%|█████▋ | 12633/22434 [12:12:27<6:53:42, 2.53s/it] +2025-02-05 22:20:10 - ERROR - stderr - 56%|█████▋ | 12634/22434 [12:12:30<6:54:20, 2.54s/it] 
+2025-02-05 22:20:10 - ERROR - stderr - +2025-02-05 22:20:10 - ERROR - stderr - +2025-02-05 22:20:10 - INFO - stdout - {'loss': 0.6297, 'grad_norm': 1.1633226871490479, 'learning_rate': 8.447064164151305e-06, 'epoch': 1.69} +2025-02-05 22:20:10 - ERROR - stderr - 56%|█████▋ | 12634/22434 [12:12:30<6:54:20, 2.54s/it] +2025-02-05 22:20:12 - ERROR - stderr - 56%|█████▋ | 12635/22434 [12:12:32<6:52:42, 2.53s/it] +2025-02-05 22:20:12 - ERROR - stderr - +2025-02-05 22:20:12 - ERROR - stderr - +2025-02-05 22:20:12 - INFO - stdout - {'loss': 0.7954, 'grad_norm': 1.2863752841949463, 'learning_rate': 8.445637948726854e-06, 'epoch': 1.69} +2025-02-05 22:20:12 - ERROR - stderr - 56%|█████▋ | 12635/22434 [12:12:32<6:52:42, 2.53s/it] +2025-02-05 22:20:15 - ERROR - stderr - 56%|█████▋ | 12636/22434 [12:12:35<6:50:18, 2.51s/it] +2025-02-05 22:20:15 - ERROR - stderr - +2025-02-05 22:20:15 - ERROR - stderr - +2025-02-05 22:20:15 - INFO - stdout - {'loss': 0.6824, 'grad_norm': 1.092232346534729, 'learning_rate': 8.444211765701594e-06, 'epoch': 1.69} +2025-02-05 22:20:15 - ERROR - stderr - 56%|█████▋ | 12636/22434 [12:12:35<6:50:18, 2.51s/it] +2025-02-05 22:20:18 - ERROR - stderr - 56%|█████▋ | 12637/22434 [12:12:37<7:05:34, 2.61s/it] +2025-02-05 22:20:18 - ERROR - stderr - +2025-02-05 22:20:18 - ERROR - stderr - +2025-02-05 22:20:18 - INFO - stdout - {'loss': 0.6914, 'grad_norm': 1.1640865802764893, 'learning_rate': 8.442785615105247e-06, 'epoch': 1.69} +2025-02-05 22:20:18 - ERROR - stderr - 56%|█████▋ | 12637/22434 [12:12:37<7:05:34, 2.61s/it] +2025-02-05 22:20:20 - ERROR - stderr - 56%|█████▋ | 12638/22434 [12:12:40<7:07:29, 2.62s/it] +2025-02-05 22:20:20 - ERROR - stderr - +2025-02-05 22:20:20 - ERROR - stderr - +2025-02-05 22:20:20 - INFO - stdout - {'loss': 0.6636, 'grad_norm': 1.2823617458343506, 'learning_rate': 8.441359496967549e-06, 'epoch': 1.69} +2025-02-05 22:20:20 - ERROR - stderr - 56%|█████▋ | 12638/22434 [12:12:40<7:07:29, 2.62s/it] +2025-02-05 22:20:23 - ERROR - 
stderr - 56%|█████▋ | 12639/22434 [12:12:43<6:59:42, 2.57s/it] +2025-02-05 22:20:23 - ERROR - stderr - +2025-02-05 22:20:23 - ERROR - stderr - +2025-02-05 22:20:23 - INFO - stdout - {'loss': 0.7108, 'grad_norm': 1.250607967376709, 'learning_rate': 8.439933411318217e-06, 'epoch': 1.69} +2025-02-05 22:20:23 - ERROR - stderr - 56%|█████▋ | 12639/22434 [12:12:43<6:59:42, 2.57s/it] +2025-02-05 22:20:25 - ERROR - stderr - 56%|█████▋ | 12640/22434 [12:12:45<6:54:23, 2.54s/it] +2025-02-05 22:20:25 - ERROR - stderr - +2025-02-05 22:20:25 - ERROR - stderr - +2025-02-05 22:20:25 - INFO - stdout - {'loss': 0.7316, 'grad_norm': 1.2313709259033203, 'learning_rate': 8.43850735818698e-06, 'epoch': 1.69} +2025-02-05 22:20:25 - ERROR - stderr - 56%|█████▋ | 12640/22434 [12:12:45<6:54:23, 2.54s/it] +2025-02-05 22:20:28 - ERROR - stderr - 56%|█████▋ | 12641/22434 [12:12:48<7:03:01, 2.59s/it] +2025-02-05 22:20:28 - ERROR - stderr - +2025-02-05 22:20:28 - ERROR - stderr - +2025-02-05 22:20:28 - INFO - stdout - {'loss': 0.5821, 'grad_norm': 1.164263129234314, 'learning_rate': 8.437081337603566e-06, 'epoch': 1.69} +2025-02-05 22:20:28 - ERROR - stderr - 56%|█████▋ | 12641/22434 [12:12:48<7:03:01, 2.59s/it] +2025-02-05 22:20:30 - ERROR - stderr - 56%|█████▋ | 12642/22434 [12:12:50<6:58:05, 2.56s/it] +2025-02-05 22:20:30 - ERROR - stderr - +2025-02-05 22:20:30 - ERROR - stderr - +2025-02-05 22:20:30 - INFO - stdout - {'loss': 0.7066, 'grad_norm': 1.2083113193511963, 'learning_rate': 8.43565534959769e-06, 'epoch': 1.69} +2025-02-05 22:20:30 - ERROR - stderr - 56%|█████▋ | 12642/22434 [12:12:50<6:58:05, 2.56s/it] +2025-02-05 22:20:33 - ERROR - stderr - 56%|█████▋ | 12643/22434 [12:12:53<6:57:48, 2.56s/it] +2025-02-05 22:20:33 - ERROR - stderr - +2025-02-05 22:20:33 - ERROR - stderr - +2025-02-05 22:20:33 - INFO - stdout - {'loss': 0.6443, 'grad_norm': 1.2969449758529663, 'learning_rate': 8.434229394199089e-06, 'epoch': 1.69} +2025-02-05 22:20:33 - ERROR - stderr - 56%|█████▋ | 12643/22434 
[12:12:53<6:57:48, 2.56s/it] +2025-02-05 22:20:36 - ERROR - stderr - 56%|█████▋ | 12644/22434 [12:12:55<6:57:40, 2.56s/it] +2025-02-05 22:20:36 - ERROR - stderr - +2025-02-05 22:20:36 - ERROR - stderr - +2025-02-05 22:20:36 - INFO - stdout - {'loss': 0.7938, 'grad_norm': 1.284295916557312, 'learning_rate': 8.432803471437476e-06, 'epoch': 1.69} +2025-02-05 22:20:36 - ERROR - stderr - 56%|█████▋ | 12644/22434 [12:12:55<6:57:40, 2.56s/it] +2025-02-05 22:20:38 - ERROR - stderr - 56%|█████▋ | 12645/22434 [12:12:58<6:51:09, 2.52s/it] +2025-02-05 22:20:38 - ERROR - stderr - +2025-02-05 22:20:38 - ERROR - stderr - +2025-02-05 22:20:38 - INFO - stdout - {'loss': 0.6735, 'grad_norm': 1.2096102237701416, 'learning_rate': 8.43137758134257e-06, 'epoch': 1.69} +2025-02-05 22:20:38 - ERROR - stderr - 56%|█████▋ | 12645/22434 [12:12:58<6:51:09, 2.52s/it] +2025-02-05 22:20:40 - ERROR - stderr - 56%|█████▋ | 12646/22434 [12:13:00<6:47:50, 2.50s/it] +2025-02-05 22:20:40 - ERROR - stderr - +2025-02-05 22:20:40 - ERROR - stderr - +2025-02-05 22:20:40 - INFO - stdout - {'loss': 0.6859, 'grad_norm': 1.219283103942871, 'learning_rate': 8.429951723944103e-06, 'epoch': 1.69} +2025-02-05 22:20:40 - ERROR - stderr - 56%|█████▋ | 12646/22434 [12:13:00<6:47:50, 2.50s/it] +2025-02-05 22:20:43 - ERROR - stderr - 56%|█████▋ | 12647/22434 [12:13:03<6:46:11, 2.49s/it] +2025-02-05 22:20:43 - ERROR - stderr - +2025-02-05 22:20:43 - ERROR - stderr - +2025-02-05 22:20:43 - INFO - stdout - {'loss': 0.6339, 'grad_norm': 1.1531134843826294, 'learning_rate': 8.428525899271787e-06, 'epoch': 1.69} +2025-02-05 22:20:43 - ERROR - stderr - 56%|█████▋ | 12647/22434 [12:13:03<6:46:11, 2.49s/it] +2025-02-05 22:20:45 - ERROR - stderr - 56%|█████▋ | 12648/22434 [12:13:05<6:44:55, 2.48s/it] +2025-02-05 22:20:45 - ERROR - stderr - +2025-02-05 22:20:45 - ERROR - stderr - +2025-02-05 22:20:45 - INFO - stdout - {'loss': 0.6306, 'grad_norm': 1.2346347570419312, 'learning_rate': 8.427100107355344e-06, 'epoch': 1.69} 
+2025-02-05 22:20:45 - ERROR - stderr - 56%|█████▋ | 12648/22434 [12:13:05<6:44:55, 2.48s/it] +2025-02-05 22:20:48 - ERROR - stderr - 56%|█████▋ | 12649/22434 [12:13:08<6:42:36, 2.47s/it] +2025-02-05 22:20:48 - ERROR - stderr - +2025-02-05 22:20:48 - ERROR - stderr - +2025-02-05 22:20:48 - INFO - stdout - {'loss': 0.6744, 'grad_norm': 1.2428038120269775, 'learning_rate': 8.425674348224498e-06, 'epoch': 1.69} +2025-02-05 22:20:48 - ERROR - stderr - 56%|█████▋ | 12649/22434 [12:13:08<6:42:36, 2.47s/it] +2025-02-05 22:20:50 - ERROR - stderr - 56%|█████▋ | 12650/22434 [12:13:10<6:47:30, 2.50s/it] +2025-02-05 22:20:50 - ERROR - stderr - +2025-02-05 22:20:50 - ERROR - stderr - +2025-02-05 22:20:50 - INFO - stdout - {'loss': 0.744, 'grad_norm': 1.2464931011199951, 'learning_rate': 8.424248621908959e-06, 'epoch': 1.69} +2025-02-05 22:20:50 - ERROR - stderr - 56%|█████▋ | 12650/22434 [12:13:10<6:47:30, 2.50s/it] +2025-02-05 22:20:53 - ERROR - stderr - 56%|█████▋ | 12651/22434 [12:13:13<6:46:48, 2.50s/it] +2025-02-05 22:20:53 - ERROR - stderr - +2025-02-05 22:20:53 - ERROR - stderr - +2025-02-05 22:20:53 - INFO - stdout - {'loss': 0.6573, 'grad_norm': 1.3033384084701538, 'learning_rate': 8.422822928438453e-06, 'epoch': 1.69} +2025-02-05 22:20:53 - ERROR - stderr - 56%|█████▋ | 12651/22434 [12:13:13<6:46:48, 2.50s/it] +2025-02-05 22:20:55 - ERROR - stderr - 56%|█████▋ | 12652/22434 [12:13:15<6:49:35, 2.51s/it] +2025-02-05 22:20:55 - ERROR - stderr - +2025-02-05 22:20:55 - ERROR - stderr - +2025-02-05 22:20:55 - INFO - stdout - {'loss': 0.7233, 'grad_norm': 1.2556694746017456, 'learning_rate': 8.421397267842693e-06, 'epoch': 1.69} +2025-02-05 22:20:55 - ERROR - stderr - 56%|█████▋ | 12652/22434 [12:13:15<6:49:35, 2.51s/it] +2025-02-05 22:20:58 - ERROR - stderr - 56%|█████▋ | 12653/22434 [12:13:18<6:47:49, 2.50s/it] +2025-02-05 22:20:58 - ERROR - stderr - +2025-02-05 22:20:58 - ERROR - stderr - +2025-02-05 22:20:58 - INFO - stdout - {'loss': 0.6873, 'grad_norm': 
1.1637213230133057, 'learning_rate': 8.419971640151397e-06, 'epoch': 1.69} +2025-02-05 22:20:58 - ERROR - stderr - 56%|█████▋ | 12653/22434 [12:13:18<6:47:49, 2.50s/it] +2025-02-05 22:21:00 - ERROR - stderr - 56%|█████▋ | 12654/22434 [12:13:20<6:47:21, 2.50s/it] +2025-02-05 22:21:00 - ERROR - stderr - +2025-02-05 22:21:00 - ERROR - stderr - +2025-02-05 22:21:00 - INFO - stdout - {'loss': 0.8026, 'grad_norm': 1.3313084840774536, 'learning_rate': 8.41854604539428e-06, 'epoch': 1.69} +2025-02-05 22:21:00 - ERROR - stderr - 56%|█████▋ | 12654/22434 [12:13:20<6:47:21, 2.50s/it] +2025-02-05 22:21:03 - ERROR - stderr - 56%|█████▋ | 12655/22434 [12:13:23<6:44:05, 2.48s/it] +2025-02-05 22:21:03 - ERROR - stderr - +2025-02-05 22:21:03 - ERROR - stderr - +2025-02-05 22:21:03 - INFO - stdout - {'loss': 0.7479, 'grad_norm': 1.3373881578445435, 'learning_rate': 8.417120483601058e-06, 'epoch': 1.69} +2025-02-05 22:21:03 - ERROR - stderr - 56%|█████▋ | 12655/22434 [12:13:23<6:44:05, 2.48s/it] +2025-02-05 22:21:05 - ERROR - stderr - 56%|█████▋ | 12656/22434 [12:13:25<6:45:30, 2.49s/it] +2025-02-05 22:21:05 - ERROR - stderr - +2025-02-05 22:21:05 - ERROR - stderr - +2025-02-05 22:21:05 - INFO - stdout - {'loss': 0.6275, 'grad_norm': 1.1625293493270874, 'learning_rate': 8.41569495480144e-06, 'epoch': 1.69} +2025-02-05 22:21:05 - ERROR - stderr - 56%|█████▋ | 12656/22434 [12:13:25<6:45:30, 2.49s/it] +2025-02-05 22:21:08 - ERROR - stderr - 56%|█████▋ | 12657/22434 [12:13:28<6:48:15, 2.51s/it] +2025-02-05 22:21:08 - ERROR - stderr - +2025-02-05 22:21:08 - ERROR - stderr - +2025-02-05 22:21:08 - INFO - stdout - {'loss': 0.656, 'grad_norm': 1.1874030828475952, 'learning_rate': 8.414269459025152e-06, 'epoch': 1.69} +2025-02-05 22:21:08 - ERROR - stderr - 56%|█████▋ | 12657/22434 [12:13:28<6:48:15, 2.51s/it] +2025-02-05 22:21:10 - ERROR - stderr - 56%|█████▋ | 12658/22434 [12:13:30<6:45:35, 2.49s/it] +2025-02-05 22:21:10 - ERROR - stderr - +2025-02-05 22:21:10 - ERROR - stderr - +2025-02-05 
22:21:10 - INFO - stdout - {'loss': 0.7917, 'grad_norm': 1.3021211624145508, 'learning_rate': 8.412843996301894e-06, 'epoch': 1.69} +2025-02-05 22:21:10 - ERROR - stderr - 56%|█████▋ | 12658/22434 [12:13:30<6:45:35, 2.49s/it] +2025-02-05 22:21:13 - ERROR - stderr - 56%|█████▋ | 12659/22434 [12:13:33<6:46:04, 2.49s/it] +2025-02-05 22:21:13 - ERROR - stderr - +2025-02-05 22:21:13 - ERROR - stderr - +2025-02-05 22:21:13 - INFO - stdout - {'loss': 0.6823, 'grad_norm': 1.182937502861023, 'learning_rate': 8.411418566661387e-06, 'epoch': 1.69} +2025-02-05 22:21:13 - ERROR - stderr - 56%|█████▋ | 12659/22434 [12:13:33<6:46:04, 2.49s/it] +2025-02-05 22:21:15 - ERROR - stderr - 56%|█████▋ | 12660/22434 [12:13:35<6:46:03, 2.49s/it] +2025-02-05 22:21:15 - ERROR - stderr - +2025-02-05 22:21:15 - ERROR - stderr - +2025-02-05 22:21:15 - INFO - stdout - {'loss': 0.6651, 'grad_norm': 1.2208675146102905, 'learning_rate': 8.40999317013334e-06, 'epoch': 1.69} +2025-02-05 22:21:15 - ERROR - stderr - 56%|█████▋ | 12660/22434 [12:13:35<6:46:03, 2.49s/it] +2025-02-05 22:21:18 - ERROR - stderr - 56%|█████▋ | 12661/22434 [12:13:38<6:48:07, 2.51s/it] +2025-02-05 22:21:18 - ERROR - stderr - +2025-02-05 22:21:18 - ERROR - stderr - +2025-02-05 22:21:18 - INFO - stdout - {'loss': 0.6997, 'grad_norm': 1.136149525642395, 'learning_rate': 8.408567806747461e-06, 'epoch': 1.69} +2025-02-05 22:21:18 - ERROR - stderr - 56%|█████▋ | 12661/22434 [12:13:38<6:48:07, 2.51s/it] +2025-02-05 22:21:20 - ERROR - stderr - 56%|█████▋ | 12662/22434 [12:13:40<6:49:10, 2.51s/it] +2025-02-05 22:21:20 - ERROR - stderr - +2025-02-05 22:21:20 - ERROR - stderr - +2025-02-05 22:21:20 - INFO - stdout - {'loss': 0.6991, 'grad_norm': 1.355196475982666, 'learning_rate': 8.407142476533468e-06, 'epoch': 1.69} +2025-02-05 22:21:20 - ERROR - stderr - 56%|█████▋ | 12662/22434 [12:13:40<6:49:10, 2.51s/it] +2025-02-05 22:21:23 - ERROR - stderr - 56%|█████▋ | 12663/22434 [12:13:43<6:48:05, 2.51s/it] +2025-02-05 22:21:23 - ERROR - 
stderr - +2025-02-05 22:21:23 - ERROR - stderr - +2025-02-05 22:21:23 - INFO - stdout - {'loss': 0.7221, 'grad_norm': 1.408139944076538, 'learning_rate': 8.40571717952106e-06, 'epoch': 1.69} +2025-02-05 22:21:23 - ERROR - stderr - 56%|█████▋ | 12663/22434 [12:13:43<6:48:05, 2.51s/it] +2025-02-05 22:21:25 - ERROR - stderr - 56%|█████▋ | 12664/22434 [12:13:45<6:51:55, 2.53s/it] +2025-02-05 22:21:25 - ERROR - stderr - +2025-02-05 22:21:25 - ERROR - stderr - +2025-02-05 22:21:25 - INFO - stdout - {'loss': 0.7651, 'grad_norm': 1.3606435060501099, 'learning_rate': 8.404291915739958e-06, 'epoch': 1.69} +2025-02-05 22:21:25 - ERROR - stderr - 56%|█████▋ | 12664/22434 [12:13:45<6:51:55, 2.53s/it] +2025-02-05 22:21:28 - ERROR - stderr - 56%|█████▋ | 12665/22434 [12:13:48<6:52:42, 2.53s/it] +2025-02-05 22:21:28 - ERROR - stderr - +2025-02-05 22:21:28 - ERROR - stderr - +2025-02-05 22:21:28 - INFO - stdout - {'loss': 0.7209, 'grad_norm': 1.309222936630249, 'learning_rate': 8.402866685219863e-06, 'epoch': 1.69} +2025-02-05 22:21:28 - ERROR - stderr - 56%|█████▋ | 12665/22434 [12:13:48<6:52:42, 2.53s/it] +2025-02-05 22:21:31 - ERROR - stderr - 56%|█████▋ | 12666/22434 [12:13:50<6:54:35, 2.55s/it] +2025-02-05 22:21:31 - ERROR - stderr - +2025-02-05 22:21:31 - ERROR - stderr - +2025-02-05 22:21:31 - INFO - stdout - {'loss': 0.7306, 'grad_norm': 1.214650273323059, 'learning_rate': 8.401441487990478e-06, 'epoch': 1.69} +2025-02-05 22:21:31 - ERROR - stderr - 56%|█████▋ | 12666/22434 [12:13:50<6:54:35, 2.55s/it] +2025-02-05 22:21:33 - ERROR - stderr - 56%|█████▋ | 12667/22434 [12:13:53<6:50:44, 2.52s/it] +2025-02-05 22:21:33 - ERROR - stderr - +2025-02-05 22:21:33 - ERROR - stderr - +2025-02-05 22:21:33 - INFO - stdout - {'loss': 0.7424, 'grad_norm': 1.4477800130844116, 'learning_rate': 8.40001632408152e-06, 'epoch': 1.69} +2025-02-05 22:21:33 - ERROR - stderr - 56%|█████▋ | 12667/22434 [12:13:53<6:50:44, 2.52s/it] +2025-02-05 22:21:36 - ERROR - stderr - 56%|█████▋ | 12668/22434 
[12:13:55<6:50:59, 2.53s/it] +2025-02-05 22:21:36 - ERROR - stderr - +2025-02-05 22:21:36 - ERROR - stderr - +2025-02-05 22:21:36 - INFO - stdout - {'loss': 0.7333, 'grad_norm': 1.2210028171539307, 'learning_rate': 8.398591193522691e-06, 'epoch': 1.69} +2025-02-05 22:21:36 - ERROR - stderr - 56%|█████▋ | 12668/22434 [12:13:55<6:50:59, 2.53s/it] +2025-02-05 22:21:38 - ERROR - stderr - 56%|█████▋ | 12669/22434 [12:13:58<6:50:30, 2.52s/it] +2025-02-05 22:21:38 - ERROR - stderr - +2025-02-05 22:21:38 - ERROR - stderr - +2025-02-05 22:21:38 - INFO - stdout - {'loss': 0.7119, 'grad_norm': 1.305587887763977, 'learning_rate': 8.397166096343694e-06, 'epoch': 1.69} +2025-02-05 22:21:38 - ERROR - stderr - 56%|█████▋ | 12669/22434 [12:13:58<6:50:30, 2.52s/it] +2025-02-05 22:21:41 - ERROR - stderr - 56%|█████▋ | 12670/22434 [12:14:01<6:57:40, 2.57s/it] +2025-02-05 22:21:41 - ERROR - stderr - +2025-02-05 22:21:41 - ERROR - stderr - +2025-02-05 22:21:41 - INFO - stdout - {'loss': 0.7025, 'grad_norm': 1.28839910030365, 'learning_rate': 8.39574103257424e-06, 'epoch': 1.69} +2025-02-05 22:21:41 - ERROR - stderr - 56%|█████▋ | 12670/22434 [12:14:01<6:57:40, 2.57s/it] +2025-02-05 22:21:43 - ERROR - stderr - 56%|█████▋ | 12671/22434 [12:14:03<6:55:08, 2.55s/it] +2025-02-05 22:21:43 - ERROR - stderr - +2025-02-05 22:21:43 - ERROR - stderr - +2025-02-05 22:21:43 - INFO - stdout - {'loss': 0.6137, 'grad_norm': 1.1505221128463745, 'learning_rate': 8.394316002244023e-06, 'epoch': 1.69} +2025-02-05 22:21:43 - ERROR - stderr - 56%|█████▋ | 12671/22434 [12:14:03<6:55:08, 2.55s/it] +2025-02-05 22:21:46 - ERROR - stderr - 56%|█████▋ | 12672/22434 [12:14:06<6:54:04, 2.55s/it] +2025-02-05 22:21:46 - ERROR - stderr - +2025-02-05 22:21:46 - ERROR - stderr - +2025-02-05 22:21:46 - INFO - stdout - {'loss': 0.6732, 'grad_norm': 1.1304563283920288, 'learning_rate': 8.392891005382756e-06, 'epoch': 1.69} +2025-02-05 22:21:46 - ERROR - stderr - 56%|█████▋ | 12672/22434 [12:14:06<6:54:04, 2.55s/it] 
+2025-02-05 22:21:48 - ERROR - stderr - 56%|█████▋ | 12673/22434 [12:14:08<6:52:54, 2.54s/it] +2025-02-05 22:21:48 - ERROR - stderr - +2025-02-05 22:21:48 - ERROR - stderr - +2025-02-05 22:21:48 - INFO - stdout - {'loss': 0.7788, 'grad_norm': 1.3093501329421997, 'learning_rate': 8.39146604202014e-06, 'epoch': 1.69} +2025-02-05 22:21:48 - ERROR - stderr - 56%|█████▋ | 12673/22434 [12:14:08<6:52:54, 2.54s/it] +2025-02-05 22:21:51 - ERROR - stderr - 56%|█████▋ | 12674/22434 [12:14:11<6:49:44, 2.52s/it] +2025-02-05 22:21:51 - ERROR - stderr - +2025-02-05 22:21:51 - ERROR - stderr - +2025-02-05 22:21:51 - INFO - stdout - {'loss': 0.7549, 'grad_norm': 1.3738151788711548, 'learning_rate': 8.39004111218587e-06, 'epoch': 1.69} +2025-02-05 22:21:51 - ERROR - stderr - 56%|█████▋ | 12674/22434 [12:14:11<6:49:44, 2.52s/it] +2025-02-05 22:21:53 - ERROR - stderr - 56%|█████▋ | 12675/22434 [12:14:13<6:58:44, 2.57s/it] +2025-02-05 22:21:54 - ERROR - stderr - +2025-02-05 22:21:54 - ERROR - stderr - +2025-02-05 22:21:54 - INFO - stdout - {'loss': 0.7197, 'grad_norm': 1.198561668395996, 'learning_rate': 8.388616215909657e-06, 'epoch': 1.69} +2025-02-05 22:21:54 - ERROR - stderr - 56%|█████▋ | 12675/22434 [12:14:13<6:58:44, 2.57s/it] +2025-02-05 22:21:56 - ERROR - stderr - 57%|█████▋ | 12676/22434 [12:14:16<7:02:50, 2.60s/it] +2025-02-05 22:21:56 - ERROR - stderr - +2025-02-05 22:21:56 - ERROR - stderr - +2025-02-05 22:21:56 - INFO - stdout - {'loss': 0.6396, 'grad_norm': 1.2479665279388428, 'learning_rate': 8.387191353221198e-06, 'epoch': 1.7} +2025-02-05 22:21:56 - ERROR - stderr - 57%|█████▋ | 12676/22434 [12:14:16<7:02:50, 2.60s/it] +2025-02-05 22:21:59 - ERROR - stderr - 57%|█████▋ | 12677/22434 [12:14:18<6:54:09, 2.55s/it] +2025-02-05 22:21:59 - ERROR - stderr - +2025-02-05 22:21:59 - ERROR - stderr - +2025-02-05 22:21:59 - INFO - stdout - {'loss': 0.6695, 'grad_norm': 1.1282187700271606, 'learning_rate': 8.385766524150187e-06, 'epoch': 1.7} +2025-02-05 22:21:59 - ERROR - stderr 
- 57%|█████▋ | 12677/22434 [12:14:18<6:54:09, 2.55s/it] +2025-02-05 22:22:01 - ERROR - stderr - 57%|█████▋ | 12678/22434 [12:14:21<6:57:20, 2.57s/it] +2025-02-05 22:22:01 - ERROR - stderr - +2025-02-05 22:22:01 - ERROR - stderr - +2025-02-05 22:22:01 - INFO - stdout - {'loss': 0.7722, 'grad_norm': 1.2233806848526, 'learning_rate': 8.384341728726333e-06, 'epoch': 1.7} +2025-02-05 22:22:01 - ERROR - stderr - 57%|█████▋ | 12678/22434 [12:14:21<6:57:20, 2.57s/it] +2025-02-05 22:22:04 - ERROR - stderr - 57%|█████▋ | 12679/22434 [12:14:24<6:56:14, 2.56s/it] +2025-02-05 22:22:04 - ERROR - stderr - +2025-02-05 22:22:04 - ERROR - stderr - +2025-02-05 22:22:04 - INFO - stdout - {'loss': 0.6936, 'grad_norm': 1.1995158195495605, 'learning_rate': 8.382916966979326e-06, 'epoch': 1.7} +2025-02-05 22:22:04 - ERROR - stderr - 57%|█████▋ | 12679/22434 [12:14:24<6:56:14, 2.56s/it] +2025-02-05 22:22:06 - ERROR - stderr - 57%|█████▋ | 12680/22434 [12:14:26<6:53:59, 2.55s/it] +2025-02-05 22:22:06 - ERROR - stderr - +2025-02-05 22:22:06 - ERROR - stderr - +2025-02-05 22:22:06 - INFO - stdout - {'loss': 0.6072, 'grad_norm': 1.1774523258209229, 'learning_rate': 8.381492238938868e-06, 'epoch': 1.7} +2025-02-05 22:22:06 - ERROR - stderr - 57%|█████▋ | 12680/22434 [12:14:26<6:53:59, 2.55s/it] +2025-02-05 22:22:09 - ERROR - stderr - 57%|█████▋ | 12681/22434 [12:14:28<6:50:14, 2.52s/it] +2025-02-05 22:22:09 - ERROR - stderr - +2025-02-05 22:22:09 - ERROR - stderr - +2025-02-05 22:22:09 - INFO - stdout - {'loss': 0.7088, 'grad_norm': 1.2051258087158203, 'learning_rate': 8.380067544634658e-06, 'epoch': 1.7} +2025-02-05 22:22:09 - ERROR - stderr - 57%|█████▋ | 12681/22434 [12:14:29<6:50:14, 2.52s/it] +2025-02-05 22:22:11 - ERROR - stderr - 57%|█████▋ | 12682/22434 [12:14:31<6:57:21, 2.57s/it] +2025-02-05 22:22:11 - ERROR - stderr - +2025-02-05 22:22:11 - ERROR - stderr - +2025-02-05 22:22:11 - INFO - stdout - {'loss': 0.8068, 'grad_norm': 1.2617137432098389, 'learning_rate': 8.378642884096386e-06, 
'epoch': 1.7} +2025-02-05 22:22:11 - ERROR - stderr - 57%|█████▋ | 12682/22434 [12:14:31<6:57:21, 2.57s/it] +2025-02-05 22:22:14 - ERROR - stderr - 57%|█████▋ | 12683/22434 [12:14:34<6:50:41, 2.53s/it] +2025-02-05 22:22:14 - ERROR - stderr - +2025-02-05 22:22:14 - ERROR - stderr - +2025-02-05 22:22:14 - INFO - stdout - {'loss': 0.696, 'grad_norm': 1.1566762924194336, 'learning_rate': 8.377218257353757e-06, 'epoch': 1.7} +2025-02-05 22:22:14 - ERROR - stderr - 57%|█████▋ | 12683/22434 [12:14:34<6:50:41, 2.53s/it] +2025-02-05 22:22:16 - ERROR - stderr - 57%|█████▋ | 12684/22434 [12:14:36<6:50:33, 2.53s/it] +2025-02-05 22:22:16 - ERROR - stderr - +2025-02-05 22:22:16 - ERROR - stderr - +2025-02-05 22:22:16 - INFO - stdout - {'loss': 0.7538, 'grad_norm': 1.3244025707244873, 'learning_rate': 8.375793664436459e-06, 'epoch': 1.7} +2025-02-05 22:22:16 - ERROR - stderr - 57%|█████▋ | 12684/22434 [12:14:36<6:50:33, 2.53s/it] +2025-02-05 22:22:19 - ERROR - stderr - 57%|█████▋ | 12685/22434 [12:14:39<6:49:18, 2.52s/it] +2025-02-05 22:22:19 - ERROR - stderr - +2025-02-05 22:22:19 - ERROR - stderr - +2025-02-05 22:22:19 - INFO - stdout - {'loss': 0.7173, 'grad_norm': 1.2162891626358032, 'learning_rate': 8.374369105374183e-06, 'epoch': 1.7} +2025-02-05 22:22:19 - ERROR - stderr - 57%|█████▋ | 12685/22434 [12:14:39<6:49:18, 2.52s/it] +2025-02-05 22:22:21 - ERROR - stderr - 57%|█████▋ | 12686/22434 [12:14:41<6:45:28, 2.50s/it] +2025-02-05 22:22:21 - ERROR - stderr - +2025-02-05 22:22:21 - ERROR - stderr - +2025-02-05 22:22:21 - INFO - stdout - {'loss': 0.6716, 'grad_norm': 1.3229069709777832, 'learning_rate': 8.372944580196631e-06, 'epoch': 1.7} +2025-02-05 22:22:21 - ERROR - stderr - 57%|█████▋ | 12686/22434 [12:14:41<6:45:28, 2.50s/it] +2025-02-05 22:22:24 - ERROR - stderr - 57%|█████▋ | 12687/22434 [12:14:44<6:46:31, 2.50s/it] +2025-02-05 22:22:24 - ERROR - stderr - +2025-02-05 22:22:24 - ERROR - stderr - +2025-02-05 22:22:24 - INFO - stdout - {'loss': 0.6156, 'grad_norm': 
1.1752147674560547, 'learning_rate': 8.37152008893349e-06, 'epoch': 1.7} +2025-02-05 22:22:24 - ERROR - stderr - 57%|█████▋ | 12687/22434 [12:14:44<6:46:31, 2.50s/it] +2025-02-05 22:22:26 - ERROR - stderr - 57%|█████▋ | 12688/22434 [12:14:46<6:46:03, 2.50s/it] +2025-02-05 22:22:26 - ERROR - stderr - +2025-02-05 22:22:26 - ERROR - stderr - +2025-02-05 22:22:26 - INFO - stdout - {'loss': 0.7706, 'grad_norm': 1.2808023691177368, 'learning_rate': 8.370095631614459e-06, 'epoch': 1.7} +2025-02-05 22:22:26 - ERROR - stderr - 57%|█████▋ | 12688/22434 [12:14:46<6:46:03, 2.50s/it] +2025-02-05 22:22:29 - ERROR - stderr - 57%|█████▋ | 12689/22434 [12:14:49<6:46:48, 2.50s/it] +2025-02-05 22:22:29 - ERROR - stderr - +2025-02-05 22:22:29 - ERROR - stderr - +2025-02-05 22:22:29 - INFO - stdout - {'loss': 0.763, 'grad_norm': 1.3911948204040527, 'learning_rate': 8.368671208269224e-06, 'epoch': 1.7} +2025-02-05 22:22:29 - ERROR - stderr - 57%|█████▋ | 12689/22434 [12:14:49<6:46:48, 2.50s/it] +2025-02-05 22:22:31 - ERROR - stderr - 57%|█████▋ | 12690/22434 [12:14:51<6:50:33, 2.53s/it] +2025-02-05 22:22:31 - ERROR - stderr - +2025-02-05 22:22:31 - ERROR - stderr - +2025-02-05 22:22:31 - INFO - stdout - {'loss': 0.7614, 'grad_norm': 1.2635115385055542, 'learning_rate': 8.367246818927472e-06, 'epoch': 1.7} +2025-02-05 22:22:31 - ERROR - stderr - 57%|█████▋ | 12690/22434 [12:14:51<6:50:33, 2.53s/it] +2025-02-05 22:22:34 - ERROR - stderr - 57%|█████▋ | 12691/22434 [12:14:54<6:47:00, 2.51s/it] +2025-02-05 22:22:34 - ERROR - stderr - +2025-02-05 22:22:34 - ERROR - stderr - +2025-02-05 22:22:34 - INFO - stdout - {'loss': 0.6999, 'grad_norm': 1.201280117034912, 'learning_rate': 8.365822463618902e-06, 'epoch': 1.7} +2025-02-05 22:22:34 - ERROR - stderr - 57%|█████▋ | 12691/22434 [12:14:54<6:47:00, 2.51s/it] +2025-02-05 22:22:36 - ERROR - stderr - 57%|█████▋ | 12692/22434 [12:14:56<6:47:25, 2.51s/it] +2025-02-05 22:22:36 - ERROR - stderr - +2025-02-05 22:22:36 - ERROR - stderr - +2025-02-05 
22:22:36 - INFO - stdout - {'loss': 0.6224, 'grad_norm': 1.248910903930664, 'learning_rate': 8.364398142373198e-06, 'epoch': 1.7} +2025-02-05 22:22:36 - ERROR - stderr - 57%|█████▋ | 12692/22434 [12:14:56<6:47:25, 2.51s/it] +2025-02-05 22:22:39 - ERROR - stderr - 57%|█████▋ | 12693/22434 [12:14:59<6:47:34, 2.51s/it] +2025-02-05 22:22:39 - ERROR - stderr - +2025-02-05 22:22:39 - ERROR - stderr - +2025-02-05 22:22:39 - INFO - stdout - {'loss': 0.791, 'grad_norm': 1.3601746559143066, 'learning_rate': 8.362973855220046e-06, 'epoch': 1.7} +2025-02-05 22:22:39 - ERROR - stderr - 57%|█████▋ | 12693/22434 [12:14:59<6:47:34, 2.51s/it] +2025-02-05 22:22:41 - ERROR - stderr - 57%|█████▋ | 12694/22434 [12:15:01<6:44:51, 2.49s/it] +2025-02-05 22:22:41 - ERROR - stderr - +2025-02-05 22:22:41 - ERROR - stderr - +2025-02-05 22:22:41 - INFO - stdout - {'loss': 0.5768, 'grad_norm': 1.1967077255249023, 'learning_rate': 8.361549602189145e-06, 'epoch': 1.7} +2025-02-05 22:22:41 - ERROR - stderr - 57%|█████▋ | 12694/22434 [12:15:01<6:44:51, 2.49s/it] +2025-02-05 22:22:44 - ERROR - stderr - 57%|█████▋ | 12695/22434 [12:15:04<6:45:32, 2.50s/it] +2025-02-05 22:22:44 - ERROR - stderr - +2025-02-05 22:22:44 - ERROR - stderr - +2025-02-05 22:22:44 - INFO - stdout - {'loss': 0.7103, 'grad_norm': 1.216409683227539, 'learning_rate': 8.360125383310167e-06, 'epoch': 1.7} +2025-02-05 22:22:44 - ERROR - stderr - 57%|█████▋ | 12695/22434 [12:15:04<6:45:32, 2.50s/it] +2025-02-05 22:22:46 - ERROR - stderr - 57%|█████▋ | 12696/22434 [12:15:06<6:45:47, 2.50s/it] +2025-02-05 22:22:46 - ERROR - stderr - +2025-02-05 22:22:46 - ERROR - stderr - +2025-02-05 22:22:46 - INFO - stdout - {'loss': 0.7192, 'grad_norm': 1.1632540225982666, 'learning_rate': 8.358701198612814e-06, 'epoch': 1.7} +2025-02-05 22:22:46 - ERROR - stderr - 57%|█████▋ | 12696/22434 [12:15:06<6:45:47, 2.50s/it] +2025-02-05 22:22:49 - ERROR - stderr - 57%|█████▋ | 12697/22434 [12:15:09<6:46:49, 2.51s/it] +2025-02-05 22:22:49 - ERROR - stderr - 
+2025-02-05 22:22:49 - ERROR - stderr - +2025-02-05 22:22:49 - INFO - stdout - {'loss': 0.7105, 'grad_norm': 1.1656478643417358, 'learning_rate': 8.35727704812676e-06, 'epoch': 1.7} +2025-02-05 22:22:49 - ERROR - stderr - 57%|█████▋ | 12697/22434 [12:15:09<6:46:49, 2.51s/it] +2025-02-05 22:22:51 - ERROR - stderr - 57%|█████▋ | 12698/22434 [12:15:11<6:45:18, 2.50s/it] +2025-02-05 22:22:51 - ERROR - stderr - +2025-02-05 22:22:51 - ERROR - stderr - +2025-02-05 22:22:51 - INFO - stdout - {'loss': 0.7968, 'grad_norm': 1.3913065195083618, 'learning_rate': 8.355852931881692e-06, 'epoch': 1.7} +2025-02-05 22:22:51 - ERROR - stderr - 57%|█████▋ | 12698/22434 [12:15:11<6:45:18, 2.50s/it] +2025-02-05 22:22:54 - ERROR - stderr - 57%|█████▋ | 12699/22434 [12:15:14<6:44:24, 2.49s/it] +2025-02-05 22:22:54 - ERROR - stderr - +2025-02-05 22:22:54 - ERROR - stderr - +2025-02-05 22:22:54 - INFO - stdout - {'loss': 0.6451, 'grad_norm': 1.0954447984695435, 'learning_rate': 8.354428849907298e-06, 'epoch': 1.7} +2025-02-05 22:22:54 - ERROR - stderr - 57%|█████▋ | 12699/22434 [12:15:14<6:44:24, 2.49s/it] +2025-02-05 22:22:56 - ERROR - stderr - 57%|█████▋ | 12700/22434 [12:15:16<6:43:15, 2.49s/it] +2025-02-05 22:22:56 - ERROR - stderr - +2025-02-05 22:22:56 - ERROR - stderr - +2025-02-05 22:22:56 - INFO - stdout - {'loss': 0.7283, 'grad_norm': 1.303686499595642, 'learning_rate': 8.353004802233262e-06, 'epoch': 1.7} +2025-02-05 22:22:56 - ERROR - stderr - 57%|█████▋ | 12700/22434 [12:15:16<6:43:15, 2.49s/it] +2025-02-05 22:22:59 - ERROR - stderr - 57%|█████▋ | 12701/22434 [12:15:19<6:57:26, 2.57s/it] +2025-02-05 22:22:59 - ERROR - stderr - +2025-02-05 22:22:59 - ERROR - stderr - +2025-02-05 22:22:59 - INFO - stdout - {'loss': 0.7608, 'grad_norm': 1.205492615699768, 'learning_rate': 8.35158078888926e-06, 'epoch': 1.7} +2025-02-05 22:22:59 - ERROR - stderr - 57%|█████▋ | 12701/22434 [12:15:19<6:57:26, 2.57s/it] +2025-02-05 22:23:02 - ERROR - stderr - 57%|█████▋ | 12702/22434 
[12:15:21<6:51:19, 2.54s/it] +2025-02-05 22:23:02 - ERROR - stderr - +2025-02-05 22:23:02 - ERROR - stderr - +2025-02-05 22:23:02 - INFO - stdout - {'loss': 0.7449, 'grad_norm': 1.4294120073318481, 'learning_rate': 8.350156809904984e-06, 'epoch': 1.7} +2025-02-05 22:23:02 - ERROR - stderr - 57%|█████▋ | 12702/22434 [12:15:21<6:51:19, 2.54s/it] +2025-02-05 22:23:04 - ERROR - stderr - 57%|█████▋ | 12703/22434 [12:15:24<6:46:59, 2.51s/it] +2025-02-05 22:23:04 - ERROR - stderr - +2025-02-05 22:23:04 - ERROR - stderr - +2025-02-05 22:23:04 - INFO - stdout - {'loss': 0.7288, 'grad_norm': 1.3111544847488403, 'learning_rate': 8.348732865310107e-06, 'epoch': 1.7} +2025-02-05 22:23:04 - ERROR - stderr - 57%|█████▋ | 12703/22434 [12:15:24<6:46:59, 2.51s/it] +2025-02-05 22:23:06 - ERROR - stderr - 57%|█████▋ | 12704/22434 [12:15:26<6:44:46, 2.50s/it] +2025-02-05 22:23:06 - ERROR - stderr - +2025-02-05 22:23:06 - ERROR - stderr - +2025-02-05 22:23:06 - INFO - stdout - {'loss': 0.6962, 'grad_norm': 1.2754124402999878, 'learning_rate': 8.347308955134317e-06, 'epoch': 1.7} +2025-02-05 22:23:06 - ERROR - stderr - 57%|█████▋ | 12704/22434 [12:15:26<6:44:46, 2.50s/it] +2025-02-05 22:23:09 - ERROR - stderr - 57%|█████▋ | 12705/22434 [12:15:29<6:47:54, 2.52s/it] +2025-02-05 22:23:09 - ERROR - stderr - +2025-02-05 22:23:09 - ERROR - stderr - +2025-02-05 22:23:09 - INFO - stdout - {'loss': 0.6674, 'grad_norm': 1.3212411403656006, 'learning_rate': 8.345885079407287e-06, 'epoch': 1.7} +2025-02-05 22:23:09 - ERROR - stderr - 57%|█████▋ | 12705/22434 [12:15:29<6:47:54, 2.52s/it] +2025-02-05 22:23:12 - ERROR - stderr - 57%|█████▋ | 12706/22434 [12:15:31<6:50:32, 2.53s/it] +2025-02-05 22:23:12 - ERROR - stderr - +2025-02-05 22:23:12 - ERROR - stderr - +2025-02-05 22:23:12 - INFO - stdout - {'loss': 0.657, 'grad_norm': 1.2182207107543945, 'learning_rate': 8.3444612381587e-06, 'epoch': 1.7} +2025-02-05 22:23:12 - ERROR - stderr - 57%|█████▋ | 12706/22434 [12:15:31<6:50:32, 2.53s/it] +2025-02-05 
22:23:14 - ERROR - stderr - 57%|█████▋ | 12707/22434 [12:15:34<6:49:26, 2.53s/it] +2025-02-05 22:23:14 - ERROR - stderr - +2025-02-05 22:23:14 - ERROR - stderr - +2025-02-05 22:23:14 - INFO - stdout - {'loss': 0.7699, 'grad_norm': 1.2850276231765747, 'learning_rate': 8.343037431418236e-06, 'epoch': 1.7} +2025-02-05 22:23:14 - ERROR - stderr - 57%|█████▋ | 12707/22434 [12:15:34<6:49:26, 2.53s/it] +2025-02-05 22:23:17 - ERROR - stderr - 57%|█████▋ | 12708/22434 [12:15:36<6:50:00, 2.53s/it] +2025-02-05 22:23:17 - ERROR - stderr - +2025-02-05 22:23:17 - ERROR - stderr - +2025-02-05 22:23:17 - INFO - stdout - {'loss': 0.6862, 'grad_norm': 1.2729556560516357, 'learning_rate': 8.341613659215574e-06, 'epoch': 1.7} +2025-02-05 22:23:17 - ERROR - stderr - 57%|█████▋ | 12708/22434 [12:15:36<6:50:00, 2.53s/it] +2025-02-05 22:23:19 - ERROR - stderr - 57%|█████▋ | 12709/22434 [12:15:39<6:48:12, 2.52s/it] +2025-02-05 22:23:19 - ERROR - stderr - +2025-02-05 22:23:19 - ERROR - stderr - +2025-02-05 22:23:19 - INFO - stdout - {'loss': 0.6737, 'grad_norm': 1.2506012916564941, 'learning_rate': 8.340189921580383e-06, 'epoch': 1.7} +2025-02-05 22:23:19 - ERROR - stderr - 57%|█████▋ | 12709/22434 [12:15:39<6:48:12, 2.52s/it] +2025-02-05 22:23:22 - ERROR - stderr - 57%|█████▋ | 12710/22434 [12:15:41<6:45:38, 2.50s/it] +2025-02-05 22:23:22 - ERROR - stderr - +2025-02-05 22:23:22 - ERROR - stderr - +2025-02-05 22:23:22 - INFO - stdout - {'loss': 0.7618, 'grad_norm': 1.2971998453140259, 'learning_rate': 8.338766218542348e-06, 'epoch': 1.7} +2025-02-05 22:23:22 - ERROR - stderr - 57%|█████▋ | 12710/22434 [12:15:41<6:45:38, 2.50s/it] +2025-02-05 22:23:24 - ERROR - stderr - 57%|█████▋ | 12711/22434 [12:15:44<6:45:53, 2.50s/it] +2025-02-05 22:23:24 - ERROR - stderr - +2025-02-05 22:23:24 - ERROR - stderr - +2025-02-05 22:23:24 - INFO - stdout - {'loss': 0.752, 'grad_norm': 1.3115055561065674, 'learning_rate': 8.337342550131137e-06, 'epoch': 1.7} +2025-02-05 22:23:24 - ERROR - stderr - 57%|█████▋ 
| 12711/22434 [12:15:44<6:45:53, 2.50s/it] +2025-02-05 22:23:27 - ERROR - stderr - 57%|█████▋ | 12712/22434 [12:15:46<6:48:01, 2.52s/it] +2025-02-05 22:23:27 - ERROR - stderr - +2025-02-05 22:23:27 - ERROR - stderr - +2025-02-05 22:23:27 - INFO - stdout - {'loss': 0.7266, 'grad_norm': 1.2073111534118652, 'learning_rate': 8.335918916376435e-06, 'epoch': 1.7} +2025-02-05 22:23:27 - ERROR - stderr - 57%|█████▋ | 12712/22434 [12:15:46<6:48:01, 2.52s/it] +2025-02-05 22:23:29 - ERROR - stderr - 57%|█████▋ | 12713/22434 [12:15:49<6:45:38, 2.50s/it] +2025-02-05 22:23:29 - ERROR - stderr - +2025-02-05 22:23:29 - ERROR - stderr - +2025-02-05 22:23:29 - INFO - stdout - {'loss': 0.7149, 'grad_norm': 1.380050539970398, 'learning_rate': 8.33449531730791e-06, 'epoch': 1.7} +2025-02-05 22:23:29 - ERROR - stderr - 57%|█████▋ | 12713/22434 [12:15:49<6:45:38, 2.50s/it] +2025-02-05 22:23:32 - ERROR - stderr - 57%|█████▋ | 12714/22434 [12:15:51<6:43:02, 2.49s/it] +2025-02-05 22:23:32 - ERROR - stderr - +2025-02-05 22:23:32 - ERROR - stderr - +2025-02-05 22:23:32 - INFO - stdout - {'loss': 0.7359, 'grad_norm': 1.2507132291793823, 'learning_rate': 8.333071752955233e-06, 'epoch': 1.7} +2025-02-05 22:23:32 - ERROR - stderr - 57%|█████▋ | 12714/22434 [12:15:51<6:43:02, 2.49s/it] +2025-02-05 22:23:34 - ERROR - stderr - 57%|█████▋ | 12715/22434 [12:15:54<6:40:12, 2.47s/it] +2025-02-05 22:23:34 - ERROR - stderr - +2025-02-05 22:23:34 - ERROR - stderr - +2025-02-05 22:23:34 - INFO - stdout - {'loss': 0.5877, 'grad_norm': 1.1087751388549805, 'learning_rate': 8.331648223348083e-06, 'epoch': 1.7} +2025-02-05 22:23:34 - ERROR - stderr - 57%|█████▋ | 12715/22434 [12:15:54<6:40:12, 2.47s/it] +2025-02-05 22:23:37 - ERROR - stderr - 57%|█████▋ | 12716/22434 [12:15:56<6:51:52, 2.54s/it] +2025-02-05 22:23:37 - ERROR - stderr - +2025-02-05 22:23:37 - ERROR - stderr - +2025-02-05 22:23:37 - INFO - stdout - {'loss': 0.681, 'grad_norm': 1.2438302040100098, 'learning_rate': 8.330224728516132e-06, 'epoch': 
1.7} +2025-02-05 22:23:37 - ERROR - stderr - 57%|█████▋ | 12716/22434 [12:15:57<6:51:52, 2.54s/it] +2025-02-05 22:23:39 - ERROR - stderr - 57%|█████▋ | 12717/22434 [12:15:59<6:46:26, 2.51s/it] +2025-02-05 22:23:39 - ERROR - stderr - +2025-02-05 22:23:39 - ERROR - stderr - +2025-02-05 22:23:39 - INFO - stdout - {'loss': 0.7267, 'grad_norm': 1.3261548280715942, 'learning_rate': 8.328801268489043e-06, 'epoch': 1.7} +2025-02-05 22:23:39 - ERROR - stderr - 57%|█████▋ | 12717/22434 [12:15:59<6:46:26, 2.51s/it] +2025-02-05 22:23:42 - ERROR - stderr - 57%|█████▋ | 12718/22434 [12:16:01<6:43:50, 2.49s/it] +2025-02-05 22:23:42 - ERROR - stderr - +2025-02-05 22:23:42 - ERROR - stderr - +2025-02-05 22:23:42 - INFO - stdout - {'loss': 0.6562, 'grad_norm': 1.1668728590011597, 'learning_rate': 8.327377843296493e-06, 'epoch': 1.7} +2025-02-05 22:23:42 - ERROR - stderr - 57%|█████▋ | 12718/22434 [12:16:01<6:43:50, 2.49s/it] +2025-02-05 22:23:44 - ERROR - stderr - 57%|█████▋ | 12719/22434 [12:16:04<6:41:23, 2.48s/it] +2025-02-05 22:23:44 - ERROR - stderr - +2025-02-05 22:23:44 - ERROR - stderr - +2025-02-05 22:23:44 - INFO - stdout - {'loss': 0.6628, 'grad_norm': 1.1999346017837524, 'learning_rate': 8.325954452968152e-06, 'epoch': 1.7} +2025-02-05 22:23:44 - ERROR - stderr - 57%|█████▋ | 12719/22434 [12:16:04<6:41:23, 2.48s/it] +2025-02-05 22:23:47 - ERROR - stderr - 57%|█████▋ | 12720/22434 [12:16:06<6:42:36, 2.49s/it] +2025-02-05 22:23:47 - ERROR - stderr - +2025-02-05 22:23:47 - ERROR - stderr - +2025-02-05 22:23:47 - INFO - stdout - {'loss': 0.729, 'grad_norm': 1.2253391742706299, 'learning_rate': 8.324531097533692e-06, 'epoch': 1.7} +2025-02-05 22:23:47 - ERROR - stderr - 57%|█████▋ | 12720/22434 [12:16:06<6:42:36, 2.49s/it] +2025-02-05 22:23:49 - ERROR - stderr - 57%|█████▋ | 12721/22434 [12:16:09<6:43:40, 2.49s/it] +2025-02-05 22:23:49 - ERROR - stderr - +2025-02-05 22:23:49 - ERROR - stderr - +2025-02-05 22:23:49 - INFO - stdout - {'loss': 0.7465, 'grad_norm': 
1.3300126791000366, 'learning_rate': 8.323107777022778e-06, 'epoch': 1.7} +2025-02-05 22:23:49 - ERROR - stderr - 57%|█████▋ | 12721/22434 [12:16:09<6:43:40, 2.49s/it] +2025-02-05 22:23:52 - ERROR - stderr - 57%|█████▋ | 12722/22434 [12:16:11<6:50:30, 2.54s/it] +2025-02-05 22:23:52 - ERROR - stderr - +2025-02-05 22:23:52 - ERROR - stderr - +2025-02-05 22:23:52 - INFO - stdout - {'loss': 0.6327, 'grad_norm': 1.2687304019927979, 'learning_rate': 8.321684491465072e-06, 'epoch': 1.7} +2025-02-05 22:23:52 - ERROR - stderr - 57%|█████▋ | 12722/22434 [12:16:12<6:50:30, 2.54s/it] +2025-02-05 22:23:54 - ERROR - stderr - 57%|█████▋ | 12723/22434 [12:16:14<6:49:49, 2.53s/it] +2025-02-05 22:23:54 - ERROR - stderr - +2025-02-05 22:23:54 - ERROR - stderr - +2025-02-05 22:23:54 - INFO - stdout - {'loss': 0.812, 'grad_norm': 1.4037764072418213, 'learning_rate': 8.320261240890253e-06, 'epoch': 1.7} +2025-02-05 22:23:54 - ERROR - stderr - 57%|█████▋ | 12723/22434 [12:16:14<6:49:49, 2.53s/it] +2025-02-05 22:23:57 - ERROR - stderr - 57%|█████▋ | 12724/22434 [12:16:16<6:45:49, 2.51s/it] +2025-02-05 22:23:57 - ERROR - stderr - +2025-02-05 22:23:57 - ERROR - stderr - +2025-02-05 22:23:57 - INFO - stdout - {'loss': 0.6976, 'grad_norm': 1.3550457954406738, 'learning_rate': 8.318838025327977e-06, 'epoch': 1.7} +2025-02-05 22:23:57 - ERROR - stderr - 57%|█████▋ | 12724/22434 [12:16:16<6:45:49, 2.51s/it] +2025-02-05 22:23:59 - ERROR - stderr - 57%|█████▋ | 12725/22434 [12:16:19<6:45:29, 2.51s/it] +2025-02-05 22:23:59 - ERROR - stderr - +2025-02-05 22:23:59 - ERROR - stderr - +2025-02-05 22:23:59 - INFO - stdout - {'loss': 0.7251, 'grad_norm': 1.3242225646972656, 'learning_rate': 8.317414844807915e-06, 'epoch': 1.7} +2025-02-05 22:23:59 - ERROR - stderr - 57%|█████▋ | 12725/22434 [12:16:19<6:45:29, 2.51s/it] +2025-02-05 22:24:02 - ERROR - stderr - 57%|█████▋ | 12726/22434 [12:16:21<6:47:13, 2.52s/it] +2025-02-05 22:24:02 - ERROR - stderr - +2025-02-05 22:24:02 - ERROR - stderr - +2025-02-05 
22:24:02 - INFO - stdout - {'loss': 0.7169, 'grad_norm': 1.2529979944229126, 'learning_rate': 8.31599169935973e-06, 'epoch': 1.7} +2025-02-05 22:24:02 - ERROR - stderr - 57%|█████▋ | 12726/22434 [12:16:22<6:47:13, 2.52s/it] +2025-02-05 22:24:04 - ERROR - stderr - 57%|█████▋ | 12727/22434 [12:16:24<6:50:45, 2.54s/it] +2025-02-05 22:24:04 - ERROR - stderr - +2025-02-05 22:24:04 - ERROR - stderr - +2025-02-05 22:24:04 - INFO - stdout - {'loss': 0.706, 'grad_norm': 1.2296323776245117, 'learning_rate': 8.314568589013085e-06, 'epoch': 1.7} +2025-02-05 22:24:04 - ERROR - stderr - 57%|█████▋ | 12727/22434 [12:16:24<6:50:45, 2.54s/it] +2025-02-05 22:24:07 - ERROR - stderr - 57%|█████▋ | 12728/22434 [12:16:27<6:50:00, 2.53s/it] +2025-02-05 22:24:07 - ERROR - stderr - +2025-02-05 22:24:07 - ERROR - stderr - +2025-02-05 22:24:07 - INFO - stdout - {'loss': 0.6014, 'grad_norm': 1.1020498275756836, 'learning_rate': 8.31314551379765e-06, 'epoch': 1.7} +2025-02-05 22:24:07 - ERROR - stderr - 57%|█████▋ | 12728/22434 [12:16:27<6:50:00, 2.53s/it] +2025-02-05 22:24:09 - ERROR - stderr - 57%|█████▋ | 12729/22434 [12:16:29<6:49:40, 2.53s/it] +2025-02-05 22:24:09 - ERROR - stderr - +2025-02-05 22:24:09 - ERROR - stderr - +2025-02-05 22:24:09 - INFO - stdout - {'loss': 0.729, 'grad_norm': 1.1587034463882446, 'learning_rate': 8.311722473743082e-06, 'epoch': 1.7} +2025-02-05 22:24:09 - ERROR - stderr - 57%|█████▋ | 12729/22434 [12:16:29<6:49:40, 2.53s/it] +2025-02-05 22:24:12 - ERROR - stderr - 57%|█████▋ | 12730/22434 [12:16:32<6:52:05, 2.55s/it] +2025-02-05 22:24:12 - ERROR - stderr - +2025-02-05 22:24:12 - ERROR - stderr - +2025-02-05 22:24:12 - INFO - stdout - {'loss': 0.7195, 'grad_norm': 1.3119348287582397, 'learning_rate': 8.31029946887904e-06, 'epoch': 1.7} +2025-02-05 22:24:12 - ERROR - stderr - 57%|█████▋ | 12730/22434 [12:16:32<6:52:05, 2.55s/it] +2025-02-05 22:24:14 - ERROR - stderr - 57%|█████▋ | 12731/22434 [12:16:34<6:46:05, 2.51s/it] +2025-02-05 22:24:14 - ERROR - stderr - 
+2025-02-05 22:24:14 - ERROR - stderr - +2025-02-05 22:24:14 - INFO - stdout - {'loss': 0.6822, 'grad_norm': 1.3630188703536987, 'learning_rate': 8.308876499235189e-06, 'epoch': 1.7} +2025-02-05 22:24:14 - ERROR - stderr - 57%|█████▋ | 12731/22434 [12:16:34<6:46:05, 2.51s/it] +2025-02-05 22:24:17 - ERROR - stderr - 57%|█████▋ | 12732/22434 [12:16:37<6:41:09, 2.48s/it] +2025-02-05 22:24:17 - ERROR - stderr - +2025-02-05 22:24:17 - ERROR - stderr - +2025-02-05 22:24:17 - INFO - stdout - {'loss': 0.6474, 'grad_norm': 1.112807035446167, 'learning_rate': 8.307453564841193e-06, 'epoch': 1.7} +2025-02-05 22:24:17 - ERROR - stderr - 57%|█████▋ | 12732/22434 [12:16:37<6:41:09, 2.48s/it] +2025-02-05 22:24:19 - ERROR - stderr - 57%|█████▋ | 12733/22434 [12:16:39<6:42:26, 2.49s/it] +2025-02-05 22:24:19 - ERROR - stderr - +2025-02-05 22:24:19 - ERROR - stderr - +2025-02-05 22:24:19 - INFO - stdout - {'loss': 0.7485, 'grad_norm': 1.187903881072998, 'learning_rate': 8.3060306657267e-06, 'epoch': 1.7} +2025-02-05 22:24:19 - ERROR - stderr - 57%|█████▋ | 12733/22434 [12:16:39<6:42:26, 2.49s/it] +2025-02-05 22:24:22 - ERROR - stderr - 57%|█████▋ | 12734/22434 [12:16:42<6:46:46, 2.52s/it] +2025-02-05 22:24:22 - ERROR - stderr - +2025-02-05 22:24:22 - ERROR - stderr - +2025-02-05 22:24:22 - INFO - stdout - {'loss': 0.6929, 'grad_norm': 1.320151925086975, 'learning_rate': 8.304607801921385e-06, 'epoch': 1.7} +2025-02-05 22:24:22 - ERROR - stderr - 57%|█████▋ | 12734/22434 [12:16:42<6:46:46, 2.52s/it] +2025-02-05 22:24:24 - ERROR - stderr - 57%|█████▋ | 12735/22434 [12:16:44<6:44:06, 2.50s/it] +2025-02-05 22:24:24 - ERROR - stderr - +2025-02-05 22:24:24 - ERROR - stderr - +2025-02-05 22:24:24 - INFO - stdout - {'loss': 0.7194, 'grad_norm': 1.2018464803695679, 'learning_rate': 8.303184973454893e-06, 'epoch': 1.7} +2025-02-05 22:24:24 - ERROR - stderr - 57%|█████▋ | 12735/22434 [12:16:44<6:44:06, 2.50s/it] +2025-02-05 22:24:27 - ERROR - stderr - 57%|█████▋ | 12736/22434 [12:16:47<6:40:43, 
2.48s/it] +2025-02-05 22:24:27 - ERROR - stderr - +2025-02-05 22:24:27 - ERROR - stderr - +2025-02-05 22:24:27 - INFO - stdout - {'loss': 0.7203, 'grad_norm': 1.2483669519424438, 'learning_rate': 8.301762180356891e-06, 'epoch': 1.7} +2025-02-05 22:24:27 - ERROR - stderr - 57%|█████▋ | 12736/22434 [12:16:47<6:40:43, 2.48s/it] +2025-02-05 22:24:29 - ERROR - stderr - 57%|█████▋ | 12737/22434 [12:16:49<6:40:00, 2.48s/it] +2025-02-05 22:24:29 - ERROR - stderr - +2025-02-05 22:24:29 - ERROR - stderr - +2025-02-05 22:24:29 - INFO - stdout - {'loss': 0.6704, 'grad_norm': 1.119397759437561, 'learning_rate': 8.300339422657027e-06, 'epoch': 1.7} +2025-02-05 22:24:29 - ERROR - stderr - 57%|█████▋ | 12737/22434 [12:16:49<6:40:00, 2.48s/it] +2025-02-05 22:24:32 - ERROR - stderr - 57%|█████▋ | 12738/22434 [12:16:51<6:40:10, 2.48s/it] +2025-02-05 22:24:32 - ERROR - stderr - +2025-02-05 22:24:32 - ERROR - stderr - +2025-02-05 22:24:32 - INFO - stdout - {'loss': 0.6505, 'grad_norm': 1.1859266757965088, 'learning_rate': 8.29891670038496e-06, 'epoch': 1.7} +2025-02-05 22:24:32 - ERROR - stderr - 57%|█████▋ | 12738/22434 [12:16:52<6:40:10, 2.48s/it] +2025-02-05 22:24:34 - ERROR - stderr - 57%|█████▋ | 12739/22434 [12:16:54<6:38:29, 2.47s/it] +2025-02-05 22:24:34 - ERROR - stderr - +2025-02-05 22:24:34 - ERROR - stderr - +2025-02-05 22:24:34 - INFO - stdout - {'loss': 0.6685, 'grad_norm': 1.1386950016021729, 'learning_rate': 8.297494013570354e-06, 'epoch': 1.7} +2025-02-05 22:24:34 - ERROR - stderr - 57%|█████▋ | 12739/22434 [12:16:54<6:38:29, 2.47s/it] +2025-02-05 22:24:37 - ERROR - stderr - 57%|█████▋ | 12740/22434 [12:16:56<6:41:42, 2.49s/it] +2025-02-05 22:24:37 - ERROR - stderr - +2025-02-05 22:24:37 - ERROR - stderr - +2025-02-05 22:24:37 - INFO - stdout - {'loss': 0.7104, 'grad_norm': 1.216395378112793, 'learning_rate': 8.296071362242853e-06, 'epoch': 1.7} +2025-02-05 22:24:37 - ERROR - stderr - 57%|█████▋ | 12740/22434 [12:16:56<6:41:42, 2.49s/it] +2025-02-05 22:24:39 - ERROR - 
stderr - 57%|█████▋ | 12741/22434 [12:16:59<6:44:30, 2.50s/it] +2025-02-05 22:24:39 - ERROR - stderr - +2025-02-05 22:24:39 - ERROR - stderr - +2025-02-05 22:24:39 - INFO - stdout - {'loss': 0.6096, 'grad_norm': 1.215865969657898, 'learning_rate': 8.29464874643211e-06, 'epoch': 1.7} +2025-02-05 22:24:39 - ERROR - stderr - 57%|█████▋ | 12741/22434 [12:16:59<6:44:30, 2.50s/it] +2025-02-05 22:24:42 - ERROR - stderr - 57%|█████▋ | 12742/22434 [12:17:02<6:46:26, 2.52s/it] +2025-02-05 22:24:42 - ERROR - stderr - +2025-02-05 22:24:42 - ERROR - stderr - +2025-02-05 22:24:42 - INFO - stdout - {'loss': 0.7321, 'grad_norm': 1.287767767906189, 'learning_rate': 8.293226166167788e-06, 'epoch': 1.7} +2025-02-05 22:24:42 - ERROR - stderr - 57%|█████▋ | 12742/22434 [12:17:02<6:46:26, 2.52s/it] +2025-02-05 22:24:44 - ERROR - stderr - 57%|█████▋ | 12743/22434 [12:17:04<6:47:16, 2.52s/it] +2025-02-05 22:24:44 - ERROR - stderr - +2025-02-05 22:24:44 - ERROR - stderr - +2025-02-05 22:24:44 - INFO - stdout - {'loss': 0.6479, 'grad_norm': 1.1900346279144287, 'learning_rate': 8.291803621479528e-06, 'epoch': 1.7} +2025-02-05 22:24:44 - ERROR - stderr - 57%|█████▋ | 12743/22434 [12:17:04<6:47:16, 2.52s/it] +2025-02-05 22:24:47 - ERROR - stderr - 57%|█████▋ | 12744/22434 [12:17:07<6:47:53, 2.53s/it] +2025-02-05 22:24:47 - ERROR - stderr - +2025-02-05 22:24:47 - ERROR - stderr - +2025-02-05 22:24:47 - INFO - stdout - {'loss': 0.6526, 'grad_norm': 1.2542630434036255, 'learning_rate': 8.290381112396989e-06, 'epoch': 1.7} +2025-02-05 22:24:47 - ERROR - stderr - 57%|█████▋ | 12744/22434 [12:17:07<6:47:53, 2.53s/it] +2025-02-05 22:24:50 - ERROR - stderr - 57%|█████▋ | 12745/22434 [12:17:09<6:55:24, 2.57s/it] +2025-02-05 22:24:50 - ERROR - stderr - +2025-02-05 22:24:50 - ERROR - stderr - +2025-02-05 22:24:50 - INFO - stdout - {'loss': 0.6047, 'grad_norm': 1.2381327152252197, 'learning_rate': 8.288958638949822e-06, 'epoch': 1.7} +2025-02-05 22:24:50 - ERROR - stderr - 57%|█████▋ | 12745/22434 
[12:17:09<6:55:24, 2.57s/it] +2025-02-05 22:24:52 - ERROR - stderr - 57%|█████▋ | 12746/22434 [12:17:12<6:51:11, 2.55s/it] +2025-02-05 22:24:52 - ERROR - stderr - +2025-02-05 22:24:52 - ERROR - stderr - +2025-02-05 22:24:52 - INFO - stdout - {'loss': 0.7326, 'grad_norm': 1.3802180290222168, 'learning_rate': 8.28753620116767e-06, 'epoch': 1.7} +2025-02-05 22:24:52 - ERROR - stderr - 57%|█████▋ | 12746/22434 [12:17:12<6:51:11, 2.55s/it] +2025-02-05 22:24:54 - ERROR - stderr - 57%|█████▋ | 12747/22434 [12:17:14<6:46:21, 2.52s/it] +2025-02-05 22:24:54 - ERROR - stderr - +2025-02-05 22:24:54 - ERROR - stderr - +2025-02-05 22:24:54 - INFO - stdout - {'loss': 0.689, 'grad_norm': 1.191171407699585, 'learning_rate': 8.286113799080192e-06, 'epoch': 1.7} +2025-02-05 22:24:54 - ERROR - stderr - 57%|█████▋ | 12747/22434 [12:17:14<6:46:21, 2.52s/it] +2025-02-05 22:24:57 - ERROR - stderr - 57%|█████▋ | 12748/22434 [12:17:17<6:58:31, 2.59s/it] +2025-02-05 22:24:57 - ERROR - stderr - +2025-02-05 22:24:57 - ERROR - stderr - +2025-02-05 22:24:57 - INFO - stdout - {'loss': 0.7179, 'grad_norm': 1.2118676900863647, 'learning_rate': 8.284691432717028e-06, 'epoch': 1.7} +2025-02-05 22:24:57 - ERROR - stderr - 57%|█████▋ | 12748/22434 [12:17:17<6:58:31, 2.59s/it] +2025-02-05 22:25:00 - ERROR - stderr - 57%|█████▋ | 12749/22434 [12:17:19<6:51:13, 2.55s/it] +2025-02-05 22:25:00 - ERROR - stderr - +2025-02-05 22:25:00 - ERROR - stderr - +2025-02-05 22:25:00 - INFO - stdout - {'loss': 0.6396, 'grad_norm': 1.221256971359253, 'learning_rate': 8.283269102107832e-06, 'epoch': 1.7} +2025-02-05 22:25:00 - ERROR - stderr - 57%|█████▋ | 12749/22434 [12:17:19<6:51:13, 2.55s/it] +2025-02-05 22:25:02 - ERROR - stderr - 57%|█████▋ | 12750/22434 [12:17:22<6:45:42, 2.51s/it] +2025-02-05 22:25:02 - ERROR - stderr - +2025-02-05 22:25:02 - ERROR - stderr - +2025-02-05 22:25:02 - INFO - stdout - {'loss': 0.6516, 'grad_norm': 1.2459454536437988, 'learning_rate': 8.281846807282248e-06, 'epoch': 1.71} +2025-02-05 
22:25:02 - ERROR - stderr - 57%|█████▋ | 12750/22434 [12:17:22<6:45:42, 2.51s/it] +2025-02-05 22:25:05 - ERROR - stderr - 57%|█████▋ | 12751/22434 [12:17:24<6:44:32, 2.51s/it] +2025-02-05 22:25:05 - ERROR - stderr - +2025-02-05 22:25:05 - ERROR - stderr - +2025-02-05 22:25:05 - INFO - stdout - {'loss': 0.6838, 'grad_norm': 1.3179590702056885, 'learning_rate': 8.280424548269922e-06, 'epoch': 1.71} +2025-02-05 22:25:05 - ERROR - stderr - 57%|█████▋ | 12751/22434 [12:17:24<6:44:32, 2.51s/it] +2025-02-05 22:25:07 - ERROR - stderr - 57%|█████▋ | 12752/22434 [12:17:27<6:45:39, 2.51s/it] +2025-02-05 22:25:07 - ERROR - stderr - +2025-02-05 22:25:07 - ERROR - stderr - +2025-02-05 22:25:07 - INFO - stdout - {'loss': 0.6699, 'grad_norm': 1.1464793682098389, 'learning_rate': 8.279002325100505e-06, 'epoch': 1.71} +2025-02-05 22:25:07 - ERROR - stderr - 57%|█████▋ | 12752/22434 [12:17:27<6:45:39, 2.51s/it] +2025-02-05 22:25:10 - ERROR - stderr - 57%|█████▋ | 12753/22434 [12:17:29<6:46:50, 2.52s/it] +2025-02-05 22:25:10 - ERROR - stderr - +2025-02-05 22:25:10 - ERROR - stderr - +2025-02-05 22:25:10 - INFO - stdout - {'loss': 0.705, 'grad_norm': 1.2682521343231201, 'learning_rate': 8.277580137803636e-06, 'epoch': 1.71} +2025-02-05 22:25:10 - ERROR - stderr - 57%|█████▋ | 12753/22434 [12:17:29<6:46:50, 2.52s/it] +2025-02-05 22:25:12 - ERROR - stderr - 57%|█████▋ | 12754/22434 [12:17:32<6:45:27, 2.51s/it] +2025-02-05 22:25:12 - ERROR - stderr - +2025-02-05 22:25:12 - ERROR - stderr - +2025-02-05 22:25:12 - INFO - stdout - {'loss': 0.6312, 'grad_norm': 1.2531061172485352, 'learning_rate': 8.276157986408959e-06, 'epoch': 1.71} +2025-02-05 22:25:12 - ERROR - stderr - 57%|█████▋ | 12754/22434 [12:17:32<6:45:27, 2.51s/it] +2025-02-05 22:25:15 - ERROR - stderr - 57%|█████▋ | 12755/22434 [12:17:34<6:44:09, 2.51s/it] +2025-02-05 22:25:15 - ERROR - stderr - +2025-02-05 22:25:15 - ERROR - stderr - +2025-02-05 22:25:15 - INFO - stdout - {'loss': 0.6488, 'grad_norm': 1.2168264389038086, 
'learning_rate': 8.274735870946122e-06, 'epoch': 1.71} +2025-02-05 22:25:15 - ERROR - stderr - 57%|█████▋ | 12755/22434 [12:17:34<6:44:09, 2.51s/it] +2025-02-05 22:25:17 - ERROR - stderr - 57%|█████▋ | 12756/22434 [12:17:37<6:41:04, 2.49s/it] +2025-02-05 22:25:17 - ERROR - stderr - +2025-02-05 22:25:17 - ERROR - stderr - +2025-02-05 22:25:17 - INFO - stdout - {'loss': 0.8013, 'grad_norm': 1.4569647312164307, 'learning_rate': 8.273313791444762e-06, 'epoch': 1.71} +2025-02-05 22:25:17 - ERROR - stderr - 57%|█████▋ | 12756/22434 [12:17:37<6:41:04, 2.49s/it] +2025-02-05 22:25:19 - ERROR - stderr - 57%|█████▋ | 12757/22434 [12:17:39<6:37:56, 2.47s/it] +2025-02-05 22:25:20 - ERROR - stderr - +2025-02-05 22:25:20 - ERROR - stderr - +2025-02-05 22:25:20 - INFO - stdout - {'loss': 0.6787, 'grad_norm': 1.2833836078643799, 'learning_rate': 8.271891747934524e-06, 'epoch': 1.71} +2025-02-05 22:25:20 - ERROR - stderr - 57%|█████▋ | 12757/22434 [12:17:39<6:37:56, 2.47s/it] +2025-02-05 22:25:22 - ERROR - stderr - 57%|█████▋ | 12758/22434 [12:17:42<6:40:00, 2.48s/it] +2025-02-05 22:25:22 - ERROR - stderr - +2025-02-05 22:25:22 - ERROR - stderr - +2025-02-05 22:25:22 - INFO - stdout - {'loss': 0.6702, 'grad_norm': 1.269630789756775, 'learning_rate': 8.270469740445052e-06, 'epoch': 1.71} +2025-02-05 22:25:22 - ERROR - stderr - 57%|█████▋ | 12758/22434 [12:17:42<6:40:00, 2.48s/it] +2025-02-05 22:25:24 - ERROR - stderr - 57%|█████▋ | 12759/22434 [12:17:44<6:36:28, 2.46s/it] +2025-02-05 22:25:24 - ERROR - stderr - +2025-02-05 22:25:24 - ERROR - stderr - +2025-02-05 22:25:24 - INFO - stdout - {'loss': 0.773, 'grad_norm': 1.4853568077087402, 'learning_rate': 8.269047769005978e-06, 'epoch': 1.71} +2025-02-05 22:25:24 - ERROR - stderr - 57%|█████▋ | 12759/22434 [12:17:44<6:36:28, 2.46s/it] +2025-02-05 22:25:27 - ERROR - stderr - 57%|█████▋ | 12760/22434 [12:17:47<6:37:15, 2.46s/it] +2025-02-05 22:25:27 - ERROR - stderr - +2025-02-05 22:25:27 - ERROR - stderr - +2025-02-05 22:25:27 - INFO - 
stdout - {'loss': 0.7471, 'grad_norm': 1.2489780187606812, 'learning_rate': 8.267625833646952e-06, 'epoch': 1.71} +2025-02-05 22:25:27 - ERROR - stderr - 57%|█████▋ | 12760/22434 [12:17:47<6:37:15, 2.46s/it] +2025-02-05 22:25:29 - ERROR - stderr - 57%|█████▋ | 12761/22434 [12:17:49<6:36:28, 2.46s/it] +2025-02-05 22:25:29 - ERROR - stderr - +2025-02-05 22:25:29 - ERROR - stderr - +2025-02-05 22:25:29 - INFO - stdout - {'loss': 0.6356, 'grad_norm': 1.2595922946929932, 'learning_rate': 8.266203934397608e-06, 'epoch': 1.71} +2025-02-05 22:25:29 - ERROR - stderr - 57%|█████▋ | 12761/22434 [12:17:49<6:36:28, 2.46s/it] +2025-02-05 22:25:32 - ERROR - stderr - 57%|█████▋ | 12762/22434 [12:17:52<6:57:58, 2.59s/it] +2025-02-05 22:25:32 - ERROR - stderr - +2025-02-05 22:25:32 - ERROR - stderr - +2025-02-05 22:25:32 - INFO - stdout - {'loss': 0.6659, 'grad_norm': 1.1662458181381226, 'learning_rate': 8.26478207128758e-06, 'epoch': 1.71} +2025-02-05 22:25:32 - ERROR - stderr - 57%|█████▋ | 12762/22434 [12:17:52<6:57:58, 2.59s/it] +2025-02-05 22:25:35 - ERROR - stderr - 57%|█████▋ | 12763/22434 [12:17:54<6:50:29, 2.55s/it] +2025-02-05 22:25:35 - ERROR - stderr - +2025-02-05 22:25:35 - ERROR - stderr - +2025-02-05 22:25:35 - INFO - stdout - {'loss': 0.7442, 'grad_norm': 1.3388166427612305, 'learning_rate': 8.26336024434651e-06, 'epoch': 1.71} +2025-02-05 22:25:35 - ERROR - stderr - 57%|█████▋ | 12763/22434 [12:17:55<6:50:29, 2.55s/it] +2025-02-05 22:25:37 - ERROR - stderr - 57%|█████▋ | 12764/22434 [12:17:57<6:48:27, 2.53s/it] +2025-02-05 22:25:37 - ERROR - stderr - +2025-02-05 22:25:37 - ERROR - stderr - +2025-02-05 22:25:37 - INFO - stdout - {'loss': 0.6793, 'grad_norm': 1.1608309745788574, 'learning_rate': 8.261938453604033e-06, 'epoch': 1.71} +2025-02-05 22:25:37 - ERROR - stderr - 57%|█████▋ | 12764/22434 [12:17:57<6:48:27, 2.53s/it] +2025-02-05 22:25:40 - ERROR - stderr - 57%|█████▋ | 12765/22434 [12:18:00<6:48:56, 2.54s/it] +2025-02-05 22:25:40 - ERROR - stderr - 
+2025-02-05 22:25:40 - ERROR - stderr - +2025-02-05 22:25:40 - INFO - stdout - {'loss': 0.598, 'grad_norm': 1.0983465909957886, 'learning_rate': 8.26051669908979e-06, 'epoch': 1.71} +2025-02-05 22:25:40 - ERROR - stderr - 57%|█████▋ | 12765/22434 [12:18:00<6:48:56, 2.54s/it] +2025-02-05 22:25:42 - ERROR - stderr - 57%|█████▋ | 12766/22434 [12:18:02<6:43:56, 2.51s/it] +2025-02-05 22:25:42 - ERROR - stderr - +2025-02-05 22:25:42 - ERROR - stderr - +2025-02-05 22:25:42 - INFO - stdout - {'loss': 0.6322, 'grad_norm': 1.1819957494735718, 'learning_rate': 8.259094980833411e-06, 'epoch': 1.71} +2025-02-05 22:25:42 - ERROR - stderr - 57%|█████▋ | 12766/22434 [12:18:02<6:43:56, 2.51s/it] +2025-02-05 22:25:45 - ERROR - stderr - 57%|█████▋ | 12767/22434 [12:18:05<6:49:26, 2.54s/it] +2025-02-05 22:25:45 - ERROR - stderr - +2025-02-05 22:25:45 - ERROR - stderr - +2025-02-05 22:25:45 - INFO - stdout - {'loss': 0.6791, 'grad_norm': 1.0930172204971313, 'learning_rate': 8.257673298864528e-06, 'epoch': 1.71} +2025-02-05 22:25:45 - ERROR - stderr - 57%|█████▋ | 12767/22434 [12:18:05<6:49:26, 2.54s/it] +2025-02-05 22:25:47 - ERROR - stderr - 57%|█████▋ | 12768/22434 [12:18:07<6:42:51, 2.50s/it] +2025-02-05 22:25:47 - ERROR - stderr - +2025-02-05 22:25:47 - ERROR - stderr - +2025-02-05 22:25:47 - INFO - stdout - {'loss': 0.7685, 'grad_norm': 1.575594425201416, 'learning_rate': 8.256251653212783e-06, 'epoch': 1.71} +2025-02-05 22:25:47 - ERROR - stderr - 57%|█████▋ | 12768/22434 [12:18:07<6:42:51, 2.50s/it] +2025-02-05 22:25:50 - ERROR - stderr - 57%|█████▋ | 12769/22434 [12:18:09<6:41:11, 2.49s/it] +2025-02-05 22:25:50 - ERROR - stderr - +2025-02-05 22:25:50 - ERROR - stderr - +2025-02-05 22:25:50 - INFO - stdout - {'loss': 0.5386, 'grad_norm': 1.1964302062988281, 'learning_rate': 8.254830043907799e-06, 'epoch': 1.71} +2025-02-05 22:25:50 - ERROR - stderr - 57%|█████▋ | 12769/22434 [12:18:09<6:41:11, 2.49s/it] +2025-02-05 22:25:52 - ERROR - stderr - 57%|█████▋ | 12770/22434 
[12:18:12<6:38:37, 2.47s/it] +2025-02-05 22:25:52 - ERROR - stderr - +2025-02-05 22:25:52 - ERROR - stderr - +2025-02-05 22:25:52 - INFO - stdout - {'loss': 0.6507, 'grad_norm': 1.2177408933639526, 'learning_rate': 8.253408470979212e-06, 'epoch': 1.71} +2025-02-05 22:25:52 - ERROR - stderr - 57%|█████▋ | 12770/22434 [12:18:12<6:38:37, 2.47s/it] +2025-02-05 22:25:55 - ERROR - stderr - 57%|█████▋ | 12771/22434 [12:18:14<6:39:12, 2.48s/it] +2025-02-05 22:25:55 - ERROR - stderr - +2025-02-05 22:25:55 - ERROR - stderr - +2025-02-05 22:25:55 - INFO - stdout - {'loss': 0.6868, 'grad_norm': 1.3633873462677002, 'learning_rate': 8.251986934456658e-06, 'epoch': 1.71} +2025-02-05 22:25:55 - ERROR - stderr - 57%|█████▋ | 12771/22434 [12:18:14<6:39:12, 2.48s/it] +2025-02-05 22:25:57 - ERROR - stderr - 57%|█████▋ | 12772/22434 [12:18:17<6:40:29, 2.49s/it] +2025-02-05 22:25:57 - ERROR - stderr - +2025-02-05 22:25:57 - ERROR - stderr - +2025-02-05 22:25:57 - INFO - stdout - {'loss': 0.6643, 'grad_norm': 1.1916502714157104, 'learning_rate': 8.25056543436976e-06, 'epoch': 1.71} +2025-02-05 22:25:57 - ERROR - stderr - 57%|█████▋ | 12772/22434 [12:18:17<6:40:29, 2.49s/it] +2025-02-05 22:26:00 - ERROR - stderr - 57%|█████▋ | 12773/22434 [12:18:19<6:42:08, 2.50s/it] +2025-02-05 22:26:00 - ERROR - stderr - +2025-02-05 22:26:00 - ERROR - stderr - +2025-02-05 22:26:00 - INFO - stdout - {'loss': 0.7083, 'grad_norm': 1.1619648933410645, 'learning_rate': 8.249143970748155e-06, 'epoch': 1.71} +2025-02-05 22:26:00 - ERROR - stderr - 57%|█████▋ | 12773/22434 [12:18:19<6:42:08, 2.50s/it] +2025-02-05 22:26:02 - ERROR - stderr - 57%|█████▋ | 12774/22434 [12:18:22<6:40:24, 2.49s/it] +2025-02-05 22:26:02 - ERROR - stderr - +2025-02-05 22:26:02 - ERROR - stderr - +2025-02-05 22:26:02 - INFO - stdout - {'loss': 0.6981, 'grad_norm': 1.3134262561798096, 'learning_rate': 8.24772254362147e-06, 'epoch': 1.71} +2025-02-05 22:26:02 - ERROR - stderr - 57%|█████▋ | 12774/22434 [12:18:22<6:40:24, 2.49s/it] 
+2025-02-05 22:26:04 - ERROR - stderr - 57%|█████▋ | 12775/22434 [12:18:24<6:36:36, 2.46s/it] +2025-02-05 22:26:05 - ERROR - stderr - +2025-02-05 22:26:05 - ERROR - stderr - +2025-02-05 22:26:05 - INFO - stdout - {'loss': 0.7662, 'grad_norm': 1.3022657632827759, 'learning_rate': 8.246301153019326e-06, 'epoch': 1.71} +2025-02-05 22:26:05 - ERROR - stderr - 57%|█████▋ | 12775/22434 [12:18:24<6:36:36, 2.46s/it] +2025-02-05 22:26:07 - ERROR - stderr - 57%|█████▋ | 12776/22434 [12:18:27<6:35:16, 2.46s/it] +2025-02-05 22:26:07 - ERROR - stderr - +2025-02-05 22:26:07 - ERROR - stderr - +2025-02-05 22:26:07 - INFO - stdout - {'loss': 0.6897, 'grad_norm': 1.261699914932251, 'learning_rate': 8.24487979897136e-06, 'epoch': 1.71} +2025-02-05 22:26:07 - ERROR - stderr - 57%|█████▋ | 12776/22434 [12:18:27<6:35:16, 2.46s/it] +2025-02-05 22:26:09 - ERROR - stderr - 57%|█████▋ | 12777/22434 [12:18:29<6:36:40, 2.46s/it] +2025-02-05 22:26:09 - ERROR - stderr - +2025-02-05 22:26:09 - ERROR - stderr - +2025-02-05 22:26:09 - INFO - stdout - {'loss': 0.7069, 'grad_norm': 1.2626806497573853, 'learning_rate': 8.243458481507195e-06, 'epoch': 1.71} +2025-02-05 22:26:09 - ERROR - stderr - 57%|█████▋ | 12777/22434 [12:18:29<6:36:40, 2.46s/it] +2025-02-05 22:26:12 - ERROR - stderr - 57%|█████▋ | 12778/22434 [12:18:32<6:45:52, 2.52s/it] +2025-02-05 22:26:12 - ERROR - stderr - +2025-02-05 22:26:12 - ERROR - stderr - +2025-02-05 22:26:12 - INFO - stdout - {'loss': 0.6861, 'grad_norm': 1.2278270721435547, 'learning_rate': 8.242037200656455e-06, 'epoch': 1.71} +2025-02-05 22:26:12 - ERROR - stderr - 57%|█████▋ | 12778/22434 [12:18:32<6:45:52, 2.52s/it] +2025-02-05 22:26:15 - ERROR - stderr - 57%|█████▋ | 12779/22434 [12:18:34<6:42:31, 2.50s/it] +2025-02-05 22:26:15 - ERROR - stderr - +2025-02-05 22:26:15 - ERROR - stderr - +2025-02-05 22:26:15 - INFO - stdout - {'loss': 0.8066, 'grad_norm': 1.3510301113128662, 'learning_rate': 8.24061595644877e-06, 'epoch': 1.71} +2025-02-05 22:26:15 - ERROR - 
stderr - 57%|█████▋ | 12779/22434 [12:18:34<6:42:31, 2.50s/it] +2025-02-05 22:26:17 - ERROR - stderr - 57%|█████▋ | 12780/22434 [12:18:37<6:43:04, 2.51s/it] +2025-02-05 22:26:17 - ERROR - stderr - +2025-02-05 22:26:17 - ERROR - stderr - +2025-02-05 22:26:17 - INFO - stdout - {'loss': 0.7114, 'grad_norm': 1.2224266529083252, 'learning_rate': 8.23919474891376e-06, 'epoch': 1.71} +2025-02-05 22:26:17 - ERROR - stderr - 57%|█████▋ | 12780/22434 [12:18:37<6:43:04, 2.51s/it] +2025-02-05 22:26:20 - ERROR - stderr - 57%|█████▋ | 12781/22434 [12:18:40<7:05:03, 2.64s/it] +2025-02-05 22:26:20 - ERROR - stderr - +2025-02-05 22:26:20 - ERROR - stderr - +2025-02-05 22:26:20 - INFO - stdout - {'loss': 0.7171, 'grad_norm': 1.255384922027588, 'learning_rate': 8.237773578081052e-06, 'epoch': 1.71} +2025-02-05 22:26:20 - ERROR - stderr - 57%|█████▋ | 12781/22434 [12:18:40<7:05:03, 2.64s/it] +2025-02-05 22:26:23 - ERROR - stderr - 57%|█████▋ | 12782/22434 [12:18:42<7:00:30, 2.61s/it] +2025-02-05 22:26:23 - ERROR - stderr - +2025-02-05 22:26:23 - ERROR - stderr - +2025-02-05 22:26:23 - INFO - stdout - {'loss': 0.6688, 'grad_norm': 1.1592470407485962, 'learning_rate': 8.236352443980268e-06, 'epoch': 1.71} +2025-02-05 22:26:23 - ERROR - stderr - 57%|█████▋ | 12782/22434 [12:18:42<7:00:30, 2.61s/it] +2025-02-05 22:26:25 - ERROR - stderr - 57%|█████▋ | 12783/22434 [12:18:45<6:57:07, 2.59s/it] +2025-02-05 22:26:25 - ERROR - stderr - +2025-02-05 22:26:25 - ERROR - stderr - +2025-02-05 22:26:25 - INFO - stdout - {'loss': 0.7049, 'grad_norm': 1.289285659790039, 'learning_rate': 8.234931346641025e-06, 'epoch': 1.71} +2025-02-05 22:26:25 - ERROR - stderr - 57%|█████▋ | 12783/22434 [12:18:45<6:57:07, 2.59s/it] +2025-02-05 22:26:28 - ERROR - stderr - 57%|█████▋ | 12784/22434 [12:18:47<6:52:28, 2.56s/it] +2025-02-05 22:26:28 - ERROR - stderr - +2025-02-05 22:26:28 - ERROR - stderr - +2025-02-05 22:26:28 - INFO - stdout - {'loss': 0.7764, 'grad_norm': 1.2794567346572876, 'learning_rate': 
8.233510286092955e-06, 'epoch': 1.71} +2025-02-05 22:26:28 - ERROR - stderr - 57%|█████▋ | 12784/22434 [12:18:47<6:52:28, 2.56s/it] +2025-02-05 22:26:30 - ERROR - stderr - 57%|█████▋ | 12785/22434 [12:18:50<6:47:31, 2.53s/it] +2025-02-05 22:26:30 - ERROR - stderr - +2025-02-05 22:26:30 - ERROR - stderr - +2025-02-05 22:26:30 - INFO - stdout - {'loss': 0.6169, 'grad_norm': 1.1646713018417358, 'learning_rate': 8.232089262365672e-06, 'epoch': 1.71} +2025-02-05 22:26:30 - ERROR - stderr - 57%|█████▋ | 12785/22434 [12:18:50<6:47:31, 2.53s/it] +2025-02-05 22:26:33 - ERROR - stderr - 57%|█████▋ | 12786/22434 [12:18:52<6:45:54, 2.52s/it] +2025-02-05 22:26:33 - ERROR - stderr - +2025-02-05 22:26:33 - ERROR - stderr - +2025-02-05 22:26:33 - INFO - stdout - {'loss': 0.6895, 'grad_norm': 1.176916480064392, 'learning_rate': 8.230668275488794e-06, 'epoch': 1.71} +2025-02-05 22:26:33 - ERROR - stderr - 57%|█████▋ | 12786/22434 [12:18:52<6:45:54, 2.52s/it] +2025-02-05 22:26:35 - ERROR - stderr - 57%|█████▋ | 12787/22434 [12:18:55<6:41:09, 2.50s/it] +2025-02-05 22:26:35 - ERROR - stderr - +2025-02-05 22:26:35 - ERROR - stderr - +2025-02-05 22:26:35 - INFO - stdout - {'loss': 0.6589, 'grad_norm': 1.2405433654785156, 'learning_rate': 8.229247325491945e-06, 'epoch': 1.71} +2025-02-05 22:26:35 - ERROR - stderr - 57%|█████▋ | 12787/22434 [12:18:55<6:41:09, 2.50s/it] +2025-02-05 22:26:37 - ERROR - stderr - 57%|█████▋ | 12788/22434 [12:18:57<6:41:12, 2.50s/it] +2025-02-05 22:26:38 - ERROR - stderr - +2025-02-05 22:26:38 - ERROR - stderr - +2025-02-05 22:26:38 - INFO - stdout - {'loss': 0.684, 'grad_norm': 1.5124691724777222, 'learning_rate': 8.227826412404737e-06, 'epoch': 1.71} +2025-02-05 22:26:38 - ERROR - stderr - 57%|█████▋ | 12788/22434 [12:18:57<6:41:12, 2.50s/it] +2025-02-05 22:26:40 - ERROR - stderr - 57%|█████▋ | 12789/22434 [12:19:00<6:44:20, 2.52s/it] +2025-02-05 22:26:40 - ERROR - stderr - +2025-02-05 22:26:40 - ERROR - stderr - +2025-02-05 22:26:40 - INFO - stdout - {'loss': 
0.7416, 'grad_norm': 1.2935627698898315, 'learning_rate': 8.226405536256794e-06, 'epoch': 1.71} +2025-02-05 22:26:40 - ERROR - stderr - 57%|█████▋ | 12789/22434 [12:19:00<6:44:20, 2.52s/it] +2025-02-05 22:26:43 - ERROR - stderr - 57%|█████▋ | 12790/22434 [12:19:02<6:45:10, 2.52s/it] +2025-02-05 22:26:43 - ERROR - stderr - +2025-02-05 22:26:43 - ERROR - stderr - +2025-02-05 22:26:43 - INFO - stdout - {'loss': 0.7181, 'grad_norm': 1.4103387594223022, 'learning_rate': 8.224984697077734e-06, 'epoch': 1.71} +2025-02-05 22:26:43 - ERROR - stderr - 57%|█████▋ | 12790/22434 [12:19:02<6:45:10, 2.52s/it] +2025-02-05 22:26:45 - ERROR - stderr - 57%|█████▋ | 12791/22434 [12:19:05<6:46:51, 2.53s/it] +2025-02-05 22:26:45 - ERROR - stderr - +2025-02-05 22:26:45 - ERROR - stderr - +2025-02-05 22:26:45 - INFO - stdout - {'loss': 0.5976, 'grad_norm': 1.1975380182266235, 'learning_rate': 8.223563894897164e-06, 'epoch': 1.71} +2025-02-05 22:26:45 - ERROR - stderr - 57%|█████▋ | 12791/22434 [12:19:05<6:46:51, 2.53s/it] +2025-02-05 22:26:48 - ERROR - stderr - 57%|█████▋ | 12792/22434 [12:19:07<6:45:09, 2.52s/it] +2025-02-05 22:26:48 - ERROR - stderr - +2025-02-05 22:26:48 - ERROR - stderr - +2025-02-05 22:26:48 - INFO - stdout - {'loss': 0.7007, 'grad_norm': 1.1826415061950684, 'learning_rate': 8.222143129744708e-06, 'epoch': 1.71} +2025-02-05 22:26:48 - ERROR - stderr - 57%|█████▋ | 12792/22434 [12:19:07<6:45:09, 2.52s/it] +2025-02-05 22:26:50 - ERROR - stderr - 57%|█████▋ | 12793/22434 [12:19:10<6:46:46, 2.53s/it] +2025-02-05 22:26:50 - ERROR - stderr - +2025-02-05 22:26:50 - ERROR - stderr - +2025-02-05 22:26:50 - INFO - stdout - {'loss': 0.7362, 'grad_norm': 1.288440465927124, 'learning_rate': 8.220722401649979e-06, 'epoch': 1.71} +2025-02-05 22:26:50 - ERROR - stderr - 57%|█████▋ | 12793/22434 [12:19:10<6:46:46, 2.53s/it] +2025-02-05 22:26:53 - ERROR - stderr - 57%|█████▋ | 12794/22434 [12:19:12<6:47:12, 2.53s/it] +2025-02-05 22:26:53 - ERROR - stderr - +2025-02-05 22:26:53 - ERROR 
- stderr - +2025-02-05 22:26:53 - INFO - stdout - {'loss': 0.7154, 'grad_norm': 1.32015860080719, 'learning_rate': 8.219301710642583e-06, 'epoch': 1.71} +2025-02-05 22:26:53 - ERROR - stderr - 57%|█████▋ | 12794/22434 [12:19:13<6:47:12, 2.53s/it] +2025-02-05 22:26:55 - ERROR - stderr - 57%|█████▋ | 12795/22434 [12:19:15<6:47:10, 2.53s/it] +2025-02-05 22:26:55 - ERROR - stderr - +2025-02-05 22:26:55 - ERROR - stderr - +2025-02-05 22:26:55 - INFO - stdout - {'loss': 0.6248, 'grad_norm': 1.1202263832092285, 'learning_rate': 8.217881056752142e-06, 'epoch': 1.71} +2025-02-05 22:26:55 - ERROR - stderr - 57%|█████▋ | 12795/22434 [12:19:15<6:47:10, 2.53s/it] +2025-02-05 22:26:58 - ERROR - stderr - 57%|█████▋ | 12796/22434 [12:19:17<6:42:34, 2.51s/it] +2025-02-05 22:26:58 - ERROR - stderr - +2025-02-05 22:26:58 - ERROR - stderr - +2025-02-05 22:26:58 - INFO - stdout - {'loss': 0.7152, 'grad_norm': 1.4491305351257324, 'learning_rate': 8.216460440008263e-06, 'epoch': 1.71} +2025-02-05 22:26:58 - ERROR - stderr - 57%|█████▋ | 12796/22434 [12:19:18<6:42:34, 2.51s/it] +2025-02-05 22:27:00 - ERROR - stderr - 57%|█████▋ | 12797/22434 [12:19:20<6:44:11, 2.52s/it] +2025-02-05 22:27:00 - ERROR - stderr - +2025-02-05 22:27:00 - ERROR - stderr - +2025-02-05 22:27:00 - INFO - stdout - {'loss': 0.6836, 'grad_norm': 1.2150226831436157, 'learning_rate': 8.215039860440564e-06, 'epoch': 1.71} +2025-02-05 22:27:00 - ERROR - stderr - 57%|█████▋ | 12797/22434 [12:19:20<6:44:11, 2.52s/it] +2025-02-05 22:27:03 - ERROR - stderr - 57%|█████▋ | 12798/22434 [12:19:23<6:49:53, 2.55s/it] +2025-02-05 22:27:03 - ERROR - stderr - +2025-02-05 22:27:03 - ERROR - stderr - +2025-02-05 22:27:03 - INFO - stdout - {'loss': 0.6493, 'grad_norm': 1.1814301013946533, 'learning_rate': 8.21361931807865e-06, 'epoch': 1.71} +2025-02-05 22:27:03 - ERROR - stderr - 57%|█████▋ | 12798/22434 [12:19:23<6:49:53, 2.55s/it] +2025-02-05 22:27:05 - ERROR - stderr - 57%|█████▋ | 12799/22434 [12:19:25<6:46:22, 2.53s/it] +2025-02-05 
22:27:05 - ERROR - stderr - +2025-02-05 22:27:05 - ERROR - stderr - +2025-02-05 22:27:05 - INFO - stdout - {'loss': 0.6034, 'grad_norm': 1.1774060726165771, 'learning_rate': 8.21219881295213e-06, 'epoch': 1.71} +2025-02-05 22:27:05 - ERROR - stderr - 57%|█████▋ | 12799/22434 [12:19:25<6:46:22, 2.53s/it] +2025-02-05 22:27:08 - ERROR - stderr - 57%|█████▋ | 12800/22434 [12:19:28<6:46:58, 2.53s/it] +2025-02-05 22:27:08 - ERROR - stderr - +2025-02-05 22:27:08 - ERROR - stderr - +2025-02-05 22:27:08 - INFO - stdout - {'loss': 0.6706, 'grad_norm': 1.2926663160324097, 'learning_rate': 8.210778345090617e-06, 'epoch': 1.71} +2025-02-05 22:27:08 - ERROR - stderr - 57%|█████▋ | 12800/22434 [12:19:28<6:46:58, 2.53s/it] +2025-02-05 22:27:10 - ERROR - stderr - 57%|█████▋ | 12801/22434 [12:19:30<6:47:01, 2.54s/it] +2025-02-05 22:27:10 - ERROR - stderr - +2025-02-05 22:27:10 - ERROR - stderr - +2025-02-05 22:27:10 - INFO - stdout - {'loss': 0.6747, 'grad_norm': 1.30585515499115, 'learning_rate': 8.209357914523716e-06, 'epoch': 1.71} +2025-02-05 22:27:10 - ERROR - stderr - 57%|█████▋ | 12801/22434 [12:19:30<6:47:01, 2.54s/it] +2025-02-05 22:27:13 - ERROR - stderr - 57%|█████▋ | 12802/22434 [12:19:33<6:47:55, 2.54s/it] +2025-02-05 22:27:13 - ERROR - stderr - +2025-02-05 22:27:13 - ERROR - stderr - +2025-02-05 22:27:13 - INFO - stdout - {'loss': 0.6936, 'grad_norm': 1.4625037908554077, 'learning_rate': 8.207937521281033e-06, 'epoch': 1.71} +2025-02-05 22:27:13 - ERROR - stderr - 57%|█████▋ | 12802/22434 [12:19:33<6:47:55, 2.54s/it] +2025-02-05 22:27:15 - ERROR - stderr - 57%|█████▋ | 12803/22434 [12:19:35<6:45:45, 2.53s/it] +2025-02-05 22:27:16 - ERROR - stderr - +2025-02-05 22:27:16 - ERROR - stderr - +2025-02-05 22:27:16 - INFO - stdout - {'loss': 0.7115, 'grad_norm': 1.2131339311599731, 'learning_rate': 8.206517165392183e-06, 'epoch': 1.71} +2025-02-05 22:27:16 - ERROR - stderr - 57%|█████▋ | 12803/22434 [12:19:35<6:45:45, 2.53s/it] +2025-02-05 22:27:18 - ERROR - stderr - 
57%|█████▋ | 12804/22434 [12:19:38<6:58:05, 2.60s/it] +2025-02-05 22:27:18 - ERROR - stderr - +2025-02-05 22:27:18 - ERROR - stderr - +2025-02-05 22:27:18 - INFO - stdout - {'loss': 0.6789, 'grad_norm': 1.1758376359939575, 'learning_rate': 8.20509684688676e-06, 'epoch': 1.71} +2025-02-05 22:27:18 - ERROR - stderr - 57%|█████▋ | 12804/22434 [12:19:38<6:58:05, 2.60s/it] +2025-02-05 22:27:21 - ERROR - stderr - 57%|█████▋ | 12805/22434 [12:19:41<6:53:27, 2.58s/it] +2025-02-05 22:27:21 - ERROR - stderr - +2025-02-05 22:27:21 - ERROR - stderr - +2025-02-05 22:27:21 - INFO - stdout - {'loss': 0.6722, 'grad_norm': 1.1197118759155273, 'learning_rate': 8.203676565794382e-06, 'epoch': 1.71} +2025-02-05 22:27:21 - ERROR - stderr - 57%|█████▋ | 12805/22434 [12:19:41<6:53:27, 2.58s/it] +2025-02-05 22:27:23 - ERROR - stderr - 57%|█████▋ | 12806/22434 [12:19:43<6:48:37, 2.55s/it] +2025-02-05 22:27:23 - ERROR - stderr - +2025-02-05 22:27:23 - ERROR - stderr - +2025-02-05 22:27:23 - INFO - stdout - {'loss': 0.7097, 'grad_norm': 1.1939643621444702, 'learning_rate': 8.202256322144647e-06, 'epoch': 1.71} +2025-02-05 22:27:23 - ERROR - stderr - 57%|█████▋ | 12806/22434 [12:19:43<6:48:37, 2.55s/it] +2025-02-05 22:27:26 - ERROR - stderr - 57%|█████▋ | 12807/22434 [12:19:46<6:46:26, 2.53s/it] +2025-02-05 22:27:26 - ERROR - stderr - +2025-02-05 22:27:26 - ERROR - stderr - +2025-02-05 22:27:26 - INFO - stdout - {'loss': 0.6493, 'grad_norm': 1.132934331893921, 'learning_rate': 8.200836115967153e-06, 'epoch': 1.71} +2025-02-05 22:27:26 - ERROR - stderr - 57%|█████▋ | 12807/22434 [12:19:46<6:46:26, 2.53s/it] +2025-02-05 22:27:28 - ERROR - stderr - 57%|█████▋ | 12808/22434 [12:19:48<6:43:21, 2.51s/it] +2025-02-05 22:27:28 - ERROR - stderr - +2025-02-05 22:27:28 - ERROR - stderr - +2025-02-05 22:27:28 - INFO - stdout - {'loss': 0.697, 'grad_norm': 1.2295867204666138, 'learning_rate': 8.199415947291512e-06, 'epoch': 1.71} +2025-02-05 22:27:28 - ERROR - stderr - 57%|█████▋ | 12808/22434 
[12:19:48<6:43:21, 2.51s/it] +2025-02-05 22:27:31 - ERROR - stderr - 57%|█████▋ | 12809/22434 [12:19:51<6:44:36, 2.52s/it] +2025-02-05 22:27:31 - ERROR - stderr - +2025-02-05 22:27:31 - ERROR - stderr - +2025-02-05 22:27:31 - INFO - stdout - {'loss': 0.6252, 'grad_norm': 1.0900604724884033, 'learning_rate': 8.197995816147325e-06, 'epoch': 1.71} +2025-02-05 22:27:31 - ERROR - stderr - 57%|█████▋ | 12809/22434 [12:19:51<6:44:36, 2.52s/it] +2025-02-05 22:27:33 - ERROR - stderr - 57%|█████▋ | 12810/22434 [12:19:53<6:50:34, 2.56s/it] +2025-02-05 22:27:33 - ERROR - stderr - +2025-02-05 22:27:33 - ERROR - stderr - +2025-02-05 22:27:33 - INFO - stdout - {'loss': 0.6174, 'grad_norm': 1.1717404127120972, 'learning_rate': 8.196575722564187e-06, 'epoch': 1.71} +2025-02-05 22:27:33 - ERROR - stderr - 57%|█████▋ | 12810/22434 [12:19:53<6:50:34, 2.56s/it] +2025-02-05 22:27:36 - ERROR - stderr - 57%|█████▋ | 12811/22434 [12:19:56<6:45:54, 2.53s/it] +2025-02-05 22:27:36 - ERROR - stderr - +2025-02-05 22:27:36 - ERROR - stderr - +2025-02-05 22:27:36 - INFO - stdout - {'loss': 0.6694, 'grad_norm': 1.3035222291946411, 'learning_rate': 8.195155666571705e-06, 'epoch': 1.71} +2025-02-05 22:27:36 - ERROR - stderr - 57%|█████▋ | 12811/22434 [12:19:56<6:45:54, 2.53s/it] +2025-02-05 22:27:38 - ERROR - stderr - 57%|█████▋ | 12812/22434 [12:19:58<6:44:41, 2.52s/it] +2025-02-05 22:27:38 - ERROR - stderr - +2025-02-05 22:27:38 - ERROR - stderr - +2025-02-05 22:27:38 - INFO - stdout - {'loss': 0.6994, 'grad_norm': 1.223902702331543, 'learning_rate': 8.193735648199473e-06, 'epoch': 1.71} +2025-02-05 22:27:38 - ERROR - stderr - 57%|█████▋ | 12812/22434 [12:19:58<6:44:41, 2.52s/it] +2025-02-05 22:27:41 - ERROR - stderr - 57%|█████▋ | 12813/22434 [12:20:01<6:46:31, 2.54s/it] +2025-02-05 22:27:41 - ERROR - stderr - +2025-02-05 22:27:41 - ERROR - stderr - +2025-02-05 22:27:41 - INFO - stdout - {'loss': 0.7424, 'grad_norm': 1.2977434396743774, 'learning_rate': 8.192315667477096e-06, 'epoch': 1.71} 
+2025-02-05 22:27:41 - ERROR - stderr - 57%|█████▋ | 12813/22434 [12:20:01<6:46:31, 2.54s/it] +2025-02-05 22:27:43 - ERROR - stderr - 57%|█████▋ | 12814/22434 [12:20:03<6:46:53, 2.54s/it] +2025-02-05 22:27:44 - ERROR - stderr - +2025-02-05 22:27:44 - ERROR - stderr - +2025-02-05 22:27:44 - INFO - stdout - {'loss': 0.842, 'grad_norm': 1.4670928716659546, 'learning_rate': 8.190895724434169e-06, 'epoch': 1.71} +2025-02-05 22:27:44 - ERROR - stderr - 57%|█████▋ | 12814/22434 [12:20:03<6:46:53, 2.54s/it] +2025-02-05 22:27:46 - ERROR - stderr - 57%|█████▋ | 12815/22434 [12:20:06<6:44:04, 2.52s/it] +2025-02-05 22:27:46 - ERROR - stderr - +2025-02-05 22:27:46 - ERROR - stderr - +2025-02-05 22:27:46 - INFO - stdout - {'loss': 0.659, 'grad_norm': 1.1869728565216064, 'learning_rate': 8.189475819100286e-06, 'epoch': 1.71} +2025-02-05 22:27:46 - ERROR - stderr - 57%|█████▋ | 12815/22434 [12:20:06<6:44:04, 2.52s/it] +2025-02-05 22:27:49 - ERROR - stderr - 57%|█████▋ | 12816/22434 [12:20:08<6:46:51, 2.54s/it] +2025-02-05 22:27:49 - ERROR - stderr - +2025-02-05 22:27:49 - ERROR - stderr - +2025-02-05 22:27:49 - INFO - stdout - {'loss': 0.7305, 'grad_norm': 1.3191691637039185, 'learning_rate': 8.188055951505051e-06, 'epoch': 1.71} +2025-02-05 22:27:49 - ERROR - stderr - 57%|█████▋ | 12816/22434 [12:20:08<6:46:51, 2.54s/it] +2025-02-05 22:27:51 - ERROR - stderr - 57%|█████▋ | 12817/22434 [12:20:11<6:46:00, 2.53s/it] +2025-02-05 22:27:51 - ERROR - stderr - +2025-02-05 22:27:51 - ERROR - stderr - +2025-02-05 22:27:51 - INFO - stdout - {'loss': 0.7657, 'grad_norm': 1.2622216939926147, 'learning_rate': 8.186636121678057e-06, 'epoch': 1.71} +2025-02-05 22:27:51 - ERROR - stderr - 57%|█████▋ | 12817/22434 [12:20:11<6:46:00, 2.53s/it] +2025-02-05 22:27:54 - ERROR - stderr - 57%|█████▋ | 12818/22434 [12:20:13<6:49:28, 2.55s/it] +2025-02-05 22:27:54 - ERROR - stderr - +2025-02-05 22:27:54 - ERROR - stderr - +2025-02-05 22:27:54 - INFO - stdout - {'loss': 0.7474, 'grad_norm': 
1.2715426683425903, 'learning_rate': 8.185216329648892e-06, 'epoch': 1.71} +2025-02-05 22:27:54 - ERROR - stderr - 57%|█████▋ | 12818/22434 [12:20:13<6:49:28, 2.55s/it] +2025-02-05 22:27:56 - ERROR - stderr - 57%|█████▋ | 12819/22434 [12:20:16<6:47:38, 2.54s/it] +2025-02-05 22:27:56 - ERROR - stderr - +2025-02-05 22:27:56 - ERROR - stderr - +2025-02-05 22:27:56 - INFO - stdout - {'loss': 0.6626, 'grad_norm': 1.167702317237854, 'learning_rate': 8.18379657544716e-06, 'epoch': 1.71} +2025-02-05 22:27:56 - ERROR - stderr - 57%|█████▋ | 12819/22434 [12:20:16<6:47:38, 2.54s/it] +2025-02-05 22:27:59 - ERROR - stderr - 57%|█████▋ | 12820/22434 [12:20:19<6:47:39, 2.54s/it] +2025-02-05 22:27:59 - ERROR - stderr - +2025-02-05 22:27:59 - ERROR - stderr - +2025-02-05 22:27:59 - INFO - stdout - {'loss': 0.6514, 'grad_norm': 1.2841434478759766, 'learning_rate': 8.18237685910245e-06, 'epoch': 1.71} +2025-02-05 22:27:59 - ERROR - stderr - 57%|█████▋ | 12820/22434 [12:20:19<6:47:39, 2.54s/it] +2025-02-05 22:28:02 - ERROR - stderr - 57%|█████▋ | 12821/22434 [12:20:21<6:58:11, 2.61s/it] +2025-02-05 22:28:02 - ERROR - stderr - +2025-02-05 22:28:02 - ERROR - stderr - +2025-02-05 22:28:02 - INFO - stdout - {'loss': 0.7711, 'grad_norm': 1.4027804136276245, 'learning_rate': 8.180957180644353e-06, 'epoch': 1.71} +2025-02-05 22:28:02 - ERROR - stderr - 57%|█████▋ | 12821/22434 [12:20:21<6:58:11, 2.61s/it] +2025-02-05 22:28:04 - ERROR - stderr - 57%|█████▋ | 12822/22434 [12:20:24<6:51:09, 2.57s/it] +2025-02-05 22:28:04 - ERROR - stderr - +2025-02-05 22:28:04 - ERROR - stderr - +2025-02-05 22:28:04 - INFO - stdout - {'loss': 0.5589, 'grad_norm': 1.1240234375, 'learning_rate': 8.179537540102466e-06, 'epoch': 1.71} +2025-02-05 22:28:04 - ERROR - stderr - 57%|█████▋ | 12822/22434 [12:20:24<6:51:09, 2.57s/it] +2025-02-05 22:28:06 - ERROR - stderr - 57%|█████▋ | 12823/22434 [12:20:26<6:49:12, 2.55s/it] +2025-02-05 22:28:07 - ERROR - stderr - +2025-02-05 22:28:07 - ERROR - stderr - +2025-02-05 
22:28:07 - INFO - stdout - {'loss': 0.6925, 'grad_norm': 1.297250747680664, 'learning_rate': 8.178117937506375e-06, 'epoch': 1.71} +2025-02-05 22:28:07 - ERROR - stderr - 57%|█████▋ | 12823/22434 [12:20:26<6:49:12, 2.55s/it] +2025-02-05 22:28:09 - ERROR - stderr - 57%|█████▋ | 12824/22434 [12:20:29<6:47:40, 2.55s/it] +2025-02-05 22:28:09 - ERROR - stderr - +2025-02-05 22:28:09 - ERROR - stderr - +2025-02-05 22:28:09 - INFO - stdout - {'loss': 0.7385, 'grad_norm': 1.2992154359817505, 'learning_rate': 8.176698372885676e-06, 'epoch': 1.71} +2025-02-05 22:28:09 - ERROR - stderr - 57%|█████▋ | 12824/22434 [12:20:29<6:47:40, 2.55s/it] +2025-02-05 22:28:12 - ERROR - stderr - 57%|█████▋ | 12825/22434 [12:20:31<6:47:15, 2.54s/it] +2025-02-05 22:28:12 - ERROR - stderr - +2025-02-05 22:28:12 - ERROR - stderr - +2025-02-05 22:28:12 - INFO - stdout - {'loss': 0.7254, 'grad_norm': 1.3466929197311401, 'learning_rate': 8.175278846269953e-06, 'epoch': 1.72} +2025-02-05 22:28:12 - ERROR - stderr - 57%|█████▋ | 12825/22434 [12:20:31<6:47:15, 2.54s/it] +2025-02-05 22:28:14 - ERROR - stderr - 57%|█████▋ | 12826/22434 [12:20:34<6:44:54, 2.53s/it] +2025-02-05 22:28:14 - ERROR - stderr - +2025-02-05 22:28:14 - ERROR - stderr - +2025-02-05 22:28:14 - INFO - stdout - {'loss': 0.6521, 'grad_norm': 1.37809157371521, 'learning_rate': 8.173859357688792e-06, 'epoch': 1.72} +2025-02-05 22:28:14 - ERROR - stderr - 57%|█████▋ | 12826/22434 [12:20:34<6:44:54, 2.53s/it] +2025-02-05 22:28:17 - ERROR - stderr - 57%|█████▋ | 12827/22434 [12:20:36<6:50:48, 2.57s/it] +2025-02-05 22:28:17 - ERROR - stderr - +2025-02-05 22:28:17 - ERROR - stderr - +2025-02-05 22:28:17 - INFO - stdout - {'loss': 0.6372, 'grad_norm': 1.1244807243347168, 'learning_rate': 8.172439907171788e-06, 'epoch': 1.72} +2025-02-05 22:28:17 - ERROR - stderr - 57%|█████▋ | 12827/22434 [12:20:37<6:50:48, 2.57s/it] +2025-02-05 22:28:19 - ERROR - stderr - 57%|█████▋ | 12828/22434 [12:20:39<6:44:38, 2.53s/it] +2025-02-05 22:28:19 - ERROR - 
stderr - +2025-02-05 22:28:19 - ERROR - stderr - +2025-02-05 22:28:19 - INFO - stdout - {'loss': 0.7393, 'grad_norm': 1.662224531173706, 'learning_rate': 8.171020494748526e-06, 'epoch': 1.72} +2025-02-05 22:28:19 - ERROR - stderr - 57%|█████▋ | 12828/22434 [12:20:39<6:44:38, 2.53s/it] +2025-02-05 22:28:22 - ERROR - stderr - 57%|█████▋ | 12829/22434 [12:20:41<6:40:55, 2.50s/it] +2025-02-05 22:28:22 - ERROR - stderr - +2025-02-05 22:28:22 - ERROR - stderr - +2025-02-05 22:28:22 - INFO - stdout - {'loss': 0.7268, 'grad_norm': 1.1774975061416626, 'learning_rate': 8.169601120448592e-06, 'epoch': 1.72} +2025-02-05 22:28:22 - ERROR - stderr - 57%|█████▋ | 12829/22434 [12:20:41<6:40:55, 2.50s/it] +2025-02-05 22:28:24 - ERROR - stderr - 57%|█████▋ | 12830/22434 [12:20:44<6:45:26, 2.53s/it] +2025-02-05 22:28:24 - ERROR - stderr - +2025-02-05 22:28:24 - ERROR - stderr - +2025-02-05 22:28:24 - INFO - stdout - {'loss': 0.7227, 'grad_norm': 1.214506983757019, 'learning_rate': 8.168181784301573e-06, 'epoch': 1.72} +2025-02-05 22:28:24 - ERROR - stderr - 57%|█████▋ | 12830/22434 [12:20:44<6:45:26, 2.53s/it] +2025-02-05 22:28:27 - ERROR - stderr - 57%|█████▋ | 12831/22434 [12:20:46<6:45:26, 2.53s/it] +2025-02-05 22:28:27 - ERROR - stderr - +2025-02-05 22:28:27 - ERROR - stderr - +2025-02-05 22:28:27 - INFO - stdout - {'loss': 0.7058, 'grad_norm': 1.151482105255127, 'learning_rate': 8.166762486337045e-06, 'epoch': 1.72} +2025-02-05 22:28:27 - ERROR - stderr - 57%|█████▋ | 12831/22434 [12:20:47<6:45:26, 2.53s/it] +2025-02-05 22:28:29 - ERROR - stderr - 57%|█████▋ | 12832/22434 [12:20:49<6:40:56, 2.51s/it] +2025-02-05 22:28:29 - ERROR - stderr - +2025-02-05 22:28:29 - ERROR - stderr - +2025-02-05 22:28:29 - INFO - stdout - {'loss': 0.8111, 'grad_norm': 1.4210294485092163, 'learning_rate': 8.165343226584605e-06, 'epoch': 1.72} +2025-02-05 22:28:29 - ERROR - stderr - 57%|█████▋ | 12832/22434 [12:20:49<6:40:56, 2.51s/it] +2025-02-05 22:28:32 - ERROR - stderr - 57%|█████▋ | 12833/22434 
[12:20:52<6:44:13, 2.53s/it] +2025-02-05 22:28:32 - ERROR - stderr - +2025-02-05 22:28:32 - ERROR - stderr - +2025-02-05 22:28:32 - INFO - stdout - {'loss': 0.7378, 'grad_norm': 1.2370742559432983, 'learning_rate': 8.163924005073826e-06, 'epoch': 1.72} +2025-02-05 22:28:32 - ERROR - stderr - 57%|█████▋ | 12833/22434 [12:20:52<6:44:13, 2.53s/it] +2025-02-05 22:28:34 - ERROR - stderr - 57%|█████▋ | 12834/22434 [12:20:54<6:44:29, 2.53s/it] +2025-02-05 22:28:34 - ERROR - stderr - +2025-02-05 22:28:34 - ERROR - stderr - +2025-02-05 22:28:34 - INFO - stdout - {'loss': 0.6369, 'grad_norm': 1.1957107782363892, 'learning_rate': 8.162504821834296e-06, 'epoch': 1.72} +2025-02-05 22:28:34 - ERROR - stderr - 57%|█████▋ | 12834/22434 [12:20:54<6:44:29, 2.53s/it] +2025-02-05 22:28:37 - ERROR - stderr - 57%|█████▋ | 12835/22434 [12:20:57<7:06:14, 2.66s/it] +2025-02-05 22:28:37 - ERROR - stderr - +2025-02-05 22:28:37 - ERROR - stderr - +2025-02-05 22:28:37 - INFO - stdout - {'loss': 0.6691, 'grad_norm': 1.0631752014160156, 'learning_rate': 8.161085676895597e-06, 'epoch': 1.72} +2025-02-05 22:28:37 - ERROR - stderr - 57%|█████▋ | 12835/22434 [12:20:57<7:06:14, 2.66s/it] +2025-02-05 22:28:40 - ERROR - stderr - 57%|█████▋ | 12836/22434 [12:21:00<6:57:37, 2.61s/it] +2025-02-05 22:28:40 - ERROR - stderr - +2025-02-05 22:28:40 - ERROR - stderr - +2025-02-05 22:28:40 - INFO - stdout - {'loss': 0.6798, 'grad_norm': 1.1368411779403687, 'learning_rate': 8.159666570287303e-06, 'epoch': 1.72} +2025-02-05 22:28:40 - ERROR - stderr - 57%|█████▋ | 12836/22434 [12:21:00<6:57:37, 2.61s/it] +2025-02-05 22:28:42 - ERROR - stderr - 57%|█████▋ | 12837/22434 [12:21:02<6:53:58, 2.59s/it] +2025-02-05 22:28:42 - ERROR - stderr - +2025-02-05 22:28:42 - ERROR - stderr - +2025-02-05 22:28:42 - INFO - stdout - {'loss': 0.6431, 'grad_norm': 1.3594932556152344, 'learning_rate': 8.158247502039002e-06, 'epoch': 1.72} +2025-02-05 22:28:42 - ERROR - stderr - 57%|█████▋ | 12837/22434 [12:21:02<6:53:58, 2.59s/it] 
+2025-02-05 22:28:45 - ERROR - stderr - 57%|█████▋ | 12838/22434 [12:21:04<6:46:37, 2.54s/it] +2025-02-05 22:28:45 - ERROR - stderr - +2025-02-05 22:28:45 - ERROR - stderr - +2025-02-05 22:28:45 - INFO - stdout - {'loss': 0.7112, 'grad_norm': 1.1777312755584717, 'learning_rate': 8.156828472180271e-06, 'epoch': 1.72} +2025-02-05 22:28:45 - ERROR - stderr - 57%|█████▋ | 12838/22434 [12:21:05<6:46:37, 2.54s/it] +2025-02-05 22:28:47 - ERROR - stderr - 57%|█████▋ | 12839/22434 [12:21:07<6:41:03, 2.51s/it] +2025-02-05 22:28:47 - ERROR - stderr - +2025-02-05 22:28:47 - ERROR - stderr - +2025-02-05 22:28:47 - INFO - stdout - {'loss': 0.6884, 'grad_norm': 1.175522804260254, 'learning_rate': 8.15540948074068e-06, 'epoch': 1.72} +2025-02-05 22:28:47 - ERROR - stderr - 57%|█████▋ | 12839/22434 [12:21:07<6:41:03, 2.51s/it] +2025-02-05 22:28:50 - ERROR - stderr - 57%|█████▋ | 12840/22434 [12:21:09<6:41:14, 2.51s/it] +2025-02-05 22:28:50 - ERROR - stderr - +2025-02-05 22:28:50 - ERROR - stderr - +2025-02-05 22:28:50 - INFO - stdout - {'loss': 0.7058, 'grad_norm': 1.1670371294021606, 'learning_rate': 8.153990527749818e-06, 'epoch': 1.72} +2025-02-05 22:28:50 - ERROR - stderr - 57%|█████▋ | 12840/22434 [12:21:09<6:41:14, 2.51s/it] +2025-02-05 22:28:52 - ERROR - stderr - 57%|█████▋ | 12841/22434 [12:21:12<6:37:12, 2.48s/it] +2025-02-05 22:28:52 - ERROR - stderr - +2025-02-05 22:28:52 - ERROR - stderr - +2025-02-05 22:28:52 - INFO - stdout - {'loss': 0.677, 'grad_norm': 1.325585126876831, 'learning_rate': 8.152571613237257e-06, 'epoch': 1.72} +2025-02-05 22:28:52 - ERROR - stderr - 57%|█████▋ | 12841/22434 [12:21:12<6:37:12, 2.48s/it] +2025-02-05 22:28:54 - ERROR - stderr - 57%|█████▋ | 12842/22434 [12:21:14<6:33:35, 2.46s/it] +2025-02-05 22:28:55 - ERROR - stderr - +2025-02-05 22:28:55 - ERROR - stderr - +2025-02-05 22:28:55 - INFO - stdout - {'loss': 0.6777, 'grad_norm': 1.3175456523895264, 'learning_rate': 8.151152737232572e-06, 'epoch': 1.72} +2025-02-05 22:28:55 - ERROR - stderr 
- 57%|█████▋ | 12842/22434 [12:21:14<6:33:35, 2.46s/it] +2025-02-05 22:28:57 - ERROR - stderr - 57%|█████▋ | 12843/22434 [12:21:17<6:36:37, 2.48s/it] +2025-02-05 22:28:57 - ERROR - stderr - +2025-02-05 22:28:57 - ERROR - stderr - +2025-02-05 22:28:57 - INFO - stdout - {'loss': 0.7486, 'grad_norm': 1.2463619709014893, 'learning_rate': 8.14973389976534e-06, 'epoch': 1.72} +2025-02-05 22:28:57 - ERROR - stderr - 57%|█████▋ | 12843/22434 [12:21:17<6:36:37, 2.48s/it] +2025-02-05 22:29:00 - ERROR - stderr - 57%|█████▋ | 12844/22434 [12:21:19<6:39:24, 2.50s/it] +2025-02-05 22:29:00 - ERROR - stderr - +2025-02-05 22:29:00 - ERROR - stderr - +2025-02-05 22:29:00 - INFO - stdout - {'loss': 0.7444, 'grad_norm': 1.2975207567214966, 'learning_rate': 8.148315100865131e-06, 'epoch': 1.72} +2025-02-05 22:29:00 - ERROR - stderr - 57%|█████▋ | 12844/22434 [12:21:19<6:39:24, 2.50s/it] +2025-02-05 22:29:02 - ERROR - stderr - 57%|█████▋ | 12845/22434 [12:21:22<6:43:43, 2.53s/it] +2025-02-05 22:29:02 - ERROR - stderr - +2025-02-05 22:29:02 - ERROR - stderr - +2025-02-05 22:29:02 - INFO - stdout - {'loss': 0.7014, 'grad_norm': 1.1764236688613892, 'learning_rate': 8.146896340561528e-06, 'epoch': 1.72} +2025-02-05 22:29:02 - ERROR - stderr - 57%|█████▋ | 12845/22434 [12:21:22<6:43:43, 2.53s/it] +2025-02-05 22:29:05 - ERROR - stderr - 57%|█████▋ | 12846/22434 [12:21:25<6:52:16, 2.58s/it] +2025-02-05 22:29:05 - ERROR - stderr - +2025-02-05 22:29:05 - ERROR - stderr - +2025-02-05 22:29:05 - INFO - stdout - {'loss': 0.7113, 'grad_norm': 1.2639333009719849, 'learning_rate': 8.145477618884092e-06, 'epoch': 1.72} +2025-02-05 22:29:05 - ERROR - stderr - 57%|█████▋ | 12846/22434 [12:21:25<6:52:16, 2.58s/it] +2025-02-05 22:29:07 - ERROR - stderr - 57%|█████▋ | 12847/22434 [12:21:27<6:49:29, 2.56s/it] +2025-02-05 22:29:07 - ERROR - stderr - +2025-02-05 22:29:07 - ERROR - stderr - +2025-02-05 22:29:07 - INFO - stdout - {'loss': 0.7294, 'grad_norm': 1.3122056722640991, 'learning_rate':
8.1440589358624e-06, 'epoch': 1.72} +2025-02-05 22:29:07 - ERROR - stderr - 57%|█████▋ | 12847/22434 [12:21:27<6:49:29, 2.56s/it] +2025-02-05 22:29:10 - ERROR - stderr - 57%|█████▋ | 12848/22434 [12:21:30<6:51:16, 2.57s/it] +2025-02-05 22:29:10 - ERROR - stderr - +2025-02-05 22:29:10 - ERROR - stderr - +2025-02-05 22:29:10 - INFO - stdout - {'loss': 0.7267, 'grad_norm': 1.224700927734375, 'learning_rate': 8.142640291526028e-06, 'epoch': 1.72} +2025-02-05 22:29:10 - ERROR - stderr - 57%|█████▋ | 12848/22434 [12:21:30<6:51:16, 2.57s/it] +2025-02-05 22:29:13 - ERROR - stderr - 57%|█████▋ | 12849/22434 [12:21:32<6:51:24, 2.58s/it] +2025-02-05 22:29:13 - ERROR - stderr - +2025-02-05 22:29:13 - ERROR - stderr - +2025-02-05 22:29:13 - INFO - stdout - {'loss': 0.7488, 'grad_norm': 1.2388968467712402, 'learning_rate': 8.141221685904538e-06, 'epoch': 1.72} +2025-02-05 22:29:13 - ERROR - stderr - 57%|█████▋ | 12849/22434 [12:21:32<6:51:24, 2.58s/it] +2025-02-05 22:29:15 - ERROR - stderr - 57%|█████▋ | 12850/22434 [12:21:35<6:52:24, 2.58s/it] +2025-02-05 22:29:15 - ERROR - stderr - +2025-02-05 22:29:15 - ERROR - stderr - +2025-02-05 22:29:15 - INFO - stdout - {'loss': 0.6756, 'grad_norm': 1.1630809307098389, 'learning_rate': 8.139803119027507e-06, 'epoch': 1.72} +2025-02-05 22:29:15 - ERROR - stderr - 57%|█████▋ | 12850/22434 [12:21:35<6:52:24, 2.58s/it] +2025-02-05 22:29:18 - ERROR - stderr - 57%|█████▋ | 12851/22434 [12:21:37<6:51:18, 2.58s/it] +2025-02-05 22:29:18 - ERROR - stderr - +2025-02-05 22:29:18 - ERROR - stderr - +2025-02-05 22:29:18 - INFO - stdout - {'loss': 0.662, 'grad_norm': 1.1897659301757812, 'learning_rate': 8.1383845909245e-06, 'epoch': 1.72} +2025-02-05 22:29:18 - ERROR - stderr - 57%|█████▋ | 12851/22434 [12:21:38<6:51:18, 2.58s/it] +2025-02-05 22:29:20 - ERROR - stderr - 57%|█████▋ | 12852/22434 [12:21:40<6:47:15, 2.55s/it] +2025-02-05 22:29:20 - ERROR - stderr - +2025-02-05 22:29:20 - ERROR - stderr - +2025-02-05 22:29:20 - INFO - stdout - {'loss': 
0.7194, 'grad_norm': 1.1600133180618286, 'learning_rate': 8.13696610162508e-06, 'epoch': 1.72} +2025-02-05 22:29:20 - ERROR - stderr - 57%|█████▋ | 12852/22434 [12:21:40<6:47:15, 2.55s/it] +2025-02-05 22:29:23 - ERROR - stderr - 57%|█████▋ | 12853/22434 [12:21:43<6:46:59, 2.55s/it] +2025-02-05 22:29:23 - ERROR - stderr - +2025-02-05 22:29:23 - ERROR - stderr - +2025-02-05 22:29:23 - INFO - stdout - {'loss': 0.6716, 'grad_norm': 1.257002830505371, 'learning_rate': 8.135547651158822e-06, 'epoch': 1.72} +2025-02-05 22:29:23 - ERROR - stderr - 57%|█████▋ | 12853/22434 [12:21:43<6:46:59, 2.55s/it] +2025-02-05 22:29:25 - ERROR - stderr - 57%|█████▋ | 12854/22434 [12:21:45<6:41:35, 2.52s/it] +2025-02-05 22:29:25 - ERROR - stderr - +2025-02-05 22:29:25 - ERROR - stderr - +2025-02-05 22:29:25 - INFO - stdout - {'loss': 0.6176, 'grad_norm': 1.2998160123825073, 'learning_rate': 8.13412923955529e-06, 'epoch': 1.72} +2025-02-05 22:29:25 - ERROR - stderr - 57%|█████▋ | 12854/22434 [12:21:45<6:41:35, 2.52s/it] +2025-02-05 22:29:28 - ERROR - stderr - 57%|█████▋ | 12855/22434 [12:21:47<6:37:58, 2.49s/it] +2025-02-05 22:29:28 - ERROR - stderr - +2025-02-05 22:29:28 - ERROR - stderr - +2025-02-05 22:29:28 - INFO - stdout - {'loss': 0.7464, 'grad_norm': 1.2360540628433228, 'learning_rate': 8.132710866844045e-06, 'epoch': 1.72} +2025-02-05 22:29:28 - ERROR - stderr - 57%|█████▋ | 12855/22434 [12:21:47<6:37:58, 2.49s/it] +2025-02-05 22:29:30 - ERROR - stderr - 57%|█████▋ | 12856/22434 [12:21:50<6:37:46, 2.49s/it] +2025-02-05 22:29:30 - ERROR - stderr - +2025-02-05 22:29:30 - ERROR - stderr - +2025-02-05 22:29:30 - INFO - stdout - {'loss': 0.6833, 'grad_norm': 1.166771650314331, 'learning_rate': 8.13129253305466e-06, 'epoch': 1.72} +2025-02-05 22:29:30 - ERROR - stderr - 57%|█████▋ | 12856/22434 [12:21:50<6:37:46, 2.49s/it] +2025-02-05 22:29:33 - ERROR - stderr - 57%|█████▋ | 12857/22434 [12:21:52<6:42:16, 2.52s/it] +2025-02-05 22:29:33 - ERROR - stderr - +2025-02-05 22:29:33 - ERROR - 
stderr - +2025-02-05 22:29:33 - INFO - stdout - {'loss': 0.6532, 'grad_norm': 1.173782467842102, 'learning_rate': 8.129874238216689e-06, 'epoch': 1.72} +2025-02-05 22:29:33 - ERROR - stderr - 57%|█████▋ | 12857/22434 [12:21:53<6:42:16, 2.52s/it] +2025-02-05 22:29:35 - ERROR - stderr - 57%|█████▋ | 12858/22434 [12:21:55<6:39:54, 2.51s/it] +2025-02-05 22:29:35 - ERROR - stderr - +2025-02-05 22:29:35 - ERROR - stderr - +2025-02-05 22:29:35 - INFO - stdout - {'loss': 0.7143, 'grad_norm': 1.2952295541763306, 'learning_rate': 8.128455982359704e-06, 'epoch': 1.72} +2025-02-05 22:29:35 - ERROR - stderr - 57%|█████▋ | 12858/22434 [12:21:55<6:39:54, 2.51s/it] +2025-02-05 22:29:38 - ERROR - stderr - 57%|█████▋ | 12859/22434 [12:21:57<6:40:20, 2.51s/it] +2025-02-05 22:29:38 - ERROR - stderr - +2025-02-05 22:29:38 - ERROR - stderr - +2025-02-05 22:29:38 - INFO - stdout - {'loss': 0.6858, 'grad_norm': 1.132118821144104, 'learning_rate': 8.127037765513261e-06, 'epoch': 1.72} +2025-02-05 22:29:38 - ERROR - stderr - 57%|█████▋ | 12859/22434 [12:21:58<6:40:20, 2.51s/it] +2025-02-05 22:29:40 - ERROR - stderr - 57%|█████▋ | 12860/22434 [12:22:00<6:41:05, 2.51s/it] +2025-02-05 22:29:40 - ERROR - stderr - +2025-02-05 22:29:40 - ERROR - stderr - +2025-02-05 22:29:40 - INFO - stdout - {'loss': 0.7606, 'grad_norm': 1.374457597732544, 'learning_rate': 8.125619587706925e-06, 'epoch': 1.72} +2025-02-05 22:29:40 - ERROR - stderr - 57%|█████▋ | 12860/22434 [12:22:00<6:41:05, 2.51s/it] +2025-02-05 22:29:43 - ERROR - stderr - 57%|█████▋ | 12861/22434 [12:22:03<6:42:10, 2.52s/it] +2025-02-05 22:29:43 - ERROR - stderr - +2025-02-05 22:29:43 - ERROR - stderr - +2025-02-05 22:29:43 - INFO - stdout - {'loss': 0.6505, 'grad_norm': 1.1691745519638062, 'learning_rate': 8.124201448970254e-06, 'epoch': 1.72} +2025-02-05 22:29:43 - ERROR - stderr - 57%|█████▋ | 12861/22434 [12:22:03<6:42:10, 2.52s/it] +2025-02-05 22:29:45 - ERROR - stderr - 57%|█████▋ | 12862/22434 [12:22:05<6:42:42, 2.52s/it] +2025-02-05 
22:29:45 - ERROR - stderr - +2025-02-05 22:29:45 - ERROR - stderr - +2025-02-05 22:29:45 - INFO - stdout - {'loss': 0.7641, 'grad_norm': 1.3403770923614502, 'learning_rate': 8.122783349332811e-06, 'epoch': 1.72} +2025-02-05 22:29:45 - ERROR - stderr - 57%|█████▋ | 12862/22434 [12:22:05<6:42:42, 2.52s/it] +2025-02-05 22:29:48 - ERROR - stderr - 57%|█████▋ | 12863/22434 [12:22:08<6:40:32, 2.51s/it] +2025-02-05 22:29:48 - ERROR - stderr - +2025-02-05 22:29:48 - ERROR - stderr - +2025-02-05 22:29:48 - INFO - stdout - {'loss': 0.6855, 'grad_norm': 1.312195897102356, 'learning_rate': 8.12136528882415e-06, 'epoch': 1.72} +2025-02-05 22:29:48 - ERROR - stderr - 57%|█████▋ | 12863/22434 [12:22:08<6:40:32, 2.51s/it] +2025-02-05 22:29:50 - ERROR - stderr - 57%|█████▋ | 12864/22434 [12:22:10<6:40:08, 2.51s/it] +2025-02-05 22:29:50 - ERROR - stderr - +2025-02-05 22:29:50 - ERROR - stderr - +2025-02-05 22:29:50 - INFO - stdout - {'loss': 0.7039, 'grad_norm': 1.2833150625228882, 'learning_rate': 8.119947267473833e-06, 'epoch': 1.72} +2025-02-05 22:29:50 - ERROR - stderr - 57%|█████▋ | 12864/22434 [12:22:10<6:40:08, 2.51s/it] +2025-02-05 22:29:53 - ERROR - stderr - 57%|█████▋ | 12865/22434 [12:22:13<6:41:05, 2.51s/it] +2025-02-05 22:29:53 - ERROR - stderr - +2025-02-05 22:29:53 - ERROR - stderr - +2025-02-05 22:29:53 - INFO - stdout - {'loss': 0.6435, 'grad_norm': 1.185397744178772, 'learning_rate': 8.118529285311415e-06, 'epoch': 1.72} +2025-02-05 22:29:53 - ERROR - stderr - 57%|█████▋ | 12865/22434 [12:22:13<6:41:05, 2.51s/it] +2025-02-05 22:29:56 - ERROR - stderr - 57%|█████▋ | 12866/22434 [12:22:15<6:54:09, 2.60s/it] +2025-02-05 22:29:56 - ERROR - stderr - +2025-02-05 22:29:56 - ERROR - stderr - +2025-02-05 22:29:56 - INFO - stdout - {'loss': 0.5846, 'grad_norm': 1.113646149635315, 'learning_rate': 8.117111342366454e-06, 'epoch': 1.72} +2025-02-05 22:29:56 - ERROR - stderr - 57%|█████▋ | 12866/22434 [12:22:15<6:54:09, 2.60s/it] +2025-02-05 22:29:58 - ERROR - stderr - 
57%|█████▋ | 12867/22434 [12:22:18<6:49:36, 2.57s/it] +2025-02-05 22:29:58 - ERROR - stderr - +2025-02-05 22:29:58 - ERROR - stderr - +2025-02-05 22:29:58 - INFO - stdout - {'loss': 0.6787, 'grad_norm': 1.1509543657302856, 'learning_rate': 8.115693438668507e-06, 'epoch': 1.72} +2025-02-05 22:29:58 - ERROR - stderr - 57%|█████▋ | 12867/22434 [12:22:18<6:49:36, 2.57s/it] +2025-02-05 22:30:01 - ERROR - stderr - 57%|█████▋ | 12868/22434 [12:22:21<6:55:28, 2.61s/it] +2025-02-05 22:30:01 - ERROR - stderr - +2025-02-05 22:30:01 - ERROR - stderr - +2025-02-05 22:30:01 - INFO - stdout - {'loss': 0.7561, 'grad_norm': 1.2752186059951782, 'learning_rate': 8.114275574247124e-06, 'epoch': 1.72} +2025-02-05 22:30:01 - ERROR - stderr - 57%|█████▋ | 12868/22434 [12:22:21<6:55:28, 2.61s/it] +2025-02-05 22:30:03 - ERROR - stderr - 57%|█████▋ | 12869/22434 [12:22:23<6:52:38, 2.59s/it] +2025-02-05 22:30:03 - ERROR - stderr - +2025-02-05 22:30:03 - ERROR - stderr - +2025-02-05 22:30:03 - INFO - stdout - {'loss': 0.7779, 'grad_norm': 1.297583818435669, 'learning_rate': 8.112857749131867e-06, 'epoch': 1.72} +2025-02-05 22:30:03 - ERROR - stderr - 57%|█████▋ | 12869/22434 [12:22:23<6:52:38, 2.59s/it] +2025-02-05 22:30:06 - ERROR - stderr - 57%|█████▋ | 12870/22434 [12:22:26<6:48:19, 2.56s/it] +2025-02-05 22:30:06 - ERROR - stderr - +2025-02-05 22:30:06 - ERROR - stderr - +2025-02-05 22:30:06 - INFO - stdout - {'loss': 0.7614, 'grad_norm': 1.334119439125061, 'learning_rate': 8.111439963352284e-06, 'epoch': 1.72} +2025-02-05 22:30:06 - ERROR - stderr - 57%|█████▋ | 12870/22434 [12:22:26<6:48:19, 2.56s/it] +2025-02-05 22:30:08 - ERROR - stderr - 57%|█████▋ | 12871/22434 [12:22:28<6:41:04, 2.52s/it] +2025-02-05 22:30:08 - ERROR - stderr - +2025-02-05 22:30:08 - ERROR - stderr - +2025-02-05 22:30:08 - INFO - stdout - {'loss': 0.7277, 'grad_norm': 1.2802408933639526, 'learning_rate': 8.110022216937923e-06, 'epoch': 1.72} +2025-02-05 22:30:08 - ERROR - stderr - 57%|█████▋ | 12871/22434 
[12:22:28<6:41:04, 2.52s/it] +2025-02-05 22:30:11 - ERROR - stderr - 57%|█████▋ | 12872/22434 [12:22:31<6:51:13, 2.58s/it] +2025-02-05 22:30:11 - ERROR - stderr - +2025-02-05 22:30:11 - ERROR - stderr - +2025-02-05 22:30:11 - INFO - stdout - {'loss': 0.7003, 'grad_norm': 1.2406387329101562, 'learning_rate': 8.108604509918344e-06, 'epoch': 1.72} +2025-02-05 22:30:11 - ERROR - stderr - 57%|█████▋ | 12872/22434 [12:22:31<6:51:13, 2.58s/it] +2025-02-05 22:30:13 - ERROR - stderr - 57%|█████▋ | 12873/22434 [12:22:33<6:43:09, 2.53s/it] +2025-02-05 22:30:13 - ERROR - stderr - +2025-02-05 22:30:13 - ERROR - stderr - +2025-02-05 22:30:13 - INFO - stdout - {'loss': 0.6048, 'grad_norm': 1.1272902488708496, 'learning_rate': 8.107186842323091e-06, 'epoch': 1.72} +2025-02-05 22:30:13 - ERROR - stderr - 57%|█████▋ | 12873/22434 [12:22:33<6:43:09, 2.53s/it] +2025-02-05 22:30:16 - ERROR - stderr - 57%|█████▋ | 12874/22434 [12:22:36<6:44:47, 2.54s/it] +2025-02-05 22:30:16 - ERROR - stderr - +2025-02-05 22:30:16 - ERROR - stderr - +2025-02-05 22:30:16 - INFO - stdout - {'loss': 0.641, 'grad_norm': 1.140223741531372, 'learning_rate': 8.10576921418172e-06, 'epoch': 1.72} +2025-02-05 22:30:16 - ERROR - stderr - 57%|█████▋ | 12874/22434 [12:22:36<6:44:47, 2.54s/it] +2025-02-05 22:30:19 - ERROR - stderr - 57%|█████▋ | 12875/22434 [12:22:39<7:05:08, 2.67s/it] +2025-02-05 22:30:19 - ERROR - stderr - +2025-02-05 22:30:19 - ERROR - stderr - +2025-02-05 22:30:19 - INFO - stdout - {'loss': 0.5968, 'grad_norm': 1.1854684352874756, 'learning_rate': 8.104351625523778e-06, 'epoch': 1.72} +2025-02-05 22:30:19 - ERROR - stderr - 57%|█████▋ | 12875/22434 [12:22:39<7:05:08, 2.67s/it] +2025-02-05 22:30:21 - ERROR - stderr - 57%|█████▋ | 12876/22434 [12:22:41<6:57:57, 2.62s/it] +2025-02-05 22:30:21 - ERROR - stderr - +2025-02-05 22:30:21 - ERROR - stderr - +2025-02-05 22:30:21 - INFO - stdout - {'loss': 0.7058, 'grad_norm': 1.1487549543380737, 'learning_rate': 8.102934076378809e-06, 'epoch': 1.72} 
+2025-02-05 22:30:21 - ERROR - stderr - 57%|█████▋ | 12876/22434 [12:22:41<6:57:57, 2.62s/it] +2025-02-05 22:30:24 - ERROR - stderr - 57%|█████▋ | 12877/22434 [12:22:44<6:51:53, 2.59s/it] +2025-02-05 22:30:24 - ERROR - stderr - +2025-02-05 22:30:24 - ERROR - stderr - +2025-02-05 22:30:24 - INFO - stdout - {'loss': 0.6731, 'grad_norm': 1.2717201709747314, 'learning_rate': 8.101516566776368e-06, 'epoch': 1.72} +2025-02-05 22:30:24 - ERROR - stderr - 57%|█████▋ | 12877/22434 [12:22:44<6:51:53, 2.59s/it] +2025-02-05 22:30:26 - ERROR - stderr - 57%|█████▋ | 12878/22434 [12:22:46<6:50:16, 2.58s/it] +2025-02-05 22:30:27 - ERROR - stderr - +2025-02-05 22:30:27 - ERROR - stderr - +2025-02-05 22:30:27 - INFO - stdout - {'loss': 0.7058, 'grad_norm': 1.3073252439498901, 'learning_rate': 8.100099096745995e-06, 'epoch': 1.72} +2025-02-05 22:30:27 - ERROR - stderr - 57%|█████▋ | 12878/22434 [12:22:46<6:50:16, 2.58s/it] +2025-02-05 22:30:29 - ERROR - stderr - 57%|█████▋ | 12879/22434 [12:22:49<6:47:54, 2.56s/it] +2025-02-05 22:30:29 - ERROR - stderr - +2025-02-05 22:30:29 - ERROR - stderr - +2025-02-05 22:30:29 - INFO - stdout - {'loss': 0.7122, 'grad_norm': 1.2961386442184448, 'learning_rate': 8.098681666317239e-06, 'epoch': 1.72} +2025-02-05 22:30:29 - ERROR - stderr - 57%|█████▋ | 12879/22434 [12:22:49<6:47:54, 2.56s/it] +2025-02-05 22:30:32 - ERROR - stderr - 57%|█████▋ | 12880/22434 [12:22:52<7:03:59, 2.66s/it] +2025-02-05 22:30:32 - ERROR - stderr - +2025-02-05 22:30:32 - ERROR - stderr - +2025-02-05 22:30:32 - INFO - stdout - {'loss': 0.7094, 'grad_norm': 1.2053905725479126, 'learning_rate': 8.097264275519643e-06, 'epoch': 1.72} +2025-02-05 22:30:32 - ERROR - stderr - 57%|█████▋ | 12880/22434 [12:22:52<7:03:59, 2.66s/it] +2025-02-05 22:30:34 - ERROR - stderr - 57%|█████▋ | 12881/22434 [12:22:54<6:54:23, 2.60s/it] +2025-02-05 22:30:34 - ERROR - stderr - +2025-02-05 22:30:34 - ERROR - stderr - +2025-02-05 22:30:34 - INFO - stdout - {'loss': 0.684, 'grad_norm': 
1.216880440711975, 'learning_rate': 8.095846924382751e-06, 'epoch': 1.72} +2025-02-05 22:30:34 - ERROR - stderr - 57%|█████▋ | 12881/22434 [12:22:54<6:54:23, 2.60s/it] +2025-02-05 22:30:37 - ERROR - stderr - 57%|█████▋ | 12882/22434 [12:22:57<6:54:01, 2.60s/it] +2025-02-05 22:30:37 - ERROR - stderr - +2025-02-05 22:30:37 - ERROR - stderr - +2025-02-05 22:30:37 - INFO - stdout - {'loss': 0.6824, 'grad_norm': 1.2305643558502197, 'learning_rate': 8.094429612936111e-06, 'epoch': 1.72} +2025-02-05 22:30:37 - ERROR - stderr - 57%|█████▋ | 12882/22434 [12:22:57<6:54:01, 2.60s/it] +2025-02-05 22:30:39 - ERROR - stderr - 57%|█████▋ | 12883/22434 [12:22:59<6:50:01, 2.58s/it] +2025-02-05 22:30:40 - ERROR - stderr - +2025-02-05 22:30:40 - ERROR - stderr - +2025-02-05 22:30:40 - INFO - stdout - {'loss': 0.6969, 'grad_norm': 1.25296151638031, 'learning_rate': 8.093012341209264e-06, 'epoch': 1.72} +2025-02-05 22:30:40 - ERROR - stderr - 57%|█████▋ | 12883/22434 [12:22:59<6:50:01, 2.58s/it] +2025-02-05 22:30:42 - ERROR - stderr - 57%|█████▋ | 12884/22434 [12:23:02<6:50:02, 2.58s/it] +2025-02-05 22:30:42 - ERROR - stderr - +2025-02-05 22:30:42 - ERROR - stderr - +2025-02-05 22:30:42 - INFO - stdout - {'loss': 0.6558, 'grad_norm': 1.2045902013778687, 'learning_rate': 8.091595109231745e-06, 'epoch': 1.72} +2025-02-05 22:30:42 - ERROR - stderr - 57%|█████▋ | 12884/22434 [12:23:02<6:50:02, 2.58s/it] +2025-02-05 22:30:45 - ERROR - stderr - 57%|█████▋ | 12885/22434 [12:23:04<6:47:31, 2.56s/it] +2025-02-05 22:30:45 - ERROR - stderr - +2025-02-05 22:30:45 - ERROR - stderr - +2025-02-05 22:30:45 - INFO - stdout - {'loss': 0.6761, 'grad_norm': 1.1353389024734497, 'learning_rate': 8.090177917033102e-06, 'epoch': 1.72} +2025-02-05 22:30:45 - ERROR - stderr - 57%|█████▋ | 12885/22434 [12:23:04<6:47:31, 2.56s/it] +2025-02-05 22:30:47 - ERROR - stderr - 57%|█████▋ | 12886/22434 [12:23:07<6:44:29, 2.54s/it] +2025-02-05 22:30:47 - ERROR - stderr - +2025-02-05 22:30:47 - ERROR - stderr - +2025-02-05 
22:30:47 - INFO - stdout - {'loss': 0.6793, 'grad_norm': 1.2239171266555786, 'learning_rate': 8.088760764642874e-06, 'epoch': 1.72} +2025-02-05 22:30:47 - ERROR - stderr - 57%|█████▋ | 12886/22434 [12:23:07<6:44:29, 2.54s/it] +2025-02-05 22:30:50 - ERROR - stderr - 57%|█████▋ | 12887/22434 [12:23:09<6:43:02, 2.53s/it] +2025-02-05 22:30:50 - ERROR - stderr - +2025-02-05 22:30:50 - ERROR - stderr - +2025-02-05 22:30:50 - INFO - stdout - {'loss': 0.7066, 'grad_norm': 1.1530975103378296, 'learning_rate': 8.087343652090595e-06, 'epoch': 1.72} +2025-02-05 22:30:50 - ERROR - stderr - 57%|█████▋ | 12887/22434 [12:23:09<6:43:02, 2.53s/it] +2025-02-05 22:30:52 - ERROR - stderr - 57%|█████▋ | 12888/22434 [12:23:12<6:43:13, 2.53s/it] +2025-02-05 22:30:52 - ERROR - stderr - +2025-02-05 22:30:52 - ERROR - stderr - +2025-02-05 22:30:52 - INFO - stdout - {'loss': 0.611, 'grad_norm': 1.189701795578003, 'learning_rate': 8.085926579405814e-06, 'epoch': 1.72} +2025-02-05 22:30:52 - ERROR - stderr - 57%|█████▋ | 12888/22434 [12:23:12<6:43:13, 2.53s/it] +2025-02-05 22:30:55 - ERROR - stderr - 57%|█████▋ | 12889/22434 [12:23:14<6:40:23, 2.52s/it] +2025-02-05 22:30:55 - ERROR - stderr - +2025-02-05 22:30:55 - ERROR - stderr - +2025-02-05 22:30:55 - INFO - stdout - {'loss': 0.6989, 'grad_norm': 1.1637330055236816, 'learning_rate': 8.084509546618055e-06, 'epoch': 1.72} +2025-02-05 22:30:55 - ERROR - stderr - 57%|█████▋ | 12889/22434 [12:23:14<6:40:23, 2.52s/it] +2025-02-05 22:30:57 - ERROR - stderr - 57%|█████▋ | 12890/22434 [12:23:17<6:41:23, 2.52s/it] +2025-02-05 22:30:57 - ERROR - stderr - +2025-02-05 22:30:57 - ERROR - stderr - +2025-02-05 22:30:57 - INFO - stdout - {'loss': 0.6647, 'grad_norm': 1.199289083480835, 'learning_rate': 8.083092553756866e-06, 'epoch': 1.72} +2025-02-05 22:30:57 - ERROR - stderr - 57%|█████▋ | 12890/22434 [12:23:17<6:41:23, 2.52s/it] +2025-02-05 22:31:00 - ERROR - stderr - 57%|█████▋ | 12891/22434 [12:23:19<6:41:53, 2.53s/it] +2025-02-05 22:31:00 - ERROR - 
stderr - +2025-02-05 22:31:00 - ERROR - stderr - +2025-02-05 22:31:00 - INFO - stdout - {'loss': 0.6444, 'grad_norm': 1.2631560564041138, 'learning_rate': 8.081675600851779e-06, 'epoch': 1.72} +2025-02-05 22:31:00 - ERROR - stderr - 57%|█████▋ | 12891/22434 [12:23:20<6:41:53, 2.53s/it] +2025-02-05 22:31:02 - ERROR - stderr - 57%|█████▋ | 12892/22434 [12:23:22<6:36:55, 2.50s/it] +2025-02-05 22:31:02 - ERROR - stderr - +2025-02-05 22:31:02 - ERROR - stderr - +2025-02-05 22:31:02 - INFO - stdout - {'loss': 0.7455, 'grad_norm': 1.3155760765075684, 'learning_rate': 8.080258687932326e-06, 'epoch': 1.72} +2025-02-05 22:31:02 - ERROR - stderr - 57%|█████▋ | 12892/22434 [12:23:22<6:36:55, 2.50s/it] +2025-02-05 22:31:05 - ERROR - stderr - 57%|█████▋ | 12893/22434 [12:23:24<6:35:33, 2.49s/it] +2025-02-05 22:31:05 - ERROR - stderr - +2025-02-05 22:31:05 - ERROR - stderr - +2025-02-05 22:31:05 - INFO - stdout - {'loss': 0.7544, 'grad_norm': 1.3042099475860596, 'learning_rate': 8.078841815028043e-06, 'epoch': 1.72} +2025-02-05 22:31:05 - ERROR - stderr - 57%|█████▋ | 12893/22434 [12:23:24<6:35:33, 2.49s/it] +2025-02-05 22:31:07 - ERROR - stderr - 57%|█████▋ | 12894/22434 [12:23:27<6:38:25, 2.51s/it] +2025-02-05 22:31:07 - ERROR - stderr - +2025-02-05 22:31:07 - ERROR - stderr - +2025-02-05 22:31:07 - INFO - stdout - {'loss': 0.6455, 'grad_norm': 1.2959418296813965, 'learning_rate': 8.077424982168467e-06, 'epoch': 1.72} +2025-02-05 22:31:07 - ERROR - stderr - 57%|█████▋ | 12894/22434 [12:23:27<6:38:25, 2.51s/it] +2025-02-05 22:31:10 - ERROR - stderr - 57%|█████▋ | 12895/22434 [12:23:29<6:40:07, 2.52s/it] +2025-02-05 22:31:10 - ERROR - stderr - +2025-02-05 22:31:10 - ERROR - stderr - +2025-02-05 22:31:10 - INFO - stdout - {'loss': 0.666, 'grad_norm': 1.1764159202575684, 'learning_rate': 8.076008189383125e-06, 'epoch': 1.72} +2025-02-05 22:31:10 - ERROR - stderr - 57%|█████▋ | 12895/22434 [12:23:29<6:40:07, 2.52s/it] +2025-02-05 22:31:12 - ERROR - stderr - 57%|█████▋ | 12896/22434 
[12:23:32<6:36:39, 2.50s/it] +2025-02-05 22:31:12 - ERROR - stderr - +2025-02-05 22:31:12 - ERROR - stderr - +2025-02-05 22:31:12 - INFO - stdout - {'loss': 0.6683, 'grad_norm': 1.4329313039779663, 'learning_rate': 8.074591436701554e-06, 'epoch': 1.72} +2025-02-05 22:31:12 - ERROR - stderr - 57%|█████▋ | 12896/22434 [12:23:32<6:36:39, 2.50s/it] +2025-02-05 22:31:15 - ERROR - stderr - 57%|█████▋ | 12897/22434 [12:23:34<6:36:32, 2.49s/it] +2025-02-05 22:31:15 - ERROR - stderr - +2025-02-05 22:31:15 - ERROR - stderr - +2025-02-05 22:31:15 - INFO - stdout - {'loss': 0.6584, 'grad_norm': 1.2898163795471191, 'learning_rate': 8.073174724153278e-06, 'epoch': 1.72} +2025-02-05 22:31:15 - ERROR - stderr - 57%|█████▋ | 12897/22434 [12:23:34<6:36:32, 2.49s/it] +2025-02-05 22:31:17 - ERROR - stderr - 57%|█████▋ | 12898/22434 [12:23:37<6:40:20, 2.52s/it] +2025-02-05 22:31:17 - ERROR - stderr - +2025-02-05 22:31:17 - ERROR - stderr - +2025-02-05 22:31:17 - INFO - stdout - {'loss': 0.6806, 'grad_norm': 1.1148664951324463, 'learning_rate': 8.071758051767833e-06, 'epoch': 1.72} +2025-02-05 22:31:17 - ERROR - stderr - 57%|█████▋ | 12898/22434 [12:23:37<6:40:20, 2.52s/it] +2025-02-05 22:31:20 - ERROR - stderr - 57%|█████▋ | 12899/22434 [12:23:39<6:38:36, 2.51s/it] +2025-02-05 22:31:20 - ERROR - stderr - +2025-02-05 22:31:20 - ERROR - stderr - +2025-02-05 22:31:20 - INFO - stdout - {'loss': 0.7153, 'grad_norm': 1.1910672187805176, 'learning_rate': 8.070341419574748e-06, 'epoch': 1.72} +2025-02-05 22:31:20 - ERROR - stderr - 57%|█████▋ | 12899/22434 [12:23:39<6:38:36, 2.51s/it] +2025-02-05 22:31:22 - ERROR - stderr - 58%|█████▊ | 12900/22434 [12:23:42<6:36:57, 2.50s/it] +2025-02-05 22:31:22 - ERROR - stderr - +2025-02-05 22:31:22 - ERROR - stderr - +2025-02-05 22:31:22 - INFO - stdout - {'loss': 0.6382, 'grad_norm': 1.185206651687622, 'learning_rate': 8.068924827603545e-06, 'epoch': 1.73} +2025-02-05 22:31:22 - ERROR - stderr - 58%|█████▊ | 12900/22434 [12:23:42<6:36:57, 2.50s/it] 
+2025-02-05 22:31:25 - ERROR - stderr - 58%|█████▊ | 12901/22434 [12:23:44<6:41:02, 2.52s/it] +2025-02-05 22:31:25 - ERROR - stderr - +2025-02-05 22:31:25 - ERROR - stderr - +2025-02-05 22:31:25 - INFO - stdout - {'loss': 0.7217, 'grad_norm': 1.2356975078582764, 'learning_rate': 8.067508275883763e-06, 'epoch': 1.73} +2025-02-05 22:31:25 - ERROR - stderr - 58%|█████▊ | 12901/22434 [12:23:45<6:41:02, 2.52s/it] +2025-02-05 22:31:27 - ERROR - stderr - 58%|█████▊ | 12902/22434 [12:23:47<6:37:39, 2.50s/it] +2025-02-05 22:31:27 - ERROR - stderr - +2025-02-05 22:31:27 - ERROR - stderr - +2025-02-05 22:31:27 - INFO - stdout - {'loss': 0.6756, 'grad_norm': 1.1419377326965332, 'learning_rate': 8.066091764444918e-06, 'epoch': 1.73} +2025-02-05 22:31:27 - ERROR - stderr - 58%|█████▊ | 12902/22434 [12:23:47<6:37:39, 2.50s/it] +2025-02-05 22:31:30 - ERROR - stderr - 58%|█████▊ | 12903/22434 [12:23:50<6:41:10, 2.53s/it] +2025-02-05 22:31:30 - ERROR - stderr - +2025-02-05 22:31:30 - ERROR - stderr - +2025-02-05 22:31:30 - INFO - stdout - {'loss': 0.6484, 'grad_norm': 1.1161158084869385, 'learning_rate': 8.064675293316538e-06, 'epoch': 1.73} +2025-02-05 22:31:30 - ERROR - stderr - 58%|█████▊ | 12903/22434 [12:23:50<6:41:10, 2.53s/it] +2025-02-05 22:31:32 - ERROR - stderr - 58%|█████▊ | 12904/22434 [12:23:52<6:36:30, 2.50s/it] +2025-02-05 22:31:32 - ERROR - stderr - +2025-02-05 22:31:32 - ERROR - stderr - +2025-02-05 22:31:32 - INFO - stdout - {'loss': 0.7253, 'grad_norm': 1.3562854528427124, 'learning_rate': 8.063258862528151e-06, 'epoch': 1.73} +2025-02-05 22:31:32 - ERROR - stderr - 58%|█████▊ | 12904/22434 [12:23:52<6:36:30, 2.50s/it] +2025-02-05 22:31:35 - ERROR - stderr - 58%|█████▊ | 12905/22434 [12:23:54<6:36:11, 2.49s/it] +2025-02-05 22:31:35 - ERROR - stderr - +2025-02-05 22:31:35 - ERROR - stderr - +2025-02-05 22:31:35 - INFO - stdout - {'loss': 0.7605, 'grad_norm': 1.3898921012878418, 'learning_rate': 8.06184247210928e-06, 'epoch': 1.73} +2025-02-05 22:31:35 - ERROR - 
stderr - 58%|█████▊ | 12905/22434 [12:23:54<6:36:11, 2.49s/it] +2025-02-05 22:31:37 - ERROR - stderr - 58%|█████▊ | 12906/22434 [12:23:57<6:34:56, 2.49s/it] +2025-02-05 22:31:37 - ERROR - stderr - +2025-02-05 22:31:37 - ERROR - stderr - +2025-02-05 22:31:37 - INFO - stdout - {'loss': 0.6522, 'grad_norm': 1.2959322929382324, 'learning_rate': 8.060426122089448e-06, 'epoch': 1.73} +2025-02-05 22:31:37 - ERROR - stderr - 58%|█████▊ | 12906/22434 [12:23:57<6:34:56, 2.49s/it] +2025-02-05 22:31:40 - ERROR - stderr - 58%|█████▊ | 12907/22434 [12:23:59<6:35:28, 2.49s/it] +2025-02-05 22:31:40 - ERROR - stderr - +2025-02-05 22:31:40 - ERROR - stderr - +2025-02-05 22:31:40 - INFO - stdout - {'loss': 0.7239, 'grad_norm': 1.3542392253875732, 'learning_rate': 8.059009812498179e-06, 'epoch': 1.73} +2025-02-05 22:31:40 - ERROR - stderr - 58%|█████▊ | 12907/22434 [12:23:59<6:35:28, 2.49s/it] +2025-02-05 22:31:42 - ERROR - stderr - 58%|█████▊ | 12908/22434 [12:24:02<6:37:53, 2.51s/it] +2025-02-05 22:31:42 - ERROR - stderr - +2025-02-05 22:31:42 - ERROR - stderr - +2025-02-05 22:31:42 - INFO - stdout - {'loss': 0.7222, 'grad_norm': 1.2368452548980713, 'learning_rate': 8.057593543364991e-06, 'epoch': 1.73} +2025-02-05 22:31:42 - ERROR - stderr - 58%|█████▊ | 12908/22434 [12:24:02<6:37:53, 2.51s/it] +2025-02-05 22:31:45 - ERROR - stderr - 58%|█████▊ | 12909/22434 [12:24:05<6:39:49, 2.52s/it] +2025-02-05 22:31:45 - ERROR - stderr - +2025-02-05 22:31:45 - ERROR - stderr - +2025-02-05 22:31:45 - INFO - stdout - {'loss': 0.7307, 'grad_norm': 1.1805928945541382, 'learning_rate': 8.05617731471941e-06, 'epoch': 1.73} +2025-02-05 22:31:45 - ERROR - stderr - 58%|█████▊ | 12909/22434 [12:24:05<6:39:49, 2.52s/it] +2025-02-05 22:31:47 - ERROR - stderr - 58%|█████▊ | 12910/22434 [12:24:07<6:35:53, 2.49s/it] +2025-02-05 22:31:47 - ERROR - stderr - +2025-02-05 22:31:47 - ERROR - stderr - +2025-02-05 22:31:47 - INFO - stdout - {'loss': 0.6286, 'grad_norm': 1.1964836120605469, 'learning_rate': 
8.05476112659095e-06, 'epoch': 1.73} +2025-02-05 22:31:47 - ERROR - stderr - 58%|█████▊ | 12910/22434 [12:24:07<6:35:53, 2.49s/it] +2025-02-05 22:31:50 - ERROR - stderr - 58%|█████▊ | 12911/22434 [12:24:09<6:36:25, 2.50s/it] +2025-02-05 22:31:50 - ERROR - stderr - +2025-02-05 22:31:50 - ERROR - stderr - +2025-02-05 22:31:50 - INFO - stdout - {'loss': 0.6727, 'grad_norm': 1.2369167804718018, 'learning_rate': 8.053344979009134e-06, 'epoch': 1.73} +2025-02-05 22:31:50 - ERROR - stderr - 58%|█████▊ | 12911/22434 [12:24:09<6:36:25, 2.50s/it] +2025-02-05 22:31:52 - ERROR - stderr - 58%|█████▊ | 12912/22434 [12:24:12<6:37:03, 2.50s/it] +2025-02-05 22:31:52 - ERROR - stderr - +2025-02-05 22:31:52 - ERROR - stderr - +2025-02-05 22:31:52 - INFO - stdout - {'loss': 0.7123, 'grad_norm': 1.1599445343017578, 'learning_rate': 8.051928872003477e-06, 'epoch': 1.73} +2025-02-05 22:31:52 - ERROR - stderr - 58%|█████▊ | 12912/22434 [12:24:12<6:37:03, 2.50s/it] +2025-02-05 22:31:55 - ERROR - stderr - 58%|█████▊ | 12913/22434 [12:24:14<6:38:19, 2.51s/it] +2025-02-05 22:31:55 - ERROR - stderr - +2025-02-05 22:31:55 - ERROR - stderr - +2025-02-05 22:31:55 - INFO - stdout - {'loss': 0.734, 'grad_norm': 1.2838554382324219, 'learning_rate': 8.050512805603498e-06, 'epoch': 1.73} +2025-02-05 22:31:55 - ERROR - stderr - 58%|█████▊ | 12913/22434 [12:24:15<6:38:19, 2.51s/it] +2025-02-05 22:31:57 - ERROR - stderr - 58%|█████▊ | 12914/22434 [12:24:17<6:36:28, 2.50s/it] +2025-02-05 22:31:57 - ERROR - stderr - +2025-02-05 22:31:57 - ERROR - stderr - +2025-02-05 22:31:57 - INFO - stdout - {'loss': 0.7254, 'grad_norm': 1.2725017070770264, 'learning_rate': 8.04909677983872e-06, 'epoch': 1.73} +2025-02-05 22:31:57 - ERROR - stderr - 58%|█████▊ | 12914/22434 [12:24:17<6:36:28, 2.50s/it] +2025-02-05 22:32:00 - ERROR - stderr - 58%|█████▊ | 12915/22434 [12:24:20<6:41:32, 2.53s/it] +2025-02-05 22:32:00 - ERROR - stderr - +2025-02-05 22:32:00 - ERROR - stderr - +2025-02-05 22:32:00 - INFO - stdout - {'loss': 
0.7444, 'grad_norm': 1.2023309469223022, 'learning_rate': 8.04768079473865e-06, 'epoch': 1.73} +2025-02-05 22:32:00 - ERROR - stderr - 58%|█████▊ | 12915/22434 [12:24:20<6:41:32, 2.53s/it] +2025-02-05 22:32:02 - ERROR - stderr - 58%|█████▊ | 12916/22434 [12:24:22<6:42:31, 2.54s/it] +2025-02-05 22:32:02 - ERROR - stderr - +2025-02-05 22:32:02 - ERROR - stderr - +2025-02-05 22:32:02 - INFO - stdout - {'loss': 0.6093, 'grad_norm': 1.1640784740447998, 'learning_rate': 8.046264850332802e-06, 'epoch': 1.73} +2025-02-05 22:32:02 - ERROR - stderr - 58%|█████▊ | 12916/22434 [12:24:22<6:42:31, 2.54s/it] +2025-02-05 22:32:05 - ERROR - stderr - 58%|█████▊ | 12917/22434 [12:24:25<6:42:59, 2.54s/it] +2025-02-05 22:32:05 - ERROR - stderr - +2025-02-05 22:32:05 - ERROR - stderr - +2025-02-05 22:32:05 - INFO - stdout - {'loss': 0.6678, 'grad_norm': 1.2280038595199585, 'learning_rate': 8.044848946650696e-06, 'epoch': 1.73} +2025-02-05 22:32:05 - ERROR - stderr - 58%|█████▊ | 12917/22434 [12:24:25<6:42:59, 2.54s/it] +2025-02-05 22:32:07 - ERROR - stderr - 58%|█████▊ | 12918/22434 [12:24:27<6:38:57, 2.52s/it] +2025-02-05 22:32:07 - ERROR - stderr - +2025-02-05 22:32:07 - ERROR - stderr - +2025-02-05 22:32:07 - INFO - stdout - {'loss': 0.7098, 'grad_norm': 1.2812857627868652, 'learning_rate': 8.043433083721843e-06, 'epoch': 1.73} +2025-02-05 22:32:07 - ERROR - stderr - 58%|█████▊ | 12918/22434 [12:24:27<6:38:57, 2.52s/it] +2025-02-05 22:32:10 - ERROR - stderr - 58%|█████▊ | 12919/22434 [12:24:30<6:38:43, 2.51s/it] +2025-02-05 22:32:10 - ERROR - stderr - +2025-02-05 22:32:10 - ERROR - stderr - +2025-02-05 22:32:10 - INFO - stdout - {'loss': 0.6582, 'grad_norm': 1.1899956464767456, 'learning_rate': 8.042017261575756e-06, 'epoch': 1.73} +2025-02-05 22:32:10 - ERROR - stderr - 58%|█████▊ | 12919/22434 [12:24:30<6:38:43, 2.51s/it] +2025-02-05 22:32:12 - ERROR - stderr - 58%|█████▊ | 12920/22434 [12:24:32<6:38:25, 2.51s/it] +2025-02-05 22:32:12 - ERROR - stderr - +2025-02-05 22:32:12 - ERROR 
- stderr - +2025-02-05 22:32:12 - INFO - stdout - {'loss': 0.7497, 'grad_norm': 1.3713732957839966, 'learning_rate': 8.040601480241948e-06, 'epoch': 1.73} +2025-02-05 22:32:12 - ERROR - stderr - 58%|█████▊ | 12920/22434 [12:24:32<6:38:25, 2.51s/it] +2025-02-05 22:32:15 - ERROR - stderr - 58%|█████▊ | 12921/22434 [12:24:35<6:44:50, 2.55s/it] +2025-02-05 22:32:15 - ERROR - stderr - +2025-02-05 22:32:15 - ERROR - stderr - +2025-02-05 22:32:15 - INFO - stdout - {'loss': 0.6391, 'grad_norm': 1.2832385301589966, 'learning_rate': 8.03918573974992e-06, 'epoch': 1.73} +2025-02-05 22:32:15 - ERROR - stderr - 58%|█████▊ | 12921/22434 [12:24:35<6:44:50, 2.55s/it] +2025-02-05 22:32:17 - ERROR - stderr - 58%|█████▊ | 12922/22434 [12:24:37<6:39:13, 2.52s/it] +2025-02-05 22:32:18 - ERROR - stderr - +2025-02-05 22:32:18 - ERROR - stderr - +2025-02-05 22:32:18 - INFO - stdout - {'loss': 0.7234, 'grad_norm': 1.3006452322006226, 'learning_rate': 8.037770040129196e-06, 'epoch': 1.73} +2025-02-05 22:32:18 - ERROR - stderr - 58%|█████▊ | 12922/22434 [12:24:37<6:39:13, 2.52s/it] +2025-02-05 22:32:20 - ERROR - stderr - 58%|█████▊ | 12923/22434 [12:24:40<6:41:55, 2.54s/it] +2025-02-05 22:32:20 - ERROR - stderr - +2025-02-05 22:32:20 - ERROR - stderr - +2025-02-05 22:32:20 - INFO - stdout - {'loss': 0.6505, 'grad_norm': 1.181689739227295, 'learning_rate': 8.036354381409276e-06, 'epoch': 1.73} +2025-02-05 22:32:20 - ERROR - stderr - 58%|█████▊ | 12923/22434 [12:24:40<6:41:55, 2.54s/it] +2025-02-05 22:32:23 - ERROR - stderr - 58%|█████▊ | 12924/22434 [12:24:42<6:39:54, 2.52s/it] +2025-02-05 22:32:23 - ERROR - stderr - +2025-02-05 22:32:23 - ERROR - stderr - +2025-02-05 22:32:23 - INFO - stdout - {'loss': 0.7704, 'grad_norm': 1.2600747346878052, 'learning_rate': 8.034938763619667e-06, 'epoch': 1.73} +2025-02-05 22:32:23 - ERROR - stderr - 58%|█████▊ | 12924/22434 [12:24:42<6:39:54, 2.52s/it] +2025-02-05 22:32:25 - ERROR - stderr - 58%|█████▊ | 12925/22434 [12:24:45<6:49:13, 2.58s/it] 
+2025-02-05 22:32:25 - ERROR - stderr - +2025-02-05 22:32:25 - ERROR - stderr - +2025-02-05 22:32:25 - INFO - stdout - {'loss': 0.7337, 'grad_norm': 1.239434838294983, 'learning_rate': 8.03352318678988e-06, 'epoch': 1.73} +2025-02-05 22:32:25 - ERROR - stderr - 58%|█████▊ | 12925/22434 [12:24:45<6:49:13, 2.58s/it] +2025-02-05 22:32:28 - ERROR - stderr - 58%|█████▊ | 12926/22434 [12:24:47<6:41:04, 2.53s/it] +2025-02-05 22:32:28 - ERROR - stderr - +2025-02-05 22:32:28 - ERROR - stderr - +2025-02-05 22:32:28 - INFO - stdout - {'loss': 0.7203, 'grad_norm': 1.4491002559661865, 'learning_rate': 8.03210765094942e-06, 'epoch': 1.73} +2025-02-05 22:32:28 - ERROR - stderr - 58%|█████▊ | 12926/22434 [12:24:47<6:41:04, 2.53s/it] +2025-02-05 22:32:30 - ERROR - stderr - 58%|█████▊ | 12927/22434 [12:24:50<6:40:52, 2.53s/it] +2025-02-05 22:32:30 - ERROR - stderr - +2025-02-05 22:32:30 - ERROR - stderr - +2025-02-05 22:32:30 - INFO - stdout - {'loss': 0.7267, 'grad_norm': 1.410421371459961, 'learning_rate': 8.030692156127797e-06, 'epoch': 1.73} +2025-02-05 22:32:30 - ERROR - stderr - 58%|█████▊ | 12927/22434 [12:24:50<6:40:52, 2.53s/it] +2025-02-05 22:32:33 - ERROR - stderr - 58%|█████▊ | 12928/22434 [12:24:52<6:39:38, 2.52s/it] +2025-02-05 22:32:33 - ERROR - stderr - +2025-02-05 22:32:33 - ERROR - stderr - +2025-02-05 22:32:33 - INFO - stdout - {'loss': 0.6297, 'grad_norm': 1.124375581741333, 'learning_rate': 8.029276702354511e-06, 'epoch': 1.73} +2025-02-05 22:32:33 - ERROR - stderr - 58%|█████▊ | 12928/22434 [12:24:53<6:39:38, 2.52s/it] +2025-02-05 22:32:35 - ERROR - stderr - 58%|█████▊ | 12929/22434 [12:24:55<6:50:21, 2.59s/it] +2025-02-05 22:32:35 - ERROR - stderr - +2025-02-05 22:32:35 - ERROR - stderr - +2025-02-05 22:32:35 - INFO - stdout - {'loss': 0.6466, 'grad_norm': 1.3015804290771484, 'learning_rate': 8.027861289659062e-06, 'epoch': 1.73} +2025-02-05 22:32:35 - ERROR - stderr - 58%|█████▊ | 12929/22434 [12:24:55<6:50:21, 2.59s/it] +2025-02-05 22:32:38 - ERROR - stderr 
- 58%|█████▊ | 12930/22434 [12:24:58<6:43:50, 2.55s/it] +2025-02-05 22:32:38 - ERROR - stderr - +2025-02-05 22:32:38 - ERROR - stderr - +2025-02-05 22:32:38 - INFO - stdout - {'loss': 0.6978, 'grad_norm': 1.2716597318649292, 'learning_rate': 8.026445918070963e-06, 'epoch': 1.73} +2025-02-05 22:32:38 - ERROR - stderr - 58%|█████▊ | 12930/22434 [12:24:58<6:43:50, 2.55s/it] +2025-02-05 22:32:40 - ERROR - stderr - 58%|█████▊ | 12931/22434 [12:25:00<6:45:42, 2.56s/it] +2025-02-05 22:32:41 - ERROR - stderr - +2025-02-05 22:32:41 - ERROR - stderr - +2025-02-05 22:32:41 - INFO - stdout - {'loss': 0.6958, 'grad_norm': 1.180567741394043, 'learning_rate': 8.025030587619706e-06, 'epoch': 1.73} +2025-02-05 22:32:41 - ERROR - stderr - 58%|█████▊ | 12931/22434 [12:25:00<6:45:42, 2.56s/it] +2025-02-05 22:32:43 - ERROR - stderr - 58%|█████▊ | 12932/22434 [12:25:03<6:43:43, 2.55s/it] +2025-02-05 22:32:43 - ERROR - stderr - +2025-02-05 22:32:43 - ERROR - stderr - +2025-02-05 22:32:43 - INFO - stdout - {'loss': 0.7462, 'grad_norm': 1.2131541967391968, 'learning_rate': 8.023615298334796e-06, 'epoch': 1.73} +2025-02-05 22:32:43 - ERROR - stderr - 58%|█████▊ | 12932/22434 [12:25:03<6:43:43, 2.55s/it] +2025-02-05 22:32:46 - ERROR - stderr - 58%|█████▊ | 12933/22434 [12:25:05<6:41:26, 2.54s/it] +2025-02-05 22:32:46 - ERROR - stderr - +2025-02-05 22:32:46 - ERROR - stderr - +2025-02-05 22:32:46 - INFO - stdout - {'loss': 0.6923, 'grad_norm': 1.2852815389633179, 'learning_rate': 8.022200050245736e-06, 'epoch': 1.73} +2025-02-05 22:32:46 - ERROR - stderr - 58%|█████▊ | 12933/22434 [12:25:05<6:41:26, 2.54s/it] +2025-02-05 22:32:48 - ERROR - stderr - 58%|█████▊ | 12934/22434 [12:25:08<6:43:27, 2.55s/it] +2025-02-05 22:32:48 - ERROR - stderr - +2025-02-05 22:32:48 - ERROR - stderr - +2025-02-05 22:32:48 - INFO - stdout - {'loss': 0.6751, 'grad_norm': 1.182002067565918, 'learning_rate': 8.020784843382021e-06, 'epoch': 1.73} +2025-02-05 22:32:48 - ERROR - stderr - 58%|█████▊ | 12934/22434 
[12:25:08<6:43:27, 2.55s/it] +2025-02-05 22:32:51 - ERROR - stderr - 58%|█████▊ | 12935/22434 [12:25:10<6:44:52, 2.56s/it] +2025-02-05 22:32:51 - ERROR - stderr - +2025-02-05 22:32:51 - ERROR - stderr - +2025-02-05 22:32:51 - INFO - stdout - {'loss': 0.664, 'grad_norm': 1.2903915643692017, 'learning_rate': 8.019369677773155e-06, 'epoch': 1.73} +2025-02-05 22:32:51 - ERROR - stderr - 58%|█████▊ | 12935/22434 [12:25:10<6:44:52, 2.56s/it] +2025-02-05 22:32:53 - ERROR - stderr - 58%|█████▊ | 12936/22434 [12:25:13<6:43:55, 2.55s/it] +2025-02-05 22:32:53 - ERROR - stderr - +2025-02-05 22:32:53 - ERROR - stderr - +2025-02-05 22:32:53 - INFO - stdout - {'loss': 0.747, 'grad_norm': 1.2154886722564697, 'learning_rate': 8.017954553448632e-06, 'epoch': 1.73} +2025-02-05 22:32:53 - ERROR - stderr - 58%|█████▊ | 12936/22434 [12:25:13<6:43:55, 2.55s/it] +2025-02-05 22:32:56 - ERROR - stderr - 58%|█████▊ | 12937/22434 [12:25:16<6:42:52, 2.55s/it] +2025-02-05 22:32:56 - ERROR - stderr - +2025-02-05 22:32:56 - ERROR - stderr - +2025-02-05 22:32:56 - INFO - stdout - {'loss': 0.7992, 'grad_norm': 1.3928550481796265, 'learning_rate': 8.01653947043795e-06, 'epoch': 1.73} +2025-02-05 22:32:56 - ERROR - stderr - 58%|█████▊ | 12937/22434 [12:25:16<6:42:52, 2.55s/it] +2025-02-05 22:32:58 - ERROR - stderr - 58%|█████▊ | 12938/22434 [12:25:18<6:40:37, 2.53s/it] +2025-02-05 22:32:58 - ERROR - stderr - +2025-02-05 22:32:58 - ERROR - stderr - +2025-02-05 22:32:58 - INFO - stdout - {'loss': 0.6869, 'grad_norm': 1.1690040826797485, 'learning_rate': 8.015124428770605e-06, 'epoch': 1.73} +2025-02-05 22:32:58 - ERROR - stderr - 58%|█████▊ | 12938/22434 [12:25:18<6:40:37, 2.53s/it] +2025-02-05 22:33:01 - ERROR - stderr - 58%|█████▊ | 12939/22434 [12:25:20<6:38:38, 2.52s/it] +2025-02-05 22:33:01 - ERROR - stderr - +2025-02-05 22:33:01 - ERROR - stderr - +2025-02-05 22:33:01 - INFO - stdout - {'loss': 0.6769, 'grad_norm': 1.2384727001190186, 'learning_rate': 8.013709428476093e-06, 'epoch': 1.73} 
+2025-02-05 22:33:01 - ERROR - stderr - 58%|█████▊ | 12939/22434 [12:25:21<6:38:38, 2.52s/it] +2025-02-05 22:33:03 - ERROR - stderr - 58%|█████▊ | 12940/22434 [12:25:23<6:36:36, 2.51s/it] +2025-02-05 22:33:03 - ERROR - stderr - +2025-02-05 22:33:03 - ERROR - stderr - +2025-02-05 22:33:03 - INFO - stdout - {'loss': 0.6784, 'grad_norm': 1.2056655883789062, 'learning_rate': 8.012294469583902e-06, 'epoch': 1.73} +2025-02-05 22:33:03 - ERROR - stderr - 58%|█████▊ | 12940/22434 [12:25:23<6:36:36, 2.51s/it] +2025-02-05 22:33:06 - ERROR - stderr - 58%|█████▊ | 12941/22434 [12:25:25<6:35:56, 2.50s/it] +2025-02-05 22:33:06 - ERROR - stderr - +2025-02-05 22:33:06 - ERROR - stderr - +2025-02-05 22:33:06 - INFO - stdout - {'loss': 0.6721, 'grad_norm': 1.2486110925674438, 'learning_rate': 8.010879552123537e-06, 'epoch': 1.73} +2025-02-05 22:33:06 - ERROR - stderr - 58%|█████▊ | 12941/22434 [12:25:26<6:35:56, 2.50s/it] +2025-02-05 22:33:08 - ERROR - stderr - 58%|█████▊ | 12942/22434 [12:25:28<6:36:18, 2.51s/it] +2025-02-05 22:33:08 - ERROR - stderr - +2025-02-05 22:33:08 - ERROR - stderr - +2025-02-05 22:33:08 - INFO - stdout - {'loss': 0.6669, 'grad_norm': 1.2337318658828735, 'learning_rate': 8.009464676124479e-06, 'epoch': 1.73} +2025-02-05 22:33:08 - ERROR - stderr - 58%|█████▊ | 12942/22434 [12:25:28<6:36:18, 2.51s/it] +2025-02-05 22:33:11 - ERROR - stderr - 58%|█████▊ | 12943/22434 [12:25:30<6:36:00, 2.50s/it] +2025-02-05 22:33:11 - ERROR - stderr - +2025-02-05 22:33:11 - ERROR - stderr - +2025-02-05 22:33:11 - INFO - stdout - {'loss': 0.6382, 'grad_norm': 1.1453114748001099, 'learning_rate': 8.00804984161623e-06, 'epoch': 1.73} +2025-02-05 22:33:11 - ERROR - stderr - 58%|█████▊ | 12943/22434 [12:25:31<6:36:00, 2.50s/it] +2025-02-05 22:33:13 - ERROR - stderr - 58%|█████▊ | 12944/22434 [12:25:33<6:33:57, 2.49s/it] +2025-02-05 22:33:13 - ERROR - stderr - +2025-02-05 22:33:13 - ERROR - stderr - +2025-02-05 22:33:13 - INFO - stdout - {'loss': 0.7013, 'grad_norm': 
1.5174992084503174, 'learning_rate': 8.006635048628273e-06, 'epoch': 1.73} +2025-02-05 22:33:13 - ERROR - stderr - 58%|█████▊ | 12944/22434 [12:25:33<6:33:57, 2.49s/it] +2025-02-05 22:33:16 - ERROR - stderr - 58%|█████▊ | 12945/22434 [12:25:36<6:42:49, 2.55s/it] +2025-02-05 22:33:16 - ERROR - stderr - +2025-02-05 22:33:16 - ERROR - stderr - +2025-02-05 22:33:16 - INFO - stdout - {'loss': 0.6645, 'grad_norm': 1.2980328798294067, 'learning_rate': 8.005220297190099e-06, 'epoch': 1.73} +2025-02-05 22:33:16 - ERROR - stderr - 58%|█████▊ | 12945/22434 [12:25:36<6:42:49, 2.55s/it] +2025-02-05 22:33:18 - ERROR - stderr - 58%|█████▊ | 12946/22434 [12:25:38<6:39:29, 2.53s/it] +2025-02-05 22:33:18 - ERROR - stderr - +2025-02-05 22:33:18 - ERROR - stderr - +2025-02-05 22:33:18 - INFO - stdout - {'loss': 0.6581, 'grad_norm': 1.105157732963562, 'learning_rate': 8.003805587331204e-06, 'epoch': 1.73} +2025-02-05 22:33:18 - ERROR - stderr - 58%|█████▊ | 12946/22434 [12:25:38<6:39:29, 2.53s/it] +2025-02-05 22:33:21 - ERROR - stderr - 58%|█████▊ | 12947/22434 [12:25:41<6:47:55, 2.58s/it] +2025-02-05 22:33:21 - ERROR - stderr - +2025-02-05 22:33:21 - ERROR - stderr - +2025-02-05 22:33:21 - INFO - stdout - {'loss': 0.7296, 'grad_norm': 1.3423397541046143, 'learning_rate': 8.00239091908107e-06, 'epoch': 1.73} +2025-02-05 22:33:21 - ERROR - stderr - 58%|█████▊ | 12947/22434 [12:25:41<6:47:55, 2.58s/it] +2025-02-05 22:33:24 - ERROR - stderr - 58%|█████▊ | 12948/22434 [12:25:43<6:44:33, 2.56s/it] +2025-02-05 22:33:24 - ERROR - stderr - +2025-02-05 22:33:24 - ERROR - stderr - +2025-02-05 22:33:24 - INFO - stdout - {'loss': 0.7469, 'grad_norm': 1.247710943222046, 'learning_rate': 8.000976292469184e-06, 'epoch': 1.73} +2025-02-05 22:33:24 - ERROR - stderr - 58%|█████▊ | 12948/22434 [12:25:43<6:44:33, 2.56s/it] +2025-02-05 22:33:26 - ERROR - stderr - 58%|█████▊ | 12949/22434 [12:25:46<6:38:52, 2.52s/it] +2025-02-05 22:33:26 - ERROR - stderr - +2025-02-05 22:33:26 - ERROR - stderr - +2025-02-05 
22:33:26 - INFO - stdout - {'loss': 0.6622, 'grad_norm': 1.2204896211624146, 'learning_rate': 7.999561707525034e-06, 'epoch': 1.73} +2025-02-05 22:33:26 - ERROR - stderr - 58%|█████▊ | 12949/22434 [12:25:46<6:38:52, 2.52s/it] +2025-02-05 22:33:28 - ERROR - stderr - 58%|█████▊ | 12950/22434 [12:25:48<6:36:34, 2.51s/it] +2025-02-05 22:33:29 - ERROR - stderr - +2025-02-05 22:33:29 - ERROR - stderr - +2025-02-05 22:33:29 - INFO - stdout - {'loss': 0.745, 'grad_norm': 1.3191577196121216, 'learning_rate': 7.998147164278107e-06, 'epoch': 1.73} +2025-02-05 22:33:29 - ERROR - stderr - 58%|█████▊ | 12950/22434 [12:25:48<6:36:34, 2.51s/it] +2025-02-05 22:33:31 - ERROR - stderr - 58%|█████▊ | 12951/22434 [12:25:51<6:34:58, 2.50s/it] +2025-02-05 22:33:31 - ERROR - stderr - +2025-02-05 22:33:31 - ERROR - stderr - +2025-02-05 22:33:31 - INFO - stdout - {'loss': 0.6733, 'grad_norm': 1.22435462474823, 'learning_rate': 7.996732662757887e-06, 'epoch': 1.73} +2025-02-05 22:33:31 - ERROR - stderr - 58%|█████▊ | 12951/22434 [12:25:51<6:34:58, 2.50s/it] +2025-02-05 22:33:33 - ERROR - stderr - 58%|█████▊ | 12952/22434 [12:25:53<6:32:58, 2.49s/it] +2025-02-05 22:33:33 - ERROR - stderr - +2025-02-05 22:33:33 - ERROR - stderr - +2025-02-05 22:33:33 - INFO - stdout - {'loss': 0.6471, 'grad_norm': 1.1642422676086426, 'learning_rate': 7.99531820299386e-06, 'epoch': 1.73} +2025-02-05 22:33:33 - ERROR - stderr - 58%|█████▊ | 12952/22434 [12:25:53<6:32:58, 2.49s/it] +2025-02-05 22:33:36 - ERROR - stderr - 58%|█████▊ | 12953/22434 [12:25:56<6:33:25, 2.49s/it] +2025-02-05 22:33:36 - ERROR - stderr - +2025-02-05 22:33:36 - ERROR - stderr - +2025-02-05 22:33:36 - INFO - stdout - {'loss': 0.7244, 'grad_norm': 1.3170973062515259, 'learning_rate': 7.993903785015502e-06, 'epoch': 1.73} +2025-02-05 22:33:36 - ERROR - stderr - 58%|█████▊ | 12953/22434 [12:25:56<6:33:25, 2.49s/it] +2025-02-05 22:33:38 - ERROR - stderr - 58%|█████▊ | 12954/22434 [12:25:58<6:34:14, 2.50s/it] +2025-02-05 22:33:38 - ERROR - 
stderr - +2025-02-05 22:33:38 - ERROR - stderr - +2025-02-05 22:33:38 - INFO - stdout - {'loss': 0.6452, 'grad_norm': 1.3028523921966553, 'learning_rate': 7.992489408852306e-06, 'epoch': 1.73} +2025-02-05 22:33:38 - ERROR - stderr - 58%|█████▊ | 12954/22434 [12:25:58<6:34:14, 2.50s/it] +2025-02-05 22:33:41 - ERROR - stderr - 58%|█████▊ | 12955/22434 [12:26:01<6:34:59, 2.50s/it] +2025-02-05 22:33:41 - ERROR - stderr - +2025-02-05 22:33:41 - ERROR - stderr - +2025-02-05 22:33:41 - INFO - stdout - {'loss': 0.6933, 'grad_norm': 1.198959469795227, 'learning_rate': 7.991075074533743e-06, 'epoch': 1.73} +2025-02-05 22:33:41 - ERROR - stderr - 58%|█████▊ | 12955/22434 [12:26:01<6:34:59, 2.50s/it] +2025-02-05 22:33:43 - ERROR - stderr - 58%|█████▊ | 12956/22434 [12:26:03<6:32:33, 2.49s/it] +2025-02-05 22:33:43 - ERROR - stderr - +2025-02-05 22:33:43 - ERROR - stderr - +2025-02-05 22:33:43 - INFO - stdout - {'loss': 0.6041, 'grad_norm': 1.2525686025619507, 'learning_rate': 7.989660782089298e-06, 'epoch': 1.73} +2025-02-05 22:33:43 - ERROR - stderr - 58%|█████▊ | 12956/22434 [12:26:03<6:32:33, 2.49s/it] +2025-02-05 22:33:46 - ERROR - stderr - 58%|█████▊ | 12957/22434 [12:26:06<6:33:38, 2.49s/it] +2025-02-05 22:33:46 - ERROR - stderr - +2025-02-05 22:33:46 - ERROR - stderr - +2025-02-05 22:33:46 - INFO - stdout - {'loss': 0.6148, 'grad_norm': 1.126526951789856, 'learning_rate': 7.988246531548452e-06, 'epoch': 1.73} +2025-02-05 22:33:46 - ERROR - stderr - 58%|█████▊ | 12957/22434 [12:26:06<6:33:38, 2.49s/it] +2025-02-05 22:33:49 - ERROR - stderr - 58%|█████▊ | 12958/22434 [12:26:08<6:42:31, 2.55s/it] +2025-02-05 22:33:49 - ERROR - stderr - +2025-02-05 22:33:49 - ERROR - stderr - +2025-02-05 22:33:49 - INFO - stdout - {'loss': 0.6632, 'grad_norm': 1.3683443069458008, 'learning_rate': 7.986832322940678e-06, 'epoch': 1.73} +2025-02-05 22:33:49 - ERROR - stderr - 58%|█████▊ | 12958/22434 [12:26:08<6:42:31, 2.55s/it] +2025-02-05 22:33:51 - ERROR - stderr - 58%|█████▊ | 12959/22434 
[12:26:11<6:41:11, 2.54s/it] +2025-02-05 22:33:51 - ERROR - stderr - +2025-02-05 22:33:51 - ERROR - stderr - +2025-02-05 22:33:51 - INFO - stdout - {'loss': 0.6639, 'grad_norm': 1.08456289768219, 'learning_rate': 7.985418156295462e-06, 'epoch': 1.73} +2025-02-05 22:33:51 - ERROR - stderr - 58%|█████▊ | 12959/22434 [12:26:11<6:41:11, 2.54s/it] +2025-02-05 22:33:54 - ERROR - stderr - 58%|█████▊ | 12960/22434 [12:26:13<6:38:52, 2.53s/it] +2025-02-05 22:33:54 - ERROR - stderr - +2025-02-05 22:33:54 - ERROR - stderr - +2025-02-05 22:33:54 - INFO - stdout - {'loss': 0.6817, 'grad_norm': 1.20579195022583, 'learning_rate': 7.984004031642277e-06, 'epoch': 1.73} +2025-02-05 22:33:54 - ERROR - stderr - 58%|█████▊ | 12960/22434 [12:26:13<6:38:52, 2.53s/it] +2025-02-05 22:33:56 - ERROR - stderr - 58%|█████▊ | 12961/22434 [12:26:16<6:35:29, 2.50s/it] +2025-02-05 22:33:56 - ERROR - stderr - +2025-02-05 22:33:56 - ERROR - stderr - +2025-02-05 22:33:56 - INFO - stdout - {'loss': 0.7181, 'grad_norm': 1.429930329322815, 'learning_rate': 7.982589949010595e-06, 'epoch': 1.73} +2025-02-05 22:33:56 - ERROR - stderr - 58%|█████▊ | 12961/22434 [12:26:16<6:35:29, 2.50s/it] +2025-02-05 22:33:59 - ERROR - stderr - 58%|█████▊ | 12962/22434 [12:26:18<6:34:53, 2.50s/it] +2025-02-05 22:33:59 - ERROR - stderr - +2025-02-05 22:33:59 - ERROR - stderr - +2025-02-05 22:33:59 - INFO - stdout - {'loss': 0.6253, 'grad_norm': 1.1147289276123047, 'learning_rate': 7.9811759084299e-06, 'epoch': 1.73} +2025-02-05 22:33:59 - ERROR - stderr - 58%|█████▊ | 12962/22434 [12:26:18<6:34:53, 2.50s/it] +2025-02-05 22:34:01 - ERROR - stderr - 58%|█████▊ | 12963/22434 [12:26:21<6:32:35, 2.49s/it] +2025-02-05 22:34:01 - ERROR - stderr - +2025-02-05 22:34:01 - ERROR - stderr - +2025-02-05 22:34:01 - INFO - stdout - {'loss': 0.674, 'grad_norm': 1.1452322006225586, 'learning_rate': 7.97976190992966e-06, 'epoch': 1.73} +2025-02-05 22:34:01 - ERROR - stderr - 58%|█████▊ | 12963/22434 [12:26:21<6:32:35, 2.49s/it] +2025-02-05 
22:34:04 - ERROR - stderr - 58%|█████▊ | 12964/22434 [12:26:23<6:35:06, 2.50s/it] +2025-02-05 22:34:04 - ERROR - stderr - +2025-02-05 22:34:04 - ERROR - stderr - +2025-02-05 22:34:04 - INFO - stdout - {'loss': 0.6978, 'grad_norm': 1.3454655408859253, 'learning_rate': 7.978347953539344e-06, 'epoch': 1.73} +2025-02-05 22:34:04 - ERROR - stderr - 58%|█████▊ | 12964/22434 [12:26:23<6:35:06, 2.50s/it] +2025-02-05 22:34:06 - ERROR - stderr - 58%|█████▊ | 12965/22434 [12:26:26<6:36:13, 2.51s/it] +2025-02-05 22:34:06 - ERROR - stderr - +2025-02-05 22:34:06 - ERROR - stderr - +2025-02-05 22:34:06 - INFO - stdout - {'loss': 0.596, 'grad_norm': 1.1853774785995483, 'learning_rate': 7.976934039288437e-06, 'epoch': 1.73} +2025-02-05 22:34:06 - ERROR - stderr - 58%|█████▊ | 12965/22434 [12:26:26<6:36:13, 2.51s/it] +2025-02-05 22:34:09 - ERROR - stderr - 58%|█████▊ | 12966/22434 [12:26:29<6:50:15, 2.60s/it] +2025-02-05 22:34:09 - ERROR - stderr - +2025-02-05 22:34:09 - ERROR - stderr - +2025-02-05 22:34:09 - INFO - stdout - {'loss': 0.6165, 'grad_norm': 1.1059528589248657, 'learning_rate': 7.975520167206401e-06, 'epoch': 1.73} +2025-02-05 22:34:09 - ERROR - stderr - 58%|█████▊ | 12966/22434 [12:26:29<6:50:15, 2.60s/it] +2025-02-05 22:34:11 - ERROR - stderr - 58%|█████▊ | 12967/22434 [12:26:31<6:46:39, 2.58s/it] +2025-02-05 22:34:11 - ERROR - stderr - +2025-02-05 22:34:11 - ERROR - stderr - +2025-02-05 22:34:11 - INFO - stdout - {'loss': 0.6903, 'grad_norm': 1.2693812847137451, 'learning_rate': 7.974106337322713e-06, 'epoch': 1.73} +2025-02-05 22:34:11 - ERROR - stderr - 58%|█████▊ | 12967/22434 [12:26:31<6:46:39, 2.58s/it] +2025-02-05 22:34:14 - ERROR - stderr - 58%|█████▊ | 12968/22434 [12:26:34<6:43:24, 2.56s/it] +2025-02-05 22:34:14 - ERROR - stderr - +2025-02-05 22:34:14 - ERROR - stderr - +2025-02-05 22:34:14 - INFO - stdout - {'loss': 0.6019, 'grad_norm': 1.1936752796173096, 'learning_rate': 7.972692549666838e-06, 'epoch': 1.73} +2025-02-05 22:34:14 - ERROR - stderr - 
58%|█████▊ | 12968/22434 [12:26:34<6:43:24, 2.56s/it] +2025-02-05 22:34:16 - ERROR - stderr - 58%|█████▊ | 12969/22434 [12:26:36<6:38:42, 2.53s/it] +2025-02-05 22:34:16 - ERROR - stderr - +2025-02-05 22:34:16 - ERROR - stderr - +2025-02-05 22:34:16 - INFO - stdout - {'loss': 0.6857, 'grad_norm': 1.4172327518463135, 'learning_rate': 7.971278804268245e-06, 'epoch': 1.73} +2025-02-05 22:34:16 - ERROR - stderr - 58%|█████▊ | 12969/22434 [12:26:36<6:38:42, 2.53s/it] +2025-02-05 22:34:19 - ERROR - stderr - 58%|█████▊ | 12970/22434 [12:26:39<6:35:46, 2.51s/it] +2025-02-05 22:34:19 - ERROR - stderr - +2025-02-05 22:34:19 - ERROR - stderr - +2025-02-05 22:34:19 - INFO - stdout - {'loss': 0.6526, 'grad_norm': 1.2376610040664673, 'learning_rate': 7.969865101156407e-06, 'epoch': 1.73} +2025-02-05 22:34:19 - ERROR - stderr - 58%|█████▊ | 12970/22434 [12:26:39<6:35:46, 2.51s/it] +2025-02-05 22:34:21 - ERROR - stderr - 58%|█████▊ | 12971/22434 [12:26:41<6:35:37, 2.51s/it] +2025-02-05 22:34:21 - ERROR - stderr - +2025-02-05 22:34:21 - ERROR - stderr - +2025-02-05 22:34:21 - INFO - stdout - {'loss': 0.698, 'grad_norm': 1.223358154296875, 'learning_rate': 7.968451440360789e-06, 'epoch': 1.73} +2025-02-05 22:34:21 - ERROR - stderr - 58%|█████▊ | 12971/22434 [12:26:41<6:35:37, 2.51s/it] +2025-02-05 22:34:24 - ERROR - stderr - 58%|█████▊ | 12972/22434 [12:26:44<6:37:11, 2.52s/it] +2025-02-05 22:34:24 - ERROR - stderr - +2025-02-05 22:34:24 - ERROR - stderr - +2025-02-05 22:34:24 - INFO - stdout - {'loss': 0.8227, 'grad_norm': 1.3264931440353394, 'learning_rate': 7.967037821910853e-06, 'epoch': 1.73} +2025-02-05 22:34:24 - ERROR - stderr - 58%|█████▊ | 12972/22434 [12:26:44<6:37:11, 2.52s/it] +2025-02-05 22:34:26 - ERROR - stderr - 58%|█████▊ | 12973/22434 [12:26:46<6:32:51, 2.49s/it] +2025-02-05 22:34:26 - ERROR - stderr - +2025-02-05 22:34:26 - ERROR - stderr - +2025-02-05 22:34:26 - INFO - stdout - {'loss': 0.7178, 'grad_norm': 1.4387422800064087, 'learning_rate': 
7.96562424583607e-06, 'epoch': 1.73} +2025-02-05 22:34:26 - ERROR - stderr - 58%|█████▊ | 12973/22434 [12:26:46<6:32:51, 2.49s/it] +2025-02-05 22:34:29 - ERROR - stderr - 58%|█████▊ | 12974/22434 [12:26:48<6:29:45, 2.47s/it] +2025-02-05 22:34:29 - ERROR - stderr - +2025-02-05 22:34:29 - ERROR - stderr - +2025-02-05 22:34:29 - INFO - stdout - {'loss': 0.6439, 'grad_norm': 1.082356572151184, 'learning_rate': 7.964210712165901e-06, 'epoch': 1.73} +2025-02-05 22:34:29 - ERROR - stderr - 58%|█████▊ | 12974/22434 [12:26:49<6:29:45, 2.47s/it] +2025-02-05 22:34:31 - ERROR - stderr - 58%|█████▊ | 12975/22434 [12:26:51<6:31:55, 2.49s/it] +2025-02-05 22:34:31 - ERROR - stderr - +2025-02-05 22:34:31 - ERROR - stderr - +2025-02-05 22:34:31 - INFO - stdout - {'loss': 0.6704, 'grad_norm': 1.3814677000045776, 'learning_rate': 7.962797220929816e-06, 'epoch': 1.74} +2025-02-05 22:34:31 - ERROR - stderr - 58%|█████▊ | 12975/22434 [12:26:51<6:31:55, 2.49s/it] +2025-02-05 22:34:34 - ERROR - stderr - 58%|█████▊ | 12976/22434 [12:26:53<6:28:38, 2.47s/it] +2025-02-05 22:34:34 - ERROR - stderr - +2025-02-05 22:34:34 - ERROR - stderr - +2025-02-05 22:34:34 - INFO - stdout - {'loss': 0.7736, 'grad_norm': 1.4386422634124756, 'learning_rate': 7.961383772157273e-06, 'epoch': 1.74} +2025-02-05 22:34:34 - ERROR - stderr - 58%|█████▊ | 12976/22434 [12:26:53<6:28:38, 2.47s/it] +2025-02-05 22:34:36 - ERROR - stderr - 58%|█████▊ | 12977/22434 [12:26:56<6:30:01, 2.47s/it] +2025-02-05 22:34:36 - ERROR - stderr - +2025-02-05 22:34:36 - ERROR - stderr - +2025-02-05 22:34:36 - INFO - stdout - {'loss': 0.798, 'grad_norm': 1.2363412380218506, 'learning_rate': 7.95997036587773e-06, 'epoch': 1.74} +2025-02-05 22:34:36 - ERROR - stderr - 58%|█████▊ | 12977/22434 [12:26:56<6:30:01, 2.47s/it] +2025-02-05 22:34:39 - ERROR - stderr - 58%|█████▊ | 12978/22434 [12:26:58<6:30:25, 2.48s/it] +2025-02-05 22:34:39 - ERROR - stderr - +2025-02-05 22:34:39 - ERROR - stderr - +2025-02-05 22:34:39 - INFO - stdout - {'loss': 
0.6632, 'grad_norm': 1.1102499961853027, 'learning_rate': 7.958557002120656e-06, 'epoch': 1.74} +2025-02-05 22:34:39 - ERROR - stderr - 58%|█████▊ | 12978/22434 [12:26:58<6:30:25, 2.48s/it] +2025-02-05 22:34:41 - ERROR - stderr - 58%|█████▊ | 12979/22434 [12:27:01<6:34:13, 2.50s/it] +2025-02-05 22:34:41 - ERROR - stderr - +2025-02-05 22:34:41 - ERROR - stderr - +2025-02-05 22:34:41 - INFO - stdout - {'loss': 0.7148, 'grad_norm': 1.3287978172302246, 'learning_rate': 7.95714368091551e-06, 'epoch': 1.74} +2025-02-05 22:34:41 - ERROR - stderr - 58%|█████▊ | 12979/22434 [12:27:01<6:34:13, 2.50s/it] +2025-02-05 22:34:44 - ERROR - stderr - 58%|█████▊ | 12980/22434 [12:27:03<6:30:45, 2.48s/it] +2025-02-05 22:34:44 - ERROR - stderr - +2025-02-05 22:34:44 - ERROR - stderr - +2025-02-05 22:34:44 - INFO - stdout - {'loss': 0.7188, 'grad_norm': 1.3027607202529907, 'learning_rate': 7.955730402291743e-06, 'epoch': 1.74} +2025-02-05 22:34:44 - ERROR - stderr - 58%|█████▊ | 12980/22434 [12:27:03<6:30:45, 2.48s/it] +2025-02-05 22:34:46 - ERROR - stderr - 58%|█████▊ | 12981/22434 [12:27:06<6:44:58, 2.57s/it] +2025-02-05 22:34:46 - ERROR - stderr - +2025-02-05 22:34:46 - ERROR - stderr - +2025-02-05 22:34:46 - INFO - stdout - {'loss': 0.8573, 'grad_norm': 1.4091987609863281, 'learning_rate': 7.954317166278825e-06, 'epoch': 1.74} +2025-02-05 22:34:46 - ERROR - stderr - 58%|█████▊ | 12981/22434 [12:27:06<6:44:58, 2.57s/it] +2025-02-05 22:34:49 - ERROR - stderr - 58%|█████▊ | 12982/22434 [12:27:09<6:42:40, 2.56s/it] +2025-02-05 22:34:49 - ERROR - stderr - +2025-02-05 22:34:49 - ERROR - stderr - +2025-02-05 22:34:49 - INFO - stdout - {'loss': 0.6708, 'grad_norm': 1.2049931287765503, 'learning_rate': 7.952903972906205e-06, 'epoch': 1.74} +2025-02-05 22:34:49 - ERROR - stderr - 58%|█████▊ | 12982/22434 [12:27:09<6:42:40, 2.56s/it] +2025-02-05 22:34:51 - ERROR - stderr - 58%|█████▊ | 12983/22434 [12:27:11<6:40:51, 2.54s/it] +2025-02-05 22:34:51 - ERROR - stderr - +2025-02-05 22:34:51 - ERROR 
- stderr - +2025-02-05 22:34:51 - INFO - stdout - {'loss': 0.677, 'grad_norm': 1.2172513008117676, 'learning_rate': 7.951490822203345e-06, 'epoch': 1.74} +2025-02-05 22:34:51 - ERROR - stderr - 58%|█████▊ | 12983/22434 [12:27:11<6:40:51, 2.54s/it] +2025-02-05 22:34:54 - ERROR - stderr - 58%|█████▊ | 12984/22434 [12:27:14<6:36:16, 2.52s/it] +2025-02-05 22:34:54 - ERROR - stderr - +2025-02-05 22:34:54 - ERROR - stderr - +2025-02-05 22:34:54 - INFO - stdout - {'loss': 0.6512, 'grad_norm': 1.2180969715118408, 'learning_rate': 7.950077714199698e-06, 'epoch': 1.74} +2025-02-05 22:34:54 - ERROR - stderr - 58%|█████▊ | 12984/22434 [12:27:14<6:36:16, 2.52s/it] +2025-02-05 22:34:56 - ERROR - stderr - 58%|█████▊ | 12985/22434 [12:27:16<6:35:52, 2.51s/it] +2025-02-05 22:34:56 - ERROR - stderr - +2025-02-05 22:34:56 - ERROR - stderr - +2025-02-05 22:34:56 - INFO - stdout - {'loss': 0.7261, 'grad_norm': 1.3189692497253418, 'learning_rate': 7.948664648924716e-06, 'epoch': 1.74} +2025-02-05 22:34:56 - ERROR - stderr - 58%|█████▊ | 12985/22434 [12:27:16<6:35:52, 2.51s/it] +2025-02-05 22:34:59 - ERROR - stderr - 58%|█████▊ | 12986/22434 [12:27:19<6:35:57, 2.51s/it] +2025-02-05 22:34:59 - ERROR - stderr - +2025-02-05 22:34:59 - ERROR - stderr - +2025-02-05 22:34:59 - INFO - stdout - {'loss': 0.6572, 'grad_norm': 1.1013020277023315, 'learning_rate': 7.947251626407863e-06, 'epoch': 1.74} +2025-02-05 22:34:59 - ERROR - stderr - 58%|█████▊ | 12986/22434 [12:27:19<6:35:57, 2.51s/it] +2025-02-05 22:35:01 - ERROR - stderr - 58%|█████▊ | 12987/22434 [12:27:21<6:35:17, 2.51s/it] +2025-02-05 22:35:01 - ERROR - stderr - +2025-02-05 22:35:01 - ERROR - stderr - +2025-02-05 22:35:01 - INFO - stdout - {'loss': 0.6837, 'grad_norm': 1.3120019435882568, 'learning_rate': 7.945838646678581e-06, 'epoch': 1.74} +2025-02-05 22:35:01 - ERROR - stderr - 58%|█████▊ | 12987/22434 [12:27:21<6:35:17, 2.51s/it] +2025-02-05 22:35:04 - ERROR - stderr - 58%|█████▊ | 12988/22434 [12:27:24<6:38:23, 2.53s/it] 
+2025-02-05 22:35:04 - ERROR - stderr - +2025-02-05 22:35:04 - ERROR - stderr - +2025-02-05 22:35:04 - INFO - stdout - {'loss': 0.7071, 'grad_norm': 1.1524003744125366, 'learning_rate': 7.944425709766328e-06, 'epoch': 1.74} +2025-02-05 22:35:04 - ERROR - stderr - 58%|█████▊ | 12988/22434 [12:27:24<6:38:23, 2.53s/it] +2025-02-05 22:35:06 - ERROR - stderr - 58%|█████▊ | 12989/22434 [12:27:26<6:36:43, 2.52s/it] +2025-02-05 22:35:07 - ERROR - stderr - +2025-02-05 22:35:07 - ERROR - stderr - +2025-02-05 22:35:07 - INFO - stdout - {'loss': 0.7936, 'grad_norm': 1.4776729345321655, 'learning_rate': 7.943012815700554e-06, 'epoch': 1.74} +2025-02-05 22:35:07 - ERROR - stderr - 58%|█████▊ | 12989/22434 [12:27:26<6:36:43, 2.52s/it] +2025-02-05 22:35:09 - ERROR - stderr - 58%|█████▊ | 12990/22434 [12:27:29<6:37:08, 2.52s/it] +2025-02-05 22:35:09 - ERROR - stderr - +2025-02-05 22:35:09 - ERROR - stderr - +2025-02-05 22:35:09 - INFO - stdout - {'loss': 0.5866, 'grad_norm': 1.1302794218063354, 'learning_rate': 7.941599964510707e-06, 'epoch': 1.74} +2025-02-05 22:35:09 - ERROR - stderr - 58%|█████▊ | 12990/22434 [12:27:29<6:37:08, 2.52s/it] +2025-02-05 22:35:12 - ERROR - stderr - 58%|█████▊ | 12991/22434 [12:27:31<6:42:25, 2.56s/it] +2025-02-05 22:35:12 - ERROR - stderr - +2025-02-05 22:35:12 - ERROR - stderr - +2025-02-05 22:35:12 - INFO - stdout - {'loss': 0.6727, 'grad_norm': 1.2434536218643188, 'learning_rate': 7.940187156226244e-06, 'epoch': 1.74} +2025-02-05 22:35:12 - ERROR - stderr - 58%|█████▊ | 12991/22434 [12:27:31<6:42:25, 2.56s/it] +2025-02-05 22:35:14 - ERROR - stderr - 58%|█████▊ | 12992/22434 [12:27:34<6:37:23, 2.53s/it] +2025-02-05 22:35:14 - ERROR - stderr - +2025-02-05 22:35:14 - ERROR - stderr - +2025-02-05 22:35:14 - INFO - stdout - {'loss': 0.6755, 'grad_norm': 1.2090867757797241, 'learning_rate': 7.938774390876608e-06, 'epoch': 1.74} +2025-02-05 22:35:14 - ERROR - stderr - 58%|█████▊ | 12992/22434 [12:27:34<6:37:23, 2.53s/it] +2025-02-05 22:35:17 - ERROR - 
stderr - 58%|█████▊ | 12993/22434 [12:27:36<6:33:30, 2.50s/it] +2025-02-05 22:35:17 - ERROR - stderr - +2025-02-05 22:35:17 - ERROR - stderr - +2025-02-05 22:35:17 - INFO - stdout - {'loss': 0.7603, 'grad_norm': 1.3892182111740112, 'learning_rate': 7.937361668491244e-06, 'epoch': 1.74} +2025-02-05 22:35:17 - ERROR - stderr - 58%|█████▊ | 12993/22434 [12:27:36<6:33:30, 2.50s/it] +2025-02-05 22:35:19 - ERROR - stderr - 58%|█████▊ | 12994/22434 [12:27:39<6:47:09, 2.59s/it] +2025-02-05 22:35:19 - ERROR - stderr - +2025-02-05 22:35:19 - ERROR - stderr - +2025-02-05 22:35:19 - INFO - stdout - {'loss': 0.7253, 'grad_norm': 1.3046506643295288, 'learning_rate': 7.935948989099606e-06, 'epoch': 1.74} +2025-02-05 22:35:19 - ERROR - stderr - 58%|█████▊ | 12994/22434 [12:27:39<6:47:09, 2.59s/it] +2025-02-05 22:35:22 - ERROR - stderr - 58%|█████▊ | 12995/22434 [12:27:42<6:44:17, 2.57s/it] +2025-02-05 22:35:22 - ERROR - stderr - +2025-02-05 22:35:22 - ERROR - stderr - +2025-02-05 22:35:22 - INFO - stdout - {'loss': 0.6024, 'grad_norm': 1.1160005331039429, 'learning_rate': 7.934536352731133e-06, 'epoch': 1.74} +2025-02-05 22:35:22 - ERROR - stderr - 58%|█████▊ | 12995/22434 [12:27:42<6:44:17, 2.57s/it] +2025-02-05 22:35:24 - ERROR - stderr - 58%|█████▊ | 12996/22434 [12:27:44<6:38:59, 2.54s/it] +2025-02-05 22:35:24 - ERROR - stderr - +2025-02-05 22:35:24 - ERROR - stderr - +2025-02-05 22:35:24 - INFO - stdout - {'loss': 0.6696, 'grad_norm': 1.1101962327957153, 'learning_rate': 7.933123759415273e-06, 'epoch': 1.74} +2025-02-05 22:35:24 - ERROR - stderr - 58%|█████▊ | 12996/22434 [12:27:44<6:38:59, 2.54s/it] +2025-02-05 22:35:27 - ERROR - stderr - 58%|█████▊ | 12997/22434 [12:27:47<6:34:55, 2.51s/it] +2025-02-05 22:35:27 - ERROR - stderr - +2025-02-05 22:35:27 - ERROR - stderr - +2025-02-05 22:35:27 - INFO - stdout - {'loss': 0.7221, 'grad_norm': 1.3881338834762573, 'learning_rate': 7.931711209181474e-06, 'epoch': 1.74} +2025-02-05 22:35:27 - ERROR - stderr - 58%|█████▊ | 12997/22434 
[12:27:47<6:34:55, 2.51s/it] +2025-02-05 22:35:29 - ERROR - stderr - 58%|█████▊ | 12998/22434 [12:27:49<6:37:06, 2.53s/it] +2025-02-05 22:35:29 - ERROR - stderr - +2025-02-05 22:35:29 - ERROR - stderr - +2025-02-05 22:35:29 - INFO - stdout - {'loss': 0.6302, 'grad_norm': 1.2860358953475952, 'learning_rate': 7.930298702059171e-06, 'epoch': 1.74} +2025-02-05 22:35:29 - ERROR - stderr - 58%|█████▊ | 12998/22434 [12:27:49<6:37:06, 2.53s/it] +2025-02-05 22:35:32 - ERROR - stderr - 58%|█████▊ | 12999/22434 [12:27:52<6:35:21, 2.51s/it] +2025-02-05 22:35:32 - ERROR - stderr - +2025-02-05 22:35:32 - ERROR - stderr - +2025-02-05 22:35:32 - INFO - stdout - {'loss': 0.7038, 'grad_norm': 1.2796674966812134, 'learning_rate': 7.928886238077817e-06, 'epoch': 1.74} +2025-02-05 22:35:32 - ERROR - stderr - 58%|█████▊ | 12999/22434 [12:27:52<6:35:21, 2.51s/it] +2025-02-05 22:35:34 - ERROR - stderr - 58%|█████▊ | 13000/22434 [12:27:54<6:31:23, 2.49s/it] +2025-02-05 22:35:34 - ERROR - stderr - +2025-02-05 22:35:34 - ERROR - stderr - +2025-02-05 22:35:34 - INFO - stdout - {'loss': 0.658, 'grad_norm': 1.3535820245742798, 'learning_rate': 7.927473817266843e-06, 'epoch': 1.74} +2025-02-05 22:35:34 - ERROR - stderr - 58%|█████▊ | 13000/22434 [12:27:54<6:31:23, 2.49s/it] +2025-02-05 22:35:37 - ERROR - stderr - 58%|█████▊ | 13001/22434 [12:27:57<6:34:21, 2.51s/it] +2025-02-05 22:35:37 - ERROR - stderr - +2025-02-05 22:35:37 - ERROR - stderr - +2025-02-05 22:35:37 - INFO - stdout - {'loss': 0.7353, 'grad_norm': 1.3275471925735474, 'learning_rate': 7.926061439655696e-06, 'epoch': 1.74} +2025-02-05 22:35:37 - ERROR - stderr - 58%|█████▊ | 13001/22434 [12:27:57<6:34:21, 2.51s/it] +2025-02-05 22:35:39 - ERROR - stderr - 58%|█████▊ | 13002/22434 [12:27:59<6:36:42, 2.52s/it] +2025-02-05 22:35:39 - ERROR - stderr - +2025-02-05 22:35:39 - ERROR - stderr - +2025-02-05 22:35:39 - INFO - stdout - {'loss': 0.6585, 'grad_norm': 1.2380675077438354, 'learning_rate': 7.924649105273813e-06, 'epoch': 1.74} 
+2025-02-05 22:35:39 - ERROR - stderr - 58%|█████▊ | 13002/22434 [12:27:59<6:36:42, 2.52s/it] +2025-02-05 22:35:42 - ERROR - stderr - 58%|█████▊ | 13003/22434 [12:28:02<6:34:13, 2.51s/it] +2025-02-05 22:35:42 - ERROR - stderr - +2025-02-05 22:35:42 - ERROR - stderr - +2025-02-05 22:35:42 - INFO - stdout - {'loss': 0.6684, 'grad_norm': 1.1827822923660278, 'learning_rate': 7.923236814150631e-06, 'epoch': 1.74} +2025-02-05 22:35:42 - ERROR - stderr - 58%|█████▊ | 13003/22434 [12:28:02<6:34:13, 2.51s/it] +2025-02-05 22:35:44 - ERROR - stderr - 58%|█████▊ | 13004/22434 [12:28:04<6:32:15, 2.50s/it] +2025-02-05 22:35:44 - ERROR - stderr - +2025-02-05 22:35:44 - ERROR - stderr - +2025-02-05 22:35:44 - INFO - stdout - {'loss': 0.6522, 'grad_norm': 1.3132728338241577, 'learning_rate': 7.921824566315595e-06, 'epoch': 1.74} +2025-02-05 22:35:44 - ERROR - stderr - 58%|█████▊ | 13004/22434 [12:28:04<6:32:15, 2.50s/it] +2025-02-05 22:35:47 - ERROR - stderr - 58%|█████▊ | 13005/22434 [12:28:06<6:28:45, 2.47s/it] +2025-02-05 22:35:47 - ERROR - stderr - +2025-02-05 22:35:47 - ERROR - stderr - +2025-02-05 22:35:47 - INFO - stdout - {'loss': 0.6441, 'grad_norm': 1.189157485961914, 'learning_rate': 7.920412361798137e-06, 'epoch': 1.74} +2025-02-05 22:35:47 - ERROR - stderr - 58%|█████▊ | 13005/22434 [12:28:07<6:28:45, 2.47s/it] +2025-02-05 22:35:49 - ERROR - stderr - 58%|█████▊ | 13006/22434 [12:28:09<6:28:01, 2.47s/it] +2025-02-05 22:35:49 - ERROR - stderr - +2025-02-05 22:35:49 - ERROR - stderr - +2025-02-05 22:35:49 - INFO - stdout - {'loss': 0.759, 'grad_norm': 1.3762788772583008, 'learning_rate': 7.91900020062769e-06, 'epoch': 1.74} +2025-02-05 22:35:49 - ERROR - stderr - 58%|█████▊ | 13006/22434 [12:28:09<6:28:01, 2.47s/it] +2025-02-05 22:35:52 - ERROR - stderr - 58%|█████▊ | 13007/22434 [12:28:11<6:26:37, 2.46s/it] +2025-02-05 22:35:52 - ERROR - stderr - +2025-02-05 22:35:52 - ERROR - stderr - +2025-02-05 22:35:52 - INFO - stdout - {'loss': 0.6586, 'grad_norm': 
1.201709270477295, 'learning_rate': 7.917588082833696e-06, 'epoch': 1.74} +2025-02-05 22:35:52 - ERROR - stderr - 58%|█████▊ | 13007/22434 [12:28:11<6:26:37, 2.46s/it] +2025-02-05 22:35:54 - ERROR - stderr - 58%|█████▊ | 13008/22434 [12:28:14<6:28:41, 2.47s/it] +2025-02-05 22:35:54 - ERROR - stderr - +2025-02-05 22:35:54 - ERROR - stderr - +2025-02-05 22:35:54 - INFO - stdout - {'loss': 0.7184, 'grad_norm': 1.2958943843841553, 'learning_rate': 7.916176008445584e-06, 'epoch': 1.74} +2025-02-05 22:35:54 - ERROR - stderr - 58%|█████▊ | 13008/22434 [12:28:14<6:28:41, 2.47s/it] +2025-02-05 22:35:57 - ERROR - stderr - 58%|█████▊ | 13009/22434 [12:28:16<6:29:48, 2.48s/it] +2025-02-05 22:35:57 - ERROR - stderr - +2025-02-05 22:35:57 - ERROR - stderr - +2025-02-05 22:35:57 - INFO - stdout - {'loss': 0.667, 'grad_norm': 1.265431523323059, 'learning_rate': 7.914763977492787e-06, 'epoch': 1.74} +2025-02-05 22:35:57 - ERROR - stderr - 58%|█████▊ | 13009/22434 [12:28:16<6:29:48, 2.48s/it] +2025-02-05 22:35:59 - ERROR - stderr - 58%|█████▊ | 13010/22434 [12:28:19<6:44:25, 2.57s/it] +2025-02-05 22:35:59 - ERROR - stderr - +2025-02-05 22:35:59 - ERROR - stderr - +2025-02-05 22:35:59 - INFO - stdout - {'loss': 0.6668, 'grad_norm': 1.343470573425293, 'learning_rate': 7.913351990004743e-06, 'epoch': 1.74} +2025-02-05 22:35:59 - ERROR - stderr - 58%|█████▊ | 13010/22434 [12:28:19<6:44:25, 2.57s/it] +2025-02-05 22:36:02 - ERROR - stderr - 58%|█████▊ | 13011/22434 [12:28:22<6:42:51, 2.57s/it] +2025-02-05 22:36:02 - ERROR - stderr - +2025-02-05 22:36:02 - ERROR - stderr - +2025-02-05 22:36:02 - INFO - stdout - {'loss': 0.6748, 'grad_norm': 1.294517159461975, 'learning_rate': 7.911940046010876e-06, 'epoch': 1.74} +2025-02-05 22:36:02 - ERROR - stderr - 58%|█████▊ | 13011/22434 [12:28:22<6:42:51, 2.57s/it] +2025-02-05 22:36:04 - ERROR - stderr - 58%|█████▊ | 13012/22434 [12:28:24<6:39:25, 2.54s/it] +2025-02-05 22:36:04 - ERROR - stderr - +2025-02-05 22:36:04 - ERROR - stderr - +2025-02-05 
22:36:04 - INFO - stdout - {'loss': 0.7061, 'grad_norm': 1.1967862844467163, 'learning_rate': 7.910528145540626e-06, 'epoch': 1.74} +2025-02-05 22:36:04 - ERROR - stderr - 58%|█████▊ | 13012/22434 [12:28:24<6:39:25, 2.54s/it] +2025-02-05 22:36:07 - ERROR - stderr - 58%|█████▊ | 13013/22434 [12:28:27<6:34:04, 2.51s/it] +2025-02-05 22:36:07 - ERROR - stderr - +2025-02-05 22:36:07 - ERROR - stderr - +2025-02-05 22:36:07 - INFO - stdout - {'loss': 0.6351, 'grad_norm': 1.283071756362915, 'learning_rate': 7.909116288623418e-06, 'epoch': 1.74} +2025-02-05 22:36:07 - ERROR - stderr - 58%|█████▊ | 13013/22434 [12:28:27<6:34:04, 2.51s/it] +2025-02-05 22:36:09 - ERROR - stderr - 58%|█████▊ | 13014/22434 [12:28:29<6:34:21, 2.51s/it] +2025-02-05 22:36:09 - ERROR - stderr - +2025-02-05 22:36:09 - ERROR - stderr - +2025-02-05 22:36:09 - INFO - stdout - {'loss': 0.6653, 'grad_norm': 1.136717677116394, 'learning_rate': 7.907704475288674e-06, 'epoch': 1.74} +2025-02-05 22:36:09 - ERROR - stderr - 58%|█████▊ | 13014/22434 [12:28:29<6:34:21, 2.51s/it] +2025-02-05 22:36:12 - ERROR - stderr - 58%|█████▊ | 13015/22434 [12:28:32<6:55:34, 2.65s/it] +2025-02-05 22:36:12 - ERROR - stderr - +2025-02-05 22:36:12 - ERROR - stderr - +2025-02-05 22:36:12 - INFO - stdout - {'loss': 0.7473, 'grad_norm': 1.3442686796188354, 'learning_rate': 7.90629270556583e-06, 'epoch': 1.74} +2025-02-05 22:36:12 - ERROR - stderr - 58%|█████▊ | 13015/22434 [12:28:32<6:55:34, 2.65s/it] +2025-02-05 22:36:15 - ERROR - stderr - 58%|█████▊ | 13016/22434 [12:28:35<7:04:21, 2.70s/it] +2025-02-05 22:36:15 - ERROR - stderr - +2025-02-05 22:36:15 - ERROR - stderr - +2025-02-05 22:36:15 - INFO - stdout - {'loss': 0.7021, 'grad_norm': 1.1719759702682495, 'learning_rate': 7.904880979484316e-06, 'epoch': 1.74} +2025-02-05 22:36:15 - ERROR - stderr - 58%|█████▊ | 13016/22434 [12:28:35<7:04:21, 2.70s/it] +2025-02-05 22:36:18 - ERROR - stderr - 58%|█████▊ | 13017/22434 [12:28:37<6:55:34, 2.65s/it] +2025-02-05 22:36:18 - ERROR - 
stderr - +2025-02-05 22:36:18 - ERROR - stderr - +2025-02-05 22:36:18 - INFO - stdout - {'loss': 0.6149, 'grad_norm': 1.123903751373291, 'learning_rate': 7.903469297073547e-06, 'epoch': 1.74} +2025-02-05 22:36:18 - ERROR - stderr - 58%|█████▊ | 13017/22434 [12:28:38<6:55:34, 2.65s/it] +2025-02-05 22:36:20 - ERROR - stderr - 58%|█████▊ | 13018/22434 [12:28:40<6:49:05, 2.61s/it] +2025-02-05 22:36:20 - ERROR - stderr - +2025-02-05 22:36:20 - ERROR - stderr - +2025-02-05 22:36:20 - INFO - stdout - {'loss': 0.7083, 'grad_norm': 1.2346513271331787, 'learning_rate': 7.902057658362957e-06, 'epoch': 1.74} +2025-02-05 22:36:20 - ERROR - stderr - 58%|█████▊ | 13018/22434 [12:28:40<6:49:05, 2.61s/it] +2025-02-05 22:36:23 - ERROR - stderr - 58%|█████▊ | 13019/22434 [12:28:42<6:41:37, 2.56s/it] +2025-02-05 22:36:23 - ERROR - stderr - +2025-02-05 22:36:23 - ERROR - stderr - +2025-02-05 22:36:23 - INFO - stdout - {'loss': 0.7355, 'grad_norm': 1.2357277870178223, 'learning_rate': 7.900646063381965e-06, 'epoch': 1.74} +2025-02-05 22:36:23 - ERROR - stderr - 58%|█████▊ | 13019/22434 [12:28:42<6:41:37, 2.56s/it] +2025-02-05 22:36:25 - ERROR - stderr - 58%|█████▊ | 13020/22434 [12:28:45<6:37:07, 2.53s/it] +2025-02-05 22:36:25 - ERROR - stderr - +2025-02-05 22:36:25 - ERROR - stderr - +2025-02-05 22:36:25 - INFO - stdout - {'loss': 0.7176, 'grad_norm': 1.291468858718872, 'learning_rate': 7.899234512160002e-06, 'epoch': 1.74} +2025-02-05 22:36:25 - ERROR - stderr - 58%|█████▊ | 13020/22434 [12:28:45<6:37:07, 2.53s/it] +2025-02-05 22:36:28 - ERROR - stderr - 58%|█████▊ | 13021/22434 [12:28:47<6:39:45, 2.55s/it] +2025-02-05 22:36:28 - ERROR - stderr - +2025-02-05 22:36:28 - ERROR - stderr - +2025-02-05 22:36:28 - INFO - stdout - {'loss': 0.6828, 'grad_norm': 1.2255412340164185, 'learning_rate': 7.897823004726482e-06, 'epoch': 1.74} +2025-02-05 22:36:28 - ERROR - stderr - 58%|█████▊ | 13021/22434 [12:28:48<6:39:45, 2.55s/it] +2025-02-05 22:36:30 - ERROR - stderr - 58%|█████▊ | 13022/22434 
[12:28:50<6:39:56, 2.55s/it] +2025-02-05 22:36:30 - ERROR - stderr - +2025-02-05 22:36:30 - ERROR - stderr - +2025-02-05 22:36:30 - INFO - stdout - {'loss': 0.6709, 'grad_norm': 1.3531514406204224, 'learning_rate': 7.896411541110828e-06, 'epoch': 1.74} +2025-02-05 22:36:30 - ERROR - stderr - 58%|█████▊ | 13022/22434 [12:28:50<6:39:56, 2.55s/it] +2025-02-05 22:36:33 - ERROR - stderr - 58%|█████▊ | 13023/22434 [12:28:53<6:37:16, 2.53s/it] +2025-02-05 22:36:33 - ERROR - stderr - +2025-02-05 22:36:33 - ERROR - stderr - +2025-02-05 22:36:33 - INFO - stdout - {'loss': 0.6651, 'grad_norm': 1.2913018465042114, 'learning_rate': 7.895000121342467e-06, 'epoch': 1.74} +2025-02-05 22:36:33 - ERROR - stderr - 58%|█████▊ | 13023/22434 [12:28:53<6:37:16, 2.53s/it] +2025-02-05 22:36:35 - ERROR - stderr - 58%|█████▊ | 13024/22434 [12:28:55<6:33:48, 2.51s/it] +2025-02-05 22:36:35 - ERROR - stderr - +2025-02-05 22:36:35 - ERROR - stderr - +2025-02-05 22:36:35 - INFO - stdout - {'loss': 0.7505, 'grad_norm': 1.3822314739227295, 'learning_rate': 7.893588745450814e-06, 'epoch': 1.74} +2025-02-05 22:36:35 - ERROR - stderr - 58%|█████▊ | 13024/22434 [12:28:55<6:33:48, 2.51s/it] +2025-02-05 22:36:38 - ERROR - stderr - 58%|█████▊ | 13025/22434 [12:28:58<6:34:05, 2.51s/it] +2025-02-05 22:36:38 - ERROR - stderr - +2025-02-05 22:36:38 - ERROR - stderr - +2025-02-05 22:36:38 - INFO - stdout - {'loss': 0.6882, 'grad_norm': 1.2893619537353516, 'learning_rate': 7.892177413465285e-06, 'epoch': 1.74} +2025-02-05 22:36:38 - ERROR - stderr - 58%|█████▊ | 13025/22434 [12:28:58<6:34:05, 2.51s/it] +2025-02-05 22:36:40 - ERROR - stderr - 58%|█████▊ | 13026/22434 [12:29:00<6:32:58, 2.51s/it] +2025-02-05 22:36:40 - ERROR - stderr - +2025-02-05 22:36:40 - ERROR - stderr - +2025-02-05 22:36:40 - INFO - stdout - {'loss': 0.6938, 'grad_norm': 1.3651442527770996, 'learning_rate': 7.890766125415304e-06, 'epoch': 1.74} +2025-02-05 22:36:40 - ERROR - stderr - 58%|█████▊ | 13026/22434 [12:29:00<6:32:58, 2.51s/it] 
+2025-02-05 22:36:43 - ERROR - stderr - 58%|█████▊ | 13027/22434 [12:29:03<6:38:30, 2.54s/it] +2025-02-05 22:36:43 - ERROR - stderr - +2025-02-05 22:36:43 - ERROR - stderr - +2025-02-05 22:36:43 - INFO - stdout - {'loss': 0.7687, 'grad_norm': 1.3322980403900146, 'learning_rate': 7.88935488133028e-06, 'epoch': 1.74} +2025-02-05 22:36:43 - ERROR - stderr - 58%|█████▊ | 13027/22434 [12:29:03<6:38:30, 2.54s/it] +2025-02-05 22:36:45 - ERROR - stderr - 58%|█████▊ | 13028/22434 [12:29:05<6:35:56, 2.53s/it] +2025-02-05 22:36:45 - ERROR - stderr - +2025-02-05 22:36:45 - ERROR - stderr - +2025-02-05 22:36:45 - INFO - stdout - {'loss': 0.6887, 'grad_norm': 1.1048146486282349, 'learning_rate': 7.887943681239636e-06, 'epoch': 1.74} +2025-02-05 22:36:45 - ERROR - stderr - 58%|█████▊ | 13028/22434 [12:29:05<6:35:56, 2.53s/it] +2025-02-05 22:36:48 - ERROR - stderr - 58%|█████▊ | 13029/22434 [12:29:08<6:32:19, 2.50s/it] +2025-02-05 22:36:48 - ERROR - stderr - +2025-02-05 22:36:48 - ERROR - stderr - +2025-02-05 22:36:48 - INFO - stdout - {'loss': 0.6702, 'grad_norm': 1.2392172813415527, 'learning_rate': 7.886532525172788e-06, 'epoch': 1.74} +2025-02-05 22:36:48 - ERROR - stderr - 58%|█████▊ | 13029/22434 [12:29:08<6:32:19, 2.50s/it] +2025-02-05 22:36:50 - ERROR - stderr - 58%|█████▊ | 13030/22434 [12:29:10<6:30:11, 2.49s/it] +2025-02-05 22:36:50 - ERROR - stderr - +2025-02-05 22:36:50 - ERROR - stderr - +2025-02-05 22:36:50 - INFO - stdout - {'loss': 0.6104, 'grad_norm': 1.135341763496399, 'learning_rate': 7.885121413159142e-06, 'epoch': 1.74} +2025-02-05 22:36:50 - ERROR - stderr - 58%|█████▊ | 13030/22434 [12:29:10<6:30:11, 2.49s/it] +2025-02-05 22:36:53 - ERROR - stderr - 58%|█████▊ | 13031/22434 [12:29:13<6:30:43, 2.49s/it] +2025-02-05 22:36:53 - ERROR - stderr - +2025-02-05 22:36:53 - ERROR - stderr - +2025-02-05 22:36:53 - INFO - stdout - {'loss': 0.7751, 'grad_norm': 1.3980683088302612, 'learning_rate': 7.883710345228121e-06, 'epoch': 1.74} +2025-02-05 22:36:53 - ERROR - 
stderr - 58%|█████▊ | 13031/22434 [12:29:13<6:30:43, 2.49s/it] +2025-02-05 22:36:55 - ERROR - stderr - 58%|█████▊ | 13032/22434 [12:29:15<6:32:57, 2.51s/it] +2025-02-05 22:36:55 - ERROR - stderr - +2025-02-05 22:36:55 - ERROR - stderr - +2025-02-05 22:36:55 - INFO - stdout - {'loss': 0.6543, 'grad_norm': 1.1801718473434448, 'learning_rate': 7.882299321409133e-06, 'epoch': 1.74} +2025-02-05 22:36:55 - ERROR - stderr - 58%|█████▊ | 13032/22434 [12:29:15<6:32:57, 2.51s/it] +2025-02-05 22:36:58 - ERROR - stderr - 58%|█████▊ | 13033/22434 [12:29:18<6:34:27, 2.52s/it] +2025-02-05 22:36:58 - ERROR - stderr - +2025-02-05 22:36:58 - ERROR - stderr - +2025-02-05 22:36:58 - INFO - stdout - {'loss': 0.6601, 'grad_norm': 1.1539872884750366, 'learning_rate': 7.880888341731585e-06, 'epoch': 1.74} +2025-02-05 22:36:58 - ERROR - stderr - 58%|█████▊ | 13033/22434 [12:29:18<6:34:27, 2.52s/it] +2025-02-05 22:37:00 - ERROR - stderr - 58%|█████▊ | 13034/22434 [12:29:20<6:32:47, 2.51s/it] +2025-02-05 22:37:00 - ERROR - stderr - +2025-02-05 22:37:00 - ERROR - stderr - +2025-02-05 22:37:00 - INFO - stdout - {'loss': 0.6731, 'grad_norm': 1.2245397567749023, 'learning_rate': 7.879477406224894e-06, 'epoch': 1.74} +2025-02-05 22:37:00 - ERROR - stderr - 58%|█████▊ | 13034/22434 [12:29:20<6:32:47, 2.51s/it] +2025-02-05 22:37:03 - ERROR - stderr - 58%|█████▊ | 13035/22434 [12:29:23<6:32:10, 2.50s/it] +2025-02-05 22:37:03 - ERROR - stderr - +2025-02-05 22:37:03 - ERROR - stderr - +2025-02-05 22:37:03 - INFO - stdout - {'loss': 0.6663, 'grad_norm': 1.2380887269973755, 'learning_rate': 7.878066514918466e-06, 'epoch': 1.74} +2025-02-05 22:37:03 - ERROR - stderr - 58%|█████▊ | 13035/22434 [12:29:23<6:32:10, 2.50s/it] +2025-02-05 22:37:05 - ERROR - stderr - 58%|█████▊ | 13036/22434 [12:29:25<6:37:53, 2.54s/it] +2025-02-05 22:37:05 - ERROR - stderr - +2025-02-05 22:37:05 - ERROR - stderr - +2025-02-05 22:37:05 - INFO - stdout - {'loss': 0.6971, 'grad_norm': 1.1769920587539673, 'learning_rate': 
7.876655667841713e-06, 'epoch': 1.74} +2025-02-05 22:37:05 - ERROR - stderr - 58%|█████▊ | 13036/22434 [12:29:25<6:37:53, 2.54s/it] +2025-02-05 22:37:08 - ERROR - stderr - 58%|█████▊ | 13037/22434 [12:29:28<6:39:06, 2.55s/it] +2025-02-05 22:37:08 - ERROR - stderr - +2025-02-05 22:37:08 - ERROR - stderr - +2025-02-05 22:37:08 - INFO - stdout - {'loss': 0.6831, 'grad_norm': 1.2474780082702637, 'learning_rate': 7.875244865024043e-06, 'epoch': 1.74} +2025-02-05 22:37:08 - ERROR - stderr - 58%|█████▊ | 13037/22434 [12:29:28<6:39:06, 2.55s/it] +2025-02-05 22:37:11 - ERROR - stderr - 58%|█████▊ | 13038/22434 [12:29:30<6:39:11, 2.55s/it] +2025-02-05 22:37:11 - ERROR - stderr - +2025-02-05 22:37:11 - ERROR - stderr - +2025-02-05 22:37:11 - INFO - stdout - {'loss': 0.752, 'grad_norm': 1.391759991645813, 'learning_rate': 7.873834106494856e-06, 'epoch': 1.74} +2025-02-05 22:37:11 - ERROR - stderr - 58%|█████▊ | 13038/22434 [12:29:30<6:39:11, 2.55s/it] +2025-02-05 22:37:13 - ERROR - stderr - 58%|█████▊ | 13039/22434 [12:29:33<6:40:53, 2.56s/it] +2025-02-05 22:37:13 - ERROR - stderr - +2025-02-05 22:37:13 - ERROR - stderr - +2025-02-05 22:37:13 - INFO - stdout - {'loss': 0.5601, 'grad_norm': 1.1344763040542603, 'learning_rate': 7.872423392283566e-06, 'epoch': 1.74} +2025-02-05 22:37:13 - ERROR - stderr - 58%|█████▊ | 13039/22434 [12:29:33<6:40:53, 2.56s/it] +2025-02-05 22:37:16 - ERROR - stderr - 58%|█████▊ | 13040/22434 [12:29:36<6:51:45, 2.63s/it] +2025-02-05 22:37:16 - ERROR - stderr - +2025-02-05 22:37:16 - ERROR - stderr - +2025-02-05 22:37:16 - INFO - stdout - {'loss': 0.6839, 'grad_norm': 1.2707011699676514, 'learning_rate': 7.871012722419572e-06, 'epoch': 1.74} +2025-02-05 22:37:16 - ERROR - stderr - 58%|█████▊ | 13040/22434 [12:29:36<6:51:45, 2.63s/it] +2025-02-05 22:37:18 - ERROR - stderr - 58%|█████▊ | 13041/22434 [12:29:38<6:46:01, 2.59s/it] +2025-02-05 22:37:19 - ERROR - stderr - +2025-02-05 22:37:19 - ERROR - stderr - +2025-02-05 22:37:19 - INFO - stdout - {'loss': 
0.6804, 'grad_norm': 1.1836826801300049, 'learning_rate': 7.86960209693228e-06, 'epoch': 1.74} +2025-02-05 22:37:19 - ERROR - stderr - 58%|█████▊ | 13041/22434 [12:29:38<6:46:01, 2.59s/it] +2025-02-05 22:37:21 - ERROR - stderr - 58%|█████▊ | 13042/22434 [12:29:41<6:43:58, 2.58s/it] +2025-02-05 22:37:21 - ERROR - stderr - +2025-02-05 22:37:21 - ERROR - stderr - +2025-02-05 22:37:21 - INFO - stdout - {'loss': 0.7385, 'grad_norm': 1.3756452798843384, 'learning_rate': 7.868191515851097e-06, 'epoch': 1.74} +2025-02-05 22:37:21 - ERROR - stderr - 58%|█████▊ | 13042/22434 [12:29:41<6:43:58, 2.58s/it] +2025-02-05 22:37:24 - ERROR - stderr - 58%|█████▊ | 13043/22434 [12:29:43<6:43:36, 2.58s/it] +2025-02-05 22:37:24 - ERROR - stderr - +2025-02-05 22:37:24 - ERROR - stderr - +2025-02-05 22:37:24 - INFO - stdout - {'loss': 0.6619, 'grad_norm': 1.2071243524551392, 'learning_rate': 7.866780979205418e-06, 'epoch': 1.74} +2025-02-05 22:37:24 - ERROR - stderr - 58%|█████▊ | 13043/22434 [12:29:43<6:43:36, 2.58s/it] +2025-02-05 22:37:26 - ERROR - stderr - 58%|█████▊ | 13044/22434 [12:29:46<6:42:43, 2.57s/it] +2025-02-05 22:37:26 - ERROR - stderr - +2025-02-05 22:37:26 - ERROR - stderr - +2025-02-05 22:37:26 - INFO - stdout - {'loss': 0.7114, 'grad_norm': 1.4034690856933594, 'learning_rate': 7.865370487024652e-06, 'epoch': 1.74} +2025-02-05 22:37:26 - ERROR - stderr - 58%|█████▊ | 13044/22434 [12:29:46<6:42:43, 2.57s/it] +2025-02-05 22:37:29 - ERROR - stderr - 58%|█████▊ | 13045/22434 [12:29:48<6:39:36, 2.55s/it] +2025-02-05 22:37:29 - ERROR - stderr - +2025-02-05 22:37:29 - ERROR - stderr - +2025-02-05 22:37:29 - INFO - stdout - {'loss': 0.7472, 'grad_norm': 1.2475078105926514, 'learning_rate': 7.863960039338196e-06, 'epoch': 1.74} +2025-02-05 22:37:29 - ERROR - stderr - 58%|█████▊ | 13045/22434 [12:29:48<6:39:36, 2.55s/it] +2025-02-05 22:37:31 - ERROR - stderr - 58%|█████▊ | 13046/22434 [12:29:51<6:36:48, 2.54s/it] +2025-02-05 22:37:31 - ERROR - stderr - +2025-02-05 22:37:31 - ERROR 
- stderr - +2025-02-05 22:37:31 - INFO - stdout - {'loss': 0.6136, 'grad_norm': 1.1691175699234009, 'learning_rate': 7.862549636175444e-06, 'epoch': 1.74} +2025-02-05 22:37:31 - ERROR - stderr - 58%|█████▊ | 13046/22434 [12:29:51<6:36:48, 2.54s/it] +2025-02-05 22:37:34 - ERROR - stderr - 58%|█████▊ | 13047/22434 [12:29:53<6:32:38, 2.51s/it] +2025-02-05 22:37:34 - ERROR - stderr - +2025-02-05 22:37:34 - ERROR - stderr - +2025-02-05 22:37:34 - INFO - stdout - {'loss': 0.6959, 'grad_norm': 1.3238695859909058, 'learning_rate': 7.861139277565802e-06, 'epoch': 1.74} +2025-02-05 22:37:34 - ERROR - stderr - 58%|█████▊ | 13047/22434 [12:29:53<6:32:38, 2.51s/it] +2025-02-05 22:37:36 - ERROR - stderr - 58%|█████▊ | 13048/22434 [12:29:56<6:30:12, 2.49s/it] +2025-02-05 22:37:36 - ERROR - stderr - +2025-02-05 22:37:36 - ERROR - stderr - +2025-02-05 22:37:36 - INFO - stdout - {'loss': 0.6304, 'grad_norm': 1.217269778251648, 'learning_rate': 7.859728963538667e-06, 'epoch': 1.74} +2025-02-05 22:37:36 - ERROR - stderr - 58%|█████▊ | 13048/22434 [12:29:56<6:30:12, 2.49s/it] +2025-02-05 22:37:39 - ERROR - stderr - 58%|█████▊ | 13049/22434 [12:29:58<6:29:40, 2.49s/it] +2025-02-05 22:37:39 - ERROR - stderr - +2025-02-05 22:37:39 - ERROR - stderr - +2025-02-05 22:37:39 - INFO - stdout - {'loss': 0.7341, 'grad_norm': 1.349548578262329, 'learning_rate': 7.85831869412343e-06, 'epoch': 1.74} +2025-02-05 22:37:39 - ERROR - stderr - 58%|█████▊ | 13049/22434 [12:29:58<6:29:40, 2.49s/it] +2025-02-05 22:37:41 - ERROR - stderr - 58%|█████▊ | 13050/22434 [12:30:01<6:30:38, 2.50s/it] +2025-02-05 22:37:41 - ERROR - stderr - +2025-02-05 22:37:41 - ERROR - stderr - +2025-02-05 22:37:41 - INFO - stdout - {'loss': 0.6642, 'grad_norm': 1.1678849458694458, 'learning_rate': 7.856908469349495e-06, 'epoch': 1.75} +2025-02-05 22:37:41 - ERROR - stderr - 58%|█████▊ | 13050/22434 [12:30:01<6:30:38, 2.50s/it] +2025-02-05 22:37:44 - ERROR - stderr - 58%|█████▊ | 13051/22434 [12:30:03<6:30:12, 2.50s/it]
+2025-02-05 22:37:44 - ERROR - stderr - +2025-02-05 22:37:44 - ERROR - stderr - +2025-02-05 22:37:44 - INFO - stdout - {'loss': 0.8134, 'grad_norm': 1.392716884613037, 'learning_rate': 7.855498289246246e-06, 'epoch': 1.75} +2025-02-05 22:37:44 - ERROR - stderr - 58%|█████▊ | 13051/22434 [12:30:03<6:30:12, 2.50s/it] +2025-02-05 22:37:46 - ERROR - stderr - 58%|█████▊ | 13052/22434 [12:30:06<6:30:46, 2.50s/it] +2025-02-05 22:37:46 - ERROR - stderr - +2025-02-05 22:37:46 - ERROR - stderr - +2025-02-05 22:37:46 - INFO - stdout - {'loss': 0.699, 'grad_norm': 1.2779555320739746, 'learning_rate': 7.85408815384309e-06, 'epoch': 1.75} +2025-02-05 22:37:46 - ERROR - stderr - 58%|█████▊ | 13052/22434 [12:30:06<6:30:46, 2.50s/it] +2025-02-05 22:37:49 - ERROR - stderr - 58%|█████▊ | 13053/22434 [12:30:08<6:30:46, 2.50s/it] +2025-02-05 22:37:49 - ERROR - stderr - +2025-02-05 22:37:49 - ERROR - stderr - +2025-02-05 22:37:49 - INFO - stdout - {'loss': 0.7441, 'grad_norm': 1.2170205116271973, 'learning_rate': 7.85267806316941e-06, 'epoch': 1.75} +2025-02-05 22:37:49 - ERROR - stderr - 58%|█████▊ | 13053/22434 [12:30:08<6:30:46, 2.50s/it] +2025-02-05 22:37:51 - ERROR - stderr - 58%|█████▊ | 13054/22434 [12:30:11<6:44:03, 2.58s/it] +2025-02-05 22:37:51 - ERROR - stderr - +2025-02-05 22:37:51 - ERROR - stderr - +2025-02-05 22:37:51 - INFO - stdout - {'loss': 0.7023, 'grad_norm': 1.2747998237609863, 'learning_rate': 7.851268017254598e-06, 'epoch': 1.75} +2025-02-05 22:37:51 - ERROR - stderr - 58%|█████▊ | 13054/22434 [12:30:11<6:44:03, 2.58s/it] +2025-02-05 22:37:54 - ERROR - stderr - 58%|█████▊ | 13055/22434 [12:30:14<6:43:57, 2.58s/it] +2025-02-05 22:37:54 - ERROR - stderr - +2025-02-05 22:37:54 - ERROR - stderr - +2025-02-05 22:37:54 - INFO - stdout - {'loss': 0.691, 'grad_norm': 1.1835910081863403, 'learning_rate': 7.849858016128054e-06, 'epoch': 1.75} +2025-02-05 22:37:54 - ERROR - stderr - 58%|█████▊ | 13055/22434 [12:30:14<6:43:57, 2.58s/it] +2025-02-05 22:37:56 - ERROR - stderr 
- 58%|█████▊ | 13056/22434 [12:30:16<6:41:10, 2.57s/it] +2025-02-05 22:37:56 - ERROR - stderr - +2025-02-05 22:37:56 - ERROR - stderr - +2025-02-05 22:37:56 - INFO - stdout - {'loss': 0.7048, 'grad_norm': 1.2233202457427979, 'learning_rate': 7.848448059819161e-06, 'epoch': 1.75} +2025-02-05 22:37:56 - ERROR - stderr - 58%|█████▊ | 13056/22434 [12:30:16<6:41:10, 2.57s/it] +2025-02-05 22:37:59 - ERROR - stderr - 58%|█████▊ | 13057/22434 [12:30:19<6:42:09, 2.57s/it] +2025-02-05 22:37:59 - ERROR - stderr - +2025-02-05 22:37:59 - ERROR - stderr - +2025-02-05 22:37:59 - INFO - stdout - {'loss': 0.7284, 'grad_norm': 1.297877550125122, 'learning_rate': 7.847038148357306e-06, 'epoch': 1.75} +2025-02-05 22:37:59 - ERROR - stderr - 58%|█████▊ | 13057/22434 [12:30:19<6:42:09, 2.57s/it] +2025-02-05 22:38:02 - ERROR - stderr - 58%|█████▊ | 13058/22434 [12:30:21<6:47:35, 2.61s/it] +2025-02-05 22:38:02 - ERROR - stderr - +2025-02-05 22:38:02 - ERROR - stderr - +2025-02-05 22:38:02 - INFO - stdout - {'loss': 0.6343, 'grad_norm': 1.176261067390442, 'learning_rate': 7.845628281771884e-06, 'epoch': 1.75} +2025-02-05 22:38:02 - ERROR - stderr - 58%|█████▊ | 13058/22434 [12:30:22<6:47:35, 2.61s/it] +2025-02-05 22:38:04 - ERROR - stderr - 58%|█████▊ | 13059/22434 [12:30:24<6:41:03, 2.57s/it] +2025-02-05 22:38:04 - ERROR - stderr - +2025-02-05 22:38:04 - ERROR - stderr - +2025-02-05 22:38:04 - INFO - stdout - {'loss': 0.7194, 'grad_norm': 1.2030620574951172, 'learning_rate': 7.844218460092274e-06, 'epoch': 1.75} +2025-02-05 22:38:04 - ERROR - stderr - 58%|█████▊ | 13059/22434 [12:30:24<6:41:03, 2.57s/it] +2025-02-05 22:38:07 - ERROR - stderr - 58%|█████▊ | 13060/22434 [12:30:27<6:40:46, 2.57s/it] +2025-02-05 22:38:07 - ERROR - stderr - +2025-02-05 22:38:07 - ERROR - stderr - +2025-02-05 22:38:07 - INFO - stdout - {'loss': 0.712, 'grad_norm': 1.3171570301055908, 'learning_rate': 7.842808683347871e-06, 'epoch': 1.75} +2025-02-05 22:38:07 - ERROR - stderr - 58%|█████▊ | 13060/22434 
[12:30:27<6:40:46, 2.57s/it] +2025-02-05 22:38:09 - ERROR - stderr - 58%|█████▊ | 13061/22434 [12:30:29<6:40:15, 2.56s/it] +2025-02-05 22:38:09 - ERROR - stderr - +2025-02-05 22:38:09 - ERROR - stderr - +2025-02-05 22:38:09 - INFO - stdout - {'loss': 0.6585, 'grad_norm': 1.1180232763290405, 'learning_rate': 7.841398951568059e-06, 'epoch': 1.75} +2025-02-05 22:38:09 - ERROR - stderr - 58%|█████▊ | 13061/22434 [12:30:29<6:40:15, 2.56s/it] +2025-02-05 22:38:12 - ERROR - stderr - 58%|█████▊ | 13062/22434 [12:30:32<6:35:48, 2.53s/it] +2025-02-05 22:38:12 - ERROR - stderr - +2025-02-05 22:38:12 - ERROR - stderr - +2025-02-05 22:38:12 - INFO - stdout - {'loss': 0.7921, 'grad_norm': 1.4766656160354614, 'learning_rate': 7.839989264782216e-06, 'epoch': 1.75} +2025-02-05 22:38:12 - ERROR - stderr - 58%|█████▊ | 13062/22434 [12:30:32<6:35:48, 2.53s/it] +2025-02-05 22:38:14 - ERROR - stderr - 58%|█████▊ | 13063/22434 [12:30:34<6:29:48, 2.50s/it] +2025-02-05 22:38:14 - ERROR - stderr - +2025-02-05 22:38:14 - ERROR - stderr - +2025-02-05 22:38:14 - INFO - stdout - {'loss': 0.6242, 'grad_norm': 1.2696322202682495, 'learning_rate': 7.838579623019732e-06, 'epoch': 1.75} +2025-02-05 22:38:14 - ERROR - stderr - 58%|█████▊ | 13063/22434 [12:30:34<6:29:48, 2.50s/it] +2025-02-05 22:38:17 - ERROR - stderr - 58%|█████▊ | 13064/22434 [12:30:36<6:31:24, 2.51s/it] +2025-02-05 22:38:17 - ERROR - stderr - +2025-02-05 22:38:17 - ERROR - stderr - +2025-02-05 22:38:17 - INFO - stdout - {'loss': 0.7428, 'grad_norm': 1.2306702136993408, 'learning_rate': 7.83717002630999e-06, 'epoch': 1.75} +2025-02-05 22:38:17 - ERROR - stderr - 58%|█████▊ | 13064/22434 [12:30:37<6:31:24, 2.51s/it] +2025-02-05 22:38:19 - ERROR - stderr - 58%|█████▊ | 13065/22434 [12:30:39<6:30:58, 2.50s/it] +2025-02-05 22:38:19 - ERROR - stderr - +2025-02-05 22:38:19 - ERROR - stderr - +2025-02-05 22:38:19 - INFO - stdout - {'loss': 0.6619, 'grad_norm': 1.134827971458435, 'learning_rate': 7.835760474682364e-06, 'epoch': 1.75} 
+2025-02-05 22:38:19 - ERROR - stderr - 58%|█████▊ | 13065/22434 [12:30:39<6:30:58, 2.50s/it] +2025-02-05 22:38:22 - ERROR - stderr - 58%|█████▊ | 13066/22434 [12:30:41<6:29:52, 2.50s/it] +2025-02-05 22:38:22 - ERROR - stderr - +2025-02-05 22:38:22 - ERROR - stderr - +2025-02-05 22:38:22 - INFO - stdout - {'loss': 0.6702, 'grad_norm': 1.1618316173553467, 'learning_rate': 7.83435096816624e-06, 'epoch': 1.75} +2025-02-05 22:38:22 - ERROR - stderr - 58%|█████▊ | 13066/22434 [12:30:42<6:29:52, 2.50s/it] +2025-02-05 22:38:24 - ERROR - stderr - 58%|█████▊ | 13067/22434 [12:30:44<6:29:15, 2.49s/it] +2025-02-05 22:38:24 - ERROR - stderr - +2025-02-05 22:38:24 - ERROR - stderr - +2025-02-05 22:38:24 - INFO - stdout - {'loss': 0.5767, 'grad_norm': 1.1894416809082031, 'learning_rate': 7.832941506790998e-06, 'epoch': 1.75} +2025-02-05 22:38:24 - ERROR - stderr - 58%|█████▊ | 13067/22434 [12:30:44<6:29:15, 2.49s/it] +2025-02-05 22:38:27 - ERROR - stderr - 58%|█████▊ | 13068/22434 [12:30:46<6:28:50, 2.49s/it] +2025-02-05 22:38:27 - ERROR - stderr - +2025-02-05 22:38:27 - ERROR - stderr - +2025-02-05 22:38:27 - INFO - stdout - {'loss': 0.6596, 'grad_norm': 1.2185240983963013, 'learning_rate': 7.831532090586022e-06, 'epoch': 1.75} +2025-02-05 22:38:27 - ERROR - stderr - 58%|█████▊ | 13068/22434 [12:30:46<6:28:50, 2.49s/it] +2025-02-05 22:38:29 - ERROR - stderr - 58%|█████▊ | 13069/22434 [12:30:49<6:26:44, 2.48s/it] +2025-02-05 22:38:29 - ERROR - stderr - +2025-02-05 22:38:29 - ERROR - stderr - +2025-02-05 22:38:29 - INFO - stdout - {'loss': 0.6981, 'grad_norm': 1.205224871635437, 'learning_rate': 7.830122719580682e-06, 'epoch': 1.75} +2025-02-05 22:38:29 - ERROR - stderr - 58%|█████▊ | 13069/22434 [12:30:49<6:26:44, 2.48s/it] +2025-02-05 22:38:32 - ERROR - stderr - 58%|█████▊ | 13070/22434 [12:30:51<6:25:33, 2.47s/it] +2025-02-05 22:38:32 - ERROR - stderr - +2025-02-05 22:38:32 - ERROR - stderr - +2025-02-05 22:38:32 - INFO - stdout - {'loss': 0.7274, 'grad_norm': 
1.2897881269454956, 'learning_rate': 7.828713393804354e-06, 'epoch': 1.75} +2025-02-05 22:38:32 - ERROR - stderr - 58%|█████▊ | 13070/22434 [12:30:51<6:25:33, 2.47s/it] +2025-02-05 22:38:34 - ERROR - stderr - 58%|█████▊ | 13071/22434 [12:30:54<6:33:49, 2.52s/it] +2025-02-05 22:38:34 - ERROR - stderr - +2025-02-05 22:38:34 - ERROR - stderr - +2025-02-05 22:38:34 - INFO - stdout - {'loss': 0.7419, 'grad_norm': 1.2016863822937012, 'learning_rate': 7.827304113286423e-06, 'epoch': 1.75} +2025-02-05 22:38:34 - ERROR - stderr - 58%|█████▊ | 13071/22434 [12:30:54<6:33:49, 2.52s/it] +2025-02-05 22:38:37 - ERROR - stderr - 58%|█████▊ | 13072/22434 [12:30:56<6:31:21, 2.51s/it] +2025-02-05 22:38:37 - ERROR - stderr - +2025-02-05 22:38:37 - ERROR - stderr - +2025-02-05 22:38:37 - INFO - stdout - {'loss': 0.6796, 'grad_norm': 1.269400954246521, 'learning_rate': 7.825894878056257e-06, 'epoch': 1.75} +2025-02-05 22:38:37 - ERROR - stderr - 58%|█████▊ | 13072/22434 [12:30:56<6:31:21, 2.51s/it] +2025-02-05 22:38:39 - ERROR - stderr - 58%|█████▊ | 13073/22434 [12:30:59<6:30:07, 2.50s/it] +2025-02-05 22:38:39 - ERROR - stderr - +2025-02-05 22:38:39 - ERROR - stderr - +2025-02-05 22:38:39 - INFO - stdout - {'loss': 0.7003, 'grad_norm': 1.179728627204895, 'learning_rate': 7.824485688143229e-06, 'epoch': 1.75} +2025-02-05 22:38:39 - ERROR - stderr - 58%|█████▊ | 13073/22434 [12:30:59<6:30:07, 2.50s/it] +2025-02-05 22:38:42 - ERROR - stderr - 58%|█████▊ | 13074/22434 [12:31:02<6:34:39, 2.53s/it] +2025-02-05 22:38:42 - ERROR - stderr - +2025-02-05 22:38:42 - ERROR - stderr - +2025-02-05 22:38:42 - INFO - stdout - {'loss': 0.7239, 'grad_norm': 1.1984901428222656, 'learning_rate': 7.823076543576718e-06, 'epoch': 1.75} +2025-02-05 22:38:42 - ERROR - stderr - 58%|█████▊ | 13074/22434 [12:31:02<6:34:39, 2.53s/it] +2025-02-05 22:38:44 - ERROR - stderr - 58%|█████▊ | 13075/22434 [12:31:04<6:34:17, 2.53s/it] +2025-02-05 22:38:44 - ERROR - stderr - +2025-02-05 22:38:44 - ERROR - stderr - 
+2025-02-05 22:38:44 - INFO - stdout - {'loss': 0.7506, 'grad_norm': 1.3591724634170532, 'learning_rate': 7.82166744438609e-06, 'epoch': 1.75} +2025-02-05 22:38:44 - ERROR - stderr - 58%|█████▊ | 13075/22434 [12:31:04<6:34:17, 2.53s/it] +2025-02-05 22:38:47 - ERROR - stderr - 58%|█████▊ | 13076/22434 [12:31:07<6:31:23, 2.51s/it] +2025-02-05 22:38:47 - ERROR - stderr - +2025-02-05 22:38:47 - ERROR - stderr - +2025-02-05 22:38:47 - INFO - stdout - {'loss': 0.6928, 'grad_norm': 1.213767409324646, 'learning_rate': 7.820258390600723e-06, 'epoch': 1.75} +2025-02-05 22:38:47 - ERROR - stderr - 58%|█████▊ | 13076/22434 [12:31:07<6:31:23, 2.51s/it] +2025-02-05 22:38:50 - ERROR - stderr - 58%|█████▊ | 13077/22434 [12:31:09<6:44:03, 2.59s/it] +2025-02-05 22:38:50 - ERROR - stderr - +2025-02-05 22:38:50 - ERROR - stderr - +2025-02-05 22:38:50 - INFO - stdout - {'loss': 0.7237, 'grad_norm': 1.263181447982788, 'learning_rate': 7.818849382249987e-06, 'epoch': 1.75} +2025-02-05 22:38:50 - ERROR - stderr - 58%|█████▊ | 13077/22434 [12:31:09<6:44:03, 2.59s/it] +2025-02-05 22:38:52 - ERROR - stderr - 58%|█████▊ | 13078/22434 [12:31:12<6:35:11, 2.53s/it] +2025-02-05 22:38:52 - ERROR - stderr - +2025-02-05 22:38:52 - ERROR - stderr - +2025-02-05 22:38:52 - INFO - stdout - {'loss': 0.6872, 'grad_norm': 1.2507827281951904, 'learning_rate': 7.81744041936324e-06, 'epoch': 1.75} +2025-02-05 22:38:52 - ERROR - stderr - 58%|█████▊ | 13078/22434 [12:31:12<6:35:11, 2.53s/it] +2025-02-05 22:38:54 - ERROR - stderr - 58%|█████▊ | 13079/22434 [12:31:14<6:30:03, 2.50s/it] +2025-02-05 22:38:54 - ERROR - stderr - +2025-02-05 22:38:54 - ERROR - stderr - +2025-02-05 22:38:54 - INFO - stdout - {'loss': 0.7597, 'grad_norm': 1.1744171380996704, 'learning_rate': 7.816031501969865e-06, 'epoch': 1.75} +2025-02-05 22:38:54 - ERROR - stderr - 58%|█████▊ | 13079/22434 [12:31:14<6:30:03, 2.50s/it] +2025-02-05 22:38:57 - ERROR - stderr - 58%|█████▊ | 13080/22434 [12:31:17<6:29:01, 2.50s/it] +2025-02-05 22:38:57 - 
ERROR - stderr - +2025-02-05 22:38:57 - ERROR - stderr - +2025-02-05 22:38:57 - INFO - stdout - {'loss': 0.7873, 'grad_norm': 1.3690522909164429, 'learning_rate': 7.814622630099224e-06, 'epoch': 1.75} +2025-02-05 22:38:57 - ERROR - stderr - 58%|█████▊ | 13080/22434 [12:31:17<6:29:01, 2.50s/it] +2025-02-05 22:38:59 - ERROR - stderr - 58%|█████▊ | 13081/22434 [12:31:19<6:27:54, 2.49s/it] +2025-02-05 22:38:59 - ERROR - stderr - +2025-02-05 22:38:59 - ERROR - stderr - +2025-02-05 22:38:59 - INFO - stdout - {'loss': 0.7045, 'grad_norm': 1.213753581047058, 'learning_rate': 7.813213803780679e-06, 'epoch': 1.75} +2025-02-05 22:38:59 - ERROR - stderr - 58%|█████▊ | 13081/22434 [12:31:19<6:27:54, 2.49s/it] +2025-02-05 22:39:02 - ERROR - stderr - 58%|█████▊ | 13082/22434 [12:31:22<6:27:32, 2.49s/it] +2025-02-05 22:39:02 - ERROR - stderr - +2025-02-05 22:39:02 - ERROR - stderr - +2025-02-05 22:39:02 - INFO - stdout - {'loss': 0.7456, 'grad_norm': 1.4117039442062378, 'learning_rate': 7.811805023043603e-06, 'epoch': 1.75} +2025-02-05 22:39:02 - ERROR - stderr - 58%|█████▊ | 13082/22434 [12:31:22<6:27:32, 2.49s/it] +2025-02-05 22:39:04 - ERROR - stderr - 58%|█████▊ | 13083/22434 [12:31:24<6:33:56, 2.53s/it] +2025-02-05 22:39:04 - ERROR - stderr - +2025-02-05 22:39:04 - ERROR - stderr - +2025-02-05 22:39:04 - INFO - stdout - {'loss': 0.6891, 'grad_norm': 1.1690768003463745, 'learning_rate': 7.810396287917354e-06, 'epoch': 1.75} +2025-02-05 22:39:04 - ERROR - stderr - 58%|█████▊ | 13083/22434 [12:31:24<6:33:56, 2.53s/it] +2025-02-05 22:39:07 - ERROR - stderr - 58%|█████▊ | 13084/22434 [12:31:27<6:29:58, 2.50s/it] +2025-02-05 22:39:07 - ERROR - stderr - +2025-02-05 22:39:07 - ERROR - stderr - +2025-02-05 22:39:07 - INFO - stdout - {'loss': 0.6886, 'grad_norm': 1.2917319536209106, 'learning_rate': 7.808987598431303e-06, 'epoch': 1.75} +2025-02-05 22:39:07 - ERROR - stderr - 58%|█████▊ | 13084/22434 [12:31:27<6:29:58, 2.50s/it] +2025-02-05 22:39:09 - ERROR - stderr - 58%|█████▊ | 
13085/22434 [12:31:29<6:30:59, 2.51s/it] +2025-02-05 22:39:09 - ERROR - stderr - +2025-02-05 22:39:09 - ERROR - stderr - +2025-02-05 22:39:09 - INFO - stdout - {'loss': 0.66, 'grad_norm': 1.290205478668213, 'learning_rate': 7.807578954614808e-06, 'epoch': 1.75} +2025-02-05 22:39:09 - ERROR - stderr - 58%|█████▊ | 13085/22434 [12:31:29<6:30:59, 2.51s/it] +2025-02-05 22:39:12 - ERROR - stderr - 58%|█████▊ | 13086/22434 [12:31:32<6:30:28, 2.51s/it] +2025-02-05 22:39:12 - ERROR - stderr - +2025-02-05 22:39:12 - ERROR - stderr - +2025-02-05 22:39:12 - INFO - stdout - {'loss': 0.6721, 'grad_norm': 1.2115559577941895, 'learning_rate': 7.806170356497229e-06, 'epoch': 1.75} +2025-02-05 22:39:12 - ERROR - stderr - 58%|█████▊ | 13086/22434 [12:31:32<6:30:28, 2.51s/it] +2025-02-05 22:39:14 - ERROR - stderr - 58%|█████▊ | 13087/22434 [12:31:34<6:30:47, 2.51s/it] +2025-02-05 22:39:14 - ERROR - stderr - +2025-02-05 22:39:14 - ERROR - stderr - +2025-02-05 22:39:14 - INFO - stdout - {'loss': 0.7299, 'grad_norm': 1.3884086608886719, 'learning_rate': 7.804761804107935e-06, 'epoch': 1.75} +2025-02-05 22:39:14 - ERROR - stderr - 58%|█████▊ | 13087/22434 [12:31:34<6:30:47, 2.51s/it] +2025-02-05 22:39:17 - ERROR - stderr - 58%|█████▊ | 13088/22434 [12:31:37<6:28:45, 2.50s/it] +2025-02-05 22:39:17 - ERROR - stderr - +2025-02-05 22:39:17 - ERROR - stderr - +2025-02-05 22:39:17 - INFO - stdout - {'loss': 0.6995, 'grad_norm': 1.2650190591812134, 'learning_rate': 7.803353297476276e-06, 'epoch': 1.75} +2025-02-05 22:39:17 - ERROR - stderr - 58%|█████▊ | 13088/22434 [12:31:37<6:28:45, 2.50s/it] +2025-02-05 22:39:20 - ERROR - stderr - 58%|█████▊ | 13089/22434 [12:31:39<6:38:30, 2.56s/it] +2025-02-05 22:39:20 - ERROR - stderr - +2025-02-05 22:39:20 - ERROR - stderr - +2025-02-05 22:39:20 - INFO - stdout - {'loss': 0.7303, 'grad_norm': 1.2618435621261597, 'learning_rate': 7.801944836631617e-06, 'epoch': 1.75} +2025-02-05 22:39:20 - ERROR - stderr - 58%|█████▊ | 13089/22434 [12:31:39<6:38:30, 
2.56s/it] +2025-02-05 22:39:22 - ERROR - stderr - 58%|█████▊ | 13090/22434 [12:31:42<6:34:31, 2.53s/it] +2025-02-05 22:39:22 - ERROR - stderr - +2025-02-05 22:39:22 - ERROR - stderr - +2025-02-05 22:39:22 - INFO - stdout - {'loss': 0.6773, 'grad_norm': 1.3761684894561768, 'learning_rate': 7.800536421603317e-06, 'epoch': 1.75} +2025-02-05 22:39:22 - ERROR - stderr - 58%|█████▊ | 13090/22434 [12:31:42<6:34:31, 2.53s/it] +2025-02-05 22:39:25 - ERROR - stderr - 58%|█████▊ | 13091/22434 [12:31:44<6:38:53, 2.56s/it] +2025-02-05 22:39:25 - ERROR - stderr - +2025-02-05 22:39:25 - ERROR - stderr - +2025-02-05 22:39:25 - INFO - stdout - {'loss': 0.6445, 'grad_norm': 1.2615305185317993, 'learning_rate': 7.799128052420726e-06, 'epoch': 1.75} +2025-02-05 22:39:25 - ERROR - stderr - 58%|█████▊ | 13091/22434 [12:31:44<6:38:53, 2.56s/it] +2025-02-05 22:39:27 - ERROR - stderr - 58%|█████▊ | 13092/22434 [12:31:47<6:37:19, 2.55s/it] +2025-02-05 22:39:27 - ERROR - stderr - +2025-02-05 22:39:27 - ERROR - stderr - +2025-02-05 22:39:27 - INFO - stdout - {'loss': 0.6851, 'grad_norm': 1.1958781480789185, 'learning_rate': 7.797719729113207e-06, 'epoch': 1.75} +2025-02-05 22:39:27 - ERROR - stderr - 58%|█████▊ | 13092/22434 [12:31:47<6:37:19, 2.55s/it] +2025-02-05 22:39:30 - ERROR - stderr - 58%|█████▊ | 13093/22434 [12:31:50<6:46:53, 2.61s/it] +2025-02-05 22:39:30 - ERROR - stderr - +2025-02-05 22:39:30 - ERROR - stderr - +2025-02-05 22:39:30 - INFO - stdout - {'loss': 0.683, 'grad_norm': 1.3002465963363647, 'learning_rate': 7.796311451710115e-06, 'epoch': 1.75} +2025-02-05 22:39:30 - ERROR - stderr - 58%|█████▊ | 13093/22434 [12:31:50<6:46:53, 2.61s/it] +2025-02-05 22:39:33 - ERROR - stderr - 58%|█████▊ | 13094/22434 [12:31:52<6:46:19, 2.61s/it] +2025-02-05 22:39:33 - ERROR - stderr - +2025-02-05 22:39:33 - ERROR - stderr - +2025-02-05 22:39:33 - INFO - stdout - {'loss': 0.7837, 'grad_norm': 1.2295221090316772, 'learning_rate': 7.794903220240798e-06, 'epoch': 1.75} +2025-02-05 22:39:33 - 
ERROR - stderr - 58%|█████▊ | 13094/22434 [12:31:52<6:46:19, 2.61s/it] +2025-02-05 22:39:35 - ERROR - stderr - 58%|█████▊ | 13095/22434 [12:31:55<6:40:12, 2.57s/it] +2025-02-05 22:39:35 - ERROR - stderr - +2025-02-05 22:39:35 - ERROR - stderr - +2025-02-05 22:39:35 - INFO - stdout - {'loss': 0.6953, 'grad_norm': 1.304438829421997, 'learning_rate': 7.793495034734616e-06, 'epoch': 1.75} +2025-02-05 22:39:35 - ERROR - stderr - 58%|█████▊ | 13095/22434 [12:31:55<6:40:12, 2.57s/it] +2025-02-05 22:39:37 - ERROR - stderr - 58%|█████▊ | 13096/22434 [12:31:57<6:33:12, 2.53s/it] +2025-02-05 22:39:38 - ERROR - stderr - +2025-02-05 22:39:38 - ERROR - stderr - +2025-02-05 22:39:38 - INFO - stdout - {'loss': 0.6624, 'grad_norm': 1.2923221588134766, 'learning_rate': 7.792086895220915e-06, 'epoch': 1.75} +2025-02-05 22:39:38 - ERROR - stderr - 58%|█████▊ | 13096/22434 [12:31:57<6:33:12, 2.53s/it] +2025-02-05 22:39:40 - ERROR - stderr - 58%|█████▊ | 13097/22434 [12:32:00<6:46:25, 2.61s/it] +2025-02-05 22:39:40 - ERROR - stderr - +2025-02-05 22:39:40 - ERROR - stderr - +2025-02-05 22:39:40 - INFO - stdout - {'loss': 0.7078, 'grad_norm': 1.1502407789230347, 'learning_rate': 7.790678801729056e-06, 'epoch': 1.75} +2025-02-05 22:39:40 - ERROR - stderr - 58%|█████▊ | 13097/22434 [12:32:00<6:46:25, 2.61s/it] +2025-02-05 22:39:43 - ERROR - stderr - 58%|█████▊ | 13098/22434 [12:32:03<6:40:24, 2.57s/it] +2025-02-05 22:39:43 - ERROR - stderr - +2025-02-05 22:39:43 - ERROR - stderr - +2025-02-05 22:39:43 - INFO - stdout - {'loss': 0.6706, 'grad_norm': 1.2105625867843628, 'learning_rate': 7.789270754288379e-06, 'epoch': 1.75} +2025-02-05 22:39:43 - ERROR - stderr - 58%|█████▊ | 13098/22434 [12:32:03<6:40:24, 2.57s/it] +2025-02-05 22:39:45 - ERROR - stderr - 58%|█████▊ | 13099/22434 [12:32:05<6:40:01, 2.57s/it] +2025-02-05 22:39:45 - ERROR - stderr - +2025-02-05 22:39:45 - ERROR - stderr - +2025-02-05 22:39:45 - INFO - stdout - {'loss': 0.7235, 'grad_norm': 1.4119216203689575, 'learning_rate': 
7.787862752928237e-06, 'epoch': 1.75} +2025-02-05 22:39:45 - ERROR - stderr - 58%|█████▊ | 13099/22434 [12:32:05<6:40:01, 2.57s/it] +2025-02-05 22:39:48 - ERROR - stderr - 58%|█████▊ | 13100/22434 [12:32:08<6:50:12, 2.64s/it] +2025-02-05 22:39:48 - ERROR - stderr - +2025-02-05 22:39:48 - ERROR - stderr - +2025-02-05 22:39:48 - INFO - stdout - {'loss': 0.674, 'grad_norm': 1.3615435361862183, 'learning_rate': 7.786454797677982e-06, 'epoch': 1.75} +2025-02-05 22:39:48 - ERROR - stderr - 58%|█████▊ | 13100/22434 [12:32:08<6:50:12, 2.64s/it] +2025-02-05 22:39:51 - ERROR - stderr - 58%|█████▊ | 13101/22434 [12:32:10<6:42:28, 2.59s/it] +2025-02-05 22:39:51 - ERROR - stderr - +2025-02-05 22:39:51 - ERROR - stderr - +2025-02-05 22:39:51 - INFO - stdout - {'loss': 0.6492, 'grad_norm': 1.148840069770813, 'learning_rate': 7.78504688856696e-06, 'epoch': 1.75} +2025-02-05 22:39:51 - ERROR - stderr - 58%|█████▊ | 13101/22434 [12:32:10<6:42:28, 2.59s/it] +2025-02-05 22:39:53 - ERROR - stderr - 58%|█████▊ | 13102/22434 [12:32:13<6:35:22, 2.54s/it] +2025-02-05 22:39:53 - ERROR - stderr - +2025-02-05 22:39:53 - ERROR - stderr - +2025-02-05 22:39:53 - INFO - stdout - {'loss': 0.6806, 'grad_norm': 1.2933416366577148, 'learning_rate': 7.783639025624511e-06, 'epoch': 1.75} +2025-02-05 22:39:53 - ERROR - stderr - 58%|█████▊ | 13102/22434 [12:32:13<6:35:22, 2.54s/it] +2025-02-05 22:39:56 - ERROR - stderr - 58%|█████▊ | 13103/22434 [12:32:15<6:36:03, 2.55s/it] +2025-02-05 22:39:56 - ERROR - stderr - +2025-02-05 22:39:56 - ERROR - stderr - +2025-02-05 22:39:56 - INFO - stdout - {'loss': 0.686, 'grad_norm': 1.1937079429626465, 'learning_rate': 7.782231208879991e-06, 'epoch': 1.75} +2025-02-05 22:39:56 - ERROR - stderr - 58%|█████▊ | 13103/22434 [12:32:15<6:36:03, 2.55s/it] +2025-02-05 22:39:58 - ERROR - stderr - 58%|█████▊ | 13104/22434 [12:32:18<6:31:04, 2.51s/it] +2025-02-05 22:39:58 - ERROR - stderr - +2025-02-05 22:39:58 - ERROR - stderr - +2025-02-05 22:39:58 - INFO - stdout - {'loss': 
0.653, 'grad_norm': 1.112066626548767, 'learning_rate': 7.780823438362733e-06, 'epoch': 1.75} +2025-02-05 22:39:58 - ERROR - stderr - 58%|█████▊ | 13104/22434 [12:32:18<6:31:04, 2.51s/it] +2025-02-05 22:40:01 - ERROR - stderr - 58%|█████▊ | 13105/22434 [12:32:20<6:31:23, 2.52s/it] +2025-02-05 22:40:01 - ERROR - stderr - +2025-02-05 22:40:01 - ERROR - stderr - +2025-02-05 22:40:01 - INFO - stdout - {'loss': 0.6604, 'grad_norm': 1.3983036279678345, 'learning_rate': 7.779415714102092e-06, 'epoch': 1.75} +2025-02-05 22:40:01 - ERROR - stderr - 58%|█████▊ | 13105/22434 [12:32:20<6:31:23, 2.52s/it] +2025-02-05 22:40:03 - ERROR - stderr - 58%|█████▊ | 13106/22434 [12:32:23<6:34:27, 2.54s/it] +2025-02-05 22:40:03 - ERROR - stderr - +2025-02-05 22:40:03 - ERROR - stderr - +2025-02-05 22:40:03 - INFO - stdout - {'loss': 0.7585, 'grad_norm': 1.2212536334991455, 'learning_rate': 7.778008036127405e-06, 'epoch': 1.75} +2025-02-05 22:40:03 - ERROR - stderr - 58%|█████▊ | 13106/22434 [12:32:23<6:34:27, 2.54s/it] +2025-02-05 22:40:06 - ERROR - stderr - 58%|█████▊ | 13107/22434 [12:32:25<6:32:28, 2.52s/it] +2025-02-05 22:40:06 - ERROR - stderr - +2025-02-05 22:40:06 - ERROR - stderr - +2025-02-05 22:40:06 - INFO - stdout - {'loss': 0.745, 'grad_norm': 1.2914576530456543, 'learning_rate': 7.776600404468012e-06, 'epoch': 1.75} +2025-02-05 22:40:06 - ERROR - stderr - 58%|█████▊ | 13107/22434 [12:32:25<6:32:28, 2.52s/it] +2025-02-05 22:40:08 - ERROR - stderr - 58%|█████▊ | 13108/22434 [12:32:28<6:46:20, 2.61s/it] +2025-02-05 22:40:08 - ERROR - stderr - +2025-02-05 22:40:08 - ERROR - stderr - +2025-02-05 22:40:08 - INFO - stdout - {'loss': 0.633, 'grad_norm': 1.17420494556427, 'learning_rate': 7.775192819153259e-06, 'epoch': 1.75} +2025-02-05 22:40:08 - ERROR - stderr - 58%|█████▊ | 13108/22434 [12:32:28<6:46:20, 2.61s/it] +2025-02-05 22:40:11 - ERROR - stderr - 58%|█████▊ | 13109/22434 [12:32:31<6:48:52, 2.63s/it] +2025-02-05 22:40:11 - ERROR - stderr - +2025-02-05 22:40:11 - ERROR - 
stderr - +2025-02-05 22:40:11 - INFO - stdout - {'loss': 0.6973, 'grad_norm': 1.219744324684143, 'learning_rate': 7.773785280212482e-06, 'epoch': 1.75} +2025-02-05 22:40:11 - ERROR - stderr - 58%|█████▊ | 13109/22434 [12:32:31<6:48:52, 2.63s/it] +2025-02-05 22:40:14 - ERROR - stderr - 58%|█████▊ | 13110/22434 [12:32:33<6:42:14, 2.59s/it] +2025-02-05 22:40:14 - ERROR - stderr - +2025-02-05 22:40:14 - ERROR - stderr - +2025-02-05 22:40:14 - INFO - stdout - {'loss': 0.6068, 'grad_norm': 1.3174890279769897, 'learning_rate': 7.772377787675019e-06, 'epoch': 1.75} +2025-02-05 22:40:14 - ERROR - stderr - 58%|█████▊ | 13110/22434 [12:32:33<6:42:14, 2.59s/it] +2025-02-05 22:40:16 - ERROR - stderr - 58%|█████▊ | 13111/22434 [12:32:36<6:37:42, 2.56s/it] +2025-02-05 22:40:16 - ERROR - stderr - +2025-02-05 22:40:16 - ERROR - stderr - +2025-02-05 22:40:16 - INFO - stdout - {'loss': 0.7688, 'grad_norm': 1.4683308601379395, 'learning_rate': 7.770970341570209e-06, 'epoch': 1.75} +2025-02-05 22:40:16 - ERROR - stderr - 58%|█████▊ | 13111/22434 [12:32:36<6:37:42, 2.56s/it] +2025-02-05 22:40:19 - ERROR - stderr - 58%|█████▊ | 13112/22434 [12:32:38<6:39:35, 2.57s/it] +2025-02-05 22:40:19 - ERROR - stderr - +2025-02-05 22:40:19 - ERROR - stderr - +2025-02-05 22:40:19 - INFO - stdout - {'loss': 0.6879, 'grad_norm': 1.2203373908996582, 'learning_rate': 7.769562941927387e-06, 'epoch': 1.75} +2025-02-05 22:40:19 - ERROR - stderr - 58%|█████▊ | 13112/22434 [12:32:39<6:39:35, 2.57s/it] +2025-02-05 22:40:21 - ERROR - stderr - 58%|█████▊ | 13113/22434 [12:32:41<6:41:43, 2.59s/it] +2025-02-05 22:40:21 - ERROR - stderr - +2025-02-05 22:40:21 - ERROR - stderr - +2025-02-05 22:40:21 - INFO - stdout - {'loss': 0.6657, 'grad_norm': 1.1337286233901978, 'learning_rate': 7.768155588775898e-06, 'epoch': 1.75} +2025-02-05 22:40:21 - ERROR - stderr - 58%|█████▊ | 13113/22434 [12:32:41<6:41:43, 2.59s/it] +2025-02-05 22:40:24 - ERROR - stderr - 58%|█████▊ | 13114/22434 [12:32:44<6:38:34, 2.57s/it] +2025-02-05 
22:40:24 - ERROR - stderr - +2025-02-05 22:40:24 - ERROR - stderr - +2025-02-05 22:40:24 - INFO - stdout - {'loss': 0.7034, 'grad_norm': 1.3314387798309326, 'learning_rate': 7.766748282145068e-06, 'epoch': 1.75} +2025-02-05 22:40:24 - ERROR - stderr - 58%|█████▊ | 13114/22434 [12:32:44<6:38:34, 2.57s/it] +2025-02-05 22:40:26 - ERROR - stderr - 58%|█████▊ | 13115/22434 [12:32:46<6:35:40, 2.55s/it] +2025-02-05 22:40:26 - ERROR - stderr - +2025-02-05 22:40:26 - ERROR - stderr - +2025-02-05 22:40:26 - INFO - stdout - {'loss': 0.6455, 'grad_norm': 1.2009519338607788, 'learning_rate': 7.76534102206423e-06, 'epoch': 1.75} +2025-02-05 22:40:26 - ERROR - stderr - 58%|█████▊ | 13115/22434 [12:32:46<6:35:40, 2.55s/it] +2025-02-05 22:40:29 - ERROR - stderr - 58%|█████▊ | 13116/22434 [12:32:49<6:37:58, 2.56s/it] +2025-02-05 22:40:29 - ERROR - stderr - +2025-02-05 22:40:29 - ERROR - stderr - +2025-02-05 22:40:29 - INFO - stdout - {'loss': 0.7057, 'grad_norm': 1.228232502937317, 'learning_rate': 7.763933808562724e-06, 'epoch': 1.75} +2025-02-05 22:40:29 - ERROR - stderr - 58%|█████▊ | 13116/22434 [12:32:49<6:37:58, 2.56s/it] +2025-02-05 22:40:32 - ERROR - stderr - 58%|█████▊ | 13117/22434 [12:32:51<6:41:16, 2.58s/it] +2025-02-05 22:40:32 - ERROR - stderr - +2025-02-05 22:40:32 - ERROR - stderr - +2025-02-05 22:40:32 - INFO - stdout - {'loss': 0.7114, 'grad_norm': 1.3439264297485352, 'learning_rate': 7.762526641669875e-06, 'epoch': 1.75} +2025-02-05 22:40:32 - ERROR - stderr - 58%|█████▊ | 13117/22434 [12:32:51<6:41:16, 2.58s/it] +2025-02-05 22:40:34 - ERROR - stderr - 58%|█████▊ | 13118/22434 [12:32:54<6:44:51, 2.61s/it] +2025-02-05 22:40:34 - ERROR - stderr - +2025-02-05 22:40:34 - ERROR - stderr - +2025-02-05 22:40:34 - INFO - stdout - {'loss': 0.6948, 'grad_norm': 1.2936687469482422, 'learning_rate': 7.761119521415017e-06, 'epoch': 1.75} +2025-02-05 22:40:34 - ERROR - stderr - 58%|█████▊ | 13118/22434 [12:32:54<6:44:51, 2.61s/it] +2025-02-05 22:40:37 - ERROR - stderr - 
58%|█████▊ | 13119/22434 [12:32:57<6:43:18, 2.60s/it] +2025-02-05 22:40:37 - ERROR - stderr - +2025-02-05 22:40:37 - ERROR - stderr - +2025-02-05 22:40:37 - INFO - stdout - {'loss': 0.617, 'grad_norm': 1.2233085632324219, 'learning_rate': 7.759712447827482e-06, 'epoch': 1.75} +2025-02-05 22:40:37 - ERROR - stderr - 58%|█████▊ | 13119/22434 [12:32:57<6:43:18, 2.60s/it] +2025-02-05 22:40:39 - ERROR - stderr - 58%|█████▊ | 13120/22434 [12:32:59<6:38:14, 2.57s/it] +2025-02-05 22:40:39 - ERROR - stderr - +2025-02-05 22:40:39 - ERROR - stderr - +2025-02-05 22:40:39 - INFO - stdout - {'loss': 0.6189, 'grad_norm': 1.2602564096450806, 'learning_rate': 7.758305420936594e-06, 'epoch': 1.75} +2025-02-05 22:40:39 - ERROR - stderr - 58%|█████▊ | 13120/22434 [12:32:59<6:38:14, 2.57s/it] +2025-02-05 22:40:42 - ERROR - stderr - 58%|█████▊ | 13121/22434 [12:33:02<6:46:18, 2.62s/it] +2025-02-05 22:40:42 - ERROR - stderr - +2025-02-05 22:40:42 - ERROR - stderr - +2025-02-05 22:40:42 - INFO - stdout - {'loss': 0.6787, 'grad_norm': 1.1129753589630127, 'learning_rate': 7.75689844077169e-06, 'epoch': 1.75} +2025-02-05 22:40:42 - ERROR - stderr - 58%|█████▊ | 13121/22434 [12:33:02<6:46:18, 2.62s/it] +2025-02-05 22:40:45 - ERROR - stderr - 58%|█████▊ | 13122/22434 [12:33:04<6:40:49, 2.58s/it] +2025-02-05 22:40:45 - ERROR - stderr - +2025-02-05 22:40:45 - ERROR - stderr - +2025-02-05 22:40:45 - INFO - stdout - {'loss': 0.6833, 'grad_norm': 1.278192162513733, 'learning_rate': 7.755491507362089e-06, 'epoch': 1.75} +2025-02-05 22:40:45 - ERROR - stderr - 58%|█████▊ | 13122/22434 [12:33:04<6:40:49, 2.58s/it] +2025-02-05 22:40:47 - ERROR - stderr - 58%|█████▊ | 13123/22434 [12:33:07<6:39:27, 2.57s/it] +2025-02-05 22:40:47 - ERROR - stderr - +2025-02-05 22:40:47 - ERROR - stderr - +2025-02-05 22:40:47 - INFO - stdout - {'loss': 0.6827, 'grad_norm': 1.1872895956039429, 'learning_rate': 7.754084620737117e-06, 'epoch': 1.75} +2025-02-05 22:40:47 - ERROR - stderr - 58%|█████▊ | 13123/22434 
[12:33:07<6:39:27, 2.57s/it] +2025-02-05 22:40:50 - ERROR - stderr - 59%|█████▊ | 13124/22434 [12:33:09<6:39:56, 2.58s/it] +2025-02-05 22:40:50 - ERROR - stderr - +2025-02-05 22:40:50 - ERROR - stderr - +2025-02-05 22:40:50 - INFO - stdout - {'loss': 0.7143, 'grad_norm': 1.3323551416397095, 'learning_rate': 7.752677780926105e-06, 'epoch': 1.76} +2025-02-05 22:40:50 - ERROR - stderr - 59%|█████▊ | 13124/22434 [12:33:09<6:39:56, 2.58s/it] +2025-02-05 22:40:52 - ERROR - stderr - 59%|█████▊ | 13125/22434 [12:33:12<6:35:29, 2.55s/it] +2025-02-05 22:40:52 - ERROR - stderr - +2025-02-05 22:40:52 - ERROR - stderr - +2025-02-05 22:40:52 - INFO - stdout - {'loss': 0.7769, 'grad_norm': 1.3326780796051025, 'learning_rate': 7.751270987958375e-06, 'epoch': 1.76} +2025-02-05 22:40:52 - ERROR - stderr - 59%|█████▊ | 13125/22434 [12:33:12<6:35:29, 2.55s/it] +2025-02-05 22:40:55 - ERROR - stderr - 59%|█████▊ | 13126/22434 [12:33:15<6:41:40, 2.59s/it] +2025-02-05 22:40:55 - ERROR - stderr - +2025-02-05 22:40:55 - ERROR - stderr - +2025-02-05 22:40:55 - INFO - stdout - {'loss': 0.6745, 'grad_norm': 1.1852363348007202, 'learning_rate': 7.749864241863245e-06, 'epoch': 1.76} +2025-02-05 22:40:55 - ERROR - stderr - 59%|█████▊ | 13126/22434 [12:33:15<6:41:40, 2.59s/it] +2025-02-05 22:40:57 - ERROR - stderr - 59%|█████▊ | 13127/22434 [12:33:17<6:36:04, 2.55s/it] +2025-02-05 22:40:57 - ERROR - stderr - +2025-02-05 22:40:57 - ERROR - stderr - +2025-02-05 22:40:57 - INFO - stdout - {'loss': 0.681, 'grad_norm': 1.3145664930343628, 'learning_rate': 7.748457542670046e-06, 'epoch': 1.76} +2025-02-05 22:40:57 - ERROR - stderr - 59%|█████▊ | 13127/22434 [12:33:17<6:36:04, 2.55s/it] +2025-02-05 22:41:00 - ERROR - stderr - 59%|█████▊ | 13128/22434 [12:33:20<6:29:47, 2.51s/it] +2025-02-05 22:41:00 - ERROR - stderr - +2025-02-05 22:41:00 - ERROR - stderr - +2025-02-05 22:41:00 - INFO - stdout - {'loss': 0.6562, 'grad_norm': 1.3497494459152222, 'learning_rate': 7.747050890408092e-06, 'epoch': 1.76} 
+2025-02-05 22:41:00 - ERROR - stderr - 59%|█████▊ | 13128/22434 [12:33:20<6:29:47, 2.51s/it] +2025-02-05 22:41:02 - ERROR - stderr - 59%|█████▊ | 13129/22434 [12:33:22<6:25:31, 2.49s/it] +2025-02-05 22:41:02 - ERROR - stderr - +2025-02-05 22:41:02 - ERROR - stderr - +2025-02-05 22:41:02 - INFO - stdout - {'loss': 0.7677, 'grad_norm': 1.2761904001235962, 'learning_rate': 7.74564428510671e-06, 'epoch': 1.76} +2025-02-05 22:41:02 - ERROR - stderr - 59%|█████▊ | 13129/22434 [12:33:22<6:25:31, 2.49s/it] +2025-02-05 22:41:05 - ERROR - stderr - 59%|█████▊ | 13130/22434 [12:33:24<6:23:19, 2.47s/it] +2025-02-05 22:41:05 - ERROR - stderr - +2025-02-05 22:41:05 - ERROR - stderr - +2025-02-05 22:41:05 - INFO - stdout - {'loss': 0.6712, 'grad_norm': 1.1878252029418945, 'learning_rate': 7.744237726795213e-06, 'epoch': 1.76} +2025-02-05 22:41:05 - ERROR - stderr - 59%|█████▊ | 13130/22434 [12:33:24<6:23:19, 2.47s/it] +2025-02-05 22:41:07 - ERROR - stderr - 59%|█████▊ | 13131/22434 [12:33:27<6:25:36, 2.49s/it] +2025-02-05 22:41:07 - ERROR - stderr - +2025-02-05 22:41:07 - ERROR - stderr - +2025-02-05 22:41:07 - INFO - stdout - {'loss': 0.6343, 'grad_norm': 1.1849473714828491, 'learning_rate': 7.742831215502922e-06, 'epoch': 1.76} +2025-02-05 22:41:07 - ERROR - stderr - 59%|█████▊ | 13131/22434 [12:33:27<6:25:36, 2.49s/it] +2025-02-05 22:41:10 - ERROR - stderr - 59%|█████▊ | 13132/22434 [12:33:29<6:26:26, 2.49s/it] +2025-02-05 22:41:10 - ERROR - stderr - +2025-02-05 22:41:10 - ERROR - stderr - +2025-02-05 22:41:10 - INFO - stdout - {'loss': 0.6887, 'grad_norm': 1.2963931560516357, 'learning_rate': 7.741424751259156e-06, 'epoch': 1.76} +2025-02-05 22:41:10 - ERROR - stderr - 59%|█████▊ | 13132/22434 [12:33:29<6:26:26, 2.49s/it] +2025-02-05 22:41:13 - ERROR - stderr - 59%|█████▊ | 13133/22434 [12:33:32<6:44:43, 2.61s/it] +2025-02-05 22:41:13 - ERROR - stderr - +2025-02-05 22:41:13 - ERROR - stderr - +2025-02-05 22:41:13 - INFO - stdout - {'loss': 0.7811, 'grad_norm': 
1.3765865564346313, 'learning_rate': 7.740018334093231e-06, 'epoch': 1.76} +2025-02-05 22:41:13 - ERROR - stderr - 59%|█████▊ | 13133/22434 [12:33:32<6:44:43, 2.61s/it] +2025-02-05 22:41:15 - ERROR - stderr - 59%|█████▊ | 13134/22434 [12:33:35<6:42:54, 2.60s/it] +2025-02-05 22:41:15 - ERROR - stderr - +2025-02-05 22:41:15 - ERROR - stderr - +2025-02-05 22:41:15 - INFO - stdout - {'loss': 0.7118, 'grad_norm': 1.3247803449630737, 'learning_rate': 7.738611964034458e-06, 'epoch': 1.76} +2025-02-05 22:41:15 - ERROR - stderr - 59%|█████▊ | 13134/22434 [12:33:35<6:42:54, 2.60s/it] +2025-02-05 22:41:18 - ERROR - stderr - 59%|█████▊ | 13135/22434 [12:33:37<6:42:21, 2.60s/it] +2025-02-05 22:41:18 - ERROR - stderr - +2025-02-05 22:41:18 - ERROR - stderr - +2025-02-05 22:41:18 - INFO - stdout - {'loss': 0.7609, 'grad_norm': 1.4144973754882812, 'learning_rate': 7.737205641112158e-06, 'epoch': 1.76} +2025-02-05 22:41:18 - ERROR - stderr - 59%|█████▊ | 13135/22434 [12:33:37<6:42:21, 2.60s/it] +2025-02-05 22:41:20 - ERROR - stderr - 59%|█████▊ | 13136/22434 [12:33:40<6:39:46, 2.58s/it] +2025-02-05 22:41:20 - ERROR - stderr - +2025-02-05 22:41:20 - ERROR - stderr - +2025-02-05 22:41:20 - INFO - stdout - {'loss': 0.7217, 'grad_norm': 1.2717084884643555, 'learning_rate': 7.735799365355636e-06, 'epoch': 1.76} +2025-02-05 22:41:20 - ERROR - stderr - 59%|█████▊ | 13136/22434 [12:33:40<6:39:46, 2.58s/it] +2025-02-05 22:41:23 - ERROR - stderr - 59%|█████▊ | 13137/22434 [12:33:43<6:38:54, 2.57s/it] +2025-02-05 22:41:23 - ERROR - stderr - +2025-02-05 22:41:23 - ERROR - stderr - +2025-02-05 22:41:23 - INFO - stdout - {'loss': 0.6734, 'grad_norm': 1.3558757305145264, 'learning_rate': 7.734393136794214e-06, 'epoch': 1.76} +2025-02-05 22:41:23 - ERROR - stderr - 59%|█████▊ | 13137/22434 [12:33:43<6:38:54, 2.57s/it] +2025-02-05 22:41:26 - ERROR - stderr - 59%|█████▊ | 13138/22434 [12:33:45<6:51:05, 2.65s/it] +2025-02-05 22:41:26 - ERROR - stderr - +2025-02-05 22:41:26 - ERROR - stderr - 
+2025-02-05 22:41:26 - INFO - stdout - {'loss': 0.7037, 'grad_norm': 1.2823171615600586, 'learning_rate': 7.732986955457198e-06, 'epoch': 1.76} +2025-02-05 22:41:26 - ERROR - stderr - 59%|█████▊ | 13138/22434 [12:33:45<6:51:05, 2.65s/it] +2025-02-05 22:41:28 - ERROR - stderr - 59%|█████▊ | 13139/22434 [12:33:48<6:48:58, 2.64s/it] +2025-02-05 22:41:28 - ERROR - stderr - +2025-02-05 22:41:28 - ERROR - stderr - +2025-02-05 22:41:28 - INFO - stdout - {'loss': 0.7804, 'grad_norm': 1.4336529970169067, 'learning_rate': 7.731580821373898e-06, 'epoch': 1.76} +2025-02-05 22:41:28 - ERROR - stderr - 59%|█████▊ | 13139/22434 [12:33:48<6:48:58, 2.64s/it] +2025-02-05 22:41:31 - ERROR - stderr - 59%|█████▊ | 13140/22434 [12:33:51<6:48:05, 2.63s/it] +2025-02-05 22:41:31 - ERROR - stderr - +2025-02-05 22:41:31 - ERROR - stderr - +2025-02-05 22:41:31 - INFO - stdout - {'loss': 0.6895, 'grad_norm': 1.2235890626907349, 'learning_rate': 7.73017473457363e-06, 'epoch': 1.76} +2025-02-05 22:41:31 - ERROR - stderr - 59%|█████▊ | 13140/22434 [12:33:51<6:48:05, 2.63s/it] +2025-02-05 22:41:34 - ERROR - stderr - 59%|█████▊ | 13141/22434 [12:33:53<6:52:14, 2.66s/it] +2025-02-05 22:41:34 - ERROR - stderr - +2025-02-05 22:41:34 - ERROR - stderr - +2025-02-05 22:41:34 - INFO - stdout - {'loss': 0.7193, 'grad_norm': 1.3857910633087158, 'learning_rate': 7.728768695085696e-06, 'epoch': 1.76} +2025-02-05 22:41:34 - ERROR - stderr - 59%|█████▊ | 13141/22434 [12:33:53<6:52:14, 2.66s/it] +2025-02-05 22:41:36 - ERROR - stderr - 59%|█████▊ | 13142/22434 [12:33:56<6:49:22, 2.64s/it] +2025-02-05 22:41:36 - ERROR - stderr - +2025-02-05 22:41:36 - ERROR - stderr - +2025-02-05 22:41:36 - INFO - stdout - {'loss': 0.5665, 'grad_norm': 1.0571202039718628, 'learning_rate': 7.7273627029394e-06, 'epoch': 1.76} +2025-02-05 22:41:36 - ERROR - stderr - 59%|█████▊ | 13142/22434 [12:33:56<6:49:22, 2.64s/it] +2025-02-05 22:41:39 - ERROR - stderr - 59%|█████▊ | 13143/22434 [12:33:58<6:43:20, 2.60s/it] +2025-02-05 22:41:39 - 
ERROR - stderr - +2025-02-05 22:41:39 - ERROR - stderr - +2025-02-05 22:41:39 - INFO - stdout - {'loss': 0.6566, 'grad_norm': 1.1605843305587769, 'learning_rate': 7.725956758164058e-06, 'epoch': 1.76} +2025-02-05 22:41:39 - ERROR - stderr - 59%|█████▊ | 13143/22434 [12:33:59<6:43:20, 2.60s/it] +2025-02-05 22:41:41 - ERROR - stderr - 59%|█████▊ | 13144/22434 [12:34:01<6:43:21, 2.61s/it] +2025-02-05 22:41:41 - ERROR - stderr - +2025-02-05 22:41:41 - ERROR - stderr - +2025-02-05 22:41:41 - INFO - stdout - {'loss': 0.6738, 'grad_norm': 1.1740840673446655, 'learning_rate': 7.724550860788968e-06, 'epoch': 1.76} +2025-02-05 22:41:41 - ERROR - stderr - 59%|█████▊ | 13144/22434 [12:34:01<6:43:21, 2.61s/it] +2025-02-05 22:41:44 - ERROR - stderr - 59%|█████▊ | 13145/22434 [12:34:04<6:35:57, 2.56s/it] +2025-02-05 22:41:44 - ERROR - stderr - +2025-02-05 22:41:44 - ERROR - stderr - +2025-02-05 22:41:44 - INFO - stdout - {'loss': 0.6279, 'grad_norm': 1.2423170804977417, 'learning_rate': 7.723145010843442e-06, 'epoch': 1.76} +2025-02-05 22:41:44 - ERROR - stderr - 59%|█████▊ | 13145/22434 [12:34:04<6:35:57, 2.56s/it] +2025-02-05 22:41:46 - ERROR - stderr - 59%|█████▊ | 13146/22434 [12:34:06<6:31:13, 2.53s/it] +2025-02-05 22:41:46 - ERROR - stderr - +2025-02-05 22:41:46 - ERROR - stderr - +2025-02-05 22:41:46 - INFO - stdout - {'loss': 0.6394, 'grad_norm': 1.259957194328308, 'learning_rate': 7.72173920835678e-06, 'epoch': 1.76} +2025-02-05 22:41:46 - ERROR - stderr - 59%|█████▊ | 13146/22434 [12:34:06<6:31:13, 2.53s/it] +2025-02-05 22:41:49 - ERROR - stderr - 59%|█████▊ | 13147/22434 [12:34:09<6:32:32, 2.54s/it] +2025-02-05 22:41:49 - ERROR - stderr - +2025-02-05 22:41:49 - ERROR - stderr - +2025-02-05 22:41:49 - INFO - stdout - {'loss': 0.6923, 'grad_norm': 1.2878979444503784, 'learning_rate': 7.720333453358281e-06, 'epoch': 1.76} +2025-02-05 22:41:49 - ERROR - stderr - 59%|█████▊ | 13147/22434 [12:34:09<6:32:32, 2.54s/it] +2025-02-05 22:41:52 - ERROR - stderr - 59%|█████▊ | 
13148/22434 [12:34:11<6:43:09, 2.60s/it] +2025-02-05 22:41:52 - ERROR - stderr - +2025-02-05 22:41:52 - ERROR - stderr - +2025-02-05 22:41:52 - INFO - stdout - {'loss': 0.649, 'grad_norm': 1.1974692344665527, 'learning_rate': 7.718927745877253e-06, 'epoch': 1.76} +2025-02-05 22:41:52 - ERROR - stderr - 59%|█████▊ | 13148/22434 [12:34:11<6:43:09, 2.60s/it] +2025-02-05 22:41:54 - ERROR - stderr - 59%|█████▊ | 13149/22434 [12:34:14<6:34:36, 2.55s/it] +2025-02-05 22:41:54 - ERROR - stderr - +2025-02-05 22:41:54 - ERROR - stderr - +2025-02-05 22:41:54 - INFO - stdout - {'loss': 0.7997, 'grad_norm': 1.427952527999878, 'learning_rate': 7.71752208594299e-06, 'epoch': 1.76} +2025-02-05 22:41:54 - ERROR - stderr - 59%|█████▊ | 13149/22434 [12:34:14<6:34:36, 2.55s/it] +2025-02-05 22:41:57 - ERROR - stderr - 59%|█████▊ | 13150/22434 [12:34:16<6:39:49, 2.58s/it] +2025-02-05 22:41:57 - ERROR - stderr - +2025-02-05 22:41:57 - ERROR - stderr - +2025-02-05 22:41:57 - INFO - stdout - {'loss': 0.7211, 'grad_norm': 1.2960275411605835, 'learning_rate': 7.716116473584795e-06, 'epoch': 1.76} +2025-02-05 22:41:57 - ERROR - stderr - 59%|█████▊ | 13150/22434 [12:34:16<6:39:49, 2.58s/it] +2025-02-05 22:41:59 - ERROR - stderr - 59%|█████▊ | 13151/22434 [12:34:19<6:35:14, 2.55s/it] +2025-02-05 22:41:59 - ERROR - stderr - +2025-02-05 22:41:59 - ERROR - stderr - +2025-02-05 22:41:59 - INFO - stdout - {'loss': 0.6381, 'grad_norm': 1.2086318731307983, 'learning_rate': 7.714710908831971e-06, 'epoch': 1.76} +2025-02-05 22:41:59 - ERROR - stderr - 59%|█████▊ | 13151/22434 [12:34:19<6:35:14, 2.55s/it] +2025-02-05 22:42:02 - ERROR - stderr - 59%|█████▊ | 13152/22434 [12:34:21<6:33:49, 2.55s/it] +2025-02-05 22:42:02 - ERROR - stderr - +2025-02-05 22:42:02 - ERROR - stderr - +2025-02-05 22:42:02 - INFO - stdout - {'loss': 0.7629, 'grad_norm': 1.3161646127700806, 'learning_rate': 7.713305391713805e-06, 'epoch': 1.76} +2025-02-05 22:42:02 - ERROR - stderr - 59%|█████▊ | 13152/22434 [12:34:21<6:33:49, 
2.55s/it] +2025-02-05 22:42:04 - ERROR - stderr - 59%|█████▊ | 13153/22434 [12:34:24<6:33:37, 2.54s/it] +2025-02-05 22:42:04 - ERROR - stderr - +2025-02-05 22:42:04 - ERROR - stderr - +2025-02-05 22:42:04 - INFO - stdout - {'loss': 0.7692, 'grad_norm': 1.285466194152832, 'learning_rate': 7.711899922259606e-06, 'epoch': 1.76} +2025-02-05 22:42:04 - ERROR - stderr - 59%|█████▊ | 13153/22434 [12:34:24<6:33:37, 2.54s/it] +2025-02-05 22:42:07 - ERROR - stderr - 59%|█████▊ | 13154/22434 [12:34:26<6:33:11, 2.54s/it] +2025-02-05 22:42:07 - ERROR - stderr - +2025-02-05 22:42:07 - ERROR - stderr - +2025-02-05 22:42:07 - INFO - stdout - {'loss': 0.7182, 'grad_norm': 1.31812584400177, 'learning_rate': 7.710494500498662e-06, 'epoch': 1.76} +2025-02-05 22:42:07 - ERROR - stderr - 59%|█████▊ | 13154/22434 [12:34:27<6:33:11, 2.54s/it] +2025-02-05 22:42:09 - ERROR - stderr - 59%|█████▊ | 13155/22434 [12:34:29<6:36:13, 2.56s/it] +2025-02-05 22:42:09 - ERROR - stderr - +2025-02-05 22:42:09 - ERROR - stderr - +2025-02-05 22:42:09 - INFO - stdout - {'loss': 0.6984, 'grad_norm': 1.337638020515442, 'learning_rate': 7.709089126460266e-06, 'epoch': 1.76} +2025-02-05 22:42:09 - ERROR - stderr - 59%|█████▊ | 13155/22434 [12:34:29<6:36:13, 2.56s/it] +2025-02-05 22:42:12 - ERROR - stderr - 59%|█████▊ | 13156/22434 [12:34:32<6:37:12, 2.57s/it] +2025-02-05 22:42:12 - ERROR - stderr - +2025-02-05 22:42:12 - ERROR - stderr - +2025-02-05 22:42:12 - INFO - stdout - {'loss': 0.639, 'grad_norm': 1.3110979795455933, 'learning_rate': 7.707683800173717e-06, 'epoch': 1.76} +2025-02-05 22:42:12 - ERROR - stderr - 59%|█████▊ | 13156/22434 [12:34:32<6:37:12, 2.57s/it] +2025-02-05 22:42:14 - ERROR - stderr - 59%|█████▊ | 13157/22434 [12:34:34<6:34:28, 2.55s/it] +2025-02-05 22:42:14 - ERROR - stderr - +2025-02-05 22:42:14 - ERROR - stderr - +2025-02-05 22:42:14 - INFO - stdout - {'loss': 0.7167, 'grad_norm': 1.1933131217956543, 'learning_rate': 7.70627852166831e-06, 'epoch': 1.76} +2025-02-05 22:42:14 - ERROR 
- stderr - 59%|█████▊ | 13157/22434 [12:34:34<6:34:28, 2.55s/it] +2025-02-05 22:42:17 - ERROR - stderr - 59%|█████▊ | 13158/22434 [12:34:37<6:40:44, 2.59s/it] +2025-02-05 22:42:17 - ERROR - stderr - +2025-02-05 22:42:17 - ERROR - stderr - +2025-02-05 22:42:17 - INFO - stdout - {'loss': 0.6861, 'grad_norm': 1.4542263746261597, 'learning_rate': 7.704873290973325e-06, 'epoch': 1.76} +2025-02-05 22:42:17 - ERROR - stderr - 59%|█████▊ | 13158/22434 [12:34:37<6:40:44, 2.59s/it] +2025-02-05 22:42:20 - ERROR - stderr - 59%|█████▊ | 13159/22434 [12:34:39<6:37:07, 2.57s/it] +2025-02-05 22:42:20 - ERROR - stderr - +2025-02-05 22:42:20 - ERROR - stderr - +2025-02-05 22:42:20 - INFO - stdout - {'loss': 0.7163, 'grad_norm': 1.263856291770935, 'learning_rate': 7.703468108118064e-06, 'epoch': 1.76} +2025-02-05 22:42:20 - ERROR - stderr - 59%|█████▊ | 13159/22434 [12:34:39<6:37:07, 2.57s/it] +2025-02-05 22:42:22 - ERROR - stderr - 59%|█████▊ | 13160/22434 [12:34:42<6:29:35, 2.52s/it] +2025-02-05 22:42:22 - ERROR - stderr - +2025-02-05 22:42:22 - ERROR - stderr - +2025-02-05 22:42:22 - INFO - stdout - {'loss': 0.7376, 'grad_norm': 1.3714478015899658, 'learning_rate': 7.702062973131812e-06, 'epoch': 1.76} +2025-02-05 22:42:22 - ERROR - stderr - 59%|█████▊ | 13160/22434 [12:34:42<6:29:35, 2.52s/it] +2025-02-05 22:42:25 - ERROR - stderr - 59%|█████▊ | 13161/22434 [12:34:44<6:31:18, 2.53s/it] +2025-02-05 22:42:25 - ERROR - stderr - +2025-02-05 22:42:25 - ERROR - stderr - +2025-02-05 22:42:25 - INFO - stdout - {'loss': 0.6726, 'grad_norm': 1.1897685527801514, 'learning_rate': 7.700657886043859e-06, 'epoch': 1.76} +2025-02-05 22:42:25 - ERROR - stderr - 59%|█████▊ | 13161/22434 [12:34:44<6:31:18, 2.53s/it] +2025-02-05 22:42:27 - ERROR - stderr - 59%|█████▊ | 13162/22434 [12:34:47<6:32:10, 2.54s/it] +2025-02-05 22:42:27 - ERROR - stderr - +2025-02-05 22:42:27 - ERROR - stderr - +2025-02-05 22:42:27 - INFO - stdout - {'loss': 0.6847, 'grad_norm': 1.113761067390442, 'learning_rate': 
7.699252846883493e-06, 'epoch': 1.76} +2025-02-05 22:42:27 - ERROR - stderr - 59%|█████▊ | 13162/22434 [12:34:47<6:32:10, 2.54s/it] +2025-02-05 22:42:30 - ERROR - stderr - 59%|█████▊ | 13163/22434 [12:34:50<6:44:15, 2.62s/it] +2025-02-05 22:42:30 - ERROR - stderr - +2025-02-05 22:42:30 - ERROR - stderr - +2025-02-05 22:42:30 - INFO - stdout - {'loss': 0.7176, 'grad_norm': 1.385581612586975, 'learning_rate': 7.697847855679996e-06, 'epoch': 1.76} +2025-02-05 22:42:30 - ERROR - stderr - 59%|█████▊ | 13163/22434 [12:34:50<6:44:15, 2.62s/it] +2025-02-05 22:42:32 - ERROR - stderr - 59%|█████▊ | 13164/22434 [12:34:52<6:38:32, 2.58s/it] +2025-02-05 22:42:32 - ERROR - stderr - +2025-02-05 22:42:32 - ERROR - stderr - +2025-02-05 22:42:32 - INFO - stdout - {'loss': 0.7265, 'grad_norm': 1.3693904876708984, 'learning_rate': 7.696442912462662e-06, 'epoch': 1.76} +2025-02-05 22:42:32 - ERROR - stderr - 59%|█████▊ | 13164/22434 [12:34:52<6:38:32, 2.58s/it] +2025-02-05 22:42:35 - ERROR - stderr - 59%|█████▊ | 13165/22434 [12:34:55<6:37:46, 2.57s/it] +2025-02-05 22:42:35 - ERROR - stderr - +2025-02-05 22:42:35 - ERROR - stderr - +2025-02-05 22:42:35 - INFO - stdout - {'loss': 0.7417, 'grad_norm': 1.2939317226409912, 'learning_rate': 7.695038017260772e-06, 'epoch': 1.76} +2025-02-05 22:42:35 - ERROR - stderr - 59%|█████▊ | 13165/22434 [12:34:55<6:37:46, 2.57s/it] +2025-02-05 22:42:38 - ERROR - stderr - 59%|█████▊ | 13166/22434 [12:34:57<6:42:08, 2.60s/it] +2025-02-05 22:42:38 - ERROR - stderr - +2025-02-05 22:42:38 - ERROR - stderr - +2025-02-05 22:42:38 - INFO - stdout - {'loss': 0.7064, 'grad_norm': 1.280278205871582, 'learning_rate': 7.693633170103603e-06, 'epoch': 1.76} +2025-02-05 22:42:38 - ERROR - stderr - 59%|█████▊ | 13166/22434 [12:34:57<6:42:08, 2.60s/it] +2025-02-05 22:42:40 - ERROR - stderr - 59%|█████▊ | 13167/22434 [12:35:00<6:52:58, 2.67s/it] +2025-02-05 22:42:41 - ERROR - stderr - +2025-02-05 22:42:41 - ERROR - stderr - +2025-02-05 22:42:41 - INFO - stdout - {'loss': 
0.6714, 'grad_norm': 1.2304584980010986, 'learning_rate': 7.692228371020449e-06, 'epoch': 1.76} +2025-02-05 22:42:41 - ERROR - stderr - 59%|█████▊ | 13167/22434 [12:35:00<6:52:58, 2.67s/it] +2025-02-05 22:42:43 - ERROR - stderr - 59%|█████▊ | 13168/22434 [12:35:03<6:54:33, 2.68s/it] +2025-02-05 22:42:43 - ERROR - stderr - +2025-02-05 22:42:43 - ERROR - stderr - +2025-02-05 22:42:43 - INFO - stdout - {'loss': 0.6829, 'grad_norm': 1.2177759408950806, 'learning_rate': 7.690823620040581e-06, 'epoch': 1.76} +2025-02-05 22:42:43 - ERROR - stderr - 59%|█████▊ | 13168/22434 [12:35:03<6:54:33, 2.68s/it] +2025-02-05 22:42:46 - ERROR - stderr - 59%|█████▊ | 13169/22434 [12:35:06<7:03:56, 2.75s/it] +2025-02-05 22:42:46 - ERROR - stderr - +2025-02-05 22:42:46 - ERROR - stderr - +2025-02-05 22:42:46 - INFO - stdout - {'loss': 0.5802, 'grad_norm': 1.219003677368164, 'learning_rate': 7.68941891719329e-06, 'epoch': 1.76} +2025-02-05 22:42:46 - ERROR - stderr - 59%|█████▊ | 13169/22434 [12:35:06<7:03:56, 2.75s/it] +2025-02-05 22:42:49 - ERROR - stderr - 59%|█████▊ | 13170/22434 [12:35:08<6:56:35, 2.70s/it] +2025-02-05 22:42:49 - ERROR - stderr - +2025-02-05 22:42:49 - ERROR - stderr - +2025-02-05 22:42:49 - INFO - stdout - {'loss': 0.6488, 'grad_norm': 1.1753109693527222, 'learning_rate': 7.68801426250785e-06, 'epoch': 1.76} +2025-02-05 22:42:49 - ERROR - stderr - 59%|█████▊ | 13170/22434 [12:35:08<6:56:35, 2.70s/it] +2025-02-05 22:42:51 - ERROR - stderr - 59%|█████▊ | 13171/22434 [12:35:11<6:55:45, 2.69s/it] +2025-02-05 22:42:51 - ERROR - stderr - +2025-02-05 22:42:51 - ERROR - stderr - +2025-02-05 22:42:51 - INFO - stdout - {'loss': 0.6663, 'grad_norm': 1.1693811416625977, 'learning_rate': 7.686609656013538e-06, 'epoch': 1.76} +2025-02-05 22:42:51 - ERROR - stderr - 59%|█████▊ | 13171/22434 [12:35:11<6:55:45, 2.69s/it] +2025-02-05 22:42:54 - ERROR - stderr - 59%|█████▊ | 13172/22434 [12:35:14<6:46:45, 2.63s/it] +2025-02-05 22:42:54 - ERROR - stderr - +2025-02-05 22:42:54 - ERROR - 
stderr - +2025-02-05 22:42:54 - INFO - stdout - {'loss': 0.6169, 'grad_norm': 1.1380445957183838, 'learning_rate': 7.685205097739636e-06, 'epoch': 1.76} +2025-02-05 22:42:54 - ERROR - stderr - 59%|█████▊ | 13172/22434 [12:35:14<6:46:45, 2.63s/it] +2025-02-05 22:42:56 - ERROR - stderr - 59%|█████▊ | 13173/22434 [12:35:16<6:43:32, 2.61s/it] +2025-02-05 22:42:56 - ERROR - stderr - +2025-02-05 22:42:56 - ERROR - stderr - +2025-02-05 22:42:56 - INFO - stdout - {'loss': 0.709, 'grad_norm': 1.2496849298477173, 'learning_rate': 7.683800587715416e-06, 'epoch': 1.76} +2025-02-05 22:42:56 - ERROR - stderr - 59%|█████▊ | 13173/22434 [12:35:16<6:43:32, 2.61s/it] +2025-02-05 22:42:59 - ERROR - stderr - 59%|█████▊ | 13174/22434 [12:35:19<6:36:34, 2.57s/it] +2025-02-05 22:42:59 - ERROR - stderr - +2025-02-05 22:42:59 - ERROR - stderr - +2025-02-05 22:42:59 - INFO - stdout - {'loss': 0.7651, 'grad_norm': 1.2730404138565063, 'learning_rate': 7.68239612597016e-06, 'epoch': 1.76} +2025-02-05 22:42:59 - ERROR - stderr - 59%|█████▊ | 13174/22434 [12:35:19<6:36:34, 2.57s/it] +2025-02-05 22:43:01 - ERROR - stderr - 59%|█████▊ | 13175/22434 [12:35:21<6:32:13, 2.54s/it] +2025-02-05 22:43:01 - ERROR - stderr - +2025-02-05 22:43:01 - ERROR - stderr - +2025-02-05 22:43:01 - INFO - stdout - {'loss': 0.6542, 'grad_norm': 1.2129355669021606, 'learning_rate': 7.680991712533138e-06, 'epoch': 1.76} +2025-02-05 22:43:01 - ERROR - stderr - 59%|█████▊ | 13175/22434 [12:35:21<6:32:13, 2.54s/it] +2025-02-05 22:43:04 - ERROR - stderr - 59%|█████▊ | 13176/22434 [12:35:24<6:31:43, 2.54s/it] +2025-02-05 22:43:04 - ERROR - stderr - +2025-02-05 22:43:04 - ERROR - stderr - +2025-02-05 22:43:04 - INFO - stdout - {'loss': 0.6497, 'grad_norm': 1.1598167419433594, 'learning_rate': 7.679587347433624e-06, 'epoch': 1.76} +2025-02-05 22:43:04 - ERROR - stderr - 59%|█████▊ | 13176/22434 [12:35:24<6:31:43, 2.54s/it] +2025-02-05 22:43:06 - ERROR - stderr - 59%|█████▊ | 13177/22434 [12:35:26<6:33:14, 2.55s/it] +2025-02-05 
22:43:07 - ERROR - stderr - +2025-02-05 22:43:07 - ERROR - stderr - +2025-02-05 22:43:07 - INFO - stdout - {'loss': 0.6289, 'grad_norm': 1.2443852424621582, 'learning_rate': 7.678183030700891e-06, 'epoch': 1.76} +2025-02-05 22:43:07 - ERROR - stderr - 59%|█████▊ | 13177/22434 [12:35:26<6:33:14, 2.55s/it] +2025-02-05 22:43:09 - ERROR - stderr - 59%|█████▊ | 13178/22434 [12:35:29<6:42:02, 2.61s/it] +2025-02-05 22:43:09 - ERROR - stderr - +2025-02-05 22:43:09 - ERROR - stderr - +2025-02-05 22:43:09 - INFO - stdout - {'loss': 0.5957, 'grad_norm': 1.136084794998169, 'learning_rate': 7.676778762364214e-06, 'epoch': 1.76} +2025-02-05 22:43:09 - ERROR - stderr - 59%|█████▊ | 13178/22434 [12:35:29<6:42:02, 2.61s/it] +2025-02-05 22:43:12 - ERROR - stderr - 59%|█████▊ | 13179/22434 [12:35:32<6:41:29, 2.60s/it] +2025-02-05 22:43:12 - ERROR - stderr - +2025-02-05 22:43:12 - ERROR - stderr - +2025-02-05 22:43:12 - INFO - stdout - {'loss': 0.6616, 'grad_norm': 1.2369897365570068, 'learning_rate': 7.675374542452856e-06, 'epoch': 1.76} +2025-02-05 22:43:12 - ERROR - stderr - 59%|█████▊ | 13179/22434 [12:35:32<6:41:29, 2.60s/it] +2025-02-05 22:43:14 - ERROR - stderr - 59%|█████▉ | 13180/22434 [12:35:34<6:38:10, 2.58s/it] +2025-02-05 22:43:14 - ERROR - stderr - +2025-02-05 22:43:14 - ERROR - stderr - +2025-02-05 22:43:14 - INFO - stdout - {'loss': 0.7138, 'grad_norm': 1.3235372304916382, 'learning_rate': 7.673970370996095e-06, 'epoch': 1.76} +2025-02-05 22:43:14 - ERROR - stderr - 59%|█████▉ | 13180/22434 [12:35:34<6:38:10, 2.58s/it] +2025-02-05 22:43:17 - ERROR - stderr - 59%|█████▉ | 13181/22434 [12:35:37<6:34:19, 2.56s/it] +2025-02-05 22:43:17 - ERROR - stderr - +2025-02-05 22:43:17 - ERROR - stderr - +2025-02-05 22:43:17 - INFO - stdout - {'loss': 0.7086, 'grad_norm': 1.0980262756347656, 'learning_rate': 7.672566248023192e-06, 'epoch': 1.76} +2025-02-05 22:43:17 - ERROR - stderr - 59%|█████▉ | 13181/22434 [12:35:37<6:34:19, 2.56s/it] +2025-02-05 22:43:19 - ERROR - stderr - 
59%|█████▉ | 13182/22434 [12:35:39<6:35:59, 2.57s/it] +2025-02-05 22:43:19 - ERROR - stderr - +2025-02-05 22:43:19 - ERROR - stderr - +2025-02-05 22:43:19 - INFO - stdout - {'loss': 0.6782, 'grad_norm': 1.1994779109954834, 'learning_rate': 7.67116217356342e-06, 'epoch': 1.76} +2025-02-05 22:43:19 - ERROR - stderr - 59%|█████▉ | 13182/22434 [12:35:39<6:35:59, 2.57s/it] +2025-02-05 22:43:22 - ERROR - stderr - 59%|█████▉ | 13183/22434 [12:35:42<6:35:47, 2.57s/it] +2025-02-05 22:43:22 - ERROR - stderr - +2025-02-05 22:43:22 - ERROR - stderr - +2025-02-05 22:43:22 - INFO - stdout - {'loss': 0.6416, 'grad_norm': 1.1428031921386719, 'learning_rate': 7.669758147646046e-06, 'epoch': 1.76} +2025-02-05 22:43:22 - ERROR - stderr - 59%|█████▉ | 13183/22434 [12:35:42<6:35:47, 2.57s/it] +2025-02-05 22:43:24 - ERROR - stderr - 59%|█████▉ | 13184/22434 [12:35:44<6:32:24, 2.55s/it] +2025-02-05 22:43:25 - ERROR - stderr - +2025-02-05 22:43:25 - ERROR - stderr - +2025-02-05 22:43:25 - INFO - stdout - {'loss': 0.7097, 'grad_norm': 1.2394593954086304, 'learning_rate': 7.668354170300331e-06, 'epoch': 1.76} +2025-02-05 22:43:25 - ERROR - stderr - 59%|█████▉ | 13184/22434 [12:35:44<6:32:24, 2.55s/it] +2025-02-05 22:43:27 - ERROR - stderr - 59%|█████▉ | 13185/22434 [12:35:47<6:29:17, 2.53s/it] +2025-02-05 22:43:27 - ERROR - stderr - +2025-02-05 22:43:27 - ERROR - stderr - +2025-02-05 22:43:27 - INFO - stdout - {'loss': 0.7195, 'grad_norm': 1.2066439390182495, 'learning_rate': 7.666950241555546e-06, 'epoch': 1.76} +2025-02-05 22:43:27 - ERROR - stderr - 59%|█████▉ | 13185/22434 [12:35:47<6:29:17, 2.53s/it] +2025-02-05 22:43:29 - ERROR - stderr - 59%|█████▉ | 13186/22434 [12:35:49<6:29:02, 2.52s/it] +2025-02-05 22:43:30 - ERROR - stderr - +2025-02-05 22:43:30 - ERROR - stderr - +2025-02-05 22:43:30 - INFO - stdout - {'loss': 0.7129, 'grad_norm': 1.2288256883621216, 'learning_rate': 7.66554636144095e-06, 'epoch': 1.76} +2025-02-05 22:43:30 - ERROR - stderr - 59%|█████▉ | 13186/22434 
[12:35:49<6:29:02, 2.52s/it] +2025-02-05 22:43:32 - ERROR - stderr - 59%|█████▉ | 13187/22434 [12:35:52<6:32:34, 2.55s/it] +2025-02-05 22:43:32 - ERROR - stderr - +2025-02-05 22:43:32 - ERROR - stderr - +2025-02-05 22:43:32 - INFO - stdout - {'loss': 0.7511, 'grad_norm': 1.24745512008667, 'learning_rate': 7.664142529985801e-06, 'epoch': 1.76} +2025-02-05 22:43:32 - ERROR - stderr - 59%|█████▉ | 13187/22434 [12:35:52<6:32:34, 2.55s/it] +2025-02-05 22:43:35 - ERROR - stderr - 59%|█████▉ | 13188/22434 [12:35:54<6:29:33, 2.53s/it] +2025-02-05 22:43:35 - ERROR - stderr - +2025-02-05 22:43:35 - ERROR - stderr - +2025-02-05 22:43:35 - INFO - stdout - {'loss': 0.6757, 'grad_norm': 1.3364877700805664, 'learning_rate': 7.66273874721937e-06, 'epoch': 1.76} +2025-02-05 22:43:35 - ERROR - stderr - 59%|█████▉ | 13188/22434 [12:35:54<6:29:33, 2.53s/it] +2025-02-05 22:43:37 - ERROR - stderr - 59%|█████▉ | 13189/22434 [12:35:57<6:29:09, 2.53s/it] +2025-02-05 22:43:37 - ERROR - stderr - +2025-02-05 22:43:37 - ERROR - stderr - +2025-02-05 22:43:37 - INFO - stdout - {'loss': 0.6176, 'grad_norm': 1.1447926759719849, 'learning_rate': 7.661335013170911e-06, 'epoch': 1.76} +2025-02-05 22:43:37 - ERROR - stderr - 59%|█████▉ | 13189/22434 [12:35:57<6:29:09, 2.53s/it] +2025-02-05 22:43:40 - ERROR - stderr - 59%|█████▉ | 13190/22434 [12:35:59<6:26:14, 2.51s/it] +2025-02-05 22:43:40 - ERROR - stderr - +2025-02-05 22:43:40 - ERROR - stderr - +2025-02-05 22:43:40 - INFO - stdout - {'loss': 0.7468, 'grad_norm': 1.3012038469314575, 'learning_rate': 7.659931327869688e-06, 'epoch': 1.76} +2025-02-05 22:43:40 - ERROR - stderr - 59%|█████▉ | 13190/22434 [12:35:59<6:26:14, 2.51s/it] +2025-02-05 22:43:42 - ERROR - stderr - 59%|█████▉ | 13191/22434 [12:36:02<6:28:07, 2.52s/it] +2025-02-05 22:43:42 - ERROR - stderr - +2025-02-05 22:43:42 - ERROR - stderr - +2025-02-05 22:43:42 - INFO - stdout - {'loss': 0.6623, 'grad_norm': 1.2025270462036133, 'learning_rate': 7.65852769134496e-06, 'epoch': 1.76} 
+2025-02-05 22:43:42 - ERROR - stderr - 59%|█████▉ | 13191/22434 [12:36:02<6:28:07, 2.52s/it] +2025-02-05 22:43:45 - ERROR - stderr - 59%|█████▉ | 13192/22434 [12:36:04<6:29:12, 2.53s/it] +2025-02-05 22:43:45 - ERROR - stderr - +2025-02-05 22:43:45 - ERROR - stderr - +2025-02-05 22:43:45 - INFO - stdout - {'loss': 0.7051, 'grad_norm': 1.2923822402954102, 'learning_rate': 7.657124103625974e-06, 'epoch': 1.76} +2025-02-05 22:43:45 - ERROR - stderr - 59%|█████▉ | 13192/22434 [12:36:04<6:29:12, 2.53s/it] +2025-02-05 22:43:47 - ERROR - stderr - 59%|█████▉ | 13193/22434 [12:36:07<6:28:52, 2.52s/it] +2025-02-05 22:43:47 - ERROR - stderr - +2025-02-05 22:43:47 - ERROR - stderr - +2025-02-05 22:43:47 - INFO - stdout - {'loss': 0.654, 'grad_norm': 1.225016474723816, 'learning_rate': 7.655720564742002e-06, 'epoch': 1.76} +2025-02-05 22:43:47 - ERROR - stderr - 59%|█████▉ | 13193/22434 [12:36:07<6:28:52, 2.52s/it] +2025-02-05 22:43:50 - ERROR - stderr - 59%|█████▉ | 13194/22434 [12:36:09<6:30:30, 2.54s/it] +2025-02-05 22:43:50 - ERROR - stderr - +2025-02-05 22:43:50 - ERROR - stderr - +2025-02-05 22:43:50 - INFO - stdout - {'loss': 0.6994, 'grad_norm': 1.2559335231781006, 'learning_rate': 7.654317074722287e-06, 'epoch': 1.76} +2025-02-05 22:43:50 - ERROR - stderr - 59%|█████▉ | 13194/22434 [12:36:10<6:30:30, 2.54s/it] +2025-02-05 22:43:52 - ERROR - stderr - 59%|█████▉ | 13195/22434 [12:36:12<6:27:32, 2.52s/it] +2025-02-05 22:43:52 - ERROR - stderr - +2025-02-05 22:43:52 - ERROR - stderr - +2025-02-05 22:43:52 - INFO - stdout - {'loss': 0.7013, 'grad_norm': 1.2556838989257812, 'learning_rate': 7.652913633596087e-06, 'epoch': 1.76} +2025-02-05 22:43:52 - ERROR - stderr - 59%|█████▉ | 13195/22434 [12:36:12<6:27:32, 2.52s/it] +2025-02-05 22:43:55 - ERROR - stderr - 59%|█████▉ | 13196/22434 [12:36:14<6:26:31, 2.51s/it] +2025-02-05 22:43:55 - ERROR - stderr - +2025-02-05 22:43:55 - ERROR - stderr - +2025-02-05 22:43:55 - INFO - stdout - {'loss': 0.7382, 'grad_norm': 
1.3198646306991577, 'learning_rate': 7.65151024139266e-06, 'epoch': 1.76} +2025-02-05 22:43:55 - ERROR - stderr - 59%|█████▉ | 13196/22434 [12:36:15<6:26:31, 2.51s/it] +2025-02-05 22:43:57 - ERROR - stderr - 59%|█████▉ | 13197/22434 [12:36:17<6:34:27, 2.56s/it] +2025-02-05 22:43:57 - ERROR - stderr - +2025-02-05 22:43:57 - ERROR - stderr - +2025-02-05 22:43:57 - INFO - stdout - {'loss': 0.6519, 'grad_norm': 1.3013070821762085, 'learning_rate': 7.650106898141251e-06, 'epoch': 1.76} +2025-02-05 22:43:57 - ERROR - stderr - 59%|█████▉ | 13197/22434 [12:36:17<6:34:27, 2.56s/it] +2025-02-05 22:44:00 - ERROR - stderr - 59%|█████▉ | 13198/22434 [12:36:20<6:29:34, 2.53s/it] +2025-02-05 22:44:00 - ERROR - stderr - +2025-02-05 22:44:00 - ERROR - stderr - +2025-02-05 22:44:00 - INFO - stdout - {'loss': 0.696, 'grad_norm': 1.199157953262329, 'learning_rate': 7.64870360387112e-06, 'epoch': 1.76} +2025-02-05 22:44:00 - ERROR - stderr - 59%|█████▉ | 13198/22434 [12:36:20<6:29:34, 2.53s/it] +2025-02-05 22:44:02 - ERROR - stderr - 59%|█████▉ | 13199/22434 [12:36:22<6:28:56, 2.53s/it] +2025-02-05 22:44:02 - ERROR - stderr - +2025-02-05 22:44:02 - ERROR - stderr - +2025-02-05 22:44:02 - INFO - stdout - {'loss': 0.699, 'grad_norm': 1.1832815408706665, 'learning_rate': 7.64730035861151e-06, 'epoch': 1.77} +2025-02-05 22:44:02 - ERROR - stderr - 59%|█████▉ | 13199/22434 [12:36:22<6:28:56, 2.53s/it] +2025-02-05 22:44:05 - ERROR - stderr - 59%|█████▉ | 13200/22434 [12:36:25<6:25:48, 2.51s/it] +2025-02-05 22:44:05 - ERROR - stderr - +2025-02-05 22:44:05 - ERROR - stderr - +2025-02-05 22:44:05 - INFO - stdout - {'loss': 0.6569, 'grad_norm': 1.205780267715454, 'learning_rate': 7.645897162391672e-06, 'epoch': 1.77} +2025-02-05 22:44:05 - ERROR - stderr - 59%|█████▉ | 13200/22434 [12:36:25<6:25:48, 2.51s/it] +2025-02-05 22:44:07 - ERROR - stderr - 59%|█████▉ | 13201/22434 [12:36:27<6:27:57, 2.52s/it] +2025-02-05 22:44:07 - ERROR - stderr - +2025-02-05 22:44:07 - ERROR - stderr - +2025-02-05 
22:44:07 - INFO - stdout - {'loss': 0.6951, 'grad_norm': 1.1487656831741333, 'learning_rate': 7.644494015240855e-06, 'epoch': 1.77} +2025-02-05 22:44:07 - ERROR - stderr - 59%|█████▉ | 13201/22434 [12:36:27<6:27:57, 2.52s/it] +2025-02-05 22:44:10 - ERROR - stderr - 59%|█████▉ | 13202/22434 [12:36:30<6:30:27, 2.54s/it] +2025-02-05 22:44:10 - ERROR - stderr - +2025-02-05 22:44:10 - ERROR - stderr - +2025-02-05 22:44:10 - INFO - stdout - {'loss': 0.6604, 'grad_norm': 1.1796915531158447, 'learning_rate': 7.64309091718831e-06, 'epoch': 1.77} +2025-02-05 22:44:10 - ERROR - stderr - 59%|█████▉ | 13202/22434 [12:36:30<6:30:27, 2.54s/it] +2025-02-05 22:44:12 - ERROR - stderr - 59%|█████▉ | 13203/22434 [12:36:32<6:28:28, 2.53s/it] +2025-02-05 22:44:12 - ERROR - stderr - +2025-02-05 22:44:12 - ERROR - stderr - +2025-02-05 22:44:12 - INFO - stdout - {'loss': 0.6797, 'grad_norm': 1.3387131690979004, 'learning_rate': 7.641687868263274e-06, 'epoch': 1.77} +2025-02-05 22:44:12 - ERROR - stderr - 59%|█████▉ | 13203/22434 [12:36:32<6:28:28, 2.53s/it] +2025-02-05 22:44:15 - ERROR - stderr - 59%|█████▉ | 13204/22434 [12:36:35<6:32:11, 2.55s/it] +2025-02-05 22:44:15 - ERROR - stderr - +2025-02-05 22:44:15 - ERROR - stderr - +2025-02-05 22:44:15 - INFO - stdout - {'loss': 0.7035, 'grad_norm': 1.176735520362854, 'learning_rate': 7.640284868495e-06, 'epoch': 1.77} +2025-02-05 22:44:15 - ERROR - stderr - 59%|█████▉ | 13204/22434 [12:36:35<6:32:11, 2.55s/it] +2025-02-05 22:44:18 - ERROR - stderr - 59%|█████▉ | 13205/22434 [12:36:37<6:31:03, 2.54s/it] +2025-02-05 22:44:18 - ERROR - stderr - +2025-02-05 22:44:18 - ERROR - stderr - +2025-02-05 22:44:18 - INFO - stdout - {'loss': 0.6553, 'grad_norm': 1.214046597480774, 'learning_rate': 7.638881917912729e-06, 'epoch': 1.77} +2025-02-05 22:44:18 - ERROR - stderr - 59%|█████▉ | 13205/22434 [12:36:37<6:31:03, 2.54s/it] +2025-02-05 22:44:20 - ERROR - stderr - 59%|█████▉ | 13206/22434 [12:36:40<6:28:36, 2.53s/it] +2025-02-05 22:44:20 - ERROR - stderr 
- +2025-02-05 22:44:20 - ERROR - stderr - +2025-02-05 22:44:20 - INFO - stdout - {'loss': 0.6772, 'grad_norm': 1.2525125741958618, 'learning_rate': 7.637479016545708e-06, 'epoch': 1.77} +2025-02-05 22:44:20 - ERROR - stderr - 59%|█████▉ | 13206/22434 [12:36:40<6:28:36, 2.53s/it] +2025-02-05 22:44:23 - ERROR - stderr - 59%|█████▉ | 13207/22434 [12:36:42<6:33:18, 2.56s/it] +2025-02-05 22:44:23 - ERROR - stderr - +2025-02-05 22:44:23 - ERROR - stderr - +2025-02-05 22:44:23 - INFO - stdout - {'loss': 0.6594, 'grad_norm': 1.266052484512329, 'learning_rate': 7.636076164423173e-06, 'epoch': 1.77} +2025-02-05 22:44:23 - ERROR - stderr - 59%|█████▉ | 13207/22434 [12:36:42<6:33:18, 2.56s/it] +2025-02-05 22:44:25 - ERROR - stderr - 59%|█████▉ | 13208/22434 [12:36:45<6:33:49, 2.56s/it] +2025-02-05 22:44:25 - ERROR - stderr - +2025-02-05 22:44:25 - ERROR - stderr - +2025-02-05 22:44:25 - INFO - stdout - {'loss': 0.7072, 'grad_norm': 1.1831791400909424, 'learning_rate': 7.63467336157437e-06, 'epoch': 1.77} +2025-02-05 22:44:25 - ERROR - stderr - 59%|█████▉ | 13208/22434 [12:36:45<6:33:49, 2.56s/it] +2025-02-05 22:44:28 - ERROR - stderr - 59%|█████▉ | 13209/22434 [12:36:48<6:31:18, 2.55s/it] +2025-02-05 22:44:28 - ERROR - stderr - +2025-02-05 22:44:28 - ERROR - stderr - +2025-02-05 22:44:28 - INFO - stdout - {'loss': 0.6963, 'grad_norm': 1.2331364154815674, 'learning_rate': 7.633270608028537e-06, 'epoch': 1.77} +2025-02-05 22:44:28 - ERROR - stderr - 59%|█████▉ | 13209/22434 [12:36:48<6:31:18, 2.55s/it] +2025-02-05 22:44:30 - ERROR - stderr - 59%|█████▉ | 13210/22434 [12:36:50<6:34:48, 2.57s/it] +2025-02-05 22:44:30 - ERROR - stderr - +2025-02-05 22:44:30 - ERROR - stderr - +2025-02-05 22:44:30 - INFO - stdout - {'loss': 0.7936, 'grad_norm': 1.2755190134048462, 'learning_rate': 7.631867903814916e-06, 'epoch': 1.77} +2025-02-05 22:44:30 - ERROR - stderr - 59%|█████▉ | 13210/22434 [12:36:50<6:34:48, 2.57s/it] +2025-02-05 22:44:33 - ERROR - stderr - 59%|█████▉ | 13211/22434 
[12:36:53<6:35:00, 2.57s/it] +2025-02-05 22:44:33 - ERROR - stderr - +2025-02-05 22:44:33 - ERROR - stderr - +2025-02-05 22:44:33 - INFO - stdout - {'loss': 0.6856, 'grad_norm': 1.3306457996368408, 'learning_rate': 7.630465248962738e-06, 'epoch': 1.77} +2025-02-05 22:44:33 - ERROR - stderr - 59%|█████▉ | 13211/22434 [12:36:53<6:35:00, 2.57s/it] +2025-02-05 22:44:35 - ERROR - stderr - 59%|█████▉ | 13212/22434 [12:36:55<6:33:48, 2.56s/it] +2025-02-05 22:44:36 - ERROR - stderr - +2025-02-05 22:44:36 - ERROR - stderr - +2025-02-05 22:44:36 - INFO - stdout - {'loss': 0.7159, 'grad_norm': 1.2203032970428467, 'learning_rate': 7.629062643501248e-06, 'epoch': 1.77} +2025-02-05 22:44:36 - ERROR - stderr - 59%|█████▉ | 13212/22434 [12:36:55<6:33:48, 2.56s/it] +2025-02-05 22:44:38 - ERROR - stderr - 59%|█████▉ | 13213/22434 [12:36:58<6:32:11, 2.55s/it] +2025-02-05 22:44:38 - ERROR - stderr - +2025-02-05 22:44:38 - ERROR - stderr - +2025-02-05 22:44:38 - INFO - stdout - {'loss': 0.6887, 'grad_norm': 1.1315717697143555, 'learning_rate': 7.627660087459674e-06, 'epoch': 1.77} +2025-02-05 22:44:38 - ERROR - stderr - 59%|█████▉ | 13213/22434 [12:36:58<6:32:11, 2.55s/it] +2025-02-05 22:44:41 - ERROR - stderr - 59%|█████▉ | 13214/22434 [12:37:00<6:31:26, 2.55s/it] +2025-02-05 22:44:41 - ERROR - stderr - +2025-02-05 22:44:41 - ERROR - stderr - +2025-02-05 22:44:41 - INFO - stdout - {'loss': 0.77, 'grad_norm': 1.2911678552627563, 'learning_rate': 7.6262575808672576e-06, 'epoch': 1.77} +2025-02-05 22:44:41 - ERROR - stderr - 59%|█████▉ | 13214/22434 [12:37:00<6:31:26, 2.55s/it] +2025-02-05 22:44:43 - ERROR - stderr - 59%|█████▉ | 13215/22434 [12:37:03<6:31:59, 2.55s/it] +2025-02-05 22:44:43 - ERROR - stderr - +2025-02-05 22:44:43 - ERROR - stderr - +2025-02-05 22:44:43 - INFO - stdout - {'loss': 0.6755, 'grad_norm': 1.2581003904342651, 'learning_rate': 7.624855123753235e-06, 'epoch': 1.77} +2025-02-05 22:44:43 - ERROR - stderr - 59%|█████▉ | 13215/22434 [12:37:03<6:31:59, 2.55s/it] 
+2025-02-05 22:44:46 - ERROR - stderr - 59%|█████▉ | 13216/22434 [12:37:05<6:28:49, 2.53s/it] +2025-02-05 22:44:46 - ERROR - stderr - +2025-02-05 22:44:46 - ERROR - stderr - +2025-02-05 22:44:46 - INFO - stdout - {'loss': 0.6794, 'grad_norm': 1.2090908288955688, 'learning_rate': 7.623452716146827e-06, 'epoch': 1.77} +2025-02-05 22:44:46 - ERROR - stderr - 59%|█████▉ | 13216/22434 [12:37:05<6:28:49, 2.53s/it] +2025-02-05 22:44:48 - ERROR - stderr - 59%|█████▉ | 13217/22434 [12:37:08<6:24:42, 2.50s/it] +2025-02-05 22:44:48 - ERROR - stderr - +2025-02-05 22:44:48 - ERROR - stderr - +2025-02-05 22:44:48 - INFO - stdout - {'loss': 0.7918, 'grad_norm': 1.5002691745758057, 'learning_rate': 7.62205035807728e-06, 'epoch': 1.77} +2025-02-05 22:44:48 - ERROR - stderr - 59%|█████▉ | 13217/22434 [12:37:08<6:24:42, 2.50s/it] +2025-02-05 22:44:51 - ERROR - stderr - 59%|█████▉ | 13218/22434 [12:37:10<6:23:01, 2.49s/it] +2025-02-05 22:44:51 - ERROR - stderr - +2025-02-05 22:44:51 - ERROR - stderr - +2025-02-05 22:44:51 - INFO - stdout - {'loss': 0.6163, 'grad_norm': 1.2874877452850342, 'learning_rate': 7.620648049573815e-06, 'epoch': 1.77} +2025-02-05 22:44:51 - ERROR - stderr - 59%|█████▉ | 13218/22434 [12:37:10<6:23:01, 2.49s/it] +2025-02-05 22:44:53 - ERROR - stderr - 59%|█████▉ | 13219/22434 [12:37:13<6:20:34, 2.48s/it] +2025-02-05 22:44:53 - ERROR - stderr - +2025-02-05 22:44:53 - ERROR - stderr - +2025-02-05 22:44:53 - INFO - stdout - {'loss': 0.6791, 'grad_norm': 1.1568379402160645, 'learning_rate': 7.619245790665662e-06, 'epoch': 1.77} +2025-02-05 22:44:53 - ERROR - stderr - 59%|█████▉ | 13219/22434 [12:37:13<6:20:34, 2.48s/it] +2025-02-05 22:44:55 - ERROR - stderr - 59%|█████▉ | 13220/22434 [12:37:15<6:18:28, 2.46s/it] +2025-02-05 22:44:55 - ERROR - stderr - +2025-02-05 22:44:55 - ERROR - stderr - +2025-02-05 22:44:55 - INFO - stdout - {'loss': 0.6064, 'grad_norm': 1.1005451679229736, 'learning_rate': 7.617843581382055e-06, 'epoch': 1.77} +2025-02-05 22:44:55 - ERROR - 
stderr - 59%|█████▉ | 13220/22434 [12:37:15<6:18:28, 2.46s/it] +2025-02-05 22:44:58 - ERROR - stderr - 59%|█████▉ | 13221/22434 [12:37:18<6:17:49, 2.46s/it] +2025-02-05 22:44:58 - ERROR - stderr - +2025-02-05 22:44:58 - ERROR - stderr - +2025-02-05 22:44:58 - INFO - stdout - {'loss': 0.6304, 'grad_norm': 1.1477329730987549, 'learning_rate': 7.6164414217522185e-06, 'epoch': 1.77} +2025-02-05 22:44:58 - ERROR - stderr - 59%|█████▉ | 13221/22434 [12:37:18<6:17:49, 2.46s/it] +2025-02-05 22:45:00 - ERROR - stderr - 59%|█████▉ | 13222/22434 [12:37:20<6:18:52, 2.47s/it] +2025-02-05 22:45:00 - ERROR - stderr - +2025-02-05 22:45:00 - ERROR - stderr - +2025-02-05 22:45:00 - INFO - stdout - {'loss': 0.6991, 'grad_norm': 1.4061325788497925, 'learning_rate': 7.61503931180538e-06, 'epoch': 1.77} +2025-02-05 22:45:00 - ERROR - stderr - 59%|█████▉ | 13222/22434 [12:37:20<6:18:52, 2.47s/it] +2025-02-05 22:45:03 - ERROR - stderr - 59%|█████▉ | 13223/22434 [12:37:23<6:21:07, 2.48s/it] +2025-02-05 22:45:03 - ERROR - stderr - +2025-02-05 22:45:03 - ERROR - stderr - +2025-02-05 22:45:03 - INFO - stdout - {'loss': 0.7343, 'grad_norm': 1.2457906007766724, 'learning_rate': 7.613637251570767e-06, 'epoch': 1.77} +2025-02-05 22:45:03 - ERROR - stderr - 59%|█████▉ | 13223/22434 [12:37:23<6:21:07, 2.48s/it] +2025-02-05 22:45:05 - ERROR - stderr - 59%|█████▉ | 13224/22434 [12:37:25<6:21:53, 2.49s/it] +2025-02-05 22:45:05 - ERROR - stderr - +2025-02-05 22:45:05 - ERROR - stderr - +2025-02-05 22:45:05 - INFO - stdout - {'loss': 0.7036, 'grad_norm': 1.2702500820159912, 'learning_rate': 7.612235241077597e-06, 'epoch': 1.77} +2025-02-05 22:45:05 - ERROR - stderr - 59%|█████▉ | 13224/22434 [12:37:25<6:21:53, 2.49s/it] +2025-02-05 22:45:08 - ERROR - stderr - 59%|█████▉ | 13225/22434 [12:37:28<6:25:55, 2.51s/it] +2025-02-05 22:45:08 - ERROR - stderr - +2025-02-05 22:45:08 - ERROR - stderr - +2025-02-05 22:45:08 - INFO - stdout - {'loss': 0.6597, 'grad_norm': 1.2025442123413086, 'learning_rate': 
7.610833280355103e-06, 'epoch': 1.77} +2025-02-05 22:45:08 - ERROR - stderr - 59%|█████▉ | 13225/22434 [12:37:28<6:25:55, 2.51s/it] +2025-02-05 22:45:10 - ERROR - stderr - 59%|█████▉ | 13226/22434 [12:37:30<6:27:09, 2.52s/it] +2025-02-05 22:45:11 - ERROR - stderr - +2025-02-05 22:45:11 - ERROR - stderr - +2025-02-05 22:45:11 - INFO - stdout - {'loss': 0.7255, 'grad_norm': 1.1940994262695312, 'learning_rate': 7.609431369432502e-06, 'epoch': 1.77} +2025-02-05 22:45:11 - ERROR - stderr - 59%|█████▉ | 13226/22434 [12:37:30<6:27:09, 2.52s/it] +2025-02-05 22:45:13 - ERROR - stderr - 59%|█████▉ | 13227/22434 [12:37:33<6:28:11, 2.53s/it] +2025-02-05 22:45:13 - ERROR - stderr - +2025-02-05 22:45:13 - ERROR - stderr - +2025-02-05 22:45:13 - INFO - stdout - {'loss': 0.665, 'grad_norm': 1.1808069944381714, 'learning_rate': 7.608029508339015e-06, 'epoch': 1.77} +2025-02-05 22:45:13 - ERROR - stderr - 59%|█████▉ | 13227/22434 [12:37:33<6:28:11, 2.53s/it] +2025-02-05 22:45:16 - ERROR - stderr - 59%|█████▉ | 13228/22434 [12:37:35<6:28:19, 2.53s/it] +2025-02-05 22:45:16 - ERROR - stderr - +2025-02-05 22:45:16 - ERROR - stderr - +2025-02-05 22:45:16 - INFO - stdout - {'loss': 0.6795, 'grad_norm': 1.3171695470809937, 'learning_rate': 7.606627697103866e-06, 'epoch': 1.77} +2025-02-05 22:45:16 - ERROR - stderr - 59%|█████▉ | 13228/22434 [12:37:35<6:28:19, 2.53s/it] +2025-02-05 22:45:18 - ERROR - stderr - 59%|█████▉ | 13229/22434 [12:37:38<6:23:44, 2.50s/it] +2025-02-05 22:45:18 - ERROR - stderr - +2025-02-05 22:45:18 - ERROR - stderr - +2025-02-05 22:45:18 - INFO - stdout - {'loss': 0.637, 'grad_norm': 1.2143160104751587, 'learning_rate': 7.6052259357562685e-06, 'epoch': 1.77} +2025-02-05 22:45:18 - ERROR - stderr - 59%|█████▉ | 13229/22434 [12:37:38<6:23:44, 2.50s/it] +2025-02-05 22:45:20 - ERROR - stderr - 59%|█████▉ | 13230/22434 [12:37:40<6:23:51, 2.50s/it] +2025-02-05 22:45:21 - ERROR - stderr - +2025-02-05 22:45:21 - ERROR - stderr - +2025-02-05 22:45:21 - INFO - stdout - 
{'loss': 0.7277, 'grad_norm': 1.3234939575195312, 'learning_rate': 7.60382422432545e-06, 'epoch': 1.77} +2025-02-05 22:45:21 - ERROR - stderr - 59%|█████▉ | 13230/22434 [12:37:40<6:23:51, 2.50s/it] +2025-02-05 22:45:23 - ERROR - stderr - 59%|█████▉ | 13231/22434 [12:37:43<6:24:26, 2.51s/it] +2025-02-05 22:45:23 - ERROR - stderr - +2025-02-05 22:45:23 - ERROR - stderr - +2025-02-05 22:45:23 - INFO - stdout - {'loss': 0.6659, 'grad_norm': 1.2912805080413818, 'learning_rate': 7.602422562840622e-06, 'epoch': 1.77} +2025-02-05 22:45:23 - ERROR - stderr - 59%|█████▉ | 13231/22434 [12:37:43<6:24:26, 2.51s/it] +2025-02-05 22:45:25 - ERROR - stderr - 59%|█████▉ | 13232/22434 [12:37:45<6:23:59, 2.50s/it] +2025-02-05 22:45:26 - ERROR - stderr - +2025-02-05 22:45:26 - ERROR - stderr - +2025-02-05 22:45:26 - INFO - stdout - {'loss': 0.7327, 'grad_norm': 1.20145845413208, 'learning_rate': 7.601020951330998e-06, 'epoch': 1.77} +2025-02-05 22:45:26 - ERROR - stderr - 59%|█████▉ | 13232/22434 [12:37:45<6:23:59, 2.50s/it] +2025-02-05 22:45:28 - ERROR - stderr - 59%|█████▉ | 13233/22434 [12:37:48<6:29:21, 2.54s/it] +2025-02-05 22:45:28 - ERROR - stderr - +2025-02-05 22:45:28 - ERROR - stderr - +2025-02-05 22:45:28 - INFO - stdout - {'loss': 0.7092, 'grad_norm': 1.390238642692566, 'learning_rate': 7.599619389825799e-06, 'epoch': 1.77} +2025-02-05 22:45:28 - ERROR - stderr - 59%|█████▉ | 13233/22434 [12:37:48<6:29:21, 2.54s/it] +2025-02-05 22:45:31 - ERROR - stderr - 59%|█████▉ | 13234/22434 [12:37:50<6:30:36, 2.55s/it] +2025-02-05 22:45:31 - ERROR - stderr - +2025-02-05 22:45:31 - ERROR - stderr - +2025-02-05 22:45:31 - INFO - stdout - {'loss': 0.6519, 'grad_norm': 1.1942757368087769, 'learning_rate': 7.598217878354237e-06, 'epoch': 1.77} +2025-02-05 22:45:31 - ERROR - stderr - 59%|█████▉ | 13234/22434 [12:37:51<6:30:36, 2.55s/it] +2025-02-05 22:45:33 - ERROR - stderr - 59%|█████▉ | 13235/22434 [12:37:53<6:33:57, 2.57s/it] +2025-02-05 22:45:33 - ERROR - stderr - +2025-02-05 22:45:33 - 
ERROR - stderr - +2025-02-05 22:45:33 - INFO - stdout - {'loss': 0.6341, 'grad_norm': 1.0842225551605225, 'learning_rate': 7.596816416945523e-06, 'epoch': 1.77} +2025-02-05 22:45:33 - ERROR - stderr - 59%|█████▉ | 13235/22434 [12:37:53<6:33:57, 2.57s/it] +2025-02-05 22:45:36 - ERROR - stderr - 59%|█████▉ | 13236/22434 [12:37:56<6:35:30, 2.58s/it] +2025-02-05 22:45:36 - ERROR - stderr - +2025-02-05 22:45:36 - ERROR - stderr - +2025-02-05 22:45:36 - INFO - stdout - {'loss': 0.6408, 'grad_norm': 1.1821191310882568, 'learning_rate': 7.595415005628875e-06, 'epoch': 1.77} +2025-02-05 22:45:36 - ERROR - stderr - 59%|█████▉ | 13236/22434 [12:37:56<6:35:30, 2.58s/it] +2025-02-05 22:45:38 - ERROR - stderr - 59%|█████▉ | 13237/22434 [12:37:58<6:30:24, 2.55s/it] +2025-02-05 22:45:38 - ERROR - stderr - +2025-02-05 22:45:38 - ERROR - stderr - +2025-02-05 22:45:38 - INFO - stdout - {'loss': 0.6512, 'grad_norm': 1.2233281135559082, 'learning_rate': 7.594013644433496e-06, 'epoch': 1.77} +2025-02-05 22:45:38 - ERROR - stderr - 59%|█████▉ | 13237/22434 [12:37:58<6:30:24, 2.55s/it] +2025-02-05 22:45:41 - ERROR - stderr - 59%|█████▉ | 13238/22434 [12:38:01<6:28:54, 2.54s/it] +2025-02-05 22:45:41 - ERROR - stderr - +2025-02-05 22:45:41 - ERROR - stderr - +2025-02-05 22:45:41 - INFO - stdout - {'loss': 0.6324, 'grad_norm': 1.1893068552017212, 'learning_rate': 7.592612333388604e-06, 'epoch': 1.77} +2025-02-05 22:45:41 - ERROR - stderr - 59%|█████▉ | 13238/22434 [12:38:01<6:28:54, 2.54s/it] +2025-02-05 22:45:43 - ERROR - stderr - 59%|█████▉ | 13239/22434 [12:38:03<6:27:21, 2.53s/it] +2025-02-05 22:45:43 - ERROR - stderr - +2025-02-05 22:45:43 - ERROR - stderr - +2025-02-05 22:45:43 - INFO - stdout - {'loss': 0.6713, 'grad_norm': 1.3438538312911987, 'learning_rate': 7.591211072523403e-06, 'epoch': 1.77} +2025-02-05 22:45:43 - ERROR - stderr - 59%|█████▉ | 13239/22434 [12:38:03<6:27:21, 2.53s/it] +2025-02-05 22:45:46 - ERROR - stderr - 59%|█████▉ | 13240/22434 [12:38:06<6:23:30, 2.50s/it] 
+2025-02-05 22:45:46 - ERROR - stderr - +2025-02-05 22:45:46 - ERROR - stderr - +2025-02-05 22:45:46 - INFO - stdout - {'loss': 0.6585, 'grad_norm': 1.2872432470321655, 'learning_rate': 7.5898098618671015e-06, 'epoch': 1.77} +2025-02-05 22:45:46 - ERROR - stderr - 59%|█████▉ | 13240/22434 [12:38:06<6:23:30, 2.50s/it] +2025-02-05 22:45:48 - ERROR - stderr - 59%|█████▉ | 13241/22434 [12:38:08<6:22:16, 2.49s/it] +2025-02-05 22:45:48 - ERROR - stderr - +2025-02-05 22:45:48 - ERROR - stderr - +2025-02-05 22:45:48 - INFO - stdout - {'loss': 0.6194, 'grad_norm': 1.2955158948898315, 'learning_rate': 7.5884087014489065e-06, 'epoch': 1.77} +2025-02-05 22:45:48 - ERROR - stderr - 59%|█████▉ | 13241/22434 [12:38:08<6:22:16, 2.49s/it] +2025-02-05 22:45:51 - ERROR - stderr - 59%|█████▉ | 13242/22434 [12:38:11<6:22:19, 2.50s/it] +2025-02-05 22:45:51 - ERROR - stderr - +2025-02-05 22:45:51 - ERROR - stderr - +2025-02-05 22:45:51 - INFO - stdout - {'loss': 0.6695, 'grad_norm': 1.2175147533416748, 'learning_rate': 7.587007591298028e-06, 'epoch': 1.77} +2025-02-05 22:45:51 - ERROR - stderr - 59%|█████▉ | 13242/22434 [12:38:11<6:22:19, 2.50s/it] +2025-02-05 22:45:53 - ERROR - stderr - 59%|█████▉ | 13243/22434 [12:38:13<6:24:19, 2.51s/it] +2025-02-05 22:45:53 - ERROR - stderr - +2025-02-05 22:45:53 - ERROR - stderr - +2025-02-05 22:45:53 - INFO - stdout - {'loss': 0.7935, 'grad_norm': 1.4306607246398926, 'learning_rate': 7.585606531443662e-06, 'epoch': 1.77} +2025-02-05 22:45:53 - ERROR - stderr - 59%|█████▉ | 13243/22434 [12:38:13<6:24:19, 2.51s/it] +2025-02-05 22:45:56 - ERROR - stderr - 59%|█████▉ | 13244/22434 [12:38:16<6:23:53, 2.51s/it] +2025-02-05 22:45:56 - ERROR - stderr - +2025-02-05 22:45:56 - ERROR - stderr - +2025-02-05 22:45:56 - INFO - stdout - {'loss': 0.6189, 'grad_norm': 1.1454448699951172, 'learning_rate': 7.584205521915023e-06, 'epoch': 1.77} +2025-02-05 22:45:56 - ERROR - stderr - 59%|█████▉ | 13244/22434 [12:38:16<6:23:53, 2.51s/it] +2025-02-05 22:45:58 - ERROR - 
stderr - 59%|█████▉ | 13245/22434 [12:38:18<6:29:25, 2.54s/it] +2025-02-05 22:45:59 - ERROR - stderr - +2025-02-05 22:45:59 - ERROR - stderr - +2025-02-05 22:45:59 - INFO - stdout - {'loss': 0.727, 'grad_norm': 1.2536683082580566, 'learning_rate': 7.582804562741303e-06, 'epoch': 1.77} +2025-02-05 22:45:59 - ERROR - stderr - 59%|█████▉ | 13245/22434 [12:38:18<6:29:25, 2.54s/it] +2025-02-05 22:46:01 - ERROR - stderr - 59%|█████▉ | 13246/22434 [12:38:21<6:26:57, 2.53s/it] +2025-02-05 22:46:01 - ERROR - stderr - +2025-02-05 22:46:01 - ERROR - stderr - +2025-02-05 22:46:01 - INFO - stdout - {'loss': 0.6892, 'grad_norm': 1.2530522346496582, 'learning_rate': 7.581403653951711e-06, 'epoch': 1.77} +2025-02-05 22:46:01 - ERROR - stderr - 59%|█████▉ | 13246/22434 [12:38:21<6:26:57, 2.53s/it] +2025-02-05 22:46:04 - ERROR - stderr - 59%|█████▉ | 13247/22434 [12:38:23<6:29:58, 2.55s/it] +2025-02-05 22:46:04 - ERROR - stderr - +2025-02-05 22:46:04 - ERROR - stderr - +2025-02-05 22:46:04 - INFO - stdout - {'loss': 0.6352, 'grad_norm': 1.1325064897537231, 'learning_rate': 7.5800027955754474e-06, 'epoch': 1.77} +2025-02-05 22:46:04 - ERROR - stderr - 59%|█████▉ | 13247/22434 [12:38:23<6:29:58, 2.55s/it] +2025-02-05 22:46:06 - ERROR - stderr - 59%|█████▉ | 13248/22434 [12:38:26<6:28:20, 2.54s/it] +2025-02-05 22:46:06 - ERROR - stderr - +2025-02-05 22:46:06 - ERROR - stderr - +2025-02-05 22:46:06 - INFO - stdout - {'loss': 0.7073, 'grad_norm': 1.40010666847229, 'learning_rate': 7.578601987641706e-06, 'epoch': 1.77} +2025-02-05 22:46:06 - ERROR - stderr - 59%|█████▉ | 13248/22434 [12:38:26<6:28:20, 2.54s/it] +2025-02-05 22:46:09 - ERROR - stderr - 59%|█████▉ | 13249/22434 [12:38:28<6:27:27, 2.53s/it] +2025-02-05 22:46:09 - ERROR - stderr - +2025-02-05 22:46:09 - ERROR - stderr - +2025-02-05 22:46:09 - INFO - stdout - {'loss': 0.8132, 'grad_norm': 1.4395222663879395, 'learning_rate': 7.5772012301796935e-06, 'epoch': 1.77} +2025-02-05 22:46:09 - ERROR - stderr - 59%|█████▉ | 13249/22434 
[12:38:28<6:27:27, 2.53s/it] +2025-02-05 22:46:11 - ERROR - stderr - 59%|█████▉ | 13250/22434 [12:38:31<6:28:30, 2.54s/it] +2025-02-05 22:46:11 - ERROR - stderr - +2025-02-05 22:46:11 - ERROR - stderr - +2025-02-05 22:46:11 - INFO - stdout - {'loss': 0.6525, 'grad_norm': 1.0818517208099365, 'learning_rate': 7.575800523218603e-06, 'epoch': 1.77} +2025-02-05 22:46:11 - ERROR - stderr - 59%|█████▉ | 13250/22434 [12:38:31<6:28:30, 2.54s/it] +2025-02-05 22:46:14 - ERROR - stderr - 59%|█████▉ | 13251/22434 [12:38:33<6:27:03, 2.53s/it] +2025-02-05 22:46:14 - ERROR - stderr - +2025-02-05 22:46:14 - ERROR - stderr - +2025-02-05 22:46:14 - INFO - stdout - {'loss': 0.6715, 'grad_norm': 1.181702971458435, 'learning_rate': 7.574399866787626e-06, 'epoch': 1.77} +2025-02-05 22:46:14 - ERROR - stderr - 59%|█████▉ | 13251/22434 [12:38:33<6:27:03, 2.53s/it] +2025-02-05 22:46:16 - ERROR - stderr - 59%|█████▉ | 13252/22434 [12:38:36<6:25:30, 2.52s/it] +2025-02-05 22:46:16 - ERROR - stderr - +2025-02-05 22:46:16 - ERROR - stderr - +2025-02-05 22:46:16 - INFO - stdout - {'loss': 0.6178, 'grad_norm': 1.1964753866195679, 'learning_rate': 7.572999260915965e-06, 'epoch': 1.77} +2025-02-05 22:46:16 - ERROR - stderr - 59%|█████▉ | 13252/22434 [12:38:36<6:25:30, 2.52s/it] +2025-02-05 22:46:19 - ERROR - stderr - 59%|█████▉ | 13253/22434 [12:38:38<6:24:58, 2.52s/it] +2025-02-05 22:46:19 - ERROR - stderr - +2025-02-05 22:46:19 - ERROR - stderr - +2025-02-05 22:46:19 - INFO - stdout - {'loss': 0.7385, 'grad_norm': 1.2745329141616821, 'learning_rate': 7.5715987056328136e-06, 'epoch': 1.77} +2025-02-05 22:46:19 - ERROR - stderr - 59%|█████▉ | 13253/22434 [12:38:38<6:24:58, 2.52s/it] +2025-02-05 22:46:21 - ERROR - stderr - 59%|█████▉ | 13254/22434 [12:38:41<6:23:16, 2.51s/it] +2025-02-05 22:46:21 - ERROR - stderr - +2025-02-05 22:46:21 - ERROR - stderr - +2025-02-05 22:46:21 - INFO - stdout - {'loss': 0.687, 'grad_norm': 1.1141149997711182, 'learning_rate': 7.570198200967363e-06, 'epoch': 1.77} 
+2025-02-05 22:46:21 - ERROR - stderr - 59%|█████▉ | 13254/22434 [12:38:41<6:23:16, 2.51s/it] +2025-02-05 22:46:24 - ERROR - stderr - 59%|█████▉ | 13255/22434 [12:38:43<6:22:09, 2.50s/it] +2025-02-05 22:46:24 - ERROR - stderr - +2025-02-05 22:46:24 - ERROR - stderr - +2025-02-05 22:46:24 - INFO - stdout - {'loss': 0.7182, 'grad_norm': 1.3399637937545776, 'learning_rate': 7.568797746948806e-06, 'epoch': 1.77} +2025-02-05 22:46:24 - ERROR - stderr - 59%|█████▉ | 13255/22434 [12:38:43<6:22:09, 2.50s/it] +2025-02-05 22:46:26 - ERROR - stderr - 59%|█████▉ | 13256/22434 [12:38:46<6:30:25, 2.55s/it] +2025-02-05 22:46:26 - ERROR - stderr - +2025-02-05 22:46:26 - ERROR - stderr - +2025-02-05 22:46:26 - INFO - stdout - {'loss': 0.6675, 'grad_norm': 1.326000452041626, 'learning_rate': 7.567397343606331e-06, 'epoch': 1.77} +2025-02-05 22:46:26 - ERROR - stderr - 59%|█████▉ | 13256/22434 [12:38:46<6:30:25, 2.55s/it] +2025-02-05 22:46:29 - ERROR - stderr - 59%|█████▉ | 13257/22434 [12:38:49<6:28:38, 2.54s/it] +2025-02-05 22:46:29 - ERROR - stderr - +2025-02-05 22:46:29 - ERROR - stderr - +2025-02-05 22:46:29 - INFO - stdout - {'loss': 0.6022, 'grad_norm': 1.1466896533966064, 'learning_rate': 7.565996990969135e-06, 'epoch': 1.77} +2025-02-05 22:46:29 - ERROR - stderr - 59%|█████▉ | 13257/22434 [12:38:49<6:28:38, 2.54s/it] +2025-02-05 22:46:31 - ERROR - stderr - 59%|█████▉ | 13258/22434 [12:38:51<6:23:19, 2.51s/it] +2025-02-05 22:46:31 - ERROR - stderr - +2025-02-05 22:46:31 - ERROR - stderr - +2025-02-05 22:46:31 - INFO - stdout - {'loss': 0.6852, 'grad_norm': 1.4548966884613037, 'learning_rate': 7.564596689066397e-06, 'epoch': 1.77} +2025-02-05 22:46:31 - ERROR - stderr - 59%|█████▉ | 13258/22434 [12:38:51<6:23:19, 2.51s/it] +2025-02-05 22:46:34 - ERROR - stderr - 59%|█████▉ | 13259/22434 [12:38:54<6:23:50, 2.51s/it] +2025-02-05 22:46:34 - ERROR - stderr - +2025-02-05 22:46:34 - ERROR - stderr - +2025-02-05 22:46:34 - INFO - stdout - {'loss': 0.6299, 'grad_norm': 
1.1868021488189697, 'learning_rate': 7.563196437927316e-06, 'epoch': 1.77} +2025-02-05 22:46:34 - ERROR - stderr - 59%|█████▉ | 13259/22434 [12:38:54<6:23:50, 2.51s/it] +2025-02-05 22:46:36 - ERROR - stderr - 59%|█████▉ | 13260/22434 [12:38:56<6:20:20, 2.49s/it] +2025-02-05 22:46:36 - ERROR - stderr - +2025-02-05 22:46:36 - ERROR - stderr - +2025-02-05 22:46:36 - INFO - stdout - {'loss': 0.6813, 'grad_norm': 1.2207348346710205, 'learning_rate': 7.5617962375810705e-06, 'epoch': 1.77} +2025-02-05 22:46:36 - ERROR - stderr - 59%|█████▉ | 13260/22434 [12:38:56<6:20:20, 2.49s/it] +2025-02-05 22:46:39 - ERROR - stderr - 59%|█████▉ | 13261/22434 [12:38:59<6:41:55, 2.63s/it] +2025-02-05 22:46:39 - ERROR - stderr - +2025-02-05 22:46:39 - ERROR - stderr - +2025-02-05 22:46:39 - INFO - stdout - {'loss': 0.6433, 'grad_norm': 1.2169100046157837, 'learning_rate': 7.560396088056848e-06, 'epoch': 1.77} +2025-02-05 22:46:39 - ERROR - stderr - 59%|█████▉ | 13261/22434 [12:38:59<6:41:55, 2.63s/it] +2025-02-05 22:46:42 - ERROR - stderr - 59%|█████▉ | 13262/22434 [12:39:01<6:38:40, 2.61s/it] +2025-02-05 22:46:42 - ERROR - stderr - +2025-02-05 22:46:42 - ERROR - stderr - +2025-02-05 22:46:42 - INFO - stdout - {'loss': 0.7832, 'grad_norm': 1.2752684354782104, 'learning_rate': 7.558995989383839e-06, 'epoch': 1.77} +2025-02-05 22:46:42 - ERROR - stderr - 59%|█████▉ | 13262/22434 [12:39:02<6:38:40, 2.61s/it] +2025-02-05 22:46:44 - ERROR - stderr - 59%|█████▉ | 13263/22434 [12:39:04<6:38:17, 2.61s/it] +2025-02-05 22:46:44 - ERROR - stderr - +2025-02-05 22:46:44 - ERROR - stderr - +2025-02-05 22:46:44 - INFO - stdout - {'loss': 0.6141, 'grad_norm': 1.1743502616882324, 'learning_rate': 7.557595941591221e-06, 'epoch': 1.77} +2025-02-05 22:46:44 - ERROR - stderr - 59%|█████▉ | 13263/22434 [12:39:04<6:38:17, 2.61s/it] +2025-02-05 22:46:47 - ERROR - stderr - 59%|█████▉ | 13264/22434 [12:39:07<6:35:28, 2.59s/it] +2025-02-05 22:46:47 - ERROR - stderr - +2025-02-05 22:46:47 - ERROR - stderr - 
+2025-02-05 22:46:47 - INFO - stdout - {'loss': 0.7051, 'grad_norm': 1.3403269052505493, 'learning_rate': 7.556195944708176e-06, 'epoch': 1.77} +2025-02-05 22:46:47 - ERROR - stderr - 59%|█████▉ | 13264/22434 [12:39:07<6:35:28, 2.59s/it] +2025-02-05 22:46:49 - ERROR - stderr - 59%|█████▉ | 13265/22434 [12:39:09<6:29:29, 2.55s/it] +2025-02-05 22:46:49 - ERROR - stderr - +2025-02-05 22:46:49 - ERROR - stderr - +2025-02-05 22:46:49 - INFO - stdout - {'loss': 0.6317, 'grad_norm': 1.2211259603500366, 'learning_rate': 7.55479599876389e-06, 'epoch': 1.77} +2025-02-05 22:46:49 - ERROR - stderr - 59%|█████▉ | 13265/22434 [12:39:09<6:29:29, 2.55s/it] +2025-02-05 22:46:52 - ERROR - stderr - 59%|█████▉ | 13266/22434 [12:39:12<6:25:03, 2.52s/it] +2025-02-05 22:46:52 - ERROR - stderr - +2025-02-05 22:46:52 - ERROR - stderr - +2025-02-05 22:46:52 - INFO - stdout - {'loss': 0.6124, 'grad_norm': 1.239791989326477, 'learning_rate': 7.553396103787541e-06, 'epoch': 1.77} +2025-02-05 22:46:52 - ERROR - stderr - 59%|█████▉ | 13266/22434 [12:39:12<6:25:03, 2.52s/it] +2025-02-05 22:46:54 - ERROR - stderr - 59%|█████▉ | 13267/22434 [12:39:14<6:25:02, 2.52s/it] +2025-02-05 22:46:54 - ERROR - stderr - +2025-02-05 22:46:54 - ERROR - stderr - +2025-02-05 22:46:54 - INFO - stdout - {'loss': 0.6925, 'grad_norm': 1.229616641998291, 'learning_rate': 7.55199625980831e-06, 'epoch': 1.77} +2025-02-05 22:46:54 - ERROR - stderr - 59%|█████▉ | 13267/22434 [12:39:14<6:25:02, 2.52s/it] +2025-02-05 22:46:57 - ERROR - stderr - 59%|█████▉ | 13268/22434 [12:39:17<6:24:34, 2.52s/it] +2025-02-05 22:46:57 - ERROR - stderr - +2025-02-05 22:46:57 - ERROR - stderr - +2025-02-05 22:46:57 - INFO - stdout - {'loss': 0.7234, 'grad_norm': 1.1542174816131592, 'learning_rate': 7.550596466855375e-06, 'epoch': 1.77} +2025-02-05 22:46:57 - ERROR - stderr - 59%|█████▉ | 13268/22434 [12:39:17<6:24:34, 2.52s/it] +2025-02-05 22:46:59 - ERROR - stderr - 59%|█████▉ | 13269/22434 [12:39:19<6:25:17, 2.52s/it] +2025-02-05 22:46:59 - 
ERROR - stderr - +2025-02-05 22:46:59 - ERROR - stderr - +2025-02-05 22:46:59 - INFO - stdout - {'loss': 0.7413, 'grad_norm': 1.1539620161056519, 'learning_rate': 7.5491967249579105e-06, 'epoch': 1.77} +2025-02-05 22:46:59 - ERROR - stderr - 59%|█████▉ | 13269/22434 [12:39:19<6:25:17, 2.52s/it] +2025-02-05 22:47:02 - ERROR - stderr - 59%|█████▉ | 13270/22434 [12:39:22<6:25:09, 2.52s/it] +2025-02-05 22:47:02 - ERROR - stderr - +2025-02-05 22:47:02 - ERROR - stderr - +2025-02-05 22:47:02 - INFO - stdout - {'loss': 0.7174, 'grad_norm': 1.2399497032165527, 'learning_rate': 7.547797034145098e-06, 'epoch': 1.77} +2025-02-05 22:47:02 - ERROR - stderr - 59%|█████▉ | 13270/22434 [12:39:22<6:25:09, 2.52s/it] +2025-02-05 22:47:04 - ERROR - stderr - 59%|█████▉ | 13271/22434 [12:39:24<6:23:06, 2.51s/it] +2025-02-05 22:47:04 - ERROR - stderr - +2025-02-05 22:47:04 - ERROR - stderr - +2025-02-05 22:47:04 - INFO - stdout - {'loss': 0.7476, 'grad_norm': 1.2965887784957886, 'learning_rate': 7.546397394446108e-06, 'epoch': 1.77} +2025-02-05 22:47:04 - ERROR - stderr - 59%|█████▉ | 13271/22434 [12:39:24<6:23:06, 2.51s/it] +2025-02-05 22:47:07 - ERROR - stderr - 59%|█████▉ | 13272/22434 [12:39:27<6:23:13, 2.51s/it] +2025-02-05 22:47:07 - ERROR - stderr - +2025-02-05 22:47:07 - ERROR - stderr - +2025-02-05 22:47:07 - INFO - stdout - {'loss': 0.692, 'grad_norm': 1.1897649765014648, 'learning_rate': 7.5449978058901174e-06, 'epoch': 1.77} +2025-02-05 22:47:07 - ERROR - stderr - 59%|█████▉ | 13272/22434 [12:39:27<6:23:13, 2.51s/it] +2025-02-05 22:47:09 - ERROR - stderr - 59%|█████▉ | 13273/22434 [12:39:29<6:23:30, 2.51s/it] +2025-02-05 22:47:09 - ERROR - stderr - +2025-02-05 22:47:09 - ERROR - stderr - +2025-02-05 22:47:09 - INFO - stdout - {'loss': 0.6762, 'grad_norm': 1.3211344480514526, 'learning_rate': 7.543598268506297e-06, 'epoch': 1.77} +2025-02-05 22:47:09 - ERROR - stderr - 59%|█████▉ | 13273/22434 [12:39:29<6:23:30, 2.51s/it] +2025-02-05 22:47:12 - ERROR - stderr - 59%|█████▉ | 
13274/22434 [12:39:32<6:26:19, 2.53s/it] +2025-02-05 22:47:12 - ERROR - stderr - +2025-02-05 22:47:12 - ERROR - stderr - +2025-02-05 22:47:12 - INFO - stdout - {'loss': 0.6655, 'grad_norm': 1.2350395917892456, 'learning_rate': 7.542198782323819e-06, 'epoch': 1.78} +2025-02-05 22:47:12 - ERROR - stderr - 59%|█████▉ | 13274/22434 [12:39:32<6:26:19, 2.53s/it] +2025-02-05 22:47:14 - ERROR - stderr - 59%|█████▉ | 13275/22434 [12:39:34<6:22:39, 2.51s/it] +2025-02-05 22:47:14 - ERROR - stderr - +2025-02-05 22:47:14 - ERROR - stderr - +2025-02-05 22:47:14 - INFO - stdout - {'loss': 0.6655, 'grad_norm': 1.3445478677749634, 'learning_rate': 7.540799347371859e-06, 'epoch': 1.78} +2025-02-05 22:47:14 - ERROR - stderr - 59%|█████▉ | 13275/22434 [12:39:34<6:22:39, 2.51s/it] +2025-02-05 22:47:17 - ERROR - stderr - 59%|█████▉ | 13276/22434 [12:39:37<6:21:48, 2.50s/it] +2025-02-05 22:47:17 - ERROR - stderr - +2025-02-05 22:47:17 - ERROR - stderr - +2025-02-05 22:47:17 - INFO - stdout - {'loss': 0.7447, 'grad_norm': 1.3568940162658691, 'learning_rate': 7.539399963679583e-06, 'epoch': 1.78} +2025-02-05 22:47:17 - ERROR - stderr - 59%|█████▉ | 13276/22434 [12:39:37<6:21:48, 2.50s/it] +2025-02-05 22:47:19 - ERROR - stderr - 59%|█████▉ | 13277/22434 [12:39:39<6:26:13, 2.53s/it] +2025-02-05 22:47:20 - ERROR - stderr - +2025-02-05 22:47:20 - ERROR - stderr - +2025-02-05 22:47:20 - INFO - stdout - {'loss': 0.7122, 'grad_norm': 1.3517497777938843, 'learning_rate': 7.538000631276158e-06, 'epoch': 1.78} +2025-02-05 22:47:20 - ERROR - stderr - 59%|█████▉ | 13277/22434 [12:39:39<6:26:13, 2.53s/it] +2025-02-05 22:47:22 - ERROR - stderr - 59%|█████▉ | 13278/22434 [12:39:42<6:24:10, 2.52s/it] +2025-02-05 22:47:22 - ERROR - stderr - +2025-02-05 22:47:22 - ERROR - stderr - +2025-02-05 22:47:22 - INFO - stdout - {'loss': 0.699, 'grad_norm': 1.2729796171188354, 'learning_rate': 7.536601350190756e-06, 'epoch': 1.78} +2025-02-05 22:47:22 - ERROR - stderr - 59%|█████▉ | 13278/22434 [12:39:42<6:24:10, 
2.52s/it] +2025-02-05 22:47:24 - ERROR - stderr - 59%|█████▉ | 13279/22434 [12:39:44<6:22:43, 2.51s/it] +2025-02-05 22:47:25 - ERROR - stderr - +2025-02-05 22:47:25 - ERROR - stderr - +2025-02-05 22:47:25 - INFO - stdout - {'loss': 0.6439, 'grad_norm': 1.2554553747177124, 'learning_rate': 7.53520212045254e-06, 'epoch': 1.78} +2025-02-05 22:47:25 - ERROR - stderr - 59%|█████▉ | 13279/22434 [12:39:44<6:22:43, 2.51s/it] +2025-02-05 22:47:27 - ERROR - stderr - 59%|█████▉ | 13280/22434 [12:39:47<6:21:20, 2.50s/it] +2025-02-05 22:47:27 - ERROR - stderr - +2025-02-05 22:47:27 - ERROR - stderr - +2025-02-05 22:47:27 - INFO - stdout - {'loss': 0.6489, 'grad_norm': 1.1379806995391846, 'learning_rate': 7.533802942090677e-06, 'epoch': 1.78} +2025-02-05 22:47:27 - ERROR - stderr - 59%|█████▉ | 13280/22434 [12:39:47<6:21:20, 2.50s/it] +2025-02-05 22:47:29 - ERROR - stderr - 59%|█████▉ | 13281/22434 [12:39:49<6:22:37, 2.51s/it] +2025-02-05 22:47:30 - ERROR - stderr - +2025-02-05 22:47:30 - ERROR - stderr - +2025-02-05 22:47:30 - INFO - stdout - {'loss': 0.6038, 'grad_norm': 1.1808940172195435, 'learning_rate': 7.532403815134335e-06, 'epoch': 1.78} +2025-02-05 22:47:30 - ERROR - stderr - 59%|█████▉ | 13281/22434 [12:39:49<6:22:37, 2.51s/it] +2025-02-05 22:47:32 - ERROR - stderr - 59%|█████▉ | 13282/22434 [12:39:52<6:21:40, 2.50s/it] +2025-02-05 22:47:32 - ERROR - stderr - +2025-02-05 22:47:32 - ERROR - stderr - +2025-02-05 22:47:32 - INFO - stdout - {'loss': 0.7072, 'grad_norm': 1.1703616380691528, 'learning_rate': 7.531004739612668e-06, 'epoch': 1.78} +2025-02-05 22:47:32 - ERROR - stderr - 59%|█████▉ | 13282/22434 [12:39:52<6:21:40, 2.50s/it] +2025-02-05 22:47:34 - ERROR - stderr - 59%|█████▉ | 13283/22434 [12:39:54<6:20:55, 2.50s/it] +2025-02-05 22:47:34 - ERROR - stderr - +2025-02-05 22:47:34 - ERROR - stderr - +2025-02-05 22:47:34 - INFO - stdout - {'loss': 0.7723, 'grad_norm': 1.495598554611206, 'learning_rate': 7.529605715554851e-06, 'epoch': 1.78} +2025-02-05 22:47:34 - 
ERROR - stderr - 59%|█████▉ | 13283/22434 [12:39:54<6:20:55, 2.50s/it] +2025-02-05 22:47:37 - ERROR - stderr - 59%|█████▉ | 13284/22434 [12:39:57<6:19:05, 2.49s/it] +2025-02-05 22:47:37 - ERROR - stderr - +2025-02-05 22:47:37 - ERROR - stderr - +2025-02-05 22:47:37 - INFO - stdout - {'loss': 0.6442, 'grad_norm': 1.1676737070083618, 'learning_rate': 7.528206742990036e-06, 'epoch': 1.78} +2025-02-05 22:47:37 - ERROR - stderr - 59%|█████▉ | 13284/22434 [12:39:57<6:19:05, 2.49s/it] +2025-02-05 22:47:39 - ERROR - stderr - 59%|█████▉ | 13285/22434 [12:39:59<6:20:20, 2.49s/it] +2025-02-05 22:47:39 - ERROR - stderr - +2025-02-05 22:47:39 - ERROR - stderr - +2025-02-05 22:47:39 - INFO - stdout - {'loss': 0.7284, 'grad_norm': 1.411613941192627, 'learning_rate': 7.526807821947387e-06, 'epoch': 1.78} +2025-02-05 22:47:39 - ERROR - stderr - 59%|█████▉ | 13285/22434 [12:39:59<6:20:20, 2.49s/it] +2025-02-05 22:47:42 - ERROR - stderr - 59%|█████▉ | 13286/22434 [12:40:02<6:18:02, 2.48s/it] +2025-02-05 22:47:42 - ERROR - stderr - +2025-02-05 22:47:42 - ERROR - stderr - +2025-02-05 22:47:42 - INFO - stdout - {'loss': 0.7459, 'grad_norm': 1.204737663269043, 'learning_rate': 7.5254089524560614e-06, 'epoch': 1.78} +2025-02-05 22:47:42 - ERROR - stderr - 59%|█████▉ | 13286/22434 [12:40:02<6:18:02, 2.48s/it] +2025-02-05 22:47:44 - ERROR - stderr - 59%|█████▉ | 13287/22434 [12:40:04<6:19:42, 2.49s/it] +2025-02-05 22:47:44 - ERROR - stderr - +2025-02-05 22:47:44 - ERROR - stderr - +2025-02-05 22:47:44 - INFO - stdout - {'loss': 0.6776, 'grad_norm': 1.2189507484436035, 'learning_rate': 7.524010134545221e-06, 'epoch': 1.78} +2025-02-05 22:47:44 - ERROR - stderr - 59%|█████▉ | 13287/22434 [12:40:04<6:19:42, 2.49s/it] +2025-02-05 22:47:47 - ERROR - stderr - 59%|█████▉ | 13288/22434 [12:40:07<6:19:24, 2.49s/it] +2025-02-05 22:47:47 - ERROR - stderr - +2025-02-05 22:47:47 - ERROR - stderr - +2025-02-05 22:47:47 - INFO - stdout - {'loss': 0.749, 'grad_norm': 1.2145860195159912, 'learning_rate': 
7.522611368244016e-06, 'epoch': 1.78} +2025-02-05 22:47:47 - ERROR - stderr - 59%|█████▉ | 13288/22434 [12:40:07<6:19:24, 2.49s/it] +2025-02-05 22:47:49 - ERROR - stderr - 59%|█████▉ | 13289/22434 [12:40:09<6:20:05, 2.49s/it] +2025-02-05 22:47:49 - ERROR - stderr - +2025-02-05 22:47:49 - ERROR - stderr - +2025-02-05 22:47:49 - INFO - stdout - {'loss': 0.6765, 'grad_norm': 1.2140165567398071, 'learning_rate': 7.521212653581611e-06, 'epoch': 1.78} +2025-02-05 22:47:49 - ERROR - stderr - 59%|█████▉ | 13289/22434 [12:40:09<6:20:05, 2.49s/it] +2025-02-05 22:47:52 - ERROR - stderr - 59%|█████▉ | 13290/22434 [12:40:12<6:20:42, 2.50s/it] +2025-02-05 22:47:52 - ERROR - stderr - +2025-02-05 22:47:52 - ERROR - stderr - +2025-02-05 22:47:52 - INFO - stdout - {'loss': 0.6781, 'grad_norm': 1.147995114326477, 'learning_rate': 7.51981399058715e-06, 'epoch': 1.78} +2025-02-05 22:47:52 - ERROR - stderr - 59%|█████▉ | 13290/22434 [12:40:12<6:20:42, 2.50s/it] +2025-02-05 22:47:54 - ERROR - stderr - 59%|█████▉ | 13291/22434 [12:40:14<6:23:12, 2.51s/it] +2025-02-05 22:47:54 - ERROR - stderr - +2025-02-05 22:47:54 - ERROR - stderr - +2025-02-05 22:47:54 - INFO - stdout - {'loss': 0.7903, 'grad_norm': 1.2674106359481812, 'learning_rate': 7.5184153792897995e-06, 'epoch': 1.78} +2025-02-05 22:47:54 - ERROR - stderr - 59%|█████▉ | 13291/22434 [12:40:14<6:23:12, 2.51s/it] +2025-02-05 22:47:57 - ERROR - stderr - 59%|█████▉ | 13292/22434 [12:40:17<6:20:21, 2.50s/it] +2025-02-05 22:47:57 - ERROR - stderr - +2025-02-05 22:47:57 - ERROR - stderr - +2025-02-05 22:47:57 - INFO - stdout - {'loss': 0.6452, 'grad_norm': 1.1379270553588867, 'learning_rate': 7.5170168197187035e-06, 'epoch': 1.78} +2025-02-05 22:47:57 - ERROR - stderr - 59%|█████▉ | 13292/22434 [12:40:17<6:20:21, 2.50s/it] +2025-02-05 22:47:59 - ERROR - stderr - 59%|█████▉ | 13293/22434 [12:40:19<6:22:45, 2.51s/it] +2025-02-05 22:47:59 - ERROR - stderr - +2025-02-05 22:47:59 - ERROR - stderr - +2025-02-05 22:47:59 - INFO - stdout - 
{'loss': 0.6417, 'grad_norm': 1.1839420795440674, 'learning_rate': 7.515618311903012e-06, 'epoch': 1.78} +2025-02-05 22:47:59 - ERROR - stderr - 59%|█████▉ | 13293/22434 [12:40:19<6:22:45, 2.51s/it] +2025-02-05 22:48:02 - ERROR - stderr - 59%|█████▉ | 13294/22434 [12:40:22<6:21:52, 2.51s/it] +2025-02-05 22:48:02 - ERROR - stderr - +2025-02-05 22:48:02 - ERROR - stderr - +2025-02-05 22:48:02 - INFO - stdout - {'loss': 0.6789, 'grad_norm': 1.178890585899353, 'learning_rate': 7.514219855871886e-06, 'epoch': 1.78} +2025-02-05 22:48:02 - ERROR - stderr - 59%|█████▉ | 13294/22434 [12:40:22<6:21:52, 2.51s/it] +2025-02-05 22:48:04 - ERROR - stderr - 59%|█████▉ | 13295/22434 [12:40:24<6:21:05, 2.50s/it] +2025-02-05 22:48:04 - ERROR - stderr - +2025-02-05 22:48:04 - ERROR - stderr - +2025-02-05 22:48:04 - INFO - stdout - {'loss': 0.6293, 'grad_norm': 1.240135908126831, 'learning_rate': 7.512821451654467e-06, 'epoch': 1.78} +2025-02-05 22:48:04 - ERROR - stderr - 59%|█████▉ | 13295/22434 [12:40:24<6:21:05, 2.50s/it] +2025-02-05 22:48:07 - ERROR - stderr - 59%|█████▉ | 13296/22434 [12:40:27<6:19:41, 2.49s/it] +2025-02-05 22:48:07 - ERROR - stderr - +2025-02-05 22:48:07 - ERROR - stderr - +2025-02-05 22:48:07 - INFO - stdout - {'loss': 0.6692, 'grad_norm': 1.23332679271698, 'learning_rate': 7.511423099279901e-06, 'epoch': 1.78} +2025-02-05 22:48:07 - ERROR - stderr - 59%|█████▉ | 13296/22434 [12:40:27<6:19:41, 2.49s/it] +2025-02-05 22:48:09 - ERROR - stderr - 59%|█████▉ | 13297/22434 [12:40:29<6:18:30, 2.49s/it] +2025-02-05 22:48:09 - ERROR - stderr - +2025-02-05 22:48:09 - ERROR - stderr - +2025-02-05 22:48:09 - INFO - stdout - {'loss': 0.7009, 'grad_norm': 1.255388617515564, 'learning_rate': 7.510024798777342e-06, 'epoch': 1.78} +2025-02-05 22:48:09 - ERROR - stderr - 59%|█████▉ | 13297/22434 [12:40:29<6:18:30, 2.49s/it] +2025-02-05 22:48:12 - ERROR - stderr - 59%|█████▉ | 13298/22434 [12:40:32<6:19:39, 2.49s/it] +2025-02-05 22:48:12 - ERROR - stderr - +2025-02-05 22:48:12 - 
ERROR - stderr - +2025-02-05 22:48:12 - INFO - stdout - {'loss': 0.669, 'grad_norm': 1.1538621187210083, 'learning_rate': 7.5086265501759325e-06, 'epoch': 1.78} +2025-02-05 22:48:12 - ERROR - stderr - 59%|█████▉ | 13298/22434 [12:40:32<6:19:39, 2.49s/it] +2025-02-05 22:48:14 - ERROR - stderr - 59%|█████▉ | 13299/22434 [12:40:34<6:23:27, 2.52s/it] +2025-02-05 22:48:14 - ERROR - stderr - +2025-02-05 22:48:14 - ERROR - stderr - +2025-02-05 22:48:14 - INFO - stdout - {'loss': 0.6657, 'grad_norm': 1.1280181407928467, 'learning_rate': 7.507228353504819e-06, 'epoch': 1.78} +2025-02-05 22:48:14 - ERROR - stderr - 59%|█████▉ | 13299/22434 [12:40:34<6:23:27, 2.52s/it] +2025-02-05 22:48:17 - ERROR - stderr - 59%|█████▉ | 13300/22434 [12:40:37<6:19:35, 2.49s/it] +2025-02-05 22:48:17 - ERROR - stderr - +2025-02-05 22:48:17 - ERROR - stderr - +2025-02-05 22:48:17 - INFO - stdout - {'loss': 0.655, 'grad_norm': 1.074084997177124, 'learning_rate': 7.505830208793147e-06, 'epoch': 1.78} +2025-02-05 22:48:17 - ERROR - stderr - 59%|█████▉ | 13300/22434 [12:40:37<6:19:35, 2.49s/it] +2025-02-05 22:48:19 - ERROR - stderr - 59%|█████▉ | 13301/22434 [12:40:39<6:16:25, 2.47s/it] +2025-02-05 22:48:19 - ERROR - stderr - +2025-02-05 22:48:19 - ERROR - stderr - +2025-02-05 22:48:19 - INFO - stdout - {'loss': 0.7192, 'grad_norm': 1.3083407878875732, 'learning_rate': 7.504432116070053e-06, 'epoch': 1.78} +2025-02-05 22:48:19 - ERROR - stderr - 59%|█████▉ | 13301/22434 [12:40:39<6:16:25, 2.47s/it] +2025-02-05 22:48:22 - ERROR - stderr - 59%|█████▉ | 13302/22434 [12:40:42<6:18:32, 2.49s/it] +2025-02-05 22:48:22 - ERROR - stderr - +2025-02-05 22:48:22 - ERROR - stderr - +2025-02-05 22:48:22 - INFO - stdout - {'loss': 0.6478, 'grad_norm': 1.1893607378005981, 'learning_rate': 7.503034075364689e-06, 'epoch': 1.78} +2025-02-05 22:48:22 - ERROR - stderr - 59%|█████▉ | 13302/22434 [12:40:42<6:18:32, 2.49s/it] +2025-02-05 22:48:24 - ERROR - stderr - 59%|█████▉ | 13303/22434 [12:40:44<6:18:09, 2.48s/it] 
+2025-02-05 22:48:24 - ERROR - stderr - +2025-02-05 22:48:24 - ERROR - stderr - +2025-02-05 22:48:24 - INFO - stdout - {'loss': 0.6744, 'grad_norm': 1.291527509689331, 'learning_rate': 7.501636086706188e-06, 'epoch': 1.78} +2025-02-05 22:48:24 - ERROR - stderr - 59%|█████▉ | 13303/22434 [12:40:44<6:18:09, 2.48s/it] +2025-02-05 22:48:27 - ERROR - stderr - 59%|█████▉ | 13304/22434 [12:40:46<6:14:55, 2.46s/it] +2025-02-05 22:48:27 - ERROR - stderr - +2025-02-05 22:48:27 - ERROR - stderr - +2025-02-05 22:48:27 - INFO - stdout - {'loss': 0.6879, 'grad_norm': 1.3032326698303223, 'learning_rate': 7.500238150123691e-06, 'epoch': 1.78} +2025-02-05 22:48:27 - ERROR - stderr - 59%|█████▉ | 13304/22434 [12:40:47<6:14:55, 2.46s/it] +2025-02-05 22:48:29 - ERROR - stderr - 59%|█████▉ | 13305/22434 [12:40:49<6:19:44, 2.50s/it] +2025-02-05 22:48:29 - ERROR - stderr - +2025-02-05 22:48:29 - ERROR - stderr - +2025-02-05 22:48:29 - INFO - stdout - {'loss': 0.6911, 'grad_norm': 1.2379329204559326, 'learning_rate': 7.498840265646339e-06, 'epoch': 1.78} +2025-02-05 22:48:29 - ERROR - stderr - 59%|█████▉ | 13305/22434 [12:40:49<6:19:44, 2.50s/it] +2025-02-05 22:48:32 - ERROR - stderr - 59%|█████▉ | 13306/22434 [12:40:52<6:23:41, 2.52s/it] +2025-02-05 22:48:32 - ERROR - stderr - +2025-02-05 22:48:32 - ERROR - stderr - +2025-02-05 22:48:32 - INFO - stdout - {'loss': 0.7288, 'grad_norm': 1.1819076538085938, 'learning_rate': 7.497442433303265e-06, 'epoch': 1.78} +2025-02-05 22:48:32 - ERROR - stderr - 59%|█████▉ | 13306/22434 [12:40:52<6:23:41, 2.52s/it] +2025-02-05 22:48:34 - ERROR - stderr - 59%|█████▉ | 13307/22434 [12:40:54<6:19:56, 2.50s/it] +2025-02-05 22:48:34 - ERROR - stderr - +2025-02-05 22:48:34 - ERROR - stderr - +2025-02-05 22:48:34 - INFO - stdout - {'loss': 0.6233, 'grad_norm': 1.0411958694458008, 'learning_rate': 7.4960446531236134e-06, 'epoch': 1.78} +2025-02-05 22:48:34 - ERROR - stderr - 59%|█████▉ | 13307/22434 [12:40:54<6:19:56, 2.50s/it] +2025-02-05 22:48:40 - ERROR - 
stderr - 59%|█████▉ | 13308/22434 [12:41:00<8:46:47, 3.46s/it] +2025-02-05 22:48:40 - ERROR - stderr - +2025-02-05 22:48:40 - ERROR - stderr - +2025-02-05 22:48:40 - INFO - stdout - {'loss': 0.7092, 'grad_norm': 1.184686303138733, 'learning_rate': 7.494646925136515e-06, 'epoch': 1.78} +2025-02-05 22:48:40 - ERROR - stderr - 59%|█████▉ | 13308/22434 [12:41:00<8:46:47, 3.46s/it] +2025-02-05 22:48:42 - ERROR - stderr - 59%|█████▉ | 13309/22434 [12:41:02<8:00:54, 3.16s/it] +2025-02-05 22:48:43 - ERROR - stderr - +2025-02-05 22:48:43 - ERROR - stderr - +2025-02-05 22:48:43 - INFO - stdout - {'loss': 0.709, 'grad_norm': 1.2434760332107544, 'learning_rate': 7.4932492493711e-06, 'epoch': 1.78} +2025-02-05 22:48:43 - ERROR - stderr - 59%|█████▉ | 13309/22434 [12:41:02<8:00:54, 3.16s/it] +2025-02-05 22:48:45 - ERROR - stderr - 59%|█████▉ | 13310/22434 [12:41:05<7:32:08, 2.97s/it] +2025-02-05 22:48:45 - ERROR - stderr - +2025-02-05 22:48:45 - ERROR - stderr - +2025-02-05 22:48:45 - INFO - stdout - {'loss': 0.6821, 'grad_norm': 1.1609017848968506, 'learning_rate': 7.49185162585651e-06, 'epoch': 1.78} +2025-02-05 22:48:45 - ERROR - stderr - 59%|█████▉ | 13310/22434 [12:41:05<7:32:08, 2.97s/it] +2025-02-05 22:48:48 - ERROR - stderr - 59%|█████▉ | 13311/22434 [12:41:08<7:22:17, 2.91s/it] +2025-02-05 22:48:48 - ERROR - stderr - +2025-02-05 22:48:48 - ERROR - stderr - +2025-02-05 22:48:48 - INFO - stdout - {'loss': 0.6891, 'grad_norm': 1.1342800855636597, 'learning_rate': 7.490454054621872e-06, 'epoch': 1.78} +2025-02-05 22:48:48 - ERROR - stderr - 59%|█████▉ | 13311/22434 [12:41:08<7:22:17, 2.91s/it] +2025-02-05 22:48:50 - ERROR - stderr - 59%|█████▉ | 13312/22434 [12:41:10<7:04:34, 2.79s/it] +2025-02-05 22:48:50 - ERROR - stderr - +2025-02-05 22:48:50 - ERROR - stderr - +2025-02-05 22:48:50 - INFO - stdout - {'loss': 0.6849, 'grad_norm': 1.2116864919662476, 'learning_rate': 7.489056535696313e-06, 'epoch': 1.78} +2025-02-05 22:48:50 - ERROR - stderr - 59%|█████▉ | 13312/22434 
[12:41:10<7:04:34, 2.79s/it] +2025-02-05 22:48:53 - ERROR - stderr - 59%|█████▉ | 13313/22434 [12:41:13<6:49:08, 2.69s/it] +2025-02-05 22:48:53 - ERROR - stderr - +2025-02-05 22:48:53 - ERROR - stderr - +2025-02-05 22:48:53 - INFO - stdout - {'loss': 0.6724, 'grad_norm': 1.315856695175171, 'learning_rate': 7.487659069108974e-06, 'epoch': 1.78} +2025-02-05 22:48:53 - ERROR - stderr - 59%|█████▉ | 13313/22434 [12:41:13<6:49:08, 2.69s/it] +2025-02-05 22:48:56 - ERROR - stderr - 59%|█████▉ | 13314/22434 [12:41:15<6:55:44, 2.74s/it] +2025-02-05 22:48:56 - ERROR - stderr - +2025-02-05 22:48:56 - ERROR - stderr - +2025-02-05 22:48:56 - INFO - stdout - {'loss': 0.6904, 'grad_norm': 1.1762348413467407, 'learning_rate': 7.486261654888974e-06, 'epoch': 1.78} +2025-02-05 22:48:56 - ERROR - stderr - 59%|█████▉ | 13314/22434 [12:41:15<6:55:44, 2.74s/it] +2025-02-05 22:48:58 - ERROR - stderr - 59%|█████▉ | 13315/22434 [12:41:18<6:42:49, 2.65s/it] +2025-02-05 22:48:58 - ERROR - stderr - +2025-02-05 22:48:58 - ERROR - stderr - +2025-02-05 22:48:58 - INFO - stdout - {'loss': 0.6917, 'grad_norm': 1.26168954372406, 'learning_rate': 7.484864293065446e-06, 'epoch': 1.78} +2025-02-05 22:48:58 - ERROR - stderr - 59%|█████▉ | 13315/22434 [12:41:18<6:42:49, 2.65s/it] +2025-02-05 22:49:01 - ERROR - stderr - 59%|█████▉ | 13316/22434 [12:41:20<6:35:50, 2.60s/it] +2025-02-05 22:49:01 - ERROR - stderr - +2025-02-05 22:49:01 - ERROR - stderr - +2025-02-05 22:49:01 - INFO - stdout - {'loss': 0.7231, 'grad_norm': 1.3267970085144043, 'learning_rate': 7.483466983667516e-06, 'epoch': 1.78} +2025-02-05 22:49:01 - ERROR - stderr - 59%|█████▉ | 13316/22434 [12:41:20<6:35:50, 2.60s/it] +2025-02-05 22:49:03 - ERROR - stderr - 59%|█████▉ | 13317/22434 [12:41:23<6:31:10, 2.57s/it] +2025-02-05 22:49:03 - ERROR - stderr - +2025-02-05 22:49:03 - ERROR - stderr - +2025-02-05 22:49:03 - INFO - stdout - {'loss': 0.6882, 'grad_norm': 1.315279483795166, 'learning_rate': 7.482069726724306e-06, 'epoch': 1.78} 
+2025-02-05 22:49:03 - ERROR - stderr - 59%|█████▉ | 13317/22434 [12:41:23<6:31:10, 2.57s/it] +2025-02-05 22:49:05 - ERROR - stderr - 59%|█████▉ | 13318/22434 [12:41:25<6:24:19, 2.53s/it] +2025-02-05 22:49:06 - ERROR - stderr - +2025-02-05 22:49:06 - ERROR - stderr - +2025-02-05 22:49:06 - INFO - stdout - {'loss': 0.6767, 'grad_norm': 1.2874135971069336, 'learning_rate': 7.4806725222649446e-06, 'epoch': 1.78} +2025-02-05 22:49:06 - ERROR - stderr - 59%|█████▉ | 13318/22434 [12:41:25<6:24:19, 2.53s/it] +2025-02-05 22:49:08 - ERROR - stderr - 59%|█████▉ | 13319/22434 [12:41:28<6:22:27, 2.52s/it] +2025-02-05 22:49:08 - ERROR - stderr - +2025-02-05 22:49:08 - ERROR - stderr - +2025-02-05 22:49:08 - INFO - stdout - {'loss': 0.5735, 'grad_norm': 1.2011380195617676, 'learning_rate': 7.479275370318555e-06, 'epoch': 1.78} +2025-02-05 22:49:08 - ERROR - stderr - 59%|█████▉ | 13319/22434 [12:41:28<6:22:27, 2.52s/it] +2025-02-05 22:49:10 - ERROR - stderr - 59%|█████▉ | 13320/22434 [12:41:30<6:21:10, 2.51s/it] +2025-02-05 22:49:10 - ERROR - stderr - +2025-02-05 22:49:11 - ERROR - stderr - +2025-02-05 22:49:11 - INFO - stdout - {'loss': 0.7213, 'grad_norm': 1.2650060653686523, 'learning_rate': 7.477878270914255e-06, 'epoch': 1.78} +2025-02-05 22:49:11 - ERROR - stderr - 59%|█████▉ | 13320/22434 [12:41:30<6:21:10, 2.51s/it] +2025-02-05 22:49:13 - ERROR - stderr - 59%|█████▉ | 13321/22434 [12:41:33<6:24:47, 2.53s/it] +2025-02-05 22:49:13 - ERROR - stderr - +2025-02-05 22:49:13 - ERROR - stderr - +2025-02-05 22:49:13 - INFO - stdout - {'loss': 0.7721, 'grad_norm': 1.2899402379989624, 'learning_rate': 7.476481224081174e-06, 'epoch': 1.78} +2025-02-05 22:49:13 - ERROR - stderr - 59%|█████▉ | 13321/22434 [12:41:33<6:24:47, 2.53s/it] +2025-02-05 22:49:15 - ERROR - stderr - 59%|█████▉ | 13322/22434 [12:41:35<6:21:22, 2.51s/it] +2025-02-05 22:49:16 - ERROR - stderr - +2025-02-05 22:49:16 - ERROR - stderr - +2025-02-05 22:49:16 - INFO - stdout - {'loss': 0.7282, 'grad_norm': 
1.359849214553833, 'learning_rate': 7.4750842298484205e-06, 'epoch': 1.78} +2025-02-05 22:49:16 - ERROR - stderr - 59%|█████▉ | 13322/22434 [12:41:35<6:21:22, 2.51s/it] +2025-02-05 22:49:18 - ERROR - stderr - 59%|█████▉ | 13323/22434 [12:41:38<6:21:41, 2.51s/it] +2025-02-05 22:49:18 - ERROR - stderr - +2025-02-05 22:49:18 - ERROR - stderr - +2025-02-05 22:49:18 - INFO - stdout - {'loss': 0.6668, 'grad_norm': 1.2366663217544556, 'learning_rate': 7.473687288245126e-06, 'epoch': 1.78} +2025-02-05 22:49:18 - ERROR - stderr - 59%|█████▉ | 13323/22434 [12:41:38<6:21:41, 2.51s/it] +2025-02-05 22:49:21 - ERROR - stderr - 59%|█████▉ | 13324/22434 [12:41:40<6:22:26, 2.52s/it] +2025-02-05 22:49:21 - ERROR - stderr - +2025-02-05 22:49:21 - ERROR - stderr - +2025-02-05 22:49:21 - INFO - stdout - {'loss': 0.7386, 'grad_norm': 1.3460928201675415, 'learning_rate': 7.472290399300399e-06, 'epoch': 1.78} +2025-02-05 22:49:21 - ERROR - stderr - 59%|█████▉ | 13324/22434 [12:41:40<6:22:26, 2.52s/it] +2025-02-05 22:49:23 - ERROR - stderr - 59%|█████▉ | 13325/22434 [12:41:43<6:20:11, 2.50s/it] +2025-02-05 22:49:23 - ERROR - stderr - +2025-02-05 22:49:23 - ERROR - stderr - +2025-02-05 22:49:23 - INFO - stdout - {'loss': 0.5787, 'grad_norm': 1.167547583580017, 'learning_rate': 7.47089356304336e-06, 'epoch': 1.78} +2025-02-05 22:49:23 - ERROR - stderr - 59%|█████▉ | 13325/22434 [12:41:43<6:20:11, 2.50s/it] +2025-02-05 22:49:25 - ERROR - stderr - 59%|█████▉ | 13326/22434 [12:41:45<6:17:16, 2.49s/it] +2025-02-05 22:49:26 - ERROR - stderr - +2025-02-05 22:49:26 - ERROR - stderr - +2025-02-05 22:49:26 - INFO - stdout - {'loss': 0.6363, 'grad_norm': 1.121841311454773, 'learning_rate': 7.469496779503127e-06, 'epoch': 1.78} +2025-02-05 22:49:26 - ERROR - stderr - 59%|█████▉ | 13326/22434 [12:41:45<6:17:16, 2.49s/it] +2025-02-05 22:49:28 - ERROR - stderr - 59%|█████▉ | 13327/22434 [12:41:48<6:27:39, 2.55s/it] +2025-02-05 22:49:28 - ERROR - stderr - +2025-02-05 22:49:28 - ERROR - stderr - +2025-02-05 
22:49:28 - INFO - stdout - {'loss': 0.6764, 'grad_norm': 1.3463078737258911, 'learning_rate': 7.468100048708813e-06, 'epoch': 1.78} +2025-02-05 22:49:28 - ERROR - stderr - 59%|█████▉ | 13327/22434 [12:41:48<6:27:39, 2.55s/it] +2025-02-05 22:49:31 - ERROR - stderr - 59%|█████▉ | 13328/22434 [12:41:50<6:26:57, 2.55s/it] +2025-02-05 22:49:31 - ERROR - stderr - +2025-02-05 22:49:31 - ERROR - stderr - +2025-02-05 22:49:31 - INFO - stdout - {'loss': 0.7234, 'grad_norm': 1.2898348569869995, 'learning_rate': 7.4667033706895265e-06, 'epoch': 1.78} +2025-02-05 22:49:31 - ERROR - stderr - 59%|█████▉ | 13328/22434 [12:41:51<6:26:57, 2.55s/it] +2025-02-05 22:49:33 - ERROR - stderr - 59%|█████▉ | 13329/22434 [12:41:53<6:30:24, 2.57s/it] +2025-02-05 22:49:33 - ERROR - stderr - +2025-02-05 22:49:33 - ERROR - stderr - +2025-02-05 22:49:33 - INFO - stdout - {'loss': 0.6829, 'grad_norm': 1.2123874425888062, 'learning_rate': 7.465306745474388e-06, 'epoch': 1.78} +2025-02-05 22:49:33 - ERROR - stderr - 59%|█████▉ | 13329/22434 [12:41:53<6:30:24, 2.57s/it] +2025-02-05 22:49:36 - ERROR - stderr - 59%|█████▉ | 13330/22434 [12:41:56<6:33:09, 2.59s/it] +2025-02-05 22:49:36 - ERROR - stderr - +2025-02-05 22:49:36 - ERROR - stderr - +2025-02-05 22:49:36 - INFO - stdout - {'loss': 0.6982, 'grad_norm': 1.2860438823699951, 'learning_rate': 7.463910173092501e-06, 'epoch': 1.78} +2025-02-05 22:49:36 - ERROR - stderr - 59%|█████▉ | 13330/22434 [12:41:56<6:33:09, 2.59s/it] +2025-02-05 22:49:38 - ERROR - stderr - 59%|█████▉ | 13331/22434 [12:41:58<6:24:52, 2.54s/it] +2025-02-05 22:49:38 - ERROR - stderr - +2025-02-05 22:49:38 - ERROR - stderr - +2025-02-05 22:49:38 - INFO - stdout - {'loss': 0.7146, 'grad_norm': 1.2346816062927246, 'learning_rate': 7.462513653572983e-06, 'epoch': 1.78} +2025-02-05 22:49:38 - ERROR - stderr - 59%|█████▉ | 13331/22434 [12:41:58<6:24:52, 2.54s/it] +2025-02-05 22:49:41 - ERROR - stderr - 59%|█████▉ | 13332/22434 [12:42:01<6:21:27, 2.51s/it] +2025-02-05 22:49:41 - ERROR - 
stderr - +2025-02-05 22:49:41 - ERROR - stderr - +2025-02-05 22:49:41 - INFO - stdout - {'loss': 0.7073, 'grad_norm': 1.253859043121338, 'learning_rate': 7.46111718694494e-06, 'epoch': 1.78} +2025-02-05 22:49:41 - ERROR - stderr - 59%|█████▉ | 13332/22434 [12:42:01<6:21:27, 2.51s/it] +2025-02-05 22:49:43 - ERROR - stderr - 59%|█████▉ | 13333/22434 [12:42:03<6:23:50, 2.53s/it] +2025-02-05 22:49:43 - ERROR - stderr - +2025-02-05 22:49:43 - ERROR - stderr - +2025-02-05 22:49:43 - INFO - stdout - {'loss': 0.6378, 'grad_norm': 1.1765891313552856, 'learning_rate': 7.459720773237476e-06, 'epoch': 1.78} +2025-02-05 22:49:43 - ERROR - stderr - 59%|█████▉ | 13333/22434 [12:42:03<6:23:50, 2.53s/it] +2025-02-05 22:49:46 - ERROR - stderr - 59%|█████▉ | 13334/22434 [12:42:06<6:22:50, 2.52s/it] +2025-02-05 22:49:46 - ERROR - stderr - +2025-02-05 22:49:46 - ERROR - stderr - +2025-02-05 22:49:46 - INFO - stdout - {'loss': 0.6764, 'grad_norm': 1.2042300701141357, 'learning_rate': 7.458324412479705e-06, 'epoch': 1.78} +2025-02-05 22:49:46 - ERROR - stderr - 59%|█████▉ | 13334/22434 [12:42:06<6:22:50, 2.52s/it] +2025-02-05 22:49:48 - ERROR - stderr - 59%|█████▉ | 13335/22434 [12:42:08<6:19:50, 2.50s/it] +2025-02-05 22:49:48 - ERROR - stderr - +2025-02-05 22:49:48 - ERROR - stderr - +2025-02-05 22:49:48 - INFO - stdout - {'loss': 0.6822, 'grad_norm': 1.2054184675216675, 'learning_rate': 7.456928104700729e-06, 'epoch': 1.78} +2025-02-05 22:49:48 - ERROR - stderr - 59%|█████▉ | 13335/22434 [12:42:08<6:19:50, 2.50s/it] +2025-02-05 22:49:51 - ERROR - stderr - 59%|█████▉ | 13336/22434 [12:42:11<6:22:28, 2.52s/it] +2025-02-05 22:49:51 - ERROR - stderr - +2025-02-05 22:49:51 - ERROR - stderr - +2025-02-05 22:49:51 - INFO - stdout - {'loss': 0.715, 'grad_norm': 1.2493089437484741, 'learning_rate': 7.455531849929653e-06, 'epoch': 1.78} +2025-02-05 22:49:51 - ERROR - stderr - 59%|█████▉ | 13336/22434 [12:42:11<6:22:28, 2.52s/it] +2025-02-05 22:49:53 - ERROR - stderr - 59%|█████▉ | 13337/22434 
[12:42:13<6:23:22, 2.53s/it] +2025-02-05 22:49:54 - ERROR - stderr - +2025-02-05 22:49:54 - ERROR - stderr - +2025-02-05 22:49:54 - INFO - stdout - {'loss': 0.6545, 'grad_norm': 1.2286487817764282, 'learning_rate': 7.45413564819558e-06, 'epoch': 1.78} +2025-02-05 22:49:54 - ERROR - stderr - 59%|█████▉ | 13337/22434 [12:42:13<6:23:22, 2.53s/it] +2025-02-05 22:49:56 - ERROR - stderr - 59%|█████▉ | 13338/22434 [12:42:16<6:22:13, 2.52s/it] +2025-02-05 22:49:56 - ERROR - stderr - +2025-02-05 22:49:56 - ERROR - stderr - +2025-02-05 22:49:56 - INFO - stdout - {'loss': 0.5941, 'grad_norm': 1.2287245988845825, 'learning_rate': 7.452739499527613e-06, 'epoch': 1.78} +2025-02-05 22:49:56 - ERROR - stderr - 59%|█████▉ | 13338/22434 [12:42:16<6:22:13, 2.52s/it] +2025-02-05 22:49:58 - ERROR - stderr - 59%|█████▉ | 13339/22434 [12:42:18<6:21:06, 2.51s/it] +2025-02-05 22:49:59 - ERROR - stderr - +2025-02-05 22:49:59 - ERROR - stderr - +2025-02-05 22:49:59 - INFO - stdout - {'loss': 0.7621, 'grad_norm': 1.477335810661316, 'learning_rate': 7.451343403954856e-06, 'epoch': 1.78} +2025-02-05 22:49:59 - ERROR - stderr - 59%|█████▉ | 13339/22434 [12:42:18<6:21:06, 2.51s/it] +2025-02-05 22:50:01 - ERROR - stderr - 59%|█████▉ | 13340/22434 [12:42:21<6:20:57, 2.51s/it] +2025-02-05 22:50:01 - ERROR - stderr - +2025-02-05 22:50:01 - ERROR - stderr - +2025-02-05 22:50:01 - INFO - stdout - {'loss': 0.6808, 'grad_norm': 1.286862850189209, 'learning_rate': 7.449947361506407e-06, 'epoch': 1.78} +2025-02-05 22:50:01 - ERROR - stderr - 59%|█████▉ | 13340/22434 [12:42:21<6:20:57, 2.51s/it] +2025-02-05 22:50:04 - ERROR - stderr - 59%|█████▉ | 13341/22434 [12:42:24<6:35:05, 2.61s/it] +2025-02-05 22:50:04 - ERROR - stderr - +2025-02-05 22:50:04 - ERROR - stderr - +2025-02-05 22:50:04 - INFO - stdout - {'loss': 0.7286, 'grad_norm': 1.2689037322998047, 'learning_rate': 7.448551372211361e-06, 'epoch': 1.78} +2025-02-05 22:50:04 - ERROR - stderr - 59%|█████▉ | 13341/22434 [12:42:24<6:35:05, 2.61s/it] 
+2025-02-05 22:50:06 - ERROR - stderr - 59%|█████▉ | 13342/22434 [12:42:26<6:33:24, 2.60s/it] +2025-02-05 22:50:06 - ERROR - stderr - +2025-02-05 22:50:06 - ERROR - stderr - +2025-02-05 22:50:06 - INFO - stdout - {'loss': 0.6927, 'grad_norm': 1.2290533781051636, 'learning_rate': 7.447155436098825e-06, 'epoch': 1.78} +2025-02-05 22:50:06 - ERROR - stderr - 59%|█████▉ | 13342/22434 [12:42:26<6:33:24, 2.60s/it] +2025-02-05 22:50:09 - ERROR - stderr - 59%|█████▉ | 13343/22434 [12:42:29<6:33:22, 2.60s/it] +2025-02-05 22:50:09 - ERROR - stderr - +2025-02-05 22:50:09 - ERROR - stderr - +2025-02-05 22:50:09 - INFO - stdout - {'loss': 0.7627, 'grad_norm': 1.2181172370910645, 'learning_rate': 7.4457595531978864e-06, 'epoch': 1.78} +2025-02-05 22:50:09 - ERROR - stderr - 59%|█████▉ | 13343/22434 [12:42:29<6:33:22, 2.60s/it] +2025-02-05 22:50:11 - ERROR - stderr - 59%|█████▉ | 13344/22434 [12:42:31<6:27:32, 2.56s/it] +2025-02-05 22:50:12 - ERROR - stderr - +2025-02-05 22:50:12 - ERROR - stderr - +2025-02-05 22:50:12 - INFO - stdout - {'loss': 0.6991, 'grad_norm': 1.3072049617767334, 'learning_rate': 7.444363723537648e-06, 'epoch': 1.78} +2025-02-05 22:50:12 - ERROR - stderr - 59%|█████▉ | 13344/22434 [12:42:31<6:27:32, 2.56s/it] +2025-02-05 22:50:14 - ERROR - stderr - 59%|█████▉ | 13345/22434 [12:42:34<6:25:25, 2.54s/it] +2025-02-05 22:50:14 - ERROR - stderr - +2025-02-05 22:50:14 - ERROR - stderr - +2025-02-05 22:50:14 - INFO - stdout - {'loss': 0.6767, 'grad_norm': 1.293628454208374, 'learning_rate': 7.442967947147205e-06, 'epoch': 1.78} +2025-02-05 22:50:14 - ERROR - stderr - 59%|█████▉ | 13345/22434 [12:42:34<6:25:25, 2.54s/it] +2025-02-05 22:50:16 - ERROR - stderr - 59%|█████▉ | 13346/22434 [12:42:36<6:23:35, 2.53s/it] +2025-02-05 22:50:17 - ERROR - stderr - +2025-02-05 22:50:17 - ERROR - stderr - +2025-02-05 22:50:17 - INFO - stdout - {'loss': 0.6765, 'grad_norm': 1.1302788257598877, 'learning_rate': 7.441572224055644e-06, 'epoch': 1.78} +2025-02-05 22:50:17 - ERROR - 
stderr - 59%|█████▉ | 13346/22434 [12:42:36<6:23:35, 2.53s/it] +2025-02-05 22:50:19 - ERROR - stderr - 59%|█████▉ | 13347/22434 [12:42:39<6:21:50, 2.52s/it] +2025-02-05 22:50:19 - ERROR - stderr - +2025-02-05 22:50:19 - ERROR - stderr - +2025-02-05 22:50:19 - INFO - stdout - {'loss': 0.6495, 'grad_norm': 1.2976595163345337, 'learning_rate': 7.440176554292065e-06, 'epoch': 1.78} +2025-02-05 22:50:19 - ERROR - stderr - 59%|█████▉ | 13347/22434 [12:42:39<6:21:50, 2.52s/it] +2025-02-05 22:50:21 - ERROR - stderr - 59%|█████▉ | 13348/22434 [12:42:41<6:21:20, 2.52s/it] +2025-02-05 22:50:22 - ERROR - stderr - +2025-02-05 22:50:22 - ERROR - stderr - +2025-02-05 22:50:22 - INFO - stdout - {'loss': 0.8002, 'grad_norm': 1.4654515981674194, 'learning_rate': 7.438780937885555e-06, 'epoch': 1.78} +2025-02-05 22:50:22 - ERROR - stderr - 59%|█████▉ | 13348/22434 [12:42:41<6:21:20, 2.52s/it] +2025-02-05 22:50:24 - ERROR - stderr - 60%|█████▉ | 13349/22434 [12:42:44<6:18:46, 2.50s/it] +2025-02-05 22:50:24 - ERROR - stderr - +2025-02-05 22:50:24 - ERROR - stderr - +2025-02-05 22:50:24 - INFO - stdout - {'loss': 0.7139, 'grad_norm': 1.296443223953247, 'learning_rate': 7.437385374865206e-06, 'epoch': 1.79} +2025-02-05 22:50:24 - ERROR - stderr - 60%|█████▉ | 13349/22434 [12:42:44<6:18:46, 2.50s/it] +2025-02-05 22:50:26 - ERROR - stderr - 60%|█████▉ | 13350/22434 [12:42:46<6:16:26, 2.49s/it] +2025-02-05 22:50:26 - ERROR - stderr - +2025-02-05 22:50:26 - ERROR - stderr - +2025-02-05 22:50:26 - INFO - stdout - {'loss': 0.6938, 'grad_norm': 1.2363168001174927, 'learning_rate': 7.435989865260106e-06, 'epoch': 1.79} +2025-02-05 22:50:26 - ERROR - stderr - 60%|█████▉ | 13350/22434 [12:42:46<6:16:26, 2.49s/it] +2025-02-05 22:50:29 - ERROR - stderr - 60%|█████▉ | 13351/22434 [12:42:49<6:13:15, 2.47s/it] +2025-02-05 22:50:29 - ERROR - stderr - +2025-02-05 22:50:29 - ERROR - stderr - +2025-02-05 22:50:29 - INFO - stdout - {'loss': 0.6513, 'grad_norm': 1.3031309843063354, 'learning_rate': 
7.434594409099342e-06, 'epoch': 1.79} +2025-02-05 22:50:29 - ERROR - stderr - 60%|█████▉ | 13351/22434 [12:42:49<6:13:15, 2.47s/it] +2025-02-05 22:50:31 - ERROR - stderr - 60%|█████▉ | 13352/22434 [12:42:51<6:12:34, 2.46s/it] +2025-02-05 22:50:31 - ERROR - stderr - +2025-02-05 22:50:31 - ERROR - stderr - +2025-02-05 22:50:31 - INFO - stdout - {'loss': 0.6408, 'grad_norm': 1.1127417087554932, 'learning_rate': 7.433199006412006e-06, 'epoch': 1.79} +2025-02-05 22:50:31 - ERROR - stderr - 60%|█████▉ | 13352/22434 [12:42:51<6:12:34, 2.46s/it] +2025-02-05 22:50:34 - ERROR - stderr - 60%|█████▉ | 13353/22434 [12:42:54<6:28:01, 2.56s/it] +2025-02-05 22:50:34 - ERROR - stderr - +2025-02-05 22:50:34 - ERROR - stderr - +2025-02-05 22:50:34 - INFO - stdout - {'loss': 0.6425, 'grad_norm': 1.2181146144866943, 'learning_rate': 7.431803657227182e-06, 'epoch': 1.79} +2025-02-05 22:50:34 - ERROR - stderr - 60%|█████▉ | 13353/22434 [12:42:54<6:28:01, 2.56s/it] +2025-02-05 22:50:37 - ERROR - stderr - 60%|█████▉ | 13354/22434 [12:42:56<6:24:42, 2.54s/it] +2025-02-05 22:50:37 - ERROR - stderr - +2025-02-05 22:50:37 - ERROR - stderr - +2025-02-05 22:50:37 - INFO - stdout - {'loss': 0.5671, 'grad_norm': 1.201139211654663, 'learning_rate': 7.430408361573949e-06, 'epoch': 1.79} +2025-02-05 22:50:37 - ERROR - stderr - 60%|█████▉ | 13354/22434 [12:42:56<6:24:42, 2.54s/it] +2025-02-05 22:50:39 - ERROR - stderr - 60%|█████▉ | 13355/22434 [12:42:59<6:18:33, 2.50s/it] +2025-02-05 22:50:39 - ERROR - stderr - +2025-02-05 22:50:39 - ERROR - stderr - +2025-02-05 22:50:39 - INFO - stdout - {'loss': 0.6535, 'grad_norm': 1.248763918876648, 'learning_rate': 7.429013119481398e-06, 'epoch': 1.79} +2025-02-05 22:50:39 - ERROR - stderr - 60%|█████▉ | 13355/22434 [12:42:59<6:18:33, 2.50s/it] +2025-02-05 22:50:41 - ERROR - stderr - 60%|█████▉ | 13356/22434 [12:43:01<6:18:16, 2.50s/it] +2025-02-05 22:50:42 - ERROR - stderr - +2025-02-05 22:50:42 - ERROR - stderr - +2025-02-05 22:50:42 - INFO - stdout - {'loss': 
0.665, 'grad_norm': 1.3523023128509521, 'learning_rate': 7.427617930978605e-06, 'epoch': 1.79} +2025-02-05 22:50:42 - ERROR - stderr - 60%|█████▉ | 13356/22434 [12:43:01<6:18:16, 2.50s/it] +2025-02-05 22:50:44 - ERROR - stderr - 60%|█████▉ | 13357/22434 [12:43:04<6:15:12, 2.48s/it] +2025-02-05 22:50:44 - ERROR - stderr - +2025-02-05 22:50:44 - ERROR - stderr - +2025-02-05 22:50:44 - INFO - stdout - {'loss': 0.6613, 'grad_norm': 1.390576958656311, 'learning_rate': 7.426222796094655e-06, 'epoch': 1.79} +2025-02-05 22:50:44 - ERROR - stderr - 60%|█████▉ | 13357/22434 [12:43:04<6:15:12, 2.48s/it] +2025-02-05 22:50:46 - ERROR - stderr - 60%|█████▉ | 13358/22434 [12:43:06<6:15:17, 2.48s/it] +2025-02-05 22:50:46 - ERROR - stderr - +2025-02-05 22:50:46 - ERROR - stderr - +2025-02-05 22:50:46 - INFO - stdout - {'loss': 0.6929, 'grad_norm': 1.2277024984359741, 'learning_rate': 7.424827714858631e-06, 'epoch': 1.79} +2025-02-05 22:50:46 - ERROR - stderr - 60%|█████▉ | 13358/22434 [12:43:06<6:15:17, 2.48s/it] +2025-02-05 22:50:49 - ERROR - stderr - 60%|█████▉ | 13359/22434 [12:43:09<6:16:05, 2.49s/it] +2025-02-05 22:50:49 - ERROR - stderr - +2025-02-05 22:50:49 - ERROR - stderr - +2025-02-05 22:50:49 - INFO - stdout - {'loss': 0.689, 'grad_norm': 1.2601195573806763, 'learning_rate': 7.423432687299605e-06, 'epoch': 1.79} +2025-02-05 22:50:49 - ERROR - stderr - 60%|█████▉ | 13359/22434 [12:43:09<6:16:05, 2.49s/it] +2025-02-05 22:50:51 - ERROR - stderr - 60%|█████▉ | 13360/22434 [12:43:11<6:16:26, 2.49s/it] +2025-02-05 22:50:51 - ERROR - stderr - +2025-02-05 22:50:51 - ERROR - stderr - +2025-02-05 22:50:51 - INFO - stdout - {'loss': 0.6546, 'grad_norm': 1.1781598329544067, 'learning_rate': 7.422037713446665e-06, 'epoch': 1.79} +2025-02-05 22:50:51 - ERROR - stderr - 60%|█████▉ | 13360/22434 [12:43:11<6:16:26, 2.49s/it] +2025-02-05 22:50:54 - ERROR - stderr - 60%|█████▉ | 13361/22434 [12:43:14<6:16:53, 2.49s/it] +2025-02-05 22:50:54 - ERROR - stderr - +2025-02-05 22:50:54 - ERROR - 
stderr - +2025-02-05 22:50:54 - INFO - stdout - {'loss': 0.672, 'grad_norm': 1.197702407836914, 'learning_rate': 7.42064279332888e-06, 'epoch': 1.79} +2025-02-05 22:50:54 - ERROR - stderr - 60%|█████▉ | 13361/22434 [12:43:14<6:16:53, 2.49s/it] +2025-02-05 22:50:56 - ERROR - stderr - 60%|█████▉ | 13362/22434 [12:43:16<6:18:48, 2.51s/it] +2025-02-05 22:50:56 - ERROR - stderr - +2025-02-05 22:50:56 - ERROR - stderr - +2025-02-05 22:50:56 - INFO - stdout - {'loss': 0.7246, 'grad_norm': 1.2426199913024902, 'learning_rate': 7.419247926975325e-06, 'epoch': 1.79} +2025-02-05 22:50:56 - ERROR - stderr - 60%|█████▉ | 13362/22434 [12:43:16<6:18:48, 2.51s/it] +2025-02-05 22:50:59 - ERROR - stderr - 60%|█████▉ | 13363/22434 [12:43:19<6:16:42, 2.49s/it] +2025-02-05 22:50:59 - ERROR - stderr - +2025-02-05 22:50:59 - ERROR - stderr - +2025-02-05 22:50:59 - INFO - stdout - {'loss': 0.6689, 'grad_norm': 1.444120168685913, 'learning_rate': 7.417853114415079e-06, 'epoch': 1.79} +2025-02-05 22:50:59 - ERROR - stderr - 60%|█████▉ | 13363/22434 [12:43:19<6:16:42, 2.49s/it] +2025-02-05 22:51:01 - ERROR - stderr - 60%|█████▉ | 13364/22434 [12:43:21<6:19:08, 2.51s/it] +2025-02-05 22:51:01 - ERROR - stderr - +2025-02-05 22:51:01 - ERROR - stderr - +2025-02-05 22:51:01 - INFO - stdout - {'loss': 0.6774, 'grad_norm': 1.2977793216705322, 'learning_rate': 7.416458355677215e-06, 'epoch': 1.79} +2025-02-05 22:51:01 - ERROR - stderr - 60%|█████▉ | 13364/22434 [12:43:21<6:19:08, 2.51s/it] +2025-02-05 22:51:04 - ERROR - stderr - 60%|█████▉ | 13365/22434 [12:43:24<6:18:43, 2.51s/it] +2025-02-05 22:51:04 - ERROR - stderr - +2025-02-05 22:51:04 - ERROR - stderr - +2025-02-05 22:51:04 - INFO - stdout - {'loss': 0.677, 'grad_norm': 1.130021572113037, 'learning_rate': 7.415063650790801e-06, 'epoch': 1.79} +2025-02-05 22:51:04 - ERROR - stderr - 60%|█████▉ | 13365/22434 [12:43:24<6:18:43, 2.51s/it] +2025-02-05 22:51:06 - ERROR - stderr - 60%|█████▉ | 13366/22434 [12:43:26<6:18:35, 2.50s/it] +2025-02-05 
22:51:06 - ERROR - stderr - +2025-02-05 22:51:06 - ERROR - stderr - +2025-02-05 22:51:06 - INFO - stdout - {'loss': 0.7151, 'grad_norm': 1.3829281330108643, 'learning_rate': 7.413668999784916e-06, 'epoch': 1.79} +2025-02-05 22:51:06 - ERROR - stderr - 60%|█████▉ | 13366/22434 [12:43:26<6:18:35, 2.50s/it] +2025-02-05 22:51:09 - ERROR - stderr - 60%|█████▉ | 13367/22434 [12:43:29<6:21:32, 2.52s/it] +2025-02-05 22:51:09 - ERROR - stderr - +2025-02-05 22:51:09 - ERROR - stderr - +2025-02-05 22:51:09 - INFO - stdout - {'loss': 0.7467, 'grad_norm': 1.4196044206619263, 'learning_rate': 7.412274402688622e-06, 'epoch': 1.79} +2025-02-05 22:51:09 - ERROR - stderr - 60%|█████▉ | 13367/22434 [12:43:29<6:21:32, 2.52s/it] +2025-02-05 22:51:11 - ERROR - stderr - 60%|█████▉ | 13368/22434 [12:43:31<6:19:48, 2.51s/it] +2025-02-05 22:51:12 - ERROR - stderr - +2025-02-05 22:51:12 - ERROR - stderr - +2025-02-05 22:51:12 - INFO - stdout - {'loss': 0.6772, 'grad_norm': 1.2620078325271606, 'learning_rate': 7.410879859530996e-06, 'epoch': 1.79} +2025-02-05 22:51:12 - ERROR - stderr - 60%|█████▉ | 13368/22434 [12:43:31<6:19:48, 2.51s/it] +2025-02-05 22:51:14 - ERROR - stderr - 60%|█████▉ | 13369/22434 [12:43:34<6:17:07, 2.50s/it] +2025-02-05 22:51:14 - ERROR - stderr - +2025-02-05 22:51:14 - ERROR - stderr - +2025-02-05 22:51:14 - INFO - stdout - {'loss': 0.6734, 'grad_norm': 1.3080027103424072, 'learning_rate': 7.4094853703410985e-06, 'epoch': 1.79} +2025-02-05 22:51:14 - ERROR - stderr - 60%|█████▉ | 13369/22434 [12:43:34<6:17:07, 2.50s/it] +2025-02-05 22:51:16 - ERROR - stderr - 60%|█████▉ | 13370/22434 [12:43:36<6:14:12, 2.48s/it] +2025-02-05 22:51:16 - ERROR - stderr - +2025-02-05 22:51:16 - ERROR - stderr - +2025-02-05 22:51:16 - INFO - stdout - {'loss': 0.6656, 'grad_norm': 1.2145764827728271, 'learning_rate': 7.408090935147999e-06, 'epoch': 1.79} +2025-02-05 22:51:16 - ERROR - stderr - 60%|█████▉ | 13370/22434 [12:43:36<6:14:12, 2.48s/it] +2025-02-05 22:51:19 - ERROR - stderr - 
60%|█████▉ | 13371/22434 [12:43:39<6:13:31, 2.47s/it] +2025-02-05 22:51:19 - ERROR - stderr - +2025-02-05 22:51:19 - ERROR - stderr - +2025-02-05 22:51:19 - INFO - stdout - {'loss': 0.7444, 'grad_norm': 1.312849998474121, 'learning_rate': 7.406696553980768e-06, 'epoch': 1.79} +2025-02-05 22:51:19 - ERROR - stderr - 60%|█████▉ | 13371/22434 [12:43:39<6:13:31, 2.47s/it] +2025-02-05 22:51:21 - ERROR - stderr - 60%|█████▉ | 13372/22434 [12:43:41<6:16:21, 2.49s/it] +2025-02-05 22:51:21 - ERROR - stderr - +2025-02-05 22:51:21 - ERROR - stderr - +2025-02-05 22:51:21 - INFO - stdout - {'loss': 0.783, 'grad_norm': 1.425845742225647, 'learning_rate': 7.405302226868465e-06, 'epoch': 1.79} +2025-02-05 22:51:21 - ERROR - stderr - 60%|█████▉ | 13372/22434 [12:43:41<6:16:21, 2.49s/it] +2025-02-05 22:51:24 - ERROR - stderr - 60%|█████▉ | 13373/22434 [12:43:44<6:13:56, 2.48s/it] +2025-02-05 22:51:24 - ERROR - stderr - +2025-02-05 22:51:24 - ERROR - stderr - +2025-02-05 22:51:24 - INFO - stdout - {'loss': 0.749, 'grad_norm': 1.4015134572982788, 'learning_rate': 7.403907953840151e-06, 'epoch': 1.79} +2025-02-05 22:51:24 - ERROR - stderr - 60%|█████▉ | 13373/22434 [12:43:44<6:13:56, 2.48s/it] +2025-02-05 22:51:26 - ERROR - stderr - 60%|█████▉ | 13374/22434 [12:43:46<6:14:10, 2.48s/it] +2025-02-05 22:51:26 - ERROR - stderr - +2025-02-05 22:51:26 - ERROR - stderr - +2025-02-05 22:51:26 - INFO - stdout - {'loss': 0.7202, 'grad_norm': 1.3438396453857422, 'learning_rate': 7.402513734924895e-06, 'epoch': 1.79} +2025-02-05 22:51:26 - ERROR - stderr - 60%|█████▉ | 13374/22434 [12:43:46<6:14:10, 2.48s/it] +2025-02-05 22:51:29 - ERROR - stderr - 60%|█████▉ | 13375/22434 [12:43:49<6:15:21, 2.49s/it] +2025-02-05 22:51:29 - ERROR - stderr - +2025-02-05 22:51:29 - ERROR - stderr - +2025-02-05 22:51:29 - INFO - stdout - {'loss': 0.662, 'grad_norm': 1.215644121170044, 'learning_rate': 7.401119570151749e-06, 'epoch': 1.79} +2025-02-05 22:51:29 - ERROR - stderr - 60%|█████▉ | 13375/22434 
[12:43:49<6:15:21, 2.49s/it] +2025-02-05 22:51:31 - ERROR - stderr - 60%|█████▉ | 13376/22434 [12:43:51<6:16:03, 2.49s/it] +2025-02-05 22:51:31 - ERROR - stderr - +2025-02-05 22:51:31 - ERROR - stderr - +2025-02-05 22:51:31 - INFO - stdout - {'loss': 0.7028, 'grad_norm': 1.332722783088684, 'learning_rate': 7.399725459549783e-06, 'epoch': 1.79} +2025-02-05 22:51:31 - ERROR - stderr - 60%|█████▉ | 13376/22434 [12:43:51<6:16:03, 2.49s/it] +2025-02-05 22:51:34 - ERROR - stderr - 60%|█████▉ | 13377/22434 [12:43:53<6:12:32, 2.47s/it] +2025-02-05 22:51:34 - ERROR - stderr - +2025-02-05 22:51:34 - ERROR - stderr - +2025-02-05 22:51:34 - INFO - stdout - {'loss': 0.6459, 'grad_norm': 1.2073808908462524, 'learning_rate': 7.398331403148053e-06, 'epoch': 1.79} +2025-02-05 22:51:34 - ERROR - stderr - 60%|█████▉ | 13377/22434 [12:43:54<6:12:32, 2.47s/it] +2025-02-05 22:51:36 - ERROR - stderr - 60%|█████▉ | 13378/22434 [12:43:56<6:14:52, 2.48s/it] +2025-02-05 22:51:36 - ERROR - stderr - +2025-02-05 22:51:36 - ERROR - stderr - +2025-02-05 22:51:36 - INFO - stdout - {'loss': 0.6776, 'grad_norm': 1.1911091804504395, 'learning_rate': 7.3969374009756104e-06, 'epoch': 1.79} +2025-02-05 22:51:36 - ERROR - stderr - 60%|█████▉ | 13378/22434 [12:43:56<6:14:52, 2.48s/it] +2025-02-05 22:51:39 - ERROR - stderr - 60%|█████▉ | 13379/22434 [12:43:59<6:20:20, 2.52s/it] +2025-02-05 22:51:39 - ERROR - stderr - +2025-02-05 22:51:39 - ERROR - stderr - +2025-02-05 22:51:39 - INFO - stdout - {'loss': 0.6743, 'grad_norm': 1.2805213928222656, 'learning_rate': 7.395543453061522e-06, 'epoch': 1.79} +2025-02-05 22:51:39 - ERROR - stderr - 60%|█████▉ | 13379/22434 [12:43:59<6:20:20, 2.52s/it] +2025-02-05 22:51:41 - ERROR - stderr - 60%|█████▉ | 13380/22434 [12:44:01<6:20:51, 2.52s/it] +2025-02-05 22:51:41 - ERROR - stderr - +2025-02-05 22:51:41 - ERROR - stderr - +2025-02-05 22:51:41 - INFO - stdout - {'loss': 0.6974, 'grad_norm': 1.1723949909210205, 'learning_rate': 7.394149559434838e-06, 'epoch': 1.79} 
+2025-02-05 22:51:41 - ERROR - stderr - 60%|█████▉ | 13380/22434 [12:44:01<6:20:51, 2.52s/it] +2025-02-05 22:51:44 - ERROR - stderr - 60%|█████▉ | 13381/22434 [12:44:04<6:32:45, 2.60s/it] +2025-02-05 22:51:44 - ERROR - stderr - +2025-02-05 22:51:44 - ERROR - stderr - +2025-02-05 22:51:44 - INFO - stdout - {'loss': 0.6692, 'grad_norm': 1.191560983657837, 'learning_rate': 7.392755720124609e-06, 'epoch': 1.79} +2025-02-05 22:51:44 - ERROR - stderr - 60%|█████▉ | 13381/22434 [12:44:04<6:32:45, 2.60s/it] +2025-02-05 22:51:47 - ERROR - stderr - 60%|█████▉ | 13382/22434 [12:44:07<6:36:23, 2.63s/it] +2025-02-05 22:51:47 - ERROR - stderr - +2025-02-05 22:51:47 - ERROR - stderr - +2025-02-05 22:51:47 - INFO - stdout - {'loss': 0.6605, 'grad_norm': 1.1938276290893555, 'learning_rate': 7.391361935159893e-06, 'epoch': 1.79} +2025-02-05 22:51:47 - ERROR - stderr - 60%|█████▉ | 13382/22434 [12:44:07<6:36:23, 2.63s/it] +2025-02-05 22:51:49 - ERROR - stderr - 60%|█████▉ | 13383/22434 [12:44:09<6:32:06, 2.60s/it] +2025-02-05 22:51:49 - ERROR - stderr - +2025-02-05 22:51:49 - ERROR - stderr - +2025-02-05 22:51:49 - INFO - stdout - {'loss': 0.6469, 'grad_norm': 1.3397566080093384, 'learning_rate': 7.38996820456974e-06, 'epoch': 1.79} +2025-02-05 22:51:49 - ERROR - stderr - 60%|█████▉ | 13383/22434 [12:44:09<6:32:06, 2.60s/it] +2025-02-05 22:51:52 - ERROR - stderr - 60%|█████▉ | 13384/22434 [12:44:12<6:28:02, 2.57s/it] +2025-02-05 22:51:52 - ERROR - stderr - +2025-02-05 22:51:52 - ERROR - stderr - +2025-02-05 22:51:52 - INFO - stdout - {'loss': 0.7018, 'grad_norm': 1.1818126440048218, 'learning_rate': 7.388574528383207e-06, 'epoch': 1.79} +2025-02-05 22:51:52 - ERROR - stderr - 60%|█████▉ | 13384/22434 [12:44:12<6:28:02, 2.57s/it] +2025-02-05 22:51:54 - ERROR - stderr - 60%|█████▉ | 13385/22434 [12:44:14<6:21:40, 2.53s/it] +2025-02-05 22:51:54 - ERROR - stderr - +2025-02-05 22:51:54 - ERROR - stderr - +2025-02-05 22:51:54 - INFO - stdout - {'loss': 0.707, 'grad_norm': 
1.1265572309494019, 'learning_rate': 7.387180906629339e-06, 'epoch': 1.79} +2025-02-05 22:51:54 - ERROR - stderr - 60%|█████▉ | 13385/22434 [12:44:14<6:21:40, 2.53s/it] +2025-02-05 22:51:57 - ERROR - stderr - 60%|█████▉ | 13386/22434 [12:44:17<6:20:52, 2.53s/it] +2025-02-05 22:51:57 - ERROR - stderr - +2025-02-05 22:51:57 - ERROR - stderr - +2025-02-05 22:51:57 - INFO - stdout - {'loss': 0.651, 'grad_norm': 1.3017100095748901, 'learning_rate': 7.38578733933718e-06, 'epoch': 1.79} +2025-02-05 22:51:57 - ERROR - stderr - 60%|█████▉ | 13386/22434 [12:44:17<6:20:52, 2.53s/it] +2025-02-05 22:51:59 - ERROR - stderr - 60%|█████▉ | 13387/22434 [12:44:19<6:23:21, 2.54s/it] +2025-02-05 22:51:59 - ERROR - stderr - +2025-02-05 22:51:59 - ERROR - stderr - +2025-02-05 22:51:59 - INFO - stdout - {'loss': 0.7286, 'grad_norm': 1.2248269319534302, 'learning_rate': 7.384393826535786e-06, 'epoch': 1.79} +2025-02-05 22:51:59 - ERROR - stderr - 60%|█████▉ | 13387/22434 [12:44:19<6:23:21, 2.54s/it] +2025-02-05 22:52:02 - ERROR - stderr - 60%|█████▉ | 13388/22434 [12:44:22<6:21:12, 2.53s/it] +2025-02-05 22:52:02 - ERROR - stderr - +2025-02-05 22:52:02 - ERROR - stderr - +2025-02-05 22:52:02 - INFO - stdout - {'loss': 0.7568, 'grad_norm': 1.3767300844192505, 'learning_rate': 7.383000368254199e-06, 'epoch': 1.79} +2025-02-05 22:52:02 - ERROR - stderr - 60%|█████▉ | 13388/22434 [12:44:22<6:21:12, 2.53s/it] +2025-02-05 22:52:05 - ERROR - stderr - 60%|█████▉ | 13389/22434 [12:44:24<6:32:00, 2.60s/it] +2025-02-05 22:52:05 - ERROR - stderr - +2025-02-05 22:52:05 - ERROR - stderr - +2025-02-05 22:52:05 - INFO - stdout - {'loss': 0.6654, 'grad_norm': 1.2575101852416992, 'learning_rate': 7.3816069645214615e-06, 'epoch': 1.79} +2025-02-05 22:52:05 - ERROR - stderr - 60%|█████▉ | 13389/22434 [12:44:25<6:32:00, 2.60s/it] +2025-02-05 22:52:07 - ERROR - stderr - 60%|█████▉ | 13390/22434 [12:44:27<6:24:09, 2.55s/it] +2025-02-05 22:52:07 - ERROR - stderr - +2025-02-05 22:52:07 - ERROR - stderr - 
+2025-02-05 22:52:07 - INFO - stdout - {'loss': 0.7282, 'grad_norm': 1.3004111051559448, 'learning_rate': 7.380213615366627e-06, 'epoch': 1.79} +2025-02-05 22:52:07 - ERROR - stderr - 60%|█████▉ | 13390/22434 [12:44:27<6:24:09, 2.55s/it] +2025-02-05 22:52:10 - ERROR - stderr - 60%|█████▉ | 13391/22434 [12:44:29<6:20:46, 2.53s/it] +2025-02-05 22:52:10 - ERROR - stderr - +2025-02-05 22:52:10 - ERROR - stderr - +2025-02-05 22:52:10 - INFO - stdout - {'loss': 0.6915, 'grad_norm': 1.2869174480438232, 'learning_rate': 7.378820320818728e-06, 'epoch': 1.79} +2025-02-05 22:52:10 - ERROR - stderr - 60%|█████▉ | 13391/22434 [12:44:29<6:20:46, 2.53s/it] +2025-02-05 22:52:12 - ERROR - stderr - 60%|█████▉ | 13392/22434 [12:44:32<6:18:46, 2.51s/it] +2025-02-05 22:52:12 - ERROR - stderr - +2025-02-05 22:52:12 - ERROR - stderr - +2025-02-05 22:52:12 - INFO - stdout - {'loss': 0.7109, 'grad_norm': 1.2462431192398071, 'learning_rate': 7.377427080906816e-06, 'epoch': 1.79} +2025-02-05 22:52:12 - ERROR - stderr - 60%|█████▉ | 13392/22434 [12:44:32<6:18:46, 2.51s/it] +2025-02-05 22:52:15 - ERROR - stderr - 60%|█████▉ | 13393/22434 [12:44:34<6:18:32, 2.51s/it] +2025-02-05 22:52:15 - ERROR - stderr - +2025-02-05 22:52:15 - ERROR - stderr - +2025-02-05 22:52:15 - INFO - stdout - {'loss': 0.6829, 'grad_norm': 1.2483997344970703, 'learning_rate': 7.376033895659927e-06, 'epoch': 1.79} +2025-02-05 22:52:15 - ERROR - stderr - 60%|█████▉ | 13393/22434 [12:44:34<6:18:32, 2.51s/it] +2025-02-05 22:52:17 - ERROR - stderr - 60%|█████▉ | 13394/22434 [12:44:37<6:17:58, 2.51s/it] +2025-02-05 22:52:17 - ERROR - stderr - +2025-02-05 22:52:17 - ERROR - stderr - +2025-02-05 22:52:17 - INFO - stdout - {'loss': 0.7578, 'grad_norm': 1.302516222000122, 'learning_rate': 7.374640765107095e-06, 'epoch': 1.79} +2025-02-05 22:52:17 - ERROR - stderr - 60%|█████▉ | 13394/22434 [12:44:37<6:17:58, 2.51s/it] +2025-02-05 22:52:20 - ERROR - stderr - 60%|█████▉ | 13395/22434 [12:44:39<6:17:48, 2.51s/it] +2025-02-05 22:52:20 
- ERROR - stderr - +2025-02-05 22:52:20 - ERROR - stderr - +2025-02-05 22:52:20 - INFO - stdout - {'loss': 0.6055, 'grad_norm': 1.0462085008621216, 'learning_rate': 7.373247689277367e-06, 'epoch': 1.79} +2025-02-05 22:52:20 - ERROR - stderr - 60%|█████▉ | 13395/22434 [12:44:39<6:17:48, 2.51s/it] +2025-02-05 22:52:22 - ERROR - stderr - 60%|█████▉ | 13396/22434 [12:44:42<6:19:56, 2.52s/it] +2025-02-05 22:52:22 - ERROR - stderr - +2025-02-05 22:52:22 - ERROR - stderr - +2025-02-05 22:52:22 - INFO - stdout - {'loss': 0.745, 'grad_norm': 1.324966311454773, 'learning_rate': 7.3718546681997795e-06, 'epoch': 1.79} +2025-02-05 22:52:22 - ERROR - stderr - 60%|█████▉ | 13396/22434 [12:44:42<6:19:56, 2.52s/it] +2025-02-05 22:52:25 - ERROR - stderr - 60%|█████▉ | 13397/22434 [12:44:44<6:22:20, 2.54s/it] +2025-02-05 22:52:25 - ERROR - stderr - +2025-02-05 22:52:25 - ERROR - stderr - +2025-02-05 22:52:25 - INFO - stdout - {'loss': 0.6926, 'grad_norm': 1.226814866065979, 'learning_rate': 7.370461701903362e-06, 'epoch': 1.79} +2025-02-05 22:52:25 - ERROR - stderr - 60%|█████▉ | 13397/22434 [12:44:45<6:22:20, 2.54s/it] +2025-02-05 22:52:27 - ERROR - stderr - 60%|█████▉ | 13398/22434 [12:44:47<6:21:40, 2.53s/it] +2025-02-05 22:52:27 - ERROR - stderr - +2025-02-05 22:52:27 - ERROR - stderr - +2025-02-05 22:52:27 - INFO - stdout - {'loss': 0.7267, 'grad_norm': 1.191521406173706, 'learning_rate': 7.369068790417159e-06, 'epoch': 1.79} +2025-02-05 22:52:27 - ERROR - stderr - 60%|█████▉ | 13398/22434 [12:44:47<6:21:40, 2.53s/it] +2025-02-05 22:52:30 - ERROR - stderr - 60%|█████▉ | 13399/22434 [12:44:49<6:18:36, 2.51s/it] +2025-02-05 22:52:30 - ERROR - stderr - +2025-02-05 22:52:30 - ERROR - stderr - +2025-02-05 22:52:30 - INFO - stdout - {'loss': 0.5498, 'grad_norm': 1.2224823236465454, 'learning_rate': 7.367675933770196e-06, 'epoch': 1.79} +2025-02-05 22:52:30 - ERROR - stderr - 60%|█████▉ | 13399/22434 [12:44:50<6:18:36, 2.51s/it] +2025-02-05 22:52:32 - ERROR - stderr - 60%|█████▉ | 
13400/22434 [12:44:52<6:20:38, 2.53s/it] +2025-02-05 22:52:32 - ERROR - stderr - +2025-02-05 22:52:32 - ERROR - stderr - +2025-02-05 22:52:32 - INFO - stdout - {'loss': 0.7043, 'grad_norm': 1.2102611064910889, 'learning_rate': 7.366283131991512e-06, 'epoch': 1.79} +2025-02-05 22:52:32 - ERROR - stderr - 60%|█████▉ | 13400/22434 [12:44:52<6:20:38, 2.53s/it] +2025-02-05 22:52:35 - ERROR - stderr - 60%|█████▉ | 13401/22434 [12:44:55<6:21:34, 2.53s/it] +2025-02-05 22:52:35 - ERROR - stderr - +2025-02-05 22:52:35 - ERROR - stderr - +2025-02-05 22:52:35 - INFO - stdout - {'loss': 0.6513, 'grad_norm': 1.2830649614334106, 'learning_rate': 7.3648903851101335e-06, 'epoch': 1.79} +2025-02-05 22:52:35 - ERROR - stderr - 60%|█████▉ | 13401/22434 [12:44:55<6:21:34, 2.53s/it] +2025-02-05 22:52:37 - ERROR - stderr - 60%|█████▉ | 13402/22434 [12:44:57<6:18:13, 2.51s/it] +2025-02-05 22:52:37 - ERROR - stderr - +2025-02-05 22:52:37 - ERROR - stderr - +2025-02-05 22:52:37 - INFO - stdout - {'loss': 0.7399, 'grad_norm': 1.3355966806411743, 'learning_rate': 7.3634976931550925e-06, 'epoch': 1.79} +2025-02-05 22:52:37 - ERROR - stderr - 60%|█████▉ | 13402/22434 [12:44:57<6:18:13, 2.51s/it] +2025-02-05 22:52:40 - ERROR - stderr - 60%|█████▉ | 13403/22434 [12:45:00<6:17:09, 2.51s/it] +2025-02-05 22:52:40 - ERROR - stderr - +2025-02-05 22:52:40 - ERROR - stderr - +2025-02-05 22:52:40 - INFO - stdout - {'loss': 0.631, 'grad_norm': 1.1737391948699951, 'learning_rate': 7.362105056155423e-06, 'epoch': 1.79} +2025-02-05 22:52:40 - ERROR - stderr - 60%|█████▉ | 13403/22434 [12:45:00<6:17:09, 2.51s/it] +2025-02-05 22:52:42 - ERROR - stderr - 60%|█████▉ | 13404/22434 [12:45:02<6:18:59, 2.52s/it] +2025-02-05 22:52:42 - ERROR - stderr - +2025-02-05 22:52:42 - ERROR - stderr - +2025-02-05 22:52:42 - INFO - stdout - {'loss': 0.6969, 'grad_norm': 1.1708190441131592, 'learning_rate': 7.360712474140149e-06, 'epoch': 1.79} +2025-02-05 22:52:42 - ERROR - stderr - 60%|█████▉ | 13404/22434 [12:45:02<6:18:59, 
2.52s/it] +2025-02-05 22:52:45 - ERROR - stderr - 60%|█████▉ | 13405/22434 [12:45:05<6:16:09, 2.50s/it] +2025-02-05 22:52:45 - ERROR - stderr - +2025-02-05 22:52:45 - ERROR - stderr - +2025-02-05 22:52:45 - INFO - stdout - {'loss': 0.7234, 'grad_norm': 1.3482171297073364, 'learning_rate': 7.359319947138295e-06, 'epoch': 1.79} +2025-02-05 22:52:45 - ERROR - stderr - 60%|█████▉ | 13405/22434 [12:45:05<6:16:09, 2.50s/it] +2025-02-05 22:52:47 - ERROR - stderr - 60%|█████▉ | 13406/22434 [12:45:07<6:13:04, 2.48s/it] +2025-02-05 22:52:47 - ERROR - stderr - +2025-02-05 22:52:47 - ERROR - stderr - +2025-02-05 22:52:47 - INFO - stdout - {'loss': 0.6527, 'grad_norm': 1.2630066871643066, 'learning_rate': 7.3579274751788935e-06, 'epoch': 1.79} +2025-02-05 22:52:47 - ERROR - stderr - 60%|█████▉ | 13406/22434 [12:45:07<6:13:04, 2.48s/it] +2025-02-05 22:52:50 - ERROR - stderr - 60%|█████▉ | 13407/22434 [12:45:09<6:14:14, 2.49s/it] +2025-02-05 22:52:50 - ERROR - stderr - +2025-02-05 22:52:50 - ERROR - stderr - +2025-02-05 22:52:50 - INFO - stdout - {'loss': 0.6884, 'grad_norm': 1.2136695384979248, 'learning_rate': 7.3565350582909614e-06, 'epoch': 1.79} +2025-02-05 22:52:50 - ERROR - stderr - 60%|█████▉ | 13407/22434 [12:45:10<6:14:14, 2.49s/it] +2025-02-05 22:52:52 - ERROR - stderr - 60%|█████▉ | 13408/22434 [12:45:12<6:17:17, 2.51s/it] +2025-02-05 22:52:52 - ERROR - stderr - +2025-02-05 22:52:52 - ERROR - stderr - +2025-02-05 22:52:52 - INFO - stdout - {'loss': 0.6347, 'grad_norm': 1.3214963674545288, 'learning_rate': 7.355142696503528e-06, 'epoch': 1.79} +2025-02-05 22:52:52 - ERROR - stderr - 60%|█████▉ | 13408/22434 [12:45:12<6:17:17, 2.51s/it] +2025-02-05 22:52:55 - ERROR - stderr - 60%|█████▉ | 13409/22434 [12:45:15<6:20:24, 2.53s/it] +2025-02-05 22:52:55 - ERROR - stderr - +2025-02-05 22:52:55 - ERROR - stderr - +2025-02-05 22:52:55 - INFO - stdout - {'loss': 0.6732, 'grad_norm': 1.2179282903671265, 'learning_rate': 7.353750389845616e-06, 'epoch': 1.79} +2025-02-05 22:52:55 
- ERROR - stderr - 60%|█████▉ | 13409/22434 [12:45:15<6:20:24, 2.53s/it] +2025-02-05 22:52:58 - ERROR - stderr - 60%|█████▉ | 13410/22434 [12:45:17<6:31:58, 2.61s/it] +2025-02-05 22:52:58 - ERROR - stderr - +2025-02-05 22:52:58 - ERROR - stderr - +2025-02-05 22:52:58 - INFO - stdout - {'loss': 0.6123, 'grad_norm': 1.2094076871871948, 'learning_rate': 7.352358138346241e-06, 'epoch': 1.79} +2025-02-05 22:52:58 - ERROR - stderr - 60%|█████▉ | 13410/22434 [12:45:17<6:31:58, 2.61s/it] +2025-02-05 22:53:00 - ERROR - stderr - 60%|█████▉ | 13411/22434 [12:45:20<6:24:51, 2.56s/it] +2025-02-05 22:53:00 - ERROR - stderr - +2025-02-05 22:53:00 - ERROR - stderr - +2025-02-05 22:53:00 - INFO - stdout - {'loss': 0.6523, 'grad_norm': 1.1670721769332886, 'learning_rate': 7.350965942034433e-06, 'epoch': 1.79} +2025-02-05 22:53:00 - ERROR - stderr - 60%|█████▉ | 13411/22434 [12:45:20<6:24:51, 2.56s/it] +2025-02-05 22:53:03 - ERROR - stderr - 60%|█████▉ | 13412/22434 [12:45:22<6:25:07, 2.56s/it] +2025-02-05 22:53:03 - ERROR - stderr - +2025-02-05 22:53:03 - ERROR - stderr - +2025-02-05 22:53:03 - INFO - stdout - {'loss': 0.6983, 'grad_norm': 1.2386744022369385, 'learning_rate': 7.3495738009392026e-06, 'epoch': 1.79} +2025-02-05 22:53:03 - ERROR - stderr - 60%|█████▉ | 13412/22434 [12:45:22<6:25:07, 2.56s/it] +2025-02-05 22:53:05 - ERROR - stderr - 60%|█████▉ | 13413/22434 [12:45:25<6:20:54, 2.53s/it] +2025-02-05 22:53:05 - ERROR - stderr - +2025-02-05 22:53:05 - ERROR - stderr - +2025-02-05 22:53:05 - INFO - stdout - {'loss': 0.7049, 'grad_norm': 1.2220262289047241, 'learning_rate': 7.348181715089569e-06, 'epoch': 1.79} +2025-02-05 22:53:05 - ERROR - stderr - 60%|█████▉ | 13413/22434 [12:45:25<6:20:54, 2.53s/it] +2025-02-05 22:53:08 - ERROR - stderr - 60%|█████▉ | 13414/22434 [12:45:27<6:18:41, 2.52s/it] +2025-02-05 22:53:08 - ERROR - stderr - +2025-02-05 22:53:08 - ERROR - stderr - +2025-02-05 22:53:08 - INFO - stdout - {'loss': 0.6315, 'grad_norm': 1.25224769115448, 'learning_rate': 
7.34678968451455e-06, 'epoch': 1.79} +2025-02-05 22:53:08 - ERROR - stderr - 60%|█████▉ | 13414/22434 [12:45:27<6:18:41, 2.52s/it] +2025-02-05 22:53:10 - ERROR - stderr - 60%|█████▉ | 13415/22434 [12:45:30<6:21:26, 2.54s/it] +2025-02-05 22:53:10 - ERROR - stderr - +2025-02-05 22:53:10 - ERROR - stderr - +2025-02-05 22:53:10 - INFO - stdout - {'loss': 0.6284, 'grad_norm': 1.1697125434875488, 'learning_rate': 7.345397709243159e-06, 'epoch': 1.79} +2025-02-05 22:53:10 - ERROR - stderr - 60%|█████▉ | 13415/22434 [12:45:30<6:21:26, 2.54s/it] +2025-02-05 22:53:13 - ERROR - stderr - 60%|█████▉ | 13416/22434 [12:45:32<6:17:26, 2.51s/it] +2025-02-05 22:53:13 - ERROR - stderr - +2025-02-05 22:53:13 - ERROR - stderr - +2025-02-05 22:53:13 - INFO - stdout - {'loss': 0.8295, 'grad_norm': 1.559451699256897, 'learning_rate': 7.344005789304416e-06, 'epoch': 1.79} +2025-02-05 22:53:13 - ERROR - stderr - 60%|█████▉ | 13416/22434 [12:45:32<6:17:26, 2.51s/it] +2025-02-05 22:53:15 - ERROR - stderr - 60%|█████▉ | 13417/22434 [12:45:35<6:14:22, 2.49s/it] +2025-02-05 22:53:15 - ERROR - stderr - +2025-02-05 22:53:15 - ERROR - stderr - +2025-02-05 22:53:15 - INFO - stdout - {'loss': 0.6051, 'grad_norm': 1.2177834510803223, 'learning_rate': 7.3426139247273335e-06, 'epoch': 1.79} +2025-02-05 22:53:15 - ERROR - stderr - 60%|█████▉ | 13417/22434 [12:45:35<6:14:22, 2.49s/it] +2025-02-05 22:53:18 - ERROR - stderr - 60%|█████▉ | 13418/22434 [12:45:37<6:11:56, 2.48s/it] +2025-02-05 22:53:18 - ERROR - stderr - +2025-02-05 22:53:18 - ERROR - stderr - +2025-02-05 22:53:18 - INFO - stdout - {'loss': 0.6447, 'grad_norm': 1.2296241521835327, 'learning_rate': 7.3412221155409135e-06, 'epoch': 1.79} +2025-02-05 22:53:18 - ERROR - stderr - 60%|█████▉ | 13418/22434 [12:45:37<6:11:56, 2.48s/it] +2025-02-05 22:53:20 - ERROR - stderr - 60%|█████▉ | 13419/22434 [12:45:40<6:09:25, 2.46s/it] +2025-02-05 22:53:20 - ERROR - stderr - +2025-02-05 22:53:20 - ERROR - stderr - +2025-02-05 22:53:20 - INFO - stdout - 
{'loss': 0.733, 'grad_norm': 1.3108586072921753, 'learning_rate': 7.33983036177418e-06, 'epoch': 1.79} +2025-02-05 22:53:20 - ERROR - stderr - 60%|█████▉ | 13419/22434 [12:45:40<6:09:25, 2.46s/it] +2025-02-05 22:53:22 - ERROR - stderr - 60%|█████▉ | 13420/22434 [12:45:42<6:10:52, 2.47s/it] +2025-02-05 22:53:22 - ERROR - stderr - +2025-02-05 22:53:22 - ERROR - stderr - +2025-02-05 22:53:22 - INFO - stdout - {'loss': 0.652, 'grad_norm': 1.1300619840621948, 'learning_rate': 7.338438663456136e-06, 'epoch': 1.79} +2025-02-05 22:53:22 - ERROR - stderr - 60%|█████▉ | 13420/22434 [12:45:42<6:10:52, 2.47s/it] +2025-02-05 22:53:25 - ERROR - stderr - 60%|█████▉ | 13421/22434 [12:45:45<6:17:27, 2.51s/it] +2025-02-05 22:53:25 - ERROR - stderr - +2025-02-05 22:53:25 - ERROR - stderr - +2025-02-05 22:53:25 - INFO - stdout - {'loss': 0.6483, 'grad_norm': 1.2096654176712036, 'learning_rate': 7.337047020615789e-06, 'epoch': 1.79} +2025-02-05 22:53:25 - ERROR - stderr - 60%|█████▉ | 13421/22434 [12:45:45<6:17:27, 2.51s/it] +2025-02-05 22:53:27 - ERROR - stderr - 60%|█████▉ | 13422/22434 [12:45:47<6:14:43, 2.49s/it] +2025-02-05 22:53:28 - ERROR - stderr - +2025-02-05 22:53:28 - ERROR - stderr - +2025-02-05 22:53:28 - INFO - stdout - {'loss': 0.691, 'grad_norm': 1.2980238199234009, 'learning_rate': 7.335655433282151e-06, 'epoch': 1.79} +2025-02-05 22:53:28 - ERROR - stderr - 60%|█████▉ | 13422/22434 [12:45:47<6:14:43, 2.49s/it] +2025-02-05 22:53:30 - ERROR - stderr - 60%|█████▉ | 13423/22434 [12:45:50<6:16:01, 2.50s/it] +2025-02-05 22:53:30 - ERROR - stderr - +2025-02-05 22:53:30 - ERROR - stderr - +2025-02-05 22:53:30 - INFO - stdout - {'loss': 0.6168, 'grad_norm': 1.0964913368225098, 'learning_rate': 7.334263901484223e-06, 'epoch': 1.79} +2025-02-05 22:53:30 - ERROR - stderr - 60%|█████▉ | 13423/22434 [12:45:50<6:16:01, 2.50s/it] +2025-02-05 22:53:32 - ERROR - stderr - 60%|█████▉ | 13424/22434 [12:45:52<6:12:21, 2.48s/it] +2025-02-05 22:53:32 - ERROR - stderr - +2025-02-05 22:53:32 - 
ERROR - stderr - +2025-02-05 22:53:32 - INFO - stdout - {'loss': 0.7011, 'grad_norm': 1.2979035377502441, 'learning_rate': 7.332872425251017e-06, 'epoch': 1.8} +2025-02-05 22:53:32 - ERROR - stderr - 60%|█████▉ | 13424/22434 [12:45:52<6:12:21, 2.48s/it] +2025-02-05 22:53:35 - ERROR - stderr - 60%|█████▉ | 13425/22434 [12:45:55<6:12:53, 2.48s/it] +2025-02-05 22:53:35 - ERROR - stderr - +2025-02-05 22:53:35 - ERROR - stderr - +2025-02-05 22:53:35 - INFO - stdout - {'loss': 0.6275, 'grad_norm': 1.2881304025650024, 'learning_rate': 7.331481004611533e-06, 'epoch': 1.8} +2025-02-05 22:53:35 - ERROR - stderr - 60%|█████▉ | 13425/22434 [12:45:55<6:12:53, 2.48s/it] +2025-02-05 22:53:37 - ERROR - stderr - 60%|█████▉ | 13426/22434 [12:45:57<6:12:21, 2.48s/it] +2025-02-05 22:53:37 - ERROR - stderr - +2025-02-05 22:53:37 - ERROR - stderr - +2025-02-05 22:53:37 - INFO - stdout - {'loss': 0.645, 'grad_norm': 1.1788579225540161, 'learning_rate': 7.330089639594771e-06, 'epoch': 1.8} +2025-02-05 22:53:37 - ERROR - stderr - 60%|█████▉ | 13426/22434 [12:45:57<6:12:21, 2.48s/it] +2025-02-05 22:53:40 - ERROR - stderr - 60%|█████▉ | 13427/22434 [12:46:00<6:10:38, 2.47s/it] +2025-02-05 22:53:40 - ERROR - stderr - +2025-02-05 22:53:40 - ERROR - stderr - +2025-02-05 22:53:40 - INFO - stdout - {'loss': 0.7253, 'grad_norm': 1.2483694553375244, 'learning_rate': 7.328698330229738e-06, 'epoch': 1.8} +2025-02-05 22:53:40 - ERROR - stderr - 60%|█████▉ | 13427/22434 [12:46:00<6:10:38, 2.47s/it] +2025-02-05 22:53:42 - ERROR - stderr - 60%|█████▉ | 13428/22434 [12:46:02<6:10:19, 2.47s/it] +2025-02-05 22:53:42 - ERROR - stderr - +2025-02-05 22:53:42 - ERROR - stderr - +2025-02-05 22:53:42 - INFO - stdout - {'loss': 0.707, 'grad_norm': 1.3123050928115845, 'learning_rate': 7.327307076545428e-06, 'epoch': 1.8} +2025-02-05 22:53:42 - ERROR - stderr - 60%|█████▉ | 13428/22434 [12:46:02<6:10:19, 2.47s/it] +2025-02-05 22:53:45 - ERROR - stderr - 60%|█████▉ | 13429/22434 [12:46:05<6:10:51, 2.47s/it] 
+2025-02-05 22:53:45 - ERROR - stderr - +2025-02-05 22:53:45 - ERROR - stderr - +2025-02-05 22:53:45 - INFO - stdout - {'loss': 0.7124, 'grad_norm': 1.2769874334335327, 'learning_rate': 7.325915878570851e-06, 'epoch': 1.8} +2025-02-05 22:53:45 - ERROR - stderr - 60%|█████▉ | 13429/22434 [12:46:05<6:10:51, 2.47s/it] +2025-02-05 22:53:47 - ERROR - stderr - 60%|█████▉ | 13430/22434 [12:46:07<6:09:43, 2.46s/it] +2025-02-05 22:53:47 - ERROR - stderr - +2025-02-05 22:53:47 - ERROR - stderr - +2025-02-05 22:53:47 - INFO - stdout - {'loss': 0.6965, 'grad_norm': 1.2320728302001953, 'learning_rate': 7.324524736334997e-06, 'epoch': 1.8} +2025-02-05 22:53:47 - ERROR - stderr - 60%|█████▉ | 13430/22434 [12:46:07<6:09:43, 2.46s/it] +2025-02-05 22:53:50 - ERROR - stderr - 60%|█████▉ | 13431/22434 [12:46:09<6:09:44, 2.46s/it] +2025-02-05 22:53:50 - ERROR - stderr - +2025-02-05 22:53:50 - ERROR - stderr - +2025-02-05 22:53:50 - INFO - stdout - {'loss': 0.7034, 'grad_norm': 1.2586181163787842, 'learning_rate': 7.32313364986686e-06, 'epoch': 1.8} +2025-02-05 22:53:50 - ERROR - stderr - 60%|█████▉ | 13431/22434 [12:46:10<6:09:44, 2.46s/it] +2025-02-05 22:53:52 - ERROR - stderr - 60%|█████▉ | 13432/22434 [12:46:12<6:18:04, 2.52s/it] +2025-02-05 22:53:52 - ERROR - stderr - +2025-02-05 22:53:52 - ERROR - stderr - +2025-02-05 22:53:52 - INFO - stdout - {'loss': 0.746, 'grad_norm': 1.351989984512329, 'learning_rate': 7.321742619195446e-06, 'epoch': 1.8} +2025-02-05 22:53:52 - ERROR - stderr - 60%|█████▉ | 13432/22434 [12:46:12<6:18:04, 2.52s/it] +2025-02-05 22:53:55 - ERROR - stderr - 60%|█████▉ | 13433/22434 [12:46:15<6:14:42, 2.50s/it] +2025-02-05 22:53:55 - ERROR - stderr - +2025-02-05 22:53:55 - ERROR - stderr - +2025-02-05 22:53:55 - INFO - stdout - {'loss': 0.589, 'grad_norm': 1.220894694328308, 'learning_rate': 7.320351644349741e-06, 'epoch': 1.8} +2025-02-05 22:53:55 - ERROR - stderr - 60%|█████▉ | 13433/22434 [12:46:15<6:14:42, 2.50s/it] +2025-02-05 22:53:57 - ERROR - stderr - 
60%|█████▉ | 13434/22434 [12:46:17<6:15:03, 2.50s/it] +2025-02-05 22:53:57 - ERROR - stderr - +2025-02-05 22:53:57 - ERROR - stderr - +2025-02-05 22:53:57 - INFO - stdout - {'loss': 0.6312, 'grad_norm': 1.1424720287322998, 'learning_rate': 7.318960725358742e-06, 'epoch': 1.8} +2025-02-05 22:53:57 - ERROR - stderr - 60%|█████▉ | 13434/22434 [12:46:17<6:15:03, 2.50s/it] +2025-02-05 22:54:00 - ERROR - stderr - 60%|█████▉ | 13435/22434 [12:46:20<6:13:29, 2.49s/it] +2025-02-05 22:54:00 - ERROR - stderr - +2025-02-05 22:54:00 - ERROR - stderr - +2025-02-05 22:54:00 - INFO - stdout - {'loss': 0.6917, 'grad_norm': 1.2809216976165771, 'learning_rate': 7.317569862251444e-06, 'epoch': 1.8} +2025-02-05 22:54:00 - ERROR - stderr - 60%|█████▉ | 13435/22434 [12:46:20<6:13:29, 2.49s/it] +2025-02-05 22:54:02 - ERROR - stderr - 60%|█████▉ | 13436/22434 [12:46:22<6:17:55, 2.52s/it] +2025-02-05 22:54:02 - ERROR - stderr - +2025-02-05 22:54:02 - ERROR - stderr - +2025-02-05 22:54:02 - INFO - stdout - {'loss': 0.609, 'grad_norm': 1.266371488571167, 'learning_rate': 7.316179055056831e-06, 'epoch': 1.8} +2025-02-05 22:54:02 - ERROR - stderr - 60%|█████▉ | 13436/22434 [12:46:22<6:17:55, 2.52s/it] +2025-02-05 22:54:05 - ERROR - stderr - 60%|█████▉ | 13437/22434 [12:46:25<6:15:12, 2.50s/it] +2025-02-05 22:54:05 - ERROR - stderr - +2025-02-05 22:54:05 - ERROR - stderr - +2025-02-05 22:54:05 - INFO - stdout - {'loss': 0.6619, 'grad_norm': 1.2243026494979858, 'learning_rate': 7.3147883038039015e-06, 'epoch': 1.8} +2025-02-05 22:54:05 - ERROR - stderr - 60%|█████▉ | 13437/22434 [12:46:25<6:15:12, 2.50s/it] +2025-02-05 22:54:07 - ERROR - stderr - 60%|█████▉ | 13438/22434 [12:46:27<6:16:46, 2.51s/it] +2025-02-05 22:54:07 - ERROR - stderr - +2025-02-05 22:54:07 - ERROR - stderr - +2025-02-05 22:54:07 - INFO - stdout - {'loss': 0.7951, 'grad_norm': 1.3708266019821167, 'learning_rate': 7.313397608521641e-06, 'epoch': 1.8} +2025-02-05 22:54:07 - ERROR - stderr - 60%|█████▉ | 13438/22434 
[12:46:27<6:16:46, 2.51s/it] +2025-02-05 22:54:10 - ERROR - stderr - 60%|█████▉ | 13439/22434 [12:46:30<6:16:47, 2.51s/it] +2025-02-05 22:54:10 - ERROR - stderr - +2025-02-05 22:54:10 - ERROR - stderr - +2025-02-05 22:54:10 - INFO - stdout - {'loss': 0.7021, 'grad_norm': 1.2156145572662354, 'learning_rate': 7.312006969239032e-06, 'epoch': 1.8} +2025-02-05 22:54:10 - ERROR - stderr - 60%|█████▉ | 13439/22434 [12:46:30<6:16:47, 2.51s/it] +2025-02-05 22:54:12 - ERROR - stderr - 60%|█████▉ | 13440/22434 [12:46:32<6:13:19, 2.49s/it] +2025-02-05 22:54:12 - ERROR - stderr - +2025-02-05 22:54:12 - ERROR - stderr - +2025-02-05 22:54:12 - INFO - stdout - {'loss': 0.6566, 'grad_norm': 1.3105140924453735, 'learning_rate': 7.3106163859850675e-06, 'epoch': 1.8} +2025-02-05 22:54:12 - ERROR - stderr - 60%|█████▉ | 13440/22434 [12:46:32<6:13:19, 2.49s/it] +2025-02-05 22:54:15 - ERROR - stderr - 60%|█████▉ | 13441/22434 [12:46:35<6:13:12, 2.49s/it] +2025-02-05 22:54:15 - ERROR - stderr - +2025-02-05 22:54:15 - ERROR - stderr - +2025-02-05 22:54:15 - INFO - stdout - {'loss': 0.8054, 'grad_norm': 1.4145431518554688, 'learning_rate': 7.309225858788733e-06, 'epoch': 1.8} +2025-02-05 22:54:15 - ERROR - stderr - 60%|█████▉ | 13441/22434 [12:46:35<6:13:12, 2.49s/it] +2025-02-05 22:54:17 - ERROR - stderr - 60%|█████▉ | 13442/22434 [12:46:37<6:16:34, 2.51s/it] +2025-02-05 22:54:17 - ERROR - stderr - +2025-02-05 22:54:17 - ERROR - stderr - +2025-02-05 22:54:17 - INFO - stdout - {'loss': 0.7223, 'grad_norm': 1.3209199905395508, 'learning_rate': 7.307835387679007e-06, 'epoch': 1.8} +2025-02-05 22:54:17 - ERROR - stderr - 60%|█████▉ | 13442/22434 [12:46:37<6:16:34, 2.51s/it] +2025-02-05 22:54:20 - ERROR - stderr - 60%|█████▉ | 13443/22434 [12:46:40<6:17:21, 2.52s/it] +2025-02-05 22:54:20 - ERROR - stderr - +2025-02-05 22:54:20 - ERROR - stderr - +2025-02-05 22:54:20 - INFO - stdout - {'loss': 0.6453, 'grad_norm': 1.338935136795044, 'learning_rate': 7.3064449726848805e-06, 'epoch': 1.8} 
+2025-02-05 22:54:20 - ERROR - stderr - 60%|█████▉ | 13443/22434 [12:46:40<6:17:21, 2.52s/it] +2025-02-05 22:54:22 - ERROR - stderr - 60%|█████▉ | 13444/22434 [12:46:42<6:16:27, 2.51s/it] +2025-02-05 22:54:22 - ERROR - stderr - +2025-02-05 22:54:22 - ERROR - stderr - +2025-02-05 22:54:22 - INFO - stdout - {'loss': 0.7094, 'grad_norm': 1.0802689790725708, 'learning_rate': 7.305054613835326e-06, 'epoch': 1.8} +2025-02-05 22:54:22 - ERROR - stderr - 60%|█████▉ | 13444/22434 [12:46:42<6:16:27, 2.51s/it] +2025-02-05 22:54:25 - ERROR - stderr - 60%|█████▉ | 13445/22434 [12:46:45<6:16:36, 2.51s/it] +2025-02-05 22:54:25 - ERROR - stderr - +2025-02-05 22:54:25 - ERROR - stderr - +2025-02-05 22:54:25 - INFO - stdout - {'loss': 0.6033, 'grad_norm': 1.1431933641433716, 'learning_rate': 7.303664311159335e-06, 'epoch': 1.8} +2025-02-05 22:54:25 - ERROR - stderr - 60%|█████▉ | 13445/22434 [12:46:45<6:16:36, 2.51s/it] +2025-02-05 22:54:27 - ERROR - stderr - 60%|█████▉ | 13446/22434 [12:46:47<6:15:53, 2.51s/it] +2025-02-05 22:54:27 - ERROR - stderr - +2025-02-05 22:54:27 - ERROR - stderr - +2025-02-05 22:54:27 - INFO - stdout - {'loss': 0.667, 'grad_norm': 1.3665932416915894, 'learning_rate': 7.3022740646858785e-06, 'epoch': 1.8} +2025-02-05 22:54:27 - ERROR - stderr - 60%|█████▉ | 13446/22434 [12:46:47<6:15:53, 2.51s/it] +2025-02-05 22:54:30 - ERROR - stderr - 60%|█████▉ | 13447/22434 [12:46:50<6:16:44, 2.52s/it] +2025-02-05 22:54:30 - ERROR - stderr - +2025-02-05 22:54:30 - ERROR - stderr - +2025-02-05 22:54:30 - INFO - stdout - {'loss': 0.615, 'grad_norm': 1.1526343822479248, 'learning_rate': 7.300883874443935e-06, 'epoch': 1.8} +2025-02-05 22:54:30 - ERROR - stderr - 60%|█████▉ | 13447/22434 [12:46:50<6:16:44, 2.52s/it] +2025-02-05 22:54:32 - ERROR - stderr - 60%|█████▉ | 13448/22434 [12:46:52<6:17:08, 2.52s/it] +2025-02-05 22:54:33 - ERROR - stderr - +2025-02-05 22:54:33 - ERROR - stderr - +2025-02-05 22:54:33 - INFO - stdout - {'loss': 0.6817, 'grad_norm': 1.2276860475540161, 
'learning_rate': 7.299493740462489e-06, 'epoch': 1.8} +2025-02-05 22:54:33 - ERROR - stderr - 60%|█████▉ | 13448/22434 [12:46:52<6:17:08, 2.52s/it] +2025-02-05 22:54:35 - ERROR - stderr - 60%|█████▉ | 13449/22434 [12:46:55<6:14:02, 2.50s/it] +2025-02-05 22:54:35 - ERROR - stderr - +2025-02-05 22:54:35 - ERROR - stderr - +2025-02-05 22:54:35 - INFO - stdout - {'loss': 0.6688, 'grad_norm': 1.3118462562561035, 'learning_rate': 7.2981036627705116e-06, 'epoch': 1.8} +2025-02-05 22:54:35 - ERROR - stderr - 60%|█████▉ | 13449/22434 [12:46:55<6:14:02, 2.50s/it] +2025-02-05 22:54:37 - ERROR - stderr - 60%|█████▉ | 13450/22434 [12:46:57<6:15:42, 2.51s/it] +2025-02-05 22:54:38 - ERROR - stderr - +2025-02-05 22:54:38 - ERROR - stderr - +2025-02-05 22:54:38 - INFO - stdout - {'loss': 0.6919, 'grad_norm': 1.2382930517196655, 'learning_rate': 7.2967136413969745e-06, 'epoch': 1.8} +2025-02-05 22:54:38 - ERROR - stderr - 60%|█████▉ | 13450/22434 [12:46:57<6:15:42, 2.51s/it] +2025-02-05 22:54:40 - ERROR - stderr - 60%|█████▉ | 13451/22434 [12:47:00<6:15:55, 2.51s/it] +2025-02-05 22:54:40 - ERROR - stderr - +2025-02-05 22:54:40 - ERROR - stderr - +2025-02-05 22:54:40 - INFO - stdout - {'loss': 0.7169, 'grad_norm': 1.3025058507919312, 'learning_rate': 7.295323676370858e-06, 'epoch': 1.8} +2025-02-05 22:54:40 - ERROR - stderr - 60%|█████▉ | 13451/22434 [12:47:00<6:15:55, 2.51s/it] +2025-02-05 22:54:43 - ERROR - stderr - 60%|█████▉ | 13452/22434 [12:47:03<6:29:45, 2.60s/it] +2025-02-05 22:54:43 - ERROR - stderr - +2025-02-05 22:54:43 - ERROR - stderr - +2025-02-05 22:54:43 - INFO - stdout - {'loss': 0.7169, 'grad_norm': 1.358379602432251, 'learning_rate': 7.293933767721127e-06, 'epoch': 1.8} +2025-02-05 22:54:43 - ERROR - stderr - 60%|█████▉ | 13452/22434 [12:47:03<6:29:45, 2.60s/it] +2025-02-05 22:54:45 - ERROR - stderr - 60%|█████▉ | 13453/22434 [12:47:05<6:27:19, 2.59s/it] +2025-02-05 22:54:45 - ERROR - stderr - +2025-02-05 22:54:45 - ERROR - stderr - +2025-02-05 22:54:45 - INFO - 
stdout - {'loss': 0.6362, 'grad_norm': 1.1618032455444336, 'learning_rate': 7.292543915476761e-06, 'epoch': 1.8} +2025-02-05 22:54:45 - ERROR - stderr - 60%|█████▉ | 13453/22434 [12:47:05<6:27:19, 2.59s/it] +2025-02-05 22:54:48 - ERROR - stderr - 60%|█████▉ | 13454/22434 [12:47:08<6:22:07, 2.55s/it] +2025-02-05 22:54:48 - ERROR - stderr - +2025-02-05 22:54:48 - ERROR - stderr - +2025-02-05 22:54:48 - INFO - stdout - {'loss': 0.6653, 'grad_norm': 1.2676730155944824, 'learning_rate': 7.291154119666727e-06, 'epoch': 1.8} +2025-02-05 22:54:48 - ERROR - stderr - 60%|█████▉ | 13454/22434 [12:47:08<6:22:07, 2.55s/it] +2025-02-05 22:54:50 - ERROR - stderr - 60%|█████▉ | 13455/22434 [12:47:10<6:16:40, 2.52s/it] +2025-02-05 22:54:50 - ERROR - stderr - +2025-02-05 22:54:50 - ERROR - stderr - +2025-02-05 22:54:50 - INFO - stdout - {'loss': 0.7012, 'grad_norm': 1.196387529373169, 'learning_rate': 7.289764380319989e-06, 'epoch': 1.8} +2025-02-05 22:54:50 - ERROR - stderr - 60%|█████▉ | 13455/22434 [12:47:10<6:16:40, 2.52s/it] +2025-02-05 22:54:53 - ERROR - stderr - 60%|█████▉ | 13456/22434 [12:47:13<6:17:19, 2.52s/it] +2025-02-05 22:54:53 - ERROR - stderr - +2025-02-05 22:54:53 - ERROR - stderr - +2025-02-05 22:54:53 - INFO - stdout - {'loss': 0.6568, 'grad_norm': 1.2537990808486938, 'learning_rate': 7.288374697465524e-06, 'epoch': 1.8} +2025-02-05 22:54:53 - ERROR - stderr - 60%|█████▉ | 13456/22434 [12:47:13<6:17:19, 2.52s/it] +2025-02-05 22:54:55 - ERROR - stderr - 60%|█████▉ | 13457/22434 [12:47:15<6:13:54, 2.50s/it] +2025-02-05 22:54:55 - ERROR - stderr - +2025-02-05 22:54:55 - ERROR - stderr - +2025-02-05 22:54:55 - INFO - stdout - {'loss': 0.6325, 'grad_norm': 1.2401738166809082, 'learning_rate': 7.2869850711322934e-06, 'epoch': 1.8} +2025-02-05 22:54:55 - ERROR - stderr - 60%|█████▉ | 13457/22434 [12:47:15<6:13:54, 2.50s/it] +2025-02-05 22:54:58 - ERROR - stderr - 60%|█████▉ | 13458/22434 [12:47:17<6:12:38, 2.49s/it] +2025-02-05 22:54:58 - ERROR - stderr - +2025-02-05 
22:54:58 - ERROR - stderr - +2025-02-05 22:54:58 - INFO - stdout - {'loss': 0.7344, 'grad_norm': 1.2342698574066162, 'learning_rate': 7.285595501349259e-06, 'epoch': 1.8} +2025-02-05 22:54:58 - ERROR - stderr - 60%|█████▉ | 13458/22434 [12:47:18<6:12:38, 2.49s/it] +2025-02-05 22:55:00 - ERROR - stderr - 60%|█████▉ | 13459/22434 [12:47:20<6:12:57, 2.49s/it] +2025-02-05 22:55:00 - ERROR - stderr - +2025-02-05 22:55:00 - ERROR - stderr - +2025-02-05 22:55:00 - INFO - stdout - {'loss': 0.6696, 'grad_norm': 1.2173714637756348, 'learning_rate': 7.28420598814539e-06, 'epoch': 1.8} +2025-02-05 22:55:00 - ERROR - stderr - 60%|█████▉ | 13459/22434 [12:47:20<6:12:57, 2.49s/it] +2025-02-05 22:55:03 - ERROR - stderr - 60%|█████▉ | 13460/22434 [12:47:23<6:15:25, 2.51s/it] +2025-02-05 22:55:03 - ERROR - stderr - +2025-02-05 22:55:03 - ERROR - stderr - +2025-02-05 22:55:03 - INFO - stdout - {'loss': 0.6715, 'grad_norm': 1.1586909294128418, 'learning_rate': 7.282816531549648e-06, 'epoch': 1.8} +2025-02-05 22:55:03 - ERROR - stderr - 60%|█████▉ | 13460/22434 [12:47:23<6:15:25, 2.51s/it] +2025-02-05 22:55:05 - ERROR - stderr - 60%|██████ | 13461/22434 [12:47:25<6:15:35, 2.51s/it] +2025-02-05 22:55:05 - ERROR - stderr - +2025-02-05 22:55:05 - ERROR - stderr - +2025-02-05 22:55:05 - INFO - stdout - {'loss': 0.6345, 'grad_norm': 1.1464523077011108, 'learning_rate': 7.281427131590999e-06, 'epoch': 1.8} +2025-02-05 22:55:05 - ERROR - stderr - 60%|██████ | 13461/22434 [12:47:25<6:15:35, 2.51s/it] +2025-02-05 22:55:08 - ERROR - stderr - 60%|██████ | 13462/22434 [12:47:28<6:13:47, 2.50s/it] +2025-02-05 22:55:08 - ERROR - stderr - +2025-02-05 22:55:08 - ERROR - stderr - +2025-02-05 22:55:08 - INFO - stdout - {'loss': 0.6193, 'grad_norm': 1.2275243997573853, 'learning_rate': 7.2800377882984e-06, 'epoch': 1.8} +2025-02-05 22:55:08 - ERROR - stderr - 60%|██████ | 13462/22434 [12:47:28<6:13:47, 2.50s/it] +2025-02-05 22:55:10 - ERROR - stderr - 60%|██████ | 13463/22434 [12:47:30<6:13:09, 2.50s/it] 
+2025-02-05 22:55:10 - ERROR - stderr - +2025-02-05 22:55:10 - ERROR - stderr - +2025-02-05 22:55:10 - INFO - stdout - {'loss': 0.7097, 'grad_norm': 1.3102253675460815, 'learning_rate': 7.278648501700804e-06, 'epoch': 1.8} +2025-02-05 22:55:10 - ERROR - stderr - 60%|██████ | 13463/22434 [12:47:30<6:13:09, 2.50s/it] +2025-02-05 22:55:13 - ERROR - stderr - 60%|██████ | 13464/22434 [12:47:32<6:08:37, 2.47s/it] +2025-02-05 22:55:13 - ERROR - stderr - +2025-02-05 22:55:13 - ERROR - stderr - +2025-02-05 22:55:13 - INFO - stdout - {'loss': 0.7049, 'grad_norm': 1.3097261190414429, 'learning_rate': 7.277259271827184e-06, 'epoch': 1.8} +2025-02-05 22:55:13 - ERROR - stderr - 60%|██████ | 13464/22434 [12:47:32<6:08:37, 2.47s/it] +2025-02-05 22:55:15 - ERROR - stderr - 60%|██████ | 13465/22434 [12:47:35<6:10:19, 2.48s/it] +2025-02-05 22:55:15 - ERROR - stderr - +2025-02-05 22:55:15 - ERROR - stderr - +2025-02-05 22:55:15 - INFO - stdout - {'loss': 0.661, 'grad_norm': 1.2153162956237793, 'learning_rate': 7.275870098706485e-06, 'epoch': 1.8} +2025-02-05 22:55:15 - ERROR - stderr - 60%|██████ | 13465/22434 [12:47:35<6:10:19, 2.48s/it] +2025-02-05 22:55:18 - ERROR - stderr - 60%|██████ | 13466/22434 [12:47:37<6:11:36, 2.49s/it] +2025-02-05 22:55:18 - ERROR - stderr - +2025-02-05 22:55:18 - ERROR - stderr - +2025-02-05 22:55:18 - INFO - stdout - {'loss': 0.7015, 'grad_norm': 1.4036004543304443, 'learning_rate': 7.274480982367664e-06, 'epoch': 1.8} +2025-02-05 22:55:18 - ERROR - stderr - 60%|██████ | 13466/22434 [12:47:37<6:11:36, 2.49s/it] +2025-02-05 22:55:20 - ERROR - stderr - 60%|██████ | 13467/22434 [12:47:40<6:15:55, 2.52s/it] +2025-02-05 22:55:20 - ERROR - stderr - +2025-02-05 22:55:20 - ERROR - stderr - +2025-02-05 22:55:20 - INFO - stdout - {'loss': 0.6822, 'grad_norm': 1.2054928541183472, 'learning_rate': 7.273091922839686e-06, 'epoch': 1.8} +2025-02-05 22:55:20 - ERROR - stderr - 60%|██████ | 13467/22434 [12:47:40<6:15:55, 2.52s/it] +2025-02-05 22:55:23 - ERROR - stderr - 
60%|██████ | 13468/22434 [12:47:43<6:21:08, 2.55s/it] +2025-02-05 22:55:23 - ERROR - stderr - +2025-02-05 22:55:23 - ERROR - stderr - +2025-02-05 22:55:23 - INFO - stdout - {'loss': 0.6771, 'grad_norm': 1.2066320180892944, 'learning_rate': 7.271702920151491e-06, 'epoch': 1.8} +2025-02-05 22:55:23 - ERROR - stderr - 60%|██████ | 13468/22434 [12:47:43<6:21:08, 2.55s/it] +2025-02-05 22:55:25 - ERROR - stderr - 60%|██████ | 13469/22434 [12:47:45<6:21:17, 2.55s/it] +2025-02-05 22:55:25 - ERROR - stderr - +2025-02-05 22:55:25 - ERROR - stderr - +2025-02-05 22:55:25 - INFO - stdout - {'loss': 0.7551, 'grad_norm': 1.4527721405029297, 'learning_rate': 7.270313974332042e-06, 'epoch': 1.8} +2025-02-05 22:55:25 - ERROR - stderr - 60%|██████ | 13469/22434 [12:47:45<6:21:17, 2.55s/it] +2025-02-05 22:55:28 - ERROR - stderr - 60%|██████ | 13470/22434 [12:47:48<6:19:03, 2.54s/it] +2025-02-05 22:55:28 - ERROR - stderr - +2025-02-05 22:55:28 - ERROR - stderr - +2025-02-05 22:55:28 - INFO - stdout - {'loss': 0.5768, 'grad_norm': 1.0772895812988281, 'learning_rate': 7.268925085410288e-06, 'epoch': 1.8} +2025-02-05 22:55:28 - ERROR - stderr - 60%|██████ | 13470/22434 [12:47:48<6:19:03, 2.54s/it] +2025-02-05 22:55:30 - ERROR - stderr - 60%|██████ | 13471/22434 [12:47:50<6:17:43, 2.53s/it] +2025-02-05 22:55:30 - ERROR - stderr - +2025-02-05 22:55:30 - ERROR - stderr - +2025-02-05 22:55:30 - INFO - stdout - {'loss': 0.7337, 'grad_norm': 1.336227297782898, 'learning_rate': 7.26753625341517e-06, 'epoch': 1.8} +2025-02-05 22:55:30 - ERROR - stderr - 60%|██████ | 13471/22434 [12:47:50<6:17:43, 2.53s/it] +2025-02-05 22:55:33 - ERROR - stderr - 60%|██████ | 13472/22434 [12:47:53<6:15:37, 2.51s/it] +2025-02-05 22:55:33 - ERROR - stderr - +2025-02-05 22:55:33 - ERROR - stderr - +2025-02-05 22:55:33 - INFO - stdout - {'loss': 0.6589, 'grad_norm': 1.1203022003173828, 'learning_rate': 7.266147478375649e-06, 'epoch': 1.8} +2025-02-05 22:55:33 - ERROR - stderr - 60%|██████ | 13472/22434 
[12:47:53<6:15:37, 2.51s/it] +2025-02-05 22:55:35 - ERROR - stderr - 60%|██████ | 13473/22434 [12:47:55<6:15:00, 2.51s/it] +2025-02-05 22:55:35 - ERROR - stderr - +2025-02-05 22:55:35 - ERROR - stderr - +2025-02-05 22:55:35 - INFO - stdout - {'loss': 0.7092, 'grad_norm': 1.3411940336227417, 'learning_rate': 7.2647587603206695e-06, 'epoch': 1.8} +2025-02-05 22:55:35 - ERROR - stderr - 60%|██████ | 13473/22434 [12:47:55<6:15:00, 2.51s/it] +2025-02-05 22:55:38 - ERROR - stderr - 60%|██████ | 13474/22434 [12:47:58<6:13:50, 2.50s/it] +2025-02-05 22:55:38 - ERROR - stderr - +2025-02-05 22:55:38 - ERROR - stderr - +2025-02-05 22:55:38 - INFO - stdout - {'loss': 0.6827, 'grad_norm': 1.2565068006515503, 'learning_rate': 7.263370099279173e-06, 'epoch': 1.8} +2025-02-05 22:55:38 - ERROR - stderr - 60%|██████ | 13474/22434 [12:47:58<6:13:50, 2.50s/it] +2025-02-05 22:55:40 - ERROR - stderr - 60%|██████ | 13475/22434 [12:48:00<6:12:10, 2.49s/it] +2025-02-05 22:55:40 - ERROR - stderr - +2025-02-05 22:55:40 - ERROR - stderr - +2025-02-05 22:55:40 - INFO - stdout - {'loss': 0.6942, 'grad_norm': 1.2375364303588867, 'learning_rate': 7.261981495280111e-06, 'epoch': 1.8} +2025-02-05 22:55:40 - ERROR - stderr - 60%|██████ | 13475/22434 [12:48:00<6:12:10, 2.49s/it] +2025-02-05 22:55:43 - ERROR - stderr - 60%|██████ | 13476/22434 [12:48:03<6:12:03, 2.49s/it] +2025-02-05 22:55:43 - ERROR - stderr - +2025-02-05 22:55:43 - ERROR - stderr - +2025-02-05 22:55:43 - INFO - stdout - {'loss': 0.7342, 'grad_norm': 1.1999362707138062, 'learning_rate': 7.260592948352418e-06, 'epoch': 1.8} +2025-02-05 22:55:43 - ERROR - stderr - 60%|██████ | 13476/22434 [12:48:03<6:12:03, 2.49s/it] +2025-02-05 22:55:45 - ERROR - stderr - 60%|██████ | 13477/22434 [12:48:05<6:08:41, 2.47s/it] +2025-02-05 22:55:45 - ERROR - stderr - +2025-02-05 22:55:45 - ERROR - stderr - +2025-02-05 22:55:45 - INFO - stdout - {'loss': 0.7925, 'grad_norm': 1.3678416013717651, 'learning_rate': 7.259204458525051e-06, 'epoch': 1.8} 
+2025-02-05 22:55:45 - ERROR - stderr - 60%|██████ | 13477/22434 [12:48:05<6:08:41, 2.47s/it] +2025-02-05 22:55:48 - ERROR - stderr - 60%|██████ | 13478/22434 [12:48:07<6:07:48, 2.46s/it] +2025-02-05 22:55:48 - ERROR - stderr - +2025-02-05 22:55:48 - ERROR - stderr - +2025-02-05 22:55:48 - INFO - stdout - {'loss': 0.6657, 'grad_norm': 1.2612638473510742, 'learning_rate': 7.257816025826942e-06, 'epoch': 1.8} +2025-02-05 22:55:48 - ERROR - stderr - 60%|██████ | 13478/22434 [12:48:08<6:07:48, 2.46s/it] +2025-02-05 22:55:50 - ERROR - stderr - 60%|██████ | 13479/22434 [12:48:10<6:11:52, 2.49s/it] +2025-02-05 22:55:50 - ERROR - stderr - +2025-02-05 22:55:50 - ERROR - stderr - +2025-02-05 22:55:50 - INFO - stdout - {'loss': 0.7848, 'grad_norm': 1.313520073890686, 'learning_rate': 7.256427650287032e-06, 'epoch': 1.8} +2025-02-05 22:55:50 - ERROR - stderr - 60%|██████ | 13479/22434 [12:48:10<6:11:52, 2.49s/it] +2025-02-05 22:55:53 - ERROR - stderr - 60%|██████ | 13480/22434 [12:48:13<6:14:15, 2.51s/it] +2025-02-05 22:55:53 - ERROR - stderr - +2025-02-05 22:55:53 - ERROR - stderr - +2025-02-05 22:55:53 - INFO - stdout - {'loss': 0.6151, 'grad_norm': 1.2450754642486572, 'learning_rate': 7.255039331934266e-06, 'epoch': 1.8} +2025-02-05 22:55:53 - ERROR - stderr - 60%|██████ | 13480/22434 [12:48:13<6:14:15, 2.51s/it] +2025-02-05 22:55:55 - ERROR - stderr - 60%|██████ | 13481/22434 [12:48:15<6:16:06, 2.52s/it] +2025-02-05 22:55:55 - ERROR - stderr - +2025-02-05 22:55:55 - ERROR - stderr - +2025-02-05 22:55:55 - INFO - stdout - {'loss': 0.7502, 'grad_norm': 1.3240654468536377, 'learning_rate': 7.253651070797578e-06, 'epoch': 1.8} +2025-02-05 22:55:55 - ERROR - stderr - 60%|██████ | 13481/22434 [12:48:15<6:16:06, 2.52s/it] +2025-02-05 22:55:58 - ERROR - stderr - 60%|██████ | 13482/22434 [12:48:18<6:17:27, 2.53s/it] +2025-02-05 22:55:58 - ERROR - stderr - +2025-02-05 22:55:58 - ERROR - stderr - +2025-02-05 22:55:58 - INFO - stdout - {'loss': 0.6816, 'grad_norm': 1.3654778003692627, 
'learning_rate': 7.2522628669059015e-06, 'epoch': 1.8} +2025-02-05 22:55:58 - ERROR - stderr - 60%|██████ | 13482/22434 [12:48:18<6:17:27, 2.53s/it] +2025-02-05 22:56:00 - ERROR - stderr - 60%|██████ | 13483/22434 [12:48:20<6:13:52, 2.51s/it] +2025-02-05 22:56:00 - ERROR - stderr - +2025-02-05 22:56:00 - ERROR - stderr - +2025-02-05 22:56:00 - INFO - stdout - {'loss': 0.7079, 'grad_norm': 1.3863770961761475, 'learning_rate': 7.250874720288181e-06, 'epoch': 1.8} +2025-02-05 22:56:00 - ERROR - stderr - 60%|██████ | 13483/22434 [12:48:20<6:13:52, 2.51s/it] +2025-02-05 22:56:03 - ERROR - stderr - 60%|██████ | 13484/22434 [12:48:23<6:14:42, 2.51s/it] +2025-02-05 22:56:03 - ERROR - stderr - +2025-02-05 22:56:03 - ERROR - stderr - +2025-02-05 22:56:03 - INFO - stdout - {'loss': 0.6547, 'grad_norm': 1.170119285583496, 'learning_rate': 7.2494866309733414e-06, 'epoch': 1.8} +2025-02-05 22:56:03 - ERROR - stderr - 60%|██████ | 13484/22434 [12:48:23<6:14:42, 2.51s/it] +2025-02-05 22:56:05 - ERROR - stderr - 60%|██████ | 13485/22434 [12:48:25<6:13:40, 2.51s/it] +2025-02-05 22:56:05 - ERROR - stderr - +2025-02-05 22:56:05 - ERROR - stderr - +2025-02-05 22:56:05 - INFO - stdout - {'loss': 0.6977, 'grad_norm': 1.334919810295105, 'learning_rate': 7.248098598990324e-06, 'epoch': 1.8} +2025-02-05 22:56:05 - ERROR - stderr - 60%|██████ | 13485/22434 [12:48:25<6:13:40, 2.51s/it] +2025-02-05 22:56:08 - ERROR - stderr - 60%|██████ | 13486/22434 [12:48:28<6:13:15, 2.50s/it] +2025-02-05 22:56:08 - ERROR - stderr - +2025-02-05 22:56:08 - ERROR - stderr - +2025-02-05 22:56:08 - INFO - stdout - {'loss': 0.6379, 'grad_norm': 1.1556966304779053, 'learning_rate': 7.24671062436806e-06, 'epoch': 1.8} +2025-02-05 22:56:08 - ERROR - stderr - 60%|██████ | 13486/22434 [12:48:28<6:13:15, 2.50s/it] +2025-02-05 22:56:10 - ERROR - stderr - 60%|██████ | 13487/22434 [12:48:30<6:14:02, 2.51s/it] +2025-02-05 22:56:10 - ERROR - stderr - +2025-02-05 22:56:10 - ERROR - stderr - +2025-02-05 22:56:10 - INFO - 
stdout - {'loss': 0.702, 'grad_norm': 1.2706815004348755, 'learning_rate': 7.245322707135474e-06, 'epoch': 1.8} +2025-02-05 22:56:10 - ERROR - stderr - 60%|██████ | 13487/22434 [12:48:30<6:14:02, 2.51s/it] +2025-02-05 22:56:13 - ERROR - stderr - 60%|██████ | 13488/22434 [12:48:33<6:13:44, 2.51s/it] +2025-02-05 22:56:13 - ERROR - stderr - +2025-02-05 22:56:13 - ERROR - stderr - +2025-02-05 22:56:13 - INFO - stdout - {'loss': 0.7235, 'grad_norm': 1.179062843322754, 'learning_rate': 7.243934847321504e-06, 'epoch': 1.8} +2025-02-05 22:56:13 - ERROR - stderr - 60%|██████ | 13488/22434 [12:48:33<6:13:44, 2.51s/it] +2025-02-05 22:56:15 - ERROR - stderr - 60%|██████ | 13489/22434 [12:48:35<6:11:52, 2.49s/it] +2025-02-05 22:56:15 - ERROR - stderr - +2025-02-05 22:56:15 - ERROR - stderr - +2025-02-05 22:56:15 - INFO - stdout - {'loss': 0.6888, 'grad_norm': 1.3084492683410645, 'learning_rate': 7.242547044955075e-06, 'epoch': 1.8} +2025-02-05 22:56:15 - ERROR - stderr - 60%|██████ | 13489/22434 [12:48:35<6:11:52, 2.49s/it] +2025-02-05 22:56:18 - ERROR - stderr - 60%|██████ | 13490/22434 [12:48:38<6:13:59, 2.51s/it] +2025-02-05 22:56:18 - ERROR - stderr - +2025-02-05 22:56:18 - ERROR - stderr - +2025-02-05 22:56:18 - INFO - stdout - {'loss': 0.6727, 'grad_norm': 1.2412402629852295, 'learning_rate': 7.24115930006511e-06, 'epoch': 1.8} +2025-02-05 22:56:18 - ERROR - stderr - 60%|██████ | 13490/22434 [12:48:38<6:13:59, 2.51s/it] +2025-02-05 22:56:20 - ERROR - stderr - 60%|██████ | 13491/22434 [12:48:40<6:13:05, 2.50s/it] +2025-02-05 22:56:20 - ERROR - stderr - +2025-02-05 22:56:20 - ERROR - stderr - +2025-02-05 22:56:20 - INFO - stdout - {'loss': 0.6658, 'grad_norm': 1.1943401098251343, 'learning_rate': 7.2397716126805415e-06, 'epoch': 1.8} +2025-02-05 22:56:20 - ERROR - stderr - 60%|██████ | 13491/22434 [12:48:40<6:13:05, 2.50s/it] +2025-02-05 22:56:23 - ERROR - stderr - 60%|██████ | 13492/22434 [12:48:43<6:20:35, 2.55s/it] +2025-02-05 22:56:23 - ERROR - stderr - +2025-02-05 
22:56:23 - ERROR - stderr - +2025-02-05 22:56:23 - INFO - stdout - {'loss': 0.7774, 'grad_norm': 1.4427387714385986, 'learning_rate': 7.238383982830292e-06, 'epoch': 1.8} +2025-02-05 22:56:23 - ERROR - stderr - 60%|██████ | 13492/22434 [12:48:43<6:20:35, 2.55s/it] +2025-02-05 22:56:26 - ERROR - stderr - 60%|██████ | 13493/22434 [12:48:45<6:25:04, 2.58s/it] +2025-02-05 22:56:26 - ERROR - stderr - +2025-02-05 22:56:26 - ERROR - stderr - +2025-02-05 22:56:26 - INFO - stdout - {'loss': 0.6224, 'grad_norm': 1.336787462234497, 'learning_rate': 7.2369964105432884e-06, 'epoch': 1.8} +2025-02-05 22:56:26 - ERROR - stderr - 60%|██████ | 13493/22434 [12:48:46<6:25:04, 2.58s/it] +2025-02-05 22:56:28 - ERROR - stderr - 60%|██████ | 13494/22434 [12:48:48<6:21:50, 2.56s/it] +2025-02-05 22:56:28 - ERROR - stderr - +2025-02-05 22:56:28 - ERROR - stderr - +2025-02-05 22:56:28 - INFO - stdout - {'loss': 0.6012, 'grad_norm': 1.117634654045105, 'learning_rate': 7.235608895848451e-06, 'epoch': 1.8} +2025-02-05 22:56:28 - ERROR - stderr - 60%|██████ | 13494/22434 [12:48:48<6:21:50, 2.56s/it] +2025-02-05 22:56:31 - ERROR - stderr - 60%|██████ | 13495/22434 [12:48:50<6:17:34, 2.53s/it] +2025-02-05 22:56:31 - ERROR - stderr - +2025-02-05 22:56:31 - ERROR - stderr - +2025-02-05 22:56:31 - INFO - stdout - {'loss': 0.772, 'grad_norm': 1.2029128074645996, 'learning_rate': 7.2342214387746965e-06, 'epoch': 1.8} +2025-02-05 22:56:31 - ERROR - stderr - 60%|██████ | 13495/22434 [12:48:51<6:17:34, 2.53s/it] +2025-02-05 22:56:33 - ERROR - stderr - 60%|██████ | 13496/22434 [12:48:53<6:12:25, 2.50s/it] +2025-02-05 22:56:33 - ERROR - stderr - +2025-02-05 22:56:33 - ERROR - stderr - +2025-02-05 22:56:33 - INFO - stdout - {'loss': 0.6984, 'grad_norm': 1.2268115282058716, 'learning_rate': 7.232834039350954e-06, 'epoch': 1.8} +2025-02-05 22:56:33 - ERROR - stderr - 60%|██████ | 13496/22434 [12:48:53<6:12:25, 2.50s/it] +2025-02-05 22:56:36 - ERROR - stderr - 60%|██████ | 13497/22434 [12:48:56<6:30:53, 
2.62s/it] +2025-02-05 22:56:36 - ERROR - stderr - +2025-02-05 22:56:36 - ERROR - stderr - +2025-02-05 22:56:36 - INFO - stdout - {'loss': 0.6608, 'grad_norm': 1.236081600189209, 'learning_rate': 7.231446697606136e-06, 'epoch': 1.8} +2025-02-05 22:56:36 - ERROR - stderr - 60%|██████ | 13497/22434 [12:48:56<6:30:53, 2.62s/it] +2025-02-05 22:56:39 - ERROR - stderr - 60%|██████ | 13498/22434 [12:48:58<6:25:19, 2.59s/it] +2025-02-05 22:56:39 - ERROR - stderr - +2025-02-05 22:56:39 - ERROR - stderr - +2025-02-05 22:56:39 - INFO - stdout - {'loss': 0.6695, 'grad_norm': 1.248104453086853, 'learning_rate': 7.23005941356916e-06, 'epoch': 1.81} +2025-02-05 22:56:39 - ERROR - stderr - 60%|██████ | 13498/22434 [12:48:58<6:25:19, 2.59s/it] +2025-02-05 22:56:41 - ERROR - stderr - 60%|██████ | 13499/22434 [12:49:01<6:21:43, 2.56s/it] +2025-02-05 22:56:41 - ERROR - stderr - +2025-02-05 22:56:41 - ERROR - stderr - +2025-02-05 22:56:41 - INFO - stdout - {'loss': 0.7062, 'grad_norm': 1.1277081966400146, 'learning_rate': 7.22867218726895e-06, 'epoch': 1.81} +2025-02-05 22:56:41 - ERROR - stderr - 60%|██████ | 13499/22434 [12:49:01<6:21:43, 2.56s/it] +2025-02-05 22:56:44 - ERROR - stderr - 60%|██████ | 13500/22434 [12:49:03<6:18:02, 2.54s/it] +2025-02-05 22:56:44 - ERROR - stderr - +2025-02-05 22:56:44 - ERROR - stderr - +2025-02-05 22:56:44 - INFO - stdout - {'loss': 0.679, 'grad_norm': 1.2174861431121826, 'learning_rate': 7.227285018734411e-06, 'epoch': 1.81} +2025-02-05 22:56:44 - ERROR - stderr - 60%|██████ | 13500/22434 [12:49:03<6:18:02, 2.54s/it] +2025-02-05 22:56:46 - ERROR - stderr - 60%|██████ | 13501/22434 [12:49:06<6:13:17, 2.51s/it] +2025-02-05 22:56:46 - ERROR - stderr - +2025-02-05 22:56:46 - ERROR - stderr - +2025-02-05 22:56:46 - INFO - stdout - {'loss': 0.6606, 'grad_norm': 1.228413701057434, 'learning_rate': 7.225897907994468e-06, 'epoch': 1.81} +2025-02-05 22:56:46 - ERROR - stderr - 60%|██████ | 13501/22434 [12:49:06<6:13:17, 2.51s/it] +2025-02-05 22:56:48 - ERROR - 
stderr - 60%|██████ | 13502/22434 [12:49:08<6:13:47, 2.51s/it] +2025-02-05 22:56:49 - ERROR - stderr - +2025-02-05 22:56:49 - ERROR - stderr - +2025-02-05 22:56:49 - INFO - stdout - {'loss': 0.602, 'grad_norm': 1.2130722999572754, 'learning_rate': 7.224510855078027e-06, 'epoch': 1.81} +2025-02-05 22:56:49 - ERROR - stderr - 60%|██████ | 13502/22434 [12:49:08<6:13:47, 2.51s/it] +2025-02-05 22:56:51 - ERROR - stderr - 60%|██████ | 13503/22434 [12:49:11<6:11:39, 2.50s/it] +2025-02-05 22:56:51 - ERROR - stderr - +2025-02-05 22:56:51 - ERROR - stderr - +2025-02-05 22:56:51 - INFO - stdout - {'loss': 0.6946, 'grad_norm': 1.2508689165115356, 'learning_rate': 7.223123860013998e-06, 'epoch': 1.81} +2025-02-05 22:56:51 - ERROR - stderr - 60%|██████ | 13503/22434 [12:49:11<6:11:39, 2.50s/it] +2025-02-05 22:56:53 - ERROR - stderr - 60%|██████ | 13504/22434 [12:49:13<6:08:20, 2.47s/it] +2025-02-05 22:56:53 - ERROR - stderr - +2025-02-05 22:56:53 - ERROR - stderr - +2025-02-05 22:56:53 - INFO - stdout - {'loss': 0.6785, 'grad_norm': 1.2220182418823242, 'learning_rate': 7.221736922831297e-06, 'epoch': 1.81} +2025-02-05 22:56:53 - ERROR - stderr - 60%|██████ | 13504/22434 [12:49:13<6:08:20, 2.47s/it] +2025-02-05 22:56:56 - ERROR - stderr - 60%|██████ | 13505/22434 [12:49:16<6:05:52, 2.46s/it] +2025-02-05 22:56:56 - ERROR - stderr - +2025-02-05 22:56:56 - ERROR - stderr - +2025-02-05 22:56:56 - INFO - stdout - {'loss': 0.6681, 'grad_norm': 1.1811316013336182, 'learning_rate': 7.220350043558835e-06, 'epoch': 1.81} +2025-02-05 22:56:56 - ERROR - stderr - 60%|██████ | 13505/22434 [12:49:16<6:05:52, 2.46s/it] +2025-02-05 22:56:58 - ERROR - stderr - 60%|██████ | 13506/22434 [12:49:18<6:06:32, 2.46s/it] +2025-02-05 22:56:58 - ERROR - stderr - +2025-02-05 22:56:58 - ERROR - stderr - +2025-02-05 22:56:58 - INFO - stdout - {'loss': 0.7318, 'grad_norm': 1.4225716590881348, 'learning_rate': 7.21896322222551e-06, 'epoch': 1.81} +2025-02-05 22:56:58 - ERROR - stderr - 60%|██████ | 13506/22434 
[12:49:18<6:06:32, 2.46s/it] +2025-02-05 22:57:01 - ERROR - stderr - 60%|██████ | 13507/22434 [12:49:21<6:10:04, 2.49s/it] +2025-02-05 22:57:01 - ERROR - stderr - +2025-02-05 22:57:01 - ERROR - stderr - +2025-02-05 22:57:01 - INFO - stdout - {'loss': 0.68, 'grad_norm': 1.1350493431091309, 'learning_rate': 7.21757645886024e-06, 'epoch': 1.81} +2025-02-05 22:57:01 - ERROR - stderr - 60%|██████ | 13507/22434 [12:49:21<6:10:04, 2.49s/it] +2025-02-05 22:57:03 - ERROR - stderr - 60%|██████ | 13508/22434 [12:49:23<6:11:17, 2.50s/it] +2025-02-05 22:57:03 - ERROR - stderr - +2025-02-05 22:57:03 - ERROR - stderr - +2025-02-05 22:57:03 - INFO - stdout - {'loss': 0.6191, 'grad_norm': 1.212494969367981, 'learning_rate': 7.216189753491924e-06, 'epoch': 1.81} +2025-02-05 22:57:03 - ERROR - stderr - 60%|██████ | 13508/22434 [12:49:23<6:11:17, 2.50s/it] +2025-02-05 22:57:06 - ERROR - stderr - 60%|██████ | 13509/22434 [12:49:26<6:11:23, 2.50s/it] +2025-02-05 22:57:06 - ERROR - stderr - +2025-02-05 22:57:06 - ERROR - stderr - +2025-02-05 22:57:06 - INFO - stdout - {'loss': 0.6921, 'grad_norm': 1.2296123504638672, 'learning_rate': 7.214803106149471e-06, 'epoch': 1.81} +2025-02-05 22:57:06 - ERROR - stderr - 60%|██████ | 13509/22434 [12:49:26<6:11:23, 2.50s/it] +2025-02-05 22:57:08 - ERROR - stderr - 60%|██████ | 13510/22434 [12:49:28<6:12:19, 2.50s/it] +2025-02-05 22:57:08 - ERROR - stderr - +2025-02-05 22:57:08 - ERROR - stderr - +2025-02-05 22:57:08 - INFO - stdout - {'loss': 0.6705, 'grad_norm': 1.275501012802124, 'learning_rate': 7.213416516861779e-06, 'epoch': 1.81} +2025-02-05 22:57:08 - ERROR - stderr - 60%|██████ | 13510/22434 [12:49:28<6:12:19, 2.50s/it] +2025-02-05 22:57:11 - ERROR - stderr - 60%|██████ | 13511/22434 [12:49:31<6:17:18, 2.54s/it] +2025-02-05 22:57:11 - ERROR - stderr - +2025-02-05 22:57:11 - ERROR - stderr - +2025-02-05 22:57:11 - INFO - stdout - {'loss': 0.6798, 'grad_norm': 1.1218996047973633, 'learning_rate': 7.212029985657754e-06, 'epoch': 1.81} 
+2025-02-05 22:57:11 - ERROR - stderr - 60%|██████ | 13511/22434 [12:49:31<6:17:18, 2.54s/it] +2025-02-05 22:57:13 - ERROR - stderr - 60%|██████ | 13512/22434 [12:49:33<6:14:54, 2.52s/it] +2025-02-05 22:57:13 - ERROR - stderr - +2025-02-05 22:57:13 - ERROR - stderr - +2025-02-05 22:57:13 - INFO - stdout - {'loss': 0.6776, 'grad_norm': 1.1189286708831787, 'learning_rate': 7.2106435125663e-06, 'epoch': 1.81} +2025-02-05 22:57:13 - ERROR - stderr - 60%|██████ | 13512/22434 [12:49:33<6:14:54, 2.52s/it] +2025-02-05 22:57:16 - ERROR - stderr - 60%|██████ | 13513/22434 [12:49:36<6:13:13, 2.51s/it] +2025-02-05 22:57:16 - ERROR - stderr - +2025-02-05 22:57:16 - ERROR - stderr - +2025-02-05 22:57:16 - INFO - stdout - {'loss': 0.6893, 'grad_norm': 1.288316011428833, 'learning_rate': 7.2092570976163065e-06, 'epoch': 1.81} +2025-02-05 22:57:16 - ERROR - stderr - 60%|██████ | 13513/22434 [12:49:36<6:13:13, 2.51s/it] +2025-02-05 22:57:18 - ERROR - stderr - 60%|██████ | 13514/22434 [12:49:38<6:15:32, 2.53s/it] +2025-02-05 22:57:19 - ERROR - stderr - +2025-02-05 22:57:19 - ERROR - stderr - +2025-02-05 22:57:19 - INFO - stdout - {'loss': 0.7023, 'grad_norm': 1.2410317659378052, 'learning_rate': 7.207870740836684e-06, 'epoch': 1.81} +2025-02-05 22:57:19 - ERROR - stderr - 60%|██████ | 13514/22434 [12:49:38<6:15:32, 2.53s/it] +2025-02-05 22:57:21 - ERROR - stderr - 60%|██████ | 13515/22434 [12:49:41<6:16:11, 2.53s/it] +2025-02-05 22:57:21 - ERROR - stderr - +2025-02-05 22:57:21 - ERROR - stderr - +2025-02-05 22:57:21 - INFO - stdout - {'loss': 0.6149, 'grad_norm': 1.1779961585998535, 'learning_rate': 7.206484442256324e-06, 'epoch': 1.81} +2025-02-05 22:57:21 - ERROR - stderr - 60%|██████ | 13515/22434 [12:49:41<6:16:11, 2.53s/it] +2025-02-05 22:57:23 - ERROR - stderr - 60%|██████ | 13516/22434 [12:49:43<6:13:26, 2.51s/it] +2025-02-05 22:57:24 - ERROR - stderr - +2025-02-05 22:57:24 - ERROR - stderr - +2025-02-05 22:57:24 - INFO - stdout - {'loss': 0.8172, 'grad_norm': 
1.4498060941696167, 'learning_rate': 7.205098201904118e-06, 'epoch': 1.81} +2025-02-05 22:57:24 - ERROR - stderr - 60%|██████ | 13516/22434 [12:49:43<6:13:26, 2.51s/it] +2025-02-05 22:57:26 - ERROR - stderr - 60%|██████ | 13517/22434 [12:49:46<6:13:40, 2.51s/it] +2025-02-05 22:57:26 - ERROR - stderr - +2025-02-05 22:57:26 - ERROR - stderr - +2025-02-05 22:57:26 - INFO - stdout - {'loss': 0.6235, 'grad_norm': 1.1095607280731201, 'learning_rate': 7.203712019808968e-06, 'epoch': 1.81} +2025-02-05 22:57:26 - ERROR - stderr - 60%|██████ | 13517/22434 [12:49:46<6:13:40, 2.51s/it] +2025-02-05 22:57:28 - ERROR - stderr - 60%|██████ | 13518/22434 [12:49:48<6:10:49, 2.50s/it] +2025-02-05 22:57:29 - ERROR - stderr - +2025-02-05 22:57:29 - ERROR - stderr - +2025-02-05 22:57:29 - INFO - stdout - {'loss': 0.6599, 'grad_norm': 1.1724600791931152, 'learning_rate': 7.2023258959997675e-06, 'epoch': 1.81} +2025-02-05 22:57:29 - ERROR - stderr - 60%|██████ | 13518/22434 [12:49:48<6:10:49, 2.50s/it] +2025-02-05 22:57:31 - ERROR - stderr - 60%|██████ | 13519/22434 [12:49:51<6:23:56, 2.58s/it] +2025-02-05 22:57:31 - ERROR - stderr - +2025-02-05 22:57:31 - ERROR - stderr - +2025-02-05 22:57:31 - INFO - stdout - {'loss': 0.6627, 'grad_norm': 1.247000813484192, 'learning_rate': 7.200939830505402e-06, 'epoch': 1.81} +2025-02-05 22:57:31 - ERROR - stderr - 60%|██████ | 13519/22434 [12:49:51<6:23:56, 2.58s/it] +2025-02-05 22:57:34 - ERROR - stderr - 60%|██████ | 13520/22434 [12:49:54<6:19:06, 2.55s/it] +2025-02-05 22:57:34 - ERROR - stderr - +2025-02-05 22:57:34 - ERROR - stderr - +2025-02-05 22:57:34 - INFO - stdout - {'loss': 0.703, 'grad_norm': 1.4289426803588867, 'learning_rate': 7.1995538233547725e-06, 'epoch': 1.81} +2025-02-05 22:57:34 - ERROR - stderr - 60%|██████ | 13520/22434 [12:49:54<6:19:06, 2.55s/it] +2025-02-05 22:57:36 - ERROR - stderr - 60%|██████ | 13521/22434 [12:49:56<6:16:23, 2.53s/it] +2025-02-05 22:57:36 - ERROR - stderr - +2025-02-05 22:57:36 - ERROR - stderr - 
+2025-02-05 22:57:36 - INFO - stdout - {'loss': 0.7625, 'grad_norm': 1.306842565536499, 'learning_rate': 7.198167874576758e-06, 'epoch': 1.81} +2025-02-05 22:57:36 - ERROR - stderr - 60%|██████ | 13521/22434 [12:49:56<6:16:23, 2.53s/it] +2025-02-05 22:57:39 - ERROR - stderr - 60%|██████ | 13522/22434 [12:49:58<6:13:51, 2.52s/it] +2025-02-05 22:57:39 - ERROR - stderr - +2025-02-05 22:57:39 - ERROR - stderr - +2025-02-05 22:57:39 - INFO - stdout - {'loss': 0.7134, 'grad_norm': 1.2688566446304321, 'learning_rate': 7.196781984200258e-06, 'epoch': 1.81} +2025-02-05 22:57:39 - ERROR - stderr - 60%|██████ | 13522/22434 [12:49:59<6:13:51, 2.52s/it] +2025-02-05 22:57:41 - ERROR - stderr - 60%|██████ | 13523/22434 [12:50:01<6:20:35, 2.56s/it] +2025-02-05 22:57:41 - ERROR - stderr - +2025-02-05 22:57:41 - ERROR - stderr - +2025-02-05 22:57:41 - INFO - stdout - {'loss': 0.6909, 'grad_norm': 1.2211495637893677, 'learning_rate': 7.195396152254155e-06, 'epoch': 1.81} +2025-02-05 22:57:41 - ERROR - stderr - 60%|██████ | 13523/22434 [12:50:01<6:20:35, 2.56s/it] +2025-02-05 22:57:44 - ERROR - stderr - 60%|██████ | 13524/22434 [12:50:04<6:18:20, 2.55s/it] +2025-02-05 22:57:44 - ERROR - stderr - +2025-02-05 22:57:44 - ERROR - stderr - +2025-02-05 22:57:44 - INFO - stdout - {'loss': 0.7442, 'grad_norm': 1.4049979448318481, 'learning_rate': 7.194010378767333e-06, 'epoch': 1.81} +2025-02-05 22:57:44 - ERROR - stderr - 60%|██████ | 13524/22434 [12:50:04<6:18:20, 2.55s/it] +2025-02-05 22:57:46 - ERROR - stderr - 60%|██████ | 13525/22434 [12:50:06<6:15:16, 2.53s/it] +2025-02-05 22:57:46 - ERROR - stderr - +2025-02-05 22:57:46 - ERROR - stderr - +2025-02-05 22:57:46 - INFO - stdout - {'loss': 0.6858, 'grad_norm': 1.256548285484314, 'learning_rate': 7.1926246637686805e-06, 'epoch': 1.81} +2025-02-05 22:57:46 - ERROR - stderr - 60%|██████ | 13525/22434 [12:50:06<6:15:16, 2.53s/it] +2025-02-05 22:57:49 - ERROR - stderr - 60%|██████ | 13526/22434 [12:50:09<6:15:01, 2.53s/it] +2025-02-05 22:57:49 
- ERROR - stderr - +2025-02-05 22:57:49 - ERROR - stderr - +2025-02-05 22:57:49 - INFO - stdout - {'loss': 0.7627, 'grad_norm': 1.423722267150879, 'learning_rate': 7.191239007287082e-06, 'epoch': 1.81} +2025-02-05 22:57:49 - ERROR - stderr - 60%|██████ | 13526/22434 [12:50:09<6:15:01, 2.53s/it] +2025-02-05 22:57:52 - ERROR - stderr - 60%|██████ | 13527/22434 [12:50:11<6:23:09, 2.58s/it] +2025-02-05 22:57:52 - ERROR - stderr - +2025-02-05 22:57:52 - ERROR - stderr - +2025-02-05 22:57:52 - INFO - stdout - {'loss': 0.7943, 'grad_norm': 1.4026179313659668, 'learning_rate': 7.189853409351415e-06, 'epoch': 1.81} +2025-02-05 22:57:52 - ERROR - stderr - 60%|██████ | 13527/22434 [12:50:11<6:23:09, 2.58s/it] +2025-02-05 22:57:54 - ERROR - stderr - 60%|██████ | 13528/22434 [12:50:14<6:15:42, 2.53s/it] +2025-02-05 22:57:54 - ERROR - stderr - +2025-02-05 22:57:54 - ERROR - stderr - +2025-02-05 22:57:54 - INFO - stdout - {'loss': 0.6433, 'grad_norm': 1.2950092554092407, 'learning_rate': 7.188467869990569e-06, 'epoch': 1.81} +2025-02-05 22:57:54 - ERROR - stderr - 60%|██████ | 13528/22434 [12:50:14<6:15:42, 2.53s/it] +2025-02-05 22:57:56 - ERROR - stderr - 60%|██████ | 13529/22434 [12:50:16<6:13:03, 2.51s/it] +2025-02-05 22:57:57 - ERROR - stderr - +2025-02-05 22:57:57 - ERROR - stderr - +2025-02-05 22:57:57 - INFO - stdout - {'loss': 0.6758, 'grad_norm': 1.1776596307754517, 'learning_rate': 7.187082389233415e-06, 'epoch': 1.81} +2025-02-05 22:57:57 - ERROR - stderr - 60%|██████ | 13529/22434 [12:50:16<6:13:03, 2.51s/it] +2025-02-05 22:57:59 - ERROR - stderr - 60%|██████ | 13530/22434 [12:50:19<6:15:56, 2.53s/it] +2025-02-05 22:57:59 - ERROR - stderr - +2025-02-05 22:57:59 - ERROR - stderr - +2025-02-05 22:57:59 - INFO - stdout - {'loss': 0.583, 'grad_norm': 1.200979232788086, 'learning_rate': 7.18569696710884e-06, 'epoch': 1.81} +2025-02-05 22:57:59 - ERROR - stderr - 60%|██████ | 13530/22434 [12:50:19<6:15:56, 2.53s/it] +2025-02-05 22:58:02 - ERROR - stderr - 60%|██████ | 
13531/22434 [12:50:21<6:17:35, 2.54s/it] +2025-02-05 22:58:02 - ERROR - stderr - +2025-02-05 22:58:02 - ERROR - stderr - +2025-02-05 22:58:02 - INFO - stdout - {'loss': 0.6899, 'grad_norm': 1.2164534330368042, 'learning_rate': 7.184311603645719e-06, 'epoch': 1.81} +2025-02-05 22:58:02 - ERROR - stderr - 60%|██████ | 13531/22434 [12:50:21<6:17:35, 2.54s/it] +2025-02-05 22:58:04 - ERROR - stderr - 60%|██████ | 13532/22434 [12:50:24<6:15:19, 2.53s/it] +2025-02-05 22:58:04 - ERROR - stderr - +2025-02-05 22:58:04 - ERROR - stderr - +2025-02-05 22:58:04 - INFO - stdout - {'loss': 0.7115, 'grad_norm': 1.2490911483764648, 'learning_rate': 7.1829262988729265e-06, 'epoch': 1.81} +2025-02-05 22:58:04 - ERROR - stderr - 60%|██████ | 13532/22434 [12:50:24<6:15:19, 2.53s/it] +2025-02-05 22:58:07 - ERROR - stderr - 60%|██████ | 13533/22434 [12:50:26<6:15:52, 2.53s/it] +2025-02-05 22:58:07 - ERROR - stderr - +2025-02-05 22:58:07 - ERROR - stderr - +2025-02-05 22:58:07 - INFO - stdout - {'loss': 0.675, 'grad_norm': 1.1914219856262207, 'learning_rate': 7.181541052819343e-06, 'epoch': 1.81} +2025-02-05 22:58:07 - ERROR - stderr - 60%|██████ | 13533/22434 [12:50:26<6:15:52, 2.53s/it] +2025-02-05 22:58:09 - ERROR - stderr - 60%|██████ | 13534/22434 [12:50:29<6:12:18, 2.51s/it] +2025-02-05 22:58:09 - ERROR - stderr - +2025-02-05 22:58:09 - ERROR - stderr - +2025-02-05 22:58:09 - INFO - stdout - {'loss': 0.7068, 'grad_norm': 1.304665207862854, 'learning_rate': 7.18015586551384e-06, 'epoch': 1.81} +2025-02-05 22:58:09 - ERROR - stderr - 60%|██████ | 13534/22434 [12:50:29<6:12:18, 2.51s/it] +2025-02-05 22:58:12 - ERROR - stderr - 60%|██████ | 13535/22434 [12:50:31<6:09:43, 2.49s/it] +2025-02-05 22:58:12 - ERROR - stderr - +2025-02-05 22:58:12 - ERROR - stderr - +2025-02-05 22:58:12 - INFO - stdout - {'loss': 0.7533, 'grad_norm': 1.3632184267044067, 'learning_rate': 7.1787707369852835e-06, 'epoch': 1.81} +2025-02-05 22:58:12 - ERROR - stderr - 60%|██████ | 13535/22434 [12:50:31<6:09:43, 
2.49s/it] +2025-02-05 22:58:14 - ERROR - stderr - 60%|██████ | 13536/22434 [12:50:34<6:10:23, 2.50s/it] +2025-02-05 22:58:14 - ERROR - stderr - +2025-02-05 22:58:14 - ERROR - stderr - +2025-02-05 22:58:14 - INFO - stdout - {'loss': 0.6986, 'grad_norm': 1.2004295587539673, 'learning_rate': 7.1773856672625555e-06, 'epoch': 1.81} +2025-02-05 22:58:14 - ERROR - stderr - 60%|██████ | 13536/22434 [12:50:34<6:10:23, 2.50s/it] +2025-02-05 22:58:17 - ERROR - stderr - 60%|██████ | 13537/22434 [12:50:36<6:07:58, 2.48s/it] +2025-02-05 22:58:17 - ERROR - stderr - +2025-02-05 22:58:17 - ERROR - stderr - +2025-02-05 22:58:17 - INFO - stdout - {'loss': 0.6908, 'grad_norm': 1.3901088237762451, 'learning_rate': 7.17600065637452e-06, 'epoch': 1.81} +2025-02-05 22:58:17 - ERROR - stderr - 60%|██████ | 13537/22434 [12:50:36<6:07:58, 2.48s/it] +2025-02-05 22:58:19 - ERROR - stderr - 60%|██████ | 13538/22434 [12:50:39<6:04:32, 2.46s/it] +2025-02-05 22:58:19 - ERROR - stderr - +2025-02-05 22:58:19 - ERROR - stderr - +2025-02-05 22:58:19 - INFO - stdout - {'loss': 0.7661, 'grad_norm': 1.3509643077850342, 'learning_rate': 7.17461570435005e-06, 'epoch': 1.81} +2025-02-05 22:58:19 - ERROR - stderr - 60%|██████ | 13538/22434 [12:50:39<6:04:32, 2.46s/it] +2025-02-05 22:58:22 - ERROR - stderr - 60%|██████ | 13539/22434 [12:50:41<6:10:07, 2.50s/it] +2025-02-05 22:58:22 - ERROR - stderr - +2025-02-05 22:58:22 - ERROR - stderr - +2025-02-05 22:58:22 - INFO - stdout - {'loss': 0.6553, 'grad_norm': 1.2957826852798462, 'learning_rate': 7.173230811218015e-06, 'epoch': 1.81} +2025-02-05 22:58:22 - ERROR - stderr - 60%|██████ | 13539/22434 [12:50:41<6:10:07, 2.50s/it] +2025-02-05 22:58:24 - ERROR - stderr - 60%|██████ | 13540/22434 [12:50:44<6:09:37, 2.49s/it] +2025-02-05 22:58:24 - ERROR - stderr - +2025-02-05 22:58:24 - ERROR - stderr - +2025-02-05 22:58:24 - INFO - stdout - {'loss': 0.742, 'grad_norm': 1.2563608884811401, 'learning_rate': 7.1718459770072725e-06, 'epoch': 1.81} +2025-02-05 22:58:24 - 
ERROR - stderr - 60%|██████ | 13540/22434 [12:50:44<6:09:37, 2.49s/it] +2025-02-05 22:58:26 - ERROR - stderr - 60%|██████ | 13541/22434 [12:50:46<6:06:28, 2.47s/it] +2025-02-05 22:58:26 - ERROR - stderr - +2025-02-05 22:58:26 - ERROR - stderr - +2025-02-05 22:58:26 - INFO - stdout - {'loss': 0.7392, 'grad_norm': 1.318854570388794, 'learning_rate': 7.1704612017467014e-06, 'epoch': 1.81} +2025-02-05 22:58:26 - ERROR - stderr - 60%|██████ | 13541/22434 [12:50:46<6:06:28, 2.47s/it] +2025-02-05 22:58:29 - ERROR - stderr - 60%|██████ | 13542/22434 [12:50:49<6:11:11, 2.50s/it] +2025-02-05 22:58:29 - ERROR - stderr - +2025-02-05 22:58:29 - ERROR - stderr - +2025-02-05 22:58:29 - INFO - stdout - {'loss': 0.644, 'grad_norm': 1.1652617454528809, 'learning_rate': 7.169076485465154e-06, 'epoch': 1.81} +2025-02-05 22:58:29 - ERROR - stderr - 60%|██████ | 13542/22434 [12:50:49<6:11:11, 2.50s/it] +2025-02-05 22:58:31 - ERROR - stderr - 60%|██████ | 13543/22434 [12:50:51<6:08:34, 2.49s/it] +2025-02-05 22:58:32 - ERROR - stderr - +2025-02-05 22:58:32 - ERROR - stderr - +2025-02-05 22:58:32 - INFO - stdout - {'loss': 0.6091, 'grad_norm': 1.1651017665863037, 'learning_rate': 7.167691828191498e-06, 'epoch': 1.81} +2025-02-05 22:58:32 - ERROR - stderr - 60%|██████ | 13543/22434 [12:50:51<6:08:34, 2.49s/it] +2025-02-05 22:58:34 - ERROR - stderr - 60%|██████ | 13544/22434 [12:50:54<6:06:01, 2.47s/it] +2025-02-05 22:58:34 - ERROR - stderr - +2025-02-05 22:58:34 - ERROR - stderr - +2025-02-05 22:58:34 - INFO - stdout - {'loss': 0.627, 'grad_norm': 1.2813012599945068, 'learning_rate': 7.166307229954599e-06, 'epoch': 1.81} +2025-02-05 22:58:34 - ERROR - stderr - 60%|██████ | 13544/22434 [12:50:54<6:06:01, 2.47s/it] +2025-02-05 22:58:36 - ERROR - stderr - 60%|██████ | 13545/22434 [12:50:56<6:05:07, 2.46s/it] +2025-02-05 22:58:36 - ERROR - stderr - +2025-02-05 22:58:36 - ERROR - stderr - +2025-02-05 22:58:36 - INFO - stdout - {'loss': 0.7271, 'grad_norm': 1.1741511821746826, 'learning_rate': 
7.16492269078331e-06, 'epoch': 1.81} +2025-02-05 22:58:36 - ERROR - stderr - 60%|██████ | 13545/22434 [12:50:56<6:05:07, 2.46s/it] +2025-02-05 22:58:39 - ERROR - stderr - 60%|██████ | 13546/22434 [12:50:59<6:08:48, 2.49s/it] +2025-02-05 22:58:39 - ERROR - stderr - +2025-02-05 22:58:39 - ERROR - stderr - +2025-02-05 22:58:39 - INFO - stdout - {'loss': 0.6621, 'grad_norm': 1.3188551664352417, 'learning_rate': 7.1635382107065e-06, 'epoch': 1.81} +2025-02-05 22:58:39 - ERROR - stderr - 60%|██████ | 13546/22434 [12:50:59<6:08:48, 2.49s/it] +2025-02-05 22:58:41 - ERROR - stderr - 60%|██████ | 13547/22434 [12:51:01<6:11:46, 2.51s/it] +2025-02-05 22:58:41 - ERROR - stderr - +2025-02-05 22:58:41 - ERROR - stderr - +2025-02-05 22:58:41 - INFO - stdout - {'loss': 0.6944, 'grad_norm': 1.202091097831726, 'learning_rate': 7.1621537897530205e-06, 'epoch': 1.81} +2025-02-05 22:58:41 - ERROR - stderr - 60%|██████ | 13547/22434 [12:51:01<6:11:46, 2.51s/it] +2025-02-05 22:58:44 - ERROR - stderr - 60%|██████ | 13548/22434 [12:51:04<6:15:57, 2.54s/it] +2025-02-05 22:58:44 - ERROR - stderr - +2025-02-05 22:58:44 - ERROR - stderr - +2025-02-05 22:58:44 - INFO - stdout - {'loss': 0.6318, 'grad_norm': 1.2145463228225708, 'learning_rate': 7.160769427951726e-06, 'epoch': 1.81} +2025-02-05 22:58:44 - ERROR - stderr - 60%|██████ | 13548/22434 [12:51:04<6:15:57, 2.54s/it] +2025-02-05 22:58:47 - ERROR - stderr - 60%|██████ | 13549/22434 [12:51:06<6:12:04, 2.51s/it] +2025-02-05 22:58:47 - ERROR - stderr - +2025-02-05 22:58:47 - ERROR - stderr - +2025-02-05 22:58:47 - INFO - stdout - {'loss': 0.7038, 'grad_norm': 1.2254374027252197, 'learning_rate': 7.159385125331478e-06, 'epoch': 1.81} +2025-02-05 22:58:47 - ERROR - stderr - 60%|██████ | 13549/22434 [12:51:06<6:12:04, 2.51s/it] +2025-02-05 22:58:49 - ERROR - stderr - 60%|██████ | 13550/22434 [12:51:09<6:13:19, 2.52s/it] +2025-02-05 22:58:49 - ERROR - stderr - +2025-02-05 22:58:49 - ERROR - stderr - +2025-02-05 22:58:49 - INFO - stdout - {'loss': 
0.6637, 'grad_norm': 1.1796205043792725, 'learning_rate': 7.158000881921131e-06, 'epoch': 1.81} +2025-02-05 22:58:49 - ERROR - stderr - 60%|██████ | 13550/22434 [12:51:09<6:13:19, 2.52s/it] +2025-02-05 22:58:52 - ERROR - stderr - 60%|██████ | 13551/22434 [12:51:11<6:11:57, 2.51s/it] +2025-02-05 22:58:52 - ERROR - stderr - +2025-02-05 22:58:52 - ERROR - stderr - +2025-02-05 22:58:52 - INFO - stdout - {'loss': 0.7709, 'grad_norm': 1.4149168729782104, 'learning_rate': 7.156616697749532e-06, 'epoch': 1.81} +2025-02-05 22:58:52 - ERROR - stderr - 60%|██████ | 13551/22434 [12:51:11<6:11:57, 2.51s/it] +2025-02-05 22:58:54 - ERROR - stderr - 60%|██████ | 13552/22434 [12:51:14<6:10:33, 2.50s/it] +2025-02-05 22:58:54 - ERROR - stderr - +2025-02-05 22:58:54 - ERROR - stderr - +2025-02-05 22:58:54 - INFO - stdout - {'loss': 0.6207, 'grad_norm': 1.178280234336853, 'learning_rate': 7.155232572845541e-06, 'epoch': 1.81} +2025-02-05 22:58:54 - ERROR - stderr - 60%|██████ | 13552/22434 [12:51:14<6:10:33, 2.50s/it] +2025-02-05 22:58:57 - ERROR - stderr - 60%|██████ | 13553/22434 [12:51:16<6:10:10, 2.50s/it] +2025-02-05 22:58:57 - ERROR - stderr - +2025-02-05 22:58:57 - ERROR - stderr - +2025-02-05 22:58:57 - INFO - stdout - {'loss': 0.7183, 'grad_norm': 1.2755999565124512, 'learning_rate': 7.153848507238002e-06, 'epoch': 1.81} +2025-02-05 22:58:57 - ERROR - stderr - 60%|██████ | 13553/22434 [12:51:16<6:10:10, 2.50s/it] +2025-02-05 22:58:59 - ERROR - stderr - 60%|██████ | 13554/22434 [12:51:19<6:06:43, 2.48s/it] +2025-02-05 22:58:59 - ERROR - stderr - +2025-02-05 22:58:59 - ERROR - stderr - +2025-02-05 22:58:59 - INFO - stdout - {'loss': 0.6586, 'grad_norm': 1.355187177658081, 'learning_rate': 7.152464500955769e-06, 'epoch': 1.81} +2025-02-05 22:58:59 - ERROR - stderr - 60%|██████ | 13554/22434 [12:51:19<6:06:43, 2.48s/it] +2025-02-05 22:59:02 - ERROR - stderr - 60%|██████ | 13555/22434 [12:51:21<6:10:56, 2.51s/it] +2025-02-05 22:59:02 - ERROR - stderr - +2025-02-05 22:59:02 - ERROR 
- stderr - +2025-02-05 22:59:02 - INFO - stdout - {'loss': 0.7022, 'grad_norm': 1.2713872194290161, 'learning_rate': 7.151080554027688e-06, 'epoch': 1.81} +2025-02-05 22:59:02 - ERROR - stderr - 60%|██████ | 13555/22434 [12:51:21<6:10:56, 2.51s/it] +2025-02-05 22:59:04 - ERROR - stderr - 60%|██████ | 13556/22434 [12:51:24<6:08:20, 2.49s/it] +2025-02-05 22:59:04 - ERROR - stderr - +2025-02-05 22:59:04 - ERROR - stderr - +2025-02-05 22:59:04 - INFO - stdout - {'loss': 0.6592, 'grad_norm': 1.1352962255477905, 'learning_rate': 7.149696666482607e-06, 'epoch': 1.81} +2025-02-05 22:59:04 - ERROR - stderr - 60%|██████ | 13556/22434 [12:51:24<6:08:20, 2.49s/it] +2025-02-05 22:59:07 - ERROR - stderr - 60%|██████ | 13557/22434 [12:51:26<6:10:30, 2.50s/it] +2025-02-05 22:59:07 - ERROR - stderr - +2025-02-05 22:59:07 - ERROR - stderr - +2025-02-05 22:59:07 - INFO - stdout - {'loss': 0.688, 'grad_norm': 1.3762716054916382, 'learning_rate': 7.1483128383493715e-06, 'epoch': 1.81} +2025-02-05 22:59:07 - ERROR - stderr - 60%|██████ | 13557/22434 [12:51:26<6:10:30, 2.50s/it] +2025-02-05 22:59:09 - ERROR - stderr - 60%|██████ | 13558/22434 [12:51:29<6:09:35, 2.50s/it] +2025-02-05 22:59:09 - ERROR - stderr - +2025-02-05 22:59:09 - ERROR - stderr - +2025-02-05 22:59:09 - INFO - stdout - {'loss': 0.668, 'grad_norm': 1.276595950126648, 'learning_rate': 7.146929069656828e-06, 'epoch': 1.81} +2025-02-05 22:59:09 - ERROR - stderr - 60%|██████ | 13558/22434 [12:51:29<6:09:35, 2.50s/it] +2025-02-05 22:59:11 - ERROR - stderr - 60%|██████ | 13559/22434 [12:51:31<6:08:28, 2.49s/it] +2025-02-05 22:59:12 - ERROR - stderr - +2025-02-05 22:59:12 - ERROR - stderr - +2025-02-05 22:59:12 - INFO - stdout - {'loss': 0.6438, 'grad_norm': 1.1672265529632568, 'learning_rate': 7.1455453604338145e-06, 'epoch': 1.81} +2025-02-05 22:59:12 - ERROR - stderr - 60%|██████ | 13559/22434 [12:51:31<6:08:28, 2.49s/it] +2025-02-05 22:59:14 - ERROR - stderr - 60%|██████ | 13560/22434 [12:51:34<6:08:28, 2.49s/it] 
+2025-02-05 22:59:14 - ERROR - stderr - +2025-02-05 22:59:14 - ERROR - stderr - +2025-02-05 22:59:14 - INFO - stdout - {'loss': 0.6345, 'grad_norm': 1.2830318212509155, 'learning_rate': 7.144161710709179e-06, 'epoch': 1.81} +2025-02-05 22:59:14 - ERROR - stderr - 60%|██████ | 13560/22434 [12:51:34<6:08:28, 2.49s/it] +2025-02-05 22:59:16 - ERROR - stderr - 60%|██████ | 13561/22434 [12:51:36<6:09:33, 2.50s/it] +2025-02-05 22:59:17 - ERROR - stderr - +2025-02-05 22:59:17 - ERROR - stderr - +2025-02-05 22:59:17 - INFO - stdout - {'loss': 0.667, 'grad_norm': 1.1646220684051514, 'learning_rate': 7.142778120511758e-06, 'epoch': 1.81} +2025-02-05 22:59:17 - ERROR - stderr - 60%|██████ | 13561/22434 [12:51:36<6:09:33, 2.50s/it] +2025-02-05 22:59:19 - ERROR - stderr - 60%|██████ | 13562/22434 [12:51:39<6:21:52, 2.58s/it] +2025-02-05 22:59:19 - ERROR - stderr - +2025-02-05 22:59:19 - ERROR - stderr - +2025-02-05 22:59:19 - INFO - stdout - {'loss': 0.6573, 'grad_norm': 1.1093212366104126, 'learning_rate': 7.141394589870393e-06, 'epoch': 1.81} +2025-02-05 22:59:19 - ERROR - stderr - 60%|██████ | 13562/22434 [12:51:39<6:21:52, 2.58s/it] +2025-02-05 22:59:22 - ERROR - stderr - 60%|██████ | 13563/22434 [12:51:42<6:20:36, 2.57s/it] +2025-02-05 22:59:22 - ERROR - stderr - +2025-02-05 22:59:22 - ERROR - stderr - +2025-02-05 22:59:22 - INFO - stdout - {'loss': 0.7157, 'grad_norm': 1.3103210926055908, 'learning_rate': 7.140011118813925e-06, 'epoch': 1.81} +2025-02-05 22:59:22 - ERROR - stderr - 60%|██████ | 13563/22434 [12:51:42<6:20:36, 2.57s/it] +2025-02-05 22:59:24 - ERROR - stderr - 60%|██████ | 13564/22434 [12:51:44<6:24:58, 2.60s/it] +2025-02-05 22:59:25 - ERROR - stderr - +2025-02-05 22:59:25 - ERROR - stderr - +2025-02-05 22:59:25 - INFO - stdout - {'loss': 0.7328, 'grad_norm': 1.2983509302139282, 'learning_rate': 7.1386277073711855e-06, 'epoch': 1.81} +2025-02-05 22:59:25 - ERROR - stderr - 60%|██████ | 13564/22434 [12:51:44<6:24:58, 2.60s/it] +2025-02-05 22:59:27 - ERROR - 
stderr - 60%|██████ | 13565/22434 [12:51:47<6:20:07, 2.57s/it] +2025-02-05 22:59:27 - ERROR - stderr - +2025-02-05 22:59:27 - ERROR - stderr - +2025-02-05 22:59:27 - INFO - stdout - {'loss': 0.6582, 'grad_norm': 1.2334569692611694, 'learning_rate': 7.1372443555710155e-06, 'epoch': 1.81} +2025-02-05 22:59:27 - ERROR - stderr - 60%|██████ | 13565/22434 [12:51:47<6:20:07, 2.57s/it] +2025-02-05 22:59:30 - ERROR - stderr - 60%|██████ | 13566/22434 [12:51:49<6:18:57, 2.56s/it] +2025-02-05 22:59:30 - ERROR - stderr - +2025-02-05 22:59:30 - ERROR - stderr - +2025-02-05 22:59:30 - INFO - stdout - {'loss': 0.6092, 'grad_norm': 1.2476240396499634, 'learning_rate': 7.13586106344225e-06, 'epoch': 1.81} +2025-02-05 22:59:30 - ERROR - stderr - 60%|██████ | 13566/22434 [12:51:49<6:18:57, 2.56s/it] +2025-02-05 22:59:32 - ERROR - stderr - 60%|██████ | 13567/22434 [12:51:52<6:16:34, 2.55s/it] +2025-02-05 22:59:32 - ERROR - stderr - +2025-02-05 22:59:32 - ERROR - stderr - +2025-02-05 22:59:32 - INFO - stdout - {'loss': 0.6851, 'grad_norm': 1.262091040611267, 'learning_rate': 7.134477831013714e-06, 'epoch': 1.81} +2025-02-05 22:59:32 - ERROR - stderr - 60%|██████ | 13567/22434 [12:51:52<6:16:34, 2.55s/it] +2025-02-05 22:59:34 - ERROR - stderr - 60%|██████ | 13568/22434 [12:51:54<6:12:25, 2.52s/it] +2025-02-05 22:59:35 - ERROR - stderr - +2025-02-05 22:59:35 - ERROR - stderr - +2025-02-05 22:59:35 - INFO - stdout - {'loss': 0.6426, 'grad_norm': 1.253456950187683, 'learning_rate': 7.133094658314248e-06, 'epoch': 1.81} +2025-02-05 22:59:35 - ERROR - stderr - 60%|██████ | 13568/22434 [12:51:54<6:12:25, 2.52s/it] +2025-02-05 22:59:37 - ERROR - stderr - 60%|██████ | 13569/22434 [12:51:57<6:13:43, 2.53s/it] +2025-02-05 22:59:37 - ERROR - stderr - +2025-02-05 22:59:37 - ERROR - stderr - +2025-02-05 22:59:37 - INFO - stdout - {'loss': 0.6238, 'grad_norm': 1.1792532205581665, 'learning_rate': 7.1317115453726815e-06, 'epoch': 1.81} +2025-02-05 22:59:37 - ERROR - stderr - 60%|██████ | 13569/22434 
[12:51:57<6:13:43, 2.53s/it] +2025-02-05 22:59:40 - ERROR - stderr - 60%|██████ | 13570/22434 [12:51:59<6:14:34, 2.54s/it] +2025-02-05 22:59:40 - ERROR - stderr - +2025-02-05 22:59:40 - ERROR - stderr - +2025-02-05 22:59:40 - INFO - stdout - {'loss': 0.6266, 'grad_norm': 1.2162641286849976, 'learning_rate': 7.130328492217841e-06, 'epoch': 1.81} +2025-02-05 22:59:40 - ERROR - stderr - 60%|██████ | 13570/22434 [12:51:59<6:14:34, 2.54s/it] +2025-02-05 22:59:42 - ERROR - stderr - 60%|██████ | 13571/22434 [12:52:02<6:13:21, 2.53s/it] +2025-02-05 22:59:42 - ERROR - stderr - +2025-02-05 22:59:42 - ERROR - stderr - +2025-02-05 22:59:42 - INFO - stdout - {'loss': 0.6895, 'grad_norm': 1.1827529668807983, 'learning_rate': 7.128945498878562e-06, 'epoch': 1.81} +2025-02-05 22:59:42 - ERROR - stderr - 60%|██████ | 13571/22434 [12:52:02<6:13:21, 2.53s/it] +2025-02-05 22:59:45 - ERROR - stderr - 60%|██████ | 13572/22434 [12:52:04<6:10:02, 2.51s/it] +2025-02-05 22:59:45 - ERROR - stderr - +2025-02-05 22:59:45 - ERROR - stderr - +2025-02-05 22:59:45 - INFO - stdout - {'loss': 0.6891, 'grad_norm': 1.2661943435668945, 'learning_rate': 7.127562565383661e-06, 'epoch': 1.81} +2025-02-05 22:59:45 - ERROR - stderr - 60%|██████ | 13572/22434 [12:52:04<6:10:02, 2.51s/it] +2025-02-05 22:59:47 - ERROR - stderr - 61%|██████ | 13573/22434 [12:52:07<6:12:50, 2.52s/it] +2025-02-05 22:59:47 - ERROR - stderr - +2025-02-05 22:59:47 - ERROR - stderr - +2025-02-05 22:59:47 - INFO - stdout - {'loss': 0.6605, 'grad_norm': 1.1818904876708984, 'learning_rate': 7.1261796917619745e-06, 'epoch': 1.82} +2025-02-05 22:59:47 - ERROR - stderr - 61%|██████ | 13573/22434 [12:52:07<6:12:50, 2.52s/it] +2025-02-05 22:59:50 - ERROR - stderr - 61%|██████ | 13574/22434 [12:52:10<6:20:46, 2.58s/it] +2025-02-05 22:59:50 - ERROR - stderr - +2025-02-05 22:59:50 - ERROR - stderr - +2025-02-05 22:59:50 - INFO - stdout - {'loss': 0.7192, 'grad_norm': 1.2967522144317627, 'learning_rate': 7.124796878042319e-06, 'epoch': 1.82} 
+2025-02-05 22:59:50 - ERROR - stderr - 61%|██████ | 13574/22434 [12:52:10<6:20:46, 2.58s/it] +2025-02-05 22:59:52 - ERROR - stderr - 61%|██████ | 13575/22434 [12:52:12<6:21:57, 2.59s/it] +2025-02-05 22:59:52 - ERROR - stderr - +2025-02-05 22:59:52 - ERROR - stderr - +2025-02-05 22:59:52 - INFO - stdout - {'loss': 0.66, 'grad_norm': 1.227283239364624, 'learning_rate': 7.123414124253522e-06, 'epoch': 1.82} +2025-02-05 22:59:52 - ERROR - stderr - 61%|██████ | 13575/22434 [12:52:12<6:21:57, 2.59s/it] +2025-02-05 22:59:55 - ERROR - stderr - 61%|██████ | 13576/22434 [12:52:15<6:20:26, 2.58s/it] +2025-02-05 22:59:55 - ERROR - stderr - +2025-02-05 22:59:55 - ERROR - stderr - +2025-02-05 22:59:55 - INFO - stdout - {'loss': 0.6852, 'grad_norm': 1.238958716392517, 'learning_rate': 7.122031430424406e-06, 'epoch': 1.82} +2025-02-05 22:59:55 - ERROR - stderr - 61%|██████ | 13576/22434 [12:52:15<6:20:26, 2.58s/it] +2025-02-05 22:59:58 - ERROR - stderr - 61%|██████ | 13577/22434 [12:52:17<6:18:46, 2.57s/it] +2025-02-05 22:59:58 - ERROR - stderr - +2025-02-05 22:59:58 - ERROR - stderr - +2025-02-05 22:59:58 - INFO - stdout - {'loss': 0.6643, 'grad_norm': 1.2863264083862305, 'learning_rate': 7.120648796583789e-06, 'epoch': 1.82} +2025-02-05 22:59:58 - ERROR - stderr - 61%|██████ | 13577/22434 [12:52:17<6:18:46, 2.57s/it] +2025-02-05 23:00:00 - ERROR - stderr - 61%|██████ | 13578/22434 [12:52:20<6:15:32, 2.54s/it] +2025-02-05 23:00:00 - ERROR - stderr - +2025-02-05 23:00:00 - ERROR - stderr - +2025-02-05 23:00:00 - INFO - stdout - {'loss': 0.7014, 'grad_norm': 1.280329942703247, 'learning_rate': 7.119266222760494e-06, 'epoch': 1.82} +2025-02-05 23:00:00 - ERROR - stderr - 61%|██████ | 13578/22434 [12:52:20<6:15:32, 2.54s/it] +2025-02-05 23:00:03 - ERROR - stderr - 61%|██████ | 13579/22434 [12:52:22<6:15:01, 2.54s/it] +2025-02-05 23:00:03 - ERROR - stderr - +2025-02-05 23:00:03 - ERROR - stderr - +2025-02-05 23:00:03 - INFO - stdout - {'loss': 0.6836, 'grad_norm': 1.2752056121826172, 
'learning_rate': 7.1178837089833416e-06, 'epoch': 1.82} +2025-02-05 23:00:03 - ERROR - stderr - 61%|██████ | 13579/22434 [12:52:22<6:15:01, 2.54s/it] +2025-02-05 23:00:05 - ERROR - stderr - 61%|██████ | 13580/22434 [12:52:25<6:12:23, 2.52s/it] +2025-02-05 23:00:05 - ERROR - stderr - +2025-02-05 23:00:05 - ERROR - stderr - +2025-02-05 23:00:05 - INFO - stdout - {'loss': 0.5878, 'grad_norm': 1.1416422128677368, 'learning_rate': 7.116501255281138e-06, 'epoch': 1.82} +2025-02-05 23:00:05 - ERROR - stderr - 61%|██████ | 13580/22434 [12:52:25<6:12:23, 2.52s/it] +2025-02-05 23:00:08 - ERROR - stderr - 61%|██████ | 13581/22434 [12:52:27<6:13:37, 2.53s/it] +2025-02-05 23:00:08 - ERROR - stderr - +2025-02-05 23:00:08 - ERROR - stderr - +2025-02-05 23:00:08 - INFO - stdout - {'loss': 0.67, 'grad_norm': 1.1690255403518677, 'learning_rate': 7.115118861682711e-06, 'epoch': 1.82} +2025-02-05 23:00:08 - ERROR - stderr - 61%|██████ | 13581/22434 [12:52:27<6:13:37, 2.53s/it] +2025-02-05 23:00:10 - ERROR - stderr - 61%|██████ | 13582/22434 [12:52:30<6:13:59, 2.53s/it] +2025-02-05 23:00:10 - ERROR - stderr - +2025-02-05 23:00:10 - ERROR - stderr - +2025-02-05 23:00:10 - INFO - stdout - {'loss': 0.6407, 'grad_norm': 1.2313569784164429, 'learning_rate': 7.113736528216872e-06, 'epoch': 1.82} +2025-02-05 23:00:10 - ERROR - stderr - 61%|██████ | 13582/22434 [12:52:30<6:13:59, 2.53s/it] +2025-02-05 23:00:13 - ERROR - stderr - 61%|██████ | 13583/22434 [12:52:32<6:15:12, 2.54s/it] +2025-02-05 23:00:13 - ERROR - stderr - +2025-02-05 23:00:13 - ERROR - stderr - +2025-02-05 23:00:13 - INFO - stdout - {'loss': 0.6652, 'grad_norm': 1.193312406539917, 'learning_rate': 7.112354254912429e-06, 'epoch': 1.82} +2025-02-05 23:00:13 - ERROR - stderr - 61%|██████ | 13583/22434 [12:52:33<6:15:12, 2.54s/it] +2025-02-05 23:00:15 - ERROR - stderr - 61%|██████ | 13584/22434 [12:52:35<6:20:40, 2.58s/it] +2025-02-05 23:00:15 - ERROR - stderr - +2025-02-05 23:00:15 - ERROR - stderr - +2025-02-05 23:00:15 - INFO - 
stdout - {'loss': 0.7383, 'grad_norm': 1.2921831607818604, 'learning_rate': 7.110972041798203e-06, 'epoch': 1.82} +2025-02-05 23:00:15 - ERROR - stderr - 61%|██████ | 13584/22434 [12:52:35<6:20:40, 2.58s/it] +2025-02-05 23:00:18 - ERROR - stderr - 61%|██████ | 13585/22434 [12:52:38<6:24:22, 2.61s/it] +2025-02-05 23:00:18 - ERROR - stderr - +2025-02-05 23:00:18 - ERROR - stderr - +2025-02-05 23:00:18 - INFO - stdout - {'loss': 0.5889, 'grad_norm': 1.2011044025421143, 'learning_rate': 7.109589888902995e-06, 'epoch': 1.82} +2025-02-05 23:00:18 - ERROR - stderr - 61%|██████ | 13585/22434 [12:52:38<6:24:22, 2.61s/it] +2025-02-05 23:00:21 - ERROR - stderr - 61%|██████ | 13586/22434 [12:52:40<6:23:03, 2.60s/it] +2025-02-05 23:00:21 - ERROR - stderr - +2025-02-05 23:00:21 - ERROR - stderr - +2025-02-05 23:00:21 - INFO - stdout - {'loss': 0.6647, 'grad_norm': 1.3369941711425781, 'learning_rate': 7.108207796255625e-06, 'epoch': 1.82} +2025-02-05 23:00:21 - ERROR - stderr - 61%|██████ | 13586/22434 [12:52:40<6:23:03, 2.60s/it] +2025-02-05 23:00:23 - ERROR - stderr - 61%|██████ | 13587/22434 [12:52:43<6:15:34, 2.55s/it] +2025-02-05 23:00:23 - ERROR - stderr - +2025-02-05 23:00:23 - ERROR - stderr - +2025-02-05 23:00:23 - INFO - stdout - {'loss': 0.6146, 'grad_norm': 1.1748533248901367, 'learning_rate': 7.106825763884895e-06, 'epoch': 1.82} +2025-02-05 23:00:23 - ERROR - stderr - 61%|██████ | 13587/22434 [12:52:43<6:15:34, 2.55s/it] +2025-02-05 23:00:25 - ERROR - stderr - 61%|██████ | 13588/22434 [12:52:45<6:11:52, 2.52s/it] +2025-02-05 23:00:26 - ERROR - stderr - +2025-02-05 23:00:26 - ERROR - stderr - +2025-02-05 23:00:26 - INFO - stdout - {'loss': 0.6544, 'grad_norm': 1.2341011762619019, 'learning_rate': 7.105443791819612e-06, 'epoch': 1.82} +2025-02-05 23:00:26 - ERROR - stderr - 61%|██████ | 13588/22434 [12:52:45<6:11:52, 2.52s/it] +2025-02-05 23:00:28 - ERROR - stderr - 61%|██████ | 13589/22434 [12:52:48<6:10:06, 2.51s/it] +2025-02-05 23:00:28 - ERROR - stderr - 
+2025-02-05 23:00:28 - ERROR - stderr - +2025-02-05 23:00:28 - INFO - stdout - {'loss': 0.6576, 'grad_norm': 1.3026204109191895, 'learning_rate': 7.1040618800885845e-06, 'epoch': 1.82} +2025-02-05 23:00:28 - ERROR - stderr - 61%|██████ | 13589/22434 [12:52:48<6:10:06, 2.51s/it] +2025-02-05 23:00:30 - ERROR - stderr - 61%|██████ | 13590/22434 [12:52:50<6:08:38, 2.50s/it] +2025-02-05 23:00:31 - ERROR - stderr - +2025-02-05 23:00:31 - ERROR - stderr - +2025-02-05 23:00:31 - INFO - stdout - {'loss': 0.7159, 'grad_norm': 1.194996953010559, 'learning_rate': 7.102680028720616e-06, 'epoch': 1.82} +2025-02-05 23:00:31 - ERROR - stderr - 61%|██████ | 13590/22434 [12:52:50<6:08:38, 2.50s/it] +2025-02-05 23:00:33 - ERROR - stderr - 61%|██████ | 13591/22434 [12:52:53<6:19:00, 2.57s/it] +2025-02-05 23:00:33 - ERROR - stderr - +2025-02-05 23:00:33 - ERROR - stderr - +2025-02-05 23:00:33 - INFO - stdout - {'loss': 0.6831, 'grad_norm': 1.5943880081176758, 'learning_rate': 7.101298237744508e-06, 'epoch': 1.82} +2025-02-05 23:00:33 - ERROR - stderr - 61%|██████ | 13591/22434 [12:52:53<6:19:00, 2.57s/it] +2025-02-05 23:00:36 - ERROR - stderr - 61%|██████ | 13592/22434 [12:52:55<6:15:41, 2.55s/it] +2025-02-05 23:00:36 - ERROR - stderr - +2025-02-05 23:00:36 - ERROR - stderr - +2025-02-05 23:00:36 - INFO - stdout - {'loss': 0.7094, 'grad_norm': 1.2619918584823608, 'learning_rate': 7.099916507189067e-06, 'epoch': 1.82} +2025-02-05 23:00:36 - ERROR - stderr - 61%|██████ | 13592/22434 [12:52:56<6:15:41, 2.55s/it] +2025-02-05 23:00:38 - ERROR - stderr - 61%|██████ | 13593/22434 [12:52:58<6:14:42, 2.54s/it] +2025-02-05 23:00:38 - ERROR - stderr - +2025-02-05 23:00:38 - ERROR - stderr - +2025-02-05 23:00:38 - INFO - stdout - {'loss': 0.74, 'grad_norm': 1.3485101461410522, 'learning_rate': 7.098534837083089e-06, 'epoch': 1.82} +2025-02-05 23:00:38 - ERROR - stderr - 61%|██████ | 13593/22434 [12:52:58<6:14:42, 2.54s/it] +2025-02-05 23:00:41 - ERROR - stderr - 61%|██████ | 13594/22434 
[12:53:01<6:28:01, 2.63s/it] +2025-02-05 23:00:41 - ERROR - stderr - +2025-02-05 23:00:41 - ERROR - stderr - +2025-02-05 23:00:41 - INFO - stdout - {'loss': 0.691, 'grad_norm': 1.346592664718628, 'learning_rate': 7.097153227455379e-06, 'epoch': 1.82} +2025-02-05 23:00:41 - ERROR - stderr - 61%|██████ | 13594/22434 [12:53:01<6:28:01, 2.63s/it] +2025-02-05 23:00:44 - ERROR - stderr - 61%|██████ | 13595/22434 [12:53:03<6:20:49, 2.59s/it] +2025-02-05 23:00:44 - ERROR - stderr - +2025-02-05 23:00:44 - ERROR - stderr - +2025-02-05 23:00:44 - INFO - stdout - {'loss': 0.6682, 'grad_norm': 1.3063503503799438, 'learning_rate': 7.0957716783347295e-06, 'epoch': 1.82} +2025-02-05 23:00:44 - ERROR - stderr - 61%|██████ | 13595/22434 [12:53:03<6:20:49, 2.59s/it] +2025-02-05 23:00:46 - ERROR - stderr - 61%|██████ | 13596/22434 [12:53:06<6:17:41, 2.56s/it] +2025-02-05 23:00:46 - ERROR - stderr - +2025-02-05 23:00:46 - ERROR - stderr - +2025-02-05 23:00:46 - INFO - stdout - {'loss': 0.7985, 'grad_norm': 1.3465898036956787, 'learning_rate': 7.09439018974994e-06, 'epoch': 1.82} +2025-02-05 23:00:46 - ERROR - stderr - 61%|██████ | 13596/22434 [12:53:06<6:17:41, 2.56s/it] +2025-02-05 23:00:49 - ERROR - stderr - 61%|██████ | 13597/22434 [12:53:08<6:14:01, 2.54s/it] +2025-02-05 23:00:49 - ERROR - stderr - +2025-02-05 23:00:49 - ERROR - stderr - +2025-02-05 23:00:49 - INFO - stdout - {'loss': 0.6451, 'grad_norm': 1.1896618604660034, 'learning_rate': 7.093008761729809e-06, 'epoch': 1.82} +2025-02-05 23:00:49 - ERROR - stderr - 61%|██████ | 13597/22434 [12:53:08<6:14:01, 2.54s/it] +2025-02-05 23:00:51 - ERROR - stderr - 61%|██████ | 13598/22434 [12:53:11<6:17:46, 2.57s/it] +2025-02-05 23:00:51 - ERROR - stderr - +2025-02-05 23:00:51 - ERROR - stderr - +2025-02-05 23:00:51 - INFO - stdout - {'loss': 0.7402, 'grad_norm': 1.252610683441162, 'learning_rate': 7.091627394303125e-06, 'epoch': 1.82} +2025-02-05 23:00:51 - ERROR - stderr - 61%|██████ | 13598/22434 [12:53:11<6:17:46, 2.57s/it] 
+2025-02-05 23:00:54 - ERROR - stderr - 61%|██████ | 13599/22434 [12:53:13<6:14:20, 2.54s/it] +2025-02-05 23:00:54 - ERROR - stderr - +2025-02-05 23:00:54 - ERROR - stderr - +2025-02-05 23:00:54 - INFO - stdout - {'loss': 0.6923, 'grad_norm': 1.3104808330535889, 'learning_rate': 7.09024608749869e-06, 'epoch': 1.82} +2025-02-05 23:00:54 - ERROR - stderr - 61%|██████ | 13599/22434 [12:53:13<6:14:20, 2.54s/it] +2025-02-05 23:00:56 - ERROR - stderr - 61%|██████ | 13600/22434 [12:53:16<6:09:22, 2.51s/it] +2025-02-05 23:00:56 - ERROR - stderr - +2025-02-05 23:00:56 - ERROR - stderr - +2025-02-05 23:00:56 - INFO - stdout - {'loss': 0.6506, 'grad_norm': 1.2454110383987427, 'learning_rate': 7.088864841345289e-06, 'epoch': 1.82} +2025-02-05 23:00:56 - ERROR - stderr - 61%|██████ | 13600/22434 [12:53:16<6:09:22, 2.51s/it] +2025-02-05 23:00:59 - ERROR - stderr - 61%|██████ | 13601/22434 [12:53:18<6:12:42, 2.53s/it] +2025-02-05 23:00:59 - ERROR - stderr - +2025-02-05 23:00:59 - ERROR - stderr - +2025-02-05 23:00:59 - INFO - stdout - {'loss': 0.7542, 'grad_norm': 1.5558629035949707, 'learning_rate': 7.087483655871713e-06, 'epoch': 1.82} +2025-02-05 23:00:59 - ERROR - stderr - 61%|██████ | 13601/22434 [12:53:18<6:12:42, 2.53s/it] +2025-02-05 23:01:01 - ERROR - stderr - 61%|██████ | 13602/22434 [12:53:21<6:08:35, 2.50s/it] +2025-02-05 23:01:01 - ERROR - stderr - +2025-02-05 23:01:01 - ERROR - stderr - +2025-02-05 23:01:01 - INFO - stdout - {'loss': 0.6026, 'grad_norm': 1.265740990638733, 'learning_rate': 7.086102531106755e-06, 'epoch': 1.82} +2025-02-05 23:01:01 - ERROR - stderr - 61%|██████ | 13602/22434 [12:53:21<6:08:35, 2.50s/it] +2025-02-05 23:01:04 - ERROR - stderr - 61%|██████ | 13603/22434 [12:53:23<6:08:04, 2.50s/it] +2025-02-05 23:01:04 - ERROR - stderr - +2025-02-05 23:01:04 - ERROR - stderr - +2025-02-05 23:01:04 - INFO - stdout - {'loss': 0.7032, 'grad_norm': 1.2846379280090332, 'learning_rate': 7.084721467079202e-06, 'epoch': 1.82} +2025-02-05 23:01:04 - ERROR - 
stderr - 61%|██████ | 13603/22434 [12:53:23<6:08:04, 2.50s/it] +2025-02-05 23:01:06 - ERROR - stderr - 61%|██████ | 13604/22434 [12:53:26<6:08:27, 2.50s/it] +2025-02-05 23:01:06 - ERROR - stderr - +2025-02-05 23:01:06 - ERROR - stderr - +2025-02-05 23:01:06 - INFO - stdout - {'loss': 0.6803, 'grad_norm': 1.2625577449798584, 'learning_rate': 7.083340463817837e-06, 'epoch': 1.82} +2025-02-05 23:01:06 - ERROR - stderr - 61%|██████ | 13604/22434 [12:53:26<6:08:27, 2.50s/it] +2025-02-05 23:01:09 - ERROR - stderr - 61%|██████ | 13605/22434 [12:53:28<6:08:21, 2.50s/it] +2025-02-05 23:01:09 - ERROR - stderr - +2025-02-05 23:01:09 - ERROR - stderr - +2025-02-05 23:01:09 - INFO - stdout - {'loss': 0.6589, 'grad_norm': 1.3744566440582275, 'learning_rate': 7.081959521351454e-06, 'epoch': 1.82} +2025-02-05 23:01:09 - ERROR - stderr - 61%|██████ | 13605/22434 [12:53:28<6:08:21, 2.50s/it] +2025-02-05 23:01:11 - ERROR - stderr - 61%|██████ | 13606/22434 [12:53:31<6:14:02, 2.54s/it] +2025-02-05 23:01:11 - ERROR - stderr - +2025-02-05 23:01:11 - ERROR - stderr - +2025-02-05 23:01:11 - INFO - stdout - {'loss': 0.716, 'grad_norm': 1.169491171836853, 'learning_rate': 7.080578639708827e-06, 'epoch': 1.82} +2025-02-05 23:01:11 - ERROR - stderr - 61%|██████ | 13606/22434 [12:53:31<6:14:02, 2.54s/it] +2025-02-05 23:01:14 - ERROR - stderr - 61%|██████ | 13607/22434 [12:53:33<6:11:29, 2.53s/it] +2025-02-05 23:01:14 - ERROR - stderr - +2025-02-05 23:01:14 - ERROR - stderr - +2025-02-05 23:01:14 - INFO - stdout - {'loss': 0.6243, 'grad_norm': 1.2505451440811157, 'learning_rate': 7.079197818918749e-06, 'epoch': 1.82} +2025-02-05 23:01:14 - ERROR - stderr - 61%|██████ | 13607/22434 [12:53:34<6:11:29, 2.53s/it] +2025-02-05 23:01:16 - ERROR - stderr - 61%|██████ | 13608/22434 [12:53:36<6:10:06, 2.52s/it] +2025-02-05 23:01:16 - ERROR - stderr - +2025-02-05 23:01:16 - ERROR - stderr - +2025-02-05 23:01:16 - INFO - stdout - {'loss': 0.6775, 'grad_norm': 1.3637359142303467, 'learning_rate': 
7.077817059009997e-06, 'epoch': 1.82} +2025-02-05 23:01:16 - ERROR - stderr - 61%|██████ | 13608/22434 [12:53:36<6:10:06, 2.52s/it] +2025-02-05 23:01:19 - ERROR - stderr - 61%|██████ | 13609/22434 [12:53:39<6:10:30, 2.52s/it] +2025-02-05 23:01:19 - ERROR - stderr - +2025-02-05 23:01:19 - ERROR - stderr - +2025-02-05 23:01:19 - INFO - stdout - {'loss': 0.622, 'grad_norm': 1.238973617553711, 'learning_rate': 7.076436360011348e-06, 'epoch': 1.82} +2025-02-05 23:01:19 - ERROR - stderr - 61%|██████ | 13609/22434 [12:53:39<6:10:30, 2.52s/it] +2025-02-05 23:01:21 - ERROR - stderr - 61%|██████ | 13610/22434 [12:53:41<6:14:32, 2.55s/it] +2025-02-05 23:01:21 - ERROR - stderr - +2025-02-05 23:01:21 - ERROR - stderr - +2025-02-05 23:01:21 - INFO - stdout - {'loss': 0.7482, 'grad_norm': 1.1828560829162598, 'learning_rate': 7.0750557219515916e-06, 'epoch': 1.82} +2025-02-05 23:01:21 - ERROR - stderr - 61%|██████ | 13610/22434 [12:53:41<6:14:32, 2.55s/it] +2025-02-05 23:01:24 - ERROR - stderr - 61%|██████ | 13611/22434 [12:53:44<6:12:40, 2.53s/it] +2025-02-05 23:01:24 - ERROR - stderr - +2025-02-05 23:01:24 - ERROR - stderr - +2025-02-05 23:01:24 - INFO - stdout - {'loss': 0.6412, 'grad_norm': 1.1666189432144165, 'learning_rate': 7.073675144859499e-06, 'epoch': 1.82} +2025-02-05 23:01:24 - ERROR - stderr - 61%|██████ | 13611/22434 [12:53:44<6:12:40, 2.53s/it] +2025-02-05 23:01:26 - ERROR - stderr - 61%|██████ | 13612/22434 [12:53:46<6:12:01, 2.53s/it] +2025-02-05 23:01:26 - ERROR - stderr - +2025-02-05 23:01:26 - ERROR - stderr - +2025-02-05 23:01:26 - INFO - stdout - {'loss': 0.7112, 'grad_norm': 1.312224268913269, 'learning_rate': 7.072294628763843e-06, 'epoch': 1.82} +2025-02-05 23:01:26 - ERROR - stderr - 61%|██████ | 13612/22434 [12:53:46<6:12:01, 2.53s/it] +2025-02-05 23:01:29 - ERROR - stderr - 61%|██████ | 13613/22434 [12:53:49<6:11:46, 2.53s/it] +2025-02-05 23:01:29 - ERROR - stderr - +2025-02-05 23:01:29 - ERROR - stderr - +2025-02-05 23:01:29 - INFO - stdout - {'loss': 
0.7414, 'grad_norm': 1.1342087984085083, 'learning_rate': 7.0709141736934066e-06, 'epoch': 1.82} +2025-02-05 23:01:29 - ERROR - stderr - 61%|██████ | 13613/22434 [12:53:49<6:11:46, 2.53s/it] +2025-02-05 23:01:32 - ERROR - stderr - 61%|██████ | 13614/22434 [12:53:51<6:21:43, 2.60s/it] +2025-02-05 23:01:32 - ERROR - stderr - +2025-02-05 23:01:32 - ERROR - stderr - +2025-02-05 23:01:32 - INFO - stdout - {'loss': 0.663, 'grad_norm': 1.2651315927505493, 'learning_rate': 7.069533779676961e-06, 'epoch': 1.82} +2025-02-05 23:01:32 - ERROR - stderr - 61%|██████ | 13614/22434 [12:53:51<6:21:43, 2.60s/it] +2025-02-05 23:01:34 - ERROR - stderr - 61%|██████ | 13615/22434 [12:53:54<6:17:22, 2.57s/it] +2025-02-05 23:01:34 - ERROR - stderr - +2025-02-05 23:01:34 - ERROR - stderr - +2025-02-05 23:01:34 - INFO - stdout - {'loss': 0.771, 'grad_norm': 1.3959838151931763, 'learning_rate': 7.06815344674328e-06, 'epoch': 1.82} +2025-02-05 23:01:34 - ERROR - stderr - 61%|██████ | 13615/22434 [12:53:54<6:17:22, 2.57s/it] +2025-02-05 23:01:37 - ERROR - stderr - 61%|██████ | 13616/22434 [12:53:56<6:15:02, 2.55s/it] +2025-02-05 23:01:37 - ERROR - stderr - +2025-02-05 23:01:37 - ERROR - stderr - +2025-02-05 23:01:37 - INFO - stdout - {'loss': 0.6361, 'grad_norm': 1.154520034790039, 'learning_rate': 7.0667731749211375e-06, 'epoch': 1.82} +2025-02-05 23:01:37 - ERROR - stderr - 61%|██████ | 13616/22434 [12:53:56<6:15:02, 2.55s/it] +2025-02-05 23:01:39 - ERROR - stderr - 61%|██████ | 13617/22434 [12:53:59<6:16:11, 2.56s/it] +2025-02-05 23:01:39 - ERROR - stderr - +2025-02-05 23:01:39 - ERROR - stderr - +2025-02-05 23:01:39 - INFO - stdout - {'loss': 0.6224, 'grad_norm': 1.1782459020614624, 'learning_rate': 7.0653929642392974e-06, 'epoch': 1.82} +2025-02-05 23:01:39 - ERROR - stderr - 61%|██████ | 13617/22434 [12:53:59<6:16:11, 2.56s/it] +2025-02-05 23:01:42 - ERROR - stderr - 61%|██████ | 13618/22434 [12:54:02<6:17:05, 2.57s/it] +2025-02-05 23:01:42 - ERROR - stderr - +2025-02-05 23:01:42 - ERROR 
- stderr - +2025-02-05 23:01:42 - INFO - stdout - {'loss': 0.7108, 'grad_norm': 1.384628176689148, 'learning_rate': 7.0640128147265355e-06, 'epoch': 1.82} +2025-02-05 23:01:42 - ERROR - stderr - 61%|██████ | 13618/22434 [12:54:02<6:17:05, 2.57s/it] +2025-02-05 23:01:44 - ERROR - stderr - 61%|██████ | 13619/22434 [12:54:04<6:14:45, 2.55s/it] +2025-02-05 23:01:44 - ERROR - stderr - +2025-02-05 23:01:44 - ERROR - stderr - +2025-02-05 23:01:44 - INFO - stdout - {'loss': 0.5926, 'grad_norm': 1.3161417245864868, 'learning_rate': 7.062632726411616e-06, 'epoch': 1.82} +2025-02-05 23:01:44 - ERROR - stderr - 61%|██████ | 13619/22434 [12:54:04<6:14:45, 2.55s/it] +2025-02-05 23:01:47 - ERROR - stderr - 61%|██████ | 13620/22434 [12:54:07<6:13:25, 2.54s/it] +2025-02-05 23:01:47 - ERROR - stderr - +2025-02-05 23:01:47 - ERROR - stderr - +2025-02-05 23:01:47 - INFO - stdout - {'loss': 0.73, 'grad_norm': 1.1590203046798706, 'learning_rate': 7.061252699323307e-06, 'epoch': 1.82} +2025-02-05 23:01:47 - ERROR - stderr - 61%|██████ | 13620/22434 [12:54:07<6:13:25, 2.54s/it] +2025-02-05 23:01:49 - ERROR - stderr - 61%|██████ | 13621/22434 [12:54:09<6:09:08, 2.51s/it] +2025-02-05 23:01:49 - ERROR - stderr - +2025-02-05 23:01:49 - ERROR - stderr - +2025-02-05 23:01:49 - INFO - stdout - {'loss': 0.6546, 'grad_norm': 1.2304840087890625, 'learning_rate': 7.059872733490372e-06, 'epoch': 1.82} +2025-02-05 23:01:49 - ERROR - stderr - 61%|██████ | 13621/22434 [12:54:09<6:09:08, 2.51s/it] +2025-02-05 23:01:52 - ERROR - stderr - 61%|██████ | 13622/22434 [12:54:12<6:08:30, 2.51s/it] +2025-02-05 23:01:52 - ERROR - stderr - +2025-02-05 23:01:52 - ERROR - stderr - +2025-02-05 23:01:52 - INFO - stdout - {'loss': 0.7346, 'grad_norm': 1.2083485126495361, 'learning_rate': 7.0584928289415755e-06, 'epoch': 1.82} +2025-02-05 23:01:52 - ERROR - stderr - 61%|██████ | 13622/22434 [12:54:12<6:08:30, 2.51s/it] +2025-02-05 23:01:54 - ERROR - stderr - 61%|██████ | 13623/22434 [12:54:14<6:05:48, 2.49s/it] 
+2025-02-05 23:01:54 - ERROR - stderr - +2025-02-05 23:01:54 - ERROR - stderr - +2025-02-05 23:01:54 - INFO - stdout - {'loss': 0.6992, 'grad_norm': 1.286428689956665, 'learning_rate': 7.057112985705685e-06, 'epoch': 1.82} +2025-02-05 23:01:54 - ERROR - stderr - 61%|██████ | 13623/22434 [12:54:14<6:05:48, 2.49s/it] +2025-02-05 23:01:57 - ERROR - stderr - 61%|██████ | 13624/22434 [12:54:16<6:02:56, 2.47s/it] +2025-02-05 23:01:57 - ERROR - stderr - +2025-02-05 23:01:57 - ERROR - stderr - +2025-02-05 23:01:57 - INFO - stdout - {'loss': 0.7752, 'grad_norm': 1.390513300895691, 'learning_rate': 7.055733203811459e-06, 'epoch': 1.82} +2025-02-05 23:01:57 - ERROR - stderr - 61%|██████ | 13624/22434 [12:54:17<6:02:56, 2.47s/it] +2025-02-05 23:01:59 - ERROR - stderr - 61%|██████ | 13625/22434 [12:54:19<6:08:10, 2.51s/it] +2025-02-05 23:01:59 - ERROR - stderr - +2025-02-05 23:01:59 - ERROR - stderr - +2025-02-05 23:01:59 - INFO - stdout - {'loss': 0.6943, 'grad_norm': 1.2702159881591797, 'learning_rate': 7.054353483287651e-06, 'epoch': 1.82} +2025-02-05 23:01:59 - ERROR - stderr - 61%|██████ | 13625/22434 [12:54:19<6:08:10, 2.51s/it] +2025-02-05 23:02:02 - ERROR - stderr - 61%|██████ | 13626/22434 [12:54:22<6:23:06, 2.61s/it] +2025-02-05 23:02:02 - ERROR - stderr - +2025-02-05 23:02:02 - ERROR - stderr - +2025-02-05 23:02:02 - INFO - stdout - {'loss': 0.7507, 'grad_norm': 1.3454058170318604, 'learning_rate': 7.052973824163032e-06, 'epoch': 1.82} +2025-02-05 23:02:02 - ERROR - stderr - 61%|██████ | 13626/22434 [12:54:22<6:23:06, 2.61s/it] +2025-02-05 23:02:05 - ERROR - stderr - 61%|██████ | 13627/22434 [12:54:24<6:19:55, 2.59s/it] +2025-02-05 23:02:05 - ERROR - stderr - +2025-02-05 23:02:05 - ERROR - stderr - +2025-02-05 23:02:05 - INFO - stdout - {'loss': 0.6592, 'grad_norm': 1.326231837272644, 'learning_rate': 7.051594226466351e-06, 'epoch': 1.82} +2025-02-05 23:02:05 - ERROR - stderr - 61%|██████ | 13627/22434 [12:54:24<6:19:55, 2.59s/it] +2025-02-05 23:02:07 - ERROR - 
stderr - 61%|██████ | 13628/22434 [12:54:27<6:17:26, 2.57s/it] +2025-02-05 23:02:07 - ERROR - stderr - +2025-02-05 23:02:07 - ERROR - stderr - +2025-02-05 23:02:07 - INFO - stdout - {'loss': 0.6161, 'grad_norm': 1.196489691734314, 'learning_rate': 7.050214690226365e-06, 'epoch': 1.82} +2025-02-05 23:02:07 - ERROR - stderr - 61%|██████ | 13628/22434 [12:54:27<6:17:26, 2.57s/it] +2025-02-05 23:02:10 - ERROR - stderr - 61%|██████ | 13629/22434 [12:54:30<6:18:37, 2.58s/it] +2025-02-05 23:02:10 - ERROR - stderr - +2025-02-05 23:02:10 - ERROR - stderr - +2025-02-05 23:02:10 - INFO - stdout - {'loss': 0.6189, 'grad_norm': 1.1779720783233643, 'learning_rate': 7.048835215471834e-06, 'epoch': 1.82} +2025-02-05 23:02:10 - ERROR - stderr - 61%|██████ | 13629/22434 [12:54:30<6:18:37, 2.58s/it] +2025-02-05 23:02:12 - ERROR - stderr - 61%|██████ | 13630/22434 [12:54:32<6:19:13, 2.58s/it] +2025-02-05 23:02:12 - ERROR - stderr - +2025-02-05 23:02:12 - ERROR - stderr - +2025-02-05 23:02:12 - INFO - stdout - {'loss': 0.5919, 'grad_norm': 1.2498775720596313, 'learning_rate': 7.047455802231506e-06, 'epoch': 1.82} +2025-02-05 23:02:12 - ERROR - stderr - 61%|██████ | 13630/22434 [12:54:32<6:19:13, 2.58s/it] +2025-02-05 23:02:15 - ERROR - stderr - 61%|██████ | 13631/22434 [12:54:35<6:14:51, 2.55s/it] +2025-02-05 23:02:15 - ERROR - stderr - +2025-02-05 23:02:15 - ERROR - stderr - +2025-02-05 23:02:15 - INFO - stdout - {'loss': 0.7041, 'grad_norm': 1.4105192422866821, 'learning_rate': 7.046076450534142e-06, 'epoch': 1.82} +2025-02-05 23:02:15 - ERROR - stderr - 61%|██████ | 13631/22434 [12:54:35<6:14:51, 2.55s/it] +2025-02-05 23:02:17 - ERROR - stderr - 61%|██████ | 13632/22434 [12:54:37<6:11:50, 2.53s/it] +2025-02-05 23:02:17 - ERROR - stderr - +2025-02-05 23:02:17 - ERROR - stderr - +2025-02-05 23:02:17 - INFO - stdout - {'loss': 0.6954, 'grad_norm': 1.304527997970581, 'learning_rate': 7.0446971604084845e-06, 'epoch': 1.82} +2025-02-05 23:02:17 - ERROR - stderr - 61%|██████ | 13632/22434 
[12:54:37<6:11:50, 2.53s/it] +2025-02-05 23:02:20 - ERROR - stderr - 61%|██████ | 13633/22434 [12:54:40<6:09:49, 2.52s/it] +2025-02-05 23:02:20 - ERROR - stderr - +2025-02-05 23:02:20 - ERROR - stderr - +2025-02-05 23:02:20 - INFO - stdout - {'loss': 0.724, 'grad_norm': 1.259665608406067, 'learning_rate': 7.043317931883287e-06, 'epoch': 1.82} +2025-02-05 23:02:20 - ERROR - stderr - 61%|██████ | 13633/22434 [12:54:40<6:09:49, 2.52s/it] +2025-02-05 23:02:22 - ERROR - stderr - 61%|██████ | 13634/22434 [12:54:42<6:10:56, 2.53s/it] +2025-02-05 23:02:22 - ERROR - stderr - +2025-02-05 23:02:22 - ERROR - stderr - +2025-02-05 23:02:22 - INFO - stdout - {'loss': 0.6838, 'grad_norm': 1.316893458366394, 'learning_rate': 7.041938764987297e-06, 'epoch': 1.82} +2025-02-05 23:02:22 - ERROR - stderr - 61%|██████ | 13634/22434 [12:54:42<6:10:56, 2.53s/it] +2025-02-05 23:02:25 - ERROR - stderr - 61%|██████ | 13635/22434 [12:54:45<6:07:18, 2.50s/it] +2025-02-05 23:02:25 - ERROR - stderr - +2025-02-05 23:02:25 - ERROR - stderr - +2025-02-05 23:02:25 - INFO - stdout - {'loss': 0.6244, 'grad_norm': 1.3688302040100098, 'learning_rate': 7.040559659749265e-06, 'epoch': 1.82} +2025-02-05 23:02:25 - ERROR - stderr - 61%|██████ | 13635/22434 [12:54:45<6:07:18, 2.50s/it] +2025-02-05 23:02:27 - ERROR - stderr - 61%|██████ | 13636/22434 [12:54:47<6:05:16, 2.49s/it] +2025-02-05 23:02:27 - ERROR - stderr - +2025-02-05 23:02:27 - ERROR - stderr - +2025-02-05 23:02:27 - INFO - stdout - {'loss': 0.7227, 'grad_norm': 1.3386318683624268, 'learning_rate': 7.0391806161979316e-06, 'epoch': 1.82} +2025-02-05 23:02:27 - ERROR - stderr - 61%|██████ | 13636/22434 [12:54:47<6:05:16, 2.49s/it] +2025-02-05 23:02:30 - ERROR - stderr - 61%|██████ | 13637/22434 [12:54:50<6:04:01, 2.48s/it] +2025-02-05 23:02:30 - ERROR - stderr - +2025-02-05 23:02:30 - ERROR - stderr - +2025-02-05 23:02:30 - INFO - stdout - {'loss': 0.7475, 'grad_norm': 1.4612607955932617, 'learning_rate': 7.037801634362049e-06, 'epoch': 1.82} 
+2025-02-05 23:02:30 - ERROR - stderr - 61%|██████ | 13637/22434 [12:54:50<6:04:01, 2.48s/it] +2025-02-05 23:02:32 - ERROR - stderr - 61%|██████ | 13638/22434 [12:54:52<6:03:57, 2.48s/it] +2025-02-05 23:02:32 - ERROR - stderr - +2025-02-05 23:02:32 - ERROR - stderr - +2025-02-05 23:02:32 - INFO - stdout - {'loss': 0.5869, 'grad_norm': 1.1339207887649536, 'learning_rate': 7.036422714270353e-06, 'epoch': 1.82} +2025-02-05 23:02:32 - ERROR - stderr - 61%|██████ | 13638/22434 [12:54:52<6:03:57, 2.48s/it] +2025-02-05 23:02:35 - ERROR - stderr - 61%|██████ | 13639/22434 [12:54:55<6:05:53, 2.50s/it] +2025-02-05 23:02:35 - ERROR - stderr - +2025-02-05 23:02:35 - ERROR - stderr - +2025-02-05 23:02:35 - INFO - stdout - {'loss': 0.6836, 'grad_norm': 1.2771040201187134, 'learning_rate': 7.035043855951593e-06, 'epoch': 1.82} +2025-02-05 23:02:35 - ERROR - stderr - 61%|██████ | 13639/22434 [12:54:55<6:05:53, 2.50s/it] +2025-02-05 23:02:37 - ERROR - stderr - 61%|██████ | 13640/22434 [12:54:57<6:02:45, 2.48s/it] +2025-02-05 23:02:37 - ERROR - stderr - +2025-02-05 23:02:37 - ERROR - stderr - +2025-02-05 23:02:37 - INFO - stdout - {'loss': 0.7341, 'grad_norm': 1.328466773033142, 'learning_rate': 7.0336650594345055e-06, 'epoch': 1.82} +2025-02-05 23:02:37 - ERROR - stderr - 61%|██████ | 13640/22434 [12:54:57<6:02:45, 2.48s/it] +2025-02-05 23:02:40 - ERROR - stderr - 61%|██████ | 13641/22434 [12:55:00<6:04:32, 2.49s/it] +2025-02-05 23:02:40 - ERROR - stderr - +2025-02-05 23:02:40 - ERROR - stderr - +2025-02-05 23:02:40 - INFO - stdout - {'loss': 0.6625, 'grad_norm': 1.184380292892456, 'learning_rate': 7.032286324747829e-06, 'epoch': 1.82} +2025-02-05 23:02:40 - ERROR - stderr - 61%|██████ | 13641/22434 [12:55:00<6:04:32, 2.49s/it] +2025-02-05 23:02:42 - ERROR - stderr - 61%|██████ | 13642/22434 [12:55:02<6:11:24, 2.53s/it] +2025-02-05 23:02:42 - ERROR - stderr - +2025-02-05 23:02:42 - ERROR - stderr - +2025-02-05 23:02:42 - INFO - stdout - {'loss': 0.7644, 'grad_norm': 
1.3108444213867188, 'learning_rate': 7.030907651920309e-06, 'epoch': 1.82} +2025-02-05 23:02:42 - ERROR - stderr - 61%|██████ | 13642/22434 [12:55:02<6:11:24, 2.53s/it] +2025-02-05 23:02:45 - ERROR - stderr - 61%|██████ | 13643/22434 [12:55:05<6:10:17, 2.53s/it] +2025-02-05 23:02:45 - ERROR - stderr - +2025-02-05 23:02:45 - ERROR - stderr - +2025-02-05 23:02:45 - INFO - stdout - {'loss': 0.6646, 'grad_norm': 1.215500831604004, 'learning_rate': 7.0295290409806775e-06, 'epoch': 1.82} +2025-02-05 23:02:45 - ERROR - stderr - 61%|██████ | 13643/22434 [12:55:05<6:10:17, 2.53s/it] +2025-02-05 23:02:47 - ERROR - stderr - 61%|██████ | 13644/22434 [12:55:07<6:06:49, 2.50s/it] +2025-02-05 23:02:47 - ERROR - stderr - +2025-02-05 23:02:47 - ERROR - stderr - +2025-02-05 23:02:47 - INFO - stdout - {'loss': 0.7242, 'grad_norm': 1.2626285552978516, 'learning_rate': 7.028150491957666e-06, 'epoch': 1.82} +2025-02-05 23:02:47 - ERROR - stderr - 61%|██████ | 13644/22434 [12:55:07<6:06:49, 2.50s/it] +2025-02-05 23:02:50 - ERROR - stderr - 61%|██████ | 13645/22434 [12:55:10<6:10:35, 2.53s/it] +2025-02-05 23:02:50 - ERROR - stderr - +2025-02-05 23:02:50 - ERROR - stderr - +2025-02-05 23:02:50 - INFO - stdout - {'loss': 0.7786, 'grad_norm': 1.4614107608795166, 'learning_rate': 7.026772004880018e-06, 'epoch': 1.82} +2025-02-05 23:02:50 - ERROR - stderr - 61%|██████ | 13645/22434 [12:55:10<6:10:35, 2.53s/it] +2025-02-05 23:02:53 - ERROR - stderr - 61%|██████ | 13646/22434 [12:55:12<6:13:48, 2.55s/it] +2025-02-05 23:02:53 - ERROR - stderr - +2025-02-05 23:02:53 - ERROR - stderr - +2025-02-05 23:02:53 - INFO - stdout - {'loss': 0.707, 'grad_norm': 1.2773489952087402, 'learning_rate': 7.025393579776458e-06, 'epoch': 1.82} +2025-02-05 23:02:53 - ERROR - stderr - 61%|██████ | 13646/22434 [12:55:12<6:13:48, 2.55s/it] +2025-02-05 23:02:55 - ERROR - stderr - 61%|██████ | 13647/22434 [12:55:15<6:15:46, 2.57s/it] +2025-02-05 23:02:55 - ERROR - stderr - +2025-02-05 23:02:55 - ERROR - stderr - 
+2025-02-05 23:02:55 - INFO - stdout - {'loss': 0.7145, 'grad_norm': 1.3084397315979004, 'learning_rate': 7.024015216675726e-06, 'epoch': 1.82} +2025-02-05 23:02:55 - ERROR - stderr - 61%|██████ | 13647/22434 [12:55:15<6:15:46, 2.57s/it] +2025-02-05 23:02:58 - ERROR - stderr - 61%|██████ | 13648/22434 [12:55:17<6:13:08, 2.55s/it] +2025-02-05 23:02:58 - ERROR - stderr - +2025-02-05 23:02:58 - ERROR - stderr - +2025-02-05 23:02:58 - INFO - stdout - {'loss': 0.6807, 'grad_norm': 1.357030987739563, 'learning_rate': 7.022636915606549e-06, 'epoch': 1.83} +2025-02-05 23:02:58 - ERROR - stderr - 61%|██████ | 13648/22434 [12:55:17<6:13:08, 2.55s/it] +2025-02-05 23:03:00 - ERROR - stderr - 61%|██████ | 13649/22434 [12:55:20<6:08:36, 2.52s/it] +2025-02-05 23:03:00 - ERROR - stderr - +2025-02-05 23:03:00 - ERROR - stderr - +2025-02-05 23:03:00 - INFO - stdout - {'loss': 0.6356, 'grad_norm': 1.171615719795227, 'learning_rate': 7.021258676597654e-06, 'epoch': 1.83} +2025-02-05 23:03:00 - ERROR - stderr - 61%|██████ | 13649/22434 [12:55:20<6:08:36, 2.52s/it] +2025-02-05 23:03:03 - ERROR - stderr - 61%|██████ | 13650/22434 [12:55:22<6:10:06, 2.53s/it] +2025-02-05 23:03:03 - ERROR - stderr - +2025-02-05 23:03:03 - ERROR - stderr - +2025-02-05 23:03:03 - INFO - stdout - {'loss': 0.6038, 'grad_norm': 1.121340036392212, 'learning_rate': 7.0198804996777754e-06, 'epoch': 1.83} +2025-02-05 23:03:03 - ERROR - stderr - 61%|██████ | 13650/22434 [12:55:22<6:10:06, 2.53s/it] +2025-02-05 23:03:05 - ERROR - stderr - 61%|██████ | 13651/22434 [12:55:25<6:06:09, 2.50s/it] +2025-02-05 23:03:05 - ERROR - stderr - +2025-02-05 23:03:05 - ERROR - stderr - +2025-02-05 23:03:05 - INFO - stdout - {'loss': 0.6682, 'grad_norm': 1.2261704206466675, 'learning_rate': 7.018502384875634e-06, 'epoch': 1.83} +2025-02-05 23:03:05 - ERROR - stderr - 61%|██████ | 13651/22434 [12:55:25<6:06:09, 2.50s/it] +2025-02-05 23:03:08 - ERROR - stderr - 61%|██████ | 13652/22434 [12:55:27<6:05:45, 2.50s/it] +2025-02-05 23:03:08 
- ERROR - stderr - +2025-02-05 23:03:08 - ERROR - stderr - +2025-02-05 23:03:08 - INFO - stdout - {'loss': 0.7455, 'grad_norm': 1.2922707796096802, 'learning_rate': 7.017124332219958e-06, 'epoch': 1.83} +2025-02-05 23:03:08 - ERROR - stderr - 61%|██████ | 13652/22434 [12:55:27<6:05:45, 2.50s/it] +2025-02-05 23:03:10 - ERROR - stderr - 61%|██████ | 13653/22434 [12:55:30<6:03:35, 2.48s/it] +2025-02-05 23:03:10 - ERROR - stderr - +2025-02-05 23:03:10 - ERROR - stderr - +2025-02-05 23:03:10 - INFO - stdout - {'loss': 0.68, 'grad_norm': 1.2519872188568115, 'learning_rate': 7.015746341739469e-06, 'epoch': 1.83} +2025-02-05 23:03:10 - ERROR - stderr - 61%|██████ | 13653/22434 [12:55:30<6:03:35, 2.48s/it] +2025-02-05 23:03:13 - ERROR - stderr - 61%|██████ | 13654/22434 [12:55:32<6:11:02, 2.54s/it] +2025-02-05 23:03:13 - ERROR - stderr - +2025-02-05 23:03:13 - ERROR - stderr - +2025-02-05 23:03:13 - INFO - stdout - {'loss': 0.6452, 'grad_norm': 1.2441667318344116, 'learning_rate': 7.014368413462891e-06, 'epoch': 1.83} +2025-02-05 23:03:13 - ERROR - stderr - 61%|██████ | 13654/22434 [12:55:32<6:11:02, 2.54s/it] +2025-02-05 23:03:15 - ERROR - stderr - 61%|██████ | 13655/22434 [12:55:35<6:12:31, 2.55s/it] +2025-02-05 23:03:15 - ERROR - stderr - +2025-02-05 23:03:15 - ERROR - stderr - +2025-02-05 23:03:15 - INFO - stdout - {'loss': 0.6525, 'grad_norm': 1.2047346830368042, 'learning_rate': 7.012990547418952e-06, 'epoch': 1.83} +2025-02-05 23:03:15 - ERROR - stderr - 61%|██████ | 13655/22434 [12:55:35<6:12:31, 2.55s/it] +2025-02-05 23:03:18 - ERROR - stderr - 61%|██████ | 13656/22434 [12:55:38<6:27:26, 2.65s/it] +2025-02-05 23:03:18 - ERROR - stderr - +2025-02-05 23:03:18 - ERROR - stderr - +2025-02-05 23:03:18 - INFO - stdout - {'loss': 0.7295, 'grad_norm': 1.4370282888412476, 'learning_rate': 7.011612743636365e-06, 'epoch': 1.83} +2025-02-05 23:03:18 - ERROR - stderr - 61%|██████ | 13656/22434 [12:55:38<6:27:26, 2.65s/it] +2025-02-05 23:03:21 - ERROR - stderr - 61%|██████ | 
13657/22434 [12:55:40<6:21:44, 2.61s/it] +2025-02-05 23:03:21 - ERROR - stderr - +2025-02-05 23:03:21 - ERROR - stderr - +2025-02-05 23:03:21 - INFO - stdout - {'loss': 0.6072, 'grad_norm': 1.1209518909454346, 'learning_rate': 7.010235002143847e-06, 'epoch': 1.83} +2025-02-05 23:03:21 - ERROR - stderr - 61%|██████ | 13657/22434 [12:55:40<6:21:44, 2.61s/it] +2025-02-05 23:03:23 - ERROR - stderr - 61%|██████ | 13658/22434 [12:55:43<6:15:34, 2.57s/it] +2025-02-05 23:03:23 - ERROR - stderr - +2025-02-05 23:03:23 - ERROR - stderr - +2025-02-05 23:03:23 - INFO - stdout - {'loss': 0.754, 'grad_norm': 1.2879307270050049, 'learning_rate': 7.008857322970124e-06, 'epoch': 1.83} +2025-02-05 23:03:23 - ERROR - stderr - 61%|██████ | 13658/22434 [12:55:43<6:15:34, 2.57s/it] +2025-02-05 23:03:26 - ERROR - stderr - 61%|██████ | 13659/22434 [12:55:45<6:12:33, 2.55s/it] +2025-02-05 23:03:26 - ERROR - stderr - +2025-02-05 23:03:26 - ERROR - stderr - +2025-02-05 23:03:26 - INFO - stdout - {'loss': 0.6941, 'grad_norm': 1.294012188911438, 'learning_rate': 7.007479706143905e-06, 'epoch': 1.83} +2025-02-05 23:03:26 - ERROR - stderr - 61%|██████ | 13659/22434 [12:55:45<6:12:33, 2.55s/it] +2025-02-05 23:03:28 - ERROR - stderr - 61%|██████ | 13660/22434 [12:55:48<6:07:44, 2.51s/it] +2025-02-05 23:03:28 - ERROR - stderr - +2025-02-05 23:03:28 - ERROR - stderr - +2025-02-05 23:03:28 - INFO - stdout - {'loss': 0.6739, 'grad_norm': 1.1819275617599487, 'learning_rate': 7.006102151693907e-06, 'epoch': 1.83} +2025-02-05 23:03:28 - ERROR - stderr - 61%|██████ | 13660/22434 [12:55:48<6:07:44, 2.51s/it] +2025-02-05 23:03:31 - ERROR - stderr - 61%|██████ | 13661/22434 [12:55:50<6:06:36, 2.51s/it] +2025-02-05 23:03:31 - ERROR - stderr - +2025-02-05 23:03:31 - ERROR - stderr - +2025-02-05 23:03:31 - INFO - stdout - {'loss': 0.7227, 'grad_norm': 1.333618402481079, 'learning_rate': 7.004724659648848e-06, 'epoch': 1.83} +2025-02-05 23:03:31 - ERROR - stderr - 61%|██████ | 13661/22434 [12:55:50<6:06:36, 
2.51s/it] +2025-02-05 23:03:33 - ERROR - stderr - 61%|██████ | 13662/22434 [12:55:53<6:17:00, 2.58s/it] +2025-02-05 23:03:33 - ERROR - stderr - +2025-02-05 23:03:33 - ERROR - stderr - +2025-02-05 23:03:33 - INFO - stdout - {'loss': 0.6956, 'grad_norm': 1.2918713092803955, 'learning_rate': 7.003347230037434e-06, 'epoch': 1.83} +2025-02-05 23:03:33 - ERROR - stderr - 61%|██████ | 13662/22434 [12:55:53<6:17:00, 2.58s/it] +2025-02-05 23:03:36 - ERROR - stderr - 61%|██████ | 13663/22434 [12:55:56<6:13:50, 2.56s/it] +2025-02-05 23:03:36 - ERROR - stderr - +2025-02-05 23:03:36 - ERROR - stderr - +2025-02-05 23:03:36 - INFO - stdout - {'loss': 0.6364, 'grad_norm': 1.1978696584701538, 'learning_rate': 7.001969862888383e-06, 'epoch': 1.83} +2025-02-05 23:03:36 - ERROR - stderr - 61%|██████ | 13663/22434 [12:55:56<6:13:50, 2.56s/it] +2025-02-05 23:03:38 - ERROR - stderr - 61%|██████ | 13664/22434 [12:55:58<6:14:36, 2.56s/it] +2025-02-05 23:03:38 - ERROR - stderr - +2025-02-05 23:03:38 - ERROR - stderr - +2025-02-05 23:03:38 - INFO - stdout - {'loss': 0.7472, 'grad_norm': 1.2334239482879639, 'learning_rate': 7.000592558230399e-06, 'epoch': 1.83} +2025-02-05 23:03:38 - ERROR - stderr - 61%|██████ | 13664/22434 [12:55:58<6:14:36, 2.56s/it] +2025-02-05 23:03:41 - ERROR - stderr - 61%|██████ | 13665/22434 [12:56:01<6:13:24, 2.56s/it] +2025-02-05 23:03:41 - ERROR - stderr - +2025-02-05 23:03:41 - ERROR - stderr - +2025-02-05 23:03:41 - INFO - stdout - {'loss': 0.7504, 'grad_norm': 1.3283867835998535, 'learning_rate': 6.9992153160921935e-06, 'epoch': 1.83} +2025-02-05 23:03:41 - ERROR - stderr - 61%|██████ | 13665/22434 [12:56:01<6:13:24, 2.56s/it] +2025-02-05 23:03:43 - ERROR - stderr - 61%|██████ | 13666/22434 [12:56:03<6:11:30, 2.54s/it] +2025-02-05 23:03:43 - ERROR - stderr - +2025-02-05 23:03:43 - ERROR - stderr - +2025-02-05 23:03:43 - INFO - stdout - {'loss': 0.6145, 'grad_norm': 1.2248510122299194, 'learning_rate': 6.997838136502474e-06, 'epoch': 1.83} +2025-02-05 23:03:43 - 
ERROR - stderr - 61%|██████ | 13666/22434 [12:56:03<6:11:30, 2.54s/it] +2025-02-05 23:03:46 - ERROR - stderr - 61%|██████ | 13667/22434 [12:56:06<6:09:01, 2.53s/it] +2025-02-05 23:03:46 - ERROR - stderr - +2025-02-05 23:03:46 - ERROR - stderr - +2025-02-05 23:03:46 - INFO - stdout - {'loss': 0.6485, 'grad_norm': 1.0830023288726807, 'learning_rate': 6.9964610194899476e-06, 'epoch': 1.83} +2025-02-05 23:03:46 - ERROR - stderr - 61%|██████ | 13667/22434 [12:56:06<6:09:01, 2.53s/it] +2025-02-05 23:03:48 - ERROR - stderr - 61%|██████ | 13668/22434 [12:56:08<6:08:23, 2.52s/it] +2025-02-05 23:03:48 - ERROR - stderr - +2025-02-05 23:03:48 - ERROR - stderr - +2025-02-05 23:03:48 - INFO - stdout - {'loss': 0.7005, 'grad_norm': 1.2234848737716675, 'learning_rate': 6.995083965083313e-06, 'epoch': 1.83} +2025-02-05 23:03:48 - ERROR - stderr - 61%|██████ | 13668/22434 [12:56:08<6:08:23, 2.52s/it] +2025-02-05 23:03:51 - ERROR - stderr - 61%|██████ | 13669/22434 [12:56:11<6:07:43, 2.52s/it] +2025-02-05 23:03:51 - ERROR - stderr - +2025-02-05 23:03:51 - ERROR - stderr - +2025-02-05 23:03:51 - INFO - stdout - {'loss': 0.6911, 'grad_norm': 1.2511358261108398, 'learning_rate': 6.993706973311281e-06, 'epoch': 1.83} +2025-02-05 23:03:51 - ERROR - stderr - 61%|██████ | 13669/22434 [12:56:11<6:07:43, 2.52s/it] +2025-02-05 23:03:53 - ERROR - stderr - 61%|██████ | 13670/22434 [12:56:13<6:06:48, 2.51s/it] +2025-02-05 23:03:53 - ERROR - stderr - +2025-02-05 23:03:53 - ERROR - stderr - +2025-02-05 23:03:53 - INFO - stdout - {'loss': 0.6189, 'grad_norm': 1.1673377752304077, 'learning_rate': 6.992330044202547e-06, 'epoch': 1.83} +2025-02-05 23:03:53 - ERROR - stderr - 61%|██████ | 13670/22434 [12:56:13<6:06:48, 2.51s/it] +2025-02-05 23:03:56 - ERROR - stderr - 61%|██████ | 13671/22434 [12:56:16<6:06:38, 2.51s/it] +2025-02-05 23:03:56 - ERROR - stderr - +2025-02-05 23:03:56 - ERROR - stderr - +2025-02-05 23:03:56 - INFO - stdout - {'loss': 0.6787, 'grad_norm': 1.1722458600997925, 'learning_rate': 
6.990953177785818e-06, 'epoch': 1.83} +2025-02-05 23:03:56 - ERROR - stderr - 61%|██████ | 13671/22434 [12:56:16<6:06:38, 2.51s/it] +2025-02-05 23:03:59 - ERROR - stderr - 61%|██████ | 13672/22434 [12:56:18<6:10:09, 2.53s/it] +2025-02-05 23:03:59 - ERROR - stderr - +2025-02-05 23:03:59 - ERROR - stderr - +2025-02-05 23:03:59 - INFO - stdout - {'loss': 0.6828, 'grad_norm': 1.2142329216003418, 'learning_rate': 6.989576374089791e-06, 'epoch': 1.83} +2025-02-05 23:03:59 - ERROR - stderr - 61%|██████ | 13672/22434 [12:56:18<6:10:09, 2.53s/it] +2025-02-05 23:04:01 - ERROR - stderr - 61%|██████ | 13673/22434 [12:56:21<6:10:06, 2.53s/it] +2025-02-05 23:04:01 - ERROR - stderr - +2025-02-05 23:04:01 - ERROR - stderr - +2025-02-05 23:04:01 - INFO - stdout - {'loss': 0.7227, 'grad_norm': 1.3297072649002075, 'learning_rate': 6.98819963314316e-06, 'epoch': 1.83} +2025-02-05 23:04:01 - ERROR - stderr - 61%|██████ | 13673/22434 [12:56:21<6:10:06, 2.53s/it] +2025-02-05 23:04:04 - ERROR - stderr - 61%|██████ | 13674/22434 [12:56:23<6:10:40, 2.54s/it] +2025-02-05 23:04:04 - ERROR - stderr - +2025-02-05 23:04:04 - ERROR - stderr - +2025-02-05 23:04:04 - INFO - stdout - {'loss': 0.6723, 'grad_norm': 1.270232081413269, 'learning_rate': 6.986822954974631e-06, 'epoch': 1.83} +2025-02-05 23:04:04 - ERROR - stderr - 61%|██████ | 13674/22434 [12:56:23<6:10:40, 2.54s/it] +2025-02-05 23:04:06 - ERROR - stderr - 61%|██████ | 13675/22434 [12:56:26<6:18:23, 2.59s/it] +2025-02-05 23:04:06 - ERROR - stderr - +2025-02-05 23:04:06 - ERROR - stderr - +2025-02-05 23:04:06 - INFO - stdout - {'loss': 0.6511, 'grad_norm': 1.1608420610427856, 'learning_rate': 6.985446339612893e-06, 'epoch': 1.83} +2025-02-05 23:04:06 - ERROR - stderr - 61%|██████ | 13675/22434 [12:56:26<6:18:23, 2.59s/it] +2025-02-05 23:04:09 - ERROR - stderr - 61%|██████ | 13676/22434 [12:56:29<6:14:13, 2.56s/it] +2025-02-05 23:04:09 - ERROR - stderr - +2025-02-05 23:04:09 - ERROR - stderr - +2025-02-05 23:04:09 - INFO - stdout - {'loss': 
0.6425, 'grad_norm': 1.1201629638671875, 'learning_rate': 6.984069787086638e-06, 'epoch': 1.83} +2025-02-05 23:04:09 - ERROR - stderr - 61%|██████ | 13676/22434 [12:56:29<6:14:13, 2.56s/it] +2025-02-05 23:04:11 - ERROR - stderr - 61%|██████ | 13677/22434 [12:56:31<6:13:56, 2.56s/it] +2025-02-05 23:04:11 - ERROR - stderr - +2025-02-05 23:04:11 - ERROR - stderr - +2025-02-05 23:04:11 - INFO - stdout - {'loss': 0.7085, 'grad_norm': 1.262510061264038, 'learning_rate': 6.982693297424567e-06, 'epoch': 1.83} +2025-02-05 23:04:11 - ERROR - stderr - 61%|██████ | 13677/22434 [12:56:31<6:13:56, 2.56s/it] +2025-02-05 23:04:14 - ERROR - stderr - 61%|██████ | 13678/22434 [12:56:34<6:13:45, 2.56s/it] +2025-02-05 23:04:14 - ERROR - stderr - +2025-02-05 23:04:14 - ERROR - stderr - +2025-02-05 23:04:14 - INFO - stdout - {'loss': 0.5697, 'grad_norm': 1.1735036373138428, 'learning_rate': 6.981316870655361e-06, 'epoch': 1.83} +2025-02-05 23:04:14 - ERROR - stderr - 61%|██████ | 13678/22434 [12:56:34<6:13:45, 2.56s/it] +2025-02-05 23:04:16 - ERROR - stderr - 61%|██████ | 13679/22434 [12:56:36<6:08:05, 2.52s/it] +2025-02-05 23:04:16 - ERROR - stderr - +2025-02-05 23:04:16 - ERROR - stderr - +2025-02-05 23:04:16 - INFO - stdout - {'loss': 0.7104, 'grad_norm': 1.330461025238037, 'learning_rate': 6.97994050680772e-06, 'epoch': 1.83} +2025-02-05 23:04:16 - ERROR - stderr - 61%|██████ | 13679/22434 [12:56:36<6:08:05, 2.52s/it] +2025-02-05 23:04:19 - ERROR - stderr - 61%|██████ | 13680/22434 [12:56:39<6:17:38, 2.59s/it] +2025-02-05 23:04:19 - ERROR - stderr - +2025-02-05 23:04:19 - ERROR - stderr - +2025-02-05 23:04:19 - INFO - stdout - {'loss': 0.6836, 'grad_norm': 1.1854509115219116, 'learning_rate': 6.978564205910331e-06, 'epoch': 1.83} +2025-02-05 23:04:19 - ERROR - stderr - 61%|██████ | 13680/22434 [12:56:39<6:17:38, 2.59s/it] +2025-02-05 23:04:22 - ERROR - stderr - 61%|██████ | 13681/22434 [12:56:41<6:15:32, 2.57s/it] +2025-02-05 23:04:22 - ERROR - stderr - +2025-02-05 23:04:22 - ERROR - 
stderr - +2025-02-05 23:04:22 - INFO - stdout - {'loss': 0.6786, 'grad_norm': 1.1057363748550415, 'learning_rate': 6.9771879679918755e-06, 'epoch': 1.83} +2025-02-05 23:04:22 - ERROR - stderr - 61%|██████ | 13681/22434 [12:56:41<6:15:32, 2.57s/it] +2025-02-05 23:04:24 - ERROR - stderr - 61%|██████ | 13682/22434 [12:56:44<6:10:17, 2.54s/it] +2025-02-05 23:04:24 - ERROR - stderr - +2025-02-05 23:04:24 - ERROR - stderr - +2025-02-05 23:04:24 - INFO - stdout - {'loss': 0.7121, 'grad_norm': 1.2634400129318237, 'learning_rate': 6.9758117930810484e-06, 'epoch': 1.83} +2025-02-05 23:04:24 - ERROR - stderr - 61%|██████ | 13682/22434 [12:56:44<6:10:17, 2.54s/it] +2025-02-05 23:04:27 - ERROR - stderr - 61%|██████ | 13683/22434 [12:56:46<6:08:20, 2.53s/it] +2025-02-05 23:04:27 - ERROR - stderr - +2025-02-05 23:04:27 - ERROR - stderr - +2025-02-05 23:04:27 - INFO - stdout - {'loss': 0.7735, 'grad_norm': 1.3212573528289795, 'learning_rate': 6.974435681206526e-06, 'epoch': 1.83} +2025-02-05 23:04:27 - ERROR - stderr - 61%|██████ | 13683/22434 [12:56:46<6:08:20, 2.53s/it] +2025-02-05 23:04:29 - ERROR - stderr - 61%|██████ | 13684/22434 [12:56:49<6:04:56, 2.50s/it] +2025-02-05 23:04:29 - ERROR - stderr - +2025-02-05 23:04:29 - ERROR - stderr - +2025-02-05 23:04:29 - INFO - stdout - {'loss': 0.6034, 'grad_norm': 1.1935824155807495, 'learning_rate': 6.973059632397002e-06, 'epoch': 1.83} +2025-02-05 23:04:29 - ERROR - stderr - 61%|██████ | 13684/22434 [12:56:49<6:04:56, 2.50s/it] +2025-02-05 23:04:32 - ERROR - stderr - 61%|██████ | 13685/22434 [12:56:51<6:05:12, 2.50s/it] +2025-02-05 23:04:32 - ERROR - stderr - +2025-02-05 23:04:32 - ERROR - stderr - +2025-02-05 23:04:32 - INFO - stdout - {'loss': 0.6625, 'grad_norm': 1.194448471069336, 'learning_rate': 6.971683646681151e-06, 'epoch': 1.83} +2025-02-05 23:04:32 - ERROR - stderr - 61%|██████ | 13685/22434 [12:56:51<6:05:12, 2.50s/it] +2025-02-05 23:04:34 - ERROR - stderr - 61%|██████ | 13686/22434 [12:56:54<6:02:36, 2.49s/it] 
+2025-02-05 23:04:34 - ERROR - stderr - +2025-02-05 23:04:34 - ERROR - stderr - +2025-02-05 23:04:34 - INFO - stdout - {'loss': 0.6847, 'grad_norm': 1.137538194656372, 'learning_rate': 6.970307724087655e-06, 'epoch': 1.83} +2025-02-05 23:04:34 - ERROR - stderr - 61%|██████ | 13686/22434 [12:56:54<6:02:36, 2.49s/it] +2025-02-05 23:04:36 - ERROR - stderr - 61%|██████ | 13687/22434 [12:56:56<6:01:14, 2.48s/it] +2025-02-05 23:04:37 - ERROR - stderr - +2025-02-05 23:04:37 - ERROR - stderr - +2025-02-05 23:04:37 - INFO - stdout - {'loss': 0.651, 'grad_norm': 1.2208797931671143, 'learning_rate': 6.968931864645198e-06, 'epoch': 1.83} +2025-02-05 23:04:37 - ERROR - stderr - 61%|██████ | 13687/22434 [12:56:56<6:01:14, 2.48s/it] +2025-02-05 23:04:39 - ERROR - stderr - 61%|██████ | 13688/22434 [12:56:59<5:58:19, 2.46s/it] +2025-02-05 23:04:39 - ERROR - stderr - +2025-02-05 23:04:39 - ERROR - stderr - +2025-02-05 23:04:39 - INFO - stdout - {'loss': 0.6512, 'grad_norm': 1.3061879873275757, 'learning_rate': 6.967556068382457e-06, 'epoch': 1.83} +2025-02-05 23:04:39 - ERROR - stderr - 61%|██████ | 13688/22434 [12:56:59<5:58:19, 2.46s/it] +2025-02-05 23:04:41 - ERROR - stderr - 61%|██████ | 13689/22434 [12:57:01<6:02:53, 2.49s/it] +2025-02-05 23:04:41 - ERROR - stderr - +2025-02-05 23:04:41 - ERROR - stderr - +2025-02-05 23:04:41 - INFO - stdout - {'loss': 0.5641, 'grad_norm': 1.2173490524291992, 'learning_rate': 6.966180335328103e-06, 'epoch': 1.83} +2025-02-05 23:04:41 - ERROR - stderr - 61%|██████ | 13689/22434 [12:57:01<6:02:53, 2.49s/it] +2025-02-05 23:04:44 - ERROR - stderr - 61%|██████ | 13690/22434 [12:57:04<5:59:37, 2.47s/it] +2025-02-05 23:04:44 - ERROR - stderr - +2025-02-05 23:04:44 - ERROR - stderr - +2025-02-05 23:04:44 - INFO - stdout - {'loss': 0.6403, 'grad_norm': 1.3485435247421265, 'learning_rate': 6.964804665510823e-06, 'epoch': 1.83} +2025-02-05 23:04:44 - ERROR - stderr - 61%|██████ | 13690/22434 [12:57:04<5:59:37, 2.47s/it] +2025-02-05 23:04:46 - ERROR - 
stderr - 61%|██████ | 13691/22434 [12:57:06<5:59:05, 2.46s/it] +2025-02-05 23:04:46 - ERROR - stderr - +2025-02-05 23:04:46 - ERROR - stderr - +2025-02-05 23:04:46 - INFO - stdout - {'loss': 0.7385, 'grad_norm': 1.2749197483062744, 'learning_rate': 6.963429058959279e-06, 'epoch': 1.83} +2025-02-05 23:04:46 - ERROR - stderr - 61%|██████ | 13691/22434 [12:57:06<5:59:05, 2.46s/it] +2025-02-05 23:04:49 - ERROR - stderr - 61%|██████ | 13692/22434 [12:57:09<6:00:08, 2.47s/it] +2025-02-05 23:04:49 - ERROR - stderr - +2025-02-05 23:04:49 - ERROR - stderr - +2025-02-05 23:04:49 - INFO - stdout - {'loss': 0.6513, 'grad_norm': 1.3115544319152832, 'learning_rate': 6.962053515702154e-06, 'epoch': 1.83} +2025-02-05 23:04:49 - ERROR - stderr - 61%|██████ | 13692/22434 [12:57:09<6:00:08, 2.47s/it] +2025-02-05 23:04:51 - ERROR - stderr - 61%|██████ | 13693/22434 [12:57:11<5:59:54, 2.47s/it] +2025-02-05 23:04:51 - ERROR - stderr - +2025-02-05 23:04:51 - ERROR - stderr - +2025-02-05 23:04:51 - INFO - stdout - {'loss': 0.6193, 'grad_norm': 1.2639825344085693, 'learning_rate': 6.9606780357681184e-06, 'epoch': 1.83} +2025-02-05 23:04:51 - ERROR - stderr - 61%|██████ | 13693/22434 [12:57:11<5:59:54, 2.47s/it] +2025-02-05 23:04:54 - ERROR - stderr - 61%|██████ | 13694/22434 [12:57:14<6:01:34, 2.48s/it] +2025-02-05 23:04:54 - ERROR - stderr - +2025-02-05 23:04:54 - ERROR - stderr - +2025-02-05 23:04:54 - INFO - stdout - {'loss': 0.6445, 'grad_norm': 1.247612714767456, 'learning_rate': 6.9593026191858355e-06, 'epoch': 1.83} +2025-02-05 23:04:54 - ERROR - stderr - 61%|██████ | 13694/22434 [12:57:14<6:01:34, 2.48s/it] +2025-02-05 23:04:56 - ERROR - stderr - 61%|██████ | 13695/22434 [12:57:16<6:02:18, 2.49s/it] +2025-02-05 23:04:56 - ERROR - stderr - +2025-02-05 23:04:56 - ERROR - stderr - +2025-02-05 23:04:56 - INFO - stdout - {'loss': 0.6783, 'grad_norm': 1.2918758392333984, 'learning_rate': 6.9579272659839855e-06, 'epoch': 1.83} +2025-02-05 23:04:56 - ERROR - stderr - 61%|██████ | 
13695/22434 [12:57:16<6:02:18, 2.49s/it] +2025-02-05 23:04:59 - ERROR - stderr - 61%|██████ | 13696/22434 [12:57:19<6:06:01, 2.51s/it] +2025-02-05 23:04:59 - ERROR - stderr - +2025-02-05 23:04:59 - ERROR - stderr - +2025-02-05 23:04:59 - INFO - stdout - {'loss': 0.6188, 'grad_norm': 1.1177948713302612, 'learning_rate': 6.95655197619123e-06, 'epoch': 1.83} +2025-02-05 23:04:59 - ERROR - stderr - 61%|██████ | 13696/22434 [12:57:19<6:06:01, 2.51s/it] +2025-02-05 23:05:01 - ERROR - stderr - 61%|██████ | 13697/22434 [12:57:21<6:07:10, 2.52s/it] +2025-02-05 23:05:01 - ERROR - stderr - +2025-02-05 23:05:01 - ERROR - stderr - +2025-02-05 23:05:01 - INFO - stdout - {'loss': 0.7885, 'grad_norm': 1.2831132411956787, 'learning_rate': 6.955176749836232e-06, 'epoch': 1.83} +2025-02-05 23:05:01 - ERROR - stderr - 61%|██████ | 13697/22434 [12:57:21<6:07:10, 2.52s/it] +2025-02-05 23:05:04 - ERROR - stderr - 61%|██████ | 13698/22434 [12:57:24<6:04:44, 2.51s/it] +2025-02-05 23:05:04 - ERROR - stderr - +2025-02-05 23:05:04 - ERROR - stderr - +2025-02-05 23:05:04 - INFO - stdout - {'loss': 0.5719, 'grad_norm': 1.1410598754882812, 'learning_rate': 6.953801586947664e-06, 'epoch': 1.83} +2025-02-05 23:05:04 - ERROR - stderr - 61%|██████ | 13698/22434 [12:57:24<6:04:44, 2.51s/it] +2025-02-05 23:05:06 - ERROR - stderr - 61%|██████ | 13699/22434 [12:57:26<6:04:26, 2.50s/it] +2025-02-05 23:05:06 - ERROR - stderr - +2025-02-05 23:05:06 - ERROR - stderr - +2025-02-05 23:05:06 - INFO - stdout - {'loss': 0.7245, 'grad_norm': 1.2301900386810303, 'learning_rate': 6.952426487554185e-06, 'epoch': 1.83} +2025-02-05 23:05:06 - ERROR - stderr - 61%|██████ | 13699/22434 [12:57:26<6:04:26, 2.50s/it] +2025-02-05 23:05:09 - ERROR - stderr - 61%|██████ | 13700/22434 [12:57:29<6:01:09, 2.48s/it] +2025-02-05 23:05:09 - ERROR - stderr - +2025-02-05 23:05:09 - ERROR - stderr - +2025-02-05 23:05:09 - INFO - stdout - {'loss': 0.6626, 'grad_norm': 1.3630056381225586, 'learning_rate': 6.951051451684463e-06, 
'epoch': 1.83} +2025-02-05 23:05:09 - ERROR - stderr - 61%|██████ | 13700/22434 [12:57:29<6:01:09, 2.48s/it] +2025-02-05 23:05:11 - ERROR - stderr - 61%|██████ | 13701/22434 [12:57:31<6:00:16, 2.48s/it] +2025-02-05 23:05:11 - ERROR - stderr - +2025-02-05 23:05:11 - ERROR - stderr - +2025-02-05 23:05:11 - INFO - stdout - {'loss': 0.7305, 'grad_norm': 1.3991765975952148, 'learning_rate': 6.949676479367155e-06, 'epoch': 1.83} +2025-02-05 23:05:11 - ERROR - stderr - 61%|██████ | 13701/22434 [12:57:31<6:00:16, 2.48s/it] +2025-02-05 23:05:14 - ERROR - stderr - 61%|██████ | 13702/22434 [12:57:34<6:04:29, 2.50s/it] +2025-02-05 23:05:14 - ERROR - stderr - +2025-02-05 23:05:14 - ERROR - stderr - +2025-02-05 23:05:14 - INFO - stdout - {'loss': 0.6315, 'grad_norm': 1.256777286529541, 'learning_rate': 6.94830157063092e-06, 'epoch': 1.83} +2025-02-05 23:05:14 - ERROR - stderr - 61%|██████ | 13702/22434 [12:57:34<6:04:29, 2.50s/it] +2025-02-05 23:05:16 - ERROR - stderr - 61%|██████ | 13703/22434 [12:57:36<6:04:12, 2.50s/it] +2025-02-05 23:05:16 - ERROR - stderr - +2025-02-05 23:05:16 - ERROR - stderr - +2025-02-05 23:05:16 - INFO - stdout - {'loss': 0.6766, 'grad_norm': 1.2460036277770996, 'learning_rate': 6.9469267255044215e-06, 'epoch': 1.83} +2025-02-05 23:05:16 - ERROR - stderr - 61%|██████ | 13703/22434 [12:57:36<6:04:12, 2.50s/it] +2025-02-05 23:05:19 - ERROR - stderr - 61%|██████ | 13704/22434 [12:57:39<6:02:09, 2.49s/it] +2025-02-05 23:05:19 - ERROR - stderr - +2025-02-05 23:05:19 - ERROR - stderr - +2025-02-05 23:05:19 - INFO - stdout - {'loss': 0.582, 'grad_norm': 1.1540305614471436, 'learning_rate': 6.945551944016311e-06, 'epoch': 1.83} +2025-02-05 23:05:19 - ERROR - stderr - 61%|██████ | 13704/22434 [12:57:39<6:02:09, 2.49s/it] +2025-02-05 23:05:21 - ERROR - stderr - 61%|██████ | 13705/22434 [12:57:41<6:03:19, 2.50s/it] +2025-02-05 23:05:21 - ERROR - stderr - +2025-02-05 23:05:21 - ERROR - stderr - +2025-02-05 23:05:21 - INFO - stdout - {'loss': 0.694, 'grad_norm': 
1.2193268537521362, 'learning_rate': 6.944177226195247e-06, 'epoch': 1.83} +2025-02-05 23:05:21 - ERROR - stderr - 61%|██████ | 13705/22434 [12:57:41<6:03:19, 2.50s/it] +2025-02-05 23:05:24 - ERROR - stderr - 61%|██████ | 13706/22434 [12:57:44<6:02:49, 2.49s/it] +2025-02-05 23:05:24 - ERROR - stderr - +2025-02-05 23:05:24 - ERROR - stderr - +2025-02-05 23:05:24 - INFO - stdout - {'loss': 0.7757, 'grad_norm': 1.3453047275543213, 'learning_rate': 6.942802572069889e-06, 'epoch': 1.83} +2025-02-05 23:05:24 - ERROR - stderr - 61%|██████ | 13706/22434 [12:57:44<6:02:49, 2.49s/it] +2025-02-05 23:05:26 - ERROR - stderr - 61%|██████ | 13707/22434 [12:57:46<6:05:57, 2.52s/it] +2025-02-05 23:05:26 - ERROR - stderr - +2025-02-05 23:05:26 - ERROR - stderr - +2025-02-05 23:05:26 - INFO - stdout - {'loss': 0.7362, 'grad_norm': 1.258634328842163, 'learning_rate': 6.94142798166888e-06, 'epoch': 1.83} +2025-02-05 23:05:26 - ERROR - stderr - 61%|██████ | 13707/22434 [12:57:46<6:05:57, 2.52s/it] +2025-02-05 23:05:29 - ERROR - stderr - 61%|██████ | 13708/22434 [12:57:49<6:04:27, 2.51s/it] +2025-02-05 23:05:29 - ERROR - stderr - +2025-02-05 23:05:29 - ERROR - stderr - +2025-02-05 23:05:29 - INFO - stdout - {'loss': 0.6594, 'grad_norm': 1.2628198862075806, 'learning_rate': 6.940053455020883e-06, 'epoch': 1.83} +2025-02-05 23:05:29 - ERROR - stderr - 61%|██████ | 13708/22434 [12:57:49<6:04:27, 2.51s/it] +2025-02-05 23:05:31 - ERROR - stderr - 61%|██████ | 13709/22434 [12:57:51<6:04:18, 2.51s/it] +2025-02-05 23:05:31 - ERROR - stderr - +2025-02-05 23:05:31 - ERROR - stderr - +2025-02-05 23:05:31 - INFO - stdout - {'loss': 0.6597, 'grad_norm': 1.2385324239730835, 'learning_rate': 6.938678992154544e-06, 'epoch': 1.83} +2025-02-05 23:05:31 - ERROR - stderr - 61%|██████ | 13709/22434 [12:57:51<6:04:18, 2.51s/it] +2025-02-05 23:05:34 - ERROR - stderr - 61%|██████ | 13710/22434 [12:57:54<6:04:33, 2.51s/it] +2025-02-05 23:05:34 - ERROR - stderr - +2025-02-05 23:05:34 - ERROR - stderr - 
+2025-02-05 23:05:34 - INFO - stdout - {'loss': 0.7304, 'grad_norm': 1.3036601543426514, 'learning_rate': 6.937304593098509e-06, 'epoch': 1.83} +2025-02-05 23:05:34 - ERROR - stderr - 61%|██████ | 13710/22434 [12:57:54<6:04:33, 2.51s/it] +2025-02-05 23:05:36 - ERROR - stderr - 61%|██████ | 13711/22434 [12:57:56<6:06:13, 2.52s/it] +2025-02-05 23:05:36 - ERROR - stderr - +2025-02-05 23:05:36 - ERROR - stderr - +2025-02-05 23:05:36 - INFO - stdout - {'loss': 0.729, 'grad_norm': 1.2943599224090576, 'learning_rate': 6.935930257881429e-06, 'epoch': 1.83} +2025-02-05 23:05:36 - ERROR - stderr - 61%|██████ | 13711/22434 [12:57:56<6:06:13, 2.52s/it] +2025-02-05 23:05:39 - ERROR - stderr - 61%|██████ | 13712/22434 [12:57:59<6:03:26, 2.50s/it] +2025-02-05 23:05:39 - ERROR - stderr - +2025-02-05 23:05:39 - ERROR - stderr - +2025-02-05 23:05:39 - INFO - stdout - {'loss': 0.6259, 'grad_norm': 1.2706928253173828, 'learning_rate': 6.934555986531953e-06, 'epoch': 1.83} +2025-02-05 23:05:39 - ERROR - stderr - 61%|██████ | 13712/22434 [12:57:59<6:03:26, 2.50s/it] +2025-02-05 23:05:41 - ERROR - stderr - 61%|██████ | 13713/22434 [12:58:01<6:01:10, 2.48s/it] +2025-02-05 23:05:41 - ERROR - stderr - +2025-02-05 23:05:41 - ERROR - stderr - +2025-02-05 23:05:41 - INFO - stdout - {'loss': 0.6726, 'grad_norm': 1.2457811832427979, 'learning_rate': 6.933181779078722e-06, 'epoch': 1.83} +2025-02-05 23:05:41 - ERROR - stderr - 61%|██████ | 13713/22434 [12:58:01<6:01:10, 2.48s/it] +2025-02-05 23:05:44 - ERROR - stderr - 61%|██████ | 13714/22434 [12:58:04<6:01:06, 2.48s/it] +2025-02-05 23:05:44 - ERROR - stderr - +2025-02-05 23:05:44 - ERROR - stderr - +2025-02-05 23:05:44 - INFO - stdout - {'loss': 0.6841, 'grad_norm': 1.172239065170288, 'learning_rate': 6.9318076355503835e-06, 'epoch': 1.83} +2025-02-05 23:05:44 - ERROR - stderr - 61%|██████ | 13714/22434 [12:58:04<6:01:06, 2.48s/it] +2025-02-05 23:05:46 - ERROR - stderr - 61%|██████ | 13715/22434 [12:58:06<6:03:43, 2.50s/it] +2025-02-05 23:05:46 
- ERROR - stderr - +2025-02-05 23:05:46 - ERROR - stderr - +2025-02-05 23:05:46 - INFO - stdout - {'loss': 0.6165, 'grad_norm': 1.1760669946670532, 'learning_rate': 6.9304335559755766e-06, 'epoch': 1.83} +2025-02-05 23:05:46 - ERROR - stderr - 61%|██████ | 13715/22434 [12:58:06<6:03:43, 2.50s/it] +2025-02-05 23:05:49 - ERROR - stderr - 61%|██████ | 13716/22434 [12:58:09<6:01:13, 2.49s/it] +2025-02-05 23:05:49 - ERROR - stderr - +2025-02-05 23:05:49 - ERROR - stderr - +2025-02-05 23:05:49 - INFO - stdout - {'loss': 0.7124, 'grad_norm': 1.252285361289978, 'learning_rate': 6.929059540382948e-06, 'epoch': 1.83} +2025-02-05 23:05:49 - ERROR - stderr - 61%|██████ | 13716/22434 [12:58:09<6:01:13, 2.49s/it] +2025-02-05 23:05:51 - ERROR - stderr - 61%|██████ | 13717/22434 [12:58:11<6:00:58, 2.48s/it] +2025-02-05 23:05:51 - ERROR - stderr - +2025-02-05 23:05:51 - ERROR - stderr - +2025-02-05 23:05:51 - INFO - stdout - {'loss': 0.7055, 'grad_norm': 1.1872901916503906, 'learning_rate': 6.927685588801134e-06, 'epoch': 1.83} +2025-02-05 23:05:51 - ERROR - stderr - 61%|██████ | 13717/22434 [12:58:11<6:00:58, 2.48s/it] +2025-02-05 23:05:54 - ERROR - stderr - 61%|██████ | 13718/22434 [12:58:14<6:03:07, 2.50s/it] +2025-02-05 23:05:54 - ERROR - stderr - +2025-02-05 23:05:54 - ERROR - stderr - +2025-02-05 23:05:54 - INFO - stdout - {'loss': 0.6652, 'grad_norm': 1.217926025390625, 'learning_rate': 6.926311701258772e-06, 'epoch': 1.83} +2025-02-05 23:05:54 - ERROR - stderr - 61%|██████ | 13718/22434 [12:58:14<6:03:07, 2.50s/it] +2025-02-05 23:05:56 - ERROR - stderr - 61%|██████ | 13719/22434 [12:58:16<6:03:06, 2.50s/it] +2025-02-05 23:05:56 - ERROR - stderr - +2025-02-05 23:05:56 - ERROR - stderr - +2025-02-05 23:05:56 - INFO - stdout - {'loss': 0.6873, 'grad_norm': 1.1974453926086426, 'learning_rate': 6.924937877784505e-06, 'epoch': 1.83} +2025-02-05 23:05:56 - ERROR - stderr - 61%|██████ | 13719/22434 [12:58:16<6:03:06, 2.50s/it] +2025-02-05 23:05:59 - ERROR - stderr - 61%|██████ | 
13720/22434 [12:58:19<6:02:49, 2.50s/it] +2025-02-05 23:05:59 - ERROR - stderr - +2025-02-05 23:05:59 - ERROR - stderr - +2025-02-05 23:05:59 - INFO - stdout - {'loss': 0.7317, 'grad_norm': 1.280928611755371, 'learning_rate': 6.923564118406964e-06, 'epoch': 1.83} +2025-02-05 23:05:59 - ERROR - stderr - 61%|██████ | 13720/22434 [12:58:19<6:02:49, 2.50s/it] +2025-02-05 23:06:01 - ERROR - stderr - 61%|██████ | 13721/22434 [12:58:21<6:04:03, 2.51s/it] +2025-02-05 23:06:01 - ERROR - stderr - +2025-02-05 23:06:01 - ERROR - stderr - +2025-02-05 23:06:01 - INFO - stdout - {'loss': 0.7595, 'grad_norm': 1.5077660083770752, 'learning_rate': 6.9221904231547835e-06, 'epoch': 1.83} +2025-02-05 23:06:01 - ERROR - stderr - 61%|██████ | 13721/22434 [12:58:21<6:04:03, 2.51s/it] +2025-02-05 23:06:04 - ERROR - stderr - 61%|██████ | 13722/22434 [12:58:24<6:03:54, 2.51s/it] +2025-02-05 23:06:04 - ERROR - stderr - +2025-02-05 23:06:04 - ERROR - stderr - +2025-02-05 23:06:04 - INFO - stdout - {'loss': 0.6378, 'grad_norm': 1.321532130241394, 'learning_rate': 6.920816792056602e-06, 'epoch': 1.83} +2025-02-05 23:06:04 - ERROR - stderr - 61%|██████ | 13722/22434 [12:58:24<6:03:54, 2.51s/it] +2025-02-05 23:06:06 - ERROR - stderr - 61%|██████ | 13723/22434 [12:58:26<6:03:52, 2.51s/it] +2025-02-05 23:06:06 - ERROR - stderr - +2025-02-05 23:06:06 - ERROR - stderr - +2025-02-05 23:06:06 - INFO - stdout - {'loss': 0.712, 'grad_norm': 1.34261953830719, 'learning_rate': 6.919443225141043e-06, 'epoch': 1.84} +2025-02-05 23:06:06 - ERROR - stderr - 61%|██████ | 13723/22434 [12:58:26<6:03:52, 2.51s/it] +2025-02-05 23:06:09 - ERROR - stderr - 61%|██████ | 13724/22434 [12:58:29<6:02:20, 2.50s/it] +2025-02-05 23:06:09 - ERROR - stderr - +2025-02-05 23:06:09 - ERROR - stderr - +2025-02-05 23:06:09 - INFO - stdout - {'loss': 0.717, 'grad_norm': 1.2904306650161743, 'learning_rate': 6.9180697224367445e-06, 'epoch': 1.84} +2025-02-05 23:06:09 - ERROR - stderr - 61%|██████ | 13724/22434 [12:58:29<6:02:20, 
2.50s/it] +2025-02-05 23:06:11 - ERROR - stderr - 61%|██████ | 13725/22434 [12:58:31<6:04:21, 2.51s/it] +2025-02-05 23:06:11 - ERROR - stderr - +2025-02-05 23:06:11 - ERROR - stderr - +2025-02-05 23:06:11 - INFO - stdout - {'loss': 0.7283, 'grad_norm': 1.20167076587677, 'learning_rate': 6.916696283972335e-06, 'epoch': 1.84} +2025-02-05 23:06:11 - ERROR - stderr - 61%|██████ | 13725/22434 [12:58:31<6:04:21, 2.51s/it] +2025-02-05 23:06:14 - ERROR - stderr - 61%|██████ | 13726/22434 [12:58:34<6:03:29, 2.50s/it] +2025-02-05 23:06:14 - ERROR - stderr - +2025-02-05 23:06:14 - ERROR - stderr - +2025-02-05 23:06:14 - INFO - stdout - {'loss': 0.6731, 'grad_norm': 1.3282886743545532, 'learning_rate': 6.9153229097764375e-06, 'epoch': 1.84} +2025-02-05 23:06:14 - ERROR - stderr - 61%|██████ | 13726/22434 [12:58:34<6:03:29, 2.50s/it] +2025-02-05 23:06:16 - ERROR - stderr - 61%|██████ | 13727/22434 [12:58:36<6:01:46, 2.49s/it] +2025-02-05 23:06:16 - ERROR - stderr - +2025-02-05 23:06:16 - ERROR - stderr - +2025-02-05 23:06:16 - INFO - stdout - {'loss': 0.6773, 'grad_norm': 1.2034962177276611, 'learning_rate': 6.913949599877686e-06, 'epoch': 1.84} +2025-02-05 23:06:16 - ERROR - stderr - 61%|██████ | 13727/22434 [12:58:36<6:01:46, 2.49s/it] +2025-02-05 23:06:19 - ERROR - stderr - 61%|██████ | 13728/22434 [12:58:39<6:00:28, 2.48s/it] +2025-02-05 23:06:19 - ERROR - stderr - +2025-02-05 23:06:19 - ERROR - stderr - +2025-02-05 23:06:19 - INFO - stdout - {'loss': 0.6416, 'grad_norm': 1.2672545909881592, 'learning_rate': 6.912576354304703e-06, 'epoch': 1.84} +2025-02-05 23:06:19 - ERROR - stderr - 61%|██████ | 13728/22434 [12:58:39<6:00:28, 2.48s/it] +2025-02-05 23:06:21 - ERROR - stderr - 61%|██████ | 13729/22434 [12:58:41<6:01:44, 2.49s/it] +2025-02-05 23:06:21 - ERROR - stderr - +2025-02-05 23:06:21 - ERROR - stderr - +2025-02-05 23:06:21 - INFO - stdout - {'loss': 0.6043, 'grad_norm': 1.2087756395339966, 'learning_rate': 6.911203173086107e-06, 'epoch': 1.84} +2025-02-05 23:06:21 - 
ERROR - stderr - 61%|██████ | 13729/22434 [12:58:41<6:01:44, 2.49s/it] +2025-02-05 23:06:24 - ERROR - stderr - 61%|██████ | 13730/22434 [12:58:44<6:04:21, 2.51s/it] +2025-02-05 23:06:24 - ERROR - stderr - +2025-02-05 23:06:24 - ERROR - stderr - +2025-02-05 23:06:24 - INFO - stdout - {'loss': 0.6736, 'grad_norm': 1.2366011142730713, 'learning_rate': 6.909830056250527e-06, 'epoch': 1.84} +2025-02-05 23:06:24 - ERROR - stderr - 61%|██████ | 13730/22434 [12:58:44<6:04:21, 2.51s/it] +2025-02-05 23:06:26 - ERROR - stderr - 61%|██████ | 13731/22434 [12:58:46<6:06:40, 2.53s/it] +2025-02-05 23:06:26 - ERROR - stderr - +2025-02-05 23:06:26 - ERROR - stderr - +2025-02-05 23:06:26 - INFO - stdout - {'loss': 0.6493, 'grad_norm': 1.1904550790786743, 'learning_rate': 6.9084570038265805e-06, 'epoch': 1.84} +2025-02-05 23:06:26 - ERROR - stderr - 61%|██████ | 13731/22434 [12:58:46<6:06:40, 2.53s/it] +2025-02-05 23:06:29 - ERROR - stderr - 61%|██████ | 13732/22434 [12:58:49<6:05:06, 2.52s/it] +2025-02-05 23:06:29 - ERROR - stderr - +2025-02-05 23:06:29 - ERROR - stderr - +2025-02-05 23:06:29 - INFO - stdout - {'loss': 0.6944, 'grad_norm': 1.1627498865127563, 'learning_rate': 6.907084015842893e-06, 'epoch': 1.84} +2025-02-05 23:06:29 - ERROR - stderr - 61%|██████ | 13732/22434 [12:58:49<6:05:06, 2.52s/it] +2025-02-05 23:06:31 - ERROR - stderr - 61%|██████ | 13733/22434 [12:58:51<6:04:56, 2.52s/it] +2025-02-05 23:06:31 - ERROR - stderr - +2025-02-05 23:06:31 - ERROR - stderr - +2025-02-05 23:06:31 - INFO - stdout - {'loss': 0.6071, 'grad_norm': 1.2116574048995972, 'learning_rate': 6.905711092328081e-06, 'epoch': 1.84} +2025-02-05 23:06:31 - ERROR - stderr - 61%|██████ | 13733/22434 [12:58:51<6:04:56, 2.52s/it] +2025-02-05 23:06:34 - ERROR - stderr - 61%|██████ | 13734/22434 [12:58:54<6:00:09, 2.48s/it] +2025-02-05 23:06:34 - ERROR - stderr - +2025-02-05 23:06:34 - ERROR - stderr - +2025-02-05 23:06:34 - INFO - stdout - {'loss': 0.6926, 'grad_norm': 1.278102993965149, 'learning_rate': 
6.904338233310755e-06, 'epoch': 1.84} +2025-02-05 23:06:34 - ERROR - stderr - 61%|██████ | 13734/22434 [12:58:54<6:00:09, 2.48s/it] +2025-02-05 23:06:36 - ERROR - stderr - 61%|██████ | 13735/22434 [12:58:56<6:01:24, 2.49s/it] +2025-02-05 23:06:36 - ERROR - stderr - +2025-02-05 23:06:36 - ERROR - stderr - +2025-02-05 23:06:36 - INFO - stdout - {'loss': 0.6639, 'grad_norm': 1.17496657371521, 'learning_rate': 6.9029654388195425e-06, 'epoch': 1.84} +2025-02-05 23:06:36 - ERROR - stderr - 61%|██████ | 13735/22434 [12:58:56<6:01:24, 2.49s/it] +2025-02-05 23:06:39 - ERROR - stderr - 61%|██████ | 13736/22434 [12:58:59<6:01:09, 2.49s/it] +2025-02-05 23:06:39 - ERROR - stderr - +2025-02-05 23:06:39 - ERROR - stderr - +2025-02-05 23:06:39 - INFO - stdout - {'loss': 0.72, 'grad_norm': 1.4341068267822266, 'learning_rate': 6.901592708883047e-06, 'epoch': 1.84} +2025-02-05 23:06:39 - ERROR - stderr - 61%|██████ | 13736/22434 [12:58:59<6:01:09, 2.49s/it] +2025-02-05 23:06:41 - ERROR - stderr - 61%|██████ | 13737/22434 [12:59:01<6:02:42, 2.50s/it] +2025-02-05 23:06:41 - ERROR - stderr - +2025-02-05 23:06:41 - ERROR - stderr - +2025-02-05 23:06:41 - INFO - stdout - {'loss': 0.7138, 'grad_norm': 1.2902679443359375, 'learning_rate': 6.9002200435298864e-06, 'epoch': 1.84} +2025-02-05 23:06:41 - ERROR - stderr - 61%|██████ | 13737/22434 [12:59:01<6:02:42, 2.50s/it] +2025-02-05 23:06:44 - ERROR - stderr - 61%|██████ | 13738/22434 [12:59:04<6:00:37, 2.49s/it] +2025-02-05 23:06:44 - ERROR - stderr - +2025-02-05 23:06:44 - ERROR - stderr - +2025-02-05 23:06:44 - INFO - stdout - {'loss': 0.6908, 'grad_norm': 1.1837358474731445, 'learning_rate': 6.8988474427886765e-06, 'epoch': 1.84} +2025-02-05 23:06:44 - ERROR - stderr - 61%|██████ | 13738/22434 [12:59:04<6:00:37, 2.49s/it] +2025-02-05 23:06:46 - ERROR - stderr - 61%|██████ | 13739/22434 [12:59:06<6:00:23, 2.49s/it] +2025-02-05 23:06:46 - ERROR - stderr - +2025-02-05 23:06:46 - ERROR - stderr - +2025-02-05 23:06:46 - INFO - stdout - 
{'loss': 0.719, 'grad_norm': 1.3759571313858032, 'learning_rate': 6.89747490668802e-06, 'epoch': 1.84} +2025-02-05 23:06:46 - ERROR - stderr - 61%|██████ | 13739/22434 [12:59:06<6:00:23, 2.49s/it] +2025-02-05 23:06:49 - ERROR - stderr - 61%|██████ | 13740/22434 [12:59:09<6:02:49, 2.50s/it] +2025-02-05 23:06:49 - ERROR - stderr - +2025-02-05 23:06:49 - ERROR - stderr - +2025-02-05 23:06:49 - INFO - stdout - {'loss': 0.6806, 'grad_norm': 1.2355530261993408, 'learning_rate': 6.8961024352565345e-06, 'epoch': 1.84} +2025-02-05 23:06:49 - ERROR - stderr - 61%|██████ | 13740/22434 [12:59:09<6:02:49, 2.50s/it] +2025-02-05 23:06:51 - ERROR - stderr - 61%|██████▏ | 13741/22434 [12:59:11<6:02:50, 2.50s/it] +2025-02-05 23:06:51 - ERROR - stderr - +2025-02-05 23:06:51 - ERROR - stderr - +2025-02-05 23:06:51 - INFO - stdout - {'loss': 0.6159, 'grad_norm': 1.0711584091186523, 'learning_rate': 6.894730028522824e-06, 'epoch': 1.84} +2025-02-05 23:06:51 - ERROR - stderr - 61%|██████▏ | 13741/22434 [12:59:11<6:02:50, 2.50s/it] +2025-02-05 23:06:54 - ERROR - stderr - 61%|██████▏ | 13742/22434 [12:59:14<6:01:04, 2.49s/it] +2025-02-05 23:06:54 - ERROR - stderr - +2025-02-05 23:06:54 - ERROR - stderr - +2025-02-05 23:06:54 - INFO - stdout - {'loss': 0.5724, 'grad_norm': 1.0717856884002686, 'learning_rate': 6.89335768651549e-06, 'epoch': 1.84} +2025-02-05 23:06:54 - ERROR - stderr - 61%|██████▏ | 13742/22434 [12:59:14<6:01:04, 2.49s/it] +2025-02-05 23:06:56 - ERROR - stderr - 61%|██████▏ | 13743/22434 [12:59:16<5:59:06, 2.48s/it] +2025-02-05 23:06:56 - ERROR - stderr - +2025-02-05 23:06:56 - ERROR - stderr - +2025-02-05 23:06:56 - INFO - stdout - {'loss': 0.6236, 'grad_norm': 1.2146811485290527, 'learning_rate': 6.8919854092631445e-06, 'epoch': 1.84} +2025-02-05 23:06:56 - ERROR - stderr - 61%|██████▏ | 13743/22434 [12:59:16<5:59:06, 2.48s/it] +2025-02-05 23:06:59 - ERROR - stderr - 61%|██████▏ | 13744/22434 [12:59:19<5:59:11, 2.48s/it] +2025-02-05 23:06:59 - ERROR - stderr - +2025-02-05 
23:06:59 - ERROR - stderr - +2025-02-05 23:06:59 - INFO - stdout - {'loss': 0.6674, 'grad_norm': 1.3056007623672485, 'learning_rate': 6.8906131967943904e-06, 'epoch': 1.84} +2025-02-05 23:06:59 - ERROR - stderr - 61%|██████▏ | 13744/22434 [12:59:19<5:59:11, 2.48s/it] +2025-02-05 23:07:01 - ERROR - stderr - 61%|██████▏ | 13745/22434 [12:59:21<6:06:14, 2.53s/it] +2025-02-05 23:07:01 - ERROR - stderr - +2025-02-05 23:07:01 - ERROR - stderr - +2025-02-05 23:07:01 - INFO - stdout - {'loss': 0.7415, 'grad_norm': 1.5136709213256836, 'learning_rate': 6.889241049137825e-06, 'epoch': 1.84} +2025-02-05 23:07:01 - ERROR - stderr - 61%|██████▏ | 13745/22434 [12:59:21<6:06:14, 2.53s/it] +2025-02-05 23:07:04 - ERROR - stderr - 61%|██████▏ | 13746/22434 [12:59:24<6:04:52, 2.52s/it] +2025-02-05 23:07:04 - ERROR - stderr - +2025-02-05 23:07:04 - ERROR - stderr - +2025-02-05 23:07:04 - INFO - stdout - {'loss': 0.7823, 'grad_norm': 1.31511652469635, 'learning_rate': 6.887868966322058e-06, 'epoch': 1.84} +2025-02-05 23:07:04 - ERROR - stderr - 61%|██████▏ | 13746/22434 [12:59:24<6:04:52, 2.52s/it] +2025-02-05 23:07:06 - ERROR - stderr - 61%|██████▏ | 13747/22434 [12:59:26<6:03:08, 2.51s/it] +2025-02-05 23:07:06 - ERROR - stderr - +2025-02-05 23:07:06 - ERROR - stderr - +2025-02-05 23:07:06 - INFO - stdout - {'loss': 0.7212, 'grad_norm': 1.253554344177246, 'learning_rate': 6.886496948375681e-06, 'epoch': 1.84} +2025-02-05 23:07:06 - ERROR - stderr - 61%|██████▏ | 13747/22434 [12:59:26<6:03:08, 2.51s/it] +2025-02-05 23:07:09 - ERROR - stderr - 61%|██████▏ | 13748/22434 [12:59:29<6:02:43, 2.51s/it] +2025-02-05 23:07:09 - ERROR - stderr - +2025-02-05 23:07:09 - ERROR - stderr - +2025-02-05 23:07:09 - INFO - stdout - {'loss': 0.6649, 'grad_norm': 1.2178689241409302, 'learning_rate': 6.885124995327298e-06, 'epoch': 1.84} +2025-02-05 23:07:09 - ERROR - stderr - 61%|██████▏ | 13748/22434 [12:59:29<6:02:43, 2.51s/it] +2025-02-05 23:07:11 - ERROR - stderr - 61%|██████▏ | 13749/22434 
[12:59:31<5:59:32, 2.48s/it] +2025-02-05 23:07:11 - ERROR - stderr - +2025-02-05 23:07:11 - ERROR - stderr - +2025-02-05 23:07:11 - INFO - stdout - {'loss': 0.6419, 'grad_norm': 1.1900715827941895, 'learning_rate': 6.883753107205503e-06, 'epoch': 1.84} +2025-02-05 23:07:11 - ERROR - stderr - 61%|██████▏ | 13749/22434 [12:59:31<5:59:32, 2.48s/it] +2025-02-05 23:07:14 - ERROR - stderr - 61%|██████▏ | 13750/22434 [12:59:34<6:00:23, 2.49s/it] +2025-02-05 23:07:14 - ERROR - stderr - +2025-02-05 23:07:14 - ERROR - stderr - +2025-02-05 23:07:14 - INFO - stdout - {'loss': 0.707, 'grad_norm': 1.230186939239502, 'learning_rate': 6.8823812840388905e-06, 'epoch': 1.84} +2025-02-05 23:07:14 - ERROR - stderr - 61%|██████▏ | 13750/22434 [12:59:34<6:00:23, 2.49s/it] +2025-02-05 23:07:16 - ERROR - stderr - 61%|██████▏ | 13751/22434 [12:59:36<6:01:19, 2.50s/it] +2025-02-05 23:07:16 - ERROR - stderr - +2025-02-05 23:07:16 - ERROR - stderr - +2025-02-05 23:07:16 - INFO - stdout - {'loss': 0.7515, 'grad_norm': 1.429879069328308, 'learning_rate': 6.88100952585606e-06, 'epoch': 1.84} +2025-02-05 23:07:16 - ERROR - stderr - 61%|██████▏ | 13751/22434 [12:59:36<6:01:19, 2.50s/it] +2025-02-05 23:07:19 - ERROR - stderr - 61%|██████▏ | 13752/22434 [12:59:39<5:59:03, 2.48s/it] +2025-02-05 23:07:19 - ERROR - stderr - +2025-02-05 23:07:19 - ERROR - stderr - +2025-02-05 23:07:19 - INFO - stdout - {'loss': 0.6389, 'grad_norm': 1.1788370609283447, 'learning_rate': 6.879637832685603e-06, 'epoch': 1.84} +2025-02-05 23:07:19 - ERROR - stderr - 61%|██████▏ | 13752/22434 [12:59:39<5:59:03, 2.48s/it] +2025-02-05 23:07:21 - ERROR - stderr - 61%|██████▏ | 13753/22434 [12:59:41<5:58:53, 2.48s/it] +2025-02-05 23:07:21 - ERROR - stderr - +2025-02-05 23:07:21 - ERROR - stderr - +2025-02-05 23:07:21 - INFO - stdout - {'loss': 0.6463, 'grad_norm': 1.1367188692092896, 'learning_rate': 6.878266204556103e-06, 'epoch': 1.84} +2025-02-05 23:07:21 - ERROR - stderr - 61%|██████▏ | 13753/22434 [12:59:41<5:58:53, 
2.48s/it] +2025-02-05 23:07:24 - ERROR - stderr - 61%|██████▏ | 13754/22434 [12:59:43<5:59:16, 2.48s/it] +2025-02-05 23:07:24 - ERROR - stderr - +2025-02-05 23:07:24 - ERROR - stderr - +2025-02-05 23:07:24 - INFO - stdout - {'loss': 0.6379, 'grad_norm': 1.2369978427886963, 'learning_rate': 6.876894641496164e-06, 'epoch': 1.84} +2025-02-05 23:07:24 - ERROR - stderr - 61%|██████▏ | 13754/22434 [12:59:44<5:59:16, 2.48s/it] +2025-02-05 23:07:26 - ERROR - stderr - 61%|██████▏ | 13755/22434 [12:59:46<6:00:18, 2.49s/it] +2025-02-05 23:07:26 - ERROR - stderr - +2025-02-05 23:07:26 - ERROR - stderr - +2025-02-05 23:07:26 - INFO - stdout - {'loss': 0.6553, 'grad_norm': 1.207277774810791, 'learning_rate': 6.875523143534362e-06, 'epoch': 1.84} +2025-02-05 23:07:26 - ERROR - stderr - 61%|██████▏ | 13755/22434 [12:59:46<6:00:18, 2.49s/it] +2025-02-05 23:07:29 - ERROR - stderr - 61%|██████▏ | 13756/22434 [12:59:48<6:00:12, 2.49s/it] +2025-02-05 23:07:29 - ERROR - stderr - +2025-02-05 23:07:29 - ERROR - stderr - +2025-02-05 23:07:29 - INFO - stdout - {'loss': 0.6394, 'grad_norm': 1.2378968000411987, 'learning_rate': 6.874151710699293e-06, 'epoch': 1.84} +2025-02-05 23:07:29 - ERROR - stderr - 61%|██████▏ | 13756/22434 [12:59:49<6:00:12, 2.49s/it] +2025-02-05 23:07:31 - ERROR - stderr - 61%|██████▏ | 13757/22434 [12:59:51<6:00:40, 2.49s/it] +2025-02-05 23:07:31 - ERROR - stderr - +2025-02-05 23:07:31 - ERROR - stderr - +2025-02-05 23:07:31 - INFO - stdout - {'loss': 0.7289, 'grad_norm': 1.322172999382019, 'learning_rate': 6.87278034301954e-06, 'epoch': 1.84} +2025-02-05 23:07:31 - ERROR - stderr - 61%|██████▏ | 13757/22434 [12:59:51<6:00:40, 2.49s/it] +2025-02-05 23:07:34 - ERROR - stderr - 61%|██████▏ | 13758/22434 [12:59:53<5:58:32, 2.48s/it] +2025-02-05 23:07:34 - ERROR - stderr - +2025-02-05 23:07:34 - ERROR - stderr - +2025-02-05 23:07:34 - INFO - stdout - {'loss': 0.6874, 'grad_norm': 1.238092303276062, 'learning_rate': 6.871409040523686e-06, 'epoch': 1.84} +2025-02-05 
23:07:34 - ERROR - stderr - 61%|██████▏ | 13758/22434 [12:59:53<5:58:32, 2.48s/it] +2025-02-05 23:07:36 - ERROR - stderr - 61%|██████▏ | 13759/22434 [12:59:56<5:57:40, 2.47s/it] +2025-02-05 23:07:36 - ERROR - stderr - +2025-02-05 23:07:36 - ERROR - stderr - +2025-02-05 23:07:36 - INFO - stdout - {'loss': 0.7333, 'grad_norm': 1.2307052612304688, 'learning_rate': 6.870037803240321e-06, 'epoch': 1.84} +2025-02-05 23:07:36 - ERROR - stderr - 61%|██████▏ | 13759/22434 [12:59:56<5:57:40, 2.47s/it] +2025-02-05 23:07:39 - ERROR - stderr - 61%|██████▏ | 13760/22434 [12:59:58<5:58:19, 2.48s/it] +2025-02-05 23:07:39 - ERROR - stderr - +2025-02-05 23:07:39 - ERROR - stderr - +2025-02-05 23:07:39 - INFO - stdout - {'loss': 0.7039, 'grad_norm': 1.3050785064697266, 'learning_rate': 6.868666631198024e-06, 'epoch': 1.84} +2025-02-05 23:07:39 - ERROR - stderr - 61%|██████▏ | 13760/22434 [12:59:58<5:58:19, 2.48s/it] +2025-02-05 23:07:41 - ERROR - stderr - 61%|██████▏ | 13761/22434 [13:00:01<6:00:41, 2.50s/it] +2025-02-05 23:07:41 - ERROR - stderr - +2025-02-05 23:07:41 - ERROR - stderr - +2025-02-05 23:07:41 - INFO - stdout - {'loss': 0.694, 'grad_norm': 1.3142913579940796, 'learning_rate': 6.86729552442537e-06, 'epoch': 1.84} +2025-02-05 23:07:41 - ERROR - stderr - 61%|██████▏ | 13761/22434 [13:00:01<6:00:41, 2.50s/it] +2025-02-05 23:07:44 - ERROR - stderr - 61%|██████▏ | 13762/22434 [13:00:03<6:00:31, 2.49s/it] +2025-02-05 23:07:44 - ERROR - stderr - +2025-02-05 23:07:44 - ERROR - stderr - +2025-02-05 23:07:44 - INFO - stdout - {'loss': 0.6312, 'grad_norm': 1.2985808849334717, 'learning_rate': 6.8659244829509455e-06, 'epoch': 1.84} +2025-02-05 23:07:44 - ERROR - stderr - 61%|██████▏ | 13762/22434 [13:00:03<6:00:31, 2.49s/it] +2025-02-05 23:07:46 - ERROR - stderr - 61%|██████▏ | 13763/22434 [13:00:06<5:57:58, 2.48s/it] +2025-02-05 23:07:46 - ERROR - stderr - +2025-02-05 23:07:46 - ERROR - stderr - +2025-02-05 23:07:46 - INFO - stdout - {'loss': 0.6767, 'grad_norm': 
1.3340829610824585, 'learning_rate': 6.864553506803322e-06, 'epoch': 1.84} +2025-02-05 23:07:46 - ERROR - stderr - 61%|██████▏ | 13763/22434 [13:00:06<5:57:58, 2.48s/it] +2025-02-05 23:07:48 - ERROR - stderr - 61%|██████▏ | 13764/22434 [13:00:08<5:54:59, 2.46s/it] +2025-02-05 23:07:49 - ERROR - stderr - +2025-02-05 23:07:49 - ERROR - stderr - +2025-02-05 23:07:49 - INFO - stdout - {'loss': 0.6117, 'grad_norm': 1.222744345664978, 'learning_rate': 6.8631825960110866e-06, 'epoch': 1.84} +2025-02-05 23:07:49 - ERROR - stderr - 61%|██████▏ | 13764/22434 [13:00:08<5:54:59, 2.46s/it] +2025-02-05 23:07:51 - ERROR - stderr - 61%|██████▏ | 13765/22434 [13:00:11<5:58:22, 2.48s/it] +2025-02-05 23:07:51 - ERROR - stderr - +2025-02-05 23:07:51 - ERROR - stderr - +2025-02-05 23:07:51 - INFO - stdout - {'loss': 0.7273, 'grad_norm': 1.5633609294891357, 'learning_rate': 6.861811750602807e-06, 'epoch': 1.84} +2025-02-05 23:07:51 - ERROR - stderr - 61%|██████▏ | 13765/22434 [13:00:11<5:58:22, 2.48s/it] +2025-02-05 23:07:54 - ERROR - stderr - 61%|██████▏ | 13766/22434 [13:00:13<6:02:20, 2.51s/it] +2025-02-05 23:07:54 - ERROR - stderr - +2025-02-05 23:07:54 - ERROR - stderr - +2025-02-05 23:07:54 - INFO - stdout - {'loss': 0.5668, 'grad_norm': 1.214248776435852, 'learning_rate': 6.8604409706070556e-06, 'epoch': 1.84} +2025-02-05 23:07:54 - ERROR - stderr - 61%|██████▏ | 13766/22434 [13:00:13<6:02:20, 2.51s/it] +2025-02-05 23:07:56 - ERROR - stderr - 61%|██████▏ | 13767/22434 [13:00:16<6:05:56, 2.53s/it] +2025-02-05 23:07:56 - ERROR - stderr - +2025-02-05 23:07:56 - ERROR - stderr - +2025-02-05 23:07:56 - INFO - stdout - {'loss': 0.6565, 'grad_norm': 1.219557762145996, 'learning_rate': 6.859070256052412e-06, 'epoch': 1.84} +2025-02-05 23:07:56 - ERROR - stderr - 61%|██████▏ | 13767/22434 [13:00:16<6:05:56, 2.53s/it] +2025-02-05 23:07:59 - ERROR - stderr - 61%|██████▏ | 13768/22434 [13:00:19<6:13:49, 2.59s/it] +2025-02-05 23:07:59 - ERROR - stderr - +2025-02-05 23:07:59 - ERROR - stderr - 
+2025-02-05 23:07:59 - INFO - stdout - {'loss': 0.6715, 'grad_norm': 1.1539026498794556, 'learning_rate': 6.857699606967439e-06, 'epoch': 1.84} +2025-02-05 23:07:59 - ERROR - stderr - 61%|██████▏ | 13768/22434 [13:00:19<6:13:49, 2.59s/it] +2025-02-05 23:08:01 - ERROR - stderr - 61%|██████▏ | 13769/22434 [13:00:21<6:11:11, 2.57s/it] +2025-02-05 23:08:01 - ERROR - stderr - +2025-02-05 23:08:01 - ERROR - stderr - +2025-02-05 23:08:01 - INFO - stdout - {'loss': 0.6734, 'grad_norm': 1.203932762145996, 'learning_rate': 6.856329023380712e-06, 'epoch': 1.84} +2025-02-05 23:08:01 - ERROR - stderr - 61%|██████▏ | 13769/22434 [13:00:21<6:11:11, 2.57s/it] +2025-02-05 23:08:04 - ERROR - stderr - 61%|██████▏ | 13770/22434 [13:00:24<6:04:29, 2.52s/it] +2025-02-05 23:08:04 - ERROR - stderr - +2025-02-05 23:08:04 - ERROR - stderr - +2025-02-05 23:08:04 - INFO - stdout - {'loss': 0.7215, 'grad_norm': 1.296655297279358, 'learning_rate': 6.854958505320801e-06, 'epoch': 1.84} +2025-02-05 23:08:04 - ERROR - stderr - 61%|██████▏ | 13770/22434 [13:00:24<6:04:29, 2.52s/it] +2025-02-05 23:08:06 - ERROR - stderr - 61%|██████▏ | 13771/22434 [13:00:26<6:02:37, 2.51s/it] +2025-02-05 23:08:06 - ERROR - stderr - +2025-02-05 23:08:06 - ERROR - stderr - +2025-02-05 23:08:06 - INFO - stdout - {'loss': 0.7093, 'grad_norm': 1.2001996040344238, 'learning_rate': 6.853588052816267e-06, 'epoch': 1.84} +2025-02-05 23:08:06 - ERROR - stderr - 61%|██████▏ | 13771/22434 [13:00:26<6:02:37, 2.51s/it] +2025-02-05 23:08:09 - ERROR - stderr - 61%|██████▏ | 13772/22434 [13:00:29<6:01:04, 2.50s/it] +2025-02-05 23:08:09 - ERROR - stderr - +2025-02-05 23:08:09 - ERROR - stderr - +2025-02-05 23:08:09 - INFO - stdout - {'loss': 0.755, 'grad_norm': 1.3656059503555298, 'learning_rate': 6.852217665895682e-06, 'epoch': 1.84} +2025-02-05 23:08:09 - ERROR - stderr - 61%|██████▏ | 13772/22434 [13:00:29<6:01:04, 2.50s/it] +2025-02-05 23:08:12 - ERROR - stderr - 61%|██████▏ | 13773/22434 [13:00:31<6:11:09, 2.57s/it] +2025-02-05 
23:08:12 - ERROR - stderr - +2025-02-05 23:08:12 - ERROR - stderr - +2025-02-05 23:08:12 - INFO - stdout - {'loss': 0.6602, 'grad_norm': 1.1055585145950317, 'learning_rate': 6.850847344587607e-06, 'epoch': 1.84} +2025-02-05 23:08:12 - ERROR - stderr - 61%|██████▏ | 13773/22434 [13:00:31<6:11:09, 2.57s/it] +2025-02-05 23:08:14 - ERROR - stderr - 61%|██████▏ | 13774/22434 [13:00:34<6:04:41, 2.53s/it] +2025-02-05 23:08:14 - ERROR - stderr - +2025-02-05 23:08:14 - ERROR - stderr - +2025-02-05 23:08:14 - INFO - stdout - {'loss': 0.6291, 'grad_norm': 1.1360492706298828, 'learning_rate': 6.849477088920604e-06, 'epoch': 1.84} +2025-02-05 23:08:14 - ERROR - stderr - 61%|██████▏ | 13774/22434 [13:00:34<6:04:41, 2.53s/it] +2025-02-05 23:08:17 - ERROR - stderr - 61%|██████▏ | 13775/22434 [13:00:36<6:05:13, 2.53s/it] +2025-02-05 23:08:17 - ERROR - stderr - +2025-02-05 23:08:17 - ERROR - stderr - +2025-02-05 23:08:17 - INFO - stdout - {'loss': 0.6174, 'grad_norm': 1.3937456607818604, 'learning_rate': 6.848106898923238e-06, 'epoch': 1.84} +2025-02-05 23:08:17 - ERROR - stderr - 61%|██████▏ | 13775/22434 [13:00:36<6:05:13, 2.53s/it] +2025-02-05 23:08:19 - ERROR - stderr - 61%|██████▏ | 13776/22434 [13:00:39<6:02:27, 2.51s/it] +2025-02-05 23:08:19 - ERROR - stderr - +2025-02-05 23:08:19 - ERROR - stderr - +2025-02-05 23:08:19 - INFO - stdout - {'loss': 0.6379, 'grad_norm': 1.205394983291626, 'learning_rate': 6.846736774624066e-06, 'epoch': 1.84} +2025-02-05 23:08:19 - ERROR - stderr - 61%|██████▏ | 13776/22434 [13:00:39<6:02:27, 2.51s/it] +2025-02-05 23:08:21 - ERROR - stderr - 61%|██████▏ | 13777/22434 [13:00:41<6:02:31, 2.51s/it] +2025-02-05 23:08:22 - ERROR - stderr - +2025-02-05 23:08:22 - ERROR - stderr - +2025-02-05 23:08:22 - INFO - stdout - {'loss': 0.5956, 'grad_norm': 1.0102508068084717, 'learning_rate': 6.845366716051651e-06, 'epoch': 1.84} +2025-02-05 23:08:22 - ERROR - stderr - 61%|██████▏ | 13777/22434 [13:00:41<6:02:31, 2.51s/it] +2025-02-05 23:08:24 - ERROR - stderr 
- 61%|██████▏ | 13778/22434 [13:00:44<6:00:00, 2.50s/it] +2025-02-05 23:08:24 - ERROR - stderr - +2025-02-05 23:08:24 - ERROR - stderr - +2025-02-05 23:08:24 - INFO - stdout - {'loss': 0.6833, 'grad_norm': 1.5055598020553589, 'learning_rate': 6.843996723234549e-06, 'epoch': 1.84} +2025-02-05 23:08:24 - ERROR - stderr - 61%|██████▏ | 13778/22434 [13:00:44<6:00:00, 2.50s/it] +2025-02-05 23:08:27 - ERROR - stderr - 61%|██████▏ | 13779/22434 [13:00:46<6:02:49, 2.52s/it] +2025-02-05 23:08:27 - ERROR - stderr - +2025-02-05 23:08:27 - ERROR - stderr - +2025-02-05 23:08:27 - INFO - stdout - {'loss': 0.6269, 'grad_norm': 1.145379662513733, 'learning_rate': 6.842626796201311e-06, 'epoch': 1.84} +2025-02-05 23:08:27 - ERROR - stderr - 61%|██████▏ | 13779/22434 [13:00:46<6:02:49, 2.52s/it] +2025-02-05 23:08:29 - ERROR - stderr - 61%|██████▏ | 13780/22434 [13:00:49<6:00:41, 2.50s/it] +2025-02-05 23:08:29 - ERROR - stderr - +2025-02-05 23:08:29 - ERROR - stderr - +2025-02-05 23:08:29 - INFO - stdout - {'loss': 0.687, 'grad_norm': 1.3151395320892334, 'learning_rate': 6.841256934980501e-06, 'epoch': 1.84} +2025-02-05 23:08:29 - ERROR - stderr - 61%|██████▏ | 13780/22434 [13:00:49<6:00:41, 2.50s/it] +2025-02-05 23:08:32 - ERROR - stderr - 61%|██████▏ | 13781/22434 [13:00:51<6:02:04, 2.51s/it] +2025-02-05 23:08:32 - ERROR - stderr - +2025-02-05 23:08:32 - ERROR - stderr - +2025-02-05 23:08:32 - INFO - stdout - {'loss': 0.7001, 'grad_norm': 1.1371145248413086, 'learning_rate': 6.839887139600664e-06, 'epoch': 1.84} +2025-02-05 23:08:32 - ERROR - stderr - 61%|██████▏ | 13781/22434 [13:00:51<6:02:04, 2.51s/it] +2025-02-05 23:08:34 - ERROR - stderr - 61%|██████▏ | 13782/22434 [13:00:54<6:04:47, 2.53s/it] +2025-02-05 23:08:34 - ERROR - stderr - +2025-02-05 23:08:34 - ERROR - stderr - +2025-02-05 23:08:34 - INFO - stdout - {'loss': 0.7475, 'grad_norm': 1.32063889503479, 'learning_rate': 6.838517410090355e-06, 'epoch': 1.84} +2025-02-05 23:08:34 - ERROR - stderr - 61%|██████▏ | 13782/22434 
[13:00:54<6:04:47, 2.53s/it] +2025-02-05 23:08:37 - ERROR - stderr - 61%|██████▏ | 13783/22434 [13:00:56<6:01:57, 2.51s/it] +2025-02-05 23:08:37 - ERROR - stderr - +2025-02-05 23:08:37 - ERROR - stderr - +2025-02-05 23:08:37 - INFO - stdout - {'loss': 0.6264, 'grad_norm': 1.152679681777954, 'learning_rate': 6.8371477464781276e-06, 'epoch': 1.84} +2025-02-05 23:08:37 - ERROR - stderr - 61%|██████▏ | 13783/22434 [13:00:56<6:01:57, 2.51s/it] +2025-02-05 23:08:39 - ERROR - stderr - 61%|██████▏ | 13784/22434 [13:00:59<5:59:18, 2.49s/it] +2025-02-05 23:08:39 - ERROR - stderr - +2025-02-05 23:08:39 - ERROR - stderr - +2025-02-05 23:08:39 - INFO - stdout - {'loss': 0.6867, 'grad_norm': 1.4427374601364136, 'learning_rate': 6.835778148792527e-06, 'epoch': 1.84} +2025-02-05 23:08:39 - ERROR - stderr - 61%|██████▏ | 13784/22434 [13:00:59<5:59:18, 2.49s/it] +2025-02-05 23:08:41 - ERROR - stderr - 61%|██████▏ | 13785/22434 [13:01:01<5:56:07, 2.47s/it] +2025-02-05 23:08:41 - ERROR - stderr - +2025-02-05 23:08:41 - ERROR - stderr - +2025-02-05 23:08:41 - INFO - stdout - {'loss': 0.69, 'grad_norm': 1.2832494974136353, 'learning_rate': 6.834408617062107e-06, 'epoch': 1.84} +2025-02-05 23:08:41 - ERROR - stderr - 61%|██████▏ | 13785/22434 [13:01:01<5:56:07, 2.47s/it] +2025-02-05 23:08:44 - ERROR - stderr - 61%|██████▏ | 13786/22434 [13:01:04<5:56:23, 2.47s/it] +2025-02-05 23:08:44 - ERROR - stderr - +2025-02-05 23:08:44 - ERROR - stderr - +2025-02-05 23:08:44 - INFO - stdout - {'loss': 0.6548, 'grad_norm': 1.2268489599227905, 'learning_rate': 6.8330391513154095e-06, 'epoch': 1.84} +2025-02-05 23:08:44 - ERROR - stderr - 61%|██████▏ | 13786/22434 [13:01:04<5:56:23, 2.47s/it] +2025-02-05 23:08:46 - ERROR - stderr - 61%|██████▏ | 13787/22434 [13:01:06<5:53:53, 2.46s/it] +2025-02-05 23:08:46 - ERROR - stderr - +2025-02-05 23:08:46 - ERROR - stderr - +2025-02-05 23:08:46 - INFO - stdout - {'loss': 0.6479, 'grad_norm': 1.129612922668457, 'learning_rate': 6.831669751580976e-06, 'epoch': 
1.84} +2025-02-05 23:08:46 - ERROR - stderr - 61%|██████▏ | 13787/22434 [13:01:06<5:53:53, 2.46s/it] +2025-02-05 23:08:49 - ERROR - stderr - 61%|██████▏ | 13788/22434 [13:01:09<5:54:47, 2.46s/it] +2025-02-05 23:08:49 - ERROR - stderr - +2025-02-05 23:08:49 - ERROR - stderr - +2025-02-05 23:08:49 - INFO - stdout - {'loss': 0.5958, 'grad_norm': 1.193387508392334, 'learning_rate': 6.8303004178873566e-06, 'epoch': 1.84} +2025-02-05 23:08:49 - ERROR - stderr - 61%|██████▏ | 13788/22434 [13:01:09<5:54:47, 2.46s/it] +2025-02-05 23:08:51 - ERROR - stderr - 61%|██████▏ | 13789/22434 [13:01:11<5:53:06, 2.45s/it] +2025-02-05 23:08:51 - ERROR - stderr - +2025-02-05 23:08:51 - ERROR - stderr - +2025-02-05 23:08:51 - INFO - stdout - {'loss': 0.7172, 'grad_norm': 1.270899772644043, 'learning_rate': 6.828931150263095e-06, 'epoch': 1.84} +2025-02-05 23:08:51 - ERROR - stderr - 61%|██████▏ | 13789/22434 [13:01:11<5:53:06, 2.45s/it] +2025-02-05 23:08:54 - ERROR - stderr - 61%|██████▏ | 13790/22434 [13:01:13<5:55:50, 2.47s/it] +2025-02-05 23:08:54 - ERROR - stderr - +2025-02-05 23:08:54 - ERROR - stderr - +2025-02-05 23:08:54 - INFO - stdout - {'loss': 0.5916, 'grad_norm': 1.079564094543457, 'learning_rate': 6.827561948736725e-06, 'epoch': 1.84} +2025-02-05 23:08:54 - ERROR - stderr - 61%|██████▏ | 13790/22434 [13:01:14<5:55:50, 2.47s/it] +2025-02-05 23:08:56 - ERROR - stderr - 61%|██████▏ | 13791/22434 [13:01:16<5:54:16, 2.46s/it] +2025-02-05 23:08:56 - ERROR - stderr - +2025-02-05 23:08:56 - ERROR - stderr - +2025-02-05 23:08:56 - INFO - stdout - {'loss': 0.6844, 'grad_norm': 1.2091145515441895, 'learning_rate': 6.826192813336794e-06, 'epoch': 1.84} +2025-02-05 23:08:56 - ERROR - stderr - 61%|██████▏ | 13791/22434 [13:01:16<5:54:16, 2.46s/it] +2025-02-05 23:08:59 - ERROR - stderr - 61%|██████▏ | 13792/22434 [13:01:18<5:56:19, 2.47s/it] +2025-02-05 23:08:59 - ERROR - stderr - +2025-02-05 23:08:59 - ERROR - stderr - +2025-02-05 23:08:59 - INFO - stdout - {'loss': 0.61, 'grad_norm': 
1.1348251104354858, 'learning_rate': 6.824823744091833e-06, 'epoch': 1.84} +2025-02-05 23:08:59 - ERROR - stderr - 61%|██████▏ | 13792/22434 [13:01:18<5:56:19, 2.47s/it] +2025-02-05 23:09:01 - ERROR - stderr - 61%|██████▏ | 13793/22434 [13:01:21<5:56:16, 2.47s/it] +2025-02-05 23:09:01 - ERROR - stderr - +2025-02-05 23:09:01 - ERROR - stderr - +2025-02-05 23:09:01 - INFO - stdout - {'loss': 0.6961, 'grad_norm': 1.3358339071273804, 'learning_rate': 6.8234547410303865e-06, 'epoch': 1.84} +2025-02-05 23:09:01 - ERROR - stderr - 61%|██████▏ | 13793/22434 [13:01:21<5:56:16, 2.47s/it] +2025-02-05 23:09:04 - ERROR - stderr - 61%|██████▏ | 13794/22434 [13:01:23<5:53:49, 2.46s/it] +2025-02-05 23:09:04 - ERROR - stderr - +2025-02-05 23:09:04 - ERROR - stderr - +2025-02-05 23:09:04 - INFO - stdout - {'loss': 0.7367, 'grad_norm': 1.2482625246047974, 'learning_rate': 6.822085804180985e-06, 'epoch': 1.84} +2025-02-05 23:09:04 - ERROR - stderr - 61%|██████▏ | 13794/22434 [13:01:23<5:53:49, 2.46s/it] +2025-02-05 23:09:06 - ERROR - stderr - 61%|██████▏ | 13795/22434 [13:01:26<5:53:47, 2.46s/it] +2025-02-05 23:09:06 - ERROR - stderr - +2025-02-05 23:09:06 - ERROR - stderr - +2025-02-05 23:09:06 - INFO - stdout - {'loss': 0.587, 'grad_norm': 1.19967520236969, 'learning_rate': 6.820716933572162e-06, 'epoch': 1.84} +2025-02-05 23:09:06 - ERROR - stderr - 61%|██████▏ | 13795/22434 [13:01:26<5:53:47, 2.46s/it] +2025-02-05 23:09:09 - ERROR - stderr - 61%|██████▏ | 13796/22434 [13:01:28<5:57:11, 2.48s/it] +2025-02-05 23:09:09 - ERROR - stderr - +2025-02-05 23:09:09 - ERROR - stderr - +2025-02-05 23:09:09 - INFO - stdout - {'loss': 0.6952, 'grad_norm': 1.3269333839416504, 'learning_rate': 6.819348129232456e-06, 'epoch': 1.84} +2025-02-05 23:09:09 - ERROR - stderr - 61%|██████▏ | 13796/22434 [13:01:28<5:57:11, 2.48s/it] +2025-02-05 23:09:11 - ERROR - stderr - 62%|██████▏ | 13797/22434 [13:01:31<6:07:34, 2.55s/it] +2025-02-05 23:09:11 - ERROR - stderr - +2025-02-05 23:09:11 - ERROR - stderr - 
+2025-02-05 23:09:11 - INFO - stdout - {'loss': 0.7801, 'grad_norm': 1.3255964517593384, 'learning_rate': 6.8179793911903945e-06, 'epoch': 1.85} +2025-02-05 23:09:11 - ERROR - stderr - 62%|██████▏ | 13797/22434 [13:01:31<6:07:34, 2.55s/it] +2025-02-05 23:09:14 - ERROR - stderr - 62%|██████▏ | 13798/22434 [13:01:34<6:06:09, 2.54s/it] +2025-02-05 23:09:14 - ERROR - stderr - +2025-02-05 23:09:14 - ERROR - stderr - +2025-02-05 23:09:14 - INFO - stdout - {'loss': 0.5981, 'grad_norm': 1.1533595323562622, 'learning_rate': 6.816610719474503e-06, 'epoch': 1.85} +2025-02-05 23:09:14 - ERROR - stderr - 62%|██████▏ | 13798/22434 [13:01:34<6:06:09, 2.54s/it] +2025-02-05 23:09:16 - ERROR - stderr - 62%|██████▏ | 13799/22434 [13:01:36<6:05:08, 2.54s/it] +2025-02-05 23:09:16 - ERROR - stderr - +2025-02-05 23:09:16 - ERROR - stderr - +2025-02-05 23:09:16 - INFO - stdout - {'loss': 0.7172, 'grad_norm': 1.2958546876907349, 'learning_rate': 6.815242114113321e-06, 'epoch': 1.85} +2025-02-05 23:09:16 - ERROR - stderr - 62%|██████▏ | 13799/22434 [13:01:36<6:05:08, 2.54s/it] +2025-02-05 23:09:19 - ERROR - stderr - 62%|██████▏ | 13800/22434 [13:01:39<6:04:14, 2.53s/it] +2025-02-05 23:09:19 - ERROR - stderr - +2025-02-05 23:09:19 - ERROR - stderr - +2025-02-05 23:09:19 - INFO - stdout - {'loss': 0.681, 'grad_norm': 1.1685703992843628, 'learning_rate': 6.813873575135363e-06, 'epoch': 1.85} +2025-02-05 23:09:19 - ERROR - stderr - 62%|██████▏ | 13800/22434 [13:01:39<6:04:14, 2.53s/it] +2025-02-05 23:09:21 - ERROR - stderr - 62%|██████▏ | 13801/22434 [13:01:41<6:03:16, 2.52s/it] +2025-02-05 23:09:21 - ERROR - stderr - +2025-02-05 23:09:21 - ERROR - stderr - +2025-02-05 23:09:21 - INFO - stdout - {'loss': 0.6369, 'grad_norm': 1.1234225034713745, 'learning_rate': 6.812505102569164e-06, 'epoch': 1.85} +2025-02-05 23:09:21 - ERROR - stderr - 62%|██████▏ | 13801/22434 [13:01:41<6:03:16, 2.52s/it] +2025-02-05 23:09:24 - ERROR - stderr - 62%|██████▏ | 13802/22434 [13:01:44<6:00:03, 2.50s/it] 
+2025-02-05 23:09:24 - ERROR - stderr - +2025-02-05 23:09:24 - ERROR - stderr - +2025-02-05 23:09:24 - INFO - stdout - {'loss': 0.6346, 'grad_norm': 1.3002102375030518, 'learning_rate': 6.81113669644325e-06, 'epoch': 1.85} +2025-02-05 23:09:24 - ERROR - stderr - 62%|██████▏ | 13802/22434 [13:01:44<6:00:03, 2.50s/it] +2025-02-05 23:09:26 - ERROR - stderr - 62%|██████▏ | 13803/22434 [13:01:46<5:57:24, 2.48s/it] +2025-02-05 23:09:26 - ERROR - stderr - +2025-02-05 23:09:26 - ERROR - stderr - +2025-02-05 23:09:26 - INFO - stdout - {'loss': 0.692, 'grad_norm': 1.258652687072754, 'learning_rate': 6.809768356786135e-06, 'epoch': 1.85} +2025-02-05 23:09:26 - ERROR - stderr - 62%|██████▏ | 13803/22434 [13:01:46<5:57:24, 2.48s/it] +2025-02-05 23:09:29 - ERROR - stderr - 62%|██████▏ | 13804/22434 [13:01:48<5:56:23, 2.48s/it] +2025-02-05 23:09:29 - ERROR - stderr - +2025-02-05 23:09:29 - ERROR - stderr - +2025-02-05 23:09:29 - INFO - stdout - {'loss': 0.6703, 'grad_norm': 1.2245444059371948, 'learning_rate': 6.80840008362635e-06, 'epoch': 1.85} +2025-02-05 23:09:29 - ERROR - stderr - 62%|██████▏ | 13804/22434 [13:01:49<5:56:23, 2.48s/it] +2025-02-05 23:09:31 - ERROR - stderr - 62%|██████▏ | 13805/22434 [13:01:51<5:57:03, 2.48s/it] +2025-02-05 23:09:31 - ERROR - stderr - +2025-02-05 23:09:31 - ERROR - stderr - +2025-02-05 23:09:31 - INFO - stdout - {'loss': 0.7176, 'grad_norm': 1.524429440498352, 'learning_rate': 6.807031876992411e-06, 'epoch': 1.85} +2025-02-05 23:09:31 - ERROR - stderr - 62%|██████▏ | 13805/22434 [13:01:51<5:57:03, 2.48s/it] +2025-02-05 23:09:34 - ERROR - stderr - 62%|██████▏ | 13806/22434 [13:01:54<5:59:22, 2.50s/it] +2025-02-05 23:09:34 - ERROR - stderr - +2025-02-05 23:09:34 - ERROR - stderr - +2025-02-05 23:09:34 - INFO - stdout - {'loss': 0.7075, 'grad_norm': 1.2874886989593506, 'learning_rate': 6.8056637369128335e-06, 'epoch': 1.85} +2025-02-05 23:09:34 - ERROR - stderr - 62%|██████▏ | 13806/22434 [13:01:54<5:59:22, 2.50s/it] +2025-02-05 23:09:36 - ERROR 
- stderr - 62%|██████▏ | 13807/22434 [13:01:56<6:01:07, 2.51s/it] +2025-02-05 23:09:36 - ERROR - stderr - +2025-02-05 23:09:36 - ERROR - stderr - +2025-02-05 23:09:36 - INFO - stdout - {'loss': 0.6659, 'grad_norm': 1.2795969247817993, 'learning_rate': 6.804295663416141e-06, 'epoch': 1.85} +2025-02-05 23:09:36 - ERROR - stderr - 62%|██████▏ | 13807/22434 [13:01:56<6:01:07, 2.51s/it] +2025-02-05 23:09:39 - ERROR - stderr - 62%|██████▏ | 13808/22434 [13:01:58<5:58:10, 2.49s/it] +2025-02-05 23:09:39 - ERROR - stderr - +2025-02-05 23:09:39 - ERROR - stderr - +2025-02-05 23:09:39 - INFO - stdout - {'loss': 0.7085, 'grad_norm': 1.2683343887329102, 'learning_rate': 6.802927656530844e-06, 'epoch': 1.85} +2025-02-05 23:09:39 - ERROR - stderr - 62%|██████▏ | 13808/22434 [13:01:59<5:58:10, 2.49s/it] +2025-02-05 23:09:41 - ERROR - stderr - 62%|██████▏ | 13809/22434 [13:02:01<5:57:44, 2.49s/it] +2025-02-05 23:09:41 - ERROR - stderr - +2025-02-05 23:09:41 - ERROR - stderr - +2025-02-05 23:09:41 - INFO - stdout - {'loss': 0.7858, 'grad_norm': 1.4794082641601562, 'learning_rate': 6.801559716285466e-06, 'epoch': 1.85} +2025-02-05 23:09:41 - ERROR - stderr - 62%|██████▏ | 13809/22434 [13:02:01<5:57:44, 2.49s/it] +2025-02-05 23:09:44 - ERROR - stderr - 62%|██████▏ | 13810/22434 [13:02:04<6:02:55, 2.52s/it] +2025-02-05 23:09:44 - ERROR - stderr - +2025-02-05 23:09:44 - ERROR - stderr - +2025-02-05 23:09:44 - INFO - stdout - {'loss': 0.6796, 'grad_norm': 1.2969962358474731, 'learning_rate': 6.800191842708515e-06, 'epoch': 1.85} +2025-02-05 23:09:44 - ERROR - stderr - 62%|██████▏ | 13810/22434 [13:02:04<6:02:55, 2.52s/it] +2025-02-05 23:09:46 - ERROR - stderr - 62%|██████▏ | 13811/22434 [13:02:06<6:03:34, 2.53s/it] +2025-02-05 23:09:46 - ERROR - stderr - +2025-02-05 23:09:46 - ERROR - stderr - +2025-02-05 23:09:46 - INFO - stdout - {'loss': 0.6788, 'grad_norm': 1.1836827993392944, 'learning_rate': 6.7988240358285e-06, 'epoch': 1.85} +2025-02-05 23:09:46 - ERROR - stderr - 62%|██████▏ | 
13811/22434 [13:02:06<6:03:34, 2.53s/it] +2025-02-05 23:09:49 - ERROR - stderr - 62%|██████▏ | 13812/22434 [13:02:09<6:05:35, 2.54s/it] +2025-02-05 23:09:49 - ERROR - stderr - +2025-02-05 23:09:49 - ERROR - stderr - +2025-02-05 23:09:49 - INFO - stdout - {'loss': 0.6919, 'grad_norm': 1.2591941356658936, 'learning_rate': 6.797456295673937e-06, 'epoch': 1.85} +2025-02-05 23:09:49 - ERROR - stderr - 62%|██████▏ | 13812/22434 [13:02:09<6:05:35, 2.54s/it] +2025-02-05 23:09:51 - ERROR - stderr - 62%|██████▏ | 13813/22434 [13:02:11<6:06:30, 2.55s/it] +2025-02-05 23:09:52 - ERROR - stderr - +2025-02-05 23:09:52 - ERROR - stderr - +2025-02-05 23:09:52 - INFO - stdout - {'loss': 0.6281, 'grad_norm': 1.1629880666732788, 'learning_rate': 6.796088622273331e-06, 'epoch': 1.85} +2025-02-05 23:09:52 - ERROR - stderr - 62%|██████▏ | 13813/22434 [13:02:11<6:06:30, 2.55s/it] +2025-02-05 23:09:54 - ERROR - stderr - 62%|██████▏ | 13814/22434 [13:02:14<6:11:19, 2.58s/it] +2025-02-05 23:09:54 - ERROR - stderr - +2025-02-05 23:09:54 - ERROR - stderr - +2025-02-05 23:09:54 - INFO - stdout - {'loss': 0.7406, 'grad_norm': 1.2673150300979614, 'learning_rate': 6.794721015655191e-06, 'epoch': 1.85} +2025-02-05 23:09:54 - ERROR - stderr - 62%|██████▏ | 13814/22434 [13:02:14<6:11:19, 2.58s/it] +2025-02-05 23:09:57 - ERROR - stderr - 62%|██████▏ | 13815/22434 [13:02:16<6:09:25, 2.57s/it] +2025-02-05 23:09:57 - ERROR - stderr - +2025-02-05 23:09:57 - ERROR - stderr - +2025-02-05 23:09:57 - INFO - stdout - {'loss': 0.6527, 'grad_norm': 1.2791593074798584, 'learning_rate': 6.793353475848028e-06, 'epoch': 1.85} +2025-02-05 23:09:57 - ERROR - stderr - 62%|██████▏ | 13815/22434 [13:02:17<6:09:25, 2.57s/it] +2025-02-05 23:09:59 - ERROR - stderr - 62%|██████▏ | 13816/22434 [13:02:19<6:08:32, 2.57s/it] +2025-02-05 23:09:59 - ERROR - stderr - +2025-02-05 23:09:59 - ERROR - stderr - +2025-02-05 23:09:59 - INFO - stdout - {'loss': 0.6655, 'grad_norm': 1.1884660720825195, 'learning_rate': 
6.791986002880339e-06, 'epoch': 1.85} +2025-02-05 23:09:59 - ERROR - stderr - 62%|██████▏ | 13816/22434 [13:02:19<6:08:32, 2.57s/it] +2025-02-05 23:10:02 - ERROR - stderr - 62%|██████▏ | 13817/22434 [13:02:21<6:03:25, 2.53s/it] +2025-02-05 23:10:02 - ERROR - stderr - +2025-02-05 23:10:02 - ERROR - stderr - +2025-02-05 23:10:02 - INFO - stdout - {'loss': 0.7747, 'grad_norm': 1.3216431140899658, 'learning_rate': 6.790618596780638e-06, 'epoch': 1.85} +2025-02-05 23:10:02 - ERROR - stderr - 62%|██████▏ | 13817/22434 [13:02:22<6:03:25, 2.53s/it] +2025-02-05 23:10:04 - ERROR - stderr - 62%|██████▏ | 13818/22434 [13:02:24<6:02:40, 2.53s/it] +2025-02-05 23:10:04 - ERROR - stderr - +2025-02-05 23:10:04 - ERROR - stderr - +2025-02-05 23:10:04 - INFO - stdout - {'loss': 0.7345, 'grad_norm': 1.302400827407837, 'learning_rate': 6.789251257577419e-06, 'epoch': 1.85} +2025-02-05 23:10:04 - ERROR - stderr - 62%|██████▏ | 13818/22434 [13:02:24<6:02:40, 2.53s/it] +2025-02-05 23:10:07 - ERROR - stderr - 62%|██████▏ | 13819/22434 [13:02:27<6:04:18, 2.54s/it] +2025-02-05 23:10:07 - ERROR - stderr - +2025-02-05 23:10:07 - ERROR - stderr - +2025-02-05 23:10:07 - INFO - stdout - {'loss': 0.7003, 'grad_norm': 1.2374743223190308, 'learning_rate': 6.787883985299182e-06, 'epoch': 1.85} +2025-02-05 23:10:07 - ERROR - stderr - 62%|██████▏ | 13819/22434 [13:02:27<6:04:18, 2.54s/it] +2025-02-05 23:10:09 - ERROR - stderr - 62%|██████▏ | 13820/22434 [13:02:29<6:04:39, 2.54s/it] +2025-02-05 23:10:09 - ERROR - stderr - +2025-02-05 23:10:09 - ERROR - stderr - +2025-02-05 23:10:09 - INFO - stdout - {'loss': 0.6386, 'grad_norm': 1.154674768447876, 'learning_rate': 6.786516779974431e-06, 'epoch': 1.85} +2025-02-05 23:10:09 - ERROR - stderr - 62%|██████▏ | 13820/22434 [13:02:29<6:04:39, 2.54s/it] +2025-02-05 23:10:12 - ERROR - stderr - 62%|██████▏ | 13821/22434 [13:02:32<6:13:29, 2.60s/it] +2025-02-05 23:10:12 - ERROR - stderr - +2025-02-05 23:10:12 - ERROR - stderr - +2025-02-05 23:10:12 - INFO - stdout 
- {'loss': 0.6317, 'grad_norm': 1.0955474376678467, 'learning_rate': 6.785149641631665e-06, 'epoch': 1.85} +2025-02-05 23:10:12 - ERROR - stderr - 62%|██████▏ | 13821/22434 [13:02:32<6:13:29, 2.60s/it] +2025-02-05 23:10:15 - ERROR - stderr - 62%|██████▏ | 13822/22434 [13:02:34<6:07:33, 2.56s/it] +2025-02-05 23:10:15 - ERROR - stderr - +2025-02-05 23:10:15 - ERROR - stderr - +2025-02-05 23:10:15 - INFO - stdout - {'loss': 0.7858, 'grad_norm': 1.3786767721176147, 'learning_rate': 6.783782570299376e-06, 'epoch': 1.85} +2025-02-05 23:10:15 - ERROR - stderr - 62%|██████▏ | 13822/22434 [13:02:34<6:07:33, 2.56s/it] +2025-02-05 23:10:17 - ERROR - stderr - 62%|██████▏ | 13823/22434 [13:02:37<6:03:59, 2.54s/it] +2025-02-05 23:10:17 - ERROR - stderr - +2025-02-05 23:10:17 - ERROR - stderr - +2025-02-05 23:10:17 - INFO - stdout - {'loss': 0.6851, 'grad_norm': 1.1883971691131592, 'learning_rate': 6.782415566006064e-06, 'epoch': 1.85} +2025-02-05 23:10:17 - ERROR - stderr - 62%|██████▏ | 13823/22434 [13:02:37<6:03:59, 2.54s/it] +2025-02-05 23:10:20 - ERROR - stderr - 62%|██████▏ | 13824/22434 [13:02:39<6:04:23, 2.54s/it] +2025-02-05 23:10:20 - ERROR - stderr - +2025-02-05 23:10:20 - ERROR - stderr - +2025-02-05 23:10:20 - INFO - stdout - {'loss': 0.6206, 'grad_norm': 1.1589635610580444, 'learning_rate': 6.781048628780217e-06, 'epoch': 1.85} +2025-02-05 23:10:20 - ERROR - stderr - 62%|██████▏ | 13824/22434 [13:02:39<6:04:23, 2.54s/it] +2025-02-05 23:10:22 - ERROR - stderr - 62%|██████▏ | 13825/22434 [13:02:42<6:01:49, 2.52s/it] +2025-02-05 23:10:22 - ERROR - stderr - +2025-02-05 23:10:22 - ERROR - stderr - +2025-02-05 23:10:22 - INFO - stdout - {'loss': 0.6558, 'grad_norm': 1.2407127618789673, 'learning_rate': 6.779681758650336e-06, 'epoch': 1.85} +2025-02-05 23:10:22 - ERROR - stderr - 62%|██████▏ | 13825/22434 [13:02:42<6:01:49, 2.52s/it] +2025-02-05 23:10:25 - ERROR - stderr - 62%|██████▏ | 13826/22434 [13:02:44<6:00:01, 2.51s/it] +2025-02-05 23:10:25 - ERROR - stderr - 
+2025-02-05 23:10:25 - ERROR - stderr - +2025-02-05 23:10:25 - INFO - stdout - {'loss': 0.6349, 'grad_norm': 1.234395146369934, 'learning_rate': 6.778314955644905e-06, 'epoch': 1.85} +2025-02-05 23:10:25 - ERROR - stderr - 62%|██████▏ | 13826/22434 [13:02:44<6:00:01, 2.51s/it] +2025-02-05 23:10:27 - ERROR - stderr - 62%|██████▏ | 13827/22434 [13:02:47<6:00:52, 2.52s/it] +2025-02-05 23:10:27 - ERROR - stderr - +2025-02-05 23:10:27 - ERROR - stderr - +2025-02-05 23:10:27 - INFO - stdout - {'loss': 0.6709, 'grad_norm': 1.2651311159133911, 'learning_rate': 6.776948219792412e-06, 'epoch': 1.85} +2025-02-05 23:10:27 - ERROR - stderr - 62%|██████▏ | 13827/22434 [13:02:47<6:00:52, 2.52s/it] +2025-02-05 23:10:30 - ERROR - stderr - 62%|██████▏ | 13828/22434 [13:02:49<5:58:09, 2.50s/it] +2025-02-05 23:10:30 - ERROR - stderr - +2025-02-05 23:10:30 - ERROR - stderr - +2025-02-05 23:10:30 - INFO - stdout - {'loss': 0.6578, 'grad_norm': 1.151973009109497, 'learning_rate': 6.775581551121355e-06, 'epoch': 1.85} +2025-02-05 23:10:30 - ERROR - stderr - 62%|██████▏ | 13828/22434 [13:02:49<5:58:09, 2.50s/it] +2025-02-05 23:10:32 - ERROR - stderr - 62%|██████▏ | 13829/22434 [13:02:52<5:58:13, 2.50s/it] +2025-02-05 23:10:32 - ERROR - stderr - +2025-02-05 23:10:32 - ERROR - stderr - +2025-02-05 23:10:32 - INFO - stdout - {'loss': 0.7399, 'grad_norm': 1.2138807773590088, 'learning_rate': 6.774214949660215e-06, 'epoch': 1.85} +2025-02-05 23:10:32 - ERROR - stderr - 62%|██████▏ | 13829/22434 [13:02:52<5:58:13, 2.50s/it] +2025-02-05 23:10:34 - ERROR - stderr - 62%|██████▏ | 13830/22434 [13:02:54<5:56:37, 2.49s/it] +2025-02-05 23:10:35 - ERROR - stderr - +2025-02-05 23:10:35 - ERROR - stderr - +2025-02-05 23:10:35 - INFO - stdout - {'loss': 0.7929, 'grad_norm': 1.238549828529358, 'learning_rate': 6.772848415437473e-06, 'epoch': 1.85} +2025-02-05 23:10:35 - ERROR - stderr - 62%|██████▏ | 13830/22434 [13:02:54<5:56:37, 2.49s/it] +2025-02-05 23:10:37 - ERROR - stderr - 62%|██████▏ | 13831/22434 
[13:02:57<5:54:35, 2.47s/it] +2025-02-05 23:10:37 - ERROR - stderr - +2025-02-05 23:10:37 - ERROR - stderr - +2025-02-05 23:10:37 - INFO - stdout - {'loss': 0.5906, 'grad_norm': 1.1721092462539673, 'learning_rate': 6.771481948481622e-06, 'epoch': 1.85} +2025-02-05 23:10:37 - ERROR - stderr - 62%|██████▏ | 13831/22434 [13:02:57<5:54:35, 2.47s/it] +2025-02-05 23:10:39 - ERROR - stderr - 62%|██████▏ | 13832/22434 [13:02:59<5:53:09, 2.46s/it] +2025-02-05 23:10:39 - ERROR - stderr - +2025-02-05 23:10:39 - ERROR - stderr - +2025-02-05 23:10:39 - INFO - stdout - {'loss': 0.7351, 'grad_norm': 1.3039196729660034, 'learning_rate': 6.7701155488211365e-06, 'epoch': 1.85} +2025-02-05 23:10:39 - ERROR - stderr - 62%|██████▏ | 13832/22434 [13:02:59<5:53:09, 2.46s/it] +2025-02-05 23:10:42 - ERROR - stderr - 62%|██████▏ | 13833/22434 [13:03:02<6:05:14, 2.55s/it] +2025-02-05 23:10:42 - ERROR - stderr - +2025-02-05 23:10:42 - ERROR - stderr - +2025-02-05 23:10:42 - INFO - stdout - {'loss': 0.6557, 'grad_norm': 1.2214093208312988, 'learning_rate': 6.7687492164845044e-06, 'epoch': 1.85} +2025-02-05 23:10:42 - ERROR - stderr - 62%|██████▏ | 13833/22434 [13:03:02<6:05:14, 2.55s/it] +2025-02-05 23:10:45 - ERROR - stderr - 62%|██████▏ | 13834/22434 [13:03:04<5:59:19, 2.51s/it] +2025-02-05 23:10:45 - ERROR - stderr - +2025-02-05 23:10:45 - ERROR - stderr - +2025-02-05 23:10:45 - INFO - stdout - {'loss': 0.6528, 'grad_norm': 1.1949502229690552, 'learning_rate': 6.767382951500205e-06, 'epoch': 1.85} +2025-02-05 23:10:45 - ERROR - stderr - 62%|██████▏ | 13834/22434 [13:03:04<5:59:19, 2.51s/it] +2025-02-05 23:10:47 - ERROR - stderr - 62%|██████▏ | 13835/22434 [13:03:07<6:00:37, 2.52s/it] +2025-02-05 23:10:47 - ERROR - stderr - +2025-02-05 23:10:47 - ERROR - stderr - +2025-02-05 23:10:47 - INFO - stdout - {'loss': 0.637, 'grad_norm': 1.298505425453186, 'learning_rate': 6.766016753896709e-06, 'epoch': 1.85} +2025-02-05 23:10:47 - ERROR - stderr - 62%|██████▏ | 13835/22434 [13:03:07<6:00:37, 
2.52s/it] +2025-02-05 23:10:49 - ERROR - stderr - 62%|██████▏ | 13836/22434 [13:03:09<5:57:11, 2.49s/it] +2025-02-05 23:10:50 - ERROR - stderr - +2025-02-05 23:10:50 - ERROR - stderr - +2025-02-05 23:10:50 - INFO - stdout - {'loss': 0.6576, 'grad_norm': 1.248430848121643, 'learning_rate': 6.7646506237025045e-06, 'epoch': 1.85} +2025-02-05 23:10:50 - ERROR - stderr - 62%|██████▏ | 13836/22434 [13:03:09<5:57:11, 2.49s/it] +2025-02-05 23:10:52 - ERROR - stderr - 62%|██████▏ | 13837/22434 [13:03:12<5:54:57, 2.48s/it] +2025-02-05 23:10:52 - ERROR - stderr - +2025-02-05 23:10:52 - ERROR - stderr - +2025-02-05 23:10:52 - INFO - stdout - {'loss': 0.6819, 'grad_norm': 1.2680670022964478, 'learning_rate': 6.763284560946062e-06, 'epoch': 1.85} +2025-02-05 23:10:52 - ERROR - stderr - 62%|██████▏ | 13837/22434 [13:03:12<5:54:57, 2.48s/it] +2025-02-05 23:10:54 - ERROR - stderr - 62%|██████▏ | 13838/22434 [13:03:14<5:54:49, 2.48s/it] +2025-02-05 23:10:54 - ERROR - stderr - +2025-02-05 23:10:54 - ERROR - stderr - +2025-02-05 23:10:54 - INFO - stdout - {'loss': 0.6614, 'grad_norm': 1.2248493432998657, 'learning_rate': 6.761918565655851e-06, 'epoch': 1.85} +2025-02-05 23:10:54 - ERROR - stderr - 62%|██████▏ | 13838/22434 [13:03:14<5:54:49, 2.48s/it] +2025-02-05 23:10:57 - ERROR - stderr - 62%|██████▏ | 13839/22434 [13:03:17<6:08:05, 2.57s/it] +2025-02-05 23:10:57 - ERROR - stderr - +2025-02-05 23:10:57 - ERROR - stderr - +2025-02-05 23:10:57 - INFO - stdout - {'loss': 0.8151, 'grad_norm': 1.6024763584136963, 'learning_rate': 6.76055263786035e-06, 'epoch': 1.85} +2025-02-05 23:10:57 - ERROR - stderr - 62%|██████▏ | 13839/22434 [13:03:17<6:08:05, 2.57s/it] +2025-02-05 23:11:00 - ERROR - stderr - 62%|██████▏ | 13840/22434 [13:03:19<6:03:13, 2.54s/it] +2025-02-05 23:11:00 - ERROR - stderr - +2025-02-05 23:11:00 - ERROR - stderr - +2025-02-05 23:11:00 - INFO - stdout - {'loss': 0.6083, 'grad_norm': 1.2581449747085571, 'learning_rate': 6.759186777588032e-06, 'epoch': 1.85} +2025-02-05 
23:11:00 - ERROR - stderr - 62%|██████▏ | 13840/22434 [13:03:19<6:03:13, 2.54s/it] +2025-02-05 23:11:02 - ERROR - stderr - 62%|██████▏ | 13841/22434 [13:03:22<6:00:47, 2.52s/it] +2025-02-05 23:11:02 - ERROR - stderr - +2025-02-05 23:11:02 - ERROR - stderr - +2025-02-05 23:11:02 - INFO - stdout - {'loss': 0.6432, 'grad_norm': 1.1997913122177124, 'learning_rate': 6.757820984867362e-06, 'epoch': 1.85} +2025-02-05 23:11:02 - ERROR - stderr - 62%|██████▏ | 13841/22434 [13:03:22<6:00:47, 2.52s/it] +2025-02-05 23:11:05 - ERROR - stderr - 62%|██████▏ | 13842/22434 [13:03:24<5:59:26, 2.51s/it] +2025-02-05 23:11:05 - ERROR - stderr - +2025-02-05 23:11:05 - ERROR - stderr - +2025-02-05 23:11:05 - INFO - stdout - {'loss': 0.7623, 'grad_norm': 1.318400502204895, 'learning_rate': 6.756455259726815e-06, 'epoch': 1.85} +2025-02-05 23:11:05 - ERROR - stderr - 62%|██████▏ | 13842/22434 [13:03:24<5:59:26, 2.51s/it] +2025-02-05 23:11:07 - ERROR - stderr - 62%|██████▏ | 13843/22434 [13:03:27<6:01:19, 2.52s/it] +2025-02-05 23:11:07 - ERROR - stderr - +2025-02-05 23:11:07 - ERROR - stderr - +2025-02-05 23:11:07 - INFO - stdout - {'loss': 0.7235, 'grad_norm': 1.2387298345565796, 'learning_rate': 6.755089602194849e-06, 'epoch': 1.85} +2025-02-05 23:11:07 - ERROR - stderr - 62%|██████▏ | 13843/22434 [13:03:27<6:01:19, 2.52s/it] +2025-02-05 23:11:10 - ERROR - stderr - 62%|██████▏ | 13844/22434 [13:03:29<5:58:56, 2.51s/it] +2025-02-05 23:11:10 - ERROR - stderr - +2025-02-05 23:11:10 - ERROR - stderr - +2025-02-05 23:11:10 - INFO - stdout - {'loss': 0.7473, 'grad_norm': 1.2552803754806519, 'learning_rate': 6.75372401229994e-06, 'epoch': 1.85} +2025-02-05 23:11:10 - ERROR - stderr - 62%|██████▏ | 13844/22434 [13:03:29<5:58:56, 2.51s/it] +2025-02-05 23:11:12 - ERROR - stderr - 62%|██████▏ | 13845/22434 [13:03:32<5:57:49, 2.50s/it] +2025-02-05 23:11:12 - ERROR - stderr - +2025-02-05 23:11:12 - ERROR - stderr - +2025-02-05 23:11:12 - INFO - stdout - {'loss': 0.7115, 'grad_norm': 
1.5273960828781128, 'learning_rate': 6.752358490070545e-06, 'epoch': 1.85} +2025-02-05 23:11:12 - ERROR - stderr - 62%|██████▏ | 13845/22434 [13:03:32<5:57:49, 2.50s/it] +2025-02-05 23:11:15 - ERROR - stderr - 62%|██████▏ | 13846/22434 [13:03:34<5:55:38, 2.48s/it] +2025-02-05 23:11:15 - ERROR - stderr - +2025-02-05 23:11:15 - ERROR - stderr - +2025-02-05 23:11:15 - INFO - stdout - {'loss': 0.7085, 'grad_norm': 1.307141900062561, 'learning_rate': 6.750993035535128e-06, 'epoch': 1.85} +2025-02-05 23:11:15 - ERROR - stderr - 62%|██████▏ | 13846/22434 [13:03:34<5:55:38, 2.48s/it] +2025-02-05 23:11:17 - ERROR - stderr - 62%|██████▏ | 13847/22434 [13:03:37<5:54:22, 2.48s/it] +2025-02-05 23:11:17 - ERROR - stderr - +2025-02-05 23:11:17 - ERROR - stderr - +2025-02-05 23:11:17 - INFO - stdout - {'loss': 0.5856, 'grad_norm': 1.1096395254135132, 'learning_rate': 6.749627648722157e-06, 'epoch': 1.85} +2025-02-05 23:11:17 - ERROR - stderr - 62%|██████▏ | 13847/22434 [13:03:37<5:54:22, 2.48s/it] +2025-02-05 23:11:20 - ERROR - stderr - 62%|██████▏ | 13848/22434 [13:03:39<5:54:31, 2.48s/it] +2025-02-05 23:11:20 - ERROR - stderr - +2025-02-05 23:11:20 - ERROR - stderr - +2025-02-05 23:11:20 - INFO - stdout - {'loss': 0.6816, 'grad_norm': 1.3337516784667969, 'learning_rate': 6.748262329660082e-06, 'epoch': 1.85} +2025-02-05 23:11:20 - ERROR - stderr - 62%|██████▏ | 13848/22434 [13:03:39<5:54:31, 2.48s/it] +2025-02-05 23:11:22 - ERROR - stderr - 62%|██████▏ | 13849/22434 [13:03:42<5:53:33, 2.47s/it] +2025-02-05 23:11:22 - ERROR - stderr - +2025-02-05 23:11:22 - ERROR - stderr - +2025-02-05 23:11:22 - INFO - stdout - {'loss': 0.6461, 'grad_norm': 1.1463372707366943, 'learning_rate': 6.746897078377372e-06, 'epoch': 1.85} +2025-02-05 23:11:22 - ERROR - stderr - 62%|██████▏ | 13849/22434 [13:03:42<5:53:33, 2.47s/it] +2025-02-05 23:11:24 - ERROR - stderr - 62%|██████▏ | 13850/22434 [13:03:44<5:52:01, 2.46s/it] +2025-02-05 23:11:24 - ERROR - stderr - +2025-02-05 23:11:24 - ERROR - stderr - 
+2025-02-05 23:11:24 - INFO - stdout - {'loss': 0.7194, 'grad_norm': 1.2968477010726929, 'learning_rate': 6.74553189490248e-06, 'epoch': 1.85} +2025-02-05 23:11:24 - ERROR - stderr - 62%|██████▏ | 13850/22434 [13:03:44<5:52:01, 2.46s/it] +2025-02-05 23:11:27 - ERROR - stderr - 62%|██████▏ | 13851/22434 [13:03:47<5:52:43, 2.47s/it] +2025-02-05 23:11:27 - ERROR - stderr - +2025-02-05 23:11:27 - ERROR - stderr - +2025-02-05 23:11:27 - INFO - stdout - {'loss': 0.6041, 'grad_norm': 1.1409854888916016, 'learning_rate': 6.744166779263856e-06, 'epoch': 1.85} +2025-02-05 23:11:27 - ERROR - stderr - 62%|██████▏ | 13851/22434 [13:03:47<5:52:43, 2.47s/it] +2025-02-05 23:11:29 - ERROR - stderr - 62%|██████▏ | 13852/22434 [13:03:49<5:54:43, 2.48s/it] +2025-02-05 23:11:29 - ERROR - stderr - +2025-02-05 23:11:29 - ERROR - stderr - +2025-02-05 23:11:29 - INFO - stdout - {'loss': 0.6686, 'grad_norm': 1.2449700832366943, 'learning_rate': 6.742801731489963e-06, 'epoch': 1.85} +2025-02-05 23:11:29 - ERROR - stderr - 62%|██████▏ | 13852/22434 [13:03:49<5:54:43, 2.48s/it] +2025-02-05 23:11:32 - ERROR - stderr - 62%|██████▏ | 13853/22434 [13:03:52<5:54:49, 2.48s/it] +2025-02-05 23:11:32 - ERROR - stderr - +2025-02-05 23:11:32 - ERROR - stderr - +2025-02-05 23:11:32 - INFO - stdout - {'loss': 0.734, 'grad_norm': 1.3354474306106567, 'learning_rate': 6.741436751609252e-06, 'epoch': 1.85} +2025-02-05 23:11:32 - ERROR - stderr - 62%|██████▏ | 13853/22434 [13:03:52<5:54:49, 2.48s/it] +2025-02-05 23:11:34 - ERROR - stderr - 62%|██████▏ | 13854/22434 [13:03:54<5:51:57, 2.46s/it] +2025-02-05 23:11:34 - ERROR - stderr - +2025-02-05 23:11:34 - ERROR - stderr - +2025-02-05 23:11:34 - INFO - stdout - {'loss': 0.6561, 'grad_norm': 1.2562320232391357, 'learning_rate': 6.740071839650171e-06, 'epoch': 1.85} +2025-02-05 23:11:34 - ERROR - stderr - 62%|██████▏ | 13854/22434 [13:03:54<5:51:57, 2.46s/it] +2025-02-05 23:11:37 - ERROR - stderr - 62%|██████▏ | 13855/22434 [13:03:57<5:53:31, 2.47s/it] +2025-02-05 
23:11:37 - ERROR - stderr - +2025-02-05 23:11:37 - ERROR - stderr - +2025-02-05 23:11:37 - INFO - stdout - {'loss': 0.6963, 'grad_norm': 1.2631070613861084, 'learning_rate': 6.738706995641177e-06, 'epoch': 1.85} +2025-02-05 23:11:37 - ERROR - stderr - 62%|██████▏ | 13855/22434 [13:03:57<5:53:31, 2.47s/it] +2025-02-05 23:11:39 - ERROR - stderr - 62%|██████▏ | 13856/22434 [13:03:59<5:59:02, 2.51s/it] +2025-02-05 23:11:39 - ERROR - stderr - +2025-02-05 23:11:39 - ERROR - stderr - +2025-02-05 23:11:39 - INFO - stdout - {'loss': 0.6567, 'grad_norm': 1.3139373064041138, 'learning_rate': 6.7373422196107105e-06, 'epoch': 1.85} +2025-02-05 23:11:39 - ERROR - stderr - 62%|██████▏ | 13856/22434 [13:03:59<5:59:02, 2.51s/it] +2025-02-05 23:11:42 - ERROR - stderr - 62%|██████▏ | 13857/22434 [13:04:02<5:56:35, 2.49s/it] +2025-02-05 23:11:42 - ERROR - stderr - +2025-02-05 23:11:42 - ERROR - stderr - +2025-02-05 23:11:42 - INFO - stdout - {'loss': 0.7447, 'grad_norm': 1.365637183189392, 'learning_rate': 6.735977511587228e-06, 'epoch': 1.85} +2025-02-05 23:11:42 - ERROR - stderr - 62%|██████▏ | 13857/22434 [13:04:02<5:56:35, 2.49s/it] +2025-02-05 23:11:44 - ERROR - stderr - 62%|██████▏ | 13858/22434 [13:04:04<5:59:44, 2.52s/it] +2025-02-05 23:11:44 - ERROR - stderr - +2025-02-05 23:11:44 - ERROR - stderr - +2025-02-05 23:11:44 - INFO - stdout - {'loss': 0.6935, 'grad_norm': 1.2912229299545288, 'learning_rate': 6.734612871599169e-06, 'epoch': 1.85} +2025-02-05 23:11:44 - ERROR - stderr - 62%|██████▏ | 13858/22434 [13:04:04<5:59:44, 2.52s/it] +2025-02-05 23:11:47 - ERROR - stderr - 62%|██████▏ | 13859/22434 [13:04:07<6:00:17, 2.52s/it] +2025-02-05 23:11:47 - ERROR - stderr - +2025-02-05 23:11:47 - ERROR - stderr - +2025-02-05 23:11:47 - INFO - stdout - {'loss': 0.6975, 'grad_norm': 1.2933177947998047, 'learning_rate': 6.733248299674977e-06, 'epoch': 1.85} +2025-02-05 23:11:47 - ERROR - stderr - 62%|██████▏ | 13859/22434 [13:04:07<6:00:17, 2.52s/it] +2025-02-05 23:11:49 - ERROR - 
stderr - 62%|██████▏ | 13860/22434 [13:04:09<6:01:23, 2.53s/it] +2025-02-05 23:11:50 - ERROR - stderr - +2025-02-05 23:11:50 - ERROR - stderr - +2025-02-05 23:11:50 - INFO - stdout - {'loss': 0.6417, 'grad_norm': 1.2751692533493042, 'learning_rate': 6.731883795843104e-06, 'epoch': 1.85} +2025-02-05 23:11:50 - ERROR - stderr - 62%|██████▏ | 13860/22434 [13:04:09<6:01:23, 2.53s/it] +2025-02-05 23:11:52 - ERROR - stderr - 62%|██████▏ | 13861/22434 [13:04:12<5:57:32, 2.50s/it] +2025-02-05 23:11:52 - ERROR - stderr - +2025-02-05 23:11:52 - ERROR - stderr - +2025-02-05 23:11:52 - INFO - stdout - {'loss': 0.7103, 'grad_norm': 1.3222585916519165, 'learning_rate': 6.73051936013198e-06, 'epoch': 1.85} +2025-02-05 23:11:52 - ERROR - stderr - 62%|██████▏ | 13861/22434 [13:04:12<5:57:32, 2.50s/it] +2025-02-05 23:11:54 - ERROR - stderr - 62%|██████▏ | 13862/22434 [13:04:14<5:55:39, 2.49s/it] +2025-02-05 23:11:54 - ERROR - stderr - +2025-02-05 23:11:54 - ERROR - stderr - +2025-02-05 23:11:54 - INFO - stdout - {'loss': 0.6913, 'grad_norm': 1.2787964344024658, 'learning_rate': 6.7291549925700575e-06, 'epoch': 1.85} +2025-02-05 23:11:54 - ERROR - stderr - 62%|██████▏ | 13862/22434 [13:04:14<5:55:39, 2.49s/it] +2025-02-05 23:11:57 - ERROR - stderr - 62%|██████▏ | 13863/22434 [13:04:17<5:56:47, 2.50s/it] +2025-02-05 23:11:57 - ERROR - stderr - +2025-02-05 23:11:57 - ERROR - stderr - +2025-02-05 23:11:57 - INFO - stdout - {'loss': 0.6285, 'grad_norm': 1.181631326675415, 'learning_rate': 6.727790693185767e-06, 'epoch': 1.85} +2025-02-05 23:11:57 - ERROR - stderr - 62%|██████▏ | 13863/22434 [13:04:17<5:56:47, 2.50s/it] +2025-02-05 23:11:59 - ERROR - stderr - 62%|██████▏ | 13864/22434 [13:04:19<5:58:16, 2.51s/it] +2025-02-05 23:11:59 - ERROR - stderr - +2025-02-05 23:11:59 - ERROR - stderr - +2025-02-05 23:11:59 - INFO - stdout - {'loss': 0.614, 'grad_norm': 1.243194341659546, 'learning_rate': 6.7264264620075455e-06, 'epoch': 1.85} +2025-02-05 23:11:59 - ERROR - stderr - 62%|██████▏ | 
13864/22434 [13:04:19<5:58:16, 2.51s/it] +2025-02-05 23:12:02 - ERROR - stderr - 62%|██████▏ | 13865/22434 [13:04:22<6:00:35, 2.52s/it] +2025-02-05 23:12:02 - ERROR - stderr - +2025-02-05 23:12:02 - ERROR - stderr - +2025-02-05 23:12:02 - INFO - stdout - {'loss': 0.7912, 'grad_norm': 1.3067023754119873, 'learning_rate': 6.725062299063834e-06, 'epoch': 1.85} +2025-02-05 23:12:02 - ERROR - stderr - 62%|██████▏ | 13865/22434 [13:04:22<6:00:35, 2.52s/it] +2025-02-05 23:12:05 - ERROR - stderr - 62%|██████▏ | 13866/22434 [13:04:24<6:01:16, 2.53s/it] +2025-02-05 23:12:05 - ERROR - stderr - +2025-02-05 23:12:05 - ERROR - stderr - +2025-02-05 23:12:05 - INFO - stdout - {'loss': 0.64, 'grad_norm': 1.2184467315673828, 'learning_rate': 6.723698204383067e-06, 'epoch': 1.85} +2025-02-05 23:12:05 - ERROR - stderr - 62%|██████▏ | 13866/22434 [13:04:24<6:01:16, 2.53s/it] +2025-02-05 23:12:07 - ERROR - stderr - 62%|██████▏ | 13867/22434 [13:04:27<5:59:27, 2.52s/it] +2025-02-05 23:12:07 - ERROR - stderr - +2025-02-05 23:12:07 - ERROR - stderr - +2025-02-05 23:12:07 - INFO - stdout - {'loss': 0.7748, 'grad_norm': 1.2818893194198608, 'learning_rate': 6.722334177993673e-06, 'epoch': 1.85} +2025-02-05 23:12:07 - ERROR - stderr - 62%|██████▏ | 13867/22434 [13:04:27<5:59:27, 2.52s/it] +2025-02-05 23:12:10 - ERROR - stderr - 62%|██████▏ | 13868/22434 [13:04:29<6:00:50, 2.53s/it] +2025-02-05 23:12:10 - ERROR - stderr - +2025-02-05 23:12:10 - ERROR - stderr - +2025-02-05 23:12:10 - INFO - stdout - {'loss': 0.7437, 'grad_norm': 1.2240118980407715, 'learning_rate': 6.720970219924088e-06, 'epoch': 1.85} +2025-02-05 23:12:10 - ERROR - stderr - 62%|██████▏ | 13868/22434 [13:04:29<6:00:50, 2.53s/it] +2025-02-05 23:12:12 - ERROR - stderr - 62%|██████▏ | 13869/22434 [13:04:32<5:59:13, 2.52s/it] +2025-02-05 23:12:12 - ERROR - stderr - +2025-02-05 23:12:12 - ERROR - stderr - +2025-02-05 23:12:12 - INFO - stdout - {'loss': 0.6169, 'grad_norm': 1.2402490377426147, 'learning_rate': 6.719606330202739e-06, 
'epoch': 1.85} +2025-02-05 23:12:12 - ERROR - stderr - 62%|██████▏ | 13869/22434 [13:04:32<5:59:13, 2.52s/it] +2025-02-05 23:12:15 - ERROR - stderr - 62%|██████▏ | 13870/22434 [13:04:34<5:58:39, 2.51s/it] +2025-02-05 23:12:15 - ERROR - stderr - +2025-02-05 23:12:15 - ERROR - stderr - +2025-02-05 23:12:15 - INFO - stdout - {'loss': 0.7218, 'grad_norm': 1.4071033000946045, 'learning_rate': 6.71824250885806e-06, 'epoch': 1.85} +2025-02-05 23:12:15 - ERROR - stderr - 62%|██████▏ | 13870/22434 [13:04:34<5:58:39, 2.51s/it] +2025-02-05 23:12:17 - ERROR - stderr - 62%|██████▏ | 13871/22434 [13:04:37<5:57:19, 2.50s/it] +2025-02-05 23:12:17 - ERROR - stderr - +2025-02-05 23:12:17 - ERROR - stderr - +2025-02-05 23:12:17 - INFO - stdout - {'loss': 0.6571, 'grad_norm': 1.1687504053115845, 'learning_rate': 6.716878755918474e-06, 'epoch': 1.85} +2025-02-05 23:12:17 - ERROR - stderr - 62%|██████▏ | 13871/22434 [13:04:37<5:57:19, 2.50s/it] +2025-02-05 23:12:19 - ERROR - stderr - 62%|██████▏ | 13872/22434 [13:04:39<5:54:05, 2.48s/it] +2025-02-05 23:12:20 - ERROR - stderr - +2025-02-05 23:12:20 - ERROR - stderr - +2025-02-05 23:12:20 - INFO - stdout - {'loss': 0.66, 'grad_norm': 1.2581931352615356, 'learning_rate': 6.715515071412411e-06, 'epoch': 1.86} +2025-02-05 23:12:20 - ERROR - stderr - 62%|██████▏ | 13872/22434 [13:04:39<5:54:05, 2.48s/it] +2025-02-05 23:12:22 - ERROR - stderr - 62%|██████▏ | 13873/22434 [13:04:42<5:53:37, 2.48s/it] +2025-02-05 23:12:22 - ERROR - stderr - +2025-02-05 23:12:22 - ERROR - stderr - +2025-02-05 23:12:22 - INFO - stdout - {'loss': 0.6623, 'grad_norm': 1.2189053297042847, 'learning_rate': 6.71415145536829e-06, 'epoch': 1.86} +2025-02-05 23:12:22 - ERROR - stderr - 62%|██████▏ | 13873/22434 [13:04:42<5:53:37, 2.48s/it] +2025-02-05 23:12:25 - ERROR - stderr - 62%|██████▏ | 13874/22434 [13:04:45<6:06:44, 2.57s/it] +2025-02-05 23:12:25 - ERROR - stderr - +2025-02-05 23:12:25 - ERROR - stderr - +2025-02-05 23:12:25 - INFO - stdout - {'loss': 0.664, 
'grad_norm': 1.2393089532852173, 'learning_rate': 6.712787907814542e-06, 'epoch': 1.86} +2025-02-05 23:12:25 - ERROR - stderr - 62%|██████▏ | 13874/22434 [13:04:45<6:06:44, 2.57s/it] +2025-02-05 23:12:27 - ERROR - stderr - 62%|██████▏ | 13875/22434 [13:04:47<6:05:04, 2.56s/it] +2025-02-05 23:12:27 - ERROR - stderr - +2025-02-05 23:12:27 - ERROR - stderr - +2025-02-05 23:12:27 - INFO - stdout - {'loss': 0.7197, 'grad_norm': 1.2932487726211548, 'learning_rate': 6.7114244287795785e-06, 'epoch': 1.86} +2025-02-05 23:12:27 - ERROR - stderr - 62%|██████▏ | 13875/22434 [13:04:47<6:05:04, 2.56s/it] +2025-02-05 23:12:30 - ERROR - stderr - 62%|██████▏ | 13876/22434 [13:04:50<6:04:03, 2.55s/it] +2025-02-05 23:12:30 - ERROR - stderr - +2025-02-05 23:12:30 - ERROR - stderr - +2025-02-05 23:12:30 - INFO - stdout - {'loss': 0.658, 'grad_norm': 1.2768210172653198, 'learning_rate': 6.710061018291831e-06, 'epoch': 1.86} +2025-02-05 23:12:30 - ERROR - stderr - 62%|██████▏ | 13876/22434 [13:04:50<6:04:03, 2.55s/it] +2025-02-05 23:12:32 - ERROR - stderr - 62%|██████▏ | 13877/22434 [13:04:52<5:59:53, 2.52s/it] +2025-02-05 23:12:32 - ERROR - stderr - +2025-02-05 23:12:32 - ERROR - stderr - +2025-02-05 23:12:32 - INFO - stdout - {'loss': 0.613, 'grad_norm': 1.27066969871521, 'learning_rate': 6.70869767637971e-06, 'epoch': 1.86} +2025-02-05 23:12:32 - ERROR - stderr - 62%|██████▏ | 13877/22434 [13:04:52<5:59:53, 2.52s/it] +2025-02-05 23:12:35 - ERROR - stderr - 62%|██████▏ | 13878/22434 [13:04:55<5:58:06, 2.51s/it] +2025-02-05 23:12:35 - ERROR - stderr - +2025-02-05 23:12:35 - ERROR - stderr - +2025-02-05 23:12:35 - INFO - stdout - {'loss': 0.6895, 'grad_norm': 1.2313402891159058, 'learning_rate': 6.707334403071638e-06, 'epoch': 1.86} +2025-02-05 23:12:35 - ERROR - stderr - 62%|██████▏ | 13878/22434 [13:04:55<5:58:06, 2.51s/it] +2025-02-05 23:12:37 - ERROR - stderr - 62%|██████▏ | 13879/22434 [13:04:57<5:55:32, 2.49s/it] +2025-02-05 23:12:37 - ERROR - stderr - +2025-02-05 23:12:37 - ERROR 
- stderr - +2025-02-05 23:12:37 - INFO - stdout - {'loss': 0.6298, 'grad_norm': 1.2324997186660767, 'learning_rate': 6.705971198396032e-06, 'epoch': 1.86} +2025-02-05 23:12:37 - ERROR - stderr - 62%|██████▏ | 13879/22434 [13:04:57<5:55:32, 2.49s/it] +2025-02-05 23:12:40 - ERROR - stderr - 62%|██████▏ | 13880/22434 [13:04:59<5:56:37, 2.50s/it] +2025-02-05 23:12:40 - ERROR - stderr - +2025-02-05 23:12:40 - ERROR - stderr - +2025-02-05 23:12:40 - INFO - stdout - {'loss': 0.6712, 'grad_norm': 1.1672239303588867, 'learning_rate': 6.7046080623812995e-06, 'epoch': 1.86} +2025-02-05 23:12:40 - ERROR - stderr - 62%|██████▏ | 13880/22434 [13:05:00<5:56:37, 2.50s/it] +2025-02-05 23:12:42 - ERROR - stderr - 62%|██████▏ | 13881/22434 [13:05:02<5:56:01, 2.50s/it] +2025-02-05 23:12:42 - ERROR - stderr - +2025-02-05 23:12:42 - ERROR - stderr - +2025-02-05 23:12:42 - INFO - stdout - {'loss': 0.6835, 'grad_norm': 1.3264271020889282, 'learning_rate': 6.703244995055864e-06, 'epoch': 1.86} +2025-02-05 23:12:42 - ERROR - stderr - 62%|██████▏ | 13881/22434 [13:05:02<5:56:01, 2.50s/it] +2025-02-05 23:12:45 - ERROR - stderr - 62%|██████▏ | 13882/22434 [13:05:04<5:54:01, 2.48s/it] +2025-02-05 23:12:45 - ERROR - stderr - +2025-02-05 23:12:45 - ERROR - stderr - +2025-02-05 23:12:45 - INFO - stdout - {'loss': 0.6343, 'grad_norm': 1.240195631980896, 'learning_rate': 6.701881996448131e-06, 'epoch': 1.86} +2025-02-05 23:12:45 - ERROR - stderr - 62%|██████▏ | 13882/22434 [13:05:04<5:54:01, 2.48s/it] +2025-02-05 23:12:47 - ERROR - stderr - 62%|██████▏ | 13883/22434 [13:05:07<5:56:20, 2.50s/it] +2025-02-05 23:12:47 - ERROR - stderr - +2025-02-05 23:12:47 - ERROR - stderr - +2025-02-05 23:12:47 - INFO - stdout - {'loss': 0.6766, 'grad_norm': 1.2639858722686768, 'learning_rate': 6.700519066586508e-06, 'epoch': 1.86} +2025-02-05 23:12:47 - ERROR - stderr - 62%|██████▏ | 13883/22434 [13:05:07<5:56:20, 2.50s/it] +2025-02-05 23:12:50 - ERROR - stderr - 62%|██████▏ | 13884/22434 [13:05:09<5:54:37, 
2.49s/it] +2025-02-05 23:12:50 - ERROR - stderr - +2025-02-05 23:12:50 - ERROR - stderr - +2025-02-05 23:12:50 - INFO - stdout - {'loss': 0.702, 'grad_norm': 1.2224682569503784, 'learning_rate': 6.6991562054994085e-06, 'epoch': 1.86} +2025-02-05 23:12:50 - ERROR - stderr - 62%|██████▏ | 13884/22434 [13:05:09<5:54:37, 2.49s/it] +2025-02-05 23:12:52 - ERROR - stderr - 62%|██████▏ | 13885/22434 [13:05:12<5:53:44, 2.48s/it] +2025-02-05 23:12:52 - ERROR - stderr - +2025-02-05 23:12:52 - ERROR - stderr - +2025-02-05 23:12:52 - INFO - stdout - {'loss': 0.6861, 'grad_norm': 1.3374301195144653, 'learning_rate': 6.6977934132152414e-06, 'epoch': 1.86} +2025-02-05 23:12:52 - ERROR - stderr - 62%|██████▏ | 13885/22434 [13:05:12<5:53:44, 2.48s/it] +2025-02-05 23:12:55 - ERROR - stderr - 62%|██████▏ | 13886/22434 [13:05:14<5:53:13, 2.48s/it] +2025-02-05 23:12:55 - ERROR - stderr - +2025-02-05 23:12:55 - ERROR - stderr - +2025-02-05 23:12:55 - INFO - stdout - {'loss': 0.6358, 'grad_norm': 1.143964409828186, 'learning_rate': 6.69643068976241e-06, 'epoch': 1.86} +2025-02-05 23:12:55 - ERROR - stderr - 62%|██████▏ | 13886/22434 [13:05:14<5:53:13, 2.48s/it] +2025-02-05 23:12:57 - ERROR - stderr - 62%|██████▏ | 13887/22434 [13:05:17<5:56:01, 2.50s/it] +2025-02-05 23:12:57 - ERROR - stderr - +2025-02-05 23:12:57 - ERROR - stderr - +2025-02-05 23:12:57 - INFO - stdout - {'loss': 0.6392, 'grad_norm': 1.1736760139465332, 'learning_rate': 6.695068035169321e-06, 'epoch': 1.86} +2025-02-05 23:12:57 - ERROR - stderr - 62%|██████▏ | 13887/22434 [13:05:17<5:56:01, 2.50s/it] +2025-02-05 23:13:00 - ERROR - stderr - 62%|██████▏ | 13888/22434 [13:05:19<5:56:43, 2.50s/it] +2025-02-05 23:13:00 - ERROR - stderr - +2025-02-05 23:13:00 - ERROR - stderr - +2025-02-05 23:13:00 - INFO - stdout - {'loss': 0.6642, 'grad_norm': 1.1120654344558716, 'learning_rate': 6.693705449464373e-06, 'epoch': 1.86} +2025-02-05 23:13:00 - ERROR - stderr - 62%|██████▏ | 13888/22434 [13:05:19<5:56:43, 2.50s/it] +2025-02-05 
23:13:02 - ERROR - stderr - 62%|██████▏ | 13889/22434 [13:05:22<5:55:49, 2.50s/it] +2025-02-05 23:13:02 - ERROR - stderr - +2025-02-05 23:13:02 - ERROR - stderr - +2025-02-05 23:13:02 - INFO - stdout - {'loss': 0.6757, 'grad_norm': 1.1934597492218018, 'learning_rate': 6.692342932675974e-06, 'epoch': 1.86} +2025-02-05 23:13:02 - ERROR - stderr - 62%|██████▏ | 13889/22434 [13:05:22<5:55:49, 2.50s/it] +2025-02-05 23:13:05 - ERROR - stderr - 62%|██████▏ | 13890/22434 [13:05:24<5:51:31, 2.47s/it] +2025-02-05 23:13:05 - ERROR - stderr - +2025-02-05 23:13:05 - ERROR - stderr - +2025-02-05 23:13:05 - INFO - stdout - {'loss': 0.6881, 'grad_norm': 1.3436036109924316, 'learning_rate': 6.690980484832521e-06, 'epoch': 1.86} +2025-02-05 23:13:05 - ERROR - stderr - 62%|██████▏ | 13890/22434 [13:05:24<5:51:31, 2.47s/it] +2025-02-05 23:13:07 - ERROR - stderr - 62%|██████▏ | 13891/22434 [13:05:27<5:55:10, 2.49s/it] +2025-02-05 23:13:07 - ERROR - stderr - +2025-02-05 23:13:07 - ERROR - stderr - +2025-02-05 23:13:07 - INFO - stdout - {'loss': 0.6974, 'grad_norm': 1.1626349687576294, 'learning_rate': 6.689618105962412e-06, 'epoch': 1.86} +2025-02-05 23:13:07 - ERROR - stderr - 62%|██████▏ | 13891/22434 [13:05:27<5:55:10, 2.49s/it] +2025-02-05 23:13:10 - ERROR - stderr - 62%|██████▏ | 13892/22434 [13:05:29<5:55:06, 2.49s/it] +2025-02-05 23:13:10 - ERROR - stderr - +2025-02-05 23:13:10 - ERROR - stderr - +2025-02-05 23:13:10 - INFO - stdout - {'loss': 0.669, 'grad_norm': 1.2873663902282715, 'learning_rate': 6.688255796094048e-06, 'epoch': 1.86} +2025-02-05 23:13:10 - ERROR - stderr - 62%|██████▏ | 13892/22434 [13:05:29<5:55:06, 2.49s/it] +2025-02-05 23:13:12 - ERROR - stderr - 62%|██████▏ | 13893/22434 [13:05:32<5:55:32, 2.50s/it] +2025-02-05 23:13:12 - ERROR - stderr - +2025-02-05 23:13:12 - ERROR - stderr - +2025-02-05 23:13:12 - INFO - stdout - {'loss': 0.6829, 'grad_norm': 1.1633247137069702, 'learning_rate': 6.686893555255819e-06, 'epoch': 1.86} +2025-02-05 23:13:12 - ERROR - stderr 
- 62%|██████▏ | 13893/22434 [13:05:32<5:55:32, 2.50s/it] +2025-02-05 23:13:15 - ERROR - stderr - 62%|██████▏ | 13894/22434 [13:05:34<5:52:28, 2.48s/it] +2025-02-05 23:13:15 - ERROR - stderr - +2025-02-05 23:13:15 - ERROR - stderr - +2025-02-05 23:13:15 - INFO - stdout - {'loss': 0.7221, 'grad_norm': 1.2562705278396606, 'learning_rate': 6.685531383476128e-06, 'epoch': 1.86} +2025-02-05 23:13:15 - ERROR - stderr - 62%|██████▏ | 13894/22434 [13:05:34<5:52:28, 2.48s/it] +2025-02-05 23:13:17 - ERROR - stderr - 62%|██████▏ | 13895/22434 [13:05:37<5:52:53, 2.48s/it] +2025-02-05 23:13:17 - ERROR - stderr - +2025-02-05 23:13:17 - ERROR - stderr - +2025-02-05 23:13:17 - INFO - stdout - {'loss': 0.6595, 'grad_norm': 1.2320321798324585, 'learning_rate': 6.684169280783365e-06, 'epoch': 1.86} +2025-02-05 23:13:17 - ERROR - stderr - 62%|██████▏ | 13895/22434 [13:05:37<5:52:53, 2.48s/it] +2025-02-05 23:13:19 - ERROR - stderr - 62%|██████▏ | 13896/22434 [13:05:39<5:50:11, 2.46s/it] +2025-02-05 23:13:19 - ERROR - stderr - +2025-02-05 23:13:19 - ERROR - stderr - +2025-02-05 23:13:19 - INFO - stdout - {'loss': 0.6633, 'grad_norm': 1.3818409442901611, 'learning_rate': 6.682807247205915e-06, 'epoch': 1.86} +2025-02-05 23:13:19 - ERROR - stderr - 62%|██████▏ | 13896/22434 [13:05:39<5:50:11, 2.46s/it] +2025-02-05 23:13:22 - ERROR - stderr - 62%|██████▏ | 13897/22434 [13:05:42<5:54:20, 2.49s/it] +2025-02-05 23:13:22 - ERROR - stderr - +2025-02-05 23:13:22 - ERROR - stderr - +2025-02-05 23:13:22 - INFO - stdout - {'loss': 0.6703, 'grad_norm': 1.1381113529205322, 'learning_rate': 6.681445282772176e-06, 'epoch': 1.86} +2025-02-05 23:13:22 - ERROR - stderr - 62%|██████▏ | 13897/22434 [13:05:42<5:54:20, 2.49s/it] +2025-02-05 23:13:25 - ERROR - stderr - 62%|██████▏ | 13898/22434 [13:05:44<5:56:50, 2.51s/it] +2025-02-05 23:13:25 - ERROR - stderr - +2025-02-05 23:13:25 - ERROR - stderr - +2025-02-05 23:13:25 - INFO - stdout - {'loss': 0.693, 'grad_norm': 1.177986979484558, 'learning_rate': 
6.680083387510536e-06, 'epoch': 1.86} +2025-02-05 23:13:25 - ERROR - stderr - 62%|██████▏ | 13898/22434 [13:05:44<5:56:50, 2.51s/it] +2025-02-05 23:13:27 - ERROR - stderr - 62%|██████▏ | 13899/22434 [13:05:47<5:56:18, 2.50s/it] +2025-02-05 23:13:27 - ERROR - stderr - +2025-02-05 23:13:27 - ERROR - stderr - +2025-02-05 23:13:27 - INFO - stdout - {'loss': 0.6562, 'grad_norm': 1.155556321144104, 'learning_rate': 6.678721561449377e-06, 'epoch': 1.86} +2025-02-05 23:13:27 - ERROR - stderr - 62%|██████▏ | 13899/22434 [13:05:47<5:56:18, 2.50s/it] +2025-02-05 23:13:30 - ERROR - stderr - 62%|██████▏ | 13900/22434 [13:05:49<5:55:59, 2.50s/it] +2025-02-05 23:13:30 - ERROR - stderr - +2025-02-05 23:13:30 - ERROR - stderr - +2025-02-05 23:13:30 - INFO - stdout - {'loss': 0.6456, 'grad_norm': 1.1550631523132324, 'learning_rate': 6.677359804617094e-06, 'epoch': 1.86} +2025-02-05 23:13:30 - ERROR - stderr - 62%|██████▏ | 13900/22434 [13:05:49<5:55:59, 2.50s/it] +2025-02-05 23:13:32 - ERROR - stderr - 62%|██████▏ | 13901/22434 [13:05:52<5:55:28, 2.50s/it] +2025-02-05 23:13:32 - ERROR - stderr - +2025-02-05 23:13:32 - ERROR - stderr - +2025-02-05 23:13:32 - INFO - stdout - {'loss': 0.7122, 'grad_norm': 1.2905124425888062, 'learning_rate': 6.675998117042062e-06, 'epoch': 1.86} +2025-02-05 23:13:32 - ERROR - stderr - 62%|██████▏ | 13901/22434 [13:05:52<5:55:28, 2.50s/it] +2025-02-05 23:13:35 - ERROR - stderr - 62%|██████▏ | 13902/22434 [13:05:54<5:57:45, 2.52s/it] +2025-02-05 23:13:35 - ERROR - stderr - +2025-02-05 23:13:35 - ERROR - stderr - +2025-02-05 23:13:35 - INFO - stdout - {'loss': 0.658, 'grad_norm': 1.110217809677124, 'learning_rate': 6.674636498752673e-06, 'epoch': 1.86} +2025-02-05 23:13:35 - ERROR - stderr - 62%|██████▏ | 13902/22434 [13:05:54<5:57:45, 2.52s/it] +2025-02-05 23:13:37 - ERROR - stderr - 62%|██████▏ | 13903/22434 [13:05:57<5:53:57, 2.49s/it] +2025-02-05 23:13:37 - ERROR - stderr - +2025-02-05 23:13:37 - ERROR - stderr - +2025-02-05 23:13:37 - INFO - stdout - 
{'loss': 0.755, 'grad_norm': 1.360866904258728, 'learning_rate': 6.673274949777302e-06, 'epoch': 1.86} +2025-02-05 23:13:37 - ERROR - stderr - 62%|██████▏ | 13903/22434 [13:05:57<5:53:57, 2.49s/it] +2025-02-05 23:13:40 - ERROR - stderr - 62%|██████▏ | 13904/22434 [13:05:59<5:55:27, 2.50s/it] +2025-02-05 23:13:40 - ERROR - stderr - +2025-02-05 23:13:40 - ERROR - stderr - +2025-02-05 23:13:40 - INFO - stdout - {'loss': 0.6706, 'grad_norm': 1.2819935083389282, 'learning_rate': 6.671913470144331e-06, 'epoch': 1.86} +2025-02-05 23:13:40 - ERROR - stderr - 62%|██████▏ | 13904/22434 [13:05:59<5:55:27, 2.50s/it] +2025-02-05 23:13:42 - ERROR - stderr - 62%|██████▏ | 13905/22434 [13:06:02<5:52:10, 2.48s/it] +2025-02-05 23:13:42 - ERROR - stderr - +2025-02-05 23:13:42 - ERROR - stderr - +2025-02-05 23:13:42 - INFO - stdout - {'loss': 0.6362, 'grad_norm': 1.1890041828155518, 'learning_rate': 6.670552059882138e-06, 'epoch': 1.86} +2025-02-05 23:13:42 - ERROR - stderr - 62%|██████▏ | 13905/22434 [13:06:02<5:52:10, 2.48s/it] +2025-02-05 23:13:45 - ERROR - stderr - 62%|██████▏ | 13906/22434 [13:06:04<5:55:32, 2.50s/it] +2025-02-05 23:13:45 - ERROR - stderr - +2025-02-05 23:13:45 - ERROR - stderr - +2025-02-05 23:13:45 - INFO - stdout - {'loss': 0.6586, 'grad_norm': 1.205298662185669, 'learning_rate': 6.669190719019105e-06, 'epoch': 1.86} +2025-02-05 23:13:45 - ERROR - stderr - 62%|██████▏ | 13906/22434 [13:06:04<5:55:32, 2.50s/it] +2025-02-05 23:13:47 - ERROR - stderr - 62%|██████▏ | 13907/22434 [13:06:07<5:57:28, 2.52s/it] +2025-02-05 23:13:47 - ERROR - stderr - +2025-02-05 23:13:47 - ERROR - stderr - +2025-02-05 23:13:47 - INFO - stdout - {'loss': 0.6123, 'grad_norm': 1.1301299333572388, 'learning_rate': 6.6678294475836e-06, 'epoch': 1.86} +2025-02-05 23:13:47 - ERROR - stderr - 62%|██████▏ | 13907/22434 [13:06:07<5:57:28, 2.52s/it] +2025-02-05 23:13:50 - ERROR - stderr - 62%|██████▏ | 13908/22434 [13:06:09<5:55:16, 2.50s/it] +2025-02-05 23:13:50 - ERROR - stderr - +2025-02-05 
23:13:50 - ERROR - stderr - +2025-02-05 23:13:50 - INFO - stdout - {'loss': 0.6503, 'grad_norm': 1.331193208694458, 'learning_rate': 6.666468245604005e-06, 'epoch': 1.86} +2025-02-05 23:13:50 - ERROR - stderr - 62%|██████▏ | 13908/22434 [13:06:09<5:55:16, 2.50s/it] +2025-02-05 23:13:52 - ERROR - stderr - 62%|██████▏ | 13909/22434 [13:06:12<5:51:53, 2.48s/it] +2025-02-05 23:13:52 - ERROR - stderr - +2025-02-05 23:13:52 - ERROR - stderr - +2025-02-05 23:13:52 - INFO - stdout - {'loss': 0.606, 'grad_norm': 1.2336878776550293, 'learning_rate': 6.665107113108687e-06, 'epoch': 1.86} +2025-02-05 23:13:52 - ERROR - stderr - 62%|██████▏ | 13909/22434 [13:06:12<5:51:53, 2.48s/it] +2025-02-05 23:13:54 - ERROR - stderr - 62%|██████▏ | 13910/22434 [13:06:14<5:50:20, 2.47s/it] +2025-02-05 23:13:54 - ERROR - stderr - +2025-02-05 23:13:54 - ERROR - stderr - +2025-02-05 23:13:54 - INFO - stdout - {'loss': 0.6019, 'grad_norm': 1.2340543270111084, 'learning_rate': 6.663746050126021e-06, 'epoch': 1.86} +2025-02-05 23:13:54 - ERROR - stderr - 62%|██████▏ | 13910/22434 [13:06:14<5:50:20, 2.47s/it] +2025-02-05 23:13:57 - ERROR - stderr - 62%|██████▏ | 13911/22434 [13:06:17<5:53:46, 2.49s/it] +2025-02-05 23:13:57 - ERROR - stderr - +2025-02-05 23:13:57 - ERROR - stderr - +2025-02-05 23:13:57 - INFO - stdout - {'loss': 0.7088, 'grad_norm': 1.2719411849975586, 'learning_rate': 6.662385056684377e-06, 'epoch': 1.86} +2025-02-05 23:13:57 - ERROR - stderr - 62%|██████▏ | 13911/22434 [13:06:17<5:53:46, 2.49s/it] +2025-02-05 23:13:59 - ERROR - stderr - 62%|██████▏ | 13912/22434 [13:06:19<5:51:03, 2.47s/it] +2025-02-05 23:13:59 - ERROR - stderr - +2025-02-05 23:13:59 - ERROR - stderr - +2025-02-05 23:13:59 - INFO - stdout - {'loss': 0.6464, 'grad_norm': 1.2811404466629028, 'learning_rate': 6.661024132812119e-06, 'epoch': 1.86} +2025-02-05 23:13:59 - ERROR - stderr - 62%|██████▏ | 13912/22434 [13:06:19<5:51:03, 2.47s/it] +2025-02-05 23:14:02 - ERROR - stderr - 62%|██████▏ | 13913/22434 
[13:06:22<5:53:01, 2.49s/it] +2025-02-05 23:14:02 - ERROR - stderr - +2025-02-05 23:14:02 - ERROR - stderr - +2025-02-05 23:14:02 - INFO - stdout - {'loss': 0.764, 'grad_norm': 1.4237899780273438, 'learning_rate': 6.6596632785376245e-06, 'epoch': 1.86} +2025-02-05 23:14:02 - ERROR - stderr - 62%|██████▏ | 13913/22434 [13:06:22<5:53:01, 2.49s/it] +2025-02-05 23:14:04 - ERROR - stderr - 62%|██████▏ | 13914/22434 [13:06:24<5:50:30, 2.47s/it] +2025-02-05 23:14:04 - ERROR - stderr - +2025-02-05 23:14:04 - ERROR - stderr - +2025-02-05 23:14:04 - INFO - stdout - {'loss': 0.7338, 'grad_norm': 1.2332491874694824, 'learning_rate': 6.658302493889251e-06, 'epoch': 1.86} +2025-02-05 23:14:04 - ERROR - stderr - 62%|██████▏ | 13914/22434 [13:06:24<5:50:30, 2.47s/it] +2025-02-05 23:14:07 - ERROR - stderr - 62%|██████▏ | 13915/22434 [13:06:27<5:49:09, 2.46s/it] +2025-02-05 23:14:07 - ERROR - stderr - +2025-02-05 23:14:07 - ERROR - stderr - +2025-02-05 23:14:07 - INFO - stdout - {'loss': 0.7232, 'grad_norm': 1.464020013809204, 'learning_rate': 6.656941778895359e-06, 'epoch': 1.86} +2025-02-05 23:14:07 - ERROR - stderr - 62%|██████▏ | 13915/22434 [13:06:27<5:49:09, 2.46s/it] +2025-02-05 23:14:09 - ERROR - stderr - 62%|██████▏ | 13916/22434 [13:06:29<5:51:47, 2.48s/it] +2025-02-05 23:14:09 - ERROR - stderr - +2025-02-05 23:14:09 - ERROR - stderr - +2025-02-05 23:14:09 - INFO - stdout - {'loss': 0.6388, 'grad_norm': 1.2471884489059448, 'learning_rate': 6.655581133584321e-06, 'epoch': 1.86} +2025-02-05 23:14:09 - ERROR - stderr - 62%|██████▏ | 13916/22434 [13:06:29<5:51:47, 2.48s/it] +2025-02-05 23:14:12 - ERROR - stderr - 62%|██████▏ | 13917/22434 [13:06:32<5:50:49, 2.47s/it] +2025-02-05 23:14:12 - ERROR - stderr - +2025-02-05 23:14:12 - ERROR - stderr - +2025-02-05 23:14:12 - INFO - stdout - {'loss': 0.7115, 'grad_norm': 1.2288116216659546, 'learning_rate': 6.654220557984492e-06, 'epoch': 1.86} +2025-02-05 23:14:12 - ERROR - stderr - 62%|██████▏ | 13917/22434 [13:06:32<5:50:49, 
2.47s/it] +2025-02-05 23:14:14 - ERROR - stderr - 62%|██████▏ | 13918/22434 [13:06:34<5:53:51, 2.49s/it] +2025-02-05 23:14:14 - ERROR - stderr - +2025-02-05 23:14:14 - ERROR - stderr - +2025-02-05 23:14:14 - INFO - stdout - {'loss': 0.7144, 'grad_norm': 1.2966489791870117, 'learning_rate': 6.652860052124235e-06, 'epoch': 1.86} +2025-02-05 23:14:14 - ERROR - stderr - 62%|██████▏ | 13918/22434 [13:06:34<5:53:51, 2.49s/it] +2025-02-05 23:14:17 - ERROR - stderr - 62%|██████▏ | 13919/22434 [13:06:37<5:54:40, 2.50s/it] +2025-02-05 23:14:17 - ERROR - stderr - +2025-02-05 23:14:17 - ERROR - stderr - +2025-02-05 23:14:17 - INFO - stdout - {'loss': 0.6394, 'grad_norm': 1.2088100910186768, 'learning_rate': 6.651499616031909e-06, 'epoch': 1.86} +2025-02-05 23:14:17 - ERROR - stderr - 62%|██████▏ | 13919/22434 [13:06:37<5:54:40, 2.50s/it] +2025-02-05 23:14:19 - ERROR - stderr - 62%|██████▏ | 13920/22434 [13:06:39<5:52:50, 2.49s/it] +2025-02-05 23:14:19 - ERROR - stderr - +2025-02-05 23:14:19 - ERROR - stderr - +2025-02-05 23:14:19 - INFO - stdout - {'loss': 0.6606, 'grad_norm': 1.2111750841140747, 'learning_rate': 6.6501392497358654e-06, 'epoch': 1.86} +2025-02-05 23:14:19 - ERROR - stderr - 62%|██████▏ | 13920/22434 [13:06:39<5:52:50, 2.49s/it] +2025-02-05 23:14:22 - ERROR - stderr - 62%|██████▏ | 13921/22434 [13:06:42<5:52:58, 2.49s/it] +2025-02-05 23:14:22 - ERROR - stderr - +2025-02-05 23:14:22 - ERROR - stderr - +2025-02-05 23:14:22 - INFO - stdout - {'loss': 0.6485, 'grad_norm': 1.1587275266647339, 'learning_rate': 6.648778953264467e-06, 'epoch': 1.86} +2025-02-05 23:14:22 - ERROR - stderr - 62%|██████▏ | 13921/22434 [13:06:42<5:52:58, 2.49s/it] +2025-02-05 23:14:24 - ERROR - stderr - 62%|██████▏ | 13922/22434 [13:06:44<5:54:33, 2.50s/it] +2025-02-05 23:14:24 - ERROR - stderr - +2025-02-05 23:14:24 - ERROR - stderr - +2025-02-05 23:14:24 - INFO - stdout - {'loss': 0.7385, 'grad_norm': 1.4167252779006958, 'learning_rate': 6.647418726646065e-06, 'epoch': 1.86} +2025-02-05 
23:14:24 - ERROR - stderr - 62%|██████▏ | 13922/22434 [13:06:44<5:54:33, 2.50s/it] +2025-02-05 23:14:27 - ERROR - stderr - 62%|██████▏ | 13923/22434 [13:06:47<5:55:44, 2.51s/it] +2025-02-05 23:14:27 - ERROR - stderr - +2025-02-05 23:14:27 - ERROR - stderr - +2025-02-05 23:14:27 - INFO - stdout - {'loss': 0.6845, 'grad_norm': 1.3064322471618652, 'learning_rate': 6.646058569909008e-06, 'epoch': 1.86} +2025-02-05 23:14:27 - ERROR - stderr - 62%|██████▏ | 13923/22434 [13:06:47<5:55:44, 2.51s/it] +2025-02-05 23:14:29 - ERROR - stderr - 62%|██████▏ | 13924/22434 [13:06:49<5:56:52, 2.52s/it] +2025-02-05 23:14:29 - ERROR - stderr - +2025-02-05 23:14:29 - ERROR - stderr - +2025-02-05 23:14:29 - INFO - stdout - {'loss': 0.6841, 'grad_norm': 1.3005396127700806, 'learning_rate': 6.644698483081654e-06, 'epoch': 1.86} +2025-02-05 23:14:29 - ERROR - stderr - 62%|██████▏ | 13924/22434 [13:06:49<5:56:52, 2.52s/it] +2025-02-05 23:14:32 - ERROR - stderr - 62%|██████▏ | 13925/22434 [13:06:52<5:58:27, 2.53s/it] +2025-02-05 23:14:32 - ERROR - stderr - +2025-02-05 23:14:32 - ERROR - stderr - +2025-02-05 23:14:32 - INFO - stdout - {'loss': 0.6703, 'grad_norm': 1.242795705795288, 'learning_rate': 6.643338466192346e-06, 'epoch': 1.86} +2025-02-05 23:14:32 - ERROR - stderr - 62%|██████▏ | 13925/22434 [13:06:52<5:58:27, 2.53s/it] +2025-02-05 23:14:34 - ERROR - stderr - 62%|██████▏ | 13926/22434 [13:06:54<5:56:51, 2.52s/it] +2025-02-05 23:14:34 - ERROR - stderr - +2025-02-05 23:14:34 - ERROR - stderr - +2025-02-05 23:14:34 - INFO - stdout - {'loss': 0.6653, 'grad_norm': 1.3630322217941284, 'learning_rate': 6.64197851926944e-06, 'epoch': 1.86} +2025-02-05 23:14:34 - ERROR - stderr - 62%|██████▏ | 13926/22434 [13:06:54<5:56:51, 2.52s/it] +2025-02-05 23:14:37 - ERROR - stderr - 62%|██████▏ | 13927/22434 [13:06:57<6:09:15, 2.60s/it] +2025-02-05 23:14:37 - ERROR - stderr - +2025-02-05 23:14:37 - ERROR - stderr - +2025-02-05 23:14:37 - INFO - stdout - {'loss': 0.6649, 'grad_norm': 1.228246808052063, 
'learning_rate': 6.640618642341279e-06, 'epoch': 1.86} +2025-02-05 23:14:37 - ERROR - stderr - 62%|██████▏ | 13927/22434 [13:06:57<6:09:15, 2.60s/it] +2025-02-05 23:14:40 - ERROR - stderr - 62%|██████▏ | 13928/22434 [13:07:00<6:06:58, 2.59s/it] +2025-02-05 23:14:40 - ERROR - stderr - +2025-02-05 23:14:40 - ERROR - stderr - +2025-02-05 23:14:40 - INFO - stdout - {'loss': 0.6562, 'grad_norm': 1.3112335205078125, 'learning_rate': 6.639258835436202e-06, 'epoch': 1.86} +2025-02-05 23:14:40 - ERROR - stderr - 62%|██████▏ | 13928/22434 [13:07:00<6:06:58, 2.59s/it] +2025-02-05 23:14:43 - ERROR - stderr - 62%|██████▏ | 13929/22434 [13:07:02<6:22:16, 2.70s/it] +2025-02-05 23:14:43 - ERROR - stderr - +2025-02-05 23:14:43 - ERROR - stderr - +2025-02-05 23:14:43 - INFO - stdout - {'loss': 0.7276, 'grad_norm': 1.2685134410858154, 'learning_rate': 6.637899098582562e-06, 'epoch': 1.86} +2025-02-05 23:14:43 - ERROR - stderr - 62%|██████▏ | 13929/22434 [13:07:03<6:22:16, 2.70s/it] +2025-02-05 23:14:45 - ERROR - stderr - 62%|██████▏ | 13930/22434 [13:07:05<6:14:19, 2.64s/it] +2025-02-05 23:14:45 - ERROR - stderr - +2025-02-05 23:14:45 - ERROR - stderr - +2025-02-05 23:14:45 - INFO - stdout - {'loss': 0.6748, 'grad_norm': 1.3184354305267334, 'learning_rate': 6.6365394318087e-06, 'epoch': 1.86} +2025-02-05 23:14:45 - ERROR - stderr - 62%|██████▏ | 13930/22434 [13:07:05<6:14:19, 2.64s/it] +2025-02-05 23:14:48 - ERROR - stderr - 62%|██████▏ | 13931/22434 [13:07:07<6:08:09, 2.60s/it] +2025-02-05 23:14:48 - ERROR - stderr - +2025-02-05 23:14:48 - ERROR - stderr - +2025-02-05 23:14:48 - INFO - stdout - {'loss': 0.6566, 'grad_norm': 1.2283575534820557, 'learning_rate': 6.635179835142951e-06, 'epoch': 1.86} +2025-02-05 23:14:48 - ERROR - stderr - 62%|██████▏ | 13931/22434 [13:07:08<6:08:09, 2.60s/it] +2025-02-05 23:14:50 - ERROR - stderr - 62%|██████▏ | 13932/22434 [13:07:10<6:03:39, 2.57s/it] +2025-02-05 23:14:50 - ERROR - stderr - +2025-02-05 23:14:50 - ERROR - stderr - +2025-02-05 23:14:50 
- INFO - stdout - {'loss': 0.6753, 'grad_norm': 1.3593336343765259, 'learning_rate': 6.633820308613662e-06, 'epoch': 1.86} +2025-02-05 23:14:50 - ERROR - stderr - 62%|██████▏ | 13932/22434 [13:07:10<6:03:39, 2.57s/it] +2025-02-05 23:14:53 - ERROR - stderr - 62%|██████▏ | 13933/22434 [13:07:12<5:59:47, 2.54s/it] +2025-02-05 23:14:53 - ERROR - stderr - +2025-02-05 23:14:53 - ERROR - stderr - +2025-02-05 23:14:53 - INFO - stdout - {'loss': 0.7558, 'grad_norm': 1.4365121126174927, 'learning_rate': 6.632460852249164e-06, 'epoch': 1.86} +2025-02-05 23:14:53 - ERROR - stderr - 62%|██████▏ | 13933/22434 [13:07:12<5:59:47, 2.54s/it] +2025-02-05 23:14:55 - ERROR - stderr - 62%|██████▏ | 13934/22434 [13:07:15<5:57:20, 2.52s/it] +2025-02-05 23:14:55 - ERROR - stderr - +2025-02-05 23:14:55 - ERROR - stderr - +2025-02-05 23:14:55 - INFO - stdout - {'loss': 0.6154, 'grad_norm': 1.133737564086914, 'learning_rate': 6.631101466077801e-06, 'epoch': 1.86} +2025-02-05 23:14:55 - ERROR - stderr - 62%|██████▏ | 13934/22434 [13:07:15<5:57:20, 2.52s/it] +2025-02-05 23:14:58 - ERROR - stderr - 62%|██████▏ | 13935/22434 [13:07:17<5:57:49, 2.53s/it] +2025-02-05 23:14:58 - ERROR - stderr - +2025-02-05 23:14:58 - ERROR - stderr - +2025-02-05 23:14:58 - INFO - stdout - {'loss': 0.6573, 'grad_norm': 1.311123251914978, 'learning_rate': 6.629742150127903e-06, 'epoch': 1.86} +2025-02-05 23:14:58 - ERROR - stderr - 62%|██████▏ | 13935/22434 [13:07:18<5:57:49, 2.53s/it] +2025-02-05 23:15:00 - ERROR - stderr - 62%|██████▏ | 13936/22434 [13:07:20<6:01:40, 2.55s/it] +2025-02-05 23:15:00 - ERROR - stderr - +2025-02-05 23:15:00 - ERROR - stderr - +2025-02-05 23:15:00 - INFO - stdout - {'loss': 0.6609, 'grad_norm': 1.3266445398330688, 'learning_rate': 6.628382904427804e-06, 'epoch': 1.86} +2025-02-05 23:15:00 - ERROR - stderr - 62%|██████▏ | 13936/22434 [13:07:20<6:01:40, 2.55s/it] +2025-02-05 23:15:03 - ERROR - stderr - 62%|██████▏ | 13937/22434 [13:07:23<5:59:52, 2.54s/it] +2025-02-05 23:15:03 - ERROR - 
stderr - +2025-02-05 23:15:03 - ERROR - stderr - +2025-02-05 23:15:03 - INFO - stdout - {'loss': 0.7114, 'grad_norm': 1.295404314994812, 'learning_rate': 6.627023729005837e-06, 'epoch': 1.86} +2025-02-05 23:15:03 - ERROR - stderr - 62%|██████▏ | 13937/22434 [13:07:23<5:59:52, 2.54s/it] +2025-02-05 23:15:05 - ERROR - stderr - 62%|██████▏ | 13938/22434 [13:07:25<5:53:56, 2.50s/it] +2025-02-05 23:15:05 - ERROR - stderr - +2025-02-05 23:15:05 - ERROR - stderr - +2025-02-05 23:15:05 - INFO - stdout - {'loss': 0.6909, 'grad_norm': 1.2988808155059814, 'learning_rate': 6.625664623890331e-06, 'epoch': 1.86} +2025-02-05 23:15:05 - ERROR - stderr - 62%|██████▏ | 13938/22434 [13:07:25<5:53:56, 2.50s/it] +2025-02-05 23:15:08 - ERROR - stderr - 62%|██████▏ | 13939/22434 [13:07:27<5:51:44, 2.48s/it] +2025-02-05 23:15:08 - ERROR - stderr - +2025-02-05 23:15:08 - ERROR - stderr - +2025-02-05 23:15:08 - INFO - stdout - {'loss': 0.6314, 'grad_norm': 1.2106274366378784, 'learning_rate': 6.624305589109622e-06, 'epoch': 1.86} +2025-02-05 23:15:08 - ERROR - stderr - 62%|██████▏ | 13939/22434 [13:07:27<5:51:44, 2.48s/it] +2025-02-05 23:15:10 - ERROR - stderr - 62%|██████▏ | 13940/22434 [13:07:30<5:54:00, 2.50s/it] +2025-02-05 23:15:10 - ERROR - stderr - +2025-02-05 23:15:10 - ERROR - stderr - +2025-02-05 23:15:10 - INFO - stdout - {'loss': 0.6548, 'grad_norm': 1.301979660987854, 'learning_rate': 6.622946624692033e-06, 'epoch': 1.86} +2025-02-05 23:15:10 - ERROR - stderr - 62%|██████▏ | 13940/22434 [13:07:30<5:54:00, 2.50s/it] +2025-02-05 23:15:13 - ERROR - stderr - 62%|██████▏ | 13941/22434 [13:07:33<5:55:22, 2.51s/it] +2025-02-05 23:15:13 - ERROR - stderr - +2025-02-05 23:15:13 - ERROR - stderr - +2025-02-05 23:15:13 - INFO - stdout - {'loss': 0.6398, 'grad_norm': 1.1528948545455933, 'learning_rate': 6.6215877306658835e-06, 'epoch': 1.86} +2025-02-05 23:15:13 - ERROR - stderr - 62%|██████▏ | 13941/22434 [13:07:33<5:55:22, 2.51s/it] +2025-02-05 23:15:15 - ERROR - stderr - 62%|██████▏ | 
13942/22434 [13:07:35<5:56:10, 2.52s/it] +2025-02-05 23:15:15 - ERROR - stderr - +2025-02-05 23:15:15 - ERROR - stderr - +2025-02-05 23:15:15 - INFO - stdout - {'loss': 0.6805, 'grad_norm': 1.2338688373565674, 'learning_rate': 6.620228907059511e-06, 'epoch': 1.86} +2025-02-05 23:15:15 - ERROR - stderr - 62%|██████▏ | 13942/22434 [13:07:35<5:56:10, 2.52s/it] +2025-02-05 23:15:18 - ERROR - stderr - 62%|██████▏ | 13943/22434 [13:07:38<5:56:17, 2.52s/it] +2025-02-05 23:15:18 - ERROR - stderr - +2025-02-05 23:15:18 - ERROR - stderr - +2025-02-05 23:15:18 - INFO - stdout - {'loss': 0.6376, 'grad_norm': 1.172389268875122, 'learning_rate': 6.618870153901231e-06, 'epoch': 1.86} +2025-02-05 23:15:18 - ERROR - stderr - 62%|██████▏ | 13943/22434 [13:07:38<5:56:17, 2.52s/it] +2025-02-05 23:15:20 - ERROR - stderr - 62%|██████▏ | 13944/22434 [13:07:40<5:54:00, 2.50s/it] +2025-02-05 23:15:20 - ERROR - stderr - +2025-02-05 23:15:20 - ERROR - stderr - +2025-02-05 23:15:20 - INFO - stdout - {'loss': 0.6932, 'grad_norm': 1.3034014701843262, 'learning_rate': 6.617511471219364e-06, 'epoch': 1.86} +2025-02-05 23:15:20 - ERROR - stderr - 62%|██████▏ | 13944/22434 [13:07:40<5:54:00, 2.50s/it] +2025-02-05 23:15:23 - ERROR - stderr - 62%|██████▏ | 13945/22434 [13:07:43<5:56:14, 2.52s/it] +2025-02-05 23:15:23 - ERROR - stderr - +2025-02-05 23:15:23 - ERROR - stderr - +2025-02-05 23:15:23 - INFO - stdout - {'loss': 0.6567, 'grad_norm': 1.29646897315979, 'learning_rate': 6.616152859042239e-06, 'epoch': 1.86} +2025-02-05 23:15:23 - ERROR - stderr - 62%|██████▏ | 13945/22434 [13:07:43<5:56:14, 2.52s/it] +2025-02-05 23:15:25 - ERROR - stderr - 62%|██████▏ | 13946/22434 [13:07:45<5:55:11, 2.51s/it] +2025-02-05 23:15:25 - ERROR - stderr - +2025-02-05 23:15:25 - ERROR - stderr - +2025-02-05 23:15:25 - INFO - stdout - {'loss': 0.5795, 'grad_norm': 1.123051643371582, 'learning_rate': 6.614794317398166e-06, 'epoch': 1.86} +2025-02-05 23:15:25 - ERROR - stderr - 62%|██████▏ | 13946/22434 
[13:07:45<5:55:11, 2.51s/it] +2025-02-05 23:15:28 - ERROR - stderr - 62%|██████▏ | 13947/22434 [13:07:47<5:51:36, 2.49s/it] +2025-02-05 23:15:28 - ERROR - stderr - +2025-02-05 23:15:28 - ERROR - stderr - +2025-02-05 23:15:28 - INFO - stdout - {'loss': 0.6706, 'grad_norm': 1.1916099786758423, 'learning_rate': 6.613435846315468e-06, 'epoch': 1.87} +2025-02-05 23:15:28 - ERROR - stderr - 62%|██████▏ | 13947/22434 [13:07:48<5:51:36, 2.49s/it] +2025-02-05 23:15:30 - ERROR - stderr - 62%|██████▏ | 13948/22434 [13:07:50<5:54:51, 2.51s/it] +2025-02-05 23:15:30 - ERROR - stderr - +2025-02-05 23:15:30 - ERROR - stderr - +2025-02-05 23:15:30 - INFO - stdout - {'loss': 0.6806, 'grad_norm': 1.2787476778030396, 'learning_rate': 6.612077445822458e-06, 'epoch': 1.87} +2025-02-05 23:15:30 - ERROR - stderr - 62%|██████▏ | 13948/22434 [13:07:50<5:54:51, 2.51s/it] +2025-02-05 23:15:33 - ERROR - stderr - 62%|██████▏ | 13949/22434 [13:07:53<6:12:12, 2.63s/it] +2025-02-05 23:15:33 - ERROR - stderr - +2025-02-05 23:15:33 - ERROR - stderr - +2025-02-05 23:15:33 - INFO - stdout - {'loss': 0.6303, 'grad_norm': 1.1563136577606201, 'learning_rate': 6.610719115947453e-06, 'epoch': 1.87} +2025-02-05 23:15:33 - ERROR - stderr - 62%|██████▏ | 13949/22434 [13:07:53<6:12:12, 2.63s/it] +2025-02-05 23:15:36 - ERROR - stderr - 62%|██████▏ | 13950/22434 [13:07:55<6:06:07, 2.59s/it] +2025-02-05 23:15:36 - ERROR - stderr - +2025-02-05 23:15:36 - ERROR - stderr - +2025-02-05 23:15:36 - INFO - stdout - {'loss': 0.6894, 'grad_norm': 1.283848762512207, 'learning_rate': 6.609360856718763e-06, 'epoch': 1.87} +2025-02-05 23:15:36 - ERROR - stderr - 62%|██████▏ | 13950/22434 [13:07:56<6:06:07, 2.59s/it] +2025-02-05 23:15:38 - ERROR - stderr - 62%|██████▏ | 13951/22434 [13:07:58<6:02:59, 2.57s/it] +2025-02-05 23:15:38 - ERROR - stderr - +2025-02-05 23:15:38 - ERROR - stderr - +2025-02-05 23:15:38 - INFO - stdout - {'loss': 0.7266, 'grad_norm': 1.3023619651794434, 'learning_rate': 6.608002668164706e-06, 'epoch': 
1.87} +2025-02-05 23:15:38 - ERROR - stderr - 62%|██████▏ | 13951/22434 [13:07:58<6:02:59, 2.57s/it] +2025-02-05 23:15:41 - ERROR - stderr - 62%|██████▏ | 13952/22434 [13:08:00<6:00:32, 2.55s/it] +2025-02-05 23:15:41 - ERROR - stderr - +2025-02-05 23:15:41 - ERROR - stderr - +2025-02-05 23:15:41 - INFO - stdout - {'loss': 0.6801, 'grad_norm': 1.3092055320739746, 'learning_rate': 6.606644550313581e-06, 'epoch': 1.87} +2025-02-05 23:15:41 - ERROR - stderr - 62%|██████▏ | 13952/22434 [13:08:01<6:00:32, 2.55s/it] +2025-02-05 23:15:43 - ERROR - stderr - 62%|██████▏ | 13953/22434 [13:08:03<6:00:18, 2.55s/it] +2025-02-05 23:15:43 - ERROR - stderr - +2025-02-05 23:15:43 - ERROR - stderr - +2025-02-05 23:15:43 - INFO - stdout - {'loss': 0.6754, 'grad_norm': 1.1721385717391968, 'learning_rate': 6.605286503193709e-06, 'epoch': 1.87} +2025-02-05 23:15:43 - ERROR - stderr - 62%|██████▏ | 13953/22434 [13:08:03<6:00:18, 2.55s/it] +2025-02-05 23:15:46 - ERROR - stderr - 62%|██████▏ | 13954/22434 [13:08:05<5:56:01, 2.52s/it] +2025-02-05 23:15:46 - ERROR - stderr - +2025-02-05 23:15:46 - ERROR - stderr - +2025-02-05 23:15:46 - INFO - stdout - {'loss': 0.6794, 'grad_norm': 1.3377279043197632, 'learning_rate': 6.603928526833386e-06, 'epoch': 1.87} +2025-02-05 23:15:46 - ERROR - stderr - 62%|██████▏ | 13954/22434 [13:08:06<5:56:01, 2.52s/it] +2025-02-05 23:15:48 - ERROR - stderr - 62%|██████▏ | 13955/22434 [13:08:08<5:51:15, 2.49s/it] +2025-02-05 23:15:48 - ERROR - stderr - +2025-02-05 23:15:48 - ERROR - stderr - +2025-02-05 23:15:48 - INFO - stdout - {'loss': 0.6764, 'grad_norm': 1.2527827024459839, 'learning_rate': 6.602570621260929e-06, 'epoch': 1.87} +2025-02-05 23:15:48 - ERROR - stderr - 62%|██████▏ | 13955/22434 [13:08:08<5:51:15, 2.49s/it] +2025-02-05 23:15:51 - ERROR - stderr - 62%|██████▏ | 13956/22434 [13:08:10<5:55:19, 2.51s/it] +2025-02-05 23:15:51 - ERROR - stderr - +2025-02-05 23:15:51 - ERROR - stderr - +2025-02-05 23:15:51 - INFO - stdout - {'loss': 0.8008, 
'grad_norm': 1.4743539094924927, 'learning_rate': 6.601212786504633e-06, 'epoch': 1.87} +2025-02-05 23:15:51 - ERROR - stderr - 62%|██████▏ | 13956/22434 [13:08:11<5:55:19, 2.51s/it] +2025-02-05 23:15:53 - ERROR - stderr - 62%|██████▏ | 13957/22434 [13:08:13<6:01:29, 2.56s/it] +2025-02-05 23:15:53 - ERROR - stderr - +2025-02-05 23:15:53 - ERROR - stderr - +2025-02-05 23:15:53 - INFO - stdout - {'loss': 0.6576, 'grad_norm': 1.2408746480941772, 'learning_rate': 6.599855022592803e-06, 'epoch': 1.87} +2025-02-05 23:15:53 - ERROR - stderr - 62%|██████▏ | 13957/22434 [13:08:13<6:01:29, 2.56s/it] +2025-02-05 23:15:56 - ERROR - stderr - 62%|██████▏ | 13958/22434 [13:08:16<5:58:04, 2.53s/it] +2025-02-05 23:15:56 - ERROR - stderr - +2025-02-05 23:15:56 - ERROR - stderr - +2025-02-05 23:15:56 - INFO - stdout - {'loss': 0.6933, 'grad_norm': 1.2729380130767822, 'learning_rate': 6.598497329553744e-06, 'epoch': 1.87} +2025-02-05 23:15:56 - ERROR - stderr - 62%|██████▏ | 13958/22434 [13:08:16<5:58:04, 2.53s/it] +2025-02-05 23:15:59 - ERROR - stderr - 62%|██████▏ | 13959/22434 [13:08:18<6:04:54, 2.58s/it] +2025-02-05 23:15:59 - ERROR - stderr - +2025-02-05 23:15:59 - ERROR - stderr - +2025-02-05 23:15:59 - INFO - stdout - {'loss': 0.642, 'grad_norm': 1.1852768659591675, 'learning_rate': 6.597139707415754e-06, 'epoch': 1.87} +2025-02-05 23:15:59 - ERROR - stderr - 62%|██████▏ | 13959/22434 [13:08:18<6:04:54, 2.58s/it] +2025-02-05 23:16:01 - ERROR - stderr - 62%|██████▏ | 13960/22434 [13:08:21<6:03:57, 2.58s/it] +2025-02-05 23:16:01 - ERROR - stderr - +2025-02-05 23:16:01 - ERROR - stderr - +2025-02-05 23:16:01 - INFO - stdout - {'loss': 0.6675, 'grad_norm': 1.220337986946106, 'learning_rate': 6.595782156207126e-06, 'epoch': 1.87} +2025-02-05 23:16:01 - ERROR - stderr - 62%|██████▏ | 13960/22434 [13:08:21<6:03:57, 2.58s/it] +2025-02-05 23:16:04 - ERROR - stderr - 62%|██████▏ | 13961/22434 [13:08:23<6:03:31, 2.57s/it] +2025-02-05 23:16:04 - ERROR - stderr - +2025-02-05 23:16:04 - 
ERROR - stderr - +2025-02-05 23:16:04 - INFO - stdout - {'loss': 0.6725, 'grad_norm': 1.175877571105957, 'learning_rate': 6.594424675956166e-06, 'epoch': 1.87} +2025-02-05 23:16:04 - ERROR - stderr - 62%|██████▏ | 13961/22434 [13:08:23<6:03:31, 2.57s/it] +2025-02-05 23:16:06 - ERROR - stderr - 62%|██████▏ | 13962/22434 [13:08:26<6:10:13, 2.62s/it] +2025-02-05 23:16:06 - ERROR - stderr - +2025-02-05 23:16:06 - ERROR - stderr - +2025-02-05 23:16:06 - INFO - stdout - {'loss': 0.6962, 'grad_norm': 1.2880406379699707, 'learning_rate': 6.593067266691162e-06, 'epoch': 1.87} +2025-02-05 23:16:06 - ERROR - stderr - 62%|██████▏ | 13962/22434 [13:08:26<6:10:13, 2.62s/it] +2025-02-05 23:16:09 - ERROR - stderr - 62%|██████▏ | 13963/22434 [13:08:29<6:08:56, 2.61s/it] +2025-02-05 23:16:09 - ERROR - stderr - +2025-02-05 23:16:09 - ERROR - stderr - +2025-02-05 23:16:09 - INFO - stdout - {'loss': 0.624, 'grad_norm': 1.0359405279159546, 'learning_rate': 6.591709928440413e-06, 'epoch': 1.87} +2025-02-05 23:16:09 - ERROR - stderr - 62%|██████▏ | 13963/22434 [13:08:29<6:08:56, 2.61s/it] +2025-02-05 23:16:12 - ERROR - stderr - 62%|██████▏ | 13964/22434 [13:08:31<6:06:31, 2.60s/it] +2025-02-05 23:16:12 - ERROR - stderr - +2025-02-05 23:16:12 - ERROR - stderr - +2025-02-05 23:16:12 - INFO - stdout - {'loss': 0.6779, 'grad_norm': 1.203827977180481, 'learning_rate': 6.59035266123221e-06, 'epoch': 1.87} +2025-02-05 23:16:12 - ERROR - stderr - 62%|██████▏ | 13964/22434 [13:08:31<6:06:31, 2.60s/it] +2025-02-05 23:16:14 - ERROR - stderr - 62%|██████▏ | 13965/22434 [13:08:34<6:04:34, 2.58s/it] +2025-02-05 23:16:14 - ERROR - stderr - +2025-02-05 23:16:14 - ERROR - stderr - +2025-02-05 23:16:14 - INFO - stdout - {'loss': 0.7174, 'grad_norm': 1.1972779035568237, 'learning_rate': 6.588995465094839e-06, 'epoch': 1.87} +2025-02-05 23:16:14 - ERROR - stderr - 62%|██████▏ | 13965/22434 [13:08:34<6:04:34, 2.58s/it] +2025-02-05 23:16:17 - ERROR - stderr - 62%|██████▏ | 13966/22434 [13:08:36<5:58:52, 
2.54s/it] +2025-02-05 23:16:17 - ERROR - stderr - +2025-02-05 23:16:17 - ERROR - stderr - +2025-02-05 23:16:17 - INFO - stdout - {'loss': 0.7096, 'grad_norm': 1.435289978981018, 'learning_rate': 6.587638340056598e-06, 'epoch': 1.87} +2025-02-05 23:16:17 - ERROR - stderr - 62%|██████▏ | 13966/22434 [13:08:36<5:58:52, 2.54s/it] +2025-02-05 23:16:19 - ERROR - stderr - 62%|██████▏ | 13967/22434 [13:08:39<5:58:26, 2.54s/it] +2025-02-05 23:16:19 - ERROR - stderr - +2025-02-05 23:16:19 - ERROR - stderr - +2025-02-05 23:16:19 - INFO - stdout - {'loss': 0.6924, 'grad_norm': 1.2918485403060913, 'learning_rate': 6.5862812861457685e-06, 'epoch': 1.87} +2025-02-05 23:16:19 - ERROR - stderr - 62%|██████▏ | 13967/22434 [13:08:39<5:58:26, 2.54s/it] +2025-02-05 23:16:22 - ERROR - stderr - 62%|██████▏ | 13968/22434 [13:08:41<5:59:34, 2.55s/it] +2025-02-05 23:16:22 - ERROR - stderr - +2025-02-05 23:16:22 - ERROR - stderr - +2025-02-05 23:16:22 - INFO - stdout - {'loss': 0.739, 'grad_norm': 1.3213568925857544, 'learning_rate': 6.584924303390639e-06, 'epoch': 1.87} +2025-02-05 23:16:22 - ERROR - stderr - 62%|██████▏ | 13968/22434 [13:08:41<5:59:34, 2.55s/it] +2025-02-05 23:16:24 - ERROR - stderr - 62%|██████▏ | 13969/22434 [13:08:44<6:01:50, 2.56s/it] +2025-02-05 23:16:24 - ERROR - stderr - +2025-02-05 23:16:24 - ERROR - stderr - +2025-02-05 23:16:24 - INFO - stdout - {'loss': 0.6116, 'grad_norm': 1.1669787168502808, 'learning_rate': 6.583567391819494e-06, 'epoch': 1.87} +2025-02-05 23:16:24 - ERROR - stderr - 62%|██████▏ | 13969/22434 [13:08:44<6:01:50, 2.56s/it] +2025-02-05 23:16:27 - ERROR - stderr - 62%|██████▏ | 13970/22434 [13:08:47<6:05:47, 2.59s/it] +2025-02-05 23:16:27 - ERROR - stderr - +2025-02-05 23:16:27 - ERROR - stderr - +2025-02-05 23:16:27 - INFO - stdout - {'loss': 0.6902, 'grad_norm': 1.488783597946167, 'learning_rate': 6.582210551460615e-06, 'epoch': 1.87} +2025-02-05 23:16:27 - ERROR - stderr - 62%|██████▏ | 13970/22434 [13:08:47<6:05:47, 2.59s/it] +2025-02-05 
23:16:30 - ERROR - stderr - 62%|██████▏ | 13971/22434 [13:08:49<6:11:10, 2.63s/it] +2025-02-05 23:16:30 - ERROR - stderr - +2025-02-05 23:16:30 - ERROR - stderr - +2025-02-05 23:16:30 - INFO - stdout - {'loss': 0.8207, 'grad_norm': 1.2737895250320435, 'learning_rate': 6.580853782342291e-06, 'epoch': 1.87} +2025-02-05 23:16:30 - ERROR - stderr - 62%|██████▏ | 13971/22434 [13:08:49<6:11:10, 2.63s/it] +2025-02-05 23:16:32 - ERROR - stderr - 62%|██████▏ | 13972/22434 [13:08:52<6:03:57, 2.58s/it] +2025-02-05 23:16:32 - ERROR - stderr - +2025-02-05 23:16:32 - ERROR - stderr - +2025-02-05 23:16:32 - INFO - stdout - {'loss': 0.6536, 'grad_norm': 1.2612347602844238, 'learning_rate': 6.5794970844928e-06, 'epoch': 1.87} +2025-02-05 23:16:32 - ERROR - stderr - 62%|██████▏ | 13972/22434 [13:08:52<6:03:57, 2.58s/it] +2025-02-05 23:16:35 - ERROR - stderr - 62%|██████▏ | 13973/22434 [13:08:54<5:57:07, 2.53s/it] +2025-02-05 23:16:35 - ERROR - stderr - +2025-02-05 23:16:35 - ERROR - stderr - +2025-02-05 23:16:35 - INFO - stdout - {'loss': 0.6772, 'grad_norm': 1.309590220451355, 'learning_rate': 6.578140457940414e-06, 'epoch': 1.87} +2025-02-05 23:16:35 - ERROR - stderr - 62%|██████▏ | 13973/22434 [13:08:54<5:57:07, 2.53s/it] +2025-02-05 23:16:37 - ERROR - stderr - 62%|██████▏ | 13974/22434 [13:08:57<5:57:15, 2.53s/it] +2025-02-05 23:16:37 - ERROR - stderr - +2025-02-05 23:16:37 - ERROR - stderr - +2025-02-05 23:16:37 - INFO - stdout - {'loss': 0.6717, 'grad_norm': 1.218559741973877, 'learning_rate': 6.576783902713419e-06, 'epoch': 1.87} +2025-02-05 23:16:37 - ERROR - stderr - 62%|██████▏ | 13974/22434 [13:08:57<5:57:15, 2.53s/it] +2025-02-05 23:16:40 - ERROR - stderr - 62%|██████▏ | 13975/22434 [13:08:59<5:54:10, 2.51s/it] +2025-02-05 23:16:40 - ERROR - stderr - +2025-02-05 23:16:40 - ERROR - stderr - +2025-02-05 23:16:40 - INFO - stdout - {'loss': 0.7271, 'grad_norm': 1.3907921314239502, 'learning_rate': 6.575427418840087e-06, 'epoch': 1.87} +2025-02-05 23:16:40 - ERROR - stderr - 
62%|██████▏ | 13975/22434 [13:08:59<5:54:10, 2.51s/it] +2025-02-05 23:16:42 - ERROR - stderr - 62%|██████▏ | 13976/22434 [13:09:02<5:50:31, 2.49s/it] +2025-02-05 23:16:42 - ERROR - stderr - +2025-02-05 23:16:42 - ERROR - stderr - +2025-02-05 23:16:42 - INFO - stdout - {'loss': 0.6883, 'grad_norm': 1.228306770324707, 'learning_rate': 6.57407100634869e-06, 'epoch': 1.87} +2025-02-05 23:16:42 - ERROR - stderr - 62%|██████▏ | 13976/22434 [13:09:02<5:50:31, 2.49s/it] +2025-02-05 23:16:44 - ERROR - stderr - 62%|██████▏ | 13977/22434 [13:09:04<5:50:35, 2.49s/it] +2025-02-05 23:16:44 - ERROR - stderr - +2025-02-05 23:16:44 - ERROR - stderr - +2025-02-05 23:16:44 - INFO - stdout - {'loss': 0.6727, 'grad_norm': 1.4250659942626953, 'learning_rate': 6.57271466526751e-06, 'epoch': 1.87} +2025-02-05 23:16:44 - ERROR - stderr - 62%|██████▏ | 13977/22434 [13:09:04<5:50:35, 2.49s/it] +2025-02-05 23:16:47 - ERROR - stderr - 62%|██████▏ | 13978/22434 [13:09:07<5:52:16, 2.50s/it] +2025-02-05 23:16:47 - ERROR - stderr - +2025-02-05 23:16:47 - ERROR - stderr - +2025-02-05 23:16:47 - INFO - stdout - {'loss': 0.655, 'grad_norm': 1.1936372518539429, 'learning_rate': 6.57135839562481e-06, 'epoch': 1.87} +2025-02-05 23:16:47 - ERROR - stderr - 62%|██████▏ | 13978/22434 [13:09:07<5:52:16, 2.50s/it] +2025-02-05 23:16:50 - ERROR - stderr - 62%|██████▏ | 13979/22434 [13:09:09<5:55:03, 2.52s/it] +2025-02-05 23:16:50 - ERROR - stderr - +2025-02-05 23:16:50 - ERROR - stderr - +2025-02-05 23:16:50 - INFO - stdout - {'loss': 0.6487, 'grad_norm': 1.2455246448516846, 'learning_rate': 6.570002197448866e-06, 'epoch': 1.87} +2025-02-05 23:16:50 - ERROR - stderr - 62%|██████▏ | 13979/22434 [13:09:09<5:55:03, 2.52s/it] +2025-02-05 23:16:52 - ERROR - stderr - 62%|██████▏ | 13980/22434 [13:09:12<5:53:04, 2.51s/it] +2025-02-05 23:16:52 - ERROR - stderr - +2025-02-05 23:16:52 - ERROR - stderr - +2025-02-05 23:16:52 - INFO - stdout - {'loss': 0.7013, 'grad_norm': 1.3272223472595215, 'learning_rate': 
6.568646070767941e-06, 'epoch': 1.87} +2025-02-05 23:16:52 - ERROR - stderr - 62%|██████▏ | 13980/22434 [13:09:12<5:53:04, 2.51s/it] +2025-02-05 23:16:55 - ERROR - stderr - 62%|██████▏ | 13981/22434 [13:09:14<5:56:27, 2.53s/it] +2025-02-05 23:16:55 - ERROR - stderr - +2025-02-05 23:16:55 - ERROR - stderr - +2025-02-05 23:16:55 - INFO - stdout - {'loss': 0.6111, 'grad_norm': 1.3171569108963013, 'learning_rate': 6.567290015610307e-06, 'epoch': 1.87} +2025-02-05 23:16:55 - ERROR - stderr - 62%|██████▏ | 13981/22434 [13:09:14<5:56:27, 2.53s/it] +2025-02-05 23:16:57 - ERROR - stderr - 62%|██████▏ | 13982/22434 [13:09:17<5:55:30, 2.52s/it] +2025-02-05 23:16:57 - ERROR - stderr - +2025-02-05 23:16:57 - ERROR - stderr - +2025-02-05 23:16:57 - INFO - stdout - {'loss': 0.6016, 'grad_norm': 1.1855790615081787, 'learning_rate': 6.5659340320042274e-06, 'epoch': 1.87} +2025-02-05 23:16:57 - ERROR - stderr - 62%|██████▏ | 13982/22434 [13:09:17<5:55:30, 2.52s/it] +2025-02-05 23:17:00 - ERROR - stderr - 62%|██████▏ | 13983/22434 [13:09:19<5:55:35, 2.52s/it] +2025-02-05 23:17:00 - ERROR - stderr - +2025-02-05 23:17:00 - ERROR - stderr - +2025-02-05 23:17:00 - INFO - stdout - {'loss': 0.7092, 'grad_norm': 1.2581253051757812, 'learning_rate': 6.564578119977969e-06, 'epoch': 1.87} +2025-02-05 23:17:00 - ERROR - stderr - 62%|██████▏ | 13983/22434 [13:09:19<5:55:35, 2.52s/it] +2025-02-05 23:17:02 - ERROR - stderr - 62%|██████▏ | 13984/22434 [13:09:22<5:55:12, 2.52s/it] +2025-02-05 23:17:02 - ERROR - stderr - +2025-02-05 23:17:02 - ERROR - stderr - +2025-02-05 23:17:02 - INFO - stdout - {'loss': 0.6017, 'grad_norm': 1.0305087566375732, 'learning_rate': 6.563222279559788e-06, 'epoch': 1.87} +2025-02-05 23:17:02 - ERROR - stderr - 62%|██████▏ | 13984/22434 [13:09:22<5:55:12, 2.52s/it] +2025-02-05 23:17:05 - ERROR - stderr - 62%|██████▏ | 13985/22434 [13:09:25<6:00:00, 2.56s/it] +2025-02-05 23:17:05 - ERROR - stderr - +2025-02-05 23:17:05 - ERROR - stderr - +2025-02-05 23:17:05 - INFO - 
stdout - {'loss': 0.7062, 'grad_norm': 1.386892318725586, 'learning_rate': 6.5618665107779545e-06, 'epoch': 1.87} +2025-02-05 23:17:05 - ERROR - stderr - 62%|██████▏ | 13985/22434 [13:09:25<6:00:00, 2.56s/it] +2025-02-05 23:17:07 - ERROR - stderr - 62%|██████▏ | 13986/22434 [13:09:27<5:54:57, 2.52s/it] +2025-02-05 23:17:07 - ERROR - stderr - +2025-02-05 23:17:07 - ERROR - stderr - +2025-02-05 23:17:07 - INFO - stdout - {'loss': 0.6704, 'grad_norm': 1.1256368160247803, 'learning_rate': 6.560510813660719e-06, 'epoch': 1.87} +2025-02-05 23:17:07 - ERROR - stderr - 62%|██████▏ | 13986/22434 [13:09:27<5:54:57, 2.52s/it] +2025-02-05 23:17:10 - ERROR - stderr - 62%|██████▏ | 13987/22434 [13:09:30<6:00:33, 2.56s/it] +2025-02-05 23:17:10 - ERROR - stderr - +2025-02-05 23:17:10 - ERROR - stderr - +2025-02-05 23:17:10 - INFO - stdout - {'loss': 0.6754, 'grad_norm': 1.2771196365356445, 'learning_rate': 6.559155188236348e-06, 'epoch': 1.87} +2025-02-05 23:17:10 - ERROR - stderr - 62%|██████▏ | 13987/22434 [13:09:30<6:00:33, 2.56s/it] +2025-02-05 23:17:12 - ERROR - stderr - 62%|██████▏ | 13988/22434 [13:09:32<5:58:28, 2.55s/it] +2025-02-05 23:17:12 - ERROR - stderr - +2025-02-05 23:17:12 - ERROR - stderr - +2025-02-05 23:17:12 - INFO - stdout - {'loss': 0.6793, 'grad_norm': 1.2199993133544922, 'learning_rate': 6.557799634533093e-06, 'epoch': 1.87} +2025-02-05 23:17:12 - ERROR - stderr - 62%|██████▏ | 13988/22434 [13:09:32<5:58:28, 2.55s/it] +2025-02-05 23:17:15 - ERROR - stderr - 62%|██████▏ | 13989/22434 [13:09:35<5:58:11, 2.54s/it] +2025-02-05 23:17:15 - ERROR - stderr - +2025-02-05 23:17:15 - ERROR - stderr - +2025-02-05 23:17:15 - INFO - stdout - {'loss': 0.7351, 'grad_norm': 1.4394389390945435, 'learning_rate': 6.556444152579209e-06, 'epoch': 1.87} +2025-02-05 23:17:15 - ERROR - stderr - 62%|██████▏ | 13989/22434 [13:09:35<5:58:11, 2.54s/it] +2025-02-05 23:17:17 - ERROR - stderr - 62%|██████▏ | 13990/22434 [13:09:37<5:56:50, 2.54s/it] +2025-02-05 23:17:17 - ERROR - stderr - 
+2025-02-05 23:17:17 - ERROR - stderr - +2025-02-05 23:17:17 - INFO - stdout - {'loss': 0.6663, 'grad_norm': 1.3375440835952759, 'learning_rate': 6.555088742402955e-06, 'epoch': 1.87} +2025-02-05 23:17:17 - ERROR - stderr - 62%|██████▏ | 13990/22434 [13:09:37<5:56:50, 2.54s/it] +2025-02-05 23:17:20 - ERROR - stderr - 62%|██████▏ | 13991/22434 [13:09:40<5:56:40, 2.53s/it] +2025-02-05 23:17:20 - ERROR - stderr - +2025-02-05 23:17:20 - ERROR - stderr - +2025-02-05 23:17:20 - INFO - stdout - {'loss': 0.6384, 'grad_norm': 1.163658618927002, 'learning_rate': 6.55373340403258e-06, 'epoch': 1.87} +2025-02-05 23:17:20 - ERROR - stderr - 62%|██████▏ | 13991/22434 [13:09:40<5:56:40, 2.53s/it] +2025-02-05 23:17:23 - ERROR - stderr - 62%|██████▏ | 13992/22434 [13:09:42<5:55:59, 2.53s/it] +2025-02-05 23:17:23 - ERROR - stderr - +2025-02-05 23:17:23 - ERROR - stderr - +2025-02-05 23:17:23 - INFO - stdout - {'loss': 0.5907, 'grad_norm': 1.1022019386291504, 'learning_rate': 6.552378137496332e-06, 'epoch': 1.87} +2025-02-05 23:17:23 - ERROR - stderr - 62%|██████▏ | 13992/22434 [13:09:42<5:55:59, 2.53s/it] +2025-02-05 23:17:25 - ERROR - stderr - 62%|██████▏ | 13993/22434 [13:09:45<5:52:25, 2.51s/it] +2025-02-05 23:17:25 - ERROR - stderr - +2025-02-05 23:17:25 - ERROR - stderr - +2025-02-05 23:17:25 - INFO - stdout - {'loss': 0.6767, 'grad_norm': 1.2540149688720703, 'learning_rate': 6.551022942822465e-06, 'epoch': 1.87} +2025-02-05 23:17:25 - ERROR - stderr - 62%|██████▏ | 13993/22434 [13:09:45<5:52:25, 2.51s/it] +2025-02-05 23:17:27 - ERROR - stderr - 62%|██████▏ | 13994/22434 [13:09:47<5:50:29, 2.49s/it] +2025-02-05 23:17:27 - ERROR - stderr - +2025-02-05 23:17:27 - ERROR - stderr - +2025-02-05 23:17:27 - INFO - stdout - {'loss': 0.729, 'grad_norm': 1.331811785697937, 'learning_rate': 6.549667820039221e-06, 'epoch': 1.87} +2025-02-05 23:17:27 - ERROR - stderr - 62%|██████▏ | 13994/22434 [13:09:47<5:50:29, 2.49s/it] +2025-02-05 23:17:30 - ERROR - stderr - 62%|██████▏ | 13995/22434 
[13:09:50<5:56:33, 2.54s/it] +2025-02-05 23:17:30 - ERROR - stderr - +2025-02-05 23:17:30 - ERROR - stderr - +2025-02-05 23:17:30 - INFO - stdout - {'loss': 0.6618, 'grad_norm': 1.2621463537216187, 'learning_rate': 6.548312769174852e-06, 'epoch': 1.87} +2025-02-05 23:17:30 - ERROR - stderr - 62%|██████▏ | 13995/22434 [13:09:50<5:56:33, 2.54s/it] +2025-02-05 23:17:32 - ERROR - stderr - 62%|██████▏ | 13996/22434 [13:09:52<5:53:04, 2.51s/it] +2025-02-05 23:17:33 - ERROR - stderr - +2025-02-05 23:17:33 - ERROR - stderr - +2025-02-05 23:17:33 - INFO - stdout - {'loss': 0.6367, 'grad_norm': 1.252867579460144, 'learning_rate': 6.546957790257602e-06, 'epoch': 1.87} +2025-02-05 23:17:33 - ERROR - stderr - 62%|██████▏ | 13996/22434 [13:09:52<5:53:04, 2.51s/it] +2025-02-05 23:17:35 - ERROR - stderr - 62%|██████▏ | 13997/22434 [13:09:55<5:52:33, 2.51s/it] +2025-02-05 23:17:35 - ERROR - stderr - +2025-02-05 23:17:35 - ERROR - stderr - +2025-02-05 23:17:35 - INFO - stdout - {'loss': 0.6166, 'grad_norm': 1.1300737857818604, 'learning_rate': 6.545602883315708e-06, 'epoch': 1.87} +2025-02-05 23:17:35 - ERROR - stderr - 62%|██████▏ | 13997/22434 [13:09:55<5:52:33, 2.51s/it] +2025-02-05 23:17:38 - ERROR - stderr - 62%|██████▏ | 13998/22434 [13:09:57<5:53:16, 2.51s/it] +2025-02-05 23:17:38 - ERROR - stderr - +2025-02-05 23:17:38 - ERROR - stderr - +2025-02-05 23:17:38 - INFO - stdout - {'loss': 0.6727, 'grad_norm': 1.0902920961380005, 'learning_rate': 6.5442480483774215e-06, 'epoch': 1.87} +2025-02-05 23:17:38 - ERROR - stderr - 62%|██████▏ | 13998/22434 [13:09:57<5:53:16, 2.51s/it] +2025-02-05 23:17:40 - ERROR - stderr - 62%|██████▏ | 13999/22434 [13:10:00<5:49:53, 2.49s/it] +2025-02-05 23:17:40 - ERROR - stderr - +2025-02-05 23:17:40 - ERROR - stderr - +2025-02-05 23:17:40 - INFO - stdout - {'loss': 0.6912, 'grad_norm': 1.4419710636138916, 'learning_rate': 6.542893285470975e-06, 'epoch': 1.87} +2025-02-05 23:17:40 - ERROR - stderr - 62%|██████▏ | 13999/22434 [13:10:00<5:49:53, 
2.49s/it] +2025-02-05 23:17:42 - ERROR - stderr - 62%|██████▏ | 14000/22434 [13:10:02<5:49:22, 2.49s/it] +2025-02-05 23:17:42 - ERROR - stderr - +2025-02-05 23:17:42 - ERROR - stderr - +2025-02-05 23:17:42 - INFO - stdout - {'loss': 0.6021, 'grad_norm': 1.0174311399459839, 'learning_rate': 6.5415385946246106e-06, 'epoch': 1.87} +2025-02-05 23:17:42 - ERROR - stderr - 62%|██████▏ | 14000/22434 [13:10:02<5:49:22, 2.49s/it] +2025-02-05 23:17:45 - ERROR - stderr - 62%|██████▏ | 14001/22434 [13:10:05<5:51:56, 2.50s/it] +2025-02-05 23:17:45 - ERROR - stderr - +2025-02-05 23:17:45 - ERROR - stderr - +2025-02-05 23:17:45 - INFO - stdout - {'loss': 0.7583, 'grad_norm': 1.3478707075119019, 'learning_rate': 6.540183975866563e-06, 'epoch': 1.87} +2025-02-05 23:17:45 - ERROR - stderr - 62%|██████▏ | 14001/22434 [13:10:05<5:51:56, 2.50s/it] +2025-02-05 23:17:47 - ERROR - stderr - 62%|██████▏ | 14002/22434 [13:10:07<5:50:02, 2.49s/it] +2025-02-05 23:17:47 - ERROR - stderr - +2025-02-05 23:17:47 - ERROR - stderr - +2025-02-05 23:17:47 - INFO - stdout - {'loss': 0.6328, 'grad_norm': 1.1823484897613525, 'learning_rate': 6.538829429225068e-06, 'epoch': 1.87} +2025-02-05 23:17:47 - ERROR - stderr - 62%|██████▏ | 14002/22434 [13:10:07<5:50:02, 2.49s/it] +2025-02-05 23:17:50 - ERROR - stderr - 62%|██████▏ | 14003/22434 [13:10:10<5:51:15, 2.50s/it] +2025-02-05 23:17:50 - ERROR - stderr - +2025-02-05 23:17:50 - ERROR - stderr - +2025-02-05 23:17:50 - INFO - stdout - {'loss': 0.6187, 'grad_norm': 1.2060277462005615, 'learning_rate': 6.537474954728368e-06, 'epoch': 1.87} +2025-02-05 23:17:50 - ERROR - stderr - 62%|██████▏ | 14003/22434 [13:10:10<5:51:15, 2.50s/it] +2025-02-05 23:17:52 - ERROR - stderr - 62%|██████▏ | 14004/22434 [13:10:12<5:50:42, 2.50s/it] +2025-02-05 23:17:52 - ERROR - stderr - +2025-02-05 23:17:52 - ERROR - stderr - +2025-02-05 23:17:52 - INFO - stdout - {'loss': 0.7761, 'grad_norm': 1.281545877456665, 'learning_rate': 6.536120552404688e-06, 'epoch': 1.87} +2025-02-05 
23:17:52 - ERROR - stderr - 62%|██████▏ | 14004/22434 [13:10:12<5:50:42, 2.50s/it] +2025-02-05 23:17:55 - ERROR - stderr - 62%|██████▏ | 14005/22434 [13:10:15<5:50:20, 2.49s/it] +2025-02-05 23:17:55 - ERROR - stderr - +2025-02-05 23:17:55 - ERROR - stderr - +2025-02-05 23:17:55 - INFO - stdout - {'loss': 0.7047, 'grad_norm': 1.296919584274292, 'learning_rate': 6.534766222282256e-06, 'epoch': 1.87} +2025-02-05 23:17:55 - ERROR - stderr - 62%|██████▏ | 14005/22434 [13:10:15<5:50:20, 2.49s/it] +2025-02-05 23:17:57 - ERROR - stderr - 62%|██████▏ | 14006/22434 [13:10:17<5:51:58, 2.51s/it] +2025-02-05 23:17:58 - ERROR - stderr - +2025-02-05 23:17:58 - ERROR - stderr - +2025-02-05 23:17:58 - INFO - stdout - {'loss': 0.685, 'grad_norm': 1.227433204650879, 'learning_rate': 6.533411964389311e-06, 'epoch': 1.87} +2025-02-05 23:17:58 - ERROR - stderr - 62%|██████▏ | 14006/22434 [13:10:17<5:51:58, 2.51s/it] +2025-02-05 23:18:00 - ERROR - stderr - 62%|██████▏ | 14007/22434 [13:10:20<5:50:38, 2.50s/it] +2025-02-05 23:18:00 - ERROR - stderr - +2025-02-05 23:18:00 - ERROR - stderr - +2025-02-05 23:18:00 - INFO - stdout - {'loss': 0.789, 'grad_norm': 1.333950161933899, 'learning_rate': 6.532057778754074e-06, 'epoch': 1.87} +2025-02-05 23:18:00 - ERROR - stderr - 62%|██████▏ | 14007/22434 [13:10:20<5:50:38, 2.50s/it] +2025-02-05 23:18:02 - ERROR - stderr - 62%|██████▏ | 14008/22434 [13:10:22<5:47:57, 2.48s/it] +2025-02-05 23:18:02 - ERROR - stderr - +2025-02-05 23:18:02 - ERROR - stderr - +2025-02-05 23:18:02 - INFO - stdout - {'loss': 0.6953, 'grad_norm': 1.3873847723007202, 'learning_rate': 6.530703665404772e-06, 'epoch': 1.87} +2025-02-05 23:18:02 - ERROR - stderr - 62%|██████▏ | 14008/22434 [13:10:22<5:47:57, 2.48s/it] +2025-02-05 23:18:05 - ERROR - stderr - 62%|██████▏ | 14009/22434 [13:10:25<5:47:20, 2.47s/it] +2025-02-05 23:18:05 - ERROR - stderr - +2025-02-05 23:18:05 - ERROR - stderr - +2025-02-05 23:18:05 - INFO - stdout - {'loss': 0.6423, 'grad_norm': 1.1854841709136963, 
'learning_rate': 6.529349624369637e-06, 'epoch': 1.87} +2025-02-05 23:18:05 - ERROR - stderr - 62%|██████▏ | 14009/22434 [13:10:25<5:47:20, 2.47s/it] +2025-02-05 23:18:07 - ERROR - stderr - 62%|██████▏ | 14010/22434 [13:10:27<5:53:10, 2.52s/it] +2025-02-05 23:18:08 - ERROR - stderr - +2025-02-05 23:18:08 - ERROR - stderr - +2025-02-05 23:18:08 - INFO - stdout - {'loss': 0.6532, 'grad_norm': 1.1603190898895264, 'learning_rate': 6.527995655676882e-06, 'epoch': 1.87} +2025-02-05 23:18:08 - ERROR - stderr - 62%|██████▏ | 14010/22434 [13:10:27<5:53:10, 2.52s/it] +2025-02-05 23:18:10 - ERROR - stderr - 62%|██████▏ | 14011/22434 [13:10:30<6:01:27, 2.57s/it] +2025-02-05 23:18:10 - ERROR - stderr - +2025-02-05 23:18:10 - ERROR - stderr - +2025-02-05 23:18:10 - INFO - stdout - {'loss': 0.5951, 'grad_norm': 1.21134614944458, 'learning_rate': 6.5266417593547415e-06, 'epoch': 1.87} +2025-02-05 23:18:10 - ERROR - stderr - 62%|██████▏ | 14011/22434 [13:10:30<6:01:27, 2.57s/it] +2025-02-05 23:18:13 - ERROR - stderr - 62%|██████▏ | 14012/22434 [13:10:32<5:59:10, 2.56s/it] +2025-02-05 23:18:13 - ERROR - stderr - +2025-02-05 23:18:13 - ERROR - stderr - +2025-02-05 23:18:13 - INFO - stdout - {'loss': 0.7585, 'grad_norm': 1.3154667615890503, 'learning_rate': 6.525287935431427e-06, 'epoch': 1.87} +2025-02-05 23:18:13 - ERROR - stderr - 62%|██████▏ | 14012/22434 [13:10:33<5:59:10, 2.56s/it] +2025-02-05 23:18:15 - ERROR - stderr - 62%|██████▏ | 14013/22434 [13:10:35<5:55:47, 2.54s/it] +2025-02-05 23:18:15 - ERROR - stderr - +2025-02-05 23:18:15 - ERROR - stderr - +2025-02-05 23:18:15 - INFO - stdout - {'loss': 0.6322, 'grad_norm': 1.2063173055648804, 'learning_rate': 6.523934183935161e-06, 'epoch': 1.87} +2025-02-05 23:18:15 - ERROR - stderr - 62%|██████▏ | 14013/22434 [13:10:35<5:55:47, 2.54s/it] +2025-02-05 23:18:18 - ERROR - stderr - 62%|██████▏ | 14014/22434 [13:10:38<6:01:40, 2.58s/it] +2025-02-05 23:18:18 - ERROR - stderr - +2025-02-05 23:18:18 - ERROR - stderr - +2025-02-05 
23:18:18 - INFO - stdout - {'loss': 0.6884, 'grad_norm': 1.3013668060302734, 'learning_rate': 6.522580504894161e-06, 'epoch': 1.87} +2025-02-05 23:18:18 - ERROR - stderr - 62%|██████▏ | 14014/22434 [13:10:38<6:01:40, 2.58s/it] +2025-02-05 23:18:20 - ERROR - stderr - 62%|██████▏ | 14015/22434 [13:10:40<6:02:27, 2.58s/it] +2025-02-05 23:18:20 - ERROR - stderr - +2025-02-05 23:18:20 - ERROR - stderr - +2025-02-05 23:18:20 - INFO - stdout - {'loss': 0.6403, 'grad_norm': 1.0830801725387573, 'learning_rate': 6.521226898336643e-06, 'epoch': 1.87} +2025-02-05 23:18:20 - ERROR - stderr - 62%|██████▏ | 14015/22434 [13:10:40<6:02:27, 2.58s/it] +2025-02-05 23:18:23 - ERROR - stderr - 62%|██████▏ | 14016/22434 [13:10:43<6:02:10, 2.58s/it] +2025-02-05 23:18:23 - ERROR - stderr - +2025-02-05 23:18:23 - ERROR - stderr - +2025-02-05 23:18:23 - INFO - stdout - {'loss': 0.6612, 'grad_norm': 1.3827158212661743, 'learning_rate': 6.519873364290818e-06, 'epoch': 1.87} +2025-02-05 23:18:23 - ERROR - stderr - 62%|██████▏ | 14016/22434 [13:10:43<6:02:10, 2.58s/it] +2025-02-05 23:18:26 - ERROR - stderr - 62%|██████▏ | 14017/22434 [13:10:45<5:59:24, 2.56s/it] +2025-02-05 23:18:26 - ERROR - stderr - +2025-02-05 23:18:26 - ERROR - stderr - +2025-02-05 23:18:26 - INFO - stdout - {'loss': 0.6201, 'grad_norm': 1.206017255783081, 'learning_rate': 6.518519902784908e-06, 'epoch': 1.87} +2025-02-05 23:18:26 - ERROR - stderr - 62%|██████▏ | 14017/22434 [13:10:45<5:59:24, 2.56s/it] +2025-02-05 23:18:28 - ERROR - stderr - 62%|██████▏ | 14018/22434 [13:10:48<6:00:23, 2.57s/it] +2025-02-05 23:18:28 - ERROR - stderr - +2025-02-05 23:18:28 - ERROR - stderr - +2025-02-05 23:18:28 - INFO - stdout - {'loss': 0.6851, 'grad_norm': 1.2448264360427856, 'learning_rate': 6.517166513847115e-06, 'epoch': 1.87} +2025-02-05 23:18:28 - ERROR - stderr - 62%|██████▏ | 14018/22434 [13:10:48<6:00:23, 2.57s/it] +2025-02-05 23:18:31 - ERROR - stderr - 62%|██████▏ | 14019/22434 [13:10:50<5:56:57, 2.55s/it] +2025-02-05 23:18:31 - 
ERROR - stderr - +2025-02-05 23:18:31 - ERROR - stderr - +2025-02-05 23:18:31 - INFO - stdout - {'loss': 0.6262, 'grad_norm': 1.2364295721054077, 'learning_rate': 6.515813197505656e-06, 'epoch': 1.87} +2025-02-05 23:18:31 - ERROR - stderr - 62%|██████▏ | 14019/22434 [13:10:50<5:56:57, 2.55s/it] +2025-02-05 23:18:33 - ERROR - stderr - 62%|██████▏ | 14020/22434 [13:10:53<5:54:43, 2.53s/it] +2025-02-05 23:18:33 - ERROR - stderr - +2025-02-05 23:18:33 - ERROR - stderr - +2025-02-05 23:18:33 - INFO - stdout - {'loss': 0.7105, 'grad_norm': 1.4403367042541504, 'learning_rate': 6.514459953788737e-06, 'epoch': 1.87} +2025-02-05 23:18:33 - ERROR - stderr - 62%|██████▏ | 14020/22434 [13:10:53<5:54:43, 2.53s/it] +2025-02-05 23:18:36 - ERROR - stderr - 62%|██████▏ | 14021/22434 [13:10:55<5:52:29, 2.51s/it] +2025-02-05 23:18:36 - ERROR - stderr - +2025-02-05 23:18:36 - ERROR - stderr - +2025-02-05 23:18:36 - INFO - stdout - {'loss': 0.7084, 'grad_norm': 1.3327215909957886, 'learning_rate': 6.513106782724561e-06, 'epoch': 1.87} +2025-02-05 23:18:36 - ERROR - stderr - 62%|██████▏ | 14021/22434 [13:10:55<5:52:29, 2.51s/it] +2025-02-05 23:18:39 - ERROR - stderr - 63%|██████▎ | 14022/22434 [13:10:58<6:13:51, 2.67s/it] +2025-02-05 23:18:39 - ERROR - stderr - +2025-02-05 23:18:39 - ERROR - stderr - +2025-02-05 23:18:39 - INFO - stdout - {'loss': 0.6175, 'grad_norm': 1.1748720407485962, 'learning_rate': 6.511753684341342e-06, 'epoch': 1.88} +2025-02-05 23:18:39 - ERROR - stderr - 63%|██████▎ | 14022/22434 [13:10:58<6:13:51, 2.67s/it] +2025-02-05 23:18:41 - ERROR - stderr - 63%|██████▎ | 14023/22434 [13:11:01<6:07:37, 2.62s/it] +2025-02-05 23:18:41 - ERROR - stderr - +2025-02-05 23:18:41 - ERROR - stderr - +2025-02-05 23:18:41 - INFO - stdout - {'loss': 0.7218, 'grad_norm': 1.2808758020401, 'learning_rate': 6.510400658667276e-06, 'epoch': 1.88} +2025-02-05 23:18:41 - ERROR - stderr - 63%|██████▎ | 14023/22434 [13:11:01<6:07:37, 2.62s/it] +2025-02-05 23:18:44 - ERROR - stderr - 
63%|██████▎ | 14024/22434 [13:11:03<6:00:43, 2.57s/it] +2025-02-05 23:18:44 - ERROR - stderr - +2025-02-05 23:18:44 - ERROR - stderr - +2025-02-05 23:18:44 - INFO - stdout - {'loss': 0.621, 'grad_norm': 1.2290514707565308, 'learning_rate': 6.509047705730572e-06, 'epoch': 1.88} +2025-02-05 23:18:44 - ERROR - stderr - 63%|██████▎ | 14024/22434 [13:11:03<6:00:43, 2.57s/it] +2025-02-05 23:18:46 - ERROR - stderr - 63%|██████▎ | 14025/22434 [13:11:06<5:58:16, 2.56s/it] +2025-02-05 23:18:46 - ERROR - stderr - +2025-02-05 23:18:46 - ERROR - stderr - +2025-02-05 23:18:46 - INFO - stdout - {'loss': 0.7223, 'grad_norm': 1.283898115158081, 'learning_rate': 6.507694825559429e-06, 'epoch': 1.88} +2025-02-05 23:18:46 - ERROR - stderr - 63%|██████▎ | 14025/22434 [13:11:06<5:58:16, 2.56s/it] +2025-02-05 23:18:49 - ERROR - stderr - 63%|██████▎ | 14026/22434 [13:11:09<6:15:53, 2.68s/it] +2025-02-05 23:18:49 - ERROR - stderr - +2025-02-05 23:18:49 - ERROR - stderr - +2025-02-05 23:18:49 - INFO - stdout - {'loss': 0.6513, 'grad_norm': 1.3426834344863892, 'learning_rate': 6.506342018182041e-06, 'epoch': 1.88} +2025-02-05 23:18:49 - ERROR - stderr - 63%|██████▎ | 14026/22434 [13:11:09<6:15:53, 2.68s/it] +2025-02-05 23:18:52 - ERROR - stderr - 63%|██████▎ | 14027/22434 [13:11:11<6:07:07, 2.62s/it] +2025-02-05 23:18:52 - ERROR - stderr - +2025-02-05 23:18:52 - ERROR - stderr - +2025-02-05 23:18:52 - INFO - stdout - {'loss': 0.7162, 'grad_norm': 1.301448106765747, 'learning_rate': 6.5049892836266135e-06, 'epoch': 1.88} +2025-02-05 23:18:52 - ERROR - stderr - 63%|██████▎ | 14027/22434 [13:11:11<6:07:07, 2.62s/it] +2025-02-05 23:18:54 - ERROR - stderr - 63%|██████▎ | 14028/22434 [13:11:14<6:07:37, 2.62s/it] +2025-02-05 23:18:54 - ERROR - stderr - +2025-02-05 23:18:54 - ERROR - stderr - +2025-02-05 23:18:54 - INFO - stdout - {'loss': 0.7553, 'grad_norm': 1.3862535953521729, 'learning_rate': 6.503636621921342e-06, 'epoch': 1.88} +2025-02-05 23:18:54 - ERROR - stderr - 63%|██████▎ | 14028/22434 
[13:11:14<6:07:37, 2.62s/it] +2025-02-05 23:18:57 - ERROR - stderr - 63%|██████▎ | 14029/22434 [13:11:17<6:12:56, 2.66s/it] +2025-02-05 23:18:57 - ERROR - stderr - +2025-02-05 23:18:57 - ERROR - stderr - +2025-02-05 23:18:57 - INFO - stdout - {'loss': 0.6599, 'grad_norm': 1.254606008529663, 'learning_rate': 6.502284033094415e-06, 'epoch': 1.88} +2025-02-05 23:18:57 - ERROR - stderr - 63%|██████▎ | 14029/22434 [13:11:17<6:12:56, 2.66s/it] +2025-02-05 23:18:59 - ERROR - stderr - 63%|██████▎ | 14030/22434 [13:11:19<6:02:08, 2.59s/it] +2025-02-05 23:18:59 - ERROR - stderr - +2025-02-05 23:18:59 - ERROR - stderr - +2025-02-05 23:18:59 - INFO - stdout - {'loss': 0.6775, 'grad_norm': 1.3500593900680542, 'learning_rate': 6.500931517174034e-06, 'epoch': 1.88} +2025-02-05 23:18:59 - ERROR - stderr - 63%|██████▎ | 14030/22434 [13:11:19<6:02:08, 2.59s/it] +2025-02-05 23:19:02 - ERROR - stderr - 63%|██████▎ | 14031/22434 [13:11:22<6:02:39, 2.59s/it] +2025-02-05 23:19:02 - ERROR - stderr - +2025-02-05 23:19:02 - ERROR - stderr - +2025-02-05 23:19:02 - INFO - stdout - {'loss': 0.7006, 'grad_norm': 1.3084383010864258, 'learning_rate': 6.499579074188385e-06, 'epoch': 1.88} +2025-02-05 23:19:02 - ERROR - stderr - 63%|██████▎ | 14031/22434 [13:11:22<6:02:39, 2.59s/it] +2025-02-05 23:19:05 - ERROR - stderr - 63%|██████▎ | 14032/22434 [13:11:24<6:03:55, 2.60s/it] +2025-02-05 23:19:05 - ERROR - stderr - +2025-02-05 23:19:05 - ERROR - stderr - +2025-02-05 23:19:05 - INFO - stdout - {'loss': 0.7456, 'grad_norm': 1.4097360372543335, 'learning_rate': 6.498226704165662e-06, 'epoch': 1.88} +2025-02-05 23:19:05 - ERROR - stderr - 63%|██████▎ | 14032/22434 [13:11:24<6:03:55, 2.60s/it] +2025-02-05 23:19:07 - ERROR - stderr - 63%|██████▎ | 14033/22434 [13:11:27<5:59:13, 2.57s/it] +2025-02-05 23:19:07 - ERROR - stderr - +2025-02-05 23:19:07 - ERROR - stderr - +2025-02-05 23:19:07 - INFO - stdout - {'loss': 0.7007, 'grad_norm': 1.4105556011199951, 'learning_rate': 6.496874407134053e-06, 'epoch': 
1.88} +2025-02-05 23:19:07 - ERROR - stderr - 63%|██████▎ | 14033/22434 [13:11:27<5:59:13, 2.57s/it] +2025-02-05 23:19:10 - ERROR - stderr - 63%|██████▎ | 14034/22434 [13:11:29<5:58:54, 2.56s/it] +2025-02-05 23:19:10 - ERROR - stderr - +2025-02-05 23:19:10 - ERROR - stderr - +2025-02-05 23:19:10 - INFO - stdout - {'loss': 0.637, 'grad_norm': 1.1480857133865356, 'learning_rate': 6.495522183121741e-06, 'epoch': 1.88} +2025-02-05 23:19:10 - ERROR - stderr - 63%|██████▎ | 14034/22434 [13:11:29<5:58:54, 2.56s/it] +2025-02-05 23:19:12 - ERROR - stderr - 63%|██████▎ | 14035/22434 [13:11:32<5:54:06, 2.53s/it] +2025-02-05 23:19:12 - ERROR - stderr - +2025-02-05 23:19:12 - ERROR - stderr - +2025-02-05 23:19:12 - INFO - stdout - {'loss': 0.7553, 'grad_norm': 1.3478442430496216, 'learning_rate': 6.4941700321569215e-06, 'epoch': 1.88} +2025-02-05 23:19:12 - ERROR - stderr - 63%|██████▎ | 14035/22434 [13:11:32<5:54:06, 2.53s/it] +2025-02-05 23:19:15 - ERROR - stderr - 63%|██████▎ | 14036/22434 [13:11:35<6:04:03, 2.60s/it] +2025-02-05 23:19:15 - ERROR - stderr - +2025-02-05 23:19:15 - ERROR - stderr - +2025-02-05 23:19:15 - INFO - stdout - {'loss': 0.6833, 'grad_norm': 1.2754693031311035, 'learning_rate': 6.492817954267771e-06, 'epoch': 1.88} +2025-02-05 23:19:15 - ERROR - stderr - 63%|██████▎ | 14036/22434 [13:11:35<6:04:03, 2.60s/it] +2025-02-05 23:19:19 - ERROR - stderr - 63%|██████▎ | 14037/22434 [13:11:39<7:06:32, 3.05s/it] +2025-02-05 23:19:19 - ERROR - stderr - +2025-02-05 23:19:19 - ERROR - stderr - +2025-02-05 23:19:19 - INFO - stdout - {'loss': 0.6317, 'grad_norm': 1.1774544715881348, 'learning_rate': 6.491465949482471e-06, 'epoch': 1.88} +2025-02-05 23:19:19 - ERROR - stderr - 63%|██████▎ | 14037/22434 [13:11:39<7:06:32, 3.05s/it] +2025-02-05 23:19:22 - ERROR - stderr - 63%|██████▎ | 14038/22434 [13:11:42<7:04:44, 3.04s/it] +2025-02-05 23:19:22 - ERROR - stderr - +2025-02-05 23:19:22 - ERROR - stderr - +2025-02-05 23:19:22 - INFO - stdout - {'loss': 0.6889,
'grad_norm': 1.3568300008773804, 'learning_rate': 6.49011401782921e-06, 'epoch': 1.88} +2025-02-05 23:19:22 - ERROR - stderr - 63%|██████▎ | 14038/22434 [13:11:42<7:04:44, 3.04s/it] +2025-02-05 23:19:25 - ERROR - stderr - 63%|██████▎ | 14039/22434 [13:11:44<6:53:32, 2.96s/it] +2025-02-05 23:19:25 - ERROR - stderr - +2025-02-05 23:19:25 - ERROR - stderr - +2025-02-05 23:19:25 - INFO - stdout - {'loss': 0.6559, 'grad_norm': 1.3176097869873047, 'learning_rate': 6.4887621593361595e-06, 'epoch': 1.88} +2025-02-05 23:19:25 - ERROR - stderr - 63%|██████▎ | 14039/22434 [13:11:45<6:53:32, 2.96s/it] +2025-02-05 23:19:27 - ERROR - stderr - 63%|██████▎ | 14040/22434 [13:11:47<6:32:18, 2.80s/it] +2025-02-05 23:19:27 - ERROR - stderr - +2025-02-05 23:19:27 - ERROR - stderr - +2025-02-05 23:19:27 - INFO - stdout - {'loss': 0.5738, 'grad_norm': 1.1316043138504028, 'learning_rate': 6.487410374031504e-06, 'epoch': 1.88} +2025-02-05 23:19:27 - ERROR - stderr - 63%|██████▎ | 14040/22434 [13:11:47<6:32:18, 2.80s/it] +2025-02-05 23:19:30 - ERROR - stderr - 63%|██████▎ | 14041/22434 [13:11:49<6:19:16, 2.71s/it] +2025-02-05 23:19:30 - ERROR - stderr - +2025-02-05 23:19:30 - ERROR - stderr - +2025-02-05 23:19:30 - INFO - stdout - {'loss': 0.7198, 'grad_norm': 1.3101931810379028, 'learning_rate': 6.4860586619434205e-06, 'epoch': 1.88} +2025-02-05 23:19:30 - ERROR - stderr - 63%|██████▎ | 14041/22434 [13:11:49<6:19:16, 2.71s/it] +2025-02-05 23:19:32 - ERROR - stderr - 63%|██████▎ | 14042/22434 [13:11:52<6:07:14, 2.63s/it] +2025-02-05 23:19:32 - ERROR - stderr - +2025-02-05 23:19:32 - ERROR - stderr - +2025-02-05 23:19:32 - INFO - stdout - {'loss': 0.6992, 'grad_norm': 1.3052033185958862, 'learning_rate': 6.4847070231000775e-06, 'epoch': 1.88} +2025-02-05 23:19:32 - ERROR - stderr - 63%|██████▎ | 14042/22434 [13:11:52<6:07:14, 2.63s/it] +2025-02-05 23:19:34 - ERROR - stderr - 63%|██████▎ | 14043/22434 [13:11:54<5:58:09, 2.56s/it] +2025-02-05 23:19:35 - ERROR - stderr - +2025-02-05 23:19:35 - 
ERROR - stderr - +2025-02-05 23:19:35 - INFO - stdout - {'loss': 0.6926, 'grad_norm': 1.3033486604690552, 'learning_rate': 6.483355457529657e-06, 'epoch': 1.88} +2025-02-05 23:19:35 - ERROR - stderr - 63%|██████▎ | 14043/22434 [13:11:54<5:58:09, 2.56s/it] +2025-02-05 23:19:37 - ERROR - stderr - 63%|██████▎ | 14044/22434 [13:11:57<5:53:59, 2.53s/it] +2025-02-05 23:19:37 - ERROR - stderr - +2025-02-05 23:19:37 - ERROR - stderr - +2025-02-05 23:19:37 - INFO - stdout - {'loss': 0.6251, 'grad_norm': 1.2964602708816528, 'learning_rate': 6.482003965260326e-06, 'epoch': 1.88} +2025-02-05 23:19:37 - ERROR - stderr - 63%|██████▎ | 14044/22434 [13:11:57<5:53:59, 2.53s/it] +2025-02-05 23:19:39 - ERROR - stderr - 63%|██████▎ | 14045/22434 [13:11:59<5:50:59, 2.51s/it] +2025-02-05 23:19:39 - ERROR - stderr - +2025-02-05 23:19:39 - ERROR - stderr - +2025-02-05 23:19:39 - INFO - stdout - {'loss': 0.663, 'grad_norm': 1.1500821113586426, 'learning_rate': 6.480652546320254e-06, 'epoch': 1.88} +2025-02-05 23:19:39 - ERROR - stderr - 63%|██████▎ | 14045/22434 [13:11:59<5:50:59, 2.51s/it] +2025-02-05 23:19:42 - ERROR - stderr - 63%|██████▎ | 14046/22434 [13:12:02<5:51:03, 2.51s/it] +2025-02-05 23:19:42 - ERROR - stderr - +2025-02-05 23:19:42 - ERROR - stderr - +2025-02-05 23:19:42 - INFO - stdout - {'loss': 0.7686, 'grad_norm': 1.4576236009597778, 'learning_rate': 6.4793012007376125e-06, 'epoch': 1.88} +2025-02-05 23:19:42 - ERROR - stderr - 63%|██████▎ | 14046/22434 [13:12:02<5:51:03, 2.51s/it] +2025-02-05 23:19:44 - ERROR - stderr - 63%|██████▎ | 14047/22434 [13:12:04<5:52:06, 2.52s/it] +2025-02-05 23:19:44 - ERROR - stderr - +2025-02-05 23:19:44 - ERROR - stderr - +2025-02-05 23:19:44 - INFO - stdout - {'loss': 0.6454, 'grad_norm': 1.1989316940307617, 'learning_rate': 6.4779499285405655e-06, 'epoch': 1.88} +2025-02-05 23:19:45 - ERROR - stderr - 63%|██████▎ | 14047/22434 [13:12:04<5:52:06, 2.52s/it] +2025-02-05 23:19:47 - ERROR - stderr - 63%|██████▎ | 14048/22434 [13:12:07<5:50:50, 
2.51s/it] +2025-02-05 23:19:47 - ERROR - stderr - +2025-02-05 23:19:47 - ERROR - stderr - +2025-02-05 23:19:47 - INFO - stdout - {'loss': 0.6987, 'grad_norm': 1.3502057790756226, 'learning_rate': 6.476598729757289e-06, 'epoch': 1.88} +2025-02-05 23:19:47 - ERROR - stderr - 63%|██████▎ | 14048/22434 [13:12:07<5:50:50, 2.51s/it] +2025-02-05 23:19:49 - ERROR - stderr - 63%|██████▎ | 14049/22434 [13:12:09<5:49:59, 2.50s/it] +2025-02-05 23:19:49 - ERROR - stderr - +2025-02-05 23:19:49 - ERROR - stderr - +2025-02-05 23:19:49 - INFO - stdout - {'loss': 0.7016, 'grad_norm': 1.3030344247817993, 'learning_rate': 6.475247604415937e-06, 'epoch': 1.88} +2025-02-05 23:19:49 - ERROR - stderr - 63%|██████▎ | 14049/22434 [13:12:09<5:49:59, 2.50s/it] +2025-02-05 23:19:52 - ERROR - stderr - 63%|██████▎ | 14050/22434 [13:12:12<5:51:44, 2.52s/it] +2025-02-05 23:19:52 - ERROR - stderr - +2025-02-05 23:19:52 - ERROR - stderr - +2025-02-05 23:19:52 - INFO - stdout - {'loss': 0.6321, 'grad_norm': 1.1843358278274536, 'learning_rate': 6.473896552544674e-06, 'epoch': 1.88} +2025-02-05 23:19:52 - ERROR - stderr - 63%|██████▎ | 14050/22434 [13:12:12<5:51:44, 2.52s/it] +2025-02-05 23:19:54 - ERROR - stderr - 63%|██████▎ | 14051/22434 [13:12:14<5:48:55, 2.50s/it] +2025-02-05 23:19:54 - ERROR - stderr - +2025-02-05 23:19:54 - ERROR - stderr - +2025-02-05 23:19:54 - INFO - stdout - {'loss': 0.6878, 'grad_norm': 1.2567222118377686, 'learning_rate': 6.472545574171667e-06, 'epoch': 1.88} +2025-02-05 23:19:54 - ERROR - stderr - 63%|██████▎ | 14051/22434 [13:12:14<5:48:55, 2.50s/it] +2025-02-05 23:19:57 - ERROR - stderr - 63%|██████▎ | 14052/22434 [13:12:17<5:50:10, 2.51s/it] +2025-02-05 23:19:57 - ERROR - stderr - +2025-02-05 23:19:57 - ERROR - stderr - +2025-02-05 23:19:57 - INFO - stdout - {'loss': 0.747, 'grad_norm': 1.2484915256500244, 'learning_rate': 6.471194669325069e-06, 'epoch': 1.88} +2025-02-05 23:19:57 - ERROR - stderr - 63%|██████▎ | 14052/22434 [13:12:17<5:50:10, 2.51s/it] +2025-02-05 
23:19:59 - ERROR - stderr - 63%|██████▎ | 14053/22434 [13:12:19<5:47:41, 2.49s/it] +2025-02-05 23:19:59 - ERROR - stderr - +2025-02-05 23:19:59 - ERROR - stderr - +2025-02-05 23:19:59 - INFO - stdout - {'loss': 0.6248, 'grad_norm': 1.3272696733474731, 'learning_rate': 6.4698438380330405e-06, 'epoch': 1.88} +2025-02-05 23:19:59 - ERROR - stderr - 63%|██████▎ | 14053/22434 [13:12:19<5:47:41, 2.49s/it] +2025-02-05 23:20:02 - ERROR - stderr - 63%|██████▎ | 14054/22434 [13:12:22<5:53:19, 2.53s/it] +2025-02-05 23:20:02 - ERROR - stderr - +2025-02-05 23:20:02 - ERROR - stderr - +2025-02-05 23:20:02 - INFO - stdout - {'loss': 0.6924, 'grad_norm': 1.1845945119857788, 'learning_rate': 6.468493080323743e-06, 'epoch': 1.88} +2025-02-05 23:20:02 - ERROR - stderr - 63%|██████▎ | 14054/22434 [13:12:22<5:53:19, 2.53s/it] +2025-02-05 23:20:05 - ERROR - stderr - 63%|██████▎ | 14055/22434 [13:12:24<5:52:42, 2.53s/it] +2025-02-05 23:20:05 - ERROR - stderr - +2025-02-05 23:20:05 - ERROR - stderr - +2025-02-05 23:20:05 - INFO - stdout - {'loss': 0.6084, 'grad_norm': 1.221358060836792, 'learning_rate': 6.4671423962253255e-06, 'epoch': 1.88} +2025-02-05 23:20:05 - ERROR - stderr - 63%|██████▎ | 14055/22434 [13:12:24<5:52:42, 2.53s/it] +2025-02-05 23:20:07 - ERROR - stderr - 63%|██████▎ | 14056/22434 [13:12:27<5:52:35, 2.53s/it] +2025-02-05 23:20:07 - ERROR - stderr - +2025-02-05 23:20:07 - ERROR - stderr - +2025-02-05 23:20:07 - INFO - stdout - {'loss': 0.6483, 'grad_norm': 1.319655418395996, 'learning_rate': 6.465791785765946e-06, 'epoch': 1.88} +2025-02-05 23:20:07 - ERROR - stderr - 63%|██████▎ | 14056/22434 [13:12:27<5:52:35, 2.53s/it] +2025-02-05 23:20:10 - ERROR - stderr - 63%|██████▎ | 14057/22434 [13:12:30<6:05:47, 2.62s/it] +2025-02-05 23:20:10 - ERROR - stderr - +2025-02-05 23:20:10 - ERROR - stderr - +2025-02-05 23:20:10 - INFO - stdout - {'loss': 0.6751, 'grad_norm': 1.1982049942016602, 'learning_rate': 6.464441248973756e-06, 'epoch': 1.88} +2025-02-05 23:20:10 - ERROR - 
stderr - 63%|██████▎ | 14057/22434 [13:12:30<6:05:47, 2.62s/it] +2025-02-05 23:20:12 - ERROR - stderr - 63%|██████▎ | 14058/22434 [13:12:32<5:59:40, 2.58s/it] +2025-02-05 23:20:12 - ERROR - stderr - +2025-02-05 23:20:12 - ERROR - stderr - +2025-02-05 23:20:12 - INFO - stdout - {'loss': 0.7486, 'grad_norm': 1.3233323097229004, 'learning_rate': 6.4630907858769e-06, 'epoch': 1.88} +2025-02-05 23:20:12 - ERROR - stderr - 63%|██████▎ | 14058/22434 [13:12:32<5:59:40, 2.58s/it] +2025-02-05 23:20:15 - ERROR - stderr - 63%|██████▎ | 14059/22434 [13:12:35<5:59:36, 2.58s/it] +2025-02-05 23:20:15 - ERROR - stderr - +2025-02-05 23:20:15 - ERROR - stderr - +2025-02-05 23:20:15 - INFO - stdout - {'loss': 0.5452, 'grad_norm': 1.2489064931869507, 'learning_rate': 6.4617403965035356e-06, 'epoch': 1.88} +2025-02-05 23:20:15 - ERROR - stderr - 63%|██████▎ | 14059/22434 [13:12:35<5:59:36, 2.58s/it] +2025-02-05 23:20:17 - ERROR - stderr - 63%|██████▎ | 14060/22434 [13:12:37<5:56:16, 2.55s/it] +2025-02-05 23:20:18 - ERROR - stderr - +2025-02-05 23:20:18 - ERROR - stderr - +2025-02-05 23:20:18 - INFO - stdout - {'loss': 0.6551, 'grad_norm': 1.371580958366394, 'learning_rate': 6.460390080881807e-06, 'epoch': 1.88} +2025-02-05 23:20:18 - ERROR - stderr - 63%|██████▎ | 14060/22434 [13:12:37<5:56:16, 2.55s/it] +2025-02-05 23:20:20 - ERROR - stderr - 63%|██████▎ | 14061/22434 [13:12:40<5:50:50, 2.51s/it] +2025-02-05 23:20:20 - ERROR - stderr - +2025-02-05 23:20:20 - ERROR - stderr - +2025-02-05 23:20:20 - INFO - stdout - {'loss': 0.6407, 'grad_norm': 1.2519102096557617, 'learning_rate': 6.459039839039858e-06, 'epoch': 1.88} +2025-02-05 23:20:20 - ERROR - stderr - 63%|██████▎ | 14061/22434 [13:12:40<5:50:50, 2.51s/it] +2025-02-05 23:20:22 - ERROR - stderr - 63%|██████▎ | 14062/22434 [13:12:42<5:49:16, 2.50s/it] +2025-02-05 23:20:22 - ERROR - stderr - +2025-02-05 23:20:22 - ERROR - stderr - +2025-02-05 23:20:22 - INFO - stdout - {'loss': 0.6675, 'grad_norm': 1.181142807006836, 'learning_rate': 
6.457689671005838e-06, 'epoch': 1.88} +2025-02-05 23:20:22 - ERROR - stderr - 63%|██████▎ | 14062/22434 [13:12:42<5:49:16, 2.50s/it] +2025-02-05 23:20:25 - ERROR - stderr - 63%|██████▎ | 14063/22434 [13:12:45<5:49:31, 2.51s/it] +2025-02-05 23:20:25 - ERROR - stderr - +2025-02-05 23:20:25 - ERROR - stderr - +2025-02-05 23:20:25 - INFO - stdout - {'loss': 0.7706, 'grad_norm': 1.3463757038116455, 'learning_rate': 6.456339576807883e-06, 'epoch': 1.88} +2025-02-05 23:20:25 - ERROR - stderr - 63%|██████▎ | 14063/22434 [13:12:45<5:49:31, 2.51s/it] +2025-02-05 23:20:27 - ERROR - stderr - 63%|██████▎ | 14064/22434 [13:12:47<5:48:44, 2.50s/it] +2025-02-05 23:20:27 - ERROR - stderr - +2025-02-05 23:20:27 - ERROR - stderr - +2025-02-05 23:20:27 - INFO - stdout - {'loss': 0.6521, 'grad_norm': 1.4333125352859497, 'learning_rate': 6.454989556474143e-06, 'epoch': 1.88} +2025-02-05 23:20:27 - ERROR - stderr - 63%|██████▎ | 14064/22434 [13:12:47<5:48:44, 2.50s/it] +2025-02-05 23:20:30 - ERROR - stderr - 63%|██████▎ | 14065/22434 [13:12:50<5:49:28, 2.51s/it] +2025-02-05 23:20:30 - ERROR - stderr - +2025-02-05 23:20:30 - ERROR - stderr - +2025-02-05 23:20:30 - INFO - stdout - {'loss': 0.6348, 'grad_norm': 1.3870477676391602, 'learning_rate': 6.453639610032751e-06, 'epoch': 1.88} +2025-02-05 23:20:30 - ERROR - stderr - 63%|██████▎ | 14065/22434 [13:12:50<5:49:28, 2.51s/it] +2025-02-05 23:20:33 - ERROR - stderr - 63%|██████▎ | 14066/22434 [13:12:52<5:54:47, 2.54s/it] +2025-02-05 23:20:33 - ERROR - stderr - +2025-02-05 23:20:33 - ERROR - stderr - +2025-02-05 23:20:33 - INFO - stdout - {'loss': 0.6294, 'grad_norm': 1.2330268621444702, 'learning_rate': 6.452289737511846e-06, 'epoch': 1.88} +2025-02-05 23:20:33 - ERROR - stderr - 63%|██████▎ | 14066/22434 [13:12:52<5:54:47, 2.54s/it] +2025-02-05 23:20:35 - ERROR - stderr - 63%|██████▎ | 14067/22434 [13:12:55<5:54:07, 2.54s/it] +2025-02-05 23:20:35 - ERROR - stderr - +2025-02-05 23:20:35 - ERROR - stderr - +2025-02-05 23:20:35 - INFO - 
stdout - {'loss': 0.7011, 'grad_norm': 1.3861037492752075, 'learning_rate': 6.450939938939571e-06, 'epoch': 1.88} +2025-02-05 23:20:35 - ERROR - stderr - 63%|██████▎ | 14067/22434 [13:12:55<5:54:07, 2.54s/it] +2025-02-05 23:20:38 - ERROR - stderr - 63%|██████▎ | 14068/22434 [13:12:57<5:52:13, 2.53s/it] +2025-02-05 23:20:38 - ERROR - stderr - +2025-02-05 23:20:38 - ERROR - stderr - +2025-02-05 23:20:38 - INFO - stdout - {'loss': 0.6889, 'grad_norm': 1.2697967290878296, 'learning_rate': 6.449590214344057e-06, 'epoch': 1.88} +2025-02-05 23:20:38 - ERROR - stderr - 63%|██████▎ | 14068/22434 [13:12:57<5:52:13, 2.53s/it] +2025-02-05 23:20:40 - ERROR - stderr - 63%|██████▎ | 14069/22434 [13:13:00<5:49:47, 2.51s/it] +2025-02-05 23:20:40 - ERROR - stderr - +2025-02-05 23:20:40 - ERROR - stderr - +2025-02-05 23:20:40 - INFO - stdout - {'loss': 0.6688, 'grad_norm': 1.3931254148483276, 'learning_rate': 6.448240563753434e-06, 'epoch': 1.88} +2025-02-05 23:20:40 - ERROR - stderr - 63%|██████▎ | 14069/22434 [13:13:00<5:49:47, 2.51s/it] +2025-02-05 23:20:43 - ERROR - stderr - 63%|██████▎ | 14070/22434 [13:13:02<5:50:46, 2.52s/it] +2025-02-05 23:20:43 - ERROR - stderr - +2025-02-05 23:20:43 - ERROR - stderr - +2025-02-05 23:20:43 - INFO - stdout - {'loss': 0.6595, 'grad_norm': 1.3026654720306396, 'learning_rate': 6.446890987195842e-06, 'epoch': 1.88} +2025-02-05 23:20:43 - ERROR - stderr - 63%|██████▎ | 14070/22434 [13:13:02<5:50:46, 2.52s/it] +2025-02-05 23:20:45 - ERROR - stderr - 63%|██████▎ | 14071/22434 [13:13:05<5:46:52, 2.49s/it] +2025-02-05 23:20:45 - ERROR - stderr - +2025-02-05 23:20:45 - ERROR - stderr - +2025-02-05 23:20:45 - INFO - stdout - {'loss': 0.7557, 'grad_norm': 1.330972671508789, 'learning_rate': 6.445541484699402e-06, 'epoch': 1.88} +2025-02-05 23:20:45 - ERROR - stderr - 63%|██████▎ | 14071/22434 [13:13:05<5:46:52, 2.49s/it] +2025-02-05 23:20:48 - ERROR - stderr - 63%|██████▎ | 14072/22434 [13:13:07<5:49:16, 2.51s/it] +2025-02-05 23:20:48 - ERROR - stderr - 
+2025-02-05 23:20:48 - ERROR - stderr - +2025-02-05 23:20:48 - INFO - stdout - {'loss': 0.8084, 'grad_norm': 1.401557207107544, 'learning_rate': 6.444192056292251e-06, 'epoch': 1.88} +2025-02-05 23:20:48 - ERROR - stderr - 63%|██████▎ | 14072/22434 [13:13:07<5:49:16, 2.51s/it] +2025-02-05 23:20:50 - ERROR - stderr - 63%|██████▎ | 14073/22434 [13:13:10<5:47:34, 2.49s/it] +2025-02-05 23:20:50 - ERROR - stderr - +2025-02-05 23:20:50 - ERROR - stderr - +2025-02-05 23:20:50 - INFO - stdout - {'loss': 0.6921, 'grad_norm': 1.3080319166183472, 'learning_rate': 6.442842702002516e-06, 'epoch': 1.88} +2025-02-05 23:20:50 - ERROR - stderr - 63%|██████▎ | 14073/22434 [13:13:10<5:47:34, 2.49s/it] +2025-02-05 23:20:52 - ERROR - stderr - 63%|██████▎ | 14074/22434 [13:13:12<5:44:49, 2.47s/it] +2025-02-05 23:20:52 - ERROR - stderr - +2025-02-05 23:20:52 - ERROR - stderr - +2025-02-05 23:20:52 - INFO - stdout - {'loss': 0.6687, 'grad_norm': 1.3351554870605469, 'learning_rate': 6.441493421858318e-06, 'epoch': 1.88} +2025-02-05 23:20:52 - ERROR - stderr - 63%|██████▎ | 14074/22434 [13:13:12<5:44:49, 2.47s/it] +2025-02-05 23:20:55 - ERROR - stderr - 63%|██████▎ | 14075/22434 [13:13:15<5:46:31, 2.49s/it] +2025-02-05 23:20:55 - ERROR - stderr - +2025-02-05 23:20:55 - ERROR - stderr - +2025-02-05 23:20:55 - INFO - stdout - {'loss': 0.7118, 'grad_norm': 1.3229854106903076, 'learning_rate': 6.440144215887788e-06, 'epoch': 1.88} +2025-02-05 23:20:55 - ERROR - stderr - 63%|██████▎ | 14075/22434 [13:13:15<5:46:31, 2.49s/it] +2025-02-05 23:20:57 - ERROR - stderr - 63%|██████▎ | 14076/22434 [13:13:17<5:47:01, 2.49s/it] +2025-02-05 23:20:57 - ERROR - stderr - +2025-02-05 23:20:57 - ERROR - stderr - +2025-02-05 23:20:57 - INFO - stdout - {'loss': 0.7045, 'grad_norm': 1.4023959636688232, 'learning_rate': 6.438795084119045e-06, 'epoch': 1.88} +2025-02-05 23:20:57 - ERROR - stderr - 63%|██████▎ | 14076/22434 [13:13:17<5:47:01, 2.49s/it] +2025-02-05 23:21:00 - ERROR - stderr - 63%|██████▎ | 14077/22434 
[13:13:20<5:44:31, 2.47s/it] +2025-02-05 23:21:00 - ERROR - stderr - +2025-02-05 23:21:00 - ERROR - stderr - +2025-02-05 23:21:00 - INFO - stdout - {'loss': 0.6875, 'grad_norm': 1.1306209564208984, 'learning_rate': 6.437446026580208e-06, 'epoch': 1.88} +2025-02-05 23:21:00 - ERROR - stderr - 63%|██████▎ | 14077/22434 [13:13:20<5:44:31, 2.47s/it] +2025-02-05 23:21:02 - ERROR - stderr - 63%|██████▎ | 14078/22434 [13:13:22<5:45:34, 2.48s/it] +2025-02-05 23:21:02 - ERROR - stderr - +2025-02-05 23:21:02 - ERROR - stderr - +2025-02-05 23:21:02 - INFO - stdout - {'loss': 0.6703, 'grad_norm': 1.146634817123413, 'learning_rate': 6.4360970432993995e-06, 'epoch': 1.88} +2025-02-05 23:21:02 - ERROR - stderr - 63%|██████▎ | 14078/22434 [13:13:22<5:45:34, 2.48s/it] +2025-02-05 23:21:05 - ERROR - stderr - 63%|██████▎ | 14079/22434 [13:13:25<5:49:18, 2.51s/it] +2025-02-05 23:21:05 - ERROR - stderr - +2025-02-05 23:21:05 - ERROR - stderr - +2025-02-05 23:21:05 - INFO - stdout - {'loss': 0.6109, 'grad_norm': 1.1646977663040161, 'learning_rate': 6.434748134304737e-06, 'epoch': 1.88} +2025-02-05 23:21:05 - ERROR - stderr - 63%|██████▎ | 14079/22434 [13:13:25<5:49:18, 2.51s/it] +2025-02-05 23:21:07 - ERROR - stderr - 63%|██████▎ | 14080/22434 [13:13:27<5:49:27, 2.51s/it] +2025-02-05 23:21:07 - ERROR - stderr - +2025-02-05 23:21:07 - ERROR - stderr - +2025-02-05 23:21:07 - INFO - stdout - {'loss': 0.632, 'grad_norm': 1.2927672863006592, 'learning_rate': 6.433399299624342e-06, 'epoch': 1.88} +2025-02-05 23:21:07 - ERROR - stderr - 63%|██████▎ | 14080/22434 [13:13:27<5:49:27, 2.51s/it] +2025-02-05 23:21:10 - ERROR - stderr - 63%|██████▎ | 14081/22434 [13:13:30<5:51:54, 2.53s/it] +2025-02-05 23:21:10 - ERROR - stderr - +2025-02-05 23:21:10 - ERROR - stderr - +2025-02-05 23:21:10 - INFO - stdout - {'loss': 0.6169, 'grad_norm': 1.1203598976135254, 'learning_rate': 6.432050539286325e-06, 'epoch': 1.88} +2025-02-05 23:21:10 - ERROR - stderr - 63%|██████▎ | 14081/22434 [13:13:30<5:51:54, 
2.53s/it] +2025-02-05 23:21:13 - ERROR - stderr - 63%|██████▎ | 14082/22434 [13:13:32<5:51:52, 2.53s/it] +2025-02-05 23:21:13 - ERROR - stderr - +2025-02-05 23:21:13 - ERROR - stderr - +2025-02-05 23:21:13 - INFO - stdout - {'loss': 0.7569, 'grad_norm': 1.3726783990859985, 'learning_rate': 6.430701853318797e-06, 'epoch': 1.88} +2025-02-05 23:21:13 - ERROR - stderr - 63%|██████▎ | 14082/22434 [13:13:32<5:51:52, 2.53s/it] +2025-02-05 23:21:15 - ERROR - stderr - 63%|██████▎ | 14083/22434 [13:13:35<5:52:05, 2.53s/it] +2025-02-05 23:21:15 - ERROR - stderr - +2025-02-05 23:21:15 - ERROR - stderr - +2025-02-05 23:21:15 - INFO - stdout - {'loss': 0.7143, 'grad_norm': 1.2672418355941772, 'learning_rate': 6.429353241749878e-06, 'epoch': 1.88} +2025-02-05 23:21:15 - ERROR - stderr - 63%|██████▎ | 14083/22434 [13:13:35<5:52:05, 2.53s/it] +2025-02-05 23:21:18 - ERROR - stderr - 63%|██████▎ | 14084/22434 [13:13:37<5:50:10, 2.52s/it] +2025-02-05 23:21:18 - ERROR - stderr - +2025-02-05 23:21:18 - ERROR - stderr - +2025-02-05 23:21:18 - INFO - stdout - {'loss': 0.6499, 'grad_norm': 1.170158863067627, 'learning_rate': 6.428004704607671e-06, 'epoch': 1.88} +2025-02-05 23:21:18 - ERROR - stderr - 63%|██████▎ | 14084/22434 [13:13:37<5:50:10, 2.52s/it] +2025-02-05 23:21:20 - ERROR - stderr - 63%|██████▎ | 14085/22434 [13:13:40<5:48:43, 2.51s/it] +2025-02-05 23:21:20 - ERROR - stderr - +2025-02-05 23:21:20 - ERROR - stderr - +2025-02-05 23:21:20 - INFO - stdout - {'loss': 0.6919, 'grad_norm': 1.2926275730133057, 'learning_rate': 6.426656241920286e-06, 'epoch': 1.88} +2025-02-05 23:21:20 - ERROR - stderr - 63%|██████▎ | 14085/22434 [13:13:40<5:48:43, 2.51s/it] +2025-02-05 23:21:22 - ERROR - stderr - 63%|██████▎ | 14086/22434 [13:13:42<5:45:53, 2.49s/it] +2025-02-05 23:21:23 - ERROR - stderr - +2025-02-05 23:21:23 - ERROR - stderr - +2025-02-05 23:21:23 - INFO - stdout - {'loss': 0.6639, 'grad_norm': 1.2739237546920776, 'learning_rate': 6.425307853715837e-06, 'epoch': 1.88} +2025-02-05 
23:21:23 - ERROR - stderr - 63%|██████▎ | 14086/22434 [13:13:42<5:45:53, 2.49s/it] +2025-02-05 23:21:25 - ERROR - stderr - 63%|██████▎ | 14087/22434 [13:13:45<5:45:22, 2.48s/it] +2025-02-05 23:21:25 - ERROR - stderr - +2025-02-05 23:21:25 - ERROR - stderr - +2025-02-05 23:21:25 - INFO - stdout - {'loss': 0.6768, 'grad_norm': 1.224241852760315, 'learning_rate': 6.423959540022422e-06, 'epoch': 1.88} +2025-02-05 23:21:25 - ERROR - stderr - 63%|██████▎ | 14087/22434 [13:13:45<5:45:22, 2.48s/it] +2025-02-05 23:21:28 - ERROR - stderr - 63%|██████▎ | 14088/22434 [13:13:47<5:53:01, 2.54s/it] +2025-02-05 23:21:28 - ERROR - stderr - +2025-02-05 23:21:28 - ERROR - stderr - +2025-02-05 23:21:28 - INFO - stdout - {'loss': 0.6887, 'grad_norm': 1.470683217048645, 'learning_rate': 6.422611300868151e-06, 'epoch': 1.88} +2025-02-05 23:21:28 - ERROR - stderr - 63%|██████▎ | 14088/22434 [13:13:47<5:53:01, 2.54s/it] +2025-02-05 23:21:30 - ERROR - stderr - 63%|██████▎ | 14089/22434 [13:13:50<5:50:48, 2.52s/it] +2025-02-05 23:21:30 - ERROR - stderr - +2025-02-05 23:21:30 - ERROR - stderr - +2025-02-05 23:21:30 - INFO - stdout - {'loss': 0.6913, 'grad_norm': 1.2674822807312012, 'learning_rate': 6.421263136281124e-06, 'epoch': 1.88} +2025-02-05 23:21:30 - ERROR - stderr - 63%|██████▎ | 14089/22434 [13:13:50<5:50:48, 2.52s/it] +2025-02-05 23:21:33 - ERROR - stderr - 63%|██████▎ | 14090/22434 [13:13:52<5:47:56, 2.50s/it] +2025-02-05 23:21:33 - ERROR - stderr - +2025-02-05 23:21:33 - ERROR - stderr - +2025-02-05 23:21:33 - INFO - stdout - {'loss': 0.6076, 'grad_norm': 1.124819278717041, 'learning_rate': 6.41991504628944e-06, 'epoch': 1.88} +2025-02-05 23:21:33 - ERROR - stderr - 63%|██████▎ | 14090/22434 [13:13:52<5:47:56, 2.50s/it] +2025-02-05 23:21:35 - ERROR - stderr - 63%|██████▎ | 14091/22434 [13:13:55<5:47:13, 2.50s/it] +2025-02-05 23:21:35 - ERROR - stderr - +2025-02-05 23:21:35 - ERROR - stderr - +2025-02-05 23:21:35 - INFO - stdout - {'loss': 0.5521, 'grad_norm': 1.148465633392334, 
'learning_rate': 6.418567030921201e-06, 'epoch': 1.88} +2025-02-05 23:21:35 - ERROR - stderr - 63%|██████▎ | 14091/22434 [13:13:55<5:47:13, 2.50s/it] +2025-02-05 23:21:37 - ERROR - stderr - 63%|██████▎ | 14092/22434 [13:13:57<5:44:46, 2.48s/it] +2025-02-05 23:21:38 - ERROR - stderr - +2025-02-05 23:21:38 - ERROR - stderr - +2025-02-05 23:21:38 - INFO - stdout - {'loss': 0.7486, 'grad_norm': 1.265394687652588, 'learning_rate': 6.417219090204508e-06, 'epoch': 1.88} +2025-02-05 23:21:38 - ERROR - stderr - 63%|██████▎ | 14092/22434 [13:13:57<5:44:46, 2.48s/it] +2025-02-05 23:21:40 - ERROR - stderr - 63%|██████▎ | 14093/22434 [13:14:00<5:44:51, 2.48s/it] +2025-02-05 23:21:40 - ERROR - stderr - +2025-02-05 23:21:40 - ERROR - stderr - +2025-02-05 23:21:40 - INFO - stdout - {'loss': 0.6691, 'grad_norm': 1.236352562904358, 'learning_rate': 6.415871224167451e-06, 'epoch': 1.88} +2025-02-05 23:21:40 - ERROR - stderr - 63%|██████▎ | 14093/22434 [13:14:00<5:44:51, 2.48s/it] +2025-02-05 23:21:42 - ERROR - stderr - 63%|██████▎ | 14094/22434 [13:14:02<5:43:57, 2.47s/it] +2025-02-05 23:21:42 - ERROR - stderr - +2025-02-05 23:21:42 - ERROR - stderr - +2025-02-05 23:21:42 - INFO - stdout - {'loss': 0.7224, 'grad_norm': 1.2643847465515137, 'learning_rate': 6.414523432838134e-06, 'epoch': 1.88} +2025-02-05 23:21:42 - ERROR - stderr - 63%|██████▎ | 14094/22434 [13:14:02<5:43:57, 2.47s/it] +2025-02-05 23:21:45 - ERROR - stderr - 63%|██████▎ | 14095/22434 [13:14:05<5:46:25, 2.49s/it] +2025-02-05 23:21:45 - ERROR - stderr - +2025-02-05 23:21:45 - ERROR - stderr - +2025-02-05 23:21:45 - INFO - stdout - {'loss': 0.6419, 'grad_norm': 1.2067506313323975, 'learning_rate': 6.4131757162446395e-06, 'epoch': 1.88} +2025-02-05 23:21:45 - ERROR - stderr - 63%|██████▎ | 14095/22434 [13:14:05<5:46:25, 2.49s/it] +2025-02-05 23:21:47 - ERROR - stderr - 63%|██████▎ | 14096/22434 [13:14:07<5:46:23, 2.49s/it] +2025-02-05 23:21:48 - ERROR - stderr - +2025-02-05 23:21:48 - ERROR - stderr - +2025-02-05 
23:21:48 - INFO - stdout - {'loss': 0.6542, 'grad_norm': 1.1536996364593506, 'learning_rate': 6.41182807441507e-06, 'epoch': 1.88} +2025-02-05 23:21:48 - ERROR - stderr - 63%|██████▎ | 14096/22434 [13:14:07<5:46:23, 2.49s/it] +2025-02-05 23:21:50 - ERROR - stderr - 63%|██████▎ | 14097/22434 [13:14:10<5:46:18, 2.49s/it] +2025-02-05 23:21:50 - ERROR - stderr - +2025-02-05 23:21:50 - ERROR - stderr - +2025-02-05 23:21:50 - INFO - stdout - {'loss': 0.6305, 'grad_norm': 1.090825080871582, 'learning_rate': 6.410480507377507e-06, 'epoch': 1.89} +2025-02-05 23:21:50 - ERROR - stderr - 63%|██████▎ | 14097/22434 [13:14:10<5:46:18, 2.49s/it] +2025-02-05 23:21:52 - ERROR - stderr - 63%|██████▎ | 14098/22434 [13:14:12<5:47:25, 2.50s/it] +2025-02-05 23:21:53 - ERROR - stderr - +2025-02-05 23:21:53 - ERROR - stderr - +2025-02-05 23:21:53 - INFO - stdout - {'loss': 0.6513, 'grad_norm': 1.358974575996399, 'learning_rate': 6.409133015160042e-06, 'epoch': 1.89} +2025-02-05 23:21:53 - ERROR - stderr - 63%|██████▎ | 14098/22434 [13:14:12<5:47:25, 2.50s/it] +2025-02-05 23:21:55 - ERROR - stderr - 63%|██████▎ | 14099/22434 [13:14:15<5:54:37, 2.55s/it] +2025-02-05 23:21:55 - ERROR - stderr - +2025-02-05 23:21:55 - ERROR - stderr - +2025-02-05 23:21:55 - INFO - stdout - {'loss': 0.6329, 'grad_norm': 1.3241006135940552, 'learning_rate': 6.407785597790768e-06, 'epoch': 1.89} +2025-02-05 23:21:55 - ERROR - stderr - 63%|██████▎ | 14099/22434 [13:14:15<5:54:37, 2.55s/it] +2025-02-05 23:21:58 - ERROR - stderr - 63%|██████▎ | 14100/22434 [13:14:17<5:51:35, 2.53s/it] +2025-02-05 23:21:58 - ERROR - stderr - +2025-02-05 23:21:58 - ERROR - stderr - +2025-02-05 23:21:58 - INFO - stdout - {'loss': 0.6051, 'grad_norm': 1.1992534399032593, 'learning_rate': 6.406438255297764e-06, 'epoch': 1.89} +2025-02-05 23:21:58 - ERROR - stderr - 63%|██████▎ | 14100/22434 [13:14:17<5:51:35, 2.53s/it] +2025-02-05 23:22:00 - ERROR - stderr - 63%|██████▎ | 14101/22434 [13:14:20<5:52:06, 2.54s/it] +2025-02-05 23:22:00 - 
ERROR - stderr - +2025-02-05 23:22:00 - ERROR - stderr - +2025-02-05 23:22:00 - INFO - stdout - {'loss': 0.6907, 'grad_norm': 1.2454341650009155, 'learning_rate': 6.405090987709113e-06, 'epoch': 1.89} +2025-02-05 23:22:00 - ERROR - stderr - 63%|██████▎ | 14101/22434 [13:14:20<5:52:06, 2.54s/it] +2025-02-05 23:22:03 - ERROR - stderr - 63%|██████▎ | 14102/22434 [13:14:22<5:47:53, 2.51s/it] +2025-02-05 23:22:03 - ERROR - stderr - +2025-02-05 23:22:03 - ERROR - stderr - +2025-02-05 23:22:03 - INFO - stdout - {'loss': 0.7154, 'grad_norm': 1.2980799674987793, 'learning_rate': 6.403743795052905e-06, 'epoch': 1.89} +2025-02-05 23:22:03 - ERROR - stderr - 63%|██████▎ | 14102/22434 [13:14:22<5:47:53, 2.51s/it] +2025-02-05 23:22:05 - ERROR - stderr - 63%|██████▎ | 14103/22434 [13:14:25<5:48:25, 2.51s/it] +2025-02-05 23:22:05 - ERROR - stderr - +2025-02-05 23:22:05 - ERROR - stderr - +2025-02-05 23:22:05 - INFO - stdout - {'loss': 0.7634, 'grad_norm': 1.359434723854065, 'learning_rate': 6.402396677357212e-06, 'epoch': 1.89} +2025-02-05 23:22:05 - ERROR - stderr - 63%|██████▎ | 14103/22434 [13:14:25<5:48:25, 2.51s/it] +2025-02-05 23:22:08 - ERROR - stderr - 63%|██████▎ | 14104/22434 [13:14:27<5:48:48, 2.51s/it] +2025-02-05 23:22:08 - ERROR - stderr - +2025-02-05 23:22:08 - ERROR - stderr - +2025-02-05 23:22:08 - INFO - stdout - {'loss': 0.6445, 'grad_norm': 1.2489616870880127, 'learning_rate': 6.401049634650119e-06, 'epoch': 1.89} +2025-02-05 23:22:08 - ERROR - stderr - 63%|██████▎ | 14104/22434 [13:14:27<5:48:48, 2.51s/it] +2025-02-05 23:22:11 - ERROR - stderr - 63%|██████▎ | 14105/22434 [13:14:30<6:03:31, 2.62s/it] +2025-02-05 23:22:11 - ERROR - stderr - +2025-02-05 23:22:11 - ERROR - stderr - +2025-02-05 23:22:11 - INFO - stdout - {'loss': 0.6227, 'grad_norm': 1.3029799461364746, 'learning_rate': 6.399702666959705e-06, 'epoch': 1.89} +2025-02-05 23:22:11 - ERROR - stderr - 63%|██████▎ | 14105/22434 [13:14:30<6:03:31, 2.62s/it] +2025-02-05 23:22:13 - ERROR - stderr - 
63%|██████▎ | 14106/22434 [13:14:33<5:55:08, 2.56s/it] +2025-02-05 23:22:13 - ERROR - stderr - +2025-02-05 23:22:13 - ERROR - stderr - +2025-02-05 23:22:13 - INFO - stdout - {'loss': 0.6541, 'grad_norm': 1.270702838897705, 'learning_rate': 6.39835577431404e-06, 'epoch': 1.89} +2025-02-05 23:22:13 - ERROR - stderr - 63%|██████▎ | 14106/22434 [13:14:33<5:55:08, 2.56s/it] +2025-02-05 23:22:15 - ERROR - stderr - 63%|██████▎ | 14107/22434 [13:14:35<5:55:17, 2.56s/it] +2025-02-05 23:22:16 - ERROR - stderr - +2025-02-05 23:22:16 - ERROR - stderr - +2025-02-05 23:22:16 - INFO - stdout - {'loss': 0.6793, 'grad_norm': 1.2219544649124146, 'learning_rate': 6.397008956741206e-06, 'epoch': 1.89} +2025-02-05 23:22:16 - ERROR - stderr - 63%|██████▎ | 14107/22434 [13:14:35<5:55:17, 2.56s/it] +2025-02-05 23:22:18 - ERROR - stderr - 63%|██████▎ | 14108/22434 [13:14:38<5:51:24, 2.53s/it] +2025-02-05 23:22:18 - ERROR - stderr - +2025-02-05 23:22:18 - ERROR - stderr - +2025-02-05 23:22:18 - INFO - stdout - {'loss': 0.6904, 'grad_norm': 1.2158721685409546, 'learning_rate': 6.395662214269269e-06, 'epoch': 1.89} +2025-02-05 23:22:18 - ERROR - stderr - 63%|██████▎ | 14108/22434 [13:14:38<5:51:24, 2.53s/it] +2025-02-05 23:22:20 - ERROR - stderr - 63%|██████▎ | 14109/22434 [13:14:40<5:51:00, 2.53s/it] +2025-02-05 23:22:21 - ERROR - stderr - +2025-02-05 23:22:21 - ERROR - stderr - +2025-02-05 23:22:21 - INFO - stdout - {'loss': 0.716, 'grad_norm': 1.2379530668258667, 'learning_rate': 6.394315546926309e-06, 'epoch': 1.89} +2025-02-05 23:22:21 - ERROR - stderr - 63%|██████▎ | 14109/22434 [13:14:40<5:51:00, 2.53s/it] +2025-02-05 23:22:23 - ERROR - stderr - 63%|██████▎ | 14110/22434 [13:14:43<5:48:37, 2.51s/it] +2025-02-05 23:22:23 - ERROR - stderr - +2025-02-05 23:22:23 - ERROR - stderr - +2025-02-05 23:22:23 - INFO - stdout - {'loss': 0.6251, 'grad_norm': 1.1447101831436157, 'learning_rate': 6.3929689547403875e-06, 'epoch': 1.89} +2025-02-05 23:22:23 - ERROR - stderr - 63%|██████▎ | 14110/22434 
[13:14:43<5:48:37, 2.51s/it] +2025-02-05 23:22:25 - ERROR - stderr - 63%|██████▎ | 14111/22434 [13:14:45<5:47:35, 2.51s/it] +2025-02-05 23:22:25 - ERROR - stderr - +2025-02-05 23:22:25 - ERROR - stderr - +2025-02-05 23:22:25 - INFO - stdout - {'loss': 0.6709, 'grad_norm': 1.2050886154174805, 'learning_rate': 6.391622437739575e-06, 'epoch': 1.89} +2025-02-05 23:22:25 - ERROR - stderr - 63%|██████▎ | 14111/22434 [13:14:45<5:47:35, 2.51s/it] +2025-02-05 23:22:28 - ERROR - stderr - 63%|██████▎ | 14112/22434 [13:14:48<5:46:38, 2.50s/it] +2025-02-05 23:22:28 - ERROR - stderr - +2025-02-05 23:22:28 - ERROR - stderr - +2025-02-05 23:22:28 - INFO - stdout - {'loss': 0.6633, 'grad_norm': 1.1737996339797974, 'learning_rate': 6.390275995951945e-06, 'epoch': 1.89} +2025-02-05 23:22:28 - ERROR - stderr - 63%|██████▎ | 14112/22434 [13:14:48<5:46:38, 2.50s/it] +2025-02-05 23:22:30 - ERROR - stderr - 63%|██████▎ | 14113/22434 [13:14:50<5:44:47, 2.49s/it] +2025-02-05 23:22:30 - ERROR - stderr - +2025-02-05 23:22:30 - ERROR - stderr - +2025-02-05 23:22:30 - INFO - stdout - {'loss': 0.719, 'grad_norm': 1.2596298456192017, 'learning_rate': 6.3889296294055566e-06, 'epoch': 1.89} +2025-02-05 23:22:30 - ERROR - stderr - 63%|██████▎ | 14113/22434 [13:14:50<5:44:47, 2.49s/it] +2025-02-05 23:22:33 - ERROR - stderr - 63%|██████▎ | 14114/22434 [13:14:53<5:41:32, 2.46s/it] +2025-02-05 23:22:33 - ERROR - stderr - +2025-02-05 23:22:33 - ERROR - stderr - +2025-02-05 23:22:33 - INFO - stdout - {'loss': 0.6984, 'grad_norm': 1.2738516330718994, 'learning_rate': 6.387583338128471e-06, 'epoch': 1.89} +2025-02-05 23:22:33 - ERROR - stderr - 63%|██████▎ | 14114/22434 [13:14:53<5:41:32, 2.46s/it] +2025-02-05 23:22:35 - ERROR - stderr - 63%|██████▎ | 14115/22434 [13:14:55<5:43:48, 2.48s/it] +2025-02-05 23:22:35 - ERROR - stderr - +2025-02-05 23:22:35 - ERROR - stderr - +2025-02-05 23:22:35 - INFO - stdout - {'loss': 0.6017, 'grad_norm': 1.142195701599121, 'learning_rate': 6.386237122148758e-06, 'epoch': 
1.89} +2025-02-05 23:22:35 - ERROR - stderr - 63%|██████▎ | 14115/22434 [13:14:55<5:43:48, 2.48s/it] +2025-02-05 23:22:38 - ERROR - stderr - 63%|██████▎ | 14116/22434 [13:14:58<5:43:49, 2.48s/it] +2025-02-05 23:22:38 - ERROR - stderr - +2025-02-05 23:22:38 - ERROR - stderr - +2025-02-05 23:22:38 - INFO - stdout - {'loss': 0.7709, 'grad_norm': 1.3245794773101807, 'learning_rate': 6.3848909814944706e-06, 'epoch': 1.89} +2025-02-05 23:22:38 - ERROR - stderr - 63%|██████▎ | 14116/22434 [13:14:58<5:43:49, 2.48s/it] +2025-02-05 23:22:40 - ERROR - stderr - 63%|██████▎ | 14117/22434 [13:15:00<5:45:00, 2.49s/it] +2025-02-05 23:22:40 - ERROR - stderr - +2025-02-05 23:22:40 - ERROR - stderr - +2025-02-05 23:22:40 - INFO - stdout - {'loss': 0.6937, 'grad_norm': 1.2002161741256714, 'learning_rate': 6.383544916193674e-06, 'epoch': 1.89} +2025-02-05 23:22:40 - ERROR - stderr - 63%|██████▎ | 14117/22434 [13:15:00<5:45:00, 2.49s/it] +2025-02-05 23:22:43 - ERROR - stderr - 63%|██████▎ | 14118/22434 [13:15:03<5:45:00, 2.49s/it] +2025-02-05 23:22:43 - ERROR - stderr - +2025-02-05 23:22:43 - ERROR - stderr - +2025-02-05 23:22:43 - INFO - stdout - {'loss': 0.6515, 'grad_norm': 1.2082483768463135, 'learning_rate': 6.382198926274424e-06, 'epoch': 1.89} +2025-02-05 23:22:43 - ERROR - stderr - 63%|██████▎ | 14118/22434 [13:15:03<5:45:00, 2.49s/it] +2025-02-05 23:22:45 - ERROR - stderr - 63%|██████▎ | 14119/22434 [13:15:05<5:44:20, 2.48s/it] +2025-02-05 23:22:45 - ERROR - stderr - +2025-02-05 23:22:45 - ERROR - stderr - +2025-02-05 23:22:45 - INFO - stdout - {'loss': 0.6317, 'grad_norm': 1.08533775806427, 'learning_rate': 6.380853011764772e-06, 'epoch': 1.89} +2025-02-05 23:22:45 - ERROR - stderr - 63%|██████▎ | 14119/22434 [13:15:05<5:44:20, 2.48s/it] +2025-02-05 23:22:48 - ERROR - stderr - 63%|██████▎ | 14120/22434 [13:15:08<5:49:12, 2.52s/it] +2025-02-05 23:22:48 - ERROR - stderr - +2025-02-05 23:22:48 - ERROR - stderr - +2025-02-05 23:22:48 - INFO - stdout - {'loss': 0.6863, 'grad_norm': 
1.2559655904769897, 'learning_rate': 6.379507172692778e-06, 'epoch': 1.89} +2025-02-05 23:22:48 - ERROR - stderr - 63%|██████▎ | 14120/22434 [13:15:08<5:49:12, 2.52s/it] +2025-02-05 23:22:50 - ERROR - stderr - 63%|██████▎ | 14121/22434 [13:15:10<5:51:28, 2.54s/it] +2025-02-05 23:22:50 - ERROR - stderr - +2025-02-05 23:22:50 - ERROR - stderr - +2025-02-05 23:22:50 - INFO - stdout - {'loss': 0.6784, 'grad_norm': 1.3622547388076782, 'learning_rate': 6.378161409086494e-06, 'epoch': 1.89} +2025-02-05 23:22:50 - ERROR - stderr - 63%|██████▎ | 14121/22434 [13:15:10<5:51:28, 2.54s/it] +2025-02-05 23:22:53 - ERROR - stderr - 63%|██████▎ | 14122/22434 [13:15:13<5:46:41, 2.50s/it] +2025-02-05 23:22:53 - ERROR - stderr - +2025-02-05 23:22:53 - ERROR - stderr - +2025-02-05 23:22:53 - INFO - stdout - {'loss': 0.7581, 'grad_norm': 1.4511135816574097, 'learning_rate': 6.376815720973966e-06, 'epoch': 1.89} +2025-02-05 23:22:53 - ERROR - stderr - 63%|██████▎ | 14122/22434 [13:15:13<5:46:41, 2.50s/it] +2025-02-05 23:22:55 - ERROR - stderr - 63%|██████▎ | 14123/22434 [13:15:15<5:43:32, 2.48s/it] +2025-02-05 23:22:55 - ERROR - stderr - +2025-02-05 23:22:55 - ERROR - stderr - +2025-02-05 23:22:55 - INFO - stdout - {'loss': 0.7732, 'grad_norm': 1.3332024812698364, 'learning_rate': 6.375470108383249e-06, 'epoch': 1.89} +2025-02-05 23:22:55 - ERROR - stderr - 63%|██████▎ | 14123/22434 [13:15:15<5:43:32, 2.48s/it] +2025-02-05 23:22:58 - ERROR - stderr - 63%|██████▎ | 14124/22434 [13:15:18<5:44:28, 2.49s/it] +2025-02-05 23:22:58 - ERROR - stderr - +2025-02-05 23:22:58 - ERROR - stderr - +2025-02-05 23:22:58 - INFO - stdout - {'loss': 0.6536, 'grad_norm': 1.0717743635177612, 'learning_rate': 6.374124571342387e-06, 'epoch': 1.89} +2025-02-05 23:22:58 - ERROR - stderr - 63%|██████▎ | 14124/22434 [13:15:18<5:44:28, 2.49s/it] +2025-02-05 23:23:00 - ERROR - stderr - 63%|██████▎ | 14125/22434 [13:15:20<5:48:06, 2.51s/it] +2025-02-05 23:23:00 - ERROR - stderr - +2025-02-05 23:23:00 - ERROR - stderr 
- +2025-02-05 23:23:00 - INFO - stdout - {'loss': 0.6881, 'grad_norm': 1.1635172367095947, 'learning_rate': 6.372779109879433e-06, 'epoch': 1.89} +2025-02-05 23:23:00 - ERROR - stderr - 63%|██████▎ | 14125/22434 [13:15:20<5:48:06, 2.51s/it] +2025-02-05 23:23:03 - ERROR - stderr - 63%|██████▎ | 14126/22434 [13:15:23<5:48:08, 2.51s/it] +2025-02-05 23:23:03 - ERROR - stderr - +2025-02-05 23:23:03 - ERROR - stderr - +2025-02-05 23:23:03 - INFO - stdout - {'loss': 0.7624, 'grad_norm': 1.3672279119491577, 'learning_rate': 6.371433724022429e-06, 'epoch': 1.89} +2025-02-05 23:23:03 - ERROR - stderr - 63%|██████▎ | 14126/22434 [13:15:23<5:48:08, 2.51s/it] +2025-02-05 23:23:05 - ERROR - stderr - 63%|██████▎ | 14127/22434 [13:15:25<5:49:30, 2.52s/it] +2025-02-05 23:23:05 - ERROR - stderr - +2025-02-05 23:23:05 - ERROR - stderr - +2025-02-05 23:23:05 - INFO - stdout - {'loss': 0.7014, 'grad_norm': 1.379828929901123, 'learning_rate': 6.3700884137994115e-06, 'epoch': 1.89} +2025-02-05 23:23:05 - ERROR - stderr - 63%|██████▎ | 14127/22434 [13:15:25<5:49:30, 2.52s/it] +2025-02-05 23:23:08 - ERROR - stderr - 63%|██████▎ | 14128/22434 [13:15:28<5:48:10, 2.52s/it] +2025-02-05 23:23:08 - ERROR - stderr - +2025-02-05 23:23:08 - ERROR - stderr - +2025-02-05 23:23:08 - INFO - stdout - {'loss': 0.5865, 'grad_norm': 1.245582938194275, 'learning_rate': 6.36874317923843e-06, 'epoch': 1.89} +2025-02-05 23:23:08 - ERROR - stderr - 63%|██████▎ | 14128/22434 [13:15:28<5:48:10, 2.52s/it] +2025-02-05 23:23:10 - ERROR - stderr - 63%|██████▎ | 14129/22434 [13:15:30<5:48:53, 2.52s/it] +2025-02-05 23:23:11 - ERROR - stderr - +2025-02-05 23:23:11 - ERROR - stderr - +2025-02-05 23:23:11 - INFO - stdout - {'loss': 0.6932, 'grad_norm': 1.250872015953064, 'learning_rate': 6.367398020367522e-06, 'epoch': 1.89} +2025-02-05 23:23:11 - ERROR - stderr - 63%|██████▎ | 14129/22434 [13:15:30<5:48:53, 2.52s/it] +2025-02-05 23:23:13 - ERROR - stderr - 63%|██████▎ | 14130/22434 [13:15:33<5:47:29, 2.51s/it] 
+2025-02-05 23:23:13 - ERROR - stderr - +2025-02-05 23:23:13 - ERROR - stderr - +2025-02-05 23:23:13 - INFO - stdout - {'loss': 0.7166, 'grad_norm': 1.3644089698791504, 'learning_rate': 6.366052937214724e-06, 'epoch': 1.89} +2025-02-05 23:23:13 - ERROR - stderr - 63%|██████▎ | 14130/22434 [13:15:33<5:47:29, 2.51s/it] +2025-02-05 23:23:16 - ERROR - stderr - 63%|██████▎ | 14131/22434 [13:15:35<5:51:07, 2.54s/it] +2025-02-05 23:23:16 - ERROR - stderr - +2025-02-05 23:23:16 - ERROR - stderr - +2025-02-05 23:23:16 - INFO - stdout - {'loss': 0.7288, 'grad_norm': 1.260862112045288, 'learning_rate': 6.364707929808079e-06, 'epoch': 1.89} +2025-02-05 23:23:16 - ERROR - stderr - 63%|██████▎ | 14131/22434 [13:15:35<5:51:07, 2.54s/it] +2025-02-05 23:23:18 - ERROR - stderr - 63%|██████▎ | 14132/22434 [13:15:38<5:49:28, 2.53s/it] +2025-02-05 23:23:18 - ERROR - stderr - +2025-02-05 23:23:18 - ERROR - stderr - +2025-02-05 23:23:18 - INFO - stdout - {'loss': 0.6437, 'grad_norm': 1.1989822387695312, 'learning_rate': 6.363362998175615e-06, 'epoch': 1.89} +2025-02-05 23:23:18 - ERROR - stderr - 63%|██████▎ | 14132/22434 [13:15:38<5:49:28, 2.53s/it] +2025-02-05 23:23:21 - ERROR - stderr - 63%|██████▎ | 14133/22434 [13:15:40<5:47:58, 2.52s/it] +2025-02-05 23:23:21 - ERROR - stderr - +2025-02-05 23:23:21 - ERROR - stderr - +2025-02-05 23:23:21 - INFO - stdout - {'loss': 0.7023, 'grad_norm': 1.2962535619735718, 'learning_rate': 6.3620181423453745e-06, 'epoch': 1.89} +2025-02-05 23:23:21 - ERROR - stderr - 63%|██████▎ | 14133/22434 [13:15:40<5:47:58, 2.52s/it] +2025-02-05 23:23:23 - ERROR - stderr - 63%|██████▎ | 14134/22434 [13:15:43<5:49:51, 2.53s/it] +2025-02-05 23:23:23 - ERROR - stderr - +2025-02-05 23:23:23 - ERROR - stderr - +2025-02-05 23:23:23 - INFO - stdout - {'loss': 0.6938, 'grad_norm': 1.2855851650238037, 'learning_rate': 6.360673362345382e-06, 'epoch': 1.89} +2025-02-05 23:23:23 - ERROR - stderr - 63%|██████▎ | 14134/22434 [13:15:43<5:49:51, 2.53s/it] +2025-02-05 23:23:26 - 
ERROR - stderr - 63%|██████▎ | 14135/22434 [13:15:45<5:48:33, 2.52s/it] +2025-02-05 23:23:26 - ERROR - stderr - +2025-02-05 23:23:26 - ERROR - stderr - +2025-02-05 23:23:26 - INFO - stdout - {'loss': 0.6408, 'grad_norm': 1.2507954835891724, 'learning_rate': 6.359328658203668e-06, 'epoch': 1.89} +2025-02-05 23:23:26 - ERROR - stderr - 63%|██████▎ | 14135/22434 [13:15:45<5:48:33, 2.52s/it] +2025-02-05 23:23:28 - ERROR - stderr - 63%|██████▎ | 14136/22434 [13:15:48<5:46:48, 2.51s/it] +2025-02-05 23:23:28 - ERROR - stderr - +2025-02-05 23:23:28 - ERROR - stderr - +2025-02-05 23:23:28 - INFO - stdout - {'loss': 0.7707, 'grad_norm': 1.434545636177063, 'learning_rate': 6.357984029948267e-06, 'epoch': 1.89} +2025-02-05 23:23:28 - ERROR - stderr - 63%|██████▎ | 14136/22434 [13:15:48<5:46:48, 2.51s/it] +2025-02-05 23:23:31 - ERROR - stderr - 63%|██████▎ | 14137/22434 [13:15:50<5:45:43, 2.50s/it] +2025-02-05 23:23:31 - ERROR - stderr - +2025-02-05 23:23:31 - ERROR - stderr - +2025-02-05 23:23:31 - INFO - stdout - {'loss': 0.7297, 'grad_norm': 1.3401713371276855, 'learning_rate': 6.356639477607205e-06, 'epoch': 1.89} +2025-02-05 23:23:31 - ERROR - stderr - 63%|██████▎ | 14137/22434 [13:15:50<5:45:43, 2.50s/it] +2025-02-05 23:23:33 - ERROR - stderr - 63%|██████▎ | 14138/22434 [13:15:53<5:44:41, 2.49s/it] +2025-02-05 23:23:33 - ERROR - stderr - +2025-02-05 23:23:33 - ERROR - stderr - +2025-02-05 23:23:33 - INFO - stdout - {'loss': 0.6167, 'grad_norm': 1.2333322763442993, 'learning_rate': 6.355295001208504e-06, 'epoch': 1.89} +2025-02-05 23:23:33 - ERROR - stderr - 63%|██████▎ | 14138/22434 [13:15:53<5:44:41, 2.49s/it] +2025-02-05 23:23:36 - ERROR - stderr - 63%|██████▎ | 14139/22434 [13:15:55<5:45:41, 2.50s/it] +2025-02-05 23:23:36 - ERROR - stderr - +2025-02-05 23:23:36 - ERROR - stderr - +2025-02-05 23:23:36 - INFO - stdout - {'loss': 0.6738, 'grad_norm': 1.2623281478881836, 'learning_rate': 6.3539506007801944e-06, 'epoch': 1.89} +2025-02-05 23:23:36 - ERROR - stderr - 
63%|██████▎ | 14139/22434 [13:15:55<5:45:41, 2.50s/it] +2025-02-05 23:23:38 - ERROR - stderr - 63%|██████▎ | 14140/22434 [13:15:58<5:44:22, 2.49s/it] +2025-02-05 23:23:38 - ERROR - stderr - +2025-02-05 23:23:38 - ERROR - stderr - +2025-02-05 23:23:38 - INFO - stdout - {'loss': 0.7274, 'grad_norm': 1.308822512626648, 'learning_rate': 6.352606276350291e-06, 'epoch': 1.89} +2025-02-05 23:23:38 - ERROR - stderr - 63%|██████▎ | 14140/22434 [13:15:58<5:44:22, 2.49s/it] +2025-02-05 23:23:41 - ERROR - stderr - 63%|██████▎ | 14141/22434 [13:16:00<5:46:15, 2.51s/it] +2025-02-05 23:23:41 - ERROR - stderr - +2025-02-05 23:23:41 - ERROR - stderr - +2025-02-05 23:23:41 - INFO - stdout - {'loss': 0.7126, 'grad_norm': 1.4683748483657837, 'learning_rate': 6.351262027946824e-06, 'epoch': 1.89} +2025-02-05 23:23:41 - ERROR - stderr - 63%|██████▎ | 14141/22434 [13:16:00<5:46:15, 2.51s/it] +2025-02-05 23:23:43 - ERROR - stderr - 63%|██████▎ | 14142/22434 [13:16:03<5:44:50, 2.50s/it] +2025-02-05 23:23:43 - ERROR - stderr - +2025-02-05 23:23:43 - ERROR - stderr - +2025-02-05 23:23:43 - INFO - stdout - {'loss': 0.665, 'grad_norm': 1.2373170852661133, 'learning_rate': 6.349917855597807e-06, 'epoch': 1.89} +2025-02-05 23:23:43 - ERROR - stderr - 63%|██████▎ | 14142/22434 [13:16:03<5:44:50, 2.50s/it] +2025-02-05 23:23:46 - ERROR - stderr - 63%|██████▎ | 14143/22434 [13:16:05<5:44:26, 2.49s/it] +2025-02-05 23:23:46 - ERROR - stderr - +2025-02-05 23:23:46 - ERROR - stderr - +2025-02-05 23:23:46 - INFO - stdout - {'loss': 0.5897, 'grad_norm': 1.0747766494750977, 'learning_rate': 6.348573759331257e-06, 'epoch': 1.89} +2025-02-05 23:23:46 - ERROR - stderr - 63%|██████▎ | 14143/22434 [13:16:05<5:44:26, 2.49s/it] +2025-02-05 23:23:48 - ERROR - stderr - 63%|██████▎ | 14144/22434 [13:16:08<5:45:07, 2.50s/it] +2025-02-05 23:23:48 - ERROR - stderr - +2025-02-05 23:23:48 - ERROR - stderr - +2025-02-05 23:23:48 - INFO - stdout - {'loss': 0.6217, 'grad_norm': 1.208284616470337, 'learning_rate': 
6.347229739175197e-06, 'epoch': 1.89} +2025-02-05 23:23:48 - ERROR - stderr - 63%|██████▎ | 14144/22434 [13:16:08<5:45:07, 2.50s/it] +2025-02-05 23:23:51 - ERROR - stderr - 63%|██████▎ | 14145/22434 [13:16:10<5:45:13, 2.50s/it] +2025-02-05 23:23:51 - ERROR - stderr - +2025-02-05 23:23:51 - ERROR - stderr - +2025-02-05 23:23:51 - INFO - stdout - {'loss': 0.6309, 'grad_norm': 1.3434022665023804, 'learning_rate': 6.345885795157638e-06, 'epoch': 1.89} +2025-02-05 23:23:51 - ERROR - stderr - 63%|██████▎ | 14145/22434 [13:16:10<5:45:13, 2.50s/it] +2025-02-05 23:23:53 - ERROR - stderr - 63%|██████▎ | 14146/22434 [13:16:13<5:44:34, 2.49s/it] +2025-02-05 23:23:53 - ERROR - stderr - +2025-02-05 23:23:53 - ERROR - stderr - +2025-02-05 23:23:53 - INFO - stdout - {'loss': 0.7244, 'grad_norm': 1.3921606540679932, 'learning_rate': 6.344541927306589e-06, 'epoch': 1.89} +2025-02-05 23:23:53 - ERROR - stderr - 63%|██████▎ | 14146/22434 [13:16:13<5:44:34, 2.49s/it] +2025-02-05 23:23:55 - ERROR - stderr - 63%|██████▎ | 14147/22434 [13:16:15<5:43:16, 2.49s/it] +2025-02-05 23:23:56 - ERROR - stderr - +2025-02-05 23:23:56 - ERROR - stderr - +2025-02-05 23:23:56 - INFO - stdout - {'loss': 0.6479, 'grad_norm': 1.4474575519561768, 'learning_rate': 6.34319813565007e-06, 'epoch': 1.89} +2025-02-05 23:23:56 - ERROR - stderr - 63%|██████▎ | 14147/22434 [13:16:15<5:43:16, 2.49s/it] +2025-02-05 23:23:58 - ERROR - stderr - 63%|██████▎ | 14148/22434 [13:16:18<5:50:44, 2.54s/it] +2025-02-05 23:23:58 - ERROR - stderr - +2025-02-05 23:23:58 - ERROR - stderr - +2025-02-05 23:23:58 - INFO - stdout - {'loss': 0.7034, 'grad_norm': 1.269419550895691, 'learning_rate': 6.341854420216083e-06, 'epoch': 1.89} +2025-02-05 23:23:58 - ERROR - stderr - 63%|██████▎ | 14148/22434 [13:16:18<5:50:44, 2.54s/it] +2025-02-05 23:24:01 - ERROR - stderr - 63%|██████▎ | 14149/22434 [13:16:20<5:51:30, 2.55s/it] +2025-02-05 23:24:01 - ERROR - stderr - +2025-02-05 23:24:01 - ERROR - stderr - +2025-02-05 23:24:01 - INFO - stdout 
- {'loss': 0.7366, 'grad_norm': 1.4364163875579834, 'learning_rate': 6.34051078103264e-06, 'epoch': 1.89} +2025-02-05 23:24:01 - ERROR - stderr - 63%|██████▎ | 14149/22434 [13:16:21<5:51:30, 2.55s/it] +2025-02-05 23:24:03 - ERROR - stderr - 63%|██████▎ | 14150/22434 [13:16:23<5:46:49, 2.51s/it] +2025-02-05 23:24:03 - ERROR - stderr - +2025-02-05 23:24:03 - ERROR - stderr - +2025-02-05 23:24:03 - INFO - stdout - {'loss': 0.7664, 'grad_norm': 1.4080795049667358, 'learning_rate': 6.339167218127752e-06, 'epoch': 1.89} +2025-02-05 23:24:03 - ERROR - stderr - 63%|██████▎ | 14150/22434 [13:16:23<5:46:49, 2.51s/it] +2025-02-05 23:24:06 - ERROR - stderr - 63%|██████▎ | 14151/22434 [13:16:25<5:44:00, 2.49s/it] +2025-02-05 23:24:06 - ERROR - stderr - +2025-02-05 23:24:06 - ERROR - stderr - +2025-02-05 23:24:06 - INFO - stdout - {'loss': 0.6396, 'grad_norm': 1.118035912513733, 'learning_rate': 6.337823731529415e-06, 'epoch': 1.89} +2025-02-05 23:24:06 - ERROR - stderr - 63%|██████▎ | 14151/22434 [13:16:25<5:44:00, 2.49s/it] +2025-02-05 23:24:08 - ERROR - stderr - 63%|██████▎ | 14152/22434 [13:16:28<5:42:25, 2.48s/it] +2025-02-05 23:24:08 - ERROR - stderr - +2025-02-05 23:24:08 - ERROR - stderr - +2025-02-05 23:24:08 - INFO - stdout - {'loss': 0.7128, 'grad_norm': 1.190543532371521, 'learning_rate': 6.336480321265643e-06, 'epoch': 1.89} +2025-02-05 23:24:08 - ERROR - stderr - 63%|██████▎ | 14152/22434 [13:16:28<5:42:25, 2.48s/it] +2025-02-05 23:24:11 - ERROR - stderr - 63%|██████▎ | 14153/22434 [13:16:30<5:43:28, 2.49s/it] +2025-02-05 23:24:11 - ERROR - stderr - +2025-02-05 23:24:11 - ERROR - stderr - +2025-02-05 23:24:11 - INFO - stdout - {'loss': 0.7963, 'grad_norm': 1.452013611793518, 'learning_rate': 6.335136987364433e-06, 'epoch': 1.89} +2025-02-05 23:24:11 - ERROR - stderr - 63%|██████▎ | 14153/22434 [13:16:30<5:43:28, 2.49s/it] +2025-02-05 23:24:13 - ERROR - stderr - 63%|██████▎ | 14154/22434 [13:16:33<5:45:31, 2.50s/it] +2025-02-05 23:24:13 - ERROR - stderr - 
+2025-02-05 23:24:13 - ERROR - stderr - +2025-02-05 23:24:13 - INFO - stdout - {'loss': 0.7164, 'grad_norm': 1.3416484594345093, 'learning_rate': 6.333793729853781e-06, 'epoch': 1.89} +2025-02-05 23:24:13 - ERROR - stderr - 63%|██████▎ | 14154/22434 [13:16:33<5:45:31, 2.50s/it] +2025-02-05 23:24:16 - ERROR - stderr - 63%|██████▎ | 14155/22434 [13:16:35<5:45:47, 2.51s/it] +2025-02-05 23:24:16 - ERROR - stderr - +2025-02-05 23:24:16 - ERROR - stderr - +2025-02-05 23:24:16 - INFO - stdout - {'loss': 0.7003, 'grad_norm': 1.2548432350158691, 'learning_rate': 6.332450548761692e-06, 'epoch': 1.89} +2025-02-05 23:24:16 - ERROR - stderr - 63%|██████▎ | 14155/22434 [13:16:35<5:45:47, 2.51s/it] +2025-02-05 23:24:18 - ERROR - stderr - 63%|██████▎ | 14156/22434 [13:16:38<5:51:21, 2.55s/it] +2025-02-05 23:24:18 - ERROR - stderr - +2025-02-05 23:24:18 - ERROR - stderr - +2025-02-05 23:24:18 - INFO - stdout - {'loss': 0.6465, 'grad_norm': 1.287768840789795, 'learning_rate': 6.331107444116163e-06, 'epoch': 1.89} +2025-02-05 23:24:18 - ERROR - stderr - 63%|██████▎ | 14156/22434 [13:16:38<5:51:21, 2.55s/it] +2025-02-05 23:24:21 - ERROR - stderr - 63%|██████▎ | 14157/22434 [13:16:40<5:48:40, 2.53s/it] +2025-02-05 23:24:21 - ERROR - stderr - +2025-02-05 23:24:21 - ERROR - stderr - +2025-02-05 23:24:21 - INFO - stdout - {'loss': 0.63, 'grad_norm': 1.277902603149414, 'learning_rate': 6.32976441594519e-06, 'epoch': 1.89} +2025-02-05 23:24:21 - ERROR - stderr - 63%|██████▎ | 14157/22434 [13:16:41<5:48:40, 2.53s/it] +2025-02-05 23:24:23 - ERROR - stderr - 63%|██████▎ | 14158/22434 [13:16:43<5:49:37, 2.53s/it] +2025-02-05 23:24:23 - ERROR - stderr - +2025-02-05 23:24:23 - ERROR - stderr - +2025-02-05 23:24:23 - INFO - stdout - {'loss': 0.6681, 'grad_norm': 1.2718380689620972, 'learning_rate': 6.328421464276766e-06, 'epoch': 1.89} +2025-02-05 23:24:23 - ERROR - stderr - 63%|██████▎ | 14158/22434 [13:16:43<5:49:37, 2.53s/it] +2025-02-05 23:24:26 - ERROR - stderr - 63%|██████▎ | 14159/22434 
[13:16:46<5:58:25, 2.60s/it] +2025-02-05 23:24:26 - ERROR - stderr - +2025-02-05 23:24:26 - ERROR - stderr - +2025-02-05 23:24:26 - INFO - stdout - {'loss': 0.6473, 'grad_norm': 1.206114411354065, 'learning_rate': 6.327078589138879e-06, 'epoch': 1.89} +2025-02-05 23:24:26 - ERROR - stderr - 63%|██████▎ | 14159/22434 [13:16:46<5:58:25, 2.60s/it] +2025-02-05 23:24:29 - ERROR - stderr - 63%|██████▎ | 14160/22434 [13:16:48<5:53:45, 2.57s/it] +2025-02-05 23:24:29 - ERROR - stderr - +2025-02-05 23:24:29 - ERROR - stderr - +2025-02-05 23:24:29 - INFO - stdout - {'loss': 0.6766, 'grad_norm': 1.3100687265396118, 'learning_rate': 6.325735790559529e-06, 'epoch': 1.89} +2025-02-05 23:24:29 - ERROR - stderr - 63%|██████▎ | 14160/22434 [13:16:48<5:53:45, 2.57s/it] +2025-02-05 23:24:31 - ERROR - stderr - 63%|██████▎ | 14161/22434 [13:16:51<5:52:44, 2.56s/it] +2025-02-05 23:24:31 - ERROR - stderr - +2025-02-05 23:24:31 - ERROR - stderr - +2025-02-05 23:24:31 - INFO - stdout - {'loss': 0.6304, 'grad_norm': 1.2397688627243042, 'learning_rate': 6.324393068566696e-06, 'epoch': 1.89} +2025-02-05 23:24:31 - ERROR - stderr - 63%|██████▎ | 14161/22434 [13:16:51<5:52:44, 2.56s/it] +2025-02-05 23:24:33 - ERROR - stderr - 63%|██████▎ | 14162/22434 [13:16:53<5:46:51, 2.52s/it] +2025-02-05 23:24:34 - ERROR - stderr - +2025-02-05 23:24:34 - ERROR - stderr - +2025-02-05 23:24:34 - INFO - stdout - {'loss': 0.6496, 'grad_norm': 1.1693998575210571, 'learning_rate': 6.323050423188374e-06, 'epoch': 1.89} +2025-02-05 23:24:34 - ERROR - stderr - 63%|██████▎ | 14162/22434 [13:16:53<5:46:51, 2.52s/it] +2025-02-05 23:24:36 - ERROR - stderr - 63%|██████▎ | 14163/22434 [13:16:56<5:46:49, 2.52s/it] +2025-02-05 23:24:36 - ERROR - stderr - +2025-02-05 23:24:36 - ERROR - stderr - +2025-02-05 23:24:36 - INFO - stdout - {'loss': 0.7717, 'grad_norm': 1.3855648040771484, 'learning_rate': 6.32170785445255e-06, 'epoch': 1.89} +2025-02-05 23:24:36 - ERROR - stderr - 63%|██████▎ | 14163/22434 [13:16:56<5:46:49, 
2.52s/it] +2025-02-05 23:24:39 - ERROR - stderr - 63%|██████▎ | 14164/22434 [13:16:58<5:53:34, 2.57s/it] +2025-02-05 23:24:39 - ERROR - stderr - +2025-02-05 23:24:39 - ERROR - stderr - +2025-02-05 23:24:39 - INFO - stdout - {'loss': 0.6547, 'grad_norm': 1.1528772115707397, 'learning_rate': 6.320365362387202e-06, 'epoch': 1.89} +2025-02-05 23:24:39 - ERROR - stderr - 63%|██████▎ | 14164/22434 [13:16:58<5:53:34, 2.57s/it] +2025-02-05 23:24:41 - ERROR - stderr - 63%|██████▎ | 14165/22434 [13:17:01<5:52:48, 2.56s/it] +2025-02-05 23:24:41 - ERROR - stderr - +2025-02-05 23:24:41 - ERROR - stderr - +2025-02-05 23:24:41 - INFO - stdout - {'loss': 0.7197, 'grad_norm': 1.5268951654434204, 'learning_rate': 6.31902294702032e-06, 'epoch': 1.89} +2025-02-05 23:24:41 - ERROR - stderr - 63%|██████▎ | 14165/22434 [13:17:01<5:52:48, 2.56s/it] +2025-02-05 23:24:44 - ERROR - stderr - 63%|██████▎ | 14166/22434 [13:17:03<5:49:46, 2.54s/it] +2025-02-05 23:24:44 - ERROR - stderr - +2025-02-05 23:24:44 - ERROR - stderr - +2025-02-05 23:24:44 - INFO - stdout - {'loss': 0.695, 'grad_norm': 1.3617584705352783, 'learning_rate': 6.317680608379884e-06, 'epoch': 1.89} +2025-02-05 23:24:44 - ERROR - stderr - 63%|██████▎ | 14166/22434 [13:17:04<5:49:46, 2.54s/it] +2025-02-05 23:24:47 - ERROR - stderr - 63%|██████▎ | 14167/22434 [13:17:06<6:01:57, 2.63s/it] +2025-02-05 23:24:47 - ERROR - stderr - +2025-02-05 23:24:47 - ERROR - stderr - +2025-02-05 23:24:47 - INFO - stdout - {'loss': 0.7175, 'grad_norm': 1.430815577507019, 'learning_rate': 6.316338346493867e-06, 'epoch': 1.89} +2025-02-05 23:24:47 - ERROR - stderr - 63%|██████▎ | 14167/22434 [13:17:06<6:01:57, 2.63s/it] +2025-02-05 23:24:49 - ERROR - stderr - 63%|██████▎ | 14168/22434 [13:17:09<6:00:21, 2.62s/it] +2025-02-05 23:24:49 - ERROR - stderr - +2025-02-05 23:24:49 - ERROR - stderr - +2025-02-05 23:24:49 - INFO - stdout - {'loss': 0.6448, 'grad_norm': 1.2259982824325562, 'learning_rate': 6.314996161390255e-06, 'epoch': 1.89} +2025-02-05 
23:24:49 - ERROR - stderr - 63%|██████▎ | 14168/22434 [13:17:09<6:00:21, 2.62s/it] +2025-02-05 23:24:52 - ERROR - stderr - 63%|██████▎ | 14169/22434 [13:17:11<5:56:01, 2.58s/it] +2025-02-05 23:24:52 - ERROR - stderr - +2025-02-05 23:24:52 - ERROR - stderr - +2025-02-05 23:24:52 - INFO - stdout - {'loss': 0.7019, 'grad_norm': 1.4310404062271118, 'learning_rate': 6.313654053097023e-06, 'epoch': 1.89} +2025-02-05 23:24:52 - ERROR - stderr - 63%|██████▎ | 14169/22434 [13:17:11<5:56:01, 2.58s/it] +2025-02-05 23:24:54 - ERROR - stderr - 63%|██████▎ | 14170/22434 [13:17:14<5:53:07, 2.56s/it] +2025-02-05 23:24:54 - ERROR - stderr - +2025-02-05 23:24:54 - ERROR - stderr - +2025-02-05 23:24:54 - INFO - stdout - {'loss': 0.627, 'grad_norm': 1.3564549684524536, 'learning_rate': 6.312312021642142e-06, 'epoch': 1.89} +2025-02-05 23:24:54 - ERROR - stderr - 63%|██████▎ | 14170/22434 [13:17:14<5:53:07, 2.56s/it] +2025-02-05 23:24:57 - ERROR - stderr - 63%|██████▎ | 14171/22434 [13:17:17<5:53:45, 2.57s/it] +2025-02-05 23:24:57 - ERROR - stderr - +2025-02-05 23:24:57 - ERROR - stderr - +2025-02-05 23:24:57 - INFO - stdout - {'loss': 0.7327, 'grad_norm': 1.3183951377868652, 'learning_rate': 6.31097006705359e-06, 'epoch': 1.9} +2025-02-05 23:24:57 - ERROR - stderr - 63%|██████▎ | 14171/22434 [13:17:17<5:53:45, 2.57s/it] +2025-02-05 23:24:59 - ERROR - stderr - 63%|██████▎ | 14172/22434 [13:17:19<5:51:05, 2.55s/it] +2025-02-05 23:24:59 - ERROR - stderr - +2025-02-05 23:24:59 - ERROR - stderr - +2025-02-05 23:24:59 - INFO - stdout - {'loss': 0.6362, 'grad_norm': 1.311274766921997, 'learning_rate': 6.309628189359336e-06, 'epoch': 1.9} +2025-02-05 23:24:59 - ERROR - stderr - 63%|██████▎ | 14172/22434 [13:17:19<5:51:05, 2.55s/it] +2025-02-05 23:25:02 - ERROR - stderr - 63%|██████▎ | 14173/22434 [13:17:21<5:47:46, 2.53s/it] +2025-02-05 23:25:02 - ERROR - stderr - +2025-02-05 23:25:02 - ERROR - stderr - +2025-02-05 23:25:02 - INFO - stdout - {'loss': 0.6489, 'grad_norm': 1.2156037092208862, 
'learning_rate': 6.3082863885873525e-06, 'epoch': 1.9} +2025-02-05 23:25:02 - ERROR - stderr - 63%|██████▎ | 14173/22434 [13:17:22<5:47:46, 2.53s/it] +2025-02-05 23:25:04 - ERROR - stderr - 63%|██████▎ | 14174/22434 [13:17:24<5:45:35, 2.51s/it] +2025-02-05 23:25:04 - ERROR - stderr - +2025-02-05 23:25:04 - ERROR - stderr - +2025-02-05 23:25:04 - INFO - stdout - {'loss': 0.6497, 'grad_norm': 1.2086210250854492, 'learning_rate': 6.306944664765606e-06, 'epoch': 1.9} +2025-02-05 23:25:04 - ERROR - stderr - 63%|██████▎ | 14174/22434 [13:17:24<5:45:35, 2.51s/it] +2025-02-05 23:25:07 - ERROR - stderr - 63%|██████▎ | 14175/22434 [13:17:26<5:43:43, 2.50s/it] +2025-02-05 23:25:07 - ERROR - stderr - +2025-02-05 23:25:07 - ERROR - stderr - +2025-02-05 23:25:07 - INFO - stdout - {'loss': 0.6882, 'grad_norm': 1.20020592212677, 'learning_rate': 6.305603017922062e-06, 'epoch': 1.9} +2025-02-05 23:25:07 - ERROR - stderr - 63%|██████▎ | 14175/22434 [13:17:26<5:43:43, 2.50s/it] +2025-02-05 23:25:09 - ERROR - stderr - 63%|██████▎ | 14176/22434 [13:17:29<5:42:44, 2.49s/it] +2025-02-05 23:25:09 - ERROR - stderr - +2025-02-05 23:25:09 - ERROR - stderr - +2025-02-05 23:25:09 - INFO - stdout - {'loss': 0.7021, 'grad_norm': 1.24151611328125, 'learning_rate': 6.304261448084692e-06, 'epoch': 1.9} +2025-02-05 23:25:09 - ERROR - stderr - 63%|██████▎ | 14176/22434 [13:17:29<5:42:44, 2.49s/it] +2025-02-05 23:25:12 - ERROR - stderr - 63%|██████▎ | 14177/22434 [13:17:31<5:44:19, 2.50s/it] +2025-02-05 23:25:12 - ERROR - stderr - +2025-02-05 23:25:12 - ERROR - stderr - +2025-02-05 23:25:12 - INFO - stdout - {'loss': 0.6324, 'grad_norm': 1.2657420635223389, 'learning_rate': 6.3029199552814545e-06, 'epoch': 1.9} +2025-02-05 23:25:12 - ERROR - stderr - 63%|██████▎ | 14177/22434 [13:17:31<5:44:19, 2.50s/it] +2025-02-05 23:25:14 - ERROR - stderr - 63%|██████▎ | 14178/22434 [13:17:34<5:44:26, 2.50s/it] +2025-02-05 23:25:14 - ERROR - stderr - +2025-02-05 23:25:14 - ERROR - stderr - +2025-02-05 23:25:14 - 
INFO - stdout - {'loss': 0.62, 'grad_norm': 1.2817223072052002, 'learning_rate': 6.30157853954031e-06, 'epoch': 1.9} +2025-02-05 23:25:14 - ERROR - stderr - 63%|██████▎ | 14178/22434 [13:17:34<5:44:26, 2.50s/it] +2025-02-05 23:25:17 - ERROR - stderr - 63%|██████▎ | 14179/22434 [13:17:36<5:43:52, 2.50s/it] +2025-02-05 23:25:17 - ERROR - stderr - +2025-02-05 23:25:17 - ERROR - stderr - +2025-02-05 23:25:17 - INFO - stdout - {'loss': 0.605, 'grad_norm': 1.2780274152755737, 'learning_rate': 6.300237200889225e-06, 'epoch': 1.9} +2025-02-05 23:25:17 - ERROR - stderr - 63%|██████▎ | 14179/22434 [13:17:36<5:43:52, 2.50s/it] +2025-02-05 23:25:19 - ERROR - stderr - 63%|██████▎ | 14180/22434 [13:17:39<5:45:43, 2.51s/it] +2025-02-05 23:25:19 - ERROR - stderr - +2025-02-05 23:25:19 - ERROR - stderr - +2025-02-05 23:25:19 - INFO - stdout - {'loss': 0.7378, 'grad_norm': 1.2388662099838257, 'learning_rate': 6.2988959393561525e-06, 'epoch': 1.9} +2025-02-05 23:25:19 - ERROR - stderr - 63%|██████▎ | 14180/22434 [13:17:39<5:45:43, 2.51s/it] +2025-02-05 23:25:22 - ERROR - stderr - 63%|██████▎ | 14181/22434 [13:17:41<5:44:45, 2.51s/it] +2025-02-05 23:25:22 - ERROR - stderr - +2025-02-05 23:25:22 - ERROR - stderr - +2025-02-05 23:25:22 - INFO - stdout - {'loss': 0.6795, 'grad_norm': 1.246254563331604, 'learning_rate': 6.297554754969053e-06, 'epoch': 1.9} +2025-02-05 23:25:22 - ERROR - stderr - 63%|██████▎ | 14181/22434 [13:17:42<5:44:45, 2.51s/it] +2025-02-05 23:25:24 - ERROR - stderr - 63%|██████▎ | 14182/22434 [13:17:44<5:45:01, 2.51s/it] +2025-02-05 23:25:24 - ERROR - stderr - +2025-02-05 23:25:24 - ERROR - stderr - +2025-02-05 23:25:24 - INFO - stdout - {'loss': 0.6024, 'grad_norm': 1.1774646043777466, 'learning_rate': 6.296213647755885e-06, 'epoch': 1.9} +2025-02-05 23:25:24 - ERROR - stderr - 63%|██████▎ | 14182/22434 [13:17:44<5:45:01, 2.51s/it] +2025-02-05 23:25:27 - ERROR - stderr - 63%|██████▎ | 14183/22434 [13:17:46<5:45:18, 2.51s/it] +2025-02-05 23:25:27 - ERROR - stderr - 
+2025-02-05 23:25:27 - ERROR - stderr - +2025-02-05 23:25:27 - INFO - stdout - {'loss': 0.6663, 'grad_norm': 1.1876485347747803, 'learning_rate': 6.294872617744595e-06, 'epoch': 1.9} +2025-02-05 23:25:27 - ERROR - stderr - 63%|██████▎ | 14183/22434 [13:17:47<5:45:18, 2.51s/it] +2025-02-05 23:25:29 - ERROR - stderr - 63%|██████▎ | 14184/22434 [13:17:49<5:45:30, 2.51s/it] +2025-02-05 23:25:29 - ERROR - stderr - +2025-02-05 23:25:29 - ERROR - stderr - +2025-02-05 23:25:29 - INFO - stdout - {'loss': 0.6741, 'grad_norm': 1.172798752784729, 'learning_rate': 6.293531664963144e-06, 'epoch': 1.9} +2025-02-05 23:25:29 - ERROR - stderr - 63%|██████▎ | 14184/22434 [13:17:49<5:45:30, 2.51s/it] +2025-02-05 23:25:32 - ERROR - stderr - 63%|██████▎ | 14185/22434 [13:17:52<5:45:18, 2.51s/it] +2025-02-05 23:25:32 - ERROR - stderr - +2025-02-05 23:25:32 - ERROR - stderr - +2025-02-05 23:25:32 - INFO - stdout - {'loss': 0.717, 'grad_norm': 1.4781556129455566, 'learning_rate': 6.292190789439479e-06, 'epoch': 1.9} +2025-02-05 23:25:32 - ERROR - stderr - 63%|██████▎ | 14185/22434 [13:17:52<5:45:18, 2.51s/it] +2025-02-05 23:25:34 - ERROR - stderr - 63%|██████▎ | 14186/22434 [13:17:54<5:44:12, 2.50s/it] +2025-02-05 23:25:34 - ERROR - stderr - +2025-02-05 23:25:34 - ERROR - stderr - +2025-02-05 23:25:34 - INFO - stdout - {'loss': 0.6823, 'grad_norm': 1.3320326805114746, 'learning_rate': 6.2908499912015444e-06, 'epoch': 1.9} +2025-02-05 23:25:34 - ERROR - stderr - 63%|██████▎ | 14186/22434 [13:17:54<5:44:12, 2.50s/it] +2025-02-05 23:25:37 - ERROR - stderr - 63%|██████▎ | 14187/22434 [13:17:57<5:44:26, 2.51s/it] +2025-02-05 23:25:37 - ERROR - stderr - +2025-02-05 23:25:37 - ERROR - stderr - +2025-02-05 23:25:37 - INFO - stdout - {'loss': 0.6841, 'grad_norm': 1.4262053966522217, 'learning_rate': 6.2895092702772945e-06, 'epoch': 1.9} +2025-02-05 23:25:37 - ERROR - stderr - 63%|██████▎ | 14187/22434 [13:17:57<5:44:26, 2.51s/it] +2025-02-05 23:25:39 - ERROR - stderr - 63%|██████▎ | 14188/22434 
[13:17:59<5:52:48, 2.57s/it] +2025-02-05 23:25:39 - ERROR - stderr - +2025-02-05 23:25:39 - ERROR - stderr - +2025-02-05 23:25:39 - INFO - stdout - {'loss': 0.7697, 'grad_norm': 1.2862799167633057, 'learning_rate': 6.288168626694673e-06, 'epoch': 1.9} +2025-02-05 23:25:39 - ERROR - stderr - 63%|██████▎ | 14188/22434 [13:17:59<5:52:48, 2.57s/it] +2025-02-05 23:25:42 - ERROR - stderr - 63%|██████▎ | 14189/22434 [13:18:02<5:51:17, 2.56s/it] +2025-02-05 23:25:42 - ERROR - stderr - +2025-02-05 23:25:42 - ERROR - stderr - +2025-02-05 23:25:42 - INFO - stdout - {'loss': 0.7297, 'grad_norm': 1.3348792791366577, 'learning_rate': 6.286828060481626e-06, 'epoch': 1.9} +2025-02-05 23:25:42 - ERROR - stderr - 63%|██████▎ | 14189/22434 [13:18:02<5:51:17, 2.56s/it] +2025-02-05 23:25:45 - ERROR - stderr - 63%|██████▎ | 14190/22434 [13:18:04<5:54:21, 2.58s/it] +2025-02-05 23:25:45 - ERROR - stderr - +2025-02-05 23:25:45 - ERROR - stderr - +2025-02-05 23:25:45 - INFO - stdout - {'loss': 0.6294, 'grad_norm': 1.1944373846054077, 'learning_rate': 6.285487571666096e-06, 'epoch': 1.9} +2025-02-05 23:25:45 - ERROR - stderr - 63%|██████▎ | 14190/22434 [13:18:04<5:54:21, 2.58s/it] +2025-02-05 23:25:47 - ERROR - stderr - 63%|██████▎ | 14191/22434 [13:18:07<5:53:10, 2.57s/it] +2025-02-05 23:25:47 - ERROR - stderr - +2025-02-05 23:25:47 - ERROR - stderr - +2025-02-05 23:25:47 - INFO - stdout - {'loss': 0.6962, 'grad_norm': 1.2782119512557983, 'learning_rate': 6.284147160276018e-06, 'epoch': 1.9} +2025-02-05 23:25:47 - ERROR - stderr - 63%|██████▎ | 14191/22434 [13:18:07<5:53:10, 2.57s/it] +2025-02-05 23:25:50 - ERROR - stderr - 63%|██████▎ | 14192/22434 [13:18:09<5:53:05, 2.57s/it] +2025-02-05 23:25:50 - ERROR - stderr - +2025-02-05 23:25:50 - ERROR - stderr - +2025-02-05 23:25:50 - INFO - stdout - {'loss': 0.5789, 'grad_norm': 1.0583351850509644, 'learning_rate': 6.282806826339343e-06, 'epoch': 1.9} +2025-02-05 23:25:50 - ERROR - stderr - 63%|██████▎ | 14192/22434 [13:18:10<5:53:05, 2.57s/it] 
+2025-02-05 23:25:52 - ERROR - stderr - 63%|██████▎ | 14193/22434 [13:18:12<5:56:23, 2.59s/it] +2025-02-05 23:25:52 - ERROR - stderr - +2025-02-05 23:25:52 - ERROR - stderr - +2025-02-05 23:25:52 - INFO - stdout - {'loss': 0.6818, 'grad_norm': 1.25435209274292, 'learning_rate': 6.2814665698839976e-06, 'epoch': 1.9} +2025-02-05 23:25:52 - ERROR - stderr - 63%|██████▎ | 14193/22434 [13:18:12<5:56:23, 2.59s/it] +2025-02-05 23:25:55 - ERROR - stderr - 63%|██████▎ | 14194/22434 [13:18:15<5:50:03, 2.55s/it] +2025-02-05 23:25:55 - ERROR - stderr - +2025-02-05 23:25:55 - ERROR - stderr - +2025-02-05 23:25:55 - INFO - stdout - {'loss': 0.6586, 'grad_norm': 1.3558266162872314, 'learning_rate': 6.280126390937925e-06, 'epoch': 1.9} +2025-02-05 23:25:55 - ERROR - stderr - 63%|██████▎ | 14194/22434 [13:18:15<5:50:03, 2.55s/it] +2025-02-05 23:25:57 - ERROR - stderr - 63%|██████▎ | 14195/22434 [13:18:17<5:48:04, 2.53s/it] +2025-02-05 23:25:57 - ERROR - stderr - +2025-02-05 23:25:57 - ERROR - stderr - +2025-02-05 23:25:57 - INFO - stdout - {'loss': 0.6537, 'grad_norm': 1.1226017475128174, 'learning_rate': 6.278786289529061e-06, 'epoch': 1.9} +2025-02-05 23:25:57 - ERROR - stderr - 63%|██████▎ | 14195/22434 [13:18:17<5:48:04, 2.53s/it] +2025-02-05 23:26:00 - ERROR - stderr - 63%|██████▎ | 14196/22434 [13:18:20<5:43:43, 2.50s/it] +2025-02-05 23:26:00 - ERROR - stderr - +2025-02-05 23:26:00 - ERROR - stderr - +2025-02-05 23:26:00 - INFO - stdout - {'loss': 0.6651, 'grad_norm': 1.2714204788208008, 'learning_rate': 6.277446265685332e-06, 'epoch': 1.9} +2025-02-05 23:26:00 - ERROR - stderr - 63%|██████▎ | 14196/22434 [13:18:20<5:43:43, 2.50s/it] +2025-02-05 23:26:02 - ERROR - stderr - 63%|██████▎ | 14197/22434 [13:18:22<5:41:09, 2.49s/it] +2025-02-05 23:26:02 - ERROR - stderr - +2025-02-05 23:26:02 - ERROR - stderr - +2025-02-05 23:26:02 - INFO - stdout - {'loss': 0.628, 'grad_norm': 1.2135707139968872, 'learning_rate': 6.276106319434676e-06, 'epoch': 1.9} +2025-02-05 23:26:02 - ERROR - 
stderr - 63%|██████▎ | 14197/22434 [13:18:22<5:41:09, 2.49s/it] +2025-02-05 23:26:05 - ERROR - stderr - 63%|██████▎ | 14198/22434 [13:18:25<5:58:51, 2.61s/it] +2025-02-05 23:26:05 - ERROR - stderr - +2025-02-05 23:26:05 - ERROR - stderr - +2025-02-05 23:26:05 - INFO - stdout - {'loss': 0.6397, 'grad_norm': 1.2941120862960815, 'learning_rate': 6.274766450805022e-06, 'epoch': 1.9} +2025-02-05 23:26:05 - ERROR - stderr - 63%|██████▎ | 14198/22434 [13:18:25<5:58:51, 2.61s/it] +2025-02-05 23:26:08 - ERROR - stderr - 63%|██████▎ | 14199/22434 [13:18:27<5:53:41, 2.58s/it] +2025-02-05 23:26:08 - ERROR - stderr - +2025-02-05 23:26:08 - ERROR - stderr - +2025-02-05 23:26:08 - INFO - stdout - {'loss': 0.682, 'grad_norm': 1.173779010772705, 'learning_rate': 6.273426659824293e-06, 'epoch': 1.9} +2025-02-05 23:26:08 - ERROR - stderr - 63%|██████▎ | 14199/22434 [13:18:27<5:53:41, 2.58s/it] +2025-02-05 23:26:10 - ERROR - stderr - 63%|██████▎ | 14200/22434 [13:18:30<5:51:38, 2.56s/it] +2025-02-05 23:26:10 - ERROR - stderr - +2025-02-05 23:26:10 - ERROR - stderr - +2025-02-05 23:26:10 - INFO - stdout - {'loss': 0.6651, 'grad_norm': 1.2401278018951416, 'learning_rate': 6.272086946520419e-06, 'epoch': 1.9} +2025-02-05 23:26:10 - ERROR - stderr - 63%|██████▎ | 14200/22434 [13:18:30<5:51:38, 2.56s/it] +2025-02-05 23:26:13 - ERROR - stderr - 63%|██████▎ | 14201/22434 [13:18:32<5:52:32, 2.57s/it] +2025-02-05 23:26:13 - ERROR - stderr - +2025-02-05 23:26:13 - ERROR - stderr - +2025-02-05 23:26:13 - INFO - stdout - {'loss': 0.6762, 'grad_norm': 1.141048550605774, 'learning_rate': 6.270747310921328e-06, 'epoch': 1.9} +2025-02-05 23:26:13 - ERROR - stderr - 63%|██████▎ | 14201/22434 [13:18:33<5:52:32, 2.57s/it] +2025-02-05 23:26:15 - ERROR - stderr - 63%|██████▎ | 14202/22434 [13:18:35<5:46:21, 2.52s/it] +2025-02-05 23:26:15 - ERROR - stderr - +2025-02-05 23:26:15 - ERROR - stderr - +2025-02-05 23:26:15 - INFO - stdout - {'loss': 0.6751, 'grad_norm': 1.2365912199020386, 'learning_rate': 
6.269407753054939e-06, 'epoch': 1.9} +2025-02-05 23:26:15 - ERROR - stderr - 63%|██████▎ | 14202/22434 [13:18:35<5:46:21, 2.52s/it] +2025-02-05 23:26:18 - ERROR - stderr - 63%|██████▎ | 14203/22434 [13:18:38<5:52:19, 2.57s/it] +2025-02-05 23:26:18 - ERROR - stderr - +2025-02-05 23:26:18 - ERROR - stderr - +2025-02-05 23:26:18 - INFO - stdout - {'loss': 0.677, 'grad_norm': 1.2768604755401611, 'learning_rate': 6.2680682729491795e-06, 'epoch': 1.9} +2025-02-05 23:26:18 - ERROR - stderr - 63%|██████▎ | 14203/22434 [13:18:38<5:52:19, 2.57s/it] +2025-02-05 23:26:20 - ERROR - stderr - 63%|██████▎ | 14204/22434 [13:18:40<5:52:29, 2.57s/it] +2025-02-05 23:26:20 - ERROR - stderr - +2025-02-05 23:26:20 - ERROR - stderr - +2025-02-05 23:26:20 - INFO - stdout - {'loss': 0.6231, 'grad_norm': 1.0195436477661133, 'learning_rate': 6.26672887063196e-06, 'epoch': 1.9} +2025-02-05 23:26:20 - ERROR - stderr - 63%|██████▎ | 14204/22434 [13:18:40<5:52:29, 2.57s/it] +2025-02-05 23:26:23 - ERROR - stderr - 63%|██████▎ | 14205/22434 [13:18:43<5:50:21, 2.55s/it] +2025-02-05 23:26:23 - ERROR - stderr - +2025-02-05 23:26:23 - ERROR - stderr - +2025-02-05 23:26:23 - INFO - stdout - {'loss': 0.6707, 'grad_norm': 1.3629164695739746, 'learning_rate': 6.265389546131209e-06, 'epoch': 1.9} +2025-02-05 23:26:23 - ERROR - stderr - 63%|██████▎ | 14205/22434 [13:18:43<5:50:21, 2.55s/it] +2025-02-05 23:26:25 - ERROR - stderr - 63%|██████▎ | 14206/22434 [13:18:45<5:49:20, 2.55s/it] +2025-02-05 23:26:25 - ERROR - stderr - +2025-02-05 23:26:25 - ERROR - stderr - +2025-02-05 23:26:25 - INFO - stdout - {'loss': 0.6592, 'grad_norm': 1.2184032201766968, 'learning_rate': 6.2640502994748375e-06, 'epoch': 1.9} +2025-02-05 23:26:25 - ERROR - stderr - 63%|██████▎ | 14206/22434 [13:18:45<5:49:20, 2.55s/it] +2025-02-05 23:26:28 - ERROR - stderr - 63%|██████▎ | 14207/22434 [13:18:48<5:50:34, 2.56s/it] +2025-02-05 23:26:28 - ERROR - stderr - +2025-02-05 23:26:28 - ERROR - stderr - +2025-02-05 23:26:28 - INFO - stdout - 
{'loss': 0.6272, 'grad_norm': 1.2271900177001953, 'learning_rate': 6.262711130690762e-06, 'epoch': 1.9} +2025-02-05 23:26:28 - ERROR - stderr - 63%|██████▎ | 14207/22434 [13:18:48<5:50:34, 2.56s/it] +2025-02-05 23:26:31 - ERROR - stderr - 63%|██████▎ | 14208/22434 [13:18:50<5:49:14, 2.55s/it] +2025-02-05 23:26:31 - ERROR - stderr - +2025-02-05 23:26:31 - ERROR - stderr - +2025-02-05 23:26:31 - INFO - stdout - {'loss': 0.635, 'grad_norm': 1.1895248889923096, 'learning_rate': 6.261372039806899e-06, 'epoch': 1.9} +2025-02-05 23:26:31 - ERROR - stderr - 63%|██████▎ | 14208/22434 [13:18:50<5:49:14, 2.55s/it] +2025-02-05 23:26:33 - ERROR - stderr - 63%|██████▎ | 14209/22434 [13:18:53<5:50:54, 2.56s/it] +2025-02-05 23:26:33 - ERROR - stderr - +2025-02-05 23:26:33 - ERROR - stderr - +2025-02-05 23:26:33 - INFO - stdout - {'loss': 0.7167, 'grad_norm': 1.2674988508224487, 'learning_rate': 6.260033026851156e-06, 'epoch': 1.9} +2025-02-05 23:26:33 - ERROR - stderr - 63%|██████▎ | 14209/22434 [13:18:53<5:50:54, 2.56s/it] +2025-02-05 23:26:36 - ERROR - stderr - 63%|██████▎ | 14210/22434 [13:18:55<5:51:13, 2.56s/it] +2025-02-05 23:26:36 - ERROR - stderr - +2025-02-05 23:26:36 - ERROR - stderr - +2025-02-05 23:26:36 - INFO - stdout - {'loss': 0.6393, 'grad_norm': 1.310165286064148, 'learning_rate': 6.2586940918514474e-06, 'epoch': 1.9} +2025-02-05 23:26:36 - ERROR - stderr - 63%|██████▎ | 14210/22434 [13:18:56<5:51:13, 2.56s/it] +2025-02-05 23:26:38 - ERROR - stderr - 63%|██████▎ | 14211/22434 [13:18:58<5:48:33, 2.54s/it] +2025-02-05 23:26:38 - ERROR - stderr - +2025-02-05 23:26:38 - ERROR - stderr - +2025-02-05 23:26:38 - INFO - stdout - {'loss': 0.6647, 'grad_norm': 1.2059580087661743, 'learning_rate': 6.257355234835682e-06, 'epoch': 1.9} +2025-02-05 23:26:38 - ERROR - stderr - 63%|██████▎ | 14211/22434 [13:18:58<5:48:33, 2.54s/it] +2025-02-05 23:26:41 - ERROR - stderr - 63%|██████▎ | 14212/22434 [13:19:00<5:45:19, 2.52s/it] +2025-02-05 23:26:41 - ERROR - stderr - +2025-02-05 
23:26:41 - ERROR - stderr - +2025-02-05 23:26:41 - INFO - stdout - {'loss': 0.5878, 'grad_norm': 1.16941499710083, 'learning_rate': 6.256016455831762e-06, 'epoch': 1.9} +2025-02-05 23:26:41 - ERROR - stderr - 63%|██████▎ | 14212/22434 [13:19:00<5:45:19, 2.52s/it] +2025-02-05 23:26:43 - ERROR - stderr - 63%|██████▎ | 14213/22434 [13:19:03<5:42:40, 2.50s/it] +2025-02-05 23:26:43 - ERROR - stderr - +2025-02-05 23:26:43 - ERROR - stderr - +2025-02-05 23:26:43 - INFO - stdout - {'loss': 0.6431, 'grad_norm': 1.2030833959579468, 'learning_rate': 6.254677754867596e-06, 'epoch': 1.9} +2025-02-05 23:26:43 - ERROR - stderr - 63%|██████▎ | 14213/22434 [13:19:03<5:42:40, 2.50s/it] +2025-02-05 23:26:46 - ERROR - stderr - 63%|██████▎ | 14214/22434 [13:19:05<5:46:22, 2.53s/it] +2025-02-05 23:26:46 - ERROR - stderr - +2025-02-05 23:26:46 - ERROR - stderr - +2025-02-05 23:26:46 - INFO - stdout - {'loss': 0.7215, 'grad_norm': 1.2448970079421997, 'learning_rate': 6.2533391319710924e-06, 'epoch': 1.9} +2025-02-05 23:26:46 - ERROR - stderr - 63%|██████▎ | 14214/22434 [13:19:06<5:46:22, 2.53s/it] +2025-02-05 23:26:48 - ERROR - stderr - 63%|██████▎ | 14215/22434 [13:19:08<5:50:46, 2.56s/it] +2025-02-05 23:26:48 - ERROR - stderr - +2025-02-05 23:26:48 - ERROR - stderr - +2025-02-05 23:26:48 - INFO - stdout - {'loss': 0.7123, 'grad_norm': 1.2692575454711914, 'learning_rate': 6.252000587170145e-06, 'epoch': 1.9} +2025-02-05 23:26:48 - ERROR - stderr - 63%|██████▎ | 14215/22434 [13:19:08<5:50:46, 2.56s/it] +2025-02-05 23:26:51 - ERROR - stderr - 63%|██████▎ | 14216/22434 [13:19:11<5:51:32, 2.57s/it] +2025-02-05 23:26:51 - ERROR - stderr - +2025-02-05 23:26:51 - ERROR - stderr - +2025-02-05 23:26:51 - INFO - stdout - {'loss': 0.6922, 'grad_norm': 1.3642958402633667, 'learning_rate': 6.250662120492663e-06, 'epoch': 1.9} +2025-02-05 23:26:51 - ERROR - stderr - 63%|██████▎ | 14216/22434 [13:19:11<5:51:32, 2.57s/it] +2025-02-05 23:26:53 - ERROR - stderr - 63%|██████▎ | 14217/22434 
[13:19:13<5:49:58, 2.56s/it] +2025-02-05 23:26:54 - ERROR - stderr - +2025-02-05 23:26:54 - ERROR - stderr - +2025-02-05 23:26:54 - INFO - stdout - {'loss': 0.6849, 'grad_norm': 1.3640735149383545, 'learning_rate': 6.249323731966537e-06, 'epoch': 1.9} +2025-02-05 23:26:54 - ERROR - stderr - 63%|██████▎ | 14217/22434 [13:19:13<5:49:58, 2.56s/it] +2025-02-05 23:26:56 - ERROR - stderr - 63%|██████▎ | 14218/22434 [13:19:16<5:49:47, 2.55s/it] +2025-02-05 23:26:56 - ERROR - stderr - +2025-02-05 23:26:56 - ERROR - stderr - +2025-02-05 23:26:56 - INFO - stdout - {'loss': 0.7258, 'grad_norm': 1.4249627590179443, 'learning_rate': 6.247985421619674e-06, 'epoch': 1.9} +2025-02-05 23:26:56 - ERROR - stderr - 63%|██████▎ | 14218/22434 [13:19:16<5:49:47, 2.55s/it] +2025-02-05 23:26:59 - ERROR - stderr - 63%|██████▎ | 14219/22434 [13:19:18<5:48:24, 2.54s/it] +2025-02-05 23:26:59 - ERROR - stderr - +2025-02-05 23:26:59 - ERROR - stderr - +2025-02-05 23:26:59 - INFO - stdout - {'loss': 0.6548, 'grad_norm': 1.101607084274292, 'learning_rate': 6.24664718947996e-06, 'epoch': 1.9} +2025-02-05 23:26:59 - ERROR - stderr - 63%|██████▎ | 14219/22434 [13:19:18<5:48:24, 2.54s/it] +2025-02-05 23:27:01 - ERROR - stderr - 63%|██████▎ | 14220/22434 [13:19:21<5:46:23, 2.53s/it] +2025-02-05 23:27:01 - ERROR - stderr - +2025-02-05 23:27:01 - ERROR - stderr - +2025-02-05 23:27:01 - INFO - stdout - {'loss': 0.7144, 'grad_norm': 1.3178461790084839, 'learning_rate': 6.2453090355752955e-06, 'epoch': 1.9} +2025-02-05 23:27:01 - ERROR - stderr - 63%|██████▎ | 14220/22434 [13:19:21<5:46:23, 2.53s/it] +2025-02-05 23:27:04 - ERROR - stderr - 63%|██████▎ | 14221/22434 [13:19:23<5:45:38, 2.53s/it] +2025-02-05 23:27:04 - ERROR - stderr - +2025-02-05 23:27:04 - ERROR - stderr - +2025-02-05 23:27:04 - INFO - stdout - {'loss': 0.7134, 'grad_norm': 1.403782606124878, 'learning_rate': 6.243970959933572e-06, 'epoch': 1.9} +2025-02-05 23:27:04 - ERROR - stderr - 63%|██████▎ | 14221/22434 [13:19:23<5:45:38, 2.53s/it] 
+2025-02-05 23:27:06 - ERROR - stderr - 63%|██████▎ | 14222/22434 [13:19:26<5:43:36, 2.51s/it] +2025-02-05 23:27:06 - ERROR - stderr - +2025-02-05 23:27:06 - ERROR - stderr - +2025-02-05 23:27:06 - INFO - stdout - {'loss': 0.7193, 'grad_norm': 1.1963376998901367, 'learning_rate': 6.24263296258268e-06, 'epoch': 1.9} +2025-02-05 23:27:06 - ERROR - stderr - 63%|██████▎ | 14222/22434 [13:19:26<5:43:36, 2.51s/it] +2025-02-05 23:27:09 - ERROR - stderr - 63%|██████▎ | 14223/22434 [13:19:28<5:51:50, 2.57s/it] +2025-02-05 23:27:09 - ERROR - stderr - +2025-02-05 23:27:09 - ERROR - stderr - +2025-02-05 23:27:09 - INFO - stdout - {'loss': 0.6593, 'grad_norm': 1.184970736503601, 'learning_rate': 6.241295043550506e-06, 'epoch': 1.9} +2025-02-05 23:27:09 - ERROR - stderr - 63%|██████▎ | 14223/22434 [13:19:29<5:51:50, 2.57s/it] +2025-02-05 23:27:11 - ERROR - stderr - 63%|██████▎ | 14224/22434 [13:19:31<5:48:58, 2.55s/it] +2025-02-05 23:27:11 - ERROR - stderr - +2025-02-05 23:27:11 - ERROR - stderr - +2025-02-05 23:27:11 - INFO - stdout - {'loss': 0.5852, 'grad_norm': 1.2098369598388672, 'learning_rate': 6.239957202864943e-06, 'epoch': 1.9} +2025-02-05 23:27:11 - ERROR - stderr - 63%|██████▎ | 14224/22434 [13:19:31<5:48:58, 2.55s/it] +2025-02-05 23:27:14 - ERROR - stderr - 63%|██████▎ | 14225/22434 [13:19:33<5:44:36, 2.52s/it] +2025-02-05 23:27:14 - ERROR - stderr - +2025-02-05 23:27:14 - ERROR - stderr - +2025-02-05 23:27:14 - INFO - stdout - {'loss': 0.6346, 'grad_norm': 1.4391251802444458, 'learning_rate': 6.23861944055387e-06, 'epoch': 1.9} +2025-02-05 23:27:14 - ERROR - stderr - 63%|██████▎ | 14225/22434 [13:19:33<5:44:36, 2.52s/it] +2025-02-05 23:27:16 - ERROR - stderr - 63%|██████▎ | 14226/22434 [13:19:36<5:42:48, 2.51s/it] +2025-02-05 23:27:16 - ERROR - stderr - +2025-02-05 23:27:16 - ERROR - stderr - +2025-02-05 23:27:16 - INFO - stdout - {'loss': 0.7033, 'grad_norm': 1.334114909172058, 'learning_rate': 6.237281756645178e-06, 'epoch': 1.9} +2025-02-05 23:27:16 - ERROR - 
stderr - 63%|██████▎ | 14226/22434 [13:19:36<5:42:48, 2.51s/it] +2025-02-05 23:27:19 - ERROR - stderr - 63%|██████▎ | 14227/22434 [13:19:39<5:47:12, 2.54s/it] +2025-02-05 23:27:19 - ERROR - stderr - +2025-02-05 23:27:19 - ERROR - stderr - +2025-02-05 23:27:19 - INFO - stdout - {'loss': 0.6968, 'grad_norm': 1.2246087789535522, 'learning_rate': 6.23594415116675e-06, 'epoch': 1.9} +2025-02-05 23:27:19 - ERROR - stderr - 63%|██████▎ | 14227/22434 [13:19:39<5:47:12, 2.54s/it] +2025-02-05 23:27:22 - ERROR - stderr - 63%|██████▎ | 14228/22434 [13:19:41<6:00:39, 2.64s/it] +2025-02-05 23:27:22 - ERROR - stderr - +2025-02-05 23:27:22 - ERROR - stderr - +2025-02-05 23:27:22 - INFO - stdout - {'loss': 0.6795, 'grad_norm': 1.216705083847046, 'learning_rate': 6.2346066241464595e-06, 'epoch': 1.9} +2025-02-05 23:27:22 - ERROR - stderr - 63%|██████▎ | 14228/22434 [13:19:41<6:00:39, 2.64s/it] +2025-02-05 23:27:24 - ERROR - stderr - 63%|██████▎ | 14229/22434 [13:19:44<6:01:27, 2.64s/it] +2025-02-05 23:27:24 - ERROR - stderr - +2025-02-05 23:27:24 - ERROR - stderr - +2025-02-05 23:27:24 - INFO - stdout - {'loss': 0.7201, 'grad_norm': 1.2645127773284912, 'learning_rate': 6.233269175612195e-06, 'epoch': 1.9} +2025-02-05 23:27:24 - ERROR - stderr - 63%|██████▎ | 14229/22434 [13:19:44<6:01:27, 2.64s/it] +2025-02-05 23:27:27 - ERROR - stderr - 63%|██████▎ | 14230/22434 [13:19:47<5:58:00, 2.62s/it] +2025-02-05 23:27:27 - ERROR - stderr - +2025-02-05 23:27:27 - ERROR - stderr - +2025-02-05 23:27:27 - INFO - stdout - {'loss': 0.6423, 'grad_norm': 1.363527774810791, 'learning_rate': 6.23193180559183e-06, 'epoch': 1.9} +2025-02-05 23:27:27 - ERROR - stderr - 63%|██████▎ | 14230/22434 [13:19:47<5:58:00, 2.62s/it] +2025-02-05 23:27:29 - ERROR - stderr - 63%|██████▎ | 14231/22434 [13:19:49<5:57:05, 2.61s/it] +2025-02-05 23:27:29 - ERROR - stderr - +2025-02-05 23:27:29 - ERROR - stderr - +2025-02-05 23:27:29 - INFO - stdout - {'loss': 0.6162, 'grad_norm': 1.2733242511749268, 'learning_rate': 
6.230594514113238e-06, 'epoch': 1.9} +2025-02-05 23:27:29 - ERROR - stderr - 63%|██████▎ | 14231/22434 [13:19:49<5:57:05, 2.61s/it] +2025-02-05 23:27:32 - ERROR - stderr - 63%|██████▎ | 14232/22434 [13:19:52<5:57:08, 2.61s/it] +2025-02-05 23:27:32 - ERROR - stderr - +2025-02-05 23:27:32 - ERROR - stderr - +2025-02-05 23:27:32 - INFO - stdout - {'loss': 0.6803, 'grad_norm': 1.2674627304077148, 'learning_rate': 6.2292573012042965e-06, 'epoch': 1.9} +2025-02-05 23:27:32 - ERROR - stderr - 63%|██████▎ | 14232/22434 [13:19:52<5:57:08, 2.61s/it] +2025-02-05 23:27:34 - ERROR - stderr - 63%|██████▎ | 14233/22434 [13:19:54<5:48:29, 2.55s/it] +2025-02-05 23:27:35 - ERROR - stderr - +2025-02-05 23:27:35 - ERROR - stderr - +2025-02-05 23:27:35 - INFO - stdout - {'loss': 0.6471, 'grad_norm': 1.3361784219741821, 'learning_rate': 6.22792016689288e-06, 'epoch': 1.9} +2025-02-05 23:27:35 - ERROR - stderr - 63%|██████▎ | 14233/22434 [13:19:54<5:48:29, 2.55s/it] +2025-02-05 23:27:37 - ERROR - stderr - 63%|██████▎ | 14234/22434 [13:19:57<5:57:45, 2.62s/it] +2025-02-05 23:27:37 - ERROR - stderr - +2025-02-05 23:27:37 - ERROR - stderr - +2025-02-05 23:27:37 - INFO - stdout - {'loss': 0.6775, 'grad_norm': 1.2567089796066284, 'learning_rate': 6.2265831112068565e-06, 'epoch': 1.9} +2025-02-05 23:27:37 - ERROR - stderr - 63%|██████▎ | 14234/22434 [13:19:57<5:57:45, 2.62s/it] +2025-02-05 23:27:40 - ERROR - stderr - 63%|██████▎ | 14235/22434 [13:19:59<5:50:49, 2.57s/it] +2025-02-05 23:27:40 - ERROR - stderr - +2025-02-05 23:27:40 - ERROR - stderr - +2025-02-05 23:27:40 - INFO - stdout - {'loss': 0.6346, 'grad_norm': 1.3277584314346313, 'learning_rate': 6.225246134174101e-06, 'epoch': 1.9} +2025-02-05 23:27:40 - ERROR - stderr - 63%|██████▎ | 14235/22434 [13:20:00<5:50:49, 2.57s/it] +2025-02-05 23:27:42 - ERROR - stderr - 63%|██████▎ | 14236/22434 [13:20:02<5:52:42, 2.58s/it] +2025-02-05 23:27:42 - ERROR - stderr - +2025-02-05 23:27:42 - ERROR - stderr - +2025-02-05 23:27:42 - INFO - stdout - 
{'loss': 0.6258, 'grad_norm': 1.1782984733581543, 'learning_rate': 6.223909235822472e-06, 'epoch': 1.9} +2025-02-05 23:27:42 - ERROR - stderr - 63%|██████▎ | 14236/22434 [13:20:02<5:52:42, 2.58s/it] +2025-02-05 23:27:45 - ERROR - stderr - 63%|██████▎ | 14237/22434 [13:20:05<5:53:00, 2.58s/it] +2025-02-05 23:27:45 - ERROR - stderr - +2025-02-05 23:27:45 - ERROR - stderr - +2025-02-05 23:27:45 - INFO - stdout - {'loss': 0.6008, 'grad_norm': 1.2829620838165283, 'learning_rate': 6.222572416179847e-06, 'epoch': 1.9} +2025-02-05 23:27:45 - ERROR - stderr - 63%|██████▎ | 14237/22434 [13:20:05<5:53:00, 2.58s/it] +2025-02-05 23:27:47 - ERROR - stderr - 63%|██████▎ | 14238/22434 [13:20:07<5:49:35, 2.56s/it] +2025-02-05 23:27:47 - ERROR - stderr - +2025-02-05 23:27:47 - ERROR - stderr - +2025-02-05 23:27:47 - INFO - stdout - {'loss': 0.7016, 'grad_norm': 1.2130955457687378, 'learning_rate': 6.2212356752740835e-06, 'epoch': 1.9} +2025-02-05 23:27:47 - ERROR - stderr - 63%|██████▎ | 14238/22434 [13:20:07<5:49:35, 2.56s/it] +2025-02-05 23:27:50 - ERROR - stderr - 63%|██████▎ | 14239/22434 [13:20:10<5:50:14, 2.56s/it] +2025-02-05 23:27:50 - ERROR - stderr - +2025-02-05 23:27:50 - ERROR - stderr - +2025-02-05 23:27:50 - INFO - stdout - {'loss': 0.633, 'grad_norm': 1.3055992126464844, 'learning_rate': 6.219899013133046e-06, 'epoch': 1.9} +2025-02-05 23:27:50 - ERROR - stderr - 63%|██████▎ | 14239/22434 [13:20:10<5:50:14, 2.56s/it] +2025-02-05 23:27:52 - ERROR - stderr - 63%|██████▎ | 14240/22434 [13:20:12<5:48:09, 2.55s/it] +2025-02-05 23:27:53 - ERROR - stderr - +2025-02-05 23:27:53 - ERROR - stderr - +2025-02-05 23:27:53 - INFO - stdout - {'loss': 0.7483, 'grad_norm': 1.3640965223312378, 'learning_rate': 6.218562429784596e-06, 'epoch': 1.9} +2025-02-05 23:27:53 - ERROR - stderr - 63%|██████▎ | 14240/22434 [13:20:12<5:48:09, 2.55s/it] +2025-02-05 23:27:55 - ERROR - stderr - 63%|██████▎ | 14241/22434 [13:20:15<5:49:06, 2.56s/it] +2025-02-05 23:27:55 - ERROR - stderr - +2025-02-05 
23:27:55 - ERROR - stderr - +2025-02-05 23:27:55 - INFO - stdout - {'loss': 0.7042, 'grad_norm': 1.367092490196228, 'learning_rate': 6.217225925256593e-06, 'epoch': 1.9} +2025-02-05 23:27:55 - ERROR - stderr - 63%|██████▎ | 14241/22434 [13:20:15<5:49:06, 2.56s/it] +2025-02-05 23:27:57 - ERROR - stderr - 63%|██████▎ | 14242/22434 [13:20:17<5:44:17, 2.52s/it] +2025-02-05 23:27:58 - ERROR - stderr - +2025-02-05 23:27:58 - ERROR - stderr - +2025-02-05 23:27:58 - INFO - stdout - {'loss': 0.6742, 'grad_norm': 1.2731029987335205, 'learning_rate': 6.215889499576898e-06, 'epoch': 1.9} +2025-02-05 23:27:58 - ERROR - stderr - 63%|██████▎ | 14242/22434 [13:20:17<5:44:17, 2.52s/it] +2025-02-05 23:28:00 - ERROR - stderr - 63%|██████▎ | 14243/22434 [13:20:20<5:44:07, 2.52s/it] +2025-02-05 23:28:00 - ERROR - stderr - +2025-02-05 23:28:00 - ERROR - stderr - +2025-02-05 23:28:00 - INFO - stdout - {'loss': 0.6201, 'grad_norm': 1.4902068376541138, 'learning_rate': 6.214553152773366e-06, 'epoch': 1.9} +2025-02-05 23:28:00 - ERROR - stderr - 63%|██████▎ | 14243/22434 [13:20:20<5:44:07, 2.52s/it] +2025-02-05 23:28:03 - ERROR - stderr - 63%|██████▎ | 14244/22434 [13:20:22<5:47:14, 2.54s/it] +2025-02-05 23:28:03 - ERROR - stderr - +2025-02-05 23:28:03 - ERROR - stderr - +2025-02-05 23:28:03 - INFO - stdout - {'loss': 0.6913, 'grad_norm': 1.3631356954574585, 'learning_rate': 6.213216884873848e-06, 'epoch': 1.9} +2025-02-05 23:28:03 - ERROR - stderr - 63%|██████▎ | 14244/22434 [13:20:22<5:47:14, 2.54s/it] +2025-02-05 23:28:05 - ERROR - stderr - 63%|██████▎ | 14245/22434 [13:20:25<5:47:11, 2.54s/it] +2025-02-05 23:28:05 - ERROR - stderr - +2025-02-05 23:28:05 - ERROR - stderr - +2025-02-05 23:28:05 - INFO - stdout - {'loss': 0.6877, 'grad_norm': 1.3377296924591064, 'learning_rate': 6.211880695906203e-06, 'epoch': 1.9} +2025-02-05 23:28:05 - ERROR - stderr - 63%|██████▎ | 14245/22434 [13:20:25<5:47:11, 2.54s/it] +2025-02-05 23:28:08 - ERROR - stderr - 64%|██████▎ | 14246/22434 
[13:20:27<5:45:16, 2.53s/it] +2025-02-05 23:28:08 - ERROR - stderr - +2025-02-05 23:28:08 - ERROR - stderr - +2025-02-05 23:28:08 - INFO - stdout - {'loss': 0.6708, 'grad_norm': 1.255954623222351, 'learning_rate': 6.2105445858982805e-06, 'epoch': 1.91} +2025-02-05 23:28:08 - ERROR - stderr - 64%|██████▎ | 14246/22434 [13:20:27<5:45:16, 2.53s/it] +2025-02-05 23:28:10 - ERROR - stderr - 64%|██████▎ | 14247/22434 [13:20:30<5:46:32, 2.54s/it] +2025-02-05 23:28:10 - ERROR - stderr - +2025-02-05 23:28:10 - ERROR - stderr - +2025-02-05 23:28:10 - INFO - stdout - {'loss': 0.6654, 'grad_norm': 1.1267220973968506, 'learning_rate': 6.209208554877927e-06, 'epoch': 1.91} +2025-02-05 23:28:10 - ERROR - stderr - 64%|██████▎ | 14247/22434 [13:20:30<5:46:32, 2.54s/it] +2025-02-05 23:28:13 - ERROR - stderr - 64%|██████▎ | 14248/22434 [13:20:33<5:47:51, 2.55s/it] +2025-02-05 23:28:13 - ERROR - stderr - +2025-02-05 23:28:13 - ERROR - stderr - +2025-02-05 23:28:13 - INFO - stdout - {'loss': 0.6396, 'grad_norm': 1.1582626104354858, 'learning_rate': 6.207872602872998e-06, 'epoch': 1.91} +2025-02-05 23:28:13 - ERROR - stderr - 64%|██████▎ | 14248/22434 [13:20:33<5:47:51, 2.55s/it] +2025-02-05 23:28:15 - ERROR - stderr - 64%|██████▎ | 14249/22434 [13:20:35<5:42:12, 2.51s/it] +2025-02-05 23:28:15 - ERROR - stderr - +2025-02-05 23:28:15 - ERROR - stderr - +2025-02-05 23:28:15 - INFO - stdout - {'loss': 0.6549, 'grad_norm': 1.210599422454834, 'learning_rate': 6.20653672991133e-06, 'epoch': 1.91} +2025-02-05 23:28:15 - ERROR - stderr - 64%|██████▎ | 14249/22434 [13:20:35<5:42:12, 2.51s/it] +2025-02-05 23:28:18 - ERROR - stderr - 64%|██████▎ | 14250/22434 [13:20:38<5:43:16, 2.52s/it] +2025-02-05 23:28:18 - ERROR - stderr - +2025-02-05 23:28:18 - ERROR - stderr - +2025-02-05 23:28:18 - INFO - stdout - {'loss': 0.5714, 'grad_norm': 1.1410456895828247, 'learning_rate': 6.20520093602078e-06, 'epoch': 1.91} +2025-02-05 23:28:18 - ERROR - stderr - 64%|██████▎ | 14250/22434 [13:20:38<5:43:16, 
2.52s/it] +2025-02-05 23:28:20 - ERROR - stderr - 64%|██████▎ | 14251/22434 [13:20:40<5:45:34, 2.53s/it] +2025-02-05 23:28:20 - ERROR - stderr - +2025-02-05 23:28:20 - ERROR - stderr - +2025-02-05 23:28:20 - INFO - stdout - {'loss': 0.5963, 'grad_norm': 1.0925973653793335, 'learning_rate': 6.203865221229182e-06, 'epoch': 1.91} +2025-02-05 23:28:20 - ERROR - stderr - 64%|██████▎ | 14251/22434 [13:20:40<5:45:34, 2.53s/it] +2025-02-05 23:28:23 - ERROR - stderr - 64%|██████▎ | 14252/22434 [13:20:43<5:44:51, 2.53s/it] +2025-02-05 23:28:23 - ERROR - stderr - +2025-02-05 23:28:23 - ERROR - stderr - +2025-02-05 23:28:23 - INFO - stdout - {'loss': 0.7082, 'grad_norm': 1.3466782569885254, 'learning_rate': 6.202529585564382e-06, 'epoch': 1.91} +2025-02-05 23:28:23 - ERROR - stderr - 64%|██████▎ | 14252/22434 [13:20:43<5:44:51, 2.53s/it] +2025-02-05 23:28:25 - ERROR - stderr - 64%|██████▎ | 14253/22434 [13:20:45<5:47:17, 2.55s/it] +2025-02-05 23:28:25 - ERROR - stderr - +2025-02-05 23:28:25 - ERROR - stderr - +2025-02-05 23:28:25 - INFO - stdout - {'loss': 0.7011, 'grad_norm': 1.3894317150115967, 'learning_rate': 6.201194029054218e-06, 'epoch': 1.91} +2025-02-05 23:28:25 - ERROR - stderr - 64%|██████▎ | 14253/22434 [13:20:45<5:47:17, 2.55s/it] +2025-02-05 23:28:28 - ERROR - stderr - 64%|██████▎ | 14254/22434 [13:20:48<5:47:24, 2.55s/it] +2025-02-05 23:28:28 - ERROR - stderr - +2025-02-05 23:28:28 - ERROR - stderr - +2025-02-05 23:28:28 - INFO - stdout - {'loss': 0.6732, 'grad_norm': 1.3058750629425049, 'learning_rate': 6.199858551726532e-06, 'epoch': 1.91} +2025-02-05 23:28:28 - ERROR - stderr - 64%|██████▎ | 14254/22434 [13:20:48<5:47:24, 2.55s/it] +2025-02-05 23:28:30 - ERROR - stderr - 64%|██████▎ | 14255/22434 [13:20:50<5:45:59, 2.54s/it] +2025-02-05 23:28:31 - ERROR - stderr - +2025-02-05 23:28:31 - ERROR - stderr - +2025-02-05 23:28:31 - INFO - stdout - {'loss': 0.7017, 'grad_norm': 1.3069320917129517, 'learning_rate': 6.1985231536091535e-06, 'epoch': 1.91} +2025-02-05 
23:28:31 - ERROR - stderr - 64%|██████▎ | 14255/22434 [13:20:50<5:45:59, 2.54s/it] +2025-02-05 23:28:33 - ERROR - stderr - 64%|██████▎ | 14256/22434 [13:20:53<5:45:05, 2.53s/it] +2025-02-05 23:28:33 - ERROR - stderr - +2025-02-05 23:28:33 - ERROR - stderr - +2025-02-05 23:28:33 - INFO - stdout - {'loss': 0.6798, 'grad_norm': 1.2842769622802734, 'learning_rate': 6.1971878347299275e-06, 'epoch': 1.91} +2025-02-05 23:28:33 - ERROR - stderr - 64%|██████▎ | 14256/22434 [13:20:53<5:45:05, 2.53s/it] +2025-02-05 23:28:36 - ERROR - stderr - 64%|██████▎ | 14257/22434 [13:20:55<5:50:18, 2.57s/it] +2025-02-05 23:28:36 - ERROR - stderr - +2025-02-05 23:28:36 - ERROR - stderr - +2025-02-05 23:28:36 - INFO - stdout - {'loss': 0.6991, 'grad_norm': 1.3049827814102173, 'learning_rate': 6.195852595116678e-06, 'epoch': 1.91} +2025-02-05 23:28:36 - ERROR - stderr - 64%|██████▎ | 14257/22434 [13:20:55<5:50:18, 2.57s/it] +2025-02-05 23:28:38 - ERROR - stderr - 64%|██████▎ | 14258/22434 [13:20:58<5:46:50, 2.55s/it] +2025-02-05 23:28:38 - ERROR - stderr - +2025-02-05 23:28:38 - ERROR - stderr - +2025-02-05 23:28:38 - INFO - stdout - {'loss': 0.6637, 'grad_norm': 1.1584053039550781, 'learning_rate': 6.194517434797243e-06, 'epoch': 1.91} +2025-02-05 23:28:38 - ERROR - stderr - 64%|██████▎ | 14258/22434 [13:20:58<5:46:50, 2.55s/it] +2025-02-05 23:28:41 - ERROR - stderr - 64%|██████▎ | 14259/22434 [13:21:00<5:44:14, 2.53s/it] +2025-02-05 23:28:41 - ERROR - stderr - +2025-02-05 23:28:41 - ERROR - stderr - +2025-02-05 23:28:41 - INFO - stdout - {'loss': 0.7048, 'grad_norm': 1.3488049507141113, 'learning_rate': 6.193182353799451e-06, 'epoch': 1.91} +2025-02-05 23:28:41 - ERROR - stderr - 64%|██████▎ | 14259/22434 [13:21:00<5:44:14, 2.53s/it] +2025-02-05 23:28:43 - ERROR - stderr - 64%|██████▎ | 14260/22434 [13:21:03<5:40:32, 2.50s/it] +2025-02-05 23:28:43 - ERROR - stderr - +2025-02-05 23:28:43 - ERROR - stderr - +2025-02-05 23:28:43 - INFO - stdout - {'loss': 0.6652, 'grad_norm': 
1.160932183265686, 'learning_rate': 6.191847352151127e-06, 'epoch': 1.91} +2025-02-05 23:28:43 - ERROR - stderr - 64%|██████▎ | 14260/22434 [13:21:03<5:40:32, 2.50s/it] +2025-02-05 23:28:46 - ERROR - stderr - 64%|██████▎ | 14261/22434 [13:21:05<5:42:51, 2.52s/it] +2025-02-05 23:28:46 - ERROR - stderr - +2025-02-05 23:28:46 - ERROR - stderr - +2025-02-05 23:28:46 - INFO - stdout - {'loss': 0.641, 'grad_norm': 1.2118254899978638, 'learning_rate': 6.190512429880105e-06, 'epoch': 1.91} +2025-02-05 23:28:46 - ERROR - stderr - 64%|██████▎ | 14261/22434 [13:21:05<5:42:51, 2.52s/it] +2025-02-05 23:28:48 - ERROR - stderr - 64%|██████▎ | 14262/22434 [13:21:08<5:39:15, 2.49s/it] +2025-02-05 23:28:48 - ERROR - stderr - +2025-02-05 23:28:48 - ERROR - stderr - +2025-02-05 23:28:48 - INFO - stdout - {'loss': 0.6369, 'grad_norm': 1.290051817893982, 'learning_rate': 6.189177587014206e-06, 'epoch': 1.91} +2025-02-05 23:28:48 - ERROR - stderr - 64%|██████▎ | 14262/22434 [13:21:08<5:39:15, 2.49s/it] +2025-02-05 23:28:51 - ERROR - stderr - 64%|██████▎ | 14263/22434 [13:21:11<5:53:40, 2.60s/it] +2025-02-05 23:28:51 - ERROR - stderr - +2025-02-05 23:28:51 - ERROR - stderr - +2025-02-05 23:28:51 - INFO - stdout - {'loss': 0.7822, 'grad_norm': 1.3270457983016968, 'learning_rate': 6.18784282358125e-06, 'epoch': 1.91} +2025-02-05 23:28:51 - ERROR - stderr - 64%|██████▎ | 14263/22434 [13:21:11<5:53:40, 2.60s/it] +2025-02-05 23:28:53 - ERROR - stderr - 64%|██████▎ | 14264/22434 [13:21:13<5:48:34, 2.56s/it] +2025-02-05 23:28:53 - ERROR - stderr - +2025-02-05 23:28:53 - ERROR - stderr - +2025-02-05 23:28:53 - INFO - stdout - {'loss': 0.6411, 'grad_norm': 1.1998343467712402, 'learning_rate': 6.186508139609064e-06, 'epoch': 1.91} +2025-02-05 23:28:53 - ERROR - stderr - 64%|██████▎ | 14264/22434 [13:21:13<5:48:34, 2.56s/it] +2025-02-05 23:28:56 - ERROR - stderr - 64%|██████▎ | 14265/22434 [13:21:16<5:43:56, 2.53s/it] +2025-02-05 23:28:56 - ERROR - stderr - +2025-02-05 23:28:56 - ERROR - stderr - 
+2025-02-05 23:28:56 - INFO - stdout - {'loss': 0.704, 'grad_norm': 1.2813255786895752, 'learning_rate': 6.185173535125468e-06, 'epoch': 1.91} +2025-02-05 23:28:56 - ERROR - stderr - 64%|██████▎ | 14265/22434 [13:21:16<5:43:56, 2.53s/it] +2025-02-05 23:28:58 - ERROR - stderr - 64%|██████▎ | 14266/22434 [13:21:18<5:44:52, 2.53s/it] +2025-02-05 23:28:58 - ERROR - stderr - +2025-02-05 23:28:58 - ERROR - stderr - +2025-02-05 23:28:58 - INFO - stdout - {'loss': 0.6897, 'grad_norm': 1.312684416770935, 'learning_rate': 6.183839010158278e-06, 'epoch': 1.91} +2025-02-05 23:28:58 - ERROR - stderr - 64%|██████▎ | 14266/22434 [13:21:18<5:44:52, 2.53s/it] +2025-02-05 23:29:01 - ERROR - stderr - 64%|██████▎ | 14267/22434 [13:21:21<5:43:02, 2.52s/it] +2025-02-05 23:29:01 - ERROR - stderr - +2025-02-05 23:29:01 - ERROR - stderr - +2025-02-05 23:29:01 - INFO - stdout - {'loss': 0.6371, 'grad_norm': 1.323920726776123, 'learning_rate': 6.182504564735314e-06, 'epoch': 1.91} +2025-02-05 23:29:01 - ERROR - stderr - 64%|██████▎ | 14267/22434 [13:21:21<5:43:02, 2.52s/it] +2025-02-05 23:29:03 - ERROR - stderr - 64%|██████▎ | 14268/22434 [13:21:23<5:41:25, 2.51s/it] +2025-02-05 23:29:03 - ERROR - stderr - +2025-02-05 23:29:03 - ERROR - stderr - +2025-02-05 23:29:03 - INFO - stdout - {'loss': 0.6701, 'grad_norm': 1.3595277070999146, 'learning_rate': 6.181170198884386e-06, 'epoch': 1.91} +2025-02-05 23:29:03 - ERROR - stderr - 64%|██████▎ | 14268/22434 [13:21:23<5:41:25, 2.51s/it] +2025-02-05 23:29:06 - ERROR - stderr - 64%|██████▎ | 14269/22434 [13:21:26<5:38:37, 2.49s/it] +2025-02-05 23:29:06 - ERROR - stderr - +2025-02-05 23:29:06 - ERROR - stderr - +2025-02-05 23:29:06 - INFO - stdout - {'loss': 0.6417, 'grad_norm': 1.1455808877944946, 'learning_rate': 6.179835912633315e-06, 'epoch': 1.91} +2025-02-05 23:29:06 - ERROR - stderr - 64%|██████▎ | 14269/22434 [13:21:26<5:38:37, 2.49s/it] +2025-02-05 23:29:08 - ERROR - stderr - 64%|██████▎ | 14270/22434 [13:21:28<5:45:03, 2.54s/it] +2025-02-05 
23:29:08 - ERROR - stderr - +2025-02-05 23:29:08 - ERROR - stderr - +2025-02-05 23:29:08 - INFO - stdout - {'loss': 0.6784, 'grad_norm': 1.218246579170227, 'learning_rate': 6.178501706009907e-06, 'epoch': 1.91} +2025-02-05 23:29:08 - ERROR - stderr - 64%|██████▎ | 14270/22434 [13:21:28<5:45:03, 2.54s/it] +2025-02-05 23:29:11 - ERROR - stderr - 64%|██████▎ | 14271/22434 [13:21:31<5:44:31, 2.53s/it] +2025-02-05 23:29:11 - ERROR - stderr - +2025-02-05 23:29:11 - ERROR - stderr - +2025-02-05 23:29:11 - INFO - stdout - {'loss': 0.6286, 'grad_norm': 1.2500081062316895, 'learning_rate': 6.177167579041974e-06, 'epoch': 1.91} +2025-02-05 23:29:11 - ERROR - stderr - 64%|██████▎ | 14271/22434 [13:21:31<5:44:31, 2.53s/it] +2025-02-05 23:29:13 - ERROR - stderr - 64%|██████▎ | 14272/22434 [13:21:33<5:44:34, 2.53s/it] +2025-02-05 23:29:14 - ERROR - stderr - +2025-02-05 23:29:14 - ERROR - stderr - +2025-02-05 23:29:14 - INFO - stdout - {'loss': 0.6401, 'grad_norm': 1.306410789489746, 'learning_rate': 6.1758335317573245e-06, 'epoch': 1.91} +2025-02-05 23:29:14 - ERROR - stderr - 64%|██████▎ | 14272/22434 [13:21:33<5:44:34, 2.53s/it] +2025-02-05 23:29:16 - ERROR - stderr - 64%|██████▎ | 14273/22434 [13:21:36<5:44:26, 2.53s/it] +2025-02-05 23:29:16 - ERROR - stderr - +2025-02-05 23:29:16 - ERROR - stderr - +2025-02-05 23:29:16 - INFO - stdout - {'loss': 0.7212, 'grad_norm': 1.4075771570205688, 'learning_rate': 6.174499564183764e-06, 'epoch': 1.91} +2025-02-05 23:29:16 - ERROR - stderr - 64%|██████▎ | 14273/22434 [13:21:36<5:44:26, 2.53s/it] +2025-02-05 23:29:19 - ERROR - stderr - 64%|██████▎ | 14274/22434 [13:21:38<5:43:05, 2.52s/it] +2025-02-05 23:29:19 - ERROR - stderr - +2025-02-05 23:29:19 - ERROR - stderr - +2025-02-05 23:29:19 - INFO - stdout - {'loss': 0.6642, 'grad_norm': 1.341709852218628, 'learning_rate': 6.173165676349103e-06, 'epoch': 1.91} +2025-02-05 23:29:19 - ERROR - stderr - 64%|██████▎ | 14274/22434 [13:21:38<5:43:05, 2.52s/it] +2025-02-05 23:29:21 - ERROR - stderr 
- 64%|██████▎ | 14275/22434 [13:21:41<5:41:59, 2.51s/it] +2025-02-05 23:29:21 - ERROR - stderr - +2025-02-05 23:29:21 - ERROR - stderr - +2025-02-05 23:29:21 - INFO - stdout - {'loss': 0.6927, 'grad_norm': 1.3228802680969238, 'learning_rate': 6.171831868281142e-06, 'epoch': 1.91} +2025-02-05 23:29:21 - ERROR - stderr - 64%|██████▎ | 14275/22434 [13:21:41<5:41:59, 2.51s/it] +2025-02-05 23:29:24 - ERROR - stderr - 64%|██████▎ | 14276/22434 [13:21:43<5:40:37, 2.51s/it] +2025-02-05 23:29:24 - ERROR - stderr - +2025-02-05 23:29:24 - ERROR - stderr - +2025-02-05 23:29:24 - INFO - stdout - {'loss': 0.6216, 'grad_norm': 1.2037936449050903, 'learning_rate': 6.170498140007679e-06, 'epoch': 1.91} +2025-02-05 23:29:24 - ERROR - stderr - 64%|██████▎ | 14276/22434 [13:21:43<5:40:37, 2.51s/it] +2025-02-05 23:29:26 - ERROR - stderr - 64%|██████▎ | 14277/22434 [13:21:46<5:40:27, 2.50s/it] +2025-02-05 23:29:26 - ERROR - stderr - +2025-02-05 23:29:26 - ERROR - stderr - +2025-02-05 23:29:26 - INFO - stdout - {'loss': 0.6834, 'grad_norm': 1.2791647911071777, 'learning_rate': 6.169164491556519e-06, 'epoch': 1.91} +2025-02-05 23:29:26 - ERROR - stderr - 64%|██████▎ | 14277/22434 [13:21:46<5:40:27, 2.50s/it] +2025-02-05 23:29:29 - ERROR - stderr - 64%|██████▎ | 14278/22434 [13:21:48<5:40:45, 2.51s/it] +2025-02-05 23:29:29 - ERROR - stderr - +2025-02-05 23:29:29 - ERROR - stderr - +2025-02-05 23:29:29 - INFO - stdout - {'loss': 0.6874, 'grad_norm': 1.2625590562820435, 'learning_rate': 6.16783092295546e-06, 'epoch': 1.91} +2025-02-05 23:29:29 - ERROR - stderr - 64%|██████▎ | 14278/22434 [13:21:48<5:40:45, 2.51s/it] +2025-02-05 23:29:31 - ERROR - stderr - 64%|██████▎ | 14279/22434 [13:21:51<5:49:40, 2.57s/it] +2025-02-05 23:29:31 - ERROR - stderr - +2025-02-05 23:29:31 - ERROR - stderr - +2025-02-05 23:29:31 - INFO - stdout - {'loss': 0.6162, 'grad_norm': 1.151929497718811, 'learning_rate': 6.1664974342323e-06, 'epoch': 1.91} +2025-02-05 23:29:31 - ERROR - stderr - 64%|██████▎ | 14279/22434 
[13:21:51<5:49:40, 2.57s/it] +2025-02-05 23:29:34 - ERROR - stderr - 64%|██████▎ | 14280/22434 [13:21:53<5:44:30, 2.54s/it] +2025-02-05 23:29:34 - ERROR - stderr - +2025-02-05 23:29:34 - ERROR - stderr - +2025-02-05 23:29:34 - INFO - stdout - {'loss': 0.7386, 'grad_norm': 1.3370460271835327, 'learning_rate': 6.165164025414831e-06, 'epoch': 1.91} +2025-02-05 23:29:34 - ERROR - stderr - 64%|██████▎ | 14280/22434 [13:21:54<5:44:30, 2.54s/it] +2025-02-05 23:29:36 - ERROR - stderr - 64%|██████▎ | 14281/22434 [13:21:56<5:45:03, 2.54s/it] +2025-02-05 23:29:36 - ERROR - stderr - +2025-02-05 23:29:36 - ERROR - stderr - +2025-02-05 23:29:36 - INFO - stdout - {'loss': 0.6399, 'grad_norm': 1.2010351419448853, 'learning_rate': 6.163830696530846e-06, 'epoch': 1.91} +2025-02-05 23:29:36 - ERROR - stderr - 64%|██████▎ | 14281/22434 [13:21:56<5:45:03, 2.54s/it] +2025-02-05 23:29:39 - ERROR - stderr - 64%|██████▎ | 14282/22434 [13:21:59<5:43:38, 2.53s/it] +2025-02-05 23:29:39 - ERROR - stderr - +2025-02-05 23:29:39 - ERROR - stderr - +2025-02-05 23:29:39 - INFO - stdout - {'loss': 0.7026, 'grad_norm': 1.2746011018753052, 'learning_rate': 6.162497447608145e-06, 'epoch': 1.91} +2025-02-05 23:29:39 - ERROR - stderr - 64%|██████▎ | 14282/22434 [13:21:59<5:43:38, 2.53s/it] +2025-02-05 23:29:41 - ERROR - stderr - 64%|██████▎ | 14283/22434 [13:22:01<5:41:17, 2.51s/it] +2025-02-05 23:29:41 - ERROR - stderr - +2025-02-05 23:29:41 - ERROR - stderr - +2025-02-05 23:29:41 - INFO - stdout - {'loss': 0.7485, 'grad_norm': 1.315746784210205, 'learning_rate': 6.161164278674508e-06, 'epoch': 1.91} +2025-02-05 23:29:41 - ERROR - stderr - 64%|██████▎ | 14283/22434 [13:22:01<5:41:17, 2.51s/it] +2025-02-05 23:29:44 - ERROR - stderr - 64%|██████▎ | 14284/22434 [13:22:03<5:39:42, 2.50s/it] +2025-02-05 23:29:44 - ERROR - stderr - +2025-02-05 23:29:44 - ERROR - stderr - +2025-02-05 23:29:44 - INFO - stdout - {'loss': 0.6024, 'grad_norm': 1.157317876815796, 'learning_rate': 6.15983118975773e-06, 'epoch': 
1.91} +2025-02-05 23:29:44 - ERROR - stderr - 64%|██████▎ | 14284/22434 [13:22:04<5:39:42, 2.50s/it] +2025-02-05 23:29:46 - ERROR - stderr - 64%|██████▎ | 14285/22434 [13:22:06<5:35:53, 2.47s/it] +2025-02-05 23:29:46 - ERROR - stderr - +2025-02-05 23:29:46 - ERROR - stderr - +2025-02-05 23:29:46 - INFO - stdout - {'loss': 0.7697, 'grad_norm': 1.5498894453048706, 'learning_rate': 6.158498180885596e-06, 'epoch': 1.91} +2025-02-05 23:29:46 - ERROR - stderr - 64%|██████▎ | 14285/22434 [13:22:06<5:35:53, 2.47s/it] +2025-02-05 23:29:49 - ERROR - stderr - 64%|██████▎ | 14286/22434 [13:22:09<5:48:48, 2.57s/it] +2025-02-05 23:29:49 - ERROR - stderr - +2025-02-05 23:29:49 - ERROR - stderr - +2025-02-05 23:29:49 - INFO - stdout - {'loss': 0.642, 'grad_norm': 1.2785818576812744, 'learning_rate': 6.157165252085888e-06, 'epoch': 1.91} +2025-02-05 23:29:49 - ERROR - stderr - 64%|██████▎ | 14286/22434 [13:22:09<5:48:48, 2.57s/it] +2025-02-05 23:29:51 - ERROR - stderr - 64%|██████▎ | 14287/22434 [13:22:11<5:45:45, 2.55s/it] +2025-02-05 23:29:51 - ERROR - stderr - +2025-02-05 23:29:51 - ERROR - stderr - +2025-02-05 23:29:51 - INFO - stdout - {'loss': 0.6547, 'grad_norm': 1.3187330961227417, 'learning_rate': 6.155832403386399e-06, 'epoch': 1.91} +2025-02-05 23:29:51 - ERROR - stderr - 64%|██████▎ | 14287/22434 [13:22:11<5:45:45, 2.55s/it] +2025-02-05 23:29:54 - ERROR - stderr - 64%|██████▎ | 14288/22434 [13:22:14<5:40:20, 2.51s/it] +2025-02-05 23:29:54 - ERROR - stderr - +2025-02-05 23:29:54 - ERROR - stderr - +2025-02-05 23:29:54 - INFO - stdout - {'loss': 0.7219, 'grad_norm': 1.5496586561203003, 'learning_rate': 6.154499634814905e-06, 'epoch': 1.91} +2025-02-05 23:29:54 - ERROR - stderr - 64%|██████▎ | 14288/22434 [13:22:14<5:40:20, 2.51s/it] +2025-02-05 23:29:56 - ERROR - stderr - 64%|██████▎ | 14289/22434 [13:22:16<5:42:04, 2.52s/it] +2025-02-05 23:29:56 - ERROR - stderr - +2025-02-05 23:29:56 - ERROR - stderr - +2025-02-05 23:29:56 - INFO - stdout - {'loss': 0.6731, 'grad_norm': 
1.2809336185455322, 'learning_rate': 6.153166946399182e-06, 'epoch': 1.91} +2025-02-05 23:29:56 - ERROR - stderr - 64%|██████▎ | 14289/22434 [13:22:16<5:42:04, 2.52s/it] +2025-02-05 23:29:59 - ERROR - stderr - 64%|██████▎ | 14290/22434 [13:22:19<5:40:55, 2.51s/it] +2025-02-05 23:29:59 - ERROR - stderr - +2025-02-05 23:29:59 - ERROR - stderr - +2025-02-05 23:29:59 - INFO - stdout - {'loss': 0.6302, 'grad_norm': 1.2342798709869385, 'learning_rate': 6.151834338167016e-06, 'epoch': 1.91} +2025-02-05 23:29:59 - ERROR - stderr - 64%|██████▎ | 14290/22434 [13:22:19<5:40:55, 2.51s/it] +2025-02-05 23:30:01 - ERROR - stderr - 64%|██████▎ | 14291/22434 [13:22:21<5:39:38, 2.50s/it] +2025-02-05 23:30:01 - ERROR - stderr - +2025-02-05 23:30:01 - ERROR - stderr - +2025-02-05 23:30:01 - INFO - stdout - {'loss': 0.6674, 'grad_norm': 1.3253809213638306, 'learning_rate': 6.15050181014618e-06, 'epoch': 1.91} +2025-02-05 23:30:01 - ERROR - stderr - 64%|██████▎ | 14291/22434 [13:22:21<5:39:38, 2.50s/it] +2025-02-05 23:30:04 - ERROR - stderr - 64%|██████▎ | 14292/22434 [13:22:24<5:40:16, 2.51s/it] +2025-02-05 23:30:04 - ERROR - stderr - +2025-02-05 23:30:04 - ERROR - stderr - +2025-02-05 23:30:04 - INFO - stdout - {'loss': 0.5979, 'grad_norm': 1.1328574419021606, 'learning_rate': 6.149169362364448e-06, 'epoch': 1.91} +2025-02-05 23:30:04 - ERROR - stderr - 64%|██████▎ | 14292/22434 [13:22:24<5:40:16, 2.51s/it] +2025-02-05 23:30:06 - ERROR - stderr - 64%|██████▎ | 14293/22434 [13:22:26<5:41:49, 2.52s/it] +2025-02-05 23:30:06 - ERROR - stderr - +2025-02-05 23:30:06 - ERROR - stderr - +2025-02-05 23:30:06 - INFO - stdout - {'loss': 0.7809, 'grad_norm': 1.3538148403167725, 'learning_rate': 6.1478369948495994e-06, 'epoch': 1.91} +2025-02-05 23:30:06 - ERROR - stderr - 64%|██████▎ | 14293/22434 [13:22:26<5:41:49, 2.52s/it] +2025-02-05 23:30:09 - ERROR - stderr - 64%|██████▎ | 14294/22434 [13:22:29<5:40:26, 2.51s/it] +2025-02-05 23:30:09 - ERROR - stderr - +2025-02-05 23:30:09 - ERROR - stderr 
- +2025-02-05 23:30:09 - INFO - stdout - {'loss': 0.6838, 'grad_norm': 1.2945058345794678, 'learning_rate': 6.1465047076293994e-06, 'epoch': 1.91} +2025-02-05 23:30:09 - ERROR - stderr - 64%|██████▎ | 14294/22434 [13:22:29<5:40:26, 2.51s/it] +2025-02-05 23:30:11 - ERROR - stderr - 64%|██████▎ | 14295/22434 [13:22:31<5:42:46, 2.53s/it] +2025-02-05 23:30:11 - ERROR - stderr - +2025-02-05 23:30:11 - ERROR - stderr - +2025-02-05 23:30:11 - INFO - stdout - {'loss': 0.6511, 'grad_norm': 1.2333205938339233, 'learning_rate': 6.1451725007316245e-06, 'epoch': 1.91} +2025-02-05 23:30:11 - ERROR - stderr - 64%|██████▎ | 14295/22434 [13:22:31<5:42:46, 2.53s/it] +2025-02-05 23:30:14 - ERROR - stderr - 64%|██████▎ | 14296/22434 [13:22:34<5:38:38, 2.50s/it] +2025-02-05 23:30:14 - ERROR - stderr - +2025-02-05 23:30:14 - ERROR - stderr - +2025-02-05 23:30:14 - INFO - stdout - {'loss': 0.6985, 'grad_norm': 1.2917035818099976, 'learning_rate': 6.143840374184038e-06, 'epoch': 1.91} +2025-02-05 23:30:14 - ERROR - stderr - 64%|██████▎ | 14296/22434 [13:22:34<5:38:38, 2.50s/it] +2025-02-05 23:30:16 - ERROR - stderr - 64%|██████▎ | 14297/22434 [13:22:36<5:39:43, 2.51s/it] +2025-02-05 23:30:16 - ERROR - stderr - +2025-02-05 23:30:16 - ERROR - stderr - +2025-02-05 23:30:16 - INFO - stdout - {'loss': 0.7036, 'grad_norm': 1.4599846601486206, 'learning_rate': 6.1425083280144095e-06, 'epoch': 1.91} +2025-02-05 23:30:16 - ERROR - stderr - 64%|██████▎ | 14297/22434 [13:22:36<5:39:43, 2.51s/it] +2025-02-05 23:30:19 - ERROR - stderr - 64%|██████▎ | 14298/22434 [13:22:39<5:40:21, 2.51s/it] +2025-02-05 23:30:19 - ERROR - stderr - +2025-02-05 23:30:19 - ERROR - stderr - +2025-02-05 23:30:19 - INFO - stdout - {'loss': 0.6042, 'grad_norm': 1.3134015798568726, 'learning_rate': 6.141176362250504e-06, 'epoch': 1.91} +2025-02-05 23:30:19 - ERROR - stderr - 64%|██████▎ | 14298/22434 [13:22:39<5:40:21, 2.51s/it] +2025-02-05 23:30:22 - ERROR - stderr - 64%|██████▎ | 14299/22434 [13:22:41<5:46:10, 2.55s/it] 
+2025-02-05 23:30:22 - ERROR - stderr - +2025-02-05 23:30:22 - ERROR - stderr - +2025-02-05 23:30:22 - INFO - stdout - {'loss': 0.6894, 'grad_norm': 1.1629736423492432, 'learning_rate': 6.139844476920086e-06, 'epoch': 1.91} +2025-02-05 23:30:22 - ERROR - stderr - 64%|██████▎ | 14299/22434 [13:22:41<5:46:10, 2.55s/it] +2025-02-05 23:30:24 - ERROR - stderr - 64%|██████▎ | 14300/22434 [13:22:44<5:42:32, 2.53s/it] +2025-02-05 23:30:24 - ERROR - stderr - +2025-02-05 23:30:24 - ERROR - stderr - +2025-02-05 23:30:24 - INFO - stdout - {'loss': 0.7221, 'grad_norm': 1.3049935102462769, 'learning_rate': 6.138512672050913e-06, 'epoch': 1.91} +2025-02-05 23:30:24 - ERROR - stderr - 64%|██████▎ | 14300/22434 [13:22:44<5:42:32, 2.53s/it] +2025-02-05 23:30:27 - ERROR - stderr - 64%|██████▎ | 14301/22434 [13:22:46<5:42:38, 2.53s/it] +2025-02-05 23:30:27 - ERROR - stderr - +2025-02-05 23:30:27 - ERROR - stderr - +2025-02-05 23:30:27 - INFO - stdout - {'loss': 0.6635, 'grad_norm': 1.249802589416504, 'learning_rate': 6.137180947670751e-06, 'epoch': 1.91} +2025-02-05 23:30:27 - ERROR - stderr - 64%|██████▎ | 14301/22434 [13:22:46<5:42:38, 2.53s/it] +2025-02-05 23:30:29 - ERROR - stderr - 64%|██████▍ | 14302/22434 [13:22:49<5:41:57, 2.52s/it] +2025-02-05 23:30:29 - ERROR - stderr - +2025-02-05 23:30:29 - ERROR - stderr - +2025-02-05 23:30:29 - INFO - stdout - {'loss': 0.6424, 'grad_norm': 1.3167529106140137, 'learning_rate': 6.135849303807353e-06, 'epoch': 1.91} +2025-02-05 23:30:29 - ERROR - stderr - 64%|██████▍ | 14302/22434 [13:22:49<5:41:57, 2.52s/it] +2025-02-05 23:30:32 - ERROR - stderr - 64%|██████▍ | 14303/22434 [13:22:51<5:41:48, 2.52s/it] +2025-02-05 23:30:32 - ERROR - stderr - +2025-02-05 23:30:32 - ERROR - stderr - +2025-02-05 23:30:32 - INFO - stdout - {'loss': 0.5879, 'grad_norm': 1.1499838829040527, 'learning_rate': 6.134517740488481e-06, 'epoch': 1.91} +2025-02-05 23:30:32 - ERROR - stderr - 64%|██████▍ | 14303/22434 [13:22:51<5:41:48, 2.52s/it] +2025-02-05 23:30:34 - 
ERROR - stderr - 64%|██████▍ | 14304/22434 [13:22:54<5:39:07, 2.50s/it] +2025-02-05 23:30:34 - ERROR - stderr - +2025-02-05 23:30:34 - ERROR - stderr - +2025-02-05 23:30:34 - INFO - stdout - {'loss': 0.6631, 'grad_norm': 1.2181648015975952, 'learning_rate': 6.133186257741888e-06, 'epoch': 1.91} +2025-02-05 23:30:34 - ERROR - stderr - 64%|██████▍ | 14304/22434 [13:22:54<5:39:07, 2.50s/it] +2025-02-05 23:30:37 - ERROR - stderr - 64%|██████▍ | 14305/22434 [13:22:56<5:39:10, 2.50s/it] +2025-02-05 23:30:37 - ERROR - stderr - +2025-02-05 23:30:37 - ERROR - stderr - +2025-02-05 23:30:37 - INFO - stdout - {'loss': 0.5954, 'grad_norm': 1.232839584350586, 'learning_rate': 6.1318548555953235e-06, 'epoch': 1.91} +2025-02-05 23:30:37 - ERROR - stderr - 64%|██████▍ | 14305/22434 [13:22:56<5:39:10, 2.50s/it] +2025-02-05 23:30:39 - ERROR - stderr - 64%|██████▍ | 14306/22434 [13:22:59<5:37:07, 2.49s/it] +2025-02-05 23:30:39 - ERROR - stderr - +2025-02-05 23:30:39 - ERROR - stderr - +2025-02-05 23:30:39 - INFO - stdout - {'loss': 0.6514, 'grad_norm': 1.2607297897338867, 'learning_rate': 6.130523534076549e-06, 'epoch': 1.91} +2025-02-05 23:30:39 - ERROR - stderr - 64%|██████▍ | 14306/22434 [13:22:59<5:37:07, 2.49s/it] +2025-02-05 23:30:42 - ERROR - stderr - 64%|██████▍ | 14307/22434 [13:23:01<5:40:15, 2.51s/it] +2025-02-05 23:30:42 - ERROR - stderr - +2025-02-05 23:30:42 - ERROR - stderr - +2025-02-05 23:30:42 - INFO - stdout - {'loss': 0.5875, 'grad_norm': 1.2343944311141968, 'learning_rate': 6.129192293213307e-06, 'epoch': 1.91} +2025-02-05 23:30:42 - ERROR - stderr - 64%|██████▍ | 14307/22434 [13:23:01<5:40:15, 2.51s/it] +2025-02-05 23:30:44 - ERROR - stderr - 64%|██████▍ | 14308/22434 [13:23:04<5:38:53, 2.50s/it] +2025-02-05 23:30:44 - ERROR - stderr - +2025-02-05 23:30:44 - ERROR - stderr - +2025-02-05 23:30:44 - INFO - stdout - {'loss': 0.7707, 'grad_norm': 1.2563674449920654, 'learning_rate': 6.127861133033345e-06, 'epoch': 1.91} +2025-02-05 23:30:44 - ERROR - stderr - 
64%|██████▍ | 14308/22434 [13:23:04<5:38:53, 2.50s/it] +2025-02-05 23:30:47 - ERROR - stderr - 64%|██████▍ | 14309/22434 [13:23:06<5:38:05, 2.50s/it] +2025-02-05 23:30:47 - ERROR - stderr - +2025-02-05 23:30:47 - ERROR - stderr - +2025-02-05 23:30:47 - INFO - stdout - {'loss': 0.6887, 'grad_norm': 1.5288641452789307, 'learning_rate': 6.126530053564414e-06, 'epoch': 1.91} +2025-02-05 23:30:47 - ERROR - stderr - 64%|██████▍ | 14309/22434 [13:23:06<5:38:05, 2.50s/it] +2025-02-05 23:30:49 - ERROR - stderr - 64%|██████▍ | 14310/22434 [13:23:09<5:39:06, 2.50s/it] +2025-02-05 23:30:49 - ERROR - stderr - +2025-02-05 23:30:49 - ERROR - stderr - +2025-02-05 23:30:49 - INFO - stdout - {'loss': 0.7357, 'grad_norm': 1.389078974723816, 'learning_rate': 6.125199054834257e-06, 'epoch': 1.91} +2025-02-05 23:30:49 - ERROR - stderr - 64%|██████▍ | 14310/22434 [13:23:09<5:39:06, 2.50s/it] +2025-02-05 23:30:52 - ERROR - stderr - 64%|██████▍ | 14311/22434 [13:23:11<5:40:12, 2.51s/it] +2025-02-05 23:30:52 - ERROR - stderr - +2025-02-05 23:30:52 - ERROR - stderr - +2025-02-05 23:30:52 - INFO - stdout - {'loss': 0.6976, 'grad_norm': 1.224880337715149, 'learning_rate': 6.123868136870619e-06, 'epoch': 1.91} +2025-02-05 23:30:52 - ERROR - stderr - 64%|██████▍ | 14311/22434 [13:23:11<5:40:12, 2.51s/it] +2025-02-05 23:30:54 - ERROR - stderr - 64%|██████▍ | 14312/22434 [13:23:14<5:36:26, 2.49s/it] +2025-02-05 23:30:54 - ERROR - stderr - +2025-02-05 23:30:54 - ERROR - stderr - +2025-02-05 23:30:54 - INFO - stdout - {'loss': 0.6143, 'grad_norm': 1.2137125730514526, 'learning_rate': 6.122537299701241e-06, 'epoch': 1.91} +2025-02-05 23:30:54 - ERROR - stderr - 64%|██████▍ | 14312/22434 [13:23:14<5:36:26, 2.49s/it] +2025-02-05 23:30:56 - ERROR - stderr - 64%|██████▍ | 14313/22434 [13:23:16<5:33:48, 2.47s/it] +2025-02-05 23:30:56 - ERROR - stderr - +2025-02-05 23:30:56 - ERROR - stderr - +2025-02-05 23:30:56 - INFO - stdout - {'loss': 0.7598, 'grad_norm': 1.338866949081421, 'learning_rate': 
6.1212065433538595e-06, 'epoch': 1.91} +2025-02-05 23:30:56 - ERROR - stderr - 64%|██████▍ | 14313/22434 [13:23:16<5:33:48, 2.47s/it] +2025-02-05 23:30:59 - ERROR - stderr - 64%|██████▍ | 14314/22434 [13:23:19<5:36:13, 2.48s/it] +2025-02-05 23:30:59 - ERROR - stderr - +2025-02-05 23:30:59 - ERROR - stderr - +2025-02-05 23:30:59 - INFO - stdout - {'loss': 0.6477, 'grad_norm': 1.2913086414337158, 'learning_rate': 6.11987586785622e-06, 'epoch': 1.91} +2025-02-05 23:30:59 - ERROR - stderr - 64%|██████▍ | 14314/22434 [13:23:19<5:36:13, 2.48s/it] +2025-02-05 23:31:02 - ERROR - stderr - 64%|██████▍ | 14315/22434 [13:23:21<5:39:45, 2.51s/it] +2025-02-05 23:31:02 - ERROR - stderr - +2025-02-05 23:31:02 - ERROR - stderr - +2025-02-05 23:31:02 - INFO - stdout - {'loss': 0.6436, 'grad_norm': 1.15380859375, 'learning_rate': 6.118545273236054e-06, 'epoch': 1.91} +2025-02-05 23:31:02 - ERROR - stderr - 64%|██████▍ | 14315/22434 [13:23:21<5:39:45, 2.51s/it] +2025-02-05 23:31:04 - ERROR - stderr - 64%|██████▍ | 14316/22434 [13:23:24<5:36:54, 2.49s/it] +2025-02-05 23:31:04 - ERROR - stderr - +2025-02-05 23:31:04 - ERROR - stderr - +2025-02-05 23:31:04 - INFO - stdout - {'loss': 0.5933, 'grad_norm': 1.2949401140213013, 'learning_rate': 6.1172147595210976e-06, 'epoch': 1.91} +2025-02-05 23:31:04 - ERROR - stderr - 64%|██████▍ | 14316/22434 [13:23:24<5:36:54, 2.49s/it] +2025-02-05 23:31:06 - ERROR - stderr - 64%|██████▍ | 14317/22434 [13:23:26<5:36:42, 2.49s/it] +2025-02-05 23:31:07 - ERROR - stderr - +2025-02-05 23:31:07 - ERROR - stderr - +2025-02-05 23:31:07 - INFO - stdout - {'loss': 0.5922, 'grad_norm': 1.1877398490905762, 'learning_rate': 6.115884326739083e-06, 'epoch': 1.91} +2025-02-05 23:31:07 - ERROR - stderr - 64%|██████▍ | 14317/22434 [13:23:26<5:36:42, 2.49s/it] +2025-02-05 23:31:09 - ERROR - stderr - 64%|██████▍ | 14318/22434 [13:23:29<5:37:16, 2.49s/it] +2025-02-05 23:31:09 - ERROR - stderr - +2025-02-05 23:31:09 - ERROR - stderr - +2025-02-05 23:31:09 - INFO - stdout - 
{'loss': 0.7254, 'grad_norm': 1.3242037296295166, 'learning_rate': 6.114553974917741e-06, 'epoch': 1.91} +2025-02-05 23:31:09 - ERROR - stderr - 64%|██████▍ | 14318/22434 [13:23:29<5:37:16, 2.49s/it] +2025-02-05 23:31:12 - ERROR - stderr - 64%|██████▍ | 14319/22434 [13:23:31<5:40:09, 2.52s/it] +2025-02-05 23:31:12 - ERROR - stderr - +2025-02-05 23:31:12 - ERROR - stderr - +2025-02-05 23:31:12 - INFO - stdout - {'loss': 0.7583, 'grad_norm': 1.3087332248687744, 'learning_rate': 6.113223704084807e-06, 'epoch': 1.91} +2025-02-05 23:31:12 - ERROR - stderr - 64%|██████▍ | 14319/22434 [13:23:31<5:40:09, 2.52s/it] +2025-02-05 23:31:14 - ERROR - stderr - 64%|██████▍ | 14320/22434 [13:23:34<5:37:30, 2.50s/it] +2025-02-05 23:31:14 - ERROR - stderr - +2025-02-05 23:31:14 - ERROR - stderr - +2025-02-05 23:31:14 - INFO - stdout - {'loss': 0.7493, 'grad_norm': 1.1986819505691528, 'learning_rate': 6.111893514268007e-06, 'epoch': 1.91} +2025-02-05 23:31:14 - ERROR - stderr - 64%|██████▍ | 14320/22434 [13:23:34<5:37:30, 2.50s/it] +2025-02-05 23:31:17 - ERROR - stderr - 64%|██████▍ | 14321/22434 [13:23:36<5:42:27, 2.53s/it] +2025-02-05 23:31:17 - ERROR - stderr - +2025-02-05 23:31:17 - ERROR - stderr - +2025-02-05 23:31:17 - INFO - stdout - {'loss': 0.6304, 'grad_norm': 1.1840780973434448, 'learning_rate': 6.110563405495062e-06, 'epoch': 1.92} +2025-02-05 23:31:17 - ERROR - stderr - 64%|██████▍ | 14321/22434 [13:23:36<5:42:27, 2.53s/it] +2025-02-05 23:31:19 - ERROR - stderr - 64%|██████▍ | 14322/22434 [13:23:39<5:40:57, 2.52s/it] +2025-02-05 23:31:19 - ERROR - stderr - +2025-02-05 23:31:19 - ERROR - stderr - +2025-02-05 23:31:19 - INFO - stdout - {'loss': 0.731, 'grad_norm': 1.281714916229248, 'learning_rate': 6.109233377793704e-06, 'epoch': 1.92} +2025-02-05 23:31:19 - ERROR - stderr - 64%|██████▍ | 14322/22434 [13:23:39<5:40:57, 2.52s/it] +2025-02-05 23:31:22 - ERROR - stderr - 64%|██████▍ | 14323/22434 [13:23:42<5:48:06, 2.58s/it] +2025-02-05 23:31:22 - ERROR - stderr - 
+2025-02-05 23:31:22 - ERROR - stderr - +2025-02-05 23:31:22 - INFO - stdout - {'loss': 0.6925, 'grad_norm': 1.229513168334961, 'learning_rate': 6.107903431191652e-06, 'epoch': 1.92} +2025-02-05 23:31:22 - ERROR - stderr - 64%|██████▍ | 14323/22434 [13:23:42<5:48:06, 2.58s/it] +2025-02-05 23:31:24 - ERROR - stderr - 64%|██████▍ | 14324/22434 [13:23:44<5:43:49, 2.54s/it] +2025-02-05 23:31:24 - ERROR - stderr - +2025-02-05 23:31:24 - ERROR - stderr - +2025-02-05 23:31:24 - INFO - stdout - {'loss': 0.7201, 'grad_norm': 1.3632463216781616, 'learning_rate': 6.106573565716627e-06, 'epoch': 1.92} +2025-02-05 23:31:24 - ERROR - stderr - 64%|██████▍ | 14324/22434 [13:23:44<5:43:49, 2.54s/it] +2025-02-05 23:31:27 - ERROR - stderr - 64%|██████▍ | 14325/22434 [13:23:47<5:41:28, 2.53s/it] +2025-02-05 23:31:27 - ERROR - stderr - +2025-02-05 23:31:27 - ERROR - stderr - +2025-02-05 23:31:27 - INFO - stdout - {'loss': 0.6239, 'grad_norm': 1.1269325017929077, 'learning_rate': 6.105243781396353e-06, 'epoch': 1.92} +2025-02-05 23:31:27 - ERROR - stderr - 64%|██████▍ | 14325/22434 [13:23:47<5:41:28, 2.53s/it] +2025-02-05 23:31:29 - ERROR - stderr - 64%|██████▍ | 14326/22434 [13:23:49<5:42:07, 2.53s/it] +2025-02-05 23:31:29 - ERROR - stderr - +2025-02-05 23:31:29 - ERROR - stderr - +2025-02-05 23:31:29 - INFO - stdout - {'loss': 0.7661, 'grad_norm': 1.2974936962127686, 'learning_rate': 6.103914078258543e-06, 'epoch': 1.92} +2025-02-05 23:31:29 - ERROR - stderr - 64%|██████▍ | 14326/22434 [13:23:49<5:42:07, 2.53s/it] +2025-02-05 23:31:32 - ERROR - stderr - 64%|██████▍ | 14327/22434 [13:23:52<5:42:55, 2.54s/it] +2025-02-05 23:31:32 - ERROR - stderr - +2025-02-05 23:31:32 - ERROR - stderr - +2025-02-05 23:31:32 - INFO - stdout - {'loss': 0.6952, 'grad_norm': 1.3269054889678955, 'learning_rate': 6.102584456330919e-06, 'epoch': 1.92} +2025-02-05 23:31:32 - ERROR - stderr - 64%|██████▍ | 14327/22434 [13:23:52<5:42:55, 2.54s/it] +2025-02-05 23:31:34 - ERROR - stderr - 64%|██████▍ | 14328/22434 
[13:23:54<5:41:33, 2.53s/it] +2025-02-05 23:31:34 - ERROR - stderr - +2025-02-05 23:31:34 - ERROR - stderr - +2025-02-05 23:31:34 - INFO - stdout - {'loss': 0.6751, 'grad_norm': 1.2491848468780518, 'learning_rate': 6.101254915641191e-06, 'epoch': 1.92} +2025-02-05 23:31:34 - ERROR - stderr - 64%|██████▍ | 14328/22434 [13:23:54<5:41:33, 2.53s/it] +2025-02-05 23:31:37 - ERROR - stderr - 64%|██████▍ | 14329/22434 [13:23:57<5:39:13, 2.51s/it] +2025-02-05 23:31:37 - ERROR - stderr - +2025-02-05 23:31:37 - ERROR - stderr - +2025-02-05 23:31:37 - INFO - stdout - {'loss': 0.6795, 'grad_norm': 1.2820974588394165, 'learning_rate': 6.099925456217073e-06, 'epoch': 1.92} +2025-02-05 23:31:37 - ERROR - stderr - 64%|██████▍ | 14329/22434 [13:23:57<5:39:13, 2.51s/it] +2025-02-05 23:31:39 - ERROR - stderr - 64%|██████▍ | 14330/22434 [13:23:59<5:39:14, 2.51s/it] +2025-02-05 23:31:39 - ERROR - stderr - +2025-02-05 23:31:39 - ERROR - stderr - +2025-02-05 23:31:39 - INFO - stdout - {'loss': 0.6885, 'grad_norm': 1.239327073097229, 'learning_rate': 6.098596078086278e-06, 'epoch': 1.92} +2025-02-05 23:31:39 - ERROR - stderr - 64%|██████▍ | 14330/22434 [13:23:59<5:39:14, 2.51s/it] +2025-02-05 23:31:42 - ERROR - stderr - 64%|██████▍ | 14331/22434 [13:24:02<5:39:09, 2.51s/it] +2025-02-05 23:31:42 - ERROR - stderr - +2025-02-05 23:31:42 - ERROR - stderr - +2025-02-05 23:31:42 - INFO - stdout - {'loss': 0.6929, 'grad_norm': 1.396090030670166, 'learning_rate': 6.097266781276515e-06, 'epoch': 1.92} +2025-02-05 23:31:42 - ERROR - stderr - 64%|██████▍ | 14331/22434 [13:24:02<5:39:09, 2.51s/it] +2025-02-05 23:31:44 - ERROR - stderr - 64%|██████▍ | 14332/22434 [13:24:04<5:40:09, 2.52s/it] +2025-02-05 23:31:44 - ERROR - stderr - +2025-02-05 23:31:44 - ERROR - stderr - +2025-02-05 23:31:44 - INFO - stdout - {'loss': 0.6845, 'grad_norm': 1.2558794021606445, 'learning_rate': 6.095937565815489e-06, 'epoch': 1.92} +2025-02-05 23:31:44 - ERROR - stderr - 64%|██████▍ | 14332/22434 [13:24:04<5:40:09, 
2.52s/it] +2025-02-05 23:31:47 - ERROR - stderr - 64%|██████▍ | 14333/22434 [13:24:07<5:40:14, 2.52s/it] +2025-02-05 23:31:47 - ERROR - stderr - +2025-02-05 23:31:47 - ERROR - stderr - +2025-02-05 23:31:47 - INFO - stdout - {'loss': 0.6837, 'grad_norm': 1.2332311868667603, 'learning_rate': 6.0946084317309105e-06, 'epoch': 1.92} +2025-02-05 23:31:47 - ERROR - stderr - 64%|██████▍ | 14333/22434 [13:24:07<5:40:14, 2.52s/it] +2025-02-05 23:31:49 - ERROR - stderr - 64%|██████▍ | 14334/22434 [13:24:09<5:39:09, 2.51s/it] +2025-02-05 23:31:49 - ERROR - stderr - +2025-02-05 23:31:49 - ERROR - stderr - +2025-02-05 23:31:49 - INFO - stdout - {'loss': 0.6875, 'grad_norm': 1.2708615064620972, 'learning_rate': 6.093279379050481e-06, 'epoch': 1.92} +2025-02-05 23:31:49 - ERROR - stderr - 64%|██████▍ | 14334/22434 [13:24:09<5:39:09, 2.51s/it] +2025-02-05 23:31:52 - ERROR - stderr - 64%|██████▍ | 14335/22434 [13:24:12<5:38:25, 2.51s/it] +2025-02-05 23:31:52 - ERROR - stderr - +2025-02-05 23:31:52 - ERROR - stderr - +2025-02-05 23:31:52 - INFO - stdout - {'loss': 0.7211, 'grad_norm': 1.261328101158142, 'learning_rate': 6.091950407801907e-06, 'epoch': 1.92} +2025-02-05 23:31:52 - ERROR - stderr - 64%|██████▍ | 14335/22434 [13:24:12<5:38:25, 2.51s/it] +2025-02-05 23:31:54 - ERROR - stderr - 64%|██████▍ | 14336/22434 [13:24:14<5:37:16, 2.50s/it] +2025-02-05 23:31:54 - ERROR - stderr - +2025-02-05 23:31:54 - ERROR - stderr - +2025-02-05 23:31:54 - INFO - stdout - {'loss': 0.6164, 'grad_norm': 1.0668758153915405, 'learning_rate': 6.090621518012884e-06, 'epoch': 1.92} +2025-02-05 23:31:54 - ERROR - stderr - 64%|██████▍ | 14336/22434 [13:24:14<5:37:16, 2.50s/it] +2025-02-05 23:31:57 - ERROR - stderr - 64%|██████▍ | 14337/22434 [13:24:17<5:36:22, 2.49s/it] +2025-02-05 23:31:57 - ERROR - stderr - +2025-02-05 23:31:57 - ERROR - stderr - +2025-02-05 23:31:57 - INFO - stdout - {'loss': 0.7058, 'grad_norm': 1.2561827898025513, 'learning_rate': 6.089292709711115e-06, 'epoch': 1.92} +2025-02-05 
23:31:57 - ERROR - stderr - 64%|██████▍ | 14337/22434 [13:24:17<5:36:22, 2.49s/it] +2025-02-05 23:31:59 - ERROR - stderr - 64%|██████▍ | 14338/22434 [13:24:19<5:38:41, 2.51s/it] +2025-02-05 23:31:59 - ERROR - stderr - +2025-02-05 23:31:59 - ERROR - stderr - +2025-02-05 23:31:59 - INFO - stdout - {'loss': 0.6982, 'grad_norm': 1.3284103870391846, 'learning_rate': 6.0879639829243e-06, 'epoch': 1.92} +2025-02-05 23:31:59 - ERROR - stderr - 64%|██████▍ | 14338/22434 [13:24:19<5:38:41, 2.51s/it] +2025-02-05 23:32:02 - ERROR - stderr - 64%|██████▍ | 14339/22434 [13:24:22<5:37:53, 2.50s/it] +2025-02-05 23:32:02 - ERROR - stderr - +2025-02-05 23:32:02 - ERROR - stderr - +2025-02-05 23:32:02 - INFO - stdout - {'loss': 0.6631, 'grad_norm': 1.295108437538147, 'learning_rate': 6.086635337680133e-06, 'epoch': 1.92} +2025-02-05 23:32:02 - ERROR - stderr - 64%|██████▍ | 14339/22434 [13:24:22<5:37:53, 2.50s/it] +2025-02-05 23:32:05 - ERROR - stderr - 64%|██████▍ | 14340/22434 [13:24:24<5:43:10, 2.54s/it] +2025-02-05 23:32:05 - ERROR - stderr - +2025-02-05 23:32:05 - ERROR - stderr - +2025-02-05 23:32:05 - INFO - stdout - {'loss': 0.7155, 'grad_norm': 1.2891427278518677, 'learning_rate': 6.085306774006303e-06, 'epoch': 1.92} +2025-02-05 23:32:05 - ERROR - stderr - 64%|██████▍ | 14340/22434 [13:24:24<5:43:10, 2.54s/it] +2025-02-05 23:32:07 - ERROR - stderr - 64%|██████▍ | 14341/22434 [13:24:27<5:42:26, 2.54s/it] +2025-02-05 23:32:07 - ERROR - stderr - +2025-02-05 23:32:07 - ERROR - stderr - +2025-02-05 23:32:07 - INFO - stdout - {'loss': 0.6499, 'grad_norm': 1.2271307706832886, 'learning_rate': 6.083978291930511e-06, 'epoch': 1.92} +2025-02-05 23:32:07 - ERROR - stderr - 64%|██████▍ | 14341/22434 [13:24:27<5:42:26, 2.54s/it] +2025-02-05 23:32:10 - ERROR - stderr - 64%|██████▍ | 14342/22434 [13:24:29<5:45:15, 2.56s/it] +2025-02-05 23:32:10 - ERROR - stderr - +2025-02-05 23:32:10 - ERROR - stderr - +2025-02-05 23:32:10 - INFO - stdout - {'loss': 0.6818, 'grad_norm': 1.2307151556015015, 
'learning_rate': 6.082649891480441e-06, 'epoch': 1.92} +2025-02-05 23:32:10 - ERROR - stderr - 64%|██████▍ | 14342/22434 [13:24:29<5:45:15, 2.56s/it] +2025-02-05 23:32:12 - ERROR - stderr - 64%|██████▍ | 14343/22434 [13:24:32<5:51:00, 2.60s/it] +2025-02-05 23:32:12 - ERROR - stderr - +2025-02-05 23:32:12 - ERROR - stderr - +2025-02-05 23:32:12 - INFO - stdout - {'loss': 0.7433, 'grad_norm': 1.301474928855896, 'learning_rate': 6.081321572683787e-06, 'epoch': 1.92} +2025-02-05 23:32:12 - ERROR - stderr - 64%|██████▍ | 14343/22434 [13:24:32<5:51:00, 2.60s/it] +2025-02-05 23:32:15 - ERROR - stderr - 64%|██████▍ | 14344/22434 [13:24:35<5:45:40, 2.56s/it] +2025-02-05 23:32:15 - ERROR - stderr - +2025-02-05 23:32:15 - ERROR - stderr - +2025-02-05 23:32:15 - INFO - stdout - {'loss': 0.7086, 'grad_norm': 1.254326343536377, 'learning_rate': 6.0799933355682374e-06, 'epoch': 1.92} +2025-02-05 23:32:15 - ERROR - stderr - 64%|██████▍ | 14344/22434 [13:24:35<5:45:40, 2.56s/it] +2025-02-05 23:32:17 - ERROR - stderr - 64%|██████▍ | 14345/22434 [13:24:37<5:44:19, 2.55s/it] +2025-02-05 23:32:17 - ERROR - stderr - +2025-02-05 23:32:17 - ERROR - stderr - +2025-02-05 23:32:17 - INFO - stdout - {'loss': 0.6127, 'grad_norm': 1.415197730064392, 'learning_rate': 6.078665180161472e-06, 'epoch': 1.92} +2025-02-05 23:32:17 - ERROR - stderr - 64%|██████▍ | 14345/22434 [13:24:37<5:44:19, 2.55s/it] +2025-02-05 23:32:20 - ERROR - stderr - 64%|██████▍ | 14346/22434 [13:24:40<5:44:53, 2.56s/it] +2025-02-05 23:32:20 - ERROR - stderr - +2025-02-05 23:32:20 - ERROR - stderr - +2025-02-05 23:32:20 - INFO - stdout - {'loss': 0.6824, 'grad_norm': 1.3525371551513672, 'learning_rate': 6.0773371064911825e-06, 'epoch': 1.92} +2025-02-05 23:32:20 - ERROR - stderr - 64%|██████▍ | 14346/22434 [13:24:40<5:44:53, 2.56s/it] +2025-02-05 23:32:22 - ERROR - stderr - 64%|██████▍ | 14347/22434 [13:24:42<5:41:25, 2.53s/it] +2025-02-05 23:32:22 - ERROR - stderr - +2025-02-05 23:32:22 - ERROR - stderr - +2025-02-05 
23:32:22 - INFO - stdout - {'loss': 0.6523, 'grad_norm': 1.2345032691955566, 'learning_rate': 6.076009114585045e-06, 'epoch': 1.92} +2025-02-05 23:32:22 - ERROR - stderr - 64%|██████▍ | 14347/22434 [13:24:42<5:41:25, 2.53s/it] +2025-02-05 23:32:25 - ERROR - stderr - 64%|██████▍ | 14348/22434 [13:24:45<5:41:05, 2.53s/it] +2025-02-05 23:32:25 - ERROR - stderr - +2025-02-05 23:32:25 - ERROR - stderr - +2025-02-05 23:32:25 - INFO - stdout - {'loss': 0.6789, 'grad_norm': 1.1721729040145874, 'learning_rate': 6.074681204470742e-06, 'epoch': 1.92} +2025-02-05 23:32:25 - ERROR - stderr - 64%|██████▍ | 14348/22434 [13:24:45<5:41:05, 2.53s/it] +2025-02-05 23:32:28 - ERROR - stderr - 64%|██████▍ | 14349/22434 [13:24:47<5:43:50, 2.55s/it] +2025-02-05 23:32:28 - ERROR - stderr - +2025-02-05 23:32:28 - ERROR - stderr - +2025-02-05 23:32:28 - INFO - stdout - {'loss': 0.683, 'grad_norm': 1.307623028755188, 'learning_rate': 6.073353376175955e-06, 'epoch': 1.92} +2025-02-05 23:32:28 - ERROR - stderr - 64%|██████▍ | 14349/22434 [13:24:47<5:43:50, 2.55s/it] +2025-02-05 23:32:30 - ERROR - stderr - 64%|██████▍ | 14350/22434 [13:24:50<5:42:30, 2.54s/it] +2025-02-05 23:32:30 - ERROR - stderr - +2025-02-05 23:32:30 - ERROR - stderr - +2025-02-05 23:32:30 - INFO - stdout - {'loss': 0.6918, 'grad_norm': 1.2120999097824097, 'learning_rate': 6.072025629728356e-06, 'epoch': 1.92} +2025-02-05 23:32:30 - ERROR - stderr - 64%|██████▍ | 14350/22434 [13:24:50<5:42:30, 2.54s/it] +2025-02-05 23:32:33 - ERROR - stderr - 64%|██████▍ | 14351/22434 [13:24:52<5:40:34, 2.53s/it] +2025-02-05 23:32:33 - ERROR - stderr - +2025-02-05 23:32:33 - ERROR - stderr - +2025-02-05 23:32:33 - INFO - stdout - {'loss': 0.6746, 'grad_norm': 1.310998558998108, 'learning_rate': 6.07069796515563e-06, 'epoch': 1.92} +2025-02-05 23:32:33 - ERROR - stderr - 64%|██████▍ | 14351/22434 [13:24:52<5:40:34, 2.53s/it] +2025-02-05 23:32:35 - ERROR - stderr - 64%|██████▍ | 14352/22434 [13:24:55<5:36:34, 2.50s/it] +2025-02-05 23:32:35 - 
ERROR - stderr - +2025-02-05 23:32:35 - ERROR - stderr - +2025-02-05 23:32:35 - INFO - stdout - {'loss': 0.622, 'grad_norm': 1.2244346141815186, 'learning_rate': 6.069370382485442e-06, 'epoch': 1.92} +2025-02-05 23:32:35 - ERROR - stderr - 64%|██████▍ | 14352/22434 [13:24:55<5:36:34, 2.50s/it] +2025-02-05 23:32:37 - ERROR - stderr - 64%|██████▍ | 14353/22434 [13:24:57<5:35:13, 2.49s/it] +2025-02-05 23:32:38 - ERROR - stderr - +2025-02-05 23:32:38 - ERROR - stderr - +2025-02-05 23:32:38 - INFO - stdout - {'loss': 0.6241, 'grad_norm': 1.2747117280960083, 'learning_rate': 6.068042881745466e-06, 'epoch': 1.92} +2025-02-05 23:32:38 - ERROR - stderr - 64%|██████▍ | 14353/22434 [13:24:57<5:35:13, 2.49s/it] +2025-02-05 23:32:40 - ERROR - stderr - 64%|██████▍ | 14354/22434 [13:25:00<5:52:13, 2.62s/it] +2025-02-05 23:32:40 - ERROR - stderr - +2025-02-05 23:32:40 - ERROR - stderr - +2025-02-05 23:32:40 - INFO - stdout - {'loss': 0.7352, 'grad_norm': 1.4055944681167603, 'learning_rate': 6.0667154629633766e-06, 'epoch': 1.92} +2025-02-05 23:32:40 - ERROR - stderr - 64%|██████▍ | 14354/22434 [13:25:00<5:52:13, 2.62s/it] +2025-02-05 23:32:43 - ERROR - stderr - 64%|██████▍ | 14355/22434 [13:25:03<5:49:08, 2.59s/it] +2025-02-05 23:32:43 - ERROR - stderr - +2025-02-05 23:32:43 - ERROR - stderr - +2025-02-05 23:32:43 - INFO - stdout - {'loss': 0.6208, 'grad_norm': 1.3048895597457886, 'learning_rate': 6.065388126166837e-06, 'epoch': 1.92} +2025-02-05 23:32:43 - ERROR - stderr - 64%|██████▍ | 14355/22434 [13:25:03<5:49:08, 2.59s/it] +2025-02-05 23:32:45 - ERROR - stderr - 64%|██████▍ | 14356/22434 [13:25:05<5:43:44, 2.55s/it] +2025-02-05 23:32:45 - ERROR - stderr - +2025-02-05 23:32:45 - ERROR - stderr - +2025-02-05 23:32:45 - INFO - stdout - {'loss': 0.6981, 'grad_norm': 1.3057315349578857, 'learning_rate': 6.064060871383515e-06, 'epoch': 1.92} +2025-02-05 23:32:45 - ERROR - stderr - 64%|██████▍ | 14356/22434 [13:25:05<5:43:44, 2.55s/it] +2025-02-05 23:32:48 - ERROR - stderr - 
64%|██████▍ | 14357/22434 [13:25:08<5:40:52, 2.53s/it] +2025-02-05 23:32:48 - ERROR - stderr - +2025-02-05 23:32:48 - ERROR - stderr - +2025-02-05 23:32:48 - INFO - stdout - {'loss': 0.7013, 'grad_norm': 1.365171194076538, 'learning_rate': 6.062733698641083e-06, 'epoch': 1.92} +2025-02-05 23:32:48 - ERROR - stderr - 64%|██████▍ | 14357/22434 [13:25:08<5:40:52, 2.53s/it] +2025-02-05 23:32:50 - ERROR - stderr - 64%|██████▍ | 14358/22434 [13:25:10<5:39:10, 2.52s/it] +2025-02-05 23:32:50 - ERROR - stderr - +2025-02-05 23:32:50 - ERROR - stderr - +2025-02-05 23:32:50 - INFO - stdout - {'loss': 0.6939, 'grad_norm': 1.3311299085617065, 'learning_rate': 6.061406607967194e-06, 'epoch': 1.92} +2025-02-05 23:32:50 - ERROR - stderr - 64%|██████▍ | 14358/22434 [13:25:10<5:39:10, 2.52s/it] +2025-02-05 23:32:53 - ERROR - stderr - 64%|██████▍ | 14359/22434 [13:25:13<5:40:46, 2.53s/it] +2025-02-05 23:32:53 - ERROR - stderr - +2025-02-05 23:32:53 - ERROR - stderr - +2025-02-05 23:32:53 - INFO - stdout - {'loss': 0.7193, 'grad_norm': 1.3191055059432983, 'learning_rate': 6.060079599389521e-06, 'epoch': 1.92} +2025-02-05 23:32:53 - ERROR - stderr - 64%|██████▍ | 14359/22434 [13:25:13<5:40:46, 2.53s/it] +2025-02-05 23:32:55 - ERROR - stderr - 64%|██████▍ | 14360/22434 [13:25:15<5:38:33, 2.52s/it] +2025-02-05 23:32:55 - ERROR - stderr - +2025-02-05 23:32:55 - ERROR - stderr - +2025-02-05 23:32:55 - INFO - stdout - {'loss': 0.7014, 'grad_norm': 1.2935361862182617, 'learning_rate': 6.0587526729357145e-06, 'epoch': 1.92} +2025-02-05 23:32:55 - ERROR - stderr - 64%|██████▍ | 14360/22434 [13:25:15<5:38:33, 2.52s/it] +2025-02-05 23:32:58 - ERROR - stderr - 64%|██████▍ | 14361/22434 [13:25:18<5:37:50, 2.51s/it] +2025-02-05 23:32:58 - ERROR - stderr - +2025-02-05 23:32:58 - ERROR - stderr - +2025-02-05 23:32:58 - INFO - stdout - {'loss': 0.7022, 'grad_norm': 1.255576491355896, 'learning_rate': 6.057425828633438e-06, 'epoch': 1.92} +2025-02-05 23:32:58 - ERROR - stderr - 64%|██████▍ | 14361/22434 
[13:25:18<5:37:50, 2.51s/it] +2025-02-05 23:33:00 - ERROR - stderr - 64%|██████▍ | 14362/22434 [13:25:20<5:41:03, 2.54s/it] +2025-02-05 23:33:01 - ERROR - stderr - +2025-02-05 23:33:01 - ERROR - stderr - +2025-02-05 23:33:01 - INFO - stdout - {'loss': 0.65, 'grad_norm': 1.2457056045532227, 'learning_rate': 6.056099066510349e-06, 'epoch': 1.92} +2025-02-05 23:33:01 - ERROR - stderr - 64%|██████▍ | 14362/22434 [13:25:20<5:41:03, 2.54s/it] +2025-02-05 23:33:03 - ERROR - stderr - 64%|██████▍ | 14363/22434 [13:25:23<5:39:57, 2.53s/it] +2025-02-05 23:33:03 - ERROR - stderr - +2025-02-05 23:33:03 - ERROR - stderr - +2025-02-05 23:33:03 - INFO - stdout - {'loss': 0.7043, 'grad_norm': 1.2485390901565552, 'learning_rate': 6.054772386594099e-06, 'epoch': 1.92} +2025-02-05 23:33:03 - ERROR - stderr - 64%|██████▍ | 14363/22434 [13:25:23<5:39:57, 2.53s/it] +2025-02-05 23:33:05 - ERROR - stderr - 64%|██████▍ | 14364/22434 [13:25:25<5:38:07, 2.51s/it] +2025-02-05 23:33:06 - ERROR - stderr - +2025-02-05 23:33:06 - ERROR - stderr - +2025-02-05 23:33:06 - INFO - stdout - {'loss': 0.6785, 'grad_norm': 1.2033710479736328, 'learning_rate': 6.053445788912345e-06, 'epoch': 1.92} +2025-02-05 23:33:06 - ERROR - stderr - 64%|██████▍ | 14364/22434 [13:25:25<5:38:07, 2.51s/it] +2025-02-05 23:33:08 - ERROR - stderr - 64%|██████▍ | 14365/22434 [13:25:28<5:36:44, 2.50s/it] +2025-02-05 23:33:08 - ERROR - stderr - +2025-02-05 23:33:08 - ERROR - stderr - +2025-02-05 23:33:08 - INFO - stdout - {'loss': 0.6058, 'grad_norm': 1.0446795225143433, 'learning_rate': 6.052119273492739e-06, 'epoch': 1.92} +2025-02-05 23:33:08 - ERROR - stderr - 64%|██████▍ | 14365/22434 [13:25:28<5:36:44, 2.50s/it] +2025-02-05 23:33:10 - ERROR - stderr - 64%|██████▍ | 14366/22434 [13:25:30<5:35:21, 2.49s/it] +2025-02-05 23:33:10 - ERROR - stderr - +2025-02-05 23:33:10 - ERROR - stderr - +2025-02-05 23:33:10 - INFO - stdout - {'loss': 0.6328, 'grad_norm': 1.247416377067566, 'learning_rate': 6.050792840362925e-06, 'epoch': 
1.92} +2025-02-05 23:33:10 - ERROR - stderr - 64%|██████▍ | 14366/22434 [13:25:30<5:35:21, 2.49s/it] +2025-02-05 23:33:13 - ERROR - stderr - 64%|██████▍ | 14367/22434 [13:25:33<5:35:56, 2.50s/it] +2025-02-05 23:33:13 - ERROR - stderr - +2025-02-05 23:33:13 - ERROR - stderr - +2025-02-05 23:33:13 - INFO - stdout - {'loss': 0.7175, 'grad_norm': 1.2555475234985352, 'learning_rate': 6.049466489550558e-06, 'epoch': 1.92} +2025-02-05 23:33:13 - ERROR - stderr - 64%|██████▍ | 14367/22434 [13:25:33<5:35:56, 2.50s/it] +2025-02-05 23:33:15 - ERROR - stderr - 64%|██████▍ | 14368/22434 [13:25:35<5:35:10, 2.49s/it] +2025-02-05 23:33:15 - ERROR - stderr - +2025-02-05 23:33:15 - ERROR - stderr - +2025-02-05 23:33:15 - INFO - stdout - {'loss': 0.682, 'grad_norm': 1.2473998069763184, 'learning_rate': 6.048140221083281e-06, 'epoch': 1.92} +2025-02-05 23:33:15 - ERROR - stderr - 64%|██████▍ | 14368/22434 [13:25:35<5:35:10, 2.49s/it] +2025-02-05 23:33:18 - ERROR - stderr - 64%|██████▍ | 14369/22434 [13:25:38<5:35:53, 2.50s/it] +2025-02-05 23:33:18 - ERROR - stderr - +2025-02-05 23:33:18 - ERROR - stderr - +2025-02-05 23:33:18 - INFO - stdout - {'loss': 0.7543, 'grad_norm': 1.4996132850646973, 'learning_rate': 6.0468140349887375e-06, 'epoch': 1.92} +2025-02-05 23:33:18 - ERROR - stderr - 64%|██████▍ | 14369/22434 [13:25:38<5:35:53, 2.50s/it] +2025-02-05 23:33:20 - ERROR - stderr - 64%|██████▍ | 14370/22434 [13:25:40<5:37:54, 2.51s/it] +2025-02-05 23:33:21 - ERROR - stderr - +2025-02-05 23:33:21 - ERROR - stderr - +2025-02-05 23:33:21 - INFO - stdout - {'loss': 0.6427, 'grad_norm': 1.2484712600708008, 'learning_rate': 6.0454879312945755e-06, 'epoch': 1.92} +2025-02-05 23:33:21 - ERROR - stderr - 64%|██████▍ | 14370/22434 [13:25:40<5:37:54, 2.51s/it] +2025-02-05 23:33:23 - ERROR - stderr - 64%|██████▍ | 14371/22434 [13:25:43<5:40:52, 2.54s/it] +2025-02-05 23:33:23 - ERROR - stderr - +2025-02-05 23:33:23 - ERROR - stderr - +2025-02-05 23:33:23 - INFO - stdout - {'loss': 0.6127, 
'grad_norm': 1.1269407272338867, 'learning_rate': 6.044161910028431e-06, 'epoch': 1.92} +2025-02-05 23:33:23 - ERROR - stderr - 64%|██████▍ | 14371/22434 [13:25:43<5:40:52, 2.54s/it] +2025-02-05 23:33:26 - ERROR - stderr - 64%|██████▍ | 14372/22434 [13:25:46<5:53:14, 2.63s/it] +2025-02-05 23:33:26 - ERROR - stderr - +2025-02-05 23:33:26 - ERROR - stderr - +2025-02-05 23:33:26 - INFO - stdout - {'loss': 0.6385, 'grad_norm': 1.1464072465896606, 'learning_rate': 6.0428359712179485e-06, 'epoch': 1.92} +2025-02-05 23:33:26 - ERROR - stderr - 64%|██████▍ | 14372/22434 [13:25:46<5:53:14, 2.63s/it] +2025-02-05 23:33:28 - ERROR - stderr - 64%|██████▍ | 14373/22434 [13:25:48<5:48:47, 2.60s/it] +2025-02-05 23:33:28 - ERROR - stderr - +2025-02-05 23:33:28 - ERROR - stderr - +2025-02-05 23:33:28 - INFO - stdout - {'loss': 0.721, 'grad_norm': 1.3393694162368774, 'learning_rate': 6.041510114890765e-06, 'epoch': 1.92} +2025-02-05 23:33:28 - ERROR - stderr - 64%|██████▍ | 14373/22434 [13:25:48<5:48:47, 2.60s/it] +2025-02-05 23:33:31 - ERROR - stderr - 64%|██████▍ | 14374/22434 [13:25:51<5:43:07, 2.55s/it] +2025-02-05 23:33:31 - ERROR - stderr - +2025-02-05 23:33:31 - ERROR - stderr - +2025-02-05 23:33:31 - INFO - stdout - {'loss': 0.7169, 'grad_norm': 1.2954378128051758, 'learning_rate': 6.040184341074511e-06, 'epoch': 1.92} +2025-02-05 23:33:31 - ERROR - stderr - 64%|██████▍ | 14374/22434 [13:25:51<5:43:07, 2.55s/it] +2025-02-05 23:33:33 - ERROR - stderr - 64%|██████▍ | 14375/22434 [13:25:53<5:40:46, 2.54s/it] +2025-02-05 23:33:33 - ERROR - stderr - +2025-02-05 23:33:33 - ERROR - stderr - +2025-02-05 23:33:33 - INFO - stdout - {'loss': 0.5934, 'grad_norm': 1.1914572715759277, 'learning_rate': 6.038858649796827e-06, 'epoch': 1.92} +2025-02-05 23:33:33 - ERROR - stderr - 64%|██████▍ | 14375/22434 [13:25:53<5:40:46, 2.54s/it] +2025-02-05 23:33:36 - ERROR - stderr - 64%|██████▍ | 14376/22434 [13:25:56<5:56:41, 2.66s/it] +2025-02-05 23:33:36 - ERROR - stderr - +2025-02-05 23:33:36 - 
ERROR - stderr - +2025-02-05 23:33:36 - INFO - stdout - {'loss': 0.7007, 'grad_norm': 1.3840868473052979, 'learning_rate': 6.037533041085346e-06, 'epoch': 1.92} +2025-02-05 23:33:36 - ERROR - stderr - 64%|██████▍ | 14376/22434 [13:25:56<5:56:41, 2.66s/it] +2025-02-05 23:33:39 - ERROR - stderr - 64%|██████▍ | 14377/22434 [13:25:59<5:48:19, 2.59s/it] +2025-02-05 23:33:39 - ERROR - stderr - +2025-02-05 23:33:39 - ERROR - stderr - +2025-02-05 23:33:39 - INFO - stdout - {'loss': 0.7508, 'grad_norm': 1.3793399333953857, 'learning_rate': 6.0362075149676935e-06, 'epoch': 1.92} +2025-02-05 23:33:39 - ERROR - stderr - 64%|██████▍ | 14377/22434 [13:25:59<5:48:19, 2.59s/it] +2025-02-05 23:33:41 - ERROR - stderr - 64%|██████▍ | 14378/22434 [13:26:01<5:41:30, 2.54s/it] +2025-02-05 23:33:41 - ERROR - stderr - +2025-02-05 23:33:41 - ERROR - stderr - +2025-02-05 23:33:41 - INFO - stdout - {'loss': 0.7011, 'grad_norm': 1.292108178138733, 'learning_rate': 6.034882071471506e-06, 'epoch': 1.92} +2025-02-05 23:33:41 - ERROR - stderr - 64%|██████▍ | 14378/22434 [13:26:01<5:41:30, 2.54s/it] +2025-02-05 23:33:44 - ERROR - stderr - 64%|██████▍ | 14379/22434 [13:26:03<5:38:51, 2.52s/it] +2025-02-05 23:33:44 - ERROR - stderr - +2025-02-05 23:33:44 - ERROR - stderr - +2025-02-05 23:33:44 - INFO - stdout - {'loss': 0.6947, 'grad_norm': 1.2115254402160645, 'learning_rate': 6.033556710624404e-06, 'epoch': 1.92} +2025-02-05 23:33:44 - ERROR - stderr - 64%|██████▍ | 14379/22434 [13:26:03<5:38:51, 2.52s/it] +2025-02-05 23:33:46 - ERROR - stderr - 64%|██████▍ | 14380/22434 [13:26:06<5:35:51, 2.50s/it] +2025-02-05 23:33:46 - ERROR - stderr - +2025-02-05 23:33:46 - ERROR - stderr - +2025-02-05 23:33:46 - INFO - stdout - {'loss': 0.7072, 'grad_norm': 1.3485713005065918, 'learning_rate': 6.032231432454021e-06, 'epoch': 1.92} +2025-02-05 23:33:46 - ERROR - stderr - 64%|██████▍ | 14380/22434 [13:26:06<5:35:51, 2.50s/it] +2025-02-05 23:33:49 - ERROR - stderr - 64%|██████▍ | 14381/22434 [13:26:08<5:39:37, 
2.53s/it] +2025-02-05 23:33:49 - ERROR - stderr - +2025-02-05 23:33:49 - ERROR - stderr - +2025-02-05 23:33:49 - INFO - stdout - {'loss': 0.7167, 'grad_norm': 1.2306231260299683, 'learning_rate': 6.0309062369879745e-06, 'epoch': 1.92} +2025-02-05 23:33:49 - ERROR - stderr - 64%|██████▍ | 14381/22434 [13:26:09<5:39:37, 2.53s/it] +2025-02-05 23:33:51 - ERROR - stderr - 64%|██████▍ | 14382/22434 [13:26:11<5:41:19, 2.54s/it] +2025-02-05 23:33:51 - ERROR - stderr - +2025-02-05 23:33:51 - ERROR - stderr - +2025-02-05 23:33:51 - INFO - stdout - {'loss': 0.7788, 'grad_norm': 1.4424158334732056, 'learning_rate': 6.029581124253887e-06, 'epoch': 1.92} +2025-02-05 23:33:51 - ERROR - stderr - 64%|██████▍ | 14382/22434 [13:26:11<5:41:19, 2.54s/it] +2025-02-05 23:33:54 - ERROR - stderr - 64%|██████▍ | 14383/22434 [13:26:14<5:44:24, 2.57s/it] +2025-02-05 23:33:54 - ERROR - stderr - +2025-02-05 23:33:54 - ERROR - stderr - +2025-02-05 23:33:54 - INFO - stdout - {'loss': 0.7956, 'grad_norm': 1.469840168952942, 'learning_rate': 6.028256094279387e-06, 'epoch': 1.92} +2025-02-05 23:33:54 - ERROR - stderr - 64%|██████▍ | 14383/22434 [13:26:14<5:44:24, 2.57s/it] +2025-02-05 23:33:57 - ERROR - stderr - 64%|██████▍ | 14384/22434 [13:26:16<5:48:49, 2.60s/it] +2025-02-05 23:33:57 - ERROR - stderr - +2025-02-05 23:33:57 - ERROR - stderr - +2025-02-05 23:33:57 - INFO - stdout - {'loss': 0.753, 'grad_norm': 1.354666829109192, 'learning_rate': 6.026931147092088e-06, 'epoch': 1.92} +2025-02-05 23:33:57 - ERROR - stderr - 64%|██████▍ | 14384/22434 [13:26:16<5:48:49, 2.60s/it] +2025-02-05 23:33:59 - ERROR - stderr - 64%|██████▍ | 14385/22434 [13:26:19<5:47:33, 2.59s/it] +2025-02-05 23:33:59 - ERROR - stderr - +2025-02-05 23:33:59 - ERROR - stderr - +2025-02-05 23:33:59 - INFO - stdout - {'loss': 0.6157, 'grad_norm': 1.2361111640930176, 'learning_rate': 6.025606282719603e-06, 'epoch': 1.92} +2025-02-05 23:33:59 - ERROR - stderr - 64%|██████▍ | 14385/22434 [13:26:19<5:47:33, 2.59s/it] +2025-02-05 
23:34:02 - ERROR - stderr - 64%|██████▍ | 14386/22434 [13:26:22<5:53:09, 2.63s/it] +2025-02-05 23:34:02 - ERROR - stderr - +2025-02-05 23:34:02 - ERROR - stderr - +2025-02-05 23:34:02 - INFO - stdout - {'loss': 0.6902, 'grad_norm': 1.274280071258545, 'learning_rate': 6.024281501189555e-06, 'epoch': 1.92} +2025-02-05 23:34:02 - ERROR - stderr - 64%|██████▍ | 14386/22434 [13:26:22<5:53:09, 2.63s/it] +2025-02-05 23:34:04 - ERROR - stderr - 64%|██████▍ | 14387/22434 [13:26:24<5:49:32, 2.61s/it] +2025-02-05 23:34:04 - ERROR - stderr - +2025-02-05 23:34:04 - ERROR - stderr - +2025-02-05 23:34:04 - INFO - stdout - {'loss': 0.7104, 'grad_norm': 1.297232747077942, 'learning_rate': 6.022956802529552e-06, 'epoch': 1.92} +2025-02-05 23:34:04 - ERROR - stderr - 64%|██████▍ | 14387/22434 [13:26:24<5:49:32, 2.61s/it] +2025-02-05 23:34:07 - ERROR - stderr - 64%|██████▍ | 14388/22434 [13:26:27<5:44:47, 2.57s/it] +2025-02-05 23:34:07 - ERROR - stderr - +2025-02-05 23:34:07 - ERROR - stderr - +2025-02-05 23:34:07 - INFO - stdout - {'loss': 0.6887, 'grad_norm': 1.2780399322509766, 'learning_rate': 6.02163218676721e-06, 'epoch': 1.92} +2025-02-05 23:34:07 - ERROR - stderr - 64%|██████▍ | 14388/22434 [13:26:27<5:44:47, 2.57s/it] +2025-02-05 23:34:09 - ERROR - stderr - 64%|██████▍ | 14389/22434 [13:26:29<5:41:45, 2.55s/it] +2025-02-05 23:34:09 - ERROR - stderr - +2025-02-05 23:34:09 - ERROR - stderr - +2025-02-05 23:34:09 - INFO - stdout - {'loss': 0.6967, 'grad_norm': 1.2703601121902466, 'learning_rate': 6.020307653930141e-06, 'epoch': 1.92} +2025-02-05 23:34:09 - ERROR - stderr - 64%|██████▍ | 14389/22434 [13:26:29<5:41:45, 2.55s/it] +2025-02-05 23:34:12 - ERROR - stderr - 64%|██████▍ | 14390/22434 [13:26:32<5:37:32, 2.52s/it] +2025-02-05 23:34:12 - ERROR - stderr - +2025-02-05 23:34:12 - ERROR - stderr - +2025-02-05 23:34:12 - INFO - stdout - {'loss': 0.6301, 'grad_norm': 1.2578415870666504, 'learning_rate': 6.018983204045946e-06, 'epoch': 1.92} +2025-02-05 23:34:12 - ERROR - stderr
- 64%|██████▍ | 14390/22434 [13:26:32<5:37:32, 2.52s/it] +2025-02-05 23:34:14 - ERROR - stderr - 64%|██████▍ | 14391/22434 [13:26:34<5:37:51, 2.52s/it] +2025-02-05 23:34:14 - ERROR - stderr - +2025-02-05 23:34:14 - ERROR - stderr - +2025-02-05 23:34:14 - INFO - stdout - {'loss': 0.6736, 'grad_norm': 1.1512683629989624, 'learning_rate': 6.017658837142242e-06, 'epoch': 1.92} +2025-02-05 23:34:14 - ERROR - stderr - 64%|██████▍ | 14391/22434 [13:26:34<5:37:51, 2.52s/it] +2025-02-05 23:34:17 - ERROR - stderr - 64%|██████▍ | 14392/22434 [13:26:37<5:41:07, 2.55s/it] +2025-02-05 23:34:17 - ERROR - stderr - +2025-02-05 23:34:17 - ERROR - stderr - +2025-02-05 23:34:17 - INFO - stdout - {'loss': 0.5888, 'grad_norm': 1.0372085571289062, 'learning_rate': 6.016334553246628e-06, 'epoch': 1.92} +2025-02-05 23:34:17 - ERROR - stderr - 64%|██████▍ | 14392/22434 [13:26:37<5:41:07, 2.55s/it] +2025-02-05 23:34:20 - ERROR - stderr - 64%|██████▍ | 14393/22434 [13:26:39<5:42:53, 2.56s/it] +2025-02-05 23:34:20 - ERROR - stderr - +2025-02-05 23:34:20 - ERROR - stderr - +2025-02-05 23:34:20 - INFO - stdout - {'loss': 0.7056, 'grad_norm': 1.293728232383728, 'learning_rate': 6.015010352386703e-06, 'epoch': 1.92} +2025-02-05 23:34:20 - ERROR - stderr - 64%|██████▍ | 14393/22434 [13:26:39<5:42:53, 2.56s/it] +2025-02-05 23:34:22 - ERROR - stderr - 64%|██████▍ | 14394/22434 [13:26:42<5:41:04, 2.55s/it] +2025-02-05 23:34:22 - ERROR - stderr - +2025-02-05 23:34:22 - ERROR - stderr - +2025-02-05 23:34:22 - INFO - stdout - {'loss': 0.7313, 'grad_norm': 1.252553105354309, 'learning_rate': 6.013686234590077e-06, 'epoch': 1.92} +2025-02-05 23:34:22 - ERROR - stderr - 64%|██████▍ | 14394/22434 [13:26:42<5:41:04, 2.55s/it] +2025-02-05 23:34:25 - ERROR - stderr - 64%|██████▍ | 14395/22434 [13:26:44<5:37:35, 2.52s/it] +2025-02-05 23:34:25 - ERROR - stderr - +2025-02-05 23:34:25 - ERROR - stderr - +2025-02-05 23:34:25 - INFO - stdout - {'loss': 0.6276, 'grad_norm': 1.3527165651321411, 'learning_rate': 
6.012362199884345e-06, 'epoch': 1.92} +2025-02-05 23:34:25 - ERROR - stderr - 64%|██████▍ | 14395/22434 [13:26:44<5:37:35, 2.52s/it] +2025-02-05 23:34:27 - ERROR - stderr - 64%|██████▍ | 14396/22434 [13:26:47<5:34:54, 2.50s/it] +2025-02-05 23:34:27 - ERROR - stderr - +2025-02-05 23:34:27 - ERROR - stderr - +2025-02-05 23:34:27 - INFO - stdout - {'loss': 0.5849, 'grad_norm': 1.0362600088119507, 'learning_rate': 6.011038248297112e-06, 'epoch': 1.93} +2025-02-05 23:34:27 - ERROR - stderr - 64%|██████▍ | 14396/22434 [13:26:47<5:34:54, 2.50s/it] +2025-02-05 23:34:30 - ERROR - stderr - 64%|██████▍ | 14397/22434 [13:26:49<5:39:51, 2.54s/it] +2025-02-05 23:34:30 - ERROR - stderr - +2025-02-05 23:34:30 - ERROR - stderr - +2025-02-05 23:34:30 - INFO - stdout - {'loss': 0.7018, 'grad_norm': 1.331507921218872, 'learning_rate': 6.009714379855969e-06, 'epoch': 1.93} +2025-02-05 23:34:30 - ERROR - stderr - 64%|██████▍ | 14397/22434 [13:26:49<5:39:51, 2.54s/it] +2025-02-05 23:34:32 - ERROR - stderr - 64%|██████▍ | 14398/22434 [13:26:52<5:39:48, 2.54s/it] +2025-02-05 23:34:32 - ERROR - stderr - +2025-02-05 23:34:32 - ERROR - stderr - +2025-02-05 23:34:32 - INFO - stdout - {'loss': 0.6201, 'grad_norm': 1.1518824100494385, 'learning_rate': 6.008390594588508e-06, 'epoch': 1.93} +2025-02-05 23:34:32 - ERROR - stderr - 64%|██████▍ | 14398/22434 [13:26:52<5:39:48, 2.54s/it] +2025-02-05 23:34:35 - ERROR - stderr - 64%|██████▍ | 14399/22434 [13:26:54<5:38:24, 2.53s/it] +2025-02-05 23:34:35 - ERROR - stderr - +2025-02-05 23:34:35 - ERROR - stderr - +2025-02-05 23:34:35 - INFO - stdout - {'loss': 0.6928, 'grad_norm': 1.2447253465652466, 'learning_rate': 6.007066892522328e-06, 'epoch': 1.93} +2025-02-05 23:34:35 - ERROR - stderr - 64%|██████▍ | 14399/22434 [13:26:54<5:38:24, 2.53s/it] +2025-02-05 23:34:37 - ERROR - stderr - 64%|██████▍ | 14400/22434 [13:26:57<5:37:50, 2.52s/it] +2025-02-05 23:34:37 - ERROR - stderr - +2025-02-05 23:34:37 - ERROR - stderr - +2025-02-05 23:34:37 - INFO - stdout 
- {'loss': 0.64, 'grad_norm': 1.1865956783294678, 'learning_rate': 6.005743273685017e-06, 'epoch': 1.93} +2025-02-05 23:34:37 - ERROR - stderr - 64%|██████▍ | 14400/22434 [13:26:57<5:37:50, 2.52s/it] +2025-02-05 23:34:40 - ERROR - stderr - 64%|██████▍ | 14401/22434 [13:26:59<5:36:01, 2.51s/it] +2025-02-05 23:34:40 - ERROR - stderr - +2025-02-05 23:34:40 - ERROR - stderr - +2025-02-05 23:34:40 - INFO - stdout - {'loss': 0.7649, 'grad_norm': 1.4203073978424072, 'learning_rate': 6.004419738104164e-06, 'epoch': 1.93} +2025-02-05 23:34:40 - ERROR - stderr - 64%|██████▍ | 14401/22434 [13:26:59<5:36:01, 2.51s/it] +2025-02-05 23:34:42 - ERROR - stderr - 64%|██████▍ | 14402/22434 [13:27:02<5:33:05, 2.49s/it] +2025-02-05 23:34:42 - ERROR - stderr - +2025-02-05 23:34:42 - ERROR - stderr - +2025-02-05 23:34:42 - INFO - stdout - {'loss': 0.7404, 'grad_norm': 1.372672438621521, 'learning_rate': 6.0030962858073615e-06, 'epoch': 1.93} +2025-02-05 23:34:42 - ERROR - stderr - 64%|██████▍ | 14402/22434 [13:27:02<5:33:05, 2.49s/it] +2025-02-05 23:34:45 - ERROR - stderr - 64%|██████▍ | 14403/22434 [13:27:04<5:30:25, 2.47s/it] +2025-02-05 23:34:45 - ERROR - stderr - +2025-02-05 23:34:45 - ERROR - stderr - +2025-02-05 23:34:45 - INFO - stdout - {'loss': 0.6997, 'grad_norm': 1.2716954946517944, 'learning_rate': 6.001772916822188e-06, 'epoch': 1.93} +2025-02-05 23:34:45 - ERROR - stderr - 64%|██████▍ | 14403/22434 [13:27:04<5:30:25, 2.47s/it] +2025-02-05 23:34:47 - ERROR - stderr - 64%|██████▍ | 14404/22434 [13:27:07<5:33:04, 2.49s/it] +2025-02-05 23:34:47 - ERROR - stderr - +2025-02-05 23:34:47 - ERROR - stderr - +2025-02-05 23:34:47 - INFO - stdout - {'loss': 0.6312, 'grad_norm': 1.1688146591186523, 'learning_rate': 6.0004496311762365e-06, 'epoch': 1.93} +2025-02-05 23:34:47 - ERROR - stderr - 64%|██████▍ | 14404/22434 [13:27:07<5:33:04, 2.49s/it] +2025-02-05 23:34:50 - ERROR - stderr - 64%|██████▍ | 14405/22434 [13:27:09<5:36:45, 2.52s/it] +2025-02-05 23:34:50 - ERROR - stderr - 
+2025-02-05 23:34:50 - ERROR - stderr - +2025-02-05 23:34:50 - INFO - stdout - {'loss': 0.6903, 'grad_norm': 1.327919602394104, 'learning_rate': 5.999126428897085e-06, 'epoch': 1.93} +2025-02-05 23:34:50 - ERROR - stderr - 64%|██████▍ | 14405/22434 [13:27:09<5:36:45, 2.52s/it] +2025-02-05 23:34:52 - ERROR - stderr - 64%|██████▍ | 14406/22434 [13:27:12<5:36:12, 2.51s/it] +2025-02-05 23:34:52 - ERROR - stderr - +2025-02-05 23:34:52 - ERROR - stderr - +2025-02-05 23:34:52 - INFO - stdout - {'loss': 0.6997, 'grad_norm': 1.2755934000015259, 'learning_rate': 5.9978033100123115e-06, 'epoch': 1.93} +2025-02-05 23:34:52 - ERROR - stderr - 64%|██████▍ | 14406/22434 [13:27:12<5:36:12, 2.51s/it] +2025-02-05 23:34:55 - ERROR - stderr - 64%|██████▍ | 14407/22434 [13:27:15<5:39:02, 2.53s/it] +2025-02-05 23:34:55 - ERROR - stderr - +2025-02-05 23:34:55 - ERROR - stderr - +2025-02-05 23:34:55 - INFO - stdout - {'loss': 0.7452, 'grad_norm': 1.2396315336227417, 'learning_rate': 5.9964802745494986e-06, 'epoch': 1.93} +2025-02-05 23:34:55 - ERROR - stderr - 64%|██████▍ | 14407/22434 [13:27:15<5:39:02, 2.53s/it] +2025-02-05 23:34:57 - ERROR - stderr - 64%|██████▍ | 14408/22434 [13:27:17<5:37:25, 2.52s/it] +2025-02-05 23:34:57 - ERROR - stderr - +2025-02-05 23:34:57 - ERROR - stderr - +2025-02-05 23:34:57 - INFO - stdout - {'loss': 0.6824, 'grad_norm': 1.3341940641403198, 'learning_rate': 5.995157322536227e-06, 'epoch': 1.93} +2025-02-05 23:34:57 - ERROR - stderr - 64%|██████▍ | 14408/22434 [13:27:17<5:37:25, 2.52s/it] +2025-02-05 23:35:00 - ERROR - stderr - 64%|██████▍ | 14409/22434 [13:27:20<5:38:14, 2.53s/it] +2025-02-05 23:35:00 - ERROR - stderr - +2025-02-05 23:35:00 - ERROR - stderr - +2025-02-05 23:35:00 - INFO - stdout - {'loss': 0.6071, 'grad_norm': 1.1795096397399902, 'learning_rate': 5.993834454000065e-06, 'epoch': 1.93} +2025-02-05 23:35:00 - ERROR - stderr - 64%|██████▍ | 14409/22434 [13:27:20<5:38:14, 2.53s/it] +2025-02-05 23:35:02 - ERROR - stderr - 64%|██████▍ | 
14410/22434 [13:27:22<5:39:12, 2.54s/it] +2025-02-05 23:35:02 - ERROR - stderr - +2025-02-05 23:35:02 - ERROR - stderr - +2025-02-05 23:35:02 - INFO - stdout - {'loss': 0.6722, 'grad_norm': 1.1885406970977783, 'learning_rate': 5.9925116689685925e-06, 'epoch': 1.93} +2025-02-05 23:35:02 - ERROR - stderr - 64%|██████▍ | 14410/22434 [13:27:22<5:39:12, 2.54s/it] +2025-02-05 23:35:05 - ERROR - stderr - 64%|██████▍ | 14411/22434 [13:27:25<5:39:21, 2.54s/it] +2025-02-05 23:35:05 - ERROR - stderr - +2025-02-05 23:35:05 - ERROR - stderr - +2025-02-05 23:35:05 - INFO - stdout - {'loss': 0.6556, 'grad_norm': 1.2883967161178589, 'learning_rate': 5.991188967469377e-06, 'epoch': 1.93} +2025-02-05 23:35:05 - ERROR - stderr - 64%|██████▍ | 14411/22434 [13:27:25<5:39:21, 2.54s/it] +2025-02-05 23:35:07 - ERROR - stderr - 64%|██████▍ | 14412/22434 [13:27:27<5:40:50, 2.55s/it] +2025-02-05 23:35:07 - ERROR - stderr - +2025-02-05 23:35:07 - ERROR - stderr - +2025-02-05 23:35:07 - INFO - stdout - {'loss': 0.7229, 'grad_norm': 1.4747883081436157, 'learning_rate': 5.989866349529994e-06, 'epoch': 1.93} +2025-02-05 23:35:07 - ERROR - stderr - 64%|██████▍ | 14412/22434 [13:27:27<5:40:50, 2.55s/it] +2025-02-05 23:35:10 - ERROR - stderr - 64%|██████▍ | 14413/22434 [13:27:30<5:38:04, 2.53s/it] +2025-02-05 23:35:10 - ERROR - stderr - +2025-02-05 23:35:10 - ERROR - stderr - +2025-02-05 23:35:10 - INFO - stdout - {'loss': 0.7761, 'grad_norm': 1.461599588394165, 'learning_rate': 5.98854381517801e-06, 'epoch': 1.93} +2025-02-05 23:35:10 - ERROR - stderr - 64%|██████▍ | 14413/22434 [13:27:30<5:38:04, 2.53s/it] +2025-02-05 23:35:13 - ERROR - stderr - 64%|██████▍ | 14414/22434 [13:27:32<5:42:29, 2.56s/it] +2025-02-05 23:35:13 - ERROR - stderr - +2025-02-05 23:35:13 - ERROR - stderr - +2025-02-05 23:35:13 - INFO - stdout - {'loss': 0.7585, 'grad_norm': 1.4269999265670776, 'learning_rate': 5.987221364440987e-06, 'epoch': 1.93} +2025-02-05 23:35:13 - ERROR - stderr - 64%|██████▍ | 14414/22434 
[13:27:32<5:42:29, 2.56s/it] +2025-02-05 23:35:15 - ERROR - stderr - 64%|██████▍ | 14415/22434 [13:27:35<5:47:19, 2.60s/it] +2025-02-05 23:35:15 - ERROR - stderr - +2025-02-05 23:35:15 - ERROR - stderr - +2025-02-05 23:35:15 - INFO - stdout - {'loss': 0.5657, 'grad_norm': 1.0483715534210205, 'learning_rate': 5.985898997346501e-06, 'epoch': 1.93} +2025-02-05 23:35:15 - ERROR - stderr - 64%|██████▍ | 14415/22434 [13:27:35<5:47:19, 2.60s/it] +2025-02-05 23:35:18 - ERROR - stderr - 64%|██████▍ | 14416/22434 [13:27:38<5:45:52, 2.59s/it] +2025-02-05 23:35:18 - ERROR - stderr - +2025-02-05 23:35:18 - ERROR - stderr - +2025-02-05 23:35:18 - INFO - stdout - {'loss': 0.6534, 'grad_norm': 1.1475498676300049, 'learning_rate': 5.984576713922108e-06, 'epoch': 1.93} +2025-02-05 23:35:18 - ERROR - stderr - 64%|██████▍ | 14416/22434 [13:27:38<5:45:52, 2.59s/it] +2025-02-05 23:35:20 - ERROR - stderr - 64%|██████▍ | 14417/22434 [13:27:40<5:39:38, 2.54s/it] +2025-02-05 23:35:20 - ERROR - stderr - +2025-02-05 23:35:20 - ERROR - stderr - +2025-02-05 23:35:20 - INFO - stdout - {'loss': 0.6485, 'grad_norm': 1.126628041267395, 'learning_rate': 5.983254514195368e-06, 'epoch': 1.93} +2025-02-05 23:35:20 - ERROR - stderr - 64%|██████▍ | 14417/22434 [13:27:40<5:39:38, 2.54s/it] +2025-02-05 23:35:23 - ERROR - stderr - 64%|██████▍ | 14418/22434 [13:27:43<5:37:29, 2.53s/it] +2025-02-05 23:35:23 - ERROR - stderr - +2025-02-05 23:35:23 - ERROR - stderr - +2025-02-05 23:35:23 - INFO - stdout - {'loss': 0.6342, 'grad_norm': 1.2385354042053223, 'learning_rate': 5.981932398193848e-06, 'epoch': 1.93} +2025-02-05 23:35:23 - ERROR - stderr - 64%|██████▍ | 14418/22434 [13:27:43<5:37:29, 2.53s/it] +2025-02-05 23:35:25 - ERROR - stderr - 64%|██████▍ | 14419/22434 [13:27:45<5:40:04, 2.55s/it] +2025-02-05 23:35:25 - ERROR - stderr - +2025-02-05 23:35:25 - ERROR - stderr - +2025-02-05 23:35:25 - INFO - stdout - {'loss': 0.7523, 'grad_norm': 1.3059688806533813, 'learning_rate': 5.9806103659450975e-06, 'epoch': 
1.93} +2025-02-05 23:35:25 - ERROR - stderr - 64%|██████▍ | 14419/22434 [13:27:45<5:40:04, 2.55s/it] +2025-02-05 23:35:28 - ERROR - stderr - 64%|██████▍ | 14420/22434 [13:27:48<5:41:21, 2.56s/it] +2025-02-05 23:35:28 - ERROR - stderr - +2025-02-05 23:35:28 - ERROR - stderr - +2025-02-05 23:35:28 - INFO - stdout - {'loss': 0.6714, 'grad_norm': 1.2351291179656982, 'learning_rate': 5.979288417476681e-06, 'epoch': 1.93} +2025-02-05 23:35:28 - ERROR - stderr - 64%|██████▍ | 14420/22434 [13:27:48<5:41:21, 2.56s/it] +2025-02-05 23:35:30 - ERROR - stderr - 64%|██████▍ | 14421/22434 [13:27:50<5:40:25, 2.55s/it] +2025-02-05 23:35:30 - ERROR - stderr - +2025-02-05 23:35:30 - ERROR - stderr - +2025-02-05 23:35:30 - INFO - stdout - {'loss': 0.6966, 'grad_norm': 1.3163154125213623, 'learning_rate': 5.97796655281615e-06, 'epoch': 1.93} +2025-02-05 23:35:30 - ERROR - stderr - 64%|██████▍ | 14421/22434 [13:27:50<5:40:25, 2.55s/it] +2025-02-05 23:35:33 - ERROR - stderr - 64%|██████▍ | 14422/22434 [13:27:53<5:39:15, 2.54s/it] +2025-02-05 23:35:33 - ERROR - stderr - +2025-02-05 23:35:33 - ERROR - stderr - +2025-02-05 23:35:33 - INFO - stdout - {'loss': 0.6204, 'grad_norm': 1.292440414428711, 'learning_rate': 5.976644771991054e-06, 'epoch': 1.93} +2025-02-05 23:35:33 - ERROR - stderr - 64%|██████▍ | 14422/22434 [13:27:53<5:39:15, 2.54s/it] +2025-02-05 23:35:36 - ERROR - stderr - 64%|██████▍ | 14423/22434 [13:27:56<5:55:49, 2.67s/it] +2025-02-05 23:35:36 - ERROR - stderr - +2025-02-05 23:35:36 - ERROR - stderr - +2025-02-05 23:35:36 - INFO - stdout - {'loss': 0.6743, 'grad_norm': 1.1648979187011719, 'learning_rate': 5.9753230750289534e-06, 'epoch': 1.93} +2025-02-05 23:35:36 - ERROR - stderr - 64%|██████▍ | 14423/22434 [13:27:56<5:55:49, 2.67s/it] +2025-02-05 23:35:39 - ERROR - stderr - 64%|██████▍ | 14424/22434 [13:27:58<5:55:20, 2.66s/it] +2025-02-05 23:35:39 - ERROR - stderr - +2025-02-05 23:35:39 - ERROR - stderr - +2025-02-05 23:35:39 - INFO - stdout - {'loss': 0.6957, 'grad_norm': 
1.3234107494354248, 'learning_rate': 5.974001461957392e-06, 'epoch': 1.93} +2025-02-05 23:35:39 - ERROR - stderr - 64%|██████▍ | 14424/22434 [13:27:58<5:55:20, 2.66s/it] +2025-02-05 23:35:42 - ERROR - stderr - 64%|██████▍ | 14425/22434 [13:28:01<6:09:17, 2.77s/it] +2025-02-05 23:35:42 - ERROR - stderr - +2025-02-05 23:35:42 - ERROR - stderr - +2025-02-05 23:35:42 - INFO - stdout - {'loss': 0.6947, 'grad_norm': 1.2569841146469116, 'learning_rate': 5.972679932803912e-06, 'epoch': 1.93} +2025-02-05 23:35:42 - ERROR - stderr - 64%|██████▍ | 14425/22434 [13:28:01<6:09:17, 2.77s/it] +2025-02-05 23:35:44 - ERROR - stderr - 64%|██████▍ | 14426/22434 [13:28:04<6:10:11, 2.77s/it] +2025-02-05 23:35:44 - ERROR - stderr - +2025-02-05 23:35:44 - ERROR - stderr - +2025-02-05 23:35:44 - INFO - stdout - {'loss': 0.6446, 'grad_norm': 1.206868052482605, 'learning_rate': 5.971358487596068e-06, 'epoch': 1.93} +2025-02-05 23:35:44 - ERROR - stderr - 64%|██████▍ | 14426/22434 [13:28:04<6:10:11, 2.77s/it] +2025-02-05 23:35:47 - ERROR - stderr - 64%|██████▍ | 14427/22434 [13:28:07<6:03:08, 2.72s/it] +2025-02-05 23:35:47 - ERROR - stderr - +2025-02-05 23:35:47 - ERROR - stderr - +2025-02-05 23:35:47 - INFO - stdout - {'loss': 0.6709, 'grad_norm': 1.3475213050842285, 'learning_rate': 5.970037126361399e-06, 'epoch': 1.93} +2025-02-05 23:35:47 - ERROR - stderr - 64%|██████▍ | 14427/22434 [13:28:07<6:03:08, 2.72s/it] +2025-02-05 23:35:50 - ERROR - stderr - 64%|██████▍ | 14428/22434 [13:28:10<6:13:50, 2.80s/it] +2025-02-05 23:35:50 - ERROR - stderr - +2025-02-05 23:35:50 - ERROR - stderr - +2025-02-05 23:35:50 - INFO - stdout - {'loss': 0.6257, 'grad_norm': 1.2687031030654907, 'learning_rate': 5.968715849127454e-06, 'epoch': 1.93} +2025-02-05 23:35:50 - ERROR - stderr - 64%|██████▍ | 14428/22434 [13:28:10<6:13:50, 2.80s/it] +2025-02-05 23:35:53 - ERROR - stderr - 64%|██████▍ | 14429/22434 [13:28:12<6:08:18, 2.76s/it] +2025-02-05 23:35:53 - ERROR - stderr - +2025-02-05 23:35:53 - ERROR - stderr - 
+2025-02-05 23:35:53 - INFO - stdout - {'loss': 0.6668, 'grad_norm': 1.2240290641784668, 'learning_rate': 5.96739465592177e-06, 'epoch': 1.93} +2025-02-05 23:35:53 - ERROR - stderr - 64%|██████▍ | 14429/22434 [13:28:12<6:08:18, 2.76s/it] +2025-02-05 23:35:55 - ERROR - stderr - 64%|██████▍ | 14430/22434 [13:28:15<6:00:27, 2.70s/it] +2025-02-05 23:35:55 - ERROR - stderr - +2025-02-05 23:35:55 - ERROR - stderr - +2025-02-05 23:35:55 - INFO - stdout - {'loss': 0.6723, 'grad_norm': 1.268046498298645, 'learning_rate': 5.966073546771882e-06, 'epoch': 1.93} +2025-02-05 23:35:55 - ERROR - stderr - 64%|██████▍ | 14430/22434 [13:28:15<6:00:27, 2.70s/it] +2025-02-05 23:35:58 - ERROR - stderr - 64%|██████▍ | 14431/22434 [13:28:18<5:55:40, 2.67s/it] +2025-02-05 23:35:58 - ERROR - stderr - +2025-02-05 23:35:58 - ERROR - stderr - +2025-02-05 23:35:58 - INFO - stdout - {'loss': 0.6934, 'grad_norm': 1.284865379333496, 'learning_rate': 5.964752521705335e-06, 'epoch': 1.93} +2025-02-05 23:35:58 - ERROR - stderr - 64%|██████▍ | 14431/22434 [13:28:18<5:55:40, 2.67s/it] +2025-02-05 23:36:00 - ERROR - stderr - 64%|██████▍ | 14432/22434 [13:28:20<5:54:06, 2.66s/it] +2025-02-05 23:36:00 - ERROR - stderr - +2025-02-05 23:36:00 - ERROR - stderr - +2025-02-05 23:36:00 - INFO - stdout - {'loss': 0.7276, 'grad_norm': 1.1946617364883423, 'learning_rate': 5.9634315807496565e-06, 'epoch': 1.93} +2025-02-05 23:36:00 - ERROR - stderr - 64%|██████▍ | 14432/22434 [13:28:20<5:54:06, 2.66s/it] +2025-02-05 23:36:03 - ERROR - stderr - 64%|██████▍ | 14433/22434 [13:28:23<5:46:16, 2.60s/it] +2025-02-05 23:36:03 - ERROR - stderr - +2025-02-05 23:36:03 - ERROR - stderr - +2025-02-05 23:36:03 - INFO - stdout - {'loss': 0.6448, 'grad_norm': 1.286138653755188, 'learning_rate': 5.9621107239323835e-06, 'epoch': 1.93} +2025-02-05 23:36:03 - ERROR - stderr - 64%|██████▍ | 14433/22434 [13:28:23<5:46:16, 2.60s/it] +2025-02-05 23:36:05 - ERROR - stderr - 64%|██████▍ | 14434/22434 [13:28:25<5:44:23, 2.58s/it] +2025-02-05 
23:36:05 - ERROR - stderr - +2025-02-05 23:36:05 - ERROR - stderr - +2025-02-05 23:36:05 - INFO - stdout - {'loss': 0.6584, 'grad_norm': 1.2671115398406982, 'learning_rate': 5.960789951281052e-06, 'epoch': 1.93} +2025-02-05 23:36:05 - ERROR - stderr - 64%|██████▍ | 14434/22434 [13:28:25<5:44:23, 2.58s/it] +2025-02-05 23:36:08 - ERROR - stderr - 64%|██████▍ | 14435/22434 [13:28:28<5:40:15, 2.55s/it] +2025-02-05 23:36:08 - ERROR - stderr - +2025-02-05 23:36:08 - ERROR - stderr - +2025-02-05 23:36:08 - INFO - stdout - {'loss': 0.6847, 'grad_norm': 1.3619554042816162, 'learning_rate': 5.9594692628231855e-06, 'epoch': 1.93} +2025-02-05 23:36:08 - ERROR - stderr - 64%|██████▍ | 14435/22434 [13:28:28<5:40:15, 2.55s/it] +2025-02-05 23:36:10 - ERROR - stderr - 64%|██████▍ | 14436/22434 [13:28:30<5:40:57, 2.56s/it] +2025-02-05 23:36:11 - ERROR - stderr - +2025-02-05 23:36:11 - ERROR - stderr - +2025-02-05 23:36:11 - INFO - stdout - {'loss': 0.6617, 'grad_norm': 1.2801129817962646, 'learning_rate': 5.95814865858632e-06, 'epoch': 1.93} +2025-02-05 23:36:11 - ERROR - stderr - 64%|██████▍ | 14436/22434 [13:28:30<5:40:57, 2.56s/it] +2025-02-05 23:36:13 - ERROR - stderr - 64%|██████▍ | 14437/22434 [13:28:33<5:41:35, 2.56s/it] +2025-02-05 23:36:13 - ERROR - stderr - +2025-02-05 23:36:13 - ERROR - stderr - +2025-02-05 23:36:13 - INFO - stdout - {'loss': 0.6717, 'grad_norm': 1.3092200756072998, 'learning_rate': 5.956828138597976e-06, 'epoch': 1.93} +2025-02-05 23:36:13 - ERROR - stderr - 64%|██████▍ | 14437/22434 [13:28:33<5:41:35, 2.56s/it] +2025-02-05 23:36:16 - ERROR - stderr - 64%|██████▍ | 14438/22434 [13:28:35<5:37:57, 2.54s/it] +2025-02-05 23:36:16 - ERROR - stderr - +2025-02-05 23:36:16 - ERROR - stderr - +2025-02-05 23:36:16 - INFO - stdout - {'loss': 0.7211, 'grad_norm': 1.3832833766937256, 'learning_rate': 5.955507702885679e-06, 'epoch': 1.93} +2025-02-05 23:36:16 - ERROR - stderr - 64%|██████▍ | 14438/22434 [13:28:35<5:37:57, 2.54s/it] +2025-02-05 23:36:18 - ERROR - 
stderr - 64%|██████▍ | 14439/22434 [13:28:38<5:36:24, 2.52s/it] +2025-02-05 23:36:18 - ERROR - stderr - +2025-02-05 23:36:18 - ERROR - stderr - +2025-02-05 23:36:18 - INFO - stdout - {'loss': 0.6866, 'grad_norm': 1.3016250133514404, 'learning_rate': 5.954187351476954e-06, 'epoch': 1.93} +2025-02-05 23:36:18 - ERROR - stderr - 64%|██████▍ | 14439/22434 [13:28:38<5:36:24, 2.52s/it] +2025-02-05 23:36:21 - ERROR - stderr - 64%|██████▍ | 14440/22434 [13:28:40<5:35:15, 2.52s/it] +2025-02-05 23:36:21 - ERROR - stderr - +2025-02-05 23:36:21 - ERROR - stderr - +2025-02-05 23:36:21 - INFO - stdout - {'loss': 0.5802, 'grad_norm': 1.1444220542907715, 'learning_rate': 5.952867084399327e-06, 'epoch': 1.93} +2025-02-05 23:36:21 - ERROR - stderr - 64%|██████▍ | 14440/22434 [13:28:40<5:35:15, 2.52s/it] +2025-02-05 23:36:23 - ERROR - stderr - 64%|██████▍ | 14441/22434 [13:28:43<5:35:16, 2.52s/it] +2025-02-05 23:36:23 - ERROR - stderr - +2025-02-05 23:36:23 - ERROR - stderr - +2025-02-05 23:36:23 - INFO - stdout - {'loss': 0.7755, 'grad_norm': 1.3566572666168213, 'learning_rate': 5.951546901680306e-06, 'epoch': 1.93} +2025-02-05 23:36:23 - ERROR - stderr - 64%|██████▍ | 14441/22434 [13:28:43<5:35:16, 2.52s/it] +2025-02-05 23:36:25 - ERROR - stderr - 64%|██████▍ | 14442/22434 [13:28:45<5:31:49, 2.49s/it] +2025-02-05 23:36:26 - ERROR - stderr - +2025-02-05 23:36:26 - ERROR - stderr - +2025-02-05 23:36:26 - INFO - stdout - {'loss': 0.6144, 'grad_norm': 1.1164309978485107, 'learning_rate': 5.950226803347421e-06, 'epoch': 1.93} +2025-02-05 23:36:26 - ERROR - stderr - 64%|██████▍ | 14442/22434 [13:28:45<5:31:49, 2.49s/it] +2025-02-05 23:36:28 - ERROR - stderr - 64%|██████▍ | 14443/22434 [13:28:48<5:31:28, 2.49s/it] +2025-02-05 23:36:28 - ERROR - stderr - +2025-02-05 23:36:28 - ERROR - stderr - +2025-02-05 23:36:28 - INFO - stdout - {'loss': 0.6854, 'grad_norm': 1.231952428817749, 'learning_rate': 5.948906789428179e-06, 'epoch': 1.93} +2025-02-05 23:36:28 - ERROR - stderr - 64%|██████▍ | 
14443/22434 [13:28:48<5:31:28, 2.49s/it] +2025-02-05 23:36:31 - ERROR - stderr - 64%|██████▍ | 14444/22434 [13:28:50<5:36:22, 2.53s/it] +2025-02-05 23:36:31 - ERROR - stderr - +2025-02-05 23:36:31 - ERROR - stderr - +2025-02-05 23:36:31 - INFO - stdout - {'loss': 0.683, 'grad_norm': 1.2558611631393433, 'learning_rate': 5.947586859950103e-06, 'epoch': 1.93} +2025-02-05 23:36:31 - ERROR - stderr - 64%|██████▍ | 14444/22434 [13:28:50<5:36:22, 2.53s/it] +2025-02-05 23:36:33 - ERROR - stderr - 64%|██████▍ | 14445/22434 [13:28:53<5:38:13, 2.54s/it] +2025-02-05 23:36:33 - ERROR - stderr - +2025-02-05 23:36:33 - ERROR - stderr - +2025-02-05 23:36:33 - INFO - stdout - {'loss': 0.7012, 'grad_norm': 1.2705365419387817, 'learning_rate': 5.946267014940699e-06, 'epoch': 1.93} +2025-02-05 23:36:33 - ERROR - stderr - 64%|██████▍ | 14445/22434 [13:28:53<5:38:13, 2.54s/it] +2025-02-05 23:36:36 - ERROR - stderr - 64%|██████▍ | 14446/22434 [13:28:55<5:38:20, 2.54s/it] +2025-02-05 23:36:36 - ERROR - stderr - +2025-02-05 23:36:36 - ERROR - stderr - +2025-02-05 23:36:36 - INFO - stdout - {'loss': 0.7129, 'grad_norm': 1.2902936935424805, 'learning_rate': 5.944947254427478e-06, 'epoch': 1.93} +2025-02-05 23:36:36 - ERROR - stderr - 64%|██████▍ | 14446/22434 [13:28:55<5:38:20, 2.54s/it] +2025-02-05 23:36:38 - ERROR - stderr - 64%|██████▍ | 14447/22434 [13:28:58<5:39:38, 2.55s/it] +2025-02-05 23:36:38 - ERROR - stderr - +2025-02-05 23:36:38 - ERROR - stderr - +2025-02-05 23:36:38 - INFO - stdout - {'loss': 0.6558, 'grad_norm': 1.2105505466461182, 'learning_rate': 5.943627578437955e-06, 'epoch': 1.93} +2025-02-05 23:36:38 - ERROR - stderr - 64%|██████▍ | 14447/22434 [13:28:58<5:39:38, 2.55s/it] +2025-02-05 23:36:41 - ERROR - stderr - 64%|██████▍ | 14448/22434 [13:29:01<5:36:57, 2.53s/it] +2025-02-05 23:36:41 - ERROR - stderr - +2025-02-05 23:36:41 - ERROR - stderr - +2025-02-05 23:36:41 - INFO - stdout - {'loss': 0.6234, 'grad_norm': 1.1718028783798218, 'learning_rate': 5.942307986999629e-06, 
'epoch': 1.93} +2025-02-05 23:36:41 - ERROR - stderr - 64%|██████▍ | 14448/22434 [13:29:01<5:36:57, 2.53s/it] +2025-02-05 23:36:43 - ERROR - stderr - 64%|██████▍ | 14449/22434 [13:29:03<5:33:22, 2.50s/it] +2025-02-05 23:36:43 - ERROR - stderr - +2025-02-05 23:36:43 - ERROR - stderr - +2025-02-05 23:36:43 - INFO - stdout - {'loss': 0.7149, 'grad_norm': 1.2239034175872803, 'learning_rate': 5.9409884801400155e-06, 'epoch': 1.93} +2025-02-05 23:36:43 - ERROR - stderr - 64%|██████▍ | 14449/22434 [13:29:03<5:33:22, 2.50s/it] +2025-02-05 23:36:46 - ERROR - stderr - 64%|██████▍ | 14450/22434 [13:29:05<5:30:54, 2.49s/it] +2025-02-05 23:36:46 - ERROR - stderr - +2025-02-05 23:36:46 - ERROR - stderr - +2025-02-05 23:36:46 - INFO - stdout - {'loss': 0.6828, 'grad_norm': 1.214671015739441, 'learning_rate': 5.939669057886612e-06, 'epoch': 1.93} +2025-02-05 23:36:46 - ERROR - stderr - 64%|██████▍ | 14450/22434 [13:29:05<5:30:54, 2.49s/it] +2025-02-05 23:36:48 - ERROR - stderr - 64%|██████▍ | 14451/22434 [13:29:08<5:32:32, 2.50s/it] +2025-02-05 23:36:48 - ERROR - stderr - +2025-02-05 23:36:48 - ERROR - stderr - +2025-02-05 23:36:48 - INFO - stdout - {'loss': 0.615, 'grad_norm': 1.1674202680587769, 'learning_rate': 5.938349720266918e-06, 'epoch': 1.93} +2025-02-05 23:36:48 - ERROR - stderr - 64%|██████▍ | 14451/22434 [13:29:08<5:32:32, 2.50s/it] +2025-02-05 23:36:51 - ERROR - stderr - 64%|██████▍ | 14452/22434 [13:29:11<5:36:27, 2.53s/it] +2025-02-05 23:36:51 - ERROR - stderr - +2025-02-05 23:36:51 - ERROR - stderr - +2025-02-05 23:36:51 - INFO - stdout - {'loss': 0.6554, 'grad_norm': 1.2422764301300049, 'learning_rate': 5.93703046730844e-06, 'epoch': 1.93} +2025-02-05 23:36:51 - ERROR - stderr - 64%|██████▍ | 14452/22434 [13:29:11<5:36:27, 2.53s/it] +2025-02-05 23:36:53 - ERROR - stderr - 64%|██████▍ | 14453/22434 [13:29:13<5:36:40, 2.53s/it] +2025-02-05 23:36:53 - ERROR - stderr - +2025-02-05 23:36:53 - ERROR - stderr - +2025-02-05 23:36:53 - INFO - stdout - {'loss': 0.6075, 
'grad_norm': 1.1725634336471558, 'learning_rate': 5.935711299038676e-06, 'epoch': 1.93} +2025-02-05 23:36:53 - ERROR - stderr - 64%|██████▍ | 14453/22434 [13:29:13<5:36:40, 2.53s/it] +2025-02-05 23:36:56 - ERROR - stderr - 64%|██████▍ | 14454/22434 [13:29:16<5:39:24, 2.55s/it] +2025-02-05 23:36:56 - ERROR - stderr - +2025-02-05 23:36:56 - ERROR - stderr - +2025-02-05 23:36:56 - INFO - stdout - {'loss': 0.688, 'grad_norm': 1.3666969537734985, 'learning_rate': 5.934392215485117e-06, 'epoch': 1.93} +2025-02-05 23:36:56 - ERROR - stderr - 64%|██████▍ | 14454/22434 [13:29:16<5:39:24, 2.55s/it] +2025-02-05 23:36:59 - ERROR - stderr - 64%|██████▍ | 14455/22434 [13:29:18<5:42:23, 2.57s/it] +2025-02-05 23:36:59 - ERROR - stderr - +2025-02-05 23:36:59 - ERROR - stderr - +2025-02-05 23:36:59 - INFO - stdout - {'loss': 0.6646, 'grad_norm': 1.3080909252166748, 'learning_rate': 5.933073216675265e-06, 'epoch': 1.93} +2025-02-05 23:36:59 - ERROR - stderr - 64%|██████▍ | 14455/22434 [13:29:18<5:42:23, 2.57s/it] +2025-02-05 23:37:01 - ERROR - stderr - 64%|██████▍ | 14456/22434 [13:29:21<5:38:21, 2.54s/it] +2025-02-05 23:37:01 - ERROR - stderr - +2025-02-05 23:37:01 - ERROR - stderr - +2025-02-05 23:37:01 - INFO - stdout - {'loss': 0.6627, 'grad_norm': 1.3700485229492188, 'learning_rate': 5.931754302636606e-06, 'epoch': 1.93} +2025-02-05 23:37:01 - ERROR - stderr - 64%|██████▍ | 14456/22434 [13:29:21<5:38:21, 2.54s/it] +2025-02-05 23:37:04 - ERROR - stderr - 64%|██████▍ | 14457/22434 [13:29:23<5:42:40, 2.58s/it] +2025-02-05 23:37:04 - ERROR - stderr - +2025-02-05 23:37:04 - ERROR - stderr - +2025-02-05 23:37:04 - INFO - stdout - {'loss': 0.7792, 'grad_norm': 1.3341292142868042, 'learning_rate': 5.93043547339664e-06, 'epoch': 1.93} +2025-02-05 23:37:04 - ERROR - stderr - 64%|██████▍ | 14457/22434 [13:29:23<5:42:40, 2.58s/it] +2025-02-05 23:37:06 - ERROR - stderr - 64%|██████▍ | 14458/22434 [13:29:26<5:38:47, 2.55s/it] +2025-02-05 23:37:06 - ERROR - stderr - +2025-02-05 23:37:06 - 
ERROR - stderr - +2025-02-05 23:37:06 - INFO - stdout - {'loss': 0.5837, 'grad_norm': 1.1330444812774658, 'learning_rate': 5.929116728982851e-06, 'epoch': 1.93} +2025-02-05 23:37:06 - ERROR - stderr - 64%|██████▍ | 14458/22434 [13:29:26<5:38:47, 2.55s/it] +2025-02-05 23:37:09 - ERROR - stderr - 64%|██████▍ | 14459/22434 [13:29:28<5:36:11, 2.53s/it] +2025-02-05 23:37:09 - ERROR - stderr - +2025-02-05 23:37:09 - ERROR - stderr - +2025-02-05 23:37:09 - INFO - stdout - {'loss': 0.6645, 'grad_norm': 1.271798014640808, 'learning_rate': 5.927798069422727e-06, 'epoch': 1.93} +2025-02-05 23:37:09 - ERROR - stderr - 64%|██████▍ | 14459/22434 [13:29:28<5:36:11, 2.53s/it] +2025-02-05 23:37:11 - ERROR - stderr - 64%|██████▍ | 14460/22434 [13:29:31<5:32:56, 2.51s/it] +2025-02-05 23:37:11 - ERROR - stderr - +2025-02-05 23:37:11 - ERROR - stderr - +2025-02-05 23:37:11 - INFO - stdout - {'loss': 0.6097, 'grad_norm': 1.2712351083755493, 'learning_rate': 5.926479494743758e-06, 'epoch': 1.93} +2025-02-05 23:37:11 - ERROR - stderr - 64%|██████▍ | 14460/22434 [13:29:31<5:32:56, 2.51s/it] +2025-02-05 23:37:14 - ERROR - stderr - 64%|██████▍ | 14461/22434 [13:29:33<5:30:28, 2.49s/it] +2025-02-05 23:37:14 - ERROR - stderr - +2025-02-05 23:37:14 - ERROR - stderr - +2025-02-05 23:37:14 - INFO - stdout - {'loss': 0.7183, 'grad_norm': 1.3827751874923706, 'learning_rate': 5.925161004973427e-06, 'epoch': 1.93} +2025-02-05 23:37:14 - ERROR - stderr - 64%|██████▍ | 14461/22434 [13:29:33<5:30:28, 2.49s/it] +2025-02-05 23:37:16 - ERROR - stderr - 64%|██████▍ | 14462/22434 [13:29:36<5:32:23, 2.50s/it] +2025-02-05 23:37:16 - ERROR - stderr - +2025-02-05 23:37:16 - ERROR - stderr - +2025-02-05 23:37:16 - INFO - stdout - {'loss': 0.6447, 'grad_norm': 1.1603165864944458, 'learning_rate': 5.923842600139211e-06, 'epoch': 1.93} +2025-02-05 23:37:16 - ERROR - stderr - 64%|██████▍ | 14462/22434 [13:29:36<5:32:23, 2.50s/it] +2025-02-05 23:37:19 - ERROR - stderr - 64%|██████▍ | 14463/22434 [13:29:39<5:40:54, 
2.57s/it] +2025-02-05 23:37:19 - ERROR - stderr - +2025-02-05 23:37:19 - ERROR - stderr - +2025-02-05 23:37:19 - INFO - stdout - {'loss': 0.6516, 'grad_norm': 1.1967827081680298, 'learning_rate': 5.9225242802686e-06, 'epoch': 1.93} +2025-02-05 23:37:19 - ERROR - stderr - 64%|██████▍ | 14463/22434 [13:29:39<5:40:54, 2.57s/it] +2025-02-05 23:37:21 - ERROR - stderr - 64%|██████▍ | 14464/22434 [13:29:41<5:35:58, 2.53s/it] +2025-02-05 23:37:21 - ERROR - stderr - +2025-02-05 23:37:21 - ERROR - stderr - +2025-02-05 23:37:21 - INFO - stdout - {'loss': 0.6366, 'grad_norm': 1.2776082754135132, 'learning_rate': 5.921206045389065e-06, 'epoch': 1.93} +2025-02-05 23:37:21 - ERROR - stderr - 64%|██████▍ | 14464/22434 [13:29:41<5:35:58, 2.53s/it] +2025-02-05 23:37:24 - ERROR - stderr - 64%|██████▍ | 14465/22434 [13:29:43<5:35:11, 2.52s/it] +2025-02-05 23:37:24 - ERROR - stderr - +2025-02-05 23:37:24 - ERROR - stderr - +2025-02-05 23:37:24 - INFO - stdout - {'loss': 0.6482, 'grad_norm': 1.192592740058899, 'learning_rate': 5.919887895528088e-06, 'epoch': 1.93} +2025-02-05 23:37:24 - ERROR - stderr - 64%|██████▍ | 14465/22434 [13:29:44<5:35:11, 2.52s/it] +2025-02-05 23:37:26 - ERROR - stderr - 64%|██████▍ | 14466/22434 [13:29:46<5:39:33, 2.56s/it] +2025-02-05 23:37:26 - ERROR - stderr - +2025-02-05 23:37:26 - ERROR - stderr - +2025-02-05 23:37:26 - INFO - stdout - {'loss': 0.6839, 'grad_norm': 1.3361238241195679, 'learning_rate': 5.918569830713145e-06, 'epoch': 1.93} +2025-02-05 23:37:26 - ERROR - stderr - 64%|██████▍ | 14466/22434 [13:29:46<5:39:33, 2.56s/it] +2025-02-05 23:37:29 - ERROR - stderr - 64%|██████▍ | 14467/22434 [13:29:49<5:37:52, 2.54s/it] +2025-02-05 23:37:29 - ERROR - stderr - +2025-02-05 23:37:29 - ERROR - stderr - +2025-02-05 23:37:29 - INFO - stdout - {'loss': 0.7188, 'grad_norm': 1.2640334367752075, 'learning_rate': 5.917251850971706e-06, 'epoch': 1.93} +2025-02-05 23:37:29 - ERROR - stderr - 64%|██████▍ | 14467/22434 [13:29:49<5:37:52, 2.54s/it] +2025-02-05 
23:37:31 - ERROR - stderr - 64%|██████▍ | 14468/22434 [13:29:51<5:36:44, 2.54s/it] +2025-02-05 23:37:31 - ERROR - stderr - +2025-02-05 23:37:31 - ERROR - stderr - +2025-02-05 23:37:31 - INFO - stdout - {'loss': 0.6783, 'grad_norm': 1.2613823413848877, 'learning_rate': 5.91593395633125e-06, 'epoch': 1.93} +2025-02-05 23:37:31 - ERROR - stderr - 64%|██████▍ | 14468/22434 [13:29:51<5:36:44, 2.54s/it] +2025-02-05 23:37:34 - ERROR - stderr - 64%|██████▍ | 14469/22434 [13:29:54<5:35:37, 2.53s/it] +2025-02-05 23:37:34 - ERROR - stderr - +2025-02-05 23:37:34 - ERROR - stderr - +2025-02-05 23:37:34 - INFO - stdout - {'loss': 0.6711, 'grad_norm': 1.1773933172225952, 'learning_rate': 5.914616146819241e-06, 'epoch': 1.93} +2025-02-05 23:37:34 - ERROR - stderr - 64%|██████▍ | 14469/22434 [13:29:54<5:35:37, 2.53s/it] +2025-02-05 23:37:36 - ERROR - stderr - 65%|██████▍ | 14470/22434 [13:29:56<5:35:38, 2.53s/it] +2025-02-05 23:37:36 - ERROR - stderr - +2025-02-05 23:37:36 - ERROR - stderr - +2025-02-05 23:37:36 - INFO - stdout - {'loss': 0.6484, 'grad_norm': 1.2209922075271606, 'learning_rate': 5.913298422463145e-06, 'epoch': 1.94} +2025-02-05 23:37:36 - ERROR - stderr - 65%|██████▍ | 14470/22434 [13:29:56<5:35:38, 2.53s/it] +2025-02-05 23:37:39 - ERROR - stderr - 65%|██████▍ | 14471/22434 [13:29:59<5:37:13, 2.54s/it] +2025-02-05 23:37:39 - ERROR - stderr - +2025-02-05 23:37:39 - ERROR - stderr - +2025-02-05 23:37:39 - INFO - stdout - {'loss': 0.6639, 'grad_norm': 1.3920403718948364, 'learning_rate': 5.911980783290436e-06, 'epoch': 1.94} +2025-02-05 23:37:39 - ERROR - stderr - 65%|██████▍ | 14471/22434 [13:29:59<5:37:13, 2.54s/it] +2025-02-05 23:37:42 - ERROR - stderr - 65%|██████▍ | 14472/22434 [13:30:02<5:50:23, 2.64s/it] +2025-02-05 23:37:42 - ERROR - stderr - +2025-02-05 23:37:42 - ERROR - stderr - +2025-02-05 23:37:42 - INFO - stdout - {'loss': 0.7608, 'grad_norm': 1.2942211627960205, 'learning_rate': 5.910663229328573e-06, 'epoch': 1.94} +2025-02-05 23:37:42 - ERROR - stderr 
- 65%|██████▍ | 14472/22434 [13:30:02<5:50:23, 2.64s/it] +2025-02-05 23:37:44 - ERROR - stderr - 65%|██████▍ | 14473/22434 [13:30:04<5:45:39, 2.61s/it] +2025-02-05 23:37:44 - ERROR - stderr - +2025-02-05 23:37:44 - ERROR - stderr - +2025-02-05 23:37:44 - INFO - stdout - {'loss': 0.7274, 'grad_norm': 1.1851541996002197, 'learning_rate': 5.909345760605027e-06, 'epoch': 1.94} +2025-02-05 23:37:44 - ERROR - stderr - 65%|██████▍ | 14473/22434 [13:30:04<5:45:39, 2.61s/it] +2025-02-05 23:37:47 - ERROR - stderr - 65%|██████▍ | 14474/22434 [13:30:07<6:00:12, 2.72s/it] +2025-02-05 23:37:47 - ERROR - stderr - +2025-02-05 23:37:47 - ERROR - stderr - +2025-02-05 23:37:47 - INFO - stdout - {'loss': 0.686, 'grad_norm': 1.3126583099365234, 'learning_rate': 5.908028377147252e-06, 'epoch': 1.94} +2025-02-05 23:37:47 - ERROR - stderr - 65%|██████▍ | 14474/22434 [13:30:07<6:00:12, 2.72s/it] +2025-02-05 23:37:50 - ERROR - stderr - 65%|██████▍ | 14475/22434 [13:30:10<5:51:02, 2.65s/it] +2025-02-05 23:37:50 - ERROR - stderr - +2025-02-05 23:37:50 - ERROR - stderr - +2025-02-05 23:37:50 - INFO - stdout - {'loss': 0.7552, 'grad_norm': 1.5827136039733887, 'learning_rate': 5.906711078982708e-06, 'epoch': 1.94} +2025-02-05 23:37:50 - ERROR - stderr - 65%|██████▍ | 14475/22434 [13:30:10<5:51:02, 2.65s/it] +2025-02-05 23:37:53 - ERROR - stderr - 65%|██████▍ | 14476/22434 [13:30:12<5:52:51, 2.66s/it] +2025-02-05 23:37:53 - ERROR - stderr - +2025-02-05 23:37:53 - ERROR - stderr - +2025-02-05 23:37:53 - INFO - stdout - {'loss': 0.6839, 'grad_norm': 1.377055048942566, 'learning_rate': 5.905393866138857e-06, 'epoch': 1.94} +2025-02-05 23:37:53 - ERROR - stderr - 65%|██████▍ | 14476/22434 [13:30:12<5:52:51, 2.66s/it] +2025-02-05 23:37:55 - ERROR - stderr - 65%|██████▍ | 14477/22434 [13:30:15<5:47:20, 2.62s/it] +2025-02-05 23:37:55 - ERROR - stderr - +2025-02-05 23:37:55 - ERROR - stderr - +2025-02-05 23:37:55 - INFO - stdout - {'loss': 0.6314, 'grad_norm': 1.3584359884262085, 'learning_rate': 
5.904076738643153e-06, 'epoch': 1.94} +2025-02-05 23:37:55 - ERROR - stderr - 65%|██████▍ | 14477/22434 [13:30:15<5:47:20, 2.62s/it] +2025-02-05 23:37:58 - ERROR - stderr - 65%|██████▍ | 14478/22434 [13:30:17<5:43:16, 2.59s/it] +2025-02-05 23:37:58 - ERROR - stderr - +2025-02-05 23:37:58 - ERROR - stderr - +2025-02-05 23:37:58 - INFO - stdout - {'loss': 0.642, 'grad_norm': 1.2355737686157227, 'learning_rate': 5.902759696523046e-06, 'epoch': 1.94} +2025-02-05 23:37:58 - ERROR - stderr - 65%|██████▍ | 14478/22434 [13:30:17<5:43:16, 2.59s/it] +2025-02-05 23:38:00 - ERROR - stderr - 65%|██████▍ | 14479/22434 [13:30:20<5:37:48, 2.55s/it] +2025-02-05 23:38:00 - ERROR - stderr - +2025-02-05 23:38:00 - ERROR - stderr - +2025-02-05 23:38:00 - INFO - stdout - {'loss': 0.6349, 'grad_norm': 1.1691131591796875, 'learning_rate': 5.9014427398059985e-06, 'epoch': 1.94} +2025-02-05 23:38:00 - ERROR - stderr - 65%|██████▍ | 14479/22434 [13:30:20<5:37:48, 2.55s/it] +2025-02-05 23:38:02 - ERROR - stderr - 65%|██████▍ | 14480/22434 [13:30:22<5:33:22, 2.51s/it] +2025-02-05 23:38:03 - ERROR - stderr - +2025-02-05 23:38:03 - ERROR - stderr - +2025-02-05 23:38:03 - INFO - stdout - {'loss': 0.5618, 'grad_norm': 1.10879647731781, 'learning_rate': 5.90012586851945e-06, 'epoch': 1.94} +2025-02-05 23:38:03 - ERROR - stderr - 65%|██████▍ | 14480/22434 [13:30:22<5:33:22, 2.51s/it] +2025-02-05 23:38:05 - ERROR - stderr - 65%|██████▍ | 14481/22434 [13:30:25<5:34:18, 2.52s/it] +2025-02-05 23:38:05 - ERROR - stderr - +2025-02-05 23:38:05 - ERROR - stderr - +2025-02-05 23:38:05 - INFO - stdout - {'loss': 0.6133, 'grad_norm': 1.255393385887146, 'learning_rate': 5.898809082690857e-06, 'epoch': 1.94} +2025-02-05 23:38:05 - ERROR - stderr - 65%|██████▍ | 14481/22434 [13:30:25<5:34:18, 2.52s/it] +2025-02-05 23:38:08 - ERROR - stderr - 65%|██████▍ | 14482/22434 [13:30:27<5:34:18, 2.52s/it] +2025-02-05 23:38:08 - ERROR - stderr - +2025-02-05 23:38:08 - ERROR - stderr - +2025-02-05 23:38:08 - INFO - stdout - 
{'loss': 0.6866, 'grad_norm': 1.4464377164840698, 'learning_rate': 5.897492382347667e-06, 'epoch': 1.94} +2025-02-05 23:38:08 - ERROR - stderr - 65%|██████▍ | 14482/22434 [13:30:27<5:34:18, 2.52s/it] +2025-02-05 23:38:10 - ERROR - stderr - 65%|██████▍ | 14483/22434 [13:30:30<5:33:36, 2.52s/it] +2025-02-05 23:38:10 - ERROR - stderr - +2025-02-05 23:38:10 - ERROR - stderr - +2025-02-05 23:38:10 - INFO - stdout - {'loss': 0.57, 'grad_norm': 1.1603502035140991, 'learning_rate': 5.896175767517318e-06, 'epoch': 1.94} +2025-02-05 23:38:10 - ERROR - stderr - 65%|██████▍ | 14483/22434 [13:30:30<5:33:36, 2.52s/it] +2025-02-05 23:38:13 - ERROR - stderr - 65%|██████▍ | 14484/22434 [13:30:32<5:31:46, 2.50s/it] +2025-02-05 23:38:13 - ERROR - stderr - +2025-02-05 23:38:13 - ERROR - stderr - +2025-02-05 23:38:13 - INFO - stdout - {'loss': 0.685, 'grad_norm': 1.1326144933700562, 'learning_rate': 5.89485923822726e-06, 'epoch': 1.94} +2025-02-05 23:38:13 - ERROR - stderr - 65%|██████▍ | 14484/22434 [13:30:32<5:31:46, 2.50s/it] +2025-02-05 23:38:15 - ERROR - stderr - 65%|██████▍ | 14485/22434 [13:30:35<5:34:50, 2.53s/it] +2025-02-05 23:38:15 - ERROR - stderr - +2025-02-05 23:38:15 - ERROR - stderr - +2025-02-05 23:38:15 - INFO - stdout - {'loss': 0.6764, 'grad_norm': 1.2566906213760376, 'learning_rate': 5.893542794504934e-06, 'epoch': 1.94} +2025-02-05 23:38:15 - ERROR - stderr - 65%|██████▍ | 14485/22434 [13:30:35<5:34:50, 2.53s/it] +2025-02-05 23:38:18 - ERROR - stderr - 65%|██████▍ | 14486/22434 [13:30:37<5:34:40, 2.53s/it] +2025-02-05 23:38:18 - ERROR - stderr - +2025-02-05 23:38:18 - ERROR - stderr - +2025-02-05 23:38:18 - INFO - stdout - {'loss': 0.6971, 'grad_norm': 1.2922167778015137, 'learning_rate': 5.892226436377775e-06, 'epoch': 1.94} +2025-02-05 23:38:18 - ERROR - stderr - 65%|██████▍ | 14486/22434 [13:30:37<5:34:40, 2.53s/it] +2025-02-05 23:38:20 - ERROR - stderr - 65%|██████▍ | 14487/22434 [13:30:40<5:48:06, 2.63s/it] +2025-02-05 23:38:21 - ERROR - stderr - +2025-02-05 
23:38:21 - ERROR - stderr - +2025-02-05 23:38:21 - INFO - stdout - {'loss': 0.615, 'grad_norm': 1.1553624868392944, 'learning_rate': 5.89091016387323e-06, 'epoch': 1.94} +2025-02-05 23:38:21 - ERROR - stderr - 65%|██████▍ | 14487/22434 [13:30:40<5:48:06, 2.63s/it] +2025-02-05 23:38:23 - ERROR - stderr - 65%|██████▍ | 14488/22434 [13:30:43<5:43:47, 2.60s/it] +2025-02-05 23:38:23 - ERROR - stderr - +2025-02-05 23:38:23 - ERROR - stderr - +2025-02-05 23:38:23 - INFO - stdout - {'loss': 0.6967, 'grad_norm': 1.3865890502929688, 'learning_rate': 5.889593977018726e-06, 'epoch': 1.94} +2025-02-05 23:38:23 - ERROR - stderr - 65%|██████▍ | 14488/22434 [13:30:43<5:43:47, 2.60s/it] +2025-02-05 23:38:26 - ERROR - stderr - 65%|██████▍ | 14489/22434 [13:30:45<5:42:40, 2.59s/it] +2025-02-05 23:38:26 - ERROR - stderr - +2025-02-05 23:38:26 - ERROR - stderr - +2025-02-05 23:38:26 - INFO - stdout - {'loss': 0.6682, 'grad_norm': 1.203579068183899, 'learning_rate': 5.888277875841708e-06, 'epoch': 1.94} +2025-02-05 23:38:26 - ERROR - stderr - 65%|██████▍ | 14489/22434 [13:30:45<5:42:40, 2.59s/it] +2025-02-05 23:38:28 - ERROR - stderr - 65%|██████▍ | 14490/22434 [13:30:48<5:42:05, 2.58s/it] +2025-02-05 23:38:28 - ERROR - stderr - +2025-02-05 23:38:28 - ERROR - stderr - +2025-02-05 23:38:28 - INFO - stdout - {'loss': 0.6998, 'grad_norm': 1.4297423362731934, 'learning_rate': 5.8869618603696e-06, 'epoch': 1.94} +2025-02-05 23:38:28 - ERROR - stderr - 65%|██████▍ | 14490/22434 [13:30:48<5:42:05, 2.58s/it] +2025-02-05 23:38:31 - ERROR - stderr - 65%|██████▍ | 14491/22434 [13:30:50<5:41:38, 2.58s/it] +2025-02-05 23:38:31 - ERROR - stderr - +2025-02-05 23:38:31 - ERROR - stderr - +2025-02-05 23:38:31 - INFO - stdout - {'loss': 0.6937, 'grad_norm': 1.2040075063705444, 'learning_rate': 5.885645930629833e-06, 'epoch': 1.94} +2025-02-05 23:38:31 - ERROR - stderr - 65%|██████▍ | 14491/22434 [13:30:51<5:41:38, 2.58s/it] +2025-02-05 23:38:33 - ERROR - stderr - 65%|██████▍ | 14492/22434 
[13:30:53<5:42:20, 2.59s/it] +2025-02-05 23:38:33 - ERROR - stderr - +2025-02-05 23:38:33 - ERROR - stderr - +2025-02-05 23:38:33 - INFO - stdout - {'loss': 0.6348, 'grad_norm': 1.2505990266799927, 'learning_rate': 5.884330086649845e-06, 'epoch': 1.94} +2025-02-05 23:38:33 - ERROR - stderr - 65%|██████▍ | 14492/22434 [13:30:53<5:42:20, 2.59s/it] +2025-02-05 23:38:36 - ERROR - stderr - 65%|██████▍ | 14493/22434 [13:30:56<5:44:16, 2.60s/it] +2025-02-05 23:38:36 - ERROR - stderr - +2025-02-05 23:38:36 - ERROR - stderr - +2025-02-05 23:38:36 - INFO - stdout - {'loss': 0.7285, 'grad_norm': 1.346511721611023, 'learning_rate': 5.883014328457059e-06, 'epoch': 1.94} +2025-02-05 23:38:36 - ERROR - stderr - 65%|██████▍ | 14493/22434 [13:30:56<5:44:16, 2.60s/it] +2025-02-05 23:38:39 - ERROR - stderr - 65%|██████▍ | 14494/22434 [13:30:58<5:44:46, 2.61s/it] +2025-02-05 23:38:39 - ERROR - stderr - +2025-02-05 23:38:39 - ERROR - stderr - +2025-02-05 23:38:39 - INFO - stdout - {'loss': 0.7286, 'grad_norm': 1.3473567962646484, 'learning_rate': 5.881698656078894e-06, 'epoch': 1.94} +2025-02-05 23:38:39 - ERROR - stderr - 65%|██████▍ | 14494/22434 [13:30:58<5:44:46, 2.61s/it] +2025-02-05 23:38:41 - ERROR - stderr - 65%|██████▍ | 14495/22434 [13:31:01<5:40:26, 2.57s/it] +2025-02-05 23:38:41 - ERROR - stderr - +2025-02-05 23:38:41 - ERROR - stderr - +2025-02-05 23:38:41 - INFO - stdout - {'loss': 0.707, 'grad_norm': 1.2619190216064453, 'learning_rate': 5.8803830695427854e-06, 'epoch': 1.94} +2025-02-05 23:38:41 - ERROR - stderr - 65%|██████▍ | 14495/22434 [13:31:01<5:40:26, 2.57s/it] +2025-02-05 23:38:44 - ERROR - stderr - 65%|██████▍ | 14496/22434 [13:31:04<5:44:37, 2.60s/it] +2025-02-05 23:38:44 - ERROR - stderr - +2025-02-05 23:38:44 - ERROR - stderr - +2025-02-05 23:38:44 - INFO - stdout - {'loss': 0.6853, 'grad_norm': 1.281182050704956, 'learning_rate': 5.879067568876145e-06, 'epoch': 1.94} +2025-02-05 23:38:44 - ERROR - stderr - 65%|██████▍ | 14496/22434 [13:31:04<5:44:37, 
2.60s/it] +2025-02-05 23:38:46 - ERROR - stderr - 65%|██████▍ | 14497/22434 [13:31:06<5:38:51, 2.56s/it] +2025-02-05 23:38:46 - ERROR - stderr - +2025-02-05 23:38:46 - ERROR - stderr - +2025-02-05 23:38:46 - INFO - stdout - {'loss': 0.6536, 'grad_norm': 1.2203370332717896, 'learning_rate': 5.877752154106399e-06, 'epoch': 1.94} +2025-02-05 23:38:46 - ERROR - stderr - 65%|██████▍ | 14497/22434 [13:31:06<5:38:51, 2.56s/it] +2025-02-05 23:38:49 - ERROR - stderr - 65%|██████▍ | 14498/22434 [13:31:09<5:39:10, 2.56s/it] +2025-02-05 23:38:49 - ERROR - stderr - +2025-02-05 23:38:49 - ERROR - stderr - +2025-02-05 23:38:49 - INFO - stdout - {'loss': 0.6809, 'grad_norm': 1.2782256603240967, 'learning_rate': 5.876436825260967e-06, 'epoch': 1.94} +2025-02-05 23:38:49 - ERROR - stderr - 65%|██████▍ | 14498/22434 [13:31:09<5:39:10, 2.56s/it] +2025-02-05 23:38:51 - ERROR - stderr - 65%|██████▍ | 14499/22434 [13:31:11<5:37:08, 2.55s/it] +2025-02-05 23:38:51 - ERROR - stderr - +2025-02-05 23:38:51 - ERROR - stderr - +2025-02-05 23:38:51 - INFO - stdout - {'loss': 0.625, 'grad_norm': 1.2935810089111328, 'learning_rate': 5.87512158236726e-06, 'epoch': 1.94} +2025-02-05 23:38:51 - ERROR - stderr - 65%|██████▍ | 14499/22434 [13:31:11<5:37:08, 2.55s/it] +2025-02-05 23:38:54 - ERROR - stderr - 65%|██████▍ | 14500/22434 [13:31:14<5:37:01, 2.55s/it] +2025-02-05 23:38:54 - ERROR - stderr - +2025-02-05 23:38:54 - ERROR - stderr - +2025-02-05 23:38:54 - INFO - stdout - {'loss': 0.5785, 'grad_norm': 1.1280409097671509, 'learning_rate': 5.8738064254527e-06, 'epoch': 1.94} +2025-02-05 23:38:54 - ERROR - stderr - 65%|██████▍ | 14500/22434 [13:31:14<5:37:01, 2.55s/it] +2025-02-05 23:38:56 - ERROR - stderr - 65%|██████▍ | 14501/22434 [13:31:16<5:36:25, 2.54s/it] +2025-02-05 23:38:56 - ERROR - stderr - +2025-02-05 23:38:56 - ERROR - stderr - +2025-02-05 23:38:56 - INFO - stdout - {'loss': 0.7636, 'grad_norm': 1.3321349620819092, 'learning_rate': 5.872491354544698e-06, 'epoch': 1.94} +2025-02-05 
23:38:56 - ERROR - stderr - 65%|██████▍ | 14501/22434 [13:31:16<5:36:25, 2.54s/it] +2025-02-05 23:38:59 - ERROR - stderr - 65%|██████▍ | 14502/22434 [13:31:19<5:34:27, 2.53s/it] +2025-02-05 23:38:59 - ERROR - stderr - +2025-02-05 23:38:59 - ERROR - stderr - +2025-02-05 23:38:59 - INFO - stdout - {'loss': 0.6445, 'grad_norm': 1.1474483013153076, 'learning_rate': 5.8711763696706595e-06, 'epoch': 1.94} +2025-02-05 23:38:59 - ERROR - stderr - 65%|██████▍ | 14502/22434 [13:31:19<5:34:27, 2.53s/it] +2025-02-05 23:39:02 - ERROR - stderr - 65%|██████▍ | 14503/22434 [13:31:22<6:02:57, 2.75s/it] +2025-02-05 23:39:02 - ERROR - stderr - +2025-02-05 23:39:02 - ERROR - stderr - +2025-02-05 23:39:02 - INFO - stdout - {'loss': 0.6973, 'grad_norm': 1.193880558013916, 'learning_rate': 5.869861470858e-06, 'epoch': 1.94} +2025-02-05 23:39:02 - ERROR - stderr - 65%|██████▍ | 14503/22434 [13:31:22<6:02:57, 2.75s/it] +2025-02-05 23:39:05 - ERROR - stderr - 65%|██████▍ | 14504/22434 [13:31:24<5:56:16, 2.70s/it] +2025-02-05 23:39:05 - ERROR - stderr - +2025-02-05 23:39:05 - ERROR - stderr - +2025-02-05 23:39:05 - INFO - stdout - {'loss': 0.7527, 'grad_norm': 1.3732945919036865, 'learning_rate': 5.8685466581341246e-06, 'epoch': 1.94} +2025-02-05 23:39:05 - ERROR - stderr - 65%|██████▍ | 14504/22434 [13:31:25<5:56:16, 2.70s/it] +2025-02-05 23:39:07 - ERROR - stderr - 65%|██████▍ | 14505/22434 [13:31:27<5:49:27, 2.64s/it] +2025-02-05 23:39:07 - ERROR - stderr - +2025-02-05 23:39:07 - ERROR - stderr - +2025-02-05 23:39:07 - INFO - stdout - {'loss': 0.7157, 'grad_norm': 1.2913517951965332, 'learning_rate': 5.867231931526445e-06, 'epoch': 1.94} +2025-02-05 23:39:07 - ERROR - stderr - 65%|██████▍ | 14505/22434 [13:31:27<5:49:27, 2.64s/it] +2025-02-05 23:39:10 - ERROR - stderr - 65%|██████▍ | 14506/22434 [13:31:29<5:41:30, 2.58s/it] +2025-02-05 23:39:10 - ERROR - stderr - +2025-02-05 23:39:10 - ERROR - stderr - +2025-02-05 23:39:10 - INFO - stdout - {'loss': 0.755, 'grad_norm': 1.3383525609970093, 
'learning_rate': 5.86591729106236e-06, 'epoch': 1.94} +2025-02-05 23:39:10 - ERROR - stderr - 65%|██████▍ | 14506/22434 [13:31:29<5:41:30, 2.58s/it] +2025-02-05 23:39:12 - ERROR - stderr - 65%|██████▍ | 14507/22434 [13:31:32<5:36:30, 2.55s/it] +2025-02-05 23:39:12 - ERROR - stderr - +2025-02-05 23:39:12 - ERROR - stderr - +2025-02-05 23:39:12 - INFO - stdout - {'loss': 0.6872, 'grad_norm': 1.3321714401245117, 'learning_rate': 5.864602736769269e-06, 'epoch': 1.94} +2025-02-05 23:39:12 - ERROR - stderr - 65%|██████▍ | 14507/22434 [13:31:32<5:36:30, 2.55s/it] +2025-02-05 23:39:15 - ERROR - stderr - 65%|██████▍ | 14508/22434 [13:31:35<5:41:17, 2.58s/it] +2025-02-05 23:39:15 - ERROR - stderr - +2025-02-05 23:39:15 - ERROR - stderr - +2025-02-05 23:39:15 - INFO - stdout - {'loss': 0.6559, 'grad_norm': 1.2463963031768799, 'learning_rate': 5.863288268674583e-06, 'epoch': 1.94} +2025-02-05 23:39:15 - ERROR - stderr - 65%|██████▍ | 14508/22434 [13:31:35<5:41:17, 2.58s/it] +2025-02-05 23:39:17 - ERROR - stderr - 65%|██████▍ | 14509/22434 [13:31:37<5:41:13, 2.58s/it] +2025-02-05 23:39:17 - ERROR - stderr - +2025-02-05 23:39:17 - ERROR - stderr - +2025-02-05 23:39:17 - INFO - stdout - {'loss': 0.6393, 'grad_norm': 1.1296796798706055, 'learning_rate': 5.861973886805692e-06, 'epoch': 1.94} +2025-02-05 23:39:17 - ERROR - stderr - 65%|██████▍ | 14509/22434 [13:31:37<5:41:13, 2.58s/it] +2025-02-05 23:39:20 - ERROR - stderr - 65%|██████▍ | 14510/22434 [13:31:40<5:39:48, 2.57s/it] +2025-02-05 23:39:20 - ERROR - stderr - +2025-02-05 23:39:20 - ERROR - stderr - +2025-02-05 23:39:20 - INFO - stdout - {'loss': 0.667, 'grad_norm': 1.4072933197021484, 'learning_rate': 5.860659591189992e-06, 'epoch': 1.94} +2025-02-05 23:39:20 - ERROR - stderr - 65%|██████▍ | 14510/22434 [13:31:40<5:39:48, 2.57s/it] +2025-02-05 23:39:22 - ERROR - stderr - 65%|██████▍ | 14511/22434 [13:31:42<5:35:34, 2.54s/it] +2025-02-05 23:39:22 - ERROR - stderr - +2025-02-05 23:39:22 - ERROR - stderr - +2025-02-05 23:39:22 
- INFO - stdout - {'loss': 0.688, 'grad_norm': 1.1853086948394775, 'learning_rate': 5.859345381854888e-06, 'epoch': 1.94} +2025-02-05 23:39:22 - ERROR - stderr - 65%|██████▍ | 14511/22434 [13:31:42<5:35:34, 2.54s/it] +2025-02-05 23:39:25 - ERROR - stderr - 65%|██████▍ | 14512/22434 [13:31:45<5:34:52, 2.54s/it] +2025-02-05 23:39:25 - ERROR - stderr - +2025-02-05 23:39:25 - ERROR - stderr - +2025-02-05 23:39:25 - INFO - stdout - {'loss': 0.6782, 'grad_norm': 1.287976861000061, 'learning_rate': 5.858031258827761e-06, 'epoch': 1.94} +2025-02-05 23:39:25 - ERROR - stderr - 65%|██████▍ | 14512/22434 [13:31:45<5:34:52, 2.54s/it] +2025-02-05 23:39:27 - ERROR - stderr - 65%|██████▍ | 14513/22434 [13:31:47<5:33:05, 2.52s/it] +2025-02-05 23:39:27 - ERROR - stderr - +2025-02-05 23:39:27 - ERROR - stderr - +2025-02-05 23:39:27 - INFO - stdout - {'loss': 0.7069, 'grad_norm': 1.2892423868179321, 'learning_rate': 5.856717222136015e-06, 'epoch': 1.94} +2025-02-05 23:39:27 - ERROR - stderr - 65%|██████▍ | 14513/22434 [13:31:47<5:33:05, 2.52s/it] +2025-02-05 23:39:30 - ERROR - stderr - 65%|██████▍ | 14514/22434 [13:31:50<5:32:29, 2.52s/it] +2025-02-05 23:39:30 - ERROR - stderr - +2025-02-05 23:39:30 - ERROR - stderr - +2025-02-05 23:39:30 - INFO - stdout - {'loss': 0.6661, 'grad_norm': 1.137879490852356, 'learning_rate': 5.855403271807033e-06, 'epoch': 1.94} +2025-02-05 23:39:30 - ERROR - stderr - 65%|██████▍ | 14514/22434 [13:31:50<5:32:29, 2.52s/it] +2025-02-05 23:39:32 - ERROR - stderr - 65%|██████▍ | 14515/22434 [13:31:52<5:29:40, 2.50s/it] +2025-02-05 23:39:32 - ERROR - stderr - +2025-02-05 23:39:32 - ERROR - stderr - +2025-02-05 23:39:32 - INFO - stdout - {'loss': 0.656, 'grad_norm': 1.3065146207809448, 'learning_rate': 5.8540894078682e-06, 'epoch': 1.94} +2025-02-05 23:39:32 - ERROR - stderr - 65%|██████▍ | 14515/22434 [13:31:52<5:29:40, 2.50s/it] +2025-02-05 23:39:35 - ERROR - stderr - 65%|██████▍ | 14516/22434 [13:31:55<5:28:48, 2.49s/it] +2025-02-05 23:39:35 - ERROR - 
stderr - +2025-02-05 23:39:35 - ERROR - stderr - +2025-02-05 23:39:35 - INFO - stdout - {'loss': 0.5722, 'grad_norm': 1.1426358222961426, 'learning_rate': 5.8527756303469074e-06, 'epoch': 1.94} +2025-02-05 23:39:35 - ERROR - stderr - 65%|██████▍ | 14516/22434 [13:31:55<5:28:48, 2.49s/it] +2025-02-05 23:39:37 - ERROR - stderr - 65%|██████▍ | 14517/22434 [13:31:57<5:26:36, 2.48s/it] +2025-02-05 23:39:37 - ERROR - stderr - +2025-02-05 23:39:37 - ERROR - stderr - +2025-02-05 23:39:37 - INFO - stdout - {'loss': 0.7527, 'grad_norm': 1.3579005002975464, 'learning_rate': 5.851461939270542e-06, 'epoch': 1.94} +2025-02-05 23:39:37 - ERROR - stderr - 65%|██████▍ | 14517/22434 [13:31:57<5:26:36, 2.48s/it] +2025-02-05 23:39:40 - ERROR - stderr - 65%|██████▍ | 14518/22434 [13:32:00<5:29:20, 2.50s/it] +2025-02-05 23:39:40 - ERROR - stderr - +2025-02-05 23:39:40 - ERROR - stderr - +2025-02-05 23:39:40 - INFO - stdout - {'loss': 0.6481, 'grad_norm': 1.223244309425354, 'learning_rate': 5.850148334666476e-06, 'epoch': 1.94} +2025-02-05 23:39:40 - ERROR - stderr - 65%|██████▍ | 14518/22434 [13:32:00<5:29:20, 2.50s/it] +2025-02-05 23:39:42 - ERROR - stderr - 65%|██████▍ | 14519/22434 [13:32:02<5:31:02, 2.51s/it] +2025-02-05 23:39:42 - ERROR - stderr - +2025-02-05 23:39:42 - ERROR - stderr - +2025-02-05 23:39:42 - INFO - stdout - {'loss': 0.6353, 'grad_norm': 1.231117844581604, 'learning_rate': 5.848834816562104e-06, 'epoch': 1.94} +2025-02-05 23:39:42 - ERROR - stderr - 65%|██████▍ | 14519/22434 [13:32:02<5:31:02, 2.51s/it] +2025-02-05 23:39:45 - ERROR - stderr - 65%|██████▍ | 14520/22434 [13:32:05<5:33:08, 2.53s/it] +2025-02-05 23:39:45 - ERROR - stderr - +2025-02-05 23:39:45 - ERROR - stderr - +2025-02-05 23:39:45 - INFO - stdout - {'loss': 0.6779, 'grad_norm': 1.32968270778656, 'learning_rate': 5.8475213849847935e-06, 'epoch': 1.94} +2025-02-05 23:39:45 - ERROR - stderr - 65%|██████▍ | 14520/22434 [13:32:05<5:33:08, 2.53s/it] +2025-02-05 23:39:47 - ERROR - stderr - 65%|██████▍ | 
14521/22434 [13:32:07<5:29:29, 2.50s/it] +2025-02-05 23:39:47 - ERROR - stderr - +2025-02-05 23:39:47 - ERROR - stderr - +2025-02-05 23:39:47 - INFO - stdout - {'loss': 0.6623, 'grad_norm': 1.3732610940933228, 'learning_rate': 5.846208039961929e-06, 'epoch': 1.94} +2025-02-05 23:39:47 - ERROR - stderr - 65%|██████▍ | 14521/22434 [13:32:07<5:29:29, 2.50s/it] +2025-02-05 23:39:50 - ERROR - stderr - 65%|██████▍ | 14522/22434 [13:32:10<5:34:41, 2.54s/it] +2025-02-05 23:39:50 - ERROR - stderr - +2025-02-05 23:39:50 - ERROR - stderr - +2025-02-05 23:39:50 - INFO - stdout - {'loss': 0.63, 'grad_norm': 1.217100977897644, 'learning_rate': 5.844894781520881e-06, 'epoch': 1.94} +2025-02-05 23:39:50 - ERROR - stderr - 65%|██████▍ | 14522/22434 [13:32:10<5:34:41, 2.54s/it] +2025-02-05 23:39:52 - ERROR - stderr - 65%|██████▍ | 14523/22434 [13:32:12<5:31:45, 2.52s/it] +2025-02-05 23:39:53 - ERROR - stderr - +2025-02-05 23:39:53 - ERROR - stderr - +2025-02-05 23:39:53 - INFO - stdout - {'loss': 0.7087, 'grad_norm': 1.2675803899765015, 'learning_rate': 5.843581609689024e-06, 'epoch': 1.94} +2025-02-05 23:39:53 - ERROR - stderr - 65%|██████▍ | 14523/22434 [13:32:12<5:31:45, 2.52s/it] +2025-02-05 23:39:55 - ERROR - stderr - 65%|██████▍ | 14524/22434 [13:32:15<5:28:41, 2.49s/it] +2025-02-05 23:39:55 - ERROR - stderr - +2025-02-05 23:39:55 - ERROR - stderr - +2025-02-05 23:39:55 - INFO - stdout - {'loss': 0.7007, 'grad_norm': 1.2878421545028687, 'learning_rate': 5.842268524493735e-06, 'epoch': 1.94} +2025-02-05 23:39:55 - ERROR - stderr - 65%|██████▍ | 14524/22434 [13:32:15<5:28:41, 2.49s/it] +2025-02-05 23:39:57 - ERROR - stderr - 65%|██████▍ | 14525/22434 [13:32:17<5:27:21, 2.48s/it] +2025-02-05 23:39:57 - ERROR - stderr - +2025-02-05 23:39:57 - ERROR - stderr - +2025-02-05 23:39:57 - INFO - stdout - {'loss': 0.6629, 'grad_norm': 1.336093783378601, 'learning_rate': 5.840955525962381e-06, 'epoch': 1.94} +2025-02-05 23:39:57 - ERROR - stderr - 65%|██████▍ | 14525/22434 
[13:32:17<5:27:21, 2.48s/it] +2025-02-05 23:40:00 - ERROR - stderr - 65%|██████▍ | 14526/22434 [13:32:20<5:26:46, 2.48s/it] +2025-02-05 23:40:00 - ERROR - stderr - +2025-02-05 23:40:00 - ERROR - stderr - +2025-02-05 23:40:00 - INFO - stdout - {'loss': 0.6274, 'grad_norm': 1.180262804031372, 'learning_rate': 5.839642614122324e-06, 'epoch': 1.94} +2025-02-05 23:40:00 - ERROR - stderr - 65%|██████▍ | 14526/22434 [13:32:20<5:26:46, 2.48s/it] +2025-02-05 23:40:02 - ERROR - stderr - 65%|██████▍ | 14527/22434 [13:32:22<5:27:03, 2.48s/it] +2025-02-05 23:40:02 - ERROR - stderr - +2025-02-05 23:40:02 - ERROR - stderr - +2025-02-05 23:40:02 - INFO - stdout - {'loss': 0.5642, 'grad_norm': 1.2009515762329102, 'learning_rate': 5.83832978900094e-06, 'epoch': 1.94} +2025-02-05 23:40:02 - ERROR - stderr - 65%|██████▍ | 14527/22434 [13:32:22<5:27:03, 2.48s/it] +2025-02-05 23:40:05 - ERROR - stderr - 65%|██████▍ | 14528/22434 [13:32:25<5:30:11, 2.51s/it] +2025-02-05 23:40:05 - ERROR - stderr - +2025-02-05 23:40:05 - ERROR - stderr - +2025-02-05 23:40:05 - INFO - stdout - {'loss': 0.7129, 'grad_norm': 1.3272777795791626, 'learning_rate': 5.837017050625583e-06, 'epoch': 1.94} +2025-02-05 23:40:05 - ERROR - stderr - 65%|██████▍ | 14528/22434 [13:32:25<5:30:11, 2.51s/it] +2025-02-05 23:40:07 - ERROR - stderr - 65%|██████▍ | 14529/22434 [13:32:27<5:32:36, 2.52s/it] +2025-02-05 23:40:07 - ERROR - stderr - +2025-02-05 23:40:07 - ERROR - stderr - +2025-02-05 23:40:07 - INFO - stdout - {'loss': 0.7122, 'grad_norm': 1.5426565408706665, 'learning_rate': 5.835704399023631e-06, 'epoch': 1.94} +2025-02-05 23:40:07 - ERROR - stderr - 65%|██████▍ | 14529/22434 [13:32:27<5:32:36, 2.52s/it] +2025-02-05 23:40:10 - ERROR - stderr - 65%|██████▍ | 14530/22434 [13:32:30<5:32:03, 2.52s/it] +2025-02-05 23:40:10 - ERROR - stderr - +2025-02-05 23:40:10 - ERROR - stderr - +2025-02-05 23:40:10 - INFO - stdout - {'loss': 0.7302, 'grad_norm': 1.2776893377304077, 'learning_rate': 5.83439183422243e-06, 'epoch': 
1.94} +2025-02-05 23:40:10 - ERROR - stderr - 65%|██████▍ | 14530/22434 [13:32:30<5:32:03, 2.52s/it] +2025-02-05 23:40:12 - ERROR - stderr - 65%|██████▍ | 14531/22434 [13:32:32<5:29:29, 2.50s/it] +2025-02-05 23:40:12 - ERROR - stderr - +2025-02-05 23:40:12 - ERROR - stderr - +2025-02-05 23:40:12 - INFO - stdout - {'loss': 0.6828, 'grad_norm': 1.315960168838501, 'learning_rate': 5.833079356249347e-06, 'epoch': 1.94} +2025-02-05 23:40:12 - ERROR - stderr - 65%|██████▍ | 14531/22434 [13:32:32<5:29:29, 2.50s/it] +2025-02-05 23:40:15 - ERROR - stderr - 65%|██████▍ | 14532/22434 [13:32:35<5:29:42, 2.50s/it] +2025-02-05 23:40:15 - ERROR - stderr - +2025-02-05 23:40:15 - ERROR - stderr - +2025-02-05 23:40:15 - INFO - stdout - {'loss': 0.6473, 'grad_norm': 1.2247551679611206, 'learning_rate': 5.8317669651317375e-06, 'epoch': 1.94} +2025-02-05 23:40:15 - ERROR - stderr - 65%|██████▍ | 14532/22434 [13:32:35<5:29:42, 2.50s/it] +2025-02-05 23:40:18 - ERROR - stderr - 65%|██████▍ | 14533/22434 [13:32:37<5:34:36, 2.54s/it] +2025-02-05 23:40:18 - ERROR - stderr - +2025-02-05 23:40:18 - ERROR - stderr - +2025-02-05 23:40:18 - INFO - stdout - {'loss': 0.7823, 'grad_norm': 1.3713740110397339, 'learning_rate': 5.830454660896956e-06, 'epoch': 1.94} +2025-02-05 23:40:18 - ERROR - stderr - 65%|██████▍ | 14533/22434 [13:32:37<5:34:36, 2.54s/it] +2025-02-05 23:40:20 - ERROR - stderr - 65%|██████▍ | 14534/22434 [13:32:40<5:46:45, 2.63s/it] +2025-02-05 23:40:20 - ERROR - stderr - +2025-02-05 23:40:20 - ERROR - stderr - +2025-02-05 23:40:20 - INFO - stdout - {'loss': 0.7127, 'grad_norm': 1.3235563039779663, 'learning_rate': 5.829142443572358e-06, 'epoch': 1.94} +2025-02-05 23:40:20 - ERROR - stderr - 65%|██████▍ | 14534/22434 [13:32:40<5:46:45, 2.63s/it] +2025-02-05 23:40:23 - ERROR - stderr - 65%|██████▍ | 14535/22434 [13:32:43<5:44:16, 2.62s/it] +2025-02-05 23:40:23 - ERROR - stderr - +2025-02-05 23:40:23 - ERROR - stderr - +2025-02-05 23:40:23 - INFO - stdout - {'loss': 0.664, 'grad_norm': 
1.2662243843078613, 'learning_rate': 5.827830313185294e-06, 'epoch': 1.94} +2025-02-05 23:40:23 - ERROR - stderr - 65%|██████▍ | 14535/22434 [13:32:43<5:44:16, 2.62s/it] +2025-02-05 23:40:25 - ERROR - stderr - 65%|██████▍ | 14536/22434 [13:32:45<5:39:46, 2.58s/it] +2025-02-05 23:40:26 - ERROR - stderr - +2025-02-05 23:40:26 - ERROR - stderr - +2025-02-05 23:40:26 - INFO - stdout - {'loss': 0.719, 'grad_norm': 1.3169081211090088, 'learning_rate': 5.826518269763116e-06, 'epoch': 1.94} +2025-02-05 23:40:26 - ERROR - stderr - 65%|██████▍ | 14536/22434 [13:32:45<5:39:46, 2.58s/it] +2025-02-05 23:40:28 - ERROR - stderr - 65%|██████▍ | 14537/22434 [13:32:48<5:37:23, 2.56s/it] +2025-02-05 23:40:28 - ERROR - stderr - +2025-02-05 23:40:28 - ERROR - stderr - +2025-02-05 23:40:28 - INFO - stdout - {'loss': 0.7934, 'grad_norm': 1.3273957967758179, 'learning_rate': 5.82520631333317e-06, 'epoch': 1.94} +2025-02-05 23:40:28 - ERROR - stderr - 65%|██████▍ | 14537/22434 [13:32:48<5:37:23, 2.56s/it] +2025-02-05 23:40:31 - ERROR - stderr - 65%|██████▍ | 14538/22434 [13:32:50<5:35:16, 2.55s/it] +2025-02-05 23:40:31 - ERROR - stderr - +2025-02-05 23:40:31 - ERROR - stderr - +2025-02-05 23:40:31 - INFO - stdout - {'loss': 0.632, 'grad_norm': 1.237271785736084, 'learning_rate': 5.823894443922804e-06, 'epoch': 1.94} +2025-02-05 23:40:31 - ERROR - stderr - 65%|██████▍ | 14538/22434 [13:32:50<5:35:16, 2.55s/it] +2025-02-05 23:40:33 - ERROR - stderr - 65%|██████▍ | 14539/22434 [13:32:53<5:34:20, 2.54s/it] +2025-02-05 23:40:33 - ERROR - stderr - +2025-02-05 23:40:33 - ERROR - stderr - +2025-02-05 23:40:33 - INFO - stdout - {'loss': 0.6629, 'grad_norm': 1.2337758541107178, 'learning_rate': 5.822582661559362e-06, 'epoch': 1.94} +2025-02-05 23:40:33 - ERROR - stderr - 65%|██████▍ | 14539/22434 [13:32:53<5:34:20, 2.54s/it] +2025-02-05 23:40:36 - ERROR - stderr - 65%|██████▍ | 14540/22434 [13:32:55<5:33:31, 2.54s/it] +2025-02-05 23:40:36 - ERROR - stderr - +2025-02-05 23:40:36 - ERROR - stderr - 
+2025-02-05 23:40:36 - INFO - stdout - {'loss': 0.5989, 'grad_norm': 1.3367664813995361, 'learning_rate': 5.821270966270187e-06, 'epoch': 1.94} +2025-02-05 23:40:36 - ERROR - stderr - 65%|██████▍ | 14540/22434 [13:32:55<5:33:31, 2.54s/it] +2025-02-05 23:40:38 - ERROR - stderr - 65%|██████▍ | 14541/22434 [13:32:58<5:29:52, 2.51s/it] +2025-02-05 23:40:38 - ERROR - stderr - +2025-02-05 23:40:38 - ERROR - stderr - +2025-02-05 23:40:38 - INFO - stdout - {'loss': 0.7285, 'grad_norm': 1.465135097503662, 'learning_rate': 5.819959358082621e-06, 'epoch': 1.94} +2025-02-05 23:40:38 - ERROR - stderr - 65%|██████▍ | 14541/22434 [13:32:58<5:29:52, 2.51s/it] +2025-02-05 23:40:41 - ERROR - stderr - 65%|██████▍ | 14542/22434 [13:33:00<5:30:14, 2.51s/it] +2025-02-05 23:40:41 - ERROR - stderr - +2025-02-05 23:40:41 - ERROR - stderr - +2025-02-05 23:40:41 - INFO - stdout - {'loss': 0.6793, 'grad_norm': 1.2941207885742188, 'learning_rate': 5.818647837024002e-06, 'epoch': 1.94} +2025-02-05 23:40:41 - ERROR - stderr - 65%|██████▍ | 14542/22434 [13:33:00<5:30:14, 2.51s/it] +2025-02-05 23:40:43 - ERROR - stderr - 65%|██████▍ | 14543/22434 [13:33:03<5:30:08, 2.51s/it] +2025-02-05 23:40:43 - ERROR - stderr - +2025-02-05 23:40:43 - ERROR - stderr - +2025-02-05 23:40:43 - INFO - stdout - {'loss': 0.6993, 'grad_norm': 1.3215018510818481, 'learning_rate': 5.817336403121671e-06, 'epoch': 1.94} +2025-02-05 23:40:43 - ERROR - stderr - 65%|██████▍ | 14543/22434 [13:33:03<5:30:08, 2.51s/it] +2025-02-05 23:40:46 - ERROR - stderr - 65%|██████▍ | 14544/22434 [13:33:06<5:43:51, 2.61s/it] +2025-02-05 23:40:46 - ERROR - stderr - +2025-02-05 23:40:46 - ERROR - stderr - +2025-02-05 23:40:46 - INFO - stdout - {'loss': 0.7426, 'grad_norm': 1.3031100034713745, 'learning_rate': 5.816025056402953e-06, 'epoch': 1.94} +2025-02-05 23:40:46 - ERROR - stderr - 65%|██████▍ | 14544/22434 [13:33:06<5:43:51, 2.61s/it] +2025-02-05 23:40:48 - ERROR - stderr - 65%|██████▍ | 14545/22434 [13:33:08<5:37:37, 2.57s/it] 
+2025-02-05 23:40:48 - ERROR - stderr - +2025-02-05 23:40:48 - ERROR - stderr - +2025-02-05 23:40:48 - INFO - stdout - {'loss': 0.6849, 'grad_norm': 1.2978556156158447, 'learning_rate': 5.814713796895193e-06, 'epoch': 1.95} +2025-02-05 23:40:48 - ERROR - stderr - 65%|██████▍ | 14545/22434 [13:33:08<5:37:37, 2.57s/it] +2025-02-05 23:40:51 - ERROR - stderr - 65%|██████▍ | 14546/22434 [13:33:11<5:34:33, 2.54s/it] +2025-02-05 23:40:51 - ERROR - stderr - +2025-02-05 23:40:51 - ERROR - stderr - +2025-02-05 23:40:51 - INFO - stdout - {'loss': 0.6421, 'grad_norm': 1.25335693359375, 'learning_rate': 5.813402624625722e-06, 'epoch': 1.95} +2025-02-05 23:40:51 - ERROR - stderr - 65%|██████▍ | 14546/22434 [13:33:11<5:34:33, 2.54s/it] +2025-02-05 23:40:53 - ERROR - stderr - 65%|██████▍ | 14547/22434 [13:33:13<5:30:15, 2.51s/it] +2025-02-05 23:40:53 - ERROR - stderr - +2025-02-05 23:40:53 - ERROR - stderr - +2025-02-05 23:40:53 - INFO - stdout - {'loss': 0.612, 'grad_norm': 1.2016547918319702, 'learning_rate': 5.81209153962186e-06, 'epoch': 1.95} +2025-02-05 23:40:53 - ERROR - stderr - 65%|██████▍ | 14547/22434 [13:33:13<5:30:15, 2.51s/it] +2025-02-05 23:40:56 - ERROR - stderr - 65%|██████▍ | 14548/22434 [13:33:16<5:29:06, 2.50s/it] +2025-02-05 23:40:56 - ERROR - stderr - +2025-02-05 23:40:56 - ERROR - stderr - +2025-02-05 23:40:56 - INFO - stdout - {'loss': 0.5703, 'grad_norm': 1.1580238342285156, 'learning_rate': 5.810780541910951e-06, 'epoch': 1.95} +2025-02-05 23:40:56 - ERROR - stderr - 65%|██████▍ | 14548/22434 [13:33:16<5:29:06, 2.50s/it] +2025-02-05 23:40:58 - ERROR - stderr - 65%|██████▍ | 14549/22434 [13:33:18<5:29:47, 2.51s/it] +2025-02-05 23:40:58 - ERROR - stderr - +2025-02-05 23:40:58 - ERROR - stderr - +2025-02-05 23:40:58 - INFO - stdout - {'loss': 0.7102, 'grad_norm': 1.2101776599884033, 'learning_rate': 5.809469631520304e-06, 'epoch': 1.95} +2025-02-05 23:40:58 - ERROR - stderr - 65%|██████▍ | 14549/22434 [13:33:18<5:29:47, 2.51s/it] +2025-02-05 23:41:01 - ERROR 
- stderr - 65%|██████▍ | 14550/22434 [13:33:21<5:30:51, 2.52s/it] +2025-02-05 23:41:01 - ERROR - stderr - +2025-02-05 23:41:01 - ERROR - stderr - +2025-02-05 23:41:01 - INFO - stdout - {'loss': 0.6384, 'grad_norm': 1.2439404726028442, 'learning_rate': 5.808158808477261e-06, 'epoch': 1.95} +2025-02-05 23:41:01 - ERROR - stderr - 65%|██████▍ | 14550/22434 [13:33:21<5:30:51, 2.52s/it] +2025-02-05 23:41:03 - ERROR - stderr - 65%|██████▍ | 14551/22434 [13:33:23<5:28:45, 2.50s/it] +2025-02-05 23:41:03 - ERROR - stderr - +2025-02-05 23:41:03 - ERROR - stderr - +2025-02-05 23:41:03 - INFO - stdout - {'loss': 0.7239, 'grad_norm': 1.3315669298171997, 'learning_rate': 5.806848072809132e-06, 'epoch': 1.95} +2025-02-05 23:41:03 - ERROR - stderr - 65%|██████▍ | 14551/22434 [13:33:23<5:28:45, 2.50s/it] +2025-02-05 23:41:06 - ERROR - stderr - 65%|██████▍ | 14552/22434 [13:33:25<5:25:02, 2.47s/it] +2025-02-05 23:41:06 - ERROR - stderr - +2025-02-05 23:41:06 - ERROR - stderr - +2025-02-05 23:41:06 - INFO - stdout - {'loss': 0.6621, 'grad_norm': 1.3043615818023682, 'learning_rate': 5.805537424543244e-06, 'epoch': 1.95} +2025-02-05 23:41:06 - ERROR - stderr - 65%|██████▍ | 14552/22434 [13:33:26<5:25:02, 2.47s/it] +2025-02-05 23:41:08 - ERROR - stderr - 65%|██████▍ | 14553/22434 [13:33:28<5:25:44, 2.48s/it] +2025-02-05 23:41:08 - ERROR - stderr - +2025-02-05 23:41:08 - ERROR - stderr - +2025-02-05 23:41:08 - INFO - stdout - {'loss': 0.6922, 'grad_norm': 1.3136605024337769, 'learning_rate': 5.8042268637069125e-06, 'epoch': 1.95} +2025-02-05 23:41:08 - ERROR - stderr - 65%|██████▍ | 14553/22434 [13:33:28<5:25:44, 2.48s/it] +2025-02-05 23:41:11 - ERROR - stderr - 65%|██████▍ | 14554/22434 [13:33:30<5:25:58, 2.48s/it] +2025-02-05 23:41:11 - ERROR - stderr - +2025-02-05 23:41:11 - ERROR - stderr - +2025-02-05 23:41:11 - INFO - stdout - {'loss': 0.5407, 'grad_norm': 1.2590259313583374, 'learning_rate': 5.802916390327459e-06, 'epoch': 1.95} +2025-02-05 23:41:11 - ERROR - stderr - 65%|██████▍ 
| 14554/22434 [13:33:30<5:25:58, 2.48s/it] +2025-02-05 23:41:13 - ERROR - stderr - 65%|██████▍ | 14555/22434 [13:33:33<5:24:17, 2.47s/it] +2025-02-05 23:41:13 - ERROR - stderr - +2025-02-05 23:41:13 - ERROR - stderr - +2025-02-05 23:41:13 - INFO - stdout - {'loss': 0.7444, 'grad_norm': 1.2732837200164795, 'learning_rate': 5.801606004432197e-06, 'epoch': 1.95} +2025-02-05 23:41:13 - ERROR - stderr - 65%|██████▍ | 14555/22434 [13:33:33<5:24:17, 2.47s/it] +2025-02-05 23:41:16 - ERROR - stderr - 65%|██████▍ | 14556/22434 [13:33:36<5:32:28, 2.53s/it] +2025-02-05 23:41:16 - ERROR - stderr - +2025-02-05 23:41:16 - ERROR - stderr - +2025-02-05 23:41:16 - INFO - stdout - {'loss': 0.6423, 'grad_norm': 1.252768635749817, 'learning_rate': 5.800295706048439e-06, 'epoch': 1.95} +2025-02-05 23:41:16 - ERROR - stderr - 65%|██████▍ | 14556/22434 [13:33:36<5:32:28, 2.53s/it] +2025-02-05 23:41:18 - ERROR - stderr - 65%|██████▍ | 14557/22434 [13:33:38<5:35:46, 2.56s/it] +2025-02-05 23:41:18 - ERROR - stderr - +2025-02-05 23:41:18 - ERROR - stderr - +2025-02-05 23:41:18 - INFO - stdout - {'loss': 0.6832, 'grad_norm': 1.1760319471359253, 'learning_rate': 5.7989854952035e-06, 'epoch': 1.95} +2025-02-05 23:41:18 - ERROR - stderr - 65%|██████▍ | 14557/22434 [13:33:38<5:35:46, 2.56s/it] +2025-02-05 23:41:21 - ERROR - stderr - 65%|██████▍ | 14558/22434 [13:33:41<5:36:32, 2.56s/it] +2025-02-05 23:41:21 - ERROR - stderr - +2025-02-05 23:41:21 - ERROR - stderr - +2025-02-05 23:41:21 - INFO - stdout - {'loss': 0.6703, 'grad_norm': 1.1559561491012573, 'learning_rate': 5.797675371924687e-06, 'epoch': 1.95} +2025-02-05 23:41:21 - ERROR - stderr - 65%|██████▍ | 14558/22434 [13:33:41<5:36:32, 2.56s/it] +2025-02-05 23:41:24 - ERROR - stderr - 65%|██████▍ | 14559/22434 [13:33:43<5:35:28, 2.56s/it] +2025-02-05 23:41:24 - ERROR - stderr - +2025-02-05 23:41:24 - ERROR - stderr - +2025-02-05 23:41:24 - INFO - stdout - {'loss': 0.6302, 'grad_norm': 1.2067400217056274, 'learning_rate': 5.79636533623931e-06, 
'epoch': 1.95} +2025-02-05 23:41:24 - ERROR - stderr - 65%|██████▍ | 14559/22434 [13:33:43<5:35:28, 2.56s/it] +2025-02-05 23:41:26 - ERROR - stderr - 65%|██████▍ | 14560/22434 [13:33:46<5:32:03, 2.53s/it] +2025-02-05 23:41:26 - ERROR - stderr - +2025-02-05 23:41:26 - ERROR - stderr - +2025-02-05 23:41:26 - INFO - stdout - {'loss': 0.6236, 'grad_norm': 1.1841014623641968, 'learning_rate': 5.795055388174675e-06, 'epoch': 1.95} +2025-02-05 23:41:26 - ERROR - stderr - 65%|██████▍ | 14560/22434 [13:33:46<5:32:03, 2.53s/it] +2025-02-05 23:41:29 - ERROR - stderr - 65%|██████▍ | 14561/22434 [13:33:48<5:31:28, 2.53s/it] +2025-02-05 23:41:29 - ERROR - stderr - +2025-02-05 23:41:29 - ERROR - stderr - +2025-02-05 23:41:29 - INFO - stdout - {'loss': 0.6851, 'grad_norm': 1.3379091024398804, 'learning_rate': 5.7937455277580875e-06, 'epoch': 1.95} +2025-02-05 23:41:29 - ERROR - stderr - 65%|██████▍ | 14561/22434 [13:33:48<5:31:28, 2.53s/it] +2025-02-05 23:41:31 - ERROR - stderr - 65%|██████▍ | 14562/22434 [13:33:51<5:31:41, 2.53s/it] +2025-02-05 23:41:31 - ERROR - stderr - +2025-02-05 23:41:31 - ERROR - stderr - +2025-02-05 23:41:31 - INFO - stdout - {'loss': 0.6841, 'grad_norm': 1.334794044494629, 'learning_rate': 5.7924357550168534e-06, 'epoch': 1.95} +2025-02-05 23:41:31 - ERROR - stderr - 65%|██████▍ | 14562/22434 [13:33:51<5:31:41, 2.53s/it] +2025-02-05 23:41:34 - ERROR - stderr - 65%|██████▍ | 14563/22434 [13:33:53<5:30:04, 2.52s/it] +2025-02-05 23:41:34 - ERROR - stderr - +2025-02-05 23:41:34 - ERROR - stderr - +2025-02-05 23:41:34 - INFO - stdout - {'loss': 0.7293, 'grad_norm': 1.3697844743728638, 'learning_rate': 5.791126069978261e-06, 'epoch': 1.95} +2025-02-05 23:41:34 - ERROR - stderr - 65%|██████▍ | 14563/22434 [13:33:53<5:30:04, 2.52s/it] +2025-02-05 23:41:36 - ERROR - stderr - 65%|██████▍ | 14564/22434 [13:33:56<5:25:36, 2.48s/it] +2025-02-05 23:41:36 - ERROR - stderr - +2025-02-05 23:41:36 - ERROR - stderr - +2025-02-05 23:41:36 - INFO - stdout - {'loss': 0.6898, 
'grad_norm': 1.3431954383850098, 'learning_rate': 5.789816472669622e-06, 'epoch': 1.95} +2025-02-05 23:41:36 - ERROR - stderr - 65%|██████▍ | 14564/22434 [13:33:56<5:25:36, 2.48s/it] +2025-02-05 23:41:38 - ERROR - stderr - 65%|██████▍ | 14565/22434 [13:33:58<5:26:09, 2.49s/it] +2025-02-05 23:41:38 - ERROR - stderr - +2025-02-05 23:41:38 - ERROR - stderr - +2025-02-05 23:41:38 - INFO - stdout - {'loss': 0.6781, 'grad_norm': 1.3829270601272583, 'learning_rate': 5.788506963118232e-06, 'epoch': 1.95} +2025-02-05 23:41:38 - ERROR - stderr - 65%|██████▍ | 14565/22434 [13:33:58<5:26:09, 2.49s/it] +2025-02-05 23:41:41 - ERROR - stderr - 65%|██████▍ | 14566/22434 [13:34:01<5:33:36, 2.54s/it] +2025-02-05 23:41:41 - ERROR - stderr - +2025-02-05 23:41:41 - ERROR - stderr - +2025-02-05 23:41:41 - INFO - stdout - {'loss': 0.6945, 'grad_norm': 1.2572550773620605, 'learning_rate': 5.787197541351383e-06, 'epoch': 1.95} +2025-02-05 23:41:41 - ERROR - stderr - 65%|██████▍ | 14566/22434 [13:34:01<5:33:36, 2.54s/it] +2025-02-05 23:41:44 - ERROR - stderr - 65%|██████▍ | 14567/22434 [13:34:03<5:30:29, 2.52s/it] +2025-02-05 23:41:44 - ERROR - stderr - +2025-02-05 23:41:44 - ERROR - stderr - +2025-02-05 23:41:44 - INFO - stdout - {'loss': 0.6362, 'grad_norm': 1.2475545406341553, 'learning_rate': 5.785888207396374e-06, 'epoch': 1.95} +2025-02-05 23:41:44 - ERROR - stderr - 65%|██████▍ | 14567/22434 [13:34:03<5:30:29, 2.52s/it] +2025-02-05 23:41:46 - ERROR - stderr - 65%|██████▍ | 14568/22434 [13:34:06<5:30:45, 2.52s/it] +2025-02-05 23:41:46 - ERROR - stderr - +2025-02-05 23:41:46 - ERROR - stderr - +2025-02-05 23:41:46 - INFO - stdout - {'loss': 0.5718, 'grad_norm': 1.12861967086792, 'learning_rate': 5.784578961280485e-06, 'epoch': 1.95} +2025-02-05 23:41:46 - ERROR - stderr - 65%|██████▍ | 14568/22434 [13:34:06<5:30:45, 2.52s/it] +2025-02-05 23:41:49 - ERROR - stderr - 65%|██████▍ | 14569/22434 [13:34:08<5:30:24, 2.52s/it] +2025-02-05 23:41:49 - ERROR - stderr - +2025-02-05 23:41:49 - 
ERROR - stderr - +2025-02-05 23:41:49 - INFO - stdout - {'loss': 0.7706, 'grad_norm': 1.3583524227142334, 'learning_rate': 5.783269803031022e-06, 'epoch': 1.95} +2025-02-05 23:41:49 - ERROR - stderr - 65%|██████▍ | 14569/22434 [13:34:08<5:30:24, 2.52s/it] +2025-02-05 23:41:51 - ERROR - stderr - 65%|██████▍ | 14570/22434 [13:34:11<5:31:32, 2.53s/it] +2025-02-05 23:41:51 - ERROR - stderr - +2025-02-05 23:41:51 - ERROR - stderr - +2025-02-05 23:41:51 - INFO - stdout - {'loss': 0.6147, 'grad_norm': 1.3273788690567017, 'learning_rate': 5.78196073267526e-06, 'epoch': 1.95} +2025-02-05 23:41:51 - ERROR - stderr - 65%|██████▍ | 14570/22434 [13:34:11<5:31:32, 2.53s/it] +2025-02-05 23:41:54 - ERROR - stderr - 65%|██████▍ | 14571/22434 [13:34:13<5:32:10, 2.53s/it] +2025-02-05 23:41:54 - ERROR - stderr - +2025-02-05 23:41:54 - ERROR - stderr - +2025-02-05 23:41:54 - INFO - stdout - {'loss': 0.6911, 'grad_norm': 1.2622047662734985, 'learning_rate': 5.780651750240491e-06, 'epoch': 1.95} +2025-02-05 23:41:54 - ERROR - stderr - 65%|██████▍ | 14571/22434 [13:34:14<5:32:10, 2.53s/it] +2025-02-05 23:41:56 - ERROR - stderr - 65%|██████▍ | 14572/22434 [13:34:16<5:29:50, 2.52s/it] +2025-02-05 23:41:56 - ERROR - stderr - +2025-02-05 23:41:56 - ERROR - stderr - +2025-02-05 23:41:56 - INFO - stdout - {'loss': 0.5865, 'grad_norm': 1.318591833114624, 'learning_rate': 5.779342855754e-06, 'epoch': 1.95} +2025-02-05 23:41:56 - ERROR - stderr - 65%|██████▍ | 14572/22434 [13:34:16<5:29:50, 2.52s/it] +2025-02-05 23:41:59 - ERROR - stderr - 65%|██████▍ | 14573/22434 [13:34:18<5:28:42, 2.51s/it] +2025-02-05 23:41:59 - ERROR - stderr - +2025-02-05 23:41:59 - ERROR - stderr - +2025-02-05 23:41:59 - INFO - stdout - {'loss': 0.613, 'grad_norm': 1.2917972803115845, 'learning_rate': 5.778034049243062e-06, 'epoch': 1.95} +2025-02-05 23:41:59 - ERROR - stderr - 65%|██████▍ | 14573/22434 [13:34:18<5:28:42, 2.51s/it] +2025-02-05 23:42:01 - ERROR - stderr - 65%|██████▍ | 14574/22434 [13:34:21<5:26:10, 
2.49s/it] +2025-02-05 23:42:01 - ERROR - stderr - +2025-02-05 23:42:01 - ERROR - stderr - +2025-02-05 23:42:01 - INFO - stdout - {'loss': 0.652, 'grad_norm': 1.19225013256073, 'learning_rate': 5.776725330734973e-06, 'epoch': 1.95} +2025-02-05 23:42:01 - ERROR - stderr - 65%|██████▍ | 14574/22434 [13:34:21<5:26:10, 2.49s/it] +2025-02-05 23:42:04 - ERROR - stderr - 65%|██████▍ | 14575/22434 [13:34:23<5:25:15, 2.48s/it] +2025-02-05 23:42:04 - ERROR - stderr - +2025-02-05 23:42:04 - ERROR - stderr - +2025-02-05 23:42:04 - INFO - stdout - {'loss': 0.6307, 'grad_norm': 1.2270605564117432, 'learning_rate': 5.7754167002570015e-06, 'epoch': 1.95} +2025-02-05 23:42:04 - ERROR - stderr - 65%|██████▍ | 14575/22434 [13:34:23<5:25:15, 2.48s/it] +2025-02-05 23:42:06 - ERROR - stderr - 65%|██████▍ | 14576/22434 [13:34:26<5:25:51, 2.49s/it] +2025-02-05 23:42:06 - ERROR - stderr - +2025-02-05 23:42:06 - ERROR - stderr - +2025-02-05 23:42:06 - INFO - stdout - {'loss': 0.6809, 'grad_norm': 1.4167894124984741, 'learning_rate': 5.774108157836424e-06, 'epoch': 1.95} +2025-02-05 23:42:06 - ERROR - stderr - 65%|██████▍ | 14576/22434 [13:34:26<5:25:51, 2.49s/it] +2025-02-05 23:42:09 - ERROR - stderr - 65%|██████▍ | 14577/22434 [13:34:28<5:25:46, 2.49s/it] +2025-02-05 23:42:09 - ERROR - stderr - +2025-02-05 23:42:09 - ERROR - stderr - +2025-02-05 23:42:09 - INFO - stdout - {'loss': 0.6394, 'grad_norm': 1.2539727687835693, 'learning_rate': 5.772799703500519e-06, 'epoch': 1.95} +2025-02-05 23:42:09 - ERROR - stderr - 65%|██████▍ | 14577/22434 [13:34:28<5:25:46, 2.49s/it] +2025-02-05 23:42:11 - ERROR - stderr - 65%|██████▍ | 14578/22434 [13:34:31<5:25:59, 2.49s/it] +2025-02-05 23:42:11 - ERROR - stderr - +2025-02-05 23:42:11 - ERROR - stderr - +2025-02-05 23:42:11 - INFO - stdout - {'loss': 0.7197, 'grad_norm': 1.4108961820602417, 'learning_rate': 5.771491337276559e-06, 'epoch': 1.95} +2025-02-05 23:42:11 - ERROR - stderr - 65%|██████▍ | 14578/22434 [13:34:31<5:25:59, 2.49s/it] +2025-02-05 
23:42:14 - ERROR - stderr - 65%|██████▍ | 14579/22434 [13:34:33<5:24:13, 2.48s/it] +2025-02-05 23:42:14 - ERROR - stderr - +2025-02-05 23:42:14 - ERROR - stderr - +2025-02-05 23:42:14 - INFO - stdout - {'loss': 0.6709, 'grad_norm': 1.239606499671936, 'learning_rate': 5.7701830591918164e-06, 'epoch': 1.95} +2025-02-05 23:42:14 - ERROR - stderr - 65%|██████▍ | 14579/22434 [13:34:33<5:24:13, 2.48s/it] +2025-02-05 23:42:16 - ERROR - stderr - 65%|██████▍ | 14580/22434 [13:34:36<5:24:24, 2.48s/it] +2025-02-05 23:42:16 - ERROR - stderr - +2025-02-05 23:42:16 - ERROR - stderr - +2025-02-05 23:42:16 - INFO - stdout - {'loss': 0.7008, 'grad_norm': 1.4313056468963623, 'learning_rate': 5.76887486927356e-06, 'epoch': 1.95} +2025-02-05 23:42:16 - ERROR - stderr - 65%|██████▍ | 14580/22434 [13:34:36<5:24:24, 2.48s/it] +2025-02-05 23:42:18 - ERROR - stderr - 65%|██████▍ | 14581/22434 [13:34:38<5:24:25, 2.48s/it] +2025-02-05 23:42:19 - ERROR - stderr - +2025-02-05 23:42:19 - ERROR - stderr - +2025-02-05 23:42:19 - INFO - stdout - {'loss': 0.7019, 'grad_norm': 1.2129759788513184, 'learning_rate': 5.767566767549058e-06, 'epoch': 1.95} +2025-02-05 23:42:19 - ERROR - stderr - 65%|██████▍ | 14581/22434 [13:34:38<5:24:25, 2.48s/it] +2025-02-05 23:42:21 - ERROR - stderr - 65%|██████▍ | 14582/22434 [13:34:41<5:36:57, 2.57s/it] +2025-02-05 23:42:21 - ERROR - stderr - +2025-02-05 23:42:21 - ERROR - stderr - +2025-02-05 23:42:21 - INFO - stdout - {'loss': 0.5381, 'grad_norm': 1.1411004066467285, 'learning_rate': 5.766258754045577e-06, 'epoch': 1.95} +2025-02-05 23:42:21 - ERROR - stderr - 65%|██████▍ | 14582/22434 [13:34:41<5:36:57, 2.57s/it] +2025-02-05 23:42:24 - ERROR - stderr - 65%|██████▌ | 14583/22434 [13:34:44<5:34:18, 2.55s/it] +2025-02-05 23:42:24 - ERROR - stderr - +2025-02-05 23:42:24 - ERROR - stderr - +2025-02-05 23:42:24 - INFO - stdout - {'loss': 0.6141, 'grad_norm': 1.271362066268921, 'learning_rate': 5.764950828790381e-06, 'epoch': 1.95} +2025-02-05 23:42:24 - ERROR - stderr 
- 65%|██████▌ | 14583/22434 [13:34:44<5:34:18, 2.55s/it] +2025-02-05 23:42:26 - ERROR - stderr - 65%|██████▌ | 14584/22434 [13:34:46<5:32:10, 2.54s/it] +2025-02-05 23:42:26 - ERROR - stderr - +2025-02-05 23:42:26 - ERROR - stderr - +2025-02-05 23:42:26 - INFO - stdout - {'loss': 0.6496, 'grad_norm': 1.1945751905441284, 'learning_rate': 5.763642991810732e-06, 'epoch': 1.95} +2025-02-05 23:42:26 - ERROR - stderr - 65%|██████▌ | 14584/22434 [13:34:46<5:32:10, 2.54s/it] +2025-02-05 23:42:29 - ERROR - stderr - 65%|██████▌ | 14585/22434 [13:34:49<5:29:53, 2.52s/it] +2025-02-05 23:42:29 - ERROR - stderr - +2025-02-05 23:42:29 - ERROR - stderr - +2025-02-05 23:42:29 - INFO - stdout - {'loss': 0.6351, 'grad_norm': 1.2459964752197266, 'learning_rate': 5.762335243133892e-06, 'epoch': 1.95} +2025-02-05 23:42:29 - ERROR - stderr - 65%|██████▌ | 14585/22434 [13:34:49<5:29:53, 2.52s/it] +2025-02-05 23:42:31 - ERROR - stderr - 65%|██████▌ | 14586/22434 [13:34:51<5:30:19, 2.53s/it] +2025-02-05 23:42:31 - ERROR - stderr - +2025-02-05 23:42:31 - ERROR - stderr - +2025-02-05 23:42:31 - INFO - stdout - {'loss': 0.6306, 'grad_norm': 1.202523112297058, 'learning_rate': 5.761027582787122e-06, 'epoch': 1.95} +2025-02-05 23:42:31 - ERROR - stderr - 65%|██████▌ | 14586/22434 [13:34:51<5:30:19, 2.53s/it] +2025-02-05 23:42:34 - ERROR - stderr - 65%|██████▌ | 14587/22434 [13:34:54<5:27:59, 2.51s/it] +2025-02-05 23:42:34 - ERROR - stderr - +2025-02-05 23:42:34 - ERROR - stderr - +2025-02-05 23:42:34 - INFO - stdout - {'loss': 0.736, 'grad_norm': 1.437219262123108, 'learning_rate': 5.759720010797668e-06, 'epoch': 1.95} +2025-02-05 23:42:34 - ERROR - stderr - 65%|██████▌ | 14587/22434 [13:34:54<5:27:59, 2.51s/it] +2025-02-05 23:42:36 - ERROR - stderr - 65%|██████▌ | 14588/22434 [13:34:56<5:29:01, 2.52s/it] +2025-02-05 23:42:36 - ERROR - stderr - +2025-02-05 23:42:36 - ERROR - stderr - +2025-02-05 23:42:36 - INFO - stdout - {'loss': 0.6181, 'grad_norm': 1.0865683555603027, 'learning_rate': 
5.758412527192801e-06, 'epoch': 1.95} +2025-02-05 23:42:36 - ERROR - stderr - 65%|██████▌ | 14588/22434 [13:34:56<5:29:01, 2.52s/it] +2025-02-05 23:42:39 - ERROR - stderr - 65%|██████▌ | 14589/22434 [13:34:59<5:28:40, 2.51s/it] +2025-02-05 23:42:39 - ERROR - stderr - +2025-02-05 23:42:39 - ERROR - stderr - +2025-02-05 23:42:39 - INFO - stdout - {'loss': 0.7191, 'grad_norm': 1.2649890184402466, 'learning_rate': 5.7571051319997585e-06, 'epoch': 1.95} +2025-02-05 23:42:39 - ERROR - stderr - 65%|██████▌ | 14589/22434 [13:34:59<5:28:40, 2.51s/it] +2025-02-05 23:42:41 - ERROR - stderr - 65%|██████▌ | 14590/22434 [13:35:01<5:29:41, 2.52s/it] +2025-02-05 23:42:41 - ERROR - stderr - +2025-02-05 23:42:41 - ERROR - stderr - +2025-02-05 23:42:41 - INFO - stdout - {'loss': 0.6957, 'grad_norm': 1.2200759649276733, 'learning_rate': 5.755797825245802e-06, 'epoch': 1.95} +2025-02-05 23:42:41 - ERROR - stderr - 65%|██████▌ | 14590/22434 [13:35:01<5:29:41, 2.52s/it] +2025-02-05 23:42:44 - ERROR - stderr - 65%|██████▌ | 14591/22434 [13:35:04<5:27:28, 2.51s/it] +2025-02-05 23:42:44 - ERROR - stderr - +2025-02-05 23:42:44 - ERROR - stderr - +2025-02-05 23:42:44 - INFO - stdout - {'loss': 0.638, 'grad_norm': 1.2296539545059204, 'learning_rate': 5.754490606958185e-06, 'epoch': 1.95} +2025-02-05 23:42:44 - ERROR - stderr - 65%|██████▌ | 14591/22434 [13:35:04<5:27:28, 2.51s/it] +2025-02-05 23:42:46 - ERROR - stderr - 65%|██████▌ | 14592/22434 [13:35:06<5:28:19, 2.51s/it] +2025-02-05 23:42:46 - ERROR - stderr - +2025-02-05 23:42:46 - ERROR - stderr - +2025-02-05 23:42:46 - INFO - stdout - {'loss': 0.7047, 'grad_norm': 1.3819276094436646, 'learning_rate': 5.753183477164139e-06, 'epoch': 1.95} +2025-02-05 23:42:46 - ERROR - stderr - 65%|██████▌ | 14592/22434 [13:35:06<5:28:19, 2.51s/it] +2025-02-05 23:42:49 - ERROR - stderr - 65%|██████▌ | 14593/22434 [13:35:09<5:26:13, 2.50s/it] +2025-02-05 23:42:49 - ERROR - stderr - +2025-02-05 23:42:49 - ERROR - stderr - +2025-02-05 23:42:49 - INFO - 
stdout - {'loss': 0.6435, 'grad_norm': 1.2433072328567505, 'learning_rate': 5.751876435890929e-06, 'epoch': 1.95} +2025-02-05 23:42:49 - ERROR - stderr - 65%|██████▌ | 14593/22434 [13:35:09<5:26:13, 2.50s/it] +2025-02-05 23:42:52 - ERROR - stderr - 65%|██████▌ | 14594/22434 [13:35:11<5:39:08, 2.60s/it] +2025-02-05 23:42:52 - ERROR - stderr - +2025-02-05 23:42:52 - ERROR - stderr - +2025-02-05 23:42:52 - INFO - stdout - {'loss': 0.6793, 'grad_norm': 1.2099626064300537, 'learning_rate': 5.750569483165785e-06, 'epoch': 1.95} +2025-02-05 23:42:52 - ERROR - stderr - 65%|██████▌ | 14594/22434 [13:35:11<5:39:08, 2.60s/it] +2025-02-05 23:42:54 - ERROR - stderr - 65%|██████▌ | 14595/22434 [13:35:14<5:38:38, 2.59s/it] +2025-02-05 23:42:54 - ERROR - stderr - +2025-02-05 23:42:54 - ERROR - stderr - +2025-02-05 23:42:54 - INFO - stdout - {'loss': 0.5591, 'grad_norm': 1.026604413986206, 'learning_rate': 5.7492626190159515e-06, 'epoch': 1.95} +2025-02-05 23:42:54 - ERROR - stderr - 65%|██████▌ | 14595/22434 [13:35:14<5:38:38, 2.59s/it] +2025-02-05 23:42:57 - ERROR - stderr - 65%|██████▌ | 14596/22434 [13:35:17<5:36:15, 2.57s/it] +2025-02-05 23:42:57 - ERROR - stderr - +2025-02-05 23:42:57 - ERROR - stderr - +2025-02-05 23:42:57 - INFO - stdout - {'loss': 0.6698, 'grad_norm': 1.3085094690322876, 'learning_rate': 5.747955843468674e-06, 'epoch': 1.95} +2025-02-05 23:42:57 - ERROR - stderr - 65%|██████▌ | 14596/22434 [13:35:17<5:36:15, 2.57s/it] +2025-02-05 23:42:59 - ERROR - stderr - 65%|██████▌ | 14597/22434 [13:35:19<5:32:41, 2.55s/it] +2025-02-05 23:42:59 - ERROR - stderr - +2025-02-05 23:42:59 - ERROR - stderr - +2025-02-05 23:42:59 - INFO - stdout - {'loss': 0.6139, 'grad_norm': 1.235945463180542, 'learning_rate': 5.746649156551187e-06, 'epoch': 1.95} +2025-02-05 23:42:59 - ERROR - stderr - 65%|██████▌ | 14597/22434 [13:35:19<5:32:41, 2.55s/it] +2025-02-05 23:43:02 - ERROR - stderr - 65%|██████▌ | 14598/22434 [13:35:21<5:30:33, 2.53s/it] +2025-02-05 23:43:02 - ERROR - stderr - 
+2025-02-05 23:43:02 - ERROR - stderr - +2025-02-05 23:43:02 - INFO - stdout - {'loss': 0.6993, 'grad_norm': 1.449135661125183, 'learning_rate': 5.74534255829073e-06, 'epoch': 1.95} +2025-02-05 23:43:02 - ERROR - stderr - 65%|██████▌ | 14598/22434 [13:35:22<5:30:33, 2.53s/it] +2025-02-05 23:43:04 - ERROR - stderr - 65%|██████▌ | 14599/22434 [13:35:24<5:30:08, 2.53s/it] +2025-02-05 23:43:04 - ERROR - stderr - +2025-02-05 23:43:04 - ERROR - stderr - +2025-02-05 23:43:04 - INFO - stdout - {'loss': 0.703, 'grad_norm': 1.2851158380508423, 'learning_rate': 5.744036048714534e-06, 'epoch': 1.95} +2025-02-05 23:43:04 - ERROR - stderr - 65%|██████▌ | 14599/22434 [13:35:24<5:30:08, 2.53s/it] +2025-02-05 23:43:07 - ERROR - stderr - 65%|██████▌ | 14600/22434 [13:35:26<5:26:46, 2.50s/it] +2025-02-05 23:43:07 - ERROR - stderr - +2025-02-05 23:43:07 - ERROR - stderr - +2025-02-05 23:43:07 - INFO - stdout - {'loss': 0.6717, 'grad_norm': 1.1898199319839478, 'learning_rate': 5.742729627849836e-06, 'epoch': 1.95} +2025-02-05 23:43:07 - ERROR - stderr - 65%|██████▌ | 14600/22434 [13:35:27<5:26:46, 2.50s/it] +2025-02-05 23:43:09 - ERROR - stderr - 65%|██████▌ | 14601/22434 [13:35:29<5:26:14, 2.50s/it] +2025-02-05 23:43:09 - ERROR - stderr - +2025-02-05 23:43:09 - ERROR - stderr - +2025-02-05 23:43:09 - INFO - stdout - {'loss': 0.681, 'grad_norm': 1.21040940284729, 'learning_rate': 5.7414232957238635e-06, 'epoch': 1.95} +2025-02-05 23:43:09 - ERROR - stderr - 65%|██████▌ | 14601/22434 [13:35:29<5:26:14, 2.50s/it] +2025-02-05 23:43:12 - ERROR - stderr - 65%|██████▌ | 14602/22434 [13:35:31<5:25:52, 2.50s/it] +2025-02-05 23:43:12 - ERROR - stderr - +2025-02-05 23:43:12 - ERROR - stderr - +2025-02-05 23:43:12 - INFO - stdout - {'loss': 0.6697, 'grad_norm': 1.1966288089752197, 'learning_rate': 5.740117052363848e-06, 'epoch': 1.95} +2025-02-05 23:43:12 - ERROR - stderr - 65%|██████▌ | 14602/22434 [13:35:31<5:25:52, 2.50s/it] +2025-02-05 23:43:14 - ERROR - stderr - 65%|██████▌ | 14603/22434 
[13:35:34<5:23:54, 2.48s/it] +2025-02-05 23:43:14 - ERROR - stderr - +2025-02-05 23:43:14 - ERROR - stderr - +2025-02-05 23:43:14 - INFO - stdout - {'loss': 0.7008, 'grad_norm': 1.3730822801589966, 'learning_rate': 5.738810897797016e-06, 'epoch': 1.95} +2025-02-05 23:43:14 - ERROR - stderr - 65%|██████▌ | 14603/22434 [13:35:34<5:23:54, 2.48s/it] +2025-02-05 23:43:17 - ERROR - stderr - 65%|██████▌ | 14604/22434 [13:35:36<5:25:15, 2.49s/it] +2025-02-05 23:43:17 - ERROR - stderr - +2025-02-05 23:43:17 - ERROR - stderr - +2025-02-05 23:43:17 - INFO - stdout - {'loss': 0.6007, 'grad_norm': 1.1394495964050293, 'learning_rate': 5.737504832050594e-06, 'epoch': 1.95} +2025-02-05 23:43:17 - ERROR - stderr - 65%|██████▌ | 14604/22434 [13:35:36<5:25:15, 2.49s/it] +2025-02-05 23:43:19 - ERROR - stderr - 65%|██████▌ | 14605/22434 [13:35:39<5:29:27, 2.52s/it] +2025-02-05 23:43:19 - ERROR - stderr - +2025-02-05 23:43:19 - ERROR - stderr - +2025-02-05 23:43:19 - INFO - stdout - {'loss': 0.6924, 'grad_norm': 1.4064130783081055, 'learning_rate': 5.736198855151804e-06, 'epoch': 1.95} +2025-02-05 23:43:19 - ERROR - stderr - 65%|██████▌ | 14605/22434 [13:35:39<5:29:27, 2.52s/it] +2025-02-05 23:43:22 - ERROR - stderr - 65%|██████▌ | 14606/22434 [13:35:41<5:28:04, 2.51s/it] +2025-02-05 23:43:22 - ERROR - stderr - +2025-02-05 23:43:22 - ERROR - stderr - +2025-02-05 23:43:22 - INFO - stdout - {'loss': 0.7476, 'grad_norm': 1.408184289932251, 'learning_rate': 5.734892967127869e-06, 'epoch': 1.95} +2025-02-05 23:43:22 - ERROR - stderr - 65%|██████▌ | 14606/22434 [13:35:42<5:28:04, 2.51s/it] +2025-02-05 23:43:24 - ERROR - stderr - 65%|██████▌ | 14607/22434 [13:35:44<5:26:51, 2.51s/it] +2025-02-05 23:43:24 - ERROR - stderr - +2025-02-05 23:43:24 - ERROR - stderr - +2025-02-05 23:43:24 - INFO - stdout - {'loss': 0.6505, 'grad_norm': 1.194059133529663, 'learning_rate': 5.733587168006014e-06, 'epoch': 1.95} +2025-02-05 23:43:24 - ERROR - stderr - 65%|██████▌ | 14607/22434 [13:35:44<5:26:51, 
2.51s/it] +2025-02-05 23:43:27 - ERROR - stderr - 65%|██████▌ | 14608/22434 [13:35:47<5:28:00, 2.51s/it] +2025-02-05 23:43:27 - ERROR - stderr - +2025-02-05 23:43:27 - ERROR - stderr - +2025-02-05 23:43:27 - INFO - stdout - {'loss': 0.7018, 'grad_norm': 1.3690651655197144, 'learning_rate': 5.732281457813445e-06, 'epoch': 1.95} +2025-02-05 23:43:27 - ERROR - stderr - 65%|██████▌ | 14608/22434 [13:35:47<5:28:00, 2.51s/it] +2025-02-05 23:43:29 - ERROR - stderr - 65%|██████▌ | 14609/22434 [13:35:49<5:30:41, 2.54s/it] +2025-02-05 23:43:29 - ERROR - stderr - +2025-02-05 23:43:29 - ERROR - stderr - +2025-02-05 23:43:29 - INFO - stdout - {'loss': 0.6149, 'grad_norm': 1.1940110921859741, 'learning_rate': 5.730975836577386e-06, 'epoch': 1.95} +2025-02-05 23:43:29 - ERROR - stderr - 65%|██████▌ | 14609/22434 [13:35:49<5:30:41, 2.54s/it] +2025-02-05 23:43:32 - ERROR - stderr - 65%|██████▌ | 14610/22434 [13:35:52<5:29:46, 2.53s/it] +2025-02-05 23:43:32 - ERROR - stderr - +2025-02-05 23:43:32 - ERROR - stderr - +2025-02-05 23:43:32 - INFO - stdout - {'loss': 0.6151, 'grad_norm': 1.268286108970642, 'learning_rate': 5.729670304325057e-06, 'epoch': 1.95} +2025-02-05 23:43:32 - ERROR - stderr - 65%|██████▌ | 14610/22434 [13:35:52<5:29:46, 2.53s/it] +2025-02-05 23:43:35 - ERROR - stderr - 65%|██████▌ | 14611/22434 [13:35:54<5:43:21, 2.63s/it] +2025-02-05 23:43:35 - ERROR - stderr - +2025-02-05 23:43:35 - ERROR - stderr - +2025-02-05 23:43:35 - INFO - stdout - {'loss': 0.6842, 'grad_norm': 1.3428348302841187, 'learning_rate': 5.728364861083655e-06, 'epoch': 1.95} +2025-02-05 23:43:35 - ERROR - stderr - 65%|██████▌ | 14611/22434 [13:35:55<5:43:21, 2.63s/it] +2025-02-05 23:43:37 - ERROR - stderr - 65%|██████▌ | 14612/22434 [13:35:57<5:41:07, 2.62s/it] +2025-02-05 23:43:37 - ERROR - stderr - +2025-02-05 23:43:37 - ERROR - stderr - +2025-02-05 23:43:37 - INFO - stdout - {'loss': 0.6316, 'grad_norm': 1.073595404624939, 'learning_rate': 5.727059506880408e-06, 'epoch': 1.95} +2025-02-05
23:43:37 - ERROR - stderr - 65%|██████▌ | 14612/22434 [13:35:57<5:41:07, 2.62s/it] +2025-02-05 23:43:40 - ERROR - stderr - 65%|██████▌ | 14613/22434 [13:36:00<5:38:10, 2.59s/it] +2025-02-05 23:43:40 - ERROR - stderr - +2025-02-05 23:43:40 - ERROR - stderr - +2025-02-05 23:43:40 - INFO - stdout - {'loss': 0.6534, 'grad_norm': 1.206764817237854, 'learning_rate': 5.72575424174251e-06, 'epoch': 1.95} +2025-02-05 23:43:40 - ERROR - stderr - 65%|██████▌ | 14613/22434 [13:36:00<5:38:10, 2.59s/it] +2025-02-05 23:43:42 - ERROR - stderr - 65%|██████▌ | 14614/22434 [13:36:02<5:37:32, 2.59s/it] +2025-02-05 23:43:42 - ERROR - stderr - +2025-02-05 23:43:42 - ERROR - stderr - +2025-02-05 23:43:42 - INFO - stdout - {'loss': 0.682, 'grad_norm': 1.2491192817687988, 'learning_rate': 5.724449065697182e-06, 'epoch': 1.95} +2025-02-05 23:43:42 - ERROR - stderr - 65%|██████▌ | 14614/22434 [13:36:02<5:37:32, 2.59s/it] +2025-02-05 23:43:45 - ERROR - stderr - 65%|██████▌ | 14615/22434 [13:36:05<5:33:38, 2.56s/it] +2025-02-05 23:43:45 - ERROR - stderr - +2025-02-05 23:43:45 - ERROR - stderr - +2025-02-05 23:43:45 - INFO - stdout - {'loss': 0.6271, 'grad_norm': 1.157507061958313, 'learning_rate': 5.723143978771617e-06, 'epoch': 1.95} +2025-02-05 23:43:45 - ERROR - stderr - 65%|██████▌ | 14615/22434 [13:36:05<5:33:38, 2.56s/it] +2025-02-05 23:43:47 - ERROR - stderr - 65%|██████▌ | 14616/22434 [13:36:07<5:33:30, 2.56s/it] +2025-02-05 23:43:48 - ERROR - stderr - +2025-02-05 23:43:48 - ERROR - stderr - +2025-02-05 23:43:48 - INFO - stdout - {'loss': 0.7223, 'grad_norm': 1.2509205341339111, 'learning_rate': 5.721838980993025e-06, 'epoch': 1.95} +2025-02-05 23:43:48 - ERROR - stderr - 65%|██████▌ | 14616/22434 [13:36:07<5:33:30, 2.56s/it] +2025-02-05 23:43:50 - ERROR - stderr - 65%|██████▌ | 14617/22434 [13:36:10<5:32:02, 2.55s/it] +2025-02-05 23:43:50 - ERROR - stderr - +2025-02-05 23:43:50 - ERROR - stderr - +2025-02-05 23:43:50 - INFO - stdout - {'loss': 0.7047, 'grad_norm': 1.2894445657730103, 
'learning_rate': 5.720534072388605e-06, 'epoch': 1.95} +2025-02-05 23:43:50 - ERROR - stderr - 65%|██████▌ | 14617/22434 [13:36:10<5:32:02, 2.55s/it] +2025-02-05 23:43:53 - ERROR - stderr - 65%|██████▌ | 14618/22434 [13:36:12<5:30:27, 2.54s/it] +2025-02-05 23:43:53 - ERROR - stderr - +2025-02-05 23:43:53 - ERROR - stderr - +2025-02-05 23:43:53 - INFO - stdout - {'loss': 0.6554, 'grad_norm': 1.226746678352356, 'learning_rate': 5.719229252985553e-06, 'epoch': 1.95} +2025-02-05 23:43:53 - ERROR - stderr - 65%|██████▌ | 14618/22434 [13:36:12<5:30:27, 2.54s/it] +2025-02-05 23:43:55 - ERROR - stderr - 65%|██████▌ | 14619/22434 [13:36:15<5:26:57, 2.51s/it] +2025-02-05 23:43:55 - ERROR - stderr - +2025-02-05 23:43:55 - ERROR - stderr - +2025-02-05 23:43:55 - INFO - stdout - {'loss': 0.6613, 'grad_norm': 1.3642951250076294, 'learning_rate': 5.7179245228110795e-06, 'epoch': 1.95} +2025-02-05 23:43:55 - ERROR - stderr - 65%|██████▌ | 14619/22434 [13:36:15<5:26:57, 2.51s/it] +2025-02-05 23:43:57 - ERROR - stderr - 65%|██████▌ | 14620/22434 [13:36:17<5:27:53, 2.52s/it] +2025-02-05 23:43:58 - ERROR - stderr - +2025-02-05 23:43:58 - ERROR - stderr - +2025-02-05 23:43:58 - INFO - stdout - {'loss': 0.685, 'grad_norm': 1.3229955434799194, 'learning_rate': 5.716619881892367e-06, 'epoch': 1.96} +2025-02-05 23:43:58 - ERROR - stderr - 65%|██████▌ | 14620/22434 [13:36:17<5:27:53, 2.52s/it] +2025-02-05 23:44:00 - ERROR - stderr - 65%|██████▌ | 14621/22434 [13:36:20<5:27:32, 2.52s/it] +2025-02-05 23:44:00 - ERROR - stderr - +2025-02-05 23:44:00 - ERROR - stderr - +2025-02-05 23:44:00 - INFO - stdout - {'loss': 0.6266, 'grad_norm': 1.3023368120193481, 'learning_rate': 5.715315330256614e-06, 'epoch': 1.96} +2025-02-05 23:44:00 - ERROR - stderr - 65%|██████▌ | 14621/22434 [13:36:20<5:27:32, 2.52s/it] +2025-02-05 23:44:03 - ERROR - stderr - 65%|██████▌ | 14622/22434 [13:36:22<5:30:41, 2.54s/it] +2025-02-05 23:44:03 - ERROR - stderr - +2025-02-05 23:44:03 - ERROR - stderr - +2025-02-05 
23:44:03 - INFO - stdout - {'loss': 0.6259, 'grad_norm': 1.2533899545669556, 'learning_rate': 5.714010867931015e-06, 'epoch': 1.96} +2025-02-05 23:44:03 - ERROR - stderr - 65%|██████▌ | 14622/22434 [13:36:22<5:30:41, 2.54s/it] +2025-02-05 23:44:05 - ERROR - stderr - 65%|██████▌ | 14623/22434 [13:36:25<5:32:38, 2.56s/it] +2025-02-05 23:44:05 - ERROR - stderr - +2025-02-05 23:44:05 - ERROR - stderr - +2025-02-05 23:44:05 - INFO - stdout - {'loss': 0.6825, 'grad_norm': 1.314460039138794, 'learning_rate': 5.7127064949427566e-06, 'epoch': 1.96} +2025-02-05 23:44:05 - ERROR - stderr - 65%|██████▌ | 14623/22434 [13:36:25<5:32:38, 2.56s/it] +2025-02-05 23:44:08 - ERROR - stderr - 65%|██████▌ | 14624/22434 [13:36:28<5:42:23, 2.63s/it] +2025-02-05 23:44:08 - ERROR - stderr - +2025-02-05 23:44:08 - ERROR - stderr - +2025-02-05 23:44:08 - INFO - stdout - {'loss': 0.6088, 'grad_norm': 1.182695984840393, 'learning_rate': 5.71140221131903e-06, 'epoch': 1.96} +2025-02-05 23:44:08 - ERROR - stderr - 65%|██████▌ | 14624/22434 [13:36:28<5:42:23, 2.63s/it] +2025-02-05 23:44:11 - ERROR - stderr - 65%|██████▌ | 14625/22434 [13:36:30<5:38:23, 2.60s/it] +2025-02-05 23:44:11 - ERROR - stderr - +2025-02-05 23:44:11 - ERROR - stderr - +2025-02-05 23:44:11 - INFO - stdout - {'loss': 0.6714, 'grad_norm': 1.3603765964508057, 'learning_rate': 5.710098017087019e-06, 'epoch': 1.96} +2025-02-05 23:44:11 - ERROR - stderr - 65%|██████▌ | 14625/22434 [13:36:30<5:38:23, 2.60s/it] +2025-02-05 23:44:13 - ERROR - stderr - 65%|██████▌ | 14626/22434 [13:36:33<5:31:22, 2.55s/it] +2025-02-05 23:44:13 - ERROR - stderr - +2025-02-05 23:44:13 - ERROR - stderr - +2025-02-05 23:44:13 - INFO - stdout - {'loss': 0.626, 'grad_norm': 1.1404730081558228, 'learning_rate': 5.708793912273911e-06, 'epoch': 1.96} +2025-02-05 23:44:13 - ERROR - stderr - 65%|██████▌ | 14626/22434 [13:36:33<5:31:22, 2.55s/it] +2025-02-05 23:44:15 - ERROR - stderr - 65%|██████▌ | 14627/22434 [13:36:35<5:28:18, 2.52s/it] +2025-02-05 23:44:15 - 
ERROR - stderr - +2025-02-05 23:44:15 - ERROR - stderr - +2025-02-05 23:44:15 - INFO - stdout - {'loss': 0.681, 'grad_norm': 1.2869254350662231, 'learning_rate': 5.7074898969068874e-06, 'epoch': 1.96} +2025-02-05 23:44:15 - ERROR - stderr - 65%|██████▌ | 14627/22434 [13:36:35<5:28:18, 2.52s/it] +2025-02-05 23:44:18 - ERROR - stderr - 65%|██████▌ | 14628/22434 [13:36:38<5:46:29, 2.66s/it] +2025-02-05 23:44:18 - ERROR - stderr - +2025-02-05 23:44:18 - ERROR - stderr - +2025-02-05 23:44:18 - INFO - stdout - {'loss': 0.6575, 'grad_norm': 1.269412875175476, 'learning_rate': 5.7061859710131296e-06, 'epoch': 1.96} +2025-02-05 23:44:18 - ERROR - stderr - 65%|██████▌ | 14628/22434 [13:36:38<5:46:29, 2.66s/it] +2025-02-05 23:44:21 - ERROR - stderr - 65%|██████▌ | 14629/22434 [13:36:41<5:46:25, 2.66s/it] +2025-02-05 23:44:21 - ERROR - stderr - +2025-02-05 23:44:21 - ERROR - stderr - +2025-02-05 23:44:21 - INFO - stdout - {'loss': 0.6241, 'grad_norm': 1.2813360691070557, 'learning_rate': 5.7048821346198155e-06, 'epoch': 1.96} +2025-02-05 23:44:21 - ERROR - stderr - 65%|██████▌ | 14629/22434 [13:36:41<5:46:25, 2.66s/it] +2025-02-05 23:44:24 - ERROR - stderr - 65%|██████▌ | 14630/22434 [13:36:43<5:41:08, 2.62s/it] +2025-02-05 23:44:24 - ERROR - stderr - +2025-02-05 23:44:24 - ERROR - stderr - +2025-02-05 23:44:24 - INFO - stdout - {'loss': 0.6185, 'grad_norm': 1.332628846168518, 'learning_rate': 5.703578387754124e-06, 'epoch': 1.96} +2025-02-05 23:44:24 - ERROR - stderr - 65%|██████▌ | 14630/22434 [13:36:43<5:41:08, 2.62s/it] +2025-02-05 23:44:26 - ERROR - stderr - 65%|██████▌ | 14631/22434 [13:36:46<5:38:11, 2.60s/it] +2025-02-05 23:44:26 - ERROR - stderr - +2025-02-05 23:44:26 - ERROR - stderr - +2025-02-05 23:44:26 - INFO - stdout - {'loss': 0.677, 'grad_norm': 1.459277868270874, 'learning_rate': 5.702274730443234e-06, 'epoch': 1.96} +2025-02-05 23:44:26 - ERROR - stderr - 65%|██████▌ | 14631/22434 [13:36:46<5:38:11, 2.60s/it] +2025-02-05 23:44:29 - ERROR - stderr - 
65%|██████▌ | 14632/22434 [13:36:48<5:31:56, 2.55s/it] +2025-02-05 23:44:29 - ERROR - stderr - +2025-02-05 23:44:29 - ERROR - stderr - +2025-02-05 23:44:29 - INFO - stdout - {'loss': 0.7439, 'grad_norm': 1.2983884811401367, 'learning_rate': 5.700971162714306e-06, 'epoch': 1.96} +2025-02-05 23:44:29 - ERROR - stderr - 65%|██████▌ | 14632/22434 [13:36:48<5:31:56, 2.55s/it] +2025-02-05 23:44:31 - ERROR - stderr - 65%|██████▌ | 14633/22434 [13:36:51<5:29:24, 2.53s/it] +2025-02-05 23:44:31 - ERROR - stderr - +2025-02-05 23:44:31 - ERROR - stderr - +2025-02-05 23:44:31 - INFO - stdout - {'loss': 0.6815, 'grad_norm': 1.2364405393600464, 'learning_rate': 5.69966768459453e-06, 'epoch': 1.96} +2025-02-05 23:44:31 - ERROR - stderr - 65%|██████▌ | 14633/22434 [13:36:51<5:29:24, 2.53s/it] +2025-02-05 23:44:34 - ERROR - stderr - 65%|██████▌ | 14634/22434 [13:36:53<5:28:08, 2.52s/it] +2025-02-05 23:44:34 - ERROR - stderr - +2025-02-05 23:44:34 - ERROR - stderr - +2025-02-05 23:44:34 - INFO - stdout - {'loss': 0.6297, 'grad_norm': 1.215437650680542, 'learning_rate': 5.698364296111057e-06, 'epoch': 1.96} +2025-02-05 23:44:34 - ERROR - stderr - 65%|██████▌ | 14634/22434 [13:36:53<5:28:08, 2.52s/it] +2025-02-05 23:44:36 - ERROR - stderr - 65%|██████▌ | 14635/22434 [13:36:56<5:29:45, 2.54s/it] +2025-02-05 23:44:36 - ERROR - stderr - +2025-02-05 23:44:36 - ERROR - stderr - +2025-02-05 23:44:36 - INFO - stdout - {'loss': 0.685, 'grad_norm': 1.315403938293457, 'learning_rate': 5.697060997291071e-06, 'epoch': 1.96} +2025-02-05 23:44:36 - ERROR - stderr - 65%|██████▌ | 14635/22434 [13:36:56<5:29:45, 2.54s/it] +2025-02-05 23:44:39 - ERROR - stderr - 65%|██████▌ | 14636/22434 [13:36:58<5:25:36, 2.51s/it] +2025-02-05 23:44:39 - ERROR - stderr - +2025-02-05 23:44:39 - ERROR - stderr - +2025-02-05 23:44:39 - INFO - stdout - {'loss': 0.5997, 'grad_norm': 1.1407335996627808, 'learning_rate': 5.695757788161729e-06, 'epoch': 1.96} +2025-02-05 23:44:39 - ERROR - stderr - 65%|██████▌ | 14636/22434 
[13:36:58<5:25:36, 2.51s/it] +2025-02-05 23:44:41 - ERROR - stderr - 65%|██████▌ | 14637/22434 [13:37:01<5:26:05, 2.51s/it] +2025-02-05 23:44:41 - ERROR - stderr - +2025-02-05 23:44:41 - ERROR - stderr - +2025-02-05 23:44:41 - INFO - stdout - {'loss': 0.7771, 'grad_norm': 1.3310550451278687, 'learning_rate': 5.694454668750191e-06, 'epoch': 1.96} +2025-02-05 23:44:41 - ERROR - stderr - 65%|██████▌ | 14637/22434 [13:37:01<5:26:05, 2.51s/it] +2025-02-05 23:44:44 - ERROR - stderr - 65%|██████▌ | 14638/22434 [13:37:03<5:25:41, 2.51s/it] +2025-02-05 23:44:44 - ERROR - stderr - +2025-02-05 23:44:44 - ERROR - stderr - +2025-02-05 23:44:44 - INFO - stdout - {'loss': 0.6863, 'grad_norm': 1.2347928285598755, 'learning_rate': 5.6931516390836364e-06, 'epoch': 1.96} +2025-02-05 23:44:44 - ERROR - stderr - 65%|██████▌ | 14638/22434 [13:37:03<5:25:41, 2.51s/it] +2025-02-05 23:44:46 - ERROR - stderr - 65%|██████▌ | 14639/22434 [13:37:06<5:26:13, 2.51s/it] +2025-02-05 23:44:46 - ERROR - stderr - +2025-02-05 23:44:46 - ERROR - stderr - +2025-02-05 23:44:46 - INFO - stdout - {'loss': 0.5978, 'grad_norm': 1.363997220993042, 'learning_rate': 5.6918486991892085e-06, 'epoch': 1.96} +2025-02-05 23:44:46 - ERROR - stderr - 65%|██████▌ | 14639/22434 [13:37:06<5:26:13, 2.51s/it] +2025-02-05 23:44:49 - ERROR - stderr - 65%|██████▌ | 14640/22434 [13:37:08<5:26:01, 2.51s/it] +2025-02-05 23:44:49 - ERROR - stderr - +2025-02-05 23:44:49 - ERROR - stderr - +2025-02-05 23:44:49 - INFO - stdout - {'loss': 0.661, 'grad_norm': 1.293451189994812, 'learning_rate': 5.690545849094072e-06, 'epoch': 1.96} +2025-02-05 23:44:49 - ERROR - stderr - 65%|██████▌ | 14640/22434 [13:37:08<5:26:01, 2.51s/it] +2025-02-05 23:44:51 - ERROR - stderr - 65%|██████▌ | 14641/22434 [13:37:11<5:24:35, 2.50s/it] +2025-02-05 23:44:51 - ERROR - stderr - +2025-02-05 23:44:51 - ERROR - stderr - +2025-02-05 23:44:51 - INFO - stdout - {'loss': 0.7209, 'grad_norm': 1.3363726139068604, 'learning_rate': 5.689243088825385e-06, 'epoch': 
1.96} +2025-02-05 23:44:51 - ERROR - stderr - 65%|██████▌ | 14641/22434 [13:37:11<5:24:35, 2.50s/it] +2025-02-05 23:44:54 - ERROR - stderr - 65%|██████▌ | 14642/22434 [13:37:13<5:25:12, 2.50s/it] +2025-02-05 23:44:54 - ERROR - stderr - +2025-02-05 23:44:54 - ERROR - stderr - +2025-02-05 23:44:54 - INFO - stdout - {'loss': 0.7761, 'grad_norm': 1.381197452545166, 'learning_rate': 5.6879404184102994e-06, 'epoch': 1.96} +2025-02-05 23:44:54 - ERROR - stderr - 65%|██████▌ | 14642/22434 [13:37:13<5:25:12, 2.50s/it] +2025-02-05 23:44:56 - ERROR - stderr - 65%|██████▌ | 14643/22434 [13:37:16<5:26:58, 2.52s/it] +2025-02-05 23:44:56 - ERROR - stderr - +2025-02-05 23:44:56 - ERROR - stderr - +2025-02-05 23:44:56 - INFO - stdout - {'loss': 0.6871, 'grad_norm': 1.255351185798645, 'learning_rate': 5.68663783787597e-06, 'epoch': 1.96} +2025-02-05 23:44:56 - ERROR - stderr - 65%|██████▌ | 14643/22434 [13:37:16<5:26:58, 2.52s/it] +2025-02-05 23:44:59 - ERROR - stderr - 65%|██████▌ | 14644/22434 [13:37:18<5:26:13, 2.51s/it] +2025-02-05 23:44:59 - ERROR - stderr - +2025-02-05 23:44:59 - ERROR - stderr - +2025-02-05 23:44:59 - INFO - stdout - {'loss': 0.7137, 'grad_norm': 1.3195786476135254, 'learning_rate': 5.685335347249548e-06, 'epoch': 1.96} +2025-02-05 23:44:59 - ERROR - stderr - 65%|██████▌ | 14644/22434 [13:37:18<5:26:13, 2.51s/it] +2025-02-05 23:45:01 - ERROR - stderr - 65%|██████▌ | 14645/22434 [13:37:21<5:26:32, 2.52s/it] +2025-02-05 23:45:01 - ERROR - stderr - +2025-02-05 23:45:01 - ERROR - stderr - +2025-02-05 23:45:01 - INFO - stdout - {'loss': 0.617, 'grad_norm': 1.316573977470398, 'learning_rate': 5.684032946558182e-06, 'epoch': 1.96} +2025-02-05 23:45:01 - ERROR - stderr - 65%|██████▌ | 14645/22434 [13:37:21<5:26:32, 2.52s/it] +2025-02-05 23:45:04 - ERROR - stderr - 65%|██████▌ | 14646/22434 [13:37:23<5:25:15, 2.51s/it] +2025-02-05 23:45:04 - ERROR - stderr - +2025-02-05 23:45:04 - ERROR - stderr - +2025-02-05 23:45:04 - INFO - stdout - {'loss': 0.6937, 'grad_norm': 
1.3466285467147827, 'learning_rate': 5.682730635829019e-06, 'epoch': 1.96} +2025-02-05 23:45:04 - ERROR - stderr - 65%|██████▌ | 14646/22434 [13:37:23<5:25:15, 2.51s/it] +2025-02-05 23:45:06 - ERROR - stderr - 65%|██████▌ | 14647/22434 [13:37:26<5:30:54, 2.55s/it] +2025-02-05 23:45:06 - ERROR - stderr - +2025-02-05 23:45:06 - ERROR - stderr - +2025-02-05 23:45:06 - INFO - stdout - {'loss': 0.7281, 'grad_norm': 1.3530833721160889, 'learning_rate': 5.681428415089204e-06, 'epoch': 1.96} +2025-02-05 23:45:06 - ERROR - stderr - 65%|██████▌ | 14647/22434 [13:37:26<5:30:54, 2.55s/it] +2025-02-05 23:45:09 - ERROR - stderr - 65%|██████▌ | 14648/22434 [13:37:29<5:33:03, 2.57s/it] +2025-02-05 23:45:09 - ERROR - stderr - +2025-02-05 23:45:09 - ERROR - stderr - +2025-02-05 23:45:09 - INFO - stdout - {'loss': 0.7142, 'grad_norm': 1.2846466302871704, 'learning_rate': 5.680126284365882e-06, 'epoch': 1.96} +2025-02-05 23:45:09 - ERROR - stderr - 65%|██████▌ | 14648/22434 [13:37:29<5:33:03, 2.57s/it] +2025-02-05 23:45:12 - ERROR - stderr - 65%|██████▌ | 14649/22434 [13:37:31<5:33:58, 2.57s/it] +2025-02-05 23:45:12 - ERROR - stderr - +2025-02-05 23:45:12 - ERROR - stderr - +2025-02-05 23:45:12 - INFO - stdout - {'loss': 0.6648, 'grad_norm': 1.1907652616500854, 'learning_rate': 5.678824243686194e-06, 'epoch': 1.96} +2025-02-05 23:45:12 - ERROR - stderr - 65%|██████▌ | 14649/22434 [13:37:31<5:33:58, 2.57s/it] +2025-02-05 23:45:14 - ERROR - stderr - 65%|██████▌ | 14650/22434 [13:37:34<5:31:15, 2.55s/it] +2025-02-05 23:45:14 - ERROR - stderr - +2025-02-05 23:45:14 - ERROR - stderr - +2025-02-05 23:45:14 - INFO - stdout - {'loss': 0.6205, 'grad_norm': 1.2312815189361572, 'learning_rate': 5.67752229307728e-06, 'epoch': 1.96} +2025-02-05 23:45:14 - ERROR - stderr - 65%|██████▌ | 14650/22434 [13:37:34<5:31:15, 2.55s/it] +2025-02-05 23:45:17 - ERROR - stderr - 65%|██████▌ | 14651/22434 [13:37:36<5:28:41, 2.53s/it] +2025-02-05 23:45:17 - ERROR - stderr - +2025-02-05 23:45:17 - ERROR - stderr - 
+2025-02-05 23:45:17 - INFO - stdout - {'loss': 0.6899, 'grad_norm': 1.5501290559768677, 'learning_rate': 5.6762204325662775e-06, 'epoch': 1.96} +2025-02-05 23:45:17 - ERROR - stderr - 65%|██████▌ | 14651/22434 [13:37:36<5:28:41, 2.53s/it] +2025-02-05 23:45:19 - ERROR - stderr - 65%|██████▌ | 14652/22434 [13:37:39<5:25:07, 2.51s/it] +2025-02-05 23:45:19 - ERROR - stderr - +2025-02-05 23:45:19 - ERROR - stderr - +2025-02-05 23:45:19 - INFO - stdout - {'loss': 0.6055, 'grad_norm': 1.1593133211135864, 'learning_rate': 5.674918662180326e-06, 'epoch': 1.96} +2025-02-05 23:45:19 - ERROR - stderr - 65%|██████▌ | 14652/22434 [13:37:39<5:25:07, 2.51s/it] +2025-02-05 23:45:21 - ERROR - stderr - 65%|██████▌ | 14653/22434 [13:37:41<5:26:12, 2.52s/it] +2025-02-05 23:45:22 - ERROR - stderr - +2025-02-05 23:45:22 - ERROR - stderr - +2025-02-05 23:45:22 - INFO - stdout - {'loss': 0.706, 'grad_norm': 1.3351491689682007, 'learning_rate': 5.673616981946548e-06, 'epoch': 1.96} +2025-02-05 23:45:22 - ERROR - stderr - 65%|██████▌ | 14653/22434 [13:37:41<5:26:12, 2.52s/it] +2025-02-05 23:45:24 - ERROR - stderr - 65%|██████▌ | 14654/22434 [13:37:44<5:25:05, 2.51s/it] +2025-02-05 23:45:24 - ERROR - stderr - +2025-02-05 23:45:24 - ERROR - stderr - +2025-02-05 23:45:24 - INFO - stdout - {'loss': 0.5792, 'grad_norm': 1.1090319156646729, 'learning_rate': 5.672315391892094e-06, 'epoch': 1.96} +2025-02-05 23:45:24 - ERROR - stderr - 65%|██████▌ | 14654/22434 [13:37:44<5:25:05, 2.51s/it] +2025-02-05 23:45:26 - ERROR - stderr - 65%|██████▌ | 14655/22434 [13:37:46<5:22:42, 2.49s/it] +2025-02-05 23:45:26 - ERROR - stderr - +2025-02-05 23:45:26 - ERROR - stderr - +2025-02-05 23:45:26 - INFO - stdout - {'loss': 0.6802, 'grad_norm': 1.4159257411956787, 'learning_rate': 5.671013892044079e-06, 'epoch': 1.96} +2025-02-05 23:45:26 - ERROR - stderr - 65%|██████▌ | 14655/22434 [13:37:46<5:22:42, 2.49s/it] +2025-02-05 23:45:29 - ERROR - stderr - 65%|██████▌ | 14656/22434 [13:37:49<5:23:13, 2.49s/it] 
+2025-02-05 23:45:29 - ERROR - stderr - +2025-02-05 23:45:29 - ERROR - stderr - +2025-02-05 23:45:29 - INFO - stdout - {'loss': 0.6262, 'grad_norm': 1.16078519821167, 'learning_rate': 5.669712482429632e-06, 'epoch': 1.96} +2025-02-05 23:45:29 - ERROR - stderr - 65%|██████▌ | 14656/22434 [13:37:49<5:23:13, 2.49s/it] +2025-02-05 23:45:31 - ERROR - stderr - 65%|██████▌ | 14657/22434 [13:37:51<5:25:27, 2.51s/it] +2025-02-05 23:45:32 - ERROR - stderr - +2025-02-05 23:45:32 - ERROR - stderr - +2025-02-05 23:45:32 - INFO - stdout - {'loss': 0.7067, 'grad_norm': 1.2462307214736938, 'learning_rate': 5.668411163075896e-06, 'epoch': 1.96} +2025-02-05 23:45:32 - ERROR - stderr - 65%|██████▌ | 14657/22434 [13:37:51<5:25:27, 2.51s/it] +2025-02-05 23:45:34 - ERROR - stderr - 65%|██████▌ | 14658/22434 [13:37:54<5:30:38, 2.55s/it] +2025-02-05 23:45:34 - ERROR - stderr - +2025-02-05 23:45:34 - ERROR - stderr - +2025-02-05 23:45:34 - INFO - stdout - {'loss': 0.6703, 'grad_norm': 1.354002594947815, 'learning_rate': 5.667109934009973e-06, 'epoch': 1.96} +2025-02-05 23:45:34 - ERROR - stderr - 65%|██████▌ | 14658/22434 [13:37:54<5:30:38, 2.55s/it] +2025-02-05 23:45:37 - ERROR - stderr - 65%|██████▌ | 14659/22434 [13:37:56<5:31:01, 2.55s/it] +2025-02-05 23:45:37 - ERROR - stderr - +2025-02-05 23:45:37 - ERROR - stderr - +2025-02-05 23:45:37 - INFO - stdout - {'loss': 0.7714, 'grad_norm': 1.3994309902191162, 'learning_rate': 5.6658087952590064e-06, 'epoch': 1.96} +2025-02-05 23:45:37 - ERROR - stderr - 65%|██████▌ | 14659/22434 [13:37:57<5:31:01, 2.55s/it] +2025-02-05 23:45:39 - ERROR - stderr - 65%|██████▌ | 14660/22434 [13:37:59<5:27:51, 2.53s/it] +2025-02-05 23:45:39 - ERROR - stderr - +2025-02-05 23:45:39 - ERROR - stderr - +2025-02-05 23:45:39 - INFO - stdout - {'loss': 0.6872, 'grad_norm': 1.1637780666351318, 'learning_rate': 5.664507746850106e-06, 'epoch': 1.96} +2025-02-05 23:45:39 - ERROR - stderr - 65%|██████▌ | 14660/22434 [13:37:59<5:27:51, 2.53s/it] +2025-02-05 23:45:42 - 
ERROR - stderr - 65%|██████▌ | 14661/22434 [13:38:01<5:26:44, 2.52s/it] +2025-02-05 23:45:42 - ERROR - stderr - +2025-02-05 23:45:42 - ERROR - stderr - +2025-02-05 23:45:42 - INFO - stdout - {'loss': 0.622, 'grad_norm': 1.1955482959747314, 'learning_rate': 5.663206788810391e-06, 'epoch': 1.96} +2025-02-05 23:45:42 - ERROR - stderr - 65%|██████▌ | 14661/22434 [13:38:01<5:26:44, 2.52s/it] +2025-02-05 23:45:44 - ERROR - stderr - 65%|██████▌ | 14662/22434 [13:38:04<5:24:44, 2.51s/it] +2025-02-05 23:45:44 - ERROR - stderr - +2025-02-05 23:45:44 - ERROR - stderr - +2025-02-05 23:45:44 - INFO - stdout - {'loss': 0.6395, 'grad_norm': 1.3589155673980713, 'learning_rate': 5.661905921166981e-06, 'epoch': 1.96} +2025-02-05 23:45:44 - ERROR - stderr - 65%|██████▌ | 14662/22434 [13:38:04<5:24:44, 2.51s/it] +2025-02-05 23:45:47 - ERROR - stderr - 65%|██████▌ | 14663/22434 [13:38:06<5:22:38, 2.49s/it] +2025-02-05 23:45:47 - ERROR - stderr - +2025-02-05 23:45:47 - ERROR - stderr - +2025-02-05 23:45:47 - INFO - stdout - {'loss': 0.6191, 'grad_norm': 1.1530872583389282, 'learning_rate': 5.6606051439469915e-06, 'epoch': 1.96} +2025-02-05 23:45:47 - ERROR - stderr - 65%|██████▌ | 14663/22434 [13:38:06<5:22:38, 2.49s/it] +2025-02-05 23:45:49 - ERROR - stderr - 65%|██████▌ | 14664/22434 [13:38:09<5:24:11, 2.50s/it] +2025-02-05 23:45:49 - ERROR - stderr - +2025-02-05 23:45:49 - ERROR - stderr - +2025-02-05 23:45:49 - INFO - stdout - {'loss': 0.8032, 'grad_norm': 1.3186378479003906, 'learning_rate': 5.6593044571775344e-06, 'epoch': 1.96} +2025-02-05 23:45:49 - ERROR - stderr - 65%|██████▌ | 14664/22434 [13:38:09<5:24:11, 2.50s/it] +2025-02-05 23:45:52 - ERROR - stderr - 65%|██████▌ | 14665/22434 [13:38:11<5:27:35, 2.53s/it] +2025-02-05 23:45:52 - ERROR - stderr - +2025-02-05 23:45:52 - ERROR - stderr - +2025-02-05 23:45:52 - INFO - stdout - {'loss': 0.6306, 'grad_norm': 1.1950606107711792, 'learning_rate': 5.658003860885724e-06, 'epoch': 1.96} +2025-02-05 23:45:52 - ERROR - stderr - 
65%|██████▌ | 14665/22434 [13:38:12<5:27:35, 2.53s/it] +2025-02-05 23:45:54 - ERROR - stderr - 65%|██████▌ | 14666/22434 [13:38:14<5:23:26, 2.50s/it] +2025-02-05 23:45:54 - ERROR - stderr - +2025-02-05 23:45:54 - ERROR - stderr - +2025-02-05 23:45:54 - INFO - stdout - {'loss': 0.6399, 'grad_norm': 1.3002429008483887, 'learning_rate': 5.656703355098666e-06, 'epoch': 1.96} +2025-02-05 23:45:54 - ERROR - stderr - 65%|██████▌ | 14666/22434 [13:38:14<5:23:26, 2.50s/it] +2025-02-05 23:45:57 - ERROR - stderr - 65%|██████▌ | 14667/22434 [13:38:16<5:20:54, 2.48s/it] +2025-02-05 23:45:57 - ERROR - stderr - +2025-02-05 23:45:57 - ERROR - stderr - +2025-02-05 23:45:57 - INFO - stdout - {'loss': 0.7687, 'grad_norm': 1.3470947742462158, 'learning_rate': 5.655402939843472e-06, 'epoch': 1.96} +2025-02-05 23:45:57 - ERROR - stderr - 65%|██████▌ | 14667/22434 [13:38:16<5:20:54, 2.48s/it] +2025-02-05 23:45:59 - ERROR - stderr - 65%|██████▌ | 14668/22434 [13:38:19<5:23:04, 2.50s/it] +2025-02-05 23:45:59 - ERROR - stderr - +2025-02-05 23:45:59 - ERROR - stderr - +2025-02-05 23:45:59 - INFO - stdout - {'loss': 0.7361, 'grad_norm': 1.3797457218170166, 'learning_rate': 5.654102615147245e-06, 'epoch': 1.96} +2025-02-05 23:45:59 - ERROR - stderr - 65%|██████▌ | 14668/22434 [13:38:19<5:23:04, 2.50s/it] +2025-02-05 23:46:02 - ERROR - stderr - 65%|██████▌ | 14669/22434 [13:38:21<5:20:41, 2.48s/it] +2025-02-05 23:46:02 - ERROR - stderr - +2025-02-05 23:46:02 - ERROR - stderr - +2025-02-05 23:46:02 - INFO - stdout - {'loss': 0.6731, 'grad_norm': 1.3496390581130981, 'learning_rate': 5.652802381037093e-06, 'epoch': 1.96} +2025-02-05 23:46:02 - ERROR - stderr - 65%|██████▌ | 14669/22434 [13:38:21<5:20:41, 2.48s/it] +2025-02-05 23:46:04 - ERROR - stderr - 65%|██████▌ | 14670/22434 [13:38:24<5:21:55, 2.49s/it] +2025-02-05 23:46:04 - ERROR - stderr - +2025-02-05 23:46:04 - ERROR - stderr - +2025-02-05 23:46:04 - INFO - stdout - {'loss': 0.6217, 'grad_norm': 1.1988033056259155, 'learning_rate': 
5.651502237540113e-06, 'epoch': 1.96} +2025-02-05 23:46:04 - ERROR - stderr - 65%|██████▌ | 14670/22434 [13:38:24<5:21:55, 2.49s/it] +2025-02-05 23:46:07 - ERROR - stderr - 65%|██████▌ | 14671/22434 [13:38:26<5:21:18, 2.48s/it] +2025-02-05 23:46:07 - ERROR - stderr - +2025-02-05 23:46:07 - ERROR - stderr - +2025-02-05 23:46:07 - INFO - stdout - {'loss': 0.676, 'grad_norm': 1.2463001012802124, 'learning_rate': 5.650202184683413e-06, 'epoch': 1.96} +2025-02-05 23:46:07 - ERROR - stderr - 65%|██████▌ | 14671/22434 [13:38:26<5:21:18, 2.48s/it] +2025-02-05 23:46:09 - ERROR - stderr - 65%|██████▌ | 14672/22434 [13:38:29<5:21:38, 2.49s/it] +2025-02-05 23:46:09 - ERROR - stderr - +2025-02-05 23:46:09 - ERROR - stderr - +2025-02-05 23:46:09 - INFO - stdout - {'loss': 0.6124, 'grad_norm': 1.1167430877685547, 'learning_rate': 5.648902222494077e-06, 'epoch': 1.96} +2025-02-05 23:46:09 - ERROR - stderr - 65%|██████▌ | 14672/22434 [13:38:29<5:21:38, 2.49s/it] +2025-02-05 23:46:12 - ERROR - stderr - 65%|██████▌ | 14673/22434 [13:38:31<5:24:21, 2.51s/it] +2025-02-05 23:46:12 - ERROR - stderr - +2025-02-05 23:46:12 - ERROR - stderr - +2025-02-05 23:46:12 - INFO - stdout - {'loss': 0.6356, 'grad_norm': 1.3045426607131958, 'learning_rate': 5.64760235099922e-06, 'epoch': 1.96} +2025-02-05 23:46:12 - ERROR - stderr - 65%|██████▌ | 14673/22434 [13:38:31<5:24:21, 2.51s/it] +2025-02-05 23:46:14 - ERROR - stderr - 65%|██████▌ | 14674/22434 [13:38:34<5:24:14, 2.51s/it] +2025-02-05 23:46:14 - ERROR - stderr - +2025-02-05 23:46:14 - ERROR - stderr - +2025-02-05 23:46:14 - INFO - stdout - {'loss': 0.6672, 'grad_norm': 1.257029414176941, 'learning_rate': 5.646302570225919e-06, 'epoch': 1.96} +2025-02-05 23:46:14 - ERROR - stderr - 65%|██████▌ | 14674/22434 [13:38:34<5:24:14, 2.51s/it] +2025-02-05 23:46:17 - ERROR - stderr - 65%|██████▌ | 14675/22434 [13:38:36<5:22:17, 2.49s/it] +2025-02-05 23:46:17 - ERROR - stderr - +2025-02-05 23:46:17 - ERROR - stderr - +2025-02-05 23:46:17 - INFO - stdout - 
{'loss': 0.6295, 'grad_norm': 1.2020256519317627, 'learning_rate': 5.645002880201278e-06, 'epoch': 1.96} +2025-02-05 23:46:17 - ERROR - stderr - 65%|██████▌ | 14675/22434 [13:38:36<5:22:17, 2.49s/it] +2025-02-05 23:46:19 - ERROR - stderr - 65%|██████▌ | 14676/22434 [13:38:39<5:22:15, 2.49s/it] +2025-02-05 23:46:19 - ERROR - stderr - +2025-02-05 23:46:19 - ERROR - stderr - +2025-02-05 23:46:19 - INFO - stdout - {'loss': 0.6613, 'grad_norm': 1.2027219533920288, 'learning_rate': 5.643703280952391e-06, 'epoch': 1.96} +2025-02-05 23:46:19 - ERROR - stderr - 65%|██████▌ | 14676/22434 [13:38:39<5:22:15, 2.49s/it] +2025-02-05 23:46:22 - ERROR - stderr - 65%|██████▌ | 14677/22434 [13:38:41<5:22:37, 2.50s/it] +2025-02-05 23:46:22 - ERROR - stderr - +2025-02-05 23:46:22 - ERROR - stderr - +2025-02-05 23:46:22 - INFO - stdout - {'loss': 0.6726, 'grad_norm': 1.353958249092102, 'learning_rate': 5.642403772506331e-06, 'epoch': 1.96} +2025-02-05 23:46:22 - ERROR - stderr - 65%|██████▌ | 14677/22434 [13:38:41<5:22:37, 2.50s/it] +2025-02-05 23:46:24 - ERROR - stderr - 65%|██████▌ | 14678/22434 [13:38:44<5:21:26, 2.49s/it] +2025-02-05 23:46:24 - ERROR - stderr - +2025-02-05 23:46:24 - ERROR - stderr - +2025-02-05 23:46:24 - INFO - stdout - {'loss': 0.7723, 'grad_norm': 1.4640628099441528, 'learning_rate': 5.6411043548902016e-06, 'epoch': 1.96} +2025-02-05 23:46:24 - ERROR - stderr - 65%|██████▌ | 14678/22434 [13:38:44<5:21:26, 2.49s/it] +2025-02-05 23:46:27 - ERROR - stderr - 65%|██████▌ | 14679/22434 [13:38:46<5:23:06, 2.50s/it] +2025-02-05 23:46:27 - ERROR - stderr - +2025-02-05 23:46:27 - ERROR - stderr - +2025-02-05 23:46:27 - INFO - stdout - {'loss': 0.6843, 'grad_norm': 1.2663509845733643, 'learning_rate': 5.639805028131078e-06, 'epoch': 1.96} +2025-02-05 23:46:27 - ERROR - stderr - 65%|██████▌ | 14679/22434 [13:38:46<5:23:06, 2.50s/it] +2025-02-05 23:46:29 - ERROR - stderr - 65%|██████▌ | 14680/22434 [13:38:49<5:26:10, 2.52s/it] +2025-02-05 23:46:29 - ERROR - stderr - 
+2025-02-05 23:46:29 - ERROR - stderr - +2025-02-05 23:46:29 - INFO - stdout - {'loss': 0.6768, 'grad_norm': 1.2475420236587524, 'learning_rate': 5.638505792256046e-06, 'epoch': 1.96} +2025-02-05 23:46:29 - ERROR - stderr - 65%|██████▌ | 14680/22434 [13:38:49<5:26:10, 2.52s/it] +2025-02-05 23:46:32 - ERROR - stderr - 65%|██████▌ | 14681/22434 [13:38:51<5:29:12, 2.55s/it] +2025-02-05 23:46:32 - ERROR - stderr - +2025-02-05 23:46:32 - ERROR - stderr - +2025-02-05 23:46:32 - INFO - stdout - {'loss': 0.6207, 'grad_norm': 1.2259981632232666, 'learning_rate': 5.6372066472921875e-06, 'epoch': 1.96} +2025-02-05 23:46:32 - ERROR - stderr - 65%|██████▌ | 14681/22434 [13:38:52<5:29:12, 2.55s/it] +2025-02-05 23:46:34 - ERROR - stderr - 65%|██████▌ | 14682/22434 [13:38:54<5:28:47, 2.54s/it] +2025-02-05 23:46:34 - ERROR - stderr - +2025-02-05 23:46:34 - ERROR - stderr - +2025-02-05 23:46:34 - INFO - stdout - {'loss': 0.681, 'grad_norm': 1.2565579414367676, 'learning_rate': 5.635907593266578e-06, 'epoch': 1.96} +2025-02-05 23:46:34 - ERROR - stderr - 65%|██████▌ | 14682/22434 [13:38:54<5:28:47, 2.54s/it] +2025-02-05 23:46:37 - ERROR - stderr - 65%|██████▌ | 14683/22434 [13:38:57<5:27:40, 2.54s/it] +2025-02-05 23:46:37 - ERROR - stderr - +2025-02-05 23:46:37 - ERROR - stderr - +2025-02-05 23:46:37 - INFO - stdout - {'loss': 0.6373, 'grad_norm': 1.2950533628463745, 'learning_rate': 5.634608630206306e-06, 'epoch': 1.96} +2025-02-05 23:46:37 - ERROR - stderr - 65%|██████▌ | 14683/22434 [13:38:57<5:27:40, 2.54s/it] +2025-02-05 23:46:39 - ERROR - stderr - 65%|██████▌ | 14684/22434 [13:38:59<5:25:52, 2.52s/it] +2025-02-05 23:46:39 - ERROR - stderr - +2025-02-05 23:46:39 - ERROR - stderr - +2025-02-05 23:46:39 - INFO - stdout - {'loss': 0.6578, 'grad_norm': 1.2803237438201904, 'learning_rate': 5.6333097581384365e-06, 'epoch': 1.96} +2025-02-05 23:46:39 - ERROR - stderr - 65%|██████▌ | 14684/22434 [13:38:59<5:25:52, 2.52s/it] +2025-02-05 23:46:42 - ERROR - stderr - 65%|██████▌ | 
14685/22434 [13:39:01<5:23:01, 2.50s/it] +2025-02-05 23:46:42 - ERROR - stderr - +2025-02-05 23:46:42 - ERROR - stderr - +2025-02-05 23:46:42 - INFO - stdout - {'loss': 0.7063, 'grad_norm': 1.499576210975647, 'learning_rate': 5.6320109770900455e-06, 'epoch': 1.96} +2025-02-05 23:46:42 - ERROR - stderr - 65%|██████▌ | 14685/22434 [13:39:02<5:23:01, 2.50s/it] +2025-02-05 23:46:44 - ERROR - stderr - 65%|██████▌ | 14686/22434 [13:39:04<5:24:48, 2.52s/it] +2025-02-05 23:46:44 - ERROR - stderr - +2025-02-05 23:46:44 - ERROR - stderr - +2025-02-05 23:46:44 - INFO - stdout - {'loss': 0.6045, 'grad_norm': 1.2894600629806519, 'learning_rate': 5.630712287088207e-06, 'epoch': 1.96} +2025-02-05 23:46:44 - ERROR - stderr - 65%|██████▌ | 14686/22434 [13:39:04<5:24:48, 2.52s/it] +2025-02-05 23:46:47 - ERROR - stderr - 65%|██████▌ | 14687/22434 [13:39:07<5:42:13, 2.65s/it] +2025-02-05 23:46:47 - ERROR - stderr - +2025-02-05 23:46:47 - ERROR - stderr - +2025-02-05 23:46:47 - INFO - stdout - {'loss': 0.7184, 'grad_norm': 1.3914014101028442, 'learning_rate': 5.6294136881599905e-06, 'epoch': 1.96} +2025-02-05 23:46:47 - ERROR - stderr - 65%|██████▌ | 14687/22434 [13:39:07<5:42:13, 2.65s/it] +2025-02-05 23:46:50 - ERROR - stderr - 65%|██████▌ | 14688/22434 [13:39:10<5:41:59, 2.65s/it] +2025-02-05 23:46:50 - ERROR - stderr - +2025-02-05 23:46:50 - ERROR - stderr - +2025-02-05 23:46:50 - INFO - stdout - {'loss': 0.5928, 'grad_norm': 1.0767074823379517, 'learning_rate': 5.628115180332463e-06, 'epoch': 1.96} +2025-02-05 23:46:50 - ERROR - stderr - 65%|██████▌ | 14688/22434 [13:39:10<5:41:59, 2.65s/it] +2025-02-05 23:46:53 - ERROR - stderr - 65%|██████▌ | 14689/22434 [13:39:12<5:42:59, 2.66s/it] +2025-02-05 23:46:53 - ERROR - stderr - +2025-02-05 23:46:53 - ERROR - stderr - +2025-02-05 23:46:53 - INFO - stdout - {'loss': 0.665, 'grad_norm': 1.3659939765930176, 'learning_rate': 5.6268167636326896e-06, 'epoch': 1.96} +2025-02-05 23:46:53 - ERROR - stderr - 65%|██████▌ | 14689/22434 
[13:39:12<5:42:59, 2.66s/it] +2025-02-05 23:46:55 - ERROR - stderr - 65%|██████▌ | 14690/22434 [13:39:15<5:39:59, 2.63s/it] +2025-02-05 23:46:55 - ERROR - stderr - +2025-02-05 23:46:55 - ERROR - stderr - +2025-02-05 23:46:55 - INFO - stdout - {'loss': 0.7519, 'grad_norm': 1.4366748332977295, 'learning_rate': 5.625518438087738e-06, 'epoch': 1.96} +2025-02-05 23:46:55 - ERROR - stderr - 65%|██████▌ | 14690/22434 [13:39:15<5:39:59, 2.63s/it] +2025-02-05 23:46:58 - ERROR - stderr - 65%|██████▌ | 14691/22434 [13:39:18<5:45:41, 2.68s/it] +2025-02-05 23:46:58 - ERROR - stderr - +2025-02-05 23:46:58 - ERROR - stderr - +2025-02-05 23:46:58 - INFO - stdout - {'loss': 0.6127, 'grad_norm': 1.1853920221328735, 'learning_rate': 5.624220203724669e-06, 'epoch': 1.96} +2025-02-05 23:46:58 - ERROR - stderr - 65%|██████▌ | 14691/22434 [13:39:18<5:45:41, 2.68s/it] +2025-02-05 23:47:00 - ERROR - stderr - 65%|██████▌ | 14692/22434 [13:39:20<5:38:49, 2.63s/it] +2025-02-05 23:47:00 - ERROR - stderr - +2025-02-05 23:47:00 - ERROR - stderr - +2025-02-05 23:47:00 - INFO - stdout - {'loss': 0.7031, 'grad_norm': 1.2590110301971436, 'learning_rate': 5.62292206057054e-06, 'epoch': 1.96} +2025-02-05 23:47:00 - ERROR - stderr - 65%|██████▌ | 14692/22434 [13:39:20<5:38:49, 2.63s/it] +2025-02-05 23:47:03 - ERROR - stderr - 65%|██████▌ | 14693/22434 [13:39:23<5:35:53, 2.60s/it] +2025-02-05 23:47:03 - ERROR - stderr - +2025-02-05 23:47:03 - ERROR - stderr - +2025-02-05 23:47:03 - INFO - stdout - {'loss': 0.7354, 'grad_norm': 1.350253939628601, 'learning_rate': 5.621624008652414e-06, 'epoch': 1.96} +2025-02-05 23:47:03 - ERROR - stderr - 65%|██████▌ | 14693/22434 [13:39:23<5:35:53, 2.60s/it] +2025-02-05 23:47:06 - ERROR - stderr - 65%|██████▌ | 14694/22434 [13:39:25<5:35:15, 2.60s/it] +2025-02-05 23:47:06 - ERROR - stderr - +2025-02-05 23:47:06 - ERROR - stderr - +2025-02-05 23:47:06 - INFO - stdout - {'loss': 0.6178, 'grad_norm': 1.364099383354187, 'learning_rate': 5.620326047997346e-06, 'epoch': 
1.96} +2025-02-05 23:47:06 - ERROR - stderr - 65%|██████▌ | 14694/22434 [13:39:25<5:35:15, 2.60s/it] +2025-02-05 23:47:08 - ERROR - stderr - 66%|██████▌ | 14695/22434 [13:39:28<5:37:17, 2.62s/it] +2025-02-05 23:47:08 - ERROR - stderr - +2025-02-05 23:47:08 - ERROR - stderr - +2025-02-05 23:47:08 - INFO - stdout - {'loss': 0.7489, 'grad_norm': 1.3688173294067383, 'learning_rate': 5.619028178632394e-06, 'epoch': 1.97} +2025-02-05 23:47:08 - ERROR - stderr - 66%|██████▌ | 14695/22434 [13:39:28<5:37:17, 2.62s/it] +2025-02-05 23:47:11 - ERROR - stderr - 66%|██████▌ | 14696/22434 [13:39:31<5:33:51, 2.59s/it] +2025-02-05 23:47:11 - ERROR - stderr - +2025-02-05 23:47:11 - ERROR - stderr - +2025-02-05 23:47:11 - INFO - stdout - {'loss': 0.6803, 'grad_norm': 1.261811375617981, 'learning_rate': 5.6177304005846e-06, 'epoch': 1.97} +2025-02-05 23:47:11 - ERROR - stderr - 66%|██████▌ | 14696/22434 [13:39:31<5:33:51, 2.59s/it] +2025-02-05 23:47:13 - ERROR - stderr - 66%|██████▌ | 14697/22434 [13:39:33<5:31:13, 2.57s/it] +2025-02-05 23:47:13 - ERROR - stderr - +2025-02-05 23:47:13 - ERROR - stderr - +2025-02-05 23:47:13 - INFO - stdout - {'loss': 0.682, 'grad_norm': 1.2925554513931274, 'learning_rate': 5.61643271388103e-06, 'epoch': 1.97} +2025-02-05 23:47:13 - ERROR - stderr - 66%|██████▌ | 14697/22434 [13:39:33<5:31:13, 2.57s/it] +2025-02-05 23:47:16 - ERROR - stderr - 66%|██████▌ | 14698/22434 [13:39:36<5:28:43, 2.55s/it] +2025-02-05 23:47:16 - ERROR - stderr - +2025-02-05 23:47:16 - ERROR - stderr - +2025-02-05 23:47:16 - INFO - stdout - {'loss': 0.6622, 'grad_norm': 1.1903830766677856, 'learning_rate': 5.615135118548718e-06, 'epoch': 1.97} +2025-02-05 23:47:16 - ERROR - stderr - 66%|██████▌ | 14698/22434 [13:39:36<5:28:43, 2.55s/it] +2025-02-05 23:47:18 - ERROR - stderr - 66%|██████▌ | 14699/22434 [13:39:38<5:27:16, 2.54s/it] +2025-02-05 23:47:18 - ERROR - stderr - +2025-02-05 23:47:18 - ERROR - stderr - +2025-02-05 23:47:18 - INFO - stdout - {'loss': 0.6756, 'grad_norm': 
1.3450969457626343, 'learning_rate': 5.613837614614726e-06, 'epoch': 1.97} +2025-02-05 23:47:18 - ERROR - stderr - 66%|██████▌ | 14699/22434 [13:39:38<5:27:16, 2.54s/it] +2025-02-05 23:47:21 - ERROR - stderr - 66%|██████▌ | 14700/22434 [13:39:41<5:26:07, 2.53s/it] +2025-02-05 23:47:21 - ERROR - stderr - +2025-02-05 23:47:21 - ERROR - stderr - +2025-02-05 23:47:21 - INFO - stdout - {'loss': 0.6788, 'grad_norm': 1.2793998718261719, 'learning_rate': 5.612540202106089e-06, 'epoch': 1.97} +2025-02-05 23:47:21 - ERROR - stderr - 66%|██████▌ | 14700/22434 [13:39:41<5:26:07, 2.53s/it] +2025-02-05 23:47:23 - ERROR - stderr - 66%|██████▌ | 14701/22434 [13:39:43<5:23:55, 2.51s/it] +2025-02-05 23:47:23 - ERROR - stderr - +2025-02-05 23:47:23 - ERROR - stderr - +2025-02-05 23:47:23 - INFO - stdout - {'loss': 0.6447, 'grad_norm': 1.2261356115341187, 'learning_rate': 5.611242881049848e-06, 'epoch': 1.97} +2025-02-05 23:47:23 - ERROR - stderr - 66%|██████▌ | 14701/22434 [13:39:43<5:23:55, 2.51s/it] +2025-02-05 23:47:26 - ERROR - stderr - 66%|██████▌ | 14702/22434 [13:39:46<5:22:58, 2.51s/it] +2025-02-05 23:47:26 - ERROR - stderr - +2025-02-05 23:47:26 - ERROR - stderr - +2025-02-05 23:47:26 - INFO - stdout - {'loss': 0.6521, 'grad_norm': 1.2503198385238647, 'learning_rate': 5.6099456514730585e-06, 'epoch': 1.97} +2025-02-05 23:47:26 - ERROR - stderr - 66%|██████▌ | 14702/22434 [13:39:46<5:22:58, 2.51s/it] +2025-02-05 23:47:28 - ERROR - stderr - 66%|██████▌ | 14703/22434 [13:39:48<5:24:36, 2.52s/it] +2025-02-05 23:47:28 - ERROR - stderr - +2025-02-05 23:47:28 - ERROR - stderr - +2025-02-05 23:47:28 - INFO - stdout - {'loss': 0.6257, 'grad_norm': 1.2863826751708984, 'learning_rate': 5.608648513402741e-06, 'epoch': 1.97} +2025-02-05 23:47:28 - ERROR - stderr - 66%|██████▌ | 14703/22434 [13:39:48<5:24:36, 2.52s/it] +2025-02-05 23:47:31 - ERROR - stderr - 66%|██████▌ | 14704/22434 [13:39:51<5:27:51, 2.54s/it] +2025-02-05 23:47:31 - ERROR - stderr - +2025-02-05 23:47:31 - ERROR - stderr 
- +2025-02-05 23:47:31 - INFO - stdout - {'loss': 0.6021, 'grad_norm': 1.2316803932189941, 'learning_rate': 5.607351466865954e-06, 'epoch': 1.97} +2025-02-05 23:47:31 - ERROR - stderr - 66%|██████▌ | 14704/22434 [13:39:51<5:27:51, 2.54s/it] +2025-02-05 23:47:33 - ERROR - stderr - 66%|██████▌ | 14705/22434 [13:39:53<5:27:45, 2.54s/it] +2025-02-05 23:47:33 - ERROR - stderr - +2025-02-05 23:47:33 - ERROR - stderr - +2025-02-05 23:47:33 - INFO - stdout - {'loss': 0.6859, 'grad_norm': 1.310901403427124, 'learning_rate': 5.606054511889716e-06, 'epoch': 1.97} +2025-02-05 23:47:33 - ERROR - stderr - 66%|██████▌ | 14705/22434 [13:39:53<5:27:45, 2.54s/it] +2025-02-05 23:47:36 - ERROR - stderr - 66%|██████▌ | 14706/22434 [13:39:56<5:27:06, 2.54s/it] +2025-02-05 23:47:36 - ERROR - stderr - +2025-02-05 23:47:36 - ERROR - stderr - +2025-02-05 23:47:36 - INFO - stdout - {'loss': 0.6668, 'grad_norm': 1.3242027759552002, 'learning_rate': 5.604757648501069e-06, 'epoch': 1.97} +2025-02-05 23:47:36 - ERROR - stderr - 66%|██████▌ | 14706/22434 [13:39:56<5:27:06, 2.54s/it] +2025-02-05 23:47:39 - ERROR - stderr - 66%|██████▌ | 14707/22434 [13:39:58<5:26:49, 2.54s/it] +2025-02-05 23:47:39 - ERROR - stderr - +2025-02-05 23:47:39 - ERROR - stderr - +2025-02-05 23:47:39 - INFO - stdout - {'loss': 0.5948, 'grad_norm': 1.116053581237793, 'learning_rate': 5.603460876727043e-06, 'epoch': 1.97} +2025-02-05 23:47:39 - ERROR - stderr - 66%|██████▌ | 14707/22434 [13:39:58<5:26:49, 2.54s/it] +2025-02-05 23:47:41 - ERROR - stderr - 66%|██████▌ | 14708/22434 [13:40:01<5:28:43, 2.55s/it] +2025-02-05 23:47:41 - ERROR - stderr - +2025-02-05 23:47:41 - ERROR - stderr - +2025-02-05 23:47:41 - INFO - stdout - {'loss': 0.6268, 'grad_norm': 1.280141830444336, 'learning_rate': 5.602164196594666e-06, 'epoch': 1.97} +2025-02-05 23:47:41 - ERROR - stderr - 66%|██████▌ | 14708/22434 [13:40:01<5:28:43, 2.55s/it] +2025-02-05 23:47:44 - ERROR - stderr - 66%|██████▌ | 14709/22434 [13:40:03<5:28:07, 2.55s/it] 
+2025-02-05 23:47:44 - ERROR - stderr - +2025-02-05 23:47:44 - ERROR - stderr - +2025-02-05 23:47:44 - INFO - stdout - {'loss': 0.6673, 'grad_norm': 1.2817654609680176, 'learning_rate': 5.6008676081309685e-06, 'epoch': 1.97} +2025-02-05 23:47:44 - ERROR - stderr - 66%|██████▌ | 14709/22434 [13:40:03<5:28:07, 2.55s/it] +2025-02-05 23:47:46 - ERROR - stderr - 66%|██████▌ | 14710/22434 [13:40:06<5:26:34, 2.54s/it] +2025-02-05 23:47:46 - ERROR - stderr - +2025-02-05 23:47:46 - ERROR - stderr - +2025-02-05 23:47:46 - INFO - stdout - {'loss': 0.7094, 'grad_norm': 1.247922658920288, 'learning_rate': 5.599571111362978e-06, 'epoch': 1.97} +2025-02-05 23:47:46 - ERROR - stderr - 66%|██████▌ | 14710/22434 [13:40:06<5:26:34, 2.54s/it] +2025-02-05 23:47:49 - ERROR - stderr - 66%|██████▌ | 14711/22434 [13:40:08<5:24:39, 2.52s/it] +2025-02-05 23:47:49 - ERROR - stderr - +2025-02-05 23:47:49 - ERROR - stderr - +2025-02-05 23:47:49 - INFO - stdout - {'loss': 0.652, 'grad_norm': 1.2625484466552734, 'learning_rate': 5.598274706317716e-06, 'epoch': 1.97} +2025-02-05 23:47:49 - ERROR - stderr - 66%|██████▌ | 14711/22434 [13:40:08<5:24:39, 2.52s/it] +2025-02-05 23:47:51 - ERROR - stderr - 66%|██████▌ | 14712/22434 [13:40:11<5:28:59, 2.56s/it] +2025-02-05 23:47:51 - ERROR - stderr - +2025-02-05 23:47:51 - ERROR - stderr - +2025-02-05 23:47:51 - INFO - stdout - {'loss': 0.6861, 'grad_norm': 1.242927074432373, 'learning_rate': 5.596978393022206e-06, 'epoch': 1.97} +2025-02-05 23:47:51 - ERROR - stderr - 66%|██████▌ | 14712/22434 [13:40:11<5:28:59, 2.56s/it] +2025-02-05 23:47:54 - ERROR - stderr - 66%|██████▌ | 14713/22434 [13:40:14<5:26:58, 2.54s/it] +2025-02-05 23:47:54 - ERROR - stderr - +2025-02-05 23:47:54 - ERROR - stderr - +2025-02-05 23:47:54 - INFO - stdout - {'loss': 0.5897, 'grad_norm': 1.2137401103973389, 'learning_rate': 5.595682171503467e-06, 'epoch': 1.97} +2025-02-05 23:47:54 - ERROR - stderr - 66%|██████▌ | 14713/22434 [13:40:14<5:26:58, 2.54s/it] +2025-02-05 23:47:56 - 
ERROR - stderr - 66%|██████▌ | 14714/22434 [13:40:16<5:27:01, 2.54s/it] +2025-02-05 23:47:56 - ERROR - stderr - +2025-02-05 23:47:56 - ERROR - stderr - +2025-02-05 23:47:56 - INFO - stdout - {'loss': 0.6935, 'grad_norm': 1.3626763820648193, 'learning_rate': 5.59438604178852e-06, 'epoch': 1.97} +2025-02-05 23:47:56 - ERROR - stderr - 66%|██████▌ | 14714/22434 [13:40:16<5:27:01, 2.54s/it] +2025-02-05 23:47:59 - ERROR - stderr - 66%|██████▌ | 14715/22434 [13:40:19<5:25:47, 2.53s/it] +2025-02-05 23:47:59 - ERROR - stderr - +2025-02-05 23:47:59 - ERROR - stderr - +2025-02-05 23:47:59 - INFO - stdout - {'loss': 0.7677, 'grad_norm': 1.3026707172393799, 'learning_rate': 5.593090003904379e-06, 'epoch': 1.97} +2025-02-05 23:47:59 - ERROR - stderr - 66%|██████▌ | 14715/22434 [13:40:19<5:25:47, 2.53s/it] +2025-02-05 23:48:01 - ERROR - stderr - 66%|██████▌ | 14716/22434 [13:40:21<5:26:46, 2.54s/it] +2025-02-05 23:48:01 - ERROR - stderr - +2025-02-05 23:48:01 - ERROR - stderr - +2025-02-05 23:48:01 - INFO - stdout - {'loss': 0.6696, 'grad_norm': 1.3383845090866089, 'learning_rate': 5.5917940578780635e-06, 'epoch': 1.97} +2025-02-05 23:48:01 - ERROR - stderr - 66%|██████▌ | 14716/22434 [13:40:21<5:26:46, 2.54s/it] +2025-02-05 23:48:04 - ERROR - stderr - 66%|██████▌ | 14717/22434 [13:40:24<5:23:39, 2.52s/it] +2025-02-05 23:48:04 - ERROR - stderr - +2025-02-05 23:48:04 - ERROR - stderr - +2025-02-05 23:48:04 - INFO - stdout - {'loss': 0.6921, 'grad_norm': 1.3258661031723022, 'learning_rate': 5.590498203736576e-06, 'epoch': 1.97} +2025-02-05 23:48:04 - ERROR - stderr - 66%|██████▌ | 14717/22434 [13:40:24<5:23:39, 2.52s/it] +2025-02-05 23:48:06 - ERROR - stderr - 66%|██████▌ | 14718/22434 [13:40:26<5:24:50, 2.53s/it] +2025-02-05 23:48:06 - ERROR - stderr - +2025-02-05 23:48:06 - ERROR - stderr - +2025-02-05 23:48:06 - INFO - stdout - {'loss': 0.6463, 'grad_norm': 1.2359272241592407, 'learning_rate': 5.589202441506942e-06, 'epoch': 1.97} +2025-02-05 23:48:06 - ERROR - stderr - 
66%|██████▌ | 14718/22434 [13:40:26<5:24:50, 2.53s/it] +2025-02-05 23:48:09 - ERROR - stderr - 66%|██████▌ | 14719/22434 [13:40:29<5:23:02, 2.51s/it] +2025-02-05 23:48:09 - ERROR - stderr - +2025-02-05 23:48:09 - ERROR - stderr - +2025-02-05 23:48:09 - INFO - stdout - {'loss': 0.712, 'grad_norm': 1.4394315481185913, 'learning_rate': 5.587906771216154e-06, 'epoch': 1.97} +2025-02-05 23:48:09 - ERROR - stderr - 66%|██████▌ | 14719/22434 [13:40:29<5:23:02, 2.51s/it] +2025-02-05 23:48:11 - ERROR - stderr - 66%|██████▌ | 14720/22434 [13:40:31<5:25:59, 2.54s/it] +2025-02-05 23:48:12 - ERROR - stderr - +2025-02-05 23:48:12 - ERROR - stderr - +2025-02-05 23:48:12 - INFO - stdout - {'loss': 0.6514, 'grad_norm': 1.289727807044983, 'learning_rate': 5.586611192891231e-06, 'epoch': 1.97} +2025-02-05 23:48:12 - ERROR - stderr - 66%|██████▌ | 14720/22434 [13:40:31<5:25:59, 2.54s/it] +2025-02-05 23:48:14 - ERROR - stderr - 66%|██████▌ | 14721/22434 [13:40:34<5:22:54, 2.51s/it] +2025-02-05 23:48:14 - ERROR - stderr - +2025-02-05 23:48:14 - ERROR - stderr - +2025-02-05 23:48:14 - INFO - stdout - {'loss': 0.6619, 'grad_norm': 1.2624475955963135, 'learning_rate': 5.58531570655918e-06, 'epoch': 1.97} +2025-02-05 23:48:14 - ERROR - stderr - 66%|██████▌ | 14721/22434 [13:40:34<5:22:54, 2.51s/it] +2025-02-05 23:48:17 - ERROR - stderr - 66%|██████▌ | 14722/22434 [13:40:36<5:27:08, 2.55s/it] +2025-02-05 23:48:17 - ERROR - stderr - +2025-02-05 23:48:17 - ERROR - stderr - +2025-02-05 23:48:17 - INFO - stdout - {'loss': 0.716, 'grad_norm': 1.345966100692749, 'learning_rate': 5.584020312246991e-06, 'epoch': 1.97} +2025-02-05 23:48:17 - ERROR - stderr - 66%|██████▌ | 14722/22434 [13:40:36<5:27:08, 2.55s/it] +2025-02-05 23:48:19 - ERROR - stderr - 66%|██████▌ | 14723/22434 [13:40:39<5:24:25, 2.52s/it] +2025-02-05 23:48:19 - ERROR - stderr - +2025-02-05 23:48:19 - ERROR - stderr - +2025-02-05 23:48:19 - INFO - stdout - {'loss': 0.6311, 'grad_norm': 1.1887273788452148, 'learning_rate': 
5.5827250099816785e-06, 'epoch': 1.97} +2025-02-05 23:48:19 - ERROR - stderr - 66%|██████▌ | 14723/22434 [13:40:39<5:24:25, 2.52s/it] +2025-02-05 23:48:22 - ERROR - stderr - 66%|██████▌ | 14724/22434 [13:40:41<5:22:52, 2.51s/it] +2025-02-05 23:48:22 - ERROR - stderr - +2025-02-05 23:48:22 - ERROR - stderr - +2025-02-05 23:48:22 - INFO - stdout - {'loss': 0.6616, 'grad_norm': 1.2655175924301147, 'learning_rate': 5.581429799790234e-06, 'epoch': 1.97} +2025-02-05 23:48:22 - ERROR - stderr - 66%|██████▌ | 14724/22434 [13:40:41<5:22:52, 2.51s/it] +2025-02-05 23:48:24 - ERROR - stderr - 66%|██████▌ | 14725/22434 [13:40:44<5:20:13, 2.49s/it] +2025-02-05 23:48:24 - ERROR - stderr - +2025-02-05 23:48:24 - ERROR - stderr - +2025-02-05 23:48:24 - INFO - stdout - {'loss': 0.6543, 'grad_norm': 1.19561767578125, 'learning_rate': 5.580134681699657e-06, 'epoch': 1.97} +2025-02-05 23:48:24 - ERROR - stderr - 66%|██████▌ | 14725/22434 [13:40:44<5:20:13, 2.49s/it] +2025-02-05 23:48:26 - ERROR - stderr - 66%|██████▌ | 14726/22434 [13:40:46<5:20:22, 2.49s/it] +2025-02-05 23:48:26 - ERROR - stderr - +2025-02-05 23:48:26 - ERROR - stderr - +2025-02-05 23:48:26 - INFO - stdout - {'loss': 0.6546, 'grad_norm': 1.42328679561615, 'learning_rate': 5.578839655736943e-06, 'epoch': 1.97} +2025-02-05 23:48:26 - ERROR - stderr - 66%|██████▌ | 14726/22434 [13:40:46<5:20:22, 2.49s/it] +2025-02-05 23:48:29 - ERROR - stderr - 66%|██████▌ | 14727/22434 [13:40:49<5:18:11, 2.48s/it] +2025-02-05 23:48:29 - ERROR - stderr - +2025-02-05 23:48:29 - ERROR - stderr - +2025-02-05 23:48:29 - INFO - stdout - {'loss': 0.6298, 'grad_norm': 1.1691291332244873, 'learning_rate': 5.577544721929082e-06, 'epoch': 1.97} +2025-02-05 23:48:29 - ERROR - stderr - 66%|██████▌ | 14727/22434 [13:40:49<5:18:11, 2.48s/it] +2025-02-05 23:48:31 - ERROR - stderr - 66%|██████▌ | 14728/22434 [13:40:51<5:18:49, 2.48s/it] +2025-02-05 23:48:31 - ERROR - stderr - +2025-02-05 23:48:31 - ERROR - stderr - +2025-02-05 23:48:31 - INFO - stdout - 
{'loss': 0.616, 'grad_norm': 1.2448904514312744, 'learning_rate': 5.5762498803030775e-06, 'epoch': 1.97} +2025-02-05 23:48:31 - ERROR - stderr - 66%|██████▌ | 14728/22434 [13:40:51<5:18:49, 2.48s/it] +2025-02-05 23:48:34 - ERROR - stderr - 66%|██████▌ | 14729/22434 [13:40:54<5:17:23, 2.47s/it] +2025-02-05 23:48:34 - ERROR - stderr - +2025-02-05 23:48:34 - ERROR - stderr - +2025-02-05 23:48:34 - INFO - stdout - {'loss': 0.6898, 'grad_norm': 1.184157133102417, 'learning_rate': 5.574955130885906e-06, 'epoch': 1.97} +2025-02-05 23:48:34 - ERROR - stderr - 66%|██████▌ | 14729/22434 [13:40:54<5:17:23, 2.47s/it] +2025-02-05 23:48:37 - ERROR - stderr - 66%|██████▌ | 14730/22434 [13:40:57<5:35:36, 2.61s/it] +2025-02-05 23:48:37 - ERROR - stderr - +2025-02-05 23:48:37 - ERROR - stderr - +2025-02-05 23:48:37 - INFO - stdout - {'loss': 0.7199, 'grad_norm': 1.4148709774017334, 'learning_rate': 5.573660473704562e-06, 'epoch': 1.97} +2025-02-05 23:48:37 - ERROR - stderr - 66%|██████▌ | 14730/22434 [13:40:57<5:35:36, 2.61s/it] +2025-02-05 23:48:39 - ERROR - stderr - 66%|██████▌ | 14731/22434 [13:40:59<5:31:38, 2.58s/it] +2025-02-05 23:48:39 - ERROR - stderr - +2025-02-05 23:48:39 - ERROR - stderr - +2025-02-05 23:48:39 - INFO - stdout - {'loss': 0.6144, 'grad_norm': 1.2132543325424194, 'learning_rate': 5.572365908786029e-06, 'epoch': 1.97} +2025-02-05 23:48:39 - ERROR - stderr - 66%|██████▌ | 14731/22434 [13:40:59<5:31:38, 2.58s/it] +2025-02-05 23:48:42 - ERROR - stderr - 66%|██████▌ | 14732/22434 [13:41:02<5:27:31, 2.55s/it] +2025-02-05 23:48:42 - ERROR - stderr - +2025-02-05 23:48:42 - ERROR - stderr - +2025-02-05 23:48:42 - INFO - stdout - {'loss': 0.6831, 'grad_norm': 1.2301424741744995, 'learning_rate': 5.5710714361572915e-06, 'epoch': 1.97} +2025-02-05 23:48:42 - ERROR - stderr - 66%|██████▌ | 14732/22434 [13:41:02<5:27:31, 2.55s/it] +2025-02-05 23:48:44 - ERROR - stderr - 66%|██████▌ | 14733/22434 [13:41:04<5:23:39, 2.52s/it] +2025-02-05 23:48:44 - ERROR - stderr - 
+2025-02-05 23:48:44 - ERROR - stderr - +2025-02-05 23:48:44 - INFO - stdout - {'loss': 0.7309, 'grad_norm': 1.3183348178863525, 'learning_rate': 5.569777055845334e-06, 'epoch': 1.97} +2025-02-05 23:48:44 - ERROR - stderr - 66%|██████▌ | 14733/22434 [13:41:04<5:23:39, 2.52s/it] +2025-02-05 23:48:47 - ERROR - stderr - 66%|██████▌ | 14734/22434 [13:41:07<5:36:45, 2.62s/it] +2025-02-05 23:48:47 - ERROR - stderr - +2025-02-05 23:48:47 - ERROR - stderr - +2025-02-05 23:48:47 - INFO - stdout - {'loss': 0.6097, 'grad_norm': 1.2518537044525146, 'learning_rate': 5.568482767877132e-06, 'epoch': 1.97} +2025-02-05 23:48:47 - ERROR - stderr - 66%|██████▌ | 14734/22434 [13:41:07<5:36:45, 2.62s/it] +2025-02-05 23:48:50 - ERROR - stderr - 66%|██████▌ | 14735/22434 [13:41:09<5:36:03, 2.62s/it] +2025-02-05 23:48:50 - ERROR - stderr - +2025-02-05 23:48:50 - ERROR - stderr - +2025-02-05 23:48:50 - INFO - stdout - {'loss': 0.6439, 'grad_norm': 1.1447440385818481, 'learning_rate': 5.567188572279667e-06, 'epoch': 1.97} +2025-02-05 23:48:50 - ERROR - stderr - 66%|██████▌ | 14735/22434 [13:41:10<5:36:03, 2.62s/it] +2025-02-05 23:48:52 - ERROR - stderr - 66%|██████▌ | 14736/22434 [13:41:12<5:31:17, 2.58s/it] +2025-02-05 23:48:52 - ERROR - stderr - +2025-02-05 23:48:52 - ERROR - stderr - +2025-02-05 23:48:52 - INFO - stdout - {'loss': 0.7021, 'grad_norm': 1.379079818725586, 'learning_rate': 5.5658944690799155e-06, 'epoch': 1.97} +2025-02-05 23:48:52 - ERROR - stderr - 66%|██████▌ | 14736/22434 [13:41:12<5:31:17, 2.58s/it] +2025-02-05 23:48:55 - ERROR - stderr - 66%|██████▌ | 14737/22434 [13:41:14<5:28:48, 2.56s/it] +2025-02-05 23:48:55 - ERROR - stderr - +2025-02-05 23:48:55 - ERROR - stderr - +2025-02-05 23:48:55 - INFO - stdout - {'loss': 0.5934, 'grad_norm': 1.2161476612091064, 'learning_rate': 5.564600458304854e-06, 'epoch': 1.97} +2025-02-05 23:48:55 - ERROR - stderr - 66%|██████▌ | 14737/22434 [13:41:15<5:28:48, 2.56s/it] +2025-02-05 23:48:57 - ERROR - stderr - 66%|██████▌ | 
14738/22434 [13:41:17<5:25:39, 2.54s/it] +2025-02-05 23:48:57 - ERROR - stderr - +2025-02-05 23:48:57 - ERROR - stderr - +2025-02-05 23:48:57 - INFO - stdout - {'loss': 0.6021, 'grad_norm': 1.1162919998168945, 'learning_rate': 5.563306539981443e-06, 'epoch': 1.97} +2025-02-05 23:48:57 - ERROR - stderr - 66%|██████▌ | 14738/22434 [13:41:17<5:25:39, 2.54s/it] +2025-02-05 23:49:00 - ERROR - stderr - 66%|██████▌ | 14739/22434 [13:41:19<5:24:59, 2.53s/it] +2025-02-05 23:49:00 - ERROR - stderr - +2025-02-05 23:49:00 - ERROR - stderr - +2025-02-05 23:49:00 - INFO - stdout - {'loss': 0.6245, 'grad_norm': 1.2312591075897217, 'learning_rate': 5.562012714136667e-06, 'epoch': 1.97} +2025-02-05 23:49:00 - ERROR - stderr - 66%|██████▌ | 14739/22434 [13:41:20<5:24:59, 2.53s/it] +2025-02-05 23:49:02 - ERROR - stderr - 66%|██████▌ | 14740/22434 [13:41:22<5:23:21, 2.52s/it] +2025-02-05 23:49:02 - ERROR - stderr - +2025-02-05 23:49:02 - ERROR - stderr - +2025-02-05 23:49:02 - INFO - stdout - {'loss': 0.6366, 'grad_norm': 1.217163324356079, 'learning_rate': 5.560718980797492e-06, 'epoch': 1.97} +2025-02-05 23:49:02 - ERROR - stderr - 66%|██████▌ | 14740/22434 [13:41:22<5:23:21, 2.52s/it] +2025-02-05 23:49:05 - ERROR - stderr - 66%|██████▌ | 14741/22434 [13:41:25<5:25:47, 2.54s/it] +2025-02-05 23:49:05 - ERROR - stderr - +2025-02-05 23:49:05 - ERROR - stderr - +2025-02-05 23:49:05 - INFO - stdout - {'loss': 0.6743, 'grad_norm': 1.269691824913025, 'learning_rate': 5.559425339990876e-06, 'epoch': 1.97} +2025-02-05 23:49:05 - ERROR - stderr - 66%|██████▌ | 14741/22434 [13:41:25<5:25:47, 2.54s/it] +2025-02-05 23:49:07 - ERROR - stderr - 66%|██████▌ | 14742/22434 [13:41:27<5:23:19, 2.52s/it] +2025-02-05 23:49:07 - ERROR - stderr - +2025-02-05 23:49:07 - ERROR - stderr - +2025-02-05 23:49:07 - INFO - stdout - {'loss': 0.6685, 'grad_norm': 1.1983734369277954, 'learning_rate': 5.558131791743795e-06, 'epoch': 1.97} +2025-02-05 23:49:07 - ERROR - stderr - 66%|██████▌ | 14742/22434 
[13:41:27<5:23:19, 2.52s/it] +2025-02-05 23:49:10 - ERROR - stderr - 66%|██████▌ | 14743/22434 [13:41:29<5:21:13, 2.51s/it] +2025-02-05 23:49:10 - ERROR - stderr - +2025-02-05 23:49:10 - ERROR - stderr - +2025-02-05 23:49:10 - INFO - stdout - {'loss': 0.8242, 'grad_norm': 1.4491660594940186, 'learning_rate': 5.5568383360832e-06, 'epoch': 1.97} +2025-02-05 23:49:10 - ERROR - stderr - 66%|██████▌ | 14743/22434 [13:41:30<5:21:13, 2.51s/it] +2025-02-05 23:49:12 - ERROR - stderr - 66%|██████▌ | 14744/22434 [13:41:32<5:19:40, 2.49s/it] +2025-02-05 23:49:12 - ERROR - stderr - +2025-02-05 23:49:12 - ERROR - stderr - +2025-02-05 23:49:12 - INFO - stdout - {'loss': 0.6692, 'grad_norm': 1.262616515159607, 'learning_rate': 5.555544973036067e-06, 'epoch': 1.97} +2025-02-05 23:49:12 - ERROR - stderr - 66%|██████▌ | 14744/22434 [13:41:32<5:19:40, 2.49s/it] +2025-02-05 23:49:15 - ERROR - stderr - 66%|██████▌ | 14745/22434 [13:41:34<5:17:02, 2.47s/it] +2025-02-05 23:49:15 - ERROR - stderr - +2025-02-05 23:49:15 - ERROR - stderr - +2025-02-05 23:49:15 - INFO - stdout - {'loss': 0.6851, 'grad_norm': 1.3029879331588745, 'learning_rate': 5.554251702629341e-06, 'epoch': 1.97} +2025-02-05 23:49:15 - ERROR - stderr - 66%|██████▌ | 14745/22434 [13:41:34<5:17:02, 2.47s/it] +2025-02-05 23:49:17 - ERROR - stderr - 66%|██████▌ | 14746/22434 [13:41:37<5:17:45, 2.48s/it] +2025-02-05 23:49:17 - ERROR - stderr - +2025-02-05 23:49:17 - ERROR - stderr - +2025-02-05 23:49:17 - INFO - stdout - {'loss': 0.6593, 'grad_norm': 1.2365121841430664, 'learning_rate': 5.55295852488998e-06, 'epoch': 1.97} +2025-02-05 23:49:17 - ERROR - stderr - 66%|██████▌ | 14746/22434 [13:41:37<5:17:45, 2.48s/it] +2025-02-05 23:49:20 - ERROR - stderr - 66%|██████▌ | 14747/22434 [13:41:39<5:17:48, 2.48s/it] +2025-02-05 23:49:20 - ERROR - stderr - +2025-02-05 23:49:20 - ERROR - stderr - +2025-02-05 23:49:20 - INFO - stdout - {'loss': 0.7862, 'grad_norm': 1.3223450183868408, 'learning_rate': 5.551665439844951e-06, 'epoch': 1.97} 
+2025-02-05 23:49:20 - ERROR - stderr - 66%|██████▌ | 14747/22434 [13:41:39<5:17:48, 2.48s/it] +2025-02-05 23:49:22 - ERROR - stderr - 66%|██████▌ | 14748/22434 [13:41:42<5:20:14, 2.50s/it] +2025-02-05 23:49:22 - ERROR - stderr - +2025-02-05 23:49:22 - ERROR - stderr - +2025-02-05 23:49:22 - INFO - stdout - {'loss': 0.6303, 'grad_norm': 1.212733268737793, 'learning_rate': 5.550372447521195e-06, 'epoch': 1.97} +2025-02-05 23:49:22 - ERROR - stderr - 66%|██████▌ | 14748/22434 [13:41:42<5:20:14, 2.50s/it] +2025-02-05 23:49:25 - ERROR - stderr - 66%|██████▌ | 14749/22434 [13:41:44<5:22:41, 2.52s/it] +2025-02-05 23:49:25 - ERROR - stderr - +2025-02-05 23:49:25 - ERROR - stderr - +2025-02-05 23:49:25 - INFO - stdout - {'loss': 0.6094, 'grad_norm': 1.1284390687942505, 'learning_rate': 5.549079547945669e-06, 'epoch': 1.97} +2025-02-05 23:49:25 - ERROR - stderr - 66%|██████▌ | 14749/22434 [13:41:45<5:22:41, 2.52s/it] +2025-02-05 23:49:27 - ERROR - stderr - 66%|██████▌ | 14750/22434 [13:41:47<5:23:12, 2.52s/it] +2025-02-05 23:49:27 - ERROR - stderr - +2025-02-05 23:49:27 - ERROR - stderr - +2025-02-05 23:49:27 - INFO - stdout - {'loss': 0.6576, 'grad_norm': 1.2222892045974731, 'learning_rate': 5.54778674114532e-06, 'epoch': 1.97} +2025-02-05 23:49:27 - ERROR - stderr - 66%|██████▌ | 14750/22434 [13:41:47<5:23:12, 2.52s/it] +2025-02-05 23:49:30 - ERROR - stderr - 66%|██████▌ | 14751/22434 [13:41:49<5:21:09, 2.51s/it] +2025-02-05 23:49:30 - ERROR - stderr - +2025-02-05 23:49:30 - ERROR - stderr - +2025-02-05 23:49:30 - INFO - stdout - {'loss': 0.7018, 'grad_norm': 1.393254280090332, 'learning_rate': 5.5464940271470955e-06, 'epoch': 1.97} +2025-02-05 23:49:30 - ERROR - stderr - 66%|██████▌ | 14751/22434 [13:41:50<5:21:09, 2.51s/it] +2025-02-05 23:49:33 - ERROR - stderr - 66%|██████▌ | 14752/22434 [13:41:53<5:41:40, 2.67s/it] +2025-02-05 23:49:33 - ERROR - stderr - +2025-02-05 23:49:33 - ERROR - stderr - +2025-02-05 23:49:33 - INFO - stdout - {'loss': 0.6703, 'grad_norm': 
1.2778618335723877, 'learning_rate': 5.5452014059779425e-06, 'epoch': 1.97} +2025-02-05 23:49:33 - ERROR - stderr - 66%|██████▌ | 14752/22434 [13:41:53<5:41:40, 2.67s/it] +2025-02-05 23:49:35 - ERROR - stderr - 66%|██████▌ | 14753/22434 [13:41:55<5:35:12, 2.62s/it] +2025-02-05 23:49:35 - ERROR - stderr - +2025-02-05 23:49:35 - ERROR - stderr - +2025-02-05 23:49:35 - INFO - stdout - {'loss': 0.6344, 'grad_norm': 1.1624325513839722, 'learning_rate': 5.5439088776648034e-06, 'epoch': 1.97} +2025-02-05 23:49:35 - ERROR - stderr - 66%|██████▌ | 14753/22434 [13:41:55<5:35:12, 2.62s/it] +2025-02-05 23:49:38 - ERROR - stderr - 66%|██████▌ | 14754/22434 [13:41:58<5:33:44, 2.61s/it] +2025-02-05 23:49:38 - ERROR - stderr - +2025-02-05 23:49:38 - ERROR - stderr - +2025-02-05 23:49:38 - INFO - stdout - {'loss': 0.744, 'grad_norm': 1.3268336057662964, 'learning_rate': 5.542616442234618e-06, 'epoch': 1.97} +2025-02-05 23:49:38 - ERROR - stderr - 66%|██████▌ | 14754/22434 [13:41:58<5:33:44, 2.61s/it] +2025-02-05 23:49:40 - ERROR - stderr - 66%|██████▌ | 14755/22434 [13:42:00<5:33:45, 2.61s/it] +2025-02-05 23:49:41 - ERROR - stderr - +2025-02-05 23:49:41 - ERROR - stderr - +2025-02-05 23:49:41 - INFO - stdout - {'loss': 0.6251, 'grad_norm': 1.1663570404052734, 'learning_rate': 5.541324099714329e-06, 'epoch': 1.97} +2025-02-05 23:49:41 - ERROR - stderr - 66%|██████▌ | 14755/22434 [13:42:00<5:33:45, 2.61s/it] +2025-02-05 23:49:43 - ERROR - stderr - 66%|██████▌ | 14756/22434 [13:42:03<5:35:33, 2.62s/it] +2025-02-05 23:49:43 - ERROR - stderr - +2025-02-05 23:49:43 - ERROR - stderr - +2025-02-05 23:49:43 - INFO - stdout - {'loss': 0.6068, 'grad_norm': 1.077268362045288, 'learning_rate': 5.5400318501308755e-06, 'epoch': 1.97} +2025-02-05 23:49:43 - ERROR - stderr - 66%|██████▌ | 14756/22434 [13:42:03<5:35:33, 2.62s/it] +2025-02-05 23:49:46 - ERROR - stderr - 66%|██████▌ | 14757/22434 [13:42:06<5:40:55, 2.66s/it] +2025-02-05 23:49:46 - ERROR - stderr - +2025-02-05 23:49:46 - ERROR - stderr 
- +2025-02-05 23:49:46 - INFO - stdout - {'loss': 0.6578, 'grad_norm': 1.253084421157837, 'learning_rate': 5.5387396935111834e-06, 'epoch': 1.97} +2025-02-05 23:49:46 - ERROR - stderr - 66%|██████▌ | 14757/22434 [13:42:06<5:40:55, 2.66s/it] +2025-02-05 23:49:48 - ERROR - stderr - 66%|██████▌ | 14758/22434 [13:42:08<5:35:47, 2.62s/it] +2025-02-05 23:49:48 - ERROR - stderr - +2025-02-05 23:49:48 - ERROR - stderr - +2025-02-05 23:49:48 - INFO - stdout - {'loss': 0.8143, 'grad_norm': 1.3760347366333008, 'learning_rate': 5.537447629882198e-06, 'epoch': 1.97} +2025-02-05 23:49:48 - ERROR - stderr - 66%|██████▌ | 14758/22434 [13:42:08<5:35:47, 2.62s/it] +2025-02-05 23:49:51 - ERROR - stderr - 66%|██████▌ | 14759/22434 [13:42:11<5:29:09, 2.57s/it] +2025-02-05 23:49:51 - ERROR - stderr - +2025-02-05 23:49:51 - ERROR - stderr - +2025-02-05 23:49:51 - INFO - stdout - {'loss': 0.7175, 'grad_norm': 1.219498872756958, 'learning_rate': 5.536155659270846e-06, 'epoch': 1.97} +2025-02-05 23:49:51 - ERROR - stderr - 66%|██████▌ | 14759/22434 [13:42:11<5:29:09, 2.57s/it] +2025-02-05 23:49:53 - ERROR - stderr - 66%|██████▌ | 14760/22434 [13:42:13<5:25:29, 2.54s/it] +2025-02-05 23:49:53 - ERROR - stderr - +2025-02-05 23:49:53 - ERROR - stderr - +2025-02-05 23:49:53 - INFO - stdout - {'loss': 0.6802, 'grad_norm': 1.291443943977356, 'learning_rate': 5.534863781704059e-06, 'epoch': 1.97} +2025-02-05 23:49:53 - ERROR - stderr - 66%|██████▌ | 14760/22434 [13:42:13<5:25:29, 2.54s/it] +2025-02-05 23:49:56 - ERROR - stderr - 66%|██████▌ | 14761/22434 [13:42:16<5:25:06, 2.54s/it] +2025-02-05 23:49:56 - ERROR - stderr - +2025-02-05 23:49:56 - ERROR - stderr - +2025-02-05 23:49:56 - INFO - stdout - {'loss': 0.6179, 'grad_norm': 1.2158610820770264, 'learning_rate': 5.533571997208766e-06, 'epoch': 1.97} +2025-02-05 23:49:56 - ERROR - stderr - 66%|██████▌ | 14761/22434 [13:42:16<5:25:06, 2.54s/it] +2025-02-05 23:49:59 - ERROR - stderr - 66%|██████▌ | 14762/22434 [13:42:18<5:33:37, 2.61s/it] 
+2025-02-05 23:49:59 - ERROR - stderr - +2025-02-05 23:49:59 - ERROR - stderr - +2025-02-05 23:49:59 - INFO - stdout - {'loss': 0.6941, 'grad_norm': 1.2621403932571411, 'learning_rate': 5.532280305811883e-06, 'epoch': 1.97} +2025-02-05 23:49:59 - ERROR - stderr - 66%|██████▌ | 14762/22434 [13:42:18<5:33:37, 2.61s/it] +2025-02-05 23:50:01 - ERROR - stderr - 66%|██████▌ | 14763/22434 [13:42:21<5:29:02, 2.57s/it] +2025-02-05 23:50:01 - ERROR - stderr - +2025-02-05 23:50:01 - ERROR - stderr - +2025-02-05 23:50:01 - INFO - stdout - {'loss': 0.6943, 'grad_norm': 1.195396065711975, 'learning_rate': 5.53098870754035e-06, 'epoch': 1.97} +2025-02-05 23:50:01 - ERROR - stderr - 66%|██████▌ | 14763/22434 [13:42:21<5:29:02, 2.57s/it] +2025-02-05 23:50:04 - ERROR - stderr - 66%|██████▌ | 14764/22434 [13:42:23<5:27:14, 2.56s/it] +2025-02-05 23:50:04 - ERROR - stderr - +2025-02-05 23:50:04 - ERROR - stderr - +2025-02-05 23:50:04 - INFO - stdout - {'loss': 0.6528, 'grad_norm': 1.2404215335845947, 'learning_rate': 5.529697202421078e-06, 'epoch': 1.97} +2025-02-05 23:50:04 - ERROR - stderr - 66%|██████▌ | 14764/22434 [13:42:23<5:27:14, 2.56s/it] +2025-02-05 23:50:07 - ERROR - stderr - 66%|██████▌ | 14765/22434 [13:42:27<5:52:52, 2.76s/it] +2025-02-05 23:50:07 - ERROR - stderr - +2025-02-05 23:50:07 - ERROR - stderr - +2025-02-05 23:50:07 - INFO - stdout - {'loss': 0.7154, 'grad_norm': 1.2457860708236694, 'learning_rate': 5.5284057904809855e-06, 'epoch': 1.97} +2025-02-05 23:50:07 - ERROR - stderr - 66%|██████▌ | 14765/22434 [13:42:27<5:52:52, 2.76s/it] +2025-02-05 23:50:09 - ERROR - stderr - 66%|██████▌ | 14766/22434 [13:42:29<5:44:00, 2.69s/it] +2025-02-05 23:50:09 - ERROR - stderr - +2025-02-05 23:50:09 - ERROR - stderr - +2025-02-05 23:50:09 - INFO - stdout - {'loss': 0.6563, 'grad_norm': 1.2416030168533325, 'learning_rate': 5.527114471747004e-06, 'epoch': 1.97} +2025-02-05 23:50:09 - ERROR - stderr - 66%|██████▌ | 14766/22434 [13:42:29<5:44:00, 2.69s/it] +2025-02-05 23:50:12 - 
ERROR - stderr - 66%|██████▌ | 14767/22434 [13:42:32<5:36:56, 2.64s/it] +2025-02-05 23:50:12 - ERROR - stderr - +2025-02-05 23:50:12 - ERROR - stderr - +2025-02-05 23:50:12 - INFO - stdout - {'loss': 0.6922, 'grad_norm': 1.2817267179489136, 'learning_rate': 5.525823246246031e-06, 'epoch': 1.97} +2025-02-05 23:50:12 - ERROR - stderr - 66%|██████▌ | 14767/22434 [13:42:32<5:36:56, 2.64s/it] +2025-02-05 23:50:14 - ERROR - stderr - 66%|██████▌ | 14768/22434 [13:42:34<5:29:40, 2.58s/it] +2025-02-05 23:50:14 - ERROR - stderr - +2025-02-05 23:50:14 - ERROR - stderr - +2025-02-05 23:50:14 - INFO - stdout - {'loss': 0.6941, 'grad_norm': 1.340245246887207, 'learning_rate': 5.524532114005001e-06, 'epoch': 1.97} +2025-02-05 23:50:14 - ERROR - stderr - 66%|██████▌ | 14768/22434 [13:42:34<5:29:40, 2.58s/it] +2025-02-05 23:50:17 - ERROR - stderr - 66%|██████▌ | 14769/22434 [13:42:37<5:25:18, 2.55s/it] +2025-02-05 23:50:17 - ERROR - stderr - +2025-02-05 23:50:17 - ERROR - stderr - +2025-02-05 23:50:17 - INFO - stdout - {'loss': 0.6966, 'grad_norm': 1.265772819519043, 'learning_rate': 5.523241075050813e-06, 'epoch': 1.97} +2025-02-05 23:50:17 - ERROR - stderr - 66%|██████▌ | 14769/22434 [13:42:37<5:25:18, 2.55s/it] +2025-02-05 23:50:19 - ERROR - stderr - 66%|██████▌ | 14770/22434 [13:42:39<5:22:40, 2.53s/it] +2025-02-05 23:50:19 - ERROR - stderr - +2025-02-05 23:50:19 - ERROR - stderr - +2025-02-05 23:50:19 - INFO - stdout - {'loss': 0.7058, 'grad_norm': 1.3466570377349854, 'learning_rate': 5.52195012941038e-06, 'epoch': 1.98} +2025-02-05 23:50:19 - ERROR - stderr - 66%|██████▌ | 14770/22434 [13:42:39<5:22:40, 2.53s/it] +2025-02-05 23:50:22 - ERROR - stderr - 66%|██████▌ | 14771/22434 [13:42:42<5:29:40, 2.58s/it] +2025-02-05 23:50:22 - ERROR - stderr - +2025-02-05 23:50:22 - ERROR - stderr - +2025-02-05 23:50:22 - INFO - stdout - {'loss': 0.6437, 'grad_norm': 1.3354321718215942, 'learning_rate': 5.520659277110611e-06, 'epoch': 1.98} +2025-02-05 23:50:22 - ERROR - stderr - 
66%|██████▌ | 14771/22434 [13:42:42<5:29:40, 2.58s/it] +2025-02-05 23:50:25 - ERROR - stderr - 66%|██████▌ | 14772/22434 [13:42:44<5:28:54, 2.58s/it] +2025-02-05 23:50:25 - ERROR - stderr - +2025-02-05 23:50:25 - ERROR - stderr - +2025-02-05 23:50:25 - INFO - stdout - {'loss': 0.7199, 'grad_norm': 1.1858508586883545, 'learning_rate': 5.519368518178414e-06, 'epoch': 1.98} +2025-02-05 23:50:25 - ERROR - stderr - 66%|██████▌ | 14772/22434 [13:42:44<5:28:54, 2.58s/it] +2025-02-05 23:50:27 - ERROR - stderr - 66%|██████▌ | 14773/22434 [13:42:47<5:27:17, 2.56s/it] +2025-02-05 23:50:27 - ERROR - stderr - +2025-02-05 23:50:27 - ERROR - stderr - +2025-02-05 23:50:27 - INFO - stdout - {'loss': 0.6592, 'grad_norm': 1.2448784112930298, 'learning_rate': 5.5180778526406935e-06, 'epoch': 1.98} +2025-02-05 23:50:27 - ERROR - stderr - 66%|██████▌ | 14773/22434 [13:42:47<5:27:17, 2.56s/it] +2025-02-05 23:50:30 - ERROR - stderr - 66%|██████▌ | 14774/22434 [13:42:49<5:25:59, 2.55s/it] +2025-02-05 23:50:30 - ERROR - stderr - +2025-02-05 23:50:30 - ERROR - stderr - +2025-02-05 23:50:30 - INFO - stdout - {'loss': 0.6896, 'grad_norm': 1.2904632091522217, 'learning_rate': 5.5167872805243505e-06, 'epoch': 1.98} +2025-02-05 23:50:30 - ERROR - stderr - 66%|██████▌ | 14774/22434 [13:42:49<5:25:59, 2.55s/it] +2025-02-05 23:50:32 - ERROR - stderr - 66%|██████▌ | 14775/22434 [13:42:52<5:22:23, 2.53s/it] +2025-02-05 23:50:32 - ERROR - stderr - +2025-02-05 23:50:32 - ERROR - stderr - +2025-02-05 23:50:32 - INFO - stdout - {'loss': 0.5726, 'grad_norm': 1.241255521774292, 'learning_rate': 5.515496801856287e-06, 'epoch': 1.98} +2025-02-05 23:50:32 - ERROR - stderr - 66%|██████▌ | 14775/22434 [13:42:52<5:22:23, 2.53s/it] +2025-02-05 23:50:35 - ERROR - stderr - 66%|██████▌ | 14776/22434 [13:42:54<5:22:23, 2.53s/it] +2025-02-05 23:50:35 - ERROR - stderr - +2025-02-05 23:50:35 - ERROR - stderr - +2025-02-05 23:50:35 - INFO - stdout - {'loss': 0.6089, 'grad_norm': 1.1411305665969849, 'learning_rate': 
5.514206416663401e-06, 'epoch': 1.98} +2025-02-05 23:50:35 - ERROR - stderr - 66%|██████▌ | 14776/22434 [13:42:54<5:22:23, 2.53s/it] +2025-02-05 23:50:37 - ERROR - stderr - 66%|██████▌ | 14777/22434 [13:42:57<5:19:53, 2.51s/it] +2025-02-05 23:50:37 - ERROR - stderr - +2025-02-05 23:50:37 - ERROR - stderr - +2025-02-05 23:50:37 - INFO - stdout - {'loss': 0.6086, 'grad_norm': 1.1187050342559814, 'learning_rate': 5.512916124972589e-06, 'epoch': 1.98} +2025-02-05 23:50:37 - ERROR - stderr - 66%|██████▌ | 14777/22434 [13:42:57<5:19:53, 2.51s/it] +2025-02-05 23:50:40 - ERROR - stderr - 66%|██████▌ | 14778/22434 [13:42:59<5:21:32, 2.52s/it] +2025-02-05 23:50:40 - ERROR - stderr - +2025-02-05 23:50:40 - ERROR - stderr - +2025-02-05 23:50:40 - INFO - stdout - {'loss': 0.7315, 'grad_norm': 1.3878928422927856, 'learning_rate': 5.511625926810749e-06, 'epoch': 1.98} +2025-02-05 23:50:40 - ERROR - stderr - 66%|██████▌ | 14778/22434 [13:42:59<5:21:32, 2.52s/it] +2025-02-05 23:50:42 - ERROR - stderr - 66%|██████▌ | 14779/22434 [13:43:02<5:18:27, 2.50s/it] +2025-02-05 23:50:42 - ERROR - stderr - +2025-02-05 23:50:42 - ERROR - stderr - +2025-02-05 23:50:42 - INFO - stdout - {'loss': 0.654, 'grad_norm': 1.0977685451507568, 'learning_rate': 5.510335822204771e-06, 'epoch': 1.98} +2025-02-05 23:50:42 - ERROR - stderr - 66%|██████▌ | 14779/22434 [13:43:02<5:18:27, 2.50s/it] +2025-02-05 23:50:45 - ERROR - stderr - 66%|██████▌ | 14780/22434 [13:43:04<5:22:23, 2.53s/it] +2025-02-05 23:50:45 - ERROR - stderr - +2025-02-05 23:50:45 - ERROR - stderr - +2025-02-05 23:50:45 - INFO - stdout - {'loss': 0.5597, 'grad_norm': 1.2233381271362305, 'learning_rate': 5.509045811181549e-06, 'epoch': 1.98} +2025-02-05 23:50:45 - ERROR - stderr - 66%|██████▌ | 14780/22434 [13:43:05<5:22:23, 2.53s/it] +2025-02-05 23:50:47 - ERROR - stderr - 66%|██████▌ | 14781/22434 [13:43:07<5:21:37, 2.52s/it] +2025-02-05 23:50:47 - ERROR - stderr - +2025-02-05 23:50:47 - ERROR - stderr - +2025-02-05 23:50:47 - INFO - stdout 
- {'loss': 0.6529, 'grad_norm': 1.3761019706726074, 'learning_rate': 5.507755893767963e-06, 'epoch': 1.98} +2025-02-05 23:50:47 - ERROR - stderr - 66%|██████▌ | 14781/22434 [13:43:07<5:21:37, 2.52s/it] +2025-02-05 23:50:50 - ERROR - stderr - 66%|██████▌ | 14782/22434 [13:43:09<5:21:46, 2.52s/it] +2025-02-05 23:50:50 - ERROR - stderr - +2025-02-05 23:50:50 - ERROR - stderr - +2025-02-05 23:50:50 - INFO - stdout - {'loss': 0.6308, 'grad_norm': 1.2496211528778076, 'learning_rate': 5.506466069990914e-06, 'epoch': 1.98} +2025-02-05 23:50:50 - ERROR - stderr - 66%|██████▌ | 14782/22434 [13:43:10<5:21:46, 2.52s/it] +2025-02-05 23:50:52 - ERROR - stderr - 66%|██████▌ | 14783/22434 [13:43:12<5:21:09, 2.52s/it] +2025-02-05 23:50:52 - ERROR - stderr - +2025-02-05 23:50:52 - ERROR - stderr - +2025-02-05 23:50:52 - INFO - stdout - {'loss': 0.6501, 'grad_norm': 1.2499384880065918, 'learning_rate': 5.505176339877273e-06, 'epoch': 1.98} +2025-02-05 23:50:52 - ERROR - stderr - 66%|██████▌ | 14783/22434 [13:43:12<5:21:09, 2.52s/it] +2025-02-05 23:50:55 - ERROR - stderr - 66%|██████▌ | 14784/22434 [13:43:15<5:20:19, 2.51s/it] +2025-02-05 23:50:55 - ERROR - stderr - +2025-02-05 23:50:55 - ERROR - stderr - +2025-02-05 23:50:55 - INFO - stdout - {'loss': 0.6659, 'grad_norm': 1.2473065853118896, 'learning_rate': 5.503886703453933e-06, 'epoch': 1.98} +2025-02-05 23:50:55 - ERROR - stderr - 66%|██████▌ | 14784/22434 [13:43:15<5:20:19, 2.51s/it] +2025-02-05 23:50:57 - ERROR - stderr - 66%|██████▌ | 14785/22434 [13:43:17<5:18:57, 2.50s/it] +2025-02-05 23:50:57 - ERROR - stderr - +2025-02-05 23:50:57 - ERROR - stderr - +2025-02-05 23:50:57 - INFO - stdout - {'loss': 0.6842, 'grad_norm': 1.2127883434295654, 'learning_rate': 5.502597160747778e-06, 'epoch': 1.98} +2025-02-05 23:50:57 - ERROR - stderr - 66%|██████▌ | 14785/22434 [13:43:17<5:18:57, 2.50s/it] +2025-02-05 23:51:00 - ERROR - stderr - 66%|██████▌ | 14786/22434 [13:43:19<5:18:09, 2.50s/it] +2025-02-05 23:51:00 - ERROR - stderr - 
+2025-02-05 23:51:00 - ERROR - stderr - +2025-02-05 23:51:00 - INFO - stdout - {'loss': 0.6791, 'grad_norm': 1.3580031394958496, 'learning_rate': 5.501307711785672e-06, 'epoch': 1.98} +2025-02-05 23:51:00 - ERROR - stderr - 66%|██████▌ | 14786/22434 [13:43:20<5:18:09, 2.50s/it] +2025-02-05 23:51:02 - ERROR - stderr - 66%|██████▌ | 14787/22434 [13:43:22<5:18:48, 2.50s/it] +2025-02-05 23:51:02 - ERROR - stderr - +2025-02-05 23:51:02 - ERROR - stderr - +2025-02-05 23:51:02 - INFO - stdout - {'loss': 0.7295, 'grad_norm': 1.3264230489730835, 'learning_rate': 5.5000183565945095e-06, 'epoch': 1.98} +2025-02-05 23:51:02 - ERROR - stderr - 66%|██████▌ | 14787/22434 [13:43:22<5:18:48, 2.50s/it] +2025-02-05 23:51:05 - ERROR - stderr - 66%|██████▌ | 14788/22434 [13:43:24<5:19:10, 2.50s/it] +2025-02-05 23:51:05 - ERROR - stderr - +2025-02-05 23:51:05 - ERROR - stderr - +2025-02-05 23:51:05 - INFO - stdout - {'loss': 0.5818, 'grad_norm': 1.2811975479125977, 'learning_rate': 5.4987290952011514e-06, 'epoch': 1.98} +2025-02-05 23:51:05 - ERROR - stderr - 66%|██████▌ | 14788/22434 [13:43:25<5:19:10, 2.50s/it] +2025-02-05 23:51:07 - ERROR - stderr - 66%|██████▌ | 14789/22434 [13:43:27<5:20:00, 2.51s/it] +2025-02-05 23:51:07 - ERROR - stderr - +2025-02-05 23:51:07 - ERROR - stderr - +2025-02-05 23:51:07 - INFO - stdout - {'loss': 0.6836, 'grad_norm': 1.3609604835510254, 'learning_rate': 5.497439927632486e-06, 'epoch': 1.98} +2025-02-05 23:51:07 - ERROR - stderr - 66%|██████▌ | 14789/22434 [13:43:27<5:20:00, 2.51s/it] +2025-02-05 23:51:10 - ERROR - stderr - 66%|██████▌ | 14790/22434 [13:43:29<5:17:43, 2.49s/it] +2025-02-05 23:51:10 - ERROR - stderr - +2025-02-05 23:51:10 - ERROR - stderr - +2025-02-05 23:51:10 - INFO - stdout - {'loss': 0.7534, 'grad_norm': 1.3904787302017212, 'learning_rate': 5.4961508539153744e-06, 'epoch': 1.98} +2025-02-05 23:51:10 - ERROR - stderr - 66%|██████▌ | 14790/22434 [13:43:30<5:17:43, 2.49s/it] +2025-02-05 23:51:12 - ERROR - stderr - 66%|██████▌ | 
14791/22434 [13:43:32<5:18:01, 2.50s/it] +2025-02-05 23:51:12 - ERROR - stderr - +2025-02-05 23:51:12 - ERROR - stderr - +2025-02-05 23:51:12 - INFO - stdout - {'loss': 0.6065, 'grad_norm': 1.1995903253555298, 'learning_rate': 5.494861874076682e-06, 'epoch': 1.98} +2025-02-05 23:51:12 - ERROR - stderr - 66%|██████▌ | 14791/22434 [13:43:32<5:18:01, 2.50s/it] +2025-02-05 23:51:15 - ERROR - stderr - 66%|██████▌ | 14792/22434 [13:43:34<5:17:37, 2.49s/it] +2025-02-05 23:51:15 - ERROR - stderr - +2025-02-05 23:51:15 - ERROR - stderr - +2025-02-05 23:51:15 - INFO - stdout - {'loss': 0.6627, 'grad_norm': 1.2256760597229004, 'learning_rate': 5.493572988143292e-06, 'epoch': 1.98} +2025-02-05 23:51:15 - ERROR - stderr - 66%|██████▌ | 14792/22434 [13:43:35<5:17:37, 2.49s/it] +2025-02-05 23:51:17 - ERROR - stderr - 66%|██████▌ | 14793/22434 [13:43:37<5:16:39, 2.49s/it] +2025-02-05 23:51:17 - ERROR - stderr - +2025-02-05 23:51:17 - ERROR - stderr - +2025-02-05 23:51:17 - INFO - stdout - {'loss': 0.655, 'grad_norm': 1.3130468130111694, 'learning_rate': 5.492284196142057e-06, 'epoch': 1.98} +2025-02-05 23:51:17 - ERROR - stderr - 66%|██████▌ | 14793/22434 [13:43:37<5:16:39, 2.49s/it] +2025-02-05 23:51:20 - ERROR - stderr - 66%|██████▌ | 14794/22434 [13:43:39<5:16:19, 2.48s/it] +2025-02-05 23:51:20 - ERROR - stderr - +2025-02-05 23:51:20 - ERROR - stderr - +2025-02-05 23:51:20 - INFO - stdout - {'loss': 0.6497, 'grad_norm': 1.383592128753662, 'learning_rate': 5.490995498099844e-06, 'epoch': 1.98} +2025-02-05 23:51:20 - ERROR - stderr - 66%|██████▌ | 14794/22434 [13:43:39<5:16:19, 2.48s/it] +2025-02-05 23:51:22 - ERROR - stderr - 66%|██████▌ | 14795/22434 [13:43:42<5:15:04, 2.47s/it] +2025-02-05 23:51:22 - ERROR - stderr - +2025-02-05 23:51:22 - ERROR - stderr - +2025-02-05 23:51:22 - INFO - stdout - {'loss': 0.7338, 'grad_norm': 1.2162625789642334, 'learning_rate': 5.489706894043516e-06, 'epoch': 1.98} +2025-02-05 23:51:22 - ERROR - stderr - 66%|██████▌ | 14795/22434 
[13:43:42<5:15:04, 2.47s/it] +2025-02-05 23:51:25 - ERROR - stderr - 66%|██████▌ | 14796/22434 [13:43:44<5:15:44, 2.48s/it] +2025-02-05 23:51:25 - ERROR - stderr - +2025-02-05 23:51:25 - ERROR - stderr - +2025-02-05 23:51:25 - INFO - stdout - {'loss': 0.7157, 'grad_norm': 1.4042145013809204, 'learning_rate': 5.48841838399993e-06, 'epoch': 1.98} +2025-02-05 23:51:25 - ERROR - stderr - 66%|██████▌ | 14796/22434 [13:43:44<5:15:44, 2.48s/it] +2025-02-05 23:51:27 - ERROR - stderr - 66%|██████▌ | 14797/22434 [13:43:47<5:16:55, 2.49s/it] +2025-02-05 23:51:27 - ERROR - stderr - +2025-02-05 23:51:27 - ERROR - stderr - +2025-02-05 23:51:27 - INFO - stdout - {'loss': 0.6003, 'grad_norm': 1.1914047002792358, 'learning_rate': 5.487129967995948e-06, 'epoch': 1.98} +2025-02-05 23:51:27 - ERROR - stderr - 66%|██████▌ | 14797/22434 [13:43:47<5:16:55, 2.49s/it] +2025-02-05 23:51:30 - ERROR - stderr - 66%|██████▌ | 14798/22434 [13:43:49<5:17:17, 2.49s/it] +2025-02-05 23:51:30 - ERROR - stderr - +2025-02-05 23:51:30 - ERROR - stderr - +2025-02-05 23:51:30 - INFO - stdout - {'loss': 0.6363, 'grad_norm': 1.2774121761322021, 'learning_rate': 5.485841646058423e-06, 'epoch': 1.98} +2025-02-05 23:51:30 - ERROR - stderr - 66%|██████▌ | 14798/22434 [13:43:49<5:17:17, 2.49s/it] +2025-02-05 23:51:32 - ERROR - stderr - 66%|██████▌ | 14799/22434 [13:43:52<5:17:22, 2.49s/it] +2025-02-05 23:51:32 - ERROR - stderr - +2025-02-05 23:51:32 - ERROR - stderr - +2025-02-05 23:51:32 - INFO - stdout - {'loss': 0.7308, 'grad_norm': 1.3322339057922363, 'learning_rate': 5.484553418214208e-06, 'epoch': 1.98} +2025-02-05 23:51:32 - ERROR - stderr - 66%|██████▌ | 14799/22434 [13:43:52<5:17:22, 2.49s/it] +2025-02-05 23:51:35 - ERROR - stderr - 66%|██████▌ | 14800/22434 [13:43:55<5:29:05, 2.59s/it] +2025-02-05 23:51:35 - ERROR - stderr - +2025-02-05 23:51:35 - ERROR - stderr - +2025-02-05 23:51:35 - INFO - stdout - {'loss': 0.6417, 'grad_norm': 1.2799160480499268, 'learning_rate': 5.483265284490157e-06, 'epoch': 
1.98} +2025-02-05 23:51:35 - ERROR - stderr - 66%|██████▌ | 14800/22434 [13:43:55<5:29:05, 2.59s/it] +2025-02-05 23:51:37 - ERROR - stderr - 66%|██████▌ | 14801/22434 [13:43:57<5:26:34, 2.57s/it] +2025-02-05 23:51:37 - ERROR - stderr - +2025-02-05 23:51:37 - ERROR - stderr - +2025-02-05 23:51:37 - INFO - stdout - {'loss': 0.6334, 'grad_norm': 1.2640044689178467, 'learning_rate': 5.481977244913124e-06, 'epoch': 1.98} +2025-02-05 23:51:37 - ERROR - stderr - 66%|██████▌ | 14801/22434 [13:43:57<5:26:34, 2.57s/it] +2025-02-05 23:51:40 - ERROR - stderr - 66%|██████▌ | 14802/22434 [13:44:00<5:22:27, 2.54s/it] +2025-02-05 23:51:40 - ERROR - stderr - +2025-02-05 23:51:40 - ERROR - stderr - +2025-02-05 23:51:40 - INFO - stdout - {'loss': 0.6986, 'grad_norm': 1.1758586168289185, 'learning_rate': 5.480689299509943e-06, 'epoch': 1.98} +2025-02-05 23:51:40 - ERROR - stderr - 66%|██████▌ | 14802/22434 [13:44:00<5:22:27, 2.54s/it] +2025-02-05 23:51:42 - ERROR - stderr - 66%|██████▌ | 14803/22434 [13:44:02<5:19:09, 2.51s/it] +2025-02-05 23:51:42 - ERROR - stderr - +2025-02-05 23:51:42 - ERROR - stderr - +2025-02-05 23:51:42 - INFO - stdout - {'loss': 0.6203, 'grad_norm': 1.2020176649093628, 'learning_rate': 5.479401448307473e-06, 'epoch': 1.98} +2025-02-05 23:51:42 - ERROR - stderr - 66%|██████▌ | 14803/22434 [13:44:02<5:19:09, 2.51s/it] +2025-02-05 23:51:45 - ERROR - stderr - 66%|██████▌ | 14804/22434 [13:44:05<5:18:28, 2.50s/it] +2025-02-05 23:51:45 - ERROR - stderr - +2025-02-05 23:51:45 - ERROR - stderr - +2025-02-05 23:51:45 - INFO - stdout - {'loss': 0.6073, 'grad_norm': 1.2386404275894165, 'learning_rate': 5.4781136913325535e-06, 'epoch': 1.98} +2025-02-05 23:51:45 - ERROR - stderr - 66%|██████▌ | 14804/22434 [13:44:05<5:18:28, 2.50s/it] +2025-02-05 23:51:47 - ERROR - stderr - 66%|██████▌ | 14805/22434 [13:44:07<5:20:48, 2.52s/it] +2025-02-05 23:51:47 - ERROR - stderr - +2025-02-05 23:51:47 - ERROR - stderr - +2025-02-05 23:51:47 - INFO - stdout - {'loss': 0.645, 
'grad_norm': 1.4168819189071655, 'learning_rate': 5.476826028612028e-06, 'epoch': 1.98} +2025-02-05 23:51:47 - ERROR - stderr - 66%|██████▌ | 14805/22434 [13:44:07<5:20:48, 2.52s/it] +2025-02-05 23:51:50 - ERROR - stderr - 66%|██████▌ | 14806/22434 [13:44:10<5:21:17, 2.53s/it] +2025-02-05 23:51:50 - ERROR - stderr - +2025-02-05 23:51:50 - ERROR - stderr - +2025-02-05 23:51:50 - INFO - stdout - {'loss': 0.6272, 'grad_norm': 1.450361967086792, 'learning_rate': 5.47553846017274e-06, 'epoch': 1.98} +2025-02-05 23:51:50 - ERROR - stderr - 66%|██████▌ | 14806/22434 [13:44:10<5:21:17, 2.53s/it] +2025-02-05 23:51:52 - ERROR - stderr - 66%|██████▌ | 14807/22434 [13:44:12<5:18:04, 2.50s/it] +2025-02-05 23:51:52 - ERROR - stderr - +2025-02-05 23:51:52 - ERROR - stderr - +2025-02-05 23:51:52 - INFO - stdout - {'loss': 0.6878, 'grad_norm': 1.3986823558807373, 'learning_rate': 5.474250986041514e-06, 'epoch': 1.98} +2025-02-05 23:51:52 - ERROR - stderr - 66%|██████▌ | 14807/22434 [13:44:12<5:18:04, 2.50s/it] +2025-02-05 23:51:55 - ERROR - stderr - 66%|██████▌ | 14808/22434 [13:44:15<5:19:35, 2.51s/it] +2025-02-05 23:51:55 - ERROR - stderr - +2025-02-05 23:51:55 - ERROR - stderr - +2025-02-05 23:51:55 - INFO - stdout - {'loss': 0.6541, 'grad_norm': 1.2959864139556885, 'learning_rate': 5.472963606245205e-06, 'epoch': 1.98} +2025-02-05 23:51:55 - ERROR - stderr - 66%|██████▌ | 14808/22434 [13:44:15<5:19:35, 2.51s/it] +2025-02-05 23:51:57 - ERROR - stderr - 66%|██████▌ | 14809/22434 [13:44:17<5:18:34, 2.51s/it] +2025-02-05 23:51:57 - ERROR - stderr - +2025-02-05 23:51:57 - ERROR - stderr - +2025-02-05 23:51:57 - INFO - stdout - {'loss': 0.7539, 'grad_norm': 1.452620506286621, 'learning_rate': 5.471676320810633e-06, 'epoch': 1.98} +2025-02-05 23:51:57 - ERROR - stderr - 66%|██████▌ | 14809/22434 [13:44:17<5:18:34, 2.51s/it] +2025-02-05 23:52:00 - ERROR - stderr - 66%|██████▌ | 14810/22434 [13:44:20<5:16:05, 2.49s/it] +2025-02-05 23:52:00 - ERROR - stderr - +2025-02-05 23:52:00 - ERROR 
- stderr - +2025-02-05 23:52:00 - INFO - stdout - {'loss': 0.6901, 'grad_norm': 1.189140796661377, 'learning_rate': 5.47038912976463e-06, 'epoch': 1.98} +2025-02-05 23:52:00 - ERROR - stderr - 66%|██████▌ | 14810/22434 [13:44:20<5:16:05, 2.49s/it] +2025-02-05 23:52:02 - ERROR - stderr - 66%|██████▌ | 14811/22434 [13:44:22<5:16:52, 2.49s/it] +2025-02-05 23:52:02 - ERROR - stderr - +2025-02-05 23:52:02 - ERROR - stderr - +2025-02-05 23:52:02 - INFO - stdout - {'loss': 0.7385, 'grad_norm': 1.3543273210525513, 'learning_rate': 5.469102033134042e-06, 'epoch': 1.98} +2025-02-05 23:52:02 - ERROR - stderr - 66%|██████▌ | 14811/22434 [13:44:22<5:16:52, 2.49s/it] +2025-02-05 23:52:05 - ERROR - stderr - 66%|██████▌ | 14812/22434 [13:44:25<5:15:02, 2.48s/it] +2025-02-05 23:52:05 - ERROR - stderr - +2025-02-05 23:52:05 - ERROR - stderr - +2025-02-05 23:52:05 - INFO - stdout - {'loss': 0.6865, 'grad_norm': 1.3062105178833008, 'learning_rate': 5.467815030945676e-06, 'epoch': 1.98} +2025-02-05 23:52:05 - ERROR - stderr - 66%|██████▌ | 14812/22434 [13:44:25<5:15:02, 2.48s/it] +2025-02-05 23:52:07 - ERROR - stderr - 66%|██████▌ | 14813/22434 [13:44:27<5:15:21, 2.48s/it] +2025-02-05 23:52:07 - ERROR - stderr - +2025-02-05 23:52:07 - ERROR - stderr - +2025-02-05 23:52:07 - INFO - stdout - {'loss': 0.5618, 'grad_norm': 1.210077166557312, 'learning_rate': 5.466528123226378e-06, 'epoch': 1.98} +2025-02-05 23:52:07 - ERROR - stderr - 66%|██████▌ | 14813/22434 [13:44:27<5:15:21, 2.48s/it] +2025-02-05 23:52:10 - ERROR - stderr - 66%|██████▌ | 14814/22434 [13:44:30<5:13:56, 2.47s/it] +2025-02-05 23:52:10 - ERROR - stderr - +2025-02-05 23:52:10 - ERROR - stderr - +2025-02-05 23:52:10 - INFO - stdout - {'loss': 0.6422, 'grad_norm': 1.2392358779907227, 'learning_rate': 5.465241310002959e-06, 'epoch': 1.98} +2025-02-05 23:52:10 - ERROR - stderr - 66%|██████▌ | 14814/22434 [13:44:30<5:13:56, 2.47s/it] +2025-02-05 23:52:12 - ERROR - stderr - 66%|██████▌ | 14815/22434 [13:44:32<5:13:16, 2.47s/it] 
+2025-02-05 23:52:12 - ERROR - stderr - +2025-02-05 23:52:12 - ERROR - stderr - +2025-02-05 23:52:12 - INFO - stdout - {'loss': 0.6629, 'grad_norm': 1.2217706441879272, 'learning_rate': 5.463954591302245e-06, 'epoch': 1.98} +2025-02-05 23:52:12 - ERROR - stderr - 66%|██████▌ | 14815/22434 [13:44:32<5:13:16, 2.47s/it] +2025-02-05 23:52:15 - ERROR - stderr - 66%|██████▌ | 14816/22434 [13:44:34<5:12:21, 2.46s/it] +2025-02-05 23:52:15 - ERROR - stderr - +2025-02-05 23:52:15 - ERROR - stderr - +2025-02-05 23:52:15 - INFO - stdout - {'loss': 0.647, 'grad_norm': 1.1721479892730713, 'learning_rate': 5.462667967151059e-06, 'epoch': 1.98} +2025-02-05 23:52:15 - ERROR - stderr - 66%|██████▌ | 14816/22434 [13:44:34<5:12:21, 2.46s/it] +2025-02-05 23:52:17 - ERROR - stderr - 66%|██████▌ | 14817/22434 [13:44:37<5:12:00, 2.46s/it] +2025-02-05 23:52:17 - ERROR - stderr - +2025-02-05 23:52:17 - ERROR - stderr - +2025-02-05 23:52:17 - INFO - stdout - {'loss': 0.608, 'grad_norm': 1.3137712478637695, 'learning_rate': 5.461381437576216e-06, 'epoch': 1.98} +2025-02-05 23:52:17 - ERROR - stderr - 66%|██████▌ | 14817/22434 [13:44:37<5:12:00, 2.46s/it] +2025-02-05 23:52:20 - ERROR - stderr - 66%|██████▌ | 14818/22434 [13:44:39<5:13:44, 2.47s/it] +2025-02-05 23:52:20 - ERROR - stderr - +2025-02-05 23:52:20 - ERROR - stderr - +2025-02-05 23:52:20 - INFO - stdout - {'loss': 0.6296, 'grad_norm': 1.1816153526306152, 'learning_rate': 5.460095002604533e-06, 'epoch': 1.98} +2025-02-05 23:52:20 - ERROR - stderr - 66%|██████▌ | 14818/22434 [13:44:39<5:13:44, 2.47s/it] +2025-02-05 23:52:22 - ERROR - stderr - 66%|██████▌ | 14819/22434 [13:44:42<5:12:11, 2.46s/it] +2025-02-05 23:52:22 - ERROR - stderr - +2025-02-05 23:52:22 - ERROR - stderr - +2025-02-05 23:52:22 - INFO - stdout - {'loss': 0.7559, 'grad_norm': 1.509032130241394, 'learning_rate': 5.458808662262826e-06, 'epoch': 1.98} +2025-02-05 23:52:22 - ERROR - stderr - 66%|██████▌ | 14819/22434 [13:44:42<5:12:11, 2.46s/it] +2025-02-05 23:52:25 - 
ERROR - stderr - 66%|██████▌ | 14820/22434 [13:44:44<5:13:34, 2.47s/it] +2025-02-05 23:52:25 - ERROR - stderr - +2025-02-05 23:52:25 - ERROR - stderr - +2025-02-05 23:52:25 - INFO - stdout - {'loss': 0.7419, 'grad_norm': 1.3604809045791626, 'learning_rate': 5.4575224165779075e-06, 'epoch': 1.98} +2025-02-05 23:52:25 - ERROR - stderr - 66%|██████▌ | 14820/22434 [13:44:44<5:13:34, 2.47s/it] +2025-02-05 23:52:27 - ERROR - stderr - 66%|██████▌ | 14821/22434 [13:44:47<5:12:26, 2.46s/it] +2025-02-05 23:52:27 - ERROR - stderr - +2025-02-05 23:52:27 - ERROR - stderr - +2025-02-05 23:52:27 - INFO - stdout - {'loss': 0.659, 'grad_norm': 1.3283542394638062, 'learning_rate': 5.456236265576589e-06, 'epoch': 1.98} +2025-02-05 23:52:27 - ERROR - stderr - 66%|██████▌ | 14821/22434 [13:44:47<5:12:26, 2.46s/it] +2025-02-05 23:52:29 - ERROR - stderr - 66%|██████▌ | 14822/22434 [13:44:49<5:14:44, 2.48s/it] +2025-02-05 23:52:30 - ERROR - stderr - +2025-02-05 23:52:30 - ERROR - stderr - +2025-02-05 23:52:30 - INFO - stdout - {'loss': 0.6865, 'grad_norm': 1.3595868349075317, 'learning_rate': 5.454950209285676e-06, 'epoch': 1.98} +2025-02-05 23:52:30 - ERROR - stderr - 66%|██████▌ | 14822/22434 [13:44:49<5:14:44, 2.48s/it] +2025-02-05 23:52:32 - ERROR - stderr - 66%|██████▌ | 14823/22434 [13:44:52<5:24:34, 2.56s/it] +2025-02-05 23:52:32 - ERROR - stderr - +2025-02-05 23:52:32 - ERROR - stderr - +2025-02-05 23:52:32 - INFO - stdout - {'loss': 0.6198, 'grad_norm': 1.1654398441314697, 'learning_rate': 5.453664247731976e-06, 'epoch': 1.98} +2025-02-05 23:52:32 - ERROR - stderr - 66%|██████▌ | 14823/22434 [13:44:52<5:24:34, 2.56s/it] +2025-02-05 23:52:35 - ERROR - stderr - 66%|██████▌ | 14824/22434 [13:44:54<5:20:47, 2.53s/it] +2025-02-05 23:52:35 - ERROR - stderr - +2025-02-05 23:52:35 - ERROR - stderr - +2025-02-05 23:52:35 - INFO - stdout - {'loss': 0.6252, 'grad_norm': 1.2143925428390503, 'learning_rate': 5.452378380942296e-06, 'epoch': 1.98} +2025-02-05 23:52:35 - ERROR - stderr - 
66%|██████▌ | 14824/22434 [13:44:55<5:20:47, 2.53s/it] +2025-02-05 23:52:37 - ERROR - stderr - 66%|██████▌ | 14825/22434 [13:44:57<5:26:41, 2.58s/it] +2025-02-05 23:52:37 - ERROR - stderr - +2025-02-05 23:52:37 - ERROR - stderr - +2025-02-05 23:52:37 - INFO - stdout - {'loss': 0.7553, 'grad_norm': 1.405228614807129, 'learning_rate': 5.45109260894344e-06, 'epoch': 1.98} +2025-02-05 23:52:37 - ERROR - stderr - 66%|██████▌ | 14825/22434 [13:44:57<5:26:41, 2.58s/it] +2025-02-05 23:52:40 - ERROR - stderr - 66%|██████▌ | 14826/22434 [13:45:00<5:24:06, 2.56s/it] +2025-02-05 23:52:40 - ERROR - stderr - +2025-02-05 23:52:40 - ERROR - stderr - +2025-02-05 23:52:40 - INFO - stdout - {'loss': 0.6689, 'grad_norm': 1.2356865406036377, 'learning_rate': 5.449806931762198e-06, 'epoch': 1.98} +2025-02-05 23:52:40 - ERROR - stderr - 66%|██████▌ | 14826/22434 [13:45:00<5:24:06, 2.56s/it] +2025-02-05 23:52:43 - ERROR - stderr - 66%|██████▌ | 14827/22434 [13:45:02<5:27:35, 2.58s/it] +2025-02-05 23:52:43 - ERROR - stderr - +2025-02-05 23:52:43 - ERROR - stderr - +2025-02-05 23:52:43 - INFO - stdout - {'loss': 0.6512, 'grad_norm': 1.3537129163742065, 'learning_rate': 5.448521349425384e-06, 'epoch': 1.98} +2025-02-05 23:52:43 - ERROR - stderr - 66%|██████▌ | 14827/22434 [13:45:02<5:27:35, 2.58s/it] +2025-02-05 23:52:45 - ERROR - stderr - 66%|██████▌ | 14828/22434 [13:45:05<5:32:10, 2.62s/it] +2025-02-05 23:52:45 - ERROR - stderr - +2025-02-05 23:52:45 - ERROR - stderr - +2025-02-05 23:52:45 - INFO - stdout - {'loss': 0.6803, 'grad_norm': 1.2031548023223877, 'learning_rate': 5.4472358619597795e-06, 'epoch': 1.98} +2025-02-05 23:52:45 - ERROR - stderr - 66%|██████▌ | 14828/22434 [13:45:05<5:32:10, 2.62s/it] +2025-02-05 23:52:48 - ERROR - stderr - 66%|██████▌ | 14829/22434 [13:45:08<5:28:44, 2.59s/it] +2025-02-05 23:52:48 - ERROR - stderr - +2025-02-05 23:52:48 - ERROR - stderr - +2025-02-05 23:52:48 - INFO - stdout - {'loss': 0.6824, 'grad_norm': 1.1823501586914062, 'learning_rate': 
5.445950469392191e-06, 'epoch': 1.98} +2025-02-05 23:52:48 - ERROR - stderr - 66%|██████▌ | 14829/22434 [13:45:08<5:28:44, 2.59s/it] +2025-02-05 23:52:50 - ERROR - stderr - 66%|██████▌ | 14830/22434 [13:45:10<5:29:43, 2.60s/it] +2025-02-05 23:52:50 - ERROR - stderr - +2025-02-05 23:52:50 - ERROR - stderr - +2025-02-05 23:52:50 - INFO - stdout - {'loss': 0.6957, 'grad_norm': 1.3379372358322144, 'learning_rate': 5.444665171749411e-06, 'epoch': 1.98} +2025-02-05 23:52:50 - ERROR - stderr - 66%|██████▌ | 14830/22434 [13:45:10<5:29:43, 2.60s/it] +2025-02-05 23:52:53 - ERROR - stderr - 66%|██████▌ | 14831/22434 [13:45:13<5:24:14, 2.56s/it] +2025-02-05 23:52:53 - ERROR - stderr - +2025-02-05 23:52:53 - ERROR - stderr - +2025-02-05 23:52:53 - INFO - stdout - {'loss': 0.6398, 'grad_norm': 1.3275970220565796, 'learning_rate': 5.44337996905822e-06, 'epoch': 1.98} +2025-02-05 23:52:53 - ERROR - stderr - 66%|██████▌ | 14831/22434 [13:45:13<5:24:14, 2.56s/it] +2025-02-05 23:52:55 - ERROR - stderr - 66%|██████▌ | 14832/22434 [13:45:15<5:18:43, 2.52s/it] +2025-02-05 23:52:55 - ERROR - stderr - +2025-02-05 23:52:55 - ERROR - stderr - +2025-02-05 23:52:55 - INFO - stdout - {'loss': 0.6339, 'grad_norm': 1.230839490890503, 'learning_rate': 5.442094861345419e-06, 'epoch': 1.98} +2025-02-05 23:52:55 - ERROR - stderr - 66%|██████▌ | 14832/22434 [13:45:15<5:18:43, 2.52s/it] +2025-02-05 23:52:58 - ERROR - stderr - 66%|██████▌ | 14833/22434 [13:45:18<5:16:56, 2.50s/it] +2025-02-05 23:52:58 - ERROR - stderr - +2025-02-05 23:52:58 - ERROR - stderr - +2025-02-05 23:52:58 - INFO - stdout - {'loss': 0.6064, 'grad_norm': 1.278613805770874, 'learning_rate': 5.440809848637787e-06, 'epoch': 1.98} +2025-02-05 23:52:58 - ERROR - stderr - 66%|██████▌ | 14833/22434 [13:45:18<5:16:56, 2.50s/it] +2025-02-05 23:53:00 - ERROR - stderr - 66%|██████▌ | 14834/22434 [13:45:20<5:18:49, 2.52s/it] +2025-02-05 23:53:00 - ERROR - stderr - +2025-02-05 23:53:00 - ERROR - stderr - +2025-02-05 23:53:00 - INFO - stdout - 
{'loss': 0.6236, 'grad_norm': 1.1381815671920776, 'learning_rate': 5.43952493096211e-06, 'epoch': 1.98} +2025-02-05 23:53:00 - ERROR - stderr - 66%|██████▌ | 14834/22434 [13:45:20<5:18:49, 2.52s/it] +2025-02-05 23:53:03 - ERROR - stderr - 66%|██████▌ | 14835/22434 [13:45:23<5:19:23, 2.52s/it] +2025-02-05 23:53:03 - ERROR - stderr - +2025-02-05 23:53:03 - ERROR - stderr - +2025-02-05 23:53:03 - INFO - stdout - {'loss': 0.6195, 'grad_norm': 1.218299150466919, 'learning_rate': 5.438240108345172e-06, 'epoch': 1.98} +2025-02-05 23:53:03 - ERROR - stderr - 66%|██████▌ | 14835/22434 [13:45:23<5:19:23, 2.52s/it] +2025-02-05 23:53:05 - ERROR - stderr - 66%|██████▌ | 14836/22434 [13:45:25<5:17:30, 2.51s/it] +2025-02-05 23:53:05 - ERROR - stderr - +2025-02-05 23:53:05 - ERROR - stderr - +2025-02-05 23:53:05 - INFO - stdout - {'loss': 0.6985, 'grad_norm': 1.262445330619812, 'learning_rate': 5.436955380813751e-06, 'epoch': 1.98} +2025-02-05 23:53:05 - ERROR - stderr - 66%|██████▌ | 14836/22434 [13:45:25<5:17:30, 2.51s/it] +2025-02-05 23:53:08 - ERROR - stderr - 66%|██████▌ | 14837/22434 [13:45:28<5:33:50, 2.64s/it] +2025-02-05 23:53:08 - ERROR - stderr - +2025-02-05 23:53:08 - ERROR - stderr - +2025-02-05 23:53:08 - INFO - stdout - {'loss': 0.644, 'grad_norm': 1.302369236946106, 'learning_rate': 5.435670748394635e-06, 'epoch': 1.98} +2025-02-05 23:53:08 - ERROR - stderr - 66%|██████▌ | 14837/22434 [13:45:28<5:33:50, 2.64s/it] +2025-02-05 23:53:11 - ERROR - stderr - 66%|██████▌ | 14838/22434 [13:45:30<5:27:09, 2.58s/it] +2025-02-05 23:53:11 - ERROR - stderr - +2025-02-05 23:53:11 - ERROR - stderr - +2025-02-05 23:53:11 - INFO - stdout - {'loss': 0.6044, 'grad_norm': 1.18966543674469, 'learning_rate': 5.434386211114592e-06, 'epoch': 1.98} +2025-02-05 23:53:11 - ERROR - stderr - 66%|██████▌ | 14838/22434 [13:45:31<5:27:09, 2.58s/it] +2025-02-05 23:53:13 - ERROR - stderr - 66%|██████▌ | 14839/22434 [13:45:33<5:23:54, 2.56s/it] +2025-02-05 23:53:13 - ERROR - stderr - +2025-02-05 
23:53:13 - ERROR - stderr - +2025-02-05 23:53:13 - INFO - stdout - {'loss': 0.6485, 'grad_norm': 1.2184501886367798, 'learning_rate': 5.433101769000399e-06, 'epoch': 1.98} +2025-02-05 23:53:13 - ERROR - stderr - 66%|██████▌ | 14839/22434 [13:45:33<5:23:54, 2.56s/it] +2025-02-05 23:53:16 - ERROR - stderr - 66%|██████▌ | 14840/22434 [13:45:35<5:20:44, 2.53s/it] +2025-02-05 23:53:16 - ERROR - stderr - +2025-02-05 23:53:16 - ERROR - stderr - +2025-02-05 23:53:16 - INFO - stdout - {'loss': 0.6575, 'grad_norm': 1.1978563070297241, 'learning_rate': 5.431817422078829e-06, 'epoch': 1.98} +2025-02-05 23:53:16 - ERROR - stderr - 66%|██████▌ | 14840/22434 [13:45:35<5:20:44, 2.53s/it] +2025-02-05 23:53:18 - ERROR - stderr - 66%|██████▌ | 14841/22434 [13:45:38<5:18:57, 2.52s/it] +2025-02-05 23:53:18 - ERROR - stderr - +2025-02-05 23:53:18 - ERROR - stderr - +2025-02-05 23:53:18 - INFO - stdout - {'loss': 0.7216, 'grad_norm': 1.3085737228393555, 'learning_rate': 5.430533170376655e-06, 'epoch': 1.98} +2025-02-05 23:53:18 - ERROR - stderr - 66%|██████▌ | 14841/22434 [13:45:38<5:18:57, 2.52s/it] +2025-02-05 23:53:21 - ERROR - stderr - 66%|██████▌ | 14842/22434 [13:45:40<5:19:52, 2.53s/it] +2025-02-05 23:53:21 - ERROR - stderr - +2025-02-05 23:53:21 - ERROR - stderr - +2025-02-05 23:53:21 - INFO - stdout - {'loss': 0.6817, 'grad_norm': 1.2358304262161255, 'learning_rate': 5.429249013920643e-06, 'epoch': 1.98} +2025-02-05 23:53:21 - ERROR - stderr - 66%|██████▌ | 14842/22434 [13:45:41<5:19:52, 2.53s/it] +2025-02-05 23:53:23 - ERROR - stderr - 66%|██████▌ | 14843/22434 [13:45:43<5:22:39, 2.55s/it] +2025-02-05 23:53:23 - ERROR - stderr - +2025-02-05 23:53:23 - ERROR - stderr - +2025-02-05 23:53:23 - INFO - stdout - {'loss': 0.7368, 'grad_norm': 1.3739274740219116, 'learning_rate': 5.4279649527375636e-06, 'epoch': 1.98} +2025-02-05 23:53:23 - ERROR - stderr - 66%|██████▌ | 14843/22434 [13:45:43<5:22:39, 2.55s/it] +2025-02-05 23:53:26 - ERROR - stderr - 66%|██████▌ | 14844/22434 
[13:45:46<5:19:29, 2.53s/it] +2025-02-05 23:53:26 - ERROR - stderr - +2025-02-05 23:53:26 - ERROR - stderr - +2025-02-05 23:53:26 - INFO - stdout - {'loss': 0.6999, 'grad_norm': 1.2815834283828735, 'learning_rate': 5.426680986854178e-06, 'epoch': 1.99} +2025-02-05 23:53:26 - ERROR - stderr - 66%|██████▌ | 14844/22434 [13:45:46<5:19:29, 2.53s/it] +2025-02-05 23:53:28 - ERROR - stderr - 66%|██████▌ | 14845/22434 [13:45:48<5:16:50, 2.51s/it] +2025-02-05 23:53:28 - ERROR - stderr - +2025-02-05 23:53:28 - ERROR - stderr - +2025-02-05 23:53:28 - INFO - stdout - {'loss': 0.5736, 'grad_norm': 1.0874109268188477, 'learning_rate': 5.425397116297251e-06, 'epoch': 1.99} +2025-02-05 23:53:28 - ERROR - stderr - 66%|██████▌ | 14845/22434 [13:45:48<5:16:50, 2.51s/it] +2025-02-05 23:53:31 - ERROR - stderr - 66%|██████▌ | 14846/22434 [13:45:50<5:14:08, 2.48s/it] +2025-02-05 23:53:31 - ERROR - stderr - +2025-02-05 23:53:31 - ERROR - stderr - +2025-02-05 23:53:31 - INFO - stdout - {'loss': 0.6671, 'grad_norm': 1.2515881061553955, 'learning_rate': 5.424113341093548e-06, 'epoch': 1.99} +2025-02-05 23:53:31 - ERROR - stderr - 66%|██████▌ | 14846/22434 [13:45:50<5:14:08, 2.48s/it] +2025-02-05 23:53:33 - ERROR - stderr - 66%|██████▌ | 14847/22434 [13:45:53<5:13:04, 2.48s/it] +2025-02-05 23:53:33 - ERROR - stderr - +2025-02-05 23:53:33 - ERROR - stderr - +2025-02-05 23:53:33 - INFO - stdout - {'loss': 0.6746, 'grad_norm': 1.3371175527572632, 'learning_rate': 5.422829661269816e-06, 'epoch': 1.99} +2025-02-05 23:53:33 - ERROR - stderr - 66%|██████▌ | 14847/22434 [13:45:53<5:13:04, 2.48s/it] +2025-02-05 23:53:36 - ERROR - stderr - 66%|██████▌ | 14848/22434 [13:45:55<5:12:43, 2.47s/it] +2025-02-05 23:53:36 - ERROR - stderr - +2025-02-05 23:53:36 - ERROR - stderr - +2025-02-05 23:53:36 - INFO - stdout - {'loss': 0.6685, 'grad_norm': 1.3259339332580566, 'learning_rate': 5.421546076852824e-06, 'epoch': 1.99} +2025-02-05 23:53:36 - ERROR - stderr - 66%|██████▌ | 14848/22434 [13:45:55<5:12:43,
2.47s/it] +2025-02-05 23:53:38 - ERROR - stderr - 66%|██████▌ | 14849/22434 [13:45:58<5:13:42, 2.48s/it] +2025-02-05 23:53:38 - ERROR - stderr - +2025-02-05 23:53:38 - ERROR - stderr - +2025-02-05 23:53:38 - INFO - stdout - {'loss': 0.7014, 'grad_norm': 1.3031377792358398, 'learning_rate': 5.420262587869327e-06, 'epoch': 1.99} +2025-02-05 23:53:38 - ERROR - stderr - 66%|██████▌ | 14849/22434 [13:45:58<5:13:42, 2.48s/it] +2025-02-05 23:53:41 - ERROR - stderr - 66%|██████▌ | 14850/22434 [13:46:00<5:13:37, 2.48s/it] +2025-02-05 23:53:41 - ERROR - stderr - +2025-02-05 23:53:41 - ERROR - stderr - +2025-02-05 23:53:41 - INFO - stdout - {'loss': 0.7865, 'grad_norm': 1.377580165863037, 'learning_rate': 5.418979194346065e-06, 'epoch': 1.99} +2025-02-05 23:53:41 - ERROR - stderr - 66%|██████▌ | 14850/22434 [13:46:00<5:13:37, 2.48s/it] +2025-02-05 23:53:43 - ERROR - stderr - 66%|██████▌ | 14851/22434 [13:46:03<5:15:10, 2.49s/it] +2025-02-05 23:53:43 - ERROR - stderr - +2025-02-05 23:53:43 - ERROR - stderr - +2025-02-05 23:53:43 - INFO - stdout - {'loss': 0.71, 'grad_norm': 1.3640551567077637, 'learning_rate': 5.417695896309807e-06, 'epoch': 1.99} +2025-02-05 23:53:43 - ERROR - stderr - 66%|██████▌ | 14851/22434 [13:46:03<5:15:10, 2.49s/it] +2025-02-05 23:53:45 - ERROR - stderr - 66%|██████▌ | 14852/22434 [13:46:05<5:11:20, 2.46s/it] +2025-02-05 23:53:46 - ERROR - stderr - +2025-02-05 23:53:46 - ERROR - stderr - +2025-02-05 23:53:46 - INFO - stdout - {'loss': 0.7335, 'grad_norm': 1.4848729372024536, 'learning_rate': 5.4164126937872855e-06, 'epoch': 1.99} +2025-02-05 23:53:46 - ERROR - stderr - 66%|██████▌ | 14852/22434 [13:46:05<5:11:20, 2.46s/it] +2025-02-05 23:53:48 - ERROR - stderr - 66%|██████▌ | 14853/22434 [13:46:08<5:10:22, 2.46s/it] +2025-02-05 23:53:48 - ERROR - stderr - +2025-02-05 23:53:48 - ERROR - stderr - +2025-02-05 23:53:48 - INFO - stdout - {'loss': 0.6266, 'grad_norm': 1.2218044996261597, 'learning_rate': 5.415129586805264e-06, 'epoch': 1.99} +2025-02-05 
23:53:48 - ERROR - stderr - 66%|██████▌ | 14853/22434 [13:46:08<5:10:22, 2.46s/it] +2025-02-05 23:53:50 - ERROR - stderr - 66%|██████▌ | 14854/22434 [13:46:10<5:08:50, 2.44s/it] +2025-02-05 23:53:50 - ERROR - stderr - +2025-02-05 23:53:50 - ERROR - stderr - +2025-02-05 23:53:50 - INFO - stdout - {'loss': 0.8394, 'grad_norm': 1.5918920040130615, 'learning_rate': 5.4138465753904735e-06, 'epoch': 1.99} +2025-02-05 23:53:50 - ERROR - stderr - 66%|██████▌ | 14854/22434 [13:46:10<5:08:50, 2.44s/it] +2025-02-05 23:53:53 - ERROR - stderr - 66%|██████▌ | 14855/22434 [13:46:13<5:09:51, 2.45s/it] +2025-02-05 23:53:53 - ERROR - stderr - +2025-02-05 23:53:53 - ERROR - stderr - +2025-02-05 23:53:53 - INFO - stdout - {'loss': 0.6964, 'grad_norm': 1.2429171800613403, 'learning_rate': 5.4125636595696585e-06, 'epoch': 1.99} +2025-02-05 23:53:53 - ERROR - stderr - 66%|██████▌ | 14855/22434 [13:46:13<5:09:51, 2.45s/it] +2025-02-05 23:53:55 - ERROR - stderr - 66%|██████▌ | 14856/22434 [13:46:15<5:11:56, 2.47s/it] +2025-02-05 23:53:55 - ERROR - stderr - +2025-02-05 23:53:55 - ERROR - stderr - +2025-02-05 23:53:55 - INFO - stdout - {'loss': 0.6946, 'grad_norm': 1.303707480430603, 'learning_rate': 5.411280839369574e-06, 'epoch': 1.99} +2025-02-05 23:53:55 - ERROR - stderr - 66%|██████▌ | 14856/22434 [13:46:15<5:11:56, 2.47s/it] +2025-02-05 23:53:58 - ERROR - stderr - 66%|██████▌ | 14857/22434 [13:46:18<5:13:40, 2.48s/it] +2025-02-05 23:53:58 - ERROR - stderr - +2025-02-05 23:53:58 - ERROR - stderr - +2025-02-05 23:53:58 - INFO - stdout - {'loss': 0.6214, 'grad_norm': 1.2284363508224487, 'learning_rate': 5.409998114816943e-06, 'epoch': 1.99} +2025-02-05 23:53:58 - ERROR - stderr - 66%|██████▌ | 14857/22434 [13:46:18<5:13:40, 2.48s/it] +2025-02-05 23:54:00 - ERROR - stderr - 66%|██████▌ | 14858/22434 [13:46:20<5:12:15, 2.47s/it] +2025-02-05 23:54:00 - ERROR - stderr - +2025-02-05 23:54:00 - ERROR - stderr - +2025-02-05 23:54:00 - INFO - stdout - {'loss': 0.7401, 'grad_norm': 
1.5470633506774902, 'learning_rate': 5.408715485938511e-06, 'epoch': 1.99} +2025-02-05 23:54:00 - ERROR - stderr - 66%|██████▌ | 14858/22434 [13:46:20<5:12:15, 2.47s/it] +2025-02-05 23:54:03 - ERROR - stderr - 66%|██████▌ | 14859/22434 [13:46:23<5:12:38, 2.48s/it] +2025-02-05 23:54:03 - ERROR - stderr - +2025-02-05 23:54:03 - ERROR - stderr - +2025-02-05 23:54:03 - INFO - stdout - {'loss': 0.7107, 'grad_norm': 1.3907623291015625, 'learning_rate': 5.407432952761011e-06, 'epoch': 1.99} +2025-02-05 23:54:03 - ERROR - stderr - 66%|██████▌ | 14859/22434 [13:46:23<5:12:38, 2.48s/it] +2025-02-05 23:54:05 - ERROR - stderr - 66%|██████▌ | 14860/22434 [13:46:25<5:13:11, 2.48s/it] +2025-02-05 23:54:05 - ERROR - stderr - +2025-02-05 23:54:05 - ERROR - stderr - +2025-02-05 23:54:05 - INFO - stdout - {'loss': 0.6768, 'grad_norm': 1.2641938924789429, 'learning_rate': 5.406150515311177e-06, 'epoch': 1.99} +2025-02-05 23:54:05 - ERROR - stderr - 66%|██████▌ | 14860/22434 [13:46:25<5:13:11, 2.48s/it] +2025-02-05 23:54:08 - ERROR - stderr - 66%|██████▌ | 14861/22434 [13:46:28<5:14:52, 2.49s/it] +2025-02-05 23:54:08 - ERROR - stderr - +2025-02-05 23:54:08 - ERROR - stderr - +2025-02-05 23:54:08 - INFO - stdout - {'loss': 0.6201, 'grad_norm': 1.2861313819885254, 'learning_rate': 5.404868173615739e-06, 'epoch': 1.99} +2025-02-05 23:54:08 - ERROR - stderr - 66%|██████▌ | 14861/22434 [13:46:28<5:14:52, 2.49s/it] +2025-02-05 23:54:10 - ERROR - stderr - 66%|██████▌ | 14862/22434 [13:46:30<5:13:33, 2.48s/it] +2025-02-05 23:54:10 - ERROR - stderr - +2025-02-05 23:54:10 - ERROR - stderr - +2025-02-05 23:54:10 - INFO - stdout - {'loss': 0.6577, 'grad_norm': 1.2859594821929932, 'learning_rate': 5.403585927701427e-06, 'epoch': 1.99} +2025-02-05 23:54:10 - ERROR - stderr - 66%|██████▌ | 14862/22434 [13:46:30<5:13:33, 2.48s/it] +2025-02-05 23:54:13 - ERROR - stderr - 66%|██████▋ | 14863/22434 [13:46:33<5:14:39, 2.49s/it] +2025-02-05 23:54:13 - ERROR - stderr - +2025-02-05 23:54:13 - ERROR - stderr 
- +2025-02-05 23:54:13 - INFO - stdout - {'loss': 0.6407, 'grad_norm': 1.1363396644592285, 'learning_rate': 5.402303777594968e-06, 'epoch': 1.99} +2025-02-05 23:54:13 - ERROR - stderr - 66%|██████▋ | 14863/22434 [13:46:33<5:14:39, 2.49s/it] +2025-02-05 23:54:15 - ERROR - stderr - 66%|██████▋ | 14864/22434 [13:46:35<5:12:35, 2.48s/it] +2025-02-05 23:54:15 - ERROR - stderr - +2025-02-05 23:54:15 - ERROR - stderr - +2025-02-05 23:54:15 - INFO - stdout - {'loss': 0.6151, 'grad_norm': 1.1651362180709839, 'learning_rate': 5.401021723323088e-06, 'epoch': 1.99} +2025-02-05 23:54:15 - ERROR - stderr - 66%|██████▋ | 14864/22434 [13:46:35<5:12:35, 2.48s/it] +2025-02-05 23:54:18 - ERROR - stderr - 66%|██████▋ | 14865/22434 [13:46:37<5:11:17, 2.47s/it] +2025-02-05 23:54:18 - ERROR - stderr - +2025-02-05 23:54:18 - ERROR - stderr - +2025-02-05 23:54:18 - INFO - stdout - {'loss': 0.5937, 'grad_norm': 1.205941915512085, 'learning_rate': 5.399739764912513e-06, 'epoch': 1.99} +2025-02-05 23:54:18 - ERROR - stderr - 66%|██████▋ | 14865/22434 [13:46:37<5:11:17, 2.47s/it] +2025-02-05 23:54:20 - ERROR - stderr - 66%|██████▋ | 14866/22434 [13:46:40<5:10:38, 2.46s/it] +2025-02-05 23:54:20 - ERROR - stderr - +2025-02-05 23:54:20 - ERROR - stderr - +2025-02-05 23:54:20 - INFO - stdout - {'loss': 0.6449, 'grad_norm': 1.287140965461731, 'learning_rate': 5.398457902389952e-06, 'epoch': 1.99} +2025-02-05 23:54:20 - ERROR - stderr - 66%|██████▋ | 14866/22434 [13:46:40<5:10:38, 2.46s/it] +2025-02-05 23:54:23 - ERROR - stderr - 66%|██████▋ | 14867/22434 [13:46:42<5:13:54, 2.49s/it] +2025-02-05 23:54:23 - ERROR - stderr - +2025-02-05 23:54:23 - ERROR - stderr - +2025-02-05 23:54:23 - INFO - stdout - {'loss': 0.7012, 'grad_norm': 1.230329155921936, 'learning_rate': 5.397176135782136e-06, 'epoch': 1.99} +2025-02-05 23:54:23 - ERROR - stderr - 66%|██████▋ | 14867/22434 [13:46:42<5:13:54, 2.49s/it] +2025-02-05 23:54:25 - ERROR - stderr - 66%|██████▋ | 14868/22434 [13:46:45<5:21:19, 2.55s/it] 
+2025-02-05 23:54:25 - ERROR - stderr - +2025-02-05 23:54:25 - ERROR - stderr - +2025-02-05 23:54:25 - INFO - stdout - {'loss': 0.7155, 'grad_norm': 1.1349071264266968, 'learning_rate': 5.395894465115781e-06, 'epoch': 1.99} +2025-02-05 23:54:25 - ERROR - stderr - 66%|██████▋ | 14868/22434 [13:46:45<5:21:19, 2.55s/it] +2025-02-05 23:54:28 - ERROR - stderr - 66%|██████▋ | 14869/22434 [13:46:48<5:21:25, 2.55s/it] +2025-02-05 23:54:28 - ERROR - stderr - +2025-02-05 23:54:28 - ERROR - stderr - +2025-02-05 23:54:28 - INFO - stdout - {'loss': 0.6827, 'grad_norm': 1.3649296760559082, 'learning_rate': 5.3946128904176e-06, 'epoch': 1.99} +2025-02-05 23:54:28 - ERROR - stderr - 66%|██████▋ | 14869/22434 [13:46:48<5:21:25, 2.55s/it] +2025-02-05 23:54:30 - ERROR - stderr - 66%|██████▋ | 14870/22434 [13:46:50<5:17:46, 2.52s/it] +2025-02-05 23:54:30 - ERROR - stderr - +2025-02-05 23:54:30 - ERROR - stderr - +2025-02-05 23:54:30 - INFO - stdout - {'loss': 0.7285, 'grad_norm': 1.2703038454055786, 'learning_rate': 5.393331411714309e-06, 'epoch': 1.99} +2025-02-05 23:54:30 - ERROR - stderr - 66%|██████▋ | 14870/22434 [13:46:50<5:17:46, 2.52s/it] +2025-02-05 23:54:33 - ERROR - stderr - 66%|██████▋ | 14871/22434 [13:46:53<5:19:44, 2.54s/it] +2025-02-05 23:54:33 - ERROR - stderr - +2025-02-05 23:54:33 - ERROR - stderr - +2025-02-05 23:54:33 - INFO - stdout - {'loss': 0.6675, 'grad_norm': 1.2619279623031616, 'learning_rate': 5.392050029032609e-06, 'epoch': 1.99} +2025-02-05 23:54:33 - ERROR - stderr - 66%|██████▋ | 14871/22434 [13:46:53<5:19:44, 2.54s/it] +2025-02-05 23:54:35 - ERROR - stderr - 66%|██████▋ | 14872/22434 [13:46:55<5:15:58, 2.51s/it] +2025-02-05 23:54:35 - ERROR - stderr - +2025-02-05 23:54:35 - ERROR - stderr - +2025-02-05 23:54:35 - INFO - stdout - {'loss': 0.6365, 'grad_norm': 1.2635893821716309, 'learning_rate': 5.390768742399226e-06, 'epoch': 1.99} +2025-02-05 23:54:35 - ERROR - stderr - 66%|██████▋ | 14872/22434 [13:46:55<5:15:58, 2.51s/it] +2025-02-05 23:54:38 - 
ERROR - stderr - 66%|██████▋ | 14873/22434 [13:46:58<5:13:10, 2.49s/it] +2025-02-05 23:54:38 - ERROR - stderr - +2025-02-05 23:54:38 - ERROR - stderr - +2025-02-05 23:54:38 - INFO - stdout - {'loss': 0.6993, 'grad_norm': 1.3888193368911743, 'learning_rate': 5.38948755184085e-06, 'epoch': 1.99} +2025-02-05 23:54:38 - ERROR - stderr - 66%|██████▋ | 14873/22434 [13:46:58<5:13:10, 2.49s/it] +2025-02-05 23:54:40 - ERROR - stderr - 66%|██████▋ | 14874/22434 [13:47:00<5:13:27, 2.49s/it] +2025-02-05 23:54:40 - ERROR - stderr - +2025-02-05 23:54:40 - ERROR - stderr - +2025-02-05 23:54:40 - INFO - stdout - {'loss': 0.6206, 'grad_norm': 1.21269690990448, 'learning_rate': 5.388206457384198e-06, 'epoch': 1.99} +2025-02-05 23:54:40 - ERROR - stderr - 66%|██████▋ | 14874/22434 [13:47:00<5:13:27, 2.49s/it] +2025-02-05 23:54:43 - ERROR - stderr - 66%|██████▋ | 14875/22434 [13:47:03<5:12:03, 2.48s/it] +2025-02-05 23:54:43 - ERROR - stderr - +2025-02-05 23:54:43 - ERROR - stderr - +2025-02-05 23:54:43 - INFO - stdout - {'loss': 0.6093, 'grad_norm': 1.234320878982544, 'learning_rate': 5.386925459055971e-06, 'epoch': 1.99} +2025-02-05 23:54:43 - ERROR - stderr - 66%|██████▋ | 14875/22434 [13:47:03<5:12:03, 2.48s/it] +2025-02-05 23:54:45 - ERROR - stderr - 66%|██████▋ | 14876/22434 [13:47:05<5:18:15, 2.53s/it] +2025-02-05 23:54:45 - ERROR - stderr - +2025-02-05 23:54:45 - ERROR - stderr - +2025-02-05 23:54:45 - INFO - stdout - {'loss': 0.6275, 'grad_norm': 1.2551288604736328, 'learning_rate': 5.385644556882863e-06, 'epoch': 1.99} +2025-02-05 23:54:45 - ERROR - stderr - 66%|██████▋ | 14876/22434 [13:47:05<5:18:15, 2.53s/it] +2025-02-05 23:54:48 - ERROR - stderr - 66%|██████▋ | 14877/22434 [13:47:08<5:19:46, 2.54s/it] +2025-02-05 23:54:48 - ERROR - stderr - +2025-02-05 23:54:48 - ERROR - stderr - +2025-02-05 23:54:48 - INFO - stdout - {'loss': 0.7088, 'grad_norm': 1.2301275730133057, 'learning_rate': 5.384363750891586e-06, 'epoch': 1.99} +2025-02-05 23:54:48 - ERROR - stderr - 66%|██████▋ 
| 14877/22434 [13:47:08<5:19:46, 2.54s/it] +2025-02-05 23:54:51 - ERROR - stderr - 66%|██████▋ | 14878/22434 [13:47:10<5:20:14, 2.54s/it] +2025-02-05 23:54:51 - ERROR - stderr - +2025-02-05 23:54:51 - ERROR - stderr - +2025-02-05 23:54:51 - INFO - stdout - {'loss': 0.6432, 'grad_norm': 1.1993392705917358, 'learning_rate': 5.383083041108827e-06, 'epoch': 1.99} +2025-02-05 23:54:51 - ERROR - stderr - 66%|██████▋ | 14878/22434 [13:47:10<5:20:14, 2.54s/it] +2025-02-05 23:54:53 - ERROR - stderr - 66%|██████▋ | 14879/22434 [13:47:13<5:20:44, 2.55s/it] +2025-02-05 23:54:53 - ERROR - stderr - +2025-02-05 23:54:53 - ERROR - stderr - +2025-02-05 23:54:53 - INFO - stdout - {'loss': 0.7441, 'grad_norm': 1.418544888496399, 'learning_rate': 5.3818024275612825e-06, 'epoch': 1.99} +2025-02-05 23:54:53 - ERROR - stderr - 66%|██████▋ | 14879/22434 [13:47:13<5:20:44, 2.55s/it] +2025-02-05 23:54:56 - ERROR - stderr - 66%|██████▋ | 14880/22434 [13:47:15<5:18:23, 2.53s/it] +2025-02-05 23:54:56 - ERROR - stderr - +2025-02-05 23:54:56 - ERROR - stderr - +2025-02-05 23:54:56 - INFO - stdout - {'loss': 0.6576, 'grad_norm': 1.2797012329101562, 'learning_rate': 5.380521910275649e-06, 'epoch': 1.99} +2025-02-05 23:54:56 - ERROR - stderr - 66%|██████▋ | 14880/22434 [13:47:15<5:18:23, 2.53s/it] +2025-02-05 23:54:58 - ERROR - stderr - 66%|██████▋ | 14881/22434 [13:47:18<5:15:40, 2.51s/it] +2025-02-05 23:54:58 - ERROR - stderr - +2025-02-05 23:54:58 - ERROR - stderr - +2025-02-05 23:54:58 - INFO - stdout - {'loss': 0.6826, 'grad_norm': 1.2851200103759766, 'learning_rate': 5.379241489278615e-06, 'epoch': 1.99} +2025-02-05 23:54:58 - ERROR - stderr - 66%|██████▋ | 14881/22434 [13:47:18<5:15:40, 2.51s/it] +2025-02-05 23:55:01 - ERROR - stderr - 66%|██████▋ | 14882/22434 [13:47:20<5:15:37, 2.51s/it] +2025-02-05 23:55:01 - ERROR - stderr - +2025-02-05 23:55:01 - ERROR - stderr - +2025-02-05 23:55:01 - INFO - stdout - {'loss': 0.6506, 'grad_norm': 1.3117268085479736, 'learning_rate': 
5.3779611645968696e-06, 'epoch': 1.99} +2025-02-05 23:55:01 - ERROR - stderr - 66%|██████▋ | 14882/22434 [13:47:20<5:15:37, 2.51s/it] +2025-02-05 23:55:03 - ERROR - stderr - 66%|██████▋ | 14883/22434 [13:47:23<5:14:51, 2.50s/it] +2025-02-05 23:55:03 - ERROR - stderr - +2025-02-05 23:55:03 - ERROR - stderr - +2025-02-05 23:55:03 - INFO - stdout - {'loss': 0.6771, 'grad_norm': 1.235428810119629, 'learning_rate': 5.376680936257102e-06, 'epoch': 1.99} +2025-02-05 23:55:03 - ERROR - stderr - 66%|██████▋ | 14883/22434 [13:47:23<5:14:51, 2.50s/it] +2025-02-05 23:55:06 - ERROR - stderr - 66%|██████▋ | 14884/22434 [13:47:25<5:15:30, 2.51s/it] +2025-02-05 23:55:06 - ERROR - stderr - +2025-02-05 23:55:06 - ERROR - stderr - +2025-02-05 23:55:06 - INFO - stdout - {'loss': 0.6154, 'grad_norm': 1.2995370626449585, 'learning_rate': 5.375400804285995e-06, 'epoch': 1.99} +2025-02-05 23:55:06 - ERROR - stderr - 66%|██████▋ | 14884/22434 [13:47:25<5:15:30, 2.51s/it] +2025-02-05 23:55:08 - ERROR - stderr - 66%|██████▋ | 14885/22434 [13:47:28<5:14:33, 2.50s/it] +2025-02-05 23:55:08 - ERROR - stderr - +2025-02-05 23:55:08 - ERROR - stderr - +2025-02-05 23:55:08 - INFO - stdout - {'loss': 0.6909, 'grad_norm': 1.2795130014419556, 'learning_rate': 5.3741207687102345e-06, 'epoch': 1.99} +2025-02-05 23:55:08 - ERROR - stderr - 66%|██████▋ | 14885/22434 [13:47:28<5:14:33, 2.50s/it] +2025-02-05 23:55:10 - ERROR - stderr - 66%|██████▋ | 14886/22434 [13:47:30<5:12:03, 2.48s/it] +2025-02-05 23:55:10 - ERROR - stderr - +2025-02-05 23:55:10 - ERROR - stderr - +2025-02-05 23:55:10 - INFO - stdout - {'loss': 0.6616, 'grad_norm': 1.3259156942367554, 'learning_rate': 5.3728408295565e-06, 'epoch': 1.99} +2025-02-05 23:55:10 - ERROR - stderr - 66%|██████▋ | 14886/22434 [13:47:30<5:12:03, 2.48s/it] +2025-02-05 23:55:13 - ERROR - stderr - 66%|██████▋ | 14887/22434 [13:47:33<5:12:31, 2.48s/it] +2025-02-05 23:55:13 - ERROR - stderr - +2025-02-05 23:55:13 - ERROR - stderr - +2025-02-05 23:55:13 - INFO - stdout 
- {'loss': 0.686, 'grad_norm': 1.428341031074524, 'learning_rate': 5.37156098685147e-06, 'epoch': 1.99} +2025-02-05 23:55:13 - ERROR - stderr - 66%|██████▋ | 14887/22434 [13:47:33<5:12:31, 2.48s/it] +2025-02-05 23:55:15 - ERROR - stderr - 66%|██████▋ | 14888/22434 [13:47:35<5:14:37, 2.50s/it] +2025-02-05 23:55:16 - ERROR - stderr - +2025-02-05 23:55:16 - ERROR - stderr - +2025-02-05 23:55:16 - INFO - stdout - {'loss': 0.7091, 'grad_norm': 1.3400788307189941, 'learning_rate': 5.370281240621823e-06, 'epoch': 1.99} +2025-02-05 23:55:16 - ERROR - stderr - 66%|██████▋ | 14888/22434 [13:47:35<5:14:37, 2.50s/it] +2025-02-05 23:55:18 - ERROR - stderr - 66%|██████▋ | 14889/22434 [13:47:38<5:14:06, 2.50s/it] +2025-02-05 23:55:18 - ERROR - stderr - +2025-02-05 23:55:18 - ERROR - stderr - +2025-02-05 23:55:18 - INFO - stdout - {'loss': 0.662, 'grad_norm': 1.2306163311004639, 'learning_rate': 5.369001590894233e-06, 'epoch': 1.99} +2025-02-05 23:55:18 - ERROR - stderr - 66%|██████▋ | 14889/22434 [13:47:38<5:14:06, 2.50s/it] +2025-02-05 23:55:20 - ERROR - stderr - 66%|██████▋ | 14890/22434 [13:47:40<5:13:29, 2.49s/it] +2025-02-05 23:55:20 - ERROR - stderr - +2025-02-05 23:55:20 - ERROR - stderr - +2025-02-05 23:55:20 - INFO - stdout - {'loss': 0.6803, 'grad_norm': 1.2499032020568848, 'learning_rate': 5.367722037695373e-06, 'epoch': 1.99} +2025-02-05 23:55:20 - ERROR - stderr - 66%|██████▋ | 14890/22434 [13:47:40<5:13:29, 2.49s/it] +2025-02-05 23:55:23 - ERROR - stderr - 66%|██████▋ | 14891/22434 [13:47:43<5:14:46, 2.50s/it] +2025-02-05 23:55:23 - ERROR - stderr - +2025-02-05 23:55:23 - ERROR - stderr - +2025-02-05 23:55:23 - INFO - stdout - {'loss': 0.6526, 'grad_norm': 1.3928347826004028, 'learning_rate': 5.366442581051918e-06, 'epoch': 1.99} +2025-02-05 23:55:23 - ERROR - stderr - 66%|██████▋ | 14891/22434 [13:47:43<5:14:46, 2.50s/it] +2025-02-05 23:55:26 - ERROR - stderr - 66%|██████▋ | 14892/22434 [13:47:45<5:16:07, 2.51s/it] +2025-02-05 23:55:26 - ERROR - stderr - 
+2025-02-05 23:55:26 - ERROR - stderr - +2025-02-05 23:55:26 - INFO - stdout - {'loss': 0.5943, 'grad_norm': 1.28783118724823, 'learning_rate': 5.365163220990528e-06, 'epoch': 1.99} +2025-02-05 23:55:26 - ERROR - stderr - 66%|██████▋ | 14892/22434 [13:47:45<5:16:07, 2.51s/it] +2025-02-05 23:55:28 - ERROR - stderr - 66%|██████▋ | 14893/22434 [13:47:48<5:13:43, 2.50s/it] +2025-02-05 23:55:28 - ERROR - stderr - +2025-02-05 23:55:28 - ERROR - stderr - +2025-02-05 23:55:28 - INFO - stdout - {'loss': 0.659, 'grad_norm': 1.2658665180206299, 'learning_rate': 5.3638839575378775e-06, 'epoch': 1.99} +2025-02-05 23:55:28 - ERROR - stderr - 66%|██████▋ | 14893/22434 [13:47:48<5:13:43, 2.50s/it] +2025-02-05 23:55:30 - ERROR - stderr - 66%|██████▋ | 14894/22434 [13:47:50<5:14:48, 2.51s/it] +2025-02-05 23:55:31 - ERROR - stderr - +2025-02-05 23:55:31 - ERROR - stderr - +2025-02-05 23:55:31 - INFO - stdout - {'loss': 0.6672, 'grad_norm': 1.265547275543213, 'learning_rate': 5.3626047907206335e-06, 'epoch': 1.99} +2025-02-05 23:55:31 - ERROR - stderr - 66%|██████▋ | 14894/22434 [13:47:50<5:14:48, 2.51s/it] +2025-02-05 23:55:33 - ERROR - stderr - 66%|██████▋ | 14895/22434 [13:47:53<5:12:20, 2.49s/it] +2025-02-05 23:55:33 - ERROR - stderr - +2025-02-05 23:55:33 - ERROR - stderr - +2025-02-05 23:55:33 - INFO - stdout - {'loss': 0.7005, 'grad_norm': 1.28840970993042, 'learning_rate': 5.361325720565449e-06, 'epoch': 1.99} +2025-02-05 23:55:33 - ERROR - stderr - 66%|██████▋ | 14895/22434 [13:47:53<5:12:20, 2.49s/it] +2025-02-05 23:55:35 - ERROR - stderr - 66%|██████▋ | 14896/22434 [13:47:55<5:12:35, 2.49s/it] +2025-02-05 23:55:35 - ERROR - stderr - +2025-02-05 23:55:35 - ERROR - stderr - +2025-02-05 23:55:35 - INFO - stdout - {'loss': 0.743, 'grad_norm': 1.5279120206832886, 'learning_rate': 5.360046747098997e-06, 'epoch': 1.99} +2025-02-05 23:55:35 - ERROR - stderr - 66%|██████▋ | 14896/22434 [13:47:55<5:12:35, 2.49s/it] +2025-02-05 23:55:38 - ERROR - stderr - 66%|██████▋ | 14897/22434 
[13:47:58<5:13:38, 2.50s/it] +2025-02-05 23:55:38 - ERROR - stderr - +2025-02-05 23:55:38 - ERROR - stderr - +2025-02-05 23:55:38 - INFO - stdout - {'loss': 0.6857, 'grad_norm': 1.2827222347259521, 'learning_rate': 5.358767870347924e-06, 'epoch': 1.99} +2025-02-05 23:55:38 - ERROR - stderr - 66%|██████▋ | 14897/22434 [13:47:58<5:13:38, 2.50s/it] +2025-02-05 23:55:40 - ERROR - stderr - 66%|██████▋ | 14898/22434 [13:48:00<5:15:41, 2.51s/it] +2025-02-05 23:55:41 - ERROR - stderr - +2025-02-05 23:55:41 - ERROR - stderr - +2025-02-05 23:55:41 - INFO - stdout - {'loss': 0.7034, 'grad_norm': 1.1647934913635254, 'learning_rate': 5.357489090338901e-06, 'epoch': 1.99} +2025-02-05 23:55:41 - ERROR - stderr - 66%|██████▋ | 14898/22434 [13:48:00<5:15:41, 2.51s/it] +2025-02-05 23:55:43 - ERROR - stderr - 66%|██████▋ | 14899/22434 [13:48:03<5:12:29, 2.49s/it] +2025-02-05 23:55:43 - ERROR - stderr - +2025-02-05 23:55:43 - ERROR - stderr - +2025-02-05 23:55:43 - INFO - stdout - {'loss': 0.6631, 'grad_norm': 1.4435542821884155, 'learning_rate': 5.356210407098572e-06, 'epoch': 1.99} +2025-02-05 23:55:43 - ERROR - stderr - 66%|██████▋ | 14899/22434 [13:48:03<5:12:29, 2.49s/it] +2025-02-05 23:55:45 - ERROR - stderr - 66%|██████▋ | 14900/22434 [13:48:05<5:14:50, 2.51s/it] +2025-02-05 23:55:46 - ERROR - stderr - +2025-02-05 23:55:46 - ERROR - stderr - +2025-02-05 23:55:46 - INFO - stdout - {'loss': 0.5966, 'grad_norm': 1.2394779920578003, 'learning_rate': 5.354931820653593e-06, 'epoch': 1.99} +2025-02-05 23:55:46 - ERROR - stderr - 66%|██████▋ | 14900/22434 [13:48:05<5:14:50, 2.51s/it] +2025-02-05 23:55:48 - ERROR - stderr - 66%|██████▋ | 14901/22434 [13:48:08<5:14:02, 2.50s/it] +2025-02-05 23:55:48 - ERROR - stderr - +2025-02-05 23:55:48 - ERROR - stderr - +2025-02-05 23:55:48 - INFO - stdout - {'loss': 0.7604, 'grad_norm': 1.2509815692901611, 'learning_rate': 5.353653331030615e-06, 'epoch': 1.99} +2025-02-05 23:55:48 - ERROR - stderr - 66%|██████▋ | 14901/22434 [13:48:08<5:14:02, 
2.50s/it] +2025-02-05 23:55:50 - ERROR - stderr - 66%|██████▋ | 14902/22434 [13:48:10<5:11:36, 2.48s/it] +2025-02-05 23:55:50 - ERROR - stderr - +2025-02-05 23:55:50 - ERROR - stderr - +2025-02-05 23:55:50 - INFO - stdout - {'loss': 0.7249, 'grad_norm': 1.192550539970398, 'learning_rate': 5.352374938256289e-06, 'epoch': 1.99} +2025-02-05 23:55:50 - ERROR - stderr - 66%|██████▋ | 14902/22434 [13:48:10<5:11:36, 2.48s/it] +2025-02-05 23:55:53 - ERROR - stderr - 66%|██████▋ | 14903/22434 [13:48:13<5:11:25, 2.48s/it] +2025-02-05 23:55:53 - ERROR - stderr - +2025-02-05 23:55:53 - ERROR - stderr - +2025-02-05 23:55:53 - INFO - stdout - {'loss': 0.6759, 'grad_norm': 1.2485854625701904, 'learning_rate': 5.351096642357259e-06, 'epoch': 1.99} +2025-02-05 23:55:53 - ERROR - stderr - 66%|██████▋ | 14903/22434 [13:48:13<5:11:25, 2.48s/it] +2025-02-05 23:55:56 - ERROR - stderr - 66%|██████▋ | 14904/22434 [13:48:15<5:22:38, 2.57s/it] +2025-02-05 23:55:56 - ERROR - stderr - +2025-02-05 23:55:56 - ERROR - stderr - +2025-02-05 23:55:56 - INFO - stdout - {'loss': 0.7277, 'grad_norm': 1.384131908416748, 'learning_rate': 5.3498184433601695e-06, 'epoch': 1.99} +2025-02-05 23:55:56 - ERROR - stderr - 66%|██████▋ | 14904/22434 [13:48:15<5:22:38, 2.57s/it] +2025-02-05 23:55:58 - ERROR - stderr - 66%|██████▋ | 14905/22434 [13:48:18<5:18:11, 2.54s/it] +2025-02-05 23:55:58 - ERROR - stderr - +2025-02-05 23:55:58 - ERROR - stderr - +2025-02-05 23:55:58 - INFO - stdout - {'loss': 0.7991, 'grad_norm': 1.3492255210876465, 'learning_rate': 5.348540341291666e-06, 'epoch': 1.99} +2025-02-05 23:55:58 - ERROR - stderr - 66%|██████▋ | 14905/22434 [13:48:18<5:18:11, 2.54s/it] +2025-02-05 23:56:01 - ERROR - stderr - 66%|██████▋ | 14906/22434 [13:48:20<5:20:39, 2.56s/it] +2025-02-05 23:56:01 - ERROR - stderr - +2025-02-05 23:56:01 - ERROR - stderr - +2025-02-05 23:56:01 - INFO - stdout - {'loss': 0.6984, 'grad_norm': 1.3712635040283203, 'learning_rate': 5.3472623361783896e-06, 'epoch': 1.99} +2025-02-05 
23:56:01 - ERROR - stderr - 66%|██████▋ | 14906/22434 [13:48:21<5:20:39, 2.56s/it] +2025-02-05 23:56:03 - ERROR - stderr - 66%|██████▋ | 14907/22434 [13:48:23<5:19:23, 2.55s/it] +2025-02-05 23:56:03 - ERROR - stderr - +2025-02-05 23:56:03 - ERROR - stderr - +2025-02-05 23:56:03 - INFO - stdout - {'loss': 0.6803, 'grad_norm': 1.2760930061340332, 'learning_rate': 5.345984428046976e-06, 'epoch': 1.99} +2025-02-05 23:56:03 - ERROR - stderr - 66%|██████▋ | 14907/22434 [13:48:23<5:19:23, 2.55s/it] +2025-02-05 23:56:06 - ERROR - stderr - 66%|██████▋ | 14908/22434 [13:48:25<5:15:27, 2.51s/it] +2025-02-05 23:56:06 - ERROR - stderr - +2025-02-05 23:56:06 - ERROR - stderr - +2025-02-05 23:56:06 - INFO - stdout - {'loss': 0.6497, 'grad_norm': 1.2948359251022339, 'learning_rate': 5.344706616924062e-06, 'epoch': 1.99} +2025-02-05 23:56:06 - ERROR - stderr - 66%|██████▋ | 14908/22434 [13:48:26<5:15:27, 2.51s/it] +2025-02-05 23:56:08 - ERROR - stderr - 66%|██████▋ | 14909/22434 [13:48:28<5:12:52, 2.49s/it] +2025-02-05 23:56:08 - ERROR - stderr - +2025-02-05 23:56:08 - ERROR - stderr - +2025-02-05 23:56:08 - INFO - stdout - {'loss': 0.618, 'grad_norm': 1.233685851097107, 'learning_rate': 5.343428902836287e-06, 'epoch': 1.99} +2025-02-05 23:56:08 - ERROR - stderr - 66%|██████▋ | 14909/22434 [13:48:28<5:12:52, 2.49s/it] +2025-02-05 23:56:11 - ERROR - stderr - 66%|██████▋ | 14910/22434 [13:48:30<5:15:32, 2.52s/it] +2025-02-05 23:56:11 - ERROR - stderr - +2025-02-05 23:56:11 - ERROR - stderr - +2025-02-05 23:56:11 - INFO - stdout - {'loss': 0.6374, 'grad_norm': 1.3347445726394653, 'learning_rate': 5.342151285810283e-06, 'epoch': 1.99} +2025-02-05 23:56:11 - ERROR - stderr - 66%|██████▋ | 14910/22434 [13:48:31<5:15:32, 2.52s/it] +2025-02-05 23:56:13 - ERROR - stderr - 66%|██████▋ | 14911/22434 [13:48:33<5:14:54, 2.51s/it] +2025-02-05 23:56:13 - ERROR - stderr - +2025-02-05 23:56:13 - ERROR - stderr - +2025-02-05 23:56:13 - INFO - stdout - {'loss': 0.743, 'grad_norm': 1.4459260702133179, 
'learning_rate': 5.340873765872671e-06, 'epoch': 1.99} +2025-02-05 23:56:13 - ERROR - stderr - 66%|██████▋ | 14911/22434 [13:48:33<5:14:54, 2.51s/it] +2025-02-05 23:56:16 - ERROR - stderr - 66%|██████▋ | 14912/22434 [13:48:35<5:14:22, 2.51s/it] +2025-02-05 23:56:16 - ERROR - stderr - +2025-02-05 23:56:16 - ERROR - stderr - +2025-02-05 23:56:16 - INFO - stdout - {'loss': 0.7001, 'grad_norm': 1.2689650058746338, 'learning_rate': 5.339596343050091e-06, 'epoch': 1.99} +2025-02-05 23:56:16 - ERROR - stderr - 66%|██████▋ | 14912/22434 [13:48:36<5:14:22, 2.51s/it] +2025-02-05 23:56:18 - ERROR - stderr - 66%|██████▋ | 14913/22434 [13:48:38<5:15:35, 2.52s/it] +2025-02-05 23:56:18 - ERROR - stderr - +2025-02-05 23:56:18 - ERROR - stderr - +2025-02-05 23:56:18 - INFO - stdout - {'loss': 0.7303, 'grad_norm': 1.2226638793945312, 'learning_rate': 5.338319017369165e-06, 'epoch': 1.99} +2025-02-05 23:56:18 - ERROR - stderr - 66%|██████▋ | 14913/22434 [13:48:38<5:15:35, 2.52s/it] +2025-02-05 23:56:21 - ERROR - stderr - 66%|██████▋ | 14914/22434 [13:48:41<5:15:18, 2.52s/it] +2025-02-05 23:56:21 - ERROR - stderr - +2025-02-05 23:56:21 - ERROR - stderr - +2025-02-05 23:56:21 - INFO - stdout - {'loss': 0.6491, 'grad_norm': 1.3546885251998901, 'learning_rate': 5.337041788856518e-06, 'epoch': 1.99} +2025-02-05 23:56:21 - ERROR - stderr - 66%|██████▋ | 14914/22434 [13:48:41<5:15:18, 2.52s/it] +2025-02-05 23:56:23 - ERROR - stderr - 66%|██████▋ | 14915/22434 [13:48:43<5:13:10, 2.50s/it] +2025-02-05 23:56:23 - ERROR - stderr - +2025-02-05 23:56:23 - ERROR - stderr - +2025-02-05 23:56:23 - INFO - stdout - {'loss': 0.6996, 'grad_norm': 1.4291191101074219, 'learning_rate': 5.335764657538779e-06, 'epoch': 1.99} +2025-02-05 23:56:23 - ERROR - stderr - 66%|██████▋ | 14915/22434 [13:48:43<5:13:10, 2.50s/it] +2025-02-05 23:56:26 - ERROR - stderr - 66%|██████▋ | 14916/22434 [13:48:45<5:12:44, 2.50s/it] +2025-02-05 23:56:26 - ERROR - stderr - +2025-02-05 23:56:26 - ERROR - stderr - +2025-02-05 
23:56:26 - INFO - stdout - {'loss': 0.6476, 'grad_norm': 1.2857189178466797, 'learning_rate': 5.3344876234425536e-06, 'epoch': 1.99} +2025-02-05 23:56:26 - ERROR - stderr - 66%|██████▋ | 14916/22434 [13:48:46<5:12:44, 2.50s/it] +2025-02-05 23:56:29 - ERROR - stderr - 66%|██████▋ | 14917/22434 [13:48:48<5:24:25, 2.59s/it] +2025-02-05 23:56:29 - ERROR - stderr - +2025-02-05 23:56:29 - ERROR - stderr - +2025-02-05 23:56:29 - INFO - stdout - {'loss': 0.7331, 'grad_norm': 1.2670665979385376, 'learning_rate': 5.3332106865944766e-06, 'epoch': 1.99} +2025-02-05 23:56:29 - ERROR - stderr - 66%|██████▋ | 14917/22434 [13:48:48<5:24:25, 2.59s/it] +2025-02-05 23:56:31 - ERROR - stderr - 66%|██████▋ | 14918/22434 [13:48:51<5:21:03, 2.56s/it] +2025-02-05 23:56:31 - ERROR - stderr - +2025-02-05 23:56:31 - ERROR - stderr - +2025-02-05 23:56:31 - INFO - stdout - {'loss': 0.7309, 'grad_norm': 1.2989870309829712, 'learning_rate': 5.331933847021153e-06, 'epoch': 1.99} +2025-02-05 23:56:31 - ERROR - stderr - 66%|██████▋ | 14918/22434 [13:48:51<5:21:03, 2.56s/it] +2025-02-05 23:56:33 - ERROR - stderr - 67%|██████▋ | 14919/22434 [13:48:53<5:16:25, 2.53s/it] +2025-02-05 23:56:33 - ERROR - stderr - +2025-02-05 23:56:33 - ERROR - stderr - +2025-02-05 23:56:33 - INFO - stdout - {'loss': 0.667, 'grad_norm': 1.2674791812896729, 'learning_rate': 5.330657104749203e-06, 'epoch': 2.0} +2025-02-05 23:56:33 - ERROR - stderr - 67%|██████▋ | 14919/22434 [13:48:53<5:16:25, 2.53s/it] +2025-02-05 23:56:36 - ERROR - stderr - 67%|██████▋ | 14920/22434 [13:48:56<5:12:25, 2.49s/it] +2025-02-05 23:56:36 - ERROR - stderr - +2025-02-05 23:56:36 - ERROR - stderr - +2025-02-05 23:56:36 - INFO - stdout - {'loss': 0.5557, 'grad_norm': 1.1993584632873535, 'learning_rate': 5.329380459805237e-06, 'epoch': 2.0} +2025-02-05 23:56:36 - ERROR - stderr - 67%|██████▋ | 14920/22434 [13:48:56<5:12:25, 2.49s/it] +2025-02-05 23:56:38 - ERROR - stderr - 67%|██████▋ | 14921/22434 [13:48:58<5:13:33, 2.50s/it] +2025-02-05 23:56:38 - 
ERROR - stderr - +2025-02-05 23:56:38 - ERROR - stderr - +2025-02-05 23:56:38 - INFO - stdout - {'loss': 0.6876, 'grad_norm': 1.2351102828979492, 'learning_rate': 5.328103912215861e-06, 'epoch': 2.0} +2025-02-05 23:56:38 - ERROR - stderr - 67%|██████▋ | 14921/22434 [13:48:58<5:13:33, 2.50s/it] +2025-02-05 23:56:41 - ERROR - stderr - 67%|██████▋ | 14922/22434 [13:49:01<5:14:46, 2.51s/it] +2025-02-05 23:56:41 - ERROR - stderr - +2025-02-05 23:56:41 - ERROR - stderr - +2025-02-05 23:56:41 - INFO - stdout - {'loss': 0.6627, 'grad_norm': 1.303617000579834, 'learning_rate': 5.326827462007697e-06, 'epoch': 2.0} +2025-02-05 23:56:41 - ERROR - stderr - 67%|██████▋ | 14922/22434 [13:49:01<5:14:46, 2.51s/it] +2025-02-05 23:56:43 - ERROR - stderr - 67%|██████▋ | 14923/22434 [13:49:03<5:13:41, 2.51s/it] +2025-02-05 23:56:43 - ERROR - stderr - +2025-02-05 23:56:43 - ERROR - stderr - +2025-02-05 23:56:43 - INFO - stdout - {'loss': 0.6713, 'grad_norm': 1.3228851556777954, 'learning_rate': 5.32555110920734e-06, 'epoch': 2.0} +2025-02-05 23:56:43 - ERROR - stderr - 67%|██████▋ | 14923/22434 [13:49:03<5:13:41, 2.51s/it] +2025-02-05 23:56:46 - ERROR - stderr - 67%|██████▋ | 14924/22434 [13:49:06<5:20:32, 2.56s/it] +2025-02-05 23:56:46 - ERROR - stderr - +2025-02-05 23:56:46 - ERROR - stderr - +2025-02-05 23:56:46 - INFO - stdout - {'loss': 0.6675, 'grad_norm': 1.2968569993972778, 'learning_rate': 5.324274853841396e-06, 'epoch': 2.0} +2025-02-05 23:56:46 - ERROR - stderr - 67%|██████▋ | 14924/22434 [13:49:06<5:20:32, 2.56s/it] +2025-02-05 23:56:49 - ERROR - stderr - 67%|██████▋ | 14925/22434 [13:49:08<5:19:11, 2.55s/it] +2025-02-05 23:56:49 - ERROR - stderr - +2025-02-05 23:56:49 - ERROR - stderr - +2025-02-05 23:56:49 - INFO - stdout - {'loss': 0.8057, 'grad_norm': 1.4967141151428223, 'learning_rate': 5.3229986959364675e-06, 'epoch': 2.0} +2025-02-05 23:56:49 - ERROR - stderr - 67%|██████▋ | 14925/22434 [13:49:08<5:19:11, 2.55s/it] +2025-02-05 23:56:51 - ERROR - stderr - 67%|██████▋ | 
14926/22434 [13:49:11<5:19:21, 2.55s/it] +2025-02-05 23:56:51 - ERROR - stderr - +2025-02-05 23:56:51 - ERROR - stderr - +2025-02-05 23:56:51 - INFO - stdout - {'loss': 0.681, 'grad_norm': 1.2397456169128418, 'learning_rate': 5.321722635519158e-06, 'epoch': 2.0} +2025-02-05 23:56:51 - ERROR - stderr - 67%|██████▋ | 14926/22434 [13:49:11<5:19:21, 2.55s/it] +2025-02-05 23:56:54 - ERROR - stderr - 67%|██████▋ | 14927/22434 [13:49:13<5:18:19, 2.54s/it] +2025-02-05 23:56:54 - ERROR - stderr - +2025-02-05 23:56:54 - ERROR - stderr - +2025-02-05 23:56:54 - INFO - stdout - {'loss': 0.6317, 'grad_norm': 1.261259913444519, 'learning_rate': 5.320446672616062e-06, 'epoch': 2.0} +2025-02-05 23:56:54 - ERROR - stderr - 67%|██████▋ | 14927/22434 [13:49:14<5:18:19, 2.54s/it] +2025-02-05 23:56:56 - ERROR - stderr - 67%|██████▋ | 14928/22434 [13:49:16<5:17:04, 2.53s/it] +2025-02-05 23:56:56 - ERROR - stderr - +2025-02-05 23:56:56 - ERROR - stderr - +2025-02-05 23:56:56 - INFO - stdout - {'loss': 0.7377, 'grad_norm': 1.3671162128448486, 'learning_rate': 5.319170807253777e-06, 'epoch': 2.0} +2025-02-05 23:56:56 - ERROR - stderr - 67%|██████▋ | 14928/22434 [13:49:16<5:17:04, 2.53s/it] +2025-02-05 23:56:59 - ERROR - stderr - 67%|██████▋ | 14929/22434 [13:49:18<5:14:16, 2.51s/it] +2025-02-05 23:56:59 - ERROR - stderr - +2025-02-05 23:56:59 - ERROR - stderr - +2025-02-05 23:56:59 - INFO - stdout - {'loss': 0.6293, 'grad_norm': 1.292900562286377, 'learning_rate': 5.317895039458899e-06, 'epoch': 2.0} +2025-02-05 23:56:59 - ERROR - stderr - 67%|██████▋ | 14929/22434 [13:49:19<5:14:16, 2.51s/it] +2025-02-05 23:57:01 - ERROR - stderr - 67%|██████▋ | 14930/22434 [13:49:21<5:13:39, 2.51s/it] +2025-02-05 23:57:01 - ERROR - stderr - +2025-02-05 23:57:01 - ERROR - stderr - +2025-02-05 23:57:01 - INFO - stdout - {'loss': 0.6445, 'grad_norm': 1.154637098312378, 'learning_rate': 5.316619369258018e-06, 'epoch': 2.0} +2025-02-05 23:57:01 - ERROR - stderr - 67%|██████▋ | 14930/22434 [13:49:21<5:13:39, 
2.51s/it] +2025-02-05 23:57:04 - ERROR - stderr - 67%|██████▋ | 14931/22434 [13:49:23<5:12:02, 2.50s/it] +2025-02-05 23:57:04 - ERROR - stderr - +2025-02-05 23:57:04 - ERROR - stderr - +2025-02-05 23:57:04 - INFO - stdout - {'loss': 0.6586, 'grad_norm': 1.3124198913574219, 'learning_rate': 5.315343796677724e-06, 'epoch': 2.0} +2025-02-05 23:57:04 - ERROR - stderr - 67%|██████▋ | 14931/22434 [13:49:23<5:12:02, 2.50s/it] +2025-02-05 23:57:06 - ERROR - stderr - 67%|██████▋ | 14932/22434 [13:49:26<5:09:31, 2.48s/it] +2025-02-05 23:57:06 - ERROR - stderr - +2025-02-05 23:57:06 - ERROR - stderr - +2025-02-05 23:57:06 - INFO - stdout - {'loss': 0.7663, 'grad_norm': 1.4545519351959229, 'learning_rate': 5.314068321744607e-06, 'epoch': 2.0} +2025-02-05 23:57:06 - ERROR - stderr - 67%|██████▋ | 14932/22434 [13:49:26<5:09:31, 2.48s/it] +2025-02-05 23:57:09 - ERROR - stderr - 67%|██████▋ | 14933/22434 [13:49:29<5:17:37, 2.54s/it] +2025-02-05 23:57:09 - ERROR - stderr - +2025-02-05 23:57:09 - ERROR - stderr - +2025-02-05 23:57:09 - INFO - stdout - {'loss': 0.7021, 'grad_norm': 1.226535439491272, 'learning_rate': 5.312792944485251e-06, 'epoch': 2.0} +2025-02-05 23:57:09 - ERROR - stderr - 67%|██████▋ | 14933/22434 [13:49:29<5:17:37, 2.54s/it] +2025-02-05 23:57:11 - ERROR - stderr - 67%|██████▋ | 14934/22434 [13:49:31<5:17:04, 2.54s/it] +2025-02-05 23:57:11 - ERROR - stderr - +2025-02-05 23:57:11 - ERROR - stderr - +2025-02-05 23:57:11 - INFO - stdout - {'loss': 0.6182, 'grad_norm': 1.2827050685882568, 'learning_rate': 5.3115176649262445e-06, 'epoch': 2.0} +2025-02-05 23:57:11 - ERROR - stderr - 67%|██████▋ | 14934/22434 [13:49:31<5:17:04, 2.54s/it] +2025-02-05 23:57:14 - ERROR - stderr - 67%|██████▋ | 14935/22434 [13:49:33<5:12:50, 2.50s/it] +2025-02-05 23:57:14 - ERROR - stderr - +2025-02-05 23:57:14 - ERROR - stderr - +2025-02-05 23:57:14 - INFO - stdout - {'loss': 0.6808, 'grad_norm': 1.33535897731781, 'learning_rate': 5.310242483094159e-06, 'epoch': 2.0} +2025-02-05 23:57:14 
- ERROR - stderr - 67%|██████▋ | 14935/22434 [13:49:34<5:12:50, 2.50s/it] +2025-02-05 23:57:16 - ERROR - stderr - 67%|██████▋ | 14936/22434 [13:49:36<5:15:06, 2.52s/it] +2025-02-05 23:57:16 - ERROR - stderr - +2025-02-05 23:57:16 - ERROR - stderr - +2025-02-05 23:57:16 - INFO - stdout - {'loss': 0.6582, 'grad_norm': 1.3951321840286255, 'learning_rate': 5.308967399015589e-06, 'epoch': 2.0} +2025-02-05 23:57:16 - ERROR - stderr - 67%|██████▋ | 14936/22434 [13:49:36<5:15:06, 2.52s/it] +2025-02-05 23:57:19 - ERROR - stderr - 67%|██████▋ | 14937/22434 [13:49:39<5:14:57, 2.52s/it] +2025-02-05 23:57:19 - ERROR - stderr - +2025-02-05 23:57:19 - ERROR - stderr - +2025-02-05 23:57:19 - INFO - stdout - {'loss': 0.6532, 'grad_norm': 1.1756454706192017, 'learning_rate': 5.3076924127170956e-06, 'epoch': 2.0} +2025-02-05 23:57:19 - ERROR - stderr - 67%|██████▋ | 14937/22434 [13:49:39<5:14:57, 2.52s/it] +2025-02-05 23:57:21 - ERROR - stderr - 67%|██████▋ | 14938/22434 [13:49:41<5:19:18, 2.56s/it] +2025-02-05 23:57:21 - ERROR - stderr - +2025-02-05 23:57:21 - ERROR - stderr - +2025-02-05 23:57:21 - INFO - stdout - {'loss': 0.6959, 'grad_norm': 1.3676700592041016, 'learning_rate': 5.3064175242252694e-06, 'epoch': 2.0} +2025-02-05 23:57:21 - ERROR - stderr - 67%|██████▋ | 14938/22434 [13:49:41<5:19:18, 2.56s/it] +2025-02-05 23:57:24 - ERROR - stderr - 67%|██████▋ | 14939/22434 [13:49:44<5:17:13, 2.54s/it] +2025-02-05 23:57:24 - ERROR - stderr - +2025-02-05 23:57:24 - ERROR - stderr - +2025-02-05 23:57:24 - INFO - stdout - {'loss': 0.6707, 'grad_norm': 1.1414225101470947, 'learning_rate': 5.305142733566681e-06, 'epoch': 2.0} +2025-02-05 23:57:24 - ERROR - stderr - 67%|██████▋ | 14939/22434 [13:49:44<5:17:13, 2.54s/it] +2025-02-05 23:57:26 - ERROR - stderr - 67%|██████▋ | 14940/22434 [13:49:46<5:15:37, 2.53s/it] +2025-02-05 23:57:26 - ERROR - stderr - +2025-02-05 23:57:26 - ERROR - stderr - +2025-02-05 23:57:26 - INFO - stdout - {'loss': 0.7027, 'grad_norm': 1.3546677827835083, 
'learning_rate': 5.303868040767894e-06, 'epoch': 2.0} +2025-02-05 23:57:26 - ERROR - stderr - 67%|██████▋ | 14940/22434 [13:49:46<5:15:37, 2.53s/it] +2025-02-05 23:57:29 - ERROR - stderr - 67%|██████▋ | 14941/22434 [13:49:49<5:13:51, 2.51s/it] +2025-02-05 23:57:29 - ERROR - stderr - +2025-02-05 23:57:29 - ERROR - stderr - +2025-02-05 23:57:29 - INFO - stdout - {'loss': 0.6753, 'grad_norm': 1.2713035345077515, 'learning_rate': 5.30259344585549e-06, 'epoch': 2.0} +2025-02-05 23:57:29 - ERROR - stderr - 67%|██████▋ | 14941/22434 [13:49:49<5:13:51, 2.51s/it] +2025-02-05 23:57:31 - ERROR - stderr - 67%|██████▋ | 14942/22434 [13:49:51<5:15:54, 2.53s/it] +2025-02-05 23:57:32 - ERROR - stderr - +2025-02-05 23:57:32 - ERROR - stderr - +2025-02-05 23:57:32 - INFO - stdout - {'loss': 0.647, 'grad_norm': 1.209706425666809, 'learning_rate': 5.301318948856029e-06, 'epoch': 2.0} +2025-02-05 23:57:32 - ERROR - stderr - 67%|██████▋ | 14942/22434 [13:49:51<5:15:54, 2.53s/it] +2025-02-05 23:57:34 - ERROR - stderr - 67%|██████▋ | 14943/22434 [13:49:54<5:15:39, 2.53s/it] +2025-02-05 23:57:34 - ERROR - stderr - +2025-02-05 23:57:34 - ERROR - stderr - +2025-02-05 23:57:34 - INFO - stdout - {'loss': 0.7555, 'grad_norm': 1.2872382402420044, 'learning_rate': 5.300044549796076e-06, 'epoch': 2.0} +2025-02-05 23:57:34 - ERROR - stderr - 67%|██████▋ | 14943/22434 [13:49:54<5:15:39, 2.53s/it] +2025-02-05 23:57:37 - ERROR - stderr - 67%|██████▋ | 14944/22434 [13:49:56<5:15:53, 2.53s/it] +2025-02-05 23:57:37 - ERROR - stderr - +2025-02-05 23:57:37 - ERROR - stderr - +2025-02-05 23:57:37 - INFO - stdout - {'loss': 0.6505, 'grad_norm': 1.3546504974365234, 'learning_rate': 5.298770248702198e-06, 'epoch': 2.0} +2025-02-05 23:57:37 - ERROR - stderr - 67%|██████▋ | 14944/22434 [13:49:56<5:15:53, 2.53s/it] +2025-02-05 23:57:39 - ERROR - stderr - 67%|██████▋ | 14945/22434 [13:49:59<5:14:15, 2.52s/it] +2025-02-05 23:57:39 - ERROR - stderr - +2025-02-05 23:57:39 - ERROR - stderr - +2025-02-05 23:57:39 - 
INFO - stdout - {'loss': 0.6236, 'grad_norm': 1.2530288696289062, 'learning_rate': 5.297496045600956e-06, 'epoch': 2.0} +2025-02-05 23:57:39 - ERROR - stderr - 67%|██████▋ | 14945/22434 [13:49:59<5:14:15, 2.52s/it] +2025-02-05 23:57:42 - ERROR - stderr - 67%|██████▋ | 14946/22434 [13:50:01<5:15:04, 2.52s/it] +2025-02-05 23:57:42 - ERROR - stderr - +2025-02-05 23:57:42 - ERROR - stderr - +2025-02-05 23:57:42 - INFO - stdout - {'loss': 0.7452, 'grad_norm': 1.3733775615692139, 'learning_rate': 5.296221940518908e-06, 'epoch': 2.0} +2025-02-05 23:57:42 - ERROR - stderr - 67%|██████▋ | 14946/22434 [13:50:01<5:15:04, 2.52s/it] +2025-02-05 23:57:44 - ERROR - stderr - 67%|██████▋ | 14947/22434 [13:50:04<5:16:15, 2.53s/it] +2025-02-05 23:57:44 - ERROR - stderr - +2025-02-05 23:57:44 - ERROR - stderr - +2025-02-05 23:57:44 - INFO - stdout - {'loss': 0.7002, 'grad_norm': 1.3594934940338135, 'learning_rate': 5.294947933482612e-06, 'epoch': 2.0} +2025-02-05 23:57:44 - ERROR - stderr - 67%|██████▋ | 14947/22434 [13:50:04<5:16:15, 2.53s/it] +2025-02-05 23:57:47 - ERROR - stderr - 67%|██████▋ | 14948/22434 [13:50:06<5:17:04, 2.54s/it] +2025-02-05 23:57:47 - ERROR - stderr - +2025-02-05 23:57:47 - ERROR - stderr - +2025-02-05 23:57:47 - INFO - stdout - {'loss': 0.6553, 'grad_norm': 1.2966232299804688, 'learning_rate': 5.293674024518627e-06, 'epoch': 2.0} +2025-02-05 23:57:47 - ERROR - stderr - 67%|██████▋ | 14948/22434 [13:50:07<5:17:04, 2.54s/it] +2025-02-05 23:57:49 - ERROR - stderr - 67%|██████▋ | 14949/22434 [13:50:09<5:16:39, 2.54s/it] +2025-02-05 23:57:49 - ERROR - stderr - +2025-02-05 23:57:49 - ERROR - stderr - +2025-02-05 23:57:49 - INFO - stdout - {'loss': 0.6879, 'grad_norm': 1.2556822299957275, 'learning_rate': 5.292400213653501e-06, 'epoch': 2.0} +2025-02-05 23:57:49 - ERROR - stderr - 67%|██████▋ | 14949/22434 [13:50:09<5:16:39, 2.54s/it] +2025-02-05 23:57:52 - ERROR - stderr - 67%|██████▋ | 14950/22434 [13:50:12<5:16:27, 2.54s/it] +2025-02-05 23:57:52 - ERROR - stderr 
- +2025-02-05 23:57:52 - ERROR - stderr - +2025-02-05 23:57:52 - INFO - stdout - {'loss': 0.7289, 'grad_norm': 1.1856272220611572, 'learning_rate': 5.291126500913788e-06, 'epoch': 2.0} +2025-02-05 23:57:52 - ERROR - stderr - 67%|██████▋ | 14950/22434 [13:50:12<5:16:27, 2.54s/it] +2025-02-05 23:57:54 - ERROR - stderr - 67%|██████▋ | 14951/22434 [13:50:14<5:17:26, 2.55s/it] +2025-02-05 23:57:54 - ERROR - stderr - +2025-02-05 23:57:54 - ERROR - stderr - +2025-02-05 23:57:54 - INFO - stdout - {'loss': 0.6823, 'grad_norm': 1.301573634147644, 'learning_rate': 5.289852886326039e-06, 'epoch': 2.0} +2025-02-05 23:57:54 - ERROR - stderr - 67%|██████▋ | 14951/22434 [13:50:14<5:17:26, 2.55s/it] +2025-02-05 23:57:57 - ERROR - stderr - 67%|██████▋ | 14952/22434 [13:50:17<5:16:31, 2.54s/it] +2025-02-05 23:57:57 - ERROR - stderr - +2025-02-05 23:57:57 - ERROR - stderr - +2025-02-05 23:57:57 - INFO - stdout - {'loss': 0.6418, 'grad_norm': 1.2661771774291992, 'learning_rate': 5.288579369916798e-06, 'epoch': 2.0} +2025-02-05 23:57:57 - ERROR - stderr - 67%|██████▋ | 14952/22434 [13:50:17<5:16:31, 2.54s/it] +2025-02-05 23:57:59 - ERROR - stderr - 67%|██████▋ | 14953/22434 [13:50:19<5:17:50, 2.55s/it] +2025-02-05 23:57:59 - ERROR - stderr - +2025-02-05 23:57:59 - ERROR - stderr - +2025-02-05 23:57:59 - INFO - stdout - {'loss': 0.5722, 'grad_norm': 1.390884280204773, 'learning_rate': 5.287305951712612e-06, 'epoch': 2.0} +2025-02-05 23:57:59 - ERROR - stderr - 67%|██████▋ | 14953/22434 [13:50:19<5:17:50, 2.55s/it] +2025-02-05 23:58:02 - ERROR - stderr - 67%|██████▋ | 14954/22434 [13:50:22<5:20:18, 2.57s/it] +2025-02-05 23:58:02 - ERROR - stderr - +2025-02-05 23:58:02 - ERROR - stderr - +2025-02-05 23:58:02 - INFO - stdout - {'loss': 0.6523, 'grad_norm': 1.2112319469451904, 'learning_rate': 5.286032631740023e-06, 'epoch': 2.0} +2025-02-05 23:58:02 - ERROR - stderr - 67%|██████▋ | 14954/22434 [13:50:22<5:20:18, 2.57s/it] +2025-02-05 23:58:05 - ERROR - stderr - 67%|██████▋ | 14955/22434 
[13:50:24<5:21:35, 2.58s/it] +2025-02-05 23:58:05 - ERROR - stderr - +2025-02-05 23:58:05 - ERROR - stderr - +2025-02-05 23:58:05 - INFO - stdout - {'loss': 0.677, 'grad_norm': 1.3114471435546875, 'learning_rate': 5.284759410025578e-06, 'epoch': 2.0} +2025-02-05 23:58:05 - ERROR - stderr - 67%|██████▋ | 14955/22434 [13:50:24<5:21:35, 2.58s/it] +2025-02-05 23:58:07 - ERROR - stderr - 67%|██████▋ | 14956/22434 [13:50:27<5:15:15, 2.53s/it] +2025-02-05 23:58:07 - ERROR - stderr - +2025-02-05 23:58:07 - ERROR - stderr - +2025-02-05 23:58:07 - INFO - stdout - {'loss': 0.4815, 'grad_norm': 1.050843358039856, 'learning_rate': 5.283486286595804e-06, 'epoch': 2.0} +2025-02-05 23:58:07 - ERROR - stderr - 67%|██████▋ | 14956/22434 [13:50:27<5:15:15, 2.53s/it] +2025-02-05 23:58:10 - ERROR - stderr - 67%|██████▋ | 14957/22434 [13:50:30<5:20:38, 2.57s/it] +2025-02-05 23:58:10 - ERROR - stderr - +2025-02-05 23:58:10 - ERROR - stderr - +2025-02-05 23:58:10 - INFO - stdout - {'loss': 0.455, 'grad_norm': 1.055700421333313, 'learning_rate': 5.282213261477247e-06, 'epoch': 2.0} +2025-02-05 23:58:10 - ERROR - stderr - 67%|██████▋ | 14957/22434 [13:50:30<5:20:38, 2.57s/it] +2025-02-05 23:58:12 - ERROR - stderr - 67%|██████▋ | 14958/22434 [13:50:32<5:18:40, 2.56s/it] +2025-02-05 23:58:12 - ERROR - stderr - +2025-02-05 23:58:12 - ERROR - stderr - +2025-02-05 23:58:12 - INFO - stdout - {'loss': 0.4176, 'grad_norm': 0.974917471408844, 'learning_rate': 5.280940334696442e-06, 'epoch': 2.0} +2025-02-05 23:58:12 - ERROR - stderr - 67%|██████▋ | 14958/22434 [13:50:32<5:18:40, 2.56s/it] +2025-02-05 23:58:15 - ERROR - stderr - 67%|██████▋ | 14959/22434 [13:50:34<5:15:11, 2.53s/it] +2025-02-05 23:58:15 - ERROR - stderr - +2025-02-05 23:58:15 - ERROR - stderr - +2025-02-05 23:58:15 - INFO - stdout - {'loss': 0.4543, 'grad_norm': 1.0393953323364258, 'learning_rate': 5.27966750627992e-06, 'epoch': 2.0} +2025-02-05 23:58:15 - ERROR - stderr - 67%|██████▋ | 14959/22434 [13:50:35<5:15:11, 2.53s/it] 
+2025-02-05 23:58:17 - ERROR - stderr - 67%|██████▋ | 14960/22434 [13:50:37<5:12:26, 2.51s/it] +2025-02-05 23:58:17 - ERROR - stderr - +2025-02-05 23:58:17 - ERROR - stderr - +2025-02-05 23:58:17 - INFO - stdout - {'loss': 0.456, 'grad_norm': 1.2081223726272583, 'learning_rate': 5.278394776254214e-06, 'epoch': 2.0} +2025-02-05 23:58:17 - ERROR - stderr - 67%|██████▋ | 14960/22434 [13:50:37<5:12:26, 2.51s/it] +2025-02-05 23:58:20 - ERROR - stderr - 67%|██████▋ | 14961/22434 [13:50:40<5:24:58, 2.61s/it] +2025-02-05 23:58:20 - ERROR - stderr - +2025-02-05 23:58:20 - ERROR - stderr - +2025-02-05 23:58:20 - INFO - stdout - {'loss': 0.4558, 'grad_norm': 1.1563613414764404, 'learning_rate': 5.2771221446458445e-06, 'epoch': 2.0} +2025-02-05 23:58:20 - ERROR - stderr - 67%|██████▋ | 14961/22434 [13:50:40<5:24:58, 2.61s/it] +2025-02-05 23:58:23 - ERROR - stderr - 67%|██████▋ | 14962/22434 [13:50:42<5:22:50, 2.59s/it] +2025-02-05 23:58:23 - ERROR - stderr - +2025-02-05 23:58:23 - ERROR - stderr - +2025-02-05 23:58:23 - INFO - stdout - {'loss': 0.4938, 'grad_norm': 1.101528286933899, 'learning_rate': 5.275849611481352e-06, 'epoch': 2.0} +2025-02-05 23:58:23 - ERROR - stderr - 67%|██████▋ | 14962/22434 [13:50:42<5:22:50, 2.59s/it] +2025-02-05 23:58:25 - ERROR - stderr - 67%|██████▋ | 14963/22434 [13:50:45<5:20:29, 2.57s/it] +2025-02-05 23:58:25 - ERROR - stderr - +2025-02-05 23:58:25 - ERROR - stderr - +2025-02-05 23:58:25 - INFO - stdout - {'loss': 0.4511, 'grad_norm': 1.0396020412445068, 'learning_rate': 5.27457717678725e-06, 'epoch': 2.0} +2025-02-05 23:58:25 - ERROR - stderr - 67%|██████▋ | 14963/22434 [13:50:45<5:20:29, 2.57s/it] +2025-02-05 23:58:28 - ERROR - stderr - 67%|██████▋ | 14964/22434 [13:50:47<5:17:14, 2.55s/it] +2025-02-05 23:58:28 - ERROR - stderr - +2025-02-05 23:58:28 - ERROR - stderr - +2025-02-05 23:58:28 - INFO - stdout - {'loss': 0.455, 'grad_norm': 1.2446961402893066, 'learning_rate': 5.273304840590066e-06, 'epoch': 2.0} +2025-02-05 23:58:28 - ERROR - 
stderr - 67%|██████▋ | 14964/22434 [13:50:47<5:17:14, 2.55s/it] +2025-02-05 23:58:30 - ERROR - stderr - 67%|██████▋ | 14965/22434 [13:50:50<5:16:38, 2.54s/it] +2025-02-05 23:58:30 - ERROR - stderr - +2025-02-05 23:58:30 - ERROR - stderr - +2025-02-05 23:58:30 - INFO - stdout - {'loss': 0.4274, 'grad_norm': 1.0518479347229004, 'learning_rate': 5.272032602916317e-06, 'epoch': 2.0} +2025-02-05 23:58:30 - ERROR - stderr - 67%|██████▋ | 14965/22434 [13:50:50<5:16:38, 2.54s/it] +2025-02-05 23:58:33 - ERROR - stderr - 67%|██████▋ | 14966/22434 [13:50:52<5:15:33, 2.54s/it] +2025-02-05 23:58:33 - ERROR - stderr - +2025-02-05 23:58:33 - ERROR - stderr - +2025-02-05 23:58:33 - INFO - stdout - {'loss': 0.4736, 'grad_norm': 1.1066879034042358, 'learning_rate': 5.270760463792523e-06, 'epoch': 2.0} +2025-02-05 23:58:33 - ERROR - stderr - 67%|██████▋ | 14966/22434 [13:50:52<5:15:33, 2.54s/it] +2025-02-05 23:58:35 - ERROR - stderr - 67%|██████▋ | 14967/22434 [13:50:55<5:14:51, 2.53s/it] +2025-02-05 23:58:35 - ERROR - stderr - +2025-02-05 23:58:35 - ERROR - stderr - +2025-02-05 23:58:35 - INFO - stdout - {'loss': 0.4654, 'grad_norm': 1.1960071325302124, 'learning_rate': 5.2694884232452086e-06, 'epoch': 2.0} +2025-02-05 23:58:35 - ERROR - stderr - 67%|██████▋ | 14967/22434 [13:50:55<5:14:51, 2.53s/it] +2025-02-05 23:58:38 - ERROR - stderr - 67%|██████▋ | 14968/22434 [13:50:58<5:19:43, 2.57s/it] +2025-02-05 23:58:38 - ERROR - stderr - +2025-02-05 23:58:38 - ERROR - stderr - +2025-02-05 23:58:38 - INFO - stdout - {'loss': 0.423, 'grad_norm': 1.0354878902435303, 'learning_rate': 5.268216481300876e-06, 'epoch': 2.0} +2025-02-05 23:58:38 - ERROR - stderr - 67%|██████▋ | 14968/22434 [13:50:58<5:19:43, 2.57s/it] +2025-02-05 23:58:40 - ERROR - stderr - 67%|██████▋ | 14969/22434 [13:51:00<5:20:23, 2.58s/it] +2025-02-05 23:58:40 - ERROR - stderr - +2025-02-05 23:58:40 - ERROR - stderr - +2025-02-05 23:58:40 - INFO - stdout - {'loss': 0.4263, 'grad_norm': 1.1417587995529175, 'learning_rate': 
5.266944637986046e-06, 'epoch': 2.0} +2025-02-05 23:58:40 - ERROR - stderr - 67%|██████▋ | 14969/22434 [13:51:00<5:20:23, 2.58s/it] +2025-02-05 23:58:43 - ERROR - stderr - 67%|██████▋ | 14970/22434 [13:51:03<5:14:37, 2.53s/it] +2025-02-05 23:58:43 - ERROR - stderr - +2025-02-05 23:58:43 - ERROR - stderr - +2025-02-05 23:58:43 - INFO - stdout - {'loss': 0.4161, 'grad_norm': 1.0643304586410522, 'learning_rate': 5.265672893327224e-06, 'epoch': 2.0} +2025-02-05 23:58:43 - ERROR - stderr - 67%|██████▋ | 14970/22434 [13:51:03<5:14:37, 2.53s/it] +2025-02-05 23:58:45 - ERROR - stderr - 67%|██████▋ | 14971/22434 [13:51:05<5:12:49, 2.52s/it] +2025-02-05 23:58:45 - ERROR - stderr - +2025-02-05 23:58:45 - ERROR - stderr - +2025-02-05 23:58:45 - INFO - stdout - {'loss': 0.4201, 'grad_norm': 1.1227037906646729, 'learning_rate': 5.264401247350921e-06, 'epoch': 2.0} +2025-02-05 23:58:45 - ERROR - stderr - 67%|██████▋ | 14971/22434 [13:51:05<5:12:49, 2.52s/it] +2025-02-05 23:58:48 - ERROR - stderr - 67%|██████▋ | 14972/22434 [13:51:08<5:11:10, 2.50s/it] +2025-02-05 23:58:48 - ERROR - stderr - +2025-02-05 23:58:48 - ERROR - stderr - +2025-02-05 23:58:48 - INFO - stdout - {'loss': 0.4266, 'grad_norm': 1.253049373626709, 'learning_rate': 5.263129700083642e-06, 'epoch': 2.0} +2025-02-05 23:58:48 - ERROR - stderr - 67%|██████▋ | 14972/22434 [13:51:08<5:11:10, 2.50s/it] +2025-02-05 23:58:50 - ERROR - stderr - 67%|██████▋ | 14973/22434 [13:51:10<5:11:47, 2.51s/it] +2025-02-05 23:58:50 - ERROR - stderr - +2025-02-05 23:58:50 - ERROR - stderr - +2025-02-05 23:58:50 - INFO - stdout - {'loss': 0.4141, 'grad_norm': 1.1163498163223267, 'learning_rate': 5.261858251551893e-06, 'epoch': 2.0} +2025-02-05 23:58:50 - ERROR - stderr - 67%|██████▋ | 14973/22434 [13:51:10<5:11:47, 2.51s/it] +2025-02-05 23:58:53 - ERROR - stderr - 67%|██████▋ | 14974/22434 [13:51:13<5:13:49, 2.52s/it] +2025-02-05 23:58:53 - ERROR - stderr - +2025-02-05 23:58:53 - ERROR - stderr - +2025-02-05 23:58:53 - INFO - stdout - 
{'loss': 0.4066, 'grad_norm': 1.1221153736114502, 'learning_rate': 5.260586901782172e-06, 'epoch': 2.0} +2025-02-05 23:58:53 - ERROR - stderr - 67%|██████▋ | 14974/22434 [13:51:13<5:13:49, 2.52s/it] +2025-02-05 23:58:55 - ERROR - stderr - 67%|██████▋ | 14975/22434 [13:51:15<5:13:33, 2.52s/it] +2025-02-05 23:58:55 - ERROR - stderr - +2025-02-05 23:58:55 - ERROR - stderr - +2025-02-05 23:58:55 - INFO - stdout - {'loss': 0.4544, 'grad_norm': 1.1791040897369385, 'learning_rate': 5.2593156508009844e-06, 'epoch': 2.0} +2025-02-05 23:58:55 - ERROR - stderr - 67%|██████▋ | 14975/22434 [13:51:15<5:13:33, 2.52s/it] +2025-02-05 23:58:58 - ERROR - stderr - 67%|██████▋ | 14976/22434 [13:51:18<5:13:49, 2.52s/it] +2025-02-05 23:58:58 - ERROR - stderr - +2025-02-05 23:58:58 - ERROR - stderr - +2025-02-05 23:58:58 - INFO - stdout - {'loss': 0.4169, 'grad_norm': 1.2274538278579712, 'learning_rate': 5.258044498634825e-06, 'epoch': 2.0} +2025-02-05 23:58:58 - ERROR - stderr - 67%|██████▋ | 14976/22434 [13:51:18<5:13:49, 2.52s/it] +2025-02-05 23:59:00 - ERROR - stderr - 67%|██████▋ | 14977/22434 [13:51:20<5:12:55, 2.52s/it] +2025-02-05 23:59:00 - ERROR - stderr - +2025-02-05 23:59:00 - ERROR - stderr - +2025-02-05 23:59:00 - INFO - stdout - {'loss': 0.4114, 'grad_norm': 1.2820711135864258, 'learning_rate': 5.256773445310191e-06, 'epoch': 2.0} +2025-02-05 23:59:00 - ERROR - stderr - 67%|██████▋ | 14977/22434 [13:51:20<5:12:55, 2.52s/it] +2025-02-05 23:59:03 - ERROR - stderr - 67%|██████▋ | 14978/22434 [13:51:23<5:10:57, 2.50s/it] +2025-02-05 23:59:03 - ERROR - stderr - +2025-02-05 23:59:03 - ERROR - stderr - +2025-02-05 23:59:03 - INFO - stdout - {'loss': 0.4383, 'grad_norm': 1.4803552627563477, 'learning_rate': 5.255502490853575e-06, 'epoch': 2.0} +2025-02-05 23:59:03 - ERROR - stderr - 67%|██████▋ | 14978/22434 [13:51:23<5:10:57, 2.50s/it] +2025-02-05 23:59:05 - ERROR - stderr - 67%|██████▋ | 14979/22434 [13:51:25<5:11:06, 2.50s/it] +2025-02-05 23:59:05 - ERROR - stderr - +2025-02-05 
23:59:05 - ERROR - stderr - +2025-02-05 23:59:05 - INFO - stdout - {'loss': 0.4196, 'grad_norm': 1.3164548873901367, 'learning_rate': 5.2542316352914735e-06, 'epoch': 2.0} +2025-02-05 23:59:05 - ERROR - stderr - 67%|██████▋ | 14979/22434 [13:51:25<5:11:06, 2.50s/it] +2025-02-05 23:59:08 - ERROR - stderr - 67%|██████▋ | 14980/22434 [13:51:28<5:08:18, 2.48s/it] +2025-02-05 23:59:08 - ERROR - stderr - +2025-02-05 23:59:08 - ERROR - stderr - +2025-02-05 23:59:08 - INFO - stdout - {'loss': 0.4117, 'grad_norm': 1.2315852642059326, 'learning_rate': 5.252960878650364e-06, 'epoch': 2.0} +2025-02-05 23:59:08 - ERROR - stderr - 67%|██████▋ | 14980/22434 [13:51:28<5:08:18, 2.48s/it] +2025-02-05 23:59:10 - ERROR - stderr - 67%|██████▋ | 14981/22434 [13:51:30<5:10:31, 2.50s/it] +2025-02-05 23:59:10 - ERROR - stderr - +2025-02-05 23:59:10 - ERROR - stderr - +2025-02-05 23:59:10 - INFO - stdout - {'loss': 0.3659, 'grad_norm': 1.208964467048645, 'learning_rate': 5.251690220956751e-06, 'epoch': 2.0} +2025-02-05 23:59:10 - ERROR - stderr - 67%|██████▋ | 14981/22434 [13:51:30<5:10:31, 2.50s/it] +2025-02-05 23:59:13 - ERROR - stderr - 67%|██████▋ | 14982/22434 [13:51:33<5:12:57, 2.52s/it] +2025-02-05 23:59:13 - ERROR - stderr - +2025-02-05 23:59:13 - ERROR - stderr - +2025-02-05 23:59:13 - INFO - stdout - {'loss': 0.4023, 'grad_norm': 1.4122995138168335, 'learning_rate': 5.250419662237104e-06, 'epoch': 2.0} +2025-02-05 23:59:13 - ERROR - stderr - 67%|██████▋ | 14982/22434 [13:51:33<5:12:57, 2.52s/it] +2025-02-05 23:59:15 - ERROR - stderr - 67%|██████▋ | 14983/22434 [13:51:35<5:12:13, 2.51s/it] +2025-02-05 23:59:15 - ERROR - stderr - +2025-02-05 23:59:15 - ERROR - stderr - +2025-02-05 23:59:15 - INFO - stdout - {'loss': 0.4082, 'grad_norm': 1.3758465051651, 'learning_rate': 5.249149202517922e-06, 'epoch': 2.0} +2025-02-05 23:59:15 - ERROR - stderr - 67%|██████▋ | 14983/22434 [13:51:35<5:12:13, 2.51s/it] +2025-02-05 23:59:18 - ERROR - stderr - 67%|██████▋ | 14984/22434 [13:51:38<5:10:15, 
2.50s/it] +2025-02-05 23:59:18 - ERROR - stderr - +2025-02-05 23:59:18 - ERROR - stderr - +2025-02-05 23:59:18 - INFO - stdout - {'loss': 0.4118, 'grad_norm': 1.3981959819793701, 'learning_rate': 5.247878841825676e-06, 'epoch': 2.0} +2025-02-05 23:59:18 - ERROR - stderr - 67%|██████▋ | 14984/22434 [13:51:38<5:10:15, 2.50s/it] +2025-02-05 23:59:20 - ERROR - stderr - 67%|██████▋ | 14985/22434 [13:51:40<5:10:03, 2.50s/it] +2025-02-05 23:59:20 - ERROR - stderr - +2025-02-05 23:59:20 - ERROR - stderr - +2025-02-05 23:59:20 - INFO - stdout - {'loss': 0.443, 'grad_norm': 1.4900596141815186, 'learning_rate': 5.246608580186843e-06, 'epoch': 2.0} +2025-02-05 23:59:20 - ERROR - stderr - 67%|██████▋ | 14985/22434 [13:51:40<5:10:03, 2.50s/it] +2025-02-05 23:59:23 - ERROR - stderr - 67%|██████▋ | 14986/22434 [13:51:43<5:11:17, 2.51s/it] +2025-02-05 23:59:23 - ERROR - stderr - +2025-02-05 23:59:23 - ERROR - stderr - +2025-02-05 23:59:23 - INFO - stdout - {'loss': 0.3698, 'grad_norm': 1.3102511167526245, 'learning_rate': 5.2453384176279135e-06, 'epoch': 2.0} +2025-02-05 23:59:23 - ERROR - stderr - 67%|██████▋ | 14986/22434 [13:51:43<5:11:17, 2.51s/it] +2025-02-05 23:59:25 - ERROR - stderr - 67%|██████▋ | 14987/22434 [13:51:45<5:12:36, 2.52s/it] +2025-02-05 23:59:26 - ERROR - stderr - +2025-02-05 23:59:26 - ERROR - stderr - +2025-02-05 23:59:26 - INFO - stdout - {'loss': 0.3596, 'grad_norm': 1.2923847436904907, 'learning_rate': 5.244068354175352e-06, 'epoch': 2.0} +2025-02-05 23:59:26 - ERROR - stderr - 67%|██████▋ | 14987/22434 [13:51:45<5:12:36, 2.52s/it] +2025-02-05 23:59:28 - ERROR - stderr - 67%|██████▋ | 14988/22434 [13:51:48<5:13:38, 2.53s/it] +2025-02-05 23:59:28 - ERROR - stderr - +2025-02-05 23:59:28 - ERROR - stderr - +2025-02-05 23:59:28 - INFO - stdout - {'loss': 0.3656, 'grad_norm': 1.3789196014404297, 'learning_rate': 5.242798389855634e-06, 'epoch': 2.0} +2025-02-05 23:59:28 - ERROR - stderr - 67%|██████▋ | 14988/22434 [13:51:48<5:13:38, 2.53s/it] +2025-02-05 
23:59:30 - ERROR - stderr - 67%|██████▋ | 14989/22434 [13:51:50<5:11:44, 2.51s/it] +2025-02-05 23:59:31 - ERROR - stderr - +2025-02-05 23:59:31 - ERROR - stderr - +2025-02-05 23:59:31 - INFO - stdout - {'loss': 0.4069, 'grad_norm': 1.4346433877944946, 'learning_rate': 5.2415285246952305e-06, 'epoch': 2.0} +2025-02-05 23:59:31 - ERROR - stderr - 67%|██████▋ | 14989/22434 [13:51:50<5:11:44, 2.51s/it] +2025-02-05 23:59:33 - ERROR - stderr - 67%|██████▋ | 14990/22434 [13:51:53<5:17:19, 2.56s/it] +2025-02-05 23:59:33 - ERROR - stderr - +2025-02-05 23:59:33 - ERROR - stderr - +2025-02-05 23:59:33 - INFO - stdout - {'loss': 0.4206, 'grad_norm': 1.5303571224212646, 'learning_rate': 5.2402587587206134e-06, 'epoch': 2.0} +2025-02-05 23:59:33 - ERROR - stderr - 67%|██████▋ | 14990/22434 [13:51:53<5:17:19, 2.56s/it] +2025-02-05 23:59:36 - ERROR - stderr - 67%|██████▋ | 14991/22434 [13:51:55<5:11:51, 2.51s/it] +2025-02-05 23:59:36 - ERROR - stderr - +2025-02-05 23:59:36 - ERROR - stderr - +2025-02-05 23:59:36 - INFO - stdout - {'loss': 0.3709, 'grad_norm': 1.2710036039352417, 'learning_rate': 5.238989091958246e-06, 'epoch': 2.0} +2025-02-05 23:59:36 - ERROR - stderr - 67%|██████▋ | 14991/22434 [13:51:55<5:11:51, 2.51s/it] +2025-02-05 23:59:38 - ERROR - stderr - 67%|██████▋ | 14992/22434 [13:51:58<5:10:11, 2.50s/it] +2025-02-05 23:59:38 - ERROR - stderr - +2025-02-05 23:59:38 - ERROR - stderr - +2025-02-05 23:59:38 - INFO - stdout - {'loss': 0.4041, 'grad_norm': 1.3869820833206177, 'learning_rate': 5.2377195244345965e-06, 'epoch': 2.0} +2025-02-05 23:59:38 - ERROR - stderr - 67%|██████▋ | 14992/22434 [13:51:58<5:10:11, 2.50s/it] +2025-02-05 23:59:41 - ERROR - stderr - 67%|██████▋ | 14993/22434 [13:52:00<5:10:56, 2.51s/it] +2025-02-05 23:59:41 - ERROR - stderr - +2025-02-05 23:59:41 - ERROR - stderr - +2025-02-05 23:59:41 - INFO - stdout - {'loss': 0.4351, 'grad_norm': 1.4926518201828003, 'learning_rate': 5.236450056176127e-06, 'epoch': 2.0} +2025-02-05 23:59:41 - ERROR - stderr 
- 67%|██████▋ | 14993/22434 [13:52:00<5:10:56, 2.51s/it] +2025-02-05 23:59:43 - ERROR - stderr - 67%|██████▋ | 14994/22434 [13:52:03<5:17:50, 2.56s/it] +2025-02-05 23:59:43 - ERROR - stderr - +2025-02-05 23:59:43 - ERROR - stderr - +2025-02-05 23:59:43 - INFO - stdout - {'loss': 0.4313, 'grad_norm': 1.495334506034851, 'learning_rate': 5.235180687209296e-06, 'epoch': 2.01} +2025-02-05 23:59:43 - ERROR - stderr - 67%|██████▋ | 14994/22434 [13:52:03<5:17:50, 2.56s/it] +2025-02-05 23:59:46 - ERROR - stderr - 67%|██████▋ | 14995/22434 [13:52:05<5:13:21, 2.53s/it] +2025-02-05 23:59:46 - ERROR - stderr - +2025-02-05 23:59:46 - ERROR - stderr - +2025-02-05 23:59:46 - INFO - stdout - {'loss': 0.3925, 'grad_norm': 1.3399914503097534, 'learning_rate': 5.233911417560567e-06, 'epoch': 2.01} +2025-02-05 23:59:46 - ERROR - stderr - 67%|██████▋ | 14995/22434 [13:52:06<5:13:21, 2.53s/it] +2025-02-05 23:59:48 - ERROR - stderr - 67%|██████▋ | 14996/22434 [13:52:08<5:12:33, 2.52s/it] +2025-02-05 23:59:48 - ERROR - stderr - +2025-02-05 23:59:48 - ERROR - stderr - +2025-02-05 23:59:48 - INFO - stdout - {'loss': 0.3731, 'grad_norm': 1.5036503076553345, 'learning_rate': 5.232642247256391e-06, 'epoch': 2.01} +2025-02-05 23:59:48 - ERROR - stderr - 67%|██████▋ | 14996/22434 [13:52:08<5:12:33, 2.52s/it] +2025-02-05 23:59:51 - ERROR - stderr - 67%|██████▋ | 14997/22434 [13:52:10<5:09:27, 2.50s/it] +2025-02-05 23:59:51 - ERROR - stderr - +2025-02-05 23:59:51 - ERROR - stderr - +2025-02-05 23:59:51 - INFO - stdout - {'loss': 0.3917, 'grad_norm': 1.3649390935897827, 'learning_rate': 5.231373176323227e-06, 'epoch': 2.01} +2025-02-05 23:59:51 - ERROR - stderr - 67%|██████▋ | 14997/22434 [13:52:10<5:09:27, 2.50s/it] +2025-02-05 23:59:53 - ERROR - stderr - 67%|██████▋ | 14998/22434 [13:52:13<5:06:40, 2.47s/it] +2025-02-05 23:59:53 - ERROR - stderr - +2025-02-05 23:59:53 - ERROR - stderr - +2025-02-05 23:59:53 - INFO - stdout - {'loss': 0.433, 'grad_norm': 1.3976646661758423, 'learning_rate': 
5.230104204787525e-06, 'epoch': 2.01} +2025-02-05 23:59:53 - ERROR - stderr - 67%|██████▋ | 14998/22434 [13:52:13<5:06:40, 2.47s/it] +2025-02-05 23:59:56 - ERROR - stderr - 67%|██████▋ | 14999/22434 [13:52:15<5:06:35, 2.47s/it] +2025-02-05 23:59:56 - ERROR - stderr - +2025-02-05 23:59:56 - ERROR - stderr - +2025-02-05 23:59:56 - INFO - stdout - {'loss': 0.3808, 'grad_norm': 1.3757050037384033, 'learning_rate': 5.228835332675737e-06, 'epoch': 2.01} +2025-02-05 23:59:56 - ERROR - stderr - 67%|██████▋ | 14999/22434 [13:52:15<5:06:35, 2.47s/it] +2025-02-05 23:59:58 - ERROR - stderr - 67%|██████▋ | 15000/22434 [13:52:18<5:07:42, 2.48s/it] +2025-02-05 23:59:58 - ERROR - stderr - +2025-02-05 23:59:58 - ERROR - stderr - +2025-02-05 23:59:58 - INFO - stdout - {'loss': 0.3927, 'grad_norm': 1.4187084436416626, 'learning_rate': 5.227566560014315e-06, 'epoch': 2.01} +2025-02-05 23:59:58 - ERROR - stderr - 67%|██████▋ | 15000/22434 [13:52:18<5:07:42, 2.48s/it] +2025-02-06 00:00:01 - ERROR - stderr - 67%|██████▋ | 15001/22434 [13:52:20<5:07:42, 2.48s/it] +2025-02-06 00:00:01 - ERROR - stderr - +2025-02-06 00:00:01 - ERROR - stderr - +2025-02-06 00:00:01 - INFO - stdout - {'loss': 0.3527, 'grad_norm': 1.1158243417739868, 'learning_rate': 5.226297886829695e-06, 'epoch': 2.01} +2025-02-06 00:00:01 - ERROR - stderr - 67%|██████▋ | 15001/22434 [13:52:20<5:07:42, 2.48s/it] +2025-02-06 00:00:03 - ERROR - stderr - 67%|██████▋ | 15002/22434 [13:52:23<5:10:30, 2.51s/it] +2025-02-06 00:00:03 - ERROR - stderr - +2025-02-06 00:00:03 - ERROR - stderr - +2025-02-06 00:00:03 - INFO - stdout - {'loss': 0.4278, 'grad_norm': 1.4159162044525146, 'learning_rate': 5.225029313148333e-06, 'epoch': 2.01} +2025-02-06 00:00:03 - ERROR - stderr - 67%|██████▋ | 15002/22434 [13:52:23<5:10:30, 2.51s/it] +2025-02-06 00:00:06 - ERROR - stderr - 67%|██████▋ | 15003/22434 [13:52:25<5:09:17, 2.50s/it] +2025-02-06 00:00:06 - ERROR - stderr - +2025-02-06 00:00:06 - ERROR - stderr - +2025-02-06 00:00:06 - INFO - 
stdout - {'loss': 0.4133, 'grad_norm': 1.4480334520339966, 'learning_rate': 5.223760838996663e-06, 'epoch': 2.01} +2025-02-06 00:00:06 - ERROR - stderr - 67%|██████▋ | 15003/22434 [13:52:25<5:09:17, 2.50s/it] +2025-02-06 00:00:08 - ERROR - stderr - 67%|██████▋ | 15004/22434 [13:52:28<5:09:47, 2.50s/it] +2025-02-06 00:00:08 - ERROR - stderr - +2025-02-06 00:00:08 - ERROR - stderr - +2025-02-06 00:00:08 - INFO - stdout - {'loss': 0.3787, 'grad_norm': 1.2829251289367676, 'learning_rate': 5.222492464401124e-06, 'epoch': 2.01} +2025-02-06 00:00:08 - ERROR - stderr - 67%|██████▋ | 15004/22434 [13:52:28<5:09:47, 2.50s/it] +2025-02-06 00:00:11 - ERROR - stderr - 67%|██████▋ | 15005/22434 [13:52:30<5:11:56, 2.52s/it] +2025-02-06 00:00:11 - ERROR - stderr - +2025-02-06 00:00:11 - ERROR - stderr - +2025-02-06 00:00:11 - INFO - stdout - {'loss': 0.4105, 'grad_norm': 1.2952005863189697, 'learning_rate': 5.221224189388165e-06, 'epoch': 2.01} +2025-02-06 00:00:11 - ERROR - stderr - 67%|██████▋ | 15005/22434 [13:52:30<5:11:56, 2.52s/it] +2025-02-06 00:00:13 - ERROR - stderr - 67%|██████▋ | 15006/22434 [13:52:33<5:08:47, 2.49s/it] +2025-02-06 00:00:13 - ERROR - stderr - +2025-02-06 00:00:13 - ERROR - stderr - +2025-02-06 00:00:13 - INFO - stdout - {'loss': 0.3936, 'grad_norm': 1.4485445022583008, 'learning_rate': 5.219956013984209e-06, 'epoch': 2.01} +2025-02-06 00:00:13 - ERROR - stderr - 67%|██████▋ | 15006/22434 [13:52:33<5:08:47, 2.49s/it] +2025-02-06 00:00:16 - ERROR - stderr - 67%|██████▋ | 15007/22434 [13:52:35<5:09:02, 2.50s/it] +2025-02-06 00:00:16 - ERROR - stderr - +2025-02-06 00:00:16 - ERROR - stderr - +2025-02-06 00:00:16 - INFO - stdout - {'loss': 0.3392, 'grad_norm': 1.0720840692520142, 'learning_rate': 5.218687938215702e-06, 'epoch': 2.01} +2025-02-06 00:00:16 - ERROR - stderr - 67%|██████▋ | 15007/22434 [13:52:35<5:09:02, 2.50s/it] +2025-02-06 00:00:18 - ERROR - stderr - 67%|██████▋ | 15008/22434 [13:52:38<5:08:46, 2.49s/it] +2025-02-06 00:00:18 - ERROR - stderr - 
+2025-02-06 00:00:18 - ERROR - stderr - +2025-02-06 00:00:18 - INFO - stdout - {'loss': 0.401, 'grad_norm': 1.2559359073638916, 'learning_rate': 5.217419962109067e-06, 'epoch': 2.01} +2025-02-06 00:00:18 - ERROR - stderr - 67%|██████▋ | 15008/22434 [13:52:38<5:08:46, 2.49s/it] +2025-02-06 00:00:21 - ERROR - stderr - 67%|██████▋ | 15009/22434 [13:52:40<5:11:38, 2.52s/it] +2025-02-06 00:00:21 - ERROR - stderr - +2025-02-06 00:00:21 - ERROR - stderr - +2025-02-06 00:00:21 - INFO - stdout - {'loss': 0.4332, 'grad_norm': 1.3435348272323608, 'learning_rate': 5.216152085690736e-06, 'epoch': 2.01} +2025-02-06 00:00:21 - ERROR - stderr - 67%|██████▋ | 15009/22434 [13:52:40<5:11:38, 2.52s/it] +2025-02-06 00:00:23 - ERROR - stderr - 67%|██████▋ | 15010/22434 [13:52:43<5:14:25, 2.54s/it] +2025-02-06 00:00:23 - ERROR - stderr - +2025-02-06 00:00:23 - ERROR - stderr - +2025-02-06 00:00:23 - INFO - stdout - {'loss': 0.3864, 'grad_norm': 1.2701259851455688, 'learning_rate': 5.214884308987136e-06, 'epoch': 2.01} +2025-02-06 00:00:23 - ERROR - stderr - 67%|██████▋ | 15010/22434 [13:52:43<5:14:25, 2.54s/it] +2025-02-06 00:00:26 - ERROR - stderr - 67%|██████▋ | 15011/22434 [13:52:46<5:17:34, 2.57s/it] +2025-02-06 00:00:26 - ERROR - stderr - +2025-02-06 00:00:26 - ERROR - stderr - +2025-02-06 00:00:26 - INFO - stdout - {'loss': 0.3922, 'grad_norm': 1.2609282732009888, 'learning_rate': 5.213616632024695e-06, 'epoch': 2.01} +2025-02-06 00:00:26 - ERROR - stderr - 67%|██████▋ | 15011/22434 [13:52:46<5:17:34, 2.57s/it] +2025-02-06 00:00:28 - ERROR - stderr - 67%|██████▋ | 15012/22434 [13:52:48<5:16:13, 2.56s/it] +2025-02-06 00:00:28 - ERROR - stderr - +2025-02-06 00:00:28 - ERROR - stderr - +2025-02-06 00:00:28 - INFO - stdout - {'loss': 0.3634, 'grad_norm': 1.2839031219482422, 'learning_rate': 5.212349054829835e-06, 'epoch': 2.01} +2025-02-06 00:00:28 - ERROR - stderr - 67%|██████▋ | 15012/22434 [13:52:48<5:16:13, 2.56s/it] +2025-02-06 00:00:31 - ERROR - stderr - 67%|██████▋ | 15013/22434 
[13:52:51<5:13:42, 2.54s/it] +2025-02-06 00:00:31 - ERROR - stderr - +2025-02-06 00:00:31 - ERROR - stderr - +2025-02-06 00:00:31 - INFO - stdout - {'loss': 0.447, 'grad_norm': 1.4811216592788696, 'learning_rate': 5.211081577428978e-06, 'epoch': 2.01} +2025-02-06 00:00:31 - ERROR - stderr - 67%|██████▋ | 15013/22434 [13:52:51<5:13:42, 2.54s/it] +2025-02-06 00:00:33 - ERROR - stderr - 67%|██████▋ | 15014/22434 [13:52:53<5:11:32, 2.52s/it] +2025-02-06 00:00:33 - ERROR - stderr - +2025-02-06 00:00:33 - ERROR - stderr - +2025-02-06 00:00:33 - INFO - stdout - {'loss': 0.3676, 'grad_norm': 1.2900382280349731, 'learning_rate': 5.2098141998485415e-06, 'epoch': 2.01} +2025-02-06 00:00:33 - ERROR - stderr - 67%|██████▋ | 15014/22434 [13:52:53<5:11:32, 2.52s/it] +2025-02-06 00:00:36 - ERROR - stderr - 67%|██████▋ | 15015/22434 [13:52:56<5:09:14, 2.50s/it] +2025-02-06 00:00:36 - ERROR - stderr - +2025-02-06 00:00:36 - ERROR - stderr - +2025-02-06 00:00:36 - INFO - stdout - {'loss': 0.4176, 'grad_norm': 1.4951683282852173, 'learning_rate': 5.2085469221149465e-06, 'epoch': 2.01} +2025-02-06 00:00:36 - ERROR - stderr - 67%|██████▋ | 15015/22434 [13:52:56<5:09:14, 2.50s/it] +2025-02-06 00:00:38 - ERROR - stderr - 67%|██████▋ | 15016/22434 [13:52:58<5:09:24, 2.50s/it] +2025-02-06 00:00:38 - ERROR - stderr - +2025-02-06 00:00:38 - ERROR - stderr - +2025-02-06 00:00:38 - INFO - stdout - {'loss': 0.4075, 'grad_norm': 1.4090937376022339, 'learning_rate': 5.207279744254605e-06, 'epoch': 2.01} +2025-02-06 00:00:38 - ERROR - stderr - 67%|██████▋ | 15016/22434 [13:52:58<5:09:24, 2.50s/it] +2025-02-06 00:00:41 - ERROR - stderr - 67%|██████▋ | 15017/22434 [13:53:01<5:08:35, 2.50s/it] +2025-02-06 00:00:41 - ERROR - stderr - +2025-02-06 00:00:41 - ERROR - stderr - +2025-02-06 00:00:41 - INFO - stdout - {'loss': 0.3766, 'grad_norm': 1.286105751991272, 'learning_rate': 5.206012666293931e-06, 'epoch': 2.01} +2025-02-06 00:00:41 - ERROR - stderr - 67%|██████▋ | 15017/22434 [13:53:01<5:08:35, 
2.50s/it] +2025-02-06 00:00:43 - ERROR - stderr - 67%|██████▋ | 15018/22434 [13:53:03<5:08:56, 2.50s/it] +2025-02-06 00:00:43 - ERROR - stderr - +2025-02-06 00:00:43 - ERROR - stderr - +2025-02-06 00:00:43 - INFO - stdout - {'loss': 0.4432, 'grad_norm': 1.4553841352462769, 'learning_rate': 5.204745688259336e-06, 'epoch': 2.01} +2025-02-06 00:00:43 - ERROR - stderr - 67%|██████▋ | 15018/22434 [13:53:03<5:08:56, 2.50s/it] +2025-02-06 00:00:46 - ERROR - stderr - 67%|██████▋ | 15019/22434 [13:53:06<5:24:18, 2.62s/it] +2025-02-06 00:00:46 - ERROR - stderr - +2025-02-06 00:00:46 - ERROR - stderr - +2025-02-06 00:00:46 - INFO - stdout - {'loss': 0.4203, 'grad_norm': 1.419102430343628, 'learning_rate': 5.203478810177232e-06, 'epoch': 2.01} +2025-02-06 00:00:46 - ERROR - stderr - 67%|██████▋ | 15019/22434 [13:53:06<5:24:18, 2.62s/it] +2025-02-06 00:00:49 - ERROR - stderr - 67%|██████▋ | 15020/22434 [13:53:08<5:19:15, 2.58s/it] +2025-02-06 00:00:49 - ERROR - stderr - +2025-02-06 00:00:49 - ERROR - stderr - +2025-02-06 00:00:49 - INFO - stdout - {'loss': 0.3801, 'grad_norm': 1.311758279800415, 'learning_rate': 5.202212032074014e-06, 'epoch': 2.01} +2025-02-06 00:00:49 - ERROR - stderr - 67%|██████▋ | 15020/22434 [13:53:09<5:19:15, 2.58s/it] +2025-02-06 00:00:51 - ERROR - stderr - 67%|██████▋ | 15021/22434 [13:53:11<5:14:51, 2.55s/it] +2025-02-06 00:00:51 - ERROR - stderr - +2025-02-06 00:00:51 - ERROR - stderr - +2025-02-06 00:00:51 - INFO - stdout - {'loss': 0.34, 'grad_norm': 1.3151917457580566, 'learning_rate': 5.200945353976103e-06, 'epoch': 2.01} +2025-02-06 00:00:51 - ERROR - stderr - 67%|██████▋ | 15021/22434 [13:53:11<5:14:51, 2.55s/it] +2025-02-06 00:00:54 - ERROR - stderr - 67%|██████▋ | 15022/22434 [13:53:13<5:13:24, 2.54s/it] +2025-02-06 00:00:54 - ERROR - stderr - +2025-02-06 00:00:54 - ERROR - stderr - +2025-02-06 00:00:54 - INFO - stdout - {'loss': 0.3734, 'grad_norm': 1.1964595317840576, 'learning_rate': 5.199678775909889e-06, 'epoch': 2.01} +2025-02-06 
00:00:54 - ERROR - stderr - 67%|██████▋ | 15022/22434 [13:53:14<5:13:24, 2.54s/it] +2025-02-06 00:00:56 - ERROR - stderr - 67%|██████▋ | 15023/22434 [13:53:16<5:12:07, 2.53s/it] +2025-02-06 00:00:56 - ERROR - stderr - +2025-02-06 00:00:56 - ERROR - stderr - +2025-02-06 00:00:56 - INFO - stdout - {'loss': 0.3894, 'grad_norm': 1.3623753786087036, 'learning_rate': 5.1984122979017785e-06, 'epoch': 2.01} +2025-02-06 00:00:56 - ERROR - stderr - 67%|██████▋ | 15023/22434 [13:53:16<5:12:07, 2.53s/it] +2025-02-06 00:00:59 - ERROR - stderr - 67%|██████▋ | 15024/22434 [13:53:18<5:09:04, 2.50s/it] +2025-02-06 00:00:59 - ERROR - stderr - +2025-02-06 00:00:59 - ERROR - stderr - +2025-02-06 00:00:59 - INFO - stdout - {'loss': 0.3986, 'grad_norm': 1.955280065536499, 'learning_rate': 5.197145919978172e-06, 'epoch': 2.01} +2025-02-06 00:00:59 - ERROR - stderr - 67%|██████▋ | 15024/22434 [13:53:18<5:09:04, 2.50s/it] +2025-02-06 00:01:01 - ERROR - stderr - 67%|██████▋ | 15025/22434 [13:53:21<5:11:31, 2.52s/it] +2025-02-06 00:01:01 - ERROR - stderr - +2025-02-06 00:01:01 - ERROR - stderr - +2025-02-06 00:01:01 - INFO - stdout - {'loss': 0.3948, 'grad_norm': 1.5207499265670776, 'learning_rate': 5.195879642165458e-06, 'epoch': 2.01} +2025-02-06 00:01:01 - ERROR - stderr - 67%|██████▋ | 15025/22434 [13:53:21<5:11:31, 2.52s/it] +2025-02-06 00:01:04 - ERROR - stderr - 67%|██████▋ | 15026/22434 [13:53:24<5:14:21, 2.55s/it] +2025-02-06 00:01:04 - ERROR - stderr - +2025-02-06 00:01:04 - ERROR - stderr - +2025-02-06 00:01:04 - INFO - stdout - {'loss': 0.3861, 'grad_norm': 1.4051802158355713, 'learning_rate': 5.194613464490042e-06, 'epoch': 2.01} +2025-02-06 00:01:04 - ERROR - stderr - 67%|██████▋ | 15026/22434 [13:53:24<5:14:21, 2.55s/it] +2025-02-06 00:01:06 - ERROR - stderr - 67%|██████▋ | 15027/22434 [13:53:26<5:10:10, 2.51s/it] +2025-02-06 00:01:06 - ERROR - stderr - +2025-02-06 00:01:06 - ERROR - stderr - +2025-02-06 00:01:06 - INFO - stdout - {'loss': 0.4041, 'grad_norm': 
1.5008959770202637, 'learning_rate': 5.193347386978307e-06, 'epoch': 2.01} +2025-02-06 00:01:06 - ERROR - stderr - 67%|██████▋ | 15027/22434 [13:53:26<5:10:10, 2.51s/it] +2025-02-06 00:01:09 - ERROR - stderr - 67%|██████▋ | 15028/22434 [13:53:28<5:07:13, 2.49s/it] +2025-02-06 00:01:09 - ERROR - stderr - +2025-02-06 00:01:09 - ERROR - stderr - +2025-02-06 00:01:09 - INFO - stdout - {'loss': 0.4146, 'grad_norm': 1.4927411079406738, 'learning_rate': 5.192081409656647e-06, 'epoch': 2.01} +2025-02-06 00:01:09 - ERROR - stderr - 67%|██████▋ | 15028/22434 [13:53:29<5:07:13, 2.49s/it] +2025-02-06 00:01:11 - ERROR - stderr - 67%|██████▋ | 15029/22434 [13:53:31<5:09:44, 2.51s/it] +2025-02-06 00:01:11 - ERROR - stderr - +2025-02-06 00:01:11 - ERROR - stderr - +2025-02-06 00:01:11 - INFO - stdout - {'loss': 0.3855, 'grad_norm': 1.3970462083816528, 'learning_rate': 5.190815532551448e-06, 'epoch': 2.01} +2025-02-06 00:01:11 - ERROR - stderr - 67%|██████▋ | 15029/22434 [13:53:31<5:09:44, 2.51s/it] +2025-02-06 00:01:14 - ERROR - stderr - 67%|██████▋ | 15030/22434 [13:53:34<5:15:13, 2.55s/it] +2025-02-06 00:01:14 - ERROR - stderr - +2025-02-06 00:01:14 - ERROR - stderr - +2025-02-06 00:01:14 - INFO - stdout - {'loss': 0.3789, 'grad_norm': 1.30995512008667, 'learning_rate': 5.189549755689094e-06, 'epoch': 2.01} +2025-02-06 00:01:14 - ERROR - stderr - 67%|██████▋ | 15030/22434 [13:53:34<5:15:13, 2.55s/it] +2025-02-06 00:01:17 - ERROR - stderr - 67%|██████▋ | 15031/22434 [13:53:36<5:17:33, 2.57s/it] +2025-02-06 00:01:17 - ERROR - stderr - +2025-02-06 00:01:17 - ERROR - stderr - +2025-02-06 00:01:17 - INFO - stdout - {'loss': 0.3917, 'grad_norm': 1.4284340143203735, 'learning_rate': 5.1882840790959785e-06, 'epoch': 2.01} +2025-02-06 00:01:17 - ERROR - stderr - 67%|██████▋ | 15031/22434 [13:53:36<5:17:33, 2.57s/it] +2025-02-06 00:01:19 - ERROR - stderr - 67%|██████▋ | 15032/22434 [13:53:39<5:15:49, 2.56s/it] +2025-02-06 00:01:19 - ERROR - stderr - +2025-02-06 00:01:19 - ERROR - stderr - 
+2025-02-06 00:01:19 - INFO - stdout - {'loss': 0.3771, 'grad_norm': 1.3816999197006226, 'learning_rate': 5.187018502798475e-06, 'epoch': 2.01} +2025-02-06 00:01:19 - ERROR - stderr - 67%|██████▋ | 15032/22434 [13:53:39<5:15:49, 2.56s/it] +2025-02-06 00:01:22 - ERROR - stderr - 67%|██████▋ | 15033/22434 [13:53:41<5:13:05, 2.54s/it] +2025-02-06 00:01:22 - ERROR - stderr - +2025-02-06 00:01:22 - ERROR - stderr - +2025-02-06 00:01:22 - INFO - stdout - {'loss': 0.4026, 'grad_norm': 1.4078904390335083, 'learning_rate': 5.185753026822964e-06, 'epoch': 2.01} +2025-02-06 00:01:22 - ERROR - stderr - 67%|██████▋ | 15033/22434 [13:53:41<5:13:05, 2.54s/it] +2025-02-06 00:01:24 - ERROR - stderr - 67%|██████▋ | 15034/22434 [13:53:44<5:14:13, 2.55s/it] +2025-02-06 00:01:24 - ERROR - stderr - +2025-02-06 00:01:24 - ERROR - stderr - +2025-02-06 00:01:24 - INFO - stdout - {'loss': 0.4271, 'grad_norm': 1.5465421676635742, 'learning_rate': 5.184487651195825e-06, 'epoch': 2.01} +2025-02-06 00:01:24 - ERROR - stderr - 67%|██████▋ | 15034/22434 [13:53:44<5:14:13, 2.55s/it] +2025-02-06 00:01:27 - ERROR - stderr - 67%|██████▋ | 15035/22434 [13:53:46<5:12:12, 2.53s/it] +2025-02-06 00:01:27 - ERROR - stderr - +2025-02-06 00:01:27 - ERROR - stderr - +2025-02-06 00:01:27 - INFO - stdout - {'loss': 0.3976, 'grad_norm': 1.3871711492538452, 'learning_rate': 5.183222375943433e-06, 'epoch': 2.01} +2025-02-06 00:01:27 - ERROR - stderr - 67%|██████▋ | 15035/22434 [13:53:46<5:12:12, 2.53s/it] +2025-02-06 00:01:29 - ERROR - stderr - 67%|██████▋ | 15036/22434 [13:53:49<5:10:21, 2.52s/it] +2025-02-06 00:01:29 - ERROR - stderr - +2025-02-06 00:01:29 - ERROR - stderr - +2025-02-06 00:01:29 - INFO - stdout - {'loss': 0.4013, 'grad_norm': 1.3923689126968384, 'learning_rate': 5.181957201092163e-06, 'epoch': 2.01} +2025-02-06 00:01:29 - ERROR - stderr - 67%|██████▋ | 15036/22434 [13:53:49<5:10:21, 2.52s/it] +2025-02-06 00:01:32 - ERROR - stderr - 67%|██████▋ | 15037/22434 [13:53:51<5:09:24, 2.51s/it] 
+2025-02-06 00:01:32 - ERROR - stderr - +2025-02-06 00:01:32 - ERROR - stderr - +2025-02-06 00:01:32 - INFO - stdout - {'loss': 0.3975, 'grad_norm': 1.4931987524032593, 'learning_rate': 5.180692126668383e-06, 'epoch': 2.01} +2025-02-06 00:01:32 - ERROR - stderr - 67%|██████▋ | 15037/22434 [13:53:51<5:09:24, 2.51s/it] +2025-02-06 00:01:34 - ERROR - stderr - 67%|██████▋ | 15038/22434 [13:53:54<5:13:48, 2.55s/it] +2025-02-06 00:01:34 - ERROR - stderr - +2025-02-06 00:01:34 - ERROR - stderr - +2025-02-06 00:01:34 - INFO - stdout - {'loss': 0.3377, 'grad_norm': 1.36726713180542, 'learning_rate': 5.179427152698464e-06, 'epoch': 2.01} +2025-02-06 00:01:34 - ERROR - stderr - 67%|██████▋ | 15038/22434 [13:53:54<5:13:48, 2.55s/it] +2025-02-06 00:01:37 - ERROR - stderr - 67%|██████▋ | 15039/22434 [13:53:57<5:21:32, 2.61s/it] +2025-02-06 00:01:37 - ERROR - stderr - +2025-02-06 00:01:37 - ERROR - stderr - +2025-02-06 00:01:37 - INFO - stdout - {'loss': 0.3712, 'grad_norm': 1.2424111366271973, 'learning_rate': 5.178162279208774e-06, 'epoch': 2.01} +2025-02-06 00:01:37 - ERROR - stderr - 67%|██████▋ | 15039/22434 [13:53:57<5:21:32, 2.61s/it] +2025-02-06 00:01:39 - ERROR - stderr - 67%|██████▋ | 15040/22434 [13:53:59<5:17:32, 2.58s/it] +2025-02-06 00:01:40 - ERROR - stderr - +2025-02-06 00:01:40 - ERROR - stderr - +2025-02-06 00:01:40 - INFO - stdout - {'loss': 0.3726, 'grad_norm': 1.4048572778701782, 'learning_rate': 5.176897506225675e-06, 'epoch': 2.01} +2025-02-06 00:01:40 - ERROR - stderr - 67%|██████▋ | 15040/22434 [13:53:59<5:17:32, 2.58s/it] +2025-02-06 00:01:42 - ERROR - stderr - 67%|██████▋ | 15041/22434 [13:54:02<5:15:34, 2.56s/it] +2025-02-06 00:01:42 - ERROR - stderr - +2025-02-06 00:01:42 - ERROR - stderr - +2025-02-06 00:01:42 - INFO - stdout - {'loss': 0.4217, 'grad_norm': 1.5942720174789429, 'learning_rate': 5.175632833775535e-06, 'epoch': 2.01} +2025-02-06 00:01:42 - ERROR - stderr - 67%|██████▋ | 15041/22434 [13:54:02<5:15:34, 2.56s/it] +2025-02-06 00:01:44 - 
ERROR - stderr - 67%|██████▋ | 15042/22434 [13:54:04<5:10:34, 2.52s/it] +2025-02-06 00:01:44 - ERROR - stderr - +2025-02-06 00:01:44 - ERROR - stderr - +2025-02-06 00:01:44 - INFO - stdout - {'loss': 0.4482, 'grad_norm': 1.4521111249923706, 'learning_rate': 5.1743682618847114e-06, 'epoch': 2.01} +2025-02-06 00:01:44 - ERROR - stderr - 67%|██████▋ | 15042/22434 [13:54:04<5:10:34, 2.52s/it] +2025-02-06 00:01:47 - ERROR - stderr - 67%|██████▋ | 15043/22434 [13:54:07<5:14:15, 2.55s/it] +2025-02-06 00:01:47 - ERROR - stderr - +2025-02-06 00:01:47 - ERROR - stderr - +2025-02-06 00:01:47 - INFO - stdout - {'loss': 0.3804, 'grad_norm': 1.3126685619354248, 'learning_rate': 5.173103790579564e-06, 'epoch': 2.01} +2025-02-06 00:01:47 - ERROR - stderr - 67%|██████▋ | 15043/22434 [13:54:07<5:14:15, 2.55s/it] +2025-02-06 00:01:50 - ERROR - stderr - 67%|██████▋ | 15044/22434 [13:54:09<5:12:43, 2.54s/it] +2025-02-06 00:01:50 - ERROR - stderr - +2025-02-06 00:01:50 - ERROR - stderr - +2025-02-06 00:01:50 - INFO - stdout - {'loss': 0.4443, 'grad_norm': 1.5449199676513672, 'learning_rate': 5.171839419886449e-06, 'epoch': 2.01} +2025-02-06 00:01:50 - ERROR - stderr - 67%|██████▋ | 15044/22434 [13:54:09<5:12:43, 2.54s/it] +2025-02-06 00:01:52 - ERROR - stderr - 67%|██████▋ | 15045/22434 [13:54:12<5:10:19, 2.52s/it] +2025-02-06 00:01:52 - ERROR - stderr - +2025-02-06 00:01:52 - ERROR - stderr - +2025-02-06 00:01:52 - INFO - stdout - {'loss': 0.3859, 'grad_norm': 1.2719964981079102, 'learning_rate': 5.170575149831725e-06, 'epoch': 2.01} +2025-02-06 00:01:52 - ERROR - stderr - 67%|██████▋ | 15045/22434 [13:54:12<5:10:19, 2.52s/it] +2025-02-06 00:01:55 - ERROR - stderr - 67%|██████▋ | 15046/22434 [13:54:14<5:10:58, 2.53s/it] +2025-02-06 00:01:55 - ERROR - stderr - +2025-02-06 00:01:55 - ERROR - stderr - +2025-02-06 00:01:55 - INFO - stdout - {'loss': 0.3851, 'grad_norm': 1.4632441997528076, 'learning_rate': 5.169310980441732e-06, 'epoch': 2.01} +2025-02-06 00:01:55 - ERROR - stderr - 
67%|██████▋ | 15046/22434 [13:54:14<5:10:58, 2.53s/it] +2025-02-06 00:01:57 - ERROR - stderr - 67%|██████▋ | 15047/22434 [13:54:17<5:09:50, 2.52s/it] +2025-02-06 00:01:57 - ERROR - stderr - +2025-02-06 00:01:57 - ERROR - stderr - +2025-02-06 00:01:57 - INFO - stdout - {'loss': 0.4124, 'grad_norm': 1.3705885410308838, 'learning_rate': 5.168046911742838e-06, 'epoch': 2.01} +2025-02-06 00:01:57 - ERROR - stderr - 67%|██████▋ | 15047/22434 [13:54:17<5:09:50, 2.52s/it] +2025-02-06 00:02:00 - ERROR - stderr - 67%|██████▋ | 15048/22434 [13:54:19<5:09:34, 2.51s/it] +2025-02-06 00:02:00 - ERROR - stderr - +2025-02-06 00:02:00 - ERROR - stderr - +2025-02-06 00:02:00 - INFO - stdout - {'loss': 0.4218, 'grad_norm': 1.4565726518630981, 'learning_rate': 5.166782943761378e-06, 'epoch': 2.01} +2025-02-06 00:02:00 - ERROR - stderr - 67%|██████▋ | 15048/22434 [13:54:19<5:09:34, 2.51s/it] +2025-02-06 00:02:02 - ERROR - stderr - 67%|██████▋ | 15049/22434 [13:54:22<5:06:53, 2.49s/it] +2025-02-06 00:02:02 - ERROR - stderr - +2025-02-06 00:02:02 - ERROR - stderr - +2025-02-06 00:02:02 - INFO - stdout - {'loss': 0.4291, 'grad_norm': 1.4194146394729614, 'learning_rate': 5.165519076523699e-06, 'epoch': 2.01} +2025-02-06 00:02:02 - ERROR - stderr - 67%|██████▋ | 15049/22434 [13:54:22<5:06:53, 2.49s/it] +2025-02-06 00:02:04 - ERROR - stderr - 67%|██████▋ | 15050/22434 [13:54:24<5:05:30, 2.48s/it] +2025-02-06 00:02:05 - ERROR - stderr - +2025-02-06 00:02:05 - ERROR - stderr - +2025-02-06 00:02:05 - INFO - stdout - {'loss': 0.354, 'grad_norm': 1.4706757068634033, 'learning_rate': 5.164255310056156e-06, 'epoch': 2.01} +2025-02-06 00:02:05 - ERROR - stderr - 67%|██████▋ | 15050/22434 [13:54:24<5:05:30, 2.48s/it] +2025-02-06 00:02:07 - ERROR - stderr - 67%|██████▋ | 15051/22434 [13:54:27<5:10:55, 2.53s/it] +2025-02-06 00:02:07 - ERROR - stderr - +2025-02-06 00:02:07 - ERROR - stderr - +2025-02-06 00:02:07 - INFO - stdout - {'loss': 0.4107, 'grad_norm': 1.5115705728530884, 'learning_rate': 
5.162991644385078e-06, 'epoch': 2.01} +2025-02-06 00:02:07 - ERROR - stderr - 67%|██████▋ | 15051/22434 [13:54:27<5:10:55, 2.53s/it] +2025-02-06 00:02:10 - ERROR - stderr - 67%|██████▋ | 15052/22434 [13:54:29<5:10:22, 2.52s/it] +2025-02-06 00:02:10 - ERROR - stderr - +2025-02-06 00:02:10 - ERROR - stderr - +2025-02-06 00:02:10 - INFO - stdout - {'loss': 0.412, 'grad_norm': 1.493889570236206, 'learning_rate': 5.161728079536816e-06, 'epoch': 2.01} +2025-02-06 00:02:10 - ERROR - stderr - 67%|██████▋ | 15052/22434 [13:54:29<5:10:22, 2.52s/it] +2025-02-06 00:02:12 - ERROR - stderr - 67%|██████▋ | 15053/22434 [13:54:32<5:08:18, 2.51s/it] +2025-02-06 00:02:12 - ERROR - stderr - +2025-02-06 00:02:12 - ERROR - stderr - +2025-02-06 00:02:12 - INFO - stdout - {'loss': 0.4375, 'grad_norm': 1.5621106624603271, 'learning_rate': 5.1604646155377e-06, 'epoch': 2.01} +2025-02-06 00:02:12 - ERROR - stderr - 67%|██████▋ | 15053/22434 [13:54:32<5:08:18, 2.51s/it] +2025-02-06 00:02:15 - ERROR - stderr - 67%|██████▋ | 15054/22434 [13:54:34<5:06:33, 2.49s/it] +2025-02-06 00:02:15 - ERROR - stderr - +2025-02-06 00:02:15 - ERROR - stderr - +2025-02-06 00:02:15 - INFO - stdout - {'loss': 0.3783, 'grad_norm': 1.4189461469650269, 'learning_rate': 5.159201252414067e-06, 'epoch': 2.01} +2025-02-06 00:02:15 - ERROR - stderr - 67%|██████▋ | 15054/22434 [13:54:34<5:06:33, 2.49s/it] +2025-02-06 00:02:17 - ERROR - stderr - 67%|██████▋ | 15055/22434 [13:54:37<5:06:52, 2.50s/it] +2025-02-06 00:02:17 - ERROR - stderr - +2025-02-06 00:02:17 - ERROR - stderr - +2025-02-06 00:02:17 - INFO - stdout - {'loss': 0.4137, 'grad_norm': 1.5454238653182983, 'learning_rate': 5.157937990192255e-06, 'epoch': 2.01} +2025-02-06 00:02:17 - ERROR - stderr - 67%|██████▋ | 15055/22434 [13:54:37<5:06:52, 2.50s/it] +2025-02-06 00:02:20 - ERROR - stderr - 67%|██████▋ | 15056/22434 [13:54:39<5:06:11, 2.49s/it] +2025-02-06 00:02:20 - ERROR - stderr - +2025-02-06 00:02:20 - ERROR - stderr - +2025-02-06 00:02:20 - INFO - stdout - 
{'loss': 0.3953, 'grad_norm': 1.3968615531921387, 'learning_rate': 5.156674828898589e-06, 'epoch': 2.01} +2025-02-06 00:02:20 - ERROR - stderr - 67%|██████▋ | 15056/22434 [13:54:39<5:06:11, 2.49s/it] +2025-02-06 00:02:22 - ERROR - stderr - 67%|██████▋ | 15057/22434 [13:54:42<5:07:37, 2.50s/it] +2025-02-06 00:02:22 - ERROR - stderr - +2025-02-06 00:02:22 - ERROR - stderr - +2025-02-06 00:02:22 - INFO - stdout - {'loss': 0.3713, 'grad_norm': 1.3716275691986084, 'learning_rate': 5.155411768559402e-06, 'epoch': 2.01} +2025-02-06 00:02:22 - ERROR - stderr - 67%|██████▋ | 15057/22434 [13:54:42<5:07:37, 2.50s/it] +2025-02-06 00:02:25 - ERROR - stderr - 67%|██████▋ | 15058/22434 [13:54:44<5:07:45, 2.50s/it] +2025-02-06 00:02:25 - ERROR - stderr - +2025-02-06 00:02:25 - ERROR - stderr - +2025-02-06 00:02:25 - INFO - stdout - {'loss': 0.3872, 'grad_norm': 1.471168875694275, 'learning_rate': 5.154148809201022e-06, 'epoch': 2.01} +2025-02-06 00:02:25 - ERROR - stderr - 67%|██████▋ | 15058/22434 [13:54:44<5:07:45, 2.50s/it] +2025-02-06 00:02:27 - ERROR - stderr - 67%|██████▋ | 15059/22434 [13:54:47<5:08:11, 2.51s/it] +2025-02-06 00:02:27 - ERROR - stderr - +2025-02-06 00:02:27 - ERROR - stderr - +2025-02-06 00:02:27 - INFO - stdout - {'loss': 0.3968, 'grad_norm': 1.5357636213302612, 'learning_rate': 5.152885950849772e-06, 'epoch': 2.01} +2025-02-06 00:02:27 - ERROR - stderr - 67%|██████▋ | 15059/22434 [13:54:47<5:08:11, 2.51s/it] +2025-02-06 00:02:30 - ERROR - stderr - 67%|██████▋ | 15060/22434 [13:54:49<5:08:49, 2.51s/it] +2025-02-06 00:02:30 - ERROR - stderr - +2025-02-06 00:02:30 - ERROR - stderr - +2025-02-06 00:02:30 - INFO - stdout - {'loss': 0.3725, 'grad_norm': 1.3124295473098755, 'learning_rate': 5.151623193531976e-06, 'epoch': 2.01} +2025-02-06 00:02:30 - ERROR - stderr - 67%|██████▋ | 15060/22434 [13:54:49<5:08:49, 2.51s/it] +2025-02-06 00:02:32 - ERROR - stderr - 67%|██████▋ | 15061/22434 [13:54:52<5:09:13, 2.52s/it] +2025-02-06 00:02:32 - ERROR - stderr - 
+2025-02-06 00:02:32 - ERROR - stderr - +2025-02-06 00:02:32 - INFO - stdout - {'loss': 0.3893, 'grad_norm': 1.2504169940948486, 'learning_rate': 5.150360537273956e-06, 'epoch': 2.01} +2025-02-06 00:02:32 - ERROR - stderr - 67%|██████▋ | 15061/22434 [13:54:52<5:09:13, 2.52s/it] +2025-02-06 00:02:35 - ERROR - stderr - 67%|██████▋ | 15062/22434 [13:54:55<5:16:12, 2.57s/it] +2025-02-06 00:02:35 - ERROR - stderr - +2025-02-06 00:02:35 - ERROR - stderr - +2025-02-06 00:02:35 - INFO - stdout - {'loss': 0.431, 'grad_norm': 1.4942512512207031, 'learning_rate': 5.14909798210203e-06, 'epoch': 2.01} +2025-02-06 00:02:35 - ERROR - stderr - 67%|██████▋ | 15062/22434 [13:54:55<5:16:12, 2.57s/it] +2025-02-06 00:02:37 - ERROR - stderr - 67%|██████▋ | 15063/22434 [13:54:57<5:15:29, 2.57s/it] +2025-02-06 00:02:37 - ERROR - stderr - +2025-02-06 00:02:37 - ERROR - stderr - +2025-02-06 00:02:37 - INFO - stdout - {'loss': 0.4108, 'grad_norm': 1.5264604091644287, 'learning_rate': 5.147835528042515e-06, 'epoch': 2.01} +2025-02-06 00:02:37 - ERROR - stderr - 67%|██████▋ | 15063/22434 [13:54:57<5:15:29, 2.57s/it] +2025-02-06 00:02:40 - ERROR - stderr - 67%|██████▋ | 15064/22434 [13:55:00<5:12:47, 2.55s/it] +2025-02-06 00:02:40 - ERROR - stderr - +2025-02-06 00:02:40 - ERROR - stderr - +2025-02-06 00:02:40 - INFO - stdout - {'loss': 0.3496, 'grad_norm': 1.3152052164077759, 'learning_rate': 5.1465731751217286e-06, 'epoch': 2.01} +2025-02-06 00:02:40 - ERROR - stderr - 67%|██████▋ | 15064/22434 [13:55:00<5:12:47, 2.55s/it] +2025-02-06 00:02:42 - ERROR - stderr - 67%|██████▋ | 15065/22434 [13:55:02<5:09:13, 2.52s/it] +2025-02-06 00:02:42 - ERROR - stderr - +2025-02-06 00:02:42 - ERROR - stderr - +2025-02-06 00:02:42 - INFO - stdout - {'loss': 0.4622, 'grad_norm': 1.823743224143982, 'learning_rate': 5.145310923365973e-06, 'epoch': 2.01} +2025-02-06 00:02:42 - ERROR - stderr - 67%|██████▋ | 15065/22434 [13:55:02<5:09:13, 2.52s/it] +2025-02-06 00:02:45 - ERROR - stderr - 67%|██████▋ | 15066/22434 
[13:55:05<5:06:26, 2.50s/it] +2025-02-06 00:02:45 - ERROR - stderr - +2025-02-06 00:02:45 - ERROR - stderr - +2025-02-06 00:02:45 - INFO - stdout - {'loss': 0.4324, 'grad_norm': 1.6228814125061035, 'learning_rate': 5.144048772801573e-06, 'epoch': 2.01} +2025-02-06 00:02:45 - ERROR - stderr - 67%|██████▋ | 15066/22434 [13:55:05<5:06:26, 2.50s/it] +2025-02-06 00:02:47 - ERROR - stderr - 67%|██████▋ | 15067/22434 [13:55:07<5:05:51, 2.49s/it] +2025-02-06 00:02:47 - ERROR - stderr - +2025-02-06 00:02:47 - ERROR - stderr - +2025-02-06 00:02:47 - INFO - stdout - {'loss': 0.4244, 'grad_norm': 1.441481351852417, 'learning_rate': 5.142786723454822e-06, 'epoch': 2.01} +2025-02-06 00:02:47 - ERROR - stderr - 67%|██████▋ | 15067/22434 [13:55:07<5:05:51, 2.49s/it] +2025-02-06 00:02:50 - ERROR - stderr - 67%|██████▋ | 15068/22434 [13:55:09<5:04:18, 2.48s/it] +2025-02-06 00:02:50 - ERROR - stderr - +2025-02-06 00:02:50 - ERROR - stderr - +2025-02-06 00:02:50 - INFO - stdout - {'loss': 0.468, 'grad_norm': 1.5807491540908813, 'learning_rate': 5.141524775352038e-06, 'epoch': 2.01} +2025-02-06 00:02:50 - ERROR - stderr - 67%|██████▋ | 15068/22434 [13:55:10<5:04:18, 2.48s/it] +2025-02-06 00:02:52 - ERROR - stderr - 67%|██████▋ | 15069/22434 [13:55:12<5:08:01, 2.51s/it] +2025-02-06 00:02:52 - ERROR - stderr - +2025-02-06 00:02:52 - ERROR - stderr - +2025-02-06 00:02:52 - INFO - stdout - {'loss': 0.3984, 'grad_norm': 1.5947625637054443, 'learning_rate': 5.140262928519524e-06, 'epoch': 2.02} +2025-02-06 00:02:52 - ERROR - stderr - 67%|██████▋ | 15069/22434 [13:55:12<5:08:01, 2.51s/it] +2025-02-06 00:02:55 - ERROR - stderr - 67%|██████▋ | 15070/22434 [13:55:15<5:06:42, 2.50s/it] +2025-02-06 00:02:55 - ERROR - stderr - +2025-02-06 00:02:55 - ERROR - stderr - +2025-02-06 00:02:55 - INFO - stdout - {'loss': 0.404, 'grad_norm': 1.3195481300354004, 'learning_rate': 5.139001182983572e-06, 'epoch': 2.02} +2025-02-06 00:02:55 - ERROR - stderr - 67%|██████▋ | 15070/22434 [13:55:15<5:06:42, 
2.50s/it] +2025-02-06 00:02:57 - ERROR - stderr - 67%|██████▋ | 15071/22434 [13:55:17<5:04:36, 2.48s/it] +2025-02-06 00:02:57 - ERROR - stderr - +2025-02-06 00:02:57 - ERROR - stderr - +2025-02-06 00:02:57 - INFO - stdout - {'loss': 0.3756, 'grad_norm': 1.1909089088439941, 'learning_rate': 5.137739538770497e-06, 'epoch': 2.02} +2025-02-06 00:02:57 - ERROR - stderr - 67%|██████▋ | 15071/22434 [13:55:17<5:04:36, 2.48s/it] +2025-02-06 00:03:00 - ERROR - stderr - 67%|██████▋ | 15072/22434 [13:55:19<5:03:06, 2.47s/it] +2025-02-06 00:03:00 - ERROR - stderr - +2025-02-06 00:03:00 - ERROR - stderr - +2025-02-06 00:03:00 - INFO - stdout - {'loss': 0.441, 'grad_norm': 1.643643856048584, 'learning_rate': 5.136477995906583e-06, 'epoch': 2.02} +2025-02-06 00:03:00 - ERROR - stderr - 67%|██████▋ | 15072/22434 [13:55:19<5:03:06, 2.47s/it] +2025-02-06 00:03:02 - ERROR - stderr - 67%|██████▋ | 15073/22434 [13:55:22<5:11:10, 2.54s/it] +2025-02-06 00:03:02 - ERROR - stderr - +2025-02-06 00:03:02 - ERROR - stderr - +2025-02-06 00:03:02 - INFO - stdout - {'loss': 0.407, 'grad_norm': 1.382297396659851, 'learning_rate': 5.1352165544181345e-06, 'epoch': 2.02} +2025-02-06 00:03:02 - ERROR - stderr - 67%|██████▋ | 15073/22434 [13:55:22<5:11:10, 2.54s/it] +2025-02-06 00:03:05 - ERROR - stderr - 67%|██████▋ | 15074/22434 [13:55:25<5:07:37, 2.51s/it] +2025-02-06 00:03:05 - ERROR - stderr - +2025-02-06 00:03:05 - ERROR - stderr - +2025-02-06 00:03:05 - INFO - stdout - {'loss': 0.3946, 'grad_norm': 1.5611246824264526, 'learning_rate': 5.133955214331439e-06, 'epoch': 2.02} +2025-02-06 00:03:05 - ERROR - stderr - 67%|██████▋ | 15074/22434 [13:55:25<5:07:37, 2.51s/it] +2025-02-06 00:03:07 - ERROR - stderr - 67%|██████▋ | 15075/22434 [13:55:27<5:06:29, 2.50s/it] +2025-02-06 00:03:07 - ERROR - stderr - +2025-02-06 00:03:07 - ERROR - stderr - +2025-02-06 00:03:07 - INFO - stdout - {'loss': 0.4056, 'grad_norm': 1.4899537563323975, 'learning_rate': 5.132693975672788e-06, 'epoch': 2.02} +2025-02-06 
00:03:07 - ERROR - stderr - 67%|██████▋ | 15075/22434 [13:55:27<5:06:29, 2.50s/it] +2025-02-06 00:03:10 - ERROR - stderr - 67%|██████▋ | 15076/22434 [13:55:30<5:07:08, 2.50s/it] +2025-02-06 00:03:10 - ERROR - stderr - +2025-02-06 00:03:10 - ERROR - stderr - +2025-02-06 00:03:10 - INFO - stdout - {'loss': 0.3168, 'grad_norm': 1.2148845195770264, 'learning_rate': 5.131432838468482e-06, 'epoch': 2.02} +2025-02-06 00:03:10 - ERROR - stderr - 67%|██████▋ | 15076/22434 [13:55:30<5:07:08, 2.50s/it] +2025-02-06 00:03:12 - ERROR - stderr - 67%|██████▋ | 15077/22434 [13:55:32<5:05:00, 2.49s/it] +2025-02-06 00:03:12 - ERROR - stderr - +2025-02-06 00:03:12 - ERROR - stderr - +2025-02-06 00:03:12 - INFO - stdout - {'loss': 0.3733, 'grad_norm': 1.3161561489105225, 'learning_rate': 5.130171802744795e-06, 'epoch': 2.02} +2025-02-06 00:03:12 - ERROR - stderr - 67%|██████▋ | 15077/22434 [13:55:32<5:05:00, 2.49s/it] +2025-02-06 00:03:15 - ERROR - stderr - 67%|██████▋ | 15078/22434 [13:55:34<5:04:46, 2.49s/it] +2025-02-06 00:03:15 - ERROR - stderr - +2025-02-06 00:03:15 - ERROR - stderr - +2025-02-06 00:03:15 - INFO - stdout - {'loss': 0.3838, 'grad_norm': 1.4447343349456787, 'learning_rate': 5.128910868528017e-06, 'epoch': 2.02} +2025-02-06 00:03:15 - ERROR - stderr - 67%|██████▋ | 15078/22434 [13:55:35<5:04:46, 2.49s/it] +2025-02-06 00:03:17 - ERROR - stderr - 67%|██████▋ | 15079/22434 [13:55:37<5:04:22, 2.48s/it] +2025-02-06 00:03:17 - ERROR - stderr - +2025-02-06 00:03:17 - ERROR - stderr - +2025-02-06 00:03:17 - INFO - stdout - {'loss': 0.4112, 'grad_norm': 1.3097301721572876, 'learning_rate': 5.127650035844429e-06, 'epoch': 2.02} +2025-02-06 00:03:17 - ERROR - stderr - 67%|██████▋ | 15079/22434 [13:55:37<5:04:22, 2.48s/it] +2025-02-06 00:03:20 - ERROR - stderr - 67%|██████▋ | 15080/22434 [13:55:40<5:18:53, 2.60s/it] +2025-02-06 00:03:20 - ERROR - stderr - +2025-02-06 00:03:20 - ERROR - stderr - +2025-02-06 00:03:20 - INFO - stdout - {'loss': 0.374, 'grad_norm': 
1.4044361114501953, 'learning_rate': 5.126389304720316e-06, 'epoch': 2.02} +2025-02-06 00:03:20 - ERROR - stderr - 67%|██████▋ | 15080/22434 [13:55:40<5:18:53, 2.60s/it] +2025-02-06 00:03:23 - ERROR - stderr - 67%|██████▋ | 15081/22434 [13:55:42<5:15:03, 2.57s/it] +2025-02-06 00:03:23 - ERROR - stderr - +2025-02-06 00:03:23 - ERROR - stderr - +2025-02-06 00:03:23 - INFO - stdout - {'loss': 0.3568, 'grad_norm': 1.3084218502044678, 'learning_rate': 5.125128675181954e-06, 'epoch': 2.02} +2025-02-06 00:03:23 - ERROR - stderr - 67%|██████▋ | 15081/22434 [13:55:42<5:15:03, 2.57s/it] +2025-02-06 00:03:25 - ERROR - stderr - 67%|██████▋ | 15082/22434 [13:55:45<5:22:15, 2.63s/it] +2025-02-06 00:03:25 - ERROR - stderr - +2025-02-06 00:03:25 - ERROR - stderr - +2025-02-06 00:03:25 - INFO - stdout - {'loss': 0.4064, 'grad_norm': 1.3356610536575317, 'learning_rate': 5.123868147255619e-06, 'epoch': 2.02} +2025-02-06 00:03:25 - ERROR - stderr - 67%|██████▋ | 15082/22434 [13:55:45<5:22:15, 2.63s/it] +2025-02-06 00:03:28 - ERROR - stderr - 67%|██████▋ | 15083/22434 [13:55:48<5:14:46, 2.57s/it] +2025-02-06 00:03:28 - ERROR - stderr - +2025-02-06 00:03:28 - ERROR - stderr - +2025-02-06 00:03:28 - INFO - stdout - {'loss': 0.3645, 'grad_norm': 1.3402231931686401, 'learning_rate': 5.122607720967588e-06, 'epoch': 2.02} +2025-02-06 00:03:28 - ERROR - stderr - 67%|██████▋ | 15083/22434 [13:55:48<5:14:46, 2.57s/it] +2025-02-06 00:03:30 - ERROR - stderr - 67%|██████▋ | 15084/22434 [13:55:50<5:12:47, 2.55s/it] +2025-02-06 00:03:30 - ERROR - stderr - +2025-02-06 00:03:30 - ERROR - stderr - +2025-02-06 00:03:30 - INFO - stdout - {'loss': 0.373, 'grad_norm': 1.5002535581588745, 'learning_rate': 5.121347396344132e-06, 'epoch': 2.02} +2025-02-06 00:03:30 - ERROR - stderr - 67%|██████▋ | 15084/22434 [13:55:50<5:12:47, 2.55s/it] +2025-02-06 00:03:33 - ERROR - stderr - 67%|██████▋ | 15085/22434 [13:55:53<5:11:33, 2.54s/it] +2025-02-06 00:03:33 - ERROR - stderr - +2025-02-06 00:03:33 - ERROR - stderr - 
+2025-02-06 00:03:33 - INFO - stdout - {'loss': 0.4311, 'grad_norm': 1.558423399925232, 'learning_rate': 5.120087173411523e-06, 'epoch': 2.02} +2025-02-06 00:03:33 - ERROR - stderr - 67%|██████▋ | 15085/22434 [13:55:53<5:11:33, 2.54s/it] +2025-02-06 00:03:35 - ERROR - stderr - 67%|██████▋ | 15086/22434 [13:55:55<5:07:32, 2.51s/it] +2025-02-06 00:03:35 - ERROR - stderr - +2025-02-06 00:03:35 - ERROR - stderr - +2025-02-06 00:03:35 - INFO - stdout - {'loss': 0.3741, 'grad_norm': 1.3377227783203125, 'learning_rate': 5.1188270521960215e-06, 'epoch': 2.02} +2025-02-06 00:03:35 - ERROR - stderr - 67%|██████▋ | 15086/22434 [13:55:55<5:07:32, 2.51s/it] +2025-02-06 00:03:38 - ERROR - stderr - 67%|██████▋ | 15087/22434 [13:55:57<5:05:19, 2.49s/it] +2025-02-06 00:03:38 - ERROR - stderr - +2025-02-06 00:03:38 - ERROR - stderr - +2025-02-06 00:03:38 - INFO - stdout - {'loss': 0.4075, 'grad_norm': 1.5051350593566895, 'learning_rate': 5.117567032723902e-06, 'epoch': 2.02} +2025-02-06 00:03:38 - ERROR - stderr - 67%|██████▋ | 15087/22434 [13:55:57<5:05:19, 2.49s/it] +2025-02-06 00:03:40 - ERROR - stderr - 67%|██████▋ | 15088/22434 [13:56:00<5:05:50, 2.50s/it] +2025-02-06 00:03:40 - ERROR - stderr - +2025-02-06 00:03:40 - ERROR - stderr - +2025-02-06 00:03:40 - INFO - stdout - {'loss': 0.3654, 'grad_norm': 1.3303165435791016, 'learning_rate': 5.116307115021431e-06, 'epoch': 2.02} +2025-02-06 00:03:40 - ERROR - stderr - 67%|██████▋ | 15088/22434 [13:56:00<5:05:50, 2.50s/it] +2025-02-06 00:03:43 - ERROR - stderr - 67%|██████▋ | 15089/22434 [13:56:02<5:04:27, 2.49s/it] +2025-02-06 00:03:43 - ERROR - stderr - +2025-02-06 00:03:43 - ERROR - stderr - +2025-02-06 00:03:43 - INFO - stdout - {'loss': 0.3862, 'grad_norm': 1.4619866609573364, 'learning_rate': 5.115047299114856e-06, 'epoch': 2.02} +2025-02-06 00:03:43 - ERROR - stderr - 67%|██████▋ | 15089/22434 [13:56:02<5:04:27, 2.49s/it] +2025-02-06 00:03:45 - ERROR - stderr - 67%|██████▋ | 15090/22434 [13:56:05<5:05:28, 2.50s/it] 
+2025-02-06 00:03:45 - ERROR - stderr - +2025-02-06 00:03:45 - ERROR - stderr - +2025-02-06 00:03:45 - INFO - stdout - {'loss': 0.3797, 'grad_norm': 1.3716658353805542, 'learning_rate': 5.1137875850304545e-06, 'epoch': 2.02} +2025-02-06 00:03:45 - ERROR - stderr - 67%|██████▋ | 15090/22434 [13:56:05<5:05:28, 2.50s/it] +2025-02-06 00:03:48 - ERROR - stderr - 67%|██████▋ | 15091/22434 [13:56:07<5:05:17, 2.49s/it] +2025-02-06 00:03:48 - ERROR - stderr - +2025-02-06 00:03:48 - ERROR - stderr - +2025-02-06 00:03:48 - INFO - stdout - {'loss': 0.3506, 'grad_norm': 1.338131308555603, 'learning_rate': 5.112527972794465e-06, 'epoch': 2.02} +2025-02-06 00:03:48 - ERROR - stderr - 67%|██████▋ | 15091/22434 [13:56:07<5:05:17, 2.49s/it] +2025-02-06 00:03:50 - ERROR - stderr - 67%|██████▋ | 15092/22434 [13:56:10<5:04:20, 2.49s/it] +2025-02-06 00:03:50 - ERROR - stderr - +2025-02-06 00:03:50 - ERROR - stderr - +2025-02-06 00:03:50 - INFO - stdout - {'loss': 0.3564, 'grad_norm': 1.294083833694458, 'learning_rate': 5.111268462433163e-06, 'epoch': 2.02} +2025-02-06 00:03:50 - ERROR - stderr - 67%|██████▋ | 15092/22434 [13:56:10<5:04:20, 2.49s/it] +2025-02-06 00:03:53 - ERROR - stderr - 67%|██████▋ | 15093/22434 [13:56:12<5:02:26, 2.47s/it] +2025-02-06 00:03:53 - ERROR - stderr - +2025-02-06 00:03:53 - ERROR - stderr - +2025-02-06 00:03:53 - INFO - stdout - {'loss': 0.3472, 'grad_norm': 1.4195901155471802, 'learning_rate': 5.1100090539727884e-06, 'epoch': 2.02} +2025-02-06 00:03:53 - ERROR - stderr - 67%|██████▋ | 15093/22434 [13:56:12<5:02:26, 2.47s/it] +2025-02-06 00:03:55 - ERROR - stderr - 67%|██████▋ | 15094/22434 [13:56:15<5:00:46, 2.46s/it] +2025-02-06 00:03:55 - ERROR - stderr - +2025-02-06 00:03:55 - ERROR - stderr - +2025-02-06 00:03:55 - INFO - stdout - {'loss': 0.4129, 'grad_norm': 1.349548101425171, 'learning_rate': 5.108749747439591e-06, 'epoch': 2.02} +2025-02-06 00:03:55 - ERROR - stderr - 67%|██████▋ | 15094/22434 [13:56:15<5:00:46, 2.46s/it] +2025-02-06 00:03:57 - 
ERROR - stderr - 67%|██████▋ | 15095/22434 [13:56:17<4:59:33, 2.45s/it] +2025-02-06 00:03:57 - ERROR - stderr - +2025-02-06 00:03:57 - ERROR - stderr - +2025-02-06 00:03:57 - INFO - stdout - {'loss': 0.3982, 'grad_norm': 1.545592188835144, 'learning_rate': 5.107490542859832e-06, 'epoch': 2.02} +2025-02-06 00:03:57 - ERROR - stderr - 67%|██████▋ | 15095/22434 [13:56:17<4:59:33, 2.45s/it] +2025-02-06 00:04:00 - ERROR - stderr - 67%|██████▋ | 15096/22434 [13:56:20<5:03:54, 2.48s/it] +2025-02-06 00:04:00 - ERROR - stderr - +2025-02-06 00:04:00 - ERROR - stderr - +2025-02-06 00:04:00 - INFO - stdout - {'loss': 0.3563, 'grad_norm': 1.4069312810897827, 'learning_rate': 5.106231440259748e-06, 'epoch': 2.02} +2025-02-06 00:04:00 - ERROR - stderr - 67%|██████▋ | 15096/22434 [13:56:20<5:03:54, 2.48s/it] +2025-02-06 00:04:02 - ERROR - stderr - 67%|██████▋ | 15097/22434 [13:56:22<5:01:28, 2.47s/it] +2025-02-06 00:04:02 - ERROR - stderr - +2025-02-06 00:04:02 - ERROR - stderr - +2025-02-06 00:04:02 - INFO - stdout - {'loss': 0.3757, 'grad_norm': 1.2842289209365845, 'learning_rate': 5.1049724396655865e-06, 'epoch': 2.02} +2025-02-06 00:04:02 - ERROR - stderr - 67%|██████▋ | 15097/22434 [13:56:22<5:01:28, 2.47s/it] +2025-02-06 00:04:05 - ERROR - stderr - 67%|██████▋ | 15098/22434 [13:56:25<5:03:03, 2.48s/it] +2025-02-06 00:04:05 - ERROR - stderr - +2025-02-06 00:04:05 - ERROR - stderr - +2025-02-06 00:04:05 - INFO - stdout - {'loss': 0.3895, 'grad_norm': 1.4287010431289673, 'learning_rate': 5.10371354110359e-06, 'epoch': 2.02} +2025-02-06 00:04:05 - ERROR - stderr - 67%|██████▋ | 15098/22434 [13:56:25<5:03:03, 2.48s/it] +2025-02-06 00:04:07 - ERROR - stderr - 67%|██████▋ | 15099/22434 [13:56:27<5:02:04, 2.47s/it] +2025-02-06 00:04:07 - ERROR - stderr - +2025-02-06 00:04:07 - ERROR - stderr - +2025-02-06 00:04:07 - INFO - stdout - {'loss': 0.3941, 'grad_norm': 1.3400799036026, 'learning_rate': 5.102454744600001e-06, 'epoch': 2.02} +2025-02-06 00:04:07 - ERROR - stderr - 67%|██████▋ 
| 15099/22434 [13:56:27<5:02:04, 2.47s/it] +2025-02-06 00:04:10 - ERROR - stderr - 67%|██████▋ | 15100/22434 [13:56:30<5:02:04, 2.47s/it] +2025-02-06 00:04:10 - ERROR - stderr - +2025-02-06 00:04:10 - ERROR - stderr - +2025-02-06 00:04:10 - INFO - stdout - {'loss': 0.3917, 'grad_norm': 1.3937876224517822, 'learning_rate': 5.101196050181054e-06, 'epoch': 2.02} +2025-02-06 00:04:10 - ERROR - stderr - 67%|██████▋ | 15100/22434 [13:56:30<5:02:04, 2.47s/it] +2025-02-06 00:04:12 - ERROR - stderr - 67%|██████▋ | 15101/22434 [13:56:32<5:01:31, 2.47s/it] +2025-02-06 00:04:12 - ERROR - stderr - +2025-02-06 00:04:12 - ERROR - stderr - +2025-02-06 00:04:12 - INFO - stdout - {'loss': 0.391, 'grad_norm': 1.411228060722351, 'learning_rate': 5.09993745787299e-06, 'epoch': 2.02} +2025-02-06 00:04:12 - ERROR - stderr - 67%|██████▋ | 15101/22434 [13:56:32<5:01:31, 2.47s/it] +2025-02-06 00:04:15 - ERROR - stderr - 67%|██████▋ | 15102/22434 [13:56:34<4:59:34, 2.45s/it] +2025-02-06 00:04:15 - ERROR - stderr - +2025-02-06 00:04:15 - ERROR - stderr - +2025-02-06 00:04:15 - INFO - stdout - {'loss': 0.3998, 'grad_norm': 1.4551266431808472, 'learning_rate': 5.09867896770204e-06, 'epoch': 2.02} +2025-02-06 00:04:15 - ERROR - stderr - 67%|██████▋ | 15102/22434 [13:56:35<4:59:34, 2.45s/it] +2025-02-06 00:04:17 - ERROR - stderr - 67%|██████▋ | 15103/22434 [13:56:37<5:00:02, 2.46s/it] +2025-02-06 00:04:17 - ERROR - stderr - +2025-02-06 00:04:17 - ERROR - stderr - +2025-02-06 00:04:17 - INFO - stdout - {'loss': 0.3749, 'grad_norm': 1.4065072536468506, 'learning_rate': 5.0974205796944365e-06, 'epoch': 2.02} +2025-02-06 00:04:17 - ERROR - stderr - 67%|██████▋ | 15103/22434 [13:56:37<5:00:02, 2.46s/it] +2025-02-06 00:04:20 - ERROR - stderr - 67%|██████▋ | 15104/22434 [13:56:39<5:01:54, 2.47s/it] +2025-02-06 00:04:20 - ERROR - stderr - +2025-02-06 00:04:20 - ERROR - stderr - +2025-02-06 00:04:20 - INFO - stdout - {'loss': 0.4125, 'grad_norm': 1.5160589218139648, 'learning_rate': 5.096162293876415e-06, 
'epoch': 2.02} +2025-02-06 00:04:20 - ERROR - stderr - 67%|██████▋ | 15104/22434 [13:56:40<5:01:54, 2.47s/it] +2025-02-06 00:04:22 - ERROR - stderr - 67%|██████▋ | 15105/22434 [13:56:42<5:04:30, 2.49s/it] +2025-02-06 00:04:22 - ERROR - stderr - +2025-02-06 00:04:22 - ERROR - stderr - +2025-02-06 00:04:22 - INFO - stdout - {'loss': 0.3688, 'grad_norm': 1.4706947803497314, 'learning_rate': 5.094904110274188e-06, 'epoch': 2.02} +2025-02-06 00:04:22 - ERROR - stderr - 67%|██████▋ | 15105/22434 [13:56:42<5:04:30, 2.49s/it] +2025-02-06 00:04:25 - ERROR - stderr - 67%|██████▋ | 15106/22434 [13:56:44<5:03:37, 2.49s/it] +2025-02-06 00:04:25 - ERROR - stderr - +2025-02-06 00:04:25 - ERROR - stderr - +2025-02-06 00:04:25 - INFO - stdout - {'loss': 0.3363, 'grad_norm': 1.2299705743789673, 'learning_rate': 5.093646028913996e-06, 'epoch': 2.02} +2025-02-06 00:04:25 - ERROR - stderr - 67%|██████▋ | 15106/22434 [13:56:45<5:03:37, 2.49s/it] +2025-02-06 00:04:27 - ERROR - stderr - 67%|██████▋ | 15107/22434 [13:56:47<5:05:16, 2.50s/it] +2025-02-06 00:04:27 - ERROR - stderr - +2025-02-06 00:04:27 - ERROR - stderr - +2025-02-06 00:04:27 - INFO - stdout - {'loss': 0.3676, 'grad_norm': 1.449506163597107, 'learning_rate': 5.092388049822059e-06, 'epoch': 2.02} +2025-02-06 00:04:27 - ERROR - stderr - 67%|██████▋ | 15107/22434 [13:56:47<5:05:16, 2.50s/it] +2025-02-06 00:04:30 - ERROR - stderr - 67%|██████▋ | 15108/22434 [13:56:50<5:07:03, 2.51s/it] +2025-02-06 00:04:30 - ERROR - stderr - +2025-02-06 00:04:30 - ERROR - stderr - +2025-02-06 00:04:30 - INFO - stdout - {'loss': 0.4028, 'grad_norm': 1.5085084438323975, 'learning_rate': 5.091130173024596e-06, 'epoch': 2.02} +2025-02-06 00:04:30 - ERROR - stderr - 67%|██████▋ | 15108/22434 [13:56:50<5:07:03, 2.51s/it] +2025-02-06 00:04:32 - ERROR - stderr - 67%|██████▋ | 15109/22434 [13:56:52<5:04:20, 2.49s/it] +2025-02-06 00:04:32 - ERROR - stderr - +2025-02-06 00:04:32 - ERROR - stderr - +2025-02-06 00:04:32 - INFO - stdout - {'loss': 0.3453, 
'grad_norm': 1.4426831007003784, 'learning_rate': 5.089872398547831e-06, 'epoch': 2.02} +2025-02-06 00:04:32 - ERROR - stderr - 67%|██████▋ | 15109/22434 [13:56:52<5:04:20, 2.49s/it] +2025-02-06 00:04:35 - ERROR - stderr - 67%|██████▋ | 15110/22434 [13:56:55<5:08:17, 2.53s/it] +2025-02-06 00:04:35 - ERROR - stderr - +2025-02-06 00:04:35 - ERROR - stderr - +2025-02-06 00:04:35 - INFO - stdout - {'loss': 0.3505, 'grad_norm': 1.2821072340011597, 'learning_rate': 5.0886147264179685e-06, 'epoch': 2.02} +2025-02-06 00:04:35 - ERROR - stderr - 67%|██████▋ | 15110/22434 [13:56:55<5:08:17, 2.53s/it] +2025-02-06 00:04:38 - ERROR - stderr - 67%|██████▋ | 15111/22434 [13:56:57<5:16:23, 2.59s/it] +2025-02-06 00:04:38 - ERROR - stderr - +2025-02-06 00:04:38 - ERROR - stderr - +2025-02-06 00:04:38 - INFO - stdout - {'loss': 0.3792, 'grad_norm': 1.4536360502243042, 'learning_rate': 5.087357156661241e-06, 'epoch': 2.02} +2025-02-06 00:04:38 - ERROR - stderr - 67%|██████▋ | 15111/22434 [13:56:57<5:16:23, 2.59s/it] +2025-02-06 00:04:40 - ERROR - stderr - 67%|██████▋ | 15112/22434 [13:57:00<5:12:39, 2.56s/it] +2025-02-06 00:04:40 - ERROR - stderr - +2025-02-06 00:04:40 - ERROR - stderr - +2025-02-06 00:04:40 - INFO - stdout - {'loss': 0.4042, 'grad_norm': 1.4036517143249512, 'learning_rate': 5.08609968930385e-06, 'epoch': 2.02} +2025-02-06 00:04:40 - ERROR - stderr - 67%|██████▋ | 15112/22434 [13:57:00<5:12:39, 2.56s/it] +2025-02-06 00:04:43 - ERROR - stderr - 67%|██████▋ | 15113/22434 [13:57:02<5:09:11, 2.53s/it] +2025-02-06 00:04:43 - ERROR - stderr - +2025-02-06 00:04:43 - ERROR - stderr - +2025-02-06 00:04:43 - INFO - stdout - {'loss': 0.3825, 'grad_norm': 1.261043906211853, 'learning_rate': 5.084842324372003e-06, 'epoch': 2.02} +2025-02-06 00:04:43 - ERROR - stderr - 67%|██████▋ | 15113/22434 [13:57:02<5:09:11, 2.53s/it] +2025-02-06 00:04:45 - ERROR - stderr - 67%|██████▋ | 15114/22434 [13:57:05<5:06:49, 2.51s/it] +2025-02-06 00:04:45 - ERROR - stderr - +2025-02-06 00:04:45 - 
ERROR - stderr - +2025-02-06 00:04:45 - INFO - stdout - {'loss': 0.3515, 'grad_norm': 1.376478672027588, 'learning_rate': 5.083585061891925e-06, 'epoch': 2.02} +2025-02-06 00:04:45 - ERROR - stderr - 67%|██████▋ | 15114/22434 [13:57:05<5:06:49, 2.51s/it] +2025-02-06 00:04:48 - ERROR - stderr - 67%|██████▋ | 15115/22434 [13:57:07<5:06:31, 2.51s/it] +2025-02-06 00:04:48 - ERROR - stderr - +2025-02-06 00:04:48 - ERROR - stderr - +2025-02-06 00:04:48 - INFO - stdout - {'loss': 0.3622, 'grad_norm': 1.3559823036193848, 'learning_rate': 5.082327901889801e-06, 'epoch': 2.02} +2025-02-06 00:04:48 - ERROR - stderr - 67%|██████▋ | 15115/22434 [13:57:07<5:06:31, 2.51s/it] +2025-02-06 00:04:51 - ERROR - stderr - 67%|██████▋ | 15116/22434 [13:57:10<5:24:37, 2.66s/it] +2025-02-06 00:04:51 - ERROR - stderr - +2025-02-06 00:04:51 - ERROR - stderr - +2025-02-06 00:04:51 - INFO - stdout - {'loss': 0.3691, 'grad_norm': 1.4550917148590088, 'learning_rate': 5.081070844391855e-06, 'epoch': 2.02} +2025-02-06 00:04:51 - ERROR - stderr - 67%|██████▋ | 15116/22434 [13:57:10<5:24:37, 2.66s/it] +2025-02-06 00:04:53 - ERROR - stderr - 67%|██████▋ | 15117/22434 [13:57:13<5:15:30, 2.59s/it] +2025-02-06 00:04:53 - ERROR - stderr - +2025-02-06 00:04:53 - ERROR - stderr - +2025-02-06 00:04:53 - INFO - stdout - {'loss': 0.3435, 'grad_norm': 1.371978998184204, 'learning_rate': 5.079813889424278e-06, 'epoch': 2.02} +2025-02-06 00:04:53 - ERROR - stderr - 67%|██████▋ | 15117/22434 [13:57:13<5:15:30, 2.59s/it] +2025-02-06 00:04:55 - ERROR - stderr - 67%|██████▋ | 15118/22434 [13:57:15<5:13:05, 2.57s/it] +2025-02-06 00:04:56 - ERROR - stderr - +2025-02-06 00:04:56 - ERROR - stderr - +2025-02-06 00:04:56 - INFO - stdout - {'loss': 0.3149, 'grad_norm': 1.2908315658569336, 'learning_rate': 5.078557037013271e-06, 'epoch': 2.02} +2025-02-06 00:04:56 - ERROR - stderr - 67%|██████▋ | 15118/22434 [13:57:15<5:13:05, 2.57s/it] +2025-02-06 00:04:58 - ERROR - stderr - 67%|██████▋ | 15119/22434 [13:57:18<5:09:56, 
2.54s/it] +2025-02-06 00:04:58 - ERROR - stderr - +2025-02-06 00:04:58 - ERROR - stderr - +2025-02-06 00:04:58 - INFO - stdout - {'loss': 0.3877, 'grad_norm': 1.5395832061767578, 'learning_rate': 5.077300287185034e-06, 'epoch': 2.02} +2025-02-06 00:04:58 - ERROR - stderr - 67%|██████▋ | 15119/22434 [13:57:18<5:09:56, 2.54s/it] +2025-02-06 00:05:00 - ERROR - stderr - 67%|██████▋ | 15120/22434 [13:57:20<5:08:35, 2.53s/it] +2025-02-06 00:05:00 - ERROR - stderr - +2025-02-06 00:05:00 - ERROR - stderr - +2025-02-06 00:05:00 - INFO - stdout - {'loss': 0.3857, 'grad_norm': 1.379804015159607, 'learning_rate': 5.0760436399657605e-06, 'epoch': 2.02} +2025-02-06 00:05:00 - ERROR - stderr - 67%|██████▋ | 15120/22434 [13:57:20<5:08:35, 2.53s/it] +2025-02-06 00:05:03 - ERROR - stderr - 67%|██████▋ | 15121/22434 [13:57:23<5:06:03, 2.51s/it] +2025-02-06 00:05:03 - ERROR - stderr - +2025-02-06 00:05:03 - ERROR - stderr - +2025-02-06 00:05:03 - INFO - stdout - {'loss': 0.3934, 'grad_norm': 1.4979171752929688, 'learning_rate': 5.074787095381647e-06, 'epoch': 2.02} +2025-02-06 00:05:03 - ERROR - stderr - 67%|██████▋ | 15121/22434 [13:57:23<5:06:03, 2.51s/it] +2025-02-06 00:05:05 - ERROR - stderr - 67%|██████▋ | 15122/22434 [13:57:25<5:05:35, 2.51s/it] +2025-02-06 00:05:05 - ERROR - stderr - +2025-02-06 00:05:05 - ERROR - stderr - +2025-02-06 00:05:05 - INFO - stdout - {'loss': 0.425, 'grad_norm': 1.5418989658355713, 'learning_rate': 5.0735306534588826e-06, 'epoch': 2.02} +2025-02-06 00:05:05 - ERROR - stderr - 67%|██████▋ | 15122/22434 [13:57:25<5:05:35, 2.51s/it] +2025-02-06 00:05:08 - ERROR - stderr - 67%|██████▋ | 15123/22434 [13:57:28<5:06:40, 2.52s/it] +2025-02-06 00:05:08 - ERROR - stderr - +2025-02-06 00:05:08 - ERROR - stderr - +2025-02-06 00:05:08 - INFO - stdout - {'loss': 0.3758, 'grad_norm': 1.236465334892273, 'learning_rate': 5.0722743142236585e-06, 'epoch': 2.02} +2025-02-06 00:05:08 - ERROR - stderr - 67%|██████▋ | 15123/22434 [13:57:28<5:06:40, 2.52s/it] +2025-02-06 
00:05:11 - ERROR - stderr - 67%|██████▋ | 15124/22434 [13:57:30<5:08:46, 2.53s/it] +2025-02-06 00:05:11 - ERROR - stderr - +2025-02-06 00:05:11 - ERROR - stderr - +2025-02-06 00:05:11 - INFO - stdout - {'loss': 0.355, 'grad_norm': 1.3377609252929688, 'learning_rate': 5.071018077702161e-06, 'epoch': 2.02} +2025-02-06 00:05:11 - ERROR - stderr - 67%|██████▋ | 15124/22434 [13:57:30<5:08:46, 2.53s/it] +2025-02-06 00:05:13 - ERROR - stderr - 67%|██████▋ | 15125/22434 [13:57:33<5:04:46, 2.50s/it] +2025-02-06 00:05:13 - ERROR - stderr - +2025-02-06 00:05:13 - ERROR - stderr - +2025-02-06 00:05:13 - INFO - stdout - {'loss': 0.4262, 'grad_norm': 1.722356915473938, 'learning_rate': 5.069761943920575e-06, 'epoch': 2.02} +2025-02-06 00:05:13 - ERROR - stderr - 67%|██████▋ | 15125/22434 [13:57:33<5:04:46, 2.50s/it] +2025-02-06 00:05:15 - ERROR - stderr - 67%|██████▋ | 15126/22434 [13:57:35<5:04:01, 2.50s/it] +2025-02-06 00:05:15 - ERROR - stderr - +2025-02-06 00:05:15 - ERROR - stderr - +2025-02-06 00:05:15 - INFO - stdout - {'loss': 0.3874, 'grad_norm': 1.3811054229736328, 'learning_rate': 5.068505912905083e-06, 'epoch': 2.02} +2025-02-06 00:05:15 - ERROR - stderr - 67%|██████▋ | 15126/22434 [13:57:35<5:04:01, 2.50s/it] +2025-02-06 00:05:18 - ERROR - stderr - 67%|██████▋ | 15127/22434 [13:57:38<5:02:25, 2.48s/it] +2025-02-06 00:05:18 - ERROR - stderr - +2025-02-06 00:05:18 - ERROR - stderr - +2025-02-06 00:05:18 - INFO - stdout - {'loss': 0.4458, 'grad_norm': 1.5753281116485596, 'learning_rate': 5.067249984681865e-06, 'epoch': 2.02} +2025-02-06 00:05:18 - ERROR - stderr - 67%|██████▋ | 15127/22434 [13:57:38<5:02:25, 2.48s/it] +2025-02-06 00:05:20 - ERROR - stderr - 67%|██████▋ | 15128/22434 [13:57:40<5:03:43, 2.49s/it] +2025-02-06 00:05:20 - ERROR - stderr - +2025-02-06 00:05:20 - ERROR - stderr - +2025-02-06 00:05:20 - INFO - stdout - {'loss': 0.3857, 'grad_norm': 1.3950414657592773, 'learning_rate': 5.065994159277103e-06, 'epoch': 2.02} +2025-02-06 00:05:20 - ERROR - stderr 
- 67%|██████▋ | 15128/22434 [13:57:40<5:03:43, 2.49s/it] +2025-02-06 00:05:23 - ERROR - stderr - 67%|██████▋ | 15129/22434 [13:57:43<5:03:34, 2.49s/it] +2025-02-06 00:05:23 - ERROR - stderr - +2025-02-06 00:05:23 - ERROR - stderr - +2025-02-06 00:05:23 - INFO - stdout - {'loss': 0.375, 'grad_norm': 1.4190549850463867, 'learning_rate': 5.064738436716972e-06, 'epoch': 2.02} +2025-02-06 00:05:23 - ERROR - stderr - 67%|██████▋ | 15129/22434 [13:57:43<5:03:34, 2.49s/it] +2025-02-06 00:05:26 - ERROR - stderr - 67%|██████▋ | 15130/22434 [13:57:45<5:08:44, 2.54s/it] +2025-02-06 00:05:26 - ERROR - stderr - +2025-02-06 00:05:26 - ERROR - stderr - +2025-02-06 00:05:26 - INFO - stdout - {'loss': 0.3866, 'grad_norm': 1.3524816036224365, 'learning_rate': 5.0634828170276486e-06, 'epoch': 2.02} +2025-02-06 00:05:26 - ERROR - stderr - 67%|██████▋ | 15130/22434 [13:57:45<5:08:44, 2.54s/it] +2025-02-06 00:05:28 - ERROR - stderr - 67%|██████▋ | 15131/22434 [13:57:48<5:05:27, 2.51s/it] +2025-02-06 00:05:28 - ERROR - stderr - +2025-02-06 00:05:28 - ERROR - stderr - +2025-02-06 00:05:28 - INFO - stdout - {'loss': 0.3781, 'grad_norm': 1.4363137483596802, 'learning_rate': 5.062227300235294e-06, 'epoch': 2.02} +2025-02-06 00:05:28 - ERROR - stderr - 67%|██████▋ | 15131/22434 [13:57:48<5:05:27, 2.51s/it] +2025-02-06 00:05:30 - ERROR - stderr - 67%|██████▋ | 15132/22434 [13:57:50<5:03:59, 2.50s/it] +2025-02-06 00:05:30 - ERROR - stderr - +2025-02-06 00:05:30 - ERROR - stderr - +2025-02-06 00:05:30 - INFO - stdout - {'loss': 0.4079, 'grad_norm': 1.4599323272705078, 'learning_rate': 5.06097188636609e-06, 'epoch': 2.02} +2025-02-06 00:05:30 - ERROR - stderr - 67%|██████▋ | 15132/22434 [13:57:50<5:03:59, 2.50s/it] +2025-02-06 00:05:33 - ERROR - stderr - 67%|██████▋ | 15133/22434 [13:57:53<5:04:56, 2.51s/it] +2025-02-06 00:05:33 - ERROR - stderr - +2025-02-06 00:05:33 - ERROR - stderr - +2025-02-06 00:05:33 - INFO - stdout - {'loss': 0.3718, 'grad_norm': 1.4136043787002563, 'learning_rate': 
5.0597165754462065e-06, 'epoch': 2.02} +2025-02-06 00:05:33 - ERROR - stderr - 67%|██████▋ | 15133/22434 [13:57:53<5:04:56, 2.51s/it] +2025-02-06 00:05:35 - ERROR - stderr - 67%|██████▋ | 15134/22434 [13:57:55<5:03:28, 2.49s/it] +2025-02-06 00:05:35 - ERROR - stderr - +2025-02-06 00:05:35 - ERROR - stderr - +2025-02-06 00:05:35 - INFO - stdout - {'loss': 0.3986, 'grad_norm': 1.4858431816101074, 'learning_rate': 5.058461367501794e-06, 'epoch': 2.02} +2025-02-06 00:05:35 - ERROR - stderr - 67%|██████▋ | 15134/22434 [13:57:55<5:03:28, 2.49s/it] +2025-02-06 00:05:38 - ERROR - stderr - 67%|██████▋ | 15135/22434 [13:57:58<5:04:26, 2.50s/it] +2025-02-06 00:05:38 - ERROR - stderr - +2025-02-06 00:05:38 - ERROR - stderr - +2025-02-06 00:05:38 - INFO - stdout - {'loss': 0.3948, 'grad_norm': 1.6382603645324707, 'learning_rate': 5.0572062625590355e-06, 'epoch': 2.02} +2025-02-06 00:05:38 - ERROR - stderr - 67%|██████▋ | 15135/22434 [13:57:58<5:04:26, 2.50s/it] +2025-02-06 00:05:41 - ERROR - stderr - 67%|██████▋ | 15136/22434 [13:58:00<5:07:19, 2.53s/it] +2025-02-06 00:05:41 - ERROR - stderr - +2025-02-06 00:05:41 - ERROR - stderr - +2025-02-06 00:05:41 - INFO - stdout - {'loss': 0.3915, 'grad_norm': 1.5846140384674072, 'learning_rate': 5.055951260644074e-06, 'epoch': 2.02} +2025-02-06 00:05:41 - ERROR - stderr - 67%|██████▋ | 15136/22434 [13:58:00<5:07:19, 2.53s/it] +2025-02-06 00:05:43 - ERROR - stderr - 67%|██████▋ | 15137/22434 [13:58:03<5:04:18, 2.50s/it] +2025-02-06 00:05:43 - ERROR - stderr - +2025-02-06 00:05:43 - ERROR - stderr - +2025-02-06 00:05:43 - INFO - stdout - {'loss': 0.4071, 'grad_norm': 1.5171692371368408, 'learning_rate': 5.054696361783084e-06, 'epoch': 2.02} +2025-02-06 00:05:43 - ERROR - stderr - 67%|██████▋ | 15137/22434 [13:58:03<5:04:18, 2.50s/it] +2025-02-06 00:05:45 - ERROR - stderr - 67%|██████▋ | 15138/22434 [13:58:05<5:03:12, 2.49s/it] +2025-02-06 00:05:46 - ERROR - stderr - +2025-02-06 00:05:46 - ERROR - stderr - +2025-02-06 00:05:46 - INFO - 
stdout - {'loss': 0.3592, 'grad_norm': 1.3030234575271606, 'learning_rate': 5.053441566002214e-06, 'epoch': 2.02} +2025-02-06 00:05:46 - ERROR - stderr - 67%|██████▋ | 15138/22434 [13:58:05<5:03:12, 2.49s/it] +2025-02-06 00:05:48 - ERROR - stderr - 67%|██████▋ | 15139/22434 [13:58:08<4:59:54, 2.47s/it] +2025-02-06 00:05:48 - ERROR - stderr - +2025-02-06 00:05:48 - ERROR - stderr - +2025-02-06 00:05:48 - INFO - stdout - {'loss': 0.3638, 'grad_norm': 1.3448652029037476, 'learning_rate': 5.052186873327617e-06, 'epoch': 2.02} +2025-02-06 00:05:48 - ERROR - stderr - 67%|██████▋ | 15139/22434 [13:58:08<4:59:54, 2.47s/it] +2025-02-06 00:05:50 - ERROR - stderr - 67%|██████▋ | 15140/22434 [13:58:10<5:01:14, 2.48s/it] +2025-02-06 00:05:50 - ERROR - stderr - +2025-02-06 00:05:50 - ERROR - stderr - +2025-02-06 00:05:50 - INFO - stdout - {'loss': 0.4285, 'grad_norm': 1.5697062015533447, 'learning_rate': 5.050932283785457e-06, 'epoch': 2.02} +2025-02-06 00:05:50 - ERROR - stderr - 67%|██████▋ | 15140/22434 [13:58:10<5:01:14, 2.48s/it] +2025-02-06 00:05:53 - ERROR - stderr - 67%|██████▋ | 15141/22434 [13:58:13<5:05:43, 2.52s/it] +2025-02-06 00:05:53 - ERROR - stderr - +2025-02-06 00:05:53 - ERROR - stderr - +2025-02-06 00:05:53 - INFO - stdout - {'loss': 0.3493, 'grad_norm': 1.2767164707183838, 'learning_rate': 5.049677797401875e-06, 'epoch': 2.02} +2025-02-06 00:05:53 - ERROR - stderr - 67%|██████▋ | 15141/22434 [13:58:13<5:05:43, 2.52s/it] +2025-02-06 00:05:55 - ERROR - stderr - 67%|██████▋ | 15142/22434 [13:58:15<5:03:22, 2.50s/it] +2025-02-06 00:05:55 - ERROR - stderr - +2025-02-06 00:05:55 - ERROR - stderr - +2025-02-06 00:05:55 - INFO - stdout - {'loss': 0.3675, 'grad_norm': 1.2968182563781738, 'learning_rate': 5.048423414203022e-06, 'epoch': 2.02} +2025-02-06 00:05:55 - ERROR - stderr - 67%|██████▋ | 15142/22434 [13:58:15<5:03:22, 2.50s/it] +2025-02-06 00:05:58 - ERROR - stderr - 68%|██████▊ | 15143/22434 [13:58:18<5:05:14, 2.51s/it] +2025-02-06 00:05:58 - ERROR - stderr - 
+2025-02-06 00:05:58 - ERROR - stderr - +2025-02-06 00:05:58 - INFO - stdout - {'loss': 0.4235, 'grad_norm': 1.3948919773101807, 'learning_rate': 5.0471691342150445e-06, 'epoch': 2.03} +2025-02-06 00:05:58 - ERROR - stderr - 68%|██████▊ | 15143/22434 [13:58:18<5:05:14, 2.51s/it] +2025-02-06 00:06:00 - ERROR - stderr - 68%|██████▊ | 15144/22434 [13:58:20<5:04:03, 2.50s/it] +2025-02-06 00:06:01 - ERROR - stderr - +2025-02-06 00:06:01 - ERROR - stderr - +2025-02-06 00:06:01 - INFO - stdout - {'loss': 0.4022, 'grad_norm': 1.3990373611450195, 'learning_rate': 5.045914957464086e-06, 'epoch': 2.03} +2025-02-06 00:06:01 - ERROR - stderr - 68%|██████▊ | 15144/22434 [13:58:20<5:04:03, 2.50s/it] +2025-02-06 00:06:03 - ERROR - stderr - 68%|██████▊ | 15145/22434 [13:58:23<5:01:32, 2.48s/it] +2025-02-06 00:06:03 - ERROR - stderr - +2025-02-06 00:06:03 - ERROR - stderr - +2025-02-06 00:06:03 - INFO - stdout - {'loss': 0.4302, 'grad_norm': 1.3211435079574585, 'learning_rate': 5.0446608839762925e-06, 'epoch': 2.03} +2025-02-06 00:06:03 - ERROR - stderr - 68%|██████▊ | 15145/22434 [13:58:23<5:01:32, 2.48s/it] +2025-02-06 00:06:05 - ERROR - stderr - 68%|██████▊ | 15146/22434 [13:58:25<5:00:08, 2.47s/it] +2025-02-06 00:06:05 - ERROR - stderr - +2025-02-06 00:06:05 - ERROR - stderr - +2025-02-06 00:06:05 - INFO - stdout - {'loss': 0.3844, 'grad_norm': 1.4555833339691162, 'learning_rate': 5.0434069137778e-06, 'epoch': 2.03} +2025-02-06 00:06:05 - ERROR - stderr - 68%|██████▊ | 15146/22434 [13:58:25<5:00:08, 2.47s/it] +2025-02-06 00:06:08 - ERROR - stderr - 68%|██████▊ | 15147/22434 [13:58:28<5:01:37, 2.48s/it] +2025-02-06 00:06:08 - ERROR - stderr - +2025-02-06 00:06:08 - ERROR - stderr - +2025-02-06 00:06:08 - INFO - stdout - {'loss': 0.3565, 'grad_norm': 1.3638324737548828, 'learning_rate': 5.042153046894746e-06, 'epoch': 2.03} +2025-02-06 00:06:08 - ERROR - stderr - 68%|██████▊ | 15147/22434 [13:58:28<5:01:37, 2.48s/it] +2025-02-06 00:06:10 - ERROR - stderr - 68%|██████▊ | 
15148/22434 [13:58:30<5:02:14, 2.49s/it] +2025-02-06 00:06:10 - ERROR - stderr - +2025-02-06 00:06:10 - ERROR - stderr - +2025-02-06 00:06:10 - INFO - stdout - {'loss': 0.3833, 'grad_norm': 1.3377161026000977, 'learning_rate': 5.040899283353269e-06, 'epoch': 2.03} +2025-02-06 00:06:10 - ERROR - stderr - 68%|██████▊ | 15148/22434 [13:58:30<5:02:14, 2.49s/it] +2025-02-06 00:06:13 - ERROR - stderr - 68%|██████▊ | 15149/22434 [13:58:33<5:05:32, 2.52s/it] +2025-02-06 00:06:13 - ERROR - stderr - +2025-02-06 00:06:13 - ERROR - stderr - +2025-02-06 00:06:13 - INFO - stdout - {'loss': 0.359, 'grad_norm': 1.2978307008743286, 'learning_rate': 5.039645623179503e-06, 'epoch': 2.03} +2025-02-06 00:06:13 - ERROR - stderr - 68%|██████▊ | 15149/22434 [13:58:33<5:05:32, 2.52s/it] +2025-02-06 00:06:15 - ERROR - stderr - 68%|██████▊ | 15150/22434 [13:58:35<5:04:21, 2.51s/it] +2025-02-06 00:06:15 - ERROR - stderr - +2025-02-06 00:06:15 - ERROR - stderr - +2025-02-06 00:06:15 - INFO - stdout - {'loss': 0.3644, 'grad_norm': 1.5284180641174316, 'learning_rate': 5.038392066399572e-06, 'epoch': 2.03} +2025-02-06 00:06:15 - ERROR - stderr - 68%|██████▊ | 15150/22434 [13:58:35<5:04:21, 2.51s/it] +2025-02-06 00:06:18 - ERROR - stderr - 68%|██████▊ | 15151/22434 [13:58:38<5:04:32, 2.51s/it] +2025-02-06 00:06:18 - ERROR - stderr - +2025-02-06 00:06:18 - ERROR - stderr - +2025-02-06 00:06:18 - INFO - stdout - {'loss': 0.324, 'grad_norm': 1.2375569343566895, 'learning_rate': 5.037138613039614e-06, 'epoch': 2.03} +2025-02-06 00:06:18 - ERROR - stderr - 68%|██████▊ | 15151/22434 [13:58:38<5:04:32, 2.51s/it] +2025-02-06 00:06:20 - ERROR - stderr - 68%|██████▊ | 15152/22434 [13:58:40<5:04:28, 2.51s/it] +2025-02-06 00:06:20 - ERROR - stderr - +2025-02-06 00:06:20 - ERROR - stderr - +2025-02-06 00:06:20 - INFO - stdout - {'loss': 0.4198, 'grad_norm': 1.4792269468307495, 'learning_rate': 5.035885263125753e-06, 'epoch': 2.03} +2025-02-06 00:06:20 - ERROR - stderr - 68%|██████▊ | 15152/22434 
[13:58:40<5:04:28, 2.51s/it] +2025-02-06 00:06:23 - ERROR - stderr - 68%|██████▊ | 15153/22434 [13:58:43<5:02:37, 2.49s/it] +2025-02-06 00:06:23 - ERROR - stderr - +2025-02-06 00:06:23 - ERROR - stderr - +2025-02-06 00:06:23 - INFO - stdout - {'loss': 0.3535, 'grad_norm': 1.5818573236465454, 'learning_rate': 5.034632016684112e-06, 'epoch': 2.03} +2025-02-06 00:06:23 - ERROR - stderr - 68%|██████▊ | 15153/22434 [13:58:43<5:02:37, 2.49s/it] +2025-02-06 00:06:25 - ERROR - stderr - 68%|██████▊ | 15154/22434 [13:58:45<5:06:21, 2.52s/it] +2025-02-06 00:06:26 - ERROR - stderr - +2025-02-06 00:06:26 - ERROR - stderr - +2025-02-06 00:06:26 - INFO - stdout - {'loss': 0.3818, 'grad_norm': 1.479024887084961, 'learning_rate': 5.03337887374082e-06, 'epoch': 2.03} +2025-02-06 00:06:26 - ERROR - stderr - 68%|██████▊ | 15154/22434 [13:58:45<5:06:21, 2.52s/it] +2025-02-06 00:06:28 - ERROR - stderr - 68%|██████▊ | 15155/22434 [13:58:48<5:05:48, 2.52s/it] +2025-02-06 00:06:28 - ERROR - stderr - +2025-02-06 00:06:28 - ERROR - stderr - +2025-02-06 00:06:28 - INFO - stdout - {'loss': 0.413, 'grad_norm': 1.453445315361023, 'learning_rate': 5.032125834321986e-06, 'epoch': 2.03} +2025-02-06 00:06:28 - ERROR - stderr - 68%|██████▊ | 15155/22434 [13:58:48<5:05:48, 2.52s/it] +2025-02-06 00:06:31 - ERROR - stderr - 68%|██████▊ | 15156/22434 [13:58:50<5:05:38, 2.52s/it] +2025-02-06 00:06:31 - ERROR - stderr - +2025-02-06 00:06:31 - ERROR - stderr - +2025-02-06 00:06:31 - INFO - stdout - {'loss': 0.3598, 'grad_norm': 1.3083502054214478, 'learning_rate': 5.030872898453742e-06, 'epoch': 2.03} +2025-02-06 00:06:31 - ERROR - stderr - 68%|██████▊ | 15156/22434 [13:58:50<5:05:38, 2.52s/it] +2025-02-06 00:06:33 - ERROR - stderr - 68%|██████▊ | 15157/22434 [13:58:53<5:05:04, 2.52s/it] +2025-02-06 00:06:33 - ERROR - stderr - +2025-02-06 00:06:33 - ERROR - stderr - +2025-02-06 00:06:33 - INFO - stdout - {'loss': 0.3362, 'grad_norm': 1.2808568477630615, 'learning_rate': 5.029620066162193e-06, 'epoch': 2.03} 
+2025-02-06 00:06:33 - ERROR - stderr - 68%|██████▊ | 15157/22434 [13:58:53<5:05:04, 2.52s/it] +2025-02-06 00:06:36 - ERROR - stderr - 68%|██████▊ | 15158/22434 [13:58:55<5:09:53, 2.56s/it] +2025-02-06 00:06:36 - ERROR - stderr - +2025-02-06 00:06:36 - ERROR - stderr - +2025-02-06 00:06:36 - INFO - stdout - {'loss': 0.3931, 'grad_norm': 1.4797894954681396, 'learning_rate': 5.0283673374734546e-06, 'epoch': 2.03} +2025-02-06 00:06:36 - ERROR - stderr - 68%|██████▊ | 15158/22434 [13:58:55<5:09:53, 2.56s/it] +2025-02-06 00:06:38 - ERROR - stderr - 68%|██████▊ | 15159/22434 [13:58:58<5:10:06, 2.56s/it] +2025-02-06 00:06:38 - ERROR - stderr - +2025-02-06 00:06:38 - ERROR - stderr - +2025-02-06 00:06:38 - INFO - stdout - {'loss': 0.3807, 'grad_norm': 1.4074105024337769, 'learning_rate': 5.02711471241365e-06, 'epoch': 2.03} +2025-02-06 00:06:38 - ERROR - stderr - 68%|██████▊ | 15159/22434 [13:58:58<5:10:06, 2.56s/it] +2025-02-06 00:06:41 - ERROR - stderr - 68%|██████▊ | 15160/22434 [13:59:01<5:08:45, 2.55s/it] +2025-02-06 00:06:41 - ERROR - stderr - +2025-02-06 00:06:41 - ERROR - stderr - +2025-02-06 00:06:41 - INFO - stdout - {'loss': 0.3806, 'grad_norm': 1.571441888809204, 'learning_rate': 5.025862191008872e-06, 'epoch': 2.03} +2025-02-06 00:06:41 - ERROR - stderr - 68%|██████▊ | 15160/22434 [13:59:01<5:08:45, 2.55s/it] +2025-02-06 00:06:43 - ERROR - stderr - 68%|██████▊ | 15161/22434 [13:59:03<5:07:34, 2.54s/it] +2025-02-06 00:06:43 - ERROR - stderr - +2025-02-06 00:06:43 - ERROR - stderr - +2025-02-06 00:06:43 - INFO - stdout - {'loss': 0.4401, 'grad_norm': 1.4238935708999634, 'learning_rate': 5.024609773285245e-06, 'epoch': 2.03} +2025-02-06 00:06:43 - ERROR - stderr - 68%|██████▊ | 15161/22434 [13:59:03<5:07:34, 2.54s/it] +2025-02-06 00:06:46 - ERROR - stderr - 68%|██████▊ | 15162/22434 [13:59:06<5:07:22, 2.54s/it] +2025-02-06 00:06:46 - ERROR - stderr - +2025-02-06 00:06:46 - ERROR - stderr - +2025-02-06 00:06:46 - INFO - stdout - {'loss': 0.4243, 'grad_norm': 
1.5721715688705444, 'learning_rate': 5.023357459268863e-06, 'epoch': 2.03} +2025-02-06 00:06:46 - ERROR - stderr - 68%|██████▊ | 15162/22434 [13:59:06<5:07:22, 2.54s/it] +2025-02-06 00:06:48 - ERROR - stderr - 68%|██████▊ | 15163/22434 [13:59:08<5:06:43, 2.53s/it] +2025-02-06 00:06:48 - ERROR - stderr - +2025-02-06 00:06:48 - ERROR - stderr - +2025-02-06 00:06:48 - INFO - stdout - {'loss': 0.3738, 'grad_norm': 1.3580513000488281, 'learning_rate': 5.022105248985831e-06, 'epoch': 2.03} +2025-02-06 00:06:48 - ERROR - stderr - 68%|██████▊ | 15163/22434 [13:59:08<5:06:43, 2.53s/it] +2025-02-06 00:06:51 - ERROR - stderr - 68%|██████▊ | 15164/22434 [13:59:11<5:05:33, 2.52s/it] +2025-02-06 00:06:51 - ERROR - stderr - +2025-02-06 00:06:51 - ERROR - stderr - +2025-02-06 00:06:51 - INFO - stdout - {'loss': 0.4318, 'grad_norm': 1.5814497470855713, 'learning_rate': 5.020853142462253e-06, 'epoch': 2.03} +2025-02-06 00:06:51 - ERROR - stderr - 68%|██████▊ | 15164/22434 [13:59:11<5:05:33, 2.52s/it] +2025-02-06 00:06:53 - ERROR - stderr - 68%|██████▊ | 15165/22434 [13:59:13<5:05:49, 2.52s/it] +2025-02-06 00:06:53 - ERROR - stderr - +2025-02-06 00:06:53 - ERROR - stderr - +2025-02-06 00:06:53 - INFO - stdout - {'loss': 0.4226, 'grad_norm': 1.4298213720321655, 'learning_rate': 5.019601139724226e-06, 'epoch': 2.03} +2025-02-06 00:06:53 - ERROR - stderr - 68%|██████▊ | 15165/22434 [13:59:13<5:05:49, 2.52s/it] +2025-02-06 00:06:56 - ERROR - stderr - 68%|██████▊ | 15166/22434 [13:59:16<5:20:25, 2.65s/it] +2025-02-06 00:06:56 - ERROR - stderr - +2025-02-06 00:06:56 - ERROR - stderr - +2025-02-06 00:06:56 - INFO - stdout - {'loss': 0.3809, 'grad_norm': 1.4232758283615112, 'learning_rate': 5.018349240797848e-06, 'epoch': 2.03} +2025-02-06 00:06:56 - ERROR - stderr - 68%|██████▊ | 15166/22434 [13:59:16<5:20:25, 2.65s/it] +2025-02-06 00:06:59 - ERROR - stderr - 68%|██████▊ | 15167/22434 [13:59:19<5:15:41, 2.61s/it] +2025-02-06 00:06:59 - ERROR - stderr - +2025-02-06 00:06:59 - ERROR - stderr 
- +2025-02-06 00:06:59 - INFO - stdout - {'loss': 0.3192, 'grad_norm': 1.3235522508621216, 'learning_rate': 5.017097445709214e-06, 'epoch': 2.03} +2025-02-06 00:06:59 - ERROR - stderr - 68%|██████▊ | 15167/22434 [13:59:19<5:15:41, 2.61s/it] +2025-02-06 00:07:01 - ERROR - stderr - 68%|██████▊ | 15168/22434 [13:59:21<5:10:30, 2.56s/it] +2025-02-06 00:07:01 - ERROR - stderr - +2025-02-06 00:07:01 - ERROR - stderr - +2025-02-06 00:07:01 - INFO - stdout - {'loss': 0.3937, 'grad_norm': 1.3963795900344849, 'learning_rate': 5.015845754484414e-06, 'epoch': 2.03} +2025-02-06 00:07:01 - ERROR - stderr - 68%|██████▊ | 15168/22434 [13:59:21<5:10:30, 2.56s/it] +2025-02-06 00:07:04 - ERROR - stderr - 68%|██████▊ | 15169/22434 [13:59:24<5:09:32, 2.56s/it] +2025-02-06 00:07:04 - ERROR - stderr - +2025-02-06 00:07:04 - ERROR - stderr - +2025-02-06 00:07:04 - INFO - stdout - {'loss': 0.4205, 'grad_norm': 1.6338738203048706, 'learning_rate': 5.014594167149541e-06, 'epoch': 2.03} +2025-02-06 00:07:04 - ERROR - stderr - 68%|██████▊ | 15169/22434 [13:59:24<5:09:32, 2.56s/it] +2025-02-06 00:07:06 - ERROR - stderr - 68%|██████▊ | 15170/22434 [13:59:26<5:07:12, 2.54s/it] +2025-02-06 00:07:06 - ERROR - stderr - +2025-02-06 00:07:06 - ERROR - stderr - +2025-02-06 00:07:06 - INFO - stdout - {'loss': 0.3539, 'grad_norm': 1.3646982908248901, 'learning_rate': 5.013342683730682e-06, 'epoch': 2.03} +2025-02-06 00:07:06 - ERROR - stderr - 68%|██████▊ | 15170/22434 [13:59:26<5:07:12, 2.54s/it] +2025-02-06 00:07:09 - ERROR - stderr - 68%|██████▊ | 15171/22434 [13:59:29<5:09:15, 2.55s/it] +2025-02-06 00:07:09 - ERROR - stderr - +2025-02-06 00:07:09 - ERROR - stderr - +2025-02-06 00:07:09 - INFO - stdout - {'loss': 0.3964, 'grad_norm': 1.546512484550476, 'learning_rate': 5.012091304253923e-06, 'epoch': 2.03} +2025-02-06 00:07:09 - ERROR - stderr - 68%|██████▊ | 15171/22434 [13:59:29<5:09:15, 2.55s/it] +2025-02-06 00:07:11 - ERROR - stderr - 68%|██████▊ | 15172/22434 [13:59:31<5:05:50, 2.53s/it] 
+2025-02-06 00:07:11 - ERROR - stderr - +2025-02-06 00:07:11 - ERROR - stderr - +2025-02-06 00:07:11 - INFO - stdout - {'loss': 0.3952, 'grad_norm': 1.3893542289733887, 'learning_rate': 5.010840028745347e-06, 'epoch': 2.03} +2025-02-06 00:07:11 - ERROR - stderr - 68%|██████▊ | 15172/22434 [13:59:31<5:05:50, 2.53s/it] +2025-02-06 00:07:14 - ERROR - stderr - 68%|██████▊ | 15173/22434 [13:59:34<5:03:31, 2.51s/it] +2025-02-06 00:07:14 - ERROR - stderr - +2025-02-06 00:07:14 - ERROR - stderr - +2025-02-06 00:07:14 - INFO - stdout - {'loss': 0.3797, 'grad_norm': 1.4347014427185059, 'learning_rate': 5.009588857231043e-06, 'epoch': 2.03} +2025-02-06 00:07:14 - ERROR - stderr - 68%|██████▊ | 15173/22434 [13:59:34<5:03:31, 2.51s/it] +2025-02-06 00:07:17 - ERROR - stderr - 68%|██████▊ | 15174/22434 [13:59:36<5:10:51, 2.57s/it] +2025-02-06 00:07:17 - ERROR - stderr - +2025-02-06 00:07:17 - ERROR - stderr - +2025-02-06 00:07:17 - INFO - stdout - {'loss': 0.3678, 'grad_norm': 1.3542146682739258, 'learning_rate': 5.008337789737073e-06, 'epoch': 2.03} +2025-02-06 00:07:17 - ERROR - stderr - 68%|██████▊ | 15174/22434 [13:59:36<5:10:51, 2.57s/it] +2025-02-06 00:07:19 - ERROR - stderr - 68%|██████▊ | 15175/22434 [13:59:39<5:08:20, 2.55s/it] +2025-02-06 00:07:19 - ERROR - stderr - +2025-02-06 00:07:19 - ERROR - stderr - +2025-02-06 00:07:19 - INFO - stdout - {'loss': 0.3886, 'grad_norm': 1.3876335620880127, 'learning_rate': 5.007086826289535e-06, 'epoch': 2.03} +2025-02-06 00:07:19 - ERROR - stderr - 68%|██████▊ | 15175/22434 [13:59:39<5:08:20, 2.55s/it] +2025-02-06 00:07:22 - ERROR - stderr - 68%|██████▊ | 15176/22434 [13:59:41<5:09:02, 2.55s/it] +2025-02-06 00:07:22 - ERROR - stderr - +2025-02-06 00:07:22 - ERROR - stderr - +2025-02-06 00:07:22 - INFO - stdout - {'loss': 0.3778, 'grad_norm': 1.85788094997406, 'learning_rate': 5.005835966914485e-06, 'epoch': 2.03} +2025-02-06 00:07:22 - ERROR - stderr - 68%|██████▊ | 15176/22434 [13:59:41<5:09:02, 2.55s/it] +2025-02-06 00:07:24 - 
ERROR - stderr - 68%|██████▊ | 15177/22434 [13:59:44<5:04:46, 2.52s/it] +2025-02-06 00:07:24 - ERROR - stderr - +2025-02-06 00:07:24 - ERROR - stderr - +2025-02-06 00:07:24 - INFO - stdout - {'loss': 0.3645, 'grad_norm': 1.4473538398742676, 'learning_rate': 5.004585211638011e-06, 'epoch': 2.03} +2025-02-06 00:07:24 - ERROR - stderr - 68%|██████▊ | 15177/22434 [13:59:44<5:04:46, 2.52s/it] +2025-02-06 00:07:27 - ERROR - stderr - 68%|██████▊ | 15178/22434 [13:59:46<5:09:50, 2.56s/it] +2025-02-06 00:07:27 - ERROR - stderr - +2025-02-06 00:07:27 - ERROR - stderr - +2025-02-06 00:07:27 - INFO - stdout - {'loss': 0.4184, 'grad_norm': 1.495301365852356, 'learning_rate': 5.003334560486181e-06, 'epoch': 2.03} +2025-02-06 00:07:27 - ERROR - stderr - 68%|██████▊ | 15178/22434 [13:59:47<5:09:50, 2.56s/it] +2025-02-06 00:07:29 - ERROR - stderr - 68%|██████▊ | 15179/22434 [13:59:49<5:06:07, 2.53s/it] +2025-02-06 00:07:29 - ERROR - stderr - +2025-02-06 00:07:29 - ERROR - stderr - +2025-02-06 00:07:29 - INFO - stdout - {'loss': 0.4531, 'grad_norm': 1.7206151485443115, 'learning_rate': 5.002084013485054e-06, 'epoch': 2.03} +2025-02-06 00:07:29 - ERROR - stderr - 68%|██████▊ | 15179/22434 [13:59:49<5:06:07, 2.53s/it] +2025-02-06 00:07:32 - ERROR - stderr - 68%|██████▊ | 15180/22434 [13:59:52<5:11:34, 2.58s/it] +2025-02-06 00:07:32 - ERROR - stderr - +2025-02-06 00:07:32 - ERROR - stderr - +2025-02-06 00:07:32 - INFO - stdout - {'loss': 0.3831, 'grad_norm': 1.387464165687561, 'learning_rate': 5.0008335706607095e-06, 'epoch': 2.03} +2025-02-06 00:07:32 - ERROR - stderr - 68%|██████▊ | 15180/22434 [13:59:52<5:11:34, 2.58s/it] +2025-02-06 00:07:35 - ERROR - stderr - 68%|██████▊ | 15181/22434 [13:59:54<5:15:27, 2.61s/it] +2025-02-06 00:07:35 - ERROR - stderr - +2025-02-06 00:07:35 - ERROR - stderr - +2025-02-06 00:07:35 - INFO - stdout - {'loss': 0.3691, 'grad_norm': 1.582840919494629, 'learning_rate': 4.999583232039202e-06, 'epoch': 2.03} +2025-02-06 00:07:35 - ERROR - stderr - 
68%|██████▊ | 15181/22434 [13:59:54<5:15:27, 2.61s/it] +2025-02-06 00:07:37 - ERROR - stderr - 68%|██████▊ | 15182/22434 [13:59:57<5:16:30, 2.62s/it] +2025-02-06 00:07:37 - ERROR - stderr - +2025-02-06 00:07:37 - ERROR - stderr - +2025-02-06 00:07:37 - INFO - stdout - {'loss': 0.3599, 'grad_norm': 1.3650336265563965, 'learning_rate': 4.998332997646598e-06, 'epoch': 2.03} +2025-02-06 00:07:37 - ERROR - stderr - 68%|██████▊ | 15182/22434 [13:59:57<5:16:30, 2.62s/it] +2025-02-06 00:07:40 - ERROR - stderr - 68%|██████▊ | 15183/22434 [14:00:00<5:25:15, 2.69s/it] +2025-02-06 00:07:40 - ERROR - stderr - +2025-02-06 00:07:40 - ERROR - stderr - +2025-02-06 00:07:40 - INFO - stdout - {'loss': 0.4252, 'grad_norm': 1.5347820520401, 'learning_rate': 4.997082867508956e-06, 'epoch': 2.03} +2025-02-06 00:07:40 - ERROR - stderr - 68%|██████▊ | 15183/22434 [14:00:00<5:25:15, 2.69s/it] +2025-02-06 00:07:43 - ERROR - stderr - 68%|██████▊ | 15184/22434 [14:00:02<5:25:09, 2.69s/it] +2025-02-06 00:07:43 - ERROR - stderr - +2025-02-06 00:07:43 - ERROR - stderr - +2025-02-06 00:07:43 - INFO - stdout - {'loss': 0.3722, 'grad_norm': 1.456618070602417, 'learning_rate': 4.99583284165233e-06, 'epoch': 2.03} +2025-02-06 00:07:43 - ERROR - stderr - 68%|██████▊ | 15184/22434 [14:00:03<5:25:09, 2.69s/it] +2025-02-06 00:07:45 - ERROR - stderr - 68%|██████▊ | 15185/22434 [14:00:05<5:17:47, 2.63s/it] +2025-02-06 00:07:45 - ERROR - stderr - +2025-02-06 00:07:45 - ERROR - stderr - +2025-02-06 00:07:45 - INFO - stdout - {'loss': 0.3729, 'grad_norm': 1.3619157075881958, 'learning_rate': 4.9945829201027894e-06, 'epoch': 2.03} +2025-02-06 00:07:45 - ERROR - stderr - 68%|██████▊ | 15185/22434 [14:00:05<5:17:47, 2.63s/it] +2025-02-06 00:07:48 - ERROR - stderr - 68%|██████▊ | 15186/22434 [14:00:08<5:13:49, 2.60s/it] +2025-02-06 00:07:48 - ERROR - stderr - +2025-02-06 00:07:48 - ERROR - stderr - +2025-02-06 00:07:48 - INFO - stdout - {'loss': 0.3584, 'grad_norm': 1.5675910711288452, 'learning_rate': 
4.993333102886373e-06, 'epoch': 2.03} +2025-02-06 00:07:48 - ERROR - stderr - 68%|██████▊ | 15186/22434 [14:00:08<5:13:49, 2.60s/it] +2025-02-06 00:07:50 - ERROR - stderr - 68%|██████▊ | 15187/22434 [14:00:10<5:13:16, 2.59s/it] +2025-02-06 00:07:50 - ERROR - stderr - +2025-02-06 00:07:50 - ERROR - stderr - +2025-02-06 00:07:50 - INFO - stdout - {'loss': 0.4009, 'grad_norm': 1.4434353113174438, 'learning_rate': 4.992083390029138e-06, 'epoch': 2.03} +2025-02-06 00:07:50 - ERROR - stderr - 68%|██████▊ | 15187/22434 [14:00:10<5:13:16, 2.59s/it] +2025-02-06 00:07:53 - ERROR - stderr - 68%|██████▊ | 15188/22434 [14:00:13<5:07:16, 2.54s/it] +2025-02-06 00:07:53 - ERROR - stderr - +2025-02-06 00:07:53 - ERROR - stderr - +2025-02-06 00:07:53 - INFO - stdout - {'loss': 0.3879, 'grad_norm': 1.5061726570129395, 'learning_rate': 4.990833781557132e-06, 'epoch': 2.03} +2025-02-06 00:07:53 - ERROR - stderr - 68%|██████▊ | 15188/22434 [14:00:13<5:07:16, 2.54s/it] +2025-02-06 00:07:56 - ERROR - stderr - 68%|██████▊ | 15189/22434 [14:00:15<5:17:18, 2.63s/it] +2025-02-06 00:07:56 - ERROR - stderr - +2025-02-06 00:07:56 - ERROR - stderr - +2025-02-06 00:07:56 - INFO - stdout - {'loss': 0.3881, 'grad_norm': 1.4095494747161865, 'learning_rate': 4.989584277496402e-06, 'epoch': 2.03} +2025-02-06 00:07:56 - ERROR - stderr - 68%|██████▊ | 15189/22434 [14:00:15<5:17:18, 2.63s/it] +2025-02-06 00:07:58 - ERROR - stderr - 68%|██████▊ | 15190/22434 [14:00:18<5:11:50, 2.58s/it] +2025-02-06 00:07:58 - ERROR - stderr - +2025-02-06 00:07:58 - ERROR - stderr - +2025-02-06 00:07:58 - INFO - stdout - {'loss': 0.354, 'grad_norm': 1.506284475326538, 'learning_rate': 4.988334877872995e-06, 'epoch': 2.03} +2025-02-06 00:07:58 - ERROR - stderr - 68%|██████▊ | 15190/22434 [14:00:18<5:11:50, 2.58s/it] +2025-02-06 00:08:01 - ERROR - stderr - 68%|██████▊ | 15191/22434 [14:00:20<5:10:04, 2.57s/it] +2025-02-06 00:08:01 - ERROR - stderr - +2025-02-06 00:08:01 - ERROR - stderr - +2025-02-06 00:08:01 - INFO - stdout 
- {'loss': 0.4011, 'grad_norm': 1.6584863662719727, 'learning_rate': 4.987085582712951e-06, 'epoch': 2.03} +2025-02-06 00:08:01 - ERROR - stderr - 68%|██████▊ | 15191/22434 [14:00:20<5:10:04, 2.57s/it] +2025-02-06 00:08:03 - ERROR - stderr - 68%|██████▊ | 15192/22434 [14:00:23<5:08:03, 2.55s/it] +2025-02-06 00:08:03 - ERROR - stderr - +2025-02-06 00:08:03 - ERROR - stderr - +2025-02-06 00:08:03 - INFO - stdout - {'loss': 0.3884, 'grad_norm': 1.3552623987197876, 'learning_rate': 4.985836392042311e-06, 'epoch': 2.03} +2025-02-06 00:08:03 - ERROR - stderr - 68%|██████▊ | 15192/22434 [14:00:23<5:08:03, 2.55s/it] +2025-02-06 00:08:06 - ERROR - stderr - 68%|██████▊ | 15193/22434 [14:00:26<5:12:26, 2.59s/it] +2025-02-06 00:08:06 - ERROR - stderr - +2025-02-06 00:08:06 - ERROR - stderr - +2025-02-06 00:08:06 - INFO - stdout - {'loss': 0.3966, 'grad_norm': 1.555468201637268, 'learning_rate': 4.984587305887113e-06, 'epoch': 2.03} +2025-02-06 00:08:06 - ERROR - stderr - 68%|██████▊ | 15193/22434 [14:00:26<5:12:26, 2.59s/it] +2025-02-06 00:08:08 - ERROR - stderr - 68%|██████▊ | 15194/22434 [14:00:28<5:08:13, 2.55s/it] +2025-02-06 00:08:08 - ERROR - stderr - +2025-02-06 00:08:08 - ERROR - stderr - +2025-02-06 00:08:08 - INFO - stdout - {'loss': 0.4263, 'grad_norm': 1.5172396898269653, 'learning_rate': 4.983338324273397e-06, 'epoch': 2.03} +2025-02-06 00:08:08 - ERROR - stderr - 68%|██████▊ | 15194/22434 [14:00:28<5:08:13, 2.55s/it] +2025-02-06 00:08:11 - ERROR - stderr - 68%|██████▊ | 15195/22434 [14:00:31<5:09:29, 2.57s/it] +2025-02-06 00:08:11 - ERROR - stderr - +2025-02-06 00:08:11 - ERROR - stderr - +2025-02-06 00:08:11 - INFO - stdout - {'loss': 0.3672, 'grad_norm': 1.3437168598175049, 'learning_rate': 4.982089447227187e-06, 'epoch': 2.03} +2025-02-06 00:08:11 - ERROR - stderr - 68%|██████▊ | 15195/22434 [14:00:31<5:09:29, 2.57s/it] +2025-02-06 00:08:13 - ERROR - stderr - 68%|██████▊ | 15196/22434 [14:00:33<5:08:31, 2.56s/it] +2025-02-06 00:08:13 - ERROR - stderr - 
+2025-02-06 00:08:13 - ERROR - stderr - +2025-02-06 00:08:13 - INFO - stdout - {'loss': 0.3858, 'grad_norm': 1.3788877725601196, 'learning_rate': 4.980840674774523e-06, 'epoch': 2.03} +2025-02-06 00:08:13 - ERROR - stderr - 68%|██████▊ | 15196/22434 [14:00:33<5:08:31, 2.56s/it] +2025-02-06 00:08:16 - ERROR - stderr - 68%|██████▊ | 15197/22434 [14:00:36<5:06:00, 2.54s/it] +2025-02-06 00:08:16 - ERROR - stderr - +2025-02-06 00:08:16 - ERROR - stderr - +2025-02-06 00:08:16 - INFO - stdout - {'loss': 0.3747, 'grad_norm': 1.3980292081832886, 'learning_rate': 4.979592006941437e-06, 'epoch': 2.03} +2025-02-06 00:08:16 - ERROR - stderr - 68%|██████▊ | 15197/22434 [14:00:36<5:06:00, 2.54s/it] +2025-02-06 00:08:18 - ERROR - stderr - 68%|██████▊ | 15198/22434 [14:00:38<5:03:34, 2.52s/it] +2025-02-06 00:08:18 - ERROR - stderr - +2025-02-06 00:08:18 - ERROR - stderr - +2025-02-06 00:08:18 - INFO - stdout - {'loss': 0.3945, 'grad_norm': 1.3463141918182373, 'learning_rate': 4.9783434437539444e-06, 'epoch': 2.03} +2025-02-06 00:08:18 - ERROR - stderr - 68%|██████▊ | 15198/22434 [14:00:38<5:03:34, 2.52s/it] +2025-02-06 00:08:21 - ERROR - stderr - 68%|██████▊ | 15199/22434 [14:00:41<5:04:44, 2.53s/it] +2025-02-06 00:08:21 - ERROR - stderr - +2025-02-06 00:08:21 - ERROR - stderr - +2025-02-06 00:08:21 - INFO - stdout - {'loss': 0.3421, 'grad_norm': 1.9077330827713013, 'learning_rate': 4.977094985238085e-06, 'epoch': 2.03} +2025-02-06 00:08:21 - ERROR - stderr - 68%|██████▊ | 15199/22434 [14:00:41<5:04:44, 2.53s/it] +2025-02-06 00:08:23 - ERROR - stderr - 68%|██████▊ | 15200/22434 [14:00:43<5:06:06, 2.54s/it] +2025-02-06 00:08:23 - ERROR - stderr - +2025-02-06 00:08:23 - ERROR - stderr - +2025-02-06 00:08:23 - INFO - stdout - {'loss': 0.3582, 'grad_norm': 1.4432581663131714, 'learning_rate': 4.975846631419866e-06, 'epoch': 2.03} +2025-02-06 00:08:24 - ERROR - stderr - 68%|██████▊ | 15200/22434 [14:00:43<5:06:06, 2.54s/it] +2025-02-06 00:08:26 - ERROR - stderr - 68%|██████▊ | 
15201/22434 [14:00:46<5:13:49, 2.60s/it] +2025-02-06 00:08:26 - ERROR - stderr - +2025-02-06 00:08:26 - ERROR - stderr - +2025-02-06 00:08:26 - INFO - stdout - {'loss': 0.3917, 'grad_norm': 1.4345471858978271, 'learning_rate': 4.974598382325324e-06, 'epoch': 2.03} +2025-02-06 00:08:26 - ERROR - stderr - 68%|██████▊ | 15201/22434 [14:00:46<5:13:49, 2.60s/it] +2025-02-06 00:08:29 - ERROR - stderr - 68%|██████▊ | 15202/22434 [14:00:48<5:10:49, 2.58s/it] +2025-02-06 00:08:29 - ERROR - stderr - +2025-02-06 00:08:29 - ERROR - stderr - +2025-02-06 00:08:29 - INFO - stdout - {'loss': 0.4144, 'grad_norm': 1.4581286907196045, 'learning_rate': 4.973350237980466e-06, 'epoch': 2.03} +2025-02-06 00:08:29 - ERROR - stderr - 68%|██████▊ | 15202/22434 [14:00:49<5:10:49, 2.58s/it] +2025-02-06 00:08:31 - ERROR - stderr - 68%|██████▊ | 15203/22434 [14:00:51<5:09:10, 2.57s/it] +2025-02-06 00:08:31 - ERROR - stderr - +2025-02-06 00:08:31 - ERROR - stderr - +2025-02-06 00:08:31 - INFO - stdout - {'loss': 0.3656, 'grad_norm': 1.3915935754776, 'learning_rate': 4.972102198411309e-06, 'epoch': 2.03} +2025-02-06 00:08:31 - ERROR - stderr - 68%|██████▊ | 15203/22434 [14:00:51<5:09:10, 2.57s/it] +2025-02-06 00:08:34 - ERROR - stderr - 68%|██████▊ | 15204/22434 [14:00:53<5:05:19, 2.53s/it] +2025-02-06 00:08:34 - ERROR - stderr - +2025-02-06 00:08:34 - ERROR - stderr - +2025-02-06 00:08:34 - INFO - stdout - {'loss': 0.4091, 'grad_norm': 1.5647811889648438, 'learning_rate': 4.970854263643878e-06, 'epoch': 2.03} +2025-02-06 00:08:34 - ERROR - stderr - 68%|██████▊ | 15204/22434 [14:00:54<5:05:19, 2.53s/it] +2025-02-06 00:08:36 - ERROR - stderr - 68%|██████▊ | 15205/22434 [14:00:56<5:02:30, 2.51s/it] +2025-02-06 00:08:36 - ERROR - stderr - +2025-02-06 00:08:36 - ERROR - stderr - +2025-02-06 00:08:36 - INFO - stdout - {'loss': 0.3244, 'grad_norm': 1.313264012336731, 'learning_rate': 4.969606433704174e-06, 'epoch': 2.03} +2025-02-06 00:08:36 - ERROR - stderr - 68%|██████▊ | 15205/22434 
[14:00:56<5:02:30, 2.51s/it] +2025-02-06 00:08:39 - ERROR - stderr - 68%|██████▊ | 15206/22434 [14:00:59<5:05:35, 2.54s/it] +2025-02-06 00:08:39 - ERROR - stderr - +2025-02-06 00:08:39 - ERROR - stderr - +2025-02-06 00:08:39 - INFO - stdout - {'loss': 0.4075, 'grad_norm': 1.398871660232544, 'learning_rate': 4.968358708618211e-06, 'epoch': 2.03} +2025-02-06 00:08:39 - ERROR - stderr - 68%|██████▊ | 15206/22434 [14:00:59<5:05:35, 2.54s/it] +2025-02-06 00:08:41 - ERROR - stderr - 68%|██████▊ | 15207/22434 [14:01:01<5:05:19, 2.53s/it] +2025-02-06 00:08:41 - ERROR - stderr - +2025-02-06 00:08:41 - ERROR - stderr - +2025-02-06 00:08:41 - INFO - stdout - {'loss': 0.4017, 'grad_norm': 1.4235507249832153, 'learning_rate': 4.967111088411994e-06, 'epoch': 2.03} +2025-02-06 00:08:41 - ERROR - stderr - 68%|██████▊ | 15207/22434 [14:01:01<5:05:19, 2.53s/it] +2025-02-06 00:08:44 - ERROR - stderr - 68%|██████▊ | 15208/22434 [14:01:04<5:04:23, 2.53s/it] +2025-02-06 00:08:44 - ERROR - stderr - +2025-02-06 00:08:44 - ERROR - stderr - +2025-02-06 00:08:44 - INFO - stdout - {'loss': 0.3981, 'grad_norm': 1.48894202709198, 'learning_rate': 4.9658635731115314e-06, 'epoch': 2.03} +2025-02-06 00:08:44 - ERROR - stderr - 68%|██████▊ | 15208/22434 [14:01:04<5:04:23, 2.53s/it] +2025-02-06 00:08:46 - ERROR - stderr - 68%|██████▊ | 15209/22434 [14:01:06<5:00:25, 2.49s/it] +2025-02-06 00:08:46 - ERROR - stderr - +2025-02-06 00:08:46 - ERROR - stderr - +2025-02-06 00:08:46 - INFO - stdout - {'loss': 0.3455, 'grad_norm': 1.3534085750579834, 'learning_rate': 4.964616162742826e-06, 'epoch': 2.03} +2025-02-06 00:08:46 - ERROR - stderr - 68%|██████▊ | 15209/22434 [14:01:06<5:00:25, 2.49s/it] +2025-02-06 00:08:49 - ERROR - stderr - 68%|██████▊ | 15210/22434 [14:01:09<5:01:14, 2.50s/it] +2025-02-06 00:08:49 - ERROR - stderr - +2025-02-06 00:08:49 - ERROR - stderr - +2025-02-06 00:08:49 - INFO - stdout - {'loss': 0.4114, 'grad_norm': 1.5894485712051392, 'learning_rate': 4.9633688573318775e-06, 'epoch': 
2.03} +2025-02-06 00:08:49 - ERROR - stderr - 68%|██████▊ | 15210/22434 [14:01:09<5:01:14, 2.50s/it] +2025-02-06 00:08:51 - ERROR - stderr - 68%|██████▊ | 15211/22434 [14:01:11<5:01:13, 2.50s/it] +2025-02-06 00:08:51 - ERROR - stderr - +2025-02-06 00:08:51 - ERROR - stderr - +2025-02-06 00:08:51 - INFO - stdout - {'loss': 0.4125, 'grad_norm': 1.669345736503601, 'learning_rate': 4.962121656904686e-06, 'epoch': 2.03} +2025-02-06 00:08:51 - ERROR - stderr - 68%|██████▊ | 15211/22434 [14:01:11<5:01:13, 2.50s/it] +2025-02-06 00:08:54 - ERROR - stderr - 68%|██████▊ | 15212/22434 [14:01:13<4:59:52, 2.49s/it] +2025-02-06 00:08:54 - ERROR - stderr - +2025-02-06 00:08:54 - ERROR - stderr - +2025-02-06 00:08:54 - INFO - stdout - {'loss': 0.4081, 'grad_norm': 1.3715846538543701, 'learning_rate': 4.960874561487248e-06, 'epoch': 2.03} +2025-02-06 00:08:54 - ERROR - stderr - 68%|██████▊ | 15212/22434 [14:01:14<4:59:52, 2.49s/it] +2025-02-06 00:08:56 - ERROR - stderr - 68%|██████▊ | 15213/22434 [14:01:16<5:01:20, 2.50s/it] +2025-02-06 00:08:56 - ERROR - stderr - +2025-02-06 00:08:56 - ERROR - stderr - +2025-02-06 00:08:56 - INFO - stdout - {'loss': 0.3695, 'grad_norm': 1.3820350170135498, 'learning_rate': 4.959627571105557e-06, 'epoch': 2.03} +2025-02-06 00:08:56 - ERROR - stderr - 68%|██████▊ | 15213/22434 [14:01:16<5:01:20, 2.50s/it] +2025-02-06 00:08:59 - ERROR - stderr - 68%|██████▊ | 15214/22434 [14:01:19<5:04:37, 2.53s/it] +2025-02-06 00:08:59 - ERROR - stderr - +2025-02-06 00:08:59 - ERROR - stderr - +2025-02-06 00:08:59 - INFO - stdout - {'loss': 0.3592, 'grad_norm': 1.37521493434906, 'learning_rate': 4.958380685785608e-06, 'epoch': 2.03} +2025-02-06 00:08:59 - ERROR - stderr - 68%|██████▊ | 15214/22434 [14:01:19<5:04:37, 2.53s/it] +2025-02-06 00:09:01 - ERROR - stderr - 68%|██████▊ | 15215/22434 [14:01:21<5:01:25, 2.51s/it] +2025-02-06 00:09:01 - ERROR - stderr - +2025-02-06 00:09:01 - ERROR - stderr - +2025-02-06 00:09:01 - INFO - stdout - {'loss': 0.3825, 'grad_norm': 
1.5038729906082153, 'learning_rate': 4.957133905553387e-06, 'epoch': 2.03} +2025-02-06 00:09:01 - ERROR - stderr - 68%|██████▊ | 15215/22434 [14:01:21<5:01:25, 2.51s/it] +2025-02-06 00:09:04 - ERROR - stderr - 68%|██████▊ | 15216/22434 [14:01:24<5:00:32, 2.50s/it] +2025-02-06 00:09:04 - ERROR - stderr - +2025-02-06 00:09:04 - ERROR - stderr - +2025-02-06 00:09:04 - INFO - stdout - {'loss': 0.4044, 'grad_norm': 1.4716589450836182, 'learning_rate': 4.955887230434886e-06, 'epoch': 2.03} +2025-02-06 00:09:04 - ERROR - stderr - 68%|██████▊ | 15216/22434 [14:01:24<5:00:32, 2.50s/it] +2025-02-06 00:09:06 - ERROR - stderr - 68%|██████▊ | 15217/22434 [14:01:26<5:00:23, 2.50s/it] +2025-02-06 00:09:06 - ERROR - stderr - +2025-02-06 00:09:06 - ERROR - stderr - +2025-02-06 00:09:06 - INFO - stdout - {'loss': 0.3209, 'grad_norm': 1.2435904741287231, 'learning_rate': 4.954640660456088e-06, 'epoch': 2.03} +2025-02-06 00:09:06 - ERROR - stderr - 68%|██████▊ | 15217/22434 [14:01:26<5:00:23, 2.50s/it] +2025-02-06 00:09:09 - ERROR - stderr - 68%|██████▊ | 15218/22434 [14:01:29<5:00:15, 2.50s/it] +2025-02-06 00:09:09 - ERROR - stderr - +2025-02-06 00:09:09 - ERROR - stderr - +2025-02-06 00:09:09 - INFO - stdout - {'loss': 0.3838, 'grad_norm': 1.3445383310317993, 'learning_rate': 4.953394195642982e-06, 'epoch': 2.04} +2025-02-06 00:09:09 - ERROR - stderr - 68%|██████▊ | 15218/22434 [14:01:29<5:00:15, 2.50s/it] +2025-02-06 00:09:11 - ERROR - stderr - 68%|██████▊ | 15219/22434 [14:01:31<5:08:16, 2.56s/it] +2025-02-06 00:09:12 - ERROR - stderr - +2025-02-06 00:09:12 - ERROR - stderr - +2025-02-06 00:09:12 - INFO - stdout - {'loss': 0.3945, 'grad_norm': 1.5189684629440308, 'learning_rate': 4.9521478360215365e-06, 'epoch': 2.04} +2025-02-06 00:09:12 - ERROR - stderr - 68%|██████▊ | 15219/22434 [14:01:31<5:08:16, 2.56s/it] +2025-02-06 00:09:14 - ERROR - stderr - 68%|██████▊ | 15220/22434 [14:01:34<5:07:27, 2.56s/it] +2025-02-06 00:09:14 - ERROR - stderr - +2025-02-06 00:09:14 - ERROR - stderr 
- +2025-02-06 00:09:14 - INFO - stdout - {'loss': 0.3694, 'grad_norm': 1.4853973388671875, 'learning_rate': 4.950901581617747e-06, 'epoch': 2.04} +2025-02-06 00:09:14 - ERROR - stderr - 68%|██████▊ | 15220/22434 [14:01:34<5:07:27, 2.56s/it] +2025-02-06 00:09:16 - ERROR - stderr - 68%|██████▊ | 15221/22434 [14:01:36<5:03:52, 2.53s/it] +2025-02-06 00:09:17 - ERROR - stderr - +2025-02-06 00:09:17 - ERROR - stderr - +2025-02-06 00:09:17 - INFO - stdout - {'loss': 0.3337, 'grad_norm': 1.3932093381881714, 'learning_rate': 4.949655432457575e-06, 'epoch': 2.04} +2025-02-06 00:09:17 - ERROR - stderr - 68%|██████▊ | 15221/22434 [14:01:36<5:03:52, 2.53s/it] +2025-02-06 00:09:19 - ERROR - stderr - 68%|██████▊ | 15222/22434 [14:01:39<5:01:15, 2.51s/it] +2025-02-06 00:09:19 - ERROR - stderr - +2025-02-06 00:09:19 - ERROR - stderr - +2025-02-06 00:09:19 - INFO - stdout - {'loss': 0.3707, 'grad_norm': 1.4687823057174683, 'learning_rate': 4.948409388567007e-06, 'epoch': 2.04} +2025-02-06 00:09:19 - ERROR - stderr - 68%|██████▊ | 15222/22434 [14:01:39<5:01:15, 2.51s/it] +2025-02-06 00:09:21 - ERROR - stderr - 68%|██████▊ | 15223/22434 [14:01:41<4:59:33, 2.49s/it] +2025-02-06 00:09:21 - ERROR - stderr - +2025-02-06 00:09:21 - ERROR - stderr - +2025-02-06 00:09:21 - INFO - stdout - {'loss': 0.3826, 'grad_norm': 1.3845065832138062, 'learning_rate': 4.947163449972016e-06, 'epoch': 2.04} +2025-02-06 00:09:21 - ERROR - stderr - 68%|██████▊ | 15223/22434 [14:01:41<4:59:33, 2.49s/it] +2025-02-06 00:09:24 - ERROR - stderr - 68%|██████▊ | 15224/22434 [14:01:44<4:58:02, 2.48s/it] +2025-02-06 00:09:24 - ERROR - stderr - +2025-02-06 00:09:24 - ERROR - stderr - +2025-02-06 00:09:24 - INFO - stdout - {'loss': 0.3736, 'grad_norm': 1.439473032951355, 'learning_rate': 4.945917616698559e-06, 'epoch': 2.04} +2025-02-06 00:09:24 - ERROR - stderr - 68%|██████▊ | 15224/22434 [14:01:44<4:58:02, 2.48s/it] +2025-02-06 00:09:26 - ERROR - stderr - 68%|██████▊ | 15225/22434 [14:01:46<4:58:35, 2.49s/it] 
+2025-02-06 00:09:26 - ERROR - stderr - +2025-02-06 00:09:26 - ERROR - stderr - +2025-02-06 00:09:26 - INFO - stdout - {'loss': 0.449, 'grad_norm': 1.596977710723877, 'learning_rate': 4.944671888772621e-06, 'epoch': 2.04} +2025-02-06 00:09:26 - ERROR - stderr - 68%|██████▊ | 15225/22434 [14:01:46<4:58:35, 2.49s/it] +2025-02-06 00:09:29 - ERROR - stderr - 68%|██████▊ | 15226/22434 [14:01:49<5:09:27, 2.58s/it] +2025-02-06 00:09:29 - ERROR - stderr - +2025-02-06 00:09:29 - ERROR - stderr - +2025-02-06 00:09:29 - INFO - stdout - {'loss': 0.4011, 'grad_norm': 1.590625524520874, 'learning_rate': 4.943426266220156e-06, 'epoch': 2.04} +2025-02-06 00:09:29 - ERROR - stderr - 68%|██████▊ | 15226/22434 [14:01:49<5:09:27, 2.58s/it] +2025-02-06 00:09:32 - ERROR - stderr - 68%|██████▊ | 15227/22434 [14:01:51<5:06:43, 2.55s/it] +2025-02-06 00:09:32 - ERROR - stderr - +2025-02-06 00:09:32 - ERROR - stderr - +2025-02-06 00:09:32 - INFO - stdout - {'loss': 0.4002, 'grad_norm': 1.3335574865341187, 'learning_rate': 4.942180749067133e-06, 'epoch': 2.04} +2025-02-06 00:09:32 - ERROR - stderr - 68%|██████▊ | 15227/22434 [14:01:51<5:06:43, 2.55s/it] +2025-02-06 00:09:34 - ERROR - stderr - 68%|██████▊ | 15228/22434 [14:01:54<5:06:31, 2.55s/it] +2025-02-06 00:09:34 - ERROR - stderr - +2025-02-06 00:09:34 - ERROR - stderr - +2025-02-06 00:09:34 - INFO - stdout - {'loss': 0.3659, 'grad_norm': 1.3558980226516724, 'learning_rate': 4.9409353373395105e-06, 'epoch': 2.04} +2025-02-06 00:09:34 - ERROR - stderr - 68%|██████▊ | 15228/22434 [14:01:54<5:06:31, 2.55s/it] +2025-02-06 00:09:37 - ERROR - stderr - 68%|██████▊ | 15229/22434 [14:01:56<5:05:06, 2.54s/it] +2025-02-06 00:09:37 - ERROR - stderr - +2025-02-06 00:09:37 - ERROR - stderr - +2025-02-06 00:09:37 - INFO - stdout - {'loss': 0.3966, 'grad_norm': 1.6549838781356812, 'learning_rate': 4.939690031063251e-06, 'epoch': 2.04} +2025-02-06 00:09:37 - ERROR - stderr - 68%|██████▊ | 15229/22434 [14:01:57<5:05:06, 2.54s/it] +2025-02-06 00:09:39 - 
ERROR - stderr - 68%|██████▊ | 15230/22434 [14:01:59<5:02:47, 2.52s/it] +2025-02-06 00:09:39 - ERROR - stderr - +2025-02-06 00:09:39 - ERROR - stderr - +2025-02-06 00:09:39 - INFO - stdout - {'loss': 0.3413, 'grad_norm': 1.2646405696868896, 'learning_rate': 4.938444830264311e-06, 'epoch': 2.04} +2025-02-06 00:09:39 - ERROR - stderr - 68%|██████▊ | 15230/22434 [14:01:59<5:02:47, 2.52s/it] +2025-02-06 00:09:42 - ERROR - stderr - 68%|██████▊ | 15231/22434 [14:02:01<5:01:24, 2.51s/it] +2025-02-06 00:09:42 - ERROR - stderr - +2025-02-06 00:09:42 - ERROR - stderr - +2025-02-06 00:09:42 - INFO - stdout - {'loss': 0.4247, 'grad_norm': 1.458613395690918, 'learning_rate': 4.937199734968644e-06, 'epoch': 2.04} +2025-02-06 00:09:42 - ERROR - stderr - 68%|██████▊ | 15231/22434 [14:02:01<5:01:24, 2.51s/it] +2025-02-06 00:09:44 - ERROR - stderr - 68%|██████▊ | 15232/22434 [14:02:04<5:06:44, 2.56s/it] +2025-02-06 00:09:44 - ERROR - stderr - +2025-02-06 00:09:44 - ERROR - stderr - +2025-02-06 00:09:44 - INFO - stdout - {'loss': 0.4322, 'grad_norm': 1.5858837366104126, 'learning_rate': 4.935954745202205e-06, 'epoch': 2.04} +2025-02-06 00:09:44 - ERROR - stderr - 68%|██████▊ | 15232/22434 [14:02:04<5:06:44, 2.56s/it] +2025-02-06 00:09:47 - ERROR - stderr - 68%|██████▊ | 15233/22434 [14:02:07<5:03:47, 2.53s/it] +2025-02-06 00:09:47 - ERROR - stderr - +2025-02-06 00:09:47 - ERROR - stderr - +2025-02-06 00:09:47 - INFO - stdout - {'loss': 0.4228, 'grad_norm': 1.6323989629745483, 'learning_rate': 4.934709860990944e-06, 'epoch': 2.04} +2025-02-06 00:09:47 - ERROR - stderr - 68%|██████▊ | 15233/22434 [14:02:07<5:03:47, 2.53s/it] +2025-02-06 00:09:49 - ERROR - stderr - 68%|██████▊ | 15234/22434 [14:02:09<5:01:29, 2.51s/it] +2025-02-06 00:09:49 - ERROR - stderr - +2025-02-06 00:09:49 - ERROR - stderr - +2025-02-06 00:09:49 - INFO - stdout - {'loss': 0.386, 'grad_norm': 1.547903060913086, 'learning_rate': 4.933465082360808e-06, 'epoch': 2.04} +2025-02-06 00:09:49 - ERROR - stderr - 
68%|██████▊ | 15234/22434 [14:02:09<5:01:29, 2.51s/it] +2025-02-06 00:09:52 - ERROR - stderr - 68%|██████▊ | 15235/22434 [14:02:12<5:01:50, 2.52s/it] +2025-02-06 00:09:52 - ERROR - stderr - +2025-02-06 00:09:52 - ERROR - stderr - +2025-02-06 00:09:52 - INFO - stdout - {'loss': 0.3745, 'grad_norm': 1.4644842147827148, 'learning_rate': 4.932220409337743e-06, 'epoch': 2.04} +2025-02-06 00:09:52 - ERROR - stderr - 68%|██████▊ | 15235/22434 [14:02:12<5:01:50, 2.52s/it] +2025-02-06 00:09:54 - ERROR - stderr - 68%|██████▊ | 15236/22434 [14:02:14<4:59:27, 2.50s/it] +2025-02-06 00:09:54 - ERROR - stderr - +2025-02-06 00:09:54 - ERROR - stderr - +2025-02-06 00:09:54 - INFO - stdout - {'loss': 0.4071, 'grad_norm': 1.3342421054840088, 'learning_rate': 4.930975841947696e-06, 'epoch': 2.04} +2025-02-06 00:09:54 - ERROR - stderr - 68%|██████▊ | 15236/22434 [14:02:14<4:59:27, 2.50s/it] +2025-02-06 00:09:57 - ERROR - stderr - 68%|██████▊ | 15237/22434 [14:02:17<5:00:19, 2.50s/it] +2025-02-06 00:09:57 - ERROR - stderr - +2025-02-06 00:09:57 - ERROR - stderr - +2025-02-06 00:09:57 - INFO - stdout - {'loss': 0.4127, 'grad_norm': 1.4508602619171143, 'learning_rate': 4.929731380216607e-06, 'epoch': 2.04} +2025-02-06 00:09:57 - ERROR - stderr - 68%|██████▊ | 15237/22434 [14:02:17<5:00:19, 2.50s/it] +2025-02-06 00:09:59 - ERROR - stderr - 68%|██████▊ | 15238/22434 [14:02:19<5:03:09, 2.53s/it] +2025-02-06 00:09:59 - ERROR - stderr - +2025-02-06 00:09:59 - ERROR - stderr - +2025-02-06 00:09:59 - INFO - stdout - {'loss': 0.4485, 'grad_norm': 1.3649914264678955, 'learning_rate': 4.928487024170415e-06, 'epoch': 2.04} +2025-02-06 00:09:59 - ERROR - stderr - 68%|██████▊ | 15238/22434 [14:02:19<5:03:09, 2.53s/it] +2025-02-06 00:10:02 - ERROR - stderr - 68%|██████▊ | 15239/22434 [14:02:22<5:15:09, 2.63s/it] +2025-02-06 00:10:02 - ERROR - stderr - +2025-02-06 00:10:02 - ERROR - stderr - +2025-02-06 00:10:02 - INFO - stdout - {'loss': 0.4209, 'grad_norm': 1.564220666885376, 'learning_rate': 
4.927242773835063e-06, 'epoch': 2.04} +2025-02-06 00:10:02 - ERROR - stderr - 68%|██████▊ | 15239/22434 [14:02:22<5:15:09, 2.63s/it] +2025-02-06 00:10:05 - ERROR - stderr - 68%|██████▊ | 15240/22434 [14:02:25<5:12:13, 2.60s/it] +2025-02-06 00:10:05 - ERROR - stderr - +2025-02-06 00:10:05 - ERROR - stderr - +2025-02-06 00:10:05 - INFO - stdout - {'loss': 0.3891, 'grad_norm': 1.3991637229919434, 'learning_rate': 4.925998629236473e-06, 'epoch': 2.04} +2025-02-06 00:10:05 - ERROR - stderr - 68%|██████▊ | 15240/22434 [14:02:25<5:12:13, 2.60s/it] +2025-02-06 00:10:07 - ERROR - stderr - 68%|██████▊ | 15241/22434 [14:02:27<5:06:19, 2.56s/it] +2025-02-06 00:10:07 - ERROR - stderr - +2025-02-06 00:10:07 - ERROR - stderr - +2025-02-06 00:10:07 - INFO - stdout - {'loss': 0.3879, 'grad_norm': 1.4363185167312622, 'learning_rate': 4.92475459040059e-06, 'epoch': 2.04} +2025-02-06 00:10:07 - ERROR - stderr - 68%|██████▊ | 15241/22434 [14:02:27<5:06:19, 2.56s/it] +2025-02-06 00:10:10 - ERROR - stderr - 68%|██████▊ | 15242/22434 [14:02:29<5:02:54, 2.53s/it] +2025-02-06 00:10:10 - ERROR - stderr - +2025-02-06 00:10:10 - ERROR - stderr - +2025-02-06 00:10:10 - INFO - stdout - {'loss': 0.351, 'grad_norm': 1.3756963014602661, 'learning_rate': 4.923510657353344e-06, 'epoch': 2.04} +2025-02-06 00:10:10 - ERROR - stderr - 68%|██████▊ | 15242/22434 [14:02:29<5:02:54, 2.53s/it] +2025-02-06 00:10:12 - ERROR - stderr - 68%|██████▊ | 15243/22434 [14:02:32<4:58:56, 2.49s/it] +2025-02-06 00:10:12 - ERROR - stderr - +2025-02-06 00:10:12 - ERROR - stderr - +2025-02-06 00:10:12 - INFO - stdout - {'loss': 0.3716, 'grad_norm': 1.4408073425292969, 'learning_rate': 4.922266830120654e-06, 'epoch': 2.04} +2025-02-06 00:10:12 - ERROR - stderr - 68%|██████▊ | 15243/22434 [14:02:32<4:58:56, 2.49s/it] +2025-02-06 00:10:15 - ERROR - stderr - 68%|██████▊ | 15244/22434 [14:02:34<4:58:07, 2.49s/it] +2025-02-06 00:10:15 - ERROR - stderr - +2025-02-06 00:10:15 - ERROR - stderr - +2025-02-06 00:10:15 - INFO - stdout 
- {'loss': 0.3949, 'grad_norm': 1.4855787754058838, 'learning_rate': 4.921023108728461e-06, 'epoch': 2.04} +2025-02-06 00:10:15 - ERROR - stderr - 68%|██████▊ | 15244/22434 [14:02:34<4:58:07, 2.49s/it] +2025-02-06 00:10:17 - ERROR - stderr - 68%|██████▊ | 15245/22434 [14:02:37<4:59:32, 2.50s/it] +2025-02-06 00:10:17 - ERROR - stderr - +2025-02-06 00:10:17 - ERROR - stderr - +2025-02-06 00:10:17 - INFO - stdout - {'loss': 0.3869, 'grad_norm': 1.571374773979187, 'learning_rate': 4.919779493202673e-06, 'epoch': 2.04} +2025-02-06 00:10:17 - ERROR - stderr - 68%|██████▊ | 15245/22434 [14:02:37<4:59:32, 2.50s/it] +2025-02-06 00:10:20 - ERROR - stderr - 68%|██████▊ | 15246/22434 [14:02:39<5:04:35, 2.54s/it] +2025-02-06 00:10:20 - ERROR - stderr - +2025-02-06 00:10:20 - ERROR - stderr - +2025-02-06 00:10:20 - INFO - stdout - {'loss': 0.3796, 'grad_norm': 1.3049968481063843, 'learning_rate': 4.918535983569228e-06, 'epoch': 2.04} +2025-02-06 00:10:20 - ERROR - stderr - 68%|██████▊ | 15246/22434 [14:02:40<5:04:35, 2.54s/it] +2025-02-06 00:10:22 - ERROR - stderr - 68%|██████▊ | 15247/22434 [14:02:42<5:02:22, 2.52s/it] +2025-02-06 00:10:22 - ERROR - stderr - +2025-02-06 00:10:22 - ERROR - stderr - +2025-02-06 00:10:22 - INFO - stdout - {'loss': 0.4057, 'grad_norm': 1.493226408958435, 'learning_rate': 4.917292579854035e-06, 'epoch': 2.04} +2025-02-06 00:10:22 - ERROR - stderr - 68%|██████▊ | 15247/22434 [14:02:42<5:02:22, 2.52s/it] +2025-02-06 00:10:25 - ERROR - stderr - 68%|██████▊ | 15248/22434 [14:02:45<5:08:53, 2.58s/it] +2025-02-06 00:10:25 - ERROR - stderr - +2025-02-06 00:10:25 - ERROR - stderr - +2025-02-06 00:10:25 - INFO - stdout - {'loss': 0.3436, 'grad_norm': 1.2981209754943848, 'learning_rate': 4.916049282083013e-06, 'epoch': 2.04} +2025-02-06 00:10:25 - ERROR - stderr - 68%|██████▊ | 15248/22434 [14:02:45<5:08:53, 2.58s/it] +2025-02-06 00:10:27 - ERROR - stderr - 68%|██████▊ | 15249/22434 [14:02:47<5:05:12, 2.55s/it] +2025-02-06 00:10:27 - ERROR - stderr - 
+2025-02-06 00:10:27 - ERROR - stderr - +2025-02-06 00:10:27 - INFO - stdout - {'loss': 0.4649, 'grad_norm': 1.9203742742538452, 'learning_rate': 4.91480609028208e-06, 'epoch': 2.04} +2025-02-06 00:10:27 - ERROR - stderr - 68%|██████▊ | 15249/22434 [14:02:47<5:05:12, 2.55s/it] +2025-02-06 00:10:30 - ERROR - stderr - 68%|██████▊ | 15250/22434 [14:02:50<5:03:41, 2.54s/it] +2025-02-06 00:10:30 - ERROR - stderr - +2025-02-06 00:10:30 - ERROR - stderr - +2025-02-06 00:10:30 - INFO - stdout - {'loss': 0.394, 'grad_norm': 1.6046770811080933, 'learning_rate': 4.913563004477148e-06, 'epoch': 2.04} +2025-02-06 00:10:30 - ERROR - stderr - 68%|██████▊ | 15250/22434 [14:02:50<5:03:41, 2.54s/it] +2025-02-06 00:10:32 - ERROR - stderr - 68%|██████▊ | 15251/22434 [14:02:52<5:00:11, 2.51s/it] +2025-02-06 00:10:32 - ERROR - stderr - +2025-02-06 00:10:32 - ERROR - stderr - +2025-02-06 00:10:32 - INFO - stdout - {'loss': 0.3911, 'grad_norm': 1.5132733583450317, 'learning_rate': 4.912320024694128e-06, 'epoch': 2.04} +2025-02-06 00:10:32 - ERROR - stderr - 68%|██████▊ | 15251/22434 [14:02:52<5:00:11, 2.51s/it] +2025-02-06 00:10:35 - ERROR - stderr - 68%|██████▊ | 15252/22434 [14:02:55<5:02:27, 2.53s/it] +2025-02-06 00:10:35 - ERROR - stderr - +2025-02-06 00:10:35 - ERROR - stderr - +2025-02-06 00:10:35 - INFO - stdout - {'loss': 0.3762, 'grad_norm': 1.256216287612915, 'learning_rate': 4.911077150958928e-06, 'epoch': 2.04} +2025-02-06 00:10:35 - ERROR - stderr - 68%|██████▊ | 15252/22434 [14:02:55<5:02:27, 2.53s/it] +2025-02-06 00:10:37 - ERROR - stderr - 68%|██████▊ | 15253/22434 [14:02:57<5:01:33, 2.52s/it] +2025-02-06 00:10:37 - ERROR - stderr - +2025-02-06 00:10:37 - ERROR - stderr - +2025-02-06 00:10:37 - INFO - stdout - {'loss': 0.3758, 'grad_norm': 1.7052435874938965, 'learning_rate': 4.909834383297456e-06, 'epoch': 2.04} +2025-02-06 00:10:37 - ERROR - stderr - 68%|██████▊ | 15253/22434 [14:02:57<5:01:33, 2.52s/it] +2025-02-06 00:10:40 - ERROR - stderr - 68%|██████▊ | 15254/22434 
[14:03:00<5:00:05, 2.51s/it] +2025-02-06 00:10:40 - ERROR - stderr - +2025-02-06 00:10:40 - ERROR - stderr - +2025-02-06 00:10:40 - INFO - stdout - {'loss': 0.3905, 'grad_norm': 1.400803565979004, 'learning_rate': 4.908591721735615e-06, 'epoch': 2.04} +2025-02-06 00:10:40 - ERROR - stderr - 68%|██████▊ | 15254/22434 [14:03:00<5:00:05, 2.51s/it] +2025-02-06 00:10:42 - ERROR - stderr - 68%|██████▊ | 15255/22434 [14:03:02<4:56:54, 2.48s/it] +2025-02-06 00:10:42 - ERROR - stderr - +2025-02-06 00:10:42 - ERROR - stderr - +2025-02-06 00:10:42 - INFO - stdout - {'loss': 0.3609, 'grad_norm': 1.473936915397644, 'learning_rate': 4.907349166299308e-06, 'epoch': 2.04} +2025-02-06 00:10:42 - ERROR - stderr - 68%|██████▊ | 15255/22434 [14:03:02<4:56:54, 2.48s/it] +2025-02-06 00:10:45 - ERROR - stderr - 68%|██████▊ | 15256/22434 [14:03:05<4:56:11, 2.48s/it] +2025-02-06 00:10:45 - ERROR - stderr - +2025-02-06 00:10:45 - ERROR - stderr - +2025-02-06 00:10:45 - INFO - stdout - {'loss': 0.386, 'grad_norm': 1.4464904069900513, 'learning_rate': 4.9061067170144335e-06, 'epoch': 2.04} +2025-02-06 00:10:45 - ERROR - stderr - 68%|██████▊ | 15256/22434 [14:03:05<4:56:11, 2.48s/it] +2025-02-06 00:10:47 - ERROR - stderr - 68%|██████▊ | 15257/22434 [14:03:07<4:55:22, 2.47s/it] +2025-02-06 00:10:47 - ERROR - stderr - +2025-02-06 00:10:47 - ERROR - stderr - +2025-02-06 00:10:47 - INFO - stdout - {'loss': 0.3776, 'grad_norm': 1.4845995903015137, 'learning_rate': 4.904864373906892e-06, 'epoch': 2.04} +2025-02-06 00:10:47 - ERROR - stderr - 68%|██████▊ | 15257/22434 [14:03:07<4:55:22, 2.47s/it] +2025-02-06 00:10:50 - ERROR - stderr - 68%|██████▊ | 15258/22434 [14:03:10<4:56:45, 2.48s/it] +2025-02-06 00:10:50 - ERROR - stderr - +2025-02-06 00:10:50 - ERROR - stderr - +2025-02-06 00:10:50 - INFO - stdout - {'loss': 0.3691, 'grad_norm': 1.3443933725357056, 'learning_rate': 4.903622137002579e-06, 'epoch': 2.04} +2025-02-06 00:10:50 - ERROR - stderr - 68%|██████▊ | 15258/22434 [14:03:10<4:56:45, 
2.48s/it] +2025-02-06 00:10:52 - ERROR - stderr - 68%|██████▊ | 15259/22434 [14:03:12<5:00:49, 2.52s/it] +2025-02-06 00:10:52 - ERROR - stderr - +2025-02-06 00:10:52 - ERROR - stderr - +2025-02-06 00:10:52 - INFO - stdout - {'loss': 0.3458, 'grad_norm': 1.3063358068466187, 'learning_rate': 4.9023800063273795e-06, 'epoch': 2.04} +2025-02-06 00:10:52 - ERROR - stderr - 68%|██████▊ | 15259/22434 [14:03:12<5:00:49, 2.52s/it] +2025-02-06 00:10:55 - ERROR - stderr - 68%|██████▊ | 15260/22434 [14:03:15<5:02:14, 2.53s/it] +2025-02-06 00:10:55 - ERROR - stderr - +2025-02-06 00:10:55 - ERROR - stderr - +2025-02-06 00:10:55 - INFO - stdout - {'loss': 0.3771, 'grad_norm': 1.3413947820663452, 'learning_rate': 4.9011379819071935e-06, 'epoch': 2.04} +2025-02-06 00:10:55 - ERROR - stderr - 68%|██████▊ | 15260/22434 [14:03:15<5:02:14, 2.53s/it] +2025-02-06 00:10:57 - ERROR - stderr - 68%|██████▊ | 15261/22434 [14:03:17<5:01:54, 2.53s/it] +2025-02-06 00:10:57 - ERROR - stderr - +2025-02-06 00:10:57 - ERROR - stderr - +2025-02-06 00:10:57 - INFO - stdout - {'loss': 0.4247, 'grad_norm': 1.543323278427124, 'learning_rate': 4.899896063767908e-06, 'epoch': 2.04} +2025-02-06 00:10:57 - ERROR - stderr - 68%|██████▊ | 15261/22434 [14:03:17<5:01:54, 2.53s/it] +2025-02-06 00:11:00 - ERROR - stderr - 68%|██████▊ | 15262/22434 [14:03:20<5:01:19, 2.52s/it] +2025-02-06 00:11:00 - ERROR - stderr - +2025-02-06 00:11:00 - ERROR - stderr - +2025-02-06 00:11:00 - INFO - stdout - {'loss': 0.3917, 'grad_norm': 1.4663028717041016, 'learning_rate': 4.898654251935409e-06, 'epoch': 2.04} +2025-02-06 00:11:00 - ERROR - stderr - 68%|██████▊ | 15262/22434 [14:03:20<5:01:19, 2.52s/it] +2025-02-06 00:11:03 - ERROR - stderr - 68%|██████▊ | 15263/22434 [14:03:22<5:06:35, 2.57s/it] +2025-02-06 00:11:03 - ERROR - stderr - +2025-02-06 00:11:03 - ERROR - stderr - +2025-02-06 00:11:03 - INFO - stdout - {'loss': 0.4333, 'grad_norm': 1.6112192869186401, 'learning_rate': 4.8974125464355845e-06, 'epoch': 2.04} +2025-02-06 
00:11:03 - ERROR - stderr - 68%|██████▊ | 15263/22434 [14:03:22<5:06:35, 2.57s/it] +2025-02-06 00:11:05 - ERROR - stderr - 68%|██████▊ | 15264/22434 [14:03:25<5:04:59, 2.55s/it] +2025-02-06 00:11:05 - ERROR - stderr - +2025-02-06 00:11:05 - ERROR - stderr - +2025-02-06 00:11:05 - INFO - stdout - {'loss': 0.3634, 'grad_norm': 1.3861055374145508, 'learning_rate': 4.8961709472943045e-06, 'epoch': 2.04} +2025-02-06 00:11:05 - ERROR - stderr - 68%|██████▊ | 15264/22434 [14:03:25<5:04:59, 2.55s/it] +2025-02-06 00:11:08 - ERROR - stderr - 68%|██████▊ | 15265/22434 [14:03:27<5:03:10, 2.54s/it] +2025-02-06 00:11:08 - ERROR - stderr - +2025-02-06 00:11:08 - ERROR - stderr - +2025-02-06 00:11:08 - INFO - stdout - {'loss': 0.3607, 'grad_norm': 1.3533494472503662, 'learning_rate': 4.894929454537466e-06, 'epoch': 2.04} +2025-02-06 00:11:08 - ERROR - stderr - 68%|██████▊ | 15265/22434 [14:03:27<5:03:10, 2.54s/it] +2025-02-06 00:11:10 - ERROR - stderr - 68%|██████▊ | 15266/22434 [14:03:30<5:03:55, 2.54s/it] +2025-02-06 00:11:10 - ERROR - stderr - +2025-02-06 00:11:10 - ERROR - stderr - +2025-02-06 00:11:10 - INFO - stdout - {'loss': 0.4189, 'grad_norm': 1.5087858438491821, 'learning_rate': 4.893688068190933e-06, 'epoch': 2.04} +2025-02-06 00:11:10 - ERROR - stderr - 68%|██████▊ | 15266/22434 [14:03:30<5:03:55, 2.54s/it] +2025-02-06 00:11:13 - ERROR - stderr - 68%|██████▊ | 15267/22434 [14:03:32<5:02:25, 2.53s/it] +2025-02-06 00:11:13 - ERROR - stderr - +2025-02-06 00:11:13 - ERROR - stderr - +2025-02-06 00:11:13 - INFO - stdout - {'loss': 0.3882, 'grad_norm': 1.486737847328186, 'learning_rate': 4.892446788280587e-06, 'epoch': 2.04} +2025-02-06 00:11:13 - ERROR - stderr - 68%|██████▊ | 15267/22434 [14:03:32<5:02:25, 2.53s/it] +2025-02-06 00:11:15 - ERROR - stderr - 68%|██████▊ | 15268/22434 [14:03:35<4:59:12, 2.51s/it] +2025-02-06 00:11:15 - ERROR - stderr - +2025-02-06 00:11:15 - ERROR - stderr - +2025-02-06 00:11:15 - INFO - stdout - {'loss': 0.3735, 'grad_norm': 
1.3685330152511597, 'learning_rate': 4.8912056148323e-06, 'epoch': 2.04} +2025-02-06 00:11:15 - ERROR - stderr - 68%|██████▊ | 15268/22434 [14:03:35<4:59:12, 2.51s/it] +2025-02-06 00:11:18 - ERROR - stderr - 68%|██████▊ | 15269/22434 [14:03:37<4:59:51, 2.51s/it] +2025-02-06 00:11:18 - ERROR - stderr - +2025-02-06 00:11:18 - ERROR - stderr - +2025-02-06 00:11:18 - INFO - stdout - {'loss': 0.3422, 'grad_norm': 1.2882962226867676, 'learning_rate': 4.889964547871938e-06, 'epoch': 2.04} +2025-02-06 00:11:18 - ERROR - stderr - 68%|██████▊ | 15269/22434 [14:03:37<4:59:51, 2.51s/it] +2025-02-06 00:11:20 - ERROR - stderr - 68%|██████▊ | 15270/22434 [14:03:40<5:04:42, 2.55s/it] +2025-02-06 00:11:20 - ERROR - stderr - +2025-02-06 00:11:20 - ERROR - stderr - +2025-02-06 00:11:20 - INFO - stdout - {'loss': 0.3588, 'grad_norm': 1.476643443107605, 'learning_rate': 4.888723587425385e-06, 'epoch': 2.04} +2025-02-06 00:11:20 - ERROR - stderr - 68%|██████▊ | 15270/22434 [14:03:40<5:04:42, 2.55s/it] +2025-02-06 00:11:23 - ERROR - stderr - 68%|██████▊ | 15271/22434 [14:03:43<5:06:13, 2.57s/it] +2025-02-06 00:11:23 - ERROR - stderr - +2025-02-06 00:11:23 - ERROR - stderr - +2025-02-06 00:11:23 - INFO - stdout - {'loss': 0.3786, 'grad_norm': 1.3776965141296387, 'learning_rate': 4.887482733518493e-06, 'epoch': 2.04} +2025-02-06 00:11:23 - ERROR - stderr - 68%|██████▊ | 15271/22434 [14:03:43<5:06:13, 2.57s/it] +2025-02-06 00:11:26 - ERROR - stderr - 68%|██████▊ | 15272/22434 [14:03:45<5:08:28, 2.58s/it] +2025-02-06 00:11:26 - ERROR - stderr - +2025-02-06 00:11:26 - ERROR - stderr - +2025-02-06 00:11:26 - INFO - stdout - {'loss': 0.362, 'grad_norm': 1.3672350645065308, 'learning_rate': 4.886241986177132e-06, 'epoch': 2.04} +2025-02-06 00:11:26 - ERROR - stderr - 68%|██████▊ | 15272/22434 [14:03:45<5:08:28, 2.58s/it] +2025-02-06 00:11:28 - ERROR - stderr - 68%|██████▊ | 15273/22434 [14:03:48<5:05:54, 2.56s/it] +2025-02-06 00:11:28 - ERROR - stderr - +2025-02-06 00:11:28 - ERROR - stderr - 
+2025-02-06 00:11:28 - INFO - stdout - {'loss': 0.3524, 'grad_norm': 1.4884644746780396, 'learning_rate': 4.885001345427163e-06, 'epoch': 2.04} +2025-02-06 00:11:28 - ERROR - stderr - 68%|██████▊ | 15273/22434 [14:03:48<5:05:54, 2.56s/it] +2025-02-06 00:11:31 - ERROR - stderr - 68%|██████▊ | 15274/22434 [14:03:50<5:05:50, 2.56s/it] +2025-02-06 00:11:31 - ERROR - stderr - +2025-02-06 00:11:31 - ERROR - stderr - +2025-02-06 00:11:31 - INFO - stdout - {'loss': 0.4219, 'grad_norm': 1.5446165800094604, 'learning_rate': 4.8837608112944456e-06, 'epoch': 2.04} +2025-02-06 00:11:31 - ERROR - stderr - 68%|██████▊ | 15274/22434 [14:03:50<5:05:50, 2.56s/it] +2025-02-06 00:11:33 - ERROR - stderr - 68%|██████▊ | 15275/22434 [14:03:53<5:05:42, 2.56s/it] +2025-02-06 00:11:33 - ERROR - stderr - +2025-02-06 00:11:33 - ERROR - stderr - +2025-02-06 00:11:33 - INFO - stdout - {'loss': 0.293, 'grad_norm': 1.1544125080108643, 'learning_rate': 4.88252038380484e-06, 'epoch': 2.04} +2025-02-06 00:11:33 - ERROR - stderr - 68%|██████▊ | 15275/22434 [14:03:53<5:05:42, 2.56s/it] +2025-02-06 00:11:36 - ERROR - stderr - 68%|██████▊ | 15276/22434 [14:03:55<5:05:20, 2.56s/it] +2025-02-06 00:11:36 - ERROR - stderr - +2025-02-06 00:11:36 - ERROR - stderr - +2025-02-06 00:11:36 - INFO - stdout - {'loss': 0.3475, 'grad_norm': 1.2423200607299805, 'learning_rate': 4.881280062984198e-06, 'epoch': 2.04} +2025-02-06 00:11:36 - ERROR - stderr - 68%|██████▊ | 15276/22434 [14:03:56<5:05:20, 2.56s/it] +2025-02-06 00:11:38 - ERROR - stderr - 68%|██████▊ | 15277/22434 [14:03:58<5:03:01, 2.54s/it] +2025-02-06 00:11:38 - ERROR - stderr - +2025-02-06 00:11:38 - ERROR - stderr - +2025-02-06 00:11:38 - INFO - stdout - {'loss': 0.4141, 'grad_norm': 1.3745707273483276, 'learning_rate': 4.880039848858377e-06, 'epoch': 2.04} +2025-02-06 00:11:38 - ERROR - stderr - 68%|██████▊ | 15277/22434 [14:03:58<5:03:01, 2.54s/it] +2025-02-06 00:11:41 - ERROR - stderr - 68%|██████▊ | 15278/22434 [14:04:00<5:01:23, 2.53s/it]
+2025-02-06 00:11:41 - ERROR - stderr - +2025-02-06 00:11:41 - ERROR - stderr - +2025-02-06 00:11:41 - INFO - stdout - {'loss': 0.3768, 'grad_norm': 1.3408139944076538, 'learning_rate': 4.878799741453225e-06, 'epoch': 2.04} +2025-02-06 00:11:41 - ERROR - stderr - 68%|██████▊ | 15278/22434 [14:04:01<5:01:23, 2.53s/it] +2025-02-06 00:11:44 - ERROR - stderr - 68%|██████▊ | 15279/22434 [14:04:03<5:14:26, 2.64s/it] +2025-02-06 00:11:44 - ERROR - stderr - +2025-02-06 00:11:44 - ERROR - stderr - +2025-02-06 00:11:44 - INFO - stdout - {'loss': 0.3642, 'grad_norm': 1.4538769721984863, 'learning_rate': 4.877559740794593e-06, 'epoch': 2.04} +2025-02-06 00:11:44 - ERROR - stderr - 68%|██████▊ | 15279/22434 [14:04:03<5:14:26, 2.64s/it] +2025-02-06 00:11:46 - ERROR - stderr - 68%|██████▊ | 15280/22434 [14:04:06<5:10:02, 2.60s/it] +2025-02-06 00:11:46 - ERROR - stderr - +2025-02-06 00:11:46 - ERROR - stderr - +2025-02-06 00:11:46 - INFO - stdout - {'loss': 0.3734, 'grad_norm': 1.4617369174957275, 'learning_rate': 4.876319846908326e-06, 'epoch': 2.04} +2025-02-06 00:11:46 - ERROR - stderr - 68%|██████▊ | 15280/22434 [14:04:06<5:10:02, 2.60s/it] +2025-02-06 00:11:49 - ERROR - stderr - 68%|██████▊ | 15281/22434 [14:04:09<5:28:42, 2.76s/it] +2025-02-06 00:11:49 - ERROR - stderr - +2025-02-06 00:11:49 - ERROR - stderr - +2025-02-06 00:11:49 - INFO - stdout - {'loss': 0.4604, 'grad_norm': 1.7942547798156738, 'learning_rate': 4.875080059820268e-06, 'epoch': 2.04} +2025-02-06 00:11:49 - ERROR - stderr - 68%|██████▊ | 15281/22434 [14:04:09<5:28:42, 2.76s/it] +2025-02-06 00:11:52 - ERROR - stderr - 68%|██████▊ | 15282/22434 [14:04:12<5:20:07, 2.69s/it] +2025-02-06 00:11:52 - ERROR - stderr - +2025-02-06 00:11:52 - ERROR - stderr - +2025-02-06 00:11:52 - INFO - stdout - {'loss': 0.4038, 'grad_norm': 1.6429582834243774, 'learning_rate': 4.873840379556268e-06, 'epoch': 2.04} +2025-02-06 00:11:52 - ERROR - stderr - 68%|██████▊ | 15282/22434 [14:04:12<5:20:07, 2.69s/it] +2025-02-06 00:11:54 - 
ERROR - stderr - 68%|██████▊ | 15283/22434 [14:04:14<5:10:57, 2.61s/it] +2025-02-06 00:11:54 - ERROR - stderr - +2025-02-06 00:11:54 - ERROR - stderr - +2025-02-06 00:11:54 - INFO - stdout - {'loss': 0.4507, 'grad_norm': 1.4804496765136719, 'learning_rate': 4.87260080614215e-06, 'epoch': 2.04} +2025-02-06 00:11:54 - ERROR - stderr - 68%|██████▊ | 15283/22434 [14:04:14<5:10:57, 2.61s/it] +2025-02-06 00:11:57 - ERROR - stderr - 68%|██████▊ | 15284/22434 [14:04:16<5:04:41, 2.56s/it] +2025-02-06 00:11:57 - ERROR - stderr - +2025-02-06 00:11:57 - ERROR - stderr - +2025-02-06 00:11:57 - INFO - stdout - {'loss': 0.4226, 'grad_norm': 1.5194768905639648, 'learning_rate': 4.87136133960377e-06, 'epoch': 2.04} +2025-02-06 00:11:57 - ERROR - stderr - 68%|██████▊ | 15284/22434 [14:04:16<5:04:41, 2.56s/it] +2025-02-06 00:11:59 - ERROR - stderr - 68%|██████▊ | 15285/22434 [14:04:19<5:02:39, 2.54s/it] +2025-02-06 00:11:59 - ERROR - stderr - +2025-02-06 00:11:59 - ERROR - stderr - +2025-02-06 00:11:59 - INFO - stdout - {'loss': 0.3851, 'grad_norm': 1.3231741189956665, 'learning_rate': 4.8701219799669495e-06, 'epoch': 2.04} +2025-02-06 00:11:59 - ERROR - stderr - 68%|██████▊ | 15285/22434 [14:04:19<5:02:39, 2.54s/it] +2025-02-06 00:12:02 - ERROR - stderr - 68%|██████▊ | 15286/22434 [14:04:21<4:59:29, 2.51s/it] +2025-02-06 00:12:02 - ERROR - stderr - +2025-02-06 00:12:02 - ERROR - stderr - +2025-02-06 00:12:02 - INFO - stdout - {'loss': 0.3693, 'grad_norm': 1.431269884109497, 'learning_rate': 4.86888272725753e-06, 'epoch': 2.04} +2025-02-06 00:12:02 - ERROR - stderr - 68%|██████▊ | 15286/22434 [14:04:21<4:59:29, 2.51s/it] +2025-02-06 00:12:04 - ERROR - stderr - 68%|██████▊ | 15287/22434 [14:04:24<4:57:21, 2.50s/it] +2025-02-06 00:12:04 - ERROR - stderr - +2025-02-06 00:12:04 - ERROR - stderr - +2025-02-06 00:12:04 - INFO - stdout - {'loss': 0.4273, 'grad_norm': 1.5591132640838623, 'learning_rate': 4.867643581501345e-06, 'epoch': 2.04} +2025-02-06 00:12:04 - ERROR - stderr - 
68%|██████▊ | 15287/22434 [14:04:24<4:57:21, 2.50s/it] +2025-02-06 00:12:07 - ERROR - stderr - 68%|██████▊ | 15288/22434 [14:04:26<4:58:38, 2.51s/it] +2025-02-06 00:12:07 - ERROR - stderr - +2025-02-06 00:12:07 - ERROR - stderr - +2025-02-06 00:12:07 - INFO - stdout - {'loss': 0.4169, 'grad_norm': 1.5449163913726807, 'learning_rate': 4.866404542724209e-06, 'epoch': 2.04} +2025-02-06 00:12:07 - ERROR - stderr - 68%|██████▊ | 15288/22434 [14:04:26<4:58:38, 2.51s/it] +2025-02-06 00:12:09 - ERROR - stderr - 68%|██████▊ | 15289/22434 [14:04:29<4:56:18, 2.49s/it] +2025-02-06 00:12:09 - ERROR - stderr - +2025-02-06 00:12:09 - ERROR - stderr - +2025-02-06 00:12:09 - INFO - stdout - {'loss': 0.3809, 'grad_norm': 1.5689740180969238, 'learning_rate': 4.865165610951966e-06, 'epoch': 2.04} +2025-02-06 00:12:09 - ERROR - stderr - 68%|██████▊ | 15289/22434 [14:04:29<4:56:18, 2.49s/it] +2025-02-06 00:12:12 - ERROR - stderr - 68%|██████▊ | 15290/22434 [14:04:31<4:59:57, 2.52s/it] +2025-02-06 00:12:12 - ERROR - stderr - +2025-02-06 00:12:12 - ERROR - stderr - +2025-02-06 00:12:12 - INFO - stdout - {'loss': 0.4197, 'grad_norm': 1.4284613132476807, 'learning_rate': 4.86392678621043e-06, 'epoch': 2.04} +2025-02-06 00:12:12 - ERROR - stderr - 68%|██████▊ | 15290/22434 [14:04:31<4:59:57, 2.52s/it] +2025-02-06 00:12:14 - ERROR - stderr - 68%|██████▊ | 15291/22434 [14:04:34<4:56:52, 2.49s/it] +2025-02-06 00:12:14 - ERROR - stderr - +2025-02-06 00:12:14 - ERROR - stderr - +2025-02-06 00:12:14 - INFO - stdout - {'loss': 0.4081, 'grad_norm': 1.4720640182495117, 'learning_rate': 4.862688068525424e-06, 'epoch': 2.04} +2025-02-06 00:12:14 - ERROR - stderr - 68%|██████▊ | 15291/22434 [14:04:34<4:56:52, 2.49s/it] +2025-02-06 00:12:17 - ERROR - stderr - 68%|██████▊ | 15292/22434 [14:04:36<4:56:43, 2.49s/it] +2025-02-06 00:12:17 - ERROR - stderr - +2025-02-06 00:12:17 - ERROR - stderr - +2025-02-06 00:12:17 - INFO - stdout - {'loss': 0.4029, 'grad_norm': 1.5082443952560425, 'learning_rate': 
4.86144945792277e-06, 'epoch': 2.04} +2025-02-06 00:12:17 - ERROR - stderr - 68%|██████▊ | 15292/22434 [14:04:36<4:56:43, 2.49s/it] +2025-02-06 00:12:19 - ERROR - stderr - 68%|██████▊ | 15293/22434 [14:04:39<5:01:36, 2.53s/it] +2025-02-06 00:12:19 - ERROR - stderr - +2025-02-06 00:12:19 - ERROR - stderr - +2025-02-06 00:12:19 - INFO - stdout - {'loss': 0.3803, 'grad_norm': 1.4518718719482422, 'learning_rate': 4.860210954428285e-06, 'epoch': 2.05} +2025-02-06 00:12:19 - ERROR - stderr - 68%|██████▊ | 15293/22434 [14:04:39<5:01:36, 2.53s/it] +2025-02-06 00:12:22 - ERROR - stderr - 68%|██████▊ | 15294/22434 [14:04:41<5:01:59, 2.54s/it] +2025-02-06 00:12:22 - ERROR - stderr - +2025-02-06 00:12:22 - ERROR - stderr - +2025-02-06 00:12:22 - INFO - stdout - {'loss': 0.3502, 'grad_norm': 1.3054760694503784, 'learning_rate': 4.858972558067784e-06, 'epoch': 2.05} +2025-02-06 00:12:22 - ERROR - stderr - 68%|██████▊ | 15294/22434 [14:04:42<5:01:59, 2.54s/it] +2025-02-06 00:12:24 - ERROR - stderr - 68%|██████▊ | 15295/22434 [14:04:44<5:06:35, 2.58s/it] +2025-02-06 00:12:24 - ERROR - stderr - +2025-02-06 00:12:24 - ERROR - stderr - +2025-02-06 00:12:24 - INFO - stdout - {'loss': 0.42, 'grad_norm': 1.5468804836273193, 'learning_rate': 4.857734268867082e-06, 'epoch': 2.05} +2025-02-06 00:12:24 - ERROR - stderr - 68%|██████▊ | 15295/22434 [14:04:44<5:06:35, 2.58s/it] +2025-02-06 00:12:27 - ERROR - stderr - 68%|██████▊ | 15296/22434 [14:04:47<5:06:03, 2.57s/it] +2025-02-06 00:12:27 - ERROR - stderr - +2025-02-06 00:12:27 - ERROR - stderr - +2025-02-06 00:12:27 - INFO - stdout - {'loss': 0.3736, 'grad_norm': 1.467820167541504, 'learning_rate': 4.856496086851986e-06, 'epoch': 2.05} +2025-02-06 00:12:27 - ERROR - stderr - 68%|██████▊ | 15296/22434 [14:04:47<5:06:03, 2.57s/it] +2025-02-06 00:12:29 - ERROR - stderr - 68%|██████▊ | 15297/22434 [14:04:49<5:02:56, 2.55s/it] +2025-02-06 00:12:29 - ERROR - stderr - +2025-02-06 00:12:29 - ERROR - stderr - +2025-02-06 00:12:29 - INFO - stdout - 
{'loss': 0.3558, 'grad_norm': 1.5120972394943237, 'learning_rate': 4.855258012048309e-06, 'epoch': 2.05} +2025-02-06 00:12:29 - ERROR - stderr - 68%|██████▊ | 15297/22434 [14:04:49<5:02:56, 2.55s/it] +2025-02-06 00:12:32 - ERROR - stderr - 68%|██████▊ | 15298/22434 [14:04:52<4:58:55, 2.51s/it] +2025-02-06 00:12:32 - ERROR - stderr - +2025-02-06 00:12:32 - ERROR - stderr - +2025-02-06 00:12:32 - INFO - stdout - {'loss': 0.4203, 'grad_norm': 1.479880452156067, 'learning_rate': 4.854020044481855e-06, 'epoch': 2.05} +2025-02-06 00:12:32 - ERROR - stderr - 68%|██████▊ | 15298/22434 [14:04:52<4:58:55, 2.51s/it] +2025-02-06 00:12:35 - ERROR - stderr - 68%|██████▊ | 15299/22434 [14:04:54<5:08:32, 2.59s/it] +2025-02-06 00:12:35 - ERROR - stderr - +2025-02-06 00:12:35 - ERROR - stderr - +2025-02-06 00:12:35 - INFO - stdout - {'loss': 0.3882, 'grad_norm': 1.5339252948760986, 'learning_rate': 4.852782184178431e-06, 'epoch': 2.05} +2025-02-06 00:12:35 - ERROR - stderr - 68%|██████▊ | 15299/22434 [14:04:54<5:08:32, 2.59s/it] +2025-02-06 00:12:37 - ERROR - stderr - 68%|██████▊ | 15300/22434 [14:04:57<5:06:39, 2.58s/it] +2025-02-06 00:12:37 - ERROR - stderr - +2025-02-06 00:12:37 - ERROR - stderr - +2025-02-06 00:12:37 - INFO - stdout - {'loss': 0.3336, 'grad_norm': 1.3302209377288818, 'learning_rate': 4.851544431163835e-06, 'epoch': 2.05} +2025-02-06 00:12:37 - ERROR - stderr - 68%|██████▊ | 15300/22434 [14:04:57<5:06:39, 2.58s/it] +2025-02-06 00:12:40 - ERROR - stderr - 68%|██████▊ | 15301/22434 [14:04:59<5:03:59, 2.56s/it] +2025-02-06 00:12:40 - ERROR - stderr - +2025-02-06 00:12:40 - ERROR - stderr - +2025-02-06 00:12:40 - INFO - stdout - {'loss': 0.3656, 'grad_norm': 1.3409371376037598, 'learning_rate': 4.850306785463869e-06, 'epoch': 2.05} +2025-02-06 00:12:40 - ERROR - stderr - 68%|██████▊ | 15301/22434 [14:04:59<5:03:59, 2.56s/it] +2025-02-06 00:12:42 - ERROR - stderr - 68%|██████▊ | 15302/22434 [14:05:02<5:05:37, 2.57s/it] +2025-02-06 00:12:42 - ERROR - stderr - 
+2025-02-06 00:12:42 - ERROR - stderr - +2025-02-06 00:12:42 - INFO - stdout - {'loss': 0.3877, 'grad_norm': 1.5098565816879272, 'learning_rate': 4.84906924710433e-06, 'epoch': 2.05} +2025-02-06 00:12:42 - ERROR - stderr - 68%|██████▊ | 15302/22434 [14:05:02<5:05:37, 2.57s/it] +2025-02-06 00:12:45 - ERROR - stderr - 68%|██████▊ | 15303/22434 [14:05:04<4:59:53, 2.52s/it] +2025-02-06 00:12:45 - ERROR - stderr - +2025-02-06 00:12:45 - ERROR - stderr - +2025-02-06 00:12:45 - INFO - stdout - {'loss': 0.3682, 'grad_norm': 1.410828709602356, 'learning_rate': 4.847831816111019e-06, 'epoch': 2.05} +2025-02-06 00:12:45 - ERROR - stderr - 68%|██████▊ | 15303/22434 [14:05:05<4:59:53, 2.52s/it] +2025-02-06 00:12:47 - ERROR - stderr - 68%|██████▊ | 15304/22434 [14:05:07<5:01:15, 2.54s/it] +2025-02-06 00:12:47 - ERROR - stderr - +2025-02-06 00:12:47 - ERROR - stderr - +2025-02-06 00:12:47 - INFO - stdout - {'loss': 0.3522, 'grad_norm': 1.2830333709716797, 'learning_rate': 4.846594492509714e-06, 'epoch': 2.05} +2025-02-06 00:12:47 - ERROR - stderr - 68%|██████▊ | 15304/22434 [14:05:07<5:01:15, 2.54s/it] +2025-02-06 00:12:50 - ERROR - stderr - 68%|██████▊ | 15305/22434 [14:05:09<4:58:23, 2.51s/it] +2025-02-06 00:12:50 - ERROR - stderr - +2025-02-06 00:12:50 - ERROR - stderr - +2025-02-06 00:12:50 - INFO - stdout - {'loss': 0.3606, 'grad_norm': 1.3533754348754883, 'learning_rate': 4.845357276326221e-06, 'epoch': 2.05} +2025-02-06 00:12:50 - ERROR - stderr - 68%|██████▊ | 15305/22434 [14:05:10<4:58:23, 2.51s/it] +2025-02-06 00:12:52 - ERROR - stderr - 68%|██████▊ | 15306/22434 [14:05:12<4:57:13, 2.50s/it] +2025-02-06 00:12:52 - ERROR - stderr - +2025-02-06 00:12:52 - ERROR - stderr - +2025-02-06 00:12:52 - INFO - stdout - {'loss': 0.3761, 'grad_norm': 1.5009660720825195, 'learning_rate': 4.844120167586323e-06, 'epoch': 2.05} +2025-02-06 00:12:52 - ERROR - stderr - 68%|██████▊ | 15306/22434 [14:05:12<4:57:13, 2.50s/it] +2025-02-06 00:12:55 - ERROR - stderr - 68%|██████▊ | 15307/22434 
[14:05:14<4:54:26, 2.48s/it] +2025-02-06 00:12:55 - ERROR - stderr - +2025-02-06 00:12:55 - ERROR - stderr - +2025-02-06 00:12:55 - INFO - stdout - {'loss': 0.4152, 'grad_norm': 1.3547630310058594, 'learning_rate': 4.842883166315806e-06, 'epoch': 2.05} +2025-02-06 00:12:55 - ERROR - stderr - 68%|██████▊ | 15307/22434 [14:05:14<4:54:26, 2.48s/it] +2025-02-06 00:12:57 - ERROR - stderr - 68%|██████▊ | 15308/22434 [14:05:17<4:52:29, 2.46s/it] +2025-02-06 00:12:57 - ERROR - stderr - +2025-02-06 00:12:57 - ERROR - stderr - +2025-02-06 00:12:57 - INFO - stdout - {'loss': 0.4448, 'grad_norm': 1.7151812314987183, 'learning_rate': 4.8416462725404575e-06, 'epoch': 2.05} +2025-02-06 00:12:57 - ERROR - stderr - 68%|██████▊ | 15308/22434 [14:05:17<4:52:29, 2.46s/it] +2025-02-06 00:13:00 - ERROR - stderr - 68%|██████▊ | 15309/22434 [14:05:19<4:52:47, 2.47s/it] +2025-02-06 00:13:00 - ERROR - stderr - +2025-02-06 00:13:00 - ERROR - stderr - +2025-02-06 00:13:00 - INFO - stdout - {'loss': 0.3704, 'grad_norm': 1.3633064031600952, 'learning_rate': 4.840409486286051e-06, 'epoch': 2.05} +2025-02-06 00:13:00 - ERROR - stderr - 68%|██████▊ | 15309/22434 [14:05:19<4:52:47, 2.47s/it] +2025-02-06 00:13:02 - ERROR - stderr - 68%|██████▊ | 15310/22434 [14:05:22<4:57:13, 2.50s/it] +2025-02-06 00:13:02 - ERROR - stderr - +2025-02-06 00:13:02 - ERROR - stderr - +2025-02-06 00:13:02 - INFO - stdout - {'loss': 0.3466, 'grad_norm': 1.3008447885513306, 'learning_rate': 4.839172807578377e-06, 'epoch': 2.05} +2025-02-06 00:13:02 - ERROR - stderr - 68%|██████▊ | 15310/22434 [14:05:22<4:57:13, 2.50s/it] +2025-02-06 00:13:05 - ERROR - stderr - 68%|██████▊ | 15311/22434 [14:05:24<4:57:07, 2.50s/it] +2025-02-06 00:13:05 - ERROR - stderr - +2025-02-06 00:13:05 - ERROR - stderr - +2025-02-06 00:13:05 - INFO - stdout - {'loss': 0.4071, 'grad_norm': 1.6451970338821411, 'learning_rate': 4.8379362364432045e-06, 'epoch': 2.05} +2025-02-06 00:13:05 - ERROR - stderr - 68%|██████▊ | 15311/22434 [14:05:24<4:57:07, 
2.50s/it] +2025-02-06 00:13:07 - ERROR - stderr - 68%|██████▊ | 15312/22434 [14:05:27<5:01:45, 2.54s/it] +2025-02-06 00:13:07 - ERROR - stderr - +2025-02-06 00:13:07 - ERROR - stderr - +2025-02-06 00:13:07 - INFO - stdout - {'loss': 0.3838, 'grad_norm': 1.5467127561569214, 'learning_rate': 4.836699772906311e-06, 'epoch': 2.05} +2025-02-06 00:13:07 - ERROR - stderr - 68%|██████▊ | 15312/22434 [14:05:27<5:01:45, 2.54s/it] +2025-02-06 00:13:10 - ERROR - stderr - 68%|██████▊ | 15313/22434 [14:05:30<5:00:03, 2.53s/it] +2025-02-06 00:13:10 - ERROR - stderr - +2025-02-06 00:13:10 - ERROR - stderr - +2025-02-06 00:13:10 - INFO - stdout - {'loss': 0.4, 'grad_norm': 1.5761569738388062, 'learning_rate': 4.835463416993471e-06, 'epoch': 2.05} +2025-02-06 00:13:10 - ERROR - stderr - 68%|██████▊ | 15313/22434 [14:05:30<5:00:03, 2.53s/it] +2025-02-06 00:13:13 - ERROR - stderr - 68%|██████▊ | 15314/22434 [14:05:33<5:16:38, 2.67s/it] +2025-02-06 00:13:13 - ERROR - stderr - +2025-02-06 00:13:13 - ERROR - stderr - +2025-02-06 00:13:13 - INFO - stdout - {'loss': 0.3457, 'grad_norm': 1.417314052581787, 'learning_rate': 4.834227168730451e-06, 'epoch': 2.05} +2025-02-06 00:13:13 - ERROR - stderr - 68%|██████▊ | 15314/22434 [14:05:33<5:16:38, 2.67s/it] +2025-02-06 00:13:15 - ERROR - stderr - 68%|██████▊ | 15315/22434 [14:05:35<5:10:33, 2.62s/it] +2025-02-06 00:13:15 - ERROR - stderr - +2025-02-06 00:13:15 - ERROR - stderr - +2025-02-06 00:13:15 - INFO - stdout - {'loss': 0.3965, 'grad_norm': 1.5240907669067383, 'learning_rate': 4.8329910281430285e-06, 'epoch': 2.05} +2025-02-06 00:13:15 - ERROR - stderr - 68%|██████▊ | 15315/22434 [14:05:35<5:10:33, 2.62s/it] +2025-02-06 00:13:18 - ERROR - stderr - 68%|██████▊ | 15316/22434 [14:05:38<5:20:03, 2.70s/it] +2025-02-06 00:13:18 - ERROR - stderr - +2025-02-06 00:13:18 - ERROR - stderr - +2025-02-06 00:13:18 - INFO - stdout - {'loss': 0.3546, 'grad_norm': 1.4235204458236694, 'learning_rate': 4.8317549952569605e-06, 'epoch': 2.05} +2025-02-06 
00:13:18 - ERROR - stderr - 68%|██████▊ | 15316/22434 [14:05:38<5:20:03, 2.70s/it] +2025-02-06 00:13:21 - ERROR - stderr - 68%|██████▊ | 15317/22434 [14:05:40<5:11:22, 2.63s/it] +2025-02-06 00:13:21 - ERROR - stderr - +2025-02-06 00:13:21 - ERROR - stderr - +2025-02-06 00:13:21 - INFO - stdout - {'loss': 0.3561, 'grad_norm': 1.474574089050293, 'learning_rate': 4.830519070098014e-06, 'epoch': 2.05} +2025-02-06 00:13:21 - ERROR - stderr - 68%|██████▊ | 15317/22434 [14:05:40<5:11:22, 2.63s/it] +2025-02-06 00:13:23 - ERROR - stderr - 68%|██████▊ | 15318/22434 [14:05:43<5:04:44, 2.57s/it] +2025-02-06 00:13:23 - ERROR - stderr - +2025-02-06 00:13:23 - ERROR - stderr - +2025-02-06 00:13:23 - INFO - stdout - {'loss': 0.4176, 'grad_norm': 1.5900211334228516, 'learning_rate': 4.829283252691951e-06, 'epoch': 2.05} +2025-02-06 00:13:23 - ERROR - stderr - 68%|██████▊ | 15318/22434 [14:05:43<5:04:44, 2.57s/it] +2025-02-06 00:13:25 - ERROR - stderr - 68%|██████▊ | 15319/22434 [14:05:45<5:01:11, 2.54s/it] +2025-02-06 00:13:26 - ERROR - stderr - +2025-02-06 00:13:26 - ERROR - stderr - +2025-02-06 00:13:26 - INFO - stdout - {'loss': 0.376, 'grad_norm': 1.361810326576233, 'learning_rate': 4.828047543064532e-06, 'epoch': 2.05} +2025-02-06 00:13:26 - ERROR - stderr - 68%|██████▊ | 15319/22434 [14:05:45<5:01:11, 2.54s/it] +2025-02-06 00:13:28 - ERROR - stderr - 68%|██████▊ | 15320/22434 [14:05:48<5:04:52, 2.57s/it] +2025-02-06 00:13:28 - ERROR - stderr - +2025-02-06 00:13:28 - ERROR - stderr - +2025-02-06 00:13:28 - INFO - stdout - {'loss': 0.3982, 'grad_norm': 1.494605541229248, 'learning_rate': 4.82681194124151e-06, 'epoch': 2.05} +2025-02-06 00:13:28 - ERROR - stderr - 68%|██████▊ | 15320/22434 [14:05:48<5:04:52, 2.57s/it] +2025-02-06 00:13:31 - ERROR - stderr - 68%|██████▊ | 15321/22434 [14:05:50<5:00:26, 2.53s/it] +2025-02-06 00:13:31 - ERROR - stderr - +2025-02-06 00:13:31 - ERROR - stderr - +2025-02-06 00:13:31 - INFO - stdout - {'loss': 0.3804, 'grad_norm': 1.5423048734664917, 
'learning_rate': 4.8255764472486455e-06, 'epoch': 2.05} +2025-02-06 00:13:31 - ERROR - stderr - 68%|██████▊ | 15321/22434 [14:05:50<5:00:26, 2.53s/it] +2025-02-06 00:13:33 - ERROR - stderr - 68%|██████▊ | 15322/22434 [14:05:53<4:56:25, 2.50s/it] +2025-02-06 00:13:33 - ERROR - stderr - +2025-02-06 00:13:33 - ERROR - stderr - +2025-02-06 00:13:33 - INFO - stdout - {'loss': 0.3965, 'grad_norm': 1.5912147760391235, 'learning_rate': 4.824341061111688e-06, 'epoch': 2.05} +2025-02-06 00:13:33 - ERROR - stderr - 68%|██████▊ | 15322/22434 [14:05:53<4:56:25, 2.50s/it] +2025-02-06 00:13:35 - ERROR - stderr - 68%|██████▊ | 15323/22434 [14:05:55<4:54:45, 2.49s/it] +2025-02-06 00:13:36 - ERROR - stderr - +2025-02-06 00:13:36 - ERROR - stderr - +2025-02-06 00:13:36 - INFO - stdout - {'loss': 0.3882, 'grad_norm': 1.405929684638977, 'learning_rate': 4.823105782856388e-06, 'epoch': 2.05} +2025-02-06 00:13:36 - ERROR - stderr - 68%|██████▊ | 15323/22434 [14:05:55<4:54:45, 2.49s/it] +2025-02-06 00:13:38 - ERROR - stderr - 68%|██████▊ | 15324/22434 [14:05:58<4:52:18, 2.47s/it] +2025-02-06 00:13:38 - ERROR - stderr - +2025-02-06 00:13:38 - ERROR - stderr - +2025-02-06 00:13:38 - INFO - stdout - {'loss': 0.398, 'grad_norm': 1.4059332609176636, 'learning_rate': 4.821870612508494e-06, 'epoch': 2.05} +2025-02-06 00:13:38 - ERROR - stderr - 68%|██████▊ | 15324/22434 [14:05:58<4:52:18, 2.47s/it] +2025-02-06 00:13:40 - ERROR - stderr - 68%|██████▊ | 15325/22434 [14:06:00<4:53:54, 2.48s/it] +2025-02-06 00:13:40 - ERROR - stderr - +2025-02-06 00:13:40 - ERROR - stderr - +2025-02-06 00:13:40 - INFO - stdout - {'loss': 0.412, 'grad_norm': 1.460900902748108, 'learning_rate': 4.820635550093753e-06, 'epoch': 2.05} +2025-02-06 00:13:40 - ERROR - stderr - 68%|██████▊ | 15325/22434 [14:06:00<4:53:54, 2.48s/it] +2025-02-06 00:13:43 - ERROR - stderr - 68%|██████▊ | 15326/22434 [14:06:03<4:51:57, 2.46s/it] +2025-02-06 00:13:43 - ERROR - stderr - +2025-02-06 00:13:43 - ERROR - stderr - +2025-02-06 00:13:43 
- INFO - stdout - {'loss': 0.3349, 'grad_norm': 1.3327229022979736, 'learning_rate': 4.819400595637908e-06, 'epoch': 2.05} +2025-02-06 00:13:43 - ERROR - stderr - 68%|██████▊ | 15326/22434 [14:06:03<4:51:57, 2.46s/it] +2025-02-06 00:13:45 - ERROR - stderr - 68%|██████▊ | 15327/22434 [14:06:05<4:50:21, 2.45s/it] +2025-02-06 00:13:45 - ERROR - stderr - +2025-02-06 00:13:45 - ERROR - stderr - +2025-02-06 00:13:45 - INFO - stdout - {'loss': 0.3996, 'grad_norm': 1.4798040390014648, 'learning_rate': 4.818165749166703e-06, 'epoch': 2.05} +2025-02-06 00:13:45 - ERROR - stderr - 68%|██████▊ | 15327/22434 [14:06:05<4:50:21, 2.45s/it] +2025-02-06 00:13:48 - ERROR - stderr - 68%|██████▊ | 15328/22434 [14:06:07<4:50:11, 2.45s/it] +2025-02-06 00:13:48 - ERROR - stderr - +2025-02-06 00:13:48 - ERROR - stderr - +2025-02-06 00:13:48 - INFO - stdout - {'loss': 0.3163, 'grad_norm': 1.2080055475234985, 'learning_rate': 4.816931010705867e-06, 'epoch': 2.05} +2025-02-06 00:13:48 - ERROR - stderr - 68%|██████▊ | 15328/22434 [14:06:07<4:50:11, 2.45s/it] +2025-02-06 00:13:50 - ERROR - stderr - 68%|██████▊ | 15329/22434 [14:06:10<4:51:02, 2.46s/it] +2025-02-06 00:13:50 - ERROR - stderr - +2025-02-06 00:13:50 - ERROR - stderr - +2025-02-06 00:13:50 - INFO - stdout - {'loss': 0.3402, 'grad_norm': 1.32821786403656, 'learning_rate': 4.815696380281153e-06, 'epoch': 2.05} +2025-02-06 00:13:50 - ERROR - stderr - 68%|██████▊ | 15329/22434 [14:06:10<4:51:02, 2.46s/it] +2025-02-06 00:13:53 - ERROR - stderr - 68%|██████▊ | 15330/22434 [14:06:12<4:52:25, 2.47s/it] +2025-02-06 00:13:53 - ERROR - stderr - +2025-02-06 00:13:53 - ERROR - stderr - +2025-02-06 00:13:53 - INFO - stdout - {'loss': 0.4132, 'grad_norm': 1.53327476978302, 'learning_rate': 4.814461857918279e-06, 'epoch': 2.05} +2025-02-06 00:13:53 - ERROR - stderr - 68%|██████▊ | 15330/22434 [14:06:12<4:52:25, 2.47s/it] +2025-02-06 00:13:55 - ERROR - stderr - 68%|██████▊ | 15331/22434 [14:06:15<4:52:00, 2.47s/it] +2025-02-06 00:13:55 - ERROR - 
stderr - +2025-02-06 00:13:55 - ERROR - stderr - +2025-02-06 00:13:55 - INFO - stdout - {'loss': 0.3839, 'grad_norm': 1.4783756732940674, 'learning_rate': 4.8132274436429925e-06, 'epoch': 2.05} +2025-02-06 00:13:55 - ERROR - stderr - 68%|██████▊ | 15331/22434 [14:06:15<4:52:00, 2.47s/it] +2025-02-06 00:13:58 - ERROR - stderr - 68%|██████▊ | 15332/22434 [14:06:17<4:56:51, 2.51s/it] +2025-02-06 00:13:58 - ERROR - stderr - +2025-02-06 00:13:58 - ERROR - stderr - +2025-02-06 00:13:58 - INFO - stdout - {'loss': 0.3669, 'grad_norm': 1.5153189897537231, 'learning_rate': 4.811993137481014e-06, 'epoch': 2.05} +2025-02-06 00:13:58 - ERROR - stderr - 68%|██████▊ | 15332/22434 [14:06:18<4:56:51, 2.51s/it] +2025-02-06 00:14:00 - ERROR - stderr - 68%|██████▊ | 15333/22434 [14:06:20<4:54:21, 2.49s/it] +2025-02-06 00:14:00 - ERROR - stderr - +2025-02-06 00:14:00 - ERROR - stderr - +2025-02-06 00:14:00 - INFO - stdout - {'loss': 0.4032, 'grad_norm': 1.3218958377838135, 'learning_rate': 4.81075893945807e-06, 'epoch': 2.05} +2025-02-06 00:14:00 - ERROR - stderr - 68%|██████▊ | 15333/22434 [14:06:20<4:54:21, 2.49s/it] +2025-02-06 00:14:03 - ERROR - stderr - 68%|██████▊ | 15334/22434 [14:06:22<4:54:30, 2.49s/it] +2025-02-06 00:14:03 - ERROR - stderr - +2025-02-06 00:14:03 - ERROR - stderr - +2025-02-06 00:14:03 - INFO - stdout - {'loss': 0.3625, 'grad_norm': 1.4127275943756104, 'learning_rate': 4.809524849599897e-06, 'epoch': 2.05} +2025-02-06 00:14:03 - ERROR - stderr - 68%|██████▊ | 15334/22434 [14:06:22<4:54:30, 2.49s/it] +2025-02-06 00:14:05 - ERROR - stderr - 68%|██████▊ | 15335/22434 [14:06:25<4:53:24, 2.48s/it] +2025-02-06 00:14:05 - ERROR - stderr - +2025-02-06 00:14:05 - ERROR - stderr - +2025-02-06 00:14:05 - INFO - stdout - {'loss': 0.3601, 'grad_norm': 1.3549495935440063, 'learning_rate': 4.808290867932209e-06, 'epoch': 2.05} +2025-02-06 00:14:05 - ERROR - stderr - 68%|██████▊ | 15335/22434 [14:06:25<4:53:24, 2.48s/it] +2025-02-06 00:14:08 - ERROR - stderr - 68%|██████▊ | 
15336/22434 [14:06:27<4:53:32, 2.48s/it] +2025-02-06 00:14:08 - ERROR - stderr - +2025-02-06 00:14:08 - ERROR - stderr - +2025-02-06 00:14:08 - INFO - stdout - {'loss': 0.376, 'grad_norm': 1.3982123136520386, 'learning_rate': 4.80705699448073e-06, 'epoch': 2.05} +2025-02-06 00:14:08 - ERROR - stderr - 68%|██████▊ | 15336/22434 [14:06:27<4:53:32, 2.48s/it] +2025-02-06 00:14:10 - ERROR - stderr - 68%|██████▊ | 15337/22434 [14:06:30<4:51:19, 2.46s/it] +2025-02-06 00:14:10 - ERROR - stderr - +2025-02-06 00:14:10 - ERROR - stderr - +2025-02-06 00:14:10 - INFO - stdout - {'loss': 0.3432, 'grad_norm': 1.5429255962371826, 'learning_rate': 4.8058232292711785e-06, 'epoch': 2.05} +2025-02-06 00:14:10 - ERROR - stderr - 68%|██████▊ | 15337/22434 [14:06:30<4:51:19, 2.46s/it] +2025-02-06 00:14:12 - ERROR - stderr - 68%|██████▊ | 15338/22434 [14:06:32<4:51:46, 2.47s/it] +2025-02-06 00:14:13 - ERROR - stderr - +2025-02-06 00:14:13 - ERROR - stderr - +2025-02-06 00:14:13 - INFO - stdout - {'loss': 0.4328, 'grad_norm': 1.6331851482391357, 'learning_rate': 4.804589572329271e-06, 'epoch': 2.05} +2025-02-06 00:14:13 - ERROR - stderr - 68%|██████▊ | 15338/22434 [14:06:32<4:51:46, 2.47s/it] +2025-02-06 00:14:15 - ERROR - stderr - 68%|██████▊ | 15339/22434 [14:06:35<4:52:13, 2.47s/it] +2025-02-06 00:14:15 - ERROR - stderr - +2025-02-06 00:14:15 - ERROR - stderr - +2025-02-06 00:14:15 - INFO - stdout - {'loss': 0.3859, 'grad_norm': 1.5125499963760376, 'learning_rate': 4.803356023680723e-06, 'epoch': 2.05} +2025-02-06 00:14:15 - ERROR - stderr - 68%|██████▊ | 15339/22434 [14:06:35<4:52:13, 2.47s/it] +2025-02-06 00:14:17 - ERROR - stderr - 68%|██████▊ | 15340/22434 [14:06:37<4:52:50, 2.48s/it] +2025-02-06 00:14:18 - ERROR - stderr - +2025-02-06 00:14:18 - ERROR - stderr - +2025-02-06 00:14:18 - INFO - stdout - {'loss': 0.3794, 'grad_norm': 1.4796911478042603, 'learning_rate': 4.802122583351246e-06, 'epoch': 2.05} +2025-02-06 00:14:18 - ERROR - stderr - 68%|██████▊ | 15340/22434 
[14:06:37<4:52:50, 2.48s/it] +2025-02-06 00:14:20 - ERROR - stderr - 68%|██████▊ | 15341/22434 [14:06:40<4:53:07, 2.48s/it] +2025-02-06 00:14:20 - ERROR - stderr - +2025-02-06 00:14:20 - ERROR - stderr - +2025-02-06 00:14:20 - INFO - stdout - {'loss': 0.412, 'grad_norm': 1.316720724105835, 'learning_rate': 4.80088925136655e-06, 'epoch': 2.05} +2025-02-06 00:14:20 - ERROR - stderr - 68%|██████▊ | 15341/22434 [14:06:40<4:53:07, 2.48s/it] +2025-02-06 00:14:22 - ERROR - stderr - 68%|██████▊ | 15342/22434 [14:06:42<4:53:34, 2.48s/it] +2025-02-06 00:14:22 - ERROR - stderr - +2025-02-06 00:14:22 - ERROR - stderr - +2025-02-06 00:14:22 - INFO - stdout - {'loss': 0.3939, 'grad_norm': 1.40620756149292, 'learning_rate': 4.799656027752343e-06, 'epoch': 2.05} +2025-02-06 00:14:22 - ERROR - stderr - 68%|██████▊ | 15342/22434 [14:06:42<4:53:34, 2.48s/it] +2025-02-06 00:14:25 - ERROR - stderr - 68%|██████▊ | 15343/22434 [14:06:45<4:55:16, 2.50s/it] +2025-02-06 00:14:25 - ERROR - stderr - +2025-02-06 00:14:25 - ERROR - stderr - +2025-02-06 00:14:25 - INFO - stdout - {'loss': 0.389, 'grad_norm': 1.6071455478668213, 'learning_rate': 4.798422912534329e-06, 'epoch': 2.05} +2025-02-06 00:14:25 - ERROR - stderr - 68%|██████▊ | 15343/22434 [14:06:45<4:55:16, 2.50s/it] +2025-02-06 00:14:27 - ERROR - stderr - 68%|██████▊ | 15344/22434 [14:06:47<4:56:08, 2.51s/it] +2025-02-06 00:14:28 - ERROR - stderr - +2025-02-06 00:14:28 - ERROR - stderr - +2025-02-06 00:14:28 - INFO - stdout - {'loss': 0.3992, 'grad_norm': 1.5946511030197144, 'learning_rate': 4.797189905738212e-06, 'epoch': 2.05} +2025-02-06 00:14:28 - ERROR - stderr - 68%|██████▊ | 15344/22434 [14:06:47<4:56:08, 2.51s/it] +2025-02-06 00:14:30 - ERROR - stderr - 68%|██████▊ | 15345/22434 [14:06:50<4:56:07, 2.51s/it] +2025-02-06 00:14:30 - ERROR - stderr - +2025-02-06 00:14:30 - ERROR - stderr - +2025-02-06 00:14:30 - INFO - stdout - {'loss': 0.3575, 'grad_norm': 1.4388788938522339, 'learning_rate': 4.7959570073896935e-06, 'epoch': 2.05} 
+2025-02-06 00:14:30 - ERROR - stderr - 68%|██████▊ | 15345/22434 [14:06:50<4:56:07, 2.51s/it] +2025-02-06 00:14:32 - ERROR - stderr - 68%|██████▊ | 15346/22434 [14:06:52<4:53:50, 2.49s/it] +2025-02-06 00:14:32 - ERROR - stderr - +2025-02-06 00:14:32 - ERROR - stderr - +2025-02-06 00:14:32 - INFO - stdout - {'loss': 0.3401, 'grad_norm': 1.3409440517425537, 'learning_rate': 4.794724217514472e-06, 'epoch': 2.05} +2025-02-06 00:14:32 - ERROR - stderr - 68%|██████▊ | 15346/22434 [14:06:52<4:53:50, 2.49s/it] +2025-02-06 00:14:35 - ERROR - stderr - 68%|██████▊ | 15347/22434 [14:06:55<5:06:27, 2.59s/it] +2025-02-06 00:14:35 - ERROR - stderr - +2025-02-06 00:14:35 - ERROR - stderr - +2025-02-06 00:14:35 - INFO - stdout - {'loss': 0.4237, 'grad_norm': 1.6862062215805054, 'learning_rate': 4.7934915361382414e-06, 'epoch': 2.05} +2025-02-06 00:14:35 - ERROR - stderr - 68%|██████▊ | 15347/22434 [14:06:55<5:06:27, 2.59s/it] +2025-02-06 00:14:38 - ERROR - stderr - 68%|██████▊ | 15348/22434 [14:06:58<5:05:44, 2.59s/it] +2025-02-06 00:14:38 - ERROR - stderr - +2025-02-06 00:14:38 - ERROR - stderr - +2025-02-06 00:14:38 - INFO - stdout - {'loss': 0.4306, 'grad_norm': 1.679821252822876, 'learning_rate': 4.792258963286703e-06, 'epoch': 2.05} +2025-02-06 00:14:38 - ERROR - stderr - 68%|██████▊ | 15348/22434 [14:06:58<5:05:44, 2.59s/it] +2025-02-06 00:14:40 - ERROR - stderr - 68%|██████▊ | 15349/22434 [14:07:00<5:05:26, 2.59s/it] +2025-02-06 00:14:40 - ERROR - stderr - +2025-02-06 00:14:40 - ERROR - stderr - +2025-02-06 00:14:40 - INFO - stdout - {'loss': 0.3337, 'grad_norm': 1.4185497760772705, 'learning_rate': 4.791026498985535e-06, 'epoch': 2.05} +2025-02-06 00:14:40 - ERROR - stderr - 68%|██████▊ | 15349/22434 [14:07:00<5:05:26, 2.59s/it] +2025-02-06 00:14:43 - ERROR - stderr - 68%|██████▊ | 15350/22434 [14:07:03<5:01:12, 2.55s/it] +2025-02-06 00:14:43 - ERROR - stderr - +2025-02-06 00:14:43 - ERROR - stderr - +2025-02-06 00:14:43 - INFO - stdout - {'loss': 0.4081, 'grad_norm': 
1.4347972869873047, 'learning_rate': 4.789794143260443e-06, 'epoch': 2.05} +2025-02-06 00:14:43 - ERROR - stderr - 68%|██████▊ | 15350/22434 [14:07:03<5:01:12, 2.55s/it] +2025-02-06 00:14:45 - ERROR - stderr - 68%|██████▊ | 15351/22434 [14:07:05<4:58:10, 2.53s/it] +2025-02-06 00:14:45 - ERROR - stderr - +2025-02-06 00:14:45 - ERROR - stderr - +2025-02-06 00:14:45 - INFO - stdout - {'loss': 0.3738, 'grad_norm': 1.4867558479309082, 'learning_rate': 4.7885618961371025e-06, 'epoch': 2.05} +2025-02-06 00:14:45 - ERROR - stderr - 68%|██████▊ | 15351/22434 [14:07:05<4:58:10, 2.53s/it] +2025-02-06 00:14:48 - ERROR - stderr - 68%|██████▊ | 15352/22434 [14:07:08<4:56:10, 2.51s/it] +2025-02-06 00:14:48 - ERROR - stderr - +2025-02-06 00:14:48 - ERROR - stderr - +2025-02-06 00:14:48 - INFO - stdout - {'loss': 0.3621, 'grad_norm': 1.5347763299942017, 'learning_rate': 4.787329757641199e-06, 'epoch': 2.05} +2025-02-06 00:14:48 - ERROR - stderr - 68%|██████▊ | 15352/22434 [14:07:08<4:56:10, 2.51s/it] +2025-02-06 00:14:50 - ERROR - stderr - 68%|██████▊ | 15353/22434 [14:07:10<4:57:44, 2.52s/it] +2025-02-06 00:14:50 - ERROR - stderr - +2025-02-06 00:14:50 - ERROR - stderr - +2025-02-06 00:14:50 - INFO - stdout - {'loss': 0.3582, 'grad_norm': 1.5248593091964722, 'learning_rate': 4.7860977277984265e-06, 'epoch': 2.05} +2025-02-06 00:14:50 - ERROR - stderr - 68%|██████▊ | 15353/22434 [14:07:10<4:57:44, 2.52s/it] +2025-02-06 00:14:53 - ERROR - stderr - 68%|██████▊ | 15354/22434 [14:07:13<4:56:03, 2.51s/it] +2025-02-06 00:14:53 - ERROR - stderr - +2025-02-06 00:14:53 - ERROR - stderr - +2025-02-06 00:14:53 - INFO - stdout - {'loss': 0.378, 'grad_norm': 1.5093733072280884, 'learning_rate': 4.784865806634449e-06, 'epoch': 2.05} +2025-02-06 00:14:53 - ERROR - stderr - 68%|██████▊ | 15354/22434 [14:07:13<4:56:03, 2.51s/it] +2025-02-06 00:14:55 - ERROR - stderr - 68%|██████▊ | 15355/22434 [14:07:15<4:58:02, 2.53s/it] +2025-02-06 00:14:55 - ERROR - stderr - +2025-02-06 00:14:55 - ERROR - stderr 
- +2025-02-06 00:14:55 - INFO - stdout - {'loss': 0.3582, 'grad_norm': 1.4741685390472412, 'learning_rate': 4.783633994174962e-06, 'epoch': 2.05} +2025-02-06 00:14:56 - ERROR - stderr - 68%|██████▊ | 15355/22434 [14:07:15<4:58:02, 2.53s/it] +2025-02-06 00:14:58 - ERROR - stderr - 68%|██████▊ | 15356/22434 [14:07:18<4:56:23, 2.51s/it] +2025-02-06 00:14:58 - ERROR - stderr - +2025-02-06 00:14:58 - ERROR - stderr - +2025-02-06 00:14:58 - INFO - stdout - {'loss': 0.3911, 'grad_norm': 1.480236530303955, 'learning_rate': 4.782402290445629e-06, 'epoch': 2.05} +2025-02-06 00:14:58 - ERROR - stderr - 68%|██████▊ | 15356/22434 [14:07:18<4:56:23, 2.51s/it] +2025-02-06 00:15:00 - ERROR - stderr - 68%|██████▊ | 15357/22434 [14:07:20<4:54:48, 2.50s/it] +2025-02-06 00:15:00 - ERROR - stderr - +2025-02-06 00:15:00 - ERROR - stderr - +2025-02-06 00:15:00 - INFO - stdout - {'loss': 0.3769, 'grad_norm': 1.3714247941970825, 'learning_rate': 4.781170695472127e-06, 'epoch': 2.05} +2025-02-06 00:15:00 - ERROR - stderr - 68%|██████▊ | 15357/22434 [14:07:20<4:54:48, 2.50s/it] +2025-02-06 00:15:03 - ERROR - stderr - 68%|██████▊ | 15358/22434 [14:07:23<4:56:22, 2.51s/it] +2025-02-06 00:15:03 - ERROR - stderr - +2025-02-06 00:15:03 - ERROR - stderr - +2025-02-06 00:15:03 - INFO - stdout - {'loss': 0.3953, 'grad_norm': 1.6008464097976685, 'learning_rate': 4.779939209280129e-06, 'epoch': 2.05} +2025-02-06 00:15:03 - ERROR - stderr - 68%|██████▊ | 15358/22434 [14:07:23<4:56:22, 2.51s/it] +2025-02-06 00:15:06 - ERROR - stderr - 68%|██████▊ | 15359/22434 [14:07:25<5:01:22, 2.56s/it] +2025-02-06 00:15:06 - ERROR - stderr - +2025-02-06 00:15:06 - ERROR - stderr - +2025-02-06 00:15:06 - INFO - stdout - {'loss': 0.3623, 'grad_norm': 1.5429996252059937, 'learning_rate': 4.778707831895302e-06, 'epoch': 2.05} +2025-02-06 00:15:06 - ERROR - stderr - 68%|██████▊ | 15359/22434 [14:07:25<5:01:22, 2.56s/it] +2025-02-06 00:15:08 - ERROR - stderr - 68%|██████▊ | 15360/22434 [14:07:28<5:00:27, 2.55s/it] 
+2025-02-06 00:15:08 - ERROR - stderr - +2025-02-06 00:15:08 - ERROR - stderr - +2025-02-06 00:15:08 - INFO - stdout - {'loss': 0.3542, 'grad_norm': 1.5035858154296875, 'learning_rate': 4.777476563343314e-06, 'epoch': 2.05} +2025-02-06 00:15:08 - ERROR - stderr - 68%|██████▊ | 15360/22434 [14:07:28<5:00:27, 2.55s/it] +2025-02-06 00:15:11 - ERROR - stderr - 68%|██████▊ | 15361/22434 [14:07:30<4:58:46, 2.53s/it] +2025-02-06 00:15:11 - ERROR - stderr - +2025-02-06 00:15:11 - ERROR - stderr - +2025-02-06 00:15:11 - INFO - stdout - {'loss': 0.3619, 'grad_norm': 1.4074007272720337, 'learning_rate': 4.776245403649831e-06, 'epoch': 2.05} +2025-02-06 00:15:11 - ERROR - stderr - 68%|██████▊ | 15361/22434 [14:07:30<4:58:46, 2.53s/it] +2025-02-06 00:15:13 - ERROR - stderr - 68%|██████▊ | 15362/22434 [14:07:33<4:56:53, 2.52s/it] +2025-02-06 00:15:13 - ERROR - stderr - +2025-02-06 00:15:13 - ERROR - stderr - +2025-02-06 00:15:13 - INFO - stdout - {'loss': 0.3642, 'grad_norm': 1.4304219484329224, 'learning_rate': 4.775014352840512e-06, 'epoch': 2.05} +2025-02-06 00:15:13 - ERROR - stderr - 68%|██████▊ | 15362/22434 [14:07:33<4:56:53, 2.52s/it] +2025-02-06 00:15:16 - ERROR - stderr - 68%|██████▊ | 15363/22434 [14:07:35<4:53:27, 2.49s/it] +2025-02-06 00:15:16 - ERROR - stderr - +2025-02-06 00:15:16 - ERROR - stderr - +2025-02-06 00:15:16 - INFO - stdout - {'loss': 0.3583, 'grad_norm': 1.3462783098220825, 'learning_rate': 4.773783410941021e-06, 'epoch': 2.05} +2025-02-06 00:15:16 - ERROR - stderr - 68%|██████▊ | 15363/22434 [14:07:35<4:53:27, 2.49s/it] +2025-02-06 00:15:18 - ERROR - stderr - 68%|██████▊ | 15364/22434 [14:07:38<4:51:16, 2.47s/it] +2025-02-06 00:15:18 - ERROR - stderr - +2025-02-06 00:15:18 - ERROR - stderr - +2025-02-06 00:15:18 - INFO - stdout - {'loss': 0.3796, 'grad_norm': 1.4243463277816772, 'learning_rate': 4.772552577977012e-06, 'epoch': 2.05} +2025-02-06 00:15:18 - ERROR - stderr - 68%|██████▊ | 15364/22434 [14:07:38<4:51:16, 2.47s/it] +2025-02-06 00:15:20 - 
ERROR - stderr - 68%|██████▊ | 15365/22434 [14:07:40<4:50:07, 2.46s/it] +2025-02-06 00:15:20 - ERROR - stderr - +2025-02-06 00:15:20 - ERROR - stderr - +2025-02-06 00:15:20 - INFO - stdout - {'loss': 0.4011, 'grad_norm': 1.3190523386001587, 'learning_rate': 4.771321853974144e-06, 'epoch': 2.05} +2025-02-06 00:15:20 - ERROR - stderr - 68%|██████▊ | 15365/22434 [14:07:40<4:50:07, 2.46s/it] +2025-02-06 00:15:23 - ERROR - stderr - 68%|██████▊ | 15366/22434 [14:07:43<4:51:08, 2.47s/it] +2025-02-06 00:15:23 - ERROR - stderr - +2025-02-06 00:15:23 - ERROR - stderr - +2025-02-06 00:15:23 - INFO - stdout - {'loss': 0.4381, 'grad_norm': 1.4821985960006714, 'learning_rate': 4.770091238958068e-06, 'epoch': 2.05} +2025-02-06 00:15:23 - ERROR - stderr - 68%|██████▊ | 15366/22434 [14:07:43<4:51:08, 2.47s/it] +2025-02-06 00:15:25 - ERROR - stderr - 68%|██████▊ | 15367/22434 [14:07:45<4:52:58, 2.49s/it] +2025-02-06 00:15:25 - ERROR - stderr - +2025-02-06 00:15:25 - ERROR - stderr - +2025-02-06 00:15:25 - INFO - stdout - {'loss': 0.3692, 'grad_norm': 1.3100749254226685, 'learning_rate': 4.768860732954439e-06, 'epoch': 2.05} +2025-02-06 00:15:25 - ERROR - stderr - 68%|██████▊ | 15367/22434 [14:07:45<4:52:58, 2.49s/it] +2025-02-06 00:15:28 - ERROR - stderr - 69%|██████▊ | 15368/22434 [14:07:48<4:53:00, 2.49s/it] +2025-02-06 00:15:28 - ERROR - stderr - +2025-02-06 00:15:28 - ERROR - stderr - +2025-02-06 00:15:28 - INFO - stdout - {'loss': 0.3713, 'grad_norm': 1.4381048679351807, 'learning_rate': 4.767630335988895e-06, 'epoch': 2.06} +2025-02-06 00:15:28 - ERROR - stderr - 69%|██████▊ | 15368/22434 [14:07:48<4:53:00, 2.49s/it] +2025-02-06 00:15:30 - ERROR - stderr - 69%|██████▊ | 15369/22434 [14:07:50<4:54:02, 2.50s/it] +2025-02-06 00:15:30 - ERROR - stderr - +2025-02-06 00:15:30 - ERROR - stderr - +2025-02-06 00:15:30 - INFO - stdout - {'loss': 0.3603, 'grad_norm': 1.4290353059768677, 'learning_rate': 4.766400048087098e-06, 'epoch': 2.06} +2025-02-06 00:15:30 - ERROR - stderr - 
69%|██████▊ | 15369/22434 [14:07:50<4:54:02, 2.50s/it] +2025-02-06 00:15:33 - ERROR - stderr - 69%|██████▊ | 15370/22434 [14:07:53<4:52:28, 2.48s/it] +2025-02-06 00:15:33 - ERROR - stderr - +2025-02-06 00:15:33 - ERROR - stderr - +2025-02-06 00:15:33 - INFO - stdout - {'loss': 0.4505, 'grad_norm': 1.7387784719467163, 'learning_rate': 4.765169869274676e-06, 'epoch': 2.06} +2025-02-06 00:15:33 - ERROR - stderr - 69%|██████▊ | 15370/22434 [14:07:53<4:52:28, 2.48s/it] +2025-02-06 00:15:35 - ERROR - stderr - 69%|██████▊ | 15371/22434 [14:07:55<4:52:48, 2.49s/it] +2025-02-06 00:15:35 - ERROR - stderr - +2025-02-06 00:15:35 - ERROR - stderr - +2025-02-06 00:15:35 - INFO - stdout - {'loss': 0.4423, 'grad_norm': 1.6746313571929932, 'learning_rate': 4.763939799577283e-06, 'epoch': 2.06} +2025-02-06 00:15:35 - ERROR - stderr - 69%|██████▊ | 15371/22434 [14:07:55<4:52:48, 2.49s/it] +2025-02-06 00:15:38 - ERROR - stderr - 69%|██████▊ | 15372/22434 [14:07:58<4:53:10, 2.49s/it] +2025-02-06 00:15:38 - ERROR - stderr - +2025-02-06 00:15:38 - ERROR - stderr - +2025-02-06 00:15:38 - INFO - stdout - {'loss': 0.3837, 'grad_norm': 1.3991596698760986, 'learning_rate': 4.7627098390205574e-06, 'epoch': 2.06} +2025-02-06 00:15:38 - ERROR - stderr - 69%|██████▊ | 15372/22434 [14:07:58<4:53:10, 2.49s/it] +2025-02-06 00:15:40 - ERROR - stderr - 69%|██████▊ | 15373/22434 [14:08:00<4:52:40, 2.49s/it] +2025-02-06 00:15:40 - ERROR - stderr - +2025-02-06 00:15:40 - ERROR - stderr - +2025-02-06 00:15:40 - INFO - stdout - {'loss': 0.3342, 'grad_norm': 1.4219862222671509, 'learning_rate': 4.761479987630127e-06, 'epoch': 2.06} +2025-02-06 00:15:40 - ERROR - stderr - 69%|██████▊ | 15373/22434 [14:08:00<4:52:40, 2.49s/it] +2025-02-06 00:15:43 - ERROR - stderr - 69%|██████▊ | 15374/22434 [14:08:03<4:54:32, 2.50s/it] +2025-02-06 00:15:43 - ERROR - stderr - +2025-02-06 00:15:43 - ERROR - stderr - +2025-02-06 00:15:43 - INFO - stdout - {'loss': 0.3804, 'grad_norm': 1.6099721193313599, 'learning_rate': 
4.76025024543164e-06, 'epoch': 2.06} +2025-02-06 00:15:43 - ERROR - stderr - 69%|██████▊ | 15374/22434 [14:08:03<4:54:32, 2.50s/it] +2025-02-06 00:15:45 - ERROR - stderr - 69%|██████▊ | 15375/22434 [14:08:05<4:51:39, 2.48s/it] +2025-02-06 00:15:45 - ERROR - stderr - +2025-02-06 00:15:45 - ERROR - stderr - +2025-02-06 00:15:45 - INFO - stdout - {'loss': 0.4138, 'grad_norm': 1.6572116613388062, 'learning_rate': 4.75902061245072e-06, 'epoch': 2.06} +2025-02-06 00:15:45 - ERROR - stderr - 69%|██████▊ | 15375/22434 [14:08:05<4:51:39, 2.48s/it] +2025-02-06 00:15:48 - ERROR - stderr - 69%|██████▊ | 15376/22434 [14:08:07<4:48:42, 2.45s/it] +2025-02-06 00:15:48 - ERROR - stderr - +2025-02-06 00:15:48 - ERROR - stderr - +2025-02-06 00:15:48 - INFO - stdout - {'loss': 0.3771, 'grad_norm': 1.5832918882369995, 'learning_rate': 4.7577910887130004e-06, 'epoch': 2.06} +2025-02-06 00:15:48 - ERROR - stderr - 69%|██████▊ | 15376/22434 [14:08:08<4:48:42, 2.45s/it] +2025-02-06 00:15:50 - ERROR - stderr - 69%|██████▊ | 15377/22434 [14:08:10<4:49:47, 2.46s/it] +2025-02-06 00:15:50 - ERROR - stderr - +2025-02-06 00:15:50 - ERROR - stderr - +2025-02-06 00:15:50 - INFO - stdout - {'loss': 0.3726, 'grad_norm': 1.4484951496124268, 'learning_rate': 4.756561674244109e-06, 'epoch': 2.06} +2025-02-06 00:15:50 - ERROR - stderr - 69%|██████▊ | 15377/22434 [14:08:10<4:49:47, 2.46s/it] +2025-02-06 00:15:53 - ERROR - stderr - 69%|██████▊ | 15378/22434 [14:08:12<4:50:48, 2.47s/it] +2025-02-06 00:15:53 - ERROR - stderr - +2025-02-06 00:15:53 - ERROR - stderr - +2025-02-06 00:15:53 - INFO - stdout - {'loss': 0.3988, 'grad_norm': 1.4732202291488647, 'learning_rate': 4.7553323690696685e-06, 'epoch': 2.06} +2025-02-06 00:15:53 - ERROR - stderr - 69%|██████▊ | 15378/22434 [14:08:13<4:50:48, 2.47s/it] +2025-02-06 00:15:55 - ERROR - stderr - 69%|██████▊ | 15379/22434 [14:08:15<4:48:46, 2.46s/it] +2025-02-06 00:15:55 - ERROR - stderr - +2025-02-06 00:15:55 - ERROR - stderr - +2025-02-06 00:15:55 - INFO - 
stdout - {'loss': 0.4041, 'grad_norm': 1.6103055477142334, 'learning_rate': 4.754103173215313e-06, 'epoch': 2.06} +2025-02-06 00:15:55 - ERROR - stderr - 69%|██████▊ | 15379/22434 [14:08:15<4:48:46, 2.46s/it] +2025-02-06 00:15:58 - ERROR - stderr - 69%|██████▊ | 15380/22434 [14:08:18<4:54:42, 2.51s/it] +2025-02-06 00:15:58 - ERROR - stderr - +2025-02-06 00:15:58 - ERROR - stderr - +2025-02-06 00:15:58 - INFO - stdout - {'loss': 0.3517, 'grad_norm': 1.2619105577468872, 'learning_rate': 4.752874086706653e-06, 'epoch': 2.06} +2025-02-06 00:15:58 - ERROR - stderr - 69%|██████▊ | 15380/22434 [14:08:18<4:54:42, 2.51s/it] +2025-02-06 00:16:00 - ERROR - stderr - 69%|██████▊ | 15381/22434 [14:08:20<4:53:18, 2.50s/it] +2025-02-06 00:16:00 - ERROR - stderr - +2025-02-06 00:16:00 - ERROR - stderr - +2025-02-06 00:16:00 - INFO - stdout - {'loss': 0.4198, 'grad_norm': 1.503366470336914, 'learning_rate': 4.7516451095693125e-06, 'epoch': 2.06} +2025-02-06 00:16:00 - ERROR - stderr - 69%|██████▊ | 15381/22434 [14:08:20<4:53:18, 2.50s/it] +2025-02-06 00:16:03 - ERROR - stderr - 69%|██████▊ | 15382/22434 [14:08:23<4:54:59, 2.51s/it] +2025-02-06 00:16:03 - ERROR - stderr - +2025-02-06 00:16:03 - ERROR - stderr - +2025-02-06 00:16:03 - INFO - stdout - {'loss': 0.3487, 'grad_norm': 1.1841264963150024, 'learning_rate': 4.7504162418289075e-06, 'epoch': 2.06} +2025-02-06 00:16:03 - ERROR - stderr - 69%|██████▊ | 15382/22434 [14:08:23<4:54:59, 2.51s/it] +2025-02-06 00:16:05 - ERROR - stderr - 69%|██████▊ | 15383/22434 [14:08:25<4:54:16, 2.50s/it] +2025-02-06 00:16:05 - ERROR - stderr - +2025-02-06 00:16:05 - ERROR - stderr - +2025-02-06 00:16:05 - INFO - stdout - {'loss': 0.3817, 'grad_norm': 1.4515161514282227, 'learning_rate': 4.749187483511053e-06, 'epoch': 2.06} +2025-02-06 00:16:05 - ERROR - stderr - 69%|██████▊ | 15383/22434 [14:08:25<4:54:16, 2.50s/it] +2025-02-06 00:16:08 - ERROR - stderr - 69%|██████▊ | 15384/22434 [14:08:28<5:12:36, 2.66s/it] +2025-02-06 00:16:08 - ERROR - stderr 
- +2025-02-06 00:16:08 - ERROR - stderr - +2025-02-06 00:16:08 - INFO - stdout - {'loss': 0.4159, 'grad_norm': 1.376715064048767, 'learning_rate': 4.747958834641361e-06, 'epoch': 2.06} +2025-02-06 00:16:08 - ERROR - stderr - 69%|██████▊ | 15384/22434 [14:08:28<5:12:36, 2.66s/it] +2025-02-06 00:16:11 - ERROR - stderr - 69%|██████▊ | 15385/22434 [14:08:31<5:10:25, 2.64s/it] +2025-02-06 00:16:11 - ERROR - stderr - +2025-02-06 00:16:11 - ERROR - stderr - +2025-02-06 00:16:11 - INFO - stdout - {'loss': 0.3558, 'grad_norm': 1.4098161458969116, 'learning_rate': 4.746730295245441e-06, 'epoch': 2.06} +2025-02-06 00:16:11 - ERROR - stderr - 69%|██████▊ | 15385/22434 [14:08:31<5:10:25, 2.64s/it] +2025-02-06 00:16:13 - ERROR - stderr - 69%|██████▊ | 15386/22434 [14:08:33<5:04:02, 2.59s/it] +2025-02-06 00:16:13 - ERROR - stderr - +2025-02-06 00:16:13 - ERROR - stderr - +2025-02-06 00:16:13 - INFO - stdout - {'loss': 0.3744, 'grad_norm': 1.3264961242675781, 'learning_rate': 4.7455018653489005e-06, 'epoch': 2.06} +2025-02-06 00:16:13 - ERROR - stderr - 69%|██████▊ | 15386/22434 [14:08:33<5:04:02, 2.59s/it] +2025-02-06 00:16:16 - ERROR - stderr - 69%|██████▊ | 15387/22434 [14:08:36<5:01:13, 2.56s/it] +2025-02-06 00:16:16 - ERROR - stderr - +2025-02-06 00:16:16 - ERROR - stderr - +2025-02-06 00:16:16 - INFO - stdout - {'loss': 0.342, 'grad_norm': 1.3074675798416138, 'learning_rate': 4.744273544977346e-06, 'epoch': 2.06} +2025-02-06 00:16:16 - ERROR - stderr - 69%|██████▊ | 15387/22434 [14:08:36<5:01:13, 2.56s/it] +2025-02-06 00:16:18 - ERROR - stderr - 69%|██████▊ | 15388/22434 [14:08:38<4:56:59, 2.53s/it] +2025-02-06 00:16:18 - ERROR - stderr - +2025-02-06 00:16:18 - ERROR - stderr - +2025-02-06 00:16:18 - INFO - stdout - {'loss': 0.3916, 'grad_norm': 1.5180000066757202, 'learning_rate': 4.7430453341563806e-06, 'epoch': 2.06} +2025-02-06 00:16:18 - ERROR - stderr - 69%|██████▊ | 15388/22434 [14:08:38<4:56:59, 2.53s/it] +2025-02-06 00:16:21 - ERROR - stderr - 69%|██████▊ | 
15389/22434 [14:08:41<4:54:07, 2.50s/it] +2025-02-06 00:16:21 - ERROR - stderr - +2025-02-06 00:16:21 - ERROR - stderr - +2025-02-06 00:16:21 - INFO - stdout - {'loss': 0.3741, 'grad_norm': 1.4633170366287231, 'learning_rate': 4.7418172329116056e-06, 'epoch': 2.06} +2025-02-06 00:16:21 - ERROR - stderr - 69%|██████▊ | 15389/22434 [14:08:41<4:54:07, 2.50s/it] +2025-02-06 00:16:23 - ERROR - stderr - 69%|██████▊ | 15390/22434 [14:08:43<4:54:12, 2.51s/it] +2025-02-06 00:16:23 - ERROR - stderr - +2025-02-06 00:16:23 - ERROR - stderr - +2025-02-06 00:16:23 - INFO - stdout - {'loss': 0.4013, 'grad_norm': 1.378891944885254, 'learning_rate': 4.740589241268617e-06, 'epoch': 2.06} +2025-02-06 00:16:23 - ERROR - stderr - 69%|██████▊ | 15390/22434 [14:08:43<4:54:12, 2.51s/it] +2025-02-06 00:16:26 - ERROR - stderr - 69%|██████▊ | 15391/22434 [14:08:45<4:53:02, 2.50s/it] +2025-02-06 00:16:26 - ERROR - stderr - +2025-02-06 00:16:26 - ERROR - stderr - +2025-02-06 00:16:26 - INFO - stdout - {'loss': 0.3818, 'grad_norm': 1.5611132383346558, 'learning_rate': 4.739361359253014e-06, 'epoch': 2.06} +2025-02-06 00:16:26 - ERROR - stderr - 69%|██████▊ | 15391/22434 [14:08:46<4:53:02, 2.50s/it] +2025-02-06 00:16:28 - ERROR - stderr - 69%|██████▊ | 15392/22434 [14:08:48<4:53:50, 2.50s/it] +2025-02-06 00:16:28 - ERROR - stderr - +2025-02-06 00:16:28 - ERROR - stderr - +2025-02-06 00:16:28 - INFO - stdout - {'loss': 0.3919, 'grad_norm': 1.3337607383728027, 'learning_rate': 4.73813358689039e-06, 'epoch': 2.06} +2025-02-06 00:16:28 - ERROR - stderr - 69%|██████▊ | 15392/22434 [14:08:48<4:53:50, 2.50s/it] +2025-02-06 00:16:31 - ERROR - stderr - 69%|██████▊ | 15393/22434 [14:08:50<4:52:15, 2.49s/it] +2025-02-06 00:16:31 - ERROR - stderr - +2025-02-06 00:16:31 - ERROR - stderr - +2025-02-06 00:16:31 - INFO - stdout - {'loss': 0.3058, 'grad_norm': 1.3864563703536987, 'learning_rate': 4.73690592420634e-06, 'epoch': 2.06} +2025-02-06 00:16:31 - ERROR - stderr - 69%|██████▊ | 15393/22434 
[14:08:51<4:52:15, 2.49s/it] +2025-02-06 00:16:33 - ERROR - stderr - 69%|██████▊ | 15394/22434 [14:08:53<4:51:59, 2.49s/it] +2025-02-06 00:16:33 - ERROR - stderr - +2025-02-06 00:16:33 - ERROR - stderr - +2025-02-06 00:16:33 - INFO - stdout - {'loss': 0.4276, 'grad_norm': 1.673964023590088, 'learning_rate': 4.7356783712264405e-06, 'epoch': 2.06} +2025-02-06 00:16:33 - ERROR - stderr - 69%|██████▊ | 15394/22434 [14:08:53<4:51:59, 2.49s/it] +2025-02-06 00:16:36 - ERROR - stderr - 69%|██████▊ | 15395/22434 [14:08:55<4:51:12, 2.48s/it] +2025-02-06 00:16:36 - ERROR - stderr - +2025-02-06 00:16:36 - ERROR - stderr - +2025-02-06 00:16:36 - INFO - stdout - {'loss': 0.3592, 'grad_norm': 1.4214125871658325, 'learning_rate': 4.7344509279762975e-06, 'epoch': 2.06} +2025-02-06 00:16:36 - ERROR - stderr - 69%|██████▊ | 15395/22434 [14:08:55<4:51:12, 2.48s/it] +2025-02-06 00:16:39 - ERROR - stderr - 69%|██████▊ | 15396/22434 [14:08:59<5:23:14, 2.76s/it] +2025-02-06 00:16:39 - ERROR - stderr - +2025-02-06 00:16:39 - ERROR - stderr - +2025-02-06 00:16:39 - INFO - stdout - {'loss': 0.3514, 'grad_norm': 1.6643668413162231, 'learning_rate': 4.733223594481482e-06, 'epoch': 2.06} +2025-02-06 00:16:39 - ERROR - stderr - 69%|██████▊ | 15396/22434 [14:08:59<5:23:14, 2.76s/it] +2025-02-06 00:16:42 - ERROR - stderr - 69%|██████▊ | 15397/22434 [14:09:01<5:15:35, 2.69s/it] +2025-02-06 00:16:42 - ERROR - stderr - +2025-02-06 00:16:42 - ERROR - stderr - +2025-02-06 00:16:42 - INFO - stdout - {'loss': 0.449, 'grad_norm': 1.4524213075637817, 'learning_rate': 4.731996370767578e-06, 'epoch': 2.06} +2025-02-06 00:16:42 - ERROR - stderr - 69%|██████▊ | 15397/22434 [14:09:01<5:15:35, 2.69s/it] +2025-02-06 00:16:44 - ERROR - stderr - 69%|██████▊ | 15398/22434 [14:09:04<5:07:10, 2.62s/it] +2025-02-06 00:16:44 - ERROR - stderr - +2025-02-06 00:16:44 - ERROR - stderr - +2025-02-06 00:16:44 - INFO - stdout - {'loss': 0.3586, 'grad_norm': 1.4068831205368042, 'learning_rate': 4.730769256860175e-06, 'epoch': 
2.06} +2025-02-06 00:16:44 - ERROR - stderr - 69%|██████▊ | 15398/22434 [14:09:04<5:07:10, 2.62s/it] +2025-02-06 00:16:47 - ERROR - stderr - 69%|██████▊ | 15399/22434 [14:09:06<5:01:59, 2.58s/it] +2025-02-06 00:16:47 - ERROR - stderr - +2025-02-06 00:16:47 - ERROR - stderr - +2025-02-06 00:16:47 - INFO - stdout - {'loss': 0.4277, 'grad_norm': 1.5733909606933594, 'learning_rate': 4.729542252784837e-06, 'epoch': 2.06} +2025-02-06 00:16:47 - ERROR - stderr - 69%|██████▊ | 15399/22434 [14:09:06<5:01:59, 2.58s/it] +2025-02-06 00:16:49 - ERROR - stderr - 69%|██████▊ | 15400/22434 [14:09:09<5:00:11, 2.56s/it] +2025-02-06 00:16:49 - ERROR - stderr - +2025-02-06 00:16:49 - ERROR - stderr - +2025-02-06 00:16:49 - INFO - stdout - {'loss': 0.3661, 'grad_norm': 1.3529168367385864, 'learning_rate': 4.728315358567155e-06, 'epoch': 2.06} +2025-02-06 00:16:49 - ERROR - stderr - 69%|██████▊ | 15400/22434 [14:09:09<5:00:11, 2.56s/it] +2025-02-06 00:16:52 - ERROR - stderr - 69%|██████▊ | 15401/22434 [14:09:11<5:02:56, 2.58s/it] +2025-02-06 00:16:52 - ERROR - stderr - +2025-02-06 00:16:52 - ERROR - stderr - +2025-02-06 00:16:52 - INFO - stdout - {'loss': 0.3594, 'grad_norm': 1.4157673120498657, 'learning_rate': 4.727088574232692e-06, 'epoch': 2.06} +2025-02-06 00:16:52 - ERROR - stderr - 69%|██████▊ | 15401/22434 [14:09:11<5:02:56, 2.58s/it] +2025-02-06 00:16:54 - ERROR - stderr - 69%|██████▊ | 15402/22434 [14:09:14<4:59:39, 2.56s/it] +2025-02-06 00:16:54 - ERROR - stderr - +2025-02-06 00:16:54 - ERROR - stderr - +2025-02-06 00:16:54 - INFO - stdout - {'loss': 0.3721, 'grad_norm': 1.3042711019515991, 'learning_rate': 4.7258618998070215e-06, 'epoch': 2.06} +2025-02-06 00:16:54 - ERROR - stderr - 69%|██████▊ | 15402/22434 [14:09:14<4:59:39, 2.56s/it] +2025-02-06 00:16:57 - ERROR - stderr - 69%|██████▊ | 15403/22434 [14:09:17<5:13:19, 2.67s/it] +2025-02-06 00:16:57 - ERROR - stderr - +2025-02-06 00:16:57 - ERROR - stderr - +2025-02-06 00:16:57 - INFO - stdout - {'loss': 0.4001, 
'grad_norm': 1.447706699371338, 'learning_rate': 4.7246353353157125e-06, 'epoch': 2.06} +2025-02-06 00:16:57 - ERROR - stderr - 69%|██████▊ | 15403/22434 [14:09:17<5:13:19, 2.67s/it] +2025-02-06 00:17:00 - ERROR - stderr - 69%|██████▊ | 15404/22434 [14:09:19<5:06:48, 2.62s/it] +2025-02-06 00:17:00 - ERROR - stderr - +2025-02-06 00:17:00 - ERROR - stderr - +2025-02-06 00:17:00 - INFO - stdout - {'loss': 0.4021, 'grad_norm': 1.6153004169464111, 'learning_rate': 4.7234088807843334e-06, 'epoch': 2.06} +2025-02-06 00:17:00 - ERROR - stderr - 69%|██████▊ | 15404/22434 [14:09:19<5:06:48, 2.62s/it] +2025-02-06 00:17:02 - ERROR - stderr - 69%|██████▊ | 15405/22434 [14:09:22<5:02:26, 2.58s/it] +2025-02-06 00:17:02 - ERROR - stderr - +2025-02-06 00:17:02 - ERROR - stderr - +2025-02-06 00:17:02 - INFO - stdout - {'loss': 0.3561, 'grad_norm': 1.5024189949035645, 'learning_rate': 4.722182536238445e-06, 'epoch': 2.06} +2025-02-06 00:17:02 - ERROR - stderr - 69%|██████▊ | 15405/22434 [14:09:22<5:02:26, 2.58s/it] +2025-02-06 00:17:02 - WARNING - transformers.tokenization_utils_base - Token indices sequence length is longer than the specified maximum sequence length for this model (2783 > 2048). Running this sequence through the model will result in indexing errors +2025-02-06 00:17:02 - WARNING - transformers.tokenization_utils_base - Token indices sequence length is longer than the specified maximum sequence length for this model (2783 > 2048). 
Running this sequence through the model will result in indexing errors +2025-02-06 00:17:05 - ERROR - stderr - 69%|██████▊ | 15406/22434 [14:09:24<5:02:23, 2.58s/it] +2025-02-06 00:17:05 - ERROR - stderr - +2025-02-06 00:17:05 - ERROR - stderr - +2025-02-06 00:17:05 - INFO - stdout - {'loss': 0.4167, 'grad_norm': 1.4377762079238892, 'learning_rate': 4.720956301703613e-06, 'epoch': 2.06} +2025-02-06 00:17:05 - ERROR - stderr - 69%|██████▊ | 15406/22434 [14:09:25<5:02:23, 2.58s/it] +2025-02-06 00:17:10 - ERROR - stderr - 69%|██████▊ | 15407/22434 [14:09:30<6:52:26, 3.52s/it] +2025-02-06 00:17:10 - ERROR - stderr - +2025-02-06 00:17:10 - ERROR - stderr - +2025-02-06 00:17:10 - INFO - stdout - {'loss': 0.3442, 'grad_norm': 1.46294105052948, 'learning_rate': 4.719730177205395e-06, 'epoch': 2.06} +2025-02-06 00:17:10 - ERROR - stderr - 69%|██████▊ | 15407/22434 [14:09:30<6:52:26, 3.52s/it] +2025-02-06 00:17:13 - ERROR - stderr - 69%|██████▊ | 15408/22434 [14:09:33<6:19:58, 3.24s/it] +2025-02-06 00:17:13 - ERROR - stderr - +2025-02-06 00:17:13 - ERROR - stderr - +2025-02-06 00:17:13 - INFO - stdout - {'loss': 0.3877, 'grad_norm': 1.506975769996643, 'learning_rate': 4.7185041627693485e-06, 'epoch': 2.06} +2025-02-06 00:17:13 - ERROR - stderr - 69%|██████▊ | 15408/22434 [14:09:33<6:19:58, 3.24s/it] +2025-02-06 00:17:15 - ERROR - stderr - 69%|██████▊ | 15409/22434 [14:09:35<5:50:56, 3.00s/it] +2025-02-06 00:17:15 - ERROR - stderr - +2025-02-06 00:17:15 - ERROR - stderr - +2025-02-06 00:17:15 - INFO - stdout - {'loss': 0.413, 'grad_norm': 1.5477209091186523, 'learning_rate': 4.71727825842103e-06, 'epoch': 2.06} +2025-02-06 00:17:15 - ERROR - stderr - 69%|██████▊ | 15409/22434 [14:09:35<5:50:56, 3.00s/it] +2025-02-06 00:17:18 - ERROR - stderr - 69%|██████▊ | 15410/22434 [14:09:38<5:31:12, 2.83s/it] +2025-02-06 00:17:18 - ERROR - stderr - +2025-02-06 00:17:18 - ERROR - stderr - +2025-02-06 00:17:18 - INFO - stdout - {'loss': 0.3968, 'grad_norm': 1.5747721195220947, 
'learning_rate': 4.71605246418599e-06, 'epoch': 2.06} +2025-02-06 00:17:18 - ERROR - stderr - 69%|██████▊ | 15410/22434 [14:09:38<5:31:12, 2.83s/it] +2025-02-06 00:17:20 - ERROR - stderr - 69%|██████▊ | 15411/22434 [14:09:40<5:17:46, 2.71s/it] +2025-02-06 00:17:20 - ERROR - stderr - +2025-02-06 00:17:20 - ERROR - stderr - +2025-02-06 00:17:20 - INFO - stdout - {'loss': 0.3785, 'grad_norm': 1.5606242418289185, 'learning_rate': 4.71482678008978e-06, 'epoch': 2.06} +2025-02-06 00:17:20 - ERROR - stderr - 69%|██████▊ | 15411/22434 [14:09:40<5:17:46, 2.71s/it] +2025-02-06 00:17:23 - ERROR - stderr - 69%|██████▊ | 15412/22434 [14:09:43<5:09:17, 2.64s/it] +2025-02-06 00:17:23 - ERROR - stderr - +2025-02-06 00:17:23 - ERROR - stderr - +2025-02-06 00:17:23 - INFO - stdout - {'loss': 0.3879, 'grad_norm': 1.3939623832702637, 'learning_rate': 4.713601206157953e-06, 'epoch': 2.06} +2025-02-06 00:17:23 - ERROR - stderr - 69%|██████▊ | 15412/22434 [14:09:43<5:09:17, 2.64s/it] +2025-02-06 00:17:25 - ERROR - stderr - 69%|██████▊ | 15413/22434 [14:09:45<5:03:01, 2.59s/it] +2025-02-06 00:17:25 - ERROR - stderr - +2025-02-06 00:17:25 - ERROR - stderr - +2025-02-06 00:17:25 - INFO - stdout - {'loss': 0.3207, 'grad_norm': 1.3177127838134766, 'learning_rate': 4.7123757424160425e-06, 'epoch': 2.06} +2025-02-06 00:17:25 - ERROR - stderr - 69%|██████▊ | 15413/22434 [14:09:45<5:03:01, 2.59s/it] +2025-02-06 00:17:28 - ERROR - stderr - 69%|██████▊ | 15414/22434 [14:09:47<4:57:39, 2.54s/it] +2025-02-06 00:17:28 - ERROR - stderr - +2025-02-06 00:17:28 - ERROR - stderr - +2025-02-06 00:17:28 - INFO - stdout - {'loss': 0.3892, 'grad_norm': 1.5481077432632446, 'learning_rate': 4.711150388889607e-06, 'epoch': 2.06} +2025-02-06 00:17:28 - ERROR - stderr - 69%|██████▊ | 15414/22434 [14:09:47<4:57:39, 2.54s/it] +2025-02-06 00:17:30 - ERROR - stderr - 69%|██████▊ | 15415/22434 [14:09:50<4:58:27, 2.55s/it] +2025-02-06 00:17:30 - ERROR - stderr - +2025-02-06 00:17:30 - ERROR - stderr - +2025-02-06
00:17:30 - INFO - stdout - {'loss': 0.409, 'grad_norm': 1.4694411754608154, 'learning_rate': 4.709925145604173e-06, 'epoch': 2.06} +2025-02-06 00:17:30 - ERROR - stderr - 69%|██████▊ | 15415/22434 [14:09:50<4:58:27, 2.55s/it] +2025-02-06 00:17:33 - ERROR - stderr - 69%|██████▊ | 15416/22434 [14:09:52<4:54:30, 2.52s/it] +2025-02-06 00:17:33 - ERROR - stderr - +2025-02-06 00:17:33 - ERROR - stderr - +2025-02-06 00:17:33 - INFO - stdout - {'loss': 0.3447, 'grad_norm': 1.406741976737976, 'learning_rate': 4.708700012585292e-06, 'epoch': 2.06} +2025-02-06 00:17:33 - ERROR - stderr - 69%|██████▊ | 15416/22434 [14:09:53<4:54:30, 2.52s/it] +2025-02-06 00:17:35 - ERROR - stderr - 69%|██████▊ | 15417/22434 [14:09:55<4:55:50, 2.53s/it] +2025-02-06 00:17:35 - ERROR - stderr - +2025-02-06 00:17:35 - ERROR - stderr - +2025-02-06 00:17:35 - INFO - stdout - {'loss': 0.3741, 'grad_norm': 1.3609262704849243, 'learning_rate': 4.707474989858499e-06, 'epoch': 2.06} +2025-02-06 00:17:35 - ERROR - stderr - 69%|██████▊ | 15417/22434 [14:09:55<4:55:50, 2.53s/it] +2025-02-06 00:17:38 - ERROR - stderr - 69%|██████▊ | 15418/22434 [14:09:58<5:12:58, 2.68s/it] +2025-02-06 00:17:38 - ERROR - stderr - +2025-02-06 00:17:38 - ERROR - stderr - +2025-02-06 00:17:38 - INFO - stdout - {'loss': 0.4058, 'grad_norm': 1.4467686414718628, 'learning_rate': 4.706250077449318e-06, 'epoch': 2.06} +2025-02-06 00:17:38 - ERROR - stderr - 69%|██████▊ | 15418/22434 [14:09:58<5:12:58, 2.68s/it] +2025-02-06 00:17:41 - ERROR - stderr - 69%|██████▊ | 15419/22434 [14:10:01<5:12:25, 2.67s/it] +2025-02-06 00:17:41 - ERROR - stderr - +2025-02-06 00:17:41 - ERROR - stderr - +2025-02-06 00:17:41 - INFO - stdout - {'loss': 0.3858, 'grad_norm': 1.4460002183914185, 'learning_rate': 4.705025275383297e-06, 'epoch': 2.06} +2025-02-06 00:17:41 - ERROR - stderr - 69%|██████▊ | 15419/22434 [14:10:01<5:12:25, 2.67s/it] +2025-02-06 00:17:43 - ERROR - stderr - 69%|██████▊ | 15420/22434 [14:10:03<5:08:52, 2.64s/it] +2025-02-06 00:17:44 - 
ERROR - stderr - +2025-02-06 00:17:44 - ERROR - stderr - +2025-02-06 00:17:44 - INFO - stdout - {'loss': 0.3894, 'grad_norm': 1.3948044776916504, 'learning_rate': 4.7038005836859525e-06, 'epoch': 2.06} +2025-02-06 00:17:44 - ERROR - stderr - 69%|██████▊ | 15420/22434 [14:10:03<5:08:52, 2.64s/it] +2025-02-06 00:17:46 - ERROR - stderr - 69%|██████▊ | 15421/22434 [14:10:06<5:04:25, 2.60s/it] +2025-02-06 00:17:46 - ERROR - stderr - +2025-02-06 00:17:46 - ERROR - stderr - +2025-02-06 00:17:46 - INFO - stdout - {'loss': 0.4011, 'grad_norm': 1.5428670644760132, 'learning_rate': 4.702576002382818e-06, 'epoch': 2.06} +2025-02-06 00:17:46 - ERROR - stderr - 69%|██████▊ | 15421/22434 [14:10:06<5:04:25, 2.60s/it] +2025-02-06 00:17:48 - ERROR - stderr - 69%|██████▊ | 15422/22434 [14:10:08<4:59:50, 2.57s/it] +2025-02-06 00:17:49 - ERROR - stderr - +2025-02-06 00:17:49 - ERROR - stderr - +2025-02-06 00:17:49 - INFO - stdout - {'loss': 0.3949, 'grad_norm': 1.483926773071289, 'learning_rate': 4.7013515314994174e-06, 'epoch': 2.06} +2025-02-06 00:17:49 - ERROR - stderr - 69%|██████▊ | 15422/22434 [14:10:08<4:59:50, 2.57s/it] +2025-02-06 00:17:51 - ERROR - stderr - 69%|██████▊ | 15423/22434 [14:10:11<4:57:49, 2.55s/it] +2025-02-06 00:17:51 - ERROR - stderr - +2025-02-06 00:17:51 - ERROR - stderr - +2025-02-06 00:17:51 - INFO - stdout - {'loss': 0.3747, 'grad_norm': 1.4130859375, 'learning_rate': 4.70012717106127e-06, 'epoch': 2.06} +2025-02-06 00:17:51 - ERROR - stderr - 69%|██████▊ | 15423/22434 [14:10:11<4:57:49, 2.55s/it] +2025-02-06 00:17:53 - ERROR - stderr - 69%|██████▉ | 15424/22434 [14:10:13<4:54:24, 2.52s/it] +2025-02-06 00:17:54 - ERROR - stderr - +2025-02-06 00:17:54 - ERROR - stderr - +2025-02-06 00:17:54 - INFO - stdout - {'loss': 0.3625, 'grad_norm': 1.3366918563842773, 'learning_rate': 4.698902921093907e-06, 'epoch': 2.06} +2025-02-06 00:17:54 - ERROR - stderr - 69%|██████▉ | 15424/22434 [14:10:13<4:54:24, 2.52s/it] +2025-02-06 00:17:56 - ERROR - stderr - 69%|██████▉ | 
15425/22434 [14:10:16<4:50:49, 2.49s/it] +2025-02-06 00:17:56 - ERROR - stderr - +2025-02-06 00:17:56 - ERROR - stderr - +2025-02-06 00:17:56 - INFO - stdout - {'loss': 0.4103, 'grad_norm': 1.504630446434021, 'learning_rate': 4.697678781622837e-06, 'epoch': 2.06} +2025-02-06 00:17:56 - ERROR - stderr - 69%|██████▉ | 15425/22434 [14:10:16<4:50:49, 2.49s/it] +2025-02-06 00:17:58 - ERROR - stderr - 69%|██████▉ | 15426/22434 [14:10:18<4:49:56, 2.48s/it] +2025-02-06 00:17:58 - ERROR - stderr - +2025-02-06 00:17:58 - ERROR - stderr - +2025-02-06 00:17:58 - INFO - stdout - {'loss': 0.36, 'grad_norm': 1.4659929275512695, 'learning_rate': 4.696454752673578e-06, 'epoch': 2.06} +2025-02-06 00:17:58 - ERROR - stderr - 69%|██████▉ | 15426/22434 [14:10:18<4:49:56, 2.48s/it] +2025-02-06 00:18:01 - ERROR - stderr - 69%|██████▉ | 15427/22434 [14:10:21<4:52:39, 2.51s/it] +2025-02-06 00:18:01 - ERROR - stderr - +2025-02-06 00:18:01 - ERROR - stderr - +2025-02-06 00:18:01 - INFO - stdout - {'loss': 0.403, 'grad_norm': 1.4550950527191162, 'learning_rate': 4.695230834271647e-06, 'epoch': 2.06} +2025-02-06 00:18:01 - ERROR - stderr - 69%|██████▉ | 15427/22434 [14:10:21<4:52:39, 2.51s/it] +2025-02-06 00:18:03 - ERROR - stderr - 69%|██████▉ | 15428/22434 [14:10:23<4:51:03, 2.49s/it] +2025-02-06 00:18:03 - ERROR - stderr - +2025-02-06 00:18:03 - ERROR - stderr - +2025-02-06 00:18:03 - INFO - stdout - {'loss': 0.3468, 'grad_norm': 1.4081318378448486, 'learning_rate': 4.694007026442551e-06, 'epoch': 2.06} +2025-02-06 00:18:03 - ERROR - stderr - 69%|██████▉ | 15428/22434 [14:10:23<4:51:03, 2.49s/it] +2025-02-06 00:18:06 - ERROR - stderr - 69%|██████▉ | 15429/22434 [14:10:26<4:51:17, 2.49s/it] +2025-02-06 00:18:06 - ERROR - stderr - +2025-02-06 00:18:06 - ERROR - stderr - +2025-02-06 00:18:06 - INFO - stdout - {'loss': 0.4071, 'grad_norm': 1.5238710641860962, 'learning_rate': 4.692783329211802e-06, 'epoch': 2.06} +2025-02-06 00:18:06 - ERROR - stderr - 69%|██████▉ | 15429/22434 
[14:10:26<4:51:17, 2.49s/it] +2025-02-06 00:18:08 - ERROR - stderr - 69%|██████▉ | 15430/22434 [14:10:28<4:52:00, 2.50s/it] +2025-02-06 00:18:08 - ERROR - stderr - +2025-02-06 00:18:08 - ERROR - stderr - +2025-02-06 00:18:08 - INFO - stdout - {'loss': 0.3464, 'grad_norm': 1.3430339097976685, 'learning_rate': 4.691559742604906e-06, 'epoch': 2.06} +2025-02-06 00:18:08 - ERROR - stderr - 69%|██████▉ | 15430/22434 [14:10:28<4:52:00, 2.50s/it] +2025-02-06 00:18:11 - ERROR - stderr - 69%|██████▉ | 15431/22434 [14:10:31<4:51:43, 2.50s/it] +2025-02-06 00:18:11 - ERROR - stderr - +2025-02-06 00:18:11 - ERROR - stderr - +2025-02-06 00:18:11 - INFO - stdout - {'loss': 0.4052, 'grad_norm': 1.5392705202102661, 'learning_rate': 4.690336266647368e-06, 'epoch': 2.06} +2025-02-06 00:18:11 - ERROR - stderr - 69%|██████▉ | 15431/22434 [14:10:31<4:51:43, 2.50s/it] +2025-02-06 00:18:13 - ERROR - stderr - 69%|██████▉ | 15432/22434 [14:10:33<4:51:55, 2.50s/it] +2025-02-06 00:18:13 - ERROR - stderr - +2025-02-06 00:18:13 - ERROR - stderr - +2025-02-06 00:18:13 - INFO - stdout - {'loss': 0.3903, 'grad_norm': 1.469671607017517, 'learning_rate': 4.68911290136469e-06, 'epoch': 2.06} +2025-02-06 00:18:13 - ERROR - stderr - 69%|██████▉ | 15432/22434 [14:10:33<4:51:55, 2.50s/it] +2025-02-06 00:18:16 - ERROR - stderr - 69%|██████▉ | 15433/22434 [14:10:36<4:50:13, 2.49s/it] +2025-02-06 00:18:16 - ERROR - stderr - +2025-02-06 00:18:16 - ERROR - stderr - +2025-02-06 00:18:16 - INFO - stdout - {'loss': 0.3561, 'grad_norm': 1.3256806135177612, 'learning_rate': 4.687889646782374e-06, 'epoch': 2.06} +2025-02-06 00:18:16 - ERROR - stderr - 69%|██████▉ | 15433/22434 [14:10:36<4:50:13, 2.49s/it] +2025-02-06 00:18:18 - ERROR - stderr - 69%|██████▉ | 15434/22434 [14:10:38<4:49:55, 2.49s/it] +2025-02-06 00:18:18 - ERROR - stderr - +2025-02-06 00:18:18 - ERROR - stderr - +2025-02-06 00:18:18 - INFO - stdout - {'loss': 0.3892, 'grad_norm': 1.5027660131454468, 'learning_rate': 4.686666502925908e-06, 'epoch': 
2.06} +2025-02-06 00:18:18 - ERROR - stderr - 69%|██████▉ | 15434/22434 [14:10:38<4:49:55, 2.49s/it] +2025-02-06 00:18:21 - ERROR - stderr - 69%|██████▉ | 15435/22434 [14:10:41<4:47:42, 2.47s/it] +2025-02-06 00:18:21 - ERROR - stderr - +2025-02-06 00:18:21 - ERROR - stderr - +2025-02-06 00:18:21 - INFO - stdout - {'loss': 0.3932, 'grad_norm': 1.4907779693603516, 'learning_rate': 4.685443469820799e-06, 'epoch': 2.06} +2025-02-06 00:18:21 - ERROR - stderr - 69%|██████▉ | 15435/22434 [14:10:41<4:47:42, 2.47s/it] +2025-02-06 00:18:23 - ERROR - stderr - 69%|██████▉ | 15436/22434 [14:10:43<4:48:29, 2.47s/it] +2025-02-06 00:18:23 - ERROR - stderr - +2025-02-06 00:18:23 - ERROR - stderr - +2025-02-06 00:18:23 - INFO - stdout - {'loss': 0.4415, 'grad_norm': 1.6684945821762085, 'learning_rate': 4.684220547492539e-06, 'epoch': 2.06} +2025-02-06 00:18:23 - ERROR - stderr - 69%|██████▉ | 15436/22434 [14:10:43<4:48:29, 2.47s/it] +2025-02-06 00:18:26 - ERROR - stderr - 69%|██████▉ | 15437/22434 [14:10:45<4:48:06, 2.47s/it] +2025-02-06 00:18:26 - ERROR - stderr - +2025-02-06 00:18:26 - ERROR - stderr - +2025-02-06 00:18:26 - INFO - stdout - {'loss': 0.3496, 'grad_norm': 1.4465241432189941, 'learning_rate': 4.682997735966607e-06, 'epoch': 2.06} +2025-02-06 00:18:26 - ERROR - stderr - 69%|██████▉ | 15437/22434 [14:10:46<4:48:06, 2.47s/it] +2025-02-06 00:18:28 - ERROR - stderr - 69%|██████▉ | 15438/22434 [14:10:48<4:48:20, 2.47s/it] +2025-02-06 00:18:28 - ERROR - stderr - +2025-02-06 00:18:28 - ERROR - stderr - +2025-02-06 00:18:28 - INFO - stdout - {'loss': 0.3857, 'grad_norm': 1.2934316396713257, 'learning_rate': 4.681775035268507e-06, 'epoch': 2.06} +2025-02-06 00:18:28 - ERROR - stderr - 69%|██████▉ | 15438/22434 [14:10:48<4:48:20, 2.47s/it] +2025-02-06 00:18:31 - ERROR - stderr - 69%|██████▉ | 15439/22434 [14:10:50<4:46:41, 2.46s/it] +2025-02-06 00:18:31 - ERROR - stderr - +2025-02-06 00:18:31 - ERROR - stderr - +2025-02-06 00:18:31 - INFO - stdout - {'loss': 0.3723, 
'grad_norm': 1.372052550315857, 'learning_rate': 4.6805524454237095e-06, 'epoch': 2.06} +2025-02-06 00:18:31 - ERROR - stderr - 69%|██████▉ | 15439/22434 [14:10:50<4:46:41, 2.46s/it] +2025-02-06 00:18:33 - ERROR - stderr - 69%|██████▉ | 15440/22434 [14:10:53<4:47:12, 2.46s/it] +2025-02-06 00:18:33 - ERROR - stderr - +2025-02-06 00:18:33 - ERROR - stderr - +2025-02-06 00:18:33 - INFO - stdout - {'loss': 0.3546, 'grad_norm': 1.4609370231628418, 'learning_rate': 4.6793299664577145e-06, 'epoch': 2.06} +2025-02-06 00:18:33 - ERROR - stderr - 69%|██████▉ | 15440/22434 [14:10:53<4:47:12, 2.46s/it] +2025-02-06 00:18:36 - ERROR - stderr - 69%|██████▉ | 15441/22434 [14:10:55<4:50:42, 2.49s/it] +2025-02-06 00:18:36 - ERROR - stderr - +2025-02-06 00:18:36 - ERROR - stderr - +2025-02-06 00:18:36 - INFO - stdout - {'loss': 0.3685, 'grad_norm': 1.4659334421157837, 'learning_rate': 4.678107598395991e-06, 'epoch': 2.06} +2025-02-06 00:18:36 - ERROR - stderr - 69%|██████▉ | 15441/22434 [14:10:55<4:50:42, 2.49s/it] +2025-02-06 00:18:38 - ERROR - stderr - 69%|██████▉ | 15442/22434 [14:10:58<4:51:12, 2.50s/it] +2025-02-06 00:18:38 - ERROR - stderr - +2025-02-06 00:18:38 - ERROR - stderr - +2025-02-06 00:18:38 - INFO - stdout - {'loss': 0.4092, 'grad_norm': 1.42794668674469, 'learning_rate': 4.676885341264018e-06, 'epoch': 2.06} +2025-02-06 00:18:38 - ERROR - stderr - 69%|██████▉ | 15442/22434 [14:10:58<4:51:12, 2.50s/it] +2025-02-06 00:18:41 - ERROR - stderr - 69%|██████▉ | 15443/22434 [14:11:00<4:50:21, 2.49s/it] +2025-02-06 00:18:41 - ERROR - stderr - +2025-02-06 00:18:41 - ERROR - stderr - +2025-02-06 00:18:41 - INFO - stdout - {'loss': 0.3961, 'grad_norm': 1.466582179069519, 'learning_rate': 4.675663195087285e-06, 'epoch': 2.07} +2025-02-06 00:18:41 - ERROR - stderr - 69%|██████▉ | 15443/22434 [14:11:00<4:50:21, 2.49s/it] +2025-02-06 00:18:43 - ERROR - stderr - 69%|██████▉ | 15444/22434 [14:11:03<4:50:46, 2.50s/it] +2025-02-06 00:18:43 - ERROR - stderr - +2025-02-06 00:18:43 - 
ERROR - stderr - +2025-02-06 00:18:43 - INFO - stdout - {'loss': 0.3839, 'grad_norm': 1.6180777549743652, 'learning_rate': 4.674441159891252e-06, 'epoch': 2.07} +2025-02-06 00:18:43 - ERROR - stderr - 69%|██████▉ | 15444/22434 [14:11:03<4:50:46, 2.50s/it] +2025-02-06 00:18:46 - ERROR - stderr - 69%|██████▉ | 15445/22434 [14:11:05<4:49:55, 2.49s/it] +2025-02-06 00:18:46 - ERROR - stderr - +2025-02-06 00:18:46 - ERROR - stderr - +2025-02-06 00:18:46 - INFO - stdout - {'loss': 0.4032, 'grad_norm': 1.452977180480957, 'learning_rate': 4.673219235701398e-06, 'epoch': 2.07} +2025-02-06 00:18:46 - ERROR - stderr - 69%|██████▉ | 15445/22434 [14:11:05<4:49:55, 2.49s/it] +2025-02-06 00:18:48 - ERROR - stderr - 69%|██████▉ | 15446/22434 [14:11:08<4:48:22, 2.48s/it] +2025-02-06 00:18:48 - ERROR - stderr - +2025-02-06 00:18:48 - ERROR - stderr - +2025-02-06 00:18:48 - INFO - stdout - {'loss': 0.3724, 'grad_norm': 1.3510925769805908, 'learning_rate': 4.6719974225431926e-06, 'epoch': 2.07} +2025-02-06 00:18:48 - ERROR - stderr - 69%|██████▉ | 15446/22434 [14:11:08<4:48:22, 2.48s/it] +2025-02-06 00:18:51 - ERROR - stderr - 69%|██████▉ | 15447/22434 [14:11:10<4:50:06, 2.49s/it] +2025-02-06 00:18:51 - ERROR - stderr - +2025-02-06 00:18:51 - ERROR - stderr - +2025-02-06 00:18:51 - INFO - stdout - {'loss': 0.3478, 'grad_norm': 1.3907662630081177, 'learning_rate': 4.670775720442102e-06, 'epoch': 2.07} +2025-02-06 00:18:51 - ERROR - stderr - 69%|██████▉ | 15447/22434 [14:11:10<4:50:06, 2.49s/it] +2025-02-06 00:18:53 - ERROR - stderr - 69%|██████▉ | 15448/22434 [14:11:13<4:48:31, 2.48s/it] +2025-02-06 00:18:53 - ERROR - stderr - +2025-02-06 00:18:53 - ERROR - stderr - +2025-02-06 00:18:53 - INFO - stdout - {'loss': 0.4248, 'grad_norm': 1.6616202592849731, 'learning_rate': 4.669554129423593e-06, 'epoch': 2.07} +2025-02-06 00:18:53 - ERROR - stderr - 69%|██████▉ | 15448/22434 [14:11:13<4:48:31, 2.48s/it] +2025-02-06 00:18:55 - ERROR - stderr - 69%|██████▉ | 15449/22434 [14:11:15<4:47:47, 
2.47s/it] +2025-02-06 00:18:56 - ERROR - stderr - +2025-02-06 00:18:56 - ERROR - stderr - +2025-02-06 00:18:56 - INFO - stdout - {'loss': 0.3925, 'grad_norm': 1.3344568014144897, 'learning_rate': 4.668332649513127e-06, 'epoch': 2.07} +2025-02-06 00:18:56 - ERROR - stderr - 69%|██████▉ | 15449/22434 [14:11:15<4:47:47, 2.47s/it] +2025-02-06 00:18:58 - ERROR - stderr - 69%|██████▉ | 15450/22434 [14:11:18<4:47:37, 2.47s/it] +2025-02-06 00:18:58 - ERROR - stderr - +2025-02-06 00:18:58 - ERROR - stderr - +2025-02-06 00:18:58 - INFO - stdout - {'loss': 0.4016, 'grad_norm': 1.4247746467590332, 'learning_rate': 4.667111280736164e-06, 'epoch': 2.07} +2025-02-06 00:18:58 - ERROR - stderr - 69%|██████▉ | 15450/22434 [14:11:18<4:47:37, 2.47s/it] +2025-02-06 00:19:00 - ERROR - stderr - 69%|██████▉ | 15451/22434 [14:11:20<4:47:38, 2.47s/it] +2025-02-06 00:19:00 - ERROR - stderr - +2025-02-06 00:19:00 - ERROR - stderr - +2025-02-06 00:19:00 - INFO - stdout - {'loss': 0.4014, 'grad_norm': 1.465731143951416, 'learning_rate': 4.665890023118164e-06, 'epoch': 2.07} +2025-02-06 00:19:00 - ERROR - stderr - 69%|██████▉ | 15451/22434 [14:11:20<4:47:38, 2.47s/it] +2025-02-06 00:19:03 - ERROR - stderr - 69%|██████▉ | 15452/22434 [14:11:23<4:47:19, 2.47s/it] +2025-02-06 00:19:03 - ERROR - stderr - +2025-02-06 00:19:03 - ERROR - stderr - +2025-02-06 00:19:03 - INFO - stdout - {'loss': 0.3552, 'grad_norm': 1.4717645645141602, 'learning_rate': 4.664668876684586e-06, 'epoch': 2.07} +2025-02-06 00:19:03 - ERROR - stderr - 69%|██████▉ | 15452/22434 [14:11:23<4:47:19, 2.47s/it] +2025-02-06 00:19:05 - ERROR - stderr - 69%|██████▉ | 15453/22434 [14:11:25<4:47:16, 2.47s/it] +2025-02-06 00:19:05 - ERROR - stderr - +2025-02-06 00:19:05 - ERROR - stderr - +2025-02-06 00:19:05 - INFO - stdout - {'loss': 0.3903, 'grad_norm': 1.4359855651855469, 'learning_rate': 4.663447841460872e-06, 'epoch': 2.07} +2025-02-06 00:19:05 - ERROR - stderr - 69%|██████▉ | 15453/22434 [14:11:25<4:47:16, 2.47s/it] +2025-02-06 
00:19:08 - ERROR - stderr - 69%|██████▉ | 15454/22434 [14:11:28<4:48:09, 2.48s/it] +2025-02-06 00:19:08 - ERROR - stderr - +2025-02-06 00:19:08 - ERROR - stderr - +2025-02-06 00:19:08 - INFO - stdout - {'loss': 0.378, 'grad_norm': 1.3743728399276733, 'learning_rate': 4.662226917472485e-06, 'epoch': 2.07} +2025-02-06 00:19:08 - ERROR - stderr - 69%|██████▉ | 15454/22434 [14:11:28<4:48:09, 2.48s/it] +2025-02-06 00:19:10 - ERROR - stderr - 69%|██████▉ | 15455/22434 [14:11:30<4:49:04, 2.49s/it] +2025-02-06 00:19:10 - ERROR - stderr - +2025-02-06 00:19:10 - ERROR - stderr - +2025-02-06 00:19:10 - INFO - stdout - {'loss': 0.416, 'grad_norm': 1.4159470796585083, 'learning_rate': 4.661006104744871e-06, 'epoch': 2.07} +2025-02-06 00:19:10 - ERROR - stderr - 69%|██████▉ | 15455/22434 [14:11:30<4:49:04, 2.49s/it] +2025-02-06 00:19:13 - ERROR - stderr - 69%|██████▉ | 15456/22434 [14:11:33<4:48:09, 2.48s/it] +2025-02-06 00:19:13 - ERROR - stderr - +2025-02-06 00:19:13 - ERROR - stderr - +2025-02-06 00:19:13 - INFO - stdout - {'loss': 0.4044, 'grad_norm': 1.5950192213058472, 'learning_rate': 4.659785403303476e-06, 'epoch': 2.07} +2025-02-06 00:19:13 - ERROR - stderr - 69%|██████▉ | 15456/22434 [14:11:33<4:48:09, 2.48s/it] +2025-02-06 00:19:15 - ERROR - stderr - 69%|██████▉ | 15457/22434 [14:11:35<4:49:42, 2.49s/it] +2025-02-06 00:19:15 - ERROR - stderr - +2025-02-06 00:19:15 - ERROR - stderr - +2025-02-06 00:19:15 - INFO - stdout - {'loss': 0.3399, 'grad_norm': 1.3533879518508911, 'learning_rate': 4.658564813173747e-06, 'epoch': 2.07} +2025-02-06 00:19:15 - ERROR - stderr - 69%|██████▉ | 15457/22434 [14:11:35<4:49:42, 2.49s/it] +2025-02-06 00:19:18 - ERROR - stderr - 69%|██████▉ | 15458/22434 [14:11:38<4:46:49, 2.47s/it] +2025-02-06 00:19:18 - ERROR - stderr - +2025-02-06 00:19:18 - ERROR - stderr - +2025-02-06 00:19:18 - INFO - stdout - {'loss': 0.3432, 'grad_norm': 1.539728045463562, 'learning_rate': 4.657344334381116e-06, 'epoch': 2.07} +2025-02-06 00:19:18 - ERROR - stderr - 
69%|██████▉ | 15458/22434 [14:11:38<4:46:49, 2.47s/it] +2025-02-06 00:19:20 - ERROR - stderr - 69%|██████▉ | 15459/22434 [14:11:40<4:47:35, 2.47s/it] +2025-02-06 00:19:20 - ERROR - stderr - +2025-02-06 00:19:20 - ERROR - stderr - +2025-02-06 00:19:20 - INFO - stdout - {'loss': 0.3896, 'grad_norm': 1.3842591047286987, 'learning_rate': 4.6561239669510385e-06, 'epoch': 2.07} +2025-02-06 00:19:20 - ERROR - stderr - 69%|██████▉ | 15459/22434 [14:11:40<4:47:35, 2.47s/it] +2025-02-06 00:19:23 - ERROR - stderr - 69%|██████▉ | 15460/22434 [14:11:43<4:59:02, 2.57s/it] +2025-02-06 00:19:23 - ERROR - stderr - +2025-02-06 00:19:23 - ERROR - stderr - +2025-02-06 00:19:23 - INFO - stdout - {'loss': 0.4117, 'grad_norm': 1.5253359079360962, 'learning_rate': 4.654903710908938e-06, 'epoch': 2.07} +2025-02-06 00:19:23 - ERROR - stderr - 69%|██████▉ | 15460/22434 [14:11:43<4:59:02, 2.57s/it] +2025-02-06 00:19:26 - ERROR - stderr - 69%|██████▉ | 15461/22434 [14:11:45<4:55:36, 2.54s/it] +2025-02-06 00:19:26 - ERROR - stderr - +2025-02-06 00:19:26 - ERROR - stderr - +2025-02-06 00:19:26 - INFO - stdout - {'loss': 0.4018, 'grad_norm': 1.5704386234283447, 'learning_rate': 4.653683566280253e-06, 'epoch': 2.07} +2025-02-06 00:19:26 - ERROR - stderr - 69%|██████▉ | 15461/22434 [14:11:45<4:55:36, 2.54s/it] +2025-02-06 00:19:28 - ERROR - stderr - 69%|██████▉ | 15462/22434 [14:11:48<4:52:19, 2.52s/it] +2025-02-06 00:19:28 - ERROR - stderr - +2025-02-06 00:19:28 - ERROR - stderr - +2025-02-06 00:19:28 - INFO - stdout - {'loss': 0.3447, 'grad_norm': 1.3722317218780518, 'learning_rate': 4.652463533090425e-06, 'epoch': 2.07} +2025-02-06 00:19:28 - ERROR - stderr - 69%|██████▉ | 15462/22434 [14:11:48<4:52:19, 2.52s/it] +2025-02-06 00:19:30 - ERROR - stderr - 69%|██████▉ | 15463/22434 [14:11:50<4:51:54, 2.51s/it] +2025-02-06 00:19:31 - ERROR - stderr - +2025-02-06 00:19:31 - ERROR - stderr - +2025-02-06 00:19:31 - INFO - stdout - {'loss': 0.342, 'grad_norm': 1.3908131122589111, 'learning_rate': 
4.65124361136487e-06, 'epoch': 2.07} +2025-02-06 00:19:31 - ERROR - stderr - 69%|██████▉ | 15463/22434 [14:11:50<4:51:54, 2.51s/it] +2025-02-06 00:19:33 - ERROR - stderr - 69%|██████▉ | 15464/22434 [14:11:53<4:52:48, 2.52s/it] +2025-02-06 00:19:33 - ERROR - stderr - +2025-02-06 00:19:33 - ERROR - stderr - +2025-02-06 00:19:33 - INFO - stdout - {'loss': 0.4318, 'grad_norm': 1.599563479423523, 'learning_rate': 4.65002380112903e-06, 'epoch': 2.07} +2025-02-06 00:19:33 - ERROR - stderr - 69%|██████▉ | 15464/22434 [14:11:53<4:52:48, 2.52s/it] +2025-02-06 00:19:35 - ERROR - stderr - 69%|██████▉ | 15465/22434 [14:11:55<4:49:17, 2.49s/it] +2025-02-06 00:19:35 - ERROR - stderr - +2025-02-06 00:19:35 - ERROR - stderr - +2025-02-06 00:19:35 - INFO - stdout - {'loss': 0.3428, 'grad_norm': 1.3315554857254028, 'learning_rate': 4.648804102408322e-06, 'epoch': 2.07} +2025-02-06 00:19:35 - ERROR - stderr - 69%|██████▉ | 15465/22434 [14:11:55<4:49:17, 2.49s/it] +2025-02-06 00:19:38 - ERROR - stderr - 69%|██████▉ | 15466/22434 [14:11:58<4:49:44, 2.49s/it] +2025-02-06 00:19:38 - ERROR - stderr - +2025-02-06 00:19:38 - ERROR - stderr - +2025-02-06 00:19:38 - INFO - stdout - {'loss': 0.4078, 'grad_norm': 1.4610991477966309, 'learning_rate': 4.647584515228172e-06, 'epoch': 2.07} +2025-02-06 00:19:38 - ERROR - stderr - 69%|██████▉ | 15466/22434 [14:11:58<4:49:44, 2.49s/it] +2025-02-06 00:19:40 - ERROR - stderr - 69%|██████▉ | 15467/22434 [14:12:00<4:47:24, 2.48s/it] +2025-02-06 00:19:40 - ERROR - stderr - +2025-02-06 00:19:40 - ERROR - stderr - +2025-02-06 00:19:40 - INFO - stdout - {'loss': 0.3362, 'grad_norm': 1.3391027450561523, 'learning_rate': 4.646365039614001e-06, 'epoch': 2.07} +2025-02-06 00:19:40 - ERROR - stderr - 69%|██████▉ | 15467/22434 [14:12:00<4:47:24, 2.48s/it] +2025-02-06 00:19:43 - ERROR - stderr - 69%|██████▉ | 15468/22434 [14:12:03<4:50:44, 2.50s/it] +2025-02-06 00:19:43 - ERROR - stderr - +2025-02-06 00:19:43 - ERROR - stderr - +2025-02-06 00:19:43 - INFO - stdout - 
{'loss': 0.4062, 'grad_norm': 1.5410743951797485, 'learning_rate': 4.6451456755912235e-06, 'epoch': 2.07} +2025-02-06 00:19:43 - ERROR - stderr - 69%|██████▉ | 15468/22434 [14:12:03<4:50:44, 2.50s/it] +2025-02-06 00:19:45 - ERROR - stderr - 69%|██████▉ | 15469/22434 [14:12:05<4:50:18, 2.50s/it] +2025-02-06 00:19:45 - ERROR - stderr - +2025-02-06 00:19:45 - ERROR - stderr - +2025-02-06 00:19:45 - INFO - stdout - {'loss': 0.384, 'grad_norm': 1.4280060529708862, 'learning_rate': 4.6439264231852685e-06, 'epoch': 2.07} +2025-02-06 00:19:45 - ERROR - stderr - 69%|██████▉ | 15469/22434 [14:12:05<4:50:18, 2.50s/it] +2025-02-06 00:19:48 - ERROR - stderr - 69%|██████▉ | 15470/22434 [14:12:08<4:49:17, 2.49s/it] +2025-02-06 00:19:48 - ERROR - stderr - +2025-02-06 00:19:48 - ERROR - stderr - +2025-02-06 00:19:48 - INFO - stdout - {'loss': 0.3836, 'grad_norm': 1.5747367143630981, 'learning_rate': 4.642707282421538e-06, 'epoch': 2.07} +2025-02-06 00:19:48 - ERROR - stderr - 69%|██████▉ | 15470/22434 [14:12:08<4:49:17, 2.49s/it] +2025-02-06 00:19:50 - ERROR - stderr - 69%|██████▉ | 15471/22434 [14:12:10<4:49:12, 2.49s/it] +2025-02-06 00:19:50 - ERROR - stderr - +2025-02-06 00:19:50 - ERROR - stderr - +2025-02-06 00:19:50 - INFO - stdout - {'loss': 0.3465, 'grad_norm': 1.3540974855422974, 'learning_rate': 4.641488253325448e-06, 'epoch': 2.07} +2025-02-06 00:19:50 - ERROR - stderr - 69%|██████▉ | 15471/22434 [14:12:10<4:49:12, 2.49s/it] +2025-02-06 00:19:53 - ERROR - stderr - 69%|██████▉ | 15472/22434 [14:12:13<4:48:14, 2.48s/it] +2025-02-06 00:19:53 - ERROR - stderr - +2025-02-06 00:19:53 - ERROR - stderr - +2025-02-06 00:19:53 - INFO - stdout - {'loss': 0.3493, 'grad_norm': 1.3723469972610474, 'learning_rate': 4.6402693359224076e-06, 'epoch': 2.07} +2025-02-06 00:19:53 - ERROR - stderr - 69%|██████▉ | 15472/22434 [14:12:13<4:48:14, 2.48s/it] +2025-02-06 00:19:55 - ERROR - stderr - 69%|██████▉ | 15473/22434 [14:12:15<4:52:29, 2.52s/it] +2025-02-06 00:19:56 - ERROR - stderr - 
+2025-02-06 00:19:56 - ERROR - stderr - +2025-02-06 00:19:56 - INFO - stdout - {'loss': 0.352, 'grad_norm': 1.2425537109375, 'learning_rate': 4.639050530237824e-06, 'epoch': 2.07} +2025-02-06 00:19:56 - ERROR - stderr - 69%|██████▉ | 15473/22434 [14:12:15<4:52:29, 2.52s/it] +2025-02-06 00:19:58 - ERROR - stderr - 69%|██████▉ | 15474/22434 [14:12:18<4:52:02, 2.52s/it] +2025-02-06 00:19:58 - ERROR - stderr - +2025-02-06 00:19:58 - ERROR - stderr - +2025-02-06 00:19:58 - INFO - stdout - {'loss': 0.4277, 'grad_norm': 1.4034665822982788, 'learning_rate': 4.637831836297103e-06, 'epoch': 2.07} +2025-02-06 00:19:58 - ERROR - stderr - 69%|██████▉ | 15474/22434 [14:12:18<4:52:02, 2.52s/it] +2025-02-06 00:20:00 - ERROR - stderr - 69%|██████▉ | 15475/22434 [14:12:20<4:49:26, 2.50s/it] +2025-02-06 00:20:00 - ERROR - stderr - +2025-02-06 00:20:00 - ERROR - stderr - +2025-02-06 00:20:00 - INFO - stdout - {'loss': 0.3844, 'grad_norm': 1.4398534297943115, 'learning_rate': 4.636613254125646e-06, 'epoch': 2.07} +2025-02-06 00:20:00 - ERROR - stderr - 69%|██████▉ | 15475/22434 [14:12:20<4:49:26, 2.50s/it] +2025-02-06 00:20:03 - ERROR - stderr - 69%|██████▉ | 15476/22434 [14:12:23<4:51:14, 2.51s/it] +2025-02-06 00:20:03 - ERROR - stderr - +2025-02-06 00:20:03 - ERROR - stderr - +2025-02-06 00:20:03 - INFO - stdout - {'loss': 0.4555, 'grad_norm': 1.7757490873336792, 'learning_rate': 4.635394783748853e-06, 'epoch': 2.07} +2025-02-06 00:20:03 - ERROR - stderr - 69%|██████▉ | 15476/22434 [14:12:23<4:51:14, 2.51s/it] +2025-02-06 00:20:06 - ERROR - stderr - 69%|██████▉ | 15477/22434 [14:12:25<4:52:44, 2.52s/it] +2025-02-06 00:20:06 - ERROR - stderr - +2025-02-06 00:20:06 - ERROR - stderr - +2025-02-06 00:20:06 - INFO - stdout - {'loss': 0.3962, 'grad_norm': 1.5073790550231934, 'learning_rate': 4.634176425192123e-06, 'epoch': 2.07} +2025-02-06 00:20:06 - ERROR - stderr - 69%|██████▉ | 15477/22434 [14:12:25<4:52:44, 2.52s/it] +2025-02-06 00:20:08 - ERROR - stderr - 69%|██████▉ | 15478/22434 
[14:12:28<4:53:05, 2.53s/it] +2025-02-06 00:20:08 - ERROR - stderr - +2025-02-06 00:20:08 - ERROR - stderr - +2025-02-06 00:20:08 - INFO - stdout - {'loss': 0.3639, 'grad_norm': 1.4159979820251465, 'learning_rate': 4.632958178480854e-06, 'epoch': 2.07} +2025-02-06 00:20:08 - ERROR - stderr - 69%|██████▉ | 15478/22434 [14:12:28<4:53:05, 2.53s/it] +2025-02-06 00:20:11 - ERROR - stderr - 69%|██████▉ | 15479/22434 [14:12:31<4:58:24, 2.57s/it] +2025-02-06 00:20:11 - ERROR - stderr - +2025-02-06 00:20:11 - ERROR - stderr - +2025-02-06 00:20:11 - INFO - stdout - {'loss': 0.3332, 'grad_norm': 1.5473610162734985, 'learning_rate': 4.6317400436404295e-06, 'epoch': 2.07} +2025-02-06 00:20:11 - ERROR - stderr - 69%|██████▉ | 15479/22434 [14:12:31<4:58:24, 2.57s/it] +2025-02-06 00:20:13 - ERROR - stderr - 69%|██████▉ | 15480/22434 [14:12:33<4:55:26, 2.55s/it] +2025-02-06 00:20:13 - ERROR - stderr - +2025-02-06 00:20:13 - ERROR - stderr - +2025-02-06 00:20:13 - INFO - stdout - {'loss': 0.3856, 'grad_norm': 1.4713704586029053, 'learning_rate': 4.63052202069625e-06, 'epoch': 2.07} +2025-02-06 00:20:13 - ERROR - stderr - 69%|██████▉ | 15480/22434 [14:12:33<4:55:26, 2.55s/it] +2025-02-06 00:20:16 - ERROR - stderr - 69%|██████▉ | 15481/22434 [14:12:35<4:52:30, 2.52s/it] +2025-02-06 00:20:16 - ERROR - stderr - +2025-02-06 00:20:16 - ERROR - stderr - +2025-02-06 00:20:16 - INFO - stdout - {'loss': 0.3616, 'grad_norm': 1.3772010803222656, 'learning_rate': 4.629304109673705e-06, 'epoch': 2.07} +2025-02-06 00:20:16 - ERROR - stderr - 69%|██████▉ | 15481/22434 [14:12:36<4:52:30, 2.52s/it] +2025-02-06 00:20:18 - ERROR - stderr - 69%|██████▉ | 15482/22434 [14:12:38<4:52:53, 2.53s/it] +2025-02-06 00:20:18 - ERROR - stderr - +2025-02-06 00:20:18 - ERROR - stderr - +2025-02-06 00:20:18 - INFO - stdout - {'loss': 0.3799, 'grad_norm': 1.3994207382202148, 'learning_rate': 4.628086310598169e-06, 'epoch': 2.07} +2025-02-06 00:20:18 - ERROR - stderr - 69%|██████▉ | 15482/22434 [14:12:38<4:52:53, 
2.53s/it] +2025-02-06 00:20:21 - ERROR - stderr - 69%|██████▉ | 15483/22434 [14:12:41<4:57:53, 2.57s/it] +2025-02-06 00:20:21 - ERROR - stderr - +2025-02-06 00:20:21 - ERROR - stderr - +2025-02-06 00:20:21 - INFO - stdout - {'loss': 0.3902, 'grad_norm': 1.4276278018951416, 'learning_rate': 4.62686862349504e-06, 'epoch': 2.07} +2025-02-06 00:20:21 - ERROR - stderr - 69%|██████▉ | 15483/22434 [14:12:41<4:57:53, 2.57s/it] +2025-02-06 00:20:23 - ERROR - stderr - 69%|██████▉ | 15484/22434 [14:12:43<4:53:45, 2.54s/it] +2025-02-06 00:20:23 - ERROR - stderr - +2025-02-06 00:20:23 - ERROR - stderr - +2025-02-06 00:20:23 - INFO - stdout - {'loss': 0.3681, 'grad_norm': 1.5051707029342651, 'learning_rate': 4.625651048389687e-06, 'epoch': 2.07} +2025-02-06 00:20:23 - ERROR - stderr - 69%|██████▉ | 15484/22434 [14:12:43<4:53:45, 2.54s/it] +2025-02-06 00:20:26 - ERROR - stderr - 69%|██████▉ | 15485/22434 [14:12:46<4:49:43, 2.50s/it] +2025-02-06 00:20:26 - ERROR - stderr - +2025-02-06 00:20:26 - ERROR - stderr - +2025-02-06 00:20:26 - INFO - stdout - {'loss': 0.3946, 'grad_norm': 1.5453317165374756, 'learning_rate': 4.624433585307502e-06, 'epoch': 2.07} +2025-02-06 00:20:26 - ERROR - stderr - 69%|██████▉ | 15485/22434 [14:12:46<4:49:43, 2.50s/it] +2025-02-06 00:20:28 - ERROR - stderr - 69%|██████▉ | 15486/22434 [14:12:48<4:53:36, 2.54s/it] +2025-02-06 00:20:28 - ERROR - stderr - +2025-02-06 00:20:28 - ERROR - stderr - +2025-02-06 00:20:28 - INFO - stdout - {'loss': 0.391, 'grad_norm': 1.5410877466201782, 'learning_rate': 4.623216234273852e-06, 'epoch': 2.07} +2025-02-06 00:20:28 - ERROR - stderr - 69%|██████▉ | 15486/22434 [14:12:48<4:53:36, 2.54s/it] +2025-02-06 00:20:31 - ERROR - stderr - 69%|██████▉ | 15487/22434 [14:12:51<4:53:29, 2.53s/it] +2025-02-06 00:20:31 - ERROR - stderr - +2025-02-06 00:20:31 - ERROR - stderr - +2025-02-06 00:20:31 - INFO - stdout - {'loss': 0.4128, 'grad_norm': 1.4690775871276855, 'learning_rate': 4.62199899531411e-06, 'epoch': 2.07} +2025-02-06 
00:20:31 - ERROR - stderr - 69%|██████▉ | 15487/22434 [14:12:51<4:53:29, 2.53s/it] +2025-02-06 00:20:33 - ERROR - stderr - 69%|██████▉ | 15488/22434 [14:12:53<4:50:19, 2.51s/it] +2025-02-06 00:20:33 - ERROR - stderr - +2025-02-06 00:20:33 - ERROR - stderr - +2025-02-06 00:20:33 - INFO - stdout - {'loss': 0.3263, 'grad_norm': 1.3076168298721313, 'learning_rate': 4.62078186845366e-06, 'epoch': 2.07} +2025-02-06 00:20:33 - ERROR - stderr - 69%|██████▉ | 15488/22434 [14:12:53<4:50:19, 2.51s/it] +2025-02-06 00:20:36 - ERROR - stderr - 69%|██████▉ | 15489/22434 [14:12:56<4:48:08, 2.49s/it] +2025-02-06 00:20:36 - ERROR - stderr - +2025-02-06 00:20:36 - ERROR - stderr - +2025-02-06 00:20:36 - INFO - stdout - {'loss': 0.4139, 'grad_norm': 1.4047549962997437, 'learning_rate': 4.619564853717861e-06, 'epoch': 2.07} +2025-02-06 00:20:36 - ERROR - stderr - 69%|██████▉ | 15489/22434 [14:12:56<4:48:08, 2.49s/it] +2025-02-06 00:20:39 - ERROR - stderr - 69%|██████▉ | 15490/22434 [14:12:58<4:56:39, 2.56s/it] +2025-02-06 00:20:39 - ERROR - stderr - +2025-02-06 00:20:39 - ERROR - stderr - +2025-02-06 00:20:39 - INFO - stdout - {'loss': 0.454, 'grad_norm': 1.686294436454773, 'learning_rate': 4.618347951132085e-06, 'epoch': 2.07} +2025-02-06 00:20:39 - ERROR - stderr - 69%|██████▉ | 15490/22434 [14:12:58<4:56:39, 2.56s/it] +2025-02-06 00:20:41 - ERROR - stderr - 69%|██████▉ | 15491/22434 [14:13:01<4:53:34, 2.54s/it] +2025-02-06 00:20:41 - ERROR - stderr - +2025-02-06 00:20:41 - ERROR - stderr - +2025-02-06 00:20:41 - INFO - stdout - {'loss': 0.4187, 'grad_norm': 1.4207226037979126, 'learning_rate': 4.617131160721696e-06, 'epoch': 2.07} +2025-02-06 00:20:41 - ERROR - stderr - 69%|██████▉ | 15491/22434 [14:13:01<4:53:34, 2.54s/it] +2025-02-06 00:20:44 - ERROR - stderr - 69%|██████▉ | 15492/22434 [14:13:03<4:52:08, 2.53s/it] +2025-02-06 00:20:44 - ERROR - stderr - +2025-02-06 00:20:44 - ERROR - stderr - +2025-02-06 00:20:44 - INFO - stdout - {'loss': 0.391, 'grad_norm': 1.4855901002883911, 
'learning_rate': 4.615914482512056e-06, 'epoch': 2.07} +2025-02-06 00:20:44 - ERROR - stderr - 69%|██████▉ | 15492/22434 [14:13:03<4:52:08, 2.53s/it] +2025-02-06 00:20:46 - ERROR - stderr - 69%|██████▉ | 15493/22434 [14:13:06<4:49:17, 2.50s/it] +2025-02-06 00:20:46 - ERROR - stderr - +2025-02-06 00:20:46 - ERROR - stderr - +2025-02-06 00:20:46 - INFO - stdout - {'loss': 0.4107, 'grad_norm': 1.712656021118164, 'learning_rate': 4.614697916528528e-06, 'epoch': 2.07} +2025-02-06 00:20:46 - ERROR - stderr - 69%|██████▉ | 15493/22434 [14:13:06<4:49:17, 2.50s/it] +2025-02-06 00:20:48 - ERROR - stderr - 69%|██████▉ | 15494/22434 [14:13:08<4:47:15, 2.48s/it] +2025-02-06 00:20:48 - ERROR - stderr - +2025-02-06 00:20:48 - ERROR - stderr - +2025-02-06 00:20:48 - INFO - stdout - {'loss': 0.385, 'grad_norm': 1.6097272634506226, 'learning_rate': 4.613481462796468e-06, 'epoch': 2.07} +2025-02-06 00:20:48 - ERROR - stderr - 69%|██████▉ | 15494/22434 [14:13:08<4:47:15, 2.48s/it] +2025-02-06 00:20:51 - ERROR - stderr - 69%|██████▉ | 15495/22434 [14:13:11<4:45:49, 2.47s/it] +2025-02-06 00:20:51 - ERROR - stderr - +2025-02-06 00:20:51 - ERROR - stderr - +2025-02-06 00:20:51 - INFO - stdout - {'loss': 0.3336, 'grad_norm': 1.4192359447479248, 'learning_rate': 4.612265121341233e-06, 'epoch': 2.07} +2025-02-06 00:20:51 - ERROR - stderr - 69%|██████▉ | 15495/22434 [14:13:11<4:45:49, 2.47s/it] +2025-02-06 00:20:53 - ERROR - stderr - 69%|██████▉ | 15496/22434 [14:13:13<4:44:52, 2.46s/it] +2025-02-06 00:20:53 - ERROR - stderr - +2025-02-06 00:20:53 - ERROR - stderr - +2025-02-06 00:20:53 - INFO - stdout - {'loss': 0.3494, 'grad_norm': 1.4438153505325317, 'learning_rate': 4.6110488921881755e-06, 'epoch': 2.07} +2025-02-06 00:20:53 - ERROR - stderr - 69%|██████▉ | 15496/22434 [14:13:13<4:44:52, 2.46s/it] +2025-02-06 00:20:56 - ERROR - stderr - 69%|██████▉ | 15497/22434 [14:13:16<4:46:25, 2.48s/it] +2025-02-06 00:20:56 - ERROR - stderr - +2025-02-06 00:20:56 - ERROR - stderr - +2025-02-06 
00:20:56 - INFO - stdout - {'loss': 0.4079, 'grad_norm': 1.5269197225570679, 'learning_rate': 4.6098327753626515e-06, 'epoch': 2.07} +2025-02-06 00:20:56 - ERROR - stderr - 69%|██████▉ | 15497/22434 [14:13:16<4:46:25, 2.48s/it] +2025-02-06 00:20:58 - ERROR - stderr - 69%|██████▉ | 15498/22434 [14:13:18<4:45:25, 2.47s/it] +2025-02-06 00:20:58 - ERROR - stderr - +2025-02-06 00:20:58 - ERROR - stderr - +2025-02-06 00:20:58 - INFO - stdout - {'loss': 0.3555, 'grad_norm': 1.3769514560699463, 'learning_rate': 4.608616770889998e-06, 'epoch': 2.07} +2025-02-06 00:20:58 - ERROR - stderr - 69%|██████▉ | 15498/22434 [14:13:18<4:45:25, 2.47s/it] +2025-02-06 00:21:01 - ERROR - stderr - 69%|██████▉ | 15499/22434 [14:13:21<4:47:59, 2.49s/it] +2025-02-06 00:21:01 - ERROR - stderr - +2025-02-06 00:21:01 - ERROR - stderr - +2025-02-06 00:21:01 - INFO - stdout - {'loss': 0.3608, 'grad_norm': 1.4006940126419067, 'learning_rate': 4.6074008787955725e-06, 'epoch': 2.07} +2025-02-06 00:21:01 - ERROR - stderr - 69%|██████▉ | 15499/22434 [14:13:21<4:47:59, 2.49s/it] +2025-02-06 00:21:04 - ERROR - stderr - 69%|██████▉ | 15500/22434 [14:13:23<4:56:42, 2.57s/it] +2025-02-06 00:21:04 - ERROR - stderr - +2025-02-06 00:21:04 - ERROR - stderr - +2025-02-06 00:21:04 - INFO - stdout - {'loss': 0.3772, 'grad_norm': 1.4371217489242554, 'learning_rate': 4.606185099104716e-06, 'epoch': 2.07} +2025-02-06 00:21:04 - ERROR - stderr - 69%|██████▉ | 15500/22434 [14:13:23<4:56:42, 2.57s/it] +2025-02-06 00:21:06 - ERROR - stderr - 69%|██████▉ | 15501/22434 [14:13:26<4:52:32, 2.53s/it] +2025-02-06 00:21:06 - ERROR - stderr - +2025-02-06 00:21:06 - ERROR - stderr - +2025-02-06 00:21:06 - INFO - stdout - {'loss': 0.3816, 'grad_norm': 1.6114728450775146, 'learning_rate': 4.604969431842769e-06, 'epoch': 2.07} +2025-02-06 00:21:06 - ERROR - stderr - 69%|██████▉ | 15501/22434 [14:13:26<4:52:32, 2.53s/it] +2025-02-06 00:21:08 - ERROR - stderr - 69%|██████▉ | 15502/22434 [14:13:28<4:49:16, 2.50s/it] +2025-02-06 
00:21:08 - ERROR - stderr - +2025-02-06 00:21:08 - ERROR - stderr - +2025-02-06 00:21:08 - INFO - stdout - {'loss': 0.389, 'grad_norm': 1.3759255409240723, 'learning_rate': 4.603753877035075e-06, 'epoch': 2.07} +2025-02-06 00:21:08 - ERROR - stderr - 69%|██████▉ | 15502/22434 [14:13:28<4:49:16, 2.50s/it] +2025-02-06 00:21:11 - ERROR - stderr - 69%|██████▉ | 15503/22434 [14:13:31<4:48:36, 2.50s/it] +2025-02-06 00:21:11 - ERROR - stderr - +2025-02-06 00:21:11 - ERROR - stderr - +2025-02-06 00:21:11 - INFO - stdout - {'loss': 0.368, 'grad_norm': 1.2989004850387573, 'learning_rate': 4.6025384347069615e-06, 'epoch': 2.07} +2025-02-06 00:21:11 - ERROR - stderr - 69%|██████▉ | 15503/22434 [14:13:31<4:48:36, 2.50s/it] +2025-02-06 00:21:14 - ERROR - stderr - 69%|██████▉ | 15504/22434 [14:13:33<4:56:43, 2.57s/it] +2025-02-06 00:21:14 - ERROR - stderr - +2025-02-06 00:21:14 - ERROR - stderr - +2025-02-06 00:21:14 - INFO - stdout - {'loss': 0.418, 'grad_norm': 1.4718331098556519, 'learning_rate': 4.601323104883776e-06, 'epoch': 2.07} +2025-02-06 00:21:14 - ERROR - stderr - 69%|██████▉ | 15504/22434 [14:13:33<4:56:43, 2.57s/it] +2025-02-06 00:21:16 - ERROR - stderr - 69%|██████▉ | 15505/22434 [14:13:36<4:54:00, 2.55s/it] +2025-02-06 00:21:16 - ERROR - stderr - +2025-02-06 00:21:16 - ERROR - stderr - +2025-02-06 00:21:16 - INFO - stdout - {'loss': 0.432, 'grad_norm': 1.5887699127197266, 'learning_rate': 4.600107887590841e-06, 'epoch': 2.07} +2025-02-06 00:21:16 - ERROR - stderr - 69%|██████▉ | 15505/22434 [14:13:36<4:54:00, 2.55s/it] +2025-02-06 00:21:19 - ERROR - stderr - 69%|██████▉ | 15506/22434 [14:13:38<4:51:42, 2.53s/it] +2025-02-06 00:21:19 - ERROR - stderr - +2025-02-06 00:21:19 - ERROR - stderr - +2025-02-06 00:21:19 - INFO - stdout - {'loss': 0.3642, 'grad_norm': 1.3867260217666626, 'learning_rate': 4.598892782853487e-06, 'epoch': 2.07} +2025-02-06 00:21:19 - ERROR - stderr - 69%|██████▉ | 15506/22434 [14:13:38<4:51:42, 2.53s/it] +2025-02-06 00:21:21 - ERROR - stderr - 
69%|██████▉ | 15507/22434 [14:13:41<4:48:53, 2.50s/it] +2025-02-06 00:21:21 - ERROR - stderr - +2025-02-06 00:21:21 - ERROR - stderr - +2025-02-06 00:21:21 - INFO - stdout - {'loss': 0.4023, 'grad_norm': 1.6015316247940063, 'learning_rate': 4.597677790697051e-06, 'epoch': 2.07} +2025-02-06 00:21:21 - ERROR - stderr - 69%|██████▉ | 15507/22434 [14:13:41<4:48:53, 2.50s/it] +2025-02-06 00:21:23 - ERROR - stderr - 69%|██████▉ | 15508/22434 [14:13:43<4:46:00, 2.48s/it] +2025-02-06 00:21:24 - ERROR - stderr - +2025-02-06 00:21:24 - ERROR - stderr - +2025-02-06 00:21:24 - INFO - stdout - {'loss': 0.4079, 'grad_norm': 1.3856736421585083, 'learning_rate': 4.596462911146845e-06, 'epoch': 2.07} +2025-02-06 00:21:24 - ERROR - stderr - 69%|██████▉ | 15508/22434 [14:13:43<4:46:00, 2.48s/it] +2025-02-06 00:21:26 - ERROR - stderr - 69%|██████▉ | 15509/22434 [14:13:46<4:46:38, 2.48s/it] +2025-02-06 00:21:26 - ERROR - stderr - +2025-02-06 00:21:26 - ERROR - stderr - +2025-02-06 00:21:26 - INFO - stdout - {'loss': 0.4067, 'grad_norm': 1.439468502998352, 'learning_rate': 4.595248144228206e-06, 'epoch': 2.07} +2025-02-06 00:21:26 - ERROR - stderr - 69%|██████▉ | 15509/22434 [14:13:46<4:46:38, 2.48s/it] +2025-02-06 00:21:29 - ERROR - stderr - 69%|██████▉ | 15510/22434 [14:13:48<4:53:24, 2.54s/it] +2025-02-06 00:21:29 - ERROR - stderr - +2025-02-06 00:21:29 - ERROR - stderr - +2025-02-06 00:21:29 - INFO - stdout - {'loss': 0.4147, 'grad_norm': 1.6168230772018433, 'learning_rate': 4.594033489966444e-06, 'epoch': 2.07} +2025-02-06 00:21:29 - ERROR - stderr - 69%|██████▉ | 15510/22434 [14:13:48<4:53:24, 2.54s/it] +2025-02-06 00:21:31 - ERROR - stderr - 69%|██████▉ | 15511/22434 [14:13:51<4:56:10, 2.57s/it] +2025-02-06 00:21:31 - ERROR - stderr - +2025-02-06 00:21:31 - ERROR - stderr - +2025-02-06 00:21:31 - INFO - stdout - {'loss': 0.3979, 'grad_norm': 1.4911526441574097, 'learning_rate': 4.592818948386882e-06, 'epoch': 2.07} +2025-02-06 00:21:31 - ERROR - stderr - 69%|██████▉ | 15511/22434 
[14:13:51<4:56:10, 2.57s/it] +2025-02-06 00:21:34 - ERROR - stderr - 69%|██████▉ | 15512/22434 [14:13:54<4:52:56, 2.54s/it] +2025-02-06 00:21:34 - ERROR - stderr - +2025-02-06 00:21:34 - ERROR - stderr - +2025-02-06 00:21:34 - INFO - stdout - {'loss': 0.3672, 'grad_norm': 1.4144484996795654, 'learning_rate': 4.591604519514834e-06, 'epoch': 2.07} +2025-02-06 00:21:34 - ERROR - stderr - 69%|██████▉ | 15512/22434 [14:13:54<4:52:56, 2.54s/it] +2025-02-06 00:21:36 - ERROR - stderr - 69%|██████▉ | 15513/22434 [14:13:56<4:50:15, 2.52s/it] +2025-02-06 00:21:36 - ERROR - stderr - +2025-02-06 00:21:36 - ERROR - stderr - +2025-02-06 00:21:36 - INFO - stdout - {'loss': 0.378, 'grad_norm': 1.4156376123428345, 'learning_rate': 4.5903902033756145e-06, 'epoch': 2.07} +2025-02-06 00:21:36 - ERROR - stderr - 69%|██████▉ | 15513/22434 [14:13:56<4:50:15, 2.52s/it] +2025-02-06 00:21:39 - ERROR - stderr - 69%|██████▉ | 15514/22434 [14:13:59<4:51:06, 2.52s/it] +2025-02-06 00:21:39 - ERROR - stderr - +2025-02-06 00:21:39 - ERROR - stderr - +2025-02-06 00:21:39 - INFO - stdout - {'loss': 0.4055, 'grad_norm': 1.3800048828125, 'learning_rate': 4.589175999994535e-06, 'epoch': 2.07} +2025-02-06 00:21:39 - ERROR - stderr - 69%|██████▉ | 15514/22434 [14:13:59<4:51:06, 2.52s/it] +2025-02-06 00:21:41 - ERROR - stderr - 69%|██████▉ | 15515/22434 [14:14:01<4:49:08, 2.51s/it] +2025-02-06 00:21:41 - ERROR - stderr - +2025-02-06 00:21:41 - ERROR - stderr - +2025-02-06 00:21:41 - INFO - stdout - {'loss': 0.3754, 'grad_norm': 1.5969502925872803, 'learning_rate': 4.587961909396904e-06, 'epoch': 2.07} +2025-02-06 00:21:41 - ERROR - stderr - 69%|██████▉ | 15515/22434 [14:14:01<4:49:08, 2.51s/it] +2025-02-06 00:21:44 - ERROR - stderr - 69%|██████▉ | 15516/22434 [14:14:04<4:50:54, 2.52s/it] +2025-02-06 00:21:44 - ERROR - stderr - +2025-02-06 00:21:44 - ERROR - stderr - +2025-02-06 00:21:44 - INFO - stdout - {'loss': 0.3701, 'grad_norm': 1.4718846082687378, 'learning_rate': 4.586747931608029e-06, 'epoch': 
2.07} +2025-02-06 00:21:44 - ERROR - stderr - 69%|██████▉ | 15516/22434 [14:14:04<4:50:54, 2.52s/it] +2025-02-06 00:21:46 - ERROR - stderr - 69%|██████▉ | 15517/22434 [14:14:06<4:49:06, 2.51s/it] +2025-02-06 00:21:46 - ERROR - stderr - +2025-02-06 00:21:46 - ERROR - stderr - +2025-02-06 00:21:46 - INFO - stdout - {'loss': 0.3939, 'grad_norm': 1.396140217781067, 'learning_rate': 4.585534066653212e-06, 'epoch': 2.08} +2025-02-06 00:21:46 - ERROR - stderr - 69%|██████▉ | 15517/22434 [14:14:06<4:49:06, 2.51s/it] +2025-02-06 00:21:49 - ERROR - stderr - 69%|██████▉ | 15518/22434 [14:14:09<4:52:15, 2.54s/it] +2025-02-06 00:21:49 - ERROR - stderr - +2025-02-06 00:21:49 - ERROR - stderr - +2025-02-06 00:21:49 - INFO - stdout - {'loss': 0.3096, 'grad_norm': 1.3933732509613037, 'learning_rate': 4.584320314557758e-06, 'epoch': 2.08} +2025-02-06 00:21:49 - ERROR - stderr - 69%|██████▉ | 15518/22434 [14:14:09<4:52:15, 2.54s/it] +2025-02-06 00:21:51 - ERROR - stderr - 69%|██████▉ | 15519/22434 [14:14:11<4:51:00, 2.52s/it] +2025-02-06 00:21:51 - ERROR - stderr - +2025-02-06 00:21:51 - ERROR - stderr - +2025-02-06 00:21:51 - INFO - stdout - {'loss': 0.4378, 'grad_norm': 1.5565464496612549, 'learning_rate': 4.583106675346964e-06, 'epoch': 2.08} +2025-02-06 00:21:51 - ERROR - stderr - 69%|██████▉ | 15519/22434 [14:14:11<4:51:00, 2.52s/it] +2025-02-06 00:21:54 - ERROR - stderr - 69%|██████▉ | 15520/22434 [14:14:14<4:48:49, 2.51s/it] +2025-02-06 00:21:54 - ERROR - stderr - +2025-02-06 00:21:54 - ERROR - stderr - +2025-02-06 00:21:54 - INFO - stdout - {'loss': 0.3546, 'grad_norm': 1.356546401977539, 'learning_rate': 4.581893149046128e-06, 'epoch': 2.08} +2025-02-06 00:21:54 - ERROR - stderr - 69%|██████▉ | 15520/22434 [14:14:14<4:48:49, 2.51s/it] +2025-02-06 00:21:57 - ERROR - stderr - 69%|██████▉ | 15521/22434 [14:14:16<4:56:47, 2.58s/it] +2025-02-06 00:21:57 - ERROR - stderr - +2025-02-06 00:21:57 - ERROR - stderr - +2025-02-06 00:21:57 - INFO - stdout - {'loss': 0.3828, 'grad_norm': 
1.4793492555618286, 'learning_rate': 4.580679735680548e-06, 'epoch': 2.08} +2025-02-06 00:21:57 - ERROR - stderr - 69%|██████▉ | 15521/22434 [14:14:16<4:56:47, 2.58s/it] +2025-02-06 00:21:59 - ERROR - stderr - 69%|██████▉ | 15522/22434 [14:14:19<4:51:26, 2.53s/it] +2025-02-06 00:21:59 - ERROR - stderr - +2025-02-06 00:21:59 - ERROR - stderr - +2025-02-06 00:21:59 - INFO - stdout - {'loss': 0.3432, 'grad_norm': 1.3967177867889404, 'learning_rate': 4.579466435275506e-06, 'epoch': 2.08} +2025-02-06 00:21:59 - ERROR - stderr - 69%|██████▉ | 15522/22434 [14:14:19<4:51:26, 2.53s/it] +2025-02-06 00:22:02 - ERROR - stderr - 69%|██████▉ | 15523/22434 [14:14:21<4:53:04, 2.54s/it] +2025-02-06 00:22:02 - ERROR - stderr - +2025-02-06 00:22:02 - ERROR - stderr - +2025-02-06 00:22:02 - INFO - stdout - {'loss': 0.383, 'grad_norm': 1.6533570289611816, 'learning_rate': 4.5782532478563065e-06, 'epoch': 2.08} +2025-02-06 00:22:02 - ERROR - stderr - 69%|██████▉ | 15523/22434 [14:14:21<4:53:04, 2.54s/it] +2025-02-06 00:22:04 - ERROR - stderr - 69%|██████▉ | 15524/22434 [14:14:24<4:50:00, 2.52s/it] +2025-02-06 00:22:04 - ERROR - stderr - +2025-02-06 00:22:04 - ERROR - stderr - +2025-02-06 00:22:04 - INFO - stdout - {'loss': 0.387, 'grad_norm': 1.3016936779022217, 'learning_rate': 4.577040173448224e-06, 'epoch': 2.08} +2025-02-06 00:22:04 - ERROR - stderr - 69%|██████▉ | 15524/22434 [14:14:24<4:50:00, 2.52s/it] +2025-02-06 00:22:06 - ERROR - stderr - 69%|██████▉ | 15525/22434 [14:14:26<4:46:37, 2.49s/it] +2025-02-06 00:22:07 - ERROR - stderr - +2025-02-06 00:22:07 - ERROR - stderr - +2025-02-06 00:22:07 - INFO - stdout - {'loss': 0.407, 'grad_norm': 1.3832767009735107, 'learning_rate': 4.575827212076553e-06, 'epoch': 2.08} +2025-02-06 00:22:07 - ERROR - stderr - 69%|██████▉ | 15525/22434 [14:14:26<4:46:37, 2.49s/it] +2025-02-06 00:22:09 - ERROR - stderr - 69%|██████▉ | 15526/22434 [14:14:29<4:47:06, 2.49s/it] +2025-02-06 00:22:09 - ERROR - stderr - +2025-02-06 00:22:09 - ERROR - stderr - 
+2025-02-06 00:22:09 - INFO - stdout - {'loss': 0.3565, 'grad_norm': 1.5316741466522217, 'learning_rate': 4.574614363766575e-06, 'epoch': 2.08} +2025-02-06 00:22:09 - ERROR - stderr - 69%|██████▉ | 15526/22434 [14:14:29<4:47:06, 2.49s/it] +2025-02-06 00:22:11 - ERROR - stderr - 69%|██████▉ | 15527/22434 [14:14:31<4:47:16, 2.50s/it] +2025-02-06 00:22:12 - ERROR - stderr - +2025-02-06 00:22:12 - ERROR - stderr - +2025-02-06 00:22:12 - INFO - stdout - {'loss': 0.4005, 'grad_norm': 1.793339490890503, 'learning_rate': 4.573401628543564e-06, 'epoch': 2.08} +2025-02-06 00:22:12 - ERROR - stderr - 69%|██████▉ | 15527/22434 [14:14:31<4:47:16, 2.50s/it] +2025-02-06 00:22:14 - ERROR - stderr - 69%|██████▉ | 15528/22434 [14:14:34<4:46:58, 2.49s/it] +2025-02-06 00:22:14 - ERROR - stderr - +2025-02-06 00:22:14 - ERROR - stderr - +2025-02-06 00:22:14 - INFO - stdout - {'loss': 0.3456, 'grad_norm': 1.4098761081695557, 'learning_rate': 4.57218900643281e-06, 'epoch': 2.08} +2025-02-06 00:22:14 - ERROR - stderr - 69%|██████▉ | 15528/22434 [14:14:34<4:46:58, 2.49s/it] +2025-02-06 00:22:16 - ERROR - stderr - 69%|██████▉ | 15529/22434 [14:14:36<4:45:04, 2.48s/it] +2025-02-06 00:22:16 - ERROR - stderr - +2025-02-06 00:22:16 - ERROR - stderr - +2025-02-06 00:22:16 - INFO - stdout - {'loss': 0.4269, 'grad_norm': 1.622809886932373, 'learning_rate': 4.570976497459579e-06, 'epoch': 2.08} +2025-02-06 00:22:16 - ERROR - stderr - 69%|██████▉ | 15529/22434 [14:14:36<4:45:04, 2.48s/it] +2025-02-06 00:22:19 - ERROR - stderr - 69%|██████▉ | 15530/22434 [14:14:39<4:46:47, 2.49s/it] +2025-02-06 00:22:19 - ERROR - stderr - +2025-02-06 00:22:19 - ERROR - stderr - +2025-02-06 00:22:19 - INFO - stdout - {'loss': 0.4211, 'grad_norm': 1.3706339597702026, 'learning_rate': 4.5697641016491465e-06, 'epoch': 2.08} +2025-02-06 00:22:19 - ERROR - stderr - 69%|██████▉ | 15530/22434 [14:14:39<4:46:47, 2.49s/it] +2025-02-06 00:22:21 - ERROR - stderr - 69%|██████▉ | 15531/22434 [14:14:41<4:47:20, 2.50s/it] +2025-02-06 
00:22:21 - ERROR - stderr - +2025-02-06 00:22:21 - ERROR - stderr - +2025-02-06 00:22:21 - INFO - stdout - {'loss': 0.4087, 'grad_norm': 1.5597443580627441, 'learning_rate': 4.568551819026786e-06, 'epoch': 2.08} +2025-02-06 00:22:21 - ERROR - stderr - 69%|██████▉ | 15531/22434 [14:14:41<4:47:20, 2.50s/it] +2025-02-06 00:22:24 - ERROR - stderr - 69%|██████▉ | 15532/22434 [14:14:44<4:46:24, 2.49s/it] +2025-02-06 00:22:24 - ERROR - stderr - +2025-02-06 00:22:24 - ERROR - stderr - +2025-02-06 00:22:24 - INFO - stdout - {'loss': 0.3465, 'grad_norm': 1.3971006870269775, 'learning_rate': 4.567339649617763e-06, 'epoch': 2.08} +2025-02-06 00:22:24 - ERROR - stderr - 69%|██████▉ | 15532/22434 [14:14:44<4:46:24, 2.49s/it] +2025-02-06 00:22:26 - ERROR - stderr - 69%|██████▉ | 15533/22434 [14:14:46<4:45:33, 2.48s/it] +2025-02-06 00:22:26 - ERROR - stderr - +2025-02-06 00:22:26 - ERROR - stderr - +2025-02-06 00:22:26 - INFO - stdout - {'loss': 0.365, 'grad_norm': 1.444258689880371, 'learning_rate': 4.566127593447353e-06, 'epoch': 2.08} +2025-02-06 00:22:26 - ERROR - stderr - 69%|██████▉ | 15533/22434 [14:14:46<4:45:33, 2.48s/it] +2025-02-06 00:22:29 - ERROR - stderr - 69%|██████▉ | 15534/22434 [14:14:49<4:48:36, 2.51s/it] +2025-02-06 00:22:29 - ERROR - stderr - +2025-02-06 00:22:29 - ERROR - stderr - +2025-02-06 00:22:29 - INFO - stdout - {'loss': 0.4015, 'grad_norm': 1.5460172891616821, 'learning_rate': 4.5649156505408084e-06, 'epoch': 2.08} +2025-02-06 00:22:29 - ERROR - stderr - 69%|██████▉ | 15534/22434 [14:14:49<4:48:36, 2.51s/it] +2025-02-06 00:22:32 - ERROR - stderr - 69%|██████▉ | 15535/22434 [14:14:51<4:50:49, 2.53s/it] +2025-02-06 00:22:32 - ERROR - stderr - +2025-02-06 00:22:32 - ERROR - stderr - +2025-02-06 00:22:32 - INFO - stdout - {'loss': 0.3839, 'grad_norm': 1.307062029838562, 'learning_rate': 4.563703820923399e-06, 'epoch': 2.08} +2025-02-06 00:22:32 - ERROR - stderr - 69%|██████▉ | 15535/22434 [14:14:51<4:50:49, 2.53s/it] +2025-02-06 00:22:34 - ERROR - stderr 
- 69%|██████▉ | 15536/22434 [14:14:54<4:47:42, 2.50s/it] +2025-02-06 00:22:34 - ERROR - stderr - +2025-02-06 00:22:34 - ERROR - stderr - +2025-02-06 00:22:34 - INFO - stdout - {'loss': 0.377, 'grad_norm': 1.4445074796676636, 'learning_rate': 4.56249210462038e-06, 'epoch': 2.08} +2025-02-06 00:22:34 - ERROR - stderr - 69%|██████▉ | 15536/22434 [14:14:54<4:47:42, 2.50s/it] +2025-02-06 00:22:36 - ERROR - stderr - 69%|██████▉ | 15537/22434 [14:14:56<4:48:29, 2.51s/it] +2025-02-06 00:22:37 - ERROR - stderr - +2025-02-06 00:22:37 - ERROR - stderr - +2025-02-06 00:22:37 - INFO - stdout - {'loss': 0.4017, 'grad_norm': 1.470812201499939, 'learning_rate': 4.56128050165701e-06, 'epoch': 2.08} +2025-02-06 00:22:37 - ERROR - stderr - 69%|██████▉ | 15537/22434 [14:14:56<4:48:29, 2.51s/it] +2025-02-06 00:22:39 - ERROR - stderr - 69%|██████▉ | 15538/22434 [14:14:59<4:48:45, 2.51s/it] +2025-02-06 00:22:39 - ERROR - stderr - +2025-02-06 00:22:39 - ERROR - stderr - +2025-02-06 00:22:39 - INFO - stdout - {'loss': 0.4301, 'grad_norm': 1.636654019355774, 'learning_rate': 4.560069012058543e-06, 'epoch': 2.08} +2025-02-06 00:22:39 - ERROR - stderr - 69%|██████▉ | 15538/22434 [14:14:59<4:48:45, 2.51s/it] +2025-02-06 00:22:42 - ERROR - stderr - 69%|██████▉ | 15539/22434 [14:15:01<4:49:06, 2.52s/it] +2025-02-06 00:22:42 - ERROR - stderr - +2025-02-06 00:22:42 - ERROR - stderr - +2025-02-06 00:22:42 - INFO - stdout - {'loss': 0.3584, 'grad_norm': 1.4901503324508667, 'learning_rate': 4.558857635850233e-06, 'epoch': 2.08} +2025-02-06 00:22:42 - ERROR - stderr - 69%|██████▉ | 15539/22434 [14:15:01<4:49:06, 2.52s/it] +2025-02-06 00:22:44 - ERROR - stderr - 69%|██████▉ | 15540/22434 [14:15:04<4:47:39, 2.50s/it] +2025-02-06 00:22:44 - ERROR - stderr - +2025-02-06 00:22:44 - ERROR - stderr - +2025-02-06 00:22:44 - INFO - stdout - {'loss': 0.3564, 'grad_norm': 1.41392183303833, 'learning_rate': 4.557646373057329e-06, 'epoch': 2.08} +2025-02-06 00:22:44 - ERROR - stderr - 69%|██████▉ | 15540/22434 
[14:15:04<4:47:39, 2.50s/it] +2025-02-06 00:22:46 - ERROR - stderr - 69%|██████▉ | 15541/22434 [14:15:06<4:44:54, 2.48s/it] +2025-02-06 00:22:46 - ERROR - stderr - +2025-02-06 00:22:46 - ERROR - stderr - +2025-02-06 00:22:46 - INFO - stdout - {'loss': 0.3758, 'grad_norm': 1.3927148580551147, 'learning_rate': 4.556435223705078e-06, 'epoch': 2.08} +2025-02-06 00:22:46 - ERROR - stderr - 69%|██████▉ | 15541/22434 [14:15:06<4:44:54, 2.48s/it] +2025-02-06 00:22:49 - ERROR - stderr - 69%|██████▉ | 15542/22434 [14:15:09<4:44:43, 2.48s/it] +2025-02-06 00:22:49 - ERROR - stderr - +2025-02-06 00:22:49 - ERROR - stderr - +2025-02-06 00:22:49 - INFO - stdout - {'loss': 0.3916, 'grad_norm': 1.6424471139907837, 'learning_rate': 4.55522418781873e-06, 'epoch': 2.08} +2025-02-06 00:22:49 - ERROR - stderr - 69%|██████▉ | 15542/22434 [14:15:09<4:44:43, 2.48s/it] +2025-02-06 00:22:51 - ERROR - stderr - 69%|██████▉ | 15543/22434 [14:15:11<4:44:34, 2.48s/it] +2025-02-06 00:22:51 - ERROR - stderr - +2025-02-06 00:22:51 - ERROR - stderr - +2025-02-06 00:22:51 - INFO - stdout - {'loss': 0.3418, 'grad_norm': 1.497828483581543, 'learning_rate': 4.554013265423516e-06, 'epoch': 2.08} +2025-02-06 00:22:51 - ERROR - stderr - 69%|██████▉ | 15543/22434 [14:15:11<4:44:34, 2.48s/it] +2025-02-06 00:22:54 - ERROR - stderr - 69%|██████▉ | 15544/22434 [14:15:14<4:42:30, 2.46s/it] +2025-02-06 00:22:54 - ERROR - stderr - +2025-02-06 00:22:54 - ERROR - stderr - +2025-02-06 00:22:54 - INFO - stdout - {'loss': 0.3662, 'grad_norm': 1.3883613348007202, 'learning_rate': 4.552802456544688e-06, 'epoch': 2.08} +2025-02-06 00:22:54 - ERROR - stderr - 69%|██████▉ | 15544/22434 [14:15:14<4:42:30, 2.46s/it] +2025-02-06 00:22:56 - ERROR - stderr - 69%|██████▉ | 15545/22434 [14:15:16<4:43:23, 2.47s/it] +2025-02-06 00:22:56 - ERROR - stderr - +2025-02-06 00:22:56 - ERROR - stderr - +2025-02-06 00:22:56 - INFO - stdout - {'loss': 0.3777, 'grad_norm': 1.487144112586975, 'learning_rate': 4.551591761207485e-06, 'epoch': 
2.08} +2025-02-06 00:22:56 - ERROR - stderr - 69%|██████▉ | 15545/22434 [14:15:16<4:43:23, 2.47s/it] +2025-02-06 00:22:59 - ERROR - stderr - 69%|██████▉ | 15546/22434 [14:15:18<4:41:52, 2.46s/it] +2025-02-06 00:22:59 - ERROR - stderr - +2025-02-06 00:22:59 - ERROR - stderr - +2025-02-06 00:22:59 - INFO - stdout - {'loss': 0.3903, 'grad_norm': 1.60309636592865, 'learning_rate': 4.550381179437129e-06, 'epoch': 2.08} +2025-02-06 00:22:59 - ERROR - stderr - 69%|██████▉ | 15546/22434 [14:15:19<4:41:52, 2.46s/it] +2025-02-06 00:23:01 - ERROR - stderr - 69%|██████▉ | 15547/22434 [14:15:21<4:43:03, 2.47s/it] +2025-02-06 00:23:01 - ERROR - stderr - +2025-02-06 00:23:01 - ERROR - stderr - +2025-02-06 00:23:01 - INFO - stdout - {'loss': 0.3926, 'grad_norm': 1.3909451961517334, 'learning_rate': 4.549170711258872e-06, 'epoch': 2.08} +2025-02-06 00:23:01 - ERROR - stderr - 69%|██████▉ | 15547/22434 [14:15:21<4:43:03, 2.47s/it] +2025-02-06 00:23:04 - ERROR - stderr - 69%|██████▉ | 15548/22434 [14:15:24<5:00:33, 2.62s/it] +2025-02-06 00:23:04 - ERROR - stderr - +2025-02-06 00:23:04 - ERROR - stderr - +2025-02-06 00:23:04 - INFO - stdout - {'loss': 0.3914, 'grad_norm': 1.4441571235656738, 'learning_rate': 4.547960356697927e-06, 'epoch': 2.08} +2025-02-06 00:23:04 - ERROR - stderr - 69%|██████▉ | 15548/22434 [14:15:24<5:00:33, 2.62s/it] +2025-02-06 00:23:07 - ERROR - stderr - 69%|██████▉ | 15549/22434 [14:15:27<4:59:32, 2.61s/it] +2025-02-06 00:23:07 - ERROR - stderr - +2025-02-06 00:23:07 - ERROR - stderr - +2025-02-06 00:23:07 - INFO - stdout - {'loss': 0.378, 'grad_norm': 1.4510213136672974, 'learning_rate': 4.546750115779538e-06, 'epoch': 2.08} +2025-02-06 00:23:07 - ERROR - stderr - 69%|██████▉ | 15549/22434 [14:15:27<4:59:32, 2.61s/it] +2025-02-06 00:23:09 - ERROR - stderr - 69%|██████▉ | 15550/22434 [14:15:29<4:55:40, 2.58s/it] +2025-02-06 00:23:09 - ERROR - stderr - +2025-02-06 00:23:09 - ERROR - stderr - +2025-02-06 00:23:09 - INFO - stdout - {'loss': 0.3843, 'grad_norm': 
1.4373650550842285, 'learning_rate': 4.545539988528922e-06, 'epoch': 2.08} +2025-02-06 00:23:09 - ERROR - stderr - 69%|██████▉ | 15550/22434 [14:15:29<4:55:40, 2.58s/it] +2025-02-06 00:23:12 - ERROR - stderr - 69%|██████▉ | 15551/22434 [14:15:31<4:51:47, 2.54s/it] +2025-02-06 00:23:12 - ERROR - stderr - +2025-02-06 00:23:12 - ERROR - stderr - +2025-02-06 00:23:12 - INFO - stdout - {'loss': 0.4021, 'grad_norm': 1.447117567062378, 'learning_rate': 4.544329974971302e-06, 'epoch': 2.08} +2025-02-06 00:23:12 - ERROR - stderr - 69%|██████▉ | 15551/22434 [14:15:32<4:51:47, 2.54s/it] +2025-02-06 00:23:14 - ERROR - stderr - 69%|██████▉ | 15552/22434 [14:15:34<4:50:14, 2.53s/it] +2025-02-06 00:23:14 - ERROR - stderr - +2025-02-06 00:23:14 - ERROR - stderr - +2025-02-06 00:23:14 - INFO - stdout - {'loss': 0.4068, 'grad_norm': 1.6633490324020386, 'learning_rate': 4.543120075131911e-06, 'epoch': 2.08} +2025-02-06 00:23:14 - ERROR - stderr - 69%|██████▉ | 15552/22434 [14:15:34<4:50:14, 2.53s/it] +2025-02-06 00:23:17 - ERROR - stderr - 69%|██████▉ | 15553/22434 [14:15:37<4:53:53, 2.56s/it] +2025-02-06 00:23:17 - ERROR - stderr - +2025-02-06 00:23:17 - ERROR - stderr - +2025-02-06 00:23:17 - INFO - stdout - {'loss': 0.3379, 'grad_norm': 1.3152557611465454, 'learning_rate': 4.5419102890359515e-06, 'epoch': 2.08} +2025-02-06 00:23:17 - ERROR - stderr - 69%|██████▉ | 15553/22434 [14:15:37<4:53:53, 2.56s/it] +2025-02-06 00:23:19 - ERROR - stderr - 69%|██████▉ | 15554/22434 [14:15:39<4:49:48, 2.53s/it] +2025-02-06 00:23:19 - ERROR - stderr - +2025-02-06 00:23:19 - ERROR - stderr - +2025-02-06 00:23:19 - INFO - stdout - {'loss': 0.3816, 'grad_norm': 1.5810796022415161, 'learning_rate': 4.5407006167086575e-06, 'epoch': 2.08} +2025-02-06 00:23:19 - ERROR - stderr - 69%|██████▉ | 15554/22434 [14:15:39<4:49:48, 2.53s/it] +2025-02-06 00:23:22 - ERROR - stderr - 69%|██████▉ | 15555/22434 [14:15:42<4:47:57, 2.51s/it] +2025-02-06 00:23:22 - ERROR - stderr - +2025-02-06 00:23:22 - ERROR - stderr 
- +2025-02-06 00:23:22 - INFO - stdout - {'loss': 0.3462, 'grad_norm': 1.4355103969573975, 'learning_rate': 4.5394910581752315e-06, 'epoch': 2.08} +2025-02-06 00:23:22 - ERROR - stderr - 69%|██████▉ | 15555/22434 [14:15:42<4:47:57, 2.51s/it] +2025-02-06 00:23:24 - ERROR - stderr - 69%|██████▉ | 15556/22434 [14:15:44<4:46:50, 2.50s/it] +2025-02-06 00:23:24 - ERROR - stderr - +2025-02-06 00:23:24 - ERROR - stderr - +2025-02-06 00:23:24 - INFO - stdout - {'loss': 0.3969, 'grad_norm': 1.437821388244629, 'learning_rate': 4.538281613460889e-06, 'epoch': 2.08} +2025-02-06 00:23:24 - ERROR - stderr - 69%|██████▉ | 15556/22434 [14:15:44<4:46:50, 2.50s/it] +2025-02-06 00:23:27 - ERROR - stderr - 69%|██████▉ | 15557/22434 [14:15:47<4:45:49, 2.49s/it] +2025-02-06 00:23:27 - ERROR - stderr - +2025-02-06 00:23:27 - ERROR - stderr - +2025-02-06 00:23:27 - INFO - stdout - {'loss': 0.4516, 'grad_norm': 1.5658038854599, 'learning_rate': 4.5370722825908395e-06, 'epoch': 2.08} +2025-02-06 00:23:27 - ERROR - stderr - 69%|██████▉ | 15557/22434 [14:15:47<4:45:49, 2.49s/it] +2025-02-06 00:23:29 - ERROR - stderr - 69%|██████▉ | 15558/22434 [14:15:49<4:47:39, 2.51s/it] +2025-02-06 00:23:29 - ERROR - stderr - +2025-02-06 00:23:29 - ERROR - stderr - +2025-02-06 00:23:29 - INFO - stdout - {'loss': 0.3763, 'grad_norm': 1.5167653560638428, 'learning_rate': 4.5358630655902916e-06, 'epoch': 2.08} +2025-02-06 00:23:29 - ERROR - stderr - 69%|██████▉ | 15558/22434 [14:15:49<4:47:39, 2.51s/it] +2025-02-06 00:23:32 - ERROR - stderr - 69%|██████▉ | 15559/22434 [14:15:52<4:48:57, 2.52s/it] +2025-02-06 00:23:32 - ERROR - stderr - +2025-02-06 00:23:32 - ERROR - stderr - +2025-02-06 00:23:32 - INFO - stdout - {'loss': 0.3306, 'grad_norm': 1.448601484298706, 'learning_rate': 4.53465396248445e-06, 'epoch': 2.08} +2025-02-06 00:23:32 - ERROR - stderr - 69%|██████▉ | 15559/22434 [14:15:52<4:48:57, 2.52s/it] +2025-02-06 00:23:34 - ERROR - stderr - 69%|██████▉ | 15560/22434 [14:15:54<4:49:15, 2.52s/it]
+2025-02-06 00:23:34 - ERROR - stderr - +2025-02-06 00:23:34 - ERROR - stderr - +2025-02-06 00:23:34 - INFO - stdout - {'loss': 0.421, 'grad_norm': 1.5941975116729736, 'learning_rate': 4.533444973298516e-06, 'epoch': 2.08} +2025-02-06 00:23:34 - ERROR - stderr - 69%|██████▉ | 15560/22434 [14:15:54<4:49:15, 2.52s/it] +2025-02-06 00:23:37 - ERROR - stderr - 69%|██████▉ | 15561/22434 [14:15:57<4:50:51, 2.54s/it] +2025-02-06 00:23:37 - ERROR - stderr - +2025-02-06 00:23:37 - ERROR - stderr - +2025-02-06 00:23:37 - INFO - stdout - {'loss': 0.3797, 'grad_norm': 1.4270976781845093, 'learning_rate': 4.5322360980576904e-06, 'epoch': 2.08} +2025-02-06 00:23:37 - ERROR - stderr - 69%|██████▉ | 15561/22434 [14:15:57<4:50:51, 2.54s/it] +2025-02-06 00:23:39 - ERROR - stderr - 69%|██████▉ | 15562/22434 [14:15:59<4:48:33, 2.52s/it] +2025-02-06 00:23:39 - ERROR - stderr - +2025-02-06 00:23:39 - ERROR - stderr - +2025-02-06 00:23:39 - INFO - stdout - {'loss': 0.392, 'grad_norm': 1.5342450141906738, 'learning_rate': 4.531027336787172e-06, 'epoch': 2.08} +2025-02-06 00:23:39 - ERROR - stderr - 69%|██████▉ | 15562/22434 [14:15:59<4:48:33, 2.52s/it] +2025-02-06 00:23:42 - ERROR - stderr - 69%|██████▉ | 15563/22434 [14:16:02<4:58:25, 2.61s/it] +2025-02-06 00:23:42 - ERROR - stderr - +2025-02-06 00:23:42 - ERROR - stderr - +2025-02-06 00:23:42 - INFO - stdout - {'loss': 0.4221, 'grad_norm': 1.5524028539657593, 'learning_rate': 4.529818689512154e-06, 'epoch': 2.08} +2025-02-06 00:23:42 - ERROR - stderr - 69%|██████▉ | 15563/22434 [14:16:02<4:58:25, 2.61s/it] +2025-02-06 00:23:45 - ERROR - stderr - 69%|██████▉ | 15564/22434 [14:16:04<4:53:04, 2.56s/it] +2025-02-06 00:23:45 - ERROR - stderr - +2025-02-06 00:23:45 - ERROR - stderr - +2025-02-06 00:23:45 - INFO - stdout - {'loss': 0.3553, 'grad_norm': 1.3768445253372192, 'learning_rate': 4.528610156257832e-06, 'epoch': 2.08} +2025-02-06 00:23:45 - ERROR - stderr - 69%|██████▉ | 15564/22434 [14:16:04<4:53:04, 2.56s/it] +2025-02-06 00:23:47 - 
ERROR - stderr - 69%|██████▉ | 15565/22434 [14:16:07<4:56:37, 2.59s/it] +2025-02-06 00:23:47 - ERROR - stderr - +2025-02-06 00:23:47 - ERROR - stderr - +2025-02-06 00:23:47 - INFO - stdout - {'loss': 0.4038, 'grad_norm': 1.478058934211731, 'learning_rate': 4.527401737049396e-06, 'epoch': 2.08} +2025-02-06 00:23:47 - ERROR - stderr - 69%|██████▉ | 15565/22434 [14:16:07<4:56:37, 2.59s/it] +2025-02-06 00:23:50 - ERROR - stderr - 69%|██████▉ | 15566/22434 [14:16:10<4:50:59, 2.54s/it] +2025-02-06 00:23:50 - ERROR - stderr - +2025-02-06 00:23:50 - ERROR - stderr - +2025-02-06 00:23:50 - INFO - stdout - {'loss': 0.3762, 'grad_norm': 1.5652023553848267, 'learning_rate': 4.526193431912038e-06, 'epoch': 2.08} +2025-02-06 00:23:50 - ERROR - stderr - 69%|██████▉ | 15566/22434 [14:16:10<4:50:59, 2.54s/it] +2025-02-06 00:23:52 - ERROR - stderr - 69%|██████▉ | 15567/22434 [14:16:12<4:47:12, 2.51s/it] +2025-02-06 00:23:52 - ERROR - stderr - +2025-02-06 00:23:52 - ERROR - stderr - +2025-02-06 00:23:52 - INFO - stdout - {'loss': 0.3679, 'grad_norm': 1.4810761213302612, 'learning_rate': 4.524985240870932e-06, 'epoch': 2.08} +2025-02-06 00:23:52 - ERROR - stderr - 69%|██████▉ | 15567/22434 [14:16:12<4:47:12, 2.51s/it] +2025-02-06 00:23:55 - ERROR - stderr - 69%|██████▉ | 15568/22434 [14:16:14<4:44:43, 2.49s/it] +2025-02-06 00:23:55 - ERROR - stderr - +2025-02-06 00:23:55 - ERROR - stderr - +2025-02-06 00:23:55 - INFO - stdout - {'loss': 0.3823, 'grad_norm': 1.6437420845031738, 'learning_rate': 4.523777163951277e-06, 'epoch': 2.08} +2025-02-06 00:23:55 - ERROR - stderr - 69%|██████▉ | 15568/22434 [14:16:14<4:44:43, 2.49s/it] +2025-02-06 00:23:57 - ERROR - stderr - 69%|██████▉ | 15569/22434 [14:16:17<4:47:33, 2.51s/it] +2025-02-06 00:23:57 - ERROR - stderr - +2025-02-06 00:23:57 - ERROR - stderr - +2025-02-06 00:23:57 - INFO - stdout - {'loss': 0.3412, 'grad_norm': 1.3160046339035034, 'learning_rate': 4.5225692011782395e-06, 'epoch': 2.08} +2025-02-06 00:23:57 - ERROR - stderr - 
69%|██████▉ | 15569/22434 [14:16:17<4:47:33, 2.51s/it] +2025-02-06 00:24:00 - ERROR - stderr - 69%|██████▉ | 15570/22434 [14:16:20<4:50:39, 2.54s/it] +2025-02-06 00:24:00 - ERROR - stderr - +2025-02-06 00:24:00 - ERROR - stderr - +2025-02-06 00:24:00 - INFO - stdout - {'loss': 0.391, 'grad_norm': 1.2827880382537842, 'learning_rate': 4.521361352577011e-06, 'epoch': 2.08} +2025-02-06 00:24:00 - ERROR - stderr - 69%|██████▉ | 15570/22434 [14:16:20<4:50:39, 2.54s/it] +2025-02-06 00:24:03 - ERROR - stderr - 69%|██████▉ | 15571/22434 [14:16:22<5:01:49, 2.64s/it] +2025-02-06 00:24:03 - ERROR - stderr - +2025-02-06 00:24:03 - ERROR - stderr - +2025-02-06 00:24:03 - INFO - stdout - {'loss': 0.4538, 'grad_norm': 1.5145409107208252, 'learning_rate': 4.520153618172764e-06, 'epoch': 2.08} +2025-02-06 00:24:03 - ERROR - stderr - 69%|██████▉ | 15571/22434 [14:16:22<5:01:49, 2.64s/it] +2025-02-06 00:24:05 - ERROR - stderr - 69%|██████▉ | 15572/22434 [14:16:25<4:55:03, 2.58s/it] +2025-02-06 00:24:05 - ERROR - stderr - +2025-02-06 00:24:05 - ERROR - stderr - +2025-02-06 00:24:05 - INFO - stdout - {'loss': 0.4002, 'grad_norm': 1.621311902999878, 'learning_rate': 4.518945997990665e-06, 'epoch': 2.08} +2025-02-06 00:24:05 - ERROR - stderr - 69%|██████▉ | 15572/22434 [14:16:25<4:55:03, 2.58s/it] +2025-02-06 00:24:08 - ERROR - stderr - 69%|██████▉ | 15573/22434 [14:16:27<4:51:32, 2.55s/it] +2025-02-06 00:24:08 - ERROR - stderr - +2025-02-06 00:24:08 - ERROR - stderr - +2025-02-06 00:24:08 - INFO - stdout - {'loss': 0.3904, 'grad_norm': 1.3657732009887695, 'learning_rate': 4.5177384920558985e-06, 'epoch': 2.08} +2025-02-06 00:24:08 - ERROR - stderr - 69%|██████▉ | 15573/22434 [14:16:27<4:51:32, 2.55s/it] +2025-02-06 00:24:10 - ERROR - stderr - 69%|██████▉ | 15574/22434 [14:16:30<4:53:17, 2.57s/it] +2025-02-06 00:24:10 - ERROR - stderr - +2025-02-06 00:24:10 - ERROR - stderr - +2025-02-06 00:24:10 - INFO - stdout - {'loss': 0.3843, 'grad_norm': 1.5254472494125366, 'learning_rate': 
4.516531100393624e-06, 'epoch': 2.08} +2025-02-06 00:24:10 - ERROR - stderr - 69%|██████▉ | 15574/22434 [14:16:30<4:53:17, 2.57s/it] +2025-02-06 00:24:13 - ERROR - stderr - 69%|██████▉ | 15575/22434 [14:16:32<4:49:47, 2.53s/it] +2025-02-06 00:24:13 - ERROR - stderr - +2025-02-06 00:24:13 - ERROR - stderr - +2025-02-06 00:24:13 - INFO - stdout - {'loss': 0.4055, 'grad_norm': 1.6675376892089844, 'learning_rate': 4.515323823029012e-06, 'epoch': 2.08} +2025-02-06 00:24:13 - ERROR - stderr - 69%|██████▉ | 15575/22434 [14:16:32<4:49:47, 2.53s/it] +2025-02-06 00:24:15 - ERROR - stderr - 69%|██████▉ | 15576/22434 [14:16:35<4:51:25, 2.55s/it] +2025-02-06 00:24:15 - ERROR - stderr - +2025-02-06 00:24:15 - ERROR - stderr - +2025-02-06 00:24:15 - INFO - stdout - {'loss': 0.3979, 'grad_norm': 1.5225484371185303, 'learning_rate': 4.5141166599872255e-06, 'epoch': 2.08} +2025-02-06 00:24:15 - ERROR - stderr - 69%|██████▉ | 15576/22434 [14:16:35<4:51:25, 2.55s/it] +2025-02-06 00:24:18 - ERROR - stderr - 69%|██████▉ | 15577/22434 [14:16:37<4:47:45, 2.52s/it] +2025-02-06 00:24:18 - ERROR - stderr - +2025-02-06 00:24:18 - ERROR - stderr - +2025-02-06 00:24:18 - INFO - stdout - {'loss': 0.3485, 'grad_norm': 1.336268663406372, 'learning_rate': 4.512909611293429e-06, 'epoch': 2.08} +2025-02-06 00:24:18 - ERROR - stderr - 69%|██████▉ | 15577/22434 [14:16:38<4:47:45, 2.52s/it] +2025-02-06 00:24:20 - ERROR - stderr - 69%|██████▉ | 15578/22434 [14:16:40<4:48:18, 2.52s/it] +2025-02-06 00:24:20 - ERROR - stderr - +2025-02-06 00:24:20 - ERROR - stderr - +2025-02-06 00:24:20 - INFO - stdout - {'loss': 0.3734, 'grad_norm': 1.3677226305007935, 'learning_rate': 4.51170267697278e-06, 'epoch': 2.08} +2025-02-06 00:24:20 - ERROR - stderr - 69%|██████▉ | 15578/22434 [14:16:40<4:48:18, 2.52s/it] +2025-02-06 00:24:23 - ERROR - stderr - 69%|██████▉ | 15579/22434 [14:16:43<4:48:42, 2.53s/it] +2025-02-06 00:24:23 - ERROR - stderr - +2025-02-06 00:24:23 - ERROR - stderr - +2025-02-06 00:24:23 - INFO - stdout 
- {'loss': 0.3776, 'grad_norm': 1.5330201387405396, 'learning_rate': 4.510495857050437e-06, 'epoch': 2.08} +2025-02-06 00:24:23 - ERROR - stderr - 69%|██████▉ | 15579/22434 [14:16:43<4:48:42, 2.53s/it] +2025-02-06 00:24:25 - ERROR - stderr - 69%|██████▉ | 15580/22434 [14:16:45<4:46:11, 2.51s/it] +2025-02-06 00:24:25 - ERROR - stderr - +2025-02-06 00:24:25 - ERROR - stderr - +2025-02-06 00:24:25 - INFO - stdout - {'loss': 0.355, 'grad_norm': 1.454996943473816, 'learning_rate': 4.509289151551556e-06, 'epoch': 2.08} +2025-02-06 00:24:25 - ERROR - stderr - 69%|██████▉ | 15580/22434 [14:16:45<4:46:11, 2.51s/it] +2025-02-06 00:24:28 - ERROR - stderr - 69%|██████▉ | 15581/22434 [14:16:47<4:44:40, 2.49s/it] +2025-02-06 00:24:28 - ERROR - stderr - +2025-02-06 00:24:28 - ERROR - stderr - +2025-02-06 00:24:28 - INFO - stdout - {'loss': 0.388, 'grad_norm': 1.347825288772583, 'learning_rate': 4.508082560501288e-06, 'epoch': 2.08} +2025-02-06 00:24:28 - ERROR - stderr - 69%|██████▉ | 15581/22434 [14:16:48<4:44:40, 2.49s/it] +2025-02-06 00:24:28 - WARNING - transformers.tokenization_utils_base - Token indices sequence length is longer than the specified maximum sequence length for this model (2915 > 2048). Running this sequence through the model will result in indexing errors +2025-02-06 00:24:28 - WARNING - transformers.tokenization_utils_base - Token indices sequence length is longer than the specified maximum sequence length for this model (2915 > 2048). 
Running this sequence through the model will result in indexing errors +2025-02-06 00:24:30 - ERROR - stderr - 69%|██████▉ | 15582/22434 [14:16:50<4:46:56, 2.51s/it] +2025-02-06 00:24:30 - ERROR - stderr - +2025-02-06 00:24:30 - ERROR - stderr - +2025-02-06 00:24:30 - INFO - stdout - {'loss': 0.3684, 'grad_norm': 1.611128330230713, 'learning_rate': 4.5068760839247835e-06, 'epoch': 2.08} +2025-02-06 00:24:30 - ERROR - stderr - 69%|██████▉ | 15582/22434 [14:16:50<4:46:56, 2.51s/it] +2025-02-06 00:24:36 - ERROR - stderr - 69%|██████▉ | 15583/22434 [14:16:56<6:36:12, 3.47s/it] +2025-02-06 00:24:36 - ERROR - stderr - +2025-02-06 00:24:36 - ERROR - stderr - +2025-02-06 00:24:36 - INFO - stdout - {'loss': 0.3251, 'grad_norm': 1.219836711883545, 'learning_rate': 4.505669721847193e-06, 'epoch': 2.08} +2025-02-06 00:24:36 - ERROR - stderr - 69%|██████▉ | 15583/22434 [14:16:56<6:36:12, 3.47s/it] +2025-02-06 00:24:38 - ERROR - stderr - 69%|██████▉ | 15584/22434 [14:16:58<6:02:36, 3.18s/it] +2025-02-06 00:24:38 - ERROR - stderr - +2025-02-06 00:24:38 - ERROR - stderr - +2025-02-06 00:24:38 - INFO - stdout - {'loss': 0.4321, 'grad_norm': 1.643996000289917, 'learning_rate': 4.504463474293656e-06, 'epoch': 2.08} +2025-02-06 00:24:38 - ERROR - stderr - 69%|██████▉ | 15584/22434 [14:16:58<6:02:36, 3.18s/it] +2025-02-06 00:24:41 - ERROR - stderr - 69%|██████▉ | 15585/22434 [14:17:01<5:39:47, 2.98s/it] +2025-02-06 00:24:41 - ERROR - stderr - +2025-02-06 00:24:41 - ERROR - stderr - +2025-02-06 00:24:41 - INFO - stdout - {'loss': 0.3991, 'grad_norm': 1.5773465633392334, 'learning_rate': 4.503257341289321e-06, 'epoch': 2.08} +2025-02-06 00:24:41 - ERROR - stderr - 69%|██████▉ | 15585/22434 [14:17:01<5:39:47, 2.98s/it] +2025-02-06 00:24:43 - ERROR - stderr - 69%|██████▉ | 15586/22434 [14:17:03<5:21:55, 2.82s/it] +2025-02-06 00:24:43 - ERROR - stderr - +2025-02-06 00:24:43 - ERROR - stderr - +2025-02-06 00:24:43 - INFO - stdout - {'loss': 0.3778, 'grad_norm': 1.5046992301940918, 
'learning_rate': 4.5020513228593275e-06, 'epoch': 2.08} +2025-02-06 00:24:43 - ERROR - stderr - 69%|██████▉ | 15586/22434 [14:17:03<5:21:55, 2.82s/it] +2025-02-06 00:24:46 - ERROR - stderr - 69%|██████▉ | 15587/22434 [14:17:06<5:10:25, 2.72s/it] +2025-02-06 00:24:46 - ERROR - stderr - +2025-02-06 00:24:46 - ERROR - stderr - +2025-02-06 00:24:46 - INFO - stdout - {'loss': 0.4339, 'grad_norm': 1.5279245376586914, 'learning_rate': 4.500845419028817e-06, 'epoch': 2.08} +2025-02-06 00:24:46 - ERROR - stderr - 69%|██████▉ | 15587/22434 [14:17:06<5:10:25, 2.72s/it] +2025-02-06 00:24:48 - ERROR - stderr - 69%|██████▉ | 15588/22434 [14:17:08<5:04:51, 2.67s/it] +2025-02-06 00:24:48 - ERROR - stderr - +2025-02-06 00:24:48 - ERROR - stderr - +2025-02-06 00:24:48 - INFO - stdout - {'loss': 0.3951, 'grad_norm': 1.5555322170257568, 'learning_rate': 4.4996396298229126e-06, 'epoch': 2.08} +2025-02-06 00:24:48 - ERROR - stderr - 69%|██████▉ | 15588/22434 [14:17:08<5:04:51, 2.67s/it] +2025-02-06 00:24:51 - ERROR - stderr - 69%|██████▉ | 15589/22434 [14:17:11<4:59:45, 2.63s/it] +2025-02-06 00:24:51 - ERROR - stderr - +2025-02-06 00:24:51 - ERROR - stderr - +2025-02-06 00:24:51 - INFO - stdout - {'loss': 0.3801, 'grad_norm': 1.4350197315216064, 'learning_rate': 4.498433955266761e-06, 'epoch': 2.08} +2025-02-06 00:24:51 - ERROR - stderr - 69%|██████▉ | 15589/22434 [14:17:11<4:59:45, 2.63s/it] +2025-02-06 00:24:53 - ERROR - stderr - 69%|██████▉ | 15590/22434 [14:17:13<4:52:26, 2.56s/it] +2025-02-06 00:24:53 - ERROR - stderr - +2025-02-06 00:24:53 - ERROR - stderr - +2025-02-06 00:24:53 - INFO - stdout - {'loss': 0.415, 'grad_norm': 1.5810738801956177, 'learning_rate': 4.497228395385494e-06, 'epoch': 2.08} +2025-02-06 00:24:53 - ERROR - stderr - 69%|██████▉ | 15590/22434 [14:17:13<4:52:26, 2.56s/it] +2025-02-06 00:24:56 - ERROR - stderr - 69%|██████▉ | 15591/22434 [14:17:16<4:49:23, 2.54s/it] +2025-02-06 00:24:56 - ERROR - stderr - +2025-02-06 00:24:56 - ERROR - stderr - +2025-02-06 
00:24:56 - INFO - stdout - {'loss': 0.3777, 'grad_norm': 1.4832266569137573, 'learning_rate': 4.4960229502042275e-06, 'epoch': 2.08} +2025-02-06 00:24:56 - ERROR - stderr - 69%|██████▉ | 15591/22434 [14:17:16<4:49:23, 2.54s/it] +2025-02-06 00:24:58 - ERROR - stderr - 70%|██████▉ | 15592/22434 [14:17:18<4:49:05, 2.54s/it] +2025-02-06 00:24:58 - ERROR - stderr - +2025-02-06 00:24:58 - ERROR - stderr - +2025-02-06 00:24:58 - INFO - stdout - {'loss': 0.351, 'grad_norm': 1.4933520555496216, 'learning_rate': 4.494817619748103e-06, 'epoch': 2.09} +2025-02-06 00:24:58 - ERROR - stderr - 70%|██████▉ | 15592/22434 [14:17:18<4:49:05, 2.54s/it] +2025-02-06 00:25:01 - ERROR - stderr - 70%|██████▉ | 15593/22434 [14:17:21<4:50:59, 2.55s/it] +2025-02-06 00:25:01 - ERROR - stderr - +2025-02-06 00:25:01 - ERROR - stderr - +2025-02-06 00:25:01 - INFO - stdout - {'loss': 0.4139, 'grad_norm': 1.5231611728668213, 'learning_rate': 4.49361240404223e-06, 'epoch': 2.09} +2025-02-06 00:25:01 - ERROR - stderr - 70%|██████▉ | 15593/22434 [14:17:21<4:50:59, 2.55s/it] +2025-02-06 00:25:04 - ERROR - stderr - 70%|██████▉ | 15594/22434 [14:17:23<4:50:34, 2.55s/it] +2025-02-06 00:25:04 - ERROR - stderr - +2025-02-06 00:25:04 - ERROR - stderr - +2025-02-06 00:25:04 - INFO - stdout - {'loss': 0.3934, 'grad_norm': 1.6450469493865967, 'learning_rate': 4.492407303111745e-06, 'epoch': 2.09} +2025-02-06 00:25:04 - ERROR - stderr - 70%|██████▉ | 15594/22434 [14:17:23<4:50:34, 2.55s/it] +2025-02-06 00:25:06 - ERROR - stderr - 70%|██████▉ | 15595/22434 [14:17:26<4:49:17, 2.54s/it] +2025-02-06 00:25:06 - ERROR - stderr - +2025-02-06 00:25:06 - ERROR - stderr - +2025-02-06 00:25:06 - INFO - stdout - {'loss': 0.3643, 'grad_norm': 1.3872545957565308, 'learning_rate': 4.491202316981755e-06, 'epoch': 2.09} +2025-02-06 00:25:06 - ERROR - stderr - 70%|██████▉ | 15595/22434 [14:17:26<4:49:17, 2.54s/it] +2025-02-06 00:25:09 - ERROR - stderr - 70%|██████▉ | 15596/22434 [14:17:28<4:50:45, 2.55s/it] +2025-02-06 00:25:09 - 
ERROR - stderr - +2025-02-06 00:25:09 - ERROR - stderr - +2025-02-06 00:25:09 - INFO - stdout - {'loss': 0.4259, 'grad_norm': 1.570786714553833, 'learning_rate': 4.489997445677383e-06, 'epoch': 2.09} +2025-02-06 00:25:09 - ERROR - stderr - 70%|██████▉ | 15596/22434 [14:17:28<4:50:45, 2.55s/it] +2025-02-06 00:25:11 - ERROR - stderr - 70%|██████▉ | 15597/22434 [14:17:31<4:48:25, 2.53s/it] +2025-02-06 00:25:11 - ERROR - stderr - +2025-02-06 00:25:11 - ERROR - stderr - +2025-02-06 00:25:11 - INFO - stdout - {'loss': 0.4404, 'grad_norm': 1.5542056560516357, 'learning_rate': 4.488792689223741e-06, 'epoch': 2.09} +2025-02-06 00:25:11 - ERROR - stderr - 70%|██████▉ | 15597/22434 [14:17:31<4:48:25, 2.53s/it] +2025-02-06 00:25:14 - ERROR - stderr - 70%|██████▉ | 15598/22434 [14:17:33<4:47:23, 2.52s/it] +2025-02-06 00:25:14 - ERROR - stderr - +2025-02-06 00:25:14 - ERROR - stderr - +2025-02-06 00:25:14 - INFO - stdout - {'loss': 0.3832, 'grad_norm': 1.400649905204773, 'learning_rate': 4.487588047645941e-06, 'epoch': 2.09} +2025-02-06 00:25:14 - ERROR - stderr - 70%|██████▉ | 15598/22434 [14:17:33<4:47:23, 2.52s/it] +2025-02-06 00:25:16 - ERROR - stderr - 70%|██████▉ | 15599/22434 [14:17:36<4:48:49, 2.54s/it] +2025-02-06 00:25:16 - ERROR - stderr - +2025-02-06 00:25:16 - ERROR - stderr - +2025-02-06 00:25:16 - INFO - stdout - {'loss': 0.3444, 'grad_norm': 1.40584397315979, 'learning_rate': 4.486383520969094e-06, 'epoch': 2.09} +2025-02-06 00:25:16 - ERROR - stderr - 70%|██████▉ | 15599/22434 [14:17:36<4:48:49, 2.54s/it] +2025-02-06 00:25:19 - ERROR - stderr - 70%|██████▉ | 15600/22434 [14:17:38<4:47:31, 2.52s/it] +2025-02-06 00:25:19 - ERROR - stderr - +2025-02-06 00:25:19 - ERROR - stderr - +2025-02-06 00:25:19 - INFO - stdout - {'loss': 0.3836, 'grad_norm': 1.3746213912963867, 'learning_rate': 4.485179109218307e-06, 'epoch': 2.09} +2025-02-06 00:25:19 - ERROR - stderr - 70%|██████▉ | 15600/22434 [14:17:38<4:47:31, 2.52s/it] +2025-02-06 00:25:21 - ERROR - stderr - 70%|██████▉ 
| 15601/22434 [14:17:41<4:46:24, 2.51s/it] +2025-02-06 00:25:21 - ERROR - stderr - +2025-02-06 00:25:21 - ERROR - stderr - +2025-02-06 00:25:21 - INFO - stdout - {'loss': 0.3888, 'grad_norm': 1.5158931016921997, 'learning_rate': 4.483974812418684e-06, 'epoch': 2.09} +2025-02-06 00:25:21 - ERROR - stderr - 70%|██████▉ | 15601/22434 [14:17:41<4:46:24, 2.51s/it] +2025-02-06 00:25:24 - ERROR - stderr - 70%|██████▉ | 15602/22434 [14:17:43<4:45:38, 2.51s/it] +2025-02-06 00:25:24 - ERROR - stderr - +2025-02-06 00:25:24 - ERROR - stderr - +2025-02-06 00:25:24 - INFO - stdout - {'loss': 0.3603, 'grad_norm': 1.37674081325531, 'learning_rate': 4.482770630595328e-06, 'epoch': 2.09} +2025-02-06 00:25:24 - ERROR - stderr - 70%|██████▉ | 15602/22434 [14:17:43<4:45:38, 2.51s/it] +2025-02-06 00:25:26 - ERROR - stderr - 70%|██████▉ | 15603/22434 [14:17:46<4:45:46, 2.51s/it] +2025-02-06 00:25:26 - ERROR - stderr - +2025-02-06 00:25:26 - ERROR - stderr - +2025-02-06 00:25:26 - INFO - stdout - {'loss': 0.4089, 'grad_norm': 1.4940325021743774, 'learning_rate': 4.481566563773337e-06, 'epoch': 2.09} +2025-02-06 00:25:26 - ERROR - stderr - 70%|██████▉ | 15603/22434 [14:17:46<4:45:46, 2.51s/it] +2025-02-06 00:25:29 - ERROR - stderr - 70%|██████▉ | 15604/22434 [14:17:49<4:54:38, 2.59s/it] +2025-02-06 00:25:29 - ERROR - stderr - +2025-02-06 00:25:29 - ERROR - stderr - +2025-02-06 00:25:29 - INFO - stdout - {'loss': 0.3748, 'grad_norm': 1.4464157819747925, 'learning_rate': 4.4803626119778135e-06, 'epoch': 2.09} +2025-02-06 00:25:29 - ERROR - stderr - 70%|██████▉ | 15604/22434 [14:17:49<4:54:38, 2.59s/it] +2025-02-06 00:25:31 - ERROR - stderr - 70%|██████▉ | 15605/22434 [14:17:51<4:53:01, 2.57s/it] +2025-02-06 00:25:32 - ERROR - stderr - +2025-02-06 00:25:32 - ERROR - stderr - +2025-02-06 00:25:32 - INFO - stdout - {'loss': 0.4024, 'grad_norm': 1.3560158014297485, 'learning_rate': 4.4791587752338475e-06, 'epoch': 2.09} +2025-02-06 00:25:32 - ERROR - stderr - 70%|██████▉ | 15605/22434 
[14:17:51<4:53:01, 2.57s/it] +2025-02-06 00:25:34 - ERROR - stderr - 70%|██████▉ | 15606/22434 [14:17:54<4:53:31, 2.58s/it] +2025-02-06 00:25:34 - ERROR - stderr - +2025-02-06 00:25:34 - ERROR - stderr - +2025-02-06 00:25:34 - INFO - stdout - {'loss': 0.33, 'grad_norm': 1.2261706590652466, 'learning_rate': 4.4779550535665385e-06, 'epoch': 2.09} +2025-02-06 00:25:34 - ERROR - stderr - 70%|██████▉ | 15606/22434 [14:17:54<4:53:31, 2.58s/it] +2025-02-06 00:25:37 - ERROR - stderr - 70%|██████▉ | 15607/22434 [14:17:56<4:52:57, 2.57s/it] +2025-02-06 00:25:37 - ERROR - stderr - +2025-02-06 00:25:37 - ERROR - stderr - +2025-02-06 00:25:37 - INFO - stdout - {'loss': 0.3791, 'grad_norm': 1.3653289079666138, 'learning_rate': 4.4767514470009646e-06, 'epoch': 2.09} +2025-02-06 00:25:37 - ERROR - stderr - 70%|██████▉ | 15607/22434 [14:17:56<4:52:57, 2.57s/it] +2025-02-06 00:25:39 - ERROR - stderr - 70%|██████▉ | 15608/22434 [14:17:59<4:51:19, 2.56s/it] +2025-02-06 00:25:39 - ERROR - stderr - +2025-02-06 00:25:39 - ERROR - stderr - +2025-02-06 00:25:39 - INFO - stdout - {'loss': 0.3943, 'grad_norm': 1.5935239791870117, 'learning_rate': 4.475547955562225e-06, 'epoch': 2.09} +2025-02-06 00:25:39 - ERROR - stderr - 70%|██████▉ | 15608/22434 [14:17:59<4:51:19, 2.56s/it] +2025-02-06 00:25:42 - ERROR - stderr - 70%|██████▉ | 15609/22434 [14:18:01<4:50:28, 2.55s/it] +2025-02-06 00:25:42 - ERROR - stderr - +2025-02-06 00:25:42 - ERROR - stderr - +2025-02-06 00:25:42 - INFO - stdout - {'loss': 0.3857, 'grad_norm': 1.4235332012176514, 'learning_rate': 4.4743445792754014e-06, 'epoch': 2.09} +2025-02-06 00:25:42 - ERROR - stderr - 70%|██████▉ | 15609/22434 [14:18:02<4:50:28, 2.55s/it] +2025-02-06 00:25:44 - ERROR - stderr - 70%|██████▉ | 15610/22434 [14:18:04<4:47:39, 2.53s/it] +2025-02-06 00:25:44 - ERROR - stderr - +2025-02-06 00:25:44 - ERROR - stderr - +2025-02-06 00:25:44 - INFO - stdout - {'loss': 0.3442, 'grad_norm': 1.3664047718048096, 'learning_rate': 4.4731413181655794e-06, 'epoch': 
2.09} +2025-02-06 00:25:44 - ERROR - stderr - 70%|██████▉ | 15610/22434 [14:18:04<4:47:39, 2.53s/it] +2025-02-06 00:25:47 - ERROR - stderr - 70%|██████▉ | 15611/22434 [14:18:07<4:48:26, 2.54s/it] +2025-02-06 00:25:47 - ERROR - stderr - +2025-02-06 00:25:47 - ERROR - stderr - +2025-02-06 00:25:47 - INFO - stdout - {'loss': 0.3633, 'grad_norm': 1.5194188356399536, 'learning_rate': 4.4719381722578405e-06, 'epoch': 2.09} +2025-02-06 00:25:47 - ERROR - stderr - 70%|██████▉ | 15611/22434 [14:18:07<4:48:26, 2.54s/it] +2025-02-06 00:25:49 - ERROR - stderr - 70%|██████▉ | 15612/22434 [14:18:09<4:45:50, 2.51s/it] +2025-02-06 00:25:49 - ERROR - stderr - +2025-02-06 00:25:49 - ERROR - stderr - +2025-02-06 00:25:49 - INFO - stdout - {'loss': 0.3701, 'grad_norm': 1.4515771865844727, 'learning_rate': 4.4707351415772535e-06, 'epoch': 2.09} +2025-02-06 00:25:49 - ERROR - stderr - 70%|██████▉ | 15612/22434 [14:18:09<4:45:50, 2.51s/it] +2025-02-06 00:25:52 - ERROR - stderr - 70%|██████▉ | 15613/22434 [14:18:11<4:45:33, 2.51s/it] +2025-02-06 00:25:52 - ERROR - stderr - +2025-02-06 00:25:52 - ERROR - stderr - +2025-02-06 00:25:52 - INFO - stdout - {'loss': 0.369, 'grad_norm': 1.5529335737228394, 'learning_rate': 4.469532226148908e-06, 'epoch': 2.09} +2025-02-06 00:25:52 - ERROR - stderr - 70%|██████▉ | 15613/22434 [14:18:12<4:45:33, 2.51s/it] +2025-02-06 00:25:54 - ERROR - stderr - 70%|██████▉ | 15614/22434 [14:18:14<4:45:41, 2.51s/it] +2025-02-06 00:25:54 - ERROR - stderr - +2025-02-06 00:25:54 - ERROR - stderr - +2025-02-06 00:25:54 - INFO - stdout - {'loss': 0.4455, 'grad_norm': 1.6439588069915771, 'learning_rate': 4.46832942599787e-06, 'epoch': 2.09} +2025-02-06 00:25:54 - ERROR - stderr - 70%|██████▉ | 15614/22434 [14:18:14<4:45:41, 2.51s/it] +2025-02-06 00:25:57 - ERROR - stderr - 70%|██████▉ | 15615/22434 [14:18:17<4:45:54, 2.52s/it] +2025-02-06 00:25:57 - ERROR - stderr - +2025-02-06 00:25:57 - ERROR - stderr - +2025-02-06 00:25:57 - INFO - stdout - {'loss': 0.407, 'grad_norm': 
1.5345412492752075, 'learning_rate': 4.467126741149209e-06, 'epoch': 2.09} +2025-02-06 00:25:57 - ERROR - stderr - 70%|██████▉ | 15615/22434 [14:18:17<4:45:54, 2.52s/it] +2025-02-06 00:25:59 - ERROR - stderr - 70%|██████▉ | 15616/22434 [14:18:19<4:44:48, 2.51s/it] +2025-02-06 00:25:59 - ERROR - stderr - +2025-02-06 00:25:59 - ERROR - stderr - +2025-02-06 00:25:59 - INFO - stdout - {'loss': 0.393, 'grad_norm': 1.5773178339004517, 'learning_rate': 4.4659241716279974e-06, 'epoch': 2.09} +2025-02-06 00:25:59 - ERROR - stderr - 70%|██████▉ | 15616/22434 [14:18:19<4:44:48, 2.51s/it] +2025-02-06 00:26:02 - ERROR - stderr - 70%|██████▉ | 15617/22434 [14:18:22<4:47:42, 2.53s/it] +2025-02-06 00:26:02 - ERROR - stderr - +2025-02-06 00:26:02 - ERROR - stderr - +2025-02-06 00:26:02 - INFO - stdout - {'loss': 0.3722, 'grad_norm': 1.4791076183319092, 'learning_rate': 4.464721717459298e-06, 'epoch': 2.09} +2025-02-06 00:26:02 - ERROR - stderr - 70%|██████▉ | 15617/22434 [14:18:22<4:47:42, 2.53s/it] +2025-02-06 00:26:04 - ERROR - stderr - 70%|██████▉ | 15618/22434 [14:18:24<4:46:46, 2.52s/it] +2025-02-06 00:26:04 - ERROR - stderr - +2025-02-06 00:26:04 - ERROR - stderr - +2025-02-06 00:26:04 - INFO - stdout - {'loss': 0.3855, 'grad_norm': 1.3979734182357788, 'learning_rate': 4.463519378668185e-06, 'epoch': 2.09} +2025-02-06 00:26:04 - ERROR - stderr - 70%|██████▉ | 15618/22434 [14:18:24<4:46:46, 2.52s/it] +2025-02-06 00:26:07 - ERROR - stderr - 70%|██████▉ | 15619/22434 [14:18:27<4:43:00, 2.49s/it] +2025-02-06 00:26:07 - ERROR - stderr - +2025-02-06 00:26:07 - ERROR - stderr - +2025-02-06 00:26:07 - INFO - stdout - {'loss': 0.3582, 'grad_norm': 1.4221725463867188, 'learning_rate': 4.46231715527971e-06, 'epoch': 2.09} +2025-02-06 00:26:07 - ERROR - stderr - 70%|██████▉ | 15619/22434 [14:18:27<4:43:00, 2.49s/it] +2025-02-06 00:26:10 - ERROR - stderr - 70%|██████▉ | 15620/22434 [14:18:29<4:52:37, 2.58s/it] +2025-02-06 00:26:10 - ERROR - stderr - +2025-02-06 00:26:10 - ERROR - stderr - 
+2025-02-06 00:26:10 - INFO - stdout - {'loss': 0.3412, 'grad_norm': 1.3345770835876465, 'learning_rate': 4.461115047318934e-06, 'epoch': 2.09} +2025-02-06 00:26:10 - ERROR - stderr - 70%|██████▉ | 15620/22434 [14:18:29<4:52:37, 2.58s/it] +2025-02-06 00:26:12 - ERROR - stderr - 70%|██████▉ | 15621/22434 [14:18:32<4:47:36, 2.53s/it] +2025-02-06 00:26:12 - ERROR - stderr - +2025-02-06 00:26:12 - ERROR - stderr - +2025-02-06 00:26:12 - INFO - stdout - {'loss': 0.3604, 'grad_norm': 1.418308973312378, 'learning_rate': 4.459913054810913e-06, 'epoch': 2.09} +2025-02-06 00:26:12 - ERROR - stderr - 70%|██████▉ | 15621/22434 [14:18:32<4:47:36, 2.53s/it] +2025-02-06 00:26:14 - ERROR - stderr - 70%|██████▉ | 15622/22434 [14:18:34<4:44:05, 2.50s/it] +2025-02-06 00:26:14 - ERROR - stderr - +2025-02-06 00:26:14 - ERROR - stderr - +2025-02-06 00:26:14 - INFO - stdout - {'loss': 0.3926, 'grad_norm': 1.4533021450042725, 'learning_rate': 4.458711177780705e-06, 'epoch': 2.09} +2025-02-06 00:26:14 - ERROR - stderr - 70%|██████▉ | 15622/22434 [14:18:34<4:44:05, 2.50s/it] +2025-02-06 00:26:17 - ERROR - stderr - 70%|██████▉ | 15623/22434 [14:18:37<4:43:59, 2.50s/it] +2025-02-06 00:26:17 - ERROR - stderr - +2025-02-06 00:26:17 - ERROR - stderr - +2025-02-06 00:26:17 - INFO - stdout - {'loss': 0.396, 'grad_norm': 1.4725196361541748, 'learning_rate': 4.45750941625336e-06, 'epoch': 2.09} +2025-02-06 00:26:17 - ERROR - stderr - 70%|██████▉ | 15623/22434 [14:18:37<4:43:59, 2.50s/it] +2025-02-06 00:26:19 - ERROR - stderr - 70%|██████▉ | 15624/22434 [14:18:39<4:43:20, 2.50s/it] +2025-02-06 00:26:19 - ERROR - stderr - +2025-02-06 00:26:19 - ERROR - stderr - +2025-02-06 00:26:19 - INFO - stdout - {'loss': 0.3647, 'grad_norm': 1.516769289970398, 'learning_rate': 4.456307770253927e-06, 'epoch': 2.09} +2025-02-06 00:26:19 - ERROR - stderr - 70%|██████▉ | 15624/22434 [14:18:39<4:43:20, 2.50s/it] +2025-02-06 00:26:22 - ERROR - stderr - 70%|██████▉ | 15625/22434 [14:18:42<4:43:40, 2.50s/it] +2025-02-06 
00:26:22 - ERROR - stderr - +2025-02-06 00:26:22 - ERROR - stderr - +2025-02-06 00:26:22 - INFO - stdout - {'loss': 0.3895, 'grad_norm': 1.5384849309921265, 'learning_rate': 4.455106239807454e-06, 'epoch': 2.09} +2025-02-06 00:26:22 - ERROR - stderr - 70%|██████▉ | 15625/22434 [14:18:42<4:43:40, 2.50s/it] +2025-02-06 00:26:24 - ERROR - stderr - 70%|██████▉ | 15626/22434 [14:18:44<4:45:15, 2.51s/it] +2025-02-06 00:26:24 - ERROR - stderr - +2025-02-06 00:26:24 - ERROR - stderr - +2025-02-06 00:26:24 - INFO - stdout - {'loss': 0.3968, 'grad_norm': 1.4518753290176392, 'learning_rate': 4.453904824938986e-06, 'epoch': 2.09} +2025-02-06 00:26:24 - ERROR - stderr - 70%|██████▉ | 15626/22434 [14:18:44<4:45:15, 2.51s/it] +2025-02-06 00:26:27 - ERROR - stderr - 70%|██████▉ | 15627/22434 [14:18:47<4:47:43, 2.54s/it] +2025-02-06 00:26:27 - ERROR - stderr - +2025-02-06 00:26:27 - ERROR - stderr - +2025-02-06 00:26:27 - INFO - stdout - {'loss': 0.3951, 'grad_norm': 1.6094948053359985, 'learning_rate': 4.452703525673564e-06, 'epoch': 2.09} +2025-02-06 00:26:27 - ERROR - stderr - 70%|██████▉ | 15627/22434 [14:18:47<4:47:43, 2.54s/it] +2025-02-06 00:26:30 - ERROR - stderr - 70%|██████▉ | 15628/22434 [14:18:49<4:46:47, 2.53s/it] +2025-02-06 00:26:30 - ERROR - stderr - +2025-02-06 00:26:30 - ERROR - stderr - +2025-02-06 00:26:30 - INFO - stdout - {'loss': 0.3637, 'grad_norm': 1.6361199617385864, 'learning_rate': 4.451502342036229e-06, 'epoch': 2.09} +2025-02-06 00:26:30 - ERROR - stderr - 70%|██████▉ | 15628/22434 [14:18:49<4:46:47, 2.53s/it] +2025-02-06 00:26:32 - ERROR - stderr - 70%|██████▉ | 15629/22434 [14:18:52<4:44:29, 2.51s/it] +2025-02-06 00:26:32 - ERROR - stderr - +2025-02-06 00:26:32 - ERROR - stderr - +2025-02-06 00:26:32 - INFO - stdout - {'loss': 0.3641, 'grad_norm': 1.4584499597549438, 'learning_rate': 4.450301274052019e-06, 'epoch': 2.09} +2025-02-06 00:26:32 - ERROR - stderr - 70%|██████▉ | 15629/22434 [14:18:52<4:44:29, 2.51s/it] +2025-02-06 00:26:34 - ERROR - 
stderr - 70%|██████▉ | 15630/22434 [14:18:54<4:44:06, 2.51s/it] +2025-02-06 00:26:35 - ERROR - stderr - +2025-02-06 00:26:35 - ERROR - stderr - +2025-02-06 00:26:35 - INFO - stdout - {'loss': 0.387, 'grad_norm': 1.5477406978607178, 'learning_rate': 4.449100321745972e-06, 'epoch': 2.09} +2025-02-06 00:26:35 - ERROR - stderr - 70%|██████▉ | 15630/22434 [14:18:54<4:44:06, 2.51s/it] +2025-02-06 00:26:37 - ERROR - stderr - 70%|██████▉ | 15631/22434 [14:18:57<4:44:10, 2.51s/it] +2025-02-06 00:26:37 - ERROR - stderr - +2025-02-06 00:26:37 - ERROR - stderr - +2025-02-06 00:26:37 - INFO - stdout - {'loss': 0.3568, 'grad_norm': 1.3772072792053223, 'learning_rate': 4.447899485143109e-06, 'epoch': 2.09} +2025-02-06 00:26:37 - ERROR - stderr - 70%|██████▉ | 15631/22434 [14:18:57<4:44:10, 2.51s/it] +2025-02-06 00:26:39 - ERROR - stderr - 70%|██████▉ | 15632/22434 [14:18:59<4:41:51, 2.49s/it] +2025-02-06 00:26:39 - ERROR - stderr - +2025-02-06 00:26:39 - ERROR - stderr - +2025-02-06 00:26:39 - INFO - stdout - {'loss': 0.3704, 'grad_norm': 1.5331861972808838, 'learning_rate': 4.446698764268477e-06, 'epoch': 2.09} +2025-02-06 00:26:39 - ERROR - stderr - 70%|██████▉ | 15632/22434 [14:18:59<4:41:51, 2.49s/it] +2025-02-06 00:26:42 - ERROR - stderr - 70%|██████▉ | 15633/22434 [14:19:02<4:44:35, 2.51s/it] +2025-02-06 00:26:42 - ERROR - stderr - +2025-02-06 00:26:42 - ERROR - stderr - +2025-02-06 00:26:42 - INFO - stdout - {'loss': 0.3911, 'grad_norm': 1.4482604265213013, 'learning_rate': 4.445498159147087e-06, 'epoch': 2.09} +2025-02-06 00:26:42 - ERROR - stderr - 70%|██████▉ | 15633/22434 [14:19:02<4:44:35, 2.51s/it] +2025-02-06 00:26:44 - ERROR - stderr - 70%|██████▉ | 15634/22434 [14:19:04<4:42:25, 2.49s/it] +2025-02-06 00:26:44 - ERROR - stderr - +2025-02-06 00:26:44 - ERROR - stderr - +2025-02-06 00:26:44 - INFO - stdout - {'loss': 0.3844, 'grad_norm': 1.5205113887786865, 'learning_rate': 4.444297669803981e-06, 'epoch': 2.09} +2025-02-06 00:26:44 - ERROR - stderr - 70%|██████▉ | 
15634/22434 [14:19:04<4:42:25, 2.49s/it] +2025-02-06 00:26:47 - ERROR - stderr - 70%|██████▉ | 15635/22434 [14:19:07<4:40:00, 2.47s/it] +2025-02-06 00:26:47 - ERROR - stderr - +2025-02-06 00:26:47 - ERROR - stderr - +2025-02-06 00:26:47 - INFO - stdout - {'loss': 0.352, 'grad_norm': 1.4746811389923096, 'learning_rate': 4.4430972962641695e-06, 'epoch': 2.09} +2025-02-06 00:26:47 - ERROR - stderr - 70%|██████▉ | 15635/22434 [14:19:07<4:40:00, 2.47s/it] +2025-02-06 00:26:49 - ERROR - stderr - 70%|██████▉ | 15636/22434 [14:19:09<4:40:59, 2.48s/it] +2025-02-06 00:26:49 - ERROR - stderr - +2025-02-06 00:26:49 - ERROR - stderr - +2025-02-06 00:26:49 - INFO - stdout - {'loss': 0.4215, 'grad_norm': 1.596834421157837, 'learning_rate': 4.441897038552674e-06, 'epoch': 2.09} +2025-02-06 00:26:49 - ERROR - stderr - 70%|██████▉ | 15636/22434 [14:19:09<4:40:59, 2.48s/it] +2025-02-06 00:26:52 - ERROR - stderr - 70%|██████▉ | 15637/22434 [14:19:12<4:54:39, 2.60s/it] +2025-02-06 00:26:52 - ERROR - stderr - +2025-02-06 00:26:52 - ERROR - stderr - +2025-02-06 00:26:52 - INFO - stdout - {'loss': 0.4239, 'grad_norm': 1.5580360889434814, 'learning_rate': 4.440696896694523e-06, 'epoch': 2.09} +2025-02-06 00:26:52 - ERROR - stderr - 70%|██████▉ | 15637/22434 [14:19:12<4:54:39, 2.60s/it] +2025-02-06 00:26:55 - ERROR - stderr - 70%|██████▉ | 15638/22434 [14:19:14<4:50:24, 2.56s/it] +2025-02-06 00:26:55 - ERROR - stderr - +2025-02-06 00:26:55 - ERROR - stderr - +2025-02-06 00:26:55 - INFO - stdout - {'loss': 0.4042, 'grad_norm': 1.6279077529907227, 'learning_rate': 4.439496870714719e-06, 'epoch': 2.09} +2025-02-06 00:26:55 - ERROR - stderr - 70%|██████▉ | 15638/22434 [14:19:15<4:50:24, 2.56s/it] +2025-02-06 00:26:57 - ERROR - stderr - 70%|██████▉ | 15639/22434 [14:19:17<4:48:45, 2.55s/it] +2025-02-06 00:26:57 - ERROR - stderr - +2025-02-06 00:26:57 - ERROR - stderr - +2025-02-06 00:26:57 - INFO - stdout - {'loss': 0.412, 'grad_norm': 1.5757710933685303, 'learning_rate': 4.438296960638289e-06, 
'epoch': 2.09} +2025-02-06 00:26:57 - ERROR - stderr - 70%|██████▉ | 15639/22434 [14:19:17<4:48:45, 2.55s/it] +2025-02-06 00:27:00 - ERROR - stderr - 70%|██████▉ | 15640/22434 [14:19:19<4:44:22, 2.51s/it] +2025-02-06 00:27:00 - ERROR - stderr - +2025-02-06 00:27:00 - ERROR - stderr - +2025-02-06 00:27:00 - INFO - stdout - {'loss': 0.4166, 'grad_norm': 1.3488764762878418, 'learning_rate': 4.4370971664902325e-06, 'epoch': 2.09} +2025-02-06 00:27:00 - ERROR - stderr - 70%|██████▉ | 15640/22434 [14:19:19<4:44:22, 2.51s/it] +2025-02-06 00:27:02 - ERROR - stderr - 70%|██████▉ | 15641/22434 [14:19:22<4:45:58, 2.53s/it] +2025-02-06 00:27:02 - ERROR - stderr - +2025-02-06 00:27:02 - ERROR - stderr - +2025-02-06 00:27:02 - INFO - stdout - {'loss': 0.36, 'grad_norm': 1.4164925813674927, 'learning_rate': 4.435897488295564e-06, 'epoch': 2.09} +2025-02-06 00:27:02 - ERROR - stderr - 70%|██████▉ | 15641/22434 [14:19:22<4:45:58, 2.53s/it] +2025-02-06 00:27:05 - ERROR - stderr - 70%|██████▉ | 15642/22434 [14:19:25<4:45:43, 2.52s/it] +2025-02-06 00:27:05 - ERROR - stderr - +2025-02-06 00:27:05 - ERROR - stderr - +2025-02-06 00:27:05 - INFO - stdout - {'loss': 0.3393, 'grad_norm': 1.3988877534866333, 'learning_rate': 4.434697926079287e-06, 'epoch': 2.09} +2025-02-06 00:27:05 - ERROR - stderr - 70%|██████▉ | 15642/22434 [14:19:25<4:45:43, 2.52s/it] +2025-02-06 00:27:07 - ERROR - stderr - 70%|██████▉ | 15643/22434 [14:19:27<4:42:45, 2.50s/it] +2025-02-06 00:27:07 - ERROR - stderr - +2025-02-06 00:27:07 - ERROR - stderr - +2025-02-06 00:27:07 - INFO - stdout - {'loss': 0.3741, 'grad_norm': 1.4428128004074097, 'learning_rate': 4.433498479866406e-06, 'epoch': 2.09} +2025-02-06 00:27:07 - ERROR - stderr - 70%|██████▉ | 15643/22434 [14:19:27<4:42:45, 2.50s/it] +2025-02-06 00:27:10 - ERROR - stderr - 70%|██████▉ | 15644/22434 [14:19:29<4:43:17, 2.50s/it] +2025-02-06 00:27:10 - ERROR - stderr - +2025-02-06 00:27:10 - ERROR - stderr - +2025-02-06 00:27:10 - INFO - stdout - {'loss': 0.3669, 
'grad_norm': 1.4864366054534912, 'learning_rate': 4.4322991496819234e-06, 'epoch': 2.09} +2025-02-06 00:27:10 - ERROR - stderr - 70%|██████▉ | 15644/22434 [14:19:30<4:43:17, 2.50s/it] +2025-02-06 00:27:12 - ERROR - stderr - 70%|██████▉ | 15645/22434 [14:19:32<4:42:11, 2.49s/it] +2025-02-06 00:27:12 - ERROR - stderr - +2025-02-06 00:27:12 - ERROR - stderr - +2025-02-06 00:27:12 - INFO - stdout - {'loss': 0.3697, 'grad_norm': 1.3318320512771606, 'learning_rate': 4.431099935550837e-06, 'epoch': 2.09} +2025-02-06 00:27:12 - ERROR - stderr - 70%|██████▉ | 15645/22434 [14:19:32<4:42:11, 2.49s/it] +2025-02-06 00:27:15 - ERROR - stderr - 70%|██████▉ | 15646/22434 [14:19:34<4:44:12, 2.51s/it] +2025-02-06 00:27:15 - ERROR - stderr - +2025-02-06 00:27:15 - ERROR - stderr - +2025-02-06 00:27:15 - INFO - stdout - {'loss': 0.4413, 'grad_norm': 1.7070696353912354, 'learning_rate': 4.4299008374981436e-06, 'epoch': 2.09} +2025-02-06 00:27:15 - ERROR - stderr - 70%|██████▉ | 15646/22434 [14:19:35<4:44:12, 2.51s/it] +2025-02-06 00:27:17 - ERROR - stderr - 70%|██████▉ | 15647/22434 [14:19:37<4:43:07, 2.50s/it] +2025-02-06 00:27:17 - ERROR - stderr - +2025-02-06 00:27:17 - ERROR - stderr - +2025-02-06 00:27:17 - INFO - stdout - {'loss': 0.3784, 'grad_norm': 1.3534314632415771, 'learning_rate': 4.428701855548837e-06, 'epoch': 2.09} +2025-02-06 00:27:17 - ERROR - stderr - 70%|██████▉ | 15647/22434 [14:19:37<4:43:07, 2.50s/it] +2025-02-06 00:27:20 - ERROR - stderr - 70%|██████▉ | 15648/22434 [14:19:39<4:40:26, 2.48s/it] +2025-02-06 00:27:20 - ERROR - stderr - +2025-02-06 00:27:20 - ERROR - stderr - +2025-02-06 00:27:20 - INFO - stdout - {'loss': 0.3809, 'grad_norm': 1.478061556816101, 'learning_rate': 4.42750298972791e-06, 'epoch': 2.09} +2025-02-06 00:27:20 - ERROR - stderr - 70%|██████▉ | 15648/22434 [14:19:39<4:40:26, 2.48s/it] +2025-02-06 00:27:22 - ERROR - stderr - 70%|██████▉ | 15649/22434 [14:19:42<4:39:59, 2.48s/it] +2025-02-06 00:27:22 - ERROR - stderr - +2025-02-06 00:27:22 - 
ERROR - stderr - +2025-02-06 00:27:22 - INFO - stdout - {'loss': 0.371, 'grad_norm': 1.4423024654388428, 'learning_rate': 4.42630424006035e-06, 'epoch': 2.09} +2025-02-06 00:27:22 - ERROR - stderr - 70%|██████▉ | 15649/22434 [14:19:42<4:39:59, 2.48s/it] +2025-02-06 00:27:25 - ERROR - stderr - 70%|██████▉ | 15650/22434 [14:19:44<4:42:20, 2.50s/it] +2025-02-06 00:27:25 - ERROR - stderr - +2025-02-06 00:27:25 - ERROR - stderr - +2025-02-06 00:27:25 - INFO - stdout - {'loss': 0.3904, 'grad_norm': 1.2507692575454712, 'learning_rate': 4.425105606571145e-06, 'epoch': 2.09} +2025-02-06 00:27:25 - ERROR - stderr - 70%|██████▉ | 15650/22434 [14:19:44<4:42:20, 2.50s/it] +2025-02-06 00:27:27 - ERROR - stderr - 70%|██████▉ | 15651/22434 [14:19:47<4:44:23, 2.52s/it] +2025-02-06 00:27:27 - ERROR - stderr - +2025-02-06 00:27:27 - ERROR - stderr - +2025-02-06 00:27:27 - INFO - stdout - {'loss': 0.367, 'grad_norm': 1.4497268199920654, 'learning_rate': 4.423907089285282e-06, 'epoch': 2.09} +2025-02-06 00:27:27 - ERROR - stderr - 70%|██████▉ | 15651/22434 [14:19:47<4:44:23, 2.52s/it] +2025-02-06 00:27:30 - ERROR - stderr - 70%|██████▉ | 15652/22434 [14:19:49<4:42:12, 2.50s/it] +2025-02-06 00:27:30 - ERROR - stderr - +2025-02-06 00:27:30 - ERROR - stderr - +2025-02-06 00:27:30 - INFO - stdout - {'loss': 0.365, 'grad_norm': 1.4894993305206299, 'learning_rate': 4.4227086882277335e-06, 'epoch': 2.09} +2025-02-06 00:27:30 - ERROR - stderr - 70%|██████▉ | 15652/22434 [14:19:49<4:42:12, 2.50s/it] +2025-02-06 00:27:32 - ERROR - stderr - 70%|██████▉ | 15653/22434 [14:19:52<4:40:59, 2.49s/it] +2025-02-06 00:27:32 - ERROR - stderr - +2025-02-06 00:27:32 - ERROR - stderr - +2025-02-06 00:27:32 - INFO - stdout - {'loss': 0.3694, 'grad_norm': 1.5055650472640991, 'learning_rate': 4.421510403423489e-06, 'epoch': 2.09} +2025-02-06 00:27:32 - ERROR - stderr - 70%|██████▉ | 15653/22434 [14:19:52<4:40:59, 2.49s/it] +2025-02-06 00:27:35 - ERROR - stderr - 70%|██████▉ | 15654/22434 [14:19:54<4:40:26, 
2.48s/it] +2025-02-06 00:27:35 - ERROR - stderr - +2025-02-06 00:27:35 - ERROR - stderr - +2025-02-06 00:27:35 - INFO - stdout - {'loss': 0.3859, 'grad_norm': 1.3029303550720215, 'learning_rate': 4.420312234897521e-06, 'epoch': 2.09} +2025-02-06 00:27:35 - ERROR - stderr - 70%|██████▉ | 15654/22434 [14:19:54<4:40:26, 2.48s/it] +2025-02-06 00:27:37 - ERROR - stderr - 70%|██████▉ | 15655/22434 [14:19:57<4:38:45, 2.47s/it] +2025-02-06 00:27:37 - ERROR - stderr - +2025-02-06 00:27:37 - ERROR - stderr - +2025-02-06 00:27:37 - INFO - stdout - {'loss': 0.3899, 'grad_norm': 1.4466160535812378, 'learning_rate': 4.419114182674807e-06, 'epoch': 2.09} +2025-02-06 00:27:37 - ERROR - stderr - 70%|██████▉ | 15655/22434 [14:19:57<4:38:45, 2.47s/it] +2025-02-06 00:27:39 - ERROR - stderr - 70%|██████▉ | 15656/22434 [14:19:59<4:38:22, 2.46s/it] +2025-02-06 00:27:40 - ERROR - stderr - +2025-02-06 00:27:40 - ERROR - stderr - +2025-02-06 00:27:40 - INFO - stdout - {'loss': 0.3613, 'grad_norm': 1.3878724575042725, 'learning_rate': 4.41791624678032e-06, 'epoch': 2.09} +2025-02-06 00:27:40 - ERROR - stderr - 70%|██████▉ | 15656/22434 [14:19:59<4:38:22, 2.46s/it] +2025-02-06 00:27:42 - ERROR - stderr - 70%|██████▉ | 15657/22434 [14:20:02<4:45:36, 2.53s/it] +2025-02-06 00:27:42 - ERROR - stderr - +2025-02-06 00:27:42 - ERROR - stderr - +2025-02-06 00:27:42 - INFO - stdout - {'loss': 0.3889, 'grad_norm': 1.3854899406433105, 'learning_rate': 4.4167184272390204e-06, 'epoch': 2.09} +2025-02-06 00:27:42 - ERROR - stderr - 70%|██████▉ | 15657/22434 [14:20:02<4:45:36, 2.53s/it] +2025-02-06 00:27:45 - ERROR - stderr - 70%|██████▉ | 15658/22434 [14:20:04<4:43:55, 2.51s/it] +2025-02-06 00:27:45 - ERROR - stderr - +2025-02-06 00:27:45 - ERROR - stderr - +2025-02-06 00:27:45 - INFO - stdout - {'loss': 0.3966, 'grad_norm': 1.5768896341323853, 'learning_rate': 4.415520724075891e-06, 'epoch': 2.09} +2025-02-06 00:27:45 - ERROR - stderr - 70%|██████▉ | 15658/22434 [14:20:04<4:43:55, 2.51s/it] +2025-02-06 
00:27:47 - ERROR - stderr - 70%|██████▉ | 15659/22434 [14:20:07<4:40:49, 2.49s/it] +2025-02-06 00:27:47 - ERROR - stderr - +2025-02-06 00:27:47 - ERROR - stderr - +2025-02-06 00:27:47 - INFO - stdout - {'loss': 0.3417, 'grad_norm': 1.3193234205245972, 'learning_rate': 4.414323137315884e-06, 'epoch': 2.09} +2025-02-06 00:27:47 - ERROR - stderr - 70%|██████▉ | 15659/22434 [14:20:07<4:40:49, 2.49s/it] +2025-02-06 00:27:50 - ERROR - stderr - 70%|██████▉ | 15660/22434 [14:20:09<4:43:39, 2.51s/it] +2025-02-06 00:27:50 - ERROR - stderr - +2025-02-06 00:27:50 - ERROR - stderr - +2025-02-06 00:27:50 - INFO - stdout - {'loss': 0.3215, 'grad_norm': 1.2497140169143677, 'learning_rate': 4.413125666983965e-06, 'epoch': 2.09} +2025-02-06 00:27:50 - ERROR - stderr - 70%|██████▉ | 15660/22434 [14:20:09<4:43:39, 2.51s/it] +2025-02-06 00:27:52 - ERROR - stderr - 70%|██████▉ | 15661/22434 [14:20:12<4:43:22, 2.51s/it] +2025-02-06 00:27:52 - ERROR - stderr - +2025-02-06 00:27:52 - ERROR - stderr - +2025-02-06 00:27:52 - INFO - stdout - {'loss': 0.3607, 'grad_norm': 1.4555151462554932, 'learning_rate': 4.411928313105097e-06, 'epoch': 2.09} +2025-02-06 00:27:52 - ERROR - stderr - 70%|██████▉ | 15661/22434 [14:20:12<4:43:22, 2.51s/it] +2025-02-06 00:27:55 - ERROR - stderr - 70%|██████▉ | 15662/22434 [14:20:14<4:44:40, 2.52s/it] +2025-02-06 00:27:55 - ERROR - stderr - +2025-02-06 00:27:55 - ERROR - stderr - +2025-02-06 00:27:55 - INFO - stdout - {'loss': 0.3729, 'grad_norm': 1.5373482704162598, 'learning_rate': 4.410731075704232e-06, 'epoch': 2.09} +2025-02-06 00:27:55 - ERROR - stderr - 70%|██████▉ | 15662/22434 [14:20:15<4:44:40, 2.52s/it] +2025-02-06 00:27:57 - ERROR - stderr - 70%|██████▉ | 15663/22434 [14:20:17<4:41:45, 2.50s/it] +2025-02-06 00:27:57 - ERROR - stderr - +2025-02-06 00:27:57 - ERROR - stderr - +2025-02-06 00:27:57 - INFO - stdout - {'loss': 0.4102, 'grad_norm': 1.6334055662155151, 'learning_rate': 4.409533954806336e-06, 'epoch': 2.09} +2025-02-06 00:27:57 - ERROR - 
stderr - 70%|██████▉ | 15663/22434 [14:20:17<4:41:45, 2.50s/it] +2025-02-06 00:28:00 - ERROR - stderr - 70%|██████▉ | 15664/22434 [14:20:19<4:40:57, 2.49s/it] +2025-02-06 00:28:00 - ERROR - stderr - +2025-02-06 00:28:00 - ERROR - stderr - +2025-02-06 00:28:00 - INFO - stdout - {'loss': 0.4114, 'grad_norm': 1.3893389701843262, 'learning_rate': 4.408336950436353e-06, 'epoch': 2.09} +2025-02-06 00:28:00 - ERROR - stderr - 70%|██████▉ | 15664/22434 [14:20:19<4:40:57, 2.49s/it] +2025-02-06 00:28:02 - ERROR - stderr - 70%|██████▉ | 15665/22434 [14:20:22<4:52:09, 2.59s/it] +2025-02-06 00:28:02 - ERROR - stderr - +2025-02-06 00:28:02 - ERROR - stderr - +2025-02-06 00:28:02 - INFO - stdout - {'loss': 0.3435, 'grad_norm': 1.5049036741256714, 'learning_rate': 4.407140062619234e-06, 'epoch': 2.09} +2025-02-06 00:28:02 - ERROR - stderr - 70%|██████▉ | 15665/22434 [14:20:22<4:52:09, 2.59s/it] +2025-02-06 00:28:05 - ERROR - stderr - 70%|██████▉ | 15666/22434 [14:20:25<4:52:03, 2.59s/it] +2025-02-06 00:28:05 - ERROR - stderr - +2025-02-06 00:28:05 - ERROR - stderr - +2025-02-06 00:28:05 - INFO - stdout - {'loss': 0.3901, 'grad_norm': 1.5738027095794678, 'learning_rate': 4.405943291379929e-06, 'epoch': 2.09} +2025-02-06 00:28:05 - ERROR - stderr - 70%|██████▉ | 15666/22434 [14:20:25<4:52:03, 2.59s/it] +2025-02-06 00:28:08 - ERROR - stderr - 70%|██████▉ | 15667/22434 [14:20:27<4:53:06, 2.60s/it] +2025-02-06 00:28:08 - ERROR - stderr - +2025-02-06 00:28:08 - ERROR - stderr - +2025-02-06 00:28:08 - INFO - stdout - {'loss': 0.4539, 'grad_norm': 1.4291270971298218, 'learning_rate': 4.404746636743383e-06, 'epoch': 2.1} +2025-02-06 00:28:08 - ERROR - stderr - 70%|██████▉ | 15667/22434 [14:20:27<4:53:06, 2.60s/it] +2025-02-06 00:28:10 - ERROR - stderr - 70%|██████▉ | 15668/22434 [14:20:30<4:52:36, 2.59s/it] +2025-02-06 00:28:10 - ERROR - stderr - +2025-02-06 00:28:10 - ERROR - stderr - +2025-02-06 00:28:10 - INFO - stdout - {'loss': 0.376, 'grad_norm': 1.340965986251831, 'learning_rate': 
4.403550098734541e-06, 'epoch': 2.1} +2025-02-06 00:28:10 - ERROR - stderr - 70%|██████▉ | 15668/22434 [14:20:30<4:52:36, 2.59s/it] +2025-02-06 00:28:13 - ERROR - stderr - 70%|██████▉ | 15669/22434 [14:20:32<4:48:11, 2.56s/it] +2025-02-06 00:28:13 - ERROR - stderr - +2025-02-06 00:28:13 - ERROR - stderr - +2025-02-06 00:28:13 - INFO - stdout - {'loss': 0.3868, 'grad_norm': 1.5585929155349731, 'learning_rate': 4.402353677378341e-06, 'epoch': 2.1} +2025-02-06 00:28:13 - ERROR - stderr - 70%|██████▉ | 15669/22434 [14:20:32<4:48:11, 2.56s/it] +2025-02-06 00:28:15 - ERROR - stderr - 70%|██████▉ | 15670/22434 [14:20:35<4:45:29, 2.53s/it] +2025-02-06 00:28:15 - ERROR - stderr - +2025-02-06 00:28:15 - ERROR - stderr - +2025-02-06 00:28:15 - INFO - stdout - {'loss': 0.4088, 'grad_norm': 1.5626426935195923, 'learning_rate': 4.4011573726997215e-06, 'epoch': 2.1} +2025-02-06 00:28:15 - ERROR - stderr - 70%|██████▉ | 15670/22434 [14:20:35<4:45:29, 2.53s/it] +2025-02-06 00:28:18 - ERROR - stderr - 70%|██████▉ | 15671/22434 [14:20:37<4:45:17, 2.53s/it] +2025-02-06 00:28:18 - ERROR - stderr - +2025-02-06 00:28:18 - ERROR - stderr - +2025-02-06 00:28:18 - INFO - stdout - {'loss': 0.3843, 'grad_norm': 1.3515079021453857, 'learning_rate': 4.399961184723619e-06, 'epoch': 2.1} +2025-02-06 00:28:18 - ERROR - stderr - 70%|██████▉ | 15671/22434 [14:20:38<4:45:17, 2.53s/it] +2025-02-06 00:28:20 - ERROR - stderr - 70%|██████▉ | 15672/22434 [14:20:40<4:45:24, 2.53s/it] +2025-02-06 00:28:20 - ERROR - stderr - +2025-02-06 00:28:20 - ERROR - stderr - +2025-02-06 00:28:20 - INFO - stdout - {'loss': 0.338, 'grad_norm': 1.1834638118743896, 'learning_rate': 4.398765113474968e-06, 'epoch': 2.1} +2025-02-06 00:28:20 - ERROR - stderr - 70%|██████▉ | 15672/22434 [14:20:40<4:45:24, 2.53s/it] +2025-02-06 00:28:23 - ERROR - stderr - 70%|██████▉ | 15673/22434 [14:20:43<4:54:47, 2.62s/it] +2025-02-06 00:28:23 - ERROR - stderr - +2025-02-06 00:28:23 - ERROR - stderr - +2025-02-06 00:28:23 - INFO - stdout - 
{'loss': 0.3148, 'grad_norm': 1.3986002206802368, 'learning_rate': 4.397569158978698e-06, 'epoch': 2.1} +2025-02-06 00:28:23 - ERROR - stderr - 70%|██████▉ | 15673/22434 [14:20:43<4:54:47, 2.62s/it] +2025-02-06 00:28:26 - ERROR - stderr - 70%|██████▉ | 15674/22434 [14:20:45<4:51:51, 2.59s/it] +2025-02-06 00:28:26 - ERROR - stderr - +2025-02-06 00:28:26 - ERROR - stderr - +2025-02-06 00:28:26 - INFO - stdout - {'loss': 0.3738, 'grad_norm': 1.4963678121566772, 'learning_rate': 4.396373321259737e-06, 'epoch': 2.1} +2025-02-06 00:28:26 - ERROR - stderr - 70%|██████▉ | 15674/22434 [14:20:45<4:51:51, 2.59s/it] +2025-02-06 00:28:28 - ERROR - stderr - 70%|██████▉ | 15675/22434 [14:20:48<4:48:49, 2.56s/it] +2025-02-06 00:28:28 - ERROR - stderr - +2025-02-06 00:28:28 - ERROR - stderr - +2025-02-06 00:28:28 - INFO - stdout - {'loss': 0.3595, 'grad_norm': 1.4160876274108887, 'learning_rate': 4.395177600343017e-06, 'epoch': 2.1} +2025-02-06 00:28:28 - ERROR - stderr - 70%|██████▉ | 15675/22434 [14:20:48<4:48:49, 2.56s/it] +2025-02-06 00:28:31 - ERROR - stderr - 70%|██████▉ | 15676/22434 [14:20:50<4:45:44, 2.54s/it] +2025-02-06 00:28:31 - ERROR - stderr - +2025-02-06 00:28:31 - ERROR - stderr - +2025-02-06 00:28:31 - INFO - stdout - {'loss': 0.3906, 'grad_norm': 1.526122808456421, 'learning_rate': 4.393981996253448e-06, 'epoch': 2.1} +2025-02-06 00:28:31 - ERROR - stderr - 70%|██████▉ | 15676/22434 [14:20:50<4:45:44, 2.54s/it] +2025-02-06 00:28:33 - ERROR - stderr - 70%|██████▉ | 15677/22434 [14:20:53<4:46:24, 2.54s/it] +2025-02-06 00:28:33 - ERROR - stderr - +2025-02-06 00:28:33 - ERROR - stderr - +2025-02-06 00:28:33 - INFO - stdout - {'loss': 0.3147, 'grad_norm': 1.182529330253601, 'learning_rate': 4.392786509015968e-06, 'epoch': 2.1} +2025-02-06 00:28:33 - ERROR - stderr - 70%|██████▉ | 15677/22434 [14:20:53<4:46:24, 2.54s/it] +2025-02-06 00:28:36 - ERROR - stderr - 70%|██████▉ | 15678/22434 [14:20:55<4:42:41, 2.51s/it] +2025-02-06 00:28:36 - ERROR - stderr - +2025-02-06 
00:28:36 - ERROR - stderr - +2025-02-06 00:28:36 - INFO - stdout - {'loss': 0.4224, 'grad_norm': 1.4445598125457764, 'learning_rate': 4.391591138655481e-06, 'epoch': 2.1} +2025-02-06 00:28:36 - ERROR - stderr - 70%|██████▉ | 15678/22434 [14:20:55<4:42:41, 2.51s/it] +2025-02-06 00:28:38 - ERROR - stderr - 70%|██████▉ | 15679/22434 [14:20:58<4:39:48, 2.49s/it] +2025-02-06 00:28:38 - ERROR - stderr - +2025-02-06 00:28:38 - ERROR - stderr - +2025-02-06 00:28:38 - INFO - stdout - {'loss': 0.3841, 'grad_norm': 1.4032464027404785, 'learning_rate': 4.390395885196916e-06, 'epoch': 2.1} +2025-02-06 00:28:38 - ERROR - stderr - 70%|██████▉ | 15679/22434 [14:20:58<4:39:48, 2.49s/it] +2025-02-06 00:28:41 - ERROR - stderr - 70%|██████▉ | 15680/22434 [14:21:00<4:42:38, 2.51s/it] +2025-02-06 00:28:41 - ERROR - stderr - +2025-02-06 00:28:41 - ERROR - stderr - +2025-02-06 00:28:41 - INFO - stdout - {'loss': 0.3992, 'grad_norm': 1.5276345014572144, 'learning_rate': 4.389200748665179e-06, 'epoch': 2.1} +2025-02-06 00:28:41 - ERROR - stderr - 70%|██████▉ | 15680/22434 [14:21:00<4:42:38, 2.51s/it] +2025-02-06 00:28:43 - ERROR - stderr - 70%|██████▉ | 15681/22434 [14:21:03<4:40:04, 2.49s/it] +2025-02-06 00:28:43 - ERROR - stderr - +2025-02-06 00:28:43 - ERROR - stderr - +2025-02-06 00:28:43 - INFO - stdout - {'loss': 0.351, 'grad_norm': 1.3111323118209839, 'learning_rate': 4.3880057290851786e-06, 'epoch': 2.1} +2025-02-06 00:28:43 - ERROR - stderr - 70%|██████▉ | 15681/22434 [14:21:03<4:40:04, 2.49s/it] +2025-02-06 00:28:45 - ERROR - stderr - 70%|██████▉ | 15682/22434 [14:21:05<4:38:12, 2.47s/it] +2025-02-06 00:28:45 - ERROR - stderr - +2025-02-06 00:28:45 - ERROR - stderr - +2025-02-06 00:28:45 - INFO - stdout - {'loss': 0.3981, 'grad_norm': 1.6265418529510498, 'learning_rate': 4.3868108264818366e-06, 'epoch': 2.1} +2025-02-06 00:28:45 - ERROR - stderr - 70%|██████▉ | 15682/22434 [14:21:05<4:38:12, 2.47s/it] +2025-02-06 00:28:48 - ERROR - stderr - 70%|██████▉ | 15683/22434 
[14:21:08<4:38:59, 2.48s/it] +2025-02-06 00:28:48 - ERROR - stderr - +2025-02-06 00:28:48 - ERROR - stderr - +2025-02-06 00:28:48 - INFO - stdout - {'loss': 0.3703, 'grad_norm': 1.591089129447937, 'learning_rate': 4.3856160408800475e-06, 'epoch': 2.1} +2025-02-06 00:28:48 - ERROR - stderr - 70%|██████▉ | 15683/22434 [14:21:08<4:38:59, 2.48s/it] +2025-02-06 00:28:50 - ERROR - stderr - 70%|██████▉ | 15684/22434 [14:21:10<4:40:35, 2.49s/it] +2025-02-06 00:28:50 - ERROR - stderr - +2025-02-06 00:28:50 - ERROR - stderr - +2025-02-06 00:28:50 - INFO - stdout - {'loss': 0.4126, 'grad_norm': 1.4167088270187378, 'learning_rate': 4.38442137230472e-06, 'epoch': 2.1} +2025-02-06 00:28:50 - ERROR - stderr - 70%|██████▉ | 15684/22434 [14:21:10<4:40:35, 2.49s/it] +2025-02-06 00:28:53 - ERROR - stderr - 70%|██████▉ | 15685/22434 [14:21:13<4:44:34, 2.53s/it] +2025-02-06 00:28:53 - ERROR - stderr - +2025-02-06 00:28:53 - ERROR - stderr - +2025-02-06 00:28:53 - INFO - stdout - {'loss': 0.3687, 'grad_norm': 1.4229716062545776, 'learning_rate': 4.383226820780756e-06, 'epoch': 2.1} +2025-02-06 00:28:53 - ERROR - stderr - 70%|██████▉ | 15685/22434 [14:21:13<4:44:34, 2.53s/it] +2025-02-06 00:28:56 - ERROR - stderr - 70%|██████▉ | 15686/22434 [14:21:15<4:42:23, 2.51s/it] +2025-02-06 00:28:56 - ERROR - stderr - +2025-02-06 00:28:56 - ERROR - stderr - +2025-02-06 00:28:56 - INFO - stdout - {'loss': 0.4445, 'grad_norm': 1.6457507610321045, 'learning_rate': 4.382032386333053e-06, 'epoch': 2.1} +2025-02-06 00:28:56 - ERROR - stderr - 70%|██████▉ | 15686/22434 [14:21:15<4:42:23, 2.51s/it] +2025-02-06 00:28:58 - ERROR - stderr - 70%|██████▉ | 15687/22434 [14:21:18<4:40:19, 2.49s/it] +2025-02-06 00:28:58 - ERROR - stderr - +2025-02-06 00:28:58 - ERROR - stderr - +2025-02-06 00:28:58 - INFO - stdout - {'loss': 0.4007, 'grad_norm': 1.6720408201217651, 'learning_rate': 4.3808380689865106e-06, 'epoch': 2.1} +2025-02-06 00:28:58 - ERROR - stderr - 70%|██████▉ | 15687/22434 [14:21:18<4:40:19, 2.49s/it] 
+2025-02-06 00:29:01 - ERROR - stderr - 70%|██████▉ | 15688/22434 [14:21:20<4:46:19, 2.55s/it] +2025-02-06 00:29:01 - ERROR - stderr - +2025-02-06 00:29:01 - ERROR - stderr - +2025-02-06 00:29:01 - INFO - stdout - {'loss': 0.3906, 'grad_norm': 1.3859692811965942, 'learning_rate': 4.37964386876602e-06, 'epoch': 2.1} +2025-02-06 00:29:01 - ERROR - stderr - 70%|██████▉ | 15688/22434 [14:21:20<4:46:19, 2.55s/it] +2025-02-06 00:29:03 - ERROR - stderr - 70%|██████▉ | 15689/22434 [14:21:23<4:43:16, 2.52s/it] +2025-02-06 00:29:03 - ERROR - stderr - +2025-02-06 00:29:03 - ERROR - stderr - +2025-02-06 00:29:03 - INFO - stdout - {'loss': 0.3411, 'grad_norm': 1.4062530994415283, 'learning_rate': 4.378449785696476e-06, 'epoch': 2.1} +2025-02-06 00:29:03 - ERROR - stderr - 70%|██████▉ | 15689/22434 [14:21:23<4:43:16, 2.52s/it] +2025-02-06 00:29:06 - ERROR - stderr - 70%|██████▉ | 15690/22434 [14:21:25<4:41:48, 2.51s/it] +2025-02-06 00:29:06 - ERROR - stderr - +2025-02-06 00:29:06 - ERROR - stderr - +2025-02-06 00:29:06 - INFO - stdout - {'loss': 0.3993, 'grad_norm': 1.5429835319519043, 'learning_rate': 4.377255819802766e-06, 'epoch': 2.1} +2025-02-06 00:29:06 - ERROR - stderr - 70%|██████▉ | 15690/22434 [14:21:25<4:41:48, 2.51s/it] +2025-02-06 00:29:08 - ERROR - stderr - 70%|██████▉ | 15691/22434 [14:21:28<4:40:13, 2.49s/it] +2025-02-06 00:29:08 - ERROR - stderr - +2025-02-06 00:29:08 - ERROR - stderr - +2025-02-06 00:29:08 - INFO - stdout - {'loss': 0.3503, 'grad_norm': 1.537819266319275, 'learning_rate': 4.376061971109779e-06, 'epoch': 2.1} +2025-02-06 00:29:08 - ERROR - stderr - 70%|██████▉ | 15691/22434 [14:21:28<4:40:13, 2.49s/it] +2025-02-06 00:29:11 - ERROR - stderr - 70%|██████▉ | 15692/22434 [14:21:31<4:56:32, 2.64s/it] +2025-02-06 00:29:11 - ERROR - stderr - +2025-02-06 00:29:11 - ERROR - stderr - +2025-02-06 00:29:11 - INFO - stdout - {'loss': 0.4036, 'grad_norm': 1.6518570184707642, 'learning_rate': 4.374868239642398e-06, 'epoch': 2.1} +2025-02-06 00:29:11 - ERROR - 
stderr - 70%|██████▉ | 15692/22434 [14:21:31<4:56:32, 2.64s/it] +2025-02-06 00:29:14 - ERROR - stderr - 70%|██████▉ | 15693/22434 [14:21:33<4:53:18, 2.61s/it] +2025-02-06 00:29:14 - ERROR - stderr - +2025-02-06 00:29:14 - ERROR - stderr - +2025-02-06 00:29:14 - INFO - stdout - {'loss': 0.3587, 'grad_norm': 1.4917223453521729, 'learning_rate': 4.373674625425507e-06, 'epoch': 2.1} +2025-02-06 00:29:14 - ERROR - stderr - 70%|██████▉ | 15693/22434 [14:21:33<4:53:18, 2.61s/it] +2025-02-06 00:29:16 - ERROR - stderr - 70%|██████▉ | 15694/22434 [14:21:36<4:50:20, 2.58s/it] +2025-02-06 00:29:16 - ERROR - stderr - +2025-02-06 00:29:16 - ERROR - stderr - +2025-02-06 00:29:16 - INFO - stdout - {'loss': 0.384, 'grad_norm': 1.4152390956878662, 'learning_rate': 4.372481128483984e-06, 'epoch': 2.1} +2025-02-06 00:29:16 - ERROR - stderr - 70%|██████▉ | 15694/22434 [14:21:36<4:50:20, 2.58s/it] +2025-02-06 00:29:19 - ERROR - stderr - 70%|██████▉ | 15695/22434 [14:21:38<4:45:24, 2.54s/it] +2025-02-06 00:29:19 - ERROR - stderr - +2025-02-06 00:29:19 - ERROR - stderr - +2025-02-06 00:29:19 - INFO - stdout - {'loss': 0.3864, 'grad_norm': 1.3284255266189575, 'learning_rate': 4.371287748842706e-06, 'epoch': 2.1} +2025-02-06 00:29:19 - ERROR - stderr - 70%|██████▉ | 15695/22434 [14:21:38<4:45:24, 2.54s/it] +2025-02-06 00:29:21 - ERROR - stderr - 70%|██████▉ | 15696/22434 [14:21:41<4:46:34, 2.55s/it] +2025-02-06 00:29:21 - ERROR - stderr - +2025-02-06 00:29:21 - ERROR - stderr - +2025-02-06 00:29:21 - INFO - stdout - {'loss': 0.3826, 'grad_norm': 1.5194693803787231, 'learning_rate': 4.370094486526553e-06, 'epoch': 2.1} +2025-02-06 00:29:21 - ERROR - stderr - 70%|██████▉ | 15696/22434 [14:21:41<4:46:34, 2.55s/it] +2025-02-06 00:29:24 - ERROR - stderr - 70%|██████▉ | 15697/22434 [14:21:44<4:50:20, 2.59s/it] +2025-02-06 00:29:24 - ERROR - stderr - +2025-02-06 00:29:24 - ERROR - stderr - +2025-02-06 00:29:24 - INFO - stdout - {'loss': 0.4239, 'grad_norm': 1.6204891204833984, 'learning_rate': 
4.368901341560386e-06, 'epoch': 2.1} +2025-02-06 00:29:24 - ERROR - stderr - 70%|██████▉ | 15697/22434 [14:21:44<4:50:20, 2.59s/it] +2025-02-06 00:29:26 - ERROR - stderr - 70%|██████▉ | 15698/22434 [14:21:46<4:47:25, 2.56s/it] +2025-02-06 00:29:26 - ERROR - stderr - +2025-02-06 00:29:26 - ERROR - stderr - +2025-02-06 00:29:26 - INFO - stdout - {'loss': 0.4265, 'grad_norm': 1.5124558210372925, 'learning_rate': 4.36770831396909e-06, 'epoch': 2.1} +2025-02-06 00:29:26 - ERROR - stderr - 70%|██████▉ | 15698/22434 [14:21:46<4:47:25, 2.56s/it] +2025-02-06 00:29:29 - ERROR - stderr - 70%|██████▉ | 15699/22434 [14:21:48<4:43:46, 2.53s/it] +2025-02-06 00:29:29 - ERROR - stderr - +2025-02-06 00:29:29 - ERROR - stderr - +2025-02-06 00:29:29 - INFO - stdout - {'loss': 0.3407, 'grad_norm': 1.375800371170044, 'learning_rate': 4.366515403777522e-06, 'epoch': 2.1} +2025-02-06 00:29:29 - ERROR - stderr - 70%|██████▉ | 15699/22434 [14:21:49<4:43:46, 2.53s/it] +2025-02-06 00:29:31 - ERROR - stderr - 70%|██████▉ | 15700/22434 [14:21:51<4:41:31, 2.51s/it] +2025-02-06 00:29:31 - ERROR - stderr - +2025-02-06 00:29:31 - ERROR - stderr - +2025-02-06 00:29:31 - INFO - stdout - {'loss': 0.4077, 'grad_norm': 1.6754341125488281, 'learning_rate': 4.365322611010544e-06, 'epoch': 2.1} +2025-02-06 00:29:31 - ERROR - stderr - 70%|██████▉ | 15700/22434 [14:21:51<4:41:31, 2.51s/it] +2025-02-06 00:29:34 - ERROR - stderr - 70%|██████▉ | 15701/22434 [14:21:53<4:41:28, 2.51s/it] +2025-02-06 00:29:34 - ERROR - stderr - +2025-02-06 00:29:34 - ERROR - stderr - +2025-02-06 00:29:34 - INFO - stdout - {'loss': 0.371, 'grad_norm': 1.392293930053711, 'learning_rate': 4.364129935693032e-06, 'epoch': 2.1} +2025-02-06 00:29:34 - ERROR - stderr - 70%|██████▉ | 15701/22434 [14:21:53<4:41:28, 2.51s/it] +2025-02-06 00:29:36 - ERROR - stderr - 70%|██████▉ | 15702/22434 [14:21:56<4:38:59, 2.49s/it] +2025-02-06 00:29:36 - ERROR - stderr - +2025-02-06 00:29:36 - ERROR - stderr - +2025-02-06 00:29:36 - INFO - stdout - 
{'loss': 0.3947, 'grad_norm': 1.5114208459854126, 'learning_rate': 4.362937377849832e-06, 'epoch': 2.1} +2025-02-06 00:29:36 - ERROR - stderr - 70%|██████▉ | 15702/22434 [14:21:56<4:38:59, 2.49s/it] +2025-02-06 00:29:39 - ERROR - stderr - 70%|██████▉ | 15703/22434 [14:21:58<4:39:06, 2.49s/it] +2025-02-06 00:29:39 - ERROR - stderr - +2025-02-06 00:29:39 - ERROR - stderr - +2025-02-06 00:29:39 - INFO - stdout - {'loss': 0.4077, 'grad_norm': 1.526084303855896, 'learning_rate': 4.361744937505815e-06, 'epoch': 2.1} +2025-02-06 00:29:39 - ERROR - stderr - 70%|██████▉ | 15703/22434 [14:21:58<4:39:06, 2.49s/it] +2025-02-06 00:29:41 - ERROR - stderr - 70%|███████ | 15704/22434 [14:22:01<4:43:40, 2.53s/it] +2025-02-06 00:29:41 - ERROR - stderr - +2025-02-06 00:29:41 - ERROR - stderr - +2025-02-06 00:29:41 - INFO - stdout - {'loss': 0.4082, 'grad_norm': 1.6001319885253906, 'learning_rate': 4.360552614685825e-06, 'epoch': 2.1} +2025-02-06 00:29:41 - ERROR - stderr - 70%|███████ | 15704/22434 [14:22:01<4:43:40, 2.53s/it] +2025-02-06 00:29:44 - ERROR - stderr - 70%|███████ | 15705/22434 [14:22:04<4:54:02, 2.62s/it] +2025-02-06 00:29:44 - ERROR - stderr - +2025-02-06 00:29:44 - ERROR - stderr - +2025-02-06 00:29:44 - INFO - stdout - {'loss': 0.3342, 'grad_norm': 1.4394702911376953, 'learning_rate': 4.359360409414721e-06, 'epoch': 2.1} +2025-02-06 00:29:44 - ERROR - stderr - 70%|███████ | 15705/22434 [14:22:04<4:54:02, 2.62s/it] +2025-02-06 00:29:47 - ERROR - stderr - 70%|███████ | 15706/22434 [14:22:06<4:51:18, 2.60s/it] +2025-02-06 00:29:47 - ERROR - stderr - +2025-02-06 00:29:47 - ERROR - stderr - +2025-02-06 00:29:47 - INFO - stdout - {'loss': 0.3948, 'grad_norm': 1.4712375402450562, 'learning_rate': 4.358168321717352e-06, 'epoch': 2.1} +2025-02-06 00:29:47 - ERROR - stderr - 70%|███████ | 15706/22434 [14:22:06<4:51:18, 2.60s/it] +2025-02-06 00:29:49 - ERROR - stderr - 70%|███████ | 15707/22434 [14:22:09<4:47:10, 2.56s/it] +2025-02-06 00:29:49 - ERROR - stderr - +2025-02-06 
00:29:49 - ERROR - stderr - +2025-02-06 00:29:49 - INFO - stdout - {'loss': 0.4021, 'grad_norm': 1.5200183391571045, 'learning_rate': 4.356976351618565e-06, 'epoch': 2.1} +2025-02-06 00:29:49 - ERROR - stderr - 70%|███████ | 15707/22434 [14:22:09<4:47:10, 2.56s/it] +2025-02-06 00:29:52 - ERROR - stderr - 70%|███████ | 15708/22434 [14:22:11<4:44:52, 2.54s/it] +2025-02-06 00:29:52 - ERROR - stderr - +2025-02-06 00:29:52 - ERROR - stderr - +2025-02-06 00:29:52 - INFO - stdout - {'loss': 0.3706, 'grad_norm': 1.5098627805709839, 'learning_rate': 4.355784499143207e-06, 'epoch': 2.1} +2025-02-06 00:29:52 - ERROR - stderr - 70%|███████ | 15708/22434 [14:22:11<4:44:52, 2.54s/it] +2025-02-06 00:29:54 - ERROR - stderr - 70%|███████ | 15709/22434 [14:22:14<4:45:52, 2.55s/it] +2025-02-06 00:29:54 - ERROR - stderr - +2025-02-06 00:29:54 - ERROR - stderr - +2025-02-06 00:29:54 - INFO - stdout - {'loss': 0.3649, 'grad_norm': 1.4118281602859497, 'learning_rate': 4.354592764316118e-06, 'epoch': 2.1} +2025-02-06 00:29:54 - ERROR - stderr - 70%|███████ | 15709/22434 [14:22:14<4:45:52, 2.55s/it] +2025-02-06 00:29:57 - ERROR - stderr - 70%|███████ | 15710/22434 [14:22:16<4:43:48, 2.53s/it] +2025-02-06 00:29:57 - ERROR - stderr - +2025-02-06 00:29:57 - ERROR - stderr - +2025-02-06 00:29:57 - INFO - stdout - {'loss': 0.357, 'grad_norm': 1.18292236328125, 'learning_rate': 4.353401147162142e-06, 'epoch': 2.1} +2025-02-06 00:29:57 - ERROR - stderr - 70%|███████ | 15710/22434 [14:22:16<4:43:48, 2.53s/it] +2025-02-06 00:29:59 - ERROR - stderr - 70%|███████ | 15711/22434 [14:22:19<4:42:11, 2.52s/it] +2025-02-06 00:29:59 - ERROR - stderr - +2025-02-06 00:29:59 - ERROR - stderr - +2025-02-06 00:29:59 - INFO - stdout - {'loss': 0.3604, 'grad_norm': 1.5326002836227417, 'learning_rate': 4.352209647706116e-06, 'epoch': 2.1} +2025-02-06 00:29:59 - ERROR - stderr - 70%|███████ | 15711/22434 [14:22:19<4:42:11, 2.52s/it] +2025-02-06 00:30:02 - ERROR - stderr - 70%|███████ | 15712/22434 [14:22:21<4:40:37, 
2.50s/it] +2025-02-06 00:30:02 - ERROR - stderr - +2025-02-06 00:30:02 - ERROR - stderr - +2025-02-06 00:30:02 - INFO - stdout - {'loss': 0.4002, 'grad_norm': 1.590121865272522, 'learning_rate': 4.351018265972875e-06, 'epoch': 2.1} +2025-02-06 00:30:02 - ERROR - stderr - 70%|███████ | 15712/22434 [14:22:21<4:40:37, 2.50s/it] +2025-02-06 00:30:04 - ERROR - stderr - 70%|███████ | 15713/22434 [14:22:24<4:51:17, 2.60s/it] +2025-02-06 00:30:04 - ERROR - stderr - +2025-02-06 00:30:04 - ERROR - stderr - +2025-02-06 00:30:04 - INFO - stdout - {'loss': 0.3993, 'grad_norm': 1.536358118057251, 'learning_rate': 4.349827001987254e-06, 'epoch': 2.1} +2025-02-06 00:30:04 - ERROR - stderr - 70%|███████ | 15713/22434 [14:22:24<4:51:17, 2.60s/it] +2025-02-06 00:30:07 - ERROR - stderr - 70%|███████ | 15714/22434 [14:22:27<4:45:05, 2.55s/it] +2025-02-06 00:30:07 - ERROR - stderr - +2025-02-06 00:30:07 - ERROR - stderr - +2025-02-06 00:30:07 - INFO - stdout - {'loss': 0.4087, 'grad_norm': 1.589345097541809, 'learning_rate': 4.348635855774082e-06, 'epoch': 2.1} +2025-02-06 00:30:07 - ERROR - stderr - 70%|███████ | 15714/22434 [14:22:27<4:45:05, 2.55s/it] +2025-02-06 00:30:09 - ERROR - stderr - 70%|███████ | 15715/22434 [14:22:29<4:42:52, 2.53s/it] +2025-02-06 00:30:09 - ERROR - stderr - +2025-02-06 00:30:09 - ERROR - stderr - +2025-02-06 00:30:09 - INFO - stdout - {'loss': 0.3454, 'grad_norm': 1.3861310482025146, 'learning_rate': 4.34744482735819e-06, 'epoch': 2.1} +2025-02-06 00:30:09 - ERROR - stderr - 70%|███████ | 15715/22434 [14:22:29<4:42:52, 2.53s/it] +2025-02-06 00:30:12 - ERROR - stderr - 70%|███████ | 15716/22434 [14:22:32<4:42:00, 2.52s/it] +2025-02-06 00:30:12 - ERROR - stderr - +2025-02-06 00:30:12 - ERROR - stderr - +2025-02-06 00:30:12 - INFO - stdout - {'loss': 0.3476, 'grad_norm': 1.3468713760375977, 'learning_rate': 4.346253916764396e-06, 'epoch': 2.1} +2025-02-06 00:30:12 - ERROR - stderr - 70%|███████ | 15716/22434 [14:22:32<4:42:00, 2.52s/it] +2025-02-06 00:30:14 - 
ERROR - stderr - 70%|███████ | 15717/22434 [14:22:34<4:41:23, 2.51s/it] +2025-02-06 00:30:14 - ERROR - stderr - +2025-02-06 00:30:14 - ERROR - stderr - +2025-02-06 00:30:14 - INFO - stdout - {'loss': 0.3661, 'grad_norm': 1.2690790891647339, 'learning_rate': 4.345063124017537e-06, 'epoch': 2.1} +2025-02-06 00:30:14 - ERROR - stderr - 70%|███████ | 15717/22434 [14:22:34<4:41:23, 2.51s/it] +2025-02-06 00:30:17 - ERROR - stderr - 70%|███████ | 15718/22434 [14:22:37<4:39:21, 2.50s/it] +2025-02-06 00:30:17 - ERROR - stderr - +2025-02-06 00:30:17 - ERROR - stderr - +2025-02-06 00:30:17 - INFO - stdout - {'loss': 0.3915, 'grad_norm': 1.5111621618270874, 'learning_rate': 4.343872449142417e-06, 'epoch': 2.1} +2025-02-06 00:30:17 - ERROR - stderr - 70%|███████ | 15718/22434 [14:22:37<4:39:21, 2.50s/it] +2025-02-06 00:30:19 - ERROR - stderr - 70%|███████ | 15719/22434 [14:22:39<4:38:55, 2.49s/it] +2025-02-06 00:30:19 - ERROR - stderr - +2025-02-06 00:30:19 - ERROR - stderr - +2025-02-06 00:30:19 - INFO - stdout - {'loss': 0.3789, 'grad_norm': 1.4507535696029663, 'learning_rate': 4.342681892163868e-06, 'epoch': 2.1} +2025-02-06 00:30:19 - ERROR - stderr - 70%|███████ | 15719/22434 [14:22:39<4:38:55, 2.49s/it] +2025-02-06 00:30:22 - ERROR - stderr - 70%|███████ | 15720/22434 [14:22:41<4:37:49, 2.48s/it] +2025-02-06 00:30:22 - ERROR - stderr - +2025-02-06 00:30:22 - ERROR - stderr - +2025-02-06 00:30:22 - INFO - stdout - {'loss': 0.3735, 'grad_norm': 1.4821652173995972, 'learning_rate': 4.341491453106704e-06, 'epoch': 2.1} +2025-02-06 00:30:22 - ERROR - stderr - 70%|███████ | 15720/22434 [14:22:42<4:37:49, 2.48s/it] +2025-02-06 00:30:24 - ERROR - stderr - 70%|███████ | 15721/22434 [14:22:44<4:39:51, 2.50s/it] +2025-02-06 00:30:24 - ERROR - stderr - +2025-02-06 00:30:24 - ERROR - stderr - +2025-02-06 00:30:24 - INFO - stdout - {'loss': 0.3519, 'grad_norm': 1.5352351665496826, 'learning_rate': 4.34030113199573e-06, 'epoch': 2.1} +2025-02-06 00:30:24 - ERROR - stderr - 70%|███████ | 
15721/22434 [14:22:44<4:39:51, 2.50s/it] +2025-02-06 00:30:27 - ERROR - stderr - 70%|███████ | 15722/22434 [14:22:47<4:41:32, 2.52s/it] +2025-02-06 00:30:27 - ERROR - stderr - +2025-02-06 00:30:27 - ERROR - stderr - +2025-02-06 00:30:27 - INFO - stdout - {'loss': 0.401, 'grad_norm': 1.6089775562286377, 'learning_rate': 4.33911092885577e-06, 'epoch': 2.1} +2025-02-06 00:30:27 - ERROR - stderr - 70%|███████ | 15722/22434 [14:22:47<4:41:32, 2.52s/it] +2025-02-06 00:30:29 - ERROR - stderr - 70%|███████ | 15723/22434 [14:22:49<4:40:41, 2.51s/it] +2025-02-06 00:30:29 - ERROR - stderr - +2025-02-06 00:30:29 - ERROR - stderr - +2025-02-06 00:30:29 - INFO - stdout - {'loss': 0.3904, 'grad_norm': 1.595123291015625, 'learning_rate': 4.337920843711619e-06, 'epoch': 2.1} +2025-02-06 00:30:29 - ERROR - stderr - 70%|███████ | 15723/22434 [14:22:49<4:40:41, 2.51s/it] +2025-02-06 00:30:32 - ERROR - stderr - 70%|███████ | 15724/22434 [14:22:52<4:43:42, 2.54s/it] +2025-02-06 00:30:32 - ERROR - stderr - +2025-02-06 00:30:32 - ERROR - stderr - +2025-02-06 00:30:32 - INFO - stdout - {'loss': 0.411, 'grad_norm': 1.480790138244629, 'learning_rate': 4.336730876588097e-06, 'epoch': 2.1} +2025-02-06 00:30:32 - ERROR - stderr - 70%|███████ | 15724/22434 [14:22:52<4:43:42, 2.54s/it] +2025-02-06 00:30:34 - ERROR - stderr - 70%|███████ | 15725/22434 [14:22:54<4:42:00, 2.52s/it] +2025-02-06 00:30:34 - ERROR - stderr - +2025-02-06 00:30:34 - ERROR - stderr - +2025-02-06 00:30:34 - INFO - stdout - {'loss': 0.3746, 'grad_norm': 1.5942189693450928, 'learning_rate': 4.335541027509996e-06, 'epoch': 2.1} +2025-02-06 00:30:34 - ERROR - stderr - 70%|███████ | 15725/22434 [14:22:54<4:42:00, 2.52s/it] +2025-02-06 00:30:37 - ERROR - stderr - 70%|███████ | 15726/22434 [14:22:57<4:43:38, 2.54s/it] +2025-02-06 00:30:37 - ERROR - stderr - +2025-02-06 00:30:37 - ERROR - stderr - +2025-02-06 00:30:37 - INFO - stdout - {'loss': 0.4216, 'grad_norm': 1.5302094221115112, 'learning_rate': 4.334351296502119e-06, 
'epoch': 2.1} +2025-02-06 00:30:37 - ERROR - stderr - 70%|███████ | 15726/22434 [14:22:57<4:43:38, 2.54s/it] +2025-02-06 00:30:40 - ERROR - stderr - 70%|███████ | 15727/22434 [14:22:59<4:43:35, 2.54s/it] +2025-02-06 00:30:40 - ERROR - stderr - +2025-02-06 00:30:40 - ERROR - stderr - +2025-02-06 00:30:40 - INFO - stdout - {'loss': 0.3446, 'grad_norm': 1.3961538076400757, 'learning_rate': 4.333161683589276e-06, 'epoch': 2.1} +2025-02-06 00:30:40 - ERROR - stderr - 70%|███████ | 15727/22434 [14:22:59<4:43:35, 2.54s/it] +2025-02-06 00:30:42 - ERROR - stderr - 70%|███████ | 15728/22434 [14:23:02<4:42:50, 2.53s/it] +2025-02-06 00:30:42 - ERROR - stderr - +2025-02-06 00:30:42 - ERROR - stderr - +2025-02-06 00:30:42 - INFO - stdout - {'loss': 0.3548, 'grad_norm': 1.3041951656341553, 'learning_rate': 4.3319721887962505e-06, 'epoch': 2.1} +2025-02-06 00:30:42 - ERROR - stderr - 70%|███████ | 15728/22434 [14:23:02<4:42:50, 2.53s/it] +2025-02-06 00:30:45 - ERROR - stderr - 70%|███████ | 15729/22434 [14:23:05<4:52:09, 2.61s/it] +2025-02-06 00:30:45 - ERROR - stderr - +2025-02-06 00:30:45 - ERROR - stderr - +2025-02-06 00:30:45 - INFO - stdout - {'loss': 0.4238, 'grad_norm': 1.5044530630111694, 'learning_rate': 4.330782812147842e-06, 'epoch': 2.1} +2025-02-06 00:30:45 - ERROR - stderr - 70%|███████ | 15729/22434 [14:23:05<4:52:09, 2.61s/it] +2025-02-06 00:30:47 - ERROR - stderr - 70%|███████ | 15730/22434 [14:23:07<4:50:00, 2.60s/it] +2025-02-06 00:30:47 - ERROR - stderr - +2025-02-06 00:30:47 - ERROR - stderr - +2025-02-06 00:30:47 - INFO - stdout - {'loss': 0.3605, 'grad_norm': 1.5571688413619995, 'learning_rate': 4.329593553668841e-06, 'epoch': 2.1} +2025-02-06 00:30:47 - ERROR - stderr - 70%|███████ | 15730/22434 [14:23:07<4:50:00, 2.60s/it] +2025-02-06 00:30:50 - ERROR - stderr - 70%|███████ | 15731/22434 [14:23:10<4:45:13, 2.55s/it] +2025-02-06 00:30:50 - ERROR - stderr - +2025-02-06 00:30:50 - ERROR - stderr - +2025-02-06 00:30:50 - INFO - stdout - {'loss': 0.3702, 
'grad_norm': 1.449092984199524, 'learning_rate': 4.328404413384035e-06, 'epoch': 2.1} +2025-02-06 00:30:50 - ERROR - stderr - 70%|███████ | 15731/22434 [14:23:10<4:45:13, 2.55s/it] +2025-02-06 00:30:52 - ERROR - stderr - 70%|███████ | 15732/22434 [14:23:12<4:43:30, 2.54s/it] +2025-02-06 00:30:52 - ERROR - stderr - +2025-02-06 00:30:52 - ERROR - stderr - +2025-02-06 00:30:52 - INFO - stdout - {'loss': 0.4227, 'grad_norm': 1.6132605075836182, 'learning_rate': 4.327215391318213e-06, 'epoch': 2.1} +2025-02-06 00:30:52 - ERROR - stderr - 70%|███████ | 15732/22434 [14:23:12<4:43:30, 2.54s/it] +2025-02-06 00:30:55 - ERROR - stderr - 70%|███████ | 15733/22434 [14:23:15<4:42:18, 2.53s/it] +2025-02-06 00:30:55 - ERROR - stderr - +2025-02-06 00:30:55 - ERROR - stderr - +2025-02-06 00:30:55 - INFO - stdout - {'loss': 0.3935, 'grad_norm': 1.3593202829360962, 'learning_rate': 4.326026487496157e-06, 'epoch': 2.1} +2025-02-06 00:30:55 - ERROR - stderr - 70%|███████ | 15733/22434 [14:23:15<4:42:18, 2.53s/it] +2025-02-06 00:30:57 - ERROR - stderr - 70%|███████ | 15734/22434 [14:23:17<4:41:49, 2.52s/it] +2025-02-06 00:30:57 - ERROR - stderr - +2025-02-06 00:30:57 - ERROR - stderr - +2025-02-06 00:30:57 - INFO - stdout - {'loss': 0.4088, 'grad_norm': 1.5615870952606201, 'learning_rate': 4.32483770194265e-06, 'epoch': 2.1} +2025-02-06 00:30:57 - ERROR - stderr - 70%|███████ | 15734/22434 [14:23:17<4:41:49, 2.52s/it] +2025-02-06 00:31:00 - ERROR - stderr - 70%|███████ | 15735/22434 [14:23:20<4:40:16, 2.51s/it] +2025-02-06 00:31:00 - ERROR - stderr - +2025-02-06 00:31:00 - ERROR - stderr - +2025-02-06 00:31:00 - INFO - stdout - {'loss': 0.4129, 'grad_norm': 1.528801441192627, 'learning_rate': 4.32364903468247e-06, 'epoch': 2.1} +2025-02-06 00:31:00 - ERROR - stderr - 70%|███████ | 15735/22434 [14:23:20<4:40:16, 2.51s/it] +2025-02-06 00:31:02 - ERROR - stderr - 70%|███████ | 15736/22434 [14:23:22<4:40:06, 2.51s/it] +2025-02-06 00:31:02 - ERROR - stderr - +2025-02-06 00:31:02 - ERROR - 
stderr - +2025-02-06 00:31:02 - INFO - stdout - {'loss': 0.4703, 'grad_norm': 1.6716378927230835, 'learning_rate': 4.3224604857403985e-06, 'epoch': 2.1} +2025-02-06 00:31:02 - ERROR - stderr - 70%|███████ | 15736/22434 [14:23:22<4:40:06, 2.51s/it] +2025-02-06 00:31:05 - ERROR - stderr - 70%|███████ | 15737/22434 [14:23:25<4:40:12, 2.51s/it] +2025-02-06 00:31:05 - ERROR - stderr - +2025-02-06 00:31:05 - ERROR - stderr - +2025-02-06 00:31:05 - INFO - stdout - {'loss': 0.4057, 'grad_norm': 1.494335412979126, 'learning_rate': 4.321272055141198e-06, 'epoch': 2.1} +2025-02-06 00:31:05 - ERROR - stderr - 70%|███████ | 15737/22434 [14:23:25<4:40:12, 2.51s/it] +2025-02-06 00:31:07 - ERROR - stderr - 70%|███████ | 15738/22434 [14:23:27<4:40:16, 2.51s/it] +2025-02-06 00:31:07 - ERROR - stderr - +2025-02-06 00:31:07 - ERROR - stderr - +2025-02-06 00:31:07 - INFO - stdout - {'loss': 0.3808, 'grad_norm': 1.373849868774414, 'learning_rate': 4.320083742909651e-06, 'epoch': 2.1} +2025-02-06 00:31:07 - ERROR - stderr - 70%|███████ | 15738/22434 [14:23:27<4:40:16, 2.51s/it] +2025-02-06 00:31:10 - ERROR - stderr - 70%|███████ | 15739/22434 [14:23:30<4:38:56, 2.50s/it] +2025-02-06 00:31:10 - ERROR - stderr - +2025-02-06 00:31:10 - ERROR - stderr - +2025-02-06 00:31:10 - INFO - stdout - {'loss': 0.3489, 'grad_norm': 1.413920521736145, 'learning_rate': 4.318895549070524e-06, 'epoch': 2.1} +2025-02-06 00:31:10 - ERROR - stderr - 70%|███████ | 15739/22434 [14:23:30<4:38:56, 2.50s/it] +2025-02-06 00:31:12 - ERROR - stderr - 70%|███████ | 15740/22434 [14:23:32<4:38:25, 2.50s/it] +2025-02-06 00:31:12 - ERROR - stderr - +2025-02-06 00:31:12 - ERROR - stderr - +2025-02-06 00:31:12 - INFO - stdout - {'loss': 0.3586, 'grad_norm': 1.4368923902511597, 'learning_rate': 4.317707473648582e-06, 'epoch': 2.1} +2025-02-06 00:31:12 - ERROR - stderr - 70%|███████ | 15740/22434 [14:23:32<4:38:25, 2.50s/it] +2025-02-06 00:31:15 - ERROR - stderr - 70%|███████ | 15741/22434 [14:23:35<4:40:07, 2.51s/it] 
+2025-02-06 00:31:15 - ERROR - stderr - +2025-02-06 00:31:15 - ERROR - stderr - +2025-02-06 00:31:15 - INFO - stdout - {'loss': 0.333, 'grad_norm': 1.1731454133987427, 'learning_rate': 4.316519516668595e-06, 'epoch': 2.1} +2025-02-06 00:31:15 - ERROR - stderr - 70%|███████ | 15741/22434 [14:23:35<4:40:07, 2.51s/it] +2025-02-06 00:31:17 - ERROR - stderr - 70%|███████ | 15742/22434 [14:23:37<4:39:38, 2.51s/it] +2025-02-06 00:31:17 - ERROR - stderr - +2025-02-06 00:31:17 - ERROR - stderr - +2025-02-06 00:31:17 - INFO - stdout - {'loss': 0.3248, 'grad_norm': 1.3359408378601074, 'learning_rate': 4.315331678155312e-06, 'epoch': 2.11} +2025-02-06 00:31:17 - ERROR - stderr - 70%|███████ | 15742/22434 [14:23:37<4:39:38, 2.51s/it] +2025-02-06 00:31:20 - ERROR - stderr - 70%|███████ | 15743/22434 [14:23:40<4:40:29, 2.52s/it] +2025-02-06 00:31:20 - ERROR - stderr - +2025-02-06 00:31:20 - ERROR - stderr - +2025-02-06 00:31:20 - INFO - stdout - {'loss': 0.3814, 'grad_norm': 1.4907485246658325, 'learning_rate': 4.314143958133508e-06, 'epoch': 2.11} +2025-02-06 00:31:20 - ERROR - stderr - 70%|███████ | 15743/22434 [14:23:40<4:40:29, 2.52s/it] +2025-02-06 00:31:22 - ERROR - stderr - 70%|███████ | 15744/22434 [14:23:42<4:42:05, 2.53s/it] +2025-02-06 00:31:23 - ERROR - stderr - +2025-02-06 00:31:23 - ERROR - stderr - +2025-02-06 00:31:23 - INFO - stdout - {'loss': 0.3672, 'grad_norm': 1.5934419631958008, 'learning_rate': 4.312956356627929e-06, 'epoch': 2.11} +2025-02-06 00:31:23 - ERROR - stderr - 70%|███████ | 15744/22434 [14:23:42<4:42:05, 2.53s/it] +2025-02-06 00:31:25 - ERROR - stderr - 70%|███████ | 15745/22434 [14:23:45<4:40:29, 2.52s/it] +2025-02-06 00:31:25 - ERROR - stderr - +2025-02-06 00:31:25 - ERROR - stderr - +2025-02-06 00:31:25 - INFO - stdout - {'loss': 0.3738, 'grad_norm': 1.4008458852767944, 'learning_rate': 4.311768873663329e-06, 'epoch': 2.11} +2025-02-06 00:31:25 - ERROR - stderr - 70%|███████ | 15745/22434 [14:23:45<4:40:29, 2.52s/it] +2025-02-06 00:31:27 - 
ERROR - stderr - 70%|███████ | 15746/22434 [14:23:47<4:40:50, 2.52s/it] +2025-02-06 00:31:28 - ERROR - stderr - +2025-02-06 00:31:28 - ERROR - stderr - +2025-02-06 00:31:28 - INFO - stdout - {'loss': 0.3568, 'grad_norm': 1.2961705923080444, 'learning_rate': 4.310581509264471e-06, 'epoch': 2.11} +2025-02-06 00:31:28 - ERROR - stderr - 70%|███████ | 15746/22434 [14:23:47<4:40:50, 2.52s/it] +2025-02-06 00:31:30 - ERROR - stderr - 70%|███████ | 15747/22434 [14:23:50<4:38:51, 2.50s/it] +2025-02-06 00:31:30 - ERROR - stderr - +2025-02-06 00:31:30 - ERROR - stderr - +2025-02-06 00:31:30 - INFO - stdout - {'loss': 0.4176, 'grad_norm': 1.6165697574615479, 'learning_rate': 4.309394263456091e-06, 'epoch': 2.11} +2025-02-06 00:31:30 - ERROR - stderr - 70%|███████ | 15747/22434 [14:23:50<4:38:51, 2.50s/it] +2025-02-06 00:31:32 - ERROR - stderr - 70%|███████ | 15748/22434 [14:23:52<4:38:16, 2.50s/it] +2025-02-06 00:31:32 - ERROR - stderr - +2025-02-06 00:31:32 - ERROR - stderr - +2025-02-06 00:31:32 - INFO - stdout - {'loss': 0.4002, 'grad_norm': 1.6913940906524658, 'learning_rate': 4.308207136262949e-06, 'epoch': 2.11} +2025-02-06 00:31:32 - ERROR - stderr - 70%|███████ | 15748/22434 [14:23:52<4:38:16, 2.50s/it] +2025-02-06 00:31:35 - ERROR - stderr - 70%|███████ | 15749/22434 [14:23:55<4:39:04, 2.50s/it] +2025-02-06 00:31:35 - ERROR - stderr - +2025-02-06 00:31:35 - ERROR - stderr - +2025-02-06 00:31:35 - INFO - stdout - {'loss': 0.3488, 'grad_norm': 1.4439914226531982, 'learning_rate': 4.3070201277097775e-06, 'epoch': 2.11} +2025-02-06 00:31:35 - ERROR - stderr - 70%|███████ | 15749/22434 [14:23:55<4:39:04, 2.50s/it] +2025-02-06 00:31:37 - ERROR - stderr - 70%|███████ | 15750/22434 [14:23:57<4:39:43, 2.51s/it] +2025-02-06 00:31:38 - ERROR - stderr - +2025-02-06 00:31:38 - ERROR - stderr - +2025-02-06 00:31:38 - INFO - stdout - {'loss': 0.3905, 'grad_norm': 1.5959389209747314, 'learning_rate': 4.305833237821325e-06, 'epoch': 2.11} +2025-02-06 00:31:38 - ERROR - stderr - 
70%|███████ | 15750/22434 [14:23:57<4:39:43, 2.51s/it] +2025-02-06 00:31:40 - ERROR - stderr - 70%|███████ | 15751/22434 [14:24:00<4:40:09, 2.52s/it] +2025-02-06 00:31:40 - ERROR - stderr - +2025-02-06 00:31:40 - ERROR - stderr - +2025-02-06 00:31:40 - INFO - stdout - {'loss': 0.3839, 'grad_norm': 1.5053926706314087, 'learning_rate': 4.304646466622331e-06, 'epoch': 2.11} +2025-02-06 00:31:40 - ERROR - stderr - 70%|███████ | 15751/22434 [14:24:00<4:40:09, 2.52s/it] +2025-02-06 00:31:42 - ERROR - stderr - 70%|███████ | 15752/22434 [14:24:02<4:37:37, 2.49s/it] +2025-02-06 00:31:42 - ERROR - stderr - +2025-02-06 00:31:42 - ERROR - stderr - +2025-02-06 00:31:42 - INFO - stdout - {'loss': 0.3869, 'grad_norm': 1.4584319591522217, 'learning_rate': 4.303459814137531e-06, 'epoch': 2.11} +2025-02-06 00:31:42 - ERROR - stderr - 70%|███████ | 15752/22434 [14:24:02<4:37:37, 2.49s/it] +2025-02-06 00:31:45 - ERROR - stderr - 70%|███████ | 15753/22434 [14:24:05<4:45:30, 2.56s/it] +2025-02-06 00:31:45 - ERROR - stderr - +2025-02-06 00:31:45 - ERROR - stderr - +2025-02-06 00:31:45 - INFO - stdout - {'loss': 0.3914, 'grad_norm': 1.4074989557266235, 'learning_rate': 4.302273280391659e-06, 'epoch': 2.11} +2025-02-06 00:31:45 - ERROR - stderr - 70%|███████ | 15753/22434 [14:24:05<4:45:30, 2.56s/it] +2025-02-06 00:31:48 - ERROR - stderr - 70%|███████ | 15754/22434 [14:24:08<4:48:38, 2.59s/it] +2025-02-06 00:31:48 - ERROR - stderr - +2025-02-06 00:31:48 - ERROR - stderr - +2025-02-06 00:31:48 - INFO - stdout - {'loss': 0.3795, 'grad_norm': 1.507177472114563, 'learning_rate': 4.301086865409449e-06, 'epoch': 2.11} +2025-02-06 00:31:48 - ERROR - stderr - 70%|███████ | 15754/22434 [14:24:08<4:48:38, 2.59s/it] +2025-02-06 00:31:50 - ERROR - stderr - 70%|███████ | 15755/22434 [14:24:10<4:42:57, 2.54s/it] +2025-02-06 00:31:50 - ERROR - stderr - +2025-02-06 00:31:50 - ERROR - stderr - +2025-02-06 00:31:50 - INFO - stdout - {'loss': 0.4047, 'grad_norm': 1.577232837677002, 'learning_rate': 
4.29990056921563e-06, 'epoch': 2.11} +2025-02-06 00:31:50 - ERROR - stderr - 70%|███████ | 15755/22434 [14:24:10<4:42:57, 2.54s/it] +2025-02-06 00:31:53 - ERROR - stderr - 70%|███████ | 15756/22434 [14:24:12<4:40:31, 2.52s/it] +2025-02-06 00:31:53 - ERROR - stderr - +2025-02-06 00:31:53 - ERROR - stderr - +2025-02-06 00:31:53 - INFO - stdout - {'loss': 0.384, 'grad_norm': 1.534798264503479, 'learning_rate': 4.298714391834929e-06, 'epoch': 2.11} +2025-02-06 00:31:53 - ERROR - stderr - 70%|███████ | 15756/22434 [14:24:13<4:40:31, 2.52s/it] +2025-02-06 00:31:55 - ERROR - stderr - 70%|███████ | 15757/22434 [14:24:15<4:43:05, 2.54s/it] +2025-02-06 00:31:55 - ERROR - stderr - +2025-02-06 00:31:55 - ERROR - stderr - +2025-02-06 00:31:55 - INFO - stdout - {'loss': 0.3727, 'grad_norm': 1.3766248226165771, 'learning_rate': 4.297528333292072e-06, 'epoch': 2.11} +2025-02-06 00:31:55 - ERROR - stderr - 70%|███████ | 15757/22434 [14:24:15<4:43:05, 2.54s/it] +2025-02-06 00:31:58 - ERROR - stderr - 70%|███████ | 15758/22434 [14:24:18<4:45:02, 2.56s/it] +2025-02-06 00:31:58 - ERROR - stderr - +2025-02-06 00:31:58 - ERROR - stderr - +2025-02-06 00:31:58 - INFO - stdout - {'loss': 0.365, 'grad_norm': 1.5357187986373901, 'learning_rate': 4.2963423936117795e-06, 'epoch': 2.11} +2025-02-06 00:31:58 - ERROR - stderr - 70%|███████ | 15758/22434 [14:24:18<4:45:02, 2.56s/it] +2025-02-06 00:32:01 - ERROR - stderr - 70%|███████ | 15759/22434 [14:24:21<4:53:58, 2.64s/it] +2025-02-06 00:32:01 - ERROR - stderr - +2025-02-06 00:32:01 - ERROR - stderr - +2025-02-06 00:32:01 - INFO - stdout - {'loss': 0.3753, 'grad_norm': 1.4902327060699463, 'learning_rate': 4.295156572818773e-06, 'epoch': 2.11} +2025-02-06 00:32:01 - ERROR - stderr - 70%|███████ | 15759/22434 [14:24:21<4:53:58, 2.64s/it] +2025-02-06 00:32:03 - ERROR - stderr - 70%|███████ | 15760/22434 [14:24:23<4:47:32, 2.59s/it] +2025-02-06 00:32:03 - ERROR - stderr - +2025-02-06 00:32:03 - ERROR - stderr - +2025-02-06 00:32:03 - INFO - stdout - 
{'loss': 0.3817, 'grad_norm': 1.4230576753616333, 'learning_rate': 4.293970870937772e-06, 'epoch': 2.11} +2025-02-06 00:32:03 - ERROR - stderr - 70%|███████ | 15760/22434 [14:24:23<4:47:32, 2.59s/it] +2025-02-06 00:32:06 - ERROR - stderr - 70%|███████ | 15761/22434 [14:24:26<4:46:15, 2.57s/it] +2025-02-06 00:32:06 - ERROR - stderr - +2025-02-06 00:32:06 - ERROR - stderr - +2025-02-06 00:32:06 - INFO - stdout - {'loss': 0.3791, 'grad_norm': 1.448320984840393, 'learning_rate': 4.292785287993479e-06, 'epoch': 2.11} +2025-02-06 00:32:06 - ERROR - stderr - 70%|███████ | 15761/22434 [14:24:26<4:46:15, 2.57s/it] +2025-02-06 00:32:08 - ERROR - stderr - 70%|███████ | 15762/22434 [14:24:28<4:41:49, 2.53s/it] +2025-02-06 00:32:08 - ERROR - stderr - +2025-02-06 00:32:08 - ERROR - stderr - +2025-02-06 00:32:08 - INFO - stdout - {'loss': 0.3967, 'grad_norm': 1.5622847080230713, 'learning_rate': 4.291599824010625e-06, 'epoch': 2.11} +2025-02-06 00:32:08 - ERROR - stderr - 70%|███████ | 15762/22434 [14:24:28<4:41:49, 2.53s/it] +2025-02-06 00:32:11 - ERROR - stderr - 70%|███████ | 15763/22434 [14:24:31<4:43:16, 2.55s/it] +2025-02-06 00:32:11 - ERROR - stderr - +2025-02-06 00:32:11 - ERROR - stderr - +2025-02-06 00:32:11 - INFO - stdout - {'loss': 0.3853, 'grad_norm': 1.4429446458816528, 'learning_rate': 4.290414479013902e-06, 'epoch': 2.11} +2025-02-06 00:32:11 - ERROR - stderr - 70%|███████ | 15763/22434 [14:24:31<4:43:16, 2.55s/it] +2025-02-06 00:32:13 - ERROR - stderr - 70%|███████ | 15764/22434 [14:24:33<4:41:59, 2.54s/it] +2025-02-06 00:32:13 - ERROR - stderr - +2025-02-06 00:32:13 - ERROR - stderr - +2025-02-06 00:32:13 - INFO - stdout - {'loss': 0.4045, 'grad_norm': 1.6376534700393677, 'learning_rate': 4.289229253028029e-06, 'epoch': 2.11} +2025-02-06 00:32:13 - ERROR - stderr - 70%|███████ | 15764/22434 [14:24:33<4:41:59, 2.54s/it] +2025-02-06 00:32:16 - ERROR - stderr - 70%|███████ | 15765/22434 [14:24:36<4:39:14, 2.51s/it] +2025-02-06 00:32:16 - ERROR - stderr - 
+2025-02-06 00:32:16 - ERROR - stderr - +2025-02-06 00:32:16 - INFO - stdout - {'loss': 0.3984, 'grad_norm': 1.5807442665100098, 'learning_rate': 4.288044146077712e-06, 'epoch': 2.11} +2025-02-06 00:32:16 - ERROR - stderr - 70%|███████ | 15765/22434 [14:24:36<4:39:14, 2.51s/it] +2025-02-06 00:32:18 - ERROR - stderr - 70%|███████ | 15766/22434 [14:24:38<4:37:24, 2.50s/it] +2025-02-06 00:32:18 - ERROR - stderr - +2025-02-06 00:32:18 - ERROR - stderr - +2025-02-06 00:32:18 - INFO - stdout - {'loss': 0.3575, 'grad_norm': 1.2993814945220947, 'learning_rate': 4.286859158187641e-06, 'epoch': 2.11} +2025-02-06 00:32:18 - ERROR - stderr - 70%|███████ | 15766/22434 [14:24:38<4:37:24, 2.50s/it] +2025-02-06 00:32:21 - ERROR - stderr - 70%|███████ | 15767/22434 [14:24:40<4:36:15, 2.49s/it] +2025-02-06 00:32:21 - ERROR - stderr - +2025-02-06 00:32:21 - ERROR - stderr - +2025-02-06 00:32:21 - INFO - stdout - {'loss': 0.3521, 'grad_norm': 1.313005805015564, 'learning_rate': 4.285674289382532e-06, 'epoch': 2.11} +2025-02-06 00:32:21 - ERROR - stderr - 70%|███████ | 15767/22434 [14:24:40<4:36:15, 2.49s/it] +2025-02-06 00:32:23 - ERROR - stderr - 70%|███████ | 15768/22434 [14:24:43<4:35:52, 2.48s/it] +2025-02-06 00:32:23 - ERROR - stderr - +2025-02-06 00:32:23 - ERROR - stderr - +2025-02-06 00:32:23 - INFO - stdout - {'loss': 0.3748, 'grad_norm': 1.5080761909484863, 'learning_rate': 4.2844895396870704e-06, 'epoch': 2.11} +2025-02-06 00:32:23 - ERROR - stderr - 70%|███████ | 15768/22434 [14:24:43<4:35:52, 2.48s/it] +2025-02-06 00:32:26 - ERROR - stderr - 70%|███████ | 15769/22434 [14:24:45<4:33:42, 2.46s/it] +2025-02-06 00:32:26 - ERROR - stderr - +2025-02-06 00:32:26 - ERROR - stderr - +2025-02-06 00:32:26 - INFO - stdout - {'loss': 0.3729, 'grad_norm': 1.5081191062927246, 'learning_rate': 4.283304909125956e-06, 'epoch': 2.11} +2025-02-06 00:32:26 - ERROR - stderr - 70%|███████ | 15769/22434 [14:24:45<4:33:42, 2.46s/it] +2025-02-06 00:32:28 - ERROR - stderr - 70%|███████ | 
15770/22434 [14:24:48<4:35:22, 2.48s/it] +2025-02-06 00:32:28 - ERROR - stderr - +2025-02-06 00:32:28 - ERROR - stderr - +2025-02-06 00:32:28 - INFO - stdout - {'loss': 0.4176, 'grad_norm': 1.6054719686508179, 'learning_rate': 4.282120397723879e-06, 'epoch': 2.11} +2025-02-06 00:32:28 - ERROR - stderr - 70%|███████ | 15770/22434 [14:24:48<4:35:22, 2.48s/it] +2025-02-06 00:32:31 - ERROR - stderr - 70%|███████ | 15771/22434 [14:24:50<4:35:34, 2.48s/it] +2025-02-06 00:32:31 - ERROR - stderr - +2025-02-06 00:32:31 - ERROR - stderr - +2025-02-06 00:32:31 - INFO - stdout - {'loss': 0.4011, 'grad_norm': 1.5559393167495728, 'learning_rate': 4.280936005505528e-06, 'epoch': 2.11} +2025-02-06 00:32:31 - ERROR - stderr - 70%|███████ | 15771/22434 [14:24:50<4:35:34, 2.48s/it] +2025-02-06 00:32:31 - WARNING - transformers.tokenization_utils_base - Token indices sequence length is longer than the specified maximum sequence length for this model (2878 > 2048). Running this sequence through the model will result in indexing errors +2025-02-06 00:32:31 - WARNING - transformers.tokenization_utils_base - Token indices sequence length is longer than the specified maximum sequence length for this model (2878 > 2048). 
Running this sequence through the model will result in indexing errors +2025-02-06 00:32:33 - ERROR - stderr - 70%|███████ | 15772/22434 [14:24:53<4:36:22, 2.49s/it] +2025-02-06 00:32:33 - ERROR - stderr - +2025-02-06 00:32:33 - ERROR - stderr - +2025-02-06 00:32:33 - INFO - stdout - {'loss': 0.3278, 'grad_norm': 1.3938302993774414, 'learning_rate': 4.279751732495601e-06, 'epoch': 2.11} +2025-02-06 00:32:33 - ERROR - stderr - 70%|███████ | 15772/22434 [14:24:53<4:36:22, 2.49s/it] +2025-02-06 00:32:39 - ERROR - stderr - 70%|███████ | 15773/22434 [14:24:59<6:25:10, 3.47s/it] +2025-02-06 00:32:39 - ERROR - stderr - +2025-02-06 00:32:39 - ERROR - stderr - +2025-02-06 00:32:39 - INFO - stdout - {'loss': 0.3693, 'grad_norm': 1.3618437051773071, 'learning_rate': 4.278567578718772e-06, 'epoch': 2.11} +2025-02-06 00:32:39 - ERROR - stderr - 70%|███████ | 15773/22434 [14:24:59<6:25:10, 3.47s/it] +2025-02-06 00:32:41 - ERROR - stderr - 70%|███████ | 15774/22434 [14:25:01<5:56:51, 3.21s/it] +2025-02-06 00:32:42 - ERROR - stderr - +2025-02-06 00:32:42 - ERROR - stderr - +2025-02-06 00:32:42 - INFO - stdout - {'loss': 0.3662, 'grad_norm': 1.4044644832611084, 'learning_rate': 4.277383544199726e-06, 'epoch': 2.11} +2025-02-06 00:32:42 - ERROR - stderr - 70%|███████ | 15774/22434 [14:25:01<5:56:51, 3.21s/it] +2025-02-06 00:32:44 - ERROR - stderr - 70%|███████ | 15775/22434 [14:25:04<5:33:20, 3.00s/it] +2025-02-06 00:32:44 - ERROR - stderr - +2025-02-06 00:32:44 - ERROR - stderr - +2025-02-06 00:32:44 - INFO - stdout - {'loss': 0.4061, 'grad_norm': 1.4152419567108154, 'learning_rate': 4.276199628963145e-06, 'epoch': 2.11} +2025-02-06 00:32:44 - ERROR - stderr - 70%|███████ | 15775/22434 [14:25:04<5:33:20, 3.00s/it] +2025-02-06 00:32:46 - ERROR - stderr - 70%|███████ | 15776/22434 [14:25:06<5:16:34, 2.85s/it] +2025-02-06 00:32:47 - ERROR - stderr - +2025-02-06 00:32:47 - ERROR - stderr - +2025-02-06 00:32:47 - INFO - stdout - {'loss': 0.3776, 'grad_norm': 1.4973400831222534, 
'learning_rate': 4.275015833033706e-06, 'epoch': 2.11} +2025-02-06 00:32:47 - ERROR - stderr - 70%|███████ | 15776/22434 [14:25:06<5:16:34, 2.85s/it] +2025-02-06 00:32:49 - ERROR - stderr - 70%|███████ | 15777/22434 [14:25:09<5:06:03, 2.76s/it] +2025-02-06 00:32:49 - ERROR - stderr - +2025-02-06 00:32:49 - ERROR - stderr - +2025-02-06 00:32:49 - INFO - stdout - {'loss': 0.3685, 'grad_norm': 1.4519171714782715, 'learning_rate': 4.273832156436082e-06, 'epoch': 2.11} +2025-02-06 00:32:49 - ERROR - stderr - 70%|███████ | 15777/22434 [14:25:09<5:06:03, 2.76s/it] +2025-02-06 00:32:51 - ERROR - stderr - 70%|███████ | 15778/22434 [14:25:11<4:56:31, 2.67s/it] +2025-02-06 00:32:52 - ERROR - stderr - +2025-02-06 00:32:52 - ERROR - stderr - +2025-02-06 00:32:52 - INFO - stdout - {'loss': 0.3681, 'grad_norm': 1.3978184461593628, 'learning_rate': 4.272648599194948e-06, 'epoch': 2.11} +2025-02-06 00:32:52 - ERROR - stderr - 70%|███████ | 15778/22434 [14:25:11<4:56:31, 2.67s/it] +2025-02-06 00:32:54 - ERROR - stderr - 70%|███████ | 15779/22434 [14:25:14<4:53:25, 2.65s/it] +2025-02-06 00:32:54 - ERROR - stderr - +2025-02-06 00:32:54 - ERROR - stderr - +2025-02-06 00:32:54 - INFO - stdout - {'loss': 0.3746, 'grad_norm': 1.284856915473938, 'learning_rate': 4.271465161334974e-06, 'epoch': 2.11} +2025-02-06 00:32:54 - ERROR - stderr - 70%|███████ | 15779/22434 [14:25:14<4:53:25, 2.65s/it] +2025-02-06 00:32:57 - ERROR - stderr - 70%|███████ | 15780/22434 [14:25:16<4:49:47, 2.61s/it] +2025-02-06 00:32:57 - ERROR - stderr - +2025-02-06 00:32:57 - ERROR - stderr - +2025-02-06 00:32:57 - INFO - stdout - {'loss': 0.457, 'grad_norm': 1.6457892656326294, 'learning_rate': 4.270281842880827e-06, 'epoch': 2.11} +2025-02-06 00:32:57 - ERROR - stderr - 70%|███████ | 15780/22434 [14:25:16<4:49:47, 2.61s/it] +2025-02-06 00:32:59 - ERROR - stderr - 70%|███████ | 15781/22434 [14:25:19<4:47:57, 2.60s/it] +2025-02-06 00:32:59 - ERROR - stderr - +2025-02-06 00:32:59 - ERROR - stderr - +2025-02-06 00:32:59 
- INFO - stdout - {'loss': 0.3346, 'grad_norm': 1.372888207435608, 'learning_rate': 4.269098643857176e-06, 'epoch': 2.11} +2025-02-06 00:32:59 - ERROR - stderr - 70%|███████ | 15781/22434 [14:25:19<4:47:57, 2.60s/it] +2025-02-06 00:33:02 - ERROR - stderr - 70%|███████ | 15782/22434 [14:25:21<4:45:43, 2.58s/it] +2025-02-06 00:33:02 - ERROR - stderr - +2025-02-06 00:33:02 - ERROR - stderr - +2025-02-06 00:33:02 - INFO - stdout - {'loss': 0.3978, 'grad_norm': 1.6197234392166138, 'learning_rate': 4.267915564288673e-06, 'epoch': 2.11} +2025-02-06 00:33:02 - ERROR - stderr - 70%|███████ | 15782/22434 [14:25:22<4:45:43, 2.58s/it] +2025-02-06 00:33:04 - ERROR - stderr - 70%|███████ | 15783/22434 [14:25:24<4:41:21, 2.54s/it] +2025-02-06 00:33:04 - ERROR - stderr - +2025-02-06 00:33:04 - ERROR - stderr - +2025-02-06 00:33:04 - INFO - stdout - {'loss': 0.4306, 'grad_norm': 1.6297394037246704, 'learning_rate': 4.266732604199988e-06, 'epoch': 2.11} +2025-02-06 00:33:04 - ERROR - stderr - 70%|███████ | 15783/22434 [14:25:24<4:41:21, 2.54s/it] +2025-02-06 00:33:07 - ERROR - stderr - 70%|███████ | 15784/22434 [14:25:26<4:38:38, 2.51s/it] +2025-02-06 00:33:07 - ERROR - stderr - +2025-02-06 00:33:07 - ERROR - stderr - +2025-02-06 00:33:07 - INFO - stdout - {'loss': 0.404, 'grad_norm': 1.6843125820159912, 'learning_rate': 4.26554976361578e-06, 'epoch': 2.11} +2025-02-06 00:33:07 - ERROR - stderr - 70%|███████ | 15784/22434 [14:25:26<4:38:38, 2.51s/it] +2025-02-06 00:33:09 - ERROR - stderr - 70%|███████ | 15785/22434 [14:25:29<4:38:01, 2.51s/it] +2025-02-06 00:33:09 - ERROR - stderr - +2025-02-06 00:33:09 - ERROR - stderr - +2025-02-06 00:33:09 - INFO - stdout - {'loss': 0.3825, 'grad_norm': 1.6915265321731567, 'learning_rate': 4.264367042560691e-06, 'epoch': 2.11} +2025-02-06 00:33:09 - ERROR - stderr - 70%|███████ | 15785/22434 [14:25:29<4:38:01, 2.51s/it] +2025-02-06 00:33:12 - ERROR - stderr - 70%|███████ | 15786/22434 [14:25:32<4:44:27, 2.57s/it] +2025-02-06 00:33:12 - ERROR - 
stderr - +2025-02-06 00:33:12 - ERROR - stderr - +2025-02-06 00:33:12 - INFO - stdout - {'loss': 0.3899, 'grad_norm': 1.5842605829238892, 'learning_rate': 4.263184441059391e-06, 'epoch': 2.11} +2025-02-06 00:33:12 - ERROR - stderr - 70%|███████ | 15786/22434 [14:25:32<4:44:27, 2.57s/it] +2025-02-06 00:33:14 - ERROR - stderr - 70%|███████ | 15787/22434 [14:25:34<4:44:34, 2.57s/it] +2025-02-06 00:33:14 - ERROR - stderr - +2025-02-06 00:33:14 - ERROR - stderr - +2025-02-06 00:33:14 - INFO - stdout - {'loss': 0.3644, 'grad_norm': 1.4372575283050537, 'learning_rate': 4.262001959136515e-06, 'epoch': 2.11} +2025-02-06 00:33:14 - ERROR - stderr - 70%|███████ | 15787/22434 [14:25:34<4:44:34, 2.57s/it] +2025-02-06 00:33:17 - ERROR - stderr - 70%|███████ | 15788/22434 [14:25:37<4:49:11, 2.61s/it] +2025-02-06 00:33:17 - ERROR - stderr - +2025-02-06 00:33:17 - ERROR - stderr - +2025-02-06 00:33:17 - INFO - stdout - {'loss': 0.3622, 'grad_norm': 1.5107115507125854, 'learning_rate': 4.260819596816725e-06, 'epoch': 2.11} +2025-02-06 00:33:17 - ERROR - stderr - 70%|███████ | 15788/22434 [14:25:37<4:49:11, 2.61s/it] +2025-02-06 00:33:20 - ERROR - stderr - 70%|███████ | 15789/22434 [14:25:39<4:49:01, 2.61s/it] +2025-02-06 00:33:20 - ERROR - stderr - +2025-02-06 00:33:20 - ERROR - stderr - +2025-02-06 00:33:20 - INFO - stdout - {'loss': 0.3537, 'grad_norm': 1.4474635124206543, 'learning_rate': 4.259637354124654e-06, 'epoch': 2.11} +2025-02-06 00:33:20 - ERROR - stderr - 70%|███████ | 15789/22434 [14:25:39<4:49:01, 2.61s/it] +2025-02-06 00:33:22 - ERROR - stderr - 70%|███████ | 15790/22434 [14:25:42<4:42:10, 2.55s/it] +2025-02-06 00:33:22 - ERROR - stderr - +2025-02-06 00:33:22 - ERROR - stderr - +2025-02-06 00:33:22 - INFO - stdout - {'loss': 0.3836, 'grad_norm': 1.4111926555633545, 'learning_rate': 4.2584552310849454e-06, 'epoch': 2.11} +2025-02-06 00:33:22 - ERROR - stderr - 70%|███████ | 15790/22434 [14:25:42<4:42:10, 2.55s/it] +2025-02-06 00:33:25 - ERROR - stderr - 70%|███████ | 
15791/22434 [14:25:45<4:45:41, 2.58s/it] +2025-02-06 00:33:25 - ERROR - stderr - +2025-02-06 00:33:25 - ERROR - stderr - +2025-02-06 00:33:25 - INFO - stdout - {'loss': 0.3388, 'grad_norm': 1.4600623846054077, 'learning_rate': 4.257273227722252e-06, 'epoch': 2.11} +2025-02-06 00:33:25 - ERROR - stderr - 70%|███████ | 15791/22434 [14:25:45<4:45:41, 2.58s/it] +2025-02-06 00:33:27 - ERROR - stderr - 70%|███████ | 15792/22434 [14:25:47<4:44:35, 2.57s/it] +2025-02-06 00:33:27 - ERROR - stderr - +2025-02-06 00:33:27 - ERROR - stderr - +2025-02-06 00:33:27 - INFO - stdout - {'loss': 0.388, 'grad_norm': 1.5950621366500854, 'learning_rate': 4.256091344061199e-06, 'epoch': 2.11} +2025-02-06 00:33:27 - ERROR - stderr - 70%|███████ | 15792/22434 [14:25:47<4:44:35, 2.57s/it] +2025-02-06 00:33:30 - ERROR - stderr - 70%|███████ | 15793/22434 [14:25:50<4:43:11, 2.56s/it] +2025-02-06 00:33:30 - ERROR - stderr - +2025-02-06 00:33:30 - ERROR - stderr - +2025-02-06 00:33:30 - INFO - stdout - {'loss': 0.3906, 'grad_norm': 1.574222445487976, 'learning_rate': 4.254909580126425e-06, 'epoch': 2.11} +2025-02-06 00:33:30 - ERROR - stderr - 70%|███████ | 15793/22434 [14:25:50<4:43:11, 2.56s/it] +2025-02-06 00:33:32 - ERROR - stderr - 70%|███████ | 15794/22434 [14:25:52<4:41:10, 2.54s/it] +2025-02-06 00:33:32 - ERROR - stderr - +2025-02-06 00:33:32 - ERROR - stderr - +2025-02-06 00:33:32 - INFO - stdout - {'loss': 0.3662, 'grad_norm': 1.4009393453598022, 'learning_rate': 4.253727935942563e-06, 'epoch': 2.11} +2025-02-06 00:33:32 - ERROR - stderr - 70%|███████ | 15794/22434 [14:25:52<4:41:10, 2.54s/it] +2025-02-06 00:33:35 - ERROR - stderr - 70%|███████ | 15795/22434 [14:25:55<4:43:38, 2.56s/it] +2025-02-06 00:33:35 - ERROR - stderr - +2025-02-06 00:33:35 - ERROR - stderr - +2025-02-06 00:33:35 - INFO - stdout - {'loss': 0.357, 'grad_norm': 1.309697151184082, 'learning_rate': 4.252546411534245e-06, 'epoch': 2.11} +2025-02-06 00:33:35 - ERROR - stderr - 70%|███████ | 15795/22434 
[14:25:55<4:43:38, 2.56s/it] +2025-02-06 00:33:38 - ERROR - stderr - 70%|███████ | 15796/22434 [14:25:57<4:43:43, 2.56s/it] +2025-02-06 00:33:38 - ERROR - stderr - +2025-02-06 00:33:38 - ERROR - stderr - +2025-02-06 00:33:38 - INFO - stdout - {'loss': 0.3205, 'grad_norm': 1.3598058223724365, 'learning_rate': 4.251365006926096e-06, 'epoch': 2.11} +2025-02-06 00:33:38 - ERROR - stderr - 70%|███████ | 15796/22434 [14:25:57<4:43:43, 2.56s/it] +2025-02-06 00:33:40 - ERROR - stderr - 70%|███████ | 15797/22434 [14:26:00<4:39:46, 2.53s/it] +2025-02-06 00:33:40 - ERROR - stderr - +2025-02-06 00:33:40 - ERROR - stderr - +2025-02-06 00:33:40 - INFO - stdout - {'loss': 0.3126, 'grad_norm': 1.447260856628418, 'learning_rate': 4.250183722142743e-06, 'epoch': 2.11} +2025-02-06 00:33:40 - ERROR - stderr - 70%|███████ | 15797/22434 [14:26:00<4:39:46, 2.53s/it] +2025-02-06 00:33:42 - ERROR - stderr - 70%|███████ | 15798/22434 [14:26:02<4:39:18, 2.53s/it] +2025-02-06 00:33:43 - ERROR - stderr - +2025-02-06 00:33:43 - ERROR - stderr - +2025-02-06 00:33:43 - INFO - stdout - {'loss': 0.4169, 'grad_norm': 1.5538804531097412, 'learning_rate': 4.249002557208809e-06, 'epoch': 2.11} +2025-02-06 00:33:43 - ERROR - stderr - 70%|███████ | 15798/22434 [14:26:02<4:39:18, 2.53s/it] +2025-02-06 00:33:45 - ERROR - stderr - 70%|███████ | 15799/22434 [14:26:05<4:42:11, 2.55s/it] +2025-02-06 00:33:45 - ERROR - stderr - +2025-02-06 00:33:45 - ERROR - stderr - +2025-02-06 00:33:45 - INFO - stdout - {'loss': 0.3827, 'grad_norm': 1.5057202577590942, 'learning_rate': 4.247821512148913e-06, 'epoch': 2.11} +2025-02-06 00:33:45 - ERROR - stderr - 70%|███████ | 15799/22434 [14:26:05<4:42:11, 2.55s/it] +2025-02-06 00:33:48 - ERROR - stderr - 70%|███████ | 15800/22434 [14:26:07<4:39:30, 2.53s/it] +2025-02-06 00:33:48 - ERROR - stderr - +2025-02-06 00:33:48 - ERROR - stderr - +2025-02-06 00:33:48 - INFO - stdout - {'loss': 0.3818, 'grad_norm': 1.4938865900039673, 'learning_rate': 4.246640586987677e-06, 'epoch': 
2.11} +2025-02-06 00:33:48 - ERROR - stderr - 70%|███████ | 15800/22434 [14:26:07<4:39:30, 2.53s/it] +2025-02-06 00:33:50 - ERROR - stderr - 70%|███████ | 15801/22434 [14:26:10<4:40:00, 2.53s/it] +2025-02-06 00:33:50 - ERROR - stderr - +2025-02-06 00:33:50 - ERROR - stderr - +2025-02-06 00:33:50 - INFO - stdout - {'loss': 0.4118, 'grad_norm': 1.5976083278656006, 'learning_rate': 4.2454597817497054e-06, 'epoch': 2.11} +2025-02-06 00:33:50 - ERROR - stderr - 70%|███████ | 15801/22434 [14:26:10<4:40:00, 2.53s/it] +2025-02-06 00:33:53 - ERROR - stderr - 70%|███████ | 15802/22434 [14:26:12<4:40:00, 2.53s/it] +2025-02-06 00:33:53 - ERROR - stderr - +2025-02-06 00:33:53 - ERROR - stderr - +2025-02-06 00:33:53 - INFO - stdout - {'loss': 0.3708, 'grad_norm': 1.5542221069335938, 'learning_rate': 4.244279096459623e-06, 'epoch': 2.11} +2025-02-06 00:33:53 - ERROR - stderr - 70%|███████ | 15802/22434 [14:26:12<4:40:00, 2.53s/it] +2025-02-06 00:33:55 - ERROR - stderr - 70%|███████ | 15803/22434 [14:26:15<4:37:23, 2.51s/it] +2025-02-06 00:33:55 - ERROR - stderr - +2025-02-06 00:33:55 - ERROR - stderr - +2025-02-06 00:33:55 - INFO - stdout - {'loss': 0.3455, 'grad_norm': 1.5483275651931763, 'learning_rate': 4.243098531142034e-06, 'epoch': 2.11} +2025-02-06 00:33:55 - ERROR - stderr - 70%|███████ | 15803/22434 [14:26:15<4:37:23, 2.51s/it] +2025-02-06 00:33:58 - ERROR - stderr - 70%|███████ | 15804/22434 [14:26:18<4:43:09, 2.56s/it] +2025-02-06 00:33:58 - ERROR - stderr - +2025-02-06 00:33:58 - ERROR - stderr - +2025-02-06 00:33:58 - INFO - stdout - {'loss': 0.3581, 'grad_norm': 1.409231424331665, 'learning_rate': 4.241918085821547e-06, 'epoch': 2.11} +2025-02-06 00:33:58 - ERROR - stderr - 70%|███████ | 15804/22434 [14:26:18<4:43:09, 2.56s/it] +2025-02-06 00:34:00 - ERROR - stderr - 70%|███████ | 15805/22434 [14:26:20<4:40:06, 2.54s/it] +2025-02-06 00:34:00 - ERROR - stderr - +2025-02-06 00:34:00 - ERROR - stderr - +2025-02-06 00:34:00 - INFO - stdout - {'loss': 0.4254, 
'grad_norm': 1.7263003587722778, 'learning_rate': 4.2407377605227715e-06, 'epoch': 2.11} +2025-02-06 00:34:00 - ERROR - stderr - 70%|███████ | 15805/22434 [14:26:20<4:40:06, 2.54s/it] +2025-02-06 00:34:03 - ERROR - stderr - 70%|███████ | 15806/22434 [14:26:23<4:40:26, 2.54s/it] +2025-02-06 00:34:03 - ERROR - stderr - +2025-02-06 00:34:03 - ERROR - stderr - +2025-02-06 00:34:03 - INFO - stdout - {'loss': 0.3382, 'grad_norm': 1.403637170791626, 'learning_rate': 4.2395575552702996e-06, 'epoch': 2.11} +2025-02-06 00:34:03 - ERROR - stderr - 70%|███████ | 15806/22434 [14:26:23<4:40:26, 2.54s/it] +2025-02-06 00:34:05 - ERROR - stderr - 70%|███████ | 15807/22434 [14:26:25<4:41:52, 2.55s/it] +2025-02-06 00:34:05 - ERROR - stderr - +2025-02-06 00:34:05 - ERROR - stderr - +2025-02-06 00:34:05 - INFO - stdout - {'loss': 0.3248, 'grad_norm': 1.482252836227417, 'learning_rate': 4.238377470088745e-06, 'epoch': 2.11} +2025-02-06 00:34:05 - ERROR - stderr - 70%|███████ | 15807/22434 [14:26:25<4:41:52, 2.55s/it] +2025-02-06 00:34:08 - ERROR - stderr - 70%|███████ | 15808/22434 [14:26:28<4:41:55, 2.55s/it] +2025-02-06 00:34:08 - ERROR - stderr - +2025-02-06 00:34:08 - ERROR - stderr - +2025-02-06 00:34:08 - INFO - stdout - {'loss': 0.3911, 'grad_norm': 1.4706225395202637, 'learning_rate': 4.2371975050026915e-06, 'epoch': 2.11} +2025-02-06 00:34:08 - ERROR - stderr - 70%|███████ | 15808/22434 [14:26:28<4:41:55, 2.55s/it] +2025-02-06 00:34:11 - ERROR - stderr - 70%|███████ | 15809/22434 [14:26:30<4:42:44, 2.56s/it] +2025-02-06 00:34:11 - ERROR - stderr - +2025-02-06 00:34:11 - ERROR - stderr - +2025-02-06 00:34:11 - INFO - stdout - {'loss': 0.3618, 'grad_norm': 1.3683924674987793, 'learning_rate': 4.236017660036745e-06, 'epoch': 2.11} +2025-02-06 00:34:11 - ERROR - stderr - 70%|███████ | 15809/22434 [14:26:30<4:42:44, 2.56s/it] +2025-02-06 00:34:13 - ERROR - stderr - 70%|███████ | 15810/22434 [14:26:33<4:44:53, 2.58s/it] +2025-02-06 00:34:13 - ERROR - stderr - +2025-02-06 00:34:13 - 
ERROR - stderr - +2025-02-06 00:34:13 - INFO - stdout - {'loss': 0.3786, 'grad_norm': 1.46933114528656, 'learning_rate': 4.2348379352155e-06, 'epoch': 2.11} +2025-02-06 00:34:13 - ERROR - stderr - 70%|███████ | 15810/22434 [14:26:33<4:44:53, 2.58s/it] +2025-02-06 00:34:16 - ERROR - stderr - 70%|███████ | 15811/22434 [14:26:35<4:41:24, 2.55s/it] +2025-02-06 00:34:16 - ERROR - stderr - +2025-02-06 00:34:16 - ERROR - stderr - +2025-02-06 00:34:16 - INFO - stdout - {'loss': 0.359, 'grad_norm': 1.4610997438430786, 'learning_rate': 4.233658330563533e-06, 'epoch': 2.11} +2025-02-06 00:34:16 - ERROR - stderr - 70%|███████ | 15811/22434 [14:26:35<4:41:24, 2.55s/it] +2025-02-06 00:34:18 - ERROR - stderr - 70%|███████ | 15812/22434 [14:26:38<4:46:32, 2.60s/it] +2025-02-06 00:34:18 - ERROR - stderr - +2025-02-06 00:34:18 - ERROR - stderr - +2025-02-06 00:34:18 - INFO - stdout - {'loss': 0.4046, 'grad_norm': 1.5270748138427734, 'learning_rate': 4.232478846105447e-06, 'epoch': 2.11} +2025-02-06 00:34:18 - ERROR - stderr - 70%|███████ | 15812/22434 [14:26:38<4:46:32, 2.60s/it] +2025-02-06 00:34:21 - ERROR - stderr - 70%|███████ | 15813/22434 [14:26:41<4:40:47, 2.54s/it] +2025-02-06 00:34:21 - ERROR - stderr - +2025-02-06 00:34:21 - ERROR - stderr - +2025-02-06 00:34:21 - INFO - stdout - {'loss': 0.3528, 'grad_norm': 1.4847596883773804, 'learning_rate': 4.231299481865818e-06, 'epoch': 2.11} +2025-02-06 00:34:21 - ERROR - stderr - 70%|███████ | 15813/22434 [14:26:41<4:40:47, 2.54s/it] +2025-02-06 00:34:23 - ERROR - stderr - 70%|███████ | 15814/22434 [14:26:43<4:39:36, 2.53s/it] +2025-02-06 00:34:23 - ERROR - stderr - +2025-02-06 00:34:23 - ERROR - stderr - +2025-02-06 00:34:23 - INFO - stdout - {'loss': 0.3816, 'grad_norm': 1.5384604930877686, 'learning_rate': 4.230120237869232e-06, 'epoch': 2.11} +2025-02-06 00:34:23 - ERROR - stderr - 70%|███████ | 15814/22434 [14:26:43<4:39:36, 2.53s/it] +2025-02-06 00:34:26 - ERROR - stderr - 70%|███████ | 15815/22434 [14:26:46<4:38:04, 
2.52s/it] +2025-02-06 00:34:26 - ERROR - stderr - +2025-02-06 00:34:26 - ERROR - stderr - +2025-02-06 00:34:26 - INFO - stdout - {'loss': 0.3923, 'grad_norm': 1.4346331357955933, 'learning_rate': 4.228941114140267e-06, 'epoch': 2.11} +2025-02-06 00:34:26 - ERROR - stderr - 70%|███████ | 15815/22434 [14:26:46<4:38:04, 2.52s/it] +2025-02-06 00:34:28 - ERROR - stderr - 71%|███████ | 15816/22434 [14:26:48<4:36:16, 2.50s/it] +2025-02-06 00:34:28 - ERROR - stderr - +2025-02-06 00:34:28 - ERROR - stderr - +2025-02-06 00:34:28 - INFO - stdout - {'loss': 0.3625, 'grad_norm': 1.6083632707595825, 'learning_rate': 4.227762110703499e-06, 'epoch': 2.12} +2025-02-06 00:34:28 - ERROR - stderr - 71%|███████ | 15816/22434 [14:26:48<4:36:16, 2.50s/it] +2025-02-06 00:34:31 - ERROR - stderr - 71%|███████ | 15817/22434 [14:26:51<4:44:20, 2.58s/it] +2025-02-06 00:34:31 - ERROR - stderr - +2025-02-06 00:34:31 - ERROR - stderr - +2025-02-06 00:34:31 - INFO - stdout - {'loss': 0.3771, 'grad_norm': 1.4010009765625, 'learning_rate': 4.226583227583514e-06, 'epoch': 2.12} +2025-02-06 00:34:31 - ERROR - stderr - 71%|███████ | 15817/22434 [14:26:51<4:44:20, 2.58s/it] +2025-02-06 00:34:34 - ERROR - stderr - 71%|███████ | 15818/22434 [14:26:54<4:54:08, 2.67s/it] +2025-02-06 00:34:34 - ERROR - stderr - +2025-02-06 00:34:34 - ERROR - stderr - +2025-02-06 00:34:34 - INFO - stdout - {'loss': 0.3904, 'grad_norm': 1.3030776977539062, 'learning_rate': 4.225404464804873e-06, 'epoch': 2.12} +2025-02-06 00:34:34 - ERROR - stderr - 71%|███████ | 15818/22434 [14:26:54<4:54:08, 2.67s/it] +2025-02-06 00:34:36 - ERROR - stderr - 71%|███████ | 15819/22434 [14:26:56<4:46:42, 2.60s/it] +2025-02-06 00:34:36 - ERROR - stderr - +2025-02-06 00:34:36 - ERROR - stderr - +2025-02-06 00:34:36 - INFO - stdout - {'loss': 0.3919, 'grad_norm': 1.5039422512054443, 'learning_rate': 4.224225822392149e-06, 'epoch': 2.12} +2025-02-06 00:34:36 - ERROR - stderr - 71%|███████ | 15819/22434 [14:26:56<4:46:42, 2.60s/it] +2025-02-06 
00:34:39 - ERROR - stderr - 71%|███████ | 15820/22434 [14:26:59<4:42:36, 2.56s/it] +2025-02-06 00:34:39 - ERROR - stderr - +2025-02-06 00:34:39 - ERROR - stderr - +2025-02-06 00:34:39 - INFO - stdout - {'loss': 0.4022, 'grad_norm': 1.7316020727157593, 'learning_rate': 4.223047300369914e-06, 'epoch': 2.12} +2025-02-06 00:34:39 - ERROR - stderr - 71%|███████ | 15820/22434 [14:26:59<4:42:36, 2.56s/it] +2025-02-06 00:34:41 - ERROR - stderr - 71%|███████ | 15821/22434 [14:27:01<4:43:45, 2.57s/it] +2025-02-06 00:34:41 - ERROR - stderr - +2025-02-06 00:34:41 - ERROR - stderr - +2025-02-06 00:34:41 - INFO - stdout - {'loss': 0.3761, 'grad_norm': 1.443379282951355, 'learning_rate': 4.2218688987627276e-06, 'epoch': 2.12} +2025-02-06 00:34:41 - ERROR - stderr - 71%|███████ | 15821/22434 [14:27:01<4:43:45, 2.57s/it] +2025-02-06 00:34:44 - ERROR - stderr - 71%|███████ | 15822/22434 [14:27:04<4:46:55, 2.60s/it] +2025-02-06 00:34:44 - ERROR - stderr - +2025-02-06 00:34:44 - ERROR - stderr - +2025-02-06 00:34:44 - INFO - stdout - {'loss': 0.3698, 'grad_norm': 1.3161598443984985, 'learning_rate': 4.220690617595155e-06, 'epoch': 2.12} +2025-02-06 00:34:44 - ERROR - stderr - 71%|███████ | 15822/22434 [14:27:04<4:46:55, 2.60s/it] +2025-02-06 00:34:47 - ERROR - stderr - 71%|███████ | 15823/22434 [14:27:06<4:44:07, 2.58s/it] +2025-02-06 00:34:47 - ERROR - stderr - +2025-02-06 00:34:47 - ERROR - stderr - +2025-02-06 00:34:47 - INFO - stdout - {'loss': 0.3589, 'grad_norm': 1.494821310043335, 'learning_rate': 4.2195124568917574e-06, 'epoch': 2.12} +2025-02-06 00:34:47 - ERROR - stderr - 71%|███████ | 15823/22434 [14:27:06<4:44:07, 2.58s/it] +2025-02-06 00:34:49 - ERROR - stderr - 71%|███████ | 15824/22434 [14:27:09<4:41:40, 2.56s/it] +2025-02-06 00:34:49 - ERROR - stderr - +2025-02-06 00:34:49 - ERROR - stderr - +2025-02-06 00:34:49 - INFO - stdout - {'loss': 0.3794, 'grad_norm': 1.4729195833206177, 'learning_rate': 4.218334416677091e-06, 'epoch': 2.12} +2025-02-06 00:34:49 - ERROR - 
stderr - 71%|███████ | 15824/22434 [14:27:09<4:41:40, 2.56s/it] +2025-02-06 00:34:52 - ERROR - stderr - 71%|███████ | 15825/22434 [14:27:11<4:39:56, 2.54s/it] +2025-02-06 00:34:52 - ERROR - stderr - +2025-02-06 00:34:52 - ERROR - stderr - +2025-02-06 00:34:52 - INFO - stdout - {'loss': 0.3542, 'grad_norm': 1.4879413843154907, 'learning_rate': 4.217156496975711e-06, 'epoch': 2.12} +2025-02-06 00:34:52 - ERROR - stderr - 71%|███████ | 15825/22434 [14:27:11<4:39:56, 2.54s/it] +2025-02-06 00:34:54 - ERROR - stderr - 71%|███████ | 15826/22434 [14:27:14<4:41:34, 2.56s/it] +2025-02-06 00:34:54 - ERROR - stderr - +2025-02-06 00:34:54 - ERROR - stderr - +2025-02-06 00:34:54 - INFO - stdout - {'loss': 0.384, 'grad_norm': 1.3683109283447266, 'learning_rate': 4.215978697812174e-06, 'epoch': 2.12} +2025-02-06 00:34:54 - ERROR - stderr - 71%|███████ | 15826/22434 [14:27:14<4:41:34, 2.56s/it] +2025-02-06 00:34:57 - ERROR - stderr - 71%|███████ | 15827/22434 [14:27:16<4:40:12, 2.54s/it] +2025-02-06 00:34:57 - ERROR - stderr - +2025-02-06 00:34:57 - ERROR - stderr - +2025-02-06 00:34:57 - INFO - stdout - {'loss': 0.396, 'grad_norm': 1.455397129058838, 'learning_rate': 4.214801019211019e-06, 'epoch': 2.12} +2025-02-06 00:34:57 - ERROR - stderr - 71%|███████ | 15827/22434 [14:27:16<4:40:12, 2.54s/it] +2025-02-06 00:34:59 - ERROR - stderr - 71%|███████ | 15828/22434 [14:27:19<4:38:16, 2.53s/it] +2025-02-06 00:34:59 - ERROR - stderr - +2025-02-06 00:34:59 - ERROR - stderr - +2025-02-06 00:34:59 - INFO - stdout - {'loss': 0.3665, 'grad_norm': 1.5102944374084473, 'learning_rate': 4.213623461196804e-06, 'epoch': 2.12} +2025-02-06 00:34:59 - ERROR - stderr - 71%|███████ | 15828/22434 [14:27:19<4:38:16, 2.53s/it] +2025-02-06 00:35:02 - ERROR - stderr - 71%|███████ | 15829/22434 [14:27:22<4:45:56, 2.60s/it] +2025-02-06 00:35:02 - ERROR - stderr - +2025-02-06 00:35:02 - ERROR - stderr - +2025-02-06 00:35:02 - INFO - stdout - {'loss': 0.4199, 'grad_norm': 1.746572732925415, 'learning_rate': 
4.212446023794076e-06, 'epoch': 2.12} +2025-02-06 00:35:02 - ERROR - stderr - 71%|███████ | 15829/22434 [14:27:22<4:45:56, 2.60s/it] +2025-02-06 00:35:04 - ERROR - stderr - 71%|███████ | 15830/22434 [14:27:24<4:43:36, 2.58s/it] +2025-02-06 00:35:04 - ERROR - stderr - +2025-02-06 00:35:04 - ERROR - stderr - +2025-02-06 00:35:04 - INFO - stdout - {'loss': 0.3453, 'grad_norm': 1.533099889755249, 'learning_rate': 4.211268707027364e-06, 'epoch': 2.12} +2025-02-06 00:35:04 - ERROR - stderr - 71%|███████ | 15830/22434 [14:27:24<4:43:36, 2.58s/it] +2025-02-06 00:35:07 - ERROR - stderr - 71%|███████ | 15831/22434 [14:27:27<4:40:54, 2.55s/it] +2025-02-06 00:35:07 - ERROR - stderr - +2025-02-06 00:35:07 - ERROR - stderr - +2025-02-06 00:35:07 - INFO - stdout - {'loss': 0.3598, 'grad_norm': 1.5608463287353516, 'learning_rate': 4.210091510921225e-06, 'epoch': 2.12} +2025-02-06 00:35:07 - ERROR - stderr - 71%|███████ | 15831/22434 [14:27:27<4:40:54, 2.55s/it] +2025-02-06 00:35:10 - ERROR - stderr - 71%|███████ | 15832/22434 [14:27:29<4:45:45, 2.60s/it] +2025-02-06 00:35:10 - ERROR - stderr - +2025-02-06 00:35:10 - ERROR - stderr - +2025-02-06 00:35:10 - INFO - stdout - {'loss': 0.3585, 'grad_norm': 1.8351783752441406, 'learning_rate': 4.20891443550018e-06, 'epoch': 2.12} +2025-02-06 00:35:10 - ERROR - stderr - 71%|███████ | 15832/22434 [14:27:29<4:45:45, 2.60s/it] +2025-02-06 00:35:12 - ERROR - stderr - 71%|███████ | 15833/22434 [14:27:32<4:48:20, 2.62s/it] +2025-02-06 00:35:12 - ERROR - stderr - +2025-02-06 00:35:12 - ERROR - stderr - +2025-02-06 00:35:12 - INFO - stdout - {'loss': 0.3446, 'grad_norm': 1.517020583152771, 'learning_rate': 4.207737480788779e-06, 'epoch': 2.12} +2025-02-06 00:35:12 - ERROR - stderr - 71%|███████ | 15833/22434 [14:27:32<4:48:20, 2.62s/it] +2025-02-06 00:35:15 - ERROR - stderr - 71%|███████ | 15834/22434 [14:27:35<4:50:37, 2.64s/it] +2025-02-06 00:35:15 - ERROR - stderr - +2025-02-06 00:35:15 - ERROR - stderr - +2025-02-06 00:35:15 - INFO - stdout - 
{'loss': 0.3777, 'grad_norm': 1.4805076122283936, 'learning_rate': 4.206560646811545e-06, 'epoch': 2.12} +2025-02-06 00:35:15 - ERROR - stderr - 71%|███████ | 15834/22434 [14:27:35<4:50:37, 2.64s/it] +2025-02-06 00:35:17 - ERROR - stderr - 71%|███████ | 15835/22434 [14:27:37<4:44:11, 2.58s/it] +2025-02-06 00:35:18 - ERROR - stderr - +2025-02-06 00:35:18 - ERROR - stderr - +2025-02-06 00:35:18 - INFO - stdout - {'loss': 0.4432, 'grad_norm': 1.7181388139724731, 'learning_rate': 4.205383933593006e-06, 'epoch': 2.12} +2025-02-06 00:35:18 - ERROR - stderr - 71%|███████ | 15835/22434 [14:27:37<4:44:11, 2.58s/it] +2025-02-06 00:35:20 - ERROR - stderr - 71%|███████ | 15836/22434 [14:27:40<4:44:03, 2.58s/it] +2025-02-06 00:35:20 - ERROR - stderr - +2025-02-06 00:35:20 - ERROR - stderr - +2025-02-06 00:35:20 - INFO - stdout - {'loss': 0.3933, 'grad_norm': 1.6780328750610352, 'learning_rate': 4.204207341157702e-06, 'epoch': 2.12} +2025-02-06 00:35:20 - ERROR - stderr - 71%|███████ | 15836/22434 [14:27:40<4:44:03, 2.58s/it] +2025-02-06 00:35:23 - ERROR - stderr - 71%|███████ | 15837/22434 [14:27:42<4:41:26, 2.56s/it] +2025-02-06 00:35:23 - ERROR - stderr - +2025-02-06 00:35:23 - ERROR - stderr - +2025-02-06 00:35:23 - INFO - stdout - {'loss': 0.3864, 'grad_norm': 1.4277105331420898, 'learning_rate': 4.2030308695301455e-06, 'epoch': 2.12} +2025-02-06 00:35:23 - ERROR - stderr - 71%|███████ | 15837/22434 [14:27:42<4:41:26, 2.56s/it] +2025-02-06 00:35:25 - ERROR - stderr - 71%|███████ | 15838/22434 [14:27:45<4:42:08, 2.57s/it] +2025-02-06 00:35:25 - ERROR - stderr - +2025-02-06 00:35:25 - ERROR - stderr - +2025-02-06 00:35:25 - INFO - stdout - {'loss': 0.3941, 'grad_norm': 1.689159631729126, 'learning_rate': 4.2018545187348645e-06, 'epoch': 2.12} +2025-02-06 00:35:25 - ERROR - stderr - 71%|███████ | 15838/22434 [14:27:45<4:42:08, 2.57s/it] +2025-02-06 00:35:28 - ERROR - stderr - 71%|███████ | 15839/22434 [14:27:47<4:40:07, 2.55s/it] +2025-02-06 00:35:28 - ERROR - stderr - 
+2025-02-06 00:35:28 - ERROR - stderr - +2025-02-06 00:35:28 - INFO - stdout - {'loss': 0.3813, 'grad_norm': 1.4892306327819824, 'learning_rate': 4.200678288796378e-06, 'epoch': 2.12} +2025-02-06 00:35:28 - ERROR - stderr - 71%|███████ | 15839/22434 [14:27:47<4:40:07, 2.55s/it] +2025-02-06 00:35:30 - ERROR - stderr - 71%|███████ | 15840/22434 [14:27:50<4:37:19, 2.52s/it] +2025-02-06 00:35:30 - ERROR - stderr - +2025-02-06 00:35:30 - ERROR - stderr - +2025-02-06 00:35:30 - INFO - stdout - {'loss': 0.3655, 'grad_norm': 1.466098666191101, 'learning_rate': 4.199502179739202e-06, 'epoch': 2.12} +2025-02-06 00:35:30 - ERROR - stderr - 71%|███████ | 15840/22434 [14:27:50<4:37:19, 2.52s/it] +2025-02-06 00:35:33 - ERROR - stderr - 71%|███████ | 15841/22434 [14:27:52<4:36:00, 2.51s/it] +2025-02-06 00:35:33 - ERROR - stderr - +2025-02-06 00:35:33 - ERROR - stderr - +2025-02-06 00:35:33 - INFO - stdout - {'loss': 0.4329, 'grad_norm': 1.473799467086792, 'learning_rate': 4.1983261915878535e-06, 'epoch': 2.12} +2025-02-06 00:35:33 - ERROR - stderr - 71%|███████ | 15841/22434 [14:27:52<4:36:00, 2.51s/it] +2025-02-06 00:35:35 - ERROR - stderr - 71%|███████ | 15842/22434 [14:27:55<4:38:16, 2.53s/it] +2025-02-06 00:35:35 - ERROR - stderr - +2025-02-06 00:35:35 - ERROR - stderr - +2025-02-06 00:35:35 - INFO - stdout - {'loss': 0.3504, 'grad_norm': 1.4501913785934448, 'learning_rate': 4.197150324366844e-06, 'epoch': 2.12} +2025-02-06 00:35:35 - ERROR - stderr - 71%|███████ | 15842/22434 [14:27:55<4:38:16, 2.53s/it] +2025-02-06 00:35:38 - ERROR - stderr - 71%|███████ | 15843/22434 [14:27:57<4:36:03, 2.51s/it] +2025-02-06 00:35:38 - ERROR - stderr - +2025-02-06 00:35:38 - ERROR - stderr - +2025-02-06 00:35:38 - INFO - stdout - {'loss': 0.4147, 'grad_norm': 1.538856863975525, 'learning_rate': 4.1959745781006835e-06, 'epoch': 2.12} +2025-02-06 00:35:38 - ERROR - stderr - 71%|███████ | 15843/22434 [14:27:57<4:36:03, 2.51s/it] +2025-02-06 00:35:40 - ERROR - stderr - 71%|███████ | 15844/22434 
[14:28:00<4:37:26, 2.53s/it] +2025-02-06 00:35:40 - ERROR - stderr - +2025-02-06 00:35:40 - ERROR - stderr - +2025-02-06 00:35:40 - INFO - stdout - {'loss': 0.3535, 'grad_norm': 1.2100977897644043, 'learning_rate': 4.194798952813878e-06, 'epoch': 2.12} +2025-02-06 00:35:40 - ERROR - stderr - 71%|███████ | 15844/22434 [14:28:00<4:37:26, 2.53s/it] +2025-02-06 00:35:43 - ERROR - stderr - 71%|███████ | 15845/22434 [14:28:02<4:34:42, 2.50s/it] +2025-02-06 00:35:43 - ERROR - stderr - +2025-02-06 00:35:43 - ERROR - stderr - +2025-02-06 00:35:43 - INFO - stdout - {'loss': 0.3707, 'grad_norm': 1.507686734199524, 'learning_rate': 4.193623448530937e-06, 'epoch': 2.12} +2025-02-06 00:35:43 - ERROR - stderr - 71%|███████ | 15845/22434 [14:28:02<4:34:42, 2.50s/it] +2025-02-06 00:35:45 - ERROR - stderr - 71%|███████ | 15846/22434 [14:28:05<4:32:48, 2.48s/it] +2025-02-06 00:35:45 - ERROR - stderr - +2025-02-06 00:35:45 - ERROR - stderr - +2025-02-06 00:35:45 - INFO - stdout - {'loss': 0.4113, 'grad_norm': 1.5177654027938843, 'learning_rate': 4.192448065276352e-06, 'epoch': 2.12} +2025-02-06 00:35:45 - ERROR - stderr - 71%|███████ | 15846/22434 [14:28:05<4:32:48, 2.48s/it] +2025-02-06 00:35:48 - ERROR - stderr - 71%|███████ | 15847/22434 [14:28:08<4:39:31, 2.55s/it] +2025-02-06 00:35:48 - ERROR - stderr - +2025-02-06 00:35:48 - ERROR - stderr - +2025-02-06 00:35:48 - INFO - stdout - {'loss': 0.3759, 'grad_norm': 1.3500256538391113, 'learning_rate': 4.191272803074634e-06, 'epoch': 2.12} +2025-02-06 00:35:48 - ERROR - stderr - 71%|███████ | 15847/22434 [14:28:08<4:39:31, 2.55s/it] +2025-02-06 00:35:51 - ERROR - stderr - 71%|███████ | 15848/22434 [14:28:10<4:48:03, 2.62s/it] +2025-02-06 00:35:51 - ERROR - stderr - +2025-02-06 00:35:51 - ERROR - stderr - +2025-02-06 00:35:51 - INFO - stdout - {'loss': 0.4211, 'grad_norm': 1.655120611190796, 'learning_rate': 4.190097661950277e-06, 'epoch': 2.12} +2025-02-06 00:35:51 - ERROR - stderr - 71%|███████ | 15848/22434 [14:28:10<4:48:03, 
2.62s/it] +2025-02-06 00:35:53 - ERROR - stderr - 71%|███████ | 15849/22434 [14:28:13<4:47:10, 2.62s/it] +2025-02-06 00:35:53 - ERROR - stderr - +2025-02-06 00:35:53 - ERROR - stderr - +2025-02-06 00:35:53 - INFO - stdout - {'loss': 0.3963, 'grad_norm': 1.3412137031555176, 'learning_rate': 4.188922641927773e-06, 'epoch': 2.12} +2025-02-06 00:35:53 - ERROR - stderr - 71%|███████ | 15849/22434 [14:28:13<4:47:10, 2.62s/it] +2025-02-06 00:35:56 - ERROR - stderr - 71%|███████ | 15850/22434 [14:28:15<4:41:16, 2.56s/it] +2025-02-06 00:35:56 - ERROR - stderr - +2025-02-06 00:35:56 - ERROR - stderr - +2025-02-06 00:35:56 - INFO - stdout - {'loss': 0.3918, 'grad_norm': 1.4473352432250977, 'learning_rate': 4.18774774303162e-06, 'epoch': 2.12} +2025-02-06 00:35:56 - ERROR - stderr - 71%|███████ | 15850/22434 [14:28:15<4:41:16, 2.56s/it] +2025-02-06 00:35:58 - ERROR - stderr - 71%|███████ | 15851/22434 [14:28:18<4:41:11, 2.56s/it] +2025-02-06 00:35:58 - ERROR - stderr - +2025-02-06 00:35:58 - ERROR - stderr - +2025-02-06 00:35:58 - INFO - stdout - {'loss': 0.3822, 'grad_norm': 1.572608232498169, 'learning_rate': 4.186572965286297e-06, 'epoch': 2.12} +2025-02-06 00:35:58 - ERROR - stderr - 71%|███████ | 15851/22434 [14:28:18<4:41:11, 2.56s/it] +2025-02-06 00:36:01 - ERROR - stderr - 71%|███████ | 15852/22434 [14:28:21<4:44:27, 2.59s/it] +2025-02-06 00:36:01 - ERROR - stderr - +2025-02-06 00:36:01 - ERROR - stderr - +2025-02-06 00:36:01 - INFO - stdout - {'loss': 0.4115, 'grad_norm': 1.4750779867172241, 'learning_rate': 4.185398308716304e-06, 'epoch': 2.12} +2025-02-06 00:36:01 - ERROR - stderr - 71%|███████ | 15852/22434 [14:28:21<4:44:27, 2.59s/it] +2025-02-06 00:36:03 - ERROR - stderr - 71%|███████ | 15853/22434 [14:28:23<4:40:29, 2.56s/it] +2025-02-06 00:36:03 - ERROR - stderr - +2025-02-06 00:36:03 - ERROR - stderr - +2025-02-06 00:36:03 - INFO - stdout - {'loss': 0.3829, 'grad_norm': 1.395971417427063, 'learning_rate': 4.1842237733461166e-06, 'epoch': 2.12} +2025-02-06 
00:36:03 - ERROR - stderr - 71%|███████ | 15853/22434 [14:28:23<4:40:29, 2.56s/it] +2025-02-06 00:36:06 - ERROR - stderr - 71%|███████ | 15854/22434 [14:28:26<4:36:58, 2.53s/it] +2025-02-06 00:36:06 - ERROR - stderr - +2025-02-06 00:36:06 - ERROR - stderr - +2025-02-06 00:36:06 - INFO - stdout - {'loss': 0.3927, 'grad_norm': 1.5216400623321533, 'learning_rate': 4.183049359200215e-06, 'epoch': 2.12} +2025-02-06 00:36:06 - ERROR - stderr - 71%|███████ | 15854/22434 [14:28:26<4:36:58, 2.53s/it] +2025-02-06 00:36:08 - ERROR - stderr - 71%|███████ | 15855/22434 [14:28:28<4:34:04, 2.50s/it] +2025-02-06 00:36:08 - ERROR - stderr - +2025-02-06 00:36:08 - ERROR - stderr - +2025-02-06 00:36:08 - INFO - stdout - {'loss': 0.3848, 'grad_norm': 1.5628246068954468, 'learning_rate': 4.181875066303092e-06, 'epoch': 2.12} +2025-02-06 00:36:08 - ERROR - stderr - 71%|███████ | 15855/22434 [14:28:28<4:34:04, 2.50s/it] +2025-02-06 00:36:11 - ERROR - stderr - 71%|███████ | 15856/22434 [14:28:31<4:36:04, 2.52s/it] +2025-02-06 00:36:11 - ERROR - stderr - +2025-02-06 00:36:11 - ERROR - stderr - +2025-02-06 00:36:11 - INFO - stdout - {'loss': 0.3793, 'grad_norm': 1.364039659500122, 'learning_rate': 4.1807008946792075e-06, 'epoch': 2.12} +2025-02-06 00:36:11 - ERROR - stderr - 71%|███████ | 15856/22434 [14:28:31<4:36:04, 2.52s/it] +2025-02-06 00:36:13 - ERROR - stderr - 71%|███████ | 15857/22434 [14:28:33<4:34:18, 2.50s/it] +2025-02-06 00:36:13 - ERROR - stderr - +2025-02-06 00:36:13 - ERROR - stderr - +2025-02-06 00:36:13 - INFO - stdout - {'loss': 0.3906, 'grad_norm': 1.2929723262786865, 'learning_rate': 4.179526844353051e-06, 'epoch': 2.12} +2025-02-06 00:36:13 - ERROR - stderr - 71%|███████ | 15857/22434 [14:28:33<4:34:18, 2.50s/it] +2025-02-06 00:36:16 - ERROR - stderr - 71%|███████ | 15858/22434 [14:28:36<4:40:19, 2.56s/it] +2025-02-06 00:36:16 - ERROR - stderr - +2025-02-06 00:36:16 - ERROR - stderr - +2025-02-06 00:36:16 - INFO - stdout - {'loss': 0.3667, 'grad_norm': 
1.3052074909210205, 'learning_rate': 4.178352915349085e-06, 'epoch': 2.12} +2025-02-06 00:36:16 - ERROR - stderr - 71%|███████ | 15858/22434 [14:28:36<4:40:19, 2.56s/it] +2025-02-06 00:36:18 - ERROR - stderr - 71%|███████ | 15859/22434 [14:28:38<4:39:58, 2.55s/it] +2025-02-06 00:36:19 - ERROR - stderr - +2025-02-06 00:36:19 - ERROR - stderr - +2025-02-06 00:36:19 - INFO - stdout - {'loss': 0.3991, 'grad_norm': 1.6489425897598267, 'learning_rate': 4.177179107691782e-06, 'epoch': 2.12} +2025-02-06 00:36:19 - ERROR - stderr - 71%|███████ | 15859/22434 [14:28:38<4:39:58, 2.55s/it] +2025-02-06 00:36:21 - ERROR - stderr - 71%|███████ | 15860/22434 [14:28:41<4:38:05, 2.54s/it] +2025-02-06 00:36:21 - ERROR - stderr - +2025-02-06 00:36:21 - ERROR - stderr - +2025-02-06 00:36:21 - INFO - stdout - {'loss': 0.4271, 'grad_norm': 1.7088359594345093, 'learning_rate': 4.176005421405609e-06, 'epoch': 2.12} +2025-02-06 00:36:21 - ERROR - stderr - 71%|███████ | 15860/22434 [14:28:41<4:38:05, 2.54s/it] +2025-02-06 00:36:23 - ERROR - stderr - 71%|███████ | 15861/22434 [14:28:43<4:37:07, 2.53s/it] +2025-02-06 00:36:24 - ERROR - stderr - +2025-02-06 00:36:24 - ERROR - stderr - +2025-02-06 00:36:24 - INFO - stdout - {'loss': 0.3842, 'grad_norm': 1.4561560153961182, 'learning_rate': 4.174831856515029e-06, 'epoch': 2.12} +2025-02-06 00:36:24 - ERROR - stderr - 71%|███████ | 15861/22434 [14:28:43<4:37:07, 2.53s/it] +2025-02-06 00:36:26 - ERROR - stderr - 71%|███████ | 15862/22434 [14:28:46<4:36:07, 2.52s/it] +2025-02-06 00:36:26 - ERROR - stderr - +2025-02-06 00:36:26 - ERROR - stderr - +2025-02-06 00:36:26 - INFO - stdout - {'loss': 0.3748, 'grad_norm': 1.3666518926620483, 'learning_rate': 4.173658413044506e-06, 'epoch': 2.12} +2025-02-06 00:36:26 - ERROR - stderr - 71%|███████ | 15862/22434 [14:28:46<4:36:07, 2.52s/it] +2025-02-06 00:36:28 - ERROR - stderr - 71%|███████ | 15863/22434 [14:28:48<4:34:13, 2.50s/it] +2025-02-06 00:36:28 - ERROR - stderr - +2025-02-06 00:36:28 - ERROR - stderr 
- +2025-02-06 00:36:28 - INFO - stdout - {'loss': 0.3643, 'grad_norm': 1.4363411664962769, 'learning_rate': 4.172485091018498e-06, 'epoch': 2.12} +2025-02-06 00:36:28 - ERROR - stderr - 71%|███████ | 15863/22434 [14:28:48<4:34:13, 2.50s/it] +2025-02-06 00:36:31 - ERROR - stderr - 71%|███████ | 15864/22434 [14:28:51<4:34:43, 2.51s/it] +2025-02-06 00:36:31 - ERROR - stderr - +2025-02-06 00:36:31 - ERROR - stderr - +2025-02-06 00:36:31 - INFO - stdout - {'loss': 0.3992, 'grad_norm': 1.3198888301849365, 'learning_rate': 4.171311890461461e-06, 'epoch': 2.12} +2025-02-06 00:36:31 - ERROR - stderr - 71%|███████ | 15864/22434 [14:28:51<4:34:43, 2.51s/it] +2025-02-06 00:36:33 - ERROR - stderr - 71%|███████ | 15865/22434 [14:28:53<4:35:25, 2.52s/it] +2025-02-06 00:36:34 - ERROR - stderr - +2025-02-06 00:36:34 - ERROR - stderr - +2025-02-06 00:36:34 - INFO - stdout - {'loss': 0.4146, 'grad_norm': 1.4109219312667847, 'learning_rate': 4.17013881139785e-06, 'epoch': 2.12} +2025-02-06 00:36:34 - ERROR - stderr - 71%|███████ | 15865/22434 [14:28:53<4:35:25, 2.52s/it] +2025-02-06 00:36:36 - ERROR - stderr - 71%|███████ | 15866/22434 [14:28:56<4:33:59, 2.50s/it] +2025-02-06 00:36:36 - ERROR - stderr - +2025-02-06 00:36:36 - ERROR - stderr - +2025-02-06 00:36:36 - INFO - stdout - {'loss': 0.3713, 'grad_norm': 1.5402634143829346, 'learning_rate': 4.1689658538521185e-06, 'epoch': 2.12} +2025-02-06 00:36:36 - ERROR - stderr - 71%|███████ | 15866/22434 [14:28:56<4:33:59, 2.50s/it] +2025-02-06 00:36:38 - ERROR - stderr - 71%|███████ | 15867/22434 [14:28:58<4:31:59, 2.49s/it] +2025-02-06 00:36:38 - ERROR - stderr - +2025-02-06 00:36:38 - ERROR - stderr - +2025-02-06 00:36:38 - INFO - stdout - {'loss': 0.3915, 'grad_norm': 1.6806546449661255, 'learning_rate': 4.167793017848712e-06, 'epoch': 2.12} +2025-02-06 00:36:38 - ERROR - stderr - 71%|███████ | 15867/22434 [14:28:58<4:31:59, 2.49s/it] +2025-02-06 00:36:41 - ERROR - stderr - 71%|███████ | 15868/22434 [14:29:01<4:33:43, 2.50s/it] 
+2025-02-06 00:36:41 - ERROR - stderr - +2025-02-06 00:36:41 - ERROR - stderr - +2025-02-06 00:36:41 - INFO - stdout - {'loss': 0.4215, 'grad_norm': 1.5720099210739136, 'learning_rate': 4.166620303412081e-06, 'epoch': 2.12} +2025-02-06 00:36:41 - ERROR - stderr - 71%|███████ | 15868/22434 [14:29:01<4:33:43, 2.50s/it] +2025-02-06 00:36:43 - ERROR - stderr - 71%|███████ | 15869/22434 [14:29:03<4:34:29, 2.51s/it] +2025-02-06 00:36:44 - ERROR - stderr - +2025-02-06 00:36:44 - ERROR - stderr - +2025-02-06 00:36:44 - INFO - stdout - {'loss': 0.4155, 'grad_norm': 1.5742303133010864, 'learning_rate': 4.165447710566671e-06, 'epoch': 2.12} +2025-02-06 00:36:44 - ERROR - stderr - 71%|███████ | 15869/22434 [14:29:03<4:34:29, 2.51s/it] +2025-02-06 00:36:46 - ERROR - stderr - 71%|███████ | 15870/22434 [14:29:06<4:32:54, 2.49s/it] +2025-02-06 00:36:46 - ERROR - stderr - +2025-02-06 00:36:46 - ERROR - stderr - +2025-02-06 00:36:46 - INFO - stdout - {'loss': 0.3651, 'grad_norm': 1.4051637649536133, 'learning_rate': 4.164275239336914e-06, 'epoch': 2.12} +2025-02-06 00:36:46 - ERROR - stderr - 71%|███████ | 15870/22434 [14:29:06<4:32:54, 2.49s/it] +2025-02-06 00:36:48 - ERROR - stderr - 71%|███████ | 15871/22434 [14:29:08<4:35:02, 2.51s/it] +2025-02-06 00:36:49 - ERROR - stderr - +2025-02-06 00:36:49 - ERROR - stderr - +2025-02-06 00:36:49 - INFO - stdout - {'loss': 0.3961, 'grad_norm': 1.4366191625595093, 'learning_rate': 4.16310288974726e-06, 'epoch': 2.12} +2025-02-06 00:36:49 - ERROR - stderr - 71%|███████ | 15871/22434 [14:29:08<4:35:02, 2.51s/it] +2025-02-06 00:36:51 - ERROR - stderr - 71%|███████ | 15872/22434 [14:29:11<4:35:23, 2.52s/it] +2025-02-06 00:36:51 - ERROR - stderr - +2025-02-06 00:36:51 - ERROR - stderr - +2025-02-06 00:36:51 - INFO - stdout - {'loss': 0.3556, 'grad_norm': 1.4073134660720825, 'learning_rate': 4.161930661822137e-06, 'epoch': 2.12} +2025-02-06 00:36:51 - ERROR - stderr - 71%|███████ | 15872/22434 [14:29:11<4:35:23, 2.52s/it] +2025-02-06 00:36:54 - 
ERROR - stderr - 71%|███████ | 15873/22434 [14:29:13<4:39:01, 2.55s/it] +2025-02-06 00:36:54 - ERROR - stderr - +2025-02-06 00:36:54 - ERROR - stderr - +2025-02-06 00:36:54 - INFO - stdout - {'loss': 0.3813, 'grad_norm': 1.5554659366607666, 'learning_rate': 4.160758555585984e-06, 'epoch': 2.12} +2025-02-06 00:36:54 - ERROR - stderr - 71%|███████ | 15873/22434 [14:29:13<4:39:01, 2.55s/it] +2025-02-06 00:36:56 - ERROR - stderr - 71%|███████ | 15874/22434 [14:29:16<4:37:03, 2.53s/it] +2025-02-06 00:36:56 - ERROR - stderr - +2025-02-06 00:36:56 - ERROR - stderr - +2025-02-06 00:36:56 - INFO - stdout - {'loss': 0.4434, 'grad_norm': 1.6783851385116577, 'learning_rate': 4.1595865710632366e-06, 'epoch': 2.12} +2025-02-06 00:36:56 - ERROR - stderr - 71%|███████ | 15874/22434 [14:29:16<4:37:03, 2.53s/it] +2025-02-06 00:36:59 - ERROR - stderr - 71%|███████ | 15875/22434 [14:29:18<4:35:07, 2.52s/it] +2025-02-06 00:36:59 - ERROR - stderr - +2025-02-06 00:36:59 - ERROR - stderr - +2025-02-06 00:36:59 - INFO - stdout - {'loss': 0.4342, 'grad_norm': 1.5492866039276123, 'learning_rate': 4.15841470827831e-06, 'epoch': 2.12} +2025-02-06 00:36:59 - ERROR - stderr - 71%|███████ | 15875/22434 [14:29:18<4:35:07, 2.52s/it] +2025-02-06 00:37:01 - ERROR - stderr - 71%|███████ | 15876/22434 [14:29:21<4:34:14, 2.51s/it] +2025-02-06 00:37:01 - ERROR - stderr - +2025-02-06 00:37:01 - ERROR - stderr - +2025-02-06 00:37:01 - INFO - stdout - {'loss': 0.3399, 'grad_norm': 1.425931692123413, 'learning_rate': 4.157242967255647e-06, 'epoch': 2.12} +2025-02-06 00:37:01 - ERROR - stderr - 71%|███████ | 15876/22434 [14:29:21<4:34:14, 2.51s/it] +2025-02-06 00:37:04 - ERROR - stderr - 71%|███████ | 15877/22434 [14:29:24<4:38:03, 2.54s/it] +2025-02-06 00:37:04 - ERROR - stderr - +2025-02-06 00:37:04 - ERROR - stderr - +2025-02-06 00:37:04 - INFO - stdout - {'loss': 0.4187, 'grad_norm': 1.5937637090682983, 'learning_rate': 4.15607134801966e-06, 'epoch': 2.12} +2025-02-06 00:37:04 - ERROR - stderr - 
71%|███████ | 15877/22434 [14:29:24<4:38:03, 2.54s/it] +2025-02-06 00:37:06 - ERROR - stderr - 71%|███████ | 15878/22434 [14:29:26<4:37:57, 2.54s/it] +2025-02-06 00:37:06 - ERROR - stderr - +2025-02-06 00:37:06 - ERROR - stderr - +2025-02-06 00:37:06 - INFO - stdout - {'loss': 0.4214, 'grad_norm': 1.7412713766098022, 'learning_rate': 4.154899850594774e-06, 'epoch': 2.12} +2025-02-06 00:37:06 - ERROR - stderr - 71%|███████ | 15878/22434 [14:29:26<4:37:57, 2.54s/it] +2025-02-06 00:37:09 - ERROR - stderr - 71%|███████ | 15879/22434 [14:29:29<4:35:17, 2.52s/it] +2025-02-06 00:37:09 - ERROR - stderr - +2025-02-06 00:37:09 - ERROR - stderr - +2025-02-06 00:37:09 - INFO - stdout - {'loss': 0.3837, 'grad_norm': 1.5094362497329712, 'learning_rate': 4.153728475005406e-06, 'epoch': 2.12} +2025-02-06 00:37:09 - ERROR - stderr - 71%|███████ | 15879/22434 [14:29:29<4:35:17, 2.52s/it] +2025-02-06 00:37:11 - ERROR - stderr - 71%|███████ | 15880/22434 [14:29:31<4:35:21, 2.52s/it] +2025-02-06 00:37:11 - ERROR - stderr - +2025-02-06 00:37:11 - ERROR - stderr - +2025-02-06 00:37:11 - INFO - stdout - {'loss': 0.3512, 'grad_norm': 1.645103096961975, 'learning_rate': 4.152557221275975e-06, 'epoch': 2.12} +2025-02-06 00:37:11 - ERROR - stderr - 71%|███████ | 15880/22434 [14:29:31<4:35:21, 2.52s/it] +2025-02-06 00:37:14 - ERROR - stderr - 71%|███████ | 15881/22434 [14:29:33<4:31:53, 2.49s/it] +2025-02-06 00:37:14 - ERROR - stderr - +2025-02-06 00:37:14 - ERROR - stderr - +2025-02-06 00:37:14 - INFO - stdout - {'loss': 0.388, 'grad_norm': 1.4615051746368408, 'learning_rate': 4.151386089430892e-06, 'epoch': 2.12} +2025-02-06 00:37:14 - ERROR - stderr - 71%|███████ | 15881/22434 [14:29:33<4:31:53, 2.49s/it] +2025-02-06 00:37:16 - ERROR - stderr - 71%|███████ | 15882/22434 [14:29:36<4:41:39, 2.58s/it] +2025-02-06 00:37:17 - ERROR - stderr - +2025-02-06 00:37:17 - ERROR - stderr - +2025-02-06 00:37:17 - INFO - stdout - {'loss': 0.363, 'grad_norm': 1.3701961040496826, 'learning_rate': 
4.1502150794945705e-06, 'epoch': 2.12} +2025-02-06 00:37:17 - ERROR - stderr - 71%|███████ | 15882/22434 [14:29:36<4:41:39, 2.58s/it] +2025-02-06 00:37:19 - ERROR - stderr - 71%|███████ | 15883/22434 [14:29:39<4:39:40, 2.56s/it] +2025-02-06 00:37:19 - ERROR - stderr - +2025-02-06 00:37:19 - ERROR - stderr - +2025-02-06 00:37:19 - INFO - stdout - {'loss': 0.4058, 'grad_norm': 1.322082281112671, 'learning_rate': 4.149044191491418e-06, 'epoch': 2.12} +2025-02-06 00:37:19 - ERROR - stderr - 71%|███████ | 15883/22434 [14:29:39<4:39:40, 2.56s/it] +2025-02-06 00:37:21 - ERROR - stderr - 71%|███████ | 15884/22434 [14:29:41<4:37:36, 2.54s/it] +2025-02-06 00:37:22 - ERROR - stderr - +2025-02-06 00:37:22 - ERROR - stderr - +2025-02-06 00:37:22 - INFO - stdout - {'loss': 0.3722, 'grad_norm': 1.5117310285568237, 'learning_rate': 4.147873425445839e-06, 'epoch': 2.12} +2025-02-06 00:37:22 - ERROR - stderr - 71%|███████ | 15884/22434 [14:29:41<4:37:36, 2.54s/it] +2025-02-06 00:37:24 - ERROR - stderr - 71%|███████ | 15885/22434 [14:29:44<4:33:40, 2.51s/it] +2025-02-06 00:37:24 - ERROR - stderr - +2025-02-06 00:37:24 - ERROR - stderr - +2025-02-06 00:37:24 - INFO - stdout - {'loss': 0.3531, 'grad_norm': 1.3842909336090088, 'learning_rate': 4.146702781382242e-06, 'epoch': 2.12} +2025-02-06 00:37:24 - ERROR - stderr - 71%|███████ | 15885/22434 [14:29:44<4:33:40, 2.51s/it] +2025-02-06 00:37:26 - ERROR - stderr - 71%|███████ | 15886/22434 [14:29:46<4:32:28, 2.50s/it] +2025-02-06 00:37:26 - ERROR - stderr - +2025-02-06 00:37:26 - ERROR - stderr - +2025-02-06 00:37:26 - INFO - stdout - {'loss': 0.3705, 'grad_norm': 1.3393534421920776, 'learning_rate': 4.1455322593250216e-06, 'epoch': 2.12} +2025-02-06 00:37:26 - ERROR - stderr - 71%|███████ | 15886/22434 [14:29:46<4:32:28, 2.50s/it] +2025-02-06 00:37:29 - ERROR - stderr - 71%|███████ | 15887/22434 [14:29:49<4:32:43, 2.50s/it] +2025-02-06 00:37:29 - ERROR - stderr - +2025-02-06 00:37:29 - ERROR - stderr - +2025-02-06 00:37:29 - INFO - 
stdout - {'loss': 0.4062, 'grad_norm': 1.5686941146850586, 'learning_rate': 4.14436185929858e-06, 'epoch': 2.12} +2025-02-06 00:37:29 - ERROR - stderr - 71%|███████ | 15887/22434 [14:29:49<4:32:43, 2.50s/it] +2025-02-06 00:37:31 - ERROR - stderr - 71%|███████ | 15888/22434 [14:29:51<4:33:48, 2.51s/it] +2025-02-06 00:37:31 - ERROR - stderr - +2025-02-06 00:37:31 - ERROR - stderr - +2025-02-06 00:37:31 - INFO - stdout - {'loss': 0.3976, 'grad_norm': 1.5218119621276855, 'learning_rate': 4.1431915813273124e-06, 'epoch': 2.12} +2025-02-06 00:37:31 - ERROR - stderr - 71%|███████ | 15888/22434 [14:29:51<4:33:48, 2.51s/it] +2025-02-06 00:37:34 - ERROR - stderr - 71%|███████ | 15889/22434 [14:29:54<4:33:19, 2.51s/it] +2025-02-06 00:37:34 - ERROR - stderr - +2025-02-06 00:37:34 - ERROR - stderr - +2025-02-06 00:37:34 - INFO - stdout - {'loss': 0.3997, 'grad_norm': 1.502213478088379, 'learning_rate': 4.142021425435612e-06, 'epoch': 2.12} +2025-02-06 00:37:34 - ERROR - stderr - 71%|███████ | 15889/22434 [14:29:54<4:33:19, 2.51s/it] +2025-02-06 00:37:36 - ERROR - stderr - 71%|███████ | 15890/22434 [14:29:56<4:34:09, 2.51s/it] +2025-02-06 00:37:37 - ERROR - stderr - +2025-02-06 00:37:37 - ERROR - stderr - +2025-02-06 00:37:37 - INFO - stdout - {'loss': 0.3855, 'grad_norm': 1.6279336214065552, 'learning_rate': 4.140851391647872e-06, 'epoch': 2.12} +2025-02-06 00:37:37 - ERROR - stderr - 71%|███████ | 15890/22434 [14:29:56<4:34:09, 2.51s/it] +2025-02-06 00:37:39 - ERROR - stderr - 71%|███████ | 15891/22434 [14:29:59<4:34:54, 2.52s/it] +2025-02-06 00:37:39 - ERROR - stderr - +2025-02-06 00:37:39 - ERROR - stderr - +2025-02-06 00:37:39 - INFO - stdout - {'loss': 0.39, 'grad_norm': 1.5856924057006836, 'learning_rate': 4.139681479988472e-06, 'epoch': 2.13} +2025-02-06 00:37:39 - ERROR - stderr - 71%|███████ | 15891/22434 [14:29:59<4:34:54, 2.52s/it] +2025-02-06 00:37:41 - ERROR - stderr - 71%|███████ | 15892/22434 [14:30:01<4:32:28, 2.50s/it] +2025-02-06 00:37:41 - ERROR - stderr - 
+2025-02-06 00:37:41 - ERROR - stderr - +2025-02-06 00:37:41 - INFO - stdout - {'loss': 0.4107, 'grad_norm': 1.6946572065353394, 'learning_rate': 4.138511690481808e-06, 'epoch': 2.13} +2025-02-06 00:37:41 - ERROR - stderr - 71%|███████ | 15892/22434 [14:30:01<4:32:28, 2.50s/it] +2025-02-06 00:37:44 - ERROR - stderr - 71%|███████ | 15893/22434 [14:30:04<4:33:24, 2.51s/it] +2025-02-06 00:37:44 - ERROR - stderr - +2025-02-06 00:37:44 - ERROR - stderr - +2025-02-06 00:37:44 - INFO - stdout - {'loss': 0.3469, 'grad_norm': 1.4954934120178223, 'learning_rate': 4.137342023152257e-06, 'epoch': 2.13} +2025-02-06 00:37:44 - ERROR - stderr - 71%|███████ | 15893/22434 [14:30:04<4:33:24, 2.51s/it] +2025-02-06 00:37:46 - ERROR - stderr - 71%|███████ | 15894/22434 [14:30:06<4:32:44, 2.50s/it] +2025-02-06 00:37:47 - ERROR - stderr - +2025-02-06 00:37:47 - ERROR - stderr - +2025-02-06 00:37:47 - INFO - stdout - {'loss': 0.3377, 'grad_norm': 1.484686017036438, 'learning_rate': 4.136172478024203e-06, 'epoch': 2.13} +2025-02-06 00:37:47 - ERROR - stderr - 71%|███████ | 15894/22434 [14:30:06<4:32:44, 2.50s/it] +2025-02-06 00:37:49 - ERROR - stderr - 71%|███████ | 15895/22434 [14:30:09<4:31:41, 2.49s/it] +2025-02-06 00:37:49 - ERROR - stderr - +2025-02-06 00:37:49 - ERROR - stderr - +2025-02-06 00:37:49 - INFO - stdout - {'loss': 0.409, 'grad_norm': 1.4771796464920044, 'learning_rate': 4.135003055122027e-06, 'epoch': 2.13} +2025-02-06 00:37:49 - ERROR - stderr - 71%|███████ | 15895/22434 [14:30:09<4:31:41, 2.49s/it] +2025-02-06 00:37:51 - ERROR - stderr - 71%|███████ | 15896/22434 [14:30:11<4:33:14, 2.51s/it] +2025-02-06 00:37:52 - ERROR - stderr - +2025-02-06 00:37:52 - ERROR - stderr - +2025-02-06 00:37:52 - INFO - stdout - {'loss': 0.3643, 'grad_norm': 1.5144153833389282, 'learning_rate': 4.133833754470091e-06, 'epoch': 2.13} +2025-02-06 00:37:52 - ERROR - stderr - 71%|███████ | 15896/22434 [14:30:11<4:33:14, 2.51s/it] +2025-02-06 00:37:54 - ERROR - stderr - 71%|███████ | 15897/22434 
[14:30:14<4:37:48, 2.55s/it] +2025-02-06 00:37:54 - ERROR - stderr - +2025-02-06 00:37:54 - ERROR - stderr - +2025-02-06 00:37:54 - INFO - stdout - {'loss': 0.3769, 'grad_norm': 1.5525412559509277, 'learning_rate': 4.132664576092785e-06, 'epoch': 2.13} +2025-02-06 00:37:54 - ERROR - stderr - 71%|███████ | 15897/22434 [14:30:14<4:37:48, 2.55s/it] +2025-02-06 00:37:57 - ERROR - stderr - 71%|███████ | 15898/22434 [14:30:17<4:45:43, 2.62s/it] +2025-02-06 00:37:57 - ERROR - stderr - +2025-02-06 00:37:57 - ERROR - stderr - +2025-02-06 00:37:57 - INFO - stdout - {'loss': 0.345, 'grad_norm': 1.3438653945922852, 'learning_rate': 4.131495520014469e-06, 'epoch': 2.13} +2025-02-06 00:37:57 - ERROR - stderr - 71%|███████ | 15898/22434 [14:30:17<4:45:43, 2.62s/it] +2025-02-06 00:37:59 - ERROR - stderr - 71%|███████ | 15899/22434 [14:30:19<4:39:00, 2.56s/it] +2025-02-06 00:37:59 - ERROR - stderr - +2025-02-06 00:37:59 - ERROR - stderr - +2025-02-06 00:37:59 - INFO - stdout - {'loss': 0.3691, 'grad_norm': 1.4278593063354492, 'learning_rate': 4.130326586259509e-06, 'epoch': 2.13} +2025-02-06 00:37:59 - ERROR - stderr - 71%|███████ | 15899/22434 [14:30:19<4:39:00, 2.56s/it] +2025-02-06 00:38:02 - ERROR - stderr - 71%|███████ | 15900/22434 [14:30:22<4:38:36, 2.56s/it] +2025-02-06 00:38:02 - ERROR - stderr - +2025-02-06 00:38:02 - ERROR - stderr - +2025-02-06 00:38:02 - INFO - stdout - {'loss': 0.3717, 'grad_norm': 1.6261132955551147, 'learning_rate': 4.129157774852282e-06, 'epoch': 2.13} +2025-02-06 00:38:02 - ERROR - stderr - 71%|███████ | 15900/22434 [14:30:22<4:38:36, 2.56s/it] +2025-02-06 00:38:04 - ERROR - stderr - 71%|███████ | 15901/22434 [14:30:24<4:35:52, 2.53s/it] +2025-02-06 00:38:04 - ERROR - stderr - +2025-02-06 00:38:04 - ERROR - stderr - +2025-02-06 00:38:04 - INFO - stdout - {'loss': 0.3659, 'grad_norm': 1.4129383563995361, 'learning_rate': 4.127989085817135e-06, 'epoch': 2.13} +2025-02-06 00:38:04 - ERROR - stderr - 71%|███████ | 15901/22434 [14:30:24<4:35:52, 
2.53s/it] +2025-02-06 00:38:07 - ERROR - stderr - 71%|███████ | 15902/22434 [14:30:27<4:33:49, 2.52s/it] +2025-02-06 00:38:07 - ERROR - stderr - +2025-02-06 00:38:07 - ERROR - stderr - +2025-02-06 00:38:07 - INFO - stdout - {'loss': 0.425, 'grad_norm': 1.6368474960327148, 'learning_rate': 4.126820519178445e-06, 'epoch': 2.13} +2025-02-06 00:38:07 - ERROR - stderr - 71%|███████ | 15902/22434 [14:30:27<4:33:49, 2.52s/it] +2025-02-06 00:38:09 - ERROR - stderr - 71%|███████ | 15903/22434 [14:30:29<4:33:23, 2.51s/it] +2025-02-06 00:38:09 - ERROR - stderr - +2025-02-06 00:38:09 - ERROR - stderr - +2025-02-06 00:38:09 - INFO - stdout - {'loss': 0.3861, 'grad_norm': 1.5429695844650269, 'learning_rate': 4.125652074960556e-06, 'epoch': 2.13} +2025-02-06 00:38:09 - ERROR - stderr - 71%|███████ | 15903/22434 [14:30:29<4:33:23, 2.51s/it] +2025-02-06 00:38:12 - ERROR - stderr - 71%|███████ | 15904/22434 [14:30:32<4:37:59, 2.55s/it] +2025-02-06 00:38:12 - ERROR - stderr - +2025-02-06 00:38:12 - ERROR - stderr - +2025-02-06 00:38:12 - INFO - stdout - {'loss': 0.3443, 'grad_norm': 1.3731828927993774, 'learning_rate': 4.124483753187831e-06, 'epoch': 2.13} +2025-02-06 00:38:12 - ERROR - stderr - 71%|███████ | 15904/22434 [14:30:32<4:37:59, 2.55s/it] +2025-02-06 00:38:15 - ERROR - stderr - 71%|███████ | 15905/22434 [14:30:34<4:38:36, 2.56s/it] +2025-02-06 00:38:15 - ERROR - stderr - +2025-02-06 00:38:15 - ERROR - stderr - +2025-02-06 00:38:15 - INFO - stdout - {'loss': 0.4191, 'grad_norm': 1.4839287996292114, 'learning_rate': 4.123315553884618e-06, 'epoch': 2.13} +2025-02-06 00:38:15 - ERROR - stderr - 71%|███████ | 15905/22434 [14:30:34<4:38:36, 2.56s/it] +2025-02-06 00:38:17 - ERROR - stderr - 71%|███████ | 15906/22434 [14:30:37<4:36:26, 2.54s/it] +2025-02-06 00:38:17 - ERROR - stderr - +2025-02-06 00:38:17 - ERROR - stderr - +2025-02-06 00:38:17 - INFO - stdout - {'loss': 0.3862, 'grad_norm': 1.428155779838562, 'learning_rate': 4.12214747707527e-06, 'epoch': 2.13} +2025-02-06 
00:38:17 - ERROR - stderr - 71%|███████ | 15906/22434 [14:30:37<4:36:26, 2.54s/it] +2025-02-06 00:38:20 - ERROR - stderr - 71%|███████ | 15907/22434 [14:30:39<4:37:35, 2.55s/it] +2025-02-06 00:38:20 - ERROR - stderr - +2025-02-06 00:38:20 - ERROR - stderr - +2025-02-06 00:38:20 - INFO - stdout - {'loss': 0.3636, 'grad_norm': 1.480543613433838, 'learning_rate': 4.120979522784132e-06, 'epoch': 2.13} +2025-02-06 00:38:20 - ERROR - stderr - 71%|███████ | 15907/22434 [14:30:39<4:37:35, 2.55s/it] +2025-02-06 00:38:22 - ERROR - stderr - 71%|███████ | 15908/22434 [14:30:42<4:33:53, 2.52s/it] +2025-02-06 00:38:22 - ERROR - stderr - +2025-02-06 00:38:22 - ERROR - stderr - +2025-02-06 00:38:22 - INFO - stdout - {'loss': 0.4109, 'grad_norm': 1.5258797407150269, 'learning_rate': 4.119811691035551e-06, 'epoch': 2.13} +2025-02-06 00:38:22 - ERROR - stderr - 71%|███████ | 15908/22434 [14:30:42<4:33:53, 2.52s/it] +2025-02-06 00:38:25 - ERROR - stderr - 71%|███████ | 15909/22434 [14:30:44<4:32:50, 2.51s/it] +2025-02-06 00:38:25 - ERROR - stderr - +2025-02-06 00:38:25 - ERROR - stderr - +2025-02-06 00:38:25 - INFO - stdout - {'loss': 0.4021, 'grad_norm': 1.6359429359436035, 'learning_rate': 4.118643981853869e-06, 'epoch': 2.13} +2025-02-06 00:38:25 - ERROR - stderr - 71%|███████ | 15909/22434 [14:30:44<4:32:50, 2.51s/it] +2025-02-06 00:38:27 - ERROR - stderr - 71%|███████ | 15910/22434 [14:30:47<4:33:36, 2.52s/it] +2025-02-06 00:38:27 - ERROR - stderr - +2025-02-06 00:38:27 - ERROR - stderr - +2025-02-06 00:38:27 - INFO - stdout - {'loss': 0.3884, 'grad_norm': 1.7595237493515015, 'learning_rate': 4.1174763952634255e-06, 'epoch': 2.13} +2025-02-06 00:38:27 - ERROR - stderr - 71%|███████ | 15910/22434 [14:30:47<4:33:36, 2.52s/it] +2025-02-06 00:38:30 - ERROR - stderr - 71%|███████ | 15911/22434 [14:30:49<4:30:53, 2.49s/it] +2025-02-06 00:38:30 - ERROR - stderr - +2025-02-06 00:38:30 - ERROR - stderr - +2025-02-06 00:38:30 - INFO - stdout - {'loss': 0.3718, 'grad_norm': 
1.4064629077911377, 'learning_rate': 4.116308931288556e-06, 'epoch': 2.13} +2025-02-06 00:38:30 - ERROR - stderr - 71%|███████ | 15911/22434 [14:30:49<4:30:53, 2.49s/it] +2025-02-06 00:38:32 - ERROR - stderr - 71%|███████ | 15912/22434 [14:30:52<4:30:48, 2.49s/it] +2025-02-06 00:38:32 - ERROR - stderr - +2025-02-06 00:38:32 - ERROR - stderr - +2025-02-06 00:38:32 - INFO - stdout - {'loss': 0.3656, 'grad_norm': 1.576346755027771, 'learning_rate': 4.115141589953599e-06, 'epoch': 2.13} +2025-02-06 00:38:32 - ERROR - stderr - 71%|███████ | 15912/22434 [14:30:52<4:30:48, 2.49s/it] +2025-02-06 00:38:35 - ERROR - stderr - 71%|███████ | 15913/22434 [14:30:54<4:32:06, 2.50s/it] +2025-02-06 00:38:35 - ERROR - stderr - +2025-02-06 00:38:35 - ERROR - stderr - +2025-02-06 00:38:35 - INFO - stdout - {'loss': 0.4112, 'grad_norm': 1.5196313858032227, 'learning_rate': 4.113974371282883e-06, 'epoch': 2.13} +2025-02-06 00:38:35 - ERROR - stderr - 71%|███████ | 15913/22434 [14:30:54<4:32:06, 2.50s/it] +2025-02-06 00:38:37 - ERROR - stderr - 71%|███████ | 15914/22434 [14:30:57<4:36:03, 2.54s/it] +2025-02-06 00:38:37 - ERROR - stderr - +2025-02-06 00:38:37 - ERROR - stderr - +2025-02-06 00:38:37 - INFO - stdout - {'loss': 0.3765, 'grad_norm': 1.350335955619812, 'learning_rate': 4.112807275300742e-06, 'epoch': 2.13} +2025-02-06 00:38:37 - ERROR - stderr - 71%|███████ | 15914/22434 [14:30:57<4:36:03, 2.54s/it] +2025-02-06 00:38:40 - ERROR - stderr - 71%|███████ | 15915/22434 [14:31:00<4:38:47, 2.57s/it] +2025-02-06 00:38:40 - ERROR - stderr - +2025-02-06 00:38:40 - ERROR - stderr - +2025-02-06 00:38:40 - INFO - stdout - {'loss': 0.3773, 'grad_norm': 1.6362286806106567, 'learning_rate': 4.111640302031494e-06, 'epoch': 2.13} +2025-02-06 00:38:40 - ERROR - stderr - 71%|███████ | 15915/22434 [14:31:00<4:38:47, 2.57s/it] +2025-02-06 00:38:42 - ERROR - stderr - 71%|███████ | 15916/22434 [14:31:02<4:37:06, 2.55s/it] +2025-02-06 00:38:42 - ERROR - stderr - +2025-02-06 00:38:42 - ERROR - stderr - 
+2025-02-06 00:38:42 - INFO - stdout - {'loss': 0.3983, 'grad_norm': 1.516675591468811, 'learning_rate': 4.110473451499476e-06, 'epoch': 2.13} +2025-02-06 00:38:42 - ERROR - stderr - 71%|███████ | 15916/22434 [14:31:02<4:37:06, 2.55s/it] +2025-02-06 00:38:45 - ERROR - stderr - 71%|███████ | 15917/22434 [14:31:05<4:36:43, 2.55s/it] +2025-02-06 00:38:45 - ERROR - stderr - +2025-02-06 00:38:45 - ERROR - stderr - +2025-02-06 00:38:45 - INFO - stdout - {'loss': 0.3917, 'grad_norm': 1.464879035949707, 'learning_rate': 4.109306723728995e-06, 'epoch': 2.13} +2025-02-06 00:38:45 - ERROR - stderr - 71%|███████ | 15917/22434 [14:31:05<4:36:43, 2.55s/it] +2025-02-06 00:38:47 - ERROR - stderr - 71%|███████ | 15918/22434 [14:31:07<4:38:38, 2.57s/it] +2025-02-06 00:38:48 - ERROR - stderr - +2025-02-06 00:38:48 - ERROR - stderr - +2025-02-06 00:38:48 - INFO - stdout - {'loss': 0.3972, 'grad_norm': 1.5611447095870972, 'learning_rate': 4.108140118744383e-06, 'epoch': 2.13} +2025-02-06 00:38:48 - ERROR - stderr - 71%|███████ | 15918/22434 [14:31:07<4:38:38, 2.57s/it] +2025-02-06 00:38:50 - ERROR - stderr - 71%|███████ | 15919/22434 [14:31:10<4:37:45, 2.56s/it] +2025-02-06 00:38:50 - ERROR - stderr - +2025-02-06 00:38:50 - ERROR - stderr - +2025-02-06 00:38:50 - INFO - stdout - {'loss': 0.3664, 'grad_norm': 1.5515780448913574, 'learning_rate': 4.106973636569956e-06, 'epoch': 2.13} +2025-02-06 00:38:50 - ERROR - stderr - 71%|███████ | 15919/22434 [14:31:10<4:37:45, 2.56s/it] +2025-02-06 00:38:53 - ERROR - stderr - 71%|███████ | 15920/22434 [14:31:12<4:41:00, 2.59s/it] +2025-02-06 00:38:53 - ERROR - stderr - +2025-02-06 00:38:53 - ERROR - stderr - +2025-02-06 00:38:53 - INFO - stdout - {'loss': 0.3729, 'grad_norm': 1.392643690109253, 'learning_rate': 4.105807277230018e-06, 'epoch': 2.13} +2025-02-06 00:38:53 - ERROR - stderr - 71%|███████ | 15920/22434 [14:31:12<4:41:00, 2.59s/it] +2025-02-06 00:38:55 - ERROR - stderr - 71%|███████ | 15921/22434 [14:31:15<4:37:59, 2.56s/it] +2025-02-06 
00:38:55 - ERROR - stderr - +2025-02-06 00:38:55 - ERROR - stderr - +2025-02-06 00:38:55 - INFO - stdout - {'loss': 0.3749, 'grad_norm': 1.5708081722259521, 'learning_rate': 4.104641040748894e-06, 'epoch': 2.13} +2025-02-06 00:38:55 - ERROR - stderr - 71%|███████ | 15921/22434 [14:31:15<4:37:59, 2.56s/it] +2025-02-06 00:38:58 - ERROR - stderr - 71%|███████ | 15922/22434 [14:31:18<4:40:19, 2.58s/it] +2025-02-06 00:38:58 - ERROR - stderr - +2025-02-06 00:38:58 - ERROR - stderr - +2025-02-06 00:38:58 - INFO - stdout - {'loss': 0.3493, 'grad_norm': 1.4403773546218872, 'learning_rate': 4.103474927150882e-06, 'epoch': 2.13} +2025-02-06 00:38:58 - ERROR - stderr - 71%|███████ | 15922/22434 [14:31:18<4:40:19, 2.58s/it] +2025-02-06 00:39:00 - ERROR - stderr - 71%|███████ | 15923/22434 [14:31:20<4:37:19, 2.56s/it] +2025-02-06 00:39:00 - ERROR - stderr - +2025-02-06 00:39:00 - ERROR - stderr - +2025-02-06 00:39:00 - INFO - stdout - {'loss': 0.3799, 'grad_norm': 1.3526792526245117, 'learning_rate': 4.1023089364602945e-06, 'epoch': 2.13} +2025-02-06 00:39:00 - ERROR - stderr - 71%|███████ | 15923/22434 [14:31:20<4:37:19, 2.56s/it] +2025-02-06 00:39:03 - ERROR - stderr - 71%|███████ | 15924/22434 [14:31:23<4:36:38, 2.55s/it] +2025-02-06 00:39:03 - ERROR - stderr - +2025-02-06 00:39:03 - ERROR - stderr - +2025-02-06 00:39:03 - INFO - stdout - {'loss': 0.3512, 'grad_norm': 1.5631285905838013, 'learning_rate': 4.101143068701432e-06, 'epoch': 2.13} +2025-02-06 00:39:03 - ERROR - stderr - 71%|███████ | 15924/22434 [14:31:23<4:36:38, 2.55s/it] +2025-02-06 00:39:05 - ERROR - stderr - 71%|███████ | 15925/22434 [14:31:25<4:35:19, 2.54s/it] +2025-02-06 00:39:05 - ERROR - stderr - +2025-02-06 00:39:05 - ERROR - stderr - +2025-02-06 00:39:05 - INFO - stdout - {'loss': 0.4104, 'grad_norm': 1.5933613777160645, 'learning_rate': 4.0999773238985975e-06, 'epoch': 2.13} +2025-02-06 00:39:05 - ERROR - stderr - 71%|███████ | 15925/22434 [14:31:25<4:35:19, 2.54s/it] +2025-02-06 00:39:08 - ERROR - 
stderr - 71%|███████ | 15926/22434 [14:31:28<4:33:50, 2.52s/it] +2025-02-06 00:39:08 - ERROR - stderr - +2025-02-06 00:39:08 - ERROR - stderr - +2025-02-06 00:39:08 - INFO - stdout - {'loss': 0.3925, 'grad_norm': 1.5095676183700562, 'learning_rate': 4.098811702076091e-06, 'epoch': 2.13} +2025-02-06 00:39:08 - ERROR - stderr - 71%|███████ | 15926/22434 [14:31:28<4:33:50, 2.52s/it] +2025-02-06 00:39:10 - ERROR - stderr - 71%|███████ | 15927/22434 [14:31:30<4:32:22, 2.51s/it] +2025-02-06 00:39:10 - ERROR - stderr - +2025-02-06 00:39:10 - ERROR - stderr - +2025-02-06 00:39:10 - INFO - stdout - {'loss': 0.3601, 'grad_norm': 1.5459033250808716, 'learning_rate': 4.097646203258207e-06, 'epoch': 2.13} +2025-02-06 00:39:10 - ERROR - stderr - 71%|███████ | 15927/22434 [14:31:30<4:32:22, 2.51s/it] +2025-02-06 00:39:13 - ERROR - stderr - 71%|███████ | 15928/22434 [14:31:33<4:39:02, 2.57s/it] +2025-02-06 00:39:13 - ERROR - stderr - +2025-02-06 00:39:13 - ERROR - stderr - +2025-02-06 00:39:13 - INFO - stdout - {'loss': 0.4173, 'grad_norm': 1.650770664215088, 'learning_rate': 4.09648082746924e-06, 'epoch': 2.13} +2025-02-06 00:39:13 - ERROR - stderr - 71%|███████ | 15928/22434 [14:31:33<4:39:02, 2.57s/it] +2025-02-06 00:39:15 - ERROR - stderr - 71%|███████ | 15929/22434 [14:31:35<4:35:09, 2.54s/it] +2025-02-06 00:39:16 - ERROR - stderr - +2025-02-06 00:39:16 - ERROR - stderr - +2025-02-06 00:39:16 - INFO - stdout - {'loss': 0.3926, 'grad_norm': 1.4838576316833496, 'learning_rate': 4.095315574733482e-06, 'epoch': 2.13} +2025-02-06 00:39:16 - ERROR - stderr - 71%|███████ | 15929/22434 [14:31:35<4:35:09, 2.54s/it] +2025-02-06 00:39:18 - ERROR - stderr - 71%|███████ | 15930/22434 [14:31:38<4:33:16, 2.52s/it] +2025-02-06 00:39:18 - ERROR - stderr - +2025-02-06 00:39:18 - ERROR - stderr - +2025-02-06 00:39:18 - INFO - stdout - {'loss': 0.327, 'grad_norm': 1.3735183477401733, 'learning_rate': 4.09415044507522e-06, 'epoch': 2.13} +2025-02-06 00:39:18 - ERROR - stderr - 71%|███████ | 
15930/22434 [14:31:38<4:33:16, 2.52s/it] +2025-02-06 00:39:20 - ERROR - stderr - 71%|███████ | 15931/22434 [14:31:40<4:32:15, 2.51s/it] +2025-02-06 00:39:21 - ERROR - stderr - +2025-02-06 00:39:21 - ERROR - stderr - +2025-02-06 00:39:21 - INFO - stdout - {'loss': 0.3366, 'grad_norm': 1.4151078462600708, 'learning_rate': 4.09298543851874e-06, 'epoch': 2.13} +2025-02-06 00:39:21 - ERROR - stderr - 71%|███████ | 15931/22434 [14:31:40<4:32:15, 2.51s/it] +2025-02-06 00:39:23 - ERROR - stderr - 71%|███████ | 15932/22434 [14:31:43<4:30:11, 2.49s/it] +2025-02-06 00:39:23 - ERROR - stderr - +2025-02-06 00:39:23 - ERROR - stderr - +2025-02-06 00:39:23 - INFO - stdout - {'loss': 0.373, 'grad_norm': 1.3822365999221802, 'learning_rate': 4.091820555088327e-06, 'epoch': 2.13} +2025-02-06 00:39:23 - ERROR - stderr - 71%|███████ | 15932/22434 [14:31:43<4:30:11, 2.49s/it] +2025-02-06 00:39:25 - ERROR - stderr - 71%|███████ | 15933/22434 [14:31:45<4:30:19, 2.49s/it] +2025-02-06 00:39:25 - ERROR - stderr - +2025-02-06 00:39:25 - ERROR - stderr - +2025-02-06 00:39:25 - INFO - stdout - {'loss': 0.3796, 'grad_norm': 1.5044782161712646, 'learning_rate': 4.090655794808262e-06, 'epoch': 2.13} +2025-02-06 00:39:25 - ERROR - stderr - 71%|███████ | 15933/22434 [14:31:45<4:30:19, 2.49s/it] +2025-02-06 00:39:28 - ERROR - stderr - 71%|███████ | 15934/22434 [14:31:48<4:34:25, 2.53s/it] +2025-02-06 00:39:28 - ERROR - stderr - +2025-02-06 00:39:28 - ERROR - stderr - +2025-02-06 00:39:28 - INFO - stdout - {'loss': 0.3798, 'grad_norm': 1.5967580080032349, 'learning_rate': 4.089491157702821e-06, 'epoch': 2.13} +2025-02-06 00:39:28 - ERROR - stderr - 71%|███████ | 15934/22434 [14:31:48<4:34:25, 2.53s/it] +2025-02-06 00:39:31 - ERROR - stderr - 71%|███████ | 15935/22434 [14:31:50<4:34:33, 2.53s/it] +2025-02-06 00:39:31 - ERROR - stderr - +2025-02-06 00:39:31 - ERROR - stderr - +2025-02-06 00:39:31 - INFO - stdout - {'loss': 0.3762, 'grad_norm': 1.3753914833068848, 'learning_rate': 4.088326643796284e-06, 
'epoch': 2.13} +2025-02-06 00:39:31 - ERROR - stderr - 71%|███████ | 15935/22434 [14:31:50<4:34:33, 2.53s/it] +2025-02-06 00:39:33 - ERROR - stderr - 71%|███████ | 15936/22434 [14:31:53<4:33:29, 2.53s/it] +2025-02-06 00:39:33 - ERROR - stderr - +2025-02-06 00:39:33 - ERROR - stderr - +2025-02-06 00:39:33 - INFO - stdout - {'loss': 0.378, 'grad_norm': 1.4694678783416748, 'learning_rate': 4.087162253112915e-06, 'epoch': 2.13} +2025-02-06 00:39:33 - ERROR - stderr - 71%|███████ | 15936/22434 [14:31:53<4:33:29, 2.53s/it] +2025-02-06 00:39:36 - ERROR - stderr - 71%|███████ | 15937/22434 [14:31:56<4:46:43, 2.65s/it] +2025-02-06 00:39:36 - ERROR - stderr - +2025-02-06 00:39:36 - ERROR - stderr - +2025-02-06 00:39:36 - INFO - stdout - {'loss': 0.3489, 'grad_norm': 1.3661811351776123, 'learning_rate': 4.085997985676995e-06, 'epoch': 2.13} +2025-02-06 00:39:36 - ERROR - stderr - 71%|███████ | 15937/22434 [14:31:56<4:46:43, 2.65s/it] +2025-02-06 00:39:39 - ERROR - stderr - 71%|███████ | 15938/22434 [14:31:58<4:42:35, 2.61s/it] +2025-02-06 00:39:39 - ERROR - stderr - +2025-02-06 00:39:39 - ERROR - stderr - +2025-02-06 00:39:39 - INFO - stdout - {'loss': 0.3752, 'grad_norm': 1.537551999092102, 'learning_rate': 4.084833841512791e-06, 'epoch': 2.13} +2025-02-06 00:39:39 - ERROR - stderr - 71%|███████ | 15938/22434 [14:31:58<4:42:35, 2.61s/it] +2025-02-06 00:39:41 - ERROR - stderr - 71%|███████ | 15939/22434 [14:32:01<4:38:54, 2.58s/it] +2025-02-06 00:39:41 - ERROR - stderr - +2025-02-06 00:39:41 - ERROR - stderr - +2025-02-06 00:39:41 - INFO - stdout - {'loss': 0.3701, 'grad_norm': 1.6699230670928955, 'learning_rate': 4.083669820644558e-06, 'epoch': 2.13} +2025-02-06 00:39:41 - ERROR - stderr - 71%|███████ | 15939/22434 [14:32:01<4:38:54, 2.58s/it] +2025-02-06 00:39:44 - ERROR - stderr - 71%|███████ | 15940/22434 [14:32:03<4:36:51, 2.56s/it] +2025-02-06 00:39:44 - ERROR - stderr - +2025-02-06 00:39:44 - ERROR - stderr - +2025-02-06 00:39:44 - INFO - stdout - {'loss': 0.4136, 
'grad_norm': 1.5585919618606567, 'learning_rate': 4.0825059230965735e-06, 'epoch': 2.13} +2025-02-06 00:39:44 - ERROR - stderr - 71%|███████ | 15940/22434 [14:32:03<4:36:51, 2.56s/it] +2025-02-06 00:39:46 - ERROR - stderr - 71%|███████ | 15941/22434 [14:32:06<4:33:48, 2.53s/it] +2025-02-06 00:39:46 - ERROR - stderr - +2025-02-06 00:39:46 - ERROR - stderr - +2025-02-06 00:39:46 - INFO - stdout - {'loss': 0.4014, 'grad_norm': 1.7295082807540894, 'learning_rate': 4.081342148893083e-06, 'epoch': 2.13} +2025-02-06 00:39:46 - ERROR - stderr - 71%|███████ | 15941/22434 [14:32:06<4:33:48, 2.53s/it] +2025-02-06 00:39:48 - ERROR - stderr - 71%|███████ | 15942/22434 [14:32:08<4:31:22, 2.51s/it] +2025-02-06 00:39:49 - ERROR - stderr - +2025-02-06 00:39:49 - ERROR - stderr - +2025-02-06 00:39:49 - INFO - stdout - {'loss': 0.3696, 'grad_norm': 1.4812641143798828, 'learning_rate': 4.080178498058359e-06, 'epoch': 2.13} +2025-02-06 00:39:49 - ERROR - stderr - 71%|███████ | 15942/22434 [14:32:08<4:31:22, 2.51s/it] +2025-02-06 00:39:51 - ERROR - stderr - 71%|███████ | 15943/22434 [14:32:11<4:39:40, 2.59s/it] +2025-02-06 00:39:51 - ERROR - stderr - +2025-02-06 00:39:51 - ERROR - stderr - +2025-02-06 00:39:51 - INFO - stdout - {'loss': 0.3354, 'grad_norm': 1.396743655204773, 'learning_rate': 4.079014970616647e-06, 'epoch': 2.13} +2025-02-06 00:39:51 - ERROR - stderr - 71%|███████ | 15943/22434 [14:32:11<4:39:40, 2.59s/it] +2025-02-06 00:39:54 - ERROR - stderr - 71%|███████ | 15944/22434 [14:32:13<4:35:20, 2.55s/it] +2025-02-06 00:39:54 - ERROR - stderr - +2025-02-06 00:39:54 - ERROR - stderr - +2025-02-06 00:39:54 - INFO - stdout - {'loss': 0.383, 'grad_norm': 1.6025176048278809, 'learning_rate': 4.077851566592202e-06, 'epoch': 2.13} +2025-02-06 00:39:54 - ERROR - stderr - 71%|███████ | 15944/22434 [14:32:13<4:35:20, 2.55s/it] +2025-02-06 00:39:56 - ERROR - stderr - 71%|███████ | 15945/22434 [14:32:16<4:33:32, 2.53s/it] +2025-02-06 00:39:56 - ERROR - stderr - +2025-02-06 00:39:56 - 
ERROR - stderr - +2025-02-06 00:39:56 - INFO - stdout - {'loss': 0.4239, 'grad_norm': 1.7655713558197021, 'learning_rate': 4.076688286009274e-06, 'epoch': 2.13} +2025-02-06 00:39:56 - ERROR - stderr - 71%|███████ | 15945/22434 [14:32:16<4:33:32, 2.53s/it] +2025-02-06 00:39:59 - ERROR - stderr - 71%|███████ | 15946/22434 [14:32:18<4:32:03, 2.52s/it] +2025-02-06 00:39:59 - ERROR - stderr - +2025-02-06 00:39:59 - ERROR - stderr - +2025-02-06 00:39:59 - INFO - stdout - {'loss': 0.362, 'grad_norm': 1.4127881526947021, 'learning_rate': 4.07552512889211e-06, 'epoch': 2.13} +2025-02-06 00:39:59 - ERROR - stderr - 71%|███████ | 15946/22434 [14:32:18<4:32:03, 2.52s/it] +2025-02-06 00:40:01 - ERROR - stderr - 71%|███████ | 15947/22434 [14:32:21<4:30:41, 2.50s/it] +2025-02-06 00:40:01 - ERROR - stderr - +2025-02-06 00:40:01 - ERROR - stderr - +2025-02-06 00:40:01 - INFO - stdout - {'loss': 0.3722, 'grad_norm': 1.595535397529602, 'learning_rate': 4.074362095264957e-06, 'epoch': 2.13} +2025-02-06 00:40:01 - ERROR - stderr - 71%|███████ | 15947/22434 [14:32:21<4:30:41, 2.50s/it] +2025-02-06 00:40:04 - ERROR - stderr - 71%|███████ | 15948/22434 [14:32:23<4:28:47, 2.49s/it] +2025-02-06 00:40:04 - ERROR - stderr - +2025-02-06 00:40:04 - ERROR - stderr - +2025-02-06 00:40:04 - INFO - stdout - {'loss': 0.3444, 'grad_norm': 1.55341637134552, 'learning_rate': 4.073199185152054e-06, 'epoch': 2.13} +2025-02-06 00:40:04 - ERROR - stderr - 71%|███████ | 15948/22434 [14:32:23<4:28:47, 2.49s/it] +2025-02-06 00:40:06 - ERROR - stderr - 71%|███████ | 15949/22434 [14:32:26<4:28:19, 2.48s/it] +2025-02-06 00:40:06 - ERROR - stderr - +2025-02-06 00:40:06 - ERROR - stderr - +2025-02-06 00:40:06 - INFO - stdout - {'loss': 0.3723, 'grad_norm': 1.4043346643447876, 'learning_rate': 4.072036398577644e-06, 'epoch': 2.13} +2025-02-06 00:40:06 - ERROR - stderr - 71%|███████ | 15949/22434 [14:32:26<4:28:19, 2.48s/it] +2025-02-06 00:40:08 - ERROR - stderr - 71%|███████ | 15950/22434 [14:32:28<4:26:52, 
2.47s/it] +2025-02-06 00:40:09 - ERROR - stderr - +2025-02-06 00:40:09 - ERROR - stderr - +2025-02-06 00:40:09 - INFO - stdout - {'loss': 0.338, 'grad_norm': 1.4377882480621338, 'learning_rate': 4.070873735565962e-06, 'epoch': 2.13} +2025-02-06 00:40:09 - ERROR - stderr - 71%|███████ | 15950/22434 [14:32:28<4:26:52, 2.47s/it] +2025-02-06 00:40:11 - ERROR - stderr - 71%|███████ | 15951/22434 [14:32:31<4:28:27, 2.48s/it] +2025-02-06 00:40:11 - ERROR - stderr - +2025-02-06 00:40:11 - ERROR - stderr - +2025-02-06 00:40:11 - INFO - stdout - {'loss': 0.3718, 'grad_norm': 1.513719916343689, 'learning_rate': 4.069711196141244e-06, 'epoch': 2.13} +2025-02-06 00:40:11 - ERROR - stderr - 71%|███████ | 15951/22434 [14:32:31<4:28:27, 2.48s/it] +2025-02-06 00:40:13 - ERROR - stderr - 71%|███████ | 15952/22434 [14:32:33<4:28:18, 2.48s/it] +2025-02-06 00:40:14 - ERROR - stderr - +2025-02-06 00:40:14 - ERROR - stderr - +2025-02-06 00:40:14 - INFO - stdout - {'loss': 0.3596, 'grad_norm': 1.6297292709350586, 'learning_rate': 4.068548780327721e-06, 'epoch': 2.13} +2025-02-06 00:40:14 - ERROR - stderr - 71%|███████ | 15952/22434 [14:32:33<4:28:18, 2.48s/it] +2025-02-06 00:40:16 - ERROR - stderr - 71%|███████ | 15953/22434 [14:32:36<4:29:05, 2.49s/it] +2025-02-06 00:40:16 - ERROR - stderr - +2025-02-06 00:40:16 - ERROR - stderr - +2025-02-06 00:40:16 - INFO - stdout - {'loss': 0.3722, 'grad_norm': 1.4297336339950562, 'learning_rate': 4.067386488149624e-06, 'epoch': 2.13} +2025-02-06 00:40:16 - ERROR - stderr - 71%|███████ | 15953/22434 [14:32:36<4:29:05, 2.49s/it] +2025-02-06 00:40:19 - ERROR - stderr - 71%|███████ | 15954/22434 [14:32:38<4:30:33, 2.51s/it] +2025-02-06 00:40:19 - ERROR - stderr - +2025-02-06 00:40:19 - ERROR - stderr - +2025-02-06 00:40:19 - INFO - stdout - {'loss': 0.3853, 'grad_norm': 1.4753037691116333, 'learning_rate': 4.066224319631181e-06, 'epoch': 2.13} +2025-02-06 00:40:19 - ERROR - stderr - 71%|███████ | 15954/22434 [14:32:38<4:30:33, 2.51s/it] +2025-02-06 
00:40:21 - ERROR - stderr - 71%|███████ | 15955/22434 [14:32:41<4:39:34, 2.59s/it] +2025-02-06 00:40:21 - ERROR - stderr - +2025-02-06 00:40:21 - ERROR - stderr - +2025-02-06 00:40:21 - INFO - stdout - {'loss': 0.3918, 'grad_norm': 1.5886751413345337, 'learning_rate': 4.065062274796609e-06, 'epoch': 2.13} +2025-02-06 00:40:21 - ERROR - stderr - 71%|███████ | 15955/22434 [14:32:41<4:39:34, 2.59s/it] +2025-02-06 00:40:24 - ERROR - stderr - 71%|███████ | 15956/22434 [14:32:44<4:37:02, 2.57s/it] +2025-02-06 00:40:24 - ERROR - stderr - +2025-02-06 00:40:24 - ERROR - stderr - +2025-02-06 00:40:24 - INFO - stdout - {'loss': 0.3619, 'grad_norm': 1.56856369972229, 'learning_rate': 4.063900353670136e-06, 'epoch': 2.13} +2025-02-06 00:40:24 - ERROR - stderr - 71%|███████ | 15956/22434 [14:32:44<4:37:02, 2.57s/it] +2025-02-06 00:40:26 - ERROR - stderr - 71%|███████ | 15957/22434 [14:32:46<4:33:15, 2.53s/it] +2025-02-06 00:40:26 - ERROR - stderr - +2025-02-06 00:40:26 - ERROR - stderr - +2025-02-06 00:40:26 - INFO - stdout - {'loss': 0.4007, 'grad_norm': 1.4538838863372803, 'learning_rate': 4.06273855627598e-06, 'epoch': 2.13} +2025-02-06 00:40:26 - ERROR - stderr - 71%|███████ | 15957/22434 [14:32:46<4:33:15, 2.53s/it] +2025-02-06 00:40:29 - ERROR - stderr - 71%|███████ | 15958/22434 [14:32:49<4:43:49, 2.63s/it] +2025-02-06 00:40:29 - ERROR - stderr - +2025-02-06 00:40:29 - ERROR - stderr - +2025-02-06 00:40:29 - INFO - stdout - {'loss': 0.3656, 'grad_norm': 1.376729130744934, 'learning_rate': 4.061576882638359e-06, 'epoch': 2.13} +2025-02-06 00:40:29 - ERROR - stderr - 71%|███████ | 15958/22434 [14:32:49<4:43:49, 2.63s/it] +2025-02-06 00:40:32 - ERROR - stderr - 71%|███████ | 15959/22434 [14:32:51<4:38:57, 2.58s/it] +2025-02-06 00:40:32 - ERROR - stderr - +2025-02-06 00:40:32 - ERROR - stderr - +2025-02-06 00:40:32 - INFO - stdout - {'loss': 0.4181, 'grad_norm': 1.6132502555847168, 'learning_rate': 4.060415332781488e-06, 'epoch': 2.13} +2025-02-06 00:40:32 - ERROR - stderr - 
71%|███████ | 15959/22434 [14:32:51<4:38:57, 2.58s/it] +2025-02-06 00:40:34 - ERROR - stderr - 71%|███████ | 15960/22434 [14:32:54<4:37:49, 2.57s/it] +2025-02-06 00:40:34 - ERROR - stderr - +2025-02-06 00:40:34 - ERROR - stderr - +2025-02-06 00:40:34 - INFO - stdout - {'loss': 0.3602, 'grad_norm': 1.1769644021987915, 'learning_rate': 4.059253906729569e-06, 'epoch': 2.13} +2025-02-06 00:40:34 - ERROR - stderr - 71%|███████ | 15960/22434 [14:32:54<4:37:49, 2.57s/it] +2025-02-06 00:40:37 - ERROR - stderr - 71%|███████ | 15961/22434 [14:32:56<4:37:06, 2.57s/it] +2025-02-06 00:40:37 - ERROR - stderr - +2025-02-06 00:40:37 - ERROR - stderr - +2025-02-06 00:40:37 - INFO - stdout - {'loss': 0.3866, 'grad_norm': 1.4853334426879883, 'learning_rate': 4.058092604506825e-06, 'epoch': 2.13} +2025-02-06 00:40:37 - ERROR - stderr - 71%|███████ | 15961/22434 [14:32:57<4:37:06, 2.57s/it] +2025-02-06 00:40:39 - ERROR - stderr - 71%|███████ | 15962/22434 [14:32:59<4:35:13, 2.55s/it] +2025-02-06 00:40:39 - ERROR - stderr - +2025-02-06 00:40:39 - ERROR - stderr - +2025-02-06 00:40:39 - INFO - stdout - {'loss': 0.3443, 'grad_norm': 1.291886568069458, 'learning_rate': 4.05693142613745e-06, 'epoch': 2.13} +2025-02-06 00:40:39 - ERROR - stderr - 71%|███████ | 15962/22434 [14:32:59<4:35:13, 2.55s/it] +2025-02-06 00:40:42 - ERROR - stderr - 71%|███████ | 15963/22434 [14:33:02<4:33:23, 2.53s/it] +2025-02-06 00:40:42 - ERROR - stderr - +2025-02-06 00:40:42 - ERROR - stderr - +2025-02-06 00:40:42 - INFO - stdout - {'loss': 0.3927, 'grad_norm': 1.522615671157837, 'learning_rate': 4.055770371645655e-06, 'epoch': 2.13} +2025-02-06 00:40:42 - ERROR - stderr - 71%|███████ | 15963/22434 [14:33:02<4:33:23, 2.53s/it] +2025-02-06 00:40:44 - ERROR - stderr - 71%|███████ | 15964/22434 [14:33:04<4:31:24, 2.52s/it] +2025-02-06 00:40:44 - ERROR - stderr - +2025-02-06 00:40:44 - ERROR - stderr - +2025-02-06 00:40:44 - INFO - stdout - {'loss': 0.3401, 'grad_norm': 1.3327890634536743, 'learning_rate': 
4.054609441055636e-06, 'epoch': 2.13} +2025-02-06 00:40:44 - ERROR - stderr - 71%|███████ | 15964/22434 [14:33:04<4:31:24, 2.52s/it] +2025-02-06 00:40:47 - ERROR - stderr - 71%|███████ | 15965/22434 [14:33:07<4:33:02, 2.53s/it] +2025-02-06 00:40:47 - ERROR - stderr - +2025-02-06 00:40:47 - ERROR - stderr - +2025-02-06 00:40:47 - INFO - stdout - {'loss': 0.346, 'grad_norm': 1.3329821825027466, 'learning_rate': 4.053448634391591e-06, 'epoch': 2.13} +2025-02-06 00:40:47 - ERROR - stderr - 71%|███████ | 15965/22434 [14:33:07<4:33:02, 2.53s/it] +2025-02-06 00:40:49 - ERROR - stderr - 71%|███████ | 15966/22434 [14:33:09<4:31:19, 2.52s/it] +2025-02-06 00:40:49 - ERROR - stderr - +2025-02-06 00:40:49 - ERROR - stderr - +2025-02-06 00:40:49 - INFO - stdout - {'loss': 0.3798, 'grad_norm': 1.4154276847839355, 'learning_rate': 4.052287951677727e-06, 'epoch': 2.14} +2025-02-06 00:40:49 - ERROR - stderr - 71%|███████ | 15966/22434 [14:33:09<4:31:19, 2.52s/it] +2025-02-06 00:40:52 - ERROR - stderr - 71%|███████ | 15967/22434 [14:33:12<4:41:28, 2.61s/it] +2025-02-06 00:40:52 - ERROR - stderr - +2025-02-06 00:40:52 - ERROR - stderr - +2025-02-06 00:40:52 - INFO - stdout - {'loss': 0.3686, 'grad_norm': 1.3937063217163086, 'learning_rate': 4.051127392938226e-06, 'epoch': 2.14} +2025-02-06 00:40:52 - ERROR - stderr - 71%|███████ | 15967/22434 [14:33:12<4:41:28, 2.61s/it] +2025-02-06 00:40:55 - ERROR - stderr - 71%|███████ | 15968/22434 [14:33:14<4:39:49, 2.60s/it] +2025-02-06 00:40:55 - ERROR - stderr - +2025-02-06 00:40:55 - ERROR - stderr - +2025-02-06 00:40:55 - INFO - stdout - {'loss': 0.3545, 'grad_norm': 1.2747223377227783, 'learning_rate': 4.049966958197281e-06, 'epoch': 2.14} +2025-02-06 00:40:55 - ERROR - stderr - 71%|███████ | 15968/22434 [14:33:14<4:39:49, 2.60s/it] +2025-02-06 00:40:57 - ERROR - stderr - 71%|███████ | 15969/22434 [14:33:17<4:36:52, 2.57s/it] +2025-02-06 00:40:57 - ERROR - stderr - +2025-02-06 00:40:57 - ERROR - stderr - +2025-02-06 00:40:57 - INFO - stdout 
- {'loss': 0.3498, 'grad_norm': 1.3360567092895508, 'learning_rate': 4.048806647479082e-06, 'epoch': 2.14} +2025-02-06 00:40:57 - ERROR - stderr - 71%|███████ | 15969/22434 [14:33:17<4:36:52, 2.57s/it] +2025-02-06 00:41:00 - ERROR - stderr - 71%|███████ | 15970/22434 [14:33:19<4:34:38, 2.55s/it] +2025-02-06 00:41:00 - ERROR - stderr - +2025-02-06 00:41:00 - ERROR - stderr - +2025-02-06 00:41:00 - INFO - stdout - {'loss': 0.3826, 'grad_norm': 1.5092837810516357, 'learning_rate': 4.047646460807814e-06, 'epoch': 2.14} +2025-02-06 00:41:00 - ERROR - stderr - 71%|███████ | 15970/22434 [14:33:19<4:34:38, 2.55s/it] +2025-02-06 00:41:02 - ERROR - stderr - 71%|███████ | 15971/22434 [14:33:22<4:30:43, 2.51s/it] +2025-02-06 00:41:02 - ERROR - stderr - +2025-02-06 00:41:02 - ERROR - stderr - +2025-02-06 00:41:02 - INFO - stdout - {'loss': 0.3947, 'grad_norm': 1.5105440616607666, 'learning_rate': 4.046486398207659e-06, 'epoch': 2.14} +2025-02-06 00:41:02 - ERROR - stderr - 71%|███████ | 15971/22434 [14:33:22<4:30:43, 2.51s/it] +2025-02-06 00:41:05 - ERROR - stderr - 71%|███████ | 15972/22434 [14:33:24<4:28:49, 2.50s/it] +2025-02-06 00:41:05 - ERROR - stderr - +2025-02-06 00:41:05 - ERROR - stderr - +2025-02-06 00:41:05 - INFO - stdout - {'loss': 0.3859, 'grad_norm': 1.5411834716796875, 'learning_rate': 4.045326459702797e-06, 'epoch': 2.14} +2025-02-06 00:41:05 - ERROR - stderr - 71%|███████ | 15972/22434 [14:33:24<4:28:49, 2.50s/it] +2025-02-06 00:41:07 - ERROR - stderr - 71%|███████ | 15973/22434 [14:33:27<4:32:27, 2.53s/it] +2025-02-06 00:41:07 - ERROR - stderr - +2025-02-06 00:41:07 - ERROR - stderr - +2025-02-06 00:41:07 - INFO - stdout - {'loss': 0.4044, 'grad_norm': 1.4459211826324463, 'learning_rate': 4.044166645317409e-06, 'epoch': 2.14} +2025-02-06 00:41:07 - ERROR - stderr - 71%|███████ | 15973/22434 [14:33:27<4:32:27, 2.53s/it] +2025-02-06 00:41:10 - ERROR - stderr - 71%|███████ | 15974/22434 [14:33:30<4:34:12, 2.55s/it] +2025-02-06 00:41:10 - ERROR - stderr - 
+2025-02-06 00:41:10 - ERROR - stderr - +2025-02-06 00:41:10 - INFO - stdout - {'loss': 0.3936, 'grad_norm': 1.424497365951538, 'learning_rate': 4.043006955075667e-06, 'epoch': 2.14} +2025-02-06 00:41:10 - ERROR - stderr - 71%|███████ | 15974/22434 [14:33:30<4:34:12, 2.55s/it] +2025-02-06 00:41:12 - ERROR - stderr - 71%|███████ | 15975/22434 [14:33:32<4:34:03, 2.55s/it] +2025-02-06 00:41:12 - ERROR - stderr - +2025-02-06 00:41:12 - ERROR - stderr - +2025-02-06 00:41:12 - INFO - stdout - {'loss': 0.3632, 'grad_norm': 1.5023434162139893, 'learning_rate': 4.041847389001745e-06, 'epoch': 2.14} +2025-02-06 00:41:12 - ERROR - stderr - 71%|███████ | 15975/22434 [14:33:32<4:34:03, 2.55s/it] +2025-02-06 00:41:15 - ERROR - stderr - 71%|███████ | 15976/22434 [14:33:35<4:48:01, 2.68s/it] +2025-02-06 00:41:15 - ERROR - stderr - +2025-02-06 00:41:15 - ERROR - stderr - +2025-02-06 00:41:15 - INFO - stdout - {'loss': 0.4304, 'grad_norm': 1.7268065214157104, 'learning_rate': 4.040687947119813e-06, 'epoch': 2.14} +2025-02-06 00:41:15 - ERROR - stderr - 71%|███████ | 15976/22434 [14:33:35<4:48:01, 2.68s/it] +2025-02-06 00:41:18 - ERROR - stderr - 71%|███████ | 15977/22434 [14:33:38<4:42:47, 2.63s/it] +2025-02-06 00:41:18 - ERROR - stderr - +2025-02-06 00:41:18 - ERROR - stderr - +2025-02-06 00:41:18 - INFO - stdout - {'loss': 0.3546, 'grad_norm': 1.5523854494094849, 'learning_rate': 4.039528629454039e-06, 'epoch': 2.14} +2025-02-06 00:41:18 - ERROR - stderr - 71%|███████ | 15977/22434 [14:33:38<4:42:47, 2.63s/it] +2025-02-06 00:41:20 - ERROR - stderr - 71%|███████ | 15978/22434 [14:33:40<4:36:36, 2.57s/it] +2025-02-06 00:41:20 - ERROR - stderr - +2025-02-06 00:41:20 - ERROR - stderr - +2025-02-06 00:41:20 - INFO - stdout - {'loss': 0.379, 'grad_norm': 1.670316219329834, 'learning_rate': 4.038369436028586e-06, 'epoch': 2.14} +2025-02-06 00:41:20 - ERROR - stderr - 71%|███████ | 15978/22434 [14:33:40<4:36:36, 2.57s/it] +2025-02-06 00:41:23 - ERROR - stderr - 71%|███████ | 15979/22434 
[14:33:42<4:33:53, 2.55s/it] +2025-02-06 00:41:23 - ERROR - stderr - +2025-02-06 00:41:23 - ERROR - stderr - +2025-02-06 00:41:23 - INFO - stdout - {'loss': 0.386, 'grad_norm': 1.4367728233337402, 'learning_rate': 4.037210366867617e-06, 'epoch': 2.14} +2025-02-06 00:41:23 - ERROR - stderr - 71%|███████ | 15979/22434 [14:33:43<4:33:53, 2.55s/it] +2025-02-06 00:41:25 - ERROR - stderr - 71%|███████ | 15980/22434 [14:33:45<4:34:59, 2.56s/it] +2025-02-06 00:41:25 - ERROR - stderr - +2025-02-06 00:41:25 - ERROR - stderr - +2025-02-06 00:41:25 - INFO - stdout - {'loss': 0.4593, 'grad_norm': 1.7735098600387573, 'learning_rate': 4.036051421995298e-06, 'epoch': 2.14} +2025-02-06 00:41:25 - ERROR - stderr - 71%|███████ | 15980/22434 [14:33:45<4:34:59, 2.56s/it] +2025-02-06 00:41:28 - ERROR - stderr - 71%|███████ | 15981/22434 [14:33:48<4:35:10, 2.56s/it] +2025-02-06 00:41:28 - ERROR - stderr - +2025-02-06 00:41:28 - ERROR - stderr - +2025-02-06 00:41:28 - INFO - stdout - {'loss': 0.3839, 'grad_norm': 1.5288199186325073, 'learning_rate': 4.034892601435771e-06, 'epoch': 2.14} +2025-02-06 00:41:28 - ERROR - stderr - 71%|███████ | 15981/22434 [14:33:48<4:35:10, 2.56s/it] +2025-02-06 00:41:30 - ERROR - stderr - 71%|███████ | 15982/22434 [14:33:50<4:36:38, 2.57s/it] +2025-02-06 00:41:31 - ERROR - stderr - +2025-02-06 00:41:31 - ERROR - stderr - +2025-02-06 00:41:31 - INFO - stdout - {'loss': 0.3776, 'grad_norm': 1.5304921865463257, 'learning_rate': 4.033733905213209e-06, 'epoch': 2.14} +2025-02-06 00:41:31 - ERROR - stderr - 71%|███████ | 15982/22434 [14:33:50<4:36:38, 2.57s/it] +2025-02-06 00:41:33 - ERROR - stderr - 71%|███████ | 15983/22434 [14:33:53<4:34:54, 2.56s/it] +2025-02-06 00:41:33 - ERROR - stderr - +2025-02-06 00:41:33 - ERROR - stderr - +2025-02-06 00:41:33 - INFO - stdout - {'loss': 0.3603, 'grad_norm': 1.5764700174331665, 'learning_rate': 4.032575333351749e-06, 'epoch': 2.14} +2025-02-06 00:41:33 - ERROR - stderr - 71%|███████ | 15983/22434 [14:33:53<4:34:54, 
2.56s/it] +2025-02-06 00:41:35 - ERROR - stderr - 71%|███████ | 15984/22434 [14:33:55<4:30:12, 2.51s/it] +2025-02-06 00:41:35 - ERROR - stderr - +2025-02-06 00:41:35 - ERROR - stderr - +2025-02-06 00:41:35 - INFO - stdout - {'loss': 0.3668, 'grad_norm': 1.4649922847747803, 'learning_rate': 4.0314168858755434e-06, 'epoch': 2.14} +2025-02-06 00:41:35 - ERROR - stderr - 71%|███████ | 15984/22434 [14:33:55<4:30:12, 2.51s/it] +2025-02-06 00:41:38 - ERROR - stderr - 71%|███████▏ | 15985/22434 [14:33:58<4:32:26, 2.53s/it] +2025-02-06 00:41:38 - ERROR - stderr - +2025-02-06 00:41:38 - ERROR - stderr - +2025-02-06 00:41:38 - INFO - stdout - {'loss': 0.3375, 'grad_norm': 1.4266655445098877, 'learning_rate': 4.0302585628087475e-06, 'epoch': 2.14} +2025-02-06 00:41:38 - ERROR - stderr - 71%|███████▏ | 15985/22434 [14:33:58<4:32:26, 2.53s/it] +2025-02-06 00:41:40 - ERROR - stderr - 71%|███████▏ | 15986/22434 [14:34:00<4:31:22, 2.53s/it] +2025-02-06 00:41:41 - ERROR - stderr - +2025-02-06 00:41:41 - ERROR - stderr - +2025-02-06 00:41:41 - INFO - stdout - {'loss': 0.359, 'grad_norm': 1.335291862487793, 'learning_rate': 4.0291003641754935e-06, 'epoch': 2.14} +2025-02-06 00:41:41 - ERROR - stderr - 71%|███████▏ | 15986/22434 [14:34:00<4:31:22, 2.53s/it] +2025-02-06 00:41:43 - ERROR - stderr - 71%|███████▏ | 15987/22434 [14:34:03<4:36:21, 2.57s/it] +2025-02-06 00:41:43 - ERROR - stderr - +2025-02-06 00:41:43 - ERROR - stderr - +2025-02-06 00:41:43 - INFO - stdout - {'loss': 0.4122, 'grad_norm': 1.6448557376861572, 'learning_rate': 4.0279422899999355e-06, 'epoch': 2.14} +2025-02-06 00:41:43 - ERROR - stderr - 71%|███████▏ | 15987/22434 [14:34:03<4:36:21, 2.57s/it] +2025-02-06 00:41:46 - ERROR - stderr - 71%|███████▏ | 15988/22434 [14:34:05<4:33:49, 2.55s/it] +2025-02-06 00:41:46 - ERROR - stderr - +2025-02-06 00:41:46 - ERROR - stderr - +2025-02-06 00:41:46 - INFO - stdout - {'loss': 0.3877, 'grad_norm': 1.4767085313796997, 'learning_rate': 4.026784340306202e-06, 'epoch': 2.14} 
+2025-02-06 00:41:46 - ERROR - stderr - 71%|███████▏ | 15988/22434 [14:34:05<4:33:49, 2.55s/it] +2025-02-06 00:41:48 - ERROR - stderr - 71%|███████▏ | 15989/22434 [14:34:08<4:32:03, 2.53s/it] +2025-02-06 00:41:48 - ERROR - stderr - +2025-02-06 00:41:48 - ERROR - stderr - +2025-02-06 00:41:48 - INFO - stdout - {'loss': 0.4423, 'grad_norm': 1.6708890199661255, 'learning_rate': 4.025626515118434e-06, 'epoch': 2.14} +2025-02-06 00:41:48 - ERROR - stderr - 71%|███████▏ | 15989/22434 [14:34:08<4:32:03, 2.53s/it] +2025-02-06 00:41:51 - ERROR - stderr - 71%|███████▏ | 15990/22434 [14:34:10<4:32:49, 2.54s/it] +2025-02-06 00:41:51 - ERROR - stderr - +2025-02-06 00:41:51 - ERROR - stderr - +2025-02-06 00:41:51 - INFO - stdout - {'loss': 0.3602, 'grad_norm': 1.5185105800628662, 'learning_rate': 4.024468814460764e-06, 'epoch': 2.14} +2025-02-06 00:41:51 - ERROR - stderr - 71%|███████▏ | 15990/22434 [14:34:11<4:32:49, 2.54s/it] +2025-02-06 00:41:53 - ERROR - stderr - 71%|███████▏ | 15991/22434 [14:34:13<4:29:29, 2.51s/it] +2025-02-06 00:41:53 - ERROR - stderr - +2025-02-06 00:41:53 - ERROR - stderr - +2025-02-06 00:41:53 - INFO - stdout - {'loss': 0.4378, 'grad_norm': 1.6793361902236938, 'learning_rate': 4.023311238357324e-06, 'epoch': 2.14} +2025-02-06 00:41:53 - ERROR - stderr - 71%|███████▏ | 15991/22434 [14:34:13<4:29:29, 2.51s/it] +2025-02-06 00:41:56 - ERROR - stderr - 71%|███████▏ | 15992/22434 [14:34:15<4:28:11, 2.50s/it] +2025-02-06 00:41:56 - ERROR - stderr - +2025-02-06 00:41:56 - ERROR - stderr - +2025-02-06 00:41:56 - INFO - stdout - {'loss': 0.3605, 'grad_norm': 1.6090344190597534, 'learning_rate': 4.022153786832241e-06, 'epoch': 2.14} +2025-02-06 00:41:56 - ERROR - stderr - 71%|███████▏ | 15992/22434 [14:34:15<4:28:11, 2.50s/it] +2025-02-06 00:41:58 - ERROR - stderr - 71%|███████▏ | 15993/22434 [14:34:18<4:28:45, 2.50s/it] +2025-02-06 00:41:58 - ERROR - stderr - +2025-02-06 00:41:58 - ERROR - stderr - +2025-02-06 00:41:58 - INFO - stdout - {'loss': 0.3485, 
'grad_norm': 1.4538154602050781, 'learning_rate': 4.020996459909643e-06, 'epoch': 2.14} +2025-02-06 00:41:58 - ERROR - stderr - 71%|███████▏ | 15993/22434 [14:34:18<4:28:45, 2.50s/it] +2025-02-06 00:42:01 - ERROR - stderr - 71%|███████▏ | 15994/22434 [14:34:20<4:27:41, 2.49s/it] +2025-02-06 00:42:01 - ERROR - stderr - +2025-02-06 00:42:01 - ERROR - stderr - +2025-02-06 00:42:01 - INFO - stdout - {'loss': 0.3501, 'grad_norm': 1.4623823165893555, 'learning_rate': 4.019839257613652e-06, 'epoch': 2.14} +2025-02-06 00:42:01 - ERROR - stderr - 71%|███████▏ | 15994/22434 [14:34:20<4:27:41, 2.49s/it] +2025-02-06 00:42:03 - ERROR - stderr - 71%|███████▏ | 15995/22434 [14:34:23<4:28:24, 2.50s/it] +2025-02-06 00:42:03 - ERROR - stderr - +2025-02-06 00:42:03 - ERROR - stderr - +2025-02-06 00:42:03 - INFO - stdout - {'loss': 0.3324, 'grad_norm': 1.383408546447754, 'learning_rate': 4.018682179968391e-06, 'epoch': 2.14} +2025-02-06 00:42:03 - ERROR - stderr - 71%|███████▏ | 15995/22434 [14:34:23<4:28:24, 2.50s/it] +2025-02-06 00:42:06 - ERROR - stderr - 71%|███████▏ | 15996/22434 [14:34:26<4:36:19, 2.58s/it] +2025-02-06 00:42:06 - ERROR - stderr - +2025-02-06 00:42:06 - ERROR - stderr - +2025-02-06 00:42:06 - INFO - stdout - {'loss': 0.4141, 'grad_norm': 1.5650684833526611, 'learning_rate': 4.017525226997975e-06, 'epoch': 2.14} +2025-02-06 00:42:06 - ERROR - stderr - 71%|███████▏ | 15996/22434 [14:34:26<4:36:19, 2.58s/it] +2025-02-06 00:42:08 - ERROR - stderr - 71%|███████▏ | 15997/22434 [14:34:28<4:32:16, 2.54s/it] +2025-02-06 00:42:08 - ERROR - stderr - +2025-02-06 00:42:08 - ERROR - stderr - +2025-02-06 00:42:08 - INFO - stdout - {'loss': 0.4166, 'grad_norm': 1.5420795679092407, 'learning_rate': 4.0163683987265215e-06, 'epoch': 2.14} +2025-02-06 00:42:08 - ERROR - stderr - 71%|███████▏ | 15997/22434 [14:34:28<4:32:16, 2.54s/it] +2025-02-06 00:42:11 - ERROR - stderr - 71%|███████▏ | 15998/22434 [14:34:31<4:29:39, 2.51s/it] +2025-02-06 00:42:11 - ERROR - stderr - +2025-02-06 
00:42:11 - ERROR - stderr - +2025-02-06 00:42:11 - INFO - stdout - {'loss': 0.4164, 'grad_norm': 1.5818982124328613, 'learning_rate': 4.015211695178142e-06, 'epoch': 2.14} +2025-02-06 00:42:11 - ERROR - stderr - 71%|███████▏ | 15998/22434 [14:34:31<4:29:39, 2.51s/it] +2025-02-06 00:42:13 - ERROR - stderr - 71%|███████▏ | 15999/22434 [14:34:33<4:31:07, 2.53s/it] +2025-02-06 00:42:13 - ERROR - stderr - +2025-02-06 00:42:13 - ERROR - stderr - +2025-02-06 00:42:13 - INFO - stdout - {'loss': 0.4304, 'grad_norm': 1.6897526979446411, 'learning_rate': 4.014055116376952e-06, 'epoch': 2.14} +2025-02-06 00:42:13 - ERROR - stderr - 71%|███████▏ | 15999/22434 [14:34:33<4:31:07, 2.53s/it] +2025-02-06 00:42:16 - ERROR - stderr - 71%|███████▏ | 16000/22434 [14:34:36<4:29:07, 2.51s/it] +2025-02-06 00:42:16 - ERROR - stderr - +2025-02-06 00:42:16 - ERROR - stderr - +2025-02-06 00:42:16 - INFO - stdout - {'loss': 0.412, 'grad_norm': 1.4433164596557617, 'learning_rate': 4.012898662347048e-06, 'epoch': 2.14} +2025-02-06 00:42:16 - ERROR - stderr - 71%|███████▏ | 16000/22434 [14:34:36<4:29:07, 2.51s/it] +2025-02-06 00:42:18 - ERROR - stderr - 71%|███████▏ | 16001/22434 [14:34:38<4:30:27, 2.52s/it] +2025-02-06 00:42:18 - ERROR - stderr - +2025-02-06 00:42:18 - ERROR - stderr - +2025-02-06 00:42:18 - INFO - stdout - {'loss': 0.4005, 'grad_norm': 1.5308454036712646, 'learning_rate': 4.011742333112546e-06, 'epoch': 2.14} +2025-02-06 00:42:18 - ERROR - stderr - 71%|███████▏ | 16001/22434 [14:34:38<4:30:27, 2.52s/it] +2025-02-06 00:42:21 - ERROR - stderr - 71%|███████▏ | 16002/22434 [14:34:41<4:30:39, 2.52s/it] +2025-02-06 00:42:21 - ERROR - stderr - +2025-02-06 00:42:21 - ERROR - stderr - +2025-02-06 00:42:21 - INFO - stdout - {'loss': 0.4279, 'grad_norm': 1.6341255903244019, 'learning_rate': 4.010586128697546e-06, 'epoch': 2.14} +2025-02-06 00:42:21 - ERROR - stderr - 71%|███████▏ | 16002/22434 [14:34:41<4:30:39, 2.52s/it] +2025-02-06 00:42:23 - ERROR - stderr - 71%|███████▏ | 16003/22434 
[14:34:43<4:28:04, 2.50s/it] +2025-02-06 00:42:23 - ERROR - stderr - +2025-02-06 00:42:23 - ERROR - stderr - +2025-02-06 00:42:23 - INFO - stdout - {'loss': 0.3765, 'grad_norm': 1.383974552154541, 'learning_rate': 4.009430049126145e-06, 'epoch': 2.14} +2025-02-06 00:42:23 - ERROR - stderr - 71%|███████▏ | 16003/22434 [14:34:43<4:28:04, 2.50s/it] +2025-02-06 00:42:26 - ERROR - stderr - 71%|███████▏ | 16004/22434 [14:34:46<4:25:49, 2.48s/it] +2025-02-06 00:42:26 - ERROR - stderr - +2025-02-06 00:42:26 - ERROR - stderr - +2025-02-06 00:42:26 - INFO - stdout - {'loss': 0.365, 'grad_norm': 1.560968041419983, 'learning_rate': 4.008274094422447e-06, 'epoch': 2.14} +2025-02-06 00:42:26 - ERROR - stderr - 71%|███████▏ | 16004/22434 [14:34:46<4:25:49, 2.48s/it] +2025-02-06 00:42:28 - ERROR - stderr - 71%|███████▏ | 16005/22434 [14:34:48<4:24:31, 2.47s/it] +2025-02-06 00:42:28 - ERROR - stderr - +2025-02-06 00:42:28 - ERROR - stderr - +2025-02-06 00:42:28 - INFO - stdout - {'loss': 0.4146, 'grad_norm': 1.5451240539550781, 'learning_rate': 4.007118264610534e-06, 'epoch': 2.14} +2025-02-06 00:42:28 - ERROR - stderr - 71%|███████▏ | 16005/22434 [14:34:48<4:24:31, 2.47s/it] +2025-02-06 00:42:31 - ERROR - stderr - 71%|███████▏ | 16006/22434 [14:34:51<4:33:51, 2.56s/it] +2025-02-06 00:42:31 - ERROR - stderr - +2025-02-06 00:42:31 - ERROR - stderr - +2025-02-06 00:42:31 - INFO - stdout - {'loss': 0.4031, 'grad_norm': 1.596616268157959, 'learning_rate': 4.005962559714514e-06, 'epoch': 2.14} +2025-02-06 00:42:31 - ERROR - stderr - 71%|███████▏ | 16006/22434 [14:34:51<4:33:51, 2.56s/it] +2025-02-06 00:42:33 - ERROR - stderr - 71%|███████▏ | 16007/22434 [14:34:53<4:31:31, 2.53s/it] +2025-02-06 00:42:33 - ERROR - stderr - +2025-02-06 00:42:34 - ERROR - stderr - +2025-02-06 00:42:34 - INFO - stdout - {'loss': 0.3924, 'grad_norm': 1.489897608757019, 'learning_rate': 4.0048069797584665e-06, 'epoch': 2.14} +2025-02-06 00:42:34 - ERROR - stderr - 71%|███████▏ | 16007/22434 [14:34:53<4:31:31, 
2.53s/it] +2025-02-06 00:42:36 - ERROR - stderr - 71%|███████▏ | 16008/22434 [14:34:56<4:32:35, 2.55s/it] +2025-02-06 00:42:36 - ERROR - stderr - +2025-02-06 00:42:36 - ERROR - stderr - +2025-02-06 00:42:36 - INFO - stdout - {'loss': 0.3718, 'grad_norm': 1.6411100625991821, 'learning_rate': 4.003651524766479e-06, 'epoch': 2.14} +2025-02-06 00:42:36 - ERROR - stderr - 71%|███████▏ | 16008/22434 [14:34:56<4:32:35, 2.55s/it] +2025-02-06 00:42:39 - ERROR - stderr - 71%|███████▏ | 16009/22434 [14:34:58<4:31:04, 2.53s/it] +2025-02-06 00:42:39 - ERROR - stderr - +2025-02-06 00:42:39 - ERROR - stderr - +2025-02-06 00:42:39 - INFO - stdout - {'loss': 0.3894, 'grad_norm': 1.479434609413147, 'learning_rate': 4.0024961947626386e-06, 'epoch': 2.14} +2025-02-06 00:42:39 - ERROR - stderr - 71%|███████▏ | 16009/22434 [14:34:58<4:31:04, 2.53s/it] +2025-02-06 00:42:41 - ERROR - stderr - 71%|███████▏ | 16010/22434 [14:35:01<4:34:51, 2.57s/it] +2025-02-06 00:42:41 - ERROR - stderr - +2025-02-06 00:42:41 - ERROR - stderr - +2025-02-06 00:42:41 - INFO - stdout - {'loss': 0.3949, 'grad_norm': 1.4158761501312256, 'learning_rate': 4.001340989771022e-06, 'epoch': 2.14} +2025-02-06 00:42:41 - ERROR - stderr - 71%|███████▏ | 16010/22434 [14:35:01<4:34:51, 2.57s/it] +2025-02-06 00:42:44 - ERROR - stderr - 71%|███████▏ | 16011/22434 [14:35:03<4:33:40, 2.56s/it] +2025-02-06 00:42:44 - ERROR - stderr - +2025-02-06 00:42:44 - ERROR - stderr - +2025-02-06 00:42:44 - INFO - stdout - {'loss': 0.4204, 'grad_norm': 1.5351736545562744, 'learning_rate': 4.000185909815719e-06, 'epoch': 2.14} +2025-02-06 00:42:44 - ERROR - stderr - 71%|███████▏ | 16011/22434 [14:35:04<4:33:40, 2.56s/it] +2025-02-06 00:42:46 - ERROR - stderr - 71%|███████▏ | 16012/22434 [14:35:06<4:31:43, 2.54s/it] +2025-02-06 00:42:46 - ERROR - stderr - +2025-02-06 00:42:46 - ERROR - stderr - +2025-02-06 00:42:46 - INFO - stdout - {'loss': 0.3677, 'grad_norm': 1.5570470094680786, 'learning_rate': 3.999030954920796e-06, 'epoch': 2.14} 
+2025-02-06 00:42:46 - ERROR - stderr - 71%|███████▏ | 16012/22434 [14:35:06<4:31:43, 2.54s/it] +2025-02-06 00:42:49 - ERROR - stderr - 71%|███████▏ | 16013/22434 [14:35:08<4:27:57, 2.50s/it] +2025-02-06 00:42:49 - ERROR - stderr - +2025-02-06 00:42:49 - ERROR - stderr - +2025-02-06 00:42:49 - INFO - stdout - {'loss': 0.3747, 'grad_norm': 1.712088942527771, 'learning_rate': 3.997876125110331e-06, 'epoch': 2.14} +2025-02-06 00:42:49 - ERROR - stderr - 71%|███████▏ | 16013/22434 [14:35:08<4:27:57, 2.50s/it] +2025-02-06 00:42:51 - ERROR - stderr - 71%|███████▏ | 16014/22434 [14:35:11<4:27:09, 2.50s/it] +2025-02-06 00:42:51 - ERROR - stderr - +2025-02-06 00:42:51 - ERROR - stderr - +2025-02-06 00:42:51 - INFO - stdout - {'loss': 0.382, 'grad_norm': 1.4223662614822388, 'learning_rate': 3.996721420408395e-06, 'epoch': 2.14} +2025-02-06 00:42:51 - ERROR - stderr - 71%|███████▏ | 16014/22434 [14:35:11<4:27:09, 2.50s/it] +2025-02-06 00:42:54 - ERROR - stderr - 71%|███████▏ | 16015/22434 [14:35:13<4:25:23, 2.48s/it] +2025-02-06 00:42:54 - ERROR - stderr - +2025-02-06 00:42:54 - ERROR - stderr - +2025-02-06 00:42:54 - INFO - stdout - {'loss': 0.3654, 'grad_norm': 1.4365901947021484, 'learning_rate': 3.995566840839056e-06, 'epoch': 2.14} +2025-02-06 00:42:54 - ERROR - stderr - 71%|███████▏ | 16015/22434 [14:35:13<4:25:23, 2.48s/it] +2025-02-06 00:42:56 - ERROR - stderr - 71%|███████▏ | 16016/22434 [14:35:16<4:33:51, 2.56s/it] +2025-02-06 00:42:56 - ERROR - stderr - +2025-02-06 00:42:56 - ERROR - stderr - +2025-02-06 00:42:56 - INFO - stdout - {'loss': 0.4045, 'grad_norm': 1.6464719772338867, 'learning_rate': 3.99441238642638e-06, 'epoch': 2.14} +2025-02-06 00:42:56 - ERROR - stderr - 71%|███████▏ | 16016/22434 [14:35:16<4:33:51, 2.56s/it] +2025-02-06 00:42:59 - ERROR - stderr - 71%|███████▏ | 16017/22434 [14:35:19<4:31:21, 2.54s/it] +2025-02-06 00:42:59 - ERROR - stderr - +2025-02-06 00:42:59 - ERROR - stderr - +2025-02-06 00:42:59 - INFO - stdout - {'loss': 0.3644, 
'grad_norm': 1.2602894306182861, 'learning_rate': 3.993258057194432e-06, 'epoch': 2.14} +2025-02-06 00:42:59 - ERROR - stderr - 71%|███████▏ | 16017/22434 [14:35:19<4:31:21, 2.54s/it] +2025-02-06 00:43:01 - ERROR - stderr - 71%|███████▏ | 16018/22434 [14:35:21<4:28:41, 2.51s/it] +2025-02-06 00:43:01 - ERROR - stderr - +2025-02-06 00:43:01 - ERROR - stderr - +2025-02-06 00:43:01 - INFO - stdout - {'loss': 0.3824, 'grad_norm': 1.3909038305282593, 'learning_rate': 3.992103853167272e-06, 'epoch': 2.14} +2025-02-06 00:43:01 - ERROR - stderr - 71%|███████▏ | 16018/22434 [14:35:21<4:28:41, 2.51s/it] +2025-02-06 00:43:04 - ERROR - stderr - 71%|███████▏ | 16019/22434 [14:35:24<4:28:41, 2.51s/it] +2025-02-06 00:43:04 - ERROR - stderr - +2025-02-06 00:43:04 - ERROR - stderr - +2025-02-06 00:43:04 - INFO - stdout - {'loss': 0.3846, 'grad_norm': 1.496744155883789, 'learning_rate': 3.990949774368957e-06, 'epoch': 2.14} +2025-02-06 00:43:04 - ERROR - stderr - 71%|███████▏ | 16019/22434 [14:35:24<4:28:41, 2.51s/it] +2025-02-06 00:43:06 - ERROR - stderr - 71%|███████▏ | 16020/22434 [14:35:26<4:25:47, 2.49s/it] +2025-02-06 00:43:06 - ERROR - stderr - +2025-02-06 00:43:06 - ERROR - stderr - +2025-02-06 00:43:06 - INFO - stdout - {'loss': 0.3386, 'grad_norm': 1.3516961336135864, 'learning_rate': 3.9897958208235456e-06, 'epoch': 2.14} +2025-02-06 00:43:06 - ERROR - stderr - 71%|███████▏ | 16020/22434 [14:35:26<4:25:47, 2.49s/it] +2025-02-06 00:43:09 - ERROR - stderr - 71%|███████▏ | 16021/22434 [14:35:29<4:29:29, 2.52s/it] +2025-02-06 00:43:09 - ERROR - stderr - +2025-02-06 00:43:09 - ERROR - stderr - +2025-02-06 00:43:09 - INFO - stdout - {'loss': 0.4065, 'grad_norm': 1.5119680166244507, 'learning_rate': 3.988641992555088e-06, 'epoch': 2.14} +2025-02-06 00:43:09 - ERROR - stderr - 71%|███████▏ | 16021/22434 [14:35:29<4:29:29, 2.52s/it] +2025-02-06 00:43:11 - ERROR - stderr - 71%|███████▏ | 16022/22434 [14:35:31<4:27:03, 2.50s/it] +2025-02-06 00:43:11 - ERROR - stderr - +2025-02-06 
00:43:11 - ERROR - stderr - +2025-02-06 00:43:11 - INFO - stdout - {'loss': 0.3887, 'grad_norm': 1.5900371074676514, 'learning_rate': 3.9874882895876364e-06, 'epoch': 2.14} +2025-02-06 00:43:11 - ERROR - stderr - 71%|███████▏ | 16022/22434 [14:35:31<4:27:03, 2.50s/it] +2025-02-06 00:43:14 - ERROR - stderr - 71%|███████▏ | 16023/22434 [14:35:34<4:32:52, 2.55s/it] +2025-02-06 00:43:14 - ERROR - stderr - +2025-02-06 00:43:14 - ERROR - stderr - +2025-02-06 00:43:14 - INFO - stdout - {'loss': 0.4152, 'grad_norm': 1.4662861824035645, 'learning_rate': 3.986334711945241e-06, 'epoch': 2.14} +2025-02-06 00:43:14 - ERROR - stderr - 71%|███████▏ | 16023/22434 [14:35:34<4:32:52, 2.55s/it] +2025-02-06 00:43:16 - ERROR - stderr - 71%|███████▏ | 16024/22434 [14:35:36<4:31:43, 2.54s/it] +2025-02-06 00:43:16 - ERROR - stderr - +2025-02-06 00:43:16 - ERROR - stderr - +2025-02-06 00:43:16 - INFO - stdout - {'loss': 0.3696, 'grad_norm': 1.3891469240188599, 'learning_rate': 3.985181259651938e-06, 'epoch': 2.14} +2025-02-06 00:43:16 - ERROR - stderr - 71%|███████▏ | 16024/22434 [14:35:36<4:31:43, 2.54s/it] +2025-02-06 00:43:19 - ERROR - stderr - 71%|███████▏ | 16025/22434 [14:35:39<4:29:57, 2.53s/it] +2025-02-06 00:43:19 - ERROR - stderr - +2025-02-06 00:43:19 - ERROR - stderr - +2025-02-06 00:43:19 - INFO - stdout - {'loss': 0.3718, 'grad_norm': 1.524953842163086, 'learning_rate': 3.984027932731782e-06, 'epoch': 2.14} +2025-02-06 00:43:19 - ERROR - stderr - 71%|███████▏ | 16025/22434 [14:35:39<4:29:57, 2.53s/it] +2025-02-06 00:43:21 - ERROR - stderr - 71%|███████▏ | 16026/22434 [14:35:41<4:29:32, 2.52s/it] +2025-02-06 00:43:21 - ERROR - stderr - +2025-02-06 00:43:21 - ERROR - stderr - +2025-02-06 00:43:21 - INFO - stdout - {'loss': 0.3986, 'grad_norm': 1.5173137187957764, 'learning_rate': 3.982874731208802e-06, 'epoch': 2.14} +2025-02-06 00:43:21 - ERROR - stderr - 71%|███████▏ | 16026/22434 [14:35:41<4:29:32, 2.52s/it] +2025-02-06 00:43:24 - ERROR - stderr - 71%|███████▏ | 16027/22434 
[14:35:44<4:29:05, 2.52s/it] +2025-02-06 00:43:24 - ERROR - stderr - +2025-02-06 00:43:24 - ERROR - stderr - +2025-02-06 00:43:24 - INFO - stdout - {'loss': 0.3967, 'grad_norm': 1.4850375652313232, 'learning_rate': 3.981721655107046e-06, 'epoch': 2.14} +2025-02-06 00:43:24 - ERROR - stderr - 71%|███████▏ | 16027/22434 [14:35:44<4:29:05, 2.52s/it] +2025-02-06 00:43:27 - ERROR - stderr - 71%|███████▏ | 16028/22434 [14:35:46<4:30:36, 2.53s/it] +2025-02-06 00:43:27 - ERROR - stderr - +2025-02-06 00:43:27 - ERROR - stderr - +2025-02-06 00:43:27 - INFO - stdout - {'loss': 0.4272, 'grad_norm': 1.6878807544708252, 'learning_rate': 3.980568704450539e-06, 'epoch': 2.14} +2025-02-06 00:43:27 - ERROR - stderr - 71%|███████▏ | 16028/22434 [14:35:46<4:30:36, 2.53s/it] +2025-02-06 00:43:29 - ERROR - stderr - 71%|███████▏ | 16029/22434 [14:35:49<4:33:37, 2.56s/it] +2025-02-06 00:43:29 - ERROR - stderr - +2025-02-06 00:43:29 - ERROR - stderr - +2025-02-06 00:43:29 - INFO - stdout - {'loss': 0.4002, 'grad_norm': 1.55438232421875, 'learning_rate': 3.9794158792633155e-06, 'epoch': 2.14} +2025-02-06 00:43:29 - ERROR - stderr - 71%|███████▏ | 16029/22434 [14:35:49<4:33:37, 2.56s/it] +2025-02-06 00:43:32 - ERROR - stderr - 71%|███████▏ | 16030/22434 [14:35:51<4:31:34, 2.54s/it] +2025-02-06 00:43:32 - ERROR - stderr - +2025-02-06 00:43:32 - ERROR - stderr - +2025-02-06 00:43:32 - INFO - stdout - {'loss': 0.3531, 'grad_norm': 1.4437282085418701, 'learning_rate': 3.978263179569413e-06, 'epoch': 2.14} +2025-02-06 00:43:32 - ERROR - stderr - 71%|███████▏ | 16030/22434 [14:35:51<4:31:34, 2.54s/it] +2025-02-06 00:43:34 - ERROR - stderr - 71%|███████▏ | 16031/22434 [14:35:54<4:39:02, 2.61s/it] +2025-02-06 00:43:34 - ERROR - stderr - +2025-02-06 00:43:34 - ERROR - stderr - +2025-02-06 00:43:34 - INFO - stdout - {'loss': 0.3254, 'grad_norm': 1.4001771211624146, 'learning_rate': 3.977110605392849e-06, 'epoch': 2.14} +2025-02-06 00:43:34 - ERROR - stderr - 71%|███████▏ | 16031/22434 
[14:35:54<4:39:02, 2.61s/it] +2025-02-06 00:43:37 - ERROR - stderr - 71%|███████▏ | 16032/22434 [14:35:57<4:33:11, 2.56s/it] +2025-02-06 00:43:37 - ERROR - stderr - +2025-02-06 00:43:37 - ERROR - stderr - +2025-02-06 00:43:37 - INFO - stdout - {'loss': 0.4046, 'grad_norm': 1.435258150100708, 'learning_rate': 3.9759581567576515e-06, 'epoch': 2.14} +2025-02-06 00:43:37 - ERROR - stderr - 71%|███████▏ | 16032/22434 [14:35:57<4:33:11, 2.56s/it] +2025-02-06 00:43:39 - ERROR - stderr - 71%|███████▏ | 16033/22434 [14:35:59<4:28:59, 2.52s/it] +2025-02-06 00:43:39 - ERROR - stderr - +2025-02-06 00:43:39 - ERROR - stderr - +2025-02-06 00:43:39 - INFO - stdout - {'loss': 0.4144, 'grad_norm': 1.528347134590149, 'learning_rate': 3.974805833687841e-06, 'epoch': 2.14} +2025-02-06 00:43:39 - ERROR - stderr - 71%|███████▏ | 16033/22434 [14:35:59<4:28:59, 2.52s/it] +2025-02-06 00:43:42 - ERROR - stderr - 71%|███████▏ | 16034/22434 [14:36:02<4:29:10, 2.52s/it] +2025-02-06 00:43:42 - ERROR - stderr - +2025-02-06 00:43:42 - ERROR - stderr - +2025-02-06 00:43:42 - INFO - stdout - {'loss': 0.3465, 'grad_norm': 1.3722405433654785, 'learning_rate': 3.973653636207437e-06, 'epoch': 2.14} +2025-02-06 00:43:42 - ERROR - stderr - 71%|███████▏ | 16034/22434 [14:36:02<4:29:10, 2.52s/it] +2025-02-06 00:43:44 - ERROR - stderr - 71%|███████▏ | 16035/22434 [14:36:04<4:26:49, 2.50s/it] +2025-02-06 00:43:44 - ERROR - stderr - +2025-02-06 00:43:44 - ERROR - stderr - +2025-02-06 00:43:44 - INFO - stdout - {'loss': 0.3907, 'grad_norm': 1.4615352153778076, 'learning_rate': 3.972501564340457e-06, 'epoch': 2.14} +2025-02-06 00:43:44 - ERROR - stderr - 71%|███████▏ | 16035/22434 [14:36:04<4:26:49, 2.50s/it] +2025-02-06 00:43:47 - ERROR - stderr - 71%|███████▏ | 16036/22434 [14:36:06<4:24:46, 2.48s/it] +2025-02-06 00:43:47 - ERROR - stderr - +2025-02-06 00:43:47 - ERROR - stderr - +2025-02-06 00:43:47 - INFO - stdout - {'loss': 0.4037, 'grad_norm': 1.461098313331604, 'learning_rate': 3.971349618110915e-06, 
'epoch': 2.14} +2025-02-06 00:43:47 - ERROR - stderr - 71%|███████▏ | 16036/22434 [14:36:07<4:24:46, 2.48s/it] +2025-02-06 00:43:49 - ERROR - stderr - 71%|███████▏ | 16037/22434 [14:36:09<4:26:15, 2.50s/it] +2025-02-06 00:43:49 - ERROR - stderr - +2025-02-06 00:43:49 - ERROR - stderr - +2025-02-06 00:43:49 - INFO - stdout - {'loss': 0.3542, 'grad_norm': 1.44745934009552, 'learning_rate': 3.970197797542821e-06, 'epoch': 2.14} +2025-02-06 00:43:49 - ERROR - stderr - 71%|███████▏ | 16037/22434 [14:36:09<4:26:15, 2.50s/it] +2025-02-06 00:43:52 - ERROR - stderr - 71%|███████▏ | 16038/22434 [14:36:12<4:33:41, 2.57s/it] +2025-02-06 00:43:52 - ERROR - stderr - +2025-02-06 00:43:52 - ERROR - stderr - +2025-02-06 00:43:52 - INFO - stdout - {'loss': 0.376, 'grad_norm': 1.4748221635818481, 'learning_rate': 3.9690461026601844e-06, 'epoch': 2.14} +2025-02-06 00:43:52 - ERROR - stderr - 71%|███████▏ | 16038/22434 [14:36:12<4:33:41, 2.57s/it] +2025-02-06 00:43:55 - ERROR - stderr - 71%|███████▏ | 16039/22434 [14:36:14<4:33:10, 2.56s/it] +2025-02-06 00:43:55 - ERROR - stderr - +2025-02-06 00:43:55 - ERROR - stderr - +2025-02-06 00:43:55 - INFO - stdout - {'loss': 0.3953, 'grad_norm': 1.5287187099456787, 'learning_rate': 3.96789453348701e-06, 'epoch': 2.14} +2025-02-06 00:43:55 - ERROR - stderr - 71%|███████▏ | 16039/22434 [14:36:14<4:33:10, 2.56s/it] +2025-02-06 00:43:57 - ERROR - stderr - 71%|███████▏ | 16040/22434 [14:36:17<4:29:12, 2.53s/it] +2025-02-06 00:43:57 - ERROR - stderr - +2025-02-06 00:43:57 - ERROR - stderr - +2025-02-06 00:43:57 - INFO - stdout - {'loss': 0.4073, 'grad_norm': 1.5448416471481323, 'learning_rate': 3.9667430900473024e-06, 'epoch': 2.14} +2025-02-06 00:43:57 - ERROR - stderr - 71%|███████▏ | 16040/22434 [14:36:17<4:29:12, 2.53s/it] +2025-02-06 00:43:59 - ERROR - stderr - 72%|███████▏ | 16041/22434 [14:36:19<4:28:20, 2.52s/it] +2025-02-06 00:43:59 - ERROR - stderr - +2025-02-06 00:43:59 - ERROR - stderr - +2025-02-06 00:43:59 - INFO - stdout - {'loss': 
0.3792, 'grad_norm': 1.4126704931259155, 'learning_rate': 3.965591772365062e-06, 'epoch': 2.15} +2025-02-06 00:44:00 - ERROR - stderr - 72%|███████▏ | 16041/22434 [14:36:19<4:28:20, 2.52s/it] +2025-02-06 00:44:02 - ERROR - stderr - 72%|███████▏ | 16042/22434 [14:36:22<4:28:20, 2.52s/it] +2025-02-06 00:44:02 - ERROR - stderr - +2025-02-06 00:44:02 - ERROR - stderr - +2025-02-06 00:44:02 - INFO - stdout - {'loss': 0.301, 'grad_norm': 1.2463871240615845, 'learning_rate': 3.964440580464286e-06, 'epoch': 2.15} +2025-02-06 00:44:02 - ERROR - stderr - 72%|███████▏ | 16042/22434 [14:36:22<4:28:20, 2.52s/it] +2025-02-06 00:44:05 - ERROR - stderr - 72%|███████▏ | 16043/22434 [14:36:24<4:29:15, 2.53s/it] +2025-02-06 00:44:05 - ERROR - stderr - +2025-02-06 00:44:05 - ERROR - stderr - +2025-02-06 00:44:05 - INFO - stdout - {'loss': 0.4059, 'grad_norm': 1.5668666362762451, 'learning_rate': 3.963289514368971e-06, 'epoch': 2.15} +2025-02-06 00:44:05 - ERROR - stderr - 72%|███████▏ | 16043/22434 [14:36:24<4:29:15, 2.53s/it] +2025-02-06 00:44:07 - ERROR - stderr - 72%|███████▏ | 16044/22434 [14:36:27<4:34:01, 2.57s/it] +2025-02-06 00:44:07 - ERROR - stderr - +2025-02-06 00:44:07 - ERROR - stderr - +2025-02-06 00:44:07 - INFO - stdout - {'loss': 0.4057, 'grad_norm': 1.3873860836029053, 'learning_rate': 3.962138574103114e-06, 'epoch': 2.15} +2025-02-06 00:44:07 - ERROR - stderr - 72%|███████▏ | 16044/22434 [14:36:27<4:34:01, 2.57s/it] +2025-02-06 00:44:10 - ERROR - stderr - 72%|███████▏ | 16045/22434 [14:36:30<4:33:17, 2.57s/it] +2025-02-06 00:44:10 - ERROR - stderr - +2025-02-06 00:44:10 - ERROR - stderr - +2025-02-06 00:44:10 - INFO - stdout - {'loss': 0.368, 'grad_norm': 1.3748260736465454, 'learning_rate': 3.960987759690692e-06, 'epoch': 2.15} +2025-02-06 00:44:10 - ERROR - stderr - 72%|███████▏ | 16045/22434 [14:36:30<4:33:17, 2.57s/it] +2025-02-06 00:44:12 - ERROR - stderr - 72%|███████▏ | 16046/22434 [14:36:32<4:29:30, 2.53s/it] +2025-02-06 00:44:12 - ERROR - stderr - 
+2025-02-06 00:44:12 - ERROR - stderr - +2025-02-06 00:44:12 - INFO - stdout - {'loss': 0.4075, 'grad_norm': 1.4905911684036255, 'learning_rate': 3.95983707115571e-06, 'epoch': 2.15} +2025-02-06 00:44:12 - ERROR - stderr - 72%|███████▏ | 16046/22434 [14:36:32<4:29:30, 2.53s/it] +2025-02-06 00:44:15 - ERROR - stderr - 72%|███████▏ | 16047/22434 [14:36:34<4:28:54, 2.53s/it] +2025-02-06 00:44:15 - ERROR - stderr - +2025-02-06 00:44:15 - ERROR - stderr - +2025-02-06 00:44:15 - INFO - stdout - {'loss': 0.3585, 'grad_norm': 1.492329478263855, 'learning_rate': 3.95868650852214e-06, 'epoch': 2.15} +2025-02-06 00:44:15 - ERROR - stderr - 72%|███████▏ | 16047/22434 [14:36:35<4:28:54, 2.53s/it] +2025-02-06 00:44:17 - ERROR - stderr - 72%|███████▏ | 16048/22434 [14:36:37<4:28:46, 2.53s/it] +2025-02-06 00:44:17 - ERROR - stderr - +2025-02-06 00:44:17 - ERROR - stderr - +2025-02-06 00:44:17 - INFO - stdout - {'loss': 0.3547, 'grad_norm': 1.4288463592529297, 'learning_rate': 3.957536071813966e-06, 'epoch': 2.15} +2025-02-06 00:44:17 - ERROR - stderr - 72%|███████▏ | 16048/22434 [14:36:37<4:28:46, 2.53s/it] +2025-02-06 00:44:20 - ERROR - stderr - 72%|███████▏ | 16049/22434 [14:36:39<4:27:28, 2.51s/it] +2025-02-06 00:44:20 - ERROR - stderr - +2025-02-06 00:44:20 - ERROR - stderr - +2025-02-06 00:44:20 - INFO - stdout - {'loss': 0.3854, 'grad_norm': 1.5469757318496704, 'learning_rate': 3.9563857610551785e-06, 'epoch': 2.15} +2025-02-06 00:44:20 - ERROR - stderr - 72%|███████▏ | 16049/22434 [14:36:40<4:27:28, 2.51s/it] +2025-02-06 00:44:22 - ERROR - stderr - 72%|███████▏ | 16050/22434 [14:36:42<4:26:37, 2.51s/it] +2025-02-06 00:44:22 - ERROR - stderr - +2025-02-06 00:44:22 - ERROR - stderr - +2025-02-06 00:44:22 - INFO - stdout - {'loss': 0.3686, 'grad_norm': 1.484440565109253, 'learning_rate': 3.955235576269738e-06, 'epoch': 2.15} +2025-02-06 00:44:22 - ERROR - stderr - 72%|███████▏ | 16050/22434 [14:36:42<4:26:37, 2.51s/it] +2025-02-06 00:44:25 - ERROR - stderr - 72%|███████▏ | 
16051/22434 [14:36:44<4:26:26, 2.50s/it] +2025-02-06 00:44:25 - ERROR - stderr - +2025-02-06 00:44:25 - ERROR - stderr - +2025-02-06 00:44:25 - INFO - stdout - {'loss': 0.4091, 'grad_norm': 1.7135186195373535, 'learning_rate': 3.954085517481635e-06, 'epoch': 2.15} +2025-02-06 00:44:25 - ERROR - stderr - 72%|███████▏ | 16051/22434 [14:36:45<4:26:26, 2.50s/it] +2025-02-06 00:44:27 - ERROR - stderr - 72%|███████▏ | 16052/22434 [14:36:47<4:28:02, 2.52s/it] +2025-02-06 00:44:27 - ERROR - stderr - +2025-02-06 00:44:27 - ERROR - stderr - +2025-02-06 00:44:27 - INFO - stdout - {'loss': 0.3724, 'grad_norm': 1.592209815979004, 'learning_rate': 3.952935584714831e-06, 'epoch': 2.15} +2025-02-06 00:44:27 - ERROR - stderr - 72%|███████▏ | 16052/22434 [14:36:47<4:28:02, 2.52s/it] +2025-02-06 00:44:30 - ERROR - stderr - 72%|███████▏ | 16053/22434 [14:36:50<4:29:02, 2.53s/it] +2025-02-06 00:44:30 - ERROR - stderr - +2025-02-06 00:44:30 - ERROR - stderr - +2025-02-06 00:44:30 - INFO - stdout - {'loss': 0.4118, 'grad_norm': 1.8524796962738037, 'learning_rate': 3.951785777993298e-06, 'epoch': 2.15} +2025-02-06 00:44:30 - ERROR - stderr - 72%|███████▏ | 16053/22434 [14:36:50<4:29:02, 2.53s/it] +2025-02-06 00:44:32 - ERROR - stderr - 72%|███████▏ | 16054/22434 [14:36:52<4:27:33, 2.52s/it] +2025-02-06 00:44:32 - ERROR - stderr - +2025-02-06 00:44:32 - ERROR - stderr - +2025-02-06 00:44:32 - INFO - stdout - {'loss': 0.332, 'grad_norm': 1.4964635372161865, 'learning_rate': 3.950636097341003e-06, 'epoch': 2.15} +2025-02-06 00:44:32 - ERROR - stderr - 72%|███████▏ | 16054/22434 [14:36:52<4:27:33, 2.52s/it] +2025-02-06 00:44:35 - ERROR - stderr - 72%|███████▏ | 16055/22434 [14:36:55<4:27:21, 2.51s/it] +2025-02-06 00:44:35 - ERROR - stderr - +2025-02-06 00:44:35 - ERROR - stderr - +2025-02-06 00:44:35 - INFO - stdout - {'loss': 0.3505, 'grad_norm': 1.5913512706756592, 'learning_rate': 3.949486542781911e-06, 'epoch': 2.15} +2025-02-06 00:44:35 - ERROR - stderr - 72%|███████▏ | 16055/22434 
[14:36:55<4:27:21, 2.51s/it] +2025-02-06 00:44:37 - ERROR - stderr - 72%|███████▏ | 16056/22434 [14:36:57<4:26:00, 2.50s/it] +2025-02-06 00:44:37 - ERROR - stderr - +2025-02-06 00:44:37 - ERROR - stderr - +2025-02-06 00:44:37 - INFO - stdout - {'loss': 0.4297, 'grad_norm': 1.5682556629180908, 'learning_rate': 3.948337114339981e-06, 'epoch': 2.15} +2025-02-06 00:44:37 - ERROR - stderr - 72%|███████▏ | 16056/22434 [14:36:57<4:26:00, 2.50s/it] +2025-02-06 00:44:40 - ERROR - stderr - 72%|███████▏ | 16057/22434 [14:37:00<4:34:46, 2.59s/it] +2025-02-06 00:44:40 - ERROR - stderr - +2025-02-06 00:44:40 - ERROR - stderr - +2025-02-06 00:44:40 - INFO - stdout - {'loss': 0.3933, 'grad_norm': 1.5334926843643188, 'learning_rate': 3.947187812039173e-06, 'epoch': 2.15} +2025-02-06 00:44:40 - ERROR - stderr - 72%|███████▏ | 16057/22434 [14:37:00<4:34:46, 2.59s/it] +2025-02-06 00:44:43 - ERROR - stderr - 72%|███████▏ | 16058/22434 [14:37:02<4:30:27, 2.55s/it] +2025-02-06 00:44:43 - ERROR - stderr - +2025-02-06 00:44:43 - ERROR - stderr - +2025-02-06 00:44:43 - INFO - stdout - {'loss': 0.4209, 'grad_norm': 1.5958219766616821, 'learning_rate': 3.946038635903443e-06, 'epoch': 2.15} +2025-02-06 00:44:43 - ERROR - stderr - 72%|███████▏ | 16058/22434 [14:37:02<4:30:27, 2.55s/it] +2025-02-06 00:44:45 - ERROR - stderr - 72%|███████▏ | 16059/22434 [14:37:05<4:28:40, 2.53s/it] +2025-02-06 00:44:45 - ERROR - stderr - +2025-02-06 00:44:45 - ERROR - stderr - +2025-02-06 00:44:45 - INFO - stdout - {'loss': 0.3928, 'grad_norm': 1.4068228006362915, 'learning_rate': 3.944889585956746e-06, 'epoch': 2.15} +2025-02-06 00:44:45 - ERROR - stderr - 72%|███████▏ | 16059/22434 [14:37:05<4:28:40, 2.53s/it] +2025-02-06 00:44:48 - ERROR - stderr - 72%|███████▏ | 16060/22434 [14:37:07<4:27:25, 2.52s/it] +2025-02-06 00:44:48 - ERROR - stderr - +2025-02-06 00:44:48 - ERROR - stderr - +2025-02-06 00:44:48 - INFO - stdout - {'loss': 0.4255, 'grad_norm': 1.4997811317443848, 'learning_rate': 3.94374066222303e-06, 
'epoch': 2.15} +2025-02-06 00:44:48 - ERROR - stderr - 72%|███████▏ | 16060/22434 [14:37:07<4:27:25, 2.52s/it] +2025-02-06 00:44:50 - ERROR - stderr - 72%|███████▏ | 16061/22434 [14:37:10<4:29:16, 2.54s/it] +2025-02-06 00:44:50 - ERROR - stderr - +2025-02-06 00:44:50 - ERROR - stderr - +2025-02-06 00:44:50 - INFO - stdout - {'loss': 0.3911, 'grad_norm': 1.6097463369369507, 'learning_rate': 3.942591864726246e-06, 'epoch': 2.15} +2025-02-06 00:44:50 - ERROR - stderr - 72%|███████▏ | 16061/22434 [14:37:10<4:29:16, 2.54s/it] +2025-02-06 00:44:53 - ERROR - stderr - 72%|███████▏ | 16062/22434 [14:37:12<4:27:06, 2.52s/it] +2025-02-06 00:44:53 - ERROR - stderr - +2025-02-06 00:44:53 - ERROR - stderr - +2025-02-06 00:44:53 - INFO - stdout - {'loss': 0.4154, 'grad_norm': 1.5123066902160645, 'learning_rate': 3.941443193490338e-06, 'epoch': 2.15} +2025-02-06 00:44:53 - ERROR - stderr - 72%|███████▏ | 16062/22434 [14:37:12<4:27:06, 2.52s/it] +2025-02-06 00:44:55 - ERROR - stderr - 72%|███████▏ | 16063/22434 [14:37:15<4:30:00, 2.54s/it] +2025-02-06 00:44:55 - ERROR - stderr - +2025-02-06 00:44:55 - ERROR - stderr - +2025-02-06 00:44:55 - INFO - stdout - {'loss': 0.3822, 'grad_norm': 1.60038161277771, 'learning_rate': 3.940294648539248e-06, 'epoch': 2.15} +2025-02-06 00:44:55 - ERROR - stderr - 72%|███████▏ | 16063/22434 [14:37:15<4:30:00, 2.54s/it] +2025-02-06 00:44:58 - ERROR - stderr - 72%|███████▏ | 16064/22434 [14:37:17<4:30:50, 2.55s/it] +2025-02-06 00:44:58 - ERROR - stderr - +2025-02-06 00:44:58 - ERROR - stderr - +2025-02-06 00:44:58 - INFO - stdout - {'loss': 0.4009, 'grad_norm': 1.4714967012405396, 'learning_rate': 3.939146229896919e-06, 'epoch': 2.15} +2025-02-06 00:44:58 - ERROR - stderr - 72%|███████▏ | 16064/22434 [14:37:18<4:30:50, 2.55s/it] +2025-02-06 00:45:00 - ERROR - stderr - 72%|███████▏ | 16065/22434 [14:37:20<4:27:50, 2.52s/it] +2025-02-06 00:45:00 - ERROR - stderr - +2025-02-06 00:45:00 - ERROR - stderr - +2025-02-06 00:45:00 - INFO - stdout - {'loss': 
0.3492, 'grad_norm': 1.302838683128357, 'learning_rate': 3.93799793758729e-06, 'epoch': 2.15} +2025-02-06 00:45:00 - ERROR - stderr - 72%|███████▏ | 16065/22434 [14:37:20<4:27:50, 2.52s/it] +2025-02-06 00:45:03 - ERROR - stderr - 72%|███████▏ | 16066/22434 [14:37:22<4:28:19, 2.53s/it] +2025-02-06 00:45:03 - ERROR - stderr - +2025-02-06 00:45:03 - ERROR - stderr - +2025-02-06 00:45:03 - INFO - stdout - {'loss': 0.3762, 'grad_norm': 1.4940990209579468, 'learning_rate': 3.936849771634286e-06, 'epoch': 2.15} +2025-02-06 00:45:03 - ERROR - stderr - 72%|███████▏ | 16066/22434 [14:37:23<4:28:19, 2.53s/it] +2025-02-06 00:45:05 - ERROR - stderr - 72%|███████▏ | 16067/22434 [14:37:25<4:25:40, 2.50s/it] +2025-02-06 00:45:05 - ERROR - stderr - +2025-02-06 00:45:05 - ERROR - stderr - +2025-02-06 00:45:05 - INFO - stdout - {'loss': 0.4117, 'grad_norm': 1.8734276294708252, 'learning_rate': 3.9357017320618506e-06, 'epoch': 2.15} +2025-02-06 00:45:05 - ERROR - stderr - 72%|███████▏ | 16067/22434 [14:37:25<4:25:40, 2.50s/it] +2025-02-06 00:45:08 - ERROR - stderr - 72%|███████▏ | 16068/22434 [14:37:27<4:25:27, 2.50s/it] +2025-02-06 00:45:08 - ERROR - stderr - +2025-02-06 00:45:08 - ERROR - stderr - +2025-02-06 00:45:08 - INFO - stdout - {'loss': 0.3859, 'grad_norm': 1.591796875, 'learning_rate': 3.934553818893912e-06, 'epoch': 2.15} +2025-02-06 00:45:08 - ERROR - stderr - 72%|███████▏ | 16068/22434 [14:37:27<4:25:27, 2.50s/it] +2025-02-06 00:45:10 - ERROR - stderr - 72%|███████▏ | 16069/22434 [14:37:30<4:30:35, 2.55s/it] +2025-02-06 00:45:10 - ERROR - stderr - +2025-02-06 00:45:10 - ERROR - stderr - +2025-02-06 00:45:10 - INFO - stdout - {'loss': 0.3979, 'grad_norm': 1.4919301271438599, 'learning_rate': 3.93340603215439e-06, 'epoch': 2.15} +2025-02-06 00:45:10 - ERROR - stderr - 72%|███████▏ | 16069/22434 [14:37:30<4:30:35, 2.55s/it] +2025-02-06 00:45:13 - ERROR - stderr - 72%|███████▏ | 16070/22434 [14:37:33<4:34:48, 2.59s/it] +2025-02-06 00:45:13 - ERROR - stderr - +2025-02-06 
00:45:13 - ERROR - stderr - +2025-02-06 00:45:13 - INFO - stdout - {'loss': 0.353, 'grad_norm': 1.593639850616455, 'learning_rate': 3.932258371867221e-06, 'epoch': 2.15} +2025-02-06 00:45:13 - ERROR - stderr - 72%|███████▏ | 16070/22434 [14:37:33<4:34:48, 2.59s/it] +2025-02-06 00:45:16 - ERROR - stderr - 72%|███████▏ | 16071/22434 [14:37:35<4:34:25, 2.59s/it] +2025-02-06 00:45:16 - ERROR - stderr - +2025-02-06 00:45:16 - ERROR - stderr - +2025-02-06 00:45:16 - INFO - stdout - {'loss': 0.387, 'grad_norm': 1.4452128410339355, 'learning_rate': 3.9311108380563125e-06, 'epoch': 2.15} +2025-02-06 00:45:16 - ERROR - stderr - 72%|███████▏ | 16071/22434 [14:37:35<4:34:25, 2.59s/it] +2025-02-06 00:45:18 - ERROR - stderr - 72%|███████▏ | 16072/22434 [14:37:38<4:31:26, 2.56s/it] +2025-02-06 00:45:18 - ERROR - stderr - +2025-02-06 00:45:18 - ERROR - stderr - +2025-02-06 00:45:18 - INFO - stdout - {'loss': 0.4416, 'grad_norm': 1.685669183731079, 'learning_rate': 3.929963430745598e-06, 'epoch': 2.15} +2025-02-06 00:45:18 - ERROR - stderr - 72%|███████▏ | 16072/22434 [14:37:38<4:31:26, 2.56s/it] +2025-02-06 00:45:21 - ERROR - stderr - 72%|███████▏ | 16073/22434 [14:37:40<4:27:44, 2.53s/it] +2025-02-06 00:45:21 - ERROR - stderr - +2025-02-06 00:45:21 - ERROR - stderr - +2025-02-06 00:45:21 - INFO - stdout - {'loss': 0.3905, 'grad_norm': 1.4676814079284668, 'learning_rate': 3.928816149958984e-06, 'epoch': 2.15} +2025-02-06 00:45:21 - ERROR - stderr - 72%|███████▏ | 16073/22434 [14:37:40<4:27:44, 2.53s/it] +2025-02-06 00:45:23 - ERROR - stderr - 72%|███████▏ | 16074/22434 [14:37:43<4:24:53, 2.50s/it] +2025-02-06 00:45:23 - ERROR - stderr - +2025-02-06 00:45:23 - ERROR - stderr - +2025-02-06 00:45:23 - INFO - stdout - {'loss': 0.3444, 'grad_norm': 1.3602242469787598, 'learning_rate': 3.927668995720384e-06, 'epoch': 2.15} +2025-02-06 00:45:23 - ERROR - stderr - 72%|███████▏ | 16074/22434 [14:37:43<4:24:53, 2.50s/it] +2025-02-06 00:45:25 - ERROR - stderr - 72%|███████▏ | 16075/22434 
[14:37:45<4:22:29, 2.48s/it] +2025-02-06 00:45:25 - ERROR - stderr - +2025-02-06 00:45:25 - ERROR - stderr - +2025-02-06 00:45:25 - INFO - stdout - {'loss': 0.3791, 'grad_norm': 1.5395230054855347, 'learning_rate': 3.92652196805372e-06, 'epoch': 2.15} +2025-02-06 00:45:25 - ERROR - stderr - 72%|███████▏ | 16075/22434 [14:37:45<4:22:29, 2.48s/it] +2025-02-06 00:45:28 - ERROR - stderr - 72%|███████▏ | 16076/22434 [14:37:48<4:21:58, 2.47s/it] +2025-02-06 00:45:28 - ERROR - stderr - +2025-02-06 00:45:28 - ERROR - stderr - +2025-02-06 00:45:28 - INFO - stdout - {'loss': 0.3969, 'grad_norm': 1.542179822921753, 'learning_rate': 3.925375066982892e-06, 'epoch': 2.15} +2025-02-06 00:45:28 - ERROR - stderr - 72%|███████▏ | 16076/22434 [14:37:48<4:21:58, 2.47s/it] +2025-02-06 00:45:30 - ERROR - stderr - 72%|███████▏ | 16077/22434 [14:37:50<4:22:45, 2.48s/it] +2025-02-06 00:45:30 - ERROR - stderr - +2025-02-06 00:45:30 - ERROR - stderr - +2025-02-06 00:45:30 - INFO - stdout - {'loss': 0.363, 'grad_norm': 1.5008248090744019, 'learning_rate': 3.9242282925318064e-06, 'epoch': 2.15} +2025-02-06 00:45:30 - ERROR - stderr - 72%|███████▏ | 16077/22434 [14:37:50<4:22:45, 2.48s/it] +2025-02-06 00:45:33 - ERROR - stderr - 72%|███████▏ | 16078/22434 [14:37:53<4:22:52, 2.48s/it] +2025-02-06 00:45:33 - ERROR - stderr - +2025-02-06 00:45:33 - ERROR - stderr - +2025-02-06 00:45:33 - INFO - stdout - {'loss': 0.3764, 'grad_norm': 1.4808636903762817, 'learning_rate': 3.9230816447243695e-06, 'epoch': 2.15} +2025-02-06 00:45:33 - ERROR - stderr - 72%|███████▏ | 16078/22434 [14:37:53<4:22:52, 2.48s/it] +2025-02-06 00:45:35 - ERROR - stderr - 72%|███████▏ | 16079/22434 [14:37:55<4:22:03, 2.47s/it] +2025-02-06 00:45:35 - ERROR - stderr - +2025-02-06 00:45:35 - ERROR - stderr - +2025-02-06 00:45:35 - INFO - stdout - {'loss': 0.372, 'grad_norm': 1.516959547996521, 'learning_rate': 3.921935123584479e-06, 'epoch': 2.15} +2025-02-06 00:45:35 - ERROR - stderr - 72%|███████▏ | 16079/22434 [14:37:55<4:22:03, 
2.47s/it] +2025-02-06 00:45:38 - ERROR - stderr - 72%|███████▏ | 16080/22434 [14:37:58<4:22:41, 2.48s/it] +2025-02-06 00:45:38 - ERROR - stderr - +2025-02-06 00:45:38 - ERROR - stderr - +2025-02-06 00:45:38 - INFO - stdout - {'loss': 0.4529, 'grad_norm': 1.6373462677001953, 'learning_rate': 3.920788729136036e-06, 'epoch': 2.15} +2025-02-06 00:45:38 - ERROR - stderr - 72%|███████▏ | 16080/22434 [14:37:58<4:22:41, 2.48s/it] +2025-02-06 00:45:40 - ERROR - stderr - 72%|███████▏ | 16081/22434 [14:38:00<4:26:07, 2.51s/it] +2025-02-06 00:45:40 - ERROR - stderr - +2025-02-06 00:45:40 - ERROR - stderr - +2025-02-06 00:45:40 - INFO - stdout - {'loss': 0.4203, 'grad_norm': 1.5589298009872437, 'learning_rate': 3.919642461402935e-06, 'epoch': 2.15} +2025-02-06 00:45:40 - ERROR - stderr - 72%|███████▏ | 16081/22434 [14:38:00<4:26:07, 2.51s/it] +2025-02-06 00:45:43 - ERROR - stderr - 72%|███████▏ | 16082/22434 [14:38:03<4:26:53, 2.52s/it] +2025-02-06 00:45:43 - ERROR - stderr - +2025-02-06 00:45:43 - ERROR - stderr - +2025-02-06 00:45:43 - INFO - stdout - {'loss': 0.4422, 'grad_norm': 1.6543482542037964, 'learning_rate': 3.918496320409068e-06, 'epoch': 2.15} +2025-02-06 00:45:43 - ERROR - stderr - 72%|███████▏ | 16082/22434 [14:38:03<4:26:53, 2.52s/it] +2025-02-06 00:45:45 - ERROR - stderr - 72%|███████▏ | 16083/22434 [14:38:05<4:25:18, 2.51s/it] +2025-02-06 00:45:45 - ERROR - stderr - +2025-02-06 00:45:45 - ERROR - stderr - +2025-02-06 00:45:45 - INFO - stdout - {'loss': 0.4419, 'grad_norm': 1.6424309015274048, 'learning_rate': 3.917350306178326e-06, 'epoch': 2.15} +2025-02-06 00:45:45 - ERROR - stderr - 72%|███████▏ | 16083/22434 [14:38:05<4:25:18, 2.51s/it] +2025-02-06 00:45:48 - ERROR - stderr - 72%|███████▏ | 16084/22434 [14:38:08<4:25:04, 2.50s/it] +2025-02-06 00:45:48 - ERROR - stderr - +2025-02-06 00:45:48 - ERROR - stderr - +2025-02-06 00:45:48 - INFO - stdout - {'loss': 0.3444, 'grad_norm': 1.3029494285583496, 'learning_rate': 3.916204418734599e-06, 'epoch': 2.15} 
+2025-02-06 00:45:48 - ERROR - stderr - 72%|███████▏ | 16084/22434 [14:38:08<4:25:04, 2.50s/it] +2025-02-06 00:45:50 - ERROR - stderr - 72%|███████▏ | 16085/22434 [14:38:10<4:23:14, 2.49s/it] +2025-02-06 00:45:50 - ERROR - stderr - +2025-02-06 00:45:50 - ERROR - stderr - +2025-02-06 00:45:50 - INFO - stdout - {'loss': 0.44, 'grad_norm': 1.6073129177093506, 'learning_rate': 3.915058658101763e-06, 'epoch': 2.15} +2025-02-06 00:45:50 - ERROR - stderr - 72%|███████▏ | 16085/22434 [14:38:10<4:23:14, 2.49s/it] +2025-02-06 00:45:53 - ERROR - stderr - 72%|███████▏ | 16086/22434 [14:38:13<4:22:19, 2.48s/it] +2025-02-06 00:45:53 - ERROR - stderr - +2025-02-06 00:45:53 - ERROR - stderr - +2025-02-06 00:45:53 - INFO - stdout - {'loss': 0.3589, 'grad_norm': 1.4433947801589966, 'learning_rate': 3.913913024303712e-06, 'epoch': 2.15} +2025-02-06 00:45:53 - ERROR - stderr - 72%|███████▏ | 16086/22434 [14:38:13<4:22:19, 2.48s/it] +2025-02-06 00:45:55 - ERROR - stderr - 72%|███████▏ | 16087/22434 [14:38:15<4:20:40, 2.46s/it] +2025-02-06 00:45:55 - ERROR - stderr - +2025-02-06 00:45:55 - ERROR - stderr - +2025-02-06 00:45:55 - INFO - stdout - {'loss': 0.3968, 'grad_norm': 1.5924313068389893, 'learning_rate': 3.912767517364317e-06, 'epoch': 2.15} +2025-02-06 00:45:55 - ERROR - stderr - 72%|███████▏ | 16087/22434 [14:38:15<4:20:40, 2.46s/it] +2025-02-06 00:45:58 - ERROR - stderr - 72%|███████▏ | 16088/22434 [14:38:18<4:21:45, 2.47s/it] +2025-02-06 00:45:58 - ERROR - stderr - +2025-02-06 00:45:58 - ERROR - stderr - +2025-02-06 00:45:58 - INFO - stdout - {'loss': 0.3952, 'grad_norm': 1.6250706911087036, 'learning_rate': 3.91162213730746e-06, 'epoch': 2.15} +2025-02-06 00:45:58 - ERROR - stderr - 72%|███████▏ | 16088/22434 [14:38:18<4:21:45, 2.47s/it] +2025-02-06 00:46:00 - ERROR - stderr - 72%|███████▏ | 16089/22434 [14:38:20<4:30:27, 2.56s/it] +2025-02-06 00:46:01 - ERROR - stderr - +2025-02-06 00:46:01 - ERROR - stderr - +2025-02-06 00:46:01 - INFO - stdout - {'loss': 0.3653, 
'grad_norm': 1.4148329496383667, 'learning_rate': 3.9104768841570175e-06, 'epoch': 2.15} +2025-02-06 00:46:01 - ERROR - stderr - 72%|███████▏ | 16089/22434 [14:38:20<4:30:27, 2.56s/it] +2025-02-06 00:46:03 - ERROR - stderr - 72%|███████▏ | 16090/22434 [14:38:23<4:31:00, 2.56s/it] +2025-02-06 00:46:03 - ERROR - stderr - +2025-02-06 00:46:03 - ERROR - stderr - +2025-02-06 00:46:03 - INFO - stdout - {'loss': 0.3497, 'grad_norm': 1.40345299243927, 'learning_rate': 3.90933175793685e-06, 'epoch': 2.15} +2025-02-06 00:46:03 - ERROR - stderr - 72%|███████▏ | 16090/22434 [14:38:23<4:31:00, 2.56s/it] +2025-02-06 00:46:06 - ERROR - stderr - 72%|███████▏ | 16091/22434 [14:38:25<4:30:54, 2.56s/it] +2025-02-06 00:46:06 - ERROR - stderr - +2025-02-06 00:46:06 - ERROR - stderr - +2025-02-06 00:46:06 - INFO - stdout - {'loss': 0.3875, 'grad_norm': 1.456261396408081, 'learning_rate': 3.90818675867084e-06, 'epoch': 2.15} +2025-02-06 00:46:06 - ERROR - stderr - 72%|███████▏ | 16091/22434 [14:38:25<4:30:54, 2.56s/it] +2025-02-06 00:46:08 - ERROR - stderr - 72%|███████▏ | 16092/22434 [14:38:28<4:28:21, 2.54s/it] +2025-02-06 00:46:08 - ERROR - stderr - +2025-02-06 00:46:08 - ERROR - stderr - +2025-02-06 00:46:08 - INFO - stdout - {'loss': 0.3765, 'grad_norm': 1.454257845878601, 'learning_rate': 3.907041886382845e-06, 'epoch': 2.15} +2025-02-06 00:46:08 - ERROR - stderr - 72%|███████▏ | 16092/22434 [14:38:28<4:28:21, 2.54s/it] +2025-02-06 00:46:11 - ERROR - stderr - 72%|███████▏ | 16093/22434 [14:38:30<4:28:08, 2.54s/it] +2025-02-06 00:46:11 - ERROR - stderr - +2025-02-06 00:46:11 - ERROR - stderr - +2025-02-06 00:46:11 - INFO - stdout - {'loss': 0.4213, 'grad_norm': 1.67599618434906, 'learning_rate': 3.9058971410967285e-06, 'epoch': 2.15} +2025-02-06 00:46:11 - ERROR - stderr - 72%|███████▏ | 16093/22434 [14:38:30<4:28:08, 2.54s/it] +2025-02-06 00:46:13 - ERROR - stderr - 72%|███████▏ | 16094/22434 [14:38:33<4:33:33, 2.59s/it] +2025-02-06 00:46:13 - ERROR - stderr - +2025-02-06 00:46:13 
- ERROR - stderr - +2025-02-06 00:46:13 - INFO - stdout - {'loss': 0.3876, 'grad_norm': 1.4291666746139526, 'learning_rate': 3.90475252283636e-06, 'epoch': 2.15} +2025-02-06 00:46:13 - ERROR - stderr - 72%|███████▏ | 16094/22434 [14:38:33<4:33:33, 2.59s/it] +2025-02-06 00:46:16 - ERROR - stderr - 72%|███████▏ | 16095/22434 [14:38:36<4:31:34, 2.57s/it] +2025-02-06 00:46:16 - ERROR - stderr - +2025-02-06 00:46:16 - ERROR - stderr - +2025-02-06 00:46:16 - INFO - stdout - {'loss': 0.3934, 'grad_norm': 1.539754867553711, 'learning_rate': 3.903608031625587e-06, 'epoch': 2.15} +2025-02-06 00:46:16 - ERROR - stderr - 72%|███████▏ | 16095/22434 [14:38:36<4:31:34, 2.57s/it] +2025-02-06 00:46:18 - ERROR - stderr - 72%|███████▏ | 16096/22434 [14:38:38<4:28:13, 2.54s/it] +2025-02-06 00:46:18 - ERROR - stderr - +2025-02-06 00:46:18 - ERROR - stderr - +2025-02-06 00:46:18 - INFO - stdout - {'loss': 0.3759, 'grad_norm': 1.4383267164230347, 'learning_rate': 3.902463667488278e-06, 'epoch': 2.15} +2025-02-06 00:46:18 - ERROR - stderr - 72%|███████▏ | 16096/22434 [14:38:38<4:28:13, 2.54s/it] +2025-02-06 00:46:21 - ERROR - stderr - 72%|███████▏ | 16097/22434 [14:38:41<4:31:28, 2.57s/it] +2025-02-06 00:46:21 - ERROR - stderr - +2025-02-06 00:46:21 - ERROR - stderr - +2025-02-06 00:46:21 - INFO - stdout - {'loss': 0.3963, 'grad_norm': 1.4488979578018188, 'learning_rate': 3.901319430448276e-06, 'epoch': 2.15} +2025-02-06 00:46:21 - ERROR - stderr - 72%|███████▏ | 16097/22434 [14:38:41<4:31:28, 2.57s/it] +2025-02-06 00:46:23 - ERROR - stderr - 72%|███████▏ | 16098/22434 [14:38:43<4:28:48, 2.55s/it] +2025-02-06 00:46:24 - ERROR - stderr - +2025-02-06 00:46:24 - ERROR - stderr - +2025-02-06 00:46:24 - INFO - stdout - {'loss': 0.3454, 'grad_norm': 1.5035040378570557, 'learning_rate': 3.9001753205294335e-06, 'epoch': 2.15} +2025-02-06 00:46:24 - ERROR - stderr - 72%|███████▏ | 16098/22434 [14:38:43<4:28:48, 2.55s/it] +2025-02-06 00:46:26 - ERROR - stderr - 72%|███████▏ | 16099/22434 
[14:38:46<4:27:47, 2.54s/it] +2025-02-06 00:46:26 - ERROR - stderr - +2025-02-06 00:46:26 - ERROR - stderr - +2025-02-06 00:46:26 - INFO - stdout - {'loss': 0.3287, 'grad_norm': 1.335659384727478, 'learning_rate': 3.8990313377556e-06, 'epoch': 2.15} +2025-02-06 00:46:26 - ERROR - stderr - 72%|███████▏ | 16099/22434 [14:38:46<4:27:47, 2.54s/it] +2025-02-06 00:46:28 - ERROR - stderr - 72%|███████▏ | 16100/22434 [14:38:48<4:26:37, 2.53s/it] +2025-02-06 00:46:29 - ERROR - stderr - +2025-02-06 00:46:29 - ERROR - stderr - +2025-02-06 00:46:29 - INFO - stdout - {'loss': 0.4122, 'grad_norm': 1.6142460107803345, 'learning_rate': 3.897887482150621e-06, 'epoch': 2.15} +2025-02-06 00:46:29 - ERROR - stderr - 72%|███████▏ | 16100/22434 [14:38:48<4:26:37, 2.53s/it] +2025-02-06 00:46:31 - ERROR - stderr - 72%|███████▏ | 16101/22434 [14:38:51<4:25:58, 2.52s/it] +2025-02-06 00:46:31 - ERROR - stderr - +2025-02-06 00:46:31 - ERROR - stderr - +2025-02-06 00:46:31 - INFO - stdout - {'loss': 0.3992, 'grad_norm': 1.5206695795059204, 'learning_rate': 3.896743753738337e-06, 'epoch': 2.15} +2025-02-06 00:46:31 - ERROR - stderr - 72%|███████▏ | 16101/22434 [14:38:51<4:25:58, 2.52s/it] +2025-02-06 00:46:34 - ERROR - stderr - 72%|███████▏ | 16102/22434 [14:38:53<4:25:55, 2.52s/it] +2025-02-06 00:46:34 - ERROR - stderr - +2025-02-06 00:46:34 - ERROR - stderr - +2025-02-06 00:46:34 - INFO - stdout - {'loss': 0.3772, 'grad_norm': 1.4361813068389893, 'learning_rate': 3.89560015254259e-06, 'epoch': 2.15} +2025-02-06 00:46:34 - ERROR - stderr - 72%|███████▏ | 16102/22434 [14:38:53<4:25:55, 2.52s/it] +2025-02-06 00:46:36 - ERROR - stderr - 72%|███████▏ | 16103/22434 [14:38:56<4:25:58, 2.52s/it] +2025-02-06 00:46:36 - ERROR - stderr - +2025-02-06 00:46:36 - ERROR - stderr - +2025-02-06 00:46:36 - INFO - stdout - {'loss': 0.3854, 'grad_norm': 1.5769808292388916, 'learning_rate': 3.894456678587216e-06, 'epoch': 2.15} +2025-02-06 00:46:36 - ERROR - stderr - 72%|███████▏ | 16103/22434 [14:38:56<4:25:58, 
2.52s/it] +2025-02-06 00:46:39 - ERROR - stderr - 72%|███████▏ | 16104/22434 [14:38:58<4:30:45, 2.57s/it] +2025-02-06 00:46:39 - ERROR - stderr - +2025-02-06 00:46:39 - ERROR - stderr - +2025-02-06 00:46:39 - INFO - stdout - {'loss': 0.408, 'grad_norm': 1.4655263423919678, 'learning_rate': 3.893313331896051e-06, 'epoch': 2.15} +2025-02-06 00:46:39 - ERROR - stderr - 72%|███████▏ | 16104/22434 [14:38:59<4:30:45, 2.57s/it] +2025-02-06 00:46:41 - ERROR - stderr - 72%|███████▏ | 16105/22434 [14:39:01<4:28:53, 2.55s/it] +2025-02-06 00:46:41 - ERROR - stderr - +2025-02-06 00:46:41 - ERROR - stderr - +2025-02-06 00:46:41 - INFO - stdout - {'loss': 0.3675, 'grad_norm': 1.3872959613800049, 'learning_rate': 3.8921701124929255e-06, 'epoch': 2.15} +2025-02-06 00:46:41 - ERROR - stderr - 72%|███████▏ | 16105/22434 [14:39:01<4:28:53, 2.55s/it] +2025-02-06 00:46:44 - ERROR - stderr - 72%|███████▏ | 16106/22434 [14:39:03<4:27:30, 2.54s/it] +2025-02-06 00:46:44 - ERROR - stderr - +2025-02-06 00:46:44 - ERROR - stderr - +2025-02-06 00:46:44 - INFO - stdout - {'loss': 0.3791, 'grad_norm': 1.3545231819152832, 'learning_rate': 3.89102702040167e-06, 'epoch': 2.15} +2025-02-06 00:46:44 - ERROR - stderr - 72%|███████▏ | 16106/22434 [14:39:04<4:27:30, 2.54s/it] +2025-02-06 00:46:46 - ERROR - stderr - 72%|███████▏ | 16107/22434 [14:39:06<4:27:18, 2.53s/it] +2025-02-06 00:46:46 - ERROR - stderr - +2025-02-06 00:46:46 - ERROR - stderr - +2025-02-06 00:46:46 - INFO - stdout - {'loss': 0.3674, 'grad_norm': 1.4708161354064941, 'learning_rate': 3.88988405564611e-06, 'epoch': 2.15} +2025-02-06 00:46:46 - ERROR - stderr - 72%|███████▏ | 16107/22434 [14:39:06<4:27:18, 2.53s/it] +2025-02-06 00:46:49 - ERROR - stderr - 72%|███████▏ | 16108/22434 [14:39:08<4:24:35, 2.51s/it] +2025-02-06 00:46:49 - ERROR - stderr - +2025-02-06 00:46:49 - ERROR - stderr - +2025-02-06 00:46:49 - INFO - stdout - {'loss': 0.3351, 'grad_norm': 1.4483819007873535, 'learning_rate': 3.888741218250074e-06, 'epoch': 2.15} 
+2025-02-06 00:46:49 - ERROR - stderr - 72%|███████▏ | 16108/22434 [14:39:09<4:24:35, 2.51s/it] +2025-02-06 00:46:51 - ERROR - stderr - 72%|███████▏ | 16109/22434 [14:39:11<4:32:24, 2.58s/it] +2025-02-06 00:46:52 - ERROR - stderr - +2025-02-06 00:46:52 - ERROR - stderr - +2025-02-06 00:46:52 - INFO - stdout - {'loss': 0.3736, 'grad_norm': 1.435144305229187, 'learning_rate': 3.8875985082373725e-06, 'epoch': 2.15} +2025-02-06 00:46:52 - ERROR - stderr - 72%|███████▏ | 16109/22434 [14:39:11<4:32:24, 2.58s/it] +2025-02-06 00:46:54 - ERROR - stderr - 72%|███████▏ | 16110/22434 [14:39:14<4:28:23, 2.55s/it] +2025-02-06 00:46:54 - ERROR - stderr - +2025-02-06 00:46:54 - ERROR - stderr - +2025-02-06 00:46:54 - INFO - stdout - {'loss': 0.3989, 'grad_norm': 1.4609761238098145, 'learning_rate': 3.8864559256318375e-06, 'epoch': 2.15} +2025-02-06 00:46:54 - ERROR - stderr - 72%|███████▏ | 16110/22434 [14:39:14<4:28:23, 2.55s/it] +2025-02-06 00:46:57 - ERROR - stderr - 72%|███████▏ | 16111/22434 [14:39:16<4:34:30, 2.60s/it] +2025-02-06 00:46:57 - ERROR - stderr - +2025-02-06 00:46:57 - ERROR - stderr - +2025-02-06 00:46:57 - INFO - stdout - {'loss': 0.4283, 'grad_norm': 1.5012121200561523, 'learning_rate': 3.885313470457272e-06, 'epoch': 2.15} +2025-02-06 00:46:57 - ERROR - stderr - 72%|███████▏ | 16111/22434 [14:39:16<4:34:30, 2.60s/it] +2025-02-06 00:46:59 - ERROR - stderr - 72%|███████▏ | 16112/22434 [14:39:19<4:30:08, 2.56s/it] +2025-02-06 00:46:59 - ERROR - stderr - +2025-02-06 00:46:59 - ERROR - stderr - +2025-02-06 00:46:59 - INFO - stdout - {'loss': 0.3684, 'grad_norm': 1.4494997262954712, 'learning_rate': 3.8841711427375e-06, 'epoch': 2.15} +2025-02-06 00:46:59 - ERROR - stderr - 72%|███████▏ | 16112/22434 [14:39:19<4:30:08, 2.56s/it] +2025-02-06 00:47:02 - ERROR - stderr - 72%|███████▏ | 16113/22434 [14:39:21<4:30:34, 2.57s/it] +2025-02-06 00:47:02 - ERROR - stderr - +2025-02-06 00:47:02 - ERROR - stderr - +2025-02-06 00:47:02 - INFO - stdout - {'loss': 0.3848, 
'grad_norm': 1.6195967197418213, 'learning_rate': 3.883028942496333e-06, 'epoch': 2.15} +2025-02-06 00:47:02 - ERROR - stderr - 72%|███████▏ | 16113/22434 [14:39:22<4:30:34, 2.57s/it] +2025-02-06 00:47:04 - ERROR - stderr - 72%|███████▏ | 16114/22434 [14:39:24<4:27:13, 2.54s/it] +2025-02-06 00:47:04 - ERROR - stderr - +2025-02-06 00:47:04 - ERROR - stderr - +2025-02-06 00:47:04 - INFO - stdout - {'loss': 0.4225, 'grad_norm': 1.5485037565231323, 'learning_rate': 3.881886869757565e-06, 'epoch': 2.15} +2025-02-06 00:47:04 - ERROR - stderr - 72%|███████▏ | 16114/22434 [14:39:24<4:27:13, 2.54s/it] +2025-02-06 00:47:07 - ERROR - stderr - 72%|███████▏ | 16115/22434 [14:39:26<4:25:47, 2.52s/it] +2025-02-06 00:47:07 - ERROR - stderr - +2025-02-06 00:47:07 - ERROR - stderr - +2025-02-06 00:47:07 - INFO - stdout - {'loss': 0.4064, 'grad_norm': 1.429679036140442, 'learning_rate': 3.880744924545019e-06, 'epoch': 2.15} +2025-02-06 00:47:07 - ERROR - stderr - 72%|███████▏ | 16115/22434 [14:39:26<4:25:47, 2.52s/it] +2025-02-06 00:47:09 - ERROR - stderr - 72%|███████▏ | 16116/22434 [14:39:29<4:24:07, 2.51s/it] +2025-02-06 00:47:09 - ERROR - stderr - +2025-02-06 00:47:09 - ERROR - stderr - +2025-02-06 00:47:09 - INFO - stdout - {'loss': 0.3955, 'grad_norm': 1.559616208076477, 'learning_rate': 3.8796031068824865e-06, 'epoch': 2.16} +2025-02-06 00:47:09 - ERROR - stderr - 72%|███████▏ | 16116/22434 [14:39:29<4:24:07, 2.51s/it] +2025-02-06 00:47:12 - ERROR - stderr - 72%|███████▏ | 16117/22434 [14:39:31<4:23:55, 2.51s/it] +2025-02-06 00:47:12 - ERROR - stderr - +2025-02-06 00:47:12 - ERROR - stderr - +2025-02-06 00:47:12 - INFO - stdout - {'loss': 0.4716, 'grad_norm': 1.8206707239151, 'learning_rate': 3.87846141679377e-06, 'epoch': 2.16} +2025-02-06 00:47:12 - ERROR - stderr - 72%|███████▏ | 16117/22434 [14:39:31<4:23:55, 2.51s/it] +2025-02-06 00:47:14 - ERROR - stderr - 72%|███████▏ | 16118/22434 [14:39:34<4:23:35, 2.50s/it] +2025-02-06 00:47:14 - ERROR - stderr - +2025-02-06 00:47:14 
- ERROR - stderr - +2025-02-06 00:47:14 - INFO - stdout - {'loss': 0.3853, 'grad_norm': 1.3845725059509277, 'learning_rate': 3.877319854302668e-06, 'epoch': 2.16} +2025-02-06 00:47:14 - ERROR - stderr - 72%|███████▏ | 16118/22434 [14:39:34<4:23:35, 2.50s/it] +2025-02-06 00:47:17 - ERROR - stderr - 72%|███████▏ | 16119/22434 [14:39:36<4:23:18, 2.50s/it] +2025-02-06 00:47:17 - ERROR - stderr - +2025-02-06 00:47:17 - ERROR - stderr - +2025-02-06 00:47:17 - INFO - stdout - {'loss': 0.3889, 'grad_norm': 1.4974833726882935, 'learning_rate': 3.876178419432971e-06, 'epoch': 2.16} +2025-02-06 00:47:17 - ERROR - stderr - 72%|███████▏ | 16119/22434 [14:39:36<4:23:18, 2.50s/it] +2025-02-06 00:47:19 - ERROR - stderr - 72%|███████▏ | 16120/22434 [14:39:39<4:23:32, 2.50s/it] +2025-02-06 00:47:19 - ERROR - stderr - +2025-02-06 00:47:19 - ERROR - stderr - +2025-02-06 00:47:19 - INFO - stdout - {'loss': 0.3961, 'grad_norm': 1.488113522529602, 'learning_rate': 3.875037112208482e-06, 'epoch': 2.16} +2025-02-06 00:47:19 - ERROR - stderr - 72%|███████▏ | 16120/22434 [14:39:39<4:23:32, 2.50s/it] +2025-02-06 00:47:22 - ERROR - stderr - 72%|███████▏ | 16121/22434 [14:39:41<4:23:35, 2.51s/it] +2025-02-06 00:47:22 - ERROR - stderr - +2025-02-06 00:47:22 - ERROR - stderr - +2025-02-06 00:47:22 - INFO - stdout - {'loss': 0.3768, 'grad_norm': 1.4846007823944092, 'learning_rate': 3.87389593265298e-06, 'epoch': 2.16} +2025-02-06 00:47:22 - ERROR - stderr - 72%|███████▏ | 16121/22434 [14:39:41<4:23:35, 2.51s/it] +2025-02-06 00:47:24 - ERROR - stderr - 72%|███████▏ | 16122/22434 [14:39:44<4:21:22, 2.48s/it] +2025-02-06 00:47:24 - ERROR - stderr - +2025-02-06 00:47:24 - ERROR - stderr - +2025-02-06 00:47:24 - INFO - stdout - {'loss': 0.4116, 'grad_norm': 1.6884653568267822, 'learning_rate': 3.872754880790255e-06, 'epoch': 2.16} +2025-02-06 00:47:24 - ERROR - stderr - 72%|███████▏ | 16122/22434 [14:39:44<4:21:22, 2.48s/it] +2025-02-06 00:47:27 - ERROR - stderr - 72%|███████▏ | 16123/22434
[14:39:46<4:20:36, 2.48s/it] +2025-02-06 00:47:27 - ERROR - stderr - +2025-02-06 00:47:27 - ERROR - stderr - +2025-02-06 00:47:27 - INFO - stdout - {'loss': 0.373, 'grad_norm': 1.4158353805541992, 'learning_rate': 3.871613956644091e-06, 'epoch': 2.16} +2025-02-06 00:47:27 - ERROR - stderr - 72%|███████▏ | 16123/22434 [14:39:46<4:20:36, 2.48s/it] +2025-02-06 00:47:29 - ERROR - stderr - 72%|███████▏ | 16124/22434 [14:39:49<4:21:38, 2.49s/it] +2025-02-06 00:47:29 - ERROR - stderr - +2025-02-06 00:47:29 - ERROR - stderr - +2025-02-06 00:47:29 - INFO - stdout - {'loss': 0.4165, 'grad_norm': 1.4970380067825317, 'learning_rate': 3.870473160238271e-06, 'epoch': 2.16} +2025-02-06 00:47:29 - ERROR - stderr - 72%|███████▏ | 16124/22434 [14:39:49<4:21:38, 2.49s/it] +2025-02-06 00:47:32 - ERROR - stderr - 72%|███████▏ | 16125/22434 [14:39:51<4:20:32, 2.48s/it] +2025-02-06 00:47:32 - ERROR - stderr - +2025-02-06 00:47:32 - ERROR - stderr - +2025-02-06 00:47:32 - INFO - stdout - {'loss': 0.3876, 'grad_norm': 1.563650369644165, 'learning_rate': 3.869332491596573e-06, 'epoch': 2.16} +2025-02-06 00:47:32 - ERROR - stderr - 72%|███████▏ | 16125/22434 [14:39:51<4:20:32, 2.48s/it] +2025-02-06 00:47:34 - ERROR - stderr - 72%|███████▏ | 16126/22434 [14:39:54<4:21:31, 2.49s/it] +2025-02-06 00:47:34 - ERROR - stderr - +2025-02-06 00:47:34 - ERROR - stderr - +2025-02-06 00:47:34 - INFO - stdout - {'loss': 0.4018, 'grad_norm': 1.581353783607483, 'learning_rate': 3.868191950742771e-06, 'epoch': 2.16} +2025-02-06 00:47:34 - ERROR - stderr - 72%|███████▏ | 16126/22434 [14:39:54<4:21:31, 2.49s/it] +2025-02-06 00:47:37 - ERROR - stderr - 72%|███████▏ | 16127/22434 [14:39:56<4:24:07, 2.51s/it] +2025-02-06 00:47:37 - ERROR - stderr - +2025-02-06 00:47:37 - ERROR - stderr - +2025-02-06 00:47:37 - INFO - stdout - {'loss': 0.4135, 'grad_norm': 1.5553282499313354, 'learning_rate': 3.867051537700642e-06, 'epoch': 2.16} +2025-02-06 00:47:37 - ERROR - stderr - 72%|███████▏ | 16127/22434 [14:39:56<4:24:07, 
2.51s/it] +2025-02-06 00:47:39 - ERROR - stderr - 72%|███████▏ | 16128/22434 [14:39:59<4:25:10, 2.52s/it] +2025-02-06 00:47:39 - ERROR - stderr - +2025-02-06 00:47:39 - ERROR - stderr - +2025-02-06 00:47:39 - INFO - stdout - {'loss': 0.3476, 'grad_norm': 1.5583549737930298, 'learning_rate': 3.8659112524939535e-06, 'epoch': 2.16} +2025-02-06 00:47:39 - ERROR - stderr - 72%|███████▏ | 16128/22434 [14:39:59<4:25:10, 2.52s/it] +2025-02-06 00:47:42 - ERROR - stderr - 72%|███████▏ | 16129/22434 [14:40:01<4:22:19, 2.50s/it] +2025-02-06 00:47:42 - ERROR - stderr - +2025-02-06 00:47:42 - ERROR - stderr - +2025-02-06 00:47:42 - INFO - stdout - {'loss': 0.403, 'grad_norm': 1.486255407333374, 'learning_rate': 3.864771095146479e-06, 'epoch': 2.16} +2025-02-06 00:47:42 - ERROR - stderr - 72%|███████▏ | 16129/22434 [14:40:01<4:22:19, 2.50s/it] +2025-02-06 00:47:44 - ERROR - stderr - 72%|███████▏ | 16130/22434 [14:40:04<4:21:23, 2.49s/it] +2025-02-06 00:47:44 - ERROR - stderr - +2025-02-06 00:47:44 - ERROR - stderr - +2025-02-06 00:47:44 - INFO - stdout - {'loss': 0.4273, 'grad_norm': 1.3971513509750366, 'learning_rate': 3.863631065681974e-06, 'epoch': 2.16} +2025-02-06 00:47:44 - ERROR - stderr - 72%|███████▏ | 16130/22434 [14:40:04<4:21:23, 2.49s/it] +2025-02-06 00:47:47 - ERROR - stderr - 72%|███████▏ | 16131/22434 [14:40:06<4:24:13, 2.52s/it] +2025-02-06 00:47:47 - ERROR - stderr - +2025-02-06 00:47:47 - ERROR - stderr - +2025-02-06 00:47:47 - INFO - stdout - {'loss': 0.3563, 'grad_norm': 1.4773471355438232, 'learning_rate': 3.862491164124211e-06, 'epoch': 2.16} +2025-02-06 00:47:47 - ERROR - stderr - 72%|███████▏ | 16131/22434 [14:40:06<4:24:13, 2.52s/it] +2025-02-06 00:47:49 - ERROR - stderr - 72%|███████▏ | 16132/22434 [14:40:09<4:22:53, 2.50s/it] +2025-02-06 00:47:49 - ERROR - stderr - +2025-02-06 00:47:49 - ERROR - stderr - +2025-02-06 00:47:49 - INFO - stdout - {'loss': 0.3698, 'grad_norm': 1.6563184261322021, 'learning_rate': 3.86135139049695e-06, 'epoch': 2.16} 
+2025-02-06 00:47:49 - ERROR - stderr - 72%|███████▏ | 16132/22434 [14:40:09<4:22:53, 2.50s/it] +2025-02-06 00:47:52 - ERROR - stderr - 72%|███████▏ | 16133/22434 [14:40:11<4:23:10, 2.51s/it] +2025-02-06 00:47:52 - ERROR - stderr - +2025-02-06 00:47:52 - ERROR - stderr - +2025-02-06 00:47:52 - INFO - stdout - {'loss': 0.4156, 'grad_norm': 1.5103743076324463, 'learning_rate': 3.860211744823939e-06, 'epoch': 2.16} +2025-02-06 00:47:52 - ERROR - stderr - 72%|███████▏ | 16133/22434 [14:40:11<4:23:10, 2.51s/it] +2025-02-06 00:47:54 - ERROR - stderr - 72%|███████▏ | 16134/22434 [14:40:14<4:23:00, 2.50s/it] +2025-02-06 00:47:54 - ERROR - stderr - +2025-02-06 00:47:54 - ERROR - stderr - +2025-02-06 00:47:54 - INFO - stdout - {'loss': 0.3726, 'grad_norm': 1.5286197662353516, 'learning_rate': 3.859072227128945e-06, 'epoch': 2.16} +2025-02-06 00:47:54 - ERROR - stderr - 72%|███████▏ | 16134/22434 [14:40:14<4:23:00, 2.50s/it] +2025-02-06 00:47:57 - ERROR - stderr - 72%|███████▏ | 16135/22434 [14:40:16<4:24:56, 2.52s/it] +2025-02-06 00:47:57 - ERROR - stderr - +2025-02-06 00:47:57 - ERROR - stderr - +2025-02-06 00:47:57 - INFO - stdout - {'loss': 0.368, 'grad_norm': 1.5182170867919922, 'learning_rate': 3.857932837435707e-06, 'epoch': 2.16} +2025-02-06 00:47:57 - ERROR - stderr - 72%|███████▏ | 16135/22434 [14:40:17<4:24:56, 2.52s/it] +2025-02-06 00:47:59 - ERROR - stderr - 72%|███████▏ | 16136/22434 [14:40:19<4:21:43, 2.49s/it] +2025-02-06 00:47:59 - ERROR - stderr - +2025-02-06 00:47:59 - ERROR - stderr - +2025-02-06 00:47:59 - INFO - stdout - {'loss': 0.3741, 'grad_norm': 1.495816946029663, 'learning_rate': 3.856793575767989e-06, 'epoch': 2.16} +2025-02-06 00:47:59 - ERROR - stderr - 72%|███████▏ | 16136/22434 [14:40:19<4:21:43, 2.49s/it] +2025-02-06 00:48:02 - ERROR - stderr - 72%|███████▏ | 16137/22434 [14:40:21<4:20:24, 2.48s/it] +2025-02-06 00:48:02 - ERROR - stderr - +2025-02-06 00:48:02 - ERROR - stderr - +2025-02-06 00:48:02 - INFO - stdout - {'loss': 0.3726, 
'grad_norm': 1.4636350870132446, 'learning_rate': 3.855654442149527e-06, 'epoch': 2.16} +2025-02-06 00:48:02 - ERROR - stderr - 72%|███████▏ | 16137/22434 [14:40:21<4:20:24, 2.48s/it] +2025-02-06 00:48:04 - ERROR - stderr - 72%|███████▏ | 16138/22434 [14:40:24<4:19:38, 2.47s/it] +2025-02-06 00:48:04 - ERROR - stderr - +2025-02-06 00:48:04 - ERROR - stderr - +2025-02-06 00:48:04 - INFO - stdout - {'loss': 0.3781, 'grad_norm': 1.3995403051376343, 'learning_rate': 3.854515436604066e-06, 'epoch': 2.16} +2025-02-06 00:48:04 - ERROR - stderr - 72%|███████▏ | 16138/22434 [14:40:24<4:19:38, 2.47s/it] +2025-02-06 00:48:07 - ERROR - stderr - 72%|███████▏ | 16139/22434 [14:40:26<4:23:45, 2.51s/it] +2025-02-06 00:48:07 - ERROR - stderr - +2025-02-06 00:48:07 - ERROR - stderr - +2025-02-06 00:48:07 - INFO - stdout - {'loss': 0.4089, 'grad_norm': 1.4839344024658203, 'learning_rate': 3.8533765591553564e-06, 'epoch': 2.16} +2025-02-06 00:48:07 - ERROR - stderr - 72%|███████▏ | 16139/22434 [14:40:26<4:23:45, 2.51s/it] +2025-02-06 00:48:09 - ERROR - stderr - 72%|███████▏ | 16140/22434 [14:40:29<4:21:55, 2.50s/it] +2025-02-06 00:48:09 - ERROR - stderr - +2025-02-06 00:48:09 - ERROR - stderr - +2025-02-06 00:48:09 - INFO - stdout - {'loss': 0.358, 'grad_norm': 1.4802722930908203, 'learning_rate': 3.852237809827127e-06, 'epoch': 2.16} +2025-02-06 00:48:09 - ERROR - stderr - 72%|███████▏ | 16140/22434 [14:40:29<4:21:55, 2.50s/it] +2025-02-06 00:48:12 - ERROR - stderr - 72%|███████▏ | 16141/22434 [14:40:31<4:22:39, 2.50s/it] +2025-02-06 00:48:12 - ERROR - stderr - +2025-02-06 00:48:12 - ERROR - stderr - +2025-02-06 00:48:12 - INFO - stdout - {'loss': 0.3632, 'grad_norm': 1.516821026802063, 'learning_rate': 3.8510991886431185e-06, 'epoch': 2.16} +2025-02-06 00:48:12 - ERROR - stderr - 72%|███████▏ | 16141/22434 [14:40:31<4:22:39, 2.50s/it] +2025-02-06 00:48:14 - ERROR - stderr - 72%|███████▏ | 16142/22434 [14:40:34<4:21:10, 2.49s/it] +2025-02-06 00:48:14 - ERROR - stderr - +2025-02-06 
00:48:14 - ERROR - stderr - +2025-02-06 00:48:14 - INFO - stdout - {'loss': 0.3748, 'grad_norm': 1.5478018522262573, 'learning_rate': 3.849960695627063e-06, 'epoch': 2.16} +2025-02-06 00:48:14 - ERROR - stderr - 72%|███████▏ | 16142/22434 [14:40:34<4:21:10, 2.49s/it] +2025-02-06 00:48:17 - ERROR - stderr - 72%|███████▏ | 16143/22434 [14:40:36<4:23:25, 2.51s/it] +2025-02-06 00:48:17 - ERROR - stderr - +2025-02-06 00:48:17 - ERROR - stderr - +2025-02-06 00:48:17 - INFO - stdout - {'loss': 0.3691, 'grad_norm': 1.4458105564117432, 'learning_rate': 3.848822330802691e-06, 'epoch': 2.16} +2025-02-06 00:48:17 - ERROR - stderr - 72%|███████▏ | 16143/22434 [14:40:36<4:23:25, 2.51s/it] +2025-02-06 00:48:19 - ERROR - stderr - 72%|███████▏ | 16144/22434 [14:40:39<4:22:28, 2.50s/it] +2025-02-06 00:48:19 - ERROR - stderr - +2025-02-06 00:48:19 - ERROR - stderr - +2025-02-06 00:48:19 - INFO - stdout - {'loss': 0.3678, 'grad_norm': 1.4049608707427979, 'learning_rate': 3.847684094193733e-06, 'epoch': 2.16} +2025-02-06 00:48:19 - ERROR - stderr - 72%|███████▏ | 16144/22434 [14:40:39<4:22:28, 2.50s/it] +2025-02-06 00:48:22 - ERROR - stderr - 72%|███████▏ | 16145/22434 [14:40:41<4:20:12, 2.48s/it] +2025-02-06 00:48:22 - ERROR - stderr - +2025-02-06 00:48:22 - ERROR - stderr - +2025-02-06 00:48:22 - INFO - stdout - {'loss': 0.4045, 'grad_norm': 1.47234046459198, 'learning_rate': 3.846545985823912e-06, 'epoch': 2.16} +2025-02-06 00:48:22 - ERROR - stderr - 72%|███████▏ | 16145/22434 [14:40:41<4:20:12, 2.48s/it] +2025-02-06 00:48:24 - ERROR - stderr - 72%|███████▏ | 16146/22434 [14:40:44<4:20:32, 2.49s/it] +2025-02-06 00:48:24 - ERROR - stderr - +2025-02-06 00:48:24 - ERROR - stderr - +2025-02-06 00:48:24 - INFO - stdout - {'loss': 0.3814, 'grad_norm': 1.5856846570968628, 'learning_rate': 3.845408005716952e-06, 'epoch': 2.16} +2025-02-06 00:48:24 - ERROR - stderr - 72%|███████▏ | 16146/22434 [14:40:44<4:20:32, 2.49s/it] +2025-02-06 00:48:27 - ERROR - stderr - 72%|███████▏ | 16147/22434 
[14:40:46<4:21:25, 2.49s/it] +2025-02-06 00:48:27 - ERROR - stderr - +2025-02-06 00:48:27 - ERROR - stderr - +2025-02-06 00:48:27 - INFO - stdout - {'loss': 0.3788, 'grad_norm': 1.4797228574752808, 'learning_rate': 3.844270153896574e-06, 'epoch': 2.16} +2025-02-06 00:48:27 - ERROR - stderr - 72%|███████▏ | 16147/22434 [14:40:46<4:21:25, 2.49s/it] +2025-02-06 00:48:29 - ERROR - stderr - 72%|███████▏ | 16148/22434 [14:40:49<4:19:47, 2.48s/it] +2025-02-06 00:48:29 - ERROR - stderr - +2025-02-06 00:48:29 - ERROR - stderr - +2025-02-06 00:48:29 - INFO - stdout - {'loss': 0.3232, 'grad_norm': 1.2931476831436157, 'learning_rate': 3.843132430386492e-06, 'epoch': 2.16} +2025-02-06 00:48:29 - ERROR - stderr - 72%|███████▏ | 16148/22434 [14:40:49<4:19:47, 2.48s/it] +2025-02-06 00:48:32 - ERROR - stderr - 72%|███████▏ | 16149/22434 [14:40:51<4:26:59, 2.55s/it] +2025-02-06 00:48:32 - ERROR - stderr - +2025-02-06 00:48:32 - ERROR - stderr - +2025-02-06 00:48:32 - INFO - stdout - {'loss': 0.4106, 'grad_norm': 1.532896876335144, 'learning_rate': 3.841994835210424e-06, 'epoch': 2.16} +2025-02-06 00:48:32 - ERROR - stderr - 72%|███████▏ | 16149/22434 [14:40:52<4:26:59, 2.55s/it] +2025-02-06 00:48:34 - ERROR - stderr - 72%|███████▏ | 16150/22434 [14:40:54<4:24:27, 2.53s/it] +2025-02-06 00:48:34 - ERROR - stderr - +2025-02-06 00:48:34 - ERROR - stderr - +2025-02-06 00:48:34 - INFO - stdout - {'loss': 0.3204, 'grad_norm': 1.380008578300476, 'learning_rate': 3.840857368392082e-06, 'epoch': 2.16} +2025-02-06 00:48:34 - ERROR - stderr - 72%|███████▏ | 16150/22434 [14:40:54<4:24:27, 2.53s/it] +2025-02-06 00:48:37 - ERROR - stderr - 72%|███████▏ | 16151/22434 [14:40:56<4:22:57, 2.51s/it] +2025-02-06 00:48:37 - ERROR - stderr - +2025-02-06 00:48:37 - ERROR - stderr - +2025-02-06 00:48:37 - INFO - stdout - {'loss': 0.3441, 'grad_norm': 1.321828007698059, 'learning_rate': 3.839720029955173e-06, 'epoch': 2.16} +2025-02-06 00:48:37 - ERROR - stderr - 72%|███████▏ | 16151/22434 [14:40:56<4:22:57, 
2.51s/it] +2025-02-06 00:48:39 - ERROR - stderr - 72%|███████▏ | 16152/22434 [14:40:59<4:21:07, 2.49s/it] +2025-02-06 00:48:39 - ERROR - stderr - +2025-02-06 00:48:39 - ERROR - stderr - +2025-02-06 00:48:39 - INFO - stdout - {'loss': 0.3727, 'grad_norm': 1.3661874532699585, 'learning_rate': 3.838582819923405e-06, 'epoch': 2.16} +2025-02-06 00:48:39 - ERROR - stderr - 72%|███████▏ | 16152/22434 [14:40:59<4:21:07, 2.49s/it] +2025-02-06 00:48:42 - ERROR - stderr - 72%|███████▏ | 16153/22434 [14:41:02<4:32:36, 2.60s/it] +2025-02-06 00:48:42 - ERROR - stderr - +2025-02-06 00:48:42 - ERROR - stderr - +2025-02-06 00:48:42 - INFO - stdout - {'loss': 0.4039, 'grad_norm': 1.6206506490707397, 'learning_rate': 3.837445738320488e-06, 'epoch': 2.16} +2025-02-06 00:48:42 - ERROR - stderr - 72%|███████▏ | 16153/22434 [14:41:02<4:32:36, 2.60s/it] +2025-02-06 00:48:45 - ERROR - stderr - 72%|███████▏ | 16154/22434 [14:41:04<4:31:12, 2.59s/it] +2025-02-06 00:48:45 - ERROR - stderr - +2025-02-06 00:48:45 - ERROR - stderr - +2025-02-06 00:48:45 - INFO - stdout - {'loss': 0.3766, 'grad_norm': 1.478460431098938, 'learning_rate': 3.836308785170109e-06, 'epoch': 2.16} +2025-02-06 00:48:45 - ERROR - stderr - 72%|███████▏ | 16154/22434 [14:41:04<4:31:12, 2.59s/it] +2025-02-06 00:48:47 - ERROR - stderr - 72%|███████▏ | 16155/22434 [14:41:07<4:29:03, 2.57s/it] +2025-02-06 00:48:47 - ERROR - stderr - +2025-02-06 00:48:47 - ERROR - stderr - +2025-02-06 00:48:47 - INFO - stdout - {'loss': 0.4125, 'grad_norm': 1.7232661247253418, 'learning_rate': 3.835171960495983e-06, 'epoch': 2.16} +2025-02-06 00:48:47 - ERROR - stderr - 72%|███████▏ | 16155/22434 [14:41:07<4:29:03, 2.57s/it] +2025-02-06 00:48:50 - ERROR - stderr - 72%|███████▏ | 16156/22434 [14:41:09<4:27:41, 2.56s/it] +2025-02-06 00:48:50 - ERROR - stderr - +2025-02-06 00:48:50 - ERROR - stderr - +2025-02-06 00:48:50 - INFO - stdout - {'loss': 0.3732, 'grad_norm': 1.5069818496704102, 'learning_rate': 3.8340352643217904e-06, 'epoch': 2.16} 
+2025-02-06 00:48:50 - ERROR - stderr - 72%|███████▏ | 16156/22434 [14:41:09<4:27:41, 2.56s/it] +2025-02-06 00:48:52 - ERROR - stderr - 72%|███████▏ | 16157/22434 [14:41:12<4:24:02, 2.52s/it] +2025-02-06 00:48:52 - ERROR - stderr - +2025-02-06 00:48:52 - ERROR - stderr - +2025-02-06 00:48:52 - INFO - stdout - {'loss': 0.3765, 'grad_norm': 1.5332118272781372, 'learning_rate': 3.832898696671237e-06, 'epoch': 2.16} +2025-02-06 00:48:52 - ERROR - stderr - 72%|███████▏ | 16157/22434 [14:41:12<4:24:02, 2.52s/it] +2025-02-06 00:48:55 - ERROR - stderr - 72%|███████▏ | 16158/22434 [14:41:14<4:24:23, 2.53s/it] +2025-02-06 00:48:55 - ERROR - stderr - +2025-02-06 00:48:55 - ERROR - stderr - +2025-02-06 00:48:55 - INFO - stdout - {'loss': 0.3618, 'grad_norm': 1.6705482006072998, 'learning_rate': 3.831762257568013e-06, 'epoch': 2.16} +2025-02-06 00:48:55 - ERROR - stderr - 72%|███████▏ | 16158/22434 [14:41:14<4:24:23, 2.53s/it] +2025-02-06 00:48:57 - ERROR - stderr - 72%|███████▏ | 16159/22434 [14:41:17<4:21:54, 2.50s/it] +2025-02-06 00:48:57 - ERROR - stderr - +2025-02-06 00:48:57 - ERROR - stderr - +2025-02-06 00:48:57 - INFO - stdout - {'loss': 0.316, 'grad_norm': 1.3945168256759644, 'learning_rate': 3.8306259470357935e-06, 'epoch': 2.16} +2025-02-06 00:48:57 - ERROR - stderr - 72%|███████▏ | 16159/22434 [14:41:17<4:21:54, 2.50s/it] +2025-02-06 00:48:59 - ERROR - stderr - 72%|███████▏ | 16160/22434 [14:41:19<4:19:58, 2.49s/it] +2025-02-06 00:49:00 - ERROR - stderr - +2025-02-06 00:49:00 - ERROR - stderr - +2025-02-06 00:49:00 - INFO - stdout - {'loss': 0.4065, 'grad_norm': 1.4786723852157593, 'learning_rate': 3.829489765098281e-06, 'epoch': 2.16} +2025-02-06 00:49:00 - ERROR - stderr - 72%|███████▏ | 16160/22434 [14:41:19<4:19:58, 2.49s/it] +2025-02-06 00:49:02 - ERROR - stderr - 72%|███████▏ | 16161/22434 [14:41:22<4:18:56, 2.48s/it] +2025-02-06 00:49:02 - ERROR - stderr - +2025-02-06 00:49:02 - ERROR - stderr - +2025-02-06 00:49:02 - INFO - stdout - {'loss': 0.4358, 
'grad_norm': 1.5336272716522217, 'learning_rate': 3.828353711779146e-06, 'epoch': 2.16} +2025-02-06 00:49:02 - ERROR - stderr - 72%|███████▏ | 16161/22434 [14:41:22<4:18:56, 2.48s/it] +2025-02-06 00:49:04 - ERROR - stderr - 72%|███████▏ | 16162/22434 [14:41:24<4:18:42, 2.47s/it] +2025-02-06 00:49:04 - ERROR - stderr - +2025-02-06 00:49:04 - ERROR - stderr - +2025-02-06 00:49:04 - INFO - stdout - {'loss': 0.4005, 'grad_norm': 1.4567012786865234, 'learning_rate': 3.827217787102072e-06, 'epoch': 2.16} +2025-02-06 00:49:04 - ERROR - stderr - 72%|███████▏ | 16162/22434 [14:41:24<4:18:42, 2.47s/it] +2025-02-06 00:49:07 - ERROR - stderr - 72%|███████▏ | 16163/22434 [14:41:27<4:20:13, 2.49s/it] +2025-02-06 00:49:07 - ERROR - stderr - +2025-02-06 00:49:07 - ERROR - stderr - +2025-02-06 00:49:07 - INFO - stdout - {'loss': 0.4013, 'grad_norm': 1.6541118621826172, 'learning_rate': 3.826081991090737e-06, 'epoch': 2.16} +2025-02-06 00:49:07 - ERROR - stderr - 72%|███████▏ | 16163/22434 [14:41:27<4:20:13, 2.49s/it] +2025-02-06 00:49:10 - ERROR - stderr - 72%|███████▏ | 16164/22434 [14:41:29<4:28:32, 2.57s/it] +2025-02-06 00:49:10 - ERROR - stderr - +2025-02-06 00:49:10 - ERROR - stderr - +2025-02-06 00:49:10 - INFO - stdout - {'loss': 0.3855, 'grad_norm': 1.4277007579803467, 'learning_rate': 3.824946323768811e-06, 'epoch': 2.16} +2025-02-06 00:49:10 - ERROR - stderr - 72%|███████▏ | 16164/22434 [14:41:29<4:28:32, 2.57s/it] +2025-02-06 00:49:12 - ERROR - stderr - 72%|███████▏ | 16165/22434 [14:41:32<4:28:29, 2.57s/it] +2025-02-06 00:49:12 - ERROR - stderr - +2025-02-06 00:49:12 - ERROR - stderr - +2025-02-06 00:49:12 - INFO - stdout - {'loss': 0.438, 'grad_norm': 1.5351969003677368, 'learning_rate': 3.8238107851599785e-06, 'epoch': 2.16} +2025-02-06 00:49:12 - ERROR - stderr - 72%|███████▏ | 16165/22434 [14:41:32<4:28:29, 2.57s/it] +2025-02-06 00:49:15 - ERROR - stderr - 72%|███████▏ | 16166/22434 [14:41:34<4:25:28, 2.54s/it] +2025-02-06 00:49:15 - ERROR - stderr - +2025-02-06 
00:49:15 - ERROR - stderr - +2025-02-06 00:49:15 - INFO - stdout - {'loss': 0.398, 'grad_norm': 1.5175426006317139, 'learning_rate': 3.8226753752878955e-06, 'epoch': 2.16} +2025-02-06 00:49:15 - ERROR - stderr - 72%|███████▏ | 16166/22434 [14:41:35<4:25:28, 2.54s/it] +2025-02-06 00:49:17 - ERROR - stderr - 72%|███████▏ | 16167/22434 [14:41:37<4:24:18, 2.53s/it] +2025-02-06 00:49:17 - ERROR - stderr - +2025-02-06 00:49:17 - ERROR - stderr - +2025-02-06 00:49:17 - INFO - stdout - {'loss': 0.3421, 'grad_norm': 1.3230775594711304, 'learning_rate': 3.8215400941762325e-06, 'epoch': 2.16} +2025-02-06 00:49:17 - ERROR - stderr - 72%|███████▏ | 16167/22434 [14:41:37<4:24:18, 2.53s/it] +2025-02-06 00:49:20 - ERROR - stderr - 72%|███████▏ | 16168/22434 [14:41:40<4:29:39, 2.58s/it] +2025-02-06 00:49:20 - ERROR - stderr - +2025-02-06 00:49:20 - ERROR - stderr - +2025-02-06 00:49:20 - INFO - stdout - {'loss': 0.3404, 'grad_norm': 1.35922372341156, 'learning_rate': 3.820404941848656e-06, 'epoch': 2.16} +2025-02-06 00:49:20 - ERROR - stderr - 72%|███████▏ | 16168/22434 [14:41:40<4:29:39, 2.58s/it] +2025-02-06 00:49:22 - ERROR - stderr - 72%|███████▏ | 16169/22434 [14:41:42<4:27:29, 2.56s/it] +2025-02-06 00:49:22 - ERROR - stderr - +2025-02-06 00:49:22 - ERROR - stderr - +2025-02-06 00:49:22 - INFO - stdout - {'loss': 0.4134, 'grad_norm': 1.5555135011672974, 'learning_rate': 3.819269918328824e-06, 'epoch': 2.16} +2025-02-06 00:49:22 - ERROR - stderr - 72%|███████▏ | 16169/22434 [14:41:42<4:27:29, 2.56s/it] +2025-02-06 00:49:25 - ERROR - stderr - 72%|███████▏ | 16170/22434 [14:41:45<4:23:48, 2.53s/it] +2025-02-06 00:49:25 - ERROR - stderr - +2025-02-06 00:49:25 - ERROR - stderr - +2025-02-06 00:49:25 - INFO - stdout - {'loss': 0.3892, 'grad_norm': 1.5486416816711426, 'learning_rate': 3.8181350236403955e-06, 'epoch': 2.16} +2025-02-06 00:49:25 - ERROR - stderr - 72%|███████▏ | 16170/22434 [14:41:45<4:23:48, 2.53s/it] +2025-02-06 00:49:27 - ERROR - stderr - 72%|███████▏ | 16171/22434 
[14:41:47<4:24:11, 2.53s/it] +2025-02-06 00:49:27 - ERROR - stderr - +2025-02-06 00:49:27 - ERROR - stderr - +2025-02-06 00:49:27 - INFO - stdout - {'loss': 0.4141, 'grad_norm': 1.6113145351409912, 'learning_rate': 3.817000257807029e-06, 'epoch': 2.16} +2025-02-06 00:49:27 - ERROR - stderr - 72%|███████▏ | 16171/22434 [14:41:47<4:24:11, 2.53s/it] +2025-02-06 00:49:30 - ERROR - stderr - 72%|███████▏ | 16172/22434 [14:41:50<4:26:32, 2.55s/it] +2025-02-06 00:49:30 - ERROR - stderr - +2025-02-06 00:49:30 - ERROR - stderr - +2025-02-06 00:49:30 - INFO - stdout - {'loss': 0.3824, 'grad_norm': 1.5268046855926514, 'learning_rate': 3.815865620852375e-06, 'epoch': 2.16} +2025-02-06 00:49:30 - ERROR - stderr - 72%|███████▏ | 16172/22434 [14:41:50<4:26:32, 2.55s/it] +2025-02-06 00:49:33 - ERROR - stderr - 72%|███████▏ | 16173/22434 [14:41:53<4:36:26, 2.65s/it] +2025-02-06 00:49:33 - ERROR - stderr - +2025-02-06 00:49:33 - ERROR - stderr - +2025-02-06 00:49:33 - INFO - stdout - {'loss': 0.3818, 'grad_norm': 1.4876846075057983, 'learning_rate': 3.814731112800083e-06, 'epoch': 2.16} +2025-02-06 00:49:33 - ERROR - stderr - 72%|███████▏ | 16173/22434 [14:41:53<4:36:26, 2.65s/it] +2025-02-06 00:49:35 - ERROR - stderr - 72%|███████▏ | 16174/22434 [14:41:55<4:31:10, 2.60s/it] +2025-02-06 00:49:35 - ERROR - stderr - +2025-02-06 00:49:35 - ERROR - stderr - +2025-02-06 00:49:35 - INFO - stdout - {'loss': 0.37, 'grad_norm': 1.521790862083435, 'learning_rate': 3.8135967336738076e-06, 'epoch': 2.16} +2025-02-06 00:49:35 - ERROR - stderr - 72%|███████▏ | 16174/22434 [14:41:55<4:31:10, 2.60s/it] +2025-02-06 00:49:38 - ERROR - stderr - 72%|███████▏ | 16175/22434 [14:41:58<4:25:30, 2.55s/it] +2025-02-06 00:49:38 - ERROR - stderr - +2025-02-06 00:49:38 - ERROR - stderr - +2025-02-06 00:49:38 - INFO - stdout - {'loss': 0.3762, 'grad_norm': 1.456770896911621, 'learning_rate': 3.8124624834971803e-06, 'epoch': 2.16} +2025-02-06 00:49:38 - ERROR - stderr - 72%|███████▏ | 16175/22434 
[14:41:58<4:25:30, 2.55s/it] +2025-02-06 00:49:40 - ERROR - stderr - 72%|███████▏ | 16176/22434 [14:42:00<4:23:29, 2.53s/it] +2025-02-06 00:49:40 - ERROR - stderr - +2025-02-06 00:49:40 - ERROR - stderr - +2025-02-06 00:49:40 - INFO - stdout - {'loss': 0.3257, 'grad_norm': 1.26850163936615, 'learning_rate': 3.8113283622938556e-06, 'epoch': 2.16} +2025-02-06 00:49:40 - ERROR - stderr - 72%|███████▏ | 16176/22434 [14:42:00<4:23:29, 2.53s/it] +2025-02-06 00:49:43 - ERROR - stderr - 72%|███████▏ | 16177/22434 [14:42:03<4:24:14, 2.53s/it] +2025-02-06 00:49:43 - ERROR - stderr - +2025-02-06 00:49:43 - ERROR - stderr - +2025-02-06 00:49:43 - INFO - stdout - {'loss': 0.4165, 'grad_norm': 1.5819414854049683, 'learning_rate': 3.810194370087473e-06, 'epoch': 2.16} +2025-02-06 00:49:43 - ERROR - stderr - 72%|███████▏ | 16177/22434 [14:42:03<4:24:14, 2.53s/it] +2025-02-06 00:49:45 - ERROR - stderr - 72%|███████▏ | 16178/22434 [14:42:05<4:21:08, 2.50s/it] +2025-02-06 00:49:45 - ERROR - stderr - +2025-02-06 00:49:45 - ERROR - stderr - +2025-02-06 00:49:45 - INFO - stdout - {'loss': 0.3783, 'grad_norm': 1.5373594760894775, 'learning_rate': 3.8090605069016596e-06, 'epoch': 2.16} +2025-02-06 00:49:45 - ERROR - stderr - 72%|███████▏ | 16178/22434 [14:42:05<4:21:08, 2.50s/it] +2025-02-06 00:49:48 - ERROR - stderr - 72%|███████▏ | 16179/22434 [14:42:08<4:21:46, 2.51s/it] +2025-02-06 00:49:48 - ERROR - stderr - +2025-02-06 00:49:48 - ERROR - stderr - +2025-02-06 00:49:48 - INFO - stdout - {'loss': 0.344, 'grad_norm': 1.3819327354431152, 'learning_rate': 3.8079267727600623e-06, 'epoch': 2.16} +2025-02-06 00:49:48 - ERROR - stderr - 72%|███████▏ | 16179/22434 [14:42:08<4:21:46, 2.51s/it] +2025-02-06 00:49:51 - ERROR - stderr - 72%|███████▏ | 16180/22434 [14:42:10<4:31:31, 2.61s/it] +2025-02-06 00:49:51 - ERROR - stderr - +2025-02-06 00:49:51 - ERROR - stderr - +2025-02-06 00:49:51 - INFO - stdout - {'loss': 0.3882, 'grad_norm': 1.460359811782837, 'learning_rate': 3.806793167686298e-06, 
'epoch': 2.16} +2025-02-06 00:49:51 - ERROR - stderr - 72%|███████▏ | 16180/22434 [14:42:10<4:31:31, 2.61s/it] +2025-02-06 00:49:53 - ERROR - stderr - 72%|███████▏ | 16181/22434 [14:42:13<4:26:20, 2.56s/it] +2025-02-06 00:49:53 - ERROR - stderr - +2025-02-06 00:49:53 - ERROR - stderr - +2025-02-06 00:49:53 - INFO - stdout - {'loss': 0.37, 'grad_norm': 1.501381278038025, 'learning_rate': 3.805659691704012e-06, 'epoch': 2.16} +2025-02-06 00:49:53 - ERROR - stderr - 72%|███████▏ | 16181/22434 [14:42:13<4:26:20, 2.56s/it] +2025-02-06 00:49:56 - ERROR - stderr - 72%|███████▏ | 16182/22434 [14:42:15<4:24:12, 2.54s/it] +2025-02-06 00:49:56 - ERROR - stderr - +2025-02-06 00:49:56 - ERROR - stderr - +2025-02-06 00:49:56 - INFO - stdout - {'loss': 0.3502, 'grad_norm': 1.392208218574524, 'learning_rate': 3.8045263448368186e-06, 'epoch': 2.16} +2025-02-06 00:49:56 - ERROR - stderr - 72%|███████▏ | 16182/22434 [14:42:15<4:24:12, 2.54s/it] +2025-02-06 00:49:58 - ERROR - stderr - 72%|███████▏ | 16183/22434 [14:42:18<4:23:18, 2.53s/it] +2025-02-06 00:49:58 - ERROR - stderr - +2025-02-06 00:49:58 - ERROR - stderr - +2025-02-06 00:49:58 - INFO - stdout - {'loss': 0.3831, 'grad_norm': 1.6806392669677734, 'learning_rate': 3.8033931271083423e-06, 'epoch': 2.16} +2025-02-06 00:49:58 - ERROR - stderr - 72%|███████▏ | 16183/22434 [14:42:18<4:23:18, 2.53s/it] +2025-02-06 00:50:01 - ERROR - stderr - 72%|███████▏ | 16184/22434 [14:42:20<4:25:13, 2.55s/it] +2025-02-06 00:50:01 - ERROR - stderr - +2025-02-06 00:50:01 - ERROR - stderr - +2025-02-06 00:50:01 - INFO - stdout - {'loss': 0.3744, 'grad_norm': 1.4217662811279297, 'learning_rate': 3.8022600385422126e-06, 'epoch': 2.16} +2025-02-06 00:50:01 - ERROR - stderr - 72%|███████▏ | 16184/22434 [14:42:20<4:25:13, 2.55s/it] +2025-02-06 00:50:03 - ERROR - stderr - 72%|███████▏ | 16185/22434 [14:42:23<4:22:07, 2.52s/it] +2025-02-06 00:50:03 - ERROR - stderr - +2025-02-06 00:50:03 - ERROR - stderr - +2025-02-06 00:50:03 - INFO - stdout - {'loss': 
0.4237, 'grad_norm': 1.557889461517334, 'learning_rate': 3.801127079162039e-06, 'epoch': 2.16} +2025-02-06 00:50:03 - ERROR - stderr - 72%|███████▏ | 16185/22434 [14:42:23<4:22:07, 2.52s/it] +2025-02-06 00:50:06 - ERROR - stderr - 72%|███████▏ | 16186/22434 [14:42:25<4:23:38, 2.53s/it] +2025-02-06 00:50:06 - ERROR - stderr - +2025-02-06 00:50:06 - ERROR - stderr - +2025-02-06 00:50:06 - INFO - stdout - {'loss': 0.3994, 'grad_norm': 1.4533740282058716, 'learning_rate': 3.7999942489914397e-06, 'epoch': 2.16} +2025-02-06 00:50:06 - ERROR - stderr - 72%|███████▏ | 16186/22434 [14:42:25<4:23:38, 2.53s/it] +2025-02-06 00:50:08 - ERROR - stderr - 72%|███████▏ | 16187/22434 [14:42:28<4:20:45, 2.50s/it] +2025-02-06 00:50:08 - ERROR - stderr - +2025-02-06 00:50:08 - ERROR - stderr - +2025-02-06 00:50:08 - INFO - stdout - {'loss': 0.4034, 'grad_norm': 1.6821880340576172, 'learning_rate': 3.798861548054028e-06, 'epoch': 2.16} +2025-02-06 00:50:08 - ERROR - stderr - 72%|███████▏ | 16187/22434 [14:42:28<4:20:45, 2.50s/it] +2025-02-06 00:50:11 - ERROR - stderr - 72%|███████▏ | 16188/22434 [14:42:30<4:20:13, 2.50s/it] +2025-02-06 00:50:11 - ERROR - stderr - +2025-02-06 00:50:11 - ERROR - stderr - +2025-02-06 00:50:11 - INFO - stdout - {'loss': 0.3988, 'grad_norm': 1.5353648662567139, 'learning_rate': 3.7977289763734125e-06, 'epoch': 2.16} +2025-02-06 00:50:11 - ERROR - stderr - 72%|███████▏ | 16188/22434 [14:42:30<4:20:13, 2.50s/it] +2025-02-06 00:50:13 - ERROR - stderr - 72%|███████▏ | 16189/22434 [14:42:33<4:21:16, 2.51s/it] +2025-02-06 00:50:13 - ERROR - stderr - +2025-02-06 00:50:13 - ERROR - stderr - +2025-02-06 00:50:13 - INFO - stdout - {'loss': 0.3631, 'grad_norm': 1.403009057044983, 'learning_rate': 3.7965965339732025e-06, 'epoch': 2.16} +2025-02-06 00:50:13 - ERROR - stderr - 72%|███████▏ | 16189/22434 [14:42:33<4:21:16, 2.51s/it] +2025-02-06 00:50:16 - ERROR - stderr - 72%|███████▏ | 16190/22434 [14:42:36<4:25:53, 2.55s/it] +2025-02-06 00:50:16 - ERROR - stderr - 
+2025-02-06 00:50:16 - ERROR - stderr - +2025-02-06 00:50:16 - INFO - stdout - {'loss': 0.3907, 'grad_norm': 1.4433552026748657, 'learning_rate': 3.795464220877001e-06, 'epoch': 2.17} +2025-02-06 00:50:16 - ERROR - stderr - 72%|███████▏ | 16190/22434 [14:42:36<4:25:53, 2.55s/it] +2025-02-06 00:50:18 - ERROR - stderr - 72%|███████▏ | 16191/22434 [14:42:38<4:24:23, 2.54s/it] +2025-02-06 00:50:18 - ERROR - stderr - +2025-02-06 00:50:18 - ERROR - stderr - +2025-02-06 00:50:18 - INFO - stdout - {'loss': 0.3582, 'grad_norm': 1.570870041847229, 'learning_rate': 3.7943320371084104e-06, 'epoch': 2.17} +2025-02-06 00:50:18 - ERROR - stderr - 72%|███████▏ | 16191/22434 [14:42:38<4:24:23, 2.54s/it] +2025-02-06 00:50:21 - ERROR - stderr - 72%|███████▏ | 16192/22434 [14:42:41<4:22:22, 2.52s/it] +2025-02-06 00:50:21 - ERROR - stderr - +2025-02-06 00:50:21 - ERROR - stderr - +2025-02-06 00:50:21 - INFO - stdout - {'loss': 0.3556, 'grad_norm': 1.4943965673446655, 'learning_rate': 3.7931999826910316e-06, 'epoch': 2.17} +2025-02-06 00:50:21 - ERROR - stderr - 72%|███████▏ | 16192/22434 [14:42:41<4:22:22, 2.52s/it] +2025-02-06 00:50:23 - ERROR - stderr - 72%|███████▏ | 16193/22434 [14:42:43<4:21:18, 2.51s/it] +2025-02-06 00:50:23 - ERROR - stderr - +2025-02-06 00:50:23 - ERROR - stderr - +2025-02-06 00:50:23 - INFO - stdout - {'loss': 0.3639, 'grad_norm': 1.3177591562271118, 'learning_rate': 3.7920680576484627e-06, 'epoch': 2.17} +2025-02-06 00:50:23 - ERROR - stderr - 72%|███████▏ | 16193/22434 [14:42:43<4:21:18, 2.51s/it] +2025-02-06 00:50:26 - ERROR - stderr - 72%|███████▏ | 16194/22434 [14:42:46<4:29:41, 2.59s/it] +2025-02-06 00:50:26 - ERROR - stderr - +2025-02-06 00:50:26 - ERROR - stderr - +2025-02-06 00:50:26 - INFO - stdout - {'loss': 0.342, 'grad_norm': 1.419259786605835, 'learning_rate': 3.790936262004287e-06, 'epoch': 2.17} +2025-02-06 00:50:26 - ERROR - stderr - 72%|███████▏ | 16194/22434 [14:42:46<4:29:41, 2.59s/it] +2025-02-06 00:50:29 - ERROR - stderr - 72%|███████▏ | 
16195/22434 [14:42:48<4:30:29, 2.60s/it] +2025-02-06 00:50:29 - ERROR - stderr - +2025-02-06 00:50:29 - ERROR - stderr - +2025-02-06 00:50:29 - INFO - stdout - {'loss': 0.4005, 'grad_norm': 1.5744701623916626, 'learning_rate': 3.7898045957821082e-06, 'epoch': 2.17} +2025-02-06 00:50:29 - ERROR - stderr - 72%|███████▏ | 16195/22434 [14:42:48<4:30:29, 2.60s/it] +2025-02-06 00:50:31 - ERROR - stderr - 72%|███████▏ | 16196/22434 [14:42:51<4:30:08, 2.60s/it] +2025-02-06 00:50:31 - ERROR - stderr - +2025-02-06 00:50:31 - ERROR - stderr - +2025-02-06 00:50:31 - INFO - stdout - {'loss': 0.3538, 'grad_norm': 1.3720170259475708, 'learning_rate': 3.78867305900551e-06, 'epoch': 2.17} +2025-02-06 00:50:31 - ERROR - stderr - 72%|███████▏ | 16196/22434 [14:42:51<4:30:08, 2.60s/it] +2025-02-06 00:50:34 - ERROR - stderr - 72%|███████▏ | 16197/22434 [14:42:54<4:30:27, 2.60s/it] +2025-02-06 00:50:34 - ERROR - stderr - +2025-02-06 00:50:34 - ERROR - stderr - +2025-02-06 00:50:34 - INFO - stdout - {'loss': 0.3957, 'grad_norm': 1.5274364948272705, 'learning_rate': 3.787541651698077e-06, 'epoch': 2.17} +2025-02-06 00:50:34 - ERROR - stderr - 72%|███████▏ | 16197/22434 [14:42:54<4:30:27, 2.60s/it] +2025-02-06 00:50:36 - ERROR - stderr - 72%|███████▏ | 16198/22434 [14:42:56<4:26:14, 2.56s/it] +2025-02-06 00:50:36 - ERROR - stderr - +2025-02-06 00:50:36 - ERROR - stderr - +2025-02-06 00:50:36 - INFO - stdout - {'loss': 0.3787, 'grad_norm': 1.5357438325881958, 'learning_rate': 3.786410373883398e-06, 'epoch': 2.17} +2025-02-06 00:50:36 - ERROR - stderr - 72%|███████▏ | 16198/22434 [14:42:56<4:26:14, 2.56s/it] +2025-02-06 00:50:39 - ERROR - stderr - 72%|███████▏ | 16199/22434 [14:42:59<4:28:19, 2.58s/it] +2025-02-06 00:50:39 - ERROR - stderr - +2025-02-06 00:50:39 - ERROR - stderr - +2025-02-06 00:50:39 - INFO - stdout - {'loss': 0.3898, 'grad_norm': 1.5135211944580078, 'learning_rate': 3.785279225585042e-06, 'epoch': 2.17} +2025-02-06 00:50:39 - ERROR - stderr - 72%|███████▏ | 16199/22434 
[14:42:59<4:28:19, 2.58s/it] +2025-02-06 00:50:42 - ERROR - stderr - 72%|███████▏ | 16200/22434 [14:43:01<4:32:26, 2.62s/it] +2025-02-06 00:50:42 - ERROR - stderr - +2025-02-06 00:50:42 - ERROR - stderr - +2025-02-06 00:50:42 - INFO - stdout - {'loss': 0.3367, 'grad_norm': 1.4882596731185913, 'learning_rate': 3.7841482068266013e-06, 'epoch': 2.17} +2025-02-06 00:50:42 - ERROR - stderr - 72%|███████▏ | 16200/22434 [14:43:01<4:32:26, 2.62s/it] +2025-02-06 00:50:44 - ERROR - stderr - 72%|███████▏ | 16201/22434 [14:43:04<4:32:16, 2.62s/it] +2025-02-06 00:50:44 - ERROR - stderr - +2025-02-06 00:50:44 - ERROR - stderr - +2025-02-06 00:50:44 - INFO - stdout - {'loss': 0.3303, 'grad_norm': 1.4050239324569702, 'learning_rate': 3.783017317631639e-06, 'epoch': 2.17} +2025-02-06 00:50:44 - ERROR - stderr - 72%|███████▏ | 16201/22434 [14:43:04<4:32:16, 2.62s/it] +2025-02-06 00:50:47 - ERROR - stderr - 72%|███████▏ | 16202/22434 [14:43:07<4:29:06, 2.59s/it] +2025-02-06 00:50:47 - ERROR - stderr - +2025-02-06 00:50:47 - ERROR - stderr - +2025-02-06 00:50:47 - INFO - stdout - {'loss': 0.3932, 'grad_norm': 1.5266227722167969, 'learning_rate': 3.7818865580237287e-06, 'epoch': 2.17} +2025-02-06 00:50:47 - ERROR - stderr - 72%|███████▏ | 16202/22434 [14:43:07<4:29:06, 2.59s/it] +2025-02-06 00:50:49 - ERROR - stderr - 72%|███████▏ | 16203/22434 [14:43:09<4:26:56, 2.57s/it] +2025-02-06 00:50:49 - ERROR - stderr - +2025-02-06 00:50:49 - ERROR - stderr - +2025-02-06 00:50:49 - INFO - stdout - {'loss': 0.4141, 'grad_norm': 1.5692311525344849, 'learning_rate': 3.7807559280264495e-06, 'epoch': 2.17} +2025-02-06 00:50:49 - ERROR - stderr - 72%|███████▏ | 16203/22434 [14:43:09<4:26:56, 2.57s/it] +2025-02-06 00:50:52 - ERROR - stderr - 72%|███████▏ | 16204/22434 [14:43:12<4:23:58, 2.54s/it] +2025-02-06 00:50:52 - ERROR - stderr - +2025-02-06 00:50:52 - ERROR - stderr - +2025-02-06 00:50:52 - INFO - stdout - {'loss': 0.4009, 'grad_norm': 1.45881986618042, 'learning_rate': 3.779625427663355e-06, 
'epoch': 2.17} +2025-02-06 00:50:52 - ERROR - stderr - 72%|███████▏ | 16204/22434 [14:43:12<4:23:58, 2.54s/it] +2025-02-06 00:50:54 - ERROR - stderr - 72%|███████▏ | 16205/22434 [14:43:14<4:27:27, 2.58s/it] +2025-02-06 00:50:55 - ERROR - stderr - +2025-02-06 00:50:55 - ERROR - stderr - +2025-02-06 00:50:55 - INFO - stdout - {'loss': 0.3943, 'grad_norm': 1.5354901552200317, 'learning_rate': 3.7784950569580224e-06, 'epoch': 2.17} +2025-02-06 00:50:55 - ERROR - stderr - 72%|███████▏ | 16205/22434 [14:43:14<4:27:27, 2.58s/it] +2025-02-06 00:50:57 - ERROR - stderr - 72%|███████▏ | 16206/22434 [14:43:17<4:22:48, 2.53s/it] +2025-02-06 00:50:57 - ERROR - stderr - +2025-02-06 00:50:57 - ERROR - stderr - +2025-02-06 00:50:57 - INFO - stdout - {'loss': 0.3268, 'grad_norm': 1.4313868284225464, 'learning_rate': 3.777364815934005e-06, 'epoch': 2.17} +2025-02-06 00:50:57 - ERROR - stderr - 72%|███████▏ | 16206/22434 [14:43:17<4:22:48, 2.53s/it] +2025-02-06 00:50:59 - ERROR - stderr - 72%|███████▏ | 16207/22434 [14:43:19<4:20:28, 2.51s/it] +2025-02-06 00:50:59 - ERROR - stderr - +2025-02-06 00:50:59 - ERROR - stderr - +2025-02-06 00:50:59 - INFO - stdout - {'loss': 0.3721, 'grad_norm': 1.69831120967865, 'learning_rate': 3.776234704614863e-06, 'epoch': 2.17} +2025-02-06 00:50:59 - ERROR - stderr - 72%|███████▏ | 16207/22434 [14:43:19<4:20:28, 2.51s/it] +2025-02-06 00:51:02 - ERROR - stderr - 72%|███████▏ | 16208/22434 [14:43:22<4:22:58, 2.53s/it] +2025-02-06 00:51:02 - ERROR - stderr - +2025-02-06 00:51:02 - ERROR - stderr - +2025-02-06 00:51:02 - INFO - stdout - {'loss': 0.3441, 'grad_norm': 1.4901002645492554, 'learning_rate': 3.7751047230241535e-06, 'epoch': 2.17} +2025-02-06 00:51:02 - ERROR - stderr - 72%|███████▏ | 16208/22434 [14:43:22<4:22:58, 2.53s/it] +2025-02-06 00:51:05 - ERROR - stderr - 72%|███████▏ | 16209/22434 [14:43:24<4:24:13, 2.55s/it] +2025-02-06 00:51:05 - ERROR - stderr - +2025-02-06 00:51:05 - ERROR - stderr - +2025-02-06 00:51:05 - INFO - stdout - {'loss': 
0.4466, 'grad_norm': 1.7743366956710815, 'learning_rate': 3.7739748711854284e-06, 'epoch': 2.17} +2025-02-06 00:51:05 - ERROR - stderr - 72%|███████▏ | 16209/22434 [14:43:24<4:24:13, 2.55s/it] +2025-02-06 00:51:07 - ERROR - stderr - 72%|███████▏ | 16210/22434 [14:43:27<4:26:27, 2.57s/it] +2025-02-06 00:51:07 - ERROR - stderr - +2025-02-06 00:51:07 - ERROR - stderr - +2025-02-06 00:51:07 - INFO - stdout - {'loss': 0.3717, 'grad_norm': 1.3814276456832886, 'learning_rate': 3.7728451491222394e-06, 'epoch': 2.17} +2025-02-06 00:51:07 - ERROR - stderr - 72%|███████▏ | 16210/22434 [14:43:27<4:26:27, 2.57s/it] +2025-02-06 00:51:10 - ERROR - stderr - 72%|███████▏ | 16211/22434 [14:43:29<4:24:58, 2.55s/it] +2025-02-06 00:51:10 - ERROR - stderr - +2025-02-06 00:51:10 - ERROR - stderr - +2025-02-06 00:51:10 - INFO - stdout - {'loss': 0.3418, 'grad_norm': 1.6186026334762573, 'learning_rate': 3.7717155568581354e-06, 'epoch': 2.17} +2025-02-06 00:51:10 - ERROR - stderr - 72%|███████▏ | 16211/22434 [14:43:29<4:24:58, 2.55s/it] +2025-02-06 00:51:12 - ERROR - stderr - 72%|███████▏ | 16212/22434 [14:43:32<4:21:40, 2.52s/it] +2025-02-06 00:51:12 - ERROR - stderr - +2025-02-06 00:51:12 - ERROR - stderr - +2025-02-06 00:51:12 - INFO - stdout - {'loss': 0.4086, 'grad_norm': 1.744234561920166, 'learning_rate': 3.7705860944166607e-06, 'epoch': 2.17} +2025-02-06 00:51:12 - ERROR - stderr - 72%|███████▏ | 16212/22434 [14:43:32<4:21:40, 2.52s/it] +2025-02-06 00:51:15 - ERROR - stderr - 72%|███████▏ | 16213/22434 [14:43:34<4:20:15, 2.51s/it] +2025-02-06 00:51:15 - ERROR - stderr - +2025-02-06 00:51:15 - ERROR - stderr - +2025-02-06 00:51:15 - INFO - stdout - {'loss': 0.4351, 'grad_norm': 1.6753228902816772, 'learning_rate': 3.7694567618213584e-06, 'epoch': 2.17} +2025-02-06 00:51:15 - ERROR - stderr - 72%|███████▏ | 16213/22434 [14:43:34<4:20:15, 2.51s/it] +2025-02-06 00:51:17 - ERROR - stderr - 72%|███████▏ | 16214/22434 [14:43:37<4:20:17, 2.51s/it] +2025-02-06 00:51:17 - ERROR - stderr - 
+2025-02-06 00:51:17 - ERROR - stderr - +2025-02-06 00:51:17 - INFO - stdout - {'loss': 0.3689, 'grad_norm': 1.3432114124298096, 'learning_rate': 3.768327559095767e-06, 'epoch': 2.17} +2025-02-06 00:51:17 - ERROR - stderr - 72%|███████▏ | 16214/22434 [14:43:37<4:20:17, 2.51s/it] +2025-02-06 00:51:20 - ERROR - stderr - 72%|███████▏ | 16215/22434 [14:43:39<4:21:50, 2.53s/it] +2025-02-06 00:51:20 - ERROR - stderr - +2025-02-06 00:51:20 - ERROR - stderr - +2025-02-06 00:51:20 - INFO - stdout - {'loss': 0.346, 'grad_norm': 1.5963348150253296, 'learning_rate': 3.7671984862634246e-06, 'epoch': 2.17} +2025-02-06 00:51:20 - ERROR - stderr - 72%|███████▏ | 16215/22434 [14:43:39<4:21:50, 2.53s/it] +2025-02-06 00:51:22 - ERROR - stderr - 72%|███████▏ | 16216/22434 [14:43:42<4:20:43, 2.52s/it] +2025-02-06 00:51:22 - ERROR - stderr - +2025-02-06 00:51:22 - ERROR - stderr - +2025-02-06 00:51:22 - INFO - stdout - {'loss': 0.363, 'grad_norm': 1.4214552640914917, 'learning_rate': 3.7660695433478667e-06, 'epoch': 2.17} +2025-02-06 00:51:22 - ERROR - stderr - 72%|███████▏ | 16216/22434 [14:43:42<4:20:43, 2.52s/it] +2025-02-06 00:51:25 - ERROR - stderr - 72%|███████▏ | 16217/22434 [14:43:45<4:23:00, 2.54s/it] +2025-02-06 00:51:25 - ERROR - stderr - +2025-02-06 00:51:25 - ERROR - stderr - +2025-02-06 00:51:25 - INFO - stdout - {'loss': 0.3511, 'grad_norm': 1.4677363634109497, 'learning_rate': 3.7649407303726258e-06, 'epoch': 2.17} +2025-02-06 00:51:25 - ERROR - stderr - 72%|███████▏ | 16217/22434 [14:43:45<4:23:00, 2.54s/it] +2025-02-06 00:51:27 - ERROR - stderr - 72%|███████▏ | 16218/22434 [14:43:47<4:24:37, 2.55s/it] +2025-02-06 00:51:27 - ERROR - stderr - +2025-02-06 00:51:27 - ERROR - stderr - +2025-02-06 00:51:27 - INFO - stdout - {'loss': 0.3695, 'grad_norm': 1.4466300010681152, 'learning_rate': 3.7638120473612228e-06, 'epoch': 2.17} +2025-02-06 00:51:27 - ERROR - stderr - 72%|███████▏ | 16218/22434 [14:43:47<4:24:37, 2.55s/it] +2025-02-06 00:51:30 - ERROR - stderr - 72%|███████▏ 
| 16219/22434 [14:43:50<4:21:14, 2.52s/it] +2025-02-06 00:51:30 - ERROR - stderr - +2025-02-06 00:51:30 - ERROR - stderr - +2025-02-06 00:51:30 - INFO - stdout - {'loss': 0.308, 'grad_norm': 1.2779343128204346, 'learning_rate': 3.7626834943371984e-06, 'epoch': 2.17} +2025-02-06 00:51:30 - ERROR - stderr - 72%|███████▏ | 16219/22434 [14:43:50<4:21:14, 2.52s/it] +2025-02-06 00:51:32 - ERROR - stderr - 72%|███████▏ | 16220/22434 [14:43:52<4:17:55, 2.49s/it] +2025-02-06 00:51:32 - ERROR - stderr - +2025-02-06 00:51:32 - ERROR - stderr - +2025-02-06 00:51:32 - INFO - stdout - {'loss': 0.3706, 'grad_norm': 1.4937680959701538, 'learning_rate': 3.76155507132406e-06, 'epoch': 2.17} +2025-02-06 00:51:32 - ERROR - stderr - 72%|███████▏ | 16220/22434 [14:43:52<4:17:55, 2.49s/it] +2025-02-06 00:51:35 - ERROR - stderr - 72%|███████▏ | 16221/22434 [14:43:55<4:29:07, 2.60s/it] +2025-02-06 00:51:35 - ERROR - stderr - +2025-02-06 00:51:35 - ERROR - stderr - +2025-02-06 00:51:35 - INFO - stdout - {'loss': 0.3574, 'grad_norm': 1.2822743654251099, 'learning_rate': 3.7604267783453395e-06, 'epoch': 2.17} +2025-02-06 00:51:35 - ERROR - stderr - 72%|███████▏ | 16221/22434 [14:43:55<4:29:07, 2.60s/it] +2025-02-06 00:51:37 - ERROR - stderr - 72%|███████▏ | 16222/22434 [14:43:57<4:23:51, 2.55s/it] +2025-02-06 00:51:38 - ERROR - stderr - +2025-02-06 00:51:38 - ERROR - stderr - +2025-02-06 00:51:38 - INFO - stdout - {'loss': 0.4078, 'grad_norm': 1.6081180572509766, 'learning_rate': 3.759298615424557e-06, 'epoch': 2.17} +2025-02-06 00:51:38 - ERROR - stderr - 72%|███████▏ | 16222/22434 [14:43:57<4:23:51, 2.55s/it] +2025-02-06 00:51:40 - ERROR - stderr - 72%|███████▏ | 16223/22434 [14:44:00<4:22:43, 2.54s/it] +2025-02-06 00:51:40 - ERROR - stderr - +2025-02-06 00:51:40 - ERROR - stderr - +2025-02-06 00:51:40 - INFO - stdout - {'loss': 0.3687, 'grad_norm': 1.4695504903793335, 'learning_rate': 3.7581705825852156e-06, 'epoch': 2.17} +2025-02-06 00:51:40 - ERROR - stderr - 72%|███████▏ | 16223/22434 
[14:44:00<4:22:43, 2.54s/it] +2025-02-06 00:51:42 - ERROR - stderr - 72%|███████▏ | 16224/22434 [14:44:02<4:20:35, 2.52s/it] +2025-02-06 00:51:43 - ERROR - stderr - +2025-02-06 00:51:43 - ERROR - stderr - +2025-02-06 00:51:43 - INFO - stdout - {'loss': 0.4166, 'grad_norm': 1.583430528640747, 'learning_rate': 3.7570426798508417e-06, 'epoch': 2.17} +2025-02-06 00:51:43 - ERROR - stderr - 72%|███████▏ | 16224/22434 [14:44:02<4:20:35, 2.52s/it] +2025-02-06 00:51:45 - ERROR - stderr - 72%|███████▏ | 16225/22434 [14:44:05<4:19:42, 2.51s/it] +2025-02-06 00:51:45 - ERROR - stderr - +2025-02-06 00:51:45 - ERROR - stderr - +2025-02-06 00:51:45 - INFO - stdout - {'loss': 0.3882, 'grad_norm': 1.5401413440704346, 'learning_rate': 3.7559149072449377e-06, 'epoch': 2.17} +2025-02-06 00:51:45 - ERROR - stderr - 72%|███████▏ | 16225/22434 [14:44:05<4:19:42, 2.51s/it] +2025-02-06 00:51:48 - ERROR - stderr - 72%|███████▏ | 16226/22434 [14:44:07<4:20:30, 2.52s/it] +2025-02-06 00:51:48 - ERROR - stderr - +2025-02-06 00:51:48 - ERROR - stderr - +2025-02-06 00:51:48 - INFO - stdout - {'loss': 0.3487, 'grad_norm': 1.3912434577941895, 'learning_rate': 3.754787264791011e-06, 'epoch': 2.17} +2025-02-06 00:51:48 - ERROR - stderr - 72%|███████▏ | 16226/22434 [14:44:07<4:20:30, 2.52s/it] +2025-02-06 00:51:50 - ERROR - stderr - 72%|███████▏ | 16227/22434 [14:44:10<4:19:59, 2.51s/it] +2025-02-06 00:51:50 - ERROR - stderr - +2025-02-06 00:51:50 - ERROR - stderr - +2025-02-06 00:51:50 - INFO - stdout - {'loss': 0.3146, 'grad_norm': 1.2680352926254272, 'learning_rate': 3.7536597525125683e-06, 'epoch': 2.17} +2025-02-06 00:51:50 - ERROR - stderr - 72%|███████▏ | 16227/22434 [14:44:10<4:19:59, 2.51s/it] +2025-02-06 00:51:53 - ERROR - stderr - 72%|███████▏ | 16228/22434 [14:44:12<4:20:02, 2.51s/it] +2025-02-06 00:51:53 - ERROR - stderr - +2025-02-06 00:51:53 - ERROR - stderr - +2025-02-06 00:51:53 - INFO - stdout - {'loss': 0.3703, 'grad_norm': 1.4927648305892944, 'learning_rate': 
3.7525323704331108e-06, 'epoch': 2.17} +2025-02-06 00:51:53 - ERROR - stderr - 72%|███████▏ | 16228/22434 [14:44:12<4:20:02, 2.51s/it] +2025-02-06 00:51:55 - ERROR - stderr - 72%|███████▏ | 16229/22434 [14:44:15<4:21:20, 2.53s/it] +2025-02-06 00:51:55 - ERROR - stderr - +2025-02-06 00:51:55 - ERROR - stderr - +2025-02-06 00:51:55 - INFO - stdout - {'loss': 0.3528, 'grad_norm': 1.3060998916625977, 'learning_rate': 3.751405118576138e-06, 'epoch': 2.17} +2025-02-06 00:51:55 - ERROR - stderr - 72%|███████▏ | 16229/22434 [14:44:15<4:21:20, 2.53s/it] +2025-02-06 00:51:58 - ERROR - stderr - 72%|███████▏ | 16230/22434 [14:44:17<4:20:57, 2.52s/it] +2025-02-06 00:51:58 - ERROR - stderr - +2025-02-06 00:51:58 - ERROR - stderr - +2025-02-06 00:51:58 - INFO - stdout - {'loss': 0.3726, 'grad_norm': 1.6163612604141235, 'learning_rate': 3.750277996965146e-06, 'epoch': 2.17} +2025-02-06 00:51:58 - ERROR - stderr - 72%|███████▏ | 16230/22434 [14:44:17<4:20:57, 2.52s/it] +2025-02-06 00:52:00 - ERROR - stderr - 72%|███████▏ | 16231/22434 [14:44:20<4:19:47, 2.51s/it] +2025-02-06 00:52:00 - ERROR - stderr - +2025-02-06 00:52:00 - ERROR - stderr - +2025-02-06 00:52:00 - INFO - stdout - {'loss': 0.364, 'grad_norm': 1.566009759902954, 'learning_rate': 3.749151005623629e-06, 'epoch': 2.17} +2025-02-06 00:52:00 - ERROR - stderr - 72%|███████▏ | 16231/22434 [14:44:20<4:19:47, 2.51s/it] +2025-02-06 00:52:03 - ERROR - stderr - 72%|███████▏ | 16232/22434 [14:44:23<4:26:54, 2.58s/it] +2025-02-06 00:52:03 - ERROR - stderr - +2025-02-06 00:52:03 - ERROR - stderr - +2025-02-06 00:52:03 - INFO - stdout - {'loss': 0.3866, 'grad_norm': 1.4719799757003784, 'learning_rate': 3.7480241445750776e-06, 'epoch': 2.17} +2025-02-06 00:52:03 - ERROR - stderr - 72%|███████▏ | 16232/22434 [14:44:23<4:26:54, 2.58s/it] +2025-02-06 00:52:05 - ERROR - stderr - 72%|███████▏ | 16233/22434 [14:44:25<4:26:11, 2.58s/it] +2025-02-06 00:52:05 - ERROR - stderr - +2025-02-06 00:52:05 - ERROR - stderr - +2025-02-06 00:52:05 - 
INFO - stdout - {'loss': 0.3642, 'grad_norm': 1.4604594707489014, 'learning_rate': 3.7468974138429802e-06, 'epoch': 2.17} +2025-02-06 00:52:05 - ERROR - stderr - 72%|███████▏ | 16233/22434 [14:44:25<4:26:11, 2.58s/it] +2025-02-06 00:52:08 - ERROR - stderr - 72%|███████▏ | 16234/22434 [14:44:28<4:24:48, 2.56s/it] +2025-02-06 00:52:08 - ERROR - stderr - +2025-02-06 00:52:08 - ERROR - stderr - +2025-02-06 00:52:08 - INFO - stdout - {'loss': 0.3721, 'grad_norm': 1.3394873142242432, 'learning_rate': 3.745770813450824e-06, 'epoch': 2.17} +2025-02-06 00:52:08 - ERROR - stderr - 72%|███████▏ | 16234/22434 [14:44:28<4:24:48, 2.56s/it] +2025-02-06 00:52:11 - ERROR - stderr - 72%|███████▏ | 16235/22434 [14:44:30<4:26:11, 2.58s/it] +2025-02-06 00:52:11 - ERROR - stderr - +2025-02-06 00:52:11 - ERROR - stderr - +2025-02-06 00:52:11 - INFO - stdout - {'loss': 0.4034, 'grad_norm': 1.6297065019607544, 'learning_rate': 3.7446443434220894e-06, 'epoch': 2.17} +2025-02-06 00:52:11 - ERROR - stderr - 72%|███████▏ | 16235/22434 [14:44:30<4:26:11, 2.58s/it] +2025-02-06 00:52:13 - ERROR - stderr - 72%|███████▏ | 16236/22434 [14:44:33<4:24:11, 2.56s/it] +2025-02-06 00:52:13 - ERROR - stderr - +2025-02-06 00:52:13 - ERROR - stderr - +2025-02-06 00:52:13 - INFO - stdout - {'loss': 0.3924, 'grad_norm': 1.4894779920578003, 'learning_rate': 3.7435180037802575e-06, 'epoch': 2.17} +2025-02-06 00:52:13 - ERROR - stderr - 72%|███████▏ | 16236/22434 [14:44:33<4:24:11, 2.56s/it] +2025-02-06 00:52:16 - ERROR - stderr - 72%|███████▏ | 16237/22434 [14:44:35<4:24:26, 2.56s/it] +2025-02-06 00:52:16 - ERROR - stderr - +2025-02-06 00:52:16 - ERROR - stderr - +2025-02-06 00:52:16 - INFO - stdout - {'loss': 0.396, 'grad_norm': 1.478907823562622, 'learning_rate': 3.7423917945488075e-06, 'epoch': 2.17} +2025-02-06 00:52:16 - ERROR - stderr - 72%|███████▏ | 16237/22434 [14:44:35<4:24:26, 2.56s/it] +2025-02-06 00:52:18 - ERROR - stderr - 72%|███████▏ | 16238/22434 [14:44:38<4:22:21, 2.54s/it] +2025-02-06 00:52:18 
- ERROR - stderr - +2025-02-06 00:52:18 - ERROR - stderr - +2025-02-06 00:52:18 - INFO - stdout - {'loss': 0.4439, 'grad_norm': 1.7704520225524902, 'learning_rate': 3.7412657157512144e-06, 'epoch': 2.17} +2025-02-06 00:52:18 - ERROR - stderr - 72%|███████▏ | 16238/22434 [14:44:38<4:22:21, 2.54s/it] +2025-02-06 00:52:21 - ERROR - stderr - 72%|███████▏ | 16239/22434 [14:44:40<4:19:42, 2.52s/it] +2025-02-06 00:52:21 - ERROR - stderr - +2025-02-06 00:52:21 - ERROR - stderr - +2025-02-06 00:52:21 - INFO - stdout - {'loss': 0.3659, 'grad_norm': 1.4853854179382324, 'learning_rate': 3.740139767410943e-06, 'epoch': 2.17} +2025-02-06 00:52:21 - ERROR - stderr - 72%|███████▏ | 16239/22434 [14:44:40<4:19:42, 2.52s/it] +2025-02-06 00:52:23 - ERROR - stderr - 72%|███████▏ | 16240/22434 [14:44:43<4:17:57, 2.50s/it] +2025-02-06 00:52:23 - ERROR - stderr - +2025-02-06 00:52:23 - ERROR - stderr - +2025-02-06 00:52:23 - INFO - stdout - {'loss': 0.3781, 'grad_norm': 1.542970061302185, 'learning_rate': 3.739013949551471e-06, 'epoch': 2.17} +2025-02-06 00:52:23 - ERROR - stderr - 72%|███████▏ | 16240/22434 [14:44:43<4:17:57, 2.50s/it] +2025-02-06 00:52:25 - ERROR - stderr - 72%|███████▏ | 16241/22434 [14:44:45<4:16:59, 2.49s/it] +2025-02-06 00:52:26 - ERROR - stderr - +2025-02-06 00:52:26 - ERROR - stderr - +2025-02-06 00:52:26 - INFO - stdout - {'loss': 0.3425, 'grad_norm': 1.3319801092147827, 'learning_rate': 3.737888262196262e-06, 'epoch': 2.17} +2025-02-06 00:52:26 - ERROR - stderr - 72%|███████▏ | 16241/22434 [14:44:45<4:16:59, 2.49s/it] +2025-02-06 00:52:28 - ERROR - stderr - 72%|███████▏ | 16242/22434 [14:44:48<4:17:02, 2.49s/it] +2025-02-06 00:52:28 - ERROR - stderr - +2025-02-06 00:52:28 - ERROR - stderr - +2025-02-06 00:52:28 - INFO - stdout - {'loss': 0.3873, 'grad_norm': 1.6695350408554077, 'learning_rate': 3.7367627053687796e-06, 'epoch': 2.17} +2025-02-06 00:52:28 - ERROR - stderr - 72%|███████▏ | 16242/22434 [14:44:48<4:17:02, 2.49s/it] +2025-02-06 00:52:31 - ERROR - 
stderr - 72%|███████▏ | 16243/22434 [14:44:50<4:20:17, 2.52s/it] +2025-02-06 00:52:31 - ERROR - stderr - +2025-02-06 00:52:31 - ERROR - stderr - +2025-02-06 00:52:31 - INFO - stdout - {'loss': 0.4105, 'grad_norm': 1.4810701608657837, 'learning_rate': 3.735637279092489e-06, 'epoch': 2.17} +2025-02-06 00:52:31 - ERROR - stderr - 72%|███████▏ | 16243/22434 [14:44:50<4:20:17, 2.52s/it] +2025-02-06 00:52:33 - ERROR - stderr - 72%|███████▏ | 16244/22434 [14:44:53<4:23:06, 2.55s/it] +2025-02-06 00:52:33 - ERROR - stderr - +2025-02-06 00:52:33 - ERROR - stderr - +2025-02-06 00:52:33 - INFO - stdout - {'loss': 0.4257, 'grad_norm': 1.6019119024276733, 'learning_rate': 3.7345119833908383e-06, 'epoch': 2.17} +2025-02-06 00:52:33 - ERROR - stderr - 72%|███████▏ | 16244/22434 [14:44:53<4:23:06, 2.55s/it] +2025-02-06 00:52:36 - ERROR - stderr - 72%|███████▏ | 16245/22434 [14:44:56<4:24:56, 2.57s/it] +2025-02-06 00:52:36 - ERROR - stderr - +2025-02-06 00:52:36 - ERROR - stderr - +2025-02-06 00:52:36 - INFO - stdout - {'loss': 0.3971, 'grad_norm': 1.475785493850708, 'learning_rate': 3.7333868182872966e-06, 'epoch': 2.17} +2025-02-06 00:52:36 - ERROR - stderr - 72%|███████▏ | 16245/22434 [14:44:56<4:24:56, 2.57s/it] +2025-02-06 00:52:38 - ERROR - stderr - 72%|███████▏ | 16246/22434 [14:44:58<4:21:31, 2.54s/it] +2025-02-06 00:52:38 - ERROR - stderr - +2025-02-06 00:52:38 - ERROR - stderr - +2025-02-06 00:52:38 - INFO - stdout - {'loss': 0.3769, 'grad_norm': 1.5395865440368652, 'learning_rate': 3.7322617838053066e-06, 'epoch': 2.17} +2025-02-06 00:52:38 - ERROR - stderr - 72%|███████▏ | 16246/22434 [14:44:58<4:21:31, 2.54s/it] +2025-02-06 00:52:41 - ERROR - stderr - 72%|███████▏ | 16247/22434 [14:45:01<4:20:18, 2.52s/it] +2025-02-06 00:52:41 - ERROR - stderr - +2025-02-06 00:52:41 - ERROR - stderr - +2025-02-06 00:52:41 - INFO - stdout - {'loss': 0.3619, 'grad_norm': 1.5096434354782104, 'learning_rate': 3.731136879968319e-06, 'epoch': 2.17} +2025-02-06 00:52:41 - ERROR - stderr - 
72%|███████▏ | 16247/22434 [14:45:01<4:20:18, 2.52s/it] +2025-02-06 00:52:43 - ERROR - stderr - 72%|███████▏ | 16248/22434 [14:45:03<4:19:45, 2.52s/it] +2025-02-06 00:52:43 - ERROR - stderr - +2025-02-06 00:52:43 - ERROR - stderr - +2025-02-06 00:52:43 - INFO - stdout - {'loss': 0.3615, 'grad_norm': 1.428429365158081, 'learning_rate': 3.7300121067997917e-06, 'epoch': 2.17} +2025-02-06 00:52:43 - ERROR - stderr - 72%|███████▏ | 16248/22434 [14:45:03<4:19:45, 2.52s/it] +2025-02-06 00:52:46 - ERROR - stderr - 72%|███████▏ | 16249/22434 [14:45:05<4:17:44, 2.50s/it] +2025-02-06 00:52:46 - ERROR - stderr - +2025-02-06 00:52:46 - ERROR - stderr - +2025-02-06 00:52:46 - INFO - stdout - {'loss': 0.3681, 'grad_norm': 1.5296647548675537, 'learning_rate': 3.7288874643231543e-06, 'epoch': 2.17} +2025-02-06 00:52:46 - ERROR - stderr - 72%|███████▏ | 16249/22434 [14:45:06<4:17:44, 2.50s/it] +2025-02-06 00:52:48 - ERROR - stderr - 72%|███████▏ | 16250/22434 [14:45:08<4:20:47, 2.53s/it] +2025-02-06 00:52:48 - ERROR - stderr - +2025-02-06 00:52:48 - ERROR - stderr - +2025-02-06 00:52:48 - INFO - stdout - {'loss': 0.4133, 'grad_norm': 1.5431780815124512, 'learning_rate': 3.7277629525618653e-06, 'epoch': 2.17} +2025-02-06 00:52:48 - ERROR - stderr - 72%|███████▏ | 16250/22434 [14:45:08<4:20:47, 2.53s/it] +2025-02-06 00:52:51 - ERROR - stderr - 72%|███████▏ | 16251/22434 [14:45:11<4:20:25, 2.53s/it] +2025-02-06 00:52:51 - ERROR - stderr - +2025-02-06 00:52:51 - ERROR - stderr - +2025-02-06 00:52:51 - INFO - stdout - {'loss': 0.3667, 'grad_norm': 1.2569829225540161, 'learning_rate': 3.7266385715393515e-06, 'epoch': 2.17} +2025-02-06 00:52:51 - ERROR - stderr - 72%|███████▏ | 16251/22434 [14:45:11<4:20:25, 2.53s/it] +2025-02-06 00:52:53 - ERROR - stderr - 72%|███████▏ | 16252/22434 [14:45:13<4:19:10, 2.52s/it] +2025-02-06 00:52:53 - ERROR - stderr - +2025-02-06 00:52:53 - ERROR - stderr - +2025-02-06 00:52:53 - INFO - stdout - {'loss': 0.3418, 'grad_norm': 1.3211477994918823, 
'learning_rate': 3.7255143212790536e-06, 'epoch': 2.17} +2025-02-06 00:52:53 - ERROR - stderr - 72%|███████▏ | 16252/22434 [14:45:13<4:19:10, 2.52s/it] +2025-02-06 00:52:56 - ERROR - stderr - 72%|███████▏ | 16253/22434 [14:45:16<4:20:56, 2.53s/it] +2025-02-06 00:52:56 - ERROR - stderr - +2025-02-06 00:52:56 - ERROR - stderr - +2025-02-06 00:52:56 - INFO - stdout - {'loss': 0.4489, 'grad_norm': 1.6510108709335327, 'learning_rate': 3.7243902018044054e-06, 'epoch': 2.17} +2025-02-06 00:52:56 - ERROR - stderr - 72%|███████▏ | 16253/22434 [14:45:16<4:20:56, 2.53s/it] +2025-02-06 00:52:58 - ERROR - stderr - 72%|███████▏ | 16254/22434 [14:45:18<4:19:23, 2.52s/it] +2025-02-06 00:52:58 - ERROR - stderr - +2025-02-06 00:52:58 - ERROR - stderr - +2025-02-06 00:52:58 - INFO - stdout - {'loss': 0.3869, 'grad_norm': 1.542637825012207, 'learning_rate': 3.7232662131388386e-06, 'epoch': 2.17} +2025-02-06 00:52:58 - ERROR - stderr - 72%|███████▏ | 16254/22434 [14:45:18<4:19:23, 2.52s/it] +2025-02-06 00:53:01 - ERROR - stderr - 72%|███████▏ | 16255/22434 [14:45:21<4:20:59, 2.53s/it] +2025-02-06 00:53:01 - ERROR - stderr - +2025-02-06 00:53:01 - ERROR - stderr - +2025-02-06 00:53:01 - INFO - stdout - {'loss': 0.4046, 'grad_norm': 1.6173707246780396, 'learning_rate': 3.7221423553057814e-06, 'epoch': 2.17} +2025-02-06 00:53:01 - ERROR - stderr - 72%|███████▏ | 16255/22434 [14:45:21<4:20:59, 2.53s/it] +2025-02-06 00:53:04 - ERROR - stderr - 72%|███████▏ | 16256/22434 [14:45:23<4:26:28, 2.59s/it] +2025-02-06 00:53:04 - ERROR - stderr - +2025-02-06 00:53:04 - ERROR - stderr - +2025-02-06 00:53:04 - INFO - stdout - {'loss': 0.3857, 'grad_norm': 1.5031521320343018, 'learning_rate': 3.7210186283286596e-06, 'epoch': 2.17} +2025-02-06 00:53:04 - ERROR - stderr - 72%|███████▏ | 16256/22434 [14:45:23<4:26:28, 2.59s/it] +2025-02-06 00:53:06 - ERROR - stderr - 72%|███████▏ | 16257/22434 [14:45:26<4:23:20, 2.56s/it] +2025-02-06 00:53:06 - ERROR - stderr - +2025-02-06 00:53:06 - ERROR - stderr - 
+2025-02-06 00:53:06 - INFO - stdout - {'loss': 0.3814, 'grad_norm': 1.5874103307724, 'learning_rate': 3.7198950322308956e-06, 'epoch': 2.17} +2025-02-06 00:53:06 - ERROR - stderr - 72%|███████▏ | 16257/22434 [14:45:26<4:23:20, 2.56s/it] +2025-02-06 00:53:09 - ERROR - stderr - 72%|███████▏ | 16258/22434 [14:45:28<4:20:12, 2.53s/it] +2025-02-06 00:53:09 - ERROR - stderr - +2025-02-06 00:53:09 - ERROR - stderr - +2025-02-06 00:53:09 - INFO - stdout - {'loss': 0.3822, 'grad_norm': 1.6553881168365479, 'learning_rate': 3.7187715670359114e-06, 'epoch': 2.17} +2025-02-06 00:53:09 - ERROR - stderr - 72%|███████▏ | 16258/22434 [14:45:28<4:20:12, 2.53s/it] +2025-02-06 00:53:11 - ERROR - stderr - 72%|███████▏ | 16259/22434 [14:45:31<4:18:25, 2.51s/it] +2025-02-06 00:53:11 - ERROR - stderr - +2025-02-06 00:53:11 - ERROR - stderr - +2025-02-06 00:53:11 - INFO - stdout - {'loss': 0.3603, 'grad_norm': 1.4726183414459229, 'learning_rate': 3.7176482327671224e-06, 'epoch': 2.17} +2025-02-06 00:53:11 - ERROR - stderr - 72%|███████▏ | 16259/22434 [14:45:31<4:18:25, 2.51s/it] +2025-02-06 00:53:14 - ERROR - stderr - 72%|███████▏ | 16260/22434 [14:45:34<4:24:22, 2.57s/it] +2025-02-06 00:53:14 - ERROR - stderr - +2025-02-06 00:53:14 - ERROR - stderr - +2025-02-06 00:53:14 - INFO - stdout - {'loss': 0.3632, 'grad_norm': 1.512617826461792, 'learning_rate': 3.716525029447945e-06, 'epoch': 2.17} +2025-02-06 00:53:14 - ERROR - stderr - 72%|███████▏ | 16260/22434 [14:45:34<4:24:22, 2.57s/it] +2025-02-06 00:53:16 - ERROR - stderr - 72%|███████▏ | 16261/22434 [14:45:36<4:26:06, 2.59s/it] +2025-02-06 00:53:16 - ERROR - stderr - +2025-02-06 00:53:16 - ERROR - stderr - +2025-02-06 00:53:16 - INFO - stdout - {'loss': 0.3585, 'grad_norm': 1.5675050020217896, 'learning_rate': 3.7154019571017907e-06, 'epoch': 2.17} +2025-02-06 00:53:16 - ERROR - stderr - 72%|███████▏ | 16261/22434 [14:45:36<4:26:06, 2.59s/it] +2025-02-06 00:53:19 - ERROR - stderr - 72%|███████▏ | 16262/22434 [14:45:39<4:23:39, 2.56s/it] 
+2025-02-06 00:53:19 - ERROR - stderr - +2025-02-06 00:53:19 - ERROR - stderr - +2025-02-06 00:53:19 - INFO - stdout - {'loss': 0.3647, 'grad_norm': 1.5499221086502075, 'learning_rate': 3.7142790157520725e-06, 'epoch': 2.17} +2025-02-06 00:53:19 - ERROR - stderr - 72%|███████▏ | 16262/22434 [14:45:39<4:23:39, 2.56s/it] +2025-02-06 00:53:21 - ERROR - stderr - 72%|███████▏ | 16263/22434 [14:45:41<4:20:50, 2.54s/it] +2025-02-06 00:53:21 - ERROR - stderr - +2025-02-06 00:53:21 - ERROR - stderr - +2025-02-06 00:53:21 - INFO - stdout - {'loss': 0.3605, 'grad_norm': 1.5489075183868408, 'learning_rate': 3.713156205422186e-06, 'epoch': 2.17} +2025-02-06 00:53:21 - ERROR - stderr - 72%|███████▏ | 16263/22434 [14:45:41<4:20:50, 2.54s/it] +2025-02-06 00:53:24 - ERROR - stderr - 72%|███████▏ | 16264/22434 [14:45:44<4:21:44, 2.55s/it] +2025-02-06 00:53:24 - ERROR - stderr - +2025-02-06 00:53:24 - ERROR - stderr - +2025-02-06 00:53:24 - INFO - stdout - {'loss': 0.3447, 'grad_norm': 1.3635002374649048, 'learning_rate': 3.71203352613555e-06, 'epoch': 2.17} +2025-02-06 00:53:24 - ERROR - stderr - 72%|███████▏ | 16264/22434 [14:45:44<4:21:44, 2.55s/it] +2025-02-06 00:53:26 - ERROR - stderr - 73%|███████▎ | 16265/22434 [14:45:46<4:20:52, 2.54s/it] +2025-02-06 00:53:27 - ERROR - stderr - +2025-02-06 00:53:27 - ERROR - stderr - +2025-02-06 00:53:27 - INFO - stdout - {'loss': 0.375, 'grad_norm': 1.5593349933624268, 'learning_rate': 3.7109109779155505e-06, 'epoch': 2.18} +2025-02-06 00:53:27 - ERROR - stderr - 73%|███████▎ | 16265/22434 [14:45:46<4:20:52, 2.54s/it] +2025-02-06 00:53:29 - ERROR - stderr - 73%|███████▎ | 16266/22434 [14:45:49<4:22:06, 2.55s/it] +2025-02-06 00:53:29 - ERROR - stderr - +2025-02-06 00:53:29 - ERROR - stderr - +2025-02-06 00:53:29 - INFO - stdout - {'loss': 0.3502, 'grad_norm': 1.4230684041976929, 'learning_rate': 3.7097885607855977e-06, 'epoch': 2.18} +2025-02-06 00:53:29 - ERROR - stderr - 73%|███████▎ | 16266/22434 [14:45:49<4:22:06, 2.55s/it] +2025-02-06 
00:53:32 - ERROR - stderr - 73%|███████▎ | 16267/22434 [14:45:51<4:21:18, 2.54s/it] +2025-02-06 00:53:32 - ERROR - stderr - +2025-02-06 00:53:32 - ERROR - stderr - +2025-02-06 00:53:32 - INFO - stdout - {'loss': 0.3866, 'grad_norm': 1.5980064868927002, 'learning_rate': 3.7086662747690873e-06, 'epoch': 2.18} +2025-02-06 00:53:32 - ERROR - stderr - 73%|███████▎ | 16267/22434 [14:45:51<4:21:18, 2.54s/it] +2025-02-06 00:53:34 - ERROR - stderr - 73%|███████▎ | 16268/22434 [14:45:54<4:19:40, 2.53s/it] +2025-02-06 00:53:34 - ERROR - stderr - +2025-02-06 00:53:34 - ERROR - stderr - +2025-02-06 00:53:34 - INFO - stdout - {'loss': 0.3973, 'grad_norm': 1.4956570863723755, 'learning_rate': 3.7075441198894004e-06, 'epoch': 2.18} +2025-02-06 00:53:34 - ERROR - stderr - 73%|███████▎ | 16268/22434 [14:45:54<4:19:40, 2.53s/it] +2025-02-06 00:53:37 - ERROR - stderr - 73%|███████▎ | 16269/22434 [14:45:56<4:19:14, 2.52s/it] +2025-02-06 00:53:37 - ERROR - stderr - +2025-02-06 00:53:37 - ERROR - stderr - +2025-02-06 00:53:37 - INFO - stdout - {'loss': 0.3796, 'grad_norm': 1.4807687997817993, 'learning_rate': 3.7064220961699427e-06, 'epoch': 2.18} +2025-02-06 00:53:37 - ERROR - stderr - 73%|███████▎ | 16269/22434 [14:45:56<4:19:14, 2.52s/it] +2025-02-06 00:53:39 - ERROR - stderr - 73%|███████▎ | 16270/22434 [14:45:59<4:30:12, 2.63s/it] +2025-02-06 00:53:40 - ERROR - stderr - +2025-02-06 00:53:40 - ERROR - stderr - +2025-02-06 00:53:40 - INFO - stdout - {'loss': 0.3885, 'grad_norm': 1.5164546966552734, 'learning_rate': 3.70530020363409e-06, 'epoch': 2.18} +2025-02-06 00:53:40 - ERROR - stderr - 73%|███████▎ | 16270/22434 [14:45:59<4:30:12, 2.63s/it] +2025-02-06 00:53:42 - ERROR - stderr - 73%|███████▎ | 16271/22434 [14:46:02<4:26:51, 2.60s/it] +2025-02-06 00:53:42 - ERROR - stderr - +2025-02-06 00:53:42 - ERROR - stderr - +2025-02-06 00:53:42 - INFO - stdout - {'loss': 0.363, 'grad_norm': 1.6316159963607788, 'learning_rate': 3.704178442305231e-06, 'epoch': 2.18} +2025-02-06 00:53:42 - 
ERROR - stderr - 73%|███████▎ | 16271/22434 [14:46:02<4:26:51, 2.60s/it] +2025-02-06 00:53:45 - ERROR - stderr - 73%|███████▎ | 16272/22434 [14:46:04<4:29:23, 2.62s/it] +2025-02-06 00:53:45 - ERROR - stderr - +2025-02-06 00:53:45 - ERROR - stderr - +2025-02-06 00:53:45 - INFO - stdout - {'loss': 0.3904, 'grad_norm': 1.6896103620529175, 'learning_rate': 3.703056812206748e-06, 'epoch': 2.18} +2025-02-06 00:53:45 - ERROR - stderr - 73%|███████▎ | 16272/22434 [14:46:05<4:29:23, 2.62s/it] +2025-02-06 00:53:47 - ERROR - stderr - 73%|███████▎ | 16273/22434 [14:46:07<4:26:15, 2.59s/it] +2025-02-06 00:53:47 - ERROR - stderr - +2025-02-06 00:53:47 - ERROR - stderr - +2025-02-06 00:53:47 - INFO - stdout - {'loss': 0.4389, 'grad_norm': 1.6158279180526733, 'learning_rate': 3.7019353133620208e-06, 'epoch': 2.18} +2025-02-06 00:53:47 - ERROR - stderr - 73%|███████▎ | 16273/22434 [14:46:07<4:26:15, 2.59s/it] +2025-02-06 00:53:50 - ERROR - stderr - 73%|███████▎ | 16274/22434 [14:46:10<4:28:38, 2.62s/it] +2025-02-06 00:53:50 - ERROR - stderr - +2025-02-06 00:53:50 - ERROR - stderr - +2025-02-06 00:53:50 - INFO - stdout - {'loss': 0.4283, 'grad_norm': 1.7111865282058716, 'learning_rate': 3.700813945794425e-06, 'epoch': 2.18} +2025-02-06 00:53:50 - ERROR - stderr - 73%|███████▎ | 16274/22434 [14:46:10<4:28:38, 2.62s/it] +2025-02-06 00:53:52 - ERROR - stderr - 73%|███████▎ | 16275/22434 [14:46:12<4:23:15, 2.56s/it] +2025-02-06 00:53:52 - ERROR - stderr - +2025-02-06 00:53:52 - ERROR - stderr - +2025-02-06 00:53:52 - INFO - stdout - {'loss': 0.3559, 'grad_norm': 1.4147570133209229, 'learning_rate': 3.699692709527335e-06, 'epoch': 2.18} +2025-02-06 00:53:52 - ERROR - stderr - 73%|███████▎ | 16275/22434 [14:46:12<4:23:15, 2.56s/it] +2025-02-06 00:53:55 - ERROR - stderr - 73%|███████▎ | 16276/22434 [14:46:15<4:22:52, 2.56s/it] +2025-02-06 00:53:55 - ERROR - stderr - +2025-02-06 00:53:55 - ERROR - stderr - +2025-02-06 00:53:55 - INFO - stdout - {'loss': 0.342, 'grad_norm': 
1.4546010494232178, 'learning_rate': 3.6985716045841223e-06, 'epoch': 2.18} +2025-02-06 00:53:55 - ERROR - stderr - 73%|███████▎ | 16276/22434 [14:46:15<4:22:52, 2.56s/it] +2025-02-06 00:53:57 - ERROR - stderr - 73%|███████▎ | 16277/22434 [14:46:17<4:22:08, 2.55s/it] +2025-02-06 00:53:57 - ERROR - stderr - +2025-02-06 00:53:57 - ERROR - stderr - +2025-02-06 00:53:57 - INFO - stdout - {'loss': 0.3446, 'grad_norm': 1.431753396987915, 'learning_rate': 3.697450630988154e-06, 'epoch': 2.18} +2025-02-06 00:53:57 - ERROR - stderr - 73%|███████▎ | 16277/22434 [14:46:17<4:22:08, 2.55s/it] +2025-02-06 00:54:00 - ERROR - stderr - 73%|███████▎ | 16278/22434 [14:46:20<4:22:22, 2.56s/it] +2025-02-06 00:54:00 - ERROR - stderr - +2025-02-06 00:54:00 - ERROR - stderr - +2025-02-06 00:54:00 - INFO - stdout - {'loss': 0.3381, 'grad_norm': 1.2828805446624756, 'learning_rate': 3.6963297887627957e-06, 'epoch': 2.18} +2025-02-06 00:54:00 - ERROR - stderr - 73%|███████▎ | 16278/22434 [14:46:20<4:22:22, 2.56s/it] +2025-02-06 00:54:03 - ERROR - stderr - 73%|███████▎ | 16279/22434 [14:46:22<4:21:13, 2.55s/it] +2025-02-06 00:54:03 - ERROR - stderr - +2025-02-06 00:54:03 - ERROR - stderr - +2025-02-06 00:54:03 - INFO - stdout - {'loss': 0.3991, 'grad_norm': 1.6131726503372192, 'learning_rate': 3.695209077931412e-06, 'epoch': 2.18} +2025-02-06 00:54:03 - ERROR - stderr - 73%|███████▎ | 16279/22434 [14:46:22<4:21:13, 2.55s/it] +2025-02-06 00:54:05 - ERROR - stderr - 73%|███████▎ | 16280/22434 [14:46:25<4:30:27, 2.64s/it] +2025-02-06 00:54:05 - ERROR - stderr - +2025-02-06 00:54:05 - ERROR - stderr - +2025-02-06 00:54:05 - INFO - stdout - {'loss': 0.411, 'grad_norm': 1.7734261751174927, 'learning_rate': 3.694088498517362e-06, 'epoch': 2.18} +2025-02-06 00:54:05 - ERROR - stderr - 73%|███████▎ | 16280/22434 [14:46:25<4:30:27, 2.64s/it] +2025-02-06 00:54:08 - ERROR - stderr - 73%|███████▎ | 16281/22434 [14:46:28<4:25:56, 2.59s/it] +2025-02-06 00:54:08 - ERROR - stderr - +2025-02-06 00:54:08 - ERROR 
- stderr - +2025-02-06 00:54:08 - INFO - stdout - {'loss': 0.3809, 'grad_norm': 1.5452563762664795, 'learning_rate': 3.6929680505440035e-06, 'epoch': 2.18} +2025-02-06 00:54:08 - ERROR - stderr - 73%|███████▎ | 16281/22434 [14:46:28<4:25:56, 2.59s/it] +2025-02-06 00:54:10 - ERROR - stderr - 73%|███████▎ | 16282/22434 [14:46:30<4:23:27, 2.57s/it] +2025-02-06 00:54:10 - ERROR - stderr - +2025-02-06 00:54:10 - ERROR - stderr - +2025-02-06 00:54:10 - INFO - stdout - {'loss': 0.3521, 'grad_norm': 1.6094892024993896, 'learning_rate': 3.6918477340346903e-06, 'epoch': 2.18} +2025-02-06 00:54:10 - ERROR - stderr - 73%|███████▎ | 16282/22434 [14:46:30<4:23:27, 2.57s/it] +2025-02-06 00:54:13 - ERROR - stderr - 73%|███████▎ | 16283/22434 [14:46:33<4:21:32, 2.55s/it] +2025-02-06 00:54:13 - ERROR - stderr - +2025-02-06 00:54:13 - ERROR - stderr - +2025-02-06 00:54:13 - INFO - stdout - {'loss': 0.3219, 'grad_norm': 1.2754590511322021, 'learning_rate': 3.690727549012778e-06, 'epoch': 2.18} +2025-02-06 00:54:13 - ERROR - stderr - 73%|███████▎ | 16283/22434 [14:46:33<4:21:32, 2.55s/it] +2025-02-06 00:54:15 - ERROR - stderr - 73%|███████▎ | 16284/22434 [14:46:35<4:17:28, 2.51s/it] +2025-02-06 00:54:15 - ERROR - stderr - +2025-02-06 00:54:15 - ERROR - stderr - +2025-02-06 00:54:15 - INFO - stdout - {'loss': 0.36, 'grad_norm': 1.4750466346740723, 'learning_rate': 3.689607495501606e-06, 'epoch': 2.18} +2025-02-06 00:54:15 - ERROR - stderr - 73%|███████▎ | 16284/22434 [14:46:35<4:17:28, 2.51s/it] +2025-02-06 00:54:18 - ERROR - stderr - 73%|███████▎ | 16285/22434 [14:46:38<4:33:54, 2.67s/it] +2025-02-06 00:54:18 - ERROR - stderr - +2025-02-06 00:54:18 - ERROR - stderr - +2025-02-06 00:54:18 - INFO - stdout - {'loss': 0.3958, 'grad_norm': 1.6493852138519287, 'learning_rate': 3.6884875735245307e-06, 'epoch': 2.18} +2025-02-06 00:54:18 - ERROR - stderr - 73%|███████▎ | 16285/22434 [14:46:38<4:33:54, 2.67s/it] +2025-02-06 00:54:21 - ERROR - stderr - 73%|███████▎ | 16286/22434 
[14:46:41<4:29:33, 2.63s/it] +2025-02-06 00:54:21 - ERROR - stderr - +2025-02-06 00:54:21 - ERROR - stderr - +2025-02-06 00:54:21 - INFO - stdout - {'loss': 0.4202, 'grad_norm': 1.732311725616455, 'learning_rate': 3.687367783104896e-06, 'epoch': 2.18} +2025-02-06 00:54:21 - ERROR - stderr - 73%|███████▎ | 16286/22434 [14:46:41<4:29:33, 2.63s/it] +2025-02-06 00:54:23 - ERROR - stderr - 73%|███████▎ | 16287/22434 [14:46:43<4:24:38, 2.58s/it] +2025-02-06 00:54:23 - ERROR - stderr - +2025-02-06 00:54:23 - ERROR - stderr - +2025-02-06 00:54:23 - INFO - stdout - {'loss': 0.3675, 'grad_norm': 1.4855539798736572, 'learning_rate': 3.686248124266033e-06, 'epoch': 2.18} +2025-02-06 00:54:23 - ERROR - stderr - 73%|███████▎ | 16287/22434 [14:46:43<4:24:38, 2.58s/it] +2025-02-06 00:54:26 - ERROR - stderr - 73%|███████▎ | 16288/22434 [14:46:46<4:22:13, 2.56s/it] +2025-02-06 00:54:26 - ERROR - stderr - +2025-02-06 00:54:26 - ERROR - stderr - +2025-02-06 00:54:26 - INFO - stdout - {'loss': 0.3733, 'grad_norm': 1.514873743057251, 'learning_rate': 3.6851285970312923e-06, 'epoch': 2.18} +2025-02-06 00:54:26 - ERROR - stderr - 73%|███████▎ | 16288/22434 [14:46:46<4:22:13, 2.56s/it] +2025-02-06 00:54:28 - ERROR - stderr - 73%|███████▎ | 16289/22434 [14:46:48<4:23:32, 2.57s/it] +2025-02-06 00:54:28 - ERROR - stderr - +2025-02-06 00:54:28 - ERROR - stderr - +2025-02-06 00:54:28 - INFO - stdout - {'loss': 0.4069, 'grad_norm': 1.529582142829895, 'learning_rate': 3.6840092014239968e-06, 'epoch': 2.18} +2025-02-06 00:54:28 - ERROR - stderr - 73%|███████▎ | 16289/22434 [14:46:48<4:23:32, 2.57s/it] +2025-02-06 00:54:31 - ERROR - stderr - 73%|███████▎ | 16290/22434 [14:46:51<4:24:28, 2.58s/it] +2025-02-06 00:54:31 - ERROR - stderr - +2025-02-06 00:54:31 - ERROR - stderr - +2025-02-06 00:54:31 - INFO - stdout - {'loss': 0.3649, 'grad_norm': 1.5395400524139404, 'learning_rate': 3.6828899374674933e-06, 'epoch': 2.18} +2025-02-06 00:54:31 - ERROR - stderr - 73%|███████▎ | 16290/22434 
[14:46:51<4:24:28, 2.58s/it] +2025-02-06 00:54:34 - ERROR - stderr - 73%|███████▎ | 16291/22434 [14:46:53<4:21:37, 2.56s/it] +2025-02-06 00:54:34 - ERROR - stderr - +2025-02-06 00:54:34 - ERROR - stderr - +2025-02-06 00:54:34 - INFO - stdout - {'loss': 0.3781, 'grad_norm': 1.4626466035842896, 'learning_rate': 3.6817708051851e-06, 'epoch': 2.18} +2025-02-06 00:54:34 - ERROR - stderr - 73%|███████▎ | 16291/22434 [14:46:53<4:21:37, 2.56s/it] +2025-02-06 00:54:36 - ERROR - stderr - 73%|███████▎ | 16292/22434 [14:46:56<4:21:22, 2.55s/it] +2025-02-06 00:54:36 - ERROR - stderr - +2025-02-06 00:54:36 - ERROR - stderr - +2025-02-06 00:54:36 - INFO - stdout - {'loss': 0.3779, 'grad_norm': 1.6334456205368042, 'learning_rate': 3.680651804600148e-06, 'epoch': 2.18} +2025-02-06 00:54:36 - ERROR - stderr - 73%|███████▎ | 16292/22434 [14:46:56<4:21:22, 2.55s/it] +2025-02-06 00:54:39 - ERROR - stderr - 73%|███████▎ | 16293/22434 [14:46:58<4:20:03, 2.54s/it] +2025-02-06 00:54:39 - ERROR - stderr - +2025-02-06 00:54:39 - ERROR - stderr - +2025-02-06 00:54:39 - INFO - stdout - {'loss': 0.3763, 'grad_norm': 1.5947253704071045, 'learning_rate': 3.679532935735962e-06, 'epoch': 2.18} +2025-02-06 00:54:39 - ERROR - stderr - 73%|███████▎ | 16293/22434 [14:46:58<4:20:03, 2.54s/it] +2025-02-06 00:54:41 - ERROR - stderr - 73%|███████▎ | 16294/22434 [14:47:01<4:17:48, 2.52s/it] +2025-02-06 00:54:41 - ERROR - stderr - +2025-02-06 00:54:41 - ERROR - stderr - +2025-02-06 00:54:41 - INFO - stdout - {'loss': 0.3776, 'grad_norm': 1.5445412397384644, 'learning_rate': 3.6784141986158652e-06, 'epoch': 2.18} +2025-02-06 00:54:41 - ERROR - stderr - 73%|███████▎ | 16294/22434 [14:47:01<4:17:48, 2.52s/it] +2025-02-06 00:54:44 - ERROR - stderr - 73%|███████▎ | 16295/22434 [14:47:03<4:16:25, 2.51s/it] +2025-02-06 00:54:44 - ERROR - stderr - +2025-02-06 00:54:44 - ERROR - stderr - +2025-02-06 00:54:44 - INFO - stdout - {'loss': 0.3955, 'grad_norm': 1.4923593997955322, 'learning_rate': 3.6772955932631748e-06, 
'epoch': 2.18} +2025-02-06 00:54:44 - ERROR - stderr - 73%|███████▎ | 16295/22434 [14:47:03<4:16:25, 2.51s/it] +2025-02-06 00:54:46 - ERROR - stderr - 73%|███████▎ | 16296/22434 [14:47:06<4:17:43, 2.52s/it] +2025-02-06 00:54:46 - ERROR - stderr - +2025-02-06 00:54:46 - ERROR - stderr - +2025-02-06 00:54:46 - INFO - stdout - {'loss': 0.4, 'grad_norm': 1.6122815608978271, 'learning_rate': 3.6761771197012075e-06, 'epoch': 2.18} +2025-02-06 00:54:46 - ERROR - stderr - 73%|███████▎ | 16296/22434 [14:47:06<4:17:43, 2.52s/it] +2025-02-06 00:54:49 - ERROR - stderr - 73%|███████▎ | 16297/22434 [14:47:08<4:15:59, 2.50s/it] +2025-02-06 00:54:49 - ERROR - stderr - +2025-02-06 00:54:49 - ERROR - stderr - +2025-02-06 00:54:49 - INFO - stdout - {'loss': 0.3627, 'grad_norm': 1.423462986946106, 'learning_rate': 3.6750587779532763e-06, 'epoch': 2.18} +2025-02-06 00:54:49 - ERROR - stderr - 73%|███████▎ | 16297/22434 [14:47:08<4:15:59, 2.50s/it] +2025-02-06 00:54:51 - ERROR - stderr - 73%|███████▎ | 16298/22434 [14:47:11<4:16:41, 2.51s/it] +2025-02-06 00:54:51 - ERROR - stderr - +2025-02-06 00:54:51 - ERROR - stderr - +2025-02-06 00:54:51 - INFO - stdout - {'loss': 0.3556, 'grad_norm': 1.3249611854553223, 'learning_rate': 3.6739405680426933e-06, 'epoch': 2.18} +2025-02-06 00:54:51 - ERROR - stderr - 73%|███████▎ | 16298/22434 [14:47:11<4:16:41, 2.51s/it] +2025-02-06 00:54:54 - ERROR - stderr - 73%|███████▎ | 16299/22434 [14:47:13<4:16:51, 2.51s/it] +2025-02-06 00:54:54 - ERROR - stderr - +2025-02-06 00:54:54 - ERROR - stderr - +2025-02-06 00:54:54 - INFO - stdout - {'loss': 0.4328, 'grad_norm': 1.5497000217437744, 'learning_rate': 3.6728224899927658e-06, 'epoch': 2.18} +2025-02-06 00:54:54 - ERROR - stderr - 73%|███████▎ | 16299/22434 [14:47:13<4:16:51, 2.51s/it] +2025-02-06 00:54:56 - ERROR - stderr - 73%|███████▎ | 16300/22434 [14:47:16<4:22:35, 2.57s/it] +2025-02-06 00:54:56 - ERROR - stderr - +2025-02-06 00:54:56 - ERROR - stderr - +2025-02-06 00:54:56 - INFO - stdout - {'loss': 
0.3453, 'grad_norm': 1.2785531282424927, 'learning_rate': 3.6717045438267986e-06, 'epoch': 2.18} +2025-02-06 00:54:56 - ERROR - stderr - 73%|███████▎ | 16300/22434 [14:47:16<4:22:35, 2.57s/it] +2025-02-06 00:54:59 - ERROR - stderr - 73%|███████▎ | 16301/22434 [14:47:19<4:19:21, 2.54s/it] +2025-02-06 00:54:59 - ERROR - stderr - +2025-02-06 00:54:59 - ERROR - stderr - +2025-02-06 00:54:59 - INFO - stdout - {'loss': 0.3468, 'grad_norm': 1.3955689668655396, 'learning_rate': 3.6705867295680954e-06, 'epoch': 2.18} +2025-02-06 00:54:59 - ERROR - stderr - 73%|███████▎ | 16301/22434 [14:47:19<4:19:21, 2.54s/it] +2025-02-06 00:55:01 - ERROR - stderr - 73%|███████▎ | 16302/22434 [14:47:21<4:24:11, 2.59s/it] +2025-02-06 00:55:02 - ERROR - stderr - +2025-02-06 00:55:02 - ERROR - stderr - +2025-02-06 00:55:02 - INFO - stdout - {'loss': 0.413, 'grad_norm': 1.4439014196395874, 'learning_rate': 3.6694690472399575e-06, 'epoch': 2.18} +2025-02-06 00:55:02 - ERROR - stderr - 73%|███████▎ | 16302/22434 [14:47:21<4:24:11, 2.59s/it] +2025-02-06 00:55:04 - ERROR - stderr - 73%|███████▎ | 16303/22434 [14:47:24<4:25:05, 2.59s/it] +2025-02-06 00:55:04 - ERROR - stderr - +2025-02-06 00:55:04 - ERROR - stderr - +2025-02-06 00:55:04 - INFO - stdout - {'loss': 0.3763, 'grad_norm': 1.4956010580062866, 'learning_rate': 3.668351496865674e-06, 'epoch': 2.18} +2025-02-06 00:55:04 - ERROR - stderr - 73%|███████▎ | 16303/22434 [14:47:24<4:25:05, 2.59s/it] +2025-02-06 00:55:07 - ERROR - stderr - 73%|███████▎ | 16304/22434 [14:47:26<4:23:03, 2.57s/it] +2025-02-06 00:55:07 - ERROR - stderr - +2025-02-06 00:55:07 - ERROR - stderr - +2025-02-06 00:55:07 - INFO - stdout - {'loss': 0.3798, 'grad_norm': 1.4027496576309204, 'learning_rate': 3.6672340784685477e-06, 'epoch': 2.18} +2025-02-06 00:55:07 - ERROR - stderr - 73%|███████▎ | 16304/22434 [14:47:26<4:23:03, 2.57s/it] +2025-02-06 00:55:09 - ERROR - stderr - 73%|███████▎ | 16305/22434 [14:47:29<4:20:28, 2.55s/it] +2025-02-06 00:55:09 - ERROR - stderr - 
+2025-02-06 00:55:09 - ERROR - stderr - +2025-02-06 00:55:09 - INFO - stdout - {'loss': 0.3297, 'grad_norm': 1.3655768632888794, 'learning_rate': 3.6661167920718664e-06, 'epoch': 2.18} +2025-02-06 00:55:09 - ERROR - stderr - 73%|███████▎ | 16305/22434 [14:47:29<4:20:28, 2.55s/it] +2025-02-06 00:55:12 - ERROR - stderr - 73%|███████▎ | 16306/22434 [14:47:32<4:23:29, 2.58s/it] +2025-02-06 00:55:12 - ERROR - stderr - +2025-02-06 00:55:12 - ERROR - stderr - +2025-02-06 00:55:12 - INFO - stdout - {'loss': 0.4158, 'grad_norm': 1.4732098579406738, 'learning_rate': 3.6649996376989215e-06, 'epoch': 2.18} +2025-02-06 00:55:12 - ERROR - stderr - 73%|███████▎ | 16306/22434 [14:47:32<4:23:29, 2.58s/it] +2025-02-06 00:55:14 - ERROR - stderr - 73%|███████▎ | 16307/22434 [14:47:34<4:22:27, 2.57s/it] +2025-02-06 00:55:14 - ERROR - stderr - +2025-02-06 00:55:14 - ERROR - stderr - +2025-02-06 00:55:14 - INFO - stdout - {'loss': 0.4175, 'grad_norm': 1.5541791915893555, 'learning_rate': 3.663882615372999e-06, 'epoch': 2.18} +2025-02-06 00:55:14 - ERROR - stderr - 73%|███████▎ | 16307/22434 [14:47:34<4:22:27, 2.57s/it] +2025-02-06 00:55:17 - ERROR - stderr - 73%|███████▎ | 16308/22434 [14:47:37<4:21:03, 2.56s/it] +2025-02-06 00:55:17 - ERROR - stderr - +2025-02-06 00:55:17 - ERROR - stderr - +2025-02-06 00:55:17 - INFO - stdout - {'loss': 0.3975, 'grad_norm': 1.58713960647583, 'learning_rate': 3.662765725117374e-06, 'epoch': 2.18} +2025-02-06 00:55:17 - ERROR - stderr - 73%|███████▎ | 16308/22434 [14:47:37<4:21:03, 2.56s/it] +2025-02-06 00:55:19 - ERROR - stderr - 73%|███████▎ | 16309/22434 [14:47:39<4:16:40, 2.51s/it] +2025-02-06 00:55:19 - ERROR - stderr - +2025-02-06 00:55:19 - ERROR - stderr - +2025-02-06 00:55:19 - INFO - stdout - {'loss': 0.3502, 'grad_norm': 1.4791971445083618, 'learning_rate': 3.661648966955341e-06, 'epoch': 2.18} +2025-02-06 00:55:19 - ERROR - stderr - 73%|███████▎ | 16309/22434 [14:47:39<4:16:40, 2.51s/it] +2025-02-06 00:55:22 - ERROR - stderr - 73%|███████▎ | 
16310/22434 [14:47:41<4:14:50, 2.50s/it] +2025-02-06 00:55:22 - ERROR - stderr - +2025-02-06 00:55:22 - ERROR - stderr - +2025-02-06 00:55:22 - INFO - stdout - {'loss': 0.3589, 'grad_norm': 1.5218161344528198, 'learning_rate': 3.6605323409101656e-06, 'epoch': 2.18} +2025-02-06 00:55:22 - ERROR - stderr - 73%|███████▎ | 16310/22434 [14:47:42<4:14:50, 2.50s/it] +2025-02-06 00:55:24 - ERROR - stderr - 73%|███████▎ | 16311/22434 [14:47:44<4:15:37, 2.50s/it] +2025-02-06 00:55:24 - ERROR - stderr - +2025-02-06 00:55:24 - ERROR - stderr - +2025-02-06 00:55:24 - INFO - stdout - {'loss': 0.3706, 'grad_norm': 1.2320057153701782, 'learning_rate': 3.659415847005129e-06, 'epoch': 2.18} +2025-02-06 00:55:24 - ERROR - stderr - 73%|███████▎ | 16311/22434 [14:47:44<4:15:37, 2.50s/it] +2025-02-06 00:55:27 - ERROR - stderr - 73%|███████▎ | 16312/22434 [14:47:46<4:15:04, 2.50s/it] +2025-02-06 00:55:27 - ERROR - stderr - +2025-02-06 00:55:27 - ERROR - stderr - +2025-02-06 00:55:27 - INFO - stdout - {'loss': 0.3479, 'grad_norm': 1.3070148229599, 'learning_rate': 3.6582994852635e-06, 'epoch': 2.18} +2025-02-06 00:55:27 - ERROR - stderr - 73%|███████▎ | 16312/22434 [14:47:47<4:15:04, 2.50s/it] +2025-02-06 00:55:29 - ERROR - stderr - 73%|███████▎ | 16313/22434 [14:47:49<4:18:58, 2.54s/it] +2025-02-06 00:55:29 - ERROR - stderr - +2025-02-06 00:55:29 - ERROR - stderr - +2025-02-06 00:55:29 - INFO - stdout - {'loss': 0.3816, 'grad_norm': 1.6972564458847046, 'learning_rate': 3.6571832557085475e-06, 'epoch': 2.18} +2025-02-06 00:55:29 - ERROR - stderr - 73%|███████▎ | 16313/22434 [14:47:49<4:18:58, 2.54s/it] +2025-02-06 00:55:32 - ERROR - stderr - 73%|███████▎ | 16314/22434 [14:47:52<4:16:21, 2.51s/it] +2025-02-06 00:55:32 - ERROR - stderr - +2025-02-06 00:55:32 - ERROR - stderr - +2025-02-06 00:55:32 - INFO - stdout - {'loss': 0.3351, 'grad_norm': 1.5121333599090576, 'learning_rate': 3.6560671583635467e-06, 'epoch': 2.18} +2025-02-06 00:55:32 - ERROR - stderr - 73%|███████▎ | 16314/22434 
[14:47:52<4:16:21, 2.51s/it] +2025-02-06 00:55:34 - ERROR - stderr - 73%|███████▎ | 16315/22434 [14:47:54<4:17:34, 2.53s/it] +2025-02-06 00:55:34 - ERROR - stderr - +2025-02-06 00:55:34 - ERROR - stderr - +2025-02-06 00:55:34 - INFO - stdout - {'loss': 0.3833, 'grad_norm': 1.4832582473754883, 'learning_rate': 3.654951193251752e-06, 'epoch': 2.18} +2025-02-06 00:55:34 - ERROR - stderr - 73%|███████▎ | 16315/22434 [14:47:54<4:17:34, 2.53s/it] +2025-02-06 00:55:37 - ERROR - stderr - 73%|███████▎ | 16316/22434 [14:47:57<4:16:22, 2.51s/it] +2025-02-06 00:55:37 - ERROR - stderr - +2025-02-06 00:55:37 - ERROR - stderr - +2025-02-06 00:55:37 - INFO - stdout - {'loss': 0.3672, 'grad_norm': 1.519750714302063, 'learning_rate': 3.6538353603964292e-06, 'epoch': 2.18} +2025-02-06 00:55:37 - ERROR - stderr - 73%|███████▎ | 16316/22434 [14:47:57<4:16:22, 2.51s/it] +2025-02-06 00:55:39 - ERROR - stderr - 73%|███████▎ | 16317/22434 [14:47:59<4:14:21, 2.49s/it] +2025-02-06 00:55:39 - ERROR - stderr - +2025-02-06 00:55:39 - ERROR - stderr - +2025-02-06 00:55:39 - INFO - stdout - {'loss': 0.393, 'grad_norm': 1.718870997428894, 'learning_rate': 3.6527196598208347e-06, 'epoch': 2.18} +2025-02-06 00:55:39 - ERROR - stderr - 73%|███████▎ | 16317/22434 [14:47:59<4:14:21, 2.49s/it] +2025-02-06 00:55:42 - ERROR - stderr - 73%|███████▎ | 16318/22434 [14:48:02<4:13:51, 2.49s/it] +2025-02-06 00:55:42 - ERROR - stderr - +2025-02-06 00:55:42 - ERROR - stderr - +2025-02-06 00:55:42 - INFO - stdout - {'loss': 0.3749, 'grad_norm': 1.33301842212677, 'learning_rate': 3.6516040915482264e-06, 'epoch': 2.18} +2025-02-06 00:55:42 - ERROR - stderr - 73%|███████▎ | 16318/22434 [14:48:02<4:13:51, 2.49s/it] +2025-02-06 00:55:44 - ERROR - stderr - 73%|███████▎ | 16319/22434 [14:48:04<4:11:00, 2.46s/it] +2025-02-06 00:55:44 - ERROR - stderr - +2025-02-06 00:55:44 - ERROR - stderr - +2025-02-06 00:55:44 - INFO - stdout - {'loss': 0.3346, 'grad_norm': 1.322077989578247, 'learning_rate': 3.6504886556018547e-06, 
'epoch': 2.18} +2025-02-06 00:55:44 - ERROR - stderr - 73%|███████▎ | 16319/22434 [14:48:04<4:11:00, 2.46s/it] +2025-02-06 00:55:47 - ERROR - stderr - 73%|███████▎ | 16320/22434 [14:48:06<4:10:41, 2.46s/it] +2025-02-06 00:55:47 - ERROR - stderr - +2025-02-06 00:55:47 - ERROR - stderr - +2025-02-06 00:55:47 - INFO - stdout - {'loss': 0.3292, 'grad_norm': 1.4060719013214111, 'learning_rate': 3.649373352004972e-06, 'epoch': 2.18} +2025-02-06 00:55:47 - ERROR - stderr - 73%|███████▎ | 16320/22434 [14:48:06<4:10:41, 2.46s/it] +2025-02-06 00:55:49 - ERROR - stderr - 73%|███████▎ | 16321/22434 [14:48:09<4:11:07, 2.46s/it] +2025-02-06 00:55:49 - ERROR - stderr - +2025-02-06 00:55:49 - ERROR - stderr - +2025-02-06 00:55:49 - INFO - stdout - {'loss': 0.345, 'grad_norm': 1.501711368560791, 'learning_rate': 3.648258180780825e-06, 'epoch': 2.18} +2025-02-06 00:55:49 - ERROR - stderr - 73%|███████▎ | 16321/22434 [14:48:09<4:11:07, 2.46s/it] +2025-02-06 00:55:52 - ERROR - stderr - 73%|███████▎ | 16322/22434 [14:48:11<4:12:11, 2.48s/it] +2025-02-06 00:55:52 - ERROR - stderr - +2025-02-06 00:55:52 - ERROR - stderr - +2025-02-06 00:55:52 - INFO - stdout - {'loss': 0.3703, 'grad_norm': 1.4843274354934692, 'learning_rate': 3.647143141952657e-06, 'epoch': 2.18} +2025-02-06 00:55:52 - ERROR - stderr - 73%|███████▎ | 16322/22434 [14:48:11<4:12:11, 2.48s/it] +2025-02-06 00:55:54 - ERROR - stderr - 73%|███████▎ | 16323/22434 [14:48:14<4:12:43, 2.48s/it] +2025-02-06 00:55:54 - ERROR - stderr - +2025-02-06 00:55:54 - ERROR - stderr - +2025-02-06 00:55:54 - INFO - stdout - {'loss': 0.3869, 'grad_norm': 1.4530706405639648, 'learning_rate': 3.6460282355437125e-06, 'epoch': 2.18} +2025-02-06 00:55:54 - ERROR - stderr - 73%|███████▎ | 16323/22434 [14:48:14<4:12:43, 2.48s/it] +2025-02-06 00:55:57 - ERROR - stderr - 73%|███████▎ | 16324/22434 [14:48:16<4:14:15, 2.50s/it] +2025-02-06 00:55:57 - ERROR - stderr - +2025-02-06 00:55:57 - ERROR - stderr - +2025-02-06 00:55:57 - INFO - stdout - {'loss':
0.3771, 'grad_norm': 1.5931384563446045, 'learning_rate': 3.6449134615772284e-06, 'epoch': 2.18} +2025-02-06 00:55:57 - ERROR - stderr - 73%|███████▎ | 16324/22434 [14:48:16<4:14:15, 2.50s/it] +2025-02-06 00:55:59 - ERROR - stderr - 73%|███████▎ | 16325/22434 [14:48:19<4:15:51, 2.51s/it] +2025-02-06 00:55:59 - ERROR - stderr - +2025-02-06 00:55:59 - ERROR - stderr - +2025-02-06 00:55:59 - INFO - stdout - {'loss': 0.3932, 'grad_norm': 1.6148937940597534, 'learning_rate': 3.6437988200764427e-06, 'epoch': 2.18} +2025-02-06 00:55:59 - ERROR - stderr - 73%|███████▎ | 16325/22434 [14:48:19<4:15:51, 2.51s/it] +2025-02-06 00:56:02 - ERROR - stderr - 73%|███████▎ | 16326/22434 [14:48:21<4:14:30, 2.50s/it] +2025-02-06 00:56:02 - ERROR - stderr - +2025-02-06 00:56:02 - ERROR - stderr - +2025-02-06 00:56:02 - INFO - stdout - {'loss': 0.4149, 'grad_norm': 1.5775597095489502, 'learning_rate': 3.642684311064588e-06, 'epoch': 2.18} +2025-02-06 00:56:02 - ERROR - stderr - 73%|███████▎ | 16326/22434 [14:48:21<4:14:30, 2.50s/it] +2025-02-06 00:56:04 - ERROR - stderr - 73%|███████▎ | 16327/22434 [14:48:24<4:14:24, 2.50s/it] +2025-02-06 00:56:04 - ERROR - stderr - +2025-02-06 00:56:04 - ERROR - stderr - +2025-02-06 00:56:04 - INFO - stdout - {'loss': 0.3497, 'grad_norm': 1.7508631944656372, 'learning_rate': 3.641569934564896e-06, 'epoch': 2.18} +2025-02-06 00:56:04 - ERROR - stderr - 73%|███████▎ | 16327/22434 [14:48:24<4:14:24, 2.50s/it] +2025-02-06 00:56:07 - ERROR - stderr - 73%|███████▎ | 16328/22434 [14:48:26<4:13:04, 2.49s/it] +2025-02-06 00:56:07 - ERROR - stderr - +2025-02-06 00:56:07 - ERROR - stderr - +2025-02-06 00:56:07 - INFO - stdout - {'loss': 0.3549, 'grad_norm': 1.4091308116912842, 'learning_rate': 3.6404556906005973e-06, 'epoch': 2.18} +2025-02-06 00:56:07 - ERROR - stderr - 73%|███████▎ | 16328/22434 [14:48:26<4:13:04, 2.49s/it] +2025-02-06 00:56:09 - ERROR - stderr - 73%|███████▎ | 16329/22434 [14:48:29<4:12:45, 2.48s/it] +2025-02-06 00:56:09 - ERROR - stderr - 
+2025-02-06 00:56:09 - ERROR - stderr - +2025-02-06 00:56:09 - INFO - stdout - {'loss': 0.378, 'grad_norm': 1.5087890625, 'learning_rate': 3.6393415791949084e-06, 'epoch': 2.18} +2025-02-06 00:56:09 - ERROR - stderr - 73%|███████▎ | 16329/22434 [14:48:29<4:12:45, 2.48s/it] +2025-02-06 00:56:12 - ERROR - stderr - 73%|███████▎ | 16330/22434 [14:48:31<4:13:15, 2.49s/it] +2025-02-06 00:56:12 - ERROR - stderr - +2025-02-06 00:56:12 - ERROR - stderr - +2025-02-06 00:56:12 - INFO - stdout - {'loss': 0.3606, 'grad_norm': 1.6268309354782104, 'learning_rate': 3.638227600371064e-06, 'epoch': 2.18} +2025-02-06 00:56:12 - ERROR - stderr - 73%|███████▎ | 16330/22434 [14:48:31<4:13:15, 2.49s/it] +2025-02-06 00:56:14 - ERROR - stderr - 73%|███████▎ | 16331/22434 [14:48:34<4:15:31, 2.51s/it] +2025-02-06 00:56:14 - ERROR - stderr - +2025-02-06 00:56:14 - ERROR - stderr - +2025-02-06 00:56:14 - INFO - stdout - {'loss': 0.3669, 'grad_norm': 1.5602035522460938, 'learning_rate': 3.6371137541522737e-06, 'epoch': 2.18} +2025-02-06 00:56:14 - ERROR - stderr - 73%|███████▎ | 16331/22434 [14:48:34<4:15:31, 2.51s/it] +2025-02-06 00:56:17 - ERROR - stderr - 73%|███████▎ | 16332/22434 [14:48:36<4:16:12, 2.52s/it] +2025-02-06 00:56:17 - ERROR - stderr - +2025-02-06 00:56:17 - ERROR - stderr - +2025-02-06 00:56:17 - INFO - stdout - {'loss': 0.3662, 'grad_norm': 1.4617384672164917, 'learning_rate': 3.6360000405617558e-06, 'epoch': 2.18} +2025-02-06 00:56:17 - ERROR - stderr - 73%|███████▎ | 16332/22434 [14:48:36<4:16:12, 2.52s/it] +2025-02-06 00:56:19 - ERROR - stderr - 73%|███████▎ | 16333/22434 [14:48:39<4:14:55, 2.51s/it] +2025-02-06 00:56:19 - ERROR - stderr - +2025-02-06 00:56:19 - ERROR - stderr - +2025-02-06 00:56:19 - INFO - stdout - {'loss': 0.4133, 'grad_norm': 1.5935776233673096, 'learning_rate': 3.634886459622734e-06, 'epoch': 2.18} +2025-02-06 00:56:19 - ERROR - stderr - 73%|███████▎ | 16333/22434 [14:48:39<4:14:55, 2.51s/it] +2025-02-06 00:56:22 - ERROR - stderr - 73%|███████▎ | 
16334/22434 [14:48:42<4:17:01, 2.53s/it] +2025-02-06 00:56:22 - ERROR - stderr - +2025-02-06 00:56:22 - ERROR - stderr - +2025-02-06 00:56:22 - INFO - stdout - {'loss': 0.3497, 'grad_norm': 1.5108132362365723, 'learning_rate': 3.6337730113584058e-06, 'epoch': 2.18} +2025-02-06 00:56:22 - ERROR - stderr - 73%|███████▎ | 16334/22434 [14:48:42<4:17:01, 2.53s/it] +2025-02-06 00:56:24 - ERROR - stderr - 73%|███████▎ | 16335/22434 [14:48:44<4:21:47, 2.58s/it] +2025-02-06 00:56:24 - ERROR - stderr - +2025-02-06 00:56:24 - ERROR - stderr - +2025-02-06 00:56:24 - INFO - stdout - {'loss': 0.4285, 'grad_norm': 1.5082652568817139, 'learning_rate': 3.6326596957919957e-06, 'epoch': 2.18} +2025-02-06 00:56:24 - ERROR - stderr - 73%|███████▎ | 16335/22434 [14:48:44<4:21:47, 2.58s/it] +2025-02-06 00:56:27 - ERROR - stderr - 73%|███████▎ | 16336/22434 [14:48:47<4:29:14, 2.65s/it] +2025-02-06 00:56:27 - ERROR - stderr - +2025-02-06 00:56:27 - ERROR - stderr - +2025-02-06 00:56:27 - INFO - stdout - {'loss': 0.3981, 'grad_norm': 1.4768787622451782, 'learning_rate': 3.6315465129466966e-06, 'epoch': 2.18} +2025-02-06 00:56:27 - ERROR - stderr - 73%|███████▎ | 16336/22434 [14:48:47<4:29:14, 2.65s/it] +2025-02-06 00:56:30 - ERROR - stderr - 73%|███████▎ | 16337/22434 [14:48:50<4:24:37, 2.60s/it] +2025-02-06 00:56:30 - ERROR - stderr - +2025-02-06 00:56:30 - ERROR - stderr - +2025-02-06 00:56:30 - INFO - stdout - {'loss': 0.3607, 'grad_norm': 1.4968199729919434, 'learning_rate': 3.630433462845717e-06, 'epoch': 2.18} +2025-02-06 00:56:30 - ERROR - stderr - 73%|███████▎ | 16337/22434 [14:48:50<4:24:37, 2.60s/it] +2025-02-06 00:56:32 - ERROR - stderr - 73%|███████▎ | 16338/22434 [14:48:52<4:26:36, 2.62s/it] +2025-02-06 00:56:32 - ERROR - stderr - +2025-02-06 00:56:32 - ERROR - stderr - +2025-02-06 00:56:32 - INFO - stdout - {'loss': 0.3578, 'grad_norm': 1.2967222929000854, 'learning_rate': 3.629320545512257e-06, 'epoch': 2.18} +2025-02-06 00:56:32 - ERROR - stderr - 73%|███████▎ | 16338/22434 
[14:48:52<4:26:36, 2.62s/it] +2025-02-06 00:56:35 - ERROR - stderr - 73%|███████▎ | 16339/22434 [14:48:55<4:24:39, 2.61s/it] +2025-02-06 00:56:35 - ERROR - stderr - +2025-02-06 00:56:35 - ERROR - stderr - +2025-02-06 00:56:35 - INFO - stdout - {'loss': 0.3684, 'grad_norm': 1.4939343929290771, 'learning_rate': 3.628207760969513e-06, 'epoch': 2.18} +2025-02-06 00:56:35 - ERROR - stderr - 73%|███████▎ | 16339/22434 [14:48:55<4:24:39, 2.61s/it] +2025-02-06 00:56:38 - ERROR - stderr - 73%|███████▎ | 16340/22434 [14:48:57<4:23:57, 2.60s/it] +2025-02-06 00:56:38 - ERROR - stderr - +2025-02-06 00:56:38 - ERROR - stderr - +2025-02-06 00:56:38 - INFO - stdout - {'loss': 0.3444, 'grad_norm': 1.4628163576126099, 'learning_rate': 3.6270951092406826e-06, 'epoch': 2.19} +2025-02-06 00:56:38 - ERROR - stderr - 73%|███████▎ | 16340/22434 [14:48:57<4:23:57, 2.60s/it] +2025-02-06 00:56:40 - ERROR - stderr - 73%|███████▎ | 16341/22434 [14:49:00<4:20:12, 2.56s/it] +2025-02-06 00:56:40 - ERROR - stderr - +2025-02-06 00:56:40 - ERROR - stderr - +2025-02-06 00:56:40 - INFO - stdout - {'loss': 0.4487, 'grad_norm': 1.6034502983093262, 'learning_rate': 3.6259825903489567e-06, 'epoch': 2.19} +2025-02-06 00:56:40 - ERROR - stderr - 73%|███████▎ | 16341/22434 [14:49:00<4:20:12, 2.56s/it] +2025-02-06 00:56:43 - ERROR - stderr - 73%|███████▎ | 16342/22434 [14:49:02<4:17:39, 2.54s/it] +2025-02-06 00:56:43 - ERROR - stderr - +2025-02-06 00:56:43 - ERROR - stderr - +2025-02-06 00:56:43 - INFO - stdout - {'loss': 0.3484, 'grad_norm': 1.4875776767730713, 'learning_rate': 3.624870204317523e-06, 'epoch': 2.19} +2025-02-06 00:56:43 - ERROR - stderr - 73%|███████▎ | 16342/22434 [14:49:02<4:17:39, 2.54s/it] +2025-02-06 00:56:45 - ERROR - stderr - 73%|███████▎ | 16343/22434 [14:49:05<4:17:08, 2.53s/it] +2025-02-06 00:56:45 - ERROR - stderr - +2025-02-06 00:56:45 - ERROR - stderr - +2025-02-06 00:56:45 - INFO - stdout - {'loss': 0.4063, 'grad_norm': 1.6573498249053955, 'learning_rate': 
3.6237579511695696e-06, 'epoch': 2.19} +2025-02-06 00:56:45 - ERROR - stderr - 73%|███████▎ | 16343/22434 [14:49:05<4:17:08, 2.53s/it] +2025-02-06 00:56:45 - INFO - stdout - WARNING: tokenization mismatch: 156 vs. 174. (ignored) +2025-02-06 00:56:48 - ERROR - stderr - 73%|███████▎ | 16344/22434 [14:49:07<4:15:58, 2.52s/it] +2025-02-06 00:56:48 - ERROR - stderr - +2025-02-06 00:56:48 - ERROR - stderr - +2025-02-06 00:56:48 - INFO - stdout - {'loss': 0.4037, 'grad_norm': 1.6375935077667236, 'learning_rate': 3.6226458309282806e-06, 'epoch': 2.19} +2025-02-06 00:56:48 - ERROR - stderr - 73%|███████▎ | 16344/22434 [14:49:07<4:15:58, 2.52s/it] +2025-02-06 00:56:50 - ERROR - stderr - 73%|███████▎ | 16345/22434 [14:49:10<4:16:11, 2.52s/it] +2025-02-06 00:56:50 - ERROR - stderr - +2025-02-06 00:56:50 - ERROR - stderr - +2025-02-06 00:56:50 - INFO - stdout - {'loss': 0.3948, 'grad_norm': 1.424809217453003, 'learning_rate': 3.621533843616838e-06, 'epoch': 2.19} +2025-02-06 00:56:50 - ERROR - stderr - 73%|███████▎ | 16345/22434 [14:49:10<4:16:11, 2.52s/it] +2025-02-06 00:56:53 - ERROR - stderr - 73%|███████▎ | 16346/22434 [14:49:12<4:13:30, 2.50s/it] +2025-02-06 00:56:53 - ERROR - stderr - +2025-02-06 00:56:53 - ERROR - stderr - +2025-02-06 00:56:53 - INFO - stdout - {'loss': 0.3986, 'grad_norm': 1.5334001779556274, 'learning_rate': 3.620421989258418e-06, 'epoch': 2.19} +2025-02-06 00:56:53 - ERROR - stderr - 73%|███████▎ | 16346/22434 [14:49:12<4:13:30, 2.50s/it] +2025-02-06 00:56:55 - ERROR - stderr - 73%|███████▎ | 16347/22434 [14:49:15<4:10:36, 2.47s/it] +2025-02-06 00:56:55 - ERROR - stderr - +2025-02-06 00:56:55 - ERROR - stderr - +2025-02-06 00:56:55 - INFO - stdout - {'loss': 0.4054, 'grad_norm': 1.8028336763381958, 'learning_rate': 3.6193102678762004e-06, 'epoch': 2.19} +2025-02-06 00:56:55 - ERROR - stderr - 73%|███████▎ | 16347/22434 [14:49:15<4:10:36, 2.47s/it] +2025-02-06 00:56:58 - ERROR - stderr - 73%|███████▎ | 16348/22434 [14:49:18<4:21:25, 2.58s/it] 
+2025-02-06 00:56:58 - ERROR - stderr - +2025-02-06 00:56:58 - ERROR - stderr - +2025-02-06 00:56:58 - INFO - stdout - {'loss': 0.4064, 'grad_norm': 1.5019755363464355, 'learning_rate': 3.618198679493348e-06, 'epoch': 2.19} +2025-02-06 00:56:58 - ERROR - stderr - 73%|███████▎ | 16348/22434 [14:49:18<4:21:25, 2.58s/it] +2025-02-06 00:57:00 - ERROR - stderr - 73%|███████▎ | 16349/22434 [14:49:20<4:23:19, 2.60s/it] +2025-02-06 00:57:00 - ERROR - stderr - +2025-02-06 00:57:00 - ERROR - stderr - +2025-02-06 00:57:00 - INFO - stdout - {'loss': 0.3923, 'grad_norm': 1.7856967449188232, 'learning_rate': 3.61708722413304e-06, 'epoch': 2.19} +2025-02-06 00:57:00 - ERROR - stderr - 73%|███████▎ | 16349/22434 [14:49:20<4:23:19, 2.60s/it] +2025-02-06 00:57:03 - ERROR - stderr - 73%|███████▎ | 16350/22434 [14:49:23<4:19:37, 2.56s/it] +2025-02-06 00:57:03 - ERROR - stderr - +2025-02-06 00:57:03 - ERROR - stderr - +2025-02-06 00:57:03 - INFO - stdout - {'loss': 0.3521, 'grad_norm': 1.3765339851379395, 'learning_rate': 3.6159759018184417e-06, 'epoch': 2.19} +2025-02-06 00:57:03 - ERROR - stderr - 73%|███████▎ | 16350/22434 [14:49:23<4:19:37, 2.56s/it] +2025-02-06 00:57:06 - ERROR - stderr - 73%|███████▎ | 16351/22434 [14:49:25<4:28:15, 2.65s/it] +2025-02-06 00:57:06 - ERROR - stderr - +2025-02-06 00:57:06 - ERROR - stderr - +2025-02-06 00:57:06 - INFO - stdout - {'loss': 0.4091, 'grad_norm': 1.4637154340744019, 'learning_rate': 3.6148647125727165e-06, 'epoch': 2.19} +2025-02-06 00:57:06 - ERROR - stderr - 73%|███████▎ | 16351/22434 [14:49:26<4:28:15, 2.65s/it] +2025-02-06 00:57:08 - ERROR - stderr - 73%|███████▎ | 16352/22434 [14:49:28<4:23:22, 2.60s/it] +2025-02-06 00:57:08 - ERROR - stderr - +2025-02-06 00:57:08 - ERROR - stderr - +2025-02-06 00:57:08 - INFO - stdout - {'loss': 0.434, 'grad_norm': 1.6549135446548462, 'learning_rate': 3.6137536564190302e-06, 'epoch': 2.19} +2025-02-06 00:57:08 - ERROR - stderr - 73%|███████▎ | 16352/22434 [14:49:28<4:23:22, 2.60s/it] +2025-02-06 
00:57:11 - ERROR - stderr - 73%|███████▎ | 16353/22434 [14:49:31<4:22:37, 2.59s/it] +2025-02-06 00:57:11 - ERROR - stderr - +2025-02-06 00:57:11 - ERROR - stderr - +2025-02-06 00:57:11 - INFO - stdout - {'loss': 0.3412, 'grad_norm': 1.460453748703003, 'learning_rate': 3.6126427333805315e-06, 'epoch': 2.19} +2025-02-06 00:57:11 - ERROR - stderr - 73%|███████▎ | 16353/22434 [14:49:31<4:22:37, 2.59s/it] +2025-02-06 00:57:13 - ERROR - stderr - 73%|███████▎ | 16354/22434 [14:49:33<4:19:42, 2.56s/it] +2025-02-06 00:57:13 - ERROR - stderr - +2025-02-06 00:57:13 - ERROR - stderr - +2025-02-06 00:57:13 - INFO - stdout - {'loss': 0.3408, 'grad_norm': 1.5194929838180542, 'learning_rate': 3.6115319434803897e-06, 'epoch': 2.19} +2025-02-06 00:57:13 - ERROR - stderr - 73%|███████▎ | 16354/22434 [14:49:33<4:19:42, 2.56s/it] +2025-02-06 00:57:16 - ERROR - stderr - 73%|███████▎ | 16355/22434 [14:49:36<4:26:12, 2.63s/it] +2025-02-06 00:57:16 - ERROR - stderr - +2025-02-06 00:57:16 - ERROR - stderr - +2025-02-06 00:57:16 - INFO - stdout - {'loss': 0.396, 'grad_norm': 1.5000133514404297, 'learning_rate': 3.6104212867417477e-06, 'epoch': 2.19} +2025-02-06 00:57:16 - ERROR - stderr - 73%|███████▎ | 16355/22434 [14:49:36<4:26:12, 2.63s/it] +2025-02-06 00:57:19 - ERROR - stderr - 73%|███████▎ | 16356/22434 [14:49:38<4:23:37, 2.60s/it] +2025-02-06 00:57:19 - ERROR - stderr - +2025-02-06 00:57:19 - ERROR - stderr - +2025-02-06 00:57:19 - INFO - stdout - {'loss': 0.3954, 'grad_norm': 1.9321099519729614, 'learning_rate': 3.609310763187759e-06, 'epoch': 2.19} +2025-02-06 00:57:19 - ERROR - stderr - 73%|███████▎ | 16356/22434 [14:49:38<4:23:37, 2.60s/it] +2025-02-06 00:57:21 - ERROR - stderr - 73%|███████▎ | 16357/22434 [14:49:41<4:21:57, 2.59s/it] +2025-02-06 00:57:21 - ERROR - stderr - +2025-02-06 00:57:21 - ERROR - stderr - +2025-02-06 00:57:21 - INFO - stdout - {'loss': 0.3903, 'grad_norm': 1.4440604448318481, 'learning_rate': 3.608200372841574e-06, 'epoch': 2.19} +2025-02-06 00:57:21 - 
ERROR - stderr - 73%|███████▎ | 16357/22434 [14:49:41<4:21:57, 2.59s/it] +2025-02-06 00:57:24 - ERROR - stderr - 73%|███████▎ | 16358/22434 [14:49:43<4:19:50, 2.57s/it] +2025-02-06 00:57:24 - ERROR - stderr - +2025-02-06 00:57:24 - ERROR - stderr - +2025-02-06 00:57:24 - INFO - stdout - {'loss': 0.3618, 'grad_norm': 1.4573160409927368, 'learning_rate': 3.6070901157263303e-06, 'epoch': 2.19} +2025-02-06 00:57:24 - ERROR - stderr - 73%|███████▎ | 16358/22434 [14:49:43<4:19:50, 2.57s/it] +2025-02-06 00:57:26 - ERROR - stderr - 73%|███████▎ | 16359/22434 [14:49:46<4:15:59, 2.53s/it] +2025-02-06 00:57:26 - ERROR - stderr - +2025-02-06 00:57:26 - ERROR - stderr - +2025-02-06 00:57:26 - INFO - stdout - {'loss': 0.4484, 'grad_norm': 1.5695223808288574, 'learning_rate': 3.605979991865185e-06, 'epoch': 2.19} +2025-02-06 00:57:26 - ERROR - stderr - 73%|███████▎ | 16359/22434 [14:49:46<4:15:59, 2.53s/it] +2025-02-06 00:57:29 - ERROR - stderr - 73%|███████▎ | 16360/22434 [14:49:48<4:16:00, 2.53s/it] +2025-02-06 00:57:29 - ERROR - stderr - +2025-02-06 00:57:29 - ERROR - stderr - +2025-02-06 00:57:29 - INFO - stdout - {'loss': 0.3687, 'grad_norm': 1.3229132890701294, 'learning_rate': 3.604870001281263e-06, 'epoch': 2.19} +2025-02-06 00:57:29 - ERROR - stderr - 73%|███████▎ | 16360/22434 [14:49:48<4:16:00, 2.53s/it] +2025-02-06 00:57:31 - ERROR - stderr - 73%|███████▎ | 16361/22434 [14:49:51<4:16:14, 2.53s/it] +2025-02-06 00:57:31 - ERROR - stderr - +2025-02-06 00:57:31 - ERROR - stderr - +2025-02-06 00:57:31 - INFO - stdout - {'loss': 0.3786, 'grad_norm': 1.441109538078308, 'learning_rate': 3.603760143997708e-06, 'epoch': 2.19} +2025-02-06 00:57:31 - ERROR - stderr - 73%|███████▎ | 16361/22434 [14:49:51<4:16:14, 2.53s/it] +2025-02-06 00:57:34 - ERROR - stderr - 73%|███████▎ | 16362/22434 [14:49:53<4:16:39, 2.54s/it] +2025-02-06 00:57:34 - ERROR - stderr - +2025-02-06 00:57:34 - ERROR - stderr - +2025-02-06 00:57:34 - INFO - stdout - {'loss': 0.3344, 'grad_norm': 
1.3831636905670166, 'learning_rate': 3.602650420037651e-06, 'epoch': 2.19} +2025-02-06 00:57:34 - ERROR - stderr - 73%|███████▎ | 16362/22434 [14:49:54<4:16:39, 2.54s/it] +2025-02-06 00:57:36 - ERROR - stderr - 73%|███████▎ | 16363/22434 [14:49:56<4:17:29, 2.54s/it] +2025-02-06 00:57:36 - ERROR - stderr - +2025-02-06 00:57:36 - ERROR - stderr - +2025-02-06 00:57:36 - INFO - stdout - {'loss': 0.4099, 'grad_norm': 1.381396770477295, 'learning_rate': 3.601540829424225e-06, 'epoch': 2.19} +2025-02-06 00:57:36 - ERROR - stderr - 73%|███████▎ | 16363/22434 [14:49:56<4:17:29, 2.54s/it] +2025-02-06 00:57:39 - ERROR - stderr - 73%|███████▎ | 16364/22434 [14:49:59<4:17:57, 2.55s/it] +2025-02-06 00:57:39 - ERROR - stderr - +2025-02-06 00:57:39 - ERROR - stderr - +2025-02-06 00:57:39 - INFO - stdout - {'loss': 0.3303, 'grad_norm': 1.5415884256362915, 'learning_rate': 3.600431372180557e-06, 'epoch': 2.19} +2025-02-06 00:57:39 - ERROR - stderr - 73%|███████▎ | 16364/22434 [14:49:59<4:17:57, 2.55s/it] +2025-02-06 00:57:41 - ERROR - stderr - 73%|███████▎ | 16365/22434 [14:50:01<4:18:04, 2.55s/it] +2025-02-06 00:57:41 - ERROR - stderr - +2025-02-06 00:57:41 - ERROR - stderr - +2025-02-06 00:57:41 - INFO - stdout - {'loss': 0.3458, 'grad_norm': 1.4183281660079956, 'learning_rate': 3.599322048329774e-06, 'epoch': 2.19} +2025-02-06 00:57:41 - ERROR - stderr - 73%|███████▎ | 16365/22434 [14:50:01<4:18:04, 2.55s/it] +2025-02-06 00:57:44 - ERROR - stderr - 73%|███████▎ | 16366/22434 [14:50:04<4:16:24, 2.54s/it] +2025-02-06 00:57:44 - ERROR - stderr - +2025-02-06 00:57:44 - ERROR - stderr - +2025-02-06 00:57:44 - INFO - stdout - {'loss': 0.3977, 'grad_norm': 1.5219459533691406, 'learning_rate': 3.5982128578949984e-06, 'epoch': 2.19} +2025-02-06 00:57:44 - ERROR - stderr - 73%|███████▎ | 16366/22434 [14:50:04<4:16:24, 2.54s/it] +2025-02-06 00:57:46 - ERROR - stderr - 73%|███████▎ | 16367/22434 [14:50:06<4:14:37, 2.52s/it] +2025-02-06 00:57:46 - ERROR - stderr - +2025-02-06 00:57:46 - ERROR 
- stderr - +2025-02-06 00:57:46 - INFO - stdout - {'loss': 0.4177, 'grad_norm': 1.6582280397415161, 'learning_rate': 3.5971038008993496e-06, 'epoch': 2.19} +2025-02-06 00:57:46 - ERROR - stderr - 73%|███████▎ | 16367/22434 [14:50:06<4:14:37, 2.52s/it] +2025-02-06 00:57:49 - ERROR - stderr - 73%|███████▎ | 16368/22434 [14:50:09<4:12:27, 2.50s/it] +2025-02-06 00:57:49 - ERROR - stderr - +2025-02-06 00:57:49 - ERROR - stderr - +2025-02-06 00:57:49 - INFO - stdout - {'loss': 0.4223, 'grad_norm': 1.4540014266967773, 'learning_rate': 3.595994877365945e-06, 'epoch': 2.19} +2025-02-06 00:57:49 - ERROR - stderr - 73%|███████▎ | 16368/22434 [14:50:09<4:12:27, 2.50s/it] +2025-02-06 00:57:51 - ERROR - stderr - 73%|███████▎ | 16369/22434 [14:50:11<4:10:28, 2.48s/it] +2025-02-06 00:57:51 - ERROR - stderr - +2025-02-06 00:57:51 - ERROR - stderr - +2025-02-06 00:57:51 - INFO - stdout - {'loss': 0.3705, 'grad_norm': 1.694222331047058, 'learning_rate': 3.5948860873178992e-06, 'epoch': 2.19} +2025-02-06 00:57:51 - ERROR - stderr - 73%|███████▎ | 16369/22434 [14:50:11<4:10:28, 2.48s/it] +2025-02-06 00:57:54 - ERROR - stderr - 73%|███████▎ | 16370/22434 [14:50:14<4:12:33, 2.50s/it] +2025-02-06 00:57:54 - ERROR - stderr - +2025-02-06 00:57:54 - ERROR - stderr - +2025-02-06 00:57:54 - INFO - stdout - {'loss': 0.35, 'grad_norm': 1.4442307949066162, 'learning_rate': 3.5937774307783245e-06, 'epoch': 2.19} +2025-02-06 00:57:54 - ERROR - stderr - 73%|███████▎ | 16370/22434 [14:50:14<4:12:33, 2.50s/it] +2025-02-06 00:57:56 - ERROR - stderr - 73%|███████▎ | 16371/22434 [14:50:16<4:11:53, 2.49s/it] +2025-02-06 00:57:56 - ERROR - stderr - +2025-02-06 00:57:56 - ERROR - stderr - +2025-02-06 00:57:56 - INFO - stdout - {'loss': 0.4117, 'grad_norm': 1.4792312383651733, 'learning_rate': 3.5926689077703323e-06, 'epoch': 2.19} +2025-02-06 00:57:56 - ERROR - stderr - 73%|███████▎ | 16371/22434 [14:50:16<4:11:53, 2.49s/it] +2025-02-06 00:57:59 - ERROR - stderr - 73%|███████▎ | 16372/22434 
[14:50:19<4:18:40, 2.56s/it] +2025-02-06 00:57:59 - ERROR - stderr - +2025-02-06 00:57:59 - ERROR - stderr - +2025-02-06 00:57:59 - INFO - stdout - {'loss': 0.4331, 'grad_norm': 1.5308893918991089, 'learning_rate': 3.591560518317019e-06, 'epoch': 2.19} +2025-02-06 00:57:59 - ERROR - stderr - 73%|███████▎ | 16372/22434 [14:50:19<4:18:40, 2.56s/it] +2025-02-06 00:58:02 - ERROR - stderr - 73%|███████▎ | 16373/22434 [14:50:21<4:17:46, 2.55s/it] +2025-02-06 00:58:02 - ERROR - stderr - +2025-02-06 00:58:02 - ERROR - stderr - +2025-02-06 00:58:02 - INFO - stdout - {'loss': 0.4114, 'grad_norm': 1.6259902715682983, 'learning_rate': 3.5904522624415007e-06, 'epoch': 2.19} +2025-02-06 00:58:02 - ERROR - stderr - 73%|███████▎ | 16373/22434 [14:50:21<4:17:46, 2.55s/it] +2025-02-06 00:58:04 - ERROR - stderr - 73%|███████▎ | 16374/22434 [14:50:24<4:16:49, 2.54s/it] +2025-02-06 00:58:04 - ERROR - stderr - +2025-02-06 00:58:04 - ERROR - stderr - +2025-02-06 00:58:04 - INFO - stdout - {'loss': 0.3601, 'grad_norm': 1.8770668506622314, 'learning_rate': 3.5893441401668648e-06, 'epoch': 2.19} +2025-02-06 00:58:04 - ERROR - stderr - 73%|███████▎ | 16374/22434 [14:50:24<4:16:49, 2.54s/it] +2025-02-06 00:58:07 - ERROR - stderr - 73%|███████▎ | 16375/22434 [14:50:26<4:15:21, 2.53s/it] +2025-02-06 00:58:07 - ERROR - stderr - +2025-02-06 00:58:07 - ERROR - stderr - +2025-02-06 00:58:07 - INFO - stdout - {'loss': 0.3998, 'grad_norm': 1.5259467363357544, 'learning_rate': 3.5882361515162223e-06, 'epoch': 2.19} +2025-02-06 00:58:07 - ERROR - stderr - 73%|███████▎ | 16375/22434 [14:50:26<4:15:21, 2.53s/it] +2025-02-06 00:58:09 - ERROR - stderr - 73%|███████▎ | 16376/22434 [14:50:29<4:24:01, 2.61s/it] +2025-02-06 00:58:09 - ERROR - stderr - +2025-02-06 00:58:09 - ERROR - stderr - +2025-02-06 00:58:09 - INFO - stdout - {'loss': 0.3936, 'grad_norm': 1.4136395454406738, 'learning_rate': 3.5871282965126596e-06, 'epoch': 2.19} +2025-02-06 00:58:09 - ERROR - stderr - 73%|███████▎ | 16376/22434 
[14:50:29<4:24:01, 2.61s/it] +2025-02-06 00:58:12 - ERROR - stderr - 73%|███████▎ | 16377/22434 [14:50:32<4:21:47, 2.59s/it] +2025-02-06 00:58:12 - ERROR - stderr - +2025-02-06 00:58:12 - ERROR - stderr - +2025-02-06 00:58:12 - INFO - stdout - {'loss': 0.3995, 'grad_norm': 1.4673441648483276, 'learning_rate': 3.5860205751792676e-06, 'epoch': 2.19} +2025-02-06 00:58:12 - ERROR - stderr - 73%|███████▎ | 16377/22434 [14:50:32<4:21:47, 2.59s/it] +2025-02-06 00:58:14 - ERROR - stderr - 73%|███████▎ | 16378/22434 [14:50:34<4:18:32, 2.56s/it] +2025-02-06 00:58:14 - ERROR - stderr - +2025-02-06 00:58:14 - ERROR - stderr - +2025-02-06 00:58:14 - INFO - stdout - {'loss': 0.3774, 'grad_norm': 1.5149905681610107, 'learning_rate': 3.5849129875391453e-06, 'epoch': 2.19} +2025-02-06 00:58:14 - ERROR - stderr - 73%|███████▎ | 16378/22434 [14:50:34<4:18:32, 2.56s/it] +2025-02-06 00:58:17 - ERROR - stderr - 73%|███████▎ | 16379/22434 [14:50:37<4:15:46, 2.53s/it] +2025-02-06 00:58:17 - ERROR - stderr - +2025-02-06 00:58:17 - ERROR - stderr - +2025-02-06 00:58:17 - INFO - stdout - {'loss': 0.3829, 'grad_norm': 1.5662513971328735, 'learning_rate': 3.58380553361537e-06, 'epoch': 2.19} +2025-02-06 00:58:17 - ERROR - stderr - 73%|███████▎ | 16379/22434 [14:50:37<4:15:46, 2.53s/it] +2025-02-06 00:58:20 - ERROR - stderr - 73%|███████▎ | 16380/22434 [14:50:39<4:19:31, 2.57s/it] +2025-02-06 00:58:20 - ERROR - stderr - +2025-02-06 00:58:20 - ERROR - stderr - +2025-02-06 00:58:20 - INFO - stdout - {'loss': 0.3978, 'grad_norm': 1.5141760110855103, 'learning_rate': 3.5826982134310294e-06, 'epoch': 2.19} +2025-02-06 00:58:20 - ERROR - stderr - 73%|███████▎ | 16380/22434 [14:50:39<4:19:31, 2.57s/it] +2025-02-06 00:58:22 - ERROR - stderr - 73%|███████▎ | 16381/22434 [14:50:42<4:14:54, 2.53s/it] +2025-02-06 00:58:22 - ERROR - stderr - +2025-02-06 00:58:22 - ERROR - stderr - +2025-02-06 00:58:22 - INFO - stdout - {'loss': 0.3558, 'grad_norm': 1.3788865804672241, 'learning_rate': 
3.5815910270092025e-06, 'epoch': 2.19} +2025-02-06 00:58:22 - ERROR - stderr - 73%|███████▎ | 16381/22434 [14:50:42<4:14:54, 2.53s/it] +2025-02-06 00:58:24 - ERROR - stderr - 73%|███████▎ | 16382/22434 [14:50:44<4:13:01, 2.51s/it] +2025-02-06 00:58:24 - ERROR - stderr - +2025-02-06 00:58:24 - ERROR - stderr - +2025-02-06 00:58:24 - INFO - stdout - {'loss': 0.3533, 'grad_norm': 1.5146018266677856, 'learning_rate': 3.58048397437297e-06, 'epoch': 2.19} +2025-02-06 00:58:24 - ERROR - stderr - 73%|███████▎ | 16382/22434 [14:50:44<4:13:01, 2.51s/it] +2025-02-06 00:58:27 - ERROR - stderr - 73%|███████▎ | 16383/22434 [14:50:47<4:12:13, 2.50s/it] +2025-02-06 00:58:27 - ERROR - stderr - +2025-02-06 00:58:27 - ERROR - stderr - +2025-02-06 00:58:27 - INFO - stdout - {'loss': 0.3036, 'grad_norm': 1.3292251825332642, 'learning_rate': 3.5793770555454065e-06, 'epoch': 2.19} +2025-02-06 00:58:27 - ERROR - stderr - 73%|███████▎ | 16383/22434 [14:50:47<4:12:13, 2.50s/it] +2025-02-06 00:58:29 - ERROR - stderr - 73%|███████▎ | 16384/22434 [14:50:49<4:12:27, 2.50s/it] +2025-02-06 00:58:29 - ERROR - stderr - +2025-02-06 00:58:29 - ERROR - stderr - +2025-02-06 00:58:29 - INFO - stdout - {'loss': 0.3854, 'grad_norm': 1.4756734371185303, 'learning_rate': 3.578270270549583e-06, 'epoch': 2.19} +2025-02-06 00:58:29 - ERROR - stderr - 73%|███████▎ | 16384/22434 [14:50:49<4:12:27, 2.50s/it] +2025-02-06 00:58:32 - ERROR - stderr - 73%|███████▎ | 16385/22434 [14:50:52<4:13:13, 2.51s/it] +2025-02-06 00:58:32 - ERROR - stderr - +2025-02-06 00:58:32 - ERROR - stderr - +2025-02-06 00:58:32 - INFO - stdout - {'loss': 0.3658, 'grad_norm': 1.5888948440551758, 'learning_rate': 3.5771636194085724e-06, 'epoch': 2.19} +2025-02-06 00:58:32 - ERROR - stderr - 73%|███████▎ | 16385/22434 [14:50:52<4:13:13, 2.51s/it] +2025-02-06 00:58:35 - ERROR - stderr - 73%|███████▎ | 16386/22434 [14:50:54<4:20:39, 2.59s/it] +2025-02-06 00:58:35 - ERROR - stderr - +2025-02-06 00:58:35 - ERROR - stderr - +2025-02-06 00:58:35 - 
INFO - stdout - {'loss': 0.3858, 'grad_norm': 1.4345057010650635, 'learning_rate': 3.5760571021454393e-06, 'epoch': 2.19} +2025-02-06 00:58:35 - ERROR - stderr - 73%|███████▎ | 16386/22434 [14:50:54<4:20:39, 2.59s/it] +2025-02-06 00:58:37 - ERROR - stderr - 73%|███████▎ | 16387/22434 [14:50:57<4:15:44, 2.54s/it] +2025-02-06 00:58:37 - ERROR - stderr - +2025-02-06 00:58:37 - ERROR - stderr - +2025-02-06 00:58:37 - INFO - stdout - {'loss': 0.4428, 'grad_norm': 1.687107801437378, 'learning_rate': 3.5749507187832486e-06, 'epoch': 2.19} +2025-02-06 00:58:37 - ERROR - stderr - 73%|███████▎ | 16387/22434 [14:50:57<4:15:44, 2.54s/it] +2025-02-06 00:58:40 - ERROR - stderr - 73%|███████▎ | 16388/22434 [14:51:00<4:19:02, 2.57s/it] +2025-02-06 00:58:40 - ERROR - stderr - +2025-02-06 00:58:40 - ERROR - stderr - +2025-02-06 00:58:40 - INFO - stdout - {'loss': 0.3698, 'grad_norm': 1.433917760848999, 'learning_rate': 3.5738444693450624e-06, 'epoch': 2.19} +2025-02-06 00:58:40 - ERROR - stderr - 73%|███████▎ | 16388/22434 [14:51:00<4:19:02, 2.57s/it] +2025-02-06 00:58:42 - ERROR - stderr - 73%|███████▎ | 16389/22434 [14:51:02<4:16:58, 2.55s/it] +2025-02-06 00:58:42 - ERROR - stderr - +2025-02-06 00:58:42 - ERROR - stderr - +2025-02-06 00:58:42 - INFO - stdout - {'loss': 0.3514, 'grad_norm': 1.42483651638031, 'learning_rate': 3.5727383538539395e-06, 'epoch': 2.19} +2025-02-06 00:58:42 - ERROR - stderr - 73%|███████▎ | 16389/22434 [14:51:02<4:16:58, 2.55s/it] +2025-02-06 00:58:45 - ERROR - stderr - 73%|███████▎ | 16390/22434 [14:51:05<4:15:17, 2.53s/it] +2025-02-06 00:58:45 - ERROR - stderr - +2025-02-06 00:58:45 - ERROR - stderr - +2025-02-06 00:58:45 - INFO - stdout - {'loss': 0.4188, 'grad_norm': 1.6001818180084229, 'learning_rate': 3.5716323723329347e-06, 'epoch': 2.19} +2025-02-06 00:58:45 - ERROR - stderr - 73%|███████▎ | 16390/22434 [14:51:05<4:15:17, 2.53s/it] +2025-02-06 00:58:47 - ERROR - stderr - 73%|███████▎ | 16391/22434 [14:51:07<4:14:28, 2.53s/it] +2025-02-06 00:58:47 
- ERROR - stderr - +2025-02-06 00:58:47 - ERROR - stderr - +2025-02-06 00:58:47 - INFO - stdout - {'loss': 0.4128, 'grad_norm': 1.507879376411438, 'learning_rate': 3.5705265248051023e-06, 'epoch': 2.19} +2025-02-06 00:58:47 - ERROR - stderr - 73%|███████▎ | 16391/22434 [14:51:07<4:14:28, 2.53s/it] +2025-02-06 00:58:50 - ERROR - stderr - 73%|███████▎ | 16392/22434 [14:51:10<4:13:40, 2.52s/it] +2025-02-06 00:58:50 - ERROR - stderr - +2025-02-06 00:58:50 - ERROR - stderr - +2025-02-06 00:58:50 - INFO - stdout - {'loss': 0.365, 'grad_norm': 1.405267596244812, 'learning_rate': 3.569420811293496e-06, 'epoch': 2.19} +2025-02-06 00:58:50 - ERROR - stderr - 73%|███████▎ | 16392/22434 [14:51:10<4:13:40, 2.52s/it] +2025-02-06 00:58:52 - ERROR - stderr - 73%|███████▎ | 16393/22434 [14:51:12<4:11:03, 2.49s/it] +2025-02-06 00:58:52 - ERROR - stderr - +2025-02-06 00:58:52 - ERROR - stderr - +2025-02-06 00:58:52 - INFO - stdout - {'loss': 0.4108, 'grad_norm': 1.49809730052948, 'learning_rate': 3.568315231821151e-06, 'epoch': 2.19} +2025-02-06 00:58:52 - ERROR - stderr - 73%|███████▎ | 16393/22434 [14:51:12<4:11:03, 2.49s/it] +2025-02-06 00:58:55 - ERROR - stderr - 73%|███████▎ | 16394/22434 [14:51:14<4:08:35, 2.47s/it] +2025-02-06 00:58:55 - ERROR - stderr - +2025-02-06 00:58:55 - ERROR - stderr - +2025-02-06 00:58:55 - INFO - stdout - {'loss': 0.3964, 'grad_norm': 1.510206937789917, 'learning_rate': 3.5672097864111287e-06, 'epoch': 2.19} +2025-02-06 00:58:55 - ERROR - stderr - 73%|███████▎ | 16394/22434 [14:51:14<4:08:35, 2.47s/it] +2025-02-06 00:58:57 - ERROR - stderr - 73%|███████▎ | 16395/22434 [14:51:17<4:09:29, 2.48s/it] +2025-02-06 00:58:57 - ERROR - stderr - +2025-02-06 00:58:57 - ERROR - stderr - +2025-02-06 00:58:57 - INFO - stdout - {'loss': 0.4022, 'grad_norm': 1.4786604642868042, 'learning_rate': 3.5661044750864595e-06, 'epoch': 2.19} +2025-02-06 00:58:57 - ERROR - stderr - 73%|███████▎ | 16395/22434 [14:51:17<4:09:29, 2.48s/it] +2025-02-06 00:59:00 - ERROR - stderr - 
73%|███████▎ | 16396/22434 [14:51:19<4:11:16, 2.50s/it] +2025-02-06 00:59:00 - ERROR - stderr - +2025-02-06 00:59:00 - ERROR - stderr - +2025-02-06 00:59:00 - INFO - stdout - {'loss': 0.3744, 'grad_norm': 1.5953116416931152, 'learning_rate': 3.564999297870182e-06, 'epoch': 2.19} +2025-02-06 00:59:00 - ERROR - stderr - 73%|███████▎ | 16396/22434 [14:51:19<4:11:16, 2.50s/it] +2025-02-06 00:59:02 - ERROR - stderr - 73%|███████▎ | 16397/22434 [14:51:22<4:11:11, 2.50s/it] +2025-02-06 00:59:02 - ERROR - stderr - +2025-02-06 00:59:02 - ERROR - stderr - +2025-02-06 00:59:02 - INFO - stdout - {'loss': 0.4323, 'grad_norm': 1.654759168624878, 'learning_rate': 3.563894254785344e-06, 'epoch': 2.19} +2025-02-06 00:59:02 - ERROR - stderr - 73%|███████▎ | 16397/22434 [14:51:22<4:11:11, 2.50s/it] +2025-02-06 00:59:05 - ERROR - stderr - 73%|███████▎ | 16398/22434 [14:51:25<4:16:27, 2.55s/it] +2025-02-06 00:59:05 - ERROR - stderr - +2025-02-06 00:59:05 - ERROR - stderr - +2025-02-06 00:59:05 - INFO - stdout - {'loss': 0.3363, 'grad_norm': 1.2955964803695679, 'learning_rate': 3.5627893458549644e-06, 'epoch': 2.19} +2025-02-06 00:59:05 - ERROR - stderr - 73%|███████▎ | 16398/22434 [14:51:25<4:16:27, 2.55s/it] +2025-02-06 00:59:07 - ERROR - stderr - 73%|███████▎ | 16399/22434 [14:51:27<4:15:04, 2.54s/it] +2025-02-06 00:59:07 - ERROR - stderr - +2025-02-06 00:59:07 - ERROR - stderr - +2025-02-06 00:59:07 - INFO - stdout - {'loss': 0.3229, 'grad_norm': 1.3139053583145142, 'learning_rate': 3.5616845711020876e-06, 'epoch': 2.19} +2025-02-06 00:59:07 - ERROR - stderr - 73%|███████▎ | 16399/22434 [14:51:27<4:15:04, 2.54s/it] +2025-02-06 00:59:10 - ERROR - stderr - 73%|███████▎ | 16400/22434 [14:51:30<4:13:41, 2.52s/it] +2025-02-06 00:59:10 - ERROR - stderr - +2025-02-06 00:59:10 - ERROR - stderr - +2025-02-06 00:59:10 - INFO - stdout - {'loss': 0.4172, 'grad_norm': 1.6418778896331787, 'learning_rate': 3.5605799305497325e-06, 'epoch': 2.19} +2025-02-06 00:59:10 - ERROR - stderr - 
73%|███████▎ | 16400/22434 [14:51:30<4:13:41, 2.52s/it] +2025-02-06 00:59:12 - ERROR - stderr - 73%|███████▎ | 16401/22434 [14:51:32<4:17:36, 2.56s/it] +2025-02-06 00:59:13 - ERROR - stderr - +2025-02-06 00:59:13 - ERROR - stderr - +2025-02-06 00:59:13 - INFO - stdout - {'loss': 0.3519, 'grad_norm': 1.4203234910964966, 'learning_rate': 3.5594754242209263e-06, 'epoch': 2.19} +2025-02-06 00:59:13 - ERROR - stderr - 73%|███████▎ | 16401/22434 [14:51:32<4:17:36, 2.56s/it] +2025-02-06 00:59:15 - ERROR - stderr - 73%|███████▎ | 16402/22434 [14:51:35<4:14:39, 2.53s/it] +2025-02-06 00:59:15 - ERROR - stderr - +2025-02-06 00:59:15 - ERROR - stderr - +2025-02-06 00:59:15 - INFO - stdout - {'loss': 0.3992, 'grad_norm': 1.3634294271469116, 'learning_rate': 3.5583710521386916e-06, 'epoch': 2.19} +2025-02-06 00:59:15 - ERROR - stderr - 73%|███████▎ | 16402/22434 [14:51:35<4:14:39, 2.53s/it] +2025-02-06 00:59:18 - ERROR - stderr - 73%|███████▎ | 16403/22434 [14:51:37<4:21:17, 2.60s/it] +2025-02-06 00:59:18 - ERROR - stderr - +2025-02-06 00:59:18 - ERROR - stderr - +2025-02-06 00:59:18 - INFO - stdout - {'loss': 0.3657, 'grad_norm': 1.5628266334533691, 'learning_rate': 3.5572668143260458e-06, 'epoch': 2.19} +2025-02-06 00:59:18 - ERROR - stderr - 73%|███████▎ | 16403/22434 [14:51:38<4:21:17, 2.60s/it] +2025-02-06 00:59:20 - ERROR - stderr - 73%|███████▎ | 16404/22434 [14:51:40<4:19:18, 2.58s/it] +2025-02-06 00:59:20 - ERROR - stderr - +2025-02-06 00:59:20 - ERROR - stderr - +2025-02-06 00:59:20 - INFO - stdout - {'loss': 0.3682, 'grad_norm': 1.4561527967453003, 'learning_rate': 3.5561627108060137e-06, 'epoch': 2.19} +2025-02-06 00:59:20 - ERROR - stderr - 73%|███████▎ | 16404/22434 [14:51:40<4:19:18, 2.58s/it] +2025-02-06 00:59:23 - ERROR - stderr - 73%|███████▎ | 16405/22434 [14:51:42<4:16:33, 2.55s/it] +2025-02-06 00:59:23 - ERROR - stderr - +2025-02-06 00:59:23 - ERROR - stderr - +2025-02-06 00:59:23 - INFO - stdout - {'loss': 0.4276, 'grad_norm': 1.735776662826538, 
'learning_rate': 3.5550587416016016e-06, 'epoch': 2.19} +2025-02-06 00:59:23 - ERROR - stderr - 73%|███████▎ | 16405/22434 [14:51:43<4:16:33, 2.55s/it] +2025-02-06 00:59:25 - ERROR - stderr - 73%|███████▎ | 16406/22434 [14:51:45<4:12:46, 2.52s/it] +2025-02-06 00:59:25 - ERROR - stderr - +2025-02-06 00:59:25 - ERROR - stderr - +2025-02-06 00:59:25 - INFO - stdout - {'loss': 0.351, 'grad_norm': 1.4480133056640625, 'learning_rate': 3.5539549067358225e-06, 'epoch': 2.19} +2025-02-06 00:59:25 - ERROR - stderr - 73%|███████▎ | 16406/22434 [14:51:45<4:12:46, 2.52s/it] +2025-02-06 00:59:28 - ERROR - stderr - 73%|███████▎ | 16407/22434 [14:51:48<4:15:59, 2.55s/it] +2025-02-06 00:59:28 - ERROR - stderr - +2025-02-06 00:59:28 - ERROR - stderr - +2025-02-06 00:59:28 - INFO - stdout - {'loss': 0.3852, 'grad_norm': 1.4428770542144775, 'learning_rate': 3.5528512062316857e-06, 'epoch': 2.19} +2025-02-06 00:59:28 - ERROR - stderr - 73%|███████▎ | 16407/22434 [14:51:48<4:15:59, 2.55s/it] +2025-02-06 00:59:30 - ERROR - stderr - 73%|███████▎ | 16408/22434 [14:51:50<4:14:17, 2.53s/it] +2025-02-06 00:59:30 - ERROR - stderr - +2025-02-06 00:59:30 - ERROR - stderr - +2025-02-06 00:59:30 - INFO - stdout - {'loss': 0.4368, 'grad_norm': 1.696537733078003, 'learning_rate': 3.5517476401121953e-06, 'epoch': 2.19} +2025-02-06 00:59:30 - ERROR - stderr - 73%|███████▎ | 16408/22434 [14:51:50<4:14:17, 2.53s/it] +2025-02-06 00:59:33 - ERROR - stderr - 73%|███████▎ | 16409/22434 [14:51:52<4:12:24, 2.51s/it] +2025-02-06 00:59:33 - ERROR - stderr - +2025-02-06 00:59:33 - ERROR - stderr - +2025-02-06 00:59:33 - INFO - stdout - {'loss': 0.3758, 'grad_norm': 1.6042019128799438, 'learning_rate': 3.5506442084003554e-06, 'epoch': 2.19} +2025-02-06 00:59:33 - ERROR - stderr - 73%|███████▎ | 16409/22434 [14:51:53<4:12:24, 2.51s/it] +2025-02-06 00:59:35 - ERROR - stderr - 73%|███████▎ | 16410/22434 [14:51:55<4:11:05, 2.50s/it] +2025-02-06 00:59:35 - ERROR - stderr - +2025-02-06 00:59:35 - ERROR - stderr - 
+2025-02-06 00:59:35 - INFO - stdout - {'loss': 0.3797, 'grad_norm': 1.5025063753128052, 'learning_rate': 3.549540911119166e-06, 'epoch': 2.19} +2025-02-06 00:59:35 - ERROR - stderr - 73%|███████▎ | 16410/22434 [14:51:55<4:11:05, 2.50s/it] +2025-02-06 00:59:38 - ERROR - stderr - 73%|███████▎ | 16411/22434 [14:51:57<4:11:44, 2.51s/it] +2025-02-06 00:59:38 - ERROR - stderr - +2025-02-06 00:59:38 - ERROR - stderr - +2025-02-06 00:59:38 - INFO - stdout - {'loss': 0.3962, 'grad_norm': 1.5232625007629395, 'learning_rate': 3.5484377482916245e-06, 'epoch': 2.19} +2025-02-06 00:59:38 - ERROR - stderr - 73%|███████▎ | 16411/22434 [14:51:58<4:11:44, 2.51s/it] +2025-02-06 00:59:40 - ERROR - stderr - 73%|███████▎ | 16412/22434 [14:52:00<4:11:33, 2.51s/it] +2025-02-06 00:59:40 - ERROR - stderr - +2025-02-06 00:59:40 - ERROR - stderr - +2025-02-06 00:59:40 - INFO - stdout - {'loss': 0.3499, 'grad_norm': 1.4286611080169678, 'learning_rate': 3.547334719940724e-06, 'epoch': 2.19} +2025-02-06 00:59:40 - ERROR - stderr - 73%|███████▎ | 16412/22434 [14:52:00<4:11:33, 2.51s/it] +2025-02-06 00:59:43 - ERROR - stderr - 73%|███████▎ | 16413/22434 [14:52:03<4:12:07, 2.51s/it] +2025-02-06 00:59:43 - ERROR - stderr - +2025-02-06 00:59:43 - ERROR - stderr - +2025-02-06 00:59:43 - INFO - stdout - {'loss': 0.3669, 'grad_norm': 1.2526613473892212, 'learning_rate': 3.546231826089459e-06, 'epoch': 2.19} +2025-02-06 00:59:43 - ERROR - stderr - 73%|███████▎ | 16413/22434 [14:52:03<4:12:07, 2.51s/it] +2025-02-06 00:59:46 - ERROR - stderr - 73%|███████▎ | 16414/22434 [14:52:05<4:23:01, 2.62s/it] +2025-02-06 00:59:46 - ERROR - stderr - +2025-02-06 00:59:46 - ERROR - stderr - +2025-02-06 00:59:46 - INFO - stdout - {'loss': 0.4387, 'grad_norm': 1.7012066841125488, 'learning_rate': 3.545129066760811e-06, 'epoch': 2.19} +2025-02-06 00:59:46 - ERROR - stderr - 73%|███████▎ | 16414/22434 [14:52:05<4:23:01, 2.62s/it] +2025-02-06 00:59:48 - ERROR - stderr - 73%|███████▎ | 16415/22434 [14:52:08<4:19:23, 
2.59s/it] +2025-02-06 00:59:48 - ERROR - stderr - +2025-02-06 00:59:48 - ERROR - stderr - +2025-02-06 00:59:48 - INFO - stdout - {'loss': 0.3853, 'grad_norm': 1.4367096424102783, 'learning_rate': 3.5440264419777724e-06, 'epoch': 2.2} +2025-02-06 00:59:48 - ERROR - stderr - 73%|███████▎ | 16415/22434 [14:52:08<4:19:23, 2.59s/it] +2025-02-06 00:59:51 - ERROR - stderr - 73%|███████▎ | 16416/22434 [14:52:10<4:15:53, 2.55s/it] +2025-02-06 00:59:51 - ERROR - stderr - +2025-02-06 00:59:51 - ERROR - stderr - +2025-02-06 00:59:51 - INFO - stdout - {'loss': 0.3997, 'grad_norm': 1.608490228652954, 'learning_rate': 3.5429239517633297e-06, 'epoch': 2.2} +2025-02-06 00:59:51 - ERROR - stderr - 73%|███████▎ | 16416/22434 [14:52:10<4:15:53, 2.55s/it] +2025-02-06 00:59:53 - ERROR - stderr - 73%|███████▎ | 16417/22434 [14:52:13<4:13:57, 2.53s/it] +2025-02-06 00:59:53 - ERROR - stderr - +2025-02-06 00:59:53 - ERROR - stderr - +2025-02-06 00:59:53 - INFO - stdout - {'loss': 0.3825, 'grad_norm': 1.570743441581726, 'learning_rate': 3.541821596140452e-06, 'epoch': 2.2} +2025-02-06 00:59:53 - ERROR - stderr - 73%|███████▎ | 16417/22434 [14:52:13<4:13:57, 2.53s/it] +2025-02-06 00:59:56 - ERROR - stderr - 73%|███████▎ | 16418/22434 [14:52:15<4:13:31, 2.53s/it] +2025-02-06 00:59:56 - ERROR - stderr - +2025-02-06 00:59:56 - ERROR - stderr - +2025-02-06 00:59:56 - INFO - stdout - {'loss': 0.3988, 'grad_norm': 1.6516621112823486, 'learning_rate': 3.540719375132129e-06, 'epoch': 2.2} +2025-02-06 00:59:56 - ERROR - stderr - 73%|███████▎ | 16418/22434 [14:52:15<4:13:31, 2.53s/it] +2025-02-06 00:59:58 - ERROR - stderr - 73%|███████▎ | 16419/22434 [14:52:18<4:11:28, 2.51s/it] +2025-02-06 00:59:58 - ERROR - stderr - +2025-02-06 00:59:58 - ERROR - stderr - +2025-02-06 00:59:58 - INFO - stdout - {'loss': 0.38, 'grad_norm': 1.6899850368499756, 'learning_rate': 3.5396172887613246e-06, 'epoch': 2.2} +2025-02-06 00:59:58 - ERROR - stderr - 73%|███████▎ | 16419/22434 [14:52:18<4:11:28, 2.51s/it] +2025-02-06 
01:00:01 - ERROR - stderr - 73%|███████▎ | 16420/22434 [14:52:21<4:15:56, 2.55s/it] +2025-02-06 01:00:01 - ERROR - stderr - +2025-02-06 01:00:01 - ERROR - stderr - +2025-02-06 01:00:01 - INFO - stdout - {'loss': 0.3512, 'grad_norm': 1.3151986598968506, 'learning_rate': 3.5385153370510207e-06, 'epoch': 2.2} +2025-02-06 01:00:01 - ERROR - stderr - 73%|███████▎ | 16420/22434 [14:52:21<4:15:56, 2.55s/it] +2025-02-06 01:00:03 - ERROR - stderr - 73%|███████▎ | 16421/22434 [14:52:23<4:14:31, 2.54s/it] +2025-02-06 01:00:03 - ERROR - stderr - +2025-02-06 01:00:03 - ERROR - stderr - +2025-02-06 01:00:03 - INFO - stdout - {'loss': 0.4111, 'grad_norm': 1.562608003616333, 'learning_rate': 3.53741352002418e-06, 'epoch': 2.2} +2025-02-06 01:00:03 - ERROR - stderr - 73%|███████▎ | 16421/22434 [14:52:23<4:14:31, 2.54s/it] +2025-02-06 01:00:06 - ERROR - stderr - 73%|███████▎ | 16422/22434 [14:52:25<4:11:28, 2.51s/it] +2025-02-06 01:00:06 - ERROR - stderr - +2025-02-06 01:00:06 - ERROR - stderr - +2025-02-06 01:00:06 - INFO - stdout - {'loss': 0.3704, 'grad_norm': 1.5565778017044067, 'learning_rate': 3.5363118377037654e-06, 'epoch': 2.2} +2025-02-06 01:00:06 - ERROR - stderr - 73%|███████▎ | 16422/22434 [14:52:25<4:11:28, 2.51s/it] +2025-02-06 01:00:08 - ERROR - stderr - 73%|███████▎ | 16423/22434 [14:52:28<4:11:28, 2.51s/it] +2025-02-06 01:00:08 - ERROR - stderr - +2025-02-06 01:00:08 - ERROR - stderr - +2025-02-06 01:00:08 - INFO - stdout - {'loss': 0.3403, 'grad_norm': 1.5046974420547485, 'learning_rate': 3.5352102901127527e-06, 'epoch': 2.2} +2025-02-06 01:00:08 - ERROR - stderr - 73%|███████▎ | 16423/22434 [14:52:28<4:11:28, 2.51s/it] +2025-02-06 01:00:11 - ERROR - stderr - 73%|███████▎ | 16424/22434 [14:52:30<4:12:06, 2.52s/it] +2025-02-06 01:00:11 - ERROR - stderr - +2025-02-06 01:00:11 - ERROR - stderr - +2025-02-06 01:00:11 - INFO - stdout - {'loss': 0.3685, 'grad_norm': 1.3644211292266846, 'learning_rate': 3.5341088772740928e-06, 'epoch': 2.2} +2025-02-06 01:00:11 - ERROR - 
stderr - 73%|███████▎ | 16424/22434 [14:52:31<4:12:06, 2.52s/it] +2025-02-06 01:00:13 - ERROR - stderr - 73%|███████▎ | 16425/22434 [14:52:33<4:12:34, 2.52s/it] +2025-02-06 01:00:13 - ERROR - stderr - +2025-02-06 01:00:13 - ERROR - stderr - +2025-02-06 01:00:13 - INFO - stdout - {'loss': 0.3546, 'grad_norm': 1.3171924352645874, 'learning_rate': 3.533007599210746e-06, 'epoch': 2.2} +2025-02-06 01:00:13 - ERROR - stderr - 73%|███████▎ | 16425/22434 [14:52:33<4:12:34, 2.52s/it] +2025-02-06 01:00:16 - ERROR - stderr - 73%|███████▎ | 16426/22434 [14:52:36<4:11:45, 2.51s/it] +2025-02-06 01:00:16 - ERROR - stderr - +2025-02-06 01:00:16 - ERROR - stderr - +2025-02-06 01:00:16 - INFO - stdout - {'loss': 0.3822, 'grad_norm': 1.5270090103149414, 'learning_rate': 3.5319064559456672e-06, 'epoch': 2.2} +2025-02-06 01:00:16 - ERROR - stderr - 73%|███████▎ | 16426/22434 [14:52:36<4:11:45, 2.51s/it] +2025-02-06 01:00:18 - ERROR - stderr - 73%|███████▎ | 16427/22434 [14:52:38<4:12:05, 2.52s/it] +2025-02-06 01:00:18 - ERROR - stderr - +2025-02-06 01:00:18 - ERROR - stderr - +2025-02-06 01:00:18 - INFO - stdout - {'loss': 0.3684, 'grad_norm': 1.4101202487945557, 'learning_rate': 3.5308054475018095e-06, 'epoch': 2.2} +2025-02-06 01:00:18 - ERROR - stderr - 73%|███████▎ | 16427/22434 [14:52:38<4:12:05, 2.52s/it] +2025-02-06 01:00:21 - ERROR - stderr - 73%|███████▎ | 16428/22434 [14:52:41<4:12:14, 2.52s/it] +2025-02-06 01:00:21 - ERROR - stderr - +2025-02-06 01:00:21 - ERROR - stderr - +2025-02-06 01:00:21 - INFO - stdout - {'loss': 0.4063, 'grad_norm': 1.6080455780029297, 'learning_rate': 3.529704573902121e-06, 'epoch': 2.2} +2025-02-06 01:00:21 - ERROR - stderr - 73%|███████▎ | 16428/22434 [14:52:41<4:12:14, 2.52s/it] +2025-02-06 01:00:23 - ERROR - stderr - 73%|███████▎ | 16429/22434 [14:52:43<4:09:32, 2.49s/it] +2025-02-06 01:00:23 - ERROR - stderr - +2025-02-06 01:00:23 - ERROR - stderr - +2025-02-06 01:00:23 - INFO - stdout - {'loss': 0.3802, 'grad_norm': 1.4410773515701294, 
'learning_rate': 3.5286038351695493e-06, 'epoch': 2.2} +2025-02-06 01:00:23 - ERROR - stderr - 73%|███████▎ | 16429/22434 [14:52:43<4:09:32, 2.49s/it] +2025-02-06 01:00:26 - ERROR - stderr - 73%|███████▎ | 16430/22434 [14:52:46<4:12:50, 2.53s/it] +2025-02-06 01:00:26 - ERROR - stderr - +2025-02-06 01:00:26 - ERROR - stderr - +2025-02-06 01:00:26 - INFO - stdout - {'loss': 0.3702, 'grad_norm': 1.4366222620010376, 'learning_rate': 3.5275032313270386e-06, 'epoch': 2.2} +2025-02-06 01:00:26 - ERROR - stderr - 73%|███████▎ | 16430/22434 [14:52:46<4:12:50, 2.53s/it] +2025-02-06 01:00:28 - ERROR - stderr - 73%|███████▎ | 16431/22434 [14:52:48<4:10:34, 2.50s/it] +2025-02-06 01:00:28 - ERROR - stderr - +2025-02-06 01:00:28 - ERROR - stderr - +2025-02-06 01:00:28 - INFO - stdout - {'loss': 0.4458, 'grad_norm': 1.5443081855773926, 'learning_rate': 3.5264027623975294e-06, 'epoch': 2.2} +2025-02-06 01:00:28 - ERROR - stderr - 73%|███████▎ | 16431/22434 [14:52:48<4:10:34, 2.50s/it] +2025-02-06 01:00:31 - ERROR - stderr - 73%|███████▎ | 16432/22434 [14:52:51<4:19:10, 2.59s/it] +2025-02-06 01:00:31 - ERROR - stderr - +2025-02-06 01:00:31 - ERROR - stderr - +2025-02-06 01:00:31 - INFO - stdout - {'loss': 0.384, 'grad_norm': 1.4828189611434937, 'learning_rate': 3.525302428403964e-06, 'epoch': 2.2} +2025-02-06 01:00:31 - ERROR - stderr - 73%|███████▎ | 16432/22434 [14:52:51<4:19:10, 2.59s/it] +2025-02-06 01:00:34 - ERROR - stderr - 73%|███████▎ | 16433/22434 [14:52:53<4:14:29, 2.54s/it] +2025-02-06 01:00:34 - ERROR - stderr - +2025-02-06 01:00:34 - ERROR - stderr - +2025-02-06 01:00:34 - INFO - stdout - {'loss': 0.3824, 'grad_norm': 1.5275070667266846, 'learning_rate': 3.524202229369267e-06, 'epoch': 2.2} +2025-02-06 01:00:34 - ERROR - stderr - 73%|███████▎ | 16433/22434 [14:52:53<4:14:29, 2.54s/it] +2025-02-06 01:00:36 - ERROR - stderr - 73%|███████▎ | 16434/22434 [14:52:56<4:14:51, 2.55s/it] +2025-02-06 01:00:36 - ERROR - stderr - +2025-02-06 01:00:36 - ERROR - stderr - +2025-02-06 
01:00:36 - INFO - stdout - {'loss': 0.3952, 'grad_norm': 1.4760372638702393, 'learning_rate': 3.523102165316381e-06, 'epoch': 2.2} +2025-02-06 01:00:36 - ERROR - stderr - 73%|███████▎ | 16434/22434 [14:52:56<4:14:51, 2.55s/it] +2025-02-06 01:00:39 - ERROR - stderr - 73%|███████▎ | 16435/22434 [14:52:58<4:12:47, 2.53s/it] +2025-02-06 01:00:39 - ERROR - stderr - +2025-02-06 01:00:39 - ERROR - stderr - +2025-02-06 01:00:39 - INFO - stdout - {'loss': 0.424, 'grad_norm': 1.7516838312149048, 'learning_rate': 3.522002236268233e-06, 'epoch': 2.2} +2025-02-06 01:00:39 - ERROR - stderr - 73%|███████▎ | 16435/22434 [14:52:58<4:12:47, 2.53s/it] +2025-02-06 01:00:41 - ERROR - stderr - 73%|███████▎ | 16436/22434 [14:53:01<4:13:00, 2.53s/it] +2025-02-06 01:00:41 - ERROR - stderr - +2025-02-06 01:00:41 - ERROR - stderr - +2025-02-06 01:00:41 - INFO - stdout - {'loss': 0.3578, 'grad_norm': 1.491129755973816, 'learning_rate': 3.520902442247749e-06, 'epoch': 2.2} +2025-02-06 01:00:41 - ERROR - stderr - 73%|███████▎ | 16436/22434 [14:53:01<4:13:00, 2.53s/it] +2025-02-06 01:00:44 - ERROR - stderr - 73%|███████▎ | 16437/22434 [14:53:03<4:12:21, 2.52s/it] +2025-02-06 01:00:44 - ERROR - stderr - +2025-02-06 01:00:44 - ERROR - stderr - +2025-02-06 01:00:44 - INFO - stdout - {'loss': 0.3633, 'grad_norm': 1.3396703004837036, 'learning_rate': 3.519802783277857e-06, 'epoch': 2.2} +2025-02-06 01:00:44 - ERROR - stderr - 73%|███████▎ | 16437/22434 [14:53:03<4:12:21, 2.52s/it] +2025-02-06 01:00:46 - ERROR - stderr - 73%|███████▎ | 16438/22434 [14:53:06<4:12:23, 2.53s/it] +2025-02-06 01:00:46 - ERROR - stderr - +2025-02-06 01:00:46 - ERROR - stderr - +2025-02-06 01:00:46 - INFO - stdout - {'loss': 0.3626, 'grad_norm': 1.4581927061080933, 'learning_rate': 3.5187032593814684e-06, 'epoch': 2.2} +2025-02-06 01:00:46 - ERROR - stderr - 73%|███████▎ | 16438/22434 [14:53:06<4:12:23, 2.53s/it] +2025-02-06 01:00:49 - ERROR - stderr - 73%|███████▎ | 16439/22434 [14:53:08<4:12:49, 2.53s/it] +2025-02-06 
01:00:49 - ERROR - stderr - +2025-02-06 01:00:49 - ERROR - stderr - +2025-02-06 01:00:49 - INFO - stdout - {'loss': 0.3759, 'grad_norm': 1.549901008605957, 'learning_rate': 3.5176038705815163e-06, 'epoch': 2.2} +2025-02-06 01:00:49 - ERROR - stderr - 73%|███████▎ | 16439/22434 [14:53:08<4:12:49, 2.53s/it] +2025-02-06 01:00:51 - ERROR - stderr - 73%|███████▎ | 16440/22434 [14:53:11<4:16:02, 2.56s/it] +2025-02-06 01:00:51 - ERROR - stderr - +2025-02-06 01:00:51 - ERROR - stderr - +2025-02-06 01:00:51 - INFO - stdout - {'loss': 0.3788, 'grad_norm': 1.4372854232788086, 'learning_rate': 3.516504616900904e-06, 'epoch': 2.2} +2025-02-06 01:00:51 - ERROR - stderr - 73%|███████▎ | 16440/22434 [14:53:11<4:16:02, 2.56s/it] +2025-02-06 01:00:54 - ERROR - stderr - 73%|███████▎ | 16441/22434 [14:53:14<4:18:57, 2.59s/it] +2025-02-06 01:00:54 - ERROR - stderr - +2025-02-06 01:00:54 - ERROR - stderr - +2025-02-06 01:00:54 - INFO - stdout - {'loss': 0.368, 'grad_norm': 1.3589279651641846, 'learning_rate': 3.5154054983625463e-06, 'epoch': 2.2} +2025-02-06 01:00:54 - ERROR - stderr - 73%|███████▎ | 16441/22434 [14:53:14<4:18:57, 2.59s/it] +2025-02-06 01:00:57 - ERROR - stderr - 73%|███████▎ | 16442/22434 [14:53:16<4:22:24, 2.63s/it] +2025-02-06 01:00:57 - ERROR - stderr - +2025-02-06 01:00:57 - ERROR - stderr - +2025-02-06 01:00:57 - INFO - stdout - {'loss': 0.3754, 'grad_norm': 1.4434417486190796, 'learning_rate': 3.5143065149893617e-06, 'epoch': 2.2} +2025-02-06 01:00:57 - ERROR - stderr - 73%|███████▎ | 16442/22434 [14:53:17<4:22:24, 2.63s/it] +2025-02-06 01:00:59 - ERROR - stderr - 73%|███████▎ | 16443/22434 [14:53:19<4:18:21, 2.59s/it] +2025-02-06 01:00:59 - ERROR - stderr - +2025-02-06 01:00:59 - ERROR - stderr - +2025-02-06 01:00:59 - INFO - stdout - {'loss': 0.385, 'grad_norm': 1.4635580778121948, 'learning_rate': 3.5132076668042457e-06, 'epoch': 2.2} +2025-02-06 01:00:59 - ERROR - stderr - 73%|███████▎ | 16443/22434 [14:53:19<4:18:21, 2.59s/it] +2025-02-06 01:01:02 - ERROR -
stderr - 73%|███████▎ | 16444/22434 [14:53:22<4:24:07, 2.65s/it] +2025-02-06 01:01:02 - ERROR - stderr - +2025-02-06 01:01:02 - ERROR - stderr - +2025-02-06 01:01:02 - INFO - stdout - {'loss': 0.374, 'grad_norm': 1.4728752374649048, 'learning_rate': 3.5121089538301156e-06, 'epoch': 2.2} +2025-02-06 01:01:02 - ERROR - stderr - 73%|███████▎ | 16444/22434 [14:53:22<4:24:07, 2.65s/it] +2025-02-06 01:01:04 - ERROR - stderr - 73%|███████▎ | 16445/22434 [14:53:24<4:19:51, 2.60s/it] +2025-02-06 01:01:05 - ERROR - stderr - +2025-02-06 01:01:05 - ERROR - stderr - +2025-02-06 01:01:05 - INFO - stdout - {'loss': 0.4, 'grad_norm': 1.4978705644607544, 'learning_rate': 3.5110103760898616e-06, 'epoch': 2.2} +2025-02-06 01:01:05 - ERROR - stderr - 73%|███████▎ | 16445/22434 [14:53:24<4:19:51, 2.60s/it] +2025-02-06 01:01:07 - ERROR - stderr - 73%|███████▎ | 16446/22434 [14:53:27<4:17:17, 2.58s/it] +2025-02-06 01:01:07 - ERROR - stderr - +2025-02-06 01:01:07 - ERROR - stderr - +2025-02-06 01:01:07 - INFO - stdout - {'loss': 0.3652, 'grad_norm': 1.5517399311065674, 'learning_rate': 3.509911933606388e-06, 'epoch': 2.2} +2025-02-06 01:01:07 - ERROR - stderr - 73%|███████▎ | 16446/22434 [14:53:27<4:17:17, 2.58s/it] +2025-02-06 01:01:09 - ERROR - stderr - 73%|███████▎ | 16447/22434 [14:53:29<4:12:18, 2.53s/it] +2025-02-06 01:01:09 - ERROR - stderr - +2025-02-06 01:01:09 - ERROR - stderr - +2025-02-06 01:01:09 - INFO - stdout - {'loss': 0.3842, 'grad_norm': 1.557432770729065, 'learning_rate': 3.5088136264025895e-06, 'epoch': 2.2} +2025-02-06 01:01:09 - ERROR - stderr - 73%|███████▎ | 16447/22434 [14:53:29<4:12:18, 2.53s/it] +2025-02-06 01:01:12 - ERROR - stderr - 73%|███████▎ | 16448/22434 [14:53:32<4:12:16, 2.53s/it] +2025-02-06 01:01:12 - ERROR - stderr - +2025-02-06 01:01:12 - ERROR - stderr - +2025-02-06 01:01:12 - INFO - stdout - {'loss': 0.3914, 'grad_norm': 1.476466178894043, 'learning_rate': 3.5077154545013603e-06, 'epoch': 2.2} +2025-02-06 01:01:12 - ERROR - stderr - 73%|███████▎ 
| 16448/22434 [14:53:32<4:12:16, 2.53s/it] +2025-02-06 01:01:15 - ERROR - stderr - 73%|███████▎ | 16449/22434 [14:53:34<4:14:43, 2.55s/it] +2025-02-06 01:01:15 - ERROR - stderr - +2025-02-06 01:01:15 - ERROR - stderr - +2025-02-06 01:01:15 - INFO - stdout - {'loss': 0.3346, 'grad_norm': 1.2703248262405396, 'learning_rate': 3.5066174179255885e-06, 'epoch': 2.2} +2025-02-06 01:01:15 - ERROR - stderr - 73%|███████▎ | 16449/22434 [14:53:34<4:14:43, 2.55s/it] +2025-02-06 01:01:17 - ERROR - stderr - 73%|███████▎ | 16450/22434 [14:53:37<4:15:11, 2.56s/it] +2025-02-06 01:01:17 - ERROR - stderr - +2025-02-06 01:01:17 - ERROR - stderr - +2025-02-06 01:01:17 - INFO - stdout - {'loss': 0.3794, 'grad_norm': 1.552782416343689, 'learning_rate': 3.505519516698165e-06, 'epoch': 2.2} +2025-02-06 01:01:17 - ERROR - stderr - 73%|███████▎ | 16450/22434 [14:53:37<4:15:11, 2.56s/it] +2025-02-06 01:01:20 - ERROR - stderr - 73%|███████▎ | 16451/22434 [14:53:40<4:18:38, 2.59s/it] +2025-02-06 01:01:20 - ERROR - stderr - +2025-02-06 01:01:20 - ERROR - stderr - +2025-02-06 01:01:20 - INFO - stdout - {'loss': 0.3647, 'grad_norm': 1.293184757232666, 'learning_rate': 3.504421750841971e-06, 'epoch': 2.2} +2025-02-06 01:01:20 - ERROR - stderr - 73%|███████▎ | 16451/22434 [14:53:40<4:18:38, 2.59s/it] +2025-02-06 01:01:23 - ERROR - stderr - 73%|███████▎ | 16452/22434 [14:53:42<4:22:15, 2.63s/it] +2025-02-06 01:01:23 - ERROR - stderr - +2025-02-06 01:01:23 - ERROR - stderr - +2025-02-06 01:01:23 - INFO - stdout - {'loss': 0.3592, 'grad_norm': 1.3890290260314941, 'learning_rate': 3.5033241203798907e-06, 'epoch': 2.2} +2025-02-06 01:01:23 - ERROR - stderr - 73%|███████▎ | 16452/22434 [14:53:42<4:22:15, 2.63s/it] +2025-02-06 01:01:25 - ERROR - stderr - 73%|███████▎ | 16453/22434 [14:53:45<4:28:59, 2.70s/it] +2025-02-06 01:01:25 - ERROR - stderr - +2025-02-06 01:01:25 - ERROR - stderr - +2025-02-06 01:01:25 - INFO - stdout - {'loss': 0.3826, 'grad_norm': 1.3625593185424805, 'learning_rate': 
3.5022266253348025e-06, 'epoch': 2.2} +2025-02-06 01:01:25 - ERROR - stderr - 73%|███████▎ | 16453/22434 [14:53:45<4:28:59, 2.70s/it] +2025-02-06 01:01:28 - ERROR - stderr - 73%|███████▎ | 16454/22434 [14:53:48<4:25:49, 2.67s/it] +2025-02-06 01:01:28 - ERROR - stderr - +2025-02-06 01:01:28 - ERROR - stderr - +2025-02-06 01:01:28 - INFO - stdout - {'loss': 0.4091, 'grad_norm': 1.7006300687789917, 'learning_rate': 3.5011292657295825e-06, 'epoch': 2.2} +2025-02-06 01:01:28 - ERROR - stderr - 73%|███████▎ | 16454/22434 [14:53:48<4:25:49, 2.67s/it] +2025-02-06 01:01:30 - ERROR - stderr - 73%|███████▎ | 16455/22434 [14:53:50<4:22:02, 2.63s/it] +2025-02-06 01:01:31 - ERROR - stderr - +2025-02-06 01:01:31 - ERROR - stderr - +2025-02-06 01:01:31 - INFO - stdout - {'loss': 0.3587, 'grad_norm': 1.5066580772399902, 'learning_rate': 3.5000320415871035e-06, 'epoch': 2.2} +2025-02-06 01:01:31 - ERROR - stderr - 73%|███████▎ | 16455/22434 [14:53:50<4:22:02, 2.63s/it] +2025-02-06 01:01:33 - ERROR - stderr - 73%|███████▎ | 16456/22434 [14:53:53<4:21:29, 2.62s/it] +2025-02-06 01:01:33 - ERROR - stderr - +2025-02-06 01:01:33 - ERROR - stderr - +2025-02-06 01:01:33 - INFO - stdout - {'loss': 0.3418, 'grad_norm': 1.374006748199463, 'learning_rate': 3.498934952930242e-06, 'epoch': 2.2} +2025-02-06 01:01:33 - ERROR - stderr - 73%|███████▎ | 16456/22434 [14:53:53<4:21:29, 2.62s/it] +2025-02-06 01:01:36 - ERROR - stderr - 73%|███████▎ | 16457/22434 [14:53:55<4:17:39, 2.59s/it] +2025-02-06 01:01:36 - ERROR - stderr - +2025-02-06 01:01:36 - ERROR - stderr - +2025-02-06 01:01:36 - INFO - stdout - {'loss': 0.3474, 'grad_norm': 1.4527384042739868, 'learning_rate': 3.497837999781852e-06, 'epoch': 2.2} +2025-02-06 01:01:36 - ERROR - stderr - 73%|███████▎ | 16457/22434 [14:53:55<4:17:39, 2.59s/it] +2025-02-06 01:01:38 - ERROR - stderr - 73%|███████▎ | 16458/22434 [14:53:58<4:15:05, 2.56s/it] +2025-02-06 01:01:38 - ERROR - stderr - +2025-02-06 01:01:38 - ERROR - stderr - +2025-02-06 01:01:38 - INFO 
- stdout - {'loss': 0.3934, 'grad_norm': 1.581458330154419, 'learning_rate': 3.4967411821648144e-06, 'epoch': 2.2} +2025-02-06 01:01:38 - ERROR - stderr - 73%|███████▎ | 16458/22434 [14:53:58<4:15:05, 2.56s/it] +2025-02-06 01:01:41 - ERROR - stderr - 73%|███████▎ | 16459/22434 [14:54:00<4:15:35, 2.57s/it] +2025-02-06 01:01:41 - ERROR - stderr - +2025-02-06 01:01:41 - ERROR - stderr - +2025-02-06 01:01:41 - INFO - stdout - {'loss': 0.3831, 'grad_norm': 1.4305740594863892, 'learning_rate': 3.495644500101978e-06, 'epoch': 2.2} +2025-02-06 01:01:41 - ERROR - stderr - 73%|███████▎ | 16459/22434 [14:54:01<4:15:35, 2.57s/it] +2025-02-06 01:01:43 - ERROR - stderr - 73%|███████▎ | 16460/22434 [14:54:03<4:14:04, 2.55s/it] +2025-02-06 01:01:43 - ERROR - stderr - +2025-02-06 01:01:43 - ERROR - stderr - +2025-02-06 01:01:43 - INFO - stdout - {'loss': 0.3848, 'grad_norm': 1.5581700801849365, 'learning_rate': 3.4945479536162096e-06, 'epoch': 2.2} +2025-02-06 01:01:43 - ERROR - stderr - 73%|███████▎ | 16460/22434 [14:54:03<4:14:04, 2.55s/it] +2025-02-06 01:01:46 - ERROR - stderr - 73%|███████▎ | 16461/22434 [14:54:05<4:12:07, 2.53s/it] +2025-02-06 01:01:46 - ERROR - stderr - +2025-02-06 01:01:46 - ERROR - stderr - +2025-02-06 01:01:46 - INFO - stdout - {'loss': 0.4187, 'grad_norm': 1.552886962890625, 'learning_rate': 3.4934515427303684e-06, 'epoch': 2.2} +2025-02-06 01:01:46 - ERROR - stderr - 73%|███████▎ | 16461/22434 [14:54:06<4:12:07, 2.53s/it] +2025-02-06 01:01:48 - ERROR - stderr - 73%|███████▎ | 16462/22434 [14:54:08<4:10:48, 2.52s/it] +2025-02-06 01:01:48 - ERROR - stderr - +2025-02-06 01:01:48 - ERROR - stderr - +2025-02-06 01:01:48 - INFO - stdout - {'loss': 0.4161, 'grad_norm': 1.541314959526062, 'learning_rate': 3.4923552674672978e-06, 'epoch': 2.2} +2025-02-06 01:01:48 - ERROR - stderr - 73%|███████▎ | 16462/22434 [14:54:08<4:10:48, 2.52s/it] +2025-02-06 01:01:51 - ERROR - stderr - 73%|███████▎ | 16463/22434 [14:54:10<4:10:49, 2.52s/it] +2025-02-06 01:01:51 - ERROR - 
stderr - +2025-02-06 01:01:51 - ERROR - stderr - +2025-02-06 01:01:51 - INFO - stdout - {'loss': 0.3628, 'grad_norm': 1.5983177423477173, 'learning_rate': 3.49125912784986e-06, 'epoch': 2.2} +2025-02-06 01:01:51 - ERROR - stderr - 73%|███████▎ | 16463/22434 [14:54:11<4:10:49, 2.52s/it] +2025-02-06 01:01:54 - ERROR - stderr - 73%|███████▎ | 16464/22434 [14:54:13<4:21:03, 2.62s/it] +2025-02-06 01:01:54 - ERROR - stderr - +2025-02-06 01:01:54 - ERROR - stderr - +2025-02-06 01:01:54 - INFO - stdout - {'loss': 0.4056, 'grad_norm': 1.635797381401062, 'learning_rate': 3.4901631239008947e-06, 'epoch': 2.2} +2025-02-06 01:01:54 - ERROR - stderr - 73%|███████▎ | 16464/22434 [14:54:13<4:21:03, 2.62s/it] +2025-02-06 01:01:56 - ERROR - stderr - 73%|███████▎ | 16465/22434 [14:54:16<4:15:48, 2.57s/it] +2025-02-06 01:01:56 - ERROR - stderr - +2025-02-06 01:01:56 - ERROR - stderr - +2025-02-06 01:01:56 - INFO - stdout - {'loss': 0.3573, 'grad_norm': 1.5336662530899048, 'learning_rate': 3.489067255643249e-06, 'epoch': 2.2} +2025-02-06 01:01:56 - ERROR - stderr - 73%|███████▎ | 16465/22434 [14:54:16<4:15:48, 2.57s/it] +2025-02-06 01:01:59 - ERROR - stderr - 73%|███████▎ | 16466/22434 [14:54:18<4:13:21, 2.55s/it] +2025-02-06 01:01:59 - ERROR - stderr - +2025-02-06 01:01:59 - ERROR - stderr - +2025-02-06 01:01:59 - INFO - stdout - {'loss': 0.4169, 'grad_norm': 1.4986268281936646, 'learning_rate': 3.487971523099768e-06, 'epoch': 2.2} +2025-02-06 01:01:59 - ERROR - stderr - 73%|███████▎ | 16466/22434 [14:54:18<4:13:21, 2.55s/it] +2025-02-06 01:02:01 - ERROR - stderr - 73%|███████▎ | 16467/22434 [14:54:21<4:12:30, 2.54s/it] +2025-02-06 01:02:01 - ERROR - stderr - +2025-02-06 01:02:01 - ERROR - stderr - +2025-02-06 01:02:01 - INFO - stdout - {'loss': 0.3921, 'grad_norm': 1.5082272291183472, 'learning_rate': 3.486875926293284e-06, 'epoch': 2.2} +2025-02-06 01:02:01 - ERROR - stderr - 73%|███████▎ | 16467/22434 [14:54:21<4:12:30, 2.54s/it] +2025-02-06 01:02:03 - ERROR - stderr - 73%|███████▎ 
| 16468/22434 [14:54:23<4:10:01, 2.51s/it] +2025-02-06 01:02:04 - ERROR - stderr - +2025-02-06 01:02:04 - ERROR - stderr - +2025-02-06 01:02:04 - INFO - stdout - {'loss': 0.3922, 'grad_norm': 1.5096263885498047, 'learning_rate': 3.4857804652466466e-06, 'epoch': 2.2} +2025-02-06 01:02:04 - ERROR - stderr - 73%|███████▎ | 16468/22434 [14:54:23<4:10:01, 2.51s/it] +2025-02-06 01:02:06 - ERROR - stderr - 73%|███████▎ | 16469/22434 [14:54:26<4:08:28, 2.50s/it] +2025-02-06 01:02:06 - ERROR - stderr - +2025-02-06 01:02:06 - ERROR - stderr - +2025-02-06 01:02:06 - INFO - stdout - {'loss': 0.3467, 'grad_norm': 1.3791091442108154, 'learning_rate': 3.4846851399826788e-06, 'epoch': 2.2} +2025-02-06 01:02:06 - ERROR - stderr - 73%|███████▎ | 16469/22434 [14:54:26<4:08:28, 2.50s/it] +2025-02-06 01:02:08 - ERROR - stderr - 73%|███████▎ | 16470/22434 [14:54:28<4:09:11, 2.51s/it] +2025-02-06 01:02:09 - ERROR - stderr - +2025-02-06 01:02:09 - ERROR - stderr - +2025-02-06 01:02:09 - INFO - stdout - {'loss': 0.381, 'grad_norm': 1.6764763593673706, 'learning_rate': 3.483589950524213e-06, 'epoch': 2.2} +2025-02-06 01:02:09 - ERROR - stderr - 73%|███████▎ | 16470/22434 [14:54:28<4:09:11, 2.51s/it] +2025-02-06 01:02:11 - ERROR - stderr - 73%|███████▎ | 16471/22434 [14:54:31<4:10:35, 2.52s/it] +2025-02-06 01:02:11 - ERROR - stderr - +2025-02-06 01:02:11 - ERROR - stderr - +2025-02-06 01:02:11 - INFO - stdout - {'loss': 0.408, 'grad_norm': 1.7501657009124756, 'learning_rate': 3.4824948968940808e-06, 'epoch': 2.2} +2025-02-06 01:02:11 - ERROR - stderr - 73%|███████▎ | 16471/22434 [14:54:31<4:10:35, 2.52s/it] +2025-02-06 01:02:14 - ERROR - stderr - 73%|███████▎ | 16472/22434 [14:54:33<4:10:20, 2.52s/it] +2025-02-06 01:02:14 - ERROR - stderr - +2025-02-06 01:02:14 - ERROR - stderr - +2025-02-06 01:02:14 - INFO - stdout - {'loss': 0.3914, 'grad_norm': 1.4173920154571533, 'learning_rate': 3.4813999791151065e-06, 'epoch': 2.2} +2025-02-06 01:02:14 - ERROR - stderr - 73%|███████▎ | 16472/22434 
[14:54:33<4:10:20, 2.52s/it] +2025-02-06 01:02:16 - ERROR - stderr - 73%|███████▎ | 16473/22434 [14:54:36<4:11:02, 2.53s/it] +2025-02-06 01:02:16 - ERROR - stderr - +2025-02-06 01:02:16 - ERROR - stderr - +2025-02-06 01:02:16 - INFO - stdout - {'loss': 0.4193, 'grad_norm': 1.6346664428710938, 'learning_rate': 3.480305197210111e-06, 'epoch': 2.2} +2025-02-06 01:02:16 - ERROR - stderr - 73%|███████▎ | 16473/22434 [14:54:36<4:11:02, 2.53s/it] +2025-02-06 01:02:19 - ERROR - stderr - 73%|███████▎ | 16474/22434 [14:54:38<4:10:03, 2.52s/it] +2025-02-06 01:02:19 - ERROR - stderr - +2025-02-06 01:02:19 - ERROR - stderr - +2025-02-06 01:02:19 - INFO - stdout - {'loss': 0.3794, 'grad_norm': 1.500555396080017, 'learning_rate': 3.4792105512019148e-06, 'epoch': 2.2} +2025-02-06 01:02:19 - ERROR - stderr - 73%|███████▎ | 16474/22434 [14:54:38<4:10:03, 2.52s/it] +2025-02-06 01:02:21 - ERROR - stderr - 73%|███████▎ | 16475/22434 [14:54:41<4:11:34, 2.53s/it] +2025-02-06 01:02:21 - ERROR - stderr - +2025-02-06 01:02:21 - ERROR - stderr - +2025-02-06 01:02:21 - INFO - stdout - {'loss': 0.3572, 'grad_norm': 1.4445401430130005, 'learning_rate': 3.4781160411133354e-06, 'epoch': 2.2} +2025-02-06 01:02:21 - ERROR - stderr - 73%|███████▎ | 16475/22434 [14:54:41<4:11:34, 2.53s/it] +2025-02-06 01:02:24 - ERROR - stderr - 73%|███████▎ | 16476/22434 [14:54:44<4:14:09, 2.56s/it] +2025-02-06 01:02:24 - ERROR - stderr - +2025-02-06 01:02:24 - ERROR - stderr - +2025-02-06 01:02:24 - INFO - stdout - {'loss': 0.348, 'grad_norm': 1.4952434301376343, 'learning_rate': 3.477021666967186e-06, 'epoch': 2.2} +2025-02-06 01:02:24 - ERROR - stderr - 73%|███████▎ | 16476/22434 [14:54:44<4:14:09, 2.56s/it] +2025-02-06 01:02:26 - ERROR - stderr - 73%|███████▎ | 16477/22434 [14:54:46<4:11:33, 2.53s/it] +2025-02-06 01:02:26 - ERROR - stderr - +2025-02-06 01:02:26 - ERROR - stderr - +2025-02-06 01:02:26 - INFO - stdout - {'loss': 0.3446, 'grad_norm': 1.6164652109146118, 'learning_rate': 3.475927428786281e-06, 
'epoch': 2.2} +2025-02-06 01:02:26 - ERROR - stderr - 73%|███████▎ | 16477/22434 [14:54:46<4:11:33, 2.53s/it] +2025-02-06 01:02:29 - ERROR - stderr - 73%|███████▎ | 16478/22434 [14:54:49<4:10:58, 2.53s/it] +2025-02-06 01:02:29 - ERROR - stderr - +2025-02-06 01:02:29 - ERROR - stderr - +2025-02-06 01:02:29 - INFO - stdout - {'loss': 0.4037, 'grad_norm': 1.5771899223327637, 'learning_rate': 3.474833326593421e-06, 'epoch': 2.2} +2025-02-06 01:02:29 - ERROR - stderr - 73%|███████▎ | 16478/22434 [14:54:49<4:10:58, 2.53s/it] +2025-02-06 01:02:31 - ERROR - stderr - 73%|███████▎ | 16479/22434 [14:54:51<4:12:39, 2.55s/it] +2025-02-06 01:02:31 - ERROR - stderr - +2025-02-06 01:02:31 - ERROR - stderr - +2025-02-06 01:02:31 - INFO - stdout - {'loss': 0.3781, 'grad_norm': 1.3571815490722656, 'learning_rate': 3.473739360411418e-06, 'epoch': 2.2} +2025-02-06 01:02:31 - ERROR - stderr - 73%|███████▎ | 16479/22434 [14:54:51<4:12:39, 2.55s/it] +2025-02-06 01:02:34 - ERROR - stderr - 73%|███████▎ | 16480/22434 [14:54:54<4:13:30, 2.55s/it] +2025-02-06 01:02:34 - ERROR - stderr - +2025-02-06 01:02:34 - ERROR - stderr - +2025-02-06 01:02:34 - INFO - stdout - {'loss': 0.39, 'grad_norm': 1.5566506385803223, 'learning_rate': 3.4726455302630768e-06, 'epoch': 2.2} +2025-02-06 01:02:34 - ERROR - stderr - 73%|███████▎ | 16480/22434 [14:54:54<4:13:30, 2.55s/it] +2025-02-06 01:02:36 - ERROR - stderr - 73%|███████▎ | 16481/22434 [14:54:56<4:10:18, 2.52s/it] +2025-02-06 01:02:36 - ERROR - stderr - +2025-02-06 01:02:36 - ERROR - stderr - +2025-02-06 01:02:36 - INFO - stdout - {'loss': 0.3966, 'grad_norm': 1.6009420156478882, 'learning_rate': 3.4715518361711876e-06, 'epoch': 2.2} +2025-02-06 01:02:36 - ERROR - stderr - 73%|███████▎ | 16481/22434 [14:54:56<4:10:18, 2.52s/it] +2025-02-06 01:02:39 - ERROR - stderr - 73%|███████▎ | 16482/22434 [14:54:59<4:10:55, 2.53s/it] +2025-02-06 01:02:39 - ERROR - stderr - +2025-02-06 01:02:39 - ERROR - stderr - +2025-02-06 01:02:39 - INFO - stdout - {'loss': 
0.396, 'grad_norm': 1.5089601278305054, 'learning_rate': 3.4704582781585596e-06, 'epoch': 2.2} +2025-02-06 01:02:39 - ERROR - stderr - 73%|███████▎ | 16482/22434 [14:54:59<4:10:55, 2.53s/it] +2025-02-06 01:02:41 - ERROR - stderr - 73%|███████▎ | 16483/22434 [14:55:01<4:11:22, 2.53s/it] +2025-02-06 01:02:42 - ERROR - stderr - +2025-02-06 01:02:42 - ERROR - stderr - +2025-02-06 01:02:42 - INFO - stdout - {'loss': 0.3674, 'grad_norm': 1.5041552782058716, 'learning_rate': 3.4693648562479733e-06, 'epoch': 2.2} +2025-02-06 01:02:42 - ERROR - stderr - 73%|███████▎ | 16483/22434 [14:55:01<4:11:22, 2.53s/it] +2025-02-06 01:02:44 - ERROR - stderr - 73%|███████▎ | 16484/22434 [14:55:04<4:10:14, 2.52s/it] +2025-02-06 01:02:44 - ERROR - stderr - +2025-02-06 01:02:44 - ERROR - stderr - +2025-02-06 01:02:44 - INFO - stdout - {'loss': 0.3397, 'grad_norm': 1.5039492845535278, 'learning_rate': 3.468271570462235e-06, 'epoch': 2.2} +2025-02-06 01:02:44 - ERROR - stderr - 73%|███████▎ | 16484/22434 [14:55:04<4:10:14, 2.52s/it] +2025-02-06 01:02:46 - ERROR - stderr - 73%|███████▎ | 16485/22434 [14:55:06<4:08:33, 2.51s/it] +2025-02-06 01:02:46 - ERROR - stderr - +2025-02-06 01:02:46 - ERROR - stderr - +2025-02-06 01:02:46 - INFO - stdout - {'loss': 0.4163, 'grad_norm': 1.5863938331604004, 'learning_rate': 3.467178420824122e-06, 'epoch': 2.2} +2025-02-06 01:02:46 - ERROR - stderr - 73%|███████▎ | 16485/22434 [14:55:06<4:08:33, 2.51s/it] +2025-02-06 01:02:49 - ERROR - stderr - 73%|███████▎ | 16486/22434 [14:55:09<4:09:10, 2.51s/it] +2025-02-06 01:02:49 - ERROR - stderr - +2025-02-06 01:02:49 - ERROR - stderr - +2025-02-06 01:02:49 - INFO - stdout - {'loss': 0.3812, 'grad_norm': 1.6247525215148926, 'learning_rate': 3.46608540735642e-06, 'epoch': 2.2} +2025-02-06 01:02:49 - ERROR - stderr - 73%|███████▎ | 16486/22434 [14:55:09<4:09:10, 2.51s/it] +2025-02-06 01:02:51 - ERROR - stderr - 73%|███████▎ | 16487/22434 [14:55:11<4:08:49, 2.51s/it] +2025-02-06 01:02:52 - ERROR - stderr - +2025-02-06 
01:02:52 - ERROR - stderr - +2025-02-06 01:02:52 - INFO - stdout - {'loss': 0.4138, 'grad_norm': 1.5923283100128174, 'learning_rate': 3.464992530081922e-06, 'epoch': 2.2} +2025-02-06 01:02:52 - ERROR - stderr - 73%|███████▎ | 16487/22434 [14:55:11<4:08:49, 2.51s/it] +2025-02-06 01:02:54 - ERROR - stderr - 73%|███████▎ | 16488/22434 [14:55:14<4:06:51, 2.49s/it] +2025-02-06 01:02:54 - ERROR - stderr - +2025-02-06 01:02:54 - ERROR - stderr - +2025-02-06 01:02:54 - INFO - stdout - {'loss': 0.3805, 'grad_norm': 1.4704937934875488, 'learning_rate': 3.463899789023395e-06, 'epoch': 2.2} +2025-02-06 01:02:54 - ERROR - stderr - 73%|███████▎ | 16488/22434 [14:55:14<4:06:51, 2.49s/it] +2025-02-06 01:02:56 - ERROR - stderr - 74%|███████▎ | 16489/22434 [14:55:16<4:07:44, 2.50s/it] +2025-02-06 01:02:56 - ERROR - stderr - +2025-02-06 01:02:56 - ERROR - stderr - +2025-02-06 01:02:56 - INFO - stdout - {'loss': 0.3959, 'grad_norm': 1.522513747215271, 'learning_rate': 3.462807184203629e-06, 'epoch': 2.21} +2025-02-06 01:02:56 - ERROR - stderr - 74%|███████▎ | 16489/22434 [14:55:16<4:07:44, 2.50s/it] +2025-02-06 01:02:59 - ERROR - stderr - 74%|███████▎ | 16490/22434 [14:55:19<4:08:46, 2.51s/it] +2025-02-06 01:02:59 - ERROR - stderr - +2025-02-06 01:02:59 - ERROR - stderr - +2025-02-06 01:02:59 - INFO - stdout - {'loss': 0.4472, 'grad_norm': 1.5773521661758423, 'learning_rate': 3.461714715645389e-06, 'epoch': 2.21} +2025-02-06 01:02:59 - ERROR - stderr - 74%|███████▎ | 16490/22434 [14:55:19<4:08:46, 2.51s/it] +2025-02-06 01:03:01 - ERROR - stderr - 74%|███████▎ | 16491/22434 [14:55:21<4:08:53, 2.51s/it] +2025-02-06 01:03:02 - ERROR - stderr - +2025-02-06 01:03:02 - ERROR - stderr - +2025-02-06 01:03:02 - INFO - stdout - {'loss': 0.3235, 'grad_norm': 1.338305115699768, 'learning_rate': 3.4606223833714493e-06, 'epoch': 2.21} +2025-02-06 01:03:02 - ERROR - stderr - 74%|███████▎ | 16491/22434 [14:55:21<4:08:53, 2.51s/it] +2025-02-06 01:03:04 - ERROR - stderr - 74%|███████▎ | 16492/22434 
[14:55:24<4:06:21, 2.49s/it] +2025-02-06 01:03:04 - ERROR - stderr - +2025-02-06 01:03:04 - ERROR - stderr - +2025-02-06 01:03:04 - INFO - stdout - {'loss': 0.3748, 'grad_norm': 1.6059221029281616, 'learning_rate': 3.4595301874045785e-06, 'epoch': 2.21} +2025-02-06 01:03:04 - ERROR - stderr - 74%|███████▎ | 16492/22434 [14:55:24<4:06:21, 2.49s/it] +2025-02-06 01:03:07 - ERROR - stderr - 74%|███████▎ | 16493/22434 [14:55:27<4:20:38, 2.63s/it] +2025-02-06 01:03:07 - ERROR - stderr - +2025-02-06 01:03:07 - ERROR - stderr - +2025-02-06 01:03:07 - INFO - stdout - {'loss': 0.431, 'grad_norm': 1.7253587245941162, 'learning_rate': 3.4584381277675416e-06, 'epoch': 2.21} +2025-02-06 01:03:07 - ERROR - stderr - 74%|███████▎ | 16493/22434 [14:55:27<4:20:38, 2.63s/it] +2025-02-06 01:03:09 - ERROR - stderr - 74%|███████▎ | 16494/22434 [14:55:29<4:18:29, 2.61s/it] +2025-02-06 01:03:09 - ERROR - stderr - +2025-02-06 01:03:09 - ERROR - stderr - +2025-02-06 01:03:09 - INFO - stdout - {'loss': 0.4039, 'grad_norm': 1.5903260707855225, 'learning_rate': 3.457346204483103e-06, 'epoch': 2.21} +2025-02-06 01:03:09 - ERROR - stderr - 74%|███████▎ | 16494/22434 [14:55:29<4:18:29, 2.61s/it] +2025-02-06 01:03:12 - ERROR - stderr - 74%|███████▎ | 16495/22434 [14:55:32<4:14:52, 2.57s/it] +2025-02-06 01:03:12 - ERROR - stderr - +2025-02-06 01:03:12 - ERROR - stderr - +2025-02-06 01:03:12 - INFO - stdout - {'loss': 0.4106, 'grad_norm': 1.5016887187957764, 'learning_rate': 3.456254417574022e-06, 'epoch': 2.21} +2025-02-06 01:03:12 - ERROR - stderr - 74%|███████▎ | 16495/22434 [14:55:32<4:14:52, 2.57s/it] +2025-02-06 01:03:14 - ERROR - stderr - 74%|███████▎ | 16496/22434 [14:55:34<4:13:11, 2.56s/it] +2025-02-06 01:03:15 - ERROR - stderr - +2025-02-06 01:03:15 - ERROR - stderr - +2025-02-06 01:03:15 - INFO - stdout - {'loss': 0.3116, 'grad_norm': 1.417677879333496, 'learning_rate': 3.4551627670630562e-06, 'epoch': 2.21} +2025-02-06 01:03:15 - ERROR - stderr - 74%|███████▎ | 16496/22434 
[14:55:34<4:13:11, 2.56s/it] +2025-02-06 01:03:17 - ERROR - stderr - 74%|███████▎ | 16497/22434 [14:55:37<4:10:49, 2.53s/it] +2025-02-06 01:03:17 - ERROR - stderr - +2025-02-06 01:03:17 - ERROR - stderr - +2025-02-06 01:03:17 - INFO - stdout - {'loss': 0.3915, 'grad_norm': 1.532374382019043, 'learning_rate': 3.4540712529729592e-06, 'epoch': 2.21} +2025-02-06 01:03:17 - ERROR - stderr - 74%|███████▎ | 16497/22434 [14:55:37<4:10:49, 2.53s/it] +2025-02-06 01:03:19 - ERROR - stderr - 74%|███████▎ | 16498/22434 [14:55:39<4:10:48, 2.54s/it] +2025-02-06 01:03:20 - ERROR - stderr - +2025-02-06 01:03:20 - ERROR - stderr - +2025-02-06 01:03:20 - INFO - stdout - {'loss': 0.3449, 'grad_norm': 1.4879239797592163, 'learning_rate': 3.452979875326483e-06, 'epoch': 2.21} +2025-02-06 01:03:20 - ERROR - stderr - 74%|███████▎ | 16498/22434 [14:55:39<4:10:48, 2.54s/it] +2025-02-06 01:03:22 - ERROR - stderr - 74%|███████▎ | 16499/22434 [14:55:42<4:09:32, 2.52s/it] +2025-02-06 01:03:22 - ERROR - stderr - +2025-02-06 01:03:22 - ERROR - stderr - +2025-02-06 01:03:22 - INFO - stdout - {'loss': 0.3601, 'grad_norm': 1.3327503204345703, 'learning_rate': 3.4518886341463775e-06, 'epoch': 2.21} +2025-02-06 01:03:22 - ERROR - stderr - 74%|███████▎ | 16499/22434 [14:55:42<4:09:32, 2.52s/it] +2025-02-06 01:03:24 - ERROR - stderr - 74%|███████▎ | 16500/22434 [14:55:44<4:07:23, 2.50s/it] +2025-02-06 01:03:24 - ERROR - stderr - +2025-02-06 01:03:24 - ERROR - stderr - +2025-02-06 01:03:24 - INFO - stdout - {'loss': 0.3298, 'grad_norm': 1.380768060684204, 'learning_rate': 3.4507975294553877e-06, 'epoch': 2.21} +2025-02-06 01:03:24 - ERROR - stderr - 74%|███████▎ | 16500/22434 [14:55:44<4:07:23, 2.50s/it] +2025-02-06 01:03:27 - ERROR - stderr - 74%|███████▎ | 16501/22434 [14:55:47<4:05:14, 2.48s/it] +2025-02-06 01:03:27 - ERROR - stderr - +2025-02-06 01:03:27 - ERROR - stderr - +2025-02-06 01:03:27 - INFO - stdout - {'loss': 0.3999, 'grad_norm': 1.4507542848587036, 'learning_rate': 3.449706561276259e-06,
'epoch': 2.21} +2025-02-06 01:03:27 - ERROR - stderr - 74%|███████▎ | 16501/22434 [14:55:47<4:05:14, 2.48s/it] +2025-02-06 01:03:29 - ERROR - stderr - 74%|███████▎ | 16502/22434 [14:55:49<4:05:20, 2.48s/it] +2025-02-06 01:03:29 - ERROR - stderr - +2025-02-06 01:03:29 - ERROR - stderr - +2025-02-06 01:03:29 - INFO - stdout - {'loss': 0.366, 'grad_norm': 1.4084267616271973, 'learning_rate': 3.4486157296317224e-06, 'epoch': 2.21} +2025-02-06 01:03:29 - ERROR - stderr - 74%|███████▎ | 16502/22434 [14:55:49<4:05:20, 2.48s/it] +2025-02-06 01:03:32 - ERROR - stderr - 74%|███████▎ | 16503/22434 [14:55:52<4:05:46, 2.49s/it] +2025-02-06 01:03:32 - ERROR - stderr - +2025-02-06 01:03:32 - ERROR - stderr - +2025-02-06 01:03:32 - INFO - stdout - {'loss': 0.3654, 'grad_norm': 1.5734275579452515, 'learning_rate': 3.4475250345445287e-06, 'epoch': 2.21} +2025-02-06 01:03:32 - ERROR - stderr - 74%|███████▎ | 16503/22434 [14:55:52<4:05:46, 2.49s/it] +2025-02-06 01:03:34 - ERROR - stderr - 74%|███████▎ | 16504/22434 [14:55:54<4:05:25, 2.48s/it] +2025-02-06 01:03:34 - ERROR - stderr - +2025-02-06 01:03:34 - ERROR - stderr - +2025-02-06 01:03:34 - INFO - stdout - {'loss': 0.3861, 'grad_norm': 1.7202084064483643, 'learning_rate': 3.446434476037399e-06, 'epoch': 2.21} +2025-02-06 01:03:34 - ERROR - stderr - 74%|███████▎ | 16504/22434 [14:55:54<4:05:25, 2.48s/it] +2025-02-06 01:03:37 - ERROR - stderr - 74%|███████▎ | 16505/22434 [14:55:57<4:07:48, 2.51s/it] +2025-02-06 01:03:37 - ERROR - stderr - +2025-02-06 01:03:37 - ERROR - stderr - +2025-02-06 01:03:37 - INFO - stdout - {'loss': 0.4055, 'grad_norm': 1.573837161064148, 'learning_rate': 3.445344054133075e-06, 'epoch': 2.21} +2025-02-06 01:03:37 - ERROR - stderr - 74%|███████▎ | 16505/22434 [14:55:57<4:07:48, 2.51s/it] +2025-02-06 01:03:39 - ERROR - stderr - 74%|███████▎ | 16506/22434 [14:55:59<4:06:15, 2.49s/it] +2025-02-06 01:03:39 - ERROR - stderr - +2025-02-06 01:03:39 - ERROR - stderr - +2025-02-06 01:03:39 - INFO - stdout - {'loss': 
0.3463, 'grad_norm': 1.4000096321105957, 'learning_rate': 3.4442537688542855e-06, 'epoch': 2.21} +2025-02-06 01:03:39 - ERROR - stderr - 74%|███████▎ | 16506/22434 [14:55:59<4:06:15, 2.49s/it] +2025-02-06 01:03:42 - ERROR - stderr - 74%|███████▎ | 16507/22434 [14:56:02<4:06:02, 2.49s/it] +2025-02-06 01:03:42 - ERROR - stderr - +2025-02-06 01:03:42 - ERROR - stderr - +2025-02-06 01:03:42 - INFO - stdout - {'loss': 0.371, 'grad_norm': 1.6614608764648438, 'learning_rate': 3.4431636202237464e-06, 'epoch': 2.21} +2025-02-06 01:03:42 - ERROR - stderr - 74%|███████▎ | 16507/22434 [14:56:02<4:06:02, 2.49s/it] +2025-02-06 01:03:44 - ERROR - stderr - 74%|███████▎ | 16508/22434 [14:56:04<4:07:30, 2.51s/it] +2025-02-06 01:03:44 - ERROR - stderr - +2025-02-06 01:03:44 - ERROR - stderr - +2025-02-06 01:03:44 - INFO - stdout - {'loss': 0.4082, 'grad_norm': 1.585769534111023, 'learning_rate': 3.442073608264194e-06, 'epoch': 2.21} +2025-02-06 01:03:44 - ERROR - stderr - 74%|███████▎ | 16508/22434 [14:56:04<4:07:30, 2.51s/it] +2025-02-06 01:03:47 - ERROR - stderr - 74%|███████▎ | 16509/22434 [14:56:07<4:07:46, 2.51s/it] +2025-02-06 01:03:47 - ERROR - stderr - +2025-02-06 01:03:47 - ERROR - stderr - +2025-02-06 01:03:47 - INFO - stdout - {'loss': 0.3764, 'grad_norm': 1.5032787322998047, 'learning_rate': 3.4409837329983376e-06, 'epoch': 2.21} +2025-02-06 01:03:47 - ERROR - stderr - 74%|███████▎ | 16509/22434 [14:56:07<4:07:46, 2.51s/it] +2025-02-06 01:03:49 - ERROR - stderr - 74%|███████▎ | 16510/22434 [14:56:09<4:07:17, 2.50s/it] +2025-02-06 01:03:49 - ERROR - stderr - +2025-02-06 01:03:49 - ERROR - stderr - +2025-02-06 01:03:49 - INFO - stdout - {'loss': 0.3537, 'grad_norm': 1.4769421815872192, 'learning_rate': 3.4398939944488994e-06, 'epoch': 2.21} +2025-02-06 01:03:49 - ERROR - stderr - 74%|███████▎ | 16510/22434 [14:56:09<4:07:17, 2.50s/it] +2025-02-06 01:03:52 - ERROR - stderr - 74%|███████▎ | 16511/22434 [14:56:12<4:06:12, 2.49s/it] +2025-02-06 01:03:52 - ERROR - stderr - 
+2025-02-06 01:03:52 - ERROR - stderr - +2025-02-06 01:03:52 - INFO - stdout - {'loss': 0.4041, 'grad_norm': 1.464247226715088, 'learning_rate': 3.438804392638595e-06, 'epoch': 2.21} +2025-02-06 01:03:52 - ERROR - stderr - 74%|███████▎ | 16511/22434 [14:56:12<4:06:12, 2.49s/it] +2025-02-06 01:03:55 - ERROR - stderr - 74%|███████▎ | 16512/22434 [14:56:14<4:16:02, 2.59s/it] +2025-02-06 01:03:55 - ERROR - stderr - +2025-02-06 01:03:55 - ERROR - stderr - +2025-02-06 01:03:55 - INFO - stdout - {'loss': 0.3487, 'grad_norm': 1.4186909198760986, 'learning_rate': 3.43771492759013e-06, 'epoch': 2.21} +2025-02-06 01:03:55 - ERROR - stderr - 74%|███████▎ | 16512/22434 [14:56:14<4:16:02, 2.59s/it] +2025-02-06 01:03:57 - ERROR - stderr - 74%|███████▎ | 16513/22434 [14:56:17<4:13:14, 2.57s/it] +2025-02-06 01:03:57 - ERROR - stderr - +2025-02-06 01:03:57 - ERROR - stderr - +2025-02-06 01:03:57 - INFO - stdout - {'loss': 0.3567, 'grad_norm': 1.361175775527954, 'learning_rate': 3.4366255993262255e-06, 'epoch': 2.21} +2025-02-06 01:03:57 - ERROR - stderr - 74%|███████▎ | 16513/22434 [14:56:17<4:13:14, 2.57s/it] +2025-02-06 01:04:00 - ERROR - stderr - 74%|███████▎ | 16514/22434 [14:56:19<4:10:13, 2.54s/it] +2025-02-06 01:04:00 - ERROR - stderr - +2025-02-06 01:04:00 - ERROR - stderr - +2025-02-06 01:04:00 - INFO - stdout - {'loss': 0.3775, 'grad_norm': 1.567903757095337, 'learning_rate': 3.435536407869575e-06, 'epoch': 2.21} +2025-02-06 01:04:00 - ERROR - stderr - 74%|███████▎ | 16514/22434 [14:56:19<4:10:13, 2.54s/it] +2025-02-06 01:04:02 - ERROR - stderr - 74%|███████▎ | 16515/22434 [14:56:22<4:09:20, 2.53s/it] +2025-02-06 01:04:02 - ERROR - stderr - +2025-02-06 01:04:02 - ERROR - stderr - +2025-02-06 01:04:02 - INFO - stdout - {'loss': 0.4244, 'grad_norm': 1.8031455278396606, 'learning_rate': 3.434447353242888e-06, 'epoch': 2.21} +2025-02-06 01:04:02 - ERROR - stderr - 74%|███████▎ | 16515/22434 [14:56:22<4:09:20, 2.53s/it] +2025-02-06 01:04:05 - ERROR - stderr - 74%|███████▎ | 
16516/22434 [14:56:24<4:09:11, 2.53s/it] +2025-02-06 01:04:05 - ERROR - stderr - +2025-02-06 01:04:05 - ERROR - stderr - +2025-02-06 01:04:05 - INFO - stdout - {'loss': 0.3696, 'grad_norm': 1.3213262557983398, 'learning_rate': 3.4333584354688634e-06, 'epoch': 2.21} +2025-02-06 01:04:05 - ERROR - stderr - 74%|███████▎ | 16516/22434 [14:56:24<4:09:11, 2.53s/it] +2025-02-06 01:04:07 - ERROR - stderr - 74%|███████▎ | 16517/22434 [14:56:27<4:08:27, 2.52s/it] +2025-02-06 01:04:07 - ERROR - stderr - +2025-02-06 01:04:07 - ERROR - stderr - +2025-02-06 01:04:07 - INFO - stdout - {'loss': 0.3741, 'grad_norm': 1.626657247543335, 'learning_rate': 3.4322696545701984e-06, 'epoch': 2.21} +2025-02-06 01:04:07 - ERROR - stderr - 74%|███████▎ | 16517/22434 [14:56:27<4:08:27, 2.52s/it] +2025-02-06 01:04:10 - ERROR - stderr - 74%|███████▎ | 16518/22434 [14:56:29<4:07:33, 2.51s/it] +2025-02-06 01:04:10 - ERROR - stderr - +2025-02-06 01:04:10 - ERROR - stderr - +2025-02-06 01:04:10 - INFO - stdout - {'loss': 0.3485, 'grad_norm': 1.4158945083618164, 'learning_rate': 3.4311810105695875e-06, 'epoch': 2.21} +2025-02-06 01:04:10 - ERROR - stderr - 74%|███████▎ | 16518/22434 [14:56:29<4:07:33, 2.51s/it] +2025-02-06 01:04:12 - ERROR - stderr - 74%|███████▎ | 16519/22434 [14:56:32<4:08:36, 2.52s/it] +2025-02-06 01:04:12 - ERROR - stderr - +2025-02-06 01:04:12 - ERROR - stderr - +2025-02-06 01:04:12 - INFO - stdout - {'loss': 0.3758, 'grad_norm': 1.5091612339019775, 'learning_rate': 3.4300925034897227e-06, 'epoch': 2.21} +2025-02-06 01:04:12 - ERROR - stderr - 74%|███████▎ | 16519/22434 [14:56:32<4:08:36, 2.52s/it] +2025-02-06 01:04:15 - ERROR - stderr - 74%|███████▎ | 16520/22434 [14:56:35<4:08:47, 2.52s/it] +2025-02-06 01:04:15 - ERROR - stderr - +2025-02-06 01:04:15 - ERROR - stderr - +2025-02-06 01:04:15 - INFO - stdout - {'loss': 0.4021, 'grad_norm': 1.700692057609558, 'learning_rate': 3.429004133353293e-06, 'epoch': 2.21} +2025-02-06 01:04:15 - ERROR - stderr - 74%|███████▎ | 16520/22434 
[14:56:35<4:08:47, 2.52s/it] +2025-02-06 01:04:18 - ERROR - stderr - 74%|███████▎ | 16521/22434 [14:56:37<4:21:21, 2.65s/it] +2025-02-06 01:04:18 - ERROR - stderr - +2025-02-06 01:04:18 - ERROR - stderr - +2025-02-06 01:04:18 - INFO - stdout - {'loss': 0.382, 'grad_norm': 1.6729439496994019, 'learning_rate': 3.4279159001829844e-06, 'epoch': 2.21} +2025-02-06 01:04:18 - ERROR - stderr - 74%|███████▎ | 16521/22434 [14:56:38<4:21:21, 2.65s/it] +2025-02-06 01:04:20 - ERROR - stderr - 74%|███████▎ | 16522/22434 [14:56:40<4:15:16, 2.59s/it] +2025-02-06 01:04:20 - ERROR - stderr - +2025-02-06 01:04:20 - ERROR - stderr - +2025-02-06 01:04:20 - INFO - stdout - {'loss': 0.3756, 'grad_norm': 1.545614242553711, 'learning_rate': 3.4268278040014836e-06, 'epoch': 2.21} +2025-02-06 01:04:20 - ERROR - stderr - 74%|███████▎ | 16522/22434 [14:56:40<4:15:16, 2.59s/it] +2025-02-06 01:04:23 - ERROR - stderr - 74%|███████▎ | 16523/22434 [14:56:42<4:13:19, 2.57s/it] +2025-02-06 01:04:23 - ERROR - stderr - +2025-02-06 01:04:23 - ERROR - stderr - +2025-02-06 01:04:23 - INFO - stdout - {'loss': 0.3428, 'grad_norm': 1.4382662773132324, 'learning_rate': 3.4257398448314604e-06, 'epoch': 2.21} +2025-02-06 01:04:23 - ERROR - stderr - 74%|███████▎ | 16523/22434 [14:56:42<4:13:19, 2.57s/it] +2025-02-06 01:04:25 - ERROR - stderr - 74%|███████▎ | 16524/22434 [14:56:45<4:11:27, 2.55s/it] +2025-02-06 01:04:25 - ERROR - stderr - +2025-02-06 01:04:25 - ERROR - stderr - +2025-02-06 01:04:25 - INFO - stdout - {'loss': 0.3619, 'grad_norm': 1.5453675985336304, 'learning_rate': 3.4246520226956028e-06, 'epoch': 2.21} +2025-02-06 01:04:25 - ERROR - stderr - 74%|███████▎ | 16524/22434 [14:56:45<4:11:27, 2.55s/it] +2025-02-06 01:04:28 - ERROR - stderr - 74%|███████▎ | 16525/22434 [14:56:47<4:10:04, 2.54s/it] +2025-02-06 01:04:28 - ERROR - stderr - +2025-02-06 01:04:28 - ERROR - stderr - +2025-02-06 01:04:28 - INFO - stdout - {'loss': 0.444, 'grad_norm': 1.7147769927978516, 'learning_rate': 3.423564337616585e-06, 
'epoch': 2.21} +2025-02-06 01:04:28 - ERROR - stderr - 74%|███████▎ | 16525/22434 [14:56:47<4:10:04, 2.54s/it] +2025-02-06 01:04:31 - ERROR - stderr - 74%|███████▎ | 16526/22434 [14:56:50<4:22:59, 2.67s/it] +2025-02-06 01:04:31 - ERROR - stderr - +2025-02-06 01:04:31 - ERROR - stderr - +2025-02-06 01:04:31 - INFO - stdout - {'loss': 0.3541, 'grad_norm': 1.5619629621505737, 'learning_rate': 3.4224767896170697e-06, 'epoch': 2.21} +2025-02-06 01:04:31 - ERROR - stderr - 74%|███████▎ | 16526/22434 [14:56:50<4:22:59, 2.67s/it] +2025-02-06 01:04:33 - ERROR - stderr - 74%|███████▎ | 16527/22434 [14:56:53<4:19:03, 2.63s/it] +2025-02-06 01:04:33 - ERROR - stderr - +2025-02-06 01:04:33 - ERROR - stderr - +2025-02-06 01:04:33 - INFO - stdout - {'loss': 0.3528, 'grad_norm': 1.6263844966888428, 'learning_rate': 3.4213893787197372e-06, 'epoch': 2.21} +2025-02-06 01:04:33 - ERROR - stderr - 74%|███████▎ | 16527/22434 [14:56:53<4:19:03, 2.63s/it] +2025-02-06 01:04:36 - ERROR - stderr - 74%|███████▎ | 16528/22434 [14:56:55<4:15:01, 2.59s/it] +2025-02-06 01:04:36 - ERROR - stderr - +2025-02-06 01:04:36 - ERROR - stderr - +2025-02-06 01:04:36 - INFO - stdout - {'loss': 0.4085, 'grad_norm': 1.637852430343628, 'learning_rate': 3.4203021049472417e-06, 'epoch': 2.21} +2025-02-06 01:04:36 - ERROR - stderr - 74%|███████▎ | 16528/22434 [14:56:56<4:15:01, 2.59s/it] +2025-02-06 01:04:38 - ERROR - stderr - 74%|███████▎ | 16529/22434 [14:56:58<4:13:09, 2.57s/it] +2025-02-06 01:04:38 - ERROR - stderr - +2025-02-06 01:04:38 - ERROR - stderr - +2025-02-06 01:04:38 - INFO - stdout - {'loss': 0.3488, 'grad_norm': 1.3829307556152344, 'learning_rate': 3.41921496832226e-06, 'epoch': 2.21} +2025-02-06 01:04:38 - ERROR - stderr - 74%|███████▎ | 16529/22434 [14:56:58<4:13:09, 2.57s/it] +2025-02-06 01:04:41 - ERROR - stderr - 74%|███████▎ | 16530/22434 [14:57:01<4:12:07, 2.56s/it] +2025-02-06 01:04:41 - ERROR - stderr - +2025-02-06 01:04:41 - ERROR - stderr - +2025-02-06 01:04:41 - INFO - stdout - {'loss': 
0.3945, 'grad_norm': 1.7452017068862915, 'learning_rate': 3.418127968867442e-06, 'epoch': 2.21} +2025-02-06 01:04:41 - ERROR - stderr - 74%|███████▎ | 16530/22434 [14:57:01<4:12:07, 2.56s/it] +2025-02-06 01:04:43 - ERROR - stderr - 74%|███████▎ | 16531/22434 [14:57:03<4:09:10, 2.53s/it] +2025-02-06 01:04:43 - ERROR - stderr - +2025-02-06 01:04:43 - ERROR - stderr - +2025-02-06 01:04:43 - INFO - stdout - {'loss': 0.4851, 'grad_norm': 1.7482801675796509, 'learning_rate': 3.4170411066054442e-06, 'epoch': 2.21} +2025-02-06 01:04:43 - ERROR - stderr - 74%|███████▎ | 16531/22434 [14:57:03<4:09:10, 2.53s/it] +2025-02-06 01:04:46 - ERROR - stderr - 74%|███████▎ | 16532/22434 [14:57:05<4:06:52, 2.51s/it] +2025-02-06 01:04:46 - ERROR - stderr - +2025-02-06 01:04:46 - ERROR - stderr - +2025-02-06 01:04:46 - INFO - stdout - {'loss': 0.3639, 'grad_norm': 1.4223763942718506, 'learning_rate': 3.4159543815589325e-06, 'epoch': 2.21} +2025-02-06 01:04:46 - ERROR - stderr - 74%|███████▎ | 16532/22434 [14:57:05<4:06:52, 2.51s/it] +2025-02-06 01:04:48 - ERROR - stderr - 74%|███████▎ | 16533/22434 [14:57:08<4:08:11, 2.52s/it] +2025-02-06 01:04:48 - ERROR - stderr - +2025-02-06 01:04:48 - ERROR - stderr - +2025-02-06 01:04:48 - INFO - stdout - {'loss': 0.3833, 'grad_norm': 1.6544294357299805, 'learning_rate': 3.414867793750547e-06, 'epoch': 2.21} +2025-02-06 01:04:48 - ERROR - stderr - 74%|███████▎ | 16533/22434 [14:57:08<4:08:11, 2.52s/it] +2025-02-06 01:04:51 - ERROR - stderr - 74%|███████▎ | 16534/22434 [14:57:11<4:15:35, 2.60s/it] +2025-02-06 01:04:51 - ERROR - stderr - +2025-02-06 01:04:51 - ERROR - stderr - +2025-02-06 01:04:51 - INFO - stdout - {'loss': 0.3906, 'grad_norm': 1.545331597328186, 'learning_rate': 3.413781343202942e-06, 'epoch': 2.21} +2025-02-06 01:04:51 - ERROR - stderr - 74%|███████▎ | 16534/22434 [14:57:11<4:15:35, 2.60s/it] +2025-02-06 01:04:53 - ERROR - stderr - 74%|███████▎ | 16535/22434 [14:57:13<4:10:22, 2.55s/it] +2025-02-06 01:04:53 - ERROR - stderr - 
+2025-02-06 01:04:53 - ERROR - stderr - +2025-02-06 01:04:53 - INFO - stdout - {'loss': 0.3987, 'grad_norm': 1.5762994289398193, 'learning_rate': 3.412695029938763e-06, 'epoch': 2.21} +2025-02-06 01:04:53 - ERROR - stderr - 74%|███████▎ | 16535/22434 [14:57:13<4:10:22, 2.55s/it] +2025-02-06 01:04:56 - ERROR - stderr - 74%|███████▎ | 16536/22434 [14:57:16<4:08:43, 2.53s/it] +2025-02-06 01:04:56 - ERROR - stderr - +2025-02-06 01:04:56 - ERROR - stderr - +2025-02-06 01:04:56 - INFO - stdout - {'loss': 0.3883, 'grad_norm': 1.5882729291915894, 'learning_rate': 3.4116088539806523e-06, 'epoch': 2.21} +2025-02-06 01:04:56 - ERROR - stderr - 74%|███████▎ | 16536/22434 [14:57:16<4:08:43, 2.53s/it] +2025-02-06 01:04:59 - ERROR - stderr - 74%|███████▎ | 16537/22434 [14:57:18<4:10:21, 2.55s/it] +2025-02-06 01:04:59 - ERROR - stderr - +2025-02-06 01:04:59 - ERROR - stderr - +2025-02-06 01:04:59 - INFO - stdout - {'loss': 0.3914, 'grad_norm': 1.5356632471084595, 'learning_rate': 3.4105228153512502e-06, 'epoch': 2.21} +2025-02-06 01:04:59 - ERROR - stderr - 74%|███████▎ | 16537/22434 [14:57:18<4:10:21, 2.55s/it] +2025-02-06 01:05:01 - ERROR - stderr - 74%|███████▎ | 16538/22434 [14:57:21<4:21:04, 2.66s/it] +2025-02-06 01:05:01 - ERROR - stderr - +2025-02-06 01:05:01 - ERROR - stderr - +2025-02-06 01:05:01 - INFO - stdout - {'loss': 0.3648, 'grad_norm': 1.633872628211975, 'learning_rate': 3.4094369140731953e-06, 'epoch': 2.21} +2025-02-06 01:05:01 - ERROR - stderr - 74%|███████▎ | 16538/22434 [14:57:21<4:21:04, 2.66s/it] +2025-02-06 01:05:04 - ERROR - stderr - 74%|███████▎ | 16539/22434 [14:57:24<4:22:26, 2.67s/it] +2025-02-06 01:05:04 - ERROR - stderr - +2025-02-06 01:05:04 - ERROR - stderr - +2025-02-06 01:05:04 - INFO - stdout - {'loss': 0.3819, 'grad_norm': 1.5918583869934082, 'learning_rate': 3.4083511501691214e-06, 'epoch': 2.21} +2025-02-06 01:05:04 - ERROR - stderr - 74%|███████▎ | 16539/22434 [14:57:24<4:22:26, 2.67s/it] +2025-02-06 01:05:07 - ERROR - stderr - 74%|███████▎ 
| 16540/22434 [14:57:27<4:24:19, 2.69s/it] +2025-02-06 01:05:07 - ERROR - stderr - +2025-02-06 01:05:07 - ERROR - stderr - +2025-02-06 01:05:07 - INFO - stdout - {'loss': 0.351, 'grad_norm': 1.4663565158843994, 'learning_rate': 3.4072655236616593e-06, 'epoch': 2.21} +2025-02-06 01:05:07 - ERROR - stderr - 74%|███████▎ | 16540/22434 [14:57:27<4:24:19, 2.69s/it] +2025-02-06 01:05:09 - ERROR - stderr - 74%|███████▎ | 16541/22434 [14:57:29<4:22:18, 2.67s/it] +2025-02-06 01:05:10 - ERROR - stderr - +2025-02-06 01:05:10 - ERROR - stderr - +2025-02-06 01:05:10 - INFO - stdout - {'loss': 0.3367, 'grad_norm': 1.2753360271453857, 'learning_rate': 3.406180034573443e-06, 'epoch': 2.21} +2025-02-06 01:05:10 - ERROR - stderr - 74%|███████▎ | 16541/22434 [14:57:29<4:22:18, 2.67s/it] +2025-02-06 01:05:12 - ERROR - stderr - 74%|███████▎ | 16542/22434 [14:57:32<4:18:43, 2.63s/it] +2025-02-06 01:05:12 - ERROR - stderr - +2025-02-06 01:05:12 - ERROR - stderr - +2025-02-06 01:05:12 - INFO - stdout - {'loss': 0.3939, 'grad_norm': 1.5873600244522095, 'learning_rate': 3.405094682927087e-06, 'epoch': 2.21} +2025-02-06 01:05:12 - ERROR - stderr - 74%|███████▎ | 16542/22434 [14:57:32<4:18:43, 2.63s/it] +2025-02-06 01:05:14 - ERROR - stderr - 74%|███████▎ | 16543/22434 [14:57:34<4:12:16, 2.57s/it] +2025-02-06 01:05:15 - ERROR - stderr - +2025-02-06 01:05:15 - ERROR - stderr - +2025-02-06 01:05:15 - INFO - stdout - {'loss': 0.4158, 'grad_norm': 1.669495701789856, 'learning_rate': 3.4040094687452263e-06, 'epoch': 2.21} +2025-02-06 01:05:15 - ERROR - stderr - 74%|███████▎ | 16543/22434 [14:57:34<4:12:16, 2.57s/it] +2025-02-06 01:05:17 - ERROR - stderr - 74%|███████▎ | 16544/22434 [14:57:37<4:10:04, 2.55s/it] +2025-02-06 01:05:17 - ERROR - stderr - +2025-02-06 01:05:17 - ERROR - stderr - +2025-02-06 01:05:17 - INFO - stdout - {'loss': 0.3585, 'grad_norm': 1.4929075241088867, 'learning_rate': 3.402924392050475e-06, 'epoch': 2.21} +2025-02-06 01:05:17 - ERROR - stderr - 74%|███████▎ | 16544/22434 
[14:57:37<4:10:04, 2.55s/it] +2025-02-06 01:05:19 - ERROR - stderr - 74%|███████▎ | 16545/22434 [14:57:39<4:08:53, 2.54s/it] +2025-02-06 01:05:20 - ERROR - stderr - +2025-02-06 01:05:20 - ERROR - stderr - +2025-02-06 01:05:20 - INFO - stdout - {'loss': 0.4269, 'grad_norm': 1.549967885017395, 'learning_rate': 3.401839452865453e-06, 'epoch': 2.21} +2025-02-06 01:05:20 - ERROR - stderr - 74%|███████▎ | 16545/22434 [14:57:39<4:08:53, 2.54s/it] +2025-02-06 01:05:22 - ERROR - stderr - 74%|███████▍ | 16546/22434 [14:57:42<4:12:09, 2.57s/it] +2025-02-06 01:05:22 - ERROR - stderr - +2025-02-06 01:05:22 - ERROR - stderr - +2025-02-06 01:05:22 - INFO - stdout - {'loss': 0.3692, 'grad_norm': 1.4322086572647095, 'learning_rate': 3.4007546512127764e-06, 'epoch': 2.21} +2025-02-06 01:05:22 - ERROR - stderr - 74%|███████▍ | 16546/22434 [14:57:42<4:12:09, 2.57s/it] +2025-02-06 01:05:25 - ERROR - stderr - 74%|███████▍ | 16547/22434 [14:57:44<4:13:10, 2.58s/it] +2025-02-06 01:05:25 - ERROR - stderr - +2025-02-06 01:05:25 - ERROR - stderr - +2025-02-06 01:05:25 - INFO - stdout - {'loss': 0.3769, 'grad_norm': 1.4452801942825317, 'learning_rate': 3.3996699871150486e-06, 'epoch': 2.21} +2025-02-06 01:05:25 - ERROR - stderr - 74%|███████▍ | 16547/22434 [14:57:45<4:13:10, 2.58s/it] +2025-02-06 01:05:27 - ERROR - stderr - 74%|███████▍ | 16548/22434 [14:57:47<4:10:56, 2.56s/it] +2025-02-06 01:05:27 - ERROR - stderr - +2025-02-06 01:05:27 - ERROR - stderr - +2025-02-06 01:05:27 - INFO - stdout - {'loss': 0.35, 'grad_norm': 1.2992616891860962, 'learning_rate': 3.3985854605948896e-06, 'epoch': 2.21} +2025-02-06 01:05:27 - ERROR - stderr - 74%|███████▍ | 16548/22434 [14:57:47<4:10:56, 2.56s/it] +2025-02-06 01:05:30 - ERROR - stderr - 74%|███████▍ | 16549/22434 [14:57:49<4:08:10, 2.53s/it] +2025-02-06 01:05:30 - ERROR - stderr - +2025-02-06 01:05:30 - ERROR - stderr - +2025-02-06 01:05:30 - INFO - stdout - {'loss': 0.3728, 'grad_norm': 1.4481343030929565, 'learning_rate': 3.397501071674898e-06, 
'epoch': 2.21} +2025-02-06 01:05:30 - ERROR - stderr - 74%|███████▍ | 16549/22434 [14:57:50<4:08:10, 2.53s/it] +2025-02-06 01:05:32 - ERROR - stderr - 74%|███████▍ | 16550/22434 [14:57:52<4:09:04, 2.54s/it] +2025-02-06 01:05:32 - ERROR - stderr - +2025-02-06 01:05:32 - ERROR - stderr - +2025-02-06 01:05:32 - INFO - stdout - {'loss': 0.3792, 'grad_norm': 1.4488797187805176, 'learning_rate': 3.396416820377675e-06, 'epoch': 2.21} +2025-02-06 01:05:32 - ERROR - stderr - 74%|███████▍ | 16550/22434 [14:57:52<4:09:04, 2.54s/it] +2025-02-06 01:05:35 - ERROR - stderr - 74%|███████▍ | 16551/22434 [14:57:55<4:18:03, 2.63s/it] +2025-02-06 01:05:35 - ERROR - stderr - +2025-02-06 01:05:35 - ERROR - stderr - +2025-02-06 01:05:35 - INFO - stdout - {'loss': 0.3843, 'grad_norm': 1.4925519227981567, 'learning_rate': 3.3953327067258303e-06, 'epoch': 2.21} +2025-02-06 01:05:35 - ERROR - stderr - 74%|███████▍ | 16551/22434 [14:57:55<4:18:03, 2.63s/it] +2025-02-06 01:05:38 - ERROR - stderr - 74%|███████▍ | 16552/22434 [14:57:57<4:14:25, 2.60s/it] +2025-02-06 01:05:38 - ERROR - stderr - +2025-02-06 01:05:38 - ERROR - stderr - +2025-02-06 01:05:38 - INFO - stdout - {'loss': 0.3529, 'grad_norm': 1.473625898361206, 'learning_rate': 3.394248730741948e-06, 'epoch': 2.21} +2025-02-06 01:05:38 - ERROR - stderr - 74%|███████▍ | 16552/22434 [14:57:57<4:14:25, 2.60s/it] +2025-02-06 01:05:40 - ERROR - stderr - 74%|███████▍ | 16553/22434 [14:58:00<4:09:04, 2.54s/it] +2025-02-06 01:05:40 - ERROR - stderr - +2025-02-06 01:05:40 - ERROR - stderr - +2025-02-06 01:05:40 - INFO - stdout - {'loss': 0.3897, 'grad_norm': 1.5663082599639893, 'learning_rate': 3.3931648924486383e-06, 'epoch': 2.21} +2025-02-06 01:05:40 - ERROR - stderr - 74%|███████▍ | 16553/22434 [14:58:00<4:09:04, 2.54s/it] +2025-02-06 01:05:43 - ERROR - stderr - 74%|███████▍ | 16554/22434 [14:58:02<4:09:53, 2.55s/it] +2025-02-06 01:05:43 - ERROR - stderr - +2025-02-06 01:05:43 - ERROR - stderr - +2025-02-06 01:05:43 - INFO - stdout - {'loss': 
0.3409, 'grad_norm': 1.5980395078659058, 'learning_rate': 3.3920811918684804e-06, 'epoch': 2.21} +2025-02-06 01:05:43 - ERROR - stderr - 74%|███████▍ | 16554/22434 [14:58:02<4:09:53, 2.55s/it] +2025-02-06 01:05:45 - ERROR - stderr - 74%|███████▍ | 16555/22434 [14:58:05<4:18:58, 2.64s/it] +2025-02-06 01:05:45 - ERROR - stderr - +2025-02-06 01:05:45 - ERROR - stderr - +2025-02-06 01:05:45 - INFO - stdout - {'loss': 0.3463, 'grad_norm': 1.5004271268844604, 'learning_rate': 3.3909976290240663e-06, 'epoch': 2.21} +2025-02-06 01:05:45 - ERROR - stderr - 74%|███████▍ | 16555/22434 [14:58:05<4:18:58, 2.64s/it] +2025-02-06 01:05:48 - ERROR - stderr - 74%|███████▍ | 16556/22434 [14:58:08<4:15:11, 2.60s/it] +2025-02-06 01:05:48 - ERROR - stderr - +2025-02-06 01:05:48 - ERROR - stderr - +2025-02-06 01:05:48 - INFO - stdout - {'loss': 0.3526, 'grad_norm': 1.3759044408798218, 'learning_rate': 3.389914203937983e-06, 'epoch': 2.21} +2025-02-06 01:05:48 - ERROR - stderr - 74%|███████▍ | 16556/22434 [14:58:08<4:15:11, 2.60s/it] +2025-02-06 01:05:51 - ERROR - stderr - 74%|███████▍ | 16557/22434 [14:58:10<4:14:10, 2.60s/it] +2025-02-06 01:05:51 - ERROR - stderr - +2025-02-06 01:05:51 - ERROR - stderr - +2025-02-06 01:05:51 - INFO - stdout - {'loss': 0.3887, 'grad_norm': 1.4824753999710083, 'learning_rate': 3.388830916632813e-06, 'epoch': 2.21} +2025-02-06 01:05:51 - ERROR - stderr - 74%|███████▍ | 16557/22434 [14:58:10<4:14:10, 2.60s/it] +2025-02-06 01:05:53 - ERROR - stderr - 74%|███████▍ | 16558/22434 [14:58:13<4:09:46, 2.55s/it] +2025-02-06 01:05:53 - ERROR - stderr - +2025-02-06 01:05:53 - ERROR - stderr - +2025-02-06 01:05:53 - INFO - stdout - {'loss': 0.3941, 'grad_norm': 1.5208218097686768, 'learning_rate': 3.3877477671311363e-06, 'epoch': 2.21} +2025-02-06 01:05:53 - ERROR - stderr - 74%|███████▍ | 16558/22434 [14:58:13<4:09:46, 2.55s/it] +2025-02-06 01:05:56 - ERROR - stderr - 74%|███████▍ | 16559/22434 [14:58:15<4:10:54, 2.56s/it] +2025-02-06 01:05:56 - ERROR - stderr - 
+2025-02-06 01:05:56 - ERROR - stderr - +2025-02-06 01:05:56 - INFO - stdout - {'loss': 0.4222, 'grad_norm': 1.6410958766937256, 'learning_rate': 3.38666475545553e-06, 'epoch': 2.21} +2025-02-06 01:05:56 - ERROR - stderr - 74%|███████▍ | 16559/22434 [14:58:15<4:10:54, 2.56s/it] +2025-02-06 01:05:58 - ERROR - stderr - 74%|███████▍ | 16560/22434 [14:58:18<4:10:22, 2.56s/it] +2025-02-06 01:05:58 - ERROR - stderr - +2025-02-06 01:05:58 - ERROR - stderr - +2025-02-06 01:05:58 - INFO - stdout - {'loss': 0.4176, 'grad_norm': 1.4773175716400146, 'learning_rate': 3.3855818816285692e-06, 'epoch': 2.21} +2025-02-06 01:05:58 - ERROR - stderr - 74%|███████▍ | 16560/22434 [14:58:18<4:10:22, 2.56s/it] +2025-02-06 01:06:01 - ERROR - stderr - 74%|███████▍ | 16561/22434 [14:58:20<4:06:59, 2.52s/it] +2025-02-06 01:06:01 - ERROR - stderr - +2025-02-06 01:06:01 - ERROR - stderr - +2025-02-06 01:06:01 - INFO - stdout - {'loss': 0.3953, 'grad_norm': 1.5418674945831299, 'learning_rate': 3.384499145672824e-06, 'epoch': 2.21} +2025-02-06 01:06:01 - ERROR - stderr - 74%|███████▍ | 16561/22434 [14:58:20<4:06:59, 2.52s/it] +2025-02-06 01:06:03 - ERROR - stderr - 74%|███████▍ | 16562/22434 [14:58:23<4:05:57, 2.51s/it] +2025-02-06 01:06:03 - ERROR - stderr - +2025-02-06 01:06:03 - ERROR - stderr - +2025-02-06 01:06:03 - INFO - stdout - {'loss': 0.3571, 'grad_norm': 1.3757091760635376, 'learning_rate': 3.3834165476108637e-06, 'epoch': 2.21} +2025-02-06 01:06:03 - ERROR - stderr - 74%|███████▍ | 16562/22434 [14:58:23<4:05:57, 2.51s/it] +2025-02-06 01:06:06 - ERROR - stderr - 74%|███████▍ | 16563/22434 [14:58:25<4:07:09, 2.53s/it] +2025-02-06 01:06:06 - ERROR - stderr - +2025-02-06 01:06:06 - ERROR - stderr - +2025-02-06 01:06:06 - INFO - stdout - {'loss': 0.3899, 'grad_norm': 1.4011141061782837, 'learning_rate': 3.3823340874652543e-06, 'epoch': 2.21} +2025-02-06 01:06:06 - ERROR - stderr - 74%|███████▍ | 16563/22434 [14:58:25<4:07:09, 2.53s/it] +2025-02-06 01:06:08 - ERROR - stderr - 74%|███████▍ 
| 16564/22434 [14:58:28<4:07:49, 2.53s/it] +2025-02-06 01:06:08 - ERROR - stderr - +2025-02-06 01:06:08 - ERROR - stderr - +2025-02-06 01:06:08 - INFO - stdout - {'loss': 0.3839, 'grad_norm': 1.577030062675476, 'learning_rate': 3.3812517652585597e-06, 'epoch': 2.22} +2025-02-06 01:06:08 - ERROR - stderr - 74%|███████▍ | 16564/22434 [14:58:28<4:07:49, 2.53s/it] +2025-02-06 01:06:11 - ERROR - stderr - 74%|███████▍ | 16565/22434 [14:58:30<4:07:11, 2.53s/it] +2025-02-06 01:06:11 - ERROR - stderr - +2025-02-06 01:06:11 - ERROR - stderr - +2025-02-06 01:06:11 - INFO - stdout - {'loss': 0.353, 'grad_norm': 1.3586053848266602, 'learning_rate': 3.3801695810133407e-06, 'epoch': 2.22} +2025-02-06 01:06:11 - ERROR - stderr - 74%|███████▍ | 16565/22434 [14:58:30<4:07:11, 2.53s/it] +2025-02-06 01:06:13 - ERROR - stderr - 74%|███████▍ | 16566/22434 [14:58:33<4:06:21, 2.52s/it] +2025-02-06 01:06:13 - ERROR - stderr - +2025-02-06 01:06:13 - ERROR - stderr - +2025-02-06 01:06:13 - INFO - stdout - {'loss': 0.4159, 'grad_norm': 1.5198086500167847, 'learning_rate': 3.3790875347521456e-06, 'epoch': 2.22} +2025-02-06 01:06:13 - ERROR - stderr - 74%|███████▍ | 16566/22434 [14:58:33<4:06:21, 2.52s/it] +2025-02-06 01:06:16 - ERROR - stderr - 74%|███████▍ | 16567/22434 [14:58:35<4:06:21, 2.52s/it] +2025-02-06 01:06:16 - ERROR - stderr - +2025-02-06 01:06:16 - ERROR - stderr - +2025-02-06 01:06:16 - INFO - stdout - {'loss': 0.3552, 'grad_norm': 1.4178853034973145, 'learning_rate': 3.378005626497541e-06, 'epoch': 2.22} +2025-02-06 01:06:16 - ERROR - stderr - 74%|███████▍ | 16567/22434 [14:58:36<4:06:21, 2.52s/it] +2025-02-06 01:06:18 - ERROR - stderr - 74%|███████▍ | 16568/22434 [14:58:38<4:06:34, 2.52s/it] +2025-02-06 01:06:18 - ERROR - stderr - +2025-02-06 01:06:18 - ERROR - stderr - +2025-02-06 01:06:18 - INFO - stdout - {'loss': 0.3932, 'grad_norm': 1.3791643381118774, 'learning_rate': 3.3769238562720674e-06, 'epoch': 2.22} +2025-02-06 01:06:18 - ERROR - stderr - 74%|███████▍ | 16568/22434 
[14:58:38<4:06:34, 2.52s/it] +2025-02-06 01:06:21 - ERROR - stderr - 74%|███████▍ | 16569/22434 [14:58:41<4:06:35, 2.52s/it] +2025-02-06 01:06:21 - ERROR - stderr - +2025-02-06 01:06:21 - ERROR - stderr - +2025-02-06 01:06:21 - INFO - stdout - {'loss': 0.4006, 'grad_norm': 1.512338399887085, 'learning_rate': 3.3758422240982814e-06, 'epoch': 2.22} +2025-02-06 01:06:21 - ERROR - stderr - 74%|███████▍ | 16569/22434 [14:58:41<4:06:35, 2.52s/it] +2025-02-06 01:06:23 - ERROR - stderr - 74%|███████▍ | 16570/22434 [14:58:43<4:07:06, 2.53s/it] +2025-02-06 01:06:23 - ERROR - stderr - +2025-02-06 01:06:23 - ERROR - stderr - +2025-02-06 01:06:23 - INFO - stdout - {'loss': 0.3518, 'grad_norm': 1.402131199836731, 'learning_rate': 3.3747607299987294e-06, 'epoch': 2.22} +2025-02-06 01:06:23 - ERROR - stderr - 74%|███████▍ | 16570/22434 [14:58:43<4:07:06, 2.53s/it] +2025-02-06 01:06:26 - ERROR - stderr - 74%|███████▍ | 16571/22434 [14:58:46<4:07:07, 2.53s/it] +2025-02-06 01:06:26 - ERROR - stderr - +2025-02-06 01:06:26 - ERROR - stderr - +2025-02-06 01:06:26 - INFO - stdout - {'loss': 0.3676, 'grad_norm': 1.5381319522857666, 'learning_rate': 3.3736793739959426e-06, 'epoch': 2.22} +2025-02-06 01:06:26 - ERROR - stderr - 74%|███████▍ | 16571/22434 [14:58:46<4:07:07, 2.53s/it] +2025-02-06 01:06:28 - ERROR - stderr - 74%|███████▍ | 16572/22434 [14:58:48<4:05:06, 2.51s/it] +2025-02-06 01:06:28 - ERROR - stderr - +2025-02-06 01:06:28 - ERROR - stderr - +2025-02-06 01:06:28 - INFO - stdout - {'loss': 0.3737, 'grad_norm': 1.4864060878753662, 'learning_rate': 3.3725981561124764e-06, 'epoch': 2.22} +2025-02-06 01:06:28 - ERROR - stderr - 74%|███████▍ | 16572/22434 [14:58:48<4:05:06, 2.51s/it] +2025-02-06 01:06:31 - ERROR - stderr - 74%|███████▍ | 16573/22434 [14:58:51<4:05:01, 2.51s/it] +2025-02-06 01:06:31 - ERROR - stderr - +2025-02-06 01:06:31 - ERROR - stderr - +2025-02-06 01:06:31 - INFO - stdout - {'loss': 0.3394, 'grad_norm': 1.2980787754058838, 'learning_rate': 
3.3715170763708526e-06, 'epoch': 2.22} +2025-02-06 01:06:31 - ERROR - stderr - 74%|███████▍ | 16573/22434 [14:58:51<4:05:01, 2.51s/it] +2025-02-06 01:06:33 - ERROR - stderr - 74%|███████▍ | 16574/22434 [14:58:53<4:02:57, 2.49s/it] +2025-02-06 01:06:33 - ERROR - stderr - +2025-02-06 01:06:33 - ERROR - stderr - +2025-02-06 01:06:33 - INFO - stdout - {'loss': 0.3621, 'grad_norm': 1.280125379562378, 'learning_rate': 3.3704361347936186e-06, 'epoch': 2.22} +2025-02-06 01:06:33 - ERROR - stderr - 74%|███████▍ | 16574/22434 [14:58:53<4:02:57, 2.49s/it] +2025-02-06 01:06:36 - ERROR - stderr - 74%|███████▍ | 16575/22434 [14:58:55<4:00:58, 2.47s/it] +2025-02-06 01:06:36 - ERROR - stderr - +2025-02-06 01:06:36 - ERROR - stderr - +2025-02-06 01:06:36 - INFO - stdout - {'loss': 0.3873, 'grad_norm': 1.4714363813400269, 'learning_rate': 3.3693553314032967e-06, 'epoch': 2.22} +2025-02-06 01:06:36 - ERROR - stderr - 74%|███████▍ | 16575/22434 [14:58:55<4:00:58, 2.47s/it] +2025-02-06 01:06:38 - ERROR - stderr - 74%|███████▍ | 16576/22434 [14:58:58<4:03:08, 2.49s/it] +2025-02-06 01:06:38 - ERROR - stderr - +2025-02-06 01:06:38 - ERROR - stderr - +2025-02-06 01:06:38 - INFO - stdout - {'loss': 0.3256, 'grad_norm': 1.440182089805603, 'learning_rate': 3.368274666222419e-06, 'epoch': 2.22} +2025-02-06 01:06:38 - ERROR - stderr - 74%|███████▍ | 16576/22434 [14:58:58<4:03:08, 2.49s/it] +2025-02-06 01:06:41 - ERROR - stderr - 74%|███████▍ | 16577/22434 [14:59:00<4:04:24, 2.50s/it] +2025-02-06 01:06:41 - ERROR - stderr - +2025-02-06 01:06:41 - ERROR - stderr - +2025-02-06 01:06:41 - INFO - stdout - {'loss': 0.3421, 'grad_norm': 1.4203541278839111, 'learning_rate': 3.367194139273509e-06, 'epoch': 2.22} +2025-02-06 01:06:41 - ERROR - stderr - 74%|███████▍ | 16577/22434 [14:59:01<4:04:24, 2.50s/it] +2025-02-06 01:06:43 - ERROR - stderr - 74%|███████▍ | 16578/22434 [14:59:03<4:05:15, 2.51s/it] +2025-02-06 01:06:43 - ERROR - stderr - +2025-02-06 01:06:43 - ERROR - stderr - +2025-02-06 01:06:43 - 
INFO - stdout - {'loss': 0.4392, 'grad_norm': 1.66934335231781, 'learning_rate': 3.366113750579091e-06, 'epoch': 2.22} +2025-02-06 01:06:43 - ERROR - stderr - 74%|███████▍ | 16578/22434 [14:59:03<4:05:15, 2.51s/it] +2025-02-06 01:06:46 - ERROR - stderr - 74%|███████▍ | 16579/22434 [14:59:06<4:04:23, 2.50s/it] +2025-02-06 01:06:46 - ERROR - stderr - +2025-02-06 01:06:46 - ERROR - stderr - +2025-02-06 01:06:46 - INFO - stdout - {'loss': 0.3165, 'grad_norm': 1.6014271974563599, 'learning_rate': 3.365033500161683e-06, 'epoch': 2.22} +2025-02-06 01:06:46 - ERROR - stderr - 74%|███████▍ | 16579/22434 [14:59:06<4:04:23, 2.50s/it] +2025-02-06 01:06:48 - ERROR - stderr - 74%|███████▍ | 16580/22434 [14:59:08<4:04:47, 2.51s/it] +2025-02-06 01:06:48 - ERROR - stderr - +2025-02-06 01:06:48 - ERROR - stderr - +2025-02-06 01:06:48 - INFO - stdout - {'loss': 0.3451, 'grad_norm': 1.6330770254135132, 'learning_rate': 3.3639533880438037e-06, 'epoch': 2.22} +2025-02-06 01:06:48 - ERROR - stderr - 74%|███████▍ | 16580/22434 [14:59:08<4:04:47, 2.51s/it] +2025-02-06 01:06:51 - ERROR - stderr - 74%|███████▍ | 16581/22434 [14:59:11<4:05:31, 2.52s/it] +2025-02-06 01:06:51 - ERROR - stderr - +2025-02-06 01:06:51 - ERROR - stderr - +2025-02-06 01:06:51 - INFO - stdout - {'loss': 0.3922, 'grad_norm': 1.5846052169799805, 'learning_rate': 3.3628734142479646e-06, 'epoch': 2.22} +2025-02-06 01:06:51 - ERROR - stderr - 74%|███████▍ | 16581/22434 [14:59:11<4:05:31, 2.52s/it] +2025-02-06 01:06:53 - ERROR - stderr - 74%|███████▍ | 16582/22434 [14:59:13<4:03:25, 2.50s/it] +2025-02-06 01:06:53 - ERROR - stderr - +2025-02-06 01:06:53 - ERROR - stderr - +2025-02-06 01:06:53 - INFO - stdout - {'loss': 0.4027, 'grad_norm': 1.5264737606048584, 'learning_rate': 3.3617935787966793e-06, 'epoch': 2.22} +2025-02-06 01:06:53 - ERROR - stderr - 74%|███████▍ | 16582/22434 [14:59:13<4:03:25, 2.50s/it] +2025-02-06 01:06:56 - ERROR - stderr - 74%|███████▍ | 16583/22434 [14:59:16<4:04:49, 2.51s/it] +2025-02-06 01:06:56 
- ERROR - stderr - +2025-02-06 01:06:56 - ERROR - stderr - +2025-02-06 01:06:56 - INFO - stdout - {'loss': 0.4016, 'grad_norm': 1.5482114553451538, 'learning_rate': 3.360713881712454e-06, 'epoch': 2.22} +2025-02-06 01:06:56 - ERROR - stderr - 74%|███████▍ | 16583/22434 [14:59:16<4:04:49, 2.51s/it] +2025-02-06 01:06:58 - ERROR - stderr - 74%|███████▍ | 16584/22434 [14:59:18<4:03:05, 2.49s/it] +2025-02-06 01:06:58 - ERROR - stderr - +2025-02-06 01:06:58 - ERROR - stderr - +2025-02-06 01:06:58 - INFO - stdout - {'loss': 0.3492, 'grad_norm': 1.6392947435379028, 'learning_rate': 3.3596343230177954e-06, 'epoch': 2.22} +2025-02-06 01:06:58 - ERROR - stderr - 74%|███████▍ | 16584/22434 [14:59:18<4:03:05, 2.49s/it] +2025-02-06 01:07:01 - ERROR - stderr - 74%|███████▍ | 16585/22434 [14:59:21<4:03:00, 2.49s/it] +2025-02-06 01:07:01 - ERROR - stderr - +2025-02-06 01:07:01 - ERROR - stderr - +2025-02-06 01:07:01 - INFO - stdout - {'loss': 0.3229, 'grad_norm': 1.4725600481033325, 'learning_rate': 3.3585549027352047e-06, 'epoch': 2.22} +2025-02-06 01:07:01 - ERROR - stderr - 74%|███████▍ | 16585/22434 [14:59:21<4:03:00, 2.49s/it] +2025-02-06 01:07:03 - ERROR - stderr - 74%|███████▍ | 16586/22434 [14:59:23<4:07:12, 2.54s/it] +2025-02-06 01:07:03 - ERROR - stderr - +2025-02-06 01:07:03 - ERROR - stderr - +2025-02-06 01:07:03 - INFO - stdout - {'loss': 0.394, 'grad_norm': 1.5494537353515625, 'learning_rate': 3.3574756208871862e-06, 'epoch': 2.22} +2025-02-06 01:07:03 - ERROR - stderr - 74%|███████▍ | 16586/22434 [14:59:23<4:07:12, 2.54s/it] +2025-02-06 01:07:06 - ERROR - stderr - 74%|███████▍ | 16587/22434 [14:59:26<4:07:27, 2.54s/it] +2025-02-06 01:07:06 - ERROR - stderr - +2025-02-06 01:07:06 - ERROR - stderr - +2025-02-06 01:07:06 - INFO - stdout - {'loss': 0.3749, 'grad_norm': 1.4391752481460571, 'learning_rate': 3.3563964774962245e-06, 'epoch': 2.22} +2025-02-06 01:07:06 - ERROR - stderr - 74%|███████▍ | 16587/22434 [14:59:26<4:07:27, 2.54s/it] +2025-02-06 01:07:08 - ERROR - 
stderr - 74%|███████▍ | 16588/22434 [14:59:28<4:04:15, 2.51s/it] +2025-02-06 01:07:08 - ERROR - stderr - +2025-02-06 01:07:08 - ERROR - stderr - +2025-02-06 01:07:08 - INFO - stdout - {'loss': 0.3209, 'grad_norm': 1.4733268022537231, 'learning_rate': 3.3553174725848247e-06, 'epoch': 2.22} +2025-02-06 01:07:08 - ERROR - stderr - 74%|███████▍ | 16588/22434 [14:59:28<4:04:15, 2.51s/it] +2025-02-06 01:07:11 - ERROR - stderr - 74%|███████▍ | 16589/22434 [14:59:31<4:03:49, 2.50s/it] +2025-02-06 01:07:11 - ERROR - stderr - +2025-02-06 01:07:11 - ERROR - stderr - +2025-02-06 01:07:11 - INFO - stdout - {'loss': 0.4118, 'grad_norm': 1.531790018081665, 'learning_rate': 3.354238606175474e-06, 'epoch': 2.22} +2025-02-06 01:07:11 - ERROR - stderr - 74%|███████▍ | 16589/22434 [14:59:31<4:03:49, 2.50s/it] +2025-02-06 01:07:13 - ERROR - stderr - 74%|███████▍ | 16590/22434 [14:59:33<4:03:44, 2.50s/it] +2025-02-06 01:07:13 - ERROR - stderr - +2025-02-06 01:07:13 - ERROR - stderr - +2025-02-06 01:07:13 - INFO - stdout - {'loss': 0.354, 'grad_norm': 1.5632377862930298, 'learning_rate': 3.3531598782906605e-06, 'epoch': 2.22} +2025-02-06 01:07:13 - ERROR - stderr - 74%|███████▍ | 16590/22434 [14:59:33<4:03:44, 2.50s/it] +2025-02-06 01:07:16 - ERROR - stderr - 74%|███████▍ | 16591/22434 [14:59:36<4:02:26, 2.49s/it] +2025-02-06 01:07:16 - ERROR - stderr - +2025-02-06 01:07:16 - ERROR - stderr - +2025-02-06 01:07:16 - INFO - stdout - {'loss': 0.4089, 'grad_norm': 1.6562833786010742, 'learning_rate': 3.352081288952872e-06, 'epoch': 2.22} +2025-02-06 01:07:16 - ERROR - stderr - 74%|███████▍ | 16591/22434 [14:59:36<4:02:26, 2.49s/it] +2025-02-06 01:07:18 - ERROR - stderr - 74%|███████▍ | 16592/22434 [14:59:38<4:03:19, 2.50s/it] +2025-02-06 01:07:18 - ERROR - stderr - +2025-02-06 01:07:18 - ERROR - stderr - +2025-02-06 01:07:18 - INFO - stdout - {'loss': 0.3726, 'grad_norm': 1.6304148435592651, 'learning_rate': 3.3510028381845804e-06, 'epoch': 2.22} +2025-02-06 01:07:18 - ERROR - stderr - 
74%|███████▍ | 16592/22434 [14:59:38<4:03:19, 2.50s/it] +2025-02-06 01:07:21 - ERROR - stderr - 74%|███████▍ | 16593/22434 [14:59:41<4:05:54, 2.53s/it] +2025-02-06 01:07:21 - ERROR - stderr - +2025-02-06 01:07:21 - ERROR - stderr - +2025-02-06 01:07:21 - INFO - stdout - {'loss': 0.3734, 'grad_norm': 1.4776134490966797, 'learning_rate': 3.3499245260082803e-06, 'epoch': 2.22} +2025-02-06 01:07:21 - ERROR - stderr - 74%|███████▍ | 16593/22434 [14:59:41<4:05:54, 2.53s/it] +2025-02-06 01:07:23 - ERROR - stderr - 74%|███████▍ | 16594/22434 [14:59:43<4:04:30, 2.51s/it] +2025-02-06 01:07:23 - ERROR - stderr - +2025-02-06 01:07:23 - ERROR - stderr - +2025-02-06 01:07:23 - INFO - stdout - {'loss': 0.3858, 'grad_norm': 1.4355220794677734, 'learning_rate': 3.3488463524464355e-06, 'epoch': 2.22} +2025-02-06 01:07:23 - ERROR - stderr - 74%|███████▍ | 16594/22434 [14:59:43<4:04:30, 2.51s/it] +2025-02-06 01:07:26 - ERROR - stderr - 74%|███████▍ | 16595/22434 [14:59:46<4:04:41, 2.51s/it] +2025-02-06 01:07:26 - ERROR - stderr - +2025-02-06 01:07:26 - ERROR - stderr - +2025-02-06 01:07:26 - INFO - stdout - {'loss': 0.378, 'grad_norm': 1.3044672012329102, 'learning_rate': 3.3477683175215213e-06, 'epoch': 2.22} +2025-02-06 01:07:26 - ERROR - stderr - 74%|███████▍ | 16595/22434 [14:59:46<4:04:41, 2.51s/it] +2025-02-06 01:07:29 - ERROR - stderr - 74%|███████▍ | 16596/22434 [14:59:48<4:09:05, 2.56s/it] +2025-02-06 01:07:29 - ERROR - stderr - +2025-02-06 01:07:29 - ERROR - stderr - +2025-02-06 01:07:29 - INFO - stdout - {'loss': 0.4281, 'grad_norm': 1.5870615243911743, 'learning_rate': 3.346690421256017e-06, 'epoch': 2.22} +2025-02-06 01:07:29 - ERROR - stderr - 74%|███████▍ | 16596/22434 [14:59:48<4:09:05, 2.56s/it] +2025-02-06 01:07:31 - ERROR - stderr - 74%|███████▍ | 16597/22434 [14:59:51<4:09:33, 2.57s/it] +2025-02-06 01:07:31 - ERROR - stderr - +2025-02-06 01:07:31 - ERROR - stderr - +2025-02-06 01:07:31 - INFO - stdout - {'loss': 0.3881, 'grad_norm': 1.419994592666626, 
'learning_rate': 3.3456126636723786e-06, 'epoch': 2.22} +2025-02-06 01:07:31 - ERROR - stderr - 74%|███████▍ | 16597/22434 [14:59:51<4:09:33, 2.57s/it] +2025-02-06 01:07:34 - ERROR - stderr - 74%|███████▍ | 16598/22434 [14:59:53<4:07:06, 2.54s/it] +2025-02-06 01:07:34 - ERROR - stderr - +2025-02-06 01:07:34 - ERROR - stderr - +2025-02-06 01:07:34 - INFO - stdout - {'loss': 0.3395, 'grad_norm': 1.3339389562606812, 'learning_rate': 3.3445350447930824e-06, 'epoch': 2.22} +2025-02-06 01:07:34 - ERROR - stderr - 74%|███████▍ | 16598/22434 [14:59:53<4:07:06, 2.54s/it] +2025-02-06 01:07:36 - ERROR - stderr - 74%|███████▍ | 16599/22434 [14:59:56<4:11:22, 2.58s/it] +2025-02-06 01:07:36 - ERROR - stderr - +2025-02-06 01:07:36 - ERROR - stderr - +2025-02-06 01:07:36 - INFO - stdout - {'loss': 0.3748, 'grad_norm': 1.4296832084655762, 'learning_rate': 3.343457564640582e-06, 'epoch': 2.22} +2025-02-06 01:07:36 - ERROR - stderr - 74%|███████▍ | 16599/22434 [14:59:56<4:11:22, 2.58s/it] +2025-02-06 01:07:39 - ERROR - stderr - 74%|███████▍ | 16600/22434 [14:59:59<4:06:13, 2.53s/it] +2025-02-06 01:07:39 - ERROR - stderr - +2025-02-06 01:07:39 - ERROR - stderr - +2025-02-06 01:07:39 - INFO - stdout - {'loss': 0.3767, 'grad_norm': 1.5512733459472656, 'learning_rate': 3.342380223237338e-06, 'epoch': 2.22} +2025-02-06 01:07:39 - ERROR - stderr - 74%|███████▍ | 16600/22434 [14:59:59<4:06:13, 2.53s/it] +2025-02-06 01:07:41 - ERROR - stderr - 74%|███████▍ | 16601/22434 [15:00:01<4:06:34, 2.54s/it] +2025-02-06 01:07:41 - ERROR - stderr - +2025-02-06 01:07:41 - ERROR - stderr - +2025-02-06 01:07:41 - INFO - stdout - {'loss': 0.3686, 'grad_norm': 1.2560606002807617, 'learning_rate': 3.341303020605808e-06, 'epoch': 2.22} +2025-02-06 01:07:41 - ERROR - stderr - 74%|███████▍ | 16601/22434 [15:00:01<4:06:34, 2.54s/it] +2025-02-06 01:07:44 - ERROR - stderr - 74%|███████▍ | 16602/22434 [15:00:04<4:06:08, 2.53s/it] +2025-02-06 01:07:44 - ERROR - stderr - +2025-02-06 01:07:44 - ERROR - stderr - 
+2025-02-06 01:07:44 - INFO - stdout - {'loss': 0.3384, 'grad_norm': 1.3711079359054565, 'learning_rate': 3.340225956768446e-06, 'epoch': 2.22} +2025-02-06 01:07:44 - ERROR - stderr - 74%|███████▍ | 16602/22434 [15:00:04<4:06:08, 2.53s/it] +2025-02-06 01:07:46 - ERROR - stderr - 74%|███████▍ | 16603/22434 [15:00:06<4:03:51, 2.51s/it] +2025-02-06 01:07:46 - ERROR - stderr - +2025-02-06 01:07:46 - ERROR - stderr - +2025-02-06 01:07:46 - INFO - stdout - {'loss': 0.3794, 'grad_norm': 1.4956520795822144, 'learning_rate': 3.3391490317477006e-06, 'epoch': 2.22} +2025-02-06 01:07:46 - ERROR - stderr - 74%|███████▍ | 16603/22434 [15:00:06<4:03:51, 2.51s/it] +2025-02-06 01:07:49 - ERROR - stderr - 74%|███████▍ | 16604/22434 [15:00:09<4:03:47, 2.51s/it] +2025-02-06 01:07:49 - ERROR - stderr - +2025-02-06 01:07:49 - ERROR - stderr - +2025-02-06 01:07:49 - INFO - stdout - {'loss': 0.392, 'grad_norm': 1.413162350654602, 'learning_rate': 3.33807224556602e-06, 'epoch': 2.22} +2025-02-06 01:07:49 - ERROR - stderr - 74%|███████▍ | 16604/22434 [15:00:09<4:03:47, 2.51s/it] +2025-02-06 01:07:51 - ERROR - stderr - 74%|███████▍ | 16605/22434 [15:00:11<4:06:15, 2.53s/it] +2025-02-06 01:07:51 - ERROR - stderr - +2025-02-06 01:07:51 - ERROR - stderr - +2025-02-06 01:07:51 - INFO - stdout - {'loss': 0.4032, 'grad_norm': 1.4823578596115112, 'learning_rate': 3.336995598245848e-06, 'epoch': 2.22} +2025-02-06 01:07:51 - ERROR - stderr - 74%|███████▍ | 16605/22434 [15:00:11<4:06:15, 2.53s/it] +2025-02-06 01:07:54 - ERROR - stderr - 74%|███████▍ | 16606/22434 [15:00:14<4:05:08, 2.52s/it] +2025-02-06 01:07:54 - ERROR - stderr - +2025-02-06 01:07:54 - ERROR - stderr - +2025-02-06 01:07:54 - INFO - stdout - {'loss': 0.4034, 'grad_norm': 1.4129291772842407, 'learning_rate': 3.3359190898096273e-06, 'epoch': 2.22} +2025-02-06 01:07:54 - ERROR - stderr - 74%|███████▍ | 16606/22434 [15:00:14<4:05:08, 2.52s/it] +2025-02-06 01:07:56 - ERROR - stderr - 74%|███████▍ | 16607/22434 [15:00:16<4:03:21, 2.51s/it] 
+2025-02-06 01:07:56 - ERROR - stderr - +2025-02-06 01:07:56 - ERROR - stderr - +2025-02-06 01:07:56 - INFO - stdout - {'loss': 0.4133, 'grad_norm': 1.4582463502883911, 'learning_rate': 3.3348427202797964e-06, 'epoch': 2.22} +2025-02-06 01:07:56 - ERROR - stderr - 74%|███████▍ | 16607/22434 [15:00:16<4:03:21, 2.51s/it] +2025-02-06 01:07:59 - ERROR - stderr - 74%|███████▍ | 16608/22434 [15:00:19<4:04:30, 2.52s/it] +2025-02-06 01:07:59 - ERROR - stderr - +2025-02-06 01:07:59 - ERROR - stderr - +2025-02-06 01:07:59 - INFO - stdout - {'loss': 0.4025, 'grad_norm': 1.6861326694488525, 'learning_rate': 3.3337664896787915e-06, 'epoch': 2.22} +2025-02-06 01:07:59 - ERROR - stderr - 74%|███████▍ | 16608/22434 [15:00:19<4:04:30, 2.52s/it] +2025-02-06 01:08:01 - ERROR - stderr - 74%|███████▍ | 16609/22434 [15:00:21<4:06:23, 2.54s/it] +2025-02-06 01:08:02 - ERROR - stderr - +2025-02-06 01:08:02 - ERROR - stderr - +2025-02-06 01:08:02 - INFO - stdout - {'loss': 0.3809, 'grad_norm': 1.3861110210418701, 'learning_rate': 3.332690398029044e-06, 'epoch': 2.22} +2025-02-06 01:08:02 - ERROR - stderr - 74%|███████▍ | 16609/22434 [15:00:21<4:06:23, 2.54s/it] +2025-02-06 01:08:04 - ERROR - stderr - 74%|███████▍ | 16610/22434 [15:00:24<4:10:37, 2.58s/it] +2025-02-06 01:08:04 - ERROR - stderr - +2025-02-06 01:08:04 - ERROR - stderr - +2025-02-06 01:08:04 - INFO - stdout - {'loss': 0.3958, 'grad_norm': 1.485183835029602, 'learning_rate': 3.3316144453529897e-06, 'epoch': 2.22} +2025-02-06 01:08:04 - ERROR - stderr - 74%|███████▍ | 16610/22434 [15:00:24<4:10:37, 2.58s/it] +2025-02-06 01:08:07 - ERROR - stderr - 74%|███████▍ | 16611/22434 [15:00:27<4:11:13, 2.59s/it] +2025-02-06 01:08:07 - ERROR - stderr - +2025-02-06 01:08:07 - ERROR - stderr - +2025-02-06 01:08:07 - INFO - stdout - {'loss': 0.3335, 'grad_norm': 1.4620007276535034, 'learning_rate': 3.330538631673045e-06, 'epoch': 2.22} +2025-02-06 01:08:07 - ERROR - stderr - 74%|███████▍ | 16611/22434 [15:00:27<4:11:13, 2.59s/it] +2025-02-06 
01:08:09 - ERROR - stderr - 74%|███████▍ | 16612/22434 [15:00:29<4:12:04, 2.60s/it] +2025-02-06 01:08:09 - ERROR - stderr - +2025-02-06 01:08:09 - ERROR - stderr - +2025-02-06 01:08:09 - INFO - stdout - {'loss': 0.3867, 'grad_norm': 1.524143099784851, 'learning_rate': 3.3294629570116453e-06, 'epoch': 2.22} +2025-02-06 01:08:09 - ERROR - stderr - 74%|███████▍ | 16612/22434 [15:00:29<4:12:04, 2.60s/it] +2025-02-06 01:08:12 - ERROR - stderr - 74%|███████▍ | 16613/22434 [15:00:32<4:10:02, 2.58s/it] +2025-02-06 01:08:12 - ERROR - stderr - +2025-02-06 01:08:12 - ERROR - stderr - +2025-02-06 01:08:12 - INFO - stdout - {'loss': 0.3405, 'grad_norm': 1.3652702569961548, 'learning_rate': 3.3283874213912028e-06, 'epoch': 2.22} +2025-02-06 01:08:12 - ERROR - stderr - 74%|███████▍ | 16613/22434 [15:00:32<4:10:02, 2.58s/it] +2025-02-06 01:08:14 - ERROR - stderr - 74%|███████▍ | 16614/22434 [15:00:34<4:09:03, 2.57s/it] +2025-02-06 01:08:14 - ERROR - stderr - +2025-02-06 01:08:14 - ERROR - stderr - +2025-02-06 01:08:14 - INFO - stdout - {'loss': 0.3851, 'grad_norm': 1.5495275259017944, 'learning_rate': 3.3273120248341427e-06, 'epoch': 2.22} +2025-02-06 01:08:14 - ERROR - stderr - 74%|███████▍ | 16614/22434 [15:00:34<4:09:03, 2.57s/it] +2025-02-06 01:08:17 - ERROR - stderr - 74%|███████▍ | 16615/22434 [15:00:37<4:09:16, 2.57s/it] +2025-02-06 01:08:17 - ERROR - stderr - +2025-02-06 01:08:17 - ERROR - stderr - +2025-02-06 01:08:17 - INFO - stdout - {'loss': 0.3956, 'grad_norm': 1.517457365989685, 'learning_rate': 3.3262367673628813e-06, 'epoch': 2.22} +2025-02-06 01:08:17 - ERROR - stderr - 74%|███████▍ | 16615/22434 [15:00:37<4:09:16, 2.57s/it] +2025-02-06 01:08:20 - ERROR - stderr - 74%|███████▍ | 16616/22434 [15:00:39<4:09:06, 2.57s/it] +2025-02-06 01:08:20 - ERROR - stderr - +2025-02-06 01:08:20 - ERROR - stderr - +2025-02-06 01:08:20 - INFO - stdout - {'loss': 0.3753, 'grad_norm': 1.4989700317382812, 'learning_rate': 3.325161648999823e-06, 'epoch': 2.22} +2025-02-06 01:08:20 - 
ERROR - stderr - 74%|███████▍ | 16616/22434 [15:00:39<4:09:06, 2.57s/it] +2025-02-06 01:08:22 - ERROR - stderr - 74%|███████▍ | 16617/22434 [15:00:42<4:05:22, 2.53s/it] +2025-02-06 01:08:22 - ERROR - stderr - +2025-02-06 01:08:22 - ERROR - stderr - +2025-02-06 01:08:22 - INFO - stdout - {'loss': 0.4013, 'grad_norm': 1.8849539756774902, 'learning_rate': 3.324086669767388e-06, 'epoch': 2.22} +2025-02-06 01:08:22 - ERROR - stderr - 74%|███████▍ | 16617/22434 [15:00:42<4:05:22, 2.53s/it] +2025-02-06 01:08:25 - ERROR - stderr - 74%|███████▍ | 16618/22434 [15:00:45<4:12:39, 2.61s/it] +2025-02-06 01:08:25 - ERROR - stderr - +2025-02-06 01:08:25 - ERROR - stderr - +2025-02-06 01:08:25 - INFO - stdout - {'loss': 0.3315, 'grad_norm': 1.2653659582138062, 'learning_rate': 3.3230118296879765e-06, 'epoch': 2.22} +2025-02-06 01:08:25 - ERROR - stderr - 74%|███████▍ | 16618/22434 [15:00:45<4:12:39, 2.61s/it] +2025-02-06 01:08:27 - ERROR - stderr - 74%|███████▍ | 16619/22434 [15:00:47<4:10:33, 2.59s/it] +2025-02-06 01:08:27 - ERROR - stderr - +2025-02-06 01:08:27 - ERROR - stderr - +2025-02-06 01:08:27 - INFO - stdout - {'loss': 0.412, 'grad_norm': 1.6657754182815552, 'learning_rate': 3.321937128783993e-06, 'epoch': 2.22} +2025-02-06 01:08:27 - ERROR - stderr - 74%|███████▍ | 16619/22434 [15:00:47<4:10:33, 2.59s/it] +2025-02-06 01:08:30 - ERROR - stderr - 74%|███████▍ | 16620/22434 [15:00:50<4:09:00, 2.57s/it] +2025-02-06 01:08:30 - ERROR - stderr - +2025-02-06 01:08:30 - ERROR - stderr - +2025-02-06 01:08:30 - INFO - stdout - {'loss': 0.376, 'grad_norm': 1.49312424659729, 'learning_rate': 3.3208625670778403e-06, 'epoch': 2.22} +2025-02-06 01:08:30 - ERROR - stderr - 74%|███████▍ | 16620/22434 [15:00:50<4:09:00, 2.57s/it] +2025-02-06 01:08:32 - ERROR - stderr - 74%|███████▍ | 16621/22434 [15:00:52<4:06:03, 2.54s/it] +2025-02-06 01:08:32 - ERROR - stderr - +2025-02-06 01:08:32 - ERROR - stderr - +2025-02-06 01:08:32 - INFO - stdout - {'loss': 0.3275, 'grad_norm': 1.355749249458313, 
'learning_rate': 3.3197881445919165e-06, 'epoch': 2.22} +2025-02-06 01:08:32 - ERROR - stderr - 74%|███████▍ | 16621/22434 [15:00:52<4:06:03, 2.54s/it] +2025-02-06 01:08:35 - ERROR - stderr - 74%|███████▍ | 16622/22434 [15:00:55<4:06:43, 2.55s/it] +2025-02-06 01:08:35 - ERROR - stderr - +2025-02-06 01:08:35 - ERROR - stderr - +2025-02-06 01:08:35 - INFO - stdout - {'loss': 0.3993, 'grad_norm': 1.5757213830947876, 'learning_rate': 3.318713861348617e-06, 'epoch': 2.22} +2025-02-06 01:08:35 - ERROR - stderr - 74%|███████▍ | 16622/22434 [15:00:55<4:06:43, 2.55s/it] +2025-02-06 01:08:37 - ERROR - stderr - 74%|███████▍ | 16623/22434 [15:00:57<4:07:07, 2.55s/it] +2025-02-06 01:08:38 - ERROR - stderr - +2025-02-06 01:08:38 - ERROR - stderr - +2025-02-06 01:08:38 - INFO - stdout - {'loss': 0.3731, 'grad_norm': 1.6092207431793213, 'learning_rate': 3.3176397173703323e-06, 'epoch': 2.22} +2025-02-06 01:08:38 - ERROR - stderr - 74%|███████▍ | 16623/22434 [15:00:57<4:07:07, 2.55s/it] +2025-02-06 01:08:40 - ERROR - stderr - 74%|███████▍ | 16624/22434 [15:01:00<4:13:23, 2.62s/it] +2025-02-06 01:08:40 - ERROR - stderr - +2025-02-06 01:08:40 - ERROR - stderr - +2025-02-06 01:08:40 - INFO - stdout - {'loss': 0.3595, 'grad_norm': 1.4089053869247437, 'learning_rate': 3.3165657126794537e-06, 'epoch': 2.22} +2025-02-06 01:08:40 - ERROR - stderr - 74%|███████▍ | 16624/22434 [15:01:00<4:13:23, 2.62s/it] +2025-02-06 01:08:43 - ERROR - stderr - 74%|███████▍ | 16625/22434 [15:01:03<4:12:30, 2.61s/it] +2025-02-06 01:08:43 - ERROR - stderr - +2025-02-06 01:08:43 - ERROR - stderr - +2025-02-06 01:08:43 - INFO - stdout - {'loss': 0.3834, 'grad_norm': 1.3993682861328125, 'learning_rate': 3.3154918472983687e-06, 'epoch': 2.22} +2025-02-06 01:08:43 - ERROR - stderr - 74%|███████▍ | 16625/22434 [15:01:03<4:12:30, 2.61s/it] +2025-02-06 01:08:45 - ERROR - stderr - 74%|███████▍ | 16626/22434 [15:01:05<4:10:07, 2.58s/it] +2025-02-06 01:08:45 - ERROR - stderr - +2025-02-06 01:08:45 - ERROR - stderr - 
+2025-02-06 01:08:45 - INFO - stdout - {'loss': 0.4148, 'grad_norm': 1.542460560798645, 'learning_rate': 3.314418121249459e-06, 'epoch': 2.22} +2025-02-06 01:08:45 - ERROR - stderr - 74%|███████▍ | 16626/22434 [15:01:05<4:10:07, 2.58s/it] +2025-02-06 01:08:48 - ERROR - stderr - 74%|███████▍ | 16627/22434 [15:01:08<4:08:36, 2.57s/it] +2025-02-06 01:08:48 - ERROR - stderr - +2025-02-06 01:08:48 - ERROR - stderr - +2025-02-06 01:08:48 - INFO - stdout - {'loss': 0.4478, 'grad_norm': 1.5561797618865967, 'learning_rate': 3.313344534555106e-06, 'epoch': 2.22} +2025-02-06 01:08:48 - ERROR - stderr - 74%|███████▍ | 16627/22434 [15:01:08<4:08:36, 2.57s/it] +2025-02-06 01:08:50 - ERROR - stderr - 74%|███████▍ | 16628/22434 [15:01:10<4:06:19, 2.55s/it] +2025-02-06 01:08:50 - ERROR - stderr - +2025-02-06 01:08:50 - ERROR - stderr - +2025-02-06 01:08:50 - INFO - stdout - {'loss': 0.3922, 'grad_norm': 1.5955150127410889, 'learning_rate': 3.3122710872376875e-06, 'epoch': 2.22} +2025-02-06 01:08:50 - ERROR - stderr - 74%|███████▍ | 16628/22434 [15:01:10<4:06:19, 2.55s/it] +2025-02-06 01:08:53 - ERROR - stderr - 74%|███████▍ | 16629/22434 [15:01:13<4:04:09, 2.52s/it] +2025-02-06 01:08:53 - ERROR - stderr - +2025-02-06 01:08:53 - ERROR - stderr - +2025-02-06 01:08:53 - INFO - stdout - {'loss': 0.3934, 'grad_norm': 1.4200388193130493, 'learning_rate': 3.3111977793195794e-06, 'epoch': 2.22} +2025-02-06 01:08:53 - ERROR - stderr - 74%|███████▍ | 16629/22434 [15:01:13<4:04:09, 2.52s/it] +2025-02-06 01:08:55 - ERROR - stderr - 74%|███████▍ | 16630/22434 [15:01:15<4:01:26, 2.50s/it] +2025-02-06 01:08:55 - ERROR - stderr - +2025-02-06 01:08:55 - ERROR - stderr - +2025-02-06 01:08:55 - INFO - stdout - {'loss': 0.3919, 'grad_norm': 1.5386641025543213, 'learning_rate': 3.310124610823152e-06, 'epoch': 2.22} +2025-02-06 01:08:55 - ERROR - stderr - 74%|███████▍ | 16630/22434 [15:01:15<4:01:26, 2.50s/it] +2025-02-06 01:08:58 - ERROR - stderr - 74%|███████▍ | 16631/22434 [15:01:18<4:00:43, 
2.49s/it] +2025-02-06 01:08:58 - ERROR - stderr - +2025-02-06 01:08:58 - ERROR - stderr - +2025-02-06 01:08:58 - INFO - stdout - {'loss': 0.4032, 'grad_norm': 1.5449252128601074, 'learning_rate': 3.3090515817707803e-06, 'epoch': 2.22} +2025-02-06 01:08:58 - ERROR - stderr - 74%|███████▍ | 16631/22434 [15:01:18<4:00:43, 2.49s/it] +2025-02-06 01:09:00 - ERROR - stderr - 74%|███████▍ | 16632/22434 [15:01:20<4:01:04, 2.49s/it] +2025-02-06 01:09:00 - ERROR - stderr - +2025-02-06 01:09:00 - ERROR - stderr - +2025-02-06 01:09:00 - INFO - stdout - {'loss': 0.418, 'grad_norm': 1.5345039367675781, 'learning_rate': 3.307978692184819e-06, 'epoch': 2.22} +2025-02-06 01:09:00 - ERROR - stderr - 74%|███████▍ | 16632/22434 [15:01:20<4:01:04, 2.49s/it] +2025-02-06 01:09:03 - ERROR - stderr - 74%|███████▍ | 16633/22434 [15:01:22<3:59:42, 2.48s/it] +2025-02-06 01:09:03 - ERROR - stderr - +2025-02-06 01:09:03 - ERROR - stderr - +2025-02-06 01:09:03 - INFO - stdout - {'loss': 0.3802, 'grad_norm': 1.3569824695587158, 'learning_rate': 3.30690594208764e-06, 'epoch': 2.22} +2025-02-06 01:09:03 - ERROR - stderr - 74%|███████▍ | 16633/22434 [15:01:23<3:59:42, 2.48s/it] +2025-02-06 01:09:05 - ERROR - stderr - 74%|███████▍ | 16634/22434 [15:01:25<4:00:34, 2.49s/it] +2025-02-06 01:09:05 - ERROR - stderr - +2025-02-06 01:09:05 - ERROR - stderr - +2025-02-06 01:09:05 - INFO - stdout - {'loss': 0.3931, 'grad_norm': 1.5612645149230957, 'learning_rate': 3.3058333315016066e-06, 'epoch': 2.22} +2025-02-06 01:09:05 - ERROR - stderr - 74%|███████▍ | 16634/22434 [15:01:25<4:00:34, 2.49s/it] +2025-02-06 01:09:08 - ERROR - stderr - 74%|███████▍ | 16635/22434 [15:01:28<4:01:31, 2.50s/it] +2025-02-06 01:09:08 - ERROR - stderr - +2025-02-06 01:09:08 - ERROR - stderr - +2025-02-06 01:09:08 - INFO - stdout - {'loss': 0.3992, 'grad_norm': 1.4902763366699219, 'learning_rate': 3.3047608604490655e-06, 'epoch': 2.22} +2025-02-06 01:09:08 - ERROR - stderr - 74%|███████▍ | 16635/22434 [15:01:28<4:01:31, 2.50s/it] 
+2025-02-06 01:09:10 - ERROR - stderr - 74%|███████▍ | 16636/22434 [15:01:30<4:01:19, 2.50s/it] +2025-02-06 01:09:10 - ERROR - stderr - +2025-02-06 01:09:10 - ERROR - stderr - +2025-02-06 01:09:10 - INFO - stdout - {'loss': 0.3721, 'grad_norm': 1.4195106029510498, 'learning_rate': 3.3036885289523836e-06, 'epoch': 2.22} +2025-02-06 01:09:10 - ERROR - stderr - 74%|███████▍ | 16636/22434 [15:01:30<4:01:19, 2.50s/it] +2025-02-06 01:09:13 - ERROR - stderr - 74%|███████▍ | 16637/22434 [15:01:33<4:01:19, 2.50s/it] +2025-02-06 01:09:13 - ERROR - stderr - +2025-02-06 01:09:13 - ERROR - stderr - +2025-02-06 01:09:13 - INFO - stdout - {'loss': 0.3923, 'grad_norm': 1.5231140851974487, 'learning_rate': 3.3026163370339e-06, 'epoch': 2.22} +2025-02-06 01:09:13 - ERROR - stderr - 74%|███████▍ | 16637/22434 [15:01:33<4:01:19, 2.50s/it] +2025-02-06 01:09:15 - ERROR - stderr - 74%|███████▍ | 16638/22434 [15:01:35<4:02:40, 2.51s/it] +2025-02-06 01:09:15 - ERROR - stderr - +2025-02-06 01:09:15 - ERROR - stderr - +2025-02-06 01:09:15 - INFO - stdout - {'loss': 0.3779, 'grad_norm': 1.5485320091247559, 'learning_rate': 3.3015442847159772e-06, 'epoch': 2.22} +2025-02-06 01:09:15 - ERROR - stderr - 74%|███████▍ | 16638/22434 [15:01:35<4:02:40, 2.51s/it] +2025-02-06 01:09:18 - ERROR - stderr - 74%|███████▍ | 16639/22434 [15:01:38<4:02:30, 2.51s/it] +2025-02-06 01:09:18 - ERROR - stderr - +2025-02-06 01:09:18 - ERROR - stderr - +2025-02-06 01:09:18 - INFO - stdout - {'loss': 0.3841, 'grad_norm': 1.6018744707107544, 'learning_rate': 3.3004723720209507e-06, 'epoch': 2.23} +2025-02-06 01:09:18 - ERROR - stderr - 74%|███████▍ | 16639/22434 [15:01:38<4:02:30, 2.51s/it] +2025-02-06 01:09:20 - ERROR - stderr - 74%|███████▍ | 16640/22434 [15:01:40<4:02:59, 2.52s/it] +2025-02-06 01:09:20 - ERROR - stderr - +2025-02-06 01:09:20 - ERROR - stderr - +2025-02-06 01:09:20 - INFO - stdout - {'loss': 0.3741, 'grad_norm': 1.3405845165252686, 'learning_rate': 3.2994005989711664e-06, 'epoch': 2.23} +2025-02-06 
01:09:20 - ERROR - stderr - 74%|███████▍ | 16640/22434 [15:01:40<4:02:59, 2.52s/it] +2025-02-06 01:09:23 - ERROR - stderr - 74%|███████▍ | 16641/22434 [15:01:43<4:06:21, 2.55s/it] +2025-02-06 01:09:23 - ERROR - stderr - +2025-02-06 01:09:23 - ERROR - stderr - +2025-02-06 01:09:23 - INFO - stdout - {'loss': 0.4035, 'grad_norm': 1.5225411653518677, 'learning_rate': 3.298328965588966e-06, 'epoch': 2.23} +2025-02-06 01:09:23 - ERROR - stderr - 74%|███████▍ | 16641/22434 [15:01:43<4:06:21, 2.55s/it] +2025-02-06 01:09:26 - ERROR - stderr - 74%|███████▍ | 16642/22434 [15:01:45<4:07:02, 2.56s/it] +2025-02-06 01:09:26 - ERROR - stderr - +2025-02-06 01:09:26 - ERROR - stderr - +2025-02-06 01:09:26 - INFO - stdout - {'loss': 0.324, 'grad_norm': 1.5103999376296997, 'learning_rate': 3.2972574718966845e-06, 'epoch': 2.23} +2025-02-06 01:09:26 - ERROR - stderr - 74%|███████▍ | 16642/22434 [15:01:45<4:07:02, 2.56s/it] +2025-02-06 01:09:28 - ERROR - stderr - 74%|███████▍ | 16643/22434 [15:01:48<4:07:24, 2.56s/it] +2025-02-06 01:09:28 - ERROR - stderr - +2025-02-06 01:09:28 - ERROR - stderr - +2025-02-06 01:09:28 - INFO - stdout - {'loss': 0.356, 'grad_norm': 1.3894184827804565, 'learning_rate': 3.2961861179166568e-06, 'epoch': 2.23} +2025-02-06 01:09:28 - ERROR - stderr - 74%|███████▍ | 16643/22434 [15:01:48<4:07:24, 2.56s/it] +2025-02-06 01:09:31 - ERROR - stderr - 74%|███████▍ | 16644/22434 [15:01:50<4:04:22, 2.53s/it] +2025-02-06 01:09:31 - ERROR - stderr - +2025-02-06 01:09:31 - ERROR - stderr - +2025-02-06 01:09:31 - INFO - stdout - {'loss': 0.3557, 'grad_norm': 1.7332139015197754, 'learning_rate': 3.2951149036712147e-06, 'epoch': 2.23} +2025-02-06 01:09:31 - ERROR - stderr - 74%|███████▍ | 16644/22434 [15:01:50<4:04:22, 2.53s/it] +2025-02-06 01:09:33 - ERROR - stderr - 74%|███████▍ | 16645/22434 [15:01:53<4:02:32, 2.51s/it] +2025-02-06 01:09:33 - ERROR - stderr - +2025-02-06 01:09:33 - ERROR - stderr - +2025-02-06 01:09:33 - INFO - stdout - {'loss': 0.4069, 'grad_norm': 
1.7833398580551147, 'learning_rate': 3.2940438291826883e-06, 'epoch': 2.23} +2025-02-06 01:09:33 - ERROR - stderr - 74%|███████▍ | 16645/22434 [15:01:53<4:02:32, 2.51s/it] +2025-02-06 01:09:36 - ERROR - stderr - 74%|███████▍ | 16646/22434 [15:01:55<4:01:29, 2.50s/it] +2025-02-06 01:09:36 - ERROR - stderr - +2025-02-06 01:09:36 - ERROR - stderr - +2025-02-06 01:09:36 - INFO - stdout - {'loss': 0.3409, 'grad_norm': 1.4716541767120361, 'learning_rate': 3.2929728944733997e-06, 'epoch': 2.23} +2025-02-06 01:09:36 - ERROR - stderr - 74%|███████▍ | 16646/22434 [15:01:55<4:01:29, 2.50s/it] +2025-02-06 01:09:38 - ERROR - stderr - 74%|███████▍ | 16647/22434 [15:01:58<4:02:28, 2.51s/it] +2025-02-06 01:09:38 - ERROR - stderr - +2025-02-06 01:09:38 - ERROR - stderr - +2025-02-06 01:09:38 - INFO - stdout - {'loss': 0.3839, 'grad_norm': 1.4921190738677979, 'learning_rate': 3.2919020995656735e-06, 'epoch': 2.23} +2025-02-06 01:09:38 - ERROR - stderr - 74%|███████▍ | 16647/22434 [15:01:58<4:02:28, 2.51s/it] +2025-02-06 01:09:40 - ERROR - stderr - 74%|███████▍ | 16648/22434 [15:02:00<4:00:12, 2.49s/it] +2025-02-06 01:09:41 - ERROR - stderr - +2025-02-06 01:09:41 - ERROR - stderr - +2025-02-06 01:09:41 - INFO - stdout - {'loss': 0.3773, 'grad_norm': 1.5928740501403809, 'learning_rate': 3.290831444481829e-06, 'epoch': 2.23} +2025-02-06 01:09:41 - ERROR - stderr - 74%|███████▍ | 16648/22434 [15:02:00<4:00:12, 2.49s/it] +2025-02-06 01:09:43 - ERROR - stderr - 74%|███████▍ | 16649/22434 [15:02:03<4:00:58, 2.50s/it] +2025-02-06 01:09:43 - ERROR - stderr - +2025-02-06 01:09:43 - ERROR - stderr - +2025-02-06 01:09:43 - INFO - stdout - {'loss': 0.516, 'grad_norm': 1.9459527730941772, 'learning_rate': 3.2897609292441834e-06, 'epoch': 2.23} +2025-02-06 01:09:43 - ERROR - stderr - 74%|███████▍ | 16649/22434 [15:02:03<4:00:58, 2.50s/it] +2025-02-06 01:09:45 - ERROR - stderr - 74%|███████▍ | 16650/22434 [15:02:05<4:00:01, 2.49s/it] +2025-02-06 01:09:46 - ERROR - stderr - +2025-02-06 01:09:46 - 
ERROR - stderr - +2025-02-06 01:09:46 - INFO - stdout - {'loss': 0.3862, 'grad_norm': 1.4410967826843262, 'learning_rate': 3.2886905538750523e-06, 'epoch': 2.23} +2025-02-06 01:09:46 - ERROR - stderr - 74%|███████▍ | 16650/22434 [15:02:05<4:00:01, 2.49s/it] +2025-02-06 01:09:48 - ERROR - stderr - 74%|███████▍ | 16651/22434 [15:02:08<4:01:33, 2.51s/it] +2025-02-06 01:09:48 - ERROR - stderr - +2025-02-06 01:09:48 - ERROR - stderr - +2025-02-06 01:09:48 - INFO - stdout - {'loss': 0.3557, 'grad_norm': 1.634263515472412, 'learning_rate': 3.287620318396739e-06, 'epoch': 2.23} +2025-02-06 01:09:48 - ERROR - stderr - 74%|███████▍ | 16651/22434 [15:02:08<4:01:33, 2.51s/it] +2025-02-06 01:09:51 - ERROR - stderr - 74%|███████▍ | 16652/22434 [15:02:10<4:03:01, 2.52s/it] +2025-02-06 01:09:51 - ERROR - stderr - +2025-02-06 01:09:51 - ERROR - stderr - +2025-02-06 01:09:51 - INFO - stdout - {'loss': 0.397, 'grad_norm': 1.7036445140838623, 'learning_rate': 3.2865502228315615e-06, 'epoch': 2.23} +2025-02-06 01:09:51 - ERROR - stderr - 74%|███████▍ | 16652/22434 [15:02:10<4:03:01, 2.52s/it] +2025-02-06 01:09:53 - ERROR - stderr - 74%|███████▍ | 16653/22434 [15:02:13<4:02:09, 2.51s/it] +2025-02-06 01:09:53 - ERROR - stderr - +2025-02-06 01:09:53 - ERROR - stderr - +2025-02-06 01:09:53 - INFO - stdout - {'loss': 0.3808, 'grad_norm': 1.5910916328430176, 'learning_rate': 3.2854802672018194e-06, 'epoch': 2.23} +2025-02-06 01:09:53 - ERROR - stderr - 74%|███████▍ | 16653/22434 [15:02:13<4:02:09, 2.51s/it] +2025-02-06 01:09:56 - ERROR - stderr - 74%|███████▍ | 16654/22434 [15:02:15<3:59:50, 2.49s/it] +2025-02-06 01:09:56 - ERROR - stderr - +2025-02-06 01:09:56 - ERROR - stderr - +2025-02-06 01:09:56 - INFO - stdout - {'loss': 0.3961, 'grad_norm': 1.559190273284912, 'learning_rate': 3.284410451529816e-06, 'epoch': 2.23} +2025-02-06 01:09:56 - ERROR - stderr - 74%|███████▍ | 16654/22434 [15:02:15<3:59:50, 2.49s/it] +2025-02-06 01:09:58 - ERROR - stderr - 74%|███████▍ | 16655/22434 
[15:02:18<4:00:49, 2.50s/it] +2025-02-06 01:09:58 - ERROR - stderr - +2025-02-06 01:09:58 - ERROR - stderr - +2025-02-06 01:09:58 - INFO - stdout - {'loss': 0.3922, 'grad_norm': 1.596232295036316, 'learning_rate': 3.2833407758378534e-06, 'epoch': 2.23} +2025-02-06 01:09:58 - ERROR - stderr - 74%|███████▍ | 16655/22434 [15:02:18<4:00:49, 2.50s/it] +2025-02-06 01:10:01 - ERROR - stderr - 74%|███████▍ | 16656/22434 [15:02:20<4:00:11, 2.49s/it] +2025-02-06 01:10:01 - ERROR - stderr - +2025-02-06 01:10:01 - ERROR - stderr - +2025-02-06 01:10:01 - INFO - stdout - {'loss': 0.3737, 'grad_norm': 1.4821006059646606, 'learning_rate': 3.282271240148219e-06, 'epoch': 2.23} +2025-02-06 01:10:01 - ERROR - stderr - 74%|███████▍ | 16656/22434 [15:02:20<4:00:11, 2.49s/it] +2025-02-06 01:10:03 - ERROR - stderr - 74%|███████▍ | 16657/22434 [15:02:23<4:08:32, 2.58s/it] +2025-02-06 01:10:03 - ERROR - stderr - +2025-02-06 01:10:03 - ERROR - stderr - +2025-02-06 01:10:03 - INFO - stdout - {'loss': 0.3576, 'grad_norm': 1.533178448677063, 'learning_rate': 3.2812018444832195e-06, 'epoch': 2.23} +2025-02-06 01:10:03 - ERROR - stderr - 74%|███████▍ | 16657/22434 [15:02:23<4:08:32, 2.58s/it] +2025-02-06 01:10:06 - ERROR - stderr - 74%|███████▍ | 16658/22434 [15:02:26<4:05:57, 2.56s/it] +2025-02-06 01:10:06 - ERROR - stderr - +2025-02-06 01:10:06 - ERROR - stderr - +2025-02-06 01:10:06 - INFO - stdout - {'loss': 0.3122, 'grad_norm': 1.377198576927185, 'learning_rate': 3.2801325888651313e-06, 'epoch': 2.23} +2025-02-06 01:10:06 - ERROR - stderr - 74%|███████▍ | 16658/22434 [15:02:26<4:05:57, 2.56s/it] +2025-02-06 01:10:09 - ERROR - stderr - 74%|███████▍ | 16659/22434 [15:02:28<4:11:57, 2.62s/it] +2025-02-06 01:10:09 - ERROR - stderr - +2025-02-06 01:10:09 - ERROR - stderr - +2025-02-06 01:10:09 - INFO - stdout - {'loss': 0.3309, 'grad_norm': 1.4968308210372925, 'learning_rate': 3.2790634733162563e-06, 'epoch': 2.23} +2025-02-06 01:10:09 - ERROR - stderr - 74%|███████▍ | 16659/22434 
[15:02:28<4:11:57, 2.62s/it] +2025-02-06 01:10:11 - ERROR - stderr - 74%|███████▍ | 16660/22434 [15:02:31<4:08:48, 2.59s/it] +2025-02-06 01:10:11 - ERROR - stderr - +2025-02-06 01:10:11 - ERROR - stderr - +2025-02-06 01:10:11 - INFO - stdout - {'loss': 0.4114, 'grad_norm': 1.7413092851638794, 'learning_rate': 3.2779944978588686e-06, 'epoch': 2.23} +2025-02-06 01:10:11 - ERROR - stderr - 74%|███████▍ | 16660/22434 [15:02:31<4:08:48, 2.59s/it] +2025-02-06 01:10:14 - ERROR - stderr - 74%|███████▍ | 16661/22434 [15:02:33<4:07:40, 2.57s/it] +2025-02-06 01:10:14 - ERROR - stderr - +2025-02-06 01:10:14 - ERROR - stderr - +2025-02-06 01:10:14 - INFO - stdout - {'loss': 0.3782, 'grad_norm': 1.614399790763855, 'learning_rate': 3.276925662515249e-06, 'epoch': 2.23} +2025-02-06 01:10:14 - ERROR - stderr - 74%|███████▍ | 16661/22434 [15:02:33<4:07:40, 2.57s/it] +2025-02-06 01:10:16 - ERROR - stderr - 74%|███████▍ | 16662/22434 [15:02:36<4:04:26, 2.54s/it] +2025-02-06 01:10:16 - ERROR - stderr - +2025-02-06 01:10:16 - ERROR - stderr - +2025-02-06 01:10:16 - INFO - stdout - {'loss': 0.3447, 'grad_norm': 1.5151318311691284, 'learning_rate': 3.275856967307688e-06, 'epoch': 2.23} +2025-02-06 01:10:16 - ERROR - stderr - 74%|███████▍ | 16662/22434 [15:02:36<4:04:26, 2.54s/it] +2025-02-06 01:10:19 - ERROR - stderr - 74%|███████▍ | 16663/22434 [15:02:38<4:04:28, 2.54s/it] +2025-02-06 01:10:19 - ERROR - stderr - +2025-02-06 01:10:19 - ERROR - stderr - +2025-02-06 01:10:19 - INFO - stdout - {'loss': 0.378, 'grad_norm': 1.489237666130066, 'learning_rate': 3.2747884122584504e-06, 'epoch': 2.23} +2025-02-06 01:10:19 - ERROR - stderr - 74%|███████▍ | 16663/22434 [15:02:38<4:04:28, 2.54s/it] +2025-02-06 01:10:21 - ERROR - stderr - 74%|███████▍ | 16664/22434 [15:02:41<4:07:31, 2.57s/it] +2025-02-06 01:10:21 - ERROR - stderr - +2025-02-06 01:10:21 - ERROR - stderr - +2025-02-06 01:10:21 - INFO - stdout - {'loss': 0.4598, 'grad_norm': 1.902297019958496, 'learning_rate': 3.2737199973898136e-06, 
'epoch': 2.23} +2025-02-06 01:10:21 - ERROR - stderr - 74%|███████▍ | 16664/22434 [15:02:41<4:07:31, 2.57s/it] +2025-02-06 01:10:24 - ERROR - stderr - 74%|███████▍ | 16665/22434 [15:02:44<4:04:34, 2.54s/it] +2025-02-06 01:10:24 - ERROR - stderr - +2025-02-06 01:10:24 - ERROR - stderr - +2025-02-06 01:10:24 - INFO - stdout - {'loss': 0.3169, 'grad_norm': 1.259997844696045, 'learning_rate': 3.272651722724047e-06, 'epoch': 2.23} +2025-02-06 01:10:24 - ERROR - stderr - 74%|███████▍ | 16665/22434 [15:02:44<4:04:34, 2.54s/it] +2025-02-06 01:10:26 - ERROR - stderr - 74%|███████▍ | 16666/22434 [15:02:46<4:02:37, 2.52s/it] +2025-02-06 01:10:26 - ERROR - stderr - +2025-02-06 01:10:26 - ERROR - stderr - +2025-02-06 01:10:26 - INFO - stdout - {'loss': 0.3542, 'grad_norm': 1.5152636766433716, 'learning_rate': 3.271583588283418e-06, 'epoch': 2.23} +2025-02-06 01:10:26 - ERROR - stderr - 74%|███████▍ | 16666/22434 [15:02:46<4:02:37, 2.52s/it] +2025-02-06 01:10:29 - ERROR - stderr - 74%|███████▍ | 16667/22434 [15:02:48<4:01:01, 2.51s/it] +2025-02-06 01:10:29 - ERROR - stderr - +2025-02-06 01:10:29 - ERROR - stderr - +2025-02-06 01:10:29 - INFO - stdout - {'loss': 0.3833, 'grad_norm': 1.405554175376892, 'learning_rate': 3.27051559409019e-06, 'epoch': 2.23} +2025-02-06 01:10:29 - ERROR - stderr - 74%|███████▍ | 16667/22434 [15:02:49<4:01:01, 2.51s/it] +2025-02-06 01:10:31 - ERROR - stderr - 74%|███████▍ | 16668/22434 [15:02:51<4:05:12, 2.55s/it] +2025-02-06 01:10:31 - ERROR - stderr - +2025-02-06 01:10:31 - ERROR - stderr - +2025-02-06 01:10:31 - INFO - stdout - {'loss': 0.396, 'grad_norm': 1.508276104927063, 'learning_rate': 3.2694477401666257e-06, 'epoch': 2.23} +2025-02-06 01:10:31 - ERROR - stderr - 74%|███████▍ | 16668/22434 [15:02:51<4:05:12, 2.55s/it] +2025-02-06 01:10:34 - ERROR - stderr - 74%|███████▍ | 16669/22434 [15:02:54<4:03:25, 2.53s/it] +2025-02-06 01:10:34 - ERROR - stderr - +2025-02-06 01:10:34 - ERROR - stderr - +2025-02-06 01:10:34 - INFO - stdout - {'loss': 
0.342, 'grad_norm': 1.3172376155853271, 'learning_rate': 3.268380026534983e-06, 'epoch': 2.23} +2025-02-06 01:10:34 - ERROR - stderr - 74%|███████▍ | 16669/22434 [15:02:54<4:03:25, 2.53s/it] +2025-02-06 01:10:36 - ERROR - stderr - 74%|███████▍ | 16670/22434 [15:02:56<4:03:38, 2.54s/it] +2025-02-06 01:10:36 - ERROR - stderr - +2025-02-06 01:10:36 - ERROR - stderr - +2025-02-06 01:10:36 - INFO - stdout - {'loss': 0.298, 'grad_norm': 1.3303909301757812, 'learning_rate': 3.267312453217517e-06, 'epoch': 2.23} +2025-02-06 01:10:36 - ERROR - stderr - 74%|███████▍ | 16670/22434 [15:02:56<4:03:38, 2.54s/it] +2025-02-06 01:10:39 - ERROR - stderr - 74%|███████▍ | 16671/22434 [15:02:59<4:02:52, 2.53s/it] +2025-02-06 01:10:39 - ERROR - stderr - +2025-02-06 01:10:39 - ERROR - stderr - +2025-02-06 01:10:39 - INFO - stdout - {'loss': 0.3517, 'grad_norm': 1.492348551750183, 'learning_rate': 3.2662450202364806e-06, 'epoch': 2.23} +2025-02-06 01:10:39 - ERROR - stderr - 74%|███████▍ | 16671/22434 [15:02:59<4:02:52, 2.53s/it] +2025-02-06 01:10:41 - ERROR - stderr - 74%|███████▍ | 16672/22434 [15:03:01<4:03:03, 2.53s/it] +2025-02-06 01:10:41 - ERROR - stderr - +2025-02-06 01:10:41 - ERROR - stderr - +2025-02-06 01:10:41 - INFO - stdout - {'loss': 0.408, 'grad_norm': 1.4973037242889404, 'learning_rate': 3.265177727614123e-06, 'epoch': 2.23} +2025-02-06 01:10:41 - ERROR - stderr - 74%|███████▍ | 16672/22434 [15:03:01<4:03:03, 2.53s/it] +2025-02-06 01:10:44 - ERROR - stderr - 74%|███████▍ | 16673/22434 [15:03:04<4:01:42, 2.52s/it] +2025-02-06 01:10:44 - ERROR - stderr - +2025-02-06 01:10:44 - ERROR - stderr - +2025-02-06 01:10:44 - INFO - stdout - {'loss': 0.3762, 'grad_norm': 1.4724109172821045, 'learning_rate': 3.26411057537269e-06, 'epoch': 2.23} +2025-02-06 01:10:44 - ERROR - stderr - 74%|███████▍ | 16673/22434 [15:03:04<4:01:42, 2.52s/it] +2025-02-06 01:10:47 - ERROR - stderr - 74%|███████▍ | 16674/22434 [15:03:07<4:15:07, 2.66s/it] +2025-02-06 01:10:47 - ERROR - stderr - +2025-02-06 
01:10:47 - ERROR - stderr - +2025-02-06 01:10:47 - INFO - stdout - {'loss': 0.4055, 'grad_norm': 1.5360755920410156, 'learning_rate': 3.2630435635344283e-06, 'epoch': 2.23} +2025-02-06 01:10:47 - ERROR - stderr - 74%|███████▍ | 16674/22434 [15:03:07<4:15:07, 2.66s/it] +2025-02-06 01:10:50 - ERROR - stderr - 74%|███████▍ | 16675/22434 [15:03:09<4:15:10, 2.66s/it] +2025-02-06 01:10:50 - ERROR - stderr - +2025-02-06 01:10:50 - ERROR - stderr - +2025-02-06 01:10:50 - INFO - stdout - {'loss': 0.3335, 'grad_norm': 1.3943054676055908, 'learning_rate': 3.2619766921215755e-06, 'epoch': 2.23} +2025-02-06 01:10:50 - ERROR - stderr - 74%|███████▍ | 16675/22434 [15:03:09<4:15:10, 2.66s/it] +2025-02-06 01:10:52 - ERROR - stderr - 74%|███████▍ | 16676/22434 [15:03:12<4:12:17, 2.63s/it] +2025-02-06 01:10:52 - ERROR - stderr - +2025-02-06 01:10:52 - ERROR - stderr - +2025-02-06 01:10:52 - INFO - stdout - {'loss': 0.4007, 'grad_norm': 1.586645483970642, 'learning_rate': 3.2609099611563754e-06, 'epoch': 2.23} +2025-02-06 01:10:52 - ERROR - stderr - 74%|███████▍ | 16676/22434 [15:03:12<4:12:17, 2.63s/it] +2025-02-06 01:10:55 - ERROR - stderr - 74%|███████▍ | 16677/22434 [15:03:14<4:07:16, 2.58s/it] +2025-02-06 01:10:55 - ERROR - stderr - +2025-02-06 01:10:55 - ERROR - stderr - +2025-02-06 01:10:55 - INFO - stdout - {'loss': 0.3651, 'grad_norm': 1.4012998342514038, 'learning_rate': 3.259843370661051e-06, 'epoch': 2.23} +2025-02-06 01:10:55 - ERROR - stderr - 74%|███████▍ | 16677/22434 [15:03:14<4:07:16, 2.58s/it] +2025-02-06 01:10:57 - ERROR - stderr - 74%|███████▍ | 16678/22434 [15:03:17<4:07:42, 2.58s/it] +2025-02-06 01:10:57 - ERROR - stderr - +2025-02-06 01:10:57 - ERROR - stderr - +2025-02-06 01:10:57 - INFO - stdout - {'loss': 0.3451, 'grad_norm': 1.3928658962249756, 'learning_rate': 3.258776920657849e-06, 'epoch': 2.23} +2025-02-06 01:10:57 - ERROR - stderr - 74%|███████▍ | 16678/22434 [15:03:17<4:07:42, 2.58s/it] +2025-02-06 01:11:00 - ERROR - stderr - 74%|███████▍ | 
16679/22434 [15:03:19<4:05:08, 2.56s/it] +2025-02-06 01:11:00 - ERROR - stderr - +2025-02-06 01:11:00 - ERROR - stderr - +2025-02-06 01:11:00 - INFO - stdout - {'loss': 0.4097, 'grad_norm': 1.5631343126296997, 'learning_rate': 3.2577106111689884e-06, 'epoch': 2.23} +2025-02-06 01:11:00 - ERROR - stderr - 74%|███████▍ | 16679/22434 [15:03:19<4:05:08, 2.56s/it] +2025-02-06 01:11:02 - ERROR - stderr - 74%|███████▍ | 16680/22434 [15:03:22<4:02:42, 2.53s/it] +2025-02-06 01:11:02 - ERROR - stderr - +2025-02-06 01:11:02 - ERROR - stderr - +2025-02-06 01:11:02 - INFO - stdout - {'loss': 0.3993, 'grad_norm': 1.497756838798523, 'learning_rate': 3.2566444422166955e-06, 'epoch': 2.23} +2025-02-06 01:11:02 - ERROR - stderr - 74%|███████▍ | 16680/22434 [15:03:22<4:02:42, 2.53s/it] +2025-02-06 01:11:05 - ERROR - stderr - 74%|███████▍ | 16681/22434 [15:03:24<4:03:40, 2.54s/it] +2025-02-06 01:11:05 - ERROR - stderr - +2025-02-06 01:11:05 - ERROR - stderr - +2025-02-06 01:11:05 - INFO - stdout - {'loss': 0.3837, 'grad_norm': 1.5008302927017212, 'learning_rate': 3.2555784138232014e-06, 'epoch': 2.23} +2025-02-06 01:11:05 - ERROR - stderr - 74%|███████▍ | 16681/22434 [15:03:25<4:03:40, 2.54s/it] +2025-02-06 01:11:07 - ERROR - stderr - 74%|███████▍ | 16682/22434 [15:03:27<4:03:01, 2.54s/it] +2025-02-06 01:11:07 - ERROR - stderr - +2025-02-06 01:11:07 - ERROR - stderr - +2025-02-06 01:11:07 - INFO - stdout - {'loss': 0.3267, 'grad_norm': 1.3706705570220947, 'learning_rate': 3.254512526010717e-06, 'epoch': 2.23} +2025-02-06 01:11:07 - ERROR - stderr - 74%|███████▍ | 16682/22434 [15:03:27<4:03:01, 2.54s/it] +2025-02-06 01:11:10 - ERROR - stderr - 74%|███████▍ | 16683/22434 [15:03:29<4:01:57, 2.52s/it] +2025-02-06 01:11:10 - ERROR - stderr - +2025-02-06 01:11:10 - ERROR - stderr - +2025-02-06 01:11:10 - INFO - stdout - {'loss': 0.3412, 'grad_norm': 1.3972560167312622, 'learning_rate': 3.25344677880147e-06, 'epoch': 2.23} +2025-02-06 01:11:10 - ERROR - stderr - 74%|███████▍ | 16683/22434 
[15:03:30<4:01:57, 2.52s/it] +2025-02-06 01:11:12 - ERROR - stderr - 74%|███████▍ | 16684/22434 [15:03:32<4:03:55, 2.55s/it] +2025-02-06 01:11:12 - ERROR - stderr - +2025-02-06 01:11:12 - ERROR - stderr - +2025-02-06 01:11:12 - INFO - stdout - {'loss': 0.4121, 'grad_norm': 1.5514709949493408, 'learning_rate': 3.2523811722176657e-06, 'epoch': 2.23} +2025-02-06 01:11:12 - ERROR - stderr - 74%|███████▍ | 16684/22434 [15:03:32<4:03:55, 2.55s/it] +2025-02-06 01:11:15 - ERROR - stderr - 74%|███████▍ | 16685/22434 [15:03:35<4:00:50, 2.51s/it] +2025-02-06 01:11:15 - ERROR - stderr - +2025-02-06 01:11:15 - ERROR - stderr - +2025-02-06 01:11:15 - INFO - stdout - {'loss': 0.3564, 'grad_norm': 1.3561575412750244, 'learning_rate': 3.251315706281519e-06, 'epoch': 2.23} +2025-02-06 01:11:15 - ERROR - stderr - 74%|███████▍ | 16685/22434 [15:03:35<4:00:50, 2.51s/it] +2025-02-06 01:11:17 - ERROR - stderr - 74%|███████▍ | 16686/22434 [15:03:37<3:58:41, 2.49s/it] +2025-02-06 01:11:17 - ERROR - stderr - +2025-02-06 01:11:17 - ERROR - stderr - +2025-02-06 01:11:17 - INFO - stdout - {'loss': 0.4117, 'grad_norm': 1.503368854522705, 'learning_rate': 3.2502503810152385e-06, 'epoch': 2.23} +2025-02-06 01:11:17 - ERROR - stderr - 74%|███████▍ | 16686/22434 [15:03:37<3:58:41, 2.49s/it] +2025-02-06 01:11:20 - ERROR - stderr - 74%|███████▍ | 16687/22434 [15:03:39<3:57:46, 2.48s/it] +2025-02-06 01:11:20 - ERROR - stderr - +2025-02-06 01:11:20 - ERROR - stderr - +2025-02-06 01:11:20 - INFO - stdout - {'loss': 0.406, 'grad_norm': 1.604862093925476, 'learning_rate': 3.2491851964410304e-06, 'epoch': 2.23} +2025-02-06 01:11:20 - ERROR - stderr - 74%|███████▍ | 16687/22434 [15:03:39<3:57:46, 2.48s/it] +2025-02-06 01:11:22 - ERROR - stderr - 74%|███████▍ | 16688/22434 [15:03:42<3:58:11, 2.49s/it] +2025-02-06 01:11:22 - ERROR - stderr - +2025-02-06 01:11:22 - ERROR - stderr - +2025-02-06 01:11:22 - INFO - stdout - {'loss': 0.3753, 'grad_norm': 1.319469928741455, 'learning_rate': 3.248120152581097e-06, 
'epoch': 2.23} +2025-02-06 01:11:22 - ERROR - stderr - 74%|███████▍ | 16688/22434 [15:03:42<3:58:11, 2.49s/it] +2025-02-06 01:11:25 - ERROR - stderr - 74%|███████▍ | 16689/22434 [15:03:44<3:57:35, 2.48s/it] +2025-02-06 01:11:25 - ERROR - stderr - +2025-02-06 01:11:25 - ERROR - stderr - +2025-02-06 01:11:25 - INFO - stdout - {'loss': 0.3655, 'grad_norm': 1.5132969617843628, 'learning_rate': 3.247055249457638e-06, 'epoch': 2.23} +2025-02-06 01:11:25 - ERROR - stderr - 74%|███████▍ | 16689/22434 [15:03:44<3:57:35, 2.48s/it] +2025-02-06 01:11:27 - ERROR - stderr - 74%|███████▍ | 16690/22434 [15:03:47<3:56:44, 2.47s/it] +2025-02-06 01:11:27 - ERROR - stderr - +2025-02-06 01:11:27 - ERROR - stderr - +2025-02-06 01:11:27 - INFO - stdout - {'loss': 0.3456, 'grad_norm': 1.4819958209991455, 'learning_rate': 3.2459904870928503e-06, 'epoch': 2.23} +2025-02-06 01:11:27 - ERROR - stderr - 74%|███████▍ | 16690/22434 [15:03:47<3:56:44, 2.47s/it] +2025-02-06 01:11:30 - ERROR - stderr - 74%|███████▍ | 16691/22434 [15:03:49<3:56:49, 2.47s/it] +2025-02-06 01:11:30 - ERROR - stderr - +2025-02-06 01:11:30 - ERROR - stderr - +2025-02-06 01:11:30 - INFO - stdout - {'loss': 0.3763, 'grad_norm': 1.3600480556488037, 'learning_rate': 3.244925865508929e-06, 'epoch': 2.23} +2025-02-06 01:11:30 - ERROR - stderr - 74%|███████▍ | 16691/22434 [15:03:49<3:56:49, 2.47s/it] +2025-02-06 01:11:32 - ERROR - stderr - 74%|███████▍ | 16692/22434 [15:03:52<3:58:22, 2.49s/it] +2025-02-06 01:11:32 - ERROR - stderr - +2025-02-06 01:11:32 - ERROR - stderr - +2025-02-06 01:11:32 - INFO - stdout - {'loss': 0.3876, 'grad_norm': 1.4519842863082886, 'learning_rate': 3.243861384728063e-06, 'epoch': 2.23} +2025-02-06 01:11:32 - ERROR - stderr - 74%|███████▍ | 16692/22434 [15:03:52<3:58:22, 2.49s/it] +2025-02-06 01:11:35 - ERROR - stderr - 74%|███████▍ | 16693/22434 [15:03:54<3:57:45, 2.48s/it] +2025-02-06 01:11:35 - ERROR - stderr - +2025-02-06 01:11:35 - ERROR - stderr - +2025-02-06 01:11:35 - INFO - stdout - {'loss': 
0.4092, 'grad_norm': 1.550400733947754, 'learning_rate': 3.2427970447724424e-06, 'epoch': 2.23} +2025-02-06 01:11:35 - ERROR - stderr - 74%|███████▍ | 16693/22434 [15:03:54<3:57:45, 2.48s/it] +2025-02-06 01:11:37 - ERROR - stderr - 74%|███████▍ | 16694/22434 [15:03:57<3:58:49, 2.50s/it] +2025-02-06 01:11:37 - ERROR - stderr - +2025-02-06 01:11:37 - ERROR - stderr - +2025-02-06 01:11:37 - INFO - stdout - {'loss': 0.3817, 'grad_norm': 1.380650281906128, 'learning_rate': 3.2417328456642507e-06, 'epoch': 2.23} +2025-02-06 01:11:37 - ERROR - stderr - 74%|███████▍ | 16694/22434 [15:03:57<3:58:49, 2.50s/it] +2025-02-06 01:11:40 - ERROR - stderr - 74%|███████▍ | 16695/22434 [15:03:59<4:01:01, 2.52s/it] +2025-02-06 01:11:40 - ERROR - stderr - +2025-02-06 01:11:40 - ERROR - stderr - +2025-02-06 01:11:40 - INFO - stdout - {'loss': 0.3907, 'grad_norm': 1.512605905532837, 'learning_rate': 3.2406687874256736e-06, 'epoch': 2.23} +2025-02-06 01:11:40 - ERROR - stderr - 74%|███████▍ | 16695/22434 [15:03:59<4:01:01, 2.52s/it] +2025-02-06 01:11:42 - ERROR - stderr - 74%|███████▍ | 16696/22434 [15:04:02<4:03:48, 2.55s/it] +2025-02-06 01:11:42 - ERROR - stderr - +2025-02-06 01:11:42 - ERROR - stderr - +2025-02-06 01:11:42 - INFO - stdout - {'loss': 0.3337, 'grad_norm': 1.4997767210006714, 'learning_rate': 3.239604870078883e-06, 'epoch': 2.23} +2025-02-06 01:11:42 - ERROR - stderr - 74%|███████▍ | 16696/22434 [15:04:02<4:03:48, 2.55s/it] +2025-02-06 01:11:45 - ERROR - stderr - 74%|███████▍ | 16697/22434 [15:04:05<4:04:37, 2.56s/it] +2025-02-06 01:11:45 - ERROR - stderr - +2025-02-06 01:11:45 - ERROR - stderr - +2025-02-06 01:11:45 - INFO - stdout - {'loss': 0.3941, 'grad_norm': 1.6403921842575073, 'learning_rate': 3.2385410936460616e-06, 'epoch': 2.23} +2025-02-06 01:11:45 - ERROR - stderr - 74%|███████▍ | 16697/22434 [15:04:05<4:04:37, 2.56s/it] +2025-02-06 01:11:47 - ERROR - stderr - 74%|███████▍ | 16698/22434 [15:04:07<4:02:51, 2.54s/it] +2025-02-06 01:11:47 - ERROR - stderr - 
+2025-02-06 01:11:47 - ERROR - stderr - +2025-02-06 01:11:47 - INFO - stdout - {'loss': 0.3175, 'grad_norm': 1.4092366695404053, 'learning_rate': 3.2374774581493816e-06, 'epoch': 2.23} +2025-02-06 01:11:47 - ERROR - stderr - 74%|███████▍ | 16698/22434 [15:04:07<4:02:51, 2.54s/it] +2025-02-06 01:11:50 - ERROR - stderr - 74%|███████▍ | 16699/22434 [15:04:10<4:02:12, 2.53s/it] +2025-02-06 01:11:50 - ERROR - stderr - +2025-02-06 01:11:50 - ERROR - stderr - +2025-02-06 01:11:50 - INFO - stdout - {'loss': 0.3804, 'grad_norm': 1.4954997301101685, 'learning_rate': 3.2364139636110127e-06, 'epoch': 2.23} +2025-02-06 01:11:50 - ERROR - stderr - 74%|███████▍ | 16699/22434 [15:04:10<4:02:12, 2.53s/it] +2025-02-06 01:11:52 - ERROR - stderr - 74%|███████▍ | 16700/22434 [15:04:12<4:00:00, 2.51s/it] +2025-02-06 01:11:52 - ERROR - stderr - +2025-02-06 01:11:52 - ERROR - stderr - +2025-02-06 01:11:52 - INFO - stdout - {'loss': 0.3702, 'grad_norm': 1.6160610914230347, 'learning_rate': 3.235350610053126e-06, 'epoch': 2.23} +2025-02-06 01:11:52 - ERROR - stderr - 74%|███████▍ | 16700/22434 [15:04:12<4:00:00, 2.51s/it] +2025-02-06 01:11:55 - ERROR - stderr - 74%|███████▍ | 16701/22434 [15:04:15<4:02:09, 2.53s/it] +2025-02-06 01:11:55 - ERROR - stderr - +2025-02-06 01:11:55 - ERROR - stderr - +2025-02-06 01:11:55 - INFO - stdout - {'loss': 0.3526, 'grad_norm': 1.3948166370391846, 'learning_rate': 3.234287397497877e-06, 'epoch': 2.23} +2025-02-06 01:11:55 - ERROR - stderr - 74%|███████▍ | 16701/22434 [15:04:15<4:02:09, 2.53s/it] +2025-02-06 01:11:58 - ERROR - stderr - 74%|███████▍ | 16702/22434 [15:04:17<4:07:48, 2.59s/it] +2025-02-06 01:11:58 - ERROR - stderr - +2025-02-06 01:11:58 - ERROR - stderr - +2025-02-06 01:11:58 - INFO - stdout - {'loss': 0.427, 'grad_norm': 1.4766731262207031, 'learning_rate': 3.233224325967439e-06, 'epoch': 2.23} +2025-02-06 01:11:58 - ERROR - stderr - 74%|███████▍ | 16702/22434 [15:04:17<4:07:48, 2.59s/it] +2025-02-06 01:12:00 - ERROR - stderr - 74%|███████▍ | 
16703/22434 [15:04:20<4:05:13, 2.57s/it] +2025-02-06 01:12:00 - ERROR - stderr - +2025-02-06 01:12:00 - ERROR - stderr - +2025-02-06 01:12:00 - INFO - stdout - {'loss': 0.4122, 'grad_norm': 1.6584209203720093, 'learning_rate': 3.2321613954839616e-06, 'epoch': 2.23} +2025-02-06 01:12:00 - ERROR - stderr - 74%|███████▍ | 16703/22434 [15:04:20<4:05:13, 2.57s/it] +2025-02-06 01:12:03 - ERROR - stderr - 74%|███████▍ | 16704/22434 [15:04:22<4:02:46, 2.54s/it] +2025-02-06 01:12:03 - ERROR - stderr - +2025-02-06 01:12:03 - ERROR - stderr - +2025-02-06 01:12:03 - INFO - stdout - {'loss': 0.3879, 'grad_norm': 1.5687224864959717, 'learning_rate': 3.2310986060696038e-06, 'epoch': 2.23} +2025-02-06 01:12:03 - ERROR - stderr - 74%|███████▍ | 16704/22434 [15:04:22<4:02:46, 2.54s/it] +2025-02-06 01:12:05 - ERROR - stderr - 74%|███████▍ | 16705/22434 [15:04:25<4:00:29, 2.52s/it] +2025-02-06 01:12:05 - ERROR - stderr - +2025-02-06 01:12:05 - ERROR - stderr - +2025-02-06 01:12:05 - INFO - stdout - {'loss': 0.4252, 'grad_norm': 1.6024816036224365, 'learning_rate': 3.230035957746518e-06, 'epoch': 2.23} +2025-02-06 01:12:05 - ERROR - stderr - 74%|███████▍ | 16705/22434 [15:04:25<4:00:29, 2.52s/it] +2025-02-06 01:12:08 - ERROR - stderr - 74%|███████▍ | 16706/22434 [15:04:27<4:01:47, 2.53s/it] +2025-02-06 01:12:08 - ERROR - stderr - +2025-02-06 01:12:08 - ERROR - stderr - +2025-02-06 01:12:08 - INFO - stdout - {'loss': 0.3805, 'grad_norm': 1.7430874109268188, 'learning_rate': 3.228973450536852e-06, 'epoch': 2.23} +2025-02-06 01:12:08 - ERROR - stderr - 74%|███████▍ | 16706/22434 [15:04:27<4:01:47, 2.53s/it] +2025-02-06 01:12:10 - ERROR - stderr - 74%|███████▍ | 16707/22434 [15:04:30<3:59:38, 2.51s/it] +2025-02-06 01:12:10 - ERROR - stderr - +2025-02-06 01:12:10 - ERROR - stderr - +2025-02-06 01:12:10 - INFO - stdout - {'loss': 0.3516, 'grad_norm': 1.388566493988037, 'learning_rate': 3.2279110844627616e-06, 'epoch': 2.23} +2025-02-06 01:12:10 - ERROR - stderr - 74%|███████▍ | 16707/22434 
[15:04:30<3:59:38, 2.51s/it] +2025-02-06 01:12:13 - ERROR - stderr - 74%|███████▍ | 16708/22434 [15:04:32<3:57:19, 2.49s/it] +2025-02-06 01:12:13 - ERROR - stderr - +2025-02-06 01:12:13 - ERROR - stderr - +2025-02-06 01:12:13 - INFO - stdout - {'loss': 0.3372, 'grad_norm': 1.499253273010254, 'learning_rate': 3.2268488595463808e-06, 'epoch': 2.23} +2025-02-06 01:12:13 - ERROR - stderr - 74%|███████▍ | 16708/22434 [15:04:32<3:57:19, 2.49s/it] +2025-02-06 01:12:15 - ERROR - stderr - 74%|███████▍ | 16709/22434 [15:04:35<3:58:19, 2.50s/it] +2025-02-06 01:12:15 - ERROR - stderr - +2025-02-06 01:12:15 - ERROR - stderr - +2025-02-06 01:12:15 - INFO - stdout - {'loss': 0.3263, 'grad_norm': 1.3733304738998413, 'learning_rate': 3.225786775809855e-06, 'epoch': 2.23} +2025-02-06 01:12:15 - ERROR - stderr - 74%|███████▍ | 16709/22434 [15:04:35<3:58:19, 2.50s/it] +2025-02-06 01:12:18 - ERROR - stderr - 74%|███████▍ | 16710/22434 [15:04:38<4:02:56, 2.55s/it] +2025-02-06 01:12:18 - ERROR - stderr - +2025-02-06 01:12:18 - ERROR - stderr - +2025-02-06 01:12:18 - INFO - stdout - {'loss': 0.4072, 'grad_norm': 1.5888887643814087, 'learning_rate': 3.2247248332753213e-06, 'epoch': 2.23} +2025-02-06 01:12:18 - ERROR - stderr - 74%|███████▍ | 16710/22434 [15:04:38<4:02:56, 2.55s/it] +2025-02-06 01:12:20 - ERROR - stderr - 74%|███████▍ | 16711/22434 [15:04:40<4:00:57, 2.53s/it] +2025-02-06 01:12:20 - ERROR - stderr - +2025-02-06 01:12:20 - ERROR - stderr - +2025-02-06 01:12:20 - INFO - stdout - {'loss': 0.4005, 'grad_norm': 1.7010387182235718, 'learning_rate': 3.223663031964914e-06, 'epoch': 2.23} +2025-02-06 01:12:20 - ERROR - stderr - 74%|███████▍ | 16711/22434 [15:04:40<4:00:57, 2.53s/it] +2025-02-06 01:12:23 - ERROR - stderr - 74%|███████▍ | 16712/22434 [15:04:43<4:01:06, 2.53s/it] +2025-02-06 01:12:23 - ERROR - stderr - +2025-02-06 01:12:23 - ERROR - stderr - +2025-02-06 01:12:23 - INFO - stdout - {'loss': 0.3151, 'grad_norm': 1.3099058866500854, 'learning_rate': 3.2226013719007686e-06, 
'epoch': 2.23} +2025-02-06 01:12:23 - ERROR - stderr - 74%|███████▍ | 16712/22434 [15:04:43<4:01:06, 2.53s/it] +2025-02-06 01:12:25 - ERROR - stderr - 74%|███████▍ | 16713/22434 [15:04:45<4:00:11, 2.52s/it] +2025-02-06 01:12:25 - ERROR - stderr - +2025-02-06 01:12:25 - ERROR - stderr - +2025-02-06 01:12:25 - INFO - stdout - {'loss': 0.4032, 'grad_norm': 1.47629714012146, 'learning_rate': 3.2215398531050114e-06, 'epoch': 2.23} +2025-02-06 01:12:25 - ERROR - stderr - 74%|███████▍ | 16713/22434 [15:04:45<4:00:11, 2.52s/it] +2025-02-06 01:12:28 - ERROR - stderr - 75%|███████▍ | 16714/22434 [15:04:48<4:01:23, 2.53s/it] +2025-02-06 01:12:28 - ERROR - stderr - +2025-02-06 01:12:28 - ERROR - stderr - +2025-02-06 01:12:28 - INFO - stdout - {'loss': 0.3688, 'grad_norm': 1.6433742046356201, 'learning_rate': 3.22047847559977e-06, 'epoch': 2.24} +2025-02-06 01:12:28 - ERROR - stderr - 75%|███████▍ | 16714/22434 [15:04:48<4:01:23, 2.53s/it] +2025-02-06 01:12:30 - ERROR - stderr - 75%|███████▍ | 16715/22434 [15:04:50<4:03:18, 2.55s/it] +2025-02-06 01:12:30 - ERROR - stderr - +2025-02-06 01:12:30 - ERROR - stderr - +2025-02-06 01:12:30 - INFO - stdout - {'loss': 0.3592, 'grad_norm': 1.5814965963363647, 'learning_rate': 3.2194172394071666e-06, 'epoch': 2.24} +2025-02-06 01:12:30 - ERROR - stderr - 75%|███████▍ | 16715/22434 [15:04:50<4:03:18, 2.55s/it] +2025-02-06 01:12:33 - ERROR - stderr - 75%|███████▍ | 16716/22434 [15:04:53<4:02:32, 2.55s/it] +2025-02-06 01:12:33 - ERROR - stderr - +2025-02-06 01:12:33 - ERROR - stderr - +2025-02-06 01:12:33 - INFO - stdout - {'loss': 0.4285, 'grad_norm': 1.609731674194336, 'learning_rate': 3.2183561445493226e-06, 'epoch': 2.24} +2025-02-06 01:12:33 - ERROR - stderr - 75%|███████▍ | 16716/22434 [15:04:53<4:02:32, 2.55s/it] +2025-02-06 01:12:35 - ERROR - stderr - 75%|███████▍ | 16717/22434 [15:04:55<4:02:03, 2.54s/it] +2025-02-06 01:12:36 - ERROR - stderr - +2025-02-06 01:12:36 - ERROR - stderr - +2025-02-06 01:12:36 - INFO - stdout - {'loss': 
0.3805, 'grad_norm': 1.5654329061508179, 'learning_rate': 3.2172951910483564e-06, 'epoch': 2.24} +2025-02-06 01:12:36 - ERROR - stderr - 75%|███████▍ | 16717/22434 [15:04:55<4:02:03, 2.54s/it] +2025-02-06 01:12:38 - ERROR - stderr - 75%|███████▍ | 16718/22434 [15:04:58<4:01:18, 2.53s/it] +2025-02-06 01:12:38 - ERROR - stderr - +2025-02-06 01:12:38 - ERROR - stderr - +2025-02-06 01:12:38 - INFO - stdout - {'loss': 0.3393, 'grad_norm': 1.3929680585861206, 'learning_rate': 3.2162343789263807e-06, 'epoch': 2.24} +2025-02-06 01:12:38 - ERROR - stderr - 75%|███████▍ | 16718/22434 [15:04:58<4:01:18, 2.53s/it] +2025-02-06 01:12:41 - ERROR - stderr - 75%|███████▍ | 16719/22434 [15:05:00<4:03:42, 2.56s/it] +2025-02-06 01:12:41 - ERROR - stderr - +2025-02-06 01:12:41 - ERROR - stderr - +2025-02-06 01:12:41 - INFO - stdout - {'loss': 0.3915, 'grad_norm': 1.5375103950500488, 'learning_rate': 3.2151737082055123e-06, 'epoch': 2.24} +2025-02-06 01:12:41 - ERROR - stderr - 75%|███████▍ | 16719/22434 [15:05:00<4:03:42, 2.56s/it] +2025-02-06 01:12:43 - ERROR - stderr - 75%|███████▍ | 16720/22434 [15:05:03<4:00:58, 2.53s/it] +2025-02-06 01:12:43 - ERROR - stderr - +2025-02-06 01:12:43 - ERROR - stderr - +2025-02-06 01:12:43 - INFO - stdout - {'loss': 0.4022, 'grad_norm': 1.6191916465759277, 'learning_rate': 3.2141131789078482e-06, 'epoch': 2.24} +2025-02-06 01:12:43 - ERROR - stderr - 75%|███████▍ | 16720/22434 [15:05:03<4:00:58, 2.53s/it] +2025-02-06 01:12:46 - ERROR - stderr - 75%|███████▍ | 16721/22434 [15:05:05<4:02:43, 2.55s/it] +2025-02-06 01:12:46 - ERROR - stderr - +2025-02-06 01:12:46 - ERROR - stderr - +2025-02-06 01:12:46 - INFO - stdout - {'loss': 0.3837, 'grad_norm': 1.6870583295822144, 'learning_rate': 3.2130527910555088e-06, 'epoch': 2.24} +2025-02-06 01:12:46 - ERROR - stderr - 75%|███████▍ | 16721/22434 [15:05:05<4:02:43, 2.55s/it] +2025-02-06 01:12:48 - ERROR - stderr - 75%|███████▍ | 16722/22434 [15:05:08<4:02:53, 2.55s/it] +2025-02-06 01:12:48 - ERROR - stderr - 
+2025-02-06 01:12:48 - ERROR - stderr - +2025-02-06 01:12:48 - INFO - stdout - {'loss': 0.3931, 'grad_norm': 1.377785563468933, 'learning_rate': 3.2119925446705824e-06, 'epoch': 2.24} +2025-02-06 01:12:48 - ERROR - stderr - 75%|███████▍ | 16722/22434 [15:05:08<4:02:53, 2.55s/it] +2025-02-06 01:12:51 - ERROR - stderr - 75%|███████▍ | 16723/22434 [15:05:11<4:04:06, 2.56s/it] +2025-02-06 01:12:51 - ERROR - stderr - +2025-02-06 01:12:51 - ERROR - stderr - +2025-02-06 01:12:51 - INFO - stdout - {'loss': 0.3565, 'grad_norm': 1.4136121273040771, 'learning_rate': 3.2109324397751818e-06, 'epoch': 2.24} +2025-02-06 01:12:51 - ERROR - stderr - 75%|███████▍ | 16723/22434 [15:05:11<4:04:06, 2.56s/it] +2025-02-06 01:12:53 - ERROR - stderr - 75%|███████▍ | 16724/22434 [15:05:13<4:03:52, 2.56s/it] +2025-02-06 01:12:53 - ERROR - stderr - +2025-02-06 01:12:53 - ERROR - stderr - +2025-02-06 01:12:53 - INFO - stdout - {'loss': 0.36, 'grad_norm': 1.3534873723983765, 'learning_rate': 3.2098724763913958e-06, 'epoch': 2.24} +2025-02-06 01:12:53 - ERROR - stderr - 75%|███████▍ | 16724/22434 [15:05:13<4:03:52, 2.56s/it] +2025-02-06 01:12:56 - ERROR - stderr - 75%|███████▍ | 16725/22434 [15:05:16<4:03:05, 2.55s/it] +2025-02-06 01:12:56 - ERROR - stderr - +2025-02-06 01:12:56 - ERROR - stderr - +2025-02-06 01:12:56 - INFO - stdout - {'loss': 0.3272, 'grad_norm': 1.371701955795288, 'learning_rate': 3.2088126545413168e-06, 'epoch': 2.24} +2025-02-06 01:12:56 - ERROR - stderr - 75%|███████▍ | 16725/22434 [15:05:16<4:03:05, 2.55s/it] +2025-02-06 01:12:58 - ERROR - stderr - 75%|███████▍ | 16726/22434 [15:05:18<4:00:24, 2.53s/it] +2025-02-06 01:12:58 - ERROR - stderr - +2025-02-06 01:12:58 - ERROR - stderr - +2025-02-06 01:12:58 - INFO - stdout - {'loss': 0.378, 'grad_norm': 1.512346625328064, 'learning_rate': 3.2077529742470472e-06, 'epoch': 2.24} +2025-02-06 01:12:58 - ERROR - stderr - 75%|███████▍ | 16726/22434 [15:05:18<4:00:24, 2.53s/it] +2025-02-06 01:13:01 - ERROR - stderr - 75%|███████▍ | 
16727/22434 [15:05:21<3:58:38, 2.51s/it] +2025-02-06 01:13:01 - ERROR - stderr - +2025-02-06 01:13:01 - ERROR - stderr - +2025-02-06 01:13:01 - INFO - stdout - {'loss': 0.3613, 'grad_norm': 1.5362614393234253, 'learning_rate': 3.2066934355306633e-06, 'epoch': 2.24} +2025-02-06 01:13:01 - ERROR - stderr - 75%|███████▍ | 16727/22434 [15:05:21<3:58:38, 2.51s/it] +2025-02-06 01:13:03 - ERROR - stderr - 75%|███████▍ | 16728/22434 [15:05:23<3:59:00, 2.51s/it] +2025-02-06 01:13:03 - ERROR - stderr - +2025-02-06 01:13:03 - ERROR - stderr - +2025-02-06 01:13:03 - INFO - stdout - {'loss': 0.4157, 'grad_norm': 1.6169233322143555, 'learning_rate': 3.2056340384142536e-06, 'epoch': 2.24} +2025-02-06 01:13:03 - ERROR - stderr - 75%|███████▍ | 16728/22434 [15:05:23<3:59:00, 2.51s/it] +2025-02-06 01:13:06 - ERROR - stderr - 75%|███████▍ | 16729/22434 [15:05:26<4:11:46, 2.65s/it] +2025-02-06 01:13:06 - ERROR - stderr - +2025-02-06 01:13:06 - ERROR - stderr - +2025-02-06 01:13:06 - INFO - stdout - {'loss': 0.3906, 'grad_norm': 1.5522016286849976, 'learning_rate': 3.2045747829199015e-06, 'epoch': 2.24} +2025-02-06 01:13:06 - ERROR - stderr - 75%|███████▍ | 16729/22434 [15:05:26<4:11:46, 2.65s/it] +2025-02-06 01:13:09 - ERROR - stderr - 75%|███████▍ | 16730/22434 [15:05:29<4:10:32, 2.64s/it] +2025-02-06 01:13:09 - ERROR - stderr - +2025-02-06 01:13:09 - ERROR - stderr - +2025-02-06 01:13:09 - INFO - stdout - {'loss': 0.3807, 'grad_norm': 1.4609475135803223, 'learning_rate': 3.2035156690696857e-06, 'epoch': 2.24} +2025-02-06 01:13:09 - ERROR - stderr - 75%|███████▍ | 16730/22434 [15:05:29<4:10:32, 2.64s/it] +2025-02-06 01:13:11 - ERROR - stderr - 75%|███████▍ | 16731/22434 [15:05:31<4:06:17, 2.59s/it] +2025-02-06 01:13:11 - ERROR - stderr - +2025-02-06 01:13:11 - ERROR - stderr - +2025-02-06 01:13:11 - INFO - stdout - {'loss': 0.3914, 'grad_norm': 1.6271857023239136, 'learning_rate': 3.202456696885683e-06, 'epoch': 2.24} +2025-02-06 01:13:11 - ERROR - stderr - 75%|███████▍ | 16731/22434 
[15:05:31<4:06:17, 2.59s/it] +2025-02-06 01:13:14 - ERROR - stderr - 75%|███████▍ | 16732/22434 [15:05:34<4:14:51, 2.68s/it] +2025-02-06 01:13:14 - ERROR - stderr - +2025-02-06 01:13:14 - ERROR - stderr - +2025-02-06 01:13:14 - INFO - stdout - {'loss': 0.3979, 'grad_norm': 1.3557190895080566, 'learning_rate': 3.2013978663899647e-06, 'epoch': 2.24} +2025-02-06 01:13:14 - ERROR - stderr - 75%|███████▍ | 16732/22434 [15:05:34<4:14:51, 2.68s/it] +2025-02-06 01:13:17 - ERROR - stderr - 75%|███████▍ | 16733/22434 [15:05:37<4:08:48, 2.62s/it] +2025-02-06 01:13:17 - ERROR - stderr - +2025-02-06 01:13:17 - ERROR - stderr - +2025-02-06 01:13:17 - INFO - stdout - {'loss': 0.3979, 'grad_norm': 1.53298819065094, 'learning_rate': 3.200339177604602e-06, 'epoch': 2.24} +2025-02-06 01:13:17 - ERROR - stderr - 75%|███████▍ | 16733/22434 [15:05:37<4:08:48, 2.62s/it] +2025-02-06 01:13:19 - ERROR - stderr - 75%|███████▍ | 16734/22434 [15:05:39<4:04:29, 2.57s/it] +2025-02-06 01:13:19 - ERROR - stderr - +2025-02-06 01:13:19 - ERROR - stderr - +2025-02-06 01:13:19 - INFO - stdout - {'loss': 0.352, 'grad_norm': 1.458052635192871, 'learning_rate': 3.199280630551663e-06, 'epoch': 2.24} +2025-02-06 01:13:19 - ERROR - stderr - 75%|███████▍ | 16734/22434 [15:05:39<4:04:29, 2.57s/it] +2025-02-06 01:13:22 - ERROR - stderr - 75%|███████▍ | 16735/22434 [15:05:41<4:01:56, 2.55s/it] +2025-02-06 01:13:22 - ERROR - stderr - +2025-02-06 01:13:22 - ERROR - stderr - +2025-02-06 01:13:22 - INFO - stdout - {'loss': 0.4259, 'grad_norm': 1.5469892024993896, 'learning_rate': 3.1982222252532126e-06, 'epoch': 2.24} +2025-02-06 01:13:22 - ERROR - stderr - 75%|███████▍ | 16735/22434 [15:05:42<4:01:56, 2.55s/it] +2025-02-06 01:13:24 - ERROR - stderr - 75%|███████▍ | 16736/22434 [15:05:44<4:01:17, 2.54s/it] +2025-02-06 01:13:24 - ERROR - stderr - +2025-02-06 01:13:24 - ERROR - stderr - +2025-02-06 01:13:24 - INFO - stdout - {'loss': 0.3441, 'grad_norm': 1.3807612657546997, 'learning_rate': 3.197163961731311e-06, 
'epoch': 2.24} +2025-02-06 01:13:24 - ERROR - stderr - 75%|███████▍ | 16736/22434 [15:05:44<4:01:17, 2.54s/it] +2025-02-06 01:13:27 - ERROR - stderr - 75%|███████▍ | 16737/22434 [15:05:47<4:01:48, 2.55s/it] +2025-02-06 01:13:27 - ERROR - stderr - +2025-02-06 01:13:27 - ERROR - stderr - +2025-02-06 01:13:27 - INFO - stdout - {'loss': 0.3625, 'grad_norm': 1.3742730617523193, 'learning_rate': 3.1961058400080157e-06, 'epoch': 2.24} +2025-02-06 01:13:27 - ERROR - stderr - 75%|███████▍ | 16737/22434 [15:05:47<4:01:48, 2.55s/it] +2025-02-06 01:13:29 - ERROR - stderr - 75%|███████▍ | 16738/22434 [15:05:49<3:59:49, 2.53s/it] +2025-02-06 01:13:29 - ERROR - stderr - +2025-02-06 01:13:29 - ERROR - stderr - +2025-02-06 01:13:29 - INFO - stdout - {'loss': 0.3912, 'grad_norm': 1.5196279287338257, 'learning_rate': 3.1950478601053847e-06, 'epoch': 2.24} +2025-02-06 01:13:29 - ERROR - stderr - 75%|███████▍ | 16738/22434 [15:05:49<3:59:49, 2.53s/it] +2025-02-06 01:13:32 - ERROR - stderr - 75%|███████▍ | 16739/22434 [15:05:52<4:00:39, 2.54s/it] +2025-02-06 01:13:32 - ERROR - stderr - +2025-02-06 01:13:32 - ERROR - stderr - +2025-02-06 01:13:32 - INFO - stdout - {'loss': 0.3952, 'grad_norm': 1.5679413080215454, 'learning_rate': 3.19399002204547e-06, 'epoch': 2.24} +2025-02-06 01:13:32 - ERROR - stderr - 75%|███████▍ | 16739/22434 [15:05:52<4:00:39, 2.54s/it] +2025-02-06 01:13:34 - ERROR - stderr - 75%|███████▍ | 16740/22434 [15:05:54<3:59:07, 2.52s/it] +2025-02-06 01:13:34 - ERROR - stderr - +2025-02-06 01:13:34 - ERROR - stderr - +2025-02-06 01:13:34 - INFO - stdout - {'loss': 0.4257, 'grad_norm': 1.6415464878082275, 'learning_rate': 3.192932325850323e-06, 'epoch': 2.24} +2025-02-06 01:13:34 - ERROR - stderr - 75%|███████▍ | 16740/22434 [15:05:54<3:59:07, 2.52s/it] +2025-02-06 01:13:37 - ERROR - stderr - 75%|███████▍ | 16741/22434 [15:05:57<3:58:00, 2.51s/it] +2025-02-06 01:13:37 - ERROR - stderr - +2025-02-06 01:13:37 - ERROR - stderr - +2025-02-06 01:13:37 - INFO - stdout - {'loss': 
0.397, 'grad_norm': 1.5730862617492676, 'learning_rate': 3.1918747715419808e-06, 'epoch': 2.24} +2025-02-06 01:13:37 - ERROR - stderr - 75%|███████▍ | 16741/22434 [15:05:57<3:58:00, 2.51s/it] +2025-02-06 01:13:39 - ERROR - stderr - 75%|███████▍ | 16742/22434 [15:05:59<3:57:46, 2.51s/it] +2025-02-06 01:13:39 - ERROR - stderr - +2025-02-06 01:13:39 - ERROR - stderr - +2025-02-06 01:13:39 - INFO - stdout - {'loss': 0.3587, 'grad_norm': 1.4044052362442017, 'learning_rate': 3.190817359142502e-06, 'epoch': 2.24} +2025-02-06 01:13:39 - ERROR - stderr - 75%|███████▍ | 16742/22434 [15:05:59<3:57:46, 2.51s/it] +2025-02-06 01:13:42 - ERROR - stderr - 75%|███████▍ | 16743/22434 [15:06:02<4:07:52, 2.61s/it] +2025-02-06 01:13:42 - ERROR - stderr - +2025-02-06 01:13:42 - ERROR - stderr - +2025-02-06 01:13:42 - INFO - stdout - {'loss': 0.4045, 'grad_norm': 1.6410927772521973, 'learning_rate': 3.1897600886739134e-06, 'epoch': 2.24} +2025-02-06 01:13:42 - ERROR - stderr - 75%|███████▍ | 16743/22434 [15:06:02<4:07:52, 2.61s/it] +2025-02-06 01:13:45 - ERROR - stderr - 75%|███████▍ | 16744/22434 [15:06:04<4:05:29, 2.59s/it] +2025-02-06 01:13:45 - ERROR - stderr - +2025-02-06 01:13:45 - ERROR - stderr - +2025-02-06 01:13:45 - INFO - stdout - {'loss': 0.3626, 'grad_norm': 1.4425358772277832, 'learning_rate': 3.1887029601582607e-06, 'epoch': 2.24} +2025-02-06 01:13:45 - ERROR - stderr - 75%|███████▍ | 16744/22434 [15:06:05<4:05:29, 2.59s/it] +2025-02-06 01:13:47 - ERROR - stderr - 75%|███████▍ | 16745/22434 [15:06:07<4:04:19, 2.58s/it] +2025-02-06 01:13:47 - ERROR - stderr - +2025-02-06 01:13:47 - ERROR - stderr - +2025-02-06 01:13:47 - INFO - stdout - {'loss': 0.3659, 'grad_norm': 1.5258649587631226, 'learning_rate': 3.1876459736175815e-06, 'epoch': 2.24} +2025-02-06 01:13:47 - ERROR - stderr - 75%|███████▍ | 16745/22434 [15:06:07<4:04:19, 2.58s/it] +2025-02-06 01:13:50 - ERROR - stderr - 75%|███████▍ | 16746/22434 [15:06:09<4:00:41, 2.54s/it] +2025-02-06 01:13:50 - ERROR - stderr - 
+2025-02-06 01:13:50 - ERROR - stderr - +2025-02-06 01:13:50 - INFO - stdout - {'loss': 0.3221, 'grad_norm': 1.2879316806793213, 'learning_rate': 3.1865891290738972e-06, 'epoch': 2.24} +2025-02-06 01:13:50 - ERROR - stderr - 75%|███████▍ | 16746/22434 [15:06:10<4:00:41, 2.54s/it] +2025-02-06 01:13:52 - ERROR - stderr - 75%|███████▍ | 16747/22434 [15:06:12<3:58:57, 2.52s/it] +2025-02-06 01:13:52 - ERROR - stderr - +2025-02-06 01:13:52 - ERROR - stderr - +2025-02-06 01:13:52 - INFO - stdout - {'loss': 0.3736, 'grad_norm': 1.396704077720642, 'learning_rate': 3.1855324265492483e-06, 'epoch': 2.24} +2025-02-06 01:13:52 - ERROR - stderr - 75%|███████▍ | 16747/22434 [15:06:12<3:58:57, 2.52s/it] +2025-02-06 01:13:55 - ERROR - stderr - 75%|███████▍ | 16748/22434 [15:06:14<3:57:42, 2.51s/it] +2025-02-06 01:13:55 - ERROR - stderr - +2025-02-06 01:13:55 - ERROR - stderr - +2025-02-06 01:13:55 - INFO - stdout - {'loss': 0.3955, 'grad_norm': 1.6822084188461304, 'learning_rate': 3.1844758660656528e-06, 'epoch': 2.24} +2025-02-06 01:13:55 - ERROR - stderr - 75%|███████▍ | 16748/22434 [15:06:14<3:57:42, 2.51s/it] +2025-02-06 01:13:57 - ERROR - stderr - 75%|███████▍ | 16749/22434 [15:06:17<3:57:03, 2.50s/it] +2025-02-06 01:13:57 - ERROR - stderr - +2025-02-06 01:13:57 - ERROR - stderr - +2025-02-06 01:13:57 - INFO - stdout - {'loss': 0.3774, 'grad_norm': 1.6863566637039185, 'learning_rate': 3.1834194476451352e-06, 'epoch': 2.24} +2025-02-06 01:13:57 - ERROR - stderr - 75%|███████▍ | 16749/22434 [15:06:17<3:57:03, 2.50s/it] +2025-02-06 01:14:00 - ERROR - stderr - 75%|███████▍ | 16750/22434 [15:06:20<3:59:52, 2.53s/it] +2025-02-06 01:14:00 - ERROR - stderr - +2025-02-06 01:14:00 - ERROR - stderr - +2025-02-06 01:14:00 - INFO - stdout - {'loss': 0.3797, 'grad_norm': 1.536750078201294, 'learning_rate': 3.182363171309717e-06, 'epoch': 2.24} +2025-02-06 01:14:00 - ERROR - stderr - 75%|███████▍ | 16750/22434 [15:06:20<3:59:52, 2.53s/it] +2025-02-06 01:14:02 - ERROR - stderr - 75%|███████▍ 
| 16751/22434 [15:06:22<3:57:44, 2.51s/it] +2025-02-06 01:14:02 - ERROR - stderr - +2025-02-06 01:14:02 - ERROR - stderr - +2025-02-06 01:14:02 - INFO - stdout - {'loss': 0.4259, 'grad_norm': 1.579534649848938, 'learning_rate': 3.1813070370814112e-06, 'epoch': 2.24} +2025-02-06 01:14:02 - ERROR - stderr - 75%|███████▍ | 16751/22434 [15:06:22<3:57:44, 2.51s/it] +2025-02-06 01:14:05 - ERROR - stderr - 75%|███████▍ | 16752/22434 [15:06:25<3:57:59, 2.51s/it] +2025-02-06 01:14:05 - ERROR - stderr - +2025-02-06 01:14:05 - ERROR - stderr - +2025-02-06 01:14:05 - INFO - stdout - {'loss': 0.338, 'grad_norm': 1.397848129272461, 'learning_rate': 3.180251044982242e-06, 'epoch': 2.24} +2025-02-06 01:14:05 - ERROR - stderr - 75%|███████▍ | 16752/22434 [15:06:25<3:57:59, 2.51s/it] +2025-02-06 01:14:07 - ERROR - stderr - 75%|███████▍ | 16753/22434 [15:06:27<3:55:52, 2.49s/it] +2025-02-06 01:14:07 - ERROR - stderr - +2025-02-06 01:14:07 - ERROR - stderr - +2025-02-06 01:14:07 - INFO - stdout - {'loss': 0.3764, 'grad_norm': 1.5751878023147583, 'learning_rate': 3.1791951950342117e-06, 'epoch': 2.24} +2025-02-06 01:14:07 - ERROR - stderr - 75%|███████▍ | 16753/22434 [15:06:27<3:55:52, 2.49s/it] +2025-02-06 01:14:10 - ERROR - stderr - 75%|███████▍ | 16754/22434 [15:06:29<3:56:07, 2.49s/it] +2025-02-06 01:14:10 - ERROR - stderr - +2025-02-06 01:14:10 - ERROR - stderr - +2025-02-06 01:14:10 - INFO - stdout - {'loss': 0.3767, 'grad_norm': 1.519509196281433, 'learning_rate': 3.1781394872593296e-06, 'epoch': 2.24} +2025-02-06 01:14:10 - ERROR - stderr - 75%|███████▍ | 16754/22434 [15:06:29<3:56:07, 2.49s/it] +2025-02-06 01:14:12 - ERROR - stderr - 75%|███████▍ | 16755/22434 [15:06:32<3:57:18, 2.51s/it] +2025-02-06 01:14:12 - ERROR - stderr - +2025-02-06 01:14:12 - ERROR - stderr - +2025-02-06 01:14:12 - INFO - stdout - {'loss': 0.3477, 'grad_norm': 1.4400750398635864, 'learning_rate': 3.1770839216796025e-06, 'epoch': 2.24} +2025-02-06 01:14:12 - ERROR - stderr - 75%|███████▍ | 16755/22434 
[15:06:32<3:57:18, 2.51s/it] +2025-02-06 01:14:15 - ERROR - stderr - 75%|███████▍ | 16756/22434 [15:06:35<4:07:44, 2.62s/it] +2025-02-06 01:14:15 - ERROR - stderr - +2025-02-06 01:14:15 - ERROR - stderr - +2025-02-06 01:14:15 - INFO - stdout - {'loss': 0.3739, 'grad_norm': 1.5375254154205322, 'learning_rate': 3.176028498317032e-06, 'epoch': 2.24} +2025-02-06 01:14:15 - ERROR - stderr - 75%|███████▍ | 16756/22434 [15:06:35<4:07:44, 2.62s/it] +2025-02-06 01:14:18 - ERROR - stderr - 75%|███████▍ | 16757/22434 [15:06:37<4:04:00, 2.58s/it] +2025-02-06 01:14:18 - ERROR - stderr - +2025-02-06 01:14:18 - ERROR - stderr - +2025-02-06 01:14:18 - INFO - stdout - {'loss': 0.3808, 'grad_norm': 1.5784186124801636, 'learning_rate': 3.1749732171936176e-06, 'epoch': 2.24} +2025-02-06 01:14:18 - ERROR - stderr - 75%|███████▍ | 16757/22434 [15:06:37<4:04:00, 2.58s/it] +2025-02-06 01:14:20 - ERROR - stderr - 75%|███████▍ | 16758/22434 [15:06:40<4:00:14, 2.54s/it] +2025-02-06 01:14:20 - ERROR - stderr - +2025-02-06 01:14:20 - ERROR - stderr - +2025-02-06 01:14:20 - INFO - stdout - {'loss': 0.387, 'grad_norm': 1.6434003114700317, 'learning_rate': 3.1739180783313563e-06, 'epoch': 2.24} +2025-02-06 01:14:20 - ERROR - stderr - 75%|███████▍ | 16758/22434 [15:06:40<4:00:14, 2.54s/it] +2025-02-06 01:14:22 - ERROR - stderr - 75%|███████▍ | 16759/22434 [15:06:42<3:58:07, 2.52s/it] +2025-02-06 01:14:23 - ERROR - stderr - +2025-02-06 01:14:23 - ERROR - stderr - +2025-02-06 01:14:23 - INFO - stdout - {'loss': 0.3808, 'grad_norm': 1.5441961288452148, 'learning_rate': 3.1728630817522397e-06, 'epoch': 2.24} +2025-02-06 01:14:23 - ERROR - stderr - 75%|███████▍ | 16759/22434 [15:06:42<3:58:07, 2.52s/it] +2025-02-06 01:14:25 - ERROR - stderr - 75%|███████▍ | 16760/22434 [15:06:45<3:56:00, 2.50s/it] +2025-02-06 01:14:25 - ERROR - stderr - +2025-02-06 01:14:25 - ERROR - stderr - +2025-02-06 01:14:25 - INFO - stdout - {'loss': 0.3453, 'grad_norm': 1.3710834980010986, 'learning_rate': 
3.1718082274782604e-06, 'epoch': 2.24} +2025-02-06 01:14:25 - ERROR - stderr - 75%|███████▍ | 16760/22434 [15:06:45<3:56:00, 2.50s/it] +2025-02-06 01:14:27 - ERROR - stderr - 75%|███████▍ | 16761/22434 [15:06:47<3:55:32, 2.49s/it] +2025-02-06 01:14:27 - ERROR - stderr - +2025-02-06 01:14:27 - ERROR - stderr - +2025-02-06 01:14:27 - INFO - stdout - {'loss': 0.3584, 'grad_norm': 1.4352635145187378, 'learning_rate': 3.170753515531407e-06, 'epoch': 2.24} +2025-02-06 01:14:27 - ERROR - stderr - 75%|███████▍ | 16761/22434 [15:06:47<3:55:32, 2.49s/it] +2025-02-06 01:14:30 - ERROR - stderr - 75%|███████▍ | 16762/22434 [15:06:50<3:58:07, 2.52s/it] +2025-02-06 01:14:30 - ERROR - stderr - +2025-02-06 01:14:30 - ERROR - stderr - +2025-02-06 01:14:30 - INFO - stdout - {'loss': 0.3165, 'grad_norm': 1.4013431072235107, 'learning_rate': 3.169698945933656e-06, 'epoch': 2.24} +2025-02-06 01:14:30 - ERROR - stderr - 75%|███████▍ | 16762/22434 [15:06:50<3:58:07, 2.52s/it] +2025-02-06 01:14:33 - ERROR - stderr - 75%|███████▍ | 16763/22434 [15:06:52<3:58:03, 2.52s/it] +2025-02-06 01:14:33 - ERROR - stderr - +2025-02-06 01:14:33 - ERROR - stderr - +2025-02-06 01:14:33 - INFO - stdout - {'loss': 0.3452, 'grad_norm': 1.3648494482040405, 'learning_rate': 3.1686445187069968e-06, 'epoch': 2.24} +2025-02-06 01:14:33 - ERROR - stderr - 75%|███████▍ | 16763/22434 [15:06:52<3:58:03, 2.52s/it] +2025-02-06 01:14:35 - ERROR - stderr - 75%|███████▍ | 16764/22434 [15:06:55<3:55:25, 2.49s/it] +2025-02-06 01:14:35 - ERROR - stderr - +2025-02-06 01:14:35 - ERROR - stderr - +2025-02-06 01:14:35 - INFO - stdout - {'loss': 0.3354, 'grad_norm': 1.396718978881836, 'learning_rate': 3.16759023387341e-06, 'epoch': 2.24} +2025-02-06 01:14:35 - ERROR - stderr - 75%|███████▍ | 16764/22434 [15:06:55<3:55:25, 2.49s/it] +2025-02-06 01:14:38 - ERROR - stderr - 75%|███████▍ | 16765/22434 [15:06:57<3:58:18, 2.52s/it] +2025-02-06 01:14:38 - ERROR - stderr - +2025-02-06 01:14:38 - ERROR - stderr - +2025-02-06 01:14:38 - 
INFO - stdout - {'loss': 0.38, 'grad_norm': 1.4493862390518188, 'learning_rate': 3.1665360914548603e-06, 'epoch': 2.24} +2025-02-06 01:14:38 - ERROR - stderr - 75%|███████▍ | 16765/22434 [15:06:57<3:58:18, 2.52s/it] +2025-02-06 01:14:40 - ERROR - stderr - 75%|███████▍ | 16766/22434 [15:07:00<3:56:57, 2.51s/it] +2025-02-06 01:14:40 - ERROR - stderr - +2025-02-06 01:14:40 - ERROR - stderr - +2025-02-06 01:14:40 - INFO - stdout - {'loss': 0.3326, 'grad_norm': 1.2586445808410645, 'learning_rate': 3.165482091473333e-06, 'epoch': 2.24} +2025-02-06 01:14:40 - ERROR - stderr - 75%|███████▍ | 16766/22434 [15:07:00<3:56:57, 2.51s/it] +2025-02-06 01:14:42 - ERROR - stderr - 75%|███████▍ | 16767/22434 [15:07:02<3:54:24, 2.48s/it] +2025-02-06 01:14:42 - ERROR - stderr - +2025-02-06 01:14:42 - ERROR - stderr - +2025-02-06 01:14:42 - INFO - stdout - {'loss': 0.3638, 'grad_norm': 1.4625751972198486, 'learning_rate': 3.1644282339507847e-06, 'epoch': 2.24} +2025-02-06 01:14:42 - ERROR - stderr - 75%|███████▍ | 16767/22434 [15:07:02<3:54:24, 2.48s/it] +2025-02-06 01:14:45 - ERROR - stderr - 75%|███████▍ | 16768/22434 [15:07:05<3:54:21, 2.48s/it] +2025-02-06 01:14:45 - ERROR - stderr - +2025-02-06 01:14:45 - ERROR - stderr - +2025-02-06 01:14:45 - INFO - stdout - {'loss': 0.3301, 'grad_norm': 1.3744359016418457, 'learning_rate': 3.163374518909197e-06, 'epoch': 2.24} +2025-02-06 01:14:45 - ERROR - stderr - 75%|███████▍ | 16768/22434 [15:07:05<3:54:21, 2.48s/it] +2025-02-06 01:14:47 - ERROR - stderr - 75%|███████▍ | 16769/22434 [15:07:07<3:56:51, 2.51s/it] +2025-02-06 01:14:48 - ERROR - stderr - +2025-02-06 01:14:48 - ERROR - stderr - +2025-02-06 01:14:48 - INFO - stdout - {'loss': 0.3748, 'grad_norm': 1.4949232339859009, 'learning_rate': 3.1623209463705207e-06, 'epoch': 2.24} +2025-02-06 01:14:48 - ERROR - stderr - 75%|███████▍ | 16769/22434 [15:07:07<3:56:51, 2.51s/it] +2025-02-06 01:14:50 - ERROR - stderr - 75%|███████▍ | 16770/22434 [15:07:10<3:55:12, 2.49s/it] +2025-02-06 01:14:50 
- ERROR - stderr - +2025-02-06 01:14:50 - ERROR - stderr - +2025-02-06 01:14:50 - INFO - stdout - {'loss': 0.3752, 'grad_norm': 1.4209668636322021, 'learning_rate': 3.1612675163567186e-06, 'epoch': 2.24} +2025-02-06 01:14:50 - ERROR - stderr - 75%|███████▍ | 16770/22434 [15:07:10<3:55:12, 2.49s/it] +2025-02-06 01:14:53 - ERROR - stderr - 75%|███████▍ | 16771/22434 [15:07:12<3:57:13, 2.51s/it] +2025-02-06 01:14:53 - ERROR - stderr - +2025-02-06 01:14:53 - ERROR - stderr - +2025-02-06 01:14:53 - INFO - stdout - {'loss': 0.3348, 'grad_norm': 1.4952473640441895, 'learning_rate': 3.1602142288897575e-06, 'epoch': 2.24} +2025-02-06 01:14:53 - ERROR - stderr - 75%|███████▍ | 16771/22434 [15:07:12<3:57:13, 2.51s/it] +2025-02-06 01:14:55 - ERROR - stderr - 75%|███████▍ | 16772/22434 [15:07:15<3:58:21, 2.53s/it] +2025-02-06 01:14:55 - ERROR - stderr - +2025-02-06 01:14:55 - ERROR - stderr - +2025-02-06 01:14:55 - INFO - stdout - {'loss': 0.3615, 'grad_norm': 1.5211883783340454, 'learning_rate': 3.1591610839915822e-06, 'epoch': 2.24} +2025-02-06 01:14:55 - ERROR - stderr - 75%|███████▍ | 16772/22434 [15:07:15<3:58:21, 2.53s/it] +2025-02-06 01:14:58 - ERROR - stderr - 75%|███████▍ | 16773/22434 [15:07:17<3:58:01, 2.52s/it] +2025-02-06 01:14:58 - ERROR - stderr - +2025-02-06 01:14:58 - ERROR - stderr - +2025-02-06 01:14:58 - INFO - stdout - {'loss': 0.3432, 'grad_norm': 1.39066743850708, 'learning_rate': 3.1581080816841492e-06, 'epoch': 2.24} +2025-02-06 01:14:58 - ERROR - stderr - 75%|███████▍ | 16773/22434 [15:07:17<3:58:01, 2.52s/it] +2025-02-06 01:15:00 - ERROR - stderr - 75%|███████▍ | 16774/22434 [15:07:20<3:56:20, 2.51s/it] +2025-02-06 01:15:00 - ERROR - stderr - +2025-02-06 01:15:00 - ERROR - stderr - +2025-02-06 01:15:00 - INFO - stdout - {'loss': 0.3018, 'grad_norm': 1.3620537519454956, 'learning_rate': 3.1570552219894055e-06, 'epoch': 2.24} +2025-02-06 01:15:00 - ERROR - stderr - 75%|███████▍ | 16774/22434 [15:07:20<3:56:20, 2.51s/it] +2025-02-06 01:15:03 - ERROR - 
stderr - 75%|███████▍ | 16775/22434 [15:07:22<3:55:33, 2.50s/it] +2025-02-06 01:15:03 - ERROR - stderr - +2025-02-06 01:15:03 - ERROR - stderr - +2025-02-06 01:15:03 - INFO - stdout - {'loss': 0.3263, 'grad_norm': 1.355678677558899, 'learning_rate': 3.1560025049292973e-06, 'epoch': 2.24} +2025-02-06 01:15:03 - ERROR - stderr - 75%|███████▍ | 16775/22434 [15:07:22<3:55:33, 2.50s/it] +2025-02-06 01:15:05 - ERROR - stderr - 75%|███████▍ | 16776/22434 [15:07:25<3:56:24, 2.51s/it] +2025-02-06 01:15:05 - ERROR - stderr - +2025-02-06 01:15:05 - ERROR - stderr - +2025-02-06 01:15:05 - INFO - stdout - {'loss': 0.3909, 'grad_norm': 1.6843209266662598, 'learning_rate': 3.154949930525769e-06, 'epoch': 2.24} +2025-02-06 01:15:05 - ERROR - stderr - 75%|███████▍ | 16776/22434 [15:07:25<3:56:24, 2.51s/it] +2025-02-06 01:15:08 - ERROR - stderr - 75%|███████▍ | 16777/22434 [15:07:27<3:57:47, 2.52s/it] +2025-02-06 01:15:08 - ERROR - stderr - +2025-02-06 01:15:08 - ERROR - stderr - +2025-02-06 01:15:08 - INFO - stdout - {'loss': 0.3595, 'grad_norm': 1.4969115257263184, 'learning_rate': 3.1538974988007587e-06, 'epoch': 2.24} +2025-02-06 01:15:08 - ERROR - stderr - 75%|███████▍ | 16777/22434 [15:07:27<3:57:47, 2.52s/it] +2025-02-06 01:15:10 - ERROR - stderr - 75%|███████▍ | 16778/22434 [15:07:30<4:05:56, 2.61s/it] +2025-02-06 01:15:10 - ERROR - stderr - +2025-02-06 01:15:10 - ERROR - stderr - +2025-02-06 01:15:10 - INFO - stdout - {'loss': 0.356, 'grad_norm': 1.4056955575942993, 'learning_rate': 3.152845209776204e-06, 'epoch': 2.24} +2025-02-06 01:15:10 - ERROR - stderr - 75%|███████▍ | 16778/22434 [15:07:30<4:05:56, 2.61s/it] +2025-02-06 01:15:13 - ERROR - stderr - 75%|███████▍ | 16779/22434 [15:07:33<4:02:01, 2.57s/it] +2025-02-06 01:15:13 - ERROR - stderr - +2025-02-06 01:15:13 - ERROR - stderr - +2025-02-06 01:15:13 - INFO - stdout - {'loss': 0.4431, 'grad_norm': 1.7109324932098389, 'learning_rate': 3.151793063474039e-06, 'epoch': 2.24} +2025-02-06 01:15:13 - ERROR - stderr - 
75%|███████▍ | 16779/22434 [15:07:33<4:02:01, 2.57s/it] +2025-02-06 01:15:15 - ERROR - stderr - 75%|███████▍ | 16780/22434 [15:07:35<4:01:53, 2.57s/it] +2025-02-06 01:15:16 - ERROR - stderr - +2025-02-06 01:15:16 - ERROR - stderr - +2025-02-06 01:15:16 - INFO - stdout - {'loss': 0.4179, 'grad_norm': 1.5168843269348145, 'learning_rate': 3.150741059916198e-06, 'epoch': 2.24} +2025-02-06 01:15:16 - ERROR - stderr - 75%|███████▍ | 16780/22434 [15:07:35<4:01:53, 2.57s/it] +2025-02-06 01:15:18 - ERROR - stderr - 75%|███████▍ | 16781/22434 [15:07:38<4:00:38, 2.55s/it] +2025-02-06 01:15:18 - ERROR - stderr - +2025-02-06 01:15:18 - ERROR - stderr - +2025-02-06 01:15:18 - INFO - stdout - {'loss': 0.3371, 'grad_norm': 1.3413865566253662, 'learning_rate': 3.1496891991245994e-06, 'epoch': 2.24} +2025-02-06 01:15:18 - ERROR - stderr - 75%|███████▍ | 16781/22434 [15:07:38<4:00:38, 2.55s/it] +2025-02-06 01:15:20 - ERROR - stderr - 75%|███████▍ | 16782/22434 [15:07:40<3:58:17, 2.53s/it] +2025-02-06 01:15:21 - ERROR - stderr - +2025-02-06 01:15:21 - ERROR - stderr - +2025-02-06 01:15:21 - INFO - stdout - {'loss': 0.3486, 'grad_norm': 1.4487597942352295, 'learning_rate': 3.148637481121177e-06, 'epoch': 2.24} +2025-02-06 01:15:21 - ERROR - stderr - 75%|███████▍ | 16782/22434 [15:07:40<3:58:17, 2.53s/it] +2025-02-06 01:15:23 - ERROR - stderr - 75%|███████▍ | 16783/22434 [15:07:43<4:00:37, 2.55s/it] +2025-02-06 01:15:23 - ERROR - stderr - +2025-02-06 01:15:23 - ERROR - stderr - +2025-02-06 01:15:23 - INFO - stdout - {'loss': 0.33, 'grad_norm': 1.455621361732483, 'learning_rate': 3.1475859059278502e-06, 'epoch': 2.24} +2025-02-06 01:15:23 - ERROR - stderr - 75%|███████▍ | 16783/22434 [15:07:43<4:00:37, 2.55s/it] +2025-02-06 01:15:25 - ERROR - stderr - 75%|███████▍ | 16784/22434 [15:07:45<3:56:53, 2.52s/it] +2025-02-06 01:15:26 - ERROR - stderr - +2025-02-06 01:15:26 - ERROR - stderr - +2025-02-06 01:15:26 - INFO - stdout - {'loss': 0.3956, 'grad_norm': 1.5780792236328125, 
'learning_rate': 3.146534473566539e-06, 'epoch': 2.24} +2025-02-06 01:15:26 - ERROR - stderr - 75%|███████▍ | 16784/22434 [15:07:45<3:56:53, 2.52s/it] +2025-02-06 01:15:28 - ERROR - stderr - 75%|███████▍ | 16785/22434 [15:07:48<3:57:27, 2.52s/it] +2025-02-06 01:15:28 - ERROR - stderr - +2025-02-06 01:15:28 - ERROR - stderr - +2025-02-06 01:15:28 - INFO - stdout - {'loss': 0.404, 'grad_norm': 1.7076690196990967, 'learning_rate': 3.1454831840591616e-06, 'epoch': 2.24} +2025-02-06 01:15:28 - ERROR - stderr - 75%|███████▍ | 16785/22434 [15:07:48<3:57:27, 2.52s/it] +2025-02-06 01:15:31 - ERROR - stderr - 75%|███████▍ | 16786/22434 [15:07:50<3:59:33, 2.54s/it] +2025-02-06 01:15:31 - ERROR - stderr - +2025-02-06 01:15:31 - ERROR - stderr - +2025-02-06 01:15:31 - INFO - stdout - {'loss': 0.3566, 'grad_norm': 1.6057820320129395, 'learning_rate': 3.1444320374276203e-06, 'epoch': 2.24} +2025-02-06 01:15:31 - ERROR - stderr - 75%|███████▍ | 16786/22434 [15:07:50<3:59:33, 2.54s/it] +2025-02-06 01:15:33 - ERROR - stderr - 75%|███████▍ | 16787/22434 [15:07:53<3:59:25, 2.54s/it] +2025-02-06 01:15:33 - ERROR - stderr - +2025-02-06 01:15:33 - ERROR - stderr - +2025-02-06 01:15:33 - INFO - stdout - {'loss': 0.4041, 'grad_norm': 1.6587820053100586, 'learning_rate': 3.143381033693842e-06, 'epoch': 2.24} +2025-02-06 01:15:33 - ERROR - stderr - 75%|███████▍ | 16787/22434 [15:07:53<3:59:25, 2.54s/it] +2025-02-06 01:15:36 - ERROR - stderr - 75%|███████▍ | 16788/22434 [15:07:55<3:57:40, 2.53s/it] +2025-02-06 01:15:36 - ERROR - stderr - +2025-02-06 01:15:36 - ERROR - stderr - +2025-02-06 01:15:36 - INFO - stdout - {'loss': 0.3964, 'grad_norm': 1.6360174417495728, 'learning_rate': 3.1423301728797197e-06, 'epoch': 2.24} +2025-02-06 01:15:36 - ERROR - stderr - 75%|███████▍ | 16788/22434 [15:07:55<3:57:40, 2.53s/it] +2025-02-06 01:15:38 - ERROR - stderr - 75%|███████▍ | 16789/22434 [15:07:58<3:57:56, 2.53s/it] +2025-02-06 01:15:38 - ERROR - stderr - +2025-02-06 01:15:38 - ERROR - stderr - 
+2025-02-06 01:15:38 - INFO - stdout - {'loss': 0.3468, 'grad_norm': 1.3965563774108887, 'learning_rate': 3.14127945500716e-06, 'epoch': 2.25} +2025-02-06 01:15:38 - ERROR - stderr - 75%|███████▍ | 16789/22434 [15:07:58<3:57:56, 2.53s/it] +2025-02-06 01:15:41 - ERROR - stderr - 75%|███████▍ | 16790/22434 [15:08:00<3:58:06, 2.53s/it] +2025-02-06 01:15:41 - ERROR - stderr - +2025-02-06 01:15:41 - ERROR - stderr - +2025-02-06 01:15:41 - INFO - stdout - {'loss': 0.4598, 'grad_norm': 1.6767451763153076, 'learning_rate': 3.140228880098074e-06, 'epoch': 2.25} +2025-02-06 01:15:41 - ERROR - stderr - 75%|███████▍ | 16790/22434 [15:08:01<3:58:06, 2.53s/it] +2025-02-06 01:15:43 - ERROR - stderr - 75%|███████▍ | 16791/22434 [15:08:03<4:00:52, 2.56s/it] +2025-02-06 01:15:43 - ERROR - stderr - +2025-02-06 01:15:43 - ERROR - stderr - +2025-02-06 01:15:43 - INFO - stdout - {'loss': 0.4531, 'grad_norm': 1.6556202173233032, 'learning_rate': 3.139178448174347e-06, 'epoch': 2.25} +2025-02-06 01:15:43 - ERROR - stderr - 75%|███████▍ | 16791/22434 [15:08:03<4:00:52, 2.56s/it] +2025-02-06 01:15:46 - ERROR - stderr - 75%|███████▍ | 16792/22434 [15:08:06<4:15:05, 2.71s/it] +2025-02-06 01:15:46 - ERROR - stderr - +2025-02-06 01:15:46 - ERROR - stderr - +2025-02-06 01:15:46 - INFO - stdout - {'loss': 0.3827, 'grad_norm': 1.63777756690979, 'learning_rate': 3.138128159257885e-06, 'epoch': 2.25} +2025-02-06 01:15:46 - ERROR - stderr - 75%|███████▍ | 16792/22434 [15:08:06<4:15:05, 2.71s/it] +2025-02-06 01:15:49 - ERROR - stderr - 75%|███████▍ | 16793/22434 [15:08:09<4:10:19, 2.66s/it] +2025-02-06 01:15:49 - ERROR - stderr - +2025-02-06 01:15:49 - ERROR - stderr - +2025-02-06 01:15:49 - INFO - stdout - {'loss': 0.3309, 'grad_norm': 1.5401463508605957, 'learning_rate': 3.1370780133705737e-06, 'epoch': 2.25} +2025-02-06 01:15:49 - ERROR - stderr - 75%|███████▍ | 16793/22434 [15:08:09<4:10:19, 2.66s/it] +2025-02-06 01:15:51 - ERROR - stderr - 75%|███████▍ | 16794/22434 [15:08:11<4:04:26, 2.60s/it] 
+2025-02-06 01:15:51 - ERROR - stderr - +2025-02-06 01:15:51 - ERROR - stderr - +2025-02-06 01:15:51 - INFO - stdout - {'loss': 0.4333, 'grad_norm': 1.5661907196044922, 'learning_rate': 3.136028010534303e-06, 'epoch': 2.25} +2025-02-06 01:15:51 - ERROR - stderr - 75%|███████▍ | 16794/22434 [15:08:11<4:04:26, 2.60s/it] +2025-02-06 01:15:54 - ERROR - stderr - 75%|███████▍ | 16795/22434 [15:08:14<4:09:07, 2.65s/it] +2025-02-06 01:15:54 - ERROR - stderr - +2025-02-06 01:15:54 - ERROR - stderr - +2025-02-06 01:15:54 - INFO - stdout - {'loss': 0.324, 'grad_norm': 1.525539517402649, 'learning_rate': 3.1349781507709607e-06, 'epoch': 2.25} +2025-02-06 01:15:54 - ERROR - stderr - 75%|███████▍ | 16795/22434 [15:08:14<4:09:07, 2.65s/it] +2025-02-06 01:15:57 - ERROR - stderr - 75%|███████▍ | 16796/22434 [15:08:16<4:03:51, 2.60s/it] +2025-02-06 01:15:57 - ERROR - stderr - +2025-02-06 01:15:57 - ERROR - stderr - +2025-02-06 01:15:57 - INFO - stdout - {'loss': 0.3478, 'grad_norm': 1.4739433526992798, 'learning_rate': 3.13392843410243e-06, 'epoch': 2.25} +2025-02-06 01:15:57 - ERROR - stderr - 75%|███████▍ | 16796/22434 [15:08:16<4:03:51, 2.60s/it] +2025-02-06 01:15:59 - ERROR - stderr - 75%|███████▍ | 16797/22434 [15:08:19<4:01:28, 2.57s/it] +2025-02-06 01:15:59 - ERROR - stderr - +2025-02-06 01:15:59 - ERROR - stderr - +2025-02-06 01:15:59 - INFO - stdout - {'loss': 0.3575, 'grad_norm': 1.343876600265503, 'learning_rate': 3.132878860550591e-06, 'epoch': 2.25} +2025-02-06 01:15:59 - ERROR - stderr - 75%|███████▍ | 16797/22434 [15:08:19<4:01:28, 2.57s/it] +2025-02-06 01:16:02 - ERROR - stderr - 75%|███████▍ | 16798/22434 [15:08:21<3:59:24, 2.55s/it] +2025-02-06 01:16:02 - ERROR - stderr - +2025-02-06 01:16:02 - ERROR - stderr - +2025-02-06 01:16:02 - INFO - stdout - {'loss': 0.3847, 'grad_norm': 1.5756762027740479, 'learning_rate': 3.131829430137321e-06, 'epoch': 2.25} +2025-02-06 01:16:02 - ERROR - stderr - 75%|███████▍ | 16798/22434 [15:08:21<3:59:24, 2.55s/it] +2025-02-06 
01:16:04 - ERROR - stderr - 75%|███████▍ | 16799/22434 [15:08:24<3:59:10, 2.55s/it] +2025-02-06 01:16:04 - ERROR - stderr - +2025-02-06 01:16:04 - ERROR - stderr - +2025-02-06 01:16:04 - INFO - stdout - {'loss': 0.3734, 'grad_norm': 1.4930800199508667, 'learning_rate': 3.130780142884494e-06, 'epoch': 2.25} +2025-02-06 01:16:04 - ERROR - stderr - 75%|███████▍ | 16799/22434 [15:08:24<3:59:10, 2.55s/it] +2025-02-06 01:16:07 - ERROR - stderr - 75%|███████▍ | 16800/22434 [15:08:27<4:05:39, 2.62s/it] +2025-02-06 01:16:07 - ERROR - stderr - +2025-02-06 01:16:07 - ERROR - stderr - +2025-02-06 01:16:07 - INFO - stdout - {'loss': 0.357, 'grad_norm': 1.5194464921951294, 'learning_rate': 3.1297309988139824e-06, 'epoch': 2.25} +2025-02-06 01:16:07 - ERROR - stderr - 75%|███████▍ | 16800/22434 [15:08:27<4:05:39, 2.62s/it] +2025-02-06 01:16:10 - ERROR - stderr - 75%|███████▍ | 16801/22434 [15:08:29<4:07:50, 2.64s/it] +2025-02-06 01:16:10 - ERROR - stderr - +2025-02-06 01:16:10 - ERROR - stderr - +2025-02-06 01:16:10 - INFO - stdout - {'loss': 0.3771, 'grad_norm': 1.6453365087509155, 'learning_rate': 3.1286819979476533e-06, 'epoch': 2.25} +2025-02-06 01:16:10 - ERROR - stderr - 75%|███████▍ | 16801/22434 [15:08:30<4:07:50, 2.64s/it] +2025-02-06 01:16:12 - ERROR - stderr - 75%|███████▍ | 16802/22434 [15:08:32<4:06:24, 2.63s/it] +2025-02-06 01:16:12 - ERROR - stderr - +2025-02-06 01:16:12 - ERROR - stderr - +2025-02-06 01:16:12 - INFO - stdout - {'loss': 0.3763, 'grad_norm': 1.3833181858062744, 'learning_rate': 3.1276331403073733e-06, 'epoch': 2.25} +2025-02-06 01:16:12 - ERROR - stderr - 75%|███████▍ | 16802/22434 [15:08:32<4:06:24, 2.63s/it] +2025-02-06 01:16:15 - ERROR - stderr - 75%|███████▍ | 16803/22434 [15:08:35<4:03:47, 2.60s/it] +2025-02-06 01:16:15 - ERROR - stderr - +2025-02-06 01:16:15 - ERROR - stderr - +2025-02-06 01:16:15 - INFO - stdout - {'loss': 0.3516, 'grad_norm': 1.5450055599212646, 'learning_rate': 3.1265844259150035e-06, 'epoch': 2.25} +2025-02-06 01:16:15 - 
ERROR - stderr - 75%|███████▍ | 16803/22434 [15:08:35<4:03:47, 2.60s/it] +2025-02-06 01:16:17 - ERROR - stderr - 75%|███████▍ | 16804/22434 [15:08:37<4:03:15, 2.59s/it] +2025-02-06 01:16:17 - ERROR - stderr - +2025-02-06 01:16:17 - ERROR - stderr - +2025-02-06 01:16:17 - INFO - stdout - {'loss': 0.4233, 'grad_norm': 1.8055185079574585, 'learning_rate': 3.1255358547924084e-06, 'epoch': 2.25} +2025-02-06 01:16:17 - ERROR - stderr - 75%|███████▍ | 16804/22434 [15:08:37<4:03:15, 2.59s/it] +2025-02-06 01:16:20 - ERROR - stderr - 75%|███████▍ | 16805/22434 [15:08:40<4:01:00, 2.57s/it] +2025-02-06 01:16:20 - ERROR - stderr - +2025-02-06 01:16:20 - ERROR - stderr - +2025-02-06 01:16:20 - INFO - stdout - {'loss': 0.3726, 'grad_norm': 1.5006890296936035, 'learning_rate': 3.1244874269614335e-06, 'epoch': 2.25} +2025-02-06 01:16:20 - ERROR - stderr - 75%|███████▍ | 16805/22434 [15:08:40<4:01:00, 2.57s/it] +2025-02-06 01:16:22 - ERROR - stderr - 75%|███████▍ | 16806/22434 [15:08:42<3:58:23, 2.54s/it] +2025-02-06 01:16:22 - ERROR - stderr - +2025-02-06 01:16:22 - ERROR - stderr - +2025-02-06 01:16:22 - INFO - stdout - {'loss': 0.3382, 'grad_norm': 1.4413305521011353, 'learning_rate': 3.123439142443946e-06, 'epoch': 2.25} +2025-02-06 01:16:22 - ERROR - stderr - 75%|███████▍ | 16806/22434 [15:08:42<3:58:23, 2.54s/it] +2025-02-06 01:16:25 - ERROR - stderr - 75%|███████▍ | 16807/22434 [15:08:45<3:57:55, 2.54s/it] +2025-02-06 01:16:25 - ERROR - stderr - +2025-02-06 01:16:25 - ERROR - stderr - +2025-02-06 01:16:25 - INFO - stdout - {'loss': 0.3805, 'grad_norm': 1.3707932233810425, 'learning_rate': 3.122391001261782e-06, 'epoch': 2.25} +2025-02-06 01:16:25 - ERROR - stderr - 75%|███████▍ | 16807/22434 [15:08:45<3:57:55, 2.54s/it] +2025-02-06 01:16:27 - ERROR - stderr - 75%|███████▍ | 16808/22434 [15:08:47<3:57:53, 2.54s/it] +2025-02-06 01:16:27 - ERROR - stderr - +2025-02-06 01:16:27 - ERROR - stderr - +2025-02-06 01:16:27 - INFO - stdout - {'loss': 0.3413, 'grad_norm': 
1.3762987852096558, 'learning_rate': 3.1213430034367995e-06, 'epoch': 2.25} +2025-02-06 01:16:27 - ERROR - stderr - 75%|███████▍ | 16808/22434 [15:08:47<3:57:53, 2.54s/it] +2025-02-06 01:16:30 - ERROR - stderr - 75%|███████▍ | 16809/22434 [15:08:50<3:58:26, 2.54s/it] +2025-02-06 01:16:30 - ERROR - stderr - +2025-02-06 01:16:30 - ERROR - stderr - +2025-02-06 01:16:30 - INFO - stdout - {'loss': 0.4057, 'grad_norm': 1.5467716455459595, 'learning_rate': 3.120295148990845e-06, 'epoch': 2.25} +2025-02-06 01:16:30 - ERROR - stderr - 75%|███████▍ | 16809/22434 [15:08:50<3:58:26, 2.54s/it] +2025-02-06 01:16:32 - ERROR - stderr - 75%|███████▍ | 16810/22434 [15:08:52<3:56:58, 2.53s/it] +2025-02-06 01:16:33 - ERROR - stderr - +2025-02-06 01:16:33 - ERROR - stderr - +2025-02-06 01:16:33 - INFO - stdout - {'loss': 0.3914, 'grad_norm': 1.6736401319503784, 'learning_rate': 3.119247437945747e-06, 'epoch': 2.25} +2025-02-06 01:16:33 - ERROR - stderr - 75%|███████▍ | 16810/22434 [15:08:52<3:56:58, 2.53s/it] +2025-02-06 01:16:35 - ERROR - stderr - 75%|███████▍ | 16811/22434 [15:08:55<3:55:39, 2.51s/it] +2025-02-06 01:16:35 - ERROR - stderr - +2025-02-06 01:16:35 - ERROR - stderr - +2025-02-06 01:16:35 - INFO - stdout - {'loss': 0.35, 'grad_norm': 1.400375485420227, 'learning_rate': 3.1181998703233584e-06, 'epoch': 2.25} +2025-02-06 01:16:35 - ERROR - stderr - 75%|███████▍ | 16811/22434 [15:08:55<3:55:39, 2.51s/it] +2025-02-06 01:16:38 - ERROR - stderr - 75%|███████▍ | 16812/22434 [15:08:57<4:00:43, 2.57s/it] +2025-02-06 01:16:38 - ERROR - stderr - +2025-02-06 01:16:38 - ERROR - stderr - +2025-02-06 01:16:38 - INFO - stdout - {'loss': 0.3876, 'grad_norm': 1.4999583959579468, 'learning_rate': 3.117152446145506e-06, 'epoch': 2.25} +2025-02-06 01:16:38 - ERROR - stderr - 75%|███████▍ | 16812/22434 [15:08:57<4:00:43, 2.57s/it] +2025-02-06 01:16:40 - ERROR - stderr - 75%|███████▍ | 16813/22434 [15:09:00<3:59:26, 2.56s/it] +2025-02-06 01:16:40 - ERROR - stderr - +2025-02-06 01:16:40 - ERROR 
- stderr - +2025-02-06 01:16:40 - INFO - stdout - {'loss': 0.3768, 'grad_norm': 1.5084211826324463, 'learning_rate': 3.1161051654340236e-06, 'epoch': 2.25} +2025-02-06 01:16:40 - ERROR - stderr - 75%|███████▍ | 16813/22434 [15:09:00<3:59:26, 2.56s/it] +2025-02-06 01:16:43 - ERROR - stderr - 75%|███████▍ | 16814/22434 [15:09:03<4:03:16, 2.60s/it] +2025-02-06 01:16:43 - ERROR - stderr - +2025-02-06 01:16:43 - ERROR - stderr - +2025-02-06 01:16:43 - INFO - stdout - {'loss': 0.3922, 'grad_norm': 1.5979585647583008, 'learning_rate': 3.1150580282107425e-06, 'epoch': 2.25} +2025-02-06 01:16:43 - ERROR - stderr - 75%|███████▍ | 16814/22434 [15:09:03<4:03:16, 2.60s/it] +2025-02-06 01:16:46 - ERROR - stderr - 75%|███████▍ | 16815/22434 [15:09:05<4:09:29, 2.66s/it] +2025-02-06 01:16:46 - ERROR - stderr - +2025-02-06 01:16:46 - ERROR - stderr - +2025-02-06 01:16:46 - INFO - stdout - {'loss': 0.3581, 'grad_norm': 1.420744776725769, 'learning_rate': 3.114011034497485e-06, 'epoch': 2.25} +2025-02-06 01:16:46 - ERROR - stderr - 75%|███████▍ | 16815/22434 [15:09:06<4:09:29, 2.66s/it] +2025-02-06 01:16:48 - ERROR - stderr - 75%|███████▍ | 16816/22434 [15:09:08<4:04:15, 2.61s/it] +2025-02-06 01:16:48 - ERROR - stderr - +2025-02-06 01:16:48 - ERROR - stderr - +2025-02-06 01:16:48 - INFO - stdout - {'loss': 0.3758, 'grad_norm': 1.466162919998169, 'learning_rate': 3.1129641843160854e-06, 'epoch': 2.25} +2025-02-06 01:16:48 - ERROR - stderr - 75%|███████▍ | 16816/22434 [15:09:08<4:04:15, 2.61s/it] +2025-02-06 01:16:51 - ERROR - stderr - 75%|███████▍ | 16817/22434 [15:09:11<4:17:27, 2.75s/it] +2025-02-06 01:16:51 - ERROR - stderr - +2025-02-06 01:16:51 - ERROR - stderr - +2025-02-06 01:16:51 - INFO - stdout - {'loss': 0.376, 'grad_norm': 1.5573830604553223, 'learning_rate': 3.111917477688353e-06, 'epoch': 2.25} +2025-02-06 01:16:51 - ERROR - stderr - 75%|███████▍ | 16817/22434 [15:09:11<4:17:27, 2.75s/it] +2025-02-06 01:16:54 - ERROR - stderr - 75%|███████▍ | 16818/22434 
[15:09:13<4:09:01, 2.66s/it] +2025-02-06 01:16:54 - ERROR - stderr - +2025-02-06 01:16:54 - ERROR - stderr - +2025-02-06 01:16:54 - INFO - stdout - {'loss': 0.3821, 'grad_norm': 1.4667078256607056, 'learning_rate': 3.1108709146361106e-06, 'epoch': 2.25} +2025-02-06 01:16:54 - ERROR - stderr - 75%|███████▍ | 16818/22434 [15:09:14<4:09:01, 2.66s/it] +2025-02-06 01:16:56 - ERROR - stderr - 75%|███████▍ | 16819/22434 [15:09:16<4:03:55, 2.61s/it] +2025-02-06 01:16:56 - ERROR - stderr - +2025-02-06 01:16:56 - ERROR - stderr - +2025-02-06 01:16:56 - INFO - stdout - {'loss': 0.3951, 'grad_norm': 1.5942108631134033, 'learning_rate': 3.1098244951811718e-06, 'epoch': 2.25} +2025-02-06 01:16:56 - ERROR - stderr - 75%|███████▍ | 16819/22434 [15:09:16<4:03:55, 2.61s/it] +2025-02-06 01:16:59 - ERROR - stderr - 75%|███████▍ | 16820/22434 [15:09:18<4:00:43, 2.57s/it] +2025-02-06 01:16:59 - ERROR - stderr - +2025-02-06 01:16:59 - ERROR - stderr - +2025-02-06 01:16:59 - INFO - stdout - {'loss': 0.3846, 'grad_norm': 1.6193267107009888, 'learning_rate': 3.1087782193453477e-06, 'epoch': 2.25} +2025-02-06 01:16:59 - ERROR - stderr - 75%|███████▍ | 16820/22434 [15:09:19<4:00:43, 2.57s/it] +2025-02-06 01:17:01 - ERROR - stderr - 75%|███████▍ | 16821/22434 [15:09:21<4:01:10, 2.58s/it] +2025-02-06 01:17:01 - ERROR - stderr - +2025-02-06 01:17:01 - ERROR - stderr - +2025-02-06 01:17:01 - INFO - stdout - {'loss': 0.408, 'grad_norm': 1.5907152891159058, 'learning_rate': 3.107732087150447e-06, 'epoch': 2.25} +2025-02-06 01:17:01 - ERROR - stderr - 75%|███████▍ | 16821/22434 [15:09:21<4:01:10, 2.58s/it] +2025-02-06 01:17:04 - ERROR - stderr - 75%|███████▍ | 16822/22434 [15:09:24<4:02:11, 2.59s/it] +2025-02-06 01:17:04 - ERROR - stderr - +2025-02-06 01:17:04 - ERROR - stderr - +2025-02-06 01:17:04 - INFO - stdout - {'loss': 0.4067, 'grad_norm': 1.5456286668777466, 'learning_rate': 3.106686098618277e-06, 'epoch': 2.25} +2025-02-06 01:17:04 - ERROR - stderr - 75%|███████▍ | 16822/22434 
[15:09:24<4:02:11, 2.59s/it] +2025-02-06 01:17:06 - ERROR - stderr - 75%|███████▍ | 16823/22434 [15:09:26<3:58:13, 2.55s/it] +2025-02-06 01:17:06 - ERROR - stderr - +2025-02-06 01:17:06 - ERROR - stderr - +2025-02-06 01:17:06 - INFO - stdout - {'loss': 0.3455, 'grad_norm': 1.4454903602600098, 'learning_rate': 3.1056402537706375e-06, 'epoch': 2.25} +2025-02-06 01:17:06 - ERROR - stderr - 75%|███████▍ | 16823/22434 [15:09:26<3:58:13, 2.55s/it] +2025-02-06 01:17:09 - ERROR - stderr - 75%|███████▍ | 16824/22434 [15:09:29<3:57:00, 2.53s/it] +2025-02-06 01:17:09 - ERROR - stderr - +2025-02-06 01:17:09 - ERROR - stderr - +2025-02-06 01:17:09 - INFO - stdout - {'loss': 0.3482, 'grad_norm': 1.3326643705368042, 'learning_rate': 3.1045945526293307e-06, 'epoch': 2.25} +2025-02-06 01:17:09 - ERROR - stderr - 75%|███████▍ | 16824/22434 [15:09:29<3:57:00, 2.53s/it] +2025-02-06 01:17:11 - ERROR - stderr - 75%|███████▍ | 16825/22434 [15:09:31<3:57:28, 2.54s/it] +2025-02-06 01:17:11 - ERROR - stderr - +2025-02-06 01:17:11 - ERROR - stderr - +2025-02-06 01:17:11 - INFO - stdout - {'loss': 0.3728, 'grad_norm': 1.5931422710418701, 'learning_rate': 3.1035489952161556e-06, 'epoch': 2.25} +2025-02-06 01:17:11 - ERROR - stderr - 75%|███████▍ | 16825/22434 [15:09:31<3:57:28, 2.54s/it] +2025-02-06 01:17:14 - ERROR - stderr - 75%|███████▌ | 16826/22434 [15:09:34<3:54:49, 2.51s/it] +2025-02-06 01:17:14 - ERROR - stderr - +2025-02-06 01:17:14 - ERROR - stderr - +2025-02-06 01:17:14 - INFO - stdout - {'loss': 0.3908, 'grad_norm': 1.6935925483703613, 'learning_rate': 3.102503581552896e-06, 'epoch': 2.25} +2025-02-06 01:17:14 - ERROR - stderr - 75%|███████▌ | 16826/22434 [15:09:34<3:54:49, 2.51s/it] +2025-02-06 01:17:17 - ERROR - stderr - 75%|███████▌ | 16827/22434 [15:09:36<3:59:08, 2.56s/it] +2025-02-06 01:17:17 - ERROR - stderr - +2025-02-06 01:17:17 - ERROR - stderr - +2025-02-06 01:17:17 - INFO - stdout - {'loss': 0.3751, 'grad_norm': 1.5537736415863037, 'learning_rate': 
3.101458311661352e-06, 'epoch': 2.25} +2025-02-06 01:17:17 - ERROR - stderr - 75%|███████▌ | 16827/22434 [15:09:36<3:59:08, 2.56s/it] +2025-02-06 01:17:19 - ERROR - stderr - 75%|███████▌ | 16828/22434 [15:09:39<3:56:23, 2.53s/it] +2025-02-06 01:17:19 - ERROR - stderr - +2025-02-06 01:17:19 - ERROR - stderr - +2025-02-06 01:17:19 - INFO - stdout - {'loss': 0.3937, 'grad_norm': 1.660982370376587, 'learning_rate': 3.100413185563309e-06, 'epoch': 2.25} +2025-02-06 01:17:19 - ERROR - stderr - 75%|███████▌ | 16828/22434 [15:09:39<3:56:23, 2.53s/it] +2025-02-06 01:17:22 - ERROR - stderr - 75%|███████▌ | 16829/22434 [15:09:41<3:57:55, 2.55s/it] +2025-02-06 01:17:22 - ERROR - stderr - +2025-02-06 01:17:22 - ERROR - stderr - +2025-02-06 01:17:22 - INFO - stdout - {'loss': 0.3872, 'grad_norm': 1.6677594184875488, 'learning_rate': 3.0993682032805507e-06, 'epoch': 2.25} +2025-02-06 01:17:22 - ERROR - stderr - 75%|███████▌ | 16829/22434 [15:09:41<3:57:55, 2.55s/it] +2025-02-06 01:17:24 - ERROR - stderr - 75%|███████▌ | 16830/22434 [15:09:44<3:57:17, 2.54s/it] +2025-02-06 01:17:24 - ERROR - stderr - +2025-02-06 01:17:24 - ERROR - stderr - +2025-02-06 01:17:24 - INFO - stdout - {'loss': 0.3426, 'grad_norm': 1.5561074018478394, 'learning_rate': 3.0983233648348608e-06, 'epoch': 2.25} +2025-02-06 01:17:24 - ERROR - stderr - 75%|███████▌ | 16830/22434 [15:09:44<3:57:17, 2.54s/it] +2025-02-06 01:17:27 - ERROR - stderr - 75%|███████▌ | 16831/22434 [15:09:46<3:55:05, 2.52s/it] +2025-02-06 01:17:27 - ERROR - stderr - +2025-02-06 01:17:27 - ERROR - stderr - +2025-02-06 01:17:27 - INFO - stdout - {'loss': 0.3578, 'grad_norm': 1.4261614084243774, 'learning_rate': 3.0972786702480116e-06, 'epoch': 2.25} +2025-02-06 01:17:27 - ERROR - stderr - 75%|███████▌ | 16831/22434 [15:09:46<3:55:05, 2.52s/it] +2025-02-06 01:17:29 - ERROR - stderr - 75%|███████▌ | 16832/22434 [15:09:49<3:58:21, 2.55s/it] +2025-02-06 01:17:29 - ERROR - stderr - +2025-02-06 01:17:29 - ERROR - stderr - +2025-02-06 01:17:29 - 
INFO - stdout - {'loss': 0.3258, 'grad_norm': 1.276711106300354, 'learning_rate': 3.096234119541789e-06, 'epoch': 2.25} +2025-02-06 01:17:29 - ERROR - stderr - 75%|███████▌ | 16832/22434 [15:09:49<3:58:21, 2.55s/it] +2025-02-06 01:17:32 - ERROR - stderr - 75%|███████▌ | 16833/22434 [15:09:52<3:59:26, 2.56s/it] +2025-02-06 01:17:32 - ERROR - stderr - +2025-02-06 01:17:32 - ERROR - stderr - +2025-02-06 01:17:32 - INFO - stdout - {'loss': 0.4213, 'grad_norm': 1.5506370067596436, 'learning_rate': 3.095189712737957e-06, 'epoch': 2.25} +2025-02-06 01:17:32 - ERROR - stderr - 75%|███████▌ | 16833/22434 [15:09:52<3:59:26, 2.56s/it] +2025-02-06 01:17:34 - ERROR - stderr - 75%|███████▌ | 16834/22434 [15:09:54<3:56:51, 2.54s/it] +2025-02-06 01:17:34 - ERROR - stderr - +2025-02-06 01:17:34 - ERROR - stderr - +2025-02-06 01:17:34 - INFO - stdout - {'loss': 0.3456, 'grad_norm': 1.4372632503509521, 'learning_rate': 3.0941454498582847e-06, 'epoch': 2.25} +2025-02-06 01:17:34 - ERROR - stderr - 75%|███████▌ | 16834/22434 [15:09:54<3:56:51, 2.54s/it] +2025-02-06 01:17:37 - ERROR - stderr - 75%|███████▌ | 16835/22434 [15:09:56<3:54:17, 2.51s/it] +2025-02-06 01:17:37 - ERROR - stderr - +2025-02-06 01:17:37 - ERROR - stderr - +2025-02-06 01:17:37 - INFO - stdout - {'loss': 0.3939, 'grad_norm': 1.4412713050842285, 'learning_rate': 3.0931013309245484e-06, 'epoch': 2.25} +2025-02-06 01:17:37 - ERROR - stderr - 75%|███████▌ | 16835/22434 [15:09:57<3:54:17, 2.51s/it] +2025-02-06 01:17:39 - ERROR - stderr - 75%|███████▌ | 16836/22434 [15:09:59<3:58:22, 2.55s/it] +2025-02-06 01:17:39 - ERROR - stderr - +2025-02-06 01:17:39 - ERROR - stderr - +2025-02-06 01:17:39 - INFO - stdout - {'loss': 0.3687, 'grad_norm': 1.3859285116195679, 'learning_rate': 3.0920573559585e-06, 'epoch': 2.25} +2025-02-06 01:17:39 - ERROR - stderr - 75%|███████▌ | 16836/22434 [15:09:59<3:58:22, 2.55s/it] +2025-02-06 01:17:42 - ERROR - stderr - 75%|███████▌ | 16837/22434 [15:10:02<3:57:41, 2.55s/it] +2025-02-06 01:17:42 - 
ERROR - stderr - +2025-02-06 01:17:42 - ERROR - stderr - +2025-02-06 01:17:42 - INFO - stdout - {'loss': 0.3856, 'grad_norm': 1.4443210363388062, 'learning_rate': 3.0910135249819116e-06, 'epoch': 2.25} +2025-02-06 01:17:42 - ERROR - stderr - 75%|███████▌ | 16837/22434 [15:10:02<3:57:41, 2.55s/it] +2025-02-06 01:17:44 - ERROR - stderr - 75%|███████▌ | 16838/22434 [15:10:04<3:55:24, 2.52s/it] +2025-02-06 01:17:44 - ERROR - stderr - +2025-02-06 01:17:44 - ERROR - stderr - +2025-02-06 01:17:44 - INFO - stdout - {'loss': 0.3545, 'grad_norm': 1.4229679107666016, 'learning_rate': 3.089969838016532e-06, 'epoch': 2.25} +2025-02-06 01:17:44 - ERROR - stderr - 75%|███████▌ | 16838/22434 [15:10:04<3:55:24, 2.52s/it] +2025-02-06 01:17:47 - ERROR - stderr - 75%|███████▌ | 16839/22434 [15:10:07<3:53:13, 2.50s/it] +2025-02-06 01:17:47 - ERROR - stderr - +2025-02-06 01:17:47 - ERROR - stderr - +2025-02-06 01:17:47 - INFO - stdout - {'loss': 0.3865, 'grad_norm': 1.564966082572937, 'learning_rate': 3.0889262950841205e-06, 'epoch': 2.25} +2025-02-06 01:17:47 - ERROR - stderr - 75%|███████▌ | 16839/22434 [15:10:07<3:53:13, 2.50s/it] +2025-02-06 01:17:49 - ERROR - stderr - 75%|███████▌ | 16840/22434 [15:10:09<3:55:03, 2.52s/it] +2025-02-06 01:17:49 - ERROR - stderr - +2025-02-06 01:17:49 - ERROR - stderr - +2025-02-06 01:17:49 - INFO - stdout - {'loss': 0.368, 'grad_norm': 1.4747819900512695, 'learning_rate': 3.0878828962064256e-06, 'epoch': 2.25} +2025-02-06 01:17:49 - ERROR - stderr - 75%|███████▌ | 16840/22434 [15:10:09<3:55:03, 2.52s/it] +2025-02-06 01:17:52 - ERROR - stderr - 75%|███████▌ | 16841/22434 [15:10:12<3:55:51, 2.53s/it] +2025-02-06 01:17:52 - ERROR - stderr - +2025-02-06 01:17:52 - ERROR - stderr - +2025-02-06 01:17:52 - INFO - stdout - {'loss': 0.374, 'grad_norm': 1.4413813352584839, 'learning_rate': 3.086839641405197e-06, 'epoch': 2.25} +2025-02-06 01:17:52 - ERROR - stderr - 75%|███████▌ | 16841/22434 [15:10:12<3:55:51, 2.53s/it] +2025-02-06 01:17:54 - ERROR - stderr 
- 75%|███████▌ | 16842/22434 [15:10:14<3:54:27, 2.52s/it] +2025-02-06 01:17:54 - ERROR - stderr - +2025-02-06 01:17:54 - ERROR - stderr - +2025-02-06 01:17:54 - INFO - stdout - {'loss': 0.3595, 'grad_norm': 1.4476912021636963, 'learning_rate': 3.085796530702182e-06, 'epoch': 2.25} +2025-02-06 01:17:54 - ERROR - stderr - 75%|███████▌ | 16842/22434 [15:10:14<3:54:27, 2.52s/it] +2025-02-06 01:17:57 - ERROR - stderr - 75%|███████▌ | 16843/22434 [15:10:17<3:55:39, 2.53s/it] +2025-02-06 01:17:57 - ERROR - stderr - +2025-02-06 01:17:57 - ERROR - stderr - +2025-02-06 01:17:57 - INFO - stdout - {'loss': 0.3709, 'grad_norm': 1.414391040802002, 'learning_rate': 3.084753564119122e-06, 'epoch': 2.25} +2025-02-06 01:17:57 - ERROR - stderr - 75%|███████▌ | 16843/22434 [15:10:17<3:55:39, 2.53s/it] +2025-02-06 01:17:59 - ERROR - stderr - 75%|███████▌ | 16844/22434 [15:10:19<3:54:01, 2.51s/it] +2025-02-06 01:18:00 - ERROR - stderr - +2025-02-06 01:18:00 - ERROR - stderr - +2025-02-06 01:18:00 - INFO - stdout - {'loss': 0.3413, 'grad_norm': 1.5013394355773926, 'learning_rate': 3.083710741677757e-06, 'epoch': 2.25} +2025-02-06 01:18:00 - ERROR - stderr - 75%|███████▌ | 16844/22434 [15:10:19<3:54:01, 2.51s/it] +2025-02-06 01:18:02 - ERROR - stderr - 75%|███████▌ | 16845/22434 [15:10:22<3:56:38, 2.54s/it] +2025-02-06 01:18:02 - ERROR - stderr - +2025-02-06 01:18:02 - ERROR - stderr - +2025-02-06 01:18:02 - INFO - stdout - {'loss': 0.366, 'grad_norm': 1.5913958549499512, 'learning_rate': 3.082668063399823e-06, 'epoch': 2.25} +2025-02-06 01:18:02 - ERROR - stderr - 75%|███████▌ | 16845/22434 [15:10:22<3:56:38, 2.54s/it] +2025-02-06 01:18:05 - ERROR - stderr - 75%|███████▌ | 16846/22434 [15:10:24<3:55:58, 2.53s/it] +2025-02-06 01:18:05 - ERROR - stderr - +2025-02-06 01:18:05 - ERROR - stderr - +2025-02-06 01:18:05 - INFO - stdout - {'loss': 0.3804, 'grad_norm': 1.4901087284088135, 'learning_rate': 3.081625529307054e-06, 'epoch': 2.25} +2025-02-06 01:18:05 - ERROR - stderr - 75%|███████▌ | 
16846/22434 [15:10:24<3:55:58, 2.53s/it] +2025-02-06 01:18:07 - ERROR - stderr - 75%|███████▌ | 16847/22434 [15:10:27<3:54:57, 2.52s/it] +2025-02-06 01:18:07 - ERROR - stderr - +2025-02-06 01:18:07 - ERROR - stderr - +2025-02-06 01:18:07 - INFO - stdout - {'loss': 0.377, 'grad_norm': 1.5260628461837769, 'learning_rate': 3.0805831394211805e-06, 'epoch': 2.25} +2025-02-06 01:18:07 - ERROR - stderr - 75%|███████▌ | 16847/22434 [15:10:27<3:54:57, 2.52s/it] +2025-02-06 01:18:10 - ERROR - stderr - 75%|███████▌ | 16848/22434 [15:10:29<3:55:18, 2.53s/it] +2025-02-06 01:18:10 - ERROR - stderr - +2025-02-06 01:18:10 - ERROR - stderr - +2025-02-06 01:18:10 - INFO - stdout - {'loss': 0.4362, 'grad_norm': 1.7693425416946411, 'learning_rate': 3.0795408937639313e-06, 'epoch': 2.25} +2025-02-06 01:18:10 - ERROR - stderr - 75%|███████▌ | 16848/22434 [15:10:29<3:55:18, 2.53s/it] +2025-02-06 01:18:12 - ERROR - stderr - 75%|███████▌ | 16849/22434 [15:10:32<3:54:51, 2.52s/it] +2025-02-06 01:18:12 - ERROR - stderr - +2025-02-06 01:18:12 - ERROR - stderr - +2025-02-06 01:18:12 - INFO - stdout - {'loss': 0.382, 'grad_norm': 1.5618743896484375, 'learning_rate': 3.078498792357032e-06, 'epoch': 2.25} +2025-02-06 01:18:12 - ERROR - stderr - 75%|███████▌ | 16849/22434 [15:10:32<3:54:51, 2.52s/it] +2025-02-06 01:18:15 - ERROR - stderr - 75%|███████▌ | 16850/22434 [15:10:34<3:56:14, 2.54s/it] +2025-02-06 01:18:15 - ERROR - stderr - +2025-02-06 01:18:15 - ERROR - stderr - +2025-02-06 01:18:15 - INFO - stdout - {'loss': 0.3931, 'grad_norm': 1.6441594362258911, 'learning_rate': 3.0774568352221966e-06, 'epoch': 2.25} +2025-02-06 01:18:15 - ERROR - stderr - 75%|███████▌ | 16850/22434 [15:10:35<3:56:14, 2.54s/it] +2025-02-06 01:18:17 - ERROR - stderr - 75%|███████▌ | 16851/22434 [15:10:37<3:55:52, 2.53s/it] +2025-02-06 01:18:17 - ERROR - stderr - +2025-02-06 01:18:17 - ERROR - stderr - +2025-02-06 01:18:17 - INFO - stdout - {'loss': 0.4092, 'grad_norm': 1.617983341217041, 'learning_rate': 
3.076415022381155e-06, 'epoch': 2.25} +2025-02-06 01:18:17 - ERROR - stderr - 75%|███████▌ | 16851/22434 [15:10:37<3:55:52, 2.53s/it] +2025-02-06 01:18:20 - ERROR - stderr - 75%|███████▌ | 16852/22434 [15:10:40<3:56:32, 2.54s/it] +2025-02-06 01:18:20 - ERROR - stderr - +2025-02-06 01:18:20 - ERROR - stderr - +2025-02-06 01:18:20 - INFO - stdout - {'loss': 0.4183, 'grad_norm': 1.704642653465271, 'learning_rate': 3.0753733538556117e-06, 'epoch': 2.25} +2025-02-06 01:18:20 - ERROR - stderr - 75%|███████▌ | 16852/22434 [15:10:40<3:56:32, 2.54s/it] +2025-02-06 01:18:22 - ERROR - stderr - 75%|███████▌ | 16853/22434 [15:10:42<3:54:08, 2.52s/it] +2025-02-06 01:18:22 - ERROR - stderr - +2025-02-06 01:18:22 - ERROR - stderr - +2025-02-06 01:18:22 - INFO - stdout - {'loss': 0.3547, 'grad_norm': 1.4673752784729004, 'learning_rate': 3.0743318296672876e-06, 'epoch': 2.25} +2025-02-06 01:18:22 - ERROR - stderr - 75%|███████▌ | 16853/22434 [15:10:42<3:54:08, 2.52s/it] +2025-02-06 01:18:25 - ERROR - stderr - 75%|███████▌ | 16854/22434 [15:10:44<3:52:10, 2.50s/it] +2025-02-06 01:18:25 - ERROR - stderr - +2025-02-06 01:18:25 - ERROR - stderr - +2025-02-06 01:18:25 - INFO - stdout - {'loss': 0.3578, 'grad_norm': 1.397048830986023, 'learning_rate': 3.0732904498378925e-06, 'epoch': 2.25} +2025-02-06 01:18:25 - ERROR - stderr - 75%|███████▌ | 16854/22434 [15:10:45<3:52:10, 2.50s/it] +2025-02-06 01:18:27 - ERROR - stderr - 75%|███████▌ | 16855/22434 [15:10:47<3:57:35, 2.56s/it] +2025-02-06 01:18:27 - ERROR - stderr - +2025-02-06 01:18:27 - ERROR - stderr - +2025-02-06 01:18:27 - INFO - stdout - {'loss': 0.331, 'grad_norm': 1.3881217241287231, 'learning_rate': 3.0722492143891223e-06, 'epoch': 2.25} +2025-02-06 01:18:27 - ERROR - stderr - 75%|███████▌ | 16855/22434 [15:10:47<3:57:35, 2.56s/it] +2025-02-06 01:18:30 - ERROR - stderr - 75%|███████▌ | 16856/22434 [15:10:50<3:54:18, 2.52s/it] +2025-02-06 01:18:30 - ERROR - stderr - +2025-02-06 01:18:30 - ERROR - stderr - +2025-02-06 01:18:30 - 
INFO - stdout - {'loss': 0.3995, 'grad_norm': 1.498417615890503, 'learning_rate': 3.071208123342696e-06, 'epoch': 2.25} +2025-02-06 01:18:30 - ERROR - stderr - 75%|███████▌ | 16856/22434 [15:10:50<3:54:18, 2.52s/it] +2025-02-06 01:18:32 - ERROR - stderr - 75%|███████▌ | 16857/22434 [15:10:52<3:52:55, 2.51s/it] +2025-02-06 01:18:32 - ERROR - stderr - +2025-02-06 01:18:32 - ERROR - stderr - +2025-02-06 01:18:32 - INFO - stdout - {'loss': 0.3771, 'grad_norm': 1.47802734375, 'learning_rate': 3.070167176720302e-06, 'epoch': 2.25} +2025-02-06 01:18:32 - ERROR - stderr - 75%|███████▌ | 16857/22434 [15:10:52<3:52:55, 2.51s/it] +2025-02-06 01:18:35 - ERROR - stderr - 75%|███████▌ | 16858/22434 [15:10:55<4:01:11, 2.60s/it] +2025-02-06 01:18:35 - ERROR - stderr - +2025-02-06 01:18:35 - ERROR - stderr - +2025-02-06 01:18:35 - INFO - stdout - {'loss': 0.3529, 'grad_norm': 1.3491392135620117, 'learning_rate': 3.069126374543643e-06, 'epoch': 2.25} +2025-02-06 01:18:35 - ERROR - stderr - 75%|███████▌ | 16858/22434 [15:10:55<4:01:11, 2.60s/it] +2025-02-06 01:18:38 - ERROR - stderr - 75%|███████▌ | 16859/22434 [15:10:57<3:57:32, 2.56s/it] +2025-02-06 01:18:38 - ERROR - stderr - +2025-02-06 01:18:38 - ERROR - stderr - +2025-02-06 01:18:38 - INFO - stdout - {'loss': 0.3975, 'grad_norm': 1.5695017576217651, 'learning_rate': 3.0680857168344123e-06, 'epoch': 2.25} +2025-02-06 01:18:38 - ERROR - stderr - 75%|███████▌ | 16859/22434 [15:10:57<3:57:32, 2.56s/it] +2025-02-06 01:18:40 - ERROR - stderr - 75%|███████▌ | 16860/22434 [15:11:00<4:01:26, 2.60s/it] +2025-02-06 01:18:40 - ERROR - stderr - +2025-02-06 01:18:40 - ERROR - stderr - +2025-02-06 01:18:40 - INFO - stdout - {'loss': 0.3522, 'grad_norm': 1.3904234170913696, 'learning_rate': 3.0670452036142986e-06, 'epoch': 2.25} +2025-02-06 01:18:40 - ERROR - stderr - 75%|███████▌ | 16860/22434 [15:11:00<4:01:26, 2.60s/it] +2025-02-06 01:18:43 - ERROR - stderr - 75%|███████▌ | 16861/22434 [15:11:02<3:56:24, 2.55s/it] +2025-02-06 01:18:43 - 
ERROR - stderr - +2025-02-06 01:18:43 - ERROR - stderr - +2025-02-06 01:18:43 - INFO - stdout - {'loss': 0.3703, 'grad_norm': 1.569503664970398, 'learning_rate': 3.066004834905e-06, 'epoch': 2.25} +2025-02-06 01:18:43 - ERROR - stderr - 75%|███████▌ | 16861/22434 [15:11:03<3:56:24, 2.55s/it] +2025-02-06 01:18:45 - ERROR - stderr - 75%|███████▌ | 16862/22434 [15:11:05<4:02:43, 2.61s/it] +2025-02-06 01:18:46 - ERROR - stderr - +2025-02-06 01:18:46 - ERROR - stderr - +2025-02-06 01:18:46 - INFO - stdout - {'loss': 0.3657, 'grad_norm': 1.488072156906128, 'learning_rate': 3.0649646107281917e-06, 'epoch': 2.25} +2025-02-06 01:18:46 - ERROR - stderr - 75%|███████▌ | 16862/22434 [15:11:05<4:02:43, 2.61s/it] +2025-02-06 01:18:48 - ERROR - stderr - 75%|███████▌ | 16863/22434 [15:11:08<4:02:59, 2.62s/it] +2025-02-06 01:18:48 - ERROR - stderr - +2025-02-06 01:18:48 - ERROR - stderr - +2025-02-06 01:18:48 - INFO - stdout - {'loss': 0.3761, 'grad_norm': 1.5880361795425415, 'learning_rate': 3.06392453110556e-06, 'epoch': 2.26} +2025-02-06 01:18:48 - ERROR - stderr - 75%|███████▌ | 16863/22434 [15:11:08<4:02:59, 2.62s/it] +2025-02-06 01:18:51 - ERROR - stderr - 75%|███████▌ | 16864/22434 [15:11:10<3:58:23, 2.57s/it] +2025-02-06 01:18:51 - ERROR - stderr - +2025-02-06 01:18:51 - ERROR - stderr - +2025-02-06 01:18:51 - INFO - stdout - {'loss': 0.3255, 'grad_norm': 1.4313229322433472, 'learning_rate': 3.062884596058784e-06, 'epoch': 2.26} +2025-02-06 01:18:51 - ERROR - stderr - 75%|███████▌ | 16864/22434 [15:11:10<3:58:23, 2.57s/it] +2025-02-06 01:18:53 - ERROR - stderr - 75%|███████▌ | 16865/22434 [15:11:13<3:57:40, 2.56s/it] +2025-02-06 01:18:53 - ERROR - stderr - +2025-02-06 01:18:53 - ERROR - stderr - +2025-02-06 01:18:53 - INFO - stdout - {'loss': 0.3902, 'grad_norm': 1.5292876958847046, 'learning_rate': 3.0618448056095417e-06, 'epoch': 2.26} +2025-02-06 01:18:53 - ERROR - stderr - 75%|███████▌ | 16865/22434 [15:11:13<3:57:40, 2.56s/it] +2025-02-06 01:18:56 - ERROR - stderr - 
75%|███████▌ | 16866/22434 [15:11:15<3:54:55, 2.53s/it] +2025-02-06 01:18:56 - ERROR - stderr - +2025-02-06 01:18:56 - ERROR - stderr - +2025-02-06 01:18:56 - INFO - stdout - {'loss': 0.3997, 'grad_norm': 1.6403310298919678, 'learning_rate': 3.0608051597795043e-06, 'epoch': 2.26} +2025-02-06 01:18:56 - ERROR - stderr - 75%|███████▌ | 16866/22434 [15:11:15<3:54:55, 2.53s/it] +2025-02-06 01:18:58 - ERROR - stderr - 75%|███████▌ | 16867/22434 [15:11:18<3:54:31, 2.53s/it] +2025-02-06 01:18:58 - ERROR - stderr - +2025-02-06 01:18:58 - ERROR - stderr - +2025-02-06 01:18:58 - INFO - stdout - {'loss': 0.3567, 'grad_norm': 1.4868853092193604, 'learning_rate': 3.0597656585903435e-06, 'epoch': 2.26} +2025-02-06 01:18:58 - ERROR - stderr - 75%|███████▌ | 16867/22434 [15:11:18<3:54:31, 2.53s/it] +2025-02-06 01:19:00 - ERROR - stderr - 75%|███████▌ | 16868/22434 [15:11:20<3:51:29, 2.50s/it] +2025-02-06 01:19:01 - ERROR - stderr - +2025-02-06 01:19:01 - ERROR - stderr - +2025-02-06 01:19:01 - INFO - stdout - {'loss': 0.3745, 'grad_norm': 1.5466123819351196, 'learning_rate': 3.058726302063727e-06, 'epoch': 2.26} +2025-02-06 01:19:01 - ERROR - stderr - 75%|███████▌ | 16868/22434 [15:11:20<3:51:29, 2.50s/it] +2025-02-06 01:19:03 - ERROR - stderr - 75%|███████▌ | 16869/22434 [15:11:23<3:52:32, 2.51s/it] +2025-02-06 01:19:03 - ERROR - stderr - +2025-02-06 01:19:03 - ERROR - stderr - +2025-02-06 01:19:03 - INFO - stdout - {'loss': 0.4401, 'grad_norm': 1.6676883697509766, 'learning_rate': 3.0576870902213186e-06, 'epoch': 2.26} +2025-02-06 01:19:03 - ERROR - stderr - 75%|███████▌ | 16869/22434 [15:11:23<3:52:32, 2.51s/it] +2025-02-06 01:19:06 - ERROR - stderr - 75%|███████▌ | 16870/22434 [15:11:25<3:52:37, 2.51s/it] +2025-02-06 01:19:06 - ERROR - stderr - +2025-02-06 01:19:06 - ERROR - stderr - +2025-02-06 01:19:06 - INFO - stdout - {'loss': 0.4133, 'grad_norm': 1.7351393699645996, 'learning_rate': 3.056648023084783e-06, 'epoch': 2.26} +2025-02-06 01:19:06 - ERROR - stderr - 75%|███████▌ 
| 16870/22434 [15:11:25<3:52:37, 2.51s/it] +2025-02-06 01:19:08 - ERROR - stderr - 75%|███████▌ | 16871/22434 [15:11:28<3:52:54, 2.51s/it] +2025-02-06 01:19:08 - ERROR - stderr - +2025-02-06 01:19:08 - ERROR - stderr - +2025-02-06 01:19:08 - INFO - stdout - {'loss': 0.3135, 'grad_norm': 1.3027445077896118, 'learning_rate': 3.0556091006757684e-06, 'epoch': 2.26} +2025-02-06 01:19:08 - ERROR - stderr - 75%|███████▌ | 16871/22434 [15:11:28<3:52:54, 2.51s/it] +2025-02-06 01:19:11 - ERROR - stderr - 75%|███████▌ | 16872/22434 [15:11:30<3:53:10, 2.52s/it] +2025-02-06 01:19:11 - ERROR - stderr - +2025-02-06 01:19:11 - ERROR - stderr - +2025-02-06 01:19:11 - INFO - stdout - {'loss': 0.3948, 'grad_norm': 1.5714423656463623, 'learning_rate': 3.0545703230159394e-06, 'epoch': 2.26} +2025-02-06 01:19:11 - ERROR - stderr - 75%|███████▌ | 16872/22434 [15:11:30<3:53:10, 2.52s/it] +2025-02-06 01:19:13 - ERROR - stderr - 75%|███████▌ | 16873/22434 [15:11:33<4:00:59, 2.60s/it] +2025-02-06 01:19:13 - ERROR - stderr - +2025-02-06 01:19:13 - ERROR - stderr - +2025-02-06 01:19:13 - INFO - stdout - {'loss': 0.3792, 'grad_norm': 1.6864912509918213, 'learning_rate': 3.053531690126951e-06, 'epoch': 2.26} +2025-02-06 01:19:13 - ERROR - stderr - 75%|███████▌ | 16873/22434 [15:11:33<4:00:59, 2.60s/it] +2025-02-06 01:19:16 - ERROR - stderr - 75%|███████▌ | 16874/22434 [15:11:36<4:00:04, 2.59s/it] +2025-02-06 01:19:16 - ERROR - stderr - +2025-02-06 01:19:16 - ERROR - stderr - +2025-02-06 01:19:16 - INFO - stdout - {'loss': 0.3768, 'grad_norm': 1.6888189315795898, 'learning_rate': 3.05249320203044e-06, 'epoch': 2.26} +2025-02-06 01:19:16 - ERROR - stderr - 75%|███████▌ | 16874/22434 [15:11:36<4:00:04, 2.59s/it] +2025-02-06 01:19:18 - ERROR - stderr - 75%|███████▌ | 16875/22434 [15:11:38<3:58:00, 2.57s/it] +2025-02-06 01:19:19 - ERROR - stderr - +2025-02-06 01:19:19 - ERROR - stderr - +2025-02-06 01:19:19 - INFO - stdout - {'loss': 0.408, 'grad_norm': 1.7319388389587402, 'learning_rate': 
3.0514548587480663e-06, 'epoch': 2.26} +2025-02-06 01:19:19 - ERROR - stderr - 75%|███████▌ | 16875/22434 [15:11:38<3:58:00, 2.57s/it] +2025-02-06 01:19:21 - ERROR - stderr - 75%|███████▌ | 16876/22434 [15:11:41<3:55:06, 2.54s/it] +2025-02-06 01:19:21 - ERROR - stderr - +2025-02-06 01:19:21 - ERROR - stderr - +2025-02-06 01:19:21 - INFO - stdout - {'loss': 0.3692, 'grad_norm': 1.47159743309021, 'learning_rate': 3.050416660301462e-06, 'epoch': 2.26} +2025-02-06 01:19:21 - ERROR - stderr - 75%|███████▌ | 16876/22434 [15:11:41<3:55:06, 2.54s/it] +2025-02-06 01:19:23 - ERROR - stderr - 75%|███████▌ | 16877/22434 [15:11:43<3:53:23, 2.52s/it] +2025-02-06 01:19:23 - ERROR - stderr - +2025-02-06 01:19:23 - ERROR - stderr - +2025-02-06 01:19:23 - INFO - stdout - {'loss': 0.3749, 'grad_norm': 1.4730385541915894, 'learning_rate': 3.0493786067122764e-06, 'epoch': 2.26} +2025-02-06 01:19:23 - ERROR - stderr - 75%|███████▌ | 16877/22434 [15:11:43<3:53:23, 2.52s/it] +2025-02-06 01:19:26 - ERROR - stderr - 75%|███████▌ | 16878/22434 [15:11:46<3:53:53, 2.53s/it] +2025-02-06 01:19:26 - ERROR - stderr - +2025-02-06 01:19:26 - ERROR - stderr - +2025-02-06 01:19:26 - INFO - stdout - {'loss': 0.3657, 'grad_norm': 1.6782777309417725, 'learning_rate': 3.0483406980021414e-06, 'epoch': 2.26} +2025-02-06 01:19:26 - ERROR - stderr - 75%|███████▌ | 16878/22434 [15:11:46<3:53:53, 2.53s/it] +2025-02-06 01:19:28 - ERROR - stderr - 75%|███████▌ | 16879/22434 [15:11:48<3:53:30, 2.52s/it] +2025-02-06 01:19:29 - ERROR - stderr - +2025-02-06 01:19:29 - ERROR - stderr - +2025-02-06 01:19:29 - INFO - stdout - {'loss': 0.3915, 'grad_norm': 1.7455781698226929, 'learning_rate': 3.0473029341926897e-06, 'epoch': 2.26} +2025-02-06 01:19:29 - ERROR - stderr - 75%|███████▌ | 16879/22434 [15:11:48<3:53:30, 2.52s/it] +2025-02-06 01:19:31 - ERROR - stderr - 75%|███████▌ | 16880/22434 [15:11:51<3:51:58, 2.51s/it] +2025-02-06 01:19:31 - ERROR - stderr - +2025-02-06 01:19:31 - ERROR - stderr - +2025-02-06 01:19:31 - 
INFO - stdout - {'loss': 0.3631, 'grad_norm': 1.4600164890289307, 'learning_rate': 3.0462653153055612e-06, 'epoch': 2.26} +2025-02-06 01:19:31 - ERROR - stderr - 75%|███████▌ | 16880/22434 [15:11:51<3:51:58, 2.51s/it] +2025-02-06 01:19:33 - ERROR - stderr - 75%|███████▌ | 16881/22434 [15:11:53<3:53:29, 2.52s/it] +2025-02-06 01:19:34 - ERROR - stderr - +2025-02-06 01:19:34 - ERROR - stderr - +2025-02-06 01:19:34 - INFO - stdout - {'loss': 0.4067, 'grad_norm': 1.6327508687973022, 'learning_rate': 3.0452278413623736e-06, 'epoch': 2.26} +2025-02-06 01:19:34 - ERROR - stderr - 75%|███████▌ | 16881/22434 [15:11:53<3:53:29, 2.52s/it] +2025-02-06 01:19:36 - ERROR - stderr - 75%|███████▌ | 16882/22434 [15:11:56<3:52:42, 2.51s/it] +2025-02-06 01:19:36 - ERROR - stderr - +2025-02-06 01:19:36 - ERROR - stderr - +2025-02-06 01:19:36 - INFO - stdout - {'loss': 0.4633, 'grad_norm': 1.8765074014663696, 'learning_rate': 3.0441905123847583e-06, 'epoch': 2.26} +2025-02-06 01:19:36 - ERROR - stderr - 75%|███████▌ | 16882/22434 [15:11:56<3:52:42, 2.51s/it] +2025-02-06 01:19:38 - ERROR - stderr - 75%|███████▌ | 16883/22434 [15:11:58<3:51:10, 2.50s/it] +2025-02-06 01:19:38 - ERROR - stderr - +2025-02-06 01:19:38 - ERROR - stderr - +2025-02-06 01:19:38 - INFO - stdout - {'loss': 0.356, 'grad_norm': 1.325674057006836, 'learning_rate': 3.043153328394335e-06, 'epoch': 2.26} +2025-02-06 01:19:38 - ERROR - stderr - 75%|███████▌ | 16883/22434 [15:11:58<3:51:10, 2.50s/it] +2025-02-06 01:19:41 - ERROR - stderr - 75%|███████▌ | 16884/22434 [15:12:01<3:49:46, 2.48s/it] +2025-02-06 01:19:41 - ERROR - stderr - +2025-02-06 01:19:41 - ERROR - stderr - +2025-02-06 01:19:41 - INFO - stdout - {'loss': 0.4178, 'grad_norm': 1.666165828704834, 'learning_rate': 3.042116289412724e-06, 'epoch': 2.26} +2025-02-06 01:19:41 - ERROR - stderr - 75%|███████▌ | 16884/22434 [15:12:01<3:49:46, 2.48s/it] +2025-02-06 01:19:43 - ERROR - stderr - 75%|███████▌ | 16885/22434 [15:12:03<3:52:16, 2.51s/it] +2025-02-06 01:19:44 - 
ERROR - stderr - +2025-02-06 01:19:44 - ERROR - stderr - +2025-02-06 01:19:44 - INFO - stdout - {'loss': 0.356, 'grad_norm': 1.5565783977508545, 'learning_rate': 3.0410793954615414e-06, 'epoch': 2.26} +2025-02-06 01:19:44 - ERROR - stderr - 75%|███████▌ | 16885/22434 [15:12:03<3:52:16, 2.51s/it] +2025-02-06 01:19:46 - ERROR - stderr - 75%|███████▌ | 16886/22434 [15:12:06<3:52:05, 2.51s/it] +2025-02-06 01:19:46 - ERROR - stderr - +2025-02-06 01:19:46 - ERROR - stderr - +2025-02-06 01:19:46 - INFO - stdout - {'loss': 0.3531, 'grad_norm': 1.5822184085845947, 'learning_rate': 3.040042646562399e-06, 'epoch': 2.26} +2025-02-06 01:19:46 - ERROR - stderr - 75%|███████▌ | 16886/22434 [15:12:06<3:52:05, 2.51s/it] +2025-02-06 01:19:48 - ERROR - stderr - 75%|███████▌ | 16887/22434 [15:12:08<3:51:02, 2.50s/it] +2025-02-06 01:19:48 - ERROR - stderr - +2025-02-06 01:19:48 - ERROR - stderr - +2025-02-06 01:19:48 - INFO - stdout - {'loss': 0.3938, 'grad_norm': 1.6373484134674072, 'learning_rate': 3.0390060427369074e-06, 'epoch': 2.26} +2025-02-06 01:19:48 - ERROR - stderr - 75%|███████▌ | 16887/22434 [15:12:08<3:51:02, 2.50s/it] +2025-02-06 01:19:51 - ERROR - stderr - 75%|███████▌ | 16888/22434 [15:12:11<3:51:07, 2.50s/it] +2025-02-06 01:19:51 - ERROR - stderr - +2025-02-06 01:19:51 - ERROR - stderr - +2025-02-06 01:19:51 - INFO - stdout - {'loss': 0.3627, 'grad_norm': 1.5109983682632446, 'learning_rate': 3.037969584006675e-06, 'epoch': 2.26} +2025-02-06 01:19:51 - ERROR - stderr - 75%|███████▌ | 16888/22434 [15:12:11<3:51:07, 2.50s/it] +2025-02-06 01:19:53 - ERROR - stderr - 75%|███████▌ | 16889/22434 [15:12:13<3:49:34, 2.48s/it] +2025-02-06 01:19:53 - ERROR - stderr - +2025-02-06 01:19:53 - ERROR - stderr - +2025-02-06 01:19:53 - INFO - stdout - {'loss': 0.3952, 'grad_norm': 1.4306840896606445, 'learning_rate': 3.0369332703933073e-06, 'epoch': 2.26} +2025-02-06 01:19:53 - ERROR - stderr - 75%|███████▌ | 16889/22434 [15:12:13<3:49:34, 2.48s/it] +2025-02-06 01:19:56 - ERROR - 
stderr - 75%|███████▌ | 16890/22434 [15:12:16<3:50:27, 2.49s/it] +2025-02-06 01:19:56 - ERROR - stderr - +2025-02-06 01:19:56 - ERROR - stderr - +2025-02-06 01:19:56 - INFO - stdout - {'loss': 0.342, 'grad_norm': 1.3334693908691406, 'learning_rate': 3.035897101918396e-06, 'epoch': 2.26} +2025-02-06 01:19:56 - ERROR - stderr - 75%|███████▌ | 16890/22434 [15:12:16<3:50:27, 2.49s/it] +2025-02-06 01:19:58 - ERROR - stderr - 75%|███████▌ | 16891/22434 [15:12:18<3:50:22, 2.49s/it] +2025-02-06 01:19:58 - ERROR - stderr - +2025-02-06 01:19:58 - ERROR - stderr - +2025-02-06 01:19:58 - INFO - stdout - {'loss': 0.366, 'grad_norm': 1.5721312761306763, 'learning_rate': 3.034861078603549e-06, 'epoch': 2.26} +2025-02-06 01:19:58 - ERROR - stderr - 75%|███████▌ | 16891/22434 [15:12:18<3:50:22, 2.49s/it] +2025-02-06 01:20:01 - ERROR - stderr - 75%|███████▌ | 16892/22434 [15:12:21<3:49:40, 2.49s/it] +2025-02-06 01:20:01 - ERROR - stderr - +2025-02-06 01:20:01 - ERROR - stderr - +2025-02-06 01:20:01 - INFO - stdout - {'loss': 0.3022, 'grad_norm': 1.294703722000122, 'learning_rate': 3.0338252004703583e-06, 'epoch': 2.26} +2025-02-06 01:20:01 - ERROR - stderr - 75%|███████▌ | 16892/22434 [15:12:21<3:49:40, 2.49s/it] +2025-02-06 01:20:03 - ERROR - stderr - 75%|███████▌ | 16893/22434 [15:12:23<3:51:25, 2.51s/it] +2025-02-06 01:20:03 - ERROR - stderr - +2025-02-06 01:20:03 - ERROR - stderr - +2025-02-06 01:20:03 - INFO - stdout - {'loss': 0.39, 'grad_norm': 1.5428558588027954, 'learning_rate': 3.0327894675404155e-06, 'epoch': 2.26} +2025-02-06 01:20:03 - ERROR - stderr - 75%|███████▌ | 16893/22434 [15:12:23<3:51:25, 2.51s/it] +2025-02-06 01:20:06 - ERROR - stderr - 75%|███████▌ | 16894/22434 [15:12:26<3:49:07, 2.48s/it] +2025-02-06 01:20:06 - ERROR - stderr - +2025-02-06 01:20:06 - ERROR - stderr - +2025-02-06 01:20:06 - INFO - stdout - {'loss': 0.3884, 'grad_norm': 1.7050325870513916, 'learning_rate': 3.0317538798353117e-06, 'epoch': 2.26} +2025-02-06 01:20:06 - ERROR - stderr - 
75%|███████▌ | 16894/22434 [15:12:26<3:49:07, 2.48s/it] +2025-02-06 01:20:08 - ERROR - stderr - 75%|███████▌ | 16895/22434 [15:12:28<3:49:45, 2.49s/it] +2025-02-06 01:20:08 - ERROR - stderr - +2025-02-06 01:20:08 - ERROR - stderr - +2025-02-06 01:20:08 - INFO - stdout - {'loss': 0.3884, 'grad_norm': 1.4307669401168823, 'learning_rate': 3.030718437376625e-06, 'epoch': 2.26} +2025-02-06 01:20:08 - ERROR - stderr - 75%|███████▌ | 16895/22434 [15:12:28<3:49:45, 2.49s/it] +2025-02-06 01:20:11 - ERROR - stderr - 75%|███████▌ | 16896/22434 [15:12:31<3:49:15, 2.48s/it] +2025-02-06 01:20:11 - ERROR - stderr - +2025-02-06 01:20:11 - ERROR - stderr - +2025-02-06 01:20:11 - INFO - stdout - {'loss': 0.3512, 'grad_norm': 1.4387240409851074, 'learning_rate': 3.0296831401859494e-06, 'epoch': 2.26} +2025-02-06 01:20:11 - ERROR - stderr - 75%|███████▌ | 16896/22434 [15:12:31<3:49:15, 2.48s/it] +2025-02-06 01:20:13 - ERROR - stderr - 75%|███████▌ | 16897/22434 [15:12:33<3:49:55, 2.49s/it] +2025-02-06 01:20:13 - ERROR - stderr - +2025-02-06 01:20:13 - ERROR - stderr - +2025-02-06 01:20:13 - INFO - stdout - {'loss': 0.3344, 'grad_norm': 1.3711886405944824, 'learning_rate': 3.028647988284855e-06, 'epoch': 2.26} +2025-02-06 01:20:13 - ERROR - stderr - 75%|███████▌ | 16897/22434 [15:12:33<3:49:55, 2.49s/it] +2025-02-06 01:20:16 - ERROR - stderr - 75%|███████▌ | 16898/22434 [15:12:36<3:48:42, 2.48s/it] +2025-02-06 01:20:16 - ERROR - stderr - +2025-02-06 01:20:16 - ERROR - stderr - +2025-02-06 01:20:16 - INFO - stdout - {'loss': 0.363, 'grad_norm': 1.5028516054153442, 'learning_rate': 3.0276129816949207e-06, 'epoch': 2.26} +2025-02-06 01:20:16 - ERROR - stderr - 75%|███████▌ | 16898/22434 [15:12:36<3:48:42, 2.48s/it] +2025-02-06 01:20:18 - ERROR - stderr - 75%|███████▌ | 16899/22434 [15:12:38<3:48:16, 2.47s/it] +2025-02-06 01:20:18 - ERROR - stderr - +2025-02-06 01:20:18 - ERROR - stderr - +2025-02-06 01:20:18 - INFO - stdout - {'loss': 0.379, 'grad_norm': 1.4393095970153809, 
'learning_rate': 3.0265781204377278e-06, 'epoch': 2.26} +2025-02-06 01:20:18 - ERROR - stderr - 75%|███████▌ | 16899/22434 [15:12:38<3:48:16, 2.47s/it] +2025-02-06 01:20:21 - ERROR - stderr - 75%|███████▌ | 16900/22434 [15:12:41<3:50:32, 2.50s/it] +2025-02-06 01:20:21 - ERROR - stderr - +2025-02-06 01:20:21 - ERROR - stderr - +2025-02-06 01:20:21 - INFO - stdout - {'loss': 0.3781, 'grad_norm': 1.4832910299301147, 'learning_rate': 3.0255434045348344e-06, 'epoch': 2.26} +2025-02-06 01:20:21 - ERROR - stderr - 75%|███████▌ | 16900/22434 [15:12:41<3:50:32, 2.50s/it] +2025-02-06 01:20:23 - ERROR - stderr - 75%|███████▌ | 16901/22434 [15:12:43<3:51:44, 2.51s/it] +2025-02-06 01:20:23 - ERROR - stderr - +2025-02-06 01:20:23 - ERROR - stderr - +2025-02-06 01:20:23 - INFO - stdout - {'loss': 0.4021, 'grad_norm': 1.5236473083496094, 'learning_rate': 3.024508834007821e-06, 'epoch': 2.26} +2025-02-06 01:20:23 - ERROR - stderr - 75%|███████▌ | 16901/22434 [15:12:43<3:51:44, 2.51s/it] +2025-02-06 01:20:26 - ERROR - stderr - 75%|███████▌ | 16902/22434 [15:12:46<3:50:30, 2.50s/it] +2025-02-06 01:20:26 - ERROR - stderr - +2025-02-06 01:20:26 - ERROR - stderr - +2025-02-06 01:20:26 - INFO - stdout - {'loss': 0.3763, 'grad_norm': 1.5917166471481323, 'learning_rate': 3.0234744088782443e-06, 'epoch': 2.26} +2025-02-06 01:20:26 - ERROR - stderr - 75%|███████▌ | 16902/22434 [15:12:46<3:50:30, 2.50s/it] +2025-02-06 01:20:28 - ERROR - stderr - 75%|███████▌ | 16903/22434 [15:12:48<3:52:44, 2.52s/it] +2025-02-06 01:20:28 - ERROR - stderr - +2025-02-06 01:20:28 - ERROR - stderr - +2025-02-06 01:20:28 - INFO - stdout - {'loss': 0.3655, 'grad_norm': 1.438387393951416, 'learning_rate': 3.022440129167666e-06, 'epoch': 2.26} +2025-02-06 01:20:28 - ERROR - stderr - 75%|███████▌ | 16903/22434 [15:12:48<3:52:44, 2.52s/it] +2025-02-06 01:20:31 - ERROR - stderr - 75%|███████▌ | 16904/22434 [15:12:51<3:51:25, 2.51s/it] +2025-02-06 01:20:31 - ERROR - stderr - +2025-02-06 01:20:31 - ERROR - stderr - 
+2025-02-06 01:20:31 - INFO - stdout - {'loss': 0.3502, 'grad_norm': 1.4679757356643677, 'learning_rate': 3.021405994897647e-06, 'epoch': 2.26} +2025-02-06 01:20:31 - ERROR - stderr - 75%|███████▌ | 16904/22434 [15:12:51<3:51:25, 2.51s/it] +2025-02-06 01:20:33 - ERROR - stderr - 75%|███████▌ | 16905/22434 [15:12:53<3:49:07, 2.49s/it] +2025-02-06 01:20:33 - ERROR - stderr - +2025-02-06 01:20:33 - ERROR - stderr - +2025-02-06 01:20:33 - INFO - stdout - {'loss': 0.3595, 'grad_norm': 1.3614897727966309, 'learning_rate': 3.0203720060897434e-06, 'epoch': 2.26} +2025-02-06 01:20:33 - ERROR - stderr - 75%|███████▌ | 16905/22434 [15:12:53<3:49:07, 2.49s/it] +2025-02-06 01:20:36 - ERROR - stderr - 75%|███████▌ | 16906/22434 [15:12:56<3:49:50, 2.49s/it] +2025-02-06 01:20:36 - ERROR - stderr - +2025-02-06 01:20:36 - ERROR - stderr - +2025-02-06 01:20:36 - INFO - stdout - {'loss': 0.3754, 'grad_norm': 1.4986467361450195, 'learning_rate': 3.019338162765505e-06, 'epoch': 2.26} +2025-02-06 01:20:36 - ERROR - stderr - 75%|███████▌ | 16906/22434 [15:12:56<3:49:50, 2.49s/it] +2025-02-06 01:20:38 - ERROR - stderr - 75%|███████▌ | 16907/22434 [15:12:58<3:50:45, 2.50s/it] +2025-02-06 01:20:38 - ERROR - stderr - +2025-02-06 01:20:38 - ERROR - stderr - +2025-02-06 01:20:38 - INFO - stdout - {'loss': 0.4413, 'grad_norm': 1.7912352085113525, 'learning_rate': 3.018304464946483e-06, 'epoch': 2.26} +2025-02-06 01:20:38 - ERROR - stderr - 75%|███████▌ | 16907/22434 [15:12:58<3:50:45, 2.50s/it] +2025-02-06 01:20:41 - ERROR - stderr - 75%|███████▌ | 16908/22434 [15:13:01<3:55:07, 2.55s/it] +2025-02-06 01:20:41 - ERROR - stderr - +2025-02-06 01:20:41 - ERROR - stderr - +2025-02-06 01:20:41 - INFO - stdout - {'loss': 0.4243, 'grad_norm': 1.600490927696228, 'learning_rate': 3.0172709126542244e-06, 'epoch': 2.26} +2025-02-06 01:20:41 - ERROR - stderr - 75%|███████▌ | 16908/22434 [15:13:01<3:55:07, 2.55s/it] +2025-02-06 01:20:44 - ERROR - stderr - 75%|███████▌ | 16909/22434 [15:13:03<3:53:05, 
2.53s/it] +2025-02-06 01:20:44 - ERROR - stderr - +2025-02-06 01:20:44 - ERROR - stderr - +2025-02-06 01:20:44 - INFO - stdout - {'loss': 0.4, 'grad_norm': 1.5026960372924805, 'learning_rate': 3.016237505910272e-06, 'epoch': 2.26} +2025-02-06 01:20:44 - ERROR - stderr - 75%|███████▌ | 16909/22434 [15:13:03<3:53:05, 2.53s/it] +2025-02-06 01:20:46 - ERROR - stderr - 75%|███████▌ | 16910/22434 [15:13:06<3:54:17, 2.54s/it] +2025-02-06 01:20:46 - ERROR - stderr - +2025-02-06 01:20:46 - ERROR - stderr - +2025-02-06 01:20:46 - INFO - stdout - {'loss': 0.3742, 'grad_norm': 1.5268440246582031, 'learning_rate': 3.015204244736166e-06, 'epoch': 2.26} +2025-02-06 01:20:46 - ERROR - stderr - 75%|███████▌ | 16910/22434 [15:13:06<3:54:17, 2.54s/it] +2025-02-06 01:20:49 - ERROR - stderr - 75%|███████▌ | 16911/22434 [15:13:08<3:53:49, 2.54s/it] +2025-02-06 01:20:49 - ERROR - stderr - +2025-02-06 01:20:49 - ERROR - stderr - +2025-02-06 01:20:49 - INFO - stdout - {'loss': 0.3794, 'grad_norm': 1.6370006799697876, 'learning_rate': 3.0141711291534435e-06, 'epoch': 2.26} +2025-02-06 01:20:49 - ERROR - stderr - 75%|███████▌ | 16911/22434 [15:13:08<3:53:49, 2.54s/it] +2025-02-06 01:20:51 - ERROR - stderr - 75%|███████▌ | 16912/22434 [15:13:11<3:58:35, 2.59s/it] +2025-02-06 01:20:51 - ERROR - stderr - +2025-02-06 01:20:51 - ERROR - stderr - +2025-02-06 01:20:51 - INFO - stdout - {'loss': 0.3957, 'grad_norm': 1.5226500034332275, 'learning_rate': 3.0131381591836385e-06, 'epoch': 2.26} +2025-02-06 01:20:51 - ERROR - stderr - 75%|███████▌ | 16912/22434 [15:13:11<3:58:35, 2.59s/it] +2025-02-06 01:20:54 - ERROR - stderr - 75%|███████▌ | 16913/22434 [15:13:14<3:57:59, 2.59s/it] +2025-02-06 01:20:54 - ERROR - stderr - +2025-02-06 01:20:54 - ERROR - stderr - +2025-02-06 01:20:54 - INFO - stdout - {'loss': 0.3461, 'grad_norm': 1.3329479694366455, 'learning_rate': 3.0121053348482844e-06, 'epoch': 2.26} +2025-02-06 01:20:54 - ERROR - stderr - 75%|███████▌ | 16913/22434 [15:13:14<3:57:59, 2.59s/it] 
+2025-02-06 01:20:56 - ERROR - stderr - 75%|███████▌ | 16914/22434 [15:13:16<3:54:09, 2.55s/it] +2025-02-06 01:20:56 - ERROR - stderr - +2025-02-06 01:20:56 - ERROR - stderr - +2025-02-06 01:20:56 - INFO - stdout - {'loss': 0.3324, 'grad_norm': 1.4048888683319092, 'learning_rate': 3.011072656168906e-06, 'epoch': 2.26} +2025-02-06 01:20:56 - ERROR - stderr - 75%|███████▌ | 16914/22434 [15:13:16<3:54:09, 2.55s/it] +2025-02-06 01:20:59 - ERROR - stderr - 75%|███████▌ | 16915/22434 [15:13:19<3:51:22, 2.52s/it] +2025-02-06 01:20:59 - ERROR - stderr - +2025-02-06 01:20:59 - ERROR - stderr - +2025-02-06 01:20:59 - INFO - stdout - {'loss': 0.3552, 'grad_norm': 1.4553167819976807, 'learning_rate': 3.0100401231670353e-06, 'epoch': 2.26} +2025-02-06 01:20:59 - ERROR - stderr - 75%|███████▌ | 16915/22434 [15:13:19<3:51:22, 2.52s/it] +2025-02-06 01:21:01 - ERROR - stderr - 75%|███████▌ | 16916/22434 [15:13:21<3:50:26, 2.51s/it] +2025-02-06 01:21:01 - ERROR - stderr - +2025-02-06 01:21:01 - ERROR - stderr - +2025-02-06 01:21:01 - INFO - stdout - {'loss': 0.3481, 'grad_norm': 1.495451807975769, 'learning_rate': 3.009007735864182e-06, 'epoch': 2.26} +2025-02-06 01:21:01 - ERROR - stderr - 75%|███████▌ | 16916/22434 [15:13:21<3:50:26, 2.51s/it] +2025-02-06 01:21:04 - ERROR - stderr - 75%|███████▌ | 16917/22434 [15:13:24<3:51:01, 2.51s/it] +2025-02-06 01:21:04 - ERROR - stderr - +2025-02-06 01:21:04 - ERROR - stderr - +2025-02-06 01:21:04 - INFO - stdout - {'loss': 0.3523, 'grad_norm': 1.4222584962844849, 'learning_rate': 3.007975494281876e-06, 'epoch': 2.26} +2025-02-06 01:21:04 - ERROR - stderr - 75%|███████▌ | 16917/22434 [15:13:24<3:51:01, 2.51s/it] +2025-02-06 01:21:06 - ERROR - stderr - 75%|███████▌ | 16918/22434 [15:13:26<3:50:47, 2.51s/it] +2025-02-06 01:21:06 - ERROR - stderr - +2025-02-06 01:21:06 - ERROR - stderr - +2025-02-06 01:21:06 - INFO - stdout - {'loss': 0.384, 'grad_norm': 1.4308780431747437, 'learning_rate': 3.006943398441634e-06, 'epoch': 2.26} +2025-02-06 
01:21:06 - ERROR - stderr - 75%|███████▌ | 16918/22434 [15:13:26<3:50:47, 2.51s/it] +2025-02-06 01:21:09 - ERROR - stderr - 75%|███████▌ | 16919/22434 [15:13:29<3:57:42, 2.59s/it] +2025-02-06 01:21:09 - ERROR - stderr - +2025-02-06 01:21:09 - ERROR - stderr - +2025-02-06 01:21:09 - INFO - stdout - {'loss': 0.3769, 'grad_norm': 1.3845402002334595, 'learning_rate': 3.005911448364959e-06, 'epoch': 2.26} +2025-02-06 01:21:09 - ERROR - stderr - 75%|███████▌ | 16919/22434 [15:13:29<3:57:42, 2.59s/it] +2025-02-06 01:21:12 - ERROR - stderr - 75%|███████▌ | 16920/22434 [15:13:31<3:54:24, 2.55s/it] +2025-02-06 01:21:12 - ERROR - stderr - +2025-02-06 01:21:12 - ERROR - stderr - +2025-02-06 01:21:12 - INFO - stdout - {'loss': 0.411, 'grad_norm': 1.6162265539169312, 'learning_rate': 3.004879644073373e-06, 'epoch': 2.26} +2025-02-06 01:21:12 - ERROR - stderr - 75%|███████▌ | 16920/22434 [15:13:31<3:54:24, 2.55s/it] +2025-02-06 01:21:14 - ERROR - stderr - 75%|███████▌ | 16921/22434 [15:13:34<3:53:15, 2.54s/it] +2025-02-06 01:21:14 - ERROR - stderr - +2025-02-06 01:21:14 - ERROR - stderr - +2025-02-06 01:21:14 - INFO - stdout - {'loss': 0.3771, 'grad_norm': 1.5653401613235474, 'learning_rate': 3.0038479855883705e-06, 'epoch': 2.26} +2025-02-06 01:21:14 - ERROR - stderr - 75%|███████▌ | 16921/22434 [15:13:34<3:53:15, 2.54s/it] +2025-02-06 01:21:17 - ERROR - stderr - 75%|███████▌ | 16922/22434 [15:13:36<3:51:41, 2.52s/it] +2025-02-06 01:21:17 - ERROR - stderr - +2025-02-06 01:21:17 - ERROR - stderr - +2025-02-06 01:21:17 - INFO - stdout - {'loss': 0.3873, 'grad_norm': 1.6274349689483643, 'learning_rate': 3.00281647293147e-06, 'epoch': 2.26} +2025-02-06 01:21:17 - ERROR - stderr - 75%|███████▌ | 16922/22434 [15:13:36<3:51:41, 2.52s/it] +2025-02-06 01:21:19 - ERROR - stderr - 75%|███████▌ | 16923/22434 [15:13:39<3:51:08, 2.52s/it] +2025-02-06 01:21:19 - ERROR - stderr - +2025-02-06 01:21:19 - ERROR - stderr - +2025-02-06 01:21:19 - INFO - stdout - {'loss': 0.4002, 'grad_norm': 
1.4817800521850586, 'learning_rate': 3.00178510612416e-06, 'epoch': 2.26} +2025-02-06 01:21:19 - ERROR - stderr - 75%|███████▌ | 16923/22434 [15:13:39<3:51:08, 2.52s/it] +2025-02-06 01:21:22 - ERROR - stderr - 75%|███████▌ | 16924/22434 [15:13:41<3:52:14, 2.53s/it] +2025-02-06 01:21:22 - ERROR - stderr - +2025-02-06 01:21:22 - ERROR - stderr - +2025-02-06 01:21:22 - INFO - stdout - {'loss': 0.3798, 'grad_norm': 1.4545910358428955, 'learning_rate': 3.0007538851879435e-06, 'epoch': 2.26} +2025-02-06 01:21:22 - ERROR - stderr - 75%|███████▌ | 16924/22434 [15:13:41<3:52:14, 2.53s/it] +2025-02-06 01:21:24 - ERROR - stderr - 75%|███████▌ | 16925/22434 [15:13:44<3:49:36, 2.50s/it] +2025-02-06 01:21:24 - ERROR - stderr - +2025-02-06 01:21:24 - ERROR - stderr - +2025-02-06 01:21:24 - INFO - stdout - {'loss': 0.3851, 'grad_norm': 1.478630542755127, 'learning_rate': 2.9997228101443143e-06, 'epoch': 2.26} +2025-02-06 01:21:24 - ERROR - stderr - 75%|███████▌ | 16925/22434 [15:13:44<3:49:36, 2.50s/it] +2025-02-06 01:21:27 - ERROR - stderr - 75%|███████▌ | 16926/22434 [15:13:46<3:49:42, 2.50s/it] +2025-02-06 01:21:27 - ERROR - stderr - +2025-02-06 01:21:27 - ERROR - stderr - +2025-02-06 01:21:27 - INFO - stdout - {'loss': 0.3808, 'grad_norm': 1.4287770986557007, 'learning_rate': 2.998691881014765e-06, 'epoch': 2.26} +2025-02-06 01:21:27 - ERROR - stderr - 75%|███████▌ | 16926/22434 [15:13:46<3:49:42, 2.50s/it] +2025-02-06 01:21:29 - ERROR - stderr - 75%|███████▌ | 16927/22434 [15:13:49<3:54:35, 2.56s/it] +2025-02-06 01:21:29 - ERROR - stderr - +2025-02-06 01:21:29 - ERROR - stderr - +2025-02-06 01:21:29 - INFO - stdout - {'loss': 0.3803, 'grad_norm': 1.604519248008728, 'learning_rate': 2.997661097820784e-06, 'epoch': 2.26} +2025-02-06 01:21:29 - ERROR - stderr - 75%|███████▌ | 16927/22434 [15:13:49<3:54:35, 2.56s/it] +2025-02-06 01:21:32 - ERROR - stderr - 75%|███████▌ | 16928/22434 [15:13:51<3:51:26, 2.52s/it] +2025-02-06 01:21:32 - ERROR - stderr - +2025-02-06 01:21:32 - ERROR 
- stderr - +2025-02-06 01:21:32 - INFO - stdout - {'loss': 0.3694, 'grad_norm': 1.423415184020996, 'learning_rate': 2.996630460583857e-06, 'epoch': 2.26} +2025-02-06 01:21:32 - ERROR - stderr - 75%|███████▌ | 16928/22434 [15:13:51<3:51:26, 2.52s/it] +2025-02-06 01:21:34 - ERROR - stderr - 75%|███████▌ | 16929/22434 [15:13:54<3:48:54, 2.49s/it] +2025-02-06 01:21:34 - ERROR - stderr - +2025-02-06 01:21:34 - ERROR - stderr - +2025-02-06 01:21:34 - INFO - stdout - {'loss': 0.4067, 'grad_norm': 1.5680296421051025, 'learning_rate': 2.9955999693254656e-06, 'epoch': 2.26} +2025-02-06 01:21:34 - ERROR - stderr - 75%|███████▌ | 16929/22434 [15:13:54<3:48:54, 2.49s/it] +2025-02-06 01:21:37 - ERROR - stderr - 75%|███████▌ | 16930/22434 [15:13:56<3:51:35, 2.52s/it] +2025-02-06 01:21:37 - ERROR - stderr - +2025-02-06 01:21:37 - ERROR - stderr - +2025-02-06 01:21:37 - INFO - stdout - {'loss': 0.3478, 'grad_norm': 1.3366303443908691, 'learning_rate': 2.9945696240670905e-06, 'epoch': 2.26} +2025-02-06 01:21:37 - ERROR - stderr - 75%|███████▌ | 16930/22434 [15:13:57<3:51:35, 2.52s/it] +2025-02-06 01:21:39 - ERROR - stderr - 75%|███████▌ | 16931/22434 [15:13:59<3:51:48, 2.53s/it] +2025-02-06 01:21:39 - ERROR - stderr - +2025-02-06 01:21:39 - ERROR - stderr - +2025-02-06 01:21:39 - INFO - stdout - {'loss': 0.3239, 'grad_norm': 1.4197465181350708, 'learning_rate': 2.9935394248302097e-06, 'epoch': 2.26} +2025-02-06 01:21:39 - ERROR - stderr - 75%|███████▌ | 16931/22434 [15:13:59<3:51:48, 2.53s/it] +2025-02-06 01:21:42 - ERROR - stderr - 75%|███████▌ | 16932/22434 [15:14:02<3:52:57, 2.54s/it] +2025-02-06 01:21:42 - ERROR - stderr - +2025-02-06 01:21:42 - ERROR - stderr - +2025-02-06 01:21:42 - INFO - stdout - {'loss': 0.319, 'grad_norm': 1.289496660232544, 'learning_rate': 2.992509371636294e-06, 'epoch': 2.26} +2025-02-06 01:21:42 - ERROR - stderr - 75%|███████▌ | 16932/22434 [15:14:02<3:52:57, 2.54s/it] +2025-02-06 01:21:44 - ERROR - stderr - 75%|███████▌ | 16933/22434 
[15:14:04<3:52:04, 2.53s/it] +2025-02-06 01:21:44 - ERROR - stderr - +2025-02-06 01:21:44 - ERROR - stderr - +2025-02-06 01:21:44 - INFO - stdout - {'loss': 0.3488, 'grad_norm': 1.4601109027862549, 'learning_rate': 2.9914794645068147e-06, 'epoch': 2.26} +2025-02-06 01:21:44 - ERROR - stderr - 75%|███████▌ | 16933/22434 [15:14:04<3:52:04, 2.53s/it] +2025-02-06 01:21:47 - ERROR - stderr - 75%|███████▌ | 16934/22434 [15:14:07<3:49:53, 2.51s/it] +2025-02-06 01:21:47 - ERROR - stderr - +2025-02-06 01:21:47 - ERROR - stderr - +2025-02-06 01:21:47 - INFO - stdout - {'loss': 0.4339, 'grad_norm': 1.7249987125396729, 'learning_rate': 2.990449703463243e-06, 'epoch': 2.26} +2025-02-06 01:21:47 - ERROR - stderr - 75%|███████▌ | 16934/22434 [15:14:07<3:49:53, 2.51s/it] +2025-02-06 01:21:49 - ERROR - stderr - 75%|███████▌ | 16935/22434 [15:14:09<3:51:48, 2.53s/it] +2025-02-06 01:21:49 - ERROR - stderr - +2025-02-06 01:21:49 - ERROR - stderr - +2025-02-06 01:21:49 - INFO - stdout - {'loss': 0.4152, 'grad_norm': 1.475365400314331, 'learning_rate': 2.9894200885270342e-06, 'epoch': 2.26} +2025-02-06 01:21:49 - ERROR - stderr - 75%|███████▌ | 16935/22434 [15:14:09<3:51:48, 2.53s/it] +2025-02-06 01:21:52 - ERROR - stderr - 75%|███████▌ | 16936/22434 [15:14:12<3:49:35, 2.51s/it] +2025-02-06 01:21:52 - ERROR - stderr - +2025-02-06 01:21:52 - ERROR - stderr - +2025-02-06 01:21:52 - INFO - stdout - {'loss': 0.3644, 'grad_norm': 1.727491021156311, 'learning_rate': 2.988390619719658e-06, 'epoch': 2.26} +2025-02-06 01:21:52 - ERROR - stderr - 75%|███████▌ | 16936/22434 [15:14:12<3:49:35, 2.51s/it] +2025-02-06 01:21:55 - ERROR - stderr - 75%|███████▌ | 16937/22434 [15:14:15<4:01:51, 2.64s/it] +2025-02-06 01:21:55 - ERROR - stderr - +2025-02-06 01:21:55 - ERROR - stderr - +2025-02-06 01:21:55 - INFO - stdout - {'loss': 0.3673, 'grad_norm': 1.5721575021743774, 'learning_rate': 2.9873612970625687e-06, 'epoch': 2.26} +2025-02-06 01:21:55 - ERROR - stderr - 75%|███████▌ | 16937/22434 
[15:14:15<4:01:51, 2.64s/it] +2025-02-06 01:21:57 - ERROR - stderr - 76%|███████▌ | 16938/22434 [15:14:17<3:57:02, 2.59s/it] +2025-02-06 01:21:57 - ERROR - stderr - +2025-02-06 01:21:57 - ERROR - stderr - +2025-02-06 01:21:57 - INFO - stdout - {'loss': 0.4023, 'grad_norm': 1.7165000438690186, 'learning_rate': 2.9863321205772243e-06, 'epoch': 2.27} +2025-02-06 01:21:57 - ERROR - stderr - 76%|███████▌ | 16938/22434 [15:14:17<3:57:02, 2.59s/it] +2025-02-06 01:22:00 - ERROR - stderr - 76%|███████▌ | 16939/22434 [15:14:19<3:54:08, 2.56s/it] +2025-02-06 01:22:00 - ERROR - stderr - +2025-02-06 01:22:00 - ERROR - stderr - +2025-02-06 01:22:00 - INFO - stdout - {'loss': 0.3731, 'grad_norm': 1.4183181524276733, 'learning_rate': 2.985303090285078e-06, 'epoch': 2.27} +2025-02-06 01:22:00 - ERROR - stderr - 76%|███████▌ | 16939/22434 [15:14:20<3:54:08, 2.56s/it] +2025-02-06 01:22:02 - ERROR - stderr - 76%|███████▌ | 16940/22434 [15:14:22<3:52:32, 2.54s/it] +2025-02-06 01:22:02 - ERROR - stderr - +2025-02-06 01:22:02 - ERROR - stderr - +2025-02-06 01:22:02 - INFO - stdout - {'loss': 0.3599, 'grad_norm': 1.267835021018982, 'learning_rate': 2.9842742062075703e-06, 'epoch': 2.27} +2025-02-06 01:22:02 - ERROR - stderr - 76%|███████▌ | 16940/22434 [15:14:22<3:52:32, 2.54s/it] +2025-02-06 01:22:05 - ERROR - stderr - 76%|███████▌ | 16941/22434 [15:14:24<3:50:40, 2.52s/it] +2025-02-06 01:22:05 - ERROR - stderr - +2025-02-06 01:22:05 - ERROR - stderr - +2025-02-06 01:22:05 - INFO - stdout - {'loss': 0.3084, 'grad_norm': 1.4425122737884521, 'learning_rate': 2.9832454683661595e-06, 'epoch': 2.27} +2025-02-06 01:22:05 - ERROR - stderr - 76%|███████▌ | 16941/22434 [15:14:24<3:50:40, 2.52s/it] +2025-02-06 01:22:07 - ERROR - stderr - 76%|███████▌ | 16942/22434 [15:14:27<3:48:59, 2.50s/it] +2025-02-06 01:22:07 - ERROR - stderr - +2025-02-06 01:22:07 - ERROR - stderr - +2025-02-06 01:22:07 - INFO - stdout - {'loss': 0.3928, 'grad_norm': 1.434779167175293, 'learning_rate': 2.98221687678228e-06, 
'epoch': 2.27} +2025-02-06 01:22:07 - ERROR - stderr - 76%|███████▌ | 16942/22434 [15:14:27<3:48:59, 2.50s/it] +2025-02-06 01:22:10 - ERROR - stderr - 76%|███████▌ | 16943/22434 [15:14:29<3:49:52, 2.51s/it] +2025-02-06 01:22:10 - ERROR - stderr - +2025-02-06 01:22:10 - ERROR - stderr - +2025-02-06 01:22:10 - INFO - stdout - {'loss': 0.3749, 'grad_norm': 1.5460067987442017, 'learning_rate': 2.981188431477371e-06, 'epoch': 2.27} +2025-02-06 01:22:10 - ERROR - stderr - 76%|███████▌ | 16943/22434 [15:14:29<3:49:52, 2.51s/it] +2025-02-06 01:22:12 - ERROR - stderr - 76%|███████▌ | 16944/22434 [15:14:32<3:47:56, 2.49s/it] +2025-02-06 01:22:12 - ERROR - stderr - +2025-02-06 01:22:12 - ERROR - stderr - +2025-02-06 01:22:12 - INFO - stdout - {'loss': 0.3626, 'grad_norm': 1.5429136753082275, 'learning_rate': 2.980160132472879e-06, 'epoch': 2.27} +2025-02-06 01:22:12 - ERROR - stderr - 76%|███████▌ | 16944/22434 [15:14:32<3:47:56, 2.49s/it] +2025-02-06 01:22:15 - ERROR - stderr - 76%|███████▌ | 16945/22434 [15:14:34<3:47:40, 2.49s/it] +2025-02-06 01:22:15 - ERROR - stderr - +2025-02-06 01:22:15 - ERROR - stderr - +2025-02-06 01:22:15 - INFO - stdout - {'loss': 0.3821, 'grad_norm': 1.5830225944519043, 'learning_rate': 2.979131979790225e-06, 'epoch': 2.27} +2025-02-06 01:22:15 - ERROR - stderr - 76%|███████▌ | 16945/22434 [15:14:34<3:47:40, 2.49s/it] +2025-02-06 01:22:17 - ERROR - stderr - 76%|███████▌ | 16946/22434 [15:14:37<3:48:57, 2.50s/it] +2025-02-06 01:22:17 - ERROR - stderr - +2025-02-06 01:22:17 - ERROR - stderr - +2025-02-06 01:22:17 - INFO - stdout - {'loss': 0.3506, 'grad_norm': 1.5474716424942017, 'learning_rate': 2.9781039734508543e-06, 'epoch': 2.27} +2025-02-06 01:22:17 - ERROR - stderr - 76%|███████▌ | 16946/22434 [15:14:37<3:48:57, 2.50s/it] +2025-02-06 01:22:20 - ERROR - stderr - 76%|███████▌ | 16947/22434 [15:14:39<3:47:21, 2.49s/it] +2025-02-06 01:22:20 - ERROR - stderr - +2025-02-06 01:22:20 - ERROR - stderr - +2025-02-06 01:22:20 - INFO - stdout - 
{'loss': 0.3898, 'grad_norm': 1.5434056520462036, 'learning_rate': 2.9770761134761828e-06, 'epoch': 2.27} +2025-02-06 01:22:20 - ERROR - stderr - 76%|███████▌ | 16947/22434 [15:14:39<3:47:21, 2.49s/it] +2025-02-06 01:22:22 - ERROR - stderr - 76%|███████▌ | 16948/22434 [15:14:42<3:48:34, 2.50s/it] +2025-02-06 01:22:22 - ERROR - stderr - +2025-02-06 01:22:22 - ERROR - stderr - +2025-02-06 01:22:22 - INFO - stdout - {'loss': 0.3967, 'grad_norm': 1.436893343925476, 'learning_rate': 2.97604839988764e-06, 'epoch': 2.27} +2025-02-06 01:22:22 - ERROR - stderr - 76%|███████▌ | 16948/22434 [15:14:42<3:48:34, 2.50s/it] +2025-02-06 01:22:25 - ERROR - stderr - 76%|███████▌ | 16949/22434 [15:14:44<3:49:06, 2.51s/it] +2025-02-06 01:22:25 - ERROR - stderr - +2025-02-06 01:22:25 - ERROR - stderr - +2025-02-06 01:22:25 - INFO - stdout - {'loss': 0.368, 'grad_norm': 1.6894923448562622, 'learning_rate': 2.9750208327066466e-06, 'epoch': 2.27} +2025-02-06 01:22:25 - ERROR - stderr - 76%|███████▌ | 16949/22434 [15:14:44<3:49:06, 2.51s/it] +2025-02-06 01:22:27 - ERROR - stderr - 76%|███████▌ | 16950/22434 [15:14:47<3:50:52, 2.53s/it] +2025-02-06 01:22:27 - ERROR - stderr - +2025-02-06 01:22:27 - ERROR - stderr - +2025-02-06 01:22:27 - INFO - stdout - {'loss': 0.3575, 'grad_norm': 1.4925389289855957, 'learning_rate': 2.973993411954622e-06, 'epoch': 2.27} +2025-02-06 01:22:27 - ERROR - stderr - 76%|███████▌ | 16950/22434 [15:14:47<3:50:52, 2.53s/it] +2025-02-06 01:22:30 - ERROR - stderr - 76%|███████▌ | 16951/22434 [15:14:50<3:53:05, 2.55s/it] +2025-02-06 01:22:30 - ERROR - stderr - +2025-02-06 01:22:30 - ERROR - stderr - +2025-02-06 01:22:30 - INFO - stdout - {'loss': 0.37, 'grad_norm': 1.543095588684082, 'learning_rate': 2.972966137652983e-06, 'epoch': 2.27} +2025-02-06 01:22:30 - ERROR - stderr - 76%|███████▌ | 16951/22434 [15:14:50<3:53:05, 2.55s/it] +2025-02-06 01:22:33 - ERROR - stderr - 76%|███████▌ | 16952/22434 [15:14:52<4:01:25, 2.64s/it] +2025-02-06 01:22:33 - ERROR - stderr - 
+2025-02-06 01:22:33 - ERROR - stderr - +2025-02-06 01:22:33 - INFO - stdout - {'loss': 0.3478, 'grad_norm': 1.6135847568511963, 'learning_rate': 2.9719390098231384e-06, 'epoch': 2.27} +2025-02-06 01:22:33 - ERROR - stderr - 76%|███████▌ | 16952/22434 [15:14:52<4:01:25, 2.64s/it] +2025-02-06 01:22:35 - ERROR - stderr - 76%|███████▌ | 16953/22434 [15:14:55<3:55:57, 2.58s/it] +2025-02-06 01:22:35 - ERROR - stderr - +2025-02-06 01:22:35 - ERROR - stderr - +2025-02-06 01:22:35 - INFO - stdout - {'loss': 0.3335, 'grad_norm': 1.347380518913269, 'learning_rate': 2.9709120284865012e-06, 'epoch': 2.27} +2025-02-06 01:22:35 - ERROR - stderr - 76%|███████▌ | 16953/22434 [15:14:55<3:55:57, 2.58s/it] +2025-02-06 01:22:38 - ERROR - stderr - 76%|███████▌ | 16954/22434 [15:14:57<3:51:42, 2.54s/it] +2025-02-06 01:22:38 - ERROR - stderr - +2025-02-06 01:22:38 - ERROR - stderr - +2025-02-06 01:22:38 - INFO - stdout - {'loss': 0.4158, 'grad_norm': 1.5861824750900269, 'learning_rate': 2.9698851936644767e-06, 'epoch': 2.27} +2025-02-06 01:22:38 - ERROR - stderr - 76%|███████▌ | 16954/22434 [15:14:57<3:51:42, 2.54s/it] +2025-02-06 01:22:40 - ERROR - stderr - 76%|███████▌ | 16955/22434 [15:15:00<3:50:02, 2.52s/it] +2025-02-06 01:22:40 - ERROR - stderr - +2025-02-06 01:22:40 - ERROR - stderr - +2025-02-06 01:22:40 - INFO - stdout - {'loss': 0.4128, 'grad_norm': 1.5730758905410767, 'learning_rate': 2.968858505378468e-06, 'epoch': 2.27} +2025-02-06 01:22:40 - ERROR - stderr - 76%|███████▌ | 16955/22434 [15:15:00<3:50:02, 2.52s/it] +2025-02-06 01:22:42 - ERROR - stderr - 76%|███████▌ | 16956/22434 [15:15:02<3:48:52, 2.51s/it] +2025-02-06 01:22:43 - ERROR - stderr - +2025-02-06 01:22:43 - ERROR - stderr - +2025-02-06 01:22:43 - INFO - stdout - {'loss': 0.3957, 'grad_norm': 1.415982961654663, 'learning_rate': 2.9678319636498752e-06, 'epoch': 2.27} +2025-02-06 01:22:43 - ERROR - stderr - 76%|███████▌ | 16956/22434 [15:15:02<3:48:52, 2.51s/it] +2025-02-06 01:22:45 - ERROR - stderr - 76%|███████▌ 
| 16957/22434 [15:15:05<3:46:47, 2.48s/it] +2025-02-06 01:22:45 - ERROR - stderr - +2025-02-06 01:22:45 - ERROR - stderr - +2025-02-06 01:22:45 - INFO - stdout - {'loss': 0.3615, 'grad_norm': 1.449568748474121, 'learning_rate': 2.9668055685000976e-06, 'epoch': 2.27} +2025-02-06 01:22:45 - ERROR - stderr - 76%|███████▌ | 16957/22434 [15:15:05<3:46:47, 2.48s/it] +2025-02-06 01:22:47 - ERROR - stderr - 76%|███████▌ | 16958/22434 [15:15:07<3:47:31, 2.49s/it] +2025-02-06 01:22:47 - ERROR - stderr - +2025-02-06 01:22:47 - ERROR - stderr - +2025-02-06 01:22:47 - INFO - stdout - {'loss': 0.391, 'grad_norm': 1.7115237712860107, 'learning_rate': 2.965779319950529e-06, 'epoch': 2.27} +2025-02-06 01:22:47 - ERROR - stderr - 76%|███████▌ | 16958/22434 [15:15:07<3:47:31, 2.49s/it] +2025-02-06 01:22:50 - ERROR - stderr - 76%|███████▌ | 16959/22434 [15:15:10<3:47:33, 2.49s/it] +2025-02-06 01:22:50 - ERROR - stderr - +2025-02-06 01:22:50 - ERROR - stderr - +2025-02-06 01:22:50 - INFO - stdout - {'loss': 0.3554, 'grad_norm': 1.32489812374115, 'learning_rate': 2.9647532180225547e-06, 'epoch': 2.27} +2025-02-06 01:22:50 - ERROR - stderr - 76%|███████▌ | 16959/22434 [15:15:10<3:47:33, 2.49s/it] +2025-02-06 01:22:52 - ERROR - stderr - 76%|███████▌ | 16960/22434 [15:15:12<3:47:33, 2.49s/it] +2025-02-06 01:22:52 - ERROR - stderr - +2025-02-06 01:22:52 - ERROR - stderr - +2025-02-06 01:22:52 - INFO - stdout - {'loss': 0.3521, 'grad_norm': 1.6141796112060547, 'learning_rate': 2.9637272627375735e-06, 'epoch': 2.27} +2025-02-06 01:22:52 - ERROR - stderr - 76%|███████▌ | 16960/22434 [15:15:12<3:47:33, 2.49s/it] +2025-02-06 01:22:55 - ERROR - stderr - 76%|███████▌ | 16961/22434 [15:15:15<3:47:12, 2.49s/it] +2025-02-06 01:22:55 - ERROR - stderr - +2025-02-06 01:22:55 - ERROR - stderr - +2025-02-06 01:22:55 - INFO - stdout - {'loss': 0.3798, 'grad_norm': 1.497443437576294, 'learning_rate': 2.9627014541169575e-06, 'epoch': 2.27} +2025-02-06 01:22:55 - ERROR - stderr - 76%|███████▌ | 16961/22434 
[15:15:15<3:47:12, 2.49s/it] +2025-02-06 01:22:57 - ERROR - stderr - 76%|███████▌ | 16962/22434 [15:15:17<3:45:45, 2.48s/it] +2025-02-06 01:22:57 - ERROR - stderr - +2025-02-06 01:22:57 - ERROR - stderr - +2025-02-06 01:22:57 - INFO - stdout - {'loss': 0.3709, 'grad_norm': 1.6728148460388184, 'learning_rate': 2.9616757921821005e-06, 'epoch': 2.27} +2025-02-06 01:22:57 - ERROR - stderr - 76%|███████▌ | 16962/22434 [15:15:17<3:45:45, 2.48s/it] +2025-02-06 01:23:00 - ERROR - stderr - 76%|███████▌ | 16963/22434 [15:15:20<3:49:53, 2.52s/it] +2025-02-06 01:23:00 - ERROR - stderr - +2025-02-06 01:23:00 - ERROR - stderr - +2025-02-06 01:23:00 - INFO - stdout - {'loss': 0.3738, 'grad_norm': 1.6707217693328857, 'learning_rate': 2.9606502769543778e-06, 'epoch': 2.27} +2025-02-06 01:23:00 - ERROR - stderr - 76%|███████▌ | 16963/22434 [15:15:20<3:49:53, 2.52s/it] +2025-02-06 01:23:02 - ERROR - stderr - 76%|███████▌ | 16964/22434 [15:15:22<3:48:14, 2.50s/it] +2025-02-06 01:23:02 - ERROR - stderr - +2025-02-06 01:23:02 - ERROR - stderr - +2025-02-06 01:23:02 - INFO - stdout - {'loss': 0.3601, 'grad_norm': 1.5550148487091064, 'learning_rate': 2.959624908455159e-06, 'epoch': 2.27} +2025-02-06 01:23:02 - ERROR - stderr - 76%|███████▌ | 16964/22434 [15:15:22<3:48:14, 2.50s/it] +2025-02-06 01:23:05 - ERROR - stderr - 76%|███████▌ | 16965/22434 [15:15:25<3:49:25, 2.52s/it] +2025-02-06 01:23:05 - ERROR - stderr - +2025-02-06 01:23:05 - ERROR - stderr - +2025-02-06 01:23:05 - INFO - stdout - {'loss': 0.4088, 'grad_norm': 1.6925963163375854, 'learning_rate': 2.9585996867058286e-06, 'epoch': 2.27} +2025-02-06 01:23:05 - ERROR - stderr - 76%|███████▌ | 16965/22434 [15:15:25<3:49:25, 2.52s/it] +2025-02-06 01:23:07 - ERROR - stderr - 76%|███████▌ | 16966/22434 [15:15:27<3:48:22, 2.51s/it] +2025-02-06 01:23:08 - ERROR - stderr - +2025-02-06 01:23:08 - ERROR - stderr - +2025-02-06 01:23:08 - INFO - stdout - {'loss': 0.3967, 'grad_norm': 1.395007610321045, 'learning_rate': 2.957574611727746e-06, 
'epoch': 2.27} +2025-02-06 01:23:08 - ERROR - stderr - 76%|███████▌ | 16966/22434 [15:15:27<3:48:22, 2.51s/it] +2025-02-06 01:23:10 - ERROR - stderr - 76%|███████▌ | 16967/22434 [15:15:30<3:47:26, 2.50s/it] +2025-02-06 01:23:10 - ERROR - stderr - +2025-02-06 01:23:10 - ERROR - stderr - +2025-02-06 01:23:10 - INFO - stdout - {'loss': 0.3809, 'grad_norm': 1.3553985357284546, 'learning_rate': 2.9565496835422822e-06, 'epoch': 2.27} +2025-02-06 01:23:10 - ERROR - stderr - 76%|███████▌ | 16967/22434 [15:15:30<3:47:26, 2.50s/it] +2025-02-06 01:23:12 - ERROR - stderr - 76%|███████▌ | 16968/22434 [15:15:32<3:47:47, 2.50s/it] +2025-02-06 01:23:12 - ERROR - stderr - +2025-02-06 01:23:12 - ERROR - stderr - +2025-02-06 01:23:12 - INFO - stdout - {'loss': 0.3671, 'grad_norm': 1.5196080207824707, 'learning_rate': 2.9555249021707998e-06, 'epoch': 2.27} +2025-02-06 01:23:12 - ERROR - stderr - 76%|███████▌ | 16968/22434 [15:15:32<3:47:47, 2.50s/it] +2025-02-06 01:23:15 - ERROR - stderr - 76%|███████▌ | 16969/22434 [15:15:35<3:46:51, 2.49s/it] +2025-02-06 01:23:15 - ERROR - stderr - +2025-02-06 01:23:15 - ERROR - stderr - +2025-02-06 01:23:15 - INFO - stdout - {'loss': 0.3067, 'grad_norm': 1.2926737070083618, 'learning_rate': 2.954500267634661e-06, 'epoch': 2.27} +2025-02-06 01:23:15 - ERROR - stderr - 76%|███████▌ | 16969/22434 [15:15:35<3:46:51, 2.49s/it] +2025-02-06 01:23:18 - ERROR - stderr - 76%|███████▌ | 16970/22434 [15:15:37<3:52:33, 2.55s/it] +2025-02-06 01:23:18 - ERROR - stderr - +2025-02-06 01:23:18 - ERROR - stderr - +2025-02-06 01:23:18 - INFO - stdout - {'loss': 0.4214, 'grad_norm': 1.5860449075698853, 'learning_rate': 2.9534757799552216e-06, 'epoch': 2.27} +2025-02-06 01:23:18 - ERROR - stderr - 76%|███████▌ | 16970/22434 [15:15:37<3:52:33, 2.55s/it] +2025-02-06 01:23:20 - ERROR - stderr - 76%|███████▌ | 16971/22434 [15:15:40<3:53:53, 2.57s/it] +2025-02-06 01:23:20 - ERROR - stderr - +2025-02-06 01:23:20 - ERROR - stderr - +2025-02-06 01:23:20 - INFO - stdout - 
{'loss': 0.4049, 'grad_norm': 1.631088137626648, 'learning_rate': 2.952451439153837e-06, 'epoch': 2.27} +2025-02-06 01:23:20 - ERROR - stderr - 76%|███████▌ | 16971/22434 [15:15:40<3:53:53, 2.57s/it] +2025-02-06 01:23:23 - ERROR - stderr - 76%|███████▌ | 16972/22434 [15:15:42<3:51:34, 2.54s/it] +2025-02-06 01:23:23 - ERROR - stderr - +2025-02-06 01:23:23 - ERROR - stderr - +2025-02-06 01:23:23 - INFO - stdout - {'loss': 0.3861, 'grad_norm': 1.5856449604034424, 'learning_rate': 2.951427245251858e-06, 'epoch': 2.27} +2025-02-06 01:23:23 - ERROR - stderr - 76%|███████▌ | 16972/22434 [15:15:43<3:51:34, 2.54s/it] +2025-02-06 01:23:25 - ERROR - stderr - 76%|███████▌ | 16973/22434 [15:15:45<3:53:15, 2.56s/it] +2025-02-06 01:23:25 - ERROR - stderr - +2025-02-06 01:23:25 - ERROR - stderr - +2025-02-06 01:23:25 - INFO - stdout - {'loss': 0.3718, 'grad_norm': 1.5612843036651611, 'learning_rate': 2.950403198270634e-06, 'epoch': 2.27} +2025-02-06 01:23:25 - ERROR - stderr - 76%|███████▌ | 16973/22434 [15:15:45<3:53:15, 2.56s/it] +2025-02-06 01:23:28 - ERROR - stderr - 76%|███████▌ | 16974/22434 [15:15:48<3:50:42, 2.54s/it] +2025-02-06 01:23:28 - ERROR - stderr - +2025-02-06 01:23:28 - ERROR - stderr - +2025-02-06 01:23:28 - INFO - stdout - {'loss': 0.3694, 'grad_norm': 1.4442483186721802, 'learning_rate': 2.9493792982315082e-06, 'epoch': 2.27} +2025-02-06 01:23:28 - ERROR - stderr - 76%|███████▌ | 16974/22434 [15:15:48<3:50:42, 2.54s/it] +2025-02-06 01:23:30 - ERROR - stderr - 76%|███████▌ | 16975/22434 [15:15:50<3:49:40, 2.52s/it] +2025-02-06 01:23:30 - ERROR - stderr - +2025-02-06 01:23:30 - ERROR - stderr - +2025-02-06 01:23:30 - INFO - stdout - {'loss': 0.3848, 'grad_norm': 1.5850260257720947, 'learning_rate': 2.9483555451558253e-06, 'epoch': 2.27} +2025-02-06 01:23:30 - ERROR - stderr - 76%|███████▌ | 16975/22434 [15:15:50<3:49:40, 2.52s/it] +2025-02-06 01:23:33 - ERROR - stderr - 76%|███████▌ | 16976/22434 [15:15:53<3:48:46, 2.52s/it] +2025-02-06 01:23:33 - ERROR - stderr 
- +2025-02-06 01:23:33 - ERROR - stderr - +2025-02-06 01:23:33 - INFO - stdout - {'loss': 0.3639, 'grad_norm': 1.5037930011749268, 'learning_rate': 2.9473319390649234e-06, 'epoch': 2.27} +2025-02-06 01:23:33 - ERROR - stderr - 76%|███████▌ | 16976/22434 [15:15:53<3:48:46, 2.52s/it] +2025-02-06 01:23:35 - ERROR - stderr - 76%|███████▌ | 16977/22434 [15:15:55<3:47:21, 2.50s/it] +2025-02-06 01:23:35 - ERROR - stderr - +2025-02-06 01:23:35 - ERROR - stderr - +2025-02-06 01:23:35 - INFO - stdout - {'loss': 0.3593, 'grad_norm': 1.4450018405914307, 'learning_rate': 2.946308479980139e-06, 'epoch': 2.27} +2025-02-06 01:23:35 - ERROR - stderr - 76%|███████▌ | 16977/22434 [15:15:55<3:47:21, 2.50s/it] +2025-02-06 01:23:38 - ERROR - stderr - 76%|███████▌ | 16978/22434 [15:15:57<3:46:44, 2.49s/it] +2025-02-06 01:23:38 - ERROR - stderr - +2025-02-06 01:23:38 - ERROR - stderr - +2025-02-06 01:23:38 - INFO - stdout - {'loss': 0.3855, 'grad_norm': 1.5997565984725952, 'learning_rate': 2.9452851679228044e-06, 'epoch': 2.27} +2025-02-06 01:23:38 - ERROR - stderr - 76%|███████▌ | 16978/22434 [15:15:58<3:46:44, 2.49s/it] +2025-02-06 01:23:40 - ERROR - stderr - 76%|███████▌ | 16979/22434 [15:16:00<3:46:09, 2.49s/it] +2025-02-06 01:23:40 - ERROR - stderr - +2025-02-06 01:23:40 - ERROR - stderr - +2025-02-06 01:23:40 - INFO - stdout - {'loss': 0.3214, 'grad_norm': 1.4115175008773804, 'learning_rate': 2.944262002914252e-06, 'epoch': 2.27} +2025-02-06 01:23:40 - ERROR - stderr - 76%|███████▌ | 16979/22434 [15:16:00<3:46:09, 2.49s/it] +2025-02-06 01:23:43 - ERROR - stderr - 76%|███████▌ | 16980/22434 [15:16:02<3:45:24, 2.48s/it] +2025-02-06 01:23:43 - ERROR - stderr - +2025-02-06 01:23:43 - ERROR - stderr - +2025-02-06 01:23:43 - INFO - stdout - {'loss': 0.4379, 'grad_norm': 1.6623889207839966, 'learning_rate': 2.9432389849758014e-06, 'epoch': 2.27} +2025-02-06 01:23:43 - ERROR - stderr - 76%|███████▌ | 16980/22434 [15:16:02<3:45:24, 2.48s/it] +2025-02-06 01:23:45 - ERROR - stderr - 
76%|███████▌ | 16981/22434 [15:16:05<3:43:57, 2.46s/it] +2025-02-06 01:23:45 - ERROR - stderr - +2025-02-06 01:23:45 - ERROR - stderr - +2025-02-06 01:23:45 - INFO - stdout - {'loss': 0.3762, 'grad_norm': 1.5614867210388184, 'learning_rate': 2.9422161141287843e-06, 'epoch': 2.27} +2025-02-06 01:23:45 - ERROR - stderr - 76%|███████▌ | 16981/22434 [15:16:05<3:43:57, 2.46s/it] +2025-02-06 01:23:48 - ERROR - stderr - 76%|███████▌ | 16982/22434 [15:16:07<3:45:51, 2.49s/it] +2025-02-06 01:23:48 - ERROR - stderr - +2025-02-06 01:23:48 - ERROR - stderr - +2025-02-06 01:23:48 - INFO - stdout - {'loss': 0.3446, 'grad_norm': 1.3938965797424316, 'learning_rate': 2.9411933903945224e-06, 'epoch': 2.27} +2025-02-06 01:23:48 - ERROR - stderr - 76%|███████▌ | 16982/22434 [15:16:07<3:45:51, 2.49s/it] +2025-02-06 01:23:50 - ERROR - stderr - 76%|███████▌ | 16983/22434 [15:16:10<3:45:31, 2.48s/it] +2025-02-06 01:23:50 - ERROR - stderr - +2025-02-06 01:23:50 - ERROR - stderr - +2025-02-06 01:23:50 - INFO - stdout - {'loss': 0.3859, 'grad_norm': 1.5526942014694214, 'learning_rate': 2.940170813794322e-06, 'epoch': 2.27} +2025-02-06 01:23:50 - ERROR - stderr - 76%|███████▌ | 16983/22434 [15:16:10<3:45:31, 2.48s/it] +2025-02-06 01:23:53 - ERROR - stderr - 76%|███████▌ | 16984/22434 [15:16:12<3:45:44, 2.49s/it] +2025-02-06 01:23:53 - ERROR - stderr - +2025-02-06 01:23:53 - ERROR - stderr - +2025-02-06 01:23:53 - INFO - stdout - {'loss': 0.3578, 'grad_norm': 1.4636236429214478, 'learning_rate': 2.9391483843495126e-06, 'epoch': 2.27} +2025-02-06 01:23:53 - ERROR - stderr - 76%|███████▌ | 16984/22434 [15:16:12<3:45:44, 2.49s/it] +2025-02-06 01:23:55 - ERROR - stderr - 76%|███████▌ | 16985/22434 [15:16:15<3:46:52, 2.50s/it] +2025-02-06 01:23:55 - ERROR - stderr - +2025-02-06 01:23:55 - ERROR - stderr - +2025-02-06 01:23:55 - INFO - stdout - {'loss': 0.4148, 'grad_norm': 1.679840326309204, 'learning_rate': 2.938126102081392e-06, 'epoch': 2.27} +2025-02-06 01:23:55 - ERROR - stderr - 76%|███████▌ 
| 16985/22434 [15:16:15<3:46:52, 2.50s/it] +2025-02-06 01:23:58 - ERROR - stderr - 76%|███████▌ | 16986/22434 [15:16:17<3:46:14, 2.49s/it] +2025-02-06 01:23:58 - ERROR - stderr - +2025-02-06 01:23:58 - ERROR - stderr - +2025-02-06 01:23:58 - INFO - stdout - {'loss': 0.3611, 'grad_norm': 1.5810012817382812, 'learning_rate': 2.9371039670112832e-06, 'epoch': 2.27} +2025-02-06 01:23:58 - ERROR - stderr - 76%|███████▌ | 16986/22434 [15:16:17<3:46:14, 2.49s/it] +2025-02-06 01:24:00 - ERROR - stderr - 76%|███████▌ | 16987/22434 [15:16:20<3:48:34, 2.52s/it] +2025-02-06 01:24:00 - ERROR - stderr - +2025-02-06 01:24:00 - ERROR - stderr - +2025-02-06 01:24:00 - INFO - stdout - {'loss': 0.3749, 'grad_norm': 1.582472801208496, 'learning_rate': 2.936081979160479e-06, 'epoch': 2.27} +2025-02-06 01:24:00 - ERROR - stderr - 76%|███████▌ | 16987/22434 [15:16:20<3:48:34, 2.52s/it] +2025-02-06 01:24:03 - ERROR - stderr - 76%|███████▌ | 16988/22434 [15:16:22<3:47:49, 2.51s/it] +2025-02-06 01:24:03 - ERROR - stderr - +2025-02-06 01:24:03 - ERROR - stderr - +2025-02-06 01:24:03 - INFO - stdout - {'loss': 0.393, 'grad_norm': 1.51668381690979, 'learning_rate': 2.9350601385502865e-06, 'epoch': 2.27} +2025-02-06 01:24:03 - ERROR - stderr - 76%|███████▌ | 16988/22434 [15:16:22<3:47:49, 2.51s/it] +2025-02-06 01:24:05 - ERROR - stderr - 76%|███████▌ | 16989/22434 [15:16:25<3:54:45, 2.59s/it] +2025-02-06 01:24:05 - ERROR - stderr - +2025-02-06 01:24:05 - ERROR - stderr - +2025-02-06 01:24:05 - INFO - stdout - {'loss': 0.3778, 'grad_norm': 1.632483720779419, 'learning_rate': 2.9340384452020053e-06, 'epoch': 2.27} +2025-02-06 01:24:05 - ERROR - stderr - 76%|███████▌ | 16989/22434 [15:16:25<3:54:45, 2.59s/it] +2025-02-06 01:24:08 - ERROR - stderr - 76%|███████▌ | 16990/22434 [15:16:28<3:52:36, 2.56s/it] +2025-02-06 01:24:08 - ERROR - stderr - +2025-02-06 01:24:08 - ERROR - stderr - +2025-02-06 01:24:08 - INFO - stdout - {'loss': 0.3367, 'grad_norm': 1.3301669359207153, 'learning_rate': 
2.9330168991369323e-06, 'epoch': 2.27} +2025-02-06 01:24:08 - ERROR - stderr - 76%|███████▌ | 16990/22434 [15:16:28<3:52:36, 2.56s/it] +2025-02-06 01:24:10 - ERROR - stderr - 76%|███████▌ | 16991/22434 [15:16:30<3:49:36, 2.53s/it] +2025-02-06 01:24:10 - ERROR - stderr - +2025-02-06 01:24:10 - ERROR - stderr - +2025-02-06 01:24:10 - INFO - stdout - {'loss': 0.3705, 'grad_norm': 1.4877718687057495, 'learning_rate': 2.931995500376359e-06, 'epoch': 2.27} +2025-02-06 01:24:10 - ERROR - stderr - 76%|███████▌ | 16991/22434 [15:16:30<3:49:36, 2.53s/it] +2025-02-06 01:24:13 - ERROR - stderr - 76%|███████▌ | 16992/22434 [15:16:33<3:48:17, 2.52s/it] +2025-02-06 01:24:13 - ERROR - stderr - +2025-02-06 01:24:13 - ERROR - stderr - +2025-02-06 01:24:13 - INFO - stdout - {'loss': 0.3873, 'grad_norm': 1.4969431161880493, 'learning_rate': 2.9309742489415747e-06, 'epoch': 2.27} +2025-02-06 01:24:13 - ERROR - stderr - 76%|███████▌ | 16992/22434 [15:16:33<3:48:17, 2.52s/it] +2025-02-06 01:24:15 - ERROR - stderr - 76%|███████▌ | 16993/22434 [15:16:35<3:47:03, 2.50s/it] +2025-02-06 01:24:15 - ERROR - stderr - +2025-02-06 01:24:15 - ERROR - stderr - +2025-02-06 01:24:15 - INFO - stdout - {'loss': 0.3251, 'grad_norm': 1.4712682962417603, 'learning_rate': 2.92995314485387e-06, 'epoch': 2.27} +2025-02-06 01:24:15 - ERROR - stderr - 76%|███████▌ | 16993/22434 [15:16:35<3:47:03, 2.50s/it] +2025-02-06 01:24:18 - ERROR - stderr - 76%|███████▌ | 16994/22434 [15:16:38<3:49:14, 2.53s/it] +2025-02-06 01:24:18 - ERROR - stderr - +2025-02-06 01:24:18 - ERROR - stderr - +2025-02-06 01:24:18 - INFO - stdout - {'loss': 0.3676, 'grad_norm': 1.4961767196655273, 'learning_rate': 2.9289321881345257e-06, 'epoch': 2.27} +2025-02-06 01:24:18 - ERROR - stderr - 76%|███████▌ | 16994/22434 [15:16:38<3:49:14, 2.53s/it] +2025-02-06 01:24:20 - ERROR - stderr - 76%|███████▌ | 16995/22434 [15:16:40<3:49:42, 2.53s/it] +2025-02-06 01:24:21 - ERROR - stderr - +2025-02-06 01:24:21 - ERROR - stderr - +2025-02-06 01:24:21 - 
INFO - stdout - {'loss': 0.3616, 'grad_norm': 1.567962884902954, 'learning_rate': 2.927911378804824e-06, 'epoch': 2.27} +2025-02-06 01:24:21 - ERROR - stderr - 76%|███████▌ | 16995/22434 [15:16:40<3:49:42, 2.53s/it] +2025-02-06 01:24:23 - ERROR - stderr - 76%|███████▌ | 16996/22434 [15:16:43<3:47:28, 2.51s/it] +2025-02-06 01:24:23 - ERROR - stderr - +2025-02-06 01:24:23 - ERROR - stderr - +2025-02-06 01:24:23 - INFO - stdout - {'loss': 0.4515, 'grad_norm': 1.7414886951446533, 'learning_rate': 2.926890716886042e-06, 'epoch': 2.27} +2025-02-06 01:24:23 - ERROR - stderr - 76%|███████▌ | 16996/22434 [15:16:43<3:47:28, 2.51s/it] +2025-02-06 01:24:25 - ERROR - stderr - 76%|███████▌ | 16997/22434 [15:16:45<3:47:28, 2.51s/it] +2025-02-06 01:24:25 - ERROR - stderr - +2025-02-06 01:24:25 - ERROR - stderr - +2025-02-06 01:24:25 - INFO - stdout - {'loss': 0.3576, 'grad_norm': 1.419416069984436, 'learning_rate': 2.9258702023994547e-06, 'epoch': 2.27} +2025-02-06 01:24:25 - ERROR - stderr - 76%|███████▌ | 16997/22434 [15:16:45<3:47:28, 2.51s/it] +2025-02-06 01:24:28 - ERROR - stderr - 76%|███████▌ | 16998/22434 [15:16:48<3:56:35, 2.61s/it] +2025-02-06 01:24:28 - ERROR - stderr - +2025-02-06 01:24:28 - ERROR - stderr - +2025-02-06 01:24:28 - INFO - stdout - {'loss': 0.3747, 'grad_norm': 1.4319350719451904, 'learning_rate': 2.9248498353663337e-06, 'epoch': 2.27} +2025-02-06 01:24:28 - ERROR - stderr - 76%|███████▌ | 16998/22434 [15:16:48<3:56:35, 2.61s/it] +2025-02-06 01:24:31 - ERROR - stderr - 76%|███████▌ | 16999/22434 [15:16:51<3:54:12, 2.59s/it] +2025-02-06 01:24:31 - ERROR - stderr - +2025-02-06 01:24:31 - ERROR - stderr - +2025-02-06 01:24:31 - INFO - stdout - {'loss': 0.3758, 'grad_norm': 1.385412335395813, 'learning_rate': 2.923829615807948e-06, 'epoch': 2.27} +2025-02-06 01:24:31 - ERROR - stderr - 76%|███████▌ | 16999/22434 [15:16:51<3:54:12, 2.59s/it] +2025-02-06 01:24:33 - ERROR - stderr - 76%|███████▌ | 17000/22434 [15:16:53<3:56:39, 2.61s/it] +2025-02-06 01:24:34 - 
ERROR - stderr - +2025-02-06 01:24:34 - ERROR - stderr - +2025-02-06 01:24:34 - INFO - stdout - {'loss': 0.3535, 'grad_norm': 1.4646358489990234, 'learning_rate': 2.922809543745563e-06, 'epoch': 2.27} +2025-02-06 01:24:34 - ERROR - stderr - 76%|███████▌ | 17000/22434 [15:16:53<3:56:39, 2.61s/it] +2025-02-06 01:24:36 - ERROR - stderr - 76%|███████▌ | 17001/22434 [15:16:56<3:53:30, 2.58s/it] +2025-02-06 01:24:36 - ERROR - stderr - +2025-02-06 01:24:36 - ERROR - stderr - +2025-02-06 01:24:36 - INFO - stdout - {'loss': 0.3414, 'grad_norm': 1.560086965560913, 'learning_rate': 2.9217896192004413e-06, 'epoch': 2.27} +2025-02-06 01:24:36 - ERROR - stderr - 76%|███████▌ | 17001/22434 [15:16:56<3:53:30, 2.58s/it] +2025-02-06 01:24:39 - ERROR - stderr - 76%|███████▌ | 17002/22434 [15:16:58<3:51:38, 2.56s/it] +2025-02-06 01:24:39 - ERROR - stderr - +2025-02-06 01:24:39 - ERROR - stderr - +2025-02-06 01:24:39 - INFO - stdout - {'loss': 0.4007, 'grad_norm': 1.6190581321716309, 'learning_rate': 2.9207698421938415e-06, 'epoch': 2.27} +2025-02-06 01:24:39 - ERROR - stderr - 76%|███████▌ | 17002/22434 [15:16:58<3:51:38, 2.56s/it] +2025-02-06 01:24:41 - ERROR - stderr - 76%|███████▌ | 17003/22434 [15:17:01<3:53:06, 2.58s/it] +2025-02-06 01:24:41 - ERROR - stderr - +2025-02-06 01:24:41 - ERROR - stderr - +2025-02-06 01:24:41 - INFO - stdout - {'loss': 0.3734, 'grad_norm': 1.555875301361084, 'learning_rate': 2.9197502127470223e-06, 'epoch': 2.27} +2025-02-06 01:24:41 - ERROR - stderr - 76%|███████▌ | 17003/22434 [15:17:01<3:53:06, 2.58s/it] +2025-02-06 01:24:44 - ERROR - stderr - 76%|███████▌ | 17004/22434 [15:17:03<3:51:55, 2.56s/it] +2025-02-06 01:24:44 - ERROR - stderr - +2025-02-06 01:24:44 - ERROR - stderr - +2025-02-06 01:24:44 - INFO - stdout - {'loss': 0.3574, 'grad_norm': 1.4572465419769287, 'learning_rate': 2.9187307308812298e-06, 'epoch': 2.27} +2025-02-06 01:24:44 - ERROR - stderr - 76%|███████▌ | 17004/22434 [15:17:03<3:51:55, 2.56s/it] +2025-02-06 01:24:46 - ERROR - 
stderr - 76%|███████▌ | 17005/22434 [15:17:06<3:52:44, 2.57s/it] +2025-02-06 01:24:46 - ERROR - stderr - +2025-02-06 01:24:46 - ERROR - stderr - +2025-02-06 01:24:46 - INFO - stdout - {'loss': 0.3834, 'grad_norm': 1.464336633682251, 'learning_rate': 2.917711396617725e-06, 'epoch': 2.27} +2025-02-06 01:24:46 - ERROR - stderr - 76%|███████▌ | 17005/22434 [15:17:06<3:52:44, 2.57s/it] +2025-02-06 01:24:49 - ERROR - stderr - 76%|███████▌ | 17006/22434 [15:17:08<3:49:38, 2.54s/it] +2025-02-06 01:24:49 - ERROR - stderr - +2025-02-06 01:24:49 - ERROR - stderr - +2025-02-06 01:24:49 - INFO - stdout - {'loss': 0.3705, 'grad_norm': 1.6059752702713013, 'learning_rate': 2.916692209977743e-06, 'epoch': 2.27} +2025-02-06 01:24:49 - ERROR - stderr - 76%|███████▌ | 17006/22434 [15:17:09<3:49:38, 2.54s/it] +2025-02-06 01:24:51 - ERROR - stderr - 76%|███████▌ | 17007/22434 [15:17:11<3:48:25, 2.53s/it] +2025-02-06 01:24:51 - ERROR - stderr - +2025-02-06 01:24:51 - ERROR - stderr - +2025-02-06 01:24:51 - INFO - stdout - {'loss': 0.3777, 'grad_norm': 1.513388752937317, 'learning_rate': 2.91567317098254e-06, 'epoch': 2.27} +2025-02-06 01:24:51 - ERROR - stderr - 76%|███████▌ | 17007/22434 [15:17:11<3:48:25, 2.53s/it] +2025-02-06 01:24:54 - ERROR - stderr - 76%|███████▌ | 17008/22434 [15:17:13<3:47:44, 2.52s/it] +2025-02-06 01:24:54 - ERROR - stderr - +2025-02-06 01:24:54 - ERROR - stderr - +2025-02-06 01:24:54 - INFO - stdout - {'loss': 0.3404, 'grad_norm': 1.5431640148162842, 'learning_rate': 2.9146542796533484e-06, 'epoch': 2.27} +2025-02-06 01:24:54 - ERROR - stderr - 76%|███████▌ | 17008/22434 [15:17:14<3:47:44, 2.52s/it] +2025-02-06 01:24:56 - ERROR - stderr - 76%|███████▌ | 17009/22434 [15:17:16<3:50:32, 2.55s/it] +2025-02-06 01:24:56 - ERROR - stderr - +2025-02-06 01:24:56 - ERROR - stderr - +2025-02-06 01:24:56 - INFO - stdout - {'loss': 0.4268, 'grad_norm': 1.6186972856521606, 'learning_rate': 2.9136355360114045e-06, 'epoch': 2.27} +2025-02-06 01:24:56 - ERROR - stderr - 
76%|███████▌ | 17009/22434 [15:17:16<3:50:32, 2.55s/it] +2025-02-06 01:24:59 - ERROR - stderr - 76%|███████▌ | 17010/22434 [15:17:19<3:48:06, 2.52s/it] +2025-02-06 01:24:59 - ERROR - stderr - +2025-02-06 01:24:59 - ERROR - stderr - +2025-02-06 01:24:59 - INFO - stdout - {'loss': 0.3735, 'grad_norm': 1.4681708812713623, 'learning_rate': 2.9126169400779536e-06, 'epoch': 2.27} +2025-02-06 01:24:59 - ERROR - stderr - 76%|███████▌ | 17010/22434 [15:17:19<3:48:06, 2.52s/it] +2025-02-06 01:25:01 - ERROR - stderr - 76%|███████▌ | 17011/22434 [15:17:21<3:48:16, 2.53s/it] +2025-02-06 01:25:01 - ERROR - stderr - +2025-02-06 01:25:01 - ERROR - stderr - +2025-02-06 01:25:01 - INFO - stdout - {'loss': 0.3917, 'grad_norm': 1.64728844165802, 'learning_rate': 2.9115984918742167e-06, 'epoch': 2.27} +2025-02-06 01:25:01 - ERROR - stderr - 76%|███████▌ | 17011/22434 [15:17:21<3:48:16, 2.53s/it] +2025-02-06 01:25:04 - ERROR - stderr - 76%|███████▌ | 17012/22434 [15:17:24<3:48:10, 2.52s/it] +2025-02-06 01:25:04 - ERROR - stderr - +2025-02-06 01:25:04 - ERROR - stderr - +2025-02-06 01:25:04 - INFO - stdout - {'loss': 0.4061, 'grad_norm': 1.5776786804199219, 'learning_rate': 2.9105801914214272e-06, 'epoch': 2.27} +2025-02-06 01:25:04 - ERROR - stderr - 76%|███████▌ | 17012/22434 [15:17:24<3:48:10, 2.52s/it] +2025-02-06 01:25:06 - ERROR - stderr - 76%|███████▌ | 17013/22434 [15:17:26<3:47:32, 2.52s/it] +2025-02-06 01:25:06 - ERROR - stderr - +2025-02-06 01:25:06 - ERROR - stderr - +2025-02-06 01:25:06 - INFO - stdout - {'loss': 0.4438, 'grad_norm': 1.7696211338043213, 'learning_rate': 2.9095620387408097e-06, 'epoch': 2.28} +2025-02-06 01:25:06 - ERROR - stderr - 76%|███████▌ | 17013/22434 [15:17:26<3:47:32, 2.52s/it] +2025-02-06 01:25:09 - ERROR - stderr - 76%|███████▌ | 17014/22434 [15:17:29<3:47:21, 2.52s/it] +2025-02-06 01:25:09 - ERROR - stderr - +2025-02-06 01:25:09 - ERROR - stderr - +2025-02-06 01:25:09 - INFO - stdout - {'loss': 0.3329, 'grad_norm': 1.3879187107086182, 
'learning_rate': 2.9085440338535866e-06, 'epoch': 2.28} +2025-02-06 01:25:09 - ERROR - stderr - 76%|███████▌ | 17014/22434 [15:17:29<3:47:21, 2.52s/it] +2025-02-06 01:25:12 - ERROR - stderr - 76%|███████▌ | 17015/22434 [15:17:31<3:54:39, 2.60s/it] +2025-02-06 01:25:12 - ERROR - stderr - +2025-02-06 01:25:12 - ERROR - stderr - +2025-02-06 01:25:12 - INFO - stdout - {'loss': 0.3148, 'grad_norm': 1.3968441486358643, 'learning_rate': 2.907526176780977e-06, 'epoch': 2.28} +2025-02-06 01:25:12 - ERROR - stderr - 76%|███████▌ | 17015/22434 [15:17:31<3:54:39, 2.60s/it] +2025-02-06 01:25:14 - ERROR - stderr - 76%|███████▌ | 17016/22434 [15:17:34<3:53:24, 2.58s/it] +2025-02-06 01:25:14 - ERROR - stderr - +2025-02-06 01:25:14 - ERROR - stderr - +2025-02-06 01:25:14 - INFO - stdout - {'loss': 0.3593, 'grad_norm': 1.6089133024215698, 'learning_rate': 2.906508467544198e-06, 'epoch': 2.28} +2025-02-06 01:25:14 - ERROR - stderr - 76%|███████▌ | 17016/22434 [15:17:34<3:53:24, 2.58s/it] +2025-02-06 01:25:17 - ERROR - stderr - 76%|███████▌ | 17017/22434 [15:17:36<3:50:50, 2.56s/it] +2025-02-06 01:25:17 - ERROR - stderr - +2025-02-06 01:25:17 - ERROR - stderr - +2025-02-06 01:25:17 - INFO - stdout - {'loss': 0.4011, 'grad_norm': 1.6451023817062378, 'learning_rate': 2.9054909061644623e-06, 'epoch': 2.28} +2025-02-06 01:25:17 - ERROR - stderr - 76%|███████▌ | 17017/22434 [15:17:37<3:50:50, 2.56s/it] +2025-02-06 01:25:19 - ERROR - stderr - 76%|███████▌ | 17018/22434 [15:17:39<3:47:57, 2.53s/it] +2025-02-06 01:25:19 - ERROR - stderr - +2025-02-06 01:25:19 - ERROR - stderr - +2025-02-06 01:25:19 - INFO - stdout - {'loss': 0.4077, 'grad_norm': 1.6566888093948364, 'learning_rate': 2.9044734926629793e-06, 'epoch': 2.28} +2025-02-06 01:25:19 - ERROR - stderr - 76%|███████▌ | 17018/22434 [15:17:39<3:47:57, 2.53s/it] +2025-02-06 01:25:22 - ERROR - stderr - 76%|███████▌ | 17019/22434 [15:17:41<3:48:36, 2.53s/it] +2025-02-06 01:25:22 - ERROR - stderr - +2025-02-06 01:25:22 - ERROR - stderr - 
+2025-02-06 01:25:22 - INFO - stdout - {'loss': 0.3473, 'grad_norm': 1.550437569618225, 'learning_rate': 2.9034562270609567e-06, 'epoch': 2.28} +2025-02-06 01:25:22 - ERROR - stderr - 76%|███████▌ | 17019/22434 [15:17:42<3:48:36, 2.53s/it] +2025-02-06 01:25:24 - ERROR - stderr - 76%|███████▌ | 17020/22434 [15:17:44<3:47:54, 2.53s/it] +2025-02-06 01:25:24 - ERROR - stderr - +2025-02-06 01:25:24 - ERROR - stderr - +2025-02-06 01:25:24 - INFO - stdout - {'loss': 0.3571, 'grad_norm': 1.5410364866256714, 'learning_rate': 2.902439109379599e-06, 'epoch': 2.28} +2025-02-06 01:25:24 - ERROR - stderr - 76%|███████▌ | 17020/22434 [15:17:44<3:47:54, 2.53s/it] +2025-02-06 01:25:27 - ERROR - stderr - 76%|███████▌ | 17021/22434 [15:17:46<3:46:47, 2.51s/it] +2025-02-06 01:25:27 - ERROR - stderr - +2025-02-06 01:25:27 - ERROR - stderr - +2025-02-06 01:25:27 - INFO - stdout - {'loss': 0.3645, 'grad_norm': 1.4992727041244507, 'learning_rate': 2.9014221396401064e-06, 'epoch': 2.28} +2025-02-06 01:25:27 - ERROR - stderr - 76%|███████▌ | 17021/22434 [15:17:47<3:46:47, 2.51s/it] +2025-02-06 01:25:29 - ERROR - stderr - 76%|███████▌ | 17022/22434 [15:17:49<3:45:35, 2.50s/it] +2025-02-06 01:25:29 - ERROR - stderr - +2025-02-06 01:25:29 - ERROR - stderr - +2025-02-06 01:25:29 - INFO - stdout - {'loss': 0.3658, 'grad_norm': 1.5268008708953857, 'learning_rate': 2.900405317863676e-06, 'epoch': 2.28} +2025-02-06 01:25:29 - ERROR - stderr - 76%|███████▌ | 17022/22434 [15:17:49<3:45:35, 2.50s/it] +2025-02-06 01:25:32 - ERROR - stderr - 76%|███████▌ | 17023/22434 [15:17:52<3:58:04, 2.64s/it] +2025-02-06 01:25:32 - ERROR - stderr - +2025-02-06 01:25:32 - ERROR - stderr - +2025-02-06 01:25:32 - INFO - stdout - {'loss': 0.3304, 'grad_norm': 1.3154749870300293, 'learning_rate': 2.8993886440715036e-06, 'epoch': 2.28} +2025-02-06 01:25:32 - ERROR - stderr - 76%|███████▌ | 17023/22434 [15:17:52<3:58:04, 2.64s/it] +2025-02-06 01:25:35 - ERROR - stderr - 76%|███████▌ | 17024/22434 [15:17:54<3:54:48, 
2.60s/it] +2025-02-06 01:25:35 - ERROR - stderr - +2025-02-06 01:25:35 - ERROR - stderr - +2025-02-06 01:25:35 - INFO - stdout - {'loss': 0.35, 'grad_norm': 1.2949903011322021, 'learning_rate': 2.8983721182847834e-06, 'epoch': 2.28} +2025-02-06 01:25:35 - ERROR - stderr - 76%|███████▌ | 17024/22434 [15:17:54<3:54:48, 2.60s/it] +2025-02-06 01:25:37 - ERROR - stderr - 76%|███████▌ | 17025/22434 [15:17:57<3:51:59, 2.57s/it] +2025-02-06 01:25:37 - ERROR - stderr - +2025-02-06 01:25:37 - ERROR - stderr - +2025-02-06 01:25:37 - INFO - stdout - {'loss': 0.4079, 'grad_norm': 1.5786867141723633, 'learning_rate': 2.8973557405246954e-06, 'epoch': 2.28} +2025-02-06 01:25:37 - ERROR - stderr - 76%|███████▌ | 17025/22434 [15:17:57<3:51:59, 2.57s/it] +2025-02-06 01:25:40 - ERROR - stderr - 76%|███████▌ | 17026/22434 [15:17:59<3:50:14, 2.55s/it] +2025-02-06 01:25:40 - ERROR - stderr - +2025-02-06 01:25:40 - ERROR - stderr - +2025-02-06 01:25:40 - INFO - stdout - {'loss': 0.3732, 'grad_norm': 1.5485320091247559, 'learning_rate': 2.896339510812436e-06, 'epoch': 2.28} +2025-02-06 01:25:40 - ERROR - stderr - 76%|███████▌ | 17026/22434 [15:17:59<3:50:14, 2.55s/it] +2025-02-06 01:25:42 - ERROR - stderr - 76%|███████▌ | 17027/22434 [15:18:02<3:52:24, 2.58s/it] +2025-02-06 01:25:42 - ERROR - stderr - +2025-02-06 01:25:42 - ERROR - stderr - +2025-02-06 01:25:42 - INFO - stdout - {'loss': 0.3409, 'grad_norm': 1.2919182777404785, 'learning_rate': 2.895323429169179e-06, 'epoch': 2.28} +2025-02-06 01:25:42 - ERROR - stderr - 76%|███████▌ | 17027/22434 [15:18:02<3:52:24, 2.58s/it] +2025-02-06 01:25:45 - ERROR - stderr - 76%|███████▌ | 17028/22434 [15:18:05<3:49:30, 2.55s/it] +2025-02-06 01:25:45 - ERROR - stderr - +2025-02-06 01:25:45 - ERROR - stderr - +2025-02-06 01:25:45 - INFO - stdout - {'loss': 0.4003, 'grad_norm': 1.6092054843902588, 'learning_rate': 2.894307495616103e-06, 'epoch': 2.28} +2025-02-06 01:25:45 - ERROR - stderr - 76%|███████▌ | 17028/22434 [15:18:05<3:49:30, 2.55s/it] 
+2025-02-06 01:25:47 - ERROR - stderr - 76%|███████▌ | 17029/22434 [15:18:07<3:53:07, 2.59s/it] +2025-02-06 01:25:47 - ERROR - stderr - +2025-02-06 01:25:47 - ERROR - stderr - +2025-02-06 01:25:48 - INFO - stdout - {'loss': 0.3444, 'grad_norm': 1.2670623064041138, 'learning_rate': 2.8932917101743953e-06, 'epoch': 2.28} +2025-02-06 01:25:48 - ERROR - stderr - 76%|███████▌ | 17029/22434 [15:18:07<3:53:07, 2.59s/it] +2025-02-06 01:25:50 - ERROR - stderr - 76%|███████▌ | 17030/22434 [15:18:10<4:03:11, 2.70s/it] +2025-02-06 01:25:50 - ERROR - stderr - +2025-02-06 01:25:50 - ERROR - stderr - +2025-02-06 01:25:50 - INFO - stdout - {'loss': 0.4051, 'grad_norm': 1.5616655349731445, 'learning_rate': 2.8922760728652144e-06, 'epoch': 2.28} +2025-02-06 01:25:50 - ERROR - stderr - 76%|███████▌ | 17030/22434 [15:18:10<4:03:11, 2.70s/it] +2025-02-06 01:25:53 - ERROR - stderr - 76%|███████▌ | 17031/22434 [15:18:13<3:56:19, 2.62s/it] +2025-02-06 01:25:53 - ERROR - stderr - +2025-02-06 01:25:53 - ERROR - stderr - +2025-02-06 01:25:53 - INFO - stdout - {'loss': 0.3411, 'grad_norm': 1.3748559951782227, 'learning_rate': 2.891260583709744e-06, 'epoch': 2.28} +2025-02-06 01:25:53 - ERROR - stderr - 76%|███████▌ | 17031/22434 [15:18:13<3:56:19, 2.62s/it] +2025-02-06 01:25:55 - ERROR - stderr - 76%|███████▌ | 17032/22434 [15:18:15<3:53:41, 2.60s/it] +2025-02-06 01:25:55 - ERROR - stderr - +2025-02-06 01:25:55 - ERROR - stderr - +2025-02-06 01:25:55 - INFO - stdout - {'loss': 0.3718, 'grad_norm': 1.5091861486434937, 'learning_rate': 2.8902452427291407e-06, 'epoch': 2.28} +2025-02-06 01:25:55 - ERROR - stderr - 76%|███████▌ | 17032/22434 [15:18:15<3:53:41, 2.60s/it] +2025-02-06 01:25:58 - ERROR - stderr - 76%|███████▌ | 17033/22434 [15:18:18<3:51:40, 2.57s/it] +2025-02-06 01:25:58 - ERROR - stderr - +2025-02-06 01:25:58 - ERROR - stderr - +2025-02-06 01:25:58 - INFO - stdout - {'loss': 0.3736, 'grad_norm': 1.4395629167556763, 'learning_rate': 2.8892300499445725e-06, 'epoch': 2.28} +2025-02-06 
01:25:58 - ERROR - stderr - 76%|███████▌ | 17033/22434 [15:18:18<3:51:40, 2.57s/it] +2025-02-06 01:26:00 - ERROR - stderr - 76%|███████▌ | 17034/22434 [15:18:20<3:49:34, 2.55s/it] +2025-02-06 01:26:00 - ERROR - stderr - +2025-02-06 01:26:00 - ERROR - stderr - +2025-02-06 01:26:00 - INFO - stdout - {'loss': 0.4225, 'grad_norm': 1.3729289770126343, 'learning_rate': 2.8882150053771997e-06, 'epoch': 2.28} +2025-02-06 01:26:00 - ERROR - stderr - 76%|███████▌ | 17034/22434 [15:18:20<3:49:34, 2.55s/it] +2025-02-06 01:26:03 - ERROR - stderr - 76%|███████▌ | 17035/22434 [15:18:23<3:54:06, 2.60s/it] +2025-02-06 01:26:03 - ERROR - stderr - +2025-02-06 01:26:03 - ERROR - stderr - +2025-02-06 01:26:03 - INFO - stdout - {'loss': 0.3819, 'grad_norm': 1.5264701843261719, 'learning_rate': 2.8872001090481804e-06, 'epoch': 2.28} +2025-02-06 01:26:03 - ERROR - stderr - 76%|███████▌ | 17035/22434 [15:18:23<3:54:06, 2.60s/it] +2025-02-06 01:26:06 - ERROR - stderr - 76%|███████▌ | 17036/22434 [15:18:25<3:49:24, 2.55s/it] +2025-02-06 01:26:06 - ERROR - stderr - +2025-02-06 01:26:06 - ERROR - stderr - +2025-02-06 01:26:06 - INFO - stdout - {'loss': 0.3819, 'grad_norm': 1.4509196281433105, 'learning_rate': 2.886185360978667e-06, 'epoch': 2.28} +2025-02-06 01:26:06 - ERROR - stderr - 76%|███████▌ | 17036/22434 [15:18:25<3:49:24, 2.55s/it] +2025-02-06 01:26:08 - ERROR - stderr - 76%|███████▌ | 17037/22434 [15:18:28<3:49:38, 2.55s/it] +2025-02-06 01:26:08 - ERROR - stderr - +2025-02-06 01:26:08 - ERROR - stderr - +2025-02-06 01:26:08 - INFO - stdout - {'loss': 0.3386, 'grad_norm': 1.5283418893814087, 'learning_rate': 2.8851707611898138e-06, 'epoch': 2.28} +2025-02-06 01:26:08 - ERROR - stderr - 76%|███████▌ | 17037/22434 [15:18:28<3:49:38, 2.55s/it] +2025-02-06 01:26:11 - ERROR - stderr - 76%|███████▌ | 17038/22434 [15:18:30<3:48:45, 2.54s/it] +2025-02-06 01:26:11 - ERROR - stderr - +2025-02-06 01:26:11 - ERROR - stderr - +2025-02-06 01:26:11 - INFO - stdout - {'loss': 0.3907, 'grad_norm': 
1.734653353691101, 'learning_rate': 2.884156309702768e-06, 'epoch': 2.28} +2025-02-06 01:26:11 - ERROR - stderr - 76%|███████▌ | 17038/22434 [15:18:30<3:48:45, 2.54s/it] +2025-02-06 01:26:13 - ERROR - stderr - 76%|███████▌ | 17039/22434 [15:18:33<3:48:08, 2.54s/it] +2025-02-06 01:26:13 - ERROR - stderr - +2025-02-06 01:26:13 - ERROR - stderr - +2025-02-06 01:26:13 - INFO - stdout - {'loss': 0.3685, 'grad_norm': 1.8386884927749634, 'learning_rate': 2.883142006538675e-06, 'epoch': 2.28} +2025-02-06 01:26:13 - ERROR - stderr - 76%|███████▌ | 17039/22434 [15:18:33<3:48:08, 2.54s/it] +2025-02-06 01:26:16 - ERROR - stderr - 76%|███████▌ | 17040/22434 [15:18:35<3:48:07, 2.54s/it] +2025-02-06 01:26:16 - ERROR - stderr - +2025-02-06 01:26:16 - ERROR - stderr - +2025-02-06 01:26:16 - INFO - stdout - {'loss': 0.3512, 'grad_norm': 1.3719942569732666, 'learning_rate': 2.8821278517186755e-06, 'epoch': 2.28} +2025-02-06 01:26:16 - ERROR - stderr - 76%|███████▌ | 17040/22434 [15:18:36<3:48:07, 2.54s/it] +2025-02-06 01:26:18 - ERROR - stderr - 76%|███████▌ | 17041/22434 [15:18:38<3:47:59, 2.54s/it] +2025-02-06 01:26:18 - ERROR - stderr - +2025-02-06 01:26:18 - ERROR - stderr - +2025-02-06 01:26:18 - INFO - stdout - {'loss': 0.3388, 'grad_norm': 1.417429804801941, 'learning_rate': 2.881113845263911e-06, 'epoch': 2.28} +2025-02-06 01:26:18 - ERROR - stderr - 76%|███████▌ | 17041/22434 [15:18:38<3:47:59, 2.54s/it] +2025-02-06 01:26:21 - ERROR - stderr - 76%|███████▌ | 17042/22434 [15:18:41<3:47:58, 2.54s/it] +2025-02-06 01:26:21 - ERROR - stderr - +2025-02-06 01:26:21 - ERROR - stderr - +2025-02-06 01:26:21 - INFO - stdout - {'loss': 0.3837, 'grad_norm': 1.5438954830169678, 'learning_rate': 2.880099987195516e-06, 'epoch': 2.28} +2025-02-06 01:26:21 - ERROR - stderr - 76%|███████▌ | 17042/22434 [15:18:41<3:47:58, 2.54s/it] +2025-02-06 01:26:23 - ERROR - stderr - 76%|███████▌ | 17043/22434 [15:18:43<3:47:09, 2.53s/it] +2025-02-06 01:26:23 - ERROR - stderr - +2025-02-06 01:26:23 - ERROR 
- stderr - +2025-02-06 01:26:23 - INFO - stdout - {'loss': 0.3461, 'grad_norm': 1.5103967189788818, 'learning_rate': 2.8790862775346275e-06, 'epoch': 2.28} +2025-02-06 01:26:23 - ERROR - stderr - 76%|███████▌ | 17043/22434 [15:18:43<3:47:09, 2.53s/it] +2025-02-06 01:26:26 - ERROR - stderr - 76%|███████▌ | 17044/22434 [15:18:46<3:47:06, 2.53s/it] +2025-02-06 01:26:26 - ERROR - stderr - +2025-02-06 01:26:26 - ERROR - stderr - +2025-02-06 01:26:26 - INFO - stdout - {'loss': 0.3629, 'grad_norm': 1.492672324180603, 'learning_rate': 2.878072716302364e-06, 'epoch': 2.28} +2025-02-06 01:26:26 - ERROR - stderr - 76%|███████▌ | 17044/22434 [15:18:46<3:47:06, 2.53s/it] +2025-02-06 01:26:28 - ERROR - stderr - 76%|███████▌ | 17045/22434 [15:18:48<3:44:11, 2.50s/it] +2025-02-06 01:26:28 - ERROR - stderr - +2025-02-06 01:26:28 - ERROR - stderr - +2025-02-06 01:26:28 - INFO - stdout - {'loss': 0.3211, 'grad_norm': 1.3144148588180542, 'learning_rate': 2.8770593035198667e-06, 'epoch': 2.28} +2025-02-06 01:26:28 - ERROR - stderr - 76%|███████▌ | 17045/22434 [15:18:48<3:44:11, 2.50s/it] +2025-02-06 01:26:31 - ERROR - stderr - 76%|███████▌ | 17046/22434 [15:18:51<3:45:57, 2.52s/it] +2025-02-06 01:26:31 - ERROR - stderr - +2025-02-06 01:26:31 - ERROR - stderr - +2025-02-06 01:26:31 - INFO - stdout - {'loss': 0.3748, 'grad_norm': 1.648748755455017, 'learning_rate': 2.8760460392082468e-06, 'epoch': 2.28} +2025-02-06 01:26:31 - ERROR - stderr - 76%|███████▌ | 17046/22434 [15:18:51<3:45:57, 2.52s/it] +2025-02-06 01:26:33 - ERROR - stderr - 76%|███████▌ | 17047/22434 [15:18:53<3:45:51, 2.52s/it] +2025-02-06 01:26:33 - ERROR - stderr - +2025-02-06 01:26:33 - ERROR - stderr - +2025-02-06 01:26:33 - INFO - stdout - {'loss': 0.3879, 'grad_norm': 1.631947636604309, 'learning_rate': 2.875032923388632e-06, 'epoch': 2.28} +2025-02-06 01:26:33 - ERROR - stderr - 76%|███████▌ | 17047/22434 [15:18:53<3:45:51, 2.52s/it] +2025-02-06 01:26:36 - ERROR - stderr - 76%|███████▌ | 17048/22434 
[15:18:56<3:44:39, 2.50s/it] +2025-02-06 01:26:36 - ERROR - stderr - +2025-02-06 01:26:36 - ERROR - stderr - +2025-02-06 01:26:36 - INFO - stdout - {'loss': 0.3436, 'grad_norm': 1.4238033294677734, 'learning_rate': 2.8740199560821426e-06, 'epoch': 2.28} +2025-02-06 01:26:36 - ERROR - stderr - 76%|███████▌ | 17048/22434 [15:18:56<3:44:39, 2.50s/it] +2025-02-06 01:26:38 - ERROR - stderr - 76%|███████▌ | 17049/22434 [15:18:58<3:43:09, 2.49s/it] +2025-02-06 01:26:38 - ERROR - stderr - +2025-02-06 01:26:38 - ERROR - stderr - +2025-02-06 01:26:38 - INFO - stdout - {'loss': 0.3528, 'grad_norm': 1.3393027782440186, 'learning_rate': 2.8730071373098813e-06, 'epoch': 2.28} +2025-02-06 01:26:38 - ERROR - stderr - 76%|███████▌ | 17049/22434 [15:18:58<3:43:09, 2.49s/it] +2025-02-06 01:26:41 - ERROR - stderr - 76%|███████▌ | 17050/22434 [15:19:00<3:43:24, 2.49s/it] +2025-02-06 01:26:41 - ERROR - stderr - +2025-02-06 01:26:41 - ERROR - stderr - +2025-02-06 01:26:41 - INFO - stdout - {'loss': 0.3564, 'grad_norm': 1.377915620803833, 'learning_rate': 2.871994467092972e-06, 'epoch': 2.28} +2025-02-06 01:26:41 - ERROR - stderr - 76%|███████▌ | 17050/22434 [15:19:01<3:43:24, 2.49s/it] +2025-02-06 01:26:43 - ERROR - stderr - 76%|███████▌ | 17051/22434 [15:19:03<3:48:46, 2.55s/it] +2025-02-06 01:26:43 - ERROR - stderr - +2025-02-06 01:26:43 - ERROR - stderr - +2025-02-06 01:26:43 - INFO - stdout - {'loss': 0.352, 'grad_norm': 1.3234686851501465, 'learning_rate': 2.8709819454525157e-06, 'epoch': 2.28} +2025-02-06 01:26:43 - ERROR - stderr - 76%|███████▌ | 17051/22434 [15:19:03<3:48:46, 2.55s/it] +2025-02-06 01:26:46 - ERROR - stderr - 76%|███████▌ | 17052/22434 [15:19:06<3:46:40, 2.53s/it] +2025-02-06 01:26:46 - ERROR - stderr - +2025-02-06 01:26:46 - ERROR - stderr - +2025-02-06 01:26:46 - INFO - stdout - {'loss': 0.3543, 'grad_norm': 1.5339642763137817, 'learning_rate': 2.8699695724096177e-06, 'epoch': 2.28} +2025-02-06 01:26:46 - ERROR - stderr - 76%|███████▌ | 17052/22434 
[15:19:06<3:46:40, 2.53s/it] +2025-02-06 01:26:48 - ERROR - stderr - 76%|███████▌ | 17053/22434 [15:19:08<3:47:23, 2.54s/it] +2025-02-06 01:26:48 - ERROR - stderr - +2025-02-06 01:26:48 - ERROR - stderr - +2025-02-06 01:26:48 - INFO - stdout - {'loss': 0.3846, 'grad_norm': 1.5116121768951416, 'learning_rate': 2.8689573479853826e-06, 'epoch': 2.28} +2025-02-06 01:26:48 - ERROR - stderr - 76%|███████▌ | 17053/22434 [15:19:08<3:47:23, 2.54s/it] +2025-02-06 01:26:51 - ERROR - stderr - 76%|███████▌ | 17054/22434 [15:19:11<3:45:12, 2.51s/it] +2025-02-06 01:26:51 - ERROR - stderr - +2025-02-06 01:26:51 - ERROR - stderr - +2025-02-06 01:26:51 - INFO - stdout - {'loss': 0.333, 'grad_norm': 1.4378588199615479, 'learning_rate': 2.867945272200904e-06, 'epoch': 2.28} +2025-02-06 01:26:51 - ERROR - stderr - 76%|███████▌ | 17054/22434 [15:19:11<3:45:12, 2.51s/it] +2025-02-06 01:26:53 - ERROR - stderr - 76%|███████▌ | 17055/22434 [15:19:13<3:45:52, 2.52s/it] +2025-02-06 01:26:53 - ERROR - stderr - +2025-02-06 01:26:53 - ERROR - stderr - +2025-02-06 01:26:53 - INFO - stdout - {'loss': 0.3411, 'grad_norm': 1.3883188962936401, 'learning_rate': 2.8669333450772873e-06, 'epoch': 2.28} +2025-02-06 01:26:53 - ERROR - stderr - 76%|███████▌ | 17055/22434 [15:19:13<3:45:52, 2.52s/it] +2025-02-06 01:26:56 - ERROR - stderr - 76%|███████▌ | 17056/22434 [15:19:16<3:43:30, 2.49s/it] +2025-02-06 01:26:56 - ERROR - stderr - +2025-02-06 01:26:56 - ERROR - stderr - +2025-02-06 01:26:56 - INFO - stdout - {'loss': 0.329, 'grad_norm': 1.4780216217041016, 'learning_rate': 2.865921566635618e-06, 'epoch': 2.28} +2025-02-06 01:26:56 - ERROR - stderr - 76%|███████▌ | 17056/22434 [15:19:16<3:43:30, 2.49s/it] +2025-02-06 01:26:59 - ERROR - stderr - 76%|███████▌ | 17057/22434 [15:19:18<3:47:18, 2.54s/it] +2025-02-06 01:26:59 - ERROR - stderr - +2025-02-06 01:26:59 - ERROR - stderr - +2025-02-06 01:26:59 - INFO - stdout - {'loss': 0.3305, 'grad_norm': 1.2886308431625366, 'learning_rate': 2.864909936896986e-06, 
'epoch': 2.28} +2025-02-06 01:26:59 - ERROR - stderr - 76%|███████▌ | 17057/22434 [15:19:18<3:47:18, 2.54s/it] +2025-02-06 01:27:01 - ERROR - stderr - 76%|███████▌ | 17058/22434 [15:19:21<3:52:57, 2.60s/it] +2025-02-06 01:27:01 - ERROR - stderr - +2025-02-06 01:27:01 - ERROR - stderr - +2025-02-06 01:27:01 - INFO - stdout - {'loss': 0.4124, 'grad_norm': 1.562170147895813, 'learning_rate': 2.8638984558824777e-06, 'epoch': 2.28} +2025-02-06 01:27:01 - ERROR - stderr - 76%|███████▌ | 17058/22434 [15:19:21<3:52:57, 2.60s/it] +2025-02-06 01:27:04 - ERROR - stderr - 76%|███████▌ | 17059/22434 [15:19:24<3:50:21, 2.57s/it] +2025-02-06 01:27:04 - ERROR - stderr - +2025-02-06 01:27:04 - ERROR - stderr - +2025-02-06 01:27:04 - INFO - stdout - {'loss': 0.366, 'grad_norm': 1.4724701642990112, 'learning_rate': 2.8628871236131796e-06, 'epoch': 2.28} +2025-02-06 01:27:04 - ERROR - stderr - 76%|███████▌ | 17059/22434 [15:19:24<3:50:21, 2.57s/it] +2025-02-06 01:27:06 - ERROR - stderr - 76%|███████▌ | 17060/22434 [15:19:26<3:50:07, 2.57s/it] +2025-02-06 01:27:06 - ERROR - stderr - +2025-02-06 01:27:06 - ERROR - stderr - +2025-02-06 01:27:06 - INFO - stdout - {'loss': 0.347, 'grad_norm': 1.5228806734085083, 'learning_rate': 2.861875940110168e-06, 'epoch': 2.28} +2025-02-06 01:27:06 - ERROR - stderr - 76%|███████▌ | 17060/22434 [15:19:26<3:50:07, 2.57s/it] +2025-02-06 01:27:09 - ERROR - stderr - 76%|███████▌ | 17061/22434 [15:19:29<3:51:51, 2.59s/it] +2025-02-06 01:27:09 - ERROR - stderr - +2025-02-06 01:27:09 - ERROR - stderr - +2025-02-06 01:27:09 - INFO - stdout - {'loss': 0.3733, 'grad_norm': 1.6004219055175781, 'learning_rate': 2.8608649053945235e-06, 'epoch': 2.28} +2025-02-06 01:27:09 - ERROR - stderr - 76%|███████▌ | 17061/22434 [15:19:29<3:51:51, 2.59s/it] +2025-02-06 01:27:11 - ERROR - stderr - 76%|███████▌ | 17062/22434 [15:19:31<3:49:32, 2.56s/it] +2025-02-06 01:27:12 - ERROR - stderr - +2025-02-06 01:27:12 - ERROR - stderr - +2025-02-06 01:27:12 - INFO - stdout - {'loss': 
0.3876, 'grad_norm': 1.432561993598938, 'learning_rate': 2.859854019487318e-06, 'epoch': 2.28} +2025-02-06 01:27:12 - ERROR - stderr - 76%|███████▌ | 17062/22434 [15:19:31<3:49:32, 2.56s/it] +2025-02-06 01:27:14 - ERROR - stderr - 76%|███████▌ | 17063/22434 [15:19:34<3:48:32, 2.55s/it] +2025-02-06 01:27:14 - ERROR - stderr - +2025-02-06 01:27:14 - ERROR - stderr - +2025-02-06 01:27:14 - INFO - stdout - {'loss': 0.3816, 'grad_norm': 1.5826714038848877, 'learning_rate': 2.8588432824096236e-06, 'epoch': 2.28} +2025-02-06 01:27:14 - ERROR - stderr - 76%|███████▌ | 17063/22434 [15:19:34<3:48:32, 2.55s/it] +2025-02-06 01:27:17 - ERROR - stderr - 76%|███████▌ | 17064/22434 [15:19:36<3:49:12, 2.56s/it] +2025-02-06 01:27:17 - ERROR - stderr - +2025-02-06 01:27:17 - ERROR - stderr - +2025-02-06 01:27:17 - INFO - stdout - {'loss': 0.4108, 'grad_norm': 1.628354549407959, 'learning_rate': 2.8578326941825074e-06, 'epoch': 2.28} +2025-02-06 01:27:17 - ERROR - stderr - 76%|███████▌ | 17064/22434 [15:19:36<3:49:12, 2.56s/it] +2025-02-06 01:27:19 - ERROR - stderr - 76%|███████▌ | 17065/22434 [15:19:39<3:48:45, 2.56s/it] +2025-02-06 01:27:19 - ERROR - stderr - +2025-02-06 01:27:19 - ERROR - stderr - +2025-02-06 01:27:19 - INFO - stdout - {'loss': 0.3424, 'grad_norm': 1.4790418148040771, 'learning_rate': 2.856822254827034e-06, 'epoch': 2.28} +2025-02-06 01:27:19 - ERROR - stderr - 76%|███████▌ | 17065/22434 [15:19:39<3:48:45, 2.56s/it] +2025-02-06 01:27:22 - ERROR - stderr - 76%|███████▌ | 17066/22434 [15:19:41<3:46:43, 2.53s/it] +2025-02-06 01:27:22 - ERROR - stderr - +2025-02-06 01:27:22 - ERROR - stderr - +2025-02-06 01:27:22 - INFO - stdout - {'loss': 0.3788, 'grad_norm': 1.4843205213546753, 'learning_rate': 2.8558119643642657e-06, 'epoch': 2.28} +2025-02-06 01:27:22 - ERROR - stderr - 76%|███████▌ | 17066/22434 [15:19:41<3:46:43, 2.53s/it] +2025-02-06 01:27:24 - ERROR - stderr - 76%|███████▌ | 17067/22434 [15:19:44<3:46:24, 2.53s/it] +2025-02-06 01:27:24 - ERROR - stderr - 
+2025-02-06 01:27:24 - ERROR - stderr - +2025-02-06 01:27:24 - INFO - stdout - {'loss': 0.408, 'grad_norm': 1.7294301986694336, 'learning_rate': 2.854801822815263e-06, 'epoch': 2.28} +2025-02-06 01:27:24 - ERROR - stderr - 76%|███████▌ | 17067/22434 [15:19:44<3:46:24, 2.53s/it] +2025-02-06 01:27:27 - ERROR - stderr - 76%|███████▌ | 17068/22434 [15:19:46<3:45:30, 2.52s/it] +2025-02-06 01:27:27 - ERROR - stderr - +2025-02-06 01:27:27 - ERROR - stderr - +2025-02-06 01:27:27 - INFO - stdout - {'loss': 0.3524, 'grad_norm': 1.4788068532943726, 'learning_rate': 2.8537918302010737e-06, 'epoch': 2.28} +2025-02-06 01:27:27 - ERROR - stderr - 76%|███████▌ | 17068/22434 [15:19:46<3:45:30, 2.52s/it] +2025-02-06 01:27:29 - ERROR - stderr - 76%|███████▌ | 17069/22434 [15:19:49<3:46:20, 2.53s/it] +2025-02-06 01:27:29 - ERROR - stderr - +2025-02-06 01:27:29 - ERROR - stderr - +2025-02-06 01:27:29 - INFO - stdout - {'loss': 0.4045, 'grad_norm': 1.5772935152053833, 'learning_rate': 2.852781986542762e-06, 'epoch': 2.28} +2025-02-06 01:27:29 - ERROR - stderr - 76%|███████▌ | 17069/22434 [15:19:49<3:46:20, 2.53s/it] +2025-02-06 01:27:32 - ERROR - stderr - 76%|███████▌ | 17070/22434 [15:19:51<3:46:47, 2.54s/it] +2025-02-06 01:27:32 - ERROR - stderr - +2025-02-06 01:27:32 - ERROR - stderr - +2025-02-06 01:27:32 - INFO - stdout - {'loss': 0.3663, 'grad_norm': 1.4873038530349731, 'learning_rate': 2.8517722918613642e-06, 'epoch': 2.28} +2025-02-06 01:27:32 - ERROR - stderr - 76%|███████▌ | 17070/22434 [15:19:52<3:46:47, 2.54s/it] +2025-02-06 01:27:34 - ERROR - stderr - 76%|███████▌ | 17071/22434 [15:19:54<3:45:26, 2.52s/it] +2025-02-06 01:27:34 - ERROR - stderr - +2025-02-06 01:27:34 - ERROR - stderr - +2025-02-06 01:27:34 - INFO - stdout - {'loss': 0.3662, 'grad_norm': 1.5206979513168335, 'learning_rate': 2.8507627461779384e-06, 'epoch': 2.28} +2025-02-06 01:27:34 - ERROR - stderr - 76%|███████▌ | 17071/22434 [15:19:54<3:45:26, 2.52s/it] +2025-02-06 01:27:37 - ERROR - stderr - 76%|███████▌ 
| 17072/22434 [15:19:57<3:45:41, 2.53s/it] +2025-02-06 01:27:37 - ERROR - stderr - +2025-02-06 01:27:37 - ERROR - stderr - +2025-02-06 01:27:37 - INFO - stdout - {'loss': 0.3631, 'grad_norm': 1.6808936595916748, 'learning_rate': 2.84975334951352e-06, 'epoch': 2.28} +2025-02-06 01:27:37 - ERROR - stderr - 76%|███████▌ | 17072/22434 [15:19:57<3:45:41, 2.53s/it] +2025-02-06 01:27:39 - ERROR - stderr - 76%|███████▌ | 17073/22434 [15:19:59<3:44:27, 2.51s/it] +2025-02-06 01:27:39 - ERROR - stderr - +2025-02-06 01:27:39 - ERROR - stderr - +2025-02-06 01:27:39 - INFO - stdout - {'loss': 0.3714, 'grad_norm': 1.3503527641296387, 'learning_rate': 2.848744101889148e-06, 'epoch': 2.28} +2025-02-06 01:27:39 - ERROR - stderr - 76%|███████▌ | 17073/22434 [15:19:59<3:44:27, 2.51s/it] +2025-02-06 01:27:42 - ERROR - stderr - 76%|███████▌ | 17074/22434 [15:20:02<3:47:58, 2.55s/it] +2025-02-06 01:27:42 - ERROR - stderr - +2025-02-06 01:27:42 - ERROR - stderr - +2025-02-06 01:27:42 - INFO - stdout - {'loss': 0.3137, 'grad_norm': 1.3562567234039307, 'learning_rate': 2.847735003325868e-06, 'epoch': 2.28} +2025-02-06 01:27:42 - ERROR - stderr - 76%|███████▌ | 17074/22434 [15:20:02<3:47:58, 2.55s/it] +2025-02-06 01:27:44 - ERROR - stderr - 76%|███████▌ | 17075/22434 [15:20:04<3:44:40, 2.52s/it] +2025-02-06 01:27:44 - ERROR - stderr - +2025-02-06 01:27:44 - ERROR - stderr - +2025-02-06 01:27:44 - INFO - stdout - {'loss': 0.3151, 'grad_norm': 1.439693570137024, 'learning_rate': 2.8467260538447038e-06, 'epoch': 2.28} +2025-02-06 01:27:44 - ERROR - stderr - 76%|███████▌ | 17075/22434 [15:20:04<3:44:40, 2.52s/it] +2025-02-06 01:27:47 - ERROR - stderr - 76%|███████▌ | 17076/22434 [15:20:07<3:46:07, 2.53s/it] +2025-02-06 01:27:47 - ERROR - stderr - +2025-02-06 01:27:47 - ERROR - stderr - +2025-02-06 01:27:47 - INFO - stdout - {'loss': 0.4178, 'grad_norm': 1.7823055982589722, 'learning_rate': 2.845717253466691e-06, 'epoch': 2.28} +2025-02-06 01:27:47 - ERROR - stderr - 76%|███████▌ | 17076/22434 
[15:20:07<3:46:07, 2.53s/it] +2025-02-06 01:27:49 - ERROR - stderr - 76%|███████▌ | 17077/22434 [15:20:09<3:45:09, 2.52s/it] +2025-02-06 01:27:49 - ERROR - stderr - +2025-02-06 01:27:49 - ERROR - stderr - +2025-02-06 01:27:49 - INFO - stdout - {'loss': 0.4156, 'grad_norm': 1.6346759796142578, 'learning_rate': 2.8447086022128565e-06, 'epoch': 2.28} +2025-02-06 01:27:49 - ERROR - stderr - 76%|███████▌ | 17077/22434 [15:20:09<3:45:09, 2.52s/it] +2025-02-06 01:27:52 - ERROR - stderr - 76%|███████▌ | 17078/22434 [15:20:12<3:43:33, 2.50s/it] +2025-02-06 01:27:52 - ERROR - stderr - +2025-02-06 01:27:52 - ERROR - stderr - +2025-02-06 01:27:52 - INFO - stdout - {'loss': 0.3355, 'grad_norm': 1.2736523151397705, 'learning_rate': 2.8437001001042244e-06, 'epoch': 2.28} +2025-02-06 01:27:52 - ERROR - stderr - 76%|███████▌ | 17078/22434 [15:20:12<3:43:33, 2.50s/it] +2025-02-06 01:27:54 - ERROR - stderr - 76%|███████▌ | 17079/22434 [15:20:14<3:41:52, 2.49s/it] +2025-02-06 01:27:54 - ERROR - stderr - +2025-02-06 01:27:54 - ERROR - stderr - +2025-02-06 01:27:54 - INFO - stdout - {'loss': 0.4395, 'grad_norm': 1.707222819328308, 'learning_rate': 2.8426917471618144e-06, 'epoch': 2.28} +2025-02-06 01:27:54 - ERROR - stderr - 76%|███████▌ | 17079/22434 [15:20:14<3:41:52, 2.49s/it] +2025-02-06 01:27:57 - ERROR - stderr - 76%|███████▌ | 17080/22434 [15:20:16<3:40:02, 2.47s/it] +2025-02-06 01:27:57 - ERROR - stderr - +2025-02-06 01:27:57 - ERROR - stderr - +2025-02-06 01:27:57 - INFO - stdout - {'loss': 0.4029, 'grad_norm': 1.5472502708435059, 'learning_rate': 2.841683543406647e-06, 'epoch': 2.28} +2025-02-06 01:27:57 - ERROR - stderr - 76%|███████▌ | 17080/22434 [15:20:17<3:40:02, 2.47s/it] +2025-02-06 01:27:59 - ERROR - stderr - 76%|███████▌ | 17081/22434 [15:20:19<3:41:55, 2.49s/it] +2025-02-06 01:27:59 - ERROR - stderr - +2025-02-06 01:27:59 - ERROR - stderr - +2025-02-06 01:27:59 - INFO - stdout - {'loss': 0.4062, 'grad_norm': 1.539623498916626, 'learning_rate': 2.8406754888597365e-06, 
'epoch': 2.28} +2025-02-06 01:27:59 - ERROR - stderr - 76%|███████▌ | 17081/22434 [15:20:19<3:41:55, 2.49s/it] +2025-02-06 01:28:02 - ERROR - stderr - 76%|███████▌ | 17082/22434 [15:20:21<3:39:59, 2.47s/it] +2025-02-06 01:28:02 - ERROR - stderr - +2025-02-06 01:28:02 - ERROR - stderr - +2025-02-06 01:28:02 - INFO - stdout - {'loss': 0.3919, 'grad_norm': 1.5965551137924194, 'learning_rate': 2.839667583542095e-06, 'epoch': 2.28} +2025-02-06 01:28:02 - ERROR - stderr - 76%|███████▌ | 17082/22434 [15:20:21<3:39:59, 2.47s/it] +2025-02-06 01:28:04 - ERROR - stderr - 76%|███████▌ | 17083/22434 [15:20:24<3:40:27, 2.47s/it] +2025-02-06 01:28:04 - ERROR - stderr - +2025-02-06 01:28:04 - ERROR - stderr - +2025-02-06 01:28:04 - INFO - stdout - {'loss': 0.3665, 'grad_norm': 1.4609220027923584, 'learning_rate': 2.8386598274747303e-06, 'epoch': 2.28} +2025-02-06 01:28:04 - ERROR - stderr - 76%|███████▌ | 17083/22434 [15:20:24<3:40:27, 2.47s/it] +2025-02-06 01:28:07 - ERROR - stderr - 76%|███████▌ | 17084/22434 [15:20:27<3:43:41, 2.51s/it] +2025-02-06 01:28:07 - ERROR - stderr - +2025-02-06 01:28:07 - ERROR - stderr - +2025-02-06 01:28:07 - INFO - stdout - {'loss': 0.3822, 'grad_norm': 1.5263363122940063, 'learning_rate': 2.8376522206786494e-06, 'epoch': 2.28} +2025-02-06 01:28:07 - ERROR - stderr - 76%|███████▌ | 17084/22434 [15:20:27<3:43:41, 2.51s/it] +2025-02-06 01:28:09 - ERROR - stderr - 76%|███████▌ | 17085/22434 [15:20:29<3:42:32, 2.50s/it] +2025-02-06 01:28:09 - ERROR - stderr - +2025-02-06 01:28:09 - ERROR - stderr - +2025-02-06 01:28:09 - INFO - stdout - {'loss': 0.4484, 'grad_norm': 1.6540801525115967, 'learning_rate': 2.836644763174854e-06, 'epoch': 2.28} +2025-02-06 01:28:09 - ERROR - stderr - 76%|███████▌ | 17085/22434 [15:20:29<3:42:32, 2.50s/it] +2025-02-06 01:28:12 - ERROR - stderr - 76%|███████▌ | 17086/22434 [15:20:31<3:40:39, 2.48s/it] +2025-02-06 01:28:12 - ERROR - stderr - +2025-02-06 01:28:12 - ERROR - stderr - +2025-02-06 01:28:12 - INFO - stdout - 
{'loss': 0.39, 'grad_norm': 1.5076287984848022, 'learning_rate': 2.8356374549843447e-06, 'epoch': 2.28} +2025-02-06 01:28:12 - ERROR - stderr - 76%|███████▌ | 17086/22434 [15:20:31<3:40:39, 2.48s/it] +2025-02-06 01:28:14 - ERROR - stderr - 76%|███████▌ | 17087/22434 [15:20:34<3:41:53, 2.49s/it] +2025-02-06 01:28:14 - ERROR - stderr - +2025-02-06 01:28:14 - ERROR - stderr - +2025-02-06 01:28:14 - INFO - stdout - {'loss': 0.3515, 'grad_norm': 1.4015815258026123, 'learning_rate': 2.834630296128116e-06, 'epoch': 2.28} +2025-02-06 01:28:14 - ERROR - stderr - 76%|███████▌ | 17087/22434 [15:20:34<3:41:53, 2.49s/it] +2025-02-06 01:28:17 - ERROR - stderr - 76%|███████▌ | 17088/22434 [15:20:36<3:43:24, 2.51s/it] +2025-02-06 01:28:17 - ERROR - stderr - +2025-02-06 01:28:17 - ERROR - stderr - +2025-02-06 01:28:17 - INFO - stdout - {'loss': 0.4021, 'grad_norm': 1.5057001113891602, 'learning_rate': 2.8336232866271663e-06, 'epoch': 2.29} +2025-02-06 01:28:17 - ERROR - stderr - 76%|███████▌ | 17088/22434 [15:20:37<3:43:24, 2.51s/it] +2025-02-06 01:28:19 - ERROR - stderr - 76%|███████▌ | 17089/22434 [15:20:39<3:42:41, 2.50s/it] +2025-02-06 01:28:19 - ERROR - stderr - +2025-02-06 01:28:19 - ERROR - stderr - +2025-02-06 01:28:19 - INFO - stdout - {'loss': 0.363, 'grad_norm': 1.4895102977752686, 'learning_rate': 2.8326164265024746e-06, 'epoch': 2.29} +2025-02-06 01:28:19 - ERROR - stderr - 76%|███████▌ | 17089/22434 [15:20:39<3:42:41, 2.50s/it] +2025-02-06 01:28:22 - ERROR - stderr - 76%|███████▌ | 17090/22434 [15:20:41<3:42:18, 2.50s/it] +2025-02-06 01:28:22 - ERROR - stderr - +2025-02-06 01:28:22 - ERROR - stderr - +2025-02-06 01:28:22 - INFO - stdout - {'loss': 0.4047, 'grad_norm': 1.730924129486084, 'learning_rate': 2.8316097157750422e-06, 'epoch': 2.29} +2025-02-06 01:28:22 - ERROR - stderr - 76%|███████▌ | 17090/22434 [15:20:41<3:42:18, 2.50s/it] +2025-02-06 01:28:24 - ERROR - stderr - 76%|███████▌ | 17091/22434 [15:20:44<3:41:17, 2.48s/it] +2025-02-06 01:28:24 - ERROR - stderr 
- +2025-02-06 01:28:24 - ERROR - stderr - +2025-02-06 01:28:24 - INFO - stdout - {'loss': 0.3764, 'grad_norm': 1.4394056797027588, 'learning_rate': 2.8306031544658387e-06, 'epoch': 2.29} +2025-02-06 01:28:24 - ERROR - stderr - 76%|███████▌ | 17091/22434 [15:20:44<3:41:17, 2.48s/it] +2025-02-06 01:28:27 - ERROR - stderr - 76%|███████▌ | 17092/22434 [15:20:46<3:42:27, 2.50s/it] +2025-02-06 01:28:27 - ERROR - stderr - +2025-02-06 01:28:27 - ERROR - stderr - +2025-02-06 01:28:27 - INFO - stdout - {'loss': 0.3679, 'grad_norm': 1.4548566341400146, 'learning_rate': 2.8295967425958557e-06, 'epoch': 2.29} +2025-02-06 01:28:27 - ERROR - stderr - 76%|███████▌ | 17092/22434 [15:20:46<3:42:27, 2.50s/it] +2025-02-06 01:28:29 - ERROR - stderr - 76%|███████▌ | 17093/22434 [15:20:49<3:40:55, 2.48s/it] +2025-02-06 01:28:29 - ERROR - stderr - +2025-02-06 01:28:29 - ERROR - stderr - +2025-02-06 01:28:29 - INFO - stdout - {'loss': 0.361, 'grad_norm': 1.4451931715011597, 'learning_rate': 2.82859048018607e-06, 'epoch': 2.29} +2025-02-06 01:28:29 - ERROR - stderr - 76%|███████▌ | 17093/22434 [15:20:49<3:40:55, 2.48s/it] +2025-02-06 01:28:32 - ERROR - stderr - 76%|███████▌ | 17094/22434 [15:20:51<3:39:31, 2.47s/it] +2025-02-06 01:28:32 - ERROR - stderr - +2025-02-06 01:28:32 - ERROR - stderr - +2025-02-06 01:28:32 - INFO - stdout - {'loss': 0.368, 'grad_norm': 1.5823029279708862, 'learning_rate': 2.8275843672574476e-06, 'epoch': 2.29} +2025-02-06 01:28:32 - ERROR - stderr - 76%|███████▌ | 17094/22434 [15:20:51<3:39:31, 2.47s/it] +2025-02-06 01:28:34 - ERROR - stderr - 76%|███████▌ | 17095/22434 [15:20:54<3:40:09, 2.47s/it] +2025-02-06 01:28:34 - ERROR - stderr - +2025-02-06 01:28:34 - ERROR - stderr - +2025-02-06 01:28:34 - INFO - stdout - {'loss': 0.374, 'grad_norm': 1.3878077268600464, 'learning_rate': 2.826578403830972e-06, 'epoch': 2.29} +2025-02-06 01:28:34 - ERROR - stderr - 76%|███████▌ | 17095/22434 [15:20:54<3:40:09, 2.47s/it] +2025-02-06 01:28:37 - ERROR - stderr - 76%|███████▌
| 17096/22434 [15:20:56<3:40:16, 2.48s/it] +2025-02-06 01:28:37 - ERROR - stderr - +2025-02-06 01:28:37 - ERROR - stderr - +2025-02-06 01:28:37 - INFO - stdout - {'loss': 0.37, 'grad_norm': 1.5558421611785889, 'learning_rate': 2.825572589927602e-06, 'epoch': 2.29} +2025-02-06 01:28:37 - ERROR - stderr - 76%|███████▌ | 17096/22434 [15:20:56<3:40:16, 2.48s/it] +2025-02-06 01:28:39 - ERROR - stderr - 76%|███████▌ | 17097/22434 [15:20:59<3:39:17, 2.47s/it] +2025-02-06 01:28:39 - ERROR - stderr - +2025-02-06 01:28:39 - ERROR - stderr - +2025-02-06 01:28:39 - INFO - stdout - {'loss': 0.4234, 'grad_norm': 1.6921919584274292, 'learning_rate': 2.8245669255683072e-06, 'epoch': 2.29} +2025-02-06 01:28:39 - ERROR - stderr - 76%|███████▌ | 17097/22434 [15:20:59<3:39:17, 2.47s/it] +2025-02-06 01:28:41 - ERROR - stderr - 76%|███████▌ | 17098/22434 [15:21:01<3:40:20, 2.48s/it] +2025-02-06 01:28:42 - ERROR - stderr - +2025-02-06 01:28:42 - ERROR - stderr - +2025-02-06 01:28:42 - INFO - stdout - {'loss': 0.3643, 'grad_norm': 1.505511999130249, 'learning_rate': 2.823561410774047e-06, 'epoch': 2.29} +2025-02-06 01:28:42 - ERROR - stderr - 76%|███████▌ | 17098/22434 [15:21:01<3:40:20, 2.48s/it] +2025-02-06 01:28:44 - ERROR - stderr - 76%|███████▌ | 17099/22434 [15:21:04<3:41:17, 2.49s/it] +2025-02-06 01:28:44 - ERROR - stderr - +2025-02-06 01:28:44 - ERROR - stderr - +2025-02-06 01:28:44 - INFO - stdout - {'loss': 0.3384, 'grad_norm': 1.4950768947601318, 'learning_rate': 2.8225560455657807e-06, 'epoch': 2.29} +2025-02-06 01:28:44 - ERROR - stderr - 76%|███████▌ | 17099/22434 [15:21:04<3:41:17, 2.49s/it] +2025-02-06 01:28:47 - ERROR - stderr - 76%|███████▌ | 17100/22434 [15:21:06<3:42:34, 2.50s/it] +2025-02-06 01:28:47 - ERROR - stderr - +2025-02-06 01:28:47 - ERROR - stderr - +2025-02-06 01:28:47 - INFO - stdout - {'loss': 0.3782, 'grad_norm': 1.5675758123397827, 'learning_rate': 2.82155082996447e-06, 'epoch': 2.29} +2025-02-06 01:28:47 - ERROR - stderr - 76%|███████▌ | 17100/22434 
[15:21:06<3:42:34, 2.50s/it] +2025-02-06 01:28:49 - ERROR - stderr - 76%|███████▌ | 17101/22434 [15:21:09<3:44:56, 2.53s/it] +2025-02-06 01:28:49 - ERROR - stderr - +2025-02-06 01:28:49 - ERROR - stderr - +2025-02-06 01:28:49 - INFO - stdout - {'loss': 0.3459, 'grad_norm': 1.3418360948562622, 'learning_rate': 2.8205457639910616e-06, 'epoch': 2.29} +2025-02-06 01:28:49 - ERROR - stderr - 76%|███████▌ | 17101/22434 [15:21:09<3:44:56, 2.53s/it] +2025-02-06 01:28:52 - ERROR - stderr - 76%|███████▌ | 17102/22434 [15:21:11<3:42:13, 2.50s/it] +2025-02-06 01:28:52 - ERROR - stderr - +2025-02-06 01:28:52 - ERROR - stderr - +2025-02-06 01:28:52 - INFO - stdout - {'loss': 0.3544, 'grad_norm': 1.5419352054595947, 'learning_rate': 2.8195408476665064e-06, 'epoch': 2.29} +2025-02-06 01:28:52 - ERROR - stderr - 76%|███████▌ | 17102/22434 [15:21:11<3:42:13, 2.50s/it] +2025-02-06 01:28:54 - ERROR - stderr - 76%|███████▌ | 17103/22434 [15:21:14<3:39:59, 2.48s/it] +2025-02-06 01:28:54 - ERROR - stderr - +2025-02-06 01:28:54 - ERROR - stderr - +2025-02-06 01:28:54 - INFO - stdout - {'loss': 0.4009, 'grad_norm': 1.5953378677368164, 'learning_rate': 2.8185360810117514e-06, 'epoch': 2.29} +2025-02-06 01:28:54 - ERROR - stderr - 76%|███████▌ | 17103/22434 [15:21:14<3:39:59, 2.48s/it] +2025-02-06 01:28:56 - ERROR - stderr - 76%|███████▌ | 17104/22434 [15:21:16<3:41:22, 2.49s/it] +2025-02-06 01:28:57 - ERROR - stderr - +2025-02-06 01:28:57 - ERROR - stderr - +2025-02-06 01:28:57 - INFO - stdout - {'loss': 0.3927, 'grad_norm': 1.4761632680892944, 'learning_rate': 2.817531464047739e-06, 'epoch': 2.29} +2025-02-06 01:28:57 - ERROR - stderr - 76%|███████▌ | 17104/22434 [15:21:16<3:41:22, 2.49s/it] +2025-02-06 01:28:59 - ERROR - stderr - 76%|███████▌ | 17105/22434 [15:21:19<3:41:30, 2.49s/it] +2025-02-06 01:28:59 - ERROR - stderr - +2025-02-06 01:28:59 - ERROR - stderr - +2025-02-06 01:28:59 - INFO - stdout - {'loss': 0.3681, 'grad_norm': 1.496482491493225, 'learning_rate': 2.816526996795411e-06, 
'epoch': 2.29} +2025-02-06 01:28:59 - ERROR - stderr - 76%|███████▌ | 17105/22434 [15:21:19<3:41:30, 2.49s/it] +2025-02-06 01:29:01 - ERROR - stderr - 76%|███████▋ | 17106/22434 [15:21:21<3:42:06, 2.50s/it] +2025-02-06 01:29:02 - ERROR - stderr - +2025-02-06 01:29:02 - ERROR - stderr - +2025-02-06 01:29:02 - INFO - stdout - {'loss': 0.4087, 'grad_norm': 1.5692352056503296, 'learning_rate': 2.815522679275704e-06, 'epoch': 2.29} +2025-02-06 01:29:02 - ERROR - stderr - 76%|███████▋ | 17106/22434 [15:21:21<3:42:06, 2.50s/it] +2025-02-06 01:29:04 - ERROR - stderr - 76%|███████▋ | 17107/22434 [15:21:24<3:40:46, 2.49s/it] +2025-02-06 01:29:04 - ERROR - stderr - +2025-02-06 01:29:04 - ERROR - stderr - +2025-02-06 01:29:04 - INFO - stdout - {'loss': 0.4057, 'grad_norm': 1.6782152652740479, 'learning_rate': 2.814518511509552e-06, 'epoch': 2.29} +2025-02-06 01:29:04 - ERROR - stderr - 76%|███████▋ | 17107/22434 [15:21:24<3:40:46, 2.49s/it] +2025-02-06 01:29:06 - ERROR - stderr - 76%|███████▋ | 17108/22434 [15:21:26<3:41:37, 2.50s/it] +2025-02-06 01:29:07 - ERROR - stderr - +2025-02-06 01:29:07 - ERROR - stderr - +2025-02-06 01:29:07 - INFO - stdout - {'loss': 0.3216, 'grad_norm': 1.4950534105300903, 'learning_rate': 2.813514493517885e-06, 'epoch': 2.29} +2025-02-06 01:29:07 - ERROR - stderr - 76%|███████▋ | 17108/22434 [15:21:26<3:41:37, 2.50s/it] +2025-02-06 01:29:09 - ERROR - stderr - 76%|███████▋ | 17109/22434 [15:21:29<3:42:48, 2.51s/it] +2025-02-06 01:29:09 - ERROR - stderr - +2025-02-06 01:29:09 - ERROR - stderr - +2025-02-06 01:29:09 - INFO - stdout - {'loss': 0.316, 'grad_norm': 1.3602277040481567, 'learning_rate': 2.8125106253216363e-06, 'epoch': 2.29} +2025-02-06 01:29:09 - ERROR - stderr - 76%|███████▋ | 17109/22434 [15:21:29<3:42:48, 2.51s/it] +2025-02-06 01:29:11 - ERROR - stderr - 76%|███████▋ | 17110/22434 [15:21:31<3:41:46, 2.50s/it] +2025-02-06 01:29:12 - ERROR - stderr - +2025-02-06 01:29:12 - ERROR - stderr - +2025-02-06 01:29:12 - INFO - stdout - {'loss': 
0.3558, 'grad_norm': 1.5474023818969727, 'learning_rate': 2.8115069069417176e-06, 'epoch': 2.29} +2025-02-06 01:29:12 - ERROR - stderr - 76%|███████▋ | 17110/22434 [15:21:31<3:41:46, 2.50s/it] +2025-02-06 01:29:14 - ERROR - stderr - 76%|███████▋ | 17111/22434 [15:21:34<3:40:20, 2.48s/it] +2025-02-06 01:29:14 - ERROR - stderr - +2025-02-06 01:29:14 - ERROR - stderr - +2025-02-06 01:29:14 - INFO - stdout - {'loss': 0.3345, 'grad_norm': 1.333630084991455, 'learning_rate': 2.810503338399063e-06, 'epoch': 2.29} +2025-02-06 01:29:14 - ERROR - stderr - 76%|███████▋ | 17111/22434 [15:21:34<3:40:20, 2.48s/it] +2025-02-06 01:29:17 - ERROR - stderr - 76%|███████▋ | 17112/22434 [15:21:36<3:47:05, 2.56s/it] +2025-02-06 01:29:17 - ERROR - stderr - +2025-02-06 01:29:17 - ERROR - stderr - +2025-02-06 01:29:17 - INFO - stdout - {'loss': 0.381, 'grad_norm': 1.4353400468826294, 'learning_rate': 2.8094999197145902e-06, 'epoch': 2.29} +2025-02-06 01:29:17 - ERROR - stderr - 76%|███████▋ | 17112/22434 [15:21:36<3:47:05, 2.56s/it] +2025-02-06 01:29:19 - ERROR - stderr - 76%|███████▋ | 17113/22434 [15:21:39<3:45:24, 2.54s/it] +2025-02-06 01:29:19 - ERROR - stderr - +2025-02-06 01:29:19 - ERROR - stderr - +2025-02-06 01:29:19 - INFO - stdout - {'loss': 0.4281, 'grad_norm': 1.6984935998916626, 'learning_rate': 2.808496650909205e-06, 'epoch': 2.29} +2025-02-06 01:29:19 - ERROR - stderr - 76%|███████▋ | 17113/22434 [15:21:39<3:45:24, 2.54s/it] +2025-02-06 01:29:22 - ERROR - stderr - 76%|███████▋ | 17114/22434 [15:21:41<3:44:06, 2.53s/it] +2025-02-06 01:29:22 - ERROR - stderr - +2025-02-06 01:29:22 - ERROR - stderr - +2025-02-06 01:29:22 - INFO - stdout - {'loss': 0.3825, 'grad_norm': 1.4357613325119019, 'learning_rate': 2.807493532003831e-06, 'epoch': 2.29} +2025-02-06 01:29:22 - ERROR - stderr - 76%|███████▋ | 17114/22434 [15:21:41<3:44:06, 2.53s/it] +2025-02-06 01:29:24 - ERROR - stderr - 76%|███████▋ | 17115/22434 [15:21:44<3:43:28, 2.52s/it] +2025-02-06 01:29:24 - ERROR - stderr - 
+2025-02-06 01:29:24 - ERROR - stderr - +2025-02-06 01:29:24 - INFO - stdout - {'loss': 0.394, 'grad_norm': 1.4283082485198975, 'learning_rate': 2.806490563019366e-06, 'epoch': 2.29} +2025-02-06 01:29:24 - ERROR - stderr - 76%|███████▋ | 17115/22434 [15:21:44<3:43:28, 2.52s/it] +2025-02-06 01:29:27 - ERROR - stderr - 76%|███████▋ | 17116/22434 [15:21:47<3:45:15, 2.54s/it] +2025-02-06 01:29:27 - ERROR - stderr - +2025-02-06 01:29:27 - ERROR - stderr - +2025-02-06 01:29:27 - INFO - stdout - {'loss': 0.3966, 'grad_norm': 1.4725817441940308, 'learning_rate': 2.8054877439767283e-06, 'epoch': 2.29} +2025-02-06 01:29:27 - ERROR - stderr - 76%|███████▋ | 17116/22434 [15:21:47<3:45:15, 2.54s/it] +2025-02-06 01:29:29 - ERROR - stderr - 76%|███████▋ | 17117/22434 [15:21:49<3:43:44, 2.52s/it] +2025-02-06 01:29:29 - ERROR - stderr - +2025-02-06 01:29:29 - ERROR - stderr - +2025-02-06 01:29:29 - INFO - stdout - {'loss': 0.424, 'grad_norm': 1.763079285621643, 'learning_rate': 2.8044850748968112e-06, 'epoch': 2.29} +2025-02-06 01:29:29 - ERROR - stderr - 76%|███████▋ | 17117/22434 [15:21:49<3:43:44, 2.52s/it] +2025-02-06 01:29:32 - ERROR - stderr - 76%|███████▋ | 17118/22434 [15:21:52<3:45:30, 2.55s/it] +2025-02-06 01:29:32 - ERROR - stderr - +2025-02-06 01:29:32 - ERROR - stderr - +2025-02-06 01:29:32 - INFO - stdout - {'loss': 0.3698, 'grad_norm': 1.4673564434051514, 'learning_rate': 2.803482555800513e-06, 'epoch': 2.29} +2025-02-06 01:29:32 - ERROR - stderr - 76%|███████▋ | 17118/22434 [15:21:52<3:45:30, 2.55s/it] +2025-02-06 01:29:34 - ERROR - stderr - 76%|███████▋ | 17119/22434 [15:21:54<3:47:45, 2.57s/it] +2025-02-06 01:29:35 - ERROR - stderr - +2025-02-06 01:29:35 - ERROR - stderr - +2025-02-06 01:29:35 - INFO - stdout - {'loss': 0.3908, 'grad_norm': 1.3144391775131226, 'learning_rate': 2.8024801867087414e-06, 'epoch': 2.29} +2025-02-06 01:29:35 - ERROR - stderr - 76%|███████▋ | 17119/22434 [15:21:54<3:47:45, 2.57s/it] +2025-02-06 01:29:37 - ERROR - stderr - 76%|███████▋ | 
17120/22434 [15:21:57<3:45:53, 2.55s/it] +2025-02-06 01:29:37 - ERROR - stderr - +2025-02-06 01:29:37 - ERROR - stderr - +2025-02-06 01:29:37 - INFO - stdout - {'loss': 0.3733, 'grad_norm': 1.4826347827911377, 'learning_rate': 2.801477967642381e-06, 'epoch': 2.29} +2025-02-06 01:29:37 - ERROR - stderr - 76%|███████▋ | 17120/22434 [15:21:57<3:45:53, 2.55s/it] +2025-02-06 01:29:40 - ERROR - stderr - 76%|███████▋ | 17121/22434 [15:21:59<3:50:17, 2.60s/it] +2025-02-06 01:29:40 - ERROR - stderr - +2025-02-06 01:29:40 - ERROR - stderr - +2025-02-06 01:29:40 - INFO - stdout - {'loss': 0.3728, 'grad_norm': 1.4246609210968018, 'learning_rate': 2.8004758986223225e-06, 'epoch': 2.29} +2025-02-06 01:29:40 - ERROR - stderr - 76%|███████▋ | 17121/22434 [15:22:00<3:50:17, 2.60s/it] +2025-02-06 01:29:42 - ERROR - stderr - 76%|███████▋ | 17122/22434 [15:22:02<3:48:44, 2.58s/it] +2025-02-06 01:29:42 - ERROR - stderr - +2025-02-06 01:29:42 - ERROR - stderr - +2025-02-06 01:29:42 - INFO - stdout - {'loss': 0.4015, 'grad_norm': 1.4616708755493164, 'learning_rate': 2.799473979669456e-06, 'epoch': 2.29} +2025-02-06 01:29:42 - ERROR - stderr - 76%|███████▋ | 17122/22434 [15:22:02<3:48:44, 2.58s/it] +2025-02-06 01:29:45 - ERROR - stderr - 76%|███████▋ | 17123/22434 [15:22:05<3:46:58, 2.56s/it] +2025-02-06 01:29:45 - ERROR - stderr - +2025-02-06 01:29:45 - ERROR - stderr - +2025-02-06 01:29:45 - INFO - stdout - {'loss': 0.3643, 'grad_norm': 1.6043283939361572, 'learning_rate': 2.7984722108046637e-06, 'epoch': 2.29} +2025-02-06 01:29:45 - ERROR - stderr - 76%|███████▋ | 17123/22434 [15:22:05<3:46:58, 2.56s/it] +2025-02-06 01:29:47 - ERROR - stderr - 76%|███████▋ | 17124/22434 [15:22:07<3:46:24, 2.56s/it] +2025-02-06 01:29:47 - ERROR - stderr - +2025-02-06 01:29:47 - ERROR - stderr - +2025-02-06 01:29:47 - INFO - stdout - {'loss': 0.3645, 'grad_norm': 1.555106282234192, 'learning_rate': 2.7974705920488267e-06, 'epoch': 2.29} +2025-02-06 01:29:47 - ERROR - stderr - 76%|███████▋ | 17124/22434 
[15:22:07<3:46:24, 2.56s/it] +2025-02-06 01:29:50 - ERROR - stderr - 76%|███████▋ | 17125/22434 [15:22:10<3:45:36, 2.55s/it] +2025-02-06 01:29:50 - ERROR - stderr - +2025-02-06 01:29:50 - ERROR - stderr - +2025-02-06 01:29:50 - INFO - stdout - {'loss': 0.3945, 'grad_norm': 1.6657214164733887, 'learning_rate': 2.7964691234228238e-06, 'epoch': 2.29} +2025-02-06 01:29:50 - ERROR - stderr - 76%|███████▋ | 17125/22434 [15:22:10<3:45:36, 2.55s/it] +2025-02-06 01:29:52 - ERROR - stderr - 76%|███████▋ | 17126/22434 [15:22:12<3:45:03, 2.54s/it] +2025-02-06 01:29:52 - ERROR - stderr - +2025-02-06 01:29:52 - ERROR - stderr - +2025-02-06 01:29:52 - INFO - stdout - {'loss': 0.3732, 'grad_norm': 1.3964290618896484, 'learning_rate': 2.795467804947528e-06, 'epoch': 2.29} +2025-02-06 01:29:52 - ERROR - stderr - 76%|███████▋ | 17126/22434 [15:22:12<3:45:03, 2.54s/it] +2025-02-06 01:29:55 - ERROR - stderr - 76%|███████▋ | 17127/22434 [15:22:15<3:47:44, 2.57s/it] +2025-02-06 01:29:55 - ERROR - stderr - +2025-02-06 01:29:55 - ERROR - stderr - +2025-02-06 01:29:55 - INFO - stdout - {'loss': 0.4029, 'grad_norm': 1.507797360420227, 'learning_rate': 2.794466636643812e-06, 'epoch': 2.29} +2025-02-06 01:29:55 - ERROR - stderr - 76%|███████▋ | 17127/22434 [15:22:15<3:47:44, 2.57s/it] +2025-02-06 01:29:58 - ERROR - stderr - 76%|███████▋ | 17128/22434 [15:22:17<3:46:13, 2.56s/it] +2025-02-06 01:29:58 - ERROR - stderr - +2025-02-06 01:29:58 - ERROR - stderr - +2025-02-06 01:29:58 - INFO - stdout - {'loss': 0.3741, 'grad_norm': 1.6231237649917603, 'learning_rate': 2.7934656185325483e-06, 'epoch': 2.29} +2025-02-06 01:29:58 - ERROR - stderr - 76%|███████▋ | 17128/22434 [15:22:17<3:46:13, 2.56s/it] +2025-02-06 01:30:00 - ERROR - stderr - 76%|███████▋ | 17129/22434 [15:22:20<3:44:51, 2.54s/it] +2025-02-06 01:30:00 - ERROR - stderr - +2025-02-06 01:30:00 - ERROR - stderr - +2025-02-06 01:30:00 - INFO - stdout - {'loss': 0.4103, 'grad_norm': 1.6168947219848633, 'learning_rate': 2.7924647506345913e-06, 
'epoch': 2.29} +2025-02-06 01:30:00 - ERROR - stderr - 76%|███████▋ | 17129/22434 [15:22:20<3:44:51, 2.54s/it] +2025-02-06 01:30:02 - ERROR - stderr - 76%|███████▋ | 17130/22434 [15:22:22<3:41:55, 2.51s/it] +2025-02-06 01:30:03 - ERROR - stderr - +2025-02-06 01:30:03 - ERROR - stderr - +2025-02-06 01:30:03 - INFO - stdout - {'loss': 0.385, 'grad_norm': 1.4203189611434937, 'learning_rate': 2.791464032970812e-06, 'epoch': 2.29} +2025-02-06 01:30:03 - ERROR - stderr - 76%|███████▋ | 17130/22434 [15:22:22<3:41:55, 2.51s/it] +2025-02-06 01:30:05 - ERROR - stderr - 76%|███████▋ | 17131/22434 [15:22:25<3:42:48, 2.52s/it] +2025-02-06 01:30:05 - ERROR - stderr - +2025-02-06 01:30:05 - ERROR - stderr - +2025-02-06 01:30:05 - INFO - stdout - {'loss': 0.3256, 'grad_norm': 1.4013820886611938, 'learning_rate': 2.790463465562068e-06, 'epoch': 2.29} +2025-02-06 01:30:05 - ERROR - stderr - 76%|███████▋ | 17131/22434 [15:22:25<3:42:48, 2.52s/it] +2025-02-06 01:30:08 - ERROR - stderr - 76%|███████▋ | 17132/22434 [15:22:27<3:42:51, 2.52s/it] +2025-02-06 01:30:08 - ERROR - stderr - +2025-02-06 01:30:08 - ERROR - stderr - +2025-02-06 01:30:08 - INFO - stdout - {'loss': 0.3776, 'grad_norm': 1.4253093004226685, 'learning_rate': 2.789463048429214e-06, 'epoch': 2.29} +2025-02-06 01:30:08 - ERROR - stderr - 76%|███████▋ | 17132/22434 [15:22:27<3:42:51, 2.52s/it] +2025-02-06 01:30:10 - ERROR - stderr - 76%|███████▋ | 17133/22434 [15:22:30<3:41:50, 2.51s/it] +2025-02-06 01:30:10 - ERROR - stderr - +2025-02-06 01:30:10 - ERROR - stderr - +2025-02-06 01:30:10 - INFO - stdout - {'loss': 0.3946, 'grad_norm': 1.5374804735183716, 'learning_rate': 2.7884627815931052e-06, 'epoch': 2.29} +2025-02-06 01:30:10 - ERROR - stderr - 76%|███████▋ | 17133/22434 [15:22:30<3:41:50, 2.51s/it] +2025-02-06 01:30:13 - ERROR - stderr - 76%|███████▋ | 17134/22434 [15:22:32<3:44:54, 2.55s/it] +2025-02-06 01:30:13 - ERROR - stderr - +2025-02-06 01:30:13 - ERROR - stderr - +2025-02-06 01:30:13 - INFO - stdout - {'loss': 
0.3761, 'grad_norm': 1.5190311670303345, 'learning_rate': 2.7874626650745838e-06, 'epoch': 2.29} +2025-02-06 01:30:13 - ERROR - stderr - 76%|███████▋ | 17134/22434 [15:22:32<3:44:54, 2.55s/it] +2025-02-06 01:30:15 - ERROR - stderr - 76%|███████▋ | 17135/22434 [15:22:35<3:43:08, 2.53s/it] +2025-02-06 01:30:15 - ERROR - stderr - +2025-02-06 01:30:15 - ERROR - stderr - +2025-02-06 01:30:15 - INFO - stdout - {'loss': 0.3555, 'grad_norm': 1.5384248495101929, 'learning_rate': 2.786462698894508e-06, 'epoch': 2.29} +2025-02-06 01:30:15 - ERROR - stderr - 76%|███████▋ | 17135/22434 [15:22:35<3:43:08, 2.53s/it] +2025-02-06 01:30:18 - ERROR - stderr - 76%|███████▋ | 17136/22434 [15:22:37<3:41:28, 2.51s/it] +2025-02-06 01:30:18 - ERROR - stderr - +2025-02-06 01:30:18 - ERROR - stderr - +2025-02-06 01:30:18 - INFO - stdout - {'loss': 0.3792, 'grad_norm': 1.5339213609695435, 'learning_rate': 2.785462883073711e-06, 'epoch': 2.29} +2025-02-06 01:30:18 - ERROR - stderr - 76%|███████▋ | 17136/22434 [15:22:37<3:41:28, 2.51s/it] +2025-02-06 01:30:20 - ERROR - stderr - 76%|███████▋ | 17137/22434 [15:22:40<3:44:28, 2.54s/it] +2025-02-06 01:30:20 - ERROR - stderr - +2025-02-06 01:30:20 - ERROR - stderr - +2025-02-06 01:30:20 - INFO - stdout - {'loss': 0.4061, 'grad_norm': 1.5710184574127197, 'learning_rate': 2.784463217633033e-06, 'epoch': 2.29} +2025-02-06 01:30:20 - ERROR - stderr - 76%|███████▋ | 17137/22434 [15:22:40<3:44:28, 2.54s/it] +2025-02-06 01:30:23 - ERROR - stderr - 76%|███████▋ | 17138/22434 [15:22:43<3:44:36, 2.54s/it] +2025-02-06 01:30:23 - ERROR - stderr - +2025-02-06 01:30:23 - ERROR - stderr - +2025-02-06 01:30:23 - INFO - stdout - {'loss': 0.3638, 'grad_norm': 1.4566221237182617, 'learning_rate': 2.783463702593322e-06, 'epoch': 2.29} +2025-02-06 01:30:23 - ERROR - stderr - 76%|███████▋ | 17138/22434 [15:22:43<3:44:36, 2.54s/it] +2025-02-06 01:30:25 - ERROR - stderr - 76%|███████▋ | 17139/22434 [15:22:45<3:47:20, 2.58s/it] +2025-02-06 01:30:25 - ERROR - stderr - 
+2025-02-06 01:30:25 - ERROR - stderr - +2025-02-06 01:30:25 - INFO - stdout - {'loss': 0.3671, 'grad_norm': 1.3668036460876465, 'learning_rate': 2.782464337975398e-06, 'epoch': 2.29} +2025-02-06 01:30:25 - ERROR - stderr - 76%|███████▋ | 17139/22434 [15:22:45<3:47:20, 2.58s/it] +2025-02-06 01:30:28 - ERROR - stderr - 76%|███████▋ | 17140/22434 [15:22:48<3:43:59, 2.54s/it] +2025-02-06 01:30:28 - ERROR - stderr - +2025-02-06 01:30:28 - ERROR - stderr - +2025-02-06 01:30:28 - INFO - stdout - {'loss': 0.4277, 'grad_norm': 1.782456398010254, 'learning_rate': 2.7814651238001045e-06, 'epoch': 2.29} +2025-02-06 01:30:28 - ERROR - stderr - 76%|███████▋ | 17140/22434 [15:22:48<3:43:59, 2.54s/it] +2025-02-06 01:30:30 - ERROR - stderr - 76%|███████▋ | 17141/22434 [15:22:50<3:44:33, 2.55s/it] +2025-02-06 01:30:30 - ERROR - stderr - +2025-02-06 01:30:30 - ERROR - stderr - +2025-02-06 01:30:30 - INFO - stdout - {'loss': 0.4144, 'grad_norm': 1.4327895641326904, 'learning_rate': 2.780466060088259e-06, 'epoch': 2.29} +2025-02-06 01:30:30 - ERROR - stderr - 76%|███████▋ | 17141/22434 [15:22:50<3:44:33, 2.55s/it] +2025-02-06 01:30:33 - ERROR - stderr - 76%|███████▋ | 17142/22434 [15:22:53<3:43:09, 2.53s/it] +2025-02-06 01:30:33 - ERROR - stderr - +2025-02-06 01:30:33 - ERROR - stderr - +2025-02-06 01:30:33 - INFO - stdout - {'loss': 0.3983, 'grad_norm': 1.4891985654830933, 'learning_rate': 2.7794671468606916e-06, 'epoch': 2.29} +2025-02-06 01:30:33 - ERROR - stderr - 76%|███████▋ | 17142/22434 [15:22:53<3:43:09, 2.53s/it] +2025-02-06 01:30:35 - ERROR - stderr - 76%|███████▋ | 17143/22434 [15:22:55<3:41:07, 2.51s/it] +2025-02-06 01:30:35 - ERROR - stderr - +2025-02-06 01:30:35 - ERROR - stderr - +2025-02-06 01:30:35 - INFO - stdout - {'loss': 0.3939, 'grad_norm': 1.5933634042739868, 'learning_rate': 2.778468384138222e-06, 'epoch': 2.29} +2025-02-06 01:30:35 - ERROR - stderr - 76%|███████▋ | 17143/22434 [15:22:55<3:41:07, 2.51s/it] +2025-02-06 01:30:38 - ERROR - stderr - 76%|███████▋ | 
17144/22434 [15:22:58<3:38:44, 2.48s/it] +2025-02-06 01:30:38 - ERROR - stderr - +2025-02-06 01:30:38 - ERROR - stderr - +2025-02-06 01:30:38 - INFO - stdout - {'loss': 0.3731, 'grad_norm': 1.682861328125, 'learning_rate': 2.7774697719416688e-06, 'epoch': 2.29} +2025-02-06 01:30:38 - ERROR - stderr - 76%|███████▋ | 17144/22434 [15:22:58<3:38:44, 2.48s/it] +2025-02-06 01:30:40 - ERROR - stderr - 76%|███████▋ | 17145/22434 [15:23:00<3:40:42, 2.50s/it] +2025-02-06 01:30:40 - ERROR - stderr - +2025-02-06 01:30:40 - ERROR - stderr - +2025-02-06 01:30:40 - INFO - stdout - {'loss': 0.3717, 'grad_norm': 1.4285898208618164, 'learning_rate': 2.776471310291846e-06, 'epoch': 2.29} +2025-02-06 01:30:40 - ERROR - stderr - 76%|███████▋ | 17145/22434 [15:23:00<3:40:42, 2.50s/it] +2025-02-06 01:30:43 - ERROR - stderr - 76%|███████▋ | 17146/22434 [15:23:03<3:41:41, 2.52s/it] +2025-02-06 01:30:43 - ERROR - stderr - +2025-02-06 01:30:43 - ERROR - stderr - +2025-02-06 01:30:43 - INFO - stdout - {'loss': 0.3506, 'grad_norm': 1.418239951133728, 'learning_rate': 2.7754729992095673e-06, 'epoch': 2.29} +2025-02-06 01:30:43 - ERROR - stderr - 76%|███████▋ | 17146/22434 [15:23:03<3:41:41, 2.52s/it] +2025-02-06 01:30:45 - ERROR - stderr - 76%|███████▋ | 17147/22434 [15:23:05<3:40:41, 2.50s/it] +2025-02-06 01:30:45 - ERROR - stderr - +2025-02-06 01:30:45 - ERROR - stderr - +2025-02-06 01:30:45 - INFO - stdout - {'loss': 0.3399, 'grad_norm': 1.4235265254974365, 'learning_rate': 2.774474838715642e-06, 'epoch': 2.29} +2025-02-06 01:30:45 - ERROR - stderr - 76%|███████▋ | 17147/22434 [15:23:05<3:40:41, 2.50s/it] +2025-02-06 01:30:48 - ERROR - stderr - 76%|███████▋ | 17148/22434 [15:23:08<3:40:07, 2.50s/it] +2025-02-06 01:30:48 - ERROR - stderr - +2025-02-06 01:30:48 - ERROR - stderr - +2025-02-06 01:30:48 - INFO - stdout - {'loss': 0.3926, 'grad_norm': 1.4237998723983765, 'learning_rate': 2.7734768288308724e-06, 'epoch': 2.29} +2025-02-06 01:30:48 - ERROR - stderr - 76%|███████▋ | 17148/22434 
[15:23:08<3:40:07, 2.50s/it] +2025-02-06 01:30:50 - ERROR - stderr - 76%|███████▋ | 17149/22434 [15:23:10<3:39:26, 2.49s/it] +2025-02-06 01:30:50 - ERROR - stderr - +2025-02-06 01:30:50 - ERROR - stderr - +2025-02-06 01:30:50 - INFO - stdout - {'loss': 0.3215, 'grad_norm': 1.3857581615447998, 'learning_rate': 2.7724789695760645e-06, 'epoch': 2.29} +2025-02-06 01:30:50 - ERROR - stderr - 76%|███████▋ | 17149/22434 [15:23:10<3:39:26, 2.49s/it] +2025-02-06 01:30:53 - ERROR - stderr - 76%|███████▋ | 17150/22434 [15:23:13<3:39:07, 2.49s/it] +2025-02-06 01:30:53 - ERROR - stderr - +2025-02-06 01:30:53 - ERROR - stderr - +2025-02-06 01:30:53 - INFO - stdout - {'loss': 0.4083, 'grad_norm': 1.6971173286437988, 'learning_rate': 2.7714812609720167e-06, 'epoch': 2.29} +2025-02-06 01:30:53 - ERROR - stderr - 76%|███████▋ | 17150/22434 [15:23:13<3:39:07, 2.49s/it] +2025-02-06 01:30:55 - ERROR - stderr - 76%|███████▋ | 17151/22434 [15:23:15<3:38:15, 2.48s/it] +2025-02-06 01:30:55 - ERROR - stderr - +2025-02-06 01:30:55 - ERROR - stderr - +2025-02-06 01:30:55 - INFO - stdout - {'loss': 0.3921, 'grad_norm': 1.4783122539520264, 'learning_rate': 2.7704837030395237e-06, 'epoch': 2.29} +2025-02-06 01:30:55 - ERROR - stderr - 76%|███████▋ | 17151/22434 [15:23:15<3:38:15, 2.48s/it] +2025-02-06 01:30:58 - ERROR - stderr - 76%|███████▋ | 17152/22434 [15:23:18<3:38:02, 2.48s/it] +2025-02-06 01:30:58 - ERROR - stderr - +2025-02-06 01:30:58 - ERROR - stderr - +2025-02-06 01:30:58 - INFO - stdout - {'loss': 0.3453, 'grad_norm': 1.4801981449127197, 'learning_rate': 2.769486295799385e-06, 'epoch': 2.29} +2025-02-06 01:30:58 - ERROR - stderr - 76%|███████▋ | 17152/22434 [15:23:18<3:38:02, 2.48s/it] +2025-02-06 01:31:00 - ERROR - stderr - 76%|███████▋ | 17153/22434 [15:23:20<3:42:00, 2.52s/it] +2025-02-06 01:31:00 - ERROR - stderr - +2025-02-06 01:31:00 - ERROR - stderr - +2025-02-06 01:31:00 - INFO - stdout - {'loss': 0.4077, 'grad_norm': 1.4771761894226074, 'learning_rate': 
2.7684890392723783e-06, 'epoch': 2.29} +2025-02-06 01:31:00 - ERROR - stderr - 76%|███████▋ | 17153/22434 [15:23:20<3:42:00, 2.52s/it] +2025-02-06 01:31:03 - ERROR - stderr - 76%|███████▋ | 17154/22434 [15:23:23<3:43:33, 2.54s/it] +2025-02-06 01:31:03 - ERROR - stderr - +2025-02-06 01:31:03 - ERROR - stderr - +2025-02-06 01:31:03 - INFO - stdout - {'loss': 0.4003, 'grad_norm': 1.6517611742019653, 'learning_rate': 2.767491933479304e-06, 'epoch': 2.29} +2025-02-06 01:31:03 - ERROR - stderr - 76%|███████▋ | 17154/22434 [15:23:23<3:43:33, 2.54s/it] +2025-02-06 01:31:05 - ERROR - stderr - 76%|███████▋ | 17155/22434 [15:23:25<3:41:40, 2.52s/it] +2025-02-06 01:31:05 - ERROR - stderr - +2025-02-06 01:31:05 - ERROR - stderr - +2025-02-06 01:31:05 - INFO - stdout - {'loss': 0.3263, 'grad_norm': 1.4087947607040405, 'learning_rate': 2.7664949784409335e-06, 'epoch': 2.29} +2025-02-06 01:31:05 - ERROR - stderr - 76%|███████▋ | 17155/22434 [15:23:25<3:41:40, 2.52s/it] +2025-02-06 01:31:08 - ERROR - stderr - 76%|███████▋ | 17156/22434 [15:23:28<3:43:13, 2.54s/it] +2025-02-06 01:31:08 - ERROR - stderr - +2025-02-06 01:31:08 - ERROR - stderr - +2025-02-06 01:31:08 - INFO - stdout - {'loss': 0.3867, 'grad_norm': 1.5638269186019897, 'learning_rate': 2.765498174178056e-06, 'epoch': 2.29} +2025-02-06 01:31:08 - ERROR - stderr - 76%|███████▋ | 17156/22434 [15:23:28<3:43:13, 2.54s/it] +2025-02-06 01:31:11 - ERROR - stderr - 76%|███████▋ | 17157/22434 [15:23:30<3:44:23, 2.55s/it] +2025-02-06 01:31:11 - ERROR - stderr - +2025-02-06 01:31:11 - ERROR - stderr - +2025-02-06 01:31:11 - INFO - stdout - {'loss': 0.368, 'grad_norm': 1.510145664215088, 'learning_rate': 2.76450152071145e-06, 'epoch': 2.29} +2025-02-06 01:31:11 - ERROR - stderr - 76%|███████▋ | 17157/22434 [15:23:30<3:44:23, 2.55s/it] +2025-02-06 01:31:13 - ERROR - stderr - 76%|███████▋ | 17158/22434 [15:23:33<3:44:32, 2.55s/it] +2025-02-06 01:31:13 - ERROR - stderr - +2025-02-06 01:31:13 - ERROR - stderr - +2025-02-06 01:31:13 - 
INFO - stdout - {'loss': 0.3875, 'grad_norm': 1.495118260383606, 'learning_rate': 2.7635050180618805e-06, 'epoch': 2.29} +2025-02-06 01:31:13 - ERROR - stderr - 76%|███████▋ | 17158/22434 [15:23:33<3:44:32, 2.55s/it] +2025-02-06 01:31:16 - ERROR - stderr - 76%|███████▋ | 17159/22434 [15:23:35<3:43:39, 2.54s/it] +2025-02-06 01:31:16 - ERROR - stderr - +2025-02-06 01:31:16 - ERROR - stderr - +2025-02-06 01:31:16 - INFO - stdout - {'loss': 0.3824, 'grad_norm': 1.5410338640213013, 'learning_rate': 2.76250866625013e-06, 'epoch': 2.29} +2025-02-06 01:31:16 - ERROR - stderr - 76%|███████▋ | 17159/22434 [15:23:35<3:43:39, 2.54s/it] +2025-02-06 01:31:18 - ERROR - stderr - 76%|███████▋ | 17160/22434 [15:23:38<3:40:04, 2.50s/it] +2025-02-06 01:31:18 - ERROR - stderr - +2025-02-06 01:31:18 - ERROR - stderr - +2025-02-06 01:31:18 - INFO - stdout - {'loss': 0.3636, 'grad_norm': 1.4344508647918701, 'learning_rate': 2.7615124652969583e-06, 'epoch': 2.29} +2025-02-06 01:31:18 - ERROR - stderr - 76%|███████▋ | 17160/22434 [15:23:38<3:40:04, 2.50s/it] +2025-02-06 01:31:21 - ERROR - stderr - 76%|███████▋ | 17161/22434 [15:23:40<3:39:32, 2.50s/it] +2025-02-06 01:31:21 - ERROR - stderr - +2025-02-06 01:31:21 - ERROR - stderr - +2025-02-06 01:31:21 - INFO - stdout - {'loss': 0.3498, 'grad_norm': 1.505493402481079, 'learning_rate': 2.7605164152231322e-06, 'epoch': 2.29} +2025-02-06 01:31:21 - ERROR - stderr - 76%|███████▋ | 17161/22434 [15:23:40<3:39:32, 2.50s/it] +2025-02-06 01:31:23 - ERROR - stderr - 76%|███████▋ | 17162/22434 [15:23:43<3:39:42, 2.50s/it] +2025-02-06 01:31:23 - ERROR - stderr - +2025-02-06 01:31:23 - ERROR - stderr - +2025-02-06 01:31:23 - INFO - stdout - {'loss': 0.363, 'grad_norm': 1.4173133373260498, 'learning_rate': 2.7595205160494133e-06, 'epoch': 2.29} +2025-02-06 01:31:23 - ERROR - stderr - 76%|███████▋ | 17162/22434 [15:23:43<3:39:42, 2.50s/it] +2025-02-06 01:31:26 - ERROR - stderr - 77%|███████▋ | 17163/22434 [15:23:45<3:38:40, 2.49s/it] +2025-02-06 01:31:26 - 
ERROR - stderr - +2025-02-06 01:31:26 - ERROR - stderr - +2025-02-06 01:31:26 - INFO - stdout - {'loss': 0.391, 'grad_norm': 1.509009838104248, 'learning_rate': 2.7585247677965588e-06, 'epoch': 2.3} +2025-02-06 01:31:26 - ERROR - stderr - 77%|███████▋ | 17163/22434 [15:23:45<3:38:40, 2.49s/it] +2025-02-06 01:31:28 - ERROR - stderr - 77%|███████▋ | 17164/22434 [15:23:48<3:38:30, 2.49s/it] +2025-02-06 01:31:28 - ERROR - stderr - +2025-02-06 01:31:28 - ERROR - stderr - +2025-02-06 01:31:28 - INFO - stdout - {'loss': 0.4129, 'grad_norm': 1.8681782484054565, 'learning_rate': 2.7575291704853325e-06, 'epoch': 2.3} +2025-02-06 01:31:28 - ERROR - stderr - 77%|███████▋ | 17164/22434 [15:23:48<3:38:30, 2.49s/it] +2025-02-06 01:31:31 - ERROR - stderr - 77%|███████▋ | 17165/22434 [15:23:50<3:38:43, 2.49s/it] +2025-02-06 01:31:31 - ERROR - stderr - +2025-02-06 01:31:31 - ERROR - stderr - +2025-02-06 01:31:31 - INFO - stdout - {'loss': 0.3271, 'grad_norm': 1.4409300088882446, 'learning_rate': 2.7565337241364766e-06, 'epoch': 2.3} +2025-02-06 01:31:31 - ERROR - stderr - 77%|███████▋ | 17165/22434 [15:23:50<3:38:43, 2.49s/it] +2025-02-06 01:31:33 - ERROR - stderr - 77%|███████▋ | 17166/22434 [15:23:53<3:41:27, 2.52s/it] +2025-02-06 01:31:33 - ERROR - stderr - +2025-02-06 01:31:33 - ERROR - stderr - +2025-02-06 01:31:33 - INFO - stdout - {'loss': 0.3299, 'grad_norm': 1.390662431716919, 'learning_rate': 2.7555384287707443e-06, 'epoch': 2.3} +2025-02-06 01:31:33 - ERROR - stderr - 77%|███████▋ | 17166/22434 [15:23:53<3:41:27, 2.52s/it] +2025-02-06 01:31:36 - ERROR - stderr - 77%|███████▋ | 17167/22434 [15:23:55<3:41:48, 2.53s/it] +2025-02-06 01:31:36 - ERROR - stderr - +2025-02-06 01:31:36 - ERROR - stderr - +2025-02-06 01:31:36 - INFO - stdout - {'loss': 0.3982, 'grad_norm': 1.6760491132736206, 'learning_rate': 2.7545432844088814e-06, 'epoch': 2.3} +2025-02-06 01:31:36 - ERROR - stderr - 77%|███████▋ | 17167/22434 [15:23:55<3:41:48, 2.53s/it] +2025-02-06 01:31:38 - ERROR - stderr - 
77%|███████▋ | 17168/22434 [15:23:58<3:39:36, 2.50s/it] +2025-02-06 01:31:38 - ERROR - stderr - +2025-02-06 01:31:38 - ERROR - stderr - +2025-02-06 01:31:38 - INFO - stdout - {'loss': 0.3777, 'grad_norm': 1.4453706741333008, 'learning_rate': 2.7535482910716305e-06, 'epoch': 2.3} +2025-02-06 01:31:38 - ERROR - stderr - 77%|███████▋ | 17168/22434 [15:23:58<3:39:36, 2.50s/it] +2025-02-06 01:31:41 - ERROR - stderr - 77%|███████▋ | 17169/22434 [15:24:00<3:39:49, 2.51s/it] +2025-02-06 01:31:41 - ERROR - stderr - +2025-02-06 01:31:41 - ERROR - stderr - +2025-02-06 01:31:41 - INFO - stdout - {'loss': 0.3452, 'grad_norm': 1.4513747692108154, 'learning_rate': 2.7525534487797313e-06, 'epoch': 2.3} +2025-02-06 01:31:41 - ERROR - stderr - 77%|███████▋ | 17169/22434 [15:24:00<3:39:49, 2.51s/it] +2025-02-06 01:31:43 - ERROR - stderr - 77%|███████▋ | 17170/22434 [15:24:03<3:44:51, 2.56s/it] +2025-02-06 01:31:43 - ERROR - stderr - +2025-02-06 01:31:43 - ERROR - stderr - +2025-02-06 01:31:43 - INFO - stdout - {'loss': 0.3478, 'grad_norm': 1.4528206586837769, 'learning_rate': 2.751558757553919e-06, 'epoch': 2.3} +2025-02-06 01:31:43 - ERROR - stderr - 77%|███████▋ | 17170/22434 [15:24:03<3:44:51, 2.56s/it] +2025-02-06 01:31:46 - ERROR - stderr - 77%|███████▋ | 17171/22434 [15:24:06<3:41:57, 2.53s/it] +2025-02-06 01:31:46 - ERROR - stderr - +2025-02-06 01:31:46 - ERROR - stderr - +2025-02-06 01:31:46 - INFO - stdout - {'loss': 0.4297, 'grad_norm': 1.6733758449554443, 'learning_rate': 2.7505642174149306e-06, 'epoch': 2.3} +2025-02-06 01:31:46 - ERROR - stderr - 77%|███████▋ | 17171/22434 [15:24:06<3:41:57, 2.53s/it] +2025-02-06 01:31:48 - ERROR - stderr - 77%|███████▋ | 17172/22434 [15:24:08<3:43:55, 2.55s/it] +2025-02-06 01:31:48 - ERROR - stderr - +2025-02-06 01:31:48 - ERROR - stderr - +2025-02-06 01:31:48 - INFO - stdout - {'loss': 0.4558, 'grad_norm': 1.7016123533248901, 'learning_rate': 2.7495698283834926e-06, 'epoch': 2.3} +2025-02-06 01:31:48 - ERROR - stderr - 77%|███████▋ | 
17172/22434 [15:24:08<3:43:55, 2.55s/it] +2025-02-06 01:31:51 - ERROR - stderr - 77%|███████▋ | 17173/22434 [15:24:11<3:41:30, 2.53s/it] +2025-02-06 01:31:51 - ERROR - stderr - +2025-02-06 01:31:51 - ERROR - stderr - +2025-02-06 01:31:51 - INFO - stdout - {'loss': 0.3422, 'grad_norm': 1.4338643550872803, 'learning_rate': 2.748575590480338e-06, 'epoch': 2.3} +2025-02-06 01:31:51 - ERROR - stderr - 77%|███████▋ | 17173/22434 [15:24:11<3:41:30, 2.53s/it] +2025-02-06 01:31:53 - ERROR - stderr - 77%|███████▋ | 17174/22434 [15:24:13<3:40:23, 2.51s/it] +2025-02-06 01:31:53 - ERROR - stderr - +2025-02-06 01:31:53 - ERROR - stderr - +2025-02-06 01:31:53 - INFO - stdout - {'loss': 0.3697, 'grad_norm': 1.4443522691726685, 'learning_rate': 2.74758150372618e-06, 'epoch': 2.3} +2025-02-06 01:31:53 - ERROR - stderr - 77%|███████▋ | 17174/22434 [15:24:13<3:40:23, 2.51s/it] +2025-02-06 01:31:56 - ERROR - stderr - 77%|███████▋ | 17175/22434 [15:24:16<3:40:06, 2.51s/it] +2025-02-06 01:31:56 - ERROR - stderr - +2025-02-06 01:31:56 - ERROR - stderr - +2025-02-06 01:31:56 - INFO - stdout - {'loss': 0.3812, 'grad_norm': 1.5457407236099243, 'learning_rate': 2.7465875681417475e-06, 'epoch': 2.3} +2025-02-06 01:31:56 - ERROR - stderr - 77%|███████▋ | 17175/22434 [15:24:16<3:40:06, 2.51s/it] +2025-02-06 01:31:58 - ERROR - stderr - 77%|███████▋ | 17176/22434 [15:24:18<3:42:15, 2.54s/it] +2025-02-06 01:31:58 - ERROR - stderr - +2025-02-06 01:31:58 - ERROR - stderr - +2025-02-06 01:31:58 - INFO - stdout - {'loss': 0.3654, 'grad_norm': 1.337968111038208, 'learning_rate': 2.7455937837477577e-06, 'epoch': 2.3} +2025-02-06 01:31:58 - ERROR - stderr - 77%|███████▋ | 17176/22434 [15:24:18<3:42:15, 2.54s/it] +2025-02-06 01:32:01 - ERROR - stderr - 77%|███████▋ | 17177/22434 [15:24:21<3:41:28, 2.53s/it] +2025-02-06 01:32:01 - ERROR - stderr - +2025-02-06 01:32:01 - ERROR - stderr - +2025-02-06 01:32:01 - INFO - stdout - {'loss': 0.3874, 'grad_norm': 1.4253863096237183, 'learning_rate': 
2.7446001505649234e-06, 'epoch': 2.3} +2025-02-06 01:32:01 - ERROR - stderr - 77%|███████▋ | 17177/22434 [15:24:21<3:41:28, 2.53s/it] +2025-02-06 01:32:03 - ERROR - stderr - 77%|███████▋ | 17178/22434 [15:24:23<3:40:55, 2.52s/it] +2025-02-06 01:32:03 - ERROR - stderr - +2025-02-06 01:32:03 - ERROR - stderr - +2025-02-06 01:32:03 - INFO - stdout - {'loss': 0.3765, 'grad_norm': 1.4307308197021484, 'learning_rate': 2.7436066686139595e-06, 'epoch': 2.3} +2025-02-06 01:32:03 - ERROR - stderr - 77%|███████▋ | 17178/22434 [15:24:23<3:40:55, 2.52s/it] +2025-02-06 01:32:06 - ERROR - stderr - 77%|███████▋ | 17179/22434 [15:24:26<3:42:27, 2.54s/it] +2025-02-06 01:32:06 - ERROR - stderr - +2025-02-06 01:32:06 - ERROR - stderr - +2025-02-06 01:32:06 - INFO - stdout - {'loss': 0.4199, 'grad_norm': 1.6833326816558838, 'learning_rate': 2.742613337915564e-06, 'epoch': 2.3} +2025-02-06 01:32:06 - ERROR - stderr - 77%|███████▋ | 17179/22434 [15:24:26<3:42:27, 2.54s/it] +2025-02-06 01:32:09 - ERROR - stderr - 77%|███████▋ | 17180/22434 [15:24:28<3:42:26, 2.54s/it] +2025-02-06 01:32:09 - ERROR - stderr - +2025-02-06 01:32:09 - ERROR - stderr - +2025-02-06 01:32:09 - INFO - stdout - {'loss': 0.3581, 'grad_norm': 1.552001714706421, 'learning_rate': 2.7416201584904556e-06, 'epoch': 2.3} +2025-02-06 01:32:09 - ERROR - stderr - 77%|███████▋ | 17180/22434 [15:24:28<3:42:26, 2.54s/it] +2025-02-06 01:32:11 - ERROR - stderr - 77%|███████▋ | 17181/22434 [15:24:31<3:44:52, 2.57s/it] +2025-02-06 01:32:11 - ERROR - stderr - +2025-02-06 01:32:11 - ERROR - stderr - +2025-02-06 01:32:11 - INFO - stdout - {'loss': 0.3485, 'grad_norm': 1.3154784440994263, 'learning_rate': 2.7406271303593266e-06, 'epoch': 2.3} +2025-02-06 01:32:11 - ERROR - stderr - 77%|███████▋ | 17181/22434 [15:24:31<3:44:52, 2.57s/it] +2025-02-06 01:32:14 - ERROR - stderr - 77%|███████▋ | 17182/22434 [15:24:33<3:43:17, 2.55s/it] +2025-02-06 01:32:14 - ERROR - stderr - +2025-02-06 01:32:14 - ERROR - stderr - +2025-02-06 01:32:14 - INFO 
- stdout - {'loss': 0.3953, 'grad_norm': 1.4997074604034424, 'learning_rate': 2.7396342535428753e-06, 'epoch': 2.3} +2025-02-06 01:32:14 - ERROR - stderr - 77%|███████▋ | 17182/22434 [15:24:34<3:43:17, 2.55s/it] +2025-02-06 01:32:16 - ERROR - stderr - 77%|███████▋ | 17183/22434 [15:24:36<3:45:14, 2.57s/it] +2025-02-06 01:32:16 - ERROR - stderr - +2025-02-06 01:32:16 - ERROR - stderr - +2025-02-06 01:32:16 - INFO - stdout - {'loss': 0.3987, 'grad_norm': 1.3807638883590698, 'learning_rate': 2.7386415280618074e-06, 'epoch': 2.3} +2025-02-06 01:32:16 - ERROR - stderr - 77%|███████▋ | 17183/22434 [15:24:36<3:45:14, 2.57s/it] +2025-02-06 01:32:19 - ERROR - stderr - 77%|███████▋ | 17184/22434 [15:24:39<3:42:05, 2.54s/it] +2025-02-06 01:32:19 - ERROR - stderr - +2025-02-06 01:32:19 - ERROR - stderr - +2025-02-06 01:32:19 - INFO - stdout - {'loss': 0.4329, 'grad_norm': 1.716066837310791, 'learning_rate': 2.7376489539368014e-06, 'epoch': 2.3} +2025-02-06 01:32:19 - ERROR - stderr - 77%|███████▋ | 17184/22434 [15:24:39<3:42:05, 2.54s/it] +2025-02-06 01:32:21 - ERROR - stderr - 77%|███████▋ | 17185/22434 [15:24:41<3:41:32, 2.53s/it] +2025-02-06 01:32:21 - ERROR - stderr - +2025-02-06 01:32:21 - ERROR - stderr - +2025-02-06 01:32:21 - INFO - stdout - {'loss': 0.3533, 'grad_norm': 1.3511594533920288, 'learning_rate': 2.7366565311885605e-06, 'epoch': 2.3} +2025-02-06 01:32:21 - ERROR - stderr - 77%|███████▋ | 17185/22434 [15:24:41<3:41:32, 2.53s/it] +2025-02-06 01:32:24 - ERROR - stderr - 77%|███████▋ | 17186/22434 [15:24:44<3:41:44, 2.54s/it] +2025-02-06 01:32:24 - ERROR - stderr - +2025-02-06 01:32:24 - ERROR - stderr - +2025-02-06 01:32:24 - INFO - stdout - {'loss': 0.4061, 'grad_norm': 1.5046862363815308, 'learning_rate': 2.7356642598377604e-06, 'epoch': 2.3} +2025-02-06 01:32:24 - ERROR - stderr - 77%|███████▋ | 17186/22434 [15:24:44<3:41:44, 2.54s/it] +2025-02-06 01:32:26 - ERROR - stderr - 77%|███████▋ | 17187/22434 [15:24:46<3:44:06, 2.56s/it] +2025-02-06 01:32:27 - ERROR 
- stderr - +2025-02-06 01:32:27 - ERROR - stderr - +2025-02-06 01:32:27 - INFO - stdout - {'loss': 0.3459, 'grad_norm': 1.5308281183242798, 'learning_rate': 2.734672139905088e-06, 'epoch': 2.3} +2025-02-06 01:32:27 - ERROR - stderr - 77%|███████▋ | 17187/22434 [15:24:46<3:44:06, 2.56s/it] +2025-02-06 01:32:29 - ERROR - stderr - 77%|███████▋ | 17188/22434 [15:24:49<3:43:58, 2.56s/it] +2025-02-06 01:32:29 - ERROR - stderr - +2025-02-06 01:32:29 - ERROR - stderr - +2025-02-06 01:32:29 - INFO - stdout - {'loss': 0.4146, 'grad_norm': 1.618495225906372, 'learning_rate': 2.7336801714112217e-06, 'epoch': 2.3} +2025-02-06 01:32:29 - ERROR - stderr - 77%|███████▋ | 17188/22434 [15:24:49<3:43:58, 2.56s/it] +2025-02-06 01:32:32 - ERROR - stderr - 77%|███████▋ | 17189/22434 [15:24:51<3:46:23, 2.59s/it] +2025-02-06 01:32:32 - ERROR - stderr - +2025-02-06 01:32:32 - ERROR - stderr - +2025-02-06 01:32:32 - INFO - stdout - {'loss': 0.3742, 'grad_norm': 1.4642754793167114, 'learning_rate': 2.7326883543768403e-06, 'epoch': 2.3} +2025-02-06 01:32:32 - ERROR - stderr - 77%|███████▋ | 17189/22434 [15:24:51<3:46:23, 2.59s/it] +2025-02-06 01:32:34 - ERROR - stderr - 77%|███████▋ | 17190/22434 [15:24:54<3:43:12, 2.55s/it] +2025-02-06 01:32:34 - ERROR - stderr - +2025-02-06 01:32:34 - ERROR - stderr - +2025-02-06 01:32:34 - INFO - stdout - {'loss': 0.3519, 'grad_norm': 1.4755795001983643, 'learning_rate': 2.731696688822615e-06, 'epoch': 2.3} +2025-02-06 01:32:34 - ERROR - stderr - 77%|███████▋ | 17190/22434 [15:24:54<3:43:12, 2.55s/it] +2025-02-06 01:32:37 - ERROR - stderr - 77%|███████▋ | 17191/22434 [15:24:56<3:41:20, 2.53s/it] +2025-02-06 01:32:37 - ERROR - stderr - +2025-02-06 01:32:37 - ERROR - stderr - +2025-02-06 01:32:37 - INFO - stdout - {'loss': 0.3263, 'grad_norm': 1.3965767621994019, 'learning_rate': 2.730705174769218e-06, 'epoch': 2.3} +2025-02-06 01:32:37 - ERROR - stderr - 77%|███████▋ | 17191/22434 [15:24:56<3:41:20, 2.53s/it] +2025-02-06 01:32:39 - ERROR - stderr - 
77%|███████▋ | 17192/22434 [15:24:59<3:41:20, 2.53s/it] +2025-02-06 01:32:39 - ERROR - stderr - +2025-02-06 01:32:39 - ERROR - stderr - +2025-02-06 01:32:39 - INFO - stdout - {'loss': 0.4365, 'grad_norm': 1.6315058469772339, 'learning_rate': 2.7297138122373158e-06, 'epoch': 2.3} +2025-02-06 01:32:39 - ERROR - stderr - 77%|███████▋ | 17192/22434 [15:24:59<3:41:20, 2.53s/it] +2025-02-06 01:32:42 - ERROR - stderr - 77%|███████▋ | 17193/22434 [15:25:02<3:42:36, 2.55s/it] +2025-02-06 01:32:42 - ERROR - stderr - +2025-02-06 01:32:42 - ERROR - stderr - +2025-02-06 01:32:42 - INFO - stdout - {'loss': 0.3673, 'grad_norm': 1.4860994815826416, 'learning_rate': 2.728722601247572e-06, 'epoch': 2.3} +2025-02-06 01:32:42 - ERROR - stderr - 77%|███████▋ | 17193/22434 [15:25:02<3:42:36, 2.55s/it] +2025-02-06 01:32:44 - ERROR - stderr - 77%|███████▋ | 17194/22434 [15:25:04<3:40:37, 2.53s/it] +2025-02-06 01:32:44 - ERROR - stderr - +2025-02-06 01:32:44 - ERROR - stderr - +2025-02-06 01:32:44 - INFO - stdout - {'loss': 0.4365, 'grad_norm': 1.5704338550567627, 'learning_rate': 2.7277315418206476e-06, 'epoch': 2.3} +2025-02-06 01:32:44 - ERROR - stderr - 77%|███████▋ | 17194/22434 [15:25:04<3:40:37, 2.53s/it] +2025-02-06 01:32:47 - ERROR - stderr - 77%|███████▋ | 17195/22434 [15:25:07<3:41:21, 2.54s/it] +2025-02-06 01:32:47 - ERROR - stderr - +2025-02-06 01:32:47 - ERROR - stderr - +2025-02-06 01:32:47 - INFO - stdout - {'loss': 0.3541, 'grad_norm': 1.4059998989105225, 'learning_rate': 2.7267406339771995e-06, 'epoch': 2.3} +2025-02-06 01:32:47 - ERROR - stderr - 77%|███████▋ | 17195/22434 [15:25:07<3:41:21, 2.54s/it] +2025-02-06 01:32:49 - ERROR - stderr - 77%|███████▋ | 17196/22434 [15:25:09<3:39:36, 2.52s/it] +2025-02-06 01:32:49 - ERROR - stderr - +2025-02-06 01:32:49 - ERROR - stderr - +2025-02-06 01:32:49 - INFO - stdout - {'loss': 0.3387, 'grad_norm': 1.5870018005371094, 'learning_rate': 2.7257498777378843e-06, 'epoch': 2.3} +2025-02-06 01:32:49 - ERROR - stderr - 77%|███████▋ | 
17196/22434 [15:25:09<3:39:36, 2.52s/it] +2025-02-06 01:32:52 - ERROR - stderr - 77%|███████▋ | 17197/22434 [15:25:12<3:41:38, 2.54s/it] +2025-02-06 01:32:52 - ERROR - stderr - +2025-02-06 01:32:52 - ERROR - stderr - +2025-02-06 01:32:52 - INFO - stdout - {'loss': 0.3506, 'grad_norm': 1.4791349172592163, 'learning_rate': 2.7247592731233552e-06, 'epoch': 2.3} +2025-02-06 01:32:52 - ERROR - stderr - 77%|███████▋ | 17197/22434 [15:25:12<3:41:38, 2.54s/it] +2025-02-06 01:32:54 - ERROR - stderr - 77%|███████▋ | 17198/22434 [15:25:14<3:39:10, 2.51s/it] +2025-02-06 01:32:54 - ERROR - stderr - +2025-02-06 01:32:54 - ERROR - stderr - +2025-02-06 01:32:54 - INFO - stdout - {'loss': 0.4395, 'grad_norm': 1.7134732007980347, 'learning_rate': 2.723768820154251e-06, 'epoch': 2.3} +2025-02-06 01:32:54 - ERROR - stderr - 77%|███████▋ | 17198/22434 [15:25:14<3:39:10, 2.51s/it] +2025-02-06 01:32:57 - ERROR - stderr - 77%|███████▋ | 17199/22434 [15:25:17<3:41:00, 2.53s/it] +2025-02-06 01:32:57 - ERROR - stderr - +2025-02-06 01:32:57 - ERROR - stderr - +2025-02-06 01:32:57 - INFO - stdout - {'loss': 0.42, 'grad_norm': 1.6738784313201904, 'learning_rate': 2.72277851885123e-06, 'epoch': 2.3} +2025-02-06 01:32:57 - ERROR - stderr - 77%|███████▋ | 17199/22434 [15:25:17<3:41:00, 2.53s/it] +2025-02-06 01:32:59 - ERROR - stderr - 77%|███████▋ | 17200/22434 [15:25:19<3:38:35, 2.51s/it] +2025-02-06 01:32:59 - ERROR - stderr - +2025-02-06 01:32:59 - ERROR - stderr - +2025-02-06 01:32:59 - INFO - stdout - {'loss': 0.3898, 'grad_norm': 1.5754859447479248, 'learning_rate': 2.72178836923492e-06, 'epoch': 2.3} +2025-02-06 01:32:59 - ERROR - stderr - 77%|███████▋ | 17200/22434 [15:25:19<3:38:35, 2.51s/it] +2025-02-06 01:33:02 - ERROR - stderr - 77%|███████▋ | 17201/22434 [15:25:22<3:37:34, 2.49s/it] +2025-02-06 01:33:02 - ERROR - stderr - +2025-02-06 01:33:02 - ERROR - stderr - +2025-02-06 01:33:02 - INFO - stdout - {'loss': 0.3671, 'grad_norm': 1.5172200202941895, 'learning_rate': 
2.7207983713259713e-06, 'epoch': 2.3} +2025-02-06 01:33:02 - ERROR - stderr - 77%|███████▋ | 17201/22434 [15:25:22<3:37:34, 2.49s/it] +2025-02-06 01:33:05 - ERROR - stderr - 77%|███████▋ | 17202/22434 [15:25:24<3:43:45, 2.57s/it] +2025-02-06 01:33:05 - ERROR - stderr - +2025-02-06 01:33:05 - ERROR - stderr - +2025-02-06 01:33:05 - INFO - stdout - {'loss': 0.3796, 'grad_norm': 1.784548044204712, 'learning_rate': 2.719808525145017e-06, 'epoch': 2.3} +2025-02-06 01:33:05 - ERROR - stderr - 77%|███████▋ | 17202/22434 [15:25:24<3:43:45, 2.57s/it] +2025-02-06 01:33:07 - ERROR - stderr - 77%|███████▋ | 17203/22434 [15:25:27<3:41:55, 2.55s/it] +2025-02-06 01:33:07 - ERROR - stderr - +2025-02-06 01:33:07 - ERROR - stderr - +2025-02-06 01:33:07 - INFO - stdout - {'loss': 0.3149, 'grad_norm': 1.2046096324920654, 'learning_rate': 2.7188188307126817e-06, 'epoch': 2.3} +2025-02-06 01:33:07 - ERROR - stderr - 77%|███████▋ | 17203/22434 [15:25:27<3:41:55, 2.55s/it] +2025-02-06 01:33:10 - ERROR - stderr - 77%|███████▋ | 17204/22434 [15:25:29<3:40:11, 2.53s/it] +2025-02-06 01:33:10 - ERROR - stderr - +2025-02-06 01:33:10 - ERROR - stderr - +2025-02-06 01:33:10 - INFO - stdout - {'loss': 0.426, 'grad_norm': 1.6806238889694214, 'learning_rate': 2.717829288049607e-06, 'epoch': 2.3} +2025-02-06 01:33:10 - ERROR - stderr - 77%|███████▋ | 17204/22434 [15:25:29<3:40:11, 2.53s/it] +2025-02-06 01:33:12 - ERROR - stderr - 77%|███████▋ | 17205/22434 [15:25:32<3:40:58, 2.54s/it] +2025-02-06 01:33:12 - ERROR - stderr - +2025-02-06 01:33:12 - ERROR - stderr - +2025-02-06 01:33:12 - INFO - stdout - {'loss': 0.3518, 'grad_norm': 1.468948483467102, 'learning_rate': 2.7168398971764088e-06, 'epoch': 2.3} +2025-02-06 01:33:12 - ERROR - stderr - 77%|███████▋ | 17205/22434 [15:25:32<3:40:58, 2.54s/it] +2025-02-06 01:33:15 - ERROR - stderr - 77%|███████▋ | 17206/22434 [15:25:34<3:41:11, 2.54s/it] +2025-02-06 01:33:15 - ERROR - stderr - +2025-02-06 01:33:15 - ERROR - stderr - +2025-02-06 01:33:15 - INFO - 
stdout - {'loss': 0.3991, 'grad_norm': 1.4999831914901733, 'learning_rate': 2.7158506581137147e-06, 'epoch': 2.3} +2025-02-06 01:33:15 - ERROR - stderr - 77%|███████▋ | 17206/22434 [15:25:34<3:41:11, 2.54s/it] +2025-02-06 01:33:17 - ERROR - stderr - 77%|███████▋ | 17207/22434 [15:25:37<3:43:51, 2.57s/it] +2025-02-06 01:33:17 - ERROR - stderr - +2025-02-06 01:33:17 - ERROR - stderr - +2025-02-06 01:33:17 - INFO - stdout - {'loss': 0.4184, 'grad_norm': 1.8045616149902344, 'learning_rate': 2.7148615708821422e-06, 'epoch': 2.3} +2025-02-06 01:33:17 - ERROR - stderr - 77%|███████▋ | 17207/22434 [15:25:37<3:43:51, 2.57s/it] +2025-02-06 01:33:20 - ERROR - stderr - 77%|███████▋ | 17208/22434 [15:25:40<3:41:48, 2.55s/it] +2025-02-06 01:33:20 - ERROR - stderr - +2025-02-06 01:33:20 - ERROR - stderr - +2025-02-06 01:33:20 - INFO - stdout - {'loss': 0.4067, 'grad_norm': 1.570469617843628, 'learning_rate': 2.713872635502307e-06, 'epoch': 2.3} +2025-02-06 01:33:20 - ERROR - stderr - 77%|███████▋ | 17208/22434 [15:25:40<3:41:48, 2.55s/it] +2025-02-06 01:33:22 - ERROR - stderr - 77%|███████▋ | 17209/22434 [15:25:42<3:38:27, 2.51s/it] +2025-02-06 01:33:22 - ERROR - stderr - +2025-02-06 01:33:22 - ERROR - stderr - +2025-02-06 01:33:22 - INFO - stdout - {'loss': 0.3736, 'grad_norm': 1.5693238973617554, 'learning_rate': 2.7128838519948307e-06, 'epoch': 2.3} +2025-02-06 01:33:22 - ERROR - stderr - 77%|███████▋ | 17209/22434 [15:25:42<3:38:27, 2.51s/it] +2025-02-06 01:33:25 - ERROR - stderr - 77%|███████▋ | 17210/22434 [15:25:45<3:47:15, 2.61s/it] +2025-02-06 01:33:25 - ERROR - stderr - +2025-02-06 01:33:25 - ERROR - stderr - +2025-02-06 01:33:25 - INFO - stdout - {'loss': 0.3934, 'grad_norm': 1.5478546619415283, 'learning_rate': 2.711895220380315e-06, 'epoch': 2.3} +2025-02-06 01:33:25 - ERROR - stderr - 77%|███████▋ | 17210/22434 [15:25:45<3:47:15, 2.61s/it] +2025-02-06 01:33:28 - ERROR - stderr - 77%|███████▋ | 17211/22434 [15:25:47<3:44:12, 2.58s/it] +2025-02-06 01:33:28 - ERROR - 
stderr - +2025-02-06 01:33:28 - ERROR - stderr - +2025-02-06 01:33:28 - INFO - stdout - {'loss': 0.3455, 'grad_norm': 1.564774751663208, 'learning_rate': 2.7109067406793688e-06, 'epoch': 2.3} +2025-02-06 01:33:28 - ERROR - stderr - 77%|███████▋ | 17211/22434 [15:25:47<3:44:12, 2.58s/it] +2025-02-06 01:33:30 - ERROR - stderr - 77%|███████▋ | 17212/22434 [15:25:50<3:42:44, 2.56s/it] +2025-02-06 01:33:30 - ERROR - stderr - +2025-02-06 01:33:30 - ERROR - stderr - +2025-02-06 01:33:30 - INFO - stdout - {'loss': 0.3866, 'grad_norm': 1.3911818265914917, 'learning_rate': 2.7099184129125967e-06, 'epoch': 2.3} +2025-02-06 01:33:30 - ERROR - stderr - 77%|███████▋ | 17212/22434 [15:25:50<3:42:44, 2.56s/it] +2025-02-06 01:33:33 - ERROR - stderr - 77%|███████▋ | 17213/22434 [15:25:52<3:42:01, 2.55s/it] +2025-02-06 01:33:33 - ERROR - stderr - +2025-02-06 01:33:33 - ERROR - stderr - +2025-02-06 01:33:33 - INFO - stdout - {'loss': 0.3294, 'grad_norm': 1.2479983568191528, 'learning_rate': 2.7089302371005986e-06, 'epoch': 2.3} +2025-02-06 01:33:33 - ERROR - stderr - 77%|███████▋ | 17213/22434 [15:25:52<3:42:01, 2.55s/it] +2025-02-06 01:33:35 - ERROR - stderr - 77%|███████▋ | 17214/22434 [15:25:55<3:42:07, 2.55s/it] +2025-02-06 01:33:35 - ERROR - stderr - +2025-02-06 01:33:35 - ERROR - stderr - +2025-02-06 01:33:35 - INFO - stdout - {'loss': 0.4193, 'grad_norm': 1.4475657939910889, 'learning_rate': 2.7079422132639745e-06, 'epoch': 2.3} +2025-02-06 01:33:35 - ERROR - stderr - 77%|███████▋ | 17214/22434 [15:25:55<3:42:07, 2.55s/it] +2025-02-06 01:33:38 - ERROR - stderr - 77%|███████▋ | 17215/22434 [15:25:57<3:41:19, 2.54s/it] +2025-02-06 01:33:38 - ERROR - stderr - +2025-02-06 01:33:38 - ERROR - stderr - +2025-02-06 01:33:38 - INFO - stdout - {'loss': 0.3894, 'grad_norm': 1.4607634544372559, 'learning_rate': 2.7069543414233157e-06, 'epoch': 2.3} +2025-02-06 01:33:38 - ERROR - stderr - 77%|███████▋ | 17215/22434 [15:25:57<3:41:19, 2.54s/it] +2025-02-06 01:33:40 - ERROR - stderr - 
77%|███████▋ | 17216/22434 [15:26:00<3:40:39, 2.54s/it] +2025-02-06 01:33:40 - ERROR - stderr - +2025-02-06 01:33:40 - ERROR - stderr - +2025-02-06 01:33:40 - INFO - stdout - {'loss': 0.3502, 'grad_norm': 1.547248363494873, 'learning_rate': 2.7059666215992165e-06, 'epoch': 2.3} +2025-02-06 01:33:40 - ERROR - stderr - 77%|███████▋ | 17216/22434 [15:26:00<3:40:39, 2.54s/it] +2025-02-06 01:33:43 - ERROR - stderr - 77%|███████▋ | 17217/22434 [15:26:02<3:38:17, 2.51s/it] +2025-02-06 01:33:43 - ERROR - stderr - +2025-02-06 01:33:43 - ERROR - stderr - +2025-02-06 01:33:43 - INFO - stdout - {'loss': 0.361, 'grad_norm': 1.5162708759307861, 'learning_rate': 2.7049790538122623e-06, 'epoch': 2.3} +2025-02-06 01:33:43 - ERROR - stderr - 77%|███████▋ | 17217/22434 [15:26:02<3:38:17, 2.51s/it] +2025-02-06 01:33:45 - ERROR - stderr - 77%|███████▋ | 17218/22434 [15:26:05<3:37:08, 2.50s/it] +2025-02-06 01:33:45 - ERROR - stderr - +2025-02-06 01:33:45 - ERROR - stderr - +2025-02-06 01:33:45 - INFO - stdout - {'loss': 0.3743, 'grad_norm': 1.582796335220337, 'learning_rate': 2.703991638083042e-06, 'epoch': 2.3} +2025-02-06 01:33:45 - ERROR - stderr - 77%|███████▋ | 17218/22434 [15:26:05<3:37:08, 2.50s/it] +2025-02-06 01:33:48 - ERROR - stderr - 77%|███████▋ | 17219/22434 [15:26:07<3:36:02, 2.49s/it] +2025-02-06 01:33:48 - ERROR - stderr - +2025-02-06 01:33:48 - ERROR - stderr - +2025-02-06 01:33:48 - INFO - stdout - {'loss': 0.421, 'grad_norm': 1.6232115030288696, 'learning_rate': 2.703004374432129e-06, 'epoch': 2.3} +2025-02-06 01:33:48 - ERROR - stderr - 77%|███████▋ | 17219/22434 [15:26:07<3:36:02, 2.49s/it] +2025-02-06 01:33:50 - ERROR - stderr - 77%|███████▋ | 17220/22434 [15:26:10<3:34:52, 2.47s/it] +2025-02-06 01:33:50 - ERROR - stderr - +2025-02-06 01:33:50 - ERROR - stderr - +2025-02-06 01:33:50 - INFO - stdout - {'loss': 0.4009, 'grad_norm': 1.662674069404602, 'learning_rate': 2.702017262880111e-06, 'epoch': 2.3} +2025-02-06 01:33:50 - ERROR - stderr - 77%|███████▋ | 
17220/22434 [15:26:10<3:34:52, 2.47s/it] +2025-02-06 01:33:52 - ERROR - stderr - 77%|███████▋ | 17221/22434 [15:26:12<3:36:10, 2.49s/it] +2025-02-06 01:33:53 - ERROR - stderr - +2025-02-06 01:33:53 - ERROR - stderr - +2025-02-06 01:33:53 - INFO - stdout - {'loss': 0.4057, 'grad_norm': 1.574709177017212, 'learning_rate': 2.7010303034475616e-06, 'epoch': 2.3} +2025-02-06 01:33:53 - ERROR - stderr - 77%|███████▋ | 17221/22434 [15:26:12<3:36:10, 2.49s/it] +2025-02-06 01:33:55 - ERROR - stderr - 77%|███████▋ | 17222/22434 [15:26:15<3:42:40, 2.56s/it] +2025-02-06 01:33:55 - ERROR - stderr - +2025-02-06 01:33:55 - ERROR - stderr - +2025-02-06 01:33:55 - INFO - stdout - {'loss': 0.3542, 'grad_norm': 1.4702725410461426, 'learning_rate': 2.7000434961550458e-06, 'epoch': 2.3} +2025-02-06 01:33:55 - ERROR - stderr - 77%|███████▋ | 17222/22434 [15:26:15<3:42:40, 2.56s/it] +2025-02-06 01:33:58 - ERROR - stderr - 77%|███████▋ | 17223/22434 [15:26:18<3:41:20, 2.55s/it] +2025-02-06 01:33:58 - ERROR - stderr - +2025-02-06 01:33:58 - ERROR - stderr - +2025-02-06 01:33:58 - INFO - stdout - {'loss': 0.3845, 'grad_norm': 1.8665411472320557, 'learning_rate': 2.6990568410231432e-06, 'epoch': 2.3} +2025-02-06 01:33:58 - ERROR - stderr - 77%|███████▋ | 17223/22434 [15:26:18<3:41:20, 2.55s/it] +2025-02-06 01:34:00 - ERROR - stderr - 77%|███████▋ | 17224/22434 [15:26:20<3:44:26, 2.58s/it] +2025-02-06 01:34:00 - ERROR - stderr - +2025-02-06 01:34:00 - ERROR - stderr - +2025-02-06 01:34:00 - INFO - stdout - {'loss': 0.4137, 'grad_norm': 1.6615039110183716, 'learning_rate': 2.6980703380724093e-06, 'epoch': 2.3} +2025-02-06 01:34:00 - ERROR - stderr - 77%|███████▋ | 17224/22434 [15:26:20<3:44:26, 2.58s/it] +2025-02-06 01:34:03 - ERROR - stderr - 77%|███████▋ | 17225/22434 [15:26:23<3:41:40, 2.55s/it] +2025-02-06 01:34:03 - ERROR - stderr - +2025-02-06 01:34:03 - ERROR - stderr - +2025-02-06 01:34:03 - INFO - stdout - {'loss': 0.3597, 'grad_norm': 1.4534879922866821, 'learning_rate': 
2.697083987323418e-06, 'epoch': 2.3} +2025-02-06 01:34:03 - ERROR - stderr - 77%|███████▋ | 17225/22434 [15:26:23<3:41:40, 2.55s/it] +2025-02-06 01:34:05 - ERROR - stderr - 77%|███████▋ | 17226/22434 [15:26:25<3:40:07, 2.54s/it] +2025-02-06 01:34:05 - ERROR - stderr - +2025-02-06 01:34:05 - ERROR - stderr - +2025-02-06 01:34:05 - INFO - stdout - {'loss': 0.3954, 'grad_norm': 1.6492722034454346, 'learning_rate': 2.69609778879672e-06, 'epoch': 2.3} +2025-02-06 01:34:05 - ERROR - stderr - 77%|███████▋ | 17226/22434 [15:26:25<3:40:07, 2.54s/it] +2025-02-06 01:34:08 - ERROR - stderr - 77%|███████▋ | 17227/22434 [15:26:28<3:37:23, 2.51s/it] +2025-02-06 01:34:08 - ERROR - stderr - +2025-02-06 01:34:08 - ERROR - stderr - +2025-02-06 01:34:08 - INFO - stdout - {'loss': 0.424, 'grad_norm': 1.6218595504760742, 'learning_rate': 2.6951117425128715e-06, 'epoch': 2.3} +2025-02-06 01:34:08 - ERROR - stderr - 77%|███████▋ | 17227/22434 [15:26:28<3:37:23, 2.51s/it] +2025-02-06 01:34:10 - ERROR - stderr - 77%|███████▋ | 17228/22434 [15:26:30<3:35:46, 2.49s/it] +2025-02-06 01:34:10 - ERROR - stderr - +2025-02-06 01:34:10 - ERROR - stderr - +2025-02-06 01:34:10 - INFO - stdout - {'loss': 0.3775, 'grad_norm': 1.7219208478927612, 'learning_rate': 2.694125848492434e-06, 'epoch': 2.3} +2025-02-06 01:34:10 - ERROR - stderr - 77%|███████▋ | 17228/22434 [15:26:30<3:35:46, 2.49s/it] +2025-02-06 01:34:13 - ERROR - stderr - 77%|███████▋ | 17229/22434 [15:26:33<3:37:15, 2.50s/it] +2025-02-06 01:34:13 - ERROR - stderr - +2025-02-06 01:34:13 - ERROR - stderr - +2025-02-06 01:34:13 - INFO - stdout - {'loss': 0.342, 'grad_norm': 1.6455587148666382, 'learning_rate': 2.6931401067559503e-06, 'epoch': 2.3} +2025-02-06 01:34:13 - ERROR - stderr - 77%|███████▋ | 17229/22434 [15:26:33<3:37:15, 2.50s/it] +2025-02-06 01:34:15 - ERROR - stderr - 77%|███████▋ | 17230/22434 [15:26:35<3:37:58, 2.51s/it] +2025-02-06 01:34:15 - ERROR - stderr - +2025-02-06 01:34:15 - ERROR - stderr - +2025-02-06 01:34:15 - INFO - 
stdout - {'loss': 0.3536, 'grad_norm': 1.525317668914795, 'learning_rate': 2.6921545173239684e-06, 'epoch': 2.3} +2025-02-06 01:34:15 - ERROR - stderr - 77%|███████▋ | 17230/22434 [15:26:35<3:37:58, 2.51s/it] +2025-02-06 01:34:18 - ERROR - stderr - 77%|███████▋ | 17231/22434 [15:26:38<3:38:42, 2.52s/it] +2025-02-06 01:34:18 - ERROR - stderr - +2025-02-06 01:34:18 - ERROR - stderr - +2025-02-06 01:34:18 - INFO - stdout - {'loss': 0.3933, 'grad_norm': 1.5554383993148804, 'learning_rate': 2.691169080217032e-06, 'epoch': 2.3} +2025-02-06 01:34:18 - ERROR - stderr - 77%|███████▋ | 17231/22434 [15:26:38<3:38:42, 2.52s/it] +2025-02-06 01:34:20 - ERROR - stderr - 77%|███████▋ | 17232/22434 [15:26:40<3:38:39, 2.52s/it] +2025-02-06 01:34:20 - ERROR - stderr - +2025-02-06 01:34:20 - ERROR - stderr - +2025-02-06 01:34:20 - INFO - stdout - {'loss': 0.3483, 'grad_norm': 1.5328584909439087, 'learning_rate': 2.690183795455684e-06, 'epoch': 2.3} +2025-02-06 01:34:20 - ERROR - stderr - 77%|███████▋ | 17232/22434 [15:26:40<3:38:39, 2.52s/it] +2025-02-06 01:34:23 - ERROR - stderr - 77%|███████▋ | 17233/22434 [15:26:43<3:38:33, 2.52s/it] +2025-02-06 01:34:23 - ERROR - stderr - +2025-02-06 01:34:23 - ERROR - stderr - +2025-02-06 01:34:23 - INFO - stdout - {'loss': 0.3311, 'grad_norm': 1.4847791194915771, 'learning_rate': 2.6891986630604595e-06, 'epoch': 2.3} +2025-02-06 01:34:23 - ERROR - stderr - 77%|███████▋ | 17233/22434 [15:26:43<3:38:33, 2.52s/it] +2025-02-06 01:34:25 - ERROR - stderr - 77%|███████▋ | 17234/22434 [15:26:45<3:37:59, 2.52s/it] +2025-02-06 01:34:25 - ERROR - stderr - +2025-02-06 01:34:25 - ERROR - stderr - +2025-02-06 01:34:25 - INFO - stdout - {'loss': 0.3361, 'grad_norm': 1.468964695930481, 'learning_rate': 2.6882136830518923e-06, 'epoch': 2.3} +2025-02-06 01:34:25 - ERROR - stderr - 77%|███████▋ | 17234/22434 [15:26:45<3:37:59, 2.52s/it] +2025-02-06 01:34:28 - ERROR - stderr - 77%|███████▋ | 17235/22434 [15:26:48<3:37:33, 2.51s/it] +2025-02-06 01:34:28 - ERROR - 
stderr - +2025-02-06 01:34:28 - ERROR - stderr - +2025-02-06 01:34:28 - INFO - stdout - {'loss': 0.3384, 'grad_norm': 1.5205678939819336, 'learning_rate': 2.6872288554505157e-06, 'epoch': 2.3} +2025-02-06 01:34:28 - ERROR - stderr - 77%|███████▋ | 17235/22434 [15:26:48<3:37:33, 2.51s/it] +2025-02-06 01:34:30 - ERROR - stderr - 77%|███████▋ | 17236/22434 [15:26:50<3:35:53, 2.49s/it] +2025-02-06 01:34:30 - ERROR - stderr - +2025-02-06 01:34:30 - ERROR - stderr - +2025-02-06 01:34:30 - INFO - stdout - {'loss': 0.3739, 'grad_norm': 1.5176019668579102, 'learning_rate': 2.686244180276855e-06, 'epoch': 2.3} +2025-02-06 01:34:30 - ERROR - stderr - 77%|███████▋ | 17236/22434 [15:26:50<3:35:53, 2.49s/it] +2025-02-06 01:34:33 - ERROR - stderr - 77%|███████▋ | 17237/22434 [15:26:53<3:34:58, 2.48s/it] +2025-02-06 01:34:33 - ERROR - stderr - +2025-02-06 01:34:33 - ERROR - stderr - +2025-02-06 01:34:33 - INFO - stdout - {'loss': 0.3426, 'grad_norm': 1.576094627380371, 'learning_rate': 2.685259657551439e-06, 'epoch': 2.31} +2025-02-06 01:34:33 - ERROR - stderr - 77%|███████▋ | 17237/22434 [15:26:53<3:34:58, 2.48s/it] +2025-02-06 01:34:35 - ERROR - stderr - 77%|███████▋ | 17238/22434 [15:26:55<3:36:41, 2.50s/it] +2025-02-06 01:34:35 - ERROR - stderr - +2025-02-06 01:34:35 - ERROR - stderr - +2025-02-06 01:34:35 - INFO - stdout - {'loss': 0.4112, 'grad_norm': 1.527039885520935, 'learning_rate': 2.68427528729478e-06, 'epoch': 2.31} +2025-02-06 01:34:35 - ERROR - stderr - 77%|███████▋ | 17238/22434 [15:26:55<3:36:41, 2.50s/it] +2025-02-06 01:34:38 - ERROR - stderr - 77%|███████▋ | 17239/22434 [15:26:58<3:34:17, 2.48s/it] +2025-02-06 01:34:38 - ERROR - stderr - +2025-02-06 01:34:38 - ERROR - stderr - +2025-02-06 01:34:38 - INFO - stdout - {'loss': 0.3263, 'grad_norm': 1.5439746379852295, 'learning_rate': 2.683291069527405e-06, 'epoch': 2.31} +2025-02-06 01:34:38 - ERROR - stderr - 77%|███████▋ | 17239/22434 [15:26:58<3:34:17, 2.48s/it] +2025-02-06 01:34:40 - ERROR - stderr - 
77%|███████▋ | 17240/22434 [15:27:00<3:35:30, 2.49s/it] +2025-02-06 01:34:40 - ERROR - stderr - +2025-02-06 01:34:40 - ERROR - stderr - +2025-02-06 01:34:40 - INFO - stdout - {'loss': 0.3478, 'grad_norm': 1.3784844875335693, 'learning_rate': 2.6823070042698276e-06, 'epoch': 2.31} +2025-02-06 01:34:40 - ERROR - stderr - 77%|███████▋ | 17240/22434 [15:27:00<3:35:30, 2.49s/it] +2025-02-06 01:34:43 - ERROR - stderr - 77%|███████▋ | 17241/22434 [15:27:03<3:34:16, 2.48s/it] +2025-02-06 01:34:43 - ERROR - stderr - +2025-02-06 01:34:43 - ERROR - stderr - +2025-02-06 01:34:43 - INFO - stdout - {'loss': 0.3683, 'grad_norm': 1.3741300106048584, 'learning_rate': 2.681323091542557e-06, 'epoch': 2.31} +2025-02-06 01:34:43 - ERROR - stderr - 77%|███████▋ | 17241/22434 [15:27:03<3:34:16, 2.48s/it] +2025-02-06 01:34:45 - ERROR - stderr - 77%|███████▋ | 17242/22434 [15:27:05<3:36:55, 2.51s/it] +2025-02-06 01:34:45 - ERROR - stderr - +2025-02-06 01:34:45 - ERROR - stderr - +2025-02-06 01:34:45 - INFO - stdout - {'loss': 0.4138, 'grad_norm': 1.692230463027954, 'learning_rate': 2.6803393313661063e-06, 'epoch': 2.31} +2025-02-06 01:34:45 - ERROR - stderr - 77%|███████▋ | 17242/22434 [15:27:05<3:36:55, 2.51s/it] +2025-02-06 01:34:48 - ERROR - stderr - 77%|███████▋ | 17243/22434 [15:27:08<3:36:47, 2.51s/it] +2025-02-06 01:34:48 - ERROR - stderr - +2025-02-06 01:34:48 - ERROR - stderr - +2025-02-06 01:34:48 - INFO - stdout - {'loss': 0.3371, 'grad_norm': 1.4910967350006104, 'learning_rate': 2.6793557237609724e-06, 'epoch': 2.31} +2025-02-06 01:34:48 - ERROR - stderr - 77%|███████▋ | 17243/22434 [15:27:08<3:36:47, 2.51s/it] +2025-02-06 01:34:50 - ERROR - stderr - 77%|███████▋ | 17244/22434 [15:27:10<3:37:40, 2.52s/it] +2025-02-06 01:34:50 - ERROR - stderr - +2025-02-06 01:34:50 - ERROR - stderr - +2025-02-06 01:34:50 - INFO - stdout - {'loss': 0.3539, 'grad_norm': 1.342836618423462, 'learning_rate': 2.67837226874767e-06, 'epoch': 2.31} +2025-02-06 01:34:50 - ERROR - stderr - 77%|███████▋ | 
17244/22434 [15:27:10<3:37:40, 2.52s/it] +2025-02-06 01:34:53 - ERROR - stderr - 77%|███████▋ | 17245/22434 [15:27:13<3:39:09, 2.53s/it] +2025-02-06 01:34:53 - ERROR - stderr - +2025-02-06 01:34:53 - ERROR - stderr - +2025-02-06 01:34:53 - INFO - stdout - {'loss': 0.4028, 'grad_norm': 1.5732731819152832, 'learning_rate': 2.677388966346688e-06, 'epoch': 2.31} +2025-02-06 01:34:53 - ERROR - stderr - 77%|███████▋ | 17245/22434 [15:27:13<3:39:09, 2.53s/it] +2025-02-06 01:34:55 - ERROR - stderr - 77%|███████▋ | 17246/22434 [15:27:15<3:36:46, 2.51s/it] +2025-02-06 01:34:55 - ERROR - stderr - +2025-02-06 01:34:55 - ERROR - stderr - +2025-02-06 01:34:55 - INFO - stdout - {'loss': 0.4016, 'grad_norm': 1.5863515138626099, 'learning_rate': 2.6764058165785233e-06, 'epoch': 2.31} +2025-02-06 01:34:55 - ERROR - stderr - 77%|███████▋ | 17246/22434 [15:27:15<3:36:46, 2.51s/it] +2025-02-06 01:34:58 - ERROR - stderr - 77%|███████▋ | 17247/22434 [15:27:18<3:35:39, 2.49s/it] +2025-02-06 01:34:58 - ERROR - stderr - +2025-02-06 01:34:58 - ERROR - stderr - +2025-02-06 01:34:58 - INFO - stdout - {'loss': 0.3503, 'grad_norm': 1.295082688331604, 'learning_rate': 2.675422819463678e-06, 'epoch': 2.31} +2025-02-06 01:34:58 - ERROR - stderr - 77%|███████▋ | 17247/22434 [15:27:18<3:35:39, 2.49s/it] +2025-02-06 01:35:00 - ERROR - stderr - 77%|███████▋ | 17248/22434 [15:27:20<3:38:28, 2.53s/it] +2025-02-06 01:35:01 - ERROR - stderr - +2025-02-06 01:35:01 - ERROR - stderr - +2025-02-06 01:35:01 - INFO - stdout - {'loss': 0.3734, 'grad_norm': 1.596990704536438, 'learning_rate': 2.674439975022628e-06, 'epoch': 2.31} +2025-02-06 01:35:01 - ERROR - stderr - 77%|███████▋ | 17248/22434 [15:27:20<3:38:28, 2.53s/it] +2025-02-06 01:35:03 - ERROR - stderr - 77%|███████▋ | 17249/22434 [15:27:23<3:43:54, 2.59s/it] +2025-02-06 01:35:03 - ERROR - stderr - +2025-02-06 01:35:03 - ERROR - stderr - +2025-02-06 01:35:03 - INFO - stdout - {'loss': 0.354, 'grad_norm': 1.6416265964508057, 'learning_rate': 
2.673457283275873e-06, 'epoch': 2.31} +2025-02-06 01:35:03 - ERROR - stderr - 77%|███████▋ | 17249/22434 [15:27:23<3:43:54, 2.59s/it] +2025-02-06 01:35:06 - ERROR - stderr - 77%|███████▋ | 17250/22434 [15:27:26<3:41:41, 2.57s/it] +2025-02-06 01:35:06 - ERROR - stderr - +2025-02-06 01:35:06 - ERROR - stderr - +2025-02-06 01:35:06 - INFO - stdout - {'loss': 0.3637, 'grad_norm': 1.4452191591262817, 'learning_rate': 2.672474744243888e-06, 'epoch': 2.31} +2025-02-06 01:35:06 - ERROR - stderr - 77%|███████▋ | 17250/22434 [15:27:26<3:41:41, 2.57s/it] +2025-02-06 01:35:08 - ERROR - stderr - 77%|███████▋ | 17251/22434 [15:27:28<3:40:32, 2.55s/it] +2025-02-06 01:35:08 - ERROR - stderr - +2025-02-06 01:35:08 - ERROR - stderr - +2025-02-06 01:35:08 - INFO - stdout - {'loss': 0.329, 'grad_norm': 1.4098650217056274, 'learning_rate': 2.671492357947155e-06, 'epoch': 2.31} +2025-02-06 01:35:08 - ERROR - stderr - 77%|███████▋ | 17251/22434 [15:27:28<3:40:32, 2.55s/it] +2025-02-06 01:35:11 - ERROR - stderr - 77%|███████▋ | 17252/22434 [15:27:31<3:39:52, 2.55s/it] +2025-02-06 01:35:11 - ERROR - stderr - +2025-02-06 01:35:11 - ERROR - stderr - +2025-02-06 01:35:11 - INFO - stdout - {'loss': 0.3822, 'grad_norm': 1.637052059173584, 'learning_rate': 2.6705101244061506e-06, 'epoch': 2.31} +2025-02-06 01:35:11 - ERROR - stderr - 77%|███████▋ | 17252/22434 [15:27:31<3:39:52, 2.55s/it] +2025-02-06 01:35:13 - ERROR - stderr - 77%|███████▋ | 17253/22434 [15:27:33<3:39:02, 2.54s/it] +2025-02-06 01:35:13 - ERROR - stderr - +2025-02-06 01:35:13 - ERROR - stderr - +2025-02-06 01:35:13 - INFO - stdout - {'loss': 0.3706, 'grad_norm': 1.3661260604858398, 'learning_rate': 2.6695280436413494e-06, 'epoch': 2.31} +2025-02-06 01:35:13 - ERROR - stderr - 77%|███████▋ | 17253/22434 [15:27:33<3:39:02, 2.54s/it] +2025-02-06 01:35:16 - ERROR - stderr - 77%|███████▋ | 17254/22434 [15:27:36<3:36:35, 2.51s/it] +2025-02-06 01:35:16 - ERROR - stderr - +2025-02-06 01:35:16 - ERROR - stderr - +2025-02-06 01:35:16 - 
INFO - stdout - {'loss': 0.3478, 'grad_norm': 1.4715721607208252, 'learning_rate': 2.668546115673222e-06, 'epoch': 2.31} +2025-02-06 01:35:16 - ERROR - stderr - 77%|███████▋ | 17254/22434 [15:27:36<3:36:35, 2.51s/it] +2025-02-06 01:35:18 - ERROR - stderr - 77%|███████▋ | 17255/22434 [15:27:38<3:37:26, 2.52s/it] +2025-02-06 01:35:18 - ERROR - stderr - +2025-02-06 01:35:18 - ERROR - stderr - +2025-02-06 01:35:18 - INFO - stdout - {'loss': 0.4301, 'grad_norm': 1.7912428379058838, 'learning_rate': 2.667564340522235e-06, 'epoch': 2.31} +2025-02-06 01:35:18 - ERROR - stderr - 77%|███████▋ | 17255/22434 [15:27:38<3:37:26, 2.52s/it] +2025-02-06 01:35:21 - ERROR - stderr - 77%|███████▋ | 17256/22434 [15:27:41<3:36:42, 2.51s/it] +2025-02-06 01:35:21 - ERROR - stderr - +2025-02-06 01:35:21 - ERROR - stderr - +2025-02-06 01:35:21 - INFO - stdout - {'loss': 0.3682, 'grad_norm': 1.503482699394226, 'learning_rate': 2.666582718208853e-06, 'epoch': 2.31} +2025-02-06 01:35:21 - ERROR - stderr - 77%|███████▋ | 17256/22434 [15:27:41<3:36:42, 2.51s/it] +2025-02-06 01:35:23 - ERROR - stderr - 77%|███████▋ | 17257/22434 [15:27:43<3:35:41, 2.50s/it] +2025-02-06 01:35:23 - ERROR - stderr - +2025-02-06 01:35:23 - ERROR - stderr - +2025-02-06 01:35:23 - INFO - stdout - {'loss': 0.3912, 'grad_norm': 1.5292028188705444, 'learning_rate': 2.6656012487535377e-06, 'epoch': 2.31} +2025-02-06 01:35:23 - ERROR - stderr - 77%|███████▋ | 17257/22434 [15:27:43<3:35:41, 2.50s/it] +2025-02-06 01:35:26 - ERROR - stderr - 77%|███████▋ | 17258/22434 [15:27:46<3:35:34, 2.50s/it] +2025-02-06 01:35:26 - ERROR - stderr - +2025-02-06 01:35:26 - ERROR - stderr - +2025-02-06 01:35:26 - INFO - stdout - {'loss': 0.3761, 'grad_norm': 1.6456142663955688, 'learning_rate': 2.664619932176745e-06, 'epoch': 2.31} +2025-02-06 01:35:26 - ERROR - stderr - 77%|███████▋ | 17258/22434 [15:27:46<3:35:34, 2.50s/it] +2025-02-06 01:35:28 - ERROR - stderr - 77%|███████▋ | 17259/22434 [15:27:48<3:33:29, 2.48s/it] +2025-02-06 01:35:28 - 
ERROR - stderr - +2025-02-06 01:35:28 - ERROR - stderr - +2025-02-06 01:35:28 - INFO - stdout - {'loss': 0.3452, 'grad_norm': 1.4305490255355835, 'learning_rate': 2.663638768498932e-06, 'epoch': 2.31} +2025-02-06 01:35:28 - ERROR - stderr - 77%|███████▋ | 17259/22434 [15:27:48<3:33:29, 2.48s/it] +2025-02-06 01:35:31 - ERROR - stderr - 77%|███████▋ | 17260/22434 [15:27:50<3:33:40, 2.48s/it] +2025-02-06 01:35:31 - ERROR - stderr - +2025-02-06 01:35:31 - ERROR - stderr - +2025-02-06 01:35:31 - INFO - stdout - {'loss': 0.3213, 'grad_norm': 1.388782262802124, 'learning_rate': 2.6626577577405464e-06, 'epoch': 2.31} +2025-02-06 01:35:31 - ERROR - stderr - 77%|███████▋ | 17260/22434 [15:27:50<3:33:40, 2.48s/it] +2025-02-06 01:35:33 - ERROR - stderr - 77%|███████▋ | 17261/22434 [15:27:53<3:32:57, 2.47s/it] +2025-02-06 01:35:33 - ERROR - stderr - +2025-02-06 01:35:33 - ERROR - stderr - +2025-02-06 01:35:33 - INFO - stdout - {'loss': 0.367, 'grad_norm': 1.6446101665496826, 'learning_rate': 2.661676899922041e-06, 'epoch': 2.31} +2025-02-06 01:35:33 - ERROR - stderr - 77%|███████▋ | 17261/22434 [15:27:53<3:32:57, 2.47s/it] +2025-02-06 01:35:36 - ERROR - stderr - 77%|███████▋ | 17262/22434 [15:27:55<3:31:47, 2.46s/it] +2025-02-06 01:35:36 - ERROR - stderr - +2025-02-06 01:35:36 - ERROR - stderr - +2025-02-06 01:35:36 - INFO - stdout - {'loss': 0.3904, 'grad_norm': 1.4010463953018188, 'learning_rate': 2.660696195063858e-06, 'epoch': 2.31} +2025-02-06 01:35:36 - ERROR - stderr - 77%|███████▋ | 17262/22434 [15:27:55<3:31:47, 2.46s/it] +2025-02-06 01:35:38 - ERROR - stderr - 77%|███████▋ | 17263/22434 [15:27:58<3:31:36, 2.46s/it] +2025-02-06 01:35:38 - ERROR - stderr - +2025-02-06 01:35:38 - ERROR - stderr - +2025-02-06 01:35:38 - INFO - stdout - {'loss': 0.3642, 'grad_norm': 1.6406043767929077, 'learning_rate': 2.6597156431864423e-06, 'epoch': 2.31} +2025-02-06 01:35:38 - ERROR - stderr - 77%|███████▋ | 17263/22434 [15:27:58<3:31:36, 2.46s/it] +2025-02-06 01:35:41 - ERROR - stderr 
- 77%|███████▋ | 17264/22434 [15:28:01<3:40:30, 2.56s/it] +2025-02-06 01:35:41 - ERROR - stderr - +2025-02-06 01:35:41 - ERROR - stderr - +2025-02-06 01:35:41 - INFO - stdout - {'loss': 0.3697, 'grad_norm': 1.5354620218276978, 'learning_rate': 2.6587352443102245e-06, 'epoch': 2.31} +2025-02-06 01:35:41 - ERROR - stderr - 77%|███████▋ | 17264/22434 [15:28:01<3:40:30, 2.56s/it] +2025-02-06 01:35:43 - ERROR - stderr - 77%|███████▋ | 17265/22434 [15:28:03<3:38:47, 2.54s/it] +2025-02-06 01:35:43 - ERROR - stderr - +2025-02-06 01:35:43 - ERROR - stderr - +2025-02-06 01:35:43 - INFO - stdout - {'loss': 0.3638, 'grad_norm': 1.508113980293274, 'learning_rate': 2.6577549984556485e-06, 'epoch': 2.31} +2025-02-06 01:35:43 - ERROR - stderr - 77%|███████▋ | 17265/22434 [15:28:03<3:38:47, 2.54s/it] +2025-02-06 01:35:46 - ERROR - stderr - 77%|███████▋ | 17266/22434 [15:28:05<3:36:18, 2.51s/it] +2025-02-06 01:35:46 - ERROR - stderr - +2025-02-06 01:35:46 - ERROR - stderr - +2025-02-06 01:35:46 - INFO - stdout - {'loss': 0.406, 'grad_norm': 1.43109929561615, 'learning_rate': 2.656774905643147e-06, 'epoch': 2.31} +2025-02-06 01:35:46 - ERROR - stderr - 77%|███████▋ | 17266/22434 [15:28:06<3:36:18, 2.51s/it] +2025-02-06 01:35:48 - ERROR - stderr - 77%|███████▋ | 17267/22434 [15:28:08<3:35:02, 2.50s/it] +2025-02-06 01:35:48 - ERROR - stderr - +2025-02-06 01:35:48 - ERROR - stderr - +2025-02-06 01:35:48 - INFO - stdout - {'loss': 0.4078, 'grad_norm': 1.6870267391204834, 'learning_rate': 2.6557949658931402e-06, 'epoch': 2.31} +2025-02-06 01:35:48 - ERROR - stderr - 77%|███████▋ | 17267/22434 [15:28:08<3:35:02, 2.50s/it] +2025-02-06 01:35:51 - ERROR - stderr - 77%|███████▋ | 17268/22434 [15:28:10<3:35:25, 2.50s/it] +2025-02-06 01:35:51 - ERROR - stderr - +2025-02-06 01:35:51 - ERROR - stderr - +2025-02-06 01:35:51 - INFO - stdout - {'loss': 0.451, 'grad_norm': 1.964639663696289, 'learning_rate': 2.6548151792260647e-06, 'epoch': 2.31} +2025-02-06 01:35:51 - ERROR - stderr - 77%|███████▋ | 
17268/22434 [15:28:11<3:35:25, 2.50s/it] +2025-02-06 01:35:53 - ERROR - stderr - 77%|███████▋ | 17269/22434 [15:28:13<3:33:19, 2.48s/it] +2025-02-06 01:35:53 - ERROR - stderr - +2025-02-06 01:35:53 - ERROR - stderr - +2025-02-06 01:35:53 - INFO - stdout - {'loss': 0.3788, 'grad_norm': 1.4153599739074707, 'learning_rate': 2.653835545662333e-06, 'epoch': 2.31} +2025-02-06 01:35:53 - ERROR - stderr - 77%|███████▋ | 17269/22434 [15:28:13<3:33:19, 2.48s/it] +2025-02-06 01:35:56 - ERROR - stderr - 77%|███████▋ | 17270/22434 [15:28:15<3:32:10, 2.47s/it] +2025-02-06 01:35:56 - ERROR - stderr - +2025-02-06 01:35:56 - ERROR - stderr - +2025-02-06 01:35:56 - INFO - stdout - {'loss': 0.357, 'grad_norm': 1.5784382820129395, 'learning_rate': 2.6528560652223756e-06, 'epoch': 2.31} +2025-02-06 01:35:56 - ERROR - stderr - 77%|███████▋ | 17270/22434 [15:28:15<3:32:10, 2.47s/it] +2025-02-06 01:35:58 - ERROR - stderr - 77%|███████▋ | 17271/22434 [15:28:18<3:32:46, 2.47s/it] +2025-02-06 01:35:58 - ERROR - stderr - +2025-02-06 01:35:58 - ERROR - stderr - +2025-02-06 01:35:58 - INFO - stdout - {'loss': 0.3555, 'grad_norm': 1.3733164072036743, 'learning_rate': 2.651876737926601e-06, 'epoch': 2.31} +2025-02-06 01:35:58 - ERROR - stderr - 77%|███████▋ | 17271/22434 [15:28:18<3:32:46, 2.47s/it] +2025-02-06 01:36:01 - ERROR - stderr - 77%|███████▋ | 17272/22434 [15:28:20<3:32:32, 2.47s/it] +2025-02-06 01:36:01 - ERROR - stderr - +2025-02-06 01:36:01 - ERROR - stderr - +2025-02-06 01:36:01 - INFO - stdout - {'loss': 0.3954, 'grad_norm': 1.3948289155960083, 'learning_rate': 2.6508975637954224e-06, 'epoch': 2.31} +2025-02-06 01:36:01 - ERROR - stderr - 77%|███████▋ | 17272/22434 [15:28:20<3:32:32, 2.47s/it] +2025-02-06 01:36:03 - ERROR - stderr - 77%|███████▋ | 17273/22434 [15:28:23<3:33:30, 2.48s/it] +2025-02-06 01:36:03 - ERROR - stderr - +2025-02-06 01:36:03 - ERROR - stderr - +2025-02-06 01:36:03 - INFO - stdout - {'loss': 0.3765, 'grad_norm': 1.6088441610336304, 'learning_rate':
2.6499185428492534e-06, 'epoch': 2.31} +2025-02-06 01:36:03 - ERROR - stderr - 77%|███████▋ | 17273/22434 [15:28:23<3:33:30, 2.48s/it] +2025-02-06 01:36:05 - ERROR - stderr - 77%|███████▋ | 17274/22434 [15:28:25<3:31:23, 2.46s/it] +2025-02-06 01:36:05 - ERROR - stderr - +2025-02-06 01:36:05 - ERROR - stderr - +2025-02-06 01:36:05 - INFO - stdout - {'loss': 0.3559, 'grad_norm': 1.3912757635116577, 'learning_rate': 2.6489396751084983e-06, 'epoch': 2.31} +2025-02-06 01:36:05 - ERROR - stderr - 77%|███████▋ | 17274/22434 [15:28:25<3:31:23, 2.46s/it] +2025-02-06 01:36:08 - ERROR - stderr - 77%|███████▋ | 17275/22434 [15:28:28<3:31:06, 2.46s/it] +2025-02-06 01:36:08 - ERROR - stderr - +2025-02-06 01:36:08 - ERROR - stderr - +2025-02-06 01:36:08 - INFO - stdout - {'loss': 0.3927, 'grad_norm': 1.437946081161499, 'learning_rate': 2.647960960593562e-06, 'epoch': 2.31} +2025-02-06 01:36:08 - ERROR - stderr - 77%|███████▋ | 17275/22434 [15:28:28<3:31:06, 2.46s/it] +2025-02-06 01:36:10 - ERROR - stderr - 77%|███████▋ | 17276/22434 [15:28:30<3:32:54, 2.48s/it] +2025-02-06 01:36:10 - ERROR - stderr - +2025-02-06 01:36:10 - ERROR - stderr - +2025-02-06 01:36:10 - INFO - stdout - {'loss': 0.3702, 'grad_norm': 1.6099984645843506, 'learning_rate': 2.6469823993248444e-06, 'epoch': 2.31} +2025-02-06 01:36:10 - ERROR - stderr - 77%|███████▋ | 17276/22434 [15:28:30<3:32:54, 2.48s/it] +2025-02-06 01:36:13 - ERROR - stderr - 77%|███████▋ | 17277/22434 [15:28:33<3:34:43, 2.50s/it] +2025-02-06 01:36:13 - ERROR - stderr - +2025-02-06 01:36:13 - ERROR - stderr - +2025-02-06 01:36:13 - INFO - stdout - {'loss': 0.3641, 'grad_norm': 1.662084937095642, 'learning_rate': 2.646003991322742e-06, 'epoch': 2.31} +2025-02-06 01:36:13 - ERROR - stderr - 77%|███████▋ | 17277/22434 [15:28:33<3:34:43, 2.50s/it] +2025-02-06 01:36:15 - ERROR - stderr - 77%|███████▋ | 17278/22434 [15:28:35<3:35:12, 2.50s/it] +2025-02-06 01:36:16 - ERROR - stderr - +2025-02-06 01:36:16 - ERROR - stderr - +2025-02-06 01:36:16 - 
INFO - stdout - {'loss': 0.3421, 'grad_norm': 1.3653630018234253, 'learning_rate': 2.6450257366076494e-06, 'epoch': 2.31} +2025-02-06 01:36:16 - ERROR - stderr - 77%|███████▋ | 17278/22434 [15:28:35<3:35:12, 2.50s/it] +2025-02-06 01:36:18 - ERROR - stderr - 77%|███████▋ | 17279/22434 [15:28:38<3:44:23, 2.61s/it] +2025-02-06 01:36:18 - ERROR - stderr - +2025-02-06 01:36:18 - ERROR - stderr - +2025-02-06 01:36:18 - INFO - stdout - {'loss': 0.3257, 'grad_norm': 1.5453976392745972, 'learning_rate': 2.644047635199958e-06, 'epoch': 2.31} +2025-02-06 01:36:18 - ERROR - stderr - 77%|███████▋ | 17279/22434 [15:28:38<3:44:23, 2.61s/it] +2025-02-06 01:36:21 - ERROR - stderr - 77%|███████▋ | 17280/22434 [15:28:41<3:39:48, 2.56s/it] +2025-02-06 01:36:21 - ERROR - stderr - +2025-02-06 01:36:21 - ERROR - stderr - +2025-02-06 01:36:21 - INFO - stdout - {'loss': 0.3879, 'grad_norm': 1.4139389991760254, 'learning_rate': 2.6430696871200546e-06, 'epoch': 2.31} +2025-02-06 01:36:21 - ERROR - stderr - 77%|███████▋ | 17280/22434 [15:28:41<3:39:48, 2.56s/it] +2025-02-06 01:36:23 - ERROR - stderr - 77%|███████▋ | 17281/22434 [15:28:43<3:37:27, 2.53s/it] +2025-02-06 01:36:23 - ERROR - stderr - +2025-02-06 01:36:23 - ERROR - stderr - +2025-02-06 01:36:23 - INFO - stdout - {'loss': 0.4165, 'grad_norm': 1.6965724229812622, 'learning_rate': 2.642091892388323e-06, 'epoch': 2.31} +2025-02-06 01:36:23 - ERROR - stderr - 77%|███████▋ | 17281/22434 [15:28:43<3:37:27, 2.53s/it] +2025-02-06 01:36:26 - ERROR - stderr - 77%|███████▋ | 17282/22434 [15:28:45<3:34:39, 2.50s/it] +2025-02-06 01:36:26 - ERROR - stderr - +2025-02-06 01:36:26 - ERROR - stderr - +2025-02-06 01:36:26 - INFO - stdout - {'loss': 0.3115, 'grad_norm': 1.4413119554519653, 'learning_rate': 2.64111425102515e-06, 'epoch': 2.31} +2025-02-06 01:36:26 - ERROR - stderr - 77%|███████▋ | 17282/22434 [15:28:45<3:34:39, 2.50s/it] +2025-02-06 01:36:28 - ERROR - stderr - 77%|███████▋ | 17283/22434 [15:28:48<3:34:19, 2.50s/it] +2025-02-06 01:36:28 
- ERROR - stderr - +2025-02-06 01:36:28 - ERROR - stderr - +2025-02-06 01:36:28 - INFO - stdout - {'loss': 0.3623, 'grad_norm': 1.5952370166778564, 'learning_rate': 2.640136763050901e-06, 'epoch': 2.31} +2025-02-06 01:36:28 - ERROR - stderr - 77%|███████▋ | 17283/22434 [15:28:48<3:34:19, 2.50s/it] +2025-02-06 01:36:31 - ERROR - stderr - 77%|███████▋ | 17284/22434 [15:28:50<3:33:52, 2.49s/it] +2025-02-06 01:36:31 - ERROR - stderr - +2025-02-06 01:36:31 - ERROR - stderr - +2025-02-06 01:36:31 - INFO - stdout - {'loss': 0.3826, 'grad_norm': 1.43675696849823, 'learning_rate': 2.639159428485962e-06, 'epoch': 2.31} +2025-02-06 01:36:31 - ERROR - stderr - 77%|███████▋ | 17284/22434 [15:28:50<3:33:52, 2.49s/it] +2025-02-06 01:36:33 - ERROR - stderr - 77%|███████▋ | 17285/22434 [15:28:53<3:34:23, 2.50s/it] +2025-02-06 01:36:33 - ERROR - stderr - +2025-02-06 01:36:33 - ERROR - stderr - +2025-02-06 01:36:33 - INFO - stdout - {'loss': 0.3921, 'grad_norm': 1.7120444774627686, 'learning_rate': 2.6381822473507014e-06, 'epoch': 2.31} +2025-02-06 01:36:33 - ERROR - stderr - 77%|███████▋ | 17285/22434 [15:28:53<3:34:23, 2.50s/it] +2025-02-06 01:36:36 - ERROR - stderr - 77%|███████▋ | 17286/22434 [15:28:55<3:35:40, 2.51s/it] +2025-02-06 01:36:36 - ERROR - stderr - +2025-02-06 01:36:36 - ERROR - stderr - +2025-02-06 01:36:36 - INFO - stdout - {'loss': 0.3505, 'grad_norm': 1.4572577476501465, 'learning_rate': 2.637205219665486e-06, 'epoch': 2.31} +2025-02-06 01:36:36 - ERROR - stderr - 77%|███████▋ | 17286/22434 [15:28:56<3:35:40, 2.51s/it] +2025-02-06 01:36:38 - ERROR - stderr - 77%|███████▋ | 17287/22434 [15:28:58<3:34:56, 2.51s/it] +2025-02-06 01:36:38 - ERROR - stderr - +2025-02-06 01:36:38 - ERROR - stderr - +2025-02-06 01:36:38 - INFO - stdout - {'loss': 0.3362, 'grad_norm': 1.5276273488998413, 'learning_rate': 2.6362283454506877e-06, 'epoch': 2.31} +2025-02-06 01:36:38 - ERROR - stderr - 77%|███████▋ | 17287/22434 [15:28:58<3:34:56, 2.51s/it] +2025-02-06 01:36:41 - ERROR - 
stderr - 77%|███████▋ | 17288/22434 [15:29:00<3:34:34, 2.50s/it] +2025-02-06 01:36:41 - ERROR - stderr - +2025-02-06 01:36:41 - ERROR - stderr - +2025-02-06 01:36:41 - INFO - stdout - {'loss': 0.3949, 'grad_norm': 1.5447605848312378, 'learning_rate': 2.635251624726656e-06, 'epoch': 2.31} +2025-02-06 01:36:41 - ERROR - stderr - 77%|███████▋ | 17288/22434 [15:29:00<3:34:34, 2.50s/it] +2025-02-06 01:36:43 - ERROR - stderr - 77%|███████▋ | 17289/22434 [15:29:03<3:36:47, 2.53s/it] +2025-02-06 01:36:43 - ERROR - stderr - +2025-02-06 01:36:43 - ERROR - stderr - +2025-02-06 01:36:43 - INFO - stdout - {'loss': 0.4039, 'grad_norm': 1.6203325986862183, 'learning_rate': 2.6342750575137623e-06, 'epoch': 2.31} +2025-02-06 01:36:43 - ERROR - stderr - 77%|███████▋ | 17289/22434 [15:29:03<3:36:47, 2.53s/it] +2025-02-06 01:36:46 - ERROR - stderr - 77%|███████▋ | 17290/22434 [15:29:06<3:36:27, 2.52s/it] +2025-02-06 01:36:46 - ERROR - stderr - +2025-02-06 01:36:46 - ERROR - stderr - +2025-02-06 01:36:46 - INFO - stdout - {'loss': 0.4081, 'grad_norm': 1.449604868888855, 'learning_rate': 2.633298643832355e-06, 'epoch': 2.31} +2025-02-06 01:36:46 - ERROR - stderr - 77%|███████▋ | 17290/22434 [15:29:06<3:36:27, 2.52s/it] +2025-02-06 01:36:48 - ERROR - stderr - 77%|███████▋ | 17291/22434 [15:29:08<3:33:17, 2.49s/it] +2025-02-06 01:36:48 - ERROR - stderr - +2025-02-06 01:36:48 - ERROR - stderr - +2025-02-06 01:36:48 - INFO - stdout - {'loss': 0.4135, 'grad_norm': 1.6105260848999023, 'learning_rate': 2.6323223837027876e-06, 'epoch': 2.31} +2025-02-06 01:36:48 - ERROR - stderr - 77%|███████▋ | 17291/22434 [15:29:08<3:33:17, 2.49s/it] +2025-02-06 01:36:51 - ERROR - stderr - 77%|███████▋ | 17292/22434 [15:29:11<3:34:59, 2.51s/it] +2025-02-06 01:36:51 - ERROR - stderr - +2025-02-06 01:36:51 - ERROR - stderr - +2025-02-06 01:36:51 - INFO - stdout - {'loss': 0.3325, 'grad_norm': 1.4969819784164429, 'learning_rate': 2.6313462771454103e-06, 'epoch': 2.31} +2025-02-06 01:36:51 - ERROR - stderr - 
77%|███████▋ | 17292/22434 [15:29:11<3:34:59, 2.51s/it] +2025-02-06 01:36:53 - ERROR - stderr - 77%|███████▋ | 17293/22434 [15:29:13<3:40:26, 2.57s/it] +2025-02-06 01:36:54 - ERROR - stderr - +2025-02-06 01:36:54 - ERROR - stderr - +2025-02-06 01:36:54 - INFO - stdout - {'loss': 0.3699, 'grad_norm': 1.3541268110275269, 'learning_rate': 2.6303703241805656e-06, 'epoch': 2.31} +2025-02-06 01:36:54 - ERROR - stderr - 77%|███████▋ | 17293/22434 [15:29:13<3:40:26, 2.57s/it] +2025-02-06 01:36:56 - ERROR - stderr - 77%|███████▋ | 17294/22434 [15:29:16<3:38:20, 2.55s/it] +2025-02-06 01:36:56 - ERROR - stderr - +2025-02-06 01:36:56 - ERROR - stderr - +2025-02-06 01:36:56 - INFO - stdout - {'loss': 0.3968, 'grad_norm': 1.722935438156128, 'learning_rate': 2.6293945248286047e-06, 'epoch': 2.31} +2025-02-06 01:36:56 - ERROR - stderr - 77%|███████▋ | 17294/22434 [15:29:16<3:38:20, 2.55s/it] +2025-02-06 01:36:58 - ERROR - stderr - 77%|███████▋ | 17295/22434 [15:29:18<3:36:07, 2.52s/it] +2025-02-06 01:36:58 - ERROR - stderr - +2025-02-06 01:36:58 - ERROR - stderr - +2025-02-06 01:36:58 - INFO - stdout - {'loss': 0.4123, 'grad_norm': 1.579074740409851, 'learning_rate': 2.62841887910986e-06, 'epoch': 2.31} +2025-02-06 01:36:58 - ERROR - stderr - 77%|███████▋ | 17295/22434 [15:29:18<3:36:07, 2.52s/it] +2025-02-06 01:37:01 - ERROR - stderr - 77%|███████▋ | 17296/22434 [15:29:21<3:35:54, 2.52s/it] +2025-02-06 01:37:01 - ERROR - stderr - +2025-02-06 01:37:01 - ERROR - stderr - +2025-02-06 01:37:01 - INFO - stdout - {'loss': 0.3655, 'grad_norm': 1.4995322227478027, 'learning_rate': 2.6274433870446704e-06, 'epoch': 2.31} +2025-02-06 01:37:01 - ERROR - stderr - 77%|███████▋ | 17296/22434 [15:29:21<3:35:54, 2.52s/it] +2025-02-06 01:37:03 - ERROR - stderr - 77%|███████▋ | 17297/22434 [15:29:23<3:34:34, 2.51s/it] +2025-02-06 01:37:03 - ERROR - stderr - +2025-02-06 01:37:03 - ERROR - stderr - +2025-02-06 01:37:03 - INFO - stdout - {'loss': 0.3953, 'grad_norm': 1.4410686492919922, 
'learning_rate': 2.6264680486533677e-06, 'epoch': 2.31} +2025-02-06 01:37:03 - ERROR - stderr - 77%|███████▋ | 17297/22434 [15:29:23<3:34:34, 2.51s/it] +2025-02-06 01:37:06 - ERROR - stderr - 77%|███████▋ | 17298/22434 [15:29:26<3:34:22, 2.50s/it] +2025-02-06 01:37:06 - ERROR - stderr - +2025-02-06 01:37:06 - ERROR - stderr - +2025-02-06 01:37:06 - INFO - stdout - {'loss': 0.3558, 'grad_norm': 1.307084560394287, 'learning_rate': 2.6254928639562826e-06, 'epoch': 2.31} +2025-02-06 01:37:06 - ERROR - stderr - 77%|███████▋ | 17298/22434 [15:29:26<3:34:22, 2.50s/it] +2025-02-06 01:37:08 - ERROR - stderr - 77%|███████▋ | 17299/22434 [15:29:28<3:34:06, 2.50s/it] +2025-02-06 01:37:08 - ERROR - stderr - +2025-02-06 01:37:08 - ERROR - stderr - +2025-02-06 01:37:08 - INFO - stdout - {'loss': 0.3798, 'grad_norm': 1.5430079698562622, 'learning_rate': 2.624517832973743e-06, 'epoch': 2.31} +2025-02-06 01:37:08 - ERROR - stderr - 77%|███████▋ | 17299/22434 [15:29:28<3:34:06, 2.50s/it] +2025-02-06 01:37:11 - ERROR - stderr - 77%|███████▋ | 17300/22434 [15:29:31<3:33:33, 2.50s/it] +2025-02-06 01:37:11 - ERROR - stderr - +2025-02-06 01:37:11 - ERROR - stderr - +2025-02-06 01:37:11 - INFO - stdout - {'loss': 0.3956, 'grad_norm': 1.6198211908340454, 'learning_rate': 2.6235429557260716e-06, 'epoch': 2.31} +2025-02-06 01:37:11 - ERROR - stderr - 77%|███████▋ | 17300/22434 [15:29:31<3:33:33, 2.50s/it] +2025-02-06 01:37:14 - ERROR - stderr - 77%|███████▋ | 17301/22434 [15:29:33<3:37:07, 2.54s/it] +2025-02-06 01:37:14 - ERROR - stderr - +2025-02-06 01:37:14 - ERROR - stderr - +2025-02-06 01:37:14 - INFO - stdout - {'loss': 0.3706, 'grad_norm': 1.4721360206604004, 'learning_rate': 2.6225682322335876e-06, 'epoch': 2.31} +2025-02-06 01:37:14 - ERROR - stderr - 77%|███████▋ | 17301/22434 [15:29:33<3:37:07, 2.54s/it] +2025-02-06 01:37:16 - ERROR - stderr - 77%|███████▋ | 17302/22434 [15:29:36<3:37:11, 2.54s/it] +2025-02-06 01:37:16 - ERROR - stderr - +2025-02-06 01:37:16 - ERROR - stderr - 
+2025-02-06 01:37:16 - INFO - stdout - {'loss': 0.3769, 'grad_norm': 1.5926934480667114, 'learning_rate': 2.6215936625166106e-06, 'epoch': 2.31} +2025-02-06 01:37:16 - ERROR - stderr - 77%|███████▋ | 17302/22434 [15:29:36<3:37:11, 2.54s/it] +2025-02-06 01:37:18 - ERROR - stderr - 77%|███████▋ | 17303/22434 [15:29:38<3:33:39, 2.50s/it] +2025-02-06 01:37:19 - ERROR - stderr - +2025-02-06 01:37:19 - ERROR - stderr - +2025-02-06 01:37:19 - INFO - stdout - {'loss': 0.4516, 'grad_norm': 1.701040267944336, 'learning_rate': 2.620619246595453e-06, 'epoch': 2.31} +2025-02-06 01:37:19 - ERROR - stderr - 77%|███████▋ | 17303/22434 [15:29:38<3:33:39, 2.50s/it] +2025-02-06 01:37:21 - ERROR - stderr - 77%|███████▋ | 17304/22434 [15:29:41<3:35:06, 2.52s/it] +2025-02-06 01:37:21 - ERROR - stderr - +2025-02-06 01:37:21 - ERROR - stderr - +2025-02-06 01:37:21 - INFO - stdout - {'loss': 0.4197, 'grad_norm': 1.6770532131195068, 'learning_rate': 2.6196449844904257e-06, 'epoch': 2.31} +2025-02-06 01:37:21 - ERROR - stderr - 77%|███████▋ | 17304/22434 [15:29:41<3:35:06, 2.52s/it] +2025-02-06 01:37:24 - ERROR - stderr - 77%|███████▋ | 17305/22434 [15:29:43<3:35:50, 2.52s/it] +2025-02-06 01:37:24 - ERROR - stderr - +2025-02-06 01:37:24 - ERROR - stderr - +2025-02-06 01:37:24 - INFO - stdout - {'loss': 0.4208, 'grad_norm': 1.6526716947555542, 'learning_rate': 2.6186708762218373e-06, 'epoch': 2.31} +2025-02-06 01:37:24 - ERROR - stderr - 77%|███████▋ | 17305/22434 [15:29:43<3:35:50, 2.52s/it] +2025-02-06 01:37:26 - ERROR - stderr - 77%|███████▋ | 17306/22434 [15:29:46<3:35:20, 2.52s/it] +2025-02-06 01:37:26 - ERROR - stderr - +2025-02-06 01:37:26 - ERROR - stderr - +2025-02-06 01:37:26 - INFO - stdout - {'loss': 0.323, 'grad_norm': 1.3465416431427002, 'learning_rate': 2.6176969218099936e-06, 'epoch': 2.31} +2025-02-06 01:37:26 - ERROR - stderr - 77%|███████▋ | 17306/22434 [15:29:46<3:35:20, 2.52s/it] +2025-02-06 01:37:29 - ERROR - stderr - 77%|███████▋ | 17307/22434 [15:29:48<3:34:52, 
2.51s/it] +2025-02-06 01:37:29 - ERROR - stderr - +2025-02-06 01:37:29 - ERROR - stderr - +2025-02-06 01:37:29 - INFO - stdout - {'loss': 0.3783, 'grad_norm': 1.4111219644546509, 'learning_rate': 2.6167231212751864e-06, 'epoch': 2.31} +2025-02-06 01:37:29 - ERROR - stderr - 77%|███████▋ | 17307/22434 [15:29:48<3:34:52, 2.51s/it] +2025-02-06 01:37:31 - ERROR - stderr - 77%|███████▋ | 17308/22434 [15:29:51<3:34:22, 2.51s/it] +2025-02-06 01:37:31 - ERROR - stderr - +2025-02-06 01:37:31 - ERROR - stderr - +2025-02-06 01:37:31 - INFO - stdout - {'loss': 0.3872, 'grad_norm': 1.658780574798584, 'learning_rate': 2.6157494746377276e-06, 'epoch': 2.31} +2025-02-06 01:37:31 - ERROR - stderr - 77%|███████▋ | 17308/22434 [15:29:51<3:34:22, 2.51s/it] +2025-02-06 01:37:34 - ERROR - stderr - 77%|███████▋ | 17309/22434 [15:29:53<3:32:44, 2.49s/it] +2025-02-06 01:37:34 - ERROR - stderr - +2025-02-06 01:37:34 - ERROR - stderr - +2025-02-06 01:37:34 - INFO - stdout - {'loss': 0.3953, 'grad_norm': 1.5630604028701782, 'learning_rate': 2.6147759819179e-06, 'epoch': 2.31} +2025-02-06 01:37:34 - ERROR - stderr - 77%|███████▋ | 17309/22434 [15:29:53<3:32:44, 2.49s/it] +2025-02-06 01:37:36 - ERROR - stderr - 77%|███████▋ | 17310/22434 [15:29:56<3:31:24, 2.48s/it] +2025-02-06 01:37:36 - ERROR - stderr - +2025-02-06 01:37:36 - ERROR - stderr - +2025-02-06 01:37:36 - INFO - stdout - {'loss': 0.3471, 'grad_norm': 1.5395259857177734, 'learning_rate': 2.613802643136002e-06, 'epoch': 2.31} +2025-02-06 01:37:36 - ERROR - stderr - 77%|███████▋ | 17310/22434 [15:29:56<3:31:24, 2.48s/it] +2025-02-06 01:37:38 - ERROR - stderr - 77%|███████▋ | 17311/22434 [15:29:58<3:31:55, 2.48s/it] +2025-02-06 01:37:39 - ERROR - stderr - +2025-02-06 01:37:39 - ERROR - stderr - +2025-02-06 01:37:39 - INFO - stdout - {'loss': 0.4009, 'grad_norm': 1.3887840509414673, 'learning_rate': 2.6128294583123236e-06, 'epoch': 2.31} +2025-02-06 01:37:39 - ERROR - stderr - 77%|███████▋ | 17311/22434 [15:29:58<3:31:55, 2.48s/it] 
+2025-02-06 01:37:41 - ERROR - stderr - 77%|███████▋ | 17312/22434 [15:30:01<3:30:56, 2.47s/it] +2025-02-06 01:37:41 - ERROR - stderr - +2025-02-06 01:37:41 - ERROR - stderr - +2025-02-06 01:37:41 - INFO - stdout - {'loss': 0.3525, 'grad_norm': 1.3718063831329346, 'learning_rate': 2.61185642746714e-06, 'epoch': 2.32} +2025-02-06 01:37:41 - ERROR - stderr - 77%|███████▋ | 17312/22434 [15:30:01<3:30:56, 2.47s/it] +2025-02-06 01:37:43 - ERROR - stderr - 77%|███████▋ | 17313/22434 [15:30:03<3:30:03, 2.46s/it] +2025-02-06 01:37:43 - ERROR - stderr - +2025-02-06 01:37:43 - ERROR - stderr - +2025-02-06 01:37:43 - INFO - stdout - {'loss': 0.3757, 'grad_norm': 1.4196175336837769, 'learning_rate': 2.6108835506207465e-06, 'epoch': 2.32} +2025-02-06 01:37:43 - ERROR - stderr - 77%|███████▋ | 17313/22434 [15:30:03<3:30:03, 2.46s/it] +2025-02-06 01:37:46 - ERROR - stderr - 77%|███████▋ | 17314/22434 [15:30:06<3:31:17, 2.48s/it] +2025-02-06 01:37:46 - ERROR - stderr - +2025-02-06 01:37:46 - ERROR - stderr - +2025-02-06 01:37:46 - INFO - stdout - {'loss': 0.4224, 'grad_norm': 1.5935378074645996, 'learning_rate': 2.6099108277934105e-06, 'epoch': 2.32} +2025-02-06 01:37:46 - ERROR - stderr - 77%|███████▋ | 17314/22434 [15:30:06<3:31:17, 2.48s/it] +2025-02-06 01:37:48 - ERROR - stderr - 77%|███████▋ | 17315/22434 [15:30:08<3:30:42, 2.47s/it] +2025-02-06 01:37:48 - ERROR - stderr - +2025-02-06 01:37:48 - ERROR - stderr - +2025-02-06 01:37:48 - INFO - stdout - {'loss': 0.3574, 'grad_norm': 1.4283748865127563, 'learning_rate': 2.6089382590054122e-06, 'epoch': 2.32} +2025-02-06 01:37:48 - ERROR - stderr - 77%|███████▋ | 17315/22434 [15:30:08<3:30:42, 2.47s/it] +2025-02-06 01:37:51 - ERROR - stderr - 77%|███████▋ | 17316/22434 [15:30:11<3:32:04, 2.49s/it] +2025-02-06 01:37:51 - ERROR - stderr - +2025-02-06 01:37:51 - ERROR - stderr - +2025-02-06 01:37:51 - INFO - stdout - {'loss': 0.3669, 'grad_norm': 1.3659788370132446, 'learning_rate': 2.607965844277024e-06, 'epoch': 2.32} +2025-02-06 
01:37:51 - ERROR - stderr - 77%|███████▋ | 17316/22434 [15:30:11<3:32:04, 2.49s/it] +2025-02-06 01:37:53 - ERROR - stderr - 77%|███████▋ | 17317/22434 [15:30:13<3:31:23, 2.48s/it] +2025-02-06 01:37:53 - ERROR - stderr - +2025-02-06 01:37:53 - ERROR - stderr - +2025-02-06 01:37:53 - INFO - stdout - {'loss': 0.3626, 'grad_norm': 1.509053111076355, 'learning_rate': 2.606993583628513e-06, 'epoch': 2.32} +2025-02-06 01:37:53 - ERROR - stderr - 77%|███████▋ | 17317/22434 [15:30:13<3:31:23, 2.48s/it] +2025-02-06 01:37:56 - ERROR - stderr - 77%|███████▋ | 17318/22434 [15:30:16<3:32:42, 2.49s/it] +2025-02-06 01:37:56 - ERROR - stderr - +2025-02-06 01:37:56 - ERROR - stderr - +2025-02-06 01:37:56 - INFO - stdout - {'loss': 0.3718, 'grad_norm': 1.476450800895691, 'learning_rate': 2.606021477080147e-06, 'epoch': 2.32} +2025-02-06 01:37:56 - ERROR - stderr - 77%|███████▋ | 17318/22434 [15:30:16<3:32:42, 2.49s/it] +2025-02-06 01:37:58 - ERROR - stderr - 77%|███████▋ | 17319/22434 [15:30:18<3:30:43, 2.47s/it] +2025-02-06 01:37:58 - ERROR - stderr - +2025-02-06 01:37:58 - ERROR - stderr - +2025-02-06 01:37:58 - INFO - stdout - {'loss': 0.3752, 'grad_norm': 1.5358840227127075, 'learning_rate': 2.605049524652189e-06, 'epoch': 2.32} +2025-02-06 01:37:58 - ERROR - stderr - 77%|███████▋ | 17319/22434 [15:30:18<3:30:43, 2.47s/it] +2025-02-06 01:38:01 - ERROR - stderr - 77%|███████▋ | 17320/22434 [15:30:21<3:31:09, 2.48s/it] +2025-02-06 01:38:01 - ERROR - stderr - +2025-02-06 01:38:01 - ERROR - stderr - +2025-02-06 01:38:01 - INFO - stdout - {'loss': 0.3665, 'grad_norm': 1.5286520719528198, 'learning_rate': 2.6040777263648964e-06, 'epoch': 2.32} +2025-02-06 01:38:01 - ERROR - stderr - 77%|███████▋ | 17320/22434 [15:30:21<3:31:09, 2.48s/it] +2025-02-06 01:38:03 - ERROR - stderr - 77%|███████▋ | 17321/22434 [15:30:23<3:32:18, 2.49s/it] +2025-02-06 01:38:03 - ERROR - stderr - +2025-02-06 01:38:03 - ERROR - stderr - +2025-02-06 01:38:03 - INFO - stdout - {'loss': 0.3717, 'grad_norm': 
1.4930964708328247, 'learning_rate': 2.603106082238527e-06, 'epoch': 2.32} +2025-02-06 01:38:03 - ERROR - stderr - 77%|███████▋ | 17321/22434 [15:30:23<3:32:18, 2.49s/it] +2025-02-06 01:38:06 - ERROR - stderr - 77%|███████▋ | 17322/22434 [15:30:26<3:33:17, 2.50s/it] +2025-02-06 01:38:06 - ERROR - stderr - +2025-02-06 01:38:06 - ERROR - stderr - +2025-02-06 01:38:06 - INFO - stdout - {'loss': 0.384, 'grad_norm': 1.5236842632293701, 'learning_rate': 2.6021345922933328e-06, 'epoch': 2.32} +2025-02-06 01:38:06 - ERROR - stderr - 77%|███████▋ | 17322/22434 [15:30:26<3:33:17, 2.50s/it] +2025-02-06 01:38:08 - ERROR - stderr - 77%|███████▋ | 17323/22434 [15:30:28<3:33:16, 2.50s/it] +2025-02-06 01:38:08 - ERROR - stderr - +2025-02-06 01:38:08 - ERROR - stderr - +2025-02-06 01:38:08 - INFO - stdout - {'loss': 0.3714, 'grad_norm': 1.575071096420288, 'learning_rate': 2.6011632565495646e-06, 'epoch': 2.32} +2025-02-06 01:38:08 - ERROR - stderr - 77%|███████▋ | 17323/22434 [15:30:28<3:33:16, 2.50s/it] +2025-02-06 01:38:11 - ERROR - stderr - 77%|███████▋ | 17324/22434 [15:30:31<3:33:15, 2.50s/it] +2025-02-06 01:38:11 - ERROR - stderr - +2025-02-06 01:38:11 - ERROR - stderr - +2025-02-06 01:38:11 - INFO - stdout - {'loss': 0.3595, 'grad_norm': 1.4129579067230225, 'learning_rate': 2.600192075027468e-06, 'epoch': 2.32} +2025-02-06 01:38:11 - ERROR - stderr - 77%|███████▋ | 17324/22434 [15:30:31<3:33:15, 2.50s/it] +2025-02-06 01:38:13 - ERROR - stderr - 77%|███████▋ | 17325/22434 [15:30:33<3:37:17, 2.55s/it] +2025-02-06 01:38:14 - ERROR - stderr - +2025-02-06 01:38:14 - ERROR - stderr - +2025-02-06 01:38:14 - INFO - stdout - {'loss': 0.3785, 'grad_norm': 1.6925137042999268, 'learning_rate': 2.5992210477472866e-06, 'epoch': 2.32} +2025-02-06 01:38:14 - ERROR - stderr - 77%|███████▋ | 17325/22434 [15:30:33<3:37:17, 2.55s/it] +2025-02-06 01:38:16 - ERROR - stderr - 77%|███████▋ | 17326/22434 [15:30:36<3:37:41, 2.56s/it] +2025-02-06 01:38:16 - ERROR - stderr - +2025-02-06 01:38:16 - 
ERROR - stderr - +2025-02-06 01:38:16 - INFO - stdout - {'loss': 0.4124, 'grad_norm': 1.6678059101104736, 'learning_rate': 2.598250174729261e-06, 'epoch': 2.32} +2025-02-06 01:38:16 - ERROR - stderr - 77%|███████▋ | 17326/22434 [15:30:36<3:37:41, 2.56s/it] +2025-02-06 01:38:18 - ERROR - stderr - 77%|███████▋ | 17327/22434 [15:30:38<3:34:21, 2.52s/it] +2025-02-06 01:38:19 - ERROR - stderr - +2025-02-06 01:38:19 - ERROR - stderr - +2025-02-06 01:38:19 - INFO - stdout - {'loss': 0.3724, 'grad_norm': 1.4436016082763672, 'learning_rate': 2.597279455993631e-06, 'epoch': 2.32} +2025-02-06 01:38:19 - ERROR - stderr - 77%|███████▋ | 17327/22434 [15:30:38<3:34:21, 2.52s/it] +2025-02-06 01:38:21 - ERROR - stderr - 77%|███████▋ | 17328/22434 [15:30:41<3:35:31, 2.53s/it] +2025-02-06 01:38:21 - ERROR - stderr - +2025-02-06 01:38:21 - ERROR - stderr - +2025-02-06 01:38:21 - INFO - stdout - {'loss': 0.4053, 'grad_norm': 1.6795214414596558, 'learning_rate': 2.5963088915606204e-06, 'epoch': 2.32} +2025-02-06 01:38:21 - ERROR - stderr - 77%|███████▋ | 17328/22434 [15:30:41<3:35:31, 2.53s/it] +2025-02-06 01:38:24 - ERROR - stderr - 77%|███████▋ | 17329/22434 [15:30:43<3:35:54, 2.54s/it] +2025-02-06 01:38:24 - ERROR - stderr - +2025-02-06 01:38:24 - ERROR - stderr - +2025-02-06 01:38:24 - INFO - stdout - {'loss': 0.3873, 'grad_norm': 1.7218916416168213, 'learning_rate': 2.59533848145047e-06, 'epoch': 2.32} +2025-02-06 01:38:24 - ERROR - stderr - 77%|███████▋ | 17329/22434 [15:30:43<3:35:54, 2.54s/it] +2025-02-06 01:38:26 - ERROR - stderr - 77%|███████▋ | 17330/22434 [15:30:46<3:32:40, 2.50s/it] +2025-02-06 01:38:26 - ERROR - stderr - +2025-02-06 01:38:26 - ERROR - stderr - +2025-02-06 01:38:26 - INFO - stdout - {'loss': 0.3536, 'grad_norm': 1.385292410850525, 'learning_rate': 2.594368225683407e-06, 'epoch': 2.32} +2025-02-06 01:38:26 - ERROR - stderr - 77%|███████▋ | 17330/22434 [15:30:46<3:32:40, 2.50s/it] +2025-02-06 01:38:28 - ERROR - stderr - 77%|███████▋ | 17331/22434 
[15:30:48<3:30:56, 2.48s/it] +2025-02-06 01:38:28 - ERROR - stderr - +2025-02-06 01:38:28 - ERROR - stderr - +2025-02-06 01:38:28 - INFO - stdout - {'loss': 0.3684, 'grad_norm': 1.4175090789794922, 'learning_rate': 2.5933981242796445e-06, 'epoch': 2.32} +2025-02-06 01:38:28 - ERROR - stderr - 77%|███████▋ | 17331/22434 [15:30:48<3:30:56, 2.48s/it] +2025-02-06 01:38:31 - ERROR - stderr - 77%|███████▋ | 17332/22434 [15:30:51<3:31:27, 2.49s/it] +2025-02-06 01:38:31 - ERROR - stderr - +2025-02-06 01:38:31 - ERROR - stderr - +2025-02-06 01:38:31 - INFO - stdout - {'loss': 0.3959, 'grad_norm': 1.4795702695846558, 'learning_rate': 2.5924281772594174e-06, 'epoch': 2.32} +2025-02-06 01:38:31 - ERROR - stderr - 77%|███████▋ | 17332/22434 [15:30:51<3:31:27, 2.49s/it] +2025-02-06 01:38:33 - ERROR - stderr - 77%|███████▋ | 17333/22434 [15:30:53<3:33:25, 2.51s/it] +2025-02-06 01:38:34 - ERROR - stderr - +2025-02-06 01:38:34 - ERROR - stderr - +2025-02-06 01:38:34 - INFO - stdout - {'loss': 0.4085, 'grad_norm': 1.567068338394165, 'learning_rate': 2.591458384642931e-06, 'epoch': 2.32} +2025-02-06 01:38:34 - ERROR - stderr - 77%|███████▋ | 17333/22434 [15:30:53<3:33:25, 2.51s/it] +2025-02-06 01:38:36 - ERROR - stderr - 77%|███████▋ | 17334/22434 [15:30:56<3:31:56, 2.49s/it] +2025-02-06 01:38:36 - ERROR - stderr - +2025-02-06 01:38:36 - ERROR - stderr - +2025-02-06 01:38:36 - INFO - stdout - {'loss': 0.3724, 'grad_norm': 1.549087643623352, 'learning_rate': 2.5904887464504115e-06, 'epoch': 2.32} +2025-02-06 01:38:36 - ERROR - stderr - 77%|███████▋ | 17334/22434 [15:30:56<3:31:56, 2.49s/it] +2025-02-06 01:38:38 - ERROR - stderr - 77%|███████▋ | 17335/22434 [15:30:58<3:31:44, 2.49s/it] +2025-02-06 01:38:38 - ERROR - stderr - +2025-02-06 01:38:38 - ERROR - stderr - +2025-02-06 01:38:38 - INFO - stdout - {'loss': 0.4163, 'grad_norm': 1.4080950021743774, 'learning_rate': 2.5895192627020604e-06, 'epoch': 2.32} +2025-02-06 01:38:38 - ERROR - stderr - 77%|███████▋ | 17335/22434 
[15:30:58<3:31:44, 2.49s/it] +2025-02-06 01:38:41 - ERROR - stderr - 77%|███████▋ | 17336/22434 [15:31:01<3:38:08, 2.57s/it] +2025-02-06 01:38:41 - ERROR - stderr - +2025-02-06 01:38:41 - ERROR - stderr - +2025-02-06 01:38:41 - INFO - stdout - {'loss': 0.4135, 'grad_norm': 1.4772248268127441, 'learning_rate': 2.5885499334180887e-06, 'epoch': 2.32} +2025-02-06 01:38:41 - ERROR - stderr - 77%|███████▋ | 17336/22434 [15:31:01<3:38:08, 2.57s/it] +2025-02-06 01:38:44 - ERROR - stderr - 77%|███████▋ | 17337/22434 [15:31:04<3:42:00, 2.61s/it] +2025-02-06 01:38:44 - ERROR - stderr - +2025-02-06 01:38:44 - ERROR - stderr - +2025-02-06 01:38:44 - INFO - stdout - {'loss': 0.4347, 'grad_norm': 1.6864362955093384, 'learning_rate': 2.587580758618703e-06, 'epoch': 2.32} +2025-02-06 01:38:44 - ERROR - stderr - 77%|███████▋ | 17337/22434 [15:31:04<3:42:00, 2.61s/it] +2025-02-06 01:38:46 - ERROR - stderr - 77%|███████▋ | 17338/22434 [15:31:06<3:38:09, 2.57s/it] +2025-02-06 01:38:46 - ERROR - stderr - +2025-02-06 01:38:46 - ERROR - stderr - +2025-02-06 01:38:46 - INFO - stdout - {'loss': 0.353, 'grad_norm': 1.459546685218811, 'learning_rate': 2.5866117383240997e-06, 'epoch': 2.32} +2025-02-06 01:38:46 - ERROR - stderr - 77%|███████▋ | 17338/22434 [15:31:06<3:38:09, 2.57s/it] +2025-02-06 01:38:49 - ERROR - stderr - 77%|███████▋ | 17339/22434 [15:31:09<3:35:47, 2.54s/it] +2025-02-06 01:38:49 - ERROR - stderr - +2025-02-06 01:38:49 - ERROR - stderr - +2025-02-06 01:38:49 - INFO - stdout - {'loss': 0.371, 'grad_norm': 1.487243890762329, 'learning_rate': 2.5856428725544868e-06, 'epoch': 2.32} +2025-02-06 01:38:49 - ERROR - stderr - 77%|███████▋ | 17339/22434 [15:31:09<3:35:47, 2.54s/it] +2025-02-06 01:38:51 - ERROR - stderr - 77%|███████▋ | 17340/22434 [15:31:11<3:34:50, 2.53s/it] +2025-02-06 01:38:51 - ERROR - stderr - +2025-02-06 01:38:51 - ERROR - stderr - +2025-02-06 01:38:51 - INFO - stdout - {'loss': 0.368, 'grad_norm': 1.519048810005188, 'learning_rate': 2.584674161330051e-06, 
'epoch': 2.32} +2025-02-06 01:38:51 - ERROR - stderr - 77%|███████▋ | 17340/22434 [15:31:11<3:34:50, 2.53s/it] +2025-02-06 01:38:54 - ERROR - stderr - 77%|███████▋ | 17341/22434 [15:31:14<3:35:15, 2.54s/it] +2025-02-06 01:38:54 - ERROR - stderr - +2025-02-06 01:38:54 - ERROR - stderr - +2025-02-06 01:38:54 - INFO - stdout - {'loss': 0.3849, 'grad_norm': 1.3728705644607544, 'learning_rate': 2.583705604670985e-06, 'epoch': 2.32} +2025-02-06 01:38:54 - ERROR - stderr - 77%|███████▋ | 17341/22434 [15:31:14<3:35:15, 2.54s/it] +2025-02-06 01:38:56 - ERROR - stderr - 77%|███████▋ | 17342/22434 [15:31:16<3:34:12, 2.52s/it] +2025-02-06 01:38:56 - ERROR - stderr - +2025-02-06 01:38:56 - ERROR - stderr - +2025-02-06 01:38:56 - INFO - stdout - {'loss': 0.4151, 'grad_norm': 1.6111341714859009, 'learning_rate': 2.5827372025974804e-06, 'epoch': 2.32} +2025-02-06 01:38:56 - ERROR - stderr - 77%|███████▋ | 17342/22434 [15:31:16<3:34:12, 2.52s/it] +2025-02-06 01:38:59 - ERROR - stderr - 77%|███████▋ | 17343/22434 [15:31:19<3:32:19, 2.50s/it] +2025-02-06 01:38:59 - ERROR - stderr - +2025-02-06 01:38:59 - ERROR - stderr - +2025-02-06 01:38:59 - INFO - stdout - {'loss': 0.3496, 'grad_norm': 1.4583547115325928, 'learning_rate': 2.581768955129722e-06, 'epoch': 2.32} +2025-02-06 01:38:59 - ERROR - stderr - 77%|███████▋ | 17343/22434 [15:31:19<3:32:19, 2.50s/it] +2025-02-06 01:39:02 - ERROR - stderr - 77%|███████▋ | 17344/22434 [15:31:21<3:37:06, 2.56s/it] +2025-02-06 01:39:02 - ERROR - stderr - +2025-02-06 01:39:02 - ERROR - stderr - +2025-02-06 01:39:02 - INFO - stdout - {'loss': 0.3686, 'grad_norm': 1.3331198692321777, 'learning_rate': 2.58080086228789e-06, 'epoch': 2.32} +2025-02-06 01:39:02 - ERROR - stderr - 77%|███████▋ | 17344/22434 [15:31:21<3:37:06, 2.56s/it] +2025-02-06 01:39:04 - ERROR - stderr - 77%|███████▋ | 17345/22434 [15:31:24<3:35:49, 2.54s/it] +2025-02-06 01:39:04 - ERROR - stderr - +2025-02-06 01:39:04 - ERROR - stderr - +2025-02-06 01:39:04 - INFO - stdout - {'loss': 
0.3727, 'grad_norm': 1.408850908279419, 'learning_rate': 2.579832924092165e-06, 'epoch': 2.32} +2025-02-06 01:39:04 - ERROR - stderr - 77%|███████▋ | 17345/22434 [15:31:24<3:35:49, 2.54s/it] +2025-02-06 01:39:07 - ERROR - stderr - 77%|███████▋ | 17346/22434 [15:31:26<3:35:33, 2.54s/it] +2025-02-06 01:39:07 - ERROR - stderr - +2025-02-06 01:39:07 - ERROR - stderr - +2025-02-06 01:39:07 - INFO - stdout - {'loss': 0.3468, 'grad_norm': 1.5345115661621094, 'learning_rate': 2.578865140562722e-06, 'epoch': 2.32} +2025-02-06 01:39:07 - ERROR - stderr - 77%|███████▋ | 17346/22434 [15:31:26<3:35:33, 2.54s/it] +2025-02-06 01:39:09 - ERROR - stderr - 77%|███████▋ | 17347/22434 [15:31:29<3:33:20, 2.52s/it] +2025-02-06 01:39:09 - ERROR - stderr - +2025-02-06 01:39:09 - ERROR - stderr - +2025-02-06 01:39:09 - INFO - stdout - {'loss': 0.3853, 'grad_norm': 1.5069886445999146, 'learning_rate': 2.577897511719735e-06, 'epoch': 2.32} +2025-02-06 01:39:09 - ERROR - stderr - 77%|███████▋ | 17347/22434 [15:31:29<3:33:20, 2.52s/it] +2025-02-06 01:39:12 - ERROR - stderr - 77%|███████▋ | 17348/22434 [15:31:32<3:42:22, 2.62s/it] +2025-02-06 01:39:12 - ERROR - stderr - +2025-02-06 01:39:12 - ERROR - stderr - +2025-02-06 01:39:12 - INFO - stdout - {'loss': 0.3782, 'grad_norm': 1.452288269996643, 'learning_rate': 2.5769300375833705e-06, 'epoch': 2.32} +2025-02-06 01:39:12 - ERROR - stderr - 77%|███████▋ | 17348/22434 [15:31:32<3:42:22, 2.62s/it] +2025-02-06 01:39:15 - ERROR - stderr - 77%|███████▋ | 17349/22434 [15:31:34<3:43:12, 2.63s/it] +2025-02-06 01:39:15 - ERROR - stderr - +2025-02-06 01:39:15 - ERROR - stderr - +2025-02-06 01:39:15 - INFO - stdout - {'loss': 0.4271, 'grad_norm': 1.5234006643295288, 'learning_rate': 2.5759627181737977e-06, 'epoch': 2.32} +2025-02-06 01:39:15 - ERROR - stderr - 77%|███████▋ | 17349/22434 [15:31:34<3:43:12, 2.63s/it] +2025-02-06 01:39:17 - ERROR - stderr - 77%|███████▋ | 17350/22434 [15:31:37<3:39:53, 2.60s/it] +2025-02-06 01:39:17 - ERROR - stderr - 
+2025-02-06 01:39:17 - ERROR - stderr - +2025-02-06 01:39:17 - INFO - stdout - {'loss': 0.3326, 'grad_norm': 1.442626953125, 'learning_rate': 2.574995553511177e-06, 'epoch': 2.32} +2025-02-06 01:39:17 - ERROR - stderr - 77%|███████▋ | 17350/22434 [15:31:37<3:39:53, 2.60s/it] +2025-02-06 01:39:20 - ERROR - stderr - 77%|███████▋ | 17351/22434 [15:31:39<3:37:21, 2.57s/it] +2025-02-06 01:39:20 - ERROR - stderr - +2025-02-06 01:39:20 - ERROR - stderr - +2025-02-06 01:39:20 - INFO - stdout - {'loss': 0.352, 'grad_norm': 1.3990323543548584, 'learning_rate': 2.5740285436156732e-06, 'epoch': 2.32} +2025-02-06 01:39:20 - ERROR - stderr - 77%|███████▋ | 17351/22434 [15:31:39<3:37:21, 2.57s/it] +2025-02-06 01:39:22 - ERROR - stderr - 77%|███████▋ | 17352/22434 [15:31:42<3:34:57, 2.54s/it] +2025-02-06 01:39:22 - ERROR - stderr - +2025-02-06 01:39:22 - ERROR - stderr - +2025-02-06 01:39:22 - INFO - stdout - {'loss': 0.3578, 'grad_norm': 1.4189847707748413, 'learning_rate': 2.573061688507431e-06, 'epoch': 2.32} +2025-02-06 01:39:22 - ERROR - stderr - 77%|███████▋ | 17352/22434 [15:31:42<3:34:57, 2.54s/it] +2025-02-06 01:39:25 - ERROR - stderr - 77%|███████▋ | 17353/22434 [15:31:45<3:42:38, 2.63s/it] +2025-02-06 01:39:25 - ERROR - stderr - +2025-02-06 01:39:25 - ERROR - stderr - +2025-02-06 01:39:25 - INFO - stdout - {'loss': 0.3613, 'grad_norm': 1.3224866390228271, 'learning_rate': 2.5720949882066184e-06, 'epoch': 2.32} +2025-02-06 01:39:25 - ERROR - stderr - 77%|███████▋ | 17353/22434 [15:31:45<3:42:38, 2.63s/it] +2025-02-06 01:39:27 - ERROR - stderr - 77%|███████▋ | 17354/22434 [15:31:47<3:39:38, 2.59s/it] +2025-02-06 01:39:27 - ERROR - stderr - +2025-02-06 01:39:27 - ERROR - stderr - +2025-02-06 01:39:27 - INFO - stdout - {'loss': 0.3903, 'grad_norm': 1.6355680227279663, 'learning_rate': 2.5711284427333716e-06, 'epoch': 2.32} +2025-02-06 01:39:27 - ERROR - stderr - 77%|███████▋ | 17354/22434 [15:31:47<3:39:38, 2.59s/it] +2025-02-06 01:39:30 - ERROR - stderr - 77%|███████▋ | 
17355/22434 [15:31:50<3:38:06, 2.58s/it] +2025-02-06 01:39:30 - ERROR - stderr - +2025-02-06 01:39:30 - ERROR - stderr - +2025-02-06 01:39:30 - INFO - stdout - {'loss': 0.3676, 'grad_norm': 1.4673792123794556, 'learning_rate': 2.5701620521078497e-06, 'epoch': 2.32} +2025-02-06 01:39:30 - ERROR - stderr - 77%|███████▋ | 17355/22434 [15:31:50<3:38:06, 2.58s/it] +2025-02-06 01:39:32 - ERROR - stderr - 77%|███████▋ | 17356/22434 [15:31:52<3:36:28, 2.56s/it] +2025-02-06 01:39:32 - ERROR - stderr - +2025-02-06 01:39:32 - ERROR - stderr - +2025-02-06 01:39:32 - INFO - stdout - {'loss': 0.3669, 'grad_norm': 1.5565831661224365, 'learning_rate': 2.5691958163501875e-06, 'epoch': 2.32} +2025-02-06 01:39:32 - ERROR - stderr - 77%|███████▋ | 17356/22434 [15:31:52<3:36:28, 2.56s/it] +2025-02-06 01:39:35 - ERROR - stderr - 77%|███████▋ | 17357/22434 [15:31:55<3:33:17, 2.52s/it] +2025-02-06 01:39:35 - ERROR - stderr - +2025-02-06 01:39:35 - ERROR - stderr - +2025-02-06 01:39:35 - INFO - stdout - {'loss': 0.3789, 'grad_norm': 1.6876822710037231, 'learning_rate': 2.568229735480524e-06, 'epoch': 2.32} +2025-02-06 01:39:35 - ERROR - stderr - 77%|███████▋ | 17357/22434 [15:31:55<3:33:17, 2.52s/it] +2025-02-06 01:39:37 - ERROR - stderr - 77%|███████▋ | 17358/22434 [15:31:57<3:34:10, 2.53s/it] +2025-02-06 01:39:37 - ERROR - stderr - +2025-02-06 01:39:37 - ERROR - stderr - +2025-02-06 01:39:37 - INFO - stdout - {'loss': 0.339, 'grad_norm': 1.5254268646240234, 'learning_rate': 2.567263809519007e-06, 'epoch': 2.32} +2025-02-06 01:39:37 - ERROR - stderr - 77%|███████▋ | 17358/22434 [15:31:57<3:34:10, 2.53s/it] +2025-02-06 01:39:40 - ERROR - stderr - 77%|███████▋ | 17359/22434 [15:32:00<3:35:09, 2.54s/it] +2025-02-06 01:39:40 - ERROR - stderr - +2025-02-06 01:39:40 - ERROR - stderr - +2025-02-06 01:39:40 - INFO - stdout - {'loss': 0.3452, 'grad_norm': 1.4397848844528198, 'learning_rate': 2.5662980384857605e-06, 'epoch': 2.32} +2025-02-06 01:39:40 - ERROR - stderr - 77%|███████▋ | 17359/22434 
[15:32:00<3:35:09, 2.54s/it] +2025-02-06 01:39:42 - ERROR - stderr - 77%|███████▋ | 17360/22434 [15:32:02<3:32:56, 2.52s/it] +2025-02-06 01:39:43 - ERROR - stderr - +2025-02-06 01:39:43 - ERROR - stderr - +2025-02-06 01:39:43 - INFO - stdout - {'loss': 0.3539, 'grad_norm': 1.6071133613586426, 'learning_rate': 2.5653324224009192e-06, 'epoch': 2.32} +2025-02-06 01:39:43 - ERROR - stderr - 77%|███████▋ | 17360/22434 [15:32:02<3:32:56, 2.52s/it] +2025-02-06 01:39:45 - ERROR - stderr - 77%|███████▋ | 17361/22434 [15:32:05<3:31:36, 2.50s/it] +2025-02-06 01:39:45 - ERROR - stderr - +2025-02-06 01:39:45 - ERROR - stderr - +2025-02-06 01:39:45 - INFO - stdout - {'loss': 0.3885, 'grad_norm': 1.607040286064148, 'learning_rate': 2.564366961284608e-06, 'epoch': 2.32} +2025-02-06 01:39:45 - ERROR - stderr - 77%|███████▋ | 17361/22434 [15:32:05<3:31:36, 2.50s/it] +2025-02-06 01:39:47 - ERROR - stderr - 77%|███████▋ | 17362/22434 [15:32:07<3:31:57, 2.51s/it] +2025-02-06 01:39:47 - ERROR - stderr - +2025-02-06 01:39:47 - ERROR - stderr - +2025-02-06 01:39:47 - INFO - stdout - {'loss': 0.4151, 'grad_norm': 1.6239193677902222, 'learning_rate': 2.563401655156952e-06, 'epoch': 2.32} +2025-02-06 01:39:47 - ERROR - stderr - 77%|███████▋ | 17362/22434 [15:32:07<3:31:57, 2.51s/it] +2025-02-06 01:39:50 - ERROR - stderr - 77%|███████▋ | 17363/22434 [15:32:10<3:30:42, 2.49s/it] +2025-02-06 01:39:50 - ERROR - stderr - +2025-02-06 01:39:50 - ERROR - stderr - +2025-02-06 01:39:50 - INFO - stdout - {'loss': 0.3529, 'grad_norm': 1.3694560527801514, 'learning_rate': 2.562436504038074e-06, 'epoch': 2.32} +2025-02-06 01:39:50 - ERROR - stderr - 77%|███████▋ | 17363/22434 [15:32:10<3:30:42, 2.49s/it] +2025-02-06 01:39:52 - ERROR - stderr - 77%|███████▋ | 17364/22434 [15:32:12<3:29:29, 2.48s/it] +2025-02-06 01:39:52 - ERROR - stderr - +2025-02-06 01:39:52 - ERROR - stderr - +2025-02-06 01:39:52 - INFO - stdout - {'loss': 0.3779, 'grad_norm': 1.5949527025222778, 'learning_rate': 2.561471507948089e-06, 
'epoch': 2.32} +2025-02-06 01:39:52 - ERROR - stderr - 77%|███████▋ | 17364/22434 [15:32:12<3:29:29, 2.48s/it] +2025-02-06 01:39:55 - ERROR - stderr - 77%|███████▋ | 17365/22434 [15:32:15<3:29:54, 2.48s/it] +2025-02-06 01:39:55 - ERROR - stderr - +2025-02-06 01:39:55 - ERROR - stderr - +2025-02-06 01:39:55 - INFO - stdout - {'loss': 0.3697, 'grad_norm': 1.5379475355148315, 'learning_rate': 2.5605066669071123e-06, 'epoch': 2.32} +2025-02-06 01:39:55 - ERROR - stderr - 77%|███████▋ | 17365/22434 [15:32:15<3:29:54, 2.48s/it] +2025-02-06 01:39:57 - ERROR - stderr - 77%|███████▋ | 17366/22434 [15:32:17<3:31:55, 2.51s/it] +2025-02-06 01:39:57 - ERROR - stderr - +2025-02-06 01:39:57 - ERROR - stderr - +2025-02-06 01:39:57 - INFO - stdout - {'loss': 0.3417, 'grad_norm': 1.5675251483917236, 'learning_rate': 2.559541980935256e-06, 'epoch': 2.32} +2025-02-06 01:39:57 - ERROR - stderr - 77%|███████▋ | 17366/22434 [15:32:17<3:31:55, 2.51s/it] +2025-02-06 01:40:00 - ERROR - stderr - 77%|███████▋ | 17367/22434 [15:32:20<3:31:18, 2.50s/it] +2025-02-06 01:40:00 - ERROR - stderr - +2025-02-06 01:40:00 - ERROR - stderr - +2025-02-06 01:40:00 - INFO - stdout - {'loss': 0.3592, 'grad_norm': 1.6319044828414917, 'learning_rate': 2.558577450052627e-06, 'epoch': 2.32} +2025-02-06 01:40:00 - ERROR - stderr - 77%|███████▋ | 17367/22434 [15:32:20<3:31:18, 2.50s/it] +2025-02-06 01:40:02 - ERROR - stderr - 77%|███████▋ | 17368/22434 [15:32:22<3:30:14, 2.49s/it] +2025-02-06 01:40:02 - ERROR - stderr - +2025-02-06 01:40:02 - ERROR - stderr - +2025-02-06 01:40:02 - INFO - stdout - {'loss': 0.3307, 'grad_norm': 1.318451166152954, 'learning_rate': 2.5576130742793304e-06, 'epoch': 2.32} +2025-02-06 01:40:02 - ERROR - stderr - 77%|███████▋ | 17368/22434 [15:32:22<3:30:14, 2.49s/it] +2025-02-06 01:40:05 - ERROR - stderr - 77%|███████▋ | 17369/22434 [15:32:25<3:30:30, 2.49s/it] +2025-02-06 01:40:05 - ERROR - stderr - +2025-02-06 01:40:05 - ERROR - stderr - +2025-02-06 01:40:05 - INFO - stdout - {'loss': 
0.3111, 'grad_norm': 1.297761082649231, 'learning_rate': 2.5566488536354673e-06, 'epoch': 2.32} +2025-02-06 01:40:05 - ERROR - stderr - 77%|███████▋ | 17369/22434 [15:32:25<3:30:30, 2.49s/it] +2025-02-06 01:40:07 - ERROR - stderr - 77%|███████▋ | 17370/22434 [15:32:27<3:33:23, 2.53s/it] +2025-02-06 01:40:08 - ERROR - stderr - +2025-02-06 01:40:08 - ERROR - stderr - +2025-02-06 01:40:08 - INFO - stdout - {'loss': 0.2983, 'grad_norm': 1.619277834892273, 'learning_rate': 2.555684788141137e-06, 'epoch': 2.32} +2025-02-06 01:40:08 - ERROR - stderr - 77%|███████▋ | 17370/22434 [15:32:27<3:33:23, 2.53s/it] +2025-02-06 01:40:10 - ERROR - stderr - 77%|███████▋ | 17371/22434 [15:32:30<3:33:55, 2.54s/it] +2025-02-06 01:40:10 - ERROR - stderr - +2025-02-06 01:40:10 - ERROR - stderr - +2025-02-06 01:40:10 - INFO - stdout - {'loss': 0.3385, 'grad_norm': 1.3799242973327637, 'learning_rate': 2.5547208778164336e-06, 'epoch': 2.32} +2025-02-06 01:40:10 - ERROR - stderr - 77%|███████▋ | 17371/22434 [15:32:30<3:33:55, 2.54s/it] +2025-02-06 01:40:12 - ERROR - stderr - 77%|███████▋ | 17372/22434 [15:32:32<3:31:21, 2.51s/it] +2025-02-06 01:40:13 - ERROR - stderr - +2025-02-06 01:40:13 - ERROR - stderr - +2025-02-06 01:40:13 - INFO - stdout - {'loss': 0.3603, 'grad_norm': 1.5364391803741455, 'learning_rate': 2.5537571226814517e-06, 'epoch': 2.32} +2025-02-06 01:40:13 - ERROR - stderr - 77%|███████▋ | 17372/22434 [15:32:32<3:31:21, 2.51s/it] +2025-02-06 01:40:15 - ERROR - stderr - 77%|███████▋ | 17373/22434 [15:32:35<3:31:37, 2.51s/it] +2025-02-06 01:40:15 - ERROR - stderr - +2025-02-06 01:40:15 - ERROR - stderr - +2025-02-06 01:40:15 - INFO - stdout - {'loss': 0.3675, 'grad_norm': 1.5701355934143066, 'learning_rate': 2.5527935227562716e-06, 'epoch': 2.32} +2025-02-06 01:40:15 - ERROR - stderr - 77%|███████▋ | 17373/22434 [15:32:35<3:31:37, 2.51s/it] +2025-02-06 01:40:18 - ERROR - stderr - 77%|███████▋ | 17374/22434 [15:32:37<3:33:36, 2.53s/it] +2025-02-06 01:40:18 - ERROR - stderr - 
+2025-02-06 01:40:18 - ERROR - stderr - +2025-02-06 01:40:18 - INFO - stdout - {'loss': 0.4049, 'grad_norm': 1.7673043012619019, 'learning_rate': 2.5518300780609905e-06, 'epoch': 2.32} +2025-02-06 01:40:18 - ERROR - stderr - 77%|███████▋ | 17374/22434 [15:32:37<3:33:36, 2.53s/it] +2025-02-06 01:40:20 - ERROR - stderr - 77%|███████▋ | 17375/22434 [15:32:40<3:32:02, 2.51s/it] +2025-02-06 01:40:20 - ERROR - stderr - +2025-02-06 01:40:20 - ERROR - stderr - +2025-02-06 01:40:20 - INFO - stdout - {'loss': 0.3571, 'grad_norm': 1.6083894968032837, 'learning_rate': 2.5508667886156814e-06, 'epoch': 2.32} +2025-02-06 01:40:20 - ERROR - stderr - 77%|███████▋ | 17375/22434 [15:32:40<3:32:02, 2.51s/it] +2025-02-06 01:40:23 - ERROR - stderr - 77%|███████▋ | 17376/22434 [15:32:42<3:32:37, 2.52s/it] +2025-02-06 01:40:23 - ERROR - stderr - +2025-02-06 01:40:23 - ERROR - stderr - +2025-02-06 01:40:23 - INFO - stdout - {'loss': 0.4013, 'grad_norm': 1.6863664388656616, 'learning_rate': 2.549903654440423e-06, 'epoch': 2.32} +2025-02-06 01:40:23 - ERROR - stderr - 77%|███████▋ | 17376/22434 [15:32:42<3:32:37, 2.52s/it] +2025-02-06 01:40:25 - ERROR - stderr - 77%|███████▋ | 17377/22434 [15:32:45<3:37:32, 2.58s/it] +2025-02-06 01:40:25 - ERROR - stderr - +2025-02-06 01:40:25 - ERROR - stderr - +2025-02-06 01:40:25 - INFO - stdout - {'loss': 0.34, 'grad_norm': 1.5348397493362427, 'learning_rate': 2.5489406755553005e-06, 'epoch': 2.32} +2025-02-06 01:40:25 - ERROR - stderr - 77%|███████▋ | 17377/22434 [15:32:45<3:37:32, 2.58s/it] +2025-02-06 01:40:28 - ERROR - stderr - 77%|███████▋ | 17378/22434 [15:32:48<3:35:39, 2.56s/it] +2025-02-06 01:40:28 - ERROR - stderr - +2025-02-06 01:40:28 - ERROR - stderr - +2025-02-06 01:40:28 - INFO - stdout - {'loss': 0.3528, 'grad_norm': 1.376990556716919, 'learning_rate': 2.547977851980373e-06, 'epoch': 2.32} +2025-02-06 01:40:28 - ERROR - stderr - 77%|███████▋ | 17378/22434 [15:32:48<3:35:39, 2.56s/it] +2025-02-06 01:40:30 - ERROR - stderr - 77%|███████▋ | 
17379/22434 [15:32:50<3:33:21, 2.53s/it] +2025-02-06 01:40:30 - ERROR - stderr - +2025-02-06 01:40:30 - ERROR - stderr - +2025-02-06 01:40:30 - INFO - stdout - {'loss': 0.3391, 'grad_norm': 1.4662714004516602, 'learning_rate': 2.5470151837357227e-06, 'epoch': 2.32} +2025-02-06 01:40:30 - ERROR - stderr - 77%|███████▋ | 17379/22434 [15:32:50<3:33:21, 2.53s/it] +2025-02-06 01:40:33 - ERROR - stderr - 77%|███████▋ | 17380/22434 [15:32:53<3:33:38, 2.54s/it] +2025-02-06 01:40:33 - ERROR - stderr - +2025-02-06 01:40:33 - ERROR - stderr - +2025-02-06 01:40:33 - INFO - stdout - {'loss': 0.377, 'grad_norm': 1.522168755531311, 'learning_rate': 2.546052670841406e-06, 'epoch': 2.32} +2025-02-06 01:40:33 - ERROR - stderr - 77%|███████▋ | 17380/22434 [15:32:53<3:33:38, 2.54s/it] +2025-02-06 01:40:35 - ERROR - stderr - 77%|███████▋ | 17381/22434 [15:32:55<3:35:10, 2.55s/it] +2025-02-06 01:40:35 - ERROR - stderr - +2025-02-06 01:40:35 - ERROR - stderr - +2025-02-06 01:40:35 - INFO - stdout - {'loss': 0.3398, 'grad_norm': 1.5200400352478027, 'learning_rate': 2.5450903133174878e-06, 'epoch': 2.32} +2025-02-06 01:40:35 - ERROR - stderr - 77%|███████▋ | 17381/22434 [15:32:55<3:35:10, 2.55s/it] +2025-02-06 01:40:38 - ERROR - stderr - 77%|███████▋ | 17382/22434 [15:32:58<3:34:15, 2.54s/it] +2025-02-06 01:40:38 - ERROR - stderr - +2025-02-06 01:40:38 - ERROR - stderr - +2025-02-06 01:40:38 - INFO - stdout - {'loss': 0.3516, 'grad_norm': 1.5238368511199951, 'learning_rate': 2.54412811118403e-06, 'epoch': 2.32} +2025-02-06 01:40:38 - ERROR - stderr - 77%|███████▋ | 17382/22434 [15:32:58<3:34:15, 2.54s/it] +2025-02-06 01:40:40 - ERROR - stderr - 77%|███████▋ | 17383/22434 [15:33:00<3:32:09, 2.52s/it] +2025-02-06 01:40:40 - ERROR - stderr - +2025-02-06 01:40:40 - ERROR - stderr - +2025-02-06 01:40:40 - INFO - stdout - {'loss': 0.3385, 'grad_norm': 1.58733069896698, 'learning_rate': 2.5431660644610856e-06, 'epoch': 2.32} +2025-02-06 01:40:40 - ERROR - stderr - 77%|███████▋ | 17383/22434 
[15:33:00<3:32:09, 2.52s/it] +2025-02-06 01:40:43 - ERROR - stderr - 77%|███████▋ | 17384/22434 [15:33:03<3:31:34, 2.51s/it] +2025-02-06 01:40:43 - ERROR - stderr - +2025-02-06 01:40:43 - ERROR - stderr - +2025-02-06 01:40:43 - INFO - stdout - {'loss': 0.3129, 'grad_norm': 1.3430794477462769, 'learning_rate': 2.542204173168711e-06, 'epoch': 2.32} +2025-02-06 01:40:43 - ERROR - stderr - 77%|███████▋ | 17384/22434 [15:33:03<3:31:34, 2.51s/it] +2025-02-06 01:40:45 - ERROR - stderr - 77%|███████▋ | 17385/22434 [15:33:05<3:31:23, 2.51s/it] +2025-02-06 01:40:45 - ERROR - stderr - +2025-02-06 01:40:45 - ERROR - stderr - +2025-02-06 01:40:45 - INFO - stdout - {'loss': 0.3484, 'grad_norm': 1.43308687210083, 'learning_rate': 2.541242437326953e-06, 'epoch': 2.32} +2025-02-06 01:40:45 - ERROR - stderr - 77%|███████▋ | 17385/22434 [15:33:05<3:31:23, 2.51s/it] +2025-02-06 01:40:48 - ERROR - stderr - 77%|███████▋ | 17386/22434 [15:33:08<3:38:18, 2.59s/it] +2025-02-06 01:40:48 - ERROR - stderr - +2025-02-06 01:40:48 - ERROR - stderr - +2025-02-06 01:40:48 - INFO - stdout - {'loss': 0.3646, 'grad_norm': 1.5380980968475342, 'learning_rate': 2.540280856955859e-06, 'epoch': 2.32} +2025-02-06 01:40:48 - ERROR - stderr - 77%|███████▋ | 17386/22434 [15:33:08<3:38:18, 2.59s/it] +2025-02-06 01:40:51 - ERROR - stderr - 78%|███████▊ | 17387/22434 [15:33:10<3:35:01, 2.56s/it] +2025-02-06 01:40:51 - ERROR - stderr - +2025-02-06 01:40:51 - ERROR - stderr - +2025-02-06 01:40:51 - INFO - stdout - {'loss': 0.3502, 'grad_norm': 1.5723624229431152, 'learning_rate': 2.539319432075472e-06, 'epoch': 2.33} +2025-02-06 01:40:51 - ERROR - stderr - 78%|███████▊ | 17387/22434 [15:33:10<3:35:01, 2.56s/it] +2025-02-06 01:40:53 - ERROR - stderr - 78%|███████▊ | 17388/22434 [15:33:13<3:34:02, 2.55s/it] +2025-02-06 01:40:53 - ERROR - stderr - +2025-02-06 01:40:53 - ERROR - stderr - +2025-02-06 01:40:53 - INFO - stdout - {'loss': 0.3596, 'grad_norm': 1.6219971179962158, 'learning_rate': 2.538358162705834e-06, 
'epoch': 2.33} +2025-02-06 01:40:53 - ERROR - stderr - 78%|███████▊ | 17388/22434 [15:33:13<3:34:02, 2.55s/it] +2025-02-06 01:40:56 - ERROR - stderr - 78%|███████▊ | 17389/22434 [15:33:15<3:32:41, 2.53s/it] +2025-02-06 01:40:56 - ERROR - stderr - +2025-02-06 01:40:56 - ERROR - stderr - +2025-02-06 01:40:56 - INFO - stdout - {'loss': 0.3404, 'grad_norm': 1.5307294130325317, 'learning_rate': 2.5373970488669784e-06, 'epoch': 2.33} +2025-02-06 01:40:56 - ERROR - stderr - 78%|███████▊ | 17389/22434 [15:33:15<3:32:41, 2.53s/it] +2025-02-06 01:40:58 - ERROR - stderr - 78%|███████▊ | 17390/22434 [15:33:18<3:31:11, 2.51s/it] +2025-02-06 01:40:58 - ERROR - stderr - +2025-02-06 01:40:58 - ERROR - stderr - +2025-02-06 01:40:58 - INFO - stdout - {'loss': 0.4327, 'grad_norm': 1.7119563817977905, 'learning_rate': 2.536436090578941e-06, 'epoch': 2.33} +2025-02-06 01:40:58 - ERROR - stderr - 78%|███████▊ | 17390/22434 [15:33:18<3:31:11, 2.51s/it] +2025-02-06 01:41:01 - ERROR - stderr - 78%|███████▊ | 17391/22434 [15:33:20<3:31:13, 2.51s/it] +2025-02-06 01:41:01 - ERROR - stderr - +2025-02-06 01:41:01 - ERROR - stderr - +2025-02-06 01:41:01 - INFO - stdout - {'loss': 0.3429, 'grad_norm': 1.5795284509658813, 'learning_rate': 2.535475287861755e-06, 'epoch': 2.33} +2025-02-06 01:41:01 - ERROR - stderr - 78%|███████▊ | 17391/22434 [15:33:20<3:31:13, 2.51s/it] +2025-02-06 01:41:03 - ERROR - stderr - 78%|███████▊ | 17392/22434 [15:33:23<3:29:33, 2.49s/it] +2025-02-06 01:41:03 - ERROR - stderr - +2025-02-06 01:41:03 - ERROR - stderr - +2025-02-06 01:41:03 - INFO - stdout - {'loss': 0.3267, 'grad_norm': 1.4756495952606201, 'learning_rate': 2.534514640735437e-06, 'epoch': 2.33} +2025-02-06 01:41:03 - ERROR - stderr - 78%|███████▊ | 17392/22434 [15:33:23<3:29:33, 2.49s/it] +2025-02-06 01:41:06 - ERROR - stderr - 78%|███████▊ | 17393/22434 [15:33:25<3:29:16, 2.49s/it] +2025-02-06 01:41:06 - ERROR - stderr - +2025-02-06 01:41:06 - ERROR - stderr - +2025-02-06 01:41:06 - INFO - stdout - {'loss': 
0.3654, 'grad_norm': 1.5753260850906372, 'learning_rate': 2.533554149220024e-06, 'epoch': 2.33} +2025-02-06 01:41:06 - ERROR - stderr - 78%|███████▊ | 17393/22434 [15:33:25<3:29:16, 2.49s/it] +2025-02-06 01:41:08 - ERROR - stderr - 78%|███████▊ | 17394/22434 [15:33:28<3:37:21, 2.59s/it] +2025-02-06 01:41:08 - ERROR - stderr - +2025-02-06 01:41:08 - ERROR - stderr - +2025-02-06 01:41:08 - INFO - stdout - {'loss': 0.3742, 'grad_norm': 1.6040136814117432, 'learning_rate': 2.532593813335524e-06, 'epoch': 2.33} +2025-02-06 01:41:08 - ERROR - stderr - 78%|███████▊ | 17394/22434 [15:33:28<3:37:21, 2.59s/it] +2025-02-06 01:41:11 - ERROR - stderr - 78%|███████▊ | 17395/22434 [15:33:31<3:33:25, 2.54s/it] +2025-02-06 01:41:11 - ERROR - stderr - +2025-02-06 01:41:11 - ERROR - stderr - +2025-02-06 01:41:11 - INFO - stdout - {'loss': 0.3991, 'grad_norm': 1.6881283521652222, 'learning_rate': 2.531633633101964e-06, 'epoch': 2.33} +2025-02-06 01:41:11 - ERROR - stderr - 78%|███████▊ | 17395/22434 [15:33:31<3:33:25, 2.54s/it] +2025-02-06 01:41:13 - ERROR - stderr - 78%|███████▊ | 17396/22434 [15:33:33<3:31:05, 2.51s/it] +2025-02-06 01:41:13 - ERROR - stderr - +2025-02-06 01:41:13 - ERROR - stderr - +2025-02-06 01:41:13 - INFO - stdout - {'loss': 0.3627, 'grad_norm': 1.508731722831726, 'learning_rate': 2.530673608539357e-06, 'epoch': 2.33} +2025-02-06 01:41:13 - ERROR - stderr - 78%|███████▊ | 17396/22434 [15:33:33<3:31:05, 2.51s/it] +2025-02-06 01:41:16 - ERROR - stderr - 78%|███████▊ | 17397/22434 [15:33:36<3:37:19, 2.59s/it] +2025-02-06 01:41:16 - ERROR - stderr - +2025-02-06 01:41:16 - ERROR - stderr - +2025-02-06 01:41:16 - INFO - stdout - {'loss': 0.3423, 'grad_norm': 1.5773926973342896, 'learning_rate': 2.529713739667705e-06, 'epoch': 2.33} +2025-02-06 01:41:16 - ERROR - stderr - 78%|███████▊ | 17397/22434 [15:33:36<3:37:19, 2.59s/it] +2025-02-06 01:41:19 - ERROR - stderr - 78%|███████▊ | 17398/22434 [15:33:38<3:33:48, 2.55s/it] +2025-02-06 01:41:19 - ERROR - stderr - 
+2025-02-06 01:41:19 - ERROR - stderr - +2025-02-06 01:41:19 - INFO - stdout - {'loss': 0.3494, 'grad_norm': 1.667004942893982, 'learning_rate': 2.5287540265070277e-06, 'epoch': 2.33} +2025-02-06 01:41:19 - ERROR - stderr - 78%|███████▊ | 17398/22434 [15:33:38<3:33:48, 2.55s/it] +2025-02-06 01:41:21 - ERROR - stderr - 78%|███████▊ | 17399/22434 [15:33:41<3:32:31, 2.53s/it] +2025-02-06 01:41:21 - ERROR - stderr - +2025-02-06 01:41:21 - ERROR - stderr - +2025-02-06 01:41:21 - INFO - stdout - {'loss': 0.4043, 'grad_norm': 1.5701279640197754, 'learning_rate': 2.5277944690773213e-06, 'epoch': 2.33} +2025-02-06 01:41:21 - ERROR - stderr - 78%|███████▊ | 17399/22434 [15:33:41<3:32:31, 2.53s/it] +2025-02-06 01:41:24 - ERROR - stderr - 78%|███████▊ | 17400/22434 [15:33:44<3:41:38, 2.64s/it] +2025-02-06 01:41:24 - ERROR - stderr - +2025-02-06 01:41:24 - ERROR - stderr - +2025-02-06 01:41:24 - INFO - stdout - {'loss': 0.3364, 'grad_norm': 1.41645085811615, 'learning_rate': 2.5268350673985887e-06, 'epoch': 2.33} +2025-02-06 01:41:24 - ERROR - stderr - 78%|███████▊ | 17400/22434 [15:33:44<3:41:38, 2.64s/it] +2025-02-06 01:41:26 - ERROR - stderr - 78%|███████▊ | 17401/22434 [15:33:46<3:38:07, 2.60s/it] +2025-02-06 01:41:26 - ERROR - stderr - +2025-02-06 01:41:26 - ERROR - stderr - +2025-02-06 01:41:26 - INFO - stdout - {'loss': 0.3774, 'grad_norm': 1.483176827430725, 'learning_rate': 2.5258758214908273e-06, 'epoch': 2.33} +2025-02-06 01:41:26 - ERROR - stderr - 78%|███████▊ | 17401/22434 [15:33:46<3:38:07, 2.60s/it] +2025-02-06 01:41:29 - ERROR - stderr - 78%|███████▊ | 17402/22434 [15:33:49<3:37:44, 2.60s/it] +2025-02-06 01:41:29 - ERROR - stderr - +2025-02-06 01:41:29 - ERROR - stderr - +2025-02-06 01:41:29 - INFO - stdout - {'loss': 0.3623, 'grad_norm': 1.4493509531021118, 'learning_rate': 2.5249167313740307e-06, 'epoch': 2.33} +2025-02-06 01:41:29 - ERROR - stderr - 78%|███████▊ | 17402/22434 [15:33:49<3:37:44, 2.60s/it] +2025-02-06 01:41:31 - ERROR - stderr - 78%|███████▊ | 
17403/22434 [15:33:51<3:34:14, 2.55s/it] +2025-02-06 01:41:32 - ERROR - stderr - +2025-02-06 01:41:32 - ERROR - stderr - +2025-02-06 01:41:32 - INFO - stdout - {'loss': 0.3894, 'grad_norm': 1.496437668800354, 'learning_rate': 2.523957797068197e-06, 'epoch': 2.33} +2025-02-06 01:41:32 - ERROR - stderr - 78%|███████▊ | 17403/22434 [15:33:51<3:34:14, 2.55s/it] +2025-02-06 01:41:34 - ERROR - stderr - 78%|███████▊ | 17404/22434 [15:33:54<3:32:41, 2.54s/it] +2025-02-06 01:41:34 - ERROR - stderr - +2025-02-06 01:41:34 - ERROR - stderr - +2025-02-06 01:41:34 - INFO - stdout - {'loss': 0.439, 'grad_norm': 1.7576886415481567, 'learning_rate': 2.5229990185933075e-06, 'epoch': 2.33} +2025-02-06 01:41:34 - ERROR - stderr - 78%|███████▊ | 17404/22434 [15:33:54<3:32:41, 2.54s/it] +2025-02-06 01:41:36 - ERROR - stderr - 78%|███████▊ | 17405/22434 [15:33:56<3:30:20, 2.51s/it] +2025-02-06 01:41:36 - ERROR - stderr - +2025-02-06 01:41:36 - ERROR - stderr - +2025-02-06 01:41:36 - INFO - stdout - {'loss': 0.3662, 'grad_norm': 1.4831782579421997, 'learning_rate': 2.5220403959693473e-06, 'epoch': 2.33} +2025-02-06 01:41:36 - ERROR - stderr - 78%|███████▊ | 17405/22434 [15:33:56<3:30:20, 2.51s/it] +2025-02-06 01:41:39 - ERROR - stderr - 78%|███████▊ | 17406/22434 [15:33:59<3:31:12, 2.52s/it] +2025-02-06 01:41:39 - ERROR - stderr - +2025-02-06 01:41:39 - ERROR - stderr - +2025-02-06 01:41:39 - INFO - stdout - {'loss': 0.3381, 'grad_norm': 1.5239614248275757, 'learning_rate': 2.5210819292163003e-06, 'epoch': 2.33} +2025-02-06 01:41:39 - ERROR - stderr - 78%|███████▊ | 17406/22434 [15:33:59<3:31:12, 2.52s/it] +2025-02-06 01:41:41 - ERROR - stderr - 78%|███████▊ | 17407/22434 [15:34:01<3:30:54, 2.52s/it] +2025-02-06 01:41:42 - ERROR - stderr - +2025-02-06 01:41:42 - ERROR - stderr - +2025-02-06 01:41:42 - INFO - stdout - {'loss': 0.3383, 'grad_norm': 1.4576964378356934, 'learning_rate': 2.5201236183541433e-06, 'epoch': 2.33} +2025-02-06 01:41:42 - ERROR - stderr - 78%|███████▊ | 17407/22434 
[15:34:01<3:30:54, 2.52s/it] +2025-02-06 01:41:44 - ERROR - stderr - 78%|███████▊ | 17408/22434 [15:34:04<3:29:36, 2.50s/it] +2025-02-06 01:41:44 - ERROR - stderr - +2025-02-06 01:41:44 - ERROR - stderr - +2025-02-06 01:41:44 - INFO - stdout - {'loss': 0.3873, 'grad_norm': 1.570426106452942, 'learning_rate': 2.519165463402853e-06, 'epoch': 2.33} +2025-02-06 01:41:44 - ERROR - stderr - 78%|███████▊ | 17408/22434 [15:34:04<3:29:36, 2.50s/it] +2025-02-06 01:41:46 - ERROR - stderr - 78%|███████▊ | 17409/22434 [15:34:06<3:28:20, 2.49s/it] +2025-02-06 01:41:46 - ERROR - stderr - +2025-02-06 01:41:46 - ERROR - stderr - +2025-02-06 01:41:46 - INFO - stdout - {'loss': 0.3694, 'grad_norm': 1.5089309215545654, 'learning_rate': 2.5182074643823996e-06, 'epoch': 2.33} +2025-02-06 01:41:46 - ERROR - stderr - 78%|███████▊ | 17409/22434 [15:34:06<3:28:20, 2.49s/it] +2025-02-06 01:41:49 - ERROR - stderr - 78%|███████▊ | 17410/22434 [15:34:09<3:27:45, 2.48s/it] +2025-02-06 01:41:49 - ERROR - stderr - +2025-02-06 01:41:49 - ERROR - stderr - +2025-02-06 01:41:49 - INFO - stdout - {'loss': 0.4302, 'grad_norm': 1.7286098003387451, 'learning_rate': 2.517249621312752e-06, 'epoch': 2.33} +2025-02-06 01:41:49 - ERROR - stderr - 78%|███████▊ | 17410/22434 [15:34:09<3:27:45, 2.48s/it] +2025-02-06 01:41:51 - ERROR - stderr - 78%|███████▊ | 17411/22434 [15:34:11<3:29:10, 2.50s/it] +2025-02-06 01:41:51 - ERROR - stderr - +2025-02-06 01:41:51 - ERROR - stderr - +2025-02-06 01:41:51 - INFO - stdout - {'loss': 0.3413, 'grad_norm': 1.3437933921813965, 'learning_rate': 2.516291934213876e-06, 'epoch': 2.33} +2025-02-06 01:41:51 - ERROR - stderr - 78%|███████▊ | 17411/22434 [15:34:11<3:29:10, 2.50s/it] +2025-02-06 01:41:54 - ERROR - stderr - 78%|███████▊ | 17412/22434 [15:34:14<3:28:28, 2.49s/it] +2025-02-06 01:41:54 - ERROR - stderr - +2025-02-06 01:41:54 - ERROR - stderr - +2025-02-06 01:41:54 - INFO - stdout - {'loss': 0.377, 'grad_norm': 1.4931237697601318, 'learning_rate': 2.5153344031057337e-06, 
'epoch': 2.33} +2025-02-06 01:41:54 - ERROR - stderr - 78%|███████▊ | 17412/22434 [15:34:14<3:28:28, 2.49s/it] +2025-02-06 01:41:56 - ERROR - stderr - 78%|███████▊ | 17413/22434 [15:34:16<3:26:13, 2.46s/it] +2025-02-06 01:41:56 - ERROR - stderr - +2025-02-06 01:41:56 - ERROR - stderr - +2025-02-06 01:41:56 - INFO - stdout - {'loss': 0.3786, 'grad_norm': 1.443722128868103, 'learning_rate': 2.5143770280082837e-06, 'epoch': 2.33} +2025-02-06 01:41:56 - ERROR - stderr - 78%|███████▊ | 17413/22434 [15:34:16<3:26:13, 2.46s/it] +2025-02-06 01:41:59 - ERROR - stderr - 78%|███████▊ | 17414/22434 [15:34:19<3:29:52, 2.51s/it] +2025-02-06 01:41:59 - ERROR - stderr - +2025-02-06 01:41:59 - ERROR - stderr - +2025-02-06 01:41:59 - INFO - stdout - {'loss': 0.3514, 'grad_norm': 1.553514838218689, 'learning_rate': 2.513419808941482e-06, 'epoch': 2.33} +2025-02-06 01:41:59 - ERROR - stderr - 78%|███████▊ | 17414/22434 [15:34:19<3:29:52, 2.51s/it] +2025-02-06 01:42:01 - ERROR - stderr - 78%|███████▊ | 17415/22434 [15:34:21<3:30:19, 2.51s/it] +2025-02-06 01:42:01 - ERROR - stderr - +2025-02-06 01:42:01 - ERROR - stderr - +2025-02-06 01:42:01 - INFO - stdout - {'loss': 0.3721, 'grad_norm': 1.6258881092071533, 'learning_rate': 2.5124627459252826e-06, 'epoch': 2.33} +2025-02-06 01:42:01 - ERROR - stderr - 78%|███████▊ | 17415/22434 [15:34:21<3:30:19, 2.51s/it] +2025-02-06 01:42:04 - ERROR - stderr - 78%|███████▊ | 17416/22434 [15:34:24<3:28:07, 2.49s/it] +2025-02-06 01:42:04 - ERROR - stderr - +2025-02-06 01:42:04 - ERROR - stderr - +2025-02-06 01:42:04 - INFO - stdout - {'loss': 0.3513, 'grad_norm': 1.626522421836853, 'learning_rate': 2.5115058389796264e-06, 'epoch': 2.33} +2025-02-06 01:42:04 - ERROR - stderr - 78%|███████▊ | 17416/22434 [15:34:24<3:28:07, 2.49s/it] +2025-02-06 01:42:06 - ERROR - stderr - 78%|███████▊ | 17417/22434 [15:34:26<3:28:47, 2.50s/it] +2025-02-06 01:42:06 - ERROR - stderr - +2025-02-06 01:42:06 - ERROR - stderr - +2025-02-06 01:42:06 - INFO - stdout - {'loss': 
0.3595, 'grad_norm': 1.4538726806640625, 'learning_rate': 2.510549088124472e-06, 'epoch': 2.33} +2025-02-06 01:42:06 - ERROR - stderr - 78%|███████▊ | 17417/22434 [15:34:26<3:28:47, 2.50s/it] +2025-02-06 01:42:09 - ERROR - stderr - 78%|███████▊ | 17418/22434 [15:34:29<3:29:55, 2.51s/it] +2025-02-06 01:42:09 - ERROR - stderr - +2025-02-06 01:42:09 - ERROR - stderr - +2025-02-06 01:42:09 - INFO - stdout - {'loss': 0.3397, 'grad_norm': 1.4378339052200317, 'learning_rate': 2.509592493379749e-06, 'epoch': 2.33} +2025-02-06 01:42:09 - ERROR - stderr - 78%|███████▊ | 17418/22434 [15:34:29<3:29:55, 2.51s/it] +2025-02-06 01:42:11 - ERROR - stderr - 78%|███████▊ | 17419/22434 [15:34:31<3:29:24, 2.51s/it] +2025-02-06 01:42:11 - ERROR - stderr - +2025-02-06 01:42:11 - ERROR - stderr - +2025-02-06 01:42:11 - INFO - stdout - {'loss': 0.306, 'grad_norm': 1.31521475315094, 'learning_rate': 2.5086360547654088e-06, 'epoch': 2.33} +2025-02-06 01:42:11 - ERROR - stderr - 78%|███████▊ | 17419/22434 [15:34:31<3:29:24, 2.51s/it] +2025-02-06 01:42:14 - ERROR - stderr - 78%|███████▊ | 17420/22434 [15:34:34<3:30:23, 2.52s/it] +2025-02-06 01:42:14 - ERROR - stderr - +2025-02-06 01:42:14 - ERROR - stderr - +2025-02-06 01:42:14 - INFO - stdout - {'loss': 0.3651, 'grad_norm': 1.5718178749084473, 'learning_rate': 2.507679772301379e-06, 'epoch': 2.33} +2025-02-06 01:42:14 - ERROR - stderr - 78%|███████▊ | 17420/22434 [15:34:34<3:30:23, 2.52s/it] +2025-02-06 01:42:16 - ERROR - stderr - 78%|███████▊ | 17421/22434 [15:34:36<3:29:08, 2.50s/it] +2025-02-06 01:42:16 - ERROR - stderr - +2025-02-06 01:42:16 - ERROR - stderr - +2025-02-06 01:42:16 - INFO - stdout - {'loss': 0.4231, 'grad_norm': 1.7533091306686401, 'learning_rate': 2.5067236460075916e-06, 'epoch': 2.33} +2025-02-06 01:42:16 - ERROR - stderr - 78%|███████▊ | 17421/22434 [15:34:36<3:29:08, 2.50s/it] +2025-02-06 01:42:19 - ERROR - stderr - 78%|███████▊ | 17422/22434 [15:34:39<3:28:56, 2.50s/it] +2025-02-06 01:42:19 - ERROR - stderr - 
+2025-02-06 01:42:19 - ERROR - stderr - +2025-02-06 01:42:19 - INFO - stdout - {'loss': 0.3965, 'grad_norm': 1.6867796182632446, 'learning_rate': 2.505767675903985e-06, 'epoch': 2.33} +2025-02-06 01:42:19 - ERROR - stderr - 78%|███████▊ | 17422/22434 [15:34:39<3:28:56, 2.50s/it] +2025-02-06 01:42:21 - ERROR - stderr - 78%|███████▊ | 17423/22434 [15:34:41<3:29:06, 2.50s/it] +2025-02-06 01:42:21 - ERROR - stderr - +2025-02-06 01:42:21 - ERROR - stderr - +2025-02-06 01:42:21 - INFO - stdout - {'loss': 0.3993, 'grad_norm': 1.5020016431808472, 'learning_rate': 2.5048118620104754e-06, 'epoch': 2.33} +2025-02-06 01:42:21 - ERROR - stderr - 78%|███████▊ | 17423/22434 [15:34:41<3:29:06, 2.50s/it] +2025-02-06 01:42:24 - ERROR - stderr - 78%|███████▊ | 17424/22434 [15:34:44<3:29:20, 2.51s/it] +2025-02-06 01:42:24 - ERROR - stderr - +2025-02-06 01:42:24 - ERROR - stderr - +2025-02-06 01:42:24 - INFO - stdout - {'loss': 0.3557, 'grad_norm': 1.4248294830322266, 'learning_rate': 2.503856204346995e-06, 'epoch': 2.33} +2025-02-06 01:42:24 - ERROR - stderr - 78%|███████▊ | 17424/22434 [15:34:44<3:29:20, 2.51s/it] +2025-02-06 01:42:26 - ERROR - stderr - 78%|███████▊ | 17425/22434 [15:34:46<3:29:48, 2.51s/it] +2025-02-06 01:42:26 - ERROR - stderr - +2025-02-06 01:42:26 - ERROR - stderr - +2025-02-06 01:42:26 - INFO - stdout - {'loss': 0.3938, 'grad_norm': 1.6056840419769287, 'learning_rate': 2.5029007029334574e-06, 'epoch': 2.33} +2025-02-06 01:42:26 - ERROR - stderr - 78%|███████▊ | 17425/22434 [15:34:46<3:29:48, 2.51s/it] +2025-02-06 01:42:29 - ERROR - stderr - 78%|███████▊ | 17426/22434 [15:34:49<3:28:12, 2.49s/it] +2025-02-06 01:42:29 - ERROR - stderr - +2025-02-06 01:42:29 - ERROR - stderr - +2025-02-06 01:42:29 - INFO - stdout - {'loss': 0.3733, 'grad_norm': 1.6165626049041748, 'learning_rate': 2.501945357789779e-06, 'epoch': 2.33} +2025-02-06 01:42:29 - ERROR - stderr - 78%|███████▊ | 17426/22434 [15:34:49<3:28:12, 2.49s/it] +2025-02-06 01:42:32 - ERROR - stderr - 78%|███████▊ 
| 17427/22434 [15:34:51<3:35:55, 2.59s/it] +2025-02-06 01:42:32 - ERROR - stderr - +2025-02-06 01:42:32 - ERROR - stderr - +2025-02-06 01:42:32 - INFO - stdout - {'loss': 0.3617, 'grad_norm': 1.730675458908081, 'learning_rate': 2.5009901689358763e-06, 'epoch': 2.33} +2025-02-06 01:42:32 - ERROR - stderr - 78%|███████▊ | 17427/22434 [15:34:52<3:35:55, 2.59s/it] +2025-02-06 01:42:34 - ERROR - stderr - 78%|███████▊ | 17428/22434 [15:34:54<3:34:35, 2.57s/it] +2025-02-06 01:42:34 - ERROR - stderr - +2025-02-06 01:42:34 - ERROR - stderr - +2025-02-06 01:42:34 - INFO - stdout - {'loss': 0.344, 'grad_norm': 1.538291573524475, 'learning_rate': 2.5000351363916564e-06, 'epoch': 2.33} +2025-02-06 01:42:34 - ERROR - stderr - 78%|███████▊ | 17428/22434 [15:34:54<3:34:35, 2.57s/it] +2025-02-06 01:42:37 - ERROR - stderr - 78%|███████▊ | 17429/22434 [15:34:57<3:34:21, 2.57s/it] +2025-02-06 01:42:37 - ERROR - stderr - +2025-02-06 01:42:37 - ERROR - stderr - +2025-02-06 01:42:37 - INFO - stdout - {'loss': 0.3671, 'grad_norm': 1.405634880065918, 'learning_rate': 2.499080260177028e-06, 'epoch': 2.33} +2025-02-06 01:42:37 - ERROR - stderr - 78%|███████▊ | 17429/22434 [15:34:57<3:34:21, 2.57s/it] +2025-02-06 01:42:39 - ERROR - stderr - 78%|███████▊ | 17430/22434 [15:34:59<3:31:23, 2.53s/it] +2025-02-06 01:42:39 - ERROR - stderr - +2025-02-06 01:42:39 - ERROR - stderr - +2025-02-06 01:42:39 - INFO - stdout - {'loss': 0.3292, 'grad_norm': 1.5386171340942383, 'learning_rate': 2.4981255403118942e-06, 'epoch': 2.33} +2025-02-06 01:42:39 - ERROR - stderr - 78%|███████▊ | 17430/22434 [15:34:59<3:31:23, 2.53s/it] +2025-02-06 01:42:42 - ERROR - stderr - 78%|███████▊ | 17431/22434 [15:35:02<3:34:29, 2.57s/it] +2025-02-06 01:42:42 - ERROR - stderr - +2025-02-06 01:42:42 - ERROR - stderr - +2025-02-06 01:42:42 - INFO - stdout - {'loss': 0.3713, 'grad_norm': 1.6686047315597534, 'learning_rate': 2.497170976816156e-06, 'epoch': 2.33} +2025-02-06 01:42:42 - ERROR - stderr - 78%|███████▊ | 17431/22434 
[15:35:02<3:34:29, 2.57s/it] +2025-02-06 01:42:44 - ERROR - stderr - 78%|███████▊ | 17432/22434 [15:35:04<3:33:17, 2.56s/it] +2025-02-06 01:42:44 - ERROR - stderr - +2025-02-06 01:42:44 - ERROR - stderr - +2025-02-06 01:42:44 - INFO - stdout - {'loss': 0.3855, 'grad_norm': 1.7270348072052002, 'learning_rate': 2.4962165697097075e-06, 'epoch': 2.33} +2025-02-06 01:42:44 - ERROR - stderr - 78%|███████▊ | 17432/22434 [15:35:04<3:33:17, 2.56s/it] +2025-02-06 01:42:47 - ERROR - stderr - 78%|███████▊ | 17433/22434 [15:35:07<3:31:32, 2.54s/it] +2025-02-06 01:42:47 - ERROR - stderr - +2025-02-06 01:42:47 - ERROR - stderr - +2025-02-06 01:42:47 - INFO - stdout - {'loss': 0.3959, 'grad_norm': 1.5836387872695923, 'learning_rate': 2.495262319012445e-06, 'epoch': 2.33} +2025-02-06 01:42:47 - ERROR - stderr - 78%|███████▊ | 17433/22434 [15:35:07<3:31:32, 2.54s/it] +2025-02-06 01:42:49 - ERROR - stderr - 78%|███████▊ | 17434/22434 [15:35:09<3:31:53, 2.54s/it] +2025-02-06 01:42:50 - ERROR - stderr - +2025-02-06 01:42:50 - ERROR - stderr - +2025-02-06 01:42:50 - INFO - stdout - {'loss': 0.3392, 'grad_norm': 1.5012779235839844, 'learning_rate': 2.4943082247442584e-06, 'epoch': 2.33} +2025-02-06 01:42:50 - ERROR - stderr - 78%|███████▊ | 17434/22434 [15:35:09<3:31:53, 2.54s/it] +2025-02-06 01:42:52 - ERROR - stderr - 78%|███████▊ | 17435/22434 [15:35:12<3:30:57, 2.53s/it] +2025-02-06 01:42:52 - ERROR - stderr - +2025-02-06 01:42:52 - ERROR - stderr - +2025-02-06 01:42:52 - INFO - stdout - {'loss': 0.4016, 'grad_norm': 1.649025321006775, 'learning_rate': 2.493354286925035e-06, 'epoch': 2.33} +2025-02-06 01:42:52 - ERROR - stderr - 78%|███████▊ | 17435/22434 [15:35:12<3:30:57, 2.53s/it] +2025-02-06 01:42:54 - ERROR - stderr - 78%|███████▊ | 17436/22434 [15:35:14<3:30:30, 2.53s/it] +2025-02-06 01:42:55 - ERROR - stderr - +2025-02-06 01:42:55 - ERROR - stderr - +2025-02-06 01:42:55 - INFO - stdout - {'loss': 0.3365, 'grad_norm': 1.3496390581130981, 'learning_rate': 2.4924005055746603e-06, 
'epoch': 2.33} +2025-02-06 01:42:55 - ERROR - stderr - 78%|███████▊ | 17436/22434 [15:35:14<3:30:30, 2.53s/it] +2025-02-06 01:42:57 - ERROR - stderr - 78%|███████▊ | 17437/22434 [15:35:17<3:32:45, 2.55s/it] +2025-02-06 01:42:57 - ERROR - stderr - +2025-02-06 01:42:57 - ERROR - stderr - +2025-02-06 01:42:57 - INFO - stdout - {'loss': 0.3335, 'grad_norm': 1.4612303972244263, 'learning_rate': 2.4914468807130076e-06, 'epoch': 2.33} +2025-02-06 01:42:57 - ERROR - stderr - 78%|███████▊ | 17437/22434 [15:35:17<3:32:45, 2.55s/it] +2025-02-06 01:43:00 - ERROR - stderr - 78%|███████▊ | 17438/22434 [15:35:20<3:40:11, 2.64s/it] +2025-02-06 01:43:00 - ERROR - stderr - +2025-02-06 01:43:00 - ERROR - stderr - +2025-02-06 01:43:00 - INFO - stdout - {'loss': 0.3241, 'grad_norm': 1.624683141708374, 'learning_rate': 2.4904934123599657e-06, 'epoch': 2.33} +2025-02-06 01:43:00 - ERROR - stderr - 78%|███████▊ | 17438/22434 [15:35:20<3:40:11, 2.64s/it] +2025-02-06 01:43:02 - ERROR - stderr - 78%|███████▊ | 17439/22434 [15:35:22<3:37:11, 2.61s/it] +2025-02-06 01:43:03 - ERROR - stderr - +2025-02-06 01:43:03 - ERROR - stderr - +2025-02-06 01:43:03 - INFO - stdout - {'loss': 0.3434, 'grad_norm': 1.5036067962646484, 'learning_rate': 2.489540100535397e-06, 'epoch': 2.33} +2025-02-06 01:43:03 - ERROR - stderr - 78%|███████▊ | 17439/22434 [15:35:22<3:37:11, 2.61s/it] +2025-02-06 01:43:05 - ERROR - stderr - 78%|███████▊ | 17440/22434 [15:35:25<3:33:45, 2.57s/it] +2025-02-06 01:43:05 - ERROR - stderr - +2025-02-06 01:43:05 - ERROR - stderr - +2025-02-06 01:43:05 - INFO - stdout - {'loss': 0.387, 'grad_norm': 1.6572527885437012, 'learning_rate': 2.4885869452591817e-06, 'epoch': 2.33} +2025-02-06 01:43:05 - ERROR - stderr - 78%|███████▊ | 17440/22434 [15:35:25<3:33:45, 2.57s/it] +2025-02-06 01:43:08 - ERROR - stderr - 78%|███████▊ | 17441/22434 [15:35:27<3:38:24, 2.62s/it] +2025-02-06 01:43:08 - ERROR - stderr - +2025-02-06 01:43:08 - ERROR - stderr - +2025-02-06 01:43:08 - INFO - stdout - {'loss': 
0.3879, 'grad_norm': 1.388824701309204, 'learning_rate': 2.4876339465511857e-06, 'epoch': 2.33} +2025-02-06 01:43:08 - ERROR - stderr - 78%|███████▊ | 17441/22434 [15:35:28<3:38:24, 2.62s/it] +2025-02-06 01:43:10 - ERROR - stderr - 78%|███████▊ | 17442/22434 [15:35:30<3:34:20, 2.58s/it] +2025-02-06 01:43:10 - ERROR - stderr - +2025-02-06 01:43:10 - ERROR - stderr - +2025-02-06 01:43:10 - INFO - stdout - {'loss': 0.3748, 'grad_norm': 1.5635526180267334, 'learning_rate': 2.4866811044312667e-06, 'epoch': 2.33} +2025-02-06 01:43:10 - ERROR - stderr - 78%|███████▊ | 17442/22434 [15:35:30<3:34:20, 2.58s/it] +2025-02-06 01:43:13 - ERROR - stderr - 78%|███████▊ | 17443/22434 [15:35:32<3:30:34, 2.53s/it] +2025-02-06 01:43:13 - ERROR - stderr - +2025-02-06 01:43:13 - ERROR - stderr - +2025-02-06 01:43:13 - INFO - stdout - {'loss': 0.4148, 'grad_norm': 1.5039589405059814, 'learning_rate': 2.4857284189192956e-06, 'epoch': 2.33} +2025-02-06 01:43:13 - ERROR - stderr - 78%|███████▊ | 17443/22434 [15:35:32<3:30:34, 2.53s/it] +2025-02-06 01:43:15 - ERROR - stderr - 78%|███████▊ | 17444/22434 [15:35:35<3:27:54, 2.50s/it] +2025-02-06 01:43:15 - ERROR - stderr - +2025-02-06 01:43:15 - ERROR - stderr - +2025-02-06 01:43:15 - INFO - stdout - {'loss': 0.3494, 'grad_norm': 1.4200513362884521, 'learning_rate': 2.4847758900351226e-06, 'epoch': 2.33} +2025-02-06 01:43:15 - ERROR - stderr - 78%|███████▊ | 17444/22434 [15:35:35<3:27:54, 2.50s/it] +2025-02-06 01:43:17 - ERROR - stderr - 78%|███████▊ | 17445/22434 [15:35:37<3:26:30, 2.48s/it] +2025-02-06 01:43:18 - ERROR - stderr - +2025-02-06 01:43:18 - ERROR - stderr - +2025-02-06 01:43:18 - INFO - stdout - {'loss': 0.3536, 'grad_norm': 1.5599287748336792, 'learning_rate': 2.4838235177986046e-06, 'epoch': 2.33} +2025-02-06 01:43:18 - ERROR - stderr - 78%|███████▊ | 17445/22434 [15:35:37<3:26:30, 2.48s/it] +2025-02-06 01:43:20 - ERROR - stderr - 78%|███████▊ | 17446/22434 [15:35:40<3:26:02, 2.48s/it] +2025-02-06 01:43:20 - ERROR - stderr - 
+2025-02-06 01:43:20 - ERROR - stderr - +2025-02-06 01:43:20 - INFO - stdout - {'loss': 0.3501, 'grad_norm': 1.4237825870513916, 'learning_rate': 2.4828713022295936e-06, 'epoch': 2.33} +2025-02-06 01:43:20 - ERROR - stderr - 78%|███████▊ | 17446/22434 [15:35:40<3:26:02, 2.48s/it] +2025-02-06 01:43:22 - ERROR - stderr - 78%|███████▊ | 17447/22434 [15:35:42<3:25:49, 2.48s/it] +2025-02-06 01:43:22 - ERROR - stderr - +2025-02-06 01:43:22 - ERROR - stderr - +2025-02-06 01:43:22 - INFO - stdout - {'loss': 0.3555, 'grad_norm': 1.455592155456543, 'learning_rate': 2.4819192433479344e-06, 'epoch': 2.33} +2025-02-06 01:43:22 - ERROR - stderr - 78%|███████▊ | 17447/22434 [15:35:42<3:25:49, 2.48s/it] +2025-02-06 01:43:25 - ERROR - stderr - 78%|███████▊ | 17448/22434 [15:35:45<3:28:40, 2.51s/it] +2025-02-06 01:43:25 - ERROR - stderr - +2025-02-06 01:43:25 - ERROR - stderr - +2025-02-06 01:43:25 - INFO - stdout - {'loss': 0.3917, 'grad_norm': 1.4499908685684204, 'learning_rate': 2.4809673411734805e-06, 'epoch': 2.33} +2025-02-06 01:43:25 - ERROR - stderr - 78%|███████▊ | 17448/22434 [15:35:45<3:28:40, 2.51s/it] +2025-02-06 01:43:27 - ERROR - stderr - 78%|███████▊ | 17449/22434 [15:35:47<3:26:45, 2.49s/it] +2025-02-06 01:43:28 - ERROR - stderr - +2025-02-06 01:43:28 - ERROR - stderr - +2025-02-06 01:43:28 - INFO - stdout - {'loss': 0.3358, 'grad_norm': 1.3951843976974487, 'learning_rate': 2.4800155957260643e-06, 'epoch': 2.33} +2025-02-06 01:43:28 - ERROR - stderr - 78%|███████▊ | 17449/22434 [15:35:47<3:26:45, 2.49s/it] +2025-02-06 01:43:30 - ERROR - stderr - 78%|███████▊ | 17450/22434 [15:35:50<3:27:29, 2.50s/it] +2025-02-06 01:43:30 - ERROR - stderr - +2025-02-06 01:43:30 - ERROR - stderr - +2025-02-06 01:43:30 - INFO - stdout - {'loss': 0.4297, 'grad_norm': 1.8905631303787231, 'learning_rate': 2.4790640070255267e-06, 'epoch': 2.33} +2025-02-06 01:43:30 - ERROR - stderr - 78%|███████▊ | 17450/22434 [15:35:50<3:27:29, 2.50s/it] +2025-02-06 01:43:32 - ERROR - stderr - 
78%|███████▊ | 17451/22434 [15:35:52<3:25:55, 2.48s/it] +2025-02-06 01:43:32 - ERROR - stderr - +2025-02-06 01:43:32 - ERROR - stderr - +2025-02-06 01:43:32 - INFO - stdout - {'loss': 0.3962, 'grad_norm': 1.5401147603988647, 'learning_rate': 2.4781125750917036e-06, 'epoch': 2.33} +2025-02-06 01:43:32 - ERROR - stderr - 78%|███████▊ | 17451/22434 [15:35:52<3:25:55, 2.48s/it] +2025-02-06 01:43:35 - ERROR - stderr - 78%|███████▊ | 17452/22434 [15:35:55<3:25:34, 2.48s/it] +2025-02-06 01:43:35 - ERROR - stderr - +2025-02-06 01:43:35 - ERROR - stderr - +2025-02-06 01:43:35 - INFO - stdout - {'loss': 0.3055, 'grad_norm': 1.3732661008834839, 'learning_rate': 2.477161299944426e-06, 'epoch': 2.33} +2025-02-06 01:43:35 - ERROR - stderr - 78%|███████▊ | 17452/22434 [15:35:55<3:25:34, 2.48s/it] +2025-02-06 01:43:37 - ERROR - stderr - 78%|███████▊ | 17453/22434 [15:35:57<3:25:34, 2.48s/it] +2025-02-06 01:43:37 - ERROR - stderr - +2025-02-06 01:43:37 - ERROR - stderr - +2025-02-06 01:43:37 - INFO - stdout - {'loss': 0.3277, 'grad_norm': 1.3883804082870483, 'learning_rate': 2.476210181603522e-06, 'epoch': 2.33} +2025-02-06 01:43:37 - ERROR - stderr - 78%|███████▊ | 17453/22434 [15:35:57<3:25:34, 2.48s/it] +2025-02-06 01:43:40 - ERROR - stderr - 78%|███████▊ | 17454/22434 [15:36:00<3:25:53, 2.48s/it] +2025-02-06 01:43:40 - ERROR - stderr - +2025-02-06 01:43:40 - ERROR - stderr - +2025-02-06 01:43:40 - INFO - stdout - {'loss': 0.4022, 'grad_norm': 1.5273683071136475, 'learning_rate': 2.4752592200888183e-06, 'epoch': 2.33} +2025-02-06 01:43:40 - ERROR - stderr - 78%|███████▊ | 17454/22434 [15:36:00<3:25:53, 2.48s/it] +2025-02-06 01:43:42 - ERROR - stderr - 78%|███████▊ | 17455/22434 [15:36:02<3:28:54, 2.52s/it] +2025-02-06 01:43:43 - ERROR - stderr - +2025-02-06 01:43:43 - ERROR - stderr - +2025-02-06 01:43:43 - INFO - stdout - {'loss': 0.3448, 'grad_norm': 1.4578170776367188, 'learning_rate': 2.474308415420136e-06, 'epoch': 2.33} +2025-02-06 01:43:43 - ERROR - stderr - 78%|███████▊ 
| 17455/22434 [15:36:02<3:28:54, 2.52s/it] +2025-02-06 01:43:45 - ERROR - stderr - 78%|███████▊ | 17456/22434 [15:36:05<3:28:25, 2.51s/it] +2025-02-06 01:43:45 - ERROR - stderr - +2025-02-06 01:43:45 - ERROR - stderr - +2025-02-06 01:43:45 - INFO - stdout - {'loss': 0.3651, 'grad_norm': 1.3809643983840942, 'learning_rate': 2.4733577676172927e-06, 'epoch': 2.33} +2025-02-06 01:43:45 - ERROR - stderr - 78%|███████▊ | 17456/22434 [15:36:05<3:28:25, 2.51s/it] +2025-02-06 01:43:48 - ERROR - stderr - 78%|███████▊ | 17457/22434 [15:36:07<3:29:33, 2.53s/it] +2025-02-06 01:43:48 - ERROR - stderr - +2025-02-06 01:43:48 - ERROR - stderr - +2025-02-06 01:43:48 - INFO - stdout - {'loss': 0.3591, 'grad_norm': 1.4942042827606201, 'learning_rate': 2.4724072767001074e-06, 'epoch': 2.33} +2025-02-06 01:43:48 - ERROR - stderr - 78%|███████▊ | 17457/22434 [15:36:07<3:29:33, 2.53s/it] +2025-02-06 01:43:50 - ERROR - stderr - 78%|███████▊ | 17458/22434 [15:36:10<3:29:04, 2.52s/it] +2025-02-06 01:43:50 - ERROR - stderr - +2025-02-06 01:43:50 - ERROR - stderr - +2025-02-06 01:43:50 - INFO - stdout - {'loss': 0.3398, 'grad_norm': 1.4241713285446167, 'learning_rate': 2.471456942688384e-06, 'epoch': 2.33} +2025-02-06 01:43:50 - ERROR - stderr - 78%|███████▊ | 17458/22434 [15:36:10<3:29:04, 2.52s/it] +2025-02-06 01:43:52 - ERROR - stderr - 78%|███████▊ | 17459/22434 [15:36:12<3:27:09, 2.50s/it] +2025-02-06 01:43:53 - ERROR - stderr - +2025-02-06 01:43:53 - ERROR - stderr - +2025-02-06 01:43:53 - INFO - stdout - {'loss': 0.3442, 'grad_norm': 1.5271642208099365, 'learning_rate': 2.4705067656019386e-06, 'epoch': 2.33} +2025-02-06 01:43:53 - ERROR - stderr - 78%|███████▊ | 17459/22434 [15:36:12<3:27:09, 2.50s/it] +2025-02-06 01:43:55 - ERROR - stderr - 78%|███████▊ | 17460/22434 [15:36:15<3:28:41, 2.52s/it] +2025-02-06 01:43:55 - ERROR - stderr - +2025-02-06 01:43:55 - ERROR - stderr - +2025-02-06 01:43:55 - INFO - stdout - {'loss': 0.3968, 'grad_norm': 1.6252468824386597, 'learning_rate': 
2.4695567454605785e-06, 'epoch': 2.33} +2025-02-06 01:43:55 - ERROR - stderr - 78%|███████▊ | 17460/22434 [15:36:15<3:28:41, 2.52s/it] +2025-02-06 01:43:58 - ERROR - stderr - 78%|███████▊ | 17461/22434 [15:36:18<3:38:34, 2.64s/it] +2025-02-06 01:43:58 - ERROR - stderr - +2025-02-06 01:43:58 - ERROR - stderr - +2025-02-06 01:43:58 - INFO - stdout - {'loss': 0.3754, 'grad_norm': 1.5473562479019165, 'learning_rate': 2.468606882284096e-06, 'epoch': 2.33} +2025-02-06 01:43:58 - ERROR - stderr - 78%|███████▊ | 17461/22434 [15:36:18<3:38:34, 2.64s/it] +2025-02-06 01:44:00 - ERROR - stderr - 78%|███████▊ | 17462/22434 [15:36:20<3:34:11, 2.58s/it] +2025-02-06 01:44:00 - ERROR - stderr - +2025-02-06 01:44:00 - ERROR - stderr - +2025-02-06 01:44:00 - INFO - stdout - {'loss': 0.3582, 'grad_norm': 1.491461157798767, 'learning_rate': 2.467657176092302e-06, 'epoch': 2.34} +2025-02-06 01:44:00 - ERROR - stderr - 78%|███████▊ | 17462/22434 [15:36:20<3:34:11, 2.58s/it] +2025-02-06 01:44:03 - ERROR - stderr - 78%|███████▊ | 17463/22434 [15:36:23<3:31:38, 2.55s/it] +2025-02-06 01:44:03 - ERROR - stderr - +2025-02-06 01:44:03 - ERROR - stderr - +2025-02-06 01:44:03 - INFO - stdout - {'loss': 0.3944, 'grad_norm': 1.4178305864334106, 'learning_rate': 2.4667076269049805e-06, 'epoch': 2.34} +2025-02-06 01:44:03 - ERROR - stderr - 78%|███████▊ | 17463/22434 [15:36:23<3:31:38, 2.55s/it] +2025-02-06 01:44:05 - ERROR - stderr - 78%|███████▊ | 17464/22434 [15:36:25<3:29:35, 2.53s/it] +2025-02-06 01:44:05 - ERROR - stderr - +2025-02-06 01:44:05 - ERROR - stderr - +2025-02-06 01:44:05 - INFO - stdout - {'loss': 0.3278, 'grad_norm': 1.3842869997024536, 'learning_rate': 2.465758234741936e-06, 'epoch': 2.34} +2025-02-06 01:44:05 - ERROR - stderr - 78%|███████▊ | 17464/22434 [15:36:25<3:29:35, 2.53s/it] +2025-02-06 01:44:08 - ERROR - stderr - 78%|███████▊ | 17465/22434 [15:36:28<3:28:39, 2.52s/it] +2025-02-06 01:44:08 - ERROR - stderr - +2025-02-06 01:44:08 - ERROR - stderr - +2025-02-06 01:44:08 -
INFO - stdout - {'loss': 0.4049, 'grad_norm': 1.6831704378128052, 'learning_rate': 2.4648089996229485e-06, 'epoch': 2.34} +2025-02-06 01:44:08 - ERROR - stderr - 78%|███████▊ | 17465/22434 [15:36:28<3:28:39, 2.52s/it] +2025-02-06 01:44:10 - ERROR - stderr - 78%|███████▊ | 17466/22434 [15:36:30<3:27:46, 2.51s/it] +2025-02-06 01:44:10 - ERROR - stderr - +2025-02-06 01:44:10 - ERROR - stderr - +2025-02-06 01:44:10 - INFO - stdout - {'loss': 0.3538, 'grad_norm': 1.4243803024291992, 'learning_rate': 2.463859921567805e-06, 'epoch': 2.34} +2025-02-06 01:44:10 - ERROR - stderr - 78%|███████▊ | 17466/22434 [15:36:30<3:27:46, 2.51s/it] +2025-02-06 01:44:13 - ERROR - stderr - 78%|███████▊ | 17467/22434 [15:36:33<3:29:51, 2.54s/it] +2025-02-06 01:44:13 - ERROR - stderr - +2025-02-06 01:44:13 - ERROR - stderr - +2025-02-06 01:44:13 - INFO - stdout - {'loss': 0.3754, 'grad_norm': 1.5287054777145386, 'learning_rate': 2.4629110005962954e-06, 'epoch': 2.34} +2025-02-06 01:44:13 - ERROR - stderr - 78%|███████▊ | 17467/22434 [15:36:33<3:29:51, 2.54s/it] +2025-02-06 01:44:16 - ERROR - stderr - 78%|███████▊ | 17468/22434 [15:36:35<3:35:56, 2.61s/it] +2025-02-06 01:44:16 - ERROR - stderr - +2025-02-06 01:44:16 - ERROR - stderr - +2025-02-06 01:44:16 - INFO - stdout - {'loss': 0.3491, 'grad_norm': 1.5389456748962402, 'learning_rate': 2.4619622367281905e-06, 'epoch': 2.34} +2025-02-06 01:44:16 - ERROR - stderr - 78%|███████▊ | 17468/22434 [15:36:36<3:35:56, 2.61s/it] +2025-02-06 01:44:18 - ERROR - stderr - 78%|███████▊ | 17469/22434 [15:36:38<3:31:54, 2.56s/it] +2025-02-06 01:44:18 - ERROR - stderr - +2025-02-06 01:44:18 - ERROR - stderr - +2025-02-06 01:44:18 - INFO - stdout - {'loss': 0.3921, 'grad_norm': 1.6213796138763428, 'learning_rate': 2.4610136299832697e-06, 'epoch': 2.34} +2025-02-06 01:44:18 - ERROR - stderr - 78%|███████▊ | 17469/22434 [15:36:38<3:31:54, 2.56s/it] +2025-02-06 01:44:21 - ERROR - stderr - 78%|███████▊ | 17470/22434 [15:36:40<3:31:03, 2.55s/it] +2025-02-06 
01:44:21 - ERROR - stderr - +2025-02-06 01:44:21 - ERROR - stderr - +2025-02-06 01:44:21 - INFO - stdout - {'loss': 0.3447, 'grad_norm': 1.5653772354125977, 'learning_rate': 2.4600651803813057e-06, 'epoch': 2.34} +2025-02-06 01:44:21 - ERROR - stderr - 78%|███████▊ | 17470/22434 [15:36:41<3:31:03, 2.55s/it] +2025-02-06 01:44:23 - ERROR - stderr - 78%|███████▊ | 17471/22434 [15:36:43<3:30:23, 2.54s/it] +2025-02-06 01:44:23 - ERROR - stderr - +2025-02-06 01:44:23 - ERROR - stderr - +2025-02-06 01:44:23 - INFO - stdout - {'loss': 0.3518, 'grad_norm': 1.449273705482483, 'learning_rate': 2.459116887942069e-06, 'epoch': 2.34} +2025-02-06 01:44:23 - ERROR - stderr - 78%|███████▊ | 17471/22434 [15:36:43<3:30:23, 2.54s/it] +2025-02-06 01:44:26 - ERROR - stderr - 78%|███████▊ | 17472/22434 [15:36:46<3:34:22, 2.59s/it] +2025-02-06 01:44:26 - ERROR - stderr - +2025-02-06 01:44:26 - ERROR - stderr - +2025-02-06 01:44:26 - INFO - stdout - {'loss': 0.3961, 'grad_norm': 1.8648433685302734, 'learning_rate': 2.4581687526853235e-06, 'epoch': 2.34} +2025-02-06 01:44:26 - ERROR - stderr - 78%|███████▊ | 17472/22434 [15:36:46<3:34:22, 2.59s/it] +2025-02-06 01:44:28 - ERROR - stderr - 78%|███████▊ | 17473/22434 [15:36:48<3:30:34, 2.55s/it] +2025-02-06 01:44:28 - ERROR - stderr - +2025-02-06 01:44:28 - ERROR - stderr - +2025-02-06 01:44:28 - INFO - stdout - {'loss': 0.3725, 'grad_norm': 1.5202324390411377, 'learning_rate': 2.457220774630835e-06, 'epoch': 2.34} +2025-02-06 01:44:28 - ERROR - stderr - 78%|███████▊ | 17473/22434 [15:36:48<3:30:34, 2.55s/it] +2025-02-06 01:44:31 - ERROR - stderr - 78%|███████▊ | 17474/22434 [15:36:51<3:27:44, 2.51s/it] +2025-02-06 01:44:31 - ERROR - stderr - +2025-02-06 01:44:31 - ERROR - stderr - +2025-02-06 01:44:31 - INFO - stdout - {'loss': 0.3641, 'grad_norm': 1.4705729484558105, 'learning_rate': 2.456272953798361e-06, 'epoch': 2.34} +2025-02-06 01:44:31 - ERROR - stderr - 78%|███████▊ | 17474/22434 [15:36:51<3:27:44, 2.51s/it] +2025-02-06 01:44:33 - 
ERROR - stderr - 78%|███████▊ | 17475/22434 [15:36:53<3:26:24, 2.50s/it] +2025-02-06 01:44:33 - ERROR - stderr - +2025-02-06 01:44:33 - ERROR - stderr - +2025-02-06 01:44:33 - INFO - stdout - {'loss': 0.3726, 'grad_norm': 1.4698258638381958, 'learning_rate': 2.4553252902076595e-06, 'epoch': 2.34} +2025-02-06 01:44:33 - ERROR - stderr - 78%|███████▊ | 17475/22434 [15:36:53<3:26:24, 2.50s/it] +2025-02-06 01:44:36 - ERROR - stderr - 78%|███████▊ | 17476/22434 [15:36:55<3:25:20, 2.48s/it] +2025-02-06 01:44:36 - ERROR - stderr - +2025-02-06 01:44:36 - ERROR - stderr - +2025-02-06 01:44:36 - INFO - stdout - {'loss': 0.4108, 'grad_norm': 1.645731806755066, 'learning_rate': 2.4543777838784855e-06, 'epoch': 2.34} +2025-02-06 01:44:36 - ERROR - stderr - 78%|███████▊ | 17476/22434 [15:36:56<3:25:20, 2.48s/it] +2025-02-06 01:44:38 - ERROR - stderr - 78%|███████▊ | 17477/22434 [15:36:58<3:24:16, 2.47s/it] +2025-02-06 01:44:38 - ERROR - stderr - +2025-02-06 01:44:38 - ERROR - stderr - +2025-02-06 01:44:38 - INFO - stdout - {'loss': 0.3869, 'grad_norm': 1.755331039428711, 'learning_rate': 2.4534304348305795e-06, 'epoch': 2.34} +2025-02-06 01:44:38 - ERROR - stderr - 78%|███████▊ | 17477/22434 [15:36:58<3:24:16, 2.47s/it] +2025-02-06 01:44:41 - ERROR - stderr - 78%|███████▊ | 17478/22434 [15:37:00<3:24:13, 2.47s/it] +2025-02-06 01:44:41 - ERROR - stderr - +2025-02-06 01:44:41 - ERROR - stderr - +2025-02-06 01:44:41 - INFO - stdout - {'loss': 0.4211, 'grad_norm': 1.6362115144729614, 'learning_rate': 2.452483243083699e-06, 'epoch': 2.34} +2025-02-06 01:44:41 - ERROR - stderr - 78%|███████▊ | 17478/22434 [15:37:00<3:24:13, 2.47s/it] +2025-02-06 01:44:43 - ERROR - stderr - 78%|███████▊ | 17479/22434 [15:37:03<3:24:30, 2.48s/it] +2025-02-06 01:44:43 - ERROR - stderr - +2025-02-06 01:44:43 - ERROR - stderr - +2025-02-06 01:44:43 - INFO - stdout - {'loss': 0.41, 'grad_norm': 1.6000616550445557, 'learning_rate': 2.4515362086575824e-06, 'epoch': 2.34} +2025-02-06 01:44:43 - ERROR - stderr 
- 78%|███████▊ | 17479/22434 [15:37:03<3:24:30, 2.48s/it] +2025-02-06 01:44:46 - ERROR - stderr - 78%|███████▊ | 17480/22434 [15:37:05<3:25:46, 2.49s/it] +2025-02-06 01:44:46 - ERROR - stderr - +2025-02-06 01:44:46 - ERROR - stderr - +2025-02-06 01:44:46 - INFO - stdout - {'loss': 0.3665, 'grad_norm': 1.4886176586151123, 'learning_rate': 2.45058933157197e-06, 'epoch': 2.34} +2025-02-06 01:44:46 - ERROR - stderr - 78%|███████▊ | 17480/22434 [15:37:05<3:25:46, 2.49s/it] +2025-02-06 01:44:48 - ERROR - stderr - 78%|███████▊ | 17481/22434 [15:37:08<3:24:03, 2.47s/it] +2025-02-06 01:44:48 - ERROR - stderr - +2025-02-06 01:44:48 - ERROR - stderr - +2025-02-06 01:44:48 - INFO - stdout - {'loss': 0.3826, 'grad_norm': 1.780479907989502, 'learning_rate': 2.449642611846602e-06, 'epoch': 2.34} +2025-02-06 01:44:48 - ERROR - stderr - 78%|███████▊ | 17481/22434 [15:37:08<3:24:03, 2.47s/it] +2025-02-06 01:44:51 - ERROR - stderr - 78%|███████▊ | 17482/22434 [15:37:11<3:32:41, 2.58s/it] +2025-02-06 01:44:51 - ERROR - stderr - +2025-02-06 01:44:51 - ERROR - stderr - +2025-02-06 01:44:51 - INFO - stdout - {'loss': 0.352, 'grad_norm': 1.5574089288711548, 'learning_rate': 2.4486960495012037e-06, 'epoch': 2.34} +2025-02-06 01:44:51 - ERROR - stderr - 78%|███████▊ | 17482/22434 [15:37:11<3:32:41, 2.58s/it] +2025-02-06 01:44:53 - ERROR - stderr - 78%|███████▊ | 17483/22434 [15:37:13<3:29:51, 2.54s/it] +2025-02-06 01:44:53 - ERROR - stderr - +2025-02-06 01:44:53 - ERROR - stderr - +2025-02-06 01:44:53 - INFO - stdout - {'loss': 0.4245, 'grad_norm': 1.6260024309158325, 'learning_rate': 2.447749644555516e-06, 'epoch': 2.34} +2025-02-06 01:44:53 - ERROR - stderr - 78%|███████▊ | 17483/22434 [15:37:13<3:29:51, 2.54s/it] +2025-02-06 01:44:56 - ERROR - stderr - 78%|███████▊ | 17484/22434 [15:37:16<3:27:52, 2.52s/it] +2025-02-06 01:44:56 - ERROR - stderr - +2025-02-06 01:44:56 - ERROR - stderr - +2025-02-06 01:44:56 - INFO - stdout - {'loss': 0.3281, 'grad_norm': 1.5757054090499878, 
'learning_rate': 2.446803397029257e-06, 'epoch': 2.34} +2025-02-06 01:44:56 - ERROR - stderr - 78%|███████▊ | 17484/22434 [15:37:16<3:27:52, 2.52s/it] +2025-02-06 01:44:58 - ERROR - stderr - 78%|███████▊ | 17485/22434 [15:37:18<3:27:31, 2.52s/it] +2025-02-06 01:44:58 - ERROR - stderr - +2025-02-06 01:44:58 - ERROR - stderr - +2025-02-06 01:44:58 - INFO - stdout - {'loss': 0.3612, 'grad_norm': 1.5461342334747314, 'learning_rate': 2.445857306942151e-06, 'epoch': 2.34} +2025-02-06 01:44:58 - ERROR - stderr - 78%|███████▊ | 17485/22434 [15:37:18<3:27:31, 2.52s/it] +2025-02-06 01:45:01 - ERROR - stderr - 78%|███████▊ | 17486/22434 [15:37:21<3:27:31, 2.52s/it] +2025-02-06 01:45:01 - ERROR - stderr - +2025-02-06 01:45:01 - ERROR - stderr - +2025-02-06 01:45:01 - INFO - stdout - {'loss': 0.3776, 'grad_norm': 1.5894041061401367, 'learning_rate': 2.444911374313926e-06, 'epoch': 2.34} +2025-02-06 01:45:01 - ERROR - stderr - 78%|███████▊ | 17486/22434 [15:37:21<3:27:31, 2.52s/it] +2025-02-06 01:45:04 - ERROR - stderr - 78%|███████▊ | 17487/22434 [15:37:23<3:30:57, 2.56s/it] +2025-02-06 01:45:04 - ERROR - stderr - +2025-02-06 01:45:04 - ERROR - stderr - +2025-02-06 01:45:04 - INFO - stdout - {'loss': 0.3763, 'grad_norm': 1.483266830444336, 'learning_rate': 2.4439655991642897e-06, 'epoch': 2.34} +2025-02-06 01:45:04 - ERROR - stderr - 78%|███████▊ | 17487/22434 [15:37:23<3:30:57, 2.56s/it] +2025-02-06 01:45:06 - ERROR - stderr - 78%|███████▊ | 17488/22434 [15:37:26<3:28:59, 2.54s/it] +2025-02-06 01:45:06 - ERROR - stderr - +2025-02-06 01:45:06 - ERROR - stderr - +2025-02-06 01:45:06 - INFO - stdout - {'loss': 0.4308, 'grad_norm': 1.5486618280410767, 'learning_rate': 2.443019981512964e-06, 'epoch': 2.34} +2025-02-06 01:45:06 - ERROR - stderr - 78%|███████▊ | 17488/22434 [15:37:26<3:28:59, 2.54s/it] +2025-02-06 01:45:08 - ERROR - stderr - 78%|███████▊ | 17489/22434 [15:37:28<3:27:32, 2.52s/it] +2025-02-06 01:45:09 - ERROR - stderr - +2025-02-06 01:45:09 - ERROR - stderr - 
+2025-02-06 01:45:09 - INFO - stdout - {'loss': 0.3087, 'grad_norm': 1.3916655778884888, 'learning_rate': 2.442074521379654e-06, 'epoch': 2.34} +2025-02-06 01:45:09 - ERROR - stderr - 78%|███████▊ | 17489/22434 [15:37:28<3:27:32, 2.52s/it] +2025-02-06 01:45:11 - ERROR - stderr - 78%|███████▊ | 17490/22434 [15:37:31<3:30:44, 2.56s/it] +2025-02-06 01:45:11 - ERROR - stderr - +2025-02-06 01:45:11 - ERROR - stderr - +2025-02-06 01:45:11 - INFO - stdout - {'loss': 0.4419, 'grad_norm': 1.835715889930725, 'learning_rate': 2.4411292187840685e-06, 'epoch': 2.34} +2025-02-06 01:45:11 - ERROR - stderr - 78%|███████▊ | 17490/22434 [15:37:31<3:30:44, 2.56s/it] +2025-02-06 01:45:14 - ERROR - stderr - 78%|███████▊ | 17491/22434 [15:37:33<3:28:02, 2.53s/it] +2025-02-06 01:45:14 - ERROR - stderr - +2025-02-06 01:45:14 - ERROR - stderr - +2025-02-06 01:45:14 - INFO - stdout - {'loss': 0.3871, 'grad_norm': 1.6722612380981445, 'learning_rate': 2.4401840737459104e-06, 'epoch': 2.34} +2025-02-06 01:45:14 - ERROR - stderr - 78%|███████▊ | 17491/22434 [15:37:33<3:28:02, 2.53s/it] +2025-02-06 01:45:16 - ERROR - stderr - 78%|███████▊ | 17492/22434 [15:37:36<3:28:33, 2.53s/it] +2025-02-06 01:45:16 - ERROR - stderr - +2025-02-06 01:45:16 - ERROR - stderr - +2025-02-06 01:45:16 - INFO - stdout - {'loss': 0.3671, 'grad_norm': 1.677459478378296, 'learning_rate': 2.4392390862848826e-06, 'epoch': 2.34} +2025-02-06 01:45:16 - ERROR - stderr - 78%|███████▊ | 17492/22434 [15:37:36<3:28:33, 2.53s/it] +2025-02-06 01:45:19 - ERROR - stderr - 78%|███████▊ | 17493/22434 [15:37:38<3:25:49, 2.50s/it] +2025-02-06 01:45:19 - ERROR - stderr - +2025-02-06 01:45:19 - ERROR - stderr - +2025-02-06 01:45:19 - INFO - stdout - {'loss': 0.4144, 'grad_norm': 1.6830699443817139, 'learning_rate': 2.43829425642068e-06, 'epoch': 2.34} +2025-02-06 01:45:19 - ERROR - stderr - 78%|███████▊ | 17493/22434 [15:37:38<3:25:49, 2.50s/it] +2025-02-06 01:45:21 - ERROR - stderr - 78%|███████▊ | 17494/22434 [15:37:41<3:30:27, 2.56s/it] 
+2025-02-06 01:45:21 - ERROR - stderr - +2025-02-06 01:45:21 - ERROR - stderr - +2025-02-06 01:45:21 - INFO - stdout - {'loss': 0.3736, 'grad_norm': 1.5476511716842651, 'learning_rate': 2.4373495841729987e-06, 'epoch': 2.34} +2025-02-06 01:45:21 - ERROR - stderr - 78%|███████▊ | 17494/22434 [15:37:41<3:30:27, 2.56s/it] +2025-02-06 01:45:24 - ERROR - stderr - 78%|███████▊ | 17495/22434 [15:37:44<3:30:51, 2.56s/it] +2025-02-06 01:45:24 - ERROR - stderr - +2025-02-06 01:45:24 - ERROR - stderr - +2025-02-06 01:45:24 - INFO - stdout - {'loss': 0.3506, 'grad_norm': 1.6017037630081177, 'learning_rate': 2.4364050695615284e-06, 'epoch': 2.34} +2025-02-06 01:45:24 - ERROR - stderr - 78%|███████▊ | 17495/22434 [15:37:44<3:30:51, 2.56s/it] +2025-02-06 01:45:26 - ERROR - stderr - 78%|███████▊ | 17496/22434 [15:37:46<3:29:25, 2.54s/it] +2025-02-06 01:45:26 - ERROR - stderr - +2025-02-06 01:45:26 - ERROR - stderr - +2025-02-06 01:45:26 - INFO - stdout - {'loss': 0.3494, 'grad_norm': 1.462943434715271, 'learning_rate': 2.435460712605956e-06, 'epoch': 2.34} +2025-02-06 01:45:26 - ERROR - stderr - 78%|███████▊ | 17496/22434 [15:37:46<3:29:25, 2.54s/it] +2025-02-06 01:45:29 - ERROR - stderr - 78%|███████▊ | 17497/22434 [15:37:49<3:28:23, 2.53s/it] +2025-02-06 01:45:29 - ERROR - stderr - +2025-02-06 01:45:29 - ERROR - stderr - +2025-02-06 01:45:29 - INFO - stdout - {'loss': 0.4114, 'grad_norm': 1.6367404460906982, 'learning_rate': 2.4345165133259673e-06, 'epoch': 2.34} +2025-02-06 01:45:29 - ERROR - stderr - 78%|███████▊ | 17497/22434 [15:37:49<3:28:23, 2.53s/it] +2025-02-06 01:45:31 - ERROR - stderr - 78%|███████▊ | 17498/22434 [15:37:51<3:26:40, 2.51s/it] +2025-02-06 01:45:31 - ERROR - stderr - +2025-02-06 01:45:31 - ERROR - stderr - +2025-02-06 01:45:31 - INFO - stdout - {'loss': 0.3429, 'grad_norm': 1.4398554563522339, 'learning_rate': 2.4335724717412433e-06, 'epoch': 2.34} +2025-02-06 01:45:31 - ERROR - stderr - 78%|███████▊ | 17498/22434 [15:37:51<3:26:40, 2.51s/it] +2025-02-06 
01:45:34 - ERROR - stderr - 78%|███████▊ | 17499/22434 [15:37:54<3:26:20, 2.51s/it] +2025-02-06 01:45:34 - ERROR - stderr - +2025-02-06 01:45:34 - ERROR - stderr - +2025-02-06 01:45:34 - INFO - stdout - {'loss': 0.3271, 'grad_norm': 1.4657833576202393, 'learning_rate': 2.4326285878714595e-06, 'epoch': 2.34} +2025-02-06 01:45:34 - ERROR - stderr - 78%|███████▊ | 17499/22434 [15:37:54<3:26:20, 2.51s/it] +2025-02-06 01:45:36 - ERROR - stderr - 78%|███████▊ | 17500/22434 [15:37:56<3:26:52, 2.52s/it] +2025-02-06 01:45:36 - ERROR - stderr - +2025-02-06 01:45:36 - ERROR - stderr - +2025-02-06 01:45:36 - INFO - stdout - {'loss': 0.3376, 'grad_norm': 1.2831934690475464, 'learning_rate': 2.4316848617362952e-06, 'epoch': 2.34} +2025-02-06 01:45:36 - ERROR - stderr - 78%|███████▊ | 17500/22434 [15:37:56<3:26:52, 2.52s/it] +2025-02-06 01:45:39 - ERROR - stderr - 78%|███████▊ | 17501/22434 [15:37:59<3:26:17, 2.51s/it] +2025-02-06 01:45:39 - ERROR - stderr - +2025-02-06 01:45:39 - ERROR - stderr - +2025-02-06 01:45:39 - INFO - stdout - {'loss': 0.3608, 'grad_norm': 1.6171060800552368, 'learning_rate': 2.430741293355412e-06, 'epoch': 2.34} +2025-02-06 01:45:39 - ERROR - stderr - 78%|███████▊ | 17501/22434 [15:37:59<3:26:17, 2.51s/it] +2025-02-06 01:45:41 - ERROR - stderr - 78%|███████▊ | 17502/22434 [15:38:01<3:27:03, 2.52s/it] +2025-02-06 01:45:41 - ERROR - stderr - +2025-02-06 01:45:41 - ERROR - stderr - +2025-02-06 01:45:41 - INFO - stdout - {'loss': 0.4112, 'grad_norm': 1.6490013599395752, 'learning_rate': 2.4297978827484893e-06, 'epoch': 2.34} +2025-02-06 01:45:41 - ERROR - stderr - 78%|███████▊ | 17502/22434 [15:38:01<3:27:03, 2.52s/it] +2025-02-06 01:45:44 - ERROR - stderr - 78%|███████▊ | 17503/22434 [15:38:04<3:32:39, 2.59s/it] +2025-02-06 01:45:44 - ERROR - stderr - +2025-02-06 01:45:44 - ERROR - stderr - +2025-02-06 01:45:44 - INFO - stdout - {'loss': 0.3597, 'grad_norm': 1.5419689416885376, 'learning_rate': 2.42885462993518e-06, 'epoch': 2.34} +2025-02-06 01:45:44 - 
ERROR - stderr - 78%|███████▊ | 17503/22434 [15:38:04<3:32:39, 2.59s/it] +2025-02-06 01:45:47 - ERROR - stderr - 78%|███████▊ | 17504/22434 [15:38:06<3:30:51, 2.57s/it] +2025-02-06 01:45:47 - ERROR - stderr - +2025-02-06 01:45:47 - ERROR - stderr - +2025-02-06 01:45:47 - INFO - stdout - {'loss': 0.3592, 'grad_norm': 1.4847825765609741, 'learning_rate': 2.4279115349351546e-06, 'epoch': 2.34} +2025-02-06 01:45:47 - ERROR - stderr - 78%|███████▊ | 17504/22434 [15:38:06<3:30:51, 2.57s/it] +2025-02-06 01:45:49 - ERROR - stderr - 78%|███████▊ | 17505/22434 [15:38:09<3:31:20, 2.57s/it] +2025-02-06 01:45:49 - ERROR - stderr - +2025-02-06 01:45:49 - ERROR - stderr - +2025-02-06 01:45:49 - INFO - stdout - {'loss': 0.3704, 'grad_norm': 1.53132164478302, 'learning_rate': 2.426968597768069e-06, 'epoch': 2.34} +2025-02-06 01:45:49 - ERROR - stderr - 78%|███████▊ | 17505/22434 [15:38:09<3:31:20, 2.57s/it] +2025-02-06 01:45:52 - ERROR - stderr - 78%|███████▊ | 17506/22434 [15:38:11<3:26:49, 2.52s/it] +2025-02-06 01:45:52 - ERROR - stderr - +2025-02-06 01:45:52 - ERROR - stderr - +2025-02-06 01:45:52 - INFO - stdout - {'loss': 0.3668, 'grad_norm': 1.7561753988265991, 'learning_rate': 2.426025818453572e-06, 'epoch': 2.34} +2025-02-06 01:45:52 - ERROR - stderr - 78%|███████▊ | 17506/22434 [15:38:11<3:26:49, 2.52s/it] +2025-02-06 01:45:54 - ERROR - stderr - 78%|███████▊ | 17507/22434 [15:38:14<3:27:18, 2.52s/it] +2025-02-06 01:45:54 - ERROR - stderr - +2025-02-06 01:45:54 - ERROR - stderr - +2025-02-06 01:45:54 - INFO - stdout - {'loss': 0.385, 'grad_norm': 1.456761121749878, 'learning_rate': 2.425083197011324e-06, 'epoch': 2.34} +2025-02-06 01:45:54 - ERROR - stderr - 78%|███████▊ | 17507/22434 [15:38:14<3:27:18, 2.52s/it] +2025-02-06 01:45:57 - ERROR - stderr - 78%|███████▊ | 17508/22434 [15:38:16<3:24:40, 2.49s/it] +2025-02-06 01:45:57 - ERROR - stderr - +2025-02-06 01:45:57 - ERROR - stderr - +2025-02-06 01:45:57 - INFO - stdout - {'loss': 0.4037, 'grad_norm': 1.5619450807571411, 
'learning_rate': 2.4241407334609634e-06, 'epoch': 2.34} +2025-02-06 01:45:57 - ERROR - stderr - 78%|███████▊ | 17508/22434 [15:38:16<3:24:40, 2.49s/it] +2025-02-06 01:45:59 - ERROR - stderr - 78%|███████▊ | 17509/22434 [15:38:19<3:23:04, 2.47s/it] +2025-02-06 01:45:59 - ERROR - stderr - +2025-02-06 01:45:59 - ERROR - stderr - +2025-02-06 01:45:59 - INFO - stdout - {'loss': 0.3403, 'grad_norm': 1.4734280109405518, 'learning_rate': 2.4231984278221453e-06, 'epoch': 2.34} +2025-02-06 01:45:59 - ERROR - stderr - 78%|███████▊ | 17509/22434 [15:38:19<3:23:04, 2.47s/it] +2025-02-06 01:46:01 - ERROR - stderr - 78%|███████▊ | 17510/22434 [15:38:21<3:21:56, 2.46s/it] +2025-02-06 01:46:01 - ERROR - stderr - +2025-02-06 01:46:01 - ERROR - stderr - +2025-02-06 01:46:01 - INFO - stdout - {'loss': 0.3531, 'grad_norm': 1.5037074089050293, 'learning_rate': 2.4222562801145035e-06, 'epoch': 2.34} +2025-02-06 01:46:01 - ERROR - stderr - 78%|███████▊ | 17510/22434 [15:38:21<3:21:56, 2.46s/it] +2025-02-06 01:46:04 - ERROR - stderr - 78%|███████▊ | 17511/22434 [15:38:24<3:23:01, 2.47s/it] +2025-02-06 01:46:04 - ERROR - stderr - +2025-02-06 01:46:04 - ERROR - stderr - +2025-02-06 01:46:04 - INFO - stdout - {'loss': 0.3782, 'grad_norm': 1.4884778261184692, 'learning_rate': 2.421314290357675e-06, 'epoch': 2.34} +2025-02-06 01:46:04 - ERROR - stderr - 78%|███████▊ | 17511/22434 [15:38:24<3:23:01, 2.47s/it] +2025-02-06 01:46:06 - ERROR - stderr - 78%|███████▊ | 17512/22434 [15:38:26<3:22:21, 2.47s/it] +2025-02-06 01:46:06 - ERROR - stderr - +2025-02-06 01:46:06 - ERROR - stderr - +2025-02-06 01:46:06 - INFO - stdout - {'loss': 0.3674, 'grad_norm': 1.478894829750061, 'learning_rate': 2.420372458571304e-06, 'epoch': 2.34} +2025-02-06 01:46:06 - ERROR - stderr - 78%|███████▊ | 17512/22434 [15:38:26<3:22:21, 2.47s/it] +2025-02-06 01:46:09 - ERROR - stderr - 78%|███████▊ | 17513/22434 [15:38:29<3:21:47, 2.46s/it] +2025-02-06 01:46:09 - ERROR - stderr - +2025-02-06 01:46:09 - ERROR - stderr - 
+2025-02-06 01:46:09 - INFO - stdout - {'loss': 0.3232, 'grad_norm': 1.4484221935272217, 'learning_rate': 2.419430784775013e-06, 'epoch': 2.34} +2025-02-06 01:46:09 - ERROR - stderr - 78%|███████▊ | 17513/22434 [15:38:29<3:21:47, 2.46s/it] +2025-02-06 01:46:11 - ERROR - stderr - 78%|███████▊ | 17514/22434 [15:38:31<3:22:42, 2.47s/it] +2025-02-06 01:46:11 - ERROR - stderr - +2025-02-06 01:46:11 - ERROR - stderr - +2025-02-06 01:46:11 - INFO - stdout - {'loss': 0.3663, 'grad_norm': 1.373468279838562, 'learning_rate': 2.418489268988433e-06, 'epoch': 2.34} +2025-02-06 01:46:11 - ERROR - stderr - 78%|███████▊ | 17514/22434 [15:38:31<3:22:42, 2.47s/it] +2025-02-06 01:46:14 - ERROR - stderr - 78%|███████▊ | 17515/22434 [15:38:33<3:21:11, 2.45s/it] +2025-02-06 01:46:14 - ERROR - stderr - +2025-02-06 01:46:14 - ERROR - stderr - +2025-02-06 01:46:14 - INFO - stdout - {'loss': 0.37, 'grad_norm': 1.6089197397232056, 'learning_rate': 2.4175479112311904e-06, 'epoch': 2.34} +2025-02-06 01:46:14 - ERROR - stderr - 78%|███████▊ | 17515/22434 [15:38:34<3:21:11, 2.45s/it] +2025-02-06 01:46:16 - ERROR - stderr - 78%|███████▊ | 17516/22434 [15:38:36<3:21:05, 2.45s/it] +2025-02-06 01:46:16 - ERROR - stderr - +2025-02-06 01:46:16 - ERROR - stderr - +2025-02-06 01:46:16 - INFO - stdout - {'loss': 0.3965, 'grad_norm': 1.5382609367370605, 'learning_rate': 2.4166067115229062e-06, 'epoch': 2.34} +2025-02-06 01:46:16 - ERROR - stderr - 78%|███████▊ | 17516/22434 [15:38:36<3:21:05, 2.45s/it] +2025-02-06 01:46:19 - ERROR - stderr - 78%|███████▊ | 17517/22434 [15:38:38<3:22:22, 2.47s/it] +2025-02-06 01:46:19 - ERROR - stderr - +2025-02-06 01:46:19 - ERROR - stderr - +2025-02-06 01:46:19 - INFO - stdout - {'loss': 0.365, 'grad_norm': 1.325708031654358, 'learning_rate': 2.415665669883198e-06, 'epoch': 2.34} +2025-02-06 01:46:19 - ERROR - stderr - 78%|███████▊ | 17517/22434 [15:38:38<3:22:22, 2.47s/it] +2025-02-06 01:46:21 - ERROR - stderr - 78%|███████▊ | 17518/22434 [15:38:41<3:27:34, 2.53s/it] 
+2025-02-06 01:46:21 - ERROR - stderr - +2025-02-06 01:46:21 - ERROR - stderr - +2025-02-06 01:46:21 - INFO - stdout - {'loss': 0.3673, 'grad_norm': 1.5110138654708862, 'learning_rate': 2.4147247863316814e-06, 'epoch': 2.34} +2025-02-06 01:46:21 - ERROR - stderr - 78%|███████▊ | 17518/22434 [15:38:41<3:27:34, 2.53s/it] +2025-02-06 01:46:24 - ERROR - stderr - 78%|███████▊ | 17519/22434 [15:38:44<3:26:25, 2.52s/it] +2025-02-06 01:46:24 - ERROR - stderr - +2025-02-06 01:46:24 - ERROR - stderr - +2025-02-06 01:46:24 - INFO - stdout - {'loss': 0.4058, 'grad_norm': 1.4987272024154663, 'learning_rate': 2.4137840608879682e-06, 'epoch': 2.34} +2025-02-06 01:46:24 - ERROR - stderr - 78%|███████▊ | 17519/22434 [15:38:44<3:26:25, 2.52s/it] +2025-02-06 01:46:26 - ERROR - stderr - 78%|███████▊ | 17520/22434 [15:38:46<3:25:18, 2.51s/it] +2025-02-06 01:46:26 - ERROR - stderr - +2025-02-06 01:46:26 - ERROR - stderr - +2025-02-06 01:46:26 - INFO - stdout - {'loss': 0.3942, 'grad_norm': 1.4979009628295898, 'learning_rate': 2.4128434935716673e-06, 'epoch': 2.34} +2025-02-06 01:46:26 - ERROR - stderr - 78%|███████▊ | 17520/22434 [15:38:46<3:25:18, 2.51s/it] +2025-02-06 01:46:29 - ERROR - stderr - 78%|███████▊ | 17521/22434 [15:38:49<3:25:19, 2.51s/it] +2025-02-06 01:46:29 - ERROR - stderr - +2025-02-06 01:46:29 - ERROR - stderr - +2025-02-06 01:46:29 - INFO - stdout - {'loss': 0.3399, 'grad_norm': 1.388946771621704, 'learning_rate': 2.411903084402387e-06, 'epoch': 2.34} +2025-02-06 01:46:29 - ERROR - stderr - 78%|███████▊ | 17521/22434 [15:38:49<3:25:19, 2.51s/it] +2025-02-06 01:46:31 - ERROR - stderr - 78%|███████▊ | 17522/22434 [15:38:51<3:28:20, 2.54s/it] +2025-02-06 01:46:32 - ERROR - stderr - +2025-02-06 01:46:32 - ERROR - stderr - +2025-02-06 01:46:32 - INFO - stdout - {'loss': 0.3723, 'grad_norm': 1.5249682664871216, 'learning_rate': 2.410962833399719e-06, 'epoch': 2.34} +2025-02-06 01:46:32 - ERROR - stderr - 78%|███████▊ | 17522/22434 [15:38:51<3:28:20, 2.54s/it] +2025-02-06 
01:46:34 - ERROR - stderr - 78%|███████▊ | 17523/22434 [15:38:54<3:26:02, 2.52s/it] +2025-02-06 01:46:34 - ERROR - stderr - +2025-02-06 01:46:34 - ERROR - stderr - +2025-02-06 01:46:34 - INFO - stdout - {'loss': 0.4015, 'grad_norm': 1.6147184371948242, 'learning_rate': 2.4100227405832734e-06, 'epoch': 2.34} +2025-02-06 01:46:34 - ERROR - stderr - 78%|███████▊ | 17523/22434 [15:38:54<3:26:02, 2.52s/it] +2025-02-06 01:46:36 - ERROR - stderr - 78%|███████▊ | 17524/22434 [15:38:56<3:23:58, 2.49s/it] +2025-02-06 01:46:36 - ERROR - stderr - +2025-02-06 01:46:36 - ERROR - stderr - +2025-02-06 01:46:36 - INFO - stdout - {'loss': 0.3244, 'grad_norm': 1.434480905532837, 'learning_rate': 2.409082805972639e-06, 'epoch': 2.34} +2025-02-06 01:46:36 - ERROR - stderr - 78%|███████▊ | 17524/22434 [15:38:56<3:23:58, 2.49s/it] +2025-02-06 01:46:39 - ERROR - stderr - 78%|███████▊ | 17525/22434 [15:38:59<3:24:59, 2.51s/it] +2025-02-06 01:46:39 - ERROR - stderr - +2025-02-06 01:46:39 - ERROR - stderr - +2025-02-06 01:46:39 - INFO - stdout - {'loss': 0.319, 'grad_norm': 1.5144776105880737, 'learning_rate': 2.408143029587411e-06, 'epoch': 2.34} +2025-02-06 01:46:39 - ERROR - stderr - 78%|███████▊ | 17525/22434 [15:38:59<3:24:59, 2.51s/it] +2025-02-06 01:46:41 - ERROR - stderr - 78%|███████▊ | 17526/22434 [15:39:01<3:27:03, 2.53s/it] +2025-02-06 01:46:42 - ERROR - stderr - +2025-02-06 01:46:42 - ERROR - stderr - +2025-02-06 01:46:42 - INFO - stdout - {'loss': 0.3393, 'grad_norm': 1.3578449487686157, 'learning_rate': 2.40720341144718e-06, 'epoch': 2.34} +2025-02-06 01:46:42 - ERROR - stderr - 78%|███████▊ | 17526/22434 [15:39:01<3:27:03, 2.53s/it] +2025-02-06 01:46:44 - ERROR - stderr - 78%|███████▊ | 17527/22434 [15:39:04<3:26:11, 2.52s/it] +2025-02-06 01:46:44 - ERROR - stderr - +2025-02-06 01:46:44 - ERROR - stderr - +2025-02-06 01:46:44 - INFO - stdout - {'loss': 0.4123, 'grad_norm': 1.6287689208984375, 'learning_rate': 2.4062639515715214e-06, 'epoch': 2.34} +2025-02-06 01:46:44 - ERROR 
- stderr - 78%|███████▊ | 17527/22434 [15:39:04<3:26:11, 2.52s/it] +2025-02-06 01:46:46 - ERROR - stderr - 78%|███████▊ | 17528/22434 [15:39:06<3:25:15, 2.51s/it] +2025-02-06 01:46:47 - ERROR - stderr - +2025-02-06 01:46:47 - ERROR - stderr - +2025-02-06 01:46:47 - INFO - stdout - {'loss': 0.3875, 'grad_norm': 1.6070001125335693, 'learning_rate': 2.4053246499800307e-06, 'epoch': 2.34} +2025-02-06 01:46:47 - ERROR - stderr - 78%|███████▊ | 17528/22434 [15:39:06<3:25:15, 2.51s/it] +2025-02-06 01:46:49 - ERROR - stderr - 78%|███████▊ | 17529/22434 [15:39:09<3:25:12, 2.51s/it] +2025-02-06 01:46:49 - ERROR - stderr - +2025-02-06 01:46:49 - ERROR - stderr - +2025-02-06 01:46:49 - INFO - stdout - {'loss': 0.3831, 'grad_norm': 1.6061204671859741, 'learning_rate': 2.4043855066922783e-06, 'epoch': 2.34} +2025-02-06 01:46:49 - ERROR - stderr - 78%|███████▊ | 17529/22434 [15:39:09<3:25:12, 2.51s/it] +2025-02-06 01:46:52 - ERROR - stderr - 78%|███████▊ | 17530/22434 [15:39:11<3:25:39, 2.52s/it] +2025-02-06 01:46:52 - ERROR - stderr - +2025-02-06 01:46:52 - ERROR - stderr - +2025-02-06 01:46:52 - INFO - stdout - {'loss': 0.3779, 'grad_norm': 1.358227252960205, 'learning_rate': 2.403446521727838e-06, 'epoch': 2.34} +2025-02-06 01:46:52 - ERROR - stderr - 78%|███████▊ | 17530/22434 [15:39:11<3:25:39, 2.52s/it] +2025-02-06 01:46:54 - ERROR - stderr - 78%|███████▊ | 17531/22434 [15:39:14<3:24:33, 2.50s/it] +2025-02-06 01:46:54 - ERROR - stderr - +2025-02-06 01:46:54 - ERROR - stderr - +2025-02-06 01:46:54 - INFO - stdout - {'loss': 0.3562, 'grad_norm': 1.5503959655761719, 'learning_rate': 2.402507695106292e-06, 'epoch': 2.34} +2025-02-06 01:46:54 - ERROR - stderr - 78%|███████▊ | 17531/22434 [15:39:14<3:24:33, 2.50s/it] +2025-02-06 01:46:57 - ERROR - stderr - 78%|███████▊ | 17532/22434 [15:39:16<3:25:05, 2.51s/it] +2025-02-06 01:46:57 - ERROR - stderr - +2025-02-06 01:46:57 - ERROR - stderr - +2025-02-06 01:46:57 - INFO - stdout - {'loss': 0.3923, 'grad_norm': 1.5253574848175049, 
'learning_rate': 2.401569026847197e-06, 'epoch': 2.34} +2025-02-06 01:46:57 - ERROR - stderr - 78%|███████▊ | 17532/22434 [15:39:16<3:25:05, 2.51s/it] +2025-02-06 01:46:59 - ERROR - stderr - 78%|███████▊ | 17533/22434 [15:39:19<3:25:31, 2.52s/it] +2025-02-06 01:46:59 - ERROR - stderr - +2025-02-06 01:46:59 - ERROR - stderr - +2025-02-06 01:46:59 - INFO - stdout - {'loss': 0.3841, 'grad_norm': 1.517142415046692, 'learning_rate': 2.4006305169701306e-06, 'epoch': 2.34} +2025-02-06 01:46:59 - ERROR - stderr - 78%|███████▊ | 17533/22434 [15:39:19<3:25:31, 2.52s/it] +2025-02-06 01:47:02 - ERROR - stderr - 78%|███████▊ | 17534/22434 [15:39:21<3:25:14, 2.51s/it] +2025-02-06 01:47:02 - ERROR - stderr - +2025-02-06 01:47:02 - ERROR - stderr - +2025-02-06 01:47:02 - INFO - stdout - {'loss': 0.3989, 'grad_norm': 1.5566586256027222, 'learning_rate': 2.399692165494646e-06, 'epoch': 2.34} +2025-02-06 01:47:02 - ERROR - stderr - 78%|███████▊ | 17534/22434 [15:39:21<3:25:14, 2.51s/it] +2025-02-06 01:47:04 - ERROR - stderr - 78%|███████▊ | 17535/22434 [15:39:24<3:23:56, 2.50s/it] +2025-02-06 01:47:04 - ERROR - stderr - +2025-02-06 01:47:04 - ERROR - stderr - +2025-02-06 01:47:04 - INFO - stdout - {'loss': 0.3466, 'grad_norm': 1.4871102571487427, 'learning_rate': 2.3987539724403065e-06, 'epoch': 2.34} +2025-02-06 01:47:04 - ERROR - stderr - 78%|███████▊ | 17535/22434 [15:39:24<3:23:56, 2.50s/it] +2025-02-06 01:47:07 - ERROR - stderr - 78%|███████▊ | 17536/22434 [15:39:26<3:26:06, 2.52s/it] +2025-02-06 01:47:07 - ERROR - stderr - +2025-02-06 01:47:07 - ERROR - stderr - +2025-02-06 01:47:07 - INFO - stdout - {'loss': 0.3713, 'grad_norm': 1.4861352443695068, 'learning_rate': 2.3978159378266663e-06, 'epoch': 2.35} +2025-02-06 01:47:07 - ERROR - stderr - 78%|███████▊ | 17536/22434 [15:39:26<3:26:06, 2.52s/it] +2025-02-06 01:47:09 - ERROR - stderr - 78%|███████▊ | 17537/22434 [15:39:29<3:27:25, 2.54s/it] +2025-02-06 01:47:09 - ERROR - stderr - +2025-02-06 01:47:09 - ERROR - stderr - 
+2025-02-06 01:47:09 - INFO - stdout - {'loss': 0.4356, 'grad_norm': 1.6022893190383911, 'learning_rate': 2.396878061673278e-06, 'epoch': 2.35} +2025-02-06 01:47:09 - ERROR - stderr - 78%|███████▊ | 17537/22434 [15:39:29<3:27:25, 2.54s/it] +2025-02-06 01:47:12 - ERROR - stderr - 78%|███████▊ | 17538/22434 [15:39:31<3:27:11, 2.54s/it] +2025-02-06 01:47:12 - ERROR - stderr - +2025-02-06 01:47:12 - ERROR - stderr - +2025-02-06 01:47:12 - INFO - stdout - {'loss': 0.3855, 'grad_norm': 1.3883713483810425, 'learning_rate': 2.395940343999691e-06, 'epoch': 2.35} +2025-02-06 01:47:12 - ERROR - stderr - 78%|███████▊ | 17538/22434 [15:39:32<3:27:11, 2.54s/it] +2025-02-06 01:47:14 - ERROR - stderr - 78%|███████▊ | 17539/22434 [15:39:34<3:26:05, 2.53s/it] +2025-02-06 01:47:14 - ERROR - stderr - +2025-02-06 01:47:14 - ERROR - stderr - +2025-02-06 01:47:14 - INFO - stdout - {'loss': 0.3878, 'grad_norm': 1.8230940103530884, 'learning_rate': 2.395002784825452e-06, 'epoch': 2.35} +2025-02-06 01:47:14 - ERROR - stderr - 78%|███████▊ | 17539/22434 [15:39:34<3:26:05, 2.53s/it] +2025-02-06 01:47:17 - ERROR - stderr - 78%|███████▊ | 17540/22434 [15:39:36<3:24:47, 2.51s/it] +2025-02-06 01:47:17 - ERROR - stderr - +2025-02-06 01:47:17 - ERROR - stderr - +2025-02-06 01:47:17 - INFO - stdout - {'loss': 0.3512, 'grad_norm': 1.3361262083053589, 'learning_rate': 2.3940653841701023e-06, 'epoch': 2.35} +2025-02-06 01:47:17 - ERROR - stderr - 78%|███████▊ | 17540/22434 [15:39:36<3:24:47, 2.51s/it] +2025-02-06 01:47:19 - ERROR - stderr - 78%|███████▊ | 17541/22434 [15:39:39<3:23:16, 2.49s/it] +2025-02-06 01:47:19 - ERROR - stderr - +2025-02-06 01:47:19 - ERROR - stderr - +2025-02-06 01:47:19 - INFO - stdout - {'loss': 0.3503, 'grad_norm': 1.5380806922912598, 'learning_rate': 2.3931281420531816e-06, 'epoch': 2.35} +2025-02-06 01:47:19 - ERROR - stderr - 78%|███████▊ | 17541/22434 [15:39:39<3:23:16, 2.49s/it] +2025-02-06 01:47:22 - ERROR - stderr - 78%|███████▊ | 17542/22434 [15:39:41<3:23:39, 
2.50s/it] +2025-02-06 01:47:22 - ERROR - stderr - +2025-02-06 01:47:22 - ERROR - stderr - +2025-02-06 01:47:22 - INFO - stdout - {'loss': 0.3575, 'grad_norm': 1.5438785552978516, 'learning_rate': 2.3921910584942265e-06, 'epoch': 2.35} +2025-02-06 01:47:22 - ERROR - stderr - 78%|███████▊ | 17542/22434 [15:39:41<3:23:39, 2.50s/it] +2025-02-06 01:47:24 - ERROR - stderr - 78%|███████▊ | 17543/22434 [15:39:44<3:24:24, 2.51s/it] +2025-02-06 01:47:24 - ERROR - stderr - +2025-02-06 01:47:24 - ERROR - stderr - +2025-02-06 01:47:24 - INFO - stdout - {'loss': 0.3894, 'grad_norm': 1.6624984741210938, 'learning_rate': 2.391254133512768e-06, 'epoch': 2.35} +2025-02-06 01:47:24 - ERROR - stderr - 78%|███████▊ | 17543/22434 [15:39:44<3:24:24, 2.51s/it] +2025-02-06 01:47:27 - ERROR - stderr - 78%|███████▊ | 17544/22434 [15:39:46<3:22:40, 2.49s/it] +2025-02-06 01:47:27 - ERROR - stderr - +2025-02-06 01:47:27 - ERROR - stderr - +2025-02-06 01:47:27 - INFO - stdout - {'loss': 0.3513, 'grad_norm': 1.4388582706451416, 'learning_rate': 2.3903173671283363e-06, 'epoch': 2.35} +2025-02-06 01:47:27 - ERROR - stderr - 78%|███████▊ | 17544/22434 [15:39:46<3:22:40, 2.49s/it] +2025-02-06 01:47:29 - ERROR - stderr - 78%|███████▊ | 17545/22434 [15:39:49<3:23:10, 2.49s/it] +2025-02-06 01:47:29 - ERROR - stderr - +2025-02-06 01:47:29 - ERROR - stderr - +2025-02-06 01:47:29 - INFO - stdout - {'loss': 0.3374, 'grad_norm': 1.3611366748809814, 'learning_rate': 2.3893807593604614e-06, 'epoch': 2.35} +2025-02-06 01:47:29 - ERROR - stderr - 78%|███████▊ | 17545/22434 [15:39:49<3:23:10, 2.49s/it] +2025-02-06 01:47:32 - ERROR - stderr - 78%|███████▊ | 17546/22434 [15:39:51<3:23:00, 2.49s/it] +2025-02-06 01:47:32 - ERROR - stderr - +2025-02-06 01:47:32 - ERROR - stderr - +2025-02-06 01:47:32 - INFO - stdout - {'loss': 0.332, 'grad_norm': 1.441688895225525, 'learning_rate': 2.3884443102286547e-06, 'epoch': 2.35} +2025-02-06 01:47:32 - ERROR - stderr - 78%|███████▊ | 17546/22434 [15:39:51<3:23:00, 2.49s/it] 
+2025-02-06 01:47:34 - ERROR - stderr - 78%|███████▊ | 17547/22434 [15:39:54<3:24:07, 2.51s/it] +2025-02-06 01:47:34 - ERROR - stderr - +2025-02-06 01:47:34 - ERROR - stderr - +2025-02-06 01:47:34 - INFO - stdout - {'loss': 0.374, 'grad_norm': 1.708808422088623, 'learning_rate': 2.387508019752449e-06, 'epoch': 2.35} +2025-02-06 01:47:34 - ERROR - stderr - 78%|███████▊ | 17547/22434 [15:39:54<3:24:07, 2.51s/it] +2025-02-06 01:47:37 - ERROR - stderr - 78%|███████▊ | 17548/22434 [15:39:56<3:23:59, 2.50s/it] +2025-02-06 01:47:37 - ERROR - stderr - +2025-02-06 01:47:37 - ERROR - stderr - +2025-02-06 01:47:37 - INFO - stdout - {'loss': 0.3341, 'grad_norm': 1.4082392454147339, 'learning_rate': 2.386571887951349e-06, 'epoch': 2.35} +2025-02-06 01:47:37 - ERROR - stderr - 78%|███████▊ | 17548/22434 [15:39:56<3:23:59, 2.50s/it] +2025-02-06 01:47:39 - ERROR - stderr - 78%|███████▊ | 17549/22434 [15:39:59<3:26:46, 2.54s/it] +2025-02-06 01:47:39 - ERROR - stderr - +2025-02-06 01:47:39 - ERROR - stderr - +2025-02-06 01:47:39 - INFO - stdout - {'loss': 0.3446, 'grad_norm': 1.3861737251281738, 'learning_rate': 2.385635914844876e-06, 'epoch': 2.35} +2025-02-06 01:47:39 - ERROR - stderr - 78%|███████▊ | 17549/22434 [15:39:59<3:26:46, 2.54s/it] +2025-02-06 01:47:42 - ERROR - stderr - 78%|███████▊ | 17550/22434 [15:40:02<3:26:10, 2.53s/it] +2025-02-06 01:47:42 - ERROR - stderr - +2025-02-06 01:47:42 - ERROR - stderr - +2025-02-06 01:47:42 - INFO - stdout - {'loss': 0.3222, 'grad_norm': 1.306904673576355, 'learning_rate': 2.384700100452538e-06, 'epoch': 2.35} +2025-02-06 01:47:42 - ERROR - stderr - 78%|███████▊ | 17550/22434 [15:40:02<3:26:10, 2.53s/it] +2025-02-06 01:47:44 - ERROR - stderr - 78%|███████▊ | 17551/22434 [15:40:04<3:26:05, 2.53s/it] +2025-02-06 01:47:44 - ERROR - stderr - +2025-02-06 01:47:44 - ERROR - stderr - +2025-02-06 01:47:44 - INFO - stdout - {'loss': 0.3756, 'grad_norm': 1.538404107093811, 'learning_rate': 2.3837644447938348e-06, 'epoch': 2.35} +2025-02-06 
01:47:44 - ERROR - stderr - 78%|███████▊ | 17551/22434 [15:40:04<3:26:05, 2.53s/it] +2025-02-06 01:47:47 - ERROR - stderr - 78%|███████▊ | 17552/22434 [15:40:07<3:26:11, 2.53s/it] +2025-02-06 01:47:47 - ERROR - stderr - +2025-02-06 01:47:47 - ERROR - stderr - +2025-02-06 01:47:47 - INFO - stdout - {'loss': 0.3761, 'grad_norm': 1.691231369972229, 'learning_rate': 2.3828289478882783e-06, 'epoch': 2.35} +2025-02-06 01:47:47 - ERROR - stderr - 78%|███████▊ | 17552/22434 [15:40:07<3:26:11, 2.53s/it] +2025-02-06 01:47:49 - ERROR - stderr - 78%|███████▊ | 17553/22434 [15:40:09<3:25:48, 2.53s/it] +2025-02-06 01:47:49 - ERROR - stderr - +2025-02-06 01:47:49 - ERROR - stderr - +2025-02-06 01:47:49 - INFO - stdout - {'loss': 0.3561, 'grad_norm': 1.5625964403152466, 'learning_rate': 2.381893609755361e-06, 'epoch': 2.35} +2025-02-06 01:47:49 - ERROR - stderr - 78%|███████▊ | 17553/22434 [15:40:09<3:25:48, 2.53s/it] +2025-02-06 01:47:52 - ERROR - stderr - 78%|███████▊ | 17554/22434 [15:40:12<3:24:30, 2.51s/it] +2025-02-06 01:47:52 - ERROR - stderr - +2025-02-06 01:47:52 - ERROR - stderr - +2025-02-06 01:47:52 - INFO - stdout - {'loss': 0.4171, 'grad_norm': 1.4818755388259888, 'learning_rate': 2.3809584304145827e-06, 'epoch': 2.35} +2025-02-06 01:47:52 - ERROR - stderr - 78%|███████▊ | 17554/22434 [15:40:12<3:24:30, 2.51s/it] +2025-02-06 01:47:54 - ERROR - stderr - 78%|███████▊ | 17555/22434 [15:40:14<3:27:24, 2.55s/it] +2025-02-06 01:47:55 - ERROR - stderr - +2025-02-06 01:47:55 - ERROR - stderr - +2025-02-06 01:47:55 - INFO - stdout - {'loss': 0.3732, 'grad_norm': 1.4765480756759644, 'learning_rate': 2.3800234098854346e-06, 'epoch': 2.35} +2025-02-06 01:47:55 - ERROR - stderr - 78%|███████▊ | 17555/22434 [15:40:14<3:27:24, 2.55s/it] +2025-02-06 01:47:57 - ERROR - stderr - 78%|███████▊ | 17556/22434 [15:40:17<3:30:15, 2.59s/it] +2025-02-06 01:47:57 - ERROR - stderr - +2025-02-06 01:47:57 - ERROR - stderr - +2025-02-06 01:47:57 - INFO - stdout - {'loss': 0.3587, 'grad_norm': 
1.5711544752120972, 'learning_rate': 2.3790885481874037e-06, 'epoch': 2.35} +2025-02-06 01:47:57 - ERROR - stderr - 78%|███████▊ | 17556/22434 [15:40:17<3:30:15, 2.59s/it] +2025-02-06 01:48:00 - ERROR - stderr - 78%|███████▊ | 17557/22434 [15:40:19<3:27:21, 2.55s/it] +2025-02-06 01:48:00 - ERROR - stderr - +2025-02-06 01:48:00 - ERROR - stderr - +2025-02-06 01:48:00 - INFO - stdout - {'loss': 0.366, 'grad_norm': 1.466008186340332, 'learning_rate': 2.3781538453399856e-06, 'epoch': 2.35} +2025-02-06 01:48:00 - ERROR - stderr - 78%|███████▊ | 17557/22434 [15:40:19<3:27:21, 2.55s/it] +2025-02-06 01:48:02 - ERROR - stderr - 78%|███████▊ | 17558/22434 [15:40:22<3:27:01, 2.55s/it] +2025-02-06 01:48:02 - ERROR - stderr - +2025-02-06 01:48:02 - ERROR - stderr - +2025-02-06 01:48:02 - INFO - stdout - {'loss': 0.3657, 'grad_norm': 1.5146269798278809, 'learning_rate': 2.3772193013626545e-06, 'epoch': 2.35} +2025-02-06 01:48:02 - ERROR - stderr - 78%|███████▊ | 17558/22434 [15:40:22<3:27:01, 2.55s/it] +2025-02-06 01:48:05 - ERROR - stderr - 78%|███████▊ | 17559/22434 [15:40:24<3:25:10, 2.53s/it] +2025-02-06 01:48:05 - ERROR - stderr - +2025-02-06 01:48:05 - ERROR - stderr - +2025-02-06 01:48:05 - INFO - stdout - {'loss': 0.3136, 'grad_norm': 1.6002072095870972, 'learning_rate': 2.3762849162748935e-06, 'epoch': 2.35} +2025-02-06 01:48:05 - ERROR - stderr - 78%|███████▊ | 17559/22434 [15:40:24<3:25:10, 2.53s/it] +2025-02-06 01:48:07 - ERROR - stderr - 78%|███████▊ | 17560/22434 [15:40:27<3:24:07, 2.51s/it] +2025-02-06 01:48:07 - ERROR - stderr - +2025-02-06 01:48:07 - ERROR - stderr - +2025-02-06 01:48:07 - INFO - stdout - {'loss': 0.3917, 'grad_norm': 1.549189567565918, 'learning_rate': 2.3753506900961774e-06, 'epoch': 2.35} +2025-02-06 01:48:07 - ERROR - stderr - 78%|███████▊ | 17560/22434 [15:40:27<3:24:07, 2.51s/it] +2025-02-06 01:48:10 - ERROR - stderr - 78%|███████▊ | 17561/22434 [15:40:29<3:24:50, 2.52s/it] +2025-02-06 01:48:10 - ERROR - stderr - +2025-02-06 01:48:10 - 
ERROR - stderr - +2025-02-06 01:48:10 - INFO - stdout - {'loss': 0.3629, 'grad_norm': 1.4733622074127197, 'learning_rate': 2.374416622845981e-06, 'epoch': 2.35} +2025-02-06 01:48:10 - ERROR - stderr - 78%|███████▊ | 17561/22434 [15:40:29<3:24:50, 2.52s/it] +2025-02-06 01:48:12 - ERROR - stderr - 78%|███████▊ | 17562/22434 [15:40:32<3:25:31, 2.53s/it] +2025-02-06 01:48:12 - ERROR - stderr - +2025-02-06 01:48:12 - ERROR - stderr - +2025-02-06 01:48:12 - INFO - stdout - {'loss': 0.3327, 'grad_norm': 1.8149288892745972, 'learning_rate': 2.3734827145437723e-06, 'epoch': 2.35} +2025-02-06 01:48:12 - ERROR - stderr - 78%|███████▊ | 17562/22434 [15:40:32<3:25:31, 2.53s/it] +2025-02-06 01:48:15 - ERROR - stderr - 78%|███████▊ | 17563/22434 [15:40:35<3:30:29, 2.59s/it] +2025-02-06 01:48:15 - ERROR - stderr - +2025-02-06 01:48:15 - ERROR - stderr - +2025-02-06 01:48:15 - INFO - stdout - {'loss': 0.4353, 'grad_norm': 1.657407522201538, 'learning_rate': 2.3725489652090183e-06, 'epoch': 2.35} +2025-02-06 01:48:15 - ERROR - stderr - 78%|███████▊ | 17563/22434 [15:40:35<3:30:29, 2.59s/it] +2025-02-06 01:48:18 - ERROR - stderr - 78%|███████▊ | 17564/22434 [15:40:37<3:30:27, 2.59s/it] +2025-02-06 01:48:18 - ERROR - stderr - +2025-02-06 01:48:18 - ERROR - stderr - +2025-02-06 01:48:18 - INFO - stdout - {'loss': 0.3339, 'grad_norm': 1.443459153175354, 'learning_rate': 2.371615374861184e-06, 'epoch': 2.35} +2025-02-06 01:48:18 - ERROR - stderr - 78%|███████▊ | 17564/22434 [15:40:37<3:30:27, 2.59s/it] +2025-02-06 01:48:20 - ERROR - stderr - 78%|███████▊ | 17565/22434 [15:40:40<3:29:46, 2.58s/it] +2025-02-06 01:48:20 - ERROR - stderr - +2025-02-06 01:48:20 - ERROR - stderr - +2025-02-06 01:48:20 - INFO - stdout - {'loss': 0.3918, 'grad_norm': 1.5707156658172607, 'learning_rate': 2.3706819435197257e-06, 'epoch': 2.35} +2025-02-06 01:48:20 - ERROR - stderr - 78%|███████▊ | 17565/22434 [15:40:40<3:29:46, 2.58s/it] +2025-02-06 01:48:23 - ERROR - stderr - 78%|███████▊ | 17566/22434 
[15:40:42<3:30:10, 2.59s/it] +2025-02-06 01:48:23 - ERROR - stderr - +2025-02-06 01:48:23 - ERROR - stderr - +2025-02-06 01:48:23 - INFO - stdout - {'loss': 0.4153, 'grad_norm': 1.5400866270065308, 'learning_rate': 2.369748671204106e-06, 'epoch': 2.35} +2025-02-06 01:48:23 - ERROR - stderr - 78%|███████▊ | 17566/22434 [15:40:43<3:30:10, 2.59s/it] +2025-02-06 01:48:25 - ERROR - stderr - 78%|███████▊ | 17567/22434 [15:40:45<3:27:23, 2.56s/it] +2025-02-06 01:48:25 - ERROR - stderr - +2025-02-06 01:48:25 - ERROR - stderr - +2025-02-06 01:48:25 - INFO - stdout - {'loss': 0.3654, 'grad_norm': 1.4629552364349365, 'learning_rate': 2.368815557933768e-06, 'epoch': 2.35} +2025-02-06 01:48:25 - ERROR - stderr - 78%|███████▊ | 17567/22434 [15:40:45<3:27:23, 2.56s/it] +2025-02-06 01:48:28 - ERROR - stderr - 78%|███████▊ | 17568/22434 [15:40:48<3:28:32, 2.57s/it] +2025-02-06 01:48:28 - ERROR - stderr - +2025-02-06 01:48:28 - ERROR - stderr - +2025-02-06 01:48:28 - INFO - stdout - {'loss': 0.361, 'grad_norm': 1.4154753684997559, 'learning_rate': 2.36788260372817e-06, 'epoch': 2.35} +2025-02-06 01:48:28 - ERROR - stderr - 78%|███████▊ | 17568/22434 [15:40:48<3:28:32, 2.57s/it] +2025-02-06 01:48:30 - ERROR - stderr - 78%|███████▊ | 17569/22434 [15:40:50<3:29:22, 2.58s/it] +2025-02-06 01:48:30 - ERROR - stderr - +2025-02-06 01:48:30 - ERROR - stderr - +2025-02-06 01:48:30 - INFO - stdout - {'loss': 0.4102, 'grad_norm': 1.4788978099822998, 'learning_rate': 2.366949808606759e-06, 'epoch': 2.35} +2025-02-06 01:48:30 - ERROR - stderr - 78%|███████▊ | 17569/22434 [15:40:50<3:29:22, 2.58s/it] +2025-02-06 01:48:33 - ERROR - stderr - 78%|███████▊ | 17570/22434 [15:40:53<3:29:01, 2.58s/it] +2025-02-06 01:48:33 - ERROR - stderr - +2025-02-06 01:48:33 - ERROR - stderr - +2025-02-06 01:48:33 - INFO - stdout - {'loss': 0.359, 'grad_norm': 1.4997624158859253, 'learning_rate': 2.3660171725889703e-06, 'epoch': 2.35} +2025-02-06 01:48:33 - ERROR - stderr - 78%|███████▊ | 17570/22434 
[15:40:53<3:29:01, 2.58s/it] +2025-02-06 01:48:36 - ERROR - stderr - 78%|███████▊ | 17571/22434 [15:40:56<3:34:03, 2.64s/it] +2025-02-06 01:48:36 - ERROR - stderr - +2025-02-06 01:48:36 - ERROR - stderr - +2025-02-06 01:48:36 - INFO - stdout - {'loss': 0.3637, 'grad_norm': 1.4077261686325073, 'learning_rate': 2.365084695694253e-06, 'epoch': 2.35} +2025-02-06 01:48:36 - ERROR - stderr - 78%|███████▊ | 17571/22434 [15:40:56<3:34:03, 2.64s/it] +2025-02-06 01:48:38 - ERROR - stderr - 78%|███████▊ | 17572/22434 [15:40:58<3:30:50, 2.60s/it] +2025-02-06 01:48:38 - ERROR - stderr - +2025-02-06 01:48:38 - ERROR - stderr - +2025-02-06 01:48:38 - INFO - stdout - {'loss': 0.387, 'grad_norm': 1.4685546159744263, 'learning_rate': 2.364152377942035e-06, 'epoch': 2.35} +2025-02-06 01:48:38 - ERROR - stderr - 78%|███████▊ | 17572/22434 [15:40:58<3:30:50, 2.60s/it] +2025-02-06 01:48:41 - ERROR - stderr - 78%|███████▊ | 17573/22434 [15:41:01<3:33:13, 2.63s/it] +2025-02-06 01:48:41 - ERROR - stderr - +2025-02-06 01:48:41 - ERROR - stderr - +2025-02-06 01:48:41 - INFO - stdout - {'loss': 0.4012, 'grad_norm': 1.5082988739013672, 'learning_rate': 2.3632202193517582e-06, 'epoch': 2.35} +2025-02-06 01:48:41 - ERROR - stderr - 78%|███████▊ | 17573/22434 [15:41:01<3:33:13, 2.63s/it] +2025-02-06 01:48:43 - ERROR - stderr - 78%|███████▊ | 17574/22434 [15:41:03<3:29:00, 2.58s/it] +2025-02-06 01:48:43 - ERROR - stderr - +2025-02-06 01:48:43 - ERROR - stderr - +2025-02-06 01:48:43 - INFO - stdout - {'loss': 0.3772, 'grad_norm': 1.3603029251098633, 'learning_rate': 2.3622882199428463e-06, 'epoch': 2.35} +2025-02-06 01:48:43 - ERROR - stderr - 78%|███████▊ | 17574/22434 [15:41:03<3:29:00, 2.58s/it] +2025-02-06 01:48:46 - ERROR - stderr - 78%|███████▊ | 17575/22434 [15:41:06<3:27:57, 2.57s/it] +2025-02-06 01:48:46 - ERROR - stderr - +2025-02-06 01:48:46 - ERROR - stderr - +2025-02-06 01:48:46 - INFO - stdout - {'loss': 0.362, 'grad_norm': 1.3584884405136108, 'learning_rate': 2.361356379734725e-06,
'epoch': 2.35} +2025-02-06 01:48:46 - ERROR - stderr - 78%|███████▊ | 17575/22434 [15:41:06<3:27:57, 2.57s/it] +2025-02-06 01:48:48 - ERROR - stderr - 78%|███████▊ | 17576/22434 [15:41:08<3:25:58, 2.54s/it] +2025-02-06 01:48:49 - ERROR - stderr - +2025-02-06 01:48:49 - ERROR - stderr - +2025-02-06 01:48:49 - INFO - stdout - {'loss': 0.3739, 'grad_norm': 1.573844313621521, 'learning_rate': 2.360424698746827e-06, 'epoch': 2.35} +2025-02-06 01:48:49 - ERROR - stderr - 78%|███████▊ | 17576/22434 [15:41:08<3:25:58, 2.54s/it] +2025-02-06 01:48:51 - ERROR - stderr - 78%|███████▊ | 17577/22434 [15:41:11<3:24:34, 2.53s/it] +2025-02-06 01:48:51 - ERROR - stderr - +2025-02-06 01:48:51 - ERROR - stderr - +2025-02-06 01:48:51 - INFO - stdout - {'loss': 0.3571, 'grad_norm': 1.5637869834899902, 'learning_rate': 2.359493176998562e-06, 'epoch': 2.35} +2025-02-06 01:48:51 - ERROR - stderr - 78%|███████▊ | 17577/22434 [15:41:11<3:24:34, 2.53s/it] +2025-02-06 01:48:53 - ERROR - stderr - 78%|███████▊ | 17578/22434 [15:41:13<3:22:11, 2.50s/it] +2025-02-06 01:48:53 - ERROR - stderr - +2025-02-06 01:48:53 - ERROR - stderr - +2025-02-06 01:48:53 - INFO - stdout - {'loss': 0.3794, 'grad_norm': 1.678982138633728, 'learning_rate': 2.3585618145093513e-06, 'epoch': 2.35} +2025-02-06 01:48:53 - ERROR - stderr - 78%|███████▊ | 17578/22434 [15:41:13<3:22:11, 2.50s/it] +2025-02-06 01:48:56 - ERROR - stderr - 78%|███████▊ | 17579/22434 [15:41:16<3:22:15, 2.50s/it] +2025-02-06 01:48:56 - ERROR - stderr - +2025-02-06 01:48:56 - ERROR - stderr - +2025-02-06 01:48:56 - INFO - stdout - {'loss': 0.3793, 'grad_norm': 1.5881446599960327, 'learning_rate': 2.357630611298607e-06, 'epoch': 2.35} +2025-02-06 01:48:56 - ERROR - stderr - 78%|███████▊ | 17579/22434 [15:41:16<3:22:15, 2.50s/it] +2025-02-06 01:48:58 - ERROR - stderr - 78%|███████▊ | 17580/22434 [15:41:18<3:24:17, 2.53s/it] +2025-02-06 01:48:59 - ERROR - stderr - +2025-02-06 01:48:59 - ERROR - stderr - +2025-02-06 01:48:59 - INFO - stdout - {'loss': 
0.3736, 'grad_norm': 1.5485620498657227, 'learning_rate': 2.3566995673857397e-06, 'epoch': 2.35} +2025-02-06 01:48:59 - ERROR - stderr - 78%|███████▊ | 17580/22434 [15:41:18<3:24:17, 2.53s/it] +2025-02-06 01:49:01 - ERROR - stderr - 78%|███████▊ | 17581/22434 [15:41:21<3:31:48, 2.62s/it] +2025-02-06 01:49:01 - ERROR - stderr - +2025-02-06 01:49:01 - ERROR - stderr - +2025-02-06 01:49:01 - INFO - stdout - {'loss': 0.3496, 'grad_norm': 1.4549474716186523, 'learning_rate': 2.355768682790156e-06, 'epoch': 2.35} +2025-02-06 01:49:01 - ERROR - stderr - 78%|███████▊ | 17581/22434 [15:41:21<3:31:48, 2.62s/it] +2025-02-06 01:49:04 - ERROR - stderr - 78%|███████▊ | 17582/22434 [15:41:24<3:28:57, 2.58s/it] +2025-02-06 01:49:04 - ERROR - stderr - +2025-02-06 01:49:04 - ERROR - stderr - +2025-02-06 01:49:04 - INFO - stdout - {'loss': 0.3361, 'grad_norm': 1.5404152870178223, 'learning_rate': 2.3548379575312597e-06, 'epoch': 2.35} +2025-02-06 01:49:04 - ERROR - stderr - 78%|███████▊ | 17582/22434 [15:41:24<3:28:57, 2.58s/it] +2025-02-06 01:49:06 - ERROR - stderr - 78%|███████▊ | 17583/22434 [15:41:26<3:27:10, 2.56s/it] +2025-02-06 01:49:06 - ERROR - stderr - +2025-02-06 01:49:06 - ERROR - stderr - +2025-02-06 01:49:06 - INFO - stdout - {'loss': 0.3521, 'grad_norm': 1.4260433912277222, 'learning_rate': 2.3539073916284504e-06, 'epoch': 2.35} +2025-02-06 01:49:06 - ERROR - stderr - 78%|███████▊ | 17583/22434 [15:41:26<3:27:10, 2.56s/it] +2025-02-06 01:49:09 - ERROR - stderr - 78%|███████▊ | 17584/22434 [15:41:29<3:24:32, 2.53s/it] +2025-02-06 01:49:09 - ERROR - stderr - +2025-02-06 01:49:09 - ERROR - stderr - +2025-02-06 01:49:09 - INFO - stdout - {'loss': 0.3715, 'grad_norm': 1.447527527809143, 'learning_rate': 2.352976985101125e-06, 'epoch': 2.35} +2025-02-06 01:49:09 - ERROR - stderr - 78%|███████▊ | 17584/22434 [15:41:29<3:24:32, 2.53s/it] +2025-02-06 01:49:11 - ERROR - stderr - 78%|███████▊ | 17585/22434 [15:41:31<3:25:14, 2.54s/it] +2025-02-06 01:49:11 - ERROR - stderr - 
+2025-02-06 01:49:11 - ERROR - stderr - +2025-02-06 01:49:11 - INFO - stdout - {'loss': 0.3794, 'grad_norm': 1.5347251892089844, 'learning_rate': 2.3520467379686797e-06, 'epoch': 2.35} +2025-02-06 01:49:11 - ERROR - stderr - 78%|███████▊ | 17585/22434 [15:41:31<3:25:14, 2.54s/it] +2025-02-06 01:49:14 - ERROR - stderr - 78%|███████▊ | 17586/22434 [15:41:34<3:23:18, 2.52s/it] +2025-02-06 01:49:14 - ERROR - stderr - +2025-02-06 01:49:14 - ERROR - stderr - +2025-02-06 01:49:14 - INFO - stdout - {'loss': 0.3454, 'grad_norm': 1.4257363080978394, 'learning_rate': 2.3511166502504967e-06, 'epoch': 2.35} +2025-02-06 01:49:14 - ERROR - stderr - 78%|███████▊ | 17586/22434 [15:41:34<3:23:18, 2.52s/it] +2025-02-06 01:49:16 - ERROR - stderr - 78%|███████▊ | 17587/22434 [15:41:36<3:22:26, 2.51s/it] +2025-02-06 01:49:16 - ERROR - stderr - +2025-02-06 01:49:16 - ERROR - stderr - +2025-02-06 01:49:16 - INFO - stdout - {'loss': 0.355, 'grad_norm': 1.6945871114730835, 'learning_rate': 2.3501867219659703e-06, 'epoch': 2.35} +2025-02-06 01:49:16 - ERROR - stderr - 78%|███████▊ | 17587/22434 [15:41:36<3:22:26, 2.51s/it] +2025-02-06 01:49:19 - ERROR - stderr - 78%|███████▊ | 17588/22434 [15:41:39<3:21:47, 2.50s/it] +2025-02-06 01:49:19 - ERROR - stderr - +2025-02-06 01:49:19 - ERROR - stderr - +2025-02-06 01:49:19 - INFO - stdout - {'loss': 0.3855, 'grad_norm': 1.4235073328018188, 'learning_rate': 2.349256953134481e-06, 'epoch': 2.35} +2025-02-06 01:49:19 - ERROR - stderr - 78%|███████▊ | 17588/22434 [15:41:39<3:21:47, 2.50s/it] +2025-02-06 01:49:21 - ERROR - stderr - 78%|███████▊ | 17589/22434 [15:41:41<3:23:45, 2.52s/it] +2025-02-06 01:49:21 - ERROR - stderr - +2025-02-06 01:49:21 - ERROR - stderr - +2025-02-06 01:49:21 - INFO - stdout - {'loss': 0.3516, 'grad_norm': 1.6472917795181274, 'learning_rate': 2.3483273437754106e-06, 'epoch': 2.35} +2025-02-06 01:49:21 - ERROR - stderr - 78%|███████▊ | 17589/22434 [15:41:41<3:23:45, 2.52s/it] +2025-02-06 01:49:24 - ERROR - stderr - 78%|███████▊ 
| 17590/22434 [15:41:44<3:22:52, 2.51s/it] +2025-02-06 01:49:24 - ERROR - stderr - +2025-02-06 01:49:24 - ERROR - stderr - +2025-02-06 01:49:24 - INFO - stdout - {'loss': 0.3861, 'grad_norm': 1.6385244131088257, 'learning_rate': 2.3473978939081375e-06, 'epoch': 2.35} +2025-02-06 01:49:24 - ERROR - stderr - 78%|███████▊ | 17590/22434 [15:41:44<3:22:52, 2.51s/it] +2025-02-06 01:49:26 - ERROR - stderr - 78%|███████▊ | 17591/22434 [15:41:46<3:22:09, 2.50s/it] +2025-02-06 01:49:26 - ERROR - stderr - +2025-02-06 01:49:26 - ERROR - stderr - +2025-02-06 01:49:26 - INFO - stdout - {'loss': 0.3333, 'grad_norm': 1.4511382579803467, 'learning_rate': 2.3464686035520267e-06, 'epoch': 2.35} +2025-02-06 01:49:26 - ERROR - stderr - 78%|███████▊ | 17591/22434 [15:41:46<3:22:09, 2.50s/it] +2025-02-06 01:49:29 - ERROR - stderr - 78%|███████▊ | 17592/22434 [15:41:49<3:20:07, 2.48s/it] +2025-02-06 01:49:29 - ERROR - stderr - +2025-02-06 01:49:29 - ERROR - stderr - +2025-02-06 01:49:29 - INFO - stdout - {'loss': 0.4102, 'grad_norm': 1.6987982988357544, 'learning_rate': 2.345539472726459e-06, 'epoch': 2.35} +2025-02-06 01:49:29 - ERROR - stderr - 78%|███████▊ | 17592/22434 [15:41:49<3:20:07, 2.48s/it] +2025-02-06 01:49:31 - ERROR - stderr - 78%|███████▊ | 17593/22434 [15:41:51<3:23:44, 2.53s/it] +2025-02-06 01:49:31 - ERROR - stderr - +2025-02-06 01:49:31 - ERROR - stderr - +2025-02-06 01:49:31 - INFO - stdout - {'loss': 0.3489, 'grad_norm': 1.5320804119110107, 'learning_rate': 2.3446105014507925e-06, 'epoch': 2.35} +2025-02-06 01:49:31 - ERROR - stderr - 78%|███████▊ | 17593/22434 [15:41:51<3:23:44, 2.53s/it] +2025-02-06 01:49:34 - ERROR - stderr - 78%|███████▊ | 17594/22434 [15:41:54<3:23:03, 2.52s/it] +2025-02-06 01:49:34 - ERROR - stderr - +2025-02-06 01:49:34 - ERROR - stderr - +2025-02-06 01:49:34 - INFO - stdout - {'loss': 0.3784, 'grad_norm': 1.5359894037246704, 'learning_rate': 2.343681689744396e-06, 'epoch': 2.35} +2025-02-06 01:49:34 - ERROR - stderr - 78%|███████▊ | 
17594/22434 [15:41:54<3:23:03, 2.52s/it] +2025-02-06 01:49:37 - ERROR - stderr - 78%|███████▊ | 17595/22434 [15:41:56<3:27:12, 2.57s/it] +2025-02-06 01:49:37 - ERROR - stderr - +2025-02-06 01:49:37 - ERROR - stderr - +2025-02-06 01:49:37 - INFO - stdout - {'loss': 0.3181, 'grad_norm': 1.3519891500473022, 'learning_rate': 2.342753037626633e-06, 'epoch': 2.35} +2025-02-06 01:49:37 - ERROR - stderr - 78%|███████▊ | 17595/22434 [15:41:56<3:27:12, 2.57s/it] +2025-02-06 01:49:39 - ERROR - stderr - 78%|███████▊ | 17596/22434 [15:41:59<3:24:13, 2.53s/it] +2025-02-06 01:49:39 - ERROR - stderr - +2025-02-06 01:49:39 - ERROR - stderr - +2025-02-06 01:49:39 - INFO - stdout - {'loss': 0.3465, 'grad_norm': 1.3641998767852783, 'learning_rate': 2.341824545116849e-06, 'epoch': 2.35} +2025-02-06 01:49:39 - ERROR - stderr - 78%|███████▊ | 17596/22434 [15:41:59<3:24:13, 2.53s/it] +2025-02-06 01:49:41 - ERROR - stderr - 78%|███████▊ | 17597/22434 [15:42:01<3:23:02, 2.52s/it] +2025-02-06 01:49:42 - ERROR - stderr - +2025-02-06 01:49:42 - ERROR - stderr - +2025-02-06 01:49:42 - INFO - stdout - {'loss': 0.3721, 'grad_norm': 1.6393500566482544, 'learning_rate': 2.3408962122344093e-06, 'epoch': 2.35} +2025-02-06 01:49:42 - ERROR - stderr - 78%|███████▊ | 17597/22434 [15:42:01<3:23:02, 2.52s/it] +2025-02-06 01:49:44 - ERROR - stderr - 78%|███████▊ | 17598/22434 [15:42:04<3:32:39, 2.64s/it] +2025-02-06 01:49:44 - ERROR - stderr - +2025-02-06 01:49:44 - ERROR - stderr - +2025-02-06 01:49:44 - INFO - stdout - {'loss': 0.3335, 'grad_norm': 1.371472954750061, 'learning_rate': 2.339968038998657e-06, 'epoch': 2.35} +2025-02-06 01:49:44 - ERROR - stderr - 78%|███████▊ | 17598/22434 [15:42:04<3:32:39, 2.64s/it] +2025-02-06 01:49:47 - ERROR - stderr - 78%|███████▊ | 17599/22434 [15:42:07<3:29:27, 2.60s/it] +2025-02-06 01:49:47 - ERROR - stderr - +2025-02-06 01:49:47 - ERROR - stderr - +2025-02-06 01:49:47 - INFO - stdout - {'loss': 0.3856, 'grad_norm': 1.7660419940948486, 'learning_rate': 
2.3390400254289402e-06, 'epoch': 2.35} +2025-02-06 01:49:47 - ERROR - stderr - 78%|███████▊ | 17599/22434 [15:42:07<3:29:27, 2.60s/it] +2025-02-06 01:49:49 - ERROR - stderr - 78%|███████▊ | 17600/22434 [15:42:09<3:28:51, 2.59s/it] +2025-02-06 01:49:50 - ERROR - stderr - +2025-02-06 01:49:50 - ERROR - stderr - +2025-02-06 01:49:50 - INFO - stdout - {'loss': 0.3634, 'grad_norm': 1.535967230796814, 'learning_rate': 2.3381121715446044e-06, 'epoch': 2.35} +2025-02-06 01:49:50 - ERROR - stderr - 78%|███████▊ | 17600/22434 [15:42:09<3:28:51, 2.59s/it] +2025-02-06 01:49:52 - ERROR - stderr - 78%|███████▊ | 17601/22434 [15:42:12<3:28:18, 2.59s/it] +2025-02-06 01:49:52 - ERROR - stderr - +2025-02-06 01:49:52 - ERROR - stderr - +2025-02-06 01:49:52 - INFO - stdout - {'loss': 0.3641, 'grad_norm': 1.6346659660339355, 'learning_rate': 2.3371844773649888e-06, 'epoch': 2.35} +2025-02-06 01:49:52 - ERROR - stderr - 78%|███████▊ | 17601/22434 [15:42:12<3:28:18, 2.59s/it] +2025-02-06 01:49:55 - ERROR - stderr - 78%|███████▊ | 17602/22434 [15:42:15<3:31:39, 2.63s/it] +2025-02-06 01:49:55 - ERROR - stderr - +2025-02-06 01:49:55 - ERROR - stderr - +2025-02-06 01:49:55 - INFO - stdout - {'loss': 0.3799, 'grad_norm': 1.409423589706421, 'learning_rate': 2.3362569429094295e-06, 'epoch': 2.35} +2025-02-06 01:49:55 - ERROR - stderr - 78%|███████▊ | 17602/22434 [15:42:15<3:31:39, 2.63s/it] +2025-02-06 01:49:57 - ERROR - stderr - 78%|███████▊ | 17603/22434 [15:42:17<3:27:36, 2.58s/it] +2025-02-06 01:49:57 - ERROR - stderr - +2025-02-06 01:49:57 - ERROR - stderr - +2025-02-06 01:49:57 - INFO - stdout - {'loss': 0.344, 'grad_norm': 1.621748924255371, 'learning_rate': 2.335329568197261e-06, 'epoch': 2.35} +2025-02-06 01:49:57 - ERROR - stderr - 78%|███████▊ | 17603/22434 [15:42:17<3:27:36, 2.58s/it] +2025-02-06 01:50:00 - ERROR - stderr - 78%|███████▊ | 17604/22434 [15:42:19<3:24:24, 2.54s/it] +2025-02-06 01:50:00 - ERROR - stderr - +2025-02-06 01:50:00 - ERROR - stderr - +2025-02-06 01:50:00 - 
INFO - stdout - {'loss': 0.351, 'grad_norm': 1.5207022428512573, 'learning_rate': 2.3344023532478135e-06, 'epoch': 2.35} +2025-02-06 01:50:00 - ERROR - stderr - 78%|███████▊ | 17604/22434 [15:42:20<3:24:24, 2.54s/it] +2025-02-06 01:50:02 - ERROR - stderr - 78%|███████▊ | 17605/22434 [15:42:22<3:23:05, 2.52s/it] +2025-02-06 01:50:02 - ERROR - stderr - +2025-02-06 01:50:02 - ERROR - stderr - +2025-02-06 01:50:02 - INFO - stdout - {'loss': 0.3551, 'grad_norm': 1.4854915142059326, 'learning_rate': 2.333475298080414e-06, 'epoch': 2.35} +2025-02-06 01:50:02 - ERROR - stderr - 78%|███████▊ | 17605/22434 [15:42:22<3:23:05, 2.52s/it] +2025-02-06 01:50:05 - ERROR - stderr - 78%|███████▊ | 17606/22434 [15:42:24<3:22:45, 2.52s/it] +2025-02-06 01:50:05 - ERROR - stderr - +2025-02-06 01:50:05 - ERROR - stderr - +2025-02-06 01:50:05 - INFO - stdout - {'loss': 0.3489, 'grad_norm': 1.4008948802947998, 'learning_rate': 2.332548402714385e-06, 'epoch': 2.35} +2025-02-06 01:50:05 - ERROR - stderr - 78%|███████▊ | 17606/22434 [15:42:25<3:22:45, 2.52s/it] +2025-02-06 01:50:07 - ERROR - stderr - 78%|███████▊ | 17607/22434 [15:42:27<3:26:47, 2.57s/it] +2025-02-06 01:50:07 - ERROR - stderr - +2025-02-06 01:50:07 - ERROR - stderr - +2025-02-06 01:50:07 - INFO - stdout - {'loss': 0.3605, 'grad_norm': 1.619930386543274, 'learning_rate': 2.3316216671690485e-06, 'epoch': 2.35} +2025-02-06 01:50:07 - ERROR - stderr - 78%|███████▊ | 17607/22434 [15:42:27<3:26:47, 2.57s/it] +2025-02-06 01:50:10 - ERROR - stderr - 78%|███████▊ | 17608/22434 [15:42:30<3:23:07, 2.53s/it] +2025-02-06 01:50:10 - ERROR - stderr - +2025-02-06 01:50:10 - ERROR - stderr - +2025-02-06 01:50:10 - INFO - stdout - {'loss': 0.3953, 'grad_norm': 1.6028120517730713, 'learning_rate': 2.3306950914637205e-06, 'epoch': 2.35} +2025-02-06 01:50:10 - ERROR - stderr - 78%|███████▊ | 17608/22434 [15:42:30<3:23:07, 2.53s/it] +2025-02-06 01:50:12 - ERROR - stderr - 78%|███████▊ | 17609/22434 [15:42:32<3:20:41, 2.50s/it] +2025-02-06 01:50:12 
- ERROR - stderr - +2025-02-06 01:50:12 - ERROR - stderr - +2025-02-06 01:50:12 - INFO - stdout - {'loss': 0.3943, 'grad_norm': 1.6198242902755737, 'learning_rate': 2.329768675617714e-06, 'epoch': 2.35} +2025-02-06 01:50:12 - ERROR - stderr - 78%|███████▊ | 17609/22434 [15:42:32<3:20:41, 2.50s/it] +2025-02-06 01:50:15 - ERROR - stderr - 78%|███████▊ | 17610/22434 [15:42:34<3:19:19, 2.48s/it] +2025-02-06 01:50:15 - ERROR - stderr - +2025-02-06 01:50:15 - ERROR - stderr - +2025-02-06 01:50:15 - INFO - stdout - {'loss': 0.4087, 'grad_norm': 1.7245213985443115, 'learning_rate': 2.32884241965034e-06, 'epoch': 2.35} +2025-02-06 01:50:15 - ERROR - stderr - 78%|███████▊ | 17610/22434 [15:42:34<3:19:19, 2.48s/it] +2025-02-06 01:50:17 - ERROR - stderr - 79%|███████▊ | 17611/22434 [15:42:37<3:19:52, 2.49s/it] +2025-02-06 01:50:17 - ERROR - stderr - +2025-02-06 01:50:17 - ERROR - stderr - +2025-02-06 01:50:17 - INFO - stdout - {'loss': 0.3764, 'grad_norm': 1.3707060813903809, 'learning_rate': 2.327916323580909e-06, 'epoch': 2.36} +2025-02-06 01:50:17 - ERROR - stderr - 79%|███████▊ | 17611/22434 [15:42:37<3:19:52, 2.49s/it] +2025-02-06 01:50:20 - ERROR - stderr - 79%|███████▊ | 17612/22434 [15:42:39<3:18:36, 2.47s/it] +2025-02-06 01:50:20 - ERROR - stderr - +2025-02-06 01:50:20 - ERROR - stderr - +2025-02-06 01:50:20 - INFO - stdout - {'loss': 0.3669, 'grad_norm': 1.5521405935287476, 'learning_rate': 2.3269903874287146e-06, 'epoch': 2.36} +2025-02-06 01:50:20 - ERROR - stderr - 79%|███████▊ | 17612/22434 [15:42:39<3:18:36, 2.47s/it] +2025-02-06 01:50:22 - ERROR - stderr - 79%|███████▊ | 17613/22434 [15:42:42<3:18:12, 2.47s/it] +2025-02-06 01:50:22 - ERROR - stderr - +2025-02-06 01:50:22 - ERROR - stderr - +2025-02-06 01:50:22 - INFO - stdout - {'loss': 0.3493, 'grad_norm': 1.464019536972046, 'learning_rate': 2.3260646112130657e-06, 'epoch': 2.36} +2025-02-06 01:50:22 - ERROR - stderr - 79%|███████▊ | 17613/22434 [15:42:42<3:18:12, 2.47s/it] +2025-02-06 01:50:25 - ERROR - 
stderr - 79%|███████▊ | 17614/22434 [15:42:44<3:21:05, 2.50s/it] +2025-02-06 01:50:25 - ERROR - stderr - +2025-02-06 01:50:25 - ERROR - stderr - +2025-02-06 01:50:25 - INFO - stdout - {'loss': 0.3238, 'grad_norm': 1.2734593152999878, 'learning_rate': 2.32513899495326e-06, 'epoch': 2.36} +2025-02-06 01:50:25 - ERROR - stderr - 79%|███████▊ | 17614/22434 [15:42:44<3:21:05, 2.50s/it] +2025-02-06 01:50:27 - ERROR - stderr - 79%|███████▊ | 17615/22434 [15:42:47<3:22:07, 2.52s/it] +2025-02-06 01:50:27 - ERROR - stderr - +2025-02-06 01:50:27 - ERROR - stderr - +2025-02-06 01:50:27 - INFO - stdout - {'loss': 0.3889, 'grad_norm': 1.6263285875320435, 'learning_rate': 2.3242135386685816e-06, 'epoch': 2.36} +2025-02-06 01:50:27 - ERROR - stderr - 79%|███████▊ | 17615/22434 [15:42:47<3:22:07, 2.52s/it] +2025-02-06 01:50:30 - ERROR - stderr - 79%|███████▊ | 17616/22434 [15:42:49<3:21:37, 2.51s/it] +2025-02-06 01:50:30 - ERROR - stderr - +2025-02-06 01:50:30 - ERROR - stderr - +2025-02-06 01:50:30 - INFO - stdout - {'loss': 0.4187, 'grad_norm': 1.7689213752746582, 'learning_rate': 2.3232882423783342e-06, 'epoch': 2.36} +2025-02-06 01:50:30 - ERROR - stderr - 79%|███████▊ | 17616/22434 [15:42:50<3:21:37, 2.51s/it] +2025-02-06 01:50:32 - ERROR - stderr - 79%|███████▊ | 17617/22434 [15:42:52<3:21:05, 2.50s/it] +2025-02-06 01:50:32 - ERROR - stderr - +2025-02-06 01:50:32 - ERROR - stderr - +2025-02-06 01:50:32 - INFO - stdout - {'loss': 0.4012, 'grad_norm': 1.5880107879638672, 'learning_rate': 2.3223631061017903e-06, 'epoch': 2.36} +2025-02-06 01:50:32 - ERROR - stderr - 79%|███████▊ | 17617/22434 [15:42:52<3:21:05, 2.50s/it] +2025-02-06 01:50:35 - ERROR - stderr - 79%|███████▊ | 17618/22434 [15:42:54<3:18:54, 2.48s/it] +2025-02-06 01:50:35 - ERROR - stderr - +2025-02-06 01:50:35 - ERROR - stderr - +2025-02-06 01:50:35 - INFO - stdout - {'loss': 0.3433, 'grad_norm': 1.4149914979934692, 'learning_rate': 2.3214381298582477e-06, 'epoch': 2.36} +2025-02-06 01:50:35 - ERROR - stderr - 
79%|███████▊ | 17618/22434 [15:42:54<3:18:54, 2.48s/it] +2025-02-06 01:50:37 - ERROR - stderr - 79%|███████▊ | 17619/22434 [15:42:57<3:22:14, 2.52s/it] +2025-02-06 01:50:37 - ERROR - stderr - +2025-02-06 01:50:37 - ERROR - stderr - +2025-02-06 01:50:37 - INFO - stdout - {'loss': 0.3686, 'grad_norm': 1.5423498153686523, 'learning_rate': 2.3205133136669757e-06, 'epoch': 2.36} +2025-02-06 01:50:37 - ERROR - stderr - 79%|███████▊ | 17619/22434 [15:42:57<3:22:14, 2.52s/it] +2025-02-06 01:50:40 - ERROR - stderr - 79%|███████▊ | 17620/22434 [15:43:00<3:25:24, 2.56s/it] +2025-02-06 01:50:40 - ERROR - stderr - +2025-02-06 01:50:40 - ERROR - stderr - +2025-02-06 01:50:40 - INFO - stdout - {'loss': 0.4466, 'grad_norm': 1.7391676902770996, 'learning_rate': 2.3195886575472557e-06, 'epoch': 2.36} +2025-02-06 01:50:40 - ERROR - stderr - 79%|███████▊ | 17620/22434 [15:43:00<3:25:24, 2.56s/it] +2025-02-06 01:50:42 - ERROR - stderr - 79%|███████▊ | 17621/22434 [15:43:02<3:26:33, 2.58s/it] +2025-02-06 01:50:43 - ERROR - stderr - +2025-02-06 01:50:43 - ERROR - stderr - +2025-02-06 01:50:43 - INFO - stdout - {'loss': 0.3816, 'grad_norm': 1.5171104669570923, 'learning_rate': 2.3186641615183615e-06, 'epoch': 2.36} +2025-02-06 01:50:43 - ERROR - stderr - 79%|███████▊ | 17621/22434 [15:43:02<3:26:33, 2.58s/it] +2025-02-06 01:50:45 - ERROR - stderr - 79%|███████▊ | 17622/22434 [15:43:05<3:25:37, 2.56s/it] +2025-02-06 01:50:45 - ERROR - stderr - +2025-02-06 01:50:45 - ERROR - stderr - +2025-02-06 01:50:45 - INFO - stdout - {'loss': 0.377, 'grad_norm': 1.5710216760635376, 'learning_rate': 2.317739825599562e-06, 'epoch': 2.36} +2025-02-06 01:50:45 - ERROR - stderr - 79%|███████▊ | 17622/22434 [15:43:05<3:25:37, 2.56s/it] +2025-02-06 01:50:47 - ERROR - stderr - 79%|███████▊ | 17623/22434 [15:43:07<3:22:54, 2.53s/it] +2025-02-06 01:50:48 - ERROR - stderr - +2025-02-06 01:50:48 - ERROR - stderr - +2025-02-06 01:50:48 - INFO - stdout - {'loss': 0.4088, 'grad_norm': 1.7492246627807617, 
'learning_rate': 2.3168156498101247e-06, 'epoch': 2.36} +2025-02-06 01:50:48 - ERROR - stderr - 79%|███████▊ | 17623/22434 [15:43:07<3:22:54, 2.53s/it] +2025-02-06 01:50:50 - ERROR - stderr - 79%|███████▊ | 17624/22434 [15:43:10<3:23:22, 2.54s/it] +2025-02-06 01:50:50 - ERROR - stderr - +2025-02-06 01:50:50 - ERROR - stderr - +2025-02-06 01:50:50 - INFO - stdout - {'loss': 0.4073, 'grad_norm': 1.6864070892333984, 'learning_rate': 2.3158916341693126e-06, 'epoch': 2.36} +2025-02-06 01:50:50 - ERROR - stderr - 79%|███████▊ | 17624/22434 [15:43:10<3:23:22, 2.54s/it] +2025-02-06 01:50:53 - ERROR - stderr - 79%|███████▊ | 17625/22434 [15:43:12<3:23:42, 2.54s/it] +2025-02-06 01:50:53 - ERROR - stderr - +2025-02-06 01:50:53 - ERROR - stderr - +2025-02-06 01:50:53 - INFO - stdout - {'loss': 0.3674, 'grad_norm': 1.572609782218933, 'learning_rate': 2.3149677786963874e-06, 'epoch': 2.36} +2025-02-06 01:50:53 - ERROR - stderr - 79%|███████▊ | 17625/22434 [15:43:12<3:23:42, 2.54s/it] +2025-02-06 01:50:55 - ERROR - stderr - 79%|███████▊ | 17626/22434 [15:43:15<3:22:20, 2.52s/it] +2025-02-06 01:50:55 - ERROR - stderr - +2025-02-06 01:50:55 - ERROR - stderr - +2025-02-06 01:50:55 - INFO - stdout - {'loss': 0.3393, 'grad_norm': 1.4225693941116333, 'learning_rate': 2.314044083410605e-06, 'epoch': 2.36} +2025-02-06 01:50:55 - ERROR - stderr - 79%|███████▊ | 17626/22434 [15:43:15<3:22:20, 2.52s/it] +2025-02-06 01:50:58 - ERROR - stderr - 79%|███████▊ | 17627/22434 [15:43:17<3:22:33, 2.53s/it] +2025-02-06 01:50:58 - ERROR - stderr - +2025-02-06 01:50:58 - ERROR - stderr - +2025-02-06 01:50:58 - INFO - stdout - {'loss': 0.39, 'grad_norm': 1.512403130531311, 'learning_rate': 2.313120548331218e-06, 'epoch': 2.36} +2025-02-06 01:50:58 - ERROR - stderr - 79%|███████▊ | 17627/22434 [15:43:17<3:22:33, 2.53s/it] +2025-02-06 01:51:00 - ERROR - stderr - 79%|███████▊ | 17628/22434 [15:43:20<3:21:22, 2.51s/it] +2025-02-06 01:51:00 - ERROR - stderr - +2025-02-06 01:51:00 - ERROR - stderr - 
+2025-02-06 01:51:00 - INFO - stdout - {'loss': 0.3984, 'grad_norm': 1.6987284421920776, 'learning_rate': 2.3121971734774783e-06, 'epoch': 2.36} +2025-02-06 01:51:00 - ERROR - stderr - 79%|███████▊ | 17628/22434 [15:43:20<3:21:22, 2.51s/it] +2025-02-06 01:51:03 - ERROR - stderr - 79%|███████▊ | 17629/22434 [15:43:22<3:22:02, 2.52s/it] +2025-02-06 01:51:03 - ERROR - stderr - +2025-02-06 01:51:03 - ERROR - stderr - +2025-02-06 01:51:03 - INFO - stdout - {'loss': 0.3244, 'grad_norm': 1.4325984716415405, 'learning_rate': 2.3112739588686327e-06, 'epoch': 2.36} +2025-02-06 01:51:03 - ERROR - stderr - 79%|███████▊ | 17629/22434 [15:43:22<3:22:02, 2.52s/it] +2025-02-06 01:51:05 - ERROR - stderr - 79%|███████▊ | 17630/22434 [15:43:25<3:21:24, 2.52s/it] +2025-02-06 01:51:05 - ERROR - stderr - +2025-02-06 01:51:05 - ERROR - stderr - +2025-02-06 01:51:05 - INFO - stdout - {'loss': 0.3391, 'grad_norm': 1.5761210918426514, 'learning_rate': 2.310350904523926e-06, 'epoch': 2.36} +2025-02-06 01:51:05 - ERROR - stderr - 79%|███████▊ | 17630/22434 [15:43:25<3:21:24, 2.52s/it] +2025-02-06 01:51:08 - ERROR - stderr - 79%|███████▊ | 17631/22434 [15:43:27<3:22:13, 2.53s/it] +2025-02-06 01:51:08 - ERROR - stderr - +2025-02-06 01:51:08 - ERROR - stderr - +2025-02-06 01:51:08 - INFO - stdout - {'loss': 0.376, 'grad_norm': 1.4827806949615479, 'learning_rate': 2.309428010462591e-06, 'epoch': 2.36} +2025-02-06 01:51:08 - ERROR - stderr - 79%|███████▊ | 17631/22434 [15:43:27<3:22:13, 2.53s/it] +2025-02-06 01:51:10 - ERROR - stderr - 79%|███████▊ | 17632/22434 [15:43:30<3:25:53, 2.57s/it] +2025-02-06 01:51:10 - ERROR - stderr - +2025-02-06 01:51:10 - ERROR - stderr - +2025-02-06 01:51:10 - INFO - stdout - {'loss': 0.4202, 'grad_norm': 1.7346501350402832, 'learning_rate': 2.308505276703874e-06, 'epoch': 2.36} +2025-02-06 01:51:10 - ERROR - stderr - 79%|███████▊ | 17632/22434 [15:43:30<3:25:53, 2.57s/it] +2025-02-06 01:51:13 - ERROR - stderr - 79%|███████▊ | 17633/22434 [15:43:33<3:25:57, 
2.57s/it] +2025-02-06 01:51:13 - ERROR - stderr - +2025-02-06 01:51:13 - ERROR - stderr - +2025-02-06 01:51:13 - INFO - stdout - {'loss': 0.3802, 'grad_norm': 1.4199753999710083, 'learning_rate': 2.3075827032670028e-06, 'epoch': 2.36} +2025-02-06 01:51:13 - ERROR - stderr - 79%|███████▊ | 17633/22434 [15:43:33<3:25:57, 2.57s/it] +2025-02-06 01:51:15 - ERROR - stderr - 79%|███████▊ | 17634/22434 [15:43:35<3:23:29, 2.54s/it] +2025-02-06 01:51:15 - ERROR - stderr - +2025-02-06 01:51:15 - ERROR - stderr - +2025-02-06 01:51:15 - INFO - stdout - {'loss': 0.4244, 'grad_norm': 1.7224870920181274, 'learning_rate': 2.306660290171211e-06, 'epoch': 2.36} +2025-02-06 01:51:15 - ERROR - stderr - 79%|███████▊ | 17634/22434 [15:43:35<3:23:29, 2.54s/it] +2025-02-06 01:51:18 - ERROR - stderr - 79%|███████▊ | 17635/22434 [15:43:38<3:21:26, 2.52s/it] +2025-02-06 01:51:18 - ERROR - stderr - +2025-02-06 01:51:18 - ERROR - stderr - +2025-02-06 01:51:18 - INFO - stdout - {'loss': 0.3619, 'grad_norm': 1.4850341081619263, 'learning_rate': 2.305738037435725e-06, 'epoch': 2.36} +2025-02-06 01:51:18 - ERROR - stderr - 79%|███████▊ | 17635/22434 [15:43:38<3:21:26, 2.52s/it] +2025-02-06 01:51:20 - ERROR - stderr - 79%|███████▊ | 17636/22434 [15:43:40<3:22:21, 2.53s/it] +2025-02-06 01:51:20 - ERROR - stderr - +2025-02-06 01:51:20 - ERROR - stderr - +2025-02-06 01:51:20 - INFO - stdout - {'loss': 0.4409, 'grad_norm': 1.5659384727478027, 'learning_rate': 2.3048159450797626e-06, 'epoch': 2.36} +2025-02-06 01:51:20 - ERROR - stderr - 79%|███████▊ | 17636/22434 [15:43:40<3:22:21, 2.53s/it] +2025-02-06 01:51:23 - ERROR - stderr - 79%|███████▊ | 17637/22434 [15:43:43<3:21:34, 2.52s/it] +2025-02-06 01:51:23 - ERROR - stderr - +2025-02-06 01:51:23 - ERROR - stderr - +2025-02-06 01:51:23 - INFO - stdout - {'loss': 0.3762, 'grad_norm': 1.4847172498703003, 'learning_rate': 2.303894013122553e-06, 'epoch': 2.36} +2025-02-06 01:51:23 - ERROR - stderr - 79%|███████▊ | 17637/22434 [15:43:43<3:21:34, 2.52s/it] 
+2025-02-06 01:51:25 - ERROR - stderr - 79%|███████▊ | 17638/22434 [15:43:45<3:21:18, 2.52s/it] +2025-02-06 01:51:25 - ERROR - stderr - +2025-02-06 01:51:25 - ERROR - stderr - +2025-02-06 01:51:25 - INFO - stdout - {'loss': 0.3736, 'grad_norm': 1.4024910926818848, 'learning_rate': 2.3029722415833057e-06, 'epoch': 2.36} +2025-02-06 01:51:25 - ERROR - stderr - 79%|███████▊ | 17638/22434 [15:43:45<3:21:18, 2.52s/it] +2025-02-06 01:51:28 - ERROR - stderr - 79%|███████▊ | 17639/22434 [15:43:48<3:21:07, 2.52s/it] +2025-02-06 01:51:28 - ERROR - stderr - +2025-02-06 01:51:28 - ERROR - stderr - +2025-02-06 01:51:28 - INFO - stdout - {'loss': 0.373, 'grad_norm': 1.5001801252365112, 'learning_rate': 2.3020506304812373e-06, 'epoch': 2.36} +2025-02-06 01:51:28 - ERROR - stderr - 79%|███████▊ | 17639/22434 [15:43:48<3:21:07, 2.52s/it] +2025-02-06 01:51:30 - ERROR - stderr - 79%|███████▊ | 17640/22434 [15:43:50<3:20:56, 2.51s/it] +2025-02-06 01:51:31 - ERROR - stderr - +2025-02-06 01:51:31 - ERROR - stderr - +2025-02-06 01:51:31 - INFO - stdout - {'loss': 0.3589, 'grad_norm': 1.439276099205017, 'learning_rate': 2.3011291798355573e-06, 'epoch': 2.36} +2025-02-06 01:51:31 - ERROR - stderr - 79%|███████▊ | 17640/22434 [15:43:50<3:20:56, 2.51s/it] +2025-02-06 01:51:33 - ERROR - stderr - 79%|███████▊ | 17641/22434 [15:43:53<3:21:30, 2.52s/it] +2025-02-06 01:51:33 - ERROR - stderr - +2025-02-06 01:51:33 - ERROR - stderr - +2025-02-06 01:51:33 - INFO - stdout - {'loss': 0.3454, 'grad_norm': 1.6095563173294067, 'learning_rate': 2.300207889665469e-06, 'epoch': 2.36} +2025-02-06 01:51:33 - ERROR - stderr - 79%|███████▊ | 17641/22434 [15:43:53<3:21:30, 2.52s/it] +2025-02-06 01:51:35 - ERROR - stderr - 79%|███████▊ | 17642/22434 [15:43:55<3:20:34, 2.51s/it] +2025-02-06 01:51:36 - ERROR - stderr - +2025-02-06 01:51:36 - ERROR - stderr - +2025-02-06 01:51:36 - INFO - stdout - {'loss': 0.3547, 'grad_norm': 1.5512473583221436, 'learning_rate': 2.299286759990186e-06, 'epoch': 2.36} +2025-02-06 
01:51:36 - ERROR - stderr - 79%|███████▊ | 17642/22434 [15:43:55<3:20:34, 2.51s/it] +2025-02-06 01:51:38 - ERROR - stderr - 79%|███████▊ | 17643/22434 [15:43:58<3:20:34, 2.51s/it] +2025-02-06 01:51:38 - ERROR - stderr - +2025-02-06 01:51:38 - ERROR - stderr - +2025-02-06 01:51:38 - INFO - stdout - {'loss': 0.3568, 'grad_norm': 1.5021485090255737, 'learning_rate': 2.298365790828898e-06, 'epoch': 2.36} +2025-02-06 01:51:38 - ERROR - stderr - 79%|███████▊ | 17643/22434 [15:43:58<3:20:34, 2.51s/it] +2025-02-06 01:51:40 - ERROR - stderr - 79%|███████▊ | 17644/22434 [15:44:00<3:18:28, 2.49s/it] +2025-02-06 01:51:40 - ERROR - stderr - +2025-02-06 01:51:40 - ERROR - stderr - +2025-02-06 01:51:40 - INFO - stdout - {'loss': 0.3301, 'grad_norm': 1.490556001663208, 'learning_rate': 2.2974449822008062e-06, 'epoch': 2.36} +2025-02-06 01:51:40 - ERROR - stderr - 79%|███████▊ | 17644/22434 [15:44:00<3:18:28, 2.49s/it] +2025-02-06 01:51:43 - ERROR - stderr - 79%|███████▊ | 17645/22434 [15:44:03<3:17:52, 2.48s/it] +2025-02-06 01:51:43 - ERROR - stderr - +2025-02-06 01:51:43 - ERROR - stderr - +2025-02-06 01:51:43 - INFO - stdout - {'loss': 0.3594, 'grad_norm': 1.4124581813812256, 'learning_rate': 2.296524334125102e-06, 'epoch': 2.36} +2025-02-06 01:51:43 - ERROR - stderr - 79%|███████▊ | 17645/22434 [15:44:03<3:17:52, 2.48s/it] +2025-02-06 01:51:45 - ERROR - stderr - 79%|███████▊ | 17646/22434 [15:44:05<3:17:36, 2.48s/it] +2025-02-06 01:51:45 - ERROR - stderr - +2025-02-06 01:51:45 - ERROR - stderr - +2025-02-06 01:51:45 - INFO - stdout - {'loss': 0.3569, 'grad_norm': 1.4436336755752563, 'learning_rate': 2.2956038466209775e-06, 'epoch': 2.36} +2025-02-06 01:51:45 - ERROR - stderr - 79%|███████▊ | 17646/22434 [15:44:05<3:17:36, 2.48s/it] +2025-02-06 01:51:48 - ERROR - stderr - 79%|███████▊ | 17647/22434 [15:44:08<3:20:19, 2.51s/it] +2025-02-06 01:51:48 - ERROR - stderr - +2025-02-06 01:51:48 - ERROR - stderr - +2025-02-06 01:51:48 - INFO - stdout - {'loss': 0.3803, 'grad_norm': 
1.6325069665908813, 'learning_rate': 2.294683519707619e-06, 'epoch': 2.36} +2025-02-06 01:51:48 - ERROR - stderr - 79%|███████▊ | 17647/22434 [15:44:08<3:20:19, 2.51s/it] +2025-02-06 01:51:50 - ERROR - stderr - 79%|███████▊ | 17648/22434 [15:44:10<3:18:33, 2.49s/it] +2025-02-06 01:51:50 - ERROR - stderr - +2025-02-06 01:51:50 - ERROR - stderr - +2025-02-06 01:51:50 - INFO - stdout - {'loss': 0.3505, 'grad_norm': 1.4365670680999756, 'learning_rate': 2.2937633534042083e-06, 'epoch': 2.36} +2025-02-06 01:51:50 - ERROR - stderr - 79%|███████▊ | 17648/22434 [15:44:10<3:18:33, 2.49s/it] +2025-02-06 01:51:53 - ERROR - stderr - 79%|███████▊ | 17649/22434 [15:44:13<3:22:16, 2.54s/it] +2025-02-06 01:51:53 - ERROR - stderr - +2025-02-06 01:51:53 - ERROR - stderr - +2025-02-06 01:51:53 - INFO - stdout - {'loss': 0.3847, 'grad_norm': 1.5861457586288452, 'learning_rate': 2.2928433477299274e-06, 'epoch': 2.36} +2025-02-06 01:51:53 - ERROR - stderr - 79%|███████▊ | 17649/22434 [15:44:13<3:22:16, 2.54s/it] +2025-02-06 01:51:56 - ERROR - stderr - 79%|███████▊ | 17650/22434 [15:44:15<3:21:16, 2.52s/it] +2025-02-06 01:51:56 - ERROR - stderr - +2025-02-06 01:51:56 - ERROR - stderr - +2025-02-06 01:51:56 - INFO - stdout - {'loss': 0.3663, 'grad_norm': 1.5435835123062134, 'learning_rate': 2.2919235027039512e-06, 'epoch': 2.36} +2025-02-06 01:51:56 - ERROR - stderr - 79%|███████▊ | 17650/22434 [15:44:15<3:21:16, 2.52s/it] +2025-02-06 01:51:58 - ERROR - stderr - 79%|███████▊ | 17651/22434 [15:44:18<3:19:19, 2.50s/it] +2025-02-06 01:51:58 - ERROR - stderr - +2025-02-06 01:51:58 - ERROR - stderr - +2025-02-06 01:51:58 - INFO - stdout - {'loss': 0.3047, 'grad_norm': 1.3797398805618286, 'learning_rate': 2.291003818345454e-06, 'epoch': 2.36} +2025-02-06 01:51:58 - ERROR - stderr - 79%|███████▊ | 17651/22434 [15:44:18<3:19:19, 2.50s/it] +2025-02-06 01:52:00 - ERROR - stderr - 79%|███████▊ | 17652/22434 [15:44:20<3:19:33, 2.50s/it] +2025-02-06 01:52:01 - ERROR - stderr - +2025-02-06 01:52:01 - 
ERROR - stderr - +2025-02-06 01:52:01 - INFO - stdout - {'loss': 0.3538, 'grad_norm': 1.4556787014007568, 'learning_rate': 2.290084294673606e-06, 'epoch': 2.36} +2025-02-06 01:52:01 - ERROR - stderr - 79%|███████▊ | 17652/22434 [15:44:20<3:19:33, 2.50s/it] +2025-02-06 01:52:03 - ERROR - stderr - 79%|███████▊ | 17653/22434 [15:44:23<3:20:11, 2.51s/it] +2025-02-06 01:52:03 - ERROR - stderr - +2025-02-06 01:52:03 - ERROR - stderr - +2025-02-06 01:52:03 - INFO - stdout - {'loss': 0.3925, 'grad_norm': 1.5313482284545898, 'learning_rate': 2.2891649317075728e-06, 'epoch': 2.36} +2025-02-06 01:52:03 - ERROR - stderr - 79%|███████▊ | 17653/22434 [15:44:23<3:20:11, 2.51s/it] +2025-02-06 01:52:06 - ERROR - stderr - 79%|███████▊ | 17654/22434 [15:44:25<3:22:56, 2.55s/it] +2025-02-06 01:52:06 - ERROR - stderr - +2025-02-06 01:52:06 - ERROR - stderr - +2025-02-06 01:52:06 - INFO - stdout - {'loss': 0.4318, 'grad_norm': 1.8698794841766357, 'learning_rate': 2.2882457294665205e-06, 'epoch': 2.36} +2025-02-06 01:52:06 - ERROR - stderr - 79%|███████▊ | 17654/22434 [15:44:25<3:22:56, 2.55s/it] +2025-02-06 01:52:08 - ERROR - stderr - 79%|███████▊ | 17655/22434 [15:44:28<3:21:24, 2.53s/it] +2025-02-06 01:52:08 - ERROR - stderr - +2025-02-06 01:52:08 - ERROR - stderr - +2025-02-06 01:52:08 - INFO - stdout - {'loss': 0.3767, 'grad_norm': 1.4722373485565186, 'learning_rate': 2.287326687969601e-06, 'epoch': 2.36} +2025-02-06 01:52:08 - ERROR - stderr - 79%|███████▊ | 17655/22434 [15:44:28<3:21:24, 2.53s/it] +2025-02-06 01:52:11 - ERROR - stderr - 79%|███████▊ | 17656/22434 [15:44:31<3:23:04, 2.55s/it] +2025-02-06 01:52:11 - ERROR - stderr - +2025-02-06 01:52:11 - ERROR - stderr - +2025-02-06 01:52:11 - INFO - stdout - {'loss': 0.3639, 'grad_norm': 1.4357688426971436, 'learning_rate': 2.286407807235983e-06, 'epoch': 2.36} +2025-02-06 01:52:11 - ERROR - stderr - 79%|███████▊ | 17656/22434 [15:44:31<3:23:04, 2.55s/it] +2025-02-06 01:52:13 - ERROR - stderr - 79%|███████▊ | 17657/22434 
[15:44:33<3:23:10, 2.55s/it] +2025-02-06 01:52:13 - ERROR - stderr - +2025-02-06 01:52:13 - ERROR - stderr - +2025-02-06 01:52:13 - INFO - stdout - {'loss': 0.3329, 'grad_norm': 1.4033628702163696, 'learning_rate': 2.2854890872848067e-06, 'epoch': 2.36} +2025-02-06 01:52:13 - ERROR - stderr - 79%|███████▊ | 17657/22434 [15:44:33<3:23:10, 2.55s/it] +2025-02-06 01:52:16 - ERROR - stderr - 79%|███████▊ | 17658/22434 [15:44:36<3:23:11, 2.55s/it] +2025-02-06 01:52:16 - ERROR - stderr - +2025-02-06 01:52:16 - ERROR - stderr - +2025-02-06 01:52:16 - INFO - stdout - {'loss': 0.3183, 'grad_norm': 1.4588919878005981, 'learning_rate': 2.2845705281352317e-06, 'epoch': 2.36} +2025-02-06 01:52:16 - ERROR - stderr - 79%|███████▊ | 17658/22434 [15:44:36<3:23:11, 2.55s/it] +2025-02-06 01:52:18 - ERROR - stderr - 79%|███████▊ | 17659/22434 [15:44:38<3:21:40, 2.53s/it] +2025-02-06 01:52:18 - ERROR - stderr - +2025-02-06 01:52:18 - ERROR - stderr - +2025-02-06 01:52:18 - INFO - stdout - {'loss': 0.3981, 'grad_norm': 1.474920392036438, 'learning_rate': 2.283652129806404e-06, 'epoch': 2.36} +2025-02-06 01:52:18 - ERROR - stderr - 79%|███████▊ | 17659/22434 [15:44:38<3:21:40, 2.53s/it] +2025-02-06 01:52:21 - ERROR - stderr - 79%|███████▊ | 17660/22434 [15:44:41<3:21:38, 2.53s/it] +2025-02-06 01:52:21 - ERROR - stderr - +2025-02-06 01:52:21 - ERROR - stderr - +2025-02-06 01:52:21 - INFO - stdout - {'loss': 0.3746, 'grad_norm': 1.6830304861068726, 'learning_rate': 2.282733892317458e-06, 'epoch': 2.36} +2025-02-06 01:52:21 - ERROR - stderr - 79%|███████▊ | 17660/22434 [15:44:41<3:21:38, 2.53s/it] +2025-02-06 01:52:23 - ERROR - stderr - 79%|███████▊ | 17661/22434 [15:44:43<3:20:52, 2.53s/it] +2025-02-06 01:52:23 - ERROR - stderr - +2025-02-06 01:52:23 - ERROR - stderr - +2025-02-06 01:52:23 - INFO - stdout - {'loss': 0.3548, 'grad_norm': 1.6151689291000366, 'learning_rate': 2.281815815687545e-06, 'epoch': 2.36} +2025-02-06 01:52:23 - ERROR - stderr - 79%|███████▊ | 17661/22434 
[15:44:43<3:20:52, 2.53s/it] +2025-02-06 01:52:26 - ERROR - stderr - 79%|███████▊ | 17662/22434 [15:44:46<3:20:12, 2.52s/it] +2025-02-06 01:52:26 - ERROR - stderr - +2025-02-06 01:52:26 - ERROR - stderr - +2025-02-06 01:52:26 - INFO - stdout - {'loss': 0.4145, 'grad_norm': 1.464469075202942, 'learning_rate': 2.2808978999357933e-06, 'epoch': 2.36} +2025-02-06 01:52:26 - ERROR - stderr - 79%|███████▊ | 17662/22434 [15:44:46<3:20:12, 2.52s/it] +2025-02-06 01:52:28 - ERROR - stderr - 79%|███████▊ | 17663/22434 [15:44:48<3:20:04, 2.52s/it] +2025-02-06 01:52:28 - ERROR - stderr - +2025-02-06 01:52:28 - ERROR - stderr - +2025-02-06 01:52:28 - INFO - stdout - {'loss': 0.3747, 'grad_norm': 1.539075255393982, 'learning_rate': 2.2799801450813385e-06, 'epoch': 2.36} +2025-02-06 01:52:28 - ERROR - stderr - 79%|███████▊ | 17663/22434 [15:44:48<3:20:04, 2.52s/it] +2025-02-06 01:52:31 - ERROR - stderr - 79%|███████▊ | 17664/22434 [15:44:51<3:22:14, 2.54s/it] +2025-02-06 01:52:31 - ERROR - stderr - +2025-02-06 01:52:31 - ERROR - stderr - +2025-02-06 01:52:31 - INFO - stdout - {'loss': 0.324, 'grad_norm': 1.3395589590072632, 'learning_rate': 2.2790625511433096e-06, 'epoch': 2.36} +2025-02-06 01:52:31 - ERROR - stderr - 79%|███████▊ | 17664/22434 [15:44:51<3:22:14, 2.54s/it] +2025-02-06 01:52:33 - ERROR - stderr - 79%|███████▊ | 17665/22434 [15:44:53<3:19:56, 2.52s/it] +2025-02-06 01:52:33 - ERROR - stderr - +2025-02-06 01:52:33 - ERROR - stderr - +2025-02-06 01:52:33 - INFO - stdout - {'loss': 0.3776, 'grad_norm': 1.4639531373977661, 'learning_rate': 2.2781451181408343e-06, 'epoch': 2.36} +2025-02-06 01:52:33 - ERROR - stderr - 79%|███████▊ | 17665/22434 [15:44:53<3:19:56, 2.52s/it] +2025-02-06 01:52:36 - ERROR - stderr - 79%|███████▊ | 17666/22434 [15:44:56<3:31:11, 2.66s/it] +2025-02-06 01:52:36 - ERROR - stderr - +2025-02-06 01:52:36 - ERROR - stderr - +2025-02-06 01:52:36 - INFO - stdout - {'loss': 0.3615, 'grad_norm': 1.4742915630340576, 'learning_rate': 2.277227846093035e-06, 
'epoch': 2.36} +2025-02-06 01:52:36 - ERROR - stderr - 79%|███████▊ | 17666/22434 [15:44:56<3:31:11, 2.66s/it] +2025-02-06 01:52:39 - ERROR - stderr - 79%|███████▉ | 17667/22434 [15:44:59<3:26:45, 2.60s/it] +2025-02-06 01:52:39 - ERROR - stderr - +2025-02-06 01:52:39 - ERROR - stderr - +2025-02-06 01:52:39 - INFO - stdout - {'loss': 0.4345, 'grad_norm': 1.5962618589401245, 'learning_rate': 2.2763107350190318e-06, 'epoch': 2.36} +2025-02-06 01:52:39 - ERROR - stderr - 79%|███████▉ | 17667/22434 [15:44:59<3:26:45, 2.60s/it] +2025-02-06 01:52:41 - ERROR - stderr - 79%|███████▉ | 17668/22434 [15:45:01<3:24:30, 2.57s/it] +2025-02-06 01:52:41 - ERROR - stderr - +2025-02-06 01:52:41 - ERROR - stderr - +2025-02-06 01:52:41 - INFO - stdout - {'loss': 0.3632, 'grad_norm': 1.6713985204696655, 'learning_rate': 2.2753937849379392e-06, 'epoch': 2.36} +2025-02-06 01:52:41 - ERROR - stderr - 79%|███████▉ | 17668/22434 [15:45:01<3:24:30, 2.57s/it] +2025-02-06 01:52:44 - ERROR - stderr - 79%|███████▉ | 17669/22434 [15:45:04<3:20:51, 2.53s/it] +2025-02-06 01:52:44 - ERROR - stderr - +2025-02-06 01:52:44 - ERROR - stderr - +2025-02-06 01:52:44 - INFO - stdout - {'loss': 0.3419, 'grad_norm': 1.3686665296554565, 'learning_rate': 2.274476995868873e-06, 'epoch': 2.36} +2025-02-06 01:52:44 - ERROR - stderr - 79%|███████▉ | 17669/22434 [15:45:04<3:20:51, 2.53s/it] +2025-02-06 01:52:46 - ERROR - stderr - 79%|███████▉ | 17670/22434 [15:45:06<3:21:52, 2.54s/it] +2025-02-06 01:52:46 - ERROR - stderr - +2025-02-06 01:52:46 - ERROR - stderr - +2025-02-06 01:52:46 - INFO - stdout - {'loss': 0.3688, 'grad_norm': 1.44289231300354, 'learning_rate': 2.2735603678309402e-06, 'epoch': 2.36} +2025-02-06 01:52:46 - ERROR - stderr - 79%|███████▉ | 17670/22434 [15:45:06<3:21:52, 2.54s/it] +2025-02-06 01:52:49 - ERROR - stderr - 79%|███████▉ | 17671/22434 [15:45:09<3:20:33, 2.53s/it] +2025-02-06 01:52:49 - ERROR - stderr - +2025-02-06 01:52:49 - ERROR - stderr - +2025-02-06 01:52:49 - INFO - stdout - {'loss': 
0.4065, 'grad_norm': 1.7823231220245361, 'learning_rate': 2.272643900843249e-06, 'epoch': 2.36} +2025-02-06 01:52:49 - ERROR - stderr - 79%|███████▉ | 17671/22434 [15:45:09<3:20:33, 2.53s/it] +2025-02-06 01:52:51 - ERROR - stderr - 79%|███████▉ | 17672/22434 [15:45:11<3:18:03, 2.50s/it] +2025-02-06 01:52:51 - ERROR - stderr - +2025-02-06 01:52:51 - ERROR - stderr - +2025-02-06 01:52:51 - INFO - stdout - {'loss': 0.3096, 'grad_norm': 1.4176145792007446, 'learning_rate': 2.271727594924901e-06, 'epoch': 2.36} +2025-02-06 01:52:51 - ERROR - stderr - 79%|███████▉ | 17672/22434 [15:45:11<3:18:03, 2.50s/it] +2025-02-06 01:52:54 - ERROR - stderr - 79%|███████▉ | 17673/22434 [15:45:14<3:21:07, 2.53s/it] +2025-02-06 01:52:54 - ERROR - stderr - +2025-02-06 01:52:54 - ERROR - stderr - +2025-02-06 01:52:54 - INFO - stdout - {'loss': 0.3906, 'grad_norm': 1.5126402378082275, 'learning_rate': 2.270811450094996e-06, 'epoch': 2.36} +2025-02-06 01:52:54 - ERROR - stderr - 79%|███████▉ | 17673/22434 [15:45:14<3:21:07, 2.53s/it] +2025-02-06 01:52:57 - ERROR - stderr - 79%|███████▉ | 17674/22434 [15:45:16<3:21:40, 2.54s/it] +2025-02-06 01:52:57 - ERROR - stderr - +2025-02-06 01:52:57 - ERROR - stderr - +2025-02-06 01:52:57 - INFO - stdout - {'loss': 0.3594, 'grad_norm': 1.5010432004928589, 'learning_rate': 2.26989546637263e-06, 'epoch': 2.36} +2025-02-06 01:52:57 - ERROR - stderr - 79%|███████▉ | 17674/22434 [15:45:16<3:21:40, 2.54s/it] +2025-02-06 01:52:59 - ERROR - stderr - 79%|███████▉ | 17675/22434 [15:45:19<3:20:04, 2.52s/it] +2025-02-06 01:52:59 - ERROR - stderr - +2025-02-06 01:52:59 - ERROR - stderr - +2025-02-06 01:52:59 - INFO - stdout - {'loss': 0.3566, 'grad_norm': 1.445062518119812, 'learning_rate': 2.2689796437768996e-06, 'epoch': 2.36} +2025-02-06 01:52:59 - ERROR - stderr - 79%|███████▉ | 17675/22434 [15:45:19<3:20:04, 2.52s/it] +2025-02-06 01:53:02 - ERROR - stderr - 79%|███████▉ | 17676/22434 [15:45:21<3:23:12, 2.56s/it] +2025-02-06 01:53:02 - ERROR - stderr - 
+2025-02-06 01:53:02 - ERROR - stderr - +2025-02-06 01:53:02 - INFO - stdout - {'loss': 0.3954, 'grad_norm': 1.6571155786514282, 'learning_rate': 2.2680639823268848e-06, 'epoch': 2.36} +2025-02-06 01:53:02 - ERROR - stderr - 79%|███████▉ | 17676/22434 [15:45:21<3:23:12, 2.56s/it] +2025-02-06 01:53:04 - ERROR - stderr - 79%|███████▉ | 17677/22434 [15:45:24<3:22:02, 2.55s/it] +2025-02-06 01:53:04 - ERROR - stderr - +2025-02-06 01:53:04 - ERROR - stderr - +2025-02-06 01:53:04 - INFO - stdout - {'loss': 0.3898, 'grad_norm': 1.512982726097107, 'learning_rate': 2.267148482041681e-06, 'epoch': 2.36} +2025-02-06 01:53:04 - ERROR - stderr - 79%|███████▉ | 17677/22434 [15:45:24<3:22:02, 2.55s/it] +2025-02-06 01:53:07 - ERROR - stderr - 79%|███████▉ | 17678/22434 [15:45:27<3:29:30, 2.64s/it] +2025-02-06 01:53:07 - ERROR - stderr - +2025-02-06 01:53:07 - ERROR - stderr - +2025-02-06 01:53:07 - INFO - stdout - {'loss': 0.3292, 'grad_norm': 1.2868047952651978, 'learning_rate': 2.2662331429403672e-06, 'epoch': 2.36} +2025-02-06 01:53:07 - ERROR - stderr - 79%|███████▉ | 17678/22434 [15:45:27<3:29:30, 2.64s/it] +2025-02-06 01:53:10 - ERROR - stderr - 79%|███████▉ | 17679/22434 [15:45:29<3:25:29, 2.59s/it] +2025-02-06 01:53:10 - ERROR - stderr - +2025-02-06 01:53:10 - ERROR - stderr - +2025-02-06 01:53:10 - INFO - stdout - {'loss': 0.362, 'grad_norm': 1.406975269317627, 'learning_rate': 2.265317965042022e-06, 'epoch': 2.36} +2025-02-06 01:53:10 - ERROR - stderr - 79%|███████▉ | 17679/22434 [15:45:29<3:25:29, 2.59s/it] +2025-02-06 01:53:12 - ERROR - stderr - 79%|███████▉ | 17680/22434 [15:45:32<3:22:38, 2.56s/it] +2025-02-06 01:53:12 - ERROR - stderr - +2025-02-06 01:53:12 - ERROR - stderr - +2025-02-06 01:53:12 - INFO - stdout - {'loss': 0.3681, 'grad_norm': 1.5612964630126953, 'learning_rate': 2.264402948365727e-06, 'epoch': 2.36} +2025-02-06 01:53:12 - ERROR - stderr - 79%|███████▉ | 17680/22434 [15:45:32<3:22:38, 2.56s/it] +2025-02-06 01:53:14 - ERROR - stderr - 79%|███████▉ | 
17681/22434 [15:45:34<3:20:04, 2.53s/it] +2025-02-06 01:53:14 - ERROR - stderr - +2025-02-06 01:53:14 - ERROR - stderr - +2025-02-06 01:53:14 - INFO - stdout - {'loss': 0.393, 'grad_norm': 1.7238435745239258, 'learning_rate': 2.2634880929305436e-06, 'epoch': 2.36} +2025-02-06 01:53:14 - ERROR - stderr - 79%|███████▉ | 17681/22434 [15:45:34<3:20:04, 2.53s/it] +2025-02-06 01:53:17 - ERROR - stderr - 79%|███████▉ | 17682/22434 [15:45:37<3:23:19, 2.57s/it] +2025-02-06 01:53:17 - ERROR - stderr - +2025-02-06 01:53:17 - ERROR - stderr - +2025-02-06 01:53:17 - INFO - stdout - {'loss': 0.436, 'grad_norm': 1.6193193197250366, 'learning_rate': 2.2625733987555542e-06, 'epoch': 2.36} +2025-02-06 01:53:17 - ERROR - stderr - 79%|███████▉ | 17682/22434 [15:45:37<3:23:19, 2.57s/it] +2025-02-06 01:53:20 - ERROR - stderr - 79%|███████▉ | 17683/22434 [15:45:39<3:20:39, 2.53s/it] +2025-02-06 01:53:20 - ERROR - stderr - +2025-02-06 01:53:20 - ERROR - stderr - +2025-02-06 01:53:20 - INFO - stdout - {'loss': 0.4031, 'grad_norm': 1.524370789527893, 'learning_rate': 2.2616588658598147e-06, 'epoch': 2.36} +2025-02-06 01:53:20 - ERROR - stderr - 79%|███████▉ | 17683/22434 [15:45:39<3:20:39, 2.53s/it] +2025-02-06 01:53:22 - ERROR - stderr - 79%|███████▉ | 17684/22434 [15:45:42<3:20:21, 2.53s/it] +2025-02-06 01:53:22 - ERROR - stderr - +2025-02-06 01:53:22 - ERROR - stderr - +2025-02-06 01:53:22 - INFO - stdout - {'loss': 0.4038, 'grad_norm': 1.5803323984146118, 'learning_rate': 2.2607444942623922e-06, 'epoch': 2.36} +2025-02-06 01:53:22 - ERROR - stderr - 79%|███████▉ | 17684/22434 [15:45:42<3:20:21, 2.53s/it] +2025-02-06 01:53:24 - ERROR - stderr - 79%|███████▉ | 17685/22434 [15:45:44<3:17:41, 2.50s/it] +2025-02-06 01:53:25 - ERROR - stderr - +2025-02-06 01:53:25 - ERROR - stderr - +2025-02-06 01:53:25 - INFO - stdout - {'loss': 0.4262, 'grad_norm': 1.7810205221176147, 'learning_rate': 2.259830283982345e-06, 'epoch': 2.36} +2025-02-06 01:53:25 - ERROR - stderr - 79%|███████▉ | 17685/22434 
[15:45:44<3:17:41, 2.50s/it] +2025-02-06 01:53:27 - ERROR - stderr - 79%|███████▉ | 17686/22434 [15:45:47<3:19:21, 2.52s/it] +2025-02-06 01:53:27 - ERROR - stderr - +2025-02-06 01:53:27 - ERROR - stderr - +2025-02-06 01:53:27 - INFO - stdout - {'loss': 0.3536, 'grad_norm': 1.472284197807312, 'learning_rate': 2.258916235038726e-06, 'epoch': 2.37} +2025-02-06 01:53:27 - ERROR - stderr - 79%|███████▉ | 17686/22434 [15:45:47<3:19:21, 2.52s/it] +2025-02-06 01:53:30 - ERROR - stderr - 79%|███████▉ | 17687/22434 [15:45:49<3:19:04, 2.52s/it] +2025-02-06 01:53:30 - ERROR - stderr - +2025-02-06 01:53:30 - ERROR - stderr - +2025-02-06 01:53:30 - INFO - stdout - {'loss': 0.3932, 'grad_norm': 1.613823413848877, 'learning_rate': 2.2580023474505965e-06, 'epoch': 2.37} +2025-02-06 01:53:30 - ERROR - stderr - 79%|███████▉ | 17687/22434 [15:45:49<3:19:04, 2.52s/it] +2025-02-06 01:53:32 - ERROR - stderr - 79%|███████▉ | 17688/22434 [15:45:52<3:17:12, 2.49s/it] +2025-02-06 01:53:32 - ERROR - stderr - +2025-02-06 01:53:32 - ERROR - stderr - +2025-02-06 01:53:32 - INFO - stdout - {'loss': 0.4041, 'grad_norm': 1.7042313814163208, 'learning_rate': 2.257088621236997e-06, 'epoch': 2.37} +2025-02-06 01:53:32 - ERROR - stderr - 79%|███████▉ | 17688/22434 [15:45:52<3:17:12, 2.49s/it] +2025-02-06 01:53:35 - ERROR - stderr - 79%|███████▉ | 17689/22434 [15:45:54<3:18:19, 2.51s/it] +2025-02-06 01:53:35 - ERROR - stderr - +2025-02-06 01:53:35 - ERROR - stderr - +2025-02-06 01:53:35 - INFO - stdout - {'loss': 0.3402, 'grad_norm': 1.4048388004302979, 'learning_rate': 2.256175056416976e-06, 'epoch': 2.37} +2025-02-06 01:53:35 - ERROR - stderr - 79%|███████▉ | 17689/22434 [15:45:54<3:18:19, 2.51s/it] +2025-02-06 01:53:37 - ERROR - stderr - 79%|███████▉ | 17690/22434 [15:45:57<3:16:53, 2.49s/it] +2025-02-06 01:53:37 - ERROR - stderr - +2025-02-06 01:53:37 - ERROR - stderr - +2025-02-06 01:53:37 - INFO - stdout - {'loss': 0.3643, 'grad_norm': 1.6079317331314087, 'learning_rate': 2.255261653009575e-06, 
'epoch': 2.37} +2025-02-06 01:53:37 - ERROR - stderr - 79%|███████▉ | 17690/22434 [15:45:57<3:16:53, 2.49s/it] +2025-02-06 01:53:40 - ERROR - stderr - 79%|███████▉ | 17691/22434 [15:45:59<3:17:26, 2.50s/it] +2025-02-06 01:53:40 - ERROR - stderr - +2025-02-06 01:53:40 - ERROR - stderr - +2025-02-06 01:53:40 - INFO - stdout - {'loss': 0.3352, 'grad_norm': 1.4293856620788574, 'learning_rate': 2.2543484110338353e-06, 'epoch': 2.37} +2025-02-06 01:53:40 - ERROR - stderr - 79%|███████▉ | 17691/22434 [15:45:59<3:17:26, 2.50s/it] +2025-02-06 01:53:42 - ERROR - stderr - 79%|███████▉ | 17692/22434 [15:46:02<3:18:05, 2.51s/it] +2025-02-06 01:53:42 - ERROR - stderr - +2025-02-06 01:53:42 - ERROR - stderr - +2025-02-06 01:53:42 - INFO - stdout - {'loss': 0.3691, 'grad_norm': 1.4527764320373535, 'learning_rate': 2.253435330508791e-06, 'epoch': 2.37} +2025-02-06 01:53:42 - ERROR - stderr - 79%|███████▉ | 17692/22434 [15:46:02<3:18:05, 2.51s/it] +2025-02-06 01:53:44 - ERROR - stderr - 79%|███████▉ | 17693/22434 [15:46:04<3:16:41, 2.49s/it] +2025-02-06 01:53:45 - ERROR - stderr - +2025-02-06 01:53:45 - ERROR - stderr - +2025-02-06 01:53:45 - INFO - stdout - {'loss': 0.4022, 'grad_norm': 1.732912540435791, 'learning_rate': 2.252522411453474e-06, 'epoch': 2.37} +2025-02-06 01:53:45 - ERROR - stderr - 79%|███████▉ | 17693/22434 [15:46:04<3:16:41, 2.49s/it] +2025-02-06 01:53:47 - ERROR - stderr - 79%|███████▉ | 17694/22434 [15:46:07<3:18:20, 2.51s/it] +2025-02-06 01:53:47 - ERROR - stderr - +2025-02-06 01:53:47 - ERROR - stderr - +2025-02-06 01:53:47 - INFO - stdout - {'loss': 0.3697, 'grad_norm': 1.4639184474945068, 'learning_rate': 2.2516096538869137e-06, 'epoch': 2.37} +2025-02-06 01:53:47 - ERROR - stderr - 79%|███████▉ | 17694/22434 [15:46:07<3:18:20, 2.51s/it] +2025-02-06 01:53:50 - ERROR - stderr - 79%|███████▉ | 17695/22434 [15:46:09<3:18:38, 2.52s/it] +2025-02-06 01:53:50 - ERROR - stderr - +2025-02-06 01:53:50 - ERROR - stderr - +2025-02-06 01:53:50 - INFO - stdout - {'loss': 
0.3377, 'grad_norm': 1.4273416996002197, 'learning_rate': 2.250697057828135e-06, 'epoch': 2.37} +2025-02-06 01:53:50 - ERROR - stderr - 79%|███████▉ | 17695/22434 [15:46:09<3:18:38, 2.52s/it] +2025-02-06 01:53:52 - ERROR - stderr - 79%|███████▉ | 17696/22434 [15:46:12<3:16:21, 2.49s/it] +2025-02-06 01:53:52 - ERROR - stderr - +2025-02-06 01:53:52 - ERROR - stderr - +2025-02-06 01:53:52 - INFO - stdout - {'loss': 0.3739, 'grad_norm': 1.5685405731201172, 'learning_rate': 2.249784623296163e-06, 'epoch': 2.37} +2025-02-06 01:53:52 - ERROR - stderr - 79%|███████▉ | 17696/22434 [15:46:12<3:16:21, 2.49s/it] +2025-02-06 01:53:55 - ERROR - stderr - 79%|███████▉ | 17697/22434 [15:46:14<3:19:26, 2.53s/it] +2025-02-06 01:53:55 - ERROR - stderr - +2025-02-06 01:53:55 - ERROR - stderr - +2025-02-06 01:53:55 - INFO - stdout - {'loss': 0.4104, 'grad_norm': 1.6673862934112549, 'learning_rate': 2.248872350310013e-06, 'epoch': 2.37} +2025-02-06 01:53:55 - ERROR - stderr - 79%|███████▉ | 17697/22434 [15:46:14<3:19:26, 2.53s/it] +2025-02-06 01:53:57 - ERROR - stderr - 79%|███████▉ | 17698/22434 [15:46:17<3:19:13, 2.52s/it] +2025-02-06 01:53:57 - ERROR - stderr - +2025-02-06 01:53:57 - ERROR - stderr - +2025-02-06 01:53:57 - INFO - stdout - {'loss': 0.3308, 'grad_norm': 1.288824200630188, 'learning_rate': 2.2479602388887013e-06, 'epoch': 2.37} +2025-02-06 01:53:57 - ERROR - stderr - 79%|███████▉ | 17698/22434 [15:46:17<3:19:13, 2.52s/it] +2025-02-06 01:54:00 - ERROR - stderr - 79%|███████▉ | 17699/22434 [15:46:19<3:19:47, 2.53s/it] +2025-02-06 01:54:00 - ERROR - stderr - +2025-02-06 01:54:00 - ERROR - stderr - +2025-02-06 01:54:00 - INFO - stdout - {'loss': 0.3927, 'grad_norm': 1.3809102773666382, 'learning_rate': 2.2470482890512446e-06, 'epoch': 2.37} +2025-02-06 01:54:00 - ERROR - stderr - 79%|███████▉ | 17699/22434 [15:46:20<3:19:47, 2.53s/it] +2025-02-06 01:54:02 - ERROR - stderr - 79%|███████▉ | 17700/22434 [15:46:22<3:19:42, 2.53s/it] +2025-02-06 01:54:02 - ERROR - stderr - 
+2025-02-06 01:54:02 - ERROR - stderr - +2025-02-06 01:54:02 - INFO - stdout - {'loss': 0.3275, 'grad_norm': 1.379055380821228, 'learning_rate': 2.2461365008166412e-06, 'epoch': 2.37} +2025-02-06 01:54:02 - ERROR - stderr - 79%|███████▉ | 17700/22434 [15:46:22<3:19:42, 2.53s/it] +2025-02-06 01:54:05 - ERROR - stderr - 79%|███████▉ | 17701/22434 [15:46:24<3:18:31, 2.52s/it] +2025-02-06 01:54:05 - ERROR - stderr - +2025-02-06 01:54:05 - ERROR - stderr - +2025-02-06 01:54:05 - INFO - stdout - {'loss': 0.3495, 'grad_norm': 1.4392231702804565, 'learning_rate': 2.2452248742039083e-06, 'epoch': 2.37} +2025-02-06 01:54:05 - ERROR - stderr - 79%|███████▉ | 17701/22434 [15:46:25<3:18:31, 2.52s/it] +2025-02-06 01:54:07 - ERROR - stderr - 79%|███████▉ | 17702/22434 [15:46:27<3:16:25, 2.49s/it] +2025-02-06 01:54:07 - ERROR - stderr - +2025-02-06 01:54:07 - ERROR - stderr - +2025-02-06 01:54:07 - INFO - stdout - {'loss': 0.3541, 'grad_norm': 1.4698125123977661, 'learning_rate': 2.244313409232037e-06, 'epoch': 2.37} +2025-02-06 01:54:07 - ERROR - stderr - 79%|███████▉ | 17702/22434 [15:46:27<3:16:25, 2.49s/it] +2025-02-06 01:54:10 - ERROR - stderr - 79%|███████▉ | 17703/22434 [15:46:29<3:18:30, 2.52s/it] +2025-02-06 01:54:10 - ERROR - stderr - +2025-02-06 01:54:10 - ERROR - stderr - +2025-02-06 01:54:10 - INFO - stdout - {'loss': 0.3673, 'grad_norm': 1.4300315380096436, 'learning_rate': 2.2434021059200373e-06, 'epoch': 2.37} +2025-02-06 01:54:10 - ERROR - stderr - 79%|███████▉ | 17703/22434 [15:46:30<3:18:30, 2.52s/it] +2025-02-06 01:54:12 - ERROR - stderr - 79%|███████▉ | 17704/22434 [15:46:32<3:17:52, 2.51s/it] +2025-02-06 01:54:12 - ERROR - stderr - +2025-02-06 01:54:12 - ERROR - stderr - +2025-02-06 01:54:12 - INFO - stdout - {'loss': 0.3693, 'grad_norm': 1.4911444187164307, 'learning_rate': 2.242490964286895e-06, 'epoch': 2.37} +2025-02-06 01:54:12 - ERROR - stderr - 79%|███████▉ | 17704/22434 [15:46:32<3:17:52, 2.51s/it] +2025-02-06 01:54:15 - ERROR - stderr - 79%|███████▉ 
| 17705/22434 [15:46:34<3:16:18, 2.49s/it] +2025-02-06 01:54:15 - ERROR - stderr - +2025-02-06 01:54:15 - ERROR - stderr - +2025-02-06 01:54:15 - INFO - stdout - {'loss': 0.3236, 'grad_norm': 1.3211544752120972, 'learning_rate': 2.241579984351603e-06, 'epoch': 2.37} +2025-02-06 01:54:15 - ERROR - stderr - 79%|███████▉ | 17705/22434 [15:46:34<3:16:18, 2.49s/it] +2025-02-06 01:54:17 - ERROR - stderr - 79%|███████▉ | 17706/22434 [15:46:37<3:15:55, 2.49s/it] +2025-02-06 01:54:17 - ERROR - stderr - +2025-02-06 01:54:17 - ERROR - stderr - +2025-02-06 01:54:17 - INFO - stdout - {'loss': 0.4003, 'grad_norm': 1.5822662115097046, 'learning_rate': 2.240669166133158e-06, 'epoch': 2.37} +2025-02-06 01:54:17 - ERROR - stderr - 79%|███████▉ | 17706/22434 [15:46:37<3:15:55, 2.49s/it] +2025-02-06 01:54:20 - ERROR - stderr - 79%|███████▉ | 17707/22434 [15:46:39<3:15:56, 2.49s/it] +2025-02-06 01:54:20 - ERROR - stderr - +2025-02-06 01:54:20 - ERROR - stderr - +2025-02-06 01:54:20 - INFO - stdout - {'loss': 0.3566, 'grad_norm': 1.4565086364746094, 'learning_rate': 2.239758509650536e-06, 'epoch': 2.37} +2025-02-06 01:54:20 - ERROR - stderr - 79%|███████▉ | 17707/22434 [15:46:39<3:15:56, 2.49s/it] +2025-02-06 01:54:22 - ERROR - stderr - 79%|███████▉ | 17708/22434 [15:46:42<3:19:27, 2.53s/it] +2025-02-06 01:54:22 - ERROR - stderr - +2025-02-06 01:54:22 - ERROR - stderr - +2025-02-06 01:54:22 - INFO - stdout - {'loss': 0.3291, 'grad_norm': 1.5679067373275757, 'learning_rate': 2.2388480149227233e-06, 'epoch': 2.37} +2025-02-06 01:54:22 - ERROR - stderr - 79%|███████▉ | 17708/22434 [15:46:42<3:19:27, 2.53s/it] +2025-02-06 01:54:25 - ERROR - stderr - 79%|███████▉ | 17709/22434 [15:46:45<3:19:00, 2.53s/it] +2025-02-06 01:54:25 - ERROR - stderr - +2025-02-06 01:54:25 - ERROR - stderr - +2025-02-06 01:54:25 - INFO - stdout - {'loss': 0.3649, 'grad_norm': 1.473365306854248, 'learning_rate': 2.237937681968696e-06, 'epoch': 2.37} +2025-02-06 01:54:25 - ERROR - stderr - 79%|███████▉ | 17709/22434 
[15:46:45<3:19:00, 2.53s/it] +2025-02-06 01:54:27 - ERROR - stderr - 79%|███████▉ | 17710/22434 [15:46:47<3:17:18, 2.51s/it] +2025-02-06 01:54:27 - ERROR - stderr - +2025-02-06 01:54:27 - ERROR - stderr - +2025-02-06 01:54:27 - INFO - stdout - {'loss': 0.3947, 'grad_norm': 1.6807719469070435, 'learning_rate': 2.2370275108074303e-06, 'epoch': 2.37} +2025-02-06 01:54:27 - ERROR - stderr - 79%|███████▉ | 17710/22434 [15:46:47<3:17:18, 2.51s/it] +2025-02-06 01:54:30 - ERROR - stderr - 79%|███████▉ | 17711/22434 [15:46:50<3:17:25, 2.51s/it] +2025-02-06 01:54:30 - ERROR - stderr - +2025-02-06 01:54:30 - ERROR - stderr - +2025-02-06 01:54:30 - INFO - stdout - {'loss': 0.3301, 'grad_norm': 1.4028116464614868, 'learning_rate': 2.2361175014578983e-06, 'epoch': 2.37} +2025-02-06 01:54:30 - ERROR - stderr - 79%|███████▉ | 17711/22434 [15:46:50<3:17:25, 2.51s/it] +2025-02-06 01:54:32 - ERROR - stderr - 79%|███████▉ | 17712/22434 [15:46:52<3:21:59, 2.57s/it] +2025-02-06 01:54:32 - ERROR - stderr - +2025-02-06 01:54:32 - ERROR - stderr - +2025-02-06 01:54:32 - INFO - stdout - {'loss': 0.3329, 'grad_norm': 1.4825043678283691, 'learning_rate': 2.2352076539390664e-06, 'epoch': 2.37} +2025-02-06 01:54:32 - ERROR - stderr - 79%|███████▉ | 17712/22434 [15:46:52<3:21:59, 2.57s/it] +2025-02-06 01:54:35 - ERROR - stderr - 79%|███████▉ | 17713/22434 [15:46:55<3:21:18, 2.56s/it] +2025-02-06 01:54:35 - ERROR - stderr - +2025-02-06 01:54:35 - ERROR - stderr - +2025-02-06 01:54:35 - INFO - stdout - {'loss': 0.3619, 'grad_norm': 1.5437074899673462, 'learning_rate': 2.234297968269903e-06, 'epoch': 2.37} +2025-02-06 01:54:35 - ERROR - stderr - 79%|███████▉ | 17713/22434 [15:46:55<3:21:18, 2.56s/it] +2025-02-06 01:54:37 - ERROR - stderr - 79%|███████▉ | 17714/22434 [15:46:57<3:20:06, 2.54s/it] +2025-02-06 01:54:38 - ERROR - stderr - +2025-02-06 01:54:38 - ERROR - stderr - +2025-02-06 01:54:38 - INFO - stdout - {'loss': 0.4132, 'grad_norm': 1.6066052913665771, 'learning_rate': 
2.2333884444693656e-06, 'epoch': 2.37} +2025-02-06 01:54:38 - ERROR - stderr - 79%|███████▉ | 17714/22434 [15:46:57<3:20:06, 2.54s/it] +2025-02-06 01:54:40 - ERROR - stderr - 79%|███████▉ | 17715/22434 [15:47:00<3:18:07, 2.52s/it] +2025-02-06 01:54:40 - ERROR - stderr - +2025-02-06 01:54:40 - ERROR - stderr - +2025-02-06 01:54:40 - INFO - stdout - {'loss': 0.376, 'grad_norm': 1.476467251777649, 'learning_rate': 2.2324790825564146e-06, 'epoch': 2.37} +2025-02-06 01:54:40 - ERROR - stderr - 79%|███████▉ | 17715/22434 [15:47:00<3:18:07, 2.52s/it] +2025-02-06 01:54:42 - ERROR - stderr - 79%|███████▉ | 17716/22434 [15:47:02<3:18:27, 2.52s/it] +2025-02-06 01:54:43 - ERROR - stderr - +2025-02-06 01:54:43 - ERROR - stderr - +2025-02-06 01:54:43 - INFO - stdout - {'loss': 0.3977, 'grad_norm': 1.8732562065124512, 'learning_rate': 2.2315698825500053e-06, 'epoch': 2.37} +2025-02-06 01:54:43 - ERROR - stderr - 79%|███████▉ | 17716/22434 [15:47:02<3:18:27, 2.52s/it] +2025-02-06 01:54:45 - ERROR - stderr - 79%|███████▉ | 17717/22434 [15:47:05<3:17:36, 2.51s/it] +2025-02-06 01:54:45 - ERROR - stderr - +2025-02-06 01:54:45 - ERROR - stderr - +2025-02-06 01:54:45 - INFO - stdout - {'loss': 0.3657, 'grad_norm': 1.6135532855987549, 'learning_rate': 2.230660844469088e-06, 'epoch': 2.37} +2025-02-06 01:54:45 - ERROR - stderr - 79%|███████▉ | 17717/22434 [15:47:05<3:17:36, 2.51s/it] +2025-02-06 01:54:47 - ERROR - stderr - 79%|███████▉ | 17718/22434 [15:47:07<3:17:01, 2.51s/it] +2025-02-06 01:54:48 - ERROR - stderr - +2025-02-06 01:54:48 - ERROR - stderr - +2025-02-06 01:54:48 - INFO - stdout - {'loss': 0.368, 'grad_norm': 1.5239770412445068, 'learning_rate': 2.229751968332611e-06, 'epoch': 2.37} +2025-02-06 01:54:48 - ERROR - stderr - 79%|███████▉ | 17718/22434 [15:47:07<3:17:01, 2.51s/it] +2025-02-06 01:54:50 - ERROR - stderr - 79%|███████▉ | 17719/22434 [15:47:10<3:17:57, 2.52s/it] +2025-02-06 01:54:50 - ERROR - stderr - +2025-02-06 01:54:50 - ERROR - stderr - +2025-02-06 01:54:50 - 
INFO - stdout - {'loss': 0.3903, 'grad_norm': 1.6480637788772583, 'learning_rate': 2.2288432541595185e-06, 'epoch': 2.37} +2025-02-06 01:54:50 - ERROR - stderr - 79%|███████▉ | 17719/22434 [15:47:10<3:17:57, 2.52s/it] +2025-02-06 01:54:53 - ERROR - stderr - 79%|███████▉ | 17720/22434 [15:47:13<3:27:21, 2.64s/it] +2025-02-06 01:54:53 - ERROR - stderr - +2025-02-06 01:54:53 - ERROR - stderr - +2025-02-06 01:54:53 - INFO - stdout - {'loss': 0.3827, 'grad_norm': 1.7169145345687866, 'learning_rate': 2.227934701968755e-06, 'epoch': 2.37} +2025-02-06 01:54:53 - ERROR - stderr - 79%|███████▉ | 17720/22434 [15:47:13<3:27:21, 2.64s/it] +2025-02-06 01:54:55 - ERROR - stderr - 79%|███████▉ | 17721/22434 [15:47:15<3:23:04, 2.59s/it] +2025-02-06 01:54:55 - ERROR - stderr - +2025-02-06 01:54:55 - ERROR - stderr - +2025-02-06 01:54:55 - INFO - stdout - {'loss': 0.3929, 'grad_norm': 1.447026252746582, 'learning_rate': 2.227026311779249e-06, 'epoch': 2.37} +2025-02-06 01:54:55 - ERROR - stderr - 79%|███████▉ | 17721/22434 [15:47:15<3:23:04, 2.59s/it] +2025-02-06 01:54:58 - ERROR - stderr - 79%|███████▉ | 17722/22434 [15:47:18<3:20:18, 2.55s/it] +2025-02-06 01:54:58 - ERROR - stderr - +2025-02-06 01:54:58 - ERROR - stderr - +2025-02-06 01:54:58 - INFO - stdout - {'loss': 0.3856, 'grad_norm': 1.5903888940811157, 'learning_rate': 2.2261180836099482e-06, 'epoch': 2.37} +2025-02-06 01:54:58 - ERROR - stderr - 79%|███████▉ | 17722/22434 [15:47:18<3:20:18, 2.55s/it] +2025-02-06 01:55:00 - ERROR - stderr - 79%|███████▉ | 17723/22434 [15:47:20<3:18:22, 2.53s/it] +2025-02-06 01:55:00 - ERROR - stderr - +2025-02-06 01:55:00 - ERROR - stderr - +2025-02-06 01:55:00 - INFO - stdout - {'loss': 0.4015, 'grad_norm': 1.4336998462677002, 'learning_rate': 2.2252100174797753e-06, 'epoch': 2.37} +2025-02-06 01:55:00 - ERROR - stderr - 79%|███████▉ | 17723/22434 [15:47:20<3:18:22, 2.53s/it] +2025-02-06 01:55:03 - ERROR - stderr - 79%|███████▉ | 17724/22434 [15:47:23<3:17:11, 2.51s/it] +2025-02-06 01:55:03 
- ERROR - stderr - +2025-02-06 01:55:03 - ERROR - stderr - +2025-02-06 01:55:03 - INFO - stdout - {'loss': 0.3936, 'grad_norm': 1.4801890850067139, 'learning_rate': 2.2243021134076557e-06, 'epoch': 2.37} +2025-02-06 01:55:03 - ERROR - stderr - 79%|███████▉ | 17724/22434 [15:47:23<3:17:11, 2.51s/it] +2025-02-06 01:55:05 - ERROR - stderr - 79%|███████▉ | 17725/22434 [15:47:25<3:18:03, 2.52s/it] +2025-02-06 01:55:05 - ERROR - stderr - +2025-02-06 01:55:05 - ERROR - stderr - +2025-02-06 01:55:05 - INFO - stdout - {'loss': 0.3902, 'grad_norm': 1.5349730253219604, 'learning_rate': 2.223394371412524e-06, 'epoch': 2.37} +2025-02-06 01:55:05 - ERROR - stderr - 79%|███████▉ | 17725/22434 [15:47:25<3:18:03, 2.52s/it] +2025-02-06 01:55:08 - ERROR - stderr - 79%|███████▉ | 17726/22434 [15:47:28<3:16:24, 2.50s/it] +2025-02-06 01:55:08 - ERROR - stderr - +2025-02-06 01:55:08 - ERROR - stderr - +2025-02-06 01:55:08 - INFO - stdout - {'loss': 0.4404, 'grad_norm': 1.7089978456497192, 'learning_rate': 2.2224867915132896e-06, 'epoch': 2.37} +2025-02-06 01:55:08 - ERROR - stderr - 79%|███████▉ | 17726/22434 [15:47:28<3:16:24, 2.50s/it] +2025-02-06 01:55:10 - ERROR - stderr - 79%|███████▉ | 17727/22434 [15:47:30<3:16:53, 2.51s/it] +2025-02-06 01:55:10 - ERROR - stderr - +2025-02-06 01:55:10 - ERROR - stderr - +2025-02-06 01:55:10 - INFO - stdout - {'loss': 0.3939, 'grad_norm': 1.6646735668182373, 'learning_rate': 2.2215793737288817e-06, 'epoch': 2.37} +2025-02-06 01:55:10 - ERROR - stderr - 79%|███████▉ | 17727/22434 [15:47:30<3:16:53, 2.51s/it] +2025-02-06 01:55:13 - ERROR - stderr - 79%|███████▉ | 17728/22434 [15:47:33<3:17:00, 2.51s/it] +2025-02-06 01:55:13 - ERROR - stderr - +2025-02-06 01:55:13 - ERROR - stderr - +2025-02-06 01:55:13 - INFO - stdout - {'loss': 0.3971, 'grad_norm': 1.4868961572647095, 'learning_rate': 2.2206721180782053e-06, 'epoch': 2.37} +2025-02-06 01:55:13 - ERROR - stderr - 79%|███████▉ | 17728/22434 [15:47:33<3:17:00, 2.51s/it] +2025-02-06 01:55:15 - ERROR - 
stderr - 79%|███████▉ | 17729/22434 [15:47:35<3:15:52, 2.50s/it] +2025-02-06 01:55:15 - ERROR - stderr - +2025-02-06 01:55:15 - ERROR - stderr - +2025-02-06 01:55:15 - INFO - stdout - {'loss': 0.36, 'grad_norm': 1.3605149984359741, 'learning_rate': 2.219765024580175e-06, 'epoch': 2.37} +2025-02-06 01:55:15 - ERROR - stderr - 79%|███████▉ | 17729/22434 [15:47:35<3:15:52, 2.50s/it] +2025-02-06 01:55:18 - ERROR - stderr - 79%|███████▉ | 17730/22434 [15:47:38<3:15:06, 2.49s/it] +2025-02-06 01:55:18 - ERROR - stderr - +2025-02-06 01:55:18 - ERROR - stderr - +2025-02-06 01:55:18 - INFO - stdout - {'loss': 0.3748, 'grad_norm': 1.5987104177474976, 'learning_rate': 2.2188580932536986e-06, 'epoch': 2.37} +2025-02-06 01:55:18 - ERROR - stderr - 79%|███████▉ | 17730/22434 [15:47:38<3:15:06, 2.49s/it] +2025-02-06 01:55:20 - ERROR - stderr - 79%|███████▉ | 17731/22434 [15:47:40<3:18:09, 2.53s/it] +2025-02-06 01:55:20 - ERROR - stderr - +2025-02-06 01:55:20 - ERROR - stderr - +2025-02-06 01:55:20 - INFO - stdout - {'loss': 0.3621, 'grad_norm': 1.6482453346252441, 'learning_rate': 2.2179513241176777e-06, 'epoch': 2.37} +2025-02-06 01:55:20 - ERROR - stderr - 79%|███████▉ | 17731/22434 [15:47:40<3:18:09, 2.53s/it] +2025-02-06 01:55:23 - ERROR - stderr - 79%|███████▉ | 17732/22434 [15:47:43<3:17:11, 2.52s/it] +2025-02-06 01:55:23 - ERROR - stderr - +2025-02-06 01:55:23 - ERROR - stderr - +2025-02-06 01:55:23 - INFO - stdout - {'loss': 0.3677, 'grad_norm': 1.4732414484024048, 'learning_rate': 2.2170447171910157e-06, 'epoch': 2.37} +2025-02-06 01:55:23 - ERROR - stderr - 79%|███████▉ | 17732/22434 [15:47:43<3:17:11, 2.52s/it] +2025-02-06 01:55:25 - ERROR - stderr - 79%|███████▉ | 17733/22434 [15:47:45<3:15:25, 2.49s/it] +2025-02-06 01:55:25 - ERROR - stderr - +2025-02-06 01:55:25 - ERROR - stderr - +2025-02-06 01:55:25 - INFO - stdout - {'loss': 0.4107, 'grad_norm': 1.6557761430740356, 'learning_rate': 2.2161382724926096e-06, 'epoch': 2.37} +2025-02-06 01:55:25 - ERROR - stderr - 
79%|███████▉ | 17733/22434 [15:47:45<3:15:25, 2.49s/it] +2025-02-06 01:55:28 - ERROR - stderr - 79%|███████▉ | 17734/22434 [15:47:48<3:19:30, 2.55s/it] +2025-02-06 01:55:28 - ERROR - stderr - +2025-02-06 01:55:28 - ERROR - stderr - +2025-02-06 01:55:28 - INFO - stdout - {'loss': 0.371, 'grad_norm': 1.6231448650360107, 'learning_rate': 2.2152319900413523e-06, 'epoch': 2.37} +2025-02-06 01:55:28 - ERROR - stderr - 79%|███████▉ | 17734/22434 [15:47:48<3:19:30, 2.55s/it] +2025-02-06 01:55:31 - ERROR - stderr - 79%|███████▉ | 17735/22434 [15:47:50<3:21:04, 2.57s/it] +2025-02-06 01:55:31 - ERROR - stderr - +2025-02-06 01:55:31 - ERROR - stderr - +2025-02-06 01:55:31 - INFO - stdout - {'loss': 0.3339, 'grad_norm': 1.500584363937378, 'learning_rate': 2.2143258698561354e-06, 'epoch': 2.37} +2025-02-06 01:55:31 - ERROR - stderr - 79%|███████▉ | 17735/22434 [15:47:50<3:21:04, 2.57s/it] +2025-02-06 01:55:33 - ERROR - stderr - 79%|███████▉ | 17736/22434 [15:47:53<3:18:45, 2.54s/it] +2025-02-06 01:55:33 - ERROR - stderr - +2025-02-06 01:55:33 - ERROR - stderr - +2025-02-06 01:55:33 - INFO - stdout - {'loss': 0.3373, 'grad_norm': 1.3678990602493286, 'learning_rate': 2.213419911955845e-06, 'epoch': 2.37} +2025-02-06 01:55:33 - ERROR - stderr - 79%|███████▉ | 17736/22434 [15:47:53<3:18:45, 2.54s/it] +2025-02-06 01:55:36 - ERROR - stderr - 79%|███████▉ | 17737/22434 [15:47:55<3:17:45, 2.53s/it] +2025-02-06 01:55:36 - ERROR - stderr - +2025-02-06 01:55:36 - ERROR - stderr - +2025-02-06 01:55:36 - INFO - stdout - {'loss': 0.3606, 'grad_norm': 1.640110969543457, 'learning_rate': 2.212514116359367e-06, 'epoch': 2.37} +2025-02-06 01:55:36 - ERROR - stderr - 79%|███████▉ | 17737/22434 [15:47:55<3:17:45, 2.53s/it] +2025-02-06 01:55:38 - ERROR - stderr - 79%|███████▉ | 17738/22434 [15:47:58<3:17:11, 2.52s/it] +2025-02-06 01:55:38 - ERROR - stderr - +2025-02-06 01:55:38 - ERROR - stderr - +2025-02-06 01:55:38 - INFO - stdout - {'loss': 0.3415, 'grad_norm': 1.3266701698303223, 
'learning_rate': 2.211608483085579e-06, 'epoch': 2.37} +2025-02-06 01:55:38 - ERROR - stderr - 79%|███████▉ | 17738/22434 [15:47:58<3:17:11, 2.52s/it] +2025-02-06 01:55:41 - ERROR - stderr - 79%|███████▉ | 17739/22434 [15:48:00<3:18:54, 2.54s/it] +2025-02-06 01:55:41 - ERROR - stderr - +2025-02-06 01:55:41 - ERROR - stderr - +2025-02-06 01:55:41 - INFO - stdout - {'loss': 0.3821, 'grad_norm': 1.520799160003662, 'learning_rate': 2.2107030121533623e-06, 'epoch': 2.37} +2025-02-06 01:55:41 - ERROR - stderr - 79%|███████▉ | 17739/22434 [15:48:01<3:18:54, 2.54s/it] +2025-02-06 01:55:43 - ERROR - stderr - 79%|███████▉ | 17740/22434 [15:48:03<3:16:23, 2.51s/it] +2025-02-06 01:55:43 - ERROR - stderr - +2025-02-06 01:55:43 - ERROR - stderr - +2025-02-06 01:55:43 - INFO - stdout - {'loss': 0.3504, 'grad_norm': 1.433852195739746, 'learning_rate': 2.209797703581582e-06, 'epoch': 2.37} +2025-02-06 01:55:43 - ERROR - stderr - 79%|███████▉ | 17740/22434 [15:48:03<3:16:23, 2.51s/it] +2025-02-06 01:55:46 - ERROR - stderr - 79%|███████▉ | 17741/22434 [15:48:06<3:21:14, 2.57s/it] +2025-02-06 01:55:46 - ERROR - stderr - +2025-02-06 01:55:46 - ERROR - stderr - +2025-02-06 01:55:46 - INFO - stdout - {'loss': 0.3223, 'grad_norm': 1.5438417196273804, 'learning_rate': 2.2088925573891207e-06, 'epoch': 2.37} +2025-02-06 01:55:46 - ERROR - stderr - 79%|███████▉ | 17741/22434 [15:48:06<3:21:14, 2.57s/it] +2025-02-06 01:55:48 - ERROR - stderr - 79%|███████▉ | 17742/22434 [15:48:08<3:20:00, 2.56s/it] +2025-02-06 01:55:48 - ERROR - stderr - +2025-02-06 01:55:48 - ERROR - stderr - +2025-02-06 01:55:48 - INFO - stdout - {'loss': 0.4181, 'grad_norm': 1.7594188451766968, 'learning_rate': 2.207987573594833e-06, 'epoch': 2.37} +2025-02-06 01:55:48 - ERROR - stderr - 79%|███████▉ | 17742/22434 [15:48:08<3:20:00, 2.56s/it] +2025-02-06 01:55:51 - ERROR - stderr - 79%|███████▉ | 17743/22434 [15:48:11<3:17:30, 2.53s/it] +2025-02-06 01:55:51 - ERROR - stderr - +2025-02-06 01:55:51 - ERROR - stderr - 
+2025-02-06 01:55:51 - INFO - stdout - {'loss': 0.3013, 'grad_norm': 1.3919388055801392, 'learning_rate': 2.207082752217591e-06, 'epoch': 2.37} +2025-02-06 01:55:51 - ERROR - stderr - 79%|███████▉ | 17743/22434 [15:48:11<3:17:30, 2.53s/it] +2025-02-06 01:55:53 - ERROR - stderr - 79%|███████▉ | 17744/22434 [15:48:13<3:16:58, 2.52s/it] +2025-02-06 01:55:53 - ERROR - stderr - +2025-02-06 01:55:53 - ERROR - stderr - +2025-02-06 01:55:53 - INFO - stdout - {'loss': 0.3277, 'grad_norm': 1.5057038068771362, 'learning_rate': 2.2061780932762545e-06, 'epoch': 2.37} +2025-02-06 01:55:53 - ERROR - stderr - 79%|███████▉ | 17744/22434 [15:48:13<3:16:58, 2.52s/it] +2025-02-06 01:55:56 - ERROR - stderr - 79%|███████▉ | 17745/22434 [15:48:16<3:16:04, 2.51s/it] +2025-02-06 01:55:56 - ERROR - stderr - +2025-02-06 01:55:56 - ERROR - stderr - +2025-02-06 01:55:56 - INFO - stdout - {'loss': 0.3601, 'grad_norm': 1.6068222522735596, 'learning_rate': 2.205273596789672e-06, 'epoch': 2.37} +2025-02-06 01:55:56 - ERROR - stderr - 79%|███████▉ | 17745/22434 [15:48:16<3:16:04, 2.51s/it] +2025-02-06 01:55:58 - ERROR - stderr - 79%|███████▉ | 17746/22434 [15:48:18<3:14:35, 2.49s/it] +2025-02-06 01:55:58 - ERROR - stderr - +2025-02-06 01:55:58 - ERROR - stderr - +2025-02-06 01:55:58 - INFO - stdout - {'loss': 0.3758, 'grad_norm': 1.587902307510376, 'learning_rate': 2.2043692627767077e-06, 'epoch': 2.37} +2025-02-06 01:55:58 - ERROR - stderr - 79%|███████▉ | 17746/22434 [15:48:18<3:14:35, 2.49s/it] +2025-02-06 01:56:01 - ERROR - stderr - 79%|███████▉ | 17747/22434 [15:48:21<3:16:11, 2.51s/it] +2025-02-06 01:56:01 - ERROR - stderr - +2025-02-06 01:56:01 - ERROR - stderr - +2025-02-06 01:56:01 - INFO - stdout - {'loss': 0.3321, 'grad_norm': 1.5797476768493652, 'learning_rate': 2.203465091256205e-06, 'epoch': 2.37} +2025-02-06 01:56:01 - ERROR - stderr - 79%|███████▉ | 17747/22434 [15:48:21<3:16:11, 2.51s/it] +2025-02-06 01:56:04 - ERROR - stderr - 79%|███████▉ | 17748/22434 [15:48:23<3:21:01, 
2.57s/it] +2025-02-06 01:56:04 - ERROR - stderr - +2025-02-06 01:56:04 - ERROR - stderr - +2025-02-06 01:56:04 - INFO - stdout - {'loss': 0.325, 'grad_norm': 1.4322763681411743, 'learning_rate': 2.2025610822470113e-06, 'epoch': 2.37} +2025-02-06 01:56:04 - ERROR - stderr - 79%|███████▉ | 17748/22434 [15:48:23<3:21:01, 2.57s/it] +2025-02-06 01:56:06 - ERROR - stderr - 79%|███████▉ | 17749/22434 [15:48:26<3:18:41, 2.54s/it] +2025-02-06 01:56:06 - ERROR - stderr - +2025-02-06 01:56:06 - ERROR - stderr - +2025-02-06 01:56:06 - INFO - stdout - {'loss': 0.3464, 'grad_norm': 1.4042689800262451, 'learning_rate': 2.201657235767971e-06, 'epoch': 2.37} +2025-02-06 01:56:06 - ERROR - stderr - 79%|███████▉ | 17749/22434 [15:48:26<3:18:41, 2.54s/it] +2025-02-06 01:56:08 - ERROR - stderr - 79%|███████▉ | 17750/22434 [15:48:28<3:16:50, 2.52s/it] +2025-02-06 01:56:09 - ERROR - stderr - +2025-02-06 01:56:09 - ERROR - stderr - +2025-02-06 01:56:09 - INFO - stdout - {'loss': 0.3625, 'grad_norm': 1.5412917137145996, 'learning_rate': 2.2007535518379196e-06, 'epoch': 2.37} +2025-02-06 01:56:09 - ERROR - stderr - 79%|███████▉ | 17750/22434 [15:48:28<3:16:50, 2.52s/it] +2025-02-06 01:56:11 - ERROR - stderr - 79%|███████▉ | 17751/22434 [15:48:31<3:22:28, 2.59s/it] +2025-02-06 01:56:11 - ERROR - stderr - +2025-02-06 01:56:11 - ERROR - stderr - +2025-02-06 01:56:11 - INFO - stdout - {'loss': 0.4536, 'grad_norm': 1.758390188217163, 'learning_rate': 2.1998500304757044e-06, 'epoch': 2.37} +2025-02-06 01:56:11 - ERROR - stderr - 79%|███████▉ | 17751/22434 [15:48:31<3:22:28, 2.59s/it] +2025-02-06 01:56:14 - ERROR - stderr - 79%|███████▉ | 17752/22434 [15:48:33<3:19:02, 2.55s/it] +2025-02-06 01:56:14 - ERROR - stderr - +2025-02-06 01:56:14 - ERROR - stderr - +2025-02-06 01:56:14 - INFO - stdout - {'loss': 0.4084, 'grad_norm': 1.7407138347625732, 'learning_rate': 2.1989466717001475e-06, 'epoch': 2.37} +2025-02-06 01:56:14 - ERROR - stderr - 79%|███████▉ | 17752/22434 [15:48:34<3:19:02, 2.55s/it] 
+2025-02-06 01:56:16 - ERROR - stderr - 79%|███████▉ | 17753/22434 [15:48:36<3:19:12, 2.55s/it] +2025-02-06 01:56:16 - ERROR - stderr - +2025-02-06 01:56:16 - ERROR - stderr - +2025-02-06 01:56:16 - INFO - stdout - {'loss': 0.4003, 'grad_norm': 1.5752229690551758, 'learning_rate': 2.1980434755300828e-06, 'epoch': 2.37} +2025-02-06 01:56:16 - ERROR - stderr - 79%|███████▉ | 17753/22434 [15:48:36<3:19:12, 2.55s/it] +2025-02-06 01:56:19 - ERROR - stderr - 79%|███████▉ | 17754/22434 [15:48:39<3:19:40, 2.56s/it] +2025-02-06 01:56:19 - ERROR - stderr - +2025-02-06 01:56:19 - ERROR - stderr - +2025-02-06 01:56:19 - INFO - stdout - {'loss': 0.375, 'grad_norm': 1.4256819486618042, 'learning_rate': 2.1971404419843355e-06, 'epoch': 2.37} +2025-02-06 01:56:19 - ERROR - stderr - 79%|███████▉ | 17754/22434 [15:48:39<3:19:40, 2.56s/it] +2025-02-06 01:56:21 - ERROR - stderr - 79%|███████▉ | 17755/22434 [15:48:41<3:17:14, 2.53s/it] +2025-02-06 01:56:21 - ERROR - stderr - +2025-02-06 01:56:21 - ERROR - stderr - +2025-02-06 01:56:21 - INFO - stdout - {'loss': 0.3681, 'grad_norm': 1.5044407844543457, 'learning_rate': 2.1962375710817296e-06, 'epoch': 2.37} +2025-02-06 01:56:21 - ERROR - stderr - 79%|███████▉ | 17755/22434 [15:48:41<3:17:14, 2.53s/it] +2025-02-06 01:56:24 - ERROR - stderr - 79%|███████▉ | 17756/22434 [15:48:44<3:16:22, 2.52s/it] +2025-02-06 01:56:24 - ERROR - stderr - +2025-02-06 01:56:24 - ERROR - stderr - +2025-02-06 01:56:24 - INFO - stdout - {'loss': 0.3285, 'grad_norm': 1.2612330913543701, 'learning_rate': 2.1953348628410855e-06, 'epoch': 2.37} +2025-02-06 01:56:24 - ERROR - stderr - 79%|███████▉ | 17756/22434 [15:48:44<3:16:22, 2.52s/it] +2025-02-06 01:56:26 - ERROR - stderr - 79%|███████▉ | 17757/22434 [15:48:46<3:15:58, 2.51s/it] +2025-02-06 01:56:26 - ERROR - stderr - +2025-02-06 01:56:26 - ERROR - stderr - +2025-02-06 01:56:26 - INFO - stdout - {'loss': 0.3313, 'grad_norm': 1.3040237426757812, 'learning_rate': 2.1944323172812166e-06, 'epoch': 2.37} +2025-02-06 
01:56:26 - ERROR - stderr - 79%|███████▉ | 17757/22434 [15:48:46<3:15:58, 2.51s/it] +2025-02-06 01:56:29 - ERROR - stderr - 79%|███████▉ | 17758/22434 [15:48:49<3:16:35, 2.52s/it] +2025-02-06 01:56:29 - ERROR - stderr - +2025-02-06 01:56:29 - ERROR - stderr - +2025-02-06 01:56:29 - INFO - stdout - {'loss': 0.3827, 'grad_norm': 1.4450445175170898, 'learning_rate': 2.193529934420937e-06, 'epoch': 2.37} +2025-02-06 01:56:29 - ERROR - stderr - 79%|███████▉ | 17758/22434 [15:48:49<3:16:35, 2.52s/it] +2025-02-06 01:56:31 - ERROR - stderr - 79%|███████▉ | 17759/22434 [15:48:51<3:15:33, 2.51s/it] +2025-02-06 01:56:31 - ERROR - stderr - +2025-02-06 01:56:31 - ERROR - stderr - +2025-02-06 01:56:31 - INFO - stdout - {'loss': 0.3338, 'grad_norm': 1.4951387643814087, 'learning_rate': 2.1926277142790554e-06, 'epoch': 2.37} +2025-02-06 01:56:31 - ERROR - stderr - 79%|███████▉ | 17759/22434 [15:48:51<3:15:33, 2.51s/it] +2025-02-06 01:56:34 - ERROR - stderr - 79%|███████▉ | 17760/22434 [15:48:54<3:14:59, 2.50s/it] +2025-02-06 01:56:34 - ERROR - stderr - +2025-02-06 01:56:34 - ERROR - stderr - +2025-02-06 01:56:34 - INFO - stdout - {'loss': 0.333, 'grad_norm': 1.4737999439239502, 'learning_rate': 2.1917256568743794e-06, 'epoch': 2.37} +2025-02-06 01:56:34 - ERROR - stderr - 79%|███████▉ | 17760/22434 [15:48:54<3:14:59, 2.50s/it] +2025-02-06 01:56:36 - ERROR - stderr - 79%|███████▉ | 17761/22434 [15:48:56<3:14:08, 2.49s/it] +2025-02-06 01:56:36 - ERROR - stderr - +2025-02-06 01:56:36 - ERROR - stderr - +2025-02-06 01:56:36 - INFO - stdout - {'loss': 0.3623, 'grad_norm': 1.5859688520431519, 'learning_rate': 2.1908237622257087e-06, 'epoch': 2.38} +2025-02-06 01:56:36 - ERROR - stderr - 79%|███████▉ | 17761/22434 [15:48:56<3:14:08, 2.49s/it] +2025-02-06 01:56:39 - ERROR - stderr - 79%|███████▉ | 17762/22434 [15:48:59<3:14:04, 2.49s/it] +2025-02-06 01:56:39 - ERROR - stderr - +2025-02-06 01:56:39 - ERROR - stderr - +2025-02-06 01:56:39 - INFO - stdout - {'loss': 0.3677, 'grad_norm': 
1.68455171585083, 'learning_rate': 2.1899220303518465e-06, 'epoch': 2.38} +2025-02-06 01:56:39 - ERROR - stderr - 79%|███████▉ | 17762/22434 [15:48:59<3:14:04, 2.49s/it] +2025-02-06 01:56:41 - ERROR - stderr - 79%|███████▉ | 17763/22434 [15:49:01<3:16:43, 2.53s/it] +2025-02-06 01:56:41 - ERROR - stderr - +2025-02-06 01:56:41 - ERROR - stderr - +2025-02-06 01:56:41 - INFO - stdout - {'loss': 0.3836, 'grad_norm': 1.478835940361023, 'learning_rate': 2.1890204612715847e-06, 'epoch': 2.38} +2025-02-06 01:56:41 - ERROR - stderr - 79%|███████▉ | 17763/22434 [15:49:01<3:16:43, 2.53s/it] +2025-02-06 01:56:44 - ERROR - stderr - 79%|███████▉ | 17764/22434 [15:49:04<3:15:27, 2.51s/it] +2025-02-06 01:56:44 - ERROR - stderr - +2025-02-06 01:56:44 - ERROR - stderr - +2025-02-06 01:56:44 - INFO - stdout - {'loss': 0.356, 'grad_norm': 1.4816006422042847, 'learning_rate': 2.188119055003717e-06, 'epoch': 2.38} +2025-02-06 01:56:44 - ERROR - stderr - 79%|███████▉ | 17764/22434 [15:49:04<3:15:27, 2.51s/it] +2025-02-06 01:56:46 - ERROR - stderr - 79%|███████▉ | 17765/22434 [15:49:06<3:15:58, 2.52s/it] +2025-02-06 01:56:46 - ERROR - stderr - +2025-02-06 01:56:46 - ERROR - stderr - +2025-02-06 01:56:46 - INFO - stdout - {'loss': 0.3827, 'grad_norm': 1.6837992668151855, 'learning_rate': 2.187217811567035e-06, 'epoch': 2.38} +2025-02-06 01:56:46 - ERROR - stderr - 79%|███████▉ | 17765/22434 [15:49:06<3:15:58, 2.52s/it] +2025-02-06 01:56:49 - ERROR - stderr - 79%|███████▉ | 17766/22434 [15:49:09<3:16:46, 2.53s/it] +2025-02-06 01:56:49 - ERROR - stderr - +2025-02-06 01:56:49 - ERROR - stderr - +2025-02-06 01:56:49 - INFO - stdout - {'loss': 0.3699, 'grad_norm': 1.482251524925232, 'learning_rate': 2.186316730980317e-06, 'epoch': 2.38} +2025-02-06 01:56:49 - ERROR - stderr - 79%|███████▉ | 17766/22434 [15:49:09<3:16:46, 2.53s/it] +2025-02-06 01:56:51 - ERROR - stderr - 79%|███████▉ | 17767/22434 [15:49:11<3:15:22, 2.51s/it] +2025-02-06 01:56:51 - ERROR - stderr - +2025-02-06 01:56:51 - ERROR - 
stderr - +2025-02-06 01:56:51 - INFO - stdout - {'loss': 0.3415, 'grad_norm': 1.431774616241455, 'learning_rate': 2.185415813262355e-06, 'epoch': 2.38} +2025-02-06 01:56:51 - ERROR - stderr - 79%|███████▉ | 17767/22434 [15:49:11<3:15:22, 2.51s/it] +2025-02-06 01:56:54 - ERROR - stderr - 79%|███████▉ | 17768/22434 [15:49:14<3:17:06, 2.53s/it] +2025-02-06 01:56:54 - ERROR - stderr - +2025-02-06 01:56:54 - ERROR - stderr - +2025-02-06 01:56:54 - INFO - stdout - {'loss': 0.39, 'grad_norm': 2.1552860736846924, 'learning_rate': 2.1845150584319197e-06, 'epoch': 2.38} +2025-02-06 01:56:54 - ERROR - stderr - 79%|███████▉ | 17768/22434 [15:49:14<3:17:06, 2.53s/it] +2025-02-06 01:56:56 - ERROR - stderr - 79%|███████▉ | 17769/22434 [15:49:16<3:15:52, 2.52s/it] +2025-02-06 01:56:57 - ERROR - stderr - +2025-02-06 01:56:57 - ERROR - stderr - +2025-02-06 01:56:57 - INFO - stdout - {'loss': 0.3825, 'grad_norm': 1.6339871883392334, 'learning_rate': 2.1836144665077873e-06, 'epoch': 2.38} +2025-02-06 01:56:57 - ERROR - stderr - 79%|███████▉ | 17769/22434 [15:49:16<3:15:52, 2.52s/it] +2025-02-06 01:57:00 - ERROR - stderr - 79%|███████▉ | 17770/22434 [15:49:19<3:30:21, 2.71s/it] +2025-02-06 01:57:00 - ERROR - stderr - +2025-02-06 01:57:00 - ERROR - stderr - +2025-02-06 01:57:00 - INFO - stdout - {'loss': 0.3976, 'grad_norm': 1.5598790645599365, 'learning_rate': 2.1827140375087363e-06, 'epoch': 2.38} +2025-02-06 01:57:00 - ERROR - stderr - 79%|███████▉ | 17770/22434 [15:49:19<3:30:21, 2.71s/it] +2025-02-06 01:57:02 - ERROR - stderr - 79%|███████▉ | 17771/22434 [15:49:22<3:26:39, 2.66s/it] +2025-02-06 01:57:02 - ERROR - stderr - +2025-02-06 01:57:02 - ERROR - stderr - +2025-02-06 01:57:02 - INFO - stdout - {'loss': 0.4169, 'grad_norm': 1.6280962228775024, 'learning_rate': 2.181813771453526e-06, 'epoch': 2.38} +2025-02-06 01:57:02 - ERROR - stderr - 79%|███████▉ | 17771/22434 [15:49:22<3:26:39, 2.66s/it] +2025-02-06 01:57:05 - ERROR - stderr - 79%|███████▉ | 17772/22434 [15:49:25<3:33:23, 
2.75s/it] +2025-02-06 01:57:05 - ERROR - stderr - +2025-02-06 01:57:05 - ERROR - stderr - +2025-02-06 01:57:05 - INFO - stdout - {'loss': 0.398, 'grad_norm': 1.6548141241073608, 'learning_rate': 2.1809136683609324e-06, 'epoch': 2.38} +2025-02-06 01:57:05 - ERROR - stderr - 79%|███████▉ | 17772/22434 [15:49:25<3:33:23, 2.75s/it] +2025-02-06 01:57:08 - ERROR - stderr - 79%|███████▉ | 17773/22434 [15:49:27<3:26:18, 2.66s/it] +2025-02-06 01:57:08 - ERROR - stderr - +2025-02-06 01:57:08 - ERROR - stderr - +2025-02-06 01:57:08 - INFO - stdout - {'loss': 0.4104, 'grad_norm': 1.705264687538147, 'learning_rate': 2.180013728249708e-06, 'epoch': 2.38} +2025-02-06 01:57:08 - ERROR - stderr - 79%|███████▉ | 17773/22434 [15:49:27<3:26:18, 2.66s/it] +2025-02-06 01:57:10 - ERROR - stderr - 79%|███████▉ | 17774/22434 [15:49:30<3:23:06, 2.62s/it] +2025-02-06 01:57:10 - ERROR - stderr - +2025-02-06 01:57:10 - ERROR - stderr - +2025-02-06 01:57:10 - INFO - stdout - {'loss': 0.3515, 'grad_norm': 1.4637964963912964, 'learning_rate': 2.179113951138615e-06, 'epoch': 2.38} +2025-02-06 01:57:10 - ERROR - stderr - 79%|███████▉ | 17774/22434 [15:49:30<3:23:06, 2.62s/it] +2025-02-06 01:57:13 - ERROR - stderr - 79%|███████▉ | 17775/22434 [15:49:32<3:20:25, 2.58s/it] +2025-02-06 01:57:13 - ERROR - stderr - +2025-02-06 01:57:13 - ERROR - stderr - +2025-02-06 01:57:13 - INFO - stdout - {'loss': 0.3665, 'grad_norm': 1.6449227333068848, 'learning_rate': 2.1782143370464072e-06, 'epoch': 2.38} +2025-02-06 01:57:13 - ERROR - stderr - 79%|███████▉ | 17775/22434 [15:49:32<3:20:25, 2.58s/it] +2025-02-06 01:57:15 - ERROR - stderr - 79%|███████▉ | 17776/22434 [15:49:35<3:18:32, 2.56s/it] +2025-02-06 01:57:15 - ERROR - stderr - +2025-02-06 01:57:15 - ERROR - stderr - +2025-02-06 01:57:15 - INFO - stdout - {'loss': 0.3859, 'grad_norm': 1.4319182634353638, 'learning_rate': 2.177314885991837e-06, 'epoch': 2.38} +2025-02-06 01:57:15 - ERROR - stderr - 79%|███████▉ | 17776/22434 [15:49:35<3:18:32, 2.56s/it] 
+2025-02-06 01:57:18 - ERROR - stderr - 79%|███████▉ | 17777/22434 [15:49:37<3:16:56, 2.54s/it] +2025-02-06 01:57:18 - ERROR - stderr - +2025-02-06 01:57:18 - ERROR - stderr - +2025-02-06 01:57:18 - INFO - stdout - {'loss': 0.3923, 'grad_norm': 1.6064527034759521, 'learning_rate': 2.176415597993653e-06, 'epoch': 2.38} +2025-02-06 01:57:18 - ERROR - stderr - 79%|███████▉ | 17777/22434 [15:49:37<3:16:56, 2.54s/it] +2025-02-06 01:57:20 - ERROR - stderr - 79%|███████▉ | 17778/22434 [15:49:40<3:17:48, 2.55s/it] +2025-02-06 01:57:20 - ERROR - stderr - +2025-02-06 01:57:20 - ERROR - stderr - +2025-02-06 01:57:20 - INFO - stdout - {'loss': 0.4329, 'grad_norm': 1.7177451848983765, 'learning_rate': 2.175516473070599e-06, 'epoch': 2.38} +2025-02-06 01:57:20 - ERROR - stderr - 79%|███████▉ | 17778/22434 [15:49:40<3:17:48, 2.55s/it] +2025-02-06 01:57:23 - ERROR - stderr - 79%|███████▉ | 17779/22434 [15:49:42<3:15:26, 2.52s/it] +2025-02-06 01:57:23 - ERROR - stderr - +2025-02-06 01:57:23 - ERROR - stderr - +2025-02-06 01:57:23 - INFO - stdout - {'loss': 0.3316, 'grad_norm': 1.4720944166183472, 'learning_rate': 2.174617511241417e-06, 'epoch': 2.38} +2025-02-06 01:57:23 - ERROR - stderr - 79%|███████▉ | 17779/22434 [15:49:42<3:15:26, 2.52s/it] +2025-02-06 01:57:25 - ERROR - stderr - 79%|███████▉ | 17780/22434 [15:49:45<3:13:33, 2.50s/it] +2025-02-06 01:57:25 - ERROR - stderr - +2025-02-06 01:57:25 - ERROR - stderr - +2025-02-06 01:57:25 - INFO - stdout - {'loss': 0.3682, 'grad_norm': 1.4723901748657227, 'learning_rate': 2.173718712524845e-06, 'epoch': 2.38} +2025-02-06 01:57:25 - ERROR - stderr - 79%|███████▉ | 17780/22434 [15:49:45<3:13:33, 2.50s/it] +2025-02-06 01:57:27 - ERROR - stderr - 79%|███████▉ | 17781/22434 [15:49:47<3:12:23, 2.48s/it] +2025-02-06 01:57:28 - ERROR - stderr - +2025-02-06 01:57:28 - ERROR - stderr - +2025-02-06 01:57:28 - INFO - stdout - {'loss': 0.349, 'grad_norm': 1.556833028793335, 'learning_rate': 2.172820076939618e-06, 'epoch': 2.38} +2025-02-06 
01:57:28 - ERROR - stderr - 79%|███████▉ | 17781/22434 [15:49:47<3:12:23, 2.48s/it] +2025-02-06 01:57:30 - ERROR - stderr - 79%|███████▉ | 17782/22434 [15:49:50<3:14:36, 2.51s/it] +2025-02-06 01:57:30 - ERROR - stderr - +2025-02-06 01:57:30 - ERROR - stderr - +2025-02-06 01:57:30 - INFO - stdout - {'loss': 0.4273, 'grad_norm': 1.580910325050354, 'learning_rate': 2.1719216045044656e-06, 'epoch': 2.38} +2025-02-06 01:57:30 - ERROR - stderr - 79%|███████▉ | 17782/22434 [15:49:50<3:14:36, 2.51s/it] +2025-02-06 01:57:33 - ERROR - stderr - 79%|███████▉ | 17783/22434 [15:49:52<3:14:12, 2.51s/it] +2025-02-06 01:57:33 - ERROR - stderr - +2025-02-06 01:57:33 - ERROR - stderr - +2025-02-06 01:57:33 - INFO - stdout - {'loss': 0.3601, 'grad_norm': 1.4721739292144775, 'learning_rate': 2.171023295238117e-06, 'epoch': 2.38} +2025-02-06 01:57:33 - ERROR - stderr - 79%|███████▉ | 17783/22434 [15:49:52<3:14:12, 2.51s/it] +2025-02-06 01:57:35 - ERROR - stderr - 79%|███████▉ | 17784/22434 [15:49:55<3:12:37, 2.49s/it] +2025-02-06 01:57:35 - ERROR - stderr - +2025-02-06 01:57:35 - ERROR - stderr - +2025-02-06 01:57:35 - INFO - stdout - {'loss': 0.4344, 'grad_norm': 1.7716712951660156, 'learning_rate': 2.1701251491593e-06, 'epoch': 2.38} +2025-02-06 01:57:35 - ERROR - stderr - 79%|███████▉ | 17784/22434 [15:49:55<3:12:37, 2.49s/it] +2025-02-06 01:57:38 - ERROR - stderr - 79%|███████▉ | 17785/22434 [15:49:57<3:13:33, 2.50s/it] +2025-02-06 01:57:38 - ERROR - stderr - +2025-02-06 01:57:38 - ERROR - stderr - +2025-02-06 01:57:38 - INFO - stdout - {'loss': 0.3606, 'grad_norm': 1.5400400161743164, 'learning_rate': 2.1692271662867257e-06, 'epoch': 2.38} +2025-02-06 01:57:38 - ERROR - stderr - 79%|███████▉ | 17785/22434 [15:49:57<3:13:33, 2.50s/it] +2025-02-06 01:57:40 - ERROR - stderr - 79%|███████▉ | 17786/22434 [15:50:00<3:16:37, 2.54s/it] +2025-02-06 01:57:40 - ERROR - stderr - +2025-02-06 01:57:40 - ERROR - stderr - +2025-02-06 01:57:40 - INFO - stdout - {'loss': 0.3914, 'grad_norm': 
1.407920479774475, 'learning_rate': 2.168329346639123e-06, 'epoch': 2.38} +2025-02-06 01:57:40 - ERROR - stderr - 79%|███████▉ | 17786/22434 [15:50:00<3:16:37, 2.54s/it] +2025-02-06 01:57:43 - ERROR - stderr - 79%|███████▉ | 17787/22434 [15:50:02<3:14:28, 2.51s/it] +2025-02-06 01:57:43 - ERROR - stderr - +2025-02-06 01:57:43 - ERROR - stderr - +2025-02-06 01:57:43 - INFO - stdout - {'loss': 0.3888, 'grad_norm': 1.6963717937469482, 'learning_rate': 2.1674316902351967e-06, 'epoch': 2.38} +2025-02-06 01:57:43 - ERROR - stderr - 79%|███████▉ | 17787/22434 [15:50:02<3:14:28, 2.51s/it] +2025-02-06 01:57:45 - ERROR - stderr - 79%|███████▉ | 17788/22434 [15:50:05<3:13:35, 2.50s/it] +2025-02-06 01:57:45 - ERROR - stderr - +2025-02-06 01:57:45 - ERROR - stderr - +2025-02-06 01:57:45 - INFO - stdout - {'loss': 0.3748, 'grad_norm': 1.3053958415985107, 'learning_rate': 2.166534197093664e-06, 'epoch': 2.38} +2025-02-06 01:57:45 - ERROR - stderr - 79%|███████▉ | 17788/22434 [15:50:05<3:13:35, 2.50s/it] +2025-02-06 01:57:48 - ERROR - stderr - 79%|███████▉ | 17789/22434 [15:50:07<3:13:46, 2.50s/it] +2025-02-06 01:57:48 - ERROR - stderr - +2025-02-06 01:57:48 - ERROR - stderr - +2025-02-06 01:57:48 - INFO - stdout - {'loss': 0.3394, 'grad_norm': 1.5242102146148682, 'learning_rate': 2.165636867233232e-06, 'epoch': 2.38} +2025-02-06 01:57:48 - ERROR - stderr - 79%|███████▉ | 17789/22434 [15:50:07<3:13:46, 2.50s/it] +2025-02-06 01:57:50 - ERROR - stderr - 79%|███████▉ | 17790/22434 [15:50:10<3:22:21, 2.61s/it] +2025-02-06 01:57:51 - ERROR - stderr - +2025-02-06 01:57:51 - ERROR - stderr - +2025-02-06 01:57:51 - INFO - stdout - {'loss': 0.3449, 'grad_norm': 1.4534916877746582, 'learning_rate': 2.1647397006725978e-06, 'epoch': 2.38} +2025-02-06 01:57:51 - ERROR - stderr - 79%|███████▉ | 17790/22434 [15:50:10<3:22:21, 2.61s/it] +2025-02-06 01:57:53 - ERROR - stderr - 79%|███████▉ | 17791/22434 [15:50:13<3:20:20, 2.59s/it] +2025-02-06 01:57:53 - ERROR - stderr - +2025-02-06 01:57:53 - 
ERROR - stderr - +2025-02-06 01:57:53 - INFO - stdout - {'loss': 0.3732, 'grad_norm': 1.4911854267120361, 'learning_rate': 2.1638426974304737e-06, 'epoch': 2.38} +2025-02-06 01:57:53 - ERROR - stderr - 79%|███████▉ | 17791/22434 [15:50:13<3:20:20, 2.59s/it] +2025-02-06 01:57:56 - ERROR - stderr - 79%|███████▉ | 17792/22434 [15:50:15<3:19:29, 2.58s/it] +2025-02-06 01:57:56 - ERROR - stderr - +2025-02-06 01:57:56 - ERROR - stderr - +2025-02-06 01:57:56 - INFO - stdout - {'loss': 0.3482, 'grad_norm': 1.5047416687011719, 'learning_rate': 2.1629458575255457e-06, 'epoch': 2.38} +2025-02-06 01:57:56 - ERROR - stderr - 79%|███████▉ | 17792/22434 [15:50:15<3:19:29, 2.58s/it] +2025-02-06 01:57:58 - ERROR - stderr - 79%|███████▉ | 17793/22434 [15:50:18<3:17:58, 2.56s/it] +2025-02-06 01:57:58 - ERROR - stderr - +2025-02-06 01:57:58 - ERROR - stderr - +2025-02-06 01:57:58 - INFO - stdout - {'loss': 0.4073, 'grad_norm': 1.5534480810165405, 'learning_rate': 2.1620491809765133e-06, 'epoch': 2.38} +2025-02-06 01:57:58 - ERROR - stderr - 79%|███████▉ | 17793/22434 [15:50:18<3:17:58, 2.56s/it] +2025-02-06 01:58:01 - ERROR - stderr - 79%|███████▉ | 17794/22434 [15:50:20<3:18:05, 2.56s/it] +2025-02-06 01:58:01 - ERROR - stderr - +2025-02-06 01:58:01 - ERROR - stderr - +2025-02-06 01:58:01 - INFO - stdout - {'loss': 0.347, 'grad_norm': 1.4092671871185303, 'learning_rate': 2.1611526678020658e-06, 'epoch': 2.38} +2025-02-06 01:58:01 - ERROR - stderr - 79%|███████▉ | 17794/22434 [15:50:20<3:18:05, 2.56s/it] +2025-02-06 01:58:03 - ERROR - stderr - 79%|███████▉ | 17795/22434 [15:50:23<3:16:17, 2.54s/it] +2025-02-06 01:58:03 - ERROR - stderr - +2025-02-06 01:58:03 - ERROR - stderr - +2025-02-06 01:58:03 - INFO - stdout - {'loss': 0.3335, 'grad_norm': 1.5560104846954346, 'learning_rate': 2.1602563180208857e-06, 'epoch': 2.38} +2025-02-06 01:58:03 - ERROR - stderr - 79%|███████▉ | 17795/22434 [15:50:23<3:16:17, 2.54s/it] +2025-02-06 01:58:06 - ERROR - stderr - 79%|███████▉ | 17796/22434 
[15:50:25<3:16:17, 2.54s/it] +2025-02-06 01:58:06 - ERROR - stderr - +2025-02-06 01:58:06 - ERROR - stderr - +2025-02-06 01:58:06 - INFO - stdout - {'loss': 0.3528, 'grad_norm': 1.590920329093933, 'learning_rate': 2.1593601316516677e-06, 'epoch': 2.38} +2025-02-06 01:58:06 - ERROR - stderr - 79%|███████▉ | 17796/22434 [15:50:25<3:16:17, 2.54s/it] +2025-02-06 01:58:08 - ERROR - stderr - 79%|███████▉ | 17797/22434 [15:50:28<3:13:56, 2.51s/it] +2025-02-06 01:58:08 - ERROR - stderr - +2025-02-06 01:58:08 - ERROR - stderr - +2025-02-06 01:58:08 - INFO - stdout - {'loss': 0.3353, 'grad_norm': 1.4623390436172485, 'learning_rate': 2.158464108713082e-06, 'epoch': 2.38} +2025-02-06 01:58:08 - ERROR - stderr - 79%|███████▉ | 17797/22434 [15:50:28<3:13:56, 2.51s/it] +2025-02-06 01:58:11 - ERROR - stderr - 79%|███████▉ | 17798/22434 [15:50:30<3:13:11, 2.50s/it] +2025-02-06 01:58:11 - ERROR - stderr - +2025-02-06 01:58:11 - ERROR - stderr - +2025-02-06 01:58:11 - INFO - stdout - {'loss': 0.408, 'grad_norm': 1.5075510740280151, 'learning_rate': 2.157568249223808e-06, 'epoch': 2.38} +2025-02-06 01:58:11 - ERROR - stderr - 79%|███████▉ | 17798/22434 [15:50:30<3:13:11, 2.50s/it] +2025-02-06 01:58:13 - ERROR - stderr - 79%|███████▉ | 17799/22434 [15:50:33<3:12:32, 2.49s/it] +2025-02-06 01:58:13 - ERROR - stderr - +2025-02-06 01:58:13 - ERROR - stderr - +2025-02-06 01:58:13 - INFO - stdout - {'loss': 0.4133, 'grad_norm': 1.6794437170028687, 'learning_rate': 2.156672553202519e-06, 'epoch': 2.38} +2025-02-06 01:58:13 - ERROR - stderr - 79%|███████▉ | 17799/22434 [15:50:33<3:12:32, 2.49s/it] +2025-02-06 01:58:15 - ERROR - stderr - 79%|███████▉ | 17800/22434 [15:50:35<3:11:32, 2.48s/it] +2025-02-06 01:58:16 - ERROR - stderr - +2025-02-06 01:58:16 - ERROR - stderr - +2025-02-06 01:58:16 - INFO - stdout - {'loss': 0.3405, 'grad_norm': 1.3852394819259644, 'learning_rate': 2.155777020667886e-06, 'epoch': 2.38} +2025-02-06 01:58:16 - ERROR - stderr - 79%|███████▉ | 17800/22434 
[15:50:35<3:11:32, 2.48s/it] +2025-02-06 01:58:18 - ERROR - stderr - 79%|███████▉ | 17801/22434 [15:50:38<3:11:48, 2.48s/it] +2025-02-06 01:58:18 - ERROR - stderr - +2025-02-06 01:58:18 - ERROR - stderr - +2025-02-06 01:58:18 - INFO - stdout - {'loss': 0.3629, 'grad_norm': 1.5128384828567505, 'learning_rate': 2.154881651638575e-06, 'epoch': 2.38} +2025-02-06 01:58:18 - ERROR - stderr - 79%|███████▉ | 17801/22434 [15:50:38<3:11:48, 2.48s/it] +2025-02-06 01:58:20 - ERROR - stderr - 79%|███████▉ | 17802/22434 [15:50:40<3:11:35, 2.48s/it] +2025-02-06 01:58:21 - ERROR - stderr - +2025-02-06 01:58:21 - ERROR - stderr - +2025-02-06 01:58:21 - INFO - stdout - {'loss': 0.4159, 'grad_norm': 1.7992634773254395, 'learning_rate': 2.1539864461332495e-06, 'epoch': 2.38} +2025-02-06 01:58:21 - ERROR - stderr - 79%|███████▉ | 17802/22434 [15:50:40<3:11:35, 2.48s/it] +2025-02-06 01:58:23 - ERROR - stderr - 79%|███████▉ | 17803/22434 [15:50:43<3:11:44, 2.48s/it] +2025-02-06 01:58:23 - ERROR - stderr - +2025-02-06 01:58:23 - ERROR - stderr - +2025-02-06 01:58:23 - INFO - stdout - {'loss': 0.3834, 'grad_norm': 1.6477489471435547, 'learning_rate': 2.1530914041705686e-06, 'epoch': 2.38} +2025-02-06 01:58:23 - ERROR - stderr - 79%|███████▉ | 17803/22434 [15:50:43<3:11:44, 2.48s/it] +2025-02-06 01:58:25 - ERROR - stderr - 79%|███████▉ | 17804/22434 [15:50:45<3:11:01, 2.48s/it] +2025-02-06 01:58:25 - ERROR - stderr - +2025-02-06 01:58:25 - ERROR - stderr - +2025-02-06 01:58:25 - INFO - stdout - {'loss': 0.3368, 'grad_norm': 1.2010512351989746, 'learning_rate': 2.152196525769188e-06, 'epoch': 2.38} +2025-02-06 01:58:25 - ERROR - stderr - 79%|███████▉ | 17804/22434 [15:50:45<3:11:01, 2.48s/it] +2025-02-06 01:58:28 - ERROR - stderr - 79%|███████▉ | 17805/22434 [15:50:48<3:12:07, 2.49s/it] +2025-02-06 01:58:28 - ERROR - stderr - +2025-02-06 01:58:28 - ERROR - stderr - +2025-02-06 01:58:28 - INFO - stdout - {'loss': 0.3601, 'grad_norm': 1.4801628589630127, 'learning_rate': 
2.1513018109477647e-06, 'epoch': 2.38} +2025-02-06 01:58:28 - ERROR - stderr - 79%|███████▉ | 17805/22434 [15:50:48<3:12:07, 2.49s/it] +2025-02-06 01:58:30 - ERROR - stderr - 79%|███████▉ | 17806/22434 [15:50:50<3:12:46, 2.50s/it] +2025-02-06 01:58:31 - ERROR - stderr - +2025-02-06 01:58:31 - ERROR - stderr - +2025-02-06 01:58:31 - INFO - stdout - {'loss': 0.3405, 'grad_norm': 1.3441920280456543, 'learning_rate': 2.150407259724938e-06, 'epoch': 2.38} +2025-02-06 01:58:31 - ERROR - stderr - 79%|███████▉ | 17806/22434 [15:50:50<3:12:46, 2.50s/it] +2025-02-06 01:58:33 - ERROR - stderr - 79%|███████▉ | 17807/22434 [15:50:53<3:12:58, 2.50s/it] +2025-02-06 01:58:33 - ERROR - stderr - +2025-02-06 01:58:33 - ERROR - stderr - +2025-02-06 01:58:33 - INFO - stdout - {'loss': 0.3838, 'grad_norm': 1.4753878116607666, 'learning_rate': 2.1495128721193648e-06, 'epoch': 2.38} +2025-02-06 01:58:33 - ERROR - stderr - 79%|███████▉ | 17807/22434 [15:50:53<3:12:58, 2.50s/it] +2025-02-06 01:58:36 - ERROR - stderr - 79%|███████▉ | 17808/22434 [15:50:55<3:14:18, 2.52s/it] +2025-02-06 01:58:36 - ERROR - stderr - +2025-02-06 01:58:36 - ERROR - stderr - +2025-02-06 01:58:36 - INFO - stdout - {'loss': 0.3543, 'grad_norm': 1.5801125764846802, 'learning_rate': 2.1486186481496863e-06, 'epoch': 2.38} +2025-02-06 01:58:36 - ERROR - stderr - 79%|███████▉ | 17808/22434 [15:50:55<3:14:18, 2.52s/it] +2025-02-06 01:58:38 - ERROR - stderr - 79%|███████▉ | 17809/22434 [15:50:58<3:13:13, 2.51s/it] +2025-02-06 01:58:38 - ERROR - stderr - +2025-02-06 01:58:38 - ERROR - stderr - +2025-02-06 01:58:38 - INFO - stdout - {'loss': 0.3349, 'grad_norm': 1.6334245204925537, 'learning_rate': 2.147724587834533e-06, 'epoch': 2.38} +2025-02-06 01:58:38 - ERROR - stderr - 79%|███████▉ | 17809/22434 [15:50:58<3:13:13, 2.51s/it] +2025-02-06 01:58:40 - ERROR - stderr - 79%|███████▉ | 17810/22434 [15:51:00<3:12:52, 2.50s/it] +2025-02-06 01:58:41 - ERROR - stderr - +2025-02-06 01:58:41 - ERROR - stderr - +2025-02-06 01:58:41 - 
INFO - stdout - {'loss': 0.3989, 'grad_norm': 1.6705485582351685, 'learning_rate': 2.146830691192553e-06, 'epoch': 2.38} +2025-02-06 01:58:41 - ERROR - stderr - 79%|███████▉ | 17810/22434 [15:51:00<3:12:52, 2.50s/it] +2025-02-06 01:58:43 - ERROR - stderr - 79%|███████▉ | 17811/22434 [15:51:03<3:11:30, 2.49s/it] +2025-02-06 01:58:43 - ERROR - stderr - +2025-02-06 01:58:43 - ERROR - stderr - +2025-02-06 01:58:43 - INFO - stdout - {'loss': 0.3856, 'grad_norm': 1.8443785905838013, 'learning_rate': 2.1459369582423663e-06, 'epoch': 2.38} +2025-02-06 01:58:43 - ERROR - stderr - 79%|███████▉ | 17811/22434 [15:51:03<3:11:30, 2.49s/it] +2025-02-06 01:58:45 - ERROR - stderr - 79%|███████▉ | 17812/22434 [15:51:05<3:10:38, 2.47s/it] +2025-02-06 01:58:45 - ERROR - stderr - +2025-02-06 01:58:45 - ERROR - stderr - +2025-02-06 01:58:45 - INFO - stdout - {'loss': 0.4108, 'grad_norm': 1.5280587673187256, 'learning_rate': 2.1450433890026147e-06, 'epoch': 2.38} +2025-02-06 01:58:45 - ERROR - stderr - 79%|███████▉ | 17812/22434 [15:51:05<3:10:38, 2.47s/it] +2025-02-06 01:58:48 - ERROR - stderr - 79%|███████▉ | 17813/22434 [15:51:08<3:09:32, 2.46s/it] +2025-02-06 01:58:48 - ERROR - stderr - +2025-02-06 01:58:48 - ERROR - stderr - +2025-02-06 01:58:48 - INFO - stdout - {'loss': 0.402, 'grad_norm': 1.6207430362701416, 'learning_rate': 2.144149983491913e-06, 'epoch': 2.38} +2025-02-06 01:58:48 - ERROR - stderr - 79%|███████▉ | 17813/22434 [15:51:08<3:09:32, 2.46s/it] +2025-02-06 01:58:50 - ERROR - stderr - 79%|███████▉ | 17814/22434 [15:51:10<3:12:58, 2.51s/it] +2025-02-06 01:58:50 - ERROR - stderr - +2025-02-06 01:58:50 - ERROR - stderr - +2025-02-06 01:58:50 - INFO - stdout - {'loss': 0.3952, 'grad_norm': 1.5400564670562744, 'learning_rate': 2.1432567417288862e-06, 'epoch': 2.38} +2025-02-06 01:58:50 - ERROR - stderr - 79%|███████▉ | 17814/22434 [15:51:10<3:12:58, 2.51s/it] +2025-02-06 01:58:53 - ERROR - stderr - 79%|███████▉ | 17815/22434 [15:51:13<3:12:39, 2.50s/it] +2025-02-06 01:58:53 
- ERROR - stderr - +2025-02-06 01:58:53 - ERROR - stderr - +2025-02-06 01:58:53 - INFO - stdout - {'loss': 0.3711, 'grad_norm': 1.4587730169296265, 'learning_rate': 2.14236366373216e-06, 'epoch': 2.38} +2025-02-06 01:58:53 - ERROR - stderr - 79%|███████▉ | 17815/22434 [15:51:13<3:12:39, 2.50s/it] +2025-02-06 01:58:56 - ERROR - stderr - 79%|███████▉ | 17816/22434 [15:51:15<3:15:07, 2.54s/it] +2025-02-06 01:58:56 - ERROR - stderr - +2025-02-06 01:58:56 - ERROR - stderr - +2025-02-06 01:58:56 - INFO - stdout - {'loss': 0.3526, 'grad_norm': 1.4588806629180908, 'learning_rate': 2.1414707495203415e-06, 'epoch': 2.38} +2025-02-06 01:58:56 - ERROR - stderr - 79%|███████▉ | 17816/22434 [15:51:15<3:15:07, 2.54s/it] +2025-02-06 01:58:58 - ERROR - stderr - 79%|███████▉ | 17817/22434 [15:51:18<3:15:24, 2.54s/it] +2025-02-06 01:58:58 - ERROR - stderr - +2025-02-06 01:58:58 - ERROR - stderr - +2025-02-06 01:58:58 - INFO - stdout - {'loss': 0.3606, 'grad_norm': 1.6683028936386108, 'learning_rate': 2.1405779991120445e-06, 'epoch': 2.38} +2025-02-06 01:58:58 - ERROR - stderr - 79%|███████▉ | 17817/22434 [15:51:18<3:15:24, 2.54s/it] +2025-02-06 01:59:01 - ERROR - stderr - 79%|███████▉ | 17818/22434 [15:51:20<3:15:56, 2.55s/it] +2025-02-06 01:59:01 - ERROR - stderr - +2025-02-06 01:59:01 - ERROR - stderr - +2025-02-06 01:59:01 - INFO - stdout - {'loss': 0.327, 'grad_norm': 1.3662933111190796, 'learning_rate': 2.139685412525879e-06, 'epoch': 2.38} +2025-02-06 01:59:01 - ERROR - stderr - 79%|███████▉ | 17818/22434 [15:51:20<3:15:56, 2.55s/it] +2025-02-06 01:59:03 - ERROR - stderr - 79%|███████▉ | 17819/22434 [15:51:23<3:19:48, 2.60s/it] +2025-02-06 01:59:03 - ERROR - stderr - +2025-02-06 01:59:03 - ERROR - stderr - +2025-02-06 01:59:03 - INFO - stdout - {'loss': 0.3658, 'grad_norm': 1.417389988899231, 'learning_rate': 2.1387929897804503e-06, 'epoch': 2.38} +2025-02-06 01:59:03 - ERROR - stderr - 79%|███████▉ | 17819/22434 [15:51:23<3:19:48, 2.60s/it] +2025-02-06 01:59:06 - ERROR - 
stderr - 79%|███████▉ | 17820/22434 [15:51:26<3:20:57, 2.61s/it] +2025-02-06 01:59:06 - ERROR - stderr - +2025-02-06 01:59:06 - ERROR - stderr - +2025-02-06 01:59:06 - INFO - stdout - {'loss': 0.3705, 'grad_norm': 1.5727050304412842, 'learning_rate': 2.137900730894359e-06, 'epoch': 2.38} +2025-02-06 01:59:06 - ERROR - stderr - 79%|███████▉ | 17820/22434 [15:51:26<3:20:57, 2.61s/it] +2025-02-06 01:59:08 - ERROR - stderr - 79%|███████▉ | 17821/22434 [15:51:28<3:16:56, 2.56s/it] +2025-02-06 01:59:09 - ERROR - stderr - +2025-02-06 01:59:09 - ERROR - stderr - +2025-02-06 01:59:09 - INFO - stdout - {'loss': 0.3146, 'grad_norm': 1.5236448049545288, 'learning_rate': 2.137008635886203e-06, 'epoch': 2.38} +2025-02-06 01:59:09 - ERROR - stderr - 79%|███████▉ | 17821/22434 [15:51:28<3:16:56, 2.56s/it] +2025-02-06 01:59:11 - ERROR - stderr - 79%|███████▉ | 17822/22434 [15:51:31<3:16:30, 2.56s/it] +2025-02-06 01:59:11 - ERROR - stderr - +2025-02-06 01:59:11 - ERROR - stderr - +2025-02-06 01:59:11 - INFO - stdout - {'loss': 0.3808, 'grad_norm': 1.7054824829101562, 'learning_rate': 2.136116704774579e-06, 'epoch': 2.38} +2025-02-06 01:59:11 - ERROR - stderr - 79%|███████▉ | 17822/22434 [15:51:31<3:16:30, 2.56s/it] +2025-02-06 01:59:14 - ERROR - stderr - 79%|███████▉ | 17823/22434 [15:51:33<3:15:18, 2.54s/it] +2025-02-06 01:59:14 - ERROR - stderr - +2025-02-06 01:59:14 - ERROR - stderr - +2025-02-06 01:59:14 - INFO - stdout - {'loss': 0.3879, 'grad_norm': 1.5645071268081665, 'learning_rate': 2.1352249375780763e-06, 'epoch': 2.38} +2025-02-06 01:59:14 - ERROR - stderr - 79%|███████▉ | 17823/22434 [15:51:33<3:15:18, 2.54s/it] +2025-02-06 01:59:16 - ERROR - stderr - 79%|███████▉ | 17824/22434 [15:51:36<3:13:00, 2.51s/it] +2025-02-06 01:59:16 - ERROR - stderr - +2025-02-06 01:59:16 - ERROR - stderr - +2025-02-06 01:59:16 - INFO - stdout - {'loss': 0.3356, 'grad_norm': 1.552512764930725, 'learning_rate': 2.1343333343152873e-06, 'epoch': 2.38} +2025-02-06 01:59:16 - ERROR - stderr - 
79%|███████▉ | 17824/22434 [15:51:36<3:13:00, 2.51s/it] +2025-02-06 01:59:18 - ERROR - stderr - 79%|███████▉ | 17825/22434 [15:51:38<3:12:10, 2.50s/it] +2025-02-06 01:59:18 - ERROR - stderr - +2025-02-06 01:59:18 - ERROR - stderr - +2025-02-06 01:59:18 - INFO - stdout - {'loss': 0.4219, 'grad_norm': 1.8232992887496948, 'learning_rate': 2.1334418950047885e-06, 'epoch': 2.38} +2025-02-06 01:59:18 - ERROR - stderr - 79%|███████▉ | 17825/22434 [15:51:38<3:12:10, 2.50s/it] +2025-02-06 01:59:21 - ERROR - stderr - 79%|███████▉ | 17826/22434 [15:51:41<3:13:26, 2.52s/it] +2025-02-06 01:59:21 - ERROR - stderr - +2025-02-06 01:59:21 - ERROR - stderr - +2025-02-06 01:59:21 - INFO - stdout - {'loss': 0.4007, 'grad_norm': 1.5619089603424072, 'learning_rate': 2.132550619665168e-06, 'epoch': 2.38} +2025-02-06 01:59:21 - ERROR - stderr - 79%|███████▉ | 17826/22434 [15:51:41<3:13:26, 2.52s/it] +2025-02-06 01:59:23 - ERROR - stderr - 79%|███████▉ | 17827/22434 [15:51:43<3:13:04, 2.51s/it] +2025-02-06 01:59:24 - ERROR - stderr - +2025-02-06 01:59:24 - ERROR - stderr - +2025-02-06 01:59:24 - INFO - stdout - {'loss': 0.3465, 'grad_norm': 1.3505864143371582, 'learning_rate': 2.1316595083150017e-06, 'epoch': 2.38} +2025-02-06 01:59:24 - ERROR - stderr - 79%|███████▉ | 17827/22434 [15:51:43<3:13:04, 2.51s/it] +2025-02-06 01:59:26 - ERROR - stderr - 79%|███████▉ | 17828/22434 [15:51:46<3:12:52, 2.51s/it] +2025-02-06 01:59:26 - ERROR - stderr - +2025-02-06 01:59:26 - ERROR - stderr - +2025-02-06 01:59:26 - INFO - stdout - {'loss': 0.4061, 'grad_norm': 1.7365391254425049, 'learning_rate': 2.1307685609728634e-06, 'epoch': 2.38} +2025-02-06 01:59:26 - ERROR - stderr - 79%|███████▉ | 17828/22434 [15:51:46<3:12:52, 2.51s/it] +2025-02-06 01:59:29 - ERROR - stderr - 79%|███████▉ | 17829/22434 [15:51:48<3:13:12, 2.52s/it] +2025-02-06 01:59:29 - ERROR - stderr - +2025-02-06 01:59:29 - ERROR - stderr - +2025-02-06 01:59:29 - INFO - stdout - {'loss': 0.4152, 'grad_norm': 1.5226134061813354, 
'learning_rate': 2.1298777776573267e-06, 'epoch': 2.38} +2025-02-06 01:59:29 - ERROR - stderr - 79%|███████▉ | 17829/22434 [15:51:48<3:13:12, 2.52s/it] +2025-02-06 01:59:31 - ERROR - stderr - 79%|███████▉ | 17830/22434 [15:51:51<3:11:27, 2.50s/it] +2025-02-06 01:59:31 - ERROR - stderr - +2025-02-06 01:59:31 - ERROR - stderr - +2025-02-06 01:59:31 - INFO - stdout - {'loss': 0.3957, 'grad_norm': 1.6723659038543701, 'learning_rate': 2.1289871583869527e-06, 'epoch': 2.38} +2025-02-06 01:59:31 - ERROR - stderr - 79%|███████▉ | 17830/22434 [15:51:51<3:11:27, 2.50s/it] +2025-02-06 01:59:34 - ERROR - stderr - 79%|███████▉ | 17831/22434 [15:51:53<3:12:50, 2.51s/it] +2025-02-06 01:59:34 - ERROR - stderr - +2025-02-06 01:59:34 - ERROR - stderr - +2025-02-06 01:59:34 - INFO - stdout - {'loss': 0.3675, 'grad_norm': 1.3853586912155151, 'learning_rate': 2.1280967031803134e-06, 'epoch': 2.38} +2025-02-06 01:59:34 - ERROR - stderr - 79%|███████▉ | 17831/22434 [15:51:53<3:12:50, 2.51s/it] +2025-02-06 01:59:36 - ERROR - stderr - 79%|███████▉ | 17832/22434 [15:51:56<3:20:13, 2.61s/it] +2025-02-06 01:59:36 - ERROR - stderr - +2025-02-06 01:59:36 - ERROR - stderr - +2025-02-06 01:59:36 - INFO - stdout - {'loss': 0.3682, 'grad_norm': 1.5825053453445435, 'learning_rate': 2.1272064120559644e-06, 'epoch': 2.38} +2025-02-06 01:59:36 - ERROR - stderr - 79%|███████▉ | 17832/22434 [15:51:56<3:20:13, 2.61s/it] +2025-02-06 01:59:39 - ERROR - stderr - 79%|███████▉ | 17833/22434 [15:51:59<3:16:22, 2.56s/it] +2025-02-06 01:59:39 - ERROR - stderr - +2025-02-06 01:59:39 - ERROR - stderr - +2025-02-06 01:59:39 - INFO - stdout - {'loss': 0.4006, 'grad_norm': 1.5157063007354736, 'learning_rate': 2.1263162850324617e-06, 'epoch': 2.38} +2025-02-06 01:59:39 - ERROR - stderr - 79%|███████▉ | 17833/22434 [15:51:59<3:16:22, 2.56s/it] +2025-02-06 01:59:42 - ERROR - stderr - 79%|███████▉ | 17834/22434 [15:52:01<3:20:54, 2.62s/it] +2025-02-06 01:59:42 - ERROR - stderr - +2025-02-06 01:59:42 - ERROR - stderr - 
+2025-02-06 01:59:42 - INFO - stdout - {'loss': 0.3644, 'grad_norm': 1.5054740905761719, 'learning_rate': 2.1254263221283657e-06, 'epoch': 2.38} +2025-02-06 01:59:42 - ERROR - stderr - 79%|███████▉ | 17834/22434 [15:52:01<3:20:54, 2.62s/it] +2025-02-06 01:59:44 - ERROR - stderr - 79%|███████▉ | 17835/22434 [15:52:04<3:16:08, 2.56s/it] +2025-02-06 01:59:44 - ERROR - stderr - +2025-02-06 01:59:44 - ERROR - stderr - +2025-02-06 01:59:44 - INFO - stdout - {'loss': 0.4264, 'grad_norm': 1.7522867918014526, 'learning_rate': 2.1245365233622186e-06, 'epoch': 2.38} +2025-02-06 01:59:44 - ERROR - stderr - 79%|███████▉ | 17835/22434 [15:52:04<3:16:08, 2.56s/it] +2025-02-06 01:59:46 - ERROR - stderr - 80%|███████▉ | 17836/22434 [15:52:06<3:14:17, 2.54s/it] +2025-02-06 01:59:47 - ERROR - stderr - +2025-02-06 01:59:47 - ERROR - stderr - +2025-02-06 01:59:47 - INFO - stdout - {'loss': 0.4082, 'grad_norm': 1.6084753274917603, 'learning_rate': 2.123646888752576e-06, 'epoch': 2.39} +2025-02-06 01:59:47 - ERROR - stderr - 80%|███████▉ | 17836/22434 [15:52:06<3:14:17, 2.54s/it] +2025-02-06 01:59:49 - ERROR - stderr - 80%|███████▉ | 17837/22434 [15:52:09<3:18:10, 2.59s/it] +2025-02-06 01:59:49 - ERROR - stderr - +2025-02-06 01:59:49 - ERROR - stderr - +2025-02-06 01:59:49 - INFO - stdout - {'loss': 0.3524, 'grad_norm': 1.5315499305725098, 'learning_rate': 2.1227574183179755e-06, 'epoch': 2.39} +2025-02-06 01:59:49 - ERROR - stderr - 80%|███████▉ | 17837/22434 [15:52:09<3:18:10, 2.59s/it] +2025-02-06 01:59:52 - ERROR - stderr - 80%|███████▉ | 17838/22434 [15:52:12<3:18:19, 2.59s/it] +2025-02-06 01:59:52 - ERROR - stderr - +2025-02-06 01:59:52 - ERROR - stderr - +2025-02-06 01:59:52 - INFO - stdout - {'loss': 0.4147, 'grad_norm': 1.7481637001037598, 'learning_rate': 2.121868112076959e-06, 'epoch': 2.39} +2025-02-06 01:59:52 - ERROR - stderr - 80%|███████▉ | 17838/22434 [15:52:12<3:18:19, 2.59s/it] +2025-02-06 01:59:54 - ERROR - stderr - 80%|███████▉ | 17839/22434 [15:52:14<3:18:25, 
2.59s/it] +2025-02-06 01:59:54 - ERROR - stderr - +2025-02-06 01:59:54 - ERROR - stderr - +2025-02-06 01:59:54 - INFO - stdout - {'loss': 0.3235, 'grad_norm': 1.5142607688903809, 'learning_rate': 2.120978970048063e-06, 'epoch': 2.39} +2025-02-06 01:59:54 - ERROR - stderr - 80%|███████▉ | 17839/22434 [15:52:14<3:18:25, 2.59s/it] +2025-02-06 01:59:57 - ERROR - stderr - 80%|███████▉ | 17840/22434 [15:52:17<3:16:32, 2.57s/it] +2025-02-06 01:59:57 - ERROR - stderr - +2025-02-06 01:59:57 - ERROR - stderr - +2025-02-06 01:59:57 - INFO - stdout - {'loss': 0.3778, 'grad_norm': 1.6345171928405762, 'learning_rate': 2.120089992249821e-06, 'epoch': 2.39} +2025-02-06 01:59:57 - ERROR - stderr - 80%|███████▉ | 17840/22434 [15:52:17<3:16:32, 2.57s/it] +2025-02-06 01:59:59 - ERROR - stderr - 80%|███████▉ | 17841/22434 [15:52:19<3:15:08, 2.55s/it] +2025-02-06 01:59:59 - ERROR - stderr - +2025-02-06 01:59:59 - ERROR - stderr - +2025-02-06 01:59:59 - INFO - stdout - {'loss': 0.4361, 'grad_norm': 1.7262449264526367, 'learning_rate': 2.119201178700763e-06, 'epoch': 2.39} +2025-02-06 01:59:59 - ERROR - stderr - 80%|███████▉ | 17841/22434 [15:52:19<3:15:08, 2.55s/it] +2025-02-06 02:00:02 - ERROR - stderr - 80%|███████▉ | 17842/22434 [15:52:22<3:14:14, 2.54s/it] +2025-02-06 02:00:02 - ERROR - stderr - +2025-02-06 02:00:02 - ERROR - stderr - +2025-02-06 02:00:02 - INFO - stdout - {'loss': 0.3584, 'grad_norm': 1.455633282661438, 'learning_rate': 2.118312529419414e-06, 'epoch': 2.39} +2025-02-06 02:00:02 - ERROR - stderr - 80%|███████▉ | 17842/22434 [15:52:22<3:14:14, 2.54s/it] +2025-02-06 02:00:04 - ERROR - stderr - 80%|███████▉ | 17843/22434 [15:52:24<3:14:07, 2.54s/it] +2025-02-06 02:00:04 - ERROR - stderr - +2025-02-06 02:00:04 - ERROR - stderr - +2025-02-06 02:00:04 - INFO - stdout - {'loss': 0.3552, 'grad_norm': 1.5089219808578491, 'learning_rate': 2.1174240444243e-06, 'epoch': 2.39} +2025-02-06 02:00:04 - ERROR - stderr - 80%|███████▉ | 17843/22434 [15:52:24<3:14:07, 2.54s/it] 
+2025-02-06 02:00:07 - ERROR - stderr - 80%|███████▉ | 17844/22434 [15:52:27<3:15:45, 2.56s/it] +2025-02-06 02:00:07 - ERROR - stderr - +2025-02-06 02:00:07 - ERROR - stderr - +2025-02-06 02:00:07 - INFO - stdout - {'loss': 0.397, 'grad_norm': 1.4888718128204346, 'learning_rate': 2.116535723733938e-06, 'epoch': 2.39} +2025-02-06 02:00:07 - ERROR - stderr - 80%|███████▉ | 17844/22434 [15:52:27<3:15:45, 2.56s/it] +2025-02-06 02:00:10 - ERROR - stderr - 80%|███████▉ | 17845/22434 [15:52:29<3:14:10, 2.54s/it] +2025-02-06 02:00:10 - ERROR - stderr - +2025-02-06 02:00:10 - ERROR - stderr - +2025-02-06 02:00:10 - INFO - stdout - {'loss': 0.4196, 'grad_norm': 1.4841015338897705, 'learning_rate': 2.1156475673668453e-06, 'epoch': 2.39} +2025-02-06 02:00:10 - ERROR - stderr - 80%|███████▉ | 17845/22434 [15:52:29<3:14:10, 2.54s/it] +2025-02-06 02:00:12 - ERROR - stderr - 80%|███████▉ | 17846/22434 [15:52:32<3:12:26, 2.52s/it] +2025-02-06 02:00:12 - ERROR - stderr - +2025-02-06 02:00:12 - ERROR - stderr - +2025-02-06 02:00:12 - INFO - stdout - {'loss': 0.3914, 'grad_norm': 1.532810926437378, 'learning_rate': 2.114759575341535e-06, 'epoch': 2.39} +2025-02-06 02:00:12 - ERROR - stderr - 80%|███████▉ | 17846/22434 [15:52:32<3:12:26, 2.52s/it] +2025-02-06 02:00:15 - ERROR - stderr - 80%|███████▉ | 17847/22434 [15:52:34<3:12:17, 2.52s/it] +2025-02-06 02:00:15 - ERROR - stderr - +2025-02-06 02:00:15 - ERROR - stderr - +2025-02-06 02:00:15 - INFO - stdout - {'loss': 0.3722, 'grad_norm': 1.5386040210723877, 'learning_rate': 2.113871747676516e-06, 'epoch': 2.39} +2025-02-06 02:00:15 - ERROR - stderr - 80%|███████▉ | 17847/22434 [15:52:34<3:12:17, 2.52s/it] +2025-02-06 02:00:17 - ERROR - stderr - 80%|███████▉ | 17848/22434 [15:52:37<3:12:27, 2.52s/it] +2025-02-06 02:00:17 - ERROR - stderr - +2025-02-06 02:00:17 - ERROR - stderr - +2025-02-06 02:00:17 - INFO - stdout - {'loss': 0.3675, 'grad_norm': 1.6453512907028198, 'learning_rate': 2.112984084390294e-06, 'epoch': 2.39} +2025-02-06 
02:00:17 - ERROR - stderr - 80%|███████▉ | 17848/22434 [15:52:37<3:12:27, 2.52s/it] +2025-02-06 02:00:20 - ERROR - stderr - 80%|███████▉ | 17849/22434 [15:52:39<3:12:17, 2.52s/it] +2025-02-06 02:00:20 - ERROR - stderr - +2025-02-06 02:00:20 - ERROR - stderr - +2025-02-06 02:00:20 - INFO - stdout - {'loss': 0.3948, 'grad_norm': 1.5648529529571533, 'learning_rate': 2.112096585501371e-06, 'epoch': 2.39} +2025-02-06 02:00:20 - ERROR - stderr - 80%|███████▉ | 17849/22434 [15:52:39<3:12:17, 2.52s/it] +2025-02-06 02:00:22 - ERROR - stderr - 80%|███████▉ | 17850/22434 [15:52:42<3:11:12, 2.50s/it] +2025-02-06 02:00:22 - ERROR - stderr - +2025-02-06 02:00:22 - ERROR - stderr - +2025-02-06 02:00:22 - INFO - stdout - {'loss': 0.2982, 'grad_norm': 1.3051241636276245, 'learning_rate': 2.11120925102825e-06, 'epoch': 2.39} +2025-02-06 02:00:22 - ERROR - stderr - 80%|███████▉ | 17850/22434 [15:52:42<3:11:12, 2.50s/it] +2025-02-06 02:00:25 - ERROR - stderr - 80%|███████▉ | 17851/22434 [15:52:44<3:13:04, 2.53s/it] +2025-02-06 02:00:25 - ERROR - stderr - +2025-02-06 02:00:25 - ERROR - stderr - +2025-02-06 02:00:25 - INFO - stdout - {'loss': 0.3551, 'grad_norm': 1.5490403175354004, 'learning_rate': 2.1103220809894188e-06, 'epoch': 2.39} +2025-02-06 02:00:25 - ERROR - stderr - 80%|███████▉ | 17851/22434 [15:52:44<3:13:04, 2.53s/it] +2025-02-06 02:00:27 - ERROR - stderr - 80%|███████▉ | 17852/22434 [15:52:47<3:11:24, 2.51s/it] +2025-02-06 02:00:27 - ERROR - stderr - +2025-02-06 02:00:27 - ERROR - stderr - +2025-02-06 02:00:27 - INFO - stdout - {'loss': 0.4098, 'grad_norm': 1.7357176542282104, 'learning_rate': 2.1094350754033765e-06, 'epoch': 2.39} +2025-02-06 02:00:27 - ERROR - stderr - 80%|███████▉ | 17852/22434 [15:52:47<3:11:24, 2.51s/it] +2025-02-06 02:00:30 - ERROR - stderr - 80%|███████▉ | 17853/22434 [15:52:49<3:11:51, 2.51s/it] +2025-02-06 02:00:30 - ERROR - stderr - +2025-02-06 02:00:30 - ERROR - stderr - +2025-02-06 02:00:30 - INFO - stdout - {'loss': 0.355, 'grad_norm': 
1.2691576480865479, 'learning_rate': 2.108548234288612e-06, 'epoch': 2.39} +2025-02-06 02:00:30 - ERROR - stderr - 80%|███████▉ | 17853/22434 [15:52:49<3:11:51, 2.51s/it] +2025-02-06 02:00:32 - ERROR - stderr - 80%|███████▉ | 17854/22434 [15:52:52<3:14:46, 2.55s/it] +2025-02-06 02:00:32 - ERROR - stderr - +2025-02-06 02:00:32 - ERROR - stderr - +2025-02-06 02:00:32 - INFO - stdout - {'loss': 0.3722, 'grad_norm': 1.554545521736145, 'learning_rate': 2.107661557663603e-06, 'epoch': 2.39} +2025-02-06 02:00:32 - ERROR - stderr - 80%|███████▉ | 17854/22434 [15:52:52<3:14:46, 2.55s/it] +2025-02-06 02:00:35 - ERROR - stderr - 80%|███████▉ | 17855/22434 [15:52:55<3:13:38, 2.54s/it] +2025-02-06 02:00:35 - ERROR - stderr - +2025-02-06 02:00:35 - ERROR - stderr - +2025-02-06 02:00:35 - INFO - stdout - {'loss': 0.4025, 'grad_norm': 1.5350016355514526, 'learning_rate': 2.106775045546842e-06, 'epoch': 2.39} +2025-02-06 02:00:35 - ERROR - stderr - 80%|███████▉ | 17855/22434 [15:52:55<3:13:38, 2.54s/it] +2025-02-06 02:00:37 - ERROR - stderr - 80%|███████▉ | 17856/22434 [15:52:57<3:14:18, 2.55s/it] +2025-02-06 02:00:37 - ERROR - stderr - +2025-02-06 02:00:37 - ERROR - stderr - +2025-02-06 02:00:37 - INFO - stdout - {'loss': 0.3969, 'grad_norm': 1.5181926488876343, 'learning_rate': 2.105888697956796e-06, 'epoch': 2.39} +2025-02-06 02:00:37 - ERROR - stderr - 80%|███████▉ | 17856/22434 [15:52:57<3:14:18, 2.55s/it] +2025-02-06 02:00:40 - ERROR - stderr - 80%|███████▉ | 17857/22434 [15:53:00<3:13:10, 2.53s/it] +2025-02-06 02:00:40 - ERROR - stderr - +2025-02-06 02:00:40 - ERROR - stderr - +2025-02-06 02:00:40 - INFO - stdout - {'loss': 0.3411, 'grad_norm': 1.434273600578308, 'learning_rate': 2.1050025149119523e-06, 'epoch': 2.39} +2025-02-06 02:00:40 - ERROR - stderr - 80%|███████▉ | 17857/22434 [15:53:00<3:13:10, 2.53s/it] +2025-02-06 02:00:42 - ERROR - stderr - 80%|███████▉ | 17858/22434 [15:53:02<3:10:33, 2.50s/it] +2025-02-06 02:00:42 - ERROR - stderr - +2025-02-06 02:00:42 - ERROR 
- stderr - +2025-02-06 02:00:42 - INFO - stdout - {'loss': 0.3897, 'grad_norm': 1.6034188270568848, 'learning_rate': 2.1041164964307747e-06, 'epoch': 2.39} +2025-02-06 02:00:42 - ERROR - stderr - 80%|███████▉ | 17858/22434 [15:53:02<3:10:33, 2.50s/it] +2025-02-06 02:00:45 - ERROR - stderr - 80%|███████▉ | 17859/22434 [15:53:04<3:10:27, 2.50s/it] +2025-02-06 02:00:45 - ERROR - stderr - +2025-02-06 02:00:45 - ERROR - stderr - +2025-02-06 02:00:45 - INFO - stdout - {'loss': 0.4021, 'grad_norm': 1.6710104942321777, 'learning_rate': 2.1032306425317296e-06, 'epoch': 2.39} +2025-02-06 02:00:45 - ERROR - stderr - 80%|███████▉ | 17859/22434 [15:53:05<3:10:27, 2.50s/it] +2025-02-06 02:00:47 - ERROR - stderr - 80%|███████▉ | 17860/22434 [15:53:07<3:12:30, 2.53s/it] +2025-02-06 02:00:47 - ERROR - stderr - +2025-02-06 02:00:47 - ERROR - stderr - +2025-02-06 02:00:47 - INFO - stdout - {'loss': 0.3617, 'grad_norm': 1.4621918201446533, 'learning_rate': 2.1023449532332908e-06, 'epoch': 2.39} +2025-02-06 02:00:47 - ERROR - stderr - 80%|███████▉ | 17860/22434 [15:53:07<3:12:30, 2.53s/it] +2025-02-06 02:00:50 - ERROR - stderr - 80%|███████▉ | 17861/22434 [15:53:10<3:12:26, 2.52s/it] +2025-02-06 02:00:50 - ERROR - stderr - +2025-02-06 02:00:50 - ERROR - stderr - +2025-02-06 02:00:50 - INFO - stdout - {'loss': 0.3324, 'grad_norm': 1.3159018754959106, 'learning_rate': 2.101459428553911e-06, 'epoch': 2.39} +2025-02-06 02:00:50 - ERROR - stderr - 80%|███████▉ | 17861/22434 [15:53:10<3:12:26, 2.52s/it] +2025-02-06 02:00:52 - ERROR - stderr - 80%|███████▉ | 17862/22434 [15:53:12<3:11:43, 2.52s/it] +2025-02-06 02:00:52 - ERROR - stderr - +2025-02-06 02:00:52 - ERROR - stderr - +2025-02-06 02:00:52 - INFO - stdout - {'loss': 0.3537, 'grad_norm': 1.4102952480316162, 'learning_rate': 2.1005740685120524e-06, 'epoch': 2.39} +2025-02-06 02:00:52 - ERROR - stderr - 80%|███████▉ | 17862/22434 [15:53:12<3:11:43, 2.52s/it] +2025-02-06 02:00:55 - ERROR - stderr - 80%|███████▉ | 17863/22434 
[15:53:15<3:10:55, 2.51s/it] +2025-02-06 02:00:55 - ERROR - stderr - +2025-02-06 02:00:55 - ERROR - stderr - +2025-02-06 02:00:55 - INFO - stdout - {'loss': 0.3953, 'grad_norm': 1.6028289794921875, 'learning_rate': 2.099688873126168e-06, 'epoch': 2.39} +2025-02-06 02:00:55 - ERROR - stderr - 80%|███████▉ | 17863/22434 [15:53:15<3:10:55, 2.51s/it] +2025-02-06 02:00:57 - ERROR - stderr - 80%|███████▉ | 17864/22434 [15:53:17<3:11:06, 2.51s/it] +2025-02-06 02:00:57 - ERROR - stderr - +2025-02-06 02:00:57 - ERROR - stderr - +2025-02-06 02:00:57 - INFO - stdout - {'loss': 0.4323, 'grad_norm': 1.6218167543411255, 'learning_rate': 2.0988038424147093e-06, 'epoch': 2.39} +2025-02-06 02:00:57 - ERROR - stderr - 80%|███████▉ | 17864/22434 [15:53:17<3:11:06, 2.51s/it] +2025-02-06 02:01:00 - ERROR - stderr - 80%|███████▉ | 17865/22434 [15:53:20<3:10:50, 2.51s/it] +2025-02-06 02:01:00 - ERROR - stderr - +2025-02-06 02:01:00 - ERROR - stderr - +2025-02-06 02:01:00 - INFO - stdout - {'loss': 0.3903, 'grad_norm': 1.5229204893112183, 'learning_rate': 2.097918976396124e-06, 'epoch': 2.39} +2025-02-06 02:01:00 - ERROR - stderr - 80%|███████▉ | 17865/22434 [15:53:20<3:10:50, 2.51s/it] +2025-02-06 02:01:03 - ERROR - stderr - 80%|███████▉ | 17866/22434 [15:53:23<3:21:35, 2.65s/it] +2025-02-06 02:01:03 - ERROR - stderr - +2025-02-06 02:01:03 - ERROR - stderr - +2025-02-06 02:01:03 - INFO - stdout - {'loss': 0.3866, 'grad_norm': 1.4938158988952637, 'learning_rate': 2.097034275088855e-06, 'epoch': 2.39} +2025-02-06 02:01:03 - ERROR - stderr - 80%|███████▉ | 17866/22434 [15:53:23<3:21:35, 2.65s/it] +2025-02-06 02:01:05 - ERROR - stderr - 80%|███████▉ | 17867/22434 [15:53:25<3:18:14, 2.60s/it] +2025-02-06 02:01:05 - ERROR - stderr - +2025-02-06 02:01:05 - ERROR - stderr - +2025-02-06 02:01:05 - INFO - stdout - {'loss': 0.3695, 'grad_norm': 1.4379466772079468, 'learning_rate': 2.096149738511346e-06, 'epoch': 2.39} +2025-02-06 02:01:05 - ERROR - stderr - 80%|███████▉ | 17867/22434 
[15:53:25<3:18:14, 2.60s/it] +2025-02-06 02:01:08 - ERROR - stderr - 80%|███████▉ | 17868/22434 [15:53:28<3:16:05, 2.58s/it] +2025-02-06 02:01:08 - ERROR - stderr - +2025-02-06 02:01:08 - ERROR - stderr - +2025-02-06 02:01:08 - INFO - stdout - {'loss': 0.3451, 'grad_norm': 1.4845023155212402, 'learning_rate': 2.095265366682031e-06, 'epoch': 2.39} +2025-02-06 02:01:08 - ERROR - stderr - 80%|███████▉ | 17868/22434 [15:53:28<3:16:05, 2.58s/it] +2025-02-06 02:01:10 - ERROR - stderr - 80%|███████▉ | 17869/22434 [15:53:30<3:13:26, 2.54s/it] +2025-02-06 02:01:10 - ERROR - stderr - +2025-02-06 02:01:10 - ERROR - stderr - +2025-02-06 02:01:10 - INFO - stdout - {'loss': 0.369, 'grad_norm': 1.611345648765564, 'learning_rate': 2.0943811596193485e-06, 'epoch': 2.39} +2025-02-06 02:01:10 - ERROR - stderr - 80%|███████▉ | 17869/22434 [15:53:30<3:13:26, 2.54s/it] +2025-02-06 02:01:13 - ERROR - stderr - 80%|███████▉ | 17870/22434 [15:53:33<3:21:05, 2.64s/it] +2025-02-06 02:01:13 - ERROR - stderr - +2025-02-06 02:01:13 - ERROR - stderr - +2025-02-06 02:01:13 - INFO - stdout - {'loss': 0.3677, 'grad_norm': 1.597919225692749, 'learning_rate': 2.093497117341722e-06, 'epoch': 2.39} +2025-02-06 02:01:13 - ERROR - stderr - 80%|███████▉ | 17870/22434 [15:53:33<3:21:05, 2.64s/it] +2025-02-06 02:01:16 - ERROR - stderr - 80%|███████▉ | 17871/22434 [15:53:35<3:16:04, 2.58s/it] +2025-02-06 02:01:16 - ERROR - stderr - +2025-02-06 02:01:16 - ERROR - stderr - +2025-02-06 02:01:16 - INFO - stdout - {'loss': 0.3687, 'grad_norm': 1.493640422821045, 'learning_rate': 2.0926132398675836e-06, 'epoch': 2.39} +2025-02-06 02:01:16 - ERROR - stderr - 80%|███████▉ | 17871/22434 [15:53:35<3:16:04, 2.58s/it] +2025-02-06 02:01:18 - ERROR - stderr - 80%|███████▉ | 17872/22434 [15:53:38<3:12:59, 2.54s/it] +2025-02-06 02:01:18 - ERROR - stderr - +2025-02-06 02:01:18 - ERROR - stderr - +2025-02-06 02:01:18 - INFO - stdout - {'loss': 0.3843, 'grad_norm': 1.4551411867141724, 'learning_rate': 2.091729527215356e-06, 
'epoch': 2.39} +2025-02-06 02:01:18 - ERROR - stderr - 80%|███████▉ | 17872/22434 [15:53:38<3:12:59, 2.54s/it] +2025-02-06 02:01:20 - ERROR - stderr - 80%|███████▉ | 17873/22434 [15:53:40<3:11:02, 2.51s/it] +2025-02-06 02:01:21 - ERROR - stderr - +2025-02-06 02:01:21 - ERROR - stderr - +2025-02-06 02:01:21 - INFO - stdout - {'loss': 0.3935, 'grad_norm': 1.5585696697235107, 'learning_rate': 2.0908459794034587e-06, 'epoch': 2.39} +2025-02-06 02:01:21 - ERROR - stderr - 80%|███████▉ | 17873/22434 [15:53:40<3:11:02, 2.51s/it] +2025-02-06 02:01:23 - ERROR - stderr - 80%|███████▉ | 17874/22434 [15:53:43<3:08:39, 2.48s/it] +2025-02-06 02:01:23 - ERROR - stderr - +2025-02-06 02:01:23 - ERROR - stderr - +2025-02-06 02:01:23 - INFO - stdout - {'loss': 0.3692, 'grad_norm': 1.6306896209716797, 'learning_rate': 2.0899625964503113e-06, 'epoch': 2.39} +2025-02-06 02:01:23 - ERROR - stderr - 80%|███████▉ | 17874/22434 [15:53:43<3:08:39, 2.48s/it] +2025-02-06 02:01:26 - ERROR - stderr - 80%|███████▉ | 17875/22434 [15:53:45<3:12:05, 2.53s/it] +2025-02-06 02:01:26 - ERROR - stderr - +2025-02-06 02:01:26 - ERROR - stderr - +2025-02-06 02:01:26 - INFO - stdout - {'loss': 0.3237, 'grad_norm': 1.2582402229309082, 'learning_rate': 2.0890793783743204e-06, 'epoch': 2.39} +2025-02-06 02:01:26 - ERROR - stderr - 80%|███████▉ | 17875/22434 [15:53:45<3:12:05, 2.53s/it] +2025-02-06 02:01:28 - ERROR - stderr - 80%|███████▉ | 17876/22434 [15:53:48<3:11:23, 2.52s/it] +2025-02-06 02:01:28 - ERROR - stderr - +2025-02-06 02:01:28 - ERROR - stderr - +2025-02-06 02:01:28 - INFO - stdout - {'loss': 0.4098, 'grad_norm': 1.614313006401062, 'learning_rate': 2.088196325193904e-06, 'epoch': 2.39} +2025-02-06 02:01:28 - ERROR - stderr - 80%|███████▉ | 17876/22434 [15:53:48<3:11:23, 2.52s/it] +2025-02-06 02:01:31 - ERROR - stderr - 80%|███████▉ | 17877/22434 [15:53:50<3:11:37, 2.52s/it] +2025-02-06 02:01:31 - ERROR - stderr - +2025-02-06 02:01:31 - ERROR - stderr - +2025-02-06 02:01:31 - INFO - stdout - 
{'loss': 0.3368, 'grad_norm': 1.5872982740402222, 'learning_rate': 2.0873134369274616e-06, 'epoch': 2.39} +2025-02-06 02:01:31 - ERROR - stderr - 80%|███████▉ | 17877/22434 [15:53:50<3:11:37, 2.52s/it] +2025-02-06 02:01:33 - ERROR - stderr - 80%|███████▉ | 17878/22434 [15:53:53<3:10:45, 2.51s/it] +2025-02-06 02:01:33 - ERROR - stderr - +2025-02-06 02:01:33 - ERROR - stderr - +2025-02-06 02:01:33 - INFO - stdout - {'loss': 0.3771, 'grad_norm': 1.51145601272583, 'learning_rate': 2.086430713593397e-06, 'epoch': 2.39} +2025-02-06 02:01:33 - ERROR - stderr - 80%|███████▉ | 17878/22434 [15:53:53<3:10:45, 2.51s/it] +2025-02-06 02:01:36 - ERROR - stderr - 80%|███████▉ | 17879/22434 [15:53:55<3:10:26, 2.51s/it] +2025-02-06 02:01:36 - ERROR - stderr - +2025-02-06 02:01:36 - ERROR - stderr - +2025-02-06 02:01:36 - INFO - stdout - {'loss': 0.4001, 'grad_norm': 1.7802976369857788, 'learning_rate': 2.0855481552101163e-06, 'epoch': 2.39} +2025-02-06 02:01:36 - ERROR - stderr - 80%|███████▉ | 17879/22434 [15:53:55<3:10:26, 2.51s/it] +2025-02-06 02:01:38 - ERROR - stderr - 80%|███████▉ | 17880/22434 [15:53:58<3:08:32, 2.48s/it] +2025-02-06 02:01:38 - ERROR - stderr - +2025-02-06 02:01:38 - ERROR - stderr - +2025-02-06 02:01:38 - INFO - stdout - {'loss': 0.3814, 'grad_norm': 1.484453558921814, 'learning_rate': 2.0846657617960063e-06, 'epoch': 2.39} +2025-02-06 02:01:38 - ERROR - stderr - 80%|███████▉ | 17880/22434 [15:53:58<3:08:32, 2.48s/it] +2025-02-06 02:01:40 - ERROR - stderr - 80%|███████▉ | 17881/22434 [15:54:00<3:08:15, 2.48s/it] +2025-02-06 02:01:40 - ERROR - stderr - +2025-02-06 02:01:40 - ERROR - stderr - +2025-02-06 02:01:40 - INFO - stdout - {'loss': 0.3895, 'grad_norm': 1.645645260810852, 'learning_rate': 2.08378353336947e-06, 'epoch': 2.39} +2025-02-06 02:01:40 - ERROR - stderr - 80%|███████▉ | 17881/22434 [15:54:00<3:08:15, 2.48s/it] +2025-02-06 02:01:43 - ERROR - stderr - 80%|███████▉ | 17882/22434 [15:54:03<3:08:40, 2.49s/it] +2025-02-06 02:01:43 - ERROR - stderr - 
+2025-02-06 02:01:43 - ERROR - stderr - +2025-02-06 02:01:43 - INFO - stdout - {'loss': 0.3029, 'grad_norm': 1.3851213455200195, 'learning_rate': 2.082901469948888e-06, 'epoch': 2.39} +2025-02-06 02:01:43 - ERROR - stderr - 80%|███████▉ | 17882/22434 [15:54:03<3:08:40, 2.49s/it] +2025-02-06 02:01:45 - ERROR - stderr - 80%|███████▉ | 17883/22434 [15:54:05<3:07:54, 2.48s/it] +2025-02-06 02:01:45 - ERROR - stderr - +2025-02-06 02:01:45 - ERROR - stderr - +2025-02-06 02:01:45 - INFO - stdout - {'loss': 0.3724, 'grad_norm': 1.3250372409820557, 'learning_rate': 2.0820195715526493e-06, 'epoch': 2.39} +2025-02-06 02:01:45 - ERROR - stderr - 80%|███████▉ | 17883/22434 [15:54:05<3:07:54, 2.48s/it] +2025-02-06 02:01:48 - ERROR - stderr - 80%|███████▉ | 17884/22434 [15:54:08<3:06:07, 2.45s/it] +2025-02-06 02:01:48 - ERROR - stderr - +2025-02-06 02:01:48 - ERROR - stderr - +2025-02-06 02:01:48 - INFO - stdout - {'loss': 0.3558, 'grad_norm': 1.5430479049682617, 'learning_rate': 2.0811378381991354e-06, 'epoch': 2.39} +2025-02-06 02:01:48 - ERROR - stderr - 80%|███████▉ | 17884/22434 [15:54:08<3:06:07, 2.45s/it] +2025-02-06 02:01:50 - ERROR - stderr - 80%|███████▉ | 17885/22434 [15:54:10<3:07:50, 2.48s/it] +2025-02-06 02:01:50 - ERROR - stderr - +2025-02-06 02:01:50 - ERROR - stderr - +2025-02-06 02:01:50 - INFO - stdout - {'loss': 0.3722, 'grad_norm': 1.6684101819992065, 'learning_rate': 2.0802562699067254e-06, 'epoch': 2.39} +2025-02-06 02:01:50 - ERROR - stderr - 80%|███████▉ | 17885/22434 [15:54:10<3:07:50, 2.48s/it] +2025-02-06 02:01:53 - ERROR - stderr - 80%|███████▉ | 17886/22434 [15:54:13<3:06:26, 2.46s/it] +2025-02-06 02:01:53 - ERROR - stderr - +2025-02-06 02:01:53 - ERROR - stderr - +2025-02-06 02:01:53 - INFO - stdout - {'loss': 0.301, 'grad_norm': 1.4793622493743896, 'learning_rate': 2.0793748666937963e-06, 'epoch': 2.39} +2025-02-06 02:01:53 - ERROR - stderr - 80%|███████▉ | 17886/22434 [15:54:13<3:06:26, 2.46s/it] +2025-02-06 02:01:55 - ERROR - stderr - 80%|███████▉ 
| 17887/22434 [15:54:15<3:07:09, 2.47s/it] +2025-02-06 02:01:55 - ERROR - stderr - +2025-02-06 02:01:55 - ERROR - stderr - +2025-02-06 02:01:55 - INFO - stdout - {'loss': 0.3494, 'grad_norm': 1.4755393266677856, 'learning_rate': 2.0784936285787173e-06, 'epoch': 2.39} +2025-02-06 02:01:55 - ERROR - stderr - 80%|███████▉ | 17887/22434 [15:54:15<3:07:09, 2.47s/it] +2025-02-06 02:01:58 - ERROR - stderr - 80%|███████▉ | 17888/22434 [15:54:18<3:16:35, 2.59s/it] +2025-02-06 02:01:58 - ERROR - stderr - +2025-02-06 02:01:58 - ERROR - stderr - +2025-02-06 02:01:58 - INFO - stdout - {'loss': 0.3667, 'grad_norm': 1.51986825466156, 'learning_rate': 2.07761255557986e-06, 'epoch': 2.39} +2025-02-06 02:01:58 - ERROR - stderr - 80%|███████▉ | 17888/22434 [15:54:18<3:16:35, 2.59s/it] +2025-02-06 02:02:01 - ERROR - stderr - 80%|███████▉ | 17889/22434 [15:54:20<3:15:43, 2.58s/it] +2025-02-06 02:02:01 - ERROR - stderr - +2025-02-06 02:02:01 - ERROR - stderr - +2025-02-06 02:02:01 - INFO - stdout - {'loss': 0.3856, 'grad_norm': 1.6284205913543701, 'learning_rate': 2.0767316477155875e-06, 'epoch': 2.39} +2025-02-06 02:02:01 - ERROR - stderr - 80%|███████▉ | 17889/22434 [15:54:21<3:15:43, 2.58s/it] +2025-02-06 02:02:03 - ERROR - stderr - 80%|███████▉ | 17890/22434 [15:54:23<3:18:31, 2.62s/it] +2025-02-06 02:02:03 - ERROR - stderr - +2025-02-06 02:02:03 - ERROR - stderr - +2025-02-06 02:02:03 - INFO - stdout - {'loss': 0.3567, 'grad_norm': 1.5278631448745728, 'learning_rate': 2.075850905004262e-06, 'epoch': 2.39} +2025-02-06 02:02:03 - ERROR - stderr - 80%|███████▉ | 17890/22434 [15:54:23<3:18:31, 2.62s/it] +2025-02-06 02:02:06 - ERROR - stderr - 80%|███████▉ | 17891/22434 [15:54:26<3:15:24, 2.58s/it] +2025-02-06 02:02:06 - ERROR - stderr - +2025-02-06 02:02:06 - ERROR - stderr - +2025-02-06 02:02:06 - INFO - stdout - {'loss': 0.4004, 'grad_norm': 1.5485665798187256, 'learning_rate': 2.074970327464242e-06, 'epoch': 2.39} +2025-02-06 02:02:06 - ERROR - stderr - 80%|███████▉ | 17891/22434 
[15:54:26<3:15:24, 2.58s/it] +2025-02-06 02:02:08 - ERROR - stderr - 80%|███████▉ | 17892/22434 [15:54:28<3:15:39, 2.58s/it] +2025-02-06 02:02:09 - ERROR - stderr - +2025-02-06 02:02:09 - ERROR - stderr - +2025-02-06 02:02:09 - INFO - stdout - {'loss': 0.3607, 'grad_norm': 1.610215663909912, 'learning_rate': 2.0740899151138816e-06, 'epoch': 2.39} +2025-02-06 02:02:09 - ERROR - stderr - 80%|███████▉ | 17892/22434 [15:54:28<3:15:39, 2.58s/it] +2025-02-06 02:02:11 - ERROR - stderr - 80%|███████▉ | 17893/22434 [15:54:31<3:14:58, 2.58s/it] +2025-02-06 02:02:11 - ERROR - stderr - +2025-02-06 02:02:11 - ERROR - stderr - +2025-02-06 02:02:11 - INFO - stdout - {'loss': 0.3818, 'grad_norm': 1.4530028104782104, 'learning_rate': 2.0732096679715353e-06, 'epoch': 2.39} +2025-02-06 02:02:11 - ERROR - stderr - 80%|███████▉ | 17893/22434 [15:54:31<3:14:58, 2.58s/it] +2025-02-06 02:02:13 - ERROR - stderr - 80%|███████▉ | 17894/22434 [15:54:33<3:11:53, 2.54s/it] +2025-02-06 02:02:14 - ERROR - stderr - +2025-02-06 02:02:14 - ERROR - stderr - +2025-02-06 02:02:14 - INFO - stdout - {'loss': 0.3782, 'grad_norm': 1.550559639930725, 'learning_rate': 2.0723295860555438e-06, 'epoch': 2.39} +2025-02-06 02:02:14 - ERROR - stderr - 80%|███████▉ | 17894/22434 [15:54:33<3:11:53, 2.54s/it] +2025-02-06 02:02:16 - ERROR - stderr - 80%|███████▉ | 17895/22434 [15:54:36<3:19:59, 2.64s/it] +2025-02-06 02:02:16 - ERROR - stderr - +2025-02-06 02:02:16 - ERROR - stderr - +2025-02-06 02:02:16 - INFO - stdout - {'loss': 0.357, 'grad_norm': 1.5442068576812744, 'learning_rate': 2.071449669384261e-06, 'epoch': 2.39} +2025-02-06 02:02:16 - ERROR - stderr - 80%|███████▉ | 17895/22434 [15:54:36<3:19:59, 2.64s/it] +2025-02-06 02:02:19 - ERROR - stderr - 80%|███████▉ | 17896/22434 [15:54:39<3:16:23, 2.60s/it] +2025-02-06 02:02:19 - ERROR - stderr - +2025-02-06 02:02:19 - ERROR - stderr - +2025-02-06 02:02:19 - INFO - stdout - {'loss': 0.3785, 'grad_norm': 1.484924554824829, 'learning_rate': 2.0705699179760176e-06, 
'epoch': 2.39} +2025-02-06 02:02:19 - ERROR - stderr - 80%|███████▉ | 17896/22434 [15:54:39<3:16:23, 2.60s/it] +2025-02-06 02:02:21 - ERROR - stderr - 80%|███████▉ | 17897/22434 [15:54:41<3:14:33, 2.57s/it] +2025-02-06 02:02:21 - ERROR - stderr - +2025-02-06 02:02:21 - ERROR - stderr - +2025-02-06 02:02:21 - INFO - stdout - {'loss': 0.3551, 'grad_norm': 1.5484185218811035, 'learning_rate': 2.069690331849159e-06, 'epoch': 2.39} +2025-02-06 02:02:21 - ERROR - stderr - 80%|███████▉ | 17897/22434 [15:54:41<3:14:33, 2.57s/it] +2025-02-06 02:02:24 - ERROR - stderr - 80%|███████▉ | 17898/22434 [15:54:44<3:15:39, 2.59s/it] +2025-02-06 02:02:24 - ERROR - stderr - +2025-02-06 02:02:24 - ERROR - stderr - +2025-02-06 02:02:24 - INFO - stdout - {'loss': 0.3649, 'grad_norm': 1.5027798414230347, 'learning_rate': 2.068810911022021e-06, 'epoch': 2.39} +2025-02-06 02:02:24 - ERROR - stderr - 80%|███████▉ | 17898/22434 [15:54:44<3:15:39, 2.59s/it] +2025-02-06 02:02:27 - ERROR - stderr - 80%|███████▉ | 17899/22434 [15:54:46<3:13:42, 2.56s/it] +2025-02-06 02:02:27 - ERROR - stderr - +2025-02-06 02:02:27 - ERROR - stderr - +2025-02-06 02:02:27 - INFO - stdout - {'loss': 0.3576, 'grad_norm': 1.532842993736267, 'learning_rate': 2.0679316555129236e-06, 'epoch': 2.39} +2025-02-06 02:02:27 - ERROR - stderr - 80%|███████▉ | 17899/22434 [15:54:46<3:13:42, 2.56s/it] +2025-02-06 02:02:29 - ERROR - stderr - 80%|███████▉ | 17900/22434 [15:54:49<3:11:53, 2.54s/it] +2025-02-06 02:02:29 - ERROR - stderr - +2025-02-06 02:02:29 - ERROR - stderr - +2025-02-06 02:02:29 - INFO - stdout - {'loss': 0.3302, 'grad_norm': 1.3951435089111328, 'learning_rate': 2.0670525653402064e-06, 'epoch': 2.39} +2025-02-06 02:02:29 - ERROR - stderr - 80%|███████▉ | 17900/22434 [15:54:49<3:11:53, 2.54s/it] +2025-02-06 02:02:31 - ERROR - stderr - 80%|███████▉ | 17901/22434 [15:54:51<3:10:07, 2.52s/it] +2025-02-06 02:02:32 - ERROR - stderr - +2025-02-06 02:02:32 - ERROR - stderr - +2025-02-06 02:02:32 - INFO - stdout - {'loss': 
0.336, 'grad_norm': 1.3945815563201904, 'learning_rate': 2.0661736405221843e-06, 'epoch': 2.39} +2025-02-06 02:02:32 - ERROR - stderr - 80%|███████▉ | 17901/22434 [15:54:51<3:10:07, 2.52s/it] +2025-02-06 02:02:34 - ERROR - stderr - 80%|███████▉ | 17902/22434 [15:54:54<3:08:20, 2.49s/it] +2025-02-06 02:02:34 - ERROR - stderr - +2025-02-06 02:02:34 - ERROR - stderr - +2025-02-06 02:02:34 - INFO - stdout - {'loss': 0.307, 'grad_norm': 1.5005645751953125, 'learning_rate': 2.065294881077181e-06, 'epoch': 2.39} +2025-02-06 02:02:34 - ERROR - stderr - 80%|███████▉ | 17902/22434 [15:54:54<3:08:20, 2.49s/it] +2025-02-06 02:02:36 - ERROR - stderr - 80%|███████▉ | 17903/22434 [15:54:56<3:09:29, 2.51s/it] +2025-02-06 02:02:36 - ERROR - stderr - +2025-02-06 02:02:36 - ERROR - stderr - +2025-02-06 02:02:36 - INFO - stdout - {'loss': 0.3416, 'grad_norm': 1.5576838254928589, 'learning_rate': 2.064416287023514e-06, 'epoch': 2.39} +2025-02-06 02:02:36 - ERROR - stderr - 80%|███████▉ | 17903/22434 [15:54:56<3:09:29, 2.51s/it] +2025-02-06 02:02:39 - ERROR - stderr - 80%|███████▉ | 17904/22434 [15:54:59<3:09:36, 2.51s/it] +2025-02-06 02:02:39 - ERROR - stderr - +2025-02-06 02:02:39 - ERROR - stderr - +2025-02-06 02:02:39 - INFO - stdout - {'loss': 0.3831, 'grad_norm': 1.5740407705307007, 'learning_rate': 2.063537858379493e-06, 'epoch': 2.39} +2025-02-06 02:02:39 - ERROR - stderr - 80%|███████▉ | 17904/22434 [15:54:59<3:09:36, 2.51s/it] +2025-02-06 02:02:41 - ERROR - stderr - 80%|███████▉ | 17905/22434 [15:55:01<3:09:32, 2.51s/it] +2025-02-06 02:02:42 - ERROR - stderr - +2025-02-06 02:02:42 - ERROR - stderr - +2025-02-06 02:02:42 - INFO - stdout - {'loss': 0.3648, 'grad_norm': 1.5155054330825806, 'learning_rate': 2.0626595951634365e-06, 'epoch': 2.39} +2025-02-06 02:02:42 - ERROR - stderr - 80%|███████▉ | 17905/22434 [15:55:01<3:09:32, 2.51s/it] +2025-02-06 02:02:44 - ERROR - stderr - 80%|███████▉ | 17906/22434 [15:55:04<3:17:33, 2.62s/it] +2025-02-06 02:02:44 - ERROR - stderr - 
+2025-02-06 02:02:44 - ERROR - stderr - +2025-02-06 02:02:44 - INFO - stdout - {'loss': 0.2807, 'grad_norm': 1.3387234210968018, 'learning_rate': 2.0617814973936425e-06, 'epoch': 2.39} +2025-02-06 02:02:44 - ERROR - stderr - 80%|███████▉ | 17906/22434 [15:55:04<3:17:33, 2.62s/it] +2025-02-06 02:02:47 - ERROR - stderr - 80%|███████▉ | 17907/22434 [15:55:07<3:14:48, 2.58s/it] +2025-02-06 02:02:47 - ERROR - stderr - +2025-02-06 02:02:47 - ERROR - stderr - +2025-02-06 02:02:47 - INFO - stdout - {'loss': 0.3639, 'grad_norm': 1.616338849067688, 'learning_rate': 2.060903565088417e-06, 'epoch': 2.39} +2025-02-06 02:02:47 - ERROR - stderr - 80%|███████▉ | 17907/22434 [15:55:07<3:14:48, 2.58s/it] +2025-02-06 02:02:49 - ERROR - stderr - 80%|███████▉ | 17908/22434 [15:55:09<3:12:13, 2.55s/it] +2025-02-06 02:02:49 - ERROR - stderr - +2025-02-06 02:02:49 - ERROR - stderr - +2025-02-06 02:02:49 - INFO - stdout - {'loss': 0.3189, 'grad_norm': 1.3485593795776367, 'learning_rate': 2.0600257982660598e-06, 'epoch': 2.39} +2025-02-06 02:02:49 - ERROR - stderr - 80%|███████▉ | 17908/22434 [15:55:09<3:12:13, 2.55s/it] +2025-02-06 02:02:52 - ERROR - stderr - 80%|███████▉ | 17909/22434 [15:55:12<3:10:07, 2.52s/it] +2025-02-06 02:02:52 - ERROR - stderr - +2025-02-06 02:02:52 - ERROR - stderr - +2025-02-06 02:02:52 - INFO - stdout - {'loss': 0.3966, 'grad_norm': 1.6645393371582031, 'learning_rate': 2.0591481969448668e-06, 'epoch': 2.39} +2025-02-06 02:02:52 - ERROR - stderr - 80%|███████▉ | 17909/22434 [15:55:12<3:10:07, 2.52s/it] +2025-02-06 02:02:54 - ERROR - stderr - 80%|███████▉ | 17910/22434 [15:55:14<3:09:06, 2.51s/it] +2025-02-06 02:02:54 - ERROR - stderr - +2025-02-06 02:02:54 - ERROR - stderr - +2025-02-06 02:02:54 - INFO - stdout - {'loss': 0.3263, 'grad_norm': 1.4824212789535522, 'learning_rate': 2.058270761143132e-06, 'epoch': 2.4} +2025-02-06 02:02:54 - ERROR - stderr - 80%|███████▉ | 17910/22434 [15:55:14<3:09:06, 2.51s/it] +2025-02-06 02:02:57 - ERROR - stderr - 80%|███████▉ 
| 17911/22434 [15:55:17<3:09:17, 2.51s/it] +2025-02-06 02:02:57 - ERROR - stderr - +2025-02-06 02:02:57 - ERROR - stderr - +2025-02-06 02:02:57 - INFO - stdout - {'loss': 0.3875, 'grad_norm': 1.563370943069458, 'learning_rate': 2.0573934908791426e-06, 'epoch': 2.4} +2025-02-06 02:02:57 - ERROR - stderr - 80%|███████▉ | 17911/22434 [15:55:17<3:09:17, 2.51s/it] +2025-02-06 02:02:59 - ERROR - stderr - 80%|███████▉ | 17912/22434 [15:55:19<3:08:51, 2.51s/it] +2025-02-06 02:02:59 - ERROR - stderr - +2025-02-06 02:02:59 - ERROR - stderr - +2025-02-06 02:02:59 - INFO - stdout - {'loss': 0.4287, 'grad_norm': 1.8260524272918701, 'learning_rate': 2.0565163861711867e-06, 'epoch': 2.4} +2025-02-06 02:02:59 - ERROR - stderr - 80%|███████▉ | 17912/22434 [15:55:19<3:08:51, 2.51s/it] +2025-02-06 02:03:02 - ERROR - stderr - 80%|███████▉ | 17913/22434 [15:55:21<3:07:58, 2.49s/it] +2025-02-06 02:03:02 - ERROR - stderr - +2025-02-06 02:03:02 - ERROR - stderr - +2025-02-06 02:03:02 - INFO - stdout - {'loss': 0.3481, 'grad_norm': 1.5508047342300415, 'learning_rate': 2.055639447037545e-06, 'epoch': 2.4} +2025-02-06 02:03:02 - ERROR - stderr - 80%|███████▉ | 17913/22434 [15:55:22<3:07:58, 2.49s/it] +2025-02-06 02:03:04 - ERROR - stderr - 80%|███████▉ | 17914/22434 [15:55:24<3:09:03, 2.51s/it] +2025-02-06 02:03:04 - ERROR - stderr - +2025-02-06 02:03:04 - ERROR - stderr - +2025-02-06 02:03:04 - INFO - stdout - {'loss': 0.379, 'grad_norm': 1.6852552890777588, 'learning_rate': 2.0547626734965e-06, 'epoch': 2.4} +2025-02-06 02:03:04 - ERROR - stderr - 80%|███████▉ | 17914/22434 [15:55:24<3:09:03, 2.51s/it] +2025-02-06 02:03:07 - ERROR - stderr - 80%|███████▉ | 17915/22434 [15:55:27<3:09:17, 2.51s/it] +2025-02-06 02:03:07 - ERROR - stderr - +2025-02-06 02:03:07 - ERROR - stderr - +2025-02-06 02:03:07 - INFO - stdout - {'loss': 0.3894, 'grad_norm': 1.6302062273025513, 'learning_rate': 2.0538860655663183e-06, 'epoch': 2.4} +2025-02-06 02:03:07 - ERROR - stderr - 80%|███████▉ | 17915/22434 
[15:55:27<3:09:17, 2.51s/it] +2025-02-06 02:03:09 - ERROR - stderr - 80%|███████▉ | 17916/22434 [15:55:29<3:07:44, 2.49s/it] +2025-02-06 02:03:09 - ERROR - stderr - +2025-02-06 02:03:09 - ERROR - stderr - +2025-02-06 02:03:09 - INFO - stdout - {'loss': 0.3776, 'grad_norm': 1.645849347114563, 'learning_rate': 2.0530096232652818e-06, 'epoch': 2.4} +2025-02-06 02:03:09 - ERROR - stderr - 80%|███████▉ | 17916/22434 [15:55:29<3:07:44, 2.49s/it] +2025-02-06 02:03:12 - ERROR - stderr - 80%|███████▉ | 17917/22434 [15:55:31<3:07:45, 2.49s/it] +2025-02-06 02:03:12 - ERROR - stderr - +2025-02-06 02:03:12 - ERROR - stderr - +2025-02-06 02:03:12 - INFO - stdout - {'loss': 0.4024, 'grad_norm': 1.5102357864379883, 'learning_rate': 2.0521333466116576e-06, 'epoch': 2.4} +2025-02-06 02:03:12 - ERROR - stderr - 80%|███████▉ | 17917/22434 [15:55:32<3:07:45, 2.49s/it] +2025-02-06 02:03:14 - ERROR - stderr - 80%|███████▉ | 17918/22434 [15:55:34<3:07:37, 2.49s/it] +2025-02-06 02:03:14 - ERROR - stderr - +2025-02-06 02:03:14 - ERROR - stderr - +2025-02-06 02:03:14 - INFO - stdout - {'loss': 0.3551, 'grad_norm': 1.4829307794570923, 'learning_rate': 2.0512572356237027e-06, 'epoch': 2.4} +2025-02-06 02:03:14 - ERROR - stderr - 80%|███████▉ | 17918/22434 [15:55:34<3:07:37, 2.49s/it] +2025-02-06 02:03:17 - ERROR - stderr - 80%|███████▉ | 17919/22434 [15:55:37<3:08:53, 2.51s/it] +2025-02-06 02:03:17 - ERROR - stderr - +2025-02-06 02:03:17 - ERROR - stderr - +2025-02-06 02:03:17 - INFO - stdout - {'loss': 0.3606, 'grad_norm': 1.590675711631775, 'learning_rate': 2.0503812903196897e-06, 'epoch': 2.4} +2025-02-06 02:03:17 - ERROR - stderr - 80%|███████▉ | 17919/22434 [15:55:37<3:08:53, 2.51s/it] +2025-02-06 02:03:19 - ERROR - stderr - 80%|███████▉ | 17920/22434 [15:55:39<3:08:45, 2.51s/it] +2025-02-06 02:03:19 - ERROR - stderr - +2025-02-06 02:03:19 - ERROR - stderr - +2025-02-06 02:03:19 - INFO - stdout - {'loss': 0.3638, 'grad_norm': 1.4651782512664795, 'learning_rate': 2.0495055107178675e-06, 
'epoch': 2.4} +2025-02-06 02:03:19 - ERROR - stderr - 80%|███████▉ | 17920/22434 [15:55:39<3:08:45, 2.51s/it] +2025-02-06 02:03:22 - ERROR - stderr - 80%|███████▉ | 17921/22434 [15:55:41<3:07:31, 2.49s/it] +2025-02-06 02:03:22 - ERROR - stderr - +2025-02-06 02:03:22 - ERROR - stderr - +2025-02-06 02:03:22 - INFO - stdout - {'loss': 0.3832, 'grad_norm': 1.6999223232269287, 'learning_rate': 2.0486298968364994e-06, 'epoch': 2.4} +2025-02-06 02:03:22 - ERROR - stderr - 80%|███████▉ | 17921/22434 [15:55:42<3:07:31, 2.49s/it] +2025-02-06 02:03:24 - ERROR - stderr - 80%|███████▉ | 17922/22434 [15:55:44<3:07:39, 2.50s/it] +2025-02-06 02:03:24 - ERROR - stderr - +2025-02-06 02:03:24 - ERROR - stderr - +2025-02-06 02:03:24 - INFO - stdout - {'loss': 0.3268, 'grad_norm': 1.367639183998108, 'learning_rate': 2.0477544486938306e-06, 'epoch': 2.4} +2025-02-06 02:03:24 - ERROR - stderr - 80%|███████▉ | 17922/22434 [15:55:44<3:07:39, 2.50s/it] +2025-02-06 02:03:27 - ERROR - stderr - 80%|███████▉ | 17923/22434 [15:55:46<3:06:20, 2.48s/it] +2025-02-06 02:03:27 - ERROR - stderr - +2025-02-06 02:03:27 - ERROR - stderr - +2025-02-06 02:03:27 - INFO - stdout - {'loss': 0.413, 'grad_norm': 1.6939268112182617, 'learning_rate': 2.0468791663081077e-06, 'epoch': 2.4} +2025-02-06 02:03:27 - ERROR - stderr - 80%|███████▉ | 17923/22434 [15:55:46<3:06:20, 2.48s/it] +2025-02-06 02:03:29 - ERROR - stderr - 80%|███████▉ | 17924/22434 [15:55:49<3:05:13, 2.46s/it] +2025-02-06 02:03:29 - ERROR - stderr - +2025-02-06 02:03:29 - ERROR - stderr - +2025-02-06 02:03:29 - INFO - stdout - {'loss': 0.375, 'grad_norm': 1.531829833984375, 'learning_rate': 2.0460040496975843e-06, 'epoch': 2.4} +2025-02-06 02:03:29 - ERROR - stderr - 80%|███████▉ | 17924/22434 [15:55:49<3:05:13, 2.46s/it] +2025-02-06 02:03:32 - ERROR - stderr - 80%|███████▉ | 17925/22434 [15:55:51<3:07:26, 2.49s/it] +2025-02-06 02:03:32 - ERROR - stderr - +2025-02-06 02:03:32 - ERROR - stderr - +2025-02-06 02:03:32 - INFO - stdout - {'loss': 
0.3619, 'grad_norm': 1.491853952407837, 'learning_rate': 2.0451290988804916e-06, 'epoch': 2.4} +2025-02-06 02:03:32 - ERROR - stderr - 80%|███████▉ | 17925/22434 [15:55:51<3:07:26, 2.49s/it] +2025-02-06 02:03:34 - ERROR - stderr - 80%|███████▉ | 17926/22434 [15:55:54<3:05:56, 2.47s/it] +2025-02-06 02:03:34 - ERROR - stderr - +2025-02-06 02:03:34 - ERROR - stderr - +2025-02-06 02:03:34 - INFO - stdout - {'loss': 0.3346, 'grad_norm': 1.5072453022003174, 'learning_rate': 2.0442543138750713e-06, 'epoch': 2.4} +2025-02-06 02:03:34 - ERROR - stderr - 80%|███████▉ | 17926/22434 [15:55:54<3:05:56, 2.47s/it] +2025-02-06 02:03:37 - ERROR - stderr - 80%|███████▉ | 17927/22434 [15:55:57<3:14:59, 2.60s/it] +2025-02-06 02:03:37 - ERROR - stderr - +2025-02-06 02:03:37 - ERROR - stderr - +2025-02-06 02:03:37 - INFO - stdout - {'loss': 0.3511, 'grad_norm': 1.487454891204834, 'learning_rate': 2.0433796946995565e-06, 'epoch': 2.4} +2025-02-06 02:03:37 - ERROR - stderr - 80%|███████▉ | 17927/22434 [15:55:57<3:14:59, 2.60s/it] +2025-02-06 02:03:39 - ERROR - stderr - 80%|███████▉ | 17928/22434 [15:55:59<3:13:23, 2.58s/it] +2025-02-06 02:03:40 - ERROR - stderr - +2025-02-06 02:03:40 - ERROR - stderr - +2025-02-06 02:03:40 - INFO - stdout - {'loss': 0.3512, 'grad_norm': 1.6014050245285034, 'learning_rate': 2.0425052413721793e-06, 'epoch': 2.4} +2025-02-06 02:03:40 - ERROR - stderr - 80%|███████▉ | 17928/22434 [15:55:59<3:13:23, 2.58s/it] +2025-02-06 02:03:42 - ERROR - stderr - 80%|███████▉ | 17929/22434 [15:56:02<3:10:59, 2.54s/it] +2025-02-06 02:03:42 - ERROR - stderr - +2025-02-06 02:03:42 - ERROR - stderr - +2025-02-06 02:03:42 - INFO - stdout - {'loss': 0.4098, 'grad_norm': 1.7423291206359863, 'learning_rate': 2.0416309539111656e-06, 'epoch': 2.4} +2025-02-06 02:03:42 - ERROR - stderr - 80%|███████▉ | 17929/22434 [15:56:02<3:10:59, 2.54s/it] +2025-02-06 02:03:44 - ERROR - stderr - 80%|███████▉ | 17930/22434 [15:56:04<3:09:27, 2.52s/it] +2025-02-06 02:03:44 - ERROR - stderr - 
+2025-02-06 02:03:44 - ERROR - stderr - +2025-02-06 02:03:44 - INFO - stdout - {'loss': 0.3503, 'grad_norm': 1.3837201595306396, 'learning_rate': 2.0407568323347395e-06, 'epoch': 2.4} +2025-02-06 02:03:44 - ERROR - stderr - 80%|███████▉ | 17930/22434 [15:56:04<3:09:27, 2.52s/it] +2025-02-06 02:03:47 - ERROR - stderr - 80%|███████▉ | 17931/22434 [15:56:07<3:14:30, 2.59s/it] +2025-02-06 02:03:47 - ERROR - stderr - +2025-02-06 02:03:47 - ERROR - stderr - +2025-02-06 02:03:47 - INFO - stdout - {'loss': 0.3584, 'grad_norm': 1.6199169158935547, 'learning_rate': 2.03988287666112e-06, 'epoch': 2.4} +2025-02-06 02:03:47 - ERROR - stderr - 80%|███████▉ | 17931/22434 [15:56:07<3:14:30, 2.59s/it] +2025-02-06 02:03:50 - ERROR - stderr - 80%|███████▉ | 17932/22434 [15:56:09<3:11:18, 2.55s/it] +2025-02-06 02:03:50 - ERROR - stderr - +2025-02-06 02:03:50 - ERROR - stderr - +2025-02-06 02:03:50 - INFO - stdout - {'loss': 0.4106, 'grad_norm': 1.5247902870178223, 'learning_rate': 2.0390090869085254e-06, 'epoch': 2.4} +2025-02-06 02:03:50 - ERROR - stderr - 80%|███████▉ | 17932/22434 [15:56:09<3:11:18, 2.55s/it] +2025-02-06 02:03:52 - ERROR - stderr - 80%|███████▉ | 17933/22434 [15:56:12<3:09:12, 2.52s/it] +2025-02-06 02:03:52 - ERROR - stderr - +2025-02-06 02:03:52 - ERROR - stderr - +2025-02-06 02:03:52 - INFO - stdout - {'loss': 0.3337, 'grad_norm': 1.3270975351333618, 'learning_rate': 2.038135463095169e-06, 'epoch': 2.4} +2025-02-06 02:03:52 - ERROR - stderr - 80%|███████▉ | 17933/22434 [15:56:12<3:09:12, 2.52s/it] +2025-02-06 02:03:55 - ERROR - stderr - 80%|███████▉ | 17934/22434 [15:56:14<3:08:02, 2.51s/it] +2025-02-06 02:03:55 - ERROR - stderr - +2025-02-06 02:03:55 - ERROR - stderr - +2025-02-06 02:03:55 - INFO - stdout - {'loss': 0.3565, 'grad_norm': 1.5214077234268188, 'learning_rate': 2.03726200523926e-06, 'epoch': 2.4} +2025-02-06 02:03:55 - ERROR - stderr - 80%|███████▉ | 17934/22434 [15:56:14<3:08:02, 2.51s/it] +2025-02-06 02:03:57 - ERROR - stderr - 80%|███████▉ | 
17935/22434 [15:56:17<3:08:57, 2.52s/it] +2025-02-06 02:03:57 - ERROR - stderr - +2025-02-06 02:03:57 - ERROR - stderr - +2025-02-06 02:03:57 - INFO - stdout - {'loss': 0.3327, 'grad_norm': 1.53960120677948, 'learning_rate': 2.0363887133590053e-06, 'epoch': 2.4} +2025-02-06 02:03:57 - ERROR - stderr - 80%|███████▉ | 17935/22434 [15:56:17<3:08:57, 2.52s/it] +2025-02-06 02:04:00 - ERROR - stderr - 80%|███████▉ | 17936/22434 [15:56:19<3:09:38, 2.53s/it] +2025-02-06 02:04:00 - ERROR - stderr - +2025-02-06 02:04:00 - ERROR - stderr - +2025-02-06 02:04:00 - INFO - stdout - {'loss': 0.3711, 'grad_norm': 1.6621617078781128, 'learning_rate': 2.0355155874726073e-06, 'epoch': 2.4} +2025-02-06 02:04:00 - ERROR - stderr - 80%|███████▉ | 17936/22434 [15:56:19<3:09:38, 2.53s/it] +2025-02-06 02:04:02 - ERROR - stderr - 80%|███████▉ | 17937/22434 [15:56:22<3:09:44, 2.53s/it] +2025-02-06 02:04:02 - ERROR - stderr - +2025-02-06 02:04:02 - ERROR - stderr - +2025-02-06 02:04:02 - INFO - stdout - {'loss': 0.3916, 'grad_norm': 1.5302265882492065, 'learning_rate': 2.0346426275982654e-06, 'epoch': 2.4} +2025-02-06 02:04:02 - ERROR - stderr - 80%|███████▉ | 17937/22434 [15:56:22<3:09:44, 2.53s/it] +2025-02-06 02:04:05 - ERROR - stderr - 80%|███████▉ | 17938/22434 [15:56:24<3:09:09, 2.52s/it] +2025-02-06 02:04:05 - ERROR - stderr - +2025-02-06 02:04:05 - ERROR - stderr - +2025-02-06 02:04:05 - INFO - stdout - {'loss': 0.3589, 'grad_norm': 1.3897732496261597, 'learning_rate': 2.0337698337541787e-06, 'epoch': 2.4} +2025-02-06 02:04:05 - ERROR - stderr - 80%|███████▉ | 17938/22434 [15:56:25<3:09:09, 2.52s/it] +2025-02-06 02:04:07 - ERROR - stderr - 80%|███████▉ | 17939/22434 [15:56:27<3:08:43, 2.52s/it] +2025-02-06 02:04:07 - ERROR - stderr - +2025-02-06 02:04:07 - ERROR - stderr - +2025-02-06 02:04:07 - INFO - stdout - {'loss': 0.4011, 'grad_norm': 1.5956616401672363, 'learning_rate': 2.0328972059585317e-06, 'epoch': 2.4} +2025-02-06 02:04:07 - ERROR - stderr - 80%|███████▉ | 17939/22434 
[15:56:27<3:08:43, 2.52s/it] +2025-02-06 02:04:10 - ERROR - stderr - 80%|███████▉ | 17940/22434 [15:56:30<3:08:32, 2.52s/it] +2025-02-06 02:04:10 - ERROR - stderr - +2025-02-06 02:04:10 - ERROR - stderr - +2025-02-06 02:04:10 - INFO - stdout - {'loss': 0.3963, 'grad_norm': 1.746777057647705, 'learning_rate': 2.0320247442295237e-06, 'epoch': 2.4} +2025-02-06 02:04:10 - ERROR - stderr - 80%|███████▉ | 17940/22434 [15:56:30<3:08:32, 2.52s/it] +2025-02-06 02:04:12 - ERROR - stderr - 80%|███████▉ | 17941/22434 [15:56:32<3:08:30, 2.52s/it] +2025-02-06 02:04:12 - ERROR - stderr - +2025-02-06 02:04:12 - ERROR - stderr - +2025-02-06 02:04:12 - INFO - stdout - {'loss': 0.4164, 'grad_norm': 1.7709999084472656, 'learning_rate': 2.0311524485853307e-06, 'epoch': 2.4} +2025-02-06 02:04:12 - ERROR - stderr - 80%|███████▉ | 17941/22434 [15:56:32<3:08:30, 2.52s/it] +2025-02-06 02:04:15 - ERROR - stderr - 80%|███████▉ | 17942/22434 [15:56:35<3:08:23, 2.52s/it] +2025-02-06 02:04:15 - ERROR - stderr - +2025-02-06 02:04:15 - ERROR - stderr - +2025-02-06 02:04:15 - INFO - stdout - {'loss': 0.3549, 'grad_norm': 1.4809352159500122, 'learning_rate': 2.0302803190441424e-06, 'epoch': 2.4} +2025-02-06 02:04:15 - ERROR - stderr - 80%|███████▉ | 17942/22434 [15:56:35<3:08:23, 2.52s/it] +2025-02-06 02:04:17 - ERROR - stderr - 80%|███████▉ | 17943/22434 [15:56:37<3:06:35, 2.49s/it] +2025-02-06 02:04:17 - ERROR - stderr - +2025-02-06 02:04:17 - ERROR - stderr - +2025-02-06 02:04:17 - INFO - stdout - {'loss': 0.3992, 'grad_norm': 1.701432228088379, 'learning_rate': 2.029408355624136e-06, 'epoch': 2.4} +2025-02-06 02:04:17 - ERROR - stderr - 80%|███████▉ | 17943/22434 [15:56:37<3:06:35, 2.49s/it] +2025-02-06 02:04:20 - ERROR - stderr - 80%|███████▉ | 17944/22434 [15:56:40<3:09:50, 2.54s/it] +2025-02-06 02:04:20 - ERROR - stderr - +2025-02-06 02:04:20 - ERROR - stderr - +2025-02-06 02:04:20 - INFO - stdout - {'loss': 0.3633, 'grad_norm': 1.6228365898132324, 'learning_rate': 2.028536558343481e-06, 
'epoch': 2.4} +2025-02-06 02:04:20 - ERROR - stderr - 80%|███████▉ | 17944/22434 [15:56:40<3:09:50, 2.54s/it] +2025-02-06 02:04:22 - ERROR - stderr - 80%|███████▉ | 17945/22434 [15:56:42<3:09:09, 2.53s/it] +2025-02-06 02:04:22 - ERROR - stderr - +2025-02-06 02:04:22 - ERROR - stderr - +2025-02-06 02:04:22 - INFO - stdout - {'loss': 0.3456, 'grad_norm': 1.543188214302063, 'learning_rate': 2.0276649272203586e-06, 'epoch': 2.4} +2025-02-06 02:04:22 - ERROR - stderr - 80%|███████▉ | 17945/22434 [15:56:42<3:09:09, 2.53s/it] +2025-02-06 02:04:25 - ERROR - stderr - 80%|███████▉ | 17946/22434 [15:56:45<3:08:52, 2.52s/it] +2025-02-06 02:04:25 - ERROR - stderr - +2025-02-06 02:04:25 - ERROR - stderr - +2025-02-06 02:04:25 - INFO - stdout - {'loss': 0.3971, 'grad_norm': 1.6711128950119019, 'learning_rate': 2.02679346227293e-06, 'epoch': 2.4} +2025-02-06 02:04:25 - ERROR - stderr - 80%|███████▉ | 17946/22434 [15:56:45<3:08:52, 2.52s/it] +2025-02-06 02:04:27 - ERROR - stderr - 80%|███████▉ | 17947/22434 [15:56:47<3:09:28, 2.53s/it] +2025-02-06 02:04:27 - ERROR - stderr - +2025-02-06 02:04:27 - ERROR - stderr - +2025-02-06 02:04:27 - INFO - stdout - {'loss': 0.3755, 'grad_norm': 1.6172393560409546, 'learning_rate': 2.0259221635193616e-06, 'epoch': 2.4} +2025-02-06 02:04:27 - ERROR - stderr - 80%|███████▉ | 17947/22434 [15:56:47<3:09:28, 2.53s/it] +2025-02-06 02:04:30 - ERROR - stderr - 80%|████████ | 17948/22434 [15:56:50<3:09:00, 2.53s/it] +2025-02-06 02:04:30 - ERROR - stderr - +2025-02-06 02:04:30 - ERROR - stderr - +2025-02-06 02:04:30 - INFO - stdout - {'loss': 0.3561, 'grad_norm': 1.6031951904296875, 'learning_rate': 2.025051030977816e-06, 'epoch': 2.4} +2025-02-06 02:04:30 - ERROR - stderr - 80%|████████ | 17948/22434 [15:56:50<3:09:00, 2.53s/it] +2025-02-06 02:04:33 - ERROR - stderr - 80%|████████ | 17949/22434 [15:56:52<3:09:39, 2.54s/it] +2025-02-06 02:04:33 - ERROR - stderr - +2025-02-06 02:04:33 - ERROR - stderr - +2025-02-06 02:04:33 - INFO - stdout - {'loss': 
0.3939, 'grad_norm': 1.8141647577285767, 'learning_rate': 2.02418006466645e-06, 'epoch': 2.4} +2025-02-06 02:04:33 - ERROR - stderr - 80%|████████ | 17949/22434 [15:56:52<3:09:39, 2.54s/it] +2025-02-06 02:04:35 - ERROR - stderr - 80%|████████ | 17950/22434 [15:56:55<3:16:14, 2.63s/it] +2025-02-06 02:04:35 - ERROR - stderr - +2025-02-06 02:04:35 - ERROR - stderr - +2025-02-06 02:04:35 - INFO - stdout - {'loss': 0.3987, 'grad_norm': 1.6056065559387207, 'learning_rate': 2.023309264603418e-06, 'epoch': 2.4} +2025-02-06 02:04:35 - ERROR - stderr - 80%|████████ | 17950/22434 [15:56:55<3:16:14, 2.63s/it] +2025-02-06 02:04:38 - ERROR - stderr - 80%|████████ | 17951/22434 [15:56:58<3:12:42, 2.58s/it] +2025-02-06 02:04:38 - ERROR - stderr - +2025-02-06 02:04:38 - ERROR - stderr - +2025-02-06 02:04:38 - INFO - stdout - {'loss': 0.4063, 'grad_norm': 1.5413163900375366, 'learning_rate': 2.022438630806872e-06, 'epoch': 2.4} +2025-02-06 02:04:38 - ERROR - stderr - 80%|████████ | 17951/22434 [15:56:58<3:12:42, 2.58s/it] +2025-02-06 02:04:40 - ERROR - stderr - 80%|████████ | 17952/22434 [15:57:00<3:11:34, 2.56s/it] +2025-02-06 02:04:40 - ERROR - stderr - +2025-02-06 02:04:40 - ERROR - stderr - +2025-02-06 02:04:40 - INFO - stdout - {'loss': 0.3295, 'grad_norm': 1.4227293729782104, 'learning_rate': 2.021568163294959e-06, 'epoch': 2.4} +2025-02-06 02:04:40 - ERROR - stderr - 80%|████████ | 17952/22434 [15:57:00<3:11:34, 2.56s/it] +2025-02-06 02:04:43 - ERROR - stderr - 80%|████████ | 17953/22434 [15:57:03<3:09:42, 2.54s/it] +2025-02-06 02:04:43 - ERROR - stderr - +2025-02-06 02:04:43 - ERROR - stderr - +2025-02-06 02:04:43 - INFO - stdout - {'loss': 0.4115, 'grad_norm': 1.6649378538131714, 'learning_rate': 2.020697862085823e-06, 'epoch': 2.4} +2025-02-06 02:04:43 - ERROR - stderr - 80%|████████ | 17953/22434 [15:57:03<3:09:42, 2.54s/it] +2025-02-06 02:04:45 - ERROR - stderr - 80%|████████ | 17954/22434 [15:57:05<3:08:46, 2.53s/it] +2025-02-06 02:04:45 - ERROR - stderr - +2025-02-06 
02:04:45 - ERROR - stderr - +2025-02-06 02:04:45 - INFO - stdout - {'loss': 0.3852, 'grad_norm': 1.6725213527679443, 'learning_rate': 2.019827727197605e-06, 'epoch': 2.4} +2025-02-06 02:04:45 - ERROR - stderr - 80%|████████ | 17954/22434 [15:57:05<3:08:46, 2.53s/it] +2025-02-06 02:04:48 - ERROR - stderr - 80%|████████ | 17955/22434 [15:57:08<3:06:59, 2.51s/it] +2025-02-06 02:04:48 - ERROR - stderr - +2025-02-06 02:04:48 - ERROR - stderr - +2025-02-06 02:04:48 - INFO - stdout - {'loss': 0.3718, 'grad_norm': 1.4428149461746216, 'learning_rate': 2.018957758648442e-06, 'epoch': 2.4} +2025-02-06 02:04:48 - ERROR - stderr - 80%|████████ | 17955/22434 [15:57:08<3:06:59, 2.51s/it] +2025-02-06 02:04:50 - ERROR - stderr - 80%|████████ | 17956/22434 [15:57:10<3:11:30, 2.57s/it] +2025-02-06 02:04:51 - ERROR - stderr - +2025-02-06 02:04:51 - ERROR - stderr - +2025-02-06 02:04:51 - INFO - stdout - {'loss': 0.3357, 'grad_norm': 1.492225170135498, 'learning_rate': 2.018087956456467e-06, 'epoch': 2.4} +2025-02-06 02:04:51 - ERROR - stderr - 80%|████████ | 17956/22434 [15:57:10<3:11:30, 2.57s/it] +2025-02-06 02:04:53 - ERROR - stderr - 80%|████████ | 17957/22434 [15:57:13<3:09:11, 2.54s/it] +2025-02-06 02:04:53 - ERROR - stderr - +2025-02-06 02:04:53 - ERROR - stderr - +2025-02-06 02:04:53 - INFO - stdout - {'loss': 0.3526, 'grad_norm': 1.4774190187454224, 'learning_rate': 2.017218320639811e-06, 'epoch': 2.4} +2025-02-06 02:04:53 - ERROR - stderr - 80%|████████ | 17957/22434 [15:57:13<3:09:11, 2.54s/it] +2025-02-06 02:04:55 - ERROR - stderr - 80%|████████ | 17958/22434 [15:57:15<3:08:19, 2.52s/it] +2025-02-06 02:04:55 - ERROR - stderr - +2025-02-06 02:04:55 - ERROR - stderr - +2025-02-06 02:04:55 - INFO - stdout - {'loss': 0.338, 'grad_norm': 1.5460723638534546, 'learning_rate': 2.0163488512166007e-06, 'epoch': 2.4} +2025-02-06 02:04:55 - ERROR - stderr - 80%|████████ | 17958/22434 [15:57:15<3:08:19, 2.52s/it] +2025-02-06 02:04:58 - ERROR - stderr - 80%|████████ | 17959/22434 
[15:57:18<3:06:45, 2.50s/it] +2025-02-06 02:04:58 - ERROR - stderr - +2025-02-06 02:04:58 - ERROR - stderr - +2025-02-06 02:04:58 - INFO - stdout - {'loss': 0.3749, 'grad_norm': 1.7437920570373535, 'learning_rate': 2.0154795482049616e-06, 'epoch': 2.4} +2025-02-06 02:04:58 - ERROR - stderr - 80%|████████ | 17959/22434 [15:57:18<3:06:45, 2.50s/it] +2025-02-06 02:05:00 - ERROR - stderr - 80%|████████ | 17960/22434 [15:57:20<3:05:13, 2.48s/it] +2025-02-06 02:05:00 - ERROR - stderr - +2025-02-06 02:05:00 - ERROR - stderr - +2025-02-06 02:05:00 - INFO - stdout - {'loss': 0.3341, 'grad_norm': 1.440415620803833, 'learning_rate': 2.014610411623005e-06, 'epoch': 2.4} +2025-02-06 02:05:00 - ERROR - stderr - 80%|████████ | 17960/22434 [15:57:20<3:05:13, 2.48s/it] +2025-02-06 02:05:03 - ERROR - stderr - 80%|████████ | 17961/22434 [15:57:23<3:07:18, 2.51s/it] +2025-02-06 02:05:03 - ERROR - stderr - +2025-02-06 02:05:03 - ERROR - stderr - +2025-02-06 02:05:03 - INFO - stdout - {'loss': 0.3265, 'grad_norm': 1.4563069343566895, 'learning_rate': 2.0137414414888555e-06, 'epoch': 2.4} +2025-02-06 02:05:03 - ERROR - stderr - 80%|████████ | 17961/22434 [15:57:23<3:07:18, 2.51s/it] +2025-02-06 02:05:05 - ERROR - stderr - 80%|████████ | 17962/22434 [15:57:25<3:07:43, 2.52s/it] +2025-02-06 02:05:05 - ERROR - stderr - +2025-02-06 02:05:05 - ERROR - stderr - +2025-02-06 02:05:05 - INFO - stdout - {'loss': 0.356, 'grad_norm': 1.4194328784942627, 'learning_rate': 2.0128726378206275e-06, 'epoch': 2.4} +2025-02-06 02:05:05 - ERROR - stderr - 80%|████████ | 17962/22434 [15:57:25<3:07:43, 2.52s/it] +2025-02-06 02:05:08 - ERROR - stderr - 80%|████████ | 17963/22434 [15:57:28<3:06:11, 2.50s/it] +2025-02-06 02:05:08 - ERROR - stderr - +2025-02-06 02:05:08 - ERROR - stderr - +2025-02-06 02:05:08 - INFO - stdout - {'loss': 0.3657, 'grad_norm': 1.5021103620529175, 'learning_rate': 2.0120040006364204e-06, 'epoch': 2.4} +2025-02-06 02:05:08 - ERROR - stderr - 80%|████████ | 17963/22434 [15:57:28<3:06:11, 
2.50s/it] +2025-02-06 02:05:10 - ERROR - stderr - 80%|████████ | 17964/22434 [15:57:30<3:05:14, 2.49s/it] +2025-02-06 02:05:10 - ERROR - stderr - +2025-02-06 02:05:10 - ERROR - stderr - +2025-02-06 02:05:10 - INFO - stdout - {'loss': 0.3989, 'grad_norm': 1.604524850845337, 'learning_rate': 2.011135529954352e-06, 'epoch': 2.4} +2025-02-06 02:05:10 - ERROR - stderr - 80%|████████ | 17964/22434 [15:57:30<3:05:14, 2.49s/it] +2025-02-06 02:05:13 - ERROR - stderr - 80%|████████ | 17965/22434 [15:57:33<3:05:32, 2.49s/it] +2025-02-06 02:05:13 - ERROR - stderr - +2025-02-06 02:05:13 - ERROR - stderr - +2025-02-06 02:05:13 - INFO - stdout - {'loss': 0.3469, 'grad_norm': 1.441267490386963, 'learning_rate': 2.0102672257925137e-06, 'epoch': 2.4} +2025-02-06 02:05:13 - ERROR - stderr - 80%|████████ | 17965/22434 [15:57:33<3:05:32, 2.49s/it] +2025-02-06 02:05:15 - ERROR - stderr - 80%|████████ | 17966/22434 [15:57:35<3:05:41, 2.49s/it] +2025-02-06 02:05:15 - ERROR - stderr - +2025-02-06 02:05:15 - ERROR - stderr - +2025-02-06 02:05:15 - INFO - stdout - {'loss': 0.3779, 'grad_norm': 1.7912224531173706, 'learning_rate': 2.009399088169015e-06, 'epoch': 2.4} +2025-02-06 02:05:15 - ERROR - stderr - 80%|████████ | 17966/22434 [15:57:35<3:05:41, 2.49s/it] +2025-02-06 02:05:18 - ERROR - stderr - 80%|████████ | 17967/22434 [15:57:38<3:05:48, 2.50s/it] +2025-02-06 02:05:18 - ERROR - stderr - +2025-02-06 02:05:18 - ERROR - stderr - +2025-02-06 02:05:18 - INFO - stdout - {'loss': 0.3826, 'grad_norm': 1.6365660429000854, 'learning_rate': 2.008531117101943e-06, 'epoch': 2.4} +2025-02-06 02:05:18 - ERROR - stderr - 80%|████████ | 17967/22434 [15:57:38<3:05:48, 2.50s/it] +2025-02-06 02:05:20 - ERROR - stderr - 80%|████████ | 17968/22434 [15:57:40<3:05:59, 2.50s/it] +2025-02-06 02:05:20 - ERROR - stderr - +2025-02-06 02:05:20 - ERROR - stderr - +2025-02-06 02:05:20 - INFO - stdout - {'loss': 0.3122, 'grad_norm': 1.3388489484786987, 'learning_rate': 2.007663312609394e-06, 'epoch': 2.4} +2025-02-06 
02:05:20 - ERROR - stderr - 80%|████████ | 17968/22434 [15:57:40<3:05:59, 2.50s/it] +2025-02-06 02:05:23 - ERROR - stderr - 80%|████████ | 17969/22434 [15:57:43<3:06:36, 2.51s/it] +2025-02-06 02:05:23 - ERROR - stderr - +2025-02-06 02:05:23 - ERROR - stderr - +2025-02-06 02:05:23 - INFO - stdout - {'loss': 0.3374, 'grad_norm': 1.5664211511611938, 'learning_rate': 2.0067956747094542e-06, 'epoch': 2.4} +2025-02-06 02:05:23 - ERROR - stderr - 80%|████████ | 17969/22434 [15:57:43<3:06:36, 2.51s/it] +2025-02-06 02:05:25 - ERROR - stderr - 80%|████████ | 17970/22434 [15:57:45<3:07:04, 2.51s/it] +2025-02-06 02:05:25 - ERROR - stderr - +2025-02-06 02:05:25 - ERROR - stderr - +2025-02-06 02:05:25 - INFO - stdout - {'loss': 0.3469, 'grad_norm': 1.5719892978668213, 'learning_rate': 2.0059282034202097e-06, 'epoch': 2.4} +2025-02-06 02:05:25 - ERROR - stderr - 80%|████████ | 17970/22434 [15:57:45<3:07:04, 2.51s/it] +2025-02-06 02:05:28 - ERROR - stderr - 80%|████████ | 17971/22434 [15:57:48<3:08:43, 2.54s/it] +2025-02-06 02:05:28 - ERROR - stderr - +2025-02-06 02:05:28 - ERROR - stderr - +2025-02-06 02:05:28 - INFO - stdout - {'loss': 0.3754, 'grad_norm': 1.5805082321166992, 'learning_rate': 2.005060898759743e-06, 'epoch': 2.4} +2025-02-06 02:05:28 - ERROR - stderr - 80%|████████ | 17971/22434 [15:57:48<3:08:43, 2.54s/it] +2025-02-06 02:05:31 - ERROR - stderr - 80%|████████ | 17972/22434 [15:57:50<3:07:31, 2.52s/it] +2025-02-06 02:05:31 - ERROR - stderr - +2025-02-06 02:05:31 - ERROR - stderr - +2025-02-06 02:05:31 - INFO - stdout - {'loss': 0.3975, 'grad_norm': 1.529064655303955, 'learning_rate': 2.0041937607461315e-06, 'epoch': 2.4} +2025-02-06 02:05:31 - ERROR - stderr - 80%|████████ | 17972/22434 [15:57:50<3:07:31, 2.52s/it] +2025-02-06 02:05:33 - ERROR - stderr - 80%|████████ | 17973/22434 [15:57:53<3:06:24, 2.51s/it] +2025-02-06 02:05:33 - ERROR - stderr - +2025-02-06 02:05:33 - ERROR - stderr - +2025-02-06 02:05:33 - INFO - stdout - {'loss': 0.3929, 'grad_norm': 
1.646658182144165, 'learning_rate': 2.0033267893974495e-06, 'epoch': 2.4} +2025-02-06 02:05:33 - ERROR - stderr - 80%|████████ | 17973/22434 [15:57:53<3:06:24, 2.51s/it] +2025-02-06 02:05:35 - ERROR - stderr - 80%|████████ | 17974/22434 [15:57:55<3:04:25, 2.48s/it] +2025-02-06 02:05:35 - ERROR - stderr - +2025-02-06 02:05:35 - ERROR - stderr - +2025-02-06 02:05:35 - INFO - stdout - {'loss': 0.3698, 'grad_norm': 1.5485522747039795, 'learning_rate': 2.0024599847317695e-06, 'epoch': 2.4} +2025-02-06 02:05:35 - ERROR - stderr - 80%|████████ | 17974/22434 [15:57:55<3:04:25, 2.48s/it] +2025-02-06 02:05:38 - ERROR - stderr - 80%|████████ | 17975/22434 [15:57:58<3:04:24, 2.48s/it] +2025-02-06 02:05:38 - ERROR - stderr - +2025-02-06 02:05:38 - ERROR - stderr - +2025-02-06 02:05:38 - INFO - stdout - {'loss': 0.368, 'grad_norm': 1.4969358444213867, 'learning_rate': 2.001593346767158e-06, 'epoch': 2.4} +2025-02-06 02:05:38 - ERROR - stderr - 80%|████████ | 17975/22434 [15:57:58<3:04:24, 2.48s/it] +2025-02-06 02:05:40 - ERROR - stderr - 80%|████████ | 17976/22434 [15:58:00<3:04:32, 2.48s/it] +2025-02-06 02:05:40 - ERROR - stderr - +2025-02-06 02:05:40 - ERROR - stderr - +2025-02-06 02:05:40 - INFO - stdout - {'loss': 0.3562, 'grad_norm': 1.4068106412887573, 'learning_rate': 2.000726875521679e-06, 'epoch': 2.4} +2025-02-06 02:05:40 - ERROR - stderr - 80%|████████ | 17976/22434 [15:58:00<3:04:32, 2.48s/it] +2025-02-06 02:05:43 - ERROR - stderr - 80%|████████ | 17977/22434 [15:58:03<3:03:39, 2.47s/it] +2025-02-06 02:05:43 - ERROR - stderr - +2025-02-06 02:05:43 - ERROR - stderr - +2025-02-06 02:05:43 - INFO - stdout - {'loss': 0.3621, 'grad_norm': 1.4809266328811646, 'learning_rate': 1.999860571013393e-06, 'epoch': 2.4} +2025-02-06 02:05:43 - ERROR - stderr - 80%|████████ | 17977/22434 [15:58:03<3:03:39, 2.47s/it] +2025-02-06 02:05:45 - ERROR - stderr - 80%|████████ | 17978/22434 [15:58:05<3:03:48, 2.47s/it] +2025-02-06 02:05:45 - ERROR - stderr - +2025-02-06 02:05:45 - ERROR - 
stderr - +2025-02-06 02:05:45 - INFO - stdout - {'loss': 0.3618, 'grad_norm': 1.555912733078003, 'learning_rate': 1.998994433260363e-06, 'epoch': 2.4} +2025-02-06 02:05:45 - ERROR - stderr - 80%|████████ | 17978/22434 [15:58:05<3:03:48, 2.47s/it] +2025-02-06 02:05:48 - ERROR - stderr - 80%|████████ | 17979/22434 [15:58:08<3:05:53, 2.50s/it] +2025-02-06 02:05:48 - ERROR - stderr - +2025-02-06 02:05:48 - ERROR - stderr - +2025-02-06 02:05:48 - INFO - stdout - {'loss': 0.3446, 'grad_norm': 1.526281476020813, 'learning_rate': 1.9981284622806306e-06, 'epoch': 2.4} +2025-02-06 02:05:48 - ERROR - stderr - 80%|████████ | 17979/22434 [15:58:08<3:05:53, 2.50s/it] +2025-02-06 02:05:50 - ERROR - stderr - 80%|████████ | 17980/22434 [15:58:10<3:07:01, 2.52s/it] +2025-02-06 02:05:50 - ERROR - stderr - +2025-02-06 02:05:50 - ERROR - stderr - +2025-02-06 02:05:50 - INFO - stdout - {'loss': 0.3707, 'grad_norm': 1.6385716199874878, 'learning_rate': 1.9972626580922573e-06, 'epoch': 2.4} +2025-02-06 02:05:50 - ERROR - stderr - 80%|████████ | 17980/22434 [15:58:10<3:07:01, 2.52s/it] +2025-02-06 02:05:53 - ERROR - stderr - 80%|████████ | 17981/22434 [15:58:13<3:06:31, 2.51s/it] +2025-02-06 02:05:53 - ERROR - stderr - +2025-02-06 02:05:53 - ERROR - stderr - +2025-02-06 02:05:53 - INFO - stdout - {'loss': 0.3503, 'grad_norm': 1.5952492952346802, 'learning_rate': 1.9963970207132854e-06, 'epoch': 2.4} +2025-02-06 02:05:53 - ERROR - stderr - 80%|████████ | 17981/22434 [15:58:13<3:06:31, 2.51s/it] +2025-02-06 02:05:56 - ERROR - stderr - 80%|████████ | 17982/22434 [15:58:15<3:08:01, 2.53s/it] +2025-02-06 02:05:56 - ERROR - stderr - +2025-02-06 02:05:56 - ERROR - stderr - +2025-02-06 02:05:56 - INFO - stdout - {'loss': 0.366, 'grad_norm': 1.4557716846466064, 'learning_rate': 1.995531550161759e-06, 'epoch': 2.4} +2025-02-06 02:05:56 - ERROR - stderr - 80%|████████ | 17982/22434 [15:58:15<3:08:01, 2.53s/it] +2025-02-06 02:05:58 - ERROR - stderr - 80%|████████ | 17983/22434 [15:58:18<3:05:53, 
2.51s/it] +2025-02-06 02:05:58 - ERROR - stderr - +2025-02-06 02:05:58 - ERROR - stderr - +2025-02-06 02:05:58 - INFO - stdout - {'loss': 0.3195, 'grad_norm': 1.4788583517074585, 'learning_rate': 1.994666246455721e-06, 'epoch': 2.4} +2025-02-06 02:05:58 - ERROR - stderr - 80%|████████ | 17983/22434 [15:58:18<3:05:53, 2.51s/it] +2025-02-06 02:06:00 - ERROR - stderr - 80%|████████ | 17984/22434 [15:58:20<3:05:49, 2.51s/it] +2025-02-06 02:06:00 - ERROR - stderr - +2025-02-06 02:06:00 - ERROR - stderr - +2025-02-06 02:06:00 - INFO - stdout - {'loss': 0.3683, 'grad_norm': 1.3122265338897705, 'learning_rate': 1.9938011096131993e-06, 'epoch': 2.4} +2025-02-06 02:06:00 - ERROR - stderr - 80%|████████ | 17984/22434 [15:58:20<3:05:49, 2.51s/it] +2025-02-06 02:06:03 - ERROR - stderr - 80%|████████ | 17985/22434 [15:58:23<3:11:43, 2.59s/it] +2025-02-06 02:06:03 - ERROR - stderr - +2025-02-06 02:06:03 - ERROR - stderr - +2025-02-06 02:06:03 - INFO - stdout - {'loss': 0.3418, 'grad_norm': 1.5411807298660278, 'learning_rate': 1.9929361396522386e-06, 'epoch': 2.41} +2025-02-06 02:06:03 - ERROR - stderr - 80%|████████ | 17985/22434 [15:58:23<3:11:43, 2.59s/it] +2025-02-06 02:06:06 - ERROR - stderr - 80%|████████ | 17986/22434 [15:58:25<3:09:37, 2.56s/it] +2025-02-06 02:06:06 - ERROR - stderr - +2025-02-06 02:06:06 - ERROR - stderr - +2025-02-06 02:06:06 - INFO - stdout - {'loss': 0.3975, 'grad_norm': 1.5965473651885986, 'learning_rate': 1.9920713365908586e-06, 'epoch': 2.41} +2025-02-06 02:06:06 - ERROR - stderr - 80%|████████ | 17986/22434 [15:58:26<3:09:37, 2.56s/it] +2025-02-06 02:06:08 - ERROR - stderr - 80%|████████ | 17987/22434 [15:58:28<3:08:44, 2.55s/it] +2025-02-06 02:06:08 - ERROR - stderr - +2025-02-06 02:06:08 - ERROR - stderr - +2025-02-06 02:06:08 - INFO - stdout - {'loss': 0.3787, 'grad_norm': 1.73786199092865, 'learning_rate': 1.9912067004470892e-06, 'epoch': 2.41} +2025-02-06 02:06:08 - ERROR - stderr - 80%|████████ | 17987/22434 [15:58:28<3:08:44, 2.55s/it] 
+2025-02-06 02:06:11 - ERROR - stderr - 80%|████████ | 17988/22434 [15:58:31<3:09:19, 2.56s/it] +2025-02-06 02:06:11 - ERROR - stderr - +2025-02-06 02:06:11 - ERROR - stderr - +2025-02-06 02:06:11 - INFO - stdout - {'loss': 0.3749, 'grad_norm': 1.5518689155578613, 'learning_rate': 1.990342231238952e-06, 'epoch': 2.41} +2025-02-06 02:06:11 - ERROR - stderr - 80%|████████ | 17988/22434 [15:58:31<3:09:19, 2.56s/it] +2025-02-06 02:06:13 - ERROR - stderr - 80%|████████ | 17989/22434 [15:58:33<3:08:32, 2.55s/it] +2025-02-06 02:06:13 - ERROR - stderr - +2025-02-06 02:06:13 - ERROR - stderr - +2025-02-06 02:06:13 - INFO - stdout - {'loss': 0.3583, 'grad_norm': 1.6562516689300537, 'learning_rate': 1.9894779289844646e-06, 'epoch': 2.41} +2025-02-06 02:06:13 - ERROR - stderr - 80%|████████ | 17989/22434 [15:58:33<3:08:32, 2.55s/it] +2025-02-06 02:06:16 - ERROR - stderr - 80%|████████ | 17990/22434 [15:58:36<3:05:42, 2.51s/it] +2025-02-06 02:06:16 - ERROR - stderr - +2025-02-06 02:06:16 - ERROR - stderr - +2025-02-06 02:06:16 - INFO - stdout - {'loss': 0.3936, 'grad_norm': 1.7126809358596802, 'learning_rate': 1.9886137937016493e-06, 'epoch': 2.41} +2025-02-06 02:06:16 - ERROR - stderr - 80%|████████ | 17990/22434 [15:58:36<3:05:42, 2.51s/it] +2025-02-06 02:06:18 - ERROR - stderr - 80%|████████ | 17991/22434 [15:58:38<3:04:07, 2.49s/it] +2025-02-06 02:06:18 - ERROR - stderr - +2025-02-06 02:06:18 - ERROR - stderr - +2025-02-06 02:06:18 - INFO - stdout - {'loss': 0.4516, 'grad_norm': 1.739372968673706, 'learning_rate': 1.9877498254085103e-06, 'epoch': 2.41} +2025-02-06 02:06:18 - ERROR - stderr - 80%|████████ | 17991/22434 [15:58:38<3:04:07, 2.49s/it] +2025-02-06 02:06:21 - ERROR - stderr - 80%|████████ | 17992/22434 [15:58:40<3:03:00, 2.47s/it] +2025-02-06 02:06:21 - ERROR - stderr - +2025-02-06 02:06:21 - ERROR - stderr - +2025-02-06 02:06:21 - INFO - stdout - {'loss': 0.418, 'grad_norm': 1.5956242084503174, 'learning_rate': 1.9868860241230604e-06, 'epoch': 2.41} +2025-02-06 
02:06:21 - ERROR - stderr - 80%|████████ | 17992/22434 [15:58:40<3:03:00, 2.47s/it] +2025-02-06 02:06:23 - ERROR - stderr - 80%|████████ | 17993/22434 [15:58:43<3:04:37, 2.49s/it] +2025-02-06 02:06:23 - ERROR - stderr - +2025-02-06 02:06:23 - ERROR - stderr - +2025-02-06 02:06:23 - INFO - stdout - {'loss': 0.301, 'grad_norm': 1.2999794483184814, 'learning_rate': 1.9860223898633023e-06, 'epoch': 2.41} +2025-02-06 02:06:23 - ERROR - stderr - 80%|████████ | 17993/22434 [15:58:43<3:04:37, 2.49s/it] +2025-02-06 02:06:26 - ERROR - stderr - 80%|████████ | 17994/22434 [15:58:45<3:03:39, 2.48s/it] +2025-02-06 02:06:26 - ERROR - stderr - +2025-02-06 02:06:26 - ERROR - stderr - +2025-02-06 02:06:26 - INFO - stdout - {'loss': 0.3231, 'grad_norm': 1.4201385974884033, 'learning_rate': 1.9851589226472402e-06, 'epoch': 2.41} +2025-02-06 02:06:26 - ERROR - stderr - 80%|████████ | 17994/22434 [15:58:45<3:03:39, 2.48s/it] +2025-02-06 02:06:28 - ERROR - stderr - 80%|████████ | 17995/22434 [15:58:48<3:02:55, 2.47s/it] +2025-02-06 02:06:28 - ERROR - stderr - +2025-02-06 02:06:28 - ERROR - stderr - +2025-02-06 02:06:28 - INFO - stdout - {'loss': 0.3849, 'grad_norm': 1.7690218687057495, 'learning_rate': 1.98429562249287e-06, 'epoch': 2.41} +2025-02-06 02:06:28 - ERROR - stderr - 80%|████████ | 17995/22434 [15:58:48<3:02:55, 2.47s/it] +2025-02-06 02:06:31 - ERROR - stderr - 80%|████████ | 17996/22434 [15:58:50<3:02:39, 2.47s/it] +2025-02-06 02:06:31 - ERROR - stderr - +2025-02-06 02:06:31 - ERROR - stderr - +2025-02-06 02:06:31 - INFO - stdout - {'loss': 0.3738, 'grad_norm': 1.4726568460464478, 'learning_rate': 1.983432489418189e-06, 'epoch': 2.41} +2025-02-06 02:06:31 - ERROR - stderr - 80%|████████ | 17996/22434 [15:58:50<3:02:39, 2.47s/it] +2025-02-06 02:06:33 - ERROR - stderr - 80%|████████ | 17997/22434 [15:58:53<3:03:32, 2.48s/it] +2025-02-06 02:06:33 - ERROR - stderr - +2025-02-06 02:06:33 - ERROR - stderr - +2025-02-06 02:06:33 - INFO - stdout - {'loss': 0.3846, 'grad_norm': 
1.433272361755371, 'learning_rate': 1.9825695234411847e-06, 'epoch': 2.41} +2025-02-06 02:06:33 - ERROR - stderr - 80%|████████ | 17997/22434 [15:58:53<3:03:32, 2.48s/it] +2025-02-06 02:06:35 - ERROR - stderr - 80%|████████ | 17998/22434 [15:58:55<3:02:41, 2.47s/it] +2025-02-06 02:06:36 - ERROR - stderr - +2025-02-06 02:06:36 - ERROR - stderr - +2025-02-06 02:06:36 - INFO - stdout - {'loss': 0.4078, 'grad_norm': 1.7957602739334106, 'learning_rate': 1.981706724579848e-06, 'epoch': 2.41} +2025-02-06 02:06:36 - ERROR - stderr - 80%|████████ | 17998/22434 [15:58:55<3:02:41, 2.47s/it] +2025-02-06 02:06:38 - ERROR - stderr - 80%|████████ | 17999/22434 [15:58:58<3:02:20, 2.47s/it] +2025-02-06 02:06:38 - ERROR - stderr - +2025-02-06 02:06:38 - ERROR - stderr - +2025-02-06 02:06:38 - INFO - stdout - {'loss': 0.393, 'grad_norm': 1.7641193866729736, 'learning_rate': 1.980844092852162e-06, 'epoch': 2.41} +2025-02-06 02:06:38 - ERROR - stderr - 80%|████████ | 17999/22434 [15:58:58<3:02:20, 2.47s/it] +2025-02-06 02:06:40 - ERROR - stderr - 80%|████████ | 18000/22434 [15:59:00<3:02:37, 2.47s/it] +2025-02-06 02:06:40 - ERROR - stderr - +2025-02-06 02:06:40 - ERROR - stderr - +2025-02-06 02:06:40 - INFO - stdout - {'loss': 0.3569, 'grad_norm': 1.5508638620376587, 'learning_rate': 1.9799816282761064e-06, 'epoch': 2.41} +2025-02-06 02:06:40 - ERROR - stderr - 80%|████████ | 18000/22434 [15:59:00<3:02:37, 2.47s/it] +2025-02-06 02:06:43 - ERROR - stderr - 80%|████████ | 18001/22434 [15:59:03<3:02:49, 2.47s/it] +2025-02-06 02:06:43 - ERROR - stderr - +2025-02-06 02:06:43 - ERROR - stderr - +2025-02-06 02:06:43 - INFO - stdout - {'loss': 0.4101, 'grad_norm': 1.6310198307037354, 'learning_rate': 1.979119330869661e-06, 'epoch': 2.41} +2025-02-06 02:06:43 - ERROR - stderr - 80%|████████ | 18001/22434 [15:59:03<3:02:49, 2.47s/it] +2025-02-06 02:06:45 - ERROR - stderr - 80%|████████ | 18002/22434 [15:59:05<3:02:57, 2.48s/it] +2025-02-06 02:06:45 - ERROR - stderr - +2025-02-06 02:06:45 - ERROR 
- stderr - +2025-02-06 02:06:45 - INFO - stdout - {'loss': 0.3844, 'grad_norm': 1.676611304283142, 'learning_rate': 1.9782572006507995e-06, 'epoch': 2.41} +2025-02-06 02:06:45 - ERROR - stderr - 80%|████████ | 18002/22434 [15:59:05<3:02:57, 2.48s/it] +2025-02-06 02:06:48 - ERROR - stderr - 80%|████████ | 18003/22434 [15:59:08<3:02:22, 2.47s/it] +2025-02-06 02:06:48 - ERROR - stderr - +2025-02-06 02:06:48 - ERROR - stderr - +2025-02-06 02:06:48 - INFO - stdout - {'loss': 0.3692, 'grad_norm': 1.5769695043563843, 'learning_rate': 1.977395237637485e-06, 'epoch': 2.41} +2025-02-06 02:06:48 - ERROR - stderr - 80%|████████ | 18003/22434 [15:59:08<3:02:22, 2.47s/it] +2025-02-06 02:06:50 - ERROR - stderr - 80%|████████ | 18004/22434 [15:59:10<3:02:03, 2.47s/it] +2025-02-06 02:06:50 - ERROR - stderr - +2025-02-06 02:06:50 - ERROR - stderr - +2025-02-06 02:06:50 - INFO - stdout - {'loss': 0.3443, 'grad_norm': 1.7358254194259644, 'learning_rate': 1.9765334418476967e-06, 'epoch': 2.41} +2025-02-06 02:06:50 - ERROR - stderr - 80%|████████ | 18004/22434 [15:59:10<3:02:03, 2.47s/it] +2025-02-06 02:06:53 - ERROR - stderr - 80%|████████ | 18005/22434 [15:59:13<3:05:11, 2.51s/it] +2025-02-06 02:06:53 - ERROR - stderr - +2025-02-06 02:06:53 - ERROR - stderr - +2025-02-06 02:06:53 - INFO - stdout - {'loss': 0.4427, 'grad_norm': 1.6642705202102661, 'learning_rate': 1.9756718132993848e-06, 'epoch': 2.41} +2025-02-06 02:06:53 - ERROR - stderr - 80%|████████ | 18005/22434 [15:59:13<3:05:11, 2.51s/it] +2025-02-06 02:06:55 - ERROR - stderr - 80%|████████ | 18006/22434 [15:59:15<3:04:19, 2.50s/it] +2025-02-06 02:06:55 - ERROR - stderr - +2025-02-06 02:06:55 - ERROR - stderr - +2025-02-06 02:06:55 - INFO - stdout - {'loss': 0.3807, 'grad_norm': 1.6510920524597168, 'learning_rate': 1.974810352010519e-06, 'epoch': 2.41} +2025-02-06 02:06:55 - ERROR - stderr - 80%|████████ | 18006/22434 [15:59:15<3:04:19, 2.50s/it] +2025-02-06 02:06:58 - ERROR - stderr - 80%|████████ | 18007/22434 
[15:59:18<3:03:53, 2.49s/it] +2025-02-06 02:06:58 - ERROR - stderr - +2025-02-06 02:06:58 - ERROR - stderr - +2025-02-06 02:06:58 - INFO - stdout - {'loss': 0.3754, 'grad_norm': 1.6528747081756592, 'learning_rate': 1.973949057999054e-06, 'epoch': 2.41} +2025-02-06 02:06:58 - ERROR - stderr - 80%|████████ | 18007/22434 [15:59:18<3:03:53, 2.49s/it] +2025-02-06 02:07:00 - ERROR - stderr - 80%|████████ | 18008/22434 [15:59:20<3:04:26, 2.50s/it] +2025-02-06 02:07:00 - ERROR - stderr - +2025-02-06 02:07:00 - ERROR - stderr - +2025-02-06 02:07:00 - INFO - stdout - {'loss': 0.3488, 'grad_norm': 1.5447001457214355, 'learning_rate': 1.9730879312829354e-06, 'epoch': 2.41} +2025-02-06 02:07:00 - ERROR - stderr - 80%|████████ | 18008/22434 [15:59:20<3:04:26, 2.50s/it] +2025-02-06 02:07:03 - ERROR - stderr - 80%|████████ | 18009/22434 [15:59:23<3:05:34, 2.52s/it] +2025-02-06 02:07:03 - ERROR - stderr - +2025-02-06 02:07:03 - ERROR - stderr - +2025-02-06 02:07:03 - INFO - stdout - {'loss': 0.3525, 'grad_norm': 1.3964084386825562, 'learning_rate': 1.9722269718801236e-06, 'epoch': 2.41} +2025-02-06 02:07:03 - ERROR - stderr - 80%|████████ | 18009/22434 [15:59:23<3:05:34, 2.52s/it] +2025-02-06 02:07:06 - ERROR - stderr - 80%|████████ | 18010/22434 [15:59:25<3:06:44, 2.53s/it] +2025-02-06 02:07:06 - ERROR - stderr - +2025-02-06 02:07:06 - ERROR - stderr - +2025-02-06 02:07:06 - INFO - stdout - {'loss': 0.3765, 'grad_norm': 1.3696916103363037, 'learning_rate': 1.9713661798085557e-06, 'epoch': 2.41} +2025-02-06 02:07:06 - ERROR - stderr - 80%|████████ | 18010/22434 [15:59:25<3:06:44, 2.53s/it] +2025-02-06 02:07:08 - ERROR - stderr - 80%|████████ | 18011/22434 [15:59:28<3:06:16, 2.53s/it] +2025-02-06 02:07:08 - ERROR - stderr - +2025-02-06 02:07:08 - ERROR - stderr - +2025-02-06 02:07:08 - INFO - stdout - {'loss': 0.3591, 'grad_norm': 1.7484173774719238, 'learning_rate': 1.9705055550861784e-06, 'epoch': 2.41} +2025-02-06 02:07:08 - ERROR - stderr - 80%|████████ | 18011/22434 
[15:59:28<3:06:16, 2.53s/it] +2025-02-06 02:07:10 - ERROR - stderr - 80%|████████ | 18012/22434 [15:59:30<3:04:34, 2.50s/it] +2025-02-06 02:07:11 - ERROR - stderr - +2025-02-06 02:07:11 - ERROR - stderr - +2025-02-06 02:07:11 - INFO - stdout - {'loss': 0.3635, 'grad_norm': 1.5583627223968506, 'learning_rate': 1.9696450977309278e-06, 'epoch': 2.41} +2025-02-06 02:07:11 - ERROR - stderr - 80%|████████ | 18012/22434 [15:59:30<3:04:34, 2.50s/it] +2025-02-06 02:07:13 - ERROR - stderr - 80%|████████ | 18013/22434 [15:59:33<3:05:36, 2.52s/it] +2025-02-06 02:07:13 - ERROR - stderr - +2025-02-06 02:07:13 - ERROR - stderr - +2025-02-06 02:07:13 - INFO - stdout - {'loss': 0.3418, 'grad_norm': 1.4588489532470703, 'learning_rate': 1.968784807760742e-06, 'epoch': 2.41} +2025-02-06 02:07:13 - ERROR - stderr - 80%|████████ | 18013/22434 [15:59:33<3:05:36, 2.52s/it] +2025-02-06 02:07:16 - ERROR - stderr - 80%|████████ | 18014/22434 [15:59:35<3:04:53, 2.51s/it] +2025-02-06 02:07:16 - ERROR - stderr - +2025-02-06 02:07:16 - ERROR - stderr - +2025-02-06 02:07:16 - INFO - stdout - {'loss': 0.3505, 'grad_norm': 1.4994537830352783, 'learning_rate': 1.967924685193552e-06, 'epoch': 2.41} +2025-02-06 02:07:16 - ERROR - stderr - 80%|████████ | 18014/22434 [15:59:35<3:04:53, 2.51s/it] +2025-02-06 02:07:18 - ERROR - stderr - 80%|████████ | 18015/22434 [15:59:38<3:04:52, 2.51s/it] +2025-02-06 02:07:18 - ERROR - stderr - +2025-02-06 02:07:18 - ERROR - stderr - +2025-02-06 02:07:18 - INFO - stdout - {'loss': 0.3921, 'grad_norm': 1.3571383953094482, 'learning_rate': 1.9670647300472856e-06, 'epoch': 2.41} +2025-02-06 02:07:18 - ERROR - stderr - 80%|████████ | 18015/22434 [15:59:38<3:04:52, 2.51s/it] +2025-02-06 02:07:20 - ERROR - stderr - 80%|████████ | 18016/22434 [15:59:40<3:03:49, 2.50s/it] +2025-02-06 02:07:21 - ERROR - stderr - +2025-02-06 02:07:21 - ERROR - stderr - +2025-02-06 02:07:21 - INFO - stdout - {'loss': 0.3164, 'grad_norm': 1.178723931312561, 'learning_rate': 1.966204942339869e-06, 
'epoch': 2.41} +2025-02-06 02:07:21 - ERROR - stderr - 80%|████████ | 18016/22434 [15:59:40<3:03:49, 2.50s/it] +2025-02-06 02:07:23 - ERROR - stderr - 80%|████████ | 18017/22434 [15:59:43<3:02:58, 2.49s/it] +2025-02-06 02:07:23 - ERROR - stderr - +2025-02-06 02:07:23 - ERROR - stderr - +2025-02-06 02:07:23 - INFO - stdout - {'loss': 0.3848, 'grad_norm': 1.6758408546447754, 'learning_rate': 1.9653453220892217e-06, 'epoch': 2.41} +2025-02-06 02:07:23 - ERROR - stderr - 80%|████████ | 18017/22434 [15:59:43<3:02:58, 2.49s/it] +2025-02-06 02:07:26 - ERROR - stderr - 80%|████████ | 18018/22434 [15:59:45<3:04:27, 2.51s/it] +2025-02-06 02:07:26 - ERROR - stderr - +2025-02-06 02:07:26 - ERROR - stderr - +2025-02-06 02:07:26 - INFO - stdout - {'loss': 0.4024, 'grad_norm': 1.3990116119384766, 'learning_rate': 1.9644858693132627e-06, 'epoch': 2.41} +2025-02-06 02:07:26 - ERROR - stderr - 80%|████████ | 18018/22434 [15:59:45<3:04:27, 2.51s/it] +2025-02-06 02:07:28 - ERROR - stderr - 80%|████████ | 18019/22434 [15:59:48<3:04:53, 2.51s/it] +2025-02-06 02:07:28 - ERROR - stderr - +2025-02-06 02:07:28 - ERROR - stderr - +2025-02-06 02:07:28 - INFO - stdout - {'loss': 0.3369, 'grad_norm': 1.5540207624435425, 'learning_rate': 1.9636265840299075e-06, 'epoch': 2.41} +2025-02-06 02:07:28 - ERROR - stderr - 80%|████████ | 18019/22434 [15:59:48<3:04:53, 2.51s/it] +2025-02-06 02:07:31 - ERROR - stderr - 80%|████████ | 18020/22434 [15:59:50<3:04:18, 2.51s/it] +2025-02-06 02:07:31 - ERROR - stderr - +2025-02-06 02:07:31 - ERROR - stderr - +2025-02-06 02:07:31 - INFO - stdout - {'loss': 0.4021, 'grad_norm': 1.4744235277175903, 'learning_rate': 1.962767466257066e-06, 'epoch': 2.41} +2025-02-06 02:07:31 - ERROR - stderr - 80%|████████ | 18020/22434 [15:59:50<3:04:18, 2.51s/it] +2025-02-06 02:07:33 - ERROR - stderr - 80%|████████ | 18021/22434 [15:59:53<3:04:21, 2.51s/it] +2025-02-06 02:07:33 - ERROR - stderr - +2025-02-06 02:07:33 - ERROR - stderr - +2025-02-06 02:07:33 - INFO - stdout - 
{'loss': 0.3363, 'grad_norm': 1.4892199039459229, 'learning_rate': 1.961908516012646e-06, 'epoch': 2.41} +2025-02-06 02:07:33 - ERROR - stderr - 80%|████████ | 18021/22434 [15:59:53<3:04:21, 2.51s/it] +2025-02-06 02:07:36 - ERROR - stderr - 80%|████████ | 18022/22434 [15:59:55<3:04:26, 2.51s/it] +2025-02-06 02:07:36 - ERROR - stderr - +2025-02-06 02:07:36 - ERROR - stderr - +2025-02-06 02:07:36 - INFO - stdout - {'loss': 0.376, 'grad_norm': 1.689810037612915, 'learning_rate': 1.9610497333145506e-06, 'epoch': 2.41} +2025-02-06 02:07:36 - ERROR - stderr - 80%|████████ | 18022/22434 [15:59:55<3:04:26, 2.51s/it] +2025-02-06 02:07:38 - ERROR - stderr - 80%|████████ | 18023/22434 [15:59:58<3:04:57, 2.52s/it] +2025-02-06 02:07:38 - ERROR - stderr - +2025-02-06 02:07:38 - ERROR - stderr - +2025-02-06 02:07:38 - INFO - stdout - {'loss': 0.3979, 'grad_norm': 1.4549789428710938, 'learning_rate': 1.9601911181806845e-06, 'epoch': 2.41} +2025-02-06 02:07:38 - ERROR - stderr - 80%|████████ | 18023/22434 [15:59:58<3:04:57, 2.52s/it] +2025-02-06 02:07:41 - ERROR - stderr - 80%|████████ | 18024/22434 [16:00:00<3:05:16, 2.52s/it] +2025-02-06 02:07:41 - ERROR - stderr - +2025-02-06 02:07:41 - ERROR - stderr - +2025-02-06 02:07:41 - INFO - stdout - {'loss': 0.3003, 'grad_norm': 1.3405613899230957, 'learning_rate': 1.959332670628936e-06, 'epoch': 2.41} +2025-02-06 02:07:41 - ERROR - stderr - 80%|████████ | 18024/22434 [16:00:00<3:05:16, 2.52s/it] +2025-02-06 02:07:43 - ERROR - stderr - 80%|████████ | 18025/22434 [16:00:03<3:06:31, 2.54s/it] +2025-02-06 02:07:43 - ERROR - stderr - +2025-02-06 02:07:43 - ERROR - stderr - +2025-02-06 02:07:43 - INFO - stdout - {'loss': 0.4037, 'grad_norm': 1.635075569152832, 'learning_rate': 1.9584743906772063e-06, 'epoch': 2.41} +2025-02-06 02:07:43 - ERROR - stderr - 80%|████████ | 18025/22434 [16:00:03<3:06:31, 2.54s/it] +2025-02-06 02:07:46 - ERROR - stderr - 80%|████████ | 18026/22434 [16:00:05<3:05:47, 2.53s/it] +2025-02-06 02:07:46 - ERROR - stderr 
- +2025-02-06 02:07:46 - ERROR - stderr - +2025-02-06 02:07:46 - INFO - stdout - {'loss': 0.407, 'grad_norm': 1.6496763229370117, 'learning_rate': 1.9576162783433826e-06, 'epoch': 2.41} +2025-02-06 02:07:46 - ERROR - stderr - 80%|████████ | 18026/22434 [16:00:06<3:05:47, 2.53s/it] +2025-02-06 02:07:48 - ERROR - stderr - 80%|████████ | 18027/22434 [16:00:08<3:03:25, 2.50s/it] +2025-02-06 02:07:48 - ERROR - stderr - +2025-02-06 02:07:48 - ERROR - stderr - +2025-02-06 02:07:48 - INFO - stdout - {'loss': 0.3422, 'grad_norm': 1.578217625617981, 'learning_rate': 1.9567583336453523e-06, 'epoch': 2.41} +2025-02-06 02:07:48 - ERROR - stderr - 80%|████████ | 18027/22434 [16:00:08<3:03:25, 2.50s/it] +2025-02-06 02:07:51 - ERROR - stderr - 80%|████████ | 18028/22434 [16:00:10<3:02:40, 2.49s/it] +2025-02-06 02:07:51 - ERROR - stderr - +2025-02-06 02:07:51 - ERROR - stderr - +2025-02-06 02:07:51 - INFO - stdout - {'loss': 0.3659, 'grad_norm': 1.4393810033798218, 'learning_rate': 1.9559005566010013e-06, 'epoch': 2.41} +2025-02-06 02:07:51 - ERROR - stderr - 80%|████████ | 18028/22434 [16:00:10<3:02:40, 2.49s/it] +2025-02-06 02:07:53 - ERROR - stderr - 80%|████████ | 18029/22434 [16:00:13<3:02:01, 2.48s/it] +2025-02-06 02:07:53 - ERROR - stderr - +2025-02-06 02:07:53 - ERROR - stderr - +2025-02-06 02:07:53 - INFO - stdout - {'loss': 0.3978, 'grad_norm': 1.659801959991455, 'learning_rate': 1.9550429472281995e-06, 'epoch': 2.41} +2025-02-06 02:07:53 - ERROR - stderr - 80%|████████ | 18029/22434 [16:00:13<3:02:01, 2.48s/it] +2025-02-06 02:07:56 - ERROR - stderr - 80%|████████ | 18030/22434 [16:00:15<3:04:31, 2.51s/it] +2025-02-06 02:07:56 - ERROR - stderr - +2025-02-06 02:07:56 - ERROR - stderr - +2025-02-06 02:07:56 - INFO - stdout - {'loss': 0.3682, 'grad_norm': 1.5541491508483887, 'learning_rate': 1.9541855055448346e-06, 'epoch': 2.41} +2025-02-06 02:07:56 - ERROR - stderr - 80%|████████ | 18030/22434 [16:00:15<3:04:31, 2.51s/it] +2025-02-06 02:07:58 - ERROR - stderr - 
80%|████████ | 18031/22434 [16:00:18<3:02:44, 2.49s/it] +2025-02-06 02:07:58 - ERROR - stderr - +2025-02-06 02:07:58 - ERROR - stderr - +2025-02-06 02:07:58 - INFO - stdout - {'loss': 0.3573, 'grad_norm': 1.5560804605484009, 'learning_rate': 1.9533282315687716e-06, 'epoch': 2.41} +2025-02-06 02:07:58 - ERROR - stderr - 80%|████████ | 18031/22434 [16:00:18<3:02:44, 2.49s/it] +2025-02-06 02:08:01 - ERROR - stderr - 80%|████████ | 18032/22434 [16:00:20<3:04:10, 2.51s/it] +2025-02-06 02:08:01 - ERROR - stderr - +2025-02-06 02:08:01 - ERROR - stderr - +2025-02-06 02:08:01 - INFO - stdout - {'loss': 0.3698, 'grad_norm': 1.6256047487258911, 'learning_rate': 1.952471125317882e-06, 'epoch': 2.41} +2025-02-06 02:08:01 - ERROR - stderr - 80%|████████ | 18032/22434 [16:00:20<3:04:10, 2.51s/it] +2025-02-06 02:08:03 - ERROR - stderr - 80%|████████ | 18033/22434 [16:00:23<3:06:13, 2.54s/it] +2025-02-06 02:08:03 - ERROR - stderr - +2025-02-06 02:08:03 - ERROR - stderr - +2025-02-06 02:08:03 - INFO - stdout - {'loss': 0.3433, 'grad_norm': 1.3780567646026611, 'learning_rate': 1.9516141868100304e-06, 'epoch': 2.41} +2025-02-06 02:08:03 - ERROR - stderr - 80%|████████ | 18033/22434 [16:00:23<3:06:13, 2.54s/it] +2025-02-06 02:08:06 - ERROR - stderr - 80%|████████ | 18034/22434 [16:00:25<3:04:31, 2.52s/it] +2025-02-06 02:08:06 - ERROR - stderr - +2025-02-06 02:08:06 - ERROR - stderr - +2025-02-06 02:08:06 - INFO - stdout - {'loss': 0.3118, 'grad_norm': 1.5133588314056396, 'learning_rate': 1.950757416063077e-06, 'epoch': 2.41} +2025-02-06 02:08:06 - ERROR - stderr - 80%|████████ | 18034/22434 [16:00:26<3:04:31, 2.52s/it] +2025-02-06 02:08:09 - ERROR - stderr - 80%|████████ | 18035/22434 [16:00:28<3:12:46, 2.63s/it] +2025-02-06 02:08:09 - ERROR - stderr - +2025-02-06 02:08:09 - ERROR - stderr - +2025-02-06 02:08:09 - INFO - stdout - {'loss': 0.3554, 'grad_norm': 1.6051510572433472, 'learning_rate': 1.9499008130948893e-06, 'epoch': 2.41} +2025-02-06 02:08:09 - ERROR - stderr - 80%|████████ 
| 18035/22434 [16:00:28<3:12:46, 2.63s/it] +2025-02-06 02:08:11 - ERROR - stderr - 80%|████████ | 18036/22434 [16:00:31<3:10:41, 2.60s/it] +2025-02-06 02:08:11 - ERROR - stderr - +2025-02-06 02:08:11 - ERROR - stderr - +2025-02-06 02:08:11 - INFO - stdout - {'loss': 0.3438, 'grad_norm': 1.5389469861984253, 'learning_rate': 1.9490443779233127e-06, 'epoch': 2.41} +2025-02-06 02:08:11 - ERROR - stderr - 80%|████████ | 18036/22434 [16:00:31<3:10:41, 2.60s/it] +2025-02-06 02:08:14 - ERROR - stderr - 80%|████████ | 18037/22434 [16:00:33<3:09:24, 2.58s/it] +2025-02-06 02:08:14 - ERROR - stderr - +2025-02-06 02:08:14 - ERROR - stderr - +2025-02-06 02:08:14 - INFO - stdout - {'loss': 0.3425, 'grad_norm': 1.4993011951446533, 'learning_rate': 1.9481881105662027e-06, 'epoch': 2.41} +2025-02-06 02:08:14 - ERROR - stderr - 80%|████████ | 18037/22434 [16:00:33<3:09:24, 2.58s/it] +2025-02-06 02:08:16 - ERROR - stderr - 80%|████████ | 18038/22434 [16:00:36<3:07:11, 2.56s/it] +2025-02-06 02:08:16 - ERROR - stderr - +2025-02-06 02:08:16 - ERROR - stderr - +2025-02-06 02:08:16 - INFO - stdout - {'loss': 0.3492, 'grad_norm': 1.7161153554916382, 'learning_rate': 1.947332011041406e-06, 'epoch': 2.41} +2025-02-06 02:08:16 - ERROR - stderr - 80%|████████ | 18038/22434 [16:00:36<3:07:11, 2.56s/it] +2025-02-06 02:08:19 - ERROR - stderr - 80%|████████ | 18039/22434 [16:00:38<3:04:34, 2.52s/it] +2025-02-06 02:08:19 - ERROR - stderr - +2025-02-06 02:08:19 - ERROR - stderr - +2025-02-06 02:08:19 - INFO - stdout - {'loss': 0.3969, 'grad_norm': 1.799184799194336, 'learning_rate': 1.946476079366768e-06, 'epoch': 2.41} +2025-02-06 02:08:19 - ERROR - stderr - 80%|████████ | 18039/22434 [16:00:38<3:04:34, 2.52s/it] +2025-02-06 02:08:21 - ERROR - stderr - 80%|████████ | 18040/22434 [16:00:41<3:05:01, 2.53s/it] +2025-02-06 02:08:21 - ERROR - stderr - +2025-02-06 02:08:21 - ERROR - stderr - +2025-02-06 02:08:21 - INFO - stdout - {'loss': 0.3695, 'grad_norm': 1.7864443063735962, 'learning_rate': 
1.9456203155601295e-06, 'epoch': 2.41} +2025-02-06 02:08:21 - ERROR - stderr - 80%|████████ | 18040/22434 [16:00:41<3:05:01, 2.53s/it] +2025-02-06 02:08:24 - ERROR - stderr - 80%|████████ | 18041/22434 [16:00:43<3:04:21, 2.52s/it] +2025-02-06 02:08:24 - ERROR - stderr - +2025-02-06 02:08:24 - ERROR - stderr - +2025-02-06 02:08:24 - INFO - stdout - {'loss': 0.4108, 'grad_norm': 1.6597378253936768, 'learning_rate': 1.9447647196393295e-06, 'epoch': 2.41} +2025-02-06 02:08:24 - ERROR - stderr - 80%|████████ | 18041/22434 [16:00:43<3:04:21, 2.52s/it] +2025-02-06 02:08:26 - ERROR - stderr - 80%|████████ | 18042/22434 [16:00:46<3:03:07, 2.50s/it] +2025-02-06 02:08:26 - ERROR - stderr - +2025-02-06 02:08:26 - ERROR - stderr - +2025-02-06 02:08:26 - INFO - stdout - {'loss': 0.3718, 'grad_norm': 1.5790430307388306, 'learning_rate': 1.9439092916222004e-06, 'epoch': 2.41} +2025-02-06 02:08:26 - ERROR - stderr - 80%|████████ | 18042/22434 [16:00:46<3:03:07, 2.50s/it] +2025-02-06 02:08:29 - ERROR - stderr - 80%|████████ | 18043/22434 [16:00:49<3:06:19, 2.55s/it] +2025-02-06 02:08:29 - ERROR - stderr - +2025-02-06 02:08:29 - ERROR - stderr - +2025-02-06 02:08:29 - INFO - stdout - {'loss': 0.4087, 'grad_norm': 1.7413369417190552, 'learning_rate': 1.9430540315265723e-06, 'epoch': 2.41} +2025-02-06 02:08:29 - ERROR - stderr - 80%|████████ | 18043/22434 [16:00:49<3:06:19, 2.55s/it] +2025-02-06 02:08:31 - ERROR - stderr - 80%|████████ | 18044/22434 [16:00:51<3:05:11, 2.53s/it] +2025-02-06 02:08:31 - ERROR - stderr - +2025-02-06 02:08:31 - ERROR - stderr - +2025-02-06 02:08:31 - INFO - stdout - {'loss': 0.4181, 'grad_norm': 1.695953607559204, 'learning_rate': 1.9421989393702744e-06, 'epoch': 2.41} +2025-02-06 02:08:31 - ERROR - stderr - 80%|████████ | 18044/22434 [16:00:51<3:05:11, 2.53s/it] +2025-02-06 02:08:34 - ERROR - stderr - 80%|████████ | 18045/22434 [16:00:54<3:13:31, 2.65s/it] +2025-02-06 02:08:34 - ERROR - stderr - +2025-02-06 02:08:34 - ERROR - stderr - +2025-02-06 02:08:34 
- INFO - stdout - {'loss': 0.3699, 'grad_norm': 1.694677472114563, 'learning_rate': 1.9413440151711282e-06, 'epoch': 2.41} +2025-02-06 02:08:34 - ERROR - stderr - 80%|████████ | 18045/22434 [16:00:54<3:13:31, 2.65s/it] +2025-02-06 02:08:37 - ERROR - stderr - 80%|████████ | 18046/22434 [16:00:56<3:09:36, 2.59s/it] +2025-02-06 02:08:37 - ERROR - stderr - +2025-02-06 02:08:37 - ERROR - stderr - +2025-02-06 02:08:37 - INFO - stdout - {'loss': 0.3471, 'grad_norm': 1.599601149559021, 'learning_rate': 1.940489258946955e-06, 'epoch': 2.41} +2025-02-06 02:08:37 - ERROR - stderr - 80%|████████ | 18046/22434 [16:00:56<3:09:36, 2.59s/it] +2025-02-06 02:08:39 - ERROR - stderr - 80%|████████ | 18047/22434 [16:00:59<3:05:40, 2.54s/it] +2025-02-06 02:08:39 - ERROR - stderr - +2025-02-06 02:08:39 - ERROR - stderr - +2025-02-06 02:08:39 - INFO - stdout - {'loss': 0.3826, 'grad_norm': 1.5811517238616943, 'learning_rate': 1.9396346707155745e-06, 'epoch': 2.41} +2025-02-06 02:08:39 - ERROR - stderr - 80%|████████ | 18047/22434 [16:00:59<3:05:40, 2.54s/it] +2025-02-06 02:08:42 - ERROR - stderr - 80%|████████ | 18048/22434 [16:01:01<3:04:46, 2.53s/it] +2025-02-06 02:08:42 - ERROR - stderr - +2025-02-06 02:08:42 - ERROR - stderr - +2025-02-06 02:08:42 - INFO - stdout - {'loss': 0.3671, 'grad_norm': 1.626510739326477, 'learning_rate': 1.9387802504947906e-06, 'epoch': 2.41} +2025-02-06 02:08:42 - ERROR - stderr - 80%|████████ | 18048/22434 [16:01:01<3:04:46, 2.53s/it] +2025-02-06 02:08:44 - ERROR - stderr - 80%|████████ | 18049/22434 [16:01:04<3:13:11, 2.64s/it] +2025-02-06 02:08:45 - ERROR - stderr - +2025-02-06 02:08:45 - ERROR - stderr - +2025-02-06 02:08:45 - INFO - stdout - {'loss': 0.3593, 'grad_norm': 1.548584222793579, 'learning_rate': 1.9379259983024236e-06, 'epoch': 2.41} +2025-02-06 02:08:45 - ERROR - stderr - 80%|████████ | 18049/22434 [16:01:04<3:13:11, 2.64s/it] +2025-02-06 02:08:47 - ERROR - stderr - 80%|████████ | 18050/22434 [16:01:07<3:16:26, 2.69s/it] +2025-02-06 02:08:47 
- ERROR - stderr - +2025-02-06 02:08:47 - ERROR - stderr - +2025-02-06 02:08:47 - INFO - stdout - {'loss': 0.3068, 'grad_norm': 1.4227983951568604, 'learning_rate': 1.9370719141562687e-06, 'epoch': 2.41} +2025-02-06 02:08:47 - ERROR - stderr - 80%|████████ | 18050/22434 [16:01:07<3:16:26, 2.69s/it] +2025-02-06 02:08:50 - ERROR - stderr - 80%|████████ | 18051/22434 [16:01:09<3:10:57, 2.61s/it] +2025-02-06 02:08:50 - ERROR - stderr - +2025-02-06 02:08:50 - ERROR - stderr - +2025-02-06 02:08:50 - INFO - stdout - {'loss': 0.3336, 'grad_norm': 1.3419324159622192, 'learning_rate': 1.9362179980741413e-06, 'epoch': 2.41} +2025-02-06 02:08:50 - ERROR - stderr - 80%|████████ | 18051/22434 [16:01:10<3:10:57, 2.61s/it] +2025-02-06 02:08:52 - ERROR - stderr - 80%|████████ | 18052/22434 [16:01:12<3:06:57, 2.56s/it] +2025-02-06 02:08:52 - ERROR - stderr - +2025-02-06 02:08:52 - ERROR - stderr - +2025-02-06 02:08:52 - INFO - stdout - {'loss': 0.3544, 'grad_norm': 1.4893600940704346, 'learning_rate': 1.93536425007383e-06, 'epoch': 2.41} +2025-02-06 02:08:52 - ERROR - stderr - 80%|████████ | 18052/22434 [16:01:12<3:06:57, 2.56s/it] +2025-02-06 02:08:55 - ERROR - stderr - 80%|████████ | 18053/22434 [16:01:14<3:05:11, 2.54s/it] +2025-02-06 02:08:55 - ERROR - stderr - +2025-02-06 02:08:55 - ERROR - stderr - +2025-02-06 02:08:55 - INFO - stdout - {'loss': 0.3961, 'grad_norm': 1.541743516921997, 'learning_rate': 1.934510670173131e-06, 'epoch': 2.41} +2025-02-06 02:08:55 - ERROR - stderr - 80%|████████ | 18053/22434 [16:01:14<3:05:11, 2.54s/it] +2025-02-06 02:08:57 - ERROR - stderr - 80%|████████ | 18054/22434 [16:01:17<3:04:24, 2.53s/it] +2025-02-06 02:08:57 - ERROR - stderr - +2025-02-06 02:08:57 - ERROR - stderr - +2025-02-06 02:08:57 - INFO - stdout - {'loss': 0.4035, 'grad_norm': 1.6864932775497437, 'learning_rate': 1.9336572583898448e-06, 'epoch': 2.41} +2025-02-06 02:08:57 - ERROR - stderr - 80%|████████ | 18054/22434 [16:01:17<3:04:24, 2.53s/it] +2025-02-06 02:09:00 - ERROR - 
stderr - 80%|████████ | 18055/22434 [16:01:19<3:03:52, 2.52s/it] +2025-02-06 02:09:00 - ERROR - stderr - +2025-02-06 02:09:00 - ERROR - stderr - +2025-02-06 02:09:00 - INFO - stdout - {'loss': 0.3383, 'grad_norm': 1.4050863981246948, 'learning_rate': 1.9328040147417513e-06, 'epoch': 2.41} +2025-02-06 02:09:00 - ERROR - stderr - 80%|████████ | 18055/22434 [16:01:19<3:03:52, 2.52s/it] +2025-02-06 02:09:02 - ERROR - stderr - 80%|████████ | 18056/22434 [16:01:22<3:01:41, 2.49s/it] +2025-02-06 02:09:02 - ERROR - stderr - +2025-02-06 02:09:02 - ERROR - stderr - +2025-02-06 02:09:02 - INFO - stdout - {'loss': 0.3136, 'grad_norm': 1.4355391263961792, 'learning_rate': 1.9319509392466394e-06, 'epoch': 2.41} +2025-02-06 02:09:02 - ERROR - stderr - 80%|████████ | 18056/22434 [16:01:22<3:01:41, 2.49s/it] +2025-02-06 02:09:05 - ERROR - stderr - 80%|████████ | 18057/22434 [16:01:24<3:02:23, 2.50s/it] +2025-02-06 02:09:05 - ERROR - stderr - +2025-02-06 02:09:05 - ERROR - stderr - +2025-02-06 02:09:05 - INFO - stdout - {'loss': 0.3698, 'grad_norm': 1.4853312969207764, 'learning_rate': 1.9310980319222903e-06, 'epoch': 2.41} +2025-02-06 02:09:05 - ERROR - stderr - 80%|████████ | 18057/22434 [16:01:24<3:02:23, 2.50s/it] +2025-02-06 02:09:07 - ERROR - stderr - 80%|████████ | 18058/22434 [16:01:27<3:02:26, 2.50s/it] +2025-02-06 02:09:07 - ERROR - stderr - +2025-02-06 02:09:07 - ERROR - stderr - +2025-02-06 02:09:07 - INFO - stdout - {'loss': 0.3138, 'grad_norm': 1.5603691339492798, 'learning_rate': 1.9302452927864812e-06, 'epoch': 2.41} +2025-02-06 02:09:07 - ERROR - stderr - 80%|████████ | 18058/22434 [16:01:27<3:02:26, 2.50s/it] +2025-02-06 02:09:10 - ERROR - stderr - 80%|████████ | 18059/22434 [16:01:29<3:02:22, 2.50s/it] +2025-02-06 02:09:10 - ERROR - stderr - +2025-02-06 02:09:10 - ERROR - stderr - +2025-02-06 02:09:10 - INFO - stdout - {'loss': 0.3832, 'grad_norm': 1.572059154510498, 'learning_rate': 1.9293927218569863e-06, 'epoch': 2.41} +2025-02-06 02:09:10 - ERROR - stderr - 
80%|████████ | 18059/22434 [16:01:29<3:02:22, 2.50s/it] +2025-02-06 02:09:12 - ERROR - stderr - 81%|████████ | 18060/22434 [16:01:32<3:00:30, 2.48s/it] +2025-02-06 02:09:12 - ERROR - stderr - +2025-02-06 02:09:12 - ERROR - stderr - +2025-02-06 02:09:12 - INFO - stdout - {'loss': 0.3444, 'grad_norm': 1.472150206565857, 'learning_rate': 1.9285403191515783e-06, 'epoch': 2.42} +2025-02-06 02:09:12 - ERROR - stderr - 81%|████████ | 18060/22434 [16:01:32<3:00:30, 2.48s/it] +2025-02-06 02:09:14 - ERROR - stderr - 81%|████████ | 18061/22434 [16:01:34<2:59:24, 2.46s/it] +2025-02-06 02:09:14 - ERROR - stderr - +2025-02-06 02:09:14 - ERROR - stderr - +2025-02-06 02:09:14 - INFO - stdout - {'loss': 0.3799, 'grad_norm': 1.553295612335205, 'learning_rate': 1.927688084688023e-06, 'epoch': 2.42} +2025-02-06 02:09:14 - ERROR - stderr - 81%|████████ | 18061/22434 [16:01:34<2:59:24, 2.46s/it] +2025-02-06 02:09:17 - ERROR - stderr - 81%|████████ | 18062/22434 [16:01:37<3:00:40, 2.48s/it] +2025-02-06 02:09:17 - ERROR - stderr - +2025-02-06 02:09:17 - ERROR - stderr - +2025-02-06 02:09:17 - INFO - stdout - {'loss': 0.4046, 'grad_norm': 1.5900689363479614, 'learning_rate': 1.926836018484085e-06, 'epoch': 2.42} +2025-02-06 02:09:17 - ERROR - stderr - 81%|████████ | 18062/22434 [16:01:37<3:00:40, 2.48s/it] +2025-02-06 02:09:20 - ERROR - stderr - 81%|████████ | 18063/22434 [16:01:39<3:03:18, 2.52s/it] +2025-02-06 02:09:20 - ERROR - stderr - +2025-02-06 02:09:20 - ERROR - stderr - +2025-02-06 02:09:20 - INFO - stdout - {'loss': 0.3872, 'grad_norm': 1.525107979774475, 'learning_rate': 1.925984120557526e-06, 'epoch': 2.42} +2025-02-06 02:09:20 - ERROR - stderr - 81%|████████ | 18063/22434 [16:01:39<3:03:18, 2.52s/it] +2025-02-06 02:09:22 - ERROR - stderr - 81%|████████ | 18064/22434 [16:01:42<3:01:35, 2.49s/it] +2025-02-06 02:09:22 - ERROR - stderr - +2025-02-06 02:09:22 - ERROR - stderr - +2025-02-06 02:09:22 - INFO - stdout - {'loss': 0.3603, 'grad_norm': 1.494560718536377, 'learning_rate': 
1.925132390926102e-06, 'epoch': 2.42} +2025-02-06 02:09:22 - ERROR - stderr - 81%|████████ | 18064/22434 [16:01:42<3:01:35, 2.49s/it] +2025-02-06 02:09:24 - ERROR - stderr - 81%|████████ | 18065/22434 [16:01:44<3:01:55, 2.50s/it] +2025-02-06 02:09:25 - ERROR - stderr - +2025-02-06 02:09:25 - ERROR - stderr - +2025-02-06 02:09:25 - INFO - stdout - {'loss': 0.3539, 'grad_norm': 1.452217936515808, 'learning_rate': 1.9242808296075655e-06, 'epoch': 2.42} +2025-02-06 02:09:25 - ERROR - stderr - 81%|████████ | 18065/22434 [16:01:44<3:01:55, 2.50s/it] +2025-02-06 02:09:27 - ERROR - stderr - 81%|████████ | 18066/22434 [16:01:47<3:00:21, 2.48s/it] +2025-02-06 02:09:27 - ERROR - stderr - +2025-02-06 02:09:27 - ERROR - stderr - +2025-02-06 02:09:27 - INFO - stdout - {'loss': 0.4059, 'grad_norm': 1.7618021965026855, 'learning_rate': 1.9234294366196683e-06, 'epoch': 2.42} +2025-02-06 02:09:27 - ERROR - stderr - 81%|████████ | 18066/22434 [16:01:47<3:00:21, 2.48s/it] +2025-02-06 02:09:29 - ERROR - stderr - 81%|████████ | 18067/22434 [16:01:49<2:59:44, 2.47s/it] +2025-02-06 02:09:29 - ERROR - stderr - +2025-02-06 02:09:29 - ERROR - stderr - +2025-02-06 02:09:29 - INFO - stdout - {'loss': 0.3576, 'grad_norm': 1.458060383796692, 'learning_rate': 1.9225782119801563e-06, 'epoch': 2.42} +2025-02-06 02:09:29 - ERROR - stderr - 81%|████████ | 18067/22434 [16:01:49<2:59:44, 2.47s/it] +2025-02-06 02:09:32 - ERROR - stderr - 81%|████████ | 18068/22434 [16:01:52<2:58:33, 2.45s/it] +2025-02-06 02:09:32 - ERROR - stderr - +2025-02-06 02:09:32 - ERROR - stderr - +2025-02-06 02:09:32 - INFO - stdout - {'loss': 0.3641, 'grad_norm': 1.6456409692764282, 'learning_rate': 1.921727155706774e-06, 'epoch': 2.42} +2025-02-06 02:09:32 - ERROR - stderr - 81%|████████ | 18068/22434 [16:01:52<2:58:33, 2.45s/it] +2025-02-06 02:09:34 - ERROR - stderr - 81%|████████ | 18069/22434 [16:01:54<2:58:02, 2.45s/it] +2025-02-06 02:09:34 - ERROR - stderr - +2025-02-06 02:09:34 - ERROR - stderr - +2025-02-06 02:09:34 - 
INFO - stdout - {'loss': 0.3495, 'grad_norm': 1.4869545698165894, 'learning_rate': 1.9208762678172543e-06, 'epoch': 2.42} +2025-02-06 02:09:34 - ERROR - stderr - 81%|████████ | 18069/22434 [16:01:54<2:58:02, 2.45s/it] +2025-02-06 02:09:37 - ERROR - stderr - 81%|████████ | 18070/22434 [16:01:57<2:59:59, 2.47s/it] +2025-02-06 02:09:37 - ERROR - stderr - +2025-02-06 02:09:37 - ERROR - stderr - +2025-02-06 02:09:37 - INFO - stdout - {'loss': 0.3769, 'grad_norm': 1.7008962631225586, 'learning_rate': 1.9200255483293427e-06, 'epoch': 2.42} +2025-02-06 02:09:37 - ERROR - stderr - 81%|████████ | 18070/22434 [16:01:57<2:59:59, 2.47s/it] +2025-02-06 02:09:39 - ERROR - stderr - 81%|████████ | 18071/22434 [16:01:59<3:01:55, 2.50s/it] +2025-02-06 02:09:39 - ERROR - stderr - +2025-02-06 02:09:39 - ERROR - stderr - +2025-02-06 02:09:39 - INFO - stdout - {'loss': 0.4341, 'grad_norm': 1.6859917640686035, 'learning_rate': 1.9191749972607655e-06, 'epoch': 2.42} +2025-02-06 02:09:39 - ERROR - stderr - 81%|████████ | 18071/22434 [16:01:59<3:01:55, 2.50s/it] +2025-02-06 02:09:42 - ERROR - stderr - 81%|████████ | 18072/22434 [16:02:02<3:01:35, 2.50s/it] +2025-02-06 02:09:42 - ERROR - stderr - +2025-02-06 02:09:42 - ERROR - stderr - +2025-02-06 02:09:42 - INFO - stdout - {'loss': 0.3648, 'grad_norm': 1.458686113357544, 'learning_rate': 1.918324614629249e-06, 'epoch': 2.42} +2025-02-06 02:09:42 - ERROR - stderr - 81%|████████ | 18072/22434 [16:02:02<3:01:35, 2.50s/it] +2025-02-06 02:09:44 - ERROR - stderr - 81%|████████ | 18073/22434 [16:02:04<3:00:49, 2.49s/it] +2025-02-06 02:09:44 - ERROR - stderr - +2025-02-06 02:09:44 - ERROR - stderr - +2025-02-06 02:09:44 - INFO - stdout - {'loss': 0.3298, 'grad_norm': 1.4641509056091309, 'learning_rate': 1.917474400452528e-06, 'epoch': 2.42} +2025-02-06 02:09:44 - ERROR - stderr - 81%|████████ | 18073/22434 [16:02:04<3:00:49, 2.49s/it] +2025-02-06 02:09:47 - ERROR - stderr - 81%|████████ | 18074/22434 [16:02:07<3:04:49, 2.54s/it] +2025-02-06 02:09:47 
- ERROR - stderr - +2025-02-06 02:09:47 - ERROR - stderr - +2025-02-06 02:09:47 - INFO - stdout - {'loss': 0.378, 'grad_norm': 1.5092238187789917, 'learning_rate': 1.9166243547483143e-06, 'epoch': 2.42} +2025-02-06 02:09:47 - ERROR - stderr - 81%|████████ | 18074/22434 [16:02:07<3:04:49, 2.54s/it] +2025-02-06 02:09:49 - ERROR - stderr - 81%|████████ | 18075/22434 [16:02:09<3:03:03, 2.52s/it] +2025-02-06 02:09:49 - ERROR - stderr - +2025-02-06 02:09:49 - ERROR - stderr - +2025-02-06 02:09:49 - INFO - stdout - {'loss': 0.3835, 'grad_norm': 1.576855182647705, 'learning_rate': 1.9157744775343355e-06, 'epoch': 2.42} +2025-02-06 02:09:49 - ERROR - stderr - 81%|████████ | 18075/22434 [16:02:09<3:03:03, 2.52s/it] +2025-02-06 02:09:52 - ERROR - stderr - 81%|████████ | 18076/22434 [16:02:12<3:02:30, 2.51s/it] +2025-02-06 02:09:52 - ERROR - stderr - +2025-02-06 02:09:52 - ERROR - stderr - +2025-02-06 02:09:52 - INFO - stdout - {'loss': 0.3509, 'grad_norm': 1.6809840202331543, 'learning_rate': 1.9149247688283e-06, 'epoch': 2.42} +2025-02-06 02:09:52 - ERROR - stderr - 81%|████████ | 18076/22434 [16:02:12<3:02:30, 2.51s/it] +2025-02-06 02:09:54 - ERROR - stderr - 81%|████████ | 18077/22434 [16:02:14<3:02:16, 2.51s/it] +2025-02-06 02:09:54 - ERROR - stderr - +2025-02-06 02:09:54 - ERROR - stderr - +2025-02-06 02:09:54 - INFO - stdout - {'loss': 0.4223, 'grad_norm': 1.5899978876113892, 'learning_rate': 1.9140752286479213e-06, 'epoch': 2.42} +2025-02-06 02:09:54 - ERROR - stderr - 81%|████████ | 18077/22434 [16:02:14<3:02:16, 2.51s/it] +2025-02-06 02:09:57 - ERROR - stderr - 81%|████████ | 18078/22434 [16:02:17<3:07:26, 2.58s/it] +2025-02-06 02:09:57 - ERROR - stderr - +2025-02-06 02:09:57 - ERROR - stderr - +2025-02-06 02:09:57 - INFO - stdout - {'loss': 0.3544, 'grad_norm': 1.6565505266189575, 'learning_rate': 1.9132258570109062e-06, 'epoch': 2.42} +2025-02-06 02:09:57 - ERROR - stderr - 81%|████████ | 18078/22434 [16:02:17<3:07:26, 2.58s/it] +2025-02-06 02:10:00 - ERROR - 
stderr - 81%|████████ | 18079/22434 [16:02:19<3:06:57, 2.58s/it] +2025-02-06 02:10:00 - ERROR - stderr - +2025-02-06 02:10:00 - ERROR - stderr - +2025-02-06 02:10:00 - INFO - stdout - {'loss': 0.3623, 'grad_norm': 1.555350422859192, 'learning_rate': 1.912376653934961e-06, 'epoch': 2.42} +2025-02-06 02:10:00 - ERROR - stderr - 81%|████████ | 18079/22434 [16:02:20<3:06:57, 2.58s/it] +2025-02-06 02:10:02 - ERROR - stderr - 81%|████████ | 18080/22434 [16:02:22<3:07:27, 2.58s/it] +2025-02-06 02:10:02 - ERROR - stderr - +2025-02-06 02:10:02 - ERROR - stderr - +2025-02-06 02:10:02 - INFO - stdout - {'loss': 0.3991, 'grad_norm': 1.5153439044952393, 'learning_rate': 1.911527619437784e-06, 'epoch': 2.42} +2025-02-06 02:10:02 - ERROR - stderr - 81%|████████ | 18080/22434 [16:02:22<3:07:27, 2.58s/it] +2025-02-06 02:10:05 - ERROR - stderr - 81%|████████ | 18081/22434 [16:02:25<3:05:47, 2.56s/it] +2025-02-06 02:10:05 - ERROR - stderr - +2025-02-06 02:10:05 - ERROR - stderr - +2025-02-06 02:10:05 - INFO - stdout - {'loss': 0.3602, 'grad_norm': 1.470353364944458, 'learning_rate': 1.9106787535370753e-06, 'epoch': 2.42} +2025-02-06 02:10:05 - ERROR - stderr - 81%|████████ | 18081/22434 [16:02:25<3:05:47, 2.56s/it] +2025-02-06 02:10:07 - ERROR - stderr - 81%|████████ | 18082/22434 [16:02:27<3:04:50, 2.55s/it] +2025-02-06 02:10:07 - ERROR - stderr - +2025-02-06 02:10:07 - ERROR - stderr - +2025-02-06 02:10:07 - INFO - stdout - {'loss': 0.3637, 'grad_norm': 1.6296138763427734, 'learning_rate': 1.9098300562505266e-06, 'epoch': 2.42} +2025-02-06 02:10:07 - ERROR - stderr - 81%|████████ | 18082/22434 [16:02:27<3:04:50, 2.55s/it] +2025-02-06 02:10:10 - ERROR - stderr - 81%|████████ | 18083/22434 [16:02:30<3:03:41, 2.53s/it] +2025-02-06 02:10:10 - ERROR - stderr - +2025-02-06 02:10:10 - ERROR - stderr - +2025-02-06 02:10:10 - INFO - stdout - {'loss': 0.3986, 'grad_norm': 1.5523146390914917, 'learning_rate': 1.908981527595829e-06, 'epoch': 2.42} +2025-02-06 02:10:10 - ERROR - stderr - 
81%|████████ | 18083/22434 [16:02:30<3:03:41, 2.53s/it] +2025-02-06 02:10:12 - ERROR - stderr - 81%|████████ | 18084/22434 [16:02:32<3:01:45, 2.51s/it] +2025-02-06 02:10:12 - ERROR - stderr - +2025-02-06 02:10:12 - ERROR - stderr - +2025-02-06 02:10:12 - INFO - stdout - {'loss': 0.3991, 'grad_norm': 1.6103651523590088, 'learning_rate': 1.908133167590669e-06, 'epoch': 2.42} +2025-02-06 02:10:12 - ERROR - stderr - 81%|████████ | 18084/22434 [16:02:32<3:01:45, 2.51s/it] +2025-02-06 02:10:15 - ERROR - stderr - 81%|████████ | 18085/22434 [16:02:35<3:06:03, 2.57s/it] +2025-02-06 02:10:15 - ERROR - stderr - +2025-02-06 02:10:15 - ERROR - stderr - +2025-02-06 02:10:15 - INFO - stdout - {'loss': 0.3852, 'grad_norm': 1.6443121433258057, 'learning_rate': 1.9072849762527301e-06, 'epoch': 2.42} +2025-02-06 02:10:15 - ERROR - stderr - 81%|████████ | 18085/22434 [16:02:35<3:06:03, 2.57s/it] +2025-02-06 02:10:17 - ERROR - stderr - 81%|████████ | 18086/22434 [16:02:37<3:04:29, 2.55s/it] +2025-02-06 02:10:18 - ERROR - stderr - +2025-02-06 02:10:18 - ERROR - stderr - +2025-02-06 02:10:18 - INFO - stdout - {'loss': 0.3508, 'grad_norm': 1.546447515487671, 'learning_rate': 1.906436953599693e-06, 'epoch': 2.42} +2025-02-06 02:10:18 - ERROR - stderr - 81%|████████ | 18086/22434 [16:02:37<3:04:29, 2.55s/it] +2025-02-06 02:10:20 - ERROR - stderr - 81%|████████ | 18087/22434 [16:02:40<3:03:27, 2.53s/it] +2025-02-06 02:10:20 - ERROR - stderr - +2025-02-06 02:10:20 - ERROR - stderr - +2025-02-06 02:10:20 - INFO - stdout - {'loss': 0.3886, 'grad_norm': 1.654009222984314, 'learning_rate': 1.9055890996492344e-06, 'epoch': 2.42} +2025-02-06 02:10:20 - ERROR - stderr - 81%|████████ | 18087/22434 [16:02:40<3:03:27, 2.53s/it] +2025-02-06 02:10:22 - ERROR - stderr - 81%|████████ | 18088/22434 [16:02:42<3:02:30, 2.52s/it] +2025-02-06 02:10:23 - ERROR - stderr - +2025-02-06 02:10:23 - ERROR - stderr - +2025-02-06 02:10:23 - INFO - stdout - {'loss': 0.3874, 'grad_norm': 1.8355324268341064, 
'learning_rate': 1.9047414144190203e-06, 'epoch': 2.42} +2025-02-06 02:10:23 - ERROR - stderr - 81%|████████ | 18088/22434 [16:02:42<3:02:30, 2.52s/it] +2025-02-06 02:10:25 - ERROR - stderr - 81%|████████ | 18089/22434 [16:02:45<3:02:04, 2.51s/it] +2025-02-06 02:10:25 - ERROR - stderr - +2025-02-06 02:10:25 - ERROR - stderr - +2025-02-06 02:10:25 - INFO - stdout - {'loss': 0.3381, 'grad_norm': 1.4860445261001587, 'learning_rate': 1.9038938979267308e-06, 'epoch': 2.42} +2025-02-06 02:10:25 - ERROR - stderr - 81%|████████ | 18089/22434 [16:02:45<3:02:04, 2.51s/it] +2025-02-06 02:10:27 - ERROR - stderr - 81%|████████ | 18090/22434 [16:02:47<3:00:28, 2.49s/it] +2025-02-06 02:10:27 - ERROR - stderr - +2025-02-06 02:10:27 - ERROR - stderr - +2025-02-06 02:10:27 - INFO - stdout - {'loss': 0.368, 'grad_norm': 1.552764654159546, 'learning_rate': 1.9030465501900207e-06, 'epoch': 2.42} +2025-02-06 02:10:27 - ERROR - stderr - 81%|████████ | 18090/22434 [16:02:47<3:00:28, 2.49s/it] +2025-02-06 02:10:30 - ERROR - stderr - 81%|████████ | 18091/22434 [16:02:50<3:00:59, 2.50s/it] +2025-02-06 02:10:30 - ERROR - stderr - +2025-02-06 02:10:30 - ERROR - stderr - +2025-02-06 02:10:30 - INFO - stdout - {'loss': 0.3228, 'grad_norm': 1.345727801322937, 'learning_rate': 1.9021993712265596e-06, 'epoch': 2.42} +2025-02-06 02:10:30 - ERROR - stderr - 81%|████████ | 18091/22434 [16:02:50<3:00:59, 2.50s/it] +2025-02-06 02:10:32 - ERROR - stderr - 81%|████████ | 18092/22434 [16:02:52<3:00:51, 2.50s/it] +2025-02-06 02:10:32 - ERROR - stderr - +2025-02-06 02:10:32 - ERROR - stderr - +2025-02-06 02:10:32 - INFO - stdout - {'loss': 0.3729, 'grad_norm': 1.5653833150863647, 'learning_rate': 1.9013523610540064e-06, 'epoch': 2.42} +2025-02-06 02:10:32 - ERROR - stderr - 81%|████████ | 18092/22434 [16:02:52<3:00:51, 2.50s/it] +2025-02-06 02:10:35 - ERROR - stderr - 81%|████████ | 18093/22434 [16:02:55<3:03:14, 2.53s/it] +2025-02-06 02:10:35 - ERROR - stderr - +2025-02-06 02:10:35 - ERROR - stderr - 
+2025-02-06 02:10:35 - INFO - stdout - {'loss': 0.3565, 'grad_norm': 1.538627028465271, 'learning_rate': 1.900505519690009e-06, 'epoch': 2.42} +2025-02-06 02:10:35 - ERROR - stderr - 81%|████████ | 18093/22434 [16:02:55<3:03:14, 2.53s/it] +2025-02-06 02:10:37 - ERROR - stderr - 81%|████████ | 18094/22434 [16:02:57<3:00:35, 2.50s/it] +2025-02-06 02:10:38 - ERROR - stderr - +2025-02-06 02:10:38 - ERROR - stderr - +2025-02-06 02:10:38 - INFO - stdout - {'loss': 0.2736, 'grad_norm': 1.4516618251800537, 'learning_rate': 1.8996588471522282e-06, 'epoch': 2.42} +2025-02-06 02:10:38 - ERROR - stderr - 81%|████████ | 18094/22434 [16:02:57<3:00:35, 2.50s/it] +2025-02-06 02:10:40 - ERROR - stderr - 81%|████████ | 18095/22434 [16:03:00<2:59:16, 2.48s/it] +2025-02-06 02:10:40 - ERROR - stderr - +2025-02-06 02:10:40 - ERROR - stderr - +2025-02-06 02:10:40 - INFO - stdout - {'loss': 0.3368, 'grad_norm': 1.4271422624588013, 'learning_rate': 1.898812343458305e-06, 'epoch': 2.42} +2025-02-06 02:10:40 - ERROR - stderr - 81%|████████ | 18095/22434 [16:03:00<2:59:16, 2.48s/it] +2025-02-06 02:10:43 - ERROR - stderr - 81%|████████ | 18096/22434 [16:03:02<3:02:14, 2.52s/it] +2025-02-06 02:10:43 - ERROR - stderr - +2025-02-06 02:10:43 - ERROR - stderr - +2025-02-06 02:10:43 - INFO - stdout - {'loss': 0.3068, 'grad_norm': 1.3784897327423096, 'learning_rate': 1.8979660086258866e-06, 'epoch': 2.42} +2025-02-06 02:10:43 - ERROR - stderr - 81%|████████ | 18096/22434 [16:03:02<3:02:14, 2.52s/it] +2025-02-06 02:10:45 - ERROR - stderr - 81%|████████ | 18097/22434 [16:03:05<3:00:52, 2.50s/it] +2025-02-06 02:10:45 - ERROR - stderr - +2025-02-06 02:10:45 - ERROR - stderr - +2025-02-06 02:10:45 - INFO - stdout - {'loss': 0.3888, 'grad_norm': 1.8402807712554932, 'learning_rate': 1.8971198426726145e-06, 'epoch': 2.42} +2025-02-06 02:10:45 - ERROR - stderr - 81%|████████ | 18097/22434 [16:03:05<3:00:52, 2.50s/it] +2025-02-06 02:10:48 - ERROR - stderr - 81%|████████ | 18098/22434 [16:03:08<3:09:21, 
2.62s/it] +2025-02-06 02:10:48 - ERROR - stderr - +2025-02-06 02:10:48 - ERROR - stderr - +2025-02-06 02:10:48 - INFO - stdout - {'loss': 0.3443, 'grad_norm': 1.4829161167144775, 'learning_rate': 1.8962738456161223e-06, 'epoch': 2.42} +2025-02-06 02:10:48 - ERROR - stderr - 81%|████████ | 18098/22434 [16:03:08<3:09:21, 2.62s/it] +2025-02-06 02:10:50 - ERROR - stderr - 81%|████████ | 18099/22434 [16:03:10<3:05:48, 2.57s/it] +2025-02-06 02:10:50 - ERROR - stderr - +2025-02-06 02:10:50 - ERROR - stderr - +2025-02-06 02:10:50 - INFO - stdout - {'loss': 0.3504, 'grad_norm': 1.5332369804382324, 'learning_rate': 1.8954280174740536e-06, 'epoch': 2.42} +2025-02-06 02:10:50 - ERROR - stderr - 81%|████████ | 18099/22434 [16:03:10<3:05:48, 2.57s/it] +2025-02-06 02:10:53 - ERROR - stderr - 81%|████████ | 18100/22434 [16:03:13<3:03:46, 2.54s/it] +2025-02-06 02:10:53 - ERROR - stderr - +2025-02-06 02:10:53 - ERROR - stderr - +2025-02-06 02:10:53 - INFO - stdout - {'loss': 0.3826, 'grad_norm': 1.4898616075515747, 'learning_rate': 1.8945823582640288e-06, 'epoch': 2.42} +2025-02-06 02:10:53 - ERROR - stderr - 81%|████████ | 18100/22434 [16:03:13<3:03:46, 2.54s/it] +2025-02-06 02:10:55 - ERROR - stderr - 81%|████████ | 18101/22434 [16:03:15<3:02:13, 2.52s/it] +2025-02-06 02:10:55 - ERROR - stderr - +2025-02-06 02:10:55 - ERROR - stderr - +2025-02-06 02:10:55 - INFO - stdout - {'loss': 0.3601, 'grad_norm': 1.5770057439804077, 'learning_rate': 1.8937368680036794e-06, 'epoch': 2.42} +2025-02-06 02:10:55 - ERROR - stderr - 81%|████████ | 18101/22434 [16:03:15<3:02:13, 2.52s/it] +2025-02-06 02:10:58 - ERROR - stderr - 81%|████████ | 18102/22434 [16:03:18<3:01:58, 2.52s/it] +2025-02-06 02:10:58 - ERROR - stderr - +2025-02-06 02:10:58 - ERROR - stderr - +2025-02-06 02:10:58 - INFO - stdout - {'loss': 0.35, 'grad_norm': 1.3761988878250122, 'learning_rate': 1.892891546710628e-06, 'epoch': 2.42} +2025-02-06 02:10:58 - ERROR - stderr - 81%|████████ | 18102/22434 [16:03:18<3:01:58, 2.52s/it] 
+2025-02-06 02:11:00 - ERROR - stderr - 81%|████████ | 18103/22434 [16:03:20<3:01:16, 2.51s/it] +2025-02-06 02:11:00 - ERROR - stderr - +2025-02-06 02:11:00 - ERROR - stderr - +2025-02-06 02:11:00 - INFO - stdout - {'loss': 0.4025, 'grad_norm': 1.75946843624115, 'learning_rate': 1.8920463944024948e-06, 'epoch': 2.42} +2025-02-06 02:11:00 - ERROR - stderr - 81%|████████ | 18103/22434 [16:03:20<3:01:16, 2.51s/it] +2025-02-06 02:11:03 - ERROR - stderr - 81%|████████ | 18104/22434 [16:03:23<3:00:40, 2.50s/it] +2025-02-06 02:11:03 - ERROR - stderr - +2025-02-06 02:11:03 - ERROR - stderr - +2025-02-06 02:11:03 - INFO - stdout - {'loss': 0.3405, 'grad_norm': 1.457396149635315, 'learning_rate': 1.8912014110968956e-06, 'epoch': 2.42} +2025-02-06 02:11:03 - ERROR - stderr - 81%|████████ | 18104/22434 [16:03:23<3:00:40, 2.50s/it] +2025-02-06 02:11:05 - ERROR - stderr - 81%|████████ | 18105/22434 [16:03:25<3:01:51, 2.52s/it] +2025-02-06 02:11:05 - ERROR - stderr - +2025-02-06 02:11:05 - ERROR - stderr - +2025-02-06 02:11:05 - INFO - stdout - {'loss': 0.3728, 'grad_norm': 1.507546067237854, 'learning_rate': 1.8903565968114445e-06, 'epoch': 2.42} +2025-02-06 02:11:05 - ERROR - stderr - 81%|████████ | 18105/22434 [16:03:25<3:01:51, 2.52s/it] +2025-02-06 02:11:08 - ERROR - stderr - 81%|████████ | 18106/22434 [16:03:28<3:00:30, 2.50s/it] +2025-02-06 02:11:08 - ERROR - stderr - +2025-02-06 02:11:08 - ERROR - stderr - +2025-02-06 02:11:08 - INFO - stdout - {'loss': 0.3878, 'grad_norm': 1.5793941020965576, 'learning_rate': 1.8895119515637495e-06, 'epoch': 2.42} +2025-02-06 02:11:08 - ERROR - stderr - 81%|████████ | 18106/22434 [16:03:28<3:00:30, 2.50s/it] +2025-02-06 02:11:10 - ERROR - stderr - 81%|████████ | 18107/22434 [16:03:30<3:01:45, 2.52s/it] +2025-02-06 02:11:10 - ERROR - stderr - +2025-02-06 02:11:10 - ERROR - stderr - +2025-02-06 02:11:10 - INFO - stdout - {'loss': 0.3375, 'grad_norm': 1.387105941772461, 'learning_rate': 1.8886674753714162e-06, 'epoch': 2.42} +2025-02-06 
02:11:10 - ERROR - stderr - 81%|████████ | 18107/22434 [16:03:30<3:01:45, 2.52s/it] +2025-02-06 02:11:13 - ERROR - stderr - 81%|████████ | 18108/22434 [16:03:33<3:02:29, 2.53s/it] +2025-02-06 02:11:13 - ERROR - stderr - +2025-02-06 02:11:13 - ERROR - stderr - +2025-02-06 02:11:13 - INFO - stdout - {'loss': 0.3076, 'grad_norm': 1.4817014932632446, 'learning_rate': 1.8878231682520488e-06, 'epoch': 2.42} +2025-02-06 02:11:13 - ERROR - stderr - 81%|████████ | 18108/22434 [16:03:33<3:02:29, 2.53s/it] +2025-02-06 02:11:15 - ERROR - stderr - 81%|████████ | 18109/22434 [16:03:35<3:02:19, 2.53s/it] +2025-02-06 02:11:15 - ERROR - stderr - +2025-02-06 02:11:15 - ERROR - stderr - +2025-02-06 02:11:15 - INFO - stdout - {'loss': 0.3636, 'grad_norm': 1.5437718629837036, 'learning_rate': 1.886979030223245e-06, 'epoch': 2.42} +2025-02-06 02:11:15 - ERROR - stderr - 81%|████████ | 18109/22434 [16:03:35<3:02:19, 2.53s/it] +2025-02-06 02:11:18 - ERROR - stderr - 81%|████████ | 18110/22434 [16:03:38<3:05:23, 2.57s/it] +2025-02-06 02:11:18 - ERROR - stderr - +2025-02-06 02:11:18 - ERROR - stderr - +2025-02-06 02:11:18 - INFO - stdout - {'loss': 0.346, 'grad_norm': 1.4769299030303955, 'learning_rate': 1.8861350613025996e-06, 'epoch': 2.42} +2025-02-06 02:11:18 - ERROR - stderr - 81%|████████ | 18110/22434 [16:03:38<3:05:23, 2.57s/it] +2025-02-06 02:11:21 - ERROR - stderr - 81%|████████ | 18111/22434 [16:03:40<3:03:54, 2.55s/it] +2025-02-06 02:11:21 - ERROR - stderr - +2025-02-06 02:11:21 - ERROR - stderr - +2025-02-06 02:11:21 - INFO - stdout - {'loss': 0.3626, 'grad_norm': 1.4321874380111694, 'learning_rate': 1.8852912615077045e-06, 'epoch': 2.42} +2025-02-06 02:11:21 - ERROR - stderr - 81%|████████ | 18111/22434 [16:03:40<3:03:54, 2.55s/it] +2025-02-06 02:11:23 - ERROR - stderr - 81%|████████ | 18112/22434 [16:03:43<3:02:04, 2.53s/it] +2025-02-06 02:11:23 - ERROR - stderr - +2025-02-06 02:11:23 - ERROR - stderr - +2025-02-06 02:11:23 - INFO - stdout - {'loss': 0.3613, 'grad_norm': 
1.4741073846817017, 'learning_rate': 1.8844476308561488e-06, 'epoch': 2.42} +2025-02-06 02:11:23 - ERROR - stderr - 81%|████████ | 18112/22434 [16:03:43<3:02:04, 2.53s/it] +2025-02-06 02:11:26 - ERROR - stderr - 81%|████████ | 18113/22434 [16:03:45<3:00:36, 2.51s/it] +2025-02-06 02:11:26 - ERROR - stderr - +2025-02-06 02:11:26 - ERROR - stderr - +2025-02-06 02:11:26 - INFO - stdout - {'loss': 0.377, 'grad_norm': 1.5756639242172241, 'learning_rate': 1.8836041693655183e-06, 'epoch': 2.42} +2025-02-06 02:11:26 - ERROR - stderr - 81%|████████ | 18113/22434 [16:03:45<3:00:36, 2.51s/it] +2025-02-06 02:11:28 - ERROR - stderr - 81%|████████ | 18114/22434 [16:03:48<3:00:54, 2.51s/it] +2025-02-06 02:11:28 - ERROR - stderr - +2025-02-06 02:11:28 - ERROR - stderr - +2025-02-06 02:11:28 - INFO - stdout - {'loss': 0.3279, 'grad_norm': 1.4143149852752686, 'learning_rate': 1.882760877053388e-06, 'epoch': 2.42} +2025-02-06 02:11:28 - ERROR - stderr - 81%|████████ | 18114/22434 [16:03:48<3:00:54, 2.51s/it] +2025-02-06 02:11:31 - ERROR - stderr - 81%|████████ | 18115/22434 [16:03:50<3:00:03, 2.50s/it] +2025-02-06 02:11:31 - ERROR - stderr - +2025-02-06 02:11:31 - ERROR - stderr - +2025-02-06 02:11:31 - INFO - stdout - {'loss': 0.3794, 'grad_norm': 1.5536344051361084, 'learning_rate': 1.8819177539373445e-06, 'epoch': 2.42} +2025-02-06 02:11:31 - ERROR - stderr - 81%|████████ | 18115/22434 [16:03:50<3:00:03, 2.50s/it] +2025-02-06 02:11:33 - ERROR - stderr - 81%|████████ | 18116/22434 [16:03:53<2:58:47, 2.48s/it] +2025-02-06 02:11:33 - ERROR - stderr - +2025-02-06 02:11:33 - ERROR - stderr - +2025-02-06 02:11:33 - INFO - stdout - {'loss': 0.3503, 'grad_norm': 1.4539053440093994, 'learning_rate': 1.8810748000349544e-06, 'epoch': 2.42} +2025-02-06 02:11:33 - ERROR - stderr - 81%|████████ | 18116/22434 [16:03:53<2:58:47, 2.48s/it] +2025-02-06 02:11:36 - ERROR - stderr - 81%|████████ | 18117/22434 [16:03:55<3:00:07, 2.50s/it] +2025-02-06 02:11:36 - ERROR - stderr - +2025-02-06 02:11:36 - 
ERROR - stderr - +2025-02-06 02:11:36 - INFO - stdout - {'loss': 0.3885, 'grad_norm': 1.6055166721343994, 'learning_rate': 1.8802320153637888e-06, 'epoch': 2.42} +2025-02-06 02:11:36 - ERROR - stderr - 81%|████████ | 18117/22434 [16:03:55<3:00:07, 2.50s/it] +2025-02-06 02:11:38 - ERROR - stderr - 81%|████████ | 18118/22434 [16:03:58<3:02:59, 2.54s/it] +2025-02-06 02:11:38 - ERROR - stderr - +2025-02-06 02:11:38 - ERROR - stderr - +2025-02-06 02:11:38 - INFO - stdout - {'loss': 0.35, 'grad_norm': 1.4351887702941895, 'learning_rate': 1.8793893999414226e-06, 'epoch': 2.42} +2025-02-06 02:11:38 - ERROR - stderr - 81%|████████ | 18118/22434 [16:03:58<3:02:59, 2.54s/it] +2025-02-06 02:11:41 - ERROR - stderr - 81%|████████ | 18119/22434 [16:04:01<3:03:50, 2.56s/it] +2025-02-06 02:11:41 - ERROR - stderr - +2025-02-06 02:11:41 - ERROR - stderr - +2025-02-06 02:11:41 - INFO - stdout - {'loss': 0.3822, 'grad_norm': 1.4355101585388184, 'learning_rate': 1.8785469537854084e-06, 'epoch': 2.42} +2025-02-06 02:11:41 - ERROR - stderr - 81%|████████ | 18119/22434 [16:04:01<3:03:50, 2.56s/it] +2025-02-06 02:11:43 - ERROR - stderr - 81%|████████ | 18120/22434 [16:04:03<3:02:35, 2.54s/it] +2025-02-06 02:11:43 - ERROR - stderr - +2025-02-06 02:11:43 - ERROR - stderr - +2025-02-06 02:11:43 - INFO - stdout - {'loss': 0.3934, 'grad_norm': 1.6233646869659424, 'learning_rate': 1.8777046769133167e-06, 'epoch': 2.42} +2025-02-06 02:11:43 - ERROR - stderr - 81%|████████ | 18120/22434 [16:04:03<3:02:35, 2.54s/it] +2025-02-06 02:11:46 - ERROR - stderr - 81%|████████ | 18121/22434 [16:04:05<3:00:10, 2.51s/it] +2025-02-06 02:11:46 - ERROR - stderr - +2025-02-06 02:11:46 - ERROR - stderr - +2025-02-06 02:11:46 - INFO - stdout - {'loss': 0.3356, 'grad_norm': 1.402411937713623, 'learning_rate': 1.8768625693426956e-06, 'epoch': 2.42} +2025-02-06 02:11:46 - ERROR - stderr - 81%|████████ | 18121/22434 [16:04:06<3:00:10, 2.51s/it] +2025-02-06 02:11:48 - ERROR - stderr - 81%|████████ | 18122/22434 
[16:04:08<2:59:50, 2.50s/it] +2025-02-06 02:11:48 - ERROR - stderr - +2025-02-06 02:11:48 - ERROR - stderr - +2025-02-06 02:11:48 - INFO - stdout - {'loss': 0.3499, 'grad_norm': 1.674034595489502, 'learning_rate': 1.8760206310911023e-06, 'epoch': 2.42} +2025-02-06 02:11:48 - ERROR - stderr - 81%|████████ | 18122/22434 [16:04:08<2:59:50, 2.50s/it] +2025-02-06 02:11:51 - ERROR - stderr - 81%|████████ | 18123/22434 [16:04:10<2:59:22, 2.50s/it] +2025-02-06 02:11:51 - ERROR - stderr - +2025-02-06 02:11:51 - ERROR - stderr - +2025-02-06 02:11:51 - INFO - stdout - {'loss': 0.4147, 'grad_norm': 1.4749321937561035, 'learning_rate': 1.8751788621760846e-06, 'epoch': 2.42} +2025-02-06 02:11:51 - ERROR - stderr - 81%|████████ | 18123/22434 [16:04:10<2:59:22, 2.50s/it] +2025-02-06 02:11:53 - ERROR - stderr - 81%|████████ | 18124/22434 [16:04:13<3:00:24, 2.51s/it] +2025-02-06 02:11:53 - ERROR - stderr - +2025-02-06 02:11:53 - ERROR - stderr - +2025-02-06 02:11:53 - INFO - stdout - {'loss': 0.3999, 'grad_norm': 1.5503557920455933, 'learning_rate': 1.874337262615189e-06, 'epoch': 2.42} +2025-02-06 02:11:53 - ERROR - stderr - 81%|████████ | 18124/22434 [16:04:13<3:00:24, 2.51s/it] +2025-02-06 02:11:56 - ERROR - stderr - 81%|████████ | 18125/22434 [16:04:15<2:59:48, 2.50s/it] +2025-02-06 02:11:56 - ERROR - stderr - +2025-02-06 02:11:56 - ERROR - stderr - +2025-02-06 02:11:56 - INFO - stdout - {'loss': 0.3536, 'grad_norm': 1.4730955362319946, 'learning_rate': 1.8734958324259577e-06, 'epoch': 2.42} +2025-02-06 02:11:56 - ERROR - stderr - 81%|████████ | 18125/22434 [16:04:16<2:59:48, 2.50s/it] +2025-02-06 02:11:58 - ERROR - stderr - 81%|████████ | 18126/22434 [16:04:18<2:58:09, 2.48s/it] +2025-02-06 02:11:58 - ERROR - stderr - +2025-02-06 02:11:58 - ERROR - stderr - +2025-02-06 02:11:58 - INFO - stdout - {'loss': 0.3651, 'grad_norm': 1.468219518661499, 'learning_rate': 1.8726545716259293e-06, 'epoch': 2.42} +2025-02-06 02:11:58 - ERROR - stderr - 81%|████████ | 18126/22434 
[16:04:18<2:58:09, 2.48s/it] +2025-02-06 02:12:01 - ERROR - stderr - 81%|████████ | 18127/22434 [16:04:20<2:58:49, 2.49s/it] +2025-02-06 02:12:01 - ERROR - stderr - +2025-02-06 02:12:01 - ERROR - stderr - +2025-02-06 02:12:01 - INFO - stdout - {'loss': 0.3528, 'grad_norm': 1.6197565793991089, 'learning_rate': 1.8718134802326393e-06, 'epoch': 2.42} +2025-02-06 02:12:01 - ERROR - stderr - 81%|████████ | 18127/22434 [16:04:20<2:58:49, 2.49s/it] +2025-02-06 02:12:03 - ERROR - stderr - 81%|████████ | 18128/22434 [16:04:23<2:59:19, 2.50s/it] +2025-02-06 02:12:03 - ERROR - stderr - +2025-02-06 02:12:03 - ERROR - stderr - +2025-02-06 02:12:03 - INFO - stdout - {'loss': 0.3817, 'grad_norm': 1.6689503192901611, 'learning_rate': 1.8709725582636195e-06, 'epoch': 2.42} +2025-02-06 02:12:03 - ERROR - stderr - 81%|████████ | 18128/22434 [16:04:23<2:59:19, 2.50s/it] +2025-02-06 02:12:06 - ERROR - stderr - 81%|████████ | 18129/22434 [16:04:25<2:59:25, 2.50s/it] +2025-02-06 02:12:06 - ERROR - stderr - +2025-02-06 02:12:06 - ERROR - stderr - +2025-02-06 02:12:06 - INFO - stdout - {'loss': 0.3817, 'grad_norm': 1.5541847944259644, 'learning_rate': 1.8701318057363981e-06, 'epoch': 2.42} +2025-02-06 02:12:06 - ERROR - stderr - 81%|████████ | 18129/22434 [16:04:25<2:59:25, 2.50s/it] +2025-02-06 02:12:08 - ERROR - stderr - 81%|████████ | 18130/22434 [16:04:28<2:58:30, 2.49s/it] +2025-02-06 02:12:08 - ERROR - stderr - +2025-02-06 02:12:08 - ERROR - stderr - +2025-02-06 02:12:08 - INFO - stdout - {'loss': 0.3823, 'grad_norm': 1.4522587060928345, 'learning_rate': 1.8692912226685012e-06, 'epoch': 2.42} +2025-02-06 02:12:08 - ERROR - stderr - 81%|████████ | 18130/22434 [16:04:28<2:58:30, 2.49s/it] +2025-02-06 02:12:11 - ERROR - stderr - 81%|████████ | 18131/22434 [16:04:30<2:59:45, 2.51s/it] +2025-02-06 02:12:11 - ERROR - stderr - +2025-02-06 02:12:11 - ERROR - stderr - +2025-02-06 02:12:11 - INFO - stdout - {'loss': 0.3501, 'grad_norm': 1.528070092201233, 'learning_rate': 
1.8684508090774467e-06, 'epoch': 2.42} +2025-02-06 02:12:11 - ERROR - stderr - 81%|████████ | 18131/22434 [16:04:30<2:59:45, 2.51s/it] +2025-02-06 02:12:13 - ERROR - stderr - 81%|████████ | 18132/22434 [16:04:33<2:59:48, 2.51s/it] +2025-02-06 02:12:13 - ERROR - stderr - +2025-02-06 02:12:13 - ERROR - stderr - +2025-02-06 02:12:13 - INFO - stdout - {'loss': 0.3424, 'grad_norm': 1.4616628885269165, 'learning_rate': 1.8676105649807573e-06, 'epoch': 2.42} +2025-02-06 02:12:13 - ERROR - stderr - 81%|████████ | 18132/22434 [16:04:33<2:59:48, 2.51s/it] +2025-02-06 02:12:16 - ERROR - stderr - 81%|████████ | 18133/22434 [16:04:35<2:59:09, 2.50s/it] +2025-02-06 02:12:16 - ERROR - stderr - +2025-02-06 02:12:16 - ERROR - stderr - +2025-02-06 02:12:16 - INFO - stdout - {'loss': 0.3875, 'grad_norm': 1.7883821725845337, 'learning_rate': 1.8667704903959383e-06, 'epoch': 2.42} +2025-02-06 02:12:16 - ERROR - stderr - 81%|████████ | 18133/22434 [16:04:35<2:59:09, 2.50s/it] +2025-02-06 02:12:18 - ERROR - stderr - 81%|████████ | 18134/22434 [16:04:38<2:59:23, 2.50s/it] +2025-02-06 02:12:18 - ERROR - stderr - +2025-02-06 02:12:18 - ERROR - stderr - +2025-02-06 02:12:18 - INFO - stdout - {'loss': 0.3796, 'grad_norm': 1.4654746055603027, 'learning_rate': 1.8659305853405118e-06, 'epoch': 2.42} +2025-02-06 02:12:18 - ERROR - stderr - 81%|████████ | 18134/22434 [16:04:38<2:59:23, 2.50s/it] +2025-02-06 02:12:21 - ERROR - stderr - 81%|████████ | 18135/22434 [16:04:41<3:04:19, 2.57s/it] +2025-02-06 02:12:21 - ERROR - stderr - +2025-02-06 02:12:21 - ERROR - stderr - +2025-02-06 02:12:21 - INFO - stdout - {'loss': 0.3382, 'grad_norm': 1.404805064201355, 'learning_rate': 1.865090849831973e-06, 'epoch': 2.43} +2025-02-06 02:12:21 - ERROR - stderr - 81%|████████ | 18135/22434 [16:04:41<3:04:19, 2.57s/it] +2025-02-06 02:12:23 - ERROR - stderr - 81%|████████ | 18136/22434 [16:04:43<3:03:10, 2.56s/it] +2025-02-06 02:12:23 - ERROR - stderr - +2025-02-06 02:12:23 - ERROR - stderr - +2025-02-06 02:12:23 - 
INFO - stdout - {'loss': 0.4161, 'grad_norm': 1.5814189910888672, 'learning_rate': 1.8642512838878335e-06, 'epoch': 2.43} +2025-02-06 02:12:23 - ERROR - stderr - 81%|████████ | 18136/22434 [16:04:43<3:03:10, 2.56s/it] +2025-02-06 02:12:26 - ERROR - stderr - 81%|████████ | 18137/22434 [16:04:46<3:03:10, 2.56s/it] +2025-02-06 02:12:26 - ERROR - stderr - +2025-02-06 02:12:26 - ERROR - stderr - +2025-02-06 02:12:26 - INFO - stdout - {'loss': 0.3219, 'grad_norm': 1.5269899368286133, 'learning_rate': 1.8634118875255935e-06, 'epoch': 2.43} +2025-02-06 02:12:26 - ERROR - stderr - 81%|████████ | 18137/22434 [16:04:46<3:03:10, 2.56s/it] +2025-02-06 02:12:29 - ERROR - stderr - 81%|████████ | 18138/22434 [16:04:48<3:02:21, 2.55s/it] +2025-02-06 02:12:29 - ERROR - stderr - +2025-02-06 02:12:29 - ERROR - stderr - +2025-02-06 02:12:29 - INFO - stdout - {'loss': 0.366, 'grad_norm': 1.4669893980026245, 'learning_rate': 1.8625726607627425e-06, 'epoch': 2.43} +2025-02-06 02:12:29 - ERROR - stderr - 81%|████████ | 18138/22434 [16:04:48<3:02:21, 2.55s/it] +2025-02-06 02:12:31 - ERROR - stderr - 81%|████████ | 18139/22434 [16:04:51<3:01:21, 2.53s/it] +2025-02-06 02:12:31 - ERROR - stderr - +2025-02-06 02:12:31 - ERROR - stderr - +2025-02-06 02:12:31 - INFO - stdout - {'loss': 0.4055, 'grad_norm': 1.62925386428833, 'learning_rate': 1.8617336036167822e-06, 'epoch': 2.43} +2025-02-06 02:12:31 - ERROR - stderr - 81%|████████ | 18139/22434 [16:04:51<3:01:21, 2.53s/it] +2025-02-06 02:12:34 - ERROR - stderr - 81%|████████ | 18140/22434 [16:04:53<3:01:25, 2.53s/it] +2025-02-06 02:12:34 - ERROR - stderr - +2025-02-06 02:12:34 - ERROR - stderr - +2025-02-06 02:12:34 - INFO - stdout - {'loss': 0.3695, 'grad_norm': 1.477181077003479, 'learning_rate': 1.8608947161051949e-06, 'epoch': 2.43} +2025-02-06 02:12:34 - ERROR - stderr - 81%|████████ | 18140/22434 [16:04:53<3:01:25, 2.53s/it] +2025-02-06 02:12:36 - ERROR - stderr - 81%|████████ | 18141/22434 [16:04:56<3:06:58, 2.61s/it] +2025-02-06 02:12:36 
- ERROR - stderr - +2025-02-06 02:12:36 - ERROR - stderr - +2025-02-06 02:12:36 - INFO - stdout - {'loss': 0.3501, 'grad_norm': 1.501099944114685, 'learning_rate': 1.8600559982454691e-06, 'epoch': 2.43} +2025-02-06 02:12:36 - ERROR - stderr - 81%|████████ | 18141/22434 [16:04:56<3:06:58, 2.61s/it] +2025-02-06 02:12:39 - ERROR - stderr - 81%|████████ | 18142/22434 [16:04:59<3:07:58, 2.63s/it] +2025-02-06 02:12:39 - ERROR - stderr - +2025-02-06 02:12:39 - ERROR - stderr - +2025-02-06 02:12:39 - INFO - stdout - {'loss': 0.3937, 'grad_norm': 1.5785404443740845, 'learning_rate': 1.8592174500550875e-06, 'epoch': 2.43} +2025-02-06 02:12:39 - ERROR - stderr - 81%|████████ | 18142/22434 [16:04:59<3:07:58, 2.63s/it] +2025-02-06 02:12:42 - ERROR - stderr - 81%|████████ | 18143/22434 [16:05:01<3:05:06, 2.59s/it] +2025-02-06 02:12:42 - ERROR - stderr - +2025-02-06 02:12:42 - ERROR - stderr - +2025-02-06 02:12:42 - INFO - stdout - {'loss': 0.3382, 'grad_norm': 1.5076217651367188, 'learning_rate': 1.8583790715515248e-06, 'epoch': 2.43} +2025-02-06 02:12:42 - ERROR - stderr - 81%|████████ | 18143/22434 [16:05:01<3:05:06, 2.59s/it] +2025-02-06 02:12:44 - ERROR - stderr - 81%|████████ | 18144/22434 [16:05:04<3:06:29, 2.61s/it] +2025-02-06 02:12:44 - ERROR - stderr - +2025-02-06 02:12:44 - ERROR - stderr - +2025-02-06 02:12:44 - INFO - stdout - {'loss': 0.355, 'grad_norm': 1.3558850288391113, 'learning_rate': 1.857540862752265e-06, 'epoch': 2.43} +2025-02-06 02:12:44 - ERROR - stderr - 81%|████████ | 18144/22434 [16:05:04<3:06:29, 2.61s/it] +2025-02-06 02:12:47 - ERROR - stderr - 81%|████████ | 18145/22434 [16:05:06<3:05:12, 2.59s/it] +2025-02-06 02:12:47 - ERROR - stderr - +2025-02-06 02:12:47 - ERROR - stderr - +2025-02-06 02:12:47 - INFO - stdout - {'loss': 0.3254, 'grad_norm': 1.367803931236267, 'learning_rate': 1.856702823674772e-06, 'epoch': 2.43} +2025-02-06 02:12:47 - ERROR - stderr - 81%|████████ | 18145/22434 [16:05:07<3:05:12, 2.59s/it] +2025-02-06 02:12:49 - ERROR - 
stderr - 81%|████████ | 18146/22434 [16:05:09<3:04:01, 2.58s/it] +2025-02-06 02:12:49 - ERROR - stderr - +2025-02-06 02:12:49 - ERROR - stderr - +2025-02-06 02:12:49 - INFO - stdout - {'loss': 0.38, 'grad_norm': 1.426256537437439, 'learning_rate': 1.855864954336517e-06, 'epoch': 2.43} +2025-02-06 02:12:49 - ERROR - stderr - 81%|████████ | 18146/22434 [16:05:09<3:04:01, 2.58s/it] +2025-02-06 02:12:52 - ERROR - stderr - 81%|████████ | 18147/22434 [16:05:12<3:02:31, 2.55s/it] +2025-02-06 02:12:52 - ERROR - stderr - +2025-02-06 02:12:52 - ERROR - stderr - +2025-02-06 02:12:52 - INFO - stdout - {'loss': 0.3599, 'grad_norm': 1.5262291431427002, 'learning_rate': 1.855027254754963e-06, 'epoch': 2.43} +2025-02-06 02:12:52 - ERROR - stderr - 81%|████████ | 18147/22434 [16:05:12<3:02:31, 2.55s/it] +2025-02-06 02:12:54 - ERROR - stderr - 81%|████████ | 18148/22434 [16:05:14<3:00:43, 2.53s/it] +2025-02-06 02:12:54 - ERROR - stderr - +2025-02-06 02:12:54 - ERROR - stderr - +2025-02-06 02:12:54 - INFO - stdout - {'loss': 0.3719, 'grad_norm': 1.6131625175476074, 'learning_rate': 1.8541897249475715e-06, 'epoch': 2.43} +2025-02-06 02:12:54 - ERROR - stderr - 81%|████████ | 18148/22434 [16:05:14<3:00:43, 2.53s/it] +2025-02-06 02:12:57 - ERROR - stderr - 81%|████████ | 18149/22434 [16:05:17<3:02:15, 2.55s/it] +2025-02-06 02:12:57 - ERROR - stderr - +2025-02-06 02:12:57 - ERROR - stderr - +2025-02-06 02:12:57 - INFO - stdout - {'loss': 0.3442, 'grad_norm': 1.5687006711959839, 'learning_rate': 1.853352364931802e-06, 'epoch': 2.43} +2025-02-06 02:12:57 - ERROR - stderr - 81%|████████ | 18149/22434 [16:05:17<3:02:15, 2.55s/it] +2025-02-06 02:12:59 - ERROR - stderr - 81%|████████ | 18150/22434 [16:05:19<3:01:32, 2.54s/it] +2025-02-06 02:12:59 - ERROR - stderr - +2025-02-06 02:12:59 - ERROR - stderr - +2025-02-06 02:12:59 - INFO - stdout - {'loss': 0.3556, 'grad_norm': 1.5420058965682983, 'learning_rate': 1.8525151747251058e-06, 'epoch': 2.43} +2025-02-06 02:12:59 - ERROR - stderr - 
81%|████████ | 18150/22434 [16:05:19<3:01:32, 2.54s/it] +2025-02-06 02:13:02 - ERROR - stderr - 81%|████████ | 18151/22434 [16:05:22<2:59:44, 2.52s/it] +2025-02-06 02:13:02 - ERROR - stderr - +2025-02-06 02:13:02 - ERROR - stderr - +2025-02-06 02:13:02 - INFO - stdout - {'loss': 0.3802, 'grad_norm': 1.5271199941635132, 'learning_rate': 1.8516781543449346e-06, 'epoch': 2.43} +2025-02-06 02:13:02 - ERROR - stderr - 81%|████████ | 18151/22434 [16:05:22<2:59:44, 2.52s/it] +2025-02-06 02:13:04 - ERROR - stderr - 81%|████████ | 18152/22434 [16:05:24<2:59:08, 2.51s/it] +2025-02-06 02:13:04 - ERROR - stderr - +2025-02-06 02:13:04 - ERROR - stderr - +2025-02-06 02:13:04 - INFO - stdout - {'loss': 0.4274, 'grad_norm': 1.672491192817688, 'learning_rate': 1.8508413038087358e-06, 'epoch': 2.43} +2025-02-06 02:13:04 - ERROR - stderr - 81%|████████ | 18152/22434 [16:05:24<2:59:08, 2.51s/it] +2025-02-06 02:13:07 - ERROR - stderr - 81%|████████ | 18153/22434 [16:05:27<3:00:55, 2.54s/it] +2025-02-06 02:13:07 - ERROR - stderr - +2025-02-06 02:13:07 - ERROR - stderr - +2025-02-06 02:13:07 - INFO - stdout - {'loss': 0.3528, 'grad_norm': 1.634647250175476, 'learning_rate': 1.850004623133954e-06, 'epoch': 2.43} +2025-02-06 02:13:07 - ERROR - stderr - 81%|████████ | 18153/22434 [16:05:27<3:00:55, 2.54s/it] +2025-02-06 02:13:09 - ERROR - stderr - 81%|████████ | 18154/22434 [16:05:29<3:00:07, 2.53s/it] +2025-02-06 02:13:09 - ERROR - stderr - +2025-02-06 02:13:09 - ERROR - stderr - +2025-02-06 02:13:09 - INFO - stdout - {'loss': 0.3752, 'grad_norm': 1.5605021715164185, 'learning_rate': 1.8491681123380235e-06, 'epoch': 2.43} +2025-02-06 02:13:09 - ERROR - stderr - 81%|████████ | 18154/22434 [16:05:29<3:00:07, 2.53s/it] +2025-02-06 02:13:12 - ERROR - stderr - 81%|████████ | 18155/22434 [16:05:32<3:07:28, 2.63s/it] +2025-02-06 02:13:12 - ERROR - stderr - +2025-02-06 02:13:12 - ERROR - stderr - +2025-02-06 02:13:12 - INFO - stdout - {'loss': 0.4058, 'grad_norm': 1.6575759649276733, 
'learning_rate': 1.8483317714383852e-06, 'epoch': 2.43} +2025-02-06 02:13:12 - ERROR - stderr - 81%|████████ | 18155/22434 [16:05:32<3:07:28, 2.63s/it] +2025-02-06 02:13:15 - ERROR - stderr - 81%|████████ | 18156/22434 [16:05:35<3:04:44, 2.59s/it] +2025-02-06 02:13:15 - ERROR - stderr - +2025-02-06 02:13:15 - ERROR - stderr - +2025-02-06 02:13:15 - INFO - stdout - {'loss': 0.4298, 'grad_norm': 1.6028364896774292, 'learning_rate': 1.8474956004524736e-06, 'epoch': 2.43} +2025-02-06 02:13:15 - ERROR - stderr - 81%|████████ | 18156/22434 [16:05:35<3:04:44, 2.59s/it] +2025-02-06 02:13:17 - ERROR - stderr - 81%|████████ | 18157/22434 [16:05:37<3:01:41, 2.55s/it] +2025-02-06 02:13:17 - ERROR - stderr - +2025-02-06 02:13:17 - ERROR - stderr - +2025-02-06 02:13:17 - INFO - stdout - {'loss': 0.3281, 'grad_norm': 1.501952052116394, 'learning_rate': 1.8466595993977098e-06, 'epoch': 2.43} +2025-02-06 02:13:17 - ERROR - stderr - 81%|████████ | 18157/22434 [16:05:37<3:01:41, 2.55s/it] +2025-02-06 02:13:20 - ERROR - stderr - 81%|████████ | 18158/22434 [16:05:40<3:03:50, 2.58s/it] +2025-02-06 02:13:20 - ERROR - stderr - +2025-02-06 02:13:20 - ERROR - stderr - +2025-02-06 02:13:20 - INFO - stdout - {'loss': 0.3734, 'grad_norm': 1.3506063222885132, 'learning_rate': 1.8458237682915303e-06, 'epoch': 2.43} +2025-02-06 02:13:20 - ERROR - stderr - 81%|████████ | 18158/22434 [16:05:40<3:03:50, 2.58s/it] +2025-02-06 02:13:22 - ERROR - stderr - 81%|████████ | 18159/22434 [16:05:42<3:01:23, 2.55s/it] +2025-02-06 02:13:22 - ERROR - stderr - +2025-02-06 02:13:22 - ERROR - stderr - +2025-02-06 02:13:22 - INFO - stdout - {'loss': 0.3349, 'grad_norm': 1.4719749689102173, 'learning_rate': 1.8449881071513464e-06, 'epoch': 2.43} +2025-02-06 02:13:22 - ERROR - stderr - 81%|████████ | 18159/22434 [16:05:42<3:01:23, 2.55s/it] +2025-02-06 02:13:25 - ERROR - stderr - 81%|████████ | 18160/22434 [16:05:45<3:00:00, 2.53s/it] +2025-02-06 02:13:25 - ERROR - stderr - +2025-02-06 02:13:25 - ERROR - stderr - 
+2025-02-06 02:13:25 - INFO - stdout - {'loss': 0.3995, 'grad_norm': 1.6349042654037476, 'learning_rate': 1.8441526159945878e-06, 'epoch': 2.43} +2025-02-06 02:13:25 - ERROR - stderr - 81%|████████ | 18160/22434 [16:05:45<3:00:00, 2.53s/it] +2025-02-06 02:13:27 - ERROR - stderr - 81%|████████ | 18161/22434 [16:05:47<2:59:19, 2.52s/it] +2025-02-06 02:13:27 - ERROR - stderr - +2025-02-06 02:13:27 - ERROR - stderr - +2025-02-06 02:13:27 - INFO - stdout - {'loss': 0.3765, 'grad_norm': 1.5119370222091675, 'learning_rate': 1.84331729483866e-06, 'epoch': 2.43} +2025-02-06 02:13:27 - ERROR - stderr - 81%|████████ | 18161/22434 [16:05:47<2:59:19, 2.52s/it] +2025-02-06 02:13:30 - ERROR - stderr - 81%|████████ | 18162/22434 [16:05:50<3:00:09, 2.53s/it] +2025-02-06 02:13:30 - ERROR - stderr - +2025-02-06 02:13:30 - ERROR - stderr - +2025-02-06 02:13:30 - INFO - stdout - {'loss': 0.3934, 'grad_norm': 1.6625653505325317, 'learning_rate': 1.8424821437009766e-06, 'epoch': 2.43} +2025-02-06 02:13:30 - ERROR - stderr - 81%|████████ | 18162/22434 [16:05:50<3:00:09, 2.53s/it] +2025-02-06 02:13:32 - ERROR - stderr - 81%|████████ | 18163/22434 [16:05:52<2:58:48, 2.51s/it] +2025-02-06 02:13:32 - ERROR - stderr - +2025-02-06 02:13:32 - ERROR - stderr - +2025-02-06 02:13:32 - INFO - stdout - {'loss': 0.3783, 'grad_norm': 1.4546360969543457, 'learning_rate': 1.8416471625989506e-06, 'epoch': 2.43} +2025-02-06 02:13:32 - ERROR - stderr - 81%|████████ | 18163/22434 [16:05:52<2:58:48, 2.51s/it] +2025-02-06 02:13:35 - ERROR - stderr - 81%|████████ | 18164/22434 [16:05:55<2:58:34, 2.51s/it] +2025-02-06 02:13:35 - ERROR - stderr - +2025-02-06 02:13:35 - ERROR - stderr - +2025-02-06 02:13:35 - INFO - stdout - {'loss': 0.3482, 'grad_norm': 1.339250087738037, 'learning_rate': 1.8408123515499821e-06, 'epoch': 2.43} +2025-02-06 02:13:35 - ERROR - stderr - 81%|████████ | 18164/22434 [16:05:55<2:58:34, 2.51s/it] +2025-02-06 02:13:38 - ERROR - stderr - 81%|████████ | 18165/22434 [16:05:57<3:05:42, 
2.61s/it] +2025-02-06 02:13:38 - ERROR - stderr - +2025-02-06 02:13:38 - ERROR - stderr - +2025-02-06 02:13:38 - INFO - stdout - {'loss': 0.4295, 'grad_norm': 1.6681081056594849, 'learning_rate': 1.839977710571471e-06, 'epoch': 2.43} +2025-02-06 02:13:38 - ERROR - stderr - 81%|████████ | 18165/22434 [16:05:58<3:05:42, 2.61s/it] +2025-02-06 02:13:40 - ERROR - stderr - 81%|████████ | 18166/22434 [16:06:00<3:04:55, 2.60s/it] +2025-02-06 02:13:40 - ERROR - stderr - +2025-02-06 02:13:40 - ERROR - stderr - +2025-02-06 02:13:40 - INFO - stdout - {'loss': 0.3427, 'grad_norm': 1.3231414556503296, 'learning_rate': 1.8391432396808173e-06, 'epoch': 2.43} +2025-02-06 02:13:40 - ERROR - stderr - 81%|████████ | 18166/22434 [16:06:00<3:04:55, 2.60s/it] +2025-02-06 02:13:43 - ERROR - stderr - 81%|████████ | 18167/22434 [16:06:03<3:05:03, 2.60s/it] +2025-02-06 02:13:43 - ERROR - stderr - +2025-02-06 02:13:43 - ERROR - stderr - +2025-02-06 02:13:43 - INFO - stdout - {'loss': 0.3932, 'grad_norm': 1.5408296585083008, 'learning_rate': 1.8383089388954134e-06, 'epoch': 2.43} +2025-02-06 02:13:43 - ERROR - stderr - 81%|████████ | 18167/22434 [16:06:03<3:05:03, 2.60s/it] +2025-02-06 02:13:46 - ERROR - stderr - 81%|████████ | 18168/22434 [16:06:05<3:09:24, 2.66s/it] +2025-02-06 02:13:46 - ERROR - stderr - +2025-02-06 02:13:46 - ERROR - stderr - +2025-02-06 02:13:46 - INFO - stdout - {'loss': 0.3744, 'grad_norm': 1.5553381443023682, 'learning_rate': 1.8374748082326487e-06, 'epoch': 2.43} +2025-02-06 02:13:46 - ERROR - stderr - 81%|████████ | 18168/22434 [16:06:06<3:09:24, 2.66s/it] +2025-02-06 02:13:48 - ERROR - stderr - 81%|████████ | 18169/22434 [16:06:08<3:04:41, 2.60s/it] +2025-02-06 02:13:48 - ERROR - stderr - +2025-02-06 02:13:48 - ERROR - stderr - +2025-02-06 02:13:48 - INFO - stdout - {'loss': 0.3719, 'grad_norm': 1.5951651334762573, 'learning_rate': 1.8366408477099118e-06, 'epoch': 2.43} +2025-02-06 02:13:48 - ERROR - stderr - 81%|████████ | 18169/22434 [16:06:08<3:04:41, 2.60s/it] 
+2025-02-06 02:13:51 - ERROR - stderr - 81%|████████ | 18170/22434 [16:06:10<3:02:03, 2.56s/it] +2025-02-06 02:13:51 - ERROR - stderr - +2025-02-06 02:13:51 - ERROR - stderr - +2025-02-06 02:13:51 - INFO - stdout - {'loss': 0.3781, 'grad_norm': 1.461053729057312, 'learning_rate': 1.8358070573445852e-06, 'epoch': 2.43} +2025-02-06 02:13:51 - ERROR - stderr - 81%|████████ | 18170/22434 [16:06:10<3:02:03, 2.56s/it] +2025-02-06 02:13:53 - ERROR - stderr - 81%|████████ | 18171/22434 [16:06:13<3:00:19, 2.54s/it] +2025-02-06 02:13:53 - ERROR - stderr - +2025-02-06 02:13:53 - ERROR - stderr - +2025-02-06 02:13:53 - INFO - stdout - {'loss': 0.4401, 'grad_norm': 1.694732666015625, 'learning_rate': 1.8349734371540485e-06, 'epoch': 2.43} +2025-02-06 02:13:53 - ERROR - stderr - 81%|████████ | 18171/22434 [16:06:13<3:00:19, 2.54s/it] +2025-02-06 02:13:56 - ERROR - stderr - 81%|████████ | 18172/22434 [16:06:15<2:59:08, 2.52s/it] +2025-02-06 02:13:56 - ERROR - stderr - +2025-02-06 02:13:56 - ERROR - stderr - +2025-02-06 02:13:56 - INFO - stdout - {'loss': 0.3338, 'grad_norm': 1.6370354890823364, 'learning_rate': 1.8341399871556786e-06, 'epoch': 2.43} +2025-02-06 02:13:56 - ERROR - stderr - 81%|████████ | 18172/22434 [16:06:15<2:59:08, 2.52s/it] +2025-02-06 02:13:58 - ERROR - stderr - 81%|████████ | 18173/22434 [16:06:18<2:58:51, 2.52s/it] +2025-02-06 02:13:58 - ERROR - stderr - +2025-02-06 02:13:58 - ERROR - stderr - +2025-02-06 02:13:58 - INFO - stdout - {'loss': 0.3756, 'grad_norm': 1.4816865921020508, 'learning_rate': 1.8333067073668432e-06, 'epoch': 2.43} +2025-02-06 02:13:58 - ERROR - stderr - 81%|████████ | 18173/22434 [16:06:18<2:58:51, 2.52s/it] +2025-02-06 02:14:01 - ERROR - stderr - 81%|████████ | 18174/22434 [16:06:20<2:56:45, 2.49s/it] +2025-02-06 02:14:01 - ERROR - stderr - +2025-02-06 02:14:01 - ERROR - stderr - +2025-02-06 02:14:01 - INFO - stdout - {'loss': 0.4194, 'grad_norm': 1.524155616760254, 'learning_rate': 1.8324735978049168e-06, 'epoch': 2.43} +2025-02-06 
02:14:01 - ERROR - stderr - 81%|████████ | 18174/22434 [16:06:20<2:56:45, 2.49s/it] +2025-02-06 02:14:03 - ERROR - stderr - 81%|████████ | 18175/22434 [16:06:23<2:57:50, 2.51s/it] +2025-02-06 02:14:03 - ERROR - stderr - +2025-02-06 02:14:03 - ERROR - stderr - +2025-02-06 02:14:03 - INFO - stdout - {'loss': 0.4487, 'grad_norm': 1.6262582540512085, 'learning_rate': 1.8316406584872625e-06, 'epoch': 2.43} +2025-02-06 02:14:03 - ERROR - stderr - 81%|████████ | 18175/22434 [16:06:23<2:57:50, 2.51s/it] +2025-02-06 02:14:06 - ERROR - stderr - 81%|████████ | 18176/22434 [16:06:26<3:05:51, 2.62s/it] +2025-02-06 02:14:06 - ERROR - stderr - +2025-02-06 02:14:06 - ERROR - stderr - +2025-02-06 02:14:06 - INFO - stdout - {'loss': 0.3709, 'grad_norm': 1.5761114358901978, 'learning_rate': 1.8308078894312431e-06, 'epoch': 2.43} +2025-02-06 02:14:06 - ERROR - stderr - 81%|████████ | 18176/22434 [16:06:26<3:05:51, 2.62s/it] +2025-02-06 02:14:08 - ERROR - stderr - 81%|████████ | 18177/22434 [16:06:28<3:03:18, 2.58s/it] +2025-02-06 02:14:08 - ERROR - stderr - +2025-02-06 02:14:08 - ERROR - stderr - +2025-02-06 02:14:08 - INFO - stdout - {'loss': 0.3556, 'grad_norm': 1.3818485736846924, 'learning_rate': 1.829975290654218e-06, 'epoch': 2.43} +2025-02-06 02:14:08 - ERROR - stderr - 81%|████████ | 18177/22434 [16:06:28<3:03:18, 2.58s/it] +2025-02-06 02:14:11 - ERROR - stderr - 81%|████████ | 18178/22434 [16:06:31<3:00:52, 2.55s/it] +2025-02-06 02:14:11 - ERROR - stderr - +2025-02-06 02:14:11 - ERROR - stderr - +2025-02-06 02:14:11 - INFO - stdout - {'loss': 0.4244, 'grad_norm': 1.6064000129699707, 'learning_rate': 1.8291428621735353e-06, 'epoch': 2.43} +2025-02-06 02:14:11 - ERROR - stderr - 81%|████████ | 18178/22434 [16:06:31<3:00:52, 2.55s/it] +2025-02-06 02:14:13 - ERROR - stderr - 81%|████████ | 18179/22434 [16:06:33<2:59:51, 2.54s/it] +2025-02-06 02:14:13 - ERROR - stderr - +2025-02-06 02:14:13 - ERROR - stderr - +2025-02-06 02:14:13 - INFO - stdout - {'loss': 0.3606, 'grad_norm': 
1.4403181076049805, 'learning_rate': 1.8283106040065557e-06, 'epoch': 2.43} +2025-02-06 02:14:13 - ERROR - stderr - 81%|████████ | 18179/22434 [16:06:33<2:59:51, 2.54s/it] +2025-02-06 02:14:16 - ERROR - stderr - 81%|████████ | 18180/22434 [16:06:36<2:59:37, 2.53s/it] +2025-02-06 02:14:16 - ERROR - stderr - +2025-02-06 02:14:16 - ERROR - stderr - +2025-02-06 02:14:16 - INFO - stdout - {'loss': 0.363, 'grad_norm': 1.559250831604004, 'learning_rate': 1.8274785161706198e-06, 'epoch': 2.43} +2025-02-06 02:14:16 - ERROR - stderr - 81%|████████ | 18180/22434 [16:06:36<2:59:37, 2.53s/it] +2025-02-06 02:14:18 - ERROR - stderr - 81%|████████ | 18181/22434 [16:06:38<2:57:19, 2.50s/it] +2025-02-06 02:14:18 - ERROR - stderr - +2025-02-06 02:14:18 - ERROR - stderr - +2025-02-06 02:14:18 - INFO - stdout - {'loss': 0.3584, 'grad_norm': 1.4103502035140991, 'learning_rate': 1.8266465986830718e-06, 'epoch': 2.43} +2025-02-06 02:14:18 - ERROR - stderr - 81%|████████ | 18181/22434 [16:06:38<2:57:19, 2.50s/it] +2025-02-06 02:14:21 - ERROR - stderr - 81%|████████ | 18182/22434 [16:06:41<3:02:36, 2.58s/it] +2025-02-06 02:14:21 - ERROR - stderr - +2025-02-06 02:14:21 - ERROR - stderr - +2025-02-06 02:14:21 - INFO - stdout - {'loss': 0.417, 'grad_norm': 1.6944823265075684, 'learning_rate': 1.8258148515612584e-06, 'epoch': 2.43} +2025-02-06 02:14:21 - ERROR - stderr - 81%|████████ | 18182/22434 [16:06:41<3:02:36, 2.58s/it] +2025-02-06 02:14:24 - ERROR - stderr - 81%|████████ | 18183/22434 [16:06:43<2:59:09, 2.53s/it] +2025-02-06 02:14:24 - ERROR - stderr - +2025-02-06 02:14:24 - ERROR - stderr - +2025-02-06 02:14:24 - INFO - stdout - {'loss': 0.3424, 'grad_norm': 1.6668181419372559, 'learning_rate': 1.8249832748225082e-06, 'epoch': 2.43} +2025-02-06 02:14:24 - ERROR - stderr - 81%|████████ | 18183/22434 [16:06:43<2:59:09, 2.53s/it] +2025-02-06 02:14:26 - ERROR - stderr - 81%|████████ | 18184/22434 [16:06:46<3:00:04, 2.54s/it] +2025-02-06 02:14:26 - ERROR - stderr - +2025-02-06 02:14:26 - 
ERROR - stderr - +2025-02-06 02:14:26 - INFO - stdout - {'loss': 0.32, 'grad_norm': 1.3947749137878418, 'learning_rate': 1.8241518684841642e-06, 'epoch': 2.43} +2025-02-06 02:14:26 - ERROR - stderr - 81%|████████ | 18184/22434 [16:06:46<3:00:04, 2.54s/it] +2025-02-06 02:14:29 - ERROR - stderr - 81%|████████ | 18185/22434 [16:06:48<2:59:53, 2.54s/it] +2025-02-06 02:14:29 - ERROR - stderr - +2025-02-06 02:14:29 - ERROR - stderr - +2025-02-06 02:14:29 - INFO - stdout - {'loss': 0.3582, 'grad_norm': 1.5086702108383179, 'learning_rate': 1.8233206325635489e-06, 'epoch': 2.43} +2025-02-06 02:14:29 - ERROR - stderr - 81%|████████ | 18185/22434 [16:06:48<2:59:53, 2.54s/it] +2025-02-06 02:14:31 - ERROR - stderr - 81%|████████ | 18186/22434 [16:06:51<2:59:07, 2.53s/it] +2025-02-06 02:14:31 - ERROR - stderr - +2025-02-06 02:14:31 - ERROR - stderr - +2025-02-06 02:14:31 - INFO - stdout - {'loss': 0.3815, 'grad_norm': 1.5507982969284058, 'learning_rate': 1.8224895670779906e-06, 'epoch': 2.43} +2025-02-06 02:14:31 - ERROR - stderr - 81%|████████ | 18186/22434 [16:06:51<2:59:07, 2.53s/it] +2025-02-06 02:14:34 - ERROR - stderr - 81%|████████ | 18187/22434 [16:06:53<2:58:53, 2.53s/it] +2025-02-06 02:14:34 - ERROR - stderr - +2025-02-06 02:14:34 - ERROR - stderr - +2025-02-06 02:14:34 - INFO - stdout - {'loss': 0.3778, 'grad_norm': 1.6147749423980713, 'learning_rate': 1.8216586720448115e-06, 'epoch': 2.43} +2025-02-06 02:14:34 - ERROR - stderr - 81%|████████ | 18187/22434 [16:06:54<2:58:53, 2.53s/it] +2025-02-06 02:14:36 - ERROR - stderr - 81%|████████ | 18188/22434 [16:06:56<2:58:16, 2.52s/it] +2025-02-06 02:14:36 - ERROR - stderr - +2025-02-06 02:14:36 - ERROR - stderr - +2025-02-06 02:14:36 - INFO - stdout - {'loss': 0.3849, 'grad_norm': 1.6236050128936768, 'learning_rate': 1.8208279474813295e-06, 'epoch': 2.43} +2025-02-06 02:14:36 - ERROR - stderr - 81%|████████ | 18188/22434 [16:06:56<2:58:16, 2.52s/it] +2025-02-06 02:14:39 - ERROR - stderr - 81%|████████ | 18189/22434 
[16:06:58<2:57:00, 2.50s/it] +2025-02-06 02:14:39 - ERROR - stderr - +2025-02-06 02:14:39 - ERROR - stderr - +2025-02-06 02:14:39 - INFO - stdout - {'loss': 0.3446, 'grad_norm': 1.5117872953414917, 'learning_rate': 1.8199973934048677e-06, 'epoch': 2.43} +2025-02-06 02:14:39 - ERROR - stderr - 81%|████████ | 18189/22434 [16:06:58<2:57:00, 2.50s/it] +2025-02-06 02:14:41 - ERROR - stderr - 81%|████████ | 18190/22434 [16:07:01<2:55:47, 2.49s/it] +2025-02-06 02:14:41 - ERROR - stderr - +2025-02-06 02:14:41 - ERROR - stderr - +2025-02-06 02:14:41 - INFO - stdout - {'loss': 0.374, 'grad_norm': 1.6009238958358765, 'learning_rate': 1.8191670098327297e-06, 'epoch': 2.43} +2025-02-06 02:14:41 - ERROR - stderr - 81%|████████ | 18190/22434 [16:07:01<2:55:47, 2.49s/it] +2025-02-06 02:14:44 - ERROR - stderr - 81%|████████ | 18191/22434 [16:07:03<2:55:07, 2.48s/it] +2025-02-06 02:14:44 - ERROR - stderr - +2025-02-06 02:14:44 - ERROR - stderr - +2025-02-06 02:14:44 - INFO - stdout - {'loss': 0.4235, 'grad_norm': 1.6920732259750366, 'learning_rate': 1.8183367967822274e-06, 'epoch': 2.43} +2025-02-06 02:14:44 - ERROR - stderr - 81%|████████ | 18191/22434 [16:07:03<2:55:07, 2.48s/it] +2025-02-06 02:14:46 - ERROR - stderr - 81%|████████ | 18192/22434 [16:07:06<2:57:07, 2.51s/it] +2025-02-06 02:14:46 - ERROR - stderr - +2025-02-06 02:14:46 - ERROR - stderr - +2025-02-06 02:14:46 - INFO - stdout - {'loss': 0.4341, 'grad_norm': 1.650769591331482, 'learning_rate': 1.8175067542706659e-06, 'epoch': 2.43} +2025-02-06 02:14:46 - ERROR - stderr - 81%|████████ | 18192/22434 [16:07:06<2:57:07, 2.51s/it] +2025-02-06 02:14:49 - ERROR - stderr - 81%|████████ | 18193/22434 [16:07:08<2:57:19, 2.51s/it] +2025-02-06 02:14:49 - ERROR - stderr - +2025-02-06 02:14:49 - ERROR - stderr - +2025-02-06 02:14:49 - INFO - stdout - {'loss': 0.355, 'grad_norm': 1.4963065385818481, 'learning_rate': 1.8166768823153458e-06, 'epoch': 2.43} +2025-02-06 02:14:49 - ERROR - stderr - 81%|████████ | 18193/22434 
[16:07:08<2:57:19, 2.51s/it] +2025-02-06 02:14:51 - ERROR - stderr - 81%|████████ | 18194/22434 [16:07:11<2:56:20, 2.50s/it] +2025-02-06 02:14:51 - ERROR - stderr - +2025-02-06 02:14:51 - ERROR - stderr - +2025-02-06 02:14:51 - INFO - stdout - {'loss': 0.4489, 'grad_norm': 1.6237667798995972, 'learning_rate': 1.8158471809335653e-06, 'epoch': 2.43} +2025-02-06 02:14:51 - ERROR - stderr - 81%|████████ | 18194/22434 [16:07:11<2:56:20, 2.50s/it] +2025-02-06 02:14:54 - ERROR - stderr - 81%|████████ | 18195/22434 [16:07:13<2:56:05, 2.49s/it] +2025-02-06 02:14:54 - ERROR - stderr - +2025-02-06 02:14:54 - ERROR - stderr - +2025-02-06 02:14:54 - INFO - stdout - {'loss': 0.3603, 'grad_norm': 1.5648760795593262, 'learning_rate': 1.8150176501426199e-06, 'epoch': 2.43} +2025-02-06 02:14:54 - ERROR - stderr - 81%|████████ | 18195/22434 [16:07:13<2:56:05, 2.49s/it] +2025-02-06 02:14:56 - ERROR - stderr - 81%|████████ | 18196/22434 [16:07:16<2:56:32, 2.50s/it] +2025-02-06 02:14:56 - ERROR - stderr - +2025-02-06 02:14:56 - ERROR - stderr - +2025-02-06 02:14:56 - INFO - stdout - {'loss': 0.4079, 'grad_norm': 1.7834364175796509, 'learning_rate': 1.8141882899597986e-06, 'epoch': 2.43} +2025-02-06 02:14:56 - ERROR - stderr - 81%|████████ | 18196/22434 [16:07:16<2:56:32, 2.50s/it] +2025-02-06 02:14:59 - ERROR - stderr - 81%|████████ | 18197/22434 [16:07:18<2:56:40, 2.50s/it] +2025-02-06 02:14:59 - ERROR - stderr - +2025-02-06 02:14:59 - ERROR - stderr - +2025-02-06 02:14:59 - INFO - stdout - {'loss': 0.3978, 'grad_norm': 1.4551732540130615, 'learning_rate': 1.8133591004023897e-06, 'epoch': 2.43} +2025-02-06 02:14:59 - ERROR - stderr - 81%|████████ | 18197/22434 [16:07:18<2:56:40, 2.50s/it] +2025-02-06 02:15:01 - ERROR - stderr - 81%|████████ | 18198/22434 [16:07:21<2:56:31, 2.50s/it] +2025-02-06 02:15:01 - ERROR - stderr - +2025-02-06 02:15:01 - ERROR - stderr - +2025-02-06 02:15:01 - INFO - stdout - {'loss': 0.3578, 'grad_norm': 1.5140717029571533, 'learning_rate': 
1.812530081487679e-06, 'epoch': 2.43} +2025-02-06 02:15:01 - ERROR - stderr - 81%|████████ | 18198/22434 [16:07:21<2:56:31, 2.50s/it] +2025-02-06 02:15:04 - ERROR - stderr - 81%|████████ | 18199/22434 [16:07:24<2:59:42, 2.55s/it] +2025-02-06 02:15:04 - ERROR - stderr - +2025-02-06 02:15:04 - ERROR - stderr - +2025-02-06 02:15:04 - INFO - stdout - {'loss': 0.3677, 'grad_norm': 1.7185540199279785, 'learning_rate': 1.8117012332329399e-06, 'epoch': 2.43} +2025-02-06 02:15:04 - ERROR - stderr - 81%|████████ | 18199/22434 [16:07:24<2:59:42, 2.55s/it] +2025-02-06 02:15:06 - ERROR - stderr - 81%|████████ | 18200/22434 [16:07:26<2:58:07, 2.52s/it] +2025-02-06 02:15:06 - ERROR - stderr - +2025-02-06 02:15:06 - ERROR - stderr - +2025-02-06 02:15:06 - INFO - stdout - {'loss': 0.3359, 'grad_norm': 1.321104645729065, 'learning_rate': 1.810872555655454e-06, 'epoch': 2.43} +2025-02-06 02:15:06 - ERROR - stderr - 81%|████████ | 18200/22434 [16:07:26<2:58:07, 2.52s/it] +2025-02-06 02:15:09 - ERROR - stderr - 81%|████████ | 18201/22434 [16:07:28<2:57:13, 2.51s/it] +2025-02-06 02:15:09 - ERROR - stderr - +2025-02-06 02:15:09 - ERROR - stderr - +2025-02-06 02:15:09 - INFO - stdout - {'loss': 0.3575, 'grad_norm': 1.5534127950668335, 'learning_rate': 1.810044048772498e-06, 'epoch': 2.43} +2025-02-06 02:15:09 - ERROR - stderr - 81%|████████ | 18201/22434 [16:07:29<2:57:13, 2.51s/it] +2025-02-06 02:15:12 - ERROR - stderr - 81%|████████ | 18202/22434 [16:07:31<3:07:50, 2.66s/it] +2025-02-06 02:15:12 - ERROR - stderr - +2025-02-06 02:15:12 - ERROR - stderr - +2025-02-06 02:15:12 - INFO - stdout - {'loss': 0.3681, 'grad_norm': 1.535367727279663, 'learning_rate': 1.809215712601331e-06, 'epoch': 2.43} +2025-02-06 02:15:12 - ERROR - stderr - 81%|████████ | 18202/22434 [16:07:32<3:07:50, 2.66s/it] +2025-02-06 02:15:14 - ERROR - stderr - 81%|████████ | 18203/22434 [16:07:34<3:05:31, 2.63s/it] +2025-02-06 02:15:14 - ERROR - stderr - +2025-02-06 02:15:14 - ERROR - stderr - +2025-02-06 02:15:14 - 
INFO - stdout - {'loss': 0.3617, 'grad_norm': 1.5620211362838745, 'learning_rate': 1.8083875471592294e-06, 'epoch': 2.43} +2025-02-06 02:15:14 - ERROR - stderr - 81%|████████ | 18203/22434 [16:07:34<3:05:31, 2.63s/it] +2025-02-06 02:15:17 - ERROR - stderr - 81%|████████ | 18204/22434 [16:07:36<3:01:07, 2.57s/it] +2025-02-06 02:15:17 - ERROR - stderr - +2025-02-06 02:15:17 - ERROR - stderr - +2025-02-06 02:15:17 - INFO - stdout - {'loss': 0.3733, 'grad_norm': 1.4875489473342896, 'learning_rate': 1.807559552463446e-06, 'epoch': 2.43} +2025-02-06 02:15:17 - ERROR - stderr - 81%|████████ | 18204/22434 [16:07:37<3:01:07, 2.57s/it] +2025-02-06 02:15:19 - ERROR - stderr - 81%|████████ | 18205/22434 [16:07:39<2:58:47, 2.54s/it] +2025-02-06 02:15:19 - ERROR - stderr - +2025-02-06 02:15:19 - ERROR - stderr - +2025-02-06 02:15:19 - INFO - stdout - {'loss': 0.3648, 'grad_norm': 1.5180929899215698, 'learning_rate': 1.8067317285312503e-06, 'epoch': 2.43} +2025-02-06 02:15:19 - ERROR - stderr - 81%|████████ | 18205/22434 [16:07:39<2:58:47, 2.54s/it] +2025-02-06 02:15:22 - ERROR - stderr - 81%|████████ | 18206/22434 [16:07:42<3:01:36, 2.58s/it] +2025-02-06 02:15:22 - ERROR - stderr - +2025-02-06 02:15:22 - ERROR - stderr - +2025-02-06 02:15:22 - INFO - stdout - {'loss': 0.361, 'grad_norm': 1.5418572425842285, 'learning_rate': 1.8059040753798884e-06, 'epoch': 2.43} +2025-02-06 02:15:22 - ERROR - stderr - 81%|████████ | 18206/22434 [16:07:42<3:01:36, 2.58s/it] +2025-02-06 02:15:24 - ERROR - stderr - 81%|████████ | 18207/22434 [16:07:44<2:59:03, 2.54s/it] +2025-02-06 02:15:24 - ERROR - stderr - +2025-02-06 02:15:24 - ERROR - stderr - +2025-02-06 02:15:24 - INFO - stdout - {'loss': 0.3393, 'grad_norm': 1.4747289419174194, 'learning_rate': 1.8050765930266123e-06, 'epoch': 2.43} +2025-02-06 02:15:24 - ERROR - stderr - 81%|████████ | 18207/22434 [16:07:44<2:59:03, 2.54s/it] +2025-02-06 02:15:27 - ERROR - stderr - 81%|████████ | 18208/22434 [16:07:46<2:56:21, 2.50s/it] +2025-02-06 
02:15:27 - ERROR - stderr - +2025-02-06 02:15:27 - ERROR - stderr - +2025-02-06 02:15:27 - INFO - stdout - {'loss': 0.4091, 'grad_norm': 1.499009132385254, 'learning_rate': 1.804249281488678e-06, 'epoch': 2.43} +2025-02-06 02:15:27 - ERROR - stderr - 81%|████████ | 18208/22434 [16:07:47<2:56:21, 2.50s/it] +2025-02-06 02:15:29 - ERROR - stderr - 81%|████████ | 18209/22434 [16:07:49<2:55:37, 2.49s/it] +2025-02-06 02:15:29 - ERROR - stderr - +2025-02-06 02:15:29 - ERROR - stderr - +2025-02-06 02:15:29 - INFO - stdout - {'loss': 0.3886, 'grad_norm': 1.6491479873657227, 'learning_rate': 1.803422140783323e-06, 'epoch': 2.44} +2025-02-06 02:15:29 - ERROR - stderr - 81%|████████ | 18209/22434 [16:07:49<2:55:37, 2.49s/it] +2025-02-06 02:15:32 - ERROR - stderr - 81%|████████ | 18210/22434 [16:07:51<2:54:21, 2.48s/it] +2025-02-06 02:15:32 - ERROR - stderr - +2025-02-06 02:15:32 - ERROR - stderr - +2025-02-06 02:15:32 - INFO - stdout - {'loss': 0.3917, 'grad_norm': 1.5776349306106567, 'learning_rate': 1.80259517092779e-06, 'epoch': 2.44} +2025-02-06 02:15:32 - ERROR - stderr - 81%|████████ | 18210/22434 [16:07:51<2:54:21, 2.48s/it] +2025-02-06 02:15:34 - ERROR - stderr - 81%|████████ | 18211/22434 [16:07:54<2:54:26, 2.48s/it] +2025-02-06 02:15:34 - ERROR - stderr - +2025-02-06 02:15:34 - ERROR - stderr - +2025-02-06 02:15:34 - INFO - stdout - {'loss': 0.3759, 'grad_norm': 1.514560580253601, 'learning_rate': 1.8017683719393163e-06, 'epoch': 2.44} +2025-02-06 02:15:34 - ERROR - stderr - 81%|████████ | 18211/22434 [16:07:54<2:54:26, 2.48s/it] +2025-02-06 02:15:37 - ERROR - stderr - 81%|████████ | 18212/22434 [16:07:56<2:54:07, 2.47s/it] +2025-02-06 02:15:37 - ERROR - stderr - +2025-02-06 02:15:37 - ERROR - stderr - +2025-02-06 02:15:37 - INFO - stdout - {'loss': 0.3621, 'grad_norm': 1.577111005783081, 'learning_rate': 1.8009417438351363e-06, 'epoch': 2.44} +2025-02-06 02:15:37 - ERROR - stderr - 81%|████████ | 18212/22434 [16:07:56<2:54:07, 2.47s/it] +2025-02-06 02:15:39 - ERROR 
- stderr - 81%|████████ | 18213/22434 [16:07:59<2:54:07, 2.48s/it] +2025-02-06 02:15:39 - ERROR - stderr - +2025-02-06 02:15:39 - ERROR - stderr - +2025-02-06 02:15:39 - INFO - stdout - {'loss': 0.4054, 'grad_norm': 1.7509959936141968, 'learning_rate': 1.80011528663248e-06, 'epoch': 2.44} +2025-02-06 02:15:39 - ERROR - stderr - 81%|████████ | 18213/22434 [16:07:59<2:54:07, 2.48s/it] +2025-02-06 02:15:42 - ERROR - stderr - 81%|████████ | 18214/22434 [16:08:01<2:55:30, 2.50s/it] +2025-02-06 02:15:42 - ERROR - stderr - +2025-02-06 02:15:42 - ERROR - stderr - +2025-02-06 02:15:42 - INFO - stdout - {'loss': 0.3622, 'grad_norm': 1.4242032766342163, 'learning_rate': 1.7992890003485742e-06, 'epoch': 2.44} +2025-02-06 02:15:42 - ERROR - stderr - 81%|████████ | 18214/22434 [16:08:01<2:55:30, 2.50s/it] +2025-02-06 02:15:44 - ERROR - stderr - 81%|████████ | 18215/22434 [16:08:04<2:55:37, 2.50s/it] +2025-02-06 02:15:44 - ERROR - stderr - +2025-02-06 02:15:44 - ERROR - stderr - +2025-02-06 02:15:44 - INFO - stdout - {'loss': 0.358, 'grad_norm': 1.655269742012024, 'learning_rate': 1.7984628850006414e-06, 'epoch': 2.44} +2025-02-06 02:15:44 - ERROR - stderr - 81%|████████ | 18215/22434 [16:08:04<2:55:37, 2.50s/it] +2025-02-06 02:15:47 - ERROR - stderr - 81%|████████ | 18216/22434 [16:08:06<2:55:19, 2.49s/it] +2025-02-06 02:15:47 - ERROR - stderr - +2025-02-06 02:15:47 - ERROR - stderr - +2025-02-06 02:15:47 - INFO - stdout - {'loss': 0.4521, 'grad_norm': 1.7394752502441406, 'learning_rate': 1.7976369406059025e-06, 'epoch': 2.44} +2025-02-06 02:15:47 - ERROR - stderr - 81%|████████ | 18216/22434 [16:08:06<2:55:19, 2.49s/it] +2025-02-06 02:15:49 - ERROR - stderr - 81%|████████ | 18217/22434 [16:08:09<2:56:11, 2.51s/it] +2025-02-06 02:15:49 - ERROR - stderr - +2025-02-06 02:15:49 - ERROR - stderr - +2025-02-06 02:15:49 - INFO - stdout - {'loss': 0.3311, 'grad_norm': 1.4547960758209229, 'learning_rate': 1.7968111671815747e-06, 'epoch': 2.44} +2025-02-06 02:15:49 - ERROR - stderr - 
81%|████████ | 18217/22434 [16:08:09<2:56:11, 2.51s/it] +2025-02-06 02:15:52 - ERROR - stderr - 81%|████████ | 18218/22434 [16:08:11<2:57:32, 2.53s/it] +2025-02-06 02:15:52 - ERROR - stderr - +2025-02-06 02:15:52 - ERROR - stderr - +2025-02-06 02:15:52 - INFO - stdout - {'loss': 0.3646, 'grad_norm': 1.5292388200759888, 'learning_rate': 1.7959855647448642e-06, 'epoch': 2.44} +2025-02-06 02:15:52 - ERROR - stderr - 81%|████████ | 18218/22434 [16:08:12<2:57:32, 2.53s/it] +2025-02-06 02:15:54 - ERROR - stderr - 81%|████████ | 18219/22434 [16:08:14<2:57:33, 2.53s/it] +2025-02-06 02:15:54 - ERROR - stderr - +2025-02-06 02:15:54 - ERROR - stderr - +2025-02-06 02:15:54 - INFO - stdout - {'loss': 0.3907, 'grad_norm': 1.6920738220214844, 'learning_rate': 1.7951601333129864e-06, 'epoch': 2.44} +2025-02-06 02:15:54 - ERROR - stderr - 81%|████████ | 18219/22434 [16:08:14<2:57:33, 2.53s/it] +2025-02-06 02:15:57 - ERROR - stderr - 81%|████████ | 18220/22434 [16:08:17<3:03:02, 2.61s/it] +2025-02-06 02:15:57 - ERROR - stderr - +2025-02-06 02:15:57 - ERROR - stderr - +2025-02-06 02:15:57 - INFO - stdout - {'loss': 0.353, 'grad_norm': 1.5297890901565552, 'learning_rate': 1.794334872903144e-06, 'epoch': 2.44} +2025-02-06 02:15:57 - ERROR - stderr - 81%|████████ | 18220/22434 [16:08:17<3:03:02, 2.61s/it] +2025-02-06 02:15:59 - ERROR - stderr - 81%|████████ | 18221/22434 [16:08:19<2:59:21, 2.55s/it] +2025-02-06 02:15:59 - ERROR - stderr - +2025-02-06 02:15:59 - ERROR - stderr - +2025-02-06 02:15:59 - INFO - stdout - {'loss': 0.3683, 'grad_norm': 1.5986666679382324, 'learning_rate': 1.7935097835325399e-06, 'epoch': 2.44} +2025-02-06 02:15:59 - ERROR - stderr - 81%|████████ | 18221/22434 [16:08:19<2:59:21, 2.55s/it] +2025-02-06 02:16:02 - ERROR - stderr - 81%|████████ | 18222/22434 [16:08:22<2:58:34, 2.54s/it] +2025-02-06 02:16:02 - ERROR - stderr - +2025-02-06 02:16:02 - ERROR - stderr - +2025-02-06 02:16:02 - INFO - stdout - {'loss': 0.3676, 'grad_norm': 1.4229732751846313, 
'learning_rate': 1.7926848652183736e-06, 'epoch': 2.44} +2025-02-06 02:16:02 - ERROR - stderr - 81%|████████ | 18222/22434 [16:08:22<2:58:34, 2.54s/it] +2025-02-06 02:16:04 - ERROR - stderr - 81%|████████ | 18223/22434 [16:08:24<2:56:55, 2.52s/it] +2025-02-06 02:16:04 - ERROR - stderr - +2025-02-06 02:16:04 - ERROR - stderr - +2025-02-06 02:16:04 - INFO - stdout - {'loss': 0.3292, 'grad_norm': 1.4651970863342285, 'learning_rate': 1.7918601179778328e-06, 'epoch': 2.44} +2025-02-06 02:16:04 - ERROR - stderr - 81%|████████ | 18223/22434 [16:08:24<2:56:55, 2.52s/it] +2025-02-06 02:16:07 - ERROR - stderr - 81%|████████ | 18224/22434 [16:08:27<2:57:43, 2.53s/it] +2025-02-06 02:16:07 - ERROR - stderr - +2025-02-06 02:16:07 - ERROR - stderr - +2025-02-06 02:16:07 - INFO - stdout - {'loss': 0.3596, 'grad_norm': 1.4440951347351074, 'learning_rate': 1.7910355418281189e-06, 'epoch': 2.44} +2025-02-06 02:16:07 - ERROR - stderr - 81%|████████ | 18224/22434 [16:08:27<2:57:43, 2.53s/it] +2025-02-06 02:16:09 - ERROR - stderr - 81%|████████ | 18225/22434 [16:08:29<2:57:09, 2.53s/it] +2025-02-06 02:16:10 - ERROR - stderr - +2025-02-06 02:16:10 - ERROR - stderr - +2025-02-06 02:16:10 - INFO - stdout - {'loss': 0.3555, 'grad_norm': 1.5454461574554443, 'learning_rate': 1.7902111367864106e-06, 'epoch': 2.44} +2025-02-06 02:16:10 - ERROR - stderr - 81%|████████ | 18225/22434 [16:08:29<2:57:09, 2.53s/it] +2025-02-06 02:16:12 - ERROR - stderr - 81%|████████ | 18226/22434 [16:08:32<2:54:57, 2.49s/it] +2025-02-06 02:16:12 - ERROR - stderr - +2025-02-06 02:16:12 - ERROR - stderr - +2025-02-06 02:16:12 - INFO - stdout - {'loss': 0.3617, 'grad_norm': 1.6433773040771484, 'learning_rate': 1.789386902869893e-06, 'epoch': 2.44} +2025-02-06 02:16:12 - ERROR - stderr - 81%|████████ | 18226/22434 [16:08:32<2:54:57, 2.49s/it] +2025-02-06 02:16:14 - ERROR - stderr - 81%|████████ | 18227/22434 [16:08:34<2:54:34, 2.49s/it] +2025-02-06 02:16:14 - ERROR - stderr - +2025-02-06 02:16:14 - ERROR - stderr - 
+2025-02-06 02:16:14 - INFO - stdout - {'loss': 0.3177, 'grad_norm': 1.3851348161697388, 'learning_rate': 1.7885628400957543e-06, 'epoch': 2.44} +2025-02-06 02:16:14 - ERROR - stderr - 81%|████████ | 18227/22434 [16:08:34<2:54:34, 2.49s/it] +2025-02-06 02:16:17 - ERROR - stderr - 81%|████████▏ | 18228/22434 [16:08:37<2:54:59, 2.50s/it] +2025-02-06 02:16:17 - ERROR - stderr - +2025-02-06 02:16:17 - ERROR - stderr - +2025-02-06 02:16:17 - INFO - stdout - {'loss': 0.3675, 'grad_norm': 1.5309910774230957, 'learning_rate': 1.7877389484811603e-06, 'epoch': 2.44} +2025-02-06 02:16:17 - ERROR - stderr - 81%|████████▏ | 18228/22434 [16:08:37<2:54:59, 2.50s/it] +2025-02-06 02:16:19 - ERROR - stderr - 81%|████████▏ | 18229/22434 [16:08:39<2:53:36, 2.48s/it] +2025-02-06 02:16:19 - ERROR - stderr - +2025-02-06 02:16:19 - ERROR - stderr - +2025-02-06 02:16:19 - INFO - stdout - {'loss': 0.4015, 'grad_norm': 1.658008098602295, 'learning_rate': 1.7869152280432944e-06, 'epoch': 2.44} +2025-02-06 02:16:19 - ERROR - stderr - 81%|████████▏ | 18229/22434 [16:08:39<2:53:36, 2.48s/it] +2025-02-06 02:16:22 - ERROR - stderr - 81%|████████▏ | 18230/22434 [16:08:42<2:58:08, 2.54s/it] +2025-02-06 02:16:22 - ERROR - stderr - +2025-02-06 02:16:22 - ERROR - stderr - +2025-02-06 02:16:22 - INFO - stdout - {'loss': 0.334, 'grad_norm': 1.5738810300827026, 'learning_rate': 1.7860916787993198e-06, 'epoch': 2.44} +2025-02-06 02:16:22 - ERROR - stderr - 81%|████████▏ | 18230/22434 [16:08:42<2:58:08, 2.54s/it] +2025-02-06 02:16:25 - ERROR - stderr - 81%|████████▏ | 18231/22434 [16:08:44<2:56:22, 2.52s/it] +2025-02-06 02:16:25 - ERROR - stderr - +2025-02-06 02:16:25 - ERROR - stderr - +2025-02-06 02:16:25 - INFO - stdout - {'loss': 0.3487, 'grad_norm': 1.5266389846801758, 'learning_rate': 1.785268300766404e-06, 'epoch': 2.44} +2025-02-06 02:16:25 - ERROR - stderr - 81%|████████▏ | 18231/22434 [16:08:44<2:56:22, 2.52s/it] +2025-02-06 02:16:27 - ERROR - stderr - 81%|████████▏ | 18232/22434 
[16:08:47<2:54:06, 2.49s/it] +2025-02-06 02:16:27 - ERROR - stderr - +2025-02-06 02:16:27 - ERROR - stderr - +2025-02-06 02:16:27 - INFO - stdout - {'loss': 0.3758, 'grad_norm': 1.4081307649612427, 'learning_rate': 1.7844450939617098e-06, 'epoch': 2.44} +2025-02-06 02:16:27 - ERROR - stderr - 81%|████████▏ | 18232/22434 [16:08:47<2:54:06, 2.49s/it] +2025-02-06 02:16:29 - ERROR - stderr - 81%|████████▏ | 18233/22434 [16:08:49<2:55:21, 2.50s/it] +2025-02-06 02:16:30 - ERROR - stderr - +2025-02-06 02:16:30 - ERROR - stderr - +2025-02-06 02:16:30 - INFO - stdout - {'loss': 0.373, 'grad_norm': 1.6197218894958496, 'learning_rate': 1.7836220584023956e-06, 'epoch': 2.44} +2025-02-06 02:16:30 - ERROR - stderr - 81%|████████▏ | 18233/22434 [16:08:49<2:55:21, 2.50s/it] +2025-02-06 02:16:32 - ERROR - stderr - 81%|████████▏ | 18234/22434 [16:08:52<2:55:16, 2.50s/it] +2025-02-06 02:16:32 - ERROR - stderr - +2025-02-06 02:16:32 - ERROR - stderr - +2025-02-06 02:16:32 - INFO - stdout - {'loss': 0.3492, 'grad_norm': 1.5370994806289673, 'learning_rate': 1.7827991941056177e-06, 'epoch': 2.44} +2025-02-06 02:16:32 - ERROR - stderr - 81%|████████▏ | 18234/22434 [16:08:52<2:55:16, 2.50s/it] +2025-02-06 02:16:34 - ERROR - stderr - 81%|████████▏ | 18235/22434 [16:08:54<2:54:34, 2.49s/it] +2025-02-06 02:16:34 - ERROR - stderr - +2025-02-06 02:16:34 - ERROR - stderr - +2025-02-06 02:16:34 - INFO - stdout - {'loss': 0.3888, 'grad_norm': 1.6212660074234009, 'learning_rate': 1.7819765010885281e-06, 'epoch': 2.44} +2025-02-06 02:16:34 - ERROR - stderr - 81%|████████▏ | 18235/22434 [16:08:54<2:54:34, 2.49s/it] +2025-02-06 02:16:37 - ERROR - stderr - 81%|████████▏ | 18236/22434 [16:08:57<2:54:28, 2.49s/it] +2025-02-06 02:16:37 - ERROR - stderr - +2025-02-06 02:16:37 - ERROR - stderr - +2025-02-06 02:16:37 - INFO - stdout - {'loss': 0.3853, 'grad_norm': 1.7444339990615845, 'learning_rate': 1.781153979368274e-06, 'epoch': 2.44} +2025-02-06 02:16:37 - ERROR - stderr - 81%|████████▏ | 18236/22434 
[16:08:57<2:54:28, 2.49s/it] +2025-02-06 02:16:39 - ERROR - stderr - 81%|████████▏ | 18237/22434 [16:08:59<2:54:02, 2.49s/it] +2025-02-06 02:16:39 - ERROR - stderr - +2025-02-06 02:16:39 - ERROR - stderr - +2025-02-06 02:16:39 - INFO - stdout - {'loss': 0.3613, 'grad_norm': 1.4328597784042358, 'learning_rate': 1.780331628962001e-06, 'epoch': 2.44} +2025-02-06 02:16:39 - ERROR - stderr - 81%|████████▏ | 18237/22434 [16:08:59<2:54:02, 2.49s/it] +2025-02-06 02:16:42 - ERROR - stderr - 81%|████████▏ | 18238/22434 [16:09:02<2:54:16, 2.49s/it] +2025-02-06 02:16:42 - ERROR - stderr - +2025-02-06 02:16:42 - ERROR - stderr - +2025-02-06 02:16:42 - INFO - stdout - {'loss': 0.3802, 'grad_norm': 1.5061352252960205, 'learning_rate': 1.7795094498868494e-06, 'epoch': 2.44} +2025-02-06 02:16:42 - ERROR - stderr - 81%|████████▏ | 18238/22434 [16:09:02<2:54:16, 2.49s/it] +2025-02-06 02:16:44 - ERROR - stderr - 81%|████████▏ | 18239/22434 [16:09:04<2:53:50, 2.49s/it] +2025-02-06 02:16:44 - ERROR - stderr - +2025-02-06 02:16:44 - ERROR - stderr - +2025-02-06 02:16:44 - INFO - stdout - {'loss': 0.3285, 'grad_norm': 1.4693876504898071, 'learning_rate': 1.7786874421599575e-06, 'epoch': 2.44} +2025-02-06 02:16:44 - ERROR - stderr - 81%|████████▏ | 18239/22434 [16:09:04<2:53:50, 2.49s/it] +2025-02-06 02:16:47 - ERROR - stderr - 81%|████████▏ | 18240/22434 [16:09:07<2:55:06, 2.51s/it] +2025-02-06 02:16:47 - ERROR - stderr - +2025-02-06 02:16:47 - ERROR - stderr - +2025-02-06 02:16:47 - INFO - stdout - {'loss': 0.3854, 'grad_norm': 1.5137965679168701, 'learning_rate': 1.7778656057984588e-06, 'epoch': 2.44} +2025-02-06 02:16:47 - ERROR - stderr - 81%|████████▏ | 18240/22434 [16:09:07<2:55:06, 2.51s/it] +2025-02-06 02:16:49 - ERROR - stderr - 81%|████████▏ | 18241/22434 [16:09:09<2:53:41, 2.49s/it] +2025-02-06 02:16:49 - ERROR - stderr - +2025-02-06 02:16:49 - ERROR - stderr - +2025-02-06 02:16:49 - INFO - stdout - {'loss': 0.3105, 'grad_norm': 1.4213095903396606, 'learning_rate': 
1.7770439408194862e-06, 'epoch': 2.44} +2025-02-06 02:16:49 - ERROR - stderr - 81%|████████▏ | 18241/22434 [16:09:09<2:53:41, 2.49s/it] +2025-02-06 02:16:52 - ERROR - stderr - 81%|████████▏ | 18242/22434 [16:09:12<2:54:04, 2.49s/it] +2025-02-06 02:16:52 - ERROR - stderr - +2025-02-06 02:16:52 - ERROR - stderr - +2025-02-06 02:16:52 - INFO - stdout - {'loss': 0.353, 'grad_norm': 1.6336909532546997, 'learning_rate': 1.776222447240159e-06, 'epoch': 2.44} +2025-02-06 02:16:52 - ERROR - stderr - 81%|████████▏ | 18242/22434 [16:09:12<2:54:04, 2.49s/it] +2025-02-06 02:16:54 - ERROR - stderr - 81%|████████▏ | 18243/22434 [16:09:14<2:52:22, 2.47s/it] +2025-02-06 02:16:54 - ERROR - stderr - +2025-02-06 02:16:54 - ERROR - stderr - +2025-02-06 02:16:54 - INFO - stdout - {'loss': 0.4112, 'grad_norm': 1.6486101150512695, 'learning_rate': 1.7754011250776114e-06, 'epoch': 2.44} +2025-02-06 02:16:54 - ERROR - stderr - 81%|████████▏ | 18243/22434 [16:09:14<2:52:22, 2.47s/it] +2025-02-06 02:16:57 - ERROR - stderr - 81%|████████▏ | 18244/22434 [16:09:17<2:52:15, 2.47s/it] +2025-02-06 02:16:57 - ERROR - stderr - +2025-02-06 02:16:57 - ERROR - stderr - +2025-02-06 02:16:57 - INFO - stdout - {'loss': 0.3485, 'grad_norm': 1.4699738025665283, 'learning_rate': 1.7745799743489512e-06, 'epoch': 2.44} +2025-02-06 02:16:57 - ERROR - stderr - 81%|████████▏ | 18244/22434 [16:09:17<2:52:15, 2.47s/it] +2025-02-06 02:16:59 - ERROR - stderr - 81%|████████▏ | 18245/22434 [16:09:19<2:52:56, 2.48s/it] +2025-02-06 02:16:59 - ERROR - stderr - +2025-02-06 02:16:59 - ERROR - stderr - +2025-02-06 02:16:59 - INFO - stdout - {'loss': 0.3763, 'grad_norm': 1.5429487228393555, 'learning_rate': 1.7737589950713042e-06, 'epoch': 2.44} +2025-02-06 02:16:59 - ERROR - stderr - 81%|████████▏ | 18245/22434 [16:09:19<2:52:56, 2.48s/it] +2025-02-06 02:17:02 - ERROR - stderr - 81%|████████▏ | 18246/22434 [16:09:22<2:57:33, 2.54s/it] +2025-02-06 02:17:02 - ERROR - stderr - +2025-02-06 02:17:02 - ERROR - stderr - +2025-02-06 
02:17:02 - INFO - stdout - {'loss': 0.3654, 'grad_norm': 1.5133074522018433, 'learning_rate': 1.7729381872617812e-06, 'epoch': 2.44} +2025-02-06 02:17:02 - ERROR - stderr - 81%|████████▏ | 18246/22434 [16:09:22<2:57:33, 2.54s/it] +2025-02-06 02:17:04 - ERROR - stderr - 81%|████████▏ | 18247/22434 [16:09:24<2:56:22, 2.53s/it] +2025-02-06 02:17:04 - ERROR - stderr - +2025-02-06 02:17:04 - ERROR - stderr - +2025-02-06 02:17:04 - INFO - stdout - {'loss': 0.3536, 'grad_norm': 1.5166326761245728, 'learning_rate': 1.7721175509374832e-06, 'epoch': 2.44} +2025-02-06 02:17:04 - ERROR - stderr - 81%|████████▏ | 18247/22434 [16:09:24<2:56:22, 2.53s/it] +2025-02-06 02:17:07 - ERROR - stderr - 81%|████████▏ | 18248/22434 [16:09:27<2:55:36, 2.52s/it] +2025-02-06 02:17:07 - ERROR - stderr - +2025-02-06 02:17:07 - ERROR - stderr - +2025-02-06 02:17:07 - INFO - stdout - {'loss': 0.4087, 'grad_norm': 1.5559535026550293, 'learning_rate': 1.7712970861155276e-06, 'epoch': 2.44} +2025-02-06 02:17:07 - ERROR - stderr - 81%|████████▏ | 18248/22434 [16:09:27<2:55:36, 2.52s/it] +2025-02-06 02:17:09 - ERROR - stderr - 81%|████████▏ | 18249/22434 [16:09:29<2:53:53, 2.49s/it] +2025-02-06 02:17:09 - ERROR - stderr - +2025-02-06 02:17:09 - ERROR - stderr - +2025-02-06 02:17:09 - INFO - stdout - {'loss': 0.3389, 'grad_norm': 1.5959115028381348, 'learning_rate': 1.7704767928130084e-06, 'epoch': 2.44} +2025-02-06 02:17:09 - ERROR - stderr - 81%|████████▏ | 18249/22434 [16:09:29<2:53:53, 2.49s/it] +2025-02-06 02:17:12 - ERROR - stderr - 81%|████████▏ | 18250/22434 [16:09:32<2:57:59, 2.55s/it] +2025-02-06 02:17:12 - ERROR - stderr - +2025-02-06 02:17:12 - ERROR - stderr - +2025-02-06 02:17:12 - INFO - stdout - {'loss': 0.3898, 'grad_norm': 1.5388987064361572, 'learning_rate': 1.7696566710470254e-06, 'epoch': 2.44} +2025-02-06 02:17:12 - ERROR - stderr - 81%|████████▏ | 18250/22434 [16:09:32<2:57:59, 2.55s/it] +2025-02-06 02:17:15 - ERROR - stderr - 81%|████████▏ | 18251/22434 [16:09:34<2:57:13, 
2.54s/it] +2025-02-06 02:17:15 - ERROR - stderr - +2025-02-06 02:17:15 - ERROR - stderr - +2025-02-06 02:17:15 - INFO - stdout - {'loss': 0.3611, 'grad_norm': 1.5253854990005493, 'learning_rate': 1.7688367208346723e-06, 'epoch': 2.44} +2025-02-06 02:17:15 - ERROR - stderr - 81%|████████▏ | 18251/22434 [16:09:34<2:57:13, 2.54s/it] +2025-02-06 02:17:17 - ERROR - stderr - 81%|████████▏ | 18252/22434 [16:09:37<2:56:21, 2.53s/it] +2025-02-06 02:17:17 - ERROR - stderr - +2025-02-06 02:17:17 - ERROR - stderr - +2025-02-06 02:17:17 - INFO - stdout - {'loss': 0.3708, 'grad_norm': 1.5663312673568726, 'learning_rate': 1.7680169421930404e-06, 'epoch': 2.44} +2025-02-06 02:17:17 - ERROR - stderr - 81%|████████▏ | 18252/22434 [16:09:37<2:56:21, 2.53s/it] +2025-02-06 02:17:20 - ERROR - stderr - 81%|████████▏ | 18253/22434 [16:09:39<2:56:12, 2.53s/it] +2025-02-06 02:17:20 - ERROR - stderr - +2025-02-06 02:17:20 - ERROR - stderr - +2025-02-06 02:17:20 - INFO - stdout - {'loss': 0.37, 'grad_norm': 1.816049337387085, 'learning_rate': 1.7671973351392223e-06, 'epoch': 2.44} +2025-02-06 02:17:20 - ERROR - stderr - 81%|████████▏ | 18253/22434 [16:09:39<2:56:12, 2.53s/it] +2025-02-06 02:17:22 - ERROR - stderr - 81%|████████▏ | 18254/22434 [16:09:42<3:01:03, 2.60s/it] +2025-02-06 02:17:22 - ERROR - stderr - +2025-02-06 02:17:22 - ERROR - stderr - +2025-02-06 02:17:22 - INFO - stdout - {'loss': 0.3711, 'grad_norm': 1.4882328510284424, 'learning_rate': 1.7663778996902947e-06, 'epoch': 2.44} +2025-02-06 02:17:22 - ERROR - stderr - 81%|████████▏ | 18254/22434 [16:09:42<3:01:03, 2.60s/it] +2025-02-06 02:17:25 - ERROR - stderr - 81%|████████▏ | 18255/22434 [16:09:45<2:58:45, 2.57s/it] +2025-02-06 02:17:25 - ERROR - stderr - +2025-02-06 02:17:25 - ERROR - stderr - +2025-02-06 02:17:25 - INFO - stdout - {'loss': 0.3475, 'grad_norm': 1.6296606063842773, 'learning_rate': 1.7655586358633426e-06, 'epoch': 2.44} +2025-02-06 02:17:25 - ERROR - stderr - 81%|████████▏ | 18255/22434 [16:09:45<2:58:45, 
2.57s/it] +2025-02-06 02:17:27 - ERROR - stderr - 81%|████████▏ | 18256/22434 [16:09:47<2:58:00, 2.56s/it] +2025-02-06 02:17:27 - ERROR - stderr - +2025-02-06 02:17:27 - ERROR - stderr - +2025-02-06 02:17:27 - INFO - stdout - {'loss': 0.3636, 'grad_norm': 1.435652494430542, 'learning_rate': 1.76473954367544e-06, 'epoch': 2.44} +2025-02-06 02:17:27 - ERROR - stderr - 81%|████████▏ | 18256/22434 [16:09:47<2:58:00, 2.56s/it] +2025-02-06 02:17:30 - ERROR - stderr - 81%|████████▏ | 18257/22434 [16:09:50<2:55:58, 2.53s/it] +2025-02-06 02:17:30 - ERROR - stderr - +2025-02-06 02:17:30 - ERROR - stderr - +2025-02-06 02:17:30 - INFO - stdout - {'loss': 0.3989, 'grad_norm': 1.6103055477142334, 'learning_rate': 1.7639206231436622e-06, 'epoch': 2.44} +2025-02-06 02:17:30 - ERROR - stderr - 81%|████████▏ | 18257/22434 [16:09:50<2:55:58, 2.53s/it] +2025-02-06 02:17:32 - ERROR - stderr - 81%|████████▏ | 18258/22434 [16:09:52<2:56:09, 2.53s/it] +2025-02-06 02:17:32 - ERROR - stderr - +2025-02-06 02:17:32 - ERROR - stderr - +2025-02-06 02:17:32 - INFO - stdout - {'loss': 0.3943, 'grad_norm': 1.8223057985305786, 'learning_rate': 1.763101874285077e-06, 'epoch': 2.44} +2025-02-06 02:17:32 - ERROR - stderr - 81%|████████▏ | 18258/22434 [16:09:52<2:56:09, 2.53s/it] +2025-02-06 02:17:35 - ERROR - stderr - 81%|████████▏ | 18259/22434 [16:09:55<2:56:22, 2.53s/it] +2025-02-06 02:17:35 - ERROR - stderr - +2025-02-06 02:17:35 - ERROR - stderr - +2025-02-06 02:17:35 - INFO - stdout - {'loss': 0.3859, 'grad_norm': 1.508858561515808, 'learning_rate': 1.7622832971167524e-06, 'epoch': 2.44} +2025-02-06 02:17:35 - ERROR - stderr - 81%|████████▏ | 18259/22434 [16:09:55<2:56:22, 2.53s/it] +2025-02-06 02:17:37 - ERROR - stderr - 81%|████████▏ | 18260/22434 [16:09:57<2:53:56, 2.50s/it] +2025-02-06 02:17:37 - ERROR - stderr - +2025-02-06 02:17:37 - ERROR - stderr - +2025-02-06 02:17:37 - INFO - stdout - {'loss': 0.3528, 'grad_norm': 1.535022258758545, 'learning_rate': 1.7614648916557486e-06, 'epoch': 
2.44} +2025-02-06 02:17:37 - ERROR - stderr - 81%|████████▏ | 18260/22434 [16:09:57<2:53:56, 2.50s/it] +2025-02-06 02:17:40 - ERROR - stderr - 81%|████████▏ | 18261/22434 [16:10:00<2:54:33, 2.51s/it] +2025-02-06 02:17:40 - ERROR - stderr - +2025-02-06 02:17:40 - ERROR - stderr - +2025-02-06 02:17:40 - INFO - stdout - {'loss': 0.3464, 'grad_norm': 1.5430930852890015, 'learning_rate': 1.7606466579191272e-06, 'epoch': 2.44} +2025-02-06 02:17:40 - ERROR - stderr - 81%|████████▏ | 18261/22434 [16:10:00<2:54:33, 2.51s/it] +2025-02-06 02:17:42 - ERROR - stderr - 81%|████████▏ | 18262/22434 [16:10:02<2:56:12, 2.53s/it] +2025-02-06 02:17:43 - ERROR - stderr - +2025-02-06 02:17:43 - ERROR - stderr - +2025-02-06 02:17:43 - INFO - stdout - {'loss': 0.358, 'grad_norm': 1.557789921760559, 'learning_rate': 1.7598285959239437e-06, 'epoch': 2.44} +2025-02-06 02:17:43 - ERROR - stderr - 81%|████████▏ | 18262/22434 [16:10:02<2:56:12, 2.53s/it] +2025-02-06 02:17:45 - ERROR - stderr - 81%|████████▏ | 18263/22434 [16:10:05<2:56:15, 2.54s/it] +2025-02-06 02:17:45 - ERROR - stderr - +2025-02-06 02:17:45 - ERROR - stderr - +2025-02-06 02:17:45 - INFO - stdout - {'loss': 0.3243, 'grad_norm': 1.3906068801879883, 'learning_rate': 1.759010705687243e-06, 'epoch': 2.44} +2025-02-06 02:17:45 - ERROR - stderr - 81%|████████▏ | 18263/22434 [16:10:05<2:56:15, 2.54s/it] +2025-02-06 02:17:48 - ERROR - stderr - 81%|████████▏ | 18264/22434 [16:10:07<2:56:40, 2.54s/it] +2025-02-06 02:17:48 - ERROR - stderr - +2025-02-06 02:17:48 - ERROR - stderr - +2025-02-06 02:17:48 - INFO - stdout - {'loss': 0.3209, 'grad_norm': 1.385672688484192, 'learning_rate': 1.7581929872260805e-06, 'epoch': 2.44} +2025-02-06 02:17:48 - ERROR - stderr - 81%|████████▏ | 18264/22434 [16:10:07<2:56:40, 2.54s/it] +2025-02-06 02:17:50 - ERROR - stderr - 81%|████████▏ | 18265/22434 [16:10:10<2:56:24, 2.54s/it] +2025-02-06 02:17:50 - ERROR - stderr - +2025-02-06 02:17:50 - ERROR - stderr - +2025-02-06 02:17:50 - INFO - stdout - {'loss': 
0.3763, 'grad_norm': 1.5294710397720337, 'learning_rate': 1.7573754405575029e-06, 'epoch': 2.44} +2025-02-06 02:17:50 - ERROR - stderr - 81%|████████▏ | 18265/22434 [16:10:10<2:56:24, 2.54s/it] +2025-02-06 02:17:53 - ERROR - stderr - 81%|████████▏ | 18266/22434 [16:10:12<2:56:11, 2.54s/it] +2025-02-06 02:17:53 - ERROR - stderr - +2025-02-06 02:17:53 - ERROR - stderr - +2025-02-06 02:17:53 - INFO - stdout - {'loss': 0.3588, 'grad_norm': 1.8096469640731812, 'learning_rate': 1.7565580656985403e-06, 'epoch': 2.44} +2025-02-06 02:17:53 - ERROR - stderr - 81%|████████▏ | 18266/22434 [16:10:12<2:56:11, 2.54s/it] +2025-02-06 02:17:55 - ERROR - stderr - 81%|████████▏ | 18267/22434 [16:10:15<2:53:46, 2.50s/it] +2025-02-06 02:17:55 - ERROR - stderr - +2025-02-06 02:17:55 - ERROR - stderr - +2025-02-06 02:17:55 - INFO - stdout - {'loss': 0.3348, 'grad_norm': 1.259243369102478, 'learning_rate': 1.755740862666242e-06, 'epoch': 2.44} +2025-02-06 02:17:55 - ERROR - stderr - 81%|████████▏ | 18267/22434 [16:10:15<2:53:46, 2.50s/it] +2025-02-06 02:17:58 - ERROR - stderr - 81%|████████▏ | 18268/22434 [16:10:17<2:52:33, 2.49s/it] +2025-02-06 02:17:58 - ERROR - stderr - +2025-02-06 02:17:58 - ERROR - stderr - +2025-02-06 02:17:58 - INFO - stdout - {'loss': 0.3008, 'grad_norm': 1.4385862350463867, 'learning_rate': 1.7549238314776318e-06, 'epoch': 2.44} +2025-02-06 02:17:58 - ERROR - stderr - 81%|████████▏ | 18268/22434 [16:10:17<2:52:33, 2.49s/it] +2025-02-06 02:18:00 - ERROR - stderr - 81%|████████▏ | 18269/22434 [16:10:20<2:51:51, 2.48s/it] +2025-02-06 02:18:00 - ERROR - stderr - +2025-02-06 02:18:00 - ERROR - stderr - +2025-02-06 02:18:00 - INFO - stdout - {'loss': 0.3155, 'grad_norm': 1.4794098138809204, 'learning_rate': 1.7541069721497494e-06, 'epoch': 2.44} +2025-02-06 02:18:00 - ERROR - stderr - 81%|████████▏ | 18269/22434 [16:10:20<2:51:51, 2.48s/it] +2025-02-06 02:18:02 - ERROR - stderr - 81%|████████▏ | 18270/22434 [16:10:22<2:50:48, 2.46s/it] +2025-02-06 02:18:02 - ERROR - 
stderr - +2025-02-06 02:18:02 - ERROR - stderr - +2025-02-06 02:18:02 - INFO - stdout - {'loss': 0.283, 'grad_norm': 1.2600165605545044, 'learning_rate': 1.7532902846996136e-06, 'epoch': 2.44} +2025-02-06 02:18:02 - ERROR - stderr - 81%|████████▏ | 18270/22434 [16:10:22<2:50:48, 2.46s/it] +2025-02-06 02:18:05 - ERROR - stderr - 81%|████████▏ | 18271/22434 [16:10:25<2:52:16, 2.48s/it] +2025-02-06 02:18:05 - ERROR - stderr - +2025-02-06 02:18:05 - ERROR - stderr - +2025-02-06 02:18:05 - INFO - stdout - {'loss': 0.3864, 'grad_norm': 1.4867573976516724, 'learning_rate': 1.7524737691442495e-06, 'epoch': 2.44} +2025-02-06 02:18:05 - ERROR - stderr - 81%|████████▏ | 18271/22434 [16:10:25<2:52:16, 2.48s/it] +2025-02-06 02:18:07 - ERROR - stderr - 81%|████████▏ | 18272/22434 [16:10:27<2:54:17, 2.51s/it] +2025-02-06 02:18:08 - ERROR - stderr - +2025-02-06 02:18:08 - ERROR - stderr - +2025-02-06 02:18:08 - INFO - stdout - {'loss': 0.3646, 'grad_norm': 1.5677626132965088, 'learning_rate': 1.7516574255006813e-06, 'epoch': 2.44} +2025-02-06 02:18:08 - ERROR - stderr - 81%|████████▏ | 18272/22434 [16:10:27<2:54:17, 2.51s/it] +2025-02-06 02:18:10 - ERROR - stderr - 81%|████████▏ | 18273/22434 [16:10:30<2:54:52, 2.52s/it] +2025-02-06 02:18:10 - ERROR - stderr - +2025-02-06 02:18:10 - ERROR - stderr - +2025-02-06 02:18:10 - INFO - stdout - {'loss': 0.3267, 'grad_norm': 1.3933168649673462, 'learning_rate': 1.7508412537859164e-06, 'epoch': 2.44} +2025-02-06 02:18:10 - ERROR - stderr - 81%|████████▏ | 18273/22434 [16:10:30<2:54:52, 2.52s/it] +2025-02-06 02:18:13 - ERROR - stderr - 81%|████████▏ | 18274/22434 [16:10:32<2:54:57, 2.52s/it] +2025-02-06 02:18:13 - ERROR - stderr - +2025-02-06 02:18:13 - ERROR - stderr - +2025-02-06 02:18:13 - INFO - stdout - {'loss': 0.3428, 'grad_norm': 1.4900708198547363, 'learning_rate': 1.7500252540169782e-06, 'epoch': 2.44} +2025-02-06 02:18:13 - ERROR - stderr - 81%|████████▏ | 18274/22434 [16:10:32<2:54:57, 2.52s/it] +2025-02-06 02:18:15 - ERROR - 
stderr - 81%|████████▏ | 18275/22434 [16:10:35<2:54:03, 2.51s/it] +2025-02-06 02:18:15 - ERROR - stderr - +2025-02-06 02:18:15 - ERROR - stderr - +2025-02-06 02:18:15 - INFO - stdout - {'loss': 0.3504, 'grad_norm': 1.6241878271102905, 'learning_rate': 1.7492094262108661e-06, 'epoch': 2.44} +2025-02-06 02:18:15 - ERROR - stderr - 81%|████████▏ | 18275/22434 [16:10:35<2:54:03, 2.51s/it] +2025-02-06 02:18:18 - ERROR - stderr - 81%|████████▏ | 18276/22434 [16:10:37<2:53:09, 2.50s/it] +2025-02-06 02:18:18 - ERROR - stderr - +2025-02-06 02:18:18 - ERROR - stderr - +2025-02-06 02:18:18 - INFO - stdout - {'loss': 0.3824, 'grad_norm': 1.681395173072815, 'learning_rate': 1.7483937703845876e-06, 'epoch': 2.44} +2025-02-06 02:18:18 - ERROR - stderr - 81%|████████▏ | 18276/22434 [16:10:37<2:53:09, 2.50s/it] +2025-02-06 02:18:20 - ERROR - stderr - 81%|████████▏ | 18277/22434 [16:10:40<2:55:46, 2.54s/it] +2025-02-06 02:18:20 - ERROR - stderr - +2025-02-06 02:18:20 - ERROR - stderr - +2025-02-06 02:18:20 - INFO - stdout - {'loss': 0.367, 'grad_norm': 1.5869262218475342, 'learning_rate': 1.747578286555146e-06, 'epoch': 2.44} +2025-02-06 02:18:20 - ERROR - stderr - 81%|████████▏ | 18277/22434 [16:10:40<2:55:46, 2.54s/it] +2025-02-06 02:18:23 - ERROR - stderr - 81%|████████▏ | 18278/22434 [16:10:42<2:55:02, 2.53s/it] +2025-02-06 02:18:23 - ERROR - stderr - +2025-02-06 02:18:23 - ERROR - stderr - +2025-02-06 02:18:23 - INFO - stdout - {'loss': 0.3607, 'grad_norm': 1.6530287265777588, 'learning_rate': 1.7467629747395376e-06, 'epoch': 2.44} +2025-02-06 02:18:23 - ERROR - stderr - 81%|████████▏ | 18278/22434 [16:10:42<2:55:02, 2.53s/it] +2025-02-06 02:18:25 - ERROR - stderr - 81%|████████▏ | 18279/22434 [16:10:45<2:53:37, 2.51s/it] +2025-02-06 02:18:25 - ERROR - stderr - +2025-02-06 02:18:25 - ERROR - stderr - +2025-02-06 02:18:25 - INFO - stdout - {'loss': 0.367, 'grad_norm': 1.4440975189208984, 'learning_rate': 1.7459478349547577e-06, 'epoch': 2.44} +2025-02-06 02:18:25 - ERROR - 
stderr - 81%|████████▏ | 18279/22434 [16:10:45<2:53:37, 2.51s/it] +2025-02-06 02:18:28 - ERROR - stderr - 81%|████████▏ | 18280/22434 [16:10:47<2:52:18, 2.49s/it] +2025-02-06 02:18:28 - ERROR - stderr - +2025-02-06 02:18:28 - ERROR - stderr - +2025-02-06 02:18:28 - INFO - stdout - {'loss': 0.3867, 'grad_norm': 1.3625982999801636, 'learning_rate': 1.7451328672177969e-06, 'epoch': 2.44} +2025-02-06 02:18:28 - ERROR - stderr - 81%|████████▏ | 18280/22434 [16:10:47<2:52:18, 2.49s/it] +2025-02-06 02:18:30 - ERROR - stderr - 81%|████████▏ | 18281/22434 [16:10:50<2:52:15, 2.49s/it] +2025-02-06 02:18:30 - ERROR - stderr - +2025-02-06 02:18:30 - ERROR - stderr - +2025-02-06 02:18:30 - INFO - stdout - {'loss': 0.3849, 'grad_norm': 1.5737146139144897, 'learning_rate': 1.7443180715456431e-06, 'epoch': 2.44} +2025-02-06 02:18:30 - ERROR - stderr - 81%|████████▏ | 18281/22434 [16:10:50<2:52:15, 2.49s/it] +2025-02-06 02:18:33 - ERROR - stderr - 81%|████████▏ | 18282/22434 [16:10:53<2:57:57, 2.57s/it] +2025-02-06 02:18:33 - ERROR - stderr - +2025-02-06 02:18:33 - ERROR - stderr - +2025-02-06 02:18:33 - INFO - stdout - {'loss': 0.3669, 'grad_norm': 1.5660408735275269, 'learning_rate': 1.743503447955278e-06, 'epoch': 2.44} +2025-02-06 02:18:33 - ERROR - stderr - 81%|████████▏ | 18282/22434 [16:10:53<2:57:57, 2.57s/it] +2025-02-06 02:18:35 - ERROR - stderr - 81%|████████▏ | 18283/22434 [16:10:55<2:55:13, 2.53s/it] +2025-02-06 02:18:35 - ERROR - stderr - +2025-02-06 02:18:35 - ERROR - stderr - +2025-02-06 02:18:35 - INFO - stdout - {'loss': 0.4028, 'grad_norm': 1.7328089475631714, 'learning_rate': 1.742688996463684e-06, 'epoch': 2.44} +2025-02-06 02:18:35 - ERROR - stderr - 81%|████████▏ | 18283/22434 [16:10:55<2:55:13, 2.53s/it] +2025-02-06 02:18:38 - ERROR - stderr - 82%|████████▏ | 18284/22434 [16:10:58<2:54:05, 2.52s/it] +2025-02-06 02:18:38 - ERROR - stderr - +2025-02-06 02:18:38 - ERROR - stderr - +2025-02-06 02:18:38 - INFO - stdout - {'loss': 0.3714, 'grad_norm': 
1.5653839111328125, 'learning_rate': 1.741874717087836e-06, 'epoch': 2.45} +2025-02-06 02:18:38 - ERROR - stderr - 82%|████████▏ | 18284/22434 [16:10:58<2:54:05, 2.52s/it] +2025-02-06 02:18:40 - ERROR - stderr - 82%|████████▏ | 18285/22434 [16:11:00<2:52:49, 2.50s/it] +2025-02-06 02:18:40 - ERROR - stderr - +2025-02-06 02:18:40 - ERROR - stderr - +2025-02-06 02:18:40 - INFO - stdout - {'loss': 0.3788, 'grad_norm': 1.5264708995819092, 'learning_rate': 1.741060609844708e-06, 'epoch': 2.45} +2025-02-06 02:18:40 - ERROR - stderr - 82%|████████▏ | 18285/22434 [16:11:00<2:52:49, 2.50s/it] +2025-02-06 02:18:43 - ERROR - stderr - 82%|████████▏ | 18286/22434 [16:11:02<2:51:53, 2.49s/it] +2025-02-06 02:18:43 - ERROR - stderr - +2025-02-06 02:18:43 - ERROR - stderr - +2025-02-06 02:18:43 - INFO - stdout - {'loss': 0.3203, 'grad_norm': 1.595931887626648, 'learning_rate': 1.7402466747512704e-06, 'epoch': 2.45} +2025-02-06 02:18:43 - ERROR - stderr - 82%|████████▏ | 18286/22434 [16:11:02<2:51:53, 2.49s/it] +2025-02-06 02:18:45 - ERROR - stderr - 82%|████████▏ | 18287/22434 [16:11:05<2:53:44, 2.51s/it] +2025-02-06 02:18:45 - ERROR - stderr - +2025-02-06 02:18:45 - ERROR - stderr - +2025-02-06 02:18:45 - INFO - stdout - {'loss': 0.3595, 'grad_norm': 1.446514368057251, 'learning_rate': 1.7394329118244825e-06, 'epoch': 2.45} +2025-02-06 02:18:45 - ERROR - stderr - 82%|████████▏ | 18287/22434 [16:11:05<2:53:44, 2.51s/it] +2025-02-06 02:18:48 - ERROR - stderr - 82%|████████▏ | 18288/22434 [16:11:08<3:01:18, 2.62s/it] +2025-02-06 02:18:48 - ERROR - stderr - +2025-02-06 02:18:48 - ERROR - stderr - +2025-02-06 02:18:48 - INFO - stdout - {'loss': 0.3545, 'grad_norm': 1.4631075859069824, 'learning_rate': 1.7386193210813163e-06, 'epoch': 2.45} +2025-02-06 02:18:48 - ERROR - stderr - 82%|████████▏ | 18288/22434 [16:11:08<3:01:18, 2.62s/it] +2025-02-06 02:18:51 - ERROR - stderr - 82%|████████▏ | 18289/22434 [16:11:10<2:59:53, 2.60s/it] +2025-02-06 02:18:51 - ERROR - stderr - +2025-02-06 
02:18:51 - ERROR - stderr - +2025-02-06 02:18:51 - INFO - stdout - {'loss': 0.3634, 'grad_norm': 1.5443226099014282, 'learning_rate': 1.7378059025387194e-06, 'epoch': 2.45} +2025-02-06 02:18:51 - ERROR - stderr - 82%|████████▏ | 18289/22434 [16:11:10<2:59:53, 2.60s/it] +2025-02-06 02:18:53 - ERROR - stderr - 82%|████████▏ | 18290/22434 [16:11:13<2:56:38, 2.56s/it] +2025-02-06 02:18:53 - ERROR - stderr - +2025-02-06 02:18:53 - ERROR - stderr - +2025-02-06 02:18:53 - INFO - stdout - {'loss': 0.4054, 'grad_norm': 1.5995323657989502, 'learning_rate': 1.7369926562136553e-06, 'epoch': 2.45} +2025-02-06 02:18:53 - ERROR - stderr - 82%|████████▏ | 18290/22434 [16:11:13<2:56:38, 2.56s/it] +2025-02-06 02:18:56 - ERROR - stderr - 82%|████████▏ | 18291/22434 [16:11:15<2:54:56, 2.53s/it] +2025-02-06 02:18:56 - ERROR - stderr - +2025-02-06 02:18:56 - ERROR - stderr - +2025-02-06 02:18:56 - INFO - stdout - {'loss': 0.3557, 'grad_norm': 1.419340968132019, 'learning_rate': 1.7361795821230741e-06, 'epoch': 2.45} +2025-02-06 02:18:56 - ERROR - stderr - 82%|████████▏ | 18291/22434 [16:11:15<2:54:56, 2.53s/it] +2025-02-06 02:18:58 - ERROR - stderr - 82%|████████▏ | 18292/22434 [16:11:18<2:52:44, 2.50s/it] +2025-02-06 02:18:58 - ERROR - stderr - +2025-02-06 02:18:58 - ERROR - stderr - +2025-02-06 02:18:58 - INFO - stdout - {'loss': 0.3832, 'grad_norm': 1.5237889289855957, 'learning_rate': 1.7353666802839176e-06, 'epoch': 2.45} +2025-02-06 02:18:58 - ERROR - stderr - 82%|████████▏ | 18292/22434 [16:11:18<2:52:44, 2.50s/it] +2025-02-06 02:19:01 - ERROR - stderr - 82%|████████▏ | 18293/22434 [16:11:20<2:52:29, 2.50s/it] +2025-02-06 02:19:01 - ERROR - stderr - +2025-02-06 02:19:01 - ERROR - stderr - +2025-02-06 02:19:01 - INFO - stdout - {'loss': 0.3961, 'grad_norm': 1.7873990535736084, 'learning_rate': 1.7345539507131392e-06, 'epoch': 2.45} +2025-02-06 02:19:01 - ERROR - stderr - 82%|████████▏ | 18293/22434 [16:11:20<2:52:29, 2.50s/it] +2025-02-06 02:19:03 - ERROR - stderr - 82%|████████▏ 
| 18294/22434 [16:11:23<2:51:40, 2.49s/it] +2025-02-06 02:19:03 - ERROR - stderr - +2025-02-06 02:19:03 - ERROR - stderr - +2025-02-06 02:19:03 - INFO - stdout - {'loss': 0.4115, 'grad_norm': 1.6849751472473145, 'learning_rate': 1.7337413934276726e-06, 'epoch': 2.45} +2025-02-06 02:19:03 - ERROR - stderr - 82%|████████▏ | 18294/22434 [16:11:23<2:51:40, 2.49s/it] +2025-02-06 02:19:05 - ERROR - stderr - 82%|████████▏ | 18295/22434 [16:11:25<2:51:59, 2.49s/it] +2025-02-06 02:19:06 - ERROR - stderr - +2025-02-06 02:19:06 - ERROR - stderr - +2025-02-06 02:19:06 - INFO - stdout - {'loss': 0.377, 'grad_norm': 1.5099862813949585, 'learning_rate': 1.7329290084444561e-06, 'epoch': 2.45} +2025-02-06 02:19:06 - ERROR - stderr - 82%|████████▏ | 18295/22434 [16:11:25<2:51:59, 2.49s/it] +2025-02-06 02:19:08 - ERROR - stderr - 82%|████████▏ | 18296/22434 [16:11:28<2:52:35, 2.50s/it] +2025-02-06 02:19:08 - ERROR - stderr - +2025-02-06 02:19:08 - ERROR - stderr - +2025-02-06 02:19:08 - INFO - stdout - {'loss': 0.3817, 'grad_norm': 1.6152827739715576, 'learning_rate': 1.7321167957804241e-06, 'epoch': 2.45} +2025-02-06 02:19:08 - ERROR - stderr - 82%|████████▏ | 18296/22434 [16:11:28<2:52:35, 2.50s/it] +2025-02-06 02:19:10 - ERROR - stderr - 82%|████████▏ | 18297/22434 [16:11:30<2:52:11, 2.50s/it] +2025-02-06 02:19:11 - ERROR - stderr - +2025-02-06 02:19:11 - ERROR - stderr - +2025-02-06 02:19:11 - INFO - stdout - {'loss': 0.3951, 'grad_norm': 1.8857215642929077, 'learning_rate': 1.7313047554525054e-06, 'epoch': 2.45} +2025-02-06 02:19:11 - ERROR - stderr - 82%|████████▏ | 18297/22434 [16:11:30<2:52:11, 2.50s/it] +2025-02-06 02:19:13 - ERROR - stderr - 82%|████████▏ | 18298/22434 [16:11:33<2:51:19, 2.49s/it] +2025-02-06 02:19:13 - ERROR - stderr - +2025-02-06 02:19:13 - ERROR - stderr - +2025-02-06 02:19:13 - INFO - stdout - {'loss': 0.3676, 'grad_norm': 1.658272385597229, 'learning_rate': 1.7304928874776272e-06, 'epoch': 2.45} +2025-02-06 02:19:13 - ERROR - stderr - 82%|████████▏ | 
18298/22434 [16:11:33<2:51:19, 2.49s/it] +2025-02-06 02:19:15 - ERROR - stderr - 82%|████████▏ | 18299/22434 [16:11:35<2:51:59, 2.50s/it] +2025-02-06 02:19:16 - ERROR - stderr - +2025-02-06 02:19:16 - ERROR - stderr - +2025-02-06 02:19:16 - INFO - stdout - {'loss': 0.3277, 'grad_norm': 1.38486909866333, 'learning_rate': 1.7296811918727107e-06, 'epoch': 2.45} +2025-02-06 02:19:16 - ERROR - stderr - 82%|████████▏ | 18299/22434 [16:11:35<2:51:59, 2.50s/it] +2025-02-06 02:19:18 - ERROR - stderr - 82%|████████▏ | 18300/22434 [16:11:38<2:52:57, 2.51s/it] +2025-02-06 02:19:18 - ERROR - stderr - +2025-02-06 02:19:18 - ERROR - stderr - +2025-02-06 02:19:18 - INFO - stdout - {'loss': 0.3402, 'grad_norm': 1.3321501016616821, 'learning_rate': 1.7288696686546768e-06, 'epoch': 2.45} +2025-02-06 02:19:18 - ERROR - stderr - 82%|████████▏ | 18300/22434 [16:11:38<2:52:57, 2.51s/it] +2025-02-06 02:19:21 - ERROR - stderr - 82%|████████▏ | 18301/22434 [16:11:40<2:53:00, 2.51s/it] +2025-02-06 02:19:21 - ERROR - stderr - +2025-02-06 02:19:21 - ERROR - stderr - +2025-02-06 02:19:21 - INFO - stdout - {'loss': 0.3673, 'grad_norm': 1.6780868768692017, 'learning_rate': 1.7280583178404408e-06, 'epoch': 2.45} +2025-02-06 02:19:21 - ERROR - stderr - 82%|████████▏ | 18301/22434 [16:11:40<2:53:00, 2.51s/it] +2025-02-06 02:19:23 - ERROR - stderr - 82%|████████▏ | 18302/22434 [16:11:43<2:51:41, 2.49s/it] +2025-02-06 02:19:23 - ERROR - stderr - +2025-02-06 02:19:23 - ERROR - stderr - +2025-02-06 02:19:23 - INFO - stdout - {'loss': 0.3879, 'grad_norm': 1.7888216972351074, 'learning_rate': 1.7272471394469125e-06, 'epoch': 2.45} +2025-02-06 02:19:23 - ERROR - stderr - 82%|████████▏ | 18302/22434 [16:11:43<2:51:41, 2.49s/it] +2025-02-06 02:19:25 - ERROR - stderr - 82%|████████▏ | 18303/22434 [16:11:45<2:52:03, 2.50s/it] +2025-02-06 02:19:26 - ERROR - stderr - +2025-02-06 02:19:26 - ERROR - stderr - +2025-02-06 02:19:26 - INFO - stdout - {'loss': 0.4083, 'grad_norm': 1.8735895156860352, 'learning_rate': 
1.726436133491002e-06, 'epoch': 2.45} +2025-02-06 02:19:26 - ERROR - stderr - 82%|████████▏ | 18303/22434 [16:11:45<2:52:03, 2.50s/it] +2025-02-06 02:19:28 - ERROR - stderr - 82%|████████▏ | 18304/22434 [16:11:48<2:52:12, 2.50s/it] +2025-02-06 02:19:28 - ERROR - stderr - +2025-02-06 02:19:28 - ERROR - stderr - +2025-02-06 02:19:28 - INFO - stdout - {'loss': 0.4284, 'grad_norm': 1.6659198999404907, 'learning_rate': 1.725625299989614e-06, 'epoch': 2.45} +2025-02-06 02:19:28 - ERROR - stderr - 82%|████████▏ | 18304/22434 [16:11:48<2:52:12, 2.50s/it] +2025-02-06 02:19:30 - ERROR - stderr - 82%|████████▏ | 18305/22434 [16:11:50<2:52:10, 2.50s/it] +2025-02-06 02:19:31 - ERROR - stderr - +2025-02-06 02:19:31 - ERROR - stderr - +2025-02-06 02:19:31 - INFO - stdout - {'loss': 0.3802, 'grad_norm': 1.5958473682403564, 'learning_rate': 1.7248146389596476e-06, 'epoch': 2.45} +2025-02-06 02:19:31 - ERROR - stderr - 82%|████████▏ | 18305/22434 [16:11:50<2:52:10, 2.50s/it] +2025-02-06 02:19:33 - ERROR - stderr - 82%|████████▏ | 18306/22434 [16:11:53<2:50:27, 2.48s/it] +2025-02-06 02:19:33 - ERROR - stderr - +2025-02-06 02:19:33 - ERROR - stderr - +2025-02-06 02:19:33 - INFO - stdout - {'loss': 0.3973, 'grad_norm': 1.7837719917297363, 'learning_rate': 1.7240041504180016e-06, 'epoch': 2.45} +2025-02-06 02:19:33 - ERROR - stderr - 82%|████████▏ | 18306/22434 [16:11:53<2:50:27, 2.48s/it] +2025-02-06 02:19:35 - ERROR - stderr - 82%|████████▏ | 18307/22434 [16:11:55<2:49:47, 2.47s/it] +2025-02-06 02:19:35 - ERROR - stderr - +2025-02-06 02:19:35 - ERROR - stderr - +2025-02-06 02:19:35 - INFO - stdout - {'loss': 0.388, 'grad_norm': 1.7201337814331055, 'learning_rate': 1.7231938343815735e-06, 'epoch': 2.45} +2025-02-06 02:19:35 - ERROR - stderr - 82%|████████▏ | 18307/22434 [16:11:55<2:49:47, 2.47s/it] +2025-02-06 02:19:38 - ERROR - stderr - 82%|████████▏ | 18308/22434 [16:11:58<2:50:56, 2.49s/it] +2025-02-06 02:19:38 - ERROR - stderr - +2025-02-06 02:19:38 - ERROR - stderr - +2025-02-06 
02:19:38 - INFO - stdout - {'loss': 0.322, 'grad_norm': 1.5669019222259521, 'learning_rate': 1.7223836908672441e-06, 'epoch': 2.45} +2025-02-06 02:19:38 - ERROR - stderr - 82%|████████▏ | 18308/22434 [16:11:58<2:50:56, 2.49s/it] +2025-02-06 02:19:40 - ERROR - stderr - 82%|████████▏ | 18309/22434 [16:12:00<2:51:19, 2.49s/it] +2025-02-06 02:19:40 - ERROR - stderr - +2025-02-06 02:19:40 - ERROR - stderr - +2025-02-06 02:19:40 - INFO - stdout - {'loss': 0.3879, 'grad_norm': 1.6596336364746094, 'learning_rate': 1.721573719891908e-06, 'epoch': 2.45} +2025-02-06 02:19:40 - ERROR - stderr - 82%|████████▏ | 18309/22434 [16:12:00<2:51:19, 2.49s/it] +2025-02-06 02:19:43 - ERROR - stderr - 82%|████████▏ | 18310/22434 [16:12:03<2:50:35, 2.48s/it] +2025-02-06 02:19:43 - ERROR - stderr - +2025-02-06 02:19:43 - ERROR - stderr - +2025-02-06 02:19:43 - INFO - stdout - {'loss': 0.3297, 'grad_norm': 1.5750923156738281, 'learning_rate': 1.7207639214724491e-06, 'epoch': 2.45} +2025-02-06 02:19:43 - ERROR - stderr - 82%|████████▏ | 18310/22434 [16:12:03<2:50:35, 2.48s/it] +2025-02-06 02:19:45 - ERROR - stderr - 82%|████████▏ | 18311/22434 [16:12:05<2:49:20, 2.46s/it] +2025-02-06 02:19:45 - ERROR - stderr - +2025-02-06 02:19:45 - ERROR - stderr - +2025-02-06 02:19:45 - INFO - stdout - {'loss': 0.3721, 'grad_norm': 1.6255757808685303, 'learning_rate': 1.7199542956257388e-06, 'epoch': 2.45} +2025-02-06 02:19:45 - ERROR - stderr - 82%|████████▏ | 18311/22434 [16:12:05<2:49:20, 2.46s/it] +2025-02-06 02:19:48 - ERROR - stderr - 82%|████████▏ | 18312/22434 [16:12:07<2:48:45, 2.46s/it] +2025-02-06 02:19:48 - ERROR - stderr - +2025-02-06 02:19:48 - ERROR - stderr - +2025-02-06 02:19:48 - INFO - stdout - {'loss': 0.3635, 'grad_norm': 1.5581547021865845, 'learning_rate': 1.719144842368663e-06, 'epoch': 2.45} +2025-02-06 02:19:48 - ERROR - stderr - 82%|████████▏ | 18312/22434 [16:12:08<2:48:45, 2.46s/it] +2025-02-06 02:19:50 - ERROR - stderr - 82%|████████▏ | 18313/22434 [16:12:10<2:49:43, 2.47s/it] 
+2025-02-06 02:19:50 - ERROR - stderr - +2025-02-06 02:19:50 - ERROR - stderr - +2025-02-06 02:19:50 - INFO - stdout - {'loss': 0.3348, 'grad_norm': 1.6339961290359497, 'learning_rate': 1.718335561718084e-06, 'epoch': 2.45} +2025-02-06 02:19:50 - ERROR - stderr - 82%|████████▏ | 18313/22434 [16:12:10<2:49:43, 2.47s/it] +2025-02-06 02:19:53 - ERROR - stderr - 82%|████████▏ | 18314/22434 [16:12:13<2:51:26, 2.50s/it] +2025-02-06 02:19:53 - ERROR - stderr - +2025-02-06 02:19:53 - ERROR - stderr - +2025-02-06 02:19:53 - INFO - stdout - {'loss': 0.4059, 'grad_norm': 1.7018744945526123, 'learning_rate': 1.717526453690881e-06, 'epoch': 2.45} +2025-02-06 02:19:53 - ERROR - stderr - 82%|████████▏ | 18314/22434 [16:12:13<2:51:26, 2.50s/it] +2025-02-06 02:19:55 - ERROR - stderr - 82%|████████▏ | 18315/22434 [16:12:15<2:51:34, 2.50s/it] +2025-02-06 02:19:55 - ERROR - stderr - +2025-02-06 02:19:55 - ERROR - stderr - +2025-02-06 02:19:55 - INFO - stdout - {'loss': 0.3754, 'grad_norm': 1.5955941677093506, 'learning_rate': 1.7167175183039108e-06, 'epoch': 2.45} +2025-02-06 02:19:55 - ERROR - stderr - 82%|████████▏ | 18315/22434 [16:12:15<2:51:34, 2.50s/it] +2025-02-06 02:19:58 - ERROR - stderr - 82%|████████▏ | 18316/22434 [16:12:18<2:52:17, 2.51s/it] +2025-02-06 02:19:58 - ERROR - stderr - +2025-02-06 02:19:58 - ERROR - stderr - +2025-02-06 02:19:58 - INFO - stdout - {'loss': 0.42, 'grad_norm': 1.902502417564392, 'learning_rate': 1.7159087555740383e-06, 'epoch': 2.45} +2025-02-06 02:19:58 - ERROR - stderr - 82%|████████▏ | 18316/22434 [16:12:18<2:52:17, 2.51s/it] +2025-02-06 02:20:00 - ERROR - stderr - 82%|████████▏ | 18317/22434 [16:12:20<2:53:00, 2.52s/it] +2025-02-06 02:20:00 - ERROR - stderr - +2025-02-06 02:20:00 - ERROR - stderr - +2025-02-06 02:20:00 - INFO - stdout - {'loss': 0.3823, 'grad_norm': 1.6018946170806885, 'learning_rate': 1.7151001655181199e-06, 'epoch': 2.45} +2025-02-06 02:20:00 - ERROR - stderr - 82%|████████▏ | 18317/22434 [16:12:20<2:53:00, 2.52s/it] 
+2025-02-06 02:20:03 - ERROR - stderr - 82%|████████▏ | 18318/22434 [16:12:23<2:51:48, 2.50s/it] +2025-02-06 02:20:03 - ERROR - stderr - +2025-02-06 02:20:03 - ERROR - stderr - +2025-02-06 02:20:03 - INFO - stdout - {'loss': 0.326, 'grad_norm': 1.2996386289596558, 'learning_rate': 1.7142917481530108e-06, 'epoch': 2.45} +2025-02-06 02:20:03 - ERROR - stderr - 82%|████████▏ | 18318/22434 [16:12:23<2:51:48, 2.50s/it] +2025-02-06 02:20:05 - ERROR - stderr - 82%|████████▏ | 18319/22434 [16:12:25<2:52:34, 2.52s/it] +2025-02-06 02:20:05 - ERROR - stderr - +2025-02-06 02:20:05 - ERROR - stderr - +2025-02-06 02:20:05 - INFO - stdout - {'loss': 0.3976, 'grad_norm': 1.5361231565475464, 'learning_rate': 1.713483503495562e-06, 'epoch': 2.45} +2025-02-06 02:20:05 - ERROR - stderr - 82%|████████▏ | 18319/22434 [16:12:25<2:52:34, 2.52s/it] +2025-02-06 02:20:08 - ERROR - stderr - 82%|████████▏ | 18320/22434 [16:12:28<2:52:16, 2.51s/it] +2025-02-06 02:20:08 - ERROR - stderr - +2025-02-06 02:20:08 - ERROR - stderr - +2025-02-06 02:20:08 - INFO - stdout - {'loss': 0.3894, 'grad_norm': 2.2248213291168213, 'learning_rate': 1.7126754315626203e-06, 'epoch': 2.45} +2025-02-06 02:20:08 - ERROR - stderr - 82%|████████▏ | 18320/22434 [16:12:28<2:52:16, 2.51s/it] +2025-02-06 02:20:10 - ERROR - stderr - 82%|████████▏ | 18321/22434 [16:12:30<2:53:22, 2.53s/it] +2025-02-06 02:20:11 - ERROR - stderr - +2025-02-06 02:20:11 - ERROR - stderr - +2025-02-06 02:20:11 - INFO - stdout - {'loss': 0.3616, 'grad_norm': 1.516777515411377, 'learning_rate': 1.7118675323710288e-06, 'epoch': 2.45} +2025-02-06 02:20:11 - ERROR - stderr - 82%|████████▏ | 18321/22434 [16:12:30<2:53:22, 2.53s/it] +2025-02-06 02:20:13 - ERROR - stderr - 82%|████████▏ | 18322/22434 [16:12:33<2:52:48, 2.52s/it] +2025-02-06 02:20:13 - ERROR - stderr - +2025-02-06 02:20:13 - ERROR - stderr - +2025-02-06 02:20:13 - INFO - stdout - {'loss': 0.3623, 'grad_norm': 1.6168711185455322, 'learning_rate': 1.7110598059376282e-06, 'epoch': 2.45} 
+2025-02-06 02:20:13 - ERROR - stderr - 82%|████████▏ | 18322/22434 [16:12:33<2:52:48, 2.52s/it] +2025-02-06 02:20:15 - ERROR - stderr - 82%|████████▏ | 18323/22434 [16:12:35<2:50:53, 2.49s/it] +2025-02-06 02:20:15 - ERROR - stderr - +2025-02-06 02:20:15 - ERROR - stderr - +2025-02-06 02:20:15 - INFO - stdout - {'loss': 0.3716, 'grad_norm': 1.5790424346923828, 'learning_rate': 1.710252252279253e-06, 'epoch': 2.45} +2025-02-06 02:20:15 - ERROR - stderr - 82%|████████▏ | 18323/22434 [16:12:35<2:50:53, 2.49s/it] +2025-02-06 02:20:18 - ERROR - stderr - 82%|████████▏ | 18324/22434 [16:12:38<2:49:46, 2.48s/it] +2025-02-06 02:20:18 - ERROR - stderr - +2025-02-06 02:20:18 - ERROR - stderr - +2025-02-06 02:20:18 - INFO - stdout - {'loss': 0.3126, 'grad_norm': 1.3731200695037842, 'learning_rate': 1.7094448714127387e-06, 'epoch': 2.45} +2025-02-06 02:20:18 - ERROR - stderr - 82%|████████▏ | 18324/22434 [16:12:38<2:49:46, 2.48s/it] +2025-02-06 02:20:20 - ERROR - stderr - 82%|████████▏ | 18325/22434 [16:12:40<2:52:06, 2.51s/it] +2025-02-06 02:20:20 - ERROR - stderr - +2025-02-06 02:20:20 - ERROR - stderr - +2025-02-06 02:20:20 - INFO - stdout - {'loss': 0.3991, 'grad_norm': 1.678053617477417, 'learning_rate': 1.7086376633549119e-06, 'epoch': 2.45} +2025-02-06 02:20:20 - ERROR - stderr - 82%|████████▏ | 18325/22434 [16:12:40<2:52:06, 2.51s/it] +2025-02-06 02:20:23 - ERROR - stderr - 82%|████████▏ | 18326/22434 [16:12:43<2:51:16, 2.50s/it] +2025-02-06 02:20:23 - ERROR - stderr - +2025-02-06 02:20:23 - ERROR - stderr - +2025-02-06 02:20:23 - INFO - stdout - {'loss': 0.3477, 'grad_norm': 1.368772029876709, 'learning_rate': 1.707830628122602e-06, 'epoch': 2.45} +2025-02-06 02:20:23 - ERROR - stderr - 82%|████████▏ | 18326/22434 [16:12:43<2:51:16, 2.50s/it] +2025-02-06 02:20:25 - ERROR - stderr - 82%|████████▏ | 18327/22434 [16:12:45<2:52:16, 2.52s/it] +2025-02-06 02:20:25 - ERROR - stderr - +2025-02-06 02:20:25 - ERROR - stderr - +2025-02-06 02:20:25 - INFO - stdout - {'loss': 
0.3049, 'grad_norm': 1.39094078540802, 'learning_rate': 1.7070237657326228e-06, 'epoch': 2.45} +2025-02-06 02:20:25 - ERROR - stderr - 82%|████████▏ | 18327/22434 [16:12:45<2:52:16, 2.52s/it] +2025-02-06 02:20:28 - ERROR - stderr - 82%|████████▏ | 18328/22434 [16:12:48<3:02:19, 2.66s/it] +2025-02-06 02:20:29 - ERROR - stderr - +2025-02-06 02:20:29 - ERROR - stderr - +2025-02-06 02:20:29 - INFO - stdout - {'loss': 0.3822, 'grad_norm': 1.4616683721542358, 'learning_rate': 1.7062170762018005e-06, 'epoch': 2.45} +2025-02-06 02:20:29 - ERROR - stderr - 82%|████████▏ | 18328/22434 [16:12:48<3:02:19, 2.66s/it] +2025-02-06 02:20:31 - ERROR - stderr - 82%|████████▏ | 18329/22434 [16:12:51<3:00:08, 2.63s/it] +2025-02-06 02:20:31 - ERROR - stderr - +2025-02-06 02:20:31 - ERROR - stderr - +2025-02-06 02:20:31 - INFO - stdout - {'loss': 0.3733, 'grad_norm': 1.4798396825790405, 'learning_rate': 1.7054105595469462e-06, 'epoch': 2.45} +2025-02-06 02:20:31 - ERROR - stderr - 82%|████████▏ | 18329/22434 [16:12:51<3:00:08, 2.63s/it] +2025-02-06 02:20:33 - ERROR - stderr - 82%|████████▏ | 18330/22434 [16:12:53<2:56:53, 2.59s/it] +2025-02-06 02:20:34 - ERROR - stderr - +2025-02-06 02:20:34 - ERROR - stderr - +2025-02-06 02:20:34 - INFO - stdout - {'loss': 0.3709, 'grad_norm': 1.5773667097091675, 'learning_rate': 1.7046042157848718e-06, 'epoch': 2.45} +2025-02-06 02:20:34 - ERROR - stderr - 82%|████████▏ | 18330/22434 [16:12:53<2:56:53, 2.59s/it] +2025-02-06 02:20:36 - ERROR - stderr - 82%|████████▏ | 18331/22434 [16:12:56<2:54:16, 2.55s/it] +2025-02-06 02:20:36 - ERROR - stderr - +2025-02-06 02:20:36 - ERROR - stderr - +2025-02-06 02:20:36 - INFO - stdout - {'loss': 0.3715, 'grad_norm': 1.5374013185501099, 'learning_rate': 1.7037980449323876e-06, 'epoch': 2.45} +2025-02-06 02:20:36 - ERROR - stderr - 82%|████████▏ | 18331/22434 [16:12:56<2:54:16, 2.55s/it] +2025-02-06 02:20:38 - ERROR - stderr - 82%|████████▏ | 18332/22434 [16:12:58<2:53:19, 2.54s/it] +2025-02-06 02:20:39 - ERROR - 
stderr - +2025-02-06 02:20:39 - ERROR - stderr - +2025-02-06 02:20:39 - INFO - stdout - {'loss': 0.3489, 'grad_norm': 1.3983029127120972, 'learning_rate': 1.70299204700629e-06, 'epoch': 2.45} +2025-02-06 02:20:39 - ERROR - stderr - 82%|████████▏ | 18332/22434 [16:12:58<2:53:19, 2.54s/it] +2025-02-06 02:20:41 - ERROR - stderr - 82%|████████▏ | 18333/22434 [16:13:01<2:52:41, 2.53s/it] +2025-02-06 02:20:41 - ERROR - stderr - +2025-02-06 02:20:41 - ERROR - stderr - +2025-02-06 02:20:41 - INFO - stdout - {'loss': 0.3279, 'grad_norm': 1.4616225957870483, 'learning_rate': 1.7021862220233887e-06, 'epoch': 2.45} +2025-02-06 02:20:41 - ERROR - stderr - 82%|████████▏ | 18333/22434 [16:13:01<2:52:41, 2.53s/it] +2025-02-06 02:20:43 - ERROR - stderr - 82%|████████▏ | 18334/22434 [16:13:03<2:52:49, 2.53s/it] +2025-02-06 02:20:44 - ERROR - stderr - +2025-02-06 02:20:44 - ERROR - stderr - +2025-02-06 02:20:44 - INFO - stdout - {'loss': 0.3546, 'grad_norm': 1.372166395187378, 'learning_rate': 1.7013805700004715e-06, 'epoch': 2.45} +2025-02-06 02:20:44 - ERROR - stderr - 82%|████████▏ | 18334/22434 [16:13:03<2:52:49, 2.53s/it] +2025-02-06 02:20:46 - ERROR - stderr - 82%|████████▏ | 18335/22434 [16:13:06<2:51:59, 2.52s/it] +2025-02-06 02:20:46 - ERROR - stderr - +2025-02-06 02:20:46 - ERROR - stderr - +2025-02-06 02:20:46 - INFO - stdout - {'loss': 0.37, 'grad_norm': 1.5691156387329102, 'learning_rate': 1.7005750909543373e-06, 'epoch': 2.45} +2025-02-06 02:20:46 - ERROR - stderr - 82%|████████▏ | 18335/22434 [16:13:06<2:51:59, 2.52s/it] +2025-02-06 02:20:48 - ERROR - stderr - 82%|████████▏ | 18336/22434 [16:13:08<2:51:25, 2.51s/it] +2025-02-06 02:20:49 - ERROR - stderr - +2025-02-06 02:20:49 - ERROR - stderr - +2025-02-06 02:20:49 - INFO - stdout - {'loss': 0.3177, 'grad_norm': 1.3386577367782593, 'learning_rate': 1.6997697849017725e-06, 'epoch': 2.45} +2025-02-06 02:20:49 - ERROR - stderr - 82%|████████▏ | 18336/22434 [16:13:08<2:51:25, 2.51s/it] +2025-02-06 02:20:51 - ERROR - stderr 
- 82%|████████▏ | 18337/22434 [16:13:11<2:51:56, 2.52s/it] +2025-02-06 02:20:51 - ERROR - stderr - +2025-02-06 02:20:51 - ERROR - stderr - +2025-02-06 02:20:51 - INFO - stdout - {'loss': 0.3557, 'grad_norm': 1.3635553121566772, 'learning_rate': 1.6989646518595616e-06, 'epoch': 2.45} +2025-02-06 02:20:51 - ERROR - stderr - 82%|████████▏ | 18337/22434 [16:13:11<2:51:56, 2.52s/it] +2025-02-06 02:20:53 - ERROR - stderr - 82%|████████▏ | 18338/22434 [16:13:13<2:51:08, 2.51s/it] +2025-02-06 02:20:54 - ERROR - stderr - +2025-02-06 02:20:54 - ERROR - stderr - +2025-02-06 02:20:54 - INFO - stdout - {'loss': 0.3572, 'grad_norm': 1.5327842235565186, 'learning_rate': 1.6981596918444953e-06, 'epoch': 2.45} +2025-02-06 02:20:54 - ERROR - stderr - 82%|████████▏ | 18338/22434 [16:13:13<2:51:08, 2.51s/it] +2025-02-06 02:20:56 - ERROR - stderr - 82%|████████▏ | 18339/22434 [16:13:16<2:51:24, 2.51s/it] +2025-02-06 02:20:56 - ERROR - stderr - +2025-02-06 02:20:56 - ERROR - stderr - +2025-02-06 02:20:56 - INFO - stdout - {'loss': 0.3334, 'grad_norm': 1.3996065855026245, 'learning_rate': 1.6973549048733428e-06, 'epoch': 2.45} +2025-02-06 02:20:56 - ERROR - stderr - 82%|████████▏ | 18339/22434 [16:13:16<2:51:24, 2.51s/it] +2025-02-06 02:20:59 - ERROR - stderr - 82%|████████▏ | 18340/22434 [16:13:19<2:56:42, 2.59s/it] +2025-02-06 02:20:59 - ERROR - stderr - +2025-02-06 02:20:59 - ERROR - stderr - +2025-02-06 02:20:59 - INFO - stdout - {'loss': 0.3555, 'grad_norm': 1.4622074365615845, 'learning_rate': 1.6965502909628828e-06, 'epoch': 2.45} +2025-02-06 02:20:59 - ERROR - stderr - 82%|████████▏ | 18340/22434 [16:13:19<2:56:42, 2.59s/it] +2025-02-06 02:21:01 - ERROR - stderr - 82%|████████▏ | 18341/22434 [16:13:21<2:56:56, 2.59s/it] +2025-02-06 02:21:01 - ERROR - stderr - +2025-02-06 02:21:01 - ERROR - stderr - +2025-02-06 02:21:01 - INFO - stdout - {'loss': 0.3629, 'grad_norm': 1.5473166704177856, 'learning_rate': 1.6957458501298862e-06, 'epoch': 2.45} +2025-02-06 02:21:01 - ERROR - stderr - 
82%|████████▏ | 18341/22434 [16:13:21<2:56:56, 2.59s/it] +2025-02-06 02:21:04 - ERROR - stderr - 82%|████████▏ | 18342/22434 [16:13:24<2:54:01, 2.55s/it] +2025-02-06 02:21:04 - ERROR - stderr - +2025-02-06 02:21:04 - ERROR - stderr - +2025-02-06 02:21:04 - INFO - stdout - {'loss': 0.3389, 'grad_norm': 1.5477222204208374, 'learning_rate': 1.6949415823911208e-06, 'epoch': 2.45} +2025-02-06 02:21:04 - ERROR - stderr - 82%|████████▏ | 18342/22434 [16:13:24<2:54:01, 2.55s/it] +2025-02-06 02:21:06 - ERROR - stderr - 82%|████████▏ | 18343/22434 [16:13:26<2:55:41, 2.58s/it] +2025-02-06 02:21:07 - ERROR - stderr - +2025-02-06 02:21:07 - ERROR - stderr - +2025-02-06 02:21:07 - INFO - stdout - {'loss': 0.3622, 'grad_norm': 1.4794080257415771, 'learning_rate': 1.6941374877633522e-06, 'epoch': 2.45} +2025-02-06 02:21:07 - ERROR - stderr - 82%|████████▏ | 18343/22434 [16:13:26<2:55:41, 2.58s/it] +2025-02-06 02:21:09 - ERROR - stderr - 82%|████████▏ | 18344/22434 [16:13:29<2:54:32, 2.56s/it] +2025-02-06 02:21:09 - ERROR - stderr - +2025-02-06 02:21:09 - ERROR - stderr - +2025-02-06 02:21:09 - INFO - stdout - {'loss': 0.3893, 'grad_norm': 1.677876353263855, 'learning_rate': 1.6933335662633387e-06, 'epoch': 2.45} +2025-02-06 02:21:09 - ERROR - stderr - 82%|████████▏ | 18344/22434 [16:13:29<2:54:32, 2.56s/it] +2025-02-06 02:21:11 - ERROR - stderr - 82%|████████▏ | 18345/22434 [16:13:31<2:52:49, 2.54s/it] +2025-02-06 02:21:12 - ERROR - stderr - +2025-02-06 02:21:12 - ERROR - stderr - +2025-02-06 02:21:12 - INFO - stdout - {'loss': 0.4135, 'grad_norm': 1.678351640701294, 'learning_rate': 1.6925298179078386e-06, 'epoch': 2.45} +2025-02-06 02:21:12 - ERROR - stderr - 82%|████████▏ | 18345/22434 [16:13:31<2:52:49, 2.54s/it] +2025-02-06 02:21:14 - ERROR - stderr - 82%|████████▏ | 18346/22434 [16:13:34<2:52:03, 2.53s/it] +2025-02-06 02:21:14 - ERROR - stderr - +2025-02-06 02:21:14 - ERROR - stderr - +2025-02-06 02:21:14 - INFO - stdout - {'loss': 0.3314, 'grad_norm': 1.4939243793487549, 
'learning_rate': 1.6917262427136049e-06, 'epoch': 2.45} +2025-02-06 02:21:14 - ERROR - stderr - 82%|████████▏ | 18346/22434 [16:13:34<2:52:03, 2.53s/it] +2025-02-06 02:21:17 - ERROR - stderr - 82%|████████▏ | 18347/22434 [16:13:36<2:52:32, 2.53s/it] +2025-02-06 02:21:17 - ERROR - stderr - +2025-02-06 02:21:17 - ERROR - stderr - +2025-02-06 02:21:17 - INFO - stdout - {'loss': 0.3472, 'grad_norm': 1.468875765800476, 'learning_rate': 1.6909228406973887e-06, 'epoch': 2.45} +2025-02-06 02:21:17 - ERROR - stderr - 82%|████████▏ | 18347/22434 [16:13:36<2:52:32, 2.53s/it] +2025-02-06 02:21:19 - ERROR - stderr - 82%|████████▏ | 18348/22434 [16:13:39<2:51:47, 2.52s/it] +2025-02-06 02:21:19 - ERROR - stderr - +2025-02-06 02:21:19 - ERROR - stderr - +2025-02-06 02:21:19 - INFO - stdout - {'loss': 0.4035, 'grad_norm': 1.737004280090332, 'learning_rate': 1.6901196118759333e-06, 'epoch': 2.45} +2025-02-06 02:21:19 - ERROR - stderr - 82%|████████▏ | 18348/22434 [16:13:39<2:51:47, 2.52s/it] +2025-02-06 02:21:22 - ERROR - stderr - 82%|████████▏ | 18349/22434 [16:13:41<2:51:10, 2.51s/it] +2025-02-06 02:21:22 - ERROR - stderr - +2025-02-06 02:21:22 - ERROR - stderr - +2025-02-06 02:21:22 - INFO - stdout - {'loss': 0.3685, 'grad_norm': 1.6445791721343994, 'learning_rate': 1.6893165562659842e-06, 'epoch': 2.45} +2025-02-06 02:21:22 - ERROR - stderr - 82%|████████▏ | 18349/22434 [16:13:41<2:51:10, 2.51s/it] +2025-02-06 02:21:24 - ERROR - stderr - 82%|████████▏ | 18350/22434 [16:13:44<2:54:43, 2.57s/it] +2025-02-06 02:21:24 - ERROR - stderr - +2025-02-06 02:21:24 - ERROR - stderr - +2025-02-06 02:21:24 - INFO - stdout - {'loss': 0.3652, 'grad_norm': 1.306131362915039, 'learning_rate': 1.6885136738842812e-06, 'epoch': 2.45} +2025-02-06 02:21:24 - ERROR - stderr - 82%|████████▏ | 18350/22434 [16:13:44<2:54:43, 2.57s/it] +2025-02-06 02:21:27 - ERROR - stderr - 82%|████████▏ | 18351/22434 [16:13:47<3:03:50, 2.70s/it] +2025-02-06 02:21:27 - ERROR - stderr - +2025-02-06 02:21:27 - ERROR - 
stderr - +2025-02-06 02:21:27 - INFO - stdout - {'loss': 0.3455, 'grad_norm': 1.4941554069519043, 'learning_rate': 1.687710964747552e-06, 'epoch': 2.45} +2025-02-06 02:21:27 - ERROR - stderr - 82%|████████▏ | 18351/22434 [16:13:47<3:03:50, 2.70s/it] +2025-02-06 02:21:30 - ERROR - stderr - 82%|████████▏ | 18352/22434 [16:13:50<2:59:37, 2.64s/it] +2025-02-06 02:21:30 - ERROR - stderr - +2025-02-06 02:21:30 - ERROR - stderr - +2025-02-06 02:21:30 - INFO - stdout - {'loss': 0.3274, 'grad_norm': 1.5097514390945435, 'learning_rate': 1.686908428872539e-06, 'epoch': 2.45} +2025-02-06 02:21:30 - ERROR - stderr - 82%|████████▏ | 18352/22434 [16:13:50<2:59:37, 2.64s/it] +2025-02-06 02:21:32 - ERROR - stderr - 82%|████████▏ | 18353/22434 [16:13:52<2:58:13, 2.62s/it] +2025-02-06 02:21:32 - ERROR - stderr - +2025-02-06 02:21:32 - ERROR - stderr - +2025-02-06 02:21:32 - INFO - stdout - {'loss': 0.3295, 'grad_norm': 1.3403040170669556, 'learning_rate': 1.6861060662759598e-06, 'epoch': 2.45} +2025-02-06 02:21:32 - ERROR - stderr - 82%|████████▏ | 18353/22434 [16:13:52<2:58:13, 2.62s/it] +2025-02-06 02:21:35 - ERROR - stderr - 82%|████████▏ | 18354/22434 [16:13:55<2:55:56, 2.59s/it] +2025-02-06 02:21:35 - ERROR - stderr - +2025-02-06 02:21:35 - ERROR - stderr - +2025-02-06 02:21:35 - INFO - stdout - {'loss': 0.3481, 'grad_norm': 1.5221773386001587, 'learning_rate': 1.6853038769745466e-06, 'epoch': 2.45} +2025-02-06 02:21:35 - ERROR - stderr - 82%|████████▏ | 18354/22434 [16:13:55<2:55:56, 2.59s/it] +2025-02-06 02:21:38 - ERROR - stderr - 82%|████████▏ | 18355/22434 [16:13:57<2:59:12, 2.64s/it] +2025-02-06 02:21:38 - ERROR - stderr - +2025-02-06 02:21:38 - ERROR - stderr - +2025-02-06 02:21:38 - INFO - stdout - {'loss': 0.3462, 'grad_norm': 1.7235254049301147, 'learning_rate': 1.6845018609850206e-06, 'epoch': 2.45} +2025-02-06 02:21:38 - ERROR - stderr - 82%|████████▏ | 18355/22434 [16:13:57<2:59:12, 2.64s/it] +2025-02-06 02:21:40 - ERROR - stderr - 82%|████████▏ | 18356/22434 
[16:14:00<2:56:32, 2.60s/it] +2025-02-06 02:21:40 - ERROR - stderr - +2025-02-06 02:21:40 - ERROR - stderr - +2025-02-06 02:21:40 - INFO - stdout - {'loss': 0.4338, 'grad_norm': 1.8121986389160156, 'learning_rate': 1.6837000183240915e-06, 'epoch': 2.45} +2025-02-06 02:21:40 - ERROR - stderr - 82%|████████▏ | 18356/22434 [16:14:00<2:56:32, 2.60s/it] +2025-02-06 02:21:43 - ERROR - stderr - 82%|████████▏ | 18357/22434 [16:14:02<2:54:12, 2.56s/it] +2025-02-06 02:21:43 - ERROR - stderr - +2025-02-06 02:21:43 - ERROR - stderr - +2025-02-06 02:21:43 - INFO - stdout - {'loss': 0.3471, 'grad_norm': 1.6822962760925293, 'learning_rate': 1.6828983490084827e-06, 'epoch': 2.45} +2025-02-06 02:21:43 - ERROR - stderr - 82%|████████▏ | 18357/22434 [16:14:02<2:54:12, 2.56s/it] +2025-02-06 02:21:45 - ERROR - stderr - 82%|████████▏ | 18358/22434 [16:14:05<2:56:38, 2.60s/it] +2025-02-06 02:21:45 - ERROR - stderr - +2025-02-06 02:21:45 - ERROR - stderr - +2025-02-06 02:21:45 - INFO - stdout - {'loss': 0.3571, 'grad_norm': 1.695307731628418, 'learning_rate': 1.6820968530548931e-06, 'epoch': 2.45} +2025-02-06 02:21:45 - ERROR - stderr - 82%|████████▏ | 18358/22434 [16:14:05<2:56:38, 2.60s/it] +2025-02-06 02:21:48 - ERROR - stderr - 82%|████████▏ | 18359/22434 [16:14:07<2:52:53, 2.55s/it] +2025-02-06 02:21:48 - ERROR - stderr - +2025-02-06 02:21:48 - ERROR - stderr - +2025-02-06 02:21:48 - INFO - stdout - {'loss': 0.3204, 'grad_norm': 1.4447683095932007, 'learning_rate': 1.6812955304800415e-06, 'epoch': 2.46} +2025-02-06 02:21:48 - ERROR - stderr - 82%|████████▏ | 18359/22434 [16:14:07<2:52:53, 2.55s/it] +2025-02-06 02:21:50 - ERROR - stderr - 82%|████████▏ | 18360/22434 [16:14:10<2:53:50, 2.56s/it] +2025-02-06 02:21:50 - ERROR - stderr - +2025-02-06 02:21:50 - ERROR - stderr - +2025-02-06 02:21:50 - INFO - stdout - {'loss': 0.4041, 'grad_norm': 1.5660394430160522, 'learning_rate': 1.6804943813006214e-06, 'epoch': 2.46} +2025-02-06 02:21:50 - ERROR - stderr - 82%|████████▏ | 18360/22434 
[16:14:10<2:53:50, 2.56s/it] +2025-02-06 02:21:53 - ERROR - stderr - 82%|████████▏ | 18361/22434 [16:14:13<2:52:42, 2.54s/it] +2025-02-06 02:21:53 - ERROR - stderr - +2025-02-06 02:21:53 - ERROR - stderr - +2025-02-06 02:21:53 - INFO - stdout - {'loss': 0.4091, 'grad_norm': 1.501447081565857, 'learning_rate': 1.6796934055333346e-06, 'epoch': 2.46} +2025-02-06 02:21:53 - ERROR - stderr - 82%|████████▏ | 18361/22434 [16:14:13<2:52:42, 2.54s/it] +2025-02-06 02:21:55 - ERROR - stderr - 82%|████████▏ | 18362/22434 [16:14:15<2:50:51, 2.52s/it] +2025-02-06 02:21:55 - ERROR - stderr - +2025-02-06 02:21:55 - ERROR - stderr - +2025-02-06 02:21:55 - INFO - stdout - {'loss': 0.3714, 'grad_norm': 1.6945569515228271, 'learning_rate': 1.6788926031948782e-06, 'epoch': 2.46} +2025-02-06 02:21:55 - ERROR - stderr - 82%|████████▏ | 18362/22434 [16:14:15<2:50:51, 2.52s/it] +2025-02-06 02:21:58 - ERROR - stderr - 82%|████████▏ | 18363/22434 [16:14:18<2:52:53, 2.55s/it] +2025-02-06 02:21:58 - ERROR - stderr - +2025-02-06 02:21:58 - ERROR - stderr - +2025-02-06 02:21:58 - INFO - stdout - {'loss': 0.3928, 'grad_norm': 1.761191725730896, 'learning_rate': 1.678091974301942e-06, 'epoch': 2.46} +2025-02-06 02:21:58 - ERROR - stderr - 82%|████████▏ | 18363/22434 [16:14:18<2:52:53, 2.55s/it] +2025-02-06 02:22:00 - ERROR - stderr - 82%|████████▏ | 18364/22434 [16:14:20<2:50:28, 2.51s/it] +2025-02-06 02:22:00 - ERROR - stderr - +2025-02-06 02:22:00 - ERROR - stderr - +2025-02-06 02:22:00 - INFO - stdout - {'loss': 0.3631, 'grad_norm': 1.5343469381332397, 'learning_rate': 1.6772915188712157e-06, 'epoch': 2.46} +2025-02-06 02:22:00 - ERROR - stderr - 82%|████████▏ | 18364/22434 [16:14:20<2:50:28, 2.51s/it] +2025-02-06 02:22:03 - ERROR - stderr - 82%|████████▏ | 18365/22434 [16:14:23<2:53:06, 2.55s/it] +2025-02-06 02:22:03 - ERROR - stderr - +2025-02-06 02:22:03 - ERROR - stderr - +2025-02-06 02:22:03 - INFO - stdout - {'loss': 0.3252, 'grad_norm': 1.5702824592590332, 'learning_rate': 
1.676491236919384e-06, 'epoch': 2.46} +2025-02-06 02:22:03 - ERROR - stderr - 82%|████████▏ | 18365/22434 [16:14:23<2:53:06, 2.55s/it] +2025-02-06 02:22:05 - ERROR - stderr - 82%|████████▏ | 18366/22434 [16:14:25<2:51:16, 2.53s/it] +2025-02-06 02:22:05 - ERROR - stderr - +2025-02-06 02:22:05 - ERROR - stderr - +2025-02-06 02:22:05 - INFO - stdout - {'loss': 0.3496, 'grad_norm': 1.3933650255203247, 'learning_rate': 1.6756911284631272e-06, 'epoch': 2.46} +2025-02-06 02:22:05 - ERROR - stderr - 82%|████████▏ | 18366/22434 [16:14:25<2:51:16, 2.53s/it] +2025-02-06 02:22:08 - ERROR - stderr - 82%|████████▏ | 18367/22434 [16:14:28<2:52:12, 2.54s/it] +2025-02-06 02:22:08 - ERROR - stderr - +2025-02-06 02:22:08 - ERROR - stderr - +2025-02-06 02:22:08 - INFO - stdout - {'loss': 0.3612, 'grad_norm': 1.6416264772415161, 'learning_rate': 1.6748911935191236e-06, 'epoch': 2.46} +2025-02-06 02:22:08 - ERROR - stderr - 82%|████████▏ | 18367/22434 [16:14:28<2:52:12, 2.54s/it] +2025-02-06 02:22:11 - ERROR - stderr - 82%|████████▏ | 18368/22434 [16:14:30<2:52:34, 2.55s/it] +2025-02-06 02:22:11 - ERROR - stderr - +2025-02-06 02:22:11 - ERROR - stderr - +2025-02-06 02:22:11 - INFO - stdout - {'loss': 0.4024, 'grad_norm': 1.5850558280944824, 'learning_rate': 1.6740914321040468e-06, 'epoch': 2.46} +2025-02-06 02:22:11 - ERROR - stderr - 82%|████████▏ | 18368/22434 [16:14:30<2:52:34, 2.55s/it] +2025-02-06 02:22:13 - ERROR - stderr - 82%|████████▏ | 18369/22434 [16:14:33<2:52:51, 2.55s/it] +2025-02-06 02:22:13 - ERROR - stderr - +2025-02-06 02:22:13 - ERROR - stderr - +2025-02-06 02:22:13 - INFO - stdout - {'loss': 0.3293, 'grad_norm': 1.4282252788543701, 'learning_rate': 1.673291844234568e-06, 'epoch': 2.46} +2025-02-06 02:22:13 - ERROR - stderr - 82%|████████▏ | 18369/22434 [16:14:33<2:52:51, 2.55s/it] +2025-02-06 02:22:16 - ERROR - stderr - 82%|████████▏ | 18370/22434 [16:14:35<2:50:59, 2.52s/it] +2025-02-06 02:22:16 - ERROR - stderr - +2025-02-06 02:22:16 - ERROR - stderr - +2025-02-06 
02:22:16 - INFO - stdout - {'loss': 0.3685, 'grad_norm': 1.516066074371338, 'learning_rate': 1.6724924299273514e-06, 'epoch': 2.46} +2025-02-06 02:22:16 - ERROR - stderr - 82%|████████▏ | 18370/22434 [16:14:35<2:50:59, 2.52s/it] +2025-02-06 02:22:18 - ERROR - stderr - 82%|████████▏ | 18371/22434 [16:14:38<2:51:50, 2.54s/it] +2025-02-06 02:22:18 - ERROR - stderr - +2025-02-06 02:22:18 - ERROR - stderr - +2025-02-06 02:22:18 - INFO - stdout - {'loss': 0.3349, 'grad_norm': 1.2862077951431274, 'learning_rate': 1.671693189199065e-06, 'epoch': 2.46} +2025-02-06 02:22:18 - ERROR - stderr - 82%|████████▏ | 18371/22434 [16:14:38<2:51:50, 2.54s/it] +2025-02-06 02:22:21 - ERROR - stderr - 82%|████████▏ | 18372/22434 [16:14:40<2:51:18, 2.53s/it] +2025-02-06 02:22:21 - ERROR - stderr - +2025-02-06 02:22:21 - ERROR - stderr - +2025-02-06 02:22:21 - INFO - stdout - {'loss': 0.3891, 'grad_norm': 1.534590721130371, 'learning_rate': 1.67089412206636e-06, 'epoch': 2.46} +2025-02-06 02:22:21 - ERROR - stderr - 82%|████████▏ | 18372/22434 [16:14:40<2:51:18, 2.53s/it] +2025-02-06 02:22:23 - ERROR - stderr - 82%|████████▏ | 18373/22434 [16:14:43<2:50:52, 2.52s/it] +2025-02-06 02:22:23 - ERROR - stderr - +2025-02-06 02:22:23 - ERROR - stderr - +2025-02-06 02:22:23 - INFO - stdout - {'loss': 0.3711, 'grad_norm': 1.6412922143936157, 'learning_rate': 1.6700952285458983e-06, 'epoch': 2.46} +2025-02-06 02:22:23 - ERROR - stderr - 82%|████████▏ | 18373/22434 [16:14:43<2:50:52, 2.52s/it] +2025-02-06 02:22:26 - ERROR - stderr - 82%|████████▏ | 18374/22434 [16:14:45<2:48:42, 2.49s/it] +2025-02-06 02:22:26 - ERROR - stderr - +2025-02-06 02:22:26 - ERROR - stderr - +2025-02-06 02:22:26 - INFO - stdout - {'loss': 0.3674, 'grad_norm': 1.4446101188659668, 'learning_rate': 1.6692965086543311e-06, 'epoch': 2.46} +2025-02-06 02:22:26 - ERROR - stderr - 82%|████████▏ | 18374/22434 [16:14:45<2:48:42, 2.49s/it] +2025-02-06 02:22:28 - ERROR - stderr - 82%|████████▏ | 18375/22434 [16:14:48<2:49:41, 2.51s/it] 
+2025-02-06 02:22:28 - ERROR - stderr - +2025-02-06 02:22:28 - ERROR - stderr - +2025-02-06 02:22:28 - INFO - stdout - {'loss': 0.3513, 'grad_norm': 1.531467080116272, 'learning_rate': 1.6684979624083076e-06, 'epoch': 2.46} +2025-02-06 02:22:28 - ERROR - stderr - 82%|████████▏ | 18375/22434 [16:14:48<2:49:41, 2.51s/it] +2025-02-06 02:22:31 - ERROR - stderr - 82%|████████▏ | 18376/22434 [16:14:50<2:47:44, 2.48s/it] +2025-02-06 02:22:31 - ERROR - stderr - +2025-02-06 02:22:31 - ERROR - stderr - +2025-02-06 02:22:31 - INFO - stdout - {'loss': 0.363, 'grad_norm': 1.5182467699050903, 'learning_rate': 1.667699589824473e-06, 'epoch': 2.46} +2025-02-06 02:22:31 - ERROR - stderr - 82%|████████▏ | 18376/22434 [16:14:50<2:47:44, 2.48s/it] +2025-02-06 02:22:33 - ERROR - stderr - 82%|████████▏ | 18377/22434 [16:14:53<2:47:05, 2.47s/it] +2025-02-06 02:22:33 - ERROR - stderr - +2025-02-06 02:22:33 - ERROR - stderr - +2025-02-06 02:22:33 - INFO - stdout - {'loss': 0.3555, 'grad_norm': 1.56467866897583, 'learning_rate': 1.666901390919462e-06, 'epoch': 2.46} +2025-02-06 02:22:33 - ERROR - stderr - 82%|████████▏ | 18377/22434 [16:14:53<2:47:05, 2.47s/it] +2025-02-06 02:22:35 - ERROR - stderr - 82%|████████▏ | 18378/22434 [16:14:55<2:46:41, 2.47s/it] +2025-02-06 02:22:35 - ERROR - stderr - +2025-02-06 02:22:35 - ERROR - stderr - +2025-02-06 02:22:35 - INFO - stdout - {'loss': 0.3836, 'grad_norm': 1.6375277042388916, 'learning_rate': 1.6661033657099236e-06, 'epoch': 2.46} +2025-02-06 02:22:35 - ERROR - stderr - 82%|████████▏ | 18378/22434 [16:14:55<2:46:41, 2.47s/it] +2025-02-06 02:22:38 - ERROR - stderr - 82%|████████▏ | 18379/22434 [16:14:58<2:46:21, 2.46s/it] +2025-02-06 02:22:38 - ERROR - stderr - +2025-02-06 02:22:38 - ERROR - stderr - +2025-02-06 02:22:38 - INFO - stdout - {'loss': 0.3696, 'grad_norm': 1.543134093284607, 'learning_rate': 1.665305514212483e-06, 'epoch': 2.46} +2025-02-06 02:22:38 - ERROR - stderr - 82%|████████▏ | 18379/22434 [16:14:58<2:46:21, 2.46s/it] 
+2025-02-06 02:22:40 - ERROR - stderr - 82%|████████▏ | 18380/22434 [16:15:00<2:46:56, 2.47s/it] +2025-02-06 02:22:40 - ERROR - stderr - +2025-02-06 02:22:40 - ERROR - stderr - +2025-02-06 02:22:40 - INFO - stdout - {'loss': 0.3825, 'grad_norm': 1.5102978944778442, 'learning_rate': 1.6645078364437739e-06, 'epoch': 2.46} +2025-02-06 02:22:40 - ERROR - stderr - 82%|████████▏ | 18380/22434 [16:15:00<2:46:56, 2.47s/it] +2025-02-06 02:22:43 - ERROR - stderr - 82%|████████▏ | 18381/22434 [16:15:03<2:46:11, 2.46s/it] +2025-02-06 02:22:43 - ERROR - stderr - +2025-02-06 02:22:43 - ERROR - stderr - +2025-02-06 02:22:43 - INFO - stdout - {'loss': 0.3558, 'grad_norm': 1.5835179090499878, 'learning_rate': 1.6637103324204219e-06, 'epoch': 2.46} +2025-02-06 02:22:43 - ERROR - stderr - 82%|████████▏ | 18381/22434 [16:15:03<2:46:11, 2.46s/it] +2025-02-06 02:22:45 - ERROR - stderr - 82%|████████▏ | 18382/22434 [16:15:05<2:46:08, 2.46s/it] +2025-02-06 02:22:45 - ERROR - stderr - +2025-02-06 02:22:45 - ERROR - stderr - +2025-02-06 02:22:45 - INFO - stdout - {'loss': 0.3195, 'grad_norm': 1.3649482727050781, 'learning_rate': 1.662913002159049e-06, 'epoch': 2.46} +2025-02-06 02:22:45 - ERROR - stderr - 82%|████████▏ | 18382/22434 [16:15:05<2:46:08, 2.46s/it] +2025-02-06 02:22:48 - ERROR - stderr - 82%|████████▏ | 18383/22434 [16:15:08<2:47:00, 2.47s/it] +2025-02-06 02:22:48 - ERROR - stderr - +2025-02-06 02:22:48 - ERROR - stderr - +2025-02-06 02:22:48 - INFO - stdout - {'loss': 0.3007, 'grad_norm': 1.4722611904144287, 'learning_rate': 1.662115845676282e-06, 'epoch': 2.46} +2025-02-06 02:22:48 - ERROR - stderr - 82%|████████▏ | 18383/22434 [16:15:08<2:47:00, 2.47s/it] +2025-02-06 02:22:50 - ERROR - stderr - 82%|████████▏ | 18384/22434 [16:15:10<2:47:42, 2.48s/it] +2025-02-06 02:22:50 - ERROR - stderr - +2025-02-06 02:22:50 - ERROR - stderr - +2025-02-06 02:22:50 - INFO - stdout - {'loss': 0.3371, 'grad_norm': 1.4906474351882935, 'learning_rate': 1.661318862988729e-06, 'epoch': 2.46} 
+2025-02-06 02:22:50 - ERROR - stderr - 82%|████████▏ | 18384/22434 [16:15:10<2:47:42, 2.48s/it] +2025-02-06 02:22:53 - ERROR - stderr - 82%|████████▏ | 18385/22434 [16:15:13<2:47:33, 2.48s/it] +2025-02-06 02:22:53 - ERROR - stderr - +2025-02-06 02:22:53 - ERROR - stderr - +2025-02-06 02:22:53 - INFO - stdout - {'loss': 0.3759, 'grad_norm': 1.5938630104064941, 'learning_rate': 1.6605220541130052e-06, 'epoch': 2.46} +2025-02-06 02:22:53 - ERROR - stderr - 82%|████████▏ | 18385/22434 [16:15:13<2:47:33, 2.48s/it] +2025-02-06 02:22:55 - ERROR - stderr - 82%|████████▏ | 18386/22434 [16:15:15<2:46:44, 2.47s/it] +2025-02-06 02:22:55 - ERROR - stderr - +2025-02-06 02:22:55 - ERROR - stderr - +2025-02-06 02:22:55 - INFO - stdout - {'loss': 0.3836, 'grad_norm': 1.6239675283432007, 'learning_rate': 1.6597254190657187e-06, 'epoch': 2.46} +2025-02-06 02:22:55 - ERROR - stderr - 82%|████████▏ | 18386/22434 [16:15:15<2:46:44, 2.47s/it] +2025-02-06 02:22:58 - ERROR - stderr - 82%|████████▏ | 18387/22434 [16:15:18<2:50:33, 2.53s/it] +2025-02-06 02:22:58 - ERROR - stderr - +2025-02-06 02:22:58 - ERROR - stderr - +2025-02-06 02:22:58 - INFO - stdout - {'loss': 0.3386, 'grad_norm': 1.5647399425506592, 'learning_rate': 1.658928957863476e-06, 'epoch': 2.46} +2025-02-06 02:22:58 - ERROR - stderr - 82%|████████▏ | 18387/22434 [16:15:18<2:50:33, 2.53s/it] +2025-02-06 02:23:00 - ERROR - stderr - 82%|████████▏ | 18388/22434 [16:15:20<2:49:46, 2.52s/it] +2025-02-06 02:23:00 - ERROR - stderr - +2025-02-06 02:23:00 - ERROR - stderr - +2025-02-06 02:23:00 - INFO - stdout - {'loss': 0.3773, 'grad_norm': 1.6796379089355469, 'learning_rate': 1.6581326705228772e-06, 'epoch': 2.46} +2025-02-06 02:23:00 - ERROR - stderr - 82%|████████▏ | 18388/22434 [16:15:20<2:49:46, 2.52s/it] +2025-02-06 02:23:03 - ERROR - stderr - 82%|████████▏ | 18389/22434 [16:15:23<2:50:23, 2.53s/it] +2025-02-06 02:23:03 - ERROR - stderr - +2025-02-06 02:23:03 - ERROR - stderr - +2025-02-06 02:23:03 - INFO - stdout - {'loss': 
0.4315, 'grad_norm': 1.7886395454406738, 'learning_rate': 1.6573365570605204e-06, 'epoch': 2.46} +2025-02-06 02:23:03 - ERROR - stderr - 82%|████████▏ | 18389/22434 [16:15:23<2:50:23, 2.53s/it] +2025-02-06 02:23:05 - ERROR - stderr - 82%|████████▏ | 18390/22434 [16:15:25<2:49:23, 2.51s/it] +2025-02-06 02:23:05 - ERROR - stderr - +2025-02-06 02:23:05 - ERROR - stderr - +2025-02-06 02:23:05 - INFO - stdout - {'loss': 0.351, 'grad_norm': 1.479917287826538, 'learning_rate': 1.6565406174929999e-06, 'epoch': 2.46} +2025-02-06 02:23:05 - ERROR - stderr - 82%|████████▏ | 18390/22434 [16:15:25<2:49:23, 2.51s/it] +2025-02-06 02:23:08 - ERROR - stderr - 82%|████████▏ | 18391/22434 [16:15:28<2:48:43, 2.50s/it] +2025-02-06 02:23:08 - ERROR - stderr - +2025-02-06 02:23:08 - ERROR - stderr - +2025-02-06 02:23:08 - INFO - stdout - {'loss': 0.3555, 'grad_norm': 1.678924560546875, 'learning_rate': 1.6557448518369067e-06, 'epoch': 2.46} +2025-02-06 02:23:08 - ERROR - stderr - 82%|████████▏ | 18391/22434 [16:15:28<2:48:43, 2.50s/it] +2025-02-06 02:23:10 - ERROR - stderr - 82%|████████▏ | 18392/22434 [16:15:30<2:48:00, 2.49s/it] +2025-02-06 02:23:10 - ERROR - stderr - +2025-02-06 02:23:10 - ERROR - stderr - +2025-02-06 02:23:10 - INFO - stdout - {'loss': 0.3468, 'grad_norm': 1.5352133512496948, 'learning_rate': 1.6549492601088268e-06, 'epoch': 2.46} +2025-02-06 02:23:10 - ERROR - stderr - 82%|████████▏ | 18392/22434 [16:15:30<2:48:00, 2.49s/it] +2025-02-06 02:23:13 - ERROR - stderr - 82%|████████▏ | 18393/22434 [16:15:33<2:48:33, 2.50s/it] +2025-02-06 02:23:13 - ERROR - stderr - +2025-02-06 02:23:13 - ERROR - stderr - +2025-02-06 02:23:13 - INFO - stdout - {'loss': 0.3847, 'grad_norm': 1.7141751050949097, 'learning_rate': 1.6541538423253456e-06, 'epoch': 2.46} +2025-02-06 02:23:13 - ERROR - stderr - 82%|████████▏ | 18393/22434 [16:15:33<2:48:33, 2.50s/it] +2025-02-06 02:23:15 - ERROR - stderr - 82%|████████▏ | 18394/22434 [16:15:35<2:46:58, 2.48s/it] +2025-02-06 02:23:15 - ERROR - 
stderr - +2025-02-06 02:23:15 - ERROR - stderr - +2025-02-06 02:23:15 - INFO - stdout - {'loss': 0.3863, 'grad_norm': 1.6180425882339478, 'learning_rate': 1.6533585985030398e-06, 'epoch': 2.46} +2025-02-06 02:23:15 - ERROR - stderr - 82%|████████▏ | 18394/22434 [16:15:35<2:46:58, 2.48s/it] +2025-02-06 02:23:18 - ERROR - stderr - 82%|████████▏ | 18395/22434 [16:15:38<2:47:45, 2.49s/it] +2025-02-06 02:23:18 - ERROR - stderr - +2025-02-06 02:23:18 - ERROR - stderr - +2025-02-06 02:23:18 - INFO - stdout - {'loss': 0.3212, 'grad_norm': 1.4593968391418457, 'learning_rate': 1.6525635286584907e-06, 'epoch': 2.46} +2025-02-06 02:23:18 - ERROR - stderr - 82%|████████▏ | 18395/22434 [16:15:38<2:47:45, 2.49s/it] +2025-02-06 02:23:20 - ERROR - stderr - 82%|████████▏ | 18396/22434 [16:15:40<2:50:17, 2.53s/it] +2025-02-06 02:23:20 - ERROR - stderr - +2025-02-06 02:23:20 - ERROR - stderr - +2025-02-06 02:23:20 - INFO - stdout - {'loss': 0.3253, 'grad_norm': 1.4271601438522339, 'learning_rate': 1.6517686328082616e-06, 'epoch': 2.46} +2025-02-06 02:23:20 - ERROR - stderr - 82%|████████▏ | 18396/22434 [16:15:40<2:50:17, 2.53s/it] +2025-02-06 02:23:23 - ERROR - stderr - 82%|████████▏ | 18397/22434 [16:15:43<2:48:11, 2.50s/it] +2025-02-06 02:23:23 - ERROR - stderr - +2025-02-06 02:23:23 - ERROR - stderr - +2025-02-06 02:23:23 - INFO - stdout - {'loss': 0.4075, 'grad_norm': 1.650604248046875, 'learning_rate': 1.6509739109689326e-06, 'epoch': 2.46} +2025-02-06 02:23:23 - ERROR - stderr - 82%|████████▏ | 18397/22434 [16:15:43<2:48:11, 2.50s/it] +2025-02-06 02:23:25 - ERROR - stderr - 82%|████████▏ | 18398/22434 [16:15:45<2:46:30, 2.48s/it] +2025-02-06 02:23:25 - ERROR - stderr - +2025-02-06 02:23:25 - ERROR - stderr - +2025-02-06 02:23:25 - INFO - stdout - {'loss': 0.3297, 'grad_norm': 1.4136661291122437, 'learning_rate': 1.6501793631570584e-06, 'epoch': 2.46} +2025-02-06 02:23:25 - ERROR - stderr - 82%|████████▏ | 18398/22434 [16:15:45<2:46:30, 2.48s/it] +2025-02-06 02:23:28 - ERROR - 
stderr - 82%|████████▏ | 18399/22434 [16:15:48<2:46:19, 2.47s/it] +2025-02-06 02:23:28 - ERROR - stderr - +2025-02-06 02:23:28 - ERROR - stderr - +2025-02-06 02:23:28 - INFO - stdout - {'loss': 0.3746, 'grad_norm': 1.6053146123886108, 'learning_rate': 1.64938498938921e-06, 'epoch': 2.46} +2025-02-06 02:23:28 - ERROR - stderr - 82%|████████▏ | 18399/22434 [16:15:48<2:46:19, 2.47s/it] +2025-02-06 02:23:30 - ERROR - stderr - 82%|████████▏ | 18400/22434 [16:15:50<2:45:58, 2.47s/it] +2025-02-06 02:23:30 - ERROR - stderr - +2025-02-06 02:23:30 - ERROR - stderr - +2025-02-06 02:23:30 - INFO - stdout - {'loss': 0.3662, 'grad_norm': 1.4433890581130981, 'learning_rate': 1.6485907896819387e-06, 'epoch': 2.46} +2025-02-06 02:23:30 - ERROR - stderr - 82%|████████▏ | 18400/22434 [16:15:50<2:45:58, 2.47s/it] +2025-02-06 02:23:33 - ERROR - stderr - 82%|████████▏ | 18401/22434 [16:15:52<2:47:14, 2.49s/it] +2025-02-06 02:23:33 - ERROR - stderr - +2025-02-06 02:23:33 - ERROR - stderr - +2025-02-06 02:23:33 - INFO - stdout - {'loss': 0.3324, 'grad_norm': 1.4452996253967285, 'learning_rate': 1.6477967640517978e-06, 'epoch': 2.46} +2025-02-06 02:23:33 - ERROR - stderr - 82%|████████▏ | 18401/22434 [16:15:53<2:47:14, 2.49s/it] +2025-02-06 02:23:35 - ERROR - stderr - 82%|████████▏ | 18402/22434 [16:15:55<2:49:03, 2.52s/it] +2025-02-06 02:23:35 - ERROR - stderr - +2025-02-06 02:23:35 - ERROR - stderr - +2025-02-06 02:23:35 - INFO - stdout - {'loss': 0.3824, 'grad_norm': 1.5810301303863525, 'learning_rate': 1.6470029125153463e-06, 'epoch': 2.46} +2025-02-06 02:23:35 - ERROR - stderr - 82%|████████▏ | 18402/22434 [16:15:55<2:49:03, 2.52s/it] +2025-02-06 02:23:38 - ERROR - stderr - 82%|████████▏ | 18403/22434 [16:15:58<2:50:06, 2.53s/it] +2025-02-06 02:23:38 - ERROR - stderr - +2025-02-06 02:23:38 - ERROR - stderr - +2025-02-06 02:23:38 - INFO - stdout - {'loss': 0.4011, 'grad_norm': 1.5991277694702148, 'learning_rate': 1.6462092350891245e-06, 'epoch': 2.46} +2025-02-06 02:23:38 - ERROR - 
stderr - 82%|████████▏ | 18403/22434 [16:15:58<2:50:06, 2.53s/it] +2025-02-06 02:23:40 - ERROR - stderr - 82%|████████▏ | 18404/22434 [16:16:00<2:48:57, 2.52s/it] +2025-02-06 02:23:40 - ERROR - stderr - +2025-02-06 02:23:40 - ERROR - stderr - +2025-02-06 02:23:40 - INFO - stdout - {'loss': 0.3203, 'grad_norm': 1.425514817237854, 'learning_rate': 1.645415731789677e-06, 'epoch': 2.46} +2025-02-06 02:23:40 - ERROR - stderr - 82%|████████▏ | 18404/22434 [16:16:00<2:48:57, 2.52s/it] +2025-02-06 02:23:43 - ERROR - stderr - 82%|████████▏ | 18405/22434 [16:16:03<2:49:27, 2.52s/it] +2025-02-06 02:23:43 - ERROR - stderr - +2025-02-06 02:23:43 - ERROR - stderr - +2025-02-06 02:23:43 - INFO - stdout - {'loss': 0.367, 'grad_norm': 1.6018075942993164, 'learning_rate': 1.6446224026335434e-06, 'epoch': 2.46} +2025-02-06 02:23:43 - ERROR - stderr - 82%|████████▏ | 18405/22434 [16:16:03<2:49:27, 2.52s/it] +2025-02-06 02:23:45 - ERROR - stderr - 82%|████████▏ | 18406/22434 [16:16:05<2:48:37, 2.51s/it] +2025-02-06 02:23:45 - ERROR - stderr - +2025-02-06 02:23:45 - ERROR - stderr - +2025-02-06 02:23:45 - INFO - stdout - {'loss': 0.366, 'grad_norm': 1.5432902574539185, 'learning_rate': 1.6438292476372607e-06, 'epoch': 2.46} +2025-02-06 02:23:45 - ERROR - stderr - 82%|████████▏ | 18406/22434 [16:16:05<2:48:37, 2.51s/it] +2025-02-06 02:23:48 - ERROR - stderr - 82%|████████▏ | 18407/22434 [16:16:08<2:47:44, 2.50s/it] +2025-02-06 02:23:48 - ERROR - stderr - +2025-02-06 02:23:48 - ERROR - stderr - +2025-02-06 02:23:48 - INFO - stdout - {'loss': 0.3478, 'grad_norm': 1.618990182876587, 'learning_rate': 1.6430362668173627e-06, 'epoch': 2.46} +2025-02-06 02:23:48 - ERROR - stderr - 82%|████████▏ | 18407/22434 [16:16:08<2:47:44, 2.50s/it] +2025-02-06 02:23:51 - ERROR - stderr - 82%|████████▏ | 18408/22434 [16:16:10<2:51:51, 2.56s/it] +2025-02-06 02:23:51 - ERROR - stderr - +2025-02-06 02:23:51 - ERROR - stderr - +2025-02-06 02:23:51 - INFO - stdout - {'loss': 0.3486, 'grad_norm': 
1.3908213376998901, 'learning_rate': 1.6422434601903758e-06, 'epoch': 2.46} +2025-02-06 02:23:51 - ERROR - stderr - 82%|████████▏ | 18408/22434 [16:16:10<2:51:51, 2.56s/it] +2025-02-06 02:23:53 - ERROR - stderr - 82%|████████▏ | 18409/22434 [16:16:13<2:50:53, 2.55s/it] +2025-02-06 02:23:53 - ERROR - stderr - +2025-02-06 02:23:53 - ERROR - stderr - +2025-02-06 02:23:53 - INFO - stdout - {'loss': 0.3741, 'grad_norm': 1.4298393726348877, 'learning_rate': 1.6414508277728268e-06, 'epoch': 2.46} +2025-02-06 02:23:53 - ERROR - stderr - 82%|████████▏ | 18409/22434 [16:16:13<2:50:53, 2.55s/it] +2025-02-06 02:23:56 - ERROR - stderr - 82%|████████▏ | 18410/22434 [16:16:15<2:50:05, 2.54s/it] +2025-02-06 02:23:56 - ERROR - stderr - +2025-02-06 02:23:56 - ERROR - stderr - +2025-02-06 02:23:56 - INFO - stdout - {'loss': 0.4048, 'grad_norm': 1.5809578895568848, 'learning_rate': 1.6406583695812362e-06, 'epoch': 2.46} +2025-02-06 02:23:56 - ERROR - stderr - 82%|████████▏ | 18410/22434 [16:16:15<2:50:05, 2.54s/it] +2025-02-06 02:23:58 - ERROR - stderr - 82%|████████▏ | 18411/22434 [16:16:18<2:50:16, 2.54s/it] +2025-02-06 02:23:58 - ERROR - stderr - +2025-02-06 02:23:58 - ERROR - stderr - +2025-02-06 02:23:58 - INFO - stdout - {'loss': 0.3803, 'grad_norm': 1.5856084823608398, 'learning_rate': 1.6398660856321236e-06, 'epoch': 2.46} +2025-02-06 02:23:58 - ERROR - stderr - 82%|████████▏ | 18411/22434 [16:16:18<2:50:16, 2.54s/it] +2025-02-06 02:24:01 - ERROR - stderr - 82%|████████▏ | 18412/22434 [16:16:20<2:50:32, 2.54s/it] +2025-02-06 02:24:01 - ERROR - stderr - +2025-02-06 02:24:01 - ERROR - stderr - +2025-02-06 02:24:01 - INFO - stdout - {'loss': 0.38, 'grad_norm': 1.5606738328933716, 'learning_rate': 1.6390739759420027e-06, 'epoch': 2.46} +2025-02-06 02:24:01 - ERROR - stderr - 82%|████████▏ | 18412/22434 [16:16:21<2:50:32, 2.54s/it] +2025-02-06 02:24:03 - ERROR - stderr - 82%|████████▏ | 18413/22434 [16:16:23<2:49:35, 2.53s/it] +2025-02-06 02:24:03 - ERROR - stderr - +2025-02-06 
02:24:03 - ERROR - stderr - +2025-02-06 02:24:03 - INFO - stdout - {'loss': 0.4016, 'grad_norm': 1.5805355310440063, 'learning_rate': 1.6382820405273846e-06, 'epoch': 2.46} +2025-02-06 02:24:03 - ERROR - stderr - 82%|████████▏ | 18413/22434 [16:16:23<2:49:35, 2.53s/it] +2025-02-06 02:24:06 - ERROR - stderr - 82%|████████▏ | 18414/22434 [16:16:25<2:48:19, 2.51s/it] +2025-02-06 02:24:06 - ERROR - stderr - +2025-02-06 02:24:06 - ERROR - stderr - +2025-02-06 02:24:06 - INFO - stdout - {'loss': 0.3203, 'grad_norm': 1.333855152130127, 'learning_rate': 1.6374902794047754e-06, 'epoch': 2.46} +2025-02-06 02:24:06 - ERROR - stderr - 82%|████████▏ | 18414/22434 [16:16:25<2:48:19, 2.51s/it] +2025-02-06 02:24:08 - ERROR - stderr - 82%|████████▏ | 18415/22434 [16:16:28<2:46:16, 2.48s/it] +2025-02-06 02:24:08 - ERROR - stderr - +2025-02-06 02:24:08 - ERROR - stderr - +2025-02-06 02:24:08 - INFO - stdout - {'loss': 0.3316, 'grad_norm': 1.3911138772964478, 'learning_rate': 1.6366986925906802e-06, 'epoch': 2.46} +2025-02-06 02:24:08 - ERROR - stderr - 82%|████████▏ | 18415/22434 [16:16:28<2:46:16, 2.48s/it] +2025-02-06 02:24:11 - ERROR - stderr - 82%|████████▏ | 18416/22434 [16:16:30<2:46:53, 2.49s/it] +2025-02-06 02:24:11 - ERROR - stderr - +2025-02-06 02:24:11 - ERROR - stderr - +2025-02-06 02:24:11 - INFO - stdout - {'loss': 0.3592, 'grad_norm': 1.4963572025299072, 'learning_rate': 1.6359072801015995e-06, 'epoch': 2.46} +2025-02-06 02:24:11 - ERROR - stderr - 82%|████████▏ | 18416/22434 [16:16:30<2:46:53, 2.49s/it] +2025-02-06 02:24:13 - ERROR - stderr - 82%|████████▏ | 18417/22434 [16:16:33<2:48:02, 2.51s/it] +2025-02-06 02:24:13 - ERROR - stderr - +2025-02-06 02:24:13 - ERROR - stderr - +2025-02-06 02:24:13 - INFO - stdout - {'loss': 0.3389, 'grad_norm': 1.5172936916351318, 'learning_rate': 1.6351160419540235e-06, 'epoch': 2.46} +2025-02-06 02:24:13 - ERROR - stderr - 82%|████████▏ | 18417/22434 [16:16:33<2:48:02, 2.51s/it] +2025-02-06 02:24:16 - ERROR - stderr - 82%|████████▏ 
| 18418/22434 [16:16:35<2:48:55, 2.52s/it] +2025-02-06 02:24:16 - ERROR - stderr - +2025-02-06 02:24:16 - ERROR - stderr - +2025-02-06 02:24:16 - INFO - stdout - {'loss': 0.3459, 'grad_norm': 1.5685102939605713, 'learning_rate': 1.6343249781644533e-06, 'epoch': 2.46} +2025-02-06 02:24:16 - ERROR - stderr - 82%|████████▏ | 18418/22434 [16:16:36<2:48:55, 2.52s/it] +2025-02-06 02:24:19 - ERROR - stderr - 82%|████████▏ | 18419/22434 [16:16:38<2:54:45, 2.61s/it] +2025-02-06 02:24:19 - ERROR - stderr - +2025-02-06 02:24:19 - ERROR - stderr - +2025-02-06 02:24:19 - INFO - stdout - {'loss': 0.3493, 'grad_norm': 1.6269254684448242, 'learning_rate': 1.6335340887493723e-06, 'epoch': 2.46} +2025-02-06 02:24:19 - ERROR - stderr - 82%|████████▏ | 18419/22434 [16:16:38<2:54:45, 2.61s/it] +2025-02-06 02:24:21 - ERROR - stderr - 82%|████████▏ | 18420/22434 [16:16:41<2:52:07, 2.57s/it] +2025-02-06 02:24:21 - ERROR - stderr - +2025-02-06 02:24:21 - ERROR - stderr - +2025-02-06 02:24:21 - INFO - stdout - {'loss': 0.3717, 'grad_norm': 1.49678373336792, 'learning_rate': 1.6327433737252651e-06, 'epoch': 2.46} +2025-02-06 02:24:21 - ERROR - stderr - 82%|████████▏ | 18420/22434 [16:16:41<2:52:07, 2.57s/it] +2025-02-06 02:24:23 - ERROR - stderr - 82%|████████▏ | 18421/22434 [16:16:43<2:49:05, 2.53s/it] +2025-02-06 02:24:23 - ERROR - stderr - +2025-02-06 02:24:23 - ERROR - stderr - +2025-02-06 02:24:23 - INFO - stdout - {'loss': 0.361, 'grad_norm': 1.5435237884521484, 'learning_rate': 1.6319528331086198e-06, 'epoch': 2.46} +2025-02-06 02:24:23 - ERROR - stderr - 82%|████████▏ | 18421/22434 [16:16:43<2:49:05, 2.53s/it] +2025-02-06 02:24:26 - ERROR - stderr - 82%|████████▏ | 18422/22434 [16:16:46<2:48:05, 2.51s/it] +2025-02-06 02:24:26 - ERROR - stderr - +2025-02-06 02:24:26 - ERROR - stderr - +2025-02-06 02:24:26 - INFO - stdout - {'loss': 0.4057, 'grad_norm': 1.757003903388977, 'learning_rate': 1.6311624669159064e-06, 'epoch': 2.46} +2025-02-06 02:24:26 - ERROR - stderr - 82%|████████▏ | 
18422/22434 [16:16:46<2:48:05, 2.51s/it] +2025-02-06 02:24:28 - ERROR - stderr - 82%|████████▏ | 18423/22434 [16:16:48<2:47:36, 2.51s/it] +2025-02-06 02:24:28 - ERROR - stderr - +2025-02-06 02:24:28 - ERROR - stderr - +2025-02-06 02:24:28 - INFO - stdout - {'loss': 0.4317, 'grad_norm': 1.5914117097854614, 'learning_rate': 1.6303722751636076e-06, 'epoch': 2.46} +2025-02-06 02:24:28 - ERROR - stderr - 82%|████████▏ | 18423/22434 [16:16:48<2:47:36, 2.51s/it] +2025-02-06 02:24:31 - ERROR - stderr - 82%|████████▏ | 18424/22434 [16:16:51<2:46:13, 2.49s/it] +2025-02-06 02:24:31 - ERROR - stderr - +2025-02-06 02:24:31 - ERROR - stderr - +2025-02-06 02:24:31 - INFO - stdout - {'loss': 0.3259, 'grad_norm': 1.3521523475646973, 'learning_rate': 1.6295822578681875e-06, 'epoch': 2.46} +2025-02-06 02:24:31 - ERROR - stderr - 82%|████████▏ | 18424/22434 [16:16:51<2:46:13, 2.49s/it] +2025-02-06 02:24:33 - ERROR - stderr - 82%|████████▏ | 18425/22434 [16:16:53<2:47:43, 2.51s/it] +2025-02-06 02:24:33 - ERROR - stderr - +2025-02-06 02:24:33 - ERROR - stderr - +2025-02-06 02:24:33 - INFO - stdout - {'loss': 0.3998, 'grad_norm': 1.5597515106201172, 'learning_rate': 1.6287924150461153e-06, 'epoch': 2.46} +2025-02-06 02:24:33 - ERROR - stderr - 82%|████████▏ | 18425/22434 [16:16:53<2:47:43, 2.51s/it] +2025-02-06 02:24:36 - ERROR - stderr - 82%|████████▏ | 18426/22434 [16:16:56<2:46:45, 2.50s/it] +2025-02-06 02:24:36 - ERROR - stderr - +2025-02-06 02:24:36 - ERROR - stderr - +2025-02-06 02:24:36 - INFO - stdout - {'loss': 0.3695, 'grad_norm': 1.5405718088150024, 'learning_rate': 1.6280027467138547e-06, 'epoch': 2.46} +2025-02-06 02:24:36 - ERROR - stderr - 82%|████████▏ | 18426/22434 [16:16:56<2:46:45, 2.50s/it] +2025-02-06 02:24:38 - ERROR - stderr - 82%|████████▏ | 18427/22434 [16:16:58<2:46:32, 2.49s/it] +2025-02-06 02:24:38 - ERROR - stderr - +2025-02-06 02:24:38 - ERROR - stderr - +2025-02-06 02:24:38 - INFO - stdout - {'loss': 0.4022, 'grad_norm': 1.5388661623001099, 'learning_rate': 
1.627213252887866e-06, 'epoch': 2.46} +2025-02-06 02:24:38 - ERROR - stderr - 82%|████████▏ | 18427/22434 [16:16:58<2:46:32, 2.49s/it] +2025-02-06 02:24:41 - ERROR - stderr - 82%|████████▏ | 18428/22434 [16:17:01<2:48:06, 2.52s/it] +2025-02-06 02:24:41 - ERROR - stderr - +2025-02-06 02:24:41 - ERROR - stderr - +2025-02-06 02:24:41 - INFO - stdout - {'loss': 0.3658, 'grad_norm': 1.5667030811309814, 'learning_rate': 1.6264239335846055e-06, 'epoch': 2.46} +2025-02-06 02:24:41 - ERROR - stderr - 82%|████████▏ | 18428/22434 [16:17:01<2:48:06, 2.52s/it] +2025-02-06 02:24:43 - ERROR - stderr - 82%|████████▏ | 18429/22434 [16:17:03<2:48:39, 2.53s/it] +2025-02-06 02:24:44 - ERROR - stderr - +2025-02-06 02:24:44 - ERROR - stderr - +2025-02-06 02:24:44 - INFO - stdout - {'loss': 0.3602, 'grad_norm': 1.5254513025283813, 'learning_rate': 1.6256347888205248e-06, 'epoch': 2.46} +2025-02-06 02:24:44 - ERROR - stderr - 82%|████████▏ | 18429/22434 [16:17:03<2:48:39, 2.53s/it] +2025-02-06 02:24:46 - ERROR - stderr - 82%|████████▏ | 18430/22434 [16:17:06<2:47:17, 2.51s/it] +2025-02-06 02:24:46 - ERROR - stderr - +2025-02-06 02:24:46 - ERROR - stderr - +2025-02-06 02:24:46 - INFO - stdout - {'loss': 0.365, 'grad_norm': 1.541748046875, 'learning_rate': 1.6248458186120741e-06, 'epoch': 2.46} +2025-02-06 02:24:46 - ERROR - stderr - 82%|████████▏ | 18430/22434 [16:17:06<2:47:17, 2.51s/it] +2025-02-06 02:24:49 - ERROR - stderr - 82%|████████▏ | 18431/22434 [16:17:08<2:48:57, 2.53s/it] +2025-02-06 02:24:49 - ERROR - stderr - +2025-02-06 02:24:49 - ERROR - stderr - +2025-02-06 02:24:49 - INFO - stdout - {'loss': 0.3983, 'grad_norm': 1.5242400169372559, 'learning_rate': 1.624057022975698e-06, 'epoch': 2.46} +2025-02-06 02:24:49 - ERROR - stderr - 82%|████████▏ | 18431/22434 [16:17:08<2:48:57, 2.53s/it] +2025-02-06 02:24:51 - ERROR - stderr - 82%|████████▏ | 18432/22434 [16:17:11<2:51:51, 2.58s/it] +2025-02-06 02:24:51 - ERROR - stderr - +2025-02-06 02:24:51 - ERROR - stderr - +2025-02-06 
02:24:51 - INFO - stdout - {'loss': 0.3381, 'grad_norm': 1.6263666152954102, 'learning_rate': 1.6232684019278389e-06, 'epoch': 2.46} +2025-02-06 02:24:51 - ERROR - stderr - 82%|████████▏ | 18432/22434 [16:17:11<2:51:51, 2.58s/it] +2025-02-06 02:24:54 - ERROR - stderr - 82%|████████▏ | 18433/22434 [16:17:13<2:50:04, 2.55s/it] +2025-02-06 02:24:54 - ERROR - stderr - +2025-02-06 02:24:54 - ERROR - stderr - +2025-02-06 02:24:54 - INFO - stdout - {'loss': 0.3805, 'grad_norm': 1.4669629335403442, 'learning_rate': 1.6224799554849335e-06, 'epoch': 2.46} +2025-02-06 02:24:54 - ERROR - stderr - 82%|████████▏ | 18433/22434 [16:17:14<2:50:04, 2.55s/it] +2025-02-06 02:24:56 - ERROR - stderr - 82%|████████▏ | 18434/22434 [16:17:16<2:49:38, 2.54s/it] +2025-02-06 02:24:56 - ERROR - stderr - +2025-02-06 02:24:56 - ERROR - stderr - +2025-02-06 02:24:56 - INFO - stdout - {'loss': 0.3769, 'grad_norm': 1.5077074766159058, 'learning_rate': 1.6216916836634179e-06, 'epoch': 2.47} +2025-02-06 02:24:56 - ERROR - stderr - 82%|████████▏ | 18434/22434 [16:17:16<2:49:38, 2.54s/it] +2025-02-06 02:24:59 - ERROR - stderr - 82%|████████▏ | 18435/22434 [16:17:19<2:49:37, 2.55s/it] +2025-02-06 02:24:59 - ERROR - stderr - +2025-02-06 02:24:59 - ERROR - stderr - +2025-02-06 02:24:59 - INFO - stdout - {'loss': 0.3664, 'grad_norm': 1.5486758947372437, 'learning_rate': 1.620903586479723e-06, 'epoch': 2.47} +2025-02-06 02:24:59 - ERROR - stderr - 82%|████████▏ | 18435/22434 [16:17:19<2:49:37, 2.55s/it] +2025-02-06 02:25:01 - ERROR - stderr - 82%|████████▏ | 18436/22434 [16:17:21<2:49:57, 2.55s/it] +2025-02-06 02:25:01 - ERROR - stderr - +2025-02-06 02:25:01 - ERROR - stderr - +2025-02-06 02:25:01 - INFO - stdout - {'loss': 0.3262, 'grad_norm': 1.4843965768814087, 'learning_rate': 1.6201156639502714e-06, 'epoch': 2.47} +2025-02-06 02:25:01 - ERROR - stderr - 82%|████████▏ | 18436/22434 [16:17:21<2:49:57, 2.55s/it] +2025-02-06 02:25:04 - ERROR - stderr - 82%|████████▏ | 18437/22434 [16:17:24<2:50:13, 
2.56s/it] +2025-02-06 02:25:04 - ERROR - stderr - +2025-02-06 02:25:04 - ERROR - stderr - +2025-02-06 02:25:04 - INFO - stdout - {'loss': 0.3495, 'grad_norm': 1.5757921934127808, 'learning_rate': 1.6193279160914943e-06, 'epoch': 2.47} +2025-02-06 02:25:04 - ERROR - stderr - 82%|████████▏ | 18437/22434 [16:17:24<2:50:13, 2.56s/it] +2025-02-06 02:25:07 - ERROR - stderr - 82%|████████▏ | 18438/22434 [16:17:26<2:51:29, 2.57s/it] +2025-02-06 02:25:07 - ERROR - stderr - +2025-02-06 02:25:07 - ERROR - stderr - +2025-02-06 02:25:07 - INFO - stdout - {'loss': 0.367, 'grad_norm': 1.6165995597839355, 'learning_rate': 1.618540342919802e-06, 'epoch': 2.47} +2025-02-06 02:25:07 - ERROR - stderr - 82%|████████▏ | 18438/22434 [16:17:26<2:51:29, 2.57s/it] +2025-02-06 02:25:09 - ERROR - stderr - 82%|████████▏ | 18439/22434 [16:17:29<2:49:08, 2.54s/it] +2025-02-06 02:25:09 - ERROR - stderr - +2025-02-06 02:25:09 - ERROR - stderr - +2025-02-06 02:25:09 - INFO - stdout - {'loss': 0.3601, 'grad_norm': 1.6041425466537476, 'learning_rate': 1.6177529444516193e-06, 'epoch': 2.47} +2025-02-06 02:25:09 - ERROR - stderr - 82%|████████▏ | 18439/22434 [16:17:29<2:49:08, 2.54s/it] +2025-02-06 02:25:11 - ERROR - stderr - 82%|████████▏ | 18440/22434 [16:17:31<2:48:54, 2.54s/it] +2025-02-06 02:25:12 - ERROR - stderr - +2025-02-06 02:25:12 - ERROR - stderr - +2025-02-06 02:25:12 - INFO - stdout - {'loss': 0.3402, 'grad_norm': 1.4997284412384033, 'learning_rate': 1.6169657207033574e-06, 'epoch': 2.47} +2025-02-06 02:25:12 - ERROR - stderr - 82%|████████▏ | 18440/22434 [16:17:31<2:48:54, 2.54s/it] +2025-02-06 02:25:14 - ERROR - stderr - 82%|████████▏ | 18441/22434 [16:17:34<2:49:57, 2.55s/it] +2025-02-06 02:25:14 - ERROR - stderr - +2025-02-06 02:25:14 - ERROR - stderr - +2025-02-06 02:25:14 - INFO - stdout - {'loss': 0.3662, 'grad_norm': 1.5533074140548706, 'learning_rate': 1.6161786716914196e-06, 'epoch': 2.47} +2025-02-06 02:25:14 - ERROR - stderr - 82%|████████▏ | 18441/22434 [16:17:34<2:49:57, 
2.55s/it] +2025-02-06 02:25:17 - ERROR - stderr - 82%|████████▏ | 18442/22434 [16:17:37<2:53:29, 2.61s/it] +2025-02-06 02:25:17 - ERROR - stderr - +2025-02-06 02:25:17 - ERROR - stderr - +2025-02-06 02:25:17 - INFO - stdout - {'loss': 0.4126, 'grad_norm': 1.7296086549758911, 'learning_rate': 1.6153917974322187e-06, 'epoch': 2.47} +2025-02-06 02:25:17 - ERROR - stderr - 82%|████████▏ | 18442/22434 [16:17:37<2:53:29, 2.61s/it] +2025-02-06 02:25:19 - ERROR - stderr - 82%|████████▏ | 18443/22434 [16:17:39<2:51:14, 2.57s/it] +2025-02-06 02:25:19 - ERROR - stderr - +2025-02-06 02:25:19 - ERROR - stderr - +2025-02-06 02:25:19 - INFO - stdout - {'loss': 0.36, 'grad_norm': 1.4833669662475586, 'learning_rate': 1.614605097942148e-06, 'epoch': 2.47} +2025-02-06 02:25:19 - ERROR - stderr - 82%|████████▏ | 18443/22434 [16:17:39<2:51:14, 2.57s/it] +2025-02-06 02:25:22 - ERROR - stderr - 82%|████████▏ | 18444/22434 [16:17:42<3:01:35, 2.73s/it] +2025-02-06 02:25:22 - ERROR - stderr - +2025-02-06 02:25:22 - ERROR - stderr - +2025-02-06 02:25:22 - INFO - stdout - {'loss': 0.3299, 'grad_norm': 1.5846545696258545, 'learning_rate': 1.6138185732376144e-06, 'epoch': 2.47} +2025-02-06 02:25:22 - ERROR - stderr - 82%|████████▏ | 18444/22434 [16:17:42<3:01:35, 2.73s/it] +2025-02-06 02:25:25 - ERROR - stderr - 82%|████████▏ | 18445/22434 [16:17:45<2:57:10, 2.67s/it] +2025-02-06 02:25:25 - ERROR - stderr - +2025-02-06 02:25:25 - ERROR - stderr - +2025-02-06 02:25:25 - INFO - stdout - {'loss': 0.3717, 'grad_norm': 1.5114147663116455, 'learning_rate': 1.613032223335007e-06, 'epoch': 2.47} +2025-02-06 02:25:25 - ERROR - stderr - 82%|████████▏ | 18445/22434 [16:17:45<2:57:10, 2.67s/it] +2025-02-06 02:25:27 - ERROR - stderr - 82%|████████▏ | 18446/22434 [16:17:47<2:53:08, 2.60s/it] +2025-02-06 02:25:27 - ERROR - stderr - +2025-02-06 02:25:27 - ERROR - stderr - +2025-02-06 02:25:27 - INFO - stdout - {'loss': 0.3559, 'grad_norm': 1.3816806077957153, 'learning_rate': 1.612246048250714e-06, 'epoch': 
2.47} +2025-02-06 02:25:27 - ERROR - stderr - 82%|████████▏ | 18446/22434 [16:17:47<2:53:08, 2.60s/it] +2025-02-06 02:25:30 - ERROR - stderr - 82%|████████▏ | 18447/22434 [16:17:50<2:50:47, 2.57s/it] +2025-02-06 02:25:30 - ERROR - stderr - +2025-02-06 02:25:30 - ERROR - stderr - +2025-02-06 02:25:30 - INFO - stdout - {'loss': 0.3354, 'grad_norm': 1.5780364274978638, 'learning_rate': 1.611460048001131e-06, 'epoch': 2.47} +2025-02-06 02:25:30 - ERROR - stderr - 82%|████████▏ | 18447/22434 [16:17:50<2:50:47, 2.57s/it] +2025-02-06 02:25:32 - ERROR - stderr - 82%|████████▏ | 18448/22434 [16:17:52<2:49:54, 2.56s/it] +2025-02-06 02:25:32 - ERROR - stderr - +2025-02-06 02:25:32 - ERROR - stderr - +2025-02-06 02:25:32 - INFO - stdout - {'loss': 0.3597, 'grad_norm': 1.567331314086914, 'learning_rate': 1.610674222602634e-06, 'epoch': 2.47} +2025-02-06 02:25:32 - ERROR - stderr - 82%|████████▏ | 18448/22434 [16:17:52<2:49:54, 2.56s/it] +2025-02-06 02:25:35 - ERROR - stderr - 82%|████████▏ | 18449/22434 [16:17:55<2:48:47, 2.54s/it] +2025-02-06 02:25:35 - ERROR - stderr - +2025-02-06 02:25:35 - ERROR - stderr - +2025-02-06 02:25:35 - INFO - stdout - {'loss': 0.3057, 'grad_norm': 1.38765549659729, 'learning_rate': 1.609888572071604e-06, 'epoch': 2.47} +2025-02-06 02:25:35 - ERROR - stderr - 82%|████████▏ | 18449/22434 [16:17:55<2:48:47, 2.54s/it] +2025-02-06 02:25:38 - ERROR - stderr - 82%|████████▏ | 18450/22434 [16:17:57<2:49:59, 2.56s/it] +2025-02-06 02:25:38 - ERROR - stderr - +2025-02-06 02:25:38 - ERROR - stderr - +2025-02-06 02:25:38 - INFO - stdout - {'loss': 0.396, 'grad_norm': 1.5674617290496826, 'learning_rate': 1.6091030964244192e-06, 'epoch': 2.47} +2025-02-06 02:25:38 - ERROR - stderr - 82%|████████▏ | 18450/22434 [16:17:57<2:49:59, 2.56s/it] +2025-02-06 02:25:40 - ERROR - stderr - 82%|████████▏ | 18451/22434 [16:18:00<2:49:22, 2.55s/it] +2025-02-06 02:25:40 - ERROR - stderr - +2025-02-06 02:25:40 - ERROR - stderr - +2025-02-06 02:25:40 - INFO - stdout - {'loss': 
0.3986, 'grad_norm': 1.5536940097808838, 'learning_rate': 1.608317795677451e-06, 'epoch': 2.47} +2025-02-06 02:25:40 - ERROR - stderr - 82%|████████▏ | 18451/22434 [16:18:00<2:49:22, 2.55s/it] +2025-02-06 02:25:43 - ERROR - stderr - 82%|████████▏ | 18452/22434 [16:18:02<2:48:04, 2.53s/it] +2025-02-06 02:25:43 - ERROR - stderr - +2025-02-06 02:25:43 - ERROR - stderr - +2025-02-06 02:25:43 - INFO - stdout - {'loss': 0.3831, 'grad_norm': 1.5396480560302734, 'learning_rate': 1.6075326698470695e-06, 'epoch': 2.47} +2025-02-06 02:25:43 - ERROR - stderr - 82%|████████▏ | 18452/22434 [16:18:02<2:48:04, 2.53s/it] +2025-02-06 02:25:45 - ERROR - stderr - 82%|████████▏ | 18453/22434 [16:18:05<2:46:52, 2.52s/it] +2025-02-06 02:25:45 - ERROR - stderr - +2025-02-06 02:25:45 - ERROR - stderr - +2025-02-06 02:25:45 - INFO - stdout - {'loss': 0.3363, 'grad_norm': 1.5758980512619019, 'learning_rate': 1.6067477189496371e-06, 'epoch': 2.47} +2025-02-06 02:25:45 - ERROR - stderr - 82%|████████▏ | 18453/22434 [16:18:05<2:46:52, 2.52s/it] +2025-02-06 02:25:47 - ERROR - stderr - 82%|████████▏ | 18454/22434 [16:18:07<2:45:53, 2.50s/it] +2025-02-06 02:25:48 - ERROR - stderr - +2025-02-06 02:25:48 - ERROR - stderr - +2025-02-06 02:25:48 - INFO - stdout - {'loss': 0.3908, 'grad_norm': 1.548737645149231, 'learning_rate': 1.6059629430015178e-06, 'epoch': 2.47} +2025-02-06 02:25:48 - ERROR - stderr - 82%|████████▏ | 18454/22434 [16:18:07<2:45:53, 2.50s/it] +2025-02-06 02:25:50 - ERROR - stderr - 82%|████████▏ | 18455/22434 [16:18:10<2:45:10, 2.49s/it] +2025-02-06 02:25:50 - ERROR - stderr - +2025-02-06 02:25:50 - ERROR - stderr - +2025-02-06 02:25:50 - INFO - stdout - {'loss': 0.3529, 'grad_norm': 1.5963069200515747, 'learning_rate': 1.605178342019068e-06, 'epoch': 2.47} +2025-02-06 02:25:50 - ERROR - stderr - 82%|████████▏ | 18455/22434 [16:18:10<2:45:10, 2.49s/it] +2025-02-06 02:25:52 - ERROR - stderr - 82%|████████▏ | 18456/22434 [16:18:12<2:44:42, 2.48s/it] +2025-02-06 02:25:52 - ERROR - 
stderr - +2025-02-06 02:25:52 - ERROR - stderr - +2025-02-06 02:25:52 - INFO - stdout - {'loss': 0.3972, 'grad_norm': 1.7830100059509277, 'learning_rate': 1.6043939160186462e-06, 'epoch': 2.47} +2025-02-06 02:25:52 - ERROR - stderr - 82%|████████▏ | 18456/22434 [16:18:12<2:44:42, 2.48s/it] +2025-02-06 02:25:55 - ERROR - stderr - 82%|████████▏ | 18457/22434 [16:18:15<2:45:30, 2.50s/it] +2025-02-06 02:25:55 - ERROR - stderr - +2025-02-06 02:25:55 - ERROR - stderr - +2025-02-06 02:25:55 - INFO - stdout - {'loss': 0.3166, 'grad_norm': 1.4239022731781006, 'learning_rate': 1.6036096650165944e-06, 'epoch': 2.47} +2025-02-06 02:25:55 - ERROR - stderr - 82%|████████▏ | 18457/22434 [16:18:15<2:45:30, 2.50s/it] +2025-02-06 02:25:57 - ERROR - stderr - 82%|████████▏ | 18458/22434 [16:18:17<2:46:05, 2.51s/it] +2025-02-06 02:25:58 - ERROR - stderr - +2025-02-06 02:25:58 - ERROR - stderr - +2025-02-06 02:25:58 - INFO - stdout - {'loss': 0.3924, 'grad_norm': 1.4585000276565552, 'learning_rate': 1.6028255890292666e-06, 'epoch': 2.47} +2025-02-06 02:25:58 - ERROR - stderr - 82%|████████▏ | 18458/22434 [16:18:17<2:46:05, 2.51s/it] +2025-02-06 02:26:00 - ERROR - stderr - 82%|████████▏ | 18459/22434 [16:18:20<2:47:31, 2.53s/it] +2025-02-06 02:26:00 - ERROR - stderr - +2025-02-06 02:26:00 - ERROR - stderr - +2025-02-06 02:26:00 - INFO - stdout - {'loss': 0.34, 'grad_norm': 1.5661699771881104, 'learning_rate': 1.602041688073005e-06, 'epoch': 2.47} +2025-02-06 02:26:00 - ERROR - stderr - 82%|████████▏ | 18459/22434 [16:18:20<2:47:31, 2.53s/it] +2025-02-06 02:26:03 - ERROR - stderr - 82%|████████▏ | 18460/22434 [16:18:22<2:46:07, 2.51s/it] +2025-02-06 02:26:03 - ERROR - stderr - +2025-02-06 02:26:03 - ERROR - stderr - +2025-02-06 02:26:03 - INFO - stdout - {'loss': 0.3331, 'grad_norm': 1.401645541191101, 'learning_rate': 1.6012579621641478e-06, 'epoch': 2.47} +2025-02-06 02:26:03 - ERROR - stderr - 82%|████████▏ | 18460/22434 [16:18:22<2:46:07, 2.51s/it] +2025-02-06 02:26:05 - ERROR - 
stderr - 82%|████████▏ | 18461/22434 [16:18:25<2:44:43, 2.49s/it] +2025-02-06 02:26:05 - ERROR - stderr - +2025-02-06 02:26:05 - ERROR - stderr - +2025-02-06 02:26:05 - INFO - stdout - {'loss': 0.3448, 'grad_norm': 1.423363447189331, 'learning_rate': 1.6004744113190341e-06, 'epoch': 2.47} +2025-02-06 02:26:05 - ERROR - stderr - 82%|████████▏ | 18461/22434 [16:18:25<2:44:43, 2.49s/it] +2025-02-06 02:26:07 - ERROR - stderr - 82%|████████▏ | 18462/22434 [16:18:27<2:43:17, 2.47s/it] +2025-02-06 02:26:07 - ERROR - stderr - +2025-02-06 02:26:07 - ERROR - stderr - +2025-02-06 02:26:07 - INFO - stdout - {'loss': 0.3622, 'grad_norm': 1.6676380634307861, 'learning_rate': 1.5996910355539884e-06, 'epoch': 2.47} +2025-02-06 02:26:07 - ERROR - stderr - 82%|████████▏ | 18462/22434 [16:18:27<2:43:17, 2.47s/it] +2025-02-06 02:26:10 - ERROR - stderr - 82%|████████▏ | 18463/22434 [16:18:30<2:42:52, 2.46s/it] +2025-02-06 02:26:10 - ERROR - stderr - +2025-02-06 02:26:10 - ERROR - stderr - +2025-02-06 02:26:10 - INFO - stdout - {'loss': 0.3729, 'grad_norm': 1.3877291679382324, 'learning_rate': 1.5989078348853505e-06, 'epoch': 2.47} +2025-02-06 02:26:10 - ERROR - stderr - 82%|████████▏ | 18463/22434 [16:18:30<2:42:52, 2.46s/it] +2025-02-06 02:26:12 - ERROR - stderr - 82%|████████▏ | 18464/22434 [16:18:32<2:42:19, 2.45s/it] +2025-02-06 02:26:12 - ERROR - stderr - +2025-02-06 02:26:12 - ERROR - stderr - +2025-02-06 02:26:12 - INFO - stdout - {'loss': 0.4089, 'grad_norm': 1.5548053979873657, 'learning_rate': 1.5981248093294377e-06, 'epoch': 2.47} +2025-02-06 02:26:12 - ERROR - stderr - 82%|████████▏ | 18464/22434 [16:18:32<2:42:19, 2.45s/it] +2025-02-06 02:26:15 - ERROR - stderr - 82%|████████▏ | 18465/22434 [16:18:34<2:42:31, 2.46s/it] +2025-02-06 02:26:15 - ERROR - stderr - +2025-02-06 02:26:15 - ERROR - stderr - +2025-02-06 02:26:15 - INFO - stdout - {'loss': 0.3512, 'grad_norm': 1.5234910249710083, 'learning_rate': 1.5973419589025707e-06, 'epoch': 2.47} +2025-02-06 02:26:15 - ERROR - 
stderr - 82%|████████▏ | 18465/22434 [16:18:35<2:42:31, 2.46s/it] +2025-02-06 02:26:17 - ERROR - stderr - 82%|████████▏ | 18466/22434 [16:18:37<2:43:33, 2.47s/it] +2025-02-06 02:26:17 - ERROR - stderr - +2025-02-06 02:26:17 - ERROR - stderr - +2025-02-06 02:26:17 - INFO - stdout - {'loss': 0.3549, 'grad_norm': 1.6505564451217651, 'learning_rate': 1.596559283621074e-06, 'epoch': 2.47} +2025-02-06 02:26:17 - ERROR - stderr - 82%|████████▏ | 18466/22434 [16:18:37<2:43:33, 2.47s/it] +2025-02-06 02:26:20 - ERROR - stderr - 82%|████████▏ | 18467/22434 [16:18:40<2:44:14, 2.48s/it] +2025-02-06 02:26:20 - ERROR - stderr - +2025-02-06 02:26:20 - ERROR - stderr - +2025-02-06 02:26:20 - INFO - stdout - {'loss': 0.3428, 'grad_norm': 1.421630620956421, 'learning_rate': 1.595776783501254e-06, 'epoch': 2.47} +2025-02-06 02:26:20 - ERROR - stderr - 82%|████████▏ | 18467/22434 [16:18:40<2:44:14, 2.48s/it] +2025-02-06 02:26:22 - ERROR - stderr - 82%|████████▏ | 18468/22434 [16:18:42<2:43:44, 2.48s/it] +2025-02-06 02:26:22 - ERROR - stderr - +2025-02-06 02:26:22 - ERROR - stderr - +2025-02-06 02:26:22 - INFO - stdout - {'loss': 0.4221, 'grad_norm': 1.6747018098831177, 'learning_rate': 1.59499445855943e-06, 'epoch': 2.47} +2025-02-06 02:26:22 - ERROR - stderr - 82%|████████▏ | 18468/22434 [16:18:42<2:43:44, 2.48s/it] +2025-02-06 02:26:25 - ERROR - stderr - 82%|████████▏ | 18469/22434 [16:18:45<2:45:25, 2.50s/it] +2025-02-06 02:26:25 - ERROR - stderr - +2025-02-06 02:26:25 - ERROR - stderr - +2025-02-06 02:26:25 - INFO - stdout - {'loss': 0.3368, 'grad_norm': 1.4631503820419312, 'learning_rate': 1.594212308811901e-06, 'epoch': 2.47} +2025-02-06 02:26:25 - ERROR - stderr - 82%|████████▏ | 18469/22434 [16:18:45<2:45:25, 2.50s/it] +2025-02-06 02:26:27 - ERROR - stderr - 82%|████████▏ | 18470/22434 [16:18:47<2:46:50, 2.53s/it] +2025-02-06 02:26:27 - ERROR - stderr - +2025-02-06 02:26:27 - ERROR - stderr - +2025-02-06 02:26:27 - INFO - stdout - {'loss': 0.3766, 'grad_norm': 
1.5792455673217773, 'learning_rate': 1.5934303342749725e-06, 'epoch': 2.47} +2025-02-06 02:26:27 - ERROR - stderr - 82%|████████▏ | 18470/22434 [16:18:47<2:46:50, 2.53s/it] +2025-02-06 02:26:30 - ERROR - stderr - 82%|████████▏ | 18471/22434 [16:18:50<2:46:39, 2.52s/it] +2025-02-06 02:26:30 - ERROR - stderr - +2025-02-06 02:26:30 - ERROR - stderr - +2025-02-06 02:26:30 - INFO - stdout - {'loss': 0.3411, 'grad_norm': 1.4541759490966797, 'learning_rate': 1.5926485349649457e-06, 'epoch': 2.47} +2025-02-06 02:26:30 - ERROR - stderr - 82%|████████▏ | 18471/22434 [16:18:50<2:46:39, 2.52s/it] +2025-02-06 02:26:32 - ERROR - stderr - 82%|████████▏ | 18472/22434 [16:18:52<2:47:57, 2.54s/it] +2025-02-06 02:26:32 - ERROR - stderr - +2025-02-06 02:26:32 - ERROR - stderr - +2025-02-06 02:26:32 - INFO - stdout - {'loss': 0.3721, 'grad_norm': 1.5153157711029053, 'learning_rate': 1.5918669108981143e-06, 'epoch': 2.47} +2025-02-06 02:26:32 - ERROR - stderr - 82%|████████▏ | 18472/22434 [16:18:52<2:47:57, 2.54s/it] +2025-02-06 02:26:35 - ERROR - stderr - 82%|████████▏ | 18473/22434 [16:18:55<2:45:54, 2.51s/it] +2025-02-06 02:26:35 - ERROR - stderr - +2025-02-06 02:26:35 - ERROR - stderr - +2025-02-06 02:26:35 - INFO - stdout - {'loss': 0.3772, 'grad_norm': 1.4849190711975098, 'learning_rate': 1.5910854620907711e-06, 'epoch': 2.47} +2025-02-06 02:26:35 - ERROR - stderr - 82%|████████▏ | 18473/22434 [16:18:55<2:45:54, 2.51s/it] +2025-02-06 02:26:37 - ERROR - stderr - 82%|████████▏ | 18474/22434 [16:18:57<2:46:07, 2.52s/it] +2025-02-06 02:26:37 - ERROR - stderr - +2025-02-06 02:26:37 - ERROR - stderr - +2025-02-06 02:26:37 - INFO - stdout - {'loss': 0.3284, 'grad_norm': 1.444666862487793, 'learning_rate': 1.5903041885592052e-06, 'epoch': 2.47} +2025-02-06 02:26:37 - ERROR - stderr - 82%|████████▏ | 18474/22434 [16:18:57<2:46:07, 2.52s/it] +2025-02-06 02:26:40 - ERROR - stderr - 82%|████████▏ | 18475/22434 [16:19:00<2:47:44, 2.54s/it] +2025-02-06 02:26:40 - ERROR - stderr - +2025-02-06 
02:26:40 - ERROR - stderr - +2025-02-06 02:26:40 - INFO - stdout - {'loss': 0.3516, 'grad_norm': 1.3294528722763062, 'learning_rate': 1.5895230903197023e-06, 'epoch': 2.47} +2025-02-06 02:26:40 - ERROR - stderr - 82%|████████▏ | 18475/22434 [16:19:00<2:47:44, 2.54s/it] +2025-02-06 02:26:42 - ERROR - stderr - 82%|████████▏ | 18476/22434 [16:19:02<2:46:07, 2.52s/it] +2025-02-06 02:26:43 - ERROR - stderr - +2025-02-06 02:26:43 - ERROR - stderr - +2025-02-06 02:26:43 - INFO - stdout - {'loss': 0.4134, 'grad_norm': 1.9074265956878662, 'learning_rate': 1.5887421673885417e-06, 'epoch': 2.47} +2025-02-06 02:26:43 - ERROR - stderr - 82%|████████▏ | 18476/22434 [16:19:02<2:46:07, 2.52s/it] +2025-02-06 02:26:45 - ERROR - stderr - 82%|████████▏ | 18477/22434 [16:19:05<2:46:49, 2.53s/it] +2025-02-06 02:26:45 - ERROR - stderr - +2025-02-06 02:26:45 - ERROR - stderr - +2025-02-06 02:26:45 - INFO - stdout - {'loss': 0.4147, 'grad_norm': 1.619092345237732, 'learning_rate': 1.5879614197820026e-06, 'epoch': 2.47} +2025-02-06 02:26:45 - ERROR - stderr - 82%|████████▏ | 18477/22434 [16:19:05<2:46:49, 2.53s/it] +2025-02-06 02:26:48 - ERROR - stderr - 82%|████████▏ | 18478/22434 [16:19:07<2:46:57, 2.53s/it] +2025-02-06 02:26:48 - ERROR - stderr - +2025-02-06 02:26:48 - ERROR - stderr - +2025-02-06 02:26:48 - INFO - stdout - {'loss': 0.355, 'grad_norm': 1.4971381425857544, 'learning_rate': 1.5871808475163575e-06, 'epoch': 2.47} +2025-02-06 02:26:48 - ERROR - stderr - 82%|████████▏ | 18478/22434 [16:19:07<2:46:57, 2.53s/it] +2025-02-06 02:26:50 - ERROR - stderr - 82%|████████▏ | 18479/22434 [16:19:10<2:45:49, 2.52s/it] +2025-02-06 02:26:50 - ERROR - stderr - +2025-02-06 02:26:50 - ERROR - stderr - +2025-02-06 02:26:50 - INFO - stdout - {'loss': 0.3646, 'grad_norm': 1.5303348302841187, 'learning_rate': 1.5864004506078778e-06, 'epoch': 2.47} +2025-02-06 02:26:50 - ERROR - stderr - 82%|████████▏ | 18479/22434 [16:19:10<2:45:49, 2.52s/it] +2025-02-06 02:26:52 - ERROR - stderr - 82%|████████▏ | 
18480/22434 [16:19:12<2:44:12, 2.49s/it] +2025-02-06 02:26:53 - ERROR - stderr - +2025-02-06 02:26:53 - ERROR - stderr - +2025-02-06 02:26:53 - INFO - stdout - {'loss': 0.3844, 'grad_norm': 1.6977455615997314, 'learning_rate': 1.5856202290728318e-06, 'epoch': 2.47} +2025-02-06 02:26:53 - ERROR - stderr - 82%|████████▏ | 18480/22434 [16:19:12<2:44:12, 2.49s/it] +2025-02-06 02:26:55 - ERROR - stderr - 82%|████████▏ | 18481/22434 [16:19:15<2:43:22, 2.48s/it] +2025-02-06 02:26:55 - ERROR - stderr - +2025-02-06 02:26:55 - ERROR - stderr - +2025-02-06 02:26:55 - INFO - stdout - {'loss': 0.379, 'grad_norm': 1.4846254587173462, 'learning_rate': 1.5848401829274762e-06, 'epoch': 2.47} +2025-02-06 02:26:55 - ERROR - stderr - 82%|████████▏ | 18481/22434 [16:19:15<2:43:22, 2.48s/it] +2025-02-06 02:26:57 - ERROR - stderr - 82%|████████▏ | 18482/22434 [16:19:17<2:44:01, 2.49s/it] +2025-02-06 02:26:58 - ERROR - stderr - +2025-02-06 02:26:58 - ERROR - stderr - +2025-02-06 02:26:58 - INFO - stdout - {'loss': 0.4062, 'grad_norm': 1.6844042539596558, 'learning_rate': 1.5840603121880782e-06, 'epoch': 2.47} +2025-02-06 02:26:58 - ERROR - stderr - 82%|████████▏ | 18482/22434 [16:19:17<2:44:01, 2.49s/it] +2025-02-06 02:27:00 - ERROR - stderr - 82%|████████▏ | 18483/22434 [16:19:20<2:43:41, 2.49s/it] +2025-02-06 02:27:00 - ERROR - stderr - +2025-02-06 02:27:00 - ERROR - stderr - +2025-02-06 02:27:00 - INFO - stdout - {'loss': 0.3837, 'grad_norm': 1.4793013334274292, 'learning_rate': 1.5832806168708858e-06, 'epoch': 2.47} +2025-02-06 02:27:00 - ERROR - stderr - 82%|████████▏ | 18483/22434 [16:19:20<2:43:41, 2.49s/it] +2025-02-06 02:27:02 - ERROR - stderr - 82%|████████▏ | 18484/22434 [16:19:22<2:42:23, 2.47s/it] +2025-02-06 02:27:02 - ERROR - stderr - +2025-02-06 02:27:02 - ERROR - stderr - +2025-02-06 02:27:02 - INFO - stdout - {'loss': 0.3868, 'grad_norm': 1.5780149698257446, 'learning_rate': 1.5825010969921583e-06, 'epoch': 2.47} +2025-02-06 02:27:02 - ERROR - stderr - 82%|████████▏ | 
18484/22434 [16:19:22<2:42:23, 2.47s/it] +2025-02-06 02:27:05 - ERROR - stderr - 82%|████████▏ | 18485/22434 [16:19:25<2:41:43, 2.46s/it] +2025-02-06 02:27:05 - ERROR - stderr - +2025-02-06 02:27:05 - ERROR - stderr - +2025-02-06 02:27:05 - INFO - stdout - {'loss': 0.3968, 'grad_norm': 1.8261311054229736, 'learning_rate': 1.5817217525681416e-06, 'epoch': 2.47} +2025-02-06 02:27:05 - ERROR - stderr - 82%|████████▏ | 18485/22434 [16:19:25<2:41:43, 2.46s/it] +2025-02-06 02:27:07 - ERROR - stderr - 82%|████████▏ | 18486/22434 [16:19:27<2:46:21, 2.53s/it] +2025-02-06 02:27:08 - ERROR - stderr - +2025-02-06 02:27:08 - ERROR - stderr - +2025-02-06 02:27:08 - INFO - stdout - {'loss': 0.3716, 'grad_norm': 1.6270192861557007, 'learning_rate': 1.5809425836150761e-06, 'epoch': 2.47} +2025-02-06 02:27:08 - ERROR - stderr - 82%|████████▏ | 18486/22434 [16:19:27<2:46:21, 2.53s/it] +2025-02-06 02:27:10 - ERROR - stderr - 82%|████████▏ | 18487/22434 [16:19:30<2:45:43, 2.52s/it] +2025-02-06 02:27:10 - ERROR - stderr - +2025-02-06 02:27:10 - ERROR - stderr - +2025-02-06 02:27:10 - INFO - stdout - {'loss': 0.3931, 'grad_norm': 1.542722225189209, 'learning_rate': 1.5801635901492108e-06, 'epoch': 2.47} +2025-02-06 02:27:10 - ERROR - stderr - 82%|████████▏ | 18487/22434 [16:19:30<2:45:43, 2.52s/it] +2025-02-06 02:27:12 - ERROR - stderr - 82%|████████▏ | 18488/22434 [16:19:32<2:45:22, 2.51s/it] +2025-02-06 02:27:13 - ERROR - stderr - +2025-02-06 02:27:13 - ERROR - stderr - +2025-02-06 02:27:13 - INFO - stdout - {'loss': 0.3916, 'grad_norm': 1.6362234354019165, 'learning_rate': 1.5793847721867749e-06, 'epoch': 2.47} +2025-02-06 02:27:13 - ERROR - stderr - 82%|████████▏ | 18488/22434 [16:19:32<2:45:22, 2.51s/it] +2025-02-06 02:27:15 - ERROR - stderr - 82%|████████▏ | 18489/22434 [16:19:35<2:46:50, 2.54s/it] +2025-02-06 02:27:15 - ERROR - stderr - +2025-02-06 02:27:15 - ERROR - stderr - +2025-02-06 02:27:15 - INFO - stdout - {'loss': 0.3425, 'grad_norm': 1.4620044231414795, 
'learning_rate': 1.578606129744007e-06, 'epoch': 2.47} +2025-02-06 02:27:15 - ERROR - stderr - 82%|████████▏ | 18489/22434 [16:19:35<2:46:50, 2.54s/it] +2025-02-06 02:27:18 - ERROR - stderr - 82%|████████▏ | 18490/22434 [16:19:38<2:53:49, 2.64s/it] +2025-02-06 02:27:18 - ERROR - stderr - +2025-02-06 02:27:18 - ERROR - stderr - +2025-02-06 02:27:18 - INFO - stdout - {'loss': 0.3599, 'grad_norm': 1.4025802612304688, 'learning_rate': 1.577827662837136e-06, 'epoch': 2.47} +2025-02-06 02:27:18 - ERROR - stderr - 82%|████████▏ | 18490/22434 [16:19:38<2:53:49, 2.64s/it] +2025-02-06 02:27:21 - ERROR - stderr - 82%|████████▏ | 18491/22434 [16:19:40<2:51:43, 2.61s/it] +2025-02-06 02:27:21 - ERROR - stderr - +2025-02-06 02:27:21 - ERROR - stderr - +2025-02-06 02:27:21 - INFO - stdout - {'loss': 0.3786, 'grad_norm': 1.504228115081787, 'learning_rate': 1.5770493714823854e-06, 'epoch': 2.47} +2025-02-06 02:27:21 - ERROR - stderr - 82%|████████▏ | 18491/22434 [16:19:40<2:51:43, 2.61s/it] +2025-02-06 02:27:23 - ERROR - stderr - 82%|████████▏ | 18492/22434 [16:19:43<2:49:42, 2.58s/it] +2025-02-06 02:27:23 - ERROR - stderr - +2025-02-06 02:27:23 - ERROR - stderr - +2025-02-06 02:27:23 - INFO - stdout - {'loss': 0.3686, 'grad_norm': 1.5774524211883545, 'learning_rate': 1.5762712556959859e-06, 'epoch': 2.47} +2025-02-06 02:27:23 - ERROR - stderr - 82%|████████▏ | 18492/22434 [16:19:43<2:49:42, 2.58s/it] +2025-02-06 02:27:26 - ERROR - stderr - 82%|████████▏ | 18493/22434 [16:19:45<2:48:25, 2.56s/it] +2025-02-06 02:27:26 - ERROR - stderr - +2025-02-06 02:27:26 - ERROR - stderr - +2025-02-06 02:27:26 - INFO - stdout - {'loss': 0.3766, 'grad_norm': 1.6592427492141724, 'learning_rate': 1.5754933154941488e-06, 'epoch': 2.47} +2025-02-06 02:27:26 - ERROR - stderr - 82%|████████▏ | 18493/22434 [16:19:45<2:48:25, 2.56s/it] +2025-02-06 02:27:28 - ERROR - stderr - 82%|████████▏ | 18494/22434 [16:19:48<2:48:41, 2.57s/it] +2025-02-06 02:27:28 - ERROR - stderr - +2025-02-06 02:27:28 - ERROR - 
stderr - +2025-02-06 02:27:28 - INFO - stdout - {'loss': 0.3927, 'grad_norm': 1.4538761377334595, 'learning_rate': 1.5747155508930912e-06, 'epoch': 2.47} +2025-02-06 02:27:28 - ERROR - stderr - 82%|████████▏ | 18494/22434 [16:19:48<2:48:41, 2.57s/it] +2025-02-06 02:27:31 - ERROR - stderr - 82%|████████▏ | 18495/22434 [16:19:50<2:46:52, 2.54s/it] +2025-02-06 02:27:31 - ERROR - stderr - +2025-02-06 02:27:31 - ERROR - stderr - +2025-02-06 02:27:31 - INFO - stdout - {'loss': 0.3134, 'grad_norm': 1.3130501508712769, 'learning_rate': 1.5739379619090267e-06, 'epoch': 2.47} +2025-02-06 02:27:31 - ERROR - stderr - 82%|████████▏ | 18495/22434 [16:19:50<2:46:52, 2.54s/it] +2025-02-06 02:27:33 - ERROR - stderr - 82%|████████▏ | 18496/22434 [16:19:53<2:45:24, 2.52s/it] +2025-02-06 02:27:33 - ERROR - stderr - +2025-02-06 02:27:33 - ERROR - stderr - +2025-02-06 02:27:33 - INFO - stdout - {'loss': 0.3766, 'grad_norm': 1.5615488290786743, 'learning_rate': 1.5731605485581624e-06, 'epoch': 2.47} +2025-02-06 02:27:33 - ERROR - stderr - 82%|████████▏ | 18496/22434 [16:19:53<2:45:24, 2.52s/it] +2025-02-06 02:27:36 - ERROR - stderr - 82%|████████▏ | 18497/22434 [16:19:55<2:43:55, 2.50s/it] +2025-02-06 02:27:36 - ERROR - stderr - +2025-02-06 02:27:36 - ERROR - stderr - +2025-02-06 02:27:36 - INFO - stdout - {'loss': 0.3742, 'grad_norm': 1.5295140743255615, 'learning_rate': 1.5723833108567033e-06, 'epoch': 2.47} +2025-02-06 02:27:36 - ERROR - stderr - 82%|████████▏ | 18497/22434 [16:19:55<2:43:55, 2.50s/it] +2025-02-06 02:27:38 - ERROR - stderr - 82%|████████▏ | 18498/22434 [16:19:58<2:42:15, 2.47s/it] +2025-02-06 02:27:38 - ERROR - stderr - +2025-02-06 02:27:38 - ERROR - stderr - +2025-02-06 02:27:38 - INFO - stdout - {'loss': 0.3985, 'grad_norm': 1.683884859085083, 'learning_rate': 1.5716062488208494e-06, 'epoch': 2.47} +2025-02-06 02:27:38 - ERROR - stderr - 82%|████████▏ | 18498/22434 [16:19:58<2:42:15, 2.47s/it] +2025-02-06 02:27:40 - ERROR - stderr - 82%|████████▏ | 18499/22434 
[16:20:00<2:42:54, 2.48s/it] +2025-02-06 02:27:40 - ERROR - stderr - +2025-02-06 02:27:40 - ERROR - stderr - +2025-02-06 02:27:40 - INFO - stdout - {'loss': 0.3347, 'grad_norm': 1.3327797651290894, 'learning_rate': 1.570829362466798e-06, 'epoch': 2.47} +2025-02-06 02:27:40 - ERROR - stderr - 82%|████████▏ | 18499/22434 [16:20:00<2:42:54, 2.48s/it] +2025-02-06 02:27:43 - ERROR - stderr - 82%|████████▏ | 18500/22434 [16:20:03<2:43:20, 2.49s/it] +2025-02-06 02:27:43 - ERROR - stderr - +2025-02-06 02:27:43 - ERROR - stderr - +2025-02-06 02:27:43 - INFO - stdout - {'loss': 0.403, 'grad_norm': 1.536956787109375, 'learning_rate': 1.5700526518107428e-06, 'epoch': 2.47} +2025-02-06 02:27:43 - ERROR - stderr - 82%|████████▏ | 18500/22434 [16:20:03<2:43:20, 2.49s/it] +2025-02-06 02:27:45 - ERROR - stderr - 82%|████████▏ | 18501/22434 [16:20:05<2:43:52, 2.50s/it] +2025-02-06 02:27:46 - ERROR - stderr - +2025-02-06 02:27:46 - ERROR - stderr - +2025-02-06 02:27:46 - INFO - stdout - {'loss': 0.3597, 'grad_norm': 1.473001480102539, 'learning_rate': 1.5692761168688764e-06, 'epoch': 2.47} +2025-02-06 02:27:46 - ERROR - stderr - 82%|████████▏ | 18501/22434 [16:20:05<2:43:52, 2.50s/it] +2025-02-06 02:27:48 - ERROR - stderr - 82%|████████▏ | 18502/22434 [16:20:08<2:43:17, 2.49s/it] +2025-02-06 02:27:48 - ERROR - stderr - +2025-02-06 02:27:48 - ERROR - stderr - +2025-02-06 02:27:48 - INFO - stdout - {'loss': 0.3429, 'grad_norm': 1.409569501876831, 'learning_rate': 1.5684997576573767e-06, 'epoch': 2.47} +2025-02-06 02:27:48 - ERROR - stderr - 82%|████████▏ | 18502/22434 [16:20:08<2:43:17, 2.49s/it] +2025-02-06 02:27:50 - ERROR - stderr - 82%|████████▏ | 18503/22434 [16:20:10<2:42:43, 2.48s/it] +2025-02-06 02:27:50 - ERROR - stderr - +2025-02-06 02:27:50 - ERROR - stderr - +2025-02-06 02:27:50 - INFO - stdout - {'loss': 0.3362, 'grad_norm': 1.454110860824585, 'learning_rate': 1.5677235741924347e-06, 'epoch': 2.47} +2025-02-06 02:27:50 - ERROR - stderr - 82%|████████▏ | 18503/22434 
[16:20:10<2:42:43, 2.48s/it] +2025-02-06 02:27:53 - ERROR - stderr - 82%|████████▏ | 18504/22434 [16:20:13<2:44:04, 2.50s/it] +2025-02-06 02:27:53 - ERROR - stderr - +2025-02-06 02:27:53 - ERROR - stderr - +2025-02-06 02:27:53 - INFO - stdout - {'loss': 0.3837, 'grad_norm': 1.5229406356811523, 'learning_rate': 1.5669475664902268e-06, 'epoch': 2.47} +2025-02-06 02:27:53 - ERROR - stderr - 82%|████████▏ | 18504/22434 [16:20:13<2:44:04, 2.50s/it] +2025-02-06 02:27:55 - ERROR - stderr - 82%|████████▏ | 18505/22434 [16:20:15<2:44:32, 2.51s/it] +2025-02-06 02:27:56 - ERROR - stderr - +2025-02-06 02:27:56 - ERROR - stderr - +2025-02-06 02:27:56 - INFO - stdout - {'loss': 0.3506, 'grad_norm': 1.6541240215301514, 'learning_rate': 1.5661717345669237e-06, 'epoch': 2.47} +2025-02-06 02:27:56 - ERROR - stderr - 82%|████████▏ | 18505/22434 [16:20:15<2:44:32, 2.51s/it] +2025-02-06 02:27:58 - ERROR - stderr - 82%|████████▏ | 18506/22434 [16:20:18<2:43:28, 2.50s/it] +2025-02-06 02:27:58 - ERROR - stderr - +2025-02-06 02:27:58 - ERROR - stderr - +2025-02-06 02:27:58 - INFO - stdout - {'loss': 0.3464, 'grad_norm': 1.3895009756088257, 'learning_rate': 1.5653960784387047e-06, 'epoch': 2.47} +2025-02-06 02:27:58 - ERROR - stderr - 82%|████████▏ | 18506/22434 [16:20:18<2:43:28, 2.50s/it] +2025-02-06 02:28:00 - ERROR - stderr - 82%|████████▏ | 18507/22434 [16:20:20<2:43:45, 2.50s/it] +2025-02-06 02:28:01 - ERROR - stderr - +2025-02-06 02:28:01 - ERROR - stderr - +2025-02-06 02:28:01 - INFO - stdout - {'loss': 0.3672, 'grad_norm': 1.621322512626648, 'learning_rate': 1.5646205981217288e-06, 'epoch': 2.47} +2025-02-06 02:28:01 - ERROR - stderr - 82%|████████▏ | 18507/22434 [16:20:20<2:43:45, 2.50s/it] +2025-02-06 02:28:03 - ERROR - stderr - 82%|████████▏ | 18508/22434 [16:20:23<2:42:04, 2.48s/it] +2025-02-06 02:28:03 - ERROR - stderr - +2025-02-06 02:28:03 - ERROR - stderr - +2025-02-06 02:28:03 - INFO - stdout - {'loss': 0.3669, 'grad_norm': 1.517978310585022, 'learning_rate': 
1.5638452936321702e-06, 'epoch': 2.47} +2025-02-06 02:28:03 - ERROR - stderr - 82%|████████▏ | 18508/22434 [16:20:23<2:42:04, 2.48s/it] +2025-02-06 02:28:05 - ERROR - stderr - 83%|████████▎ | 18509/22434 [16:20:25<2:42:06, 2.48s/it] +2025-02-06 02:28:05 - ERROR - stderr - +2025-02-06 02:28:05 - ERROR - stderr - +2025-02-06 02:28:05 - INFO - stdout - {'loss': 0.4053, 'grad_norm': 1.6285312175750732, 'learning_rate': 1.5630701649861802e-06, 'epoch': 2.48} +2025-02-06 02:28:05 - ERROR - stderr - 83%|████████▎ | 18509/22434 [16:20:25<2:42:06, 2.48s/it] +2025-02-06 02:28:08 - ERROR - stderr - 83%|████████▎ | 18510/22434 [16:20:28<2:42:18, 2.48s/it] +2025-02-06 02:28:08 - ERROR - stderr - +2025-02-06 02:28:08 - ERROR - stderr - +2025-02-06 02:28:08 - INFO - stdout - {'loss': 0.341, 'grad_norm': 1.3396122455596924, 'learning_rate': 1.562295212199918e-06, 'epoch': 2.48} +2025-02-06 02:28:08 - ERROR - stderr - 83%|████████▎ | 18510/22434 [16:20:28<2:42:18, 2.48s/it] +2025-02-06 02:28:10 - ERROR - stderr - 83%|████████▎ | 18511/22434 [16:20:30<2:43:44, 2.50s/it] +2025-02-06 02:28:10 - ERROR - stderr - +2025-02-06 02:28:10 - ERROR - stderr - +2025-02-06 02:28:10 - INFO - stdout - {'loss': 0.3438, 'grad_norm': 1.4089446067810059, 'learning_rate': 1.561520435289543e-06, 'epoch': 2.48} +2025-02-06 02:28:10 - ERROR - stderr - 83%|████████▎ | 18511/22434 [16:20:30<2:43:44, 2.50s/it] +2025-02-06 02:28:13 - ERROR - stderr - 83%|████████▎ | 18512/22434 [16:20:33<2:42:54, 2.49s/it] +2025-02-06 02:28:13 - ERROR - stderr - +2025-02-06 02:28:13 - ERROR - stderr - +2025-02-06 02:28:13 - INFO - stdout - {'loss': 0.3863, 'grad_norm': 1.6344127655029297, 'learning_rate': 1.5607458342711968e-06, 'epoch': 2.48} +2025-02-06 02:28:13 - ERROR - stderr - 83%|████████▎ | 18512/22434 [16:20:33<2:42:54, 2.49s/it] +2025-02-06 02:28:15 - ERROR - stderr - 83%|████████▎ | 18513/22434 [16:20:35<2:41:27, 2.47s/it] +2025-02-06 02:28:15 - ERROR - stderr - +2025-02-06 02:28:15 - ERROR - stderr - +2025-02-06 
02:28:15 - INFO - stdout - {'loss': 0.3807, 'grad_norm': 1.6467463970184326, 'learning_rate': 1.5599714091610284e-06, 'epoch': 2.48} +2025-02-06 02:28:15 - ERROR - stderr - 83%|████████▎ | 18513/22434 [16:20:35<2:41:27, 2.47s/it] +2025-02-06 02:28:18 - ERROR - stderr - 83%|████████▎ | 18514/22434 [16:20:38<2:42:22, 2.49s/it] +2025-02-06 02:28:18 - ERROR - stderr - +2025-02-06 02:28:18 - ERROR - stderr - +2025-02-06 02:28:18 - INFO - stdout - {'loss': 0.4159, 'grad_norm': 1.7331737279891968, 'learning_rate': 1.55919715997518e-06, 'epoch': 2.48} +2025-02-06 02:28:18 - ERROR - stderr - 83%|████████▎ | 18514/22434 [16:20:38<2:42:22, 2.49s/it] +2025-02-06 02:28:20 - ERROR - stderr - 83%|████████▎ | 18515/22434 [16:20:40<2:42:20, 2.49s/it] +2025-02-06 02:28:20 - ERROR - stderr - +2025-02-06 02:28:20 - ERROR - stderr - +2025-02-06 02:28:20 - INFO - stdout - {'loss': 0.4259, 'grad_norm': 1.5829551219940186, 'learning_rate': 1.5584230867297888e-06, 'epoch': 2.48} +2025-02-06 02:28:20 - ERROR - stderr - 83%|████████▎ | 18515/22434 [16:20:40<2:42:20, 2.49s/it] +2025-02-06 02:28:23 - ERROR - stderr - 83%|████████▎ | 18516/22434 [16:20:43<2:42:58, 2.50s/it] +2025-02-06 02:28:23 - ERROR - stderr - +2025-02-06 02:28:23 - ERROR - stderr - +2025-02-06 02:28:23 - INFO - stdout - {'loss': 0.3522, 'grad_norm': 1.6567975282669067, 'learning_rate': 1.5576491894409918e-06, 'epoch': 2.48} +2025-02-06 02:28:23 - ERROR - stderr - 83%|████████▎ | 18516/22434 [16:20:43<2:42:58, 2.50s/it] +2025-02-06 02:28:25 - ERROR - stderr - 83%|████████▎ | 18517/22434 [16:20:45<2:42:39, 2.49s/it] +2025-02-06 02:28:25 - ERROR - stderr - +2025-02-06 02:28:25 - ERROR - stderr - +2025-02-06 02:28:25 - INFO - stdout - {'loss': 0.3471, 'grad_norm': 1.5065925121307373, 'learning_rate': 1.5568754681249188e-06, 'epoch': 2.48} +2025-02-06 02:28:25 - ERROR - stderr - 83%|████████▎ | 18517/22434 [16:20:45<2:42:39, 2.49s/it] +2025-02-06 02:28:28 - ERROR - stderr - 83%|████████▎ | 18518/22434 [16:20:48<2:43:50, 
2.51s/it] +2025-02-06 02:28:28 - ERROR - stderr - +2025-02-06 02:28:28 - ERROR - stderr - +2025-02-06 02:28:28 - INFO - stdout - {'loss': 0.3495, 'grad_norm': 1.5182218551635742, 'learning_rate': 1.556101922797697e-06, 'epoch': 2.48} +2025-02-06 02:28:28 - ERROR - stderr - 83%|████████▎ | 18518/22434 [16:20:48<2:43:50, 2.51s/it] +2025-02-06 02:28:30 - ERROR - stderr - 83%|████████▎ | 18519/22434 [16:20:50<2:44:58, 2.53s/it] +2025-02-06 02:28:30 - ERROR - stderr - +2025-02-06 02:28:30 - ERROR - stderr - +2025-02-06 02:28:30 - INFO - stdout - {'loss': 0.334, 'grad_norm': 1.5013128519058228, 'learning_rate': 1.5553285534754503e-06, 'epoch': 2.48} +2025-02-06 02:28:30 - ERROR - stderr - 83%|████████▎ | 18519/22434 [16:20:50<2:44:58, 2.53s/it] +2025-02-06 02:28:33 - ERROR - stderr - 83%|████████▎ | 18520/22434 [16:20:53<2:45:07, 2.53s/it] +2025-02-06 02:28:33 - ERROR - stderr - +2025-02-06 02:28:33 - ERROR - stderr - +2025-02-06 02:28:33 - INFO - stdout - {'loss': 0.396, 'grad_norm': 1.7551946640014648, 'learning_rate': 1.5545553601743024e-06, 'epoch': 2.48} +2025-02-06 02:28:33 - ERROR - stderr - 83%|████████▎ | 18520/22434 [16:20:53<2:45:07, 2.53s/it] +2025-02-06 02:28:36 - ERROR - stderr - 83%|████████▎ | 18521/22434 [16:20:55<2:45:10, 2.53s/it] +2025-02-06 02:28:36 - ERROR - stderr - +2025-02-06 02:28:36 - ERROR - stderr - +2025-02-06 02:28:36 - INFO - stdout - {'loss': 0.3372, 'grad_norm': 1.5106297731399536, 'learning_rate': 1.5537823429103615e-06, 'epoch': 2.48} +2025-02-06 02:28:36 - ERROR - stderr - 83%|████████▎ | 18521/22434 [16:20:55<2:45:10, 2.53s/it] +2025-02-06 02:28:38 - ERROR - stderr - 83%|████████▎ | 18522/22434 [16:20:58<2:45:09, 2.53s/it] +2025-02-06 02:28:38 - ERROR - stderr - +2025-02-06 02:28:38 - ERROR - stderr - +2025-02-06 02:28:38 - INFO - stdout - {'loss': 0.36, 'grad_norm': 1.5935688018798828, 'learning_rate': 1.5530095016997482e-06, 'epoch': 2.48} +2025-02-06 02:28:38 - ERROR - stderr - 83%|████████▎ | 18522/22434 [16:20:58<2:45:09, 
2.53s/it] +2025-02-06 02:28:41 - ERROR - stderr - 83%|████████▎ | 18523/22434 [16:21:00<2:45:41, 2.54s/it] +2025-02-06 02:28:41 - ERROR - stderr - +2025-02-06 02:28:41 - ERROR - stderr - +2025-02-06 02:28:41 - INFO - stdout - {'loss': 0.376, 'grad_norm': 1.5946675539016724, 'learning_rate': 1.5522368365585695e-06, 'epoch': 2.48} +2025-02-06 02:28:41 - ERROR - stderr - 83%|████████▎ | 18523/22434 [16:21:00<2:45:41, 2.54s/it] +2025-02-06 02:28:43 - ERROR - stderr - 83%|████████▎ | 18524/22434 [16:21:03<2:44:05, 2.52s/it] +2025-02-06 02:28:43 - ERROR - stderr - +2025-02-06 02:28:43 - ERROR - stderr - +2025-02-06 02:28:43 - INFO - stdout - {'loss': 0.3567, 'grad_norm': 1.4072651863098145, 'learning_rate': 1.551464347502929e-06, 'epoch': 2.48} +2025-02-06 02:28:43 - ERROR - stderr - 83%|████████▎ | 18524/22434 [16:21:03<2:44:05, 2.52s/it] +2025-02-06 02:28:46 - ERROR - stderr - 83%|████████▎ | 18525/22434 [16:21:05<2:43:05, 2.50s/it] +2025-02-06 02:28:46 - ERROR - stderr - +2025-02-06 02:28:46 - ERROR - stderr - +2025-02-06 02:28:46 - INFO - stdout - {'loss': 0.3429, 'grad_norm': 1.4174827337265015, 'learning_rate': 1.550692034548933e-06, 'epoch': 2.48} +2025-02-06 02:28:46 - ERROR - stderr - 83%|████████▎ | 18525/22434 [16:21:05<2:43:05, 2.50s/it] +2025-02-06 02:28:48 - ERROR - stderr - 83%|████████▎ | 18526/22434 [16:21:08<2:43:45, 2.51s/it] +2025-02-06 02:28:48 - ERROR - stderr - +2025-02-06 02:28:48 - ERROR - stderr - +2025-02-06 02:28:48 - INFO - stdout - {'loss': 0.3595, 'grad_norm': 1.7302743196487427, 'learning_rate': 1.5499198977126718e-06, 'epoch': 2.48} +2025-02-06 02:28:48 - ERROR - stderr - 83%|████████▎ | 18526/22434 [16:21:08<2:43:45, 2.51s/it] +2025-02-06 02:28:51 - ERROR - stderr - 83%|████████▎ | 18527/22434 [16:21:10<2:44:10, 2.52s/it] +2025-02-06 02:28:51 - ERROR - stderr - +2025-02-06 02:28:51 - ERROR - stderr - +2025-02-06 02:28:51 - INFO - stdout - {'loss': 0.3543, 'grad_norm': 1.5431721210479736, 'learning_rate': 1.549147937010248e-06, 'epoch': 
2.48} +2025-02-06 02:28:51 - ERROR - stderr - 83%|████████▎ | 18527/22434 [16:21:10<2:44:10, 2.52s/it] +2025-02-06 02:28:53 - ERROR - stderr - 83%|████████▎ | 18528/22434 [16:21:13<2:42:57, 2.50s/it] +2025-02-06 02:28:53 - ERROR - stderr - +2025-02-06 02:28:53 - ERROR - stderr - +2025-02-06 02:28:53 - INFO - stdout - {'loss': 0.348, 'grad_norm': 1.4230859279632568, 'learning_rate': 1.5483761524577457e-06, 'epoch': 2.48} +2025-02-06 02:28:53 - ERROR - stderr - 83%|████████▎ | 18528/22434 [16:21:13<2:42:57, 2.50s/it] +2025-02-06 02:28:56 - ERROR - stderr - 83%|████████▎ | 18529/22434 [16:21:15<2:43:35, 2.51s/it] +2025-02-06 02:28:56 - ERROR - stderr - +2025-02-06 02:28:56 - ERROR - stderr - +2025-02-06 02:28:56 - INFO - stdout - {'loss': 0.3457, 'grad_norm': 1.4824645519256592, 'learning_rate': 1.5476045440712573e-06, 'epoch': 2.48} +2025-02-06 02:28:56 - ERROR - stderr - 83%|████████▎ | 18529/22434 [16:21:15<2:43:35, 2.51s/it] +2025-02-06 02:28:58 - ERROR - stderr - 83%|████████▎ | 18530/22434 [16:21:18<2:44:11, 2.52s/it] +2025-02-06 02:28:58 - ERROR - stderr - +2025-02-06 02:28:58 - ERROR - stderr - +2025-02-06 02:28:58 - INFO - stdout - {'loss': 0.3363, 'grad_norm': 1.558699607849121, 'learning_rate': 1.5468331118668655e-06, 'epoch': 2.48} +2025-02-06 02:28:58 - ERROR - stderr - 83%|████████▎ | 18530/22434 [16:21:18<2:44:11, 2.52s/it] +2025-02-06 02:29:01 - ERROR - stderr - 83%|████████▎ | 18531/22434 [16:21:21<2:45:18, 2.54s/it] +2025-02-06 02:29:01 - ERROR - stderr - +2025-02-06 02:29:01 - ERROR - stderr - +2025-02-06 02:29:01 - INFO - stdout - {'loss': 0.3723, 'grad_norm': 1.4668254852294922, 'learning_rate': 1.5460618558606445e-06, 'epoch': 2.48} +2025-02-06 02:29:01 - ERROR - stderr - 83%|████████▎ | 18531/22434 [16:21:21<2:45:18, 2.54s/it] +2025-02-06 02:29:03 - ERROR - stderr - 83%|████████▎ | 18532/22434 [16:21:23<2:44:44, 2.53s/it] +2025-02-06 02:29:03 - ERROR - stderr - +2025-02-06 02:29:03 - ERROR - stderr - +2025-02-06 02:29:03 - INFO - stdout - 
{'loss': 0.3809, 'grad_norm': 1.4193419218063354, 'learning_rate': 1.5452907760686798e-06, 'epoch': 2.48} +2025-02-06 02:29:03 - ERROR - stderr - 83%|████████▎ | 18532/22434 [16:21:23<2:44:44, 2.53s/it] +2025-02-06 02:29:06 - ERROR - stderr - 83%|████████▎ | 18533/22434 [16:21:26<2:45:40, 2.55s/it] +2025-02-06 02:29:06 - ERROR - stderr - +2025-02-06 02:29:06 - ERROR - stderr - +2025-02-06 02:29:06 - INFO - stdout - {'loss': 0.354, 'grad_norm': 1.5371713638305664, 'learning_rate': 1.5445198725070355e-06, 'epoch': 2.48} +2025-02-06 02:29:06 - ERROR - stderr - 83%|████████▎ | 18533/22434 [16:21:26<2:45:40, 2.55s/it] +2025-02-06 02:29:08 - ERROR - stderr - 83%|████████▎ | 18534/22434 [16:21:28<2:43:41, 2.52s/it] +2025-02-06 02:29:08 - ERROR - stderr - +2025-02-06 02:29:08 - ERROR - stderr - +2025-02-06 02:29:08 - INFO - stdout - {'loss': 0.3695, 'grad_norm': 1.4579757452011108, 'learning_rate': 1.5437491451917829e-06, 'epoch': 2.48} +2025-02-06 02:29:08 - ERROR - stderr - 83%|████████▎ | 18534/22434 [16:21:28<2:43:41, 2.52s/it] +2025-02-06 02:29:11 - ERROR - stderr - 83%|████████▎ | 18535/22434 [16:21:31<2:42:59, 2.51s/it] +2025-02-06 02:29:11 - ERROR - stderr - +2025-02-06 02:29:11 - ERROR - stderr - +2025-02-06 02:29:11 - INFO - stdout - {'loss': 0.3483, 'grad_norm': 1.4357386827468872, 'learning_rate': 1.5429785941389885e-06, 'epoch': 2.48} +2025-02-06 02:29:11 - ERROR - stderr - 83%|████████▎ | 18535/22434 [16:21:31<2:42:59, 2.51s/it] +2025-02-06 02:29:13 - ERROR - stderr - 83%|████████▎ | 18536/22434 [16:21:33<2:44:57, 2.54s/it] +2025-02-06 02:29:13 - ERROR - stderr - +2025-02-06 02:29:13 - ERROR - stderr - +2025-02-06 02:29:13 - INFO - stdout - {'loss': 0.3463, 'grad_norm': 1.4298735857009888, 'learning_rate': 1.5422082193647102e-06, 'epoch': 2.48} +2025-02-06 02:29:13 - ERROR - stderr - 83%|████████▎ | 18536/22434 [16:21:33<2:44:57, 2.54s/it] +2025-02-06 02:29:16 - ERROR - stderr - 83%|████████▎ | 18537/22434 [16:21:36<2:44:23, 2.53s/it] +2025-02-06 02:29:16 - 
ERROR - stderr - +2025-02-06 02:29:16 - ERROR - stderr - +2025-02-06 02:29:16 - INFO - stdout - {'loss': 0.3537, 'grad_norm': 1.4432660341262817, 'learning_rate': 1.5414380208850133e-06, 'epoch': 2.48} +2025-02-06 02:29:16 - ERROR - stderr - 83%|████████▎ | 18537/22434 [16:21:36<2:44:23, 2.53s/it] +2025-02-06 02:29:18 - ERROR - stderr - 83%|████████▎ | 18538/22434 [16:21:38<2:44:30, 2.53s/it] +2025-02-06 02:29:18 - ERROR - stderr - +2025-02-06 02:29:18 - ERROR - stderr - +2025-02-06 02:29:18 - INFO - stdout - {'loss': 0.4375, 'grad_norm': 1.7889536619186401, 'learning_rate': 1.5406679987159445e-06, 'epoch': 2.48} +2025-02-06 02:29:18 - ERROR - stderr - 83%|████████▎ | 18538/22434 [16:21:38<2:44:30, 2.53s/it] +2025-02-06 02:29:21 - ERROR - stderr - 83%|████████▎ | 18539/22434 [16:21:41<2:43:50, 2.52s/it] +2025-02-06 02:29:21 - ERROR - stderr - +2025-02-06 02:29:21 - ERROR - stderr - +2025-02-06 02:29:21 - INFO - stdout - {'loss': 0.3898, 'grad_norm': 1.532205581665039, 'learning_rate': 1.5398981528735569e-06, 'epoch': 2.48} +2025-02-06 02:29:21 - ERROR - stderr - 83%|████████▎ | 18539/22434 [16:21:41<2:43:50, 2.52s/it] +2025-02-06 02:29:23 - ERROR - stderr - 83%|████████▎ | 18540/22434 [16:21:43<2:42:27, 2.50s/it] +2025-02-06 02:29:23 - ERROR - stderr - +2025-02-06 02:29:23 - ERROR - stderr - +2025-02-06 02:29:23 - INFO - stdout - {'loss': 0.3664, 'grad_norm': 1.5441720485687256, 'learning_rate': 1.5391284833738961e-06, 'epoch': 2.48} +2025-02-06 02:29:23 - ERROR - stderr - 83%|████████▎ | 18540/22434 [16:21:43<2:42:27, 2.50s/it] +2025-02-06 02:29:26 - ERROR - stderr - 83%|████████▎ | 18541/22434 [16:21:46<2:42:20, 2.50s/it] +2025-02-06 02:29:26 - ERROR - stderr - +2025-02-06 02:29:26 - ERROR - stderr - +2025-02-06 02:29:26 - INFO - stdout - {'loss': 0.369, 'grad_norm': 1.5795656442642212, 'learning_rate': 1.5383589902330065e-06, 'epoch': 2.48} +2025-02-06 02:29:26 - ERROR - stderr - 83%|████████▎ | 18541/22434 [16:21:46<2:42:20, 2.50s/it] +2025-02-06 02:29:28 - 
ERROR - stderr - 83%|████████▎ | 18542/22434 [16:21:48<2:42:32, 2.51s/it] +2025-02-06 02:29:28 - ERROR - stderr - +2025-02-06 02:29:28 - ERROR - stderr - +2025-02-06 02:29:28 - INFO - stdout - {'loss': 0.3778, 'grad_norm': 1.5452322959899902, 'learning_rate': 1.5375896734669271e-06, 'epoch': 2.48} +2025-02-06 02:29:28 - ERROR - stderr - 83%|████████▎ | 18542/22434 [16:21:48<2:42:32, 2.51s/it] +2025-02-06 02:29:31 - ERROR - stderr - 83%|████████▎ | 18543/22434 [16:21:51<2:42:02, 2.50s/it] +2025-02-06 02:29:31 - ERROR - stderr - +2025-02-06 02:29:31 - ERROR - stderr - +2025-02-06 02:29:31 - INFO - stdout - {'loss': 0.3739, 'grad_norm': 1.4723248481750488, 'learning_rate': 1.5368205330916918e-06, 'epoch': 2.48} +2025-02-06 02:29:31 - ERROR - stderr - 83%|████████▎ | 18543/22434 [16:21:51<2:42:02, 2.50s/it] +2025-02-06 02:29:33 - ERROR - stderr - 83%|████████▎ | 18544/22434 [16:21:53<2:41:40, 2.49s/it] +2025-02-06 02:29:33 - ERROR - stderr - +2025-02-06 02:29:33 - ERROR - stderr - +2025-02-06 02:29:33 - INFO - stdout - {'loss': 0.3441, 'grad_norm': 1.5698682069778442, 'learning_rate': 1.5360515691233358e-06, 'epoch': 2.48} +2025-02-06 02:29:33 - ERROR - stderr - 83%|████████▎ | 18544/22434 [16:21:53<2:41:40, 2.49s/it] +2025-02-06 02:29:36 - ERROR - stderr - 83%|████████▎ | 18545/22434 [16:21:56<2:44:06, 2.53s/it] +2025-02-06 02:29:36 - ERROR - stderr - +2025-02-06 02:29:36 - ERROR - stderr - +2025-02-06 02:29:36 - INFO - stdout - {'loss': 0.3543, 'grad_norm': 1.5200496912002563, 'learning_rate': 1.5352827815778849e-06, 'epoch': 2.48} +2025-02-06 02:29:36 - ERROR - stderr - 83%|████████▎ | 18545/22434 [16:21:56<2:44:06, 2.53s/it] +2025-02-06 02:29:39 - ERROR - stderr - 83%|████████▎ | 18546/22434 [16:21:58<2:43:49, 2.53s/it] +2025-02-06 02:29:39 - ERROR - stderr - +2025-02-06 02:29:39 - ERROR - stderr - +2025-02-06 02:29:39 - INFO - stdout - {'loss': 0.3523, 'grad_norm': 1.5363209247589111, 'learning_rate': 1.5345141704713673e-06, 'epoch': 2.48} +2025-02-06 02:29:39 - 
ERROR - stderr - 83%|████████▎ | 18546/22434 [16:21:58<2:43:49, 2.53s/it] +2025-02-06 02:29:41 - ERROR - stderr - 83%|████████▎ | 18547/22434 [16:22:01<2:44:50, 2.54s/it] +2025-02-06 02:29:41 - ERROR - stderr - +2025-02-06 02:29:41 - ERROR - stderr - +2025-02-06 02:29:41 - INFO - stdout - {'loss': 0.3707, 'grad_norm': 1.6510262489318848, 'learning_rate': 1.533745735819796e-06, 'epoch': 2.48} +2025-02-06 02:29:41 - ERROR - stderr - 83%|████████▎ | 18547/22434 [16:22:01<2:44:50, 2.54s/it] +2025-02-06 02:29:44 - ERROR - stderr - 83%|████████▎ | 18548/22434 [16:22:03<2:43:07, 2.52s/it] +2025-02-06 02:29:44 - ERROR - stderr - +2025-02-06 02:29:44 - ERROR - stderr - +2025-02-06 02:29:44 - INFO - stdout - {'loss': 0.3788, 'grad_norm': 1.5178200006484985, 'learning_rate': 1.532977477639196e-06, 'epoch': 2.48} +2025-02-06 02:29:44 - ERROR - stderr - 83%|████████▎ | 18548/22434 [16:22:03<2:43:07, 2.52s/it] +2025-02-06 02:29:46 - ERROR - stderr - 83%|████████▎ | 18549/22434 [16:22:06<2:41:58, 2.50s/it] +2025-02-06 02:29:46 - ERROR - stderr - +2025-02-06 02:29:46 - ERROR - stderr - +2025-02-06 02:29:46 - INFO - stdout - {'loss': 0.3308, 'grad_norm': 1.3785372972488403, 'learning_rate': 1.5322093959455808e-06, 'epoch': 2.48} +2025-02-06 02:29:46 - ERROR - stderr - 83%|████████▎ | 18549/22434 [16:22:06<2:41:58, 2.50s/it] +2025-02-06 02:29:49 - ERROR - stderr - 83%|████████▎ | 18550/22434 [16:22:08<2:41:46, 2.50s/it] +2025-02-06 02:29:49 - ERROR - stderr - +2025-02-06 02:29:49 - ERROR - stderr - +2025-02-06 02:29:49 - INFO - stdout - {'loss': 0.3528, 'grad_norm': 1.5331741571426392, 'learning_rate': 1.5314414907549535e-06, 'epoch': 2.48} +2025-02-06 02:29:49 - ERROR - stderr - 83%|████████▎ | 18550/22434 [16:22:08<2:41:46, 2.50s/it] +2025-02-06 02:29:51 - ERROR - stderr - 83%|████████▎ | 18551/22434 [16:22:11<2:42:39, 2.51s/it] +2025-02-06 02:29:51 - ERROR - stderr - +2025-02-06 02:29:51 - ERROR - stderr - +2025-02-06 02:29:51 - INFO - stdout - {'loss': 0.3856, 'grad_norm': 
1.4459644556045532, 'learning_rate': 1.530673762083329e-06, 'epoch': 2.48} +2025-02-06 02:29:51 - ERROR - stderr - 83%|████████▎ | 18551/22434 [16:22:11<2:42:39, 2.51s/it] +2025-02-06 02:29:54 - ERROR - stderr - 83%|████████▎ | 18552/22434 [16:22:13<2:42:20, 2.51s/it] +2025-02-06 02:29:54 - ERROR - stderr - +2025-02-06 02:29:54 - ERROR - stderr - +2025-02-06 02:29:54 - INFO - stdout - {'loss': 0.333, 'grad_norm': 1.4166239500045776, 'learning_rate': 1.5299062099467011e-06, 'epoch': 2.48} +2025-02-06 02:29:54 - ERROR - stderr - 83%|████████▎ | 18552/22434 [16:22:13<2:42:20, 2.51s/it] +2025-02-06 02:29:56 - ERROR - stderr - 83%|████████▎ | 18553/22434 [16:22:16<2:42:26, 2.51s/it] +2025-02-06 02:29:56 - ERROR - stderr - +2025-02-06 02:29:56 - ERROR - stderr - +2025-02-06 02:29:56 - INFO - stdout - {'loss': 0.3979, 'grad_norm': 1.6284527778625488, 'learning_rate': 1.529138834361079e-06, 'epoch': 2.48} +2025-02-06 02:29:56 - ERROR - stderr - 83%|████████▎ | 18553/22434 [16:22:16<2:42:26, 2.51s/it] +2025-02-06 02:29:59 - ERROR - stderr - 83%|████████▎ | 18554/22434 [16:22:19<2:51:04, 2.65s/it] +2025-02-06 02:29:59 - ERROR - stderr - +2025-02-06 02:29:59 - ERROR - stderr - +2025-02-06 02:29:59 - INFO - stdout - {'loss': 0.333, 'grad_norm': 1.464064121246338, 'learning_rate': 1.5283716353424482e-06, 'epoch': 2.48} +2025-02-06 02:29:59 - ERROR - stderr - 83%|████████▎ | 18554/22434 [16:22:19<2:51:04, 2.65s/it] +2025-02-06 02:30:02 - ERROR - stderr - 83%|████████▎ | 18555/22434 [16:22:21<2:48:39, 2.61s/it] +2025-02-06 02:30:02 - ERROR - stderr - +2025-02-06 02:30:02 - ERROR - stderr - +2025-02-06 02:30:02 - INFO - stdout - {'loss': 0.3646, 'grad_norm': 1.6437480449676514, 'learning_rate': 1.5276046129068034e-06, 'epoch': 2.48} +2025-02-06 02:30:02 - ERROR - stderr - 83%|████████▎ | 18555/22434 [16:22:21<2:48:39, 2.61s/it] +2025-02-06 02:30:04 - ERROR - stderr - 83%|████████▎ | 18556/22434 [16:22:24<2:45:12, 2.56s/it] +2025-02-06 02:30:04 - ERROR - stderr - +2025-02-06 
02:30:04 - ERROR - stderr - +2025-02-06 02:30:04 - INFO - stdout - {'loss': 0.3301, 'grad_norm': 1.4831856489181519, 'learning_rate': 1.5268377670701363e-06, 'epoch': 2.48} +2025-02-06 02:30:04 - ERROR - stderr - 83%|████████▎ | 18556/22434 [16:22:24<2:45:12, 2.56s/it] +2025-02-06 02:30:06 - ERROR - stderr - 83%|████████▎ | 18557/22434 [16:22:26<2:43:49, 2.54s/it] +2025-02-06 02:30:07 - ERROR - stderr - +2025-02-06 02:30:07 - ERROR - stderr - +2025-02-06 02:30:07 - INFO - stdout - {'loss': 0.3829, 'grad_norm': 1.6348040103912354, 'learning_rate': 1.5260710978484271e-06, 'epoch': 2.48} +2025-02-06 02:30:07 - ERROR - stderr - 83%|████████▎ | 18557/22434 [16:22:26<2:43:49, 2.54s/it] +2025-02-06 02:30:09 - ERROR - stderr - 83%|████████▎ | 18558/22434 [16:22:29<2:42:53, 2.52s/it] +2025-02-06 02:30:09 - ERROR - stderr - +2025-02-06 02:30:09 - ERROR - stderr - +2025-02-06 02:30:09 - INFO - stdout - {'loss': 0.3984, 'grad_norm': 1.657033085823059, 'learning_rate': 1.5253046052576559e-06, 'epoch': 2.48} +2025-02-06 02:30:09 - ERROR - stderr - 83%|████████▎ | 18558/22434 [16:22:29<2:42:53, 2.52s/it] +2025-02-06 02:30:11 - ERROR - stderr - 83%|████████▎ | 18559/22434 [16:22:31<2:43:14, 2.53s/it] +2025-02-06 02:30:12 - ERROR - stderr - +2025-02-06 02:30:12 - ERROR - stderr - +2025-02-06 02:30:12 - INFO - stdout - {'loss': 0.3968, 'grad_norm': 1.5436540842056274, 'learning_rate': 1.5245382893138016e-06, 'epoch': 2.48} +2025-02-06 02:30:12 - ERROR - stderr - 83%|████████▎ | 18559/22434 [16:22:31<2:43:14, 2.53s/it] +2025-02-06 02:30:14 - ERROR - stderr - 83%|████████▎ | 18560/22434 [16:22:34<2:42:46, 2.52s/it] +2025-02-06 02:30:14 - ERROR - stderr - +2025-02-06 02:30:14 - ERROR - stderr - +2025-02-06 02:30:14 - INFO - stdout - {'loss': 0.3483, 'grad_norm': 1.4588854312896729, 'learning_rate': 1.5237721500328373e-06, 'epoch': 2.48} +2025-02-06 02:30:14 - ERROR - stderr - 83%|████████▎ | 18560/22434 [16:22:34<2:42:46, 2.52s/it] +2025-02-06 02:30:17 - ERROR - stderr - 83%|████████▎ 
| 18561/22434 [16:22:36<2:42:55, 2.52s/it] +2025-02-06 02:30:17 - ERROR - stderr - +2025-02-06 02:30:17 - ERROR - stderr - +2025-02-06 02:30:17 - INFO - stdout - {'loss': 0.3344, 'grad_norm': 1.589267373085022, 'learning_rate': 1.52300618743073e-06, 'epoch': 2.48} +2025-02-06 02:30:17 - ERROR - stderr - 83%|████████▎ | 18561/22434 [16:22:36<2:42:55, 2.52s/it] +2025-02-06 02:30:19 - ERROR - stderr - 83%|████████▎ | 18562/22434 [16:22:39<2:43:42, 2.54s/it] +2025-02-06 02:30:19 - ERROR - stderr - +2025-02-06 02:30:19 - ERROR - stderr - +2025-02-06 02:30:19 - INFO - stdout - {'loss': 0.378, 'grad_norm': 1.5676326751708984, 'learning_rate': 1.5222404015234483e-06, 'epoch': 2.48} +2025-02-06 02:30:19 - ERROR - stderr - 83%|████████▎ | 18562/22434 [16:22:39<2:43:42, 2.54s/it] +2025-02-06 02:30:22 - ERROR - stderr - 83%|████████▎ | 18563/22434 [16:22:41<2:43:30, 2.53s/it] +2025-02-06 02:30:22 - ERROR - stderr - +2025-02-06 02:30:22 - ERROR - stderr - +2025-02-06 02:30:22 - INFO - stdout - {'loss': 0.3787, 'grad_norm': 1.538878321647644, 'learning_rate': 1.5214747923269524e-06, 'epoch': 2.48} +2025-02-06 02:30:22 - ERROR - stderr - 83%|████████▎ | 18563/22434 [16:22:41<2:43:30, 2.53s/it] +2025-02-06 02:30:24 - ERROR - stderr - 83%|████████▎ | 18564/22434 [16:22:44<2:42:50, 2.52s/it] +2025-02-06 02:30:24 - ERROR - stderr - +2025-02-06 02:30:24 - ERROR - stderr - +2025-02-06 02:30:24 - INFO - stdout - {'loss': 0.3726, 'grad_norm': 1.4866918325424194, 'learning_rate': 1.520709359857202e-06, 'epoch': 2.48} +2025-02-06 02:30:24 - ERROR - stderr - 83%|████████▎ | 18564/22434 [16:22:44<2:42:50, 2.52s/it] +2025-02-06 02:30:27 - ERROR - stderr - 83%|████████▎ | 18565/22434 [16:22:47<2:49:37, 2.63s/it] +2025-02-06 02:30:27 - ERROR - stderr - +2025-02-06 02:30:27 - ERROR - stderr - +2025-02-06 02:30:27 - INFO - stdout - {'loss': 0.3653, 'grad_norm': 1.5924081802368164, 'learning_rate': 1.5199441041301533e-06, 'epoch': 2.48} +2025-02-06 02:30:27 - ERROR - stderr - 83%|████████▎ | 
18565/22434 [16:22:47<2:49:37, 2.63s/it] +2025-02-06 02:30:30 - ERROR - stderr - 83%|████████▎ | 18566/22434 [16:22:49<2:49:41, 2.63s/it] +2025-02-06 02:30:30 - ERROR - stderr - +2025-02-06 02:30:30 - ERROR - stderr - +2025-02-06 02:30:30 - INFO - stdout - {'loss': 0.4083, 'grad_norm': 1.649298906326294, 'learning_rate': 1.5191790251617499e-06, 'epoch': 2.48} +2025-02-06 02:30:30 - ERROR - stderr - 83%|████████▎ | 18566/22434 [16:22:49<2:49:41, 2.63s/it] +2025-02-06 02:30:32 - ERROR - stderr - 83%|████████▎ | 18567/22434 [16:22:52<2:48:27, 2.61s/it] +2025-02-06 02:30:32 - ERROR - stderr - +2025-02-06 02:30:32 - ERROR - stderr - +2025-02-06 02:30:32 - INFO - stdout - {'loss': 0.3587, 'grad_norm': 1.558677077293396, 'learning_rate': 1.5184141229679472e-06, 'epoch': 2.48} +2025-02-06 02:30:32 - ERROR - stderr - 83%|████████▎ | 18567/22434 [16:22:52<2:48:27, 2.61s/it] +2025-02-06 02:30:35 - ERROR - stderr - 83%|████████▎ | 18568/22434 [16:22:55<2:47:16, 2.60s/it] +2025-02-06 02:30:35 - ERROR - stderr - +2025-02-06 02:30:35 - ERROR - stderr - +2025-02-06 02:30:35 - INFO - stdout - {'loss': 0.3242, 'grad_norm': 1.6864196062088013, 'learning_rate': 1.5176493975646866e-06, 'epoch': 2.48} +2025-02-06 02:30:35 - ERROR - stderr - 83%|████████▎ | 18568/22434 [16:22:55<2:47:16, 2.60s/it] +2025-02-06 02:30:37 - ERROR - stderr - 83%|████████▎ | 18569/22434 [16:22:57<2:45:46, 2.57s/it] +2025-02-06 02:30:37 - ERROR - stderr - +2025-02-06 02:30:37 - ERROR - stderr - +2025-02-06 02:30:37 - INFO - stdout - {'loss': 0.3179, 'grad_norm': 1.3753076791763306, 'learning_rate': 1.5168848489679066e-06, 'epoch': 2.48} +2025-02-06 02:30:37 - ERROR - stderr - 83%|████████▎ | 18569/22434 [16:22:57<2:45:46, 2.57s/it] +2025-02-06 02:30:40 - ERROR - stderr - 83%|████████▎ | 18570/22434 [16:22:59<2:42:57, 2.53s/it] +2025-02-06 02:30:40 - ERROR - stderr - +2025-02-06 02:30:40 - ERROR - stderr - +2025-02-06 02:30:40 - INFO - stdout - {'loss': 0.3625, 'grad_norm': 1.5665085315704346, 'learning_rate': 
1.516120477193548e-06, 'epoch': 2.48} +2025-02-06 02:30:40 - ERROR - stderr - 83%|████████▎ | 18570/22434 [16:23:00<2:42:57, 2.53s/it] +2025-02-06 02:30:42 - ERROR - stderr - 83%|████████▎ | 18571/22434 [16:23:02<2:42:56, 2.53s/it] +2025-02-06 02:30:42 - ERROR - stderr - +2025-02-06 02:30:42 - ERROR - stderr - +2025-02-06 02:30:42 - INFO - stdout - {'loss': 0.4308, 'grad_norm': 1.8492076396942139, 'learning_rate': 1.5153562822575352e-06, 'epoch': 2.48} +2025-02-06 02:30:42 - ERROR - stderr - 83%|████████▎ | 18571/22434 [16:23:02<2:42:56, 2.53s/it] +2025-02-06 02:30:45 - ERROR - stderr - 83%|████████▎ | 18572/22434 [16:23:05<2:44:41, 2.56s/it] +2025-02-06 02:30:45 - ERROR - stderr - +2025-02-06 02:30:45 - ERROR - stderr - +2025-02-06 02:30:45 - INFO - stdout - {'loss': 0.348, 'grad_norm': 1.6114310026168823, 'learning_rate': 1.5145922641758048e-06, 'epoch': 2.48} +2025-02-06 02:30:45 - ERROR - stderr - 83%|████████▎ | 18572/22434 [16:23:05<2:44:41, 2.56s/it] +2025-02-06 02:30:47 - ERROR - stderr - 83%|████████▎ | 18573/22434 [16:23:07<2:43:23, 2.54s/it] +2025-02-06 02:30:47 - ERROR - stderr - +2025-02-06 02:30:47 - ERROR - stderr - +2025-02-06 02:30:47 - INFO - stdout - {'loss': 0.3791, 'grad_norm': 1.6405280828475952, 'learning_rate': 1.5138284229642786e-06, 'epoch': 2.48} +2025-02-06 02:30:47 - ERROR - stderr - 83%|████████▎ | 18573/22434 [16:23:07<2:43:23, 2.54s/it] +2025-02-06 02:30:50 - ERROR - stderr - 83%|████████▎ | 18574/22434 [16:23:10<2:42:24, 2.52s/it] +2025-02-06 02:30:50 - ERROR - stderr - +2025-02-06 02:30:50 - ERROR - stderr - +2025-02-06 02:30:50 - INFO - stdout - {'loss': 0.3943, 'grad_norm': 1.6995488405227661, 'learning_rate': 1.5130647586388746e-06, 'epoch': 2.48} +2025-02-06 02:30:50 - ERROR - stderr - 83%|████████▎ | 18574/22434 [16:23:10<2:42:24, 2.52s/it] +2025-02-06 02:30:52 - ERROR - stderr - 83%|████████▎ | 18575/22434 [16:23:12<2:40:37, 2.50s/it] +2025-02-06 02:30:52 - ERROR - stderr - +2025-02-06 02:30:52 - ERROR - stderr - +2025-02-06 
02:30:52 - INFO - stdout - {'loss': 0.337, 'grad_norm': 1.5553573369979858, 'learning_rate': 1.5123012712155205e-06, 'epoch': 2.48} +2025-02-06 02:30:52 - ERROR - stderr - 83%|████████▎ | 18575/22434 [16:23:12<2:40:37, 2.50s/it] +2025-02-06 02:30:55 - ERROR - stderr - 83%|████████▎ | 18576/22434 [16:23:15<2:41:28, 2.51s/it] +2025-02-06 02:30:55 - ERROR - stderr - +2025-02-06 02:30:55 - ERROR - stderr - +2025-02-06 02:30:55 - INFO - stdout - {'loss': 0.3035, 'grad_norm': 1.5313228368759155, 'learning_rate': 1.5115379607101189e-06, 'epoch': 2.48} +2025-02-06 02:30:55 - ERROR - stderr - 83%|████████▎ | 18576/22434 [16:23:15<2:41:28, 2.51s/it] +2025-02-06 02:30:57 - ERROR - stderr - 83%|████████▎ | 18577/22434 [16:23:17<2:42:10, 2.52s/it] +2025-02-06 02:30:57 - ERROR - stderr - +2025-02-06 02:30:57 - ERROR - stderr - +2025-02-06 02:30:57 - INFO - stdout - {'loss': 0.4021, 'grad_norm': 1.6485062837600708, 'learning_rate': 1.5107748271385914e-06, 'epoch': 2.48} +2025-02-06 02:30:57 - ERROR - stderr - 83%|████████▎ | 18577/22434 [16:23:17<2:42:10, 2.52s/it] +2025-02-06 02:31:00 - ERROR - stderr - 83%|████████▎ | 18578/22434 [16:23:20<2:41:30, 2.51s/it] +2025-02-06 02:31:00 - ERROR - stderr - +2025-02-06 02:31:00 - ERROR - stderr - +2025-02-06 02:31:00 - INFO - stdout - {'loss': 0.3351, 'grad_norm': 1.4323893785476685, 'learning_rate': 1.5100118705168364e-06, 'epoch': 2.48} +2025-02-06 02:31:00 - ERROR - stderr - 83%|████████▎ | 18578/22434 [16:23:20<2:41:30, 2.51s/it] +2025-02-06 02:31:02 - ERROR - stderr - 83%|████████▎ | 18579/22434 [16:23:22<2:42:15, 2.53s/it] +2025-02-06 02:31:02 - ERROR - stderr - +2025-02-06 02:31:02 - ERROR - stderr - +2025-02-06 02:31:02 - INFO - stdout - {'loss': 0.3848, 'grad_norm': 1.518763542175293, 'learning_rate': 1.5092490908607605e-06, 'epoch': 2.48} +2025-02-06 02:31:02 - ERROR - stderr - 83%|████████▎ | 18579/22434 [16:23:22<2:42:15, 2.53s/it] +2025-02-06 02:31:05 - ERROR - stderr - 83%|████████▎ | 18580/22434 [16:23:25<2:41:53, 
2.52s/it] +2025-02-06 02:31:05 - ERROR - stderr - +2025-02-06 02:31:05 - ERROR - stderr - +2025-02-06 02:31:05 - INFO - stdout - {'loss': 0.3432, 'grad_norm': 1.476028323173523, 'learning_rate': 1.5084864881862627e-06, 'epoch': 2.48} +2025-02-06 02:31:05 - ERROR - stderr - 83%|████████▎ | 18580/22434 [16:23:25<2:41:53, 2.52s/it] +2025-02-06 02:31:07 - ERROR - stderr - 83%|████████▎ | 18581/22434 [16:23:27<2:41:00, 2.51s/it] +2025-02-06 02:31:07 - ERROR - stderr - +2025-02-06 02:31:07 - ERROR - stderr - +2025-02-06 02:31:07 - INFO - stdout - {'loss': 0.3494, 'grad_norm': 1.5826342105865479, 'learning_rate': 1.507724062509237e-06, 'epoch': 2.48} +2025-02-06 02:31:07 - ERROR - stderr - 83%|████████▎ | 18581/22434 [16:23:27<2:41:00, 2.51s/it] +2025-02-06 02:31:10 - ERROR - stderr - 83%|████████▎ | 18582/22434 [16:23:30<2:40:24, 2.50s/it] +2025-02-06 02:31:10 - ERROR - stderr - +2025-02-06 02:31:10 - ERROR - stderr - +2025-02-06 02:31:10 - INFO - stdout - {'loss': 0.3393, 'grad_norm': 1.4992865324020386, 'learning_rate': 1.5069618138455788e-06, 'epoch': 2.48} +2025-02-06 02:31:10 - ERROR - stderr - 83%|████████▎ | 18582/22434 [16:23:30<2:40:24, 2.50s/it] +2025-02-06 02:31:12 - ERROR - stderr - 83%|████████▎ | 18583/22434 [16:23:32<2:40:34, 2.50s/it] +2025-02-06 02:31:12 - ERROR - stderr - +2025-02-06 02:31:12 - ERROR - stderr - +2025-02-06 02:31:12 - INFO - stdout - {'loss': 0.3246, 'grad_norm': 1.5023773908615112, 'learning_rate': 1.506199742211174e-06, 'epoch': 2.49} +2025-02-06 02:31:12 - ERROR - stderr - 83%|████████▎ | 18583/22434 [16:23:32<2:40:34, 2.50s/it] +2025-02-06 02:31:15 - ERROR - stderr - 83%|████████▎ | 18584/22434 [16:23:35<2:40:06, 2.50s/it] +2025-02-06 02:31:15 - ERROR - stderr - +2025-02-06 02:31:15 - ERROR - stderr - +2025-02-06 02:31:15 - INFO - stdout - {'loss': 0.3577, 'grad_norm': 1.607790231704712, 'learning_rate': 1.5054378476219079e-06, 'epoch': 2.49} +2025-02-06 02:31:15 - ERROR - stderr - 83%|████████▎ | 18584/22434 [16:23:35<2:40:06, 
2.50s/it] +2025-02-06 02:31:17 - ERROR - stderr - 83%|████████▎ | 18585/22434 [16:23:37<2:39:03, 2.48s/it] +2025-02-06 02:31:17 - ERROR - stderr - +2025-02-06 02:31:17 - ERROR - stderr - +2025-02-06 02:31:17 - INFO - stdout - {'loss': 0.3638, 'grad_norm': 1.612294316291809, 'learning_rate': 1.5046761300936607e-06, 'epoch': 2.49} +2025-02-06 02:31:17 - ERROR - stderr - 83%|████████▎ | 18585/22434 [16:23:37<2:39:03, 2.48s/it] +2025-02-06 02:31:20 - ERROR - stderr - 83%|████████▎ | 18586/22434 [16:23:40<2:38:16, 2.47s/it] +2025-02-06 02:31:20 - ERROR - stderr - +2025-02-06 02:31:20 - ERROR - stderr - +2025-02-06 02:31:20 - INFO - stdout - {'loss': 0.3759, 'grad_norm': 1.4075566530227661, 'learning_rate': 1.5039145896423112e-06, 'epoch': 2.49} +2025-02-06 02:31:20 - ERROR - stderr - 83%|████████▎ | 18586/22434 [16:23:40<2:38:16, 2.47s/it] +2025-02-06 02:31:22 - ERROR - stderr - 83%|████████▎ | 18587/22434 [16:23:42<2:37:14, 2.45s/it] +2025-02-06 02:31:22 - ERROR - stderr - +2025-02-06 02:31:22 - ERROR - stderr - +2025-02-06 02:31:22 - INFO - stdout - {'loss': 0.3129, 'grad_norm': 1.3451197147369385, 'learning_rate': 1.5031532262837323e-06, 'epoch': 2.49} +2025-02-06 02:31:22 - ERROR - stderr - 83%|████████▎ | 18587/22434 [16:23:42<2:37:14, 2.45s/it] +2025-02-06 02:31:25 - ERROR - stderr - 83%|████████▎ | 18588/22434 [16:23:44<2:37:45, 2.46s/it] +2025-02-06 02:31:25 - ERROR - stderr - +2025-02-06 02:31:25 - ERROR - stderr - +2025-02-06 02:31:25 - INFO - stdout - {'loss': 0.3901, 'grad_norm': 1.7540115118026733, 'learning_rate': 1.5023920400337932e-06, 'epoch': 2.49} +2025-02-06 02:31:25 - ERROR - stderr - 83%|████████▎ | 18588/22434 [16:23:44<2:37:45, 2.46s/it] +2025-02-06 02:31:27 - ERROR - stderr - 83%|████████▎ | 18589/22434 [16:23:47<2:41:21, 2.52s/it] +2025-02-06 02:31:27 - ERROR - stderr - +2025-02-06 02:31:27 - ERROR - stderr - +2025-02-06 02:31:27 - INFO - stdout - {'loss': 0.3508, 'grad_norm': 1.434766411781311, 'learning_rate': 1.5016310309083637e-06, 'epoch': 
2.49} +2025-02-06 02:31:27 - ERROR - stderr - 83%|████████▎ | 18589/22434 [16:23:47<2:41:21, 2.52s/it] +2025-02-06 02:31:30 - ERROR - stderr - 83%|████████▎ | 18590/22434 [16:23:50<2:39:59, 2.50s/it] +2025-02-06 02:31:30 - ERROR - stderr - +2025-02-06 02:31:30 - ERROR - stderr - +2025-02-06 02:31:30 - INFO - stdout - {'loss': 0.3037, 'grad_norm': 1.323065161705017, 'learning_rate': 1.5008701989232977e-06, 'epoch': 2.49} +2025-02-06 02:31:30 - ERROR - stderr - 83%|████████▎ | 18590/22434 [16:23:50<2:39:59, 2.50s/it] +2025-02-06 02:31:33 - ERROR - stderr - 83%|████████▎ | 18591/22434 [16:23:52<2:45:00, 2.58s/it] +2025-02-06 02:31:33 - ERROR - stderr - +2025-02-06 02:31:33 - ERROR - stderr - +2025-02-06 02:31:33 - INFO - stdout - {'loss': 0.3482, 'grad_norm': 1.4208122491836548, 'learning_rate': 1.5001095440944657e-06, 'epoch': 2.49} +2025-02-06 02:31:33 - ERROR - stderr - 83%|████████▎ | 18591/22434 [16:23:52<2:45:00, 2.58s/it] +2025-02-06 02:31:35 - ERROR - stderr - 83%|████████▎ | 18592/22434 [16:23:55<2:41:55, 2.53s/it] +2025-02-06 02:31:35 - ERROR - stderr - +2025-02-06 02:31:35 - ERROR - stderr - +2025-02-06 02:31:35 - INFO - stdout - {'loss': 0.4004, 'grad_norm': 1.7600501775741577, 'learning_rate': 1.499349066437711e-06, 'epoch': 2.49} +2025-02-06 02:31:35 - ERROR - stderr - 83%|████████▎ | 18592/22434 [16:23:55<2:41:55, 2.53s/it] +2025-02-06 02:31:37 - ERROR - stderr - 83%|████████▎ | 18593/22434 [16:23:57<2:41:02, 2.52s/it] +2025-02-06 02:31:37 - ERROR - stderr - +2025-02-06 02:31:37 - ERROR - stderr - +2025-02-06 02:31:37 - INFO - stdout - {'loss': 0.3751, 'grad_norm': 1.603559970855713, 'learning_rate': 1.4985887659688936e-06, 'epoch': 2.49} +2025-02-06 02:31:37 - ERROR - stderr - 83%|████████▎ | 18593/22434 [16:23:57<2:41:02, 2.52s/it] +2025-02-06 02:31:40 - ERROR - stderr - 83%|████████▎ | 18594/22434 [16:24:00<2:40:04, 2.50s/it] +2025-02-06 02:31:40 - ERROR - stderr - +2025-02-06 02:31:40 - ERROR - stderr - +2025-02-06 02:31:40 - INFO - stdout - 
{'loss': 0.4054, 'grad_norm': 1.6010901927947998, 'learning_rate': 1.4978286427038602e-06, 'epoch': 2.49} +2025-02-06 02:31:40 - ERROR - stderr - 83%|████████▎ | 18594/22434 [16:24:00<2:40:04, 2.50s/it] +2025-02-06 02:31:42 - ERROR - stderr - 83%|████████▎ | 18595/22434 [16:24:02<2:40:12, 2.50s/it] +2025-02-06 02:31:42 - ERROR - stderr - +2025-02-06 02:31:42 - ERROR - stderr - +2025-02-06 02:31:42 - INFO - stdout - {'loss': 0.3701, 'grad_norm': 1.5291130542755127, 'learning_rate': 1.497068696658449e-06, 'epoch': 2.49} +2025-02-06 02:31:42 - ERROR - stderr - 83%|████████▎ | 18595/22434 [16:24:02<2:40:12, 2.50s/it] +2025-02-06 02:31:45 - ERROR - stderr - 83%|████████▎ | 18596/22434 [16:24:05<2:40:02, 2.50s/it] +2025-02-06 02:31:45 - ERROR - stderr - +2025-02-06 02:31:45 - ERROR - stderr - +2025-02-06 02:31:45 - INFO - stdout - {'loss': 0.3986, 'grad_norm': 1.6267318725585938, 'learning_rate': 1.4963089278485088e-06, 'epoch': 2.49} +2025-02-06 02:31:45 - ERROR - stderr - 83%|████████▎ | 18596/22434 [16:24:05<2:40:02, 2.50s/it] +2025-02-06 02:31:47 - ERROR - stderr - 83%|████████▎ | 18597/22434 [16:24:07<2:38:35, 2.48s/it] +2025-02-06 02:31:47 - ERROR - stderr - +2025-02-06 02:31:47 - ERROR - stderr - +2025-02-06 02:31:47 - INFO - stdout - {'loss': 0.3617, 'grad_norm': 1.6186786890029907, 'learning_rate': 1.4955493362898688e-06, 'epoch': 2.49} +2025-02-06 02:31:47 - ERROR - stderr - 83%|████████▎ | 18597/22434 [16:24:07<2:38:35, 2.48s/it] +2025-02-06 02:31:50 - ERROR - stderr - 83%|████████▎ | 18598/22434 [16:24:10<2:37:35, 2.46s/it] +2025-02-06 02:31:50 - ERROR - stderr - +2025-02-06 02:31:50 - ERROR - stderr - +2025-02-06 02:31:50 - INFO - stdout - {'loss': 0.3401, 'grad_norm': 1.341605305671692, 'learning_rate': 1.4947899219983664e-06, 'epoch': 2.49} +2025-02-06 02:31:50 - ERROR - stderr - 83%|████████▎ | 18598/22434 [16:24:10<2:37:35, 2.46s/it] +2025-02-06 02:31:52 - ERROR - stderr - 83%|████████▎ | 18599/22434 [16:24:12<2:37:53, 2.47s/it] +2025-02-06 02:31:52 - 
ERROR - stderr - +2025-02-06 02:31:52 - ERROR - stderr - +2025-02-06 02:31:52 - INFO - stdout - {'loss': 0.3781, 'grad_norm': 1.5712435245513916, 'learning_rate': 1.4940306849898289e-06, 'epoch': 2.49} +2025-02-06 02:31:52 - ERROR - stderr - 83%|████████▎ | 18599/22434 [16:24:12<2:37:53, 2.47s/it] +2025-02-06 02:31:55 - ERROR - stderr - 83%|████████▎ | 18600/22434 [16:24:14<2:38:06, 2.47s/it] +2025-02-06 02:31:55 - ERROR - stderr - +2025-02-06 02:31:55 - ERROR - stderr - +2025-02-06 02:31:55 - INFO - stdout - {'loss': 0.35, 'grad_norm': 1.6367145776748657, 'learning_rate': 1.4932716252800817e-06, 'epoch': 2.49} +2025-02-06 02:31:55 - ERROR - stderr - 83%|████████▎ | 18600/22434 [16:24:15<2:38:06, 2.47s/it] +2025-02-06 02:31:57 - ERROR - stderr - 83%|████████▎ | 18601/22434 [16:24:17<2:40:20, 2.51s/it] +2025-02-06 02:31:57 - ERROR - stderr - +2025-02-06 02:31:57 - ERROR - stderr - +2025-02-06 02:31:57 - INFO - stdout - {'loss': 0.3787, 'grad_norm': 1.6758514642715454, 'learning_rate': 1.4925127428849484e-06, 'epoch': 2.49} +2025-02-06 02:31:57 - ERROR - stderr - 83%|████████▎ | 18601/22434 [16:24:17<2:40:20, 2.51s/it] +2025-02-06 02:32:00 - ERROR - stderr - 83%|████████▎ | 18602/22434 [16:24:20<2:38:46, 2.49s/it] +2025-02-06 02:32:00 - ERROR - stderr - +2025-02-06 02:32:00 - ERROR - stderr - +2025-02-06 02:32:00 - INFO - stdout - {'loss': 0.3593, 'grad_norm': 1.500554084777832, 'learning_rate': 1.4917540378202456e-06, 'epoch': 2.49} +2025-02-06 02:32:00 - ERROR - stderr - 83%|████████▎ | 18602/22434 [16:24:20<2:38:46, 2.49s/it] +2025-02-06 02:32:02 - ERROR - stderr - 83%|████████▎ | 18603/22434 [16:24:22<2:38:57, 2.49s/it] +2025-02-06 02:32:02 - ERROR - stderr - +2025-02-06 02:32:02 - ERROR - stderr - +2025-02-06 02:32:02 - INFO - stdout - {'loss': 0.3504, 'grad_norm': 1.5667781829833984, 'learning_rate': 1.4909955101017882e-06, 'epoch': 2.49} +2025-02-06 02:32:02 - ERROR - stderr - 83%|████████▎ | 18603/22434 [16:24:22<2:38:57, 2.49s/it] +2025-02-06 02:32:05 - 
ERROR - stderr - 83%|████████▎ | 18604/22434 [16:24:24<2:37:50, 2.47s/it] +2025-02-06 02:32:05 - ERROR - stderr - +2025-02-06 02:32:05 - ERROR - stderr - +2025-02-06 02:32:05 - INFO - stdout - {'loss': 0.3921, 'grad_norm': 1.6292206048965454, 'learning_rate': 1.4902371597453879e-06, 'epoch': 2.49} +2025-02-06 02:32:05 - ERROR - stderr - 83%|████████▎ | 18604/22434 [16:24:25<2:37:50, 2.47s/it] +2025-02-06 02:32:07 - ERROR - stderr - 83%|████████▎ | 18605/22434 [16:24:27<2:38:13, 2.48s/it] +2025-02-06 02:32:07 - ERROR - stderr - +2025-02-06 02:32:07 - ERROR - stderr - +2025-02-06 02:32:07 - INFO - stdout - {'loss': 0.3387, 'grad_norm': 1.2281500101089478, 'learning_rate': 1.4894789867668502e-06, 'epoch': 2.49} +2025-02-06 02:32:07 - ERROR - stderr - 83%|████████▎ | 18605/22434 [16:24:27<2:38:13, 2.48s/it] +2025-02-06 02:32:10 - ERROR - stderr - 83%|████████▎ | 18606/22434 [16:24:29<2:38:12, 2.48s/it] +2025-02-06 02:32:10 - ERROR - stderr - +2025-02-06 02:32:10 - ERROR - stderr - +2025-02-06 02:32:10 - INFO - stdout - {'loss': 0.3833, 'grad_norm': 1.7816895246505737, 'learning_rate': 1.48872099118198e-06, 'epoch': 2.49} +2025-02-06 02:32:10 - ERROR - stderr - 83%|████████▎ | 18606/22434 [16:24:29<2:38:12, 2.48s/it] +2025-02-06 02:32:12 - ERROR - stderr - 83%|████████▎ | 18607/22434 [16:24:32<2:40:12, 2.51s/it] +2025-02-06 02:32:12 - ERROR - stderr - +2025-02-06 02:32:12 - ERROR - stderr - +2025-02-06 02:32:12 - INFO - stdout - {'loss': 0.3108, 'grad_norm': 1.3239103555679321, 'learning_rate': 1.487963173006577e-06, 'epoch': 2.49} +2025-02-06 02:32:12 - ERROR - stderr - 83%|████████▎ | 18607/22434 [16:24:32<2:40:12, 2.51s/it] +2025-02-06 02:32:15 - ERROR - stderr - 83%|████████▎ | 18608/22434 [16:24:34<2:39:13, 2.50s/it] +2025-02-06 02:32:15 - ERROR - stderr - +2025-02-06 02:32:15 - ERROR - stderr - +2025-02-06 02:32:15 - INFO - stdout - {'loss': 0.3645, 'grad_norm': 1.4950788021087646, 'learning_rate': 1.4872055322564349e-06, 'epoch': 2.49} +2025-02-06 02:32:15 - 
ERROR - stderr - 83%|████████▎ | 18608/22434 [16:24:35<2:39:13, 2.50s/it] +2025-02-06 02:32:17 - ERROR - stderr - 83%|████████▎ | 18609/22434 [16:24:37<2:40:52, 2.52s/it] +2025-02-06 02:32:17 - ERROR - stderr - +2025-02-06 02:32:17 - ERROR - stderr - +2025-02-06 02:32:17 - INFO - stdout - {'loss': 0.3707, 'grad_norm': 1.6210441589355469, 'learning_rate': 1.486448068947348e-06, 'epoch': 2.49} +2025-02-06 02:32:17 - ERROR - stderr - 83%|████████▎ | 18609/22434 [16:24:37<2:40:52, 2.52s/it] +2025-02-06 02:32:20 - ERROR - stderr - 83%|████████▎ | 18610/22434 [16:24:40<2:39:59, 2.51s/it] +2025-02-06 02:32:20 - ERROR - stderr - +2025-02-06 02:32:20 - ERROR - stderr - +2025-02-06 02:32:20 - INFO - stdout - {'loss': 0.361, 'grad_norm': 1.5311343669891357, 'learning_rate': 1.4856907830951084e-06, 'epoch': 2.49} +2025-02-06 02:32:20 - ERROR - stderr - 83%|████████▎ | 18610/22434 [16:24:40<2:39:59, 2.51s/it] +2025-02-06 02:32:22 - ERROR - stderr - 83%|████████▎ | 18611/22434 [16:24:42<2:39:43, 2.51s/it] +2025-02-06 02:32:22 - ERROR - stderr - +2025-02-06 02:32:22 - ERROR - stderr - +2025-02-06 02:32:22 - INFO - stdout - {'loss': 0.4054, 'grad_norm': 1.4379596710205078, 'learning_rate': 1.4849336747154908e-06, 'epoch': 2.49} +2025-02-06 02:32:22 - ERROR - stderr - 83%|████████▎ | 18611/22434 [16:24:42<2:39:43, 2.51s/it] +2025-02-06 02:32:25 - ERROR - stderr - 83%|████████▎ | 18612/22434 [16:24:45<2:39:00, 2.50s/it] +2025-02-06 02:32:25 - ERROR - stderr - +2025-02-06 02:32:25 - ERROR - stderr - +2025-02-06 02:32:25 - INFO - stdout - {'loss': 0.326, 'grad_norm': 1.4215463399887085, 'learning_rate': 1.484176743824286e-06, 'epoch': 2.49} +2025-02-06 02:32:25 - ERROR - stderr - 83%|████████▎ | 18612/22434 [16:24:45<2:39:00, 2.50s/it] +2025-02-06 02:32:28 - ERROR - stderr - 83%|████████▎ | 18613/22434 [16:24:47<2:43:59, 2.58s/it] +2025-02-06 02:32:28 - ERROR - stderr - +2025-02-06 02:32:28 - ERROR - stderr - +2025-02-06 02:32:28 - INFO - stdout - {'loss': 0.3683, 'grad_norm': 
1.5715378522872925, 'learning_rate': 1.483419990437267e-06, 'epoch': 2.49} +2025-02-06 02:32:28 - ERROR - stderr - 83%|████████▎ | 18613/22434 [16:24:47<2:43:59, 2.58s/it] +2025-02-06 02:32:30 - ERROR - stderr - 83%|████████▎ | 18614/22434 [16:24:50<2:43:01, 2.56s/it] +2025-02-06 02:32:30 - ERROR - stderr - +2025-02-06 02:32:30 - ERROR - stderr - +2025-02-06 02:32:30 - INFO - stdout - {'loss': 0.3551, 'grad_norm': 1.52204430103302, 'learning_rate': 1.4826634145702102e-06, 'epoch': 2.49} +2025-02-06 02:32:30 - ERROR - stderr - 83%|████████▎ | 18614/22434 [16:24:50<2:43:01, 2.56s/it] +2025-02-06 02:32:32 - ERROR - stderr - 83%|████████▎ | 18615/22434 [16:24:52<2:40:52, 2.53s/it] +2025-02-06 02:32:33 - ERROR - stderr - +2025-02-06 02:32:33 - ERROR - stderr - +2025-02-06 02:32:33 - INFO - stdout - {'loss': 0.3376, 'grad_norm': 1.382900595664978, 'learning_rate': 1.481907016238886e-06, 'epoch': 2.49} +2025-02-06 02:32:33 - ERROR - stderr - 83%|████████▎ | 18615/22434 [16:24:52<2:40:52, 2.53s/it] +2025-02-06 02:32:35 - ERROR - stderr - 83%|████████▎ | 18616/22434 [16:24:55<2:40:30, 2.52s/it] +2025-02-06 02:32:35 - ERROR - stderr - +2025-02-06 02:32:35 - ERROR - stderr - +2025-02-06 02:32:35 - INFO - stdout - {'loss': 0.4042, 'grad_norm': 1.6175472736358643, 'learning_rate': 1.4811507954590542e-06, 'epoch': 2.49} +2025-02-06 02:32:35 - ERROR - stderr - 83%|████████▎ | 18616/22434 [16:24:55<2:40:30, 2.52s/it] +2025-02-06 02:32:37 - ERROR - stderr - 83%|████████▎ | 18617/22434 [16:24:57<2:39:07, 2.50s/it] +2025-02-06 02:32:37 - ERROR - stderr - +2025-02-06 02:32:37 - ERROR - stderr - +2025-02-06 02:32:37 - INFO - stdout - {'loss': 0.3447, 'grad_norm': 1.481105089187622, 'learning_rate': 1.480394752246488e-06, 'epoch': 2.49} +2025-02-06 02:32:37 - ERROR - stderr - 83%|████████▎ | 18617/22434 [16:24:57<2:39:07, 2.50s/it] +2025-02-06 02:32:40 - ERROR - stderr - 83%|████████▎ | 18618/22434 [16:25:00<2:39:56, 2.51s/it] +2025-02-06 02:32:40 - ERROR - stderr - +2025-02-06 02:32:40 
- ERROR - stderr - +2025-02-06 02:32:40 - INFO - stdout - {'loss': 0.4075, 'grad_norm': 1.7270575761795044, 'learning_rate': 1.4796388866169375e-06, 'epoch': 2.49} +2025-02-06 02:32:40 - ERROR - stderr - 83%|████████▎ | 18618/22434 [16:25:00<2:39:56, 2.51s/it] +2025-02-06 02:32:43 - ERROR - stderr - 83%|████████▎ | 18619/22434 [16:25:02<2:40:25, 2.52s/it] +2025-02-06 02:32:43 - ERROR - stderr - +2025-02-06 02:32:43 - ERROR - stderr - +2025-02-06 02:32:43 - INFO - stdout - {'loss': 0.3603, 'grad_norm': 1.4701913595199585, 'learning_rate': 1.4788831985861597e-06, 'epoch': 2.49} +2025-02-06 02:32:43 - ERROR - stderr - 83%|████████▎ | 18619/22434 [16:25:02<2:40:25, 2.52s/it] +2025-02-06 02:32:45 - ERROR - stderr - 83%|████████▎ | 18620/22434 [16:25:05<2:40:41, 2.53s/it] +2025-02-06 02:32:45 - ERROR - stderr - +2025-02-06 02:32:45 - ERROR - stderr - +2025-02-06 02:32:45 - INFO - stdout - {'loss': 0.3784, 'grad_norm': 1.8368618488311768, 'learning_rate': 1.4781276881699114e-06, 'epoch': 2.49} +2025-02-06 02:32:45 - ERROR - stderr - 83%|████████▎ | 18620/22434 [16:25:05<2:40:41, 2.53s/it] +2025-02-06 02:32:48 - ERROR - stderr - 83%|████████▎ | 18621/22434 [16:25:07<2:39:49, 2.52s/it] +2025-02-06 02:32:48 - ERROR - stderr - +2025-02-06 02:32:48 - ERROR - stderr - +2025-02-06 02:32:48 - INFO - stdout - {'loss': 0.3802, 'grad_norm': 1.5809166431427002, 'learning_rate': 1.4773723553839325e-06, 'epoch': 2.49} +2025-02-06 02:32:48 - ERROR - stderr - 83%|████████▎ | 18621/22434 [16:25:07<2:39:49, 2.52s/it] +2025-02-06 02:32:50 - ERROR - stderr - 83%|████████▎ | 18622/22434 [16:25:10<2:39:32, 2.51s/it] +2025-02-06 02:32:50 - ERROR - stderr - +2025-02-06 02:32:50 - ERROR - stderr - +2025-02-06 02:32:50 - INFO - stdout - {'loss': 0.3266, 'grad_norm': 1.3764699697494507, 'learning_rate': 1.4766172002439772e-06, 'epoch': 2.49} +2025-02-06 02:32:50 - ERROR - stderr - 83%|████████▎ | 18622/22434 [16:25:10<2:39:32, 2.51s/it] +2025-02-06 02:32:53 - ERROR - stderr - 83%|████████▎ | 
18623/22434 [16:25:12<2:39:47, 2.52s/it] +2025-02-06 02:32:53 - ERROR - stderr - +2025-02-06 02:32:53 - ERROR - stderr - +2025-02-06 02:32:53 - INFO - stdout - {'loss': 0.3722, 'grad_norm': 1.5955884456634521, 'learning_rate': 1.475862222765777e-06, 'epoch': 2.49} +2025-02-06 02:32:53 - ERROR - stderr - 83%|████████▎ | 18623/22434 [16:25:12<2:39:47, 2.52s/it] +2025-02-06 02:32:55 - ERROR - stderr - 83%|████████▎ | 18624/22434 [16:25:15<2:39:34, 2.51s/it] +2025-02-06 02:32:55 - ERROR - stderr - +2025-02-06 02:32:55 - ERROR - stderr - +2025-02-06 02:32:55 - INFO - stdout - {'loss': 0.3551, 'grad_norm': 1.4743257761001587, 'learning_rate': 1.475107422965073e-06, 'epoch': 2.49} +2025-02-06 02:32:55 - ERROR - stderr - 83%|████████▎ | 18624/22434 [16:25:15<2:39:34, 2.51s/it] +2025-02-06 02:32:58 - ERROR - stderr - 83%|████████▎ | 18625/22434 [16:25:18<2:48:50, 2.66s/it] +2025-02-06 02:32:58 - ERROR - stderr - +2025-02-06 02:32:58 - ERROR - stderr - +2025-02-06 02:32:58 - INFO - stdout - {'loss': 0.3756, 'grad_norm': 1.6498315334320068, 'learning_rate': 1.4743528008575968e-06, 'epoch': 2.49} +2025-02-06 02:32:58 - ERROR - stderr - 83%|████████▎ | 18625/22434 [16:25:18<2:48:50, 2.66s/it] +2025-02-06 02:33:01 - ERROR - stderr - 83%|████████▎ | 18626/22434 [16:25:20<2:46:10, 2.62s/it] +2025-02-06 02:33:01 - ERROR - stderr - +2025-02-06 02:33:01 - ERROR - stderr - +2025-02-06 02:33:01 - INFO - stdout - {'loss': 0.3748, 'grad_norm': 1.4713531732559204, 'learning_rate': 1.4735983564590784e-06, 'epoch': 2.49} +2025-02-06 02:33:01 - ERROR - stderr - 83%|████████▎ | 18626/22434 [16:25:20<2:46:10, 2.62s/it] +2025-02-06 02:33:03 - ERROR - stderr - 83%|████████▎ | 18627/22434 [16:25:23<2:44:14, 2.59s/it] +2025-02-06 02:33:03 - ERROR - stderr - +2025-02-06 02:33:03 - ERROR - stderr - +2025-02-06 02:33:03 - INFO - stdout - {'loss': 0.4009, 'grad_norm': 1.632378101348877, 'learning_rate': 1.4728440897852436e-06, 'epoch': 2.49} +2025-02-06 02:33:03 - ERROR - stderr - 83%|████████▎ | 
18627/22434 [16:25:23<2:44:14, 2.59s/it] +2025-02-06 02:33:06 - ERROR - stderr - 83%|████████▎ | 18628/22434 [16:25:25<2:42:18, 2.56s/it] +2025-02-06 02:33:06 - ERROR - stderr - +2025-02-06 02:33:06 - ERROR - stderr - +2025-02-06 02:33:06 - INFO - stdout - {'loss': 0.3472, 'grad_norm': 1.5066401958465576, 'learning_rate': 1.4720900008518136e-06, 'epoch': 2.49} +2025-02-06 02:33:06 - ERROR - stderr - 83%|████████▎ | 18628/22434 [16:25:25<2:42:18, 2.56s/it] +2025-02-06 02:33:08 - ERROR - stderr - 83%|████████▎ | 18629/22434 [16:25:28<2:42:13, 2.56s/it] +2025-02-06 02:33:08 - ERROR - stderr - +2025-02-06 02:33:08 - ERROR - stderr - +2025-02-06 02:33:08 - INFO - stdout - {'loss': 0.4125, 'grad_norm': 1.7821909189224243, 'learning_rate': 1.4713360896745077e-06, 'epoch': 2.49} +2025-02-06 02:33:08 - ERROR - stderr - 83%|████████▎ | 18629/22434 [16:25:28<2:42:13, 2.56s/it] +2025-02-06 02:33:11 - ERROR - stderr - 83%|████████▎ | 18630/22434 [16:25:30<2:40:27, 2.53s/it] +2025-02-06 02:33:11 - ERROR - stderr - +2025-02-06 02:33:11 - ERROR - stderr - +2025-02-06 02:33:11 - INFO - stdout - {'loss': 0.3191, 'grad_norm': 1.47171950340271, 'learning_rate': 1.4705823562690402e-06, 'epoch': 2.49} +2025-02-06 02:33:11 - ERROR - stderr - 83%|████████▎ | 18630/22434 [16:25:30<2:40:27, 2.53s/it] +2025-02-06 02:33:13 - ERROR - stderr - 83%|████████▎ | 18631/22434 [16:25:33<2:40:10, 2.53s/it] +2025-02-06 02:33:13 - ERROR - stderr - +2025-02-06 02:33:13 - ERROR - stderr - +2025-02-06 02:33:13 - INFO - stdout - {'loss': 0.3592, 'grad_norm': 1.5529792308807373, 'learning_rate': 1.4698288006511208e-06, 'epoch': 2.49} +2025-02-06 02:33:13 - ERROR - stderr - 83%|████████▎ | 18631/22434 [16:25:33<2:40:10, 2.53s/it] +2025-02-06 02:33:16 - ERROR - stderr - 83%|████████▎ | 18632/22434 [16:25:36<2:45:56, 2.62s/it] +2025-02-06 02:33:16 - ERROR - stderr - +2025-02-06 02:33:16 - ERROR - stderr - +2025-02-06 02:33:16 - INFO - stdout - {'loss': 0.3671, 'grad_norm': 1.7210625410079956, 'learning_rate': 
1.4690754228364578e-06, 'epoch': 2.49} +2025-02-06 02:33:16 - ERROR - stderr - 83%|████████▎ | 18632/22434 [16:25:36<2:45:56, 2.62s/it] +2025-02-06 02:33:18 - ERROR - stderr - 83%|████████▎ | 18633/22434 [16:25:38<2:42:36, 2.57s/it] +2025-02-06 02:33:18 - ERROR - stderr - +2025-02-06 02:33:18 - ERROR - stderr - +2025-02-06 02:33:18 - INFO - stdout - {'loss': 0.3456, 'grad_norm': 1.4205901622772217, 'learning_rate': 1.4683222228407544e-06, 'epoch': 2.49} +2025-02-06 02:33:18 - ERROR - stderr - 83%|████████▎ | 18633/22434 [16:25:38<2:42:36, 2.57s/it] +2025-02-06 02:33:21 - ERROR - stderr - 83%|████████▎ | 18634/22434 [16:25:41<2:41:26, 2.55s/it] +2025-02-06 02:33:21 - ERROR - stderr - +2025-02-06 02:33:21 - ERROR - stderr - +2025-02-06 02:33:21 - INFO - stdout - {'loss': 0.3886, 'grad_norm': 1.579715609550476, 'learning_rate': 1.4675692006797137e-06, 'epoch': 2.49} +2025-02-06 02:33:21 - ERROR - stderr - 83%|████████▎ | 18634/22434 [16:25:41<2:41:26, 2.55s/it] +2025-02-06 02:33:24 - ERROR - stderr - 83%|████████▎ | 18635/22434 [16:25:43<2:41:38, 2.55s/it] +2025-02-06 02:33:24 - ERROR - stderr - +2025-02-06 02:33:24 - ERROR - stderr - +2025-02-06 02:33:24 - INFO - stdout - {'loss': 0.3688, 'grad_norm': 1.402671456336975, 'learning_rate': 1.466816356369023e-06, 'epoch': 2.49} +2025-02-06 02:33:24 - ERROR - stderr - 83%|████████▎ | 18635/22434 [16:25:43<2:41:38, 2.55s/it] +2025-02-06 02:33:26 - ERROR - stderr - 83%|████████▎ | 18636/22434 [16:25:46<2:39:27, 2.52s/it] +2025-02-06 02:33:26 - ERROR - stderr - +2025-02-06 02:33:26 - ERROR - stderr - +2025-02-06 02:33:26 - INFO - stdout - {'loss': 0.4021, 'grad_norm': 1.7316697835922241, 'learning_rate': 1.4660636899243841e-06, 'epoch': 2.49} +2025-02-06 02:33:26 - ERROR - stderr - 83%|████████▎ | 18636/22434 [16:25:46<2:39:27, 2.52s/it] +2025-02-06 02:33:28 - ERROR - stderr - 83%|████████▎ | 18637/22434 [16:25:48<2:39:30, 2.52s/it] +2025-02-06 02:33:29 - ERROR - stderr - +2025-02-06 02:33:29 - ERROR - stderr - +2025-02-06 
02:33:29 - INFO - stdout - {'loss': 0.3457, 'grad_norm': 1.4928300380706787, 'learning_rate': 1.465311201361478e-06, 'epoch': 2.49} +2025-02-06 02:33:29 - ERROR - stderr - 83%|████████▎ | 18637/22434 [16:25:48<2:39:30, 2.52s/it] +2025-02-06 02:33:31 - ERROR - stderr - 83%|████████▎ | 18638/22434 [16:25:51<2:38:59, 2.51s/it] +2025-02-06 02:33:31 - ERROR - stderr - +2025-02-06 02:33:31 - ERROR - stderr - +2025-02-06 02:33:31 - INFO - stdout - {'loss': 0.377, 'grad_norm': 1.4771106243133545, 'learning_rate': 1.464558890695994e-06, 'epoch': 2.49} +2025-02-06 02:33:31 - ERROR - stderr - 83%|████████▎ | 18638/22434 [16:25:51<2:38:59, 2.51s/it] +2025-02-06 02:33:33 - ERROR - stderr - 83%|████████▎ | 18639/22434 [16:25:53<2:38:21, 2.50s/it] +2025-02-06 02:33:34 - ERROR - stderr - +2025-02-06 02:33:34 - ERROR - stderr - +2025-02-06 02:33:34 - INFO - stdout - {'loss': 0.3912, 'grad_norm': 1.5786329507827759, 'learning_rate': 1.4638067579436156e-06, 'epoch': 2.49} +2025-02-06 02:33:34 - ERROR - stderr - 83%|████████▎ | 18639/22434 [16:25:53<2:38:21, 2.50s/it] +2025-02-06 02:33:36 - ERROR - stderr - 83%|████████▎ | 18640/22434 [16:25:56<2:38:30, 2.51s/it] +2025-02-06 02:33:36 - ERROR - stderr - +2025-02-06 02:33:36 - ERROR - stderr - +2025-02-06 02:33:36 - INFO - stdout - {'loss': 0.3037, 'grad_norm': 1.4679805040359497, 'learning_rate': 1.463054803120012e-06, 'epoch': 2.49} +2025-02-06 02:33:36 - ERROR - stderr - 83%|████████▎ | 18640/22434 [16:25:56<2:38:30, 2.51s/it] +2025-02-06 02:33:38 - ERROR - stderr - 83%|████████▎ | 18641/22434 [16:25:58<2:38:08, 2.50s/it] +2025-02-06 02:33:39 - ERROR - stderr - +2025-02-06 02:33:39 - ERROR - stderr - +2025-02-06 02:33:39 - INFO - stdout - {'loss': 0.2889, 'grad_norm': 1.2981265783309937, 'learning_rate': 1.4623030262408677e-06, 'epoch': 2.49} +2025-02-06 02:33:39 - ERROR - stderr - 83%|████████▎ | 18641/22434 [16:25:58<2:38:08, 2.50s/it] +2025-02-06 02:33:41 - ERROR - stderr - 83%|████████▎ | 18642/22434 [16:26:01<2:38:25, 2.51s/it] 
+2025-02-06 02:33:41 - ERROR - stderr - +2025-02-06 02:33:41 - ERROR - stderr - +2025-02-06 02:33:41 - INFO - stdout - {'loss': 0.4053, 'grad_norm': 1.6673259735107422, 'learning_rate': 1.4615514273218435e-06, 'epoch': 2.49} +2025-02-06 02:33:41 - ERROR - stderr - 83%|████████▎ | 18642/22434 [16:26:01<2:38:25, 2.51s/it] +2025-02-06 02:33:44 - ERROR - stderr - 83%|████████▎ | 18643/22434 [16:26:03<2:39:21, 2.52s/it] +2025-02-06 02:33:44 - ERROR - stderr - +2025-02-06 02:33:44 - ERROR - stderr - +2025-02-06 02:33:44 - INFO - stdout - {'loss': 0.3463, 'grad_norm': 1.3892321586608887, 'learning_rate': 1.4608000063786098e-06, 'epoch': 2.49} +2025-02-06 02:33:44 - ERROR - stderr - 83%|████████▎ | 18643/22434 [16:26:03<2:39:21, 2.52s/it] +2025-02-06 02:33:46 - ERROR - stderr - 83%|████████▎ | 18644/22434 [16:26:06<2:37:28, 2.49s/it] +2025-02-06 02:33:46 - ERROR - stderr - +2025-02-06 02:33:46 - ERROR - stderr - +2025-02-06 02:33:46 - INFO - stdout - {'loss': 0.3344, 'grad_norm': 1.4569119215011597, 'learning_rate': 1.460048763426829e-06, 'epoch': 2.49} +2025-02-06 02:33:46 - ERROR - stderr - 83%|████████▎ | 18644/22434 [16:26:06<2:37:28, 2.49s/it] +2025-02-06 02:33:49 - ERROR - stderr - 83%|████████▎ | 18645/22434 [16:26:09<2:45:01, 2.61s/it] +2025-02-06 02:33:49 - ERROR - stderr - +2025-02-06 02:33:49 - ERROR - stderr - +2025-02-06 02:33:49 - INFO - stdout - {'loss': 0.3435, 'grad_norm': 1.5483710765838623, 'learning_rate': 1.4592976984821604e-06, 'epoch': 2.49} +2025-02-06 02:33:49 - ERROR - stderr - 83%|████████▎ | 18645/22434 [16:26:09<2:45:01, 2.61s/it] +2025-02-06 02:33:51 - ERROR - stderr - 83%|████████▎ | 18646/22434 [16:26:11<2:44:06, 2.60s/it] +2025-02-06 02:33:51 - ERROR - stderr - +2025-02-06 02:33:51 - ERROR - stderr - +2025-02-06 02:33:51 - INFO - stdout - {'loss': 0.3409, 'grad_norm': 1.4543675184249878, 'learning_rate': 1.4585468115602574e-06, 'epoch': 2.49} +2025-02-06 02:33:51 - ERROR - stderr - 83%|████████▎ | 18646/22434 [16:26:11<2:44:06, 2.60s/it] 
+2025-02-06 02:33:54 - ERROR - stderr - 83%|████████▎ | 18647/22434 [16:26:14<2:42:31, 2.58s/it] +2025-02-06 02:33:54 - ERROR - stderr - +2025-02-06 02:33:54 - ERROR - stderr - +2025-02-06 02:33:54 - INFO - stdout - {'loss': 0.3701, 'grad_norm': 1.5567741394042969, 'learning_rate': 1.457796102676774e-06, 'epoch': 2.49} +2025-02-06 02:33:54 - ERROR - stderr - 83%|████████▎ | 18647/22434 [16:26:14<2:42:31, 2.58s/it] +2025-02-06 02:33:57 - ERROR - stderr - 83%|████████▎ | 18648/22434 [16:26:16<2:46:10, 2.63s/it] +2025-02-06 02:33:57 - ERROR - stderr - +2025-02-06 02:33:57 - ERROR - stderr - +2025-02-06 02:33:57 - INFO - stdout - {'loss': 0.4001, 'grad_norm': 1.6860926151275635, 'learning_rate': 1.4570455718473563e-06, 'epoch': 2.49} +2025-02-06 02:33:57 - ERROR - stderr - 83%|████████▎ | 18648/22434 [16:26:17<2:46:10, 2.63s/it] +2025-02-06 02:33:59 - ERROR - stderr - 83%|████████▎ | 18649/22434 [16:26:19<2:45:56, 2.63s/it] +2025-02-06 02:33:59 - ERROR - stderr - +2025-02-06 02:33:59 - ERROR - stderr - +2025-02-06 02:33:59 - INFO - stdout - {'loss': 0.4329, 'grad_norm': 1.771240472793579, 'learning_rate': 1.456295219087649e-06, 'epoch': 2.49} +2025-02-06 02:33:59 - ERROR - stderr - 83%|████████▎ | 18649/22434 [16:26:19<2:45:56, 2.63s/it] +2025-02-06 02:34:02 - ERROR - stderr - 83%|████████▎ | 18650/22434 [16:26:22<2:44:27, 2.61s/it] +2025-02-06 02:34:02 - ERROR - stderr - +2025-02-06 02:34:02 - ERROR - stderr - +2025-02-06 02:34:02 - INFO - stdout - {'loss': 0.4223, 'grad_norm': 1.8035566806793213, 'learning_rate': 1.4555450444132934e-06, 'epoch': 2.49} +2025-02-06 02:34:02 - ERROR - stderr - 83%|████████▎ | 18650/22434 [16:26:22<2:44:27, 2.61s/it] +2025-02-06 02:34:04 - ERROR - stderr - 83%|████████▎ | 18651/22434 [16:26:24<2:43:14, 2.59s/it] +2025-02-06 02:34:04 - ERROR - stderr - +2025-02-06 02:34:04 - ERROR - stderr - +2025-02-06 02:34:04 - INFO - stdout - {'loss': 0.3724, 'grad_norm': 1.6802574396133423, 'learning_rate': 1.4547950478399242e-06, 'epoch': 2.49} 
+2025-02-06 02:34:04 - ERROR - stderr - 83%|████████▎ | 18651/22434 [16:26:24<2:43:14, 2.59s/it] +2025-02-06 02:34:07 - ERROR - stderr - 83%|████████▎ | 18652/22434 [16:26:27<2:44:11, 2.60s/it] +2025-02-06 02:34:07 - ERROR - stderr - +2025-02-06 02:34:07 - ERROR - stderr - +2025-02-06 02:34:07 - INFO - stdout - {'loss': 0.3455, 'grad_norm': 1.4724411964416504, 'learning_rate': 1.4540452293831753e-06, 'epoch': 2.49} +2025-02-06 02:34:07 - ERROR - stderr - 83%|████████▎ | 18652/22434 [16:26:27<2:44:11, 2.60s/it] +2025-02-06 02:34:10 - ERROR - stderr - 83%|████████▎ | 18653/22434 [16:26:29<2:41:27, 2.56s/it] +2025-02-06 02:34:10 - ERROR - stderr - +2025-02-06 02:34:10 - ERROR - stderr - +2025-02-06 02:34:10 - INFO - stdout - {'loss': 0.3545, 'grad_norm': 1.5564385652542114, 'learning_rate': 1.4532955890586764e-06, 'epoch': 2.49} +2025-02-06 02:34:10 - ERROR - stderr - 83%|████████▎ | 18653/22434 [16:26:29<2:41:27, 2.56s/it] +2025-02-06 02:34:12 - ERROR - stderr - 83%|████████▎ | 18654/22434 [16:26:32<2:40:09, 2.54s/it] +2025-02-06 02:34:12 - ERROR - stderr - +2025-02-06 02:34:12 - ERROR - stderr - +2025-02-06 02:34:12 - INFO - stdout - {'loss': 0.338, 'grad_norm': 1.4613860845565796, 'learning_rate': 1.4525461268820517e-06, 'epoch': 2.49} +2025-02-06 02:34:12 - ERROR - stderr - 83%|████████▎ | 18654/22434 [16:26:32<2:40:09, 2.54s/it] +2025-02-06 02:34:15 - ERROR - stderr - 83%|████████▎ | 18655/22434 [16:26:34<2:39:23, 2.53s/it] +2025-02-06 02:34:15 - ERROR - stderr - +2025-02-06 02:34:15 - ERROR - stderr - +2025-02-06 02:34:15 - INFO - stdout - {'loss': 0.3571, 'grad_norm': 1.4449515342712402, 'learning_rate': 1.4517968428689277e-06, 'epoch': 2.49} +2025-02-06 02:34:15 - ERROR - stderr - 83%|████████▎ | 18655/22434 [16:26:34<2:39:23, 2.53s/it] +2025-02-06 02:34:17 - ERROR - stderr - 83%|████████▎ | 18656/22434 [16:26:37<2:37:56, 2.51s/it] +2025-02-06 02:34:17 - ERROR - stderr - +2025-02-06 02:34:17 - ERROR - stderr - +2025-02-06 02:34:17 - INFO - stdout - {'loss': 
0.3713, 'grad_norm': 1.6455576419830322, 'learning_rate': 1.451047737034913e-06, 'epoch': 2.49} +2025-02-06 02:34:17 - ERROR - stderr - 83%|████████▎ | 18656/22434 [16:26:37<2:37:56, 2.51s/it] +2025-02-06 02:34:19 - ERROR - stderr - 83%|████████▎ | 18657/22434 [16:26:39<2:36:23, 2.48s/it] +2025-02-06 02:34:19 - ERROR - stderr - +2025-02-06 02:34:19 - ERROR - stderr - +2025-02-06 02:34:19 - INFO - stdout - {'loss': 0.4092, 'grad_norm': 1.6345043182373047, 'learning_rate': 1.4502988093956306e-06, 'epoch': 2.49} +2025-02-06 02:34:19 - ERROR - stderr - 83%|████████▎ | 18657/22434 [16:26:39<2:36:23, 2.48s/it] +2025-02-06 02:34:22 - ERROR - stderr - 83%|████████▎ | 18658/22434 [16:26:42<2:37:09, 2.50s/it] +2025-02-06 02:34:22 - ERROR - stderr - +2025-02-06 02:34:22 - ERROR - stderr - +2025-02-06 02:34:22 - INFO - stdout - {'loss': 0.3634, 'grad_norm': 1.5034797191619873, 'learning_rate': 1.44955005996669e-06, 'epoch': 2.5} +2025-02-06 02:34:22 - ERROR - stderr - 83%|████████▎ | 18658/22434 [16:26:42<2:37:09, 2.50s/it] +2025-02-06 02:34:25 - ERROR - stderr - 83%|████████▎ | 18659/22434 [16:26:44<2:39:00, 2.53s/it] +2025-02-06 02:34:25 - ERROR - stderr - +2025-02-06 02:34:25 - ERROR - stderr - +2025-02-06 02:34:25 - INFO - stdout - {'loss': 0.3926, 'grad_norm': 1.5965651273727417, 'learning_rate': 1.4488014887636926e-06, 'epoch': 2.5} +2025-02-06 02:34:25 - ERROR - stderr - 83%|████████▎ | 18659/22434 [16:26:44<2:39:00, 2.53s/it] +2025-02-06 02:34:27 - ERROR - stderr - 83%|████████▎ | 18660/22434 [16:26:47<2:40:06, 2.55s/it] +2025-02-06 02:34:27 - ERROR - stderr - +2025-02-06 02:34:27 - ERROR - stderr - +2025-02-06 02:34:27 - INFO - stdout - {'loss': 0.3752, 'grad_norm': 1.5741041898727417, 'learning_rate': 1.4480530958022498e-06, 'epoch': 2.5} +2025-02-06 02:34:27 - ERROR - stderr - 83%|████████▎ | 18660/22434 [16:26:47<2:40:06, 2.55s/it] +2025-02-06 02:34:30 - ERROR - stderr - 83%|████████▎ | 18661/22434 [16:26:50<2:45:04, 2.63s/it] +2025-02-06 02:34:30 - ERROR - stderr 
- +2025-02-06 02:34:30 - ERROR - stderr - +2025-02-06 02:34:30 - INFO - stdout - {'loss': 0.3512, 'grad_norm': 1.5955390930175781, 'learning_rate': 1.447304881097953e-06, 'epoch': 2.5} +2025-02-06 02:34:30 - ERROR - stderr - 83%|████████▎ | 18661/22434 [16:26:50<2:45:04, 2.63s/it] +2025-02-06 02:34:33 - ERROR - stderr - 83%|████████▎ | 18662/22434 [16:26:53<2:50:26, 2.71s/it] +2025-02-06 02:34:33 - ERROR - stderr - +2025-02-06 02:34:33 - ERROR - stderr - +2025-02-06 02:34:33 - INFO - stdout - {'loss': 0.3786, 'grad_norm': 1.4500372409820557, 'learning_rate': 1.4465568446664057e-06, 'epoch': 2.5} +2025-02-06 02:34:33 - ERROR - stderr - 83%|████████▎ | 18662/22434 [16:26:53<2:50:26, 2.71s/it] +2025-02-06 02:34:35 - ERROR - stderr - 83%|████████▎ | 18663/22434 [16:26:55<2:46:03, 2.64s/it] +2025-02-06 02:34:35 - ERROR - stderr - +2025-02-06 02:34:35 - ERROR - stderr - +2025-02-06 02:34:35 - INFO - stdout - {'loss': 0.3611, 'grad_norm': 1.5714104175567627, 'learning_rate': 1.445808986523195e-06, 'epoch': 2.5} +2025-02-06 02:34:35 - ERROR - stderr - 83%|████████▎ | 18663/22434 [16:26:55<2:46:03, 2.64s/it] +2025-02-06 02:34:38 - ERROR - stderr - 83%|████████▎ | 18664/22434 [16:26:58<2:43:10, 2.60s/it] +2025-02-06 02:34:38 - ERROR - stderr - +2025-02-06 02:34:38 - ERROR - stderr - +2025-02-06 02:34:38 - INFO - stdout - {'loss': 0.3914, 'grad_norm': 1.5748096704483032, 'learning_rate': 1.4450613066839092e-06, 'epoch': 2.5} +2025-02-06 02:34:38 - ERROR - stderr - 83%|████████▎ | 18664/22434 [16:26:58<2:43:10, 2.60s/it] +2025-02-06 02:34:40 - ERROR - stderr - 83%|████████▎ | 18665/22434 [16:27:00<2:41:25, 2.57s/it] +2025-02-06 02:34:40 - ERROR - stderr - +2025-02-06 02:34:40 - ERROR - stderr - +2025-02-06 02:34:40 - INFO - stdout - {'loss': 0.3054, 'grad_norm': 1.4553799629211426, 'learning_rate': 1.4443138051641347e-06, 'epoch': 2.5} +2025-02-06 02:34:40 - ERROR - stderr - 83%|████████▎ | 18665/22434 [16:27:00<2:41:25, 2.57s/it] +2025-02-06 02:34:43 - ERROR - stderr - 
83%|████████▎ | 18666/22434 [16:27:03<2:38:30, 2.52s/it] +2025-02-06 02:34:43 - ERROR - stderr - +2025-02-06 02:34:43 - ERROR - stderr - +2025-02-06 02:34:43 - INFO - stdout - {'loss': 0.3564, 'grad_norm': 1.525992751121521, 'learning_rate': 1.4435664819794527e-06, 'epoch': 2.5} +2025-02-06 02:34:43 - ERROR - stderr - 83%|████████▎ | 18666/22434 [16:27:03<2:38:30, 2.52s/it] +2025-02-06 02:34:45 - ERROR - stderr - 83%|████████▎ | 18667/22434 [16:27:05<2:37:10, 2.50s/it] +2025-02-06 02:34:45 - ERROR - stderr - +2025-02-06 02:34:45 - ERROR - stderr - +2025-02-06 02:34:45 - INFO - stdout - {'loss': 0.3683, 'grad_norm': 1.4415743350982666, 'learning_rate': 1.442819337145439e-06, 'epoch': 2.5} +2025-02-06 02:34:45 - ERROR - stderr - 83%|████████▎ | 18667/22434 [16:27:05<2:37:10, 2.50s/it] +2025-02-06 02:34:48 - ERROR - stderr - 83%|████████▎ | 18668/22434 [16:27:07<2:37:02, 2.50s/it] +2025-02-06 02:34:48 - ERROR - stderr - +2025-02-06 02:34:48 - ERROR - stderr - +2025-02-06 02:34:48 - INFO - stdout - {'loss': 0.3672, 'grad_norm': 1.5423680543899536, 'learning_rate': 1.4420723706776673e-06, 'epoch': 2.5} +2025-02-06 02:34:48 - ERROR - stderr - 83%|████████▎ | 18668/22434 [16:27:08<2:37:02, 2.50s/it] +2025-02-06 02:34:50 - ERROR - stderr - 83%|████████▎ | 18669/22434 [16:27:10<2:36:44, 2.50s/it] +2025-02-06 02:34:50 - ERROR - stderr - +2025-02-06 02:34:50 - ERROR - stderr - +2025-02-06 02:34:50 - INFO - stdout - {'loss': 0.3483, 'grad_norm': 1.5602530241012573, 'learning_rate': 1.4413255825917094e-06, 'epoch': 2.5} +2025-02-06 02:34:50 - ERROR - stderr - 83%|████████▎ | 18669/22434 [16:27:10<2:36:44, 2.50s/it] +2025-02-06 02:34:53 - ERROR - stderr - 83%|████████▎ | 18670/22434 [16:27:13<2:40:26, 2.56s/it] +2025-02-06 02:34:53 - ERROR - stderr - +2025-02-06 02:34:53 - ERROR - stderr - +2025-02-06 02:34:53 - INFO - stdout - {'loss': 0.3449, 'grad_norm': 1.5395179986953735, 'learning_rate': 1.4405789729031294e-06, 'epoch': 2.5} +2025-02-06 02:34:53 - ERROR - stderr - 
83%|████████▎ | 18670/22434 [16:27:13<2:40:26, 2.56s/it] +2025-02-06 02:34:55 - ERROR - stderr - 83%|████████▎ | 18671/22434 [16:27:15<2:38:07, 2.52s/it] +2025-02-06 02:34:55 - ERROR - stderr - +2025-02-06 02:34:55 - ERROR - stderr - +2025-02-06 02:34:55 - INFO - stdout - {'loss': 0.363, 'grad_norm': 1.5500996112823486, 'learning_rate': 1.4398325416274894e-06, 'epoch': 2.5} +2025-02-06 02:34:55 - ERROR - stderr - 83%|████████▎ | 18671/22434 [16:27:15<2:38:07, 2.52s/it] +2025-02-06 02:34:58 - ERROR - stderr - 83%|████████▎ | 18672/22434 [16:27:18<2:44:03, 2.62s/it] +2025-02-06 02:34:58 - ERROR - stderr - +2025-02-06 02:34:58 - ERROR - stderr - +2025-02-06 02:34:58 - INFO - stdout - {'loss': 0.3524, 'grad_norm': 1.4442201852798462, 'learning_rate': 1.4390862887803502e-06, 'epoch': 2.5} +2025-02-06 02:34:58 - ERROR - stderr - 83%|████████▎ | 18672/22434 [16:27:18<2:44:03, 2.62s/it] +2025-02-06 02:35:01 - ERROR - stderr - 83%|████████▎ | 18673/22434 [16:27:20<2:42:34, 2.59s/it] +2025-02-06 02:35:01 - ERROR - stderr - +2025-02-06 02:35:01 - ERROR - stderr - +2025-02-06 02:35:01 - INFO - stdout - {'loss': 0.3574, 'grad_norm': 1.5078154802322388, 'learning_rate': 1.4383402143772651e-06, 'epoch': 2.5} +2025-02-06 02:35:01 - ERROR - stderr - 83%|████████▎ | 18673/22434 [16:27:21<2:42:34, 2.59s/it] +2025-02-06 02:35:03 - ERROR - stderr - 83%|████████▎ | 18674/22434 [16:27:23<2:41:05, 2.57s/it] +2025-02-06 02:35:03 - ERROR - stderr - +2025-02-06 02:35:03 - ERROR - stderr - +2025-02-06 02:35:03 - INFO - stdout - {'loss': 0.3452, 'grad_norm': 1.6072710752487183, 'learning_rate': 1.4375943184337871e-06, 'epoch': 2.5} +2025-02-06 02:35:03 - ERROR - stderr - 83%|████████▎ | 18674/22434 [16:27:23<2:41:05, 2.57s/it] +2025-02-06 02:35:06 - ERROR - stderr - 83%|████████▎ | 18675/22434 [16:27:26<2:40:26, 2.56s/it] +2025-02-06 02:35:06 - ERROR - stderr - +2025-02-06 02:35:06 - ERROR - stderr - +2025-02-06 02:35:06 - INFO - stdout - {'loss': 0.385, 'grad_norm': 1.4882187843322754, 
'learning_rate': 1.4368486009654582e-06, 'epoch': 2.5} +2025-02-06 02:35:06 - ERROR - stderr - 83%|████████▎ | 18675/22434 [16:27:26<2:40:26, 2.56s/it] +2025-02-06 02:35:08 - ERROR - stderr - 83%|████████▎ | 18676/22434 [16:27:28<2:38:31, 2.53s/it] +2025-02-06 02:35:08 - ERROR - stderr - +2025-02-06 02:35:08 - ERROR - stderr - +2025-02-06 02:35:08 - INFO - stdout - {'loss': 0.3332, 'grad_norm': 1.5553520917892456, 'learning_rate': 1.4361030619878292e-06, 'epoch': 2.5} +2025-02-06 02:35:08 - ERROR - stderr - 83%|████████▎ | 18676/22434 [16:27:28<2:38:31, 2.53s/it] +2025-02-06 02:35:11 - ERROR - stderr - 83%|████████▎ | 18677/22434 [16:27:31<2:38:42, 2.53s/it] +2025-02-06 02:35:11 - ERROR - stderr - +2025-02-06 02:35:11 - ERROR - stderr - +2025-02-06 02:35:11 - INFO - stdout - {'loss': 0.3527, 'grad_norm': 1.4280493259429932, 'learning_rate': 1.4353577015164356e-06, 'epoch': 2.5} +2025-02-06 02:35:11 - ERROR - stderr - 83%|████████▎ | 18677/22434 [16:27:31<2:38:42, 2.53s/it] +2025-02-06 02:35:13 - ERROR - stderr - 83%|████████▎ | 18678/22434 [16:27:33<2:40:32, 2.56s/it] +2025-02-06 02:35:13 - ERROR - stderr - +2025-02-06 02:35:13 - ERROR - stderr - +2025-02-06 02:35:13 - INFO - stdout - {'loss': 0.3693, 'grad_norm': 1.6124001741409302, 'learning_rate': 1.434612519566816e-06, 'epoch': 2.5} +2025-02-06 02:35:13 - ERROR - stderr - 83%|████████▎ | 18678/22434 [16:27:33<2:40:32, 2.56s/it] +2025-02-06 02:35:16 - ERROR - stderr - 83%|████████▎ | 18679/22434 [16:27:36<2:46:22, 2.66s/it] +2025-02-06 02:35:16 - ERROR - stderr - +2025-02-06 02:35:16 - ERROR - stderr - +2025-02-06 02:35:16 - INFO - stdout - {'loss': 0.391, 'grad_norm': 1.6526379585266113, 'learning_rate': 1.4338675161545046e-06, 'epoch': 2.5} +2025-02-06 02:35:16 - ERROR - stderr - 83%|████████▎ | 18679/22434 [16:27:36<2:46:22, 2.66s/it] +2025-02-06 02:35:19 - ERROR - stderr - 83%|████████▎ | 18680/22434 [16:27:39<2:43:08, 2.61s/it] +2025-02-06 02:35:19 - ERROR - stderr - +2025-02-06 02:35:19 - ERROR - stderr - 
+2025-02-06 02:35:19 - INFO - stdout - {'loss': 0.3922, 'grad_norm': 1.56964910030365, 'learning_rate': 1.4331226912950236e-06, 'epoch': 2.5} +2025-02-06 02:35:19 - ERROR - stderr - 83%|████████▎ | 18680/22434 [16:27:39<2:43:08, 2.61s/it] +2025-02-06 02:35:21 - ERROR - stderr - 83%|████████▎ | 18681/22434 [16:27:41<2:40:49, 2.57s/it] +2025-02-06 02:35:21 - ERROR - stderr - +2025-02-06 02:35:21 - ERROR - stderr - +2025-02-06 02:35:21 - INFO - stdout - {'loss': 0.339, 'grad_norm': 1.37174654006958, 'learning_rate': 1.432378045003906e-06, 'epoch': 2.5} +2025-02-06 02:35:21 - ERROR - stderr - 83%|████████▎ | 18681/22434 [16:27:41<2:40:49, 2.57s/it] +2025-02-06 02:35:24 - ERROR - stderr - 83%|████████▎ | 18682/22434 [16:27:43<2:38:57, 2.54s/it] +2025-02-06 02:35:24 - ERROR - stderr - +2025-02-06 02:35:24 - ERROR - stderr - +2025-02-06 02:35:24 - INFO - stdout - {'loss': 0.3544, 'grad_norm': 1.5816730260849, 'learning_rate': 1.4316335772966683e-06, 'epoch': 2.5} +2025-02-06 02:35:24 - ERROR - stderr - 83%|████████▎ | 18682/22434 [16:27:44<2:38:57, 2.54s/it] +2025-02-06 02:35:26 - ERROR - stderr - 83%|████████▎ | 18683/22434 [16:27:46<2:37:02, 2.51s/it] +2025-02-06 02:35:26 - ERROR - stderr - +2025-02-06 02:35:26 - ERROR - stderr - +2025-02-06 02:35:26 - INFO - stdout - {'loss': 0.3678, 'grad_norm': 1.3611667156219482, 'learning_rate': 1.4308892881888293e-06, 'epoch': 2.5} +2025-02-06 02:35:26 - ERROR - stderr - 83%|████████▎ | 18683/22434 [16:27:46<2:37:02, 2.51s/it] +2025-02-06 02:35:29 - ERROR - stderr - 83%|████████▎ | 18684/22434 [16:27:48<2:37:06, 2.51s/it] +2025-02-06 02:35:29 - ERROR - stderr - +2025-02-06 02:35:29 - ERROR - stderr - +2025-02-06 02:35:29 - INFO - stdout - {'loss': 0.3478, 'grad_norm': 1.4585638046264648, 'learning_rate': 1.430145177695904e-06, 'epoch': 2.5} +2025-02-06 02:35:29 - ERROR - stderr - 83%|████████▎ | 18684/22434 [16:27:49<2:37:06, 2.51s/it] +2025-02-06 02:35:31 - ERROR - stderr - 83%|████████▎ | 18685/22434 [16:27:51<2:36:20, 2.50s/it] 
+2025-02-06 02:35:31 - ERROR - stderr - +2025-02-06 02:35:31 - ERROR - stderr - +2025-02-06 02:35:31 - INFO - stdout - {'loss': 0.4183, 'grad_norm': 1.6856166124343872, 'learning_rate': 1.4294012458333995e-06, 'epoch': 2.5} +2025-02-06 02:35:31 - ERROR - stderr - 83%|████████▎ | 18685/22434 [16:27:51<2:36:20, 2.50s/it] +2025-02-06 02:35:34 - ERROR - stderr - 83%|████████▎ | 18686/22434 [16:27:53<2:35:33, 2.49s/it] +2025-02-06 02:35:34 - ERROR - stderr - +2025-02-06 02:35:34 - ERROR - stderr - +2025-02-06 02:35:34 - INFO - stdout - {'loss': 0.2937, 'grad_norm': 1.396251916885376, 'learning_rate': 1.4286574926168284e-06, 'epoch': 2.5} +2025-02-06 02:35:34 - ERROR - stderr - 83%|████████▎ | 18686/22434 [16:27:53<2:35:33, 2.49s/it] +2025-02-06 02:35:36 - ERROR - stderr - 83%|████████▎ | 18687/22434 [16:27:56<2:34:46, 2.48s/it] +2025-02-06 02:35:36 - ERROR - stderr - +2025-02-06 02:35:36 - ERROR - stderr - +2025-02-06 02:35:36 - INFO - stdout - {'loss': 0.3809, 'grad_norm': 1.7472703456878662, 'learning_rate': 1.4279139180616886e-06, 'epoch': 2.5} +2025-02-06 02:35:36 - ERROR - stderr - 83%|████████▎ | 18687/22434 [16:27:56<2:34:46, 2.48s/it] +2025-02-06 02:35:39 - ERROR - stderr - 83%|████████▎ | 18688/22434 [16:27:58<2:35:38, 2.49s/it] +2025-02-06 02:35:39 - ERROR - stderr - +2025-02-06 02:35:39 - ERROR - stderr - +2025-02-06 02:35:39 - INFO - stdout - {'loss': 0.3291, 'grad_norm': 1.4669545888900757, 'learning_rate': 1.4271705221834808e-06, 'epoch': 2.5} +2025-02-06 02:35:39 - ERROR - stderr - 83%|████████▎ | 18688/22434 [16:27:58<2:35:38, 2.49s/it] +2025-02-06 02:35:41 - ERROR - stderr - 83%|████████▎ | 18689/22434 [16:28:01<2:35:58, 2.50s/it] +2025-02-06 02:35:41 - ERROR - stderr - +2025-02-06 02:35:41 - ERROR - stderr - +2025-02-06 02:35:41 - INFO - stdout - {'loss': 0.3243, 'grad_norm': 1.5886670351028442, 'learning_rate': 1.4264273049976995e-06, 'epoch': 2.5} +2025-02-06 02:35:41 - ERROR - stderr - 83%|████████▎ | 18689/22434 [16:28:01<2:35:58, 2.50s/it] 
+2025-02-06 02:35:44 - ERROR - stderr - 83%|████████▎ | 18690/22434 [16:28:03<2:36:00, 2.50s/it] +2025-02-06 02:35:44 - ERROR - stderr - +2025-02-06 02:35:44 - ERROR - stderr - +2025-02-06 02:35:44 - INFO - stdout - {'loss': 0.3106, 'grad_norm': 1.3921732902526855, 'learning_rate': 1.4256842665198377e-06, 'epoch': 2.5} +2025-02-06 02:35:44 - ERROR - stderr - 83%|████████▎ | 18690/22434 [16:28:03<2:36:00, 2.50s/it] +2025-02-06 02:35:46 - ERROR - stderr - 83%|████████▎ | 18691/22434 [16:28:06<2:36:47, 2.51s/it] +2025-02-06 02:35:46 - ERROR - stderr - +2025-02-06 02:35:46 - ERROR - stderr - +2025-02-06 02:35:46 - INFO - stdout - {'loss': 0.3809, 'grad_norm': 1.731690526008606, 'learning_rate': 1.4249414067653821e-06, 'epoch': 2.5} +2025-02-06 02:35:46 - ERROR - stderr - 83%|████████▎ | 18691/22434 [16:28:06<2:36:47, 2.51s/it] +2025-02-06 02:35:49 - ERROR - stderr - 83%|████████▎ | 18692/22434 [16:28:08<2:35:24, 2.49s/it] +2025-02-06 02:35:49 - ERROR - stderr - +2025-02-06 02:35:49 - ERROR - stderr - +2025-02-06 02:35:49 - INFO - stdout - {'loss': 0.3751, 'grad_norm': 1.7129285335540771, 'learning_rate': 1.424198725749818e-06, 'epoch': 2.5} +2025-02-06 02:35:49 - ERROR - stderr - 83%|████████▎ | 18692/22434 [16:28:08<2:35:24, 2.49s/it] +2025-02-06 02:35:51 - ERROR - stderr - 83%|████████▎ | 18693/22434 [16:28:11<2:34:25, 2.48s/it] +2025-02-06 02:35:51 - ERROR - stderr - +2025-02-06 02:35:51 - ERROR - stderr - +2025-02-06 02:35:51 - INFO - stdout - {'loss': 0.4243, 'grad_norm': 1.7014578580856323, 'learning_rate': 1.423456223488625e-06, 'epoch': 2.5} +2025-02-06 02:35:51 - ERROR - stderr - 83%|████████▎ | 18693/22434 [16:28:11<2:34:25, 2.48s/it] +2025-02-06 02:35:54 - ERROR - stderr - 83%|████████▎ | 18694/22434 [16:28:13<2:34:53, 2.48s/it] +2025-02-06 02:35:54 - ERROR - stderr - +2025-02-06 02:35:54 - ERROR - stderr - +2025-02-06 02:35:54 - INFO - stdout - {'loss': 0.3555, 'grad_norm': 1.5362272262573242, 'learning_rate': 1.4227138999972801e-06, 'epoch': 2.5} 
+2025-02-06 02:35:54 - ERROR - stderr - 83%|████████▎ | 18694/22434 [16:28:13<2:34:53, 2.48s/it] +2025-02-06 02:35:56 - ERROR - stderr - 83%|████████▎ | 18695/22434 [16:28:16<2:34:33, 2.48s/it] +2025-02-06 02:35:56 - ERROR - stderr - +2025-02-06 02:35:56 - ERROR - stderr - +2025-02-06 02:35:56 - INFO - stdout - {'loss': 0.3735, 'grad_norm': 1.6782039403915405, 'learning_rate': 1.421971755291256e-06, 'epoch': 2.5} +2025-02-06 02:35:56 - ERROR - stderr - 83%|████████▎ | 18695/22434 [16:28:16<2:34:33, 2.48s/it] +2025-02-06 02:35:58 - ERROR - stderr - 83%|████████▎ | 18696/22434 [16:28:18<2:34:21, 2.48s/it] +2025-02-06 02:35:59 - ERROR - stderr - +2025-02-06 02:35:59 - ERROR - stderr - +2025-02-06 02:35:59 - INFO - stdout - {'loss': 0.3752, 'grad_norm': 1.5355224609375, 'learning_rate': 1.4212297893860228e-06, 'epoch': 2.5} +2025-02-06 02:35:59 - ERROR - stderr - 83%|████████▎ | 18696/22434 [16:28:18<2:34:21, 2.48s/it] +2025-02-06 02:36:01 - ERROR - stderr - 83%|████████▎ | 18697/22434 [16:28:21<2:35:28, 2.50s/it] +2025-02-06 02:36:01 - ERROR - stderr - +2025-02-06 02:36:01 - ERROR - stderr - +2025-02-06 02:36:01 - INFO - stdout - {'loss': 0.3239, 'grad_norm': 1.4049218893051147, 'learning_rate': 1.4204880022970457e-06, 'epoch': 2.5} +2025-02-06 02:36:01 - ERROR - stderr - 83%|████████▎ | 18697/22434 [16:28:21<2:35:28, 2.50s/it] +2025-02-06 02:36:04 - ERROR - stderr - 83%|████████▎ | 18698/22434 [16:28:23<2:35:16, 2.49s/it] +2025-02-06 02:36:04 - ERROR - stderr - +2025-02-06 02:36:04 - ERROR - stderr - +2025-02-06 02:36:04 - INFO - stdout - {'loss': 0.3677, 'grad_norm': 1.5492300987243652, 'learning_rate': 1.419746394039786e-06, 'epoch': 2.5} +2025-02-06 02:36:04 - ERROR - stderr - 83%|████████▎ | 18698/22434 [16:28:23<2:35:16, 2.49s/it] +2025-02-06 02:36:06 - ERROR - stderr - 83%|████████▎ | 18699/22434 [16:28:26<2:37:28, 2.53s/it] +2025-02-06 02:36:06 - ERROR - stderr - +2025-02-06 02:36:06 - ERROR - stderr - +2025-02-06 02:36:06 - INFO - stdout - {'loss': 0.3667, 
'grad_norm': 1.6307997703552246, 'learning_rate': 1.4190049646297032e-06, 'epoch': 2.5} +2025-02-06 02:36:06 - ERROR - stderr - 83%|████████▎ | 18699/22434 [16:28:26<2:37:28, 2.53s/it] +2025-02-06 02:36:09 - ERROR - stderr - 83%|████████▎ | 18700/22434 [16:28:28<2:35:50, 2.50s/it] +2025-02-06 02:36:09 - ERROR - stderr - +2025-02-06 02:36:09 - ERROR - stderr - +2025-02-06 02:36:09 - INFO - stdout - {'loss': 0.3484, 'grad_norm': 1.423971176147461, 'learning_rate': 1.418263714082252e-06, 'epoch': 2.5} +2025-02-06 02:36:09 - ERROR - stderr - 83%|████████▎ | 18700/22434 [16:28:28<2:35:50, 2.50s/it] +2025-02-06 02:36:11 - ERROR - stderr - 83%|████████▎ | 18701/22434 [16:28:31<2:37:06, 2.53s/it] +2025-02-06 02:36:11 - ERROR - stderr - +2025-02-06 02:36:11 - ERROR - stderr - +2025-02-06 02:36:11 - INFO - stdout - {'loss': 0.335, 'grad_norm': 1.3322798013687134, 'learning_rate': 1.4175226424128775e-06, 'epoch': 2.5} +2025-02-06 02:36:11 - ERROR - stderr - 83%|████████▎ | 18701/22434 [16:28:31<2:37:06, 2.53s/it] +2025-02-06 02:36:14 - ERROR - stderr - 83%|████████▎ | 18702/22434 [16:28:34<2:45:04, 2.65s/it] +2025-02-06 02:36:14 - ERROR - stderr - +2025-02-06 02:36:14 - ERROR - stderr - +2025-02-06 02:36:14 - INFO - stdout - {'loss': 0.3785, 'grad_norm': 1.680484652519226, 'learning_rate': 1.4167817496370362e-06, 'epoch': 2.5} +2025-02-06 02:36:14 - ERROR - stderr - 83%|████████▎ | 18702/22434 [16:28:34<2:45:04, 2.65s/it] +2025-02-06 02:36:17 - ERROR - stderr - 83%|████████▎ | 18703/22434 [16:28:36<2:42:32, 2.61s/it] +2025-02-06 02:36:17 - ERROR - stderr - +2025-02-06 02:36:17 - ERROR - stderr - +2025-02-06 02:36:17 - INFO - stdout - {'loss': 0.3693, 'grad_norm': 1.4885387420654297, 'learning_rate': 1.4160410357701638e-06, 'epoch': 2.5} +2025-02-06 02:36:17 - ERROR - stderr - 83%|████████▎ | 18703/22434 [16:28:36<2:42:32, 2.61s/it] +2025-02-06 02:36:19 - ERROR - stderr - 83%|████████▎ | 18704/22434 [16:28:39<2:40:20, 2.58s/it] +2025-02-06 02:36:19 - ERROR - stderr - 
+2025-02-06 02:36:19 - ERROR - stderr - +2025-02-06 02:36:19 - INFO - stdout - {'loss': 0.3617, 'grad_norm': 1.524792194366455, 'learning_rate': 1.4153005008276987e-06, 'epoch': 2.5} +2025-02-06 02:36:19 - ERROR - stderr - 83%|████████▎ | 18704/22434 [16:28:39<2:40:20, 2.58s/it] +2025-02-06 02:36:22 - ERROR - stderr - 83%|████████▎ | 18705/22434 [16:28:41<2:40:02, 2.58s/it] +2025-02-06 02:36:22 - ERROR - stderr - +2025-02-06 02:36:22 - ERROR - stderr - +2025-02-06 02:36:22 - INFO - stdout - {'loss': 0.3338, 'grad_norm': 1.346716046333313, 'learning_rate': 1.4145601448250857e-06, 'epoch': 2.5} +2025-02-06 02:36:22 - ERROR - stderr - 83%|████████▎ | 18705/22434 [16:28:42<2:40:02, 2.58s/it] +2025-02-06 02:36:24 - ERROR - stderr - 83%|████████▎ | 18706/22434 [16:28:44<2:39:06, 2.56s/it] +2025-02-06 02:36:24 - ERROR - stderr - +2025-02-06 02:36:24 - ERROR - stderr - +2025-02-06 02:36:24 - INFO - stdout - {'loss': 0.3934, 'grad_norm': 1.6070681810379028, 'learning_rate': 1.4138199677777465e-06, 'epoch': 2.5} +2025-02-06 02:36:24 - ERROR - stderr - 83%|████████▎ | 18706/22434 [16:28:44<2:39:06, 2.56s/it] +2025-02-06 02:36:27 - ERROR - stderr - 83%|████████▎ | 18707/22434 [16:28:46<2:37:15, 2.53s/it] +2025-02-06 02:36:27 - ERROR - stderr - +2025-02-06 02:36:27 - ERROR - stderr - +2025-02-06 02:36:27 - INFO - stdout - {'loss': 0.3834, 'grad_norm': 1.6407783031463623, 'learning_rate': 1.4130799697011177e-06, 'epoch': 2.5} +2025-02-06 02:36:27 - ERROR - stderr - 83%|████████▎ | 18707/22434 [16:28:46<2:37:15, 2.53s/it] +2025-02-06 02:36:29 - ERROR - stderr - 83%|████████▎ | 18708/22434 [16:28:49<2:41:45, 2.60s/it] +2025-02-06 02:36:30 - ERROR - stderr - +2025-02-06 02:36:30 - ERROR - stderr - +2025-02-06 02:36:30 - INFO - stdout - {'loss': 0.4059, 'grad_norm': 1.808449387550354, 'learning_rate': 1.4123401506106182e-06, 'epoch': 2.5} +2025-02-06 02:36:30 - ERROR - stderr - 83%|████████▎ | 18708/22434 [16:28:49<2:41:45, 2.60s/it] +2025-02-06 02:36:32 - ERROR - stderr - 
83%|████████▎ | 18709/22434 [16:28:52<2:41:09, 2.60s/it] +2025-02-06 02:36:32 - ERROR - stderr - +2025-02-06 02:36:32 - ERROR - stderr - +2025-02-06 02:36:32 - INFO - stdout - {'loss': 0.3975, 'grad_norm': 1.7229595184326172, 'learning_rate': 1.4116005105216712e-06, 'epoch': 2.5} +2025-02-06 02:36:32 - ERROR - stderr - 83%|████████▎ | 18709/22434 [16:28:52<2:41:09, 2.60s/it] +2025-02-06 02:36:35 - ERROR - stderr - 83%|████████▎ | 18710/22434 [16:28:54<2:40:11, 2.58s/it] +2025-02-06 02:36:35 - ERROR - stderr - +2025-02-06 02:36:35 - ERROR - stderr - +2025-02-06 02:36:35 - INFO - stdout - {'loss': 0.2707, 'grad_norm': 1.3103058338165283, 'learning_rate': 1.4108610494496934e-06, 'epoch': 2.5} +2025-02-06 02:36:35 - ERROR - stderr - 83%|████████▎ | 18710/22434 [16:28:54<2:40:11, 2.58s/it] +2025-02-06 02:36:37 - ERROR - stderr - 83%|████████▎ | 18711/22434 [16:28:57<2:39:38, 2.57s/it] +2025-02-06 02:36:37 - ERROR - stderr - +2025-02-06 02:36:37 - ERROR - stderr - +2025-02-06 02:36:37 - INFO - stdout - {'loss': 0.3697, 'grad_norm': 1.5738813877105713, 'learning_rate': 1.4101217674100975e-06, 'epoch': 2.5} +2025-02-06 02:36:37 - ERROR - stderr - 83%|████████▎ | 18711/22434 [16:28:57<2:39:38, 2.57s/it] +2025-02-06 02:36:40 - ERROR - stderr - 83%|████████▎ | 18712/22434 [16:28:59<2:39:19, 2.57s/it] +2025-02-06 02:36:40 - ERROR - stderr - +2025-02-06 02:36:40 - ERROR - stderr - +2025-02-06 02:36:40 - INFO - stdout - {'loss': 0.341, 'grad_norm': 1.4926611185073853, 'learning_rate': 1.4093826644182939e-06, 'epoch': 2.5} +2025-02-06 02:36:40 - ERROR - stderr - 83%|████████▎ | 18712/22434 [16:29:00<2:39:19, 2.57s/it] +2025-02-06 02:36:42 - ERROR - stderr - 83%|████████▎ | 18713/22434 [16:29:02<2:39:51, 2.58s/it] +2025-02-06 02:36:42 - ERROR - stderr - +2025-02-06 02:36:42 - ERROR - stderr - +2025-02-06 02:36:42 - INFO - stdout - {'loss': 0.3715, 'grad_norm': 1.6358656883239746, 'learning_rate': 1.408643740489688e-06, 'epoch': 2.5} +2025-02-06 02:36:42 - ERROR - stderr - 
83%|████████▎ | 18713/22434 [16:29:02<2:39:51, 2.58s/it] +2025-02-06 02:36:45 - ERROR - stderr - 83%|████████▎ | 18714/22434 [16:29:05<2:39:12, 2.57s/it] +2025-02-06 02:36:45 - ERROR - stderr - +2025-02-06 02:36:45 - ERROR - stderr - +2025-02-06 02:36:45 - INFO - stdout - {'loss': 0.3621, 'grad_norm': 1.7031357288360596, 'learning_rate': 1.4079049956396828e-06, 'epoch': 2.5} +2025-02-06 02:36:45 - ERROR - stderr - 83%|████████▎ | 18714/22434 [16:29:05<2:39:12, 2.57s/it] +2025-02-06 02:36:47 - ERROR - stderr - 83%|████████▎ | 18715/22434 [16:29:07<2:38:32, 2.56s/it] +2025-02-06 02:36:47 - ERROR - stderr - +2025-02-06 02:36:47 - ERROR - stderr - +2025-02-06 02:36:47 - INFO - stdout - {'loss': 0.3838, 'grad_norm': 1.6063978672027588, 'learning_rate': 1.4071664298836762e-06, 'epoch': 2.5} +2025-02-06 02:36:47 - ERROR - stderr - 83%|████████▎ | 18715/22434 [16:29:07<2:38:32, 2.56s/it] +2025-02-06 02:36:50 - ERROR - stderr - 83%|████████▎ | 18716/22434 [16:29:10<2:38:25, 2.56s/it] +2025-02-06 02:36:50 - ERROR - stderr - +2025-02-06 02:36:50 - ERROR - stderr - +2025-02-06 02:36:50 - INFO - stdout - {'loss': 0.3785, 'grad_norm': 1.4091458320617676, 'learning_rate': 1.4064280432370635e-06, 'epoch': 2.5} +2025-02-06 02:36:50 - ERROR - stderr - 83%|████████▎ | 18716/22434 [16:29:10<2:38:25, 2.56s/it] +2025-02-06 02:36:52 - ERROR - stderr - 83%|████████▎ | 18717/22434 [16:29:12<2:37:08, 2.54s/it] +2025-02-06 02:36:52 - ERROR - stderr - +2025-02-06 02:36:52 - ERROR - stderr - +2025-02-06 02:36:52 - INFO - stdout - {'loss': 0.354, 'grad_norm': 1.3704969882965088, 'learning_rate': 1.4056898357152338e-06, 'epoch': 2.5} +2025-02-06 02:36:52 - ERROR - stderr - 83%|████████▎ | 18717/22434 [16:29:12<2:37:08, 2.54s/it] +2025-02-06 02:36:55 - ERROR - stderr - 83%|████████▎ | 18718/22434 [16:29:15<2:37:25, 2.54s/it] +2025-02-06 02:36:55 - ERROR - stderr - +2025-02-06 02:36:55 - ERROR - stderr - +2025-02-06 02:36:55 - INFO - stdout - {'loss': 0.3657, 'grad_norm': 1.5371063947677612, 
'learning_rate': 1.4049518073335767e-06, 'epoch': 2.5} +2025-02-06 02:36:55 - ERROR - stderr - 83%|████████▎ | 18718/22434 [16:29:15<2:37:25, 2.54s/it] +2025-02-06 02:36:57 - ERROR - stderr - 83%|████████▎ | 18719/22434 [16:29:17<2:36:24, 2.53s/it] +2025-02-06 02:36:58 - ERROR - stderr - +2025-02-06 02:36:58 - ERROR - stderr - +2025-02-06 02:36:58 - INFO - stdout - {'loss': 0.4203, 'grad_norm': 1.608318567276001, 'learning_rate': 1.4042139581074765e-06, 'epoch': 2.5} +2025-02-06 02:36:58 - ERROR - stderr - 83%|████████▎ | 18719/22434 [16:29:17<2:36:24, 2.53s/it] +2025-02-06 02:37:00 - ERROR - stderr - 83%|████████▎ | 18720/22434 [16:29:20<2:36:56, 2.54s/it] +2025-02-06 02:37:00 - ERROR - stderr - +2025-02-06 02:37:00 - ERROR - stderr - +2025-02-06 02:37:00 - INFO - stdout - {'loss': 0.3533, 'grad_norm': 1.4850339889526367, 'learning_rate': 1.4034762880523068e-06, 'epoch': 2.5} +2025-02-06 02:37:00 - ERROR - stderr - 83%|████████▎ | 18720/22434 [16:29:20<2:36:56, 2.54s/it] +2025-02-06 02:37:03 - ERROR - stderr - 83%|████████▎ | 18721/22434 [16:29:23<2:41:21, 2.61s/it] +2025-02-06 02:37:03 - ERROR - stderr - +2025-02-06 02:37:03 - ERROR - stderr - +2025-02-06 02:37:03 - INFO - stdout - {'loss': 0.3691, 'grad_norm': 1.4904999732971191, 'learning_rate': 1.4027387971834495e-06, 'epoch': 2.5} +2025-02-06 02:37:03 - ERROR - stderr - 83%|████████▎ | 18721/22434 [16:29:23<2:41:21, 2.61s/it] +2025-02-06 02:37:05 - ERROR - stderr - 83%|████████▎ | 18722/22434 [16:29:25<2:43:11, 2.64s/it] +2025-02-06 02:37:06 - ERROR - stderr - +2025-02-06 02:37:06 - ERROR - stderr - +2025-02-06 02:37:06 - INFO - stdout - {'loss': 0.3924, 'grad_norm': 1.6859025955200195, 'learning_rate': 1.4020014855162755e-06, 'epoch': 2.5} +2025-02-06 02:37:06 - ERROR - stderr - 83%|████████▎ | 18722/22434 [16:29:25<2:43:11, 2.64s/it] +2025-02-06 02:37:08 - ERROR - stderr - 83%|████████▎ | 18723/22434 [16:29:28<2:41:02, 2.60s/it] +2025-02-06 02:37:08 - ERROR - stderr - +2025-02-06 02:37:08 - ERROR - stderr - 
+2025-02-06 02:37:08 - INFO - stdout - {'loss': 0.3468, 'grad_norm': 1.483705997467041, 'learning_rate': 1.4012643530661529e-06, 'epoch': 2.5} +2025-02-06 02:37:08 - ERROR - stderr - 83%|████████▎ | 18723/22434 [16:29:28<2:41:02, 2.60s/it] +2025-02-06 02:37:11 - ERROR - stderr - 83%|████████▎ | 18724/22434 [16:29:30<2:39:08, 2.57s/it] +2025-02-06 02:37:11 - ERROR - stderr - +2025-02-06 02:37:11 - ERROR - stderr - +2025-02-06 02:37:11 - INFO - stdout - {'loss': 0.38, 'grad_norm': 1.6232643127441406, 'learning_rate': 1.4005273998484504e-06, 'epoch': 2.5} +2025-02-06 02:37:11 - ERROR - stderr - 83%|████████▎ | 18724/22434 [16:29:30<2:39:08, 2.57s/it] +2025-02-06 02:37:13 - ERROR - stderr - 83%|████████▎ | 18725/22434 [16:29:33<2:37:27, 2.55s/it] +2025-02-06 02:37:13 - ERROR - stderr - +2025-02-06 02:37:13 - ERROR - stderr - +2025-02-06 02:37:13 - INFO - stdout - {'loss': 0.3424, 'grad_norm': 1.425099492073059, 'learning_rate': 1.3997906258785188e-06, 'epoch': 2.5} +2025-02-06 02:37:13 - ERROR - stderr - 83%|████████▎ | 18725/22434 [16:29:33<2:37:27, 2.55s/it] +2025-02-06 02:37:15 - ERROR - stderr - 83%|████████▎ | 18726/22434 [16:29:35<2:35:20, 2.51s/it] +2025-02-06 02:37:15 - ERROR - stderr - +2025-02-06 02:37:15 - ERROR - stderr - +2025-02-06 02:37:15 - INFO - stdout - {'loss': 0.3288, 'grad_norm': 1.520273208618164, 'learning_rate': 1.3990540311717282e-06, 'epoch': 2.5} +2025-02-06 02:37:15 - ERROR - stderr - 83%|████████▎ | 18726/22434 [16:29:35<2:35:20, 2.51s/it] +2025-02-06 02:37:18 - ERROR - stderr - 83%|████████▎ | 18727/22434 [16:29:38<2:35:52, 2.52s/it] +2025-02-06 02:37:18 - ERROR - stderr - +2025-02-06 02:37:18 - ERROR - stderr - +2025-02-06 02:37:18 - INFO - stdout - {'loss': 0.3721, 'grad_norm': 1.5543946027755737, 'learning_rate': 1.398317615743423e-06, 'epoch': 2.5} +2025-02-06 02:37:18 - ERROR - stderr - 83%|████████▎ | 18727/22434 [16:29:38<2:35:52, 2.52s/it] +2025-02-06 02:37:20 - ERROR - stderr - 83%|████████▎ | 18728/22434 [16:29:40<2:34:44, 
2.51s/it] +2025-02-06 02:37:21 - ERROR - stderr - +2025-02-06 02:37:21 - ERROR - stderr - +2025-02-06 02:37:21 - INFO - stdout - {'loss': 0.3153, 'grad_norm': 1.4910900592803955, 'learning_rate': 1.3975813796089566e-06, 'epoch': 2.5} +2025-02-06 02:37:21 - ERROR - stderr - 83%|████████▎ | 18728/22434 [16:29:40<2:34:44, 2.51s/it] +2025-02-06 02:37:23 - ERROR - stderr - 83%|████████▎ | 18729/22434 [16:29:43<2:34:54, 2.51s/it] +2025-02-06 02:37:23 - ERROR - stderr - +2025-02-06 02:37:23 - ERROR - stderr - +2025-02-06 02:37:23 - INFO - stdout - {'loss': 0.3559, 'grad_norm': 1.5485210418701172, 'learning_rate': 1.3968453227836753e-06, 'epoch': 2.5} +2025-02-06 02:37:23 - ERROR - stderr - 83%|████████▎ | 18729/22434 [16:29:43<2:34:54, 2.51s/it] +2025-02-06 02:37:25 - ERROR - stderr - 83%|████████▎ | 18730/22434 [16:29:45<2:34:35, 2.50s/it] +2025-02-06 02:37:26 - ERROR - stderr - +2025-02-06 02:37:26 - ERROR - stderr - +2025-02-06 02:37:26 - INFO - stdout - {'loss': 0.4147, 'grad_norm': 1.681546926498413, 'learning_rate': 1.3961094452829182e-06, 'epoch': 2.5} +2025-02-06 02:37:26 - ERROR - stderr - 83%|████████▎ | 18730/22434 [16:29:45<2:34:35, 2.50s/it] +2025-02-06 02:37:28 - ERROR - stderr - 83%|████████▎ | 18731/22434 [16:29:48<2:35:33, 2.52s/it] +2025-02-06 02:37:28 - ERROR - stderr - +2025-02-06 02:37:28 - ERROR - stderr - +2025-02-06 02:37:28 - INFO - stdout - {'loss': 0.3745, 'grad_norm': 1.366850733757019, 'learning_rate': 1.3953737471220307e-06, 'epoch': 2.5} +2025-02-06 02:37:28 - ERROR - stderr - 83%|████████▎ | 18731/22434 [16:29:48<2:35:33, 2.52s/it] +2025-02-06 02:37:28 - INFO - stdout - WARNING: tokenization mismatch: 96 vs. 114. 
(ignored) +2025-02-06 02:37:31 - ERROR - stderr - 83%|████████▎ | 18732/22434 [16:29:50<2:34:54, 2.51s/it] +2025-02-06 02:37:31 - ERROR - stderr - +2025-02-06 02:37:31 - ERROR - stderr - +2025-02-06 02:37:31 - INFO - stdout - {'loss': 0.4468, 'grad_norm': 1.7842521667480469, 'learning_rate': 1.3946382283163417e-06, 'epoch': 2.5} +2025-02-06 02:37:31 - ERROR - stderr - 83%|████████▎ | 18732/22434 [16:29:50<2:34:54, 2.51s/it] +2025-02-06 02:37:33 - ERROR - stderr - 84%|████████▎ | 18733/22434 [16:29:53<2:34:06, 2.50s/it] +2025-02-06 02:37:33 - ERROR - stderr - +2025-02-06 02:37:33 - ERROR - stderr - +2025-02-06 02:37:33 - INFO - stdout - {'loss': 0.3757, 'grad_norm': 1.5375540256500244, 'learning_rate': 1.3939028888811845e-06, 'epoch': 2.51} +2025-02-06 02:37:33 - ERROR - stderr - 84%|████████▎ | 18733/22434 [16:29:53<2:34:06, 2.50s/it] +2025-02-06 02:37:36 - ERROR - stderr - 84%|████████▎ | 18734/22434 [16:29:55<2:36:02, 2.53s/it] +2025-02-06 02:37:36 - ERROR - stderr - +2025-02-06 02:37:36 - ERROR - stderr - +2025-02-06 02:37:36 - INFO - stdout - {'loss': 0.3455, 'grad_norm': 1.53855299949646, 'learning_rate': 1.3931677288318868e-06, 'epoch': 2.51} +2025-02-06 02:37:36 - ERROR - stderr - 84%|████████▎ | 18734/22434 [16:29:55<2:36:02, 2.53s/it] +2025-02-06 02:37:38 - ERROR - stderr - 84%|████████▎ | 18735/22434 [16:29:58<2:35:29, 2.52s/it] +2025-02-06 02:37:38 - ERROR - stderr - +2025-02-06 02:37:38 - ERROR - stderr - +2025-02-06 02:37:38 - INFO - stdout - {'loss': 0.3927, 'grad_norm': 1.6233115196228027, 'learning_rate': 1.3924327481837708e-06, 'epoch': 2.51} +2025-02-06 02:37:38 - ERROR - stderr - 84%|████████▎ | 18735/22434 [16:29:58<2:35:29, 2.52s/it] +2025-02-06 02:37:41 - ERROR - stderr - 84%|████████▎ | 18736/22434 [16:30:00<2:36:12, 2.53s/it] +2025-02-06 02:37:41 - ERROR - stderr - +2025-02-06 02:37:41 - ERROR - stderr - +2025-02-06 02:37:41 - INFO - stdout - {'loss': 0.3531, 'grad_norm': 1.5496952533721924, 'learning_rate': 1.3916979469521585e-06, 'epoch': 
2.51} +2025-02-06 02:37:41 - ERROR - stderr - 84%|████████▎ | 18736/22434 [16:30:00<2:36:12, 2.53s/it] +2025-02-06 02:37:43 - ERROR - stderr - 84%|████████▎ | 18737/22434 [16:30:03<2:35:15, 2.52s/it] +2025-02-06 02:37:43 - ERROR - stderr - +2025-02-06 02:37:43 - ERROR - stderr - +2025-02-06 02:37:43 - INFO - stdout - {'loss': 0.289, 'grad_norm': 1.4044309854507446, 'learning_rate': 1.3909633251523657e-06, 'epoch': 2.51} +2025-02-06 02:37:43 - ERROR - stderr - 84%|████████▎ | 18737/22434 [16:30:03<2:35:15, 2.52s/it] +2025-02-06 02:37:46 - ERROR - stderr - 84%|████████▎ | 18738/22434 [16:30:05<2:34:14, 2.50s/it] +2025-02-06 02:37:46 - ERROR - stderr - +2025-02-06 02:37:46 - ERROR - stderr - +2025-02-06 02:37:46 - INFO - stdout - {'loss': 0.3726, 'grad_norm': 1.5986356735229492, 'learning_rate': 1.3902288827997035e-06, 'epoch': 2.51} +2025-02-06 02:37:46 - ERROR - stderr - 84%|████████▎ | 18738/22434 [16:30:05<2:34:14, 2.50s/it] +2025-02-06 02:37:48 - ERROR - stderr - 84%|████████▎ | 18739/22434 [16:30:08<2:34:22, 2.51s/it] +2025-02-06 02:37:48 - ERROR - stderr - +2025-02-06 02:37:48 - ERROR - stderr - +2025-02-06 02:37:48 - INFO - stdout - {'loss': 0.4141, 'grad_norm': 1.588167667388916, 'learning_rate': 1.3894946199094816e-06, 'epoch': 2.51} +2025-02-06 02:37:48 - ERROR - stderr - 84%|████████▎ | 18739/22434 [16:30:08<2:34:22, 2.51s/it] +2025-02-06 02:37:51 - ERROR - stderr - 84%|████████▎ | 18740/22434 [16:30:10<2:34:24, 2.51s/it] +2025-02-06 02:37:51 - ERROR - stderr - +2025-02-06 02:37:51 - ERROR - stderr - +2025-02-06 02:37:51 - INFO - stdout - {'loss': 0.3517, 'grad_norm': 1.4581599235534668, 'learning_rate': 1.3887605364970058e-06, 'epoch': 2.51} +2025-02-06 02:37:51 - ERROR - stderr - 84%|████████▎ | 18740/22434 [16:30:10<2:34:24, 2.51s/it] +2025-02-06 02:37:53 - ERROR - stderr - 84%|████████▎ | 18741/22434 [16:30:13<2:33:54, 2.50s/it] +2025-02-06 02:37:53 - ERROR - stderr - +2025-02-06 02:37:53 - ERROR - stderr - +2025-02-06 02:37:53 - INFO - stdout - 
{'loss': 0.3392, 'grad_norm': 1.5078647136688232, 'learning_rate': 1.388026632577576e-06, 'epoch': 2.51} +2025-02-06 02:37:53 - ERROR - stderr - 84%|████████▎ | 18741/22434 [16:30:13<2:33:54, 2.50s/it] +2025-02-06 02:37:56 - ERROR - stderr - 84%|████████▎ | 18742/22434 [16:30:15<2:33:16, 2.49s/it] +2025-02-06 02:37:56 - ERROR - stderr - +2025-02-06 02:37:56 - ERROR - stderr - +2025-02-06 02:37:56 - INFO - stdout - {'loss': 0.344, 'grad_norm': 1.5856362581253052, 'learning_rate': 1.387292908166491e-06, 'epoch': 2.51} +2025-02-06 02:37:56 - ERROR - stderr - 84%|████████▎ | 18742/22434 [16:30:15<2:33:16, 2.49s/it] +2025-02-06 02:37:58 - ERROR - stderr - 84%|████████▎ | 18743/22434 [16:30:18<2:33:43, 2.50s/it] +2025-02-06 02:37:58 - ERROR - stderr - +2025-02-06 02:37:58 - ERROR - stderr - +2025-02-06 02:37:58 - INFO - stdout - {'loss': 0.3744, 'grad_norm': 1.5700857639312744, 'learning_rate': 1.3865593632790453e-06, 'epoch': 2.51} +2025-02-06 02:37:58 - ERROR - stderr - 84%|████████▎ | 18743/22434 [16:30:18<2:33:43, 2.50s/it] +2025-02-06 02:38:01 - ERROR - stderr - 84%|████████▎ | 18744/22434 [16:30:20<2:34:45, 2.52s/it] +2025-02-06 02:38:01 - ERROR - stderr - +2025-02-06 02:38:01 - ERROR - stderr - +2025-02-06 02:38:01 - INFO - stdout - {'loss': 0.388, 'grad_norm': 1.5993281602859497, 'learning_rate': 1.3858259979305234e-06, 'epoch': 2.51} +2025-02-06 02:38:01 - ERROR - stderr - 84%|████████▎ | 18744/22434 [16:30:20<2:34:45, 2.52s/it] +2025-02-06 02:38:03 - ERROR - stderr - 84%|████████▎ | 18745/22434 [16:30:23<2:34:12, 2.51s/it] +2025-02-06 02:38:03 - ERROR - stderr - +2025-02-06 02:38:03 - ERROR - stderr - +2025-02-06 02:38:03 - INFO - stdout - {'loss': 0.4141, 'grad_norm': 1.5850213766098022, 'learning_rate': 1.3850928121362195e-06, 'epoch': 2.51} +2025-02-06 02:38:03 - ERROR - stderr - 84%|████████▎ | 18745/22434 [16:30:23<2:34:12, 2.51s/it] +2025-02-06 02:38:06 - ERROR - stderr - 84%|████████▎ | 18746/22434 [16:30:25<2:33:42, 2.50s/it] +2025-02-06 02:38:06 - 
ERROR - stderr - +2025-02-06 02:38:06 - ERROR - stderr - +2025-02-06 02:38:06 - INFO - stdout - {'loss': 0.4112, 'grad_norm': 1.6332415342330933, 'learning_rate': 1.3843598059114083e-06, 'epoch': 2.51} +2025-02-06 02:38:06 - ERROR - stderr - 84%|████████▎ | 18746/22434 [16:30:25<2:33:42, 2.50s/it] +2025-02-06 02:38:08 - ERROR - stderr - 84%|████████▎ | 18747/22434 [16:30:28<2:34:23, 2.51s/it] +2025-02-06 02:38:08 - ERROR - stderr - +2025-02-06 02:38:08 - ERROR - stderr - +2025-02-06 02:38:08 - INFO - stdout - {'loss': 0.364, 'grad_norm': 1.3776710033416748, 'learning_rate': 1.3836269792713774e-06, 'epoch': 2.51} +2025-02-06 02:38:08 - ERROR - stderr - 84%|████████▎ | 18747/22434 [16:30:28<2:34:23, 2.51s/it] +2025-02-06 02:38:11 - ERROR - stderr - 84%|████████▎ | 18748/22434 [16:30:30<2:34:53, 2.52s/it] +2025-02-06 02:38:11 - ERROR - stderr - +2025-02-06 02:38:11 - ERROR - stderr - +2025-02-06 02:38:11 - INFO - stdout - {'loss': 0.3742, 'grad_norm': 1.513225793838501, 'learning_rate': 1.382894332231395e-06, 'epoch': 2.51} +2025-02-06 02:38:11 - ERROR - stderr - 84%|████████▎ | 18748/22434 [16:30:31<2:34:53, 2.52s/it] +2025-02-06 02:38:13 - ERROR - stderr - 84%|████████▎ | 18749/22434 [16:30:33<2:33:36, 2.50s/it] +2025-02-06 02:38:13 - ERROR - stderr - +2025-02-06 02:38:13 - ERROR - stderr - +2025-02-06 02:38:13 - INFO - stdout - {'loss': 0.341, 'grad_norm': 1.4606261253356934, 'learning_rate': 1.3821618648067314e-06, 'epoch': 2.51} +2025-02-06 02:38:13 - ERROR - stderr - 84%|████████▎ | 18749/22434 [16:30:33<2:33:36, 2.50s/it] +2025-02-06 02:38:16 - ERROR - stderr - 84%|████████▎ | 18750/22434 [16:30:35<2:34:21, 2.51s/it] +2025-02-06 02:38:16 - ERROR - stderr - +2025-02-06 02:38:16 - ERROR - stderr - +2025-02-06 02:38:16 - INFO - stdout - {'loss': 0.3273, 'grad_norm': 1.4419642686843872, 'learning_rate': 1.381429577012663e-06, 'epoch': 2.51} +2025-02-06 02:38:16 - ERROR - stderr - 84%|████████▎ | 18750/22434 [16:30:36<2:34:21, 2.51s/it] +2025-02-06 02:38:18 - ERROR 
- stderr - 84%|████████▎ | 18751/22434 [16:30:38<2:33:22, 2.50s/it] +2025-02-06 02:38:18 - ERROR - stderr - +2025-02-06 02:38:18 - ERROR - stderr - +2025-02-06 02:38:18 - INFO - stdout - {'loss': 0.4025, 'grad_norm': 1.7148438692092896, 'learning_rate': 1.3806974688644449e-06, 'epoch': 2.51} +2025-02-06 02:38:18 - ERROR - stderr - 84%|████████▎ | 18751/22434 [16:30:38<2:33:22, 2.50s/it] +2025-02-06 02:38:21 - ERROR - stderr - 84%|████████▎ | 18752/22434 [16:30:40<2:34:01, 2.51s/it] +2025-02-06 02:38:21 - ERROR - stderr - +2025-02-06 02:38:21 - ERROR - stderr - +2025-02-06 02:38:21 - INFO - stdout - {'loss': 0.3458, 'grad_norm': 1.47361159324646, 'learning_rate': 1.3799655403773405e-06, 'epoch': 2.51} +2025-02-06 02:38:21 - ERROR - stderr - 84%|████████▎ | 18752/22434 [16:30:41<2:34:01, 2.51s/it] +2025-02-06 02:38:23 - ERROR - stderr - 84%|████████▎ | 18753/22434 [16:30:43<2:32:48, 2.49s/it] +2025-02-06 02:38:23 - ERROR - stderr - +2025-02-06 02:38:23 - ERROR - stderr - +2025-02-06 02:38:23 - INFO - stdout - {'loss': 0.3662, 'grad_norm': 1.5109809637069702, 'learning_rate': 1.3792337915666065e-06, 'epoch': 2.51} +2025-02-06 02:38:23 - ERROR - stderr - 84%|████████▎ | 18753/22434 [16:30:43<2:32:48, 2.49s/it] +2025-02-06 02:38:26 - ERROR - stderr - 84%|████████▎ | 18754/22434 [16:30:45<2:33:38, 2.50s/it] +2025-02-06 02:38:26 - ERROR - stderr - +2025-02-06 02:38:26 - ERROR - stderr - +2025-02-06 02:38:26 - INFO - stdout - {'loss': 0.3199, 'grad_norm': 1.3976682424545288, 'learning_rate': 1.3785022224474943e-06, 'epoch': 2.51} +2025-02-06 02:38:26 - ERROR - stderr - 84%|████████▎ | 18754/22434 [16:30:46<2:33:38, 2.50s/it] +2025-02-06 02:38:28 - ERROR - stderr - 84%|████████▎ | 18755/22434 [16:30:48<2:34:43, 2.52s/it] +2025-02-06 02:38:28 - ERROR - stderr - +2025-02-06 02:38:28 - ERROR - stderr - +2025-02-06 02:38:28 - INFO - stdout - {'loss': 0.3491, 'grad_norm': 1.3809337615966797, 'learning_rate': 1.3777708330352534e-06, 'epoch': 2.51} +2025-02-06 02:38:28 - ERROR - 
stderr - 84%|████████▎ | 18755/22434 [16:30:48<2:34:43, 2.52s/it] +2025-02-06 02:38:31 - ERROR - stderr - 84%|████████▎ | 18756/22434 [16:30:51<2:34:16, 2.52s/it] +2025-02-06 02:38:31 - ERROR - stderr - +2025-02-06 02:38:31 - ERROR - stderr - +2025-02-06 02:38:31 - INFO - stdout - {'loss': 0.3941, 'grad_norm': 1.5658472776412964, 'learning_rate': 1.3770396233451288e-06, 'epoch': 2.51} +2025-02-06 02:38:31 - ERROR - stderr - 84%|████████▎ | 18756/22434 [16:30:51<2:34:16, 2.52s/it] +2025-02-06 02:38:33 - ERROR - stderr - 84%|████████▎ | 18757/22434 [16:30:53<2:33:50, 2.51s/it] +2025-02-06 02:38:33 - ERROR - stderr - +2025-02-06 02:38:33 - ERROR - stderr - +2025-02-06 02:38:33 - INFO - stdout - {'loss': 0.3864, 'grad_norm': 1.6389915943145752, 'learning_rate': 1.3763085933923626e-06, 'epoch': 2.51} +2025-02-06 02:38:33 - ERROR - stderr - 84%|████████▎ | 18757/22434 [16:30:53<2:33:50, 2.51s/it] +2025-02-06 02:38:36 - ERROR - stderr - 84%|████████▎ | 18758/22434 [16:30:55<2:32:44, 2.49s/it] +2025-02-06 02:38:36 - ERROR - stderr - +2025-02-06 02:38:36 - ERROR - stderr - +2025-02-06 02:38:36 - INFO - stdout - {'loss': 0.3792, 'grad_norm': 1.5828673839569092, 'learning_rate': 1.3755777431921912e-06, 'epoch': 2.51} +2025-02-06 02:38:36 - ERROR - stderr - 84%|████████▎ | 18758/22434 [16:30:56<2:32:44, 2.49s/it] +2025-02-06 02:38:38 - ERROR - stderr - 84%|████████▎ | 18759/22434 [16:30:58<2:31:24, 2.47s/it] +2025-02-06 02:38:38 - ERROR - stderr - +2025-02-06 02:38:38 - ERROR - stderr - +2025-02-06 02:38:38 - INFO - stdout - {'loss': 0.3857, 'grad_norm': 1.604867935180664, 'learning_rate': 1.3748470727598496e-06, 'epoch': 2.51} +2025-02-06 02:38:38 - ERROR - stderr - 84%|████████▎ | 18759/22434 [16:30:58<2:31:24, 2.47s/it] +2025-02-06 02:38:41 - ERROR - stderr - 84%|████████▎ | 18760/22434 [16:31:00<2:32:07, 2.48s/it] +2025-02-06 02:38:41 - ERROR - stderr - +2025-02-06 02:38:41 - ERROR - stderr - +2025-02-06 02:38:41 - INFO - stdout - {'loss': 0.4127, 'grad_norm': 
1.6293370723724365, 'learning_rate': 1.3741165821105674e-06, 'epoch': 2.51} +2025-02-06 02:38:41 - ERROR - stderr - 84%|████████▎ | 18760/22434 [16:31:00<2:32:07, 2.48s/it] +2025-02-06 02:38:43 - ERROR - stderr - 84%|████████▎ | 18761/22434 [16:31:03<2:32:14, 2.49s/it] +2025-02-06 02:38:43 - ERROR - stderr - +2025-02-06 02:38:43 - ERROR - stderr - +2025-02-06 02:38:43 - INFO - stdout - {'loss': 0.3211, 'grad_norm': 1.5185595750808716, 'learning_rate': 1.3733862712595702e-06, 'epoch': 2.51} +2025-02-06 02:38:43 - ERROR - stderr - 84%|████████▎ | 18761/22434 [16:31:03<2:32:14, 2.49s/it] +2025-02-06 02:38:46 - ERROR - stderr - 84%|████████▎ | 18762/22434 [16:31:05<2:31:47, 2.48s/it] +2025-02-06 02:38:46 - ERROR - stderr - +2025-02-06 02:38:46 - ERROR - stderr - +2025-02-06 02:38:46 - INFO - stdout - {'loss': 0.332, 'grad_norm': 1.5716077089309692, 'learning_rate': 1.3726561402220818e-06, 'epoch': 2.51} +2025-02-06 02:38:46 - ERROR - stderr - 84%|████████▎ | 18762/22434 [16:31:05<2:31:47, 2.48s/it] +2025-02-06 02:38:48 - ERROR - stderr - 84%|████████▎ | 18763/22434 [16:31:08<2:33:07, 2.50s/it] +2025-02-06 02:38:48 - ERROR - stderr - +2025-02-06 02:38:48 - ERROR - stderr - +2025-02-06 02:38:48 - INFO - stdout - {'loss': 0.3614, 'grad_norm': 1.5485892295837402, 'learning_rate': 1.3719261890133206e-06, 'epoch': 2.51} +2025-02-06 02:38:48 - ERROR - stderr - 84%|████████▎ | 18763/22434 [16:31:08<2:33:07, 2.50s/it] +2025-02-06 02:38:51 - ERROR - stderr - 84%|████████▎ | 18764/22434 [16:31:10<2:33:14, 2.51s/it] +2025-02-06 02:38:51 - ERROR - stderr - +2025-02-06 02:38:51 - ERROR - stderr - +2025-02-06 02:38:51 - INFO - stdout - {'loss': 0.3266, 'grad_norm': 1.4621496200561523, 'learning_rate': 1.3711964176485049e-06, 'epoch': 2.51} +2025-02-06 02:38:51 - ERROR - stderr - 84%|████████▎ | 18764/22434 [16:31:10<2:33:14, 2.51s/it] +2025-02-06 02:38:53 - ERROR - stderr - 84%|████████▎ | 18765/22434 [16:31:13<2:32:59, 2.50s/it] +2025-02-06 02:38:53 - ERROR - stderr - +2025-02-06 
02:38:53 - ERROR - stderr - +2025-02-06 02:38:53 - INFO - stdout - {'loss': 0.3716, 'grad_norm': 1.573520302772522, 'learning_rate': 1.3704668261428377e-06, 'epoch': 2.51} +2025-02-06 02:38:53 - ERROR - stderr - 84%|████████▎ | 18765/22434 [16:31:13<2:32:59, 2.50s/it] +2025-02-06 02:38:56 - ERROR - stderr - 84%|████████▎ | 18766/22434 [16:31:15<2:32:43, 2.50s/it] +2025-02-06 02:38:56 - ERROR - stderr - +2025-02-06 02:38:56 - ERROR - stderr - +2025-02-06 02:38:56 - INFO - stdout - {'loss': 0.313, 'grad_norm': 1.293465495109558, 'learning_rate': 1.369737414511536e-06, 'epoch': 2.51} +2025-02-06 02:38:56 - ERROR - stderr - 84%|████████▎ | 18766/22434 [16:31:15<2:32:43, 2.50s/it] +2025-02-06 02:38:59 - ERROR - stderr - 84%|████████▎ | 18767/22434 [16:31:18<2:42:01, 2.65s/it] +2025-02-06 02:38:59 - ERROR - stderr - +2025-02-06 02:38:59 - ERROR - stderr - +2025-02-06 02:38:59 - INFO - stdout - {'loss': 0.3308, 'grad_norm': 1.499725103378296, 'learning_rate': 1.3690081827697988e-06, 'epoch': 2.51} +2025-02-06 02:38:59 - ERROR - stderr - 84%|████████▎ | 18767/22434 [16:31:18<2:42:01, 2.65s/it] +2025-02-06 02:39:01 - ERROR - stderr - 84%|████████▎ | 18768/22434 [16:31:21<2:37:15, 2.57s/it] +2025-02-06 02:39:01 - ERROR - stderr - +2025-02-06 02:39:01 - ERROR - stderr - +2025-02-06 02:39:01 - INFO - stdout - {'loss': 0.4017, 'grad_norm': 1.686970591545105, 'learning_rate': 1.3682791309328236e-06, 'epoch': 2.51} +2025-02-06 02:39:01 - ERROR - stderr - 84%|████████▎ | 18768/22434 [16:31:21<2:37:15, 2.57s/it] +2025-02-06 02:39:04 - ERROR - stderr - 84%|████████▎ | 18769/22434 [16:31:23<2:37:34, 2.58s/it] +2025-02-06 02:39:04 - ERROR - stderr - +2025-02-06 02:39:04 - ERROR - stderr - +2025-02-06 02:39:04 - INFO - stdout - {'loss': 0.3788, 'grad_norm': 1.6373852491378784, 'learning_rate': 1.367550259015815e-06, 'epoch': 2.51} +2025-02-06 02:39:04 - ERROR - stderr - 84%|████████▎ | 18769/22434 [16:31:23<2:37:34, 2.58s/it] +2025-02-06 02:39:06 - ERROR - stderr - 84%|████████▎ | 
18770/22434 [16:31:26<2:37:13, 2.57s/it] +2025-02-06 02:39:06 - ERROR - stderr - +2025-02-06 02:39:06 - ERROR - stderr - +2025-02-06 02:39:06 - INFO - stdout - {'loss': 0.3121, 'grad_norm': 1.4756332635879517, 'learning_rate': 1.3668215670339569e-06, 'epoch': 2.51} +2025-02-06 02:39:06 - ERROR - stderr - 84%|████████▎ | 18770/22434 [16:31:26<2:37:13, 2.57s/it] +2025-02-06 02:39:09 - ERROR - stderr - 84%|████████▎ | 18771/22434 [16:31:29<2:36:46, 2.57s/it] +2025-02-06 02:39:09 - ERROR - stderr - +2025-02-06 02:39:09 - ERROR - stderr - +2025-02-06 02:39:09 - INFO - stdout - {'loss': 0.3321, 'grad_norm': 1.5063499212265015, 'learning_rate': 1.3660930550024454e-06, 'epoch': 2.51} +2025-02-06 02:39:09 - ERROR - stderr - 84%|████████▎ | 18771/22434 [16:31:29<2:36:46, 2.57s/it] +2025-02-06 02:39:11 - ERROR - stderr - 84%|████████▎ | 18772/22434 [16:31:31<2:35:00, 2.54s/it] +2025-02-06 02:39:11 - ERROR - stderr - +2025-02-06 02:39:11 - ERROR - stderr - +2025-02-06 02:39:11 - INFO - stdout - {'loss': 0.4127, 'grad_norm': 1.851346731185913, 'learning_rate': 1.3653647229364619e-06, 'epoch': 2.51} +2025-02-06 02:39:11 - ERROR - stderr - 84%|████████▎ | 18772/22434 [16:31:31<2:35:00, 2.54s/it] +2025-02-06 02:39:14 - ERROR - stderr - 84%|████████▎ | 18773/22434 [16:31:34<2:34:17, 2.53s/it] +2025-02-06 02:39:14 - ERROR - stderr - +2025-02-06 02:39:14 - ERROR - stderr - +2025-02-06 02:39:14 - INFO - stdout - {'loss': 0.3472, 'grad_norm': 1.528391718864441, 'learning_rate': 1.3646365708511867e-06, 'epoch': 2.51} +2025-02-06 02:39:14 - ERROR - stderr - 84%|████████▎ | 18773/22434 [16:31:34<2:34:17, 2.53s/it] +2025-02-06 02:39:16 - ERROR - stderr - 84%|████████▎ | 18774/22434 [16:31:36<2:32:09, 2.49s/it] +2025-02-06 02:39:16 - ERROR - stderr - +2025-02-06 02:39:16 - ERROR - stderr - +2025-02-06 02:39:16 - INFO - stdout - {'loss': 0.4041, 'grad_norm': 1.6154814958572388, 'learning_rate': 1.3639085987618005e-06, 'epoch': 2.51} +2025-02-06 02:39:16 - ERROR - stderr - 84%|████████▎ | 
18774/22434 [16:31:36<2:32:09, 2.49s/it] +2025-02-06 02:39:19 - ERROR - stderr - 84%|████████▎ | 18775/22434 [16:31:38<2:31:17, 2.48s/it] +2025-02-06 02:39:19 - ERROR - stderr - +2025-02-06 02:39:19 - ERROR - stderr - +2025-02-06 02:39:19 - INFO - stdout - {'loss': 0.3969, 'grad_norm': 1.6219675540924072, 'learning_rate': 1.363180806683475e-06, 'epoch': 2.51} +2025-02-06 02:39:19 - ERROR - stderr - 84%|████████▎ | 18775/22434 [16:31:38<2:31:17, 2.48s/it] +2025-02-06 02:39:21 - ERROR - stderr - 84%|████████▎ | 18776/22434 [16:31:41<2:31:39, 2.49s/it] +2025-02-06 02:39:21 - ERROR - stderr - +2025-02-06 02:39:21 - ERROR - stderr - +2025-02-06 02:39:21 - INFO - stdout - {'loss': 0.3738, 'grad_norm': 1.368686556816101, 'learning_rate': 1.3624531946313812e-06, 'epoch': 2.51} +2025-02-06 02:39:21 - ERROR - stderr - 84%|████████▎ | 18776/22434 [16:31:41<2:31:39, 2.49s/it] +2025-02-06 02:39:24 - ERROR - stderr - 84%|████████▎ | 18777/22434 [16:31:43<2:30:31, 2.47s/it] +2025-02-06 02:39:24 - ERROR - stderr - +2025-02-06 02:39:24 - ERROR - stderr - +2025-02-06 02:39:24 - INFO - stdout - {'loss': 0.3337, 'grad_norm': 1.381800889968872, 'learning_rate': 1.3617257626206849e-06, 'epoch': 2.51} +2025-02-06 02:39:24 - ERROR - stderr - 84%|████████▎ | 18777/22434 [16:31:43<2:30:31, 2.47s/it] +2025-02-06 02:39:26 - ERROR - stderr - 84%|████████▎ | 18778/22434 [16:31:46<2:30:26, 2.47s/it] +2025-02-06 02:39:26 - ERROR - stderr - +2025-02-06 02:39:26 - ERROR - stderr - +2025-02-06 02:39:26 - INFO - stdout - {'loss': 0.3588, 'grad_norm': 1.5714846849441528, 'learning_rate': 1.3609985106665491e-06, 'epoch': 2.51} +2025-02-06 02:39:26 - ERROR - stderr - 84%|████████▎ | 18778/22434 [16:31:46<2:30:26, 2.47s/it] +2025-02-06 02:39:28 - ERROR - stderr - 84%|████████▎ | 18779/22434 [16:31:48<2:30:47, 2.48s/it] +2025-02-06 02:39:29 - ERROR - stderr - +2025-02-06 02:39:29 - ERROR - stderr - +2025-02-06 02:39:29 - INFO - stdout - {'loss': 0.38, 'grad_norm': 1.4829574823379517, 'learning_rate': 
1.3602714387841332e-06, 'epoch': 2.51} +2025-02-06 02:39:29 - ERROR - stderr - 84%|████████▎ | 18779/22434 [16:31:48<2:30:47, 2.48s/it] +2025-02-06 02:39:31 - ERROR - stderr - 84%|████████▎ | 18780/22434 [16:31:51<2:29:33, 2.46s/it] +2025-02-06 02:39:31 - ERROR - stderr - +2025-02-06 02:39:31 - ERROR - stderr - +2025-02-06 02:39:31 - INFO - stdout - {'loss': 0.3733, 'grad_norm': 1.5353392362594604, 'learning_rate': 1.3595445469885915e-06, 'epoch': 2.51} +2025-02-06 02:39:31 - ERROR - stderr - 84%|████████▎ | 18780/22434 [16:31:51<2:29:33, 2.46s/it] +2025-02-06 02:39:33 - ERROR - stderr - 84%|████████▎ | 18781/22434 [16:31:53<2:29:31, 2.46s/it] +2025-02-06 02:39:33 - ERROR - stderr - +2025-02-06 02:39:33 - ERROR - stderr - +2025-02-06 02:39:33 - INFO - stdout - {'loss': 0.3783, 'grad_norm': 1.42531156539917, 'learning_rate': 1.3588178352950764e-06, 'epoch': 2.51} +2025-02-06 02:39:33 - ERROR - stderr - 84%|████████▎ | 18781/22434 [16:31:53<2:29:31, 2.46s/it] +2025-02-06 02:39:36 - ERROR - stderr - 84%|████████▎ | 18782/22434 [16:31:56<2:29:16, 2.45s/it] +2025-02-06 02:39:36 - ERROR - stderr - +2025-02-06 02:39:36 - ERROR - stderr - +2025-02-06 02:39:36 - INFO - stdout - {'loss': 0.3572, 'grad_norm': 1.678500771522522, 'learning_rate': 1.3580913037187338e-06, 'epoch': 2.51} +2025-02-06 02:39:36 - ERROR - stderr - 84%|████████▎ | 18782/22434 [16:31:56<2:29:16, 2.45s/it] +2025-02-06 02:39:38 - ERROR - stderr - 84%|████████▎ | 18783/22434 [16:31:58<2:33:17, 2.52s/it] +2025-02-06 02:39:39 - ERROR - stderr - +2025-02-06 02:39:39 - ERROR - stderr - +2025-02-06 02:39:39 - INFO - stdout - {'loss': 0.3253, 'grad_norm': 1.5032864809036255, 'learning_rate': 1.357364952274709e-06, 'epoch': 2.51} +2025-02-06 02:39:39 - ERROR - stderr - 84%|████████▎ | 18783/22434 [16:31:58<2:33:17, 2.52s/it] +2025-02-06 02:39:41 - ERROR - stderr - 84%|████████▎ | 18784/22434 [16:32:01<2:35:07, 2.55s/it] +2025-02-06 02:39:41 - ERROR - stderr - +2025-02-06 02:39:41 - ERROR - stderr - +2025-02-06 
02:39:41 - INFO - stdout - {'loss': 0.3724, 'grad_norm': 1.4216761589050293, 'learning_rate': 1.3566387809781423e-06, 'epoch': 2.51} +2025-02-06 02:39:41 - ERROR - stderr - 84%|████████▎ | 18784/22434 [16:32:01<2:35:07, 2.55s/it] +2025-02-06 02:39:44 - ERROR - stderr - 84%|████████▎ | 18785/22434 [16:32:04<2:38:26, 2.61s/it] +2025-02-06 02:39:44 - ERROR - stderr - +2025-02-06 02:39:44 - ERROR - stderr - +2025-02-06 02:39:44 - INFO - stdout - {'loss': 0.3177, 'grad_norm': 1.5071951150894165, 'learning_rate': 1.3559127898441703e-06, 'epoch': 2.51} +2025-02-06 02:39:44 - ERROR - stderr - 84%|████████▎ | 18785/22434 [16:32:04<2:38:26, 2.61s/it] +2025-02-06 02:39:46 - ERROR - stderr - 84%|████████▎ | 18786/22434 [16:32:06<2:37:02, 2.58s/it] +2025-02-06 02:39:46 - ERROR - stderr - +2025-02-06 02:39:46 - ERROR - stderr - +2025-02-06 02:39:46 - INFO - stdout - {'loss': 0.3584, 'grad_norm': 1.6937954425811768, 'learning_rate': 1.3551869788879213e-06, 'epoch': 2.51} +2025-02-06 02:39:46 - ERROR - stderr - 84%|████████▎ | 18786/22434 [16:32:06<2:37:02, 2.58s/it] +2025-02-06 02:39:49 - ERROR - stderr - 84%|████████▎ | 18787/22434 [16:32:09<2:34:59, 2.55s/it] +2025-02-06 02:39:49 - ERROR - stderr - +2025-02-06 02:39:49 - ERROR - stderr - +2025-02-06 02:39:49 - INFO - stdout - {'loss': 0.3517, 'grad_norm': 1.6006581783294678, 'learning_rate': 1.3544613481245294e-06, 'epoch': 2.51} +2025-02-06 02:39:49 - ERROR - stderr - 84%|████████▎ | 18787/22434 [16:32:09<2:34:59, 2.55s/it] +2025-02-06 02:39:51 - ERROR - stderr - 84%|████████▎ | 18788/22434 [16:32:11<2:32:45, 2.51s/it] +2025-02-06 02:39:51 - ERROR - stderr - +2025-02-06 02:39:51 - ERROR - stderr - +2025-02-06 02:39:51 - INFO - stdout - {'loss': 0.4051, 'grad_norm': 1.6860499382019043, 'learning_rate': 1.3537358975691205e-06, 'epoch': 2.51} +2025-02-06 02:39:51 - ERROR - stderr - 84%|████████▎ | 18788/22434 [16:32:11<2:32:45, 2.51s/it] +2025-02-06 02:39:54 - ERROR - stderr - 84%|████████▍ | 18789/22434 [16:32:14<2:32:25, 
2.51s/it] +2025-02-06 02:39:54 - ERROR - stderr - +2025-02-06 02:39:54 - ERROR - stderr - +2025-02-06 02:39:54 - INFO - stdout - {'loss': 0.3696, 'grad_norm': 1.5064318180084229, 'learning_rate': 1.3530106272368083e-06, 'epoch': 2.51} +2025-02-06 02:39:54 - ERROR - stderr - 84%|████████▍ | 18789/22434 [16:32:14<2:32:25, 2.51s/it] +2025-02-06 02:39:56 - ERROR - stderr - 84%|████████▍ | 18790/22434 [16:32:16<2:32:20, 2.51s/it] +2025-02-06 02:39:56 - ERROR - stderr - +2025-02-06 02:39:56 - ERROR - stderr - +2025-02-06 02:39:56 - INFO - stdout - {'loss': 0.3862, 'grad_norm': 1.831417202949524, 'learning_rate': 1.35228553714272e-06, 'epoch': 2.51} +2025-02-06 02:39:56 - ERROR - stderr - 84%|████████▍ | 18790/22434 [16:32:16<2:32:20, 2.51s/it] +2025-02-06 02:39:59 - ERROR - stderr - 84%|████████▍ | 18791/22434 [16:32:19<2:32:34, 2.51s/it] +2025-02-06 02:39:59 - ERROR - stderr - +2025-02-06 02:39:59 - ERROR - stderr - +2025-02-06 02:39:59 - INFO - stdout - {'loss': 0.365, 'grad_norm': 1.65804922580719, 'learning_rate': 1.35156062730196e-06, 'epoch': 2.51} +2025-02-06 02:39:59 - ERROR - stderr - 84%|████████▍ | 18791/22434 [16:32:19<2:32:34, 2.51s/it] +2025-02-06 02:40:01 - ERROR - stderr - 84%|████████▍ | 18792/22434 [16:32:21<2:33:22, 2.53s/it] +2025-02-06 02:40:01 - ERROR - stderr - +2025-02-06 02:40:01 - ERROR - stderr - +2025-02-06 02:40:01 - INFO - stdout - {'loss': 0.4029, 'grad_norm': 1.593489408493042, 'learning_rate': 1.3508358977296477e-06, 'epoch': 2.51} +2025-02-06 02:40:01 - ERROR - stderr - 84%|████████▍ | 18792/22434 [16:32:21<2:33:22, 2.53s/it] +2025-02-06 02:40:04 - ERROR - stderr - 84%|████████▍ | 18793/22434 [16:32:24<2:32:35, 2.51s/it] +2025-02-06 02:40:04 - ERROR - stderr - +2025-02-06 02:40:04 - ERROR - stderr - +2025-02-06 02:40:04 - INFO - stdout - {'loss': 0.4142, 'grad_norm': 1.6318997144699097, 'learning_rate': 1.3501113484408822e-06, 'epoch': 2.51} +2025-02-06 02:40:04 - ERROR - stderr - 84%|████████▍ | 18793/22434 [16:32:24<2:32:35, 2.51s/it] 
+2025-02-06 02:40:06 - ERROR - stderr - 84%|████████▍ | 18794/22434 [16:32:26<2:32:12, 2.51s/it] +2025-02-06 02:40:06 - ERROR - stderr - +2025-02-06 02:40:06 - ERROR - stderr - +2025-02-06 02:40:06 - INFO - stdout - {'loss': 0.3244, 'grad_norm': 1.4919341802597046, 'learning_rate': 1.3493869794507664e-06, 'epoch': 2.51} +2025-02-06 02:40:06 - ERROR - stderr - 84%|████████▍ | 18794/22434 [16:32:26<2:32:12, 2.51s/it] +2025-02-06 02:40:09 - ERROR - stderr - 84%|████████▍ | 18795/22434 [16:32:29<2:31:19, 2.50s/it] +2025-02-06 02:40:09 - ERROR - stderr - +2025-02-06 02:40:09 - ERROR - stderr - +2025-02-06 02:40:09 - INFO - stdout - {'loss': 0.3153, 'grad_norm': 1.4111082553863525, 'learning_rate': 1.3486627907744065e-06, 'epoch': 2.51} +2025-02-06 02:40:09 - ERROR - stderr - 84%|████████▍ | 18795/22434 [16:32:29<2:31:19, 2.50s/it] +2025-02-06 02:40:11 - ERROR - stderr - 84%|████████▍ | 18796/22434 [16:32:31<2:31:52, 2.50s/it] +2025-02-06 02:40:11 - ERROR - stderr - +2025-02-06 02:40:11 - ERROR - stderr - +2025-02-06 02:40:11 - INFO - stdout - {'loss': 0.4304, 'grad_norm': 1.555806279182434, 'learning_rate': 1.3479387824268897e-06, 'epoch': 2.51} +2025-02-06 02:40:11 - ERROR - stderr - 84%|████████▍ | 18796/22434 [16:32:31<2:31:52, 2.50s/it] +2025-02-06 02:40:14 - ERROR - stderr - 84%|████████▍ | 18797/22434 [16:32:34<2:31:10, 2.49s/it] +2025-02-06 02:40:14 - ERROR - stderr - +2025-02-06 02:40:14 - ERROR - stderr - +2025-02-06 02:40:14 - INFO - stdout - {'loss': 0.3403, 'grad_norm': 1.4185792207717896, 'learning_rate': 1.3472149544233092e-06, 'epoch': 2.51} +2025-02-06 02:40:14 - ERROR - stderr - 84%|████████▍ | 18797/22434 [16:32:34<2:31:10, 2.49s/it] +2025-02-06 02:40:16 - ERROR - stderr - 84%|████████▍ | 18798/22434 [16:32:36<2:31:31, 2.50s/it] +2025-02-06 02:40:16 - ERROR - stderr - +2025-02-06 02:40:16 - ERROR - stderr - +2025-02-06 02:40:16 - INFO - stdout - {'loss': 0.3543, 'grad_norm': 1.5001932382583618, 'learning_rate': 1.3464913067787534e-06, 'epoch': 2.51}
+2025-02-06 02:40:16 - ERROR - stderr - 84%|████████▍ | 18798/22434 [16:32:36<2:31:31, 2.50s/it] +2025-02-06 02:40:19 - ERROR - stderr - 84%|████████▍ | 18799/22434 [16:32:39<2:32:13, 2.51s/it] +2025-02-06 02:40:19 - ERROR - stderr - +2025-02-06 02:40:19 - ERROR - stderr - +2025-02-06 02:40:19 - INFO - stdout - {'loss': 0.3407, 'grad_norm': 1.6629923582077026, 'learning_rate': 1.3457678395083062e-06, 'epoch': 2.51} +2025-02-06 02:40:19 - ERROR - stderr - 84%|████████▍ | 18799/22434 [16:32:39<2:32:13, 2.51s/it] +2025-02-06 02:40:21 - ERROR - stderr - 84%|████████▍ | 18800/22434 [16:32:41<2:33:05, 2.53s/it] +2025-02-06 02:40:21 - ERROR - stderr - +2025-02-06 02:40:21 - ERROR - stderr - +2025-02-06 02:40:21 - INFO - stdout - {'loss': 0.3956, 'grad_norm': 1.7201027870178223, 'learning_rate': 1.3450445526270473e-06, 'epoch': 2.51} +2025-02-06 02:40:21 - ERROR - stderr - 84%|████████▍ | 18800/22434 [16:32:41<2:33:05, 2.53s/it] +2025-02-06 02:40:24 - ERROR - stderr - 84%|████████▍ | 18801/22434 [16:32:44<2:32:38, 2.52s/it] +2025-02-06 02:40:24 - ERROR - stderr - +2025-02-06 02:40:24 - ERROR - stderr - +2025-02-06 02:40:24 - INFO - stdout - {'loss': 0.3789, 'grad_norm': 1.5988587141036987, 'learning_rate': 1.344321446150052e-06, 'epoch': 2.51} +2025-02-06 02:40:24 - ERROR - stderr - 84%|████████▍ | 18801/22434 [16:32:44<2:32:38, 2.52s/it] +2025-02-06 02:40:26 - ERROR - stderr - 84%|████████▍ | 18802/22434 [16:32:46<2:33:19, 2.53s/it] +2025-02-06 02:40:27 - ERROR - stderr - +2025-02-06 02:40:27 - ERROR - stderr - +2025-02-06 02:40:27 - INFO - stdout - {'loss': 0.3551, 'grad_norm': 1.4937191009521484, 'learning_rate': 1.343598520092394e-06, 'epoch': 2.51} +2025-02-06 02:40:27 - ERROR - stderr - 84%|████████▍ | 18802/22434 [16:32:46<2:33:19, 2.53s/it] +2025-02-06 02:40:29 - ERROR - stderr - 84%|████████▍ | 18803/22434 [16:32:49<2:32:36, 2.52s/it] +2025-02-06 02:40:29 - ERROR - stderr - +2025-02-06 02:40:29 - ERROR - stderr - +2025-02-06 02:40:29 - INFO - stdout - {'loss': 
0.369, 'grad_norm': 1.5772446393966675, 'learning_rate': 1.3428757744691422e-06, 'epoch': 2.51} +2025-02-06 02:40:29 - ERROR - stderr - 84%|████████▍ | 18803/22434 [16:32:49<2:32:36, 2.52s/it] +2025-02-06 02:40:31 - ERROR - stderr - 84%|████████▍ | 18804/22434 [16:32:51<2:32:30, 2.52s/it] +2025-02-06 02:40:32 - ERROR - stderr - +2025-02-06 02:40:32 - ERROR - stderr - +2025-02-06 02:40:32 - INFO - stdout - {'loss': 0.3487, 'grad_norm': 1.475939154624939, 'learning_rate': 1.3421532092953625e-06, 'epoch': 2.51} +2025-02-06 02:40:32 - ERROR - stderr - 84%|████████▍ | 18804/22434 [16:32:51<2:32:30, 2.52s/it] +2025-02-06 02:40:34 - ERROR - stderr - 84%|████████▍ | 18805/22434 [16:32:54<2:32:09, 2.52s/it] +2025-02-06 02:40:34 - ERROR - stderr - +2025-02-06 02:40:34 - ERROR - stderr - +2025-02-06 02:40:34 - INFO - stdout - {'loss': 0.365, 'grad_norm': 1.53485107421875, 'learning_rate': 1.3414308245861097e-06, 'epoch': 2.51} +2025-02-06 02:40:34 - ERROR - stderr - 84%|████████▍ | 18805/22434 [16:32:54<2:32:09, 2.52s/it] +2025-02-06 02:40:36 - ERROR - stderr - 84%|████████▍ | 18806/22434 [16:32:56<2:31:37, 2.51s/it] +2025-02-06 02:40:37 - ERROR - stderr - +2025-02-06 02:40:37 - ERROR - stderr - +2025-02-06 02:40:37 - INFO - stdout - {'loss': 0.3712, 'grad_norm': 1.580859899520874, 'learning_rate': 1.340708620356449e-06, 'epoch': 2.51} +2025-02-06 02:40:37 - ERROR - stderr - 84%|████████▍ | 18806/22434 [16:32:56<2:31:37, 2.51s/it] +2025-02-06 02:40:39 - ERROR - stderr - 84%|████████▍ | 18807/22434 [16:32:59<2:37:17, 2.60s/it] +2025-02-06 02:40:39 - ERROR - stderr - +2025-02-06 02:40:39 - ERROR - stderr - +2025-02-06 02:40:39 - INFO - stdout - {'loss': 0.3569, 'grad_norm': 1.6358156204223633, 'learning_rate': 1.339986596621431e-06, 'epoch': 2.51} +2025-02-06 02:40:39 - ERROR - stderr - 84%|████████▍ | 18807/22434 [16:32:59<2:37:17, 2.60s/it] +2025-02-06 02:40:42 - ERROR - stderr - 84%|████████▍ | 18808/22434 [16:33:02<2:35:38, 2.58s/it] +2025-02-06 02:40:42 - ERROR - stderr - 
+2025-02-06 02:40:42 - ERROR - stderr - +2025-02-06 02:40:42 - INFO - stdout - {'loss': 0.3497, 'grad_norm': 1.4552199840545654, 'learning_rate': 1.3392647533961056e-06, 'epoch': 2.52} +2025-02-06 02:40:42 - ERROR - stderr - 84%|████████▍ | 18808/22434 [16:33:02<2:35:38, 2.58s/it] +2025-02-06 02:40:44 - ERROR - stderr - 84%|████████▍ | 18809/22434 [16:33:04<2:34:04, 2.55s/it] +2025-02-06 02:40:44 - ERROR - stderr - +2025-02-06 02:40:44 - ERROR - stderr - +2025-02-06 02:40:44 - INFO - stdout - {'loss': 0.3486, 'grad_norm': 1.5465292930603027, 'learning_rate': 1.338543090695521e-06, 'epoch': 2.52} +2025-02-06 02:40:44 - ERROR - stderr - 84%|████████▍ | 18809/22434 [16:33:04<2:34:04, 2.55s/it] +2025-02-06 02:40:47 - ERROR - stderr - 84%|████████▍ | 18810/22434 [16:33:07<2:36:10, 2.59s/it] +2025-02-06 02:40:47 - ERROR - stderr - +2025-02-06 02:40:47 - ERROR - stderr - +2025-02-06 02:40:47 - INFO - stdout - {'loss': 0.3655, 'grad_norm': 1.416013240814209, 'learning_rate': 1.3378216085347128e-06, 'epoch': 2.52} +2025-02-06 02:40:47 - ERROR - stderr - 84%|████████▍ | 18810/22434 [16:33:07<2:36:10, 2.59s/it] +2025-02-06 02:40:50 - ERROR - stderr - 84%|████████▍ | 18811/22434 [16:33:09<2:35:03, 2.57s/it] +2025-02-06 02:40:50 - ERROR - stderr - +2025-02-06 02:40:50 - ERROR - stderr - +2025-02-06 02:40:50 - INFO - stdout - {'loss': 0.3288, 'grad_norm': 1.5111641883850098, 'learning_rate': 1.3371003069287292e-06, 'epoch': 2.52} +2025-02-06 02:40:50 - ERROR - stderr - 84%|████████▍ | 18811/22434 [16:33:09<2:35:03, 2.57s/it] +2025-02-06 02:40:52 - ERROR - stderr - 84%|████████▍ | 18812/22434 [16:33:12<2:33:44, 2.55s/it] +2025-02-06 02:40:52 - ERROR - stderr - +2025-02-06 02:40:52 - ERROR - stderr - +2025-02-06 02:40:52 - INFO - stdout - {'loss': 0.3929, 'grad_norm': 1.5227692127227783, 'learning_rate': 1.3363791858925978e-06, 'epoch': 2.52} +2025-02-06 02:40:52 - ERROR - stderr - 84%|████████▍ | 18812/22434 [16:33:12<2:33:44, 2.55s/it] +2025-02-06 02:40:54 - ERROR - stderr - 
84%|████████▍ | 18813/22434 [16:33:14<2:31:31, 2.51s/it] +2025-02-06 02:40:54 - ERROR - stderr - +2025-02-06 02:40:54 - ERROR - stderr - +2025-02-06 02:40:54 - INFO - stdout - {'loss': 0.3455, 'grad_norm': 1.4305846691131592, 'learning_rate': 1.3356582454413504e-06, 'epoch': 2.52} +2025-02-06 02:40:54 - ERROR - stderr - 84%|████████▍ | 18813/22434 [16:33:14<2:31:31, 2.51s/it] +2025-02-06 02:40:57 - ERROR - stderr - 84%|████████▍ | 18814/22434 [16:33:17<2:30:24, 2.49s/it] +2025-02-06 02:40:57 - ERROR - stderr - +2025-02-06 02:40:57 - ERROR - stderr - +2025-02-06 02:40:57 - INFO - stdout - {'loss': 0.3487, 'grad_norm': 1.4858986139297485, 'learning_rate': 1.33493748559002e-06, 'epoch': 2.52} +2025-02-06 02:40:57 - ERROR - stderr - 84%|████████▍ | 18814/22434 [16:33:17<2:30:24, 2.49s/it] +2025-02-06 02:40:59 - ERROR - stderr - 84%|████████▍ | 18815/22434 [16:33:19<2:30:57, 2.50s/it] +2025-02-06 02:40:59 - ERROR - stderr - +2025-02-06 02:40:59 - ERROR - stderr - +2025-02-06 02:40:59 - INFO - stdout - {'loss': 0.3582, 'grad_norm': 1.4846031665802002, 'learning_rate': 1.3342169063536214e-06, 'epoch': 2.52} +2025-02-06 02:40:59 - ERROR - stderr - 84%|████████▍ | 18815/22434 [16:33:19<2:30:57, 2.50s/it] +2025-02-06 02:41:02 - ERROR - stderr - 84%|████████▍ | 18816/22434 [16:33:22<2:30:44, 2.50s/it] +2025-02-06 02:41:02 - ERROR - stderr - +2025-02-06 02:41:02 - ERROR - stderr - +2025-02-06 02:41:02 - INFO - stdout - {'loss': 0.369, 'grad_norm': 1.7297195196151733, 'learning_rate': 1.333496507747184e-06, 'epoch': 2.52} +2025-02-06 02:41:02 - ERROR - stderr - 84%|████████▍ | 18816/22434 [16:33:22<2:30:44, 2.50s/it] +2025-02-06 02:41:04 - ERROR - stderr - 84%|████████▍ | 18817/22434 [16:33:24<2:30:33, 2.50s/it] +2025-02-06 02:41:04 - ERROR - stderr - +2025-02-06 02:41:04 - ERROR - stderr - +2025-02-06 02:41:04 - INFO - stdout - {'loss': 0.365, 'grad_norm': 1.680019736289978, 'learning_rate': 1.3327762897857167e-06, 'epoch': 2.52} +2025-02-06 02:41:04 - ERROR - stderr - 
84%|████████▍ | 18817/22434 [16:33:24<2:30:33, 2.50s/it] +2025-02-06 02:41:07 - ERROR - stderr - 84%|████████▍ | 18818/22434 [16:33:27<2:29:24, 2.48s/it] +2025-02-06 02:41:07 - ERROR - stderr - +2025-02-06 02:41:07 - ERROR - stderr - +2025-02-06 02:41:07 - INFO - stdout - {'loss': 0.4049, 'grad_norm': 1.7061219215393066, 'learning_rate': 1.332056252484234e-06, 'epoch': 2.52} +2025-02-06 02:41:07 - ERROR - stderr - 84%|████████▍ | 18818/22434 [16:33:27<2:29:24, 2.48s/it] +2025-02-06 02:41:09 - ERROR - stderr - 84%|████████▍ | 18819/22434 [16:33:29<2:29:50, 2.49s/it] +2025-02-06 02:41:09 - ERROR - stderr - +2025-02-06 02:41:09 - ERROR - stderr - +2025-02-06 02:41:09 - INFO - stdout - {'loss': 0.3612, 'grad_norm': 1.5936188697814941, 'learning_rate': 1.3313363958577442e-06, 'epoch': 2.52} +2025-02-06 02:41:09 - ERROR - stderr - 84%|████████▍ | 18819/22434 [16:33:29<2:29:50, 2.49s/it] +2025-02-06 02:41:12 - ERROR - stderr - 84%|████████▍ | 18820/22434 [16:33:32<2:31:09, 2.51s/it] +2025-02-06 02:41:12 - ERROR - stderr - +2025-02-06 02:41:12 - ERROR - stderr - +2025-02-06 02:41:12 - INFO - stdout - {'loss': 0.3258, 'grad_norm': 1.4679731130599976, 'learning_rate': 1.3306167199212527e-06, 'epoch': 2.52} +2025-02-06 02:41:12 - ERROR - stderr - 84%|████████▍ | 18820/22434 [16:33:32<2:31:09, 2.51s/it] +2025-02-06 02:41:14 - ERROR - stderr - 84%|████████▍ | 18821/22434 [16:33:34<2:31:09, 2.51s/it] +2025-02-06 02:41:14 - ERROR - stderr - +2025-02-06 02:41:14 - ERROR - stderr - +2025-02-06 02:41:14 - INFO - stdout - {'loss': 0.3415, 'grad_norm': 1.3932182788848877, 'learning_rate': 1.329897224689759e-06, 'epoch': 2.52} +2025-02-06 02:41:14 - ERROR - stderr - 84%|████████▍ | 18821/22434 [16:33:34<2:31:09, 2.51s/it] +2025-02-06 02:41:17 - ERROR - stderr - 84%|████████▍ | 18822/22434 [16:33:37<2:33:34, 2.55s/it] +2025-02-06 02:41:17 - ERROR - stderr - +2025-02-06 02:41:17 - ERROR - stderr - +2025-02-06 02:41:17 - INFO - stdout - {'loss': 0.3315, 'grad_norm': 1.4148304462432861, 
'learning_rate': 1.329177910178262e-06, 'epoch': 2.52} +2025-02-06 02:41:17 - ERROR - stderr - 84%|████████▍ | 18822/22434 [16:33:37<2:33:34, 2.55s/it] +2025-02-06 02:41:20 - ERROR - stderr - 84%|████████▍ | 18823/22434 [16:33:39<2:31:58, 2.53s/it] +2025-02-06 02:41:20 - ERROR - stderr - +2025-02-06 02:41:20 - ERROR - stderr - +2025-02-06 02:41:20 - INFO - stdout - {'loss': 0.3558, 'grad_norm': 1.6134790182113647, 'learning_rate': 1.3284587764017543e-06, 'epoch': 2.52} +2025-02-06 02:41:20 - ERROR - stderr - 84%|████████▍ | 18823/22434 [16:33:39<2:31:58, 2.53s/it] +2025-02-06 02:41:22 - ERROR - stderr - 84%|████████▍ | 18824/22434 [16:33:42<2:32:56, 2.54s/it] +2025-02-06 02:41:22 - ERROR - stderr - +2025-02-06 02:41:22 - ERROR - stderr - +2025-02-06 02:41:22 - INFO - stdout - {'loss': 0.3579, 'grad_norm': 1.457486867904663, 'learning_rate': 1.3277398233752258e-06, 'epoch': 2.52} +2025-02-06 02:41:22 - ERROR - stderr - 84%|████████▍ | 18824/22434 [16:33:42<2:32:56, 2.54s/it] +2025-02-06 02:41:25 - ERROR - stderr - 84%|████████▍ | 18825/22434 [16:33:44<2:32:33, 2.54s/it] +2025-02-06 02:41:25 - ERROR - stderr - +2025-02-06 02:41:25 - ERROR - stderr - +2025-02-06 02:41:25 - INFO - stdout - {'loss': 0.3733, 'grad_norm': 1.4696450233459473, 'learning_rate': 1.3270210511136616e-06, 'epoch': 2.52} +2025-02-06 02:41:25 - ERROR - stderr - 84%|████████▍ | 18825/22434 [16:33:44<2:32:33, 2.54s/it] +2025-02-06 02:41:27 - ERROR - stderr - 84%|████████▍ | 18826/22434 [16:33:47<2:32:01, 2.53s/it] +2025-02-06 02:41:27 - ERROR - stderr - +2025-02-06 02:41:27 - ERROR - stderr - +2025-02-06 02:41:27 - INFO - stdout - {'loss': 0.3562, 'grad_norm': 1.6368086338043213, 'learning_rate': 1.326302459632045e-06, 'epoch': 2.52} +2025-02-06 02:41:27 - ERROR - stderr - 84%|████████▍ | 18826/22434 [16:33:47<2:32:01, 2.53s/it] +2025-02-06 02:41:30 - ERROR - stderr - 84%|████████▍ | 18827/22434 [16:33:49<2:32:07, 2.53s/it] +2025-02-06 02:41:30 - ERROR - stderr - +2025-02-06 02:41:30 - ERROR - 
stderr - +2025-02-06 02:41:30 - INFO - stdout - {'loss': 0.3454, 'grad_norm': 1.4553279876708984, 'learning_rate': 1.3255840489453542e-06, 'epoch': 2.52} +2025-02-06 02:41:30 - ERROR - stderr - 84%|████████▍ | 18827/22434 [16:33:49<2:32:07, 2.53s/it] +2025-02-06 02:41:32 - ERROR - stderr - 84%|████████▍ | 18828/22434 [16:33:52<2:31:15, 2.52s/it] +2025-02-06 02:41:32 - ERROR - stderr - +2025-02-06 02:41:32 - ERROR - stderr - +2025-02-06 02:41:32 - INFO - stdout - {'loss': 0.3548, 'grad_norm': 1.5567313432693481, 'learning_rate': 1.3248658190685648e-06, 'epoch': 2.52} +2025-02-06 02:41:32 - ERROR - stderr - 84%|████████▍ | 18828/22434 [16:33:52<2:31:15, 2.52s/it] +2025-02-06 02:41:35 - ERROR - stderr - 84%|████████▍ | 18829/22434 [16:33:54<2:32:17, 2.53s/it] +2025-02-06 02:41:35 - ERROR - stderr - +2025-02-06 02:41:35 - ERROR - stderr - +2025-02-06 02:41:35 - INFO - stdout - {'loss': 0.4046, 'grad_norm': 1.6422576904296875, 'learning_rate': 1.3241477700166427e-06, 'epoch': 2.52} +2025-02-06 02:41:35 - ERROR - stderr - 84%|████████▍ | 18829/22434 [16:33:55<2:32:17, 2.53s/it] +2025-02-06 02:41:37 - ERROR - stderr - 84%|████████▍ | 18830/22434 [16:33:57<2:34:33, 2.57s/it] +2025-02-06 02:41:37 - ERROR - stderr - +2025-02-06 02:41:37 - ERROR - stderr - +2025-02-06 02:41:37 - INFO - stdout - {'loss': 0.3483, 'grad_norm': 1.5452489852905273, 'learning_rate': 1.3234299018045615e-06, 'epoch': 2.52} +2025-02-06 02:41:37 - ERROR - stderr - 84%|████████▍ | 18830/22434 [16:33:57<2:34:33, 2.57s/it] +2025-02-06 02:41:40 - ERROR - stderr - 84%|████████▍ | 18831/22434 [16:34:00<2:34:47, 2.58s/it] +2025-02-06 02:41:40 - ERROR - stderr - +2025-02-06 02:41:40 - ERROR - stderr - +2025-02-06 02:41:40 - INFO - stdout - {'loss': 0.3924, 'grad_norm': 1.5659314393997192, 'learning_rate': 1.3227122144472782e-06, 'epoch': 2.52} +2025-02-06 02:41:40 - ERROR - stderr - 84%|████████▍ | 18831/22434 [16:34:00<2:34:47, 2.58s/it] +2025-02-06 02:41:43 - ERROR - stderr - 84%|████████▍ | 18832/22434
[16:34:02<2:36:46, 2.61s/it] +2025-02-06 02:41:43 - ERROR - stderr - +2025-02-06 02:41:43 - ERROR - stderr - +2025-02-06 02:41:43 - INFO - stdout - {'loss': 0.3773, 'grad_norm': 1.5921807289123535, 'learning_rate': 1.3219947079597573e-06, 'epoch': 2.52} +2025-02-06 02:41:43 - ERROR - stderr - 84%|████████▍ | 18832/22434 [16:34:02<2:36:46, 2.61s/it] +2025-02-06 02:41:45 - ERROR - stderr - 84%|████████▍ | 18833/22434 [16:34:05<2:35:10, 2.59s/it] +2025-02-06 02:41:45 - ERROR - stderr - +2025-02-06 02:41:45 - ERROR - stderr - +2025-02-06 02:41:45 - INFO - stdout - {'loss': 0.4004, 'grad_norm': 1.5508805513381958, 'learning_rate': 1.3212773823569548e-06, 'epoch': 2.52} +2025-02-06 02:41:45 - ERROR - stderr - 84%|████████▍ | 18833/22434 [16:34:05<2:35:10, 2.59s/it] +2025-02-06 02:41:48 - ERROR - stderr - 84%|████████▍ | 18834/22434 [16:34:08<2:43:25, 2.72s/it] +2025-02-06 02:41:48 - ERROR - stderr - +2025-02-06 02:41:48 - ERROR - stderr - +2025-02-06 02:41:48 - INFO - stdout - {'loss': 0.3655, 'grad_norm': 1.5954124927520752, 'learning_rate': 1.3205602376538162e-06, 'epoch': 2.52} +2025-02-06 02:41:48 - ERROR - stderr - 84%|████████▍ | 18834/22434 [16:34:08<2:43:25, 2.72s/it] +2025-02-06 02:41:51 - ERROR - stderr - 84%|████████▍ | 18835/22434 [16:34:10<2:38:26, 2.64s/it] +2025-02-06 02:41:51 - ERROR - stderr - +2025-02-06 02:41:51 - ERROR - stderr - +2025-02-06 02:41:51 - INFO - stdout - {'loss': 0.3341, 'grad_norm': 1.544524073600769, 'learning_rate': 1.3198432738652988e-06, 'epoch': 2.52} +2025-02-06 02:41:51 - ERROR - stderr - 84%|████████▍ | 18835/22434 [16:34:11<2:38:26, 2.64s/it] +2025-02-06 02:41:53 - ERROR - stderr - 84%|████████▍ | 18836/22434 [16:34:13<2:34:44, 2.58s/it] +2025-02-06 02:41:53 - ERROR - stderr - +2025-02-06 02:41:53 - ERROR - stderr - +2025-02-06 02:41:53 - INFO - stdout - {'loss': 0.3964, 'grad_norm': 1.5325716733932495, 'learning_rate': 1.3191264910063405e-06, 'epoch': 2.52} +2025-02-06 02:41:53 - ERROR - stderr - 84%|████████▍ | 18836/22434 
[16:34:13<2:34:44, 2.58s/it] +2025-02-06 02:41:56 - ERROR - stderr - 84%|████████▍ | 18837/22434 [16:34:15<2:34:35, 2.58s/it] +2025-02-06 02:41:56 - ERROR - stderr - +2025-02-06 02:41:56 - ERROR - stderr - +2025-02-06 02:41:56 - INFO - stdout - {'loss': 0.3574, 'grad_norm': 1.3853940963745117, 'learning_rate': 1.3184098890918829e-06, 'epoch': 2.52} +2025-02-06 02:41:56 - ERROR - stderr - 84%|████████▍ | 18837/22434 [16:34:16<2:34:35, 2.58s/it] +2025-02-06 02:41:58 - ERROR - stderr - 84%|████████▍ | 18838/22434 [16:34:18<2:32:17, 2.54s/it] +2025-02-06 02:41:58 - ERROR - stderr - +2025-02-06 02:41:58 - ERROR - stderr - +2025-02-06 02:41:58 - INFO - stdout - {'loss': 0.3327, 'grad_norm': 1.5011093616485596, 'learning_rate': 1.3176934681368648e-06, 'epoch': 2.52} +2025-02-06 02:41:58 - ERROR - stderr - 84%|████████▍ | 18838/22434 [16:34:18<2:32:17, 2.54s/it] +2025-02-06 02:42:01 - ERROR - stderr - 84%|████████▍ | 18839/22434 [16:34:20<2:31:07, 2.52s/it] +2025-02-06 02:42:01 - ERROR - stderr - +2025-02-06 02:42:01 - ERROR - stderr - +2025-02-06 02:42:01 - INFO - stdout - {'loss': 0.326, 'grad_norm': 1.4824696779251099, 'learning_rate': 1.3169772281562154e-06, 'epoch': 2.52} +2025-02-06 02:42:01 - ERROR - stderr - 84%|████████▍ | 18839/22434 [16:34:20<2:31:07, 2.52s/it] +2025-02-06 02:42:03 - ERROR - stderr - 84%|████████▍ | 18840/22434 [16:34:23<2:30:13, 2.51s/it] +2025-02-06 02:42:03 - ERROR - stderr - +2025-02-06 02:42:03 - ERROR - stderr - +2025-02-06 02:42:03 - INFO - stdout - {'loss': 0.3216, 'grad_norm': 1.5798135995864868, 'learning_rate': 1.3162611691648708e-06, 'epoch': 2.52} +2025-02-06 02:42:03 - ERROR - stderr - 84%|████████▍ | 18840/22434 [16:34:23<2:30:13, 2.51s/it] +2025-02-06 02:42:06 - ERROR - stderr - 84%|████████▍ | 18841/22434 [16:34:25<2:30:16, 2.51s/it] +2025-02-06 02:42:06 - ERROR - stderr - +2025-02-06 02:42:06 - ERROR - stderr - +2025-02-06 02:42:06 - INFO - stdout - {'loss': 0.3945, 'grad_norm': 1.6640273332595825, 'learning_rate': 
1.3155452911777511e-06, 'epoch': 2.52} +2025-02-06 02:42:06 - ERROR - stderr - 84%|████████▍ | 18841/22434 [16:34:25<2:30:16, 2.51s/it] +2025-02-06 02:42:08 - ERROR - stderr - 84%|████████▍ | 18842/22434 [16:34:28<2:29:34, 2.50s/it] +2025-02-06 02:42:08 - ERROR - stderr - +2025-02-06 02:42:08 - ERROR - stderr - +2025-02-06 02:42:08 - INFO - stdout - {'loss': 0.4206, 'grad_norm': 1.6306378841400146, 'learning_rate': 1.3148295942097799e-06, 'epoch': 2.52} +2025-02-06 02:42:08 - ERROR - stderr - 84%|████████▍ | 18842/22434 [16:34:28<2:29:34, 2.50s/it] +2025-02-06 02:42:11 - ERROR - stderr - 84%|████████▍ | 18843/22434 [16:34:31<2:34:37, 2.58s/it] +2025-02-06 02:42:11 - ERROR - stderr - +2025-02-06 02:42:11 - ERROR - stderr - +2025-02-06 02:42:11 - INFO - stdout - {'loss': 0.3378, 'grad_norm': 1.576798915863037, 'learning_rate': 1.3141140782758743e-06, 'epoch': 2.52} +2025-02-06 02:42:11 - ERROR - stderr - 84%|████████▍ | 18843/22434 [16:34:31<2:34:37, 2.58s/it] +2025-02-06 02:42:14 - ERROR - stderr - 84%|████████▍ | 18844/22434 [16:34:34<2:40:13, 2.68s/it] +2025-02-06 02:42:14 - ERROR - stderr - +2025-02-06 02:42:14 - ERROR - stderr - +2025-02-06 02:42:14 - INFO - stdout - {'loss': 0.3738, 'grad_norm': 1.5848937034606934, 'learning_rate': 1.3133987433909502e-06, 'epoch': 2.52} +2025-02-06 02:42:14 - ERROR - stderr - 84%|████████▍ | 18844/22434 [16:34:34<2:40:13, 2.68s/it] +2025-02-06 02:42:16 - ERROR - stderr - 84%|████████▍ | 18845/22434 [16:34:36<2:40:37, 2.69s/it] +2025-02-06 02:42:17 - ERROR - stderr - +2025-02-06 02:42:17 - ERROR - stderr - +2025-02-06 02:42:17 - INFO - stdout - {'loss': 0.3785, 'grad_norm': 1.750974416732788, 'learning_rate': 1.3126835895699164e-06, 'epoch': 2.52} +2025-02-06 02:42:17 - ERROR - stderr - 84%|████████▍ | 18845/22434 [16:34:36<2:40:37, 2.69s/it] +2025-02-06 02:42:19 - ERROR - stderr - 84%|████████▍ | 18846/22434 [16:34:39<2:36:59, 2.63s/it] +2025-02-06 02:42:19 - ERROR - stderr - +2025-02-06 02:42:19 - ERROR - stderr - +2025-02-06 
02:42:19 - INFO - stdout - {'loss': 0.3687, 'grad_norm': 1.5847290754318237, 'learning_rate': 1.3119686168276812e-06, 'epoch': 2.52} +2025-02-06 02:42:19 - ERROR - stderr - 84%|████████▍ | 18846/22434 [16:34:39<2:36:59, 2.63s/it] +2025-02-06 02:42:21 - ERROR - stderr - 84%|████████▍ | 18847/22434 [16:34:41<2:33:55, 2.57s/it] +2025-02-06 02:42:21 - ERROR - stderr - +2025-02-06 02:42:21 - ERROR - stderr - +2025-02-06 02:42:21 - INFO - stdout - {'loss': 0.3779, 'grad_norm': 1.610335111618042, 'learning_rate': 1.3112538251791461e-06, 'epoch': 2.52} +2025-02-06 02:42:21 - ERROR - stderr - 84%|████████▍ | 18847/22434 [16:34:41<2:33:55, 2.57s/it] +2025-02-06 02:42:24 - ERROR - stderr - 84%|████████▍ | 18848/22434 [16:34:44<2:33:31, 2.57s/it] +2025-02-06 02:42:24 - ERROR - stderr - +2025-02-06 02:42:24 - ERROR - stderr - +2025-02-06 02:42:24 - INFO - stdout - {'loss': 0.4127, 'grad_norm': 1.6374783515930176, 'learning_rate': 1.3105392146392104e-06, 'epoch': 2.52} +2025-02-06 02:42:24 - ERROR - stderr - 84%|████████▍ | 18848/22434 [16:34:44<2:33:31, 2.57s/it] +2025-02-06 02:42:27 - ERROR - stderr - 84%|████████▍ | 18849/22434 [16:34:46<2:32:57, 2.56s/it] +2025-02-06 02:42:27 - ERROR - stderr - +2025-02-06 02:42:27 - ERROR - stderr - +2025-02-06 02:42:27 - INFO - stdout - {'loss': 0.3009, 'grad_norm': 1.5574485063552856, 'learning_rate': 1.309824785222772e-06, 'epoch': 2.52} +2025-02-06 02:42:27 - ERROR - stderr - 84%|████████▍ | 18849/22434 [16:34:46<2:32:57, 2.56s/it] +2025-02-06 02:42:29 - ERROR - stderr - 84%|████████▍ | 18850/22434 [16:34:49<2:30:33, 2.52s/it] +2025-02-06 02:42:29 - ERROR - stderr - +2025-02-06 02:42:29 - ERROR - stderr - +2025-02-06 02:42:29 - INFO - stdout - {'loss': 0.3312, 'grad_norm': 1.4666000604629517, 'learning_rate': 1.3091105369447166e-06, 'epoch': 2.52} +2025-02-06 02:42:29 - ERROR - stderr - 84%|████████▍ | 18850/22434 [16:34:49<2:30:33, 2.52s/it] +2025-02-06 02:42:32 - ERROR - stderr - 84%|████████▍ | 18851/22434 [16:34:51<2:32:08, 
2.55s/it] +2025-02-06 02:42:32 - ERROR - stderr - +2025-02-06 02:42:32 - ERROR - stderr - +2025-02-06 02:42:32 - INFO - stdout - {'loss': 0.3367, 'grad_norm': 1.4455159902572632, 'learning_rate': 1.308396469819938e-06, 'epoch': 2.52} +2025-02-06 02:42:32 - ERROR - stderr - 84%|████████▍ | 18851/22434 [16:34:51<2:32:08, 2.55s/it] +2025-02-06 02:42:34 - ERROR - stderr - 84%|████████▍ | 18852/22434 [16:34:54<2:30:42, 2.52s/it] +2025-02-06 02:42:34 - ERROR - stderr - +2025-02-06 02:42:34 - ERROR - stderr - +2025-02-06 02:42:34 - INFO - stdout - {'loss': 0.3203, 'grad_norm': 1.4717121124267578, 'learning_rate': 1.30768258386332e-06, 'epoch': 2.52} +2025-02-06 02:42:34 - ERROR - stderr - 84%|████████▍ | 18852/22434 [16:34:54<2:30:42, 2.52s/it] +2025-02-06 02:42:36 - ERROR - stderr - 84%|████████▍ | 18853/22434 [16:34:56<2:28:53, 2.49s/it] +2025-02-06 02:42:36 - ERROR - stderr - +2025-02-06 02:42:36 - ERROR - stderr - +2025-02-06 02:42:36 - INFO - stdout - {'loss': 0.3371, 'grad_norm': 1.5219894647598267, 'learning_rate': 1.3069688790897362e-06, 'epoch': 2.52} +2025-02-06 02:42:36 - ERROR - stderr - 84%|████████▍ | 18853/22434 [16:34:56<2:28:53, 2.49s/it] +2025-02-06 02:42:39 - ERROR - stderr - 84%|████████▍ | 18854/22434 [16:34:59<2:29:21, 2.50s/it] +2025-02-06 02:42:39 - ERROR - stderr - +2025-02-06 02:42:39 - ERROR - stderr - +2025-02-06 02:42:39 - INFO - stdout - {'loss': 0.4369, 'grad_norm': 1.7840780019760132, 'learning_rate': 1.3062553555140722e-06, 'epoch': 2.52} +2025-02-06 02:42:39 - ERROR - stderr - 84%|████████▍ | 18854/22434 [16:34:59<2:29:21, 2.50s/it] +2025-02-06 02:42:41 - ERROR - stderr - 84%|████████▍ | 18855/22434 [16:35:01<2:29:42, 2.51s/it] +2025-02-06 02:42:42 - ERROR - stderr - +2025-02-06 02:42:42 - ERROR - stderr - +2025-02-06 02:42:42 - INFO - stdout - {'loss': 0.4057, 'grad_norm': 1.6263341903686523, 'learning_rate': 1.305542013151192e-06, 'epoch': 2.52} +2025-02-06 02:42:42 - ERROR - stderr - 84%|████████▍ | 18855/22434 [16:35:01<2:29:42, 
2.51s/it] +2025-02-06 02:42:44 - ERROR - stderr - 84%|████████▍ | 18856/22434 [16:35:04<2:29:41, 2.51s/it] +2025-02-06 02:42:44 - ERROR - stderr - +2025-02-06 02:42:44 - ERROR - stderr - +2025-02-06 02:42:44 - INFO - stdout - {'loss': 0.3629, 'grad_norm': 1.6607122421264648, 'learning_rate': 1.3048288520159736e-06, 'epoch': 2.52} +2025-02-06 02:42:44 - ERROR - stderr - 84%|████████▍ | 18856/22434 [16:35:04<2:29:41, 2.51s/it] +2025-02-06 02:42:46 - ERROR - stderr - 84%|████████▍ | 18857/22434 [16:35:06<2:28:34, 2.49s/it] +2025-02-06 02:42:47 - ERROR - stderr - +2025-02-06 02:42:47 - ERROR - stderr - +2025-02-06 02:42:47 - INFO - stdout - {'loss': 0.412, 'grad_norm': 1.5537337064743042, 'learning_rate': 1.304115872123275e-06, 'epoch': 2.52} +2025-02-06 02:42:47 - ERROR - stderr - 84%|████████▍ | 18857/22434 [16:35:06<2:28:34, 2.49s/it] +2025-02-06 02:42:49 - ERROR - stderr - 84%|████████▍ | 18858/22434 [16:35:09<2:32:52, 2.57s/it] +2025-02-06 02:42:49 - ERROR - stderr - +2025-02-06 02:42:49 - ERROR - stderr - +2025-02-06 02:42:49 - INFO - stdout - {'loss': 0.3513, 'grad_norm': 1.489897608757019, 'learning_rate': 1.3034030734879576e-06, 'epoch': 2.52} +2025-02-06 02:42:49 - ERROR - stderr - 84%|████████▍ | 18858/22434 [16:35:09<2:32:52, 2.57s/it] +2025-02-06 02:42:52 - ERROR - stderr - 84%|████████▍ | 18859/22434 [16:35:12<2:32:52, 2.57s/it] +2025-02-06 02:42:52 - ERROR - stderr - +2025-02-06 02:42:52 - ERROR - stderr - +2025-02-06 02:42:52 - INFO - stdout - {'loss': 0.3631, 'grad_norm': 1.659265398979187, 'learning_rate': 1.3026904561248865e-06, 'epoch': 2.52} +2025-02-06 02:42:52 - ERROR - stderr - 84%|████████▍ | 18859/22434 [16:35:12<2:32:52, 2.57s/it] +2025-02-06 02:42:54 - ERROR - stderr - 84%|████████▍ | 18860/22434 [16:35:14<2:32:34, 2.56s/it] +2025-02-06 02:42:54 - ERROR - stderr - +2025-02-06 02:42:54 - ERROR - stderr - +2025-02-06 02:42:54 - INFO - stdout - {'loss': 0.3753, 'grad_norm': 1.540769338607788, 'learning_rate': 1.3019780200489073e-06, 'epoch': 
2.52} +2025-02-06 02:42:54 - ERROR - stderr - 84%|████████▍ | 18860/22434 [16:35:14<2:32:34, 2.56s/it] +2025-02-06 02:42:57 - ERROR - stderr - 84%|████████▍ | 18861/22434 [16:35:17<2:35:01, 2.60s/it] +2025-02-06 02:42:57 - ERROR - stderr - +2025-02-06 02:42:57 - ERROR - stderr - +2025-02-06 02:42:57 - INFO - stdout - {'loss': 0.38, 'grad_norm': 1.741860270500183, 'learning_rate': 1.301265765274874e-06, 'epoch': 2.52} +2025-02-06 02:42:57 - ERROR - stderr - 84%|████████▍ | 18861/22434 [16:35:17<2:35:01, 2.60s/it] +2025-02-06 02:43:00 - ERROR - stderr - 84%|████████▍ | 18862/22434 [16:35:20<2:38:52, 2.67s/it] +2025-02-06 02:43:00 - ERROR - stderr - +2025-02-06 02:43:00 - ERROR - stderr - +2025-02-06 02:43:00 - INFO - stdout - {'loss': 0.3359, 'grad_norm': 1.45553719997406, 'learning_rate': 1.3005536918176309e-06, 'epoch': 2.52} +2025-02-06 02:43:00 - ERROR - stderr - 84%|████████▍ | 18862/22434 [16:35:20<2:38:52, 2.67s/it] +2025-02-06 02:43:02 - ERROR - stderr - 84%|████████▍ | 18863/22434 [16:35:22<2:35:47, 2.62s/it] +2025-02-06 02:43:02 - ERROR - stderr - +2025-02-06 02:43:02 - ERROR - stderr - +2025-02-06 02:43:02 - INFO - stdout - {'loss': 0.4195, 'grad_norm': 1.6767503023147583, 'learning_rate': 1.299841799692023e-06, 'epoch': 2.52} +2025-02-06 02:43:02 - ERROR - stderr - 84%|████████▍ | 18863/22434 [16:35:22<2:35:47, 2.62s/it] +2025-02-06 02:43:05 - ERROR - stderr - 84%|████████▍ | 18864/22434 [16:35:25<2:32:56, 2.57s/it] +2025-02-06 02:43:05 - ERROR - stderr - +2025-02-06 02:43:05 - ERROR - stderr - +2025-02-06 02:43:05 - INFO - stdout - {'loss': 0.3443, 'grad_norm': 1.3968744277954102, 'learning_rate': 1.2991300889128867e-06, 'epoch': 2.52} +2025-02-06 02:43:05 - ERROR - stderr - 84%|████████▍ | 18864/22434 [16:35:25<2:32:56, 2.57s/it] +2025-02-06 02:43:07 - ERROR - stderr - 84%|████████▍ | 18865/22434 [16:35:27<2:31:23, 2.55s/it] +2025-02-06 02:43:07 - ERROR - stderr - +2025-02-06 02:43:07 - ERROR - stderr - +2025-02-06 02:43:07 - INFO - stdout - {'loss': 
0.3211, 'grad_norm': 1.4151256084442139, 'learning_rate': 1.2984185594950582e-06, 'epoch': 2.52} +2025-02-06 02:43:07 - ERROR - stderr - 84%|████████▍ | 18865/22434 [16:35:27<2:31:23, 2.55s/it] +2025-02-06 02:43:10 - ERROR - stderr - 84%|████████▍ | 18866/22434 [16:35:30<2:31:10, 2.54s/it] +2025-02-06 02:43:10 - ERROR - stderr - +2025-02-06 02:43:10 - ERROR - stderr - +2025-02-06 02:43:10 - INFO - stdout - {'loss': 0.3724, 'grad_norm': 1.5486268997192383, 'learning_rate': 1.2977072114533683e-06, 'epoch': 2.52} +2025-02-06 02:43:10 - ERROR - stderr - 84%|████████▍ | 18866/22434 [16:35:30<2:31:10, 2.54s/it] +2025-02-06 02:43:12 - ERROR - stderr - 84%|████████▍ | 18867/22434 [16:35:32<2:30:16, 2.53s/it] +2025-02-06 02:43:12 - ERROR - stderr - +2025-02-06 02:43:12 - ERROR - stderr - +2025-02-06 02:43:12 - INFO - stdout - {'loss': 0.3389, 'grad_norm': 1.5191481113433838, 'learning_rate': 1.2969960448026443e-06, 'epoch': 2.52} +2025-02-06 02:43:12 - ERROR - stderr - 84%|████████▍ | 18867/22434 [16:35:32<2:30:16, 2.53s/it] +2025-02-06 02:43:15 - ERROR - stderr - 84%|████████▍ | 18868/22434 [16:35:35<2:28:31, 2.50s/it] +2025-02-06 02:43:15 - ERROR - stderr - +2025-02-06 02:43:15 - ERROR - stderr - +2025-02-06 02:43:15 - INFO - stdout - {'loss': 0.3724, 'grad_norm': 1.516695499420166, 'learning_rate': 1.2962850595577092e-06, 'epoch': 2.52} +2025-02-06 02:43:15 - ERROR - stderr - 84%|████████▍ | 18868/22434 [16:35:35<2:28:31, 2.50s/it] +2025-02-06 02:43:17 - ERROR - stderr - 84%|████████▍ | 18869/22434 [16:35:37<2:29:13, 2.51s/it] +2025-02-06 02:43:17 - ERROR - stderr - +2025-02-06 02:43:17 - ERROR - stderr - +2025-02-06 02:43:17 - INFO - stdout - {'loss': 0.3902, 'grad_norm': 1.7147924900054932, 'learning_rate': 1.295574255733385e-06, 'epoch': 2.52} +2025-02-06 02:43:17 - ERROR - stderr - 84%|████████▍ | 18869/22434 [16:35:37<2:29:13, 2.51s/it] +2025-02-06 02:43:20 - ERROR - stderr - 84%|████████▍ | 18870/22434 [16:35:40<2:30:05, 2.53s/it] +2025-02-06 02:43:20 - ERROR - 
stderr - +2025-02-06 02:43:20 - ERROR - stderr - +2025-02-06 02:43:20 - INFO - stdout - {'loss': 0.3332, 'grad_norm': 1.419689655303955, 'learning_rate': 1.2948636333444853e-06, 'epoch': 2.52} +2025-02-06 02:43:20 - ERROR - stderr - 84%|████████▍ | 18870/22434 [16:35:40<2:30:05, 2.53s/it] +2025-02-06 02:43:22 - ERROR - stderr - 84%|████████▍ | 18871/22434 [16:35:42<2:29:40, 2.52s/it] +2025-02-06 02:43:22 - ERROR - stderr - +2025-02-06 02:43:22 - ERROR - stderr - +2025-02-06 02:43:22 - INFO - stdout - {'loss': 0.3517, 'grad_norm': 1.519533395767212, 'learning_rate': 1.2941531924058227e-06, 'epoch': 2.52} +2025-02-06 02:43:22 - ERROR - stderr - 84%|████████▍ | 18871/22434 [16:35:42<2:29:40, 2.52s/it] +2025-02-06 02:43:25 - ERROR - stderr - 84%|████████▍ | 18872/22434 [16:35:45<2:29:15, 2.51s/it] +2025-02-06 02:43:25 - ERROR - stderr - +2025-02-06 02:43:25 - ERROR - stderr - +2025-02-06 02:43:25 - INFO - stdout - {'loss': 0.3839, 'grad_norm': 1.5843182802200317, 'learning_rate': 1.2934429329322073e-06, 'epoch': 2.52} +2025-02-06 02:43:25 - ERROR - stderr - 84%|████████▍ | 18872/22434 [16:35:45<2:29:15, 2.51s/it] +2025-02-06 02:43:27 - ERROR - stderr - 84%|████████▍ | 18873/22434 [16:35:47<2:28:10, 2.50s/it] +2025-02-06 02:43:27 - ERROR - stderr - +2025-02-06 02:43:27 - ERROR - stderr - +2025-02-06 02:43:27 - INFO - stdout - {'loss': 0.3333, 'grad_norm': 1.312522530555725, 'learning_rate': 1.2927328549384444e-06, 'epoch': 2.52} +2025-02-06 02:43:27 - ERROR - stderr - 84%|████████▍ | 18873/22434 [16:35:47<2:28:10, 2.50s/it] +2025-02-06 02:43:30 - ERROR - stderr - 84%|████████▍ | 18874/22434 [16:35:50<2:28:16, 2.50s/it] +2025-02-06 02:43:30 - ERROR - stderr - +2025-02-06 02:43:30 - ERROR - stderr - +2025-02-06 02:43:30 - INFO - stdout - {'loss': 0.3556, 'grad_norm': 1.5722289085388184, 'learning_rate': 1.2920229584393284e-06, 'epoch': 2.52} +2025-02-06 02:43:30 - ERROR - stderr - 84%|████████▍ | 18874/22434 [16:35:50<2:28:16, 2.50s/it] +2025-02-06 02:43:32 - ERROR - 
stderr - 84%|████████▍ | 18875/22434 [16:35:52<2:26:43, 2.47s/it] +2025-02-06 02:43:32 - ERROR - stderr - +2025-02-06 02:43:32 - ERROR - stderr - +2025-02-06 02:43:32 - INFO - stdout - {'loss': 0.386, 'grad_norm': 1.5377271175384521, 'learning_rate': 1.2913132434496666e-06, 'epoch': 2.52} +2025-02-06 02:43:32 - ERROR - stderr - 84%|████████▍ | 18875/22434 [16:35:52<2:26:43, 2.47s/it] +2025-02-06 02:43:35 - ERROR - stderr - 84%|████████▍ | 18876/22434 [16:35:55<2:27:51, 2.49s/it] +2025-02-06 02:43:35 - ERROR - stderr - +2025-02-06 02:43:35 - ERROR - stderr - +2025-02-06 02:43:35 - INFO - stdout - {'loss': 0.3333, 'grad_norm': 1.5642294883728027, 'learning_rate': 1.2906037099842417e-06, 'epoch': 2.52} +2025-02-06 02:43:35 - ERROR - stderr - 84%|████████▍ | 18876/22434 [16:35:55<2:27:51, 2.49s/it] +2025-02-06 02:43:37 - ERROR - stderr - 84%|████████▍ | 18877/22434 [16:35:57<2:29:03, 2.51s/it] +2025-02-06 02:43:37 - ERROR - stderr - +2025-02-06 02:43:37 - ERROR - stderr - +2025-02-06 02:43:37 - INFO - stdout - {'loss': 0.3382, 'grad_norm': 1.3723654747009277, 'learning_rate': 1.2898943580578504e-06, 'epoch': 2.52} +2025-02-06 02:43:37 - ERROR - stderr - 84%|████████▍ | 18877/22434 [16:35:57<2:29:03, 2.51s/it] +2025-02-06 02:43:40 - ERROR - stderr - 84%|████████▍ | 18878/22434 [16:36:00<2:30:07, 2.53s/it] +2025-02-06 02:43:40 - ERROR - stderr - +2025-02-06 02:43:40 - ERROR - stderr - +2025-02-06 02:43:40 - INFO - stdout - {'loss': 0.3313, 'grad_norm': 1.5495398044586182, 'learning_rate': 1.2891851876852802e-06, 'epoch': 2.52} +2025-02-06 02:43:40 - ERROR - stderr - 84%|████████▍ | 18878/22434 [16:36:00<2:30:07, 2.53s/it] +2025-02-06 02:43:42 - ERROR - stderr - 84%|████████▍ | 18879/22434 [16:36:02<2:28:51, 2.51s/it] +2025-02-06 02:43:42 - ERROR - stderr - +2025-02-06 02:43:42 - ERROR - stderr - +2025-02-06 02:43:42 - INFO - stdout - {'loss': 0.3834, 'grad_norm': 1.7357217073440552, 'learning_rate': 1.2884761988813034e-06, 'epoch': 2.52} +2025-02-06 02:43:42 - ERROR - 
stderr - 84%|████████▍ | 18879/22434 [16:36:02<2:28:51, 2.51s/it] +2025-02-06 02:43:45 - ERROR - stderr - 84%|████████▍ | 18880/22434 [16:36:05<2:28:48, 2.51s/it] +2025-02-06 02:43:45 - ERROR - stderr - +2025-02-06 02:43:45 - ERROR - stderr - +2025-02-06 02:43:45 - INFO - stdout - {'loss': 0.3741, 'grad_norm': 1.5397751331329346, 'learning_rate': 1.2877673916607092e-06, 'epoch': 2.52} +2025-02-06 02:43:45 - ERROR - stderr - 84%|████████▍ | 18880/22434 [16:36:05<2:28:48, 2.51s/it] +2025-02-06 02:43:47 - ERROR - stderr - 84%|████████▍ | 18881/22434 [16:36:07<2:30:07, 2.54s/it] +2025-02-06 02:43:48 - ERROR - stderr - +2025-02-06 02:43:48 - ERROR - stderr - +2025-02-06 02:43:48 - INFO - stdout - {'loss': 0.3926, 'grad_norm': 1.5161101818084717, 'learning_rate': 1.287058766038265e-06, 'epoch': 2.52} +2025-02-06 02:43:48 - ERROR - stderr - 84%|████████▍ | 18881/22434 [16:36:07<2:30:07, 2.54s/it] +2025-02-06 02:43:50 - ERROR - stderr - 84%|████████▍ | 18882/22434 [16:36:10<2:30:56, 2.55s/it] +2025-02-06 02:43:50 - ERROR - stderr - +2025-02-06 02:43:50 - ERROR - stderr - +2025-02-06 02:43:50 - INFO - stdout - {'loss': 0.3311, 'grad_norm': 1.4876248836517334, 'learning_rate': 1.2863503220287433e-06, 'epoch': 2.53} +2025-02-06 02:43:50 - ERROR - stderr - 84%|████████▍ | 18882/22434 [16:36:10<2:30:56, 2.55s/it] +2025-02-06 02:43:53 - ERROR - stderr - 84%|████████▍ | 18883/22434 [16:36:12<2:31:42, 2.56s/it] +2025-02-06 02:43:53 - ERROR - stderr - +2025-02-06 02:43:53 - ERROR - stderr - +2025-02-06 02:43:53 - INFO - stdout - {'loss': 0.3585, 'grad_norm': 1.5441436767578125, 'learning_rate': 1.285642059646911e-06, 'epoch': 2.53} +2025-02-06 02:43:53 - ERROR - stderr - 84%|████████▍ | 18883/22434 [16:36:12<2:31:42, 2.56s/it] +2025-02-06 02:43:55 - ERROR - stderr - 84%|████████▍ | 18884/22434 [16:36:15<2:30:20, 2.54s/it] +2025-02-06 02:43:55 - ERROR - stderr - +2025-02-06 02:43:55 - ERROR - stderr - +2025-02-06 02:43:55 - INFO - stdout - {'loss': 0.3824, 'grad_norm': 
1.4930050373077393, 'learning_rate': 1.28493397890753e-06, 'epoch': 2.53} +2025-02-06 02:43:55 - ERROR - stderr - 84%|████████▍ | 18884/22434 [16:36:15<2:30:20, 2.54s/it] +2025-02-06 02:43:58 - ERROR - stderr - 84%|████████▍ | 18885/22434 [16:36:17<2:29:59, 2.54s/it] +2025-02-06 02:43:58 - ERROR - stderr - +2025-02-06 02:43:58 - ERROR - stderr - +2025-02-06 02:43:58 - INFO - stdout - {'loss': 0.3923, 'grad_norm': 1.625380039215088, 'learning_rate': 1.2842260798253637e-06, 'epoch': 2.53} +2025-02-06 02:43:58 - ERROR - stderr - 84%|████████▍ | 18885/22434 [16:36:17<2:29:59, 2.54s/it] +2025-02-06 02:44:00 - ERROR - stderr - 84%|████████▍ | 18886/22434 [16:36:20<2:34:47, 2.62s/it] +2025-02-06 02:44:01 - ERROR - stderr - +2025-02-06 02:44:01 - ERROR - stderr - +2025-02-06 02:44:01 - INFO - stdout - {'loss': 0.3359, 'grad_norm': 1.4921505451202393, 'learning_rate': 1.2835183624151637e-06, 'epoch': 2.53} +2025-02-06 02:44:01 - ERROR - stderr - 84%|████████▍ | 18886/22434 [16:36:20<2:34:47, 2.62s/it] +2025-02-06 02:44:03 - ERROR - stderr - 84%|████████▍ | 18887/22434 [16:36:23<2:35:41, 2.63s/it] +2025-02-06 02:44:03 - ERROR - stderr - +2025-02-06 02:44:03 - ERROR - stderr - +2025-02-06 02:44:03 - INFO - stdout - {'loss': 0.3533, 'grad_norm': 1.3709782361984253, 'learning_rate': 1.2828108266916817e-06, 'epoch': 2.53} +2025-02-06 02:44:03 - ERROR - stderr - 84%|████████▍ | 18887/22434 [16:36:23<2:35:41, 2.63s/it] +2025-02-06 02:44:06 - ERROR - stderr - 84%|████████▍ | 18888/22434 [16:36:25<2:34:14, 2.61s/it] +2025-02-06 02:44:06 - ERROR - stderr - +2025-02-06 02:44:06 - ERROR - stderr - +2025-02-06 02:44:06 - INFO - stdout - {'loss': 0.3428, 'grad_norm': 1.5018943548202515, 'learning_rate': 1.2821034726696669e-06, 'epoch': 2.53} +2025-02-06 02:44:06 - ERROR - stderr - 84%|████████▍ | 18888/22434 [16:36:26<2:34:14, 2.61s/it] +2025-02-06 02:44:08 - ERROR - stderr - 84%|████████▍ | 18889/22434 [16:36:28<2:33:02, 2.59s/it] +2025-02-06 02:44:08 - ERROR - stderr - +2025-02-06 
02:44:08 - ERROR - stderr - +2025-02-06 02:44:08 - INFO - stdout - {'loss': 0.43, 'grad_norm': 1.8628551959991455, 'learning_rate': 1.281396300363863e-06, 'epoch': 2.53} +2025-02-06 02:44:08 - ERROR - stderr - 84%|████████▍ | 18889/22434 [16:36:28<2:33:02, 2.59s/it] +2025-02-06 02:44:11 - ERROR - stderr - 84%|████████▍ | 18890/22434 [16:36:30<2:30:58, 2.56s/it] +2025-02-06 02:44:11 - ERROR - stderr - +2025-02-06 02:44:11 - ERROR - stderr - +2025-02-06 02:44:11 - INFO - stdout - {'loss': 0.3712, 'grad_norm': 1.6328073740005493, 'learning_rate': 1.2806893097890105e-06, 'epoch': 2.53} +2025-02-06 02:44:11 - ERROR - stderr - 84%|████████▍ | 18890/22434 [16:36:31<2:30:58, 2.56s/it] +2025-02-06 02:44:13 - ERROR - stderr - 84%|████████▍ | 18891/22434 [16:36:33<2:30:30, 2.55s/it] +2025-02-06 02:44:13 - ERROR - stderr - +2025-02-06 02:44:13 - ERROR - stderr - +2025-02-06 02:44:13 - INFO - stdout - {'loss': 0.3549, 'grad_norm': 1.5484539270401, 'learning_rate': 1.2799825009598466e-06, 'epoch': 2.53} +2025-02-06 02:44:13 - ERROR - stderr - 84%|████████▍ | 18891/22434 [16:36:33<2:30:30, 2.55s/it] +2025-02-06 02:44:16 - ERROR - stderr - 84%|████████▍ | 18892/22434 [16:36:36<2:30:28, 2.55s/it] +2025-02-06 02:44:16 - ERROR - stderr - +2025-02-06 02:44:16 - ERROR - stderr - +2025-02-06 02:44:16 - INFO - stdout - {'loss': 0.3372, 'grad_norm': 1.4931472539901733, 'learning_rate': 1.2792758738911026e-06, 'epoch': 2.53} +2025-02-06 02:44:16 - ERROR - stderr - 84%|████████▍ | 18892/22434 [16:36:36<2:30:28, 2.55s/it] +2025-02-06 02:44:18 - ERROR - stderr - 84%|████████▍ | 18893/22434 [16:36:38<2:28:36, 2.52s/it] +2025-02-06 02:44:18 - ERROR - stderr - +2025-02-06 02:44:18 - ERROR - stderr - +2025-02-06 02:44:18 - INFO - stdout - {'loss': 0.3773, 'grad_norm': 1.4790502786636353, 'learning_rate': 1.278569428597508e-06, 'epoch': 2.53} +2025-02-06 02:44:18 - ERROR - stderr - 84%|████████▍ | 18893/22434 [16:36:38<2:28:36, 2.52s/it] +2025-02-06 02:44:21 - ERROR - stderr - 84%|████████▍ | 
18894/22434 [16:36:40<2:27:50, 2.51s/it] +2025-02-06 02:44:21 - ERROR - stderr - +2025-02-06 02:44:21 - ERROR - stderr - +2025-02-06 02:44:21 - INFO - stdout - {'loss': 0.3383, 'grad_norm': 1.5567066669464111, 'learning_rate': 1.27786316509379e-06, 'epoch': 2.53} +2025-02-06 02:44:21 - ERROR - stderr - 84%|████████▍ | 18894/22434 [16:36:41<2:27:50, 2.51s/it] +2025-02-06 02:44:23 - ERROR - stderr - 84%|████████▍ | 18895/22434 [16:36:43<2:26:26, 2.48s/it] +2025-02-06 02:44:23 - ERROR - stderr - +2025-02-06 02:44:23 - ERROR - stderr - +2025-02-06 02:44:23 - INFO - stdout - {'loss': 0.3358, 'grad_norm': 1.5801483392715454, 'learning_rate': 1.2771570833946645e-06, 'epoch': 2.53} +2025-02-06 02:44:23 - ERROR - stderr - 84%|████████▍ | 18895/22434 [16:36:43<2:26:26, 2.48s/it] +2025-02-06 02:44:26 - ERROR - stderr - 84%|████████▍ | 18896/22434 [16:36:45<2:25:12, 2.46s/it] +2025-02-06 02:44:26 - ERROR - stderr - +2025-02-06 02:44:26 - ERROR - stderr - +2025-02-06 02:44:26 - INFO - stdout - {'loss': 0.4023, 'grad_norm': 1.4971035718917847, 'learning_rate': 1.2764511835148552e-06, 'epoch': 2.53} +2025-02-06 02:44:26 - ERROR - stderr - 84%|████████▍ | 18896/22434 [16:36:45<2:25:12, 2.46s/it] +2025-02-06 02:44:28 - ERROR - stderr - 84%|████████▍ | 18897/22434 [16:36:48<2:25:24, 2.47s/it] +2025-02-06 02:44:28 - ERROR - stderr - +2025-02-06 02:44:28 - ERROR - stderr - +2025-02-06 02:44:28 - INFO - stdout - {'loss': 0.357, 'grad_norm': 1.5514277219772339, 'learning_rate': 1.2757454654690748e-06, 'epoch': 2.53} +2025-02-06 02:44:28 - ERROR - stderr - 84%|████████▍ | 18897/22434 [16:36:48<2:25:24, 2.47s/it] +2025-02-06 02:44:31 - ERROR - stderr - 84%|████████▍ | 18898/22434 [16:36:50<2:25:27, 2.47s/it] +2025-02-06 02:44:31 - ERROR - stderr - +2025-02-06 02:44:31 - ERROR - stderr - +2025-02-06 02:44:31 - INFO - stdout - {'loss': 0.413, 'grad_norm': 1.6739771366119385, 'learning_rate': 1.2750399292720284e-06, 'epoch': 2.53} +2025-02-06 02:44:31 - ERROR - stderr - 84%|████████▍ | 
18898/22434 [16:36:50<2:25:27, 2.47s/it] +2025-02-06 02:44:33 - ERROR - stderr - 84%|████████▍ | 18899/22434 [16:36:53<2:25:47, 2.47s/it] +2025-02-06 02:44:33 - ERROR - stderr - +2025-02-06 02:44:33 - ERROR - stderr - +2025-02-06 02:44:33 - INFO - stdout - {'loss': 0.3844, 'grad_norm': 1.4762396812438965, 'learning_rate': 1.2743345749384296e-06, 'epoch': 2.53} +2025-02-06 02:44:33 - ERROR - stderr - 84%|████████▍ | 18899/22434 [16:36:53<2:25:47, 2.47s/it] +2025-02-06 02:44:36 - ERROR - stderr - 84%|████████▍ | 18900/22434 [16:36:55<2:26:23, 2.49s/it] +2025-02-06 02:44:36 - ERROR - stderr - +2025-02-06 02:44:36 - ERROR - stderr - +2025-02-06 02:44:36 - INFO - stdout - {'loss': 0.3619, 'grad_norm': 1.636349081993103, 'learning_rate': 1.2736294024829732e-06, 'epoch': 2.53} +2025-02-06 02:44:36 - ERROR - stderr - 84%|████████▍ | 18900/22434 [16:36:55<2:26:23, 2.49s/it] +2025-02-06 02:44:38 - ERROR - stderr - 84%|████████▍ | 18901/22434 [16:36:58<2:27:16, 2.50s/it] +2025-02-06 02:44:38 - ERROR - stderr - +2025-02-06 02:44:38 - ERROR - stderr - +2025-02-06 02:44:38 - INFO - stdout - {'loss': 0.3979, 'grad_norm': 1.557605504989624, 'learning_rate': 1.2729244119203655e-06, 'epoch': 2.53} +2025-02-06 02:44:38 - ERROR - stderr - 84%|████████▍ | 18901/22434 [16:36:58<2:27:16, 2.50s/it] +2025-02-06 02:44:41 - ERROR - stderr - 84%|████████▍ | 18902/22434 [16:37:00<2:29:50, 2.55s/it] +2025-02-06 02:44:41 - ERROR - stderr - +2025-02-06 02:44:41 - ERROR - stderr - +2025-02-06 02:44:41 - INFO - stdout - {'loss': 0.3773, 'grad_norm': 1.7391892671585083, 'learning_rate': 1.2722196032652955e-06, 'epoch': 2.53} +2025-02-06 02:44:41 - ERROR - stderr - 84%|████████▍ | 18902/22434 [16:37:01<2:29:50, 2.55s/it] +2025-02-06 02:44:43 - ERROR - stderr - 84%|████████▍ | 18903/22434 [16:37:03<2:29:13, 2.54s/it] +2025-02-06 02:44:43 - ERROR - stderr - +2025-02-06 02:44:43 - ERROR - stderr - +2025-02-06 02:44:43 - INFO - stdout - {'loss': 0.3698, 'grad_norm': 1.4972485303878784, 'learning_rate': 
1.2715149765324542e-06, 'epoch': 2.53} +2025-02-06 02:44:43 - ERROR - stderr - 84%|████████▍ | 18903/22434 [16:37:03<2:29:13, 2.54s/it] +2025-02-06 02:44:46 - ERROR - stderr - 84%|████████▍ | 18904/22434 [16:37:05<2:27:32, 2.51s/it] +2025-02-06 02:44:46 - ERROR - stderr - +2025-02-06 02:44:46 - ERROR - stderr - +2025-02-06 02:44:46 - INFO - stdout - {'loss': 0.3532, 'grad_norm': 1.526349663734436, 'learning_rate': 1.270810531736535e-06, 'epoch': 2.53} +2025-02-06 02:44:46 - ERROR - stderr - 84%|████████▍ | 18904/22434 [16:37:05<2:27:32, 2.51s/it] +2025-02-06 02:44:48 - ERROR - stderr - 84%|████████▍ | 18905/22434 [16:37:08<2:28:06, 2.52s/it] +2025-02-06 02:44:48 - ERROR - stderr - +2025-02-06 02:44:48 - ERROR - stderr - +2025-02-06 02:44:48 - INFO - stdout - {'loss': 0.3724, 'grad_norm': 1.4375758171081543, 'learning_rate': 1.270106268892216e-06, 'epoch': 2.53} +2025-02-06 02:44:48 - ERROR - stderr - 84%|████████▍ | 18905/22434 [16:37:08<2:28:06, 2.52s/it] +2025-02-06 02:44:51 - ERROR - stderr - 84%|████████▍ | 18906/22434 [16:37:10<2:27:44, 2.51s/it] +2025-02-06 02:44:51 - ERROR - stderr - +2025-02-06 02:44:51 - ERROR - stderr - +2025-02-06 02:44:51 - INFO - stdout - {'loss': 0.3777, 'grad_norm': 1.5881681442260742, 'learning_rate': 1.2694021880141772e-06, 'epoch': 2.53} +2025-02-06 02:44:51 - ERROR - stderr - 84%|████████▍ | 18906/22434 [16:37:11<2:27:44, 2.51s/it] +2025-02-06 02:44:53 - ERROR - stderr - 84%|████████▍ | 18907/22434 [16:37:13<2:27:47, 2.51s/it] +2025-02-06 02:44:53 - ERROR - stderr - +2025-02-06 02:44:53 - ERROR - stderr - +2025-02-06 02:44:53 - INFO - stdout - {'loss': 0.3423, 'grad_norm': 1.4318047761917114, 'learning_rate': 1.2686982891170962e-06, 'epoch': 2.53} +2025-02-06 02:44:53 - ERROR - stderr - 84%|████████▍ | 18907/22434 [16:37:13<2:27:47, 2.51s/it] +2025-02-06 02:44:56 - ERROR - stderr - 84%|████████▍ | 18908/22434 [16:37:15<2:27:41, 2.51s/it] +2025-02-06 02:44:56 - ERROR - stderr - +2025-02-06 02:44:56 - ERROR - stderr - +2025-02-06 
02:44:56 - INFO - stdout - {'loss': 0.3583, 'grad_norm': 1.5559161901474, 'learning_rate': 1.267994572215644e-06, 'epoch': 2.53} +2025-02-06 02:44:56 - ERROR - stderr - 84%|████████▍ | 18908/22434 [16:37:16<2:27:41, 2.51s/it] +2025-02-06 02:44:58 - ERROR - stderr - 84%|████████▍ | 18909/22434 [16:37:18<2:28:07, 2.52s/it] +2025-02-06 02:44:58 - ERROR - stderr - +2025-02-06 02:44:58 - ERROR - stderr - +2025-02-06 02:44:58 - INFO - stdout - {'loss': 0.4203, 'grad_norm': 1.701930046081543, 'learning_rate': 1.2672910373244896e-06, 'epoch': 2.53} +2025-02-06 02:44:58 - ERROR - stderr - 84%|████████▍ | 18909/22434 [16:37:18<2:28:07, 2.52s/it] +2025-02-06 02:45:01 - ERROR - stderr - 84%|████████▍ | 18910/22434 [16:37:21<2:27:19, 2.51s/it] +2025-02-06 02:45:01 - ERROR - stderr - +2025-02-06 02:45:01 - ERROR - stderr - +2025-02-06 02:45:01 - INFO - stdout - {'loss': 0.3768, 'grad_norm': 1.5551903247833252, 'learning_rate': 1.266587684458297e-06, 'epoch': 2.53} +2025-02-06 02:45:01 - ERROR - stderr - 84%|████████▍ | 18910/22434 [16:37:21<2:27:19, 2.51s/it] +2025-02-06 02:45:03 - ERROR - stderr - 84%|████████▍ | 18911/22434 [16:37:23<2:27:24, 2.51s/it] +2025-02-06 02:45:03 - ERROR - stderr - +2025-02-06 02:45:03 - ERROR - stderr - +2025-02-06 02:45:03 - INFO - stdout - {'loss': 0.3139, 'grad_norm': 1.3695919513702393, 'learning_rate': 1.2658845136317276e-06, 'epoch': 2.53} +2025-02-06 02:45:03 - ERROR - stderr - 84%|████████▍ | 18911/22434 [16:37:23<2:27:24, 2.51s/it] +2025-02-06 02:45:06 - ERROR - stderr - 84%|████████▍ | 18912/22434 [16:37:26<2:26:56, 2.50s/it] +2025-02-06 02:45:06 - ERROR - stderr - +2025-02-06 02:45:06 - ERROR - stderr - +2025-02-06 02:45:06 - INFO - stdout - {'loss': 0.3994, 'grad_norm': 1.7634577751159668, 'learning_rate': 1.2651815248594368e-06, 'epoch': 2.53} +2025-02-06 02:45:06 - ERROR - stderr - 84%|████████▍ | 18912/22434 [16:37:26<2:26:56, 2.50s/it] +2025-02-06 02:45:08 - ERROR - stderr - 84%|████████▍ | 18913/22434 [16:37:28<2:25:24, 2.48s/it] 
+2025-02-06 02:45:08 - ERROR - stderr - +2025-02-06 02:45:08 - ERROR - stderr - +2025-02-06 02:45:08 - INFO - stdout - {'loss': 0.3556, 'grad_norm': 1.4687007665634155, 'learning_rate': 1.2644787181560826e-06, 'epoch': 2.53} +2025-02-06 02:45:08 - ERROR - stderr - 84%|████████▍ | 18913/22434 [16:37:28<2:25:24, 2.48s/it] +2025-02-06 02:45:11 - ERROR - stderr - 84%|████████▍ | 18914/22434 [16:37:30<2:25:54, 2.49s/it] +2025-02-06 02:45:11 - ERROR - stderr - +2025-02-06 02:45:11 - ERROR - stderr - +2025-02-06 02:45:11 - INFO - stdout - {'loss': 0.3584, 'grad_norm': 1.4918192625045776, 'learning_rate': 1.2637760935363053e-06, 'epoch': 2.53} +2025-02-06 02:45:11 - ERROR - stderr - 84%|████████▍ | 18914/22434 [16:37:30<2:25:54, 2.49s/it] +2025-02-06 02:45:13 - ERROR - stderr - 84%|████████▍ | 18915/22434 [16:37:33<2:26:21, 2.50s/it] +2025-02-06 02:45:13 - ERROR - stderr - +2025-02-06 02:45:13 - ERROR - stderr - +2025-02-06 02:45:13 - INFO - stdout - {'loss': 0.3726, 'grad_norm': 1.6515318155288696, 'learning_rate': 1.2630736510147569e-06, 'epoch': 2.53} +2025-02-06 02:45:13 - ERROR - stderr - 84%|████████▍ | 18915/22434 [16:37:33<2:26:21, 2.50s/it] +2025-02-06 02:45:16 - ERROR - stderr - 84%|████████▍ | 18916/22434 [16:37:36<2:28:29, 2.53s/it] +2025-02-06 02:45:16 - ERROR - stderr - +2025-02-06 02:45:16 - ERROR - stderr - +2025-02-06 02:45:16 - INFO - stdout - {'loss': 0.3648, 'grad_norm': 1.7606096267700195, 'learning_rate': 1.2623713906060798e-06, 'epoch': 2.53} +2025-02-06 02:45:16 - ERROR - stderr - 84%|████████▍ | 18916/22434 [16:37:36<2:28:29, 2.53s/it] +2025-02-06 02:45:19 - ERROR - stderr - 84%|████████▍ | 18917/22434 [16:37:38<2:34:44, 2.64s/it] +2025-02-06 02:45:19 - ERROR - stderr - +2025-02-06 02:45:19 - ERROR - stderr - +2025-02-06 02:45:19 - INFO - stdout - {'loss': 0.3066, 'grad_norm': 1.505626916885376, 'learning_rate': 1.261669312324908e-06, 'epoch': 2.53} +2025-02-06 02:45:19 - ERROR - stderr - 84%|████████▍ | 18917/22434 [16:37:39<2:34:44, 2.64s/it] 
+2025-02-06 02:45:21 - ERROR - stderr - 84%|████████▍ | 18918/22434 [16:37:41<2:35:29, 2.65s/it] +2025-02-06 02:45:21 - ERROR - stderr - +2025-02-06 02:45:21 - ERROR - stderr - +2025-02-06 02:45:21 - INFO - stdout - {'loss': 0.3279, 'grad_norm': 1.4503101110458374, 'learning_rate': 1.260967416185882e-06, 'epoch': 2.53} +2025-02-06 02:45:21 - ERROR - stderr - 84%|████████▍ | 18918/22434 [16:37:41<2:35:29, 2.65s/it] +2025-02-06 02:45:24 - ERROR - stderr - 84%|████████▍ | 18919/22434 [16:37:44<2:32:11, 2.60s/it] +2025-02-06 02:45:24 - ERROR - stderr - +2025-02-06 02:45:24 - ERROR - stderr - +2025-02-06 02:45:24 - INFO - stdout - {'loss': 0.3945, 'grad_norm': 1.7868506908416748, 'learning_rate': 1.2602657022036224e-06, 'epoch': 2.53} +2025-02-06 02:45:24 - ERROR - stderr - 84%|████████▍ | 18919/22434 [16:37:44<2:32:11, 2.60s/it] +2025-02-06 02:45:26 - ERROR - stderr - 84%|████████▍ | 18920/22434 [16:37:46<2:30:24, 2.57s/it] +2025-02-06 02:45:26 - ERROR - stderr - +2025-02-06 02:45:26 - ERROR - stderr - +2025-02-06 02:45:26 - INFO - stdout - {'loss': 0.3394, 'grad_norm': 1.4775365591049194, 'learning_rate': 1.2595641703927652e-06, 'epoch': 2.53} +2025-02-06 02:45:26 - ERROR - stderr - 84%|████████▍ | 18920/22434 [16:37:46<2:30:24, 2.57s/it] +2025-02-06 02:45:29 - ERROR - stderr - 84%|████████▍ | 18921/22434 [16:37:49<2:30:58, 2.58s/it] +2025-02-06 02:45:29 - ERROR - stderr - +2025-02-06 02:45:29 - ERROR - stderr - +2025-02-06 02:45:29 - INFO - stdout - {'loss': 0.4013, 'grad_norm': 1.7230604887008667, 'learning_rate': 1.2588628207679276e-06, 'epoch': 2.53} +2025-02-06 02:45:29 - ERROR - stderr - 84%|████████▍ | 18921/22434 [16:37:49<2:30:58, 2.58s/it] +2025-02-06 02:45:31 - ERROR - stderr - 84%|████████▍ | 18922/22434 [16:37:51<2:29:40, 2.56s/it] +2025-02-06 02:45:32 - ERROR - stderr - +2025-02-06 02:45:32 - ERROR - stderr - +2025-02-06 02:45:32 - INFO - stdout - {'loss': 0.3794, 'grad_norm': 1.548416256904602, 'learning_rate': 1.2581616533437279e-06, 'epoch': 2.53} 
+2025-02-06 02:45:32 - ERROR - stderr - 84%|████████▍ | 18922/22434 [16:37:51<2:29:40, 2.56s/it] +2025-02-06 02:45:34 - ERROR - stderr - 84%|████████▍ | 18923/22434 [16:37:54<2:28:39, 2.54s/it] +2025-02-06 02:45:34 - ERROR - stderr - +2025-02-06 02:45:34 - ERROR - stderr - +2025-02-06 02:45:34 - INFO - stdout - {'loss': 0.3246, 'grad_norm': 1.4889628887176514, 'learning_rate': 1.2574606681347878e-06, 'epoch': 2.53} +2025-02-06 02:45:34 - ERROR - stderr - 84%|████████▍ | 18923/22434 [16:37:54<2:28:39, 2.54s/it] +2025-02-06 02:45:36 - ERROR - stderr - 84%|████████▍ | 18924/22434 [16:37:56<2:28:23, 2.54s/it] +2025-02-06 02:45:37 - ERROR - stderr - +2025-02-06 02:45:37 - ERROR - stderr - +2025-02-06 02:45:37 - INFO - stdout - {'loss': 0.3332, 'grad_norm': 1.5073761940002441, 'learning_rate': 1.25675986515571e-06, 'epoch': 2.53} +2025-02-06 02:45:37 - ERROR - stderr - 84%|████████▍ | 18924/22434 [16:37:56<2:28:23, 2.54s/it] +2025-02-06 02:45:39 - ERROR - stderr - 84%|████████▍ | 18925/22434 [16:37:59<2:26:56, 2.51s/it] +2025-02-06 02:45:39 - ERROR - stderr - +2025-02-06 02:45:39 - ERROR - stderr - +2025-02-06 02:45:39 - INFO - stdout - {'loss': 0.363, 'grad_norm': 1.572583556175232, 'learning_rate': 1.2560592444211106e-06, 'epoch': 2.53} +2025-02-06 02:45:39 - ERROR - stderr - 84%|████████▍ | 18925/22434 [16:37:59<2:26:56, 2.51s/it] +2025-02-06 02:45:41 - ERROR - stderr - 84%|████████▍ | 18926/22434 [16:38:01<2:26:53, 2.51s/it] +2025-02-06 02:45:42 - ERROR - stderr - +2025-02-06 02:45:42 - ERROR - stderr - +2025-02-06 02:45:42 - INFO - stdout - {'loss': 0.3461, 'grad_norm': 1.5006012916564941, 'learning_rate': 1.2553588059455878e-06, 'epoch': 2.53} +2025-02-06 02:45:42 - ERROR - stderr - 84%|████████▍ | 18926/22434 [16:38:01<2:26:53, 2.51s/it] +2025-02-06 02:45:44 - ERROR - stderr - 84%|████████▍ | 18927/22434 [16:38:04<2:25:46, 2.49s/it] +2025-02-06 02:45:44 - ERROR - stderr - +2025-02-06 02:45:44 - ERROR - stderr - +2025-02-06 02:45:44 - INFO - stdout - {'loss': 
0.3793, 'grad_norm': 1.8597067594528198, 'learning_rate': 1.2546585497437425e-06, 'epoch': 2.53} +2025-02-06 02:45:44 - ERROR - stderr - 84%|████████▍ | 18927/22434 [16:38:04<2:25:46, 2.49s/it] +2025-02-06 02:45:46 - ERROR - stderr - 84%|████████▍ | 18928/22434 [16:38:06<2:25:05, 2.48s/it] +2025-02-06 02:45:46 - ERROR - stderr - +2025-02-06 02:45:46 - ERROR - stderr - +2025-02-06 02:45:46 - INFO - stdout - {'loss': 0.36, 'grad_norm': 1.4208486080169678, 'learning_rate': 1.2539584758301704e-06, 'epoch': 2.53} +2025-02-06 02:45:46 - ERROR - stderr - 84%|████████▍ | 18928/22434 [16:38:06<2:25:05, 2.48s/it] +2025-02-06 02:45:49 - ERROR - stderr - 84%|████████▍ | 18929/22434 [16:38:09<2:26:27, 2.51s/it] +2025-02-06 02:45:49 - ERROR - stderr - +2025-02-06 02:45:49 - ERROR - stderr - +2025-02-06 02:45:49 - INFO - stdout - {'loss': 0.3554, 'grad_norm': 1.5138018131256104, 'learning_rate': 1.2532585842194656e-06, 'epoch': 2.53} +2025-02-06 02:45:49 - ERROR - stderr - 84%|████████▍ | 18929/22434 [16:38:09<2:26:27, 2.51s/it] +2025-02-06 02:45:51 - ERROR - stderr - 84%|████████▍ | 18930/22434 [16:38:11<2:25:50, 2.50s/it] +2025-02-06 02:45:51 - ERROR - stderr - +2025-02-06 02:45:51 - ERROR - stderr - +2025-02-06 02:45:51 - INFO - stdout - {'loss': 0.336, 'grad_norm': 1.3781754970550537, 'learning_rate': 1.2525588749262163e-06, 'epoch': 2.53} +2025-02-06 02:45:51 - ERROR - stderr - 84%|████████▍ | 18930/22434 [16:38:11<2:25:50, 2.50s/it] +2025-02-06 02:45:54 - ERROR - stderr - 84%|████████▍ | 18931/22434 [16:38:14<2:33:35, 2.63s/it] +2025-02-06 02:45:54 - ERROR - stderr - +2025-02-06 02:45:54 - ERROR - stderr - +2025-02-06 02:45:54 - INFO - stdout - {'loss': 0.3259, 'grad_norm': 1.4748908281326294, 'learning_rate': 1.2518593479650065e-06, 'epoch': 2.53} +2025-02-06 02:45:54 - ERROR - stderr - 84%|████████▍ | 18931/22434 [16:38:14<2:33:35, 2.63s/it] +2025-02-06 02:45:57 - ERROR - stderr - 84%|████████▍ | 18932/22434 [16:38:17<2:32:04, 2.61s/it] +2025-02-06 02:45:57 - ERROR - 
stderr - +2025-02-06 02:45:57 - ERROR - stderr - +2025-02-06 02:45:57 - INFO - stdout - {'loss': 0.39, 'grad_norm': 1.6090593338012695, 'learning_rate': 1.2511600033504178e-06, 'epoch': 2.53} +2025-02-06 02:45:57 - ERROR - stderr - 84%|████████▍ | 18932/22434 [16:38:17<2:32:04, 2.61s/it] +2025-02-06 02:45:59 - ERROR - stderr - 84%|████████▍ | 18933/22434 [16:38:19<2:29:43, 2.57s/it] +2025-02-06 02:45:59 - ERROR - stderr - +2025-02-06 02:45:59 - ERROR - stderr - +2025-02-06 02:45:59 - INFO - stdout - {'loss': 0.358, 'grad_norm': 1.5167192220687866, 'learning_rate': 1.2504608410970264e-06, 'epoch': 2.53} +2025-02-06 02:45:59 - ERROR - stderr - 84%|████████▍ | 18933/22434 [16:38:19<2:29:43, 2.57s/it] +2025-02-06 02:46:02 - ERROR - stderr - 84%|████████▍ | 18934/22434 [16:38:22<2:27:36, 2.53s/it] +2025-02-06 02:46:02 - ERROR - stderr - +2025-02-06 02:46:02 - ERROR - stderr - +2025-02-06 02:46:02 - INFO - stdout - {'loss': 0.3648, 'grad_norm': 1.4974462985992432, 'learning_rate': 1.2497618612194073e-06, 'epoch': 2.53} +2025-02-06 02:46:02 - ERROR - stderr - 84%|████████▍ | 18934/22434 [16:38:22<2:27:36, 2.53s/it] +2025-02-06 02:46:04 - ERROR - stderr - 84%|████████▍ | 18935/22434 [16:38:24<2:30:07, 2.57s/it] +2025-02-06 02:46:05 - ERROR - stderr - +2025-02-06 02:46:05 - ERROR - stderr - +2025-02-06 02:46:05 - INFO - stdout - {'loss': 0.3383, 'grad_norm': 1.4638608694076538, 'learning_rate': 1.2490630637321289e-06, 'epoch': 2.53} +2025-02-06 02:46:05 - ERROR - stderr - 84%|████████▍ | 18935/22434 [16:38:24<2:30:07, 2.57s/it] +2025-02-06 02:46:07 - ERROR - stderr - 84%|████████▍ | 18936/22434 [16:38:27<2:29:21, 2.56s/it] +2025-02-06 02:46:07 - ERROR - stderr - +2025-02-06 02:46:07 - ERROR - stderr - +2025-02-06 02:46:07 - INFO - stdout - {'loss': 0.3777, 'grad_norm': 1.5389032363891602, 'learning_rate': 1.248364448649757e-06, 'epoch': 2.53} +2025-02-06 02:46:07 - ERROR - stderr - 84%|████████▍ | 18936/22434 [16:38:27<2:29:21, 2.56s/it] +2025-02-06 02:46:10 - ERROR - 
stderr - 84%|████████▍ | 18937/22434 [16:38:29<2:28:52, 2.55s/it] +2025-02-06 02:46:10 - ERROR - stderr - +2025-02-06 02:46:10 - ERROR - stderr - +2025-02-06 02:46:10 - INFO - stdout - {'loss': 0.4132, 'grad_norm': 1.6555815935134888, 'learning_rate': 1.2476660159868559e-06, 'epoch': 2.53} +2025-02-06 02:46:10 - ERROR - stderr - 84%|████████▍ | 18937/22434 [16:38:29<2:28:52, 2.55s/it] +2025-02-06 02:46:12 - ERROR - stderr - 84%|████████▍ | 18938/22434 [16:38:32<2:28:34, 2.55s/it] +2025-02-06 02:46:12 - ERROR - stderr - +2025-02-06 02:46:12 - ERROR - stderr - +2025-02-06 02:46:12 - INFO - stdout - {'loss': 0.3578, 'grad_norm': 1.5481048822402954, 'learning_rate': 1.2469677657579771e-06, 'epoch': 2.53} +2025-02-06 02:46:12 - ERROR - stderr - 84%|████████▍ | 18938/22434 [16:38:32<2:28:34, 2.55s/it] +2025-02-06 02:46:15 - ERROR - stderr - 84%|████████▍ | 18939/22434 [16:38:34<2:28:36, 2.55s/it] +2025-02-06 02:46:15 - ERROR - stderr - +2025-02-06 02:46:15 - ERROR - stderr - +2025-02-06 02:46:15 - INFO - stdout - {'loss': 0.356, 'grad_norm': 1.7124639749526978, 'learning_rate': 1.2462696979776835e-06, 'epoch': 2.53} +2025-02-06 02:46:15 - ERROR - stderr - 84%|████████▍ | 18939/22434 [16:38:34<2:28:36, 2.55s/it] +2025-02-06 02:46:17 - ERROR - stderr - 84%|████████▍ | 18940/22434 [16:38:37<2:28:09, 2.54s/it] +2025-02-06 02:46:17 - ERROR - stderr - +2025-02-06 02:46:17 - ERROR - stderr - +2025-02-06 02:46:17 - INFO - stdout - {'loss': 0.3931, 'grad_norm': 1.6271015405654907, 'learning_rate': 1.2455718126605176e-06, 'epoch': 2.53} +2025-02-06 02:46:17 - ERROR - stderr - 84%|████████▍ | 18940/22434 [16:38:37<2:28:09, 2.54s/it] +2025-02-06 02:46:20 - ERROR - stderr - 84%|████████▍ | 18941/22434 [16:38:39<2:27:41, 2.54s/it] +2025-02-06 02:46:20 - ERROR - stderr - +2025-02-06 02:46:20 - ERROR - stderr - +2025-02-06 02:46:20 - INFO - stdout - {'loss': 0.3803, 'grad_norm': 1.7026063203811646, 'learning_rate': 1.2448741098210326e-06, 'epoch': 2.53} +2025-02-06 02:46:20 - ERROR - 
stderr - 84%|████████▍ | 18941/22434 [16:38:40<2:27:41, 2.54s/it] +2025-02-06 02:46:22 - ERROR - stderr - 84%|████████▍ | 18942/22434 [16:38:42<2:32:03, 2.61s/it] +2025-02-06 02:46:23 - ERROR - stderr - +2025-02-06 02:46:23 - ERROR - stderr - +2025-02-06 02:46:23 - INFO - stdout - {'loss': 0.4584, 'grad_norm': 1.7247742414474487, 'learning_rate': 1.2441765894737711e-06, 'epoch': 2.53} +2025-02-06 02:46:23 - ERROR - stderr - 84%|████████▍ | 18942/22434 [16:38:42<2:32:03, 2.61s/it] +2025-02-06 02:46:25 - ERROR - stderr - 84%|████████▍ | 18943/22434 [16:38:45<2:30:18, 2.58s/it] +2025-02-06 02:46:25 - ERROR - stderr - +2025-02-06 02:46:25 - ERROR - stderr - +2025-02-06 02:46:25 - INFO - stdout - {'loss': 0.352, 'grad_norm': 1.5899831056594849, 'learning_rate': 1.243479251633266e-06, 'epoch': 2.53} +2025-02-06 02:46:25 - ERROR - stderr - 84%|████████▍ | 18943/22434 [16:38:45<2:30:18, 2.58s/it] +2025-02-06 02:46:27 - ERROR - stderr - 84%|████████▍ | 18944/22434 [16:38:47<2:28:06, 2.55s/it] +2025-02-06 02:46:28 - ERROR - stderr - +2025-02-06 02:46:28 - ERROR - stderr - +2025-02-06 02:46:28 - INFO - stdout - {'loss': 0.4004, 'grad_norm': 1.6082937717437744, 'learning_rate': 1.2427820963140612e-06, 'epoch': 2.53} +2025-02-06 02:46:28 - ERROR - stderr - 84%|████████▍ | 18944/22434 [16:38:47<2:28:06, 2.55s/it] +2025-02-06 02:46:30 - ERROR - stderr - 84%|████████▍ | 18945/22434 [16:38:50<2:29:11, 2.57s/it] +2025-02-06 02:46:30 - ERROR - stderr - +2025-02-06 02:46:30 - ERROR - stderr - +2025-02-06 02:46:30 - INFO - stdout - {'loss': 0.3835, 'grad_norm': 1.5501461029052734, 'learning_rate': 1.2420851235306819e-06, 'epoch': 2.53} +2025-02-06 02:46:30 - ERROR - stderr - 84%|████████▍ | 18945/22434 [16:38:50<2:29:11, 2.57s/it] +2025-02-06 02:46:33 - ERROR - stderr - 84%|████████▍ | 18946/22434 [16:38:52<2:28:54, 2.56s/it] +2025-02-06 02:46:33 - ERROR - stderr - +2025-02-06 02:46:33 - ERROR - stderr - +2025-02-06 02:46:33 - INFO - stdout - {'loss': 0.3305, 'grad_norm': 
1.5390478372573853, 'learning_rate': 1.2413883332976573e-06, 'epoch': 2.53} +2025-02-06 02:46:33 - ERROR - stderr - 84%|████████▍ | 18946/22434 [16:38:52<2:28:54, 2.56s/it] +2025-02-06 02:46:35 - ERROR - stderr - 84%|████████▍ | 18947/22434 [16:38:55<2:28:18, 2.55s/it] +2025-02-06 02:46:35 - ERROR - stderr - +2025-02-06 02:46:35 - ERROR - stderr - +2025-02-06 02:46:35 - INFO - stdout - {'loss': 0.3827, 'grad_norm': 1.4468762874603271, 'learning_rate': 1.2406917256295115e-06, 'epoch': 2.53} +2025-02-06 02:46:35 - ERROR - stderr - 84%|████████▍ | 18947/22434 [16:38:55<2:28:18, 2.55s/it] +2025-02-06 02:46:38 - ERROR - stderr - 84%|████████▍ | 18948/22434 [16:38:57<2:27:34, 2.54s/it] +2025-02-06 02:46:38 - ERROR - stderr - +2025-02-06 02:46:38 - ERROR - stderr - +2025-02-06 02:46:38 - INFO - stdout - {'loss': 0.3413, 'grad_norm': 1.4085780382156372, 'learning_rate': 1.239995300540765e-06, 'epoch': 2.53} +2025-02-06 02:46:38 - ERROR - stderr - 84%|████████▍ | 18948/22434 [16:38:57<2:27:34, 2.54s/it] +2025-02-06 02:46:40 - ERROR - stderr - 84%|████████▍ | 18949/22434 [16:39:00<2:27:27, 2.54s/it] +2025-02-06 02:46:40 - ERROR - stderr - +2025-02-06 02:46:40 - ERROR - stderr - +2025-02-06 02:46:40 - INFO - stdout - {'loss': 0.4127, 'grad_norm': 1.6610180139541626, 'learning_rate': 1.2392990580459351e-06, 'epoch': 2.53} +2025-02-06 02:46:40 - ERROR - stderr - 84%|████████▍ | 18949/22434 [16:39:00<2:27:27, 2.54s/it] +2025-02-06 02:46:43 - ERROR - stderr - 84%|████████▍ | 18950/22434 [16:39:02<2:25:21, 2.50s/it] +2025-02-06 02:46:43 - ERROR - stderr - +2025-02-06 02:46:43 - ERROR - stderr - +2025-02-06 02:46:43 - INFO - stdout - {'loss': 0.4361, 'grad_norm': 1.6348180770874023, 'learning_rate': 1.2386029981595327e-06, 'epoch': 2.53} +2025-02-06 02:46:43 - ERROR - stderr - 84%|████████▍ | 18950/22434 [16:39:02<2:25:21, 2.50s/it] +2025-02-06 02:46:45 - ERROR - stderr - 84%|████████▍ | 18951/22434 [16:39:05<2:25:08, 2.50s/it] +2025-02-06 02:46:45 - ERROR - stderr - +2025-02-06 
02:46:45 - ERROR - stderr - +2025-02-06 02:46:45 - INFO - stdout - {'loss': 0.3813, 'grad_norm': 1.6586226224899292, 'learning_rate': 1.2379071208960669e-06, 'epoch': 2.53} +2025-02-06 02:46:45 - ERROR - stderr - 84%|████████▍ | 18951/22434 [16:39:05<2:25:08, 2.50s/it] +2025-02-06 02:46:48 - ERROR - stderr - 84%|████████▍ | 18952/22434 [16:39:07<2:24:38, 2.49s/it] +2025-02-06 02:46:48 - ERROR - stderr - +2025-02-06 02:46:48 - ERROR - stderr - +2025-02-06 02:46:48 - INFO - stdout - {'loss': 0.3266, 'grad_norm': 1.4290226697921753, 'learning_rate': 1.2372114262700419e-06, 'epoch': 2.53} +2025-02-06 02:46:48 - ERROR - stderr - 84%|████████▍ | 18952/22434 [16:39:07<2:24:38, 2.49s/it] +2025-02-06 02:46:50 - ERROR - stderr - 84%|████████▍ | 18953/22434 [16:39:10<2:23:44, 2.48s/it] +2025-02-06 02:46:50 - ERROR - stderr - +2025-02-06 02:46:50 - ERROR - stderr - +2025-02-06 02:46:50 - INFO - stdout - {'loss': 0.31, 'grad_norm': 1.4148601293563843, 'learning_rate': 1.2365159142959604e-06, 'epoch': 2.53} +2025-02-06 02:46:50 - ERROR - stderr - 84%|████████▍ | 18953/22434 [16:39:10<2:23:44, 2.48s/it] +2025-02-06 02:46:52 - ERROR - stderr - 84%|████████▍ | 18954/22434 [16:39:12<2:23:24, 2.47s/it] +2025-02-06 02:46:53 - ERROR - stderr - +2025-02-06 02:46:53 - ERROR - stderr - +2025-02-06 02:46:53 - INFO - stdout - {'loss': 0.3724, 'grad_norm': 1.4793347120285034, 'learning_rate': 1.2358205849883197e-06, 'epoch': 2.53} +2025-02-06 02:46:53 - ERROR - stderr - 84%|████████▍ | 18954/22434 [16:39:12<2:23:24, 2.47s/it] +2025-02-06 02:46:55 - ERROR - stderr - 84%|████████▍ | 18955/22434 [16:39:15<2:22:31, 2.46s/it] +2025-02-06 02:46:55 - ERROR - stderr - +2025-02-06 02:46:55 - ERROR - stderr - +2025-02-06 02:46:55 - INFO - stdout - {'loss': 0.3367, 'grad_norm': 1.5342814922332764, 'learning_rate': 1.235125438361612e-06, 'epoch': 2.53} +2025-02-06 02:46:55 - ERROR - stderr - 84%|████████▍ | 18955/22434 [16:39:15<2:22:31, 2.46s/it] +2025-02-06 02:46:57 - ERROR - stderr - 84%|████████▍ | 
18956/22434 [16:39:17<2:22:58, 2.47s/it] +2025-02-06 02:46:57 - ERROR - stderr - +2025-02-06 02:46:57 - ERROR - stderr - +2025-02-06 02:46:57 - INFO - stdout - {'loss': 0.3565, 'grad_norm': 1.5601845979690552, 'learning_rate': 1.234430474430327e-06, 'epoch': 2.53} +2025-02-06 02:46:57 - ERROR - stderr - 84%|████████▍ | 18956/22434 [16:39:17<2:22:58, 2.47s/it] +2025-02-06 02:47:00 - ERROR - stderr - 85%|████████▍ | 18957/22434 [16:39:20<2:23:40, 2.48s/it] +2025-02-06 02:47:00 - ERROR - stderr - +2025-02-06 02:47:00 - ERROR - stderr - +2025-02-06 02:47:00 - INFO - stdout - {'loss': 0.3976, 'grad_norm': 1.6269174814224243, 'learning_rate': 1.2337356932089517e-06, 'epoch': 2.54} +2025-02-06 02:47:00 - ERROR - stderr - 85%|████████▍ | 18957/22434 [16:39:20<2:23:40, 2.48s/it] +2025-02-06 02:47:02 - ERROR - stderr - 85%|████████▍ | 18958/22434 [16:39:22<2:22:40, 2.46s/it] +2025-02-06 02:47:02 - ERROR - stderr - +2025-02-06 02:47:02 - ERROR - stderr - +2025-02-06 02:47:02 - INFO - stdout - {'loss': 0.4239, 'grad_norm': 1.8737772703170776, 'learning_rate': 1.2330410947119685e-06, 'epoch': 2.54} +2025-02-06 02:47:02 - ERROR - stderr - 85%|████████▍ | 18958/22434 [16:39:22<2:22:40, 2.46s/it] +2025-02-06 02:47:05 - ERROR - stderr - 85%|████████▍ | 18959/22434 [16:39:25<2:23:35, 2.48s/it] +2025-02-06 02:47:05 - ERROR - stderr - +2025-02-06 02:47:05 - ERROR - stderr - +2025-02-06 02:47:05 - INFO - stdout - {'loss': 0.3963, 'grad_norm': 1.744576096534729, 'learning_rate': 1.2323466789538508e-06, 'epoch': 2.54} +2025-02-06 02:47:05 - ERROR - stderr - 85%|████████▍ | 18959/22434 [16:39:25<2:23:35, 2.48s/it] +2025-02-06 02:47:07 - ERROR - stderr - 85%|████████▍ | 18960/22434 [16:39:27<2:24:01, 2.49s/it] +2025-02-06 02:47:07 - ERROR - stderr - +2025-02-06 02:47:07 - ERROR - stderr - +2025-02-06 02:47:07 - INFO - stdout - {'loss': 0.328, 'grad_norm': 1.4829845428466797, 'learning_rate': 1.2316524459490796e-06, 'epoch': 2.54} +2025-02-06 02:47:07 - ERROR - stderr - 85%|████████▍ | 
18960/22434 [16:39:27<2:24:01, 2.49s/it] +2025-02-06 02:47:10 - ERROR - stderr - 85%|████████▍ | 18961/22434 [16:39:30<2:23:40, 2.48s/it] +2025-02-06 02:47:10 - ERROR - stderr - +2025-02-06 02:47:10 - ERROR - stderr - +2025-02-06 02:47:10 - INFO - stdout - {'loss': 0.3597, 'grad_norm': 1.599000096321106, 'learning_rate': 1.230958395712123e-06, 'epoch': 2.54} +2025-02-06 02:47:10 - ERROR - stderr - 85%|████████▍ | 18961/22434 [16:39:30<2:23:40, 2.48s/it] +2025-02-06 02:47:12 - ERROR - stderr - 85%|████████▍ | 18962/22434 [16:39:32<2:23:34, 2.48s/it] +2025-02-06 02:47:12 - ERROR - stderr - +2025-02-06 02:47:12 - ERROR - stderr - +2025-02-06 02:47:12 - INFO - stdout - {'loss': 0.3609, 'grad_norm': 1.5227789878845215, 'learning_rate': 1.2302645282574465e-06, 'epoch': 2.54} +2025-02-06 02:47:12 - ERROR - stderr - 85%|████████▍ | 18962/22434 [16:39:32<2:23:34, 2.48s/it] +2025-02-06 02:47:15 - ERROR - stderr - 85%|████████▍ | 18963/22434 [16:39:35<2:27:45, 2.55s/it] +2025-02-06 02:47:15 - ERROR - stderr - +2025-02-06 02:47:15 - ERROR - stderr - +2025-02-06 02:47:15 - INFO - stdout - {'loss': 0.3806, 'grad_norm': 1.6379772424697876, 'learning_rate': 1.2295708435995168e-06, 'epoch': 2.54} +2025-02-06 02:47:15 - ERROR - stderr - 85%|████████▍ | 18963/22434 [16:39:35<2:27:45, 2.55s/it] +2025-02-06 02:47:17 - ERROR - stderr - 85%|████████▍ | 18964/22434 [16:39:37<2:25:24, 2.51s/it] +2025-02-06 02:47:18 - ERROR - stderr - +2025-02-06 02:47:18 - ERROR - stderr - +2025-02-06 02:47:18 - INFO - stdout - {'loss': 0.3813, 'grad_norm': 1.5787181854248047, 'learning_rate': 1.2288773417527866e-06, 'epoch': 2.54} +2025-02-06 02:47:18 - ERROR - stderr - 85%|████████▍ | 18964/22434 [16:39:37<2:25:24, 2.51s/it] +2025-02-06 02:47:20 - ERROR - stderr - 85%|████████▍ | 18965/22434 [16:39:40<2:27:00, 2.54s/it] +2025-02-06 02:47:20 - ERROR - stderr - +2025-02-06 02:47:20 - ERROR - stderr - +2025-02-06 02:47:20 - INFO - stdout - {'loss': 0.3524, 'grad_norm': 1.4062182903289795, 'learning_rate': 
1.2281840227317187e-06, 'epoch': 2.54} +2025-02-06 02:47:20 - ERROR - stderr - 85%|████████▍ | 18965/22434 [16:39:40<2:27:00, 2.54s/it] +2025-02-06 02:47:23 - ERROR - stderr - 85%|████████▍ | 18966/22434 [16:39:42<2:26:51, 2.54s/it] +2025-02-06 02:47:23 - ERROR - stderr - +2025-02-06 02:47:23 - ERROR - stderr - +2025-02-06 02:47:23 - INFO - stdout - {'loss': 0.3671, 'grad_norm': 1.653089165687561, 'learning_rate': 1.2274908865507595e-06, 'epoch': 2.54} +2025-02-06 02:47:23 - ERROR - stderr - 85%|████████▍ | 18966/22434 [16:39:42<2:26:51, 2.54s/it] +2025-02-06 02:47:25 - ERROR - stderr - 85%|████████▍ | 18967/22434 [16:39:45<2:24:41, 2.50s/it] +2025-02-06 02:47:25 - ERROR - stderr - +2025-02-06 02:47:25 - ERROR - stderr - +2025-02-06 02:47:25 - INFO - stdout - {'loss': 0.3521, 'grad_norm': 1.4121057987213135, 'learning_rate': 1.2267979332243552e-06, 'epoch': 2.54} +2025-02-06 02:47:25 - ERROR - stderr - 85%|████████▍ | 18967/22434 [16:39:45<2:24:41, 2.50s/it] +2025-02-06 02:47:27 - ERROR - stderr - 85%|████████▍ | 18968/22434 [16:39:47<2:24:02, 2.49s/it] +2025-02-06 02:47:28 - ERROR - stderr - +2025-02-06 02:47:28 - ERROR - stderr - +2025-02-06 02:47:28 - INFO - stdout - {'loss': 0.3375, 'grad_norm': 1.4362971782684326, 'learning_rate': 1.2261051627669584e-06, 'epoch': 2.54} +2025-02-06 02:47:28 - ERROR - stderr - 85%|████████▍ | 18968/22434 [16:39:47<2:24:02, 2.49s/it] +2025-02-06 02:47:30 - ERROR - stderr - 85%|████████▍ | 18969/22434 [16:39:50<2:23:06, 2.48s/it] +2025-02-06 02:47:30 - ERROR - stderr - +2025-02-06 02:47:30 - ERROR - stderr - +2025-02-06 02:47:30 - INFO - stdout - {'loss': 0.3247, 'grad_norm': 1.4840314388275146, 'learning_rate': 1.2254125751929991e-06, 'epoch': 2.54} +2025-02-06 02:47:30 - ERROR - stderr - 85%|████████▍ | 18969/22434 [16:39:50<2:23:06, 2.48s/it] +2025-02-06 02:47:32 - ERROR - stderr - 85%|████████▍ | 18970/22434 [16:39:52<2:22:34, 2.47s/it] +2025-02-06 02:47:32 - ERROR - stderr - +2025-02-06 02:47:32 - ERROR - stderr - +2025-02-06 
02:47:32 - INFO - stdout - {'loss': 0.3474, 'grad_norm': 1.5780587196350098, 'learning_rate': 1.2247201705169232e-06, 'epoch': 2.54} +2025-02-06 02:47:32 - ERROR - stderr - 85%|████████▍ | 18970/22434 [16:39:52<2:22:34, 2.47s/it] +2025-02-06 02:47:35 - ERROR - stderr - 85%|████████▍ | 18971/22434 [16:39:55<2:23:43, 2.49s/it] +2025-02-06 02:47:35 - ERROR - stderr - +2025-02-06 02:47:35 - ERROR - stderr - +2025-02-06 02:47:35 - INFO - stdout - {'loss': 0.3254, 'grad_norm': 1.3934011459350586, 'learning_rate': 1.2240279487531548e-06, 'epoch': 2.54} +2025-02-06 02:47:35 - ERROR - stderr - 85%|████████▍ | 18971/22434 [16:39:55<2:23:43, 2.49s/it] +2025-02-06 02:47:37 - ERROR - stderr - 85%|████████▍ | 18972/22434 [16:39:57<2:23:43, 2.49s/it] +2025-02-06 02:47:37 - ERROR - stderr - +2025-02-06 02:47:37 - ERROR - stderr - +2025-02-06 02:47:37 - INFO - stdout - {'loss': 0.3708, 'grad_norm': 1.4963421821594238, 'learning_rate': 1.2233359099161268e-06, 'epoch': 2.54} +2025-02-06 02:47:37 - ERROR - stderr - 85%|████████▍ | 18972/22434 [16:39:57<2:23:43, 2.49s/it] +2025-02-06 02:47:40 - ERROR - stderr - 85%|████████▍ | 18973/22434 [16:40:00<2:23:58, 2.50s/it] +2025-02-06 02:47:40 - ERROR - stderr - +2025-02-06 02:47:40 - ERROR - stderr - +2025-02-06 02:47:40 - INFO - stdout - {'loss': 0.3597, 'grad_norm': 1.4356242418289185, 'learning_rate': 1.2226440540202645e-06, 'epoch': 2.54} +2025-02-06 02:47:40 - ERROR - stderr - 85%|████████▍ | 18973/22434 [16:40:00<2:23:58, 2.50s/it] +2025-02-06 02:47:42 - ERROR - stderr - 85%|████████▍ | 18974/22434 [16:40:02<2:24:01, 2.50s/it] +2025-02-06 02:47:42 - ERROR - stderr - +2025-02-06 02:47:42 - ERROR - stderr - +2025-02-06 02:47:42 - INFO - stdout - {'loss': 0.2859, 'grad_norm': 1.3616023063659668, 'learning_rate': 1.221952381079986e-06, 'epoch': 2.54} +2025-02-06 02:47:42 - ERROR - stderr - 85%|████████▍ | 18974/22434 [16:40:02<2:24:01, 2.50s/it] +2025-02-06 02:47:45 - ERROR - stderr - 85%|████████▍ | 18975/22434 [16:40:05<2:28:31, 
2.58s/it] +2025-02-06 02:47:45 - ERROR - stderr - +2025-02-06 02:47:45 - ERROR - stderr - +2025-02-06 02:47:45 - INFO - stdout - {'loss': 0.3904, 'grad_norm': 1.6241912841796875, 'learning_rate': 1.2212608911097123e-06, 'epoch': 2.54} +2025-02-06 02:47:45 - ERROR - stderr - 85%|████████▍ | 18975/22434 [16:40:05<2:28:31, 2.58s/it] +2025-02-06 02:47:48 - ERROR - stderr - 85%|████████▍ | 18976/22434 [16:40:08<2:28:45, 2.58s/it] +2025-02-06 02:47:48 - ERROR - stderr - +2025-02-06 02:47:48 - ERROR - stderr - +2025-02-06 02:47:48 - INFO - stdout - {'loss': 0.3649, 'grad_norm': 1.471322774887085, 'learning_rate': 1.220569584123854e-06, 'epoch': 2.54} +2025-02-06 02:47:48 - ERROR - stderr - 85%|████████▍ | 18976/22434 [16:40:08<2:28:45, 2.58s/it] +2025-02-06 02:47:50 - ERROR - stderr - 85%|████████▍ | 18977/22434 [16:40:10<2:26:32, 2.54s/it] +2025-02-06 02:47:50 - ERROR - stderr - +2025-02-06 02:47:50 - ERROR - stderr - +2025-02-06 02:47:50 - INFO - stdout - {'loss': 0.4084, 'grad_norm': 1.5435800552368164, 'learning_rate': 1.2198784601368208e-06, 'epoch': 2.54} +2025-02-06 02:47:50 - ERROR - stderr - 85%|████████▍ | 18977/22434 [16:40:10<2:26:32, 2.54s/it] +2025-02-06 02:47:53 - ERROR - stderr - 85%|████████▍ | 18978/22434 [16:40:13<2:30:26, 2.61s/it] +2025-02-06 02:47:53 - ERROR - stderr - +2025-02-06 02:47:53 - ERROR - stderr - +2025-02-06 02:47:53 - INFO - stdout - {'loss': 0.3581, 'grad_norm': 1.509615421295166, 'learning_rate': 1.2191875191630209e-06, 'epoch': 2.54} +2025-02-06 02:47:53 - ERROR - stderr - 85%|████████▍ | 18978/22434 [16:40:13<2:30:26, 2.61s/it] +2025-02-06 02:47:55 - ERROR - stderr - 85%|████████▍ | 18979/22434 [16:40:15<2:27:55, 2.57s/it] +2025-02-06 02:47:56 - ERROR - stderr - +2025-02-06 02:47:56 - ERROR - stderr - +2025-02-06 02:47:56 - INFO - stdout - {'loss': 0.2991, 'grad_norm': 1.3876394033432007, 'learning_rate': 1.218496761216854e-06, 'epoch': 2.54} +2025-02-06 02:47:56 - ERROR - stderr - 85%|████████▍ | 18979/22434 [16:40:15<2:27:55, 
2.57s/it] +2025-02-06 02:47:58 - ERROR - stderr - 85%|████████▍ | 18980/22434 [16:40:18<2:25:11, 2.52s/it] +2025-02-06 02:47:58 - ERROR - stderr - +2025-02-06 02:47:58 - ERROR - stderr - +2025-02-06 02:47:58 - INFO - stdout - {'loss': 0.344, 'grad_norm': 1.5642766952514648, 'learning_rate': 1.21780618631272e-06, 'epoch': 2.54} +2025-02-06 02:47:58 - ERROR - stderr - 85%|████████▍ | 18980/22434 [16:40:18<2:25:11, 2.52s/it] +2025-02-06 02:48:01 - ERROR - stderr - 85%|████████▍ | 18981/22434 [16:40:20<2:27:49, 2.57s/it] +2025-02-06 02:48:01 - ERROR - stderr - +2025-02-06 02:48:01 - ERROR - stderr - +2025-02-06 02:48:01 - INFO - stdout - {'loss': 0.3215, 'grad_norm': 1.4800750017166138, 'learning_rate': 1.2171157944650114e-06, 'epoch': 2.54} +2025-02-06 02:48:01 - ERROR - stderr - 85%|████████▍ | 18981/22434 [16:40:20<2:27:49, 2.57s/it] +2025-02-06 02:48:03 - ERROR - stderr - 85%|████████▍ | 18982/22434 [16:40:23<2:25:51, 2.54s/it] +2025-02-06 02:48:03 - ERROR - stderr - +2025-02-06 02:48:03 - ERROR - stderr - +2025-02-06 02:48:03 - INFO - stdout - {'loss': 0.3373, 'grad_norm': 1.552872657775879, 'learning_rate': 1.2164255856881224e-06, 'epoch': 2.54} +2025-02-06 02:48:03 - ERROR - stderr - 85%|████████▍ | 18982/22434 [16:40:23<2:25:51, 2.54s/it] +2025-02-06 02:48:06 - ERROR - stderr - 85%|████████▍ | 18983/22434 [16:40:25<2:25:55, 2.54s/it] +2025-02-06 02:48:06 - ERROR - stderr - +2025-02-06 02:48:06 - ERROR - stderr - +2025-02-06 02:48:06 - INFO - stdout - {'loss': 0.3437, 'grad_norm': 1.4098650217056274, 'learning_rate': 1.2157355599964326e-06, 'epoch': 2.54} +2025-02-06 02:48:06 - ERROR - stderr - 85%|████████▍ | 18983/22434 [16:40:25<2:25:55, 2.54s/it] +2025-02-06 02:48:08 - ERROR - stderr - 85%|████████▍ | 18984/22434 [16:40:28<2:25:04, 2.52s/it] +2025-02-06 02:48:08 - ERROR - stderr - +2025-02-06 02:48:08 - ERROR - stderr - +2025-02-06 02:48:08 - INFO - stdout - {'loss': 0.329, 'grad_norm': 1.435842514038086, 'learning_rate': 1.2150457174043339e-06, 'epoch': 
2.54} +2025-02-06 02:48:08 - ERROR - stderr - 85%|████████▍ | 18984/22434 [16:40:28<2:25:04, 2.52s/it] +2025-02-06 02:48:11 - ERROR - stderr - 85%|████████▍ | 18985/22434 [16:40:30<2:25:52, 2.54s/it] +2025-02-06 02:48:11 - ERROR - stderr - +2025-02-06 02:48:11 - ERROR - stderr - +2025-02-06 02:48:11 - INFO - stdout - {'loss': 0.3355, 'grad_norm': 1.518319010734558, 'learning_rate': 1.214356057926197e-06, 'epoch': 2.54} +2025-02-06 02:48:11 - ERROR - stderr - 85%|████████▍ | 18985/22434 [16:40:30<2:25:52, 2.54s/it] +2025-02-06 02:48:13 - ERROR - stderr - 85%|████████▍ | 18986/22434 [16:40:33<2:26:04, 2.54s/it] +2025-02-06 02:48:13 - ERROR - stderr - +2025-02-06 02:48:13 - ERROR - stderr - +2025-02-06 02:48:13 - INFO - stdout - {'loss': 0.405, 'grad_norm': 1.756947636604309, 'learning_rate': 1.2136665815764027e-06, 'epoch': 2.54} +2025-02-06 02:48:13 - ERROR - stderr - 85%|████████▍ | 18986/22434 [16:40:33<2:26:04, 2.54s/it] +2025-02-06 02:48:16 - ERROR - stderr - 85%|████████▍ | 18987/22434 [16:40:35<2:25:28, 2.53s/it] +2025-02-06 02:48:16 - ERROR - stderr - +2025-02-06 02:48:16 - ERROR - stderr - +2025-02-06 02:48:16 - INFO - stdout - {'loss': 0.3687, 'grad_norm': 1.588113784790039, 'learning_rate': 1.2129772883693236e-06, 'epoch': 2.54} +2025-02-06 02:48:16 - ERROR - stderr - 85%|████████▍ | 18987/22434 [16:40:36<2:25:28, 2.53s/it] +2025-02-06 02:48:18 - ERROR - stderr - 85%|████████▍ | 18988/22434 [16:40:38<2:25:41, 2.54s/it] +2025-02-06 02:48:18 - ERROR - stderr - +2025-02-06 02:48:18 - ERROR - stderr - +2025-02-06 02:48:18 - INFO - stdout - {'loss': 0.3649, 'grad_norm': 1.9040706157684326, 'learning_rate': 1.2122881783193197e-06, 'epoch': 2.54} +2025-02-06 02:48:18 - ERROR - stderr - 85%|████████▍ | 18988/22434 [16:40:38<2:25:41, 2.54s/it] +2025-02-06 02:48:21 - ERROR - stderr - 85%|████████▍ | 18989/22434 [16:40:41<2:25:22, 2.53s/it] +2025-02-06 02:48:21 - ERROR - stderr - +2025-02-06 02:48:21 - ERROR - stderr - +2025-02-06 02:48:21 - INFO - stdout - {'loss': 
0.3213, 'grad_norm': 1.4508939981460571, 'learning_rate': 1.2115992514407637e-06, 'epoch': 2.54} +2025-02-06 02:48:21 - ERROR - stderr - 85%|████████▍ | 18989/22434 [16:40:41<2:25:22, 2.53s/it] +2025-02-06 02:48:23 - ERROR - stderr - 85%|████████▍ | 18990/22434 [16:40:43<2:24:39, 2.52s/it] +2025-02-06 02:48:23 - ERROR - stderr - +2025-02-06 02:48:23 - ERROR - stderr - +2025-02-06 02:48:23 - INFO - stdout - {'loss': 0.3592, 'grad_norm': 1.6112890243530273, 'learning_rate': 1.210910507748011e-06, 'epoch': 2.54} +2025-02-06 02:48:23 - ERROR - stderr - 85%|████████▍ | 18990/22434 [16:40:43<2:24:39, 2.52s/it] +2025-02-06 02:48:26 - ERROR - stderr - 85%|████████▍ | 18991/22434 [16:40:45<2:23:35, 2.50s/it] +2025-02-06 02:48:26 - ERROR - stderr - +2025-02-06 02:48:26 - ERROR - stderr - +2025-02-06 02:48:26 - INFO - stdout - {'loss': 0.3655, 'grad_norm': 1.3934227228164673, 'learning_rate': 1.2102219472554177e-06, 'epoch': 2.54} +2025-02-06 02:48:26 - ERROR - stderr - 85%|████████▍ | 18991/22434 [16:40:46<2:23:35, 2.50s/it] +2025-02-06 02:48:28 - ERROR - stderr - 85%|████████▍ | 18992/22434 [16:40:48<2:22:14, 2.48s/it] +2025-02-06 02:48:28 - ERROR - stderr - +2025-02-06 02:48:28 - ERROR - stderr - +2025-02-06 02:48:28 - INFO - stdout - {'loss': 0.3581, 'grad_norm': 1.629698395729065, 'learning_rate': 1.209533569977337e-06, 'epoch': 2.54} +2025-02-06 02:48:28 - ERROR - stderr - 85%|████████▍ | 18992/22434 [16:40:48<2:22:14, 2.48s/it] +2025-02-06 02:48:31 - ERROR - stderr - 85%|████████▍ | 18993/22434 [16:40:50<2:21:52, 2.47s/it] +2025-02-06 02:48:31 - ERROR - stderr - +2025-02-06 02:48:31 - ERROR - stderr - +2025-02-06 02:48:31 - INFO - stdout - {'loss': 0.3777, 'grad_norm': 1.5746179819107056, 'learning_rate': 1.2088453759281172e-06, 'epoch': 2.54} +2025-02-06 02:48:31 - ERROR - stderr - 85%|████████▍ | 18993/22434 [16:40:50<2:21:52, 2.47s/it] +2025-02-06 02:48:33 - ERROR - stderr - 85%|████████▍ | 18994/22434 [16:40:53<2:22:20, 2.48s/it] +2025-02-06 02:48:33 - ERROR - 
stderr - +2025-02-06 02:48:33 - ERROR - stderr - +2025-02-06 02:48:33 - INFO - stdout - {'loss': 0.371, 'grad_norm': 1.5221848487854004, 'learning_rate': 1.2081573651221036e-06, 'epoch': 2.54} +2025-02-06 02:48:33 - ERROR - stderr - 85%|████████▍ | 18994/22434 [16:40:53<2:22:20, 2.48s/it] +2025-02-06 02:48:36 - ERROR - stderr - 85%|████████▍ | 18995/22434 [16:40:55<2:21:46, 2.47s/it] +2025-02-06 02:48:36 - ERROR - stderr - +2025-02-06 02:48:36 - ERROR - stderr - +2025-02-06 02:48:36 - INFO - stdout - {'loss': 0.4167, 'grad_norm': 1.5939481258392334, 'learning_rate': 1.2074695375736368e-06, 'epoch': 2.54} +2025-02-06 02:48:36 - ERROR - stderr - 85%|████████▍ | 18995/22434 [16:40:55<2:21:46, 2.47s/it] +2025-02-06 02:48:38 - ERROR - stderr - 85%|████████▍ | 18996/22434 [16:40:58<2:21:03, 2.46s/it] +2025-02-06 02:48:38 - ERROR - stderr - +2025-02-06 02:48:38 - ERROR - stderr - +2025-02-06 02:48:38 - INFO - stdout - {'loss': 0.3531, 'grad_norm': 1.4628301858901978, 'learning_rate': 1.2067818932970543e-06, 'epoch': 2.54} +2025-02-06 02:48:38 - ERROR - stderr - 85%|████████▍ | 18996/22434 [16:40:58<2:21:03, 2.46s/it] +2025-02-06 02:48:41 - ERROR - stderr - 85%|████████▍ | 18997/22434 [16:41:00<2:22:24, 2.49s/it] +2025-02-06 02:48:41 - ERROR - stderr - +2025-02-06 02:48:41 - ERROR - stderr - +2025-02-06 02:48:41 - INFO - stdout - {'loss': 0.3306, 'grad_norm': 1.3332810401916504, 'learning_rate': 1.2060944323066891e-06, 'epoch': 2.54} +2025-02-06 02:48:41 - ERROR - stderr - 85%|████████▍ | 18997/22434 [16:41:00<2:22:24, 2.49s/it] +2025-02-06 02:48:43 - ERROR - stderr - 85%|████████▍ | 18998/22434 [16:41:03<2:22:31, 2.49s/it] +2025-02-06 02:48:43 - ERROR - stderr - +2025-02-06 02:48:43 - ERROR - stderr - +2025-02-06 02:48:43 - INFO - stdout - {'loss': 0.3814, 'grad_norm': 1.4984132051467896, 'learning_rate': 1.20540715461687e-06, 'epoch': 2.54} +2025-02-06 02:48:43 - ERROR - stderr - 85%|████████▍ | 18998/22434 [16:41:03<2:22:31, 2.49s/it] +2025-02-06 02:48:46 - ERROR - 
stderr - 85%|████████▍ | 18999/22434 [16:41:05<2:22:36, 2.49s/it] +2025-02-06 02:48:46 - ERROR - stderr - +2025-02-06 02:48:46 - ERROR - stderr - +2025-02-06 02:48:46 - INFO - stdout - {'loss': 0.3567, 'grad_norm': 1.6222798824310303, 'learning_rate': 1.204720060241924e-06, 'epoch': 2.54} +2025-02-06 02:48:46 - ERROR - stderr - 85%|████████▍ | 18999/22434 [16:41:05<2:22:36, 2.49s/it] +2025-02-06 02:48:48 - ERROR - stderr - 85%|████████▍ | 19000/22434 [16:41:08<2:23:00, 2.50s/it] +2025-02-06 02:48:48 - ERROR - stderr - +2025-02-06 02:48:48 - ERROR - stderr - +2025-02-06 02:48:48 - INFO - stdout - {'loss': 0.3778, 'grad_norm': 1.604555368423462, 'learning_rate': 1.204033149196171e-06, 'epoch': 2.54} +2025-02-06 02:48:48 - ERROR - stderr - 85%|████████▍ | 19000/22434 [16:41:08<2:23:00, 2.50s/it] +2025-02-06 02:48:51 - ERROR - stderr - 85%|████████▍ | 19001/22434 [16:41:10<2:23:31, 2.51s/it] +2025-02-06 02:48:51 - ERROR - stderr - +2025-02-06 02:48:51 - ERROR - stderr - +2025-02-06 02:48:51 - INFO - stdout - {'loss': 0.3868, 'grad_norm': 1.564584732055664, 'learning_rate': 1.2033464214939317e-06, 'epoch': 2.54} +2025-02-06 02:48:51 - ERROR - stderr - 85%|████████▍ | 19001/22434 [16:41:10<2:23:31, 2.51s/it] +2025-02-06 02:48:53 - ERROR - stderr - 85%|████████▍ | 19002/22434 [16:41:13<2:22:54, 2.50s/it] +2025-02-06 02:48:53 - ERROR - stderr - +2025-02-06 02:48:53 - ERROR - stderr - +2025-02-06 02:48:53 - INFO - stdout - {'loss': 0.3731, 'grad_norm': 1.417330265045166, 'learning_rate': 1.2026598771495167e-06, 'epoch': 2.54} +2025-02-06 02:48:53 - ERROR - stderr - 85%|████████▍ | 19002/22434 [16:41:13<2:22:54, 2.50s/it] +2025-02-06 02:48:56 - ERROR - stderr - 85%|████████▍ | 19003/22434 [16:41:15<2:23:21, 2.51s/it] +2025-02-06 02:48:56 - ERROR - stderr - +2025-02-06 02:48:56 - ERROR - stderr - +2025-02-06 02:48:56 - INFO - stdout - {'loss': 0.3173, 'grad_norm': 1.4878209829330444, 'learning_rate': 1.2019735161772429e-06, 'epoch': 2.54} +2025-02-06 02:48:56 - ERROR - stderr 
- 85%|████████▍ | 19003/22434 [16:41:15<2:23:21, 2.51s/it] +2025-02-06 02:48:58 - ERROR - stderr - 85%|████████▍ | 19004/22434 [16:41:18<2:22:06, 2.49s/it] +2025-02-06 02:48:58 - ERROR - stderr - +2025-02-06 02:48:58 - ERROR - stderr - +2025-02-06 02:48:58 - INFO - stdout - {'loss': 0.3695, 'grad_norm': 1.6852128505706787, 'learning_rate': 1.201287338591407e-06, 'epoch': 2.54} +2025-02-06 02:48:58 - ERROR - stderr - 85%|████████▍ | 19004/22434 [16:41:18<2:22:06, 2.49s/it] +2025-02-06 02:49:00 - ERROR - stderr - 85%|████████▍ | 19005/22434 [16:41:20<2:21:24, 2.47s/it] +2025-02-06 02:49:00 - ERROR - stderr - +2025-02-06 02:49:00 - ERROR - stderr - +2025-02-06 02:49:00 - INFO - stdout - {'loss': 0.3505, 'grad_norm': 1.5643755197525024, 'learning_rate': 1.2006013444063192e-06, 'epoch': 2.54} +2025-02-06 02:49:00 - ERROR - stderr - 85%|████████▍ | 19005/22434 [16:41:20<2:21:24, 2.47s/it] +2025-02-06 02:49:03 - ERROR - stderr - 85%|████████▍ | 19006/22434 [16:41:23<2:22:08, 2.49s/it] +2025-02-06 02:49:03 - ERROR - stderr - +2025-02-06 02:49:03 - ERROR - stderr - +2025-02-06 02:49:03 - INFO - stdout - {'loss': 0.435, 'grad_norm': 1.5982167720794678, 'learning_rate': 1.1999155336362779e-06, 'epoch': 2.54} +2025-02-06 02:49:03 - ERROR - stderr - 85%|████████▍ | 19006/22434 [16:41:23<2:22:08, 2.49s/it] +2025-02-06 02:49:06 - ERROR - stderr - 85%|████████▍ | 19007/22434 [16:41:25<2:25:56, 2.56s/it] +2025-02-06 02:49:06 - ERROR - stderr - +2025-02-06 02:49:06 - ERROR - stderr - +2025-02-06 02:49:06 - INFO - stdout - {'loss': 0.3804, 'grad_norm': 1.6048541069030762, 'learning_rate': 1.1992299062955725e-06, 'epoch': 2.54} +2025-02-06 02:49:06 - ERROR - stderr - 85%|████████▍ | 19007/22434 [16:41:25<2:25:56, 2.56s/it] +2025-02-06 02:49:09 - ERROR - stderr - 85%|████████▍ | 19008/22434 [16:41:28<2:34:15, 2.70s/it] +2025-02-06 02:49:09 - ERROR - stderr - +2025-02-06 02:49:09 - ERROR - stderr - +2025-02-06 02:49:09 - INFO - stdout - {'loss': 0.3472, 'grad_norm': 1.4650464057922363, 
'learning_rate': 1.1985444623985031e-06, 'epoch': 2.54} +2025-02-06 02:49:09 - ERROR - stderr - 85%|████████▍ | 19008/22434 [16:41:29<2:34:15, 2.70s/it] +2025-02-06 02:49:11 - ERROR - stderr - 85%|████████▍ | 19009/22434 [16:41:31<2:30:34, 2.64s/it] +2025-02-06 02:49:11 - ERROR - stderr - +2025-02-06 02:49:11 - ERROR - stderr - +2025-02-06 02:49:11 - INFO - stdout - {'loss': 0.3544, 'grad_norm': 1.5495638847351074, 'learning_rate': 1.1978592019593482e-06, 'epoch': 2.54} +2025-02-06 02:49:11 - ERROR - stderr - 85%|████████▍ | 19009/22434 [16:41:31<2:30:34, 2.64s/it] +2025-02-06 02:49:14 - ERROR - stderr - 85%|████████▍ | 19010/22434 [16:41:33<2:28:00, 2.59s/it] +2025-02-06 02:49:14 - ERROR - stderr - +2025-02-06 02:49:14 - ERROR - stderr - +2025-02-06 02:49:14 - INFO - stdout - {'loss': 0.3652, 'grad_norm': 1.5978690385818481, 'learning_rate': 1.1971741249923985e-06, 'epoch': 2.54} +2025-02-06 02:49:14 - ERROR - stderr - 85%|████████▍ | 19010/22434 [16:41:34<2:28:00, 2.59s/it] +2025-02-06 02:49:16 - ERROR - stderr - 85%|████████▍ | 19011/22434 [16:41:36<2:24:58, 2.54s/it] +2025-02-06 02:49:16 - ERROR - stderr - +2025-02-06 02:49:16 - ERROR - stderr - +2025-02-06 02:49:16 - INFO - stdout - {'loss': 0.3367, 'grad_norm': 1.6080735921859741, 'learning_rate': 1.1964892315119292e-06, 'epoch': 2.54} +2025-02-06 02:49:16 - ERROR - stderr - 85%|████████▍ | 19011/22434 [16:41:36<2:24:58, 2.54s/it] +2025-02-06 02:49:19 - ERROR - stderr - 85%|████████▍ | 19012/22434 [16:41:38<2:24:22, 2.53s/it] +2025-02-06 02:49:19 - ERROR - stderr - +2025-02-06 02:49:19 - ERROR - stderr - +2025-02-06 02:49:19 - INFO - stdout - {'loss': 0.3773, 'grad_norm': 1.582105040550232, 'learning_rate': 1.195804521532219e-06, 'epoch': 2.54} +2025-02-06 02:49:19 - ERROR - stderr - 85%|████████▍ | 19012/22434 [16:41:38<2:24:22, 2.53s/it] +2025-02-06 02:49:21 - ERROR - stderr - 85%|████████▍ | 19013/22434 [16:41:41<2:24:37, 2.54s/it] +2025-02-06 02:49:21 - ERROR - stderr - +2025-02-06 02:49:21 - ERROR - 
stderr - +2025-02-06 02:49:21 - INFO - stdout - {'loss': 0.3642, 'grad_norm': 1.6652531623840332, 'learning_rate': 1.1951199950675373e-06, 'epoch': 2.54} +2025-02-06 02:49:21 - ERROR - stderr - 85%|████████▍ | 19013/22434 [16:41:41<2:24:37, 2.54s/it] +2025-02-06 02:49:24 - ERROR - stderr - 85%|████████▍ | 19014/22434 [16:41:43<2:23:57, 2.53s/it] +2025-02-06 02:49:24 - ERROR - stderr - +2025-02-06 02:49:24 - ERROR - stderr - +2025-02-06 02:49:24 - INFO - stdout - {'loss': 0.3733, 'grad_norm': 1.5084993839263916, 'learning_rate': 1.1944356521321542e-06, 'epoch': 2.54} +2025-02-06 02:49:24 - ERROR - stderr - 85%|████████▍ | 19014/22434 [16:41:43<2:23:57, 2.53s/it] +2025-02-06 02:49:26 - ERROR - stderr - 85%|████████▍ | 19015/22434 [16:41:46<2:23:55, 2.53s/it] +2025-02-06 02:49:26 - ERROR - stderr - +2025-02-06 02:49:26 - ERROR - stderr - +2025-02-06 02:49:26 - INFO - stdout - {'loss': 0.3416, 'grad_norm': 1.4647800922393799, 'learning_rate': 1.1937514927403349e-06, 'epoch': 2.54} +2025-02-06 02:49:26 - ERROR - stderr - 85%|████████▍ | 19015/22434 [16:41:46<2:23:55, 2.53s/it] +2025-02-06 02:49:29 - ERROR - stderr - 85%|████████▍ | 19016/22434 [16:41:49<2:26:50, 2.58s/it] +2025-02-06 02:49:29 - ERROR - stderr - +2025-02-06 02:49:29 - ERROR - stderr - +2025-02-06 02:49:29 - INFO - stdout - {'loss': 0.3275, 'grad_norm': 1.4628969430923462, 'learning_rate': 1.1930675169063388e-06, 'epoch': 2.54} +2025-02-06 02:49:29 - ERROR - stderr - 85%|████████▍ | 19016/22434 [16:41:49<2:26:50, 2.58s/it] +2025-02-06 02:49:31 - ERROR - stderr - 85%|████████▍ | 19017/22434 [16:41:51<2:25:07, 2.55s/it] +2025-02-06 02:49:31 - ERROR - stderr - +2025-02-06 02:49:31 - ERROR - stderr - +2025-02-06 02:49:31 - INFO - stdout - {'loss': 0.3856, 'grad_norm': 1.5793293714523315, 'learning_rate': 1.1923837246444225e-06, 'epoch': 2.54} +2025-02-06 02:49:31 - ERROR - stderr - 85%|████████▍ | 19017/22434 [16:41:51<2:25:07, 2.55s/it] +2025-02-06 02:49:34 - ERROR - stderr - 85%|████████▍ | 19018/22434 
[16:41:54<2:23:53, 2.53s/it] +2025-02-06 02:49:34 - ERROR - stderr - +2025-02-06 02:49:34 - ERROR - stderr - +2025-02-06 02:49:34 - INFO - stdout - {'loss': 0.3654, 'grad_norm': 1.590939998626709, 'learning_rate': 1.191700115968839e-06, 'epoch': 2.54} +2025-02-06 02:49:34 - ERROR - stderr - 85%|████████▍ | 19018/22434 [16:41:54<2:23:53, 2.53s/it] +2025-02-06 02:49:36 - ERROR - stderr - 85%|████████▍ | 19019/22434 [16:41:56<2:21:49, 2.49s/it] +2025-02-06 02:49:36 - ERROR - stderr - +2025-02-06 02:49:36 - ERROR - stderr - +2025-02-06 02:49:36 - INFO - stdout - {'loss': 0.3644, 'grad_norm': 1.5299972295761108, 'learning_rate': 1.1910166908938392e-06, 'epoch': 2.54} +2025-02-06 02:49:36 - ERROR - stderr - 85%|████████▍ | 19019/22434 [16:41:56<2:21:49, 2.49s/it] +2025-02-06 02:49:39 - ERROR - stderr - 85%|████████▍ | 19020/22434 [16:41:59<2:21:57, 2.49s/it] +2025-02-06 02:49:39 - ERROR - stderr - +2025-02-06 02:49:39 - ERROR - stderr - +2025-02-06 02:49:39 - INFO - stdout - {'loss': 0.3528, 'grad_norm': 1.5773290395736694, 'learning_rate': 1.190333449433666e-06, 'epoch': 2.54} +2025-02-06 02:49:39 - ERROR - stderr - 85%|████████▍ | 19020/22434 [16:41:59<2:21:57, 2.49s/it] +2025-02-06 02:49:41 - ERROR - stderr - 85%|████████▍ | 19021/22434 [16:42:01<2:20:49, 2.48s/it] +2025-02-06 02:49:41 - ERROR - stderr - +2025-02-06 02:49:41 - ERROR - stderr - +2025-02-06 02:49:41 - INFO - stdout - {'loss': 0.3576, 'grad_norm': 1.285900354385376, 'learning_rate': 1.1896503916025627e-06, 'epoch': 2.54} +2025-02-06 02:49:41 - ERROR - stderr - 85%|████████▍ | 19021/22434 [16:42:01<2:20:49, 2.48s/it] +2025-02-06 02:49:44 - ERROR - stderr - 85%|████████▍ | 19022/22434 [16:42:03<2:21:32, 2.49s/it] +2025-02-06 02:49:44 - ERROR - stderr - +2025-02-06 02:49:44 - ERROR - stderr - +2025-02-06 02:49:44 - INFO - stdout - {'loss': 0.3465, 'grad_norm': 1.4042330980300903, 'learning_rate': 1.1889675174147685e-06, 'epoch': 2.54} +2025-02-06 02:49:44 - ERROR - stderr - 85%|████████▍ | 19022/22434 
[16:42:04<2:21:32, 2.49s/it] +2025-02-06 02:49:46 - ERROR - stderr - 85%|████████▍ | 19023/22434 [16:42:06<2:21:39, 2.49s/it] +2025-02-06 02:49:46 - ERROR - stderr - +2025-02-06 02:49:46 - ERROR - stderr - +2025-02-06 02:49:46 - INFO - stdout - {'loss': 0.3592, 'grad_norm': 1.5448381900787354, 'learning_rate': 1.1882848268845115e-06, 'epoch': 2.54} +2025-02-06 02:49:46 - ERROR - stderr - 85%|████████▍ | 19023/22434 [16:42:06<2:21:39, 2.49s/it] +2025-02-06 02:49:49 - ERROR - stderr - 85%|████████▍ | 19024/22434 [16:42:08<2:21:40, 2.49s/it] +2025-02-06 02:49:49 - ERROR - stderr - +2025-02-06 02:49:49 - ERROR - stderr - +2025-02-06 02:49:49 - INFO - stdout - {'loss': 0.3461, 'grad_norm': 1.4879765510559082, 'learning_rate': 1.1876023200260268e-06, 'epoch': 2.54} +2025-02-06 02:49:49 - ERROR - stderr - 85%|████████▍ | 19024/22434 [16:42:09<2:21:40, 2.49s/it] +2025-02-06 02:49:51 - ERROR - stderr - 85%|████████▍ | 19025/22434 [16:42:11<2:24:18, 2.54s/it] +2025-02-06 02:49:51 - ERROR - stderr - +2025-02-06 02:49:51 - ERROR - stderr - +2025-02-06 02:49:51 - INFO - stdout - {'loss': 0.4068, 'grad_norm': 1.812675952911377, 'learning_rate': 1.1869199968535394e-06, 'epoch': 2.54} +2025-02-06 02:49:51 - ERROR - stderr - 85%|████████▍ | 19025/22434 [16:42:11<2:24:18, 2.54s/it] +2025-02-06 02:49:54 - ERROR - stderr - 85%|████████▍ | 19026/22434 [16:42:14<2:23:28, 2.53s/it] +2025-02-06 02:49:54 - ERROR - stderr - +2025-02-06 02:49:54 - ERROR - stderr - +2025-02-06 02:49:54 - INFO - stdout - {'loss': 0.3704, 'grad_norm': 1.5145901441574097, 'learning_rate': 1.1862378573812715e-06, 'epoch': 2.54} +2025-02-06 02:49:54 - ERROR - stderr - 85%|████████▍ | 19026/22434 [16:42:14<2:23:28, 2.53s/it] +2025-02-06 02:49:56 - ERROR - stderr - 85%|████████▍ | 19027/22434 [16:42:16<2:21:48, 2.50s/it] +2025-02-06 02:49:56 - ERROR - stderr - +2025-02-06 02:49:56 - ERROR - stderr - +2025-02-06 02:49:56 - INFO - stdout - {'loss': 0.3292, 'grad_norm': 1.530989646911621, 'learning_rate': 
1.185555901623443e-06, 'epoch': 2.54} +2025-02-06 02:49:56 - ERROR - stderr - 85%|████████▍ | 19027/22434 [16:42:16<2:21:48, 2.50s/it] +2025-02-06 02:49:59 - ERROR - stderr - 85%|████████▍ | 19028/22434 [16:42:19<2:21:55, 2.50s/it] +2025-02-06 02:49:59 - ERROR - stderr - +2025-02-06 02:49:59 - ERROR - stderr - +2025-02-06 02:49:59 - INFO - stdout - {'loss': 0.3569, 'grad_norm': 1.4953234195709229, 'learning_rate': 1.1848741295942634e-06, 'epoch': 2.54} +2025-02-06 02:49:59 - ERROR - stderr - 85%|████████▍ | 19028/22434 [16:42:19<2:21:55, 2.50s/it] +2025-02-06 02:50:01 - ERROR - stderr - 85%|████████▍ | 19029/22434 [16:42:21<2:21:42, 2.50s/it] +2025-02-06 02:50:01 - ERROR - stderr - +2025-02-06 02:50:01 - ERROR - stderr - +2025-02-06 02:50:01 - INFO - stdout - {'loss': 0.3689, 'grad_norm': 1.5294607877731323, 'learning_rate': 1.1841925413079526e-06, 'epoch': 2.54} +2025-02-06 02:50:01 - ERROR - stderr - 85%|████████▍ | 19029/22434 [16:42:21<2:21:42, 2.50s/it] +2025-02-06 02:50:04 - ERROR - stderr - 85%|████████▍ | 19030/22434 [16:42:24<2:27:31, 2.60s/it] +2025-02-06 02:50:04 - ERROR - stderr - +2025-02-06 02:50:04 - ERROR - stderr - +2025-02-06 02:50:04 - INFO - stdout - {'loss': 0.3674, 'grad_norm': 1.5358182191848755, 'learning_rate': 1.1835111367787089e-06, 'epoch': 2.54} +2025-02-06 02:50:04 - ERROR - stderr - 85%|████████▍ | 19030/22434 [16:42:24<2:27:31, 2.60s/it] +2025-02-06 02:50:07 - ERROR - stderr - 85%|████████▍ | 19031/22434 [16:42:26<2:26:20, 2.58s/it] +2025-02-06 02:50:07 - ERROR - stderr - +2025-02-06 02:50:07 - ERROR - stderr - +2025-02-06 02:50:07 - INFO - stdout - {'loss': 0.3927, 'grad_norm': 1.594738245010376, 'learning_rate': 1.18282991602074e-06, 'epoch': 2.54} +2025-02-06 02:50:07 - ERROR - stderr - 85%|████████▍ | 19031/22434 [16:42:26<2:26:20, 2.58s/it] +2025-02-06 02:50:09 - ERROR - stderr - 85%|████████▍ | 19032/22434 [16:42:29<2:24:22, 2.55s/it] +2025-02-06 02:50:09 - ERROR - stderr - +2025-02-06 02:50:09 - ERROR - stderr - +2025-02-06 
02:50:09 - INFO - stdout - {'loss': 0.3761, 'grad_norm': 1.5686489343643188, 'learning_rate': 1.1821488790482439e-06, 'epoch': 2.55} +2025-02-06 02:50:09 - ERROR - stderr - 85%|████████▍ | 19032/22434 [16:42:29<2:24:22, 2.55s/it] +2025-02-06 02:50:12 - ERROR - stderr - 85%|████████▍ | 19033/22434 [16:42:31<2:21:57, 2.50s/it] +2025-02-06 02:50:12 - ERROR - stderr - +2025-02-06 02:50:12 - ERROR - stderr - +2025-02-06 02:50:12 - INFO - stdout - {'loss': 0.409, 'grad_norm': 1.6287424564361572, 'learning_rate': 1.181468025875415e-06, 'epoch': 2.55} +2025-02-06 02:50:12 - ERROR - stderr - 85%|████████▍ | 19033/22434 [16:42:31<2:21:57, 2.50s/it] +2025-02-06 02:50:14 - ERROR - stderr - 85%|████████▍ | 19034/22434 [16:42:34<2:21:42, 2.50s/it] +2025-02-06 02:50:14 - ERROR - stderr - +2025-02-06 02:50:14 - ERROR - stderr - +2025-02-06 02:50:14 - INFO - stdout - {'loss': 0.3646, 'grad_norm': 1.470693826675415, 'learning_rate': 1.1807873565164507e-06, 'epoch': 2.55} +2025-02-06 02:50:14 - ERROR - stderr - 85%|████████▍ | 19034/22434 [16:42:34<2:21:42, 2.50s/it] +2025-02-06 02:50:16 - ERROR - stderr - 85%|████████▍ | 19035/22434 [16:42:36<2:20:53, 2.49s/it] +2025-02-06 02:50:17 - ERROR - stderr - +2025-02-06 02:50:17 - ERROR - stderr - +2025-02-06 02:50:17 - INFO - stdout - {'loss': 0.326, 'grad_norm': 1.3940705060958862, 'learning_rate': 1.1801068709855324e-06, 'epoch': 2.55} +2025-02-06 02:50:17 - ERROR - stderr - 85%|████████▍ | 19035/22434 [16:42:36<2:20:53, 2.49s/it] +2025-02-06 02:50:19 - ERROR - stderr - 85%|████████▍ | 19036/22434 [16:42:39<2:20:35, 2.48s/it] +2025-02-06 02:50:19 - ERROR - stderr - +2025-02-06 02:50:19 - ERROR - stderr - +2025-02-06 02:50:19 - INFO - stdout - {'loss': 0.3577, 'grad_norm': 1.5084551572799683, 'learning_rate': 1.1794265692968476e-06, 'epoch': 2.55} +2025-02-06 02:50:19 - ERROR - stderr - 85%|████████▍ | 19036/22434 [16:42:39<2:20:35, 2.48s/it] +2025-02-06 02:50:21 - ERROR - stderr - 85%|████████▍ | 19037/22434 [16:42:41<2:19:57, 2.47s/it] 
+2025-02-06 02:50:21 - ERROR - stderr - +2025-02-06 02:50:21 - ERROR - stderr - +2025-02-06 02:50:21 - INFO - stdout - {'loss': 0.3809, 'grad_norm': 1.598075270652771, 'learning_rate': 1.1787464514645752e-06, 'epoch': 2.55} +2025-02-06 02:50:21 - ERROR - stderr - 85%|████████▍ | 19037/22434 [16:42:41<2:19:57, 2.47s/it] +2025-02-06 02:50:24 - ERROR - stderr - 85%|████████▍ | 19038/22434 [16:42:44<2:20:32, 2.48s/it] +2025-02-06 02:50:24 - ERROR - stderr - +2025-02-06 02:50:24 - ERROR - stderr - +2025-02-06 02:50:24 - INFO - stdout - {'loss': 0.3902, 'grad_norm': 1.6303542852401733, 'learning_rate': 1.1780665175028915e-06, 'epoch': 2.55} +2025-02-06 02:50:24 - ERROR - stderr - 85%|████████▍ | 19038/22434 [16:42:44<2:20:32, 2.48s/it] +2025-02-06 02:50:26 - ERROR - stderr - 85%|████████▍ | 19039/22434 [16:42:46<2:20:16, 2.48s/it] +2025-02-06 02:50:26 - ERROR - stderr - +2025-02-06 02:50:26 - ERROR - stderr - +2025-02-06 02:50:26 - INFO - stdout - {'loss': 0.35, 'grad_norm': 1.535286784172058, 'learning_rate': 1.1773867674259698e-06, 'epoch': 2.55} +2025-02-06 02:50:26 - ERROR - stderr - 85%|████████▍ | 19039/22434 [16:42:46<2:20:16, 2.48s/it] +2025-02-06 02:50:29 - ERROR - stderr - 85%|████████▍ | 19040/22434 [16:42:49<2:20:42, 2.49s/it] +2025-02-06 02:50:29 - ERROR - stderr - +2025-02-06 02:50:29 - ERROR - stderr - +2025-02-06 02:50:29 - INFO - stdout - {'loss': 0.3597, 'grad_norm': 1.5834107398986816, 'learning_rate': 1.1767072012479785e-06, 'epoch': 2.55} +2025-02-06 02:50:29 - ERROR - stderr - 85%|████████▍ | 19040/22434 [16:42:49<2:20:42, 2.49s/it] +2025-02-06 02:50:31 - ERROR - stderr - 85%|████████▍ | 19041/22434 [16:42:51<2:21:44, 2.51s/it] +2025-02-06 02:50:31 - ERROR - stderr - +2025-02-06 02:50:31 - ERROR - stderr - +2025-02-06 02:50:31 - INFO - stdout - {'loss': 0.379, 'grad_norm': 1.6079164743423462, 'learning_rate': 1.1760278189830831e-06, 'epoch': 2.55} +2025-02-06 02:50:31 - ERROR - stderr - 85%|████████▍ | 19041/22434 [16:42:51<2:21:44, 2.51s/it] 
+2025-02-06 02:50:34 - ERROR - stderr - 85%|████████▍ | 19042/22434 [16:42:54<2:21:21, 2.50s/it] +2025-02-06 02:50:34 - ERROR - stderr - +2025-02-06 02:50:34 - ERROR - stderr - +2025-02-06 02:50:34 - INFO - stdout - {'loss': 0.385, 'grad_norm': 1.6829663515090942, 'learning_rate': 1.1753486206454433e-06, 'epoch': 2.55} +2025-02-06 02:50:34 - ERROR - stderr - 85%|████████▍ | 19042/22434 [16:42:54<2:21:21, 2.50s/it] +2025-02-06 02:50:37 - ERROR - stderr - 85%|████████▍ | 19043/22434 [16:42:56<2:22:59, 2.53s/it] +2025-02-06 02:50:37 - ERROR - stderr - +2025-02-06 02:50:37 - ERROR - stderr - +2025-02-06 02:50:37 - INFO - stdout - {'loss': 0.3235, 'grad_norm': 1.3769675493240356, 'learning_rate': 1.174669606249218e-06, 'epoch': 2.55} +2025-02-06 02:50:37 - ERROR - stderr - 85%|████████▍ | 19043/22434 [16:42:56<2:22:59, 2.53s/it] +2025-02-06 02:50:39 - ERROR - stderr - 85%|████████▍ | 19044/22434 [16:42:59<2:21:41, 2.51s/it] +2025-02-06 02:50:39 - ERROR - stderr - +2025-02-06 02:50:39 - ERROR - stderr - +2025-02-06 02:50:39 - INFO - stdout - {'loss': 0.4019, 'grad_norm': 1.6818662881851196, 'learning_rate': 1.17399077580856e-06, 'epoch': 2.55} +2025-02-06 02:50:39 - ERROR - stderr - 85%|████████▍ | 19044/22434 [16:42:59<2:21:41, 2.51s/it] +2025-02-06 02:50:41 - ERROR - stderr - 85%|████████▍ | 19045/22434 [16:43:01<2:21:38, 2.51s/it] +2025-02-06 02:50:42 - ERROR - stderr - +2025-02-06 02:50:42 - ERROR - stderr - +2025-02-06 02:50:42 - INFO - stdout - {'loss': 0.3885, 'grad_norm': 1.6505677700042725, 'learning_rate': 1.1733121293376181e-06, 'epoch': 2.55} +2025-02-06 02:50:42 - ERROR - stderr - 85%|████████▍ | 19045/22434 [16:43:01<2:21:38, 2.51s/it] +2025-02-06 02:50:44 - ERROR - stderr - 85%|████████▍ | 19046/22434 [16:43:04<2:20:48, 2.49s/it] +2025-02-06 02:50:44 - ERROR - stderr - +2025-02-06 02:50:44 - ERROR - stderr - +2025-02-06 02:50:44 - INFO - stdout - {'loss': 0.3704, 'grad_norm': 1.627119541168213, 'learning_rate': 1.172633666850539e-06, 'epoch': 2.55} 
+2025-02-06 02:50:44 - ERROR - stderr - 85%|████████▍ | 19046/22434 [16:43:04<2:20:48, 2.49s/it] +2025-02-06 02:50:46 - ERROR - stderr - 85%|████████▍ | 19047/22434 [16:43:06<2:20:18, 2.49s/it] +2025-02-06 02:50:46 - ERROR - stderr - +2025-02-06 02:50:46 - ERROR - stderr - +2025-02-06 02:50:46 - INFO - stdout - {'loss': 0.4095, 'grad_norm': 1.6951234340667725, 'learning_rate': 1.1719553883614642e-06, 'epoch': 2.55} +2025-02-06 02:50:46 - ERROR - stderr - 85%|████████▍ | 19047/22434 [16:43:06<2:20:18, 2.49s/it] +2025-02-06 02:50:49 - ERROR - stderr - 85%|████████▍ | 19048/22434 [16:43:09<2:20:04, 2.48s/it] +2025-02-06 02:50:49 - ERROR - stderr - +2025-02-06 02:50:49 - ERROR - stderr - +2025-02-06 02:50:49 - INFO - stdout - {'loss': 0.3072, 'grad_norm': 1.491919994354248, 'learning_rate': 1.171277293884534e-06, 'epoch': 2.55} +2025-02-06 02:50:49 - ERROR - stderr - 85%|████████▍ | 19048/22434 [16:43:09<2:20:04, 2.48s/it] +2025-02-06 02:50:51 - ERROR - stderr - 85%|████████▍ | 19049/22434 [16:43:11<2:20:20, 2.49s/it] +2025-02-06 02:50:51 - ERROR - stderr - +2025-02-06 02:50:51 - ERROR - stderr - +2025-02-06 02:50:51 - INFO - stdout - {'loss': 0.3841, 'grad_norm': 1.7445472478866577, 'learning_rate': 1.1705993834338757e-06, 'epoch': 2.55} +2025-02-06 02:50:51 - ERROR - stderr - 85%|████████▍ | 19049/22434 [16:43:11<2:20:20, 2.49s/it] +2025-02-06 02:50:54 - ERROR - stderr - 85%|████████▍ | 19050/22434 [16:43:14<2:19:40, 2.48s/it] +2025-02-06 02:50:54 - ERROR - stderr - +2025-02-06 02:50:54 - ERROR - stderr - +2025-02-06 02:50:54 - INFO - stdout - {'loss': 0.3486, 'grad_norm': 1.4695738554000854, 'learning_rate': 1.1699216570236294e-06, 'epoch': 2.55} +2025-02-06 02:50:54 - ERROR - stderr - 85%|████████▍ | 19050/22434 [16:43:14<2:19:40, 2.48s/it] +2025-02-06 02:50:56 - ERROR - stderr - 85%|████████▍ | 19051/22434 [16:43:16<2:20:45, 2.50s/it] +2025-02-06 02:50:56 - ERROR - stderr - +2025-02-06 02:50:56 - ERROR - stderr - +2025-02-06 02:50:56 - INFO - stdout - {'loss': 
0.3586, 'grad_norm': 1.5522472858428955, 'learning_rate': 1.1692441146679135e-06, 'epoch': 2.55} +2025-02-06 02:50:56 - ERROR - stderr - 85%|████████▍ | 19051/22434 [16:43:16<2:20:45, 2.50s/it] +2025-02-06 02:50:59 - ERROR - stderr - 85%|████████▍ | 19052/22434 [16:43:19<2:22:44, 2.53s/it] +2025-02-06 02:50:59 - ERROR - stderr - +2025-02-06 02:50:59 - ERROR - stderr - +2025-02-06 02:50:59 - INFO - stdout - {'loss': 0.3937, 'grad_norm': 1.6767175197601318, 'learning_rate': 1.1685667563808534e-06, 'epoch': 2.55} +2025-02-06 02:50:59 - ERROR - stderr - 85%|████████▍ | 19052/22434 [16:43:19<2:22:44, 2.53s/it] +2025-02-06 02:51:01 - ERROR - stderr - 85%|████████▍ | 19053/22434 [16:43:21<2:22:08, 2.52s/it] +2025-02-06 02:51:02 - ERROR - stderr - +2025-02-06 02:51:02 - ERROR - stderr - +2025-02-06 02:51:02 - INFO - stdout - {'loss': 0.3713, 'grad_norm': 1.6503719091415405, 'learning_rate': 1.1678895821765712e-06, 'epoch': 2.55} +2025-02-06 02:51:02 - ERROR - stderr - 85%|████████▍ | 19053/22434 [16:43:21<2:22:08, 2.52s/it] +2025-02-06 02:51:04 - ERROR - stderr - 85%|████████▍ | 19054/22434 [16:43:24<2:21:27, 2.51s/it] +2025-02-06 02:51:04 - ERROR - stderr - +2025-02-06 02:51:04 - ERROR - stderr - +2025-02-06 02:51:04 - INFO - stdout - {'loss': 0.3538, 'grad_norm': 1.4516584873199463, 'learning_rate': 1.1672125920691757e-06, 'epoch': 2.55} +2025-02-06 02:51:04 - ERROR - stderr - 85%|████████▍ | 19054/22434 [16:43:24<2:21:27, 2.51s/it] +2025-02-06 02:51:06 - ERROR - stderr - 85%|████████▍ | 19055/22434 [16:43:26<2:20:05, 2.49s/it] +2025-02-06 02:51:06 - ERROR - stderr - +2025-02-06 02:51:06 - ERROR - stderr - +2025-02-06 02:51:06 - INFO - stdout - {'loss': 0.3831, 'grad_norm': 1.6738231182098389, 'learning_rate': 1.1665357860727855e-06, 'epoch': 2.55} +2025-02-06 02:51:06 - ERROR - stderr - 85%|████████▍ | 19055/22434 [16:43:26<2:20:05, 2.49s/it] +2025-02-06 02:51:09 - ERROR - stderr - 85%|████████▍ | 19056/22434 [16:43:29<2:19:13, 2.47s/it] +2025-02-06 02:51:09 - ERROR - 
stderr - +2025-02-06 02:51:09 - ERROR - stderr - +2025-02-06 02:51:09 - INFO - stdout - {'loss': 0.4117, 'grad_norm': 1.8212946653366089, 'learning_rate': 1.1658591642015026e-06, 'epoch': 2.55} +2025-02-06 02:51:09 - ERROR - stderr - 85%|████████▍ | 19056/22434 [16:43:29<2:19:13, 2.47s/it] +2025-02-06 02:51:12 - ERROR - stderr - 85%|████████▍ | 19057/22434 [16:43:31<2:24:16, 2.56s/it] +2025-02-06 02:51:12 - ERROR - stderr - +2025-02-06 02:51:12 - ERROR - stderr - +2025-02-06 02:51:12 - INFO - stdout - {'loss': 0.4018, 'grad_norm': 1.4859428405761719, 'learning_rate': 1.1651827264694315e-06, 'epoch': 2.55} +2025-02-06 02:51:12 - ERROR - stderr - 85%|████████▍ | 19057/22434 [16:43:31<2:24:16, 2.56s/it] +2025-02-06 02:51:14 - ERROR - stderr - 85%|████████▍ | 19058/22434 [16:43:34<2:23:45, 2.55s/it] +2025-02-06 02:51:14 - ERROR - stderr - +2025-02-06 02:51:14 - ERROR - stderr - +2025-02-06 02:51:14 - INFO - stdout - {'loss': 0.3891, 'grad_norm': 1.5382407903671265, 'learning_rate': 1.164506472890673e-06, 'epoch': 2.55} +2025-02-06 02:51:14 - ERROR - stderr - 85%|████████▍ | 19058/22434 [16:43:34<2:23:45, 2.55s/it] +2025-02-06 02:51:17 - ERROR - stderr - 85%|████████▍ | 19059/22434 [16:43:36<2:22:47, 2.54s/it] +2025-02-06 02:51:17 - ERROR - stderr - +2025-02-06 02:51:17 - ERROR - stderr - +2025-02-06 02:51:17 - INFO - stdout - {'loss': 0.3485, 'grad_norm': 1.5199946165084839, 'learning_rate': 1.1638304034793224e-06, 'epoch': 2.55} +2025-02-06 02:51:17 - ERROR - stderr - 85%|████████▍ | 19059/22434 [16:43:36<2:22:47, 2.54s/it] +2025-02-06 02:51:19 - ERROR - stderr - 85%|████████▍ | 19060/22434 [16:43:39<2:23:07, 2.55s/it] +2025-02-06 02:51:19 - ERROR - stderr - +2025-02-06 02:51:19 - ERROR - stderr - +2025-02-06 02:51:19 - INFO - stdout - {'loss': 0.3533, 'grad_norm': 1.433212161064148, 'learning_rate': 1.1631545182494719e-06, 'epoch': 2.55} +2025-02-06 02:51:19 - ERROR - stderr - 85%|████████▍ | 19060/22434 [16:43:39<2:23:07, 2.55s/it] +2025-02-06 02:51:22 - ERROR - 
stderr - 85%|████████▍ | 19061/22434 [16:43:42<2:24:55, 2.58s/it] +2025-02-06 02:51:22 - ERROR - stderr - +2025-02-06 02:51:22 - ERROR - stderr - +2025-02-06 02:51:22 - INFO - stdout - {'loss': 0.3581, 'grad_norm': 1.6004211902618408, 'learning_rate': 1.162478817215209e-06, 'epoch': 2.55} +2025-02-06 02:51:22 - ERROR - stderr - 85%|████████▍ | 19061/22434 [16:43:42<2:24:55, 2.58s/it] +2025-02-06 02:51:24 - ERROR - stderr - 85%|████████▍ | 19062/22434 [16:43:44<2:24:27, 2.57s/it] +2025-02-06 02:51:24 - ERROR - stderr - +2025-02-06 02:51:24 - ERROR - stderr - +2025-02-06 02:51:24 - INFO - stdout - {'loss': 0.3561, 'grad_norm': 1.4585224390029907, 'learning_rate': 1.161803300390618e-06, 'epoch': 2.55} +2025-02-06 02:51:24 - ERROR - stderr - 85%|████████▍ | 19062/22434 [16:43:44<2:24:27, 2.57s/it] +2025-02-06 02:51:27 - ERROR - stderr - 85%|████████▍ | 19063/22434 [16:43:47<2:23:15, 2.55s/it] +2025-02-06 02:51:27 - ERROR - stderr - +2025-02-06 02:51:27 - ERROR - stderr - +2025-02-06 02:51:27 - INFO - stdout - {'loss': 0.3409, 'grad_norm': 1.3578685522079468, 'learning_rate': 1.1611279677897813e-06, 'epoch': 2.55} +2025-02-06 02:51:27 - ERROR - stderr - 85%|████████▍ | 19063/22434 [16:43:47<2:23:15, 2.55s/it] +2025-02-06 02:51:30 - ERROR - stderr - 85%|████████▍ | 19064/22434 [16:43:49<2:26:59, 2.62s/it] +2025-02-06 02:51:30 - ERROR - stderr - +2025-02-06 02:51:30 - ERROR - stderr - +2025-02-06 02:51:30 - INFO - stdout - {'loss': 0.3909, 'grad_norm': 1.5924443006515503, 'learning_rate': 1.160452819426774e-06, 'epoch': 2.55} +2025-02-06 02:51:30 - ERROR - stderr - 85%|████████▍ | 19064/22434 [16:43:50<2:26:59, 2.62s/it] +2025-02-06 02:51:32 - ERROR - stderr - 85%|████████▍ | 19065/22434 [16:43:52<2:24:38, 2.58s/it] +2025-02-06 02:51:32 - ERROR - stderr - +2025-02-06 02:51:32 - ERROR - stderr - +2025-02-06 02:51:32 - INFO - stdout - {'loss': 0.3715, 'grad_norm': 1.5349609851837158, 'learning_rate': 1.159777855315668e-06, 'epoch': 2.55} +2025-02-06 02:51:32 - ERROR - 
stderr - 85%|████████▍ | 19065/22434 [16:43:52<2:24:38, 2.58s/it] +2025-02-06 02:51:35 - ERROR - stderr - 85%|████████▍ | 19066/22434 [16:43:55<2:24:13, 2.57s/it] +2025-02-06 02:51:35 - ERROR - stderr - +2025-02-06 02:51:35 - ERROR - stderr - +2025-02-06 02:51:35 - INFO - stdout - {'loss': 0.3803, 'grad_norm': 1.6952784061431885, 'learning_rate': 1.1591030754705345e-06, 'epoch': 2.55} +2025-02-06 02:51:35 - ERROR - stderr - 85%|████████▍ | 19066/22434 [16:43:55<2:24:13, 2.57s/it] +2025-02-06 02:51:37 - ERROR - stderr - 85%|████████▍ | 19067/22434 [16:43:57<2:23:36, 2.56s/it] +2025-02-06 02:51:37 - ERROR - stderr - +2025-02-06 02:51:37 - ERROR - stderr - +2025-02-06 02:51:37 - INFO - stdout - {'loss': 0.378, 'grad_norm': 1.4711525440216064, 'learning_rate': 1.1584284799054391e-06, 'epoch': 2.55} +2025-02-06 02:51:37 - ERROR - stderr - 85%|████████▍ | 19067/22434 [16:43:57<2:23:36, 2.56s/it] +2025-02-06 02:51:40 - ERROR - stderr - 85%|████████▍ | 19068/22434 [16:44:00<2:22:02, 2.53s/it] +2025-02-06 02:51:40 - ERROR - stderr - +2025-02-06 02:51:40 - ERROR - stderr - +2025-02-06 02:51:40 - INFO - stdout - {'loss': 0.3304, 'grad_norm': 1.5080596208572388, 'learning_rate': 1.157754068634438e-06, 'epoch': 2.55} +2025-02-06 02:51:40 - ERROR - stderr - 85%|████████▍ | 19068/22434 [16:44:00<2:22:02, 2.53s/it] +2025-02-06 02:51:43 - ERROR - stderr - 85%|████████▌ | 19069/22434 [16:44:02<2:27:21, 2.63s/it] +2025-02-06 02:51:43 - ERROR - stderr - +2025-02-06 02:51:43 - ERROR - stderr - +2025-02-06 02:51:43 - INFO - stdout - {'loss': 0.3541, 'grad_norm': 1.449229121208191, 'learning_rate': 1.1570798416715933e-06, 'epoch': 2.55} +2025-02-06 02:51:43 - ERROR - stderr - 85%|████████▌ | 19069/22434 [16:44:02<2:27:21, 2.63s/it] +2025-02-06 02:51:45 - ERROR - stderr - 85%|████████▌ | 19070/22434 [16:44:05<2:24:44, 2.58s/it] +2025-02-06 02:51:45 - ERROR - stderr - +2025-02-06 02:51:45 - ERROR - stderr - +2025-02-06 02:51:45 - INFO - stdout - {'loss': 0.4422, 'grad_norm': 
1.7168112993240356, 'learning_rate': 1.1564057990309584e-06, 'epoch': 2.55} +2025-02-06 02:51:45 - ERROR - stderr - 85%|████████▌ | 19070/22434 [16:44:05<2:24:44, 2.58s/it] +2025-02-06 02:51:48 - ERROR - stderr - 85%|████████▌ | 19071/22434 [16:44:07<2:25:45, 2.60s/it] +2025-02-06 02:51:48 - ERROR - stderr - +2025-02-06 02:51:48 - ERROR - stderr - +2025-02-06 02:51:48 - INFO - stdout - {'loss': 0.3809, 'grad_norm': 1.5289077758789062, 'learning_rate': 1.1557319407265821e-06, 'epoch': 2.55} +2025-02-06 02:51:48 - ERROR - stderr - 85%|████████▌ | 19071/22434 [16:44:08<2:25:45, 2.60s/it] +2025-02-06 02:51:50 - ERROR - stderr - 85%|████████▌ | 19072/22434 [16:44:10<2:23:55, 2.57s/it] +2025-02-06 02:51:50 - ERROR - stderr - +2025-02-06 02:51:50 - ERROR - stderr - +2025-02-06 02:51:50 - INFO - stdout - {'loss': 0.3755, 'grad_norm': 1.6102027893066406, 'learning_rate': 1.155058266772513e-06, 'epoch': 2.55} +2025-02-06 02:51:50 - ERROR - stderr - 85%|████████▌ | 19072/22434 [16:44:10<2:23:55, 2.57s/it] +2025-02-06 02:51:53 - ERROR - stderr - 85%|████████▌ | 19073/22434 [16:44:12<2:22:28, 2.54s/it] +2025-02-06 02:51:53 - ERROR - stderr - +2025-02-06 02:51:53 - ERROR - stderr - +2025-02-06 02:51:53 - INFO - stdout - {'loss': 0.3363, 'grad_norm': 1.6097593307495117, 'learning_rate': 1.1543847771827853e-06, 'epoch': 2.55} +2025-02-06 02:51:53 - ERROR - stderr - 85%|████████▌ | 19073/22434 [16:44:13<2:22:28, 2.54s/it] +2025-02-06 02:51:55 - ERROR - stderr - 85%|████████▌ | 19074/22434 [16:44:15<2:21:14, 2.52s/it] +2025-02-06 02:51:55 - ERROR - stderr - +2025-02-06 02:51:55 - ERROR - stderr - +2025-02-06 02:51:55 - INFO - stdout - {'loss': 0.381, 'grad_norm': 1.6595336198806763, 'learning_rate': 1.1537114719714482e-06, 'epoch': 2.55} +2025-02-06 02:51:55 - ERROR - stderr - 85%|████████▌ | 19074/22434 [16:44:15<2:21:14, 2.52s/it] +2025-02-06 02:51:58 - ERROR - stderr - 85%|████████▌ | 19075/22434 [16:44:18<2:22:00, 2.54s/it] +2025-02-06 02:51:58 - ERROR - stderr - +2025-02-06 
02:51:58 - ERROR - stderr - +2025-02-06 02:51:58 - INFO - stdout - {'loss': 0.4048, 'grad_norm': 1.6471478939056396, 'learning_rate': 1.1530383511525268e-06, 'epoch': 2.55} +2025-02-06 02:51:58 - ERROR - stderr - 85%|████████▌ | 19075/22434 [16:44:18<2:22:00, 2.54s/it] +2025-02-06 02:52:00 - ERROR - stderr - 85%|████████▌ | 19076/22434 [16:44:20<2:20:32, 2.51s/it] +2025-02-06 02:52:00 - ERROR - stderr - +2025-02-06 02:52:00 - ERROR - stderr - +2025-02-06 02:52:00 - INFO - stdout - {'loss': 0.3492, 'grad_norm': 1.451575517654419, 'learning_rate': 1.1523654147400566e-06, 'epoch': 2.55} +2025-02-06 02:52:00 - ERROR - stderr - 85%|████████▌ | 19076/22434 [16:44:20<2:20:32, 2.51s/it] +2025-02-06 02:52:03 - ERROR - stderr - 85%|████████▌ | 19077/22434 [16:44:22<2:18:59, 2.48s/it] +2025-02-06 02:52:03 - ERROR - stderr - +2025-02-06 02:52:03 - ERROR - stderr - +2025-02-06 02:52:03 - INFO - stdout - {'loss': 0.3287, 'grad_norm': 1.4595975875854492, 'learning_rate': 1.1516926627480628e-06, 'epoch': 2.55} +2025-02-06 02:52:03 - ERROR - stderr - 85%|████████▌ | 19077/22434 [16:44:22<2:18:59, 2.48s/it] +2025-02-06 02:52:05 - ERROR - stderr - 85%|████████▌ | 19078/22434 [16:44:25<2:18:16, 2.47s/it] +2025-02-06 02:52:05 - ERROR - stderr - +2025-02-06 02:52:05 - ERROR - stderr - +2025-02-06 02:52:05 - INFO - stdout - {'loss': 0.3228, 'grad_norm': 1.4831385612487793, 'learning_rate': 1.151020095190566e-06, 'epoch': 2.55} +2025-02-06 02:52:05 - ERROR - stderr - 85%|████████▌ | 19078/22434 [16:44:25<2:18:16, 2.47s/it] +2025-02-06 02:52:08 - ERROR - stderr - 85%|████████▌ | 19079/22434 [16:44:27<2:18:00, 2.47s/it] +2025-02-06 02:52:08 - ERROR - stderr - +2025-02-06 02:52:08 - ERROR - stderr - +2025-02-06 02:52:08 - INFO - stdout - {'loss': 0.4014, 'grad_norm': 1.6122426986694336, 'learning_rate': 1.150347712081592e-06, 'epoch': 2.55} +2025-02-06 02:52:08 - ERROR - stderr - 85%|████████▌ | 19079/22434 [16:44:27<2:18:00, 2.47s/it] +2025-02-06 02:52:10 - ERROR - stderr - 85%|████████▌ | 
19080/22434 [16:44:30<2:18:27, 2.48s/it] +2025-02-06 02:52:10 - ERROR - stderr - +2025-02-06 02:52:10 - ERROR - stderr - +2025-02-06 02:52:10 - INFO - stdout - {'loss': 0.3469, 'grad_norm': 1.5214954614639282, 'learning_rate': 1.14967551343515e-06, 'epoch': 2.55} +2025-02-06 02:52:10 - ERROR - stderr - 85%|████████▌ | 19080/22434 [16:44:30<2:18:27, 2.48s/it] +2025-02-06 02:52:12 - ERROR - stderr - 85%|████████▌ | 19081/22434 [16:44:32<2:18:07, 2.47s/it] +2025-02-06 02:52:13 - ERROR - stderr - +2025-02-06 02:52:13 - ERROR - stderr - +2025-02-06 02:52:13 - INFO - stdout - {'loss': 0.3314, 'grad_norm': 1.6415059566497803, 'learning_rate': 1.1490034992652533e-06, 'epoch': 2.55} +2025-02-06 02:52:13 - ERROR - stderr - 85%|████████▌ | 19081/22434 [16:44:32<2:18:07, 2.47s/it] +2025-02-06 02:52:15 - ERROR - stderr - 85%|████████▌ | 19082/22434 [16:44:35<2:18:51, 2.49s/it] +2025-02-06 02:52:15 - ERROR - stderr - +2025-02-06 02:52:15 - ERROR - stderr - +2025-02-06 02:52:15 - INFO - stdout - {'loss': 0.3549, 'grad_norm': 1.386521339416504, 'learning_rate': 1.1483316695859082e-06, 'epoch': 2.55} +2025-02-06 02:52:15 - ERROR - stderr - 85%|████████▌ | 19082/22434 [16:44:35<2:18:51, 2.49s/it] +2025-02-06 02:52:18 - ERROR - stderr - 85%|████████▌ | 19083/22434 [16:44:37<2:20:07, 2.51s/it] +2025-02-06 02:52:18 - ERROR - stderr - +2025-02-06 02:52:18 - ERROR - stderr - +2025-02-06 02:52:18 - INFO - stdout - {'loss': 0.4035, 'grad_norm': 1.6972733736038208, 'learning_rate': 1.1476600244111202e-06, 'epoch': 2.55} +2025-02-06 02:52:18 - ERROR - stderr - 85%|████████▌ | 19083/22434 [16:44:37<2:20:07, 2.51s/it] +2025-02-06 02:52:20 - ERROR - stderr - 85%|████████▌ | 19084/22434 [16:44:40<2:19:44, 2.50s/it] +2025-02-06 02:52:20 - ERROR - stderr - +2025-02-06 02:52:20 - ERROR - stderr - +2025-02-06 02:52:20 - INFO - stdout - {'loss': 0.3152, 'grad_norm': 1.5155812501907349, 'learning_rate': 1.1469885637548873e-06, 'epoch': 2.55} +2025-02-06 02:52:20 - ERROR - stderr - 85%|████████▌ | 
19084/22434 [16:44:40<2:19:44, 2.50s/it] +2025-02-06 02:52:23 - ERROR - stderr - 85%|████████▌ | 19085/22434 [16:44:42<2:20:07, 2.51s/it] +2025-02-06 02:52:23 - ERROR - stderr - +2025-02-06 02:52:23 - ERROR - stderr - +2025-02-06 02:52:23 - INFO - stdout - {'loss': 0.3339, 'grad_norm': 1.4989207983016968, 'learning_rate': 1.146317287631208e-06, 'epoch': 2.55} +2025-02-06 02:52:23 - ERROR - stderr - 85%|████████▌ | 19085/22434 [16:44:42<2:20:07, 2.51s/it] +2025-02-06 02:52:25 - ERROR - stderr - 85%|████████▌ | 19086/22434 [16:44:45<2:20:30, 2.52s/it] +2025-02-06 02:52:25 - ERROR - stderr - +2025-02-06 02:52:25 - ERROR - stderr - +2025-02-06 02:52:25 - INFO - stdout - {'loss': 0.3119, 'grad_norm': 1.5697849988937378, 'learning_rate': 1.145646196054071e-06, 'epoch': 2.55} +2025-02-06 02:52:25 - ERROR - stderr - 85%|████████▌ | 19086/22434 [16:44:45<2:20:30, 2.52s/it] +2025-02-06 02:52:28 - ERROR - stderr - 85%|████████▌ | 19087/22434 [16:44:47<2:21:13, 2.53s/it] +2025-02-06 02:52:28 - ERROR - stderr - +2025-02-06 02:52:28 - ERROR - stderr - +2025-02-06 02:52:28 - INFO - stdout - {'loss': 0.441, 'grad_norm': 1.791693091392517, 'learning_rate': 1.1449752890374677e-06, 'epoch': 2.55} +2025-02-06 02:52:28 - ERROR - stderr - 85%|████████▌ | 19087/22434 [16:44:47<2:21:13, 2.53s/it] +2025-02-06 02:52:30 - ERROR - stderr - 85%|████████▌ | 19088/22434 [16:44:50<2:19:40, 2.50s/it] +2025-02-06 02:52:30 - ERROR - stderr - +2025-02-06 02:52:30 - ERROR - stderr - +2025-02-06 02:52:30 - INFO - stdout - {'loss': 0.3787, 'grad_norm': 1.517799735069275, 'learning_rate': 1.14430456659538e-06, 'epoch': 2.55} +2025-02-06 02:52:30 - ERROR - stderr - 85%|████████▌ | 19088/22434 [16:44:50<2:19:40, 2.50s/it] +2025-02-06 02:52:33 - ERROR - stderr - 85%|████████▌ | 19089/22434 [16:44:52<2:19:39, 2.51s/it] +2025-02-06 02:52:33 - ERROR - stderr - +2025-02-06 02:52:33 - ERROR - stderr - +2025-02-06 02:52:33 - INFO - stdout - {'loss': 0.3498, 'grad_norm': 1.5205950736999512, 'learning_rate': 
1.14363402874179e-06, 'epoch': 2.55} +2025-02-06 02:52:33 - ERROR - stderr - 85%|████████▌ | 19089/22434 [16:44:52<2:19:39, 2.51s/it] +2025-02-06 02:52:35 - ERROR - stderr - 85%|████████▌ | 19090/22434 [16:44:55<2:19:40, 2.51s/it] +2025-02-06 02:52:35 - ERROR - stderr - +2025-02-06 02:52:35 - ERROR - stderr - +2025-02-06 02:52:35 - INFO - stdout - {'loss': 0.339, 'grad_norm': 1.4477252960205078, 'learning_rate': 1.1429636754906747e-06, 'epoch': 2.55} +2025-02-06 02:52:35 - ERROR - stderr - 85%|████████▌ | 19090/22434 [16:44:55<2:19:40, 2.51s/it] +2025-02-06 02:52:38 - ERROR - stderr - 85%|████████▌ | 19091/22434 [16:44:57<2:18:34, 2.49s/it] +2025-02-06 02:52:38 - ERROR - stderr - +2025-02-06 02:52:38 - ERROR - stderr - +2025-02-06 02:52:38 - INFO - stdout - {'loss': 0.4066, 'grad_norm': 1.6501531600952148, 'learning_rate': 1.1422935068560081e-06, 'epoch': 2.55} +2025-02-06 02:52:38 - ERROR - stderr - 85%|████████▌ | 19091/22434 [16:44:57<2:18:34, 2.49s/it] +2025-02-06 02:52:40 - ERROR - stderr - 85%|████████▌ | 19092/22434 [16:45:00<2:19:16, 2.50s/it] +2025-02-06 02:52:40 - ERROR - stderr - +2025-02-06 02:52:40 - ERROR - stderr - +2025-02-06 02:52:40 - INFO - stdout - {'loss': 0.3803, 'grad_norm': 1.5092990398406982, 'learning_rate': 1.1416235228517537e-06, 'epoch': 2.55} +2025-02-06 02:52:40 - ERROR - stderr - 85%|████████▌ | 19092/22434 [16:45:00<2:19:16, 2.50s/it] +2025-02-06 02:52:43 - ERROR - stderr - 85%|████████▌ | 19093/22434 [16:45:02<2:19:16, 2.50s/it] +2025-02-06 02:52:43 - ERROR - stderr - +2025-02-06 02:52:43 - ERROR - stderr - +2025-02-06 02:52:43 - INFO - stdout - {'loss': 0.3699, 'grad_norm': 1.5319325923919678, 'learning_rate': 1.1409537234918832e-06, 'epoch': 2.55} +2025-02-06 02:52:43 - ERROR - stderr - 85%|████████▌ | 19093/22434 [16:45:02<2:19:16, 2.50s/it] +2025-02-06 02:52:45 - ERROR - stderr - 85%|████████▌ | 19094/22434 [16:45:05<2:18:39, 2.49s/it] +2025-02-06 02:52:45 - ERROR - stderr - +2025-02-06 02:52:45 - ERROR - stderr - +2025-02-06 
02:52:45 - INFO - stdout - {'loss': 0.3782, 'grad_norm': 1.6036040782928467, 'learning_rate': 1.1402841087903515e-06, 'epoch': 2.55} +2025-02-06 02:52:45 - ERROR - stderr - 85%|████████▌ | 19094/22434 [16:45:05<2:18:39, 2.49s/it] +2025-02-06 02:52:48 - ERROR - stderr - 85%|████████▌ | 19095/22434 [16:45:07<2:18:41, 2.49s/it] +2025-02-06 02:52:48 - ERROR - stderr - +2025-02-06 02:52:48 - ERROR - stderr - +2025-02-06 02:52:48 - INFO - stdout - {'loss': 0.3831, 'grad_norm': 1.5045545101165771, 'learning_rate': 1.1396146787611251e-06, 'epoch': 2.55} +2025-02-06 02:52:48 - ERROR - stderr - 85%|████████▌ | 19095/22434 [16:45:07<2:18:41, 2.49s/it] +2025-02-06 02:52:50 - ERROR - stderr - 85%|████████▌ | 19096/22434 [16:45:10<2:18:05, 2.48s/it] +2025-02-06 02:52:50 - ERROR - stderr - +2025-02-06 02:52:50 - ERROR - stderr - +2025-02-06 02:52:50 - INFO - stdout - {'loss': 0.3586, 'grad_norm': 1.7087947130203247, 'learning_rate': 1.1389454334181494e-06, 'epoch': 2.55} +2025-02-06 02:52:50 - ERROR - stderr - 85%|████████▌ | 19096/22434 [16:45:10<2:18:05, 2.48s/it] +2025-02-06 02:52:53 - ERROR - stderr - 85%|████████▌ | 19097/22434 [16:45:12<2:21:29, 2.54s/it] +2025-02-06 02:52:53 - ERROR - stderr - +2025-02-06 02:52:53 - ERROR - stderr - +2025-02-06 02:52:53 - INFO - stdout - {'loss': 0.3244, 'grad_norm': 1.351144790649414, 'learning_rate': 1.1382763727753742e-06, 'epoch': 2.55} +2025-02-06 02:52:53 - ERROR - stderr - 85%|████████▌ | 19097/22434 [16:45:13<2:21:29, 2.54s/it] +2025-02-06 02:52:55 - ERROR - stderr - 85%|████████▌ | 19098/22434 [16:45:15<2:21:21, 2.54s/it] +2025-02-06 02:52:55 - ERROR - stderr - +2025-02-06 02:52:55 - ERROR - stderr - +2025-02-06 02:52:55 - INFO - stdout - {'loss': 0.3304, 'grad_norm': 1.3674721717834473, 'learning_rate': 1.1376074968467532e-06, 'epoch': 2.55} +2025-02-06 02:52:55 - ERROR - stderr - 85%|████████▌ | 19098/22434 [16:45:15<2:21:21, 2.54s/it] +2025-02-06 02:52:58 - ERROR - stderr - 85%|████████▌ | 19099/22434 [16:45:18<2:22:11, 
2.56s/it] +2025-02-06 02:52:58 - ERROR - stderr - +2025-02-06 02:52:58 - ERROR - stderr - +2025-02-06 02:52:58 - INFO - stdout - {'loss': 0.4071, 'grad_norm': 1.7420861721038818, 'learning_rate': 1.1369388056462217e-06, 'epoch': 2.55} +2025-02-06 02:52:58 - ERROR - stderr - 85%|████████▌ | 19099/22434 [16:45:18<2:22:11, 2.56s/it] +2025-02-06 02:53:00 - ERROR - stderr - 85%|████████▌ | 19100/22434 [16:45:20<2:22:17, 2.56s/it] +2025-02-06 02:53:00 - ERROR - stderr - +2025-02-06 02:53:00 - ERROR - stderr - +2025-02-06 02:53:00 - INFO - stdout - {'loss': 0.3734, 'grad_norm': 1.582253098487854, 'learning_rate': 1.1362702991877184e-06, 'epoch': 2.55} +2025-02-06 02:53:00 - ERROR - stderr - 85%|████████▌ | 19100/22434 [16:45:20<2:22:17, 2.56s/it] +2025-02-06 02:53:03 - ERROR - stderr - 85%|████████▌ | 19101/22434 [16:45:23<2:21:38, 2.55s/it] +2025-02-06 02:53:03 - ERROR - stderr - +2025-02-06 02:53:03 - ERROR - stderr - +2025-02-06 02:53:03 - INFO - stdout - {'loss': 0.3407, 'grad_norm': 1.429957389831543, 'learning_rate': 1.13560197748518e-06, 'epoch': 2.55} +2025-02-06 02:53:03 - ERROR - stderr - 85%|████████▌ | 19101/22434 [16:45:23<2:21:38, 2.55s/it] +2025-02-06 02:53:05 - ERROR - stderr - 85%|████████▌ | 19102/22434 [16:45:25<2:21:21, 2.55s/it] +2025-02-06 02:53:06 - ERROR - stderr - +2025-02-06 02:53:06 - ERROR - stderr - +2025-02-06 02:53:06 - INFO - stdout - {'loss': 0.3846, 'grad_norm': 1.4914565086364746, 'learning_rate': 1.1349338405525368e-06, 'epoch': 2.55} +2025-02-06 02:53:06 - ERROR - stderr - 85%|████████▌ | 19102/22434 [16:45:25<2:21:21, 2.55s/it] +2025-02-06 02:53:08 - ERROR - stderr - 85%|████████▌ | 19103/22434 [16:45:28<2:20:28, 2.53s/it] +2025-02-06 02:53:08 - ERROR - stderr - +2025-02-06 02:53:08 - ERROR - stderr - +2025-02-06 02:53:08 - INFO - stdout - {'loss': 0.3369, 'grad_norm': 1.6115014553070068, 'learning_rate': 1.134265888403714e-06, 'epoch': 2.55} +2025-02-06 02:53:08 - ERROR - stderr - 85%|████████▌ | 19103/22434 [16:45:28<2:20:28, 
2.53s/it] +2025-02-06 02:53:11 - ERROR - stderr - 85%|████████▌ | 19104/22434 [16:45:30<2:20:45, 2.54s/it] +2025-02-06 02:53:11 - ERROR - stderr - +2025-02-06 02:53:11 - ERROR - stderr - +2025-02-06 02:53:11 - INFO - stdout - {'loss': 0.3493, 'grad_norm': 1.5044143199920654, 'learning_rate': 1.1335981210526347e-06, 'epoch': 2.55} +2025-02-06 02:53:11 - ERROR - stderr - 85%|████████▌ | 19104/22434 [16:45:30<2:20:45, 2.54s/it] +2025-02-06 02:53:13 - ERROR - stderr - 85%|████████▌ | 19105/22434 [16:45:33<2:19:35, 2.52s/it] +2025-02-06 02:53:13 - ERROR - stderr - +2025-02-06 02:53:13 - ERROR - stderr - +2025-02-06 02:53:13 - INFO - stdout - {'loss': 0.3575, 'grad_norm': 1.549876093864441, 'learning_rate': 1.1329305385132194e-06, 'epoch': 2.55} +2025-02-06 02:53:13 - ERROR - stderr - 85%|████████▌ | 19105/22434 [16:45:33<2:19:35, 2.52s/it] +2025-02-06 02:53:15 - ERROR - stderr - 85%|████████▌ | 19106/22434 [16:45:35<2:18:06, 2.49s/it] +2025-02-06 02:53:15 - ERROR - stderr - +2025-02-06 02:53:15 - ERROR - stderr - +2025-02-06 02:53:15 - INFO - stdout - {'loss': 0.3801, 'grad_norm': 1.6749743223190308, 'learning_rate': 1.132263140799381e-06, 'epoch': 2.55} +2025-02-06 02:53:15 - ERROR - stderr - 85%|████████▌ | 19106/22434 [16:45:35<2:18:06, 2.49s/it] +2025-02-06 02:53:18 - ERROR - stderr - 85%|████████▌ | 19107/22434 [16:45:38<2:18:07, 2.49s/it] +2025-02-06 02:53:18 - ERROR - stderr - +2025-02-06 02:53:18 - ERROR - stderr - +2025-02-06 02:53:18 - INFO - stdout - {'loss': 0.2883, 'grad_norm': 1.3045932054519653, 'learning_rate': 1.1315959279250333e-06, 'epoch': 2.56} +2025-02-06 02:53:18 - ERROR - stderr - 85%|████████▌ | 19107/22434 [16:45:38<2:18:07, 2.49s/it] +2025-02-06 02:53:20 - ERROR - stderr - 85%|████████▌ | 19108/22434 [16:45:40<2:17:17, 2.48s/it] +2025-02-06 02:53:20 - ERROR - stderr - +2025-02-06 02:53:20 - ERROR - stderr - +2025-02-06 02:53:20 - INFO - stdout - {'loss': 0.296, 'grad_norm': 1.3580830097198486, 'learning_rate': 1.1309288999040812e-06, 'epoch': 
2.56} +2025-02-06 02:53:20 - ERROR - stderr - 85%|████████▌ | 19108/22434 [16:45:40<2:17:17, 2.48s/it] +2025-02-06 02:53:23 - ERROR - stderr - 85%|████████▌ | 19109/22434 [16:45:43<2:17:34, 2.48s/it] +2025-02-06 02:53:23 - ERROR - stderr - +2025-02-06 02:53:23 - ERROR - stderr - +2025-02-06 02:53:23 - INFO - stdout - {'loss': 0.3445, 'grad_norm': 1.421675205230713, 'learning_rate': 1.1302620567504297e-06, 'epoch': 2.56} +2025-02-06 02:53:23 - ERROR - stderr - 85%|████████▌ | 19109/22434 [16:45:43<2:17:34, 2.48s/it] +2025-02-06 02:53:25 - ERROR - stderr - 85%|████████▌ | 19110/22434 [16:45:45<2:18:28, 2.50s/it] +2025-02-06 02:53:25 - ERROR - stderr - +2025-02-06 02:53:25 - ERROR - stderr - +2025-02-06 02:53:25 - INFO - stdout - {'loss': 0.3436, 'grad_norm': 1.6112197637557983, 'learning_rate': 1.1295953984779783e-06, 'epoch': 2.56} +2025-02-06 02:53:25 - ERROR - stderr - 85%|████████▌ | 19110/22434 [16:45:45<2:18:28, 2.50s/it] +2025-02-06 02:53:28 - ERROR - stderr - 85%|████████▌ | 19111/22434 [16:45:48<2:18:25, 2.50s/it] +2025-02-06 02:53:28 - ERROR - stderr - +2025-02-06 02:53:28 - ERROR - stderr - +2025-02-06 02:53:28 - INFO - stdout - {'loss': 0.4154, 'grad_norm': 1.59079110622406, 'learning_rate': 1.128928925100623e-06, 'epoch': 2.56} +2025-02-06 02:53:28 - ERROR - stderr - 85%|████████▌ | 19111/22434 [16:45:48<2:18:25, 2.50s/it] +2025-02-06 02:53:30 - ERROR - stderr - 85%|████████▌ | 19112/22434 [16:45:50<2:18:38, 2.50s/it] +2025-02-06 02:53:30 - ERROR - stderr - +2025-02-06 02:53:30 - ERROR - stderr - +2025-02-06 02:53:30 - INFO - stdout - {'loss': 0.4133, 'grad_norm': 1.6369214057922363, 'learning_rate': 1.1282626366322568e-06, 'epoch': 2.56} +2025-02-06 02:53:30 - ERROR - stderr - 85%|████████▌ | 19112/22434 [16:45:50<2:18:38, 2.50s/it] +2025-02-06 02:53:33 - ERROR - stderr - 85%|████████▌ | 19113/22434 [16:45:53<2:19:37, 2.52s/it] +2025-02-06 02:53:33 - ERROR - stderr - +2025-02-06 02:53:33 - ERROR - stderr - +2025-02-06 02:53:33 - INFO - stdout - {'loss': 
0.3546, 'grad_norm': 1.4973527193069458, 'learning_rate': 1.1275965330867633e-06, 'epoch': 2.56} +2025-02-06 02:53:33 - ERROR - stderr - 85%|████████▌ | 19113/22434 [16:45:53<2:19:37, 2.52s/it] +2025-02-06 02:53:35 - ERROR - stderr - 85%|████████▌ | 19114/22434 [16:45:55<2:18:35, 2.50s/it] +2025-02-06 02:53:35 - ERROR - stderr - +2025-02-06 02:53:35 - ERROR - stderr - +2025-02-06 02:53:35 - INFO - stdout - {'loss': 0.4161, 'grad_norm': 1.7654200792312622, 'learning_rate': 1.1269306144780335e-06, 'epoch': 2.56} +2025-02-06 02:53:35 - ERROR - stderr - 85%|████████▌ | 19114/22434 [16:45:55<2:18:35, 2.50s/it] +2025-02-06 02:53:38 - ERROR - stderr - 85%|████████▌ | 19115/22434 [16:45:58<2:20:43, 2.54s/it] +2025-02-06 02:53:38 - ERROR - stderr - +2025-02-06 02:53:38 - ERROR - stderr - +2025-02-06 02:53:38 - INFO - stdout - {'loss': 0.3469, 'grad_norm': 1.4165576696395874, 'learning_rate': 1.1262648808199427e-06, 'epoch': 2.56} +2025-02-06 02:53:38 - ERROR - stderr - 85%|████████▌ | 19115/22434 [16:45:58<2:20:43, 2.54s/it] +2025-02-06 02:53:41 - ERROR - stderr - 85%|████████▌ | 19116/22434 [16:46:00<2:20:30, 2.54s/it] +2025-02-06 02:53:41 - ERROR - stderr - +2025-02-06 02:53:41 - ERROR - stderr - +2025-02-06 02:53:41 - INFO - stdout - {'loss': 0.4043, 'grad_norm': 1.8297301530838013, 'learning_rate': 1.125599332126368e-06, 'epoch': 2.56} +2025-02-06 02:53:41 - ERROR - stderr - 85%|████████▌ | 19116/22434 [16:46:00<2:20:30, 2.54s/it] +2025-02-06 02:53:43 - ERROR - stderr - 85%|████████▌ | 19117/22434 [16:46:03<2:20:57, 2.55s/it] +2025-02-06 02:53:43 - ERROR - stderr - +2025-02-06 02:53:43 - ERROR - stderr - +2025-02-06 02:53:43 - INFO - stdout - {'loss': 0.3811, 'grad_norm': 1.7113380432128906, 'learning_rate': 1.124933968411187e-06, 'epoch': 2.56} +2025-02-06 02:53:43 - ERROR - stderr - 85%|████████▌ | 19117/22434 [16:46:03<2:20:57, 2.55s/it] +2025-02-06 02:53:46 - ERROR - stderr - 85%|████████▌ | 19118/22434 [16:46:05<2:19:18, 2.52s/it] +2025-02-06 02:53:46 - ERROR - 
stderr - +2025-02-06 02:53:46 - ERROR - stderr - +2025-02-06 02:53:46 - INFO - stdout - {'loss': 0.3749, 'grad_norm': 1.754014492034912, 'learning_rate': 1.1242687896882597e-06, 'epoch': 2.56} +2025-02-06 02:53:46 - ERROR - stderr - 85%|████████▌ | 19118/22434 [16:46:05<2:19:18, 2.52s/it] +2025-02-06 02:53:48 - ERROR - stderr - 85%|████████▌ | 19119/22434 [16:46:08<2:17:48, 2.49s/it] +2025-02-06 02:53:48 - ERROR - stderr - +2025-02-06 02:53:48 - ERROR - stderr - +2025-02-06 02:53:48 - INFO - stdout - {'loss': 0.3503, 'grad_norm': 1.5587259531021118, 'learning_rate': 1.123603795971462e-06, 'epoch': 2.56} +2025-02-06 02:53:48 - ERROR - stderr - 85%|████████▌ | 19119/22434 [16:46:08<2:17:48, 2.49s/it] +2025-02-06 02:53:51 - ERROR - stderr - 85%|████████▌ | 19120/22434 [16:46:10<2:18:24, 2.51s/it] +2025-02-06 02:53:51 - ERROR - stderr - +2025-02-06 02:53:51 - ERROR - stderr - +2025-02-06 02:53:51 - INFO - stdout - {'loss': 0.3503, 'grad_norm': 1.4812922477722168, 'learning_rate': 1.1229389872746466e-06, 'epoch': 2.56} +2025-02-06 02:53:51 - ERROR - stderr - 85%|████████▌ | 19120/22434 [16:46:10<2:18:24, 2.51s/it] +2025-02-06 02:53:53 - ERROR - stderr - 85%|████████▌ | 19121/22434 [16:46:13<2:17:03, 2.48s/it] +2025-02-06 02:53:53 - ERROR - stderr - +2025-02-06 02:53:53 - ERROR - stderr - +2025-02-06 02:53:53 - INFO - stdout - {'loss': 0.3954, 'grad_norm': 1.5797063112258911, 'learning_rate': 1.122274363611674e-06, 'epoch': 2.56} +2025-02-06 02:53:53 - ERROR - stderr - 85%|████████▌ | 19121/22434 [16:46:13<2:17:03, 2.48s/it] +2025-02-06 02:53:56 - ERROR - stderr - 85%|████████▌ | 19122/22434 [16:46:15<2:18:11, 2.50s/it] +2025-02-06 02:53:56 - ERROR - stderr - +2025-02-06 02:53:56 - ERROR - stderr - +2025-02-06 02:53:56 - INFO - stdout - {'loss': 0.347, 'grad_norm': 1.5816878080368042, 'learning_rate': 1.1216099249963964e-06, 'epoch': 2.56} +2025-02-06 02:53:56 - ERROR - stderr - 85%|████████▌ | 19122/22434 [16:46:15<2:18:11, 2.50s/it] +2025-02-06 02:53:58 - ERROR - 
stderr - 85%|████████▌ | 19123/22434 [16:46:18<2:20:16, 2.54s/it] +2025-02-06 02:53:58 - ERROR - stderr - +2025-02-06 02:53:58 - ERROR - stderr - +2025-02-06 02:53:58 - INFO - stdout - {'loss': 0.3855, 'grad_norm': 1.581769347190857, 'learning_rate': 1.1209456714426625e-06, 'epoch': 2.56} +2025-02-06 02:53:58 - ERROR - stderr - 85%|████████▌ | 19123/22434 [16:46:18<2:20:16, 2.54s/it] +2025-02-06 02:54:01 - ERROR - stderr - 85%|████████▌ | 19124/22434 [16:46:21<2:20:16, 2.54s/it] +2025-02-06 02:54:01 - ERROR - stderr - +2025-02-06 02:54:01 - ERROR - stderr - +2025-02-06 02:54:01 - INFO - stdout - {'loss': 0.4039, 'grad_norm': 1.5351593494415283, 'learning_rate': 1.1202816029643238e-06, 'epoch': 2.56} +2025-02-06 02:54:01 - ERROR - stderr - 85%|████████▌ | 19124/22434 [16:46:21<2:20:16, 2.54s/it] +2025-02-06 02:54:03 - ERROR - stderr - 85%|████████▌ | 19125/22434 [16:46:23<2:18:58, 2.52s/it] +2025-02-06 02:54:03 - ERROR - stderr - +2025-02-06 02:54:03 - ERROR - stderr - +2025-02-06 02:54:03 - INFO - stdout - {'loss': 0.3267, 'grad_norm': 1.53022038936615, 'learning_rate': 1.1196177195752167e-06, 'epoch': 2.56} +2025-02-06 02:54:03 - ERROR - stderr - 85%|████████▌ | 19125/22434 [16:46:23<2:18:58, 2.52s/it] +2025-02-06 02:54:06 - ERROR - stderr - 85%|████████▌ | 19126/22434 [16:46:25<2:18:16, 2.51s/it] +2025-02-06 02:54:06 - ERROR - stderr - +2025-02-06 02:54:06 - ERROR - stderr - +2025-02-06 02:54:06 - INFO - stdout - {'loss': 0.3789, 'grad_norm': 1.4082410335540771, 'learning_rate': 1.1189540212891791e-06, 'epoch': 2.56} +2025-02-06 02:54:06 - ERROR - stderr - 85%|████████▌ | 19126/22434 [16:46:26<2:18:16, 2.51s/it] +2025-02-06 02:54:08 - ERROR - stderr - 85%|████████▌ | 19127/22434 [16:46:28<2:19:58, 2.54s/it] +2025-02-06 02:54:08 - ERROR - stderr - +2025-02-06 02:54:08 - ERROR - stderr - +2025-02-06 02:54:08 - INFO - stdout - {'loss': 0.3596, 'grad_norm': 1.5588889122009277, 'learning_rate': 1.118290508120048e-06, 'epoch': 2.56} +2025-02-06 02:54:08 - ERROR - 
stderr - 85%|████████▌ | 19127/22434 [16:46:28<2:19:58, 2.54s/it] +2025-02-06 02:54:11 - ERROR - stderr - 85%|████████▌ | 19128/22434 [16:46:31<2:20:49, 2.56s/it] +2025-02-06 02:54:11 - ERROR - stderr - +2025-02-06 02:54:11 - ERROR - stderr - +2025-02-06 02:54:11 - INFO - stdout - {'loss': 0.3641, 'grad_norm': 1.6179125308990479, 'learning_rate': 1.117627180081653e-06, 'epoch': 2.56} +2025-02-06 02:54:11 - ERROR - stderr - 85%|████████▌ | 19128/22434 [16:46:31<2:20:49, 2.56s/it] +2025-02-06 02:54:13 - ERROR - stderr - 85%|████████▌ | 19129/22434 [16:46:33<2:19:46, 2.54s/it] +2025-02-06 02:54:13 - ERROR - stderr - +2025-02-06 02:54:13 - ERROR - stderr - +2025-02-06 02:54:13 - INFO - stdout - {'loss': 0.4023, 'grad_norm': 1.6267465353012085, 'learning_rate': 1.1169640371878187e-06, 'epoch': 2.56} +2025-02-06 02:54:13 - ERROR - stderr - 85%|████████▌ | 19129/22434 [16:46:33<2:19:46, 2.54s/it] +2025-02-06 02:54:16 - ERROR - stderr - 85%|████████▌ | 19130/22434 [16:46:36<2:18:00, 2.51s/it] +2025-02-06 02:54:16 - ERROR - stderr - +2025-02-06 02:54:16 - ERROR - stderr - +2025-02-06 02:54:16 - INFO - stdout - {'loss': 0.3959, 'grad_norm': 1.5655436515808105, 'learning_rate': 1.1163010794523688e-06, 'epoch': 2.56} +2025-02-06 02:54:16 - ERROR - stderr - 85%|████████▌ | 19130/22434 [16:46:36<2:18:00, 2.51s/it] +2025-02-06 02:54:18 - ERROR - stderr - 85%|████████▌ | 19131/22434 [16:46:38<2:18:53, 2.52s/it] +2025-02-06 02:54:18 - ERROR - stderr - +2025-02-06 02:54:18 - ERROR - stderr - +2025-02-06 02:54:18 - INFO - stdout - {'loss': 0.3219, 'grad_norm': 1.5150017738342285, 'learning_rate': 1.115638306889123e-06, 'epoch': 2.56} +2025-02-06 02:54:18 - ERROR - stderr - 85%|████████▌ | 19131/22434 [16:46:38<2:18:53, 2.52s/it] +2025-02-06 02:54:21 - ERROR - stderr - 85%|████████▌ | 19132/22434 [16:46:41<2:18:11, 2.51s/it] +2025-02-06 02:54:21 - ERROR - stderr - +2025-02-06 02:54:21 - ERROR - stderr - +2025-02-06 02:54:21 - INFO - stdout - {'loss': 0.3816, 'grad_norm': 
1.7818816900253296, 'learning_rate': 1.1149757195118949e-06, 'epoch': 2.56} +2025-02-06 02:54:21 - ERROR - stderr - 85%|████████▌ | 19132/22434 [16:46:41<2:18:11, 2.51s/it] +2025-02-06 02:54:23 - ERROR - stderr - 85%|████████▌ | 19133/22434 [16:46:43<2:18:42, 2.52s/it] +2025-02-06 02:54:23 - ERROR - stderr - +2025-02-06 02:54:23 - ERROR - stderr - +2025-02-06 02:54:23 - INFO - stdout - {'loss': 0.3988, 'grad_norm': 1.7988239526748657, 'learning_rate': 1.1143133173344978e-06, 'epoch': 2.56} +2025-02-06 02:54:23 - ERROR - stderr - 85%|████████▌ | 19133/22434 [16:46:43<2:18:42, 2.52s/it] +2025-02-06 02:54:26 - ERROR - stderr - 85%|████████▌ | 19134/22434 [16:46:46<2:21:11, 2.57s/it] +2025-02-06 02:54:26 - ERROR - stderr - +2025-02-06 02:54:26 - ERROR - stderr - +2025-02-06 02:54:26 - INFO - stdout - {'loss': 0.3803, 'grad_norm': 1.5477315187454224, 'learning_rate': 1.1136511003707329e-06, 'epoch': 2.56} +2025-02-06 02:54:26 - ERROR - stderr - 85%|████████▌ | 19134/22434 [16:46:46<2:21:11, 2.57s/it] +2025-02-06 02:54:29 - ERROR - stderr - 85%|████████▌ | 19135/22434 [16:46:48<2:21:59, 2.58s/it] +2025-02-06 02:54:29 - ERROR - stderr - +2025-02-06 02:54:29 - ERROR - stderr - +2025-02-06 02:54:29 - INFO - stdout - {'loss': 0.3609, 'grad_norm': 1.5614964962005615, 'learning_rate': 1.1129890686344092e-06, 'epoch': 2.56} +2025-02-06 02:54:29 - ERROR - stderr - 85%|████████▌ | 19135/22434 [16:46:49<2:21:59, 2.58s/it] +2025-02-06 02:54:31 - ERROR - stderr - 85%|████████▌ | 19136/22434 [16:46:51<2:22:59, 2.60s/it] +2025-02-06 02:54:31 - ERROR - stderr - +2025-02-06 02:54:31 - ERROR - stderr - +2025-02-06 02:54:31 - INFO - stdout - {'loss': 0.334, 'grad_norm': 1.4887311458587646, 'learning_rate': 1.1123272221393267e-06, 'epoch': 2.56} +2025-02-06 02:54:31 - ERROR - stderr - 85%|████████▌ | 19136/22434 [16:46:51<2:22:59, 2.60s/it] +2025-02-06 02:54:34 - ERROR - stderr - 85%|████████▌ | 19137/22434 [16:46:54<2:21:15, 2.57s/it] +2025-02-06 02:54:34 - ERROR - stderr - +2025-02-06 
02:54:34 - ERROR - stderr - +2025-02-06 02:54:34 - INFO - stdout - {'loss': 0.3704, 'grad_norm': 1.714962363243103, 'learning_rate': 1.1116655608992744e-06, 'epoch': 2.56} +2025-02-06 02:54:34 - ERROR - stderr - 85%|████████▌ | 19137/22434 [16:46:54<2:21:15, 2.57s/it] +2025-02-06 02:54:36 - ERROR - stderr - 85%|████████▌ | 19138/22434 [16:46:56<2:21:25, 2.57s/it] +2025-02-06 02:54:36 - ERROR - stderr - +2025-02-06 02:54:36 - ERROR - stderr - +2025-02-06 02:54:36 - INFO - stdout - {'loss': 0.4076, 'grad_norm': 1.7623990774154663, 'learning_rate': 1.1110040849280534e-06, 'epoch': 2.56} +2025-02-06 02:54:36 - ERROR - stderr - 85%|████████▌ | 19138/22434 [16:46:56<2:21:25, 2.57s/it] +2025-02-06 02:54:39 - ERROR - stderr - 85%|████████▌ | 19139/22434 [16:46:59<2:20:04, 2.55s/it] +2025-02-06 02:54:39 - ERROR - stderr - +2025-02-06 02:54:39 - ERROR - stderr - +2025-02-06 02:54:39 - INFO - stdout - {'loss': 0.3273, 'grad_norm': 1.4858635663986206, 'learning_rate': 1.1103427942394418e-06, 'epoch': 2.56} +2025-02-06 02:54:39 - ERROR - stderr - 85%|████████▌ | 19139/22434 [16:46:59<2:20:04, 2.55s/it] +2025-02-06 02:54:41 - ERROR - stderr - 85%|████████▌ | 19140/22434 [16:47:01<2:18:38, 2.53s/it] +2025-02-06 02:54:41 - ERROR - stderr - +2025-02-06 02:54:41 - ERROR - stderr - +2025-02-06 02:54:41 - INFO - stdout - {'loss': 0.4072, 'grad_norm': 1.6777796745300293, 'learning_rate': 1.1096816888472318e-06, 'epoch': 2.56} +2025-02-06 02:54:41 - ERROR - stderr - 85%|████████▌ | 19140/22434 [16:47:01<2:18:38, 2.53s/it] +2025-02-06 02:54:44 - ERROR - stderr - 85%|████████▌ | 19141/22434 [16:47:04<2:18:47, 2.53s/it] +2025-02-06 02:54:44 - ERROR - stderr - +2025-02-06 02:54:44 - ERROR - stderr - +2025-02-06 02:54:44 - INFO - stdout - {'loss': 0.3583, 'grad_norm': 1.445453405380249, 'learning_rate': 1.1090207687651978e-06, 'epoch': 2.56} +2025-02-06 02:54:44 - ERROR - stderr - 85%|████████▌ | 19141/22434 [16:47:04<2:18:47, 2.53s/it] +2025-02-06 02:54:46 - ERROR - stderr - 85%|████████▌ | 
19142/22434 [16:47:06<2:18:49, 2.53s/it] +2025-02-06 02:54:47 - ERROR - stderr - +2025-02-06 02:54:47 - ERROR - stderr - +2025-02-06 02:54:47 - INFO - stdout - {'loss': 0.3485, 'grad_norm': 1.5060611963272095, 'learning_rate': 1.1083600340071165e-06, 'epoch': 2.56} +2025-02-06 02:54:47 - ERROR - stderr - 85%|████████▌ | 19142/22434 [16:47:06<2:18:49, 2.53s/it] +2025-02-06 02:54:49 - ERROR - stderr - 85%|████████▌ | 19143/22434 [16:47:09<2:18:38, 2.53s/it] +2025-02-06 02:54:49 - ERROR - stderr - +2025-02-06 02:54:49 - ERROR - stderr - +2025-02-06 02:54:49 - INFO - stdout - {'loss': 0.3523, 'grad_norm': 1.6145423650741577, 'learning_rate': 1.1076994845867662e-06, 'epoch': 2.56} +2025-02-06 02:54:49 - ERROR - stderr - 85%|████████▌ | 19143/22434 [16:47:09<2:18:38, 2.53s/it] +2025-02-06 02:54:52 - ERROR - stderr - 85%|████████▌ | 19144/22434 [16:47:11<2:20:59, 2.57s/it] +2025-02-06 02:54:52 - ERROR - stderr - +2025-02-06 02:54:52 - ERROR - stderr - +2025-02-06 02:54:52 - INFO - stdout - {'loss': 0.3715, 'grad_norm': 1.5225163698196411, 'learning_rate': 1.1070391205179087e-06, 'epoch': 2.56} +2025-02-06 02:54:52 - ERROR - stderr - 85%|████████▌ | 19144/22434 [16:47:11<2:20:59, 2.57s/it] +2025-02-06 02:54:54 - ERROR - stderr - 85%|████████▌ | 19145/22434 [16:47:14<2:19:55, 2.55s/it] +2025-02-06 02:54:54 - ERROR - stderr - +2025-02-06 02:54:54 - ERROR - stderr - +2025-02-06 02:54:54 - INFO - stdout - {'loss': 0.3391, 'grad_norm': 1.3707412481307983, 'learning_rate': 1.106378941814311e-06, 'epoch': 2.56} +2025-02-06 02:54:54 - ERROR - stderr - 85%|████████▌ | 19145/22434 [16:47:14<2:19:55, 2.55s/it] +2025-02-06 02:54:57 - ERROR - stderr - 85%|████████▌ | 19146/22434 [16:47:16<2:18:49, 2.53s/it] +2025-02-06 02:54:57 - ERROR - stderr - +2025-02-06 02:54:57 - ERROR - stderr - +2025-02-06 02:54:57 - INFO - stdout - {'loss': 0.3075, 'grad_norm': 1.4909464120864868, 'learning_rate': 1.1057189484897335e-06, 'epoch': 2.56} +2025-02-06 02:54:57 - ERROR - stderr - 85%|████████▌ | 
19146/22434 [16:47:16<2:18:49, 2.53s/it] +2025-02-06 02:54:59 - ERROR - stderr - 85%|████████▌ | 19147/22434 [16:47:19<2:18:25, 2.53s/it] +2025-02-06 02:54:59 - ERROR - stderr - +2025-02-06 02:54:59 - ERROR - stderr - +2025-02-06 02:54:59 - INFO - stdout - {'loss': 0.339, 'grad_norm': 1.4648799896240234, 'learning_rate': 1.1050591405579347e-06, 'epoch': 2.56} +2025-02-06 02:54:59 - ERROR - stderr - 85%|████████▌ | 19147/22434 [16:47:19<2:18:25, 2.53s/it] +2025-02-06 02:55:02 - ERROR - stderr - 85%|████████▌ | 19148/22434 [16:47:21<2:17:13, 2.51s/it] +2025-02-06 02:55:02 - ERROR - stderr - +2025-02-06 02:55:02 - ERROR - stderr - +2025-02-06 02:55:02 - INFO - stdout - {'loss': 0.3302, 'grad_norm': 1.3774727582931519, 'learning_rate': 1.1043995180326662e-06, 'epoch': 2.56} +2025-02-06 02:55:02 - ERROR - stderr - 85%|████████▌ | 19148/22434 [16:47:21<2:17:13, 2.51s/it] +2025-02-06 02:55:04 - ERROR - stderr - 85%|████████▌ | 19149/22434 [16:47:24<2:18:08, 2.52s/it] +2025-02-06 02:55:04 - ERROR - stderr - +2025-02-06 02:55:04 - ERROR - stderr - +2025-02-06 02:55:04 - INFO - stdout - {'loss': 0.3194, 'grad_norm': 1.316407322883606, 'learning_rate': 1.1037400809276777e-06, 'epoch': 2.56} +2025-02-06 02:55:04 - ERROR - stderr - 85%|████████▌ | 19149/22434 [16:47:24<2:18:08, 2.52s/it] +2025-02-06 02:55:07 - ERROR - stderr - 85%|████████▌ | 19150/22434 [16:47:27<2:20:52, 2.57s/it] +2025-02-06 02:55:07 - ERROR - stderr - +2025-02-06 02:55:07 - ERROR - stderr - +2025-02-06 02:55:07 - INFO - stdout - {'loss': 0.3985, 'grad_norm': 1.5402101278305054, 'learning_rate': 1.1030808292567142e-06, 'epoch': 2.56} +2025-02-06 02:55:07 - ERROR - stderr - 85%|████████▌ | 19150/22434 [16:47:27<2:20:52, 2.57s/it] +2025-02-06 02:55:09 - ERROR - stderr - 85%|████████▌ | 19151/22434 [16:47:29<2:19:45, 2.55s/it] +2025-02-06 02:55:09 - ERROR - stderr - +2025-02-06 02:55:09 - ERROR - stderr - +2025-02-06 02:55:09 - INFO - stdout - {'loss': 0.3823, 'grad_norm': 1.5872464179992676, 'learning_rate': 
1.1024217630335165e-06, 'epoch': 2.56} +2025-02-06 02:55:09 - ERROR - stderr - 85%|████████▌ | 19151/22434 [16:47:29<2:19:45, 2.55s/it] +2025-02-06 02:55:12 - ERROR - stderr - 85%|████████▌ | 19152/22434 [16:47:32<2:18:47, 2.54s/it] +2025-02-06 02:55:12 - ERROR - stderr - +2025-02-06 02:55:12 - ERROR - stderr - +2025-02-06 02:55:12 - INFO - stdout - {'loss': 0.3546, 'grad_norm': 1.5644649267196655, 'learning_rate': 1.1017628822718262e-06, 'epoch': 2.56} +2025-02-06 02:55:12 - ERROR - stderr - 85%|████████▌ | 19152/22434 [16:47:32<2:18:47, 2.54s/it] +2025-02-06 02:55:14 - ERROR - stderr - 85%|████████▌ | 19153/22434 [16:47:34<2:17:07, 2.51s/it] +2025-02-06 02:55:14 - ERROR - stderr - +2025-02-06 02:55:14 - ERROR - stderr - +2025-02-06 02:55:14 - INFO - stdout - {'loss': 0.3656, 'grad_norm': 1.5004148483276367, 'learning_rate': 1.10110418698537e-06, 'epoch': 2.56} +2025-02-06 02:55:14 - ERROR - stderr - 85%|████████▌ | 19153/22434 [16:47:34<2:17:07, 2.51s/it] +2025-02-06 02:55:17 - ERROR - stderr - 85%|████████▌ | 19154/22434 [16:47:37<2:15:50, 2.48s/it] +2025-02-06 02:55:17 - ERROR - stderr - +2025-02-06 02:55:17 - ERROR - stderr - +2025-02-06 02:55:17 - INFO - stdout - {'loss': 0.3429, 'grad_norm': 1.6251713037490845, 'learning_rate': 1.1004456771878836e-06, 'epoch': 2.56} +2025-02-06 02:55:17 - ERROR - stderr - 85%|████████▌ | 19154/22434 [16:47:37<2:15:50, 2.48s/it] +2025-02-06 02:55:19 - ERROR - stderr - 85%|████████▌ | 19155/22434 [16:47:39<2:16:46, 2.50s/it] +2025-02-06 02:55:19 - ERROR - stderr - +2025-02-06 02:55:19 - ERROR - stderr - +2025-02-06 02:55:19 - INFO - stdout - {'loss': 0.3941, 'grad_norm': 1.6321078538894653, 'learning_rate': 1.0997873528930903e-06, 'epoch': 2.56} +2025-02-06 02:55:19 - ERROR - stderr - 85%|████████▌ | 19155/22434 [16:47:39<2:16:46, 2.50s/it] +2025-02-06 02:55:22 - ERROR - stderr - 85%|████████▌ | 19156/22434 [16:47:42<2:16:43, 2.50s/it] +2025-02-06 02:55:22 - ERROR - stderr - +2025-02-06 02:55:22 - ERROR - stderr - +2025-02-06 
02:55:22 - INFO - stdout - {'loss': 0.3484, 'grad_norm': 1.5436561107635498, 'learning_rate': 1.0991292141147135e-06, 'epoch': 2.56} +2025-02-06 02:55:22 - ERROR - stderr - 85%|████████▌ | 19156/22434 [16:47:42<2:16:43, 2.50s/it] +2025-02-06 02:55:25 - ERROR - stderr - 85%|████████▌ | 19157/22434 [16:47:45<2:23:52, 2.63s/it] +2025-02-06 02:55:25 - ERROR - stderr - +2025-02-06 02:55:25 - ERROR - stderr - +2025-02-06 02:55:25 - INFO - stdout - {'loss': 0.3326, 'grad_norm': 1.4110045433044434, 'learning_rate': 1.098471260866474e-06, 'epoch': 2.56} +2025-02-06 02:55:25 - ERROR - stderr - 85%|████████▌ | 19157/22434 [16:47:45<2:23:52, 2.63s/it] +2025-02-06 02:55:27 - ERROR - stderr - 85%|████████▌ | 19158/22434 [16:47:47<2:22:59, 2.62s/it] +2025-02-06 02:55:27 - ERROR - stderr - +2025-02-06 02:55:27 - ERROR - stderr - +2025-02-06 02:55:27 - INFO - stdout - {'loss': 0.4444, 'grad_norm': 1.5870317220687866, 'learning_rate': 1.0978134931620787e-06, 'epoch': 2.56} +2025-02-06 02:55:27 - ERROR - stderr - 85%|████████▌ | 19158/22434 [16:47:47<2:22:59, 2.62s/it] +2025-02-06 02:55:30 - ERROR - stderr - 85%|████████▌ | 19159/22434 [16:47:50<2:21:45, 2.60s/it] +2025-02-06 02:55:30 - ERROR - stderr - +2025-02-06 02:55:30 - ERROR - stderr - +2025-02-06 02:55:30 - INFO - stdout - {'loss': 0.3646, 'grad_norm': 1.471293330192566, 'learning_rate': 1.0971559110152463e-06, 'epoch': 2.56} +2025-02-06 02:55:30 - ERROR - stderr - 85%|████████▌ | 19159/22434 [16:47:50<2:21:45, 2.60s/it] +2025-02-06 02:55:32 - ERROR - stderr - 85%|████████▌ | 19160/22434 [16:47:52<2:19:45, 2.56s/it] +2025-02-06 02:55:32 - ERROR - stderr - +2025-02-06 02:55:32 - ERROR - stderr - +2025-02-06 02:55:32 - INFO - stdout - {'loss': 0.3497, 'grad_norm': 1.4929289817810059, 'learning_rate': 1.0964985144396778e-06, 'epoch': 2.56} +2025-02-06 02:55:32 - ERROR - stderr - 85%|████████▌ | 19160/22434 [16:47:52<2:19:45, 2.56s/it] +2025-02-06 02:55:35 - ERROR - stderr - 85%|████████▌ | 19161/22434 [16:47:55<2:18:31, 
2.54s/it] +2025-02-06 02:55:35 - ERROR - stderr - +2025-02-06 02:55:35 - ERROR - stderr - +2025-02-06 02:55:35 - INFO - stdout - {'loss': 0.3418, 'grad_norm': 1.4330967664718628, 'learning_rate': 1.0958413034490757e-06, 'epoch': 2.56} +2025-02-06 02:55:35 - ERROR - stderr - 85%|████████▌ | 19161/22434 [16:47:55<2:18:31, 2.54s/it] +2025-02-06 02:55:37 - ERROR - stderr - 85%|████████▌ | 19162/22434 [16:47:57<2:18:29, 2.54s/it] +2025-02-06 02:55:37 - ERROR - stderr - +2025-02-06 02:55:37 - ERROR - stderr - +2025-02-06 02:55:37 - INFO - stdout - {'loss': 0.4011, 'grad_norm': 1.6930015087127686, 'learning_rate': 1.0951842780571464e-06, 'epoch': 2.56} +2025-02-06 02:55:37 - ERROR - stderr - 85%|████████▌ | 19162/22434 [16:47:57<2:18:29, 2.54s/it] +2025-02-06 02:55:40 - ERROR - stderr - 85%|████████▌ | 19163/22434 [16:48:00<2:17:29, 2.52s/it] +2025-02-06 02:55:40 - ERROR - stderr - +2025-02-06 02:55:40 - ERROR - stderr - +2025-02-06 02:55:40 - INFO - stdout - {'loss': 0.3429, 'grad_norm': 1.5958704948425293, 'learning_rate': 1.094527438277575e-06, 'epoch': 2.56} +2025-02-06 02:55:40 - ERROR - stderr - 85%|████████▌ | 19163/22434 [16:48:00<2:17:29, 2.52s/it] +2025-02-06 02:55:42 - ERROR - stderr - 85%|████████▌ | 19164/22434 [16:48:02<2:17:01, 2.51s/it] +2025-02-06 02:55:42 - ERROR - stderr - +2025-02-06 02:55:42 - ERROR - stderr - +2025-02-06 02:55:42 - INFO - stdout - {'loss': 0.3438, 'grad_norm': 1.4633170366287231, 'learning_rate': 1.0938707841240614e-06, 'epoch': 2.56} +2025-02-06 02:55:42 - ERROR - stderr - 85%|████████▌ | 19164/22434 [16:48:02<2:17:01, 2.51s/it] +2025-02-06 02:55:45 - ERROR - stderr - 85%|████████▌ | 19165/22434 [16:48:05<2:16:55, 2.51s/it] +2025-02-06 02:55:45 - ERROR - stderr - +2025-02-06 02:55:45 - ERROR - stderr - +2025-02-06 02:55:45 - INFO - stdout - {'loss': 0.3628, 'grad_norm': 1.6038436889648438, 'learning_rate': 1.093214315610287e-06, 'epoch': 2.56} +2025-02-06 02:55:45 - ERROR - stderr - 85%|████████▌ | 19165/22434 [16:48:05<2:16:55, 
2.51s/it] +2025-02-06 02:55:47 - ERROR - stderr - 85%|████████▌ | 19166/22434 [16:48:07<2:15:47, 2.49s/it] +2025-02-06 02:55:47 - ERROR - stderr - +2025-02-06 02:55:47 - ERROR - stderr - +2025-02-06 02:55:47 - INFO - stdout - {'loss': 0.3375, 'grad_norm': 1.4914264678955078, 'learning_rate': 1.0925580327499386e-06, 'epoch': 2.56} +2025-02-06 02:55:47 - ERROR - stderr - 85%|████████▌ | 19166/22434 [16:48:07<2:15:47, 2.49s/it] +2025-02-06 02:55:50 - ERROR - stderr - 85%|████████▌ | 19167/22434 [16:48:10<2:16:35, 2.51s/it] +2025-02-06 02:55:50 - ERROR - stderr - +2025-02-06 02:55:50 - ERROR - stderr - +2025-02-06 02:55:50 - INFO - stdout - {'loss': 0.3576, 'grad_norm': 1.5384317636489868, 'learning_rate': 1.091901935556693e-06, 'epoch': 2.56} +2025-02-06 02:55:50 - ERROR - stderr - 85%|████████▌ | 19167/22434 [16:48:10<2:16:35, 2.51s/it] +2025-02-06 02:55:52 - ERROR - stderr - 85%|████████▌ | 19168/22434 [16:48:12<2:16:52, 2.51s/it] +2025-02-06 02:55:52 - ERROR - stderr - +2025-02-06 02:55:52 - ERROR - stderr - +2025-02-06 02:55:52 - INFO - stdout - {'loss': 0.364, 'grad_norm': 1.7759722471237183, 'learning_rate': 1.091246024044228e-06, 'epoch': 2.56} +2025-02-06 02:55:52 - ERROR - stderr - 85%|████████▌ | 19168/22434 [16:48:12<2:16:52, 2.51s/it] +2025-02-06 02:55:55 - ERROR - stderr - 85%|████████▌ | 19169/22434 [16:48:15<2:16:55, 2.52s/it] +2025-02-06 02:55:55 - ERROR - stderr - +2025-02-06 02:55:55 - ERROR - stderr - +2025-02-06 02:55:55 - INFO - stdout - {'loss': 0.3705, 'grad_norm': 1.452349066734314, 'learning_rate': 1.0905902982262151e-06, 'epoch': 2.56} +2025-02-06 02:55:55 - ERROR - stderr - 85%|████████▌ | 19169/22434 [16:48:15<2:16:55, 2.52s/it] +2025-02-06 02:55:57 - ERROR - stderr - 85%|████████▌ | 19170/22434 [16:48:17<2:17:45, 2.53s/it] +2025-02-06 02:55:58 - ERROR - stderr - +2025-02-06 02:55:58 - ERROR - stderr - +2025-02-06 02:55:58 - INFO - stdout - {'loss': 0.3455, 'grad_norm': 1.485260009765625, 'learning_rate': 1.0899347581163222e-06, 'epoch': 
2.56} +2025-02-06 02:55:58 - ERROR - stderr - 85%|████████▌ | 19170/22434 [16:48:17<2:17:45, 2.53s/it] +2025-02-06 02:56:00 - ERROR - stderr - 85%|████████▌ | 19171/22434 [16:48:20<2:16:46, 2.51s/it] +2025-02-06 02:56:00 - ERROR - stderr - +2025-02-06 02:56:00 - ERROR - stderr - +2025-02-06 02:56:00 - INFO - stdout - {'loss': 0.3588, 'grad_norm': 1.620936632156372, 'learning_rate': 1.0892794037282129e-06, 'epoch': 2.56} +2025-02-06 02:56:00 - ERROR - stderr - 85%|████████▌ | 19171/22434 [16:48:20<2:16:46, 2.51s/it] +2025-02-06 02:56:03 - ERROR - stderr - 85%|████████▌ | 19172/22434 [16:48:22<2:18:46, 2.55s/it] +2025-02-06 02:56:03 - ERROR - stderr - +2025-02-06 02:56:03 - ERROR - stderr - +2025-02-06 02:56:03 - INFO - stdout - {'loss': 0.3697, 'grad_norm': 1.6014331579208374, 'learning_rate': 1.088624235075547e-06, 'epoch': 2.56} +2025-02-06 02:56:03 - ERROR - stderr - 85%|████████▌ | 19172/22434 [16:48:22<2:18:46, 2.55s/it] +2025-02-06 02:56:05 - ERROR - stderr - 85%|████████▌ | 19173/22434 [16:48:25<2:18:50, 2.55s/it] +2025-02-06 02:56:05 - ERROR - stderr - +2025-02-06 02:56:05 - ERROR - stderr - +2025-02-06 02:56:05 - INFO - stdout - {'loss': 0.4002, 'grad_norm': 1.76543128490448, 'learning_rate': 1.0879692521719831e-06, 'epoch': 2.56} +2025-02-06 02:56:05 - ERROR - stderr - 85%|████████▌ | 19173/22434 [16:48:25<2:18:50, 2.55s/it] +2025-02-06 02:56:08 - ERROR - stderr - 85%|████████▌ | 19174/22434 [16:48:27<2:17:46, 2.54s/it] +2025-02-06 02:56:08 - ERROR - stderr - +2025-02-06 02:56:08 - ERROR - stderr - +2025-02-06 02:56:08 - INFO - stdout - {'loss': 0.3394, 'grad_norm': 1.4574775695800781, 'learning_rate': 1.087314455031172e-06, 'epoch': 2.56} +2025-02-06 02:56:08 - ERROR - stderr - 85%|████████▌ | 19174/22434 [16:48:27<2:17:46, 2.54s/it] +2025-02-06 02:56:10 - ERROR - stderr - 85%|████████▌ | 19175/22434 [16:48:30<2:17:02, 2.52s/it] +2025-02-06 02:56:10 - ERROR - stderr - +2025-02-06 02:56:10 - ERROR - stderr - +2025-02-06 02:56:10 - INFO - stdout - {'loss': 
0.3485, 'grad_norm': 1.4823658466339111, 'learning_rate': 1.086659843666762e-06, 'epoch': 2.56} +2025-02-06 02:56:10 - ERROR - stderr - 85%|████████▌ | 19175/22434 [16:48:30<2:17:02, 2.52s/it] +2025-02-06 02:56:13 - ERROR - stderr - 85%|████████▌ | 19176/22434 [16:48:32<2:16:45, 2.52s/it] +2025-02-06 02:56:13 - ERROR - stderr - +2025-02-06 02:56:13 - ERROR - stderr - +2025-02-06 02:56:13 - INFO - stdout - {'loss': 0.3485, 'grad_norm': 1.6581299304962158, 'learning_rate': 1.0860054180924007e-06, 'epoch': 2.56} +2025-02-06 02:56:13 - ERROR - stderr - 85%|████████▌ | 19176/22434 [16:48:32<2:16:45, 2.52s/it] +2025-02-06 02:56:15 - ERROR - stderr - 85%|████████▌ | 19177/22434 [16:48:35<2:17:42, 2.54s/it] +2025-02-06 02:56:15 - ERROR - stderr - +2025-02-06 02:56:15 - ERROR - stderr - +2025-02-06 02:56:15 - INFO - stdout - {'loss': 0.3498, 'grad_norm': 1.4035308361053467, 'learning_rate': 1.085351178321722e-06, 'epoch': 2.56} +2025-02-06 02:56:15 - ERROR - stderr - 85%|████████▌ | 19177/22434 [16:48:35<2:17:42, 2.54s/it] +2025-02-06 02:56:18 - ERROR - stderr - 85%|████████▌ | 19178/22434 [16:48:38<2:18:57, 2.56s/it] +2025-02-06 02:56:18 - ERROR - stderr - +2025-02-06 02:56:18 - ERROR - stderr - +2025-02-06 02:56:18 - INFO - stdout - {'loss': 0.3624, 'grad_norm': 1.621748447418213, 'learning_rate': 1.0846971243683724e-06, 'epoch': 2.56} +2025-02-06 02:56:18 - ERROR - stderr - 85%|████████▌ | 19178/22434 [16:48:38<2:18:57, 2.56s/it] +2025-02-06 02:56:20 - ERROR - stderr - 85%|████████▌ | 19179/22434 [16:48:40<2:17:41, 2.54s/it] +2025-02-06 02:56:20 - ERROR - stderr - +2025-02-06 02:56:20 - ERROR - stderr - +2025-02-06 02:56:20 - INFO - stdout - {'loss': 0.3597, 'grad_norm': 1.6111441850662231, 'learning_rate': 1.0840432562459757e-06, 'epoch': 2.56} +2025-02-06 02:56:20 - ERROR - stderr - 85%|████████▌ | 19179/22434 [16:48:40<2:17:41, 2.54s/it] +2025-02-06 02:56:23 - ERROR - stderr - 85%|████████▌ | 19180/22434 [16:48:43<2:16:42, 2.52s/it] +2025-02-06 02:56:23 - ERROR - 
stderr - +2025-02-06 02:56:23 - ERROR - stderr - +2025-02-06 02:56:23 - INFO - stdout - {'loss': 0.3856, 'grad_norm': 1.45008385181427, 'learning_rate': 1.0833895739681689e-06, 'epoch': 2.56} +2025-02-06 02:56:23 - ERROR - stderr - 85%|████████▌ | 19180/22434 [16:48:43<2:16:42, 2.52s/it] +2025-02-06 02:56:25 - ERROR - stderr - 85%|████████▌ | 19181/22434 [16:48:45<2:16:03, 2.51s/it] +2025-02-06 02:56:25 - ERROR - stderr - +2025-02-06 02:56:25 - ERROR - stderr - +2025-02-06 02:56:25 - INFO - stdout - {'loss': 0.3314, 'grad_norm': 1.4364136457443237, 'learning_rate': 1.082736077548575e-06, 'epoch': 2.56} +2025-02-06 02:56:25 - ERROR - stderr - 85%|████████▌ | 19181/22434 [16:48:45<2:16:03, 2.51s/it] +2025-02-06 02:56:28 - ERROR - stderr - 86%|████████▌ | 19182/22434 [16:48:48<2:15:43, 2.50s/it] +2025-02-06 02:56:28 - ERROR - stderr - +2025-02-06 02:56:28 - ERROR - stderr - +2025-02-06 02:56:28 - INFO - stdout - {'loss': 0.3955, 'grad_norm': 1.6160213947296143, 'learning_rate': 1.0820827670008104e-06, 'epoch': 2.57} +2025-02-06 02:56:28 - ERROR - stderr - 86%|████████▌ | 19182/22434 [16:48:48<2:15:43, 2.50s/it] +2025-02-06 02:56:30 - ERROR - stderr - 86%|████████▌ | 19183/22434 [16:48:50<2:14:43, 2.49s/it] +2025-02-06 02:56:30 - ERROR - stderr - +2025-02-06 02:56:30 - ERROR - stderr - +2025-02-06 02:56:30 - INFO - stdout - {'loss': 0.3903, 'grad_norm': 1.5793753862380981, 'learning_rate': 1.0814296423385018e-06, 'epoch': 2.57} +2025-02-06 02:56:30 - ERROR - stderr - 86%|████████▌ | 19183/22434 [16:48:50<2:14:43, 2.49s/it] +2025-02-06 02:56:33 - ERROR - stderr - 86%|████████▌ | 19184/22434 [16:48:52<2:14:37, 2.49s/it] +2025-02-06 02:56:33 - ERROR - stderr - +2025-02-06 02:56:33 - ERROR - stderr - +2025-02-06 02:56:33 - INFO - stdout - {'loss': 0.353, 'grad_norm': 1.463008165359497, 'learning_rate': 1.0807767035752558e-06, 'epoch': 2.57} +2025-02-06 02:56:33 - ERROR - stderr - 86%|████████▌ | 19184/22434 [16:48:53<2:14:37, 2.49s/it] +2025-02-06 02:56:35 - ERROR - stderr 
- 86%|████████▌ | 19185/22434 [16:48:55<2:14:45, 2.49s/it] +2025-02-06 02:56:35 - ERROR - stderr - +2025-02-06 02:56:35 - ERROR - stderr - +2025-02-06 02:56:35 - INFO - stdout - {'loss': 0.3422, 'grad_norm': 1.5994659662246704, 'learning_rate': 1.0801239507246853e-06, 'epoch': 2.57} +2025-02-06 02:56:35 - ERROR - stderr - 86%|████████▌ | 19185/22434 [16:48:55<2:14:45, 2.49s/it] +2025-02-06 02:56:38 - ERROR - stderr - 86%|████████▌ | 19186/22434 [16:48:57<2:14:34, 2.49s/it] +2025-02-06 02:56:38 - ERROR - stderr - +2025-02-06 02:56:38 - ERROR - stderr - +2025-02-06 02:56:38 - INFO - stdout - {'loss': 0.3844, 'grad_norm': 1.5649604797363281, 'learning_rate': 1.0794713838003945e-06, 'epoch': 2.57} +2025-02-06 02:56:38 - ERROR - stderr - 86%|████████▌ | 19186/22434 [16:48:58<2:14:34, 2.49s/it] +2025-02-06 02:56:40 - ERROR - stderr - 86%|████████▌ | 19187/22434 [16:49:00<2:16:16, 2.52s/it] +2025-02-06 02:56:40 - ERROR - stderr - +2025-02-06 02:56:40 - ERROR - stderr - +2025-02-06 02:56:40 - INFO - stdout - {'loss': 0.3539, 'grad_norm': 1.5192291736602783, 'learning_rate': 1.078819002815986e-06, 'epoch': 2.57} +2025-02-06 02:56:40 - ERROR - stderr - 86%|████████▌ | 19187/22434 [16:49:00<2:16:16, 2.52s/it] +2025-02-06 02:56:43 - ERROR - stderr - 86%|████████▌ | 19188/22434 [16:49:03<2:17:49, 2.55s/it] +2025-02-06 02:56:43 - ERROR - stderr - +2025-02-06 02:56:43 - ERROR - stderr - +2025-02-06 02:56:43 - INFO - stdout - {'loss': 0.3474, 'grad_norm': 1.4657697677612305, 'learning_rate': 1.0781668077850616e-06, 'epoch': 2.57} +2025-02-06 02:56:43 - ERROR - stderr - 86%|████████▌ | 19188/22434 [16:49:03<2:17:49, 2.55s/it] +2025-02-06 02:56:46 - ERROR - stderr - 86%|████████▌ | 19189/22434 [16:49:05<2:19:14, 2.57s/it] +2025-02-06 02:56:46 - ERROR - stderr - +2025-02-06 02:56:46 - ERROR - stderr - +2025-02-06 02:56:46 - INFO - stdout - {'loss': 0.3581, 'grad_norm': 1.554478406906128, 'learning_rate': 1.0775147987212108e-06, 'epoch': 2.57} +2025-02-06 02:56:46 - ERROR - stderr - 
86%|████████▌ | 19189/22434 [16:49:05<2:19:14, 2.57s/it] +2025-02-06 02:56:48 - ERROR - stderr - 86%|████████▌ | 19190/22434 [16:49:08<2:17:36, 2.55s/it] +2025-02-06 02:56:48 - ERROR - stderr - +2025-02-06 02:56:48 - ERROR - stderr - +2025-02-06 02:56:48 - INFO - stdout - {'loss': 0.3359, 'grad_norm': 1.447070598602295, 'learning_rate': 1.0768629756380266e-06, 'epoch': 2.57} +2025-02-06 02:56:48 - ERROR - stderr - 86%|████████▌ | 19190/22434 [16:49:08<2:17:36, 2.55s/it] +2025-02-06 02:56:50 - ERROR - stderr - 86%|████████▌ | 19191/22434 [16:49:10<2:16:33, 2.53s/it] +2025-02-06 02:56:51 - ERROR - stderr - +2025-02-06 02:56:51 - ERROR - stderr - +2025-02-06 02:56:51 - INFO - stdout - {'loss': 0.3828, 'grad_norm': 1.6692407131195068, 'learning_rate': 1.0762113385490957e-06, 'epoch': 2.57} +2025-02-06 02:56:51 - ERROR - stderr - 86%|████████▌ | 19191/22434 [16:49:10<2:16:33, 2.53s/it] +2025-02-06 02:56:53 - ERROR - stderr - 86%|████████▌ | 19192/22434 [16:49:13<2:17:09, 2.54s/it] +2025-02-06 02:56:53 - ERROR - stderr - +2025-02-06 02:56:53 - ERROR - stderr - +2025-02-06 02:56:53 - INFO - stdout - {'loss': 0.3033, 'grad_norm': 1.3471965789794922, 'learning_rate': 1.0755598874679995e-06, 'epoch': 2.57} +2025-02-06 02:56:53 - ERROR - stderr - 86%|████████▌ | 19192/22434 [16:49:13<2:17:09, 2.54s/it] +2025-02-06 02:56:56 - ERROR - stderr - 86%|████████▌ | 19193/22434 [16:49:15<2:15:30, 2.51s/it] +2025-02-06 02:56:56 - ERROR - stderr - +2025-02-06 02:56:56 - ERROR - stderr - +2025-02-06 02:56:56 - INFO - stdout - {'loss': 0.4058, 'grad_norm': 1.6716856956481934, 'learning_rate': 1.0749086224083184e-06, 'epoch': 2.57} +2025-02-06 02:56:56 - ERROR - stderr - 86%|████████▌ | 19193/22434 [16:49:15<2:15:30, 2.51s/it] +2025-02-06 02:56:58 - ERROR - stderr - 86%|████████▌ | 19194/22434 [16:49:18<2:15:36, 2.51s/it] +2025-02-06 02:56:58 - ERROR - stderr - +2025-02-06 02:56:58 - ERROR - stderr - +2025-02-06 02:56:58 - INFO - stdout - {'loss': 0.3744, 'grad_norm': 1.5668190717697144, 
'learning_rate': 1.0742575433836255e-06, 'epoch': 2.57} +2025-02-06 02:56:58 - ERROR - stderr - 86%|████████▌ | 19194/22434 [16:49:18<2:15:36, 2.51s/it] +2025-02-06 02:57:01 - ERROR - stderr - 86%|████████▌ | 19195/22434 [16:49:20<2:15:44, 2.51s/it] +2025-02-06 02:57:01 - ERROR - stderr - +2025-02-06 02:57:01 - ERROR - stderr - +2025-02-06 02:57:01 - INFO - stdout - {'loss': 0.4168, 'grad_norm': 1.7902649641036987, 'learning_rate': 1.0736066504074937e-06, 'epoch': 2.57} +2025-02-06 02:57:01 - ERROR - stderr - 86%|████████▌ | 19195/22434 [16:49:20<2:15:44, 2.51s/it] +2025-02-06 02:57:03 - ERROR - stderr - 86%|████████▌ | 19196/22434 [16:49:23<2:15:52, 2.52s/it] +2025-02-06 02:57:03 - ERROR - stderr - +2025-02-06 02:57:03 - ERROR - stderr - +2025-02-06 02:57:03 - INFO - stdout - {'loss': 0.382, 'grad_norm': 1.5918445587158203, 'learning_rate': 1.07295594349349e-06, 'epoch': 2.57} +2025-02-06 02:57:03 - ERROR - stderr - 86%|████████▌ | 19196/22434 [16:49:23<2:15:52, 2.52s/it] +2025-02-06 02:57:06 - ERROR - stderr - 86%|████████▌ | 19197/22434 [16:49:25<2:14:30, 2.49s/it] +2025-02-06 02:57:06 - ERROR - stderr - +2025-02-06 02:57:06 - ERROR - stderr - +2025-02-06 02:57:06 - INFO - stdout - {'loss': 0.3466, 'grad_norm': 1.5891847610473633, 'learning_rate': 1.0723054226551798e-06, 'epoch': 2.57} +2025-02-06 02:57:06 - ERROR - stderr - 86%|████████▌ | 19197/22434 [16:49:25<2:14:30, 2.49s/it] +2025-02-06 02:57:08 - ERROR - stderr - 86%|████████▌ | 19198/22434 [16:49:28<2:17:50, 2.56s/it] +2025-02-06 02:57:08 - ERROR - stderr - +2025-02-06 02:57:08 - ERROR - stderr - +2025-02-06 02:57:08 - INFO - stdout - {'loss': 0.3984, 'grad_norm': 1.5330432653427124, 'learning_rate': 1.0716550879061148e-06, 'epoch': 2.57} +2025-02-06 02:57:08 - ERROR - stderr - 86%|████████▌ | 19198/22434 [16:49:28<2:17:50, 2.56s/it] +2025-02-06 02:57:11 - ERROR - stderr - 86%|████████▌ | 19199/22434 [16:49:30<2:17:13, 2.55s/it] +2025-02-06 02:57:11 - ERROR - stderr - +2025-02-06 02:57:11 - ERROR - 
stderr - +2025-02-06 02:57:11 - INFO - stdout - {'loss': 0.4194, 'grad_norm': 1.5324773788452148, 'learning_rate': 1.0710049392598587e-06, 'epoch': 2.57} +2025-02-06 02:57:11 - ERROR - stderr - 86%|████████▌ | 19199/22434 [16:49:31<2:17:13, 2.55s/it] +2025-02-06 02:57:13 - ERROR - stderr - 86%|████████▌ | 19200/22434 [16:49:33<2:15:48, 2.52s/it] +2025-02-06 02:57:13 - ERROR - stderr - +2025-02-06 02:57:13 - ERROR - stderr - +2025-02-06 02:57:13 - INFO - stdout - {'loss': 0.3839, 'grad_norm': 1.5851337909698486, 'learning_rate': 1.0703549767299625e-06, 'epoch': 2.57} +2025-02-06 02:57:13 - ERROR - stderr - 86%|████████▌ | 19200/22434 [16:49:33<2:15:48, 2.52s/it] +2025-02-06 02:57:16 - ERROR - stderr - 86%|████████▌ | 19201/22434 [16:49:35<2:15:30, 2.51s/it] +2025-02-06 02:57:16 - ERROR - stderr - +2025-02-06 02:57:16 - ERROR - stderr - +2025-02-06 02:57:16 - INFO - stdout - {'loss': 0.3812, 'grad_norm': 1.4476577043533325, 'learning_rate': 1.069705200329969e-06, 'epoch': 2.57} +2025-02-06 02:57:16 - ERROR - stderr - 86%|████████▌ | 19201/22434 [16:49:36<2:15:30, 2.51s/it] +2025-02-06 02:57:18 - ERROR - stderr - 86%|████████▌ | 19202/22434 [16:49:38<2:16:13, 2.53s/it] +2025-02-06 02:57:18 - ERROR - stderr - +2025-02-06 02:57:18 - ERROR - stderr - +2025-02-06 02:57:18 - INFO - stdout - {'loss': 0.3221, 'grad_norm': 1.4401919841766357, 'learning_rate': 1.0690556100734284e-06, 'epoch': 2.57} +2025-02-06 02:57:18 - ERROR - stderr - 86%|████████▌ | 19202/22434 [16:49:38<2:16:13, 2.53s/it] +2025-02-06 02:57:21 - ERROR - stderr - 86%|████████▌ | 19203/22434 [16:49:41<2:17:11, 2.55s/it] +2025-02-06 02:57:21 - ERROR - stderr - +2025-02-06 02:57:21 - ERROR - stderr - +2025-02-06 02:57:21 - INFO - stdout - {'loss': 0.3895, 'grad_norm': 1.5651764869689941, 'learning_rate': 1.0684062059738731e-06, 'epoch': 2.57} +2025-02-06 02:57:21 - ERROR - stderr - 86%|████████▌ | 19203/22434 [16:49:41<2:17:11, 2.55s/it] +2025-02-06 02:57:24 - ERROR - stderr - 86%|████████▌ | 19204/22434 
[16:49:43<2:19:16, 2.59s/it] +2025-02-06 02:57:24 - ERROR - stderr - +2025-02-06 02:57:24 - ERROR - stderr - +2025-02-06 02:57:24 - INFO - stdout - {'loss': 0.3441, 'grad_norm': 1.3623939752578735, 'learning_rate': 1.0677569880448479e-06, 'epoch': 2.57} +2025-02-06 02:57:24 - ERROR - stderr - 86%|████████▌ | 19204/22434 [16:49:43<2:19:16, 2.59s/it] +2025-02-06 02:57:26 - ERROR - stderr - 86%|████████▌ | 19205/22434 [16:49:46<2:18:20, 2.57s/it] +2025-02-06 02:57:26 - ERROR - stderr - +2025-02-06 02:57:26 - ERROR - stderr - +2025-02-06 02:57:26 - INFO - stdout - {'loss': 0.3829, 'grad_norm': 1.6447831392288208, 'learning_rate': 1.06710795629988e-06, 'epoch': 2.57} +2025-02-06 02:57:26 - ERROR - stderr - 86%|████████▌ | 19205/22434 [16:49:46<2:18:20, 2.57s/it] +2025-02-06 02:57:28 - ERROR - stderr - 86%|████████▌ | 19206/22434 [16:49:48<2:16:16, 2.53s/it] +2025-02-06 02:57:29 - ERROR - stderr - +2025-02-06 02:57:29 - ERROR - stderr - +2025-02-06 02:57:29 - INFO - stdout - {'loss': 0.3353, 'grad_norm': 1.437551498413086, 'learning_rate': 1.0664591107524958e-06, 'epoch': 2.57} +2025-02-06 02:57:29 - ERROR - stderr - 86%|████████▌ | 19206/22434 [16:49:48<2:16:16, 2.53s/it] +2025-02-06 02:57:31 - ERROR - stderr - 86%|████████▌ | 19207/22434 [16:49:51<2:18:19, 2.57s/it] +2025-02-06 02:57:31 - ERROR - stderr - +2025-02-06 02:57:31 - ERROR - stderr - +2025-02-06 02:57:31 - INFO - stdout - {'loss': 0.3741, 'grad_norm': 1.3475244045257568, 'learning_rate': 1.0658104514162281e-06, 'epoch': 2.57} +2025-02-06 02:57:31 - ERROR - stderr - 86%|████████▌ | 19207/22434 [16:49:51<2:18:19, 2.57s/it] +2025-02-06 02:57:34 - ERROR - stderr - 86%|████████▌ | 19208/22434 [16:49:53<2:16:19, 2.54s/it] +2025-02-06 02:57:34 - ERROR - stderr - +2025-02-06 02:57:34 - ERROR - stderr - +2025-02-06 02:57:34 - INFO - stdout - {'loss': 0.3497, 'grad_norm': 1.508324146270752, 'learning_rate': 1.0651619783045875e-06, 'epoch': 2.57} +2025-02-06 02:57:34 - ERROR - stderr - 86%|████████▌ | 19208/22434 
[16:49:53<2:16:19, 2.54s/it] +2025-02-06 02:57:36 - ERROR - stderr - 86%|████████▌ | 19209/22434 [16:49:56<2:18:41, 2.58s/it] +2025-02-06 02:57:36 - ERROR - stderr - +2025-02-06 02:57:36 - ERROR - stderr - +2025-02-06 02:57:36 - INFO - stdout - {'loss': 0.3915, 'grad_norm': 1.6285452842712402, 'learning_rate': 1.0645136914311005e-06, 'epoch': 2.57} +2025-02-06 02:57:36 - ERROR - stderr - 86%|████████▌ | 19209/22434 [16:49:56<2:18:41, 2.58s/it] +2025-02-06 02:57:39 - ERROR - stderr - 86%|████████▌ | 19210/22434 [16:49:59<2:18:15, 2.57s/it] +2025-02-06 02:57:39 - ERROR - stderr - +2025-02-06 02:57:39 - ERROR - stderr - +2025-02-06 02:57:39 - INFO - stdout - {'loss': 0.3827, 'grad_norm': 1.6734073162078857, 'learning_rate': 1.063865590809272e-06, 'epoch': 2.57} +2025-02-06 02:57:39 - ERROR - stderr - 86%|████████▌ | 19210/22434 [16:49:59<2:18:15, 2.57s/it] +2025-02-06 02:57:41 - ERROR - stderr - 86%|████████▌ | 19211/22434 [16:50:01<2:16:53, 2.55s/it] +2025-02-06 02:57:41 - ERROR - stderr - +2025-02-06 02:57:41 - ERROR - stderr - +2025-02-06 02:57:41 - INFO - stdout - {'loss': 0.3765, 'grad_norm': 1.6095579862594604, 'learning_rate': 1.0632176764526159e-06, 'epoch': 2.57} +2025-02-06 02:57:41 - ERROR - stderr - 86%|████████▌ | 19211/22434 [16:50:01<2:16:53, 2.55s/it] +2025-02-06 02:57:44 - ERROR - stderr - 86%|████████▌ | 19212/22434 [16:50:04<2:15:30, 2.52s/it] +2025-02-06 02:57:44 - ERROR - stderr - +2025-02-06 02:57:44 - ERROR - stderr - +2025-02-06 02:57:44 - INFO - stdout - {'loss': 0.3464, 'grad_norm': 1.5068771839141846, 'learning_rate': 1.0625699483746355e-06, 'epoch': 2.57} +2025-02-06 02:57:44 - ERROR - stderr - 86%|████████▌ | 19212/22434 [16:50:04<2:15:30, 2.52s/it] +2025-02-06 02:57:46 - ERROR - stderr - 86%|████████▌ | 19213/22434 [16:50:06<2:15:15, 2.52s/it] +2025-02-06 02:57:46 - ERROR - stderr - +2025-02-06 02:57:46 - ERROR - stderr - +2025-02-06 02:57:46 - INFO - stdout - {'loss': 0.3968, 'grad_norm': 1.475645899772644, 'learning_rate': 
1.0619224065888312e-06, 'epoch': 2.57} +2025-02-06 02:57:46 - ERROR - stderr - 86%|████████▌ | 19213/22434 [16:50:06<2:15:15, 2.52s/it] +2025-02-06 02:57:49 - ERROR - stderr - 86%|████████▌ | 19214/22434 [16:50:09<2:19:12, 2.59s/it] +2025-02-06 02:57:49 - ERROR - stderr - +2025-02-06 02:57:49 - ERROR - stderr - +2025-02-06 02:57:49 - INFO - stdout - {'loss': 0.3422, 'grad_norm': 1.381011724472046, 'learning_rate': 1.0612750511087022e-06, 'epoch': 2.57} +2025-02-06 02:57:49 - ERROR - stderr - 86%|████████▌ | 19214/22434 [16:50:09<2:19:12, 2.59s/it] +2025-02-06 02:57:52 - ERROR - stderr - 86%|████████▌ | 19215/22434 [16:50:11<2:16:51, 2.55s/it] +2025-02-06 02:57:52 - ERROR - stderr - +2025-02-06 02:57:52 - ERROR - stderr - +2025-02-06 02:57:52 - INFO - stdout - {'loss': 0.4273, 'grad_norm': 1.6200796365737915, 'learning_rate': 1.0606278819477412e-06, 'epoch': 2.57} +2025-02-06 02:57:52 - ERROR - stderr - 86%|████████▌ | 19215/22434 [16:50:11<2:16:51, 2.55s/it] +2025-02-06 02:57:54 - ERROR - stderr - 86%|████████▌ | 19216/22434 [16:50:14<2:14:50, 2.51s/it] +2025-02-06 02:57:54 - ERROR - stderr - +2025-02-06 02:57:54 - ERROR - stderr - +2025-02-06 02:57:54 - INFO - stdout - {'loss': 0.3554, 'grad_norm': 1.5439568758010864, 'learning_rate': 1.0599808991194383e-06, 'epoch': 2.57} +2025-02-06 02:57:54 - ERROR - stderr - 86%|████████▌ | 19216/22434 [16:50:14<2:14:50, 2.51s/it] +2025-02-06 02:57:56 - ERROR - stderr - 86%|████████▌ | 19217/22434 [16:50:16<2:13:51, 2.50s/it] +2025-02-06 02:57:56 - ERROR - stderr - +2025-02-06 02:57:56 - ERROR - stderr - +2025-02-06 02:57:56 - INFO - stdout - {'loss': 0.3757, 'grad_norm': 1.691437005996704, 'learning_rate': 1.0593341026372784e-06, 'epoch': 2.57} +2025-02-06 02:57:56 - ERROR - stderr - 86%|████████▌ | 19217/22434 [16:50:16<2:13:51, 2.50s/it] +2025-02-06 02:57:59 - ERROR - stderr - 86%|████████▌ | 19218/22434 [16:50:19<2:13:20, 2.49s/it] +2025-02-06 02:57:59 - ERROR - stderr - +2025-02-06 02:57:59 - ERROR - stderr - +2025-02-06 
02:57:59 - INFO - stdout - {'loss': 0.3423, 'grad_norm': 1.4096848964691162, 'learning_rate': 1.058687492514745e-06, 'epoch': 2.57} +2025-02-06 02:57:59 - ERROR - stderr - 86%|████████▌ | 19218/22434 [16:50:19<2:13:20, 2.49s/it] +2025-02-06 02:58:01 - ERROR - stderr - 86%|████████▌ | 19219/22434 [16:50:21<2:15:10, 2.52s/it] +2025-02-06 02:58:02 - ERROR - stderr - +2025-02-06 02:58:02 - ERROR - stderr - +2025-02-06 02:58:02 - INFO - stdout - {'loss': 0.3792, 'grad_norm': 1.5017589330673218, 'learning_rate': 1.058041068765313e-06, 'epoch': 2.57} +2025-02-06 02:58:02 - ERROR - stderr - 86%|████████▌ | 19219/22434 [16:50:21<2:15:10, 2.52s/it] +2025-02-06 02:58:04 - ERROR - stderr - 86%|████████▌ | 19220/22434 [16:50:24<2:13:57, 2.50s/it] +2025-02-06 02:58:04 - ERROR - stderr - +2025-02-06 02:58:04 - ERROR - stderr - +2025-02-06 02:58:04 - INFO - stdout - {'loss': 0.3624, 'grad_norm': 1.5276645421981812, 'learning_rate': 1.0573948314024597e-06, 'epoch': 2.57} +2025-02-06 02:58:04 - ERROR - stderr - 86%|████████▌ | 19220/22434 [16:50:24<2:13:57, 2.50s/it] +2025-02-06 02:58:06 - ERROR - stderr - 86%|████████▌ | 19221/22434 [16:50:26<2:14:04, 2.50s/it] +2025-02-06 02:58:06 - ERROR - stderr - +2025-02-06 02:58:06 - ERROR - stderr - +2025-02-06 02:58:06 - INFO - stdout - {'loss': 0.3774, 'grad_norm': 1.658337116241455, 'learning_rate': 1.056748780439656e-06, 'epoch': 2.57} +2025-02-06 02:58:07 - ERROR - stderr - 86%|████████▌ | 19221/22434 [16:50:26<2:14:04, 2.50s/it] +2025-02-06 02:58:09 - ERROR - stderr - 86%|████████▌ | 19222/22434 [16:50:29<2:14:16, 2.51s/it] +2025-02-06 02:58:09 - ERROR - stderr - +2025-02-06 02:58:09 - ERROR - stderr - +2025-02-06 02:58:09 - INFO - stdout - {'loss': 0.3464, 'grad_norm': 1.4802452325820923, 'learning_rate': 1.0561029158903623e-06, 'epoch': 2.57} +2025-02-06 02:58:09 - ERROR - stderr - 86%|████████▌ | 19222/22434 [16:50:29<2:14:16, 2.51s/it] +2025-02-06 02:58:11 - ERROR - stderr - 86%|████████▌ | 19223/22434 [16:50:31<2:14:17, 2.51s/it] 
+2025-02-06 02:58:12 - ERROR - stderr - +2025-02-06 02:58:12 - ERROR - stderr - +2025-02-06 02:58:12 - INFO - stdout - {'loss': 0.3459, 'grad_norm': 1.633117437362671, 'learning_rate': 1.0554572377680483e-06, 'epoch': 2.57} +2025-02-06 02:58:12 - ERROR - stderr - 86%|████████▌ | 19223/22434 [16:50:31<2:14:17, 2.51s/it] +2025-02-06 02:58:14 - ERROR - stderr - 86%|████████▌ | 19224/22434 [16:50:34<2:15:00, 2.52s/it] +2025-02-06 02:58:14 - ERROR - stderr - +2025-02-06 02:58:14 - ERROR - stderr - +2025-02-06 02:58:14 - INFO - stdout - {'loss': 0.3823, 'grad_norm': 1.6145466566085815, 'learning_rate': 1.0548117460861652e-06, 'epoch': 2.57} +2025-02-06 02:58:14 - ERROR - stderr - 86%|████████▌ | 19224/22434 [16:50:34<2:15:00, 2.52s/it] +2025-02-06 02:58:17 - ERROR - stderr - 86%|████████▌ | 19225/22434 [16:50:36<2:14:54, 2.52s/it] +2025-02-06 02:58:17 - ERROR - stderr - +2025-02-06 02:58:17 - ERROR - stderr - +2025-02-06 02:58:17 - INFO - stdout - {'loss': 0.3632, 'grad_norm': 1.6716312170028687, 'learning_rate': 1.0541664408581742e-06, 'epoch': 2.57} +2025-02-06 02:58:17 - ERROR - stderr - 86%|████████▌ | 19225/22434 [16:50:36<2:14:54, 2.52s/it] +2025-02-06 02:58:19 - ERROR - stderr - 86%|████████▌ | 19226/22434 [16:50:39<2:15:40, 2.54s/it] +2025-02-06 02:58:19 - ERROR - stderr - +2025-02-06 02:58:19 - ERROR - stderr - +2025-02-06 02:58:19 - INFO - stdout - {'loss': 0.3854, 'grad_norm': 1.6233869791030884, 'learning_rate': 1.0535213220975248e-06, 'epoch': 2.57} +2025-02-06 02:58:19 - ERROR - stderr - 86%|████████▌ | 19226/22434 [16:50:39<2:15:40, 2.54s/it] +2025-02-06 02:58:22 - ERROR - stderr - 86%|████████▌ | 19227/22434 [16:50:42<2:21:45, 2.65s/it] +2025-02-06 02:58:22 - ERROR - stderr - +2025-02-06 02:58:22 - ERROR - stderr - +2025-02-06 02:58:22 - INFO - stdout - {'loss': 0.3465, 'grad_norm': 1.6112399101257324, 'learning_rate': 1.0528763898176586e-06, 'epoch': 2.57} +2025-02-06 02:58:22 - ERROR - stderr - 86%|████████▌ | 19227/22434 [16:50:42<2:21:45, 2.65s/it] 
+2025-02-06 02:58:24 - ERROR - stderr - 86%|████████▌ | 19228/22434 [16:50:44<2:18:21, 2.59s/it] +2025-02-06 02:58:25 - ERROR - stderr - +2025-02-06 02:58:25 - ERROR - stderr - +2025-02-06 02:58:25 - INFO - stdout - {'loss': 0.3578, 'grad_norm': 1.4784741401672363, 'learning_rate': 1.0522316440320279e-06, 'epoch': 2.57} +2025-02-06 02:58:25 - ERROR - stderr - 86%|████████▌ | 19228/22434 [16:50:44<2:18:21, 2.59s/it] +2025-02-06 02:58:27 - ERROR - stderr - 86%|████████▌ | 19229/22434 [16:50:47<2:16:37, 2.56s/it] +2025-02-06 02:58:27 - ERROR - stderr - +2025-02-06 02:58:27 - ERROR - stderr - +2025-02-06 02:58:27 - INFO - stdout - {'loss': 0.4236, 'grad_norm': 1.6744282245635986, 'learning_rate': 1.0515870847540632e-06, 'epoch': 2.57} +2025-02-06 02:58:27 - ERROR - stderr - 86%|████████▌ | 19229/22434 [16:50:47<2:16:37, 2.56s/it] +2025-02-06 02:58:29 - ERROR - stderr - 86%|████████▌ | 19230/22434 [16:50:49<2:14:48, 2.52s/it] +2025-02-06 02:58:29 - ERROR - stderr - +2025-02-06 02:58:29 - ERROR - stderr - +2025-02-06 02:58:29 - INFO - stdout - {'loss': 0.3313, 'grad_norm': 1.430050015449524, 'learning_rate': 1.0509427119972038e-06, 'epoch': 2.57} +2025-02-06 02:58:29 - ERROR - stderr - 86%|████████▌ | 19230/22434 [16:50:49<2:14:48, 2.52s/it] +2025-02-06 02:58:32 - ERROR - stderr - 86%|████████▌ | 19231/22434 [16:50:52<2:13:59, 2.51s/it] +2025-02-06 02:58:32 - ERROR - stderr - +2025-02-06 02:58:32 - ERROR - stderr - +2025-02-06 02:58:32 - INFO - stdout - {'loss': 0.3659, 'grad_norm': 1.5020197629928589, 'learning_rate': 1.0502985257748788e-06, 'epoch': 2.57} +2025-02-06 02:58:32 - ERROR - stderr - 86%|████████▌ | 19231/22434 [16:50:52<2:13:59, 2.51s/it] +2025-02-06 02:58:34 - ERROR - stderr - 86%|████████▌ | 19232/22434 [16:50:54<2:13:12, 2.50s/it] +2025-02-06 02:58:34 - ERROR - stderr - +2025-02-06 02:58:34 - ERROR - stderr - +2025-02-06 02:58:34 - INFO - stdout - {'loss': 0.3232, 'grad_norm': 1.4548124074935913, 'learning_rate': 1.0496545261005164e-06, 'epoch': 2.57} 
+2025-02-06 02:58:34 - ERROR - stderr - 86%|████████▌ | 19232/22434 [16:50:54<2:13:12, 2.50s/it] +2025-02-06 02:58:37 - ERROR - stderr - 86%|████████▌ | 19233/22434 [16:50:57<2:12:59, 2.49s/it] +2025-02-06 02:58:37 - ERROR - stderr - +2025-02-06 02:58:37 - ERROR - stderr - +2025-02-06 02:58:37 - INFO - stdout - {'loss': 0.3628, 'grad_norm': 1.517027735710144, 'learning_rate': 1.0490107129875448e-06, 'epoch': 2.57} +2025-02-06 02:58:37 - ERROR - stderr - 86%|████████▌ | 19233/22434 [16:50:57<2:12:59, 2.49s/it] +2025-02-06 02:58:39 - ERROR - stderr - 86%|████████▌ | 19234/22434 [16:50:59<2:13:13, 2.50s/it] +2025-02-06 02:58:39 - ERROR - stderr - +2025-02-06 02:58:39 - ERROR - stderr - +2025-02-06 02:58:39 - INFO - stdout - {'loss': 0.3186, 'grad_norm': 1.4259926080703735, 'learning_rate': 1.0483670864493777e-06, 'epoch': 2.57} +2025-02-06 02:58:39 - ERROR - stderr - 86%|████████▌ | 19234/22434 [16:50:59<2:13:13, 2.50s/it] +2025-02-06 02:58:42 - ERROR - stderr - 86%|████████▌ | 19235/22434 [16:51:02<2:13:33, 2.50s/it] +2025-02-06 02:58:42 - ERROR - stderr - +2025-02-06 02:58:42 - ERROR - stderr - +2025-02-06 02:58:42 - INFO - stdout - {'loss': 0.3545, 'grad_norm': 1.615278720855713, 'learning_rate': 1.0477236464994322e-06, 'epoch': 2.57} +2025-02-06 02:58:42 - ERROR - stderr - 86%|████████▌ | 19235/22434 [16:51:02<2:13:33, 2.50s/it] +2025-02-06 02:58:44 - ERROR - stderr - 86%|████████▌ | 19236/22434 [16:51:04<2:14:24, 2.52s/it] +2025-02-06 02:58:44 - ERROR - stderr - +2025-02-06 02:58:44 - ERROR - stderr - +2025-02-06 02:58:44 - INFO - stdout - {'loss': 0.4081, 'grad_norm': 1.5949742794036865, 'learning_rate': 1.047080393151122e-06, 'epoch': 2.57} +2025-02-06 02:58:44 - ERROR - stderr - 86%|████████▌ | 19236/22434 [16:51:04<2:14:24, 2.52s/it] +2025-02-06 02:58:47 - ERROR - stderr - 86%|████████▌ | 19237/22434 [16:51:07<2:15:17, 2.54s/it] +2025-02-06 02:58:47 - ERROR - stderr - +2025-02-06 02:58:47 - ERROR - stderr - +2025-02-06 02:58:47 - INFO - stdout - {'loss': 
0.4017, 'grad_norm': 1.5634150505065918, 'learning_rate': 1.046437326417853e-06, 'epoch': 2.57} +2025-02-06 02:58:47 - ERROR - stderr - 86%|████████▌ | 19237/22434 [16:51:07<2:15:17, 2.54s/it] +2025-02-06 02:58:50 - ERROR - stderr - 86%|████████▌ | 19238/22434 [16:51:09<2:14:57, 2.53s/it] +2025-02-06 02:58:50 - ERROR - stderr - +2025-02-06 02:58:50 - ERROR - stderr - +2025-02-06 02:58:50 - INFO - stdout - {'loss': 0.4078, 'grad_norm': 1.5389418601989746, 'learning_rate': 1.045794446313031e-06, 'epoch': 2.57} +2025-02-06 02:58:50 - ERROR - stderr - 86%|████████▌ | 19238/22434 [16:51:09<2:14:57, 2.53s/it] +2025-02-06 02:58:52 - ERROR - stderr - 86%|████████▌ | 19239/22434 [16:51:12<2:15:46, 2.55s/it] +2025-02-06 02:58:52 - ERROR - stderr - +2025-02-06 02:58:52 - ERROR - stderr - +2025-02-06 02:58:52 - INFO - stdout - {'loss': 0.3658, 'grad_norm': 1.58133864402771, 'learning_rate': 1.0451517528500544e-06, 'epoch': 2.57} +2025-02-06 02:58:52 - ERROR - stderr - 86%|████████▌ | 19239/22434 [16:51:12<2:15:46, 2.55s/it] +2025-02-06 02:58:55 - ERROR - stderr - 86%|████████▌ | 19240/22434 [16:51:14<2:15:08, 2.54s/it] +2025-02-06 02:58:55 - ERROR - stderr - +2025-02-06 02:58:55 - ERROR - stderr - +2025-02-06 02:58:55 - INFO - stdout - {'loss': 0.3098, 'grad_norm': 1.2974414825439453, 'learning_rate': 1.0445092460423222e-06, 'epoch': 2.57} +2025-02-06 02:58:55 - ERROR - stderr - 86%|████████▌ | 19240/22434 [16:51:14<2:15:08, 2.54s/it] +2025-02-06 02:58:57 - ERROR - stderr - 86%|████████▌ | 19241/22434 [16:51:17<2:15:13, 2.54s/it] +2025-02-06 02:58:57 - ERROR - stderr - +2025-02-06 02:58:57 - ERROR - stderr - +2025-02-06 02:58:57 - INFO - stdout - {'loss': 0.3825, 'grad_norm': 1.6399016380310059, 'learning_rate': 1.0438669259032241e-06, 'epoch': 2.57} +2025-02-06 02:58:57 - ERROR - stderr - 86%|████████▌ | 19241/22434 [16:51:17<2:15:13, 2.54s/it] +2025-02-06 02:59:00 - ERROR - stderr - 86%|████████▌ | 19242/22434 [16:51:19<2:14:08, 2.52s/it] +2025-02-06 02:59:00 - ERROR - 
stderr - +2025-02-06 02:59:00 - ERROR - stderr - +2025-02-06 02:59:00 - INFO - stdout - {'loss': 0.3629, 'grad_norm': 1.683309555053711, 'learning_rate': 1.0432247924461525e-06, 'epoch': 2.57} +2025-02-06 02:59:00 - ERROR - stderr - 86%|████████▌ | 19242/22434 [16:51:19<2:14:08, 2.52s/it] +2025-02-06 02:59:02 - ERROR - stderr - 86%|████████▌ | 19243/22434 [16:51:22<2:14:26, 2.53s/it] +2025-02-06 02:59:02 - ERROR - stderr - +2025-02-06 02:59:02 - ERROR - stderr - +2025-02-06 02:59:02 - INFO - stdout - {'loss': 0.3831, 'grad_norm': 1.513400912284851, 'learning_rate': 1.0425828456844855e-06, 'epoch': 2.57} +2025-02-06 02:59:02 - ERROR - stderr - 86%|████████▌ | 19243/22434 [16:51:22<2:14:26, 2.53s/it] +2025-02-06 02:59:05 - ERROR - stderr - 86%|████████▌ | 19244/22434 [16:51:25<2:14:26, 2.53s/it] +2025-02-06 02:59:05 - ERROR - stderr - +2025-02-06 02:59:05 - ERROR - stderr - +2025-02-06 02:59:05 - INFO - stdout - {'loss': 0.3478, 'grad_norm': 1.5276539325714111, 'learning_rate': 1.0419410856316092e-06, 'epoch': 2.57} +2025-02-06 02:59:05 - ERROR - stderr - 86%|████████▌ | 19244/22434 [16:51:25<2:14:26, 2.53s/it] +2025-02-06 02:59:07 - ERROR - stderr - 86%|████████▌ | 19245/22434 [16:51:27<2:13:22, 2.51s/it] +2025-02-06 02:59:07 - ERROR - stderr - +2025-02-06 02:59:07 - ERROR - stderr - +2025-02-06 02:59:07 - INFO - stdout - {'loss': 0.4055, 'grad_norm': 1.757575273513794, 'learning_rate': 1.0412995123009006e-06, 'epoch': 2.57} +2025-02-06 02:59:07 - ERROR - stderr - 86%|████████▌ | 19245/22434 [16:51:27<2:13:22, 2.51s/it] +2025-02-06 02:59:10 - ERROR - stderr - 86%|████████▌ | 19246/22434 [16:51:29<2:12:50, 2.50s/it] +2025-02-06 02:59:10 - ERROR - stderr - +2025-02-06 02:59:10 - ERROR - stderr - +2025-02-06 02:59:10 - INFO - stdout - {'loss': 0.3613, 'grad_norm': 1.3747422695159912, 'learning_rate': 1.040658125705728e-06, 'epoch': 2.57} +2025-02-06 02:59:10 - ERROR - stderr - 86%|████████▌ | 19246/22434 [16:51:29<2:12:50, 2.50s/it] +2025-02-06 02:59:12 - ERROR - 
stderr - 86%|████████▌ | 19247/22434 [16:51:32<2:12:47, 2.50s/it] +2025-02-06 02:59:12 - ERROR - stderr - +2025-02-06 02:59:12 - ERROR - stderr - +2025-02-06 02:59:12 - INFO - stdout - {'loss': 0.3607, 'grad_norm': 1.7589099407196045, 'learning_rate': 1.0400169258594673e-06, 'epoch': 2.57} +2025-02-06 02:59:12 - ERROR - stderr - 86%|████████▌ | 19247/22434 [16:51:32<2:12:47, 2.50s/it] +2025-02-06 02:59:15 - ERROR - stderr - 86%|████████▌ | 19248/22434 [16:51:35<2:14:18, 2.53s/it] +2025-02-06 02:59:15 - ERROR - stderr - +2025-02-06 02:59:15 - ERROR - stderr - +2025-02-06 02:59:15 - INFO - stdout - {'loss': 0.3684, 'grad_norm': 1.539939045906067, 'learning_rate': 1.0393759127754765e-06, 'epoch': 2.57} +2025-02-06 02:59:15 - ERROR - stderr - 86%|████████▌ | 19248/22434 [16:51:35<2:14:18, 2.53s/it] +2025-02-06 02:59:17 - ERROR - stderr - 86%|████████▌ | 19249/22434 [16:51:37<2:12:51, 2.50s/it] +2025-02-06 02:59:17 - ERROR - stderr - +2025-02-06 02:59:17 - ERROR - stderr - +2025-02-06 02:59:17 - INFO - stdout - {'loss': 0.3371, 'grad_norm': 1.4342817068099976, 'learning_rate': 1.0387350864671242e-06, 'epoch': 2.57} +2025-02-06 02:59:17 - ERROR - stderr - 86%|████████▌ | 19249/22434 [16:51:37<2:12:51, 2.50s/it] +2025-02-06 02:59:20 - ERROR - stderr - 86%|████████▌ | 19250/22434 [16:51:39<2:12:45, 2.50s/it] +2025-02-06 02:59:20 - ERROR - stderr - +2025-02-06 02:59:20 - ERROR - stderr - +2025-02-06 02:59:20 - INFO - stdout - {'loss': 0.3451, 'grad_norm': 1.5475482940673828, 'learning_rate': 1.0380944469477617e-06, 'epoch': 2.57} +2025-02-06 02:59:20 - ERROR - stderr - 86%|████████▌ | 19250/22434 [16:51:40<2:12:45, 2.50s/it] +2025-02-06 02:59:22 - ERROR - stderr - 86%|████████▌ | 19251/22434 [16:51:42<2:13:05, 2.51s/it] +2025-02-06 02:59:22 - ERROR - stderr - +2025-02-06 02:59:22 - ERROR - stderr - +2025-02-06 02:59:22 - INFO - stdout - {'loss': 0.3528, 'grad_norm': 1.6336615085601807, 'learning_rate': 1.0374539942307426e-06, 'epoch': 2.57} +2025-02-06 02:59:22 - ERROR - 
stderr - 86%|████████▌ | 19251/22434 [16:51:42<2:13:05, 2.51s/it] +2025-02-06 02:59:25 - ERROR - stderr - 86%|████████▌ | 19252/22434 [16:51:44<2:12:16, 2.49s/it] +2025-02-06 02:59:25 - ERROR - stderr - +2025-02-06 02:59:25 - ERROR - stderr - +2025-02-06 02:59:25 - INFO - stdout - {'loss': 0.3676, 'grad_norm': 1.60920250415802, 'learning_rate': 1.0368137283294232e-06, 'epoch': 2.57} +2025-02-06 02:59:25 - ERROR - stderr - 86%|████████▌ | 19252/22434 [16:51:45<2:12:16, 2.49s/it] +2025-02-06 02:59:27 - ERROR - stderr - 86%|████████▌ | 19253/22434 [16:51:47<2:12:18, 2.50s/it] +2025-02-06 02:59:27 - ERROR - stderr - +2025-02-06 02:59:27 - ERROR - stderr - +2025-02-06 02:59:27 - INFO - stdout - {'loss': 0.3698, 'grad_norm': 1.4386471509933472, 'learning_rate': 1.0361736492571428e-06, 'epoch': 2.57} +2025-02-06 02:59:27 - ERROR - stderr - 86%|████████▌ | 19253/22434 [16:51:47<2:12:18, 2.50s/it] +2025-02-06 02:59:30 - ERROR - stderr - 86%|████████▌ | 19254/22434 [16:51:49<2:12:39, 2.50s/it] +2025-02-06 02:59:30 - ERROR - stderr - +2025-02-06 02:59:30 - ERROR - stderr - +2025-02-06 02:59:30 - INFO - stdout - {'loss': 0.3811, 'grad_norm': 1.6944243907928467, 'learning_rate': 1.035533757027245e-06, 'epoch': 2.57} +2025-02-06 02:59:30 - ERROR - stderr - 86%|████████▌ | 19254/22434 [16:51:50<2:12:39, 2.50s/it] +2025-02-06 02:59:32 - ERROR - stderr - 86%|████████▌ | 19255/22434 [16:51:52<2:16:15, 2.57s/it] +2025-02-06 02:59:33 - ERROR - stderr - +2025-02-06 02:59:33 - ERROR - stderr - +2025-02-06 02:59:33 - INFO - stdout - {'loss': 0.338, 'grad_norm': 1.53303861618042, 'learning_rate': 1.034894051653068e-06, 'epoch': 2.57} +2025-02-06 02:59:33 - ERROR - stderr - 86%|████████▌ | 19255/22434 [16:51:52<2:16:15, 2.57s/it] +2025-02-06 02:59:35 - ERROR - stderr - 86%|████████▌ | 19256/22434 [16:51:55<2:18:15, 2.61s/it] +2025-02-06 02:59:35 - ERROR - stderr - +2025-02-06 02:59:35 - ERROR - stderr - +2025-02-06 02:59:35 - INFO - stdout - {'loss': 0.3552, 'grad_norm': 
1.5082919597625732, 'learning_rate': 1.0342545331479459e-06, 'epoch': 2.58} +2025-02-06 02:59:35 - ERROR - stderr - 86%|████████▌ | 19256/22434 [16:51:55<2:18:15, 2.61s/it] +2025-02-06 02:59:38 - ERROR - stderr - 86%|████████▌ | 19257/22434 [16:51:58<2:21:06, 2.67s/it] +2025-02-06 02:59:38 - ERROR - stderr - +2025-02-06 02:59:38 - ERROR - stderr - +2025-02-06 02:59:38 - INFO - stdout - {'loss': 0.3612, 'grad_norm': 1.5976619720458984, 'learning_rate': 1.0336152015252088e-06, 'epoch': 2.58} +2025-02-06 02:59:38 - ERROR - stderr - 86%|████████▌ | 19257/22434 [16:51:58<2:21:06, 2.67s/it] +2025-02-06 02:59:41 - ERROR - stderr - 86%|████████▌ | 19258/22434 [16:52:01<2:23:03, 2.70s/it] +2025-02-06 02:59:41 - ERROR - stderr - +2025-02-06 02:59:41 - ERROR - stderr - +2025-02-06 02:59:41 - INFO - stdout - {'loss': 0.3532, 'grad_norm': 1.5609242916107178, 'learning_rate': 1.032976056798184e-06, 'epoch': 2.58} +2025-02-06 02:59:41 - ERROR - stderr - 86%|████████▌ | 19258/22434 [16:52:01<2:23:03, 2.70s/it] +2025-02-06 02:59:43 - ERROR - stderr - 86%|████████▌ | 19259/22434 [16:52:03<2:18:51, 2.62s/it] +2025-02-06 02:59:43 - ERROR - stderr - +2025-02-06 02:59:43 - ERROR - stderr - +2025-02-06 02:59:43 - INFO - stdout - {'loss': 0.3548, 'grad_norm': 1.4943677186965942, 'learning_rate': 1.0323370989801907e-06, 'epoch': 2.58} +2025-02-06 02:59:43 - ERROR - stderr - 86%|████████▌ | 19259/22434 [16:52:03<2:18:51, 2.62s/it] +2025-02-06 02:59:46 - ERROR - stderr - 86%|████████▌ | 19260/22434 [16:52:06<2:17:47, 2.60s/it] +2025-02-06 02:59:46 - ERROR - stderr - +2025-02-06 02:59:46 - ERROR - stderr - +2025-02-06 02:59:46 - INFO - stdout - {'loss': 0.3397, 'grad_norm': 1.572357177734375, 'learning_rate': 1.0316983280845505e-06, 'epoch': 2.58} +2025-02-06 02:59:46 - ERROR - stderr - 86%|████████▌ | 19260/22434 [16:52:06<2:17:47, 2.60s/it] +2025-02-06 02:59:48 - ERROR - stderr - 86%|████████▌ | 19261/22434 [16:52:08<2:15:03, 2.55s/it] +2025-02-06 02:59:48 - ERROR - stderr - +2025-02-06 
02:59:48 - ERROR - stderr - +2025-02-06 02:59:48 - INFO - stdout - {'loss': 0.3962, 'grad_norm': 1.6423535346984863, 'learning_rate': 1.0310597441245795e-06, 'epoch': 2.58} +2025-02-06 02:59:48 - ERROR - stderr - 86%|████████▌ | 19261/22434 [16:52:08<2:15:03, 2.55s/it] +2025-02-06 02:59:51 - ERROR - stderr - 86%|████████▌ | 19262/22434 [16:52:10<2:14:41, 2.55s/it] +2025-02-06 02:59:51 - ERROR - stderr - +2025-02-06 02:59:51 - ERROR - stderr - +2025-02-06 02:59:51 - INFO - stdout - {'loss': 0.3462, 'grad_norm': 1.4560085535049438, 'learning_rate': 1.0304213471135816e-06, 'epoch': 2.58} +2025-02-06 02:59:51 - ERROR - stderr - 86%|████████▌ | 19262/22434 [16:52:11<2:14:41, 2.55s/it] +2025-02-06 02:59:53 - ERROR - stderr - 86%|████████▌ | 19263/22434 [16:52:13<2:13:15, 2.52s/it] +2025-02-06 02:59:53 - ERROR - stderr - +2025-02-06 02:59:53 - ERROR - stderr - +2025-02-06 02:59:53 - INFO - stdout - {'loss': 0.3882, 'grad_norm': 1.62520432472229, 'learning_rate': 1.0297831370648692e-06, 'epoch': 2.58} +2025-02-06 02:59:53 - ERROR - stderr - 86%|████████▌ | 19263/22434 [16:52:13<2:13:15, 2.52s/it] +2025-02-06 02:59:56 - ERROR - stderr - 86%|████████▌ | 19264/22434 [16:52:15<2:12:57, 2.52s/it] +2025-02-06 02:59:56 - ERROR - stderr - +2025-02-06 02:59:56 - ERROR - stderr - +2025-02-06 02:59:56 - INFO - stdout - {'loss': 0.3822, 'grad_norm': 1.6136783361434937, 'learning_rate': 1.029145113991743e-06, 'epoch': 2.58} +2025-02-06 02:59:56 - ERROR - stderr - 86%|████████▌ | 19264/22434 [16:52:15<2:12:57, 2.52s/it] +2025-02-06 02:59:58 - ERROR - stderr - 86%|████████▌ | 19265/22434 [16:52:18<2:13:52, 2.53s/it] +2025-02-06 02:59:58 - ERROR - stderr - +2025-02-06 02:59:58 - ERROR - stderr - +2025-02-06 02:59:58 - INFO - stdout - {'loss': 0.3444, 'grad_norm': 1.5375196933746338, 'learning_rate': 1.0285072779075045e-06, 'epoch': 2.58} +2025-02-06 02:59:58 - ERROR - stderr - 86%|████████▌ | 19265/22434 [16:52:18<2:13:52, 2.53s/it] +2025-02-06 03:00:01 - ERROR - stderr - 86%|████████▌ | 
19266/22434 [16:52:21<2:16:53, 2.59s/it] +2025-02-06 03:00:01 - ERROR - stderr - +2025-02-06 03:00:01 - ERROR - stderr - +2025-02-06 03:00:01 - INFO - stdout - {'loss': 0.3838, 'grad_norm': 1.4805903434753418, 'learning_rate': 1.0278696288254475e-06, 'epoch': 2.58} +2025-02-06 03:00:01 - ERROR - stderr - 86%|████████▌ | 19266/22434 [16:52:21<2:16:53, 2.59s/it] +2025-02-06 03:00:03 - ERROR - stderr - 86%|████████▌ | 19267/22434 [16:52:23<2:14:21, 2.55s/it] +2025-02-06 03:00:03 - ERROR - stderr - +2025-02-06 03:00:03 - ERROR - stderr - +2025-02-06 03:00:03 - INFO - stdout - {'loss': 0.4062, 'grad_norm': 1.7585663795471191, 'learning_rate': 1.0272321667588592e-06, 'epoch': 2.58} +2025-02-06 03:00:03 - ERROR - stderr - 86%|████████▌ | 19267/22434 [16:52:23<2:14:21, 2.55s/it] +2025-02-06 03:00:06 - ERROR - stderr - 86%|████████▌ | 19268/22434 [16:52:26<2:13:38, 2.53s/it] +2025-02-06 03:00:06 - ERROR - stderr - +2025-02-06 03:00:06 - ERROR - stderr - +2025-02-06 03:00:06 - INFO - stdout - {'loss': 0.3732, 'grad_norm': 1.600266933441162, 'learning_rate': 1.0265948917210345e-06, 'epoch': 2.58} +2025-02-06 03:00:06 - ERROR - stderr - 86%|████████▌ | 19268/22434 [16:52:26<2:13:38, 2.53s/it] +2025-02-06 03:00:08 - ERROR - stderr - 86%|████████▌ | 19269/22434 [16:52:28<2:11:55, 2.50s/it] +2025-02-06 03:00:08 - ERROR - stderr - +2025-02-06 03:00:08 - ERROR - stderr - +2025-02-06 03:00:08 - INFO - stdout - {'loss': 0.3488, 'grad_norm': 1.573654294013977, 'learning_rate': 1.0259578037252505e-06, 'epoch': 2.58} +2025-02-06 03:00:08 - ERROR - stderr - 86%|████████▌ | 19269/22434 [16:52:28<2:11:55, 2.50s/it] +2025-02-06 03:00:11 - ERROR - stderr - 86%|████████▌ | 19270/22434 [16:52:31<2:12:21, 2.51s/it] +2025-02-06 03:00:11 - ERROR - stderr - +2025-02-06 03:00:11 - ERROR - stderr - +2025-02-06 03:00:11 - INFO - stdout - {'loss': 0.3709, 'grad_norm': 1.568121314048767, 'learning_rate': 1.0253209027847876e-06, 'epoch': 2.58} +2025-02-06 03:00:11 - ERROR - stderr - 86%|████████▌ | 
19270/22434 [16:52:31<2:12:21, 2.51s/it] +2025-02-06 03:00:13 - ERROR - stderr - 86%|████████▌ | 19271/22434 [16:52:33<2:11:46, 2.50s/it] +2025-02-06 03:00:13 - ERROR - stderr - +2025-02-06 03:00:13 - ERROR - stderr - +2025-02-06 03:00:13 - INFO - stdout - {'loss': 0.3853, 'grad_norm': 1.55734121799469, 'learning_rate': 1.0246841889129255e-06, 'epoch': 2.58} +2025-02-06 03:00:13 - ERROR - stderr - 86%|████████▌ | 19271/22434 [16:52:33<2:11:46, 2.50s/it] +2025-02-06 03:00:16 - ERROR - stderr - 86%|████████▌ | 19272/22434 [16:52:36<2:12:32, 2.51s/it] +2025-02-06 03:00:16 - ERROR - stderr - +2025-02-06 03:00:16 - ERROR - stderr - +2025-02-06 03:00:16 - INFO - stdout - {'loss': 0.3249, 'grad_norm': 1.3577525615692139, 'learning_rate': 1.02404766212293e-06, 'epoch': 2.58} +2025-02-06 03:00:16 - ERROR - stderr - 86%|████████▌ | 19272/22434 [16:52:36<2:12:32, 2.51s/it] +2025-02-06 03:00:18 - ERROR - stderr - 86%|████████▌ | 19273/22434 [16:52:38<2:11:37, 2.50s/it] +2025-02-06 03:00:18 - ERROR - stderr - +2025-02-06 03:00:18 - ERROR - stderr - +2025-02-06 03:00:18 - INFO - stdout - {'loss': 0.3145, 'grad_norm': 1.5033268928527832, 'learning_rate': 1.023411322428075e-06, 'epoch': 2.58} +2025-02-06 03:00:18 - ERROR - stderr - 86%|████████▌ | 19273/22434 [16:52:38<2:11:37, 2.50s/it] +2025-02-06 03:00:21 - ERROR - stderr - 86%|████████▌ | 19274/22434 [16:52:41<2:11:36, 2.50s/it] +2025-02-06 03:00:21 - ERROR - stderr - +2025-02-06 03:00:21 - ERROR - stderr - +2025-02-06 03:00:21 - INFO - stdout - {'loss': 0.3741, 'grad_norm': 1.6319630146026611, 'learning_rate': 1.02277516984162e-06, 'epoch': 2.58} +2025-02-06 03:00:21 - ERROR - stderr - 86%|████████▌ | 19274/22434 [16:52:41<2:11:36, 2.50s/it] +2025-02-06 03:00:23 - ERROR - stderr - 86%|████████▌ | 19275/22434 [16:52:43<2:11:55, 2.51s/it] +2025-02-06 03:00:23 - ERROR - stderr - +2025-02-06 03:00:23 - ERROR - stderr - +2025-02-06 03:00:23 - INFO - stdout - {'loss': 0.3855, 'grad_norm': 1.6792452335357666, 'learning_rate': 
1.0221392043768264e-06, 'epoch': 2.58} +2025-02-06 03:00:23 - ERROR - stderr - 86%|████████▌ | 19275/22434 [16:52:43<2:11:55, 2.51s/it] +2025-02-06 03:00:26 - ERROR - stderr - 86%|████████▌ | 19276/22434 [16:52:46<2:10:55, 2.49s/it] +2025-02-06 03:00:26 - ERROR - stderr - +2025-02-06 03:00:26 - ERROR - stderr - +2025-02-06 03:00:26 - INFO - stdout - {'loss': 0.3878, 'grad_norm': 1.6028178930282593, 'learning_rate': 1.0215034260469502e-06, 'epoch': 2.58} +2025-02-06 03:00:26 - ERROR - stderr - 86%|████████▌ | 19276/22434 [16:52:46<2:10:55, 2.49s/it] +2025-02-06 03:00:28 - ERROR - stderr - 86%|████████▌ | 19277/22434 [16:52:48<2:10:22, 2.48s/it] +2025-02-06 03:00:28 - ERROR - stderr - +2025-02-06 03:00:28 - ERROR - stderr - +2025-02-06 03:00:28 - INFO - stdout - {'loss': 0.4167, 'grad_norm': 1.6867008209228516, 'learning_rate': 1.0208678348652433e-06, 'epoch': 2.58} +2025-02-06 03:00:28 - ERROR - stderr - 86%|████████▌ | 19277/22434 [16:52:48<2:10:22, 2.48s/it] +2025-02-06 03:00:31 - ERROR - stderr - 86%|████████▌ | 19278/22434 [16:52:51<2:11:11, 2.49s/it] +2025-02-06 03:00:31 - ERROR - stderr - +2025-02-06 03:00:31 - ERROR - stderr - +2025-02-06 03:00:31 - INFO - stdout - {'loss': 0.3234, 'grad_norm': 1.524542212486267, 'learning_rate': 1.020232430844954e-06, 'epoch': 2.58} +2025-02-06 03:00:31 - ERROR - stderr - 86%|████████▌ | 19278/22434 [16:52:51<2:11:11, 2.49s/it] +2025-02-06 03:00:33 - ERROR - stderr - 86%|████████▌ | 19279/22434 [16:52:53<2:10:30, 2.48s/it] +2025-02-06 03:00:33 - ERROR - stderr - +2025-02-06 03:00:33 - ERROR - stderr - +2025-02-06 03:00:33 - INFO - stdout - {'loss': 0.3971, 'grad_norm': 1.7698523998260498, 'learning_rate': 1.019597213999327e-06, 'epoch': 2.58} +2025-02-06 03:00:33 - ERROR - stderr - 86%|████████▌ | 19279/22434 [16:52:53<2:10:30, 2.48s/it] +2025-02-06 03:00:36 - ERROR - stderr - 86%|████████▌ | 19280/22434 [16:52:55<2:09:24, 2.46s/it] +2025-02-06 03:00:36 - ERROR - stderr - +2025-02-06 03:00:36 - ERROR - stderr - +2025-02-06 
03:00:36 - INFO - stdout - {'loss': 0.3998, 'grad_norm': 1.5787378549575806, 'learning_rate': 1.018962184341603e-06, 'epoch': 2.58} +2025-02-06 03:00:36 - ERROR - stderr - 86%|████████▌ | 19280/22434 [16:52:56<2:09:24, 2.46s/it] +2025-02-06 03:00:38 - ERROR - stderr - 86%|████████▌ | 19281/22434 [16:52:58<2:09:34, 2.47s/it] +2025-02-06 03:00:38 - ERROR - stderr - +2025-02-06 03:00:38 - ERROR - stderr - +2025-02-06 03:00:38 - INFO - stdout - {'loss': 0.3597, 'grad_norm': 1.6036442518234253, 'learning_rate': 1.0183273418850192e-06, 'epoch': 2.58} +2025-02-06 03:00:38 - ERROR - stderr - 86%|████████▌ | 19281/22434 [16:52:58<2:09:34, 2.47s/it] +2025-02-06 03:00:41 - ERROR - stderr - 86%|████████▌ | 19282/22434 [16:53:00<2:09:42, 2.47s/it] +2025-02-06 03:00:41 - ERROR - stderr - +2025-02-06 03:00:41 - ERROR - stderr - +2025-02-06 03:00:41 - INFO - stdout - {'loss': 0.3802, 'grad_norm': 1.6378891468048096, 'learning_rate': 1.017692686642806e-06, 'epoch': 2.58} +2025-02-06 03:00:41 - ERROR - stderr - 86%|████████▌ | 19282/22434 [16:53:00<2:09:42, 2.47s/it] +2025-02-06 03:00:43 - ERROR - stderr - 86%|████████▌ | 19283/22434 [16:53:03<2:09:34, 2.47s/it] +2025-02-06 03:00:43 - ERROR - stderr - +2025-02-06 03:00:43 - ERROR - stderr - +2025-02-06 03:00:43 - INFO - stdout - {'loss': 0.3278, 'grad_norm': 1.4724100828170776, 'learning_rate': 1.0170582186281952e-06, 'epoch': 2.58} +2025-02-06 03:00:43 - ERROR - stderr - 86%|████████▌ | 19283/22434 [16:53:03<2:09:34, 2.47s/it] +2025-02-06 03:00:46 - ERROR - stderr - 86%|████████▌ | 19284/22434 [16:53:05<2:09:38, 2.47s/it] +2025-02-06 03:00:46 - ERROR - stderr - +2025-02-06 03:00:46 - ERROR - stderr - +2025-02-06 03:00:46 - INFO - stdout - {'loss': 0.3352, 'grad_norm': 1.435001015663147, 'learning_rate': 1.0164239378544083e-06, 'epoch': 2.58} +2025-02-06 03:00:46 - ERROR - stderr - 86%|████████▌ | 19284/22434 [16:53:05<2:09:38, 2.47s/it] +2025-02-06 03:00:48 - ERROR - stderr - 86%|████████▌ | 19285/22434 [16:53:08<2:11:42, 2.51s/it] 
+2025-02-06 03:00:48 - ERROR - stderr - +2025-02-06 03:00:48 - ERROR - stderr - +2025-02-06 03:00:48 - INFO - stdout - {'loss': 0.3631, 'grad_norm': 1.4401791095733643, 'learning_rate': 1.0157898443346715e-06, 'epoch': 2.58} +2025-02-06 03:00:48 - ERROR - stderr - 86%|████████▌ | 19285/22434 [16:53:08<2:11:42, 2.51s/it] +2025-02-06 03:00:51 - ERROR - stderr - 86%|████████▌ | 19286/22434 [16:53:11<2:17:31, 2.62s/it] +2025-02-06 03:00:51 - ERROR - stderr - +2025-02-06 03:00:51 - ERROR - stderr - +2025-02-06 03:00:51 - INFO - stdout - {'loss': 0.3913, 'grad_norm': 1.5669329166412354, 'learning_rate': 1.015155938082194e-06, 'epoch': 2.58} +2025-02-06 03:00:51 - ERROR - stderr - 86%|████████▌ | 19286/22434 [16:53:11<2:17:31, 2.62s/it] +2025-02-06 03:00:54 - ERROR - stderr - 86%|████████▌ | 19287/22434 [16:53:13<2:16:47, 2.61s/it] +2025-02-06 03:00:54 - ERROR - stderr - +2025-02-06 03:00:54 - ERROR - stderr - +2025-02-06 03:00:54 - INFO - stdout - {'loss': 0.3273, 'grad_norm': 1.5061671733856201, 'learning_rate': 1.0145222191101967e-06, 'epoch': 2.58} +2025-02-06 03:00:54 - ERROR - stderr - 86%|████████▌ | 19287/22434 [16:53:13<2:16:47, 2.61s/it] +2025-02-06 03:00:56 - ERROR - stderr - 86%|████████▌ | 19288/22434 [16:53:16<2:14:33, 2.57s/it] +2025-02-06 03:00:56 - ERROR - stderr - +2025-02-06 03:00:56 - ERROR - stderr - +2025-02-06 03:00:56 - INFO - stdout - {'loss': 0.3441, 'grad_norm': 1.5431190729141235, 'learning_rate': 1.013888687431882e-06, 'epoch': 2.58} +2025-02-06 03:00:56 - ERROR - stderr - 86%|████████▌ | 19288/22434 [16:53:16<2:14:33, 2.57s/it] +2025-02-06 03:00:59 - ERROR - stderr - 86%|████████▌ | 19289/22434 [16:53:18<2:12:04, 2.52s/it] +2025-02-06 03:00:59 - ERROR - stderr - +2025-02-06 03:00:59 - ERROR - stderr - +2025-02-06 03:00:59 - INFO - stdout - {'loss': 0.4045, 'grad_norm': 1.6048848628997803, 'learning_rate': 1.0132553430604608e-06, 'epoch': 2.58} +2025-02-06 03:00:59 - ERROR - stderr - 86%|████████▌ | 19289/22434 [16:53:18<2:12:04, 2.52s/it] 
+2025-02-06 03:01:01 - ERROR - stderr - 86%|████████▌ | 19290/22434 [16:53:21<2:12:37, 2.53s/it] +2025-02-06 03:01:01 - ERROR - stderr - +2025-02-06 03:01:01 - ERROR - stderr - +2025-02-06 03:01:01 - INFO - stdout - {'loss': 0.3472, 'grad_norm': 1.5820257663726807, 'learning_rate': 1.0126221860091357e-06, 'epoch': 2.58} +2025-02-06 03:01:01 - ERROR - stderr - 86%|████████▌ | 19290/22434 [16:53:21<2:12:37, 2.53s/it] +2025-02-06 03:01:04 - ERROR - stderr - 86%|████████▌ | 19291/22434 [16:53:23<2:13:15, 2.54s/it] +2025-02-06 03:01:04 - ERROR - stderr - +2025-02-06 03:01:04 - ERROR - stderr - +2025-02-06 03:01:04 - INFO - stdout - {'loss': 0.3585, 'grad_norm': 1.565089464187622, 'learning_rate': 1.011989216291096e-06, 'epoch': 2.58} +2025-02-06 03:01:04 - ERROR - stderr - 86%|████████▌ | 19291/22434 [16:53:23<2:13:15, 2.54s/it] +2025-02-06 03:01:06 - ERROR - stderr - 86%|████████▌ | 19292/22434 [16:53:26<2:14:23, 2.57s/it] +2025-02-06 03:01:06 - ERROR - stderr - +2025-02-06 03:01:06 - ERROR - stderr - +2025-02-06 03:01:06 - INFO - stdout - {'loss': 0.4077, 'grad_norm': 1.7364336252212524, 'learning_rate': 1.0113564339195447e-06, 'epoch': 2.58} +2025-02-06 03:01:06 - ERROR - stderr - 86%|████████▌ | 19292/22434 [16:53:26<2:14:23, 2.57s/it] +2025-02-06 03:01:09 - ERROR - stderr - 86%|████████▌ | 19293/22434 [16:53:29<2:14:29, 2.57s/it] +2025-02-06 03:01:09 - ERROR - stderr - +2025-02-06 03:01:09 - ERROR - stderr - +2025-02-06 03:01:09 - INFO - stdout - {'loss': 0.3705, 'grad_norm': 1.6415561437606812, 'learning_rate': 1.0107238389076636e-06, 'epoch': 2.58} +2025-02-06 03:01:09 - ERROR - stderr - 86%|████████▌ | 19293/22434 [16:53:29<2:14:29, 2.57s/it] +2025-02-06 03:01:11 - ERROR - stderr - 86%|████████▌ | 19294/22434 [16:53:31<2:13:33, 2.55s/it] +2025-02-06 03:01:11 - ERROR - stderr - +2025-02-06 03:01:11 - ERROR - stderr - +2025-02-06 03:01:11 - INFO - stdout - {'loss': 0.363, 'grad_norm': 1.6672186851501465, 'learning_rate': 1.010091431268645e-06, 'epoch': 2.58} 
+2025-02-06 03:01:11 - ERROR - stderr - 86%|████████▌ | 19294/22434 [16:53:31<2:13:33, 2.55s/it] +2025-02-06 03:01:14 - ERROR - stderr - 86%|████████▌ | 19295/22434 [16:53:34<2:14:12, 2.57s/it] +2025-02-06 03:01:14 - ERROR - stderr - +2025-02-06 03:01:14 - ERROR - stderr - +2025-02-06 03:01:14 - INFO - stdout - {'loss': 0.3176, 'grad_norm': 1.4849148988723755, 'learning_rate': 1.0094592110156676e-06, 'epoch': 2.58} +2025-02-06 03:01:14 - ERROR - stderr - 86%|████████▌ | 19295/22434 [16:53:34<2:14:12, 2.57s/it] +2025-02-06 03:01:16 - ERROR - stderr - 86%|████████▌ | 19296/22434 [16:53:36<2:13:41, 2.56s/it] +2025-02-06 03:01:17 - ERROR - stderr - +2025-02-06 03:01:17 - ERROR - stderr - +2025-02-06 03:01:17 - INFO - stdout - {'loss': 0.3658, 'grad_norm': 1.4996753931045532, 'learning_rate': 1.0088271781619096e-06, 'epoch': 2.58} +2025-02-06 03:01:17 - ERROR - stderr - 86%|████████▌ | 19296/22434 [16:53:36<2:13:41, 2.56s/it] +2025-02-06 03:01:19 - ERROR - stderr - 86%|████████▌ | 19297/22434 [16:53:39<2:12:16, 2.53s/it] +2025-02-06 03:01:19 - ERROR - stderr - +2025-02-06 03:01:19 - ERROR - stderr - +2025-02-06 03:01:19 - INFO - stdout - {'loss': 0.3568, 'grad_norm': 1.4081658124923706, 'learning_rate': 1.0081953327205452e-06, 'epoch': 2.58} +2025-02-06 03:01:19 - ERROR - stderr - 86%|████████▌ | 19297/22434 [16:53:39<2:12:16, 2.53s/it] +2025-02-06 03:01:21 - ERROR - stderr - 86%|████████▌ | 19298/22434 [16:53:41<2:12:04, 2.53s/it] +2025-02-06 03:01:22 - ERROR - stderr - +2025-02-06 03:01:22 - ERROR - stderr - +2025-02-06 03:01:22 - INFO - stdout - {'loss': 0.3452, 'grad_norm': 1.594196081161499, 'learning_rate': 1.0075636747047446e-06, 'epoch': 2.58} +2025-02-06 03:01:22 - ERROR - stderr - 86%|████████▌ | 19298/22434 [16:53:41<2:12:04, 2.53s/it] +2025-02-06 03:01:24 - ERROR - stderr - 86%|████████▌ | 19299/22434 [16:53:44<2:11:36, 2.52s/it] +2025-02-06 03:01:24 - ERROR - stderr - +2025-02-06 03:01:24 - ERROR - stderr - +2025-02-06 03:01:24 - INFO - stdout - {'loss': 
0.3582, 'grad_norm': 1.4774590730667114, 'learning_rate': 1.0069322041276752e-06, 'epoch': 2.58} +2025-02-06 03:01:24 - ERROR - stderr - 86%|████████▌ | 19299/22434 [16:53:44<2:11:36, 2.52s/it] +2025-02-06 03:01:27 - ERROR - stderr - 86%|████████▌ | 19300/22434 [16:53:46<2:12:26, 2.54s/it] +2025-02-06 03:01:27 - ERROR - stderr - +2025-02-06 03:01:27 - ERROR - stderr - +2025-02-06 03:01:27 - INFO - stdout - {'loss': 0.3649, 'grad_norm': 1.4119399785995483, 'learning_rate': 1.0063009210024978e-06, 'epoch': 2.58} +2025-02-06 03:01:27 - ERROR - stderr - 86%|████████▌ | 19300/22434 [16:53:46<2:12:26, 2.54s/it] +2025-02-06 03:01:29 - ERROR - stderr - 86%|████████▌ | 19301/22434 [16:53:49<2:14:40, 2.58s/it] +2025-02-06 03:01:29 - ERROR - stderr - +2025-02-06 03:01:29 - ERROR - stderr - +2025-02-06 03:01:29 - INFO - stdout - {'loss': 0.3614, 'grad_norm': 1.4174410104751587, 'learning_rate': 1.0056698253423725e-06, 'epoch': 2.58} +2025-02-06 03:01:29 - ERROR - stderr - 86%|████████▌ | 19301/22434 [16:53:49<2:14:40, 2.58s/it] +2025-02-06 03:01:32 - ERROR - stderr - 86%|████████▌ | 19302/22434 [16:53:52<2:18:03, 2.64s/it] +2025-02-06 03:01:32 - ERROR - stderr - +2025-02-06 03:01:32 - ERROR - stderr - +2025-02-06 03:01:32 - INFO - stdout - {'loss': 0.3897, 'grad_norm': 1.6803432703018188, 'learning_rate': 1.0050389171604523e-06, 'epoch': 2.58} +2025-02-06 03:01:32 - ERROR - stderr - 86%|████████▌ | 19302/22434 [16:53:52<2:18:03, 2.64s/it] +2025-02-06 03:01:43 - ERROR - stderr - 86%|████████▌ | 19303/22434 [16:54:03<4:34:18, 5.26s/it] +2025-02-06 03:01:43 - ERROR - stderr - +2025-02-06 03:01:43 - ERROR - stderr - +2025-02-06 03:01:43 - INFO - stdout - {'loss': 0.3454, 'grad_norm': 1.6410739421844482, 'learning_rate': 1.004408196469888e-06, 'epoch': 2.58} +2025-02-06 03:01:43 - ERROR - stderr - 86%|████████▌ | 19303/22434 [16:54:03<4:34:18, 5.26s/it] +2025-02-06 03:01:56 - ERROR - stderr - 86%|████████▌ | 19304/22434 [16:54:16<6:32:57, 7.53s/it] +2025-02-06 03:01:56 - ERROR - 
stderr - +2025-02-06 03:01:56 - ERROR - stderr - +2025-02-06 03:01:56 - INFO - stdout - {'loss': 0.3476, 'grad_norm': 1.6088812351226807, 'learning_rate': 1.003777663283828e-06, 'epoch': 2.58} +2025-02-06 03:01:56 - ERROR - stderr - 86%|████████▌ | 19304/22434 [16:54:16<6:32:57, 7.53s/it] +2025-02-06 03:02:11 - ERROR - stderr - 86%|████████▌ | 19305/22434 [16:54:31<8:22:22, 9.63s/it] +2025-02-06 03:02:11 - ERROR - stderr - +2025-02-06 03:02:11 - ERROR - stderr - +2025-02-06 03:02:11 - INFO - stdout - {'loss': 0.3537, 'grad_norm': 1.6194511651992798, 'learning_rate': 1.0031473176154139e-06, 'epoch': 2.58} +2025-02-06 03:02:11 - ERROR - stderr - 86%|████████▌ | 19305/22434 [16:54:31<8:22:22, 9.63s/it] +2025-02-06 03:02:18 - ERROR - stderr - 86%|████████▌ | 19306/22434 [16:54:38<7:49:56, 9.01s/it] +2025-02-06 03:02:18 - ERROR - stderr - +2025-02-06 03:02:18 - ERROR - stderr - +2025-02-06 03:02:18 - INFO - stdout - {'loss': 0.3873, 'grad_norm': 1.6188567876815796, 'learning_rate': 1.0025171594777872e-06, 'epoch': 2.58} +2025-02-06 03:02:18 - ERROR - stderr - 86%|████████▌ | 19306/22434 [16:54:38<7:49:56, 9.01s/it] +2025-02-06 03:02:21 - ERROR - stderr - 86%|████████▌ | 19307/22434 [16:54:41<6:07:46, 7.06s/it] +2025-02-06 03:02:21 - ERROR - stderr - +2025-02-06 03:02:21 - ERROR - stderr - +2025-02-06 03:02:21 - INFO - stdout - {'loss': 0.3423, 'grad_norm': 1.6760709285736084, 'learning_rate': 1.0018871888840764e-06, 'epoch': 2.58} +2025-02-06 03:02:21 - ERROR - stderr - 86%|████████▌ | 19307/22434 [16:54:41<6:07:46, 7.06s/it] +2025-02-06 03:02:26 - ERROR - stderr - 86%|████████▌ | 19308/22434 [16:54:46<5:45:25, 6.63s/it] +2025-02-06 03:02:26 - ERROR - stderr - +2025-02-06 03:02:26 - ERROR - stderr - +2025-02-06 03:02:26 - INFO - stdout - {'loss': 0.3835, 'grad_norm': 1.5439616441726685, 'learning_rate': 1.001257405847419e-06, 'epoch': 2.58} +2025-02-06 03:02:26 - ERROR - stderr - 86%|████████▌ | 19308/22434 [16:54:46<5:45:25, 6.63s/it] +2025-02-06 03:02:36 - ERROR - 
stderr - 86%|████████▌ | 19309/22434 [16:54:56<6:31:01, 7.51s/it] +2025-02-06 03:02:36 - ERROR - stderr - +2025-02-06 03:02:36 - ERROR - stderr - +2025-02-06 03:02:36 - INFO - stdout - {'loss': 0.3871, 'grad_norm': 1.667294979095459, 'learning_rate': 1.0006278103809409e-06, 'epoch': 2.58} +2025-02-06 03:02:36 - ERROR - stderr - 86%|████████▌ | 19309/22434 [16:54:56<6:31:01, 7.51s/it] +2025-02-06 03:02:42 - ERROR - stderr - 86%|████████▌ | 19310/22434 [16:55:01<5:59:30, 6.90s/it] +2025-02-06 03:02:42 - ERROR - stderr - +2025-02-06 03:02:42 - ERROR - stderr - +2025-02-06 03:02:42 - INFO - stdout - {'loss': 0.335, 'grad_norm': 1.3349734544754028, 'learning_rate': 9.999984024977626e-07, 'epoch': 2.58} +2025-02-06 03:02:42 - ERROR - stderr - 86%|████████▌ | 19310/22434 [16:55:01<5:59:30, 6.90s/it] +2025-02-06 03:02:48 - ERROR - stderr - 86%|████████▌ | 19311/22434 [16:55:08<5:50:42, 6.74s/it] +2025-02-06 03:02:48 - ERROR - stderr - +2025-02-06 03:02:48 - ERROR - stderr - +2025-02-06 03:02:48 - INFO - stdout - {'loss': 0.3626, 'grad_norm': 1.5528465509414673, 'learning_rate': 9.993691822110096e-07, 'epoch': 2.58} +2025-02-06 03:02:48 - ERROR - stderr - 86%|████████▌ | 19311/22434 [16:55:08<5:50:42, 6.74s/it] +2025-02-06 03:03:11 - ERROR - stderr - 86%|████████▌ | 19312/22434 [16:55:31<10:05:50, 11.64s/it] +2025-02-06 03:03:11 - ERROR - stderr - +2025-02-06 03:03:11 - ERROR - stderr - +2025-02-06 03:03:11 - INFO - stdout - {'loss': 0.3486, 'grad_norm': 1.52981698513031, 'learning_rate': 9.987401495337878e-07, 'epoch': 2.58} +2025-02-06 03:03:11 - ERROR - stderr - 86%|████████▌ | 19312/22434 [16:55:31<10:05:50, 11.64s/it] +2025-02-06 03:03:32 - ERROR - stderr - 86%|████████▌ | 19313/22434 [16:55:52<12:36:30, 14.54s/it] +2025-02-06 03:03:32 - ERROR - stderr - +2025-02-06 03:03:32 - ERROR - stderr - +2025-02-06 03:03:32 - INFO - stdout - {'loss': 0.3619, 'grad_norm': 1.6052106618881226, 'learning_rate': 9.98111304479219e-07, 'epoch': 2.58} +2025-02-06 03:03:32 - ERROR - 
stderr - 86%|████████▌ | 19313/22434 [16:55:52<12:36:30, 14.54s/it] +2025-02-06 03:03:35 - ERROR - stderr - 86%|████████▌ | 19314/22434 [16:55:55<9:28:29, 10.93s/it] +2025-02-06 03:03:35 - ERROR - stderr - +2025-02-06 03:03:35 - ERROR - stderr - +2025-02-06 03:03:35 - INFO - stdout - {'loss': 0.321, 'grad_norm': 1.4527983665466309, 'learning_rate': 9.97482647060405e-07, 'epoch': 2.58} +2025-02-06 03:03:35 - ERROR - stderr - 86%|████████▌ | 19314/22434 [16:55:55<9:28:29, 10.93s/it] +2025-02-06 03:03:43 - ERROR - stderr - 86%|████████▌ | 19315/22434 [16:56:03<8:47:49, 10.15s/it] +2025-02-06 03:03:43 - ERROR - stderr - +2025-02-06 03:03:43 - ERROR - stderr - +2025-02-06 03:03:43 - INFO - stdout - {'loss': 0.3696, 'grad_norm': 1.5236835479736328, 'learning_rate': 9.968541772904472e-07, 'epoch': 2.58} +2025-02-06 03:03:43 - ERROR - stderr - 86%|████████▌ | 19315/22434 [16:56:03<8:47:49, 10.15s/it] +2025-02-06 03:04:14 - ERROR - stderr - 86%|████████▌ | 19316/22434 [16:56:34<14:15:58, 16.47s/it] +2025-02-06 03:04:14 - ERROR - stderr - +2025-02-06 03:04:14 - ERROR - stderr - +2025-02-06 03:04:14 - INFO - stdout - {'loss': 0.4027, 'grad_norm': 1.6799372434616089, 'learning_rate': 9.962258951824544e-07, 'epoch': 2.58} +2025-02-06 03:04:14 - ERROR - stderr - 86%|████████▌ | 19316/22434 [16:56:34<14:15:58, 16.47s/it] +2025-02-06 03:04:40 - ERROR - stderr - 86%|████████▌ | 19317/22434 [16:57:00<16:45:05, 19.35s/it] +2025-02-06 03:04:40 - ERROR - stderr - +2025-02-06 03:04:40 - ERROR - stderr - +2025-02-06 03:04:40 - INFO - stdout - {'loss': 0.3681, 'grad_norm': 1.6325856447219849, 'learning_rate': 9.955978007495116e-07, 'epoch': 2.58} +2025-02-06 03:04:40 - ERROR - stderr - 86%|████████▌ | 19317/22434 [16:57:00<16:45:05, 19.35s/it] +2025-02-06 03:04:43 - ERROR - stderr - 86%|████████▌ | 19318/22434 [16:57:03<12:24:03, 14.33s/it] +2025-02-06 03:04:43 - ERROR - stderr - +2025-02-06 03:04:43 - ERROR - stderr - +2025-02-06 03:04:43 - INFO - stdout - {'loss': 0.3474, 'grad_norm': 
1.4206188917160034, 'learning_rate': 9.949698940047214e-07, 'epoch': 2.58} +2025-02-06 03:04:43 - ERROR - stderr - 86%|████████▌ | 19318/22434 [16:57:03<12:24:03, 14.33s/it] +2025-02-06 03:05:25 - ERROR - stderr - 86%|████████▌ | 19319/22434 [16:57:44<19:30:31, 22.55s/it] +2025-02-06 03:05:25 - ERROR - stderr - +2025-02-06 03:05:25 - ERROR - stderr - +2025-02-06 03:05:25 - INFO - stdout - {'loss': 0.3929, 'grad_norm': 1.6846051216125488, 'learning_rate': 9.943421749611648e-07, 'epoch': 2.58} +2025-02-06 03:05:25 - ERROR - stderr - 86%|████████▌ | 19319/22434 [16:57:45<19:30:31, 22.55s/it] +2025-02-06 03:05:49 - ERROR - stderr - 86%|████████▌ | 19320/22434 [16:58:09<20:04:31, 23.21s/it] +2025-02-06 03:05:50 - ERROR - stderr - +2025-02-06 03:05:50 - ERROR - stderr - +2025-02-06 03:05:50 - INFO - stdout - {'loss': 0.3389, 'grad_norm': 1.3665893077850342, 'learning_rate': 9.937146436319278e-07, 'epoch': 2.58} +2025-02-06 03:05:50 - ERROR - stderr - 86%|████████▌ | 19320/22434 [16:58:09<20:04:31, 23.21s/it] +2025-02-06 03:06:00 - ERROR - stderr - 86%|████████▌ | 19321/22434 [16:58:20<16:52:21, 19.51s/it] +2025-02-06 03:06:00 - ERROR - stderr - +2025-02-06 03:06:00 - ERROR - stderr - +2025-02-06 03:06:00 - INFO - stdout - {'loss': 0.386, 'grad_norm': 1.806023120880127, 'learning_rate': 9.930873000300912e-07, 'epoch': 2.58} +2025-02-06 03:06:00 - ERROR - stderr - 86%|████████▌ | 19321/22434 [16:58:20<16:52:21, 19.51s/it] +2025-02-06 03:06:52 - ERROR - stderr - 86%|████████▌ | 19322/22434 [16:59:12<25:09:35, 29.11s/it] +2025-02-06 03:06:52 - ERROR - stderr - +2025-02-06 03:06:52 - ERROR - stderr - +2025-02-06 03:06:52 - INFO - stdout - {'loss': 0.3457, 'grad_norm': 1.488228440284729, 'learning_rate': 9.92460144168731e-07, 'epoch': 2.58} +2025-02-06 03:06:52 - ERROR - stderr - 86%|████████▌ | 19322/22434 [16:59:12<25:09:35, 29.11s/it] +2025-02-06 03:06:54 - ERROR - stderr - 86%|████████▌ | 19323/22434 [16:59:14<18:14:28, 21.11s/it] +2025-02-06 03:06:54 - ERROR - stderr - 
+2025-02-06 03:06:54 - ERROR - stderr - +2025-02-06 03:06:54 - INFO - stdout - {'loss': 0.3231, 'grad_norm': 1.3787744045257568, 'learning_rate': 9.918331760609201e-07, 'epoch': 2.58} +2025-02-06 03:06:54 - ERROR - stderr - 86%|████████▌ | 19323/22434 [16:59:14<18:14:28, 21.11s/it] +2025-02-06 03:07:32 - ERROR - stderr - 86%|████████▌ | 19324/22434 [16:59:51<22:26:39, 25.98s/it] +2025-02-06 03:07:32 - ERROR - stderr - +2025-02-06 03:07:32 - ERROR - stderr - +2025-02-06 03:07:32 - INFO - stdout - {'loss': 0.3666, 'grad_norm': 1.5200321674346924, 'learning_rate': 9.91206395719726e-07, 'epoch': 2.58} +2025-02-06 03:07:32 - ERROR - stderr - 86%|████████▌ | 19324/22434 [16:59:51<22:26:39, 25.98s/it] +2025-02-06 03:08:11 - ERROR - stderr - 86%|████████▌ | 19325/22434 [17:00:31<25:58:56, 30.09s/it] +2025-02-06 03:08:11 - ERROR - stderr - +2025-02-06 03:08:11 - ERROR - stderr - +2025-02-06 03:08:11 - INFO - stdout - {'loss': 0.3662, 'grad_norm': 1.4724156856536865, 'learning_rate': 9.905798031582147e-07, 'epoch': 2.58} +2025-02-06 03:08:11 - ERROR - stderr - 86%|████████▌ | 19325/22434 [17:00:31<25:58:56, 30.09s/it] +2025-02-06 03:08:31 - ERROR - stderr - 86%|████████▌ | 19326/22434 [17:00:51<23:17:48, 26.98s/it] +2025-02-06 03:08:31 - ERROR - stderr - +2025-02-06 03:08:31 - ERROR - stderr - +2025-02-06 03:08:31 - INFO - stdout - {'loss': 0.3828, 'grad_norm': 1.596299171447754, 'learning_rate': 9.89953398389447e-07, 'epoch': 2.58} +2025-02-06 03:08:31 - ERROR - stderr - 86%|████████▌ | 19326/22434 [17:00:51<23:17:48, 26.98s/it] +2025-02-06 03:08:54 - ERROR - stderr - 86%|████████▌ | 19327/22434 [17:01:14<22:21:57, 25.91s/it] +2025-02-06 03:08:55 - ERROR - stderr - +2025-02-06 03:08:55 - ERROR - stderr - +2025-02-06 03:08:55 - INFO - stdout - {'loss': 0.3788, 'grad_norm': 1.5065652132034302, 'learning_rate': 9.893271814264781e-07, 'epoch': 2.58} +2025-02-06 03:08:55 - ERROR - stderr - 86%|████████▌ | 19327/22434 [17:01:14<22:21:57, 25.91s/it] +2025-02-06 03:09:46 - ERROR - 
stderr - 86%|████████▌ | 19328/22434 [17:02:05<28:54:34, 33.51s/it] +2025-02-06 03:09:46 - ERROR - stderr - +2025-02-06 03:09:46 - ERROR - stderr - +2025-02-06 03:09:46 - INFO - stdout - {'loss': 0.3616, 'grad_norm': 1.5778130292892456, 'learning_rate': 9.88701152282362e-07, 'epoch': 2.58} +2025-02-06 03:09:46 - ERROR - stderr - 86%|████████▌ | 19328/22434 [17:02:06<28:54:34, 33.51s/it] +2025-02-06 03:10:34 - ERROR - stderr - 86%|████████▌ | 19329/22434 [17:02:54<32:42:08, 37.92s/it] +2025-02-06 03:10:34 - ERROR - stderr - +2025-02-06 03:10:34 - ERROR - stderr - +2025-02-06 03:10:34 - INFO - stdout - {'loss': 0.3227, 'grad_norm': 1.4566751718521118, 'learning_rate': 9.88075310970148e-07, 'epoch': 2.58} +2025-02-06 03:10:34 - ERROR - stderr - 86%|████████▌ | 19329/22434 [17:02:54<32:42:08, 37.92s/it] +2025-02-06 03:11:21 - ERROR - stderr - 86%|████████▌ | 19330/22434 [17:03:41<35:06:29, 40.72s/it] +2025-02-06 03:11:21 - ERROR - stderr - +2025-02-06 03:11:21 - ERROR - stderr - +2025-02-06 03:11:21 - INFO - stdout - {'loss': 0.3297, 'grad_norm': 1.460789680480957, 'learning_rate': 9.874496575028814e-07, 'epoch': 2.58} +2025-02-06 03:11:21 - ERROR - stderr - 86%|████████▌ | 19330/22434 [17:03:41<35:06:29, 40.72s/it] +2025-02-06 03:11:34 - ERROR - stderr - 86%|████████▌ | 19331/22434 [17:03:54<27:57:58, 32.45s/it] +2025-02-06 03:11:34 - ERROR - stderr - +2025-02-06 03:11:34 - ERROR - stderr - +2025-02-06 03:11:34 - INFO - stdout - {'loss': 0.3383, 'grad_norm': 1.5926569700241089, 'learning_rate': 9.868241918935994e-07, 'epoch': 2.59} +2025-02-06 03:11:34 - ERROR - stderr - 86%|████████▌ | 19331/22434 [17:03:54<27:57:58, 32.45s/it] +2025-02-06 03:12:19 - ERROR - stderr - 86%|████████▌ | 19332/22434 [17:04:39<31:14:57, 36.27s/it] +2025-02-06 03:12:20 - ERROR - stderr - +2025-02-06 03:12:20 - ERROR - stderr - +2025-02-06 03:12:20 - INFO - stdout - {'loss': 0.3473, 'grad_norm': 1.6023719310760498, 'learning_rate': 9.861989141553463e-07, 'epoch': 2.59} +2025-02-06 03:12:20 - 
ERROR - stderr - 86%|████████▌ | 19332/22434 [17:04:39<31:14:57, 36.27s/it] +2025-02-06 03:12:22 - ERROR - stderr - 86%|████████▌ | 19333/22434 [17:04:42<22:30:02, 26.12s/it] +2025-02-06 03:12:22 - ERROR - stderr - +2025-02-06 03:12:22 - ERROR - stderr - +2025-02-06 03:12:22 - INFO - stdout - {'loss': 0.3936, 'grad_norm': 1.5605714321136475, 'learning_rate': 9.855738243011482e-07, 'epoch': 2.59} +2025-02-06 03:12:22 - ERROR - stderr - 86%|████████▌ | 19333/22434 [17:04:42<22:30:02, 26.12s/it] +2025-02-06 03:12:28 - ERROR - stderr - 86%|████████▌ | 19334/22434 [17:04:48<17:18:39, 20.10s/it] +2025-02-06 03:12:28 - ERROR - stderr - +2025-02-06 03:12:28 - ERROR - stderr - +2025-02-06 03:12:28 - INFO - stdout - {'loss': 0.3455, 'grad_norm': 1.5475319623947144, 'learning_rate': 9.849489223440401e-07, 'epoch': 2.59} +2025-02-06 03:12:28 - ERROR - stderr - 86%|████████▌ | 19334/22434 [17:04:48<17:18:39, 20.10s/it] +2025-02-06 03:13:10 - ERROR - stderr - 86%|████████▌ | 19335/22434 [17:05:29<22:50:40, 26.54s/it] +2025-02-06 03:13:10 - ERROR - stderr - +2025-02-06 03:13:10 - ERROR - stderr - +2025-02-06 03:13:10 - INFO - stdout - {'loss': 0.355, 'grad_norm': 1.5588434934616089, 'learning_rate': 9.843242082970462e-07, 'epoch': 2.59} +2025-02-06 03:13:10 - ERROR - stderr - 86%|████████▌ | 19335/22434 [17:05:29<22:50:40, 26.54s/it] +2025-02-06 03:13:15 - ERROR - stderr - 86%|████████▌ | 19336/22434 [17:05:35<17:30:21, 20.34s/it] +2025-02-06 03:13:15 - ERROR - stderr - +2025-02-06 03:13:15 - ERROR - stderr - +2025-02-06 03:13:15 - INFO - stdout - {'loss': 0.3517, 'grad_norm': 1.3440344333648682, 'learning_rate': 9.836996821731836e-07, 'epoch': 2.59} +2025-02-06 03:13:15 - ERROR - stderr - 86%|████████▌ | 19336/22434 [17:05:35<17:30:21, 20.34s/it] +2025-02-06 03:13:18 - ERROR - stderr - 86%|████████▌ | 19337/22434 [17:05:38<12:55:05, 15.02s/it] +2025-02-06 03:13:18 - ERROR - stderr - +2025-02-06 03:13:18 - ERROR - stderr - +2025-02-06 03:13:18 - INFO - stdout - {'loss': 0.3815, 
'grad_norm': 1.6707100868225098, 'learning_rate': 9.830753439854769e-07, 'epoch': 2.59} +2025-02-06 03:13:18 - ERROR - stderr - 86%|████████▌ | 19337/22434 [17:05:38<12:55:05, 15.02s/it] +2025-02-06 03:14:00 - ERROR - stderr - 86%|████████▌ | 19338/22434 [17:06:20<19:56:45, 23.19s/it] +2025-02-06 03:14:00 - ERROR - stderr - +2025-02-06 03:14:00 - ERROR - stderr - +2025-02-06 03:14:00 - INFO - stdout - {'loss': 0.4004, 'grad_norm': 1.7597123384475708, 'learning_rate': 9.82451193746935e-07, 'epoch': 2.59} +2025-02-06 03:14:00 - ERROR - stderr - 86%|████████▌ | 19338/22434 [17:06:20<19:56:45, 23.19s/it] +2025-02-06 03:14:08 - ERROR - stderr - 86%|████████▌ | 19339/22434 [17:06:28<15:56:25, 18.54s/it] +2025-02-06 03:14:08 - ERROR - stderr - +2025-02-06 03:14:08 - ERROR - stderr - +2025-02-06 03:14:08 - INFO - stdout - {'loss': 0.4127, 'grad_norm': 1.7298433780670166, 'learning_rate': 9.81827231470569e-07, 'epoch': 2.59} +2025-02-06 03:14:08 - ERROR - stderr - 86%|████████▌ | 19339/22434 [17:06:28<15:56:25, 18.54s/it] +2025-02-06 03:14:10 - ERROR - stderr - 86%|████████▌ | 19340/22434 [17:06:30<11:47:44, 13.72s/it] +2025-02-06 03:14:11 - ERROR - stderr - +2025-02-06 03:14:11 - ERROR - stderr - +2025-02-06 03:14:11 - INFO - stdout - {'loss': 0.3644, 'grad_norm': 1.783390760421753, 'learning_rate': 9.812034571693841e-07, 'epoch': 2.59} +2025-02-06 03:14:11 - ERROR - stderr - 86%|████████▌ | 19340/22434 [17:06:30<11:47:44, 13.72s/it] +2025-02-06 03:14:52 - ERROR - stderr - 86%|████████▌ | 19341/22434 [17:07:12<18:53:43, 21.99s/it] +2025-02-06 03:14:52 - ERROR - stderr - +2025-02-06 03:14:52 - ERROR - stderr - +2025-02-06 03:14:52 - INFO - stdout - {'loss': 0.3638, 'grad_norm': 1.4775928258895874, 'learning_rate': 9.80579870856384e-07, 'epoch': 2.59} +2025-02-06 03:14:52 - ERROR - stderr - 86%|████████▌ | 19341/22434 [17:07:12<18:53:43, 21.99s/it] +2025-02-06 03:14:54 - ERROR - stderr - 86%|████████▌ | 19342/22434 [17:07:14<13:55:42, 16.22s/it] +2025-02-06 03:14:55 - ERROR 
- stderr - +2025-02-06 03:14:55 - ERROR - stderr - +2025-02-06 03:14:55 - INFO - stdout - {'loss': 0.345, 'grad_norm': 1.5533064603805542, 'learning_rate': 9.799564725445653e-07, 'epoch': 2.59} +2025-02-06 03:14:55 - ERROR - stderr - 86%|████████▌ | 19342/22434 [17:07:14<13:55:42, 16.22s/it] +2025-02-06 03:15:27 - ERROR - stderr - 86%|████████▌ | 19343/22434 [17:07:47<18:11:23, 21.19s/it] +2025-02-06 03:15:27 - ERROR - stderr - +2025-02-06 03:15:27 - ERROR - stderr - +2025-02-06 03:15:27 - INFO - stdout - {'loss': 0.3805, 'grad_norm': 1.5192970037460327, 'learning_rate': 9.79333262246923e-07, 'epoch': 2.59} +2025-02-06 03:15:27 - ERROR - stderr - 86%|████████▌ | 19343/22434 [17:07:47<18:11:23, 21.19s/it] +2025-02-06 03:15:57 - ERROR - stderr - 86%|████████▌ | 19344/22434 [17:08:17<20:21:23, 23.72s/it] +2025-02-06 03:15:57 - ERROR - stderr - +2025-02-06 03:15:57 - ERROR - stderr - +2025-02-06 03:15:57 - INFO - stdout - {'loss': 0.3112, 'grad_norm': 1.4900189638137817, 'learning_rate': 9.787102399764482e-07, 'epoch': 2.59} +2025-02-06 03:15:57 - ERROR - stderr - 86%|████████▌ | 19344/22434 [17:08:17<20:21:23, 23.72s/it] +2025-02-06 03:15:59 - ERROR - stderr - 86%|████████▌ | 19345/22434 [17:08:19<14:54:27, 17.37s/it] +2025-02-06 03:16:00 - ERROR - stderr - +2025-02-06 03:16:00 - ERROR - stderr - +2025-02-06 03:16:00 - INFO - stdout - {'loss': 0.3452, 'grad_norm': 1.4583711624145508, 'learning_rate': 9.780874057461242e-07, 'epoch': 2.59} +2025-02-06 03:16:00 - ERROR - stderr - 86%|████████▌ | 19345/22434 [17:08:19<14:54:27, 17.37s/it] +2025-02-06 03:16:35 - ERROR - stderr - 86%|████████▌ | 19346/22434 [17:08:54<19:28:57, 22.71s/it] +2025-02-06 03:16:35 - ERROR - stderr - +2025-02-06 03:16:35 - ERROR - stderr - +2025-02-06 03:16:35 - INFO - stdout - {'loss': 0.4001, 'grad_norm': 1.6585460901260376, 'learning_rate': 9.774647595689356e-07, 'epoch': 2.59} +2025-02-06 03:16:35 - ERROR - stderr - 86%|████████▌ | 19346/22434 [17:08:54<19:28:57, 22.71s/it] +2025-02-06 
03:16:37 - ERROR - stderr - 86%|████████▌ | 19347/22434 [17:08:57<14:17:34, 16.67s/it] +2025-02-06 03:16:37 - ERROR - stderr - +2025-02-06 03:16:37 - ERROR - stderr - +2025-02-06 03:16:37 - INFO - stdout - {'loss': 0.3505, 'grad_norm': 1.377306580543518, 'learning_rate': 9.76842301457861e-07, 'epoch': 2.59} +2025-02-06 03:16:37 - ERROR - stderr - 86%|████████▌ | 19347/22434 [17:08:57<14:17:34, 16.67s/it] +2025-02-06 03:16:40 - ERROR - stderr - 86%|████████▌ | 19348/22434 [17:09:00<10:39:58, 12.44s/it] +2025-02-06 03:16:40 - ERROR - stderr - +2025-02-06 03:16:40 - ERROR - stderr - +2025-02-06 03:16:40 - INFO - stdout - {'loss': 0.3332, 'grad_norm': 1.5943844318389893, 'learning_rate': 9.76220031425874e-07, 'epoch': 2.59} +2025-02-06 03:16:40 - ERROR - stderr - 86%|████████▌ | 19348/22434 [17:09:00<10:39:58, 12.44s/it] +2025-02-06 03:16:42 - ERROR - stderr - 86%|████████▌ | 19349/22434 [17:09:02<8:05:36, 9.44s/it] +2025-02-06 03:16:42 - ERROR - stderr - +2025-02-06 03:16:42 - ERROR - stderr - +2025-02-06 03:16:42 - INFO - stdout - {'loss': 0.3475, 'grad_norm': 1.5348008871078491, 'learning_rate': 9.755979494859459e-07, 'epoch': 2.59} +2025-02-06 03:16:42 - ERROR - stderr - 86%|████████▌ | 19349/22434 [17:09:02<8:05:36, 9.44s/it] +2025-02-06 03:17:03 - ERROR - stderr - 86%|████████▋ | 19350/22434 [17:09:23<11:02:46, 12.89s/it] +2025-02-06 03:17:03 - ERROR - stderr - +2025-02-06 03:17:03 - ERROR - stderr - +2025-02-06 03:17:03 - INFO - stdout - {'loss': 0.3166, 'grad_norm': 1.4306881427764893, 'learning_rate': 9.749760556510435e-07, 'epoch': 2.59} +2025-02-06 03:17:03 - ERROR - stderr - 86%|████████▋ | 19350/22434 [17:09:23<11:02:46, 12.89s/it] +2025-02-06 03:17:34 - ERROR - stderr - 86%|████████▋ | 19351/22434 [17:09:53<15:34:18, 18.18s/it] +2025-02-06 03:17:34 - ERROR - stderr - +2025-02-06 03:17:34 - ERROR - stderr - +2025-02-06 03:17:34 - INFO - stdout - {'loss': 0.414, 'grad_norm': 1.9345148801803589, 'learning_rate': 9.743543499341302e-07, 'epoch': 2.59} 
+2025-02-06 03:17:34 - ERROR - stderr - 86%|████████▋ | 19351/22434 [17:09:54<15:34:18, 18.18s/it] +2025-02-06 03:18:05 - ERROR - stderr - 86%|████████▋ | 19352/22434 [17:10:24<18:50:54, 22.02s/it] +2025-02-06 03:18:05 - ERROR - stderr - +2025-02-06 03:18:05 - ERROR - stderr - +2025-02-06 03:18:05 - INFO - stdout - {'loss': 0.3665, 'grad_norm': 1.6305358409881592, 'learning_rate': 9.7373283234816e-07, 'epoch': 2.59} +2025-02-06 03:18:05 - ERROR - stderr - 86%|████████▋ | 19352/22434 [17:10:24<18:50:54, 22.02s/it] +2025-02-06 03:18:39 - ERROR - stderr - 86%|████████▋ | 19353/22434 [17:10:59<22:00:03, 25.71s/it] +2025-02-06 03:18:39 - ERROR - stderr - +2025-02-06 03:18:39 - ERROR - stderr - +2025-02-06 03:18:39 - INFO - stdout - {'loss': 0.3513, 'grad_norm': 1.6043014526367188, 'learning_rate': 9.731115029060945e-07, 'epoch': 2.59} +2025-02-06 03:18:39 - ERROR - stderr - 86%|████████▋ | 19353/22434 [17:10:59<22:00:03, 25.71s/it] +2025-02-06 03:18:54 - ERROR - stderr - 86%|████████▋ | 19354/22434 [17:11:14<19:14:33, 22.49s/it] +2025-02-06 03:18:54 - ERROR - stderr - +2025-02-06 03:18:54 - ERROR - stderr - +2025-02-06 03:18:54 - INFO - stdout - {'loss': 0.36, 'grad_norm': 1.5132147073745728, 'learning_rate': 9.724903616208837e-07, 'epoch': 2.59} +2025-02-06 03:18:54 - ERROR - stderr - 86%|████████▋ | 19354/22434 [17:11:14<19:14:33, 22.49s/it] +2025-02-06 03:19:09 - ERROR - stderr - 86%|████████▋ | 19355/22434 [17:11:29<17:22:10, 20.31s/it] +2025-02-06 03:19:09 - ERROR - stderr - +2025-02-06 03:19:09 - ERROR - stderr - +2025-02-06 03:19:09 - INFO - stdout - {'loss': 0.4046, 'grad_norm': 1.6710190773010254, 'learning_rate': 9.718694085054681e-07, 'epoch': 2.59} +2025-02-06 03:19:09 - ERROR - stderr - 86%|████████▋ | 19355/22434 [17:11:29<17:22:10, 20.31s/it] +2025-02-06 03:19:12 - ERROR - stderr - 86%|████████▋ | 19356/22434 [17:11:31<12:46:41, 14.95s/it] +2025-02-06 03:19:12 - ERROR - stderr - +2025-02-06 03:19:12 - ERROR - stderr - +2025-02-06 03:19:12 - INFO - stdout 
- {'loss': 0.3274, 'grad_norm': 1.4611085653305054, 'learning_rate': 9.712486435728008e-07, 'epoch': 2.59} +2025-02-06 03:19:12 - ERROR - stderr - 86%|████████▋ | 19356/22434 [17:11:31<12:46:41, 14.95s/it] +2025-02-06 03:19:41 - ERROR - stderr - 86%|████████▋ | 19357/22434 [17:12:01<16:35:58, 19.42s/it] +2025-02-06 03:19:42 - ERROR - stderr - +2025-02-06 03:19:42 - ERROR - stderr - +2025-02-06 03:19:42 - INFO - stdout - {'loss': 0.3953, 'grad_norm': 1.7424030303955078, 'learning_rate': 9.706280668358115e-07, 'epoch': 2.59} +2025-02-06 03:19:42 - ERROR - stderr - 86%|████████▋ | 19357/22434 [17:12:01<16:35:58, 19.42s/it] +2025-02-06 03:20:03 - ERROR - stderr - 86%|████████▋ | 19358/22434 [17:12:23<17:05:16, 20.00s/it] +2025-02-06 03:20:03 - ERROR - stderr - +2025-02-06 03:20:03 - ERROR - stderr - +2025-02-06 03:20:03 - INFO - stdout - {'loss': 0.3742, 'grad_norm': 1.6497619152069092, 'learning_rate': 9.70007678307443e-07, 'epoch': 2.59} +2025-02-06 03:20:03 - ERROR - stderr - 86%|████████▋ | 19358/22434 [17:12:23<17:05:16, 20.00s/it] +2025-02-06 03:20:05 - ERROR - stderr - 86%|████████▋ | 19359/22434 [17:12:25<12:37:10, 14.77s/it] +2025-02-06 03:20:05 - ERROR - stderr - +2025-02-06 03:20:05 - ERROR - stderr - +2025-02-06 03:20:05 - INFO - stdout - {'loss': 0.3919, 'grad_norm': 1.6942094564437866, 'learning_rate': 9.693874780006229e-07, 'epoch': 2.59} +2025-02-06 03:20:05 - ERROR - stderr - 86%|████████▋ | 19359/22434 [17:12:25<12:37:10, 14.77s/it] +2025-02-06 03:20:08 - ERROR - stderr - 86%|████████▋ | 19360/22434 [17:12:28<9:27:54, 11.08s/it] +2025-02-06 03:20:08 - ERROR - stderr - +2025-02-06 03:20:08 - ERROR - stderr - +2025-02-06 03:20:08 - INFO - stdout - {'loss': 0.3457, 'grad_norm': 1.5680617094039917, 'learning_rate': 9.687674659282797e-07, 'epoch': 2.59} +2025-02-06 03:20:08 - ERROR - stderr - 86%|████████▋ | 19360/22434 [17:12:28<9:27:54, 11.08s/it] +2025-02-06 03:20:27 - ERROR - stderr - 86%|████████▋ | 19361/22434 [17:12:46<11:25:48, 13.39s/it] 
+2025-02-06 03:20:27 - ERROR - stderr - +2025-02-06 03:20:27 - ERROR - stderr - +2025-02-06 03:20:27 - INFO - stdout - {'loss': 0.4049, 'grad_norm': 1.6897770166397095, 'learning_rate': 9.681476421033354e-07, 'epoch': 2.59} +2025-02-06 03:20:27 - ERROR - stderr - 86%|████████▋ | 19361/22434 [17:12:46<11:25:48, 13.39s/it] +2025-02-06 03:20:29 - ERROR - stderr - 86%|████████▋ | 19362/22434 [17:12:49<8:38:17, 10.12s/it] +2025-02-06 03:20:29 - ERROR - stderr - +2025-02-06 03:20:29 - ERROR - stderr - +2025-02-06 03:20:29 - INFO - stdout - {'loss': 0.3809, 'grad_norm': 1.5389869213104248, 'learning_rate': 9.675280065387117e-07, 'epoch': 2.59} +2025-02-06 03:20:29 - ERROR - stderr - 86%|████████▋ | 19362/22434 [17:12:49<8:38:17, 10.12s/it] +2025-02-06 03:20:44 - ERROR - stderr - 86%|████████▋ | 19363/22434 [17:13:04<9:46:52, 11.47s/it] +2025-02-06 03:20:44 - ERROR - stderr - +2025-02-06 03:20:44 - ERROR - stderr - +2025-02-06 03:20:44 - INFO - stdout - {'loss': 0.349, 'grad_norm': 1.4350450038909912, 'learning_rate': 9.669085592473237e-07, 'epoch': 2.59} +2025-02-06 03:20:44 - ERROR - stderr - 86%|████████▋ | 19363/22434 [17:13:04<9:46:52, 11.47s/it] +2025-02-06 03:20:46 - ERROR - stderr - 86%|████████▋ | 19364/22434 [17:13:06<7:29:16, 8.78s/it] +2025-02-06 03:20:46 - ERROR - stderr - +2025-02-06 03:20:46 - ERROR - stderr - +2025-02-06 03:20:46 - INFO - stdout - {'loss': 0.4225, 'grad_norm': 1.6546533107757568, 'learning_rate': 9.662893002420836e-07, 'epoch': 2.59} +2025-02-06 03:20:46 - ERROR - stderr - 86%|████████▋ | 19364/22434 [17:13:06<7:29:16, 8.78s/it] +2025-02-06 03:21:20 - ERROR - stderr - 86%|████████▋ | 19365/22434 [17:13:40<13:54:10, 16.31s/it] +2025-02-06 03:21:20 - ERROR - stderr - +2025-02-06 03:21:20 - ERROR - stderr - +2025-02-06 03:21:20 - INFO - stdout - {'loss': 0.3673, 'grad_norm': 1.5399622917175293, 'learning_rate': 9.656702295358977e-07, 'epoch': 2.59} +2025-02-06 03:21:20 - ERROR - stderr - 86%|████████▋ | 19365/22434 [17:13:40<13:54:10, 
16.31s/it] +2025-02-06 03:21:30 - ERROR - stderr - 86%|████████▋ | 19366/22434 [17:13:49<12:07:54, 14.24s/it] +2025-02-06 03:21:30 - ERROR - stderr - +2025-02-06 03:21:30 - ERROR - stderr - +2025-02-06 03:21:30 - INFO - stdout - {'loss': 0.4059, 'grad_norm': 1.607690691947937, 'learning_rate': 9.650513471416712e-07, 'epoch': 2.59} +2025-02-06 03:21:30 - ERROR - stderr - 86%|████████▋ | 19366/22434 [17:13:49<12:07:54, 14.24s/it] +2025-02-06 03:21:38 - ERROR - stderr - 86%|████████▋ | 19367/22434 [17:13:58<10:44:26, 12.61s/it] +2025-02-06 03:21:38 - ERROR - stderr - +2025-02-06 03:21:38 - ERROR - stderr - +2025-02-06 03:21:38 - INFO - stdout - {'loss': 0.3057, 'grad_norm': 1.5199034214019775, 'learning_rate': 9.644326530723036e-07, 'epoch': 2.59} +2025-02-06 03:21:38 - ERROR - stderr - 86%|████████▋ | 19367/22434 [17:13:58<10:44:26, 12.61s/it] +2025-02-06 03:21:41 - ERROR - stderr - 86%|████████▋ | 19368/22434 [17:14:01<8:08:40, 9.56s/it] +2025-02-06 03:21:41 - ERROR - stderr - +2025-02-06 03:21:41 - ERROR - stderr - +2025-02-06 03:21:41 - INFO - stdout - {'loss': 0.3784, 'grad_norm': 1.681808352470398, 'learning_rate': 9.638141473406925e-07, 'epoch': 2.59} +2025-02-06 03:21:41 - ERROR - stderr - 86%|████████▋ | 19368/22434 [17:14:01<8:08:40, 9.56s/it] +2025-02-06 03:21:43 - ERROR - stderr - 86%|████████▋ | 19369/22434 [17:14:03<6:20:02, 7.44s/it] +2025-02-06 03:21:43 - ERROR - stderr - +2025-02-06 03:21:43 - ERROR - stderr - +2025-02-06 03:21:43 - INFO - stdout - {'loss': 0.3143, 'grad_norm': 1.4247227907180786, 'learning_rate': 9.631958299597277e-07, 'epoch': 2.59} +2025-02-06 03:21:43 - ERROR - stderr - 86%|████████▋ | 19369/22434 [17:14:03<6:20:02, 7.44s/it] +2025-02-06 03:21:47 - ERROR - stderr - 86%|████████▋ | 19370/22434 [17:14:06<5:17:28, 6.22s/it] +2025-02-06 03:21:47 - ERROR - stderr - +2025-02-06 03:21:47 - ERROR - stderr - +2025-02-06 03:21:47 - INFO - stdout - {'loss': 0.3365, 'grad_norm': 1.4172497987747192, 'learning_rate': 9.62577700942301e-07, 
'epoch': 2.59} +2025-02-06 03:21:47 - ERROR - stderr - 86%|████████▋ | 19370/22434 [17:14:06<5:17:28, 6.22s/it] +2025-02-06 03:21:49 - ERROR - stderr - 86%|████████▋ | 19371/22434 [17:14:09<4:23:55, 5.17s/it] +2025-02-06 03:21:49 - ERROR - stderr - +2025-02-06 03:21:49 - ERROR - stderr - +2025-02-06 03:21:49 - INFO - stdout - {'loss': 0.3313, 'grad_norm': 1.4623740911483765, 'learning_rate': 9.619597603012898e-07, 'epoch': 2.59} +2025-02-06 03:21:49 - ERROR - stderr - 86%|████████▋ | 19371/22434 [17:14:09<4:23:55, 5.17s/it] +2025-02-06 03:21:52 - ERROR - stderr - 86%|████████▋ | 19372/22434 [17:14:12<3:41:55, 4.35s/it] +2025-02-06 03:21:52 - ERROR - stderr - +2025-02-06 03:21:52 - ERROR - stderr - +2025-02-06 03:21:52 - INFO - stdout - {'loss': 0.3834, 'grad_norm': 1.7578006982803345, 'learning_rate': 9.613420080495806e-07, 'epoch': 2.59} +2025-02-06 03:21:52 - ERROR - stderr - 86%|████████▋ | 19372/22434 [17:14:12<3:41:55, 4.35s/it] +2025-02-06 03:21:54 - ERROR - stderr - 86%|████████▋ | 19373/22434 [17:14:14<3:13:32, 3.79s/it] +2025-02-06 03:21:54 - ERROR - stderr - +2025-02-06 03:21:54 - ERROR - stderr - +2025-02-06 03:21:54 - INFO - stdout - {'loss': 0.3723, 'grad_norm': 1.4879993200302124, 'learning_rate': 9.607244442000486e-07, 'epoch': 2.59} +2025-02-06 03:21:54 - ERROR - stderr - 86%|████████▋ | 19373/22434 [17:14:14<3:13:32, 3.79s/it] +2025-02-06 03:21:57 - ERROR - stderr - 86%|████████▋ | 19374/22434 [17:14:17<2:52:38, 3.39s/it] +2025-02-06 03:21:57 - ERROR - stderr - +2025-02-06 03:21:57 - ERROR - stderr - +2025-02-06 03:21:57 - INFO - stdout - {'loss': 0.3558, 'grad_norm': 1.523939609527588, 'learning_rate': 9.601070687655667e-07, 'epoch': 2.59} +2025-02-06 03:21:57 - ERROR - stderr - 86%|████████▋ | 19374/22434 [17:14:17<2:52:38, 3.39s/it] +2025-02-06 03:21:59 - ERROR - stderr - 86%|████████▋ | 19375/22434 [17:14:19<2:39:36, 3.13s/it] +2025-02-06 03:21:59 - ERROR - stderr - +2025-02-06 03:21:59 - ERROR - stderr - +2025-02-06 03:21:59 - INFO - stdout - 
{'loss': 0.3589, 'grad_norm': 1.5160313844680786, 'learning_rate': 9.594898817590037e-07, 'epoch': 2.59} +2025-02-06 03:21:59 - ERROR - stderr - 86%|████████▋ | 19375/22434 [17:14:19<2:39:36, 3.13s/it] +2025-02-06 03:22:02 - ERROR - stderr - 86%|████████▋ | 19376/22434 [17:14:22<2:29:26, 2.93s/it] +2025-02-06 03:22:02 - ERROR - stderr - +2025-02-06 03:22:02 - ERROR - stderr - +2025-02-06 03:22:02 - INFO - stdout - {'loss': 0.3646, 'grad_norm': 1.5185983180999756, 'learning_rate': 9.588728831932193e-07, 'epoch': 2.59} +2025-02-06 03:22:02 - ERROR - stderr - 86%|████████▋ | 19376/22434 [17:14:22<2:29:26, 2.93s/it] +2025-02-06 03:22:04 - ERROR - stderr - 86%|████████▋ | 19377/22434 [17:14:24<2:23:03, 2.81s/it] +2025-02-06 03:22:04 - ERROR - stderr - +2025-02-06 03:22:04 - ERROR - stderr - +2025-02-06 03:22:04 - INFO - stdout - {'loss': 0.3575, 'grad_norm': 1.373460292816162, 'learning_rate': 9.58256073081083e-07, 'epoch': 2.59} +2025-02-06 03:22:04 - ERROR - stderr - 86%|████████▋ | 19377/22434 [17:14:24<2:23:03, 2.81s/it] +2025-02-06 03:22:07 - ERROR - stderr - 86%|████████▋ | 19378/22434 [17:14:27<2:26:38, 2.88s/it] +2025-02-06 03:22:07 - ERROR - stderr - +2025-02-06 03:22:07 - ERROR - stderr - +2025-02-06 03:22:07 - INFO - stdout - {'loss': 0.3085, 'grad_norm': 1.4084264039993286, 'learning_rate': 9.576394514354425e-07, 'epoch': 2.59} +2025-02-06 03:22:07 - ERROR - stderr - 86%|████████▋ | 19378/22434 [17:14:27<2:26:38, 2.88s/it] +2025-02-06 03:22:10 - ERROR - stderr - 86%|████████▋ | 19379/22434 [17:14:30<2:22:03, 2.79s/it] +2025-02-06 03:22:10 - ERROR - stderr - +2025-02-06 03:22:10 - ERROR - stderr - +2025-02-06 03:22:10 - INFO - stdout - {'loss': 0.3653, 'grad_norm': 1.6068617105484009, 'learning_rate': 9.570230182691587e-07, 'epoch': 2.59} +2025-02-06 03:22:10 - ERROR - stderr - 86%|████████▋ | 19379/22434 [17:14:30<2:22:03, 2.79s/it] +2025-02-06 03:22:13 - ERROR - stderr - 86%|████████▋ | 19380/22434 [17:14:33<2:23:10, 2.81s/it] +2025-02-06 03:22:13 - ERROR - 
stderr - +2025-02-06 03:22:13 - ERROR - stderr - +2025-02-06 03:22:13 - INFO - stdout - {'loss': 0.3495, 'grad_norm': 1.4380632638931274, 'learning_rate': 9.564067735950756e-07, 'epoch': 2.59} +2025-02-06 03:22:13 - ERROR - stderr - 86%|████████▋ | 19380/22434 [17:14:33<2:23:10, 2.81s/it] +2025-02-06 03:22:15 - ERROR - stderr - 86%|████████▋ | 19381/22434 [17:14:35<2:17:37, 2.70s/it] +2025-02-06 03:22:15 - ERROR - stderr - +2025-02-06 03:22:15 - ERROR - stderr - +2025-02-06 03:22:15 - INFO - stdout - {'loss': 0.3737, 'grad_norm': 1.592768669128418, 'learning_rate': 9.557907174260372e-07, 'epoch': 2.59} +2025-02-06 03:22:15 - ERROR - stderr - 86%|████████▋ | 19381/22434 [17:14:35<2:17:37, 2.70s/it] +2025-02-06 03:22:18 - ERROR - stderr - 86%|████████▋ | 19382/22434 [17:14:38<2:15:02, 2.65s/it] +2025-02-06 03:22:18 - ERROR - stderr - +2025-02-06 03:22:18 - ERROR - stderr - +2025-02-06 03:22:18 - INFO - stdout - {'loss': 0.3967, 'grad_norm': 1.4582490921020508, 'learning_rate': 9.551748497748902e-07, 'epoch': 2.59} +2025-02-06 03:22:18 - ERROR - stderr - 86%|████████▋ | 19382/22434 [17:14:38<2:15:02, 2.65s/it] +2025-02-06 03:22:20 - ERROR - stderr - 86%|████████▋ | 19383/22434 [17:14:40<2:11:44, 2.59s/it] +2025-02-06 03:22:20 - ERROR - stderr - +2025-02-06 03:22:20 - ERROR - stderr - +2025-02-06 03:22:20 - INFO - stdout - {'loss': 0.3726, 'grad_norm': 1.4425758123397827, 'learning_rate': 9.545591706544677e-07, 'epoch': 2.59} +2025-02-06 03:22:20 - ERROR - stderr - 86%|████████▋ | 19383/22434 [17:14:40<2:11:44, 2.59s/it] +2025-02-06 03:22:23 - ERROR - stderr - 86%|████████▋ | 19384/22434 [17:14:42<2:10:48, 2.57s/it] +2025-02-06 03:22:23 - ERROR - stderr - +2025-02-06 03:22:23 - ERROR - stderr - +2025-02-06 03:22:23 - INFO - stdout - {'loss': 0.3734, 'grad_norm': 1.3401920795440674, 'learning_rate': 9.539436800776026e-07, 'epoch': 2.59} +2025-02-06 03:22:23 - ERROR - stderr - 86%|████████▋ | 19384/22434 [17:14:43<2:10:48, 2.57s/it] +2025-02-06 03:22:25 - ERROR - stderr 
- 86%|████████▋ | 19385/22434 [17:14:45<2:09:37, 2.55s/it] +2025-02-06 03:22:25 - ERROR - stderr - +2025-02-06 03:22:25 - ERROR - stderr - +2025-02-06 03:22:25 - INFO - stdout - {'loss': 0.3295, 'grad_norm': 1.3675986528396606, 'learning_rate': 9.533283780571257e-07, 'epoch': 2.59} +2025-02-06 03:22:25 - ERROR - stderr - 86%|████████▋ | 19385/22434 [17:14:45<2:09:37, 2.55s/it] +2025-02-06 03:22:28 - ERROR - stderr - 86%|████████▋ | 19386/22434 [17:14:47<2:08:22, 2.53s/it] +2025-02-06 03:22:28 - ERROR - stderr - +2025-02-06 03:22:28 - ERROR - stderr - +2025-02-06 03:22:28 - INFO - stdout - {'loss': 0.3607, 'grad_norm': 1.5308568477630615, 'learning_rate': 9.527132646058623e-07, 'epoch': 2.59} +2025-02-06 03:22:28 - ERROR - stderr - 86%|████████▋ | 19386/22434 [17:14:48<2:08:22, 2.53s/it] +2025-02-06 03:22:30 - ERROR - stderr - 86%|████████▋ | 19387/22434 [17:14:50<2:07:43, 2.51s/it] +2025-02-06 03:22:30 - ERROR - stderr - +2025-02-06 03:22:30 - ERROR - stderr - +2025-02-06 03:22:30 - INFO - stdout - {'loss': 0.3846, 'grad_norm': 1.5374099016189575, 'learning_rate': 9.520983397366335e-07, 'epoch': 2.59} +2025-02-06 03:22:30 - ERROR - stderr - 86%|████████▋ | 19387/22434 [17:14:50<2:07:43, 2.51s/it] +2025-02-06 03:22:33 - ERROR - stderr - 86%|████████▋ | 19388/22434 [17:14:53<2:08:43, 2.54s/it] +2025-02-06 03:22:33 - ERROR - stderr - +2025-02-06 03:22:33 - ERROR - stderr - +2025-02-06 03:22:33 - INFO - stdout - {'loss': 0.3735, 'grad_norm': 1.5637702941894531, 'learning_rate': 9.514836034622565e-07, 'epoch': 2.59} +2025-02-06 03:22:33 - ERROR - stderr - 86%|████████▋ | 19388/22434 [17:14:53<2:08:43, 2.54s/it] +2025-02-06 03:22:35 - ERROR - stderr - 86%|████████▋ | 19389/22434 [17:14:55<2:09:44, 2.56s/it] +2025-02-06 03:22:35 - ERROR - stderr - +2025-02-06 03:22:35 - ERROR - stderr - +2025-02-06 03:22:35 - INFO - stdout - {'loss': 0.3356, 'grad_norm': 1.4943219423294067, 'learning_rate': 9.508690557955458e-07, 'epoch': 2.59} +2025-02-06 03:22:35 - ERROR - stderr - 
86%|████████▋ | 19389/22434 [17:14:55<2:09:44, 2.56s/it] +2025-02-06 03:22:38 - ERROR - stderr - 86%|████████▋ | 19390/22434 [17:14:58<2:09:23, 2.55s/it] +2025-02-06 03:22:38 - ERROR - stderr - +2025-02-06 03:22:38 - ERROR - stderr - +2025-02-06 03:22:38 - INFO - stdout - {'loss': 0.3724, 'grad_norm': 1.475665807723999, 'learning_rate': 9.502546967493109e-07, 'epoch': 2.59} +2025-02-06 03:22:38 - ERROR - stderr - 86%|████████▋ | 19390/22434 [17:14:58<2:09:23, 2.55s/it] +2025-02-06 03:22:40 - ERROR - stderr - 86%|████████▋ | 19391/22434 [17:15:00<2:08:37, 2.54s/it] +2025-02-06 03:22:40 - ERROR - stderr - +2025-02-06 03:22:40 - ERROR - stderr - +2025-02-06 03:22:40 - INFO - stdout - {'loss': 0.3829, 'grad_norm': 1.6110912561416626, 'learning_rate': 9.496405263363562e-07, 'epoch': 2.59} +2025-02-06 03:22:40 - ERROR - stderr - 86%|████████▋ | 19391/22434 [17:15:00<2:08:37, 2.54s/it] +2025-02-06 03:23:19 - ERROR - stderr - 86%|████████▋ | 19392/22434 [17:15:39<11:23:25, 13.48s/it] +2025-02-06 03:23:19 - ERROR - stderr - +2025-02-06 03:23:19 - ERROR - stderr - +2025-02-06 03:23:19 - INFO - stdout - {'loss': 0.3206, 'grad_norm': 1.6001758575439453, 'learning_rate': 9.490265445694857e-07, 'epoch': 2.59} +2025-02-06 03:23:19 - ERROR - stderr - 86%|████████▋ | 19392/22434 [17:15:39<11:23:25, 13.48s/it] +2025-02-06 03:23:47 - ERROR - stderr - 86%|████████▋ | 19393/22434 [17:16:07<14:59:11, 17.74s/it] +2025-02-06 03:23:47 - ERROR - stderr - +2025-02-06 03:23:47 - ERROR - stderr - +2025-02-06 03:23:47 - INFO - stdout - {'loss': 0.3261, 'grad_norm': 1.5079129934310913, 'learning_rate': 9.484127514614949e-07, 'epoch': 2.59} +2025-02-06 03:23:47 - ERROR - stderr - 86%|████████▋ | 19393/22434 [17:16:07<14:59:11, 17.74s/it] +2025-02-06 03:25:14 - ERROR - stderr - 86%|████████▋ | 19394/22434 [17:17:34<32:35:55, 38.60s/it] +2025-02-06 03:25:14 - ERROR - stderr - +2025-02-06 03:25:14 - ERROR - stderr - +2025-02-06 03:25:14 - INFO - stdout - {'loss': 0.3584, 'grad_norm': 
1.507460355758667, 'learning_rate': 9.47799147025179e-07, 'epoch': 2.59} +2025-02-06 03:25:14 - ERROR - stderr - 86%|████████▋ | 19394/22434 [17:17:34<32:35:55, 38.60s/it] +2025-02-06 03:25:59 - ERROR - stderr - 86%|████████▋ | 19395/22434 [17:18:19<34:11:14, 40.50s/it] +2025-02-06 03:25:59 - ERROR - stderr - +2025-02-06 03:25:59 - ERROR - stderr - +2025-02-06 03:25:59 - INFO - stdout - {'loss': 0.3486, 'grad_norm': 1.6498550176620483, 'learning_rate': 9.47185731273329e-07, 'epoch': 2.59} +2025-02-06 03:25:59 - ERROR - stderr - 86%|████████▋ | 19395/22434 [17:18:19<34:11:14, 40.50s/it] +2025-02-06 03:26:50 - ERROR - stderr - 86%|████████▋ | 19396/22434 [17:19:10<36:42:34, 43.50s/it] +2025-02-06 03:26:50 - ERROR - stderr - +2025-02-06 03:26:50 - ERROR - stderr - +2025-02-06 03:26:50 - INFO - stdout - {'loss': 0.3314, 'grad_norm': 1.479394555091858, 'learning_rate': 9.465725042187301e-07, 'epoch': 2.59} +2025-02-06 03:26:50 - ERROR - stderr - 86%|████████▋ | 19396/22434 [17:19:10<36:42:34, 43.50s/it] +2025-02-06 03:27:39 - ERROR - stderr - 86%|████████▋ | 19397/22434 [17:19:58<38:02:48, 45.10s/it] +2025-02-06 03:27:39 - ERROR - stderr - +2025-02-06 03:27:39 - ERROR - stderr - +2025-02-06 03:27:39 - INFO - stdout - {'loss': 0.3146, 'grad_norm': 1.4429948329925537, 'learning_rate': 9.459594658741622e-07, 'epoch': 2.59} +2025-02-06 03:27:39 - ERROR - stderr - 86%|████████▋ | 19397/22434 [17:19:58<38:02:48, 45.10s/it] +2025-02-06 03:27:41 - ERROR - stderr - 86%|████████▋ | 19398/22434 [17:20:01<27:14:13, 32.30s/it] +2025-02-06 03:27:41 - ERROR - stderr - +2025-02-06 03:27:41 - ERROR - stderr - +2025-02-06 03:27:41 - INFO - stdout - {'loss': 0.3644, 'grad_norm': 1.52006196975708, 'learning_rate': 9.453466162524072e-07, 'epoch': 2.59} +2025-02-06 03:27:41 - ERROR - stderr - 86%|████████▋ | 19398/22434 [17:20:01<27:14:13, 32.30s/it] +2025-02-06 03:27:44 - ERROR - stderr - 86%|████████▋ | 19399/22434 [17:20:03<19:41:18, 23.35s/it] +2025-02-06 03:27:44 - ERROR - stderr - 
+2025-02-06 03:27:44 - ERROR - stderr - +2025-02-06 03:27:44 - INFO - stdout - {'loss': 0.3299, 'grad_norm': 1.3725566864013672, 'learning_rate': 9.447339553662371e-07, 'epoch': 2.59} +2025-02-06 03:27:44 - ERROR - stderr - 86%|████████▋ | 19399/22434 [17:20:03<19:41:18, 23.35s/it] +2025-02-06 03:27:53 - ERROR - stderr - 86%|████████▋ | 19400/22434 [17:20:12<16:04:44, 19.08s/it] +2025-02-06 03:27:53 - ERROR - stderr - +2025-02-06 03:27:53 - ERROR - stderr - +2025-02-06 03:27:53 - INFO - stdout - {'loss': 0.3252, 'grad_norm': 1.4884238243103027, 'learning_rate': 9.441214832284206e-07, 'epoch': 2.59} +2025-02-06 03:27:53 - ERROR - stderr - 86%|████████▋ | 19400/22434 [17:20:12<16:04:44, 19.08s/it] +2025-02-06 03:28:40 - ERROR - stderr - 86%|████████▋ | 19401/22434 [17:21:00<23:17:05, 27.64s/it] +2025-02-06 03:28:40 - ERROR - stderr - +2025-02-06 03:28:40 - ERROR - stderr - +2025-02-06 03:28:40 - INFO - stdout - {'loss': 0.3348, 'grad_norm': 1.5354907512664795, 'learning_rate': 9.435091998517298e-07, 'epoch': 2.59} +2025-02-06 03:28:40 - ERROR - stderr - 86%|████████▋ | 19401/22434 [17:21:00<23:17:05, 27.64s/it] +2025-02-06 03:28:50 - ERROR - stderr - 86%|████████▋ | 19402/22434 [17:21:10<18:46:18, 22.29s/it] +2025-02-06 03:28:50 - ERROR - stderr - +2025-02-06 03:28:50 - ERROR - stderr - +2025-02-06 03:28:50 - INFO - stdout - {'loss': 0.3457, 'grad_norm': 1.6081600189208984, 'learning_rate': 9.4289710524892e-07, 'epoch': 2.59} +2025-02-06 03:28:50 - ERROR - stderr - 86%|████████▋ | 19402/22434 [17:21:10<18:46:18, 22.29s/it] +2025-02-06 03:29:42 - ERROR - stderr - 86%|████████▋ | 19403/22434 [17:22:01<26:07:54, 31.04s/it] +2025-02-06 03:29:42 - ERROR - stderr - +2025-02-06 03:29:42 - ERROR - stderr - +2025-02-06 03:29:42 - INFO - stdout - {'loss': 0.3329, 'grad_norm': 1.4156938791275024, 'learning_rate': 9.422851994327576e-07, 'epoch': 2.59} +2025-02-06 03:29:42 - ERROR - stderr - 86%|████████▋ | 19403/22434 [17:22:01<26:07:54, 31.04s/it] +2025-02-06 03:30:32 - ERROR - 
stderr - 86%|████████▋ | 19404/22434 [17:22:52<31:09:14, 37.01s/it] +2025-02-06 03:30:33 - ERROR - stderr - +2025-02-06 03:30:33 - ERROR - stderr - +2025-02-06 03:30:33 - INFO - stdout - {'loss': 0.3776, 'grad_norm': 1.6202735900878906, 'learning_rate': 9.416734824159901e-07, 'epoch': 2.59} +2025-02-06 03:30:33 - ERROR - stderr - 86%|████████▋ | 19404/22434 [17:22:52<31:09:14, 37.01s/it] +2025-02-06 03:30:43 - ERROR - stderr - 86%|████████▋ | 19405/22434 [17:23:03<24:24:45, 29.01s/it] +2025-02-06 03:30:43 - ERROR - stderr - +2025-02-06 03:30:43 - ERROR - stderr - +2025-02-06 03:30:43 - INFO - stdout - {'loss': 0.3782, 'grad_norm': 1.5548313856124878, 'learning_rate': 9.410619542113719e-07, 'epoch': 2.59} +2025-02-06 03:30:43 - ERROR - stderr - 86%|████████▋ | 19405/22434 [17:23:03<24:24:45, 29.01s/it] +2025-02-06 03:31:36 - ERROR - stderr - 87%|████████▋ | 19406/22434 [17:23:56<30:34:15, 36.35s/it] +2025-02-06 03:31:36 - ERROR - stderr - +2025-02-06 03:31:36 - ERROR - stderr - +2025-02-06 03:31:36 - INFO - stdout - {'loss': 0.3568, 'grad_norm': 1.5368061065673828, 'learning_rate': 9.404506148316473e-07, 'epoch': 2.6} +2025-02-06 03:31:36 - ERROR - stderr - 87%|████████▋ | 19406/22434 [17:23:56<30:34:15, 36.35s/it] +2025-02-06 03:32:02 - ERROR - stderr - 87%|████████▋ | 19407/22434 [17:24:22<27:53:10, 33.16s/it] +2025-02-06 03:32:02 - ERROR - stderr - +2025-02-06 03:32:02 - ERROR - stderr - +2025-02-06 03:32:02 - INFO - stdout - {'loss': 0.4278, 'grad_norm': 1.8214455842971802, 'learning_rate': 9.398394642895625e-07, 'epoch': 2.6} +2025-02-06 03:32:02 - ERROR - stderr - 87%|████████▋ | 19407/22434 [17:24:22<27:53:10, 33.16s/it] +2025-02-06 03:32:09 - ERROR - stderr - 87%|████████▋ | 19408/22434 [17:24:29<21:21:45, 25.42s/it] +2025-02-06 03:32:09 - ERROR - stderr - +2025-02-06 03:32:09 - ERROR - stderr - +2025-02-06 03:32:09 - INFO - stdout - {'loss': 0.4053, 'grad_norm': 1.6040012836456299, 'learning_rate': 9.392285025978531e-07, 'epoch': 2.6} +2025-02-06 03:32:09 - 
ERROR - stderr - 87%|████████▋ | 19408/22434 [17:24:29<21:21:45, 25.42s/it] +2025-02-06 03:32:31 - ERROR - stderr - 87%|████████▋ | 19409/22434 [17:24:51<20:29:17, 24.38s/it] +2025-02-06 03:32:31 - ERROR - stderr - +2025-02-06 03:32:31 - ERROR - stderr - +2025-02-06 03:32:31 - INFO - stdout - {'loss': 0.298, 'grad_norm': 1.527644395828247, 'learning_rate': 9.386177297692556e-07, 'epoch': 2.6} +2025-02-06 03:32:31 - ERROR - stderr - 87%|████████▋ | 19409/22434 [17:24:51<20:29:17, 24.38s/it] +2025-02-06 03:33:27 - ERROR - stderr - 87%|████████▋ | 19410/22434 [17:25:47<28:22:58, 33.79s/it] +2025-02-06 03:33:27 - ERROR - stderr - +2025-02-06 03:33:27 - ERROR - stderr - +2025-02-06 03:33:27 - INFO - stdout - {'loss': 0.3561, 'grad_norm': 1.679027795791626, 'learning_rate': 9.380071458165007e-07, 'epoch': 2.6} +2025-02-06 03:33:27 - ERROR - stderr - 87%|████████▋ | 19410/22434 [17:25:47<28:22:58, 33.79s/it] +2025-02-06 03:34:24 - ERROR - stderr - 87%|████████▋ | 19411/22434 [17:26:44<34:08:31, 40.66s/it] +2025-02-06 03:34:24 - ERROR - stderr - +2025-02-06 03:34:24 - ERROR - stderr - +2025-02-06 03:34:24 - INFO - stdout - {'loss': 0.3371, 'grad_norm': 1.566199541091919, 'learning_rate': 9.373967507523163e-07, 'epoch': 2.6} +2025-02-06 03:34:24 - ERROR - stderr - 87%|████████▋ | 19411/22434 [17:26:44<34:08:31, 40.66s/it] +2025-02-06 03:34:26 - ERROR - stderr - 87%|████████▋ | 19412/22434 [17:26:46<24:31:13, 29.21s/it] +2025-02-06 03:34:26 - ERROR - stderr - +2025-02-06 03:34:26 - ERROR - stderr - +2025-02-06 03:34:26 - INFO - stdout - {'loss': 0.3419, 'grad_norm': 1.6476001739501953, 'learning_rate': 9.367865445894231e-07, 'epoch': 2.6} +2025-02-06 03:34:26 - ERROR - stderr - 87%|████████▋ | 19412/22434 [17:26:46<24:31:13, 29.21s/it] +2025-02-06 03:34:29 - ERROR - stderr - 87%|████████▋ | 19413/22434 [17:26:48<17:46:10, 21.18s/it] +2025-02-06 03:34:29 - ERROR - stderr - +2025-02-06 03:34:29 - ERROR - stderr - +2025-02-06 03:34:29 - INFO - stdout - {'loss': 0.3663, 
'grad_norm': 1.4245692491531372, 'learning_rate': 9.361765273405433e-07, 'epoch': 2.6} +2025-02-06 03:34:29 - ERROR - stderr - 87%|████████▋ | 19413/22434 [17:26:48<17:46:10, 21.18s/it] +2025-02-06 03:34:31 - ERROR - stderr - 87%|████████▋ | 19414/22434 [17:26:51<13:03:31, 15.57s/it] +2025-02-06 03:34:31 - ERROR - stderr - +2025-02-06 03:34:31 - ERROR - stderr - +2025-02-06 03:34:31 - INFO - stdout - {'loss': 0.3511, 'grad_norm': 1.659962773323059, 'learning_rate': 9.355666990183898e-07, 'epoch': 2.6} +2025-02-06 03:34:31 - ERROR - stderr - 87%|████████▋ | 19414/22434 [17:26:51<13:03:31, 15.57s/it] +2025-02-06 03:35:16 - ERROR - stderr - 87%|████████▋ | 19415/22434 [17:27:36<20:29:07, 24.43s/it] +2025-02-06 03:35:16 - ERROR - stderr - +2025-02-06 03:35:16 - ERROR - stderr - +2025-02-06 03:35:16 - INFO - stdout - {'loss': 0.3154, 'grad_norm': 1.4171830415725708, 'learning_rate': 9.349570596356772e-07, 'epoch': 2.6} +2025-02-06 03:35:16 - ERROR - stderr - 87%|████████▋ | 19415/22434 [17:27:36<20:29:07, 24.43s/it] +2025-02-06 03:35:36 - ERROR - stderr - 87%|████████▋ | 19416/22434 [17:27:56<19:15:21, 22.97s/it] +2025-02-06 03:35:36 - ERROR - stderr - +2025-02-06 03:35:36 - ERROR - stderr - +2025-02-06 03:35:36 - INFO - stdout - {'loss': 0.3709, 'grad_norm': 1.5904529094696045, 'learning_rate': 9.343476092051063e-07, 'epoch': 2.6} +2025-02-06 03:35:36 - ERROR - stderr - 87%|████████▋ | 19416/22434 [17:27:56<19:15:21, 22.97s/it] +2025-02-06 03:36:25 - ERROR - stderr - 87%|████████▋ | 19417/22434 [17:28:44<25:43:53, 30.70s/it] +2025-02-06 03:36:25 - ERROR - stderr - +2025-02-06 03:36:25 - ERROR - stderr - +2025-02-06 03:36:25 - INFO - stdout - {'loss': 0.4054, 'grad_norm': 1.579392910003662, 'learning_rate': 9.337383477393858e-07, 'epoch': 2.6} +2025-02-06 03:36:25 - ERROR - stderr - 87%|████████▋ | 19417/22434 [17:28:44<25:43:53, 30.70s/it] +2025-02-06 03:36:27 - ERROR - stderr - 87%|████████▋ | 19418/22434 [17:28:47<18:38:04, 22.24s/it] +2025-02-06 03:36:27 - ERROR - 
stderr - +2025-02-06 03:36:27 - ERROR - stderr - +2025-02-06 03:36:27 - INFO - stdout - {'loss': 0.3578, 'grad_norm': 1.62498939037323, 'learning_rate': 9.331292752512156e-07, 'epoch': 2.6} +2025-02-06 03:36:27 - ERROR - stderr - 87%|████████▋ | 19418/22434 [17:28:47<18:38:04, 22.24s/it] +2025-02-06 03:37:13 - ERROR - stderr - 87%|████████▋ | 19419/22434 [17:29:33<24:40:03, 29.45s/it] +2025-02-06 03:37:13 - ERROR - stderr - +2025-02-06 03:37:13 - ERROR - stderr - +2025-02-06 03:37:13 - INFO - stdout - {'loss': 0.4089, 'grad_norm': 1.4954196214675903, 'learning_rate': 9.325203917532877e-07, 'epoch': 2.6} +2025-02-06 03:37:13 - ERROR - stderr - 87%|████████▋ | 19419/22434 [17:29:33<24:40:03, 29.45s/it] +2025-02-06 03:37:22 - ERROR - stderr - 87%|████████▋ | 19420/22434 [17:29:41<19:19:34, 23.08s/it] +2025-02-06 03:37:22 - ERROR - stderr - +2025-02-06 03:37:22 - ERROR - stderr - +2025-02-06 03:37:22 - INFO - stdout - {'loss': 0.3435, 'grad_norm': 1.502629041671753, 'learning_rate': 9.319116972582987e-07, 'epoch': 2.6} +2025-02-06 03:37:22 - ERROR - stderr - 87%|████████▋ | 19420/22434 [17:29:41<19:19:34, 23.08s/it] +2025-02-06 03:37:28 - ERROR - stderr - 87%|████████▋ | 19421/22434 [17:29:48<15:15:18, 18.23s/it] +2025-02-06 03:37:29 - ERROR - stderr - +2025-02-06 03:37:29 - ERROR - stderr - +2025-02-06 03:37:29 - INFO - stdout - {'loss': 0.3467, 'grad_norm': 1.533696174621582, 'learning_rate': 9.313031917789295e-07, 'epoch': 2.6} +2025-02-06 03:37:29 - ERROR - stderr - 87%|████████▋ | 19421/22434 [17:29:48<15:15:18, 18.23s/it] +2025-02-06 03:38:12 - ERROR - stderr - 87%|████████▋ | 19422/22434 [17:30:32<21:33:55, 25.78s/it] +2025-02-06 03:38:12 - ERROR - stderr - +2025-02-06 03:38:12 - ERROR - stderr - +2025-02-06 03:38:12 - INFO - stdout - {'loss': 0.3881, 'grad_norm': 1.4959895610809326, 'learning_rate': 9.306948753278711e-07, 'epoch': 2.6} +2025-02-06 03:38:12 - ERROR - stderr - 87%|████████▋ | 19422/22434 [17:30:32<21:33:55, 25.78s/it] +2025-02-06 03:38:14 - ERROR 
- stderr - 87%|████████▋ | 19423/22434 [17:30:34<15:42:41, 18.79s/it] +2025-02-06 03:38:14 - ERROR - stderr - +2025-02-06 03:38:14 - ERROR - stderr - +2025-02-06 03:38:14 - INFO - stdout - {'loss': 0.3318, 'grad_norm': 1.5563950538635254, 'learning_rate': 9.300867479177966e-07, 'epoch': 2.6} +2025-02-06 03:38:14 - ERROR - stderr - 87%|████████▋ | 19423/22434 [17:30:34<15:42:41, 18.79s/it] +2025-02-06 03:38:17 - ERROR - stderr - 87%|████████▋ | 19424/22434 [17:30:37<11:36:55, 13.89s/it] +2025-02-06 03:38:17 - ERROR - stderr - +2025-02-06 03:38:17 - ERROR - stderr - +2025-02-06 03:38:17 - INFO - stdout - {'loss': 0.4278, 'grad_norm': 1.737650990486145, 'learning_rate': 9.294788095613861e-07, 'epoch': 2.6} +2025-02-06 03:38:17 - ERROR - stderr - 87%|████████▋ | 19424/22434 [17:30:37<11:36:55, 13.89s/it] +2025-02-06 03:38:19 - ERROR - stderr - 87%|████████▋ | 19425/22434 [17:30:39<8:44:53, 10.47s/it] +2025-02-06 03:38:19 - ERROR - stderr - +2025-02-06 03:38:19 - ERROR - stderr - +2025-02-06 03:38:19 - INFO - stdout - {'loss': 0.3743, 'grad_norm': 1.568581461906433, 'learning_rate': 9.288710602713102e-07, 'epoch': 2.6} +2025-02-06 03:38:19 - ERROR - stderr - 87%|████████▋ | 19425/22434 [17:30:39<8:44:53, 10.47s/it] +2025-02-06 03:38:22 - ERROR - stderr - 87%|████████▋ | 19426/22434 [17:30:41<6:43:53, 8.06s/it] +2025-02-06 03:38:22 - ERROR - stderr - +2025-02-06 03:38:22 - ERROR - stderr - +2025-02-06 03:38:22 - INFO - stdout - {'loss': 0.3304, 'grad_norm': 1.453376054763794, 'learning_rate': 9.282635000602346e-07, 'epoch': 2.6} +2025-02-06 03:38:22 - ERROR - stderr - 87%|████████▋ | 19426/22434 [17:30:42<6:43:53, 8.06s/it] +2025-02-06 03:38:24 - ERROR - stderr - 87%|████████▋ | 19427/22434 [17:30:44<5:21:16, 6.41s/it] +2025-02-06 03:38:24 - ERROR - stderr - +2025-02-06 03:38:24 - ERROR - stderr - +2025-02-06 03:38:24 - INFO - stdout - {'loss': 0.3523, 'grad_norm': 1.5165928602218628, 'learning_rate': 9.276561289408293e-07, 'epoch': 2.6} +2025-02-06 03:38:24 - ERROR - 
stderr - 87%|████████▋ | 19427/22434 [17:30:44<5:21:16, 6.41s/it] +2025-02-06 03:39:05 - ERROR - stderr - 87%|████████▋ | 19428/22434 [17:31:25<13:55:33, 16.68s/it] +2025-02-06 03:39:05 - ERROR - stderr - +2025-02-06 03:39:05 - ERROR - stderr - +2025-02-06 03:39:05 - INFO - stdout - {'loss': 0.4147, 'grad_norm': 1.5319135189056396, 'learning_rate': 9.270489469257493e-07, 'epoch': 2.6} +2025-02-06 03:39:05 - ERROR - stderr - 87%|████████▋ | 19428/22434 [17:31:25<13:55:33, 16.68s/it] +2025-02-06 03:39:37 - ERROR - stderr - 87%|████████▋ | 19429/22434 [17:31:57<17:50:58, 21.38s/it] +2025-02-06 03:39:37 - ERROR - stderr - +2025-02-06 03:39:37 - ERROR - stderr - +2025-02-06 03:39:37 - INFO - stdout - {'loss': 0.3369, 'grad_norm': 1.5313671827316284, 'learning_rate': 9.264419540276526e-07, 'epoch': 2.6} +2025-02-06 03:39:37 - ERROR - stderr - 87%|████████▋ | 19429/22434 [17:31:57<17:50:58, 21.38s/it] +2025-02-06 03:39:40 - ERROR - stderr - 87%|████████▋ | 19430/22434 [17:32:00<13:06:21, 15.71s/it] +2025-02-06 03:39:40 - ERROR - stderr - +2025-02-06 03:39:40 - ERROR - stderr - +2025-02-06 03:39:40 - INFO - stdout - {'loss': 0.3827, 'grad_norm': 1.6111124753952026, 'learning_rate': 9.2583515025919e-07, 'epoch': 2.6} +2025-02-06 03:39:40 - ERROR - stderr - 87%|████████▋ | 19430/22434 [17:32:00<13:06:21, 15.71s/it] +2025-02-06 03:39:42 - ERROR - stderr - 87%|████████▋ | 19431/22434 [17:32:02<9:48:50, 11.77s/it] +2025-02-06 03:39:42 - ERROR - stderr - +2025-02-06 03:39:42 - ERROR - stderr - +2025-02-06 03:39:42 - INFO - stdout - {'loss': 0.3729, 'grad_norm': 1.5030075311660767, 'learning_rate': 9.252285356330104e-07, 'epoch': 2.6} +2025-02-06 03:39:42 - ERROR - stderr - 87%|████████▋ | 19431/22434 [17:32:02<9:48:50, 11.77s/it] +2025-02-06 03:40:00 - ERROR - stderr - 87%|████████▋ | 19432/22434 [17:32:20<11:15:32, 13.50s/it] +2025-02-06 03:40:00 - ERROR - stderr - +2025-02-06 03:40:00 - ERROR - stderr - +2025-02-06 03:40:00 - INFO - stdout - {'loss': 0.344, 'grad_norm': 
1.640863060951233, 'learning_rate': 9.246221101617592e-07, 'epoch': 2.6} +2025-02-06 03:40:00 - ERROR - stderr - 87%|████████▋ | 19432/22434 [17:32:20<11:15:32, 13.50s/it] +2025-02-06 03:40:21 - ERROR - stderr - 87%|████████▋ | 19433/22434 [17:32:40<13:03:39, 15.67s/it] +2025-02-06 03:40:21 - ERROR - stderr - +2025-02-06 03:40:21 - ERROR - stderr - +2025-02-06 03:40:21 - INFO - stdout - {'loss': 0.3646, 'grad_norm': 1.592004656791687, 'learning_rate': 9.240158738580751e-07, 'epoch': 2.6} +2025-02-06 03:40:21 - ERROR - stderr - 87%|████████▋ | 19433/22434 [17:32:40<13:03:39, 15.67s/it] +2025-02-06 03:40:39 - ERROR - stderr - 87%|████████▋ | 19434/22434 [17:32:59<13:47:52, 16.56s/it] +2025-02-06 03:40:39 - ERROR - stderr - +2025-02-06 03:40:39 - ERROR - stderr - +2025-02-06 03:40:39 - INFO - stdout - {'loss': 0.3812, 'grad_norm': 1.6026378870010376, 'learning_rate': 9.234098267345959e-07, 'epoch': 2.6} +2025-02-06 03:40:39 - ERROR - stderr - 87%|████████▋ | 19434/22434 [17:32:59<13:47:52, 16.56s/it] +2025-02-06 03:40:42 - ERROR - stderr - 87%|████████▋ | 19435/22434 [17:33:01<10:15:39, 12.32s/it] +2025-02-06 03:40:42 - ERROR - stderr - +2025-02-06 03:40:42 - ERROR - stderr - +2025-02-06 03:40:42 - INFO - stdout - {'loss': 0.3381, 'grad_norm': 1.424882173538208, 'learning_rate': 9.228039688039537e-07, 'epoch': 2.6} +2025-02-06 03:40:42 - ERROR - stderr - 87%|████████▋ | 19435/22434 [17:33:01<10:15:39, 12.32s/it] +2025-02-06 03:40:44 - ERROR - stderr - 87%|████████▋ | 19436/22434 [17:33:04<7:48:05, 9.37s/it] +2025-02-06 03:40:44 - ERROR - stderr - +2025-02-06 03:40:44 - ERROR - stderr - +2025-02-06 03:40:44 - INFO - stdout - {'loss': 0.4023, 'grad_norm': 1.8395777940750122, 'learning_rate': 9.22198300078777e-07, 'epoch': 2.6} +2025-02-06 03:40:44 - ERROR - stderr - 87%|████████▋ | 19436/22434 [17:33:04<7:48:05, 9.37s/it] +2025-02-06 03:41:03 - ERROR - stderr - 87%|████████▋ | 19437/22434 [17:33:22<10:04:04, 12.09s/it] +2025-02-06 03:41:03 - ERROR - stderr - +2025-02-06 
03:41:03 - ERROR - stderr - +2025-02-06 03:41:03 - INFO - stdout - {'loss': 0.4013, 'grad_norm': 1.5865273475646973, 'learning_rate': 9.215928205716895e-07, 'epoch': 2.6} +2025-02-06 03:41:03 - ERROR - stderr - 87%|████████▋ | 19437/22434 [17:33:22<10:04:04, 12.09s/it] +2025-02-06 03:41:05 - ERROR - stderr - 87%|████████▋ | 19438/22434 [17:33:25<7:39:12, 9.20s/it] +2025-02-06 03:41:05 - ERROR - stderr - +2025-02-06 03:41:05 - ERROR - stderr - +2025-02-06 03:41:05 - INFO - stdout - {'loss': 0.3542, 'grad_norm': 1.4469475746154785, 'learning_rate': 9.209875302953131e-07, 'epoch': 2.6} +2025-02-06 03:41:05 - ERROR - stderr - 87%|████████▋ | 19438/22434 [17:33:25<7:39:12, 9.20s/it] +2025-02-06 03:41:08 - ERROR - stderr - 87%|████████▋ | 19439/22434 [17:33:27<5:58:36, 7.18s/it] +2025-02-06 03:41:08 - ERROR - stderr - +2025-02-06 03:41:08 - ERROR - stderr - +2025-02-06 03:41:08 - INFO - stdout - {'loss': 0.4043, 'grad_norm': 1.672650933265686, 'learning_rate': 9.203824292622654e-07, 'epoch': 2.6} +2025-02-06 03:41:08 - ERROR - stderr - 87%|████████▋ | 19439/22434 [17:33:27<5:58:36, 7.18s/it] +2025-02-06 03:41:10 - ERROR - stderr - 87%|████████▋ | 19440/22434 [17:33:30<4:47:59, 5.77s/it] +2025-02-06 03:41:10 - ERROR - stderr - +2025-02-06 03:41:10 - ERROR - stderr - +2025-02-06 03:41:10 - INFO - stdout - {'loss': 0.3359, 'grad_norm': 1.3972047567367554, 'learning_rate': 9.197775174851543e-07, 'epoch': 2.6} +2025-02-06 03:41:10 - ERROR - stderr - 87%|████████▋ | 19440/22434 [17:33:30<4:47:59, 5.77s/it] +2025-02-06 03:41:31 - ERROR - stderr - 87%|████████▋ | 19441/22434 [17:33:50<8:31:50, 10.26s/it] +2025-02-06 03:41:31 - ERROR - stderr - +2025-02-06 03:41:31 - ERROR - stderr - +2025-02-06 03:41:31 - INFO - stdout - {'loss': 0.3505, 'grad_norm': 1.4710986614227295, 'learning_rate': 9.191727949765949e-07, 'epoch': 2.6} +2025-02-06 03:41:31 - ERROR - stderr - 87%|████████▋ | 19441/22434 [17:33:51<8:31:50, 10.26s/it] +2025-02-06 03:41:43 - ERROR - stderr - 87%|████████▋ | 
19442/22434 [17:34:02<8:55:05, 10.73s/it] +2025-02-06 03:41:43 - ERROR - stderr - +2025-02-06 03:41:43 - ERROR - stderr - +2025-02-06 03:41:43 - INFO - stdout - {'loss': 0.3479, 'grad_norm': 1.467466115951538, 'learning_rate': 9.185682617491865e-07, 'epoch': 2.6} +2025-02-06 03:41:43 - ERROR - stderr - 87%|████████▋ | 19442/22434 [17:34:02<8:55:05, 10.73s/it] +2025-02-06 03:41:49 - ERROR - stderr - 87%|████████▋ | 19443/22434 [17:34:09<7:48:10, 9.39s/it] +2025-02-06 03:41:49 - ERROR - stderr - +2025-02-06 03:41:49 - ERROR - stderr - +2025-02-06 03:41:49 - INFO - stdout - {'loss': 0.4154, 'grad_norm': 1.6823577880859375, 'learning_rate': 9.179639178155364e-07, 'epoch': 2.6} +2025-02-06 03:41:49 - ERROR - stderr - 87%|████████▋ | 19443/22434 [17:34:09<7:48:10, 9.39s/it] +2025-02-06 03:41:51 - ERROR - stderr - 87%|████████▋ | 19444/22434 [17:34:11<6:04:52, 7.32s/it] +2025-02-06 03:41:51 - ERROR - stderr - +2025-02-06 03:41:51 - ERROR - stderr - +2025-02-06 03:41:51 - INFO - stdout - {'loss': 0.3721, 'grad_norm': 1.6559479236602783, 'learning_rate': 9.173597631882359e-07, 'epoch': 2.6} +2025-02-06 03:41:51 - ERROR - stderr - 87%|████████▋ | 19444/22434 [17:34:11<6:04:52, 7.32s/it] +2025-02-06 03:41:54 - ERROR - stderr - 87%|████████▋ | 19445/22434 [17:34:14<4:53:17, 5.89s/it] +2025-02-06 03:41:54 - ERROR - stderr - +2025-02-06 03:41:54 - ERROR - stderr - +2025-02-06 03:41:54 - INFO - stdout - {'loss': 0.367, 'grad_norm': 1.5296725034713745, 'learning_rate': 9.16755797879878e-07, 'epoch': 2.6} +2025-02-06 03:41:54 - ERROR - stderr - 87%|████████▋ | 19445/22434 [17:34:14<4:53:17, 5.89s/it] +2025-02-06 03:41:56 - ERROR - stderr - 87%|████████▋ | 19446/22434 [17:34:16<4:03:42, 4.89s/it] +2025-02-06 03:41:56 - ERROR - stderr - +2025-02-06 03:41:56 - ERROR - stderr - +2025-02-06 03:41:56 - INFO - stdout - {'loss': 0.3901, 'grad_norm': 1.641188621520996, 'learning_rate': 9.161520219030573e-07, 'epoch': 2.6} +2025-02-06 03:41:56 - ERROR - stderr - 87%|████████▋ | 19446/22434 
[17:34:16<4:03:42, 4.89s/it] +2025-02-06 03:41:59 - ERROR - stderr - 87%|████████▋ | 19447/22434 [17:34:19<3:28:40, 4.19s/it] +2025-02-06 03:41:59 - ERROR - stderr - +2025-02-06 03:41:59 - ERROR - stderr - +2025-02-06 03:41:59 - INFO - stdout - {'loss': 0.4069, 'grad_norm': 1.5461465120315552, 'learning_rate': 9.155484352703537e-07, 'epoch': 2.6} +2025-02-06 03:41:59 - ERROR - stderr - 87%|████████▋ | 19447/22434 [17:34:19<3:28:40, 4.19s/it] +2025-02-06 03:42:01 - ERROR - stderr - 87%|████████▋ | 19448/22434 [17:34:21<3:02:29, 3.67s/it] +2025-02-06 03:42:01 - ERROR - stderr - +2025-02-06 03:42:01 - ERROR - stderr - +2025-02-06 03:42:01 - INFO - stdout - {'loss': 0.3467, 'grad_norm': 1.581572413444519, 'learning_rate': 9.149450379943491e-07, 'epoch': 2.6} +2025-02-06 03:42:01 - ERROR - stderr - 87%|████████▋ | 19448/22434 [17:34:21<3:02:29, 3.67s/it] +2025-02-06 03:42:04 - ERROR - stderr - 87%|████████▋ | 19449/22434 [17:34:24<2:44:40, 3.31s/it] +2025-02-06 03:42:04 - ERROR - stderr - +2025-02-06 03:42:04 - ERROR - stderr - +2025-02-06 03:42:04 - INFO - stdout - {'loss': 0.3715, 'grad_norm': 1.6334781646728516, 'learning_rate': 9.143418300876228e-07, 'epoch': 2.6} +2025-02-06 03:42:04 - ERROR - stderr - 87%|████████▋ | 19449/22434 [17:34:24<2:44:40, 3.31s/it] +2025-02-06 03:42:06 - ERROR - stderr - 87%|████████▋ | 19450/22434 [17:34:26<2:33:08, 3.08s/it] +2025-02-06 03:42:06 - ERROR - stderr - +2025-02-06 03:42:06 - ERROR - stderr - +2025-02-06 03:42:06 - INFO - stdout - {'loss': 0.3504, 'grad_norm': 1.561816692352295, 'learning_rate': 9.137388115627477e-07, 'epoch': 2.6} +2025-02-06 03:42:06 - ERROR - stderr - 87%|████████▋ | 19450/22434 [17:34:26<2:33:08, 3.08s/it] +2025-02-06 03:42:09 - ERROR - stderr - 87%|████████▋ | 19451/22434 [17:34:29<2:25:55, 2.94s/it] +2025-02-06 03:42:09 - ERROR - stderr - +2025-02-06 03:42:09 - ERROR - stderr - +2025-02-06 03:42:09 - INFO - stdout - {'loss': 0.3687, 'grad_norm': 1.49606192111969, 'learning_rate': 9.131359824322916e-07, 
'epoch': 2.6} +2025-02-06 03:42:09 - ERROR - stderr - 87%|████████▋ | 19451/22434 [17:34:29<2:25:55, 2.94s/it] +2025-02-06 03:42:12 - ERROR - stderr - 87%|████████▋ | 19452/22434 [17:34:31<2:19:03, 2.80s/it] +2025-02-06 03:42:12 - ERROR - stderr - +2025-02-06 03:42:12 - ERROR - stderr - +2025-02-06 03:42:12 - INFO - stdout - {'loss': 0.3455, 'grad_norm': 1.7139372825622559, 'learning_rate': 9.125333427088201e-07, 'epoch': 2.6} +2025-02-06 03:42:12 - ERROR - stderr - 87%|████████▋ | 19452/22434 [17:34:31<2:19:03, 2.80s/it] +2025-02-06 03:42:14 - ERROR - stderr - 87%|████████▋ | 19453/22434 [17:34:34<2:14:26, 2.71s/it] +2025-02-06 03:42:14 - ERROR - stderr - +2025-02-06 03:42:14 - ERROR - stderr - +2025-02-06 03:42:14 - INFO - stdout - {'loss': 0.3404, 'grad_norm': 1.5788805484771729, 'learning_rate': 9.119308924048964e-07, 'epoch': 2.6} +2025-02-06 03:42:14 - ERROR - stderr - 87%|████████▋ | 19453/22434 [17:34:34<2:14:26, 2.71s/it] +2025-02-06 03:42:17 - ERROR - stderr - 87%|████████▋ | 19454/22434 [17:34:36<2:12:34, 2.67s/it] +2025-02-06 03:42:17 - ERROR - stderr - +2025-02-06 03:42:17 - ERROR - stderr - +2025-02-06 03:42:17 - INFO - stdout - {'loss': 0.3563, 'grad_norm': 1.435338020324707, 'learning_rate': 9.11328631533076e-07, 'epoch': 2.6} +2025-02-06 03:42:17 - ERROR - stderr - 87%|████████▋ | 19454/22434 [17:34:36<2:12:34, 2.67s/it] +2025-02-06 03:42:19 - ERROR - stderr - 87%|████████▋ | 19455/22434 [17:34:39<2:09:53, 2.62s/it] +2025-02-06 03:42:19 - ERROR - stderr - +2025-02-06 03:42:19 - ERROR - stderr - +2025-02-06 03:42:19 - INFO - stdout - {'loss': 0.3522, 'grad_norm': 1.5037792921066284, 'learning_rate': 9.107265601059145e-07, 'epoch': 2.6} +2025-02-06 03:42:19 - ERROR - stderr - 87%|████████▋ | 19455/22434 [17:34:39<2:09:53, 2.62s/it] +2025-02-06 03:42:22 - ERROR - stderr - 87%|████████▋ | 19456/22434 [17:34:41<2:07:46, 2.57s/it] +2025-02-06 03:42:22 - ERROR - stderr - +2025-02-06 03:42:22 - ERROR - stderr - +2025-02-06 03:42:22 - INFO - stdout - 
{'loss': 0.3561, 'grad_norm': 1.6911542415618896, 'learning_rate': 9.101246781359596e-07, 'epoch': 2.6} +2025-02-06 03:42:22 - ERROR - stderr - 87%|████████▋ | 19456/22434 [17:34:41<2:07:46, 2.57s/it] +2025-02-06 03:42:24 - ERROR - stderr - 87%|████████▋ | 19457/22434 [17:34:44<2:06:07, 2.54s/it] +2025-02-06 03:42:24 - ERROR - stderr - +2025-02-06 03:42:24 - ERROR - stderr - +2025-02-06 03:42:24 - INFO - stdout - {'loss': 0.3635, 'grad_norm': 1.6394643783569336, 'learning_rate': 9.095229856357579e-07, 'epoch': 2.6} +2025-02-06 03:42:24 - ERROR - stderr - 87%|████████▋ | 19457/22434 [17:34:44<2:06:07, 2.54s/it] +2025-02-06 03:42:27 - ERROR - stderr - 87%|████████▋ | 19458/22434 [17:34:46<2:05:48, 2.54s/it] +2025-02-06 03:42:27 - ERROR - stderr - +2025-02-06 03:42:27 - ERROR - stderr - +2025-02-06 03:42:27 - INFO - stdout - {'loss': 0.4194, 'grad_norm': 1.636730670928955, 'learning_rate': 9.089214826178505e-07, 'epoch': 2.6} +2025-02-06 03:42:27 - ERROR - stderr - 87%|████████▋ | 19458/22434 [17:34:46<2:05:48, 2.54s/it] +2025-02-06 03:42:29 - ERROR - stderr - 87%|████████▋ | 19459/22434 [17:34:49<2:05:26, 2.53s/it] +2025-02-06 03:42:29 - ERROR - stderr - +2025-02-06 03:42:29 - ERROR - stderr - +2025-02-06 03:42:29 - INFO - stdout - {'loss': 0.3049, 'grad_norm': 1.2542637586593628, 'learning_rate': 9.083201690947763e-07, 'epoch': 2.6} +2025-02-06 03:42:29 - ERROR - stderr - 87%|████████▋ | 19459/22434 [17:34:49<2:05:26, 2.53s/it] +2025-02-06 03:42:32 - ERROR - stderr - 87%|████████▋ | 19460/22434 [17:34:51<2:05:35, 2.53s/it] +2025-02-06 03:42:32 - ERROR - stderr - +2025-02-06 03:42:32 - ERROR - stderr - +2025-02-06 03:42:32 - INFO - stdout - {'loss': 0.3391, 'grad_norm': 1.3598214387893677, 'learning_rate': 9.077190450790696e-07, 'epoch': 2.6} +2025-02-06 03:42:32 - ERROR - stderr - 87%|████████▋ | 19460/22434 [17:34:51<2:05:35, 2.53s/it] +2025-02-06 03:42:34 - ERROR - stderr - 87%|████████▋ | 19461/22434 [17:34:54<2:04:31, 2.51s/it] +2025-02-06 03:42:34 - ERROR - 
stderr - +2025-02-06 03:42:34 - ERROR - stderr - +2025-02-06 03:42:34 - INFO - stdout - {'loss': 0.2975, 'grad_norm': 1.4779701232910156, 'learning_rate': 9.071181105832561e-07, 'epoch': 2.6} +2025-02-06 03:42:34 - ERROR - stderr - 87%|████████▋ | 19461/22434 [17:34:54<2:04:31, 2.51s/it] +2025-02-06 03:42:37 - ERROR - stderr - 87%|████████▋ | 19462/22434 [17:34:56<2:03:55, 2.50s/it] +2025-02-06 03:42:37 - ERROR - stderr - +2025-02-06 03:42:37 - ERROR - stderr - +2025-02-06 03:42:37 - INFO - stdout - {'loss': 0.3774, 'grad_norm': 1.6051291227340698, 'learning_rate': 9.065173656198678e-07, 'epoch': 2.6} +2025-02-06 03:42:37 - ERROR - stderr - 87%|████████▋ | 19462/22434 [17:34:56<2:03:55, 2.50s/it] +2025-02-06 03:42:39 - ERROR - stderr - 87%|████████▋ | 19463/22434 [17:34:59<2:03:32, 2.50s/it] +2025-02-06 03:42:39 - ERROR - stderr - +2025-02-06 03:42:39 - ERROR - stderr - +2025-02-06 03:42:39 - INFO - stdout - {'loss': 0.3353, 'grad_norm': 1.4587279558181763, 'learning_rate': 9.059168102014193e-07, 'epoch': 2.6} +2025-02-06 03:42:39 - ERROR - stderr - 87%|████████▋ | 19463/22434 [17:34:59<2:03:32, 2.50s/it] +2025-02-06 03:43:00 - ERROR - stderr - 87%|████████▋ | 19464/22434 [17:35:19<6:33:39, 7.95s/it] +2025-02-06 03:43:00 - ERROR - stderr - +2025-02-06 03:43:00 - ERROR - stderr - +2025-02-06 03:43:00 - INFO - stdout - {'loss': 0.3617, 'grad_norm': 1.505478024482727, 'learning_rate': 9.053164443404361e-07, 'epoch': 2.6} +2025-02-06 03:43:00 - ERROR - stderr - 87%|████████▋ | 19464/22434 [17:35:20<6:33:39, 7.95s/it] +2025-02-06 03:43:20 - ERROR - stderr - 87%|████████▋ | 19465/22434 [17:35:39<9:30:41, 11.53s/it] +2025-02-06 03:43:20 - ERROR - stderr - +2025-02-06 03:43:20 - ERROR - stderr - +2025-02-06 03:43:20 - INFO - stdout - {'loss': 0.3199, 'grad_norm': 1.5664360523223877, 'learning_rate': 9.047162680494293e-07, 'epoch': 2.6} +2025-02-06 03:43:20 - ERROR - stderr - 87%|████████▋ | 19465/22434 [17:35:39<9:30:41, 11.53s/it] +2025-02-06 03:43:59 - ERROR - stderr - 
87%|████████▋ | 19466/22434 [17:36:19<16:22:19, 19.86s/it] +2025-02-06 03:43:59 - ERROR - stderr - +2025-02-06 03:43:59 - ERROR - stderr - +2025-02-06 03:43:59 - INFO - stdout - {'loss': 0.3757, 'grad_norm': 1.5141276121139526, 'learning_rate': 9.041162813409055e-07, 'epoch': 2.6} +2025-02-06 03:43:59 - ERROR - stderr - 87%|████████▋ | 19466/22434 [17:36:19<16:22:19, 19.86s/it] +2025-02-06 03:44:50 - ERROR - stderr - 87%|████████▋ | 19467/22434 [17:37:09<24:01:22, 29.15s/it] +2025-02-06 03:44:50 - ERROR - stderr - +2025-02-06 03:44:50 - ERROR - stderr - +2025-02-06 03:44:50 - INFO - stdout - {'loss': 0.3326, 'grad_norm': 1.388545036315918, 'learning_rate': 9.03516484227378e-07, 'epoch': 2.6} +2025-02-06 03:44:50 - ERROR - stderr - 87%|████████▋ | 19467/22434 [17:37:10<24:01:22, 29.15s/it] +2025-02-06 03:45:01 - ERROR - stderr - 87%|████████▋ | 19468/22434 [17:37:21<19:43:10, 23.93s/it] +2025-02-06 03:45:02 - ERROR - stderr - +2025-02-06 03:45:02 - ERROR - stderr - +2025-02-06 03:45:02 - INFO - stdout - {'loss': 0.3804, 'grad_norm': 1.642006754875183, 'learning_rate': 9.029168767213426e-07, 'epoch': 2.6} +2025-02-06 03:45:02 - ERROR - stderr - 87%|████████▋ | 19468/22434 [17:37:21<19:43:10, 23.93s/it] +2025-02-06 03:45:14 - ERROR - stderr - 87%|████████▋ | 19469/22434 [17:37:34<16:50:55, 20.46s/it] +2025-02-06 03:45:14 - ERROR - stderr - +2025-02-06 03:45:14 - ERROR - stderr - +2025-02-06 03:45:14 - INFO - stdout - {'loss': 0.3545, 'grad_norm': 1.4901856184005737, 'learning_rate': 9.023174588353001e-07, 'epoch': 2.6} +2025-02-06 03:45:14 - ERROR - stderr - 87%|████████▋ | 19469/22434 [17:37:34<16:50:55, 20.46s/it] +2025-02-06 03:46:02 - ERROR - stderr - 87%|████████▋ | 19470/22434 [17:38:21<23:35:59, 28.66s/it] +2025-02-06 03:46:02 - ERROR - stderr - +2025-02-06 03:46:02 - ERROR - stderr - +2025-02-06 03:46:02 - INFO - stdout - {'loss': 0.3597, 'grad_norm': 1.3451625108718872, 'learning_rate': 9.017182305817451e-07, 'epoch': 2.6} +2025-02-06 03:46:02 - ERROR - 
stderr - 87%|████████▋ | 19470/22434 [17:38:21<23:35:59, 28.66s/it] +2025-02-06 03:46:17 - ERROR - stderr - 87%|████████▋ | 19471/22434 [17:38:36<20:12:15, 24.55s/it] +2025-02-06 03:46:17 - ERROR - stderr - +2025-02-06 03:46:17 - ERROR - stderr - +2025-02-06 03:46:17 - INFO - stdout - {'loss': 0.4011, 'grad_norm': 1.7079448699951172, 'learning_rate': 9.011191919731655e-07, 'epoch': 2.6} +2025-02-06 03:46:17 - ERROR - stderr - 87%|████████▋ | 19471/22434 [17:38:36<20:12:15, 24.55s/it] +2025-02-06 03:47:12 - ERROR - stderr - 87%|████████▋ | 19472/22434 [17:39:32<27:45:54, 33.75s/it] +2025-02-06 03:47:12 - ERROR - stderr - +2025-02-06 03:47:12 - ERROR - stderr - +2025-02-06 03:47:12 - INFO - stdout - {'loss': 0.3696, 'grad_norm': 1.529018759727478, 'learning_rate': 9.005203430220532e-07, 'epoch': 2.6} +2025-02-06 03:47:12 - ERROR - stderr - 87%|████████▋ | 19472/22434 [17:39:32<27:45:54, 33.75s/it] +2025-02-06 03:48:10 - ERROR - stderr - 87%|████████▋ | 19473/22434 [17:40:29<33:41:07, 40.95s/it] +2025-02-06 03:48:10 - ERROR - stderr - +2025-02-06 03:48:10 - ERROR - stderr - +2025-02-06 03:48:10 - INFO - stdout - {'loss': 0.423, 'grad_norm': 1.5724196434020996, 'learning_rate': 8.999216837408853e-07, 'epoch': 2.6} +2025-02-06 03:48:10 - ERROR - stderr - 87%|████████▋ | 19473/22434 [17:40:29<33:41:07, 40.95s/it] +2025-02-06 03:49:08 - ERROR - stderr - 87%|████████▋ | 19474/22434 [17:41:27<37:52:18, 46.06s/it] +2025-02-06 03:49:08 - ERROR - stderr - +2025-02-06 03:49:08 - ERROR - stderr - +2025-02-06 03:49:08 - INFO - stdout - {'loss': 0.36, 'grad_norm': 1.5658984184265137, 'learning_rate': 8.993232141421415e-07, 'epoch': 2.6} +2025-02-06 03:49:08 - ERROR - stderr - 87%|████████▋ | 19474/22434 [17:41:27<37:52:18, 46.06s/it] +2025-02-06 03:49:10 - ERROR - stderr - 87%|████████▋ | 19475/22434 [17:41:30<27:05:43, 32.96s/it] +2025-02-06 03:49:10 - ERROR - stderr - +2025-02-06 03:49:10 - ERROR - stderr - +2025-02-06 03:49:10 - INFO - stdout - {'loss': 0.3342, 'grad_norm': 
1.5030330419540405, 'learning_rate': 8.987249342382976e-07, 'epoch': 2.6} +2025-02-06 03:49:10 - ERROR - stderr - 87%|████████▋ | 19475/22434 [17:41:30<27:05:43, 32.96s/it] +2025-02-06 03:50:11 - ERROR - stderr - 87%|████████▋ | 19476/22434 [17:42:30<33:56:27, 41.31s/it] +2025-02-06 03:50:11 - ERROR - stderr - +2025-02-06 03:50:11 - ERROR - stderr - +2025-02-06 03:50:11 - INFO - stdout - {'loss': 0.3782, 'grad_norm': 1.6373921632766724, 'learning_rate': 8.981268440418234e-07, 'epoch': 2.6} +2025-02-06 03:50:11 - ERROR - stderr - 87%|████████▋ | 19476/22434 [17:42:31<33:56:27, 41.31s/it] +2025-02-06 03:51:19 - ERROR - stderr - 87%|████████▋ | 19477/22434 [17:43:39<40:35:02, 49.41s/it] +2025-02-06 03:51:19 - ERROR - stderr - +2025-02-06 03:51:19 - ERROR - stderr - +2025-02-06 03:51:19 - INFO - stdout - {'loss': 0.3307, 'grad_norm': 1.4834411144256592, 'learning_rate': 8.975289435651857e-07, 'epoch': 2.6} +2025-02-06 03:51:19 - ERROR - stderr - 87%|████████▋ | 19477/22434 [17:43:39<40:35:02, 49.41s/it] +2025-02-06 03:52:24 - ERROR - stderr - 87%|████████▋ | 19478/22434 [17:44:44<44:30:09, 54.20s/it] +2025-02-06 03:52:24 - ERROR - stderr - +2025-02-06 03:52:24 - ERROR - stderr - +2025-02-06 03:52:24 - INFO - stdout - {'loss': 0.3712, 'grad_norm': 1.6104497909545898, 'learning_rate': 8.969312328208469e-07, 'epoch': 2.6} +2025-02-06 03:52:24 - ERROR - stderr - 87%|████████▋ | 19478/22434 [17:44:44<44:30:09, 54.20s/it] +2025-02-06 03:52:27 - ERROR - stderr - 87%|████████▋ | 19479/22434 [17:44:47<31:46:14, 38.71s/it] +2025-02-06 03:52:27 - ERROR - stderr - +2025-02-06 03:52:27 - ERROR - stderr - +2025-02-06 03:52:27 - INFO - stdout - {'loss': 0.3687, 'grad_norm': 1.4394108057022095, 'learning_rate': 8.963337118212656e-07, 'epoch': 2.6} +2025-02-06 03:52:27 - ERROR - stderr - 87%|████████▋ | 19479/22434 [17:44:47<31:46:14, 38.71s/it] +2025-02-06 03:52:30 - ERROR - stderr - 87%|████████▋ | 19480/22434 [17:44:49<22:52:33, 27.88s/it] +2025-02-06 03:52:30 - ERROR - stderr - 
+2025-02-06 03:52:30 - ERROR - stderr - +2025-02-06 03:52:30 - INFO - stdout - {'loss': 0.3487, 'grad_norm': 1.3359596729278564, 'learning_rate': 8.957363805788965e-07, 'epoch': 2.6} +2025-02-06 03:52:30 - ERROR - stderr - 87%|████████▋ | 19480/22434 [17:44:49<22:52:33, 27.88s/it] +2025-02-06 03:52:32 - ERROR - stderr - 87%|████████▋ | 19481/22434 [17:44:52<16:37:48, 20.27s/it] +2025-02-06 03:52:32 - ERROR - stderr - +2025-02-06 03:52:32 - ERROR - stderr - +2025-02-06 03:52:32 - INFO - stdout - {'loss': 0.3711, 'grad_norm': 1.4096752405166626, 'learning_rate': 8.95139239106193e-07, 'epoch': 2.61} +2025-02-06 03:52:32 - ERROR - stderr - 87%|████████▋ | 19481/22434 [17:44:52<16:37:48, 20.27s/it] +2025-02-06 03:52:45 - ERROR - stderr - 87%|████████▋ | 19482/22434 [17:45:05<14:49:27, 18.08s/it] +2025-02-06 03:52:45 - ERROR - stderr - +2025-02-06 03:52:45 - ERROR - stderr - +2025-02-06 03:52:45 - INFO - stdout - {'loss': 0.4148, 'grad_norm': 1.685247778892517, 'learning_rate': 8.945422874155962e-07, 'epoch': 2.61} +2025-02-06 03:52:45 - ERROR - stderr - 87%|████████▋ | 19482/22434 [17:45:05<14:49:27, 18.08s/it] +2025-02-06 03:53:49 - ERROR - stderr - 87%|████████▋ | 19483/22434 [17:46:09<26:06:36, 31.85s/it] +2025-02-06 03:53:49 - ERROR - stderr - +2025-02-06 03:53:49 - ERROR - stderr - +2025-02-06 03:53:49 - INFO - stdout - {'loss': 0.348, 'grad_norm': 1.422440767288208, 'learning_rate': 8.939455255195539e-07, 'epoch': 2.61} +2025-02-06 03:53:49 - ERROR - stderr - 87%|████████▋ | 19483/22434 [17:46:09<26:06:36, 31.85s/it] +2025-02-06 03:54:01 - ERROR - stderr - 87%|████████▋ | 19484/22434 [17:46:21<21:08:39, 25.80s/it] +2025-02-06 03:54:01 - ERROR - stderr - +2025-02-06 03:54:01 - ERROR - stderr - +2025-02-06 03:54:01 - INFO - stdout - {'loss': 0.3409, 'grad_norm': 1.450132131576538, 'learning_rate': 8.933489534305051e-07, 'epoch': 2.61} +2025-02-06 03:54:01 - ERROR - stderr - 87%|████████▋ | 19484/22434 [17:46:21<21:08:39, 25.80s/it] +2025-02-06 03:55:17 - ERROR - 
stderr - 87%|████████▋ | 19485/22434 [17:47:36<33:24:55, 40.79s/it] +2025-02-06 03:55:17 - ERROR - stderr - +2025-02-06 03:55:17 - ERROR - stderr - +2025-02-06 03:55:17 - INFO - stdout - {'loss': 0.374, 'grad_norm': 1.4854105710983276, 'learning_rate': 8.927525711608808e-07, 'epoch': 2.61} +2025-02-06 03:55:17 - ERROR - stderr - 87%|████████▋ | 19485/22434 [17:47:36<33:24:55, 40.79s/it] +2025-02-06 03:56:25 - ERROR - stderr - 87%|████████▋ | 19486/22434 [17:48:45<40:09:04, 49.03s/it] +2025-02-06 03:56:25 - ERROR - stderr - +2025-02-06 03:56:25 - ERROR - stderr - +2025-02-06 03:56:25 - INFO - stdout - {'loss': 0.3629, 'grad_norm': 1.7058250904083252, 'learning_rate': 8.921563787231169e-07, 'epoch': 2.61} +2025-02-06 03:56:25 - ERROR - stderr - 87%|████████▋ | 19486/22434 [17:48:45<40:09:04, 49.03s/it] +2025-02-06 03:57:31 - ERROR - stderr - 87%|████████▋ | 19487/22434 [17:49:51<44:22:19, 54.20s/it] +2025-02-06 03:57:31 - ERROR - stderr - +2025-02-06 03:57:31 - ERROR - stderr - +2025-02-06 03:57:31 - INFO - stdout - {'loss': 0.3227, 'grad_norm': 1.6321206092834473, 'learning_rate': 8.915603761296354e-07, 'epoch': 2.61} +2025-02-06 03:57:31 - ERROR - stderr - 87%|████████▋ | 19487/22434 [17:49:51<44:22:19, 54.20s/it] +2025-02-06 03:57:34 - ERROR - stderr - 87%|████████▋ | 19488/22434 [17:49:53<31:40:07, 38.70s/it] +2025-02-06 03:57:34 - ERROR - stderr - +2025-02-06 03:57:34 - ERROR - stderr - +2025-02-06 03:57:34 - INFO - stdout - {'loss': 0.3696, 'grad_norm': 1.592790961265564, 'learning_rate': 8.909645633928643e-07, 'epoch': 2.61} +2025-02-06 03:57:34 - ERROR - stderr - 87%|████████▋ | 19488/22434 [17:49:53<31:40:07, 38.70s/it] +2025-02-06 03:58:29 - ERROR - stderr - 87%|████████▋ | 19489/22434 [17:50:49<35:43:49, 43.68s/it] +2025-02-06 03:58:29 - ERROR - stderr - +2025-02-06 03:58:29 - ERROR - stderr - +2025-02-06 03:58:29 - INFO - stdout - {'loss': 0.3484, 'grad_norm': 1.5909035205841064, 'learning_rate': 8.903689405252203e-07, 'epoch': 2.61} +2025-02-06 03:58:29 
- ERROR - stderr - 87%|████████▋ | 19489/22434 [17:50:49<35:43:49, 43.68s/it] +2025-02-06 03:59:27 - ERROR - stderr - 87%|████████▋ | 19490/22434 [17:51:46<39:10:58, 47.91s/it] +2025-02-06 03:59:27 - ERROR - stderr - +2025-02-06 03:59:27 - ERROR - stderr - +2025-02-06 03:59:27 - INFO - stdout - {'loss': 0.3731, 'grad_norm': 1.5592460632324219, 'learning_rate': 8.897735075391156e-07, 'epoch': 2.61} +2025-02-06 03:59:27 - ERROR - stderr - 87%|████████▋ | 19490/22434 [17:51:46<39:10:58, 47.91s/it] +2025-02-06 03:59:35 - ERROR - stderr - 87%|████████▋ | 19491/22434 [17:51:55<29:29:15, 36.07s/it] +2025-02-06 03:59:35 - ERROR - stderr - +2025-02-06 03:59:35 - ERROR - stderr - +2025-02-06 03:59:35 - INFO - stdout - {'loss': 0.4333, 'grad_norm': 1.6861019134521484, 'learning_rate': 8.891782644469693e-07, 'epoch': 2.61} +2025-02-06 03:59:35 - ERROR - stderr - 87%|████████▋ | 19491/22434 [17:51:55<29:29:15, 36.07s/it] +2025-02-06 03:59:38 - ERROR - stderr - 87%|████████▋ | 19492/22434 [17:51:57<21:14:52, 26.00s/it] +2025-02-06 03:59:38 - ERROR - stderr - +2025-02-06 03:59:38 - ERROR - stderr - +2025-02-06 03:59:38 - INFO - stdout - {'loss': 0.3467, 'grad_norm': 1.4520800113677979, 'learning_rate': 8.885832112611814e-07, 'epoch': 2.61} +2025-02-06 03:59:38 - ERROR - stderr - 87%|████████▋ | 19492/22434 [17:51:57<21:14:52, 26.00s/it] +2025-02-06 03:59:40 - ERROR - stderr - 87%|████████▋ | 19493/22434 [17:52:00<15:30:54, 18.99s/it] +2025-02-06 03:59:40 - ERROR - stderr - +2025-02-06 03:59:40 - ERROR - stderr - +2025-02-06 03:59:40 - INFO - stdout - {'loss': 0.3856, 'grad_norm': 1.4956095218658447, 'learning_rate': 8.879883479941576e-07, 'epoch': 2.61} +2025-02-06 03:59:40 - ERROR - stderr - 87%|████████▋ | 19493/22434 [17:52:00<15:30:54, 18.99s/it] +2025-02-06 03:59:43 - ERROR - stderr - 87%|████████▋ | 19494/22434 [17:52:03<11:28:31, 14.05s/it] +2025-02-06 03:59:43 - ERROR - stderr - +2025-02-06 03:59:43 - ERROR - stderr - +2025-02-06 03:59:43 - INFO - stdout - {'loss': 
0.3199, 'grad_norm': 1.382568120956421, 'learning_rate': 8.873936746582978e-07, 'epoch': 2.61} +2025-02-06 03:59:43 - ERROR - stderr - 87%|████████▋ | 19494/22434 [17:52:03<11:28:31, 14.05s/it] +2025-02-06 03:59:45 - ERROR - stderr - 87%|████████▋ | 19495/22434 [17:52:05<8:38:03, 10.58s/it] +2025-02-06 03:59:45 - ERROR - stderr - +2025-02-06 03:59:45 - ERROR - stderr - +2025-02-06 03:59:45 - INFO - stdout - {'loss': 0.3925, 'grad_norm': 1.7488336563110352, 'learning_rate': 8.867991912659979e-07, 'epoch': 2.61} +2025-02-06 03:59:45 - ERROR - stderr - 87%|████████▋ | 19495/22434 [17:52:05<8:38:03, 10.58s/it] +2025-02-06 03:59:48 - ERROR - stderr - 87%|████████▋ | 19496/22434 [17:52:08<6:39:44, 8.16s/it] +2025-02-06 03:59:48 - ERROR - stderr - +2025-02-06 03:59:48 - ERROR - stderr - +2025-02-06 03:59:48 - INFO - stdout - {'loss': 0.3257, 'grad_norm': 1.5171436071395874, 'learning_rate': 8.862048978296467e-07, 'epoch': 2.61} +2025-02-06 03:59:48 - ERROR - stderr - 87%|████████▋ | 19496/22434 [17:52:08<6:39:44, 8.16s/it] +2025-02-06 03:59:53 - ERROR - stderr - 87%|████████▋ | 19497/22434 [17:52:13<5:52:49, 7.21s/it] +2025-02-06 03:59:53 - ERROR - stderr - +2025-02-06 03:59:53 - ERROR - stderr - +2025-02-06 03:59:53 - INFO - stdout - {'loss': 0.3637, 'grad_norm': 1.5666707754135132, 'learning_rate': 8.856107943616343e-07, 'epoch': 2.61} +2025-02-06 03:59:53 - ERROR - stderr - 87%|████████▋ | 19497/22434 [17:52:13<5:52:49, 7.21s/it] +2025-02-06 03:59:55 - ERROR - stderr - 87%|████████▋ | 19498/22434 [17:52:15<4:43:10, 5.79s/it] +2025-02-06 03:59:55 - ERROR - stderr - +2025-02-06 03:59:55 - ERROR - stderr - +2025-02-06 03:59:55 - INFO - stdout - {'loss': 0.378, 'grad_norm': 1.5899471044540405, 'learning_rate': 8.850168808743442e-07, 'epoch': 2.61} +2025-02-06 03:59:55 - ERROR - stderr - 87%|████████▋ | 19498/22434 [17:52:15<4:43:10, 5.79s/it] +2025-02-06 03:59:58 - ERROR - stderr - 87%|████████▋ | 19499/22434 [17:52:17<3:54:16, 4.79s/it] +2025-02-06 03:59:58 - ERROR - 
stderr - +2025-02-06 03:59:58 - ERROR - stderr - +2025-02-06 03:59:58 - INFO - stdout - {'loss': 0.3676, 'grad_norm': 1.711914300918579, 'learning_rate': 8.844231573801543e-07, 'epoch': 2.61} +2025-02-06 03:59:58 - ERROR - stderr - 87%|████████▋ | 19499/22434 [17:52:17<3:54:16, 4.79s/it] +2025-02-06 04:00:04 - ERROR - stderr - 87%|████████▋ | 19500/22434 [17:52:23<4:11:29, 5.14s/it] +2025-02-06 04:00:04 - ERROR - stderr - +2025-02-06 04:00:04 - ERROR - stderr - +2025-02-06 04:00:04 - INFO - stdout - {'loss': 0.337, 'grad_norm': 1.3864308595657349, 'learning_rate': 8.838296238914424e-07, 'epoch': 2.61} +2025-02-06 04:00:04 - ERROR - stderr - 87%|████████▋ | 19500/22434 [17:52:23<4:11:29, 5.14s/it] +2025-02-06 04:00:46 - ERROR - stderr - 87%|████████▋ | 19501/22434 [17:53:06<13:20:19, 16.37s/it] +2025-02-06 04:00:46 - ERROR - stderr - +2025-02-06 04:00:46 - ERROR - stderr - +2025-02-06 04:00:46 - INFO - stdout - {'loss': 0.3537, 'grad_norm': 1.355744481086731, 'learning_rate': 8.832362804205763e-07, 'epoch': 2.61} +2025-02-06 04:00:46 - ERROR - stderr - 87%|████████▋ | 19501/22434 [17:53:06<13:20:19, 16.37s/it] +2025-02-06 04:01:24 - ERROR - stderr - 87%|████████▋ | 19502/22434 [17:53:44<18:38:33, 22.89s/it] +2025-02-06 04:01:24 - ERROR - stderr - +2025-02-06 04:01:24 - ERROR - stderr - +2025-02-06 04:01:24 - INFO - stdout - {'loss': 0.3434, 'grad_norm': 1.5719846487045288, 'learning_rate': 8.826431269799274e-07, 'epoch': 2.61} +2025-02-06 04:01:24 - ERROR - stderr - 87%|████████▋ | 19502/22434 [17:53:44<18:38:33, 22.89s/it] +2025-02-06 04:01:54 - ERROR - stderr - 87%|████████▋ | 19503/22434 [17:54:14<20:25:07, 25.08s/it] +2025-02-06 04:01:55 - ERROR - stderr - +2025-02-06 04:01:55 - ERROR - stderr - +2025-02-06 04:01:55 - INFO - stdout - {'loss': 0.3558, 'grad_norm': 1.5453073978424072, 'learning_rate': 8.820501635818579e-07, 'epoch': 2.61} +2025-02-06 04:01:55 - ERROR - stderr - 87%|████████▋ | 19503/22434 [17:54:14<20:25:07, 25.08s/it] +2025-02-06 04:02:19 - ERROR 
- stderr - 87%|████████▋ | 19504/22434 [17:54:39<20:14:50, 24.88s/it] +2025-02-06 04:02:19 - ERROR - stderr - +2025-02-06 04:02:19 - ERROR - stderr - +2025-02-06 04:02:19 - INFO - stdout - {'loss': 0.3451, 'grad_norm': 1.5635631084442139, 'learning_rate': 8.81457390238728e-07, 'epoch': 2.61} +2025-02-06 04:02:19 - ERROR - stderr - 87%|████████▋ | 19504/22434 [17:54:39<20:14:50, 24.88s/it] +2025-02-06 04:02:40 - ERROR - stderr - 87%|████████▋ | 19505/22434 [17:54:59<19:12:13, 23.60s/it] +2025-02-06 04:02:40 - ERROR - stderr - +2025-02-06 04:02:40 - ERROR - stderr - +2025-02-06 04:02:40 - INFO - stdout - {'loss': 0.3695, 'grad_norm': 1.740233063697815, 'learning_rate': 8.808648069628945e-07, 'epoch': 2.61} +2025-02-06 04:02:40 - ERROR - stderr - 87%|████████▋ | 19505/22434 [17:54:59<19:12:13, 23.60s/it] +2025-02-06 04:02:42 - ERROR - stderr - 87%|████████▋ | 19506/22434 [17:55:02<14:03:00, 17.27s/it] +2025-02-06 04:02:42 - ERROR - stderr - +2025-02-06 04:02:42 - ERROR - stderr - +2025-02-06 04:02:42 - INFO - stdout - {'loss': 0.352, 'grad_norm': 1.6847102642059326, 'learning_rate': 8.802724137667052e-07, 'epoch': 2.61} +2025-02-06 04:02:42 - ERROR - stderr - 87%|████████▋ | 19506/22434 [17:55:02<14:03:00, 17.27s/it] +2025-02-06 04:02:45 - ERROR - stderr - 87%|████████▋ | 19507/22434 [17:55:04<10:26:51, 12.85s/it] +2025-02-06 04:02:45 - ERROR - stderr - +2025-02-06 04:02:45 - ERROR - stderr - +2025-02-06 04:02:45 - INFO - stdout - {'loss': 0.3511, 'grad_norm': 1.497064471244812, 'learning_rate': 8.796802106625147e-07, 'epoch': 2.61} +2025-02-06 04:02:45 - ERROR - stderr - 87%|████████▋ | 19507/22434 [17:55:04<10:26:51, 12.85s/it] +2025-02-06 04:02:47 - ERROR - stderr - 87%|████████▋ | 19508/22434 [17:55:07<7:54:42, 9.73s/it] +2025-02-06 04:02:47 - ERROR - stderr - +2025-02-06 04:02:47 - ERROR - stderr - +2025-02-06 04:02:47 - INFO - stdout - {'loss': 0.3862, 'grad_norm': 1.622875452041626, 'learning_rate': 8.790881976626598e-07, 'epoch': 2.61} +2025-02-06 04:02:47 - 
ERROR - stderr - 87%|████████▋ | 19508/22434 [17:55:07<7:54:42, 9.73s/it] +2025-02-06 04:02:49 - ERROR - stderr - 87%|████████▋ | 19509/22434 [17:55:09<6:07:52, 7.55s/it] +2025-02-06 04:02:50 - ERROR - stderr - +2025-02-06 04:02:50 - ERROR - stderr - +2025-02-06 04:02:50 - INFO - stdout - {'loss': 0.3196, 'grad_norm': 1.5367298126220703, 'learning_rate': 8.784963747794828e-07, 'epoch': 2.61} +2025-02-06 04:02:50 - ERROR - stderr - 87%|████████▋ | 19509/22434 [17:55:09<6:07:52, 7.55s/it] +2025-02-06 04:03:09 - ERROR - stderr - 87%|████████▋ | 19510/22434 [17:55:29<9:05:04, 11.19s/it] +2025-02-06 04:03:09 - ERROR - stderr - +2025-02-06 04:03:09 - ERROR - stderr - +2025-02-06 04:03:09 - INFO - stdout - {'loss': 0.3517, 'grad_norm': 1.4905195236206055, 'learning_rate': 8.779047420253239e-07, 'epoch': 2.61} +2025-02-06 04:03:09 - ERROR - stderr - 87%|████████▋ | 19510/22434 [17:55:29<9:05:04, 11.19s/it] +2025-02-06 04:03:12 - ERROR - stderr - 87%|████████▋ | 19511/22434 [17:55:31<6:58:16, 8.59s/it] +2025-02-06 04:03:12 - ERROR - stderr - +2025-02-06 04:03:12 - ERROR - stderr - +2025-02-06 04:03:12 - INFO - stdout - {'loss': 0.3297, 'grad_norm': 1.3953361511230469, 'learning_rate': 8.773132994125089e-07, 'epoch': 2.61} +2025-02-06 04:03:12 - ERROR - stderr - 87%|████████▋ | 19511/22434 [17:55:31<6:58:16, 8.59s/it] +2025-02-06 04:03:14 - ERROR - stderr - 87%|████████▋ | 19512/22434 [17:55:34<5:28:21, 6.74s/it] +2025-02-06 04:03:14 - ERROR - stderr - +2025-02-06 04:03:14 - ERROR - stderr - +2025-02-06 04:03:14 - INFO - stdout - {'loss': 0.3256, 'grad_norm': 1.5885361433029175, 'learning_rate': 8.767220469533722e-07, 'epoch': 2.61} +2025-02-06 04:03:14 - ERROR - stderr - 87%|████████▋ | 19512/22434 [17:55:34<5:28:21, 6.74s/it] +2025-02-06 04:03:17 - ERROR - stderr - 87%|████████▋ | 19513/22434 [17:55:36<4:26:11, 5.47s/it] +2025-02-06 04:03:17 - ERROR - stderr - +2025-02-06 04:03:17 - ERROR - stderr - +2025-02-06 04:03:17 - INFO - stdout - {'loss': 0.3379, 'grad_norm': 
1.5141377449035645, 'learning_rate': 8.761309846602317e-07, 'epoch': 2.61} +2025-02-06 04:03:17 - ERROR - stderr - 87%|████████▋ | 19513/22434 [17:55:36<4:26:11, 5.47s/it] +2025-02-06 04:03:19 - ERROR - stderr - 87%|████████▋ | 19514/22434 [17:55:39<3:45:31, 4.63s/it] +2025-02-06 04:03:19 - ERROR - stderr - +2025-02-06 04:03:19 - ERROR - stderr - +2025-02-06 04:03:19 - INFO - stdout - {'loss': 0.3299, 'grad_norm': 1.4275989532470703, 'learning_rate': 8.75540112545411e-07, 'epoch': 2.61} +2025-02-06 04:03:19 - ERROR - stderr - 87%|████████▋ | 19514/22434 [17:55:39<3:45:31, 4.63s/it] +2025-02-06 04:03:41 - ERROR - stderr - 87%|████████▋ | 19515/22434 [17:56:01<7:57:56, 9.82s/it] +2025-02-06 04:03:41 - ERROR - stderr - +2025-02-06 04:03:41 - ERROR - stderr - +2025-02-06 04:03:41 - INFO - stdout - {'loss': 0.3724, 'grad_norm': 1.681443452835083, 'learning_rate': 8.749494306212247e-07, 'epoch': 2.61} +2025-02-06 04:03:41 - ERROR - stderr - 87%|████████▋ | 19515/22434 [17:56:01<7:57:56, 9.82s/it] +2025-02-06 04:03:44 - ERROR - stderr - 87%|████████▋ | 19516/22434 [17:56:03<6:09:56, 7.61s/it] +2025-02-06 04:03:44 - ERROR - stderr - +2025-02-06 04:03:44 - ERROR - stderr - +2025-02-06 04:03:44 - INFO - stdout - {'loss': 0.3886, 'grad_norm': 1.6418869495391846, 'learning_rate': 8.743589388999862e-07, 'epoch': 2.61} +2025-02-06 04:03:44 - ERROR - stderr - 87%|████████▋ | 19516/22434 [17:56:03<6:09:56, 7.61s/it] +2025-02-06 04:04:06 - ERROR - stderr - 87%|████████▋ | 19517/22434 [17:56:26<9:45:27, 12.04s/it] +2025-02-06 04:04:06 - ERROR - stderr - +2025-02-06 04:04:06 - ERROR - stderr - +2025-02-06 04:04:06 - INFO - stdout - {'loss': 0.3876, 'grad_norm': 1.7757841348648071, 'learning_rate': 8.737686373940036e-07, 'epoch': 2.61} +2025-02-06 04:04:06 - ERROR - stderr - 87%|████████▋ | 19517/22434 [17:56:26<9:45:27, 12.04s/it] +2025-02-06 04:04:19 - ERROR - stderr - 87%|████████▋ | 19518/22434 [17:56:39<9:59:08, 12.33s/it] +2025-02-06 04:04:19 - ERROR - stderr - +2025-02-06 
04:04:19 - ERROR - stderr - +2025-02-06 04:04:19 - INFO - stdout - {'loss': 0.3772, 'grad_norm': 1.6405843496322632, 'learning_rate': 8.731785261155801e-07, 'epoch': 2.61} +2025-02-06 04:04:19 - ERROR - stderr - 87%|████████▋ | 19518/22434 [17:56:39<9:59:08, 12.33s/it] +2025-02-06 04:04:22 - ERROR - stderr - 87%|████████▋ | 19519/22434 [17:56:41<7:35:47, 9.38s/it] +2025-02-06 04:04:22 - ERROR - stderr - +2025-02-06 04:04:22 - ERROR - stderr - +2025-02-06 04:04:22 - INFO - stdout - {'loss': 0.362, 'grad_norm': 1.518615484237671, 'learning_rate': 8.725886050770182e-07, 'epoch': 2.61} +2025-02-06 04:04:22 - ERROR - stderr - 87%|████████▋ | 19519/22434 [17:56:41<7:35:47, 9.38s/it] +2025-02-06 04:04:24 - ERROR - stderr - 87%|████████▋ | 19520/22434 [17:56:44<5:55:56, 7.33s/it] +2025-02-06 04:04:24 - ERROR - stderr - +2025-02-06 04:04:24 - ERROR - stderr - +2025-02-06 04:04:24 - INFO - stdout - {'loss': 0.3187, 'grad_norm': 1.3937513828277588, 'learning_rate': 8.719988742906116e-07, 'epoch': 2.61} +2025-02-06 04:04:24 - ERROR - stderr - 87%|████████▋ | 19520/22434 [17:56:44<5:55:56, 7.33s/it] +2025-02-06 04:04:27 - ERROR - stderr - 87%|████████▋ | 19521/22434 [17:56:46<4:46:18, 5.90s/it] +2025-02-06 04:04:27 - ERROR - stderr - +2025-02-06 04:04:27 - ERROR - stderr - +2025-02-06 04:04:27 - INFO - stdout - {'loss': 0.3671, 'grad_norm': 1.4609380960464478, 'learning_rate': 8.714093337686547e-07, 'epoch': 2.61} +2025-02-06 04:04:27 - ERROR - stderr - 87%|████████▋ | 19521/22434 [17:56:46<4:46:18, 5.90s/it] +2025-02-06 04:04:29 - ERROR - stderr - 87%|████████▋ | 19522/22434 [17:56:49<3:56:24, 4.87s/it] +2025-02-06 04:04:29 - ERROR - stderr - +2025-02-06 04:04:29 - ERROR - stderr - +2025-02-06 04:04:29 - INFO - stdout - {'loss': 0.3703, 'grad_norm': 1.57331120967865, 'learning_rate': 8.708199835234343e-07, 'epoch': 2.61} +2025-02-06 04:04:29 - ERROR - stderr - 87%|████████▋ | 19522/22434 [17:56:49<3:56:24, 4.87s/it] +2025-02-06 04:04:32 - ERROR - stderr - 87%|████████▋ | 
19523/22434 [17:56:51<3:21:34, 4.15s/it] +2025-02-06 04:04:32 - ERROR - stderr - +2025-02-06 04:04:32 - ERROR - stderr - +2025-02-06 04:04:32 - INFO - stdout - {'loss': 0.3474, 'grad_norm': 1.6052734851837158, 'learning_rate': 8.702308235672363e-07, 'epoch': 2.61} +2025-02-06 04:04:32 - ERROR - stderr - 87%|████████▋ | 19523/22434 [17:56:51<3:21:34, 4.15s/it] +2025-02-06 04:04:35 - ERROR - stderr - 87%|████████▋ | 19524/22434 [17:56:54<3:04:57, 3.81s/it] +2025-02-06 04:04:35 - ERROR - stderr - +2025-02-06 04:04:35 - ERROR - stderr - +2025-02-06 04:04:35 - INFO - stdout - {'loss': 0.3408, 'grad_norm': 1.6063714027404785, 'learning_rate': 8.696418539123419e-07, 'epoch': 2.61} +2025-02-06 04:04:35 - ERROR - stderr - 87%|████████▋ | 19524/22434 [17:56:54<3:04:57, 3.81s/it] +2025-02-06 04:04:37 - ERROR - stderr - 87%|████████▋ | 19525/22434 [17:56:57<2:45:53, 3.42s/it] +2025-02-06 04:04:37 - ERROR - stderr - +2025-02-06 04:04:37 - ERROR - stderr - +2025-02-06 04:04:37 - INFO - stdout - {'loss': 0.3496, 'grad_norm': 1.5783113241195679, 'learning_rate': 8.690530745710236e-07, 'epoch': 2.61} +2025-02-06 04:04:37 - ERROR - stderr - 87%|████████▋ | 19525/22434 [17:56:57<2:45:53, 3.42s/it] +2025-02-06 04:04:40 - ERROR - stderr - 87%|████████▋ | 19526/22434 [17:56:59<2:32:21, 3.14s/it] +2025-02-06 04:04:40 - ERROR - stderr - +2025-02-06 04:04:40 - ERROR - stderr - +2025-02-06 04:04:40 - INFO - stdout - {'loss': 0.3904, 'grad_norm': 1.717894196510315, 'learning_rate': 8.684644855555591e-07, 'epoch': 2.61} +2025-02-06 04:04:40 - ERROR - stderr - 87%|████████▋ | 19526/22434 [17:56:59<2:32:21, 3.14s/it] +2025-02-06 04:04:42 - ERROR - stderr - 87%|████████▋ | 19527/22434 [17:57:02<2:23:21, 2.96s/it] +2025-02-06 04:04:42 - ERROR - stderr - +2025-02-06 04:04:42 - ERROR - stderr - +2025-02-06 04:04:42 - INFO - stdout - {'loss': 0.3133, 'grad_norm': 1.2968883514404297, 'learning_rate': 8.67876086878211e-07, 'epoch': 2.61} +2025-02-06 04:04:42 - ERROR - stderr - 87%|████████▋ | 
19527/22434 [17:57:02<2:23:21, 2.96s/it] +2025-02-06 04:04:45 - ERROR - stderr - 87%|████████▋ | 19528/22434 [17:57:04<2:16:48, 2.82s/it] +2025-02-06 04:04:45 - ERROR - stderr - +2025-02-06 04:04:45 - ERROR - stderr - +2025-02-06 04:04:45 - INFO - stdout - {'loss': 0.3832, 'grad_norm': 1.5781841278076172, 'learning_rate': 8.672878785512495e-07, 'epoch': 2.61} +2025-02-06 04:04:45 - ERROR - stderr - 87%|████████▋ | 19528/22434 [17:57:04<2:16:48, 2.82s/it] +2025-02-06 04:04:47 - ERROR - stderr - 87%|████████▋ | 19529/22434 [17:57:07<2:12:50, 2.74s/it] +2025-02-06 04:04:47 - ERROR - stderr - +2025-02-06 04:04:47 - ERROR - stderr - +2025-02-06 04:04:47 - INFO - stdout - {'loss': 0.4019, 'grad_norm': 1.652010202407837, 'learning_rate': 8.666998605869348e-07, 'epoch': 2.61} +2025-02-06 04:04:47 - ERROR - stderr - 87%|████████▋ | 19529/22434 [17:57:07<2:12:50, 2.74s/it] +2025-02-06 04:04:50 - ERROR - stderr - 87%|████████▋ | 19530/22434 [17:57:10<2:09:52, 2.68s/it] +2025-02-06 04:04:50 - ERROR - stderr - +2025-02-06 04:04:50 - ERROR - stderr - +2025-02-06 04:04:50 - INFO - stdout - {'loss': 0.3497, 'grad_norm': 1.4182761907577515, 'learning_rate': 8.661120329975192e-07, 'epoch': 2.61} +2025-02-06 04:04:50 - ERROR - stderr - 87%|████████▋ | 19530/22434 [17:57:10<2:09:52, 2.68s/it] +2025-02-06 04:04:52 - ERROR - stderr - 87%|████████▋ | 19531/22434 [17:57:12<2:07:20, 2.63s/it] +2025-02-06 04:04:52 - ERROR - stderr - +2025-02-06 04:04:52 - ERROR - stderr - +2025-02-06 04:04:52 - INFO - stdout - {'loss': 0.3709, 'grad_norm': 1.486035704612732, 'learning_rate': 8.655243957952608e-07, 'epoch': 2.61} +2025-02-06 04:04:52 - ERROR - stderr - 87%|████████▋ | 19531/22434 [17:57:12<2:07:20, 2.63s/it] +2025-02-06 04:04:55 - ERROR - stderr - 87%|████████▋ | 19532/22434 [17:57:15<2:05:14, 2.59s/it] +2025-02-06 04:04:55 - ERROR - stderr - +2025-02-06 04:04:55 - ERROR - stderr - +2025-02-06 04:04:55 - INFO - stdout - {'loss': 0.3568, 'grad_norm': 1.6327126026153564, 'learning_rate': 
8.649369489924031e-07, 'epoch': 2.61} +2025-02-06 04:04:55 - ERROR - stderr - 87%|████████▋ | 19532/22434 [17:57:15<2:05:14, 2.59s/it] +2025-02-06 04:04:57 - ERROR - stderr - 87%|████████▋ | 19533/22434 [17:57:17<2:04:28, 2.57s/it] +2025-02-06 04:04:57 - ERROR - stderr - +2025-02-06 04:04:57 - ERROR - stderr - +2025-02-06 04:04:57 - INFO - stdout - {'loss': 0.3336, 'grad_norm': 1.621757984161377, 'learning_rate': 8.643496926011952e-07, 'epoch': 2.61} +2025-02-06 04:04:57 - ERROR - stderr - 87%|████████▋ | 19533/22434 [17:57:17<2:04:28, 2.57s/it] +2025-02-06 04:05:00 - ERROR - stderr - 87%|████████▋ | 19534/22434 [17:57:20<2:03:34, 2.56s/it] +2025-02-06 04:05:00 - ERROR - stderr - +2025-02-06 04:05:00 - ERROR - stderr - +2025-02-06 04:05:00 - INFO - stdout - {'loss': 0.3832, 'grad_norm': 1.5309854745864868, 'learning_rate': 8.63762626633875e-07, 'epoch': 2.61} +2025-02-06 04:05:00 - ERROR - stderr - 87%|████████▋ | 19534/22434 [17:57:20<2:03:34, 2.56s/it] +2025-02-06 04:05:24 - ERROR - stderr - 87%|████████▋ | 19535/22434 [17:57:44<7:15:19, 9.01s/it] +2025-02-06 04:05:24 - ERROR - stderr - +2025-02-06 04:05:24 - ERROR - stderr - +2025-02-06 04:05:24 - INFO - stdout - {'loss': 0.3724, 'grad_norm': 1.6868252754211426, 'learning_rate': 8.631757511026784e-07, 'epoch': 2.61} +2025-02-06 04:05:24 - ERROR - stderr - 87%|████████▋ | 19535/22434 [17:57:44<7:15:19, 9.01s/it] +2025-02-06 04:06:16 - ERROR - stderr - 87%|████████▋ | 19536/22434 [17:58:36<17:43:30, 22.02s/it] +2025-02-06 04:06:16 - ERROR - stderr - +2025-02-06 04:06:16 - ERROR - stderr - +2025-02-06 04:06:16 - INFO - stdout - {'loss': 0.3704, 'grad_norm': 1.661942720413208, 'learning_rate': 8.625890660198443e-07, 'epoch': 2.61} +2025-02-06 04:06:16 - ERROR - stderr - 87%|████████▋ | 19536/22434 [17:58:36<17:43:30, 22.02s/it] +2025-02-06 04:06:19 - ERROR - stderr - 87%|████████▋ | 19537/22434 [17:58:39<13:00:29, 16.16s/it] +2025-02-06 04:06:19 - ERROR - stderr - +2025-02-06 04:06:19 - ERROR - stderr - +2025-02-06 
04:06:19 - INFO - stdout - {'loss': 0.3703, 'grad_norm': 1.4981926679611206, 'learning_rate': 8.620025713975954e-07, 'epoch': 2.61} +2025-02-06 04:06:19 - ERROR - stderr - 87%|████████▋ | 19537/22434 [17:58:39<13:00:29, 16.16s/it] +2025-02-06 04:06:31 - ERROR - stderr - 87%|████████▋ | 19538/22434 [17:58:51<12:08:24, 15.09s/it] +2025-02-06 04:06:31 - ERROR - stderr - +2025-02-06 04:06:31 - ERROR - stderr - +2025-02-06 04:06:31 - INFO - stdout - {'loss': 0.3819, 'grad_norm': 1.9869073629379272, 'learning_rate': 8.614162672481585e-07, 'epoch': 2.61} +2025-02-06 04:06:31 - ERROR - stderr - 87%|████████▋ | 19538/22434 [17:58:51<12:08:24, 15.09s/it] +2025-02-06 04:07:16 - ERROR - stderr - 87%|████████▋ | 19539/22434 [17:59:36<19:17:36, 23.99s/it] +2025-02-06 04:07:16 - ERROR - stderr - +2025-02-06 04:07:16 - ERROR - stderr - +2025-02-06 04:07:16 - INFO - stdout - {'loss': 0.3604, 'grad_norm': 1.5965659618377686, 'learning_rate': 8.60830153583756e-07, 'epoch': 2.61} +2025-02-06 04:07:16 - ERROR - stderr - 87%|████████▋ | 19539/22434 [17:59:36<19:17:36, 23.99s/it] +2025-02-06 04:07:39 - ERROR - stderr - 87%|████████▋ | 19540/22434 [17:59:58<18:55:34, 23.54s/it] +2025-02-06 04:07:39 - ERROR - stderr - +2025-02-06 04:07:39 - ERROR - stderr - +2025-02-06 04:07:39 - INFO - stdout - {'loss': 0.3408, 'grad_norm': 1.652419090270996, 'learning_rate': 8.602442304166025e-07, 'epoch': 2.61} +2025-02-06 04:07:39 - ERROR - stderr - 87%|████████▋ | 19540/22434 [17:59:58<18:55:34, 23.54s/it] +2025-02-06 04:08:06 - ERROR - stderr - 87%|████████▋ | 19541/22434 [18:00:25<19:45:41, 24.59s/it] +2025-02-06 04:08:06 - ERROR - stderr - +2025-02-06 04:08:06 - ERROR - stderr - +2025-02-06 04:08:06 - INFO - stdout - {'loss': 0.3507, 'grad_norm': 1.572708010673523, 'learning_rate': 8.596584977589128e-07, 'epoch': 2.61} +2025-02-06 04:08:06 - ERROR - stderr - 87%|████████▋ | 19541/22434 [18:00:25<19:45:41, 24.59s/it] +2025-02-06 04:08:26 - ERROR - stderr - 87%|████████▋ | 19542/22434 
[18:00:46<18:47:11, 23.39s/it] +2025-02-06 04:08:26 - ERROR - stderr - +2025-02-06 04:08:26 - ERROR - stderr - +2025-02-06 04:08:26 - INFO - stdout - {'loss': 0.3642, 'grad_norm': 1.4481728076934814, 'learning_rate': 8.590729556228961e-07, 'epoch': 2.61} +2025-02-06 04:08:26 - ERROR - stderr - 87%|████████▋ | 19542/22434 [18:00:46<18:47:11, 23.39s/it] +2025-02-06 04:09:14 - ERROR - stderr - 87%|████████▋ | 19543/22434 [18:01:34<24:39:39, 30.71s/it] +2025-02-06 04:09:14 - ERROR - stderr - +2025-02-06 04:09:14 - ERROR - stderr - +2025-02-06 04:09:14 - INFO - stdout - {'loss': 0.3597, 'grad_norm': 1.5846220254898071, 'learning_rate': 8.584876040207557e-07, 'epoch': 2.61} +2025-02-06 04:09:14 - ERROR - stderr - 87%|████████▋ | 19543/22434 [18:01:34<24:39:39, 30.71s/it] +2025-02-06 04:09:59 - ERROR - stderr - 87%|████████▋ | 19544/22434 [18:02:19<28:05:03, 34.98s/it] +2025-02-06 04:09:59 - ERROR - stderr - +2025-02-06 04:09:59 - ERROR - stderr - +2025-02-06 04:09:59 - INFO - stdout - {'loss': 0.3948, 'grad_norm': 1.5735191106796265, 'learning_rate': 8.579024429646932e-07, 'epoch': 2.61} +2025-02-06 04:09:59 - ERROR - stderr - 87%|████████▋ | 19544/22434 [18:02:19<28:05:03, 34.98s/it] +2025-02-06 04:10:21 - ERROR - stderr - 87%|████████▋ | 19545/22434 [18:02:41<24:57:56, 31.11s/it] +2025-02-06 04:10:21 - ERROR - stderr - +2025-02-06 04:10:21 - ERROR - stderr - +2025-02-06 04:10:21 - INFO - stdout - {'loss': 0.366, 'grad_norm': 1.673148274421692, 'learning_rate': 8.573174724669087e-07, 'epoch': 2.61} +2025-02-06 04:10:21 - ERROR - stderr - 87%|████████▋ | 19545/22434 [18:02:41<24:57:56, 31.11s/it] +2025-02-06 04:10:34 - ERROR - stderr - 87%|████████▋ | 19546/22434 [18:02:54<20:34:44, 25.65s/it] +2025-02-06 04:10:34 - ERROR - stderr - +2025-02-06 04:10:34 - ERROR - stderr - +2025-02-06 04:10:34 - INFO - stdout - {'loss': 0.3632, 'grad_norm': 1.5137078762054443, 'learning_rate': 8.567326925395903e-07, 'epoch': 2.61} +2025-02-06 04:10:34 - ERROR - stderr - 87%|████████▋ | 
19546/22434 [18:02:54<20:34:44, 25.65s/it] +2025-02-06 04:11:29 - ERROR - stderr - 87%|████████▋ | 19547/22434 [18:03:48<27:34:23, 34.38s/it] +2025-02-06 04:11:29 - ERROR - stderr - +2025-02-06 04:11:29 - ERROR - stderr - +2025-02-06 04:11:29 - INFO - stdout - {'loss': 0.4049, 'grad_norm': 1.6499638557434082, 'learning_rate': 8.561481031949304e-07, 'epoch': 2.61} +2025-02-06 04:11:29 - ERROR - stderr - 87%|████████▋ | 19547/22434 [18:03:49<27:34:23, 34.38s/it] +2025-02-06 04:11:37 - ERROR - stderr - 87%|████████▋ | 19548/22434 [18:03:57<21:16:42, 26.54s/it] +2025-02-06 04:11:37 - ERROR - stderr - +2025-02-06 04:11:37 - ERROR - stderr - +2025-02-06 04:11:37 - INFO - stdout - {'loss': 0.3651, 'grad_norm': 1.5932596921920776, 'learning_rate': 8.555637044451138e-07, 'epoch': 2.61} +2025-02-06 04:11:37 - ERROR - stderr - 87%|████████▋ | 19548/22434 [18:03:57<21:16:42, 26.54s/it] +2025-02-06 04:12:29 - ERROR - stderr - 87%|████████▋ | 19549/22434 [18:04:49<27:22:27, 34.16s/it] +2025-02-06 04:12:29 - ERROR - stderr - +2025-02-06 04:12:29 - ERROR - stderr - +2025-02-06 04:12:29 - INFO - stdout - {'loss': 0.3166, 'grad_norm': 1.3847600221633911, 'learning_rate': 8.549794963023216e-07, 'epoch': 2.61} +2025-02-06 04:12:29 - ERROR - stderr - 87%|████████▋ | 19549/22434 [18:04:49<27:22:27, 34.16s/it] +2025-02-06 04:12:54 - ERROR - stderr - 87%|████████▋ | 19550/22434 [18:05:14<25:17:01, 31.56s/it] +2025-02-06 04:12:54 - ERROR - stderr - +2025-02-06 04:12:54 - ERROR - stderr - +2025-02-06 04:12:54 - INFO - stdout - {'loss': 0.357, 'grad_norm': 1.7683069705963135, 'learning_rate': 8.543954787787323e-07, 'epoch': 2.61} +2025-02-06 04:12:54 - ERROR - stderr - 87%|████████▋ | 19550/22434 [18:05:14<25:17:01, 31.56s/it] +2025-02-06 04:13:07 - ERROR - stderr - 87%|████████▋ | 19551/22434 [18:05:27<20:46:14, 25.94s/it] +2025-02-06 04:13:07 - ERROR - stderr - +2025-02-06 04:13:07 - ERROR - stderr - +2025-02-06 04:13:07 - INFO - stdout - {'loss': 0.4156, 'grad_norm': 1.8103740215301514, 
'learning_rate': 8.538116518865147e-07, 'epoch': 2.61} +2025-02-06 04:13:07 - ERROR - stderr - 87%|████████▋ | 19551/22434 [18:05:27<20:46:14, 25.94s/it] +2025-02-06 04:13:10 - ERROR - stderr - 87%|████████▋ | 19552/22434 [18:05:29<15:07:50, 18.90s/it] +2025-02-06 04:13:10 - ERROR - stderr - +2025-02-06 04:13:10 - ERROR - stderr - +2025-02-06 04:13:10 - INFO - stdout - {'loss': 0.3778, 'grad_norm': 1.7996622323989868, 'learning_rate': 8.532280156378447e-07, 'epoch': 2.61} +2025-02-06 04:13:10 - ERROR - stderr - 87%|████████▋ | 19552/22434 [18:05:30<15:07:50, 18.90s/it] +2025-02-06 04:13:54 - ERROR - stderr - 87%|████████▋ | 19553/22434 [18:06:14<21:14:05, 26.53s/it] +2025-02-06 04:13:54 - ERROR - stderr - +2025-02-06 04:13:54 - ERROR - stderr - +2025-02-06 04:13:54 - INFO - stdout - {'loss': 0.3197, 'grad_norm': 1.5456198453903198, 'learning_rate': 8.526445700448827e-07, 'epoch': 2.61} +2025-02-06 04:13:54 - ERROR - stderr - 87%|████████▋ | 19553/22434 [18:06:14<21:14:05, 26.53s/it] +2025-02-06 04:14:11 - ERROR - stderr - 87%|████████▋ | 19554/22434 [18:06:31<18:58:26, 23.72s/it] +2025-02-06 04:14:11 - ERROR - stderr - +2025-02-06 04:14:11 - ERROR - stderr - +2025-02-06 04:14:11 - INFO - stdout - {'loss': 0.3472, 'grad_norm': 1.5507893562316895, 'learning_rate': 8.520613151197899e-07, 'epoch': 2.61} +2025-02-06 04:14:11 - ERROR - stderr - 87%|████████▋ | 19554/22434 [18:06:31<18:58:26, 23.72s/it] +2025-02-06 04:14:24 - ERROR - stderr - 87%|████████▋ | 19555/22434 [18:06:44<16:21:58, 20.47s/it] +2025-02-06 04:14:24 - ERROR - stderr - +2025-02-06 04:14:24 - ERROR - stderr - +2025-02-06 04:14:24 - INFO - stdout - {'loss': 0.3421, 'grad_norm': 1.474548101425171, 'learning_rate': 8.514782508747288e-07, 'epoch': 2.62} +2025-02-06 04:14:24 - ERROR - stderr - 87%|████████▋ | 19555/22434 [18:06:44<16:21:58, 20.47s/it] +2025-02-06 04:15:12 - ERROR - stderr - 87%|████████▋ | 19556/22434 [18:07:32<22:58:34, 28.74s/it] +2025-02-06 04:15:12 - ERROR - stderr - +2025-02-06 
04:15:12 - ERROR - stderr - +2025-02-06 04:15:12 - INFO - stdout - {'loss': 0.3064, 'grad_norm': 1.3237581253051758, 'learning_rate': 8.508953773218454e-07, 'epoch': 2.62} +2025-02-06 04:15:12 - ERROR - stderr - 87%|████████▋ | 19556/22434 [18:07:32<22:58:34, 28.74s/it] +2025-02-06 04:15:36 - ERROR - stderr - 87%|████████▋ | 19557/22434 [18:07:55<21:42:50, 27.17s/it] +2025-02-06 04:15:36 - ERROR - stderr - +2025-02-06 04:15:36 - ERROR - stderr - +2025-02-06 04:15:36 - INFO - stdout - {'loss': 0.3606, 'grad_norm': 1.5358107089996338, 'learning_rate': 8.503126944732964e-07, 'epoch': 2.62} +2025-02-06 04:15:36 - ERROR - stderr - 87%|████████▋ | 19557/22434 [18:07:55<21:42:50, 27.17s/it] +2025-02-06 04:16:24 - ERROR - stderr - 87%|████████▋ | 19558/22434 [18:08:43<26:42:53, 33.44s/it] +2025-02-06 04:16:24 - ERROR - stderr - +2025-02-06 04:16:24 - ERROR - stderr - +2025-02-06 04:16:24 - INFO - stdout - {'loss': 0.3778, 'grad_norm': 1.5017247200012207, 'learning_rate': 8.497302023412235e-07, 'epoch': 2.62} +2025-02-06 04:16:24 - ERROR - stderr - 87%|████████▋ | 19558/22434 [18:08:43<26:42:53, 33.44s/it] +2025-02-06 04:16:26 - ERROR - stderr - 87%|████████▋ | 19559/22434 [18:08:46<19:18:46, 24.18s/it] +2025-02-06 04:16:26 - ERROR - stderr - +2025-02-06 04:16:26 - ERROR - stderr - +2025-02-06 04:16:26 - INFO - stdout - {'loss': 0.3362, 'grad_norm': 1.580004334449768, 'learning_rate': 8.491479009377679e-07, 'epoch': 2.62} +2025-02-06 04:16:26 - ERROR - stderr - 87%|████████▋ | 19559/22434 [18:08:46<19:18:46, 24.18s/it] +2025-02-06 04:16:49 - ERROR - stderr - 87%|████████▋ | 19560/22434 [18:09:08<18:52:45, 23.65s/it] +2025-02-06 04:16:49 - ERROR - stderr - +2025-02-06 04:16:49 - ERROR - stderr - +2025-02-06 04:16:49 - INFO - stdout - {'loss': 0.3984, 'grad_norm': 1.6635812520980835, 'learning_rate': 8.485657902750677e-07, 'epoch': 2.62} +2025-02-06 04:16:49 - ERROR - stderr - 87%|████████▋ | 19560/22434 [18:09:08<18:52:45, 23.65s/it] +2025-02-06 04:17:29 - ERROR - stderr - 
87%|████████▋ | 19561/22434 [18:09:48<22:46:07, 28.53s/it] +2025-02-06 04:17:29 - ERROR - stderr - +2025-02-06 04:17:29 - ERROR - stderr - +2025-02-06 04:17:29 - INFO - stdout - {'loss': 0.3263, 'grad_norm': 1.4983831644058228, 'learning_rate': 8.479838703652565e-07, 'epoch': 2.62} +2025-02-06 04:17:29 - ERROR - stderr - 87%|████████▋ | 19561/22434 [18:09:48<22:46:07, 28.53s/it] +2025-02-06 04:18:08 - ERROR - stderr - 87%|████████▋ | 19562/22434 [18:10:28<25:29:06, 31.95s/it] +2025-02-06 04:18:09 - ERROR - stderr - +2025-02-06 04:18:09 - ERROR - stderr - +2025-02-06 04:18:09 - INFO - stdout - {'loss': 0.3654, 'grad_norm': 1.586610198020935, 'learning_rate': 8.474021412204647e-07, 'epoch': 2.62} +2025-02-06 04:18:09 - ERROR - stderr - 87%|████████▋ | 19562/22434 [18:10:28<25:29:06, 31.95s/it] +2025-02-06 04:18:45 - ERROR - stderr - 87%|████████▋ | 19563/22434 [18:11:05<26:35:06, 33.34s/it] +2025-02-06 04:18:45 - ERROR - stderr - +2025-02-06 04:18:45 - ERROR - stderr - +2025-02-06 04:18:45 - INFO - stdout - {'loss': 0.3453, 'grad_norm': 1.5147449970245361, 'learning_rate': 8.468206028528158e-07, 'epoch': 2.62} +2025-02-06 04:18:45 - ERROR - stderr - 87%|████████▋ | 19563/22434 [18:11:05<26:35:06, 33.34s/it] +2025-02-06 04:19:19 - ERROR - stderr - 87%|████████▋ | 19564/22434 [18:11:39<26:43:58, 33.53s/it] +2025-02-06 04:19:19 - ERROR - stderr - +2025-02-06 04:19:19 - ERROR - stderr - +2025-02-06 04:19:19 - INFO - stdout - {'loss': 0.3669, 'grad_norm': 1.6614540815353394, 'learning_rate': 8.462392552744347e-07, 'epoch': 2.62} +2025-02-06 04:19:19 - ERROR - stderr - 87%|████████▋ | 19564/22434 [18:11:39<26:43:58, 33.53s/it] +2025-02-06 04:19:22 - ERROR - stderr - 87%|████████▋ | 19565/22434 [18:11:41<19:18:36, 24.23s/it] +2025-02-06 04:19:22 - ERROR - stderr - +2025-02-06 04:19:22 - ERROR - stderr - +2025-02-06 04:19:22 - INFO - stdout - {'loss': 0.3738, 'grad_norm': 1.4653871059417725, 'learning_rate': 8.45658098497436e-07, 'epoch': 2.62} +2025-02-06 04:19:22 - ERROR - 
stderr - 87%|████████▋ | 19565/22434 [18:11:41<19:18:36, 24.23s/it] +2025-02-06 04:19:49 - ERROR - stderr - 87%|████████▋ | 19566/22434 [18:12:09<20:08:18, 25.28s/it] +2025-02-06 04:19:49 - ERROR - stderr - +2025-02-06 04:19:49 - ERROR - stderr - +2025-02-06 04:19:49 - INFO - stdout - {'loss': 0.3735, 'grad_norm': 1.6000083684921265, 'learning_rate': 8.450771325339346e-07, 'epoch': 2.62} +2025-02-06 04:19:49 - ERROR - stderr - 87%|████████▋ | 19566/22434 [18:12:09<20:08:18, 25.28s/it] +2025-02-06 04:19:52 - ERROR - stderr - 87%|████████▋ | 19567/22434 [18:12:12<14:40:51, 18.43s/it] +2025-02-06 04:19:52 - ERROR - stderr - +2025-02-06 04:19:52 - ERROR - stderr - +2025-02-06 04:19:52 - INFO - stdout - {'loss': 0.3564, 'grad_norm': 1.5855908393859863, 'learning_rate': 8.444963573960396e-07, 'epoch': 2.62} +2025-02-06 04:19:52 - ERROR - stderr - 87%|████████▋ | 19567/22434 [18:12:12<14:40:51, 18.43s/it] +2025-02-06 04:20:15 - ERROR - stderr - 87%|████████▋ | 19568/22434 [18:12:35<15:47:44, 19.84s/it] +2025-02-06 04:20:15 - ERROR - stderr - +2025-02-06 04:20:15 - ERROR - stderr - +2025-02-06 04:20:15 - INFO - stdout - {'loss': 0.4145, 'grad_norm': 1.770967721939087, 'learning_rate': 8.43915773095858e-07, 'epoch': 2.62} +2025-02-06 04:20:15 - ERROR - stderr - 87%|████████▋ | 19568/22434 [18:12:35<15:47:44, 19.84s/it] +2025-02-06 04:20:17 - ERROR - stderr - 87%|████████▋ | 19569/22434 [18:12:37<11:39:40, 14.65s/it] +2025-02-06 04:20:18 - ERROR - stderr - +2025-02-06 04:20:18 - ERROR - stderr - +2025-02-06 04:20:18 - INFO - stdout - {'loss': 0.3939, 'grad_norm': 1.5113626718521118, 'learning_rate': 8.433353796454924e-07, 'epoch': 2.62} +2025-02-06 04:20:18 - ERROR - stderr - 87%|████████▋ | 19569/22434 [18:12:37<11:39:40, 14.65s/it] +2025-02-06 04:20:35 - ERROR - stderr - 87%|████████▋ | 19570/22434 [18:12:55<12:19:51, 15.50s/it] +2025-02-06 04:20:35 - ERROR - stderr - +2025-02-06 04:20:35 - ERROR - stderr - +2025-02-06 04:20:35 - INFO - stdout - {'loss': 0.3619, 
'grad_norm': 1.5077701807022095, 'learning_rate': 8.427551770570352e-07, 'epoch': 2.62} +2025-02-06 04:20:35 - ERROR - stderr - 87%|████████▋ | 19570/22434 [18:12:55<12:19:51, 15.50s/it] +2025-02-06 04:20:38 - ERROR - stderr - 87%|████████▋ | 19571/22434 [18:12:58<9:17:56, 11.69s/it] +2025-02-06 04:20:38 - ERROR - stderr - +2025-02-06 04:20:38 - ERROR - stderr - +2025-02-06 04:20:38 - INFO - stdout - {'loss': 0.421, 'grad_norm': 1.491633415222168, 'learning_rate': 8.421751653425869e-07, 'epoch': 2.62} +2025-02-06 04:20:38 - ERROR - stderr - 87%|████████▋ | 19571/22434 [18:12:58<9:17:56, 11.69s/it] +2025-02-06 04:20:55 - ERROR - stderr - 87%|████████▋ | 19572/22434 [18:13:15<10:39:32, 13.41s/it] +2025-02-06 04:20:55 - ERROR - stderr - +2025-02-06 04:20:55 - ERROR - stderr - +2025-02-06 04:20:55 - INFO - stdout - {'loss': 0.3352, 'grad_norm': 1.4598846435546875, 'learning_rate': 8.415953445142311e-07, 'epoch': 2.62} +2025-02-06 04:20:55 - ERROR - stderr - 87%|████████▋ | 19572/22434 [18:13:15<10:39:32, 13.41s/it] +2025-02-06 04:20:58 - ERROR - stderr - 87%|████████▋ | 19573/22434 [18:13:17<8:04:14, 10.16s/it] +2025-02-06 04:20:58 - ERROR - stderr - +2025-02-06 04:20:58 - ERROR - stderr - +2025-02-06 04:20:58 - INFO - stdout - {'loss': 0.3692, 'grad_norm': 1.5487326383590698, 'learning_rate': 8.41015714584058e-07, 'epoch': 2.62} +2025-02-06 04:20:58 - ERROR - stderr - 87%|████████▋ | 19573/22434 [18:13:18<8:04:14, 10.16s/it] +2025-02-06 04:21:00 - ERROR - stderr - 87%|████████▋ | 19574/22434 [18:13:20<6:17:33, 7.92s/it] +2025-02-06 04:21:00 - ERROR - stderr - +2025-02-06 04:21:00 - ERROR - stderr - +2025-02-06 04:21:00 - INFO - stdout - {'loss': 0.4021, 'grad_norm': 1.6238900423049927, 'learning_rate': 8.404362755641504e-07, 'epoch': 2.62} +2025-02-06 04:21:00 - ERROR - stderr - 87%|████████▋ | 19574/22434 [18:13:20<6:17:33, 7.92s/it] +2025-02-06 04:21:03 - ERROR - stderr - 87%|████████▋ | 19575/22434 [18:13:23<4:59:44, 6.29s/it] +2025-02-06 04:21:03 - ERROR - stderr 
- +2025-02-06 04:21:03 - ERROR - stderr - +2025-02-06 04:21:03 - INFO - stdout - {'loss': 0.3434, 'grad_norm': 1.4055436849594116, 'learning_rate': 8.398570274665796e-07, 'epoch': 2.62} +2025-02-06 04:21:03 - ERROR - stderr - 87%|████████▋ | 19575/22434 [18:13:23<4:59:44, 6.29s/it] +2025-02-06 04:21:05 - ERROR - stderr - 87%|████████▋ | 19576/22434 [18:13:25<4:06:11, 5.17s/it] +2025-02-06 04:21:06 - ERROR - stderr - +2025-02-06 04:21:06 - ERROR - stderr - +2025-02-06 04:21:06 - INFO - stdout - {'loss': 0.3649, 'grad_norm': 1.5884904861450195, 'learning_rate': 8.392779703034281e-07, 'epoch': 2.62} +2025-02-06 04:21:06 - ERROR - stderr - 87%|████████▋ | 19576/22434 [18:13:25<4:06:11, 5.17s/it] +2025-02-06 04:21:08 - ERROR - stderr - 87%|████████▋ | 19577/22434 [18:13:28<3:29:12, 4.39s/it] +2025-02-06 04:21:08 - ERROR - stderr - +2025-02-06 04:21:08 - ERROR - stderr - +2025-02-06 04:21:08 - INFO - stdout - {'loss': 0.3644, 'grad_norm': 1.6798869371414185, 'learning_rate': 8.386991040867598e-07, 'epoch': 2.62} +2025-02-06 04:21:08 - ERROR - stderr - 87%|████████▋ | 19577/22434 [18:13:28<3:29:12, 4.39s/it] +2025-02-06 04:21:10 - ERROR - stderr - 87%|████████▋ | 19578/22434 [18:13:30<3:00:56, 3.80s/it] +2025-02-06 04:21:11 - ERROR - stderr - +2025-02-06 04:21:11 - ERROR - stderr - +2025-02-06 04:21:11 - INFO - stdout - {'loss': 0.385, 'grad_norm': 1.634273648262024, 'learning_rate': 8.381204288286415e-07, 'epoch': 2.62} +2025-02-06 04:21:11 - ERROR - stderr - 87%|████████▋ | 19578/22434 [18:13:30<3:00:56, 3.80s/it] +2025-02-06 04:21:13 - ERROR - stderr - 87%|████████▋ | 19579/22434 [18:13:33<2:43:24, 3.43s/it] +2025-02-06 04:21:13 - ERROR - stderr - +2025-02-06 04:21:13 - ERROR - stderr - +2025-02-06 04:21:13 - INFO - stdout - {'loss': 0.3734, 'grad_norm': 1.5996190309524536, 'learning_rate': 8.37541944541137e-07, 'epoch': 2.62} +2025-02-06 04:21:13 - ERROR - stderr - 87%|████████▋ | 19579/22434 [18:13:33<2:43:24, 3.43s/it] +2025-02-06 04:21:15 - ERROR - stderr - 
87%|████████▋ | 19580/22434 [18:13:35<2:28:52, 3.13s/it] +2025-02-06 04:21:16 - ERROR - stderr - +2025-02-06 04:21:16 - ERROR - stderr - +2025-02-06 04:21:16 - INFO - stdout - {'loss': 0.365, 'grad_norm': 1.6183395385742188, 'learning_rate': 8.369636512363e-07, 'epoch': 2.62} +2025-02-06 04:21:16 - ERROR - stderr - 87%|████████▋ | 19580/22434 [18:13:35<2:28:52, 3.13s/it] +2025-02-06 04:21:32 - ERROR - stderr - 87%|████████▋ | 19581/22434 [18:13:52<5:40:05, 7.15s/it] +2025-02-06 04:21:32 - ERROR - stderr - +2025-02-06 04:21:32 - ERROR - stderr - +2025-02-06 04:21:32 - INFO - stdout - {'loss': 0.4239, 'grad_norm': 1.6020156145095825, 'learning_rate': 8.363855489261918e-07, 'epoch': 2.62} +2025-02-06 04:21:32 - ERROR - stderr - 87%|████████▋ | 19581/22434 [18:13:52<5:40:05, 7.15s/it] +2025-02-06 04:21:35 - ERROR - stderr - 87%|████████▋ | 19582/22434 [18:13:54<4:33:53, 5.76s/it] +2025-02-06 04:21:35 - ERROR - stderr - +2025-02-06 04:21:35 - ERROR - stderr - +2025-02-06 04:21:35 - INFO - stdout - {'loss': 0.3665, 'grad_norm': 1.568663477897644, 'learning_rate': 8.358076376228563e-07, 'epoch': 2.62} +2025-02-06 04:21:35 - ERROR - stderr - 87%|████████▋ | 19582/22434 [18:13:54<4:33:53, 5.76s/it] +2025-02-06 04:21:37 - ERROR - stderr - 87%|████████▋ | 19583/22434 [18:13:57<3:47:13, 4.78s/it] +2025-02-06 04:21:37 - ERROR - stderr - +2025-02-06 04:21:37 - ERROR - stderr - +2025-02-06 04:21:37 - INFO - stdout - {'loss': 0.3857, 'grad_norm': 1.5924172401428223, 'learning_rate': 8.352299173383416e-07, 'epoch': 2.62} +2025-02-06 04:21:37 - ERROR - stderr - 87%|████████▋ | 19583/22434 [18:13:57<3:47:13, 4.78s/it] +2025-02-06 04:21:45 - ERROR - stderr - 87%|████████▋ | 19584/22434 [18:14:05<4:34:42, 5.78s/it] +2025-02-06 04:21:45 - ERROR - stderr - +2025-02-06 04:21:45 - ERROR - stderr - +2025-02-06 04:21:45 - INFO - stdout - {'loss': 0.4003, 'grad_norm': 1.6926302909851074, 'learning_rate': 8.346523880846902e-07, 'epoch': 2.62} +2025-02-06 04:21:45 - ERROR - stderr - 
87%|████████▋ | 19584/22434 [18:14:05<4:34:42, 5.78s/it] +2025-02-06 04:21:48 - ERROR - stderr - 87%|████████▋ | 19585/22434 [18:14:08<3:52:20, 4.89s/it] +2025-02-06 04:21:48 - ERROR - stderr - +2025-02-06 04:21:48 - ERROR - stderr - +2025-02-06 04:21:48 - INFO - stdout - {'loss': 0.3437, 'grad_norm': 1.5686759948730469, 'learning_rate': 8.340750498739381e-07, 'epoch': 2.62} +2025-02-06 04:21:48 - ERROR - stderr - 87%|████████▋ | 19585/22434 [18:14:08<3:52:20, 4.89s/it] +2025-02-06 04:21:50 - ERROR - stderr - 87%|████████▋ | 19586/22434 [18:14:10<3:18:42, 4.19s/it] +2025-02-06 04:21:51 - ERROR - stderr - +2025-02-06 04:21:51 - ERROR - stderr - +2025-02-06 04:21:51 - INFO - stdout - {'loss': 0.3568, 'grad_norm': 1.5210316181182861, 'learning_rate': 8.334979027181222e-07, 'epoch': 2.62} +2025-02-06 04:21:51 - ERROR - stderr - 87%|████████▋ | 19586/22434 [18:14:10<3:18:42, 4.19s/it] +2025-02-06 04:21:53 - ERROR - stderr - 87%|████████▋ | 19587/22434 [18:14:13<2:54:26, 3.68s/it] +2025-02-06 04:21:53 - ERROR - stderr - +2025-02-06 04:21:53 - ERROR - stderr - +2025-02-06 04:21:53 - INFO - stdout - {'loss': 0.3745, 'grad_norm': 1.5274758338928223, 'learning_rate': 8.329209466292698e-07, 'epoch': 2.62} +2025-02-06 04:21:53 - ERROR - stderr - 87%|████████▋ | 19587/22434 [18:14:13<2:54:26, 3.68s/it] +2025-02-06 04:21:56 - ERROR - stderr - 87%|████████▋ | 19588/22434 [18:14:15<2:39:28, 3.36s/it] +2025-02-06 04:21:56 - ERROR - stderr - +2025-02-06 04:21:56 - ERROR - stderr - +2025-02-06 04:21:56 - INFO - stdout - {'loss': 0.3453, 'grad_norm': 1.5009666681289673, 'learning_rate': 8.323441816194089e-07, 'epoch': 2.62} +2025-02-06 04:21:56 - ERROR - stderr - 87%|████████▋ | 19588/22434 [18:14:15<2:39:28, 3.36s/it] +2025-02-06 04:21:58 - ERROR - stderr - 87%|████████▋ | 19589/22434 [18:14:18<2:28:26, 3.13s/it] +2025-02-06 04:21:58 - ERROR - stderr - +2025-02-06 04:21:58 - ERROR - stderr - +2025-02-06 04:21:58 - INFO - stdout - {'loss': 0.362, 'grad_norm': 1.5530569553375244, 
'learning_rate': 8.31767607700561e-07, 'epoch': 2.62} +2025-02-06 04:21:58 - ERROR - stderr - 87%|████████▋ | 19589/22434 [18:14:18<2:28:26, 3.13s/it] +2025-02-06 04:22:01 - ERROR - stderr - 87%|████████▋ | 19590/22434 [18:14:21<2:20:38, 2.97s/it] +2025-02-06 04:22:01 - ERROR - stderr - +2025-02-06 04:22:01 - ERROR - stderr - +2025-02-06 04:22:01 - INFO - stdout - {'loss': 0.3681, 'grad_norm': 1.4850444793701172, 'learning_rate': 8.311912248847465e-07, 'epoch': 2.62} +2025-02-06 04:22:01 - ERROR - stderr - 87%|████████▋ | 19590/22434 [18:14:21<2:20:38, 2.97s/it] +2025-02-06 04:22:04 - ERROR - stderr - 87%|████████▋ | 19591/22434 [18:14:23<2:18:52, 2.93s/it] +2025-02-06 04:22:04 - ERROR - stderr - +2025-02-06 04:22:04 - ERROR - stderr - +2025-02-06 04:22:04 - INFO - stdout - {'loss': 0.3535, 'grad_norm': 1.4769301414489746, 'learning_rate': 8.306150331839735e-07, 'epoch': 2.62} +2025-02-06 04:22:04 - ERROR - stderr - 87%|████████▋ | 19591/22434 [18:14:23<2:18:52, 2.93s/it] +2025-02-06 04:22:06 - ERROR - stderr - 87%|████████▋ | 19592/22434 [18:14:26<2:12:33, 2.80s/it] +2025-02-06 04:22:06 - ERROR - stderr - +2025-02-06 04:22:06 - ERROR - stderr - +2025-02-06 04:22:06 - INFO - stdout - {'loss': 0.3686, 'grad_norm': 1.614455223083496, 'learning_rate': 8.30039032610257e-07, 'epoch': 2.62} +2025-02-06 04:22:06 - ERROR - stderr - 87%|████████▋ | 19592/22434 [18:14:26<2:12:33, 2.80s/it] +2025-02-06 04:22:09 - ERROR - stderr - 87%|████████▋ | 19593/22434 [18:14:28<2:08:20, 2.71s/it] +2025-02-06 04:22:09 - ERROR - stderr - +2025-02-06 04:22:09 - ERROR - stderr - +2025-02-06 04:22:09 - INFO - stdout - {'loss': 0.3828, 'grad_norm': 1.6423662900924683, 'learning_rate': 8.29463223175605e-07, 'epoch': 2.62} +2025-02-06 04:22:09 - ERROR - stderr - 87%|████████▋ | 19593/22434 [18:14:28<2:08:20, 2.71s/it] +2025-02-06 04:22:30 - ERROR - stderr - 87%|████████▋ | 19594/22434 [18:14:50<6:29:59, 8.24s/it] +2025-02-06 04:22:30 - ERROR - stderr - +2025-02-06 04:22:30 - ERROR - stderr - 
+2025-02-06 04:22:30 - INFO - stdout - {'loss': 0.3443, 'grad_norm': 1.5852645635604858, 'learning_rate': 8.288876048920125e-07, 'epoch': 2.62} +2025-02-06 04:22:30 - ERROR - stderr - 87%|████████▋ | 19594/22434 [18:14:50<6:29:59, 8.24s/it] +2025-02-06 04:22:49 - ERROR - stderr - 87%|████████▋ | 19595/22434 [18:15:08<9:01:30, 11.44s/it] +2025-02-06 04:22:49 - ERROR - stderr - +2025-02-06 04:22:49 - ERROR - stderr - +2025-02-06 04:22:49 - INFO - stdout - {'loss': 0.3491, 'grad_norm': 1.4702638387680054, 'learning_rate': 8.283121777714864e-07, 'epoch': 2.62} +2025-02-06 04:22:49 - ERROR - stderr - 87%|████████▋ | 19595/22434 [18:15:09<9:01:30, 11.44s/it] +2025-02-06 04:23:17 - ERROR - stderr - 87%|████████▋ | 19596/22434 [18:15:37<13:04:07, 16.58s/it] +2025-02-06 04:23:17 - ERROR - stderr - +2025-02-06 04:23:17 - ERROR - stderr - +2025-02-06 04:23:17 - INFO - stdout - {'loss': 0.3931, 'grad_norm': 1.6178233623504639, 'learning_rate': 8.277369418260129e-07, 'epoch': 2.62} +2025-02-06 04:23:17 - ERROR - stderr - 87%|████████▋ | 19596/22434 [18:15:37<13:04:07, 16.58s/it] +2025-02-06 04:23:46 - ERROR - stderr - 87%|████████▋ | 19597/22434 [18:16:05<15:50:08, 20.09s/it] +2025-02-06 04:23:46 - ERROR - stderr - +2025-02-06 04:23:46 - ERROR - stderr - +2025-02-06 04:23:46 - INFO - stdout - {'loss': 0.3689, 'grad_norm': 1.4203206300735474, 'learning_rate': 8.271618970675887e-07, 'epoch': 2.62} +2025-02-06 04:23:46 - ERROR - stderr - 87%|████████▋ | 19597/22434 [18:16:05<15:50:08, 20.09s/it] +2025-02-06 04:24:05 - ERROR - stderr - 87%|████████▋ | 19598/22434 [18:16:25<15:37:24, 19.83s/it] +2025-02-06 04:24:05 - ERROR - stderr - +2025-02-06 04:24:05 - ERROR - stderr - +2025-02-06 04:24:05 - INFO - stdout - {'loss': 0.3499, 'grad_norm': 1.5350168943405151, 'learning_rate': 8.265870435081957e-07, 'epoch': 2.62} +2025-02-06 04:24:05 - ERROR - stderr - 87%|████████▋ | 19598/22434 [18:16:25<15:37:24, 19.83s/it] +2025-02-06 04:24:38 - ERROR - stderr - 87%|████████▋ | 19599/22434 
[18:16:58<18:50:25, 23.92s/it] +2025-02-06 04:24:38 - ERROR - stderr - +2025-02-06 04:24:38 - ERROR - stderr - +2025-02-06 04:24:38 - INFO - stdout - {'loss': 0.378, 'grad_norm': 1.679657220840454, 'learning_rate': 8.260123811598164e-07, 'epoch': 2.62} +2025-02-06 04:24:38 - ERROR - stderr - 87%|████████▋ | 19599/22434 [18:16:58<18:50:25, 23.92s/it] +2025-02-06 04:25:14 - ERROR - stderr - 87%|████████▋ | 19600/22434 [18:17:34<21:44:12, 27.61s/it] +2025-02-06 04:25:14 - ERROR - stderr - +2025-02-06 04:25:14 - ERROR - stderr - +2025-02-06 04:25:14 - INFO - stdout - {'loss': 0.3375, 'grad_norm': 1.5770777463912964, 'learning_rate': 8.254379100344345e-07, 'epoch': 2.62} +2025-02-06 04:25:14 - ERROR - stderr - 87%|████████▋ | 19600/22434 [18:17:34<21:44:12, 27.61s/it] +2025-02-06 04:25:54 - ERROR - stderr - 87%|████████▋ | 19601/22434 [18:18:14<24:38:36, 31.32s/it] +2025-02-06 04:25:54 - ERROR - stderr - +2025-02-06 04:25:54 - ERROR - stderr - +2025-02-06 04:25:54 - INFO - stdout - {'loss': 0.3932, 'grad_norm': 1.8325660228729248, 'learning_rate': 8.248636301440171e-07, 'epoch': 2.62} +2025-02-06 04:25:54 - ERROR - stderr - 87%|████████▋ | 19601/22434 [18:18:14<24:38:36, 31.32s/it] +2025-02-06 04:26:07 - ERROR - stderr - 87%|████████▋ | 19602/22434 [18:18:26<20:07:58, 25.59s/it] +2025-02-06 04:26:07 - ERROR - stderr - +2025-02-06 04:26:07 - ERROR - stderr - +2025-02-06 04:26:07 - INFO - stdout - {'loss': 0.3876, 'grad_norm': 1.6508115530014038, 'learning_rate': 8.242895415005391e-07, 'epoch': 2.62} +2025-02-06 04:26:07 - ERROR - stderr - 87%|████████▋ | 19602/22434 [18:18:26<20:07:58, 25.59s/it] +2025-02-06 04:26:09 - ERROR - stderr - 87%|████████▋ | 19603/22434 [18:18:29<14:40:28, 18.66s/it] +2025-02-06 04:26:09 - ERROR - stderr - +2025-02-06 04:26:09 - ERROR - stderr - +2025-02-06 04:26:09 - INFO - stdout - {'loss': 0.3341, 'grad_norm': 1.4678726196289062, 'learning_rate': 8.237156441159644e-07, 'epoch': 2.62} +2025-02-06 04:26:09 - ERROR - stderr - 87%|████████▋ | 
19603/22434 [18:18:29<14:40:28, 18.66s/it] +2025-02-06 04:26:30 - ERROR - stderr - 87%|████████▋ | 19604/22434 [18:18:49<15:06:34, 19.22s/it] +2025-02-06 04:26:30 - ERROR - stderr - +2025-02-06 04:26:30 - ERROR - stderr - +2025-02-06 04:26:30 - INFO - stdout - {'loss': 0.3198, 'grad_norm': 1.3634788990020752, 'learning_rate': 8.231419380022576e-07, 'epoch': 2.62} +2025-02-06 04:26:30 - ERROR - stderr - 87%|████████▋ | 19604/22434 [18:18:49<15:06:34, 19.22s/it] +2025-02-06 04:27:08 - ERROR - stderr - 87%|████████▋ | 19605/22434 [18:19:28<19:41:28, 25.06s/it] +2025-02-06 04:27:08 - ERROR - stderr - +2025-02-06 04:27:08 - ERROR - stderr - +2025-02-06 04:27:08 - INFO - stdout - {'loss': 0.3584, 'grad_norm': 1.4159530401229858, 'learning_rate': 8.225684231713749e-07, 'epoch': 2.62} +2025-02-06 04:27:08 - ERROR - stderr - 87%|████████▋ | 19605/22434 [18:19:28<19:41:28, 25.06s/it] +2025-02-06 04:27:17 - ERROR - stderr - 87%|████████▋ | 19606/22434 [18:19:37<15:51:37, 20.19s/it] +2025-02-06 04:27:17 - ERROR - stderr - +2025-02-06 04:27:17 - ERROR - stderr - +2025-02-06 04:27:17 - INFO - stdout - {'loss': 0.4523, 'grad_norm': 1.751919150352478, 'learning_rate': 8.21995099635271e-07, 'epoch': 2.62} +2025-02-06 04:27:17 - ERROR - stderr - 87%|████████▋ | 19606/22434 [18:19:37<15:51:37, 20.19s/it] +2025-02-06 04:28:01 - ERROR - stderr - 87%|████████▋ | 19607/22434 [18:20:21<21:23:50, 27.25s/it] +2025-02-06 04:28:01 - ERROR - stderr - +2025-02-06 04:28:01 - ERROR - stderr - +2025-02-06 04:28:01 - INFO - stdout - {'loss': 0.3655, 'grad_norm': 1.6467260122299194, 'learning_rate': 8.214219674058976e-07, 'epoch': 2.62} +2025-02-06 04:28:01 - ERROR - stderr - 87%|████████▋ | 19607/22434 [18:20:21<21:23:50, 27.25s/it] +2025-02-06 04:28:47 - ERROR - stderr - 87%|████████▋ | 19608/22434 [18:21:07<25:48:46, 32.88s/it] +2025-02-06 04:28:47 - ERROR - stderr - +2025-02-06 04:28:47 - ERROR - stderr - +2025-02-06 04:28:47 - INFO - stdout - {'loss': 0.3208, 'grad_norm': 1.383855938911438, 
'learning_rate': 8.208490264952007e-07, 'epoch': 2.62} +2025-02-06 04:28:47 - ERROR - stderr - 87%|████████▋ | 19608/22434 [18:21:07<25:48:46, 32.88s/it] +2025-02-06 04:29:03 - ERROR - stderr - 87%|████████▋ | 19609/22434 [18:21:23<21:54:41, 27.92s/it] +2025-02-06 04:29:03 - ERROR - stderr - +2025-02-06 04:29:03 - ERROR - stderr - +2025-02-06 04:29:03 - INFO - stdout - {'loss': 0.3865, 'grad_norm': 1.5537952184677124, 'learning_rate': 8.202762769151229e-07, 'epoch': 2.62} +2025-02-06 04:29:03 - ERROR - stderr - 87%|████████▋ | 19609/22434 [18:21:23<21:54:41, 27.92s/it] +2025-02-06 04:29:48 - ERROR - stderr - 87%|████████▋ | 19610/22434 [18:22:08<25:49:49, 32.93s/it] +2025-02-06 04:29:48 - ERROR - stderr - +2025-02-06 04:29:48 - ERROR - stderr - +2025-02-06 04:29:48 - INFO - stdout - {'loss': 0.3455, 'grad_norm': 1.5524613857269287, 'learning_rate': 8.197037186776002e-07, 'epoch': 2.62} +2025-02-06 04:29:48 - ERROR - stderr - 87%|████████▋ | 19610/22434 [18:22:08<25:49:49, 32.93s/it] +2025-02-06 04:30:04 - ERROR - stderr - 87%|████████▋ | 19611/22434 [18:22:23<21:45:59, 27.76s/it] +2025-02-06 04:30:04 - ERROR - stderr - +2025-02-06 04:30:04 - ERROR - stderr - +2025-02-06 04:30:04 - INFO - stdout - {'loss': 0.3662, 'grad_norm': 1.5280252695083618, 'learning_rate': 8.191313517945698e-07, 'epoch': 2.62} +2025-02-06 04:30:04 - ERROR - stderr - 87%|████████▋ | 19611/22434 [18:22:23<21:45:59, 27.76s/it] +2025-02-06 04:30:17 - ERROR - stderr - 87%|████████▋ | 19612/22434 [18:22:37<18:30:23, 23.61s/it] +2025-02-06 04:30:18 - ERROR - stderr - +2025-02-06 04:30:18 - ERROR - stderr - +2025-02-06 04:30:18 - INFO - stdout - {'loss': 0.3882, 'grad_norm': 1.4856411218643188, 'learning_rate': 8.18559176277961e-07, 'epoch': 2.62} +2025-02-06 04:30:18 - ERROR - stderr - 87%|████████▋ | 19612/22434 [18:22:37<18:30:23, 23.61s/it] +2025-02-06 04:31:01 - ERROR - stderr - 87%|████████▋ | 19613/22434 [18:23:20<23:03:52, 29.43s/it] +2025-02-06 04:31:01 - ERROR - stderr - +2025-02-06 
04:31:01 - ERROR - stderr - +2025-02-06 04:31:01 - INFO - stdout - {'loss': 0.3456, 'grad_norm': 1.5660953521728516, 'learning_rate': 8.179871921396998e-07, 'epoch': 2.62} +2025-02-06 04:31:01 - ERROR - stderr - 87%|████████▋ | 19613/22434 [18:23:20<23:03:52, 29.43s/it] +2025-02-06 04:31:29 - ERROR - stderr - 87%|████████▋ | 19614/22434 [18:23:48<22:45:51, 29.06s/it] +2025-02-06 04:31:29 - ERROR - stderr - +2025-02-06 04:31:29 - ERROR - stderr - +2025-02-06 04:31:29 - INFO - stdout - {'loss': 0.3196, 'grad_norm': 1.3239666223526, 'learning_rate': 8.174153993917122e-07, 'epoch': 2.62} +2025-02-06 04:31:29 - ERROR - stderr - 87%|████████▋ | 19614/22434 [18:23:49<22:45:51, 29.06s/it] +2025-02-06 04:32:13 - ERROR - stderr - 87%|████████▋ | 19615/22434 [18:24:33<26:24:33, 33.73s/it] +2025-02-06 04:32:13 - ERROR - stderr - +2025-02-06 04:32:13 - ERROR - stderr - +2025-02-06 04:32:13 - INFO - stdout - {'loss': 0.348, 'grad_norm': 1.5252865552902222, 'learning_rate': 8.168437980459098e-07, 'epoch': 2.62} +2025-02-06 04:32:13 - ERROR - stderr - 87%|████████▋ | 19615/22434 [18:24:33<26:24:33, 33.73s/it] +2025-02-06 04:33:04 - ERROR - stderr - 87%|████████▋ | 19616/22434 [18:25:24<30:23:12, 38.82s/it] +2025-02-06 04:33:04 - ERROR - stderr - +2025-02-06 04:33:04 - ERROR - stderr - +2025-02-06 04:33:04 - INFO - stdout - {'loss': 0.3881, 'grad_norm': 1.4968665838241577, 'learning_rate': 8.162723881142154e-07, 'epoch': 2.62} +2025-02-06 04:33:04 - ERROR - stderr - 87%|████████▋ | 19616/22434 [18:25:24<30:23:12, 38.82s/it] +2025-02-06 04:33:11 - ERROR - stderr - 87%|████████▋ | 19617/22434 [18:25:31<22:54:12, 29.27s/it] +2025-02-06 04:33:11 - ERROR - stderr - +2025-02-06 04:33:11 - ERROR - stderr - +2025-02-06 04:33:11 - INFO - stdout - {'loss': 0.3784, 'grad_norm': 1.709981918334961, 'learning_rate': 8.157011696085326e-07, 'epoch': 2.62} +2025-02-06 04:33:11 - ERROR - stderr - 87%|████████▋ | 19617/22434 [18:25:31<22:54:12, 29.27s/it] +2025-02-06 04:33:14 - ERROR - stderr - 
87%|████████▋ | 19618/22434 [18:25:33<16:36:46, 21.24s/it] +2025-02-06 04:33:14 - ERROR - stderr - +2025-02-06 04:33:14 - ERROR - stderr - +2025-02-06 04:33:14 - INFO - stdout - {'loss': 0.345, 'grad_norm': 1.4801714420318604, 'learning_rate': 8.151301425407699e-07, 'epoch': 2.62} +2025-02-06 04:33:14 - ERROR - stderr - 87%|████████▋ | 19618/22434 [18:25:33<16:36:46, 21.24s/it] +2025-02-06 04:33:32 - ERROR - stderr - 87%|████████▋ | 19619/22434 [18:25:52<15:56:19, 20.38s/it] +2025-02-06 04:33:32 - ERROR - stderr - +2025-02-06 04:33:32 - ERROR - stderr - +2025-02-06 04:33:32 - INFO - stdout - {'loss': 0.3696, 'grad_norm': 1.7283940315246582, 'learning_rate': 8.145593069228331e-07, 'epoch': 2.62} +2025-02-06 04:33:32 - ERROR - stderr - 87%|████████▋ | 19619/22434 [18:25:52<15:56:19, 20.38s/it] +2025-02-06 04:33:58 - ERROR - stderr - 87%|████████▋ | 19620/22434 [18:26:18<17:18:54, 22.15s/it] +2025-02-06 04:33:58 - ERROR - stderr - +2025-02-06 04:33:58 - ERROR - stderr - +2025-02-06 04:33:58 - INFO - stdout - {'loss': 0.3496, 'grad_norm': 1.4477654695510864, 'learning_rate': 8.139886627666139e-07, 'epoch': 2.62} +2025-02-06 04:33:58 - ERROR - stderr - 87%|████████▋ | 19620/22434 [18:26:18<17:18:54, 22.15s/it] +2025-02-06 04:34:43 - ERROR - stderr - 87%|████████▋ | 19621/22434 [18:27:02<22:31:05, 28.82s/it] +2025-02-06 04:34:43 - ERROR - stderr - +2025-02-06 04:34:43 - ERROR - stderr - +2025-02-06 04:34:43 - INFO - stdout - {'loss': 0.3907, 'grad_norm': 1.8064619302749634, 'learning_rate': 8.134182100840149e-07, 'epoch': 2.62} +2025-02-06 04:34:43 - ERROR - stderr - 87%|████████▋ | 19621/22434 [18:27:02<22:31:05, 28.82s/it] +2025-02-06 04:35:16 - ERROR - stderr - 87%|████████▋ | 19622/22434 [18:27:36<23:33:54, 30.17s/it] +2025-02-06 04:35:16 - ERROR - stderr - +2025-02-06 04:35:16 - ERROR - stderr - +2025-02-06 04:35:16 - INFO - stdout - {'loss': 0.3422, 'grad_norm': 1.5552284717559814, 'learning_rate': 8.128479488869212e-07, 'epoch': 2.62} +2025-02-06 04:35:16 - ERROR 
- stderr - 87%|████████▋ | 19622/22434 [18:27:36<23:33:54, 30.17s/it] +2025-02-06 04:35:45 - ERROR - stderr - 87%|████████▋ | 19623/22434 [18:28:04<23:12:23, 29.72s/it] +2025-02-06 04:35:45 - ERROR - stderr - +2025-02-06 04:35:45 - ERROR - stderr - +2025-02-06 04:35:45 - INFO - stdout - {'loss': 0.385, 'grad_norm': 1.5775485038757324, 'learning_rate': 8.12277879187221e-07, 'epoch': 2.62} +2025-02-06 04:35:45 - ERROR - stderr - 87%|████████▋ | 19623/22434 [18:28:04<23:12:23, 29.72s/it] +2025-02-06 04:35:52 - ERROR - stderr - 87%|████████▋ | 19624/22434 [18:28:12<18:04:08, 23.15s/it] +2025-02-06 04:35:52 - ERROR - stderr - +2025-02-06 04:35:52 - ERROR - stderr - +2025-02-06 04:35:52 - INFO - stdout - {'loss': 0.377, 'grad_norm': 1.684749960899353, 'learning_rate': 8.117080009967971e-07, 'epoch': 2.62} +2025-02-06 04:35:52 - ERROR - stderr - 87%|████████▋ | 19624/22434 [18:28:12<18:04:08, 23.15s/it] +2025-02-06 04:36:26 - ERROR - stderr - 87%|████████▋ | 19625/22434 [18:28:45<20:26:32, 26.20s/it] +2025-02-06 04:36:26 - ERROR - stderr - +2025-02-06 04:36:26 - ERROR - stderr - +2025-02-06 04:36:26 - INFO - stdout - {'loss': 0.3258, 'grad_norm': 1.4373093843460083, 'learning_rate': 8.111383143275264e-07, 'epoch': 2.62} +2025-02-06 04:36:26 - ERROR - stderr - 87%|████████▋ | 19625/22434 [18:28:45<20:26:32, 26.20s/it] +2025-02-06 04:36:58 - ERROR - stderr - 87%|████████▋ | 19626/22434 [18:29:18<21:56:43, 28.14s/it] +2025-02-06 04:36:58 - ERROR - stderr - +2025-02-06 04:36:58 - ERROR - stderr - +2025-02-06 04:36:58 - INFO - stdout - {'loss': 0.3692, 'grad_norm': 1.5786436796188354, 'learning_rate': 8.105688191912852e-07, 'epoch': 2.62} +2025-02-06 04:36:58 - ERROR - stderr - 87%|████████▋ | 19626/22434 [18:29:18<21:56:43, 28.14s/it] +2025-02-06 04:37:01 - ERROR - stderr - 87%|████████▋ | 19627/22434 [18:29:21<15:56:51, 20.45s/it] +2025-02-06 04:37:01 - ERROR - stderr - +2025-02-06 04:37:01 - ERROR - stderr - +2025-02-06 04:37:01 - INFO - stdout - {'loss': 0.361, 
'grad_norm': 1.6357507705688477, 'learning_rate': 8.09999515599944e-07, 'epoch': 2.62} +2025-02-06 04:37:01 - ERROR - stderr - 87%|████████▋ | 19627/22434 [18:29:21<15:56:51, 20.45s/it] +2025-02-06 04:37:23 - ERROR - stderr - 87%|████████▋ | 19628/22434 [18:29:43<16:26:47, 21.10s/it] +2025-02-06 04:37:24 - ERROR - stderr - +2025-02-06 04:37:24 - ERROR - stderr - +2025-02-06 04:37:24 - INFO - stdout - {'loss': 0.3817, 'grad_norm': 1.6164336204528809, 'learning_rate': 8.094304035653689e-07, 'epoch': 2.62} +2025-02-06 04:37:24 - ERROR - stderr - 87%|████████▋ | 19628/22434 [18:29:43<16:26:47, 21.10s/it] +2025-02-06 04:37:26 - ERROR - stderr - 87%|████████▋ | 19629/22434 [18:29:46<12:06:29, 15.54s/it] +2025-02-06 04:37:26 - ERROR - stderr - +2025-02-06 04:37:26 - ERROR - stderr - +2025-02-06 04:37:26 - INFO - stdout - {'loss': 0.3476, 'grad_norm': 1.55000901222229, 'learning_rate': 8.088614830994223e-07, 'epoch': 2.62} +2025-02-06 04:37:26 - ERROR - stderr - 87%|████████▋ | 19629/22434 [18:29:46<12:06:29, 15.54s/it] +2025-02-06 04:37:29 - ERROR - stderr - 88%|████████▊ | 19630/22434 [18:29:48<9:03:55, 11.64s/it] +2025-02-06 04:37:29 - ERROR - stderr - +2025-02-06 04:37:29 - ERROR - stderr - +2025-02-06 04:37:29 - INFO - stdout - {'loss': 0.3452, 'grad_norm': 1.518129825592041, 'learning_rate': 8.08292754213964e-07, 'epoch': 2.63} +2025-02-06 04:37:29 - ERROR - stderr - 88%|████████▊ | 19630/22434 [18:29:48<9:03:55, 11.64s/it] +2025-02-06 04:37:31 - ERROR - stderr - 88%|████████▊ | 19631/22434 [18:29:51<6:56:16, 8.91s/it] +2025-02-06 04:37:31 - ERROR - stderr - +2025-02-06 04:37:31 - ERROR - stderr - +2025-02-06 04:37:31 - INFO - stdout - {'loss': 0.3613, 'grad_norm': 1.5852760076522827, 'learning_rate': 8.077242169208477e-07, 'epoch': 2.63} +2025-02-06 04:37:31 - ERROR - stderr - 88%|████████▊ | 19631/22434 [18:29:51<6:56:16, 8.91s/it] +2025-02-06 04:37:48 - ERROR - stderr - 88%|████████▊ | 19632/22434 [18:30:08<8:46:24, 11.27s/it] +2025-02-06 04:37:48 - ERROR - stderr 
- +2025-02-06 04:37:48 - ERROR - stderr - +2025-02-06 04:37:48 - INFO - stdout - {'loss': 0.3632, 'grad_norm': 1.5196270942687988, 'learning_rate': 8.071558712319227e-07, 'epoch': 2.63} +2025-02-06 04:37:48 - ERROR - stderr - 88%|████████▊ | 19632/22434 [18:30:08<8:46:24, 11.27s/it] +2025-02-06 04:37:50 - ERROR - stderr - 88%|████████▊ | 19633/22434 [18:30:10<6:44:12, 8.66s/it] +2025-02-06 04:37:51 - ERROR - stderr - +2025-02-06 04:37:51 - ERROR - stderr - +2025-02-06 04:37:51 - INFO - stdout - {'loss': 0.3713, 'grad_norm': 1.5668519735336304, 'learning_rate': 8.065877171590375e-07, 'epoch': 2.63} +2025-02-06 04:37:51 - ERROR - stderr - 88%|████████▊ | 19633/22434 [18:30:10<6:44:12, 8.66s/it] +2025-02-06 04:38:07 - ERROR - stderr - 88%|████████▊ | 19634/22434 [18:30:27<8:31:04, 10.95s/it] +2025-02-06 04:38:07 - ERROR - stderr - +2025-02-06 04:38:07 - ERROR - stderr - +2025-02-06 04:38:07 - INFO - stdout - {'loss': 0.3425, 'grad_norm': 1.7547022104263306, 'learning_rate': 8.060197547140347e-07, 'epoch': 2.63} +2025-02-06 04:38:07 - ERROR - stderr - 88%|████████▊ | 19634/22434 [18:30:27<8:31:04, 10.95s/it] +2025-02-06 04:38:23 - ERROR - stderr - 88%|████████▊ | 19635/22434 [18:30:43<9:50:02, 12.65s/it] +2025-02-06 04:38:23 - ERROR - stderr - +2025-02-06 04:38:23 - ERROR - stderr - +2025-02-06 04:38:23 - INFO - stdout - {'loss': 0.3395, 'grad_norm': 1.327179193496704, 'learning_rate': 8.054519839087537e-07, 'epoch': 2.63} +2025-02-06 04:38:23 - ERROR - stderr - 88%|████████▊ | 19635/22434 [18:30:43<9:50:02, 12.65s/it] +2025-02-06 04:38:26 - ERROR - stderr - 88%|████████▊ | 19636/22434 [18:30:46<7:33:11, 9.72s/it] +2025-02-06 04:38:26 - ERROR - stderr - +2025-02-06 04:38:26 - ERROR - stderr - +2025-02-06 04:38:26 - INFO - stdout - {'loss': 0.3283, 'grad_norm': 1.2485555410385132, 'learning_rate': 8.048844047550252e-07, 'epoch': 2.63} +2025-02-06 04:38:26 - ERROR - stderr - 88%|████████▊ | 19636/22434 [18:30:46<7:33:11, 9.72s/it] +2025-02-06 04:38:29 - ERROR - stderr - 
88%|████████▊ | 19637/22434 [18:30:49<5:55:47, 7.63s/it] +2025-02-06 04:38:29 - ERROR - stderr - +2025-02-06 04:38:29 - ERROR - stderr - +2025-02-06 04:38:29 - INFO - stdout - {'loss': 0.3324, 'grad_norm': 1.3951236009597778, 'learning_rate': 8.043170172646841e-07, 'epoch': 2.63} +2025-02-06 04:38:29 - ERROR - stderr - 88%|████████▊ | 19637/22434 [18:30:49<5:55:47, 7.63s/it] +2025-02-06 04:38:32 - ERROR - stderr - 88%|████████▊ | 19638/22434 [18:30:51<4:45:48, 6.13s/it] +2025-02-06 04:38:32 - ERROR - stderr - +2025-02-06 04:38:32 - ERROR - stderr - +2025-02-06 04:38:32 - INFO - stdout - {'loss': 0.3259, 'grad_norm': 1.4200693368911743, 'learning_rate': 8.037498214495565e-07, 'epoch': 2.63} +2025-02-06 04:38:32 - ERROR - stderr - 88%|████████▊ | 19638/22434 [18:30:51<4:45:48, 6.13s/it] +2025-02-06 04:38:37 - ERROR - stderr - 88%|████████▊ | 19639/22434 [18:30:57<4:39:31, 6.00s/it] +2025-02-06 04:38:37 - ERROR - stderr - +2025-02-06 04:38:37 - ERROR - stderr - +2025-02-06 04:38:37 - INFO - stdout - {'loss': 0.3342, 'grad_norm': 1.4747624397277832, 'learning_rate': 8.031828173214607e-07, 'epoch': 2.63} +2025-02-06 04:38:37 - ERROR - stderr - 88%|████████▊ | 19639/22434 [18:30:57<4:39:31, 6.00s/it] +2025-02-06 04:38:40 - ERROR - stderr - 88%|████████▊ | 19640/22434 [18:31:00<3:53:39, 5.02s/it] +2025-02-06 04:38:40 - ERROR - stderr - +2025-02-06 04:38:40 - ERROR - stderr - +2025-02-06 04:38:40 - INFO - stdout - {'loss': 0.3528, 'grad_norm': 1.6580431461334229, 'learning_rate': 8.026160048922216e-07, 'epoch': 2.63} +2025-02-06 04:38:40 - ERROR - stderr - 88%|████████▊ | 19640/22434 [18:31:00<3:53:39, 5.02s/it] +2025-02-06 04:38:43 - ERROR - stderr - 88%|████████▊ | 19641/22434 [18:31:02<3:17:38, 4.25s/it] +2025-02-06 04:38:43 - ERROR - stderr - +2025-02-06 04:38:43 - ERROR - stderr - +2025-02-06 04:38:43 - INFO - stdout - {'loss': 0.4034, 'grad_norm': 1.4473644495010376, 'learning_rate': 8.020493841736487e-07, 'epoch': 2.63} +2025-02-06 04:38:43 - ERROR - stderr - 
88%|████████▊ | 19641/22434 [18:31:02<3:17:38, 4.25s/it] +2025-02-06 04:38:45 - ERROR - stderr - 88%|████████▊ | 19642/22434 [18:31:05<2:52:22, 3.70s/it] +2025-02-06 04:38:45 - ERROR - stderr - +2025-02-06 04:38:45 - ERROR - stderr - +2025-02-06 04:38:45 - INFO - stdout - {'loss': 0.3952, 'grad_norm': 1.6652432680130005, 'learning_rate': 8.014829551775583e-07, 'epoch': 2.63} +2025-02-06 04:38:45 - ERROR - stderr - 88%|████████▊ | 19642/22434 [18:31:05<2:52:22, 3.70s/it] +2025-02-06 04:38:47 - ERROR - stderr - 88%|████████▊ | 19643/22434 [18:31:07<2:36:09, 3.36s/it] +2025-02-06 04:38:48 - ERROR - stderr - +2025-02-06 04:38:48 - ERROR - stderr - +2025-02-06 04:38:48 - INFO - stdout - {'loss': 0.3845, 'grad_norm': 1.5977108478546143, 'learning_rate': 8.009167179157506e-07, 'epoch': 2.63} +2025-02-06 04:38:48 - ERROR - stderr - 88%|████████▊ | 19643/22434 [18:31:07<2:36:09, 3.36s/it] +2025-02-06 04:38:50 - ERROR - stderr - 88%|████████▊ | 19644/22434 [18:31:10<2:24:36, 3.11s/it] +2025-02-06 04:38:50 - ERROR - stderr - +2025-02-06 04:38:50 - ERROR - stderr - +2025-02-06 04:38:50 - INFO - stdout - {'loss': 0.3477, 'grad_norm': 1.5969293117523193, 'learning_rate': 8.003506724000321e-07, 'epoch': 2.63} +2025-02-06 04:38:50 - ERROR - stderr - 88%|████████▊ | 19644/22434 [18:31:10<2:24:36, 3.11s/it] +2025-02-06 04:38:53 - ERROR - stderr - 88%|████████▊ | 19645/22434 [18:31:12<2:16:53, 2.95s/it] +2025-02-06 04:38:53 - ERROR - stderr - +2025-02-06 04:38:53 - ERROR - stderr - +2025-02-06 04:38:53 - INFO - stdout - {'loss': 0.3437, 'grad_norm': 1.6361817121505737, 'learning_rate': 7.997848186422008e-07, 'epoch': 2.63} +2025-02-06 04:38:53 - ERROR - stderr - 88%|████████▊ | 19645/22434 [18:31:12<2:16:53, 2.95s/it] +2025-02-06 04:38:55 - ERROR - stderr - 88%|████████▊ | 19646/22434 [18:31:15<2:10:20, 2.81s/it] +2025-02-06 04:38:55 - ERROR - stderr - +2025-02-06 04:38:55 - ERROR - stderr - +2025-02-06 04:38:55 - INFO - stdout - {'loss': 0.3847, 'grad_norm': 1.6540716886520386, 
'learning_rate': 7.992191566540519e-07, 'epoch': 2.63} +2025-02-06 04:38:55 - ERROR - stderr - 88%|████████▊ | 19646/22434 [18:31:15<2:10:20, 2.81s/it] +2025-02-06 04:38:58 - ERROR - stderr - 88%|████████▊ | 19647/22434 [18:31:17<2:06:16, 2.72s/it] +2025-02-06 04:38:58 - ERROR - stderr - +2025-02-06 04:38:58 - ERROR - stderr - +2025-02-06 04:38:58 - INFO - stdout - {'loss': 0.3501, 'grad_norm': 1.5350583791732788, 'learning_rate': 7.986536864473748e-07, 'epoch': 2.63} +2025-02-06 04:38:58 - ERROR - stderr - 88%|████████▊ | 19647/22434 [18:31:17<2:06:16, 2.72s/it] +2025-02-06 04:39:00 - ERROR - stderr - 88%|████████▊ | 19648/22434 [18:31:20<2:03:10, 2.65s/it] +2025-02-06 04:39:00 - ERROR - stderr - +2025-02-06 04:39:00 - ERROR - stderr - +2025-02-06 04:39:00 - INFO - stdout - {'loss': 0.3222, 'grad_norm': 1.46451735496521, 'learning_rate': 7.980884080339568e-07, 'epoch': 2.63} +2025-02-06 04:39:00 - ERROR - stderr - 88%|████████▊ | 19648/22434 [18:31:20<2:03:10, 2.65s/it] +2025-02-06 04:39:03 - ERROR - stderr - 88%|████████▊ | 19649/22434 [18:31:22<2:01:48, 2.62s/it] +2025-02-06 04:39:03 - ERROR - stderr - +2025-02-06 04:39:03 - ERROR - stderr - +2025-02-06 04:39:03 - INFO - stdout - {'loss': 0.3387, 'grad_norm': 1.5971907377243042, 'learning_rate': 7.975233214255807e-07, 'epoch': 2.63} +2025-02-06 04:39:03 - ERROR - stderr - 88%|████████▊ | 19649/22434 [18:31:22<2:01:48, 2.62s/it] +2025-02-06 04:39:03 - INFO - stdout - WARNING: tokenization mismatch: 112 vs. 138. 
(ignored) +2025-02-06 04:39:05 - ERROR - stderr - 88%|████████▊ | 19650/22434 [18:31:25<2:04:33, 2.68s/it] +2025-02-06 04:39:06 - ERROR - stderr - +2025-02-06 04:39:06 - ERROR - stderr - +2025-02-06 04:39:06 - INFO - stdout - {'loss': 0.338, 'grad_norm': 1.453679084777832, 'learning_rate': 7.969584266340258e-07, 'epoch': 2.63} +2025-02-06 04:39:06 - ERROR - stderr - 88%|████████▊ | 19650/22434 [18:31:25<2:04:33, 2.68s/it] +2025-02-06 04:39:08 - ERROR - stderr - 88%|████████▊ | 19651/22434 [18:31:28<2:01:53, 2.63s/it] +2025-02-06 04:39:08 - ERROR - stderr - +2025-02-06 04:39:08 - ERROR - stderr - +2025-02-06 04:39:08 - INFO - stdout - {'loss': 0.3513, 'grad_norm': 1.5526421070098877, 'learning_rate': 7.96393723671065e-07, 'epoch': 2.63} +2025-02-06 04:39:08 - ERROR - stderr - 88%|████████▊ | 19651/22434 [18:31:28<2:01:53, 2.63s/it] +2025-02-06 04:39:10 - ERROR - stderr - 88%|████████▊ | 19652/22434 [18:31:30<2:00:34, 2.60s/it] +2025-02-06 04:39:11 - ERROR - stderr - +2025-02-06 04:39:11 - ERROR - stderr - +2025-02-06 04:39:11 - INFO - stdout - {'loss': 0.3243, 'grad_norm': 1.5498524904251099, 'learning_rate': 7.958292125484713e-07, 'epoch': 2.63} +2025-02-06 04:39:11 - ERROR - stderr - 88%|████████▊ | 19652/22434 [18:31:30<2:00:34, 2.60s/it] +2025-02-06 04:39:13 - ERROR - stderr - 88%|████████▊ | 19653/22434 [18:31:33<2:02:41, 2.65s/it] +2025-02-06 04:39:13 - ERROR - stderr - +2025-02-06 04:39:13 - ERROR - stderr - +2025-02-06 04:39:13 - INFO - stdout - {'loss': 0.3741, 'grad_norm': 1.7014621496200562, 'learning_rate': 7.952648932780094e-07, 'epoch': 2.63} +2025-02-06 04:39:13 - ERROR - stderr - 88%|████████▊ | 19653/22434 [18:31:33<2:02:41, 2.65s/it] +2025-02-06 04:39:35 - ERROR - stderr - 88%|████████▊ | 19654/22434 [18:31:55<6:30:50, 8.44s/it] +2025-02-06 04:39:35 - ERROR - stderr - +2025-02-06 04:39:35 - ERROR - stderr - +2025-02-06 04:39:35 - INFO - stdout - {'loss': 0.3629, 'grad_norm': 1.4539575576782227, 'learning_rate': 7.947007658714446e-07, 'epoch': 2.63} 
+2025-02-06 04:39:35 - ERROR - stderr - 88%|████████▊ | 19654/22434 [18:31:55<6:30:50, 8.44s/it] +2025-02-06 04:39:54 - ERROR - stderr - 88%|████████▊ | 19655/22434 [18:32:14<9:00:32, 11.67s/it] +2025-02-06 04:39:54 - ERROR - stderr - +2025-02-06 04:39:54 - ERROR - stderr - +2025-02-06 04:39:54 - INFO - stdout - {'loss': 0.3962, 'grad_norm': 1.6647298336029053, 'learning_rate': 7.941368303405306e-07, 'epoch': 2.63} +2025-02-06 04:39:54 - ERROR - stderr - 88%|████████▊ | 19655/22434 [18:32:14<9:00:32, 11.67s/it] +2025-02-06 04:40:07 - ERROR - stderr - 88%|████████▊ | 19656/22434 [18:32:27<9:18:18, 12.06s/it] +2025-02-06 04:40:07 - ERROR - stderr - +2025-02-06 04:40:07 - ERROR - stderr - +2025-02-06 04:40:07 - INFO - stdout - {'loss': 0.3207, 'grad_norm': 1.5608336925506592, 'learning_rate': 7.93573086697027e-07, 'epoch': 2.63} +2025-02-06 04:40:07 - ERROR - stderr - 88%|████████▊ | 19656/22434 [18:32:27<9:18:18, 12.06s/it] +2025-02-06 04:40:23 - ERROR - stderr - 88%|████████▊ | 19657/22434 [18:32:43<10:07:55, 13.14s/it] +2025-02-06 04:40:23 - ERROR - stderr - +2025-02-06 04:40:23 - ERROR - stderr - +2025-02-06 04:40:23 - INFO - stdout - {'loss': 0.3242, 'grad_norm': 1.448331594467163, 'learning_rate': 7.930095349526834e-07, 'epoch': 2.63} +2025-02-06 04:40:23 - ERROR - stderr - 88%|████████▊ | 19657/22434 [18:32:43<10:07:55, 13.14s/it] +2025-02-06 04:40:31 - ERROR - stderr - 88%|████████▊ | 19658/22434 [18:32:51<8:58:13, 11.63s/it] +2025-02-06 04:40:31 - ERROR - stderr - +2025-02-06 04:40:31 - ERROR - stderr - +2025-02-06 04:40:31 - INFO - stdout - {'loss': 0.3333, 'grad_norm': 1.375649094581604, 'learning_rate': 7.924461751192447e-07, 'epoch': 2.63} +2025-02-06 04:40:31 - ERROR - stderr - 88%|████████▊ | 19658/22434 [18:32:51<8:58:13, 11.63s/it] +2025-02-06 04:41:42 - ERROR - stderr - 88%|████████▊ | 19659/22434 [18:34:02<22:39:37, 29.40s/it] +2025-02-06 04:41:42 - ERROR - stderr - +2025-02-06 04:41:42 - ERROR - stderr - +2025-02-06 04:41:42 - INFO - stdout - 
{'loss': 0.349, 'grad_norm': 1.62669837474823, 'learning_rate': 7.918830072084571e-07, 'epoch': 2.63} +2025-02-06 04:41:42 - ERROR - stderr - 88%|████████▊ | 19659/22434 [18:34:02<22:39:37, 29.40s/it] +2025-02-06 04:41:54 - ERROR - stderr - 88%|████████▊ | 19660/22434 [18:34:14<18:42:41, 24.28s/it] +2025-02-06 04:41:54 - ERROR - stderr - +2025-02-06 04:41:54 - ERROR - stderr - +2025-02-06 04:41:54 - INFO - stdout - {'loss': 0.3605, 'grad_norm': 1.4446439743041992, 'learning_rate': 7.913200312320546e-07, 'epoch': 2.63} +2025-02-06 04:41:54 - ERROR - stderr - 88%|████████▊ | 19660/22434 [18:34:14<18:42:41, 24.28s/it] +2025-02-06 04:42:06 - ERROR - stderr - 88%|████████▊ | 19661/22434 [18:34:26<15:49:51, 20.55s/it] +2025-02-06 04:42:06 - ERROR - stderr - +2025-02-06 04:42:06 - ERROR - stderr - +2025-02-06 04:42:06 - INFO - stdout - {'loss': 0.3732, 'grad_norm': 1.616838812828064, 'learning_rate': 7.907572472017766e-07, 'epoch': 2.63} +2025-02-06 04:42:06 - ERROR - stderr - 88%|████████▊ | 19661/22434 [18:34:26<15:49:51, 20.55s/it] +2025-02-06 04:42:21 - ERROR - stderr - 88%|████████▊ | 19662/22434 [18:34:41<14:32:50, 18.89s/it] +2025-02-06 04:42:21 - ERROR - stderr - +2025-02-06 04:42:21 - ERROR - stderr - +2025-02-06 04:42:21 - INFO - stdout - {'loss': 0.4061, 'grad_norm': 1.688781976699829, 'learning_rate': 7.901946551293493e-07, 'epoch': 2.63} +2025-02-06 04:42:21 - ERROR - stderr - 88%|████████▊ | 19662/22434 [18:34:41<14:32:50, 18.89s/it] +2025-02-06 04:42:24 - ERROR - stderr - 88%|████████▊ | 19663/22434 [18:34:43<10:45:16, 13.97s/it] +2025-02-06 04:42:24 - ERROR - stderr - +2025-02-06 04:42:24 - ERROR - stderr - +2025-02-06 04:42:24 - INFO - stdout - {'loss': 0.3674, 'grad_norm': 1.566849946975708, 'learning_rate': 7.896322550265012e-07, 'epoch': 2.63} +2025-02-06 04:42:24 - ERROR - stderr - 88%|████████▊ | 19663/22434 [18:34:44<10:45:16, 13.97s/it] +2025-02-06 04:42:37 - ERROR - stderr - 88%|████████▊ | 19664/22434 [18:34:57<10:41:32, 13.90s/it] +2025-02-06 
04:42:37 - ERROR - stderr - +2025-02-06 04:42:37 - ERROR - stderr - +2025-02-06 04:42:37 - INFO - stdout - {'loss': 0.3957, 'grad_norm': 1.6776561737060547, 'learning_rate': 7.890700469049573e-07, 'epoch': 2.63} +2025-02-06 04:42:37 - ERROR - stderr - 88%|████████▊ | 19664/22434 [18:34:57<10:41:32, 13.90s/it] +2025-02-06 04:42:54 - ERROR - stderr - 88%|████████▊ | 19665/22434 [18:35:14<11:22:09, 14.78s/it] +2025-02-06 04:42:54 - ERROR - stderr - +2025-02-06 04:42:54 - ERROR - stderr - +2025-02-06 04:42:54 - INFO - stdout - {'loss': 0.3538, 'grad_norm': 1.6221222877502441, 'learning_rate': 7.885080307764326e-07, 'epoch': 2.63} +2025-02-06 04:42:54 - ERROR - stderr - 88%|████████▊ | 19665/22434 [18:35:14<11:22:09, 14.78s/it] +2025-02-06 04:43:38 - ERROR - stderr - 88%|████████▊ | 19666/22434 [18:35:58<18:00:42, 23.43s/it] +2025-02-06 04:43:38 - ERROR - stderr - +2025-02-06 04:43:38 - ERROR - stderr - +2025-02-06 04:43:38 - INFO - stdout - {'loss': 0.3553, 'grad_norm': 1.554355263710022, 'learning_rate': 7.879462066526456e-07, 'epoch': 2.63} +2025-02-06 04:43:38 - ERROR - stderr - 88%|████████▊ | 19666/22434 [18:35:58<18:00:42, 23.43s/it] +2025-02-06 04:44:24 - ERROR - stderr - 88%|████████▊ | 19667/22434 [18:36:44<23:11:07, 30.17s/it] +2025-02-06 04:44:24 - ERROR - stderr - +2025-02-06 04:44:24 - ERROR - stderr - +2025-02-06 04:44:24 - INFO - stdout - {'loss': 0.3433, 'grad_norm': 1.381039023399353, 'learning_rate': 7.873845745453046e-07, 'epoch': 2.63} +2025-02-06 04:44:24 - ERROR - stderr - 88%|████████▊ | 19667/22434 [18:36:44<23:11:07, 30.17s/it] +2025-02-06 04:45:08 - ERROR - stderr - 88%|████████▊ | 19668/22434 [18:37:28<26:22:23, 34.33s/it] +2025-02-06 04:45:08 - ERROR - stderr - +2025-02-06 04:45:08 - ERROR - stderr - +2025-02-06 04:45:08 - INFO - stdout - {'loss': 0.3925, 'grad_norm': 1.6432201862335205, 'learning_rate': 7.868231344661148e-07, 'epoch': 2.63} +2025-02-06 04:45:08 - ERROR - stderr - 88%|████████▊ | 19668/22434 [18:37:28<26:22:23, 34.33s/it] 
+2025-02-06 04:45:10 - ERROR - stderr - 88%|████████▊ | 19669/22434 [18:37:30<19:01:14, 24.76s/it] +2025-02-06 04:45:10 - ERROR - stderr - +2025-02-06 04:45:10 - ERROR - stderr - +2025-02-06 04:45:10 - INFO - stdout - {'loss': 0.4013, 'grad_norm': 1.6069613695144653, 'learning_rate': 7.862618864267823e-07, 'epoch': 2.63} +2025-02-06 04:45:10 - ERROR - stderr - 88%|████████▊ | 19669/22434 [18:37:30<19:01:14, 24.76s/it] +2025-02-06 04:45:37 - ERROR - stderr - 88%|████████▊ | 19670/22434 [18:37:56<19:22:40, 25.24s/it] +2025-02-06 04:45:37 - ERROR - stderr - +2025-02-06 04:45:37 - ERROR - stderr - +2025-02-06 04:45:37 - INFO - stdout - {'loss': 0.3545, 'grad_norm': 1.6324853897094727, 'learning_rate': 7.857008304390035e-07, 'epoch': 2.63} +2025-02-06 04:45:37 - ERROR - stderr - 88%|████████▊ | 19670/22434 [18:37:56<19:22:40, 25.24s/it] +2025-02-06 04:46:21 - ERROR - stderr - 88%|████████▊ | 19671/22434 [18:38:41<23:51:41, 31.09s/it] +2025-02-06 04:46:21 - ERROR - stderr - +2025-02-06 04:46:21 - ERROR - stderr - +2025-02-06 04:46:21 - INFO - stdout - {'loss': 0.3609, 'grad_norm': 1.5837786197662354, 'learning_rate': 7.851399665144743e-07, 'epoch': 2.63} +2025-02-06 04:46:21 - ERROR - stderr - 88%|████████▊ | 19671/22434 [18:38:41<23:51:41, 31.09s/it] +2025-02-06 04:46:35 - ERROR - stderr - 88%|████████▊ | 19672/22434 [18:38:55<19:48:05, 25.81s/it] +2025-02-06 04:46:35 - ERROR - stderr - +2025-02-06 04:46:35 - ERROR - stderr - +2025-02-06 04:46:35 - INFO - stdout - {'loss': 0.3819, 'grad_norm': 1.6376917362213135, 'learning_rate': 7.845792946648845e-07, 'epoch': 2.63} +2025-02-06 04:46:35 - ERROR - stderr - 88%|████████▊ | 19672/22434 [18:38:55<19:48:05, 25.81s/it] +2025-02-06 04:47:22 - ERROR - stderr - 88%|████████▊ | 19673/22434 [18:39:42<24:42:48, 32.22s/it] +2025-02-06 04:47:22 - ERROR - stderr - +2025-02-06 04:47:22 - ERROR - stderr - +2025-02-06 04:47:22 - INFO - stdout - {'loss': 0.3347, 'grad_norm': 1.3903653621673584, 'learning_rate': 7.840188149019201e-07, 
'epoch': 2.63} +2025-02-06 04:47:22 - ERROR - stderr - 88%|████████▊ | 19673/22434 [18:39:42<24:42:48, 32.22s/it] +2025-02-06 04:47:25 - ERROR - stderr - 88%|████████▊ | 19674/22434 [18:39:44<17:52:38, 23.32s/it] +2025-02-06 04:47:25 - ERROR - stderr - +2025-02-06 04:47:25 - ERROR - stderr - +2025-02-06 04:47:25 - INFO - stdout - {'loss': 0.3872, 'grad_norm': 1.6362255811691284, 'learning_rate': 7.834585272372663e-07, 'epoch': 2.63} +2025-02-06 04:47:25 - ERROR - stderr - 88%|████████▊ | 19674/22434 [18:39:44<17:52:38, 23.32s/it] +2025-02-06 04:48:10 - ERROR - stderr - 88%|████████▊ | 19675/22434 [18:40:30<23:01:23, 30.04s/it] +2025-02-06 04:48:10 - ERROR - stderr - +2025-02-06 04:48:10 - ERROR - stderr - +2025-02-06 04:48:10 - INFO - stdout - {'loss': 0.3407, 'grad_norm': 1.4612270593643188, 'learning_rate': 7.828984316825994e-07, 'epoch': 2.63} +2025-02-06 04:48:10 - ERROR - stderr - 88%|████████▊ | 19675/22434 [18:40:30<23:01:23, 30.04s/it] +2025-02-06 04:48:54 - ERROR - stderr - 88%|████████▊ | 19676/22434 [18:41:14<26:10:46, 34.17s/it] +2025-02-06 04:48:54 - ERROR - stderr - +2025-02-06 04:48:54 - ERROR - stderr - +2025-02-06 04:48:54 - INFO - stdout - {'loss': 0.3323, 'grad_norm': 1.2788245677947998, 'learning_rate': 7.823385282495954e-07, 'epoch': 2.63} +2025-02-06 04:48:54 - ERROR - stderr - 88%|████████▊ | 19676/22434 [18:41:14<26:10:46, 34.17s/it] +2025-02-06 04:49:07 - ERROR - stderr - 88%|████████▊ | 19677/22434 [18:41:27<21:18:57, 27.83s/it] +2025-02-06 04:49:07 - ERROR - stderr - +2025-02-06 04:49:07 - ERROR - stderr - +2025-02-06 04:49:07 - INFO - stdout - {'loss': 0.3803, 'grad_norm': 1.7574976682662964, 'learning_rate': 7.81778816949924e-07, 'epoch': 2.63} +2025-02-06 04:49:07 - ERROR - stderr - 88%|████████▊ | 19677/22434 [18:41:27<21:18:57, 27.83s/it] +2025-02-06 04:49:57 - ERROR - stderr - 88%|████████▊ | 19678/22434 [18:42:16<26:16:48, 34.33s/it] +2025-02-06 04:49:57 - ERROR - stderr - +2025-02-06 04:49:57 - ERROR - stderr - +2025-02-06 
04:49:57 - INFO - stdout - {'loss': 0.3523, 'grad_norm': 1.418548345565796, 'learning_rate': 7.812192977952538e-07, 'epoch': 2.63} +2025-02-06 04:49:57 - ERROR - stderr - 88%|████████▊ | 19678/22434 [18:42:16<26:16:48, 34.33s/it] +2025-02-06 04:50:12 - ERROR - stderr - 88%|████████▊ | 19679/22434 [18:42:32<21:57:16, 28.69s/it] +2025-02-06 04:50:12 - ERROR - stderr - +2025-02-06 04:50:12 - ERROR - stderr - +2025-02-06 04:50:12 - INFO - stdout - {'loss': 0.3622, 'grad_norm': 1.5804587602615356, 'learning_rate': 7.806599707972429e-07, 'epoch': 2.63} +2025-02-06 04:50:12 - ERROR - stderr - 88%|████████▊ | 19679/22434 [18:42:32<21:57:16, 28.69s/it] +2025-02-06 04:51:03 - ERROR - stderr - 88%|████████▊ | 19680/22434 [18:43:23<27:07:15, 35.45s/it] +2025-02-06 04:51:03 - ERROR - stderr - +2025-02-06 04:51:03 - ERROR - stderr - +2025-02-06 04:51:03 - INFO - stdout - {'loss': 0.334, 'grad_norm': 1.5596683025360107, 'learning_rate': 7.801008359675565e-07, 'epoch': 2.63} +2025-02-06 04:51:03 - ERROR - stderr - 88%|████████▊ | 19680/22434 [18:43:23<27:07:15, 35.45s/it] +2025-02-06 04:51:28 - ERROR - stderr - 88%|████████▊ | 19681/22434 [18:43:48<24:39:18, 32.24s/it] +2025-02-06 04:51:28 - ERROR - stderr - +2025-02-06 04:51:28 - ERROR - stderr - +2025-02-06 04:51:28 - INFO - stdout - {'loss': 0.3583, 'grad_norm': 1.6001561880111694, 'learning_rate': 7.795418933178423e-07, 'epoch': 2.63} +2025-02-06 04:51:28 - ERROR - stderr - 88%|████████▊ | 19681/22434 [18:43:48<24:39:18, 32.24s/it] +2025-02-06 04:51:52 - ERROR - stderr - 88%|████████▊ | 19682/22434 [18:44:12<22:47:25, 29.81s/it] +2025-02-06 04:51:52 - ERROR - stderr - +2025-02-06 04:51:52 - ERROR - stderr - +2025-02-06 04:51:52 - INFO - stdout - {'loss': 0.3403, 'grad_norm': 1.6871455907821655, 'learning_rate': 7.78983142859755e-07, 'epoch': 2.63} +2025-02-06 04:51:52 - ERROR - stderr - 88%|████████▊ | 19682/22434 [18:44:12<22:47:25, 29.81s/it] +2025-02-06 04:52:37 - ERROR - stderr - 88%|████████▊ | 19683/22434 
[18:44:56<26:06:07, 34.16s/it] +2025-02-06 04:52:37 - ERROR - stderr - +2025-02-06 04:52:37 - ERROR - stderr - +2025-02-06 04:52:37 - INFO - stdout - {'loss': 0.3577, 'grad_norm': 1.5695146322250366, 'learning_rate': 7.784245846049432e-07, 'epoch': 2.63} +2025-02-06 04:52:37 - ERROR - stderr - 88%|████████▊ | 19683/22434 [18:44:56<26:06:07, 34.16s/it] +2025-02-06 04:52:53 - ERROR - stderr - 88%|████████▊ | 19684/22434 [18:45:13<22:08:07, 28.98s/it] +2025-02-06 04:52:54 - ERROR - stderr - +2025-02-06 04:52:54 - ERROR - stderr - +2025-02-06 04:52:54 - INFO - stdout - {'loss': 0.3721, 'grad_norm': 1.4506876468658447, 'learning_rate': 7.778662185650431e-07, 'epoch': 2.63} +2025-02-06 04:52:54 - ERROR - stderr - 88%|████████▊ | 19684/22434 [18:45:13<22:08:07, 28.98s/it] +2025-02-06 04:52:56 - ERROR - stderr - 88%|████████▊ | 19685/22434 [18:45:16<16:02:50, 21.02s/it] +2025-02-06 04:52:56 - ERROR - stderr - +2025-02-06 04:52:56 - ERROR - stderr - +2025-02-06 04:52:56 - INFO - stdout - {'loss': 0.3562, 'grad_norm': 1.5756123065948486, 'learning_rate': 7.773080447517012e-07, 'epoch': 2.63} +2025-02-06 04:52:56 - ERROR - stderr - 88%|████████▊ | 19685/22434 [18:45:16<16:02:50, 21.02s/it] +2025-02-06 04:52:58 - ERROR - stderr - 88%|████████▊ | 19686/22434 [18:45:18<11:48:21, 15.47s/it] +2025-02-06 04:52:58 - ERROR - stderr - +2025-02-06 04:52:58 - ERROR - stderr - +2025-02-06 04:52:58 - INFO - stdout - {'loss': 0.3734, 'grad_norm': 1.4218013286590576, 'learning_rate': 7.767500631765456e-07, 'epoch': 2.63} +2025-02-06 04:52:58 - ERROR - stderr - 88%|████████▊ | 19686/22434 [18:45:18<11:48:21, 15.47s/it] +2025-02-06 04:53:32 - ERROR - stderr - 88%|████████▊ | 19687/22434 [18:45:51<15:52:44, 20.81s/it] +2025-02-06 04:53:32 - ERROR - stderr - +2025-02-06 04:53:32 - ERROR - stderr - +2025-02-06 04:53:32 - INFO - stdout - {'loss': 0.3397, 'grad_norm': 1.4662104845046997, 'learning_rate': 7.761922738512096e-07, 'epoch': 2.63} +2025-02-06 04:53:32 - ERROR - stderr - 88%|████████▊ | 
19687/22434 [18:45:52<15:52:44, 20.81s/it] +2025-02-06 04:53:34 - ERROR - stderr - 88%|████████▊ | 19688/22434 [18:45:54<11:43:33, 15.37s/it] +2025-02-06 04:53:34 - ERROR - stderr - +2025-02-06 04:53:34 - ERROR - stderr - +2025-02-06 04:53:34 - INFO - stdout - {'loss': 0.3509, 'grad_norm': 1.516066074371338, 'learning_rate': 7.756346767873191e-07, 'epoch': 2.63} +2025-02-06 04:53:34 - ERROR - stderr - 88%|████████▊ | 19688/22434 [18:45:54<11:43:33, 15.37s/it] +2025-02-06 04:53:37 - ERROR - stderr - 88%|████████▊ | 19689/22434 [18:45:57<8:47:08, 11.52s/it] +2025-02-06 04:53:37 - ERROR - stderr - +2025-02-06 04:53:37 - ERROR - stderr - +2025-02-06 04:53:37 - INFO - stdout - {'loss': 0.4086, 'grad_norm': 1.6813167333602905, 'learning_rate': 7.750772719964961e-07, 'epoch': 2.63} +2025-02-06 04:53:37 - ERROR - stderr - 88%|████████▊ | 19689/22434 [18:45:57<8:47:08, 11.52s/it] +2025-02-06 04:53:39 - ERROR - stderr - 88%|████████▊ | 19690/22434 [18:45:59<6:42:09, 8.79s/it] +2025-02-06 04:53:39 - ERROR - stderr - +2025-02-06 04:53:39 - ERROR - stderr - +2025-02-06 04:53:39 - INFO - stdout - {'loss': 0.3973, 'grad_norm': 1.6123945713043213, 'learning_rate': 7.745200594903612e-07, 'epoch': 2.63} +2025-02-06 04:53:39 - ERROR - stderr - 88%|████████▊ | 19690/22434 [18:45:59<6:42:09, 8.79s/it] +2025-02-06 04:54:04 - ERROR - stderr - 88%|████████▊ | 19691/22434 [18:46:24<10:18:41, 13.53s/it] +2025-02-06 04:54:04 - ERROR - stderr - +2025-02-06 04:54:04 - ERROR - stderr - +2025-02-06 04:54:04 - INFO - stdout - {'loss': 0.3345, 'grad_norm': 1.4779176712036133, 'learning_rate': 7.739630392805276e-07, 'epoch': 2.63} +2025-02-06 04:54:04 - ERROR - stderr - 88%|████████▊ | 19691/22434 [18:46:24<10:18:41, 13.53s/it] +2025-02-06 04:54:27 - ERROR - stderr - 88%|████████▊ | 19692/22434 [18:46:46<12:25:02, 16.30s/it] +2025-02-06 04:54:27 - ERROR - stderr - +2025-02-06 04:54:27 - ERROR - stderr - +2025-02-06 04:54:27 - INFO - stdout - {'loss': 0.3532, 'grad_norm': 1.4237959384918213, 
'learning_rate': 7.734062113786067e-07, 'epoch': 2.63} +2025-02-06 04:54:27 - ERROR - stderr - 88%|████████▊ | 19692/22434 [18:46:47<12:25:02, 16.30s/it] +2025-02-06 04:54:29 - ERROR - stderr - 88%|████████▊ | 19693/22434 [18:46:49<9:16:04, 12.17s/it] +2025-02-06 04:54:29 - ERROR - stderr - +2025-02-06 04:54:29 - ERROR - stderr - +2025-02-06 04:54:29 - INFO - stdout - {'loss': 0.3684, 'grad_norm': 1.5116260051727295, 'learning_rate': 7.72849575796204e-07, 'epoch': 2.63} +2025-02-06 04:54:29 - ERROR - stderr - 88%|████████▊ | 19693/22434 [18:46:49<9:16:04, 12.17s/it] +2025-02-06 04:54:32 - ERROR - stderr - 88%|████████▊ | 19694/22434 [18:46:51<7:02:15, 9.25s/it] +2025-02-06 04:54:32 - ERROR - stderr - +2025-02-06 04:54:32 - ERROR - stderr - +2025-02-06 04:54:32 - INFO - stdout - {'loss': 0.3188, 'grad_norm': 1.4630063772201538, 'learning_rate': 7.722931325449223e-07, 'epoch': 2.63} +2025-02-06 04:54:32 - ERROR - stderr - 88%|████████▊ | 19694/22434 [18:46:51<7:02:15, 9.25s/it] +2025-02-06 04:54:34 - ERROR - stderr - 88%|████████▊ | 19695/22434 [18:46:54<5:28:57, 7.21s/it] +2025-02-06 04:54:34 - ERROR - stderr - +2025-02-06 04:54:34 - ERROR - stderr - +2025-02-06 04:54:34 - INFO - stdout - {'loss': 0.3586, 'grad_norm': 1.4825414419174194, 'learning_rate': 7.717368816363602e-07, 'epoch': 2.63} +2025-02-06 04:54:34 - ERROR - stderr - 88%|████████▊ | 19695/22434 [18:46:54<5:28:57, 7.21s/it] +2025-02-06 04:54:54 - ERROR - stderr - 88%|████████▊ | 19696/22434 [18:47:14<8:24:49, 11.06s/it] +2025-02-06 04:54:54 - ERROR - stderr - +2025-02-06 04:54:54 - ERROR - stderr - +2025-02-06 04:54:54 - INFO - stdout - {'loss': 0.3368, 'grad_norm': 1.4199774265289307, 'learning_rate': 7.711808230821116e-07, 'epoch': 2.63} +2025-02-06 04:54:54 - ERROR - stderr - 88%|████████▊ | 19696/22434 [18:47:14<8:24:49, 11.06s/it] +2025-02-06 04:54:57 - ERROR - stderr - 88%|████████▊ | 19697/22434 [18:47:16<6:27:08, 8.49s/it] +2025-02-06 04:54:57 - ERROR - stderr - +2025-02-06 04:54:57 - ERROR - 
stderr - +2025-02-06 04:54:57 - INFO - stdout - {'loss': 0.3069, 'grad_norm': 1.567478895187378, 'learning_rate': 7.706249568937685e-07, 'epoch': 2.63} +2025-02-06 04:54:57 - ERROR - stderr - 88%|████████▊ | 19697/22434 [18:47:16<6:27:08, 8.49s/it] +2025-02-06 04:55:16 - ERROR - stderr - 88%|████████▊ | 19698/22434 [18:47:35<8:48:54, 11.60s/it] +2025-02-06 04:55:16 - ERROR - stderr - +2025-02-06 04:55:16 - ERROR - stderr - +2025-02-06 04:55:16 - INFO - stdout - {'loss': 0.336, 'grad_norm': 1.4778313636779785, 'learning_rate': 7.70069283082917e-07, 'epoch': 2.63} +2025-02-06 04:55:16 - ERROR - stderr - 88%|████████▊ | 19698/22434 [18:47:35<8:48:54, 11.60s/it] +2025-02-06 04:55:18 - ERROR - stderr - 88%|████████▊ | 19699/22434 [18:47:38<6:46:36, 8.92s/it] +2025-02-06 04:55:18 - ERROR - stderr - +2025-02-06 04:55:18 - ERROR - stderr - +2025-02-06 04:55:18 - INFO - stdout - {'loss': 0.3426, 'grad_norm': 1.4743849039077759, 'learning_rate': 7.695138016611403e-07, 'epoch': 2.63} +2025-02-06 04:55:18 - ERROR - stderr - 88%|████████▊ | 19699/22434 [18:47:38<6:46:36, 8.92s/it] +2025-02-06 04:55:35 - ERROR - stderr - 88%|████████▊ | 19700/22434 [18:47:54<8:30:20, 11.20s/it] +2025-02-06 04:55:35 - ERROR - stderr - +2025-02-06 04:55:35 - ERROR - stderr - +2025-02-06 04:55:35 - INFO - stdout - {'loss': 0.3493, 'grad_norm': 1.4713363647460938, 'learning_rate': 7.689585126400135e-07, 'epoch': 2.63} +2025-02-06 04:55:35 - ERROR - stderr - 88%|████████▊ | 19700/22434 [18:47:55<8:30:20, 11.20s/it] +2025-02-06 04:55:50 - ERROR - stderr - 88%|████████▊ | 19701/22434 [18:48:10<9:28:37, 12.48s/it] +2025-02-06 04:55:50 - ERROR - stderr - +2025-02-06 04:55:50 - ERROR - stderr - +2025-02-06 04:55:50 - INFO - stdout - {'loss': 0.3686, 'grad_norm': 1.541777491569519, 'learning_rate': 7.684034160311138e-07, 'epoch': 2.63} +2025-02-06 04:55:50 - ERROR - stderr - 88%|████████▊ | 19701/22434 [18:48:10<9:28:37, 12.48s/it] +2025-02-06 04:55:53 - ERROR - stderr - 88%|████████▊ | 19702/22434 
[18:48:12<7:12:10, 9.49s/it] +2025-02-06 04:55:53 - ERROR - stderr - +2025-02-06 04:55:53 - ERROR - stderr - +2025-02-06 04:55:53 - INFO - stdout - {'loss': 0.4178, 'grad_norm': 1.9172792434692383, 'learning_rate': 7.678485118460133e-07, 'epoch': 2.63} +2025-02-06 04:55:53 - ERROR - stderr - 88%|████████▊ | 19702/22434 [18:48:12<7:12:10, 9.49s/it] +2025-02-06 04:56:01 - ERROR - stderr - 88%|████████▊ | 19703/22434 [18:48:21<6:55:38, 9.13s/it] +2025-02-06 04:56:01 - ERROR - stderr - +2025-02-06 04:56:01 - ERROR - stderr - +2025-02-06 04:56:01 - INFO - stdout - {'loss': 0.3957, 'grad_norm': 1.7327104806900024, 'learning_rate': 7.672938000962726e-07, 'epoch': 2.63} +2025-02-06 04:56:01 - ERROR - stderr - 88%|████████▊ | 19703/22434 [18:48:21<6:55:38, 9.13s/it] +2025-02-06 04:56:03 - ERROR - stderr - 88%|████████▊ | 19704/22434 [18:48:23<5:24:44, 7.14s/it] +2025-02-06 04:56:04 - ERROR - stderr - +2025-02-06 04:56:04 - ERROR - stderr - +2025-02-06 04:56:04 - INFO - stdout - {'loss': 0.3698, 'grad_norm': 1.4440220594406128, 'learning_rate': 7.667392807934615e-07, 'epoch': 2.63} +2025-02-06 04:56:04 - ERROR - stderr - 88%|████████▊ | 19704/22434 [18:48:23<5:24:44, 7.14s/it] +2025-02-06 04:56:06 - ERROR - stderr - 88%|████████▊ | 19705/22434 [18:48:26<4:22:44, 5.78s/it] +2025-02-06 04:56:06 - ERROR - stderr - +2025-02-06 04:56:06 - ERROR - stderr - +2025-02-06 04:56:06 - INFO - stdout - {'loss': 0.3403, 'grad_norm': 1.3925327062606812, 'learning_rate': 7.661849539491318e-07, 'epoch': 2.64} +2025-02-06 04:56:06 - ERROR - stderr - 88%|████████▊ | 19705/22434 [18:48:26<4:22:44, 5.78s/it] +2025-02-06 04:56:09 - ERROR - stderr - 88%|████████▊ | 19706/22434 [18:48:28<3:37:50, 4.79s/it] +2025-02-06 04:56:09 - ERROR - stderr - +2025-02-06 04:56:09 - ERROR - stderr - +2025-02-06 04:56:09 - INFO - stdout - {'loss': 0.3613, 'grad_norm': 1.4897451400756836, 'learning_rate': 7.656308195748441e-07, 'epoch': 2.64} +2025-02-06 04:56:09 - ERROR - stderr - 88%|████████▊ | 19706/22434 
[18:48:28<3:37:50, 4.79s/it] +2025-02-06 04:56:13 - ERROR - stderr - 88%|████████▊ | 19707/22434 [18:48:33<3:33:33, 4.70s/it] +2025-02-06 04:56:13 - ERROR - stderr - +2025-02-06 04:56:13 - ERROR - stderr - +2025-02-06 04:56:13 - INFO - stdout - {'loss': 0.3488, 'grad_norm': 1.4798705577850342, 'learning_rate': 7.650768776821438e-07, 'epoch': 2.64} +2025-02-06 04:56:13 - ERROR - stderr - 88%|████████▊ | 19707/22434 [18:48:33<3:33:33, 4.70s/it] +2025-02-06 04:56:16 - ERROR - stderr - 88%|████████▊ | 19708/22434 [18:48:35<3:04:15, 4.06s/it] +2025-02-06 04:56:16 - ERROR - stderr - +2025-02-06 04:56:16 - ERROR - stderr - +2025-02-06 04:56:16 - INFO - stdout - {'loss': 0.3402, 'grad_norm': 1.408481240272522, 'learning_rate': 7.645231282825794e-07, 'epoch': 2.64} +2025-02-06 04:56:16 - ERROR - stderr - 88%|████████▊ | 19708/22434 [18:48:35<3:04:15, 4.06s/it] +2025-02-06 04:56:19 - ERROR - stderr - 88%|████████▊ | 19709/22434 [18:48:38<2:49:24, 3.73s/it] +2025-02-06 04:56:19 - ERROR - stderr - +2025-02-06 04:56:19 - ERROR - stderr - +2025-02-06 04:56:19 - INFO - stdout - {'loss': 0.3522, 'grad_norm': 1.6023201942443848, 'learning_rate': 7.639695713876938e-07, 'epoch': 2.64} +2025-02-06 04:56:19 - ERROR - stderr - 88%|████████▊ | 19709/22434 [18:48:38<2:49:24, 3.73s/it] +2025-02-06 04:56:21 - ERROR - stderr - 88%|████████▊ | 19710/22434 [18:48:41<2:32:29, 3.36s/it] +2025-02-06 04:56:21 - ERROR - stderr - +2025-02-06 04:56:21 - ERROR - stderr - +2025-02-06 04:56:21 - INFO - stdout - {'loss': 0.4081, 'grad_norm': 1.6901289224624634, 'learning_rate': 7.634162070090234e-07, 'epoch': 2.64} +2025-02-06 04:56:21 - ERROR - stderr - 88%|████████▊ | 19710/22434 [18:48:41<2:32:29, 3.36s/it] +2025-02-06 04:56:24 - ERROR - stderr - 88%|████████▊ | 19711/22434 [18:48:43<2:20:49, 3.10s/it] +2025-02-06 04:56:24 - ERROR - stderr - +2025-02-06 04:56:24 - ERROR - stderr - +2025-02-06 04:56:24 - INFO - stdout - {'loss': 0.3746, 'grad_norm': 1.626448631286621, 'learning_rate': 
7.628630351581035e-07, 'epoch': 2.64} +2025-02-06 04:56:24 - ERROR - stderr - 88%|████████▊ | 19711/22434 [18:48:43<2:20:49, 3.10s/it] +2025-02-06 04:56:26 - ERROR - stderr - 88%|████████▊ | 19712/22434 [18:48:46<2:16:32, 3.01s/it] +2025-02-06 04:56:26 - ERROR - stderr - +2025-02-06 04:56:26 - ERROR - stderr - +2025-02-06 04:56:26 - INFO - stdout - {'loss': 0.2992, 'grad_norm': 1.475656270980835, 'learning_rate': 7.623100558464658e-07, 'epoch': 2.64} +2025-02-06 04:56:26 - ERROR - stderr - 88%|████████▊ | 19712/22434 [18:48:46<2:16:32, 3.01s/it] +2025-02-06 04:56:29 - ERROR - stderr - 88%|████████▊ | 19713/22434 [18:48:49<2:09:35, 2.86s/it] +2025-02-06 04:56:29 - ERROR - stderr - +2025-02-06 04:56:29 - ERROR - stderr - +2025-02-06 04:56:29 - INFO - stdout - {'loss': 0.3395, 'grad_norm': 1.5321446657180786, 'learning_rate': 7.617572690856346e-07, 'epoch': 2.64} +2025-02-06 04:56:29 - ERROR - stderr - 88%|████████▊ | 19713/22434 [18:48:49<2:09:35, 2.86s/it] +2025-02-06 04:56:31 - ERROR - stderr - 88%|████████▊ | 19714/22434 [18:48:51<2:03:36, 2.73s/it] +2025-02-06 04:56:31 - ERROR - stderr - +2025-02-06 04:56:31 - ERROR - stderr - +2025-02-06 04:56:31 - INFO - stdout - {'loss': 0.3655, 'grad_norm': 1.5492538213729858, 'learning_rate': 7.612046748871327e-07, 'epoch': 2.64} +2025-02-06 04:56:31 - ERROR - stderr - 88%|████████▊ | 19714/22434 [18:48:51<2:03:36, 2.73s/it] +2025-02-06 04:56:34 - ERROR - stderr - 88%|████████▊ | 19715/22434 [18:48:54<2:01:14, 2.68s/it] +2025-02-06 04:56:34 - ERROR - stderr - +2025-02-06 04:56:34 - ERROR - stderr - +2025-02-06 04:56:34 - INFO - stdout - {'loss': 0.3592, 'grad_norm': 1.636447548866272, 'learning_rate': 7.606522732624799e-07, 'epoch': 2.64} +2025-02-06 04:56:34 - ERROR - stderr - 88%|████████▊ | 19715/22434 [18:48:54<2:01:14, 2.68s/it] +2025-02-06 04:56:56 - ERROR - stderr - 88%|████████▊ | 19716/22434 [18:49:16<6:31:16, 8.64s/it] +2025-02-06 04:56:56 - ERROR - stderr - +2025-02-06 04:56:56 - ERROR - stderr - +2025-02-06 
04:56:56 - INFO - stdout - {'loss': 0.3851, 'grad_norm': 1.651064395904541, 'learning_rate': 7.601000642231882e-07, 'epoch': 2.64} +2025-02-06 04:56:56 - ERROR - stderr - 88%|████████▊ | 19716/22434 [18:49:16<6:31:16, 8.64s/it] +2025-02-06 04:57:08 - ERROR - stderr - 88%|████████▊ | 19717/22434 [18:49:27<7:04:58, 9.38s/it] +2025-02-06 04:57:08 - ERROR - stderr - +2025-02-06 04:57:08 - ERROR - stderr - +2025-02-06 04:57:08 - INFO - stdout - {'loss': 0.4742, 'grad_norm': 1.7746621370315552, 'learning_rate': 7.595480477807704e-07, 'epoch': 2.64} +2025-02-06 04:57:08 - ERROR - stderr - 88%|████████▊ | 19717/22434 [18:49:27<7:04:58, 9.38s/it] +2025-02-06 04:57:34 - ERROR - stderr - 88%|████████▊ | 19718/22434 [18:49:53<10:50:50, 14.38s/it] +2025-02-06 04:57:34 - ERROR - stderr - +2025-02-06 04:57:34 - ERROR - stderr - +2025-02-06 04:57:34 - INFO - stdout - {'loss': 0.3661, 'grad_norm': 1.5417121648788452, 'learning_rate': 7.589962239467297e-07, 'epoch': 2.64} +2025-02-06 04:57:34 - ERROR - stderr - 88%|████████▊ | 19718/22434 [18:49:53<10:50:50, 14.38s/it] +2025-02-06 04:58:03 - ERROR - stderr - 88%|████████▊ | 19719/22434 [18:50:22<14:08:45, 18.76s/it] +2025-02-06 04:58:03 - ERROR - stderr - +2025-02-06 04:58:03 - ERROR - stderr - +2025-02-06 04:58:03 - INFO - stdout - {'loss': 0.3748, 'grad_norm': 1.538307785987854, 'learning_rate': 7.584445927325713e-07, 'epoch': 2.64} +2025-02-06 04:58:03 - ERROR - stderr - 88%|████████▊ | 19719/22434 [18:50:22<14:08:45, 18.76s/it] +2025-02-06 04:58:14 - ERROR - stderr - 88%|████████▊ | 19720/22434 [18:50:33<12:23:58, 16.45s/it] +2025-02-06 04:58:14 - ERROR - stderr - +2025-02-06 04:58:14 - ERROR - stderr - +2025-02-06 04:58:14 - INFO - stdout - {'loss': 0.3269, 'grad_norm': 1.5849133729934692, 'learning_rate': 7.578931541497925e-07, 'epoch': 2.64} +2025-02-06 04:58:14 - ERROR - stderr - 88%|████████▊ | 19720/22434 [18:50:33<12:23:58, 16.45s/it] +2025-02-06 04:58:26 - ERROR - stderr - 88%|████████▊ | 19721/22434 [18:50:46<11:27:39, 
15.21s/it] +2025-02-06 04:58:26 - ERROR - stderr - +2025-02-06 04:58:26 - ERROR - stderr - +2025-02-06 04:58:26 - INFO - stdout - {'loss': 0.3722, 'grad_norm': 1.694332480430603, 'learning_rate': 7.573419082098865e-07, 'epoch': 2.64} +2025-02-06 04:58:26 - ERROR - stderr - 88%|████████▊ | 19721/22434 [18:50:46<11:27:39, 15.21s/it] +2025-02-06 04:59:03 - ERROR - stderr - 88%|████████▊ | 19722/22434 [18:51:23<16:21:02, 21.70s/it] +2025-02-06 04:59:03 - ERROR - stderr - +2025-02-06 04:59:03 - ERROR - stderr - +2025-02-06 04:59:03 - INFO - stdout - {'loss': 0.3276, 'grad_norm': 1.5112320184707642, 'learning_rate': 7.567908549243441e-07, 'epoch': 2.64} +2025-02-06 04:59:03 - ERROR - stderr - 88%|████████▊ | 19722/22434 [18:51:23<16:21:02, 21.70s/it] +2025-02-06 04:59:47 - ERROR - stderr - 88%|████████▊ | 19723/22434 [18:52:07<21:29:18, 28.54s/it] +2025-02-06 04:59:47 - ERROR - stderr - +2025-02-06 04:59:47 - ERROR - stderr - +2025-02-06 04:59:47 - INFO - stdout - {'loss': 0.3487, 'grad_norm': 1.5857157707214355, 'learning_rate': 7.562399943046527e-07, 'epoch': 2.64} +2025-02-06 04:59:47 - ERROR - stderr - 88%|████████▊ | 19723/22434 [18:52:07<21:29:18, 28.54s/it] +2025-02-06 05:00:32 - ERROR - stderr - 88%|████████▊ | 19724/22434 [18:52:52<25:11:45, 33.47s/it] +2025-02-06 05:00:32 - ERROR - stderr - +2025-02-06 05:00:32 - ERROR - stderr - +2025-02-06 05:00:32 - INFO - stdout - {'loss': 0.3694, 'grad_norm': 1.5652291774749756, 'learning_rate': 7.556893263622911e-07, 'epoch': 2.64} +2025-02-06 05:00:32 - ERROR - stderr - 88%|████████▊ | 19724/22434 [18:52:52<25:11:45, 33.47s/it] +2025-02-06 05:00:45 - ERROR - stderr - 88%|████████▊ | 19725/22434 [18:53:05<20:28:46, 27.22s/it] +2025-02-06 05:00:45 - ERROR - stderr - +2025-02-06 05:00:45 - ERROR - stderr - +2025-02-06 05:00:45 - INFO - stdout - {'loss': 0.3627, 'grad_norm': 1.5392587184906006, 'learning_rate': 7.551388511087421e-07, 'epoch': 2.64} +2025-02-06 05:00:45 - ERROR - stderr - 88%|████████▊ | 19725/22434 
[18:53:05<20:28:46, 27.22s/it] +2025-02-06 05:01:26 - ERROR - stderr - 88%|████████▊ | 19726/22434 [18:53:46<23:40:26, 31.47s/it] +2025-02-06 05:01:26 - ERROR - stderr - +2025-02-06 05:01:26 - ERROR - stderr - +2025-02-06 05:01:26 - INFO - stdout - {'loss': 0.3277, 'grad_norm': 1.5864126682281494, 'learning_rate': 7.545885685554743e-07, 'epoch': 2.64} +2025-02-06 05:01:26 - ERROR - stderr - 88%|████████▊ | 19726/22434 [18:53:46<23:40:26, 31.47s/it] +2025-02-06 05:01:48 - ERROR - stderr - 88%|████████▊ | 19727/22434 [18:54:08<21:31:55, 28.64s/it] +2025-02-06 05:01:48 - ERROR - stderr - +2025-02-06 05:01:48 - ERROR - stderr - +2025-02-06 05:01:48 - INFO - stdout - {'loss': 0.3709, 'grad_norm': 1.504490613937378, 'learning_rate': 7.540384787139643e-07, 'epoch': 2.64} +2025-02-06 05:01:48 - ERROR - stderr - 88%|████████▊ | 19727/22434 [18:54:08<21:31:55, 28.64s/it] +2025-02-06 05:01:51 - ERROR - stderr - 88%|████████▊ | 19728/22434 [18:54:11<15:37:38, 20.79s/it] +2025-02-06 05:01:51 - ERROR - stderr - +2025-02-06 05:01:51 - ERROR - stderr - +2025-02-06 05:01:51 - INFO - stdout - {'loss': 0.3459, 'grad_norm': 1.6559419631958008, 'learning_rate': 7.534885815956727e-07, 'epoch': 2.64} +2025-02-06 05:01:51 - ERROR - stderr - 88%|████████▊ | 19728/22434 [18:54:11<15:37:38, 20.79s/it] +2025-02-06 05:02:37 - ERROR - stderr - 88%|████████▊ | 19729/22434 [18:54:57<21:27:45, 28.56s/it] +2025-02-06 05:02:37 - ERROR - stderr - +2025-02-06 05:02:37 - ERROR - stderr - +2025-02-06 05:02:37 - INFO - stdout - {'loss': 0.3748, 'grad_norm': 1.4424320459365845, 'learning_rate': 7.529388772120628e-07, 'epoch': 2.64} +2025-02-06 05:02:37 - ERROR - stderr - 88%|████████▊ | 19729/22434 [18:54:57<21:27:45, 28.56s/it] +2025-02-06 05:03:19 - ERROR - stderr - 88%|████████▊ | 19730/22434 [18:55:39<24:20:52, 32.42s/it] +2025-02-06 05:03:19 - ERROR - stderr - +2025-02-06 05:03:19 - ERROR - stderr - +2025-02-06 05:03:19 - INFO - stdout - {'loss': 0.3857, 'grad_norm': 1.615708589553833, 
'learning_rate': 7.523893655745962e-07, 'epoch': 2.64} +2025-02-06 05:03:19 - ERROR - stderr - 88%|████████▊ | 19730/22434 [18:55:39<24:20:52, 32.42s/it] +2025-02-06 05:03:21 - ERROR - stderr - 88%|████████▊ | 19731/22434 [18:55:41<17:35:48, 23.44s/it] +2025-02-06 05:03:21 - ERROR - stderr - +2025-02-06 05:03:21 - ERROR - stderr - +2025-02-06 05:03:21 - INFO - stdout - {'loss': 0.3522, 'grad_norm': 1.4666892290115356, 'learning_rate': 7.518400466947229e-07, 'epoch': 2.64} +2025-02-06 05:03:21 - ERROR - stderr - 88%|████████▊ | 19731/22434 [18:55:41<17:35:48, 23.44s/it] +2025-02-06 05:04:07 - ERROR - stderr - 88%|████████▊ | 19732/22434 [18:56:27<22:34:37, 30.08s/it] +2025-02-06 05:04:07 - ERROR - stderr - +2025-02-06 05:04:07 - ERROR - stderr - +2025-02-06 05:04:07 - INFO - stdout - {'loss': 0.3486, 'grad_norm': 1.4887847900390625, 'learning_rate': 7.512909205838948e-07, 'epoch': 2.64} +2025-02-06 05:04:07 - ERROR - stderr - 88%|████████▊ | 19732/22434 [18:56:27<22:34:37, 30.08s/it] +2025-02-06 05:04:58 - ERROR - stderr - 88%|████████▊ | 19733/22434 [18:57:18<27:24:38, 36.53s/it] +2025-02-06 05:04:59 - ERROR - stderr - +2025-02-06 05:04:59 - ERROR - stderr - +2025-02-06 05:04:59 - INFO - stdout - {'loss': 0.3593, 'grad_norm': 1.520777940750122, 'learning_rate': 7.507419872535559e-07, 'epoch': 2.64} +2025-02-06 05:04:59 - ERROR - stderr - 88%|████████▊ | 19733/22434 [18:57:18<27:24:38, 36.53s/it] +2025-02-06 05:05:25 - ERROR - stderr - 88%|████████▊ | 19734/22434 [18:57:44<25:04:14, 33.43s/it] +2025-02-06 05:05:25 - ERROR - stderr - +2025-02-06 05:05:25 - ERROR - stderr - +2025-02-06 05:05:25 - INFO - stdout - {'loss': 0.3435, 'grad_norm': 1.5798959732055664, 'learning_rate': 7.501932467151507e-07, 'epoch': 2.64} +2025-02-06 05:05:25 - ERROR - stderr - 88%|████████▊ | 19734/22434 [18:57:44<25:04:14, 33.43s/it] +2025-02-06 05:05:42 - ERROR - stderr - 88%|████████▊ | 19735/22434 [18:58:02<21:30:00, 28.68s/it] +2025-02-06 05:05:42 - ERROR - stderr - +2025-02-06 
05:05:42 - ERROR - stderr - +2025-02-06 05:05:42 - INFO - stdout - {'loss': 0.3582, 'grad_norm': 1.6038391590118408, 'learning_rate': 7.496446989801165e-07, 'epoch': 2.64} +2025-02-06 05:05:42 - ERROR - stderr - 88%|████████▊ | 19735/22434 [18:58:02<21:30:00, 28.68s/it] +2025-02-06 05:05:54 - ERROR - stderr - 88%|████████▊ | 19736/22434 [18:58:14<17:38:04, 23.53s/it] +2025-02-06 05:05:54 - ERROR - stderr - +2025-02-06 05:05:54 - ERROR - stderr - +2025-02-06 05:05:54 - INFO - stdout - {'loss': 0.3604, 'grad_norm': 1.4596564769744873, 'learning_rate': 7.490963440598864e-07, 'epoch': 2.64} +2025-02-06 05:05:54 - ERROR - stderr - 88%|████████▊ | 19736/22434 [18:58:14<17:38:04, 23.53s/it] +2025-02-06 05:07:34 - ERROR - stderr - 88%|████████▊ | 19737/22434 [18:59:54<34:58:13, 46.68s/it] +2025-02-06 05:07:35 - ERROR - stderr - +2025-02-06 05:07:35 - ERROR - stderr - +2025-02-06 05:07:35 - INFO - stdout - {'loss': 0.295, 'grad_norm': 1.3082728385925293, 'learning_rate': 7.485481819658913e-07, 'epoch': 2.64} +2025-02-06 05:07:35 - ERROR - stderr - 88%|████████▊ | 19737/22434 [18:59:54<34:58:13, 46.68s/it] +2025-02-06 05:07:42 - ERROR - stderr - 88%|████████▊ | 19738/22434 [19:00:02<26:10:56, 34.96s/it] +2025-02-06 05:07:42 - ERROR - stderr - +2025-02-06 05:07:42 - ERROR - stderr - +2025-02-06 05:07:42 - INFO - stdout - {'loss': 0.3164, 'grad_norm': 1.5038464069366455, 'learning_rate': 7.480002127095564e-07, 'epoch': 2.64} +2025-02-06 05:07:42 - ERROR - stderr - 88%|████████▊ | 19738/22434 [19:00:02<26:10:56, 34.96s/it] +2025-02-06 05:08:31 - ERROR - stderr - 88%|████████▊ | 19739/22434 [19:00:51<29:21:27, 39.22s/it] +2025-02-06 05:08:31 - ERROR - stderr - +2025-02-06 05:08:31 - ERROR - stderr - +2025-02-06 05:08:31 - INFO - stdout - {'loss': 0.4225, 'grad_norm': 1.668889045715332, 'learning_rate': 7.474524363023039e-07, 'epoch': 2.64} +2025-02-06 05:08:31 - ERROR - stderr - 88%|████████▊ | 19739/22434 [19:00:51<29:21:27, 39.22s/it] +2025-02-06 05:08:53 - ERROR - stderr - 
88%|████████▊ | 19740/22434 [19:01:13<25:27:52, 34.03s/it] +2025-02-06 05:08:53 - ERROR - stderr - +2025-02-06 05:08:53 - ERROR - stderr - +2025-02-06 05:08:53 - INFO - stdout - {'loss': 0.3619, 'grad_norm': 1.4537122249603271, 'learning_rate': 7.469048527555512e-07, 'epoch': 2.64} +2025-02-06 05:08:53 - ERROR - stderr - 88%|████████▊ | 19740/22434 [19:01:13<25:27:52, 34.03s/it] +2025-02-06 05:09:59 - ERROR - stderr - 88%|████████▊ | 19741/22434 [19:02:18<32:28:52, 43.42s/it] +2025-02-06 05:09:59 - ERROR - stderr - +2025-02-06 05:09:59 - ERROR - stderr - +2025-02-06 05:09:59 - INFO - stdout - {'loss': 0.3741, 'grad_norm': 1.5721479654312134, 'learning_rate': 7.463574620807135e-07, 'epoch': 2.64} +2025-02-06 05:09:59 - ERROR - stderr - 88%|████████▊ | 19741/22434 [19:02:18<32:28:52, 43.42s/it] +2025-02-06 05:10:35 - ERROR - stderr - 88%|████████▊ | 19742/22434 [19:02:54<30:48:51, 41.21s/it] +2025-02-06 05:10:35 - ERROR - stderr - +2025-02-06 05:10:35 - ERROR - stderr - +2025-02-06 05:10:35 - INFO - stdout - {'loss': 0.4063, 'grad_norm': 1.5630850791931152, 'learning_rate': 7.458102642891984e-07, 'epoch': 2.64} +2025-02-06 05:10:35 - ERROR - stderr - 88%|████████▊ | 19742/22434 [19:02:54<30:48:51, 41.21s/it] +2025-02-06 05:10:37 - ERROR - stderr - 88%|████████▊ | 19743/22434 [19:02:57<22:06:44, 29.58s/it] +2025-02-06 05:10:37 - ERROR - stderr - +2025-02-06 05:10:37 - ERROR - stderr - +2025-02-06 05:10:37 - INFO - stdout - {'loss': 0.3837, 'grad_norm': 1.835421085357666, 'learning_rate': 7.452632593924147e-07, 'epoch': 2.64} +2025-02-06 05:10:37 - ERROR - stderr - 88%|████████▊ | 19743/22434 [19:02:57<22:06:44, 29.58s/it] +2025-02-06 05:10:39 - ERROR - stderr - 88%|████████▊ | 19744/22434 [19:02:59<16:01:03, 21.44s/it] +2025-02-06 05:10:39 - ERROR - stderr - +2025-02-06 05:10:39 - ERROR - stderr - +2025-02-06 05:10:39 - INFO - stdout - {'loss': 0.3717, 'grad_norm': 1.7075573205947876, 'learning_rate': 7.447164474017632e-07, 'epoch': 2.64} +2025-02-06 05:10:39 - ERROR 
- stderr - 88%|████████▊ | 19744/22434 [19:02:59<16:01:03, 21.44s/it] +2025-02-06 05:10:42 - ERROR - stderr - 88%|████████▊ | 19745/22434 [19:03:02<11:46:58, 15.77s/it] +2025-02-06 05:10:42 - ERROR - stderr - +2025-02-06 05:10:42 - ERROR - stderr - +2025-02-06 05:10:42 - INFO - stdout - {'loss': 0.3422, 'grad_norm': 1.4152356386184692, 'learning_rate': 7.44169828328637e-07, 'epoch': 2.64} +2025-02-06 05:10:42 - ERROR - stderr - 88%|████████▊ | 19745/22434 [19:03:02<11:46:58, 15.77s/it] +2025-02-06 05:11:13 - ERROR - stderr - 88%|████████▊ | 19746/22434 [19:03:32<15:05:54, 20.22s/it] +2025-02-06 05:11:13 - ERROR - stderr - +2025-02-06 05:11:13 - ERROR - stderr - +2025-02-06 05:11:13 - INFO - stdout - {'loss': 0.3236, 'grad_norm': 1.3786213397979736, 'learning_rate': 7.43623402184438e-07, 'epoch': 2.64} +2025-02-06 05:11:13 - ERROR - stderr - 88%|████████▊ | 19746/22434 [19:03:32<15:05:54, 20.22s/it] +2025-02-06 05:11:40 - ERROR - stderr - 88%|████████▊ | 19747/22434 [19:04:00<16:44:15, 22.42s/it] +2025-02-06 05:11:40 - ERROR - stderr - +2025-02-06 05:11:40 - ERROR - stderr - +2025-02-06 05:11:40 - INFO - stdout - {'loss': 0.3747, 'grad_norm': 1.5594052076339722, 'learning_rate': 7.430771689805504e-07, 'epoch': 2.64} +2025-02-06 05:11:40 - ERROR - stderr - 88%|████████▊ | 19747/22434 [19:04:00<16:44:15, 22.42s/it] +2025-02-06 05:11:43 - ERROR - stderr - 88%|████████▊ | 19748/22434 [19:04:02<12:15:19, 16.43s/it] +2025-02-06 05:11:43 - ERROR - stderr - +2025-02-06 05:11:43 - ERROR - stderr - +2025-02-06 05:11:43 - INFO - stdout - {'loss': 0.3999, 'grad_norm': 1.647066593170166, 'learning_rate': 7.425311287283599e-07, 'epoch': 2.64} +2025-02-06 05:11:43 - ERROR - stderr - 88%|████████▊ | 19748/22434 [19:04:02<12:15:19, 16.43s/it] +2025-02-06 05:11:45 - ERROR - stderr - 88%|████████▊ | 19749/22434 [19:04:05<9:08:16, 12.25s/it] +2025-02-06 05:11:45 - ERROR - stderr - +2025-02-06 05:11:45 - ERROR - stderr - +2025-02-06 05:11:45 - INFO - stdout - {'loss': 0.3508, 
'grad_norm': 1.6193326711654663, 'learning_rate': 7.419852814392526e-07, 'epoch': 2.64} +2025-02-06 05:11:45 - ERROR - stderr - 88%|████████▊ | 19749/22434 [19:04:05<9:08:16, 12.25s/it] +2025-02-06 05:11:48 - ERROR - stderr - 88%|████████▊ | 19750/22434 [19:04:07<6:56:08, 9.30s/it] +2025-02-06 05:11:48 - ERROR - stderr - +2025-02-06 05:11:48 - ERROR - stderr - +2025-02-06 05:11:48 - INFO - stdout - {'loss': 0.3482, 'grad_norm': 1.669316291809082, 'learning_rate': 7.414396271245994e-07, 'epoch': 2.64} +2025-02-06 05:11:48 - ERROR - stderr - 88%|████████▊ | 19750/22434 [19:04:07<6:56:08, 9.30s/it] +2025-02-06 05:12:06 - ERROR - stderr - 88%|████████▊ | 19751/22434 [19:04:26<8:56:07, 11.99s/it] +2025-02-06 05:12:06 - ERROR - stderr - +2025-02-06 05:12:06 - ERROR - stderr - +2025-02-06 05:12:06 - INFO - stdout - {'loss': 0.3269, 'grad_norm': 1.415085792541504, 'learning_rate': 7.408941657957813e-07, 'epoch': 2.64} +2025-02-06 05:12:06 - ERROR - stderr - 88%|████████▊ | 19751/22434 [19:04:26<8:56:07, 11.99s/it] +2025-02-06 05:12:24 - ERROR - stderr - 88%|████████▊ | 19752/22434 [19:04:43<10:15:31, 13.77s/it] +2025-02-06 05:12:24 - ERROR - stderr - +2025-02-06 05:12:24 - ERROR - stderr - +2025-02-06 05:12:24 - INFO - stdout - {'loss': 0.3767, 'grad_norm': 1.685892105102539, 'learning_rate': 7.403488974641626e-07, 'epoch': 2.64} +2025-02-06 05:12:24 - ERROR - stderr - 88%|████████▊ | 19752/22434 [19:04:44<10:15:31, 13.77s/it] +2025-02-06 05:12:26 - ERROR - stderr - 88%|████████▊ | 19753/22434 [19:04:46<7:44:34, 10.40s/it] +2025-02-06 05:12:26 - ERROR - stderr - +2025-02-06 05:12:26 - ERROR - stderr - +2025-02-06 05:12:26 - INFO - stdout - {'loss': 0.3099, 'grad_norm': 1.4404512643814087, 'learning_rate': 7.398038221411096e-07, 'epoch': 2.64} +2025-02-06 05:12:26 - ERROR - stderr - 88%|████████▊ | 19753/22434 [19:04:46<7:44:34, 10.40s/it] +2025-02-06 05:12:29 - ERROR - stderr - 88%|████████▊ | 19754/22434 [19:04:49<5:58:52, 8.03s/it] +2025-02-06 05:12:29 - ERROR - stderr - 
+2025-02-06 05:12:29 - ERROR - stderr - +2025-02-06 05:12:29 - INFO - stdout - {'loss': 0.3288, 'grad_norm': 1.4804177284240723, 'learning_rate': 7.392589398379868e-07, 'epoch': 2.64} +2025-02-06 05:12:29 - ERROR - stderr - 88%|████████▊ | 19754/22434 [19:04:49<5:58:52, 8.03s/it] +2025-02-06 05:12:48 - ERROR - stderr - 88%|████████▊ | 19755/22434 [19:05:08<8:33:02, 11.49s/it] +2025-02-06 05:12:48 - ERROR - stderr - +2025-02-06 05:12:48 - ERROR - stderr - +2025-02-06 05:12:48 - INFO - stdout - {'loss': 0.3684, 'grad_norm': 1.6431386470794678, 'learning_rate': 7.387142505661482e-07, 'epoch': 2.64} +2025-02-06 05:12:48 - ERROR - stderr - 88%|████████▊ | 19755/22434 [19:05:08<8:33:02, 11.49s/it] +2025-02-06 05:13:07 - ERROR - stderr - 88%|████████▊ | 19756/22434 [19:05:27<10:10:51, 13.69s/it] +2025-02-06 05:13:07 - ERROR - stderr - +2025-02-06 05:13:07 - ERROR - stderr - +2025-02-06 05:13:07 - INFO - stdout - {'loss': 0.3671, 'grad_norm': 1.444705605506897, 'learning_rate': 7.381697543369492e-07, 'epoch': 2.64} +2025-02-06 05:13:07 - ERROR - stderr - 88%|████████▊ | 19756/22434 [19:05:27<10:10:51, 13.69s/it] +2025-02-06 05:13:10 - ERROR - stderr - 88%|████████▊ | 19757/22434 [19:05:29<7:40:30, 10.32s/it] +2025-02-06 05:13:10 - ERROR - stderr - +2025-02-06 05:13:10 - ERROR - stderr - +2025-02-06 05:13:10 - INFO - stdout - {'loss': 0.3837, 'grad_norm': 1.6506578922271729, 'learning_rate': 7.376254511617398e-07, 'epoch': 2.64} +2025-02-06 05:13:10 - ERROR - stderr - 88%|████████▊ | 19757/22434 [19:05:29<7:40:30, 10.32s/it] +2025-02-06 05:13:12 - ERROR - stderr - 88%|████████▊ | 19758/22434 [19:05:32<5:54:29, 7.95s/it] +2025-02-06 05:13:12 - ERROR - stderr - +2025-02-06 05:13:12 - ERROR - stderr - +2025-02-06 05:13:12 - INFO - stdout - {'loss': 0.3436, 'grad_norm': 1.5627301931381226, 'learning_rate': 7.370813410518652e-07, 'epoch': 2.64} +2025-02-06 05:13:12 - ERROR - stderr - 88%|████████▊ | 19758/22434 [19:05:32<5:54:29, 7.95s/it] +2025-02-06 05:13:17 - ERROR - stderr - 
88%|████████▊ | 19759/22434 [19:05:37<5:17:11, 7.11s/it] +2025-02-06 05:13:17 - ERROR - stderr - +2025-02-06 05:13:17 - ERROR - stderr - +2025-02-06 05:13:17 - INFO - stdout - {'loss': 0.3398, 'grad_norm': 1.5087485313415527, 'learning_rate': 7.365374240186651e-07, 'epoch': 2.64} +2025-02-06 05:13:17 - ERROR - stderr - 88%|████████▊ | 19759/22434 [19:05:37<5:17:11, 7.11s/it] +2025-02-06 05:13:21 - ERROR - stderr - 88%|████████▊ | 19760/22434 [19:05:40<4:27:53, 6.01s/it] +2025-02-06 05:13:21 - ERROR - stderr - +2025-02-06 05:13:21 - ERROR - stderr - +2025-02-06 05:13:21 - INFO - stdout - {'loss': 0.3675, 'grad_norm': 1.4212716817855835, 'learning_rate': 7.359937000734785e-07, 'epoch': 2.64} +2025-02-06 05:13:21 - ERROR - stderr - 88%|████████▊ | 19760/22434 [19:05:40<4:27:53, 6.01s/it] +2025-02-06 05:13:23 - ERROR - stderr - 88%|████████▊ | 19761/22434 [19:05:43<3:43:06, 5.01s/it] +2025-02-06 05:13:23 - ERROR - stderr - +2025-02-06 05:13:23 - ERROR - stderr - +2025-02-06 05:13:23 - INFO - stdout - {'loss': 0.3482, 'grad_norm': 1.5440388917922974, 'learning_rate': 7.354501692276394e-07, 'epoch': 2.64} +2025-02-06 05:13:23 - ERROR - stderr - 88%|████████▊ | 19761/22434 [19:05:43<3:43:06, 5.01s/it] +2025-02-06 05:13:26 - ERROR - stderr - 88%|████████▊ | 19762/22434 [19:05:46<3:10:56, 4.29s/it] +2025-02-06 05:13:26 - ERROR - stderr - +2025-02-06 05:13:26 - ERROR - stderr - +2025-02-06 05:13:26 - INFO - stdout - {'loss': 0.3979, 'grad_norm': 1.684924840927124, 'learning_rate': 7.349068314924757e-07, 'epoch': 2.64} +2025-02-06 05:13:26 - ERROR - stderr - 88%|████████▊ | 19762/22434 [19:05:46<3:10:56, 4.29s/it] +2025-02-06 05:13:28 - ERROR - stderr - 88%|████████▊ | 19763/22434 [19:05:48<2:47:14, 3.76s/it] +2025-02-06 05:13:28 - ERROR - stderr - +2025-02-06 05:13:28 - ERROR - stderr - +2025-02-06 05:13:28 - INFO - stdout - {'loss': 0.3613, 'grad_norm': 1.4977796077728271, 'learning_rate': 7.343636868793147e-07, 'epoch': 2.64} +2025-02-06 05:13:28 - ERROR - stderr - 
88%|████████▊ | 19763/22434 [19:05:48<2:47:14, 3.76s/it] +2025-02-06 05:13:31 - ERROR - stderr - 88%|████████▊ | 19764/22434 [19:05:51<2:31:26, 3.40s/it] +2025-02-06 05:13:31 - ERROR - stderr - +2025-02-06 05:13:31 - ERROR - stderr - +2025-02-06 05:13:31 - INFO - stdout - {'loss': 0.4185, 'grad_norm': 1.6045700311660767, 'learning_rate': 7.33820735399473e-07, 'epoch': 2.64} +2025-02-06 05:13:31 - ERROR - stderr - 88%|████████▊ | 19764/22434 [19:05:51<2:31:26, 3.40s/it] +2025-02-06 05:13:33 - ERROR - stderr - 88%|████████▊ | 19765/22434 [19:05:53<2:19:30, 3.14s/it] +2025-02-06 05:13:34 - ERROR - stderr - +2025-02-06 05:13:34 - ERROR - stderr - +2025-02-06 05:13:34 - INFO - stdout - {'loss': 0.3672, 'grad_norm': 1.6093525886535645, 'learning_rate': 7.332779770642751e-07, 'epoch': 2.64} +2025-02-06 05:13:34 - ERROR - stderr - 88%|████████▊ | 19765/22434 [19:05:53<2:19:30, 3.14s/it] +2025-02-06 05:13:36 - ERROR - stderr - 88%|████████▊ | 19766/22434 [19:05:56<2:10:20, 2.93s/it] +2025-02-06 05:13:36 - ERROR - stderr - +2025-02-06 05:13:36 - ERROR - stderr - +2025-02-06 05:13:36 - INFO - stdout - {'loss': 0.3382, 'grad_norm': 1.4677379131317139, 'learning_rate': 7.327354118850272e-07, 'epoch': 2.64} +2025-02-06 05:13:36 - ERROR - stderr - 88%|████████▊ | 19766/22434 [19:05:56<2:10:20, 2.93s/it] +2025-02-06 05:13:38 - ERROR - stderr - 88%|████████▊ | 19767/22434 [19:05:58<2:04:20, 2.80s/it] +2025-02-06 05:13:38 - ERROR - stderr - +2025-02-06 05:13:38 - ERROR - stderr - +2025-02-06 05:13:38 - INFO - stdout - {'loss': 0.3482, 'grad_norm': 1.4054431915283203, 'learning_rate': 7.321930398730436e-07, 'epoch': 2.64} +2025-02-06 05:13:38 - ERROR - stderr - 88%|████████▊ | 19767/22434 [19:05:58<2:04:20, 2.80s/it] +2025-02-06 05:13:41 - ERROR - stderr - 88%|████████▊ | 19768/22434 [19:06:01<2:01:31, 2.74s/it] +2025-02-06 05:13:41 - ERROR - stderr - +2025-02-06 05:13:41 - ERROR - stderr - +2025-02-06 05:13:41 - INFO - stdout - {'loss': 0.3382, 'grad_norm': 1.6680492162704468, 
'learning_rate': 7.316508610396289e-07, 'epoch': 2.64} +2025-02-06 05:13:41 - ERROR - stderr - 88%|████████▊ | 19768/22434 [19:06:01<2:01:31, 2.74s/it] +2025-02-06 05:13:44 - ERROR - stderr - 88%|████████▊ | 19769/22434 [19:06:03<1:59:13, 2.68s/it] +2025-02-06 05:13:44 - ERROR - stderr - +2025-02-06 05:13:44 - ERROR - stderr - +2025-02-06 05:13:44 - INFO - stdout - {'loss': 0.2845, 'grad_norm': 1.4538205862045288, 'learning_rate': 7.311088753960804e-07, 'epoch': 2.64} +2025-02-06 05:13:44 - ERROR - stderr - 88%|████████▊ | 19769/22434 [19:06:03<1:59:13, 2.68s/it] +2025-02-06 05:13:46 - ERROR - stderr - 88%|████████▊ | 19770/22434 [19:06:06<1:57:36, 2.65s/it] +2025-02-06 05:13:46 - ERROR - stderr - +2025-02-06 05:13:46 - ERROR - stderr - +2025-02-06 05:13:46 - INFO - stdout - {'loss': 0.4137, 'grad_norm': 1.687366247177124, 'learning_rate': 7.305670829537004e-07, 'epoch': 2.64} +2025-02-06 05:13:46 - ERROR - stderr - 88%|████████▊ | 19770/22434 [19:06:06<1:57:36, 2.65s/it] +2025-02-06 05:13:49 - ERROR - stderr - 88%|████████▊ | 19771/22434 [19:06:08<1:55:40, 2.61s/it] +2025-02-06 05:13:49 - ERROR - stderr - +2025-02-06 05:13:49 - ERROR - stderr - +2025-02-06 05:13:49 - INFO - stdout - {'loss': 0.418, 'grad_norm': 1.7847224473953247, 'learning_rate': 7.300254837237797e-07, 'epoch': 2.64} +2025-02-06 05:13:49 - ERROR - stderr - 88%|████████▊ | 19771/22434 [19:06:08<1:55:40, 2.61s/it] +2025-02-06 05:13:51 - ERROR - stderr - 88%|████████▊ | 19772/22434 [19:06:11<1:53:54, 2.57s/it] +2025-02-06 05:13:51 - ERROR - stderr - +2025-02-06 05:13:51 - ERROR - stderr - +2025-02-06 05:13:51 - INFO - stdout - {'loss': 0.351, 'grad_norm': 1.6247503757476807, 'learning_rate': 7.29484077717606e-07, 'epoch': 2.64} +2025-02-06 05:13:51 - ERROR - stderr - 88%|████████▊ | 19772/22434 [19:06:11<1:53:54, 2.57s/it] +2025-02-06 05:14:02 - ERROR - stderr - 88%|████████▊ | 19773/22434 [19:06:21<3:40:28, 4.97s/it] +2025-02-06 05:14:02 - ERROR - stderr - +2025-02-06 05:14:02 - ERROR - stderr - 
+2025-02-06 05:14:02 - INFO - stdout - {'loss': 0.3298, 'grad_norm': 1.4451861381530762, 'learning_rate': 7.289428649464658e-07, 'epoch': 2.64} +2025-02-06 05:14:02 - ERROR - stderr - 88%|████████▊ | 19773/22434 [19:06:22<3:40:28, 4.97s/it] +2025-02-06 05:14:10 - ERROR - stderr - 88%|████████▊ | 19774/22434 [19:06:30<4:25:06, 5.98s/it] +2025-02-06 05:14:10 - ERROR - stderr - +2025-02-06 05:14:10 - ERROR - stderr - +2025-02-06 05:14:10 - INFO - stdout - {'loss': 0.362, 'grad_norm': 1.5634313821792603, 'learning_rate': 7.28401845421639e-07, 'epoch': 2.64} +2025-02-06 05:14:10 - ERROR - stderr - 88%|████████▊ | 19774/22434 [19:06:30<4:25:06, 5.98s/it] +2025-02-06 05:14:17 - ERROR - stderr - 88%|████████▊ | 19775/22434 [19:06:37<4:42:28, 6.37s/it] +2025-02-06 05:14:17 - ERROR - stderr - +2025-02-06 05:14:17 - ERROR - stderr - +2025-02-06 05:14:17 - INFO - stdout - {'loss': 0.4012, 'grad_norm': 1.6519863605499268, 'learning_rate': 7.278610191544067e-07, 'epoch': 2.64} +2025-02-06 05:14:17 - ERROR - stderr - 88%|████████▊ | 19775/22434 [19:06:37<4:42:28, 6.37s/it] +2025-02-06 05:14:35 - ERROR - stderr - 88%|████████▊ | 19776/22434 [19:06:55<7:17:38, 9.88s/it] +2025-02-06 05:14:35 - ERROR - stderr - +2025-02-06 05:14:35 - ERROR - stderr - +2025-02-06 05:14:35 - INFO - stdout - {'loss': 0.4192, 'grad_norm': 1.7648208141326904, 'learning_rate': 7.273203861560374e-07, 'epoch': 2.64} +2025-02-06 05:14:35 - ERROR - stderr - 88%|████████▊ | 19776/22434 [19:06:55<7:17:38, 9.88s/it] +2025-02-06 05:14:52 - ERROR - stderr - 88%|████████▊ | 19777/22434 [19:07:12<8:46:04, 11.88s/it] +2025-02-06 05:14:52 - ERROR - stderr - +2025-02-06 05:14:52 - ERROR - stderr - +2025-02-06 05:14:52 - INFO - stdout - {'loss': 0.3976, 'grad_norm': 1.6875418424606323, 'learning_rate': 7.267799464378023e-07, 'epoch': 2.64} +2025-02-06 05:14:52 - ERROR - stderr - 88%|████████▊ | 19777/22434 [19:07:12<8:46:04, 11.88s/it] +2025-02-06 05:15:01 - ERROR - stderr - 88%|████████▊ | 19778/22434 [19:07:21<8:07:46, 
11.02s/it] +2025-02-06 05:15:01 - ERROR - stderr - +2025-02-06 05:15:01 - ERROR - stderr - +2025-02-06 05:15:01 - INFO - stdout - {'loss': 0.3423, 'grad_norm': 1.388227939605713, 'learning_rate': 7.262397000109645e-07, 'epoch': 2.64} +2025-02-06 05:15:01 - ERROR - stderr - 88%|████████▊ | 19778/22434 [19:07:21<8:07:46, 11.02s/it] +2025-02-06 05:15:03 - ERROR - stderr - 88%|████████▊ | 19779/22434 [19:07:23<6:13:34, 8.44s/it] +2025-02-06 05:15:03 - ERROR - stderr - +2025-02-06 05:15:03 - ERROR - stderr - +2025-02-06 05:15:03 - INFO - stdout - {'loss': 0.3515, 'grad_norm': 1.7273554801940918, 'learning_rate': 7.256996468867871e-07, 'epoch': 2.64} +2025-02-06 05:15:03 - ERROR - stderr - 88%|████████▊ | 19779/22434 [19:07:23<6:13:34, 8.44s/it] +2025-02-06 05:15:43 - ERROR - stderr - 88%|████████▊ | 19780/22434 [19:08:02<13:00:35, 17.65s/it] +2025-02-06 05:15:43 - ERROR - stderr - +2025-02-06 05:15:43 - ERROR - stderr - +2025-02-06 05:15:43 - INFO - stdout - {'loss': 0.3665, 'grad_norm': 1.3983500003814697, 'learning_rate': 7.251597870765259e-07, 'epoch': 2.65} +2025-02-06 05:15:43 - ERROR - stderr - 88%|████████▊ | 19780/22434 [19:08:02<13:00:35, 17.65s/it] +2025-02-06 05:15:53 - ERROR - stderr - 88%|████████▊ | 19781/22434 [19:08:13<11:27:20, 15.54s/it] +2025-02-06 05:15:53 - ERROR - stderr - +2025-02-06 05:15:53 - ERROR - stderr - +2025-02-06 05:15:53 - INFO - stdout - {'loss': 0.3771, 'grad_norm': 1.5141669511795044, 'learning_rate': 7.246201205914338e-07, 'epoch': 2.65} +2025-02-06 05:15:53 - ERROR - stderr - 88%|████████▊ | 19781/22434 [19:08:13<11:27:20, 15.54s/it] +2025-02-06 05:16:35 - ERROR - stderr - 88%|████████▊ | 19782/22434 [19:08:55<17:21:50, 23.57s/it] +2025-02-06 05:16:36 - ERROR - stderr - +2025-02-06 05:16:36 - ERROR - stderr - +2025-02-06 05:16:36 - INFO - stdout - {'loss': 0.3508, 'grad_norm': 1.422760009765625, 'learning_rate': 7.240806474427598e-07, 'epoch': 2.65} +2025-02-06 05:16:36 - ERROR - stderr - 88%|████████▊ | 19782/22434 
[19:08:55<17:21:50, 23.57s/it] +2025-02-06 05:16:47 - ERROR - stderr - 88%|████████▊ | 19783/22434 [19:09:07<14:42:07, 19.96s/it] +2025-02-06 05:16:47 - ERROR - stderr - +2025-02-06 05:16:47 - ERROR - stderr - +2025-02-06 05:16:47 - INFO - stdout - {'loss': 0.382, 'grad_norm': 1.5840612649917603, 'learning_rate': 7.23541367641748e-07, 'epoch': 2.65} +2025-02-06 05:16:47 - ERROR - stderr - 88%|████████▊ | 19783/22434 [19:09:07<14:42:07, 19.96s/it] +2025-02-06 05:17:40 - ERROR - stderr - 88%|████████▊ | 19784/22434 [19:09:59<21:53:31, 29.74s/it] +2025-02-06 05:17:40 - ERROR - stderr - +2025-02-06 05:17:40 - ERROR - stderr - +2025-02-06 05:17:40 - INFO - stdout - {'loss': 0.3316, 'grad_norm': 1.472528338432312, 'learning_rate': 7.230022811996407e-07, 'epoch': 2.65} +2025-02-06 05:17:40 - ERROR - stderr - 88%|████████▊ | 19784/22434 [19:09:59<21:53:31, 29.74s/it] +2025-02-06 05:18:02 - ERROR - stderr - 88%|████████▊ | 19785/22434 [19:10:22<20:22:49, 27.70s/it] +2025-02-06 05:18:03 - ERROR - stderr - +2025-02-06 05:18:03 - ERROR - stderr - +2025-02-06 05:18:03 - INFO - stdout - {'loss': 0.3503, 'grad_norm': 1.5449398756027222, 'learning_rate': 7.224633881276732e-07, 'epoch': 2.65} +2025-02-06 05:18:03 - ERROR - stderr - 88%|████████▊ | 19785/22434 [19:10:22<20:22:49, 27.70s/it] +2025-02-06 05:18:56 - ERROR - stderr - 88%|████████▊ | 19786/22434 [19:11:15<25:58:58, 35.32s/it] +2025-02-06 05:18:56 - ERROR - stderr - +2025-02-06 05:18:56 - ERROR - stderr - +2025-02-06 05:18:56 - INFO - stdout - {'loss': 0.3918, 'grad_norm': 1.8165409564971924, 'learning_rate': 7.21924688437079e-07, 'epoch': 2.65} +2025-02-06 05:18:56 - ERROR - stderr - 88%|████████▊ | 19786/22434 [19:11:15<25:58:58, 35.32s/it] +2025-02-06 05:19:46 - ERROR - stderr - 88%|████████▊ | 19787/22434 [19:12:06<29:23:38, 39.98s/it] +2025-02-06 05:19:46 - ERROR - stderr - +2025-02-06 05:19:46 - ERROR - stderr - +2025-02-06 05:19:46 - INFO - stdout - {'loss': 0.3221, 'grad_norm': 1.5480653047561646, 'learning_rate': 
7.213861821390877e-07, 'epoch': 2.65} +2025-02-06 05:19:46 - ERROR - stderr - 88%|████████▊ | 19787/22434 [19:12:06<29:23:38, 39.98s/it] +2025-02-06 05:19:58 - ERROR - stderr - 88%|████████▊ | 19788/22434 [19:12:18<23:12:05, 31.57s/it] +2025-02-06 05:19:58 - ERROR - stderr - +2025-02-06 05:19:58 - ERROR - stderr - +2025-02-06 05:19:58 - INFO - stdout - {'loss': 0.4337, 'grad_norm': 1.6885993480682373, 'learning_rate': 7.208478692449194e-07, 'epoch': 2.65} +2025-02-06 05:19:58 - ERROR - stderr - 88%|████████▊ | 19788/22434 [19:12:18<23:12:05, 31.57s/it] +2025-02-06 05:20:01 - ERROR - stderr - 88%|████████▊ | 19789/22434 [19:12:21<16:48:38, 22.88s/it] +2025-02-06 05:20:01 - ERROR - stderr - +2025-02-06 05:20:01 - ERROR - stderr - +2025-02-06 05:20:01 - INFO - stdout - {'loss': 0.4306, 'grad_norm': 1.7999461889266968, 'learning_rate': 7.203097497658019e-07, 'epoch': 2.65} +2025-02-06 05:20:01 - ERROR - stderr - 88%|████████▊ | 19789/22434 [19:12:21<16:48:38, 22.88s/it] +2025-02-06 05:20:04 - ERROR - stderr - 88%|████████▊ | 19790/22434 [19:12:23<12:19:47, 16.79s/it] +2025-02-06 05:20:04 - ERROR - stderr - +2025-02-06 05:20:04 - ERROR - stderr - +2025-02-06 05:20:04 - INFO - stdout - {'loss': 0.4277, 'grad_norm': 1.8441200256347656, 'learning_rate': 7.197718237129447e-07, 'epoch': 2.65} +2025-02-06 05:20:04 - ERROR - stderr - 88%|████████▊ | 19790/22434 [19:12:23<12:19:47, 16.79s/it] +2025-02-06 05:20:54 - ERROR - stderr - 88%|████████▊ | 19791/22434 [19:13:13<19:38:50, 26.76s/it] +2025-02-06 05:20:54 - ERROR - stderr - +2025-02-06 05:20:54 - ERROR - stderr - +2025-02-06 05:20:54 - INFO - stdout - {'loss': 0.3213, 'grad_norm': 1.3388022184371948, 'learning_rate': 7.192340910975659e-07, 'epoch': 2.65} +2025-02-06 05:20:54 - ERROR - stderr - 88%|████████▊ | 19791/22434 [19:13:13<19:38:50, 26.76s/it] +2025-02-06 05:21:40 - ERROR - stderr - 88%|████████▊ | 19792/22434 [19:14:00<24:04:10, 32.80s/it] +2025-02-06 05:21:41 - ERROR - stderr - +2025-02-06 05:21:41 - ERROR - 
stderr - +2025-02-06 05:21:41 - INFO - stdout - {'loss': 0.3761, 'grad_norm': 1.6072968244552612, 'learning_rate': 7.186965519308709e-07, 'epoch': 2.65} +2025-02-06 05:21:41 - ERROR - stderr - 88%|████████▊ | 19792/22434 [19:14:00<24:04:10, 32.80s/it] +2025-02-06 05:22:35 - ERROR - stderr - 88%|████████▊ | 19793/22434 [19:14:55<28:56:02, 39.44s/it] +2025-02-06 05:22:35 - ERROR - stderr - +2025-02-06 05:22:35 - ERROR - stderr - +2025-02-06 05:22:35 - INFO - stdout - {'loss': 0.364, 'grad_norm': 1.4616632461547852, 'learning_rate': 7.181592062240638e-07, 'epoch': 2.65} +2025-02-06 05:22:35 - ERROR - stderr - 88%|████████▊ | 19793/22434 [19:14:55<28:56:02, 39.44s/it] +2025-02-06 05:23:26 - ERROR - stderr - 88%|████████▊ | 19794/22434 [19:15:46<31:19:16, 42.71s/it] +2025-02-06 05:23:26 - ERROR - stderr - +2025-02-06 05:23:26 - ERROR - stderr - +2025-02-06 05:23:26 - INFO - stdout - {'loss': 0.3966, 'grad_norm': 1.6701246500015259, 'learning_rate': 7.176220539883494e-07, 'epoch': 2.65} +2025-02-06 05:23:26 - ERROR - stderr - 88%|████████▊ | 19794/22434 [19:15:46<31:19:16, 42.71s/it] +2025-02-06 05:23:48 - ERROR - stderr - 88%|████████▊ | 19795/22434 [19:16:08<26:50:08, 36.61s/it] +2025-02-06 05:23:48 - ERROR - stderr - +2025-02-06 05:23:48 - ERROR - stderr - +2025-02-06 05:23:48 - INFO - stdout - {'loss': 0.3919, 'grad_norm': 1.6362653970718384, 'learning_rate': 7.170850952349185e-07, 'epoch': 2.65} +2025-02-06 05:23:48 - ERROR - stderr - 88%|████████▊ | 19795/22434 [19:16:08<26:50:08, 36.61s/it] +2025-02-06 05:24:00 - ERROR - stderr - 88%|████████▊ | 19796/22434 [19:16:19<21:16:52, 29.04s/it] +2025-02-06 05:24:00 - ERROR - stderr - +2025-02-06 05:24:00 - ERROR - stderr - +2025-02-06 05:24:00 - INFO - stdout - {'loss': 0.3365, 'grad_norm': 1.5370182991027832, 'learning_rate': 7.165483299749665e-07, 'epoch': 2.65} +2025-02-06 05:24:00 - ERROR - stderr - 88%|████████▊ | 19796/22434 [19:16:19<21:16:52, 29.04s/it] +2025-02-06 05:24:08 - ERROR - stderr - 88%|████████▊ | 
19797/22434 [19:16:27<16:41:38, 22.79s/it] +2025-02-06 05:24:08 - ERROR - stderr - +2025-02-06 05:24:08 - ERROR - stderr - +2025-02-06 05:24:08 - INFO - stdout - {'loss': 0.4003, 'grad_norm': 1.7026193141937256, 'learning_rate': 7.160117582196813e-07, 'epoch': 2.65} +2025-02-06 05:24:08 - ERROR - stderr - 88%|████████▊ | 19797/22434 [19:16:28<16:41:38, 22.79s/it] +2025-02-06 05:24:10 - ERROR - stderr - 88%|████████▊ | 19798/22434 [19:16:30<12:16:22, 16.76s/it] +2025-02-06 05:24:10 - ERROR - stderr - +2025-02-06 05:24:10 - ERROR - stderr - +2025-02-06 05:24:10 - INFO - stdout - {'loss': 0.3372, 'grad_norm': 1.6964266300201416, 'learning_rate': 7.154753799802472e-07, 'epoch': 2.65} +2025-02-06 05:24:10 - ERROR - stderr - 88%|████████▊ | 19798/22434 [19:16:30<12:16:22, 16.76s/it] +2025-02-06 05:24:13 - ERROR - stderr - 88%|████████▊ | 19799/22434 [19:16:33<9:07:25, 12.47s/it] +2025-02-06 05:24:13 - ERROR - stderr - +2025-02-06 05:24:13 - ERROR - stderr - +2025-02-06 05:24:13 - INFO - stdout - {'loss': 0.3726, 'grad_norm': 1.4802186489105225, 'learning_rate': 7.149391952678453e-07, 'epoch': 2.65} +2025-02-06 05:24:13 - ERROR - stderr - 88%|████████▊ | 19799/22434 [19:16:33<9:07:25, 12.47s/it] +2025-02-06 05:24:15 - ERROR - stderr - 88%|████████▊ | 19800/22434 [19:16:35<6:56:50, 9.50s/it] +2025-02-06 05:24:15 - ERROR - stderr - +2025-02-06 05:24:15 - ERROR - stderr - +2025-02-06 05:24:15 - INFO - stdout - {'loss': 0.4067, 'grad_norm': 1.526956558227539, 'learning_rate': 7.144032040936499e-07, 'epoch': 2.65} +2025-02-06 05:24:15 - ERROR - stderr - 88%|████████▊ | 19800/22434 [19:16:35<6:56:50, 9.50s/it] +2025-02-06 05:24:18 - ERROR - stderr - 88%|████████▊ | 19801/22434 [19:16:38<5:25:07, 7.41s/it] +2025-02-06 05:24:18 - ERROR - stderr - +2025-02-06 05:24:18 - ERROR - stderr - +2025-02-06 05:24:18 - INFO - stdout - {'loss': 0.3489, 'grad_norm': 1.4172106981277466, 'learning_rate': 7.138674064688344e-07, 'epoch': 2.65} +2025-02-06 05:24:18 - ERROR - stderr - 88%|████████▊ 
| 19801/22434 [19:16:38<5:25:07, 7.41s/it] +2025-02-06 05:24:20 - ERROR - stderr - 88%|████████▊ | 19802/22434 [19:16:40<4:19:27, 5.91s/it] +2025-02-06 05:24:20 - ERROR - stderr - +2025-02-06 05:24:20 - ERROR - stderr - +2025-02-06 05:24:20 - INFO - stdout - {'loss': 0.3319, 'grad_norm': 1.426018238067627, 'learning_rate': 7.133318024045677e-07, 'epoch': 2.65} +2025-02-06 05:24:20 - ERROR - stderr - 88%|████████▊ | 19802/22434 [19:16:40<4:19:27, 5.91s/it] +2025-02-06 05:24:23 - ERROR - stderr - 88%|████████▊ | 19803/22434 [19:16:43<3:36:24, 4.94s/it] +2025-02-06 05:24:23 - ERROR - stderr - +2025-02-06 05:24:23 - ERROR - stderr - +2025-02-06 05:24:23 - INFO - stdout - {'loss': 0.3477, 'grad_norm': 1.5189889669418335, 'learning_rate': 7.127963919120129e-07, 'epoch': 2.65} +2025-02-06 05:24:23 - ERROR - stderr - 88%|████████▊ | 19803/22434 [19:16:43<3:36:24, 4.94s/it] +2025-02-06 05:24:26 - ERROR - stderr - 88%|████████▊ | 19804/22434 [19:16:45<3:04:27, 4.21s/it] +2025-02-06 05:24:26 - ERROR - stderr - +2025-02-06 05:24:26 - ERROR - stderr - +2025-02-06 05:24:26 - INFO - stdout - {'loss': 0.3813, 'grad_norm': 1.5168719291687012, 'learning_rate': 7.1226117500233e-07, 'epoch': 2.65} +2025-02-06 05:24:26 - ERROR - stderr - 88%|████████▊ | 19804/22434 [19:16:45<3:04:27, 4.21s/it] +2025-02-06 05:25:17 - ERROR - stderr - 88%|████████▊ | 19805/22434 [19:17:36<13:19:20, 18.24s/it] +2025-02-06 05:25:17 - ERROR - stderr - +2025-02-06 05:25:17 - ERROR - stderr - +2025-02-06 05:25:17 - INFO - stdout - {'loss': 0.3622, 'grad_norm': 1.7775944471359253, 'learning_rate': 7.117261516866758e-07, 'epoch': 2.65} +2025-02-06 05:25:17 - ERROR - stderr - 88%|████████▊ | 19805/22434 [19:17:36<13:19:20, 18.24s/it] +2025-02-06 05:25:19 - ERROR - stderr - 88%|████████▊ | 19806/22434 [19:17:39<9:52:20, 13.52s/it] +2025-02-06 05:25:19 - ERROR - stderr - +2025-02-06 05:25:19 - ERROR - stderr - +2025-02-06 05:25:19 - INFO - stdout - {'loss': 0.303, 'grad_norm': 1.4655529260635376, 'learning_rate': 
7.111913219762023e-07, 'epoch': 2.65} +2025-02-06 05:25:19 - ERROR - stderr - 88%|████████▊ | 19806/22434 [19:17:39<9:52:20, 13.52s/it] +2025-02-06 05:25:22 - ERROR - stderr - 88%|████████▊ | 19807/22434 [19:17:41<7:26:54, 10.21s/it] +2025-02-06 05:25:22 - ERROR - stderr - +2025-02-06 05:25:22 - ERROR - stderr - +2025-02-06 05:25:22 - INFO - stdout - {'loss': 0.4123, 'grad_norm': 1.6205755472183228, 'learning_rate': 7.106566858820563e-07, 'epoch': 2.65} +2025-02-06 05:25:22 - ERROR - stderr - 88%|████████▊ | 19807/22434 [19:17:41<7:26:54, 10.21s/it] +2025-02-06 05:25:37 - ERROR - stderr - 88%|████████▊ | 19808/22434 [19:17:57<8:35:55, 11.79s/it] +2025-02-06 05:25:37 - ERROR - stderr - +2025-02-06 05:25:37 - ERROR - stderr - +2025-02-06 05:25:37 - INFO - stdout - {'loss': 0.3511, 'grad_norm': 1.4475929737091064, 'learning_rate': 7.101222434153854e-07, 'epoch': 2.65} +2025-02-06 05:25:37 - ERROR - stderr - 88%|████████▊ | 19808/22434 [19:17:57<8:35:55, 11.79s/it] +2025-02-06 05:25:39 - ERROR - stderr - 88%|████████▊ | 19809/22434 [19:17:59<6:33:47, 9.00s/it] +2025-02-06 05:25:40 - ERROR - stderr - +2025-02-06 05:25:40 - ERROR - stderr - +2025-02-06 05:25:40 - INFO - stdout - {'loss': 0.3791, 'grad_norm': 1.561442494392395, 'learning_rate': 7.095879945873241e-07, 'epoch': 2.65} +2025-02-06 05:25:40 - ERROR - stderr - 88%|████████▊ | 19809/22434 [19:17:59<6:33:47, 9.00s/it] +2025-02-06 05:26:13 - ERROR - stderr - 88%|████████▊ | 19810/22434 [19:18:33<12:01:31, 16.50s/it] +2025-02-06 05:26:14 - ERROR - stderr - +2025-02-06 05:26:14 - ERROR - stderr - +2025-02-06 05:26:14 - INFO - stdout - {'loss': 0.3417, 'grad_norm': 1.4973715543746948, 'learning_rate': 7.090539394090135e-07, 'epoch': 2.65} +2025-02-06 05:26:14 - ERROR - stderr - 88%|████████▊ | 19810/22434 [19:18:33<12:01:31, 16.50s/it] +2025-02-06 05:26:16 - ERROR - stderr - 88%|████████▊ | 19811/22434 [19:18:36<8:56:46, 12.28s/it] +2025-02-06 05:26:16 - ERROR - stderr - +2025-02-06 05:26:16 - ERROR - stderr - 
+2025-02-06 05:26:16 - INFO - stdout - {'loss': 0.3539, 'grad_norm': 1.5112063884735107, 'learning_rate': 7.085200778915791e-07, 'epoch': 2.65} +2025-02-06 05:26:16 - ERROR - stderr - 88%|████████▊ | 19811/22434 [19:18:36<8:56:46, 12.28s/it] +2025-02-06 05:26:56 - ERROR - stderr - 88%|████████▊ | 19812/22434 [19:19:15<14:55:55, 20.50s/it] +2025-02-06 05:26:56 - ERROR - stderr - +2025-02-06 05:26:56 - ERROR - stderr - +2025-02-06 05:26:56 - INFO - stdout - {'loss': 0.3828, 'grad_norm': 1.6839542388916016, 'learning_rate': 7.079864100461553e-07, 'epoch': 2.65} +2025-02-06 05:26:56 - ERROR - stderr - 88%|████████▊ | 19812/22434 [19:19:15<14:55:55, 20.50s/it] +2025-02-06 05:27:15 - ERROR - stderr - 88%|████████▊ | 19813/22434 [19:19:34<14:37:06, 20.08s/it] +2025-02-06 05:27:15 - ERROR - stderr - +2025-02-06 05:27:15 - ERROR - stderr - +2025-02-06 05:27:15 - INFO - stdout - {'loss': 0.3595, 'grad_norm': 1.700052261352539, 'learning_rate': 7.074529358838644e-07, 'epoch': 2.65} +2025-02-06 05:27:15 - ERROR - stderr - 88%|████████▊ | 19813/22434 [19:19:35<14:37:06, 20.08s/it] +2025-02-06 05:27:17 - ERROR - stderr - 88%|████████▊ | 19814/22434 [19:19:37<10:46:27, 14.80s/it] +2025-02-06 05:27:17 - ERROR - stderr - +2025-02-06 05:27:17 - ERROR - stderr - +2025-02-06 05:27:17 - INFO - stdout - {'loss': 0.3823, 'grad_norm': 1.4718619585037231, 'learning_rate': 7.069196554158219e-07, 'epoch': 2.65} +2025-02-06 05:27:17 - ERROR - stderr - 88%|████████▊ | 19814/22434 [19:19:37<10:46:27, 14.80s/it] +2025-02-06 05:27:20 - ERROR - stderr - 88%|████████▊ | 19815/22434 [19:19:39<8:04:27, 11.10s/it] +2025-02-06 05:27:20 - ERROR - stderr - +2025-02-06 05:27:20 - ERROR - stderr - +2025-02-06 05:27:20 - INFO - stdout - {'loss': 0.385, 'grad_norm': 1.569956660270691, 'learning_rate': 7.063865686531512e-07, 'epoch': 2.65} +2025-02-06 05:27:20 - ERROR - stderr - 88%|████████▊ | 19815/22434 [19:19:39<8:04:27, 11.10s/it] +2025-02-06 05:27:22 - ERROR - stderr - 88%|████████▊ | 19816/22434 
[19:19:42<6:11:08, 8.51s/it] +2025-02-06 05:27:22 - ERROR - stderr - +2025-02-06 05:27:22 - ERROR - stderr - +2025-02-06 05:27:22 - INFO - stdout - {'loss': 0.3279, 'grad_norm': 1.436689019203186, 'learning_rate': 7.058536756069567e-07, 'epoch': 2.65} +2025-02-06 05:27:22 - ERROR - stderr - 88%|████████▊ | 19816/22434 [19:19:42<6:11:08, 8.51s/it] +2025-02-06 05:28:00 - ERROR - stderr - 88%|████████▊ | 19817/22434 [19:20:20<12:40:26, 17.43s/it] +2025-02-06 05:28:00 - ERROR - stderr - +2025-02-06 05:28:00 - ERROR - stderr - +2025-02-06 05:28:00 - INFO - stdout - {'loss': 0.3277, 'grad_norm': 1.4086973667144775, 'learning_rate': 7.053209762883483e-07, 'epoch': 2.65} +2025-02-06 05:28:00 - ERROR - stderr - 88%|████████▊ | 19817/22434 [19:20:20<12:40:26, 17.43s/it] +2025-02-06 05:28:36 - ERROR - stderr - 88%|████████▊ | 19818/22434 [19:20:55<16:34:15, 22.80s/it] +2025-02-06 05:28:36 - ERROR - stderr - +2025-02-06 05:28:36 - ERROR - stderr - +2025-02-06 05:28:36 - INFO - stdout - {'loss': 0.3912, 'grad_norm': 1.5871057510375977, 'learning_rate': 7.047884707084307e-07, 'epoch': 2.65} +2025-02-06 05:28:36 - ERROR - stderr - 88%|████████▊ | 19818/22434 [19:20:56<16:34:15, 22.80s/it] +2025-02-06 05:29:15 - ERROR - stderr - 88%|████████▊ | 19819/22434 [19:21:35<20:10:46, 27.78s/it] +2025-02-06 05:29:15 - ERROR - stderr - +2025-02-06 05:29:15 - ERROR - stderr - +2025-02-06 05:29:15 - INFO - stdout - {'loss': 0.3633, 'grad_norm': 1.7710801362991333, 'learning_rate': 7.042561588783015e-07, 'epoch': 2.65} +2025-02-06 05:29:15 - ERROR - stderr - 88%|████████▊ | 19819/22434 [19:21:35<20:10:46, 27.78s/it] +2025-02-06 05:29:18 - ERROR - stderr - 88%|████████▊ | 19820/22434 [19:21:37<14:38:58, 20.18s/it] +2025-02-06 05:29:18 - ERROR - stderr - +2025-02-06 05:29:18 - ERROR - stderr - +2025-02-06 05:29:18 - INFO - stdout - {'loss': 0.3009, 'grad_norm': 1.4365084171295166, 'learning_rate': 7.037240408090607e-07, 'epoch': 2.65} +2025-02-06 05:29:18 - ERROR - stderr - 88%|████████▊ | 
19820/22434 [19:21:37<14:38:58, 20.18s/it] +2025-02-06 05:29:46 - ERROR - stderr - 88%|████████▊ | 19821/22434 [19:22:06<16:32:37, 22.79s/it] +2025-02-06 05:29:46 - ERROR - stderr - +2025-02-06 05:29:46 - ERROR - stderr - +2025-02-06 05:29:46 - INFO - stdout - {'loss': 0.3447, 'grad_norm': 1.3276411294937134, 'learning_rate': 7.03192116511795e-07, 'epoch': 2.65} +2025-02-06 05:29:46 - ERROR - stderr - 88%|████████▊ | 19821/22434 [19:22:06<16:32:37, 22.79s/it] +2025-02-06 05:30:15 - ERROR - stderr - 88%|████████▊ | 19822/22434 [19:22:34<17:43:29, 24.43s/it] +2025-02-06 05:30:15 - ERROR - stderr - +2025-02-06 05:30:15 - ERROR - stderr - +2025-02-06 05:30:15 - INFO - stdout - {'loss': 0.3512, 'grad_norm': 1.4707804918289185, 'learning_rate': 7.026603859975933e-07, 'epoch': 2.65} +2025-02-06 05:30:15 - ERROR - stderr - 88%|████████▊ | 19822/22434 [19:22:34<17:43:29, 24.43s/it] +2025-02-06 05:30:34 - ERROR - stderr - 88%|████████▊ | 19823/22434 [19:22:54<16:38:54, 22.95s/it] +2025-02-06 05:30:34 - ERROR - stderr - +2025-02-06 05:30:34 - ERROR - stderr - +2025-02-06 05:30:34 - INFO - stdout - {'loss': 0.3625, 'grad_norm': 1.4275401830673218, 'learning_rate': 7.021288492775391e-07, 'epoch': 2.65} +2025-02-06 05:30:34 - ERROR - stderr - 88%|████████▊ | 19823/22434 [19:22:54<16:38:54, 22.95s/it] +2025-02-06 05:30:53 - ERROR - stderr - 88%|████████▊ | 19824/22434 [19:23:13<15:45:48, 21.74s/it] +2025-02-06 05:30:53 - ERROR - stderr - +2025-02-06 05:30:53 - ERROR - stderr - +2025-02-06 05:30:53 - INFO - stdout - {'loss': 0.4437, 'grad_norm': 1.740086317062378, 'learning_rate': 7.015975063627123e-07, 'epoch': 2.65} +2025-02-06 05:30:53 - ERROR - stderr - 88%|████████▊ | 19824/22434 [19:23:13<15:45:48, 21.74s/it] +2025-02-06 05:30:56 - ERROR - stderr - 88%|████████▊ | 19825/22434 [19:23:15<11:34:56, 15.98s/it] +2025-02-06 05:30:56 - ERROR - stderr - +2025-02-06 05:30:56 - ERROR - stderr - +2025-02-06 05:30:56 - INFO - stdout - {'loss': 0.4039, 'grad_norm': 1.4595612287521362, 
'learning_rate': 7.010663572641885e-07, 'epoch': 2.65} +2025-02-06 05:30:56 - ERROR - stderr - 88%|████████▊ | 19825/22434 [19:23:15<11:34:56, 15.98s/it] +2025-02-06 05:31:17 - ERROR - stderr - 88%|████████▊ | 19826/22434 [19:23:37<12:41:15, 17.51s/it] +2025-02-06 05:31:17 - ERROR - stderr - +2025-02-06 05:31:17 - ERROR - stderr - +2025-02-06 05:31:17 - INFO - stdout - {'loss': 0.3109, 'grad_norm': 1.4893689155578613, 'learning_rate': 7.005354019930377e-07, 'epoch': 2.65} +2025-02-06 05:31:17 - ERROR - stderr - 88%|████████▊ | 19826/22434 [19:23:37<12:41:15, 17.51s/it] +2025-02-06 05:31:19 - ERROR - stderr - 88%|████████▊ | 19827/22434 [19:23:39<9:25:03, 13.00s/it] +2025-02-06 05:31:19 - ERROR - stderr - +2025-02-06 05:31:19 - ERROR - stderr - +2025-02-06 05:31:19 - INFO - stdout - {'loss': 0.3526, 'grad_norm': 1.4931366443634033, 'learning_rate': 7.000046405603278e-07, 'epoch': 2.65} +2025-02-06 05:31:19 - ERROR - stderr - 88%|████████▊ | 19827/22434 [19:23:39<9:25:03, 13.00s/it] +2025-02-06 05:31:22 - ERROR - stderr - 88%|████████▊ | 19828/22434 [19:23:41<7:08:05, 9.86s/it] +2025-02-06 05:31:22 - ERROR - stderr - +2025-02-06 05:31:22 - ERROR - stderr - +2025-02-06 05:31:22 - INFO - stdout - {'loss': 0.3428, 'grad_norm': 1.5984081029891968, 'learning_rate': 6.994740729771221e-07, 'epoch': 2.65} +2025-02-06 05:31:22 - ERROR - stderr - 88%|████████▊ | 19828/22434 [19:23:42<7:08:05, 9.86s/it] +2025-02-06 05:31:24 - ERROR - stderr - 88%|████████▊ | 19829/22434 [19:23:44<5:31:53, 7.64s/it] +2025-02-06 05:31:24 - ERROR - stderr - +2025-02-06 05:31:24 - ERROR - stderr - +2025-02-06 05:31:24 - INFO - stdout - {'loss': 0.3761, 'grad_norm': 1.5655324459075928, 'learning_rate': 6.989436992544807e-07, 'epoch': 2.65} +2025-02-06 05:31:24 - ERROR - stderr - 88%|████████▊ | 19829/22434 [19:23:44<5:31:53, 7.64s/it] +2025-02-06 05:31:27 - ERROR - stderr - 88%|████████▊ | 19830/22434 [19:23:46<4:24:30, 6.09s/it] +2025-02-06 05:31:27 - ERROR - stderr - +2025-02-06 05:31:27 - ERROR - 
stderr - +2025-02-06 05:31:27 - INFO - stdout - {'loss': 0.3501, 'grad_norm': 1.565459132194519, 'learning_rate': 6.984135194034558e-07, 'epoch': 2.65} +2025-02-06 05:31:27 - ERROR - stderr - 88%|████████▊ | 19830/22434 [19:23:47<4:24:30, 6.09s/it] +2025-02-06 05:31:29 - ERROR - stderr - 88%|████████▊ | 19831/22434 [19:23:49<3:37:58, 5.02s/it] +2025-02-06 05:31:29 - ERROR - stderr - +2025-02-06 05:31:29 - ERROR - stderr - +2025-02-06 05:31:29 - INFO - stdout - {'loss': 0.3531, 'grad_norm': 1.5754010677337646, 'learning_rate': 6.978835334351008e-07, 'epoch': 2.65} +2025-02-06 05:31:29 - ERROR - stderr - 88%|████████▊ | 19831/22434 [19:23:49<3:37:58, 5.02s/it] +2025-02-06 05:31:32 - ERROR - stderr - 88%|████████▊ | 19832/22434 [19:23:51<3:04:49, 4.26s/it] +2025-02-06 05:31:32 - ERROR - stderr - +2025-02-06 05:31:32 - ERROR - stderr - +2025-02-06 05:31:32 - INFO - stdout - {'loss': 0.3506, 'grad_norm': 1.5657188892364502, 'learning_rate': 6.973537413604647e-07, 'epoch': 2.65} +2025-02-06 05:31:32 - ERROR - stderr - 88%|████████▊ | 19832/22434 [19:23:52<3:04:49, 4.26s/it] +2025-02-06 05:31:34 - ERROR - stderr - 88%|████████▊ | 19833/22434 [19:23:54<2:42:53, 3.76s/it] +2025-02-06 05:31:34 - ERROR - stderr - +2025-02-06 05:31:34 - ERROR - stderr - +2025-02-06 05:31:34 - INFO - stdout - {'loss': 0.3733, 'grad_norm': 1.4824175834655762, 'learning_rate': 6.968241431905853e-07, 'epoch': 2.65} +2025-02-06 05:31:34 - ERROR - stderr - 88%|████████▊ | 19833/22434 [19:23:54<2:42:53, 3.76s/it] +2025-02-06 05:31:45 - ERROR - stderr - 88%|████████▊ | 19834/22434 [19:24:05<4:18:11, 5.96s/it] +2025-02-06 05:31:45 - ERROR - stderr - +2025-02-06 05:31:45 - ERROR - stderr - +2025-02-06 05:31:45 - INFO - stdout - {'loss': 0.41, 'grad_norm': 1.8440114259719849, 'learning_rate': 6.962947389365071e-07, 'epoch': 2.65} +2025-02-06 05:31:45 - ERROR - stderr - 88%|████████▊ | 19834/22434 [19:24:05<4:18:11, 5.96s/it] +2025-02-06 05:31:48 - ERROR - stderr - 88%|████████▊ | 19835/22434 
[19:24:08<3:33:29, 4.93s/it] +2025-02-06 05:31:48 - ERROR - stderr - +2025-02-06 05:31:48 - ERROR - stderr - +2025-02-06 05:31:48 - INFO - stdout - {'loss': 0.3046, 'grad_norm': 1.4448524713516235, 'learning_rate': 6.95765528609259e-07, 'epoch': 2.65} +2025-02-06 05:31:48 - ERROR - stderr - 88%|████████▊ | 19835/22434 [19:24:08<3:33:29, 4.93s/it] +2025-02-06 05:31:50 - ERROR - stderr - 88%|████████▊ | 19836/22434 [19:24:10<3:01:50, 4.20s/it] +2025-02-06 05:31:50 - ERROR - stderr - +2025-02-06 05:31:50 - ERROR - stderr - +2025-02-06 05:31:50 - INFO - stdout - {'loss': 0.3704, 'grad_norm': 1.571654200553894, 'learning_rate': 6.95236512219879e-07, 'epoch': 2.65} +2025-02-06 05:31:50 - ERROR - stderr - 88%|████████▊ | 19836/22434 [19:24:10<3:01:50, 4.20s/it] +2025-02-06 05:31:53 - ERROR - stderr - 88%|████████▊ | 19837/22434 [19:24:13<2:40:17, 3.70s/it] +2025-02-06 05:31:53 - ERROR - stderr - +2025-02-06 05:31:53 - ERROR - stderr - +2025-02-06 05:31:53 - INFO - stdout - {'loss': 0.3516, 'grad_norm': 1.7086185216903687, 'learning_rate': 6.947076897793881e-07, 'epoch': 2.65} +2025-02-06 05:31:53 - ERROR - stderr - 88%|████████▊ | 19837/22434 [19:24:13<2:40:17, 3.70s/it] +2025-02-06 05:31:55 - ERROR - stderr - 88%|████████▊ | 19838/22434 [19:24:15<2:25:09, 3.36s/it] +2025-02-06 05:31:56 - ERROR - stderr - +2025-02-06 05:31:56 - ERROR - stderr - +2025-02-06 05:31:56 - INFO - stdout - {'loss': 0.383, 'grad_norm': 1.5983456373214722, 'learning_rate': 6.941790612988097e-07, 'epoch': 2.65} +2025-02-06 05:31:56 - ERROR - stderr - 88%|████████▊ | 19838/22434 [19:24:15<2:25:09, 3.36s/it] +2025-02-06 05:31:58 - ERROR - stderr - 88%|████████▊ | 19839/22434 [19:24:18<2:13:45, 3.09s/it] +2025-02-06 05:31:58 - ERROR - stderr - +2025-02-06 05:31:58 - ERROR - stderr - +2025-02-06 05:31:58 - INFO - stdout - {'loss': 0.3736, 'grad_norm': 1.6153956651687622, 'learning_rate': 6.936506267891685e-07, 'epoch': 2.65} +2025-02-06 05:31:58 - ERROR - stderr - 88%|████████▊ | 19839/22434 
[19:24:18<2:13:45, 3.09s/it] +2025-02-06 05:32:01 - ERROR - stderr - 88%|████████▊ | 19840/22434 [19:24:20<2:06:33, 2.93s/it] +2025-02-06 05:32:01 - ERROR - stderr - +2025-02-06 05:32:01 - ERROR - stderr - +2025-02-06 05:32:01 - INFO - stdout - {'loss': 0.3968, 'grad_norm': 1.6302534341812134, 'learning_rate': 6.931223862614711e-07, 'epoch': 2.65} +2025-02-06 05:32:01 - ERROR - stderr - 88%|████████▊ | 19840/22434 [19:24:20<2:06:33, 2.93s/it] +2025-02-06 05:32:03 - ERROR - stderr - 88%|████████▊ | 19841/22434 [19:24:23<2:01:58, 2.82s/it] +2025-02-06 05:32:03 - ERROR - stderr - +2025-02-06 05:32:03 - ERROR - stderr - +2025-02-06 05:32:03 - INFO - stdout - {'loss': 0.3426, 'grad_norm': 1.3857886791229248, 'learning_rate': 6.925943397267331e-07, 'epoch': 2.65} +2025-02-06 05:32:03 - ERROR - stderr - 88%|████████▊ | 19841/22434 [19:24:23<2:01:58, 2.82s/it] +2025-02-06 05:32:06 - ERROR - stderr - 88%|████████▊ | 19842/22434 [19:24:25<1:58:23, 2.74s/it] +2025-02-06 05:32:06 - ERROR - stderr - +2025-02-06 05:32:06 - ERROR - stderr - +2025-02-06 05:32:06 - INFO - stdout - {'loss': 0.3447, 'grad_norm': 1.577000617980957, 'learning_rate': 6.920664871959603e-07, 'epoch': 2.65} +2025-02-06 05:32:06 - ERROR - stderr - 88%|████████▊ | 19842/22434 [19:24:25<1:58:23, 2.74s/it] +2025-02-06 05:32:08 - ERROR - stderr - 88%|████████▊ | 19843/22434 [19:24:28<1:55:11, 2.67s/it] +2025-02-06 05:32:08 - ERROR - stderr - +2025-02-06 05:32:08 - ERROR - stderr - +2025-02-06 05:32:08 - INFO - stdout - {'loss': 0.4224, 'grad_norm': 1.5896180868148804, 'learning_rate': 6.915388286801539e-07, 'epoch': 2.65} +2025-02-06 05:32:08 - ERROR - stderr - 88%|████████▊ | 19843/22434 [19:24:28<1:55:11, 2.67s/it] +2025-02-06 05:32:11 - ERROR - stderr - 88%|████████▊ | 19844/22434 [19:24:30<1:52:04, 2.60s/it] +2025-02-06 05:32:11 - ERROR - stderr - +2025-02-06 05:32:11 - ERROR - stderr - +2025-02-06 05:32:11 - INFO - stdout - {'loss': 0.3862, 'grad_norm': 1.7300999164581299, 'learning_rate': 
6.910113641903138e-07, 'epoch': 2.65} +2025-02-06 05:32:11 - ERROR - stderr - 88%|████████▊ | 19844/22434 [19:24:30<1:52:04, 2.60s/it] +2025-02-06 05:32:13 - ERROR - stderr - 88%|████████▊ | 19845/22434 [19:24:33<1:51:06, 2.57s/it] +2025-02-06 05:32:13 - ERROR - stderr - +2025-02-06 05:32:13 - ERROR - stderr - +2025-02-06 05:32:13 - INFO - stdout - {'loss': 0.3456, 'grad_norm': 1.4600780010223389, 'learning_rate': 6.904840937374336e-07, 'epoch': 2.65} +2025-02-06 05:32:13 - ERROR - stderr - 88%|████████▊ | 19845/22434 [19:24:33<1:51:06, 2.57s/it] +2025-02-06 05:32:16 - ERROR - stderr - 88%|████████▊ | 19846/22434 [19:24:35<1:50:13, 2.56s/it] +2025-02-06 05:32:16 - ERROR - stderr - +2025-02-06 05:32:16 - ERROR - stderr - +2025-02-06 05:32:16 - INFO - stdout - {'loss': 0.3515, 'grad_norm': 1.5271409749984741, 'learning_rate': 6.899570173325043e-07, 'epoch': 2.65} +2025-02-06 05:32:16 - ERROR - stderr - 88%|████████▊ | 19846/22434 [19:24:35<1:50:13, 2.56s/it] +2025-02-06 05:32:18 - ERROR - stderr - 88%|████████▊ | 19847/22434 [19:24:38<1:49:11, 2.53s/it] +2025-02-06 05:32:18 - ERROR - stderr - +2025-02-06 05:32:18 - ERROR - stderr - +2025-02-06 05:32:18 - INFO - stdout - {'loss': 0.3638, 'grad_norm': 1.6505041122436523, 'learning_rate': 6.894301349865129e-07, 'epoch': 2.65} +2025-02-06 05:32:18 - ERROR - stderr - 88%|████████▊ | 19847/22434 [19:24:38<1:49:11, 2.53s/it] +2025-02-06 05:33:40 - ERROR - stderr - 88%|████████▊ | 19848/22434 [19:25:59<18:50:36, 26.23s/it] +2025-02-06 05:33:40 - ERROR - stderr - +2025-02-06 05:33:40 - ERROR - stderr - +2025-02-06 05:33:40 - INFO - stdout - {'loss': 0.3215, 'grad_norm': 1.4916082620620728, 'learning_rate': 6.889034467104427e-07, 'epoch': 2.65} +2025-02-06 05:33:40 - ERROR - stderr - 88%|████████▊ | 19848/22434 [19:25:59<18:50:36, 26.23s/it] +2025-02-06 05:34:45 - ERROR - stderr - 88%|████████▊ | 19849/22434 [19:27:05<27:15:57, 37.97s/it] +2025-02-06 05:34:45 - ERROR - stderr - +2025-02-06 05:34:45 - ERROR - stderr - 
+2025-02-06 05:34:45 - INFO - stdout - {'loss': 0.3463, 'grad_norm': 1.4528629779815674, 'learning_rate': 6.883769525152661e-07, 'epoch': 2.65} +2025-02-06 05:34:45 - ERROR - stderr - 88%|████████▊ | 19849/22434 [19:27:05<27:15:57, 37.97s/it] +2025-02-06 05:35:36 - ERROR - stderr - 88%|████████▊ | 19850/22434 [19:27:56<30:00:44, 41.81s/it] +2025-02-06 05:35:36 - ERROR - stderr - +2025-02-06 05:35:36 - ERROR - stderr - +2025-02-06 05:35:36 - INFO - stdout - {'loss': 0.3691, 'grad_norm': 1.7115123271942139, 'learning_rate': 6.878506524119644e-07, 'epoch': 2.65} +2025-02-06 05:35:36 - ERROR - stderr - 88%|████████▊ | 19850/22434 [19:27:56<30:00:44, 41.81s/it] +2025-02-06 05:36:17 - ERROR - stderr - 88%|████████▊ | 19851/22434 [19:28:37<29:56:44, 41.74s/it] +2025-02-06 05:36:17 - ERROR - stderr - +2025-02-06 05:36:17 - ERROR - stderr - +2025-02-06 05:36:17 - INFO - stdout - {'loss': 0.3304, 'grad_norm': 1.5015507936477661, 'learning_rate': 6.873245464115053e-07, 'epoch': 2.65} +2025-02-06 05:36:17 - ERROR - stderr - 88%|████████▊ | 19851/22434 [19:28:37<29:56:44, 41.74s/it] +2025-02-06 05:36:20 - ERROR - stderr - 88%|████████▊ | 19852/22434 [19:28:40<21:28:43, 29.95s/it] +2025-02-06 05:36:20 - ERROR - stderr - +2025-02-06 05:36:20 - ERROR - stderr - +2025-02-06 05:36:20 - INFO - stdout - {'loss': 0.3747, 'grad_norm': 1.5385017395019531, 'learning_rate': 6.867986345248534e-07, 'epoch': 2.65} +2025-02-06 05:36:20 - ERROR - stderr - 88%|████████▊ | 19852/22434 [19:28:40<21:28:43, 29.95s/it] +2025-02-06 05:36:28 - ERROR - stderr - 88%|████████▊ | 19853/22434 [19:28:47<16:43:45, 23.33s/it] +2025-02-06 05:36:28 - ERROR - stderr - +2025-02-06 05:36:28 - ERROR - stderr - +2025-02-06 05:36:28 - INFO - stdout - {'loss': 0.336, 'grad_norm': 1.4977712631225586, 'learning_rate': 6.862729167629745e-07, 'epoch': 2.65} +2025-02-06 05:36:28 - ERROR - stderr - 88%|████████▊ | 19853/22434 [19:28:47<16:43:45, 23.33s/it] +2025-02-06 05:36:30 - ERROR - stderr - 88%|████████▊ | 19854/22434 
[19:28:50<12:13:53, 17.07s/it] +2025-02-06 05:36:30 - ERROR - stderr - +2025-02-06 05:36:30 - ERROR - stderr - +2025-02-06 05:36:30 - INFO - stdout - {'loss': 0.4123, 'grad_norm': 1.7158117294311523, 'learning_rate': 6.857473931368219e-07, 'epoch': 2.65} +2025-02-06 05:36:30 - ERROR - stderr - 88%|████████▊ | 19854/22434 [19:28:50<12:13:53, 17.07s/it] +2025-02-06 05:36:33 - ERROR - stderr - 89%|████████▊ | 19855/22434 [19:28:52<9:05:40, 12.70s/it] +2025-02-06 05:36:33 - ERROR - stderr - +2025-02-06 05:36:33 - ERROR - stderr - +2025-02-06 05:36:33 - INFO - stdout - {'loss': 0.3497, 'grad_norm': 1.4028549194335938, 'learning_rate': 6.852220636573537e-07, 'epoch': 2.66} +2025-02-06 05:36:33 - ERROR - stderr - 89%|████████▊ | 19855/22434 [19:28:52<9:05:40, 12.70s/it] +2025-02-06 05:37:44 - ERROR - stderr - 89%|████████▊ | 19856/22434 [19:30:04<21:40:06, 30.26s/it] +2025-02-06 05:37:44 - ERROR - stderr - +2025-02-06 05:37:44 - ERROR - stderr - +2025-02-06 05:37:44 - INFO - stdout - {'loss': 0.3474, 'grad_norm': 1.6149935722351074, 'learning_rate': 6.846969283355176e-07, 'epoch': 2.66} +2025-02-06 05:37:44 - ERROR - stderr - 89%|████████▊ | 19856/22434 [19:30:04<21:40:06, 30.26s/it] +2025-02-06 05:38:50 - ERROR - stderr - 89%|████████▊ | 19857/22434 [19:31:10<29:22:23, 41.03s/it] +2025-02-06 05:38:50 - ERROR - stderr - +2025-02-06 05:38:50 - ERROR - stderr - +2025-02-06 05:38:50 - INFO - stdout - {'loss': 0.3403, 'grad_norm': 1.6282262802124023, 'learning_rate': 6.841719871822594e-07, 'epoch': 2.66} +2025-02-06 05:38:50 - ERROR - stderr - 89%|████████▊ | 19857/22434 [19:31:10<29:22:23, 41.03s/it] +2025-02-06 05:39:21 - ERROR - stderr - 89%|████████▊ | 19858/22434 [19:31:41<27:13:09, 38.04s/it] +2025-02-06 05:39:21 - ERROR - stderr - +2025-02-06 05:39:21 - ERROR - stderr - +2025-02-06 05:39:21 - INFO - stdout - {'loss': 0.3611, 'grad_norm': 1.5465480089187622, 'learning_rate': 6.836472402085237e-07, 'epoch': 2.66} +2025-02-06 05:39:21 - ERROR - stderr - 89%|████████▊ | 
19858/22434 [19:31:41<27:13:09, 38.04s/it] +2025-02-06 05:40:18 - ERROR - stderr - 89%|████████▊ | 19859/22434 [19:32:38<31:13:23, 43.65s/it] +2025-02-06 05:40:18 - ERROR - stderr - +2025-02-06 05:40:18 - ERROR - stderr - +2025-02-06 05:40:18 - INFO - stdout - {'loss': 0.3378, 'grad_norm': 1.5175998210906982, 'learning_rate': 6.831226874252439e-07, 'epoch': 2.66} +2025-02-06 05:40:18 - ERROR - stderr - 89%|████████▊ | 19859/22434 [19:32:38<31:13:23, 43.65s/it] +2025-02-06 05:40:47 - ERROR - stderr - 89%|████████▊ | 19860/22434 [19:33:07<28:09:14, 39.38s/it] +2025-02-06 05:40:47 - ERROR - stderr - +2025-02-06 05:40:47 - ERROR - stderr - +2025-02-06 05:40:47 - INFO - stdout - {'loss': 0.3777, 'grad_norm': 1.5405676364898682, 'learning_rate': 6.825983288433602e-07, 'epoch': 2.66} +2025-02-06 05:40:47 - ERROR - stderr - 89%|████████▊ | 19860/22434 [19:33:07<28:09:14, 39.38s/it] +2025-02-06 05:41:37 - ERROR - stderr - 89%|████████▊ | 19861/22434 [19:33:56<30:17:17, 42.38s/it] +2025-02-06 05:41:37 - ERROR - stderr - +2025-02-06 05:41:37 - ERROR - stderr - +2025-02-06 05:41:37 - INFO - stdout - {'loss': 0.3615, 'grad_norm': 1.5204635858535767, 'learning_rate': 6.82074164473796e-07, 'epoch': 2.66} +2025-02-06 05:41:37 - ERROR - stderr - 89%|████████▊ | 19861/22434 [19:33:56<30:17:17, 42.38s/it] +2025-02-06 05:41:47 - ERROR - stderr - 89%|████████▊ | 19862/22434 [19:34:07<23:27:54, 32.84s/it] +2025-02-06 05:41:47 - ERROR - stderr - +2025-02-06 05:41:47 - ERROR - stderr - +2025-02-06 05:41:47 - INFO - stdout - {'loss': 0.3664, 'grad_norm': 1.5734401941299438, 'learning_rate': 6.815501943274804e-07, 'epoch': 2.66} +2025-02-06 05:41:47 - ERROR - stderr - 89%|████████▊ | 19862/22434 [19:34:07<23:27:54, 32.84s/it] +2025-02-06 05:42:16 - ERROR - stderr - 89%|████████▊ | 19863/22434 [19:34:36<22:39:26, 31.73s/it] +2025-02-06 05:42:16 - ERROR - stderr - +2025-02-06 05:42:16 - ERROR - stderr - +2025-02-06 05:42:16 - INFO - stdout - {'loss': 0.3654, 'grad_norm': 1.5918062925338745, 
'learning_rate': 6.810264184153336e-07, 'epoch': 2.66} +2025-02-06 05:42:16 - ERROR - stderr - 89%|████████▊ | 19863/22434 [19:34:36<22:39:26, 31.73s/it] +2025-02-06 05:42:19 - ERROR - stderr - 89%|████████▊ | 19864/22434 [19:34:39<16:24:25, 22.98s/it] +2025-02-06 05:42:19 - ERROR - stderr - +2025-02-06 05:42:19 - ERROR - stderr - +2025-02-06 05:42:19 - INFO - stdout - {'loss': 0.3688, 'grad_norm': 1.5607930421829224, 'learning_rate': 6.805028367482736e-07, 'epoch': 2.66} +2025-02-06 05:42:19 - ERROR - stderr - 89%|████████▊ | 19864/22434 [19:34:39<16:24:25, 22.98s/it] +2025-02-06 05:42:30 - ERROR - stderr - 89%|████████▊ | 19865/22434 [19:34:50<13:48:36, 19.35s/it] +2025-02-06 05:42:30 - ERROR - stderr - +2025-02-06 05:42:30 - ERROR - stderr - +2025-02-06 05:42:30 - INFO - stdout - {'loss': 0.3274, 'grad_norm': 1.6355377435684204, 'learning_rate': 6.799794493372148e-07, 'epoch': 2.66} +2025-02-06 05:42:30 - ERROR - stderr - 89%|████████▊ | 19865/22434 [19:34:50<13:48:36, 19.35s/it] +2025-02-06 05:43:26 - ERROR - stderr - 89%|████████▊ | 19866/22434 [19:35:46<21:46:29, 30.53s/it] +2025-02-06 05:43:26 - ERROR - stderr - +2025-02-06 05:43:26 - ERROR - stderr - +2025-02-06 05:43:26 - INFO - stdout - {'loss': 0.4188, 'grad_norm': 1.7407686710357666, 'learning_rate': 6.794562561930662e-07, 'epoch': 2.66} +2025-02-06 05:43:26 - ERROR - stderr - 89%|████████▊ | 19866/22434 [19:35:46<21:46:29, 30.53s/it] +2025-02-06 05:44:15 - ERROR - stderr - 89%|████████▊ | 19867/22434 [19:36:34<25:33:42, 35.85s/it] +2025-02-06 05:44:15 - ERROR - stderr - +2025-02-06 05:44:15 - ERROR - stderr - +2025-02-06 05:44:15 - INFO - stdout - {'loss': 0.3689, 'grad_norm': 1.5318787097930908, 'learning_rate': 6.789332573267327e-07, 'epoch': 2.66} +2025-02-06 05:44:15 - ERROR - stderr - 89%|████████▊ | 19867/22434 [19:36:34<25:33:42, 35.85s/it] +2025-02-06 05:45:05 - ERROR - stderr - 89%|████████▊ | 19868/22434 [19:37:25<28:36:31, 40.14s/it] +2025-02-06 05:45:05 - ERROR - stderr - +2025-02-06 
05:45:05 - ERROR - stderr - +2025-02-06 05:45:05 - INFO - stdout - {'loss': 0.3792, 'grad_norm': 1.7059862613677979, 'learning_rate': 6.784104527491154e-07, 'epoch': 2.66} +2025-02-06 05:45:05 - ERROR - stderr - 89%|████████▊ | 19868/22434 [19:37:25<28:36:31, 40.14s/it] +2025-02-06 05:46:04 - ERROR - stderr - 89%|████████▊ | 19869/22434 [19:38:24<32:45:05, 45.97s/it] +2025-02-06 05:46:04 - ERROR - stderr - +2025-02-06 05:46:04 - ERROR - stderr - +2025-02-06 05:46:04 - INFO - stdout - {'loss': 0.3567, 'grad_norm': 1.5117486715316772, 'learning_rate': 6.778878424711133e-07, 'epoch': 2.66} +2025-02-06 05:46:04 - ERROR - stderr - 89%|████████▊ | 19869/22434 [19:38:24<32:45:05, 45.97s/it] +2025-02-06 05:46:59 - ERROR - stderr - 89%|████████▊ | 19870/22434 [19:39:19<34:40:30, 48.69s/it] +2025-02-06 05:46:59 - ERROR - stderr - +2025-02-06 05:46:59 - ERROR - stderr - +2025-02-06 05:46:59 - INFO - stdout - {'loss': 0.3616, 'grad_norm': 1.408552885055542, 'learning_rate': 6.773654265036189e-07, 'epoch': 2.66} +2025-02-06 05:46:59 - ERROR - stderr - 89%|████████▊ | 19870/22434 [19:39:19<34:40:30, 48.69s/it] +2025-02-06 05:47:02 - ERROR - stderr - 89%|████████▊ | 19871/22434 [19:39:22<24:47:43, 34.83s/it] +2025-02-06 05:47:02 - ERROR - stderr - +2025-02-06 05:47:02 - ERROR - stderr - +2025-02-06 05:47:02 - INFO - stdout - {'loss': 0.3937, 'grad_norm': 1.5598475933074951, 'learning_rate': 6.768432048575213e-07, 'epoch': 2.66} +2025-02-06 05:47:02 - ERROR - stderr - 89%|████████▊ | 19871/22434 [19:39:22<24:47:43, 34.83s/it] +2025-02-06 05:47:50 - ERROR - stderr - 89%|████████▊ | 19872/22434 [19:40:10<27:34:35, 38.75s/it] +2025-02-06 05:47:50 - ERROR - stderr - +2025-02-06 05:47:50 - ERROR - stderr - +2025-02-06 05:47:50 - INFO - stdout - {'loss': 0.3419, 'grad_norm': 1.5333913564682007, 'learning_rate': 6.763211775437073e-07, 'epoch': 2.66} +2025-02-06 05:47:50 - ERROR - stderr - 89%|████████▊ | 19872/22434 [19:40:10<27:34:35, 38.75s/it] +2025-02-06 05:48:35 - ERROR - stderr - 
89%|████████▊ | 19873/22434 [19:40:55<28:57:12, 40.70s/it] +2025-02-06 05:48:35 - ERROR - stderr - +2025-02-06 05:48:35 - ERROR - stderr - +2025-02-06 05:48:35 - INFO - stdout - {'loss': 0.3508, 'grad_norm': 1.4545553922653198, 'learning_rate': 6.757993445730537e-07, 'epoch': 2.66} +2025-02-06 05:48:35 - ERROR - stderr - 89%|████████▊ | 19873/22434 [19:40:55<28:57:12, 40.70s/it] +2025-02-06 05:48:38 - ERROR - stderr - 89%|████████▊ | 19874/22434 [19:40:57<20:48:00, 29.25s/it] +2025-02-06 05:48:38 - ERROR - stderr - +2025-02-06 05:48:38 - ERROR - stderr - +2025-02-06 05:48:38 - INFO - stdout - {'loss': 0.3852, 'grad_norm': 1.5719926357269287, 'learning_rate': 6.752777059564431e-07, 'epoch': 2.66} +2025-02-06 05:48:38 - ERROR - stderr - 89%|████████▊ | 19874/22434 [19:40:57<20:48:00, 29.25s/it] +2025-02-06 05:49:18 - ERROR - stderr - 89%|████████▊ | 19875/22434 [19:41:38<23:16:17, 32.74s/it] +2025-02-06 05:49:18 - ERROR - stderr - +2025-02-06 05:49:18 - ERROR - stderr - +2025-02-06 05:49:18 - INFO - stdout - {'loss': 0.3534, 'grad_norm': 1.5418215990066528, 'learning_rate': 6.747562617047432e-07, 'epoch': 2.66} +2025-02-06 05:49:18 - ERROR - stderr - 89%|████████▊ | 19875/22434 [19:41:38<23:16:17, 32.74s/it] +2025-02-06 05:49:21 - ERROR - stderr - 89%|████████▊ | 19876/22434 [19:41:41<16:48:00, 23.64s/it] +2025-02-06 05:49:21 - ERROR - stderr - +2025-02-06 05:49:21 - ERROR - stderr - +2025-02-06 05:49:21 - INFO - stdout - {'loss': 0.3364, 'grad_norm': 1.527799367904663, 'learning_rate': 6.742350118288277e-07, 'epoch': 2.66} +2025-02-06 05:49:21 - ERROR - stderr - 89%|████████▊ | 19876/22434 [19:41:41<16:48:00, 23.64s/it] +2025-02-06 05:49:23 - ERROR - stderr - 89%|████████▊ | 19877/22434 [19:41:43<12:18:37, 17.33s/it] +2025-02-06 05:49:24 - ERROR - stderr - +2025-02-06 05:49:24 - ERROR - stderr - +2025-02-06 05:49:24 - INFO - stdout - {'loss': 0.3454, 'grad_norm': 1.4655767679214478, 'learning_rate': 6.737139563395601e-07, 'epoch': 2.66} +2025-02-06 05:49:24 - ERROR 
- stderr - 89%|████████▊ | 19877/22434 [19:41:43<12:18:37, 17.33s/it] +2025-02-06 05:50:04 - ERROR - stderr - 89%|████████▊ | 19878/22434 [19:42:24<17:21:21, 24.45s/it] +2025-02-06 05:50:05 - ERROR - stderr - +2025-02-06 05:50:05 - ERROR - stderr - +2025-02-06 05:50:05 - INFO - stdout - {'loss': 0.3486, 'grad_norm': 1.5679669380187988, 'learning_rate': 6.731930952477983e-07, 'epoch': 2.66} +2025-02-06 05:50:05 - ERROR - stderr - 89%|████████▊ | 19878/22434 [19:42:24<17:21:21, 24.45s/it] +2025-02-06 05:50:29 - ERROR - stderr - 89%|████████▊ | 19879/22434 [19:42:49<17:25:35, 24.55s/it] +2025-02-06 05:50:29 - ERROR - stderr - +2025-02-06 05:50:29 - ERROR - stderr - +2025-02-06 05:50:29 - INFO - stdout - {'loss': 0.3306, 'grad_norm': 1.468704342842102, 'learning_rate': 6.726724285644048e-07, 'epoch': 2.66} +2025-02-06 05:50:29 - ERROR - stderr - 89%|████████▊ | 19879/22434 [19:42:49<17:25:35, 24.55s/it] +2025-02-06 05:50:32 - ERROR - stderr - 89%|████████▊ | 19880/22434 [19:42:52<12:43:14, 17.93s/it] +2025-02-06 05:50:32 - ERROR - stderr - +2025-02-06 05:50:32 - ERROR - stderr - +2025-02-06 05:50:32 - INFO - stdout - {'loss': 0.3988, 'grad_norm': 1.5013827085494995, 'learning_rate': 6.721519563002276e-07, 'epoch': 2.66} +2025-02-06 05:50:32 - ERROR - stderr - 89%|████████▊ | 19880/22434 [19:42:52<12:43:14, 17.93s/it] +2025-02-06 05:51:01 - ERROR - stderr - 89%|████████▊ | 19881/22434 [19:43:20<15:03:05, 21.22s/it] +2025-02-06 05:51:01 - ERROR - stderr - +2025-02-06 05:51:01 - ERROR - stderr - +2025-02-06 05:51:01 - INFO - stdout - {'loss': 0.3604, 'grad_norm': 1.7555853128433228, 'learning_rate': 6.71631678466117e-07, 'epoch': 2.66} +2025-02-06 05:51:01 - ERROR - stderr - 89%|████████▊ | 19881/22434 [19:43:21<15:03:05, 21.22s/it] +2025-02-06 05:51:03 - ERROR - stderr - 89%|████████▊ | 19882/22434 [19:43:23<11:03:25, 15.60s/it] +2025-02-06 05:51:03 - ERROR - stderr - +2025-02-06 05:51:03 - ERROR - stderr - +2025-02-06 05:51:03 - INFO - stdout - {'loss': 0.3602, 
'grad_norm': 1.6543340682983398, 'learning_rate': 6.711115950729174e-07, 'epoch': 2.66} +2025-02-06 05:51:03 - ERROR - stderr - 89%|████████▊ | 19882/22434 [19:43:23<11:03:25, 15.60s/it] +2025-02-06 05:51:42 - ERROR - stderr - 89%|████████▊ | 19883/22434 [19:44:02<16:05:29, 22.71s/it] +2025-02-06 05:51:43 - ERROR - stderr - +2025-02-06 05:51:43 - ERROR - stderr - +2025-02-06 05:51:43 - INFO - stdout - {'loss': 0.3753, 'grad_norm': 1.540244221687317, 'learning_rate': 6.705917061314693e-07, 'epoch': 2.66} +2025-02-06 05:51:43 - ERROR - stderr - 89%|████████▊ | 19883/22434 [19:44:02<16:05:29, 22.71s/it] +2025-02-06 05:52:17 - ERROR - stderr - 89%|████████▊ | 19884/22434 [19:44:37<18:35:54, 26.26s/it] +2025-02-06 05:52:17 - ERROR - stderr - +2025-02-06 05:52:17 - ERROR - stderr - +2025-02-06 05:52:17 - INFO - stdout - {'loss': 0.3217, 'grad_norm': 1.5035181045532227, 'learning_rate': 6.700720116526116e-07, 'epoch': 2.66} +2025-02-06 05:52:17 - ERROR - stderr - 89%|████████▊ | 19884/22434 [19:44:37<18:35:54, 26.26s/it] +2025-02-06 05:52:44 - ERROR - stderr - 89%|████████▊ | 19885/22434 [19:45:03<18:40:50, 26.38s/it] +2025-02-06 05:52:44 - ERROR - stderr - +2025-02-06 05:52:44 - ERROR - stderr - +2025-02-06 05:52:44 - INFO - stdout - {'loss': 0.3697, 'grad_norm': 1.644747257232666, 'learning_rate': 6.695525116471746e-07, 'epoch': 2.66} +2025-02-06 05:52:44 - ERROR - stderr - 89%|████████▊ | 19885/22434 [19:45:03<18:40:50, 26.38s/it] +2025-02-06 05:52:46 - ERROR - stderr - 89%|████████▊ | 19886/22434 [19:45:06<13:35:25, 19.20s/it] +2025-02-06 05:52:46 - ERROR - stderr - +2025-02-06 05:52:46 - ERROR - stderr - +2025-02-06 05:52:46 - INFO - stdout - {'loss': 0.382, 'grad_norm': 1.4243842363357544, 'learning_rate': 6.690332061259863e-07, 'epoch': 2.66} +2025-02-06 05:52:46 - ERROR - stderr - 89%|████████▊ | 19886/22434 [19:45:06<13:35:25, 19.20s/it] +2025-02-06 05:53:12 - ERROR - stderr - 89%|████████▊ | 19887/22434 [19:45:32<15:04:08, 21.30s/it] +2025-02-06 05:53:12 - ERROR 
- stderr - +2025-02-06 05:53:12 - ERROR - stderr - +2025-02-06 05:53:12 - INFO - stdout - {'loss': 0.3226, 'grad_norm': 1.4170126914978027, 'learning_rate': 6.685140950998725e-07, 'epoch': 2.66} +2025-02-06 05:53:12 - ERROR - stderr - 89%|████████▊ | 19887/22434 [19:45:32<15:04:08, 21.30s/it] +2025-02-06 05:53:34 - ERROR - stderr - 89%|████████▊ | 19888/22434 [19:45:53<15:03:06, 21.28s/it] +2025-02-06 05:53:34 - ERROR - stderr - +2025-02-06 05:53:34 - ERROR - stderr - +2025-02-06 05:53:34 - INFO - stdout - {'loss': 0.343, 'grad_norm': 1.4763046503067017, 'learning_rate': 6.679951785796534e-07, 'epoch': 2.66} +2025-02-06 05:53:34 - ERROR - stderr - 89%|████████▊ | 19888/22434 [19:45:53<15:03:06, 21.28s/it] +2025-02-06 05:53:36 - ERROR - stderr - 89%|████████▊ | 19889/22434 [19:45:56<11:03:21, 15.64s/it] +2025-02-06 05:53:36 - ERROR - stderr - +2025-02-06 05:53:36 - ERROR - stderr - +2025-02-06 05:53:36 - INFO - stdout - {'loss': 0.4187, 'grad_norm': 1.7905995845794678, 'learning_rate': 6.674764565761449e-07, 'epoch': 2.66} +2025-02-06 05:53:36 - ERROR - stderr - 89%|████████▊ | 19889/22434 [19:45:56<11:03:21, 15.64s/it] +2025-02-06 05:53:39 - ERROR - stderr - 89%|████████▊ | 19890/22434 [19:45:58<8:16:05, 11.70s/it] +2025-02-06 05:53:39 - ERROR - stderr - +2025-02-06 05:53:39 - ERROR - stderr - +2025-02-06 05:53:39 - INFO - stdout - {'loss': 0.362, 'grad_norm': 1.4849501848220825, 'learning_rate': 6.669579291001593e-07, 'epoch': 2.66} +2025-02-06 05:53:39 - ERROR - stderr - 89%|████████▊ | 19890/22434 [19:45:58<8:16:05, 11.70s/it] +2025-02-06 05:54:02 - ERROR - stderr - 89%|████████▊ | 19891/22434 [19:46:21<10:39:19, 15.08s/it] +2025-02-06 05:54:02 - ERROR - stderr - +2025-02-06 05:54:02 - ERROR - stderr - +2025-02-06 05:54:02 - INFO - stdout - {'loss': 0.3826, 'grad_norm': 1.6455808877944946, 'learning_rate': 6.664395961625048e-07, 'epoch': 2.66} +2025-02-06 05:54:02 - ERROR - stderr - 89%|████████▊ | 19891/22434 [19:46:21<10:39:19, 15.08s/it] +2025-02-06 05:54:04 
- ERROR - stderr - 89%|████████▊ | 19892/22434 [19:46:24<7:58:53, 11.30s/it] +2025-02-06 05:54:04 - ERROR - stderr - +2025-02-06 05:54:04 - ERROR - stderr - +2025-02-06 05:54:04 - INFO - stdout - {'loss': 0.3513, 'grad_norm': 1.6777042150497437, 'learning_rate': 6.659214577739858e-07, 'epoch': 2.66} +2025-02-06 05:54:04 - ERROR - stderr - 89%|████████▊ | 19892/22434 [19:46:24<7:58:53, 11.30s/it] +2025-02-06 05:54:06 - ERROR - stderr - 89%|████████▊ | 19893/22434 [19:46:26<6:06:01, 8.64s/it] +2025-02-06 05:54:06 - ERROR - stderr - +2025-02-06 05:54:06 - ERROR - stderr - +2025-02-06 05:54:06 - INFO - stdout - {'loss': 0.3483, 'grad_norm': 1.6227328777313232, 'learning_rate': 6.65403513945404e-07, 'epoch': 2.66} +2025-02-06 05:54:06 - ERROR - stderr - 89%|████████▊ | 19893/22434 [19:46:26<6:06:01, 8.64s/it] +2025-02-06 05:54:09 - ERROR - stderr - 89%|████████▊ | 19894/22434 [19:46:29<4:48:25, 6.81s/it] +2025-02-06 05:54:09 - ERROR - stderr - +2025-02-06 05:54:09 - ERROR - stderr - +2025-02-06 05:54:09 - INFO - stdout - {'loss': 0.3717, 'grad_norm': 1.5736827850341797, 'learning_rate': 6.648857646875506e-07, 'epoch': 2.66} +2025-02-06 05:54:09 - ERROR - stderr - 89%|████████▊ | 19894/22434 [19:46:29<4:48:25, 6.81s/it] +2025-02-06 05:54:12 - ERROR - stderr - 89%|████████▊ | 19895/22434 [19:46:31<3:54:05, 5.53s/it] +2025-02-06 05:54:12 - ERROR - stderr - +2025-02-06 05:54:12 - ERROR - stderr - +2025-02-06 05:54:12 - INFO - stdout - {'loss': 0.3865, 'grad_norm': 1.4937360286712646, 'learning_rate': 6.643682100112226e-07, 'epoch': 2.66} +2025-02-06 05:54:12 - ERROR - stderr - 89%|████████▊ | 19895/22434 [19:46:31<3:54:05, 5.53s/it] +2025-02-06 05:54:22 - ERROR - stderr - 89%|████████▊ | 19896/22434 [19:46:42<5:02:17, 7.15s/it] +2025-02-06 05:54:22 - ERROR - stderr - +2025-02-06 05:54:22 - ERROR - stderr - +2025-02-06 05:54:22 - INFO - stdout - {'loss': 0.3931, 'grad_norm': 1.622722864151001, 'learning_rate': 6.638508499272045e-07, 'epoch': 2.66} +2025-02-06 05:54:22 - 
ERROR - stderr - 89%|████████▊ | 19896/22434 [19:46:42<5:02:17, 7.15s/it] +2025-02-06 05:54:25 - ERROR - stderr - 89%|████████▊ | 19897/22434 [19:46:45<4:03:49, 5.77s/it] +2025-02-06 05:54:25 - ERROR - stderr - +2025-02-06 05:54:25 - ERROR - stderr - +2025-02-06 05:54:25 - INFO - stdout - {'loss': 0.3986, 'grad_norm': 1.6487196683883667, 'learning_rate': 6.633336844462834e-07, 'epoch': 2.66} +2025-02-06 05:54:25 - ERROR - stderr - 89%|████████▊ | 19897/22434 [19:46:45<4:03:49, 5.77s/it] +2025-02-06 05:54:28 - ERROR - stderr - 89%|████████▊ | 19898/22434 [19:46:47<3:22:59, 4.80s/it] +2025-02-06 05:54:28 - ERROR - stderr - +2025-02-06 05:54:28 - ERROR - stderr - +2025-02-06 05:54:28 - INFO - stdout - {'loss': 0.3551, 'grad_norm': 1.453078031539917, 'learning_rate': 6.628167135792385e-07, 'epoch': 2.66} +2025-02-06 05:54:28 - ERROR - stderr - 89%|████████▊ | 19898/22434 [19:46:47<3:22:59, 4.80s/it] +2025-02-06 05:54:30 - ERROR - stderr - 89%|████████▊ | 19899/22434 [19:46:50<2:54:44, 4.14s/it] +2025-02-06 05:54:30 - ERROR - stderr - +2025-02-06 05:54:30 - ERROR - stderr - +2025-02-06 05:54:30 - INFO - stdout - {'loss': 0.3569, 'grad_norm': 1.508697509765625, 'learning_rate': 6.62299937336841e-07, 'epoch': 2.66} +2025-02-06 05:54:30 - ERROR - stderr - 89%|████████▊ | 19899/22434 [19:46:50<2:54:44, 4.14s/it] +2025-02-06 05:54:33 - ERROR - stderr - 89%|████████▊ | 19900/22434 [19:46:52<2:33:46, 3.64s/it] +2025-02-06 05:54:33 - ERROR - stderr - +2025-02-06 05:54:33 - ERROR - stderr - +2025-02-06 05:54:33 - INFO - stdout - {'loss': 0.3209, 'grad_norm': 1.473039984703064, 'learning_rate': 6.617833557298692e-07, 'epoch': 2.66} +2025-02-06 05:54:33 - ERROR - stderr - 89%|████████▊ | 19900/22434 [19:46:52<2:33:46, 3.64s/it] +2025-02-06 05:54:35 - ERROR - stderr - 89%|████████▊ | 19901/22434 [19:46:55<2:19:27, 3.30s/it] +2025-02-06 05:54:35 - ERROR - stderr - +2025-02-06 05:54:35 - ERROR - stderr - +2025-02-06 05:54:35 - INFO - stdout - {'loss': 0.4043, 'grad_norm': 
1.6444505453109741, 'learning_rate': 6.612669687690865e-07, 'epoch': 2.66} +2025-02-06 05:54:35 - ERROR - stderr - 89%|████████▊ | 19901/22434 [19:46:55<2:19:27, 3.30s/it] +2025-02-06 05:54:38 - ERROR - stderr - 89%|████████▊ | 19902/22434 [19:46:58<2:13:07, 3.15s/it] +2025-02-06 05:54:38 - ERROR - stderr - +2025-02-06 05:54:38 - ERROR - stderr - +2025-02-06 05:54:38 - INFO - stdout - {'loss': 0.3975, 'grad_norm': 1.6796361207962036, 'learning_rate': 6.607507764652554e-07, 'epoch': 2.66} +2025-02-06 05:54:38 - ERROR - stderr - 89%|████████▊ | 19902/22434 [19:46:58<2:13:07, 3.15s/it] +2025-02-06 05:54:40 - ERROR - stderr - 89%|████████▊ | 19903/22434 [19:47:00<2:04:54, 2.96s/it] +2025-02-06 05:54:40 - ERROR - stderr - +2025-02-06 05:54:40 - ERROR - stderr - +2025-02-06 05:54:40 - INFO - stdout - {'loss': 0.3833, 'grad_norm': 1.6511452198028564, 'learning_rate': 6.602347788291419e-07, 'epoch': 2.66} +2025-02-06 05:54:40 - ERROR - stderr - 89%|████████▊ | 19903/22434 [19:47:00<2:04:54, 2.96s/it] +2025-02-06 05:54:43 - ERROR - stderr - 89%|████████▊ | 19904/22434 [19:47:03<2:00:38, 2.86s/it] +2025-02-06 05:54:43 - ERROR - stderr - +2025-02-06 05:54:43 - ERROR - stderr - +2025-02-06 05:54:43 - INFO - stdout - {'loss': 0.386, 'grad_norm': 1.7967760562896729, 'learning_rate': 6.597189758714928e-07, 'epoch': 2.66} +2025-02-06 05:54:43 - ERROR - stderr - 89%|████████▊ | 19904/22434 [19:47:03<2:00:38, 2.86s/it] +2025-02-06 05:54:46 - ERROR - stderr - 89%|████████▊ | 19905/22434 [19:47:05<1:56:07, 2.76s/it] +2025-02-06 05:54:46 - ERROR - stderr - +2025-02-06 05:54:46 - ERROR - stderr - +2025-02-06 05:54:46 - INFO - stdout - {'loss': 0.3753, 'grad_norm': 1.7029051780700684, 'learning_rate': 6.592033676030685e-07, 'epoch': 2.66} +2025-02-06 05:54:46 - ERROR - stderr - 89%|████████▊ | 19905/22434 [19:47:05<1:56:07, 2.76s/it] +2025-02-06 05:54:48 - ERROR - stderr - 89%|████████▊ | 19906/22434 [19:47:08<1:53:05, 2.68s/it] +2025-02-06 05:54:48 - ERROR - stderr - +2025-02-06 
05:54:48 - ERROR - stderr - +2025-02-06 05:54:48 - INFO - stdout - {'loss': 0.3704, 'grad_norm': 1.5597264766693115, 'learning_rate': 6.586879540346092e-07, 'epoch': 2.66} +2025-02-06 05:54:48 - ERROR - stderr - 89%|████████▊ | 19906/22434 [19:47:08<1:53:05, 2.68s/it] +2025-02-06 05:54:51 - ERROR - stderr - 89%|████████▊ | 19907/22434 [19:47:10<1:52:32, 2.67s/it] +2025-02-06 05:54:51 - ERROR - stderr - +2025-02-06 05:54:51 - ERROR - stderr - +2025-02-06 05:54:51 - INFO - stdout - {'loss': 0.3864, 'grad_norm': 1.6921013593673706, 'learning_rate': 6.581727351768608e-07, 'epoch': 2.66} +2025-02-06 05:54:51 - ERROR - stderr - 89%|████████▊ | 19907/22434 [19:47:11<1:52:32, 2.67s/it] +2025-02-06 05:54:53 - ERROR - stderr - 89%|████████▊ | 19908/22434 [19:47:13<1:49:58, 2.61s/it] +2025-02-06 05:54:53 - ERROR - stderr - +2025-02-06 05:54:53 - ERROR - stderr - +2025-02-06 05:54:53 - INFO - stdout - {'loss': 0.309, 'grad_norm': 1.52169930934906, 'learning_rate': 6.576577110405635e-07, 'epoch': 2.66} +2025-02-06 05:54:53 - ERROR - stderr - 89%|████████▊ | 19908/22434 [19:47:13<1:49:58, 2.61s/it] +2025-02-06 05:54:56 - ERROR - stderr - 89%|████████▊ | 19909/22434 [19:47:16<1:48:57, 2.59s/it] +2025-02-06 05:54:56 - ERROR - stderr - +2025-02-06 05:54:56 - ERROR - stderr - +2025-02-06 05:54:56 - INFO - stdout - {'loss': 0.3946, 'grad_norm': 1.7437068223953247, 'learning_rate': 6.571428816364512e-07, 'epoch': 2.66} +2025-02-06 05:54:56 - ERROR - stderr - 89%|████████▊ | 19909/22434 [19:47:16<1:48:57, 2.59s/it] +2025-02-06 05:54:58 - ERROR - stderr - 89%|████████▊ | 19910/22434 [19:47:18<1:47:43, 2.56s/it] +2025-02-06 05:54:58 - ERROR - stderr - +2025-02-06 05:54:58 - ERROR - stderr - +2025-02-06 05:54:58 - INFO - stdout - {'loss': 0.365, 'grad_norm': 1.4568579196929932, 'learning_rate': 6.56628246975255e-07, 'epoch': 2.66} +2025-02-06 05:54:58 - ERROR - stderr - 89%|████████▊ | 19910/22434 [19:47:18<1:47:43, 2.56s/it] +2025-02-06 05:55:01 - ERROR - stderr - 89%|████████▉ | 
19911/22434 [19:47:20<1:46:18, 2.53s/it] +2025-02-06 05:55:01 - ERROR - stderr - +2025-02-06 05:55:01 - ERROR - stderr - +2025-02-06 05:55:01 - INFO - stdout - {'loss': 0.3449, 'grad_norm': 1.597946047782898, 'learning_rate': 6.56113807067702e-07, 'epoch': 2.66} +2025-02-06 05:55:01 - ERROR - stderr - 89%|████████▉ | 19911/22434 [19:47:21<1:46:18, 2.53s/it] +2025-02-06 05:55:03 - ERROR - stderr - 89%|████████▉ | 19912/22434 [19:47:23<1:45:08, 2.50s/it] +2025-02-06 05:55:03 - ERROR - stderr - +2025-02-06 05:55:03 - ERROR - stderr - +2025-02-06 05:55:03 - INFO - stdout - {'loss': 0.3218, 'grad_norm': 1.6338046789169312, 'learning_rate': 6.555995619245159e-07, 'epoch': 2.66} +2025-02-06 05:55:03 - ERROR - stderr - 89%|████████▉ | 19912/22434 [19:47:23<1:45:08, 2.50s/it] +2025-02-06 05:55:06 - ERROR - stderr - 89%|████████▉ | 19913/22434 [19:47:26<1:47:08, 2.55s/it] +2025-02-06 05:55:06 - ERROR - stderr - +2025-02-06 05:55:06 - ERROR - stderr - +2025-02-06 05:55:06 - INFO - stdout - {'loss': 0.3316, 'grad_norm': 1.5306764841079712, 'learning_rate': 6.550855115564159e-07, 'epoch': 2.66} +2025-02-06 05:55:06 - ERROR - stderr - 89%|████████▉ | 19913/22434 [19:47:26<1:47:08, 2.55s/it] +2025-02-06 05:55:08 - ERROR - stderr - 89%|████████▉ | 19914/22434 [19:47:28<1:46:30, 2.54s/it] +2025-02-06 05:55:08 - ERROR - stderr - +2025-02-06 05:55:08 - ERROR - stderr - +2025-02-06 05:55:08 - INFO - stdout - {'loss': 0.3819, 'grad_norm': 1.5425440073013306, 'learning_rate': 6.545716559741166e-07, 'epoch': 2.66} +2025-02-06 05:55:08 - ERROR - stderr - 89%|████████▉ | 19914/22434 [19:47:28<1:46:30, 2.54s/it] +2025-02-06 05:55:11 - ERROR - stderr - 89%|████████▉ | 19915/22434 [19:47:31<1:46:20, 2.53s/it] +2025-02-06 05:55:11 - ERROR - stderr - +2025-02-06 05:55:11 - ERROR - stderr - +2025-02-06 05:55:11 - INFO - stdout - {'loss': 0.3806, 'grad_norm': 1.6091891527175903, 'learning_rate': 6.540579951883275e-07, 'epoch': 2.66} +2025-02-06 05:55:11 - ERROR - stderr - 89%|████████▉ | 
19915/22434 [19:47:31<1:46:20, 2.53s/it] +2025-02-06 05:55:13 - ERROR - stderr - 89%|████████▉ | 19916/22434 [19:47:33<1:47:28, 2.56s/it] +2025-02-06 05:55:13 - ERROR - stderr - +2025-02-06 05:55:13 - ERROR - stderr - +2025-02-06 05:55:13 - INFO - stdout - {'loss': 0.3584, 'grad_norm': 1.6633224487304688, 'learning_rate': 6.535445292097564e-07, 'epoch': 2.66} +2025-02-06 05:55:13 - ERROR - stderr - 89%|████████▉ | 19916/22434 [19:47:33<1:47:28, 2.56s/it] +2025-02-06 05:55:16 - ERROR - stderr - 89%|████████▉ | 19917/22434 [19:47:36<1:48:07, 2.58s/it] +2025-02-06 05:55:16 - ERROR - stderr - +2025-02-06 05:55:16 - ERROR - stderr - +2025-02-06 05:55:16 - INFO - stdout - {'loss': 0.3992, 'grad_norm': 1.6942492723464966, 'learning_rate': 6.530312580491082e-07, 'epoch': 2.66} +2025-02-06 05:55:16 - ERROR - stderr - 89%|████████▉ | 19917/22434 [19:47:36<1:48:07, 2.58s/it] +2025-02-06 05:55:36 - ERROR - stderr - 89%|████████▉ | 19918/22434 [19:47:56<5:23:25, 7.71s/it] +2025-02-06 05:55:36 - ERROR - stderr - +2025-02-06 05:55:36 - ERROR - stderr - +2025-02-06 05:55:36 - INFO - stdout - {'loss': 0.3363, 'grad_norm': 1.4368776082992554, 'learning_rate': 6.525181817170756e-07, 'epoch': 2.66} +2025-02-06 05:55:36 - ERROR - stderr - 89%|████████▉ | 19918/22434 [19:47:56<5:23:25, 7.71s/it] +2025-02-06 05:55:48 - ERROR - stderr - 89%|████████▉ | 19919/22434 [19:48:08<6:17:37, 9.01s/it] +2025-02-06 05:55:48 - ERROR - stderr - +2025-02-06 05:55:48 - ERROR - stderr - +2025-02-06 05:55:48 - INFO - stdout - {'loss': 0.3403, 'grad_norm': 1.6549551486968994, 'learning_rate': 6.520053002243609e-07, 'epoch': 2.66} +2025-02-06 05:55:48 - ERROR - stderr - 89%|████████▉ | 19919/22434 [19:48:08<6:17:37, 9.01s/it] +2025-02-06 05:55:57 - ERROR - stderr - 89%|████████▉ | 19920/22434 [19:48:17<6:16:58, 9.00s/it] +2025-02-06 05:55:57 - ERROR - stderr - +2025-02-06 05:55:57 - ERROR - stderr - +2025-02-06 05:55:57 - INFO - stdout - {'loss': 0.3731, 'grad_norm': 1.5179585218429565, 'learning_rate': 
6.514926135816469e-07, 'epoch': 2.66} +2025-02-06 05:55:57 - ERROR - stderr - 89%|████████▉ | 19920/22434 [19:48:17<6:16:58, 9.00s/it] +2025-02-06 05:55:59 - ERROR - stderr - 89%|████████▉ | 19921/22434 [19:48:19<4:54:20, 7.03s/it] +2025-02-06 05:55:59 - ERROR - stderr - +2025-02-06 05:55:59 - ERROR - stderr - +2025-02-06 05:55:59 - INFO - stdout - {'loss': 0.3673, 'grad_norm': 1.564812183380127, 'learning_rate': 6.509801217996259e-07, 'epoch': 2.66} +2025-02-06 05:55:59 - ERROR - stderr - 89%|████████▉ | 19921/22434 [19:48:19<4:54:20, 7.03s/it] +2025-02-06 05:56:02 - ERROR - stderr - 89%|████████▉ | 19922/22434 [19:48:21<3:57:09, 5.66s/it] +2025-02-06 05:56:02 - ERROR - stderr - +2025-02-06 05:56:02 - ERROR - stderr - +2025-02-06 05:56:02 - INFO - stdout - {'loss': 0.3594, 'grad_norm': 1.5664849281311035, 'learning_rate': 6.504678248889785e-07, 'epoch': 2.66} +2025-02-06 05:56:02 - ERROR - stderr - 89%|████████▉ | 19922/22434 [19:48:21<3:57:09, 5.66s/it] +2025-02-06 05:56:34 - ERROR - stderr - 89%|████████▉ | 19923/22434 [19:48:54<9:32:31, 13.68s/it] +2025-02-06 05:56:34 - ERROR - stderr - +2025-02-06 05:56:34 - ERROR - stderr - +2025-02-06 05:56:34 - INFO - stdout - {'loss': 0.3456, 'grad_norm': 1.6470412015914917, 'learning_rate': 6.499557228603803e-07, 'epoch': 2.66} +2025-02-06 05:56:34 - ERROR - stderr - 89%|████████▉ | 19923/22434 [19:48:54<9:32:31, 13.68s/it] +2025-02-06 05:56:54 - ERROR - stderr - 89%|████████▉ | 19924/22434 [19:49:14<10:54:50, 15.65s/it] +2025-02-06 05:56:54 - ERROR - stderr - +2025-02-06 05:56:54 - ERROR - stderr - +2025-02-06 05:56:54 - INFO - stdout - {'loss': 0.3412, 'grad_norm': 1.7087093591690063, 'learning_rate': 6.49443815724512e-07, 'epoch': 2.66} +2025-02-06 05:56:54 - ERROR - stderr - 89%|████████▉ | 19924/22434 [19:49:14<10:54:50, 15.65s/it] +2025-02-06 05:57:17 - ERROR - stderr - 89%|████████▉ | 19925/22434 [19:49:37<12:25:07, 17.82s/it] +2025-02-06 05:57:17 - ERROR - stderr - +2025-02-06 05:57:17 - ERROR - stderr - 
+2025-02-06 05:57:17 - INFO - stdout - {'loss': 0.4009, 'grad_norm': 1.7388477325439453, 'learning_rate': 6.489321034920382e-07, 'epoch': 2.66} +2025-02-06 05:57:17 - ERROR - stderr - 89%|████████▉ | 19925/22434 [19:49:37<12:25:07, 17.82s/it] +2025-02-06 05:57:59 - ERROR - stderr - 89%|████████▉ | 19926/22434 [19:50:18<17:21:38, 24.92s/it] +2025-02-06 05:57:59 - ERROR - stderr - +2025-02-06 05:57:59 - ERROR - stderr - +2025-02-06 05:57:59 - INFO - stdout - {'loss': 0.3168, 'grad_norm': 1.3731889724731445, 'learning_rate': 6.484205861736259e-07, 'epoch': 2.66} +2025-02-06 05:57:59 - ERROR - stderr - 89%|████████▉ | 19926/22434 [19:50:18<17:21:38, 24.92s/it] +2025-02-06 05:58:24 - ERROR - stderr - 89%|████████▉ | 19927/22434 [19:50:44<17:24:12, 24.99s/it] +2025-02-06 05:58:24 - ERROR - stderr - +2025-02-06 05:58:24 - ERROR - stderr - +2025-02-06 05:58:24 - INFO - stdout - {'loss': 0.3591, 'grad_norm': 1.438269019126892, 'learning_rate': 6.479092637799378e-07, 'epoch': 2.66} +2025-02-06 05:58:24 - ERROR - stderr - 89%|████████▉ | 19927/22434 [19:50:44<17:24:12, 24.99s/it] +2025-02-06 05:58:51 - ERROR - stderr - 89%|████████▉ | 19928/22434 [19:51:11<17:54:25, 25.72s/it] +2025-02-06 05:58:51 - ERROR - stderr - +2025-02-06 05:58:51 - ERROR - stderr - +2025-02-06 05:58:51 - INFO - stdout - {'loss': 0.3649, 'grad_norm': 1.6568113565444946, 'learning_rate': 6.473981363216309e-07, 'epoch': 2.66} +2025-02-06 05:58:51 - ERROR - stderr - 89%|████████▉ | 19928/22434 [19:51:11<17:54:25, 25.72s/it] +2025-02-06 05:58:54 - ERROR - stderr - 89%|████████▉ | 19929/22434 [19:51:14<13:06:04, 18.83s/it] +2025-02-06 05:58:54 - ERROR - stderr - +2025-02-06 05:58:54 - ERROR - stderr - +2025-02-06 05:58:54 - INFO - stdout - {'loss': 0.3377, 'grad_norm': 1.5853067636489868, 'learning_rate': 6.468872038093643e-07, 'epoch': 2.67} +2025-02-06 05:58:54 - ERROR - stderr - 89%|████████▉ | 19929/22434 [19:51:14<13:06:04, 18.83s/it] +2025-02-06 05:59:19 - ERROR - stderr - 89%|████████▉ | 19930/22434 
[19:51:39<14:23:22, 20.69s/it] +2025-02-06 05:59:19 - ERROR - stderr - +2025-02-06 05:59:19 - ERROR - stderr - +2025-02-06 05:59:19 - INFO - stdout - {'loss': 0.3285, 'grad_norm': 1.4314448833465576, 'learning_rate': 6.463764662537809e-07, 'epoch': 2.67} +2025-02-06 05:59:19 - ERROR - stderr - 89%|████████▉ | 19930/22434 [19:51:39<14:23:22, 20.69s/it] +2025-02-06 05:59:47 - ERROR - stderr - 89%|████████▉ | 19931/22434 [19:52:07<15:52:08, 22.82s/it] +2025-02-06 05:59:47 - ERROR - stderr - +2025-02-06 05:59:47 - ERROR - stderr - +2025-02-06 05:59:47 - INFO - stdout - {'loss': 0.3849, 'grad_norm': 1.5260133743286133, 'learning_rate': 6.458659236655307e-07, 'epoch': 2.67} +2025-02-06 05:59:47 - ERROR - stderr - 89%|████████▉ | 19931/22434 [19:52:07<15:52:08, 22.82s/it] +2025-02-06 06:00:01 - ERROR - stderr - 89%|████████▉ | 19932/22434 [19:52:21<14:08:14, 20.34s/it] +2025-02-06 06:00:01 - ERROR - stderr - +2025-02-06 06:00:01 - ERROR - stderr - +2025-02-06 06:00:01 - INFO - stdout - {'loss': 0.359, 'grad_norm': 1.530300498008728, 'learning_rate': 6.453555760552544e-07, 'epoch': 2.67} +2025-02-06 06:00:01 - ERROR - stderr - 89%|████████▉ | 19932/22434 [19:52:21<14:08:14, 20.34s/it] +2025-02-06 06:00:16 - ERROR - stderr - 89%|████████▉ | 19933/22434 [19:52:36<12:58:32, 18.68s/it] +2025-02-06 06:00:16 - ERROR - stderr - +2025-02-06 06:00:16 - ERROR - stderr - +2025-02-06 06:00:16 - INFO - stdout - {'loss': 0.3792, 'grad_norm': 1.7612539529800415, 'learning_rate': 6.448454234335888e-07, 'epoch': 2.67} +2025-02-06 06:00:16 - ERROR - stderr - 89%|████████▉ | 19933/22434 [19:52:36<12:58:32, 18.68s/it] +2025-02-06 06:02:04 - ERROR - stderr - 89%|████████▉ | 19934/22434 [19:54:24<31:32:04, 45.41s/it] +2025-02-06 06:02:04 - ERROR - stderr - +2025-02-06 06:02:04 - ERROR - stderr - +2025-02-06 06:02:04 - INFO - stdout - {'loss': 0.4172, 'grad_norm': 1.7463163137435913, 'learning_rate': 6.4433546581117e-07, 'epoch': 2.67} +2025-02-06 06:02:04 - ERROR - stderr - 89%|████████▉ | 
19934/22434 [19:54:24<31:32:04, 45.41s/it] +2025-02-06 06:02:56 - ERROR - stderr - 89%|████████▉ | 19935/22434 [19:55:16<32:54:33, 47.41s/it] +2025-02-06 06:02:56 - ERROR - stderr - +2025-02-06 06:02:56 - ERROR - stderr - +2025-02-06 06:02:56 - INFO - stdout - {'loss': 0.362, 'grad_norm': 1.5070174932479858, 'learning_rate': 6.43825703198625e-07, 'epoch': 2.67} +2025-02-06 06:02:56 - ERROR - stderr - 89%|████████▉ | 19935/22434 [19:55:16<32:54:33, 47.41s/it] +2025-02-06 06:03:52 - ERROR - stderr - 89%|████████▉ | 19936/22434 [19:56:12<34:43:50, 50.05s/it] +2025-02-06 06:03:52 - ERROR - stderr - +2025-02-06 06:03:52 - ERROR - stderr - +2025-02-06 06:03:52 - INFO - stdout - {'loss': 0.3652, 'grad_norm': 1.5224629640579224, 'learning_rate': 6.433161356065798e-07, 'epoch': 2.67} +2025-02-06 06:03:52 - ERROR - stderr - 89%|████████▉ | 19936/22434 [19:56:12<34:43:50, 50.05s/it] +2025-02-06 06:04:06 - ERROR - stderr - 89%|████████▉ | 19937/22434 [19:56:26<27:12:01, 39.22s/it] +2025-02-06 06:04:06 - ERROR - stderr - +2025-02-06 06:04:06 - ERROR - stderr - +2025-02-06 06:04:06 - INFO - stdout - {'loss': 0.3879, 'grad_norm': 1.6397041082382202, 'learning_rate': 6.42806763045657e-07, 'epoch': 2.67} +2025-02-06 06:04:06 - ERROR - stderr - 89%|████████▉ | 19937/22434 [19:56:26<27:12:01, 39.22s/it] +2025-02-06 06:05:05 - ERROR - stderr - 89%|████████▉ | 19938/22434 [19:57:25<31:20:57, 45.22s/it] +2025-02-06 06:05:05 - ERROR - stderr - +2025-02-06 06:05:05 - ERROR - stderr - +2025-02-06 06:05:05 - INFO - stdout - {'loss': 0.4149, 'grad_norm': 1.8572028875350952, 'learning_rate': 6.422975855264757e-07, 'epoch': 2.67} +2025-02-06 06:05:05 - ERROR - stderr - 89%|████████▉ | 19938/22434 [19:57:25<31:20:57, 45.22s/it] +2025-02-06 06:05:58 - ERROR - stderr - 89%|████████▉ | 19939/22434 [19:58:18<32:53:30, 47.46s/it] +2025-02-06 06:05:58 - ERROR - stderr - +2025-02-06 06:05:58 - ERROR - stderr - +2025-02-06 06:05:58 - INFO - stdout - {'loss': 0.4006, 'grad_norm': 1.6111172437667847, 
'learning_rate': 6.417886030596421e-07, 'epoch': 2.67} +2025-02-06 06:05:58 - ERROR - stderr - 89%|████████▉ | 19939/22434 [19:58:18<32:53:30, 47.46s/it] +2025-02-06 06:06:01 - ERROR - stderr - 89%|████████▉ | 19940/22434 [19:58:20<23:31:54, 33.97s/it] +2025-02-06 06:06:01 - ERROR - stderr - +2025-02-06 06:06:01 - ERROR - stderr - +2025-02-06 06:06:01 - INFO - stdout - {'loss': 0.3646, 'grad_norm': 1.547215223312378, 'learning_rate': 6.412798156557732e-07, 'epoch': 2.67} +2025-02-06 06:06:01 - ERROR - stderr - 89%|████████▉ | 19940/22434 [19:58:20<23:31:54, 33.97s/it] +2025-02-06 06:06:51 - ERROR - stderr - 89%|████████▉ | 19941/22434 [19:59:11<26:54:27, 38.86s/it] +2025-02-06 06:06:51 - ERROR - stderr - +2025-02-06 06:06:51 - ERROR - stderr - +2025-02-06 06:06:51 - INFO - stdout - {'loss': 0.3553, 'grad_norm': 1.6440682411193848, 'learning_rate': 6.407712233254726e-07, 'epoch': 2.67} +2025-02-06 06:06:51 - ERROR - stderr - 89%|████████▉ | 19941/22434 [19:59:11<26:54:27, 38.86s/it] +2025-02-06 06:07:36 - ERROR - stderr - 89%|████████▉ | 19942/22434 [19:59:56<28:13:34, 40.78s/it] +2025-02-06 06:07:36 - ERROR - stderr - +2025-02-06 06:07:36 - ERROR - stderr - +2025-02-06 06:07:36 - INFO - stdout - {'loss': 0.3839, 'grad_norm': 1.6081055402755737, 'learning_rate': 6.402628260793365e-07, 'epoch': 2.67} +2025-02-06 06:07:36 - ERROR - stderr - 89%|████████▉ | 19942/22434 [19:59:56<28:13:34, 40.78s/it] +2025-02-06 06:08:17 - ERROR - stderr - 89%|████████▉ | 19943/22434 [20:00:36<28:08:25, 40.67s/it] +2025-02-06 06:08:17 - ERROR - stderr - +2025-02-06 06:08:17 - ERROR - stderr - +2025-02-06 06:08:17 - INFO - stdout - {'loss': 0.3689, 'grad_norm': 1.5576151609420776, 'learning_rate': 6.397546239279684e-07, 'epoch': 2.67} +2025-02-06 06:08:17 - ERROR - stderr - 89%|████████▉ | 19943/22434 [20:00:36<28:08:25, 40.67s/it] +2025-02-06 06:08:32 - ERROR - stderr - 89%|████████▉ | 19944/22434 [20:00:52<22:54:19, 33.12s/it] +2025-02-06 06:08:32 - ERROR - stderr - +2025-02-06 
06:08:32 - ERROR - stderr - +2025-02-06 06:08:32 - INFO - stdout - {'loss': 0.3482, 'grad_norm': 1.4261161088943481, 'learning_rate': 6.392466168819555e-07, 'epoch': 2.67} +2025-02-06 06:08:32 - ERROR - stderr - 89%|████████▉ | 19944/22434 [20:00:52<22:54:19, 33.12s/it] +2025-02-06 06:09:07 - ERROR - stderr - 89%|████████▉ | 19945/22434 [20:01:26<23:12:35, 33.57s/it] +2025-02-06 06:09:07 - ERROR - stderr - +2025-02-06 06:09:07 - ERROR - stderr - +2025-02-06 06:09:07 - INFO - stdout - {'loss': 0.3958, 'grad_norm': 1.633931279182434, 'learning_rate': 6.387388049518927e-07, 'epoch': 2.67} +2025-02-06 06:09:07 - ERROR - stderr - 89%|████████▉ | 19945/22434 [20:01:26<23:12:35, 33.57s/it] +2025-02-06 06:09:13 - ERROR - stderr - 89%|████████▉ | 19946/22434 [20:01:33<17:30:45, 25.34s/it] +2025-02-06 06:09:13 - ERROR - stderr - +2025-02-06 06:09:13 - ERROR - stderr - +2025-02-06 06:09:13 - INFO - stdout - {'loss': 0.3319, 'grad_norm': 1.595692753791809, 'learning_rate': 6.382311881483605e-07, 'epoch': 2.67} +2025-02-06 06:09:13 - ERROR - stderr - 89%|████████▉ | 19946/22434 [20:01:33<17:30:45, 25.34s/it] +2025-02-06 06:09:17 - ERROR - stderr - 89%|████████▉ | 19947/22434 [20:01:37<13:11:49, 19.10s/it] +2025-02-06 06:09:17 - ERROR - stderr - +2025-02-06 06:09:17 - ERROR - stderr - +2025-02-06 06:09:17 - INFO - stdout - {'loss': 0.3228, 'grad_norm': 1.6369017362594604, 'learning_rate': 6.377237664819392e-07, 'epoch': 2.67} +2025-02-06 06:09:17 - ERROR - stderr - 89%|████████▉ | 19947/22434 [20:01:37<13:11:49, 19.10s/it] +2025-02-06 06:09:20 - ERROR - stderr - 89%|████████▉ | 19948/22434 [20:01:40<9:45:12, 14.12s/it] +2025-02-06 06:09:20 - ERROR - stderr - +2025-02-06 06:09:20 - ERROR - stderr - +2025-02-06 06:09:20 - INFO - stdout - {'loss': 0.3108, 'grad_norm': 1.4656267166137695, 'learning_rate': 6.372165399632102e-07, 'epoch': 2.67} +2025-02-06 06:09:20 - ERROR - stderr - 89%|████████▉ | 19948/22434 [20:01:40<9:45:12, 14.12s/it] +2025-02-06 06:09:52 - ERROR - stderr - 
89%|████████▉ | 19949/22434 [20:02:12<13:27:00, 19.49s/it] +2025-02-06 06:09:52 - ERROR - stderr - +2025-02-06 06:09:52 - ERROR - stderr - +2025-02-06 06:09:52 - INFO - stdout - {'loss': 0.3471, 'grad_norm': 1.4643809795379639, 'learning_rate': 6.367095086027419e-07, 'epoch': 2.67} +2025-02-06 06:09:52 - ERROR - stderr - 89%|████████▉ | 19949/22434 [20:02:12<13:27:00, 19.49s/it] +2025-02-06 06:09:54 - ERROR - stderr - 89%|████████▉ | 19950/22434 [20:02:14<9:55:27, 14.38s/it] +2025-02-06 06:09:54 - ERROR - stderr - +2025-02-06 06:09:54 - ERROR - stderr - +2025-02-06 06:09:54 - INFO - stdout - {'loss': 0.3486, 'grad_norm': 1.6563465595245361, 'learning_rate': 6.362026724111036e-07, 'epoch': 2.67} +2025-02-06 06:09:54 - ERROR - stderr - 89%|████████▉ | 19950/22434 [20:02:14<9:55:27, 14.38s/it] +2025-02-06 06:10:17 - ERROR - stderr - 89%|████████▉ | 19951/22434 [20:02:37<11:39:57, 16.91s/it] +2025-02-06 06:10:17 - ERROR - stderr - +2025-02-06 06:10:17 - ERROR - stderr - +2025-02-06 06:10:17 - INFO - stdout - {'loss': 0.3426, 'grad_norm': 1.520506501197815, 'learning_rate': 6.356960313988614e-07, 'epoch': 2.67} +2025-02-06 06:10:17 - ERROR - stderr - 89%|████████▉ | 19951/22434 [20:02:37<11:39:57, 16.91s/it] +2025-02-06 06:10:20 - ERROR - stderr - 89%|████████▉ | 19952/22434 [20:02:40<8:43:36, 12.66s/it] +2025-02-06 06:10:20 - ERROR - stderr - +2025-02-06 06:10:20 - ERROR - stderr - +2025-02-06 06:10:20 - INFO - stdout - {'loss': 0.3381, 'grad_norm': 1.5092777013778687, 'learning_rate': 6.351895855765733e-07, 'epoch': 2.67} +2025-02-06 06:10:20 - ERROR - stderr - 89%|████████▉ | 19952/22434 [20:02:40<8:43:36, 12.66s/it] +2025-02-06 06:10:22 - ERROR - stderr - 89%|████████▉ | 19953/22434 [20:02:42<6:37:36, 9.62s/it] +2025-02-06 06:10:22 - ERROR - stderr - +2025-02-06 06:10:22 - ERROR - stderr - +2025-02-06 06:10:22 - INFO - stdout - {'loss': 0.3508, 'grad_norm': 1.5897568464279175, 'learning_rate': 6.346833349547988e-07, 'epoch': 2.67} +2025-02-06 06:10:22 - ERROR - 
stderr - 89%|████████▉ | 19953/22434 [20:02:42<6:37:36, 9.62s/it] +2025-02-06 06:10:25 - ERROR - stderr - 89%|████████▉ | 19954/22434 [20:02:45<5:11:15, 7.53s/it] +2025-02-06 06:10:25 - ERROR - stderr - +2025-02-06 06:10:25 - ERROR - stderr - +2025-02-06 06:10:25 - INFO - stdout - {'loss': 0.3465, 'grad_norm': 1.4701083898544312, 'learning_rate': 6.34177279544087e-07, 'epoch': 2.67} +2025-02-06 06:10:25 - ERROR - stderr - 89%|████████▉ | 19954/22434 [20:02:45<5:11:15, 7.53s/it] +2025-02-06 06:10:27 - ERROR - stderr - 89%|████████▉ | 19955/22434 [20:02:47<4:07:47, 6.00s/it] +2025-02-06 06:10:28 - ERROR - stderr - +2025-02-06 06:10:28 - ERROR - stderr - +2025-02-06 06:10:28 - INFO - stdout - {'loss': 0.3326, 'grad_norm': 1.7193269729614258, 'learning_rate': 6.336714193549887e-07, 'epoch': 2.67} +2025-02-06 06:10:28 - ERROR - stderr - 89%|████████▉ | 19955/22434 [20:02:47<4:07:47, 6.00s/it] +2025-02-06 06:10:30 - ERROR - stderr - 89%|████████▉ | 19956/22434 [20:02:50<3:24:29, 4.95s/it] +2025-02-06 06:10:30 - ERROR - stderr - +2025-02-06 06:10:30 - ERROR - stderr - +2025-02-06 06:10:30 - INFO - stdout - {'loss': 0.3491, 'grad_norm': 1.7678195238113403, 'learning_rate': 6.331657543980474e-07, 'epoch': 2.67} +2025-02-06 06:10:30 - ERROR - stderr - 89%|████████▉ | 19956/22434 [20:02:50<3:24:29, 4.95s/it] +2025-02-06 06:10:49 - ERROR - stderr - 89%|████████▉ | 19957/22434 [20:03:09<6:16:11, 9.11s/it] +2025-02-06 06:10:49 - ERROR - stderr - +2025-02-06 06:10:49 - ERROR - stderr - +2025-02-06 06:10:49 - INFO - stdout - {'loss': 0.3452, 'grad_norm': 1.482303261756897, 'learning_rate': 6.326602846838037e-07, 'epoch': 2.67} +2025-02-06 06:10:49 - ERROR - stderr - 89%|████████▉ | 19957/22434 [20:03:09<6:16:11, 9.11s/it] +2025-02-06 06:10:51 - ERROR - stderr - 89%|████████▉ | 19958/22434 [20:03:11<4:54:28, 7.14s/it] +2025-02-06 06:10:51 - ERROR - stderr - +2025-02-06 06:10:51 - ERROR - stderr - +2025-02-06 06:10:51 - INFO - stdout - {'loss': 0.3859, 'grad_norm': 
1.6284412145614624, 'learning_rate': 6.321550102227902e-07, 'epoch': 2.67} +2025-02-06 06:10:51 - ERROR - stderr - 89%|████████▉ | 19958/22434 [20:03:11<4:54:28, 7.14s/it] +2025-02-06 06:10:54 - ERROR - stderr - 89%|████████▉ | 19959/22434 [20:03:14<3:56:25, 5.73s/it] +2025-02-06 06:10:54 - ERROR - stderr - +2025-02-06 06:10:54 - ERROR - stderr - +2025-02-06 06:10:54 - INFO - stdout - {'loss': 0.3358, 'grad_norm': 1.587695598602295, 'learning_rate': 6.316499310255419e-07, 'epoch': 2.67} +2025-02-06 06:10:54 - ERROR - stderr - 89%|████████▉ | 19959/22434 [20:03:14<3:56:25, 5.73s/it] +2025-02-06 06:10:56 - ERROR - stderr - 89%|████████▉ | 19960/22434 [20:03:16<3:16:14, 4.76s/it] +2025-02-06 06:10:56 - ERROR - stderr - +2025-02-06 06:10:56 - ERROR - stderr - +2025-02-06 06:10:56 - INFO - stdout - {'loss': 0.3407, 'grad_norm': 1.489880919456482, 'learning_rate': 6.31145047102587e-07, 'epoch': 2.67} +2025-02-06 06:10:56 - ERROR - stderr - 89%|████████▉ | 19960/22434 [20:03:16<3:16:14, 4.76s/it] +2025-02-06 06:10:59 - ERROR - stderr - 89%|████████▉ | 19961/22434 [20:03:19<2:48:04, 4.08s/it] +2025-02-06 06:10:59 - ERROR - stderr - +2025-02-06 06:10:59 - ERROR - stderr - +2025-02-06 06:10:59 - INFO - stdout - {'loss': 0.3365, 'grad_norm': 1.5229634046554565, 'learning_rate': 6.306403584644494e-07, 'epoch': 2.67} +2025-02-06 06:10:59 - ERROR - stderr - 89%|████████▉ | 19961/22434 [20:03:19<2:48:04, 4.08s/it] +2025-02-06 06:11:01 - ERROR - stderr - 89%|████████▉ | 19962/22434 [20:03:21<2:30:07, 3.64s/it] +2025-02-06 06:11:01 - ERROR - stderr - +2025-02-06 06:11:01 - ERROR - stderr - +2025-02-06 06:11:01 - INFO - stdout - {'loss': 0.3858, 'grad_norm': 1.5497459173202515, 'learning_rate': 6.301358651216482e-07, 'epoch': 2.67} +2025-02-06 06:11:01 - ERROR - stderr - 89%|████████▉ | 19962/22434 [20:03:21<2:30:07, 3.64s/it] +2025-02-06 06:11:21 - ERROR - stderr - 89%|████████▉ | 19963/22434 [20:03:41<5:50:58, 8.52s/it] +2025-02-06 06:11:21 - ERROR - stderr - +2025-02-06
06:11:21 - ERROR - stderr - +2025-02-06 06:11:21 - INFO - stdout - {'loss': 0.348, 'grad_norm': 1.530900478363037, 'learning_rate': 6.296315670846964e-07, 'epoch': 2.67} +2025-02-06 06:11:21 - ERROR - stderr - 89%|████████▉ | 19963/22434 [20:03:41<5:50:58, 8.52s/it] +2025-02-06 06:11:24 - ERROR - stderr - 89%|████████▉ | 19964/22434 [20:03:44<4:37:03, 6.73s/it] +2025-02-06 06:11:24 - ERROR - stderr - +2025-02-06 06:11:24 - ERROR - stderr - +2025-02-06 06:11:24 - INFO - stdout - {'loss': 0.3667, 'grad_norm': 1.5213403701782227, 'learning_rate': 6.29127464364111e-07, 'epoch': 2.67} +2025-02-06 06:11:24 - ERROR - stderr - 89%|████████▉ | 19964/22434 [20:03:44<4:37:03, 6.73s/it] +2025-02-06 06:11:26 - ERROR - stderr - 89%|████████▉ | 19965/22434 [20:03:46<3:44:00, 5.44s/it] +2025-02-06 06:11:26 - ERROR - stderr - +2025-02-06 06:11:26 - ERROR - stderr - +2025-02-06 06:11:26 - INFO - stdout - {'loss': 0.36, 'grad_norm': 1.5871517658233643, 'learning_rate': 6.286235569703958e-07, 'epoch': 2.67} +2025-02-06 06:11:26 - ERROR - stderr - 89%|████████▉ | 19965/22434 [20:03:46<3:44:00, 5.44s/it] +2025-02-06 06:11:44 - ERROR - stderr - 89%|████████▉ | 19966/22434 [20:04:03<6:09:17, 8.98s/it] +2025-02-06 06:11:44 - ERROR - stderr - +2025-02-06 06:11:44 - ERROR - stderr - +2025-02-06 06:11:44 - INFO - stdout - {'loss': 0.3969, 'grad_norm': 1.6712712049484253, 'learning_rate': 6.281198449140525e-07, 'epoch': 2.67} +2025-02-06 06:11:44 - ERROR - stderr - 89%|████████▉ | 19966/22434 [20:04:03<6:09:17, 8.98s/it] +2025-02-06 06:11:46 - ERROR - stderr - 89%|████████▉ | 19967/22434 [20:04:06<4:49:11, 7.03s/it] +2025-02-06 06:11:46 - ERROR - stderr - +2025-02-06 06:11:46 - ERROR - stderr - +2025-02-06 06:11:46 - INFO - stdout - {'loss': 0.3822, 'grad_norm': 1.6878187656402588, 'learning_rate': 6.276163282055869e-07, 'epoch': 2.67} +2025-02-06 06:11:46 - ERROR - stderr - 89%|████████▉ | 19967/22434 [20:04:06<4:49:11, 7.03s/it] +2025-02-06 06:11:48 - ERROR - stderr - 89%|████████▉ | 
19968/22434 [20:04:08<3:52:42, 5.66s/it] +2025-02-06 06:11:49 - ERROR - stderr - +2025-02-06 06:11:49 - ERROR - stderr - +2025-02-06 06:11:49 - INFO - stdout - {'loss': 0.3602, 'grad_norm': 1.459533929824829, 'learning_rate': 6.271130068554876e-07, 'epoch': 2.67} +2025-02-06 06:11:49 - ERROR - stderr - 89%|████████▉ | 19968/22434 [20:04:08<3:52:42, 5.66s/it] +2025-02-06 06:11:51 - ERROR - stderr - 89%|████████▉ | 19969/22434 [20:04:11<3:12:42, 4.69s/it] +2025-02-06 06:11:51 - ERROR - stderr - +2025-02-06 06:11:51 - ERROR - stderr - +2025-02-06 06:11:51 - INFO - stdout - {'loss': 0.3415, 'grad_norm': 1.4354403018951416, 'learning_rate': 6.266098808742515e-07, 'epoch': 2.67} +2025-02-06 06:11:51 - ERROR - stderr - 89%|████████▉ | 19969/22434 [20:04:11<3:12:42, 4.69s/it] +2025-02-06 06:12:01 - ERROR - stderr - 89%|████████▉ | 19970/22434 [20:04:20<4:15:49, 6.23s/it] +2025-02-06 06:12:01 - ERROR - stderr - +2025-02-06 06:12:01 - ERROR - stderr - +2025-02-06 06:12:01 - INFO - stdout - {'loss': 0.3647, 'grad_norm': 1.6031494140625, 'learning_rate': 6.261069502723616e-07, 'epoch': 2.67} +2025-02-06 06:12:01 - ERROR - stderr - 89%|████████▉ | 19970/22434 [20:04:21<4:15:49, 6.23s/it] +2025-02-06 06:12:03 - ERROR - stderr - 89%|████████▉ | 19971/22434 [20:04:23<3:31:31, 5.15s/it] +2025-02-06 06:12:03 - ERROR - stderr - +2025-02-06 06:12:03 - ERROR - stderr - +2025-02-06 06:12:03 - INFO - stdout - {'loss': 0.3584, 'grad_norm': 1.4494433403015137, 'learning_rate': 6.256042150603025e-07, 'epoch': 2.67} +2025-02-06 06:12:03 - ERROR - stderr - 89%|████████▉ | 19971/22434 [20:04:23<3:31:31, 5.15s/it] +2025-02-06 06:12:06 - ERROR - stderr - 89%|████████▉ | 19972/22434 [20:04:26<2:58:02, 4.34s/it] +2025-02-06 06:12:06 - ERROR - stderr - +2025-02-06 06:12:06 - ERROR - stderr - +2025-02-06 06:12:06 - INFO - stdout - {'loss': 0.4064, 'grad_norm': 1.4605907201766968, 'learning_rate': 6.251016752485539e-07, 'epoch': 2.67} +2025-02-06 06:12:06 - ERROR - stderr - 89%|████████▉ | 
19972/22434 [20:04:26<2:58:02, 4.34s/it] +2025-02-06 06:12:08 - ERROR - stderr - 89%|████████▉ | 19973/22434 [20:04:28<2:34:35, 3.77s/it] +2025-02-06 06:12:08 - ERROR - stderr - +2025-02-06 06:12:08 - ERROR - stderr - +2025-02-06 06:12:08 - INFO - stdout - {'loss': 0.3559, 'grad_norm': 1.5953834056854248, 'learning_rate': 6.245993308475884e-07, 'epoch': 2.67} +2025-02-06 06:12:08 - ERROR - stderr - 89%|████████▉ | 19973/22434 [20:04:28<2:34:35, 3.77s/it] +2025-02-06 06:12:11 - ERROR - stderr - 89%|████████▉ | 19974/22434 [20:04:31<2:21:32, 3.45s/it] +2025-02-06 06:12:11 - ERROR - stderr - +2025-02-06 06:12:11 - ERROR - stderr - +2025-02-06 06:12:11 - INFO - stdout - {'loss': 0.4145, 'grad_norm': 1.5787290334701538, 'learning_rate': 6.240971818678798e-07, 'epoch': 2.67} +2025-02-06 06:12:11 - ERROR - stderr - 89%|████████▉ | 19974/22434 [20:04:31<2:21:32, 3.45s/it] +2025-02-06 06:12:14 - ERROR - stderr - 89%|████████▉ | 19975/22434 [20:04:33<2:12:15, 3.23s/it] +2025-02-06 06:12:14 - ERROR - stderr - +2025-02-06 06:12:14 - ERROR - stderr - +2025-02-06 06:12:14 - INFO - stdout - {'loss': 0.3224, 'grad_norm': 1.4142094850540161, 'learning_rate': 6.235952283198932e-07, 'epoch': 2.67} +2025-02-06 06:12:14 - ERROR - stderr - 89%|████████▉ | 19975/22434 [20:04:33<2:12:15, 3.23s/it] +2025-02-06 06:12:16 - ERROR - stderr - 89%|████████▉ | 19976/22434 [20:04:36<2:04:55, 3.05s/it] +2025-02-06 06:12:16 - ERROR - stderr - +2025-02-06 06:12:16 - ERROR - stderr - +2025-02-06 06:12:16 - INFO - stdout - {'loss': 0.3953, 'grad_norm': 1.7694220542907715, 'learning_rate': 6.230934702140923e-07, 'epoch': 2.67} +2025-02-06 06:12:16 - ERROR - stderr - 89%|████████▉ | 19976/22434 [20:04:36<2:04:55, 3.05s/it] +2025-02-06 06:12:19 - ERROR - stderr - 89%|████████▉ | 19977/22434 [20:04:39<1:57:39, 2.87s/it] +2025-02-06 06:12:19 - ERROR - stderr - +2025-02-06 06:12:19 - ERROR - stderr - +2025-02-06 06:12:19 - INFO - stdout - {'loss': 0.3618, 'grad_norm': 1.5131410360336304, 'learning_rate': 
6.225919075609354e-07, 'epoch': 2.67} +2025-02-06 06:12:19 - ERROR - stderr - 89%|████████▉ | 19977/22434 [20:04:39<1:57:39, 2.87s/it] +2025-02-06 06:12:21 - ERROR - stderr - 89%|████████▉ | 19978/22434 [20:04:41<1:53:27, 2.77s/it] +2025-02-06 06:12:21 - ERROR - stderr - +2025-02-06 06:12:21 - ERROR - stderr - +2025-02-06 06:12:21 - INFO - stdout - {'loss': 0.3617, 'grad_norm': 1.572045922279358, 'learning_rate': 6.220905403708766e-07, 'epoch': 2.67} +2025-02-06 06:12:21 - ERROR - stderr - 89%|████████▉ | 19978/22434 [20:04:41<1:53:27, 2.77s/it] +2025-02-06 06:12:24 - ERROR - stderr - 89%|████████▉ | 19979/22434 [20:04:44<1:50:00, 2.69s/it] +2025-02-06 06:12:24 - ERROR - stderr - +2025-02-06 06:12:24 - ERROR - stderr - +2025-02-06 06:12:24 - INFO - stdout - {'loss': 0.3209, 'grad_norm': 1.4961086511611938, 'learning_rate': 6.215893686543672e-07, 'epoch': 2.67} +2025-02-06 06:12:24 - ERROR - stderr - 89%|████████▉ | 19979/22434 [20:04:44<1:50:00, 2.69s/it] +2025-02-06 06:12:26 - ERROR - stderr - 89%|████████▉ | 19980/22434 [20:04:46<1:47:18, 2.62s/it] +2025-02-06 06:12:26 - ERROR - stderr - +2025-02-06 06:12:26 - ERROR - stderr - +2025-02-06 06:12:26 - INFO - stdout - {'loss': 0.3729, 'grad_norm': 1.5664687156677246, 'learning_rate': 6.210883924218525e-07, 'epoch': 2.67} +2025-02-06 06:12:26 - ERROR - stderr - 89%|████████▉ | 19980/22434 [20:04:46<1:47:18, 2.62s/it] +2025-02-06 06:12:29 - ERROR - stderr - 89%|████████▉ | 19981/22434 [20:04:48<1:45:27, 2.58s/it] +2025-02-06 06:12:29 - ERROR - stderr - +2025-02-06 06:12:29 - ERROR - stderr - +2025-02-06 06:12:29 - INFO - stdout - {'loss': 0.3514, 'grad_norm': 1.5498435497283936, 'learning_rate': 6.205876116837761e-07, 'epoch': 2.67} +2025-02-06 06:12:29 - ERROR - stderr - 89%|████████▉ | 19981/22434 [20:04:49<1:45:27, 2.58s/it] +2025-02-06 06:12:31 - ERROR - stderr - 89%|████████▉ | 19982/22434 [20:04:51<1:44:35, 2.56s/it] +2025-02-06 06:12:31 - ERROR - stderr - +2025-02-06 06:12:31 - ERROR - stderr - +2025-02-06 
06:12:31 - INFO - stdout - {'loss': 0.2999, 'grad_norm': 1.3279353380203247, 'learning_rate': 6.200870264505754e-07, 'epoch': 2.67} +2025-02-06 06:12:31 - ERROR - stderr - 89%|████████▉ | 19982/22434 [20:04:51<1:44:35, 2.56s/it] +2025-02-06 06:12:34 - ERROR - stderr - 89%|████████▉ | 19983/22434 [20:04:54<1:44:05, 2.55s/it] +2025-02-06 06:12:34 - ERROR - stderr - +2025-02-06 06:12:34 - ERROR - stderr - +2025-02-06 06:12:34 - INFO - stdout - {'loss': 0.3652, 'grad_norm': 1.4898501634597778, 'learning_rate': 6.195866367326875e-07, 'epoch': 2.67} +2025-02-06 06:12:34 - ERROR - stderr - 89%|████████▉ | 19983/22434 [20:04:54<1:44:05, 2.55s/it] +2025-02-06 06:12:36 - ERROR - stderr - 89%|████████▉ | 19984/22434 [20:04:56<1:43:04, 2.52s/it] +2025-02-06 06:12:36 - ERROR - stderr - +2025-02-06 06:12:36 - ERROR - stderr - +2025-02-06 06:12:36 - INFO - stdout - {'loss': 0.3421, 'grad_norm': 1.3464760780334473, 'learning_rate': 6.190864425405363e-07, 'epoch': 2.67} +2025-02-06 06:12:36 - ERROR - stderr - 89%|████████▉ | 19984/22434 [20:04:56<1:43:04, 2.52s/it] +2025-02-06 06:12:39 - ERROR - stderr - 89%|████████▉ | 19985/22434 [20:04:59<1:46:26, 2.61s/it] +2025-02-06 06:12:39 - ERROR - stderr - +2025-02-06 06:12:39 - ERROR - stderr - +2025-02-06 06:12:39 - INFO - stdout - {'loss': 0.3068, 'grad_norm': 1.5027506351470947, 'learning_rate': 6.185864438845523e-07, 'epoch': 2.67} +2025-02-06 06:12:39 - ERROR - stderr - 89%|████████▉ | 19985/22434 [20:04:59<1:46:26, 2.61s/it] +2025-02-06 06:12:42 - ERROR - stderr - 89%|████████▉ | 19986/22434 [20:05:01<1:45:36, 2.59s/it] +2025-02-06 06:12:42 - ERROR - stderr - +2025-02-06 06:12:42 - ERROR - stderr - +2025-02-06 06:12:42 - INFO - stdout - {'loss': 0.3528, 'grad_norm': 1.7076531648635864, 'learning_rate': 6.180866407751595e-07, 'epoch': 2.67} +2025-02-06 06:12:42 - ERROR - stderr - 89%|████████▉ | 19986/22434 [20:05:01<1:45:36, 2.59s/it] +2025-02-06 06:12:44 - ERROR - stderr - 89%|████████▉ | 19987/22434 [20:05:04<1:46:10, 2.60s/it] 
+2025-02-06 06:12:44 - ERROR - stderr - +2025-02-06 06:12:44 - ERROR - stderr - +2025-02-06 06:12:44 - INFO - stdout - {'loss': 0.3692, 'grad_norm': 1.5985829830169678, 'learning_rate': 6.175870332227707e-07, 'epoch': 2.67} +2025-02-06 06:12:44 - ERROR - stderr - 89%|████████▉ | 19987/22434 [20:05:04<1:46:10, 2.60s/it] +2025-02-06 06:12:47 - ERROR - stderr - 89%|████████▉ | 19988/22434 [20:05:06<1:44:48, 2.57s/it] +2025-02-06 06:12:47 - ERROR - stderr - +2025-02-06 06:12:47 - ERROR - stderr - +2025-02-06 06:12:47 - INFO - stdout - {'loss': 0.3529, 'grad_norm': 1.5114191770553589, 'learning_rate': 6.17087621237804e-07, 'epoch': 2.67} +2025-02-06 06:12:47 - ERROR - stderr - 89%|████████▉ | 19988/22434 [20:05:07<1:44:48, 2.57s/it] +2025-02-06 06:12:49 - ERROR - stderr - 89%|████████▉ | 19989/22434 [20:05:09<1:44:01, 2.55s/it] +2025-02-06 06:12:49 - ERROR - stderr - +2025-02-06 06:12:49 - ERROR - stderr - +2025-02-06 06:12:49 - INFO - stdout - {'loss': 0.4068, 'grad_norm': 1.759737253189087, 'learning_rate': 6.165884048306647e-07, 'epoch': 2.67} +2025-02-06 06:12:49 - ERROR - stderr - 89%|████████▉ | 19989/22434 [20:05:09<1:44:01, 2.55s/it] +2025-02-06 06:12:52 - ERROR - stderr - 89%|████████▉ | 19990/22434 [20:05:12<1:44:31, 2.57s/it] +2025-02-06 06:12:52 - ERROR - stderr - +2025-02-06 06:12:52 - ERROR - stderr - +2025-02-06 06:12:52 - INFO - stdout - {'loss': 0.3425, 'grad_norm': 1.550301432609558, 'learning_rate': 6.160893840117643e-07, 'epoch': 2.67} +2025-02-06 06:12:52 - ERROR - stderr - 89%|████████▉ | 19990/22434 [20:05:12<1:44:31, 2.57s/it] +2025-02-06 06:12:54 - ERROR - stderr - 89%|████████▉ | 19991/22434 [20:05:14<1:43:24, 2.54s/it] +2025-02-06 06:12:54 - ERROR - stderr - +2025-02-06 06:12:54 - ERROR - stderr - +2025-02-06 06:12:54 - INFO - stdout - {'loss': 0.3515, 'grad_norm': 1.623882532119751, 'learning_rate': 6.155905587915001e-07, 'epoch': 2.67} +2025-02-06 06:12:54 - ERROR - stderr - 89%|████████▉ | 19991/22434 [20:05:14<1:43:24, 2.54s/it] 
+2025-02-06 06:12:57 - ERROR - stderr - 89%|████████▉ | 19992/22434 [20:05:17<1:44:12, 2.56s/it] +2025-02-06 06:12:57 - ERROR - stderr - +2025-02-06 06:12:57 - ERROR - stderr - +2025-02-06 06:12:57 - INFO - stdout - {'loss': 0.4122, 'grad_norm': 1.80980384349823, 'learning_rate': 6.150919291802704e-07, 'epoch': 2.67} +2025-02-06 06:12:57 - ERROR - stderr - 89%|████████▉ | 19992/22434 [20:05:17<1:44:12, 2.56s/it] +2025-02-06 06:13:00 - ERROR - stderr - 89%|████████▉ | 19993/22434 [20:05:19<1:46:15, 2.61s/it] +2025-02-06 06:13:00 - ERROR - stderr - +2025-02-06 06:13:00 - ERROR - stderr - +2025-02-06 06:13:00 - INFO - stdout - {'loss': 0.3548, 'grad_norm': 1.5744447708129883, 'learning_rate': 6.145934951884691e-07, 'epoch': 2.67} +2025-02-06 06:13:00 - ERROR - stderr - 89%|████████▉ | 19993/22434 [20:05:19<1:46:15, 2.61s/it] +2025-02-06 06:13:02 - ERROR - stderr - 89%|████████▉ | 19994/22434 [20:05:22<1:49:25, 2.69s/it] +2025-02-06 06:13:03 - ERROR - stderr - +2025-02-06 06:13:03 - ERROR - stderr - +2025-02-06 06:13:03 - INFO - stdout - {'loss': 0.3546, 'grad_norm': 1.4680581092834473, 'learning_rate': 6.140952568264858e-07, 'epoch': 2.67} +2025-02-06 06:13:03 - ERROR - stderr - 89%|████████▉ | 19994/22434 [20:05:22<1:49:25, 2.69s/it] +2025-02-06 06:13:05 - ERROR - stderr - 89%|████████▉ | 19995/22434 [20:05:25<1:47:35, 2.65s/it] +2025-02-06 06:13:05 - ERROR - stderr - +2025-02-06 06:13:05 - ERROR - stderr - +2025-02-06 06:13:05 - INFO - stdout - {'loss': 0.3622, 'grad_norm': 1.5514980554580688, 'learning_rate': 6.135972141047042e-07, 'epoch': 2.67} +2025-02-06 06:13:05 - ERROR - stderr - 89%|████████▉ | 19995/22434 [20:05:25<1:47:35, 2.65s/it] +2025-02-06 06:13:08 - ERROR - stderr - 89%|████████▉ | 19996/22434 [20:05:27<1:45:50, 2.60s/it] +2025-02-06 06:13:08 - ERROR - stderr - +2025-02-06 06:13:08 - ERROR - stderr - +2025-02-06 06:13:08 - INFO - stdout - {'loss': 0.3548, 'grad_norm': 1.59768807888031, 'learning_rate': 6.130993670335083e-07, 'epoch': 2.67} 
+2025-02-06 06:13:08 - ERROR - stderr - 89%|████████▉ | 19996/22434 [20:05:27<1:45:50, 2.60s/it] +2025-02-06 06:13:10 - ERROR - stderr - 89%|████████▉ | 19997/22434 [20:05:30<1:45:14, 2.59s/it] +2025-02-06 06:13:10 - ERROR - stderr - +2025-02-06 06:13:10 - ERROR - stderr - +2025-02-06 06:13:10 - INFO - stdout - {'loss': 0.3248, 'grad_norm': 1.3371001482009888, 'learning_rate': 6.126017156232734e-07, 'epoch': 2.67} +2025-02-06 06:13:10 - ERROR - stderr - 89%|████████▉ | 19997/22434 [20:05:30<1:45:14, 2.59s/it] +2025-02-06 06:13:13 - ERROR - stderr - 89%|████████▉ | 19998/22434 [20:05:32<1:43:53, 2.56s/it] +2025-02-06 06:13:13 - ERROR - stderr - +2025-02-06 06:13:13 - ERROR - stderr - +2025-02-06 06:13:13 - INFO - stdout - {'loss': 0.3353, 'grad_norm': 1.5018333196640015, 'learning_rate': 6.121042598843729e-07, 'epoch': 2.67} +2025-02-06 06:13:13 - ERROR - stderr - 89%|████████▉ | 19998/22434 [20:05:32<1:43:53, 2.56s/it] +2025-02-06 06:13:15 - ERROR - stderr - 89%|████████▉ | 19999/22434 [20:05:35<1:43:11, 2.54s/it] +2025-02-06 06:13:15 - ERROR - stderr - +2025-02-06 06:13:15 - ERROR - stderr - +2025-02-06 06:13:15 - INFO - stdout - {'loss': 0.4087, 'grad_norm': 1.5479469299316406, 'learning_rate': 6.116069998271756e-07, 'epoch': 2.67} +2025-02-06 06:13:15 - ERROR - stderr - 89%|████████▉ | 19999/22434 [20:05:35<1:43:11, 2.54s/it] +2025-02-06 06:13:18 - ERROR - stderr - 89%|████████▉ | 20000/22434 [20:05:37<1:42:52, 2.54s/it] +2025-02-06 06:13:18 - ERROR - stderr - +2025-02-06 06:13:18 - ERROR - stderr - +2025-02-06 06:13:18 - INFO - stdout - {'loss': 0.4007, 'grad_norm': 1.579924464225769, 'learning_rate': 6.111099354620476e-07, 'epoch': 2.67} +2025-02-06 06:13:18 - ERROR - stderr - 89%|████████▉ | 20000/22434 [20:05:37<1:42:52, 2.54s/it] +2025-02-06 06:13:20 - ERROR - stderr - 89%|████████▉ | 20001/22434 [20:05:40<1:43:05, 2.54s/it] +2025-02-06 06:13:20 - ERROR - stderr - +2025-02-06 06:13:20 - ERROR - stderr - +2025-02-06 06:13:20 - INFO - stdout - {'loss': 
0.3682, 'grad_norm': 1.5959900617599487, 'learning_rate': 6.106130667993482e-07, 'epoch': 2.67} +2025-02-06 06:13:20 - ERROR - stderr - 89%|████████▉ | 20001/22434 [20:05:40<1:43:05, 2.54s/it] +2025-02-06 06:13:23 - ERROR - stderr - 89%|████████▉ | 20002/22434 [20:05:42<1:42:26, 2.53s/it] +2025-02-06 06:13:23 - ERROR - stderr - +2025-02-06 06:13:23 - ERROR - stderr - +2025-02-06 06:13:23 - INFO - stdout - {'loss': 0.393, 'grad_norm': 1.6480743885040283, 'learning_rate': 6.101163938494359e-07, 'epoch': 2.67} +2025-02-06 06:13:23 - ERROR - stderr - 89%|████████▉ | 20002/22434 [20:05:42<1:42:26, 2.53s/it] +2025-02-06 06:13:25 - ERROR - stderr - 89%|████████▉ | 20003/22434 [20:05:45<1:42:21, 2.53s/it] +2025-02-06 06:13:25 - ERROR - stderr - +2025-02-06 06:13:25 - ERROR - stderr - +2025-02-06 06:13:25 - INFO - stdout - {'loss': 0.3639, 'grad_norm': 1.4970154762268066, 'learning_rate': 6.096199166226602e-07, 'epoch': 2.67} +2025-02-06 06:13:25 - ERROR - stderr - 89%|████████▉ | 20003/22434 [20:05:45<1:42:21, 2.53s/it] +2025-02-06 06:13:28 - ERROR - stderr - 89%|████████▉ | 20004/22434 [20:05:47<1:41:56, 2.52s/it] +2025-02-06 06:13:28 - ERROR - stderr - +2025-02-06 06:13:28 - ERROR - stderr - +2025-02-06 06:13:28 - INFO - stdout - {'loss': 0.346, 'grad_norm': 1.5489505529403687, 'learning_rate': 6.091236351293717e-07, 'epoch': 2.68} +2025-02-06 06:13:28 - ERROR - stderr - 89%|████████▉ | 20004/22434 [20:05:48<1:41:56, 2.52s/it] +2025-02-06 06:13:30 - ERROR - stderr - 89%|████████▉ | 20005/22434 [20:05:50<1:42:02, 2.52s/it] +2025-02-06 06:13:30 - ERROR - stderr - +2025-02-06 06:13:30 - ERROR - stderr - +2025-02-06 06:13:30 - INFO - stdout - {'loss': 0.3386, 'grad_norm': 1.4105826616287231, 'learning_rate': 6.086275493799165e-07, 'epoch': 2.68} +2025-02-06 06:13:30 - ERROR - stderr - 89%|████████▉ | 20005/22434 [20:05:50<1:42:02, 2.52s/it] +2025-02-06 06:13:33 - ERROR - stderr - 89%|████████▉ | 20006/22434 [20:05:53<1:42:57, 2.54s/it] +2025-02-06 06:13:33 - ERROR - stderr - 
+2025-02-06 06:13:33 - ERROR - stderr - +2025-02-06 06:13:33 - INFO - stdout - {'loss': 0.369, 'grad_norm': 1.5864354372024536, 'learning_rate': 6.081316593846331e-07, 'epoch': 2.68} +2025-02-06 06:13:33 - ERROR - stderr - 89%|████████▉ | 20006/22434 [20:05:53<1:42:57, 2.54s/it] +2025-02-06 06:13:35 - ERROR - stderr - 89%|████████▉ | 20007/22434 [20:05:55<1:42:14, 2.53s/it] +2025-02-06 06:13:35 - ERROR - stderr - +2025-02-06 06:13:35 - ERROR - stderr - +2025-02-06 06:13:35 - INFO - stdout - {'loss': 0.3387, 'grad_norm': 1.3754163980484009, 'learning_rate': 6.076359651538588e-07, 'epoch': 2.68} +2025-02-06 06:13:35 - ERROR - stderr - 89%|████████▉ | 20007/22434 [20:05:55<1:42:14, 2.53s/it] +2025-02-06 06:13:38 - ERROR - stderr - 89%|████████▉ | 20008/22434 [20:05:58<1:41:39, 2.51s/it] +2025-02-06 06:13:38 - ERROR - stderr - +2025-02-06 06:13:38 - ERROR - stderr - +2025-02-06 06:13:38 - INFO - stdout - {'loss': 0.3255, 'grad_norm': 1.5301357507705688, 'learning_rate': 6.071404666979231e-07, 'epoch': 2.68} +2025-02-06 06:13:38 - ERROR - stderr - 89%|████████▉ | 20008/22434 [20:05:58<1:41:39, 2.51s/it] +2025-02-06 06:13:40 - ERROR - stderr - 89%|████████▉ | 20009/22434 [20:06:00<1:42:55, 2.55s/it] +2025-02-06 06:13:40 - ERROR - stderr - +2025-02-06 06:13:40 - ERROR - stderr - +2025-02-06 06:13:40 - INFO - stdout - {'loss': 0.3225, 'grad_norm': 1.3993881940841675, 'learning_rate': 6.066451640271587e-07, 'epoch': 2.68} +2025-02-06 06:13:40 - ERROR - stderr - 89%|████████▉ | 20009/22434 [20:06:00<1:42:55, 2.55s/it] +2025-02-06 06:13:43 - ERROR - stderr - 89%|████████▉ | 20010/22434 [20:06:03<1:42:59, 2.55s/it] +2025-02-06 06:13:43 - ERROR - stderr - +2025-02-06 06:13:43 - ERROR - stderr - +2025-02-06 06:13:43 - INFO - stdout - {'loss': 0.3561, 'grad_norm': 1.2642285823822021, 'learning_rate': 6.061500571518864e-07, 'epoch': 2.68} +2025-02-06 06:13:43 - ERROR - stderr - 89%|████████▉ | 20010/22434 [20:06:03<1:42:59, 2.55s/it] +2025-02-06 06:13:45 - ERROR - stderr - 
89%|████████▉ | 20011/22434 [20:06:05<1:42:12, 2.53s/it] +2025-02-06 06:13:45 - ERROR - stderr - +2025-02-06 06:13:45 - ERROR - stderr - +2025-02-06 06:13:45 - INFO - stdout - {'loss': 0.3961, 'grad_norm': 1.6877914667129517, 'learning_rate': 6.056551460824279e-07, 'epoch': 2.68} +2025-02-06 06:13:45 - ERROR - stderr - 89%|████████▉ | 20011/22434 [20:06:05<1:42:12, 2.53s/it] +2025-02-06 06:13:48 - ERROR - stderr - 89%|████████▉ | 20012/22434 [20:06:08<1:42:50, 2.55s/it] +2025-02-06 06:13:48 - ERROR - stderr - +2025-02-06 06:13:48 - ERROR - stderr - +2025-02-06 06:13:48 - INFO - stdout - {'loss': 0.3742, 'grad_norm': 1.6665089130401611, 'learning_rate': 6.05160430829097e-07, 'epoch': 2.68} +2025-02-06 06:13:48 - ERROR - stderr - 89%|████████▉ | 20012/22434 [20:06:08<1:42:50, 2.55s/it] +2025-02-06 06:13:51 - ERROR - stderr - 89%|████████▉ | 20013/22434 [20:06:10<1:41:58, 2.53s/it] +2025-02-06 06:13:51 - ERROR - stderr - +2025-02-06 06:13:51 - ERROR - stderr - +2025-02-06 06:13:51 - INFO - stdout - {'loss': 0.3561, 'grad_norm': 1.596577525138855, 'learning_rate': 6.046659114022068e-07, 'epoch': 2.68} +2025-02-06 06:13:51 - ERROR - stderr - 89%|████████▉ | 20013/22434 [20:06:10<1:41:58, 2.53s/it] +2025-02-06 06:13:53 - ERROR - stderr - 89%|████████▉ | 20014/22434 [20:06:13<1:40:46, 2.50s/it] +2025-02-06 06:13:53 - ERROR - stderr - +2025-02-06 06:13:53 - ERROR - stderr - +2025-02-06 06:13:53 - INFO - stdout - {'loss': 0.423, 'grad_norm': 1.763180136680603, 'learning_rate': 6.04171587812068e-07, 'epoch': 2.68} +2025-02-06 06:13:53 - ERROR - stderr - 89%|████████▉ | 20014/22434 [20:06:13<1:40:46, 2.50s/it] +2025-02-06 06:13:55 - ERROR - stderr - 89%|████████▉ | 20015/22434 [20:06:15<1:39:57, 2.48s/it] +2025-02-06 06:13:55 - ERROR - stderr - +2025-02-06 06:13:55 - ERROR - stderr - +2025-02-06 06:13:55 - INFO - stdout - {'loss': 0.4437, 'grad_norm': 1.7104454040527344, 'learning_rate': 6.036774600689798e-07, 'epoch': 2.68} +2025-02-06 06:13:55 - ERROR - stderr - 
89%|████████▉ | 20015/22434 [20:06:15<1:39:57, 2.48s/it] +2025-02-06 06:13:58 - ERROR - stderr - 89%|████████▉ | 20016/22434 [20:06:18<1:39:21, 2.47s/it] +2025-02-06 06:13:58 - ERROR - stderr - +2025-02-06 06:13:58 - ERROR - stderr - +2025-02-06 06:13:58 - INFO - stdout - {'loss': 0.3574, 'grad_norm': 1.4546639919281006, 'learning_rate': 6.031835281832433e-07, 'epoch': 2.68} +2025-02-06 06:13:58 - ERROR - stderr - 89%|████████▉ | 20016/22434 [20:06:18<1:39:21, 2.47s/it] +2025-02-06 06:14:00 - ERROR - stderr - 89%|████████▉ | 20017/22434 [20:06:20<1:39:02, 2.46s/it] +2025-02-06 06:14:00 - ERROR - stderr - +2025-02-06 06:14:00 - ERROR - stderr - +2025-02-06 06:14:00 - INFO - stdout - {'loss': 0.3687, 'grad_norm': 1.5124884843826294, 'learning_rate': 6.026897921651553e-07, 'epoch': 2.68} +2025-02-06 06:14:00 - ERROR - stderr - 89%|████████▉ | 20017/22434 [20:06:20<1:39:02, 2.46s/it] +2025-02-06 06:14:03 - ERROR - stderr - 89%|████████▉ | 20018/22434 [20:06:23<1:40:32, 2.50s/it] +2025-02-06 06:14:03 - ERROR - stderr - +2025-02-06 06:14:03 - ERROR - stderr - +2025-02-06 06:14:03 - INFO - stdout - {'loss': 0.3623, 'grad_norm': 1.5104905366897583, 'learning_rate': 6.021962520250058e-07, 'epoch': 2.68} +2025-02-06 06:14:03 - ERROR - stderr - 89%|████████▉ | 20018/22434 [20:06:23<1:40:32, 2.50s/it] +2025-02-06 06:14:05 - ERROR - stderr - 89%|████████▉ | 20019/22434 [20:06:25<1:39:45, 2.48s/it] +2025-02-06 06:14:05 - ERROR - stderr - +2025-02-06 06:14:05 - ERROR - stderr - +2025-02-06 06:14:05 - INFO - stdout - {'loss': 0.3446, 'grad_norm': 1.5597808361053467, 'learning_rate': 6.017029077730829e-07, 'epoch': 2.68} +2025-02-06 06:14:05 - ERROR - stderr - 89%|████████▉ | 20019/22434 [20:06:25<1:39:45, 2.48s/it] +2025-02-06 06:14:08 - ERROR - stderr - 89%|████████▉ | 20020/22434 [20:06:28<1:39:49, 2.48s/it] +2025-02-06 06:14:08 - ERROR - stderr - +2025-02-06 06:14:08 - ERROR - stderr - +2025-02-06 06:14:08 - INFO - stdout - {'loss': 0.3928, 'grad_norm': 1.641136884689331, 
'learning_rate': 6.012097594196698e-07, 'epoch': 2.68} +2025-02-06 06:14:08 - ERROR - stderr - 89%|████████▉ | 20020/22434 [20:06:28<1:39:49, 2.48s/it] +2025-02-06 06:14:10 - ERROR - stderr - 89%|████████▉ | 20021/22434 [20:06:30<1:41:25, 2.52s/it] +2025-02-06 06:14:10 - ERROR - stderr - +2025-02-06 06:14:10 - ERROR - stderr - +2025-02-06 06:14:10 - INFO - stdout - {'loss': 0.3854, 'grad_norm': 1.6957744359970093, 'learning_rate': 6.007168069750446e-07, 'epoch': 2.68} +2025-02-06 06:14:10 - ERROR - stderr - 89%|████████▉ | 20021/22434 [20:06:30<1:41:25, 2.52s/it] +2025-02-06 06:14:13 - ERROR - stderr - 89%|████████▉ | 20022/22434 [20:06:33<1:40:26, 2.50s/it] +2025-02-06 06:14:13 - ERROR - stderr - +2025-02-06 06:14:13 - ERROR - stderr - +2025-02-06 06:14:13 - INFO - stdout - {'loss': 0.3478, 'grad_norm': 1.575194001197815, 'learning_rate': 6.002240504494849e-07, 'epoch': 2.68} +2025-02-06 06:14:13 - ERROR - stderr - 89%|████████▉ | 20022/22434 [20:06:33<1:40:26, 2.50s/it] +2025-02-06 06:14:15 - ERROR - stderr - 89%|████████▉ | 20023/22434 [20:06:35<1:40:20, 2.50s/it] +2025-02-06 06:14:15 - ERROR - stderr - +2025-02-06 06:14:15 - ERROR - stderr - +2025-02-06 06:14:15 - INFO - stdout - {'loss': 0.3325, 'grad_norm': 1.5909523963928223, 'learning_rate': 5.997314898532591e-07, 'epoch': 2.68} +2025-02-06 06:14:15 - ERROR - stderr - 89%|████████▉ | 20023/22434 [20:06:35<1:40:20, 2.50s/it] +2025-02-06 06:14:18 - ERROR - stderr - 89%|████████▉ | 20024/22434 [20:06:38<1:40:57, 2.51s/it] +2025-02-06 06:14:18 - ERROR - stderr - +2025-02-06 06:14:18 - ERROR - stderr - +2025-02-06 06:14:18 - INFO - stdout - {'loss': 0.3891, 'grad_norm': 1.6662592887878418, 'learning_rate': 5.992391251966356e-07, 'epoch': 2.68} +2025-02-06 06:14:18 - ERROR - stderr - 89%|████████▉ | 20024/22434 [20:06:38<1:40:57, 2.51s/it] +2025-02-06 06:14:20 - ERROR - stderr - 89%|████████▉ | 20025/22434 [20:06:40<1:40:19, 2.50s/it] +2025-02-06 06:14:20 - ERROR - stderr - +2025-02-06 06:14:20 - ERROR - stderr - 
+2025-02-06 06:14:20 - INFO - stdout - {'loss': 0.4163, 'grad_norm': 1.7494367361068726, 'learning_rate': 5.987469564898773e-07, 'epoch': 2.68} +2025-02-06 06:14:20 - ERROR - stderr - 89%|████████▉ | 20025/22434 [20:06:40<1:40:19, 2.50s/it] +2025-02-06 06:14:23 - ERROR - stderr - 89%|████████▉ | 20026/22434 [20:06:43<1:39:56, 2.49s/it] +2025-02-06 06:14:23 - ERROR - stderr - +2025-02-06 06:14:23 - ERROR - stderr - +2025-02-06 06:14:23 - INFO - stdout - {'loss': 0.3757, 'grad_norm': 1.6909092664718628, 'learning_rate': 5.982549837432439e-07, 'epoch': 2.68} +2025-02-06 06:14:23 - ERROR - stderr - 89%|████████▉ | 20026/22434 [20:06:43<1:39:56, 2.49s/it] +2025-02-06 06:14:25 - ERROR - stderr - 89%|████████▉ | 20027/22434 [20:06:45<1:40:03, 2.49s/it] +2025-02-06 06:14:25 - ERROR - stderr - +2025-02-06 06:14:25 - ERROR - stderr - +2025-02-06 06:14:25 - INFO - stdout - {'loss': 0.3736, 'grad_norm': 1.6549019813537598, 'learning_rate': 5.977632069669859e-07, 'epoch': 2.68} +2025-02-06 06:14:25 - ERROR - stderr - 89%|████████▉ | 20027/22434 [20:06:45<1:40:03, 2.49s/it] +2025-02-06 06:14:28 - ERROR - stderr - 89%|████████▉ | 20028/22434 [20:06:48<1:40:32, 2.51s/it] +2025-02-06 06:14:28 - ERROR - stderr - +2025-02-06 06:14:28 - ERROR - stderr - +2025-02-06 06:14:28 - INFO - stdout - {'loss': 0.3784, 'grad_norm': 1.8106907606124878, 'learning_rate': 5.972716261713607e-07, 'epoch': 2.68} +2025-02-06 06:14:28 - ERROR - stderr - 89%|████████▉ | 20028/22434 [20:06:48<1:40:32, 2.51s/it] +2025-02-06 06:14:30 - ERROR - stderr - 89%|████████▉ | 20029/22434 [20:06:50<1:41:41, 2.54s/it] +2025-02-06 06:14:31 - ERROR - stderr - +2025-02-06 06:14:31 - ERROR - stderr - +2025-02-06 06:14:31 - INFO - stdout - {'loss': 0.3515, 'grad_norm': 1.5486547946929932, 'learning_rate': 5.967802413666068e-07, 'epoch': 2.68} +2025-02-06 06:14:31 - ERROR - stderr - 89%|████████▉ | 20029/22434 [20:06:50<1:41:41, 2.54s/it] +2025-02-06 06:14:33 - ERROR - stderr - 89%|████████▉ | 20030/22434 [20:06:53<1:41:06, 
2.52s/it] +2025-02-06 06:14:33 - ERROR - stderr - +2025-02-06 06:14:33 - ERROR - stderr - +2025-02-06 06:14:33 - INFO - stdout - {'loss': 0.3446, 'grad_norm': 1.4902559518814087, 'learning_rate': 5.962890525629727e-07, 'epoch': 2.68} +2025-02-06 06:14:33 - ERROR - stderr - 89%|████████▉ | 20030/22434 [20:06:53<1:41:06, 2.52s/it] +2025-02-06 06:14:36 - ERROR - stderr - 89%|████████▉ | 20031/22434 [20:06:55<1:43:39, 2.59s/it] +2025-02-06 06:14:36 - ERROR - stderr - +2025-02-06 06:14:36 - ERROR - stderr - +2025-02-06 06:14:36 - INFO - stdout - {'loss': 0.3435, 'grad_norm': 1.4929165840148926, 'learning_rate': 5.957980597706969e-07, 'epoch': 2.68} +2025-02-06 06:14:36 - ERROR - stderr - 89%|████████▉ | 20031/22434 [20:06:56<1:43:39, 2.59s/it] +2025-02-06 06:14:38 - ERROR - stderr - 89%|████████▉ | 20032/22434 [20:06:58<1:43:40, 2.59s/it] +2025-02-06 06:14:38 - ERROR - stderr - +2025-02-06 06:14:38 - ERROR - stderr - +2025-02-06 06:14:38 - INFO - stdout - {'loss': 0.3419, 'grad_norm': 1.5461690425872803, 'learning_rate': 5.953072630000079e-07, 'epoch': 2.68} +2025-02-06 06:14:38 - ERROR - stderr - 89%|████████▉ | 20032/22434 [20:06:58<1:43:40, 2.59s/it] +2025-02-06 06:14:41 - ERROR - stderr - 89%|████████▉ | 20033/22434 [20:07:01<1:42:51, 2.57s/it] +2025-02-06 06:14:41 - ERROR - stderr - +2025-02-06 06:14:41 - ERROR - stderr - +2025-02-06 06:14:41 - INFO - stdout - {'loss': 0.366, 'grad_norm': 1.472486972808838, 'learning_rate': 5.94816662261144e-07, 'epoch': 2.68} +2025-02-06 06:14:41 - ERROR - stderr - 89%|████████▉ | 20033/22434 [20:07:01<1:42:51, 2.57s/it] +2025-02-06 06:14:43 - ERROR - stderr - 89%|████████▉ | 20034/22434 [20:07:03<1:40:52, 2.52s/it] +2025-02-06 06:14:43 - ERROR - stderr - +2025-02-06 06:14:43 - ERROR - stderr - +2025-02-06 06:14:43 - INFO - stdout - {'loss': 0.4023, 'grad_norm': 1.6837760210037231, 'learning_rate': 5.943262575643239e-07, 'epoch': 2.68} +2025-02-06 06:14:43 - ERROR - stderr - 89%|████████▉ | 20034/22434 [20:07:03<1:40:52, 2.52s/it] 
+2025-02-06 06:14:46 - ERROR - stderr - 89%|████████▉ | 20035/22434 [20:07:05<1:40:31, 2.51s/it] +2025-02-06 06:14:46 - ERROR - stderr - +2025-02-06 06:14:46 - ERROR - stderr - +2025-02-06 06:14:46 - INFO - stdout - {'loss': 0.334, 'grad_norm': 1.4544481039047241, 'learning_rate': 5.938360489197736e-07, 'epoch': 2.68} +2025-02-06 06:14:46 - ERROR - stderr - 89%|████████▉ | 20035/22434 [20:07:06<1:40:31, 2.51s/it] +2025-02-06 06:14:48 - ERROR - stderr - 89%|████████▉ | 20036/22434 [20:07:08<1:40:17, 2.51s/it] +2025-02-06 06:14:48 - ERROR - stderr - +2025-02-06 06:14:48 - ERROR - stderr - +2025-02-06 06:14:48 - INFO - stdout - {'loss': 0.3861, 'grad_norm': 1.7290195226669312, 'learning_rate': 5.933460363377108e-07, 'epoch': 2.68} +2025-02-06 06:14:48 - ERROR - stderr - 89%|████████▉ | 20036/22434 [20:07:08<1:40:17, 2.51s/it] +2025-02-06 06:14:51 - ERROR - stderr - 89%|████████▉ | 20037/22434 [20:07:10<1:39:31, 2.49s/it] +2025-02-06 06:14:51 - ERROR - stderr - +2025-02-06 06:14:51 - ERROR - stderr - +2025-02-06 06:14:51 - INFO - stdout - {'loss': 0.3207, 'grad_norm': 1.5007505416870117, 'learning_rate': 5.928562198283472e-07, 'epoch': 2.68} +2025-02-06 06:14:51 - ERROR - stderr - 89%|████████▉ | 20037/22434 [20:07:10<1:39:31, 2.49s/it] +2025-02-06 06:14:53 - ERROR - stderr - 89%|████████▉ | 20038/22434 [20:07:13<1:42:17, 2.56s/it] +2025-02-06 06:14:53 - ERROR - stderr - +2025-02-06 06:14:53 - ERROR - stderr - +2025-02-06 06:14:53 - INFO - stdout - {'loss': 0.3309, 'grad_norm': 1.5000938177108765, 'learning_rate': 5.923665994018946e-07, 'epoch': 2.68} +2025-02-06 06:14:53 - ERROR - stderr - 89%|████████▉ | 20038/22434 [20:07:13<1:42:17, 2.56s/it] +2025-02-06 06:14:56 - ERROR - stderr - 89%|████████▉ | 20039/22434 [20:07:16<1:40:49, 2.53s/it] +2025-02-06 06:14:56 - ERROR - stderr - +2025-02-06 06:14:56 - ERROR - stderr - +2025-02-06 06:14:56 - INFO - stdout - {'loss': 0.3595, 'grad_norm': 1.60750150680542, 'learning_rate': 5.918771750685581e-07, 'epoch': 2.68} 
+2025-02-06 06:14:56 - ERROR - stderr - 89%|████████▉ | 20039/22434 [20:07:16<1:40:49, 2.53s/it] +2025-02-06 06:14:58 - ERROR - stderr - 89%|████████▉ | 20040/22434 [20:07:18<1:40:38, 2.52s/it] +2025-02-06 06:14:58 - ERROR - stderr - +2025-02-06 06:14:58 - ERROR - stderr - +2025-02-06 06:14:58 - INFO - stdout - {'loss': 0.3289, 'grad_norm': 1.5071264505386353, 'learning_rate': 5.913879468385397e-07, 'epoch': 2.68} +2025-02-06 06:14:58 - ERROR - stderr - 89%|████████▉ | 20040/22434 [20:07:18<1:40:38, 2.52s/it] +2025-02-06 06:15:01 - ERROR - stderr - 89%|████████▉ | 20041/22434 [20:07:21<1:41:18, 2.54s/it] +2025-02-06 06:15:01 - ERROR - stderr - +2025-02-06 06:15:01 - ERROR - stderr - +2025-02-06 06:15:01 - INFO - stdout - {'loss': 0.3733, 'grad_norm': 1.4741175174713135, 'learning_rate': 5.908989147220367e-07, 'epoch': 2.68} +2025-02-06 06:15:01 - ERROR - stderr - 89%|████████▉ | 20041/22434 [20:07:21<1:41:18, 2.54s/it] +2025-02-06 06:15:03 - ERROR - stderr - 89%|████████▉ | 20042/22434 [20:07:23<1:40:36, 2.52s/it] +2025-02-06 06:15:03 - ERROR - stderr - +2025-02-06 06:15:03 - ERROR - stderr - +2025-02-06 06:15:03 - INFO - stdout - {'loss': 0.3714, 'grad_norm': 1.6908389329910278, 'learning_rate': 5.904100787292411e-07, 'epoch': 2.68} +2025-02-06 06:15:03 - ERROR - stderr - 89%|████████▉ | 20042/22434 [20:07:23<1:40:36, 2.52s/it] +2025-02-06 06:15:06 - ERROR - stderr - 89%|████████▉ | 20043/22434 [20:07:26<1:44:55, 2.63s/it] +2025-02-06 06:15:06 - ERROR - stderr - +2025-02-06 06:15:06 - ERROR - stderr - +2025-02-06 06:15:06 - INFO - stdout - {'loss': 0.3512, 'grad_norm': 1.5359634160995483, 'learning_rate': 5.899214388703445e-07, 'epoch': 2.68} +2025-02-06 06:15:06 - ERROR - stderr - 89%|████████▉ | 20043/22434 [20:07:26<1:44:55, 2.63s/it] +2025-02-06 06:15:09 - ERROR - stderr - 89%|████████▉ | 20044/22434 [20:07:29<1:42:46, 2.58s/it] +2025-02-06 06:15:09 - ERROR - stderr - +2025-02-06 06:15:09 - ERROR - stderr - +2025-02-06 06:15:09 - INFO - stdout - {'loss': 
0.3461, 'grad_norm': 1.5638254880905151, 'learning_rate': 5.894329951555311e-07, 'epoch': 2.68} +2025-02-06 06:15:09 - ERROR - stderr - 89%|████████▉ | 20044/22434 [20:07:29<1:42:46, 2.58s/it] +2025-02-06 06:15:11 - ERROR - stderr - 89%|████████▉ | 20045/22434 [20:07:31<1:41:47, 2.56s/it] +2025-02-06 06:15:11 - ERROR - stderr - +2025-02-06 06:15:11 - ERROR - stderr - +2025-02-06 06:15:11 - INFO - stdout - {'loss': 0.3281, 'grad_norm': 1.3662681579589844, 'learning_rate': 5.889447475949805e-07, 'epoch': 2.68} +2025-02-06 06:15:11 - ERROR - stderr - 89%|████████▉ | 20045/22434 [20:07:31<1:41:47, 2.56s/it] +2025-02-06 06:15:14 - ERROR - stderr - 89%|████████▉ | 20046/22434 [20:07:33<1:40:40, 2.53s/it] +2025-02-06 06:15:14 - ERROR - stderr - +2025-02-06 06:15:14 - ERROR - stderr - +2025-02-06 06:15:14 - INFO - stdout - {'loss': 0.4165, 'grad_norm': 1.529213309288025, 'learning_rate': 5.884566961988724e-07, 'epoch': 2.68} +2025-02-06 06:15:14 - ERROR - stderr - 89%|████████▉ | 20046/22434 [20:07:34<1:40:40, 2.53s/it] +2025-02-06 06:15:16 - ERROR - stderr - 89%|████████▉ | 20047/22434 [20:07:36<1:40:40, 2.53s/it] +2025-02-06 06:15:16 - ERROR - stderr - +2025-02-06 06:15:16 - ERROR - stderr - +2025-02-06 06:15:16 - INFO - stdout - {'loss': 0.3347, 'grad_norm': 1.4362126588821411, 'learning_rate': 5.879688409773798e-07, 'epoch': 2.68} +2025-02-06 06:15:16 - ERROR - stderr - 89%|████████▉ | 20047/22434 [20:07:36<1:40:40, 2.53s/it] +2025-02-06 06:15:19 - ERROR - stderr - 89%|████████▉ | 20048/22434 [20:07:38<1:39:56, 2.51s/it] +2025-02-06 06:15:19 - ERROR - stderr - +2025-02-06 06:15:19 - ERROR - stderr - +2025-02-06 06:15:19 - INFO - stdout - {'loss': 0.3602, 'grad_norm': 1.7076367139816284, 'learning_rate': 5.874811819406678e-07, 'epoch': 2.68} +2025-02-06 06:15:19 - ERROR - stderr - 89%|████████▉ | 20048/22434 [20:07:39<1:39:56, 2.51s/it] +2025-02-06 06:15:21 - ERROR - stderr - 89%|████████▉ | 20049/22434 [20:07:41<1:40:01, 2.52s/it] +2025-02-06 06:15:21 - ERROR - stderr 
- +2025-02-06 06:15:21 - ERROR - stderr - +2025-02-06 06:15:21 - INFO - stdout - {'loss': 0.3556, 'grad_norm': 1.5246272087097168, 'learning_rate': 5.86993719098905e-07, 'epoch': 2.68} +2025-02-06 06:15:21 - ERROR - stderr - 89%|████████▉ | 20049/22434 [20:07:41<1:40:01, 2.52s/it] +2025-02-06 06:15:24 - ERROR - stderr - 89%|████████▉ | 20050/22434 [20:07:44<1:41:08, 2.55s/it] +2025-02-06 06:15:24 - ERROR - stderr - +2025-02-06 06:15:24 - ERROR - stderr - +2025-02-06 06:15:24 - INFO - stdout - {'loss': 0.3669, 'grad_norm': 1.3723030090332031, 'learning_rate': 5.865064524622522e-07, 'epoch': 2.68} +2025-02-06 06:15:24 - ERROR - stderr - 89%|████████▉ | 20050/22434 [20:07:44<1:41:08, 2.55s/it] +2025-02-06 06:15:26 - ERROR - stderr - 89%|████████▉ | 20051/22434 [20:07:46<1:40:14, 2.52s/it] +2025-02-06 06:15:26 - ERROR - stderr - +2025-02-06 06:15:26 - ERROR - stderr - +2025-02-06 06:15:26 - INFO - stdout - {'loss': 0.35, 'grad_norm': 1.3810194730758667, 'learning_rate': 5.860193820408621e-07, 'epoch': 2.68} +2025-02-06 06:15:26 - ERROR - stderr - 89%|████████▉ | 20051/22434 [20:07:46<1:40:14, 2.52s/it] +2025-02-06 06:15:29 - ERROR - stderr - 89%|████████▉ | 20052/22434 [20:07:49<1:40:41, 2.54s/it] +2025-02-06 06:15:29 - ERROR - stderr - +2025-02-06 06:15:29 - ERROR - stderr - +2025-02-06 06:15:29 - INFO - stdout - {'loss': 0.3366, 'grad_norm': 1.3940906524658203, 'learning_rate': 5.855325078448926e-07, 'epoch': 2.68} +2025-02-06 06:15:29 - ERROR - stderr - 89%|████████▉ | 20052/22434 [20:07:49<1:40:41, 2.54s/it] +2025-02-06 06:15:31 - ERROR - stderr - 89%|████████▉ | 20053/22434 [20:07:51<1:40:59, 2.54s/it] +2025-02-06 06:15:32 - ERROR - stderr - +2025-02-06 06:15:32 - ERROR - stderr - +2025-02-06 06:15:32 - INFO - stdout - {'loss': 0.3628, 'grad_norm': 1.6015270948410034, 'learning_rate': 5.850458298844863e-07, 'epoch': 2.68} +2025-02-06 06:15:32 - ERROR - stderr - 89%|████████▉ | 20053/22434 [20:07:51<1:40:59, 2.54s/it] +2025-02-06 06:15:34 - ERROR - stderr - 
89%|████████▉ | 20054/22434 [20:07:54<1:41:14, 2.55s/it] +2025-02-06 06:15:34 - ERROR - stderr - +2025-02-06 06:15:34 - ERROR - stderr - +2025-02-06 06:15:34 - INFO - stdout - {'loss': 0.4393, 'grad_norm': 1.8359153270721436, 'learning_rate': 5.845593481697931e-07, 'epoch': 2.68} +2025-02-06 06:15:34 - ERROR - stderr - 89%|████████▉ | 20054/22434 [20:07:54<1:41:14, 2.55s/it] +2025-02-06 06:15:37 - ERROR - stderr - 89%|████████▉ | 20055/22434 [20:07:56<1:40:34, 2.54s/it] +2025-02-06 06:15:37 - ERROR - stderr - +2025-02-06 06:15:37 - ERROR - stderr - +2025-02-06 06:15:37 - INFO - stdout - {'loss': 0.3209, 'grad_norm': 1.4320597648620605, 'learning_rate': 5.840730627109492e-07, 'epoch': 2.68} +2025-02-06 06:15:37 - ERROR - stderr - 89%|████████▉ | 20055/22434 [20:07:56<1:40:34, 2.54s/it] +2025-02-06 06:15:39 - ERROR - stderr - 89%|████████▉ | 20056/22434 [20:07:59<1:40:09, 2.53s/it] +2025-02-06 06:15:39 - ERROR - stderr - +2025-02-06 06:15:39 - ERROR - stderr - +2025-02-06 06:15:39 - INFO - stdout - {'loss': 0.3549, 'grad_norm': 1.4067903757095337, 'learning_rate': 5.835869735180932e-07, 'epoch': 2.68} +2025-02-06 06:15:39 - ERROR - stderr - 89%|████████▉ | 20056/22434 [20:07:59<1:40:09, 2.53s/it] +2025-02-06 06:15:42 - ERROR - stderr - 89%|████████▉ | 20057/22434 [20:08:02<1:44:11, 2.63s/it] +2025-02-06 06:15:42 - ERROR - stderr - +2025-02-06 06:15:42 - ERROR - stderr - +2025-02-06 06:15:42 - INFO - stdout - {'loss': 0.3688, 'grad_norm': 1.6775288581848145, 'learning_rate': 5.831010806013548e-07, 'epoch': 2.68} +2025-02-06 06:15:42 - ERROR - stderr - 89%|████████▉ | 20057/22434 [20:08:02<1:44:11, 2.63s/it] +2025-02-06 06:15:44 - ERROR - stderr - 89%|████████▉ | 20058/22434 [20:08:04<1:42:02, 2.58s/it] +2025-02-06 06:15:44 - ERROR - stderr - +2025-02-06 06:15:44 - ERROR - stderr - +2025-02-06 06:15:44 - INFO - stdout - {'loss': 0.3456, 'grad_norm': 1.645927906036377, 'learning_rate': 5.826153839708637e-07, 'epoch': 2.68} +2025-02-06 06:15:44 - ERROR - stderr - 
89%|████████▉ | 20058/22434 [20:08:04<1:42:02, 2.58s/it] +2025-02-06 06:15:47 - ERROR - stderr - 89%|████████▉ | 20059/22434 [20:08:07<1:42:51, 2.60s/it] +2025-02-06 06:15:47 - ERROR - stderr - +2025-02-06 06:15:47 - ERROR - stderr - +2025-02-06 06:15:47 - INFO - stdout - {'loss': 0.3699, 'grad_norm': 1.5379769802093506, 'learning_rate': 5.82129883636745e-07, 'epoch': 2.68} +2025-02-06 06:15:47 - ERROR - stderr - 89%|████████▉ | 20059/22434 [20:08:07<1:42:51, 2.60s/it] +2025-02-06 06:15:50 - ERROR - stderr - 89%|████████▉ | 20060/22434 [20:08:09<1:41:29, 2.56s/it] +2025-02-06 06:15:50 - ERROR - stderr - +2025-02-06 06:15:50 - ERROR - stderr - +2025-02-06 06:15:50 - INFO - stdout - {'loss': 0.3964, 'grad_norm': 1.8257750272750854, 'learning_rate': 5.816445796091153e-07, 'epoch': 2.68} +2025-02-06 06:15:50 - ERROR - stderr - 89%|████████▉ | 20060/22434 [20:08:09<1:41:29, 2.56s/it] +2025-02-06 06:15:52 - ERROR - stderr - 89%|████████▉ | 20061/22434 [20:08:12<1:40:42, 2.55s/it] +2025-02-06 06:15:52 - ERROR - stderr - +2025-02-06 06:15:52 - ERROR - stderr - +2025-02-06 06:15:52 - INFO - stdout - {'loss': 0.3211, 'grad_norm': 1.4231352806091309, 'learning_rate': 5.811594718980928e-07, 'epoch': 2.68} +2025-02-06 06:15:52 - ERROR - stderr - 89%|████████▉ | 20061/22434 [20:08:12<1:40:42, 2.55s/it] +2025-02-06 06:15:55 - ERROR - stderr - 89%|████████▉ | 20062/22434 [20:08:14<1:40:17, 2.54s/it] +2025-02-06 06:15:55 - ERROR - stderr - +2025-02-06 06:15:55 - ERROR - stderr - +2025-02-06 06:15:55 - INFO - stdout - {'loss': 0.402, 'grad_norm': 1.518759846687317, 'learning_rate': 5.806745605137876e-07, 'epoch': 2.68} +2025-02-06 06:15:55 - ERROR - stderr - 89%|████████▉ | 20062/22434 [20:08:14<1:40:17, 2.54s/it] +2025-02-06 06:15:57 - ERROR - stderr - 89%|████████▉ | 20063/22434 [20:08:17<1:39:52, 2.53s/it] +2025-02-06 06:15:57 - ERROR - stderr - +2025-02-06 06:15:57 - ERROR - stderr - +2025-02-06 06:15:57 - INFO - stdout - {'loss': 0.4326, 'grad_norm': 1.752624273300171, 
'learning_rate': 5.801898454663091e-07, 'epoch': 2.68} +2025-02-06 06:15:57 - ERROR - stderr - 89%|████████▉ | 20063/22434 [20:08:17<1:39:52, 2.53s/it] +2025-02-06 06:16:00 - ERROR - stderr - 89%|████████▉ | 20064/22434 [20:08:19<1:40:14, 2.54s/it] +2025-02-06 06:16:00 - ERROR - stderr - +2025-02-06 06:16:00 - ERROR - stderr - +2025-02-06 06:16:00 - INFO - stdout - {'loss': 0.378, 'grad_norm': 1.5707350969314575, 'learning_rate': 5.797053267657582e-07, 'epoch': 2.68} +2025-02-06 06:16:00 - ERROR - stderr - 89%|████████▉ | 20064/22434 [20:08:19<1:40:14, 2.54s/it] +2025-02-06 06:16:02 - ERROR - stderr - 89%|████████▉ | 20065/22434 [20:08:22<1:39:36, 2.52s/it] +2025-02-06 06:16:02 - ERROR - stderr - +2025-02-06 06:16:02 - ERROR - stderr - +2025-02-06 06:16:02 - INFO - stdout - {'loss': 0.3275, 'grad_norm': 1.5453437566757202, 'learning_rate': 5.792210044222357e-07, 'epoch': 2.68} +2025-02-06 06:16:02 - ERROR - stderr - 89%|████████▉ | 20065/22434 [20:08:22<1:39:36, 2.52s/it] +2025-02-06 06:16:05 - ERROR - stderr - 89%|████████▉ | 20066/22434 [20:08:24<1:38:30, 2.50s/it] +2025-02-06 06:16:05 - ERROR - stderr - +2025-02-06 06:16:05 - ERROR - stderr - +2025-02-06 06:16:05 - INFO - stdout - {'loss': 0.3195, 'grad_norm': 1.6071445941925049, 'learning_rate': 5.78736878445837e-07, 'epoch': 2.68} +2025-02-06 06:16:05 - ERROR - stderr - 89%|████████▉ | 20066/22434 [20:08:24<1:38:30, 2.50s/it] +2025-02-06 06:16:07 - ERROR - stderr - 89%|████████▉ | 20067/22434 [20:08:27<1:43:38, 2.63s/it] +2025-02-06 06:16:07 - ERROR - stderr - +2025-02-06 06:16:07 - ERROR - stderr - +2025-02-06 06:16:07 - INFO - stdout - {'loss': 0.366, 'grad_norm': 1.494536280632019, 'learning_rate': 5.782529488466527e-07, 'epoch': 2.68} +2025-02-06 06:16:07 - ERROR - stderr - 89%|████████▉ | 20067/22434 [20:08:27<1:43:38, 2.63s/it] +2025-02-06 06:16:10 - ERROR - stderr - 89%|████████▉ | 20068/22434 [20:08:30<1:43:09, 2.62s/it] +2025-02-06 06:16:10 - ERROR - stderr - +2025-02-06 06:16:10 - ERROR - stderr - 
+2025-02-06 06:16:10 - INFO - stdout - {'loss': 0.3643, 'grad_norm': 1.6537988185882568, 'learning_rate': 5.777692156347703e-07, 'epoch': 2.68} +2025-02-06 06:16:10 - ERROR - stderr - 89%|████████▉ | 20068/22434 [20:08:30<1:43:09, 2.62s/it] +2025-02-06 06:16:12 - ERROR - stderr - 89%|████████▉ | 20069/22434 [20:08:32<1:41:12, 2.57s/it] +2025-02-06 06:16:13 - ERROR - stderr - +2025-02-06 06:16:13 - ERROR - stderr - +2025-02-06 06:16:13 - INFO - stdout - {'loss': 0.3826, 'grad_norm': 1.7844514846801758, 'learning_rate': 5.77285678820273e-07, 'epoch': 2.68} +2025-02-06 06:16:13 - ERROR - stderr - 89%|████████▉ | 20069/22434 [20:08:32<1:41:12, 2.57s/it] +2025-02-06 06:16:15 - ERROR - stderr - 89%|████████▉ | 20070/22434 [20:08:35<1:41:20, 2.57s/it] +2025-02-06 06:16:15 - ERROR - stderr - +2025-02-06 06:16:15 - ERROR - stderr - +2025-02-06 06:16:15 - INFO - stdout - {'loss': 0.333, 'grad_norm': 1.4989330768585205, 'learning_rate': 5.768023384132382e-07, 'epoch': 2.68} +2025-02-06 06:16:15 - ERROR - stderr - 89%|████████▉ | 20070/22434 [20:08:35<1:41:20, 2.57s/it] +2025-02-06 06:16:18 - ERROR - stderr - 89%|████████▉ | 20071/22434 [20:08:38<1:43:23, 2.63s/it] +2025-02-06 06:16:18 - ERROR - stderr - +2025-02-06 06:16:18 - ERROR - stderr - +2025-02-06 06:16:18 - INFO - stdout - {'loss': 0.3541, 'grad_norm': 1.5010210275650024, 'learning_rate': 5.763191944237434e-07, 'epoch': 2.68} +2025-02-06 06:16:18 - ERROR - stderr - 89%|████████▉ | 20071/22434 [20:08:38<1:43:23, 2.63s/it] +2025-02-06 06:16:20 - ERROR - stderr - 89%|████████▉ | 20072/22434 [20:08:40<1:42:28, 2.60s/it] +2025-02-06 06:16:20 - ERROR - stderr - +2025-02-06 06:16:20 - ERROR - stderr - +2025-02-06 06:16:20 - INFO - stdout - {'loss': 0.359, 'grad_norm': 1.4865617752075195, 'learning_rate': 5.75836246861854e-07, 'epoch': 2.68} +2025-02-06 06:16:20 - ERROR - stderr - 89%|████████▉ | 20072/22434 [20:08:40<1:42:28, 2.60s/it] +2025-02-06 06:16:23 - ERROR - stderr - 89%|████████▉ | 20073/22434 [20:08:43<1:41:57, 
2.59s/it] +2025-02-06 06:16:23 - ERROR - stderr - +2025-02-06 06:16:23 - ERROR - stderr - +2025-02-06 06:16:23 - INFO - stdout - {'loss': 0.3801, 'grad_norm': 1.8358020782470703, 'learning_rate': 5.753534957376438e-07, 'epoch': 2.68} +2025-02-06 06:16:23 - ERROR - stderr - 89%|████████▉ | 20073/22434 [20:08:43<1:41:57, 2.59s/it] +2025-02-06 06:16:25 - ERROR - stderr - 89%|████████▉ | 20074/22434 [20:08:45<1:41:18, 2.58s/it] +2025-02-06 06:16:26 - ERROR - stderr - +2025-02-06 06:16:26 - ERROR - stderr - +2025-02-06 06:16:26 - INFO - stdout - {'loss': 0.3534, 'grad_norm': 1.5007939338684082, 'learning_rate': 5.748709410611686e-07, 'epoch': 2.68} +2025-02-06 06:16:26 - ERROR - stderr - 89%|████████▉ | 20074/22434 [20:08:45<1:41:18, 2.58s/it] +2025-02-06 06:16:28 - ERROR - stderr - 89%|████████▉ | 20075/22434 [20:08:48<1:40:45, 2.56s/it] +2025-02-06 06:16:28 - ERROR - stderr - +2025-02-06 06:16:28 - ERROR - stderr - +2025-02-06 06:16:28 - INFO - stdout - {'loss': 0.3402, 'grad_norm': 1.4989217519760132, 'learning_rate': 5.743885828424923e-07, 'epoch': 2.68} +2025-02-06 06:16:28 - ERROR - stderr - 89%|████████▉ | 20075/22434 [20:08:48<1:40:45, 2.56s/it] +2025-02-06 06:16:31 - ERROR - stderr - 89%|████████▉ | 20076/22434 [20:08:50<1:41:55, 2.59s/it] +2025-02-06 06:16:31 - ERROR - stderr - +2025-02-06 06:16:31 - ERROR - stderr - +2025-02-06 06:16:31 - INFO - stdout - {'loss': 0.3566, 'grad_norm': 1.4408620595932007, 'learning_rate': 5.739064210916656e-07, 'epoch': 2.68} +2025-02-06 06:16:31 - ERROR - stderr - 89%|████████▉ | 20076/22434 [20:08:50<1:41:55, 2.59s/it] +2025-02-06 06:16:33 - ERROR - stderr - 89%|████████▉ | 20077/22434 [20:08:53<1:40:05, 2.55s/it] +2025-02-06 06:16:33 - ERROR - stderr - +2025-02-06 06:16:33 - ERROR - stderr - +2025-02-06 06:16:33 - INFO - stdout - {'loss': 0.3452, 'grad_norm': 1.5105623006820679, 'learning_rate': 5.734244558187385e-07, 'epoch': 2.68} +2025-02-06 06:16:33 - ERROR - stderr - 89%|████████▉ | 20077/22434 [20:08:53<1:40:05, 
2.55s/it] +2025-02-06 06:16:36 - ERROR - stderr - 89%|████████▉ | 20078/22434 [20:08:56<1:40:57, 2.57s/it] +2025-02-06 06:16:36 - ERROR - stderr - +2025-02-06 06:16:36 - ERROR - stderr - +2025-02-06 06:16:36 - INFO - stdout - {'loss': 0.3545, 'grad_norm': 1.5920864343643188, 'learning_rate': 5.729426870337606e-07, 'epoch': 2.68} +2025-02-06 06:16:36 - ERROR - stderr - 89%|████████▉ | 20078/22434 [20:08:56<1:40:57, 2.57s/it] +2025-02-06 06:16:38 - ERROR - stderr - 90%|████████▉ | 20079/22434 [20:08:58<1:40:57, 2.57s/it] +2025-02-06 06:16:38 - ERROR - stderr - +2025-02-06 06:16:38 - ERROR - stderr - +2025-02-06 06:16:38 - INFO - stdout - {'loss': 0.3656, 'grad_norm': 1.597782850265503, 'learning_rate': 5.724611147467707e-07, 'epoch': 2.69} +2025-02-06 06:16:38 - ERROR - stderr - 90%|████████▉ | 20079/22434 [20:08:58<1:40:57, 2.57s/it] +2025-02-06 06:16:41 - ERROR - stderr - 90%|████████▉ | 20080/22434 [20:09:01<1:40:34, 2.56s/it] +2025-02-06 06:16:41 - ERROR - stderr - +2025-02-06 06:16:41 - ERROR - stderr - +2025-02-06 06:16:41 - INFO - stdout - {'loss': 0.3883, 'grad_norm': 1.436955451965332, 'learning_rate': 5.719797389678072e-07, 'epoch': 2.69} +2025-02-06 06:16:41 - ERROR - stderr - 90%|████████▉ | 20080/22434 [20:09:01<1:40:34, 2.56s/it] +2025-02-06 06:16:43 - ERROR - stderr - 90%|████████▉ | 20081/22434 [20:09:03<1:39:52, 2.55s/it] +2025-02-06 06:16:43 - ERROR - stderr - +2025-02-06 06:16:43 - ERROR - stderr - +2025-02-06 06:16:43 - INFO - stdout - {'loss': 0.3753, 'grad_norm': 1.5158867835998535, 'learning_rate': 5.714985597069045e-07, 'epoch': 2.69} +2025-02-06 06:16:43 - ERROR - stderr - 90%|████████▉ | 20081/22434 [20:09:03<1:39:52, 2.55s/it] +2025-02-06 06:16:46 - ERROR - stderr - 90%|████████▉ | 20082/22434 [20:09:06<1:39:20, 2.53s/it] +2025-02-06 06:16:46 - ERROR - stderr - +2025-02-06 06:16:46 - ERROR - stderr - +2025-02-06 06:16:46 - INFO - stdout - {'loss': 0.3383, 'grad_norm': 1.6186591386795044, 'learning_rate': 5.710175769740933e-07, 'epoch': 
2.69} +2025-02-06 06:16:46 - ERROR - stderr - 90%|████████▉ | 20082/22434 [20:09:06<1:39:20, 2.53s/it] +2025-02-06 06:16:48 - ERROR - stderr - 90%|████████▉ | 20083/22434 [20:09:08<1:40:27, 2.56s/it] +2025-02-06 06:16:49 - ERROR - stderr - +2025-02-06 06:16:49 - ERROR - stderr - +2025-02-06 06:16:49 - INFO - stdout - {'loss': 0.2942, 'grad_norm': 1.250008225440979, 'learning_rate': 5.705367907793969e-07, 'epoch': 2.69} +2025-02-06 06:16:49 - ERROR - stderr - 90%|████████▉ | 20083/22434 [20:09:08<1:40:27, 2.56s/it] +2025-02-06 06:16:51 - ERROR - stderr - 90%|████████▉ | 20084/22434 [20:09:11<1:39:55, 2.55s/it] +2025-02-06 06:16:51 - ERROR - stderr - +2025-02-06 06:16:51 - ERROR - stderr - +2025-02-06 06:16:51 - INFO - stdout - {'loss': 0.4211, 'grad_norm': 1.6313121318817139, 'learning_rate': 5.700562011328381e-07, 'epoch': 2.69} +2025-02-06 06:16:51 - ERROR - stderr - 90%|████████▉ | 20084/22434 [20:09:11<1:39:55, 2.55s/it] +2025-02-06 06:16:54 - ERROR - stderr - 90%|████████▉ | 20085/22434 [20:09:13<1:39:32, 2.54s/it] +2025-02-06 06:16:54 - ERROR - stderr - +2025-02-06 06:16:54 - ERROR - stderr - +2025-02-06 06:16:54 - INFO - stdout - {'loss': 0.3696, 'grad_norm': 1.3851209878921509, 'learning_rate': 5.695758080444346e-07, 'epoch': 2.69} +2025-02-06 06:16:54 - ERROR - stderr - 90%|████████▉ | 20085/22434 [20:09:13<1:39:32, 2.54s/it] +2025-02-06 06:16:56 - ERROR - stderr - 90%|████████▉ | 20086/22434 [20:09:16<1:39:20, 2.54s/it] +2025-02-06 06:16:56 - ERROR - stderr - +2025-02-06 06:16:56 - ERROR - stderr - +2025-02-06 06:16:56 - INFO - stdout - {'loss': 0.3477, 'grad_norm': 1.521771788597107, 'learning_rate': 5.690956115241997e-07, 'epoch': 2.69} +2025-02-06 06:16:56 - ERROR - stderr - 90%|████████▉ | 20086/22434 [20:09:16<1:39:20, 2.54s/it] +2025-02-06 06:16:59 - ERROR - stderr - 90%|████████▉ | 20087/22434 [20:09:18<1:40:29, 2.57s/it] +2025-02-06 06:16:59 - ERROR - stderr - +2025-02-06 06:16:59 - ERROR - stderr - +2025-02-06 06:16:59 - INFO - stdout - {'loss': 
0.3041, 'grad_norm': 1.4570585489273071, 'learning_rate': 5.686156115821428e-07, 'epoch': 2.69} +2025-02-06 06:16:59 - ERROR - stderr - 90%|████████▉ | 20087/22434 [20:09:19<1:40:29, 2.57s/it] +2025-02-06 06:17:01 - ERROR - stderr - 90%|████████▉ | 20088/22434 [20:09:21<1:42:07, 2.61s/it] +2025-02-06 06:17:01 - ERROR - stderr - +2025-02-06 06:17:01 - ERROR - stderr - +2025-02-06 06:17:01 - INFO - stdout - {'loss': 0.3894, 'grad_norm': 1.7978097200393677, 'learning_rate': 5.681358082282673e-07, 'epoch': 2.69} +2025-02-06 06:17:01 - ERROR - stderr - 90%|████████▉ | 20088/22434 [20:09:21<1:42:07, 2.61s/it] +2025-02-06 06:17:04 - ERROR - stderr - 90%|████████▉ | 20089/22434 [20:09:24<1:40:01, 2.56s/it] +2025-02-06 06:17:04 - ERROR - stderr - +2025-02-06 06:17:04 - ERROR - stderr - +2025-02-06 06:17:04 - INFO - stdout - {'loss': 0.3578, 'grad_norm': 1.6892977952957153, 'learning_rate': 5.676562014725773e-07, 'epoch': 2.69} +2025-02-06 06:17:04 - ERROR - stderr - 90%|████████▉ | 20089/22434 [20:09:24<1:40:01, 2.56s/it] +2025-02-06 06:17:06 - ERROR - stderr - 90%|████████▉ | 20090/22434 [20:09:26<1:39:26, 2.55s/it] +2025-02-06 06:17:06 - ERROR - stderr - +2025-02-06 06:17:06 - ERROR - stderr - +2025-02-06 06:17:06 - INFO - stdout - {'loss': 0.3545, 'grad_norm': 1.4773523807525635, 'learning_rate': 5.671767913250669e-07, 'epoch': 2.69} +2025-02-06 06:17:06 - ERROR - stderr - 90%|████████▉ | 20090/22434 [20:09:26<1:39:26, 2.55s/it] +2025-02-06 06:17:09 - ERROR - stderr - 90%|████████▉ | 20091/22434 [20:09:29<1:39:28, 2.55s/it] +2025-02-06 06:17:09 - ERROR - stderr - +2025-02-06 06:17:09 - ERROR - stderr - +2025-02-06 06:17:09 - INFO - stdout - {'loss': 0.3753, 'grad_norm': 1.650503158569336, 'learning_rate': 5.666975777957295e-07, 'epoch': 2.69} +2025-02-06 06:17:09 - ERROR - stderr - 90%|████████▉ | 20091/22434 [20:09:29<1:39:28, 2.55s/it] +2025-02-06 06:17:12 - ERROR - stderr - 90%|████████▉ | 20092/22434 [20:09:32<1:43:37, 2.65s/it] +2025-02-06 06:17:12 - ERROR - stderr 
- +2025-02-06 06:17:12 - ERROR - stderr - +2025-02-06 06:17:12 - INFO - stdout - {'loss': 0.3516, 'grad_norm': 1.352003574371338, 'learning_rate': 5.66218560894557e-07, 'epoch': 2.69} +2025-02-06 06:17:12 - ERROR - stderr - 90%|████████▉ | 20092/22434 [20:09:32<1:43:37, 2.65s/it] +2025-02-06 06:17:14 - ERROR - stderr - 90%|████████▉ | 20093/22434 [20:09:34<1:42:01, 2.61s/it] +2025-02-06 06:17:14 - ERROR - stderr - +2025-02-06 06:17:14 - ERROR - stderr - +2025-02-06 06:17:14 - INFO - stdout - {'loss': 0.3486, 'grad_norm': 1.5948925018310547, 'learning_rate': 5.65739740631528e-07, 'epoch': 2.69} +2025-02-06 06:17:14 - ERROR - stderr - 90%|████████▉ | 20093/22434 [20:09:34<1:42:01, 2.61s/it] +2025-02-06 06:17:17 - ERROR - stderr - 90%|████████▉ | 20094/22434 [20:09:37<1:45:32, 2.71s/it] +2025-02-06 06:17:17 - ERROR - stderr - +2025-02-06 06:17:17 - ERROR - stderr - +2025-02-06 06:17:17 - INFO - stdout - {'loss': 0.3946, 'grad_norm': 1.5667890310287476, 'learning_rate': 5.652611170166288e-07, 'epoch': 2.69} +2025-02-06 06:17:17 - ERROR - stderr - 90%|████████▉ | 20094/22434 [20:09:37<1:45:32, 2.71s/it] +2025-02-06 06:17:20 - ERROR - stderr - 90%|████████▉ | 20095/22434 [20:09:40<1:43:53, 2.67s/it] +2025-02-06 06:17:20 - ERROR - stderr - +2025-02-06 06:17:20 - ERROR - stderr - +2025-02-06 06:17:20 - INFO - stdout - {'loss': 0.3775, 'grad_norm': 1.5759068727493286, 'learning_rate': 5.64782690059833e-07, 'epoch': 2.69} +2025-02-06 06:17:20 - ERROR - stderr - 90%|████████▉ | 20095/22434 [20:09:40<1:43:53, 2.67s/it] +2025-02-06 06:17:22 - ERROR - stderr - 90%|████████▉ | 20096/22434 [20:09:42<1:41:43, 2.61s/it] +2025-02-06 06:17:22 - ERROR - stderr - +2025-02-06 06:17:22 - ERROR - stderr - +2025-02-06 06:17:22 - INFO - stdout - {'loss': 0.4033, 'grad_norm': 1.7227462530136108, 'learning_rate': 5.643044597711122e-07, 'epoch': 2.69} +2025-02-06 06:17:22 - ERROR - stderr - 90%|████████▉ | 20096/22434 [20:09:42<1:41:43, 2.61s/it] +2025-02-06 06:17:25 - ERROR - stderr - 
90%|████████▉ | 20097/22434 [20:09:45<1:41:16, 2.60s/it] +2025-02-06 06:17:25 - ERROR - stderr - +2025-02-06 06:17:25 - ERROR - stderr - +2025-02-06 06:17:25 - INFO - stdout - {'loss': 0.3935, 'grad_norm': 1.6775298118591309, 'learning_rate': 5.638264261604387e-07, 'epoch': 2.69} +2025-02-06 06:17:25 - ERROR - stderr - 90%|████████▉ | 20097/22434 [20:09:45<1:41:16, 2.60s/it] +2025-02-06 06:17:27 - ERROR - stderr - 90%|████████▉ | 20098/22434 [20:09:47<1:40:15, 2.58s/it] +2025-02-06 06:17:27 - ERROR - stderr - +2025-02-06 06:17:27 - ERROR - stderr - +2025-02-06 06:17:27 - INFO - stdout - {'loss': 0.4348, 'grad_norm': 1.7272974252700806, 'learning_rate': 5.633485892377699e-07, 'epoch': 2.69} +2025-02-06 06:17:27 - ERROR - stderr - 90%|████████▉ | 20098/22434 [20:09:47<1:40:15, 2.58s/it] +2025-02-06 06:17:30 - ERROR - stderr - 90%|████████▉ | 20099/22434 [20:09:50<1:39:08, 2.55s/it] +2025-02-06 06:17:30 - ERROR - stderr - +2025-02-06 06:17:30 - ERROR - stderr - +2025-02-06 06:17:30 - INFO - stdout - {'loss': 0.335, 'grad_norm': 1.4525794982910156, 'learning_rate': 5.628709490130734e-07, 'epoch': 2.69} +2025-02-06 06:17:30 - ERROR - stderr - 90%|████████▉ | 20099/22434 [20:09:50<1:39:08, 2.55s/it] +2025-02-06 06:17:32 - ERROR - stderr - 90%|████████▉ | 20100/22434 [20:09:52<1:39:10, 2.55s/it] +2025-02-06 06:17:33 - ERROR - stderr - +2025-02-06 06:17:33 - ERROR - stderr - +2025-02-06 06:17:33 - INFO - stdout - {'loss': 0.3758, 'grad_norm': 1.553026556968689, 'learning_rate': 5.623935054963014e-07, 'epoch': 2.69} +2025-02-06 06:17:33 - ERROR - stderr - 90%|████████▉ | 20100/22434 [20:09:52<1:39:10, 2.55s/it] +2025-02-06 06:17:35 - ERROR - stderr - 90%|████████▉ | 20101/22434 [20:09:55<1:38:46, 2.54s/it] +2025-02-06 06:17:35 - ERROR - stderr - +2025-02-06 06:17:35 - ERROR - stderr - +2025-02-06 06:17:35 - INFO - stdout - {'loss': 0.3342, 'grad_norm': 1.5336637496948242, 'learning_rate': 5.619162586974048e-07, 'epoch': 2.69} +2025-02-06 06:17:35 - ERROR - stderr - 
90%|████████▉ | 20101/22434 [20:09:55<1:38:46, 2.54s/it] +2025-02-06 06:17:37 - ERROR - stderr - 90%|████████▉ | 20102/22434 [20:09:57<1:37:48, 2.52s/it] +2025-02-06 06:17:37 - ERROR - stderr - +2025-02-06 06:17:37 - ERROR - stderr - +2025-02-06 06:17:37 - INFO - stdout - {'loss': 0.341, 'grad_norm': 1.448643445968628, 'learning_rate': 5.61439208626332e-07, 'epoch': 2.69} +2025-02-06 06:17:37 - ERROR - stderr - 90%|████████▉ | 20102/22434 [20:09:57<1:37:48, 2.52s/it] +2025-02-06 06:17:40 - ERROR - stderr - 90%|████████▉ | 20103/22434 [20:10:00<1:37:24, 2.51s/it] +2025-02-06 06:17:40 - ERROR - stderr - +2025-02-06 06:17:40 - ERROR - stderr - +2025-02-06 06:17:40 - INFO - stdout - {'loss': 0.3705, 'grad_norm': 1.5856961011886597, 'learning_rate': 5.609623552930288e-07, 'epoch': 2.69} +2025-02-06 06:17:40 - ERROR - stderr - 90%|████████▉ | 20103/22434 [20:10:00<1:37:24, 2.51s/it] +2025-02-06 06:17:43 - ERROR - stderr - 90%|████████▉ | 20104/22434 [20:10:02<1:38:42, 2.54s/it] +2025-02-06 06:17:43 - ERROR - stderr - +2025-02-06 06:17:43 - ERROR - stderr - +2025-02-06 06:17:43 - INFO - stdout - {'loss': 0.3989, 'grad_norm': 1.7282506227493286, 'learning_rate': 5.604856987074314e-07, 'epoch': 2.69} +2025-02-06 06:17:43 - ERROR - stderr - 90%|████████▉ | 20104/22434 [20:10:02<1:38:42, 2.54s/it] +2025-02-06 06:17:45 - ERROR - stderr - 90%|████████▉ | 20105/22434 [20:10:05<1:39:52, 2.57s/it] +2025-02-06 06:17:45 - ERROR - stderr - +2025-02-06 06:17:45 - ERROR - stderr - +2025-02-06 06:17:45 - INFO - stdout - {'loss': 0.3659, 'grad_norm': 1.606393814086914, 'learning_rate': 5.600092388794776e-07, 'epoch': 2.69} +2025-02-06 06:17:45 - ERROR - stderr - 90%|████████▉ | 20105/22434 [20:10:05<1:39:52, 2.57s/it] +2025-02-06 06:17:48 - ERROR - stderr - 90%|████████▉ | 20106/22434 [20:10:08<1:40:09, 2.58s/it] +2025-02-06 06:17:48 - ERROR - stderr - +2025-02-06 06:17:48 - ERROR - stderr - +2025-02-06 06:17:48 - INFO - stdout - {'loss': 0.3192, 'grad_norm': 1.4404832124710083, 
'learning_rate': 5.595329758190993e-07, 'epoch': 2.69} +2025-02-06 06:17:48 - ERROR - stderr - 90%|████████▉ | 20106/22434 [20:10:08<1:40:09, 2.58s/it] +2025-02-06 06:17:50 - ERROR - stderr - 90%|████████▉ | 20107/22434 [20:10:10<1:40:49, 2.60s/it] +2025-02-06 06:17:50 - ERROR - stderr - +2025-02-06 06:17:50 - ERROR - stderr - +2025-02-06 06:17:50 - INFO - stdout - {'loss': 0.4062, 'grad_norm': 1.6190552711486816, 'learning_rate': 5.590569095362208e-07, 'epoch': 2.69} +2025-02-06 06:17:50 - ERROR - stderr - 90%|████████▉ | 20107/22434 [20:10:10<1:40:49, 2.60s/it] +2025-02-06 06:17:53 - ERROR - stderr - 90%|████████▉ | 20108/22434 [20:10:13<1:39:21, 2.56s/it] +2025-02-06 06:17:53 - ERROR - stderr - +2025-02-06 06:17:53 - ERROR - stderr - +2025-02-06 06:17:53 - INFO - stdout - {'loss': 0.3285, 'grad_norm': 1.3745753765106201, 'learning_rate': 5.585810400407677e-07, 'epoch': 2.69} +2025-02-06 06:17:53 - ERROR - stderr - 90%|████████▉ | 20108/22434 [20:10:13<1:39:21, 2.56s/it] +2025-02-06 06:17:55 - ERROR - stderr - 90%|████████▉ | 20109/22434 [20:10:15<1:39:24, 2.57s/it] +2025-02-06 06:17:56 - ERROR - stderr - +2025-02-06 06:17:56 - ERROR - stderr - +2025-02-06 06:17:56 - INFO - stdout - {'loss': 0.3524, 'grad_norm': 1.6005133390426636, 'learning_rate': 5.581053673426584e-07, 'epoch': 2.69} +2025-02-06 06:17:56 - ERROR - stderr - 90%|████████▉ | 20109/22434 [20:10:15<1:39:24, 2.57s/it] +2025-02-06 06:17:58 - ERROR - stderr - 90%|████████▉ | 20110/22434 [20:10:18<1:38:32, 2.54s/it] +2025-02-06 06:17:58 - ERROR - stderr - +2025-02-06 06:17:58 - ERROR - stderr - +2025-02-06 06:17:58 - INFO - stdout - {'loss': 0.3919, 'grad_norm': 1.4634137153625488, 'learning_rate': 5.576298914518086e-07, 'epoch': 2.69} +2025-02-06 06:17:58 - ERROR - stderr - 90%|████████▉ | 20110/22434 [20:10:18<1:38:32, 2.54s/it] +2025-02-06 06:18:01 - ERROR - stderr - 90%|████████▉ | 20111/22434 [20:10:20<1:38:57, 2.56s/it] +2025-02-06 06:18:01 - ERROR - stderr - +2025-02-06 06:18:01 - ERROR - stderr 
- +2025-02-06 06:18:01 - INFO - stdout - {'loss': 0.3818, 'grad_norm': 1.3834683895111084, 'learning_rate': 5.571546123781291e-07, 'epoch': 2.69} +2025-02-06 06:18:01 - ERROR - stderr - 90%|████████▉ | 20111/22434 [20:10:20<1:38:57, 2.56s/it] +2025-02-06 06:18:03 - ERROR - stderr - 90%|████████▉ | 20112/22434 [20:10:23<1:38:22, 2.54s/it] +2025-02-06 06:18:03 - ERROR - stderr - +2025-02-06 06:18:03 - ERROR - stderr - +2025-02-06 06:18:03 - INFO - stdout - {'loss': 0.3649, 'grad_norm': 1.6085689067840576, 'learning_rate': 5.56679530131522e-07, 'epoch': 2.69} +2025-02-06 06:18:03 - ERROR - stderr - 90%|████████▉ | 20112/22434 [20:10:23<1:38:22, 2.54s/it] +2025-02-06 06:18:06 - ERROR - stderr - 90%|████████▉ | 20113/22434 [20:10:25<1:38:03, 2.53s/it] +2025-02-06 06:18:06 - ERROR - stderr - +2025-02-06 06:18:06 - ERROR - stderr - +2025-02-06 06:18:06 - INFO - stdout - {'loss': 0.3545, 'grad_norm': 1.6271476745605469, 'learning_rate': 5.562046447218983e-07, 'epoch': 2.69} +2025-02-06 06:18:06 - ERROR - stderr - 90%|████████▉ | 20113/22434 [20:10:25<1:38:03, 2.53s/it] +2025-02-06 06:18:08 - ERROR - stderr - 90%|████████▉ | 20114/22434 [20:10:28<1:37:40, 2.53s/it] +2025-02-06 06:18:08 - ERROR - stderr - +2025-02-06 06:18:08 - ERROR - stderr - +2025-02-06 06:18:08 - INFO - stdout - {'loss': 0.3949, 'grad_norm': 1.629969596862793, 'learning_rate': 5.557299561591478e-07, 'epoch': 2.69} +2025-02-06 06:18:08 - ERROR - stderr - 90%|████████▉ | 20114/22434 [20:10:28<1:37:40, 2.53s/it] +2025-02-06 06:18:11 - ERROR - stderr - 90%|████████▉ | 20115/22434 [20:10:30<1:38:15, 2.54s/it] +2025-02-06 06:18:11 - ERROR - stderr - +2025-02-06 06:18:11 - ERROR - stderr - +2025-02-06 06:18:11 - INFO - stdout - {'loss': 0.3697, 'grad_norm': 1.6920576095581055, 'learning_rate': 5.552554644531715e-07, 'epoch': 2.69} +2025-02-06 06:18:11 - ERROR - stderr - 90%|████████▉ | 20115/22434 [20:10:30<1:38:15, 2.54s/it] +2025-02-06 06:18:13 - ERROR - stderr - 90%|████████▉ | 20116/22434 [20:10:33<1:38:01, 
2.54s/it] +2025-02-06 06:18:13 - ERROR - stderr - +2025-02-06 06:18:13 - ERROR - stderr - +2025-02-06 06:18:13 - INFO - stdout - {'loss': 0.4035, 'grad_norm': 1.5295406579971313, 'learning_rate': 5.547811696138594e-07, 'epoch': 2.69} +2025-02-06 06:18:13 - ERROR - stderr - 90%|████████▉ | 20116/22434 [20:10:33<1:38:01, 2.54s/it] +2025-02-06 06:18:16 - ERROR - stderr - 90%|████████▉ | 20117/22434 [20:10:36<1:40:41, 2.61s/it] +2025-02-06 06:18:16 - ERROR - stderr - +2025-02-06 06:18:16 - ERROR - stderr - +2025-02-06 06:18:16 - INFO - stdout - {'loss': 0.3521, 'grad_norm': 1.5440192222595215, 'learning_rate': 5.543070716510912e-07, 'epoch': 2.69} +2025-02-06 06:18:16 - ERROR - stderr - 90%|████████▉ | 20117/22434 [20:10:36<1:40:41, 2.61s/it] +2025-02-06 06:18:19 - ERROR - stderr - 90%|████████▉ | 20118/22434 [20:10:38<1:39:47, 2.59s/it] +2025-02-06 06:18:19 - ERROR - stderr - +2025-02-06 06:18:19 - ERROR - stderr - +2025-02-06 06:18:19 - INFO - stdout - {'loss': 0.3518, 'grad_norm': 1.4434239864349365, 'learning_rate': 5.53833170574758e-07, 'epoch': 2.69} +2025-02-06 06:18:19 - ERROR - stderr - 90%|████████▉ | 20118/22434 [20:10:38<1:39:47, 2.59s/it] +2025-02-06 06:18:21 - ERROR - stderr - 90%|████████▉ | 20119/22434 [20:10:41<1:38:12, 2.55s/it] +2025-02-06 06:18:21 - ERROR - stderr - +2025-02-06 06:18:21 - ERROR - stderr - +2025-02-06 06:18:21 - INFO - stdout - {'loss': 0.3998, 'grad_norm': 1.7046767473220825, 'learning_rate': 5.533594663947306e-07, 'epoch': 2.69} +2025-02-06 06:18:21 - ERROR - stderr - 90%|████████▉ | 20119/22434 [20:10:41<1:38:12, 2.55s/it] +2025-02-06 06:18:24 - ERROR - stderr - 90%|████████▉ | 20120/22434 [20:10:43<1:38:32, 2.56s/it] +2025-02-06 06:18:24 - ERROR - stderr - +2025-02-06 06:18:24 - ERROR - stderr - +2025-02-06 06:18:24 - INFO - stdout - {'loss': 0.3246, 'grad_norm': 1.4687715768814087, 'learning_rate': 5.528859591208869e-07, 'epoch': 2.69} +2025-02-06 06:18:24 - ERROR - stderr - 90%|████████▉ | 20120/22434 [20:10:43<1:38:32, 
2.56s/it] +2025-02-06 06:18:26 - ERROR - stderr - 90%|████████▉ | 20121/22434 [20:10:46<1:38:27, 2.55s/it] +2025-02-06 06:18:26 - ERROR - stderr - +2025-02-06 06:18:26 - ERROR - stderr - +2025-02-06 06:18:26 - INFO - stdout - {'loss': 0.3591, 'grad_norm': 1.5858440399169922, 'learning_rate': 5.524126487630943e-07, 'epoch': 2.69} +2025-02-06 06:18:26 - ERROR - stderr - 90%|████████▉ | 20121/22434 [20:10:46<1:38:27, 2.55s/it] +2025-02-06 06:18:29 - ERROR - stderr - 90%|████████▉ | 20122/22434 [20:10:48<1:37:53, 2.54s/it] +2025-02-06 06:18:29 - ERROR - stderr - +2025-02-06 06:18:29 - ERROR - stderr - +2025-02-06 06:18:29 - INFO - stdout - {'loss': 0.3413, 'grad_norm': 1.5686880350112915, 'learning_rate': 5.519395353312195e-07, 'epoch': 2.69} +2025-02-06 06:18:29 - ERROR - stderr - 90%|████████▉ | 20122/22434 [20:10:48<1:37:53, 2.54s/it] +2025-02-06 06:18:31 - ERROR - stderr - 90%|████████▉ | 20123/22434 [20:10:51<1:38:34, 2.56s/it] +2025-02-06 06:18:31 - ERROR - stderr - +2025-02-06 06:18:31 - ERROR - stderr - +2025-02-06 06:18:31 - INFO - stdout - {'loss': 0.4078, 'grad_norm': 1.7948952913284302, 'learning_rate': 5.514666188351258e-07, 'epoch': 2.69} +2025-02-06 06:18:31 - ERROR - stderr - 90%|████████▉ | 20123/22434 [20:10:51<1:38:34, 2.56s/it] +2025-02-06 06:18:34 - ERROR - stderr - 90%|████████▉ | 20124/22434 [20:10:54<1:38:44, 2.56s/it] +2025-02-06 06:18:34 - ERROR - stderr - +2025-02-06 06:18:34 - ERROR - stderr - +2025-02-06 06:18:34 - INFO - stdout - {'loss': 0.3485, 'grad_norm': 1.594271183013916, 'learning_rate': 5.509938992846686e-07, 'epoch': 2.69} +2025-02-06 06:18:34 - ERROR - stderr - 90%|████████▉ | 20124/22434 [20:10:54<1:38:44, 2.56s/it] +2025-02-06 06:18:36 - ERROR - stderr - 90%|████████▉ | 20125/22434 [20:10:56<1:38:12, 2.55s/it] +2025-02-06 06:18:36 - ERROR - stderr - +2025-02-06 06:18:36 - ERROR - stderr - +2025-02-06 06:18:36 - INFO - stdout - {'loss': 0.3246, 'grad_norm': 1.3327934741973877, 'learning_rate': 5.505213766897022e-07, 'epoch': 
2.69} +2025-02-06 06:18:36 - ERROR - stderr - 90%|████████▉ | 20125/22434 [20:10:56<1:38:12, 2.55s/it] +2025-02-06 06:18:39 - ERROR - stderr - 90%|████████▉ | 20126/22434 [20:10:59<1:37:37, 2.54s/it] +2025-02-06 06:18:39 - ERROR - stderr - +2025-02-06 06:18:39 - ERROR - stderr - +2025-02-06 06:18:39 - INFO - stdout - {'loss': 0.349, 'grad_norm': 1.5597537755966187, 'learning_rate': 5.500490510600742e-07, 'epoch': 2.69} +2025-02-06 06:18:39 - ERROR - stderr - 90%|████████▉ | 20126/22434 [20:10:59<1:37:37, 2.54s/it] +2025-02-06 06:18:41 - ERROR - stderr - 90%|████████▉ | 20127/22434 [20:11:01<1:38:49, 2.57s/it] +2025-02-06 06:18:41 - ERROR - stderr - +2025-02-06 06:18:41 - ERROR - stderr - +2025-02-06 06:18:41 - INFO - stdout - {'loss': 0.3381, 'grad_norm': 1.3275127410888672, 'learning_rate': 5.495769224056325e-07, 'epoch': 2.69} +2025-02-06 06:18:41 - ERROR - stderr - 90%|████████▉ | 20127/22434 [20:11:01<1:38:49, 2.57s/it] +2025-02-06 06:18:44 - ERROR - stderr - 90%|████████▉ | 20128/22434 [20:11:04<1:38:40, 2.57s/it] +2025-02-06 06:18:44 - ERROR - stderr - +2025-02-06 06:18:44 - ERROR - stderr - +2025-02-06 06:18:44 - INFO - stdout - {'loss': 0.3367, 'grad_norm': 1.4869270324707031, 'learning_rate': 5.491049907362156e-07, 'epoch': 2.69} +2025-02-06 06:18:44 - ERROR - stderr - 90%|████████▉ | 20128/22434 [20:11:04<1:38:40, 2.57s/it] +2025-02-06 06:18:47 - ERROR - stderr - 90%|████████▉ | 20129/22434 [20:11:06<1:38:04, 2.55s/it] +2025-02-06 06:18:47 - ERROR - stderr - +2025-02-06 06:18:47 - ERROR - stderr - +2025-02-06 06:18:47 - INFO - stdout - {'loss': 0.3414, 'grad_norm': 1.5921506881713867, 'learning_rate': 5.486332560616625e-07, 'epoch': 2.69} +2025-02-06 06:18:47 - ERROR - stderr - 90%|████████▉ | 20129/22434 [20:11:06<1:38:04, 2.55s/it] +2025-02-06 06:18:49 - ERROR - stderr - 90%|████████▉ | 20130/22434 [20:11:09<1:38:02, 2.55s/it] +2025-02-06 06:18:49 - ERROR - stderr - +2025-02-06 06:18:49 - ERROR - stderr - +2025-02-06 06:18:49 - INFO - stdout - {'loss': 
0.3412, 'grad_norm': 1.6436141729354858, 'learning_rate': 5.481617183918053e-07, 'epoch': 2.69} +2025-02-06 06:18:49 - ERROR - stderr - 90%|████████▉ | 20130/22434 [20:11:09<1:38:02, 2.55s/it] +2025-02-06 06:18:52 - ERROR - stderr - 90%|████████▉ | 20131/22434 [20:11:11<1:38:02, 2.55s/it] +2025-02-06 06:18:52 - ERROR - stderr - +2025-02-06 06:18:52 - ERROR - stderr - +2025-02-06 06:18:52 - INFO - stdout - {'loss': 0.3503, 'grad_norm': 1.5021904706954956, 'learning_rate': 5.476903777364717e-07, 'epoch': 2.69} +2025-02-06 06:18:52 - ERROR - stderr - 90%|████████▉ | 20131/22434 [20:11:11<1:38:02, 2.55s/it] +2025-02-06 06:18:54 - ERROR - stderr - 90%|████████▉ | 20132/22434 [20:11:14<1:37:32, 2.54s/it] +2025-02-06 06:18:54 - ERROR - stderr - +2025-02-06 06:18:54 - ERROR - stderr - +2025-02-06 06:18:54 - INFO - stdout - {'loss': 0.3797, 'grad_norm': 1.5741074085235596, 'learning_rate': 5.472192341054882e-07, 'epoch': 2.69} +2025-02-06 06:18:54 - ERROR - stderr - 90%|████████▉ | 20132/22434 [20:11:14<1:37:32, 2.54s/it] +2025-02-06 06:18:57 - ERROR - stderr - 90%|████████▉ | 20133/22434 [20:11:16<1:36:16, 2.51s/it] +2025-02-06 06:18:57 - ERROR - stderr - +2025-02-06 06:18:57 - ERROR - stderr - +2025-02-06 06:18:57 - INFO - stdout - {'loss': 0.3714, 'grad_norm': 1.653293251991272, 'learning_rate': 5.467482875086738e-07, 'epoch': 2.69} +2025-02-06 06:18:57 - ERROR - stderr - 90%|████████▉ | 20133/22434 [20:11:16<1:36:16, 2.51s/it] +2025-02-06 06:18:59 - ERROR - stderr - 90%|████████▉ | 20134/22434 [20:11:19<1:37:43, 2.55s/it] +2025-02-06 06:18:59 - ERROR - stderr - +2025-02-06 06:18:59 - ERROR - stderr - +2025-02-06 06:18:59 - INFO - stdout - {'loss': 0.3296, 'grad_norm': 1.5534422397613525, 'learning_rate': 5.462775379558461e-07, 'epoch': 2.69} +2025-02-06 06:18:59 - ERROR - stderr - 90%|████████▉ | 20134/22434 [20:11:19<1:37:43, 2.55s/it] +2025-02-06 06:19:02 - ERROR - stderr - 90%|████████▉ | 20135/22434 [20:11:21<1:36:47, 2.53s/it] +2025-02-06 06:19:02 - ERROR - stderr 
- +2025-02-06 06:19:02 - ERROR - stderr - +2025-02-06 06:19:02 - INFO - stdout - {'loss': 0.366, 'grad_norm': 1.657217264175415, 'learning_rate': 5.458069854568182e-07, 'epoch': 2.69} +2025-02-06 06:19:02 - ERROR - stderr - 90%|████████▉ | 20135/22434 [20:11:22<1:36:47, 2.53s/it] +2025-02-06 06:19:04 - ERROR - stderr - 90%|████████▉ | 20136/22434 [20:11:24<1:35:52, 2.50s/it] +2025-02-06 06:19:04 - ERROR - stderr - +2025-02-06 06:19:04 - ERROR - stderr - +2025-02-06 06:19:04 - INFO - stdout - {'loss': 0.3517, 'grad_norm': 1.3705108165740967, 'learning_rate': 5.453366300213936e-07, 'epoch': 2.69} +2025-02-06 06:19:04 - ERROR - stderr - 90%|████████▉ | 20136/22434 [20:11:24<1:35:52, 2.50s/it] +2025-02-06 06:19:07 - ERROR - stderr - 90%|████████▉ | 20137/22434 [20:11:26<1:35:41, 2.50s/it] +2025-02-06 06:19:07 - ERROR - stderr - +2025-02-06 06:19:07 - ERROR - stderr - +2025-02-06 06:19:07 - INFO - stdout - {'loss': 0.402, 'grad_norm': 1.6446661949157715, 'learning_rate': 5.448664716593833e-07, 'epoch': 2.69} +2025-02-06 06:19:07 - ERROR - stderr - 90%|████████▉ | 20137/22434 [20:11:26<1:35:41, 2.50s/it] +2025-02-06 06:19:09 - ERROR - stderr - 90%|████████▉ | 20138/22434 [20:11:29<1:39:20, 2.60s/it] +2025-02-06 06:19:10 - ERROR - stderr - +2025-02-06 06:19:10 - ERROR - stderr - +2025-02-06 06:19:10 - INFO - stdout - {'loss': 0.3768, 'grad_norm': 1.7517341375350952, 'learning_rate': 5.443965103805803e-07, 'epoch': 2.69} +2025-02-06 06:19:10 - ERROR - stderr - 90%|████████▉ | 20138/22434 [20:11:29<1:39:20, 2.60s/it] +2025-02-06 06:19:12 - ERROR - stderr - 90%|████████▉ | 20139/22434 [20:11:32<1:38:36, 2.58s/it] +2025-02-06 06:19:12 - ERROR - stderr - +2025-02-06 06:19:12 - ERROR - stderr - +2025-02-06 06:19:12 - INFO - stdout - {'loss': 0.3608, 'grad_norm': 1.4992750883102417, 'learning_rate': 5.439267461947884e-07, 'epoch': 2.69} +2025-02-06 06:19:12 - ERROR - stderr - 90%|████████▉ | 20139/22434 [20:11:32<1:38:36, 2.58s/it] +2025-02-06 06:19:14 - ERROR - stderr - 
90%|████████▉ | 20140/22434 [20:11:34<1:37:00, 2.54s/it] +2025-02-06 06:19:14 - ERROR - stderr - +2025-02-06 06:19:14 - ERROR - stderr - +2025-02-06 06:19:14 - INFO - stdout - {'loss': 0.3541, 'grad_norm': 1.6932213306427002, 'learning_rate': 5.434571791117915e-07, 'epoch': 2.69} +2025-02-06 06:19:14 - ERROR - stderr - 90%|████████▉ | 20140/22434 [20:11:34<1:37:00, 2.54s/it] +2025-02-06 06:19:17 - ERROR - stderr - 90%|████████▉ | 20141/22434 [20:11:37<1:36:39, 2.53s/it] +2025-02-06 06:19:17 - ERROR - stderr - +2025-02-06 06:19:17 - ERROR - stderr - +2025-02-06 06:19:17 - INFO - stdout - {'loss': 0.4217, 'grad_norm': 1.8865656852722168, 'learning_rate': 5.42987809141381e-07, 'epoch': 2.69} +2025-02-06 06:19:17 - ERROR - stderr - 90%|████████▉ | 20141/22434 [20:11:37<1:36:39, 2.53s/it] +2025-02-06 06:19:19 - ERROR - stderr - 90%|████████▉ | 20142/22434 [20:11:39<1:36:16, 2.52s/it] +2025-02-06 06:19:19 - ERROR - stderr - +2025-02-06 06:19:19 - ERROR - stderr - +2025-02-06 06:19:19 - INFO - stdout - {'loss': 0.311, 'grad_norm': 1.4891695976257324, 'learning_rate': 5.425186362933422e-07, 'epoch': 2.69} +2025-02-06 06:19:19 - ERROR - stderr - 90%|████████▉ | 20142/22434 [20:11:39<1:36:16, 2.52s/it] +2025-02-06 06:19:22 - ERROR - stderr - 90%|████████▉ | 20143/22434 [20:11:42<1:35:48, 2.51s/it] +2025-02-06 06:19:22 - ERROR - stderr - +2025-02-06 06:19:22 - ERROR - stderr - +2025-02-06 06:19:22 - INFO - stdout - {'loss': 0.3613, 'grad_norm': 1.5253475904464722, 'learning_rate': 5.420496605774495e-07, 'epoch': 2.69} +2025-02-06 06:19:22 - ERROR - stderr - 90%|████████▉ | 20143/22434 [20:11:42<1:35:48, 2.51s/it] +2025-02-06 06:19:24 - ERROR - stderr - 90%|████████▉ | 20144/22434 [20:11:44<1:35:25, 2.50s/it] +2025-02-06 06:19:24 - ERROR - stderr - +2025-02-06 06:19:24 - ERROR - stderr - +2025-02-06 06:19:24 - INFO - stdout - {'loss': 0.4124, 'grad_norm': 1.668724536895752, 'learning_rate': 5.415808820034851e-07, 'epoch': 2.69} +2025-02-06 06:19:24 - ERROR - stderr - 
90%|████████▉ | 20144/22434 [20:11:44<1:35:25, 2.50s/it] +2025-02-06 06:19:27 - ERROR - stderr - 90%|████████▉ | 20145/22434 [20:11:47<1:35:18, 2.50s/it] +2025-02-06 06:19:27 - ERROR - stderr - +2025-02-06 06:19:27 - ERROR - stderr - +2025-02-06 06:19:27 - INFO - stdout - {'loss': 0.3434, 'grad_norm': 1.3502877950668335, 'learning_rate': 5.411123005812147e-07, 'epoch': 2.69} +2025-02-06 06:19:27 - ERROR - stderr - 90%|████████▉ | 20145/22434 [20:11:47<1:35:18, 2.50s/it] +2025-02-06 06:19:29 - ERROR - stderr - 90%|████████▉ | 20146/22434 [20:11:49<1:35:15, 2.50s/it] +2025-02-06 06:19:29 - ERROR - stderr - +2025-02-06 06:19:29 - ERROR - stderr - +2025-02-06 06:19:29 - INFO - stdout - {'loss': 0.3605, 'grad_norm': 1.4562195539474487, 'learning_rate': 5.40643916320407e-07, 'epoch': 2.69} +2025-02-06 06:19:29 - ERROR - stderr - 90%|████████▉ | 20146/22434 [20:11:49<1:35:15, 2.50s/it] +2025-02-06 06:19:32 - ERROR - stderr - 90%|████████▉ | 20147/22434 [20:11:52<1:35:04, 2.49s/it] +2025-02-06 06:19:32 - ERROR - stderr - +2025-02-06 06:19:32 - ERROR - stderr - +2025-02-06 06:19:32 - INFO - stdout - {'loss': 0.3868, 'grad_norm': 1.5881630182266235, 'learning_rate': 5.401757292308251e-07, 'epoch': 2.69} +2025-02-06 06:19:32 - ERROR - stderr - 90%|████████▉ | 20147/22434 [20:11:52<1:35:04, 2.49s/it] +2025-02-06 06:19:34 - ERROR - stderr - 90%|████████▉ | 20148/22434 [20:11:54<1:35:19, 2.50s/it] +2025-02-06 06:19:34 - ERROR - stderr - +2025-02-06 06:19:34 - ERROR - stderr - +2025-02-06 06:19:34 - INFO - stdout - {'loss': 0.2983, 'grad_norm': 1.3890424966812134, 'learning_rate': 5.397077393222283e-07, 'epoch': 2.69} +2025-02-06 06:19:34 - ERROR - stderr - 90%|████████▉ | 20148/22434 [20:11:54<1:35:19, 2.50s/it] +2025-02-06 06:19:37 - ERROR - stderr - 90%|████████▉ | 20149/22434 [20:11:57<1:36:14, 2.53s/it] +2025-02-06 06:19:37 - ERROR - stderr - +2025-02-06 06:19:37 - ERROR - stderr - +2025-02-06 06:19:37 - INFO - stdout - {'loss': 0.3382, 'grad_norm': 1.5175317525863647, 
'learning_rate': 5.392399466043719e-07, 'epoch': 2.69} +2025-02-06 06:19:37 - ERROR - stderr - 90%|████████▉ | 20149/22434 [20:11:57<1:36:14, 2.53s/it] +2025-02-06 06:19:39 - ERROR - stderr - 90%|████████▉ | 20150/22434 [20:11:59<1:35:45, 2.52s/it] +2025-02-06 06:19:40 - ERROR - stderr - +2025-02-06 06:19:40 - ERROR - stderr - +2025-02-06 06:19:40 - INFO - stdout - {'loss': 0.3839, 'grad_norm': 1.5718451738357544, 'learning_rate': 5.387723510870047e-07, 'epoch': 2.69} +2025-02-06 06:19:40 - ERROR - stderr - 90%|████████▉ | 20150/22434 [20:11:59<1:35:45, 2.52s/it] +2025-02-06 06:19:42 - ERROR - stderr - 90%|████████▉ | 20151/22434 [20:12:02<1:36:33, 2.54s/it] +2025-02-06 06:19:42 - ERROR - stderr - +2025-02-06 06:19:42 - ERROR - stderr - +2025-02-06 06:19:42 - INFO - stdout - {'loss': 0.3923, 'grad_norm': 1.5663902759552002, 'learning_rate': 5.383049527798756e-07, 'epoch': 2.69} +2025-02-06 06:19:42 - ERROR - stderr - 90%|████████▉ | 20151/22434 [20:12:02<1:36:33, 2.54s/it] +2025-02-06 06:19:45 - ERROR - stderr - 90%|████████▉ | 20152/22434 [20:12:04<1:35:50, 2.52s/it] +2025-02-06 06:19:45 - ERROR - stderr - +2025-02-06 06:19:45 - ERROR - stderr - +2025-02-06 06:19:45 - INFO - stdout - {'loss': 0.3716, 'grad_norm': 1.532414436340332, 'learning_rate': 5.378377516927247e-07, 'epoch': 2.69} +2025-02-06 06:19:45 - ERROR - stderr - 90%|████████▉ | 20152/22434 [20:12:04<1:35:50, 2.52s/it] +2025-02-06 06:19:47 - ERROR - stderr - 90%|████████▉ | 20153/22434 [20:12:07<1:36:47, 2.55s/it] +2025-02-06 06:19:47 - ERROR - stderr - +2025-02-06 06:19:47 - ERROR - stderr - +2025-02-06 06:19:47 - INFO - stdout - {'loss': 0.3255, 'grad_norm': 1.5722057819366455, 'learning_rate': 5.373707478352918e-07, 'epoch': 2.69} +2025-02-06 06:19:47 - ERROR - stderr - 90%|████████▉ | 20153/22434 [20:12:07<1:36:47, 2.55s/it] +2025-02-06 06:19:50 - ERROR - stderr - 90%|████████▉ | 20154/22434 [20:12:09<1:36:10, 2.53s/it] +2025-02-06 06:19:50 - ERROR - stderr - +2025-02-06 06:19:50 - ERROR - stderr - 
+2025-02-06 06:19:50 - INFO - stdout - {'loss': 0.4155, 'grad_norm': 1.7246384620666504, 'learning_rate': 5.369039412173116e-07, 'epoch': 2.7} +2025-02-06 06:19:50 - ERROR - stderr - 90%|████████▉ | 20154/22434 [20:12:09<1:36:10, 2.53s/it] +2025-02-06 06:19:52 - ERROR - stderr - 90%|████████▉ | 20155/22434 [20:12:12<1:35:24, 2.51s/it] +2025-02-06 06:19:52 - ERROR - stderr - +2025-02-06 06:19:52 - ERROR - stderr - +2025-02-06 06:19:52 - INFO - stdout - {'loss': 0.3718, 'grad_norm': 1.7544385194778442, 'learning_rate': 5.364373318485128e-07, 'epoch': 2.7} +2025-02-06 06:19:52 - ERROR - stderr - 90%|████████▉ | 20155/22434 [20:12:12<1:35:24, 2.51s/it] +2025-02-06 06:19:55 - ERROR - stderr - 90%|████████▉ | 20156/22434 [20:12:14<1:35:04, 2.50s/it] +2025-02-06 06:19:55 - ERROR - stderr - +2025-02-06 06:19:55 - ERROR - stderr - +2025-02-06 06:19:55 - INFO - stdout - {'loss': 0.315, 'grad_norm': 1.41415274143219, 'learning_rate': 5.359709197386243e-07, 'epoch': 2.7} +2025-02-06 06:19:55 - ERROR - stderr - 90%|████████▉ | 20156/22434 [20:12:14<1:35:04, 2.50s/it] +2025-02-06 06:19:57 - ERROR - stderr - 90%|████████▉ | 20157/22434 [20:12:17<1:36:46, 2.55s/it] +2025-02-06 06:19:57 - ERROR - stderr - +2025-02-06 06:19:57 - ERROR - stderr - +2025-02-06 06:19:57 - INFO - stdout - {'loss': 0.3566, 'grad_norm': 1.5216760635375977, 'learning_rate': 5.355047048973627e-07, 'epoch': 2.7} +2025-02-06 06:19:57 - ERROR - stderr - 90%|████████▉ | 20157/22434 [20:12:17<1:36:46, 2.55s/it] +2025-02-06 06:20:00 - ERROR - stderr - 90%|████████▉ | 20158/22434 [20:12:20<1:36:18, 2.54s/it] +2025-02-06 06:20:00 - ERROR - stderr - +2025-02-06 06:20:00 - ERROR - stderr - +2025-02-06 06:20:00 - INFO - stdout - {'loss': 0.3866, 'grad_norm': 1.5549861192703247, 'learning_rate': 5.350386873344515e-07, 'epoch': 2.7} +2025-02-06 06:20:00 - ERROR - stderr - 90%|████████▉ | 20158/22434 [20:12:20<1:36:18, 2.54s/it] +2025-02-06 06:20:02 - ERROR - stderr - 90%|████████▉ | 20159/22434 [20:12:22<1:36:12, 
2.54s/it] +2025-02-06 06:20:02 - ERROR - stderr - +2025-02-06 06:20:02 - ERROR - stderr - +2025-02-06 06:20:02 - INFO - stdout - {'loss': 0.3278, 'grad_norm': 1.6170156002044678, 'learning_rate': 5.345728670595995e-07, 'epoch': 2.7} +2025-02-06 06:20:02 - ERROR - stderr - 90%|████████▉ | 20159/22434 [20:12:22<1:36:12, 2.54s/it] +2025-02-06 06:20:05 - ERROR - stderr - 90%|████████▉ | 20160/22434 [20:12:25<1:35:51, 2.53s/it] +2025-02-06 06:20:05 - ERROR - stderr - +2025-02-06 06:20:05 - ERROR - stderr - +2025-02-06 06:20:05 - INFO - stdout - {'loss': 0.3676, 'grad_norm': 1.7136040925979614, 'learning_rate': 5.341072440825201e-07, 'epoch': 2.7} +2025-02-06 06:20:05 - ERROR - stderr - 90%|████████▉ | 20160/22434 [20:12:25<1:35:51, 2.53s/it] +2025-02-06 06:20:07 - ERROR - stderr - 90%|████████▉ | 20161/22434 [20:12:27<1:35:36, 2.52s/it] +2025-02-06 06:20:07 - ERROR - stderr - +2025-02-06 06:20:07 - ERROR - stderr - +2025-02-06 06:20:07 - INFO - stdout - {'loss': 0.4451, 'grad_norm': 1.7905199527740479, 'learning_rate': 5.336418184129177e-07, 'epoch': 2.7} +2025-02-06 06:20:07 - ERROR - stderr - 90%|████████▉ | 20161/22434 [20:12:27<1:35:36, 2.52s/it] +2025-02-06 06:20:10 - ERROR - stderr - 90%|████████▉ | 20162/22434 [20:12:30<1:35:55, 2.53s/it] +2025-02-06 06:20:10 - ERROR - stderr - +2025-02-06 06:20:10 - ERROR - stderr - +2025-02-06 06:20:10 - INFO - stdout - {'loss': 0.3806, 'grad_norm': 1.6279823780059814, 'learning_rate': 5.331765900604913e-07, 'epoch': 2.7} +2025-02-06 06:20:10 - ERROR - stderr - 90%|████████▉ | 20162/22434 [20:12:30<1:35:55, 2.53s/it] +2025-02-06 06:20:12 - ERROR - stderr - 90%|████████▉ | 20163/22434 [20:12:32<1:35:40, 2.53s/it] +2025-02-06 06:20:12 - ERROR - stderr - +2025-02-06 06:20:12 - ERROR - stderr - +2025-02-06 06:20:12 - INFO - stdout - {'loss': 0.3438, 'grad_norm': 1.4701722860336304, 'learning_rate': 5.32711559034943e-07, 'epoch': 2.7} +2025-02-06 06:20:12 - ERROR - stderr - 90%|████████▉ | 20163/22434 [20:12:32<1:35:40, 2.53s/it] 
+2025-02-06 06:20:16 - ERROR - stderr - 90%|████████▉ | 20164/22434 [20:12:35<1:42:10, 2.70s/it] +2025-02-06 06:20:16 - ERROR - stderr - +2025-02-06 06:20:16 - ERROR - stderr - +2025-02-06 06:20:16 - INFO - stdout - {'loss': 0.3385, 'grad_norm': 1.5260616540908813, 'learning_rate': 5.322467253459618e-07, 'epoch': 2.7} +2025-02-06 06:20:16 - ERROR - stderr - 90%|████████▉ | 20164/22434 [20:12:35<1:42:10, 2.70s/it] +2025-02-06 06:20:18 - ERROR - stderr - 90%|████████▉ | 20165/22434 [20:12:38<1:41:06, 2.67s/it] +2025-02-06 06:20:18 - ERROR - stderr - +2025-02-06 06:20:18 - ERROR - stderr - +2025-02-06 06:20:18 - INFO - stdout - {'loss': 0.3601, 'grad_norm': 1.6723707914352417, 'learning_rate': 5.317820890032376e-07, 'epoch': 2.7} +2025-02-06 06:20:18 - ERROR - stderr - 90%|████████▉ | 20165/22434 [20:12:38<1:41:06, 2.67s/it] +2025-02-06 06:20:21 - ERROR - stderr - 90%|████████▉ | 20166/22434 [20:12:40<1:39:24, 2.63s/it] +2025-02-06 06:20:21 - ERROR - stderr - +2025-02-06 06:20:21 - ERROR - stderr - +2025-02-06 06:20:21 - INFO - stdout - {'loss': 0.3528, 'grad_norm': 1.4774198532104492, 'learning_rate': 5.313176500164563e-07, 'epoch': 2.7} +2025-02-06 06:20:21 - ERROR - stderr - 90%|████████▉ | 20166/22434 [20:12:40<1:39:24, 2.63s/it] +2025-02-06 06:20:23 - ERROR - stderr - 90%|████████▉ | 20167/22434 [20:12:43<1:39:00, 2.62s/it] +2025-02-06 06:20:23 - ERROR - stderr - +2025-02-06 06:20:23 - ERROR - stderr - +2025-02-06 06:20:23 - INFO - stdout - {'loss': 0.3606, 'grad_norm': 1.4690394401550293, 'learning_rate': 5.308534083952954e-07, 'epoch': 2.7} +2025-02-06 06:20:23 - ERROR - stderr - 90%|████████▉ | 20167/22434 [20:12:43<1:39:00, 2.62s/it] +2025-02-06 06:20:26 - ERROR - stderr - 90%|████████▉ | 20168/22434 [20:12:46<1:38:00, 2.59s/it] +2025-02-06 06:20:26 - ERROR - stderr - +2025-02-06 06:20:26 - ERROR - stderr - +2025-02-06 06:20:26 - INFO - stdout - {'loss': 0.3758, 'grad_norm': 1.5543104410171509, 'learning_rate': 5.303893641494374e-07, 'epoch': 2.7} +2025-02-06 
06:20:26 - ERROR - stderr - 90%|████████▉ | 20168/22434 [20:12:46<1:38:00, 2.59s/it] +2025-02-06 06:20:28 - ERROR - stderr - 90%|████████▉ | 20169/22434 [20:12:48<1:37:43, 2.59s/it] +2025-02-06 06:20:28 - ERROR - stderr - +2025-02-06 06:20:28 - ERROR - stderr - +2025-02-06 06:20:28 - INFO - stdout - {'loss': 0.3801, 'grad_norm': 1.6397544145584106, 'learning_rate': 5.299255172885509e-07, 'epoch': 2.7} +2025-02-06 06:20:28 - ERROR - stderr - 90%|████████▉ | 20169/22434 [20:12:48<1:37:43, 2.59s/it] +2025-02-06 06:20:31 - ERROR - stderr - 90%|████████▉ | 20170/22434 [20:12:51<1:39:07, 2.63s/it] +2025-02-06 06:20:31 - ERROR - stderr - +2025-02-06 06:20:31 - ERROR - stderr - +2025-02-06 06:20:31 - INFO - stdout - {'loss': 0.3706, 'grad_norm': 1.5951378345489502, 'learning_rate': 5.294618678223051e-07, 'epoch': 2.7} +2025-02-06 06:20:31 - ERROR - stderr - 90%|████████▉ | 20170/22434 [20:12:51<1:39:07, 2.63s/it] +2025-02-06 06:20:34 - ERROR - stderr - 90%|████████▉ | 20171/22434 [20:12:53<1:38:34, 2.61s/it] +2025-02-06 06:20:34 - ERROR - stderr - +2025-02-06 06:20:34 - ERROR - stderr - +2025-02-06 06:20:34 - INFO - stdout - {'loss': 0.379, 'grad_norm': 1.6154307126998901, 'learning_rate': 5.289984157603634e-07, 'epoch': 2.7} +2025-02-06 06:20:34 - ERROR - stderr - 90%|████████▉ | 20171/22434 [20:12:53<1:38:34, 2.61s/it] +2025-02-06 06:20:36 - ERROR - stderr - 90%|████████▉ | 20172/22434 [20:12:56<1:37:54, 2.60s/it] +2025-02-06 06:20:36 - ERROR - stderr - +2025-02-06 06:20:36 - ERROR - stderr - +2025-02-06 06:20:36 - INFO - stdout - {'loss': 0.3315, 'grad_norm': 1.4765098094940186, 'learning_rate': 5.285351611123879e-07, 'epoch': 2.7} +2025-02-06 06:20:36 - ERROR - stderr - 90%|████████▉ | 20172/22434 [20:12:56<1:37:54, 2.60s/it] +2025-02-06 06:20:39 - ERROR - stderr - 90%|████████▉ | 20173/22434 [20:12:58<1:36:07, 2.55s/it] +2025-02-06 06:20:39 - ERROR - stderr - +2025-02-06 06:20:39 - ERROR - stderr - +2025-02-06 06:20:39 - INFO - stdout - {'loss': 0.3889, 'grad_norm': 
1.5295416116714478, 'learning_rate': 5.280721038880333e-07, 'epoch': 2.7} +2025-02-06 06:20:39 - ERROR - stderr - 90%|████████▉ | 20173/22434 [20:12:58<1:36:07, 2.55s/it] +2025-02-06 06:20:41 - ERROR - stderr - 90%|████████▉ | 20174/22434 [20:13:01<1:39:06, 2.63s/it] +2025-02-06 06:20:42 - ERROR - stderr - +2025-02-06 06:20:42 - ERROR - stderr - +2025-02-06 06:20:42 - INFO - stdout - {'loss': 0.2884, 'grad_norm': 1.4261928796768188, 'learning_rate': 5.276092440969527e-07, 'epoch': 2.7} +2025-02-06 06:20:42 - ERROR - stderr - 90%|████████▉ | 20174/22434 [20:13:01<1:39:06, 2.63s/it] +2025-02-06 06:20:44 - ERROR - stderr - 90%|████████▉ | 20175/22434 [20:13:04<1:37:42, 2.60s/it] +2025-02-06 06:20:44 - ERROR - stderr - +2025-02-06 06:20:44 - ERROR - stderr - +2025-02-06 06:20:44 - INFO - stdout - {'loss': 0.3601, 'grad_norm': 1.6325639486312866, 'learning_rate': 5.271465817487919e-07, 'epoch': 2.7} +2025-02-06 06:20:44 - ERROR - stderr - 90%|████████▉ | 20175/22434 [20:13:04<1:37:42, 2.60s/it] +2025-02-06 06:20:47 - ERROR - stderr - 90%|████████▉ | 20176/22434 [20:13:06<1:37:59, 2.60s/it] +2025-02-06 06:20:47 - ERROR - stderr - +2025-02-06 06:20:47 - ERROR - stderr - +2025-02-06 06:20:47 - INFO - stdout - {'loss': 0.3037, 'grad_norm': 1.368662714958191, 'learning_rate': 5.266841168531977e-07, 'epoch': 2.7} +2025-02-06 06:20:47 - ERROR - stderr - 90%|████████▉ | 20176/22434 [20:13:06<1:37:59, 2.60s/it] +2025-02-06 06:20:49 - ERROR - stderr - 90%|████████▉ | 20177/22434 [20:13:09<1:38:38, 2.62s/it] +2025-02-06 06:20:49 - ERROR - stderr - +2025-02-06 06:20:49 - ERROR - stderr - +2025-02-06 06:20:49 - INFO - stdout - {'loss': 0.3863, 'grad_norm': 1.6453522443771362, 'learning_rate': 5.26221849419809e-07, 'epoch': 2.7} +2025-02-06 06:20:49 - ERROR - stderr - 90%|████████▉ | 20177/22434 [20:13:09<1:38:38, 2.62s/it] +2025-02-06 06:20:52 - ERROR - stderr - 90%|████████▉ | 20178/22434 [20:13:12<1:36:48, 2.57s/it] +2025-02-06 06:20:52 - ERROR - stderr - +2025-02-06 06:20:52 - 
ERROR - stderr - +2025-02-06 06:20:52 - INFO - stdout - {'loss': 0.3369, 'grad_norm': 1.6269104480743408, 'learning_rate': 5.25759779458257e-07, 'epoch': 2.7} +2025-02-06 06:20:52 - ERROR - stderr - 90%|████████▉ | 20178/22434 [20:13:12<1:36:48, 2.57s/it] +2025-02-06 06:20:54 - ERROR - stderr - 90%|████████▉ | 20179/22434 [20:13:14<1:36:04, 2.56s/it] +2025-02-06 06:20:54 - ERROR - stderr - +2025-02-06 06:20:54 - ERROR - stderr - +2025-02-06 06:20:54 - INFO - stdout - {'loss': 0.3603, 'grad_norm': 1.5034455060958862, 'learning_rate': 5.252979069781783e-07, 'epoch': 2.7} +2025-02-06 06:20:54 - ERROR - stderr - 90%|████████▉ | 20179/22434 [20:13:14<1:36:04, 2.56s/it] +2025-02-06 06:20:57 - ERROR - stderr - 90%|████████▉ | 20180/22434 [20:13:17<1:36:07, 2.56s/it] +2025-02-06 06:20:57 - ERROR - stderr - +2025-02-06 06:20:57 - ERROR - stderr - +2025-02-06 06:20:57 - INFO - stdout - {'loss': 0.3204, 'grad_norm': 1.607408881187439, 'learning_rate': 5.248362319891998e-07, 'epoch': 2.7} +2025-02-06 06:20:57 - ERROR - stderr - 90%|████████▉ | 20180/22434 [20:13:17<1:36:07, 2.56s/it] +2025-02-06 06:20:59 - ERROR - stderr - 90%|████████▉ | 20181/22434 [20:13:19<1:35:35, 2.55s/it] +2025-02-06 06:20:59 - ERROR - stderr - +2025-02-06 06:20:59 - ERROR - stderr - +2025-02-06 06:20:59 - INFO - stdout - {'loss': 0.3424, 'grad_norm': 1.4805762767791748, 'learning_rate': 5.243747545009404e-07, 'epoch': 2.7} +2025-02-06 06:20:59 - ERROR - stderr - 90%|████████▉ | 20181/22434 [20:13:19<1:35:35, 2.55s/it] +2025-02-06 06:21:02 - ERROR - stderr - 90%|████████▉ | 20182/22434 [20:13:22<1:35:31, 2.54s/it] +2025-02-06 06:21:02 - ERROR - stderr - +2025-02-06 06:21:02 - ERROR - stderr - +2025-02-06 06:21:02 - INFO - stdout - {'loss': 0.3678, 'grad_norm': 1.596431851387024, 'learning_rate': 5.239134745230246e-07, 'epoch': 2.7} +2025-02-06 06:21:02 - ERROR - stderr - 90%|████████▉ | 20182/22434 [20:13:22<1:35:31, 2.54s/it] +2025-02-06 06:21:04 - ERROR - stderr - 90%|████████▉ | 20183/22434 
[20:13:24<1:34:52, 2.53s/it] +2025-02-06 06:21:04 - ERROR - stderr - +2025-02-06 06:21:04 - ERROR - stderr - +2025-02-06 06:21:04 - INFO - stdout - {'loss': 0.348, 'grad_norm': 1.6354860067367554, 'learning_rate': 5.234523920650624e-07, 'epoch': 2.7} +2025-02-06 06:21:04 - ERROR - stderr - 90%|████████▉ | 20183/22434 [20:13:24<1:34:52, 2.53s/it] +2025-02-06 06:21:07 - ERROR - stderr - 90%|████████▉ | 20184/22434 [20:13:27<1:38:06, 2.62s/it] +2025-02-06 06:21:07 - ERROR - stderr - +2025-02-06 06:21:07 - ERROR - stderr - +2025-02-06 06:21:07 - INFO - stdout - {'loss': 0.3688, 'grad_norm': 1.4448127746582031, 'learning_rate': 5.229915071366698e-07, 'epoch': 2.7} +2025-02-06 06:21:07 - ERROR - stderr - 90%|████████▉ | 20184/22434 [20:13:27<1:38:06, 2.62s/it] +2025-02-06 06:21:10 - ERROR - stderr - 90%|████████▉ | 20185/22434 [20:13:30<1:41:20, 2.70s/it] +2025-02-06 06:21:10 - ERROR - stderr - +2025-02-06 06:21:10 - ERROR - stderr - +2025-02-06 06:21:10 - INFO - stdout - {'loss': 0.3767, 'grad_norm': 1.6629222631454468, 'learning_rate': 5.225308197474499e-07, 'epoch': 2.7} +2025-02-06 06:21:10 - ERROR - stderr - 90%|████████▉ | 20185/22434 [20:13:30<1:41:20, 2.70s/it] +2025-02-06 06:21:13 - ERROR - stderr - 90%|████████▉ | 20186/22434 [20:13:32<1:39:28, 2.65s/it] +2025-02-06 06:21:13 - ERROR - stderr - +2025-02-06 06:21:13 - ERROR - stderr - +2025-02-06 06:21:13 - INFO - stdout - {'loss': 0.4084, 'grad_norm': 1.75308358669281, 'learning_rate': 5.22070329907004e-07, 'epoch': 2.7} +2025-02-06 06:21:13 - ERROR - stderr - 90%|████████▉ | 20186/22434 [20:13:32<1:39:28, 2.65s/it] +2025-02-06 06:21:15 - ERROR - stderr - 90%|████████▉ | 20187/22434 [20:13:35<1:37:13, 2.60s/it] +2025-02-06 06:21:15 - ERROR - stderr - +2025-02-06 06:21:15 - ERROR - stderr - +2025-02-06 06:21:15 - INFO - stdout - {'loss': 0.3362, 'grad_norm': 1.4608988761901855, 'learning_rate': 5.216100376249356e-07, 'epoch': 2.7} +2025-02-06 06:21:15 - ERROR - stderr - 90%|████████▉ | 20187/22434 
[20:13:35<1:37:13, 2.60s/it] +2025-02-06 06:21:18 - ERROR - stderr - 90%|████████▉ | 20188/22434 [20:13:38<1:38:29, 2.63s/it] +2025-02-06 06:21:18 - ERROR - stderr - +2025-02-06 06:21:18 - ERROR - stderr - +2025-02-06 06:21:18 - INFO - stdout - {'loss': 0.3845, 'grad_norm': 1.5837665796279907, 'learning_rate': 5.211499429108346e-07, 'epoch': 2.7} +2025-02-06 06:21:18 - ERROR - stderr - 90%|████████▉ | 20188/22434 [20:13:38<1:38:29, 2.63s/it] +2025-02-06 06:21:20 - ERROR - stderr - 90%|████████▉ | 20189/22434 [20:13:40<1:37:17, 2.60s/it] +2025-02-06 06:21:20 - ERROR - stderr - +2025-02-06 06:21:20 - ERROR - stderr - +2025-02-06 06:21:20 - INFO - stdout - {'loss': 0.3663, 'grad_norm': 1.697843313217163, 'learning_rate': 5.206900457742924e-07, 'epoch': 2.7} +2025-02-06 06:21:20 - ERROR - stderr - 90%|████████▉ | 20189/22434 [20:13:40<1:37:17, 2.60s/it] +2025-02-06 06:21:23 - ERROR - stderr - 90%|████████▉ | 20190/22434 [20:13:43<1:36:48, 2.59s/it] +2025-02-06 06:21:23 - ERROR - stderr - +2025-02-06 06:21:23 - ERROR - stderr - +2025-02-06 06:21:23 - INFO - stdout - {'loss': 0.3378, 'grad_norm': 1.404334545135498, 'learning_rate': 5.20230346224897e-07, 'epoch': 2.7} +2025-02-06 06:21:23 - ERROR - stderr - 90%|████████▉ | 20190/22434 [20:13:43<1:36:48, 2.59s/it] +2025-02-06 06:21:25 - ERROR - stderr - 90%|█████████ | 20191/22434 [20:13:45<1:36:29, 2.58s/it] +2025-02-06 06:21:26 - ERROR - stderr - +2025-02-06 06:21:26 - ERROR - stderr - +2025-02-06 06:21:26 - INFO - stdout - {'loss': 0.3622, 'grad_norm': 1.8581522703170776, 'learning_rate': 5.197708442722272e-07, 'epoch': 2.7} +2025-02-06 06:21:26 - ERROR - stderr - 90%|█████████ | 20191/22434 [20:13:45<1:36:29, 2.58s/it] +2025-02-06 06:21:28 - ERROR - stderr - 90%|█████████ | 20192/22434 [20:13:48<1:36:04, 2.57s/it] +2025-02-06 06:21:28 - ERROR - stderr - +2025-02-06 06:21:28 - ERROR - stderr - +2025-02-06 06:21:28 - INFO - stdout - {'loss': 0.4232, 'grad_norm': 1.7924938201904297, 'learning_rate': 5.19311539925863e-07, 
'epoch': 2.7} +2025-02-06 06:21:28 - ERROR - stderr - 90%|█████████ | 20192/22434 [20:13:48<1:36:04, 2.57s/it] +2025-02-06 06:21:31 - ERROR - stderr - 90%|█████████ | 20193/22434 [20:13:50<1:36:00, 2.57s/it] +2025-02-06 06:21:31 - ERROR - stderr - +2025-02-06 06:21:31 - ERROR - stderr - +2025-02-06 06:21:31 - INFO - stdout - {'loss': 0.3211, 'grad_norm': 1.7070937156677246, 'learning_rate': 5.188524331953782e-07, 'epoch': 2.7} +2025-02-06 06:21:31 - ERROR - stderr - 90%|█████████ | 20193/22434 [20:13:50<1:36:00, 2.57s/it] +2025-02-06 06:21:33 - ERROR - stderr - 90%|█████████ | 20194/22434 [20:13:53<1:35:40, 2.56s/it] +2025-02-06 06:21:33 - ERROR - stderr - +2025-02-06 06:21:33 - ERROR - stderr - +2025-02-06 06:21:33 - INFO - stdout - {'loss': 0.3952, 'grad_norm': 1.791308045387268, 'learning_rate': 5.183935240903415e-07, 'epoch': 2.7} +2025-02-06 06:21:33 - ERROR - stderr - 90%|█████████ | 20194/22434 [20:13:53<1:35:40, 2.56s/it] +2025-02-06 06:21:36 - ERROR - stderr - 90%|█████████ | 20195/22434 [20:13:55<1:35:24, 2.56s/it] +2025-02-06 06:21:36 - ERROR - stderr - +2025-02-06 06:21:36 - ERROR - stderr - +2025-02-06 06:21:36 - INFO - stdout - {'loss': 0.3458, 'grad_norm': 1.4736934900283813, 'learning_rate': 5.179348126203188e-07, 'epoch': 2.7} +2025-02-06 06:21:36 - ERROR - stderr - 90%|█████████ | 20195/22434 [20:13:55<1:35:24, 2.56s/it] +2025-02-06 06:21:38 - ERROR - stderr - 90%|█████████ | 20196/22434 [20:13:58<1:35:41, 2.57s/it] +2025-02-06 06:21:38 - ERROR - stderr - +2025-02-06 06:21:38 - ERROR - stderr - +2025-02-06 06:21:38 - INFO - stdout - {'loss': 0.3568, 'grad_norm': 1.509047269821167, 'learning_rate': 5.174762987948734e-07, 'epoch': 2.7} +2025-02-06 06:21:38 - ERROR - stderr - 90%|█████████ | 20196/22434 [20:13:58<1:35:41, 2.57s/it] +2025-02-06 06:21:41 - ERROR - stderr - 90%|█████████ | 20197/22434 [20:14:01<1:36:18, 2.58s/it] +2025-02-06 06:21:41 - ERROR - stderr - +2025-02-06 06:21:41 - ERROR - stderr - +2025-02-06 06:21:41 - INFO - stdout - 
{'loss': 0.3588, 'grad_norm': 1.481350064277649, 'learning_rate': 5.170179826235577e-07, 'epoch': 2.7} +2025-02-06 06:21:41 - ERROR - stderr - 90%|█████████ | 20197/22434 [20:14:01<1:36:18, 2.58s/it] +2025-02-06 06:21:44 - ERROR - stderr - 90%|█████████ | 20198/22434 [20:14:03<1:36:47, 2.60s/it] +2025-02-06 06:21:44 - ERROR - stderr - +2025-02-06 06:21:44 - ERROR - stderr - +2025-02-06 06:21:44 - INFO - stdout - {'loss': 0.3265, 'grad_norm': 1.3817325830459595, 'learning_rate': 5.165598641159297e-07, 'epoch': 2.7} +2025-02-06 06:21:44 - ERROR - stderr - 90%|█████████ | 20198/22434 [20:14:03<1:36:47, 2.60s/it] +2025-02-06 06:21:46 - ERROR - stderr - 90%|█████████ | 20199/22434 [20:14:06<1:37:49, 2.63s/it] +2025-02-06 06:21:46 - ERROR - stderr - +2025-02-06 06:21:46 - ERROR - stderr - +2025-02-06 06:21:46 - INFO - stdout - {'loss': 0.3921, 'grad_norm': 1.5333493947982788, 'learning_rate': 5.161019432815362e-07, 'epoch': 2.7} +2025-02-06 06:21:46 - ERROR - stderr - 90%|█████████ | 20199/22434 [20:14:06<1:37:49, 2.63s/it] +2025-02-06 06:21:49 - ERROR - stderr - 90%|█████████ | 20200/22434 [20:14:08<1:36:24, 2.59s/it] +2025-02-06 06:21:49 - ERROR - stderr - +2025-02-06 06:21:49 - ERROR - stderr - +2025-02-06 06:21:49 - INFO - stdout - {'loss': 0.4296, 'grad_norm': 1.7811447381973267, 'learning_rate': 5.156442201299228e-07, 'epoch': 2.7} +2025-02-06 06:21:49 - ERROR - stderr - 90%|█████████ | 20200/22434 [20:14:09<1:36:24, 2.59s/it] +2025-02-06 06:21:51 - ERROR - stderr - 90%|█████████ | 20201/22434 [20:14:11<1:36:19, 2.59s/it] +2025-02-06 06:21:51 - ERROR - stderr - +2025-02-06 06:21:51 - ERROR - stderr - +2025-02-06 06:21:51 - INFO - stdout - {'loss': 0.3774, 'grad_norm': 1.6808396577835083, 'learning_rate': 5.151866946706318e-07, 'epoch': 2.7} +2025-02-06 06:21:51 - ERROR - stderr - 90%|█████████ | 20201/22434 [20:14:11<1:36:19, 2.59s/it] +2025-02-06 06:21:54 - ERROR - stderr - 90%|█████████ | 20202/22434 [20:14:14<1:36:44, 2.60s/it] +2025-02-06 06:21:54 - ERROR - 
stderr - +2025-02-06 06:21:54 - ERROR - stderr - +2025-02-06 06:21:54 - INFO - stdout - {'loss': 0.382, 'grad_norm': 1.4510438442230225, 'learning_rate': 5.147293669131947e-07, 'epoch': 2.7} +2025-02-06 06:21:54 - ERROR - stderr - 90%|█████████ | 20202/22434 [20:14:14<1:36:44, 2.60s/it] +2025-02-06 06:21:57 - ERROR - stderr - 90%|█████████ | 20203/22434 [20:14:17<1:39:24, 2.67s/it] +2025-02-06 06:21:57 - ERROR - stderr - +2025-02-06 06:21:57 - ERROR - stderr - +2025-02-06 06:21:57 - INFO - stdout - {'loss': 0.372, 'grad_norm': 1.6661646366119385, 'learning_rate': 5.142722368671505e-07, 'epoch': 2.7} +2025-02-06 06:21:57 - ERROR - stderr - 90%|█████████ | 20203/22434 [20:14:17<1:39:24, 2.67s/it] +2025-02-06 06:21:59 - ERROR - stderr - 90%|█████████ | 20204/22434 [20:14:19<1:37:40, 2.63s/it] +2025-02-06 06:21:59 - ERROR - stderr - +2025-02-06 06:21:59 - ERROR - stderr - +2025-02-06 06:21:59 - INFO - stdout - {'loss': 0.3902, 'grad_norm': 1.6509891748428345, 'learning_rate': 5.138153045420236e-07, 'epoch': 2.7} +2025-02-06 06:21:59 - ERROR - stderr - 90%|█████████ | 20204/22434 [20:14:19<1:37:40, 2.63s/it] +2025-02-06 06:22:02 - ERROR - stderr - 90%|█████████ | 20205/22434 [20:14:22<1:37:49, 2.63s/it] +2025-02-06 06:22:02 - ERROR - stderr - +2025-02-06 06:22:02 - ERROR - stderr - +2025-02-06 06:22:02 - INFO - stdout - {'loss': 0.3957, 'grad_norm': 1.7195852994918823, 'learning_rate': 5.133585699473376e-07, 'epoch': 2.7} +2025-02-06 06:22:02 - ERROR - stderr - 90%|█████████ | 20205/22434 [20:14:22<1:37:49, 2.63s/it] +2025-02-06 06:22:04 - ERROR - stderr - 90%|█████████ | 20206/22434 [20:14:24<1:35:28, 2.57s/it] +2025-02-06 06:22:04 - ERROR - stderr - +2025-02-06 06:22:04 - ERROR - stderr - +2025-02-06 06:22:04 - INFO - stdout - {'loss': 0.3474, 'grad_norm': 1.6587779521942139, 'learning_rate': 5.129020330926182e-07, 'epoch': 2.7} +2025-02-06 06:22:04 - ERROR - stderr - 90%|█████████ | 20206/22434 [20:14:24<1:35:28, 2.57s/it] +2025-02-06 06:22:07 - ERROR - stderr - 
90%|█████████ | 20207/22434 [20:14:27<1:33:51, 2.53s/it] +2025-02-06 06:22:07 - ERROR - stderr - +2025-02-06 06:22:07 - ERROR - stderr - +2025-02-06 06:22:07 - INFO - stdout - {'loss': 0.3795, 'grad_norm': 1.5667407512664795, 'learning_rate': 5.124456939873734e-07, 'epoch': 2.7} +2025-02-06 06:22:07 - ERROR - stderr - 90%|█████████ | 20207/22434 [20:14:27<1:33:51, 2.53s/it] +2025-02-06 06:22:09 - ERROR - stderr - 90%|█████████ | 20208/22434 [20:14:29<1:33:46, 2.53s/it] +2025-02-06 06:22:09 - ERROR - stderr - +2025-02-06 06:22:09 - ERROR - stderr - +2025-02-06 06:22:09 - INFO - stdout - {'loss': 0.3456, 'grad_norm': 1.4892768859863281, 'learning_rate': 5.119895526411234e-07, 'epoch': 2.7} +2025-02-06 06:22:09 - ERROR - stderr - 90%|█████████ | 20208/22434 [20:14:29<1:33:46, 2.53s/it] +2025-02-06 06:22:12 - ERROR - stderr - 90%|█████████ | 20209/22434 [20:14:32<1:34:29, 2.55s/it] +2025-02-06 06:22:12 - ERROR - stderr - +2025-02-06 06:22:12 - ERROR - stderr - +2025-02-06 06:22:12 - INFO - stdout - {'loss': 0.3857, 'grad_norm': 1.724997639656067, 'learning_rate': 5.115336090633705e-07, 'epoch': 2.7} +2025-02-06 06:22:12 - ERROR - stderr - 90%|█████████ | 20209/22434 [20:14:32<1:34:29, 2.55s/it] +2025-02-06 06:22:14 - ERROR - stderr - 90%|█████████ | 20210/22434 [20:14:34<1:33:38, 2.53s/it] +2025-02-06 06:22:14 - ERROR - stderr - +2025-02-06 06:22:14 - ERROR - stderr - +2025-02-06 06:22:14 - INFO - stdout - {'loss': 0.3641, 'grad_norm': 1.5854359865188599, 'learning_rate': 5.110778632636204e-07, 'epoch': 2.7} +2025-02-06 06:22:14 - ERROR - stderr - 90%|█████████ | 20210/22434 [20:14:34<1:33:38, 2.53s/it] +2025-02-06 06:22:17 - ERROR - stderr - 90%|█████████ | 20211/22434 [20:14:37<1:33:11, 2.52s/it] +2025-02-06 06:22:17 - ERROR - stderr - +2025-02-06 06:22:17 - ERROR - stderr - +2025-02-06 06:22:17 - INFO - stdout - {'loss': 0.3972, 'grad_norm': 1.601108431816101, 'learning_rate': 5.106223152513712e-07, 'epoch': 2.7} +2025-02-06 06:22:17 - ERROR - stderr - 90%|█████████ 
| 20211/22434 [20:14:37<1:33:11, 2.52s/it] +2025-02-06 06:22:20 - ERROR - stderr - 90%|█████████ | 20212/22434 [20:14:39<1:34:29, 2.55s/it] +2025-02-06 06:22:20 - ERROR - stderr - +2025-02-06 06:22:20 - ERROR - stderr - +2025-02-06 06:22:20 - INFO - stdout - {'loss': 0.3373, 'grad_norm': 1.5840487480163574, 'learning_rate': 5.101669650361207e-07, 'epoch': 2.7} +2025-02-06 06:22:20 - ERROR - stderr - 90%|█████████ | 20212/22434 [20:14:39<1:34:29, 2.55s/it] +2025-02-06 06:22:22 - ERROR - stderr - 90%|█████████ | 20213/22434 [20:14:42<1:34:32, 2.55s/it] +2025-02-06 06:22:22 - ERROR - stderr - +2025-02-06 06:22:22 - ERROR - stderr - +2025-02-06 06:22:22 - INFO - stdout - {'loss': 0.364, 'grad_norm': 1.5212128162384033, 'learning_rate': 5.097118126273582e-07, 'epoch': 2.7} +2025-02-06 06:22:22 - ERROR - stderr - 90%|█████████ | 20213/22434 [20:14:42<1:34:32, 2.55s/it] +2025-02-06 06:22:25 - ERROR - stderr - 90%|█████████ | 20214/22434 [20:14:44<1:33:40, 2.53s/it] +2025-02-06 06:22:25 - ERROR - stderr - +2025-02-06 06:22:25 - ERROR - stderr - +2025-02-06 06:22:25 - INFO - stdout - {'loss': 0.348, 'grad_norm': 1.5600091218948364, 'learning_rate': 5.092568580345724e-07, 'epoch': 2.7} +2025-02-06 06:22:25 - ERROR - stderr - 90%|█████████ | 20214/22434 [20:14:44<1:33:40, 2.53s/it] +2025-02-06 06:22:27 - ERROR - stderr - 90%|█████████ | 20215/22434 [20:14:47<1:32:57, 2.51s/it] +2025-02-06 06:22:27 - ERROR - stderr - +2025-02-06 06:22:27 - ERROR - stderr - +2025-02-06 06:22:27 - INFO - stdout - {'loss': 0.4032, 'grad_norm': 1.7619163990020752, 'learning_rate': 5.08802101267245e-07, 'epoch': 2.7} +2025-02-06 06:22:27 - ERROR - stderr - 90%|█████████ | 20215/22434 [20:14:47<1:32:57, 2.51s/it] +2025-02-06 06:22:30 - ERROR - stderr - 90%|█████████ | 20216/22434 [20:14:49<1:32:58, 2.52s/it] +2025-02-06 06:22:30 - ERROR - stderr - +2025-02-06 06:22:30 - ERROR - stderr - +2025-02-06 06:22:30 - INFO - stdout - {'loss': 0.3862, 'grad_norm': 1.467216968536377, 'learning_rate': 
5.083475423348572e-07, 'epoch': 2.7} +2025-02-06 06:22:30 - ERROR - stderr - 90%|█████████ | 20216/22434 [20:14:49<1:32:58, 2.52s/it] +2025-02-06 06:22:32 - ERROR - stderr - 90%|█████████ | 20217/22434 [20:14:52<1:33:59, 2.54s/it] +2025-02-06 06:22:32 - ERROR - stderr - +2025-02-06 06:22:32 - ERROR - stderr - +2025-02-06 06:22:32 - INFO - stdout - {'loss': 0.3463, 'grad_norm': 1.4702365398406982, 'learning_rate': 5.078931812468813e-07, 'epoch': 2.7} +2025-02-06 06:22:32 - ERROR - stderr - 90%|█████████ | 20217/22434 [20:14:52<1:33:59, 2.54s/it] +2025-02-06 06:22:35 - ERROR - stderr - 90%|█████████ | 20218/22434 [20:14:54<1:33:53, 2.54s/it] +2025-02-06 06:22:35 - ERROR - stderr - +2025-02-06 06:22:35 - ERROR - stderr - +2025-02-06 06:22:35 - INFO - stdout - {'loss': 0.3947, 'grad_norm': 1.4741390943527222, 'learning_rate': 5.074390180127886e-07, 'epoch': 2.7} +2025-02-06 06:22:35 - ERROR - stderr - 90%|█████████ | 20218/22434 [20:14:55<1:33:53, 2.54s/it] +2025-02-06 06:22:37 - ERROR - stderr - 90%|█████████ | 20219/22434 [20:14:57<1:34:28, 2.56s/it] +2025-02-06 06:22:37 - ERROR - stderr - +2025-02-06 06:22:37 - ERROR - stderr - +2025-02-06 06:22:37 - INFO - stdout - {'loss': 0.3465, 'grad_norm': 1.5596321821212769, 'learning_rate': 5.069850526420461e-07, 'epoch': 2.7} +2025-02-06 06:22:37 - ERROR - stderr - 90%|█████████ | 20219/22434 [20:14:57<1:34:28, 2.56s/it] +2025-02-06 06:22:40 - ERROR - stderr - 90%|█████████ | 20220/22434 [20:15:00<1:35:50, 2.60s/it] +2025-02-06 06:22:40 - ERROR - stderr - +2025-02-06 06:22:40 - ERROR - stderr - +2025-02-06 06:22:40 - INFO - stdout - {'loss': 0.3823, 'grad_norm': 1.7318263053894043, 'learning_rate': 5.065312851441184e-07, 'epoch': 2.7} +2025-02-06 06:22:40 - ERROR - stderr - 90%|█████████ | 20220/22434 [20:15:00<1:35:50, 2.60s/it] +2025-02-06 06:22:42 - ERROR - stderr - 90%|█████████ | 20221/22434 [20:15:02<1:34:24, 2.56s/it] +2025-02-06 06:22:42 - ERROR - stderr - +2025-02-06 06:22:42 - ERROR - stderr - +2025-02-06 06:22:42 
- INFO - stdout - {'loss': 0.2943, 'grad_norm': 1.4680488109588623, 'learning_rate': 5.06077715528459e-07, 'epoch': 2.7} +2025-02-06 06:22:42 - ERROR - stderr - 90%|█████████ | 20221/22434 [20:15:02<1:34:24, 2.56s/it] +2025-02-06 06:22:45 - ERROR - stderr - 90%|█████████ | 20222/22434 [20:15:05<1:35:19, 2.59s/it] +2025-02-06 06:22:45 - ERROR - stderr - +2025-02-06 06:22:45 - ERROR - stderr - +2025-02-06 06:22:45 - INFO - stdout - {'loss': 0.3122, 'grad_norm': 1.4233413934707642, 'learning_rate': 5.056243438045283e-07, 'epoch': 2.7} +2025-02-06 06:22:45 - ERROR - stderr - 90%|█████████ | 20222/22434 [20:15:05<1:35:19, 2.59s/it] +2025-02-06 06:22:48 - ERROR - stderr - 90%|█████████ | 20223/22434 [20:15:07<1:35:14, 2.58s/it] +2025-02-06 06:22:48 - ERROR - stderr - +2025-02-06 06:22:48 - ERROR - stderr - +2025-02-06 06:22:48 - INFO - stdout - {'loss': 0.3461, 'grad_norm': 1.5125577449798584, 'learning_rate': 5.051711699817696e-07, 'epoch': 2.7} +2025-02-06 06:22:48 - ERROR - stderr - 90%|█████████ | 20223/22434 [20:15:07<1:35:14, 2.58s/it] +2025-02-06 06:22:50 - ERROR - stderr - 90%|█████████ | 20224/22434 [20:15:10<1:34:21, 2.56s/it] +2025-02-06 06:22:50 - ERROR - stderr - +2025-02-06 06:22:50 - ERROR - stderr - +2025-02-06 06:22:50 - INFO - stdout - {'loss': 0.3479, 'grad_norm': 1.7236768007278442, 'learning_rate': 5.047181940696333e-07, 'epoch': 2.7} +2025-02-06 06:22:50 - ERROR - stderr - 90%|█████████ | 20224/22434 [20:15:10<1:34:21, 2.56s/it] +2025-02-06 06:22:53 - ERROR - stderr - 90%|█████████ | 20225/22434 [20:15:12<1:33:34, 2.54s/it] +2025-02-06 06:22:53 - ERROR - stderr - +2025-02-06 06:22:53 - ERROR - stderr - +2025-02-06 06:22:53 - INFO - stdout - {'loss': 0.3189, 'grad_norm': 1.5374935865402222, 'learning_rate': 5.042654160775617e-07, 'epoch': 2.7} +2025-02-06 06:22:53 - ERROR - stderr - 90%|█████████ | 20225/22434 [20:15:12<1:33:34, 2.54s/it] +2025-02-06 06:22:55 - ERROR - stderr - 90%|█████████ | 20226/22434 [20:15:15<1:33:02, 2.53s/it] +2025-02-06 
06:22:55 - ERROR - stderr - +2025-02-06 06:22:55 - ERROR - stderr - +2025-02-06 06:22:55 - INFO - stdout - {'loss': 0.3824, 'grad_norm': 1.5510648488998413, 'learning_rate': 5.038128360149885e-07, 'epoch': 2.7} +2025-02-06 06:22:55 - ERROR - stderr - 90%|█████████ | 20226/22434 [20:15:15<1:33:02, 2.53s/it] +2025-02-06 06:22:58 - ERROR - stderr - 90%|█████████ | 20227/22434 [20:15:18<1:35:27, 2.60s/it] +2025-02-06 06:22:58 - ERROR - stderr - +2025-02-06 06:22:58 - ERROR - stderr - +2025-02-06 06:22:58 - INFO - stdout - {'loss': 0.3594, 'grad_norm': 1.5830087661743164, 'learning_rate': 5.033604538913528e-07, 'epoch': 2.7} +2025-02-06 06:22:58 - ERROR - stderr - 90%|█████████ | 20227/22434 [20:15:18<1:35:27, 2.60s/it] +2025-02-06 06:23:00 - ERROR - stderr - 90%|█████████ | 20228/22434 [20:15:20<1:34:42, 2.58s/it] +2025-02-06 06:23:00 - ERROR - stderr - +2025-02-06 06:23:00 - ERROR - stderr - +2025-02-06 06:23:00 - INFO - stdout - {'loss': 0.3709, 'grad_norm': 1.5863440036773682, 'learning_rate': 5.029082697160781e-07, 'epoch': 2.71} +2025-02-06 06:23:01 - ERROR - stderr - 90%|█████████ | 20228/22434 [20:15:20<1:34:42, 2.58s/it] +2025-02-06 06:23:03 - ERROR - stderr - 90%|█████████ | 20229/22434 [20:15:23<1:34:34, 2.57s/it] +2025-02-06 06:23:03 - ERROR - stderr - +2025-02-06 06:23:03 - ERROR - stderr - +2025-02-06 06:23:03 - INFO - stdout - {'loss': 0.371, 'grad_norm': 1.5671225786209106, 'learning_rate': 5.024562834985958e-07, 'epoch': 2.71} +2025-02-06 06:23:03 - ERROR - stderr - 90%|█████████ | 20229/22434 [20:15:23<1:34:34, 2.57s/it] +2025-02-06 06:23:06 - ERROR - stderr - 90%|█████████ | 20230/22434 [20:15:25<1:34:26, 2.57s/it] +2025-02-06 06:23:06 - ERROR - stderr - +2025-02-06 06:23:06 - ERROR - stderr - +2025-02-06 06:23:06 - INFO - stdout - {'loss': 0.4248, 'grad_norm': 1.5910592079162598, 'learning_rate': 5.020044952483228e-07, 'epoch': 2.71} +2025-02-06 06:23:06 - ERROR - stderr - 90%|█████████ | 20230/22434 [20:15:25<1:34:26, 2.57s/it] +2025-02-06 06:23:08 
- ERROR - stderr - 90%|█████████ | 20231/22434 [20:15:28<1:33:04, 2.54s/it] +2025-02-06 06:23:08 - ERROR - stderr - +2025-02-06 06:23:08 - ERROR - stderr - +2025-02-06 06:23:08 - INFO - stdout - {'loss': 0.3128, 'grad_norm': 1.351464033126831, 'learning_rate': 5.015529049746759e-07, 'epoch': 2.71} +2025-02-06 06:23:08 - ERROR - stderr - 90%|█████████ | 20231/22434 [20:15:28<1:33:04, 2.54s/it] +2025-02-06 06:23:11 - ERROR - stderr - 90%|█████████ | 20232/22434 [20:15:30<1:33:44, 2.55s/it] +2025-02-06 06:23:11 - ERROR - stderr - +2025-02-06 06:23:11 - ERROR - stderr - +2025-02-06 06:23:11 - INFO - stdout - {'loss': 0.3567, 'grad_norm': 1.5354423522949219, 'learning_rate': 5.011015126870722e-07, 'epoch': 2.71} +2025-02-06 06:23:11 - ERROR - stderr - 90%|█████████ | 20232/22434 [20:15:30<1:33:44, 2.55s/it] +2025-02-06 06:23:13 - ERROR - stderr - 90%|█████████ | 20233/22434 [20:15:33<1:33:56, 2.56s/it] +2025-02-06 06:23:13 - ERROR - stderr - +2025-02-06 06:23:13 - ERROR - stderr - +2025-02-06 06:23:13 - INFO - stdout - {'loss': 0.4018, 'grad_norm': 1.8688173294067383, 'learning_rate': 5.006503183949174e-07, 'epoch': 2.71} +2025-02-06 06:23:13 - ERROR - stderr - 90%|█████████ | 20233/22434 [20:15:33<1:33:56, 2.56s/it] +2025-02-06 06:23:16 - ERROR - stderr - 90%|█████████ | 20234/22434 [20:15:35<1:33:27, 2.55s/it] +2025-02-06 06:23:16 - ERROR - stderr - +2025-02-06 06:23:16 - ERROR - stderr - +2025-02-06 06:23:16 - INFO - stdout - {'loss': 0.3367, 'grad_norm': 1.5489201545715332, 'learning_rate': 5.001993221076162e-07, 'epoch': 2.71} +2025-02-06 06:23:16 - ERROR - stderr - 90%|█████████ | 20234/22434 [20:15:36<1:33:27, 2.55s/it] +2025-02-06 06:23:18 - ERROR - stderr - 90%|█████████ | 20235/22434 [20:15:38<1:32:33, 2.53s/it] +2025-02-06 06:23:18 - ERROR - stderr - +2025-02-06 06:23:18 - ERROR - stderr - +2025-02-06 06:23:18 - INFO - stdout - {'loss': 0.3739, 'grad_norm': 1.6392488479614258, 'learning_rate': 4.9974852383457e-07, 'epoch': 2.71} +2025-02-06 06:23:18 - ERROR - 
stderr - 90%|█████████ | 20235/22434 [20:15:38<1:32:33, 2.53s/it] +2025-02-06 06:23:21 - ERROR - stderr - 90%|█████████ | 20236/22434 [20:15:41<1:32:43, 2.53s/it] +2025-02-06 06:23:21 - ERROR - stderr - +2025-02-06 06:23:21 - ERROR - stderr - +2025-02-06 06:23:21 - INFO - stdout - {'loss': 0.3591, 'grad_norm': 1.710466742515564, 'learning_rate': 4.992979235851747e-07, 'epoch': 2.71} +2025-02-06 06:23:21 - ERROR - stderr - 90%|█████████ | 20236/22434 [20:15:41<1:32:43, 2.53s/it] +2025-02-06 06:23:23 - ERROR - stderr - 90%|█████████ | 20237/22434 [20:15:43<1:32:08, 2.52s/it] +2025-02-06 06:23:23 - ERROR - stderr - +2025-02-06 06:23:23 - ERROR - stderr - +2025-02-06 06:23:23 - INFO - stdout - {'loss': 0.3821, 'grad_norm': 1.5702614784240723, 'learning_rate': 4.988475213688238e-07, 'epoch': 2.71} +2025-02-06 06:23:23 - ERROR - stderr - 90%|█████████ | 20237/22434 [20:15:43<1:32:08, 2.52s/it] +2025-02-06 06:23:26 - ERROR - stderr - 90%|█████████ | 20238/22434 [20:15:45<1:31:56, 2.51s/it] +2025-02-06 06:23:26 - ERROR - stderr - +2025-02-06 06:23:26 - ERROR - stderr - +2025-02-06 06:23:26 - INFO - stdout - {'loss': 0.3488, 'grad_norm': 1.4164937734603882, 'learning_rate': 4.983973171949042e-07, 'epoch': 2.71} +2025-02-06 06:23:26 - ERROR - stderr - 90%|█████████ | 20238/22434 [20:15:46<1:31:56, 2.51s/it] +2025-02-06 06:23:28 - ERROR - stderr - 90%|█████████ | 20239/22434 [20:15:48<1:31:26, 2.50s/it] +2025-02-06 06:23:28 - ERROR - stderr - +2025-02-06 06:23:28 - ERROR - stderr - +2025-02-06 06:23:28 - INFO - stdout - {'loss': 0.3817, 'grad_norm': 1.607800006866455, 'learning_rate': 4.979473110728006e-07, 'epoch': 2.71} +2025-02-06 06:23:28 - ERROR - stderr - 90%|█████████ | 20239/22434 [20:15:48<1:31:26, 2.50s/it] +2025-02-06 06:23:31 - ERROR - stderr - 90%|█████████ | 20240/22434 [20:15:50<1:31:16, 2.50s/it] +2025-02-06 06:23:31 - ERROR - stderr - +2025-02-06 06:23:31 - ERROR - stderr - +2025-02-06 06:23:31 - INFO - stdout - {'loss': 0.3113, 'grad_norm': 
1.3887269496917725, 'learning_rate': 4.974975030118923e-07, 'epoch': 2.71} +2025-02-06 06:23:31 - ERROR - stderr - 90%|█████████ | 20240/22434 [20:15:51<1:31:16, 2.50s/it] +2025-02-06 06:23:33 - ERROR - stderr - 90%|█████████ | 20241/22434 [20:15:53<1:30:27, 2.48s/it] +2025-02-06 06:23:33 - ERROR - stderr - +2025-02-06 06:23:33 - ERROR - stderr - +2025-02-06 06:23:33 - INFO - stdout - {'loss': 0.329, 'grad_norm': 1.4379385709762573, 'learning_rate': 4.970478930215573e-07, 'epoch': 2.71} +2025-02-06 06:23:33 - ERROR - stderr - 90%|█████████ | 20241/22434 [20:15:53<1:30:27, 2.48s/it] +2025-02-06 06:23:36 - ERROR - stderr - 90%|█████████ | 20242/22434 [20:15:55<1:31:22, 2.50s/it] +2025-02-06 06:23:36 - ERROR - stderr - +2025-02-06 06:23:36 - ERROR - stderr - +2025-02-06 06:23:36 - INFO - stdout - {'loss': 0.3351, 'grad_norm': 1.461632490158081, 'learning_rate': 4.965984811111635e-07, 'epoch': 2.71} +2025-02-06 06:23:36 - ERROR - stderr - 90%|█████████ | 20242/22434 [20:15:55<1:31:22, 2.50s/it] +2025-02-06 06:23:38 - ERROR - stderr - 90%|█████████ | 20243/22434 [20:15:58<1:33:40, 2.57s/it] +2025-02-06 06:23:38 - ERROR - stderr - +2025-02-06 06:23:38 - ERROR - stderr - +2025-02-06 06:23:38 - INFO - stdout - {'loss': 0.3332, 'grad_norm': 1.4693002700805664, 'learning_rate': 4.961492672900814e-07, 'epoch': 2.71} +2025-02-06 06:23:38 - ERROR - stderr - 90%|█████████ | 20243/22434 [20:15:58<1:33:40, 2.57s/it] +2025-02-06 06:23:41 - ERROR - stderr - 90%|█████████ | 20244/22434 [20:16:01<1:33:05, 2.55s/it] +2025-02-06 06:23:41 - ERROR - stderr - +2025-02-06 06:23:41 - ERROR - stderr - +2025-02-06 06:23:41 - INFO - stdout - {'loss': 0.3516, 'grad_norm': 1.6032915115356445, 'learning_rate': 4.957002515676735e-07, 'epoch': 2.71} +2025-02-06 06:23:41 - ERROR - stderr - 90%|█████████ | 20244/22434 [20:16:01<1:33:05, 2.55s/it] +2025-02-06 06:23:44 - ERROR - stderr - 90%|█████████ | 20245/22434 [20:16:03<1:35:04, 2.61s/it] +2025-02-06 06:23:44 - ERROR - stderr - +2025-02-06 06:23:44 
- ERROR - stderr - +2025-02-06 06:23:44 - INFO - stdout - {'loss': 0.3989, 'grad_norm': 1.5636367797851562, 'learning_rate': 4.952514339532998e-07, 'epoch': 2.71} +2025-02-06 06:23:44 - ERROR - stderr - 90%|█████████ | 20245/22434 [20:16:03<1:35:04, 2.61s/it] +2025-02-06 06:23:46 - ERROR - stderr - 90%|█████████ | 20246/22434 [20:16:06<1:33:36, 2.57s/it] +2025-02-06 06:23:46 - ERROR - stderr - +2025-02-06 06:23:46 - ERROR - stderr - +2025-02-06 06:23:46 - INFO - stdout - {'loss': 0.3982, 'grad_norm': 1.5605103969573975, 'learning_rate': 4.948028144563155e-07, 'epoch': 2.71} +2025-02-06 06:23:46 - ERROR - stderr - 90%|█████████ | 20246/22434 [20:16:06<1:33:36, 2.57s/it] +2025-02-06 06:23:49 - ERROR - stderr - 90%|█████████ | 20247/22434 [20:16:08<1:33:05, 2.55s/it] +2025-02-06 06:23:49 - ERROR - stderr - +2025-02-06 06:23:49 - ERROR - stderr - +2025-02-06 06:23:49 - INFO - stdout - {'loss': 0.3272, 'grad_norm': 1.4551712274551392, 'learning_rate': 4.943543930860683e-07, 'epoch': 2.71} +2025-02-06 06:23:49 - ERROR - stderr - 90%|█████████ | 20247/22434 [20:16:08<1:33:05, 2.55s/it] +2025-02-06 06:23:51 - ERROR - stderr - 90%|█████████ | 20248/22434 [20:16:11<1:33:27, 2.56s/it] +2025-02-06 06:23:51 - ERROR - stderr - +2025-02-06 06:23:51 - ERROR - stderr - +2025-02-06 06:23:51 - INFO - stdout - {'loss': 0.3593, 'grad_norm': 1.5693658590316772, 'learning_rate': 4.93906169851911e-07, 'epoch': 2.71} +2025-02-06 06:23:51 - ERROR - stderr - 90%|█████████ | 20248/22434 [20:16:11<1:33:27, 2.56s/it] +2025-02-06 06:23:54 - ERROR - stderr - 90%|█████████ | 20249/22434 [20:16:13<1:32:26, 2.54s/it] +2025-02-06 06:23:54 - ERROR - stderr - +2025-02-06 06:23:54 - ERROR - stderr - +2025-02-06 06:23:54 - INFO - stdout - {'loss': 0.3023, 'grad_norm': 1.2982853651046753, 'learning_rate': 4.934581447631825e-07, 'epoch': 2.71} +2025-02-06 06:23:54 - ERROR - stderr - 90%|█████████ | 20249/22434 [20:16:14<1:32:26, 2.54s/it] +2025-02-06 06:23:56 - ERROR - stderr - 90%|█████████ | 20250/22434 
[20:16:16<1:34:13, 2.59s/it] +2025-02-06 06:23:56 - ERROR - stderr - +2025-02-06 06:23:56 - ERROR - stderr - +2025-02-06 06:23:56 - INFO - stdout - {'loss': 0.3694, 'grad_norm': 1.611607551574707, 'learning_rate': 4.930103178292201e-07, 'epoch': 2.71} +2025-02-06 06:23:56 - ERROR - stderr - 90%|█████████ | 20250/22434 [20:16:16<1:34:13, 2.59s/it] +2025-02-06 06:23:59 - ERROR - stderr - 90%|█████████ | 20251/22434 [20:16:19<1:32:40, 2.55s/it] +2025-02-06 06:23:59 - ERROR - stderr - +2025-02-06 06:23:59 - ERROR - stderr - +2025-02-06 06:23:59 - INFO - stdout - {'loss': 0.3463, 'grad_norm': 1.4041997194290161, 'learning_rate': 4.925626890593638e-07, 'epoch': 2.71} +2025-02-06 06:23:59 - ERROR - stderr - 90%|█████████ | 20251/22434 [20:16:19<1:32:40, 2.55s/it] +2025-02-06 06:24:01 - ERROR - stderr - 90%|█████████ | 20252/22434 [20:16:21<1:32:23, 2.54s/it] +2025-02-06 06:24:01 - ERROR - stderr - +2025-02-06 06:24:01 - ERROR - stderr - +2025-02-06 06:24:01 - INFO - stdout - {'loss': 0.3738, 'grad_norm': 1.5647393465042114, 'learning_rate': 4.921152584629363e-07, 'epoch': 2.71} +2025-02-06 06:24:01 - ERROR - stderr - 90%|█████████ | 20252/22434 [20:16:21<1:32:23, 2.54s/it] +2025-02-06 06:24:04 - ERROR - stderr - 90%|█████████ | 20253/22434 [20:16:24<1:32:07, 2.53s/it] +2025-02-06 06:24:04 - ERROR - stderr - +2025-02-06 06:24:04 - ERROR - stderr - +2025-02-06 06:24:04 - INFO - stdout - {'loss': 0.3729, 'grad_norm': 1.7457700967788696, 'learning_rate': 4.916680260492724e-07, 'epoch': 2.71} +2025-02-06 06:24:04 - ERROR - stderr - 90%|█████████ | 20253/22434 [20:16:24<1:32:07, 2.53s/it] +2025-02-06 06:24:06 - ERROR - stderr - 90%|█████████ | 20254/22434 [20:16:26<1:31:54, 2.53s/it] +2025-02-06 06:24:06 - ERROR - stderr - +2025-02-06 06:24:06 - ERROR - stderr - +2025-02-06 06:24:06 - INFO - stdout - {'loss': 0.3598, 'grad_norm': 1.5076274871826172, 'learning_rate': 4.912209918276877e-07, 'epoch': 2.71} +2025-02-06 06:24:06 - ERROR - stderr - 90%|█████████ | 20254/22434 
[20:16:26<1:31:54, 2.53s/it] +2025-02-06 06:24:09 - ERROR - stderr - 90%|█████████ | 20255/22434 [20:16:29<1:33:16, 2.57s/it] +2025-02-06 06:24:09 - ERROR - stderr - +2025-02-06 06:24:09 - ERROR - stderr - +2025-02-06 06:24:09 - INFO - stdout - {'loss': 0.3276, 'grad_norm': 1.644917607307434, 'learning_rate': 4.907741558075041e-07, 'epoch': 2.71} +2025-02-06 06:24:09 - ERROR - stderr - 90%|█████████ | 20255/22434 [20:16:29<1:33:16, 2.57s/it] +2025-02-06 06:24:12 - ERROR - stderr - 90%|█████████ | 20256/22434 [20:16:31<1:31:48, 2.53s/it] +2025-02-06 06:24:12 - ERROR - stderr - +2025-02-06 06:24:12 - ERROR - stderr - +2025-02-06 06:24:12 - INFO - stdout - {'loss': 0.4238, 'grad_norm': 1.810935378074646, 'learning_rate': 4.903275179980327e-07, 'epoch': 2.71} +2025-02-06 06:24:12 - ERROR - stderr - 90%|█████████ | 20256/22434 [20:16:31<1:31:48, 2.53s/it] +2025-02-06 06:24:14 - ERROR - stderr - 90%|█████████ | 20257/22434 [20:16:34<1:30:58, 2.51s/it] +2025-02-06 06:24:14 - ERROR - stderr - +2025-02-06 06:24:14 - ERROR - stderr - +2025-02-06 06:24:14 - INFO - stdout - {'loss': 0.39, 'grad_norm': 1.7289130687713623, 'learning_rate': 4.898810784085838e-07, 'epoch': 2.71} +2025-02-06 06:24:14 - ERROR - stderr - 90%|█████████ | 20257/22434 [20:16:34<1:30:58, 2.51s/it] +2025-02-06 06:24:16 - ERROR - stderr - 90%|█████████ | 20258/22434 [20:16:36<1:30:41, 2.50s/it] +2025-02-06 06:24:17 - ERROR - stderr - +2025-02-06 06:24:17 - ERROR - stderr - +2025-02-06 06:24:17 - INFO - stdout - {'loss': 0.3827, 'grad_norm': 1.716043472290039, 'learning_rate': 4.894348370484648e-07, 'epoch': 2.71} +2025-02-06 06:24:17 - ERROR - stderr - 90%|█████████ | 20258/22434 [20:16:36<1:30:41, 2.50s/it] +2025-02-06 06:24:19 - ERROR - stderr - 90%|█████████ | 20259/22434 [20:16:39<1:29:50, 2.48s/it] +2025-02-06 06:24:19 - ERROR - stderr - +2025-02-06 06:24:19 - ERROR - stderr - +2025-02-06 06:24:19 - INFO - stdout - {'loss': 0.3529, 'grad_norm': 1.5843870639801025, 'learning_rate': 
4.889887939269755e-07, 'epoch': 2.71} +2025-02-06 06:24:19 - ERROR - stderr - 90%|█████████ | 20259/22434 [20:16:39<1:29:50, 2.48s/it] +2025-02-06 06:24:21 - ERROR - stderr - 90%|█████████ | 20260/22434 [20:16:41<1:30:35, 2.50s/it] +2025-02-06 06:24:21 - ERROR - stderr - +2025-02-06 06:24:21 - ERROR - stderr - +2025-02-06 06:24:21 - INFO - stdout - {'loss': 0.3635, 'grad_norm': 1.5630922317504883, 'learning_rate': 4.885429490534133e-07, 'epoch': 2.71} +2025-02-06 06:24:21 - ERROR - stderr - 90%|█████████ | 20260/22434 [20:16:41<1:30:35, 2.50s/it] +2025-02-06 06:24:24 - ERROR - stderr - 90%|█████████ | 20261/22434 [20:16:44<1:31:28, 2.53s/it] +2025-02-06 06:24:24 - ERROR - stderr - +2025-02-06 06:24:24 - ERROR - stderr - +2025-02-06 06:24:24 - INFO - stdout - {'loss': 0.3695, 'grad_norm': 1.5369900465011597, 'learning_rate': 4.880973024370728e-07, 'epoch': 2.71} +2025-02-06 06:24:24 - ERROR - stderr - 90%|█████████ | 20261/22434 [20:16:44<1:31:28, 2.53s/it] +2025-02-06 06:24:27 - ERROR - stderr - 90%|█████████ | 20262/22434 [20:16:46<1:31:32, 2.53s/it] +2025-02-06 06:24:27 - ERROR - stderr - +2025-02-06 06:24:27 - ERROR - stderr - +2025-02-06 06:24:27 - INFO - stdout - {'loss': 0.398, 'grad_norm': 1.838873267173767, 'learning_rate': 4.876518540872411e-07, 'epoch': 2.71} +2025-02-06 06:24:27 - ERROR - stderr - 90%|█████████ | 20262/22434 [20:16:46<1:31:32, 2.53s/it] +2025-02-06 06:24:29 - ERROR - stderr - 90%|█████████ | 20263/22434 [20:16:49<1:30:39, 2.51s/it] +2025-02-06 06:24:29 - ERROR - stderr - +2025-02-06 06:24:29 - ERROR - stderr - +2025-02-06 06:24:29 - INFO - stdout - {'loss': 0.3607, 'grad_norm': 1.4741324186325073, 'learning_rate': 4.87206604013205e-07, 'epoch': 2.71} +2025-02-06 06:24:29 - ERROR - stderr - 90%|█████████ | 20263/22434 [20:16:49<1:30:39, 2.51s/it] +2025-02-06 06:24:31 - ERROR - stderr - 90%|█████████ | 20264/22434 [20:16:51<1:30:01, 2.49s/it] +2025-02-06 06:24:32 - ERROR - stderr - +2025-02-06 06:24:32 - ERROR - stderr - +2025-02-06 
06:24:32 - INFO - stdout - {'loss': 0.3579, 'grad_norm': 1.5917896032333374, 'learning_rate': 4.867615522242442e-07, 'epoch': 2.71} +2025-02-06 06:24:32 - ERROR - stderr - 90%|█████████ | 20264/22434 [20:16:51<1:30:01, 2.49s/it] +2025-02-06 06:24:34 - ERROR - stderr - 90%|█████████ | 20265/22434 [20:16:54<1:29:51, 2.49s/it] +2025-02-06 06:24:34 - ERROR - stderr - +2025-02-06 06:24:34 - ERROR - stderr - +2025-02-06 06:24:34 - INFO - stdout - {'loss': 0.4083, 'grad_norm': 1.7424354553222656, 'learning_rate': 4.863166987296375e-07, 'epoch': 2.71} +2025-02-06 06:24:34 - ERROR - stderr - 90%|█████████ | 20265/22434 [20:16:54<1:29:51, 2.49s/it] +2025-02-06 06:24:36 - ERROR - stderr - 90%|█████████ | 20266/22434 [20:16:56<1:29:44, 2.48s/it] +2025-02-06 06:24:36 - ERROR - stderr - +2025-02-06 06:24:36 - ERROR - stderr - +2025-02-06 06:24:36 - INFO - stdout - {'loss': 0.3662, 'grad_norm': 1.4296364784240723, 'learning_rate': 4.858720435386522e-07, 'epoch': 2.71} +2025-02-06 06:24:36 - ERROR - stderr - 90%|█████████ | 20266/22434 [20:16:56<1:29:44, 2.48s/it] +2025-02-06 06:24:39 - ERROR - stderr - 90%|█████████ | 20267/22434 [20:16:59<1:29:26, 2.48s/it] +2025-02-06 06:24:39 - ERROR - stderr - +2025-02-06 06:24:39 - ERROR - stderr - +2025-02-06 06:24:39 - INFO - stdout - {'loss': 0.3751, 'grad_norm': 1.6664575338363647, 'learning_rate': 4.854275866605629e-07, 'epoch': 2.71} +2025-02-06 06:24:39 - ERROR - stderr - 90%|█████████ | 20267/22434 [20:16:59<1:29:26, 2.48s/it] +2025-02-06 06:24:41 - ERROR - stderr - 90%|█████████ | 20268/22434 [20:17:01<1:29:55, 2.49s/it] +2025-02-06 06:24:41 - ERROR - stderr - +2025-02-06 06:24:41 - ERROR - stderr - +2025-02-06 06:24:41 - INFO - stdout - {'loss': 0.347, 'grad_norm': 1.6264694929122925, 'learning_rate': 4.84983328104629e-07, 'epoch': 2.71} +2025-02-06 06:24:41 - ERROR - stderr - 90%|█████████ | 20268/22434 [20:17:01<1:29:55, 2.49s/it] +2025-02-06 06:24:44 - ERROR - stderr - 90%|█████████ | 20269/22434 [20:17:04<1:29:50, 2.49s/it] 
+2025-02-06 06:24:44 - ERROR - stderr - +2025-02-06 06:24:44 - ERROR - stderr - +2025-02-06 06:24:44 - INFO - stdout - {'loss': 0.3748, 'grad_norm': 1.523430347442627, 'learning_rate': 4.845392678801131e-07, 'epoch': 2.71} +2025-02-06 06:24:44 - ERROR - stderr - 90%|█████████ | 20269/22434 [20:17:04<1:29:50, 2.49s/it] +2025-02-06 06:24:46 - ERROR - stderr - 90%|█████████ | 20270/22434 [20:17:06<1:29:22, 2.48s/it] +2025-02-06 06:24:46 - ERROR - stderr - +2025-02-06 06:24:46 - ERROR - stderr - +2025-02-06 06:24:46 - INFO - stdout - {'loss': 0.3557, 'grad_norm': 1.560166597366333, 'learning_rate': 4.840954059962733e-07, 'epoch': 2.71} +2025-02-06 06:24:46 - ERROR - stderr - 90%|█████████ | 20270/22434 [20:17:06<1:29:22, 2.48s/it] +2025-02-06 06:24:49 - ERROR - stderr - 90%|█████████ | 20271/22434 [20:17:09<1:29:54, 2.49s/it] +2025-02-06 06:24:49 - ERROR - stderr - +2025-02-06 06:24:49 - ERROR - stderr - +2025-02-06 06:24:49 - INFO - stdout - {'loss': 0.372, 'grad_norm': 1.5584559440612793, 'learning_rate': 4.836517424623555e-07, 'epoch': 2.71} +2025-02-06 06:24:49 - ERROR - stderr - 90%|█████████ | 20271/22434 [20:17:09<1:29:54, 2.49s/it] +2025-02-06 06:24:51 - ERROR - stderr - 90%|█████████ | 20272/22434 [20:17:11<1:29:23, 2.48s/it] +2025-02-06 06:24:51 - ERROR - stderr - +2025-02-06 06:24:51 - ERROR - stderr - +2025-02-06 06:24:51 - INFO - stdout - {'loss': 0.3602, 'grad_norm': 1.6313939094543457, 'learning_rate': 4.832082772876135e-07, 'epoch': 2.71} +2025-02-06 06:24:51 - ERROR - stderr - 90%|█████████ | 20272/22434 [20:17:11<1:29:23, 2.48s/it] +2025-02-06 06:24:54 - ERROR - stderr - 90%|█████████ | 20273/22434 [20:17:14<1:29:13, 2.48s/it] +2025-02-06 06:24:54 - ERROR - stderr - +2025-02-06 06:24:54 - ERROR - stderr - +2025-02-06 06:24:54 - INFO - stdout - {'loss': 0.3612, 'grad_norm': 1.4671710729599, 'learning_rate': 4.827650104812876e-07, 'epoch': 2.71} +2025-02-06 06:24:54 - ERROR - stderr - 90%|█████████ | 20273/22434 [20:17:14<1:29:13, 2.48s/it] +2025-02-06 
06:24:56 - ERROR - stderr - 90%|█████████ | 20274/22434 [20:17:16<1:29:22, 2.48s/it] +2025-02-06 06:24:56 - ERROR - stderr - +2025-02-06 06:24:56 - ERROR - stderr - +2025-02-06 06:24:56 - INFO - stdout - {'loss': 0.3312, 'grad_norm': 1.4950790405273438, 'learning_rate': 4.823219420526182e-07, 'epoch': 2.71} +2025-02-06 06:24:56 - ERROR - stderr - 90%|█████████ | 20274/22434 [20:17:16<1:29:22, 2.48s/it] +2025-02-06 06:24:59 - ERROR - stderr - 90%|█████████ | 20275/22434 [20:17:19<1:29:27, 2.49s/it] +2025-02-06 06:24:59 - ERROR - stderr - +2025-02-06 06:24:59 - ERROR - stderr - +2025-02-06 06:24:59 - INFO - stdout - {'loss': 0.3691, 'grad_norm': 1.6663103103637695, 'learning_rate': 4.818790720108402e-07, 'epoch': 2.71} +2025-02-06 06:24:59 - ERROR - stderr - 90%|█████████ | 20275/22434 [20:17:19<1:29:27, 2.49s/it] +2025-02-06 06:25:01 - ERROR - stderr - 90%|█████████ | 20276/22434 [20:17:21<1:29:31, 2.49s/it] +2025-02-06 06:25:01 - ERROR - stderr - +2025-02-06 06:25:01 - ERROR - stderr - +2025-02-06 06:25:01 - INFO - stdout - {'loss': 0.3435, 'grad_norm': 1.705607533454895, 'learning_rate': 4.814364003651839e-07, 'epoch': 2.71} +2025-02-06 06:25:01 - ERROR - stderr - 90%|█████████ | 20276/22434 [20:17:21<1:29:31, 2.49s/it] +2025-02-06 06:25:04 - ERROR - stderr - 90%|█████████ | 20277/22434 [20:17:24<1:29:45, 2.50s/it] +2025-02-06 06:25:04 - ERROR - stderr - +2025-02-06 06:25:04 - ERROR - stderr - +2025-02-06 06:25:04 - INFO - stdout - {'loss': 0.3632, 'grad_norm': 1.5607072114944458, 'learning_rate': 4.809939271248798e-07, 'epoch': 2.71} +2025-02-06 06:25:04 - ERROR - stderr - 90%|█████████ | 20277/22434 [20:17:24<1:29:45, 2.50s/it] +2025-02-06 06:25:06 - ERROR - stderr - 90%|█████████ | 20278/22434 [20:17:26<1:30:07, 2.51s/it] +2025-02-06 06:25:06 - ERROR - stderr - +2025-02-06 06:25:06 - ERROR - stderr - +2025-02-06 06:25:06 - INFO - stdout - {'loss': 0.3813, 'grad_norm': 1.5186140537261963, 'learning_rate': 4.805516522991483e-07, 'epoch': 2.71} +2025-02-06 
06:25:06 - ERROR - stderr - 90%|█████████ | 20278/22434 [20:17:26<1:30:07, 2.51s/it] +2025-02-06 06:25:09 - ERROR - stderr - 90%|█████████ | 20279/22434 [20:17:29<1:30:48, 2.53s/it] +2025-02-06 06:25:09 - ERROR - stderr - +2025-02-06 06:25:09 - ERROR - stderr - +2025-02-06 06:25:09 - INFO - stdout - {'loss': 0.4351, 'grad_norm': 1.6969985961914062, 'learning_rate': 4.801095758972074e-07, 'epoch': 2.71} +2025-02-06 06:25:09 - ERROR - stderr - 90%|█████████ | 20279/22434 [20:17:29<1:30:48, 2.53s/it] +2025-02-06 06:25:11 - ERROR - stderr - 90%|█████████ | 20280/22434 [20:17:31<1:30:45, 2.53s/it] +2025-02-06 06:25:11 - ERROR - stderr - +2025-02-06 06:25:11 - ERROR - stderr - +2025-02-06 06:25:11 - INFO - stdout - {'loss': 0.3703, 'grad_norm': 1.581917405128479, 'learning_rate': 4.796676979282733e-07, 'epoch': 2.71} +2025-02-06 06:25:11 - ERROR - stderr - 90%|█████████ | 20280/22434 [20:17:31<1:30:45, 2.53s/it] +2025-02-06 06:25:14 - ERROR - stderr - 90%|█████████ | 20281/22434 [20:17:34<1:30:15, 2.52s/it] +2025-02-06 06:25:14 - ERROR - stderr - +2025-02-06 06:25:14 - ERROR - stderr - +2025-02-06 06:25:14 - INFO - stdout - {'loss': 0.3262, 'grad_norm': 1.7262730598449707, 'learning_rate': 4.792260184015552e-07, 'epoch': 2.71} +2025-02-06 06:25:14 - ERROR - stderr - 90%|█████████ | 20281/22434 [20:17:34<1:30:15, 2.52s/it] +2025-02-06 06:25:16 - ERROR - stderr - 90%|█████████ | 20282/22434 [20:17:36<1:30:09, 2.51s/it] +2025-02-06 06:25:16 - ERROR - stderr - +2025-02-06 06:25:16 - ERROR - stderr - +2025-02-06 06:25:16 - INFO - stdout - {'loss': 0.376, 'grad_norm': 1.673905849456787, 'learning_rate': 4.787845373262612e-07, 'epoch': 2.71} +2025-02-06 06:25:16 - ERROR - stderr - 90%|█████████ | 20282/22434 [20:17:36<1:30:09, 2.51s/it] +2025-02-06 06:25:19 - ERROR - stderr - 90%|█████████ | 20283/22434 [20:17:39<1:30:04, 2.51s/it] +2025-02-06 06:25:19 - ERROR - stderr - +2025-02-06 06:25:19 - ERROR - stderr - +2025-02-06 06:25:19 - INFO - stdout - {'loss': 0.377, 'grad_norm': 
1.4823137521743774, 'learning_rate': 4.783432547115929e-07, 'epoch': 2.71} +2025-02-06 06:25:19 - ERROR - stderr - 90%|█████████ | 20283/22434 [20:17:39<1:30:04, 2.51s/it] +2025-02-06 06:25:21 - ERROR - stderr - 90%|█████████ | 20284/22434 [20:17:41<1:29:38, 2.50s/it] +2025-02-06 06:25:21 - ERROR - stderr - +2025-02-06 06:25:21 - ERROR - stderr - +2025-02-06 06:25:21 - INFO - stdout - {'loss': 0.3355, 'grad_norm': 1.3652760982513428, 'learning_rate': 4.779021705667475e-07, 'epoch': 2.71} +2025-02-06 06:25:21 - ERROR - stderr - 90%|█████████ | 20284/22434 [20:17:41<1:29:38, 2.50s/it] +2025-02-06 06:25:24 - ERROR - stderr - 90%|█████████ | 20285/22434 [20:17:44<1:31:45, 2.56s/it] +2025-02-06 06:25:24 - ERROR - stderr - +2025-02-06 06:25:24 - ERROR - stderr - +2025-02-06 06:25:24 - INFO - stdout - {'loss': 0.3843, 'grad_norm': 1.6847953796386719, 'learning_rate': 4.774612849009208e-07, 'epoch': 2.71} +2025-02-06 06:25:24 - ERROR - stderr - 90%|█████████ | 20285/22434 [20:17:44<1:31:45, 2.56s/it] +2025-02-06 06:25:27 - ERROR - stderr - 90%|█████████ | 20286/22434 [20:17:46<1:31:37, 2.56s/it] +2025-02-06 06:25:27 - ERROR - stderr - +2025-02-06 06:25:27 - ERROR - stderr - +2025-02-06 06:25:27 - INFO - stdout - {'loss': 0.3467, 'grad_norm': 1.5922396183013916, 'learning_rate': 4.770205977233022e-07, 'epoch': 2.71} +2025-02-06 06:25:27 - ERROR - stderr - 90%|█████████ | 20286/22434 [20:17:46<1:31:37, 2.56s/it] +2025-02-06 06:25:29 - ERROR - stderr - 90%|█████████ | 20287/22434 [20:17:49<1:31:27, 2.56s/it] +2025-02-06 06:25:29 - ERROR - stderr - +2025-02-06 06:25:29 - ERROR - stderr - +2025-02-06 06:25:29 - INFO - stdout - {'loss': 0.3396, 'grad_norm': 1.3647819757461548, 'learning_rate': 4.765801090430733e-07, 'epoch': 2.71} +2025-02-06 06:25:29 - ERROR - stderr - 90%|█████████ | 20287/22434 [20:17:49<1:31:27, 2.56s/it] +2025-02-06 06:25:32 - ERROR - stderr - 90%|█████████ | 20288/22434 [20:17:52<1:31:33, 2.56s/it] +2025-02-06 06:25:32 - ERROR - stderr - +2025-02-06 
06:25:32 - ERROR - stderr - +2025-02-06 06:25:32 - INFO - stdout - {'loss': 0.3677, 'grad_norm': 1.5936484336853027, 'learning_rate': 4.761398188694211e-07, 'epoch': 2.71} +2025-02-06 06:25:32 - ERROR - stderr - 90%|█████████ | 20288/22434 [20:17:52<1:31:33, 2.56s/it] +2025-02-06 06:25:34 - ERROR - stderr - 90%|█████████ | 20289/22434 [20:17:54<1:30:28, 2.53s/it] +2025-02-06 06:25:34 - ERROR - stderr - +2025-02-06 06:25:34 - ERROR - stderr - +2025-02-06 06:25:34 - INFO - stdout - {'loss': 0.4102, 'grad_norm': 1.5954011678695679, 'learning_rate': 4.756997272115227e-07, 'epoch': 2.71} +2025-02-06 06:25:34 - ERROR - stderr - 90%|█████████ | 20289/22434 [20:17:54<1:30:28, 2.53s/it] +2025-02-06 06:25:37 - ERROR - stderr - 90%|█████████ | 20290/22434 [20:17:57<1:30:14, 2.53s/it] +2025-02-06 06:25:37 - ERROR - stderr - +2025-02-06 06:25:37 - ERROR - stderr - +2025-02-06 06:25:37 - INFO - stdout - {'loss': 0.3388, 'grad_norm': 1.4341659545898438, 'learning_rate': 4.752598340785475e-07, 'epoch': 2.71} +2025-02-06 06:25:37 - ERROR - stderr - 90%|█████████ | 20290/22434 [20:17:57<1:30:14, 2.53s/it] +2025-02-06 06:25:39 - ERROR - stderr - 90%|█████████ | 20291/22434 [20:17:59<1:31:12, 2.55s/it] +2025-02-06 06:25:39 - ERROR - stderr - +2025-02-06 06:25:39 - ERROR - stderr - +2025-02-06 06:25:39 - INFO - stdout - {'loss': 0.3332, 'grad_norm': 1.6562172174453735, 'learning_rate': 4.748201394796681e-07, 'epoch': 2.71} +2025-02-06 06:25:39 - ERROR - stderr - 90%|█████████ | 20291/22434 [20:17:59<1:31:12, 2.55s/it] +2025-02-06 06:25:42 - ERROR - stderr - 90%|█████████ | 20292/22434 [20:18:02<1:31:06, 2.55s/it] +2025-02-06 06:25:42 - ERROR - stderr - +2025-02-06 06:25:42 - ERROR - stderr - +2025-02-06 06:25:42 - INFO - stdout - {'loss': 0.3602, 'grad_norm': 1.4977498054504395, 'learning_rate': 4.7438064342404724e-07, 'epoch': 2.71} +2025-02-06 06:25:42 - ERROR - stderr - 90%|█████████ | 20292/22434 [20:18:02<1:31:06, 2.55s/it] +2025-02-06 06:25:45 - ERROR - stderr - 90%|█████████ | 
20293/22434 [20:18:04<1:31:44, 2.57s/it] +2025-02-06 06:25:45 - ERROR - stderr - +2025-02-06 06:25:45 - ERROR - stderr - +2025-02-06 06:25:45 - INFO - stdout - {'loss': 0.3041, 'grad_norm': 1.39657461643219, 'learning_rate': 4.739413459208486e-07, 'epoch': 2.71} +2025-02-06 06:25:45 - ERROR - stderr - 90%|█████████ | 20293/22434 [20:18:04<1:31:44, 2.57s/it] +2025-02-06 06:25:47 - ERROR - stderr - 90%|█████████ | 20294/22434 [20:18:07<1:30:20, 2.53s/it] +2025-02-06 06:25:47 - ERROR - stderr - +2025-02-06 06:25:47 - ERROR - stderr - +2025-02-06 06:25:47 - INFO - stdout - {'loss': 0.3665, 'grad_norm': 1.677172303199768, 'learning_rate': 4.73502246979225e-07, 'epoch': 2.71} +2025-02-06 06:25:47 - ERROR - stderr - 90%|█████████ | 20294/22434 [20:18:07<1:30:20, 2.53s/it] +2025-02-06 06:25:49 - ERROR - stderr - 90%|█████████ | 20295/22434 [20:18:09<1:29:29, 2.51s/it] +2025-02-06 06:25:50 - ERROR - stderr - +2025-02-06 06:25:50 - ERROR - stderr - +2025-02-06 06:25:50 - INFO - stdout - {'loss': 0.3818, 'grad_norm': 1.6039676666259766, 'learning_rate': 4.730633466083312e-07, 'epoch': 2.71} +2025-02-06 06:25:50 - ERROR - stderr - 90%|█████████ | 20295/22434 [20:18:09<1:29:29, 2.51s/it] +2025-02-06 06:25:52 - ERROR - stderr - 90%|█████████ | 20296/22434 [20:18:12<1:29:23, 2.51s/it] +2025-02-06 06:25:52 - ERROR - stderr - +2025-02-06 06:25:52 - ERROR - stderr - +2025-02-06 06:25:52 - INFO - stdout - {'loss': 0.4, 'grad_norm': 1.692585825920105, 'learning_rate': 4.726246448173177e-07, 'epoch': 2.71} +2025-02-06 06:25:52 - ERROR - stderr - 90%|█████████ | 20296/22434 [20:18:12<1:29:23, 2.51s/it] +2025-02-06 06:25:55 - ERROR - stderr - 90%|█████████ | 20297/22434 [20:18:14<1:31:12, 2.56s/it] +2025-02-06 06:25:55 - ERROR - stderr - +2025-02-06 06:25:55 - ERROR - stderr - +2025-02-06 06:25:55 - INFO - stdout - {'loss': 0.3349, 'grad_norm': 1.66712486743927, 'learning_rate': 4.7218614161532505e-07, 'epoch': 2.71} +2025-02-06 06:25:55 - ERROR - stderr - 90%|█████████ | 20297/22434 
[20:18:14<1:31:12, 2.56s/it] +2025-02-06 06:25:57 - ERROR - stderr - 90%|█████████ | 20298/22434 [20:18:17<1:30:14, 2.53s/it] +2025-02-06 06:25:57 - ERROR - stderr - +2025-02-06 06:25:57 - ERROR - stderr - +2025-02-06 06:25:57 - INFO - stdout - {'loss': 0.4001, 'grad_norm': 1.6623013019561768, 'learning_rate': 4.7174783701149584e-07, 'epoch': 2.71} +2025-02-06 06:25:57 - ERROR - stderr - 90%|█████████ | 20298/22434 [20:18:17<1:30:14, 2.53s/it] +2025-02-06 06:26:00 - ERROR - stderr - 90%|█████████ | 20299/22434 [20:18:19<1:29:17, 2.51s/it] +2025-02-06 06:26:00 - ERROR - stderr - +2025-02-06 06:26:00 - ERROR - stderr - +2025-02-06 06:26:00 - INFO - stdout - {'loss': 0.3848, 'grad_norm': 1.654388189315796, 'learning_rate': 4.7130973101496504e-07, 'epoch': 2.71} +2025-02-06 06:26:00 - ERROR - stderr - 90%|█████████ | 20299/22434 [20:18:19<1:29:17, 2.51s/it] +2025-02-06 06:26:02 - ERROR - stderr - 90%|█████████ | 20300/22434 [20:18:22<1:29:20, 2.51s/it] +2025-02-06 06:26:02 - ERROR - stderr - +2025-02-06 06:26:02 - ERROR - stderr - +2025-02-06 06:26:02 - INFO - stdout - {'loss': 0.4027, 'grad_norm': 1.5681519508361816, 'learning_rate': 4.7087182363486525e-07, 'epoch': 2.71} +2025-02-06 06:26:02 - ERROR - stderr - 90%|█████████ | 20300/22434 [20:18:22<1:29:20, 2.51s/it] +2025-02-06 06:26:05 - ERROR - stderr - 90%|█████████ | 20301/22434 [20:18:24<1:28:40, 2.49s/it] +2025-02-06 06:26:05 - ERROR - stderr - +2025-02-06 06:26:05 - ERROR - stderr - +2025-02-06 06:26:05 - INFO - stdout - {'loss': 0.3603, 'grad_norm': 1.4763222932815552, 'learning_rate': 4.7043411488032373e-07, 'epoch': 2.71} +2025-02-06 06:26:05 - ERROR - stderr - 90%|█████████ | 20301/22434 [20:18:24<1:28:40, 2.49s/it] +2025-02-06 06:26:07 - ERROR - stderr - 90%|█████████ | 20302/22434 [20:18:27<1:28:02, 2.48s/it] +2025-02-06 06:26:07 - ERROR - stderr - +2025-02-06 06:26:07 - ERROR - stderr - +2025-02-06 06:26:07 - INFO - stdout - {'loss': 0.3391, 'grad_norm': 1.4896670579910278, 'learning_rate': 
4.699966047604643e-07, 'epoch': 2.71} +2025-02-06 06:26:07 - ERROR - stderr - 90%|█████████ | 20302/22434 [20:18:27<1:28:02, 2.48s/it] +2025-02-06 06:26:10 - ERROR - stderr - 91%|█████████ | 20303/22434 [20:18:29<1:28:37, 2.50s/it] +2025-02-06 06:26:10 - ERROR - stderr - +2025-02-06 06:26:10 - ERROR - stderr - +2025-02-06 06:26:10 - INFO - stdout - {'loss': 0.3529, 'grad_norm': 1.60325288772583, 'learning_rate': 4.695592932844073e-07, 'epoch': 2.72} +2025-02-06 06:26:10 - ERROR - stderr - 91%|█████████ | 20303/22434 [20:18:29<1:28:37, 2.50s/it] +2025-02-06 06:26:12 - ERROR - stderr - 91%|█████████ | 20304/22434 [20:18:32<1:29:13, 2.51s/it] +2025-02-06 06:26:12 - ERROR - stderr - +2025-02-06 06:26:12 - ERROR - stderr - +2025-02-06 06:26:12 - INFO - stdout - {'loss': 0.3272, 'grad_norm': 1.5308139324188232, 'learning_rate': 4.691221804612656e-07, 'epoch': 2.72} +2025-02-06 06:26:12 - ERROR - stderr - 91%|█████████ | 20304/22434 [20:18:32<1:29:13, 2.51s/it] +2025-02-06 06:26:15 - ERROR - stderr - 91%|█████████ | 20305/22434 [20:18:34<1:30:13, 2.54s/it] +2025-02-06 06:26:15 - ERROR - stderr - +2025-02-06 06:26:15 - ERROR - stderr - +2025-02-06 06:26:15 - INFO - stdout - {'loss': 0.3429, 'grad_norm': 1.3872236013412476, 'learning_rate': 4.68685266300154e-07, 'epoch': 2.72} +2025-02-06 06:26:15 - ERROR - stderr - 91%|█████████ | 20305/22434 [20:18:34<1:30:13, 2.54s/it] +2025-02-06 06:26:17 - ERROR - stderr - 91%|█████████ | 20306/22434 [20:18:37<1:29:11, 2.51s/it] +2025-02-06 06:26:17 - ERROR - stderr - +2025-02-06 06:26:17 - ERROR - stderr - +2025-02-06 06:26:17 - INFO - stdout - {'loss': 0.2858, 'grad_norm': 1.429917812347412, 'learning_rate': 4.6824855081017527e-07, 'epoch': 2.72} +2025-02-06 06:26:17 - ERROR - stderr - 91%|█████████ | 20306/22434 [20:18:37<1:29:11, 2.51s/it] +2025-02-06 06:26:20 - ERROR - stderr - 91%|█████████ | 20307/22434 [20:18:40<1:30:23, 2.55s/it] +2025-02-06 06:26:20 - ERROR - stderr - +2025-02-06 06:26:20 - ERROR - stderr - +2025-02-06
06:26:20 - INFO - stdout - {'loss': 0.3748, 'grad_norm': 1.5927925109863281, 'learning_rate': 4.678120340004355e-07, 'epoch': 2.72} +2025-02-06 06:26:20 - ERROR - stderr - 91%|█████████ | 20307/22434 [20:18:40<1:30:23, 2.55s/it] +2025-02-06 06:26:22 - ERROR - stderr - 91%|█████████ | 20308/22434 [20:18:42<1:29:37, 2.53s/it] +2025-02-06 06:26:22 - ERROR - stderr - +2025-02-06 06:26:22 - ERROR - stderr - +2025-02-06 06:26:22 - INFO - stdout - {'loss': 0.3595, 'grad_norm': 1.5316531658172607, 'learning_rate': 4.6737571588003294e-07, 'epoch': 2.72} +2025-02-06 06:26:22 - ERROR - stderr - 91%|█████████ | 20308/22434 [20:18:42<1:29:37, 2.53s/it] +2025-02-06 06:26:25 - ERROR - stderr - 91%|█████████ | 20309/22434 [20:18:44<1:28:59, 2.51s/it] +2025-02-06 06:26:25 - ERROR - stderr - +2025-02-06 06:26:25 - ERROR - stderr - +2025-02-06 06:26:25 - INFO - stdout - {'loss': 0.3981, 'grad_norm': 1.6830354928970337, 'learning_rate': 4.6693959645806143e-07, 'epoch': 2.72} +2025-02-06 06:26:25 - ERROR - stderr - 91%|█████████ | 20309/22434 [20:18:45<1:28:59, 2.51s/it] +2025-02-06 06:26:27 - ERROR - stderr - 91%|█████████ | 20310/22434 [20:18:47<1:28:26, 2.50s/it] +2025-02-06 06:26:27 - ERROR - stderr - +2025-02-06 06:26:27 - ERROR - stderr - +2025-02-06 06:26:27 - INFO - stdout - {'loss': 0.3533, 'grad_norm': 1.564633846282959, 'learning_rate': 4.6650367574361366e-07, 'epoch': 2.72} +2025-02-06 06:26:27 - ERROR - stderr - 91%|█████████ | 20310/22434 [20:18:47<1:28:26, 2.50s/it] +2025-02-06 06:26:30 - ERROR - stderr - 91%|█████████ | 20311/22434 [20:18:49<1:27:52, 2.48s/it] +2025-02-06 06:26:30 - ERROR - stderr - +2025-02-06 06:26:30 - ERROR - stderr - +2025-02-06 06:26:30 - INFO - stdout - {'loss': 0.3917, 'grad_norm': 1.5637178421020508, 'learning_rate': 4.660679537457713e-07, 'epoch': 2.72} +2025-02-06 06:26:30 - ERROR - stderr - 91%|█████████ | 20311/22434 [20:18:49<1:27:52, 2.48s/it] +2025-02-06 06:26:32 - ERROR - stderr - 91%|█████████ | 20312/22434 [20:18:52<1:27:58, 2.49s/it] 
+2025-02-06 06:26:32 - ERROR - stderr - +2025-02-06 06:26:32 - ERROR - stderr - +2025-02-06 06:26:32 - INFO - stdout - {'loss': 0.3436, 'grad_norm': 1.5787701606750488, 'learning_rate': 4.656324304736215e-07, 'epoch': 2.72} +2025-02-06 06:26:32 - ERROR - stderr - 91%|█████████ | 20312/22434 [20:18:52<1:27:58, 2.49s/it] +2025-02-06 06:26:35 - ERROR - stderr - 91%|█████████ | 20313/22434 [20:18:54<1:28:39, 2.51s/it] +2025-02-06 06:26:35 - ERROR - stderr - +2025-02-06 06:26:35 - ERROR - stderr - +2025-02-06 06:26:35 - INFO - stdout - {'loss': 0.3182, 'grad_norm': 1.4261279106140137, 'learning_rate': 4.651971059362381e-07, 'epoch': 2.72} +2025-02-06 06:26:35 - ERROR - stderr - 91%|█████████ | 20313/22434 [20:18:54<1:28:39, 2.51s/it] +2025-02-06 06:26:37 - ERROR - stderr - 91%|█████████ | 20314/22434 [20:18:57<1:29:56, 2.55s/it] +2025-02-06 06:26:37 - ERROR - stderr - +2025-02-06 06:26:37 - ERROR - stderr - +2025-02-06 06:26:37 - INFO - stdout - {'loss': 0.341, 'grad_norm': 1.439518928527832, 'learning_rate': 4.6476198014269945e-07, 'epoch': 2.72} +2025-02-06 06:26:37 - ERROR - stderr - 91%|█████████ | 20314/22434 [20:18:57<1:29:56, 2.55s/it] +2025-02-06 06:26:40 - ERROR - stderr - 91%|█████████ | 20315/22434 [20:19:00<1:28:49, 2.52s/it] +2025-02-06 06:26:40 - ERROR - stderr - +2025-02-06 06:26:40 - ERROR - stderr - +2025-02-06 06:26:40 - INFO - stdout - {'loss': 0.3311, 'grad_norm': 1.4222633838653564, 'learning_rate': 4.643270531020738e-07, 'epoch': 2.72} +2025-02-06 06:26:40 - ERROR - stderr - 91%|█████████ | 20315/22434 [20:19:00<1:28:49, 2.52s/it] +2025-02-06 06:26:42 - ERROR - stderr - 91%|█████████ | 20316/22434 [20:19:02<1:28:23, 2.50s/it] +2025-02-06 06:26:42 - ERROR - stderr - +2025-02-06 06:26:42 - ERROR - stderr - +2025-02-06 06:26:42 - INFO - stdout - {'loss': 0.3315, 'grad_norm': 1.4127367734909058, 'learning_rate': 4.638923248234228e-07, 'epoch': 2.72} +2025-02-06 06:26:42 - ERROR - stderr - 91%|█████████ | 20316/22434 [20:19:02<1:28:23, 2.50s/it] 
+2025-02-06 06:26:45 - ERROR - stderr - 91%|█████████ | 20317/22434 [20:19:05<1:28:26, 2.51s/it] +2025-02-06 06:26:45 - ERROR - stderr - +2025-02-06 06:26:45 - ERROR - stderr - +2025-02-06 06:26:45 - INFO - stdout - {'loss': 0.3444, 'grad_norm': 1.5622849464416504, 'learning_rate': 4.634577953158137e-07, 'epoch': 2.72} +2025-02-06 06:26:45 - ERROR - stderr - 91%|█████████ | 20317/22434 [20:19:05<1:28:26, 2.51s/it] +2025-02-06 06:26:47 - ERROR - stderr - 91%|█████████ | 20318/22434 [20:19:07<1:28:22, 2.51s/it] +2025-02-06 06:26:47 - ERROR - stderr - +2025-02-06 06:26:47 - ERROR - stderr - +2025-02-06 06:26:47 - INFO - stdout - {'loss': 0.3575, 'grad_norm': 1.5244227647781372, 'learning_rate': 4.630234645883014e-07, 'epoch': 2.72} +2025-02-06 06:26:47 - ERROR - stderr - 91%|█████████ | 20318/22434 [20:19:07<1:28:22, 2.51s/it] +2025-02-06 06:26:50 - ERROR - stderr - 91%|█████████ | 20319/22434 [20:19:10<1:28:07, 2.50s/it] +2025-02-06 06:26:50 - ERROR - stderr - +2025-02-06 06:26:50 - ERROR - stderr - +2025-02-06 06:26:50 - INFO - stdout - {'loss': 0.4158, 'grad_norm': 1.6528339385986328, 'learning_rate': 4.625893326499387e-07, 'epoch': 2.72} +2025-02-06 06:26:50 - ERROR - stderr - 91%|█████████ | 20319/22434 [20:19:10<1:28:07, 2.50s/it] +2025-02-06 06:26:52 - ERROR - stderr - 91%|█████████ | 20320/22434 [20:19:12<1:28:03, 2.50s/it] +2025-02-06 06:26:52 - ERROR - stderr - +2025-02-06 06:26:52 - ERROR - stderr - +2025-02-06 06:26:52 - INFO - stdout - {'loss': 0.4261, 'grad_norm': 1.7539079189300537, 'learning_rate': 4.6215539950977385e-07, 'epoch': 2.72} +2025-02-06 06:26:52 - ERROR - stderr - 91%|█████████ | 20320/22434 [20:19:12<1:28:03, 2.50s/it] +2025-02-06 06:26:55 - ERROR - stderr - 91%|█████████ | 20321/22434 [20:19:14<1:27:31, 2.49s/it] +2025-02-06 06:26:55 - ERROR - stderr - +2025-02-06 06:26:55 - ERROR - stderr - +2025-02-06 06:26:55 - INFO - stdout - {'loss': 0.3429, 'grad_norm': 1.4734458923339844, 'learning_rate': 4.617216651768541e-07, 'epoch': 2.72} 
+2025-02-06 06:26:55 - ERROR - stderr - 91%|█████████ | 20321/22434 [20:19:15<1:27:31, 2.49s/it] +2025-02-06 06:26:57 - ERROR - stderr - 91%|█████████ | 20322/22434 [20:19:17<1:27:29, 2.49s/it] +2025-02-06 06:26:57 - ERROR - stderr - +2025-02-06 06:26:57 - ERROR - stderr - +2025-02-06 06:26:57 - INFO - stdout - {'loss': 0.4042, 'grad_norm': 1.6417827606201172, 'learning_rate': 4.6128812966021894e-07, 'epoch': 2.72} +2025-02-06 06:26:57 - ERROR - stderr - 91%|█████████ | 20322/22434 [20:19:17<1:27:29, 2.49s/it] +2025-02-06 06:27:00 - ERROR - stderr - 91%|█████████ | 20323/22434 [20:19:20<1:33:36, 2.66s/it] +2025-02-06 06:27:00 - ERROR - stderr - +2025-02-06 06:27:00 - ERROR - stderr - +2025-02-06 06:27:00 - INFO - stdout - {'loss': 0.3434, 'grad_norm': 1.5569766759872437, 'learning_rate': 4.6085479296890444e-07, 'epoch': 2.72} +2025-02-06 06:27:00 - ERROR - stderr - 91%|█████████ | 20323/22434 [20:19:20<1:33:36, 2.66s/it] +2025-02-06 06:27:03 - ERROR - stderr - 91%|█████████ | 20324/22434 [20:19:22<1:31:41, 2.61s/it] +2025-02-06 06:27:03 - ERROR - stderr - +2025-02-06 06:27:03 - ERROR - stderr - +2025-02-06 06:27:03 - INFO - stdout - {'loss': 0.3868, 'grad_norm': 1.6486555337905884, 'learning_rate': 4.6042165511194447e-07, 'epoch': 2.72} +2025-02-06 06:27:03 - ERROR - stderr - 91%|█████████ | 20324/22434 [20:19:23<1:31:41, 2.61s/it] +2025-02-06 06:27:05 - ERROR - stderr - 91%|█████████ | 20325/22434 [20:19:25<1:30:38, 2.58s/it] +2025-02-06 06:27:05 - ERROR - stderr - +2025-02-06 06:27:05 - ERROR - stderr - +2025-02-06 06:27:05 - INFO - stdout - {'loss': 0.35, 'grad_norm': 1.5388462543487549, 'learning_rate': 4.599887160983674e-07, 'epoch': 2.72} +2025-02-06 06:27:05 - ERROR - stderr - 91%|█████████ | 20325/22434 [20:19:25<1:30:38, 2.58s/it] +2025-02-06 06:27:08 - ERROR - stderr - 91%|█████████ | 20326/22434 [20:19:28<1:32:02, 2.62s/it] +2025-02-06 06:27:08 - ERROR - stderr - +2025-02-06 06:27:08 - ERROR - stderr - +2025-02-06 06:27:08 - INFO - stdout - {'loss': 
0.3692, 'grad_norm': 1.4521822929382324, 'learning_rate': 4.5955597593719593e-07, 'epoch': 2.72} +2025-02-06 06:27:08 - ERROR - stderr - 91%|█████████ | 20326/22434 [20:19:28<1:32:02, 2.62s/it] +2025-02-06 06:27:11 - ERROR - stderr - 91%|█████████ | 20327/22434 [20:19:30<1:31:28, 2.60s/it] +2025-02-06 06:27:11 - ERROR - stderr - +2025-02-06 06:27:11 - ERROR - stderr - +2025-02-06 06:27:11 - INFO - stdout - {'loss': 0.4006, 'grad_norm': 1.8240342140197754, 'learning_rate': 4.591234346374507e-07, 'epoch': 2.72} +2025-02-06 06:27:11 - ERROR - stderr - 91%|█████████ | 20327/22434 [20:19:30<1:31:28, 2.60s/it] +2025-02-06 06:27:13 - ERROR - stderr - 91%|█████████ | 20328/22434 [20:19:33<1:30:04, 2.57s/it] +2025-02-06 06:27:13 - ERROR - stderr - +2025-02-06 06:27:13 - ERROR - stderr - +2025-02-06 06:27:13 - INFO - stdout - {'loss': 0.4182, 'grad_norm': 1.761733889579773, 'learning_rate': 4.586910922081478e-07, 'epoch': 2.72} +2025-02-06 06:27:13 - ERROR - stderr - 91%|█████████ | 20328/22434 [20:19:33<1:30:04, 2.57s/it] +2025-02-06 06:27:16 - ERROR - stderr - 91%|█████████ | 20329/22434 [20:19:35<1:30:41, 2.58s/it] +2025-02-06 06:27:16 - ERROR - stderr - +2025-02-06 06:27:16 - ERROR - stderr - +2025-02-06 06:27:16 - INFO - stdout - {'loss': 0.3338, 'grad_norm': 1.7469160556793213, 'learning_rate': 4.582589486583e-07, 'epoch': 2.72} +2025-02-06 06:27:16 - ERROR - stderr - 91%|█████████ | 20329/22434 [20:19:35<1:30:41, 2.58s/it] +2025-02-06 06:27:18 - ERROR - stderr - 91%|█████████ | 20330/22434 [20:19:38<1:31:56, 2.62s/it] +2025-02-06 06:27:18 - ERROR - stderr - +2025-02-06 06:27:18 - ERROR - stderr - +2025-02-06 06:27:18 - INFO - stdout - {'loss': 0.3485, 'grad_norm': 1.534040927886963, 'learning_rate': 4.5782700399691347e-07, 'epoch': 2.72} +2025-02-06 06:27:18 - ERROR - stderr - 91%|█████████ | 20330/22434 [20:19:38<1:31:56, 2.62s/it] +2025-02-06 06:27:21 - ERROR - stderr - 91%|█████████ | 20331/22434 [20:19:41<1:33:44, 2.67s/it] +2025-02-06 06:27:21 - ERROR - stderr - 
+2025-02-06 06:27:21 - ERROR - stderr - +2025-02-06 06:27:21 - INFO - stdout - {'loss': 0.3563, 'grad_norm': 1.5893938541412354, 'learning_rate': 4.5739525823299326e-07, 'epoch': 2.72} +2025-02-06 06:27:21 - ERROR - stderr - 91%|█████████ | 20331/22434 [20:19:41<1:33:44, 2.67s/it] +2025-02-06 06:27:24 - ERROR - stderr - 91%|█████████ | 20332/22434 [20:19:43<1:31:32, 2.61s/it] +2025-02-06 06:27:24 - ERROR - stderr - +2025-02-06 06:27:24 - ERROR - stderr - +2025-02-06 06:27:24 - INFO - stdout - {'loss': 0.3868, 'grad_norm': 1.7124924659729004, 'learning_rate': 4.569637113755343e-07, 'epoch': 2.72} +2025-02-06 06:27:24 - ERROR - stderr - 91%|█████████ | 20332/22434 [20:19:43<1:31:32, 2.61s/it] +2025-02-06 06:27:26 - ERROR - stderr - 91%|█████████ | 20333/22434 [20:19:46<1:30:02, 2.57s/it] +2025-02-06 06:27:26 - ERROR - stderr - +2025-02-06 06:27:26 - ERROR - stderr - +2025-02-06 06:27:26 - INFO - stdout - {'loss': 0.3638, 'grad_norm': 1.6995853185653687, 'learning_rate': 4.5653236343353727e-07, 'epoch': 2.72} +2025-02-06 06:27:26 - ERROR - stderr - 91%|█████████ | 20333/22434 [20:19:46<1:30:02, 2.57s/it] +2025-02-06 06:27:29 - ERROR - stderr - 91%|█████████ | 20334/22434 [20:19:48<1:29:53, 2.57s/it] +2025-02-06 06:27:29 - ERROR - stderr - +2025-02-06 06:27:29 - ERROR - stderr - +2025-02-06 06:27:29 - INFO - stdout - {'loss': 0.3761, 'grad_norm': 1.4726452827453613, 'learning_rate': 4.561012144159926e-07, 'epoch': 2.72} +2025-02-06 06:27:29 - ERROR - stderr - 91%|█████████ | 20334/22434 [20:19:48<1:29:53, 2.57s/it] +2025-02-06 06:27:31 - ERROR - stderr - 91%|█████████ | 20335/22434 [20:19:51<1:28:59, 2.54s/it] +2025-02-06 06:27:31 - ERROR - stderr - +2025-02-06 06:27:31 - ERROR - stderr - +2025-02-06 06:27:31 - INFO - stdout - {'loss': 0.4208, 'grad_norm': 1.694965124130249, 'learning_rate': 4.5567026433188223e-07, 'epoch': 2.72} +2025-02-06 06:27:31 - ERROR - stderr - 91%|█████████ | 20335/22434 [20:19:51<1:28:59, 2.54s/it] +2025-02-06 06:27:34 - ERROR - stderr - 
91%|█████████ | 20336/22434 [20:19:53<1:28:34, 2.53s/it] +2025-02-06 06:27:34 - ERROR - stderr - +2025-02-06 06:27:34 - ERROR - stderr - +2025-02-06 06:27:34 - INFO - stdout - {'loss': 0.3369, 'grad_norm': 1.4601316452026367, 'learning_rate': 4.5523951319019545e-07, 'epoch': 2.72} +2025-02-06 06:27:34 - ERROR - stderr - 91%|█████████ | 20336/22434 [20:19:53<1:28:34, 2.53s/it] +2025-02-06 06:27:36 - ERROR - stderr - 91%|█████████ | 20337/22434 [20:19:56<1:27:44, 2.51s/it] +2025-02-06 06:27:36 - ERROR - stderr - +2025-02-06 06:27:36 - ERROR - stderr - +2025-02-06 06:27:36 - INFO - stdout - {'loss': 0.3355, 'grad_norm': 1.58147132396698, 'learning_rate': 4.548089609999051e-07, 'epoch': 2.72} +2025-02-06 06:27:36 - ERROR - stderr - 91%|█████████ | 20337/22434 [20:19:56<1:27:44, 2.51s/it] +2025-02-06 06:27:39 - ERROR - stderr - 91%|█████████ | 20338/22434 [20:19:58<1:27:46, 2.51s/it] +2025-02-06 06:27:39 - ERROR - stderr - +2025-02-06 06:27:39 - ERROR - stderr - +2025-02-06 06:27:39 - INFO - stdout - {'loss': 0.4036, 'grad_norm': 1.5607597827911377, 'learning_rate': 4.5437860776999075e-07, 'epoch': 2.72} +2025-02-06 06:27:39 - ERROR - stderr - 91%|█████████ | 20338/22434 [20:19:58<1:27:46, 2.51s/it] +2025-02-06 06:27:41 - ERROR - stderr - 91%|█████████ | 20339/22434 [20:20:01<1:27:50, 2.52s/it] +2025-02-06 06:27:41 - ERROR - stderr - +2025-02-06 06:27:41 - ERROR - stderr - +2025-02-06 06:27:41 - INFO - stdout - {'loss': 0.3235, 'grad_norm': 1.6427724361419678, 'learning_rate': 4.5394845350941854e-07, 'epoch': 2.72} +2025-02-06 06:27:41 - ERROR - stderr - 91%|█████████ | 20339/22434 [20:20:01<1:27:50, 2.52s/it] +2025-02-06 06:27:44 - ERROR - stderr - 91%|█████████ | 20340/22434 [20:20:04<1:31:54, 2.63s/it] +2025-02-06 06:27:44 - ERROR - stderr - +2025-02-06 06:27:44 - ERROR - stderr - +2025-02-06 06:27:44 - INFO - stdout - {'loss': 0.357, 'grad_norm': 1.4942275285720825, 'learning_rate': 4.5351849822715566e-07, 'epoch': 2.72} +2025-02-06 06:27:44 - ERROR - stderr - 
91%|█████████ | 20340/22434 [20:20:04<1:31:54, 2.63s/it] +2025-02-06 06:27:47 - ERROR - stderr - 91%|█████████ | 20341/22434 [20:20:06<1:30:20, 2.59s/it] +2025-02-06 06:27:47 - ERROR - stderr - +2025-02-06 06:27:47 - ERROR - stderr - +2025-02-06 06:27:47 - INFO - stdout - {'loss': 0.4041, 'grad_norm': 1.5920283794403076, 'learning_rate': 4.5308874193216614e-07, 'epoch': 2.72} +2025-02-06 06:27:47 - ERROR - stderr - 91%|█████████ | 20341/22434 [20:20:06<1:30:20, 2.59s/it] +2025-02-06 06:27:49 - ERROR - stderr - 91%|█████████ | 20342/22434 [20:20:09<1:29:52, 2.58s/it] +2025-02-06 06:27:49 - ERROR - stderr - +2025-02-06 06:27:49 - ERROR - stderr - +2025-02-06 06:27:49 - INFO - stdout - {'loss': 0.3311, 'grad_norm': 1.2896345853805542, 'learning_rate': 4.52659184633405e-07, 'epoch': 2.72} +2025-02-06 06:27:49 - ERROR - stderr - 91%|█████████ | 20342/22434 [20:20:09<1:29:52, 2.58s/it] +2025-02-06 06:27:52 - ERROR - stderr - 91%|█████████ | 20343/22434 [20:20:11<1:29:08, 2.56s/it] +2025-02-06 06:27:52 - ERROR - stderr - +2025-02-06 06:27:52 - ERROR - stderr - +2025-02-06 06:27:52 - INFO - stdout - {'loss': 0.339, 'grad_norm': 1.4886295795440674, 'learning_rate': 4.5222982633982837e-07, 'epoch': 2.72} +2025-02-06 06:27:52 - ERROR - stderr - 91%|█████████ | 20343/22434 [20:20:11<1:29:08, 2.56s/it] +2025-02-06 06:27:54 - ERROR - stderr - 91%|█████████ | 20344/22434 [20:20:14<1:27:54, 2.52s/it] +2025-02-06 06:27:54 - ERROR - stderr - +2025-02-06 06:27:54 - ERROR - stderr - +2025-02-06 06:27:54 - INFO - stdout - {'loss': 0.3452, 'grad_norm': 1.9110654592514038, 'learning_rate': 4.518006670603847e-07, 'epoch': 2.72} +2025-02-06 06:27:54 - ERROR - stderr - 91%|█████████ | 20344/22434 [20:20:14<1:27:54, 2.52s/it] +2025-02-06 06:27:57 - ERROR - stderr - 91%|█████████ | 20345/22434 [20:20:16<1:27:21, 2.51s/it] +2025-02-06 06:27:57 - ERROR - stderr - +2025-02-06 06:27:57 - ERROR - stderr - +2025-02-06 06:27:57 - INFO - stdout - {'loss': 0.3314, 'grad_norm': 1.495684027671814, 
'learning_rate': 4.5137170680401907e-07, 'epoch': 2.72} +2025-02-06 06:27:57 - ERROR - stderr - 91%|█████████ | 20345/22434 [20:20:16<1:27:21, 2.51s/it] +2025-02-06 06:27:59 - ERROR - stderr - 91%|█████████ | 20346/22434 [20:20:19<1:26:56, 2.50s/it] +2025-02-06 06:27:59 - ERROR - stderr - +2025-02-06 06:27:59 - ERROR - stderr - +2025-02-06 06:27:59 - INFO - stdout - {'loss': 0.3017, 'grad_norm': 1.4094411134719849, 'learning_rate': 4.509429455796732e-07, 'epoch': 2.72} +2025-02-06 06:27:59 - ERROR - stderr - 91%|█████████ | 20346/22434 [20:20:19<1:26:56, 2.50s/it] +2025-02-06 06:28:02 - ERROR - stderr - 91%|█████████ | 20347/22434 [20:20:21<1:27:11, 2.51s/it] +2025-02-06 06:28:02 - ERROR - stderr - +2025-02-06 06:28:02 - ERROR - stderr - +2025-02-06 06:28:02 - INFO - stdout - {'loss': 0.339, 'grad_norm': 1.53178071975708, 'learning_rate': 4.505143833962844e-07, 'epoch': 2.72} +2025-02-06 06:28:02 - ERROR - stderr - 91%|█████████ | 20347/22434 [20:20:21<1:27:11, 2.51s/it] +2025-02-06 06:28:04 - ERROR - stderr - 91%|█████████ | 20348/22434 [20:20:24<1:26:52, 2.50s/it] +2025-02-06 06:28:04 - ERROR - stderr - +2025-02-06 06:28:04 - ERROR - stderr - +2025-02-06 06:28:04 - INFO - stdout - {'loss': 0.3417, 'grad_norm': 1.7276862859725952, 'learning_rate': 4.5008602026278545e-07, 'epoch': 2.72} +2025-02-06 06:28:04 - ERROR - stderr - 91%|█████████ | 20348/22434 [20:20:24<1:26:52, 2.50s/it] +2025-02-06 06:28:07 - ERROR - stderr - 91%|█████████ | 20349/22434 [20:20:27<1:30:55, 2.62s/it] +2025-02-06 06:28:07 - ERROR - stderr - +2025-02-06 06:28:07 - ERROR - stderr - +2025-02-06 06:28:07 - INFO - stdout - {'loss': 0.3662, 'grad_norm': 1.6290594339370728, 'learning_rate': 4.4965785618810486e-07, 'epoch': 2.72} +2025-02-06 06:28:07 - ERROR - stderr - 91%|█████████ | 20349/22434 [20:20:27<1:30:55, 2.62s/it] +2025-02-06 06:28:09 - ERROR - stderr - 91%|█████████ | 20350/22434 [20:20:29<1:30:00, 2.59s/it] +2025-02-06 06:28:09 - ERROR - stderr - +2025-02-06 06:28:09 - ERROR - stderr 
- +2025-02-06 06:28:09 - INFO - stdout - {'loss': 0.3028, 'grad_norm': 1.445056676864624, 'learning_rate': 4.492298911811688e-07, 'epoch': 2.72} +2025-02-06 06:28:09 - ERROR - stderr - 91%|█████████ | 20350/22434 [20:20:29<1:30:00, 2.59s/it] +2025-02-06 06:28:12 - ERROR - stderr - 91%|█████████ | 20351/22434 [20:20:32<1:29:01, 2.56s/it] +2025-02-06 06:28:12 - ERROR - stderr - +2025-02-06 06:28:12 - ERROR - stderr - +2025-02-06 06:28:12 - INFO - stdout - {'loss': 0.3361, 'grad_norm': 1.6794085502624512, 'learning_rate': 4.488021252508945e-07, 'epoch': 2.72} +2025-02-06 06:28:12 - ERROR - stderr - 91%|█████████ | 20351/22434 [20:20:32<1:29:01, 2.56s/it] +2025-02-06 06:28:14 - ERROR - stderr - 91%|█████████ | 20352/22434 [20:20:34<1:28:18, 2.54s/it] +2025-02-06 06:28:14 - ERROR - stderr - +2025-02-06 06:28:14 - ERROR - stderr - +2025-02-06 06:28:14 - INFO - stdout - {'loss': 0.3549, 'grad_norm': 1.563234567642212, 'learning_rate': 4.483745584062005e-07, 'epoch': 2.72} +2025-02-06 06:28:14 - ERROR - stderr - 91%|█████████ | 20352/22434 [20:20:34<1:28:18, 2.54s/it] +2025-02-06 06:28:17 - ERROR - stderr - 91%|█████████ | 20353/22434 [20:20:37<1:29:43, 2.59s/it] +2025-02-06 06:28:17 - ERROR - stderr - +2025-02-06 06:28:17 - ERROR - stderr - +2025-02-06 06:28:17 - INFO - stdout - {'loss': 0.3517, 'grad_norm': 1.4091428518295288, 'learning_rate': 4.4794719065599955e-07, 'epoch': 2.72} +2025-02-06 06:28:17 - ERROR - stderr - 91%|█████████ | 20353/22434 [20:20:37<1:29:43, 2.59s/it] +2025-02-06 06:28:20 - ERROR - stderr - 91%|█████████ | 20354/22434 [20:20:39<1:28:47, 2.56s/it] +2025-02-06 06:28:20 - ERROR - stderr - +2025-02-06 06:28:20 - ERROR - stderr - +2025-02-06 06:28:20 - INFO - stdout - {'loss': 0.3375, 'grad_norm': 1.4657106399536133, 'learning_rate': 4.475200220092002e-07, 'epoch': 2.72} +2025-02-06 06:28:20 - ERROR - stderr - 91%|█████████ | 20354/22434 [20:20:39<1:28:47, 2.56s/it] +2025-02-06 06:28:22 - ERROR - stderr - 91%|█████████ | 20355/22434 
[20:20:42<1:27:45, 2.53s/it] +2025-02-06 06:28:22 - ERROR - stderr - +2025-02-06 06:28:22 - ERROR - stderr - +2025-02-06 06:28:22 - INFO - stdout - {'loss': 0.3512, 'grad_norm': 1.6407135725021362, 'learning_rate': 4.4709305247470524e-07, 'epoch': 2.72} +2025-02-06 06:28:22 - ERROR - stderr - 91%|█████████ | 20355/22434 [20:20:42<1:27:45, 2.53s/it] +2025-02-06 06:28:25 - ERROR - stderr - 91%|█████████ | 20356/22434 [20:20:44<1:27:34, 2.53s/it] +2025-02-06 06:28:25 - ERROR - stderr - +2025-02-06 06:28:25 - ERROR - stderr - +2025-02-06 06:28:25 - INFO - stdout - {'loss': 0.3283, 'grad_norm': 1.5972973108291626, 'learning_rate': 4.4666628206141203e-07, 'epoch': 2.72} +2025-02-06 06:28:25 - ERROR - stderr - 91%|█████████ | 20356/22434 [20:20:44<1:27:34, 2.53s/it] +2025-02-06 06:28:27 - ERROR - stderr - 91%|█████████ | 20357/22434 [20:20:47<1:26:58, 2.51s/it] +2025-02-06 06:28:27 - ERROR - stderr - +2025-02-06 06:28:27 - ERROR - stderr - +2025-02-06 06:28:27 - INFO - stdout - {'loss': 0.3481, 'grad_norm': 1.5466053485870361, 'learning_rate': 4.4623971077822127e-07, 'epoch': 2.72} +2025-02-06 06:28:27 - ERROR - stderr - 91%|█████████ | 20357/22434 [20:20:47<1:26:58, 2.51s/it] +2025-02-06 06:28:30 - ERROR - stderr - 91%|█████████ | 20358/22434 [20:20:49<1:26:59, 2.51s/it] +2025-02-06 06:28:30 - ERROR - stderr - +2025-02-06 06:28:30 - ERROR - stderr - +2025-02-06 06:28:30 - INFO - stdout - {'loss': 0.3423, 'grad_norm': 1.6096493005752563, 'learning_rate': 4.4581333863402134e-07, 'epoch': 2.72} +2025-02-06 06:28:30 - ERROR - stderr - 91%|█████████ | 20358/22434 [20:20:49<1:26:59, 2.51s/it] +2025-02-06 06:28:32 - ERROR - stderr - 91%|█████████ | 20359/22434 [20:20:52<1:26:09, 2.49s/it] +2025-02-06 06:28:32 - ERROR - stderr - +2025-02-06 06:28:32 - ERROR - stderr - +2025-02-06 06:28:32 - INFO - stdout - {'loss': 0.3799, 'grad_norm': 1.6768133640289307, 'learning_rate': 4.453871656376996e-07, 'epoch': 2.72} +2025-02-06 06:28:32 - ERROR - stderr - 91%|█████████ | 20359/22434 
[20:20:52<1:26:09, 2.49s/it] +2025-02-06 06:28:34 - ERROR - stderr - 91%|█████████ | 20360/22434 [20:20:54<1:25:57, 2.49s/it] +2025-02-06 06:28:35 - ERROR - stderr - +2025-02-06 06:28:35 - ERROR - stderr - +2025-02-06 06:28:35 - INFO - stdout - {'loss': 0.4451, 'grad_norm': 1.8329030275344849, 'learning_rate': 4.449611917981389e-07, 'epoch': 2.72} +2025-02-06 06:28:35 - ERROR - stderr - 91%|█████████ | 20360/22434 [20:20:54<1:25:57, 2.49s/it] +2025-02-06 06:28:37 - ERROR - stderr - 91%|█████████ | 20361/22434 [20:20:57<1:25:54, 2.49s/it] +2025-02-06 06:28:37 - ERROR - stderr - +2025-02-06 06:28:37 - ERROR - stderr - +2025-02-06 06:28:37 - INFO - stdout - {'loss': 0.3201, 'grad_norm': 1.519812822341919, 'learning_rate': 4.445354171242178e-07, 'epoch': 2.72} +2025-02-06 06:28:37 - ERROR - stderr - 91%|█████████ | 20361/22434 [20:20:57<1:25:54, 2.49s/it] +2025-02-06 06:28:39 - ERROR - stderr - 91%|█████████ | 20362/22434 [20:20:59<1:25:20, 2.47s/it] +2025-02-06 06:28:39 - ERROR - stderr - +2025-02-06 06:28:39 - ERROR - stderr - +2025-02-06 06:28:39 - INFO - stdout - {'loss': 0.3125, 'grad_norm': 1.4075103998184204, 'learning_rate': 4.4410984162481574e-07, 'epoch': 2.72} +2025-02-06 06:28:39 - ERROR - stderr - 91%|█████████ | 20362/22434 [20:20:59<1:25:20, 2.47s/it] +2025-02-06 06:28:42 - ERROR - stderr - 91%|█████████ | 20363/22434 [20:21:02<1:25:10, 2.47s/it] +2025-02-06 06:28:42 - ERROR - stderr - +2025-02-06 06:28:42 - ERROR - stderr - +2025-02-06 06:28:42 - INFO - stdout - {'loss': 0.327, 'grad_norm': 1.3142962455749512, 'learning_rate': 4.4368446530879794e-07, 'epoch': 2.72} +2025-02-06 06:28:42 - ERROR - stderr - 91%|█████████ | 20363/22434 [20:21:02<1:25:10, 2.47s/it] +2025-02-06 06:28:44 - ERROR - stderr - 91%|█████████ | 20364/22434 [20:21:04<1:25:44, 2.49s/it] +2025-02-06 06:28:44 - ERROR - stderr - +2025-02-06 06:28:44 - ERROR - stderr - +2025-02-06 06:28:44 - INFO - stdout - {'loss': 0.3191, 'grad_norm': 1.4906506538391113, 'learning_rate': 
4.4325928818503395e-07, 'epoch': 2.72} +2025-02-06 06:28:44 - ERROR - stderr - 91%|█████████ | 20364/22434 [20:21:04<1:25:44, 2.49s/it] +2025-02-06 06:28:47 - ERROR - stderr - 91%|█████████ | 20365/22434 [20:21:07<1:30:55, 2.64s/it] +2025-02-06 06:28:47 - ERROR - stderr - +2025-02-06 06:28:47 - ERROR - stderr - +2025-02-06 06:28:47 - INFO - stdout - {'loss': 0.3833, 'grad_norm': 1.4396533966064453, 'learning_rate': 4.4283431026238446e-07, 'epoch': 2.72} +2025-02-06 06:28:47 - ERROR - stderr - 91%|█████████ | 20365/22434 [20:21:07<1:30:55, 2.64s/it] +2025-02-06 06:28:50 - ERROR - stderr - 91%|█████████ | 20366/22434 [20:21:10<1:29:10, 2.59s/it] +2025-02-06 06:28:50 - ERROR - stderr - +2025-02-06 06:28:50 - ERROR - stderr - +2025-02-06 06:28:50 - INFO - stdout - {'loss': 0.3549, 'grad_norm': 1.3519947528839111, 'learning_rate': 4.42409531549709e-07, 'epoch': 2.72} +2025-02-06 06:28:50 - ERROR - stderr - 91%|█████████ | 20366/22434 [20:21:10<1:29:10, 2.59s/it] +2025-02-06 06:28:52 - ERROR - stderr - 91%|█████████ | 20367/22434 [20:21:12<1:28:41, 2.57s/it] +2025-02-06 06:28:52 - ERROR - stderr - +2025-02-06 06:28:52 - ERROR - stderr - +2025-02-06 06:28:52 - INFO - stdout - {'loss': 0.4143, 'grad_norm': 1.5930904150009155, 'learning_rate': 4.4198495205586056e-07, 'epoch': 2.72} +2025-02-06 06:28:52 - ERROR - stderr - 91%|█████████ | 20367/22434 [20:21:12<1:28:41, 2.57s/it] +2025-02-06 06:28:55 - ERROR - stderr - 91%|█████████ | 20368/22434 [20:21:15<1:27:54, 2.55s/it] +2025-02-06 06:28:55 - ERROR - stderr - +2025-02-06 06:28:55 - ERROR - stderr - +2025-02-06 06:28:55 - INFO - stdout - {'loss': 0.3737, 'grad_norm': 1.5092682838439941, 'learning_rate': 4.415605717896898e-07, 'epoch': 2.72} +2025-02-06 06:28:55 - ERROR - stderr - 91%|█████████ | 20368/22434 [20:21:15<1:27:54, 2.55s/it] +2025-02-06 06:28:57 - ERROR - stderr - 91%|█████████ | 20369/22434 [20:21:17<1:27:26, 2.54s/it] +2025-02-06 06:28:57 - ERROR - stderr - +2025-02-06 06:28:57 - ERROR - stderr - +2025-02-06 
06:28:57 - INFO - stdout - {'loss': 0.3291, 'grad_norm': 1.5401010513305664, 'learning_rate': 4.41136390760043e-07, 'epoch': 2.72} +2025-02-06 06:28:57 - ERROR - stderr - 91%|█████████ | 20369/22434 [20:21:17<1:27:26, 2.54s/it] +2025-02-06 06:29:00 - ERROR - stderr - 91%|█████████ | 20370/22434 [20:21:20<1:26:26, 2.51s/it] +2025-02-06 06:29:00 - ERROR - stderr - +2025-02-06 06:29:00 - ERROR - stderr - +2025-02-06 06:29:00 - INFO - stdout - {'loss': 0.3381, 'grad_norm': 1.5423215627670288, 'learning_rate': 4.40712408975762e-07, 'epoch': 2.72} +2025-02-06 06:29:00 - ERROR - stderr - 91%|█████████ | 20370/22434 [20:21:20<1:26:26, 2.51s/it] +2025-02-06 06:29:02 - ERROR - stderr - 91%|█████████ | 20371/22434 [20:21:22<1:26:08, 2.51s/it] +2025-02-06 06:29:02 - ERROR - stderr - +2025-02-06 06:29:02 - ERROR - stderr - +2025-02-06 06:29:02 - INFO - stdout - {'loss': 0.3574, 'grad_norm': 1.5316669940948486, 'learning_rate': 4.4028862644568293e-07, 'epoch': 2.72} +2025-02-06 06:29:02 - ERROR - stderr - 91%|█████████ | 20371/22434 [20:21:22<1:26:08, 2.51s/it] +2025-02-06 06:29:05 - ERROR - stderr - 91%|█████████ | 20372/22434 [20:21:25<1:25:59, 2.50s/it] +2025-02-06 06:29:05 - ERROR - stderr - +2025-02-06 06:29:05 - ERROR - stderr - +2025-02-06 06:29:05 - INFO - stdout - {'loss': 0.3463, 'grad_norm': 1.5459325313568115, 'learning_rate': 4.398650431786389e-07, 'epoch': 2.72} +2025-02-06 06:29:05 - ERROR - stderr - 91%|█████████ | 20372/22434 [20:21:25<1:25:59, 2.50s/it] +2025-02-06 06:29:07 - ERROR - stderr - 91%|█████████ | 20373/22434 [20:21:27<1:25:57, 2.50s/it] +2025-02-06 06:29:07 - ERROR - stderr - +2025-02-06 06:29:07 - ERROR - stderr - +2025-02-06 06:29:07 - INFO - stdout - {'loss': 0.3546, 'grad_norm': 1.515528678894043, 'learning_rate': 4.394416591834616e-07, 'epoch': 2.72} +2025-02-06 06:29:07 - ERROR - stderr - 91%|█████████ | 20373/22434 [20:21:27<1:25:57, 2.50s/it] +2025-02-06 06:29:10 - ERROR - stderr - 91%|█████████ | 20374/22434 [20:21:30<1:26:09, 2.51s/it] 
+2025-02-06 06:29:10 - ERROR - stderr - +2025-02-06 06:29:10 - ERROR - stderr - +2025-02-06 06:29:10 - INFO - stdout - {'loss': 0.3645, 'grad_norm': 1.569743275642395, 'learning_rate': 4.390184744689741e-07, 'epoch': 2.72} +2025-02-06 06:29:10 - ERROR - stderr - 91%|█████████ | 20374/22434 [20:21:30<1:26:09, 2.51s/it] +2025-02-06 06:29:12 - ERROR - stderr - 91%|█████████ | 20375/22434 [20:21:32<1:26:02, 2.51s/it] +2025-02-06 06:29:12 - ERROR - stderr - +2025-02-06 06:29:12 - ERROR - stderr - +2025-02-06 06:29:12 - INFO - stdout - {'loss': 0.4179, 'grad_norm': 1.6246967315673828, 'learning_rate': 4.3859548904399586e-07, 'epoch': 2.72} +2025-02-06 06:29:12 - ERROR - stderr - 91%|█████████ | 20375/22434 [20:21:32<1:26:02, 2.51s/it] +2025-02-06 06:29:15 - ERROR - stderr - 91%|█████████ | 20376/22434 [20:21:35<1:27:45, 2.56s/it] +2025-02-06 06:29:15 - ERROR - stderr - +2025-02-06 06:29:15 - ERROR - stderr - +2025-02-06 06:29:15 - INFO - stdout - {'loss': 0.3559, 'grad_norm': 1.727081298828125, 'learning_rate': 4.381727029173488e-07, 'epoch': 2.72} +2025-02-06 06:29:15 - ERROR - stderr - 91%|█████████ | 20376/22434 [20:21:35<1:27:45, 2.56s/it] +2025-02-06 06:29:18 - ERROR - stderr - 91%|█████████ | 20377/22434 [20:21:37<1:27:08, 2.54s/it] +2025-02-06 06:29:18 - ERROR - stderr - +2025-02-06 06:29:18 - ERROR - stderr - +2025-02-06 06:29:18 - INFO - stdout - {'loss': 0.3651, 'grad_norm': 1.5679454803466797, 'learning_rate': 4.3775011609783814e-07, 'epoch': 2.72} +2025-02-06 06:29:18 - ERROR - stderr - 91%|█████████ | 20377/22434 [20:21:37<1:27:08, 2.54s/it] +2025-02-06 06:29:20 - ERROR - stderr - 91%|█████████ | 20378/22434 [20:21:40<1:26:24, 2.52s/it] +2025-02-06 06:29:20 - ERROR - stderr - +2025-02-06 06:29:20 - ERROR - stderr - +2025-02-06 06:29:20 - INFO - stdout - {'loss': 0.3573, 'grad_norm': 1.4618114233016968, 'learning_rate': 4.3732772859427787e-07, 'epoch': 2.73} +2025-02-06 06:29:20 - ERROR - stderr - 91%|█████████ | 20378/22434 [20:21:40<1:26:24, 2.52s/it] 
+2025-02-06 06:29:22 - ERROR - stderr - 91%|█████████ | 20379/22434 [20:21:42<1:25:45, 2.50s/it] +2025-02-06 06:29:23 - ERROR - stderr - +2025-02-06 06:29:23 - ERROR - stderr - +2025-02-06 06:29:23 - INFO - stdout - {'loss': 0.3328, 'grad_norm': 1.6613402366638184, 'learning_rate': 4.369055404154721e-07, 'epoch': 2.73} +2025-02-06 06:29:23 - ERROR - stderr - 91%|█████████ | 20379/22434 [20:21:42<1:25:45, 2.50s/it] +2025-02-06 06:29:25 - ERROR - stderr - 91%|█████████ | 20380/22434 [20:21:45<1:28:46, 2.59s/it] +2025-02-06 06:29:25 - ERROR - stderr - +2025-02-06 06:29:25 - ERROR - stderr - +2025-02-06 06:29:25 - INFO - stdout - {'loss': 0.3491, 'grad_norm': 1.4926159381866455, 'learning_rate': 4.3648355157021704e-07, 'epoch': 2.73} +2025-02-06 06:29:25 - ERROR - stderr - 91%|█████████ | 20380/22434 [20:21:45<1:28:46, 2.59s/it] +2025-02-06 06:29:28 - ERROR - stderr - 91%|█████████ | 20381/22434 [20:21:48<1:28:43, 2.59s/it] +2025-02-06 06:29:28 - ERROR - stderr - +2025-02-06 06:29:28 - ERROR - stderr - +2025-02-06 06:29:28 - INFO - stdout - {'loss': 0.385, 'grad_norm': 1.5645604133605957, 'learning_rate': 4.3606176206731354e-07, 'epoch': 2.73} +2025-02-06 06:29:28 - ERROR - stderr - 91%|█████████ | 20381/22434 [20:21:48<1:28:43, 2.59s/it] +2025-02-06 06:29:30 - ERROR - stderr - 91%|█████████ | 20382/22434 [20:21:50<1:27:23, 2.56s/it] +2025-02-06 06:29:30 - ERROR - stderr - +2025-02-06 06:29:30 - ERROR - stderr - +2025-02-06 06:29:30 - INFO - stdout - {'loss': 0.3714, 'grad_norm': 1.5707736015319824, 'learning_rate': 4.3564017191554895e-07, 'epoch': 2.73} +2025-02-06 06:29:30 - ERROR - stderr - 91%|█████████ | 20382/22434 [20:21:50<1:27:23, 2.56s/it] +2025-02-06 06:29:33 - ERROR - stderr - 91%|█████████ | 20383/22434 [20:21:53<1:26:13, 2.52s/it] +2025-02-06 06:29:33 - ERROR - stderr - +2025-02-06 06:29:33 - ERROR - stderr - +2025-02-06 06:29:33 - INFO - stdout - {'loss': 0.3315, 'grad_norm': 1.608981728553772, 'learning_rate': 4.3521878112371406e-07, 'epoch': 2.73} 
+2025-02-06 06:29:33 - ERROR - stderr - 91%|█████████ | 20383/22434 [20:21:53<1:26:13, 2.52s/it] +2025-02-06 06:29:35 - ERROR - stderr - 91%|█████████ | 20384/22434 [20:21:55<1:27:25, 2.56s/it] +2025-02-06 06:29:35 - ERROR - stderr - +2025-02-06 06:29:35 - ERROR - stderr - +2025-02-06 06:29:35 - INFO - stdout - {'loss': 0.3805, 'grad_norm': 1.630436658859253, 'learning_rate': 4.3479758970059074e-07, 'epoch': 2.73} +2025-02-06 06:29:35 - ERROR - stderr - 91%|█████████ | 20384/22434 [20:21:55<1:27:25, 2.56s/it] +2025-02-06 06:29:38 - ERROR - stderr - 91%|█████████ | 20385/22434 [20:21:58<1:27:13, 2.55s/it] +2025-02-06 06:29:38 - ERROR - stderr - +2025-02-06 06:29:38 - ERROR - stderr - +2025-02-06 06:29:38 - INFO - stdout - {'loss': 0.3624, 'grad_norm': 1.5983762741088867, 'learning_rate': 4.3437659765495853e-07, 'epoch': 2.73} +2025-02-06 06:29:38 - ERROR - stderr - 91%|█████████ | 20385/22434 [20:21:58<1:27:13, 2.55s/it] +2025-02-06 06:29:41 - ERROR - stderr - 91%|█████████ | 20386/22434 [20:22:00<1:26:45, 2.54s/it] +2025-02-06 06:29:41 - ERROR - stderr - +2025-02-06 06:29:41 - ERROR - stderr - +2025-02-06 06:29:41 - INFO - stdout - {'loss': 0.4015, 'grad_norm': 1.6413226127624512, 'learning_rate': 4.3395580499559276e-07, 'epoch': 2.73} +2025-02-06 06:29:41 - ERROR - stderr - 91%|█████████ | 20386/22434 [20:22:00<1:26:45, 2.54s/it] +2025-02-06 06:29:43 - ERROR - stderr - 91%|█████████ | 20387/22434 [20:22:03<1:26:07, 2.52s/it] +2025-02-06 06:29:43 - ERROR - stderr - +2025-02-06 06:29:43 - ERROR - stderr - +2025-02-06 06:29:43 - INFO - stdout - {'loss': 0.4125, 'grad_norm': 1.6908191442489624, 'learning_rate': 4.3353521173126413e-07, 'epoch': 2.73} +2025-02-06 06:29:43 - ERROR - stderr - 91%|█████████ | 20387/22434 [20:22:03<1:26:07, 2.52s/it] +2025-02-06 06:29:45 - ERROR - stderr - 91%|█████████ | 20388/22434 [20:22:05<1:25:47, 2.52s/it] +2025-02-06 06:29:46 - ERROR - stderr - +2025-02-06 06:29:46 - ERROR - stderr - +2025-02-06 06:29:46 - INFO - stdout - {'loss': 
0.3266, 'grad_norm': 1.5939172506332397, 'learning_rate': 4.331148178707412e-07, 'epoch': 2.73} +2025-02-06 06:29:46 - ERROR - stderr - 91%|█████████ | 20388/22434 [20:22:05<1:25:47, 2.52s/it] +2025-02-06 06:29:48 - ERROR - stderr - 91%|█████████ | 20389/22434 [20:22:08<1:26:38, 2.54s/it] +2025-02-06 06:29:48 - ERROR - stderr - +2025-02-06 06:29:48 - ERROR - stderr - +2025-02-06 06:29:48 - INFO - stdout - {'loss': 0.4326, 'grad_norm': 1.7253609895706177, 'learning_rate': 4.3269462342278356e-07, 'epoch': 2.73} +2025-02-06 06:29:48 - ERROR - stderr - 91%|█████████ | 20389/22434 [20:22:08<1:26:38, 2.54s/it] +2025-02-06 06:29:51 - ERROR - stderr - 91%|█████████ | 20390/22434 [20:22:10<1:25:48, 2.52s/it] +2025-02-06 06:29:51 - ERROR - stderr - +2025-02-06 06:29:51 - ERROR - stderr - +2025-02-06 06:29:51 - INFO - stdout - {'loss': 0.3328, 'grad_norm': 1.5579947233200073, 'learning_rate': 4.322746283961532e-07, 'epoch': 2.73} +2025-02-06 06:29:51 - ERROR - stderr - 91%|█████████ | 20390/22434 [20:22:10<1:25:48, 2.52s/it] +2025-02-06 06:29:53 - ERROR - stderr - 91%|█████████ | 20391/22434 [20:22:13<1:25:07, 2.50s/it] +2025-02-06 06:29:53 - ERROR - stderr - +2025-02-06 06:29:53 - ERROR - stderr - +2025-02-06 06:29:53 - INFO - stdout - {'loss': 0.3481, 'grad_norm': 1.5691473484039307, 'learning_rate': 4.3185483279960196e-07, 'epoch': 2.73} +2025-02-06 06:29:53 - ERROR - stderr - 91%|█████████ | 20391/22434 [20:22:13<1:25:07, 2.50s/it] +2025-02-06 06:29:56 - ERROR - stderr - 91%|█████████ | 20392/22434 [20:22:15<1:25:42, 2.52s/it] +2025-02-06 06:29:56 - ERROR - stderr - +2025-02-06 06:29:56 - ERROR - stderr - +2025-02-06 06:29:56 - INFO - stdout - {'loss': 0.3953, 'grad_norm': 1.6086087226867676, 'learning_rate': 4.314352366418817e-07, 'epoch': 2.73} +2025-02-06 06:29:56 - ERROR - stderr - 91%|█████████ | 20392/22434 [20:22:15<1:25:42, 2.52s/it] +2025-02-06 06:29:58 - ERROR - stderr - 91%|█████████ | 20393/22434 [20:22:18<1:25:07, 2.50s/it] +2025-02-06 06:29:58 - ERROR - 
stderr - +2025-02-06 06:29:58 - ERROR - stderr - +2025-02-06 06:29:58 - INFO - stdout - {'loss': 0.4124, 'grad_norm': 1.6190946102142334, 'learning_rate': 4.3101583993173767e-07, 'epoch': 2.73} +2025-02-06 06:29:58 - ERROR - stderr - 91%|█████████ | 20393/22434 [20:22:18<1:25:07, 2.50s/it] +2025-02-06 06:30:01 - ERROR - stderr - 91%|█████████ | 20394/22434 [20:22:20<1:25:41, 2.52s/it] +2025-02-06 06:30:01 - ERROR - stderr - +2025-02-06 06:30:01 - ERROR - stderr - +2025-02-06 06:30:01 - INFO - stdout - {'loss': 0.3665, 'grad_norm': 1.5999892950057983, 'learning_rate': 4.305966426779118e-07, 'epoch': 2.73} +2025-02-06 06:30:01 - ERROR - stderr - 91%|█████████ | 20394/22434 [20:22:20<1:25:41, 2.52s/it] +2025-02-06 06:30:03 - ERROR - stderr - 91%|█████████ | 20395/22434 [20:22:23<1:26:06, 2.53s/it] +2025-02-06 06:30:03 - ERROR - stderr - +2025-02-06 06:30:03 - ERROR - stderr - +2025-02-06 06:30:03 - INFO - stdout - {'loss': 0.405, 'grad_norm': 1.8151549100875854, 'learning_rate': 4.301776448891426e-07, 'epoch': 2.73} +2025-02-06 06:30:03 - ERROR - stderr - 91%|█████████ | 20395/22434 [20:22:23<1:26:06, 2.53s/it] +2025-02-06 06:30:06 - ERROR - stderr - 91%|█████████ | 20396/22434 [20:22:25<1:25:40, 2.52s/it] +2025-02-06 06:30:06 - ERROR - stderr - +2025-02-06 06:30:06 - ERROR - stderr - +2025-02-06 06:30:06 - INFO - stdout - {'loss': 0.3546, 'grad_norm': 1.5247858762741089, 'learning_rate': 4.297588465741609e-07, 'epoch': 2.73} +2025-02-06 06:30:06 - ERROR - stderr - 91%|█████████ | 20396/22434 [20:22:25<1:25:40, 2.52s/it] +2025-02-06 06:30:08 - ERROR - stderr - 91%|█████████ | 20397/22434 [20:22:28<1:25:23, 2.52s/it] +2025-02-06 06:30:08 - ERROR - stderr - +2025-02-06 06:30:08 - ERROR - stderr - +2025-02-06 06:30:08 - INFO - stdout - {'loss': 0.358, 'grad_norm': 1.5828157663345337, 'learning_rate': 4.293402477416997e-07, 'epoch': 2.73} +2025-02-06 06:30:08 - ERROR - stderr - 91%|█████████ | 20397/22434 [20:22:28<1:25:23, 2.52s/it] +2025-02-06 06:30:11 - ERROR - stderr 
- 91%|█████████ | 20398/22434 [20:22:31<1:26:46, 2.56s/it] +2025-02-06 06:30:11 - ERROR - stderr - +2025-02-06 06:30:11 - ERROR - stderr - +2025-02-06 06:30:11 - INFO - stdout - {'loss': 0.4291, 'grad_norm': 1.5063343048095703, 'learning_rate': 4.2892184840048315e-07, 'epoch': 2.73} +2025-02-06 06:30:11 - ERROR - stderr - 91%|█████████ | 20398/22434 [20:22:31<1:26:46, 2.56s/it] +2025-02-06 06:30:13 - ERROR - stderr - 91%|█████████ | 20399/22434 [20:22:33<1:25:42, 2.53s/it] +2025-02-06 06:30:13 - ERROR - stderr - +2025-02-06 06:30:13 - ERROR - stderr - +2025-02-06 06:30:13 - INFO - stdout - {'loss': 0.3593, 'grad_norm': 1.4469585418701172, 'learning_rate': 4.28503648559232e-07, 'epoch': 2.73} +2025-02-06 06:30:13 - ERROR - stderr - 91%|█████████ | 20399/22434 [20:22:33<1:25:42, 2.53s/it] +2025-02-06 06:30:16 - ERROR - stderr - 91%|█████████ | 20400/22434 [20:22:36<1:26:06, 2.54s/it] +2025-02-06 06:30:16 - ERROR - stderr - +2025-02-06 06:30:16 - ERROR - stderr - +2025-02-06 06:30:16 - INFO - stdout - {'loss': 0.4002, 'grad_norm': 1.856197714805603, 'learning_rate': 4.2808564822666486e-07, 'epoch': 2.73} +2025-02-06 06:30:16 - ERROR - stderr - 91%|█████████ | 20400/22434 [20:22:36<1:26:06, 2.54s/it] +2025-02-06 06:30:18 - ERROR - stderr - 91%|█████████ | 20401/22434 [20:22:38<1:25:47, 2.53s/it] +2025-02-06 06:30:18 - ERROR - stderr - +2025-02-06 06:30:18 - ERROR - stderr - +2025-02-06 06:30:18 - INFO - stdout - {'loss': 0.3315, 'grad_norm': 1.4738566875457764, 'learning_rate': 4.2766784741149034e-07, 'epoch': 2.73} +2025-02-06 06:30:18 - ERROR - stderr - 91%|█████████ | 20401/22434 [20:22:38<1:25:47, 2.53s/it] +2025-02-06 06:30:21 - ERROR - stderr - 91%|█████████ | 20402/22434 [20:22:41<1:26:18, 2.55s/it] +2025-02-06 06:30:21 - ERROR - stderr - +2025-02-06 06:30:21 - ERROR - stderr - +2025-02-06 06:30:21 - INFO - stdout - {'loss': 0.3461, 'grad_norm': 1.580964207649231, 'learning_rate': 4.272502461224226e-07, 'epoch': 2.73} +2025-02-06 06:30:21 - ERROR - stderr - 
91%|█████████ | 20402/22434 [20:22:41<1:26:18, 2.55s/it] +2025-02-06 06:30:24 - ERROR - stderr - 91%|█████████ | 20403/22434 [20:22:43<1:26:45, 2.56s/it] +2025-02-06 06:30:24 - ERROR - stderr - +2025-02-06 06:30:24 - ERROR - stderr - +2025-02-06 06:30:24 - INFO - stdout - {'loss': 0.332, 'grad_norm': 1.398289680480957, 'learning_rate': 4.268328443681613e-07, 'epoch': 2.73} +2025-02-06 06:30:24 - ERROR - stderr - 91%|█████████ | 20403/22434 [20:22:43<1:26:45, 2.56s/it] +2025-02-06 06:30:26 - ERROR - stderr - 91%|█████████ | 20404/22434 [20:22:46<1:26:32, 2.56s/it] +2025-02-06 06:30:26 - ERROR - stderr - +2025-02-06 06:30:26 - ERROR - stderr - +2025-02-06 06:30:26 - INFO - stdout - {'loss': 0.3531, 'grad_norm': 1.7332830429077148, 'learning_rate': 4.264156421574095e-07, 'epoch': 2.73} +2025-02-06 06:30:26 - ERROR - stderr - 91%|█████████ | 20404/22434 [20:22:46<1:26:32, 2.56s/it] +2025-02-06 06:30:29 - ERROR - stderr - 91%|█████████ | 20405/22434 [20:22:48<1:25:56, 2.54s/it] +2025-02-06 06:30:29 - ERROR - stderr - +2025-02-06 06:30:29 - ERROR - stderr - +2025-02-06 06:30:29 - INFO - stdout - {'loss': 0.3753, 'grad_norm': 1.5387110710144043, 'learning_rate': 4.2599863949886245e-07, 'epoch': 2.73} +2025-02-06 06:30:29 - ERROR - stderr - 91%|█████████ | 20405/22434 [20:22:48<1:25:56, 2.54s/it] +2025-02-06 06:30:31 - ERROR - stderr - 91%|█████████ | 20406/22434 [20:22:51<1:25:48, 2.54s/it] +2025-02-06 06:30:31 - ERROR - stderr - +2025-02-06 06:30:31 - ERROR - stderr - +2025-02-06 06:30:31 - INFO - stdout - {'loss': 0.322, 'grad_norm': 1.5739604234695435, 'learning_rate': 4.25581836401211e-07, 'epoch': 2.73} +2025-02-06 06:30:31 - ERROR - stderr - 91%|█████████ | 20406/22434 [20:22:51<1:25:48, 2.54s/it] +2025-02-06 06:30:34 - ERROR - stderr - 91%|█████████ | 20407/22434 [20:22:53<1:25:51, 2.54s/it] +2025-02-06 06:30:34 - ERROR - stderr - +2025-02-06 06:30:34 - ERROR - stderr - +2025-02-06 06:30:34 - INFO - stdout - {'loss': 0.3505, 'grad_norm': 1.4158567190170288, 
'learning_rate': 4.2516523287314703e-07, 'epoch': 2.73} +2025-02-06 06:30:34 - ERROR - stderr - 91%|█████████ | 20407/22434 [20:22:53<1:25:51, 2.54s/it] +2025-02-06 06:30:36 - ERROR - stderr - 91%|█████████ | 20408/22434 [20:22:56<1:25:12, 2.52s/it] +2025-02-06 06:30:36 - ERROR - stderr - +2025-02-06 06:30:36 - ERROR - stderr - +2025-02-06 06:30:36 - INFO - stdout - {'loss': 0.3401, 'grad_norm': 1.4814201593399048, 'learning_rate': 4.2474882892335144e-07, 'epoch': 2.73} +2025-02-06 06:30:36 - ERROR - stderr - 91%|█████████ | 20408/22434 [20:22:56<1:25:12, 2.52s/it] +2025-02-06 06:30:39 - ERROR - stderr - 91%|█████████ | 20409/22434 [20:22:58<1:24:32, 2.50s/it] +2025-02-06 06:30:39 - ERROR - stderr - +2025-02-06 06:30:39 - ERROR - stderr - +2025-02-06 06:30:39 - INFO - stdout - {'loss': 0.3374, 'grad_norm': 1.4562358856201172, 'learning_rate': 4.2433262456050286e-07, 'epoch': 2.73} +2025-02-06 06:30:39 - ERROR - stderr - 91%|█████████ | 20409/22434 [20:22:58<1:24:32, 2.50s/it] +2025-02-06 06:30:41 - ERROR - stderr - 91%|█████████ | 20410/22434 [20:23:01<1:23:59, 2.49s/it] +2025-02-06 06:30:41 - ERROR - stderr - +2025-02-06 06:30:41 - ERROR - stderr - +2025-02-06 06:30:41 - INFO - stdout - {'loss': 0.3492, 'grad_norm': 1.6334632635116577, 'learning_rate': 4.239166197932776e-07, 'epoch': 2.73} +2025-02-06 06:30:41 - ERROR - stderr - 91%|█████████ | 20410/22434 [20:23:01<1:23:59, 2.49s/it] +2025-02-06 06:30:44 - ERROR - stderr - 91%|█████████ | 20411/22434 [20:23:03<1:24:13, 2.50s/it] +2025-02-06 06:30:44 - ERROR - stderr - +2025-02-06 06:30:44 - ERROR - stderr - +2025-02-06 06:30:44 - INFO - stdout - {'loss': 0.3258, 'grad_norm': 1.3604103326797485, 'learning_rate': 4.2350081463034767e-07, 'epoch': 2.73} +2025-02-06 06:30:44 - ERROR - stderr - 91%|█████████ | 20411/22434 [20:23:03<1:24:13, 2.50s/it] +2025-02-06 06:30:46 - ERROR - stderr - 91%|█████████ | 20412/22434 [20:23:06<1:24:49, 2.52s/it] +2025-02-06 06:30:46 - ERROR - stderr - +2025-02-06 06:30:46 - ERROR - 
stderr - +2025-02-06 06:30:46 - INFO - stdout - {'loss': 0.3745, 'grad_norm': 1.5220311880111694, 'learning_rate': 4.230852090803794e-07, 'epoch': 2.73} +2025-02-06 06:30:46 - ERROR - stderr - 91%|█████████ | 20412/22434 [20:23:06<1:24:49, 2.52s/it] +2025-02-06 06:30:49 - ERROR - stderr - 91%|█████████ | 20413/22434 [20:23:08<1:24:30, 2.51s/it] +2025-02-06 06:30:49 - ERROR - stderr - +2025-02-06 06:30:49 - ERROR - stderr - +2025-02-06 06:30:49 - INFO - stdout - {'loss': 0.3886, 'grad_norm': 1.5934181213378906, 'learning_rate': 4.22669803152036e-07, 'epoch': 2.73} +2025-02-06 06:30:49 - ERROR - stderr - 91%|█████████ | 20413/22434 [20:23:08<1:24:30, 2.51s/it] +2025-02-06 06:30:51 - ERROR - stderr - 91%|█████████ | 20414/22434 [20:23:11<1:24:26, 2.51s/it] +2025-02-06 06:30:51 - ERROR - stderr - +2025-02-06 06:30:51 - ERROR - stderr - +2025-02-06 06:30:51 - INFO - stdout - {'loss': 0.3461, 'grad_norm': 1.4442826509475708, 'learning_rate': 4.22254596853976e-07, 'epoch': 2.73} +2025-02-06 06:30:51 - ERROR - stderr - 91%|█████████ | 20414/22434 [20:23:11<1:24:26, 2.51s/it] +2025-02-06 06:30:54 - ERROR - stderr - 91%|█████████ | 20415/22434 [20:23:13<1:24:36, 2.51s/it] +2025-02-06 06:30:54 - ERROR - stderr - +2025-02-06 06:30:54 - ERROR - stderr - +2025-02-06 06:30:54 - INFO - stdout - {'loss': 0.3159, 'grad_norm': 1.4409736394882202, 'learning_rate': 4.2183959019485354e-07, 'epoch': 2.73} +2025-02-06 06:30:54 - ERROR - stderr - 91%|█████████ | 20415/22434 [20:23:13<1:24:36, 2.51s/it] +2025-02-06 06:30:56 - ERROR - stderr - 91%|█████████ | 20416/22434 [20:23:16<1:24:23, 2.51s/it] +2025-02-06 06:30:56 - ERROR - stderr - +2025-02-06 06:30:56 - ERROR - stderr - +2025-02-06 06:30:56 - INFO - stdout - {'loss': 0.3463, 'grad_norm': 1.4900918006896973, 'learning_rate': 4.214247831833207e-07, 'epoch': 2.73} +2025-02-06 06:30:56 - ERROR - stderr - 91%|█████████ | 20416/22434 [20:23:16<1:24:23, 2.51s/it] +2025-02-06 06:30:59 - ERROR - stderr - 91%|█████████ | 20417/22434 
[20:23:18<1:24:54, 2.53s/it] +2025-02-06 06:30:59 - ERROR - stderr - +2025-02-06 06:30:59 - ERROR - stderr - +2025-02-06 06:30:59 - INFO - stdout - {'loss': 0.3586, 'grad_norm': 1.5594525337219238, 'learning_rate': 4.210101758280216e-07, 'epoch': 2.73} +2025-02-06 06:30:59 - ERROR - stderr - 91%|█████████ | 20417/22434 [20:23:19<1:24:54, 2.53s/it] +2025-02-06 06:31:01 - ERROR - stderr - 91%|█████████ | 20418/22434 [20:23:21<1:26:43, 2.58s/it] +2025-02-06 06:31:01 - ERROR - stderr - +2025-02-06 06:31:01 - ERROR - stderr - +2025-02-06 06:31:01 - INFO - stdout - {'loss': 0.3871, 'grad_norm': 1.5929666757583618, 'learning_rate': 4.205957681375994e-07, 'epoch': 2.73} +2025-02-06 06:31:01 - ERROR - stderr - 91%|█████████ | 20418/22434 [20:23:21<1:26:43, 2.58s/it] +2025-02-06 06:31:04 - ERROR - stderr - 91%|█████████ | 20419/22434 [20:23:24<1:26:23, 2.57s/it] +2025-02-06 06:31:04 - ERROR - stderr - +2025-02-06 06:31:04 - ERROR - stderr - +2025-02-06 06:31:04 - INFO - stdout - {'loss': 0.3421, 'grad_norm': 1.52675461769104, 'learning_rate': 4.2018156012069265e-07, 'epoch': 2.73} +2025-02-06 06:31:04 - ERROR - stderr - 91%|█████████ | 20419/22434 [20:23:24<1:26:23, 2.57s/it] +2025-02-06 06:31:06 - ERROR - stderr - 91%|█████████ | 20420/22434 [20:23:26<1:24:44, 2.52s/it] +2025-02-06 06:31:06 - ERROR - stderr - +2025-02-06 06:31:06 - ERROR - stderr - +2025-02-06 06:31:06 - INFO - stdout - {'loss': 0.3313, 'grad_norm': 1.6141642332077026, 'learning_rate': 4.197675517859323e-07, 'epoch': 2.73} +2025-02-06 06:31:06 - ERROR - stderr - 91%|█████████ | 20420/22434 [20:23:26<1:24:44, 2.52s/it] +2025-02-06 06:31:09 - ERROR - stderr - 91%|█████████ | 20421/22434 [20:23:29<1:23:59, 2.50s/it] +2025-02-06 06:31:09 - ERROR - stderr - +2025-02-06 06:31:09 - ERROR - stderr - +2025-02-06 06:31:09 - INFO - stdout - {'loss': 0.3959, 'grad_norm': 1.5692574977874756, 'learning_rate': 4.1935374314195254e-07, 'epoch': 2.73} +2025-02-06 06:31:09 - ERROR - stderr - 91%|█████████ | 20421/22434 
[20:23:29<1:23:59, 2.50s/it] +2025-02-06 06:31:11 - ERROR - stderr - 91%|█████████ | 20422/22434 [20:23:31<1:24:30, 2.52s/it] +2025-02-06 06:31:11 - ERROR - stderr - +2025-02-06 06:31:11 - ERROR - stderr - +2025-02-06 06:31:11 - INFO - stdout - {'loss': 0.3568, 'grad_norm': 1.5117493867874146, 'learning_rate': 4.189401341973742e-07, 'epoch': 2.73} +2025-02-06 06:31:11 - ERROR - stderr - 91%|█████████ | 20422/22434 [20:23:31<1:24:30, 2.52s/it] +2025-02-06 06:31:14 - ERROR - stderr - 91%|█████████ | 20423/22434 [20:23:34<1:24:13, 2.51s/it] +2025-02-06 06:31:14 - ERROR - stderr - +2025-02-06 06:31:14 - ERROR - stderr - +2025-02-06 06:31:14 - INFO - stdout - {'loss': 0.349, 'grad_norm': 1.543954610824585, 'learning_rate': 4.1852672496082267e-07, 'epoch': 2.73} +2025-02-06 06:31:14 - ERROR - stderr - 91%|█████████ | 20423/22434 [20:23:34<1:24:13, 2.51s/it] +2025-02-06 06:31:17 - ERROR - stderr - 91%|█████████ | 20424/22434 [20:23:36<1:26:20, 2.58s/it] +2025-02-06 06:31:17 - ERROR - stderr - +2025-02-06 06:31:17 - ERROR - stderr - +2025-02-06 06:31:17 - INFO - stdout - {'loss': 0.3364, 'grad_norm': 1.4033490419387817, 'learning_rate': 4.1811351544091217e-07, 'epoch': 2.73} +2025-02-06 06:31:17 - ERROR - stderr - 91%|█████████ | 20424/22434 [20:23:36<1:26:20, 2.58s/it] +2025-02-06 06:31:19 - ERROR - stderr - 91%|█████████ | 20425/22434 [20:23:39<1:25:46, 2.56s/it] +2025-02-06 06:31:19 - ERROR - stderr - +2025-02-06 06:31:19 - ERROR - stderr - +2025-02-06 06:31:19 - INFO - stdout - {'loss': 0.3818, 'grad_norm': 1.6456210613250732, 'learning_rate': 4.1770050564625577e-07, 'epoch': 2.73} +2025-02-06 06:31:19 - ERROR - stderr - 91%|█████████ | 20425/22434 [20:23:39<1:25:46, 2.56s/it] +2025-02-06 06:31:22 - ERROR - stderr - 91%|█████████ | 20426/22434 [20:23:42<1:26:01, 2.57s/it] +2025-02-06 06:31:22 - ERROR - stderr - +2025-02-06 06:31:22 - ERROR - stderr - +2025-02-06 06:31:22 - INFO - stdout - {'loss': 0.346, 'grad_norm': 1.4848979711532593, 'learning_rate': 
4.1728769558546547e-07, 'epoch': 2.73} +2025-02-06 06:31:22 - ERROR - stderr - 91%|█████████ | 20426/22434 [20:23:42<1:26:01, 2.57s/it] +2025-02-06 06:31:24 - ERROR - stderr - 91%|█████████ | 20427/22434 [20:23:44<1:25:09, 2.55s/it] +2025-02-06 06:31:24 - ERROR - stderr - +2025-02-06 06:31:24 - ERROR - stderr - +2025-02-06 06:31:24 - INFO - stdout - {'loss': 0.3758, 'grad_norm': 1.4872301816940308, 'learning_rate': 4.1687508526714103e-07, 'epoch': 2.73} +2025-02-06 06:31:24 - ERROR - stderr - 91%|█████████ | 20427/22434 [20:23:44<1:25:09, 2.55s/it] +2025-02-06 06:31:27 - ERROR - stderr - 91%|█████████ | 20428/22434 [20:23:47<1:25:41, 2.56s/it] +2025-02-06 06:31:27 - ERROR - stderr - +2025-02-06 06:31:27 - ERROR - stderr - +2025-02-06 06:31:27 - INFO - stdout - {'loss': 0.3312, 'grad_norm': 1.499072551727295, 'learning_rate': 4.164626746998868e-07, 'epoch': 2.73} +2025-02-06 06:31:27 - ERROR - stderr - 91%|█████████ | 20428/22434 [20:23:47<1:25:41, 2.56s/it] +2025-02-06 06:31:29 - ERROR - stderr - 91%|█████████ | 20429/22434 [20:23:49<1:24:33, 2.53s/it] +2025-02-06 06:31:29 - ERROR - stderr - +2025-02-06 06:31:29 - ERROR - stderr - +2025-02-06 06:31:29 - INFO - stdout - {'loss': 0.371, 'grad_norm': 1.6013410091400146, 'learning_rate': 4.1605046389229686e-07, 'epoch': 2.73} +2025-02-06 06:31:29 - ERROR - stderr - 91%|█████████ | 20429/22434 [20:23:49<1:24:33, 2.53s/it] +2025-02-06 06:31:32 - ERROR - stderr - 91%|█████████ | 20430/22434 [20:23:52<1:26:04, 2.58s/it] +2025-02-06 06:31:32 - ERROR - stderr - +2025-02-06 06:31:32 - ERROR - stderr - +2025-02-06 06:31:32 - INFO - stdout - {'loss': 0.3393, 'grad_norm': 1.582965612411499, 'learning_rate': 4.1563845285296443e-07, 'epoch': 2.73} +2025-02-06 06:31:32 - ERROR - stderr - 91%|█████████ | 20430/22434 [20:23:52<1:26:04, 2.58s/it] +2025-02-06 06:31:35 - ERROR - stderr - 91%|█████████ | 20431/22434 [20:23:54<1:25:35, 2.56s/it] +2025-02-06 06:31:35 - ERROR - stderr - +2025-02-06 06:31:35 - ERROR - stderr - +2025-02-06 
06:31:35 - INFO - stdout - {'loss': 0.3446, 'grad_norm': 1.5268311500549316, 'learning_rate': 4.152266415904771e-07, 'epoch': 2.73} +2025-02-06 06:31:35 - ERROR - stderr - 91%|█████████ | 20431/22434 [20:23:54<1:25:35, 2.56s/it] +2025-02-06 06:31:37 - ERROR - stderr - 91%|█████████ | 20432/22434 [20:23:57<1:25:22, 2.56s/it] +2025-02-06 06:31:37 - ERROR - stderr - +2025-02-06 06:31:37 - ERROR - stderr - +2025-02-06 06:31:37 - INFO - stdout - {'loss': 0.3875, 'grad_norm': 1.6414775848388672, 'learning_rate': 4.1481503011341906e-07, 'epoch': 2.73} +2025-02-06 06:31:37 - ERROR - stderr - 91%|█████████ | 20432/22434 [20:23:57<1:25:22, 2.56s/it] +2025-02-06 06:31:40 - ERROR - stderr - 91%|█████████ | 20433/22434 [20:23:59<1:25:45, 2.57s/it] +2025-02-06 06:31:40 - ERROR - stderr - +2025-02-06 06:31:40 - ERROR - stderr - +2025-02-06 06:31:40 - INFO - stdout - {'loss': 0.402, 'grad_norm': 1.6225417852401733, 'learning_rate': 4.14403618430369e-07, 'epoch': 2.73} +2025-02-06 06:31:40 - ERROR - stderr - 91%|█████████ | 20433/22434 [20:23:59<1:25:45, 2.57s/it] +2025-02-06 06:31:42 - ERROR - stderr - 91%|█████████ | 20434/22434 [20:24:02<1:25:21, 2.56s/it] +2025-02-06 06:31:42 - ERROR - stderr - +2025-02-06 06:31:42 - ERROR - stderr - +2025-02-06 06:31:42 - INFO - stdout - {'loss': 0.3651, 'grad_norm': 1.4447029829025269, 'learning_rate': 4.139924065499035e-07, 'epoch': 2.73} +2025-02-06 06:31:42 - ERROR - stderr - 91%|█████████ | 20434/22434 [20:24:02<1:25:21, 2.56s/it] +2025-02-06 06:31:45 - ERROR - stderr - 91%|█████████ | 20435/22434 [20:24:05<1:25:07, 2.55s/it] +2025-02-06 06:31:45 - ERROR - stderr - +2025-02-06 06:31:45 - ERROR - stderr - +2025-02-06 06:31:45 - INFO - stdout - {'loss': 0.357, 'grad_norm': 1.5432274341583252, 'learning_rate': 4.135813944805933e-07, 'epoch': 2.73} +2025-02-06 06:31:45 - ERROR - stderr - 91%|█████████ | 20435/22434 [20:24:05<1:25:07, 2.55s/it] +2025-02-06 06:31:47 - ERROR - stderr - 91%|█████████ | 20436/22434 [20:24:07<1:24:32, 2.54s/it] 
+2025-02-06 06:31:47 - ERROR - stderr - +2025-02-06 06:31:47 - ERROR - stderr - +2025-02-06 06:31:47 - INFO - stdout - {'loss': 0.3518, 'grad_norm': 1.7019670009613037, 'learning_rate': 4.1317058223100614e-07, 'epoch': 2.73} +2025-02-06 06:31:47 - ERROR - stderr - 91%|█████████ | 20436/22434 [20:24:07<1:24:32, 2.54s/it] +2025-02-06 06:31:50 - ERROR - stderr - 91%|█████████ | 20437/22434 [20:24:10<1:24:29, 2.54s/it] +2025-02-06 06:31:50 - ERROR - stderr - +2025-02-06 06:31:50 - ERROR - stderr - +2025-02-06 06:31:50 - INFO - stdout - {'loss': 0.4112, 'grad_norm': 1.7091443538665771, 'learning_rate': 4.12759969809704e-07, 'epoch': 2.73} +2025-02-06 06:31:50 - ERROR - stderr - 91%|█████████ | 20437/22434 [20:24:10<1:24:29, 2.54s/it] +2025-02-06 06:31:52 - ERROR - stderr - 91%|█████████ | 20438/22434 [20:24:12<1:24:04, 2.53s/it] +2025-02-06 06:31:52 - ERROR - stderr - +2025-02-06 06:31:52 - ERROR - stderr - +2025-02-06 06:31:52 - INFO - stdout - {'loss': 0.3532, 'grad_norm': 1.4933542013168335, 'learning_rate': 4.123495572252467e-07, 'epoch': 2.73} +2025-02-06 06:31:52 - ERROR - stderr - 91%|█████████ | 20438/22434 [20:24:12<1:24:04, 2.53s/it] +2025-02-06 06:31:55 - ERROR - stderr - 91%|█████████ | 20439/22434 [20:24:15<1:24:09, 2.53s/it] +2025-02-06 06:31:55 - ERROR - stderr - +2025-02-06 06:31:55 - ERROR - stderr - +2025-02-06 06:31:55 - INFO - stdout - {'loss': 0.3787, 'grad_norm': 1.6354148387908936, 'learning_rate': 4.1193934448618857e-07, 'epoch': 2.73} +2025-02-06 06:31:55 - ERROR - stderr - 91%|█████████ | 20439/22434 [20:24:15<1:24:09, 2.53s/it] +2025-02-06 06:31:57 - ERROR - stderr - 91%|█████████ | 20440/22434 [20:24:17<1:23:57, 2.53s/it] +2025-02-06 06:31:57 - ERROR - stderr - +2025-02-06 06:31:57 - ERROR - stderr - +2025-02-06 06:31:57 - INFO - stdout - {'loss': 0.3107, 'grad_norm': 1.5271919965744019, 'learning_rate': 4.1152933160108157e-07, 'epoch': 2.73} +2025-02-06 06:31:57 - ERROR - stderr - 91%|█████████ | 20440/22434 [20:24:17<1:23:57, 2.53s/it] 
+2025-02-06 06:32:00 - ERROR - stderr - 91%|█████████ | 20441/22434 [20:24:20<1:23:42, 2.52s/it] +2025-02-06 06:32:00 - ERROR - stderr - +2025-02-06 06:32:00 - ERROR - stderr - +2025-02-06 06:32:00 - INFO - stdout - {'loss': 0.3549, 'grad_norm': 1.6925134658813477, 'learning_rate': 4.1111951857846775e-07, 'epoch': 2.73} +2025-02-06 06:32:00 - ERROR - stderr - 91%|█████████ | 20441/22434 [20:24:20<1:23:42, 2.52s/it] +2025-02-06 06:32:03 - ERROR - stderr - 91%|█████████ | 20442/22434 [20:24:22<1:25:20, 2.57s/it] +2025-02-06 06:32:03 - ERROR - stderr - +2025-02-06 06:32:03 - ERROR - stderr - +2025-02-06 06:32:03 - INFO - stdout - {'loss': 0.3328, 'grad_norm': 1.4857733249664307, 'learning_rate': 4.1070990542689373e-07, 'epoch': 2.73} +2025-02-06 06:32:03 - ERROR - stderr - 91%|█████████ | 20442/22434 [20:24:22<1:25:20, 2.57s/it] +2025-02-06 06:32:05 - ERROR - stderr - 91%|█████████ | 20443/22434 [20:24:25<1:24:46, 2.55s/it] +2025-02-06 06:32:05 - ERROR - stderr - +2025-02-06 06:32:05 - ERROR - stderr - +2025-02-06 06:32:05 - INFO - stdout - {'loss': 0.3352, 'grad_norm': 1.5856181383132935, 'learning_rate': 4.1030049215489586e-07, 'epoch': 2.73} +2025-02-06 06:32:05 - ERROR - stderr - 91%|█████████ | 20443/22434 [20:24:25<1:24:46, 2.55s/it] +2025-02-06 06:32:07 - ERROR - stderr - 91%|█████████ | 20444/22434 [20:24:27<1:23:34, 2.52s/it] +2025-02-06 06:32:08 - ERROR - stderr - +2025-02-06 06:32:08 - ERROR - stderr - +2025-02-06 06:32:08 - INFO - stdout - {'loss': 0.3284, 'grad_norm': 1.601984977722168, 'learning_rate': 4.0989127877100523e-07, 'epoch': 2.73} +2025-02-06 06:32:08 - ERROR - stderr - 91%|█████████ | 20444/22434 [20:24:27<1:23:34, 2.52s/it] +2025-02-06 06:32:10 - ERROR - stderr - 91%|█████████ | 20445/22434 [20:24:30<1:23:27, 2.52s/it] +2025-02-06 06:32:10 - ERROR - stderr - +2025-02-06 06:32:10 - ERROR - stderr - +2025-02-06 06:32:10 - INFO - stdout - {'loss': 0.3314, 'grad_norm': 1.6515451669692993, 'learning_rate': 4.0948226528375714e-07, 'epoch': 2.73} 
+2025-02-06 06:32:10 - ERROR - stderr - 91%|█████████ | 20445/22434 [20:24:30<1:23:27, 2.52s/it] +2025-02-06 06:32:12 - ERROR - stderr - 91%|█████████ | 20446/22434 [20:24:32<1:22:57, 2.50s/it] +2025-02-06 06:32:13 - ERROR - stderr - +2025-02-06 06:32:13 - ERROR - stderr - +2025-02-06 06:32:13 - INFO - stdout - {'loss': 0.3373, 'grad_norm': 1.5508577823638916, 'learning_rate': 4.090734517016726e-07, 'epoch': 2.73} +2025-02-06 06:32:13 - ERROR - stderr - 91%|█████████ | 20446/22434 [20:24:32<1:22:57, 2.50s/it] +2025-02-06 06:32:15 - ERROR - stderr - 91%|█████████ | 20447/22434 [20:24:35<1:22:24, 2.49s/it] +2025-02-06 06:32:15 - ERROR - stderr - +2025-02-06 06:32:15 - ERROR - stderr - +2025-02-06 06:32:15 - INFO - stdout - {'loss': 0.3579, 'grad_norm': 1.5244628190994263, 'learning_rate': 4.0866483803327583e-07, 'epoch': 2.73} +2025-02-06 06:32:15 - ERROR - stderr - 91%|█████████ | 20447/22434 [20:24:35<1:22:24, 2.49s/it] +2025-02-06 06:32:18 - ERROR - stderr - 91%|█████████ | 20448/22434 [20:24:37<1:23:14, 2.52s/it] +2025-02-06 06:32:18 - ERROR - stderr - +2025-02-06 06:32:18 - ERROR - stderr - +2025-02-06 06:32:18 - INFO - stdout - {'loss': 0.3299, 'grad_norm': 1.3885129690170288, 'learning_rate': 4.0825642428708125e-07, 'epoch': 2.73} +2025-02-06 06:32:18 - ERROR - stderr - 91%|█████████ | 20448/22434 [20:24:37<1:23:14, 2.52s/it] +2025-02-06 06:32:20 - ERROR - stderr - 91%|█████████ | 20449/22434 [20:24:40<1:23:17, 2.52s/it] +2025-02-06 06:32:20 - ERROR - stderr - +2025-02-06 06:32:20 - ERROR - stderr - +2025-02-06 06:32:20 - INFO - stdout - {'loss': 0.3688, 'grad_norm': 1.4743024110794067, 'learning_rate': 4.078482104716042e-07, 'epoch': 2.73} +2025-02-06 06:32:20 - ERROR - stderr - 91%|█████████ | 20449/22434 [20:24:40<1:23:17, 2.52s/it] +2025-02-06 06:32:23 - ERROR - stderr - 91%|█████████ | 20450/22434 [20:24:42<1:23:11, 2.52s/it] +2025-02-06 06:32:23 - ERROR - stderr - +2025-02-06 06:32:23 - ERROR - stderr - +2025-02-06 06:32:23 - INFO - stdout - {'loss': 
0.391, 'grad_norm': 1.723941683769226, 'learning_rate': 4.0744019659535116e-07, 'epoch': 2.73} +2025-02-06 06:32:23 - ERROR - stderr - 91%|█████████ | 20450/22434 [20:24:42<1:23:11, 2.52s/it] +2025-02-06 06:32:25 - ERROR - stderr - 91%|█████████ | 20451/22434 [20:24:45<1:23:42, 2.53s/it] +2025-02-06 06:32:25 - ERROR - stderr - +2025-02-06 06:32:25 - ERROR - stderr - +2025-02-06 06:32:25 - INFO - stdout - {'loss': 0.3767, 'grad_norm': 1.462186336517334, 'learning_rate': 4.070323826668299e-07, 'epoch': 2.73} +2025-02-06 06:32:25 - ERROR - stderr - 91%|█████████ | 20451/22434 [20:24:45<1:23:42, 2.53s/it] +2025-02-06 06:32:28 - ERROR - stderr - 91%|█████████ | 20452/22434 [20:24:47<1:23:54, 2.54s/it] +2025-02-06 06:32:28 - ERROR - stderr - +2025-02-06 06:32:28 - ERROR - stderr - +2025-02-06 06:32:28 - INFO - stdout - {'loss': 0.3701, 'grad_norm': 1.6000711917877197, 'learning_rate': 4.066247686945379e-07, 'epoch': 2.73} +2025-02-06 06:32:28 - ERROR - stderr - 91%|█████████ | 20452/22434 [20:24:47<1:23:54, 2.54s/it] +2025-02-06 06:32:30 - ERROR - stderr - 91%|█████████ | 20453/22434 [20:24:50<1:23:34, 2.53s/it] +2025-02-06 06:32:30 - ERROR - stderr - +2025-02-06 06:32:30 - ERROR - stderr - +2025-02-06 06:32:30 - INFO - stdout - {'loss': 0.3218, 'grad_norm': 1.5289440155029297, 'learning_rate': 4.0621735468697297e-07, 'epoch': 2.74} +2025-02-06 06:32:30 - ERROR - stderr - 91%|█████████ | 20453/22434 [20:24:50<1:23:34, 2.53s/it] +2025-02-06 06:32:33 - ERROR - stderr - 91%|█████████ | 20454/22434 [20:24:53<1:24:08, 2.55s/it] +2025-02-06 06:32:33 - ERROR - stderr - +2025-02-06 06:32:33 - ERROR - stderr - +2025-02-06 06:32:33 - INFO - stdout - {'loss': 0.339, 'grad_norm': 1.6520367860794067, 'learning_rate': 4.058101406526271e-07, 'epoch': 2.74} +2025-02-06 06:32:33 - ERROR - stderr - 91%|█████████ | 20454/22434 [20:24:53<1:24:08, 2.55s/it] +2025-02-06 06:32:35 - ERROR - stderr - 91%|█████████ | 20455/22434 [20:24:55<1:22:54, 2.51s/it] +2025-02-06 06:32:35 - ERROR - stderr - 
+2025-02-06 06:32:35 - ERROR - stderr - +2025-02-06 06:32:35 - INFO - stdout - {'loss': 0.3895, 'grad_norm': 1.7498400211334229, 'learning_rate': 4.0540312659998803e-07, 'epoch': 2.74} +2025-02-06 06:32:35 - ERROR - stderr - 91%|█████████ | 20455/22434 [20:24:55<1:22:54, 2.51s/it] +2025-02-06 06:32:38 - ERROR - stderr - 91%|█████████ | 20456/22434 [20:24:57<1:22:45, 2.51s/it] +2025-02-06 06:32:38 - ERROR - stderr - +2025-02-06 06:32:38 - ERROR - stderr - +2025-02-06 06:32:38 - INFO - stdout - {'loss': 0.369, 'grad_norm': 1.6241114139556885, 'learning_rate': 4.0499631253754003e-07, 'epoch': 2.74} +2025-02-06 06:32:38 - ERROR - stderr - 91%|█████████ | 20456/22434 [20:24:58<1:22:45, 2.51s/it] +2025-02-06 06:32:40 - ERROR - stderr - 91%|█████████ | 20457/22434 [20:25:00<1:22:32, 2.50s/it] +2025-02-06 06:32:40 - ERROR - stderr - +2025-02-06 06:32:40 - ERROR - stderr - +2025-02-06 06:32:40 - INFO - stdout - {'loss': 0.3456, 'grad_norm': 1.5540971755981445, 'learning_rate': 4.0458969847376185e-07, 'epoch': 2.74} +2025-02-06 06:32:40 - ERROR - stderr - 91%|█████████ | 20457/22434 [20:25:00<1:22:32, 2.50s/it] +2025-02-06 06:32:43 - ERROR - stderr - 91%|█████████ | 20458/22434 [20:25:02<1:22:20, 2.50s/it] +2025-02-06 06:32:43 - ERROR - stderr - +2025-02-06 06:32:43 - ERROR - stderr - +2025-02-06 06:32:43 - INFO - stdout - {'loss': 0.3323, 'grad_norm': 1.469602108001709, 'learning_rate': 4.0418328441713007e-07, 'epoch': 2.74} +2025-02-06 06:32:43 - ERROR - stderr - 91%|█████████ | 20458/22434 [20:25:03<1:22:20, 2.50s/it] +2025-02-06 06:32:45 - ERROR - stderr - 91%|█████████ | 20459/22434 [20:25:05<1:23:49, 2.55s/it] +2025-02-06 06:32:45 - ERROR - stderr - +2025-02-06 06:32:45 - ERROR - stderr - +2025-02-06 06:32:45 - INFO - stdout - {'loss': 0.3117, 'grad_norm': 1.5364309549331665, 'learning_rate': 4.037770703761168e-07, 'epoch': 2.74} +2025-02-06 06:32:45 - ERROR - stderr - 91%|█████████ | 20459/22434 [20:25:05<1:23:49, 2.55s/it] +2025-02-06 06:32:48 - ERROR - stderr - 
91%|█████████ | 20460/22434 [20:25:08<1:22:53, 2.52s/it] +2025-02-06 06:32:48 - ERROR - stderr - +2025-02-06 06:32:48 - ERROR - stderr - +2025-02-06 06:32:48 - INFO - stdout - {'loss': 0.3219, 'grad_norm': 1.546209454536438, 'learning_rate': 4.033710563591853e-07, 'epoch': 2.74} +2025-02-06 06:32:48 - ERROR - stderr - 91%|█████████ | 20460/22434 [20:25:08<1:22:53, 2.52s/it] +2025-02-06 06:32:51 - ERROR - stderr - 91%|█████████ | 20461/22434 [20:25:10<1:25:03, 2.59s/it] +2025-02-06 06:32:51 - ERROR - stderr - +2025-02-06 06:32:51 - ERROR - stderr - +2025-02-06 06:32:51 - INFO - stdout - {'loss': 0.4176, 'grad_norm': 1.7581359148025513, 'learning_rate': 4.0296524237480426e-07, 'epoch': 2.74} +2025-02-06 06:32:51 - ERROR - stderr - 91%|█████████ | 20461/22434 [20:25:10<1:25:03, 2.59s/it] +2025-02-06 06:32:53 - ERROR - stderr - 91%|█████████ | 20462/22434 [20:25:13<1:24:29, 2.57s/it] +2025-02-06 06:32:53 - ERROR - stderr - +2025-02-06 06:32:53 - ERROR - stderr - +2025-02-06 06:32:53 - INFO - stdout - {'loss': 0.3584, 'grad_norm': 1.5537997484207153, 'learning_rate': 4.025596284314259e-07, 'epoch': 2.74} +2025-02-06 06:32:53 - ERROR - stderr - 91%|█████████ | 20462/22434 [20:25:13<1:24:29, 2.57s/it] +2025-02-06 06:32:56 - ERROR - stderr - 91%|█████████ | 20463/22434 [20:25:15<1:23:12, 2.53s/it] +2025-02-06 06:32:56 - ERROR - stderr - +2025-02-06 06:32:56 - ERROR - stderr - +2025-02-06 06:32:56 - INFO - stdout - {'loss': 0.3338, 'grad_norm': 1.5532357692718506, 'learning_rate': 4.0215421453751014e-07, 'epoch': 2.74} +2025-02-06 06:32:56 - ERROR - stderr - 91%|█████████ | 20463/22434 [20:25:15<1:23:12, 2.53s/it] +2025-02-06 06:32:58 - ERROR - stderr - 91%|█████████ | 20464/22434 [20:25:18<1:23:22, 2.54s/it] +2025-02-06 06:32:58 - ERROR - stderr - +2025-02-06 06:32:58 - ERROR - stderr - +2025-02-06 06:32:58 - INFO - stdout - {'loss': 0.3821, 'grad_norm': 1.5923744440078735, 'learning_rate': 4.017490007015068e-07, 'epoch': 2.74} +2025-02-06 06:32:58 - ERROR - stderr - 
91%|█████████ | 20464/22434 [20:25:18<1:23:22, 2.54s/it] +2025-02-06 06:33:01 - ERROR - stderr - 91%|█████████ | 20465/22434 [20:25:20<1:23:29, 2.54s/it] +2025-02-06 06:33:01 - ERROR - stderr - +2025-02-06 06:33:01 - ERROR - stderr - +2025-02-06 06:33:01 - INFO - stdout - {'loss': 0.3168, 'grad_norm': 1.4443254470825195, 'learning_rate': 4.0134398693185803e-07, 'epoch': 2.74} +2025-02-06 06:33:01 - ERROR - stderr - 91%|█████████ | 20465/22434 [20:25:20<1:23:29, 2.54s/it] +2025-02-06 06:33:03 - ERROR - stderr - 91%|█████████ | 20466/22434 [20:25:23<1:23:40, 2.55s/it] +2025-02-06 06:33:03 - ERROR - stderr - +2025-02-06 06:33:03 - ERROR - stderr - +2025-02-06 06:33:03 - INFO - stdout - {'loss': 0.3907, 'grad_norm': 1.6133848428726196, 'learning_rate': 4.009391732370116e-07, 'epoch': 2.74} +2025-02-06 06:33:03 - ERROR - stderr - 91%|█████████ | 20466/22434 [20:25:23<1:23:40, 2.55s/it] +2025-02-06 06:33:06 - ERROR - stderr - 91%|█████████ | 20467/22434 [20:25:25<1:23:15, 2.54s/it] +2025-02-06 06:33:06 - ERROR - stderr - +2025-02-06 06:33:06 - ERROR - stderr - +2025-02-06 06:33:06 - INFO - stdout - {'loss': 0.3704, 'grad_norm': 1.5660719871520996, 'learning_rate': 4.005345596254029e-07, 'epoch': 2.74} +2025-02-06 06:33:06 - ERROR - stderr - 91%|█████████ | 20467/22434 [20:25:26<1:23:15, 2.54s/it] +2025-02-06 06:33:08 - ERROR - stderr - 91%|█████████ | 20468/22434 [20:25:28<1:24:00, 2.56s/it] +2025-02-06 06:33:08 - ERROR - stderr - +2025-02-06 06:33:08 - ERROR - stderr - +2025-02-06 06:33:08 - INFO - stdout - {'loss': 0.3573, 'grad_norm': 1.6654837131500244, 'learning_rate': 4.001301461054641e-07, 'epoch': 2.74} +2025-02-06 06:33:08 - ERROR - stderr - 91%|█████████ | 20468/22434 [20:25:28<1:24:00, 2.56s/it] +2025-02-06 06:33:11 - ERROR - stderr - 91%|█████████ | 20469/22434 [20:25:31<1:24:19, 2.57s/it] +2025-02-06 06:33:11 - ERROR - stderr - +2025-02-06 06:33:11 - ERROR - stderr - +2025-02-06 06:33:11 - INFO - stdout - {'loss': 0.337, 'grad_norm': 1.330752968788147, 
'learning_rate': 3.997259326856262e-07, 'epoch': 2.74} +2025-02-06 06:33:11 - ERROR - stderr - 91%|█████████ | 20469/22434 [20:25:31<1:24:19, 2.57s/it] +2025-02-06 06:33:13 - ERROR - stderr - 91%|█████████ | 20470/22434 [20:25:33<1:23:11, 2.54s/it] +2025-02-06 06:33:13 - ERROR - stderr - +2025-02-06 06:33:13 - ERROR - stderr - +2025-02-06 06:33:13 - INFO - stdout - {'loss': 0.3326, 'grad_norm': 1.469082236289978, 'learning_rate': 3.9932191937431474e-07, 'epoch': 2.74} +2025-02-06 06:33:13 - ERROR - stderr - 91%|█████████ | 20470/22434 [20:25:33<1:23:11, 2.54s/it] +2025-02-06 06:33:16 - ERROR - stderr - 91%|█████████ | 20471/22434 [20:25:36<1:22:43, 2.53s/it] +2025-02-06 06:33:16 - ERROR - stderr - +2025-02-06 06:33:16 - ERROR - stderr - +2025-02-06 06:33:16 - INFO - stdout - {'loss': 0.3616, 'grad_norm': 1.6516684293746948, 'learning_rate': 3.98918106179953e-07, 'epoch': 2.74} +2025-02-06 06:33:16 - ERROR - stderr - 91%|█████████ | 20471/22434 [20:25:36<1:22:43, 2.53s/it] +2025-02-06 06:33:19 - ERROR - stderr - 91%|█████████▏| 20472/22434 [20:25:38<1:24:39, 2.59s/it] +2025-02-06 06:33:19 - ERROR - stderr - +2025-02-06 06:33:19 - ERROR - stderr - +2025-02-06 06:33:19 - INFO - stdout - {'loss': 0.3316, 'grad_norm': 1.3364589214324951, 'learning_rate': 3.9851449311095415e-07, 'epoch': 2.74} +2025-02-06 06:33:19 - ERROR - stderr - 91%|█████████▏| 20472/22434 [20:25:38<1:24:39, 2.59s/it] +2025-02-06 06:33:22 - ERROR - stderr - 91%|█████████▏| 20473/22434 [20:25:41<1:28:46, 2.72s/it] +2025-02-06 06:33:22 - ERROR - stderr - +2025-02-06 06:33:22 - ERROR - stderr - +2025-02-06 06:33:22 - INFO - stdout - {'loss': 0.2953, 'grad_norm': 1.3165228366851807, 'learning_rate': 3.981110801757337e-07, 'epoch': 2.74} +2025-02-06 06:33:22 - ERROR - stderr - 91%|█████████▏| 20473/22434 [20:25:41<1:28:46, 2.72s/it] +2025-02-06 06:33:24 - ERROR - stderr - 91%|█████████▏| 20474/22434 [20:25:44<1:26:11, 2.64s/it] +2025-02-06 06:33:24 - ERROR - stderr - +2025-02-06 06:33:24 - ERROR - stderr 
- +2025-02-06 06:33:24 - INFO - stdout - {'loss': 0.4007, 'grad_norm': 1.637839674949646, 'learning_rate': 3.977078673826995e-07, 'epoch': 2.74} +2025-02-06 06:33:24 - ERROR - stderr - 91%|█████████▏| 20474/22434 [20:25:44<1:26:11, 2.64s/it] +2025-02-06 06:33:27 - ERROR - stderr - 91%|█████████▏| 20475/22434 [20:25:46<1:25:04, 2.61s/it] +2025-02-06 06:33:27 - ERROR - stderr - +2025-02-06 06:33:27 - ERROR - stderr - +2025-02-06 06:33:27 - INFO - stdout - {'loss': 0.3521, 'grad_norm': 1.376086950302124, 'learning_rate': 3.9730485474025695e-07, 'epoch': 2.74} +2025-02-06 06:33:27 - ERROR - stderr - 91%|█████████▏| 20475/22434 [20:25:46<1:25:04, 2.61s/it] +2025-02-06 06:33:29 - ERROR - stderr - 91%|█████████▏| 20476/22434 [20:25:49<1:23:57, 2.57s/it] +2025-02-06 06:33:29 - ERROR - stderr - +2025-02-06 06:33:29 - ERROR - stderr - +2025-02-06 06:33:29 - INFO - stdout - {'loss': 0.3866, 'grad_norm': 1.56931734085083, 'learning_rate': 3.9690204225680595e-07, 'epoch': 2.74} +2025-02-06 06:33:29 - ERROR - stderr - 91%|█████████▏| 20476/22434 [20:25:49<1:23:57, 2.57s/it] +2025-02-06 06:33:32 - ERROR - stderr - 91%|█████████▏| 20477/22434 [20:25:51<1:23:22, 2.56s/it] +2025-02-06 06:33:32 - ERROR - stderr - +2025-02-06 06:33:32 - ERROR - stderr - +2025-02-06 06:33:32 - INFO - stdout - {'loss': 0.3739, 'grad_norm': 1.6726292371749878, 'learning_rate': 3.964994299407421e-07, 'epoch': 2.74} +2025-02-06 06:33:32 - ERROR - stderr - 91%|█████████▏| 20477/22434 [20:25:51<1:23:22, 2.56s/it] +2025-02-06 06:33:34 - ERROR - stderr - 91%|█████████▏| 20478/22434 [20:25:54<1:23:32, 2.56s/it] +2025-02-06 06:33:34 - ERROR - stderr - +2025-02-06 06:33:34 - ERROR - stderr - +2025-02-06 06:33:34 - INFO - stdout - {'loss': 0.3362, 'grad_norm': 1.3977808952331543, 'learning_rate': 3.960970178004586e-07, 'epoch': 2.74} +2025-02-06 06:33:34 - ERROR - stderr - 91%|█████████▏| 20478/22434 [20:25:54<1:23:32, 2.56s/it] +2025-02-06 06:33:37 - ERROR - stderr - 91%|█████████▏| 20479/22434 [20:25:57<1:24:05, 
2.58s/it] +2025-02-06 06:33:37 - ERROR - stderr - +2025-02-06 06:33:37 - ERROR - stderr - +2025-02-06 06:33:37 - INFO - stdout - {'loss': 0.3498, 'grad_norm': 1.5280615091323853, 'learning_rate': 3.9569480584434217e-07, 'epoch': 2.74} +2025-02-06 06:33:37 - ERROR - stderr - 91%|█████████▏| 20479/22434 [20:25:57<1:24:05, 2.58s/it] +2025-02-06 06:33:40 - ERROR - stderr - 91%|█████████▏| 20480/22434 [20:26:00<1:28:01, 2.70s/it] +2025-02-06 06:33:40 - ERROR - stderr - +2025-02-06 06:33:40 - ERROR - stderr - +2025-02-06 06:33:40 - INFO - stdout - {'loss': 0.3546, 'grad_norm': 1.4856868982315063, 'learning_rate': 3.9529279408077715e-07, 'epoch': 2.74} +2025-02-06 06:33:40 - ERROR - stderr - 91%|█████████▏| 20480/22434 [20:26:00<1:28:01, 2.70s/it] +2025-02-06 06:33:42 - ERROR - stderr - 91%|█████████▏| 20481/22434 [20:26:02<1:27:33, 2.69s/it] +2025-02-06 06:33:43 - ERROR - stderr - +2025-02-06 06:33:43 - ERROR - stderr - +2025-02-06 06:33:43 - INFO - stdout - {'loss': 0.3925, 'grad_norm': 1.6603909730911255, 'learning_rate': 3.9489098251814353e-07, 'epoch': 2.74} +2025-02-06 06:33:43 - ERROR - stderr - 91%|█████████▏| 20481/22434 [20:26:02<1:27:33, 2.69s/it] +2025-02-06 06:33:45 - ERROR - stderr - 91%|█████████▏| 20482/22434 [20:26:05<1:25:27, 2.63s/it] +2025-02-06 06:33:45 - ERROR - stderr - +2025-02-06 06:33:45 - ERROR - stderr - +2025-02-06 06:33:45 - INFO - stdout - {'loss': 0.392, 'grad_norm': 1.5924251079559326, 'learning_rate': 3.9448937116481676e-07, 'epoch': 2.74} +2025-02-06 06:33:45 - ERROR - stderr - 91%|█████████▏| 20482/22434 [20:26:05<1:25:27, 2.63s/it] +2025-02-06 06:33:48 - ERROR - stderr - 91%|█████████▏| 20483/22434 [20:26:07<1:24:41, 2.60s/it] +2025-02-06 06:33:48 - ERROR - stderr - +2025-02-06 06:33:48 - ERROR - stderr - +2025-02-06 06:33:48 - INFO - stdout - {'loss': 0.3325, 'grad_norm': 1.4669179916381836, 'learning_rate': 3.9408796002916696e-07, 'epoch': 2.74} +2025-02-06 06:33:48 - ERROR - stderr - 91%|█████████▏| 20483/22434 [20:26:07<1:24:41, 
2.60s/it] +2025-02-06 06:33:50 - ERROR - stderr - 91%|█████████▏| 20484/22434 [20:26:10<1:24:20, 2.60s/it] +2025-02-06 06:33:50 - ERROR - stderr - +2025-02-06 06:33:50 - ERROR - stderr - +2025-02-06 06:33:50 - INFO - stdout - {'loss': 0.3868, 'grad_norm': 1.5693894624710083, 'learning_rate': 3.936867491195617e-07, 'epoch': 2.74} +2025-02-06 06:33:50 - ERROR - stderr - 91%|█████████▏| 20484/22434 [20:26:10<1:24:20, 2.60s/it] +2025-02-06 06:33:53 - ERROR - stderr - 91%|█████████▏| 20485/22434 [20:26:12<1:23:52, 2.58s/it] +2025-02-06 06:33:53 - ERROR - stderr - +2025-02-06 06:33:53 - ERROR - stderr - +2025-02-06 06:33:53 - INFO - stdout - {'loss': 0.366, 'grad_norm': 1.583762526512146, 'learning_rate': 3.9328573844436555e-07, 'epoch': 2.74} +2025-02-06 06:33:53 - ERROR - stderr - 91%|█████████▏| 20485/22434 [20:26:12<1:23:52, 2.58s/it] +2025-02-06 06:33:55 - ERROR - stderr - 91%|█████████▏| 20486/22434 [20:26:15<1:22:44, 2.55s/it] +2025-02-06 06:33:55 - ERROR - stderr - +2025-02-06 06:33:55 - ERROR - stderr - +2025-02-06 06:33:55 - INFO - stdout - {'loss': 0.3571, 'grad_norm': 1.595390796661377, 'learning_rate': 3.928849280119329e-07, 'epoch': 2.74} +2025-02-06 06:33:55 - ERROR - stderr - 91%|█████████▏| 20486/22434 [20:26:15<1:22:44, 2.55s/it] +2025-02-06 06:33:58 - ERROR - stderr - 91%|█████████▏| 20487/22434 [20:26:17<1:22:49, 2.55s/it] +2025-02-06 06:33:58 - ERROR - stderr - +2025-02-06 06:33:58 - ERROR - stderr - +2025-02-06 06:33:58 - INFO - stdout - {'loss': 0.3748, 'grad_norm': 1.699400544166565, 'learning_rate': 3.9248431783062366e-07, 'epoch': 2.74} +2025-02-06 06:33:58 - ERROR - stderr - 91%|█████████▏| 20487/22434 [20:26:17<1:22:49, 2.55s/it] +2025-02-06 06:34:00 - ERROR - stderr - 91%|█████████▏| 20488/22434 [20:26:20<1:23:16, 2.57s/it] +2025-02-06 06:34:00 - ERROR - stderr - +2025-02-06 06:34:00 - ERROR - stderr - +2025-02-06 06:34:00 - INFO - stdout - {'loss': 0.4033, 'grad_norm': 1.7009303569793701, 'learning_rate': 3.920839079087835e-07, 'epoch': 
2.74} +2025-02-06 06:34:00 - ERROR - stderr - 91%|█████████▏| 20488/22434 [20:26:20<1:23:16, 2.57s/it] +2025-02-06 06:34:03 - ERROR - stderr - 91%|█████████▏| 20489/22434 [20:26:23<1:22:18, 2.54s/it] +2025-02-06 06:34:03 - ERROR - stderr - +2025-02-06 06:34:03 - ERROR - stderr - +2025-02-06 06:34:03 - INFO - stdout - {'loss': 0.3922, 'grad_norm': 1.571163535118103, 'learning_rate': 3.9168369825476003e-07, 'epoch': 2.74} +2025-02-06 06:34:03 - ERROR - stderr - 91%|█████████▏| 20489/22434 [20:26:23<1:22:18, 2.54s/it] +2025-02-06 06:34:05 - ERROR - stderr - 91%|█████████▏| 20490/22434 [20:26:25<1:22:14, 2.54s/it] +2025-02-06 06:34:05 - ERROR - stderr - +2025-02-06 06:34:05 - ERROR - stderr - +2025-02-06 06:34:05 - INFO - stdout - {'loss': 0.3847, 'grad_norm': 1.8056821823120117, 'learning_rate': 3.912836888768978e-07, 'epoch': 2.74} +2025-02-06 06:34:05 - ERROR - stderr - 91%|█████████▏| 20490/22434 [20:26:25<1:22:14, 2.54s/it] +2025-02-06 06:34:08 - ERROR - stderr - 91%|█████████▏| 20491/22434 [20:26:28<1:21:48, 2.53s/it] +2025-02-06 06:34:08 - ERROR - stderr - +2025-02-06 06:34:08 - ERROR - stderr - +2025-02-06 06:34:08 - INFO - stdout - {'loss': 0.3752, 'grad_norm': 1.7083226442337036, 'learning_rate': 3.9088387978353015e-07, 'epoch': 2.74} +2025-02-06 06:34:08 - ERROR - stderr - 91%|█████████▏| 20491/22434 [20:26:28<1:21:48, 2.53s/it] +2025-02-06 06:34:10 - ERROR - stderr - 91%|█████████▏| 20492/22434 [20:26:30<1:21:39, 2.52s/it] +2025-02-06 06:34:10 - ERROR - stderr - +2025-02-06 06:34:10 - ERROR - stderr - +2025-02-06 06:34:10 - INFO - stdout - {'loss': 0.3699, 'grad_norm': 1.6366592645645142, 'learning_rate': 3.904842709829948e-07, 'epoch': 2.74} +2025-02-06 06:34:10 - ERROR - stderr - 91%|█████████▏| 20492/22434 [20:26:30<1:21:39, 2.52s/it] +2025-02-06 06:34:13 - ERROR - stderr - 91%|█████████▏| 20493/22434 [20:26:33<1:22:05, 2.54s/it] +2025-02-06 06:34:13 - ERROR - stderr - +2025-02-06 06:34:13 - ERROR - stderr - +2025-02-06 06:34:13 - INFO - stdout - 
{'loss': 0.3145, 'grad_norm': 1.5583598613739014, 'learning_rate': 3.9008486248361957e-07, 'epoch': 2.74} +2025-02-06 06:34:13 - ERROR - stderr - 91%|█████████▏| 20493/22434 [20:26:33<1:22:05, 2.54s/it] +2025-02-06 06:34:16 - ERROR - stderr - 91%|█████████▏| 20494/22434 [20:26:35<1:23:37, 2.59s/it] +2025-02-06 06:34:16 - ERROR - stderr - +2025-02-06 06:34:16 - ERROR - stderr - +2025-02-06 06:34:16 - INFO - stdout - {'loss': 0.3625, 'grad_norm': 1.5246495008468628, 'learning_rate': 3.8968565429372885e-07, 'epoch': 2.74} +2025-02-06 06:34:16 - ERROR - stderr - 91%|█████████▏| 20494/22434 [20:26:35<1:23:37, 2.59s/it] +2025-02-06 06:34:18 - ERROR - stderr - 91%|█████████▏| 20495/22434 [20:26:38<1:22:59, 2.57s/it] +2025-02-06 06:34:18 - ERROR - stderr - +2025-02-06 06:34:18 - ERROR - stderr - +2025-02-06 06:34:18 - INFO - stdout - {'loss': 0.3325, 'grad_norm': 1.4283463954925537, 'learning_rate': 3.892866464216449e-07, 'epoch': 2.74} +2025-02-06 06:34:18 - ERROR - stderr - 91%|█████████▏| 20495/22434 [20:26:38<1:22:59, 2.57s/it] +2025-02-06 06:34:21 - ERROR - stderr - 91%|█████████▏| 20496/22434 [20:26:40<1:22:08, 2.54s/it] +2025-02-06 06:34:21 - ERROR - stderr - +2025-02-06 06:34:21 - ERROR - stderr - +2025-02-06 06:34:21 - INFO - stdout - {'loss': 0.4112, 'grad_norm': 1.7979589700698853, 'learning_rate': 3.888878388756845e-07, 'epoch': 2.74} +2025-02-06 06:34:21 - ERROR - stderr - 91%|█████████▏| 20496/22434 [20:26:40<1:22:08, 2.54s/it] +2025-02-06 06:34:23 - ERROR - stderr - 91%|█████████▏| 20497/22434 [20:26:43<1:22:37, 2.56s/it] +2025-02-06 06:34:23 - ERROR - stderr - +2025-02-06 06:34:23 - ERROR - stderr - +2025-02-06 06:34:23 - INFO - stdout - {'loss': 0.3668, 'grad_norm': 1.6331069469451904, 'learning_rate': 3.884892316641598e-07, 'epoch': 2.74} +2025-02-06 06:34:23 - ERROR - stderr - 91%|█████████▏| 20497/22434 [20:26:43<1:22:37, 2.56s/it] +2025-02-06 06:34:26 - ERROR - stderr - 91%|█████████▏| 20498/22434 [20:26:45<1:21:32, 2.53s/it] +2025-02-06 06:34:26 - 
ERROR - stderr - +2025-02-06 06:34:26 - ERROR - stderr - +2025-02-06 06:34:26 - INFO - stdout - {'loss': 0.3618, 'grad_norm': 1.4808342456817627, 'learning_rate': 3.880908247953796e-07, 'epoch': 2.74} +2025-02-06 06:34:26 - ERROR - stderr - 91%|█████████▏| 20498/22434 [20:26:45<1:21:32, 2.53s/it] +2025-02-06 06:34:28 - ERROR - stderr - 91%|█████████▏| 20499/22434 [20:26:48<1:21:03, 2.51s/it] +2025-02-06 06:34:28 - ERROR - stderr - +2025-02-06 06:34:28 - ERROR - stderr - +2025-02-06 06:34:28 - INFO - stdout - {'loss': 0.3542, 'grad_norm': 1.497487187385559, 'learning_rate': 3.876926182776497e-07, 'epoch': 2.74} +2025-02-06 06:34:28 - ERROR - stderr - 91%|█████████▏| 20499/22434 [20:26:48<1:21:03, 2.51s/it] +2025-02-06 06:34:31 - ERROR - stderr - 91%|█████████▏| 20500/22434 [20:26:50<1:20:54, 2.51s/it] +2025-02-06 06:34:31 - ERROR - stderr - +2025-02-06 06:34:31 - ERROR - stderr - +2025-02-06 06:34:31 - INFO - stdout - {'loss': 0.4, 'grad_norm': 1.62755286693573, 'learning_rate': 3.872946121192689e-07, 'epoch': 2.74} +2025-02-06 06:34:31 - ERROR - stderr - 91%|█████████▏| 20500/22434 [20:26:50<1:20:54, 2.51s/it] +2025-02-06 06:34:33 - ERROR - stderr - 91%|█████████▏| 20501/22434 [20:26:53<1:20:47, 2.51s/it] +2025-02-06 06:34:33 - ERROR - stderr - +2025-02-06 06:34:33 - ERROR - stderr - +2025-02-06 06:34:33 - INFO - stdout - {'loss': 0.3933, 'grad_norm': 1.5997579097747803, 'learning_rate': 3.8689680632853275e-07, 'epoch': 2.74} +2025-02-06 06:34:33 - ERROR - stderr - 91%|█████████▏| 20501/22434 [20:26:53<1:20:47, 2.51s/it] +2025-02-06 06:34:36 - ERROR - stderr - 91%|█████████▏| 20502/22434 [20:26:55<1:20:50, 2.51s/it] +2025-02-06 06:34:36 - ERROR - stderr - +2025-02-06 06:34:36 - ERROR - stderr - +2025-02-06 06:34:36 - INFO - stdout - {'loss': 0.3651, 'grad_norm': 1.6901222467422485, 'learning_rate': 3.864992009137347e-07, 'epoch': 2.74} +2025-02-06 06:34:36 - ERROR - stderr - 91%|█████████▏| 20502/22434 [20:26:55<1:20:50, 2.51s/it] +2025-02-06 06:34:38 - ERROR - 
stderr - 91%|█████████▏| 20503/22434 [20:26:58<1:19:57, 2.48s/it] +2025-02-06 06:34:38 - ERROR - stderr - +2025-02-06 06:34:38 - ERROR - stderr - +2025-02-06 06:34:38 - INFO - stdout - {'loss': 0.402, 'grad_norm': 1.746044635772705, 'learning_rate': 3.8610179588316144e-07, 'epoch': 2.74} +2025-02-06 06:34:38 - ERROR - stderr - 91%|█████████▏| 20503/22434 [20:26:58<1:19:57, 2.48s/it] +2025-02-06 06:34:41 - ERROR - stderr - 91%|█████████▏| 20504/22434 [20:27:00<1:19:35, 2.47s/it] +2025-02-06 06:34:41 - ERROR - stderr - +2025-02-06 06:34:41 - ERROR - stderr - +2025-02-06 06:34:41 - INFO - stdout - {'loss': 0.3529, 'grad_norm': 1.6435599327087402, 'learning_rate': 3.857045912450974e-07, 'epoch': 2.74} +2025-02-06 06:34:41 - ERROR - stderr - 91%|█████████▏| 20504/22434 [20:27:00<1:19:35, 2.47s/it] +2025-02-06 06:34:43 - ERROR - stderr - 91%|█████████▏| 20505/22434 [20:27:03<1:19:54, 2.49s/it] +2025-02-06 06:34:43 - ERROR - stderr - +2025-02-06 06:34:43 - ERROR - stderr - +2025-02-06 06:34:43 - INFO - stdout - {'loss': 0.3772, 'grad_norm': 1.652241587638855, 'learning_rate': 3.853075870078193e-07, 'epoch': 2.74} +2025-02-06 06:34:43 - ERROR - stderr - 91%|█████████▏| 20505/22434 [20:27:03<1:19:54, 2.49s/it] +2025-02-06 06:34:46 - ERROR - stderr - 91%|█████████▏| 20506/22434 [20:27:05<1:20:36, 2.51s/it] +2025-02-06 06:34:46 - ERROR - stderr - +2025-02-06 06:34:46 - ERROR - stderr - +2025-02-06 06:34:46 - INFO - stdout - {'loss': 0.3751, 'grad_norm': 1.608970046043396, 'learning_rate': 3.849107831796073e-07, 'epoch': 2.74} +2025-02-06 06:34:46 - ERROR - stderr - 91%|█████████▏| 20506/22434 [20:27:05<1:20:36, 2.51s/it] +2025-02-06 06:34:48 - ERROR - stderr - 91%|█████████▏| 20507/22434 [20:27:08<1:20:54, 2.52s/it] +2025-02-06 06:34:48 - ERROR - stderr - +2025-02-06 06:34:48 - ERROR - stderr - +2025-02-06 06:34:48 - INFO - stdout - {'loss': 0.3687, 'grad_norm': 1.4193334579467773, 'learning_rate': 3.845141797687257e-07, 'epoch': 2.74} +2025-02-06 06:34:48 - ERROR - stderr - 
91%|█████████▏| 20507/22434 [20:27:08<1:20:54, 2.52s/it] +2025-02-06 06:34:51 - ERROR - stderr - 91%|█████████▏| 20508/22434 [20:27:11<1:23:55, 2.61s/it] +2025-02-06 06:34:51 - ERROR - stderr - +2025-02-06 06:34:51 - ERROR - stderr - +2025-02-06 06:34:51 - INFO - stdout - {'loss': 0.3866, 'grad_norm': 1.6934199333190918, 'learning_rate': 3.84117776783447e-07, 'epoch': 2.74} +2025-02-06 06:34:51 - ERROR - stderr - 91%|█████████▏| 20508/22434 [20:27:11<1:23:55, 2.61s/it] +2025-02-06 06:34:53 - ERROR - stderr - 91%|█████████▏| 20509/22434 [20:27:13<1:23:02, 2.59s/it] +2025-02-06 06:34:54 - ERROR - stderr - +2025-02-06 06:34:54 - ERROR - stderr - +2025-02-06 06:34:54 - INFO - stdout - {'loss': 0.3553, 'grad_norm': 1.4429768323898315, 'learning_rate': 3.837215742320333e-07, 'epoch': 2.74} +2025-02-06 06:34:54 - ERROR - stderr - 91%|█████████▏| 20509/22434 [20:27:13<1:23:02, 2.59s/it] +2025-02-06 06:34:56 - ERROR - stderr - 91%|█████████▏| 20510/22434 [20:27:16<1:22:16, 2.57s/it] +2025-02-06 06:34:56 - ERROR - stderr - +2025-02-06 06:34:56 - ERROR - stderr - +2025-02-06 06:34:56 - INFO - stdout - {'loss': 0.3518, 'grad_norm': 1.5743972063064575, 'learning_rate': 3.833255721227391e-07, 'epoch': 2.74} +2025-02-06 06:34:56 - ERROR - stderr - 91%|█████████▏| 20510/22434 [20:27:16<1:22:16, 2.57s/it] +2025-02-06 06:34:59 - ERROR - stderr - 91%|█████████▏| 20511/22434 [20:27:18<1:23:35, 2.61s/it] +2025-02-06 06:34:59 - ERROR - stderr - +2025-02-06 06:34:59 - ERROR - stderr - +2025-02-06 06:34:59 - INFO - stdout - {'loss': 0.3473, 'grad_norm': 1.6451466083526611, 'learning_rate': 3.829297704638224e-07, 'epoch': 2.74} +2025-02-06 06:34:59 - ERROR - stderr - 91%|█████████▏| 20511/22434 [20:27:19<1:23:35, 2.61s/it] +2025-02-06 06:35:01 - ERROR - stderr - 91%|█████████▏| 20512/22434 [20:27:21<1:23:25, 2.60s/it] +2025-02-06 06:35:01 - ERROR - stderr - +2025-02-06 06:35:01 - ERROR - stderr - +2025-02-06 06:35:01 - INFO - stdout - {'loss': 0.3928, 'grad_norm': 1.8716309070587158, 
'learning_rate': 3.82534169263532e-07, 'epoch': 2.74} +2025-02-06 06:35:01 - ERROR - stderr - 91%|█████████▏| 20512/22434 [20:27:21<1:23:25, 2.60s/it] +2025-02-06 06:35:04 - ERROR - stderr - 91%|█████████▏| 20513/22434 [20:27:24<1:22:54, 2.59s/it] +2025-02-06 06:35:04 - ERROR - stderr - +2025-02-06 06:35:04 - ERROR - stderr - +2025-02-06 06:35:04 - INFO - stdout - {'loss': 0.391, 'grad_norm': 1.5953787565231323, 'learning_rate': 3.8213876853011365e-07, 'epoch': 2.74} +2025-02-06 06:35:04 - ERROR - stderr - 91%|█████████▏| 20513/22434 [20:27:24<1:22:54, 2.59s/it] +2025-02-06 06:35:06 - ERROR - stderr - 91%|█████████▏| 20514/22434 [20:27:26<1:21:44, 2.55s/it] +2025-02-06 06:35:06 - ERROR - stderr - +2025-02-06 06:35:06 - ERROR - stderr - +2025-02-06 06:35:06 - INFO - stdout - {'loss': 0.3608, 'grad_norm': 1.6890983581542969, 'learning_rate': 3.817435682718096e-07, 'epoch': 2.74} +2025-02-06 06:35:06 - ERROR - stderr - 91%|█████████▏| 20514/22434 [20:27:26<1:21:44, 2.55s/it] +2025-02-06 06:35:09 - ERROR - stderr - 91%|█████████▏| 20515/22434 [20:27:29<1:23:59, 2.63s/it] +2025-02-06 06:35:09 - ERROR - stderr - +2025-02-06 06:35:09 - ERROR - stderr - +2025-02-06 06:35:09 - INFO - stdout - {'loss': 0.3743, 'grad_norm': 1.4891178607940674, 'learning_rate': 3.813485684968565e-07, 'epoch': 2.74} +2025-02-06 06:35:09 - ERROR - stderr - 91%|█████████▏| 20515/22434 [20:27:29<1:23:59, 2.63s/it] +2025-02-06 06:35:12 - ERROR - stderr - 91%|█████████▏| 20516/22434 [20:27:32<1:24:10, 2.63s/it] +2025-02-06 06:35:12 - ERROR - stderr - +2025-02-06 06:35:12 - ERROR - stderr - +2025-02-06 06:35:12 - INFO - stdout - {'loss': 0.37, 'grad_norm': 1.824203610420227, 'learning_rate': 3.8095376921349015e-07, 'epoch': 2.74} +2025-02-06 06:35:12 - ERROR - stderr - 91%|█████████▏| 20516/22434 [20:27:32<1:24:10, 2.63s/it] +2025-02-06 06:35:14 - ERROR - stderr - 91%|█████████▏| 20517/22434 [20:27:34<1:24:27, 2.64s/it] +2025-02-06 06:35:14 - ERROR - stderr - +2025-02-06 06:35:14 - ERROR - stderr - 
+2025-02-06 06:35:14 - INFO - stdout - {'loss': 0.34, 'grad_norm': 1.5892139673233032, 'learning_rate': 3.8055917042993716e-07, 'epoch': 2.74} +2025-02-06 06:35:14 - ERROR - stderr - 91%|█████████▏| 20517/22434 [20:27:34<1:24:27, 2.64s/it] +2025-02-06 06:35:17 - ERROR - stderr - 91%|█████████▏| 20518/22434 [20:27:37<1:24:01, 2.63s/it] +2025-02-06 06:35:17 - ERROR - stderr - +2025-02-06 06:35:17 - ERROR - stderr - +2025-02-06 06:35:17 - INFO - stdout - {'loss': 0.3698, 'grad_norm': 1.637537956237793, 'learning_rate': 3.8016477215442325e-07, 'epoch': 2.74} +2025-02-06 06:35:17 - ERROR - stderr - 91%|█████████▏| 20518/22434 [20:27:37<1:24:01, 2.63s/it] +2025-02-06 06:35:20 - ERROR - stderr - 91%|█████████▏| 20519/22434 [20:27:39<1:23:28, 2.62s/it] +2025-02-06 06:35:20 - ERROR - stderr - +2025-02-06 06:35:20 - ERROR - stderr - +2025-02-06 06:35:20 - INFO - stdout - {'loss': 0.3543, 'grad_norm': 1.5128076076507568, 'learning_rate': 3.797705743951685e-07, 'epoch': 2.74} +2025-02-06 06:35:20 - ERROR - stderr - 91%|█████████▏| 20519/22434 [20:27:39<1:23:28, 2.62s/it] +2025-02-06 06:35:22 - ERROR - stderr - 91%|█████████▏| 20520/22434 [20:27:42<1:23:21, 2.61s/it] +2025-02-06 06:35:22 - ERROR - stderr - +2025-02-06 06:35:22 - ERROR - stderr - +2025-02-06 06:35:22 - INFO - stdout - {'loss': 0.398, 'grad_norm': 1.5798979997634888, 'learning_rate': 3.793765771603919e-07, 'epoch': 2.74} +2025-02-06 06:35:22 - ERROR - stderr - 91%|█████████▏| 20520/22434 [20:27:42<1:23:21, 2.61s/it] +2025-02-06 06:35:25 - ERROR - stderr - 91%|█████████▏| 20521/22434 [20:27:44<1:22:06, 2.58s/it] +2025-02-06 06:35:25 - ERROR - stderr - +2025-02-06 06:35:25 - ERROR - stderr - +2025-02-06 06:35:25 - INFO - stdout - {'loss': 0.3588, 'grad_norm': 1.6757993698120117, 'learning_rate': 3.789827804583046e-07, 'epoch': 2.74} +2025-02-06 06:35:25 - ERROR - stderr - 91%|█████████▏| 20521/22434 [20:27:45<1:22:06, 2.58s/it] +2025-02-06 06:35:27 - ERROR - stderr - 91%|█████████▏| 20522/22434 [20:27:47<1:21:38, 
2.56s/it] +2025-02-06 06:35:27 - ERROR - stderr - +2025-02-06 06:35:27 - ERROR - stderr - +2025-02-06 06:35:27 - INFO - stdout - {'loss': 0.365, 'grad_norm': 1.6149609088897705, 'learning_rate': 3.7858918429711455e-07, 'epoch': 2.74} +2025-02-06 06:35:27 - ERROR - stderr - 91%|█████████▏| 20522/22434 [20:27:47<1:21:38, 2.56s/it] +2025-02-06 06:35:30 - ERROR - stderr - 91%|█████████▏| 20523/22434 [20:27:49<1:20:41, 2.53s/it] +2025-02-06 06:35:30 - ERROR - stderr - +2025-02-06 06:35:30 - ERROR - stderr - +2025-02-06 06:35:30 - INFO - stdout - {'loss': 0.3773, 'grad_norm': 1.4499512910842896, 'learning_rate': 3.7819578868502626e-07, 'epoch': 2.74} +2025-02-06 06:35:30 - ERROR - stderr - 91%|█████████▏| 20523/22434 [20:27:50<1:20:41, 2.53s/it] +2025-02-06 06:35:32 - ERROR - stderr - 91%|█████████▏| 20524/22434 [20:27:52<1:20:26, 2.53s/it] +2025-02-06 06:35:32 - ERROR - stderr - +2025-02-06 06:35:32 - ERROR - stderr - +2025-02-06 06:35:32 - INFO - stdout - {'loss': 0.3532, 'grad_norm': 1.7151280641555786, 'learning_rate': 3.7780259363023983e-07, 'epoch': 2.74} +2025-02-06 06:35:32 - ERROR - stderr - 91%|█████████▏| 20524/22434 [20:27:52<1:20:26, 2.53s/it] +2025-02-06 06:35:35 - ERROR - stderr - 91%|█████████▏| 20525/22434 [20:27:55<1:20:16, 2.52s/it] +2025-02-06 06:35:35 - ERROR - stderr - +2025-02-06 06:35:35 - ERROR - stderr - +2025-02-06 06:35:35 - INFO - stdout - {'loss': 0.3353, 'grad_norm': 1.5275129079818726, 'learning_rate': 3.774095991409521e-07, 'epoch': 2.74} +2025-02-06 06:35:35 - ERROR - stderr - 91%|█████████▏| 20525/22434 [20:27:55<1:20:16, 2.52s/it] +2025-02-06 06:35:37 - ERROR - stderr - 91%|█████████▏| 20526/22434 [20:27:57<1:19:42, 2.51s/it] +2025-02-06 06:35:37 - ERROR - stderr - +2025-02-06 06:35:37 - ERROR - stderr - +2025-02-06 06:35:37 - INFO - stdout - {'loss': 0.336, 'grad_norm': 1.5633188486099243, 'learning_rate': 3.7701680522535087e-07, 'epoch': 2.74} +2025-02-06 06:35:37 - ERROR - stderr - 91%|█████████▏| 20526/22434 [20:27:57<1:19:42, 
2.51s/it] +2025-02-06 06:35:40 - ERROR - stderr - 91%|█████████▏| 20527/22434 [20:28:00<1:22:56, 2.61s/it] +2025-02-06 06:35:40 - ERROR - stderr - +2025-02-06 06:35:40 - ERROR - stderr - +2025-02-06 06:35:40 - INFO - stdout - {'loss': 0.3626, 'grad_norm': 1.5304471254348755, 'learning_rate': 3.7662421189162745e-07, 'epoch': 2.74} +2025-02-06 06:35:40 - ERROR - stderr - 91%|█████████▏| 20527/22434 [20:28:00<1:22:56, 2.61s/it] +2025-02-06 06:35:43 - ERROR - stderr - 92%|█████████▏| 20528/22434 [20:28:02<1:21:24, 2.56s/it] +2025-02-06 06:35:43 - ERROR - stderr - +2025-02-06 06:35:43 - ERROR - stderr - +2025-02-06 06:35:43 - INFO - stdout - {'loss': 0.3448, 'grad_norm': 1.4296998977661133, 'learning_rate': 3.762318191479641e-07, 'epoch': 2.75} +2025-02-06 06:35:43 - ERROR - stderr - 92%|█████████▏| 20528/22434 [20:28:02<1:21:24, 2.56s/it] +2025-02-06 06:35:45 - ERROR - stderr - 92%|█████████▏| 20529/22434 [20:28:05<1:20:26, 2.53s/it] +2025-02-06 06:35:45 - ERROR - stderr - +2025-02-06 06:35:45 - ERROR - stderr - +2025-02-06 06:35:45 - INFO - stdout - {'loss': 0.3867, 'grad_norm': 1.5945011377334595, 'learning_rate': 3.7583962700253774e-07, 'epoch': 2.75} +2025-02-06 06:35:45 - ERROR - stderr - 92%|█████████▏| 20529/22434 [20:28:05<1:20:26, 2.53s/it] +2025-02-06 06:35:47 - ERROR - stderr - 92%|█████████▏| 20530/22434 [20:28:07<1:19:37, 2.51s/it] +2025-02-06 06:35:47 - ERROR - stderr - +2025-02-06 06:35:47 - ERROR - stderr - +2025-02-06 06:35:47 - INFO - stdout - {'loss': 0.3533, 'grad_norm': 1.4329532384872437, 'learning_rate': 3.7544763546352834e-07, 'epoch': 2.75} +2025-02-06 06:35:47 - ERROR - stderr - 92%|█████████▏| 20530/22434 [20:28:07<1:19:37, 2.51s/it] +2025-02-06 06:35:50 - ERROR - stderr - 92%|█████████▏| 20531/22434 [20:28:10<1:22:37, 2.60s/it] +2025-02-06 06:35:50 - ERROR - stderr - +2025-02-06 06:35:50 - ERROR - stderr - +2025-02-06 06:35:50 - INFO - stdout - {'loss': 0.3183, 'grad_norm': 1.4994834661483765, 'learning_rate': 3.750558445390995e-07, 'epoch': 
2.75} +2025-02-06 06:35:50 - ERROR - stderr - 92%|█████████▏| 20531/22434 [20:28:10<1:22:37, 2.60s/it] +2025-02-06 06:35:53 - ERROR - stderr - 92%|█████████▏| 20532/22434 [20:28:13<1:22:27, 2.60s/it] +2025-02-06 06:35:53 - ERROR - stderr - +2025-02-06 06:35:53 - ERROR - stderr - +2025-02-06 06:35:53 - INFO - stdout - {'loss': 0.3537, 'grad_norm': 1.5915946960449219, 'learning_rate': 3.7466425423742457e-07, 'epoch': 2.75} +2025-02-06 06:35:53 - ERROR - stderr - 92%|█████████▏| 20532/22434 [20:28:13<1:22:27, 2.60s/it] +2025-02-06 06:35:55 - ERROR - stderr - 92%|█████████▏| 20533/22434 [20:28:15<1:20:51, 2.55s/it] +2025-02-06 06:35:55 - ERROR - stderr - +2025-02-06 06:35:55 - ERROR - stderr - +2025-02-06 06:35:55 - INFO - stdout - {'loss': 0.3618, 'grad_norm': 1.6245179176330566, 'learning_rate': 3.742728645666616e-07, 'epoch': 2.75} +2025-02-06 06:35:55 - ERROR - stderr - 92%|█████████▏| 20533/22434 [20:28:15<1:20:51, 2.55s/it] +2025-02-06 06:35:58 - ERROR - stderr - 92%|█████████▏| 20534/22434 [20:28:18<1:20:00, 2.53s/it] +2025-02-06 06:35:58 - ERROR - stderr - +2025-02-06 06:35:58 - ERROR - stderr - +2025-02-06 06:35:58 - INFO - stdout - {'loss': 0.371, 'grad_norm': 1.4274048805236816, 'learning_rate': 3.7388167553496944e-07, 'epoch': 2.75} +2025-02-06 06:35:58 - ERROR - stderr - 92%|█████████▏| 20534/22434 [20:28:18<1:20:00, 2.53s/it] +2025-02-06 06:36:00 - ERROR - stderr - 92%|█████████▏| 20535/22434 [20:28:20<1:20:39, 2.55s/it] +2025-02-06 06:36:00 - ERROR - stderr - +2025-02-06 06:36:00 - ERROR - stderr - +2025-02-06 06:36:00 - INFO - stdout - {'loss': 0.3704, 'grad_norm': 1.5425249338150024, 'learning_rate': 3.73490687150504e-07, 'epoch': 2.75} +2025-02-06 06:36:00 - ERROR - stderr - 92%|█████████▏| 20535/22434 [20:28:20<1:20:39, 2.55s/it] +2025-02-06 06:36:03 - ERROR - stderr - 92%|█████████▏| 20536/22434 [20:28:23<1:20:59, 2.56s/it] +2025-02-06 06:36:03 - ERROR - stderr - +2025-02-06 06:36:03 - ERROR - stderr - +2025-02-06 06:36:03 - INFO - stdout - {'loss': 
0.3676, 'grad_norm': 1.4881196022033691, 'learning_rate': 3.73099899421413e-07, 'epoch': 2.75} +2025-02-06 06:36:03 - ERROR - stderr - 92%|█████████▏| 20536/22434 [20:28:23<1:20:59, 2.56s/it] +2025-02-06 06:36:05 - ERROR - stderr - 92%|█████████▏| 20537/22434 [20:28:25<1:20:25, 2.54s/it] +2025-02-06 06:36:05 - ERROR - stderr - +2025-02-06 06:36:05 - ERROR - stderr - +2025-02-06 06:36:05 - INFO - stdout - {'loss': 0.3035, 'grad_norm': 1.426841139793396, 'learning_rate': 3.727093123558423e-07, 'epoch': 2.75} +2025-02-06 06:36:06 - ERROR - stderr - 92%|█████████▏| 20537/22434 [20:28:25<1:20:25, 2.54s/it] +2025-02-06 06:36:08 - ERROR - stderr - 92%|█████████▏| 20538/22434 [20:28:28<1:19:35, 2.52s/it] +2025-02-06 06:36:08 - ERROR - stderr - +2025-02-06 06:36:08 - ERROR - stderr - +2025-02-06 06:36:08 - INFO - stdout - {'loss': 0.3184, 'grad_norm': 1.4708136320114136, 'learning_rate': 3.723189259619331e-07, 'epoch': 2.75} +2025-02-06 06:36:08 - ERROR - stderr - 92%|█████████▏| 20538/22434 [20:28:28<1:19:35, 2.52s/it] +2025-02-06 06:36:10 - ERROR - stderr - 92%|█████████▏| 20539/22434 [20:28:30<1:19:14, 2.51s/it] +2025-02-06 06:36:10 - ERROR - stderr - +2025-02-06 06:36:10 - ERROR - stderr - +2025-02-06 06:36:10 - INFO - stdout - {'loss': 0.3304, 'grad_norm': 1.4729924201965332, 'learning_rate': 3.7192874024782443e-07, 'epoch': 2.75} +2025-02-06 06:36:10 - ERROR - stderr - 92%|█████████▏| 20539/22434 [20:28:30<1:19:14, 2.51s/it] +2025-02-06 06:36:13 - ERROR - stderr - 92%|█████████▏| 20540/22434 [20:28:33<1:19:19, 2.51s/it] +2025-02-06 06:36:13 - ERROR - stderr - +2025-02-06 06:36:13 - ERROR - stderr - +2025-02-06 06:36:13 - INFO - stdout - {'loss': 0.364, 'grad_norm': 1.636109709739685, 'learning_rate': 3.715387552216476e-07, 'epoch': 2.75} +2025-02-06 06:36:13 - ERROR - stderr - 92%|█████████▏| 20540/22434 [20:28:33<1:19:19, 2.51s/it] +2025-02-06 06:36:15 - ERROR - stderr - 92%|█████████▏| 20541/22434 [20:28:35<1:19:07, 2.51s/it] +2025-02-06 06:36:15 - ERROR - stderr - 
+2025-02-06 06:36:15 - ERROR - stderr - +2025-02-06 06:36:15 - INFO - stdout - {'loss': 0.3623, 'grad_norm': 1.568668246269226, 'learning_rate': 3.7114897089153167e-07, 'epoch': 2.75} +2025-02-06 06:36:15 - ERROR - stderr - 92%|█████████▏| 20541/22434 [20:28:35<1:19:07, 2.51s/it] +2025-02-06 06:36:18 - ERROR - stderr - 92%|█████████▏| 20542/22434 [20:28:38<1:19:08, 2.51s/it] +2025-02-06 06:36:18 - ERROR - stderr - +2025-02-06 06:36:18 - ERROR - stderr - +2025-02-06 06:36:18 - INFO - stdout - {'loss': 0.3395, 'grad_norm': 1.4993621110916138, 'learning_rate': 3.7075938726560123e-07, 'epoch': 2.75} +2025-02-06 06:36:18 - ERROR - stderr - 92%|█████████▏| 20542/22434 [20:28:38<1:19:08, 2.51s/it] +2025-02-06 06:36:20 - ERROR - stderr - 92%|█████████▏| 20543/22434 [20:28:40<1:19:04, 2.51s/it] +2025-02-06 06:36:20 - ERROR - stderr - +2025-02-06 06:36:20 - ERROR - stderr - +2025-02-06 06:36:20 - INFO - stdout - {'loss': 0.3875, 'grad_norm': 1.6911479234695435, 'learning_rate': 3.703700043519787e-07, 'epoch': 2.75} +2025-02-06 06:36:20 - ERROR - stderr - 92%|█████████▏| 20543/22434 [20:28:40<1:19:04, 2.51s/it] +2025-02-06 06:36:23 - ERROR - stderr - 92%|█████████▏| 20544/22434 [20:28:43<1:19:05, 2.51s/it] +2025-02-06 06:36:23 - ERROR - stderr - +2025-02-06 06:36:23 - ERROR - stderr - +2025-02-06 06:36:23 - INFO - stdout - {'loss': 0.3652, 'grad_norm': 1.6015174388885498, 'learning_rate': 3.699808221587786e-07, 'epoch': 2.75} +2025-02-06 06:36:23 - ERROR - stderr - 92%|█████████▏| 20544/22434 [20:28:43<1:19:05, 2.51s/it] +2025-02-06 06:36:25 - ERROR - stderr - 92%|█████████▏| 20545/22434 [20:28:45<1:18:51, 2.50s/it] +2025-02-06 06:36:25 - ERROR - stderr - +2025-02-06 06:36:25 - ERROR - stderr - +2025-02-06 06:36:25 - INFO - stdout - {'loss': 0.3435, 'grad_norm': 1.4216365814208984, 'learning_rate': 3.6959184069411123e-07, 'epoch': 2.75} +2025-02-06 06:36:25 - ERROR - stderr - 92%|█████████▏| 20545/22434 [20:28:45<1:18:51, 2.50s/it] +2025-02-06 06:36:28 - ERROR - stderr - 
92%|█████████▏| 20546/22434 [20:28:48<1:18:56, 2.51s/it] +2025-02-06 06:36:28 - ERROR - stderr - +2025-02-06 06:36:28 - ERROR - stderr - +2025-02-06 06:36:28 - INFO - stdout - {'loss': 0.3854, 'grad_norm': 1.7470240592956543, 'learning_rate': 3.6920305996608785e-07, 'epoch': 2.75} +2025-02-06 06:36:28 - ERROR - stderr - 92%|█████████▏| 20546/22434 [20:28:48<1:18:56, 2.51s/it] +2025-02-06 06:36:30 - ERROR - stderr - 92%|█████████▏| 20547/22434 [20:28:50<1:18:20, 2.49s/it] +2025-02-06 06:36:30 - ERROR - stderr - +2025-02-06 06:36:30 - ERROR - stderr - +2025-02-06 06:36:30 - INFO - stdout - {'loss': 0.37, 'grad_norm': 1.5597035884857178, 'learning_rate': 3.6881447998281193e-07, 'epoch': 2.75} +2025-02-06 06:36:30 - ERROR - stderr - 92%|█████████▏| 20547/22434 [20:28:50<1:18:20, 2.49s/it] +2025-02-06 06:36:33 - ERROR - stderr - 92%|█████████▏| 20548/22434 [20:28:53<1:18:54, 2.51s/it] +2025-02-06 06:36:33 - ERROR - stderr - +2025-02-06 06:36:33 - ERROR - stderr - +2025-02-06 06:36:33 - INFO - stdout - {'loss': 0.3532, 'grad_norm': 1.5774924755096436, 'learning_rate': 3.684261007523815e-07, 'epoch': 2.75} +2025-02-06 06:36:33 - ERROR - stderr - 92%|█████████▏| 20548/22434 [20:28:53<1:18:54, 2.51s/it] +2025-02-06 06:36:35 - ERROR - stderr - 92%|█████████▏| 20549/22434 [20:28:55<1:18:53, 2.51s/it] +2025-02-06 06:36:36 - ERROR - stderr - +2025-02-06 06:36:36 - ERROR - stderr - +2025-02-06 06:36:36 - INFO - stdout - {'loss': 0.3656, 'grad_norm': 1.6980552673339844, 'learning_rate': 3.6803792228289337e-07, 'epoch': 2.75} +2025-02-06 06:36:36 - ERROR - stderr - 92%|█████████▏| 20549/22434 [20:28:55<1:18:53, 2.51s/it] +2025-02-06 06:36:38 - ERROR - stderr - 92%|█████████▏| 20550/22434 [20:28:58<1:18:02, 2.49s/it] +2025-02-06 06:36:38 - ERROR - stderr - +2025-02-06 06:36:38 - ERROR - stderr - +2025-02-06 06:36:38 - INFO - stdout - {'loss': 0.3494, 'grad_norm': 1.6803628206253052, 'learning_rate': 3.676499445824355e-07, 'epoch': 2.75} +2025-02-06 06:36:38 - ERROR - stderr - 
92%|█████████▏| 20550/22434 [20:28:58<1:18:02, 2.49s/it] +2025-02-06 06:36:40 - ERROR - stderr - 92%|█████████▏| 20551/22434 [20:29:00<1:17:54, 2.48s/it] +2025-02-06 06:36:40 - ERROR - stderr - +2025-02-06 06:36:40 - ERROR - stderr - +2025-02-06 06:36:40 - INFO - stdout - {'loss': 0.4127, 'grad_norm': 1.789453148841858, 'learning_rate': 3.6726216765910036e-07, 'epoch': 2.75} +2025-02-06 06:36:40 - ERROR - stderr - 92%|█████████▏| 20551/22434 [20:29:00<1:17:54, 2.48s/it] +2025-02-06 06:36:43 - ERROR - stderr - 92%|█████████▏| 20552/22434 [20:29:03<1:18:14, 2.49s/it] +2025-02-06 06:36:43 - ERROR - stderr - +2025-02-06 06:36:43 - ERROR - stderr - +2025-02-06 06:36:43 - INFO - stdout - {'loss': 0.3744, 'grad_norm': 1.4321181774139404, 'learning_rate': 3.6687459152096706e-07, 'epoch': 2.75} +2025-02-06 06:36:43 - ERROR - stderr - 92%|█████████▏| 20552/22434 [20:29:03<1:18:14, 2.49s/it] +2025-02-06 06:36:46 - ERROR - stderr - 92%|█████████▏| 20553/22434 [20:29:05<1:20:12, 2.56s/it] +2025-02-06 06:36:46 - ERROR - stderr - +2025-02-06 06:36:46 - ERROR - stderr - +2025-02-06 06:36:46 - INFO - stdout - {'loss': 0.3551, 'grad_norm': 1.6468758583068848, 'learning_rate': 3.664872161761135e-07, 'epoch': 2.75} +2025-02-06 06:36:46 - ERROR - stderr - 92%|█████████▏| 20553/22434 [20:29:05<1:20:12, 2.56s/it] +2025-02-06 06:36:48 - ERROR - stderr - 92%|█████████▏| 20554/22434 [20:29:08<1:19:26, 2.54s/it] +2025-02-06 06:36:48 - ERROR - stderr - +2025-02-06 06:36:48 - ERROR - stderr - +2025-02-06 06:36:48 - INFO - stdout - {'loss': 0.3758, 'grad_norm': 1.5822397470474243, 'learning_rate': 3.661000416326177e-07, 'epoch': 2.75} +2025-02-06 06:36:48 - ERROR - stderr - 92%|█████████▏| 20554/22434 [20:29:08<1:19:26, 2.54s/it] +2025-02-06 06:36:51 - ERROR - stderr - 92%|█████████▏| 20555/22434 [20:29:10<1:19:37, 2.54s/it] +2025-02-06 06:36:51 - ERROR - stderr - +2025-02-06 06:36:51 - ERROR - stderr - +2025-02-06 06:36:51 - INFO - stdout - {'loss': 0.4132, 'grad_norm': 1.7735439538955688, 
'learning_rate': 3.6571306789854543e-07, 'epoch': 2.75} +2025-02-06 06:36:51 - ERROR - stderr - 92%|█████████▏| 20555/22434 [20:29:10<1:19:37, 2.54s/it] +2025-02-06 06:36:53 - ERROR - stderr - 92%|█████████▏| 20556/22434 [20:29:13<1:19:44, 2.55s/it] +2025-02-06 06:36:53 - ERROR - stderr - +2025-02-06 06:36:53 - ERROR - stderr - +2025-02-06 06:36:53 - INFO - stdout - {'loss': 0.3333, 'grad_norm': 1.4930862188339233, 'learning_rate': 3.6532629498196694e-07, 'epoch': 2.75} +2025-02-06 06:36:53 - ERROR - stderr - 92%|█████████▏| 20556/22434 [20:29:13<1:19:44, 2.55s/it] +2025-02-06 06:36:56 - ERROR - stderr - 92%|█████████▏| 20557/22434 [20:29:15<1:18:39, 2.51s/it] +2025-02-06 06:36:56 - ERROR - stderr - +2025-02-06 06:36:56 - ERROR - stderr - +2025-02-06 06:36:56 - INFO - stdout - {'loss': 0.3648, 'grad_norm': 1.5944476127624512, 'learning_rate': 3.649397228909424e-07, 'epoch': 2.75} +2025-02-06 06:36:56 - ERROR - stderr - 92%|█████████▏| 20557/22434 [20:29:15<1:18:39, 2.51s/it] +2025-02-06 06:36:58 - ERROR - stderr - 92%|█████████▏| 20558/22434 [20:29:18<1:18:23, 2.51s/it] +2025-02-06 06:36:58 - ERROR - stderr - +2025-02-06 06:36:58 - ERROR - stderr - +2025-02-06 06:36:58 - INFO - stdout - {'loss': 0.3795, 'grad_norm': 1.3522071838378906, 'learning_rate': 3.6455335163352977e-07, 'epoch': 2.75} +2025-02-06 06:36:58 - ERROR - stderr - 92%|█████████▏| 20558/22434 [20:29:18<1:18:23, 2.51s/it] +2025-02-06 06:37:01 - ERROR - stderr - 92%|█████████▏| 20559/22434 [20:29:20<1:18:04, 2.50s/it] +2025-02-06 06:37:01 - ERROR - stderr - +2025-02-06 06:37:01 - ERROR - stderr - +2025-02-06 06:37:01 - INFO - stdout - {'loss': 0.3365, 'grad_norm': 1.5602048635482788, 'learning_rate': 3.641671812177816e-07, 'epoch': 2.75} +2025-02-06 06:37:01 - ERROR - stderr - 92%|█████████▏| 20559/22434 [20:29:20<1:18:04, 2.50s/it] +2025-02-06 06:37:03 - ERROR - stderr - 92%|█████████▏| 20560/22434 [20:29:23<1:18:54, 2.53s/it] +2025-02-06 06:37:03 - ERROR - stderr - +2025-02-06 06:37:03 - ERROR - 
stderr - +2025-02-06 06:37:03 - INFO - stdout - {'loss': 0.3471, 'grad_norm': 1.6146314144134521, 'learning_rate': 3.6378121165174806e-07, 'epoch': 2.75} +2025-02-06 06:37:03 - ERROR - stderr - 92%|█████████▏| 20560/22434 [20:29:23<1:18:54, 2.53s/it] +2025-02-06 06:37:06 - ERROR - stderr - 92%|█████████▏| 20561/22434 [20:29:26<1:19:03, 2.53s/it] +2025-02-06 06:37:06 - ERROR - stderr - +2025-02-06 06:37:06 - ERROR - stderr - +2025-02-06 06:37:06 - INFO - stdout - {'loss': 0.3662, 'grad_norm': 1.7265864610671997, 'learning_rate': 3.63395442943475e-07, 'epoch': 2.75} +2025-02-06 06:37:06 - ERROR - stderr - 92%|█████████▏| 20561/22434 [20:29:26<1:19:03, 2.53s/it] +2025-02-06 06:37:08 - ERROR - stderr - 92%|█████████▏| 20562/22434 [20:29:28<1:18:04, 2.50s/it] +2025-02-06 06:37:08 - ERROR - stderr - +2025-02-06 06:37:08 - ERROR - stderr - +2025-02-06 06:37:08 - INFO - stdout - {'loss': 0.3825, 'grad_norm': 1.6512349843978882, 'learning_rate': 3.6300987510100136e-07, 'epoch': 2.75} +2025-02-06 06:37:08 - ERROR - stderr - 92%|█████████▏| 20562/22434 [20:29:28<1:18:04, 2.50s/it] +2025-02-06 06:37:11 - ERROR - stderr - 92%|█████████▏| 20563/22434 [20:29:30<1:18:21, 2.51s/it] +2025-02-06 06:37:11 - ERROR - stderr - +2025-02-06 06:37:11 - ERROR - stderr - +2025-02-06 06:37:11 - INFO - stdout - {'loss': 0.3924, 'grad_norm': 1.515991449356079, 'learning_rate': 3.6262450813236647e-07, 'epoch': 2.75} +2025-02-06 06:37:11 - ERROR - stderr - 92%|█████████▏| 20563/22434 [20:29:31<1:18:21, 2.51s/it] +2025-02-06 06:37:13 - ERROR - stderr - 92%|█████████▏| 20564/22434 [20:29:33<1:17:43, 2.49s/it] +2025-02-06 06:37:13 - ERROR - stderr - +2025-02-06 06:37:13 - ERROR - stderr - +2025-02-06 06:37:13 - INFO - stdout - {'loss': 0.3485, 'grad_norm': 1.4392749071121216, 'learning_rate': 3.6223934204560165e-07, 'epoch': 2.75} +2025-02-06 06:37:13 - ERROR - stderr - 92%|█████████▏| 20564/22434 [20:29:33<1:17:43, 2.49s/it] +2025-02-06 06:37:16 - ERROR - stderr - 92%|█████████▏| 20565/22434 
[20:29:36<1:18:43, 2.53s/it] +2025-02-06 06:37:16 - ERROR - stderr - +2025-02-06 06:37:16 - ERROR - stderr - +2025-02-06 06:37:16 - INFO - stdout - {'loss': 0.3483, 'grad_norm': 1.4407451152801514, 'learning_rate': 3.618543768487348e-07, 'epoch': 2.75} +2025-02-06 06:37:16 - ERROR - stderr - 92%|█████████▏| 20565/22434 [20:29:36<1:18:43, 2.53s/it] +2025-02-06 06:37:18 - ERROR - stderr - 92%|█████████▏| 20566/22434 [20:29:38<1:18:31, 2.52s/it] +2025-02-06 06:37:18 - ERROR - stderr - +2025-02-06 06:37:18 - ERROR - stderr - +2025-02-06 06:37:18 - INFO - stdout - {'loss': 0.3978, 'grad_norm': 1.668703556060791, 'learning_rate': 3.6146961254979187e-07, 'epoch': 2.75} +2025-02-06 06:37:18 - ERROR - stderr - 92%|█████████▏| 20566/22434 [20:29:38<1:18:31, 2.52s/it] +2025-02-06 06:37:21 - ERROR - stderr - 92%|█████████▏| 20567/22434 [20:29:40<1:17:42, 2.50s/it] +2025-02-06 06:37:21 - ERROR - stderr - +2025-02-06 06:37:21 - ERROR - stderr - +2025-02-06 06:37:21 - INFO - stdout - {'loss': 0.3606, 'grad_norm': 1.430204153060913, 'learning_rate': 3.610850491567908e-07, 'epoch': 2.75} +2025-02-06 06:37:21 - ERROR - stderr - 92%|█████████▏| 20567/22434 [20:29:41<1:17:42, 2.50s/it] +2025-02-06 06:37:23 - ERROR - stderr - 92%|█████████▏| 20568/22434 [20:29:43<1:17:35, 2.49s/it] +2025-02-06 06:37:23 - ERROR - stderr - +2025-02-06 06:37:23 - ERROR - stderr - +2025-02-06 06:37:23 - INFO - stdout - {'loss': 0.3908, 'grad_norm': 1.5843908786773682, 'learning_rate': 3.607006866777485e-07, 'epoch': 2.75} +2025-02-06 06:37:23 - ERROR - stderr - 92%|█████████▏| 20568/22434 [20:29:43<1:17:35, 2.49s/it] +2025-02-06 06:37:26 - ERROR - stderr - 92%|█████████▏| 20569/22434 [20:29:46<1:17:45, 2.50s/it] +2025-02-06 06:37:26 - ERROR - stderr - +2025-02-06 06:37:26 - ERROR - stderr - +2025-02-06 06:37:26 - INFO - stdout - {'loss': 0.3521, 'grad_norm': 1.6481750011444092, 'learning_rate': 3.603165251206764e-07, 'epoch': 2.75} +2025-02-06 06:37:26 - ERROR - stderr - 92%|█████████▏| 20569/22434 
[20:29:46<1:17:45, 2.50s/it] +2025-02-06 06:37:28 - ERROR - stderr - 92%|█████████▏| 20570/22434 [20:29:48<1:18:25, 2.52s/it] +2025-02-06 06:37:28 - ERROR - stderr - +2025-02-06 06:37:28 - ERROR - stderr - +2025-02-06 06:37:28 - INFO - stdout - {'loss': 0.3543, 'grad_norm': 1.571509599685669, 'learning_rate': 3.5993256449358474e-07, 'epoch': 2.75} +2025-02-06 06:37:28 - ERROR - stderr - 92%|█████████▏| 20570/22434 [20:29:48<1:18:25, 2.52s/it] +2025-02-06 06:37:31 - ERROR - stderr - 92%|█████████▏| 20571/22434 [20:29:51<1:17:39, 2.50s/it] +2025-02-06 06:37:31 - ERROR - stderr - +2025-02-06 06:37:31 - ERROR - stderr - +2025-02-06 06:37:31 - INFO - stdout - {'loss': 0.315, 'grad_norm': 1.4145193099975586, 'learning_rate': 3.595488048044704e-07, 'epoch': 2.75} +2025-02-06 06:37:31 - ERROR - stderr - 92%|█████████▏| 20571/22434 [20:29:51<1:17:39, 2.50s/it] +2025-02-06 06:37:33 - ERROR - stderr - 92%|█████████▏| 20572/22434 [20:29:53<1:17:40, 2.50s/it] +2025-02-06 06:37:33 - ERROR - stderr - +2025-02-06 06:37:33 - ERROR - stderr - +2025-02-06 06:37:33 - INFO - stdout - {'loss': 0.3427, 'grad_norm': 1.607410192489624, 'learning_rate': 3.591652460613382e-07, 'epoch': 2.75} +2025-02-06 06:37:33 - ERROR - stderr - 92%|█████████▏| 20572/22434 [20:29:53<1:17:40, 2.50s/it] +2025-02-06 06:37:36 - ERROR - stderr - 92%|█████████▏| 20573/22434 [20:29:56<1:19:33, 2.57s/it] +2025-02-06 06:37:36 - ERROR - stderr - +2025-02-06 06:37:36 - ERROR - stderr - +2025-02-06 06:37:36 - INFO - stdout - {'loss': 0.3403, 'grad_norm': 1.6128318309783936, 'learning_rate': 3.5878188827218166e-07, 'epoch': 2.75} +2025-02-06 06:37:36 - ERROR - stderr - 92%|█████████▏| 20573/22434 [20:29:56<1:19:33, 2.57s/it] +2025-02-06 06:37:38 - ERROR - stderr - 92%|█████████▏| 20574/22434 [20:29:58<1:18:37, 2.54s/it] +2025-02-06 06:37:38 - ERROR - stderr - +2025-02-06 06:37:38 - ERROR - stderr - +2025-02-06 06:37:38 - INFO - stdout - {'loss': 0.4169, 'grad_norm': 1.7510581016540527, 'learning_rate': 
3.5839873144498885e-07, 'epoch': 2.75} +2025-02-06 06:37:38 - ERROR - stderr - 92%|█████████▏| 20574/22434 [20:29:58<1:18:37, 2.54s/it] +2025-02-06 06:37:41 - ERROR - stderr - 92%|█████████▏| 20575/22434 [20:30:01<1:18:13, 2.52s/it] +2025-02-06 06:37:41 - ERROR - stderr - +2025-02-06 06:37:41 - ERROR - stderr - +2025-02-06 06:37:41 - INFO - stdout - {'loss': 0.387, 'grad_norm': 1.5919370651245117, 'learning_rate': 3.5801577558775113e-07, 'epoch': 2.75} +2025-02-06 06:37:41 - ERROR - stderr - 92%|█████████▏| 20575/22434 [20:30:01<1:18:13, 2.52s/it] +2025-02-06 06:37:43 - ERROR - stderr - 92%|█████████▏| 20576/22434 [20:30:03<1:18:13, 2.53s/it] +2025-02-06 06:37:44 - ERROR - stderr - +2025-02-06 06:37:44 - ERROR - stderr - +2025-02-06 06:37:44 - INFO - stdout - {'loss': 0.3688, 'grad_norm': 1.7270561456680298, 'learning_rate': 3.576330207084466e-07, 'epoch': 2.75} +2025-02-06 06:37:44 - ERROR - stderr - 92%|█████████▏| 20576/22434 [20:30:03<1:18:13, 2.53s/it] +2025-02-06 06:37:46 - ERROR - stderr - 92%|█████████▏| 20577/22434 [20:30:06<1:17:15, 2.50s/it] +2025-02-06 06:37:46 - ERROR - stderr - +2025-02-06 06:37:46 - ERROR - stderr - +2025-02-06 06:37:46 - INFO - stdout - {'loss': 0.3742, 'grad_norm': 1.581162452697754, 'learning_rate': 3.572504668150556e-07, 'epoch': 2.75} +2025-02-06 06:37:46 - ERROR - stderr - 92%|█████████▏| 20577/22434 [20:30:06<1:17:15, 2.50s/it] +2025-02-06 06:37:48 - ERROR - stderr - 92%|█████████▏| 20578/22434 [20:30:08<1:16:38, 2.48s/it] +2025-02-06 06:37:48 - ERROR - stderr - +2025-02-06 06:37:48 - ERROR - stderr - +2025-02-06 06:37:48 - INFO - stdout - {'loss': 0.3453, 'grad_norm': 1.7202913761138916, 'learning_rate': 3.5686811391555164e-07, 'epoch': 2.75} +2025-02-06 06:37:48 - ERROR - stderr - 92%|█████████▏| 20578/22434 [20:30:08<1:16:38, 2.48s/it] +2025-02-06 06:37:51 - ERROR - stderr - 92%|█████████▏| 20579/22434 [20:30:11<1:16:05, 2.46s/it] +2025-02-06 06:37:51 - ERROR - stderr - +2025-02-06 06:37:51 - ERROR - stderr - +2025-02-06 
06:37:51 - INFO - stdout - {'loss': 0.3603, 'grad_norm': 1.520103096961975, 'learning_rate': 3.564859620179029e-07, 'epoch': 2.75} +2025-02-06 06:37:51 - ERROR - stderr - 92%|█████████▏| 20579/22434 [20:30:11<1:16:05, 2.46s/it] +2025-02-06 06:37:53 - ERROR - stderr - 92%|█████████▏| 20580/22434 [20:30:13<1:15:48, 2.45s/it] +2025-02-06 06:37:53 - ERROR - stderr - +2025-02-06 06:37:53 - ERROR - stderr - +2025-02-06 06:37:53 - INFO - stdout - {'loss': 0.3821, 'grad_norm': 1.6849335432052612, 'learning_rate': 3.5610401113007844e-07, 'epoch': 2.75} +2025-02-06 06:37:53 - ERROR - stderr - 92%|█████████▏| 20580/22434 [20:30:13<1:15:48, 2.45s/it] +2025-02-06 06:37:56 - ERROR - stderr - 92%|█████████▏| 20581/22434 [20:30:15<1:15:40, 2.45s/it] +2025-02-06 06:37:56 - ERROR - stderr - +2025-02-06 06:37:56 - ERROR - stderr - +2025-02-06 06:37:56 - INFO - stdout - {'loss': 0.3103, 'grad_norm': 1.4101080894470215, 'learning_rate': 3.557222612600375e-07, 'epoch': 2.75} +2025-02-06 06:37:56 - ERROR - stderr - 92%|█████████▏| 20581/22434 [20:30:15<1:15:40, 2.45s/it] +2025-02-06 06:37:58 - ERROR - stderr - 92%|█████████▏| 20582/22434 [20:30:18<1:16:21, 2.47s/it] +2025-02-06 06:37:58 - ERROR - stderr - +2025-02-06 06:37:58 - ERROR - stderr - +2025-02-06 06:37:58 - INFO - stdout - {'loss': 0.342, 'grad_norm': 1.4197107553482056, 'learning_rate': 3.55340712415736e-07, 'epoch': 2.75} +2025-02-06 06:37:58 - ERROR - stderr - 92%|█████████▏| 20582/22434 [20:30:18<1:16:21, 2.47s/it] +2025-02-06 06:38:01 - ERROR - stderr - 92%|█████████▏| 20583/22434 [20:30:20<1:16:24, 2.48s/it] +2025-02-06 06:38:01 - ERROR - stderr - +2025-02-06 06:38:01 - ERROR - stderr - +2025-02-06 06:38:01 - INFO - stdout - {'loss': 0.3755, 'grad_norm': 1.6285066604614258, 'learning_rate': 3.549593646051297e-07, 'epoch': 2.75} +2025-02-06 06:38:01 - ERROR - stderr - 92%|█████████▏| 20583/22434 [20:30:20<1:16:24, 2.48s/it] +2025-02-06 06:38:03 - ERROR - stderr - 92%|█████████▏| 20584/22434 [20:30:23<1:16:07, 2.47s/it] 
+2025-02-06 06:38:03 - ERROR - stderr - +2025-02-06 06:38:03 - ERROR - stderr - +2025-02-06 06:38:03 - INFO - stdout - {'loss': 0.4207, 'grad_norm': 1.7666267156600952, 'learning_rate': 3.5457821783616565e-07, 'epoch': 2.75} +2025-02-06 06:38:03 - ERROR - stderr - 92%|█████████▏| 20584/22434 [20:30:23<1:16:07, 2.47s/it] +2025-02-06 06:38:06 - ERROR - stderr - 92%|█████████▏| 20585/22434 [20:30:25<1:15:57, 2.46s/it] +2025-02-06 06:38:06 - ERROR - stderr - +2025-02-06 06:38:06 - ERROR - stderr - +2025-02-06 06:38:06 - INFO - stdout - {'loss': 0.3903, 'grad_norm': 1.6444405317306519, 'learning_rate': 3.5419727211678857e-07, 'epoch': 2.75} +2025-02-06 06:38:06 - ERROR - stderr - 92%|█████████▏| 20585/22434 [20:30:25<1:15:57, 2.46s/it] +2025-02-06 06:38:08 - ERROR - stderr - 92%|█████████▏| 20586/22434 [20:30:28<1:15:36, 2.45s/it] +2025-02-06 06:38:08 - ERROR - stderr - +2025-02-06 06:38:08 - ERROR - stderr - +2025-02-06 06:38:08 - INFO - stdout - {'loss': 0.3631, 'grad_norm': 1.6292647123336792, 'learning_rate': 3.538165274549399e-07, 'epoch': 2.75} +2025-02-06 06:38:08 - ERROR - stderr - 92%|█████████▏| 20586/22434 [20:30:28<1:15:36, 2.45s/it] +2025-02-06 06:38:10 - ERROR - stderr - 92%|█████████▏| 20587/22434 [20:30:30<1:15:36, 2.46s/it] +2025-02-06 06:38:10 - ERROR - stderr - +2025-02-06 06:38:10 - ERROR - stderr - +2025-02-06 06:38:10 - INFO - stdout - {'loss': 0.3613, 'grad_norm': 1.5892674922943115, 'learning_rate': 3.534359838585544e-07, 'epoch': 2.75} +2025-02-06 06:38:10 - ERROR - stderr - 92%|█████████▏| 20587/22434 [20:30:30<1:15:36, 2.46s/it] +2025-02-06 06:38:13 - ERROR - stderr - 92%|█████████▏| 20588/22434 [20:30:33<1:16:29, 2.49s/it] +2025-02-06 06:38:13 - ERROR - stderr - +2025-02-06 06:38:13 - ERROR - stderr - +2025-02-06 06:38:13 - INFO - stdout - {'loss': 0.3554, 'grad_norm': 1.5457539558410645, 'learning_rate': 3.530556413355657e-07, 'epoch': 2.75} +2025-02-06 06:38:13 - ERROR - stderr - 92%|█████████▏| 20588/22434 [20:30:33<1:16:29, 2.49s/it] 
+2025-02-06 06:38:16 - ERROR - stderr - 92%|█████████▏| 20589/22434 [20:30:35<1:17:09, 2.51s/it] +2025-02-06 06:38:16 - ERROR - stderr - +2025-02-06 06:38:16 - ERROR - stderr - +2025-02-06 06:38:16 - INFO - stdout - {'loss': 0.3368, 'grad_norm': 1.4582083225250244, 'learning_rate': 3.52675499893903e-07, 'epoch': 2.75} +2025-02-06 06:38:16 - ERROR - stderr - 92%|█████████▏| 20589/22434 [20:30:35<1:17:09, 2.51s/it] +2025-02-06 06:38:18 - ERROR - stderr - 92%|█████████▏| 20590/22434 [20:30:38<1:16:18, 2.48s/it] +2025-02-06 06:38:18 - ERROR - stderr - +2025-02-06 06:38:18 - ERROR - stderr - +2025-02-06 06:38:18 - INFO - stdout - {'loss': 0.377, 'grad_norm': 1.6187825202941895, 'learning_rate': 3.5229555954148453e-07, 'epoch': 2.75} +2025-02-06 06:38:18 - ERROR - stderr - 92%|█████████▏| 20590/22434 [20:30:38<1:16:18, 2.48s/it] +2025-02-06 06:38:21 - ERROR - stderr - 92%|█████████▏| 20591/22434 [20:30:40<1:17:07, 2.51s/it] +2025-02-06 06:38:21 - ERROR - stderr - +2025-02-06 06:38:21 - ERROR - stderr - +2025-02-06 06:38:21 - INFO - stdout - {'loss': 0.4188, 'grad_norm': 1.8877390623092651, 'learning_rate': 3.5191582028623495e-07, 'epoch': 2.75} +2025-02-06 06:38:21 - ERROR - stderr - 92%|█████████▏| 20591/22434 [20:30:40<1:17:07, 2.51s/it] +2025-02-06 06:38:23 - ERROR - stderr - 92%|█████████▏| 20592/22434 [20:30:43<1:17:01, 2.51s/it] +2025-02-06 06:38:23 - ERROR - stderr - +2025-02-06 06:38:23 - ERROR - stderr - +2025-02-06 06:38:23 - INFO - stdout - {'loss': 0.3883, 'grad_norm': 1.6285940408706665, 'learning_rate': 3.5153628213606795e-07, 'epoch': 2.75} +2025-02-06 06:38:23 - ERROR - stderr - 92%|█████████▏| 20592/22434 [20:30:43<1:17:01, 2.51s/it] +2025-02-06 06:38:26 - ERROR - stderr - 92%|█████████▏| 20593/22434 [20:30:45<1:17:02, 2.51s/it] +2025-02-06 06:38:26 - ERROR - stderr - +2025-02-06 06:38:26 - ERROR - stderr - +2025-02-06 06:38:26 - INFO - stdout - {'loss': 0.3346, 'grad_norm': 1.5363494157791138, 'learning_rate': 3.5115694509889386e-07, 'epoch': 2.75} 
+2025-02-06 06:38:26 - ERROR - stderr - 92%|█████████▏| 20593/22434 [20:30:45<1:17:02, 2.51s/it] +2025-02-06 06:38:28 - ERROR - stderr - 92%|█████████▏| 20594/22434 [20:30:48<1:16:56, 2.51s/it] +2025-02-06 06:38:28 - ERROR - stderr - +2025-02-06 06:38:28 - ERROR - stderr - +2025-02-06 06:38:28 - INFO - stdout - {'loss': 0.3555, 'grad_norm': 1.4802676439285278, 'learning_rate': 3.5077780918262196e-07, 'epoch': 2.75} +2025-02-06 06:38:28 - ERROR - stderr - 92%|█████████▏| 20594/22434 [20:30:48<1:16:56, 2.51s/it] +2025-02-06 06:38:31 - ERROR - stderr - 92%|█████████▏| 20595/22434 [20:30:51<1:18:42, 2.57s/it] +2025-02-06 06:38:31 - ERROR - stderr - +2025-02-06 06:38:31 - ERROR - stderr - +2025-02-06 06:38:31 - INFO - stdout - {'loss': 0.3787, 'grad_norm': 1.7005364894866943, 'learning_rate': 3.503988743951514e-07, 'epoch': 2.75} +2025-02-06 06:38:31 - ERROR - stderr - 92%|█████████▏| 20595/22434 [20:30:51<1:18:42, 2.57s/it] +2025-02-06 06:38:33 - ERROR - stderr - 92%|█████████▏| 20596/22434 [20:30:53<1:17:43, 2.54s/it] +2025-02-06 06:38:33 - ERROR - stderr - +2025-02-06 06:38:33 - ERROR - stderr - +2025-02-06 06:38:33 - INFO - stdout - {'loss': 0.3386, 'grad_norm': 1.5036591291427612, 'learning_rate': 3.500201407443848e-07, 'epoch': 2.75} +2025-02-06 06:38:33 - ERROR - stderr - 92%|█████████▏| 20596/22434 [20:30:53<1:17:43, 2.54s/it] +2025-02-06 06:38:36 - ERROR - stderr - 92%|█████████▏| 20597/22434 [20:30:55<1:17:04, 2.52s/it] +2025-02-06 06:38:36 - ERROR - stderr - +2025-02-06 06:38:36 - ERROR - stderr - +2025-02-06 06:38:36 - INFO - stdout - {'loss': 0.3506, 'grad_norm': 1.5266205072402954, 'learning_rate': 3.4964160823821257e-07, 'epoch': 2.75} +2025-02-06 06:38:36 - ERROR - stderr - 92%|█████████▏| 20597/22434 [20:30:56<1:17:04, 2.52s/it] +2025-02-06 06:38:38 - ERROR - stderr - 92%|█████████▏| 20598/22434 [20:30:58<1:16:48, 2.51s/it] +2025-02-06 06:38:38 - ERROR - stderr - +2025-02-06 06:38:38 - ERROR - stderr - +2025-02-06 06:38:38 - INFO - stdout - {'loss': 
0.4086, 'grad_norm': 1.746596097946167, 'learning_rate': 3.492632768845261e-07, 'epoch': 2.75} +2025-02-06 06:38:38 - ERROR - stderr - 92%|█████████▏| 20598/22434 [20:30:58<1:16:48, 2.51s/it] +2025-02-06 06:38:41 - ERROR - stderr - 92%|█████████▏| 20599/22434 [20:31:00<1:16:29, 2.50s/it] +2025-02-06 06:38:41 - ERROR - stderr - +2025-02-06 06:38:41 - ERROR - stderr - +2025-02-06 06:38:41 - INFO - stdout - {'loss': 0.3201, 'grad_norm': 1.6739288568496704, 'learning_rate': 3.488851466912135e-07, 'epoch': 2.75} +2025-02-06 06:38:41 - ERROR - stderr - 92%|█████████▏| 20599/22434 [20:31:01<1:16:29, 2.50s/it] +2025-02-06 06:38:43 - ERROR - stderr - 92%|█████████▏| 20600/22434 [20:31:03<1:16:29, 2.50s/it] +2025-02-06 06:38:43 - ERROR - stderr - +2025-02-06 06:38:43 - ERROR - stderr - +2025-02-06 06:38:43 - INFO - stdout - {'loss': 0.3064, 'grad_norm': 1.515841007232666, 'learning_rate': 3.4850721766615304e-07, 'epoch': 2.75} +2025-02-06 06:38:43 - ERROR - stderr - 92%|█████████▏| 20600/22434 [20:31:03<1:16:29, 2.50s/it] +2025-02-06 06:38:46 - ERROR - stderr - 92%|█████████▏| 20601/22434 [20:31:05<1:16:27, 2.50s/it] +2025-02-06 06:38:46 - ERROR - stderr - +2025-02-06 06:38:46 - ERROR - stderr - +2025-02-06 06:38:46 - INFO - stdout - {'loss': 0.3715, 'grad_norm': 1.5862118005752563, 'learning_rate': 3.4812948981722716e-07, 'epoch': 2.75} +2025-02-06 06:38:46 - ERROR - stderr - 92%|█████████▏| 20601/22434 [20:31:06<1:16:27, 2.50s/it] +2025-02-06 06:38:48 - ERROR - stderr - 92%|█████████▏| 20602/22434 [20:31:08<1:16:04, 2.49s/it] +2025-02-06 06:38:48 - ERROR - stderr - +2025-02-06 06:38:48 - ERROR - stderr - +2025-02-06 06:38:48 - INFO - stdout - {'loss': 0.3336, 'grad_norm': 1.4882827997207642, 'learning_rate': 3.477519631523041e-07, 'epoch': 2.76} +2025-02-06 06:38:48 - ERROR - stderr - 92%|█████████▏| 20602/22434 [20:31:08<1:16:04, 2.49s/it] +2025-02-06 06:38:51 - ERROR - stderr - 92%|█████████▏| 20603/22434 [20:31:10<1:15:58, 2.49s/it] +2025-02-06 06:38:51 - ERROR - stderr 
- +2025-02-06 06:38:51 - ERROR - stderr - +2025-02-06 06:38:51 - INFO - stdout - {'loss': 0.3035, 'grad_norm': 1.540814757347107, 'learning_rate': 3.4737463767925526e-07, 'epoch': 2.76} +2025-02-06 06:38:51 - ERROR - stderr - 92%|█████████▏| 20603/22434 [20:31:10<1:15:58, 2.49s/it] +2025-02-06 06:38:53 - ERROR - stderr - 92%|█████████▏| 20604/22434 [20:31:13<1:15:28, 2.47s/it] +2025-02-06 06:38:53 - ERROR - stderr - +2025-02-06 06:38:53 - ERROR - stderr - +2025-02-06 06:38:53 - INFO - stdout - {'loss': 0.3902, 'grad_norm': 1.607160210609436, 'learning_rate': 3.4699751340594557e-07, 'epoch': 2.76} +2025-02-06 06:38:53 - ERROR - stderr - 92%|█████████▏| 20604/22434 [20:31:13<1:15:28, 2.47s/it] +2025-02-06 06:38:56 - ERROR - stderr - 92%|█████████▏| 20605/22434 [20:31:15<1:15:44, 2.48s/it] +2025-02-06 06:38:56 - ERROR - stderr - +2025-02-06 06:38:56 - ERROR - stderr - +2025-02-06 06:38:56 - INFO - stdout - {'loss': 0.3233, 'grad_norm': 1.3468674421310425, 'learning_rate': 3.4662059034023644e-07, 'epoch': 2.76} +2025-02-06 06:38:56 - ERROR - stderr - 92%|█████████▏| 20605/22434 [20:31:15<1:15:44, 2.48s/it] +2025-02-06 06:38:58 - ERROR - stderr - 92%|█████████▏| 20606/22434 [20:31:18<1:16:37, 2.52s/it] +2025-02-06 06:38:58 - ERROR - stderr - +2025-02-06 06:38:58 - ERROR - stderr - +2025-02-06 06:38:58 - INFO - stdout - {'loss': 0.3737, 'grad_norm': 1.5717487335205078, 'learning_rate': 3.462438684899827e-07, 'epoch': 2.76} +2025-02-06 06:38:58 - ERROR - stderr - 92%|█████████▏| 20606/22434 [20:31:18<1:16:37, 2.52s/it] +2025-02-06 06:39:01 - ERROR - stderr - 92%|█████████▏| 20607/22434 [20:31:20<1:16:04, 2.50s/it] +2025-02-06 06:39:01 - ERROR - stderr - +2025-02-06 06:39:01 - ERROR - stderr - +2025-02-06 06:39:01 - INFO - stdout - {'loss': 0.3485, 'grad_norm': 1.514543056488037, 'learning_rate': 3.458673478630392e-07, 'epoch': 2.76} +2025-02-06 06:39:01 - ERROR - stderr - 92%|█████████▏| 20607/22434 [20:31:20<1:16:04, 2.50s/it] +2025-02-06 06:39:03 - ERROR - stderr - 
92%|█████████▏| 20608/22434 [20:31:23<1:15:53, 2.49s/it] +2025-02-06 06:39:03 - ERROR - stderr - +2025-02-06 06:39:03 - ERROR - stderr - +2025-02-06 06:39:03 - INFO - stdout - {'loss': 0.4024, 'grad_norm': 1.5631985664367676, 'learning_rate': 3.454910284672519e-07, 'epoch': 2.76} +2025-02-06 06:39:03 - ERROR - stderr - 92%|█████████▏| 20608/22434 [20:31:23<1:15:53, 2.49s/it] +2025-02-06 06:39:06 - ERROR - stderr - 92%|█████████▏| 20609/22434 [20:31:25<1:15:34, 2.48s/it] +2025-02-06 06:39:06 - ERROR - stderr - +2025-02-06 06:39:06 - ERROR - stderr - +2025-02-06 06:39:06 - INFO - stdout - {'loss': 0.3926, 'grad_norm': 1.4613646268844604, 'learning_rate': 3.451149103104656e-07, 'epoch': 2.76} +2025-02-06 06:39:06 - ERROR - stderr - 92%|█████████▏| 20609/22434 [20:31:25<1:15:34, 2.48s/it] +2025-02-06 06:39:08 - ERROR - stderr - 92%|█████████▏| 20610/22434 [20:31:28<1:16:03, 2.50s/it] +2025-02-06 06:39:08 - ERROR - stderr - +2025-02-06 06:39:08 - ERROR - stderr - +2025-02-06 06:39:08 - INFO - stdout - {'loss': 0.384, 'grad_norm': 1.536556601524353, 'learning_rate': 3.4473899340052075e-07, 'epoch': 2.76} +2025-02-06 06:39:08 - ERROR - stderr - 92%|█████████▏| 20610/22434 [20:31:28<1:16:03, 2.50s/it] +2025-02-06 06:39:11 - ERROR - stderr - 92%|█████████▏| 20611/22434 [20:31:30<1:16:17, 2.51s/it] +2025-02-06 06:39:11 - ERROR - stderr - +2025-02-06 06:39:11 - ERROR - stderr - +2025-02-06 06:39:11 - INFO - stdout - {'loss': 0.3778, 'grad_norm': 1.4610799551010132, 'learning_rate': 3.443632777452521e-07, 'epoch': 2.76} +2025-02-06 06:39:11 - ERROR - stderr - 92%|█████████▏| 20611/22434 [20:31:30<1:16:17, 2.51s/it] +2025-02-06 06:39:13 - ERROR - stderr - 92%|█████████▏| 20612/22434 [20:31:33<1:15:54, 2.50s/it] +2025-02-06 06:39:13 - ERROR - stderr - +2025-02-06 06:39:13 - ERROR - stderr - +2025-02-06 06:39:13 - INFO - stdout - {'loss': 0.3518, 'grad_norm': 1.584354043006897, 'learning_rate': 3.439877633524924e-07, 'epoch': 2.76} +2025-02-06 06:39:13 - ERROR - stderr - 
92%|█████████▏| 20612/22434 [20:31:33<1:15:54, 2.50s/it] +2025-02-06 06:39:16 - ERROR - stderr - 92%|█████████▏| 20613/22434 [20:31:35<1:15:32, 2.49s/it] +2025-02-06 06:39:16 - ERROR - stderr - +2025-02-06 06:39:16 - ERROR - stderr - +2025-02-06 06:39:16 - INFO - stdout - {'loss': 0.3575, 'grad_norm': 1.5367207527160645, 'learning_rate': 3.4361245023006864e-07, 'epoch': 2.76} +2025-02-06 06:39:16 - ERROR - stderr - 92%|█████████▏| 20613/22434 [20:31:35<1:15:32, 2.49s/it] +2025-02-06 06:39:18 - ERROR - stderr - 92%|█████████▏| 20614/22434 [20:31:38<1:15:47, 2.50s/it] +2025-02-06 06:39:18 - ERROR - stderr - +2025-02-06 06:39:18 - ERROR - stderr - +2025-02-06 06:39:18 - INFO - stdout - {'loss': 0.3002, 'grad_norm': 1.5644794702529907, 'learning_rate': 3.432373383858001e-07, 'epoch': 2.76} +2025-02-06 06:39:18 - ERROR - stderr - 92%|█████████▏| 20614/22434 [20:31:38<1:15:47, 2.50s/it] +2025-02-06 06:39:21 - ERROR - stderr - 92%|█████████▏| 20615/22434 [20:31:40<1:15:49, 2.50s/it] +2025-02-06 06:39:21 - ERROR - stderr - +2025-02-06 06:39:21 - ERROR - stderr - +2025-02-06 06:39:21 - INFO - stdout - {'loss': 0.3936, 'grad_norm': 1.6107590198516846, 'learning_rate': 3.4286242782751165e-07, 'epoch': 2.76} +2025-02-06 06:39:21 - ERROR - stderr - 92%|█████████▏| 20615/22434 [20:31:40<1:15:49, 2.50s/it] +2025-02-06 06:39:23 - ERROR - stderr - 92%|█████████▏| 20616/22434 [20:31:43<1:16:03, 2.51s/it] +2025-02-06 06:39:23 - ERROR - stderr - +2025-02-06 06:39:23 - ERROR - stderr - +2025-02-06 06:39:23 - INFO - stdout - {'loss': 0.349, 'grad_norm': 1.5187064409255981, 'learning_rate': 3.4248771856301266e-07, 'epoch': 2.76} +2025-02-06 06:39:23 - ERROR - stderr - 92%|█████████▏| 20616/22434 [20:31:43<1:16:03, 2.51s/it] +2025-02-06 06:39:26 - ERROR - stderr - 92%|█████████▏| 20617/22434 [20:31:45<1:15:54, 2.51s/it] +2025-02-06 06:39:26 - ERROR - stderr - +2025-02-06 06:39:26 - ERROR - stderr - +2025-02-06 06:39:26 - INFO - stdout - {'loss': 0.363, 'grad_norm': 1.641087532043457, 
'learning_rate': 3.4211321060011795e-07, 'epoch': 2.76} +2025-02-06 06:39:26 - ERROR - stderr - 92%|█████████▏| 20617/22434 [20:31:45<1:15:54, 2.51s/it] +2025-02-06 06:39:28 - ERROR - stderr - 92%|█████████▏| 20618/22434 [20:31:48<1:16:56, 2.54s/it] +2025-02-06 06:39:28 - ERROR - stderr - +2025-02-06 06:39:28 - ERROR - stderr - +2025-02-06 06:39:28 - INFO - stdout - {'loss': 0.308, 'grad_norm': 1.4647361040115356, 'learning_rate': 3.4173890394663124e-07, 'epoch': 2.76} +2025-02-06 06:39:28 - ERROR - stderr - 92%|█████████▏| 20618/22434 [20:31:48<1:16:56, 2.54s/it] +2025-02-06 06:39:31 - ERROR - stderr - 92%|█████████▏| 20619/22434 [20:31:51<1:16:08, 2.52s/it] +2025-02-06 06:39:31 - ERROR - stderr - +2025-02-06 06:39:31 - ERROR - stderr - +2025-02-06 06:39:31 - INFO - stdout - {'loss': 0.3776, 'grad_norm': 1.7544102668762207, 'learning_rate': 3.413647986103541e-07, 'epoch': 2.76} +2025-02-06 06:39:31 - ERROR - stderr - 92%|█████████▏| 20619/22434 [20:31:51<1:16:08, 2.52s/it] +2025-02-06 06:39:33 - ERROR - stderr - 92%|█████████▏| 20620/22434 [20:31:53<1:15:45, 2.51s/it] +2025-02-06 06:39:33 - ERROR - stderr - +2025-02-06 06:39:33 - ERROR - stderr - +2025-02-06 06:39:33 - INFO - stdout - {'loss': 0.3188, 'grad_norm': 1.5245217084884644, 'learning_rate': 3.4099089459908697e-07, 'epoch': 2.76} +2025-02-06 06:39:33 - ERROR - stderr - 92%|█████████▏| 20620/22434 [20:31:53<1:15:45, 2.51s/it] +2025-02-06 06:39:36 - ERROR - stderr - 92%|█████████▏| 20621/22434 [20:31:56<1:15:51, 2.51s/it] +2025-02-06 06:39:36 - ERROR - stderr - +2025-02-06 06:39:36 - ERROR - stderr - +2025-02-06 06:39:36 - INFO - stdout - {'loss': 0.4136, 'grad_norm': 1.8576616048812866, 'learning_rate': 3.406171919206214e-07, 'epoch': 2.76} +2025-02-06 06:39:36 - ERROR - stderr - 92%|█████████▏| 20621/22434 [20:31:56<1:15:51, 2.51s/it] +2025-02-06 06:39:38 - ERROR - stderr - 92%|█████████▏| 20622/22434 [20:31:58<1:15:35, 2.50s/it] +2025-02-06 06:39:38 - ERROR - stderr - +2025-02-06 06:39:38 - ERROR - 
stderr - +2025-02-06 06:39:38 - INFO - stdout - {'loss': 0.3726, 'grad_norm': 1.659319519996643, 'learning_rate': 3.4024369058274774e-07, 'epoch': 2.76} +2025-02-06 06:39:38 - ERROR - stderr - 92%|█████████▏| 20622/22434 [20:31:58<1:15:35, 2.50s/it] +2025-02-06 06:39:41 - ERROR - stderr - 92%|█████████▏| 20623/22434 [20:32:01<1:15:30, 2.50s/it] +2025-02-06 06:39:41 - ERROR - stderr - +2025-02-06 06:39:41 - ERROR - stderr - +2025-02-06 06:39:41 - INFO - stdout - {'loss': 0.3531, 'grad_norm': 1.5806396007537842, 'learning_rate': 3.398703905932499e-07, 'epoch': 2.76} +2025-02-06 06:39:41 - ERROR - stderr - 92%|█████████▏| 20623/22434 [20:32:01<1:15:30, 2.50s/it] +2025-02-06 06:39:43 - ERROR - stderr - 92%|█████████▏| 20624/22434 [20:32:03<1:16:34, 2.54s/it] +2025-02-06 06:39:43 - ERROR - stderr - +2025-02-06 06:39:43 - ERROR - stderr - +2025-02-06 06:39:43 - INFO - stdout - {'loss': 0.3546, 'grad_norm': 1.554179072380066, 'learning_rate': 3.394972919599093e-07, 'epoch': 2.76} +2025-02-06 06:39:43 - ERROR - stderr - 92%|█████████▏| 20624/22434 [20:32:03<1:16:34, 2.54s/it] +2025-02-06 06:39:46 - ERROR - stderr - 92%|█████████▏| 20625/22434 [20:32:06<1:17:37, 2.57s/it] +2025-02-06 06:39:46 - ERROR - stderr - +2025-02-06 06:39:46 - ERROR - stderr - +2025-02-06 06:39:46 - INFO - stdout - {'loss': 0.323, 'grad_norm': 1.4996528625488281, 'learning_rate': 3.391243946905065e-07, 'epoch': 2.76} +2025-02-06 06:39:46 - ERROR - stderr - 92%|█████████▏| 20625/22434 [20:32:06<1:17:37, 2.57s/it] +2025-02-06 06:39:48 - ERROR - stderr - 92%|█████████▏| 20626/22434 [20:32:08<1:16:29, 2.54s/it] +2025-02-06 06:39:49 - ERROR - stderr - +2025-02-06 06:39:49 - ERROR - stderr - +2025-02-06 06:39:49 - INFO - stdout - {'loss': 0.3404, 'grad_norm': 1.5668412446975708, 'learning_rate': 3.3875169879280966e-07, 'epoch': 2.76} +2025-02-06 06:39:49 - ERROR - stderr - 92%|█████████▏| 20626/22434 [20:32:08<1:16:29, 2.54s/it] +2025-02-06 06:39:51 - ERROR - stderr - 92%|█████████▏| 20627/22434 
[20:32:11<1:15:43, 2.51s/it] +2025-02-06 06:39:51 - ERROR - stderr - +2025-02-06 06:39:51 - ERROR - stderr - +2025-02-06 06:39:51 - INFO - stdout - {'loss': 0.3443, 'grad_norm': 1.6871651411056519, 'learning_rate': 3.3837920427458814e-07, 'epoch': 2.76} +2025-02-06 06:39:51 - ERROR - stderr - 92%|█████████▏| 20627/22434 [20:32:11<1:15:43, 2.51s/it] +2025-02-06 06:39:54 - ERROR - stderr - 92%|█████████▏| 20628/22434 [20:32:13<1:16:22, 2.54s/it] +2025-02-06 06:39:54 - ERROR - stderr - +2025-02-06 06:39:54 - ERROR - stderr - +2025-02-06 06:39:54 - INFO - stdout - {'loss': 0.3681, 'grad_norm': 1.4550955295562744, 'learning_rate': 3.3800691114360794e-07, 'epoch': 2.76} +2025-02-06 06:39:54 - ERROR - stderr - 92%|█████████▏| 20628/22434 [20:32:13<1:16:22, 2.54s/it] +2025-02-06 06:39:56 - ERROR - stderr - 92%|█████████▏| 20629/22434 [20:32:16<1:17:31, 2.58s/it] +2025-02-06 06:39:56 - ERROR - stderr - +2025-02-06 06:39:56 - ERROR - stderr - +2025-02-06 06:39:56 - INFO - stdout - {'loss': 0.341, 'grad_norm': 1.501133680343628, 'learning_rate': 3.376348194076273e-07, 'epoch': 2.76} +2025-02-06 06:39:56 - ERROR - stderr - 92%|█████████▏| 20629/22434 [20:32:16<1:17:31, 2.58s/it] +2025-02-06 06:39:59 - ERROR - stderr - 92%|█████████▏| 20630/22434 [20:32:19<1:17:28, 2.58s/it] +2025-02-06 06:39:59 - ERROR - stderr - +2025-02-06 06:39:59 - ERROR - stderr - +2025-02-06 06:39:59 - INFO - stdout - {'loss': 0.3649, 'grad_norm': 1.6942713260650635, 'learning_rate': 3.372629290744034e-07, 'epoch': 2.76} +2025-02-06 06:39:59 - ERROR - stderr - 92%|█████████▏| 20630/22434 [20:32:19<1:17:28, 2.58s/it] +2025-02-06 06:40:01 - ERROR - stderr - 92%|█████████▏| 20631/22434 [20:32:21<1:16:45, 2.55s/it] +2025-02-06 06:40:01 - ERROR - stderr - +2025-02-06 06:40:01 - ERROR - stderr - +2025-02-06 06:40:01 - INFO - stdout - {'loss': 0.3832, 'grad_norm': 1.62017822265625, 'learning_rate': 3.368912401516877e-07, 'epoch': 2.76} +2025-02-06 06:40:01 - ERROR - stderr - 92%|█████████▏| 20631/22434 
[20:32:21<1:16:45, 2.55s/it] +2025-02-06 06:40:04 - ERROR - stderr - 92%|█████████▏| 20632/22434 [20:32:23<1:15:32, 2.52s/it] +2025-02-06 06:40:04 - ERROR - stderr - +2025-02-06 06:40:04 - ERROR - stderr - +2025-02-06 06:40:04 - INFO - stdout - {'loss': 0.3657, 'grad_norm': 1.5366078615188599, 'learning_rate': 3.3651975264722746e-07, 'epoch': 2.76} +2025-02-06 06:40:04 - ERROR - stderr - 92%|█████████▏| 20632/22434 [20:32:24<1:15:32, 2.52s/it] +2025-02-06 06:40:06 - ERROR - stderr - 92%|█████████▏| 20633/22434 [20:32:26<1:14:40, 2.49s/it] +2025-02-06 06:40:06 - ERROR - stderr - +2025-02-06 06:40:06 - ERROR - stderr - +2025-02-06 06:40:06 - INFO - stdout - {'loss': 0.3318, 'grad_norm': 1.533186674118042, 'learning_rate': 3.361484665687664e-07, 'epoch': 2.76} +2025-02-06 06:40:06 - ERROR - stderr - 92%|█████████▏| 20633/22434 [20:32:26<1:14:40, 2.49s/it] +2025-02-06 06:40:09 - ERROR - stderr - 92%|█████████▏| 20634/22434 [20:32:29<1:18:28, 2.62s/it] +2025-02-06 06:40:09 - ERROR - stderr - +2025-02-06 06:40:09 - ERROR - stderr - +2025-02-06 06:40:09 - INFO - stdout - {'loss': 0.3498, 'grad_norm': 1.4641669988632202, 'learning_rate': 3.3577738192404395e-07, 'epoch': 2.76} +2025-02-06 06:40:09 - ERROR - stderr - 92%|█████████▏| 20634/22434 [20:32:29<1:18:28, 2.62s/it] +2025-02-06 06:40:12 - ERROR - stderr - 92%|█████████▏| 20635/22434 [20:32:31<1:17:16, 2.58s/it] +2025-02-06 06:40:12 - ERROR - stderr - +2025-02-06 06:40:12 - ERROR - stderr - +2025-02-06 06:40:12 - INFO - stdout - {'loss': 0.4082, 'grad_norm': 1.8235427141189575, 'learning_rate': 3.354064987207917e-07, 'epoch': 2.76} +2025-02-06 06:40:12 - ERROR - stderr - 92%|█████████▏| 20635/22434 [20:32:31<1:17:16, 2.58s/it] +2025-02-06 06:40:14 - ERROR - stderr - 92%|█████████▏| 20636/22434 [20:32:34<1:17:49, 2.60s/it] +2025-02-06 06:40:14 - ERROR - stderr - +2025-02-06 06:40:14 - ERROR - stderr - +2025-02-06 06:40:14 - INFO - stdout - {'loss': 0.3898, 'grad_norm': 1.5606300830841064, 'learning_rate': 
3.3503581696674446e-07, 'epoch': 2.76} +2025-02-06 06:40:14 - ERROR - stderr - 92%|█████████▏| 20636/22434 [20:32:34<1:17:49, 2.60s/it] +2025-02-06 06:40:17 - ERROR - stderr - 92%|█████████▏| 20637/22434 [20:32:36<1:16:37, 2.56s/it] +2025-02-06 06:40:17 - ERROR - stderr - +2025-02-06 06:40:17 - ERROR - stderr - +2025-02-06 06:40:17 - INFO - stdout - {'loss': 0.3716, 'grad_norm': 1.6499061584472656, 'learning_rate': 3.346653366696284e-07, 'epoch': 2.76} +2025-02-06 06:40:17 - ERROR - stderr - 92%|█████████▏| 20637/22434 [20:32:36<1:16:37, 2.56s/it] +2025-02-06 06:40:19 - ERROR - stderr - 92%|█████████▏| 20638/22434 [20:32:39<1:16:01, 2.54s/it] +2025-02-06 06:40:19 - ERROR - stderr - +2025-02-06 06:40:19 - ERROR - stderr - +2025-02-06 06:40:19 - INFO - stdout - {'loss': 0.3418, 'grad_norm': 1.5538313388824463, 'learning_rate': 3.3429505783716177e-07, 'epoch': 2.76} +2025-02-06 06:40:19 - ERROR - stderr - 92%|█████████▏| 20638/22434 [20:32:39<1:16:01, 2.54s/it] +2025-02-06 06:40:22 - ERROR - stderr - 92%|█████████▏| 20639/22434 [20:32:41<1:15:55, 2.54s/it] +2025-02-06 06:40:22 - ERROR - stderr - +2025-02-06 06:40:22 - ERROR - stderr - +2025-02-06 06:40:22 - INFO - stdout - {'loss': 0.3407, 'grad_norm': 1.432137131690979, 'learning_rate': 3.3392498047706836e-07, 'epoch': 2.76} +2025-02-06 06:40:22 - ERROR - stderr - 92%|█████████▏| 20639/22434 [20:32:41<1:15:55, 2.54s/it] +2025-02-06 06:40:24 - ERROR - stderr - 92%|█████████▏| 20640/22434 [20:32:44<1:16:01, 2.54s/it] +2025-02-06 06:40:24 - ERROR - stderr - +2025-02-06 06:40:24 - ERROR - stderr - +2025-02-06 06:40:24 - INFO - stdout - {'loss': 0.3674, 'grad_norm': 1.4365860223770142, 'learning_rate': 3.3355510459705754e-07, 'epoch': 2.76} +2025-02-06 06:40:24 - ERROR - stderr - 92%|█████████▏| 20640/22434 [20:32:44<1:16:01, 2.54s/it] +2025-02-06 06:40:27 - ERROR - stderr - 92%|█████████▏| 20641/22434 [20:32:46<1:15:35, 2.53s/it] +2025-02-06 06:40:27 - ERROR - stderr - +2025-02-06 06:40:27 - ERROR - stderr - +2025-02-06 
06:40:27 - INFO - stdout - {'loss': 0.3723, 'grad_norm': 1.6688669919967651, 'learning_rate': 3.331854302048432e-07, 'epoch': 2.76} +2025-02-06 06:40:27 - ERROR - stderr - 92%|█████████▏| 20641/22434 [20:32:47<1:15:35, 2.53s/it] +2025-02-06 06:40:29 - ERROR - stderr - 92%|█████████▏| 20642/22434 [20:32:49<1:16:22, 2.56s/it] +2025-02-06 06:40:29 - ERROR - stderr - +2025-02-06 06:40:29 - ERROR - stderr - +2025-02-06 06:40:29 - INFO - stdout - {'loss': 0.3533, 'grad_norm': 1.6407005786895752, 'learning_rate': 3.328159573081258e-07, 'epoch': 2.76} +2025-02-06 06:40:29 - ERROR - stderr - 92%|█████████▏| 20642/22434 [20:32:49<1:16:22, 2.56s/it] +2025-02-06 06:40:32 - ERROR - stderr - 92%|█████████▏| 20643/22434 [20:32:52<1:16:13, 2.55s/it] +2025-02-06 06:40:32 - ERROR - stderr - +2025-02-06 06:40:32 - ERROR - stderr - +2025-02-06 06:40:32 - INFO - stdout - {'loss': 0.3088, 'grad_norm': 1.3723372220993042, 'learning_rate': 3.3244668591460916e-07, 'epoch': 2.76} +2025-02-06 06:40:32 - ERROR - stderr - 92%|█████████▏| 20643/22434 [20:32:52<1:16:13, 2.55s/it] +2025-02-06 06:40:34 - ERROR - stderr - 92%|█████████▏| 20644/22434 [20:32:54<1:15:52, 2.54s/it] +2025-02-06 06:40:34 - ERROR - stderr - +2025-02-06 06:40:34 - ERROR - stderr - +2025-02-06 06:40:34 - INFO - stdout - {'loss': 0.3999, 'grad_norm': 1.6139580011367798, 'learning_rate': 3.320776160319927e-07, 'epoch': 2.76} +2025-02-06 06:40:34 - ERROR - stderr - 92%|█████████▏| 20644/22434 [20:32:54<1:15:52, 2.54s/it] +2025-02-06 06:40:37 - ERROR - stderr - 92%|█████████▏| 20645/22434 [20:32:57<1:17:18, 2.59s/it] +2025-02-06 06:40:37 - ERROR - stderr - +2025-02-06 06:40:37 - ERROR - stderr - +2025-02-06 06:40:37 - INFO - stdout - {'loss': 0.4059, 'grad_norm': 1.5799717903137207, 'learning_rate': 3.317087476679659e-07, 'epoch': 2.76} +2025-02-06 06:40:37 - ERROR - stderr - 92%|█████████▏| 20645/22434 [20:32:57<1:17:18, 2.59s/it] +2025-02-06 06:40:40 - ERROR - stderr - 92%|█████████▏| 20646/22434 [20:32:59<1:16:53, 2.58s/it] 
+2025-02-06 06:40:40 - ERROR - stderr - +2025-02-06 06:40:40 - ERROR - stderr - +2025-02-06 06:40:40 - INFO - stdout - {'loss': 0.3349, 'grad_norm': 1.4462039470672607, 'learning_rate': 3.3134008083021916e-07, 'epoch': 2.76} +2025-02-06 06:40:40 - ERROR - stderr - 92%|█████████▏| 20646/22434 [20:32:59<1:16:53, 2.58s/it] +2025-02-06 06:40:42 - ERROR - stderr - 92%|█████████▏| 20647/22434 [20:33:02<1:16:32, 2.57s/it] +2025-02-06 06:40:42 - ERROR - stderr - +2025-02-06 06:40:42 - ERROR - stderr - +2025-02-06 06:40:42 - INFO - stdout - {'loss': 0.3509, 'grad_norm': 1.5910276174545288, 'learning_rate': 3.309716155264364e-07, 'epoch': 2.76} +2025-02-06 06:40:42 - ERROR - stderr - 92%|█████████▏| 20647/22434 [20:33:02<1:16:32, 2.57s/it] +2025-02-06 06:40:45 - ERROR - stderr - 92%|█████████▏| 20648/22434 [20:33:04<1:16:03, 2.55s/it] +2025-02-06 06:40:45 - ERROR - stderr - +2025-02-06 06:40:45 - ERROR - stderr - +2025-02-06 06:40:45 - INFO - stdout - {'loss': 0.4012, 'grad_norm': 1.7002403736114502, 'learning_rate': 3.3060335176429703e-07, 'epoch': 2.76} +2025-02-06 06:40:45 - ERROR - stderr - 92%|█████████▏| 20648/22434 [20:33:05<1:16:03, 2.55s/it] +2025-02-06 06:40:47 - ERROR - stderr - 92%|█████████▏| 20649/22434 [20:33:07<1:15:42, 2.54s/it] +2025-02-06 06:40:47 - ERROR - stderr - +2025-02-06 06:40:47 - ERROR - stderr - +2025-02-06 06:40:47 - INFO - stdout - {'loss': 0.3327, 'grad_norm': 1.4837027788162231, 'learning_rate': 3.302352895514793e-07, 'epoch': 2.76} +2025-02-06 06:40:47 - ERROR - stderr - 92%|█████████▏| 20649/22434 [20:33:07<1:15:42, 2.54s/it] +2025-02-06 06:40:50 - ERROR - stderr - 92%|█████████▏| 20650/22434 [20:33:10<1:15:39, 2.54s/it] +2025-02-06 06:40:50 - ERROR - stderr - +2025-02-06 06:40:50 - ERROR - stderr - +2025-02-06 06:40:50 - INFO - stdout - {'loss': 0.3792, 'grad_norm': 1.6668310165405273, 'learning_rate': 3.298674288956538e-07, 'epoch': 2.76} +2025-02-06 06:40:50 - ERROR - stderr - 92%|█████████▏| 20650/22434 [20:33:10<1:15:39, 2.54s/it] 
+2025-02-06 06:40:52 - ERROR - stderr - 92%|█████████▏| 20651/22434 [20:33:12<1:14:56, 2.52s/it] +2025-02-06 06:40:52 - ERROR - stderr - +2025-02-06 06:40:52 - ERROR - stderr - +2025-02-06 06:40:52 - INFO - stdout - {'loss': 0.3199, 'grad_norm': 1.583036184310913, 'learning_rate': 3.2949976980448774e-07, 'epoch': 2.76} +2025-02-06 06:40:52 - ERROR - stderr - 92%|█████████▏| 20651/22434 [20:33:12<1:14:56, 2.52s/it] +2025-02-06 06:40:55 - ERROR - stderr - 92%|█████████▏| 20652/22434 [20:33:15<1:14:50, 2.52s/it] +2025-02-06 06:40:55 - ERROR - stderr - +2025-02-06 06:40:55 - ERROR - stderr - +2025-02-06 06:40:55 - INFO - stdout - {'loss': 0.346, 'grad_norm': 1.4512341022491455, 'learning_rate': 3.2913231228564604e-07, 'epoch': 2.76} +2025-02-06 06:40:55 - ERROR - stderr - 92%|█████████▏| 20652/22434 [20:33:15<1:14:50, 2.52s/it] +2025-02-06 06:40:57 - ERROR - stderr - 92%|█████████▏| 20653/22434 [20:33:17<1:14:40, 2.52s/it] +2025-02-06 06:40:57 - ERROR - stderr - +2025-02-06 06:40:57 - ERROR - stderr - +2025-02-06 06:40:57 - INFO - stdout - {'loss': 0.3893, 'grad_norm': 1.6365638971328735, 'learning_rate': 3.28765056346787e-07, 'epoch': 2.76} +2025-02-06 06:40:57 - ERROR - stderr - 92%|█████████▏| 20653/22434 [20:33:17<1:14:40, 2.52s/it] +2025-02-06 06:41:00 - ERROR - stderr - 92%|█████████▏| 20654/22434 [20:33:20<1:17:50, 2.62s/it] +2025-02-06 06:41:00 - ERROR - stderr - +2025-02-06 06:41:00 - ERROR - stderr - +2025-02-06 06:41:00 - INFO - stdout - {'loss': 0.3955, 'grad_norm': 1.6326844692230225, 'learning_rate': 3.283980019955668e-07, 'epoch': 2.76} +2025-02-06 06:41:00 - ERROR - stderr - 92%|█████████▏| 20654/22434 [20:33:20<1:17:50, 2.62s/it] +2025-02-06 06:41:03 - ERROR - stderr - 92%|█████████▏| 20655/22434 [20:33:22<1:16:11, 2.57s/it] +2025-02-06 06:41:03 - ERROR - stderr - +2025-02-06 06:41:03 - ERROR - stderr - +2025-02-06 06:41:03 - INFO - stdout - {'loss': 0.3775, 'grad_norm': 1.7585816383361816, 'learning_rate': 3.2803114923963377e-07, 'epoch': 2.76} 
+2025-02-06 06:41:03 - ERROR - stderr - 92%|█████████▏| 20655/22434 [20:33:22<1:16:11, 2.57s/it] +2025-02-06 06:41:05 - ERROR - stderr - 92%|█████████▏| 20656/22434 [20:33:25<1:15:15, 2.54s/it] +2025-02-06 06:41:05 - ERROR - stderr - +2025-02-06 06:41:05 - ERROR - stderr - +2025-02-06 06:41:05 - INFO - stdout - {'loss': 0.4253, 'grad_norm': 1.8509804010391235, 'learning_rate': 3.2766449808663836e-07, 'epoch': 2.76} +2025-02-06 06:41:05 - ERROR - stderr - 92%|█████████▏| 20656/22434 [20:33:25<1:15:15, 2.54s/it] +2025-02-06 06:41:08 - ERROR - stderr - 92%|█████████▏| 20657/22434 [20:33:27<1:15:53, 2.56s/it] +2025-02-06 06:41:08 - ERROR - stderr - +2025-02-06 06:41:08 - ERROR - stderr - +2025-02-06 06:41:08 - INFO - stdout - {'loss': 0.3897, 'grad_norm': 1.7923189401626587, 'learning_rate': 3.272980485442201e-07, 'epoch': 2.76} +2025-02-06 06:41:08 - ERROR - stderr - 92%|█████████▏| 20657/22434 [20:33:28<1:15:53, 2.56s/it] +2025-02-06 06:41:10 - ERROR - stderr - 92%|█████████▏| 20658/22434 [20:33:30<1:16:17, 2.58s/it] +2025-02-06 06:41:10 - ERROR - stderr - +2025-02-06 06:41:10 - ERROR - stderr - +2025-02-06 06:41:10 - INFO - stdout - {'loss': 0.3553, 'grad_norm': 1.5577131509780884, 'learning_rate': 3.269318006200195e-07, 'epoch': 2.76} +2025-02-06 06:41:10 - ERROR - stderr - 92%|█████████▏| 20658/22434 [20:33:30<1:16:17, 2.58s/it] +2025-02-06 06:41:13 - ERROR - stderr - 92%|█████████▏| 20659/22434 [20:33:33<1:16:10, 2.57s/it] +2025-02-06 06:41:13 - ERROR - stderr - +2025-02-06 06:41:13 - ERROR - stderr - +2025-02-06 06:41:13 - INFO - stdout - {'loss': 0.3761, 'grad_norm': 1.5973894596099854, 'learning_rate': 3.2656575432166605e-07, 'epoch': 2.76} +2025-02-06 06:41:13 - ERROR - stderr - 92%|█████████▏| 20659/22434 [20:33:33<1:16:10, 2.57s/it] +2025-02-06 06:41:15 - ERROR - stderr - 92%|█████████▏| 20660/22434 [20:33:35<1:16:35, 2.59s/it] +2025-02-06 06:41:16 - ERROR - stderr - +2025-02-06 06:41:16 - ERROR - stderr - +2025-02-06 06:41:16 - INFO - stdout - {'loss': 
0.3443, 'grad_norm': 1.6508179903030396, 'learning_rate': 3.2619990965679695e-07, 'epoch': 2.76} +2025-02-06 06:41:16 - ERROR - stderr - 92%|█████████▏| 20660/22434 [20:33:35<1:16:35, 2.59s/it] +2025-02-06 06:41:18 - ERROR - stderr - 92%|█████████▏| 20661/22434 [20:33:38<1:16:47, 2.60s/it] +2025-02-06 06:41:18 - ERROR - stderr - +2025-02-06 06:41:18 - ERROR - stderr - +2025-02-06 06:41:18 - INFO - stdout - {'loss': 0.3283, 'grad_norm': 1.676206350326538, 'learning_rate': 3.258342666330305e-07, 'epoch': 2.76} +2025-02-06 06:41:18 - ERROR - stderr - 92%|█████████▏| 20661/22434 [20:33:38<1:16:47, 2.60s/it] +2025-02-06 06:41:21 - ERROR - stderr - 92%|█████████▏| 20662/22434 [20:33:40<1:15:13, 2.55s/it] +2025-02-06 06:41:21 - ERROR - stderr - +2025-02-06 06:41:21 - ERROR - stderr - +2025-02-06 06:41:21 - INFO - stdout - {'loss': 0.3687, 'grad_norm': 1.5846302509307861, 'learning_rate': 3.2546882525799294e-07, 'epoch': 2.76} +2025-02-06 06:41:21 - ERROR - stderr - 92%|█████████▏| 20662/22434 [20:33:40<1:15:13, 2.55s/it] +2025-02-06 06:41:23 - ERROR - stderr - 92%|█████████▏| 20663/22434 [20:33:43<1:16:11, 2.58s/it] +2025-02-06 06:41:23 - ERROR - stderr - +2025-02-06 06:41:23 - ERROR - stderr - +2025-02-06 06:41:23 - INFO - stdout - {'loss': 0.3297, 'grad_norm': 1.5683162212371826, 'learning_rate': 3.2510358553930143e-07, 'epoch': 2.76} +2025-02-06 06:41:23 - ERROR - stderr - 92%|█████████▏| 20663/22434 [20:33:43<1:16:11, 2.58s/it] +2025-02-06 06:41:26 - ERROR - stderr - 92%|█████████▏| 20664/22434 [20:33:45<1:15:38, 2.56s/it] +2025-02-06 06:41:26 - ERROR - stderr - +2025-02-06 06:41:26 - ERROR - stderr - +2025-02-06 06:41:26 - INFO - stdout - {'loss': 0.3364, 'grad_norm': 1.611045002937317, 'learning_rate': 3.247385474845655e-07, 'epoch': 2.76} +2025-02-06 06:41:26 - ERROR - stderr - 92%|█████████▏| 20664/22434 [20:33:46<1:15:38, 2.56s/it] +2025-02-06 06:41:28 - ERROR - stderr - 92%|█████████▏| 20665/22434 [20:33:48<1:15:34, 2.56s/it] +2025-02-06 06:41:28 - ERROR - 
stderr - +2025-02-06 06:41:28 - ERROR - stderr - +2025-02-06 06:41:28 - INFO - stdout - {'loss': 0.3588, 'grad_norm': 1.3984476327896118, 'learning_rate': 3.2437371110139895e-07, 'epoch': 2.76} +2025-02-06 06:41:28 - ERROR - stderr - 92%|█████████▏| 20665/22434 [20:33:48<1:15:34, 2.56s/it] +2025-02-06 06:41:31 - ERROR - stderr - 92%|█████████▏| 20666/22434 [20:33:51<1:14:53, 2.54s/it] +2025-02-06 06:41:31 - ERROR - stderr - +2025-02-06 06:41:31 - ERROR - stderr - +2025-02-06 06:41:31 - INFO - stdout - {'loss': 0.3539, 'grad_norm': 1.6490401029586792, 'learning_rate': 3.2400907639740243e-07, 'epoch': 2.76} +2025-02-06 06:41:31 - ERROR - stderr - 92%|█████████▏| 20666/22434 [20:33:51<1:14:53, 2.54s/it] +2025-02-06 06:41:33 - ERROR - stderr - 92%|█████████▏| 20667/22434 [20:33:53<1:15:32, 2.57s/it] +2025-02-06 06:41:33 - ERROR - stderr - +2025-02-06 06:41:33 - ERROR - stderr - +2025-02-06 06:41:33 - INFO - stdout - {'loss': 0.3385, 'grad_norm': 1.5555320978164673, 'learning_rate': 3.236446433801776e-07, 'epoch': 2.76} +2025-02-06 06:41:33 - ERROR - stderr - 92%|█████████▏| 20667/22434 [20:33:53<1:15:32, 2.57s/it] +2025-02-06 06:41:36 - ERROR - stderr - 92%|█████████▏| 20668/22434 [20:33:56<1:14:57, 2.55s/it] +2025-02-06 06:41:36 - ERROR - stderr - +2025-02-06 06:41:36 - ERROR - stderr - +2025-02-06 06:41:36 - INFO - stdout - {'loss': 0.3217, 'grad_norm': 1.4809041023254395, 'learning_rate': 3.232804120573219e-07, 'epoch': 2.76} +2025-02-06 06:41:36 - ERROR - stderr - 92%|█████████▏| 20668/22434 [20:33:56<1:14:57, 2.55s/it] +2025-02-06 06:41:38 - ERROR - stderr - 92%|█████████▏| 20669/22434 [20:33:58<1:13:51, 2.51s/it] +2025-02-06 06:41:38 - ERROR - stderr - +2025-02-06 06:41:38 - ERROR - stderr - +2025-02-06 06:41:38 - INFO - stdout - {'loss': 0.3113, 'grad_norm': 1.4999443292617798, 'learning_rate': 3.2291638243642567e-07, 'epoch': 2.76} +2025-02-06 06:41:38 - ERROR - stderr - 92%|█████████▏| 20669/22434 [20:33:58<1:13:51, 2.51s/it] +2025-02-06 06:41:41 - ERROR -
stderr - 92%|█████████▏| 20670/22434 [20:34:01<1:13:13, 2.49s/it] +2025-02-06 06:41:41 - ERROR - stderr - +2025-02-06 06:41:41 - ERROR - stderr - +2025-02-06 06:41:41 - INFO - stdout - {'loss': 0.3685, 'grad_norm': 1.4797770977020264, 'learning_rate': 3.225525545250774e-07, 'epoch': 2.76} +2025-02-06 06:41:41 - ERROR - stderr - 92%|█████████▏| 20670/22434 [20:34:01<1:13:13, 2.49s/it] +2025-02-06 06:41:43 - ERROR - stderr - 92%|█████████▏| 20671/22434 [20:34:03<1:13:51, 2.51s/it] +2025-02-06 06:41:43 - ERROR - stderr - +2025-02-06 06:41:43 - ERROR - stderr - +2025-02-06 06:41:43 - INFO - stdout - {'loss': 0.3922, 'grad_norm': 1.5110925436019897, 'learning_rate': 3.22188928330861e-07, 'epoch': 2.76} +2025-02-06 06:41:43 - ERROR - stderr - 92%|█████████▏| 20671/22434 [20:34:03<1:13:51, 2.51s/it] +2025-02-06 06:41:46 - ERROR - stderr - 92%|█████████▏| 20672/22434 [20:34:06<1:14:05, 2.52s/it] +2025-02-06 06:41:46 - ERROR - stderr - +2025-02-06 06:41:46 - ERROR - stderr - +2025-02-06 06:41:46 - INFO - stdout - {'loss': 0.3785, 'grad_norm': 1.8774360418319702, 'learning_rate': 3.218255038613549e-07, 'epoch': 2.76} +2025-02-06 06:41:46 - ERROR - stderr - 92%|█████████▏| 20672/22434 [20:34:06<1:14:05, 2.52s/it] +2025-02-06 06:41:48 - ERROR - stderr - 92%|█████████▏| 20673/22434 [20:34:08<1:14:48, 2.55s/it] +2025-02-06 06:41:49 - ERROR - stderr - +2025-02-06 06:41:49 - ERROR - stderr - +2025-02-06 06:41:49 - INFO - stdout - {'loss': 0.376, 'grad_norm': 1.5553300380706787, 'learning_rate': 3.2146228112413637e-07, 'epoch': 2.76} +2025-02-06 06:41:49 - ERROR - stderr - 92%|█████████▏| 20673/22434 [20:34:08<1:14:48, 2.55s/it] +2025-02-06 06:41:51 - ERROR - stderr - 92%|█████████▏| 20674/22434 [20:34:11<1:14:31, 2.54s/it] +2025-02-06 06:41:51 - ERROR - stderr - +2025-02-06 06:41:51 - ERROR - stderr - +2025-02-06 06:41:51 - INFO - stdout - {'loss': 0.3727, 'grad_norm': 1.5218271017074585, 'learning_rate': 3.2109926012677484e-07, 'epoch': 2.76} +2025-02-06 06:41:51 - ERROR - stderr 
- 92%|█████████▏| 20674/22434 [20:34:11<1:14:31, 2.54s/it] +2025-02-06 06:41:53 - ERROR - stderr - 92%|█████████▏| 20675/22434 [20:34:13<1:13:18, 2.50s/it] +2025-02-06 06:41:53 - ERROR - stderr - +2025-02-06 06:41:53 - ERROR - stderr - +2025-02-06 06:41:53 - INFO - stdout - {'loss': 0.3535, 'grad_norm': 1.7428843975067139, 'learning_rate': 3.2073644087683654e-07, 'epoch': 2.76} +2025-02-06 06:41:53 - ERROR - stderr - 92%|█████████▏| 20675/22434 [20:34:13<1:13:18, 2.50s/it] +2025-02-06 06:41:56 - ERROR - stderr - 92%|█████████▏| 20676/22434 [20:34:16<1:14:10, 2.53s/it] +2025-02-06 06:41:56 - ERROR - stderr - +2025-02-06 06:41:56 - ERROR - stderr - +2025-02-06 06:41:56 - INFO - stdout - {'loss': 0.3743, 'grad_norm': 1.7955750226974487, 'learning_rate': 3.203738233818865e-07, 'epoch': 2.76} +2025-02-06 06:41:56 - ERROR - stderr - 92%|█████████▏| 20676/22434 [20:34:16<1:14:10, 2.53s/it] +2025-02-06 06:41:59 - ERROR - stderr - 92%|█████████▏| 20677/22434 [20:34:18<1:14:49, 2.55s/it] +2025-02-06 06:41:59 - ERROR - stderr - +2025-02-06 06:41:59 - ERROR - stderr - +2025-02-06 06:41:59 - INFO - stdout - {'loss': 0.3656, 'grad_norm': 1.3737322092056274, 'learning_rate': 3.200114076494809e-07, 'epoch': 2.77} +2025-02-06 06:41:59 - ERROR - stderr - 92%|█████████▏| 20677/22434 [20:34:18<1:14:49, 2.55s/it] +2025-02-06 06:42:01 - ERROR - stderr - 92%|█████████▏| 20678/22434 [20:34:21<1:13:46, 2.52s/it] +2025-02-06 06:42:01 - ERROR - stderr - +2025-02-06 06:42:01 - ERROR - stderr - +2025-02-06 06:42:01 - INFO - stdout - {'loss': 0.3308, 'grad_norm': 1.570135235786438, 'learning_rate': 3.196491936871748e-07, 'epoch': 2.77} +2025-02-06 06:42:01 - ERROR - stderr - 92%|█████████▏| 20678/22434 [20:34:21<1:13:46, 2.52s/it] +2025-02-06 06:42:04 - ERROR - stderr - 92%|█████████▏| 20679/22434 [20:34:23<1:14:29, 2.55s/it] +2025-02-06 06:42:04 - ERROR - stderr - +2025-02-06 06:42:04 - ERROR - stderr - +2025-02-06 06:42:04 - INFO - stdout - {'loss': 0.3097, 'grad_norm': 1.4440898895263672, 
'learning_rate': 3.1928718150252e-07, 'epoch': 2.77} +2025-02-06 06:42:04 - ERROR - stderr - 92%|█████████▏| 20679/22434 [20:34:24<1:14:29, 2.55s/it] +2025-02-06 06:42:06 - ERROR - stderr - 92%|█████████▏| 20680/22434 [20:34:26<1:13:33, 2.52s/it] +2025-02-06 06:42:06 - ERROR - stderr - +2025-02-06 06:42:06 - ERROR - stderr - +2025-02-06 06:42:06 - INFO - stdout - {'loss': 0.3147, 'grad_norm': 1.4188389778137207, 'learning_rate': 3.189253711030571e-07, 'epoch': 2.77} +2025-02-06 06:42:06 - ERROR - stderr - 92%|█████████▏| 20680/22434 [20:34:26<1:13:33, 2.52s/it] +2025-02-06 06:42:09 - ERROR - stderr - 92%|█████████▏| 20681/22434 [20:34:28<1:13:12, 2.51s/it] +2025-02-06 06:42:09 - ERROR - stderr - +2025-02-06 06:42:09 - ERROR - stderr - +2025-02-06 06:42:09 - INFO - stdout - {'loss': 0.3906, 'grad_norm': 1.6822290420532227, 'learning_rate': 3.1856376249633336e-07, 'epoch': 2.77} +2025-02-06 06:42:09 - ERROR - stderr - 92%|█████████▏| 20681/22434 [20:34:28<1:13:12, 2.51s/it] +2025-02-06 06:42:11 - ERROR - stderr - 92%|█████████▏| 20682/22434 [20:34:31<1:13:00, 2.50s/it] +2025-02-06 06:42:11 - ERROR - stderr - +2025-02-06 06:42:11 - ERROR - stderr - +2025-02-06 06:42:11 - INFO - stdout - {'loss': 0.3409, 'grad_norm': 1.6088173389434814, 'learning_rate': 3.182023556898839e-07, 'epoch': 2.77} +2025-02-06 06:42:11 - ERROR - stderr - 92%|█████████▏| 20682/22434 [20:34:31<1:13:00, 2.50s/it] +2025-02-06 06:42:14 - ERROR - stderr - 92%|█████████▏| 20683/22434 [20:34:33<1:13:23, 2.51s/it] +2025-02-06 06:42:14 - ERROR - stderr - +2025-02-06 06:42:14 - ERROR - stderr - +2025-02-06 06:42:14 - INFO - stdout - {'loss': 0.416, 'grad_norm': 1.6445739269256592, 'learning_rate': 3.1784115069124044e-07, 'epoch': 2.77} +2025-02-06 06:42:14 - ERROR - stderr - 92%|█████████▏| 20683/22434 [20:34:33<1:13:23, 2.51s/it] +2025-02-06 06:42:16 - ERROR - stderr - 92%|█████████▏| 20684/22434 [20:34:36<1:13:16, 2.51s/it] +2025-02-06 06:42:16 - ERROR - stderr - +2025-02-06 06:42:16 - ERROR - stderr - 
+2025-02-06 06:42:16 - INFO - stdout - {'loss': 0.3572, 'grad_norm': 1.6406012773513794, 'learning_rate': 3.1748014750793587e-07, 'epoch': 2.77} +2025-02-06 06:42:16 - ERROR - stderr - 92%|█████████▏| 20684/22434 [20:34:36<1:13:16, 2.51s/it] +2025-02-06 06:42:19 - ERROR - stderr - 92%|█████████▏| 20685/22434 [20:34:38<1:13:14, 2.51s/it] +2025-02-06 06:42:19 - ERROR - stderr - +2025-02-06 06:42:19 - ERROR - stderr - +2025-02-06 06:42:19 - INFO - stdout - {'loss': 0.3583, 'grad_norm': 1.49924898147583, 'learning_rate': 3.1711934614748975e-07, 'epoch': 2.77} +2025-02-06 06:42:19 - ERROR - stderr - 92%|█████████▏| 20685/22434 [20:34:38<1:13:14, 2.51s/it] +2025-02-06 06:42:21 - ERROR - stderr - 92%|█████████▏| 20686/22434 [20:34:41<1:13:17, 2.52s/it] +2025-02-06 06:42:21 - ERROR - stderr - +2025-02-06 06:42:21 - ERROR - stderr - +2025-02-06 06:42:21 - INFO - stdout - {'loss': 0.386, 'grad_norm': 1.590999722480774, 'learning_rate': 3.1675874661742713e-07, 'epoch': 2.77} +2025-02-06 06:42:21 - ERROR - stderr - 92%|█████████▏| 20686/22434 [20:34:41<1:13:17, 2.52s/it] +2025-02-06 06:42:24 - ERROR - stderr - 92%|█████████▏| 20687/22434 [20:34:43<1:12:37, 2.49s/it] +2025-02-06 06:42:24 - ERROR - stderr - +2025-02-06 06:42:24 - ERROR - stderr - +2025-02-06 06:42:24 - INFO - stdout - {'loss': 0.3482, 'grad_norm': 1.647998571395874, 'learning_rate': 3.16398348925262e-07, 'epoch': 2.77} +2025-02-06 06:42:24 - ERROR - stderr - 92%|█████████▏| 20687/22434 [20:34:43<1:12:37, 2.49s/it] +2025-02-06 06:42:26 - ERROR - stderr - 92%|█████████▏| 20688/22434 [20:34:46<1:12:35, 2.49s/it] +2025-02-06 06:42:26 - ERROR - stderr - +2025-02-06 06:42:26 - ERROR - stderr - +2025-02-06 06:42:26 - INFO - stdout - {'loss': 0.3524, 'grad_norm': 1.49367094039917, 'learning_rate': 3.160381530785062e-07, 'epoch': 2.77} +2025-02-06 06:42:26 - ERROR - stderr - 92%|█████████▏| 20688/22434 [20:34:46<1:12:35, 2.49s/it] +2025-02-06 06:42:29 - ERROR - stderr - 92%|█████████▏| 20689/22434 [20:34:48<1:12:42, 
2.50s/it] +2025-02-06 06:42:29 - ERROR - stderr - +2025-02-06 06:42:29 - ERROR - stderr - +2025-02-06 06:42:29 - INFO - stdout - {'loss': 0.3558, 'grad_norm': 1.607854962348938, 'learning_rate': 3.1567815908467023e-07, 'epoch': 2.77} +2025-02-06 06:42:29 - ERROR - stderr - 92%|█████████▏| 20689/22434 [20:34:48<1:12:42, 2.50s/it] +2025-02-06 06:42:31 - ERROR - stderr - 92%|█████████▏| 20690/22434 [20:34:51<1:12:37, 2.50s/it] +2025-02-06 06:42:31 - ERROR - stderr - +2025-02-06 06:42:31 - ERROR - stderr - +2025-02-06 06:42:31 - INFO - stdout - {'loss': 0.3603, 'grad_norm': 1.6258554458618164, 'learning_rate': 3.1531836695125495e-07, 'epoch': 2.77} +2025-02-06 06:42:31 - ERROR - stderr - 92%|█████████▏| 20690/22434 [20:34:51<1:12:37, 2.50s/it] +2025-02-06 06:42:34 - ERROR - stderr - 92%|█████████▏| 20691/22434 [20:34:53<1:12:38, 2.50s/it] +2025-02-06 06:42:34 - ERROR - stderr - +2025-02-06 06:42:34 - ERROR - stderr - +2025-02-06 06:42:34 - INFO - stdout - {'loss': 0.3592, 'grad_norm': 1.5969023704528809, 'learning_rate': 3.149587766857609e-07, 'epoch': 2.77} +2025-02-06 06:42:34 - ERROR - stderr - 92%|█████████▏| 20691/22434 [20:34:53<1:12:38, 2.50s/it] +2025-02-06 06:42:36 - ERROR - stderr - 92%|█████████▏| 20692/22434 [20:34:56<1:12:14, 2.49s/it] +2025-02-06 06:42:36 - ERROR - stderr - +2025-02-06 06:42:36 - ERROR - stderr - +2025-02-06 06:42:36 - INFO - stdout - {'loss': 0.3477, 'grad_norm': 1.6094210147857666, 'learning_rate': 3.1459938829568435e-07, 'epoch': 2.77} +2025-02-06 06:42:36 - ERROR - stderr - 92%|█████████▏| 20692/22434 [20:34:56<1:12:14, 2.49s/it] +2025-02-06 06:42:39 - ERROR - stderr - 92%|█████████▏| 20693/22434 [20:34:58<1:11:53, 2.48s/it] +2025-02-06 06:42:39 - ERROR - stderr - +2025-02-06 06:42:39 - ERROR - stderr - +2025-02-06 06:42:39 - INFO - stdout - {'loss': 0.35, 'grad_norm': 1.5778844356536865, 'learning_rate': 3.142402017885149e-07, 'epoch': 2.77} +2025-02-06 06:42:39 - ERROR - stderr - 92%|█████████▏| 20693/22434 [20:34:58<1:11:53, 
2.48s/it] +2025-02-06 06:42:41 - ERROR - stderr - 92%|█████████▏| 20694/22434 [20:35:01<1:14:35, 2.57s/it] +2025-02-06 06:42:41 - ERROR - stderr - +2025-02-06 06:42:41 - ERROR - stderr - +2025-02-06 06:42:41 - INFO - stdout - {'loss': 0.3664, 'grad_norm': 1.5329524278640747, 'learning_rate': 3.1388121717174093e-07, 'epoch': 2.77} +2025-02-06 06:42:41 - ERROR - stderr - 92%|█████████▏| 20694/22434 [20:35:01<1:14:35, 2.57s/it] +2025-02-06 06:42:44 - ERROR - stderr - 92%|█████████▏| 20695/22434 [20:35:04<1:15:01, 2.59s/it] +2025-02-06 06:42:44 - ERROR - stderr - +2025-02-06 06:42:44 - ERROR - stderr - +2025-02-06 06:42:44 - INFO - stdout - {'loss': 0.3205, 'grad_norm': 1.407728910446167, 'learning_rate': 3.1352243445284425e-07, 'epoch': 2.77} +2025-02-06 06:42:44 - ERROR - stderr - 92%|█████████▏| 20695/22434 [20:35:04<1:15:01, 2.59s/it] +2025-02-06 06:42:46 - ERROR - stderr - 92%|█████████▏| 20696/22434 [20:35:06<1:13:46, 2.55s/it] +2025-02-06 06:42:46 - ERROR - stderr - +2025-02-06 06:42:46 - ERROR - stderr - +2025-02-06 06:42:46 - INFO - stdout - {'loss': 0.3733, 'grad_norm': 1.551501750946045, 'learning_rate': 3.1316385363930223e-07, 'epoch': 2.77} +2025-02-06 06:42:46 - ERROR - stderr - 92%|█████████▏| 20696/22434 [20:35:06<1:13:46, 2.55s/it] +2025-02-06 06:42:49 - ERROR - stderr - 92%|█████████▏| 20697/22434 [20:35:09<1:12:56, 2.52s/it] +2025-02-06 06:42:49 - ERROR - stderr - +2025-02-06 06:42:49 - ERROR - stderr - +2025-02-06 06:42:49 - INFO - stdout - {'loss': 0.3745, 'grad_norm': 1.5153677463531494, 'learning_rate': 3.1280547473859224e-07, 'epoch': 2.77} +2025-02-06 06:42:49 - ERROR - stderr - 92%|█████████▏| 20697/22434 [20:35:09<1:12:56, 2.52s/it] +2025-02-06 06:42:51 - ERROR - stderr - 92%|█████████▏| 20698/22434 [20:35:11<1:13:38, 2.55s/it] +2025-02-06 06:42:52 - ERROR - stderr - +2025-02-06 06:42:52 - ERROR - stderr - +2025-02-06 06:42:52 - INFO - stdout - {'loss': 0.385, 'grad_norm': 1.5307896137237549, 'learning_rate': 3.124472977581827e-07, 'epoch': 
2.77} +2025-02-06 06:42:52 - ERROR - stderr - 92%|█████████▏| 20698/22434 [20:35:11<1:13:38, 2.55s/it] +2025-02-06 06:42:54 - ERROR - stderr - 92%|█████████▏| 20699/22434 [20:35:14<1:13:16, 2.53s/it] +2025-02-06 06:42:54 - ERROR - stderr - +2025-02-06 06:42:54 - ERROR - stderr - +2025-02-06 06:42:54 - INFO - stdout - {'loss': 0.3857, 'grad_norm': 1.4883335828781128, 'learning_rate': 3.120893227055366e-07, 'epoch': 2.77} +2025-02-06 06:42:54 - ERROR - stderr - 92%|█████████▏| 20699/22434 [20:35:14<1:13:16, 2.53s/it] +2025-02-06 06:42:57 - ERROR - stderr - 92%|█████████▏| 20700/22434 [20:35:17<1:16:40, 2.65s/it] +2025-02-06 06:42:57 - ERROR - stderr - +2025-02-06 06:42:57 - ERROR - stderr - +2025-02-06 06:42:57 - INFO - stdout - {'loss': 0.357, 'grad_norm': 1.6086235046386719, 'learning_rate': 3.1173154958812013e-07, 'epoch': 2.77} +2025-02-06 06:42:57 - ERROR - stderr - 92%|█████████▏| 20700/22434 [20:35:17<1:16:40, 2.65s/it] +2025-02-06 06:42:59 - ERROR - stderr - 92%|█████████▏| 20701/22434 [20:35:19<1:14:46, 2.59s/it] +2025-02-06 06:42:59 - ERROR - stderr - +2025-02-06 06:42:59 - ERROR - stderr - +2025-02-06 06:42:59 - INFO - stdout - {'loss': 0.3755, 'grad_norm': 1.6698130369186401, 'learning_rate': 3.1137397841338844e-07, 'epoch': 2.77} +2025-02-06 06:42:59 - ERROR - stderr - 92%|█████████▏| 20701/22434 [20:35:19<1:14:46, 2.59s/it] +2025-02-06 06:43:02 - ERROR - stderr - 92%|█████████▏| 20702/22434 [20:35:22<1:13:48, 2.56s/it] +2025-02-06 06:43:02 - ERROR - stderr - +2025-02-06 06:43:02 - ERROR - stderr - +2025-02-06 06:43:02 - INFO - stdout - {'loss': 0.361, 'grad_norm': 1.6747626066207886, 'learning_rate': 3.110166091887956e-07, 'epoch': 2.77} +2025-02-06 06:43:02 - ERROR - stderr - 92%|█████████▏| 20702/22434 [20:35:22<1:13:48, 2.56s/it] +2025-02-06 06:43:04 - ERROR - stderr - 92%|█████████▏| 20703/22434 [20:35:24<1:13:23, 2.54s/it] +2025-02-06 06:43:04 - ERROR - stderr - +2025-02-06 06:43:04 - ERROR - stderr - +2025-02-06 06:43:04 - INFO - stdout - {'loss': 
0.3565, 'grad_norm': 1.506170630455017, 'learning_rate': 3.106594419217901e-07, 'epoch': 2.77} +2025-02-06 06:43:04 - ERROR - stderr - 92%|█████████▏| 20703/22434 [20:35:24<1:13:23, 2.54s/it] +2025-02-06 06:43:07 - ERROR - stderr - 92%|█████████▏| 20704/22434 [20:35:27<1:13:15, 2.54s/it] +2025-02-06 06:43:07 - ERROR - stderr - +2025-02-06 06:43:07 - ERROR - stderr - +2025-02-06 06:43:07 - INFO - stdout - {'loss': 0.3572, 'grad_norm': 1.6966493129730225, 'learning_rate': 3.1030247661981594e-07, 'epoch': 2.77} +2025-02-06 06:43:07 - ERROR - stderr - 92%|█████████▏| 20704/22434 [20:35:27<1:13:15, 2.54s/it] +2025-02-06 06:43:09 - ERROR - stderr - 92%|█████████▏| 20705/22434 [20:35:29<1:12:52, 2.53s/it] +2025-02-06 06:43:09 - ERROR - stderr - +2025-02-06 06:43:09 - ERROR - stderr - +2025-02-06 06:43:09 - INFO - stdout - {'loss': 0.3994, 'grad_norm': 1.6657495498657227, 'learning_rate': 3.099457132903161e-07, 'epoch': 2.77} +2025-02-06 06:43:09 - ERROR - stderr - 92%|█████████▏| 20705/22434 [20:35:29<1:12:52, 2.53s/it] +2025-02-06 06:43:12 - ERROR - stderr - 92%|█████████▏| 20706/22434 [20:35:32<1:12:13, 2.51s/it] +2025-02-06 06:43:12 - ERROR - stderr - +2025-02-06 06:43:12 - ERROR - stderr - +2025-02-06 06:43:12 - INFO - stdout - {'loss': 0.4118, 'grad_norm': 1.5423030853271484, 'learning_rate': 3.095891519407246e-07, 'epoch': 2.77} +2025-02-06 06:43:12 - ERROR - stderr - 92%|█████████▏| 20706/22434 [20:35:32<1:12:13, 2.51s/it] +2025-02-06 06:43:14 - ERROR - stderr - 92%|█████████▏| 20707/22434 [20:35:34<1:11:57, 2.50s/it] +2025-02-06 06:43:14 - ERROR - stderr - +2025-02-06 06:43:14 - ERROR - stderr - +2025-02-06 06:43:14 - INFO - stdout - {'loss': 0.4392, 'grad_norm': 1.8879189491271973, 'learning_rate': 3.0923279257847436e-07, 'epoch': 2.77} +2025-02-06 06:43:14 - ERROR - stderr - 92%|█████████▏| 20707/22434 [20:35:34<1:11:57, 2.50s/it] +2025-02-06 06:43:17 - ERROR - stderr - 92%|█████████▏| 20708/22434 [20:35:37<1:11:42, 2.49s/it] +2025-02-06 06:43:17 - ERROR - 
stderr - +2025-02-06 06:43:17 - ERROR - stderr - +2025-02-06 06:43:17 - INFO - stdout - {'loss': 0.3766, 'grad_norm': 1.6100877523422241, 'learning_rate': 3.0887663521099397e-07, 'epoch': 2.77} +2025-02-06 06:43:17 - ERROR - stderr - 92%|█████████▏| 20708/22434 [20:35:37<1:11:42, 2.49s/it] +2025-02-06 06:43:19 - ERROR - stderr - 92%|█████████▏| 20709/22434 [20:35:39<1:11:53, 2.50s/it] +2025-02-06 06:43:19 - ERROR - stderr - +2025-02-06 06:43:19 - ERROR - stderr - +2025-02-06 06:43:19 - INFO - stdout - {'loss': 0.3436, 'grad_norm': 1.5506712198257446, 'learning_rate': 3.085206798457052e-07, 'epoch': 2.77} +2025-02-06 06:43:19 - ERROR - stderr - 92%|█████████▏| 20709/22434 [20:35:39<1:11:53, 2.50s/it] +2025-02-06 06:43:22 - ERROR - stderr - 92%|█████████▏| 20710/22434 [20:35:42<1:11:15, 2.48s/it] +2025-02-06 06:43:22 - ERROR - stderr - +2025-02-06 06:43:22 - ERROR - stderr - +2025-02-06 06:43:22 - INFO - stdout - {'loss': 0.3508, 'grad_norm': 1.517378568649292, 'learning_rate': 3.081649264900322e-07, 'epoch': 2.77} +2025-02-06 06:43:22 - ERROR - stderr - 92%|█████████▏| 20710/22434 [20:35:42<1:11:15, 2.48s/it] +2025-02-06 06:43:24 - ERROR - stderr - 92%|█████████▏| 20711/22434 [20:35:44<1:11:34, 2.49s/it] +2025-02-06 06:43:24 - ERROR - stderr - +2025-02-06 06:43:24 - ERROR - stderr - +2025-02-06 06:43:24 - INFO - stdout - {'loss': 0.4299, 'grad_norm': 1.7510780096054077, 'learning_rate': 3.0780937515138444e-07, 'epoch': 2.77} +2025-02-06 06:43:24 - ERROR - stderr - 92%|█████████▏| 20711/22434 [20:35:44<1:11:34, 2.49s/it] +2025-02-06 06:43:27 - ERROR - stderr - 92%|█████████▏| 20712/22434 [20:35:46<1:11:10, 2.48s/it] +2025-02-06 06:43:27 - ERROR - stderr - +2025-02-06 06:43:27 - ERROR - stderr - +2025-02-06 06:43:27 - INFO - stdout - {'loss': 0.4091, 'grad_norm': 1.823799729347229, 'learning_rate': 3.074540258371772e-07, 'epoch': 2.77} +2025-02-06 06:43:27 - ERROR - stderr - 92%|█████████▏| 20712/22434 [20:35:47<1:11:10, 2.48s/it] +2025-02-06 06:43:29 - ERROR - stderr 
- 92%|█████████▏| 20713/22434 [20:35:49<1:11:19, 2.49s/it] +2025-02-06 06:43:29 - ERROR - stderr - +2025-02-06 06:43:29 - ERROR - stderr - +2025-02-06 06:43:29 - INFO - stdout - {'loss': 0.3637, 'grad_norm': 1.5755752325057983, 'learning_rate': 3.070988785548157e-07, 'epoch': 2.77} +2025-02-06 06:43:29 - ERROR - stderr - 92%|█████████▏| 20713/22434 [20:35:49<1:11:19, 2.49s/it] +2025-02-06 06:43:32 - ERROR - stderr - 92%|█████████▏| 20714/22434 [20:35:51<1:11:02, 2.48s/it] +2025-02-06 06:43:32 - ERROR - stderr - +2025-02-06 06:43:32 - ERROR - stderr - +2025-02-06 06:43:32 - INFO - stdout - {'loss': 0.349, 'grad_norm': 1.624715805053711, 'learning_rate': 3.067439333117028e-07, 'epoch': 2.77} +2025-02-06 06:43:32 - ERROR - stderr - 92%|█████████▏| 20714/22434 [20:35:52<1:11:02, 2.48s/it] +2025-02-06 06:43:34 - ERROR - stderr - 92%|█████████▏| 20715/22434 [20:35:54<1:10:45, 2.47s/it] +2025-02-06 06:43:34 - ERROR - stderr - +2025-02-06 06:43:34 - ERROR - stderr - +2025-02-06 06:43:34 - INFO - stdout - {'loss': 0.3282, 'grad_norm': 1.5108314752578735, 'learning_rate': 3.0638919011523714e-07, 'epoch': 2.77} +2025-02-06 06:43:34 - ERROR - stderr - 92%|█████████▏| 20715/22434 [20:35:54<1:10:45, 2.47s/it] +2025-02-06 06:43:37 - ERROR - stderr - 92%|█████████▏| 20716/22434 [20:35:56<1:10:38, 2.47s/it] +2025-02-06 06:43:37 - ERROR - stderr - +2025-02-06 06:43:37 - ERROR - stderr - +2025-02-06 06:43:37 - INFO - stdout - {'loss': 0.3663, 'grad_norm': 1.6149975061416626, 'learning_rate': 3.0603464897281275e-07, 'epoch': 2.77} +2025-02-06 06:43:37 - ERROR - stderr - 92%|█████████▏| 20716/22434 [20:35:56<1:10:38, 2.47s/it] +2025-02-06 06:43:39 - ERROR - stderr - 92%|█████████▏| 20717/22434 [20:35:59<1:11:53, 2.51s/it] +2025-02-06 06:43:39 - ERROR - stderr - +2025-02-06 06:43:39 - ERROR - stderr - +2025-02-06 06:43:39 - INFO - stdout - {'loss': 0.3752, 'grad_norm': 1.6913609504699707, 'learning_rate': 3.0568030989182043e-07, 'epoch': 2.77} +2025-02-06 06:43:39 - ERROR - stderr - 
92%|█████████▏| 20717/22434 [20:35:59<1:11:53, 2.51s/it] +2025-02-06 06:43:42 - ERROR - stderr - 92%|█████████▏| 20718/22434 [20:36:02<1:12:25, 2.53s/it] +2025-02-06 06:43:42 - ERROR - stderr - +2025-02-06 06:43:42 - ERROR - stderr - +2025-02-06 06:43:42 - INFO - stdout - {'loss': 0.4138, 'grad_norm': 1.7315477132797241, 'learning_rate': 3.053261728796464e-07, 'epoch': 2.77} +2025-02-06 06:43:42 - ERROR - stderr - 92%|█████████▏| 20718/22434 [20:36:02<1:12:25, 2.53s/it] +2025-02-06 06:43:44 - ERROR - stderr - 92%|█████████▏| 20719/22434 [20:36:04<1:12:42, 2.54s/it] +2025-02-06 06:43:44 - ERROR - stderr - +2025-02-06 06:43:44 - ERROR - stderr - +2025-02-06 06:43:44 - INFO - stdout - {'loss': 0.385, 'grad_norm': 1.6756341457366943, 'learning_rate': 3.049722379436704e-07, 'epoch': 2.77} +2025-02-06 06:43:44 - ERROR - stderr - 92%|█████████▏| 20719/22434 [20:36:04<1:12:42, 2.54s/it] +2025-02-06 06:43:47 - ERROR - stderr - 92%|█████████▏| 20720/22434 [20:36:07<1:13:00, 2.56s/it] +2025-02-06 06:43:47 - ERROR - stderr - +2025-02-06 06:43:47 - ERROR - stderr - +2025-02-06 06:43:47 - INFO - stdout - {'loss': 0.2987, 'grad_norm': 1.4319431781768799, 'learning_rate': 3.046185050912709e-07, 'epoch': 2.77} +2025-02-06 06:43:47 - ERROR - stderr - 92%|█████████▏| 20720/22434 [20:36:07<1:13:00, 2.56s/it] +2025-02-06 06:43:49 - ERROR - stderr - 92%|█████████▏| 20721/22434 [20:36:09<1:12:35, 2.54s/it] +2025-02-06 06:43:50 - ERROR - stderr - +2025-02-06 06:43:50 - ERROR - stderr - +2025-02-06 06:43:50 - INFO - stdout - {'loss': 0.3756, 'grad_norm': 1.7399311065673828, 'learning_rate': 3.0426497432982207e-07, 'epoch': 2.77} +2025-02-06 06:43:50 - ERROR - stderr - 92%|█████████▏| 20721/22434 [20:36:09<1:12:35, 2.54s/it] +2025-02-06 06:43:52 - ERROR - stderr - 92%|█████████▏| 20722/22434 [20:36:12<1:12:59, 2.56s/it] +2025-02-06 06:43:52 - ERROR - stderr - +2025-02-06 06:43:52 - ERROR - stderr - +2025-02-06 06:43:52 - INFO - stdout - {'loss': 0.3464, 'grad_norm': 1.4651795625686646, 
'learning_rate': 3.039116456666924e-07, 'epoch': 2.77} +2025-02-06 06:43:52 - ERROR - stderr - 92%|█████████▏| 20722/22434 [20:36:12<1:12:59, 2.56s/it] +2025-02-06 06:43:55 - ERROR - stderr - 92%|█████████▏| 20723/22434 [20:36:15<1:14:04, 2.60s/it] +2025-02-06 06:43:55 - ERROR - stderr - +2025-02-06 06:43:55 - ERROR - stderr - +2025-02-06 06:43:55 - INFO - stdout - {'loss': 0.3584, 'grad_norm': 1.5235135555267334, 'learning_rate': 3.035585191092438e-07, 'epoch': 2.77} +2025-02-06 06:43:55 - ERROR - stderr - 92%|█████████▏| 20723/22434 [20:36:15<1:14:04, 2.60s/it] +2025-02-06 06:43:57 - ERROR - stderr - 92%|█████████▏| 20724/22434 [20:36:17<1:12:52, 2.56s/it] +2025-02-06 06:43:57 - ERROR - stderr - +2025-02-06 06:43:57 - ERROR - stderr - +2025-02-06 06:43:57 - INFO - stdout - {'loss': 0.3515, 'grad_norm': 1.7636213302612305, 'learning_rate': 3.0320559466484265e-07, 'epoch': 2.77} +2025-02-06 06:43:57 - ERROR - stderr - 92%|█████████▏| 20724/22434 [20:36:17<1:12:52, 2.56s/it] +2025-02-06 06:44:00 - ERROR - stderr - 92%|█████████▏| 20725/22434 [20:36:19<1:12:25, 2.54s/it] +2025-02-06 06:44:00 - ERROR - stderr - +2025-02-06 06:44:00 - ERROR - stderr - +2025-02-06 06:44:00 - INFO - stdout - {'loss': 0.4314, 'grad_norm': 1.6372895240783691, 'learning_rate': 3.028528723408386e-07, 'epoch': 2.77} +2025-02-06 06:44:00 - ERROR - stderr - 92%|█████████▏| 20725/22434 [20:36:20<1:12:25, 2.54s/it] +2025-02-06 06:44:02 - ERROR - stderr - 92%|█████████▏| 20726/22434 [20:36:22<1:11:50, 2.52s/it] +2025-02-06 06:44:02 - ERROR - stderr - +2025-02-06 06:44:02 - ERROR - stderr - +2025-02-06 06:44:02 - INFO - stdout - {'loss': 0.3796, 'grad_norm': 1.660750150680542, 'learning_rate': 3.025003521445891e-07, 'epoch': 2.77} +2025-02-06 06:44:02 - ERROR - stderr - 92%|█████████▏| 20726/22434 [20:36:22<1:11:50, 2.52s/it] +2025-02-06 06:44:05 - ERROR - stderr - 92%|█████████▏| 20727/22434 [20:36:25<1:12:55, 2.56s/it] +2025-02-06 06:44:05 - ERROR - stderr - +2025-02-06 06:44:05 - ERROR - stderr 
- +2025-02-06 06:44:05 - INFO - stdout - {'loss': 0.3103, 'grad_norm': 1.2646684646606445, 'learning_rate': 3.021480340834415e-07, 'epoch': 2.77} +2025-02-06 06:44:05 - ERROR - stderr - 92%|█████████▏| 20727/22434 [20:36:25<1:12:55, 2.56s/it] +2025-02-06 06:44:07 - ERROR - stderr - 92%|█████████▏| 20728/22434 [20:36:27<1:11:59, 2.53s/it] +2025-02-06 06:44:07 - ERROR - stderr - +2025-02-06 06:44:07 - ERROR - stderr - +2025-02-06 06:44:07 - INFO - stdout - {'loss': 0.3981, 'grad_norm': 1.7003023624420166, 'learning_rate': 3.0179591816473566e-07, 'epoch': 2.77} +2025-02-06 06:44:07 - ERROR - stderr - 92%|█████████▏| 20728/22434 [20:36:27<1:11:59, 2.53s/it] +2025-02-06 06:44:10 - ERROR - stderr - 92%|█████████▏| 20729/22434 [20:36:30<1:11:40, 2.52s/it] +2025-02-06 06:44:10 - ERROR - stderr - +2025-02-06 06:44:10 - ERROR - stderr - +2025-02-06 06:44:10 - INFO - stdout - {'loss': 0.3958, 'grad_norm': 1.6878200769424438, 'learning_rate': 3.014440043958167e-07, 'epoch': 2.77} +2025-02-06 06:44:10 - ERROR - stderr - 92%|█████████▏| 20729/22434 [20:36:30<1:11:40, 2.52s/it] +2025-02-06 06:44:12 - ERROR - stderr - 92%|█████████▏| 20730/22434 [20:36:32<1:11:40, 2.52s/it] +2025-02-06 06:44:12 - ERROR - stderr - +2025-02-06 06:44:12 - ERROR - stderr - +2025-02-06 06:44:12 - INFO - stdout - {'loss': 0.3427, 'grad_norm': 1.5842031240463257, 'learning_rate': 3.010922927840154e-07, 'epoch': 2.77} +2025-02-06 06:44:12 - ERROR - stderr - 92%|█████████▏| 20730/22434 [20:36:32<1:11:40, 2.52s/it] +2025-02-06 06:44:15 - ERROR - stderr - 92%|█████████▏| 20731/22434 [20:36:35<1:12:36, 2.56s/it] +2025-02-06 06:44:15 - ERROR - stderr - +2025-02-06 06:44:15 - ERROR - stderr - +2025-02-06 06:44:15 - INFO - stdout - {'loss': 0.3293, 'grad_norm': 1.4597207307815552, 'learning_rate': 3.007407833366638e-07, 'epoch': 2.77} +2025-02-06 06:44:15 - ERROR - stderr - 92%|█████████▏| 20731/22434 [20:36:35<1:12:36, 2.56s/it] +2025-02-06 06:44:17 - ERROR - stderr - 92%|█████████▏| 20732/22434 
[20:36:37<1:12:04, 2.54s/it] +2025-02-06 06:44:18 - ERROR - stderr - +2025-02-06 06:44:18 - ERROR - stderr - +2025-02-06 06:44:18 - INFO - stdout - {'loss': 0.3533, 'grad_norm': 1.5233381986618042, 'learning_rate': 3.0038947606109036e-07, 'epoch': 2.77} +2025-02-06 06:44:18 - ERROR - stderr - 92%|█████████▏| 20732/22434 [20:36:37<1:12:04, 2.54s/it] +2025-02-06 06:44:20 - ERROR - stderr - 92%|█████████▏| 20733/22434 [20:36:40<1:12:04, 2.54s/it] +2025-02-06 06:44:20 - ERROR - stderr - +2025-02-06 06:44:20 - ERROR - stderr - +2025-02-06 06:44:20 - INFO - stdout - {'loss': 0.3398, 'grad_norm': 1.5119308233261108, 'learning_rate': 3.00038370964616e-07, 'epoch': 2.77} +2025-02-06 06:44:20 - ERROR - stderr - 92%|█████████▏| 20733/22434 [20:36:40<1:12:04, 2.54s/it] +2025-02-06 06:44:22 - ERROR - stderr - 92%|█████████▏| 20734/22434 [20:36:42<1:11:14, 2.51s/it] +2025-02-06 06:44:23 - ERROR - stderr - +2025-02-06 06:44:23 - ERROR - stderr - +2025-02-06 06:44:23 - INFO - stdout - {'loss': 0.3955, 'grad_norm': 1.579413890838623, 'learning_rate': 2.996874680545603e-07, 'epoch': 2.77} +2025-02-06 06:44:23 - ERROR - stderr - 92%|█████████▏| 20734/22434 [20:36:42<1:11:14, 2.51s/it] +2025-02-06 06:44:25 - ERROR - stderr - 92%|█████████▏| 20735/22434 [20:36:45<1:12:30, 2.56s/it] +2025-02-06 06:44:25 - ERROR - stderr - +2025-02-06 06:44:25 - ERROR - stderr - +2025-02-06 06:44:25 - INFO - stdout - {'loss': 0.3731, 'grad_norm': 1.5960884094238281, 'learning_rate': 2.9933676733823747e-07, 'epoch': 2.77} +2025-02-06 06:44:25 - ERROR - stderr - 92%|█████████▏| 20735/22434 [20:36:45<1:12:30, 2.56s/it] +2025-02-06 06:44:28 - ERROR - stderr - 92%|█████████▏| 20736/22434 [20:36:47<1:12:15, 2.55s/it] +2025-02-06 06:44:28 - ERROR - stderr - +2025-02-06 06:44:28 - ERROR - stderr - +2025-02-06 06:44:28 - INFO - stdout - {'loss': 0.3805, 'grad_norm': 1.64297616481781, 'learning_rate': 2.989862688229572e-07, 'epoch': 2.77} +2025-02-06 06:44:28 - ERROR - stderr - 92%|█████████▏| 20736/22434 
[20:36:47<1:12:15, 2.55s/it] +2025-02-06 06:44:30 - ERROR - stderr - 92%|█████████▏| 20737/22434 [20:36:50<1:11:53, 2.54s/it] +2025-02-06 06:44:30 - ERROR - stderr - +2025-02-06 06:44:30 - ERROR - stderr - +2025-02-06 06:44:30 - INFO - stdout - {'loss': 0.3439, 'grad_norm': 1.5175938606262207, 'learning_rate': 2.9863597251602484e-07, 'epoch': 2.77} +2025-02-06 06:44:30 - ERROR - stderr - 92%|█████████▏| 20737/22434 [20:36:50<1:11:53, 2.54s/it] +2025-02-06 06:44:33 - ERROR - stderr - 92%|█████████▏| 20738/22434 [20:36:52<1:10:46, 2.50s/it] +2025-02-06 06:44:33 - ERROR - stderr - +2025-02-06 06:44:33 - ERROR - stderr - +2025-02-06 06:44:33 - INFO - stdout - {'loss': 0.3897, 'grad_norm': 1.6994036436080933, 'learning_rate': 2.982858784247422e-07, 'epoch': 2.77} +2025-02-06 06:44:33 - ERROR - stderr - 92%|█████████▏| 20738/22434 [20:36:52<1:10:46, 2.50s/it] +2025-02-06 06:44:35 - ERROR - stderr - 92%|█████████▏| 20739/22434 [20:36:55<1:10:56, 2.51s/it] +2025-02-06 06:44:35 - ERROR - stderr - +2025-02-06 06:44:35 - ERROR - stderr - +2025-02-06 06:44:35 - INFO - stdout - {'loss': 0.3949, 'grad_norm': 1.3891130685806274, 'learning_rate': 2.9793598655640687e-07, 'epoch': 2.77} +2025-02-06 06:44:35 - ERROR - stderr - 92%|█████████▏| 20739/22434 [20:36:55<1:10:56, 2.51s/it] +2025-02-06 06:44:38 - ERROR - stderr - 92%|█████████▏| 20740/22434 [20:36:57<1:10:56, 2.51s/it] +2025-02-06 06:44:38 - ERROR - stderr - +2025-02-06 06:44:38 - ERROR - stderr - +2025-02-06 06:44:38 - INFO - stdout - {'loss': 0.3843, 'grad_norm': 1.550123691558838, 'learning_rate': 2.9758629691831296e-07, 'epoch': 2.77} +2025-02-06 06:44:38 - ERROR - stderr - 92%|█████████▏| 20740/22434 [20:36:57<1:10:56, 2.51s/it] +2025-02-06 06:44:40 - ERROR - stderr - 92%|█████████▏| 20741/22434 [20:37:00<1:10:43, 2.51s/it] +2025-02-06 06:44:40 - ERROR - stderr - +2025-02-06 06:44:40 - ERROR - stderr - +2025-02-06 06:44:40 - INFO - stdout - {'loss': 0.3394, 'grad_norm': 1.4373564720153809, 'learning_rate': 
2.9723680951774804e-07, 'epoch': 2.77} +2025-02-06 06:44:40 - ERROR - stderr - 92%|█████████▏| 20741/22434 [20:37:00<1:10:43, 2.51s/it] +2025-02-06 06:44:43 - ERROR - stderr - 92%|█████████▏| 20742/22434 [20:37:03<1:13:24, 2.60s/it] +2025-02-06 06:44:43 - ERROR - stderr - +2025-02-06 06:44:43 - ERROR - stderr - +2025-02-06 06:44:43 - INFO - stdout - {'loss': 0.3154, 'grad_norm': 1.5844159126281738, 'learning_rate': 2.968875243619962e-07, 'epoch': 2.77} +2025-02-06 06:44:43 - ERROR - stderr - 92%|█████████▏| 20742/22434 [20:37:03<1:13:24, 2.60s/it] +2025-02-06 06:44:46 - ERROR - stderr - 92%|█████████▏| 20743/22434 [20:37:05<1:13:08, 2.60s/it] +2025-02-06 06:44:46 - ERROR - stderr - +2025-02-06 06:44:46 - ERROR - stderr - +2025-02-06 06:44:46 - INFO - stdout - {'loss': 0.3537, 'grad_norm': 1.5807666778564453, 'learning_rate': 2.9653844145834164e-07, 'epoch': 2.77} +2025-02-06 06:44:46 - ERROR - stderr - 92%|█████████▏| 20743/22434 [20:37:05<1:13:08, 2.60s/it] +2025-02-06 06:44:48 - ERROR - stderr - 92%|█████████▏| 20744/22434 [20:37:08<1:12:25, 2.57s/it] +2025-02-06 06:44:48 - ERROR - stderr - +2025-02-06 06:44:48 - ERROR - stderr - +2025-02-06 06:44:48 - INFO - stdout - {'loss': 0.3465, 'grad_norm': 1.3800562620162964, 'learning_rate': 2.9618956081405525e-07, 'epoch': 2.77} +2025-02-06 06:44:48 - ERROR - stderr - 92%|█████████▏| 20744/22434 [20:37:08<1:12:25, 2.57s/it] +2025-02-06 06:44:51 - ERROR - stderr - 92%|█████████▏| 20745/22434 [20:37:11<1:14:31, 2.65s/it] +2025-02-06 06:44:51 - ERROR - stderr - +2025-02-06 06:44:51 - ERROR - stderr - +2025-02-06 06:44:51 - INFO - stdout - {'loss': 0.3532, 'grad_norm': 1.4731806516647339, 'learning_rate': 2.958408824364134e-07, 'epoch': 2.77} +2025-02-06 06:44:51 - ERROR - stderr - 92%|█████████▏| 20745/22434 [20:37:11<1:14:31, 2.65s/it] +2025-02-06 06:44:53 - ERROR - stderr - 92%|█████████▏| 20746/22434 [20:37:13<1:13:04, 2.60s/it] +2025-02-06 06:44:53 - ERROR - stderr - +2025-02-06 06:44:53 - ERROR - stderr - +2025-02-06 
06:44:53 - INFO - stdout - {'loss': 0.3661, 'grad_norm': 1.5849946737289429, 'learning_rate': 2.954924063326814e-07, 'epoch': 2.77} +2025-02-06 06:44:53 - ERROR - stderr - 92%|█████████▏| 20746/22434 [20:37:13<1:13:04, 2.60s/it] +2025-02-06 06:44:56 - ERROR - stderr - 92%|█████████▏| 20747/22434 [20:37:16<1:11:36, 2.55s/it] +2025-02-06 06:44:56 - ERROR - stderr - +2025-02-06 06:44:56 - ERROR - stderr - +2025-02-06 06:44:56 - INFO - stdout - {'loss': 0.417, 'grad_norm': 1.7389240264892578, 'learning_rate': 2.9514413251012563e-07, 'epoch': 2.77} +2025-02-06 06:44:56 - ERROR - stderr - 92%|█████████▏| 20747/22434 [20:37:16<1:11:36, 2.55s/it] +2025-02-06 06:44:58 - ERROR - stderr - 92%|█████████▏| 20748/22434 [20:37:18<1:11:12, 2.53s/it] +2025-02-06 06:44:58 - ERROR - stderr - +2025-02-06 06:44:58 - ERROR - stderr - +2025-02-06 06:44:58 - INFO - stdout - {'loss': 0.3771, 'grad_norm': 1.4546507596969604, 'learning_rate': 2.947960609760037e-07, 'epoch': 2.77} +2025-02-06 06:44:58 - ERROR - stderr - 92%|█████████▏| 20748/22434 [20:37:18<1:11:12, 2.53s/it] +2025-02-06 06:45:01 - ERROR - stderr - 92%|█████████▏| 20749/22434 [20:37:21<1:10:47, 2.52s/it] +2025-02-06 06:45:01 - ERROR - stderr - +2025-02-06 06:45:01 - ERROR - stderr - +2025-02-06 06:45:01 - INFO - stdout - {'loss': 0.3591, 'grad_norm': 1.4727009534835815, 'learning_rate': 2.9444819173756966e-07, 'epoch': 2.77} +2025-02-06 06:45:01 - ERROR - stderr - 92%|█████████▏| 20749/22434 [20:37:21<1:10:47, 2.52s/it] +2025-02-06 06:45:03 - ERROR - stderr - 92%|█████████▏| 20750/22434 [20:37:23<1:11:46, 2.56s/it] +2025-02-06 06:45:03 - ERROR - stderr - +2025-02-06 06:45:03 - ERROR - stderr - +2025-02-06 06:45:03 - INFO - stdout - {'loss': 0.331, 'grad_norm': 1.6057275533676147, 'learning_rate': 2.9410052480207674e-07, 'epoch': 2.77} +2025-02-06 06:45:03 - ERROR - stderr - 92%|█████████▏| 20750/22434 [20:37:23<1:11:46, 2.56s/it] +2025-02-06 06:45:06 - ERROR - stderr - 92%|█████████▏| 20751/22434 [20:37:26<1:10:57, 2.53s/it] 
+2025-02-06 06:45:06 - ERROR - stderr - +2025-02-06 06:45:06 - ERROR - stderr - +2025-02-06 06:45:06 - INFO - stdout - {'loss': 0.3379, 'grad_norm': 1.4012356996536255, 'learning_rate': 2.937530601767713e-07, 'epoch': 2.77} +2025-02-06 06:45:06 - ERROR - stderr - 92%|█████████▏| 20751/22434 [20:37:26<1:10:57, 2.53s/it] +2025-02-06 06:45:08 - ERROR - stderr - 93%|█████████▎| 20752/22434 [20:37:28<1:10:07, 2.50s/it] +2025-02-06 06:45:08 - ERROR - stderr - +2025-02-06 06:45:08 - ERROR - stderr - +2025-02-06 06:45:08 - INFO - stdout - {'loss': 0.3537, 'grad_norm': 1.4955625534057617, 'learning_rate': 2.934057978688942e-07, 'epoch': 2.78} +2025-02-06 06:45:08 - ERROR - stderr - 93%|█████████▎| 20752/22434 [20:37:28<1:10:07, 2.50s/it] +2025-02-06 06:45:11 - ERROR - stderr - 93%|█████████▎| 20753/22434 [20:37:31<1:10:47, 2.53s/it] +2025-02-06 06:45:11 - ERROR - stderr - +2025-02-06 06:45:11 - ERROR - stderr - +2025-02-06 06:45:11 - INFO - stdout - {'loss': 0.3898, 'grad_norm': 1.6552820205688477, 'learning_rate': 2.9305873788568637e-07, 'epoch': 2.78} +2025-02-06 06:45:11 - ERROR - stderr - 93%|█████████▎| 20753/22434 [20:37:31<1:10:47, 2.53s/it] +2025-02-06 06:45:13 - ERROR - stderr - 93%|█████████▎| 20754/22434 [20:37:33<1:10:26, 2.52s/it] +2025-02-06 06:45:13 - ERROR - stderr - +2025-02-06 06:45:13 - ERROR - stderr - +2025-02-06 06:45:13 - INFO - stdout - {'loss': 0.4045, 'grad_norm': 1.8777143955230713, 'learning_rate': 2.927118802343787e-07, 'epoch': 2.78} +2025-02-06 06:45:13 - ERROR - stderr - 93%|█████████▎| 20754/22434 [20:37:33<1:10:26, 2.52s/it] +2025-02-06 06:45:16 - ERROR - stderr - 93%|█████████▎| 20755/22434 [20:37:36<1:10:20, 2.51s/it] +2025-02-06 06:45:16 - ERROR - stderr - +2025-02-06 06:45:16 - ERROR - stderr - +2025-02-06 06:45:16 - INFO - stdout - {'loss': 0.394, 'grad_norm': 1.5445661544799805, 'learning_rate': 2.923652249222053e-07, 'epoch': 2.78} +2025-02-06 06:45:16 - ERROR - stderr - 93%|█████████▎| 20755/22434 [20:37:36<1:10:20, 2.51s/it] 
+2025-02-06 06:45:18 - ERROR - stderr - 93%|█████████▎| 20756/22434 [20:37:38<1:09:24, 2.48s/it] +2025-02-06 06:45:18 - ERROR - stderr - +2025-02-06 06:45:18 - ERROR - stderr - +2025-02-06 06:45:18 - INFO - stdout - {'loss': 0.332, 'grad_norm': 1.5453871488571167, 'learning_rate': 2.9201877195638827e-07, 'epoch': 2.78} +2025-02-06 06:45:18 - ERROR - stderr - 93%|█████████▎| 20756/22434 [20:37:38<1:09:24, 2.48s/it] +2025-02-06 06:45:21 - ERROR - stderr - 93%|█████████▎| 20757/22434 [20:37:41<1:09:47, 2.50s/it] +2025-02-06 06:45:21 - ERROR - stderr - +2025-02-06 06:45:21 - ERROR - stderr - +2025-02-06 06:45:21 - INFO - stdout - {'loss': 0.373, 'grad_norm': 1.5067980289459229, 'learning_rate': 2.916725213441507e-07, 'epoch': 2.78} +2025-02-06 06:45:21 - ERROR - stderr - 93%|█████████▎| 20757/22434 [20:37:41<1:09:47, 2.50s/it] +2025-02-06 06:45:23 - ERROR - stderr - 93%|█████████▎| 20758/22434 [20:37:43<1:09:27, 2.49s/it] +2025-02-06 06:45:23 - ERROR - stderr - +2025-02-06 06:45:23 - ERROR - stderr - +2025-02-06 06:45:23 - INFO - stdout - {'loss': 0.402, 'grad_norm': 1.851904034614563, 'learning_rate': 2.91326473092709e-07, 'epoch': 2.78} +2025-02-06 06:45:23 - ERROR - stderr - 93%|█████████▎| 20758/22434 [20:37:43<1:09:27, 2.49s/it] +2025-02-06 06:45:26 - ERROR - stderr - 93%|█████████▎| 20759/22434 [20:37:46<1:10:05, 2.51s/it] +2025-02-06 06:45:26 - ERROR - stderr - +2025-02-06 06:45:26 - ERROR - stderr - +2025-02-06 06:45:26 - INFO - stdout - {'loss': 0.426, 'grad_norm': 1.7865034341812134, 'learning_rate': 2.9098062720927746e-07, 'epoch': 2.78} +2025-02-06 06:45:26 - ERROR - stderr - 93%|█████████▎| 20759/22434 [20:37:46<1:10:05, 2.51s/it] +2025-02-06 06:45:28 - ERROR - stderr - 93%|█████████▎| 20760/22434 [20:37:48<1:10:46, 2.54s/it] +2025-02-06 06:45:29 - ERROR - stderr - +2025-02-06 06:45:29 - ERROR - stderr - +2025-02-06 06:45:29 - INFO - stdout - {'loss': 0.3869, 'grad_norm': 1.6535567045211792, 'learning_rate': 2.906349837010636e-07, 'epoch': 2.78} 
+2025-02-06 06:45:29 - ERROR - stderr - 93%|█████████▎| 20760/22434 [20:37:48<1:10:46, 2.54s/it] +2025-02-06 06:45:31 - ERROR - stderr - 93%|█████████▎| 20761/22434 [20:37:51<1:12:42, 2.61s/it] +2025-02-06 06:45:31 - ERROR - stderr - +2025-02-06 06:45:31 - ERROR - stderr - +2025-02-06 06:45:31 - INFO - stdout - {'loss': 0.3728, 'grad_norm': 1.5183446407318115, 'learning_rate': 2.9028954257527277e-07, 'epoch': 2.78} +2025-02-06 06:45:31 - ERROR - stderr - 93%|█████████▎| 20761/22434 [20:37:51<1:12:42, 2.61s/it] +2025-02-06 06:45:34 - ERROR - stderr - 93%|█████████▎| 20762/22434 [20:37:53<1:11:23, 2.56s/it] +2025-02-06 06:45:34 - ERROR - stderr - +2025-02-06 06:45:34 - ERROR - stderr - +2025-02-06 06:45:34 - INFO - stdout - {'loss': 0.3619, 'grad_norm': 1.5191289186477661, 'learning_rate': 2.899443038391059e-07, 'epoch': 2.78} +2025-02-06 06:45:34 - ERROR - stderr - 93%|█████████▎| 20762/22434 [20:37:54<1:11:23, 2.56s/it] +2025-02-06 06:45:36 - ERROR - stderr - 93%|█████████▎| 20763/22434 [20:37:56<1:11:51, 2.58s/it] +2025-02-06 06:45:36 - ERROR - stderr - +2025-02-06 06:45:36 - ERROR - stderr - +2025-02-06 06:45:36 - INFO - stdout - {'loss': 0.3692, 'grad_norm': 1.5413789749145508, 'learning_rate': 2.895992674997583e-07, 'epoch': 2.78} +2025-02-06 06:45:36 - ERROR - stderr - 93%|█████████▎| 20763/22434 [20:37:56<1:11:51, 2.58s/it] +2025-02-06 06:45:39 - ERROR - stderr - 93%|█████████▎| 20764/22434 [20:37:59<1:11:31, 2.57s/it] +2025-02-06 06:45:39 - ERROR - stderr - +2025-02-06 06:45:39 - ERROR - stderr - +2025-02-06 06:45:39 - INFO - stdout - {'loss': 0.3535, 'grad_norm': 1.601488471031189, 'learning_rate': 2.8925443356442206e-07, 'epoch': 2.78} +2025-02-06 06:45:39 - ERROR - stderr - 93%|█████████▎| 20764/22434 [20:37:59<1:11:31, 2.57s/it] +2025-02-06 06:45:41 - ERROR - stderr - 93%|█████████▎| 20765/22434 [20:38:01<1:10:40, 2.54s/it] +2025-02-06 06:45:41 - ERROR - stderr - +2025-02-06 06:45:41 - ERROR - stderr - +2025-02-06 06:45:41 - INFO - stdout - {'loss': 
0.3882, 'grad_norm': 1.7223652601242065, 'learning_rate': 2.8890980204028476e-07, 'epoch': 2.78} +2025-02-06 06:45:41 - ERROR - stderr - 93%|█████████▎| 20765/22434 [20:38:01<1:10:40, 2.54s/it] +2025-02-06 06:45:44 - ERROR - stderr - 93%|█████████▎| 20766/22434 [20:38:04<1:10:47, 2.55s/it] +2025-02-06 06:45:44 - ERROR - stderr - +2025-02-06 06:45:44 - ERROR - stderr - +2025-02-06 06:45:44 - INFO - stdout - {'loss': 0.3773, 'grad_norm': 1.5945628881454468, 'learning_rate': 2.885653729345306e-07, 'epoch': 2.78} +2025-02-06 06:45:44 - ERROR - stderr - 93%|█████████▎| 20766/22434 [20:38:04<1:10:47, 2.55s/it] +2025-02-06 06:45:46 - ERROR - stderr - 93%|█████████▎| 20767/22434 [20:38:06<1:10:48, 2.55s/it] +2025-02-06 06:45:47 - ERROR - stderr - +2025-02-06 06:45:47 - ERROR - stderr - +2025-02-06 06:45:47 - INFO - stdout - {'loss': 0.3719, 'grad_norm': 1.5190303325653076, 'learning_rate': 2.8822114625433826e-07, 'epoch': 2.78} +2025-02-06 06:45:47 - ERROR - stderr - 93%|█████████▎| 20767/22434 [20:38:06<1:10:48, 2.55s/it] +2025-02-06 06:45:49 - ERROR - stderr - 93%|█████████▎| 20768/22434 [20:38:09<1:10:06, 2.52s/it] +2025-02-06 06:45:49 - ERROR - stderr - +2025-02-06 06:45:49 - ERROR - stderr - +2025-02-06 06:45:49 - INFO - stdout - {'loss': 0.3347, 'grad_norm': 1.44663405418396, 'learning_rate': 2.8787712200688214e-07, 'epoch': 2.78} +2025-02-06 06:45:49 - ERROR - stderr - 93%|█████████▎| 20768/22434 [20:38:09<1:10:06, 2.52s/it] +2025-02-06 06:45:51 - ERROR - stderr - 93%|█████████▎| 20769/22434 [20:38:11<1:10:18, 2.53s/it] +2025-02-06 06:45:52 - ERROR - stderr - +2025-02-06 06:45:52 - ERROR - stderr - +2025-02-06 06:45:52 - INFO - stdout - {'loss': 0.3515, 'grad_norm': 1.4878138303756714, 'learning_rate': 2.875333001993352e-07, 'epoch': 2.78} +2025-02-06 06:45:52 - ERROR - stderr - 93%|█████████▎| 20769/22434 [20:38:11<1:10:18, 2.53s/it] +2025-02-06 06:45:54 - ERROR - stderr - 93%|█████████▎| 20770/22434 [20:38:14<1:10:14, 2.53s/it] +2025-02-06 06:45:54 - ERROR - 
stderr - +2025-02-06 06:45:54 - ERROR - stderr - +2025-02-06 06:45:54 - INFO - stdout - {'loss': 0.355, 'grad_norm': 1.539793848991394, 'learning_rate': 2.871896808388608e-07, 'epoch': 2.78} +2025-02-06 06:45:54 - ERROR - stderr - 93%|█████████▎| 20770/22434 [20:38:14<1:10:14, 2.53s/it] +2025-02-06 06:45:57 - ERROR - stderr - 93%|█████████▎| 20771/22434 [20:38:16<1:09:58, 2.52s/it] +2025-02-06 06:45:57 - ERROR - stderr - +2025-02-06 06:45:57 - ERROR - stderr - +2025-02-06 06:45:57 - INFO - stdout - {'loss': 0.3502, 'grad_norm': 1.6660206317901611, 'learning_rate': 2.8684626393262637e-07, 'epoch': 2.78} +2025-02-06 06:45:57 - ERROR - stderr - 93%|█████████▎| 20771/22434 [20:38:16<1:09:58, 2.52s/it] +2025-02-06 06:45:59 - ERROR - stderr - 93%|█████████▎| 20772/22434 [20:38:19<1:10:01, 2.53s/it] +2025-02-06 06:45:59 - ERROR - stderr - +2025-02-06 06:45:59 - ERROR - stderr - +2025-02-06 06:45:59 - INFO - stdout - {'loss': 0.3359, 'grad_norm': 1.448326587677002, 'learning_rate': 2.865030494877852e-07, 'epoch': 2.78} +2025-02-06 06:45:59 - ERROR - stderr - 93%|█████████▎| 20772/22434 [20:38:19<1:10:01, 2.53s/it] +2025-02-06 06:46:02 - ERROR - stderr - 93%|█████████▎| 20773/22434 [20:38:22<1:12:37, 2.62s/it] +2025-02-06 06:46:02 - ERROR - stderr - +2025-02-06 06:46:02 - ERROR - stderr - +2025-02-06 06:46:02 - INFO - stdout - {'loss': 0.3326, 'grad_norm': 1.4468492269515991, 'learning_rate': 2.861600375114926e-07, 'epoch': 2.78} +2025-02-06 06:46:02 - ERROR - stderr - 93%|█████████▎| 20773/22434 [20:38:22<1:12:37, 2.62s/it] +2025-02-06 06:46:04 - ERROR - stderr - 93%|█████████▎| 20774/22434 [20:38:24<1:11:38, 2.59s/it] +2025-02-06 06:46:04 - ERROR - stderr - +2025-02-06 06:46:04 - ERROR - stderr - +2025-02-06 06:46:04 - INFO - stdout - {'loss': 0.3487, 'grad_norm': 1.4524065256118774, 'learning_rate': 2.8581722801090063e-07, 'epoch': 2.78} +2025-02-06 06:46:04 - ERROR - stderr - 93%|█████████▎| 20774/22434 [20:38:24<1:11:38, 2.59s/it] +2025-02-06 06:46:07 - ERROR - stderr 
- 93%|█████████▎| 20775/22434 [20:38:27<1:10:55, 2.56s/it] +2025-02-06 06:46:07 - ERROR - stderr - +2025-02-06 06:46:07 - ERROR - stderr - +2025-02-06 06:46:07 - INFO - stdout - {'loss': 0.3933, 'grad_norm': 1.63760244846344, 'learning_rate': 2.854746209931514e-07, 'epoch': 2.78} +2025-02-06 06:46:07 - ERROR - stderr - 93%|█████████▎| 20775/22434 [20:38:27<1:10:55, 2.56s/it] +2025-02-06 06:46:10 - ERROR - stderr - 93%|█████████▎| 20776/22434 [20:38:29<1:11:41, 2.59s/it] +2025-02-06 06:46:10 - ERROR - stderr - +2025-02-06 06:46:10 - ERROR - stderr - +2025-02-06 06:46:10 - INFO - stdout - {'loss': 0.3372, 'grad_norm': 1.5655663013458252, 'learning_rate': 2.8513221646538913e-07, 'epoch': 2.78} +2025-02-06 06:46:10 - ERROR - stderr - 93%|█████████▎| 20776/22434 [20:38:29<1:11:41, 2.59s/it] +2025-02-06 06:46:12 - ERROR - stderr - 93%|█████████▎| 20777/22434 [20:38:32<1:11:25, 2.59s/it] +2025-02-06 06:46:12 - ERROR - stderr - +2025-02-06 06:46:12 - ERROR - stderr - +2025-02-06 06:46:12 - INFO - stdout - {'loss': 0.3435, 'grad_norm': 1.61640465259552, 'learning_rate': 2.847900144347493e-07, 'epoch': 2.78} +2025-02-06 06:46:12 - ERROR - stderr - 93%|█████████▎| 20777/22434 [20:38:32<1:11:25, 2.59s/it] +2025-02-06 06:46:15 - ERROR - stderr - 93%|█████████▎| 20778/22434 [20:38:35<1:13:46, 2.67s/it] +2025-02-06 06:46:15 - ERROR - stderr - +2025-02-06 06:46:15 - ERROR - stderr - +2025-02-06 06:46:15 - INFO - stdout - {'loss': 0.3737, 'grad_norm': 1.4885908365249634, 'learning_rate': 2.8444801490836505e-07, 'epoch': 2.78} +2025-02-06 06:46:15 - ERROR - stderr - 93%|█████████▎| 20778/22434 [20:38:35<1:13:46, 2.67s/it] +2025-02-06 06:46:18 - ERROR - stderr - 93%|█████████▎| 20779/22434 [20:38:37<1:12:29, 2.63s/it] +2025-02-06 06:46:18 - ERROR - stderr - +2025-02-06 06:46:18 - ERROR - stderr - +2025-02-06 06:46:18 - INFO - stdout - {'loss': 0.3343, 'grad_norm': 1.6433523893356323, 'learning_rate': 2.8410621789336513e-07, 'epoch': 2.78} +2025-02-06 06:46:18 - ERROR - stderr - 
93%|█████████▎| 20779/22434 [20:38:37<1:12:29, 2.63s/it] +2025-02-06 06:46:20 - ERROR - stderr - 93%|█████████▎| 20780/22434 [20:38:40<1:12:52, 2.64s/it] +2025-02-06 06:46:20 - ERROR - stderr - +2025-02-06 06:46:20 - ERROR - stderr - +2025-02-06 06:46:20 - INFO - stdout - {'loss': 0.3463, 'grad_norm': 1.5935453176498413, 'learning_rate': 2.8376462339687383e-07, 'epoch': 2.78} +2025-02-06 06:46:20 - ERROR - stderr - 93%|█████████▎| 20780/22434 [20:38:40<1:12:52, 2.64s/it] +2025-02-06 06:46:23 - ERROR - stderr - 93%|█████████▎| 20781/22434 [20:38:43<1:12:14, 2.62s/it] +2025-02-06 06:46:23 - ERROR - stderr - +2025-02-06 06:46:23 - ERROR - stderr - +2025-02-06 06:46:23 - INFO - stdout - {'loss': 0.3517, 'grad_norm': 1.5280020236968994, 'learning_rate': 2.8342323142601104e-07, 'epoch': 2.78} +2025-02-06 06:46:23 - ERROR - stderr - 93%|█████████▎| 20781/22434 [20:38:43<1:12:14, 2.62s/it] +2025-02-06 06:46:25 - ERROR - stderr - 93%|█████████▎| 20782/22434 [20:38:45<1:11:38, 2.60s/it] +2025-02-06 06:46:25 - ERROR - stderr - +2025-02-06 06:46:25 - ERROR - stderr - +2025-02-06 06:46:25 - INFO - stdout - {'loss': 0.381, 'grad_norm': 1.8817561864852905, 'learning_rate': 2.830820419878944e-07, 'epoch': 2.78} +2025-02-06 06:46:25 - ERROR - stderr - 93%|█████████▎| 20782/22434 [20:38:45<1:11:38, 2.60s/it] +2025-02-06 06:46:28 - ERROR - stderr - 93%|█████████▎| 20783/22434 [20:38:48<1:11:16, 2.59s/it] +2025-02-06 06:46:28 - ERROR - stderr - +2025-02-06 06:46:28 - ERROR - stderr - +2025-02-06 06:46:28 - INFO - stdout - {'loss': 0.3879, 'grad_norm': 1.7712763547897339, 'learning_rate': 2.827410550896337e-07, 'epoch': 2.78} +2025-02-06 06:46:28 - ERROR - stderr - 93%|█████████▎| 20783/22434 [20:38:48<1:11:16, 2.59s/it] +2025-02-06 06:46:31 - ERROR - stderr - 93%|█████████▎| 20784/22434 [20:38:50<1:12:22, 2.63s/it] +2025-02-06 06:46:31 - ERROR - stderr - +2025-02-06 06:46:31 - ERROR - stderr - +2025-02-06 06:46:31 - INFO - stdout - {'loss': 0.4368, 'grad_norm': 1.6753543615341187, 
'learning_rate': 2.824002707383378e-07, 'epoch': 2.78} +2025-02-06 06:46:31 - ERROR - stderr - 93%|█████████▎| 20784/22434 [20:38:50<1:12:22, 2.63s/it] +2025-02-06 06:46:33 - ERROR - stderr - 93%|█████████▎| 20785/22434 [20:38:53<1:11:10, 2.59s/it] +2025-02-06 06:46:33 - ERROR - stderr - +2025-02-06 06:46:33 - ERROR - stderr - +2025-02-06 06:46:33 - INFO - stdout - {'loss': 0.3752, 'grad_norm': 1.6137648820877075, 'learning_rate': 2.8205968894110867e-07, 'epoch': 2.78} +2025-02-06 06:46:33 - ERROR - stderr - 93%|█████████▎| 20785/22434 [20:38:53<1:11:10, 2.59s/it] +2025-02-06 06:46:36 - ERROR - stderr - 93%|█████████▎| 20786/22434 [20:38:55<1:10:59, 2.58s/it] +2025-02-06 06:46:36 - ERROR - stderr - +2025-02-06 06:46:36 - ERROR - stderr - +2025-02-06 06:46:36 - INFO - stdout - {'loss': 0.3238, 'grad_norm': 1.4507417678833008, 'learning_rate': 2.8171930970504745e-07, 'epoch': 2.78} +2025-02-06 06:46:36 - ERROR - stderr - 93%|█████████▎| 20786/22434 [20:38:56<1:10:59, 2.58s/it] +2025-02-06 06:46:38 - ERROR - stderr - 93%|█████████▎| 20787/22434 [20:38:58<1:10:29, 2.57s/it] +2025-02-06 06:46:38 - ERROR - stderr - +2025-02-06 06:46:38 - ERROR - stderr - +2025-02-06 06:46:38 - INFO - stdout - {'loss': 0.3348, 'grad_norm': 1.4163216352462769, 'learning_rate': 2.813791330372473e-07, 'epoch': 2.78} +2025-02-06 06:46:38 - ERROR - stderr - 93%|█████████▎| 20787/22434 [20:38:58<1:10:29, 2.57s/it] +2025-02-06 06:46:41 - ERROR - stderr - 93%|█████████▎| 20788/22434 [20:39:01<1:10:22, 2.57s/it] +2025-02-06 06:46:41 - ERROR - stderr - +2025-02-06 06:46:41 - ERROR - stderr - +2025-02-06 06:46:41 - INFO - stdout - {'loss': 0.3649, 'grad_norm': 1.5768721103668213, 'learning_rate': 2.810391589448003e-07, 'epoch': 2.78} +2025-02-06 06:46:41 - ERROR - stderr - 93%|█████████▎| 20788/22434 [20:39:01<1:10:22, 2.57s/it] +2025-02-06 06:46:43 - ERROR - stderr - 93%|█████████▎| 20789/22434 [20:39:03<1:10:28, 2.57s/it] +2025-02-06 06:46:43 - ERROR - stderr - +2025-02-06 06:46:43 - ERROR - 
stderr - +2025-02-06 06:46:43 - INFO - stdout - {'loss': 0.3816, 'grad_norm': 1.5263575315475464, 'learning_rate': 2.8069938743478965e-07, 'epoch': 2.78} +2025-02-06 06:46:43 - ERROR - stderr - 93%|█████████▎| 20789/22434 [20:39:03<1:10:28, 2.57s/it] +2025-02-06 06:46:46 - ERROR - stderr - 93%|█████████▎| 20790/22434 [20:39:06<1:09:20, 2.53s/it] +2025-02-06 06:46:46 - ERROR - stderr - +2025-02-06 06:46:46 - ERROR - stderr - +2025-02-06 06:46:46 - INFO - stdout - {'loss': 0.4374, 'grad_norm': 1.8141093254089355, 'learning_rate': 2.8035981851430303e-07, 'epoch': 2.78} +2025-02-06 06:46:46 - ERROR - stderr - 93%|█████████▎| 20790/22434 [20:39:06<1:09:20, 2.53s/it] +2025-02-06 06:46:48 - ERROR - stderr - 93%|█████████▎| 20791/22434 [20:39:08<1:09:36, 2.54s/it] +2025-02-06 06:46:48 - ERROR - stderr - +2025-02-06 06:46:48 - ERROR - stderr - +2025-02-06 06:46:48 - INFO - stdout - {'loss': 0.3234, 'grad_norm': 1.5084362030029297, 'learning_rate': 2.8002045219041374e-07, 'epoch': 2.78} +2025-02-06 06:46:48 - ERROR - stderr - 93%|█████████▎| 20791/22434 [20:39:08<1:09:36, 2.54s/it] +2025-02-06 06:46:51 - ERROR - stderr - 93%|█████████▎| 20792/22434 [20:39:11<1:10:46, 2.59s/it] +2025-02-06 06:46:51 - ERROR - stderr - +2025-02-06 06:46:51 - ERROR - stderr - +2025-02-06 06:46:51 - INFO - stdout - {'loss': 0.3283, 'grad_norm': 1.472138524055481, 'learning_rate': 2.79681288470196e-07, 'epoch': 2.78} +2025-02-06 06:46:51 - ERROR - stderr - 93%|█████████▎| 20792/22434 [20:39:11<1:10:46, 2.59s/it] +2025-02-06 06:46:54 - ERROR - stderr - 93%|█████████▎| 20793/22434 [20:39:13<1:09:24, 2.54s/it] +2025-02-06 06:46:54 - ERROR - stderr - +2025-02-06 06:46:54 - ERROR - stderr - +2025-02-06 06:46:54 - INFO - stdout - {'loss': 0.3778, 'grad_norm': 1.8117003440856934, 'learning_rate': 2.793423273607221e-07, 'epoch': 2.78} +2025-02-06 06:46:54 - ERROR - stderr - 93%|█████████▎| 20793/22434 [20:39:13<1:09:24, 2.54s/it] +2025-02-06 06:46:56 - ERROR - stderr - 93%|█████████▎| 20794/22434 
[20:39:16<1:09:10, 2.53s/it] +2025-02-06 06:46:56 - ERROR - stderr - +2025-02-06 06:46:56 - ERROR - stderr - +2025-02-06 06:46:56 - INFO - stdout - {'loss': 0.319, 'grad_norm': 1.5973988771438599, 'learning_rate': 2.79003568869054e-07, 'epoch': 2.78} +2025-02-06 06:46:56 - ERROR - stderr - 93%|█████████▎| 20794/22434 [20:39:16<1:09:10, 2.53s/it] +2025-02-06 06:46:59 - ERROR - stderr - 93%|█████████▎| 20795/22434 [20:39:18<1:09:01, 2.53s/it] +2025-02-06 06:46:59 - ERROR - stderr - +2025-02-06 06:46:59 - ERROR - stderr - +2025-02-06 06:46:59 - INFO - stdout - {'loss': 0.36, 'grad_norm': 1.5897465944290161, 'learning_rate': 2.7866501300225613e-07, 'epoch': 2.78} +2025-02-06 06:46:59 - ERROR - stderr - 93%|█████████▎| 20795/22434 [20:39:18<1:09:01, 2.53s/it] +2025-02-06 06:47:01 - ERROR - stderr - 93%|█████████▎| 20796/22434 [20:39:21<1:09:24, 2.54s/it] +2025-02-06 06:47:01 - ERROR - stderr - +2025-02-06 06:47:01 - ERROR - stderr - +2025-02-06 06:47:01 - INFO - stdout - {'loss': 0.3459, 'grad_norm': 1.414781928062439, 'learning_rate': 2.7832665976738393e-07, 'epoch': 2.78} +2025-02-06 06:47:01 - ERROR - stderr - 93%|█████████▎| 20796/22434 [20:39:21<1:09:24, 2.54s/it] +2025-02-06 06:47:04 - ERROR - stderr - 93%|█████████▎| 20797/22434 [20:39:23<1:09:31, 2.55s/it] +2025-02-06 06:47:04 - ERROR - stderr - +2025-02-06 06:47:04 - ERROR - stderr - +2025-02-06 06:47:04 - INFO - stdout - {'loss': 0.3881, 'grad_norm': 1.5728187561035156, 'learning_rate': 2.7798850917148845e-07, 'epoch': 2.78} +2025-02-06 06:47:04 - ERROR - stderr - 93%|█████████▎| 20797/22434 [20:39:24<1:09:31, 2.55s/it] +2025-02-06 06:47:06 - ERROR - stderr - 93%|█████████▎| 20798/22434 [20:39:26<1:09:33, 2.55s/it] +2025-02-06 06:47:06 - ERROR - stderr - +2025-02-06 06:47:06 - ERROR - stderr - +2025-02-06 06:47:06 - INFO - stdout - {'loss': 0.3559, 'grad_norm': 1.6355929374694824, 'learning_rate': 2.776505612216207e-07, 'epoch': 2.78} +2025-02-06 06:47:06 - ERROR - stderr - 93%|█████████▎| 20798/22434 
[20:39:26<1:09:33, 2.55s/it] +2025-02-06 06:47:09 - ERROR - stderr - 93%|█████████▎| 20799/22434 [20:39:28<1:08:40, 2.52s/it] +2025-02-06 06:47:09 - ERROR - stderr - +2025-02-06 06:47:09 - ERROR - stderr - +2025-02-06 06:47:09 - INFO - stdout - {'loss': 0.362, 'grad_norm': 1.7737250328063965, 'learning_rate': 2.7731281592482285e-07, 'epoch': 2.78} +2025-02-06 06:47:09 - ERROR - stderr - 93%|█████████▎| 20799/22434 [20:39:29<1:08:40, 2.52s/it] +2025-02-06 06:47:11 - ERROR - stderr - 93%|█████████▎| 20800/22434 [20:39:31<1:08:57, 2.53s/it] +2025-02-06 06:47:11 - ERROR - stderr - +2025-02-06 06:47:11 - ERROR - stderr - +2025-02-06 06:47:11 - INFO - stdout - {'loss': 0.2964, 'grad_norm': 1.4858434200286865, 'learning_rate': 2.76975273288137e-07, 'epoch': 2.78} +2025-02-06 06:47:11 - ERROR - stderr - 93%|█████████▎| 20800/22434 [20:39:31<1:08:57, 2.53s/it] +2025-02-06 06:47:14 - ERROR - stderr - 93%|█████████▎| 20801/22434 [20:39:34<1:09:06, 2.54s/it] +2025-02-06 06:47:14 - ERROR - stderr - +2025-02-06 06:47:14 - ERROR - stderr - +2025-02-06 06:47:14 - INFO - stdout - {'loss': 0.3882, 'grad_norm': 1.5188225507736206, 'learning_rate': 2.7663793331859645e-07, 'epoch': 2.78} +2025-02-06 06:47:14 - ERROR - stderr - 93%|█████████▎| 20801/22434 [20:39:34<1:09:06, 2.54s/it] +2025-02-06 06:47:16 - ERROR - stderr - 93%|█████████▎| 20802/22434 [20:39:36<1:08:42, 2.53s/it] +2025-02-06 06:47:16 - ERROR - stderr - +2025-02-06 06:47:16 - ERROR - stderr - +2025-02-06 06:47:16 - INFO - stdout - {'loss': 0.3517, 'grad_norm': 1.6512240171432495, 'learning_rate': 2.7630079602323447e-07, 'epoch': 2.78} +2025-02-06 06:47:16 - ERROR - stderr - 93%|█████████▎| 20802/22434 [20:39:36<1:08:42, 2.53s/it] +2025-02-06 06:47:19 - ERROR - stderr - 93%|█████████▎| 20803/22434 [20:39:39<1:08:34, 2.52s/it] +2025-02-06 06:47:19 - ERROR - stderr - +2025-02-06 06:47:19 - ERROR - stderr - +2025-02-06 06:47:19 - INFO - stdout - {'loss': 0.3738, 'grad_norm': 1.623434066772461, 'learning_rate': 
2.759638614090776e-07, 'epoch': 2.78} +2025-02-06 06:47:19 - ERROR - stderr - 93%|█████████▎| 20803/22434 [20:39:39<1:08:34, 2.52s/it] +2025-02-06 06:47:22 - ERROR - stderr - 93%|█████████▎| 20804/22434 [20:39:41<1:10:13, 2.58s/it] +2025-02-06 06:47:22 - ERROR - stderr - +2025-02-06 06:47:22 - ERROR - stderr - +2025-02-06 06:47:22 - INFO - stdout - {'loss': 0.3777, 'grad_norm': 1.73087739944458, 'learning_rate': 2.756271294831492e-07, 'epoch': 2.78} +2025-02-06 06:47:22 - ERROR - stderr - 93%|█████████▎| 20804/22434 [20:39:41<1:10:13, 2.58s/it] +2025-02-06 06:47:24 - ERROR - stderr - 93%|█████████▎| 20805/22434 [20:39:44<1:09:51, 2.57s/it] +2025-02-06 06:47:24 - ERROR - stderr - +2025-02-06 06:47:24 - ERROR - stderr - +2025-02-06 06:47:24 - INFO - stdout - {'loss': 0.3174, 'grad_norm': 1.4283182621002197, 'learning_rate': 2.75290600252468e-07, 'epoch': 2.78} +2025-02-06 06:47:24 - ERROR - stderr - 93%|█████████▎| 20805/22434 [20:39:44<1:09:51, 2.57s/it] +2025-02-06 06:47:27 - ERROR - stderr - 93%|█████████▎| 20806/22434 [20:39:46<1:09:38, 2.57s/it] +2025-02-06 06:47:27 - ERROR - stderr - +2025-02-06 06:47:27 - ERROR - stderr - +2025-02-06 06:47:27 - INFO - stdout - {'loss': 0.339, 'grad_norm': 1.4423762559890747, 'learning_rate': 2.749542737240485e-07, 'epoch': 2.78} +2025-02-06 06:47:27 - ERROR - stderr - 93%|█████████▎| 20806/22434 [20:39:46<1:09:38, 2.57s/it] +2025-02-06 06:47:29 - ERROR - stderr - 93%|█████████▎| 20807/22434 [20:39:49<1:09:31, 2.56s/it] +2025-02-06 06:47:29 - ERROR - stderr - +2025-02-06 06:47:29 - ERROR - stderr - +2025-02-06 06:47:29 - INFO - stdout - {'loss': 0.3311, 'grad_norm': 1.3682961463928223, 'learning_rate': 2.746181499049028e-07, 'epoch': 2.78} +2025-02-06 06:47:29 - ERROR - stderr - 93%|█████████▎| 20807/22434 [20:39:49<1:09:31, 2.56s/it] +2025-02-06 06:47:32 - ERROR - stderr - 93%|█████████▎| 20808/22434 [20:39:52<1:09:17, 2.56s/it] +2025-02-06 06:47:32 - ERROR - stderr - +2025-02-06 06:47:32 - ERROR - stderr - +2025-02-06 
06:47:32 - INFO - stdout - {'loss': 0.3573, 'grad_norm': 1.5871332883834839, 'learning_rate': 2.74282228802033e-07, 'epoch': 2.78} +2025-02-06 06:47:32 - ERROR - stderr - 93%|█████████▎| 20808/22434 [20:39:52<1:09:17, 2.56s/it] +2025-02-06 06:47:34 - ERROR - stderr - 93%|█████████▎| 20809/22434 [20:39:54<1:08:27, 2.53s/it] +2025-02-06 06:47:34 - ERROR - stderr - +2025-02-06 06:47:34 - ERROR - stderr - +2025-02-06 06:47:34 - INFO - stdout - {'loss': 0.3783, 'grad_norm': 1.54444420337677, 'learning_rate': 2.739465104224459e-07, 'epoch': 2.78} +2025-02-06 06:47:34 - ERROR - stderr - 93%|█████████▎| 20809/22434 [20:39:54<1:08:27, 2.53s/it] +2025-02-06 06:47:37 - ERROR - stderr - 93%|█████████▎| 20810/22434 [20:39:56<1:08:02, 2.51s/it] +2025-02-06 06:47:37 - ERROR - stderr - +2025-02-06 06:47:37 - ERROR - stderr - +2025-02-06 06:47:37 - INFO - stdout - {'loss': 0.3433, 'grad_norm': 1.5802757740020752, 'learning_rate': 2.736109947731358e-07, 'epoch': 2.78} +2025-02-06 06:47:37 - ERROR - stderr - 93%|█████████▎| 20810/22434 [20:39:57<1:08:02, 2.51s/it] +2025-02-06 06:47:39 - ERROR - stderr - 93%|█████████▎| 20811/22434 [20:39:59<1:08:00, 2.51s/it] +2025-02-06 06:47:39 - ERROR - stderr - +2025-02-06 06:47:39 - ERROR - stderr - +2025-02-06 06:47:39 - INFO - stdout - {'loss': 0.3354, 'grad_norm': 1.5497969388961792, 'learning_rate': 2.732756818610971e-07, 'epoch': 2.78} +2025-02-06 06:47:39 - ERROR - stderr - 93%|█████████▎| 20811/22434 [20:39:59<1:08:00, 2.51s/it] +2025-02-06 06:47:42 - ERROR - stderr - 93%|█████████▎| 20812/22434 [20:40:01<1:07:52, 2.51s/it] +2025-02-06 06:47:42 - ERROR - stderr - +2025-02-06 06:47:42 - ERROR - stderr - +2025-02-06 06:47:42 - INFO - stdout - {'loss': 0.3197, 'grad_norm': 1.3995293378829956, 'learning_rate': 2.729405716933209e-07, 'epoch': 2.78} +2025-02-06 06:47:42 - ERROR - stderr - 93%|█████████▎| 20812/22434 [20:40:02<1:07:52, 2.51s/it] +2025-02-06 06:47:44 - ERROR - stderr - 93%|█████████▎| 20813/22434 [20:40:04<1:07:44, 2.51s/it] 
+2025-02-06 06:47:44 - ERROR - stderr - +2025-02-06 06:47:44 - ERROR - stderr - +2025-02-06 06:47:44 - INFO - stdout - {'loss': 0.3702, 'grad_norm': 1.4622576236724854, 'learning_rate': 2.7260566427678935e-07, 'epoch': 2.78} +2025-02-06 06:47:44 - ERROR - stderr - 93%|█████████▎| 20813/22434 [20:40:04<1:07:44, 2.51s/it] +2025-02-06 06:47:47 - ERROR - stderr - 93%|█████████▎| 20814/22434 [20:40:06<1:07:06, 2.49s/it] +2025-02-06 06:47:47 - ERROR - stderr - +2025-02-06 06:47:47 - ERROR - stderr - +2025-02-06 06:47:47 - INFO - stdout - {'loss': 0.3865, 'grad_norm': 1.5768816471099854, 'learning_rate': 2.722709596184858e-07, 'epoch': 2.78} +2025-02-06 06:47:47 - ERROR - stderr - 93%|█████████▎| 20814/22434 [20:40:06<1:07:06, 2.49s/it] +2025-02-06 06:47:49 - ERROR - stderr - 93%|█████████▎| 20815/22434 [20:40:09<1:07:32, 2.50s/it] +2025-02-06 06:47:49 - ERROR - stderr - +2025-02-06 06:47:49 - ERROR - stderr - +2025-02-06 06:47:49 - INFO - stdout - {'loss': 0.3598, 'grad_norm': 1.5556169748306274, 'learning_rate': 2.7193645772538467e-07, 'epoch': 2.78} +2025-02-06 06:47:49 - ERROR - stderr - 93%|█████████▎| 20815/22434 [20:40:09<1:07:32, 2.50s/it] +2025-02-06 06:47:52 - ERROR - stderr - 93%|█████████▎| 20816/22434 [20:40:12<1:07:55, 2.52s/it] +2025-02-06 06:47:52 - ERROR - stderr - +2025-02-06 06:47:52 - ERROR - stderr - +2025-02-06 06:47:52 - INFO - stdout - {'loss': 0.3764, 'grad_norm': 1.6539340019226074, 'learning_rate': 2.7160215860445924e-07, 'epoch': 2.78} +2025-02-06 06:47:52 - ERROR - stderr - 93%|█████████▎| 20816/22434 [20:40:12<1:07:55, 2.52s/it] +2025-02-06 06:47:54 - ERROR - stderr - 93%|█████████▎| 20817/22434 [20:40:14<1:07:22, 2.50s/it] +2025-02-06 06:47:54 - ERROR - stderr - +2025-02-06 06:47:54 - ERROR - stderr - +2025-02-06 06:47:54 - INFO - stdout - {'loss': 0.3714, 'grad_norm': 1.6140497922897339, 'learning_rate': 2.7126806226267845e-07, 'epoch': 2.78} +2025-02-06 06:47:54 - ERROR - stderr - 93%|█████████▎| 20817/22434 [20:40:14<1:07:22, 2.50s/it] 
+2025-02-06 06:47:57 - ERROR - stderr - 93%|█████████▎| 20818/22434 [20:40:16<1:07:30, 2.51s/it] +2025-02-06 06:47:57 - ERROR - stderr - +2025-02-06 06:47:57 - ERROR - stderr - +2025-02-06 06:47:57 - INFO - stdout - {'loss': 0.3589, 'grad_norm': 1.4616377353668213, 'learning_rate': 2.709341687070044e-07, 'epoch': 2.78} +2025-02-06 06:47:57 - ERROR - stderr - 93%|█████████▎| 20818/22434 [20:40:17<1:07:30, 2.51s/it] +2025-02-06 06:47:59 - ERROR - stderr - 93%|█████████▎| 20819/22434 [20:40:19<1:07:39, 2.51s/it] +2025-02-06 06:47:59 - ERROR - stderr - +2025-02-06 06:47:59 - ERROR - stderr - +2025-02-06 06:47:59 - INFO - stdout - {'loss': 0.3455, 'grad_norm': 1.536062479019165, 'learning_rate': 2.7060047794439937e-07, 'epoch': 2.78} +2025-02-06 06:47:59 - ERROR - stderr - 93%|█████████▎| 20819/22434 [20:40:19<1:07:39, 2.51s/it] +2025-02-06 06:48:02 - ERROR - stderr - 93%|█████████▎| 20820/22434 [20:40:21<1:07:14, 2.50s/it] +2025-02-06 06:48:02 - ERROR - stderr - +2025-02-06 06:48:02 - ERROR - stderr - +2025-02-06 06:48:02 - INFO - stdout - {'loss': 0.374, 'grad_norm': 1.5663034915924072, 'learning_rate': 2.702669899818167e-07, 'epoch': 2.78} +2025-02-06 06:48:02 - ERROR - stderr - 93%|█████████▎| 20820/22434 [20:40:22<1:07:14, 2.50s/it] +2025-02-06 06:48:04 - ERROR - stderr - 93%|█████████▎| 20821/22434 [20:40:24<1:08:13, 2.54s/it] +2025-02-06 06:48:04 - ERROR - stderr - +2025-02-06 06:48:04 - ERROR - stderr - +2025-02-06 06:48:04 - INFO - stdout - {'loss': 0.388, 'grad_norm': 1.6951788663864136, 'learning_rate': 2.699337048262074e-07, 'epoch': 2.78} +2025-02-06 06:48:04 - ERROR - stderr - 93%|█████████▎| 20821/22434 [20:40:24<1:08:13, 2.54s/it] +2025-02-06 06:48:07 - ERROR - stderr - 93%|█████████▎| 20822/22434 [20:40:27<1:08:37, 2.55s/it] +2025-02-06 06:48:07 - ERROR - stderr - +2025-02-06 06:48:07 - ERROR - stderr - +2025-02-06 06:48:07 - INFO - stdout - {'loss': 0.3494, 'grad_norm': 1.4148496389389038, 'learning_rate': 2.6960062248452043e-07, 'epoch': 2.78} 
+2025-02-06 06:48:07 - ERROR - stderr - 93%|█████████▎| 20822/22434 [20:40:27<1:08:37, 2.55s/it] +2025-02-06 06:48:10 - ERROR - stderr - 93%|█████████▎| 20823/22434 [20:40:29<1:08:44, 2.56s/it] +2025-02-06 06:48:10 - ERROR - stderr - +2025-02-06 06:48:10 - ERROR - stderr - +2025-02-06 06:48:10 - INFO - stdout - {'loss': 0.3619, 'grad_norm': 1.5569648742675781, 'learning_rate': 2.6926774296369696e-07, 'epoch': 2.78} +2025-02-06 06:48:10 - ERROR - stderr - 93%|█████████▎| 20823/22434 [20:40:29<1:08:44, 2.56s/it] +2025-02-06 06:48:12 - ERROR - stderr - 93%|█████████▎| 20824/22434 [20:40:32<1:08:13, 2.54s/it] +2025-02-06 06:48:12 - ERROR - stderr - +2025-02-06 06:48:12 - ERROR - stderr - +2025-02-06 06:48:12 - INFO - stdout - {'loss': 0.4166, 'grad_norm': 1.6534479856491089, 'learning_rate': 2.689350662706769e-07, 'epoch': 2.78} +2025-02-06 06:48:12 - ERROR - stderr - 93%|█████████▎| 20824/22434 [20:40:32<1:08:13, 2.54s/it] +2025-02-06 06:48:14 - ERROR - stderr - 93%|█████████▎| 20825/22434 [20:40:34<1:07:09, 2.50s/it] +2025-02-06 06:48:14 - ERROR - stderr - +2025-02-06 06:48:14 - ERROR - stderr - +2025-02-06 06:48:14 - INFO - stdout - {'loss': 0.3176, 'grad_norm': 1.5791547298431396, 'learning_rate': 2.686025924123925e-07, 'epoch': 2.78} +2025-02-06 06:48:14 - ERROR - stderr - 93%|█████████▎| 20825/22434 [20:40:34<1:07:09, 2.50s/it] +2025-02-06 06:48:14 - INFO - stdout - WARNING: tokenization mismatch: 110 vs. 127. 
(ignored) +2025-02-06 06:48:17 - ERROR - stderr - 93%|█████████▎| 20826/22434 [20:40:37<1:08:52, 2.57s/it] +2025-02-06 06:48:17 - ERROR - stderr - +2025-02-06 06:48:17 - ERROR - stderr - +2025-02-06 06:48:17 - INFO - stdout - {'loss': 0.368, 'grad_norm': 1.5644370317459106, 'learning_rate': 2.6827032139577604e-07, 'epoch': 2.78} +2025-02-06 06:48:17 - ERROR - stderr - 93%|█████████▎| 20826/22434 [20:40:37<1:08:52, 2.57s/it] +2025-02-06 06:48:20 - ERROR - stderr - 93%|█████████▎| 20827/22434 [20:40:39<1:08:12, 2.55s/it] +2025-02-06 06:48:20 - ERROR - stderr - +2025-02-06 06:48:20 - ERROR - stderr - +2025-02-06 06:48:20 - INFO - stdout - {'loss': 0.4188, 'grad_norm': 1.612885594367981, 'learning_rate': 2.6793825322775193e-07, 'epoch': 2.79} +2025-02-06 06:48:20 - ERROR - stderr - 93%|█████████▎| 20827/22434 [20:40:39<1:08:12, 2.55s/it] +2025-02-06 06:48:22 - ERROR - stderr - 93%|█████████▎| 20828/22434 [20:40:42<1:07:18, 2.51s/it] +2025-02-06 06:48:22 - ERROR - stderr - +2025-02-06 06:48:22 - ERROR - stderr - +2025-02-06 06:48:22 - INFO - stdout - {'loss': 0.3916, 'grad_norm': 1.6456758975982666, 'learning_rate': 2.676063879152424e-07, 'epoch': 2.79} +2025-02-06 06:48:22 - ERROR - stderr - 93%|█████████▎| 20828/22434 [20:40:42<1:07:18, 2.51s/it] +2025-02-06 06:48:25 - ERROR - stderr - 93%|█████████▎| 20829/22434 [20:40:44<1:06:59, 2.50s/it] +2025-02-06 06:48:25 - ERROR - stderr - +2025-02-06 06:48:25 - ERROR - stderr - +2025-02-06 06:48:25 - INFO - stdout - {'loss': 0.3863, 'grad_norm': 1.6820650100708008, 'learning_rate': 2.672747254651653e-07, 'epoch': 2.79} +2025-02-06 06:48:25 - ERROR - stderr - 93%|█████████▎| 20829/22434 [20:40:44<1:06:59, 2.50s/it] +2025-02-06 06:48:27 - ERROR - stderr - 93%|█████████▎| 20830/22434 [20:40:47<1:06:26, 2.49s/it] +2025-02-06 06:48:27 - ERROR - stderr - +2025-02-06 06:48:27 - ERROR - stderr - +2025-02-06 06:48:27 - INFO - stdout - {'loss': 0.3619, 'grad_norm': 1.5543450117111206, 'learning_rate': 2.6694326588443286e-07, 'epoch': 
2.79} +2025-02-06 06:48:27 - ERROR - stderr - 93%|█████████▎| 20830/22434 [20:40:47<1:06:26, 2.49s/it] +2025-02-06 06:48:30 - ERROR - stderr - 93%|█████████▎| 20831/22434 [20:40:49<1:06:41, 2.50s/it] +2025-02-06 06:48:30 - ERROR - stderr - +2025-02-06 06:48:30 - ERROR - stderr - +2025-02-06 06:48:30 - INFO - stdout - {'loss': 0.3533, 'grad_norm': 1.6793380975723267, 'learning_rate': 2.666120091799551e-07, 'epoch': 2.79} +2025-02-06 06:48:30 - ERROR - stderr - 93%|█████████▎| 20831/22434 [20:40:49<1:06:41, 2.50s/it] +2025-02-06 06:48:32 - ERROR - stderr - 93%|█████████▎| 20832/22434 [20:40:52<1:07:53, 2.54s/it] +2025-02-06 06:48:32 - ERROR - stderr - +2025-02-06 06:48:32 - ERROR - stderr - +2025-02-06 06:48:32 - INFO - stdout - {'loss': 0.3407, 'grad_norm': 1.4780927896499634, 'learning_rate': 2.662809553586354e-07, 'epoch': 2.79} +2025-02-06 06:48:32 - ERROR - stderr - 93%|█████████▎| 20832/22434 [20:40:52<1:07:53, 2.54s/it] +2025-02-06 06:48:35 - ERROR - stderr - 93%|█████████▎| 20833/22434 [20:40:54<1:06:48, 2.50s/it] +2025-02-06 06:48:35 - ERROR - stderr - +2025-02-06 06:48:35 - ERROR - stderr - +2025-02-06 06:48:35 - INFO - stdout - {'loss': 0.3646, 'grad_norm': 1.702867865562439, 'learning_rate': 2.659501044273771e-07, 'epoch': 2.79} +2025-02-06 06:48:35 - ERROR - stderr - 93%|█████████▎| 20833/22434 [20:40:54<1:06:48, 2.50s/it] +2025-02-06 06:48:37 - ERROR - stderr - 93%|█████████▎| 20834/22434 [20:40:57<1:06:24, 2.49s/it] +2025-02-06 06:48:37 - ERROR - stderr - +2025-02-06 06:48:37 - ERROR - stderr - +2025-02-06 06:48:37 - INFO - stdout - {'loss': 0.3521, 'grad_norm': 1.5688337087631226, 'learning_rate': 2.656194563930714e-07, 'epoch': 2.79} +2025-02-06 06:48:37 - ERROR - stderr - 93%|█████████▎| 20834/22434 [20:40:57<1:06:24, 2.49s/it] +2025-02-06 06:48:40 - ERROR - stderr - 93%|█████████▎| 20835/22434 [20:40:59<1:06:31, 2.50s/it] +2025-02-06 06:48:40 - ERROR - stderr - +2025-02-06 06:48:40 - ERROR - stderr - +2025-02-06 06:48:40 - INFO - stdout - {'loss': 
0.3548, 'grad_norm': 1.4807660579681396, 'learning_rate': 2.652890112626161e-07, 'epoch': 2.79} +2025-02-06 06:48:40 - ERROR - stderr - 93%|█████████▎| 20835/22434 [20:40:59<1:06:31, 2.50s/it] +2025-02-06 06:48:42 - ERROR - stderr - 93%|█████████▎| 20836/22434 [20:41:02<1:06:30, 2.50s/it] +2025-02-06 06:48:42 - ERROR - stderr - +2025-02-06 06:48:42 - ERROR - stderr - +2025-02-06 06:48:42 - INFO - stdout - {'loss': 0.3572, 'grad_norm': 1.6465785503387451, 'learning_rate': 2.6495876904289454e-07, 'epoch': 2.79} +2025-02-06 06:48:42 - ERROR - stderr - 93%|█████████▎| 20836/22434 [20:41:02<1:06:30, 2.50s/it] +2025-02-06 06:48:45 - ERROR - stderr - 93%|█████████▎| 20837/22434 [20:41:04<1:06:15, 2.49s/it] +2025-02-06 06:48:45 - ERROR - stderr - +2025-02-06 06:48:45 - ERROR - stderr - +2025-02-06 06:48:45 - INFO - stdout - {'loss': 0.3674, 'grad_norm': 1.8622580766677856, 'learning_rate': 2.6462872974079125e-07, 'epoch': 2.79} +2025-02-06 06:48:45 - ERROR - stderr - 93%|█████████▎| 20837/22434 [20:41:04<1:06:15, 2.49s/it] +2025-02-06 06:48:47 - ERROR - stderr - 93%|█████████▎| 20838/22434 [20:41:07<1:07:49, 2.55s/it] +2025-02-06 06:48:47 - ERROR - stderr - +2025-02-06 06:48:47 - ERROR - stderr - +2025-02-06 06:48:47 - INFO - stdout - {'loss': 0.3314, 'grad_norm': 1.3903818130493164, 'learning_rate': 2.6429889336318847e-07, 'epoch': 2.79} +2025-02-06 06:48:47 - ERROR - stderr - 93%|█████████▎| 20838/22434 [20:41:07<1:07:49, 2.55s/it] +2025-02-06 06:48:50 - ERROR - stderr - 93%|█████████▎| 20839/22434 [20:41:09<1:06:51, 2.51s/it] +2025-02-06 06:48:50 - ERROR - stderr - +2025-02-06 06:48:50 - ERROR - stderr - +2025-02-06 06:48:50 - INFO - stdout - {'loss': 0.3295, 'grad_norm': 1.5297942161560059, 'learning_rate': 2.6396925991695744e-07, 'epoch': 2.79} +2025-02-06 06:48:50 - ERROR - stderr - 93%|█████████▎| 20839/22434 [20:41:09<1:06:51, 2.51s/it] +2025-02-06 06:48:52 - ERROR - stderr - 93%|█████████▎| 20840/22434 [20:41:12<1:06:31, 2.50s/it] +2025-02-06 06:48:52 - ERROR - 
stderr - +2025-02-06 06:48:52 - ERROR - stderr - +2025-02-06 06:48:52 - INFO - stdout - {'loss': 0.3391, 'grad_norm': 1.4815959930419922, 'learning_rate': 2.636398294089726e-07, 'epoch': 2.79} +2025-02-06 06:48:52 - ERROR - stderr - 93%|█████████▎| 20840/22434 [20:41:12<1:06:31, 2.50s/it] +2025-02-06 06:48:55 - ERROR - stderr - 93%|█████████▎| 20841/22434 [20:41:14<1:06:35, 2.51s/it] +2025-02-06 06:48:55 - ERROR - stderr - +2025-02-06 06:48:55 - ERROR - stderr - +2025-02-06 06:48:55 - INFO - stdout - {'loss': 0.3852, 'grad_norm': 1.7010157108306885, 'learning_rate': 2.6331060184609735e-07, 'epoch': 2.79} +2025-02-06 06:48:55 - ERROR - stderr - 93%|█████████▎| 20841/22434 [20:41:14<1:06:35, 2.51s/it] +2025-02-06 06:48:57 - ERROR - stderr - 93%|█████████▎| 20842/22434 [20:41:17<1:06:06, 2.49s/it] +2025-02-06 06:48:57 - ERROR - stderr - +2025-02-06 06:48:57 - ERROR - stderr - +2025-02-06 06:48:57 - INFO - stdout - {'loss': 0.4278, 'grad_norm': 1.6452326774597168, 'learning_rate': 2.629815772351962e-07, 'epoch': 2.79} +2025-02-06 06:48:57 - ERROR - stderr - 93%|█████████▎| 20842/22434 [20:41:17<1:06:06, 2.49s/it] +2025-02-06 06:49:00 - ERROR - stderr - 93%|█████████▎| 20843/22434 [20:41:19<1:05:49, 2.48s/it] +2025-02-06 06:49:00 - ERROR - stderr - +2025-02-06 06:49:00 - ERROR - stderr - +2025-02-06 06:49:00 - INFO - stdout - {'loss': 0.3702, 'grad_norm': 1.4043594598770142, 'learning_rate': 2.62652755583126e-07, 'epoch': 2.79} +2025-02-06 06:49:00 - ERROR - stderr - 93%|█████████▎| 20843/22434 [20:41:19<1:05:49, 2.48s/it] +2025-02-06 06:49:02 - ERROR - stderr - 93%|█████████▎| 20844/22434 [20:41:22<1:05:44, 2.48s/it] +2025-02-06 06:49:02 - ERROR - stderr - +2025-02-06 06:49:02 - ERROR - stderr - +2025-02-06 06:49:02 - INFO - stdout - {'loss': 0.3133, 'grad_norm': 1.4779294729232788, 'learning_rate': 2.623241368967422e-07, 'epoch': 2.79} +2025-02-06 06:49:02 - ERROR - stderr - 93%|█████████▎| 20844/22434 [20:41:22<1:05:44, 2.48s/it] +2025-02-06 06:49:05 - ERROR - stderr 
- 93%|█████████▎| 20845/22434 [20:41:24<1:06:15, 2.50s/it] +2025-02-06 06:49:05 - ERROR - stderr - +2025-02-06 06:49:05 - ERROR - stderr - +2025-02-06 06:49:05 - INFO - stdout - {'loss': 0.3677, 'grad_norm': 1.57975172996521, 'learning_rate': 2.619957211828938e-07, 'epoch': 2.79} +2025-02-06 06:49:05 - ERROR - stderr - 93%|█████████▎| 20845/22434 [20:41:24<1:06:15, 2.50s/it] +2025-02-06 06:49:07 - ERROR - stderr - 93%|█████████▎| 20846/22434 [20:41:27<1:06:30, 2.51s/it] +2025-02-06 06:49:07 - ERROR - stderr - +2025-02-06 06:49:07 - ERROR - stderr - +2025-02-06 06:49:07 - INFO - stdout - {'loss': 0.3344, 'grad_norm': 1.4036624431610107, 'learning_rate': 2.616675084484266e-07, 'epoch': 2.79} +2025-02-06 06:49:07 - ERROR - stderr - 93%|█████████▎| 20846/22434 [20:41:27<1:06:30, 2.51s/it] +2025-02-06 06:49:10 - ERROR - stderr - 93%|█████████▎| 20847/22434 [20:41:29<1:06:56, 2.53s/it] +2025-02-06 06:49:10 - ERROR - stderr - +2025-02-06 06:49:10 - ERROR - stderr - +2025-02-06 06:49:10 - INFO - stdout - {'loss': 0.3764, 'grad_norm': 1.830676555633545, 'learning_rate': 2.613394987001805e-07, 'epoch': 2.79} +2025-02-06 06:49:10 - ERROR - stderr - 93%|█████████▎| 20847/22434 [20:41:30<1:06:56, 2.53s/it] +2025-02-06 06:49:12 - ERROR - stderr - 93%|█████████▎| 20848/22434 [20:41:32<1:06:22, 2.51s/it] +2025-02-06 06:49:12 - ERROR - stderr - +2025-02-06 06:49:12 - ERROR - stderr - +2025-02-06 06:49:12 - INFO - stdout - {'loss': 0.391, 'grad_norm': 1.5744816064834595, 'learning_rate': 2.6101169194499456e-07, 'epoch': 2.79} +2025-02-06 06:49:12 - ERROR - stderr - 93%|█████████▎| 20848/22434 [20:41:32<1:06:22, 2.51s/it] +2025-02-06 06:49:15 - ERROR - stderr - 93%|█████████▎| 20849/22434 [20:41:34<1:06:36, 2.52s/it] +2025-02-06 06:49:15 - ERROR - stderr - +2025-02-06 06:49:15 - ERROR - stderr - +2025-02-06 06:49:15 - INFO - stdout - {'loss': 0.3812, 'grad_norm': 1.7157931327819824, 'learning_rate': 2.6068408818970106e-07, 'epoch': 2.79} +2025-02-06 06:49:15 - ERROR - stderr - 
93%|█████████▎| 20849/22434 [20:41:35<1:06:36, 2.52s/it] +2025-02-06 06:49:17 - ERROR - stderr - 93%|█████████▎| 20850/22434 [20:41:37<1:05:54, 2.50s/it] +2025-02-06 06:49:17 - ERROR - stderr - +2025-02-06 06:49:17 - ERROR - stderr - +2025-02-06 06:49:17 - INFO - stdout - {'loss': 0.3643, 'grad_norm': 1.6557964086532593, 'learning_rate': 2.6035668744112786e-07, 'epoch': 2.79} +2025-02-06 06:49:17 - ERROR - stderr - 93%|█████████▎| 20850/22434 [20:41:37<1:05:54, 2.50s/it] +2025-02-06 06:49:20 - ERROR - stderr - 93%|█████████▎| 20851/22434 [20:41:39<1:05:16, 2.47s/it] +2025-02-06 06:49:20 - ERROR - stderr - +2025-02-06 06:49:20 - ERROR - stderr - +2025-02-06 06:49:20 - INFO - stdout - {'loss': 0.3515, 'grad_norm': 1.4989999532699585, 'learning_rate': 2.6002948970609956e-07, 'epoch': 2.79} +2025-02-06 06:49:20 - ERROR - stderr - 93%|█████████▎| 20851/22434 [20:41:39<1:05:16, 2.47s/it] +2025-02-06 06:49:22 - ERROR - stderr - 93%|█████████▎| 20852/22434 [20:41:42<1:06:28, 2.52s/it] +2025-02-06 06:49:22 - ERROR - stderr - +2025-02-06 06:49:22 - ERROR - stderr - +2025-02-06 06:49:22 - INFO - stdout - {'loss': 0.2783, 'grad_norm': 1.3471267223358154, 'learning_rate': 2.597024949914373e-07, 'epoch': 2.79} +2025-02-06 06:49:22 - ERROR - stderr - 93%|█████████▎| 20852/22434 [20:41:42<1:06:28, 2.52s/it] +2025-02-06 06:49:25 - ERROR - stderr - 93%|█████████▎| 20853/22434 [20:41:44<1:06:14, 2.51s/it] +2025-02-06 06:49:25 - ERROR - stderr - +2025-02-06 06:49:25 - ERROR - stderr - +2025-02-06 06:49:25 - INFO - stdout - {'loss': 0.3608, 'grad_norm': 1.5260107517242432, 'learning_rate': 2.5937570330395345e-07, 'epoch': 2.79} +2025-02-06 06:49:25 - ERROR - stderr - 93%|█████████▎| 20853/22434 [20:41:45<1:06:14, 2.51s/it] +2025-02-06 06:49:27 - ERROR - stderr - 93%|█████████▎| 20854/22434 [20:41:47<1:05:37, 2.49s/it] +2025-02-06 06:49:27 - ERROR - stderr - +2025-02-06 06:49:27 - ERROR - stderr - +2025-02-06 06:49:27 - INFO - stdout - {'loss': 0.419, 'grad_norm': 1.5828197002410889, 
'learning_rate': 2.5904911465046476e-07, 'epoch': 2.79} +2025-02-06 06:49:27 - ERROR - stderr - 93%|█████████▎| 20854/22434 [20:41:47<1:05:37, 2.49s/it] +2025-02-06 06:49:30 - ERROR - stderr - 93%|█████████▎| 20855/22434 [20:41:49<1:05:25, 2.49s/it] +2025-02-06 06:49:30 - ERROR - stderr - +2025-02-06 06:49:30 - ERROR - stderr - +2025-02-06 06:49:30 - INFO - stdout - {'loss': 0.3685, 'grad_norm': 1.5923101902008057, 'learning_rate': 2.5872272903777473e-07, 'epoch': 2.79} +2025-02-06 06:49:30 - ERROR - stderr - 93%|█████████▎| 20855/22434 [20:41:49<1:05:25, 2.49s/it] +2025-02-06 06:49:32 - ERROR - stderr - 93%|█████████▎| 20856/22434 [20:41:52<1:05:00, 2.47s/it] +2025-02-06 06:49:32 - ERROR - stderr - +2025-02-06 06:49:32 - ERROR - stderr - +2025-02-06 06:49:32 - INFO - stdout - {'loss': 0.3703, 'grad_norm': 1.6633175611495972, 'learning_rate': 2.5839654647268896e-07, 'epoch': 2.79} +2025-02-06 06:49:32 - ERROR - stderr - 93%|█████████▎| 20856/22434 [20:41:52<1:05:00, 2.47s/it] +2025-02-06 06:49:35 - ERROR - stderr - 93%|█████████▎| 20857/22434 [20:41:54<1:04:59, 2.47s/it] +2025-02-06 06:49:35 - ERROR - stderr - +2025-02-06 06:49:35 - ERROR - stderr - +2025-02-06 06:49:35 - INFO - stdout - {'loss': 0.3391, 'grad_norm': 1.5400134325027466, 'learning_rate': 2.580705669620065e-07, 'epoch': 2.79} +2025-02-06 06:49:35 - ERROR - stderr - 93%|█████████▎| 20857/22434 [20:41:54<1:04:59, 2.47s/it] +2025-02-06 06:49:37 - ERROR - stderr - 93%|█████████▎| 20858/22434 [20:41:57<1:05:00, 2.48s/it] +2025-02-06 06:49:37 - ERROR - stderr - +2025-02-06 06:49:37 - ERROR - stderr - +2025-02-06 06:49:37 - INFO - stdout - {'loss': 0.3667, 'grad_norm': 1.5184351205825806, 'learning_rate': 2.5774479051251856e-07, 'epoch': 2.79} +2025-02-06 06:49:37 - ERROR - stderr - 93%|█████████▎| 20858/22434 [20:41:57<1:05:00, 2.48s/it] +2025-02-06 06:49:40 - ERROR - stderr - 93%|█████████▎| 20859/22434 [20:41:59<1:05:36, 2.50s/it] +2025-02-06 06:49:40 - ERROR - stderr - +2025-02-06 06:49:40 - ERROR - 
stderr - +2025-02-06 06:49:40 - INFO - stdout - {'loss': 0.3323, 'grad_norm': 1.537785291671753, 'learning_rate': 2.574192171310197e-07, 'epoch': 2.79} +2025-02-06 06:49:40 - ERROR - stderr - 93%|█████████▎| 20859/22434 [20:41:59<1:05:36, 2.50s/it] +2025-02-06 06:49:42 - ERROR - stderr - 93%|█████████▎| 20860/22434 [20:42:02<1:05:36, 2.50s/it] +2025-02-06 06:49:42 - ERROR - stderr - +2025-02-06 06:49:42 - ERROR - stderr - +2025-02-06 06:49:42 - INFO - stdout - {'loss': 0.3632, 'grad_norm': 1.4815930128097534, 'learning_rate': 2.570938468242945e-07, 'epoch': 2.79} +2025-02-06 06:49:42 - ERROR - stderr - 93%|█████████▎| 20860/22434 [20:42:02<1:05:36, 2.50s/it] +2025-02-06 06:49:45 - ERROR - stderr - 93%|█████████▎| 20861/22434 [20:42:04<1:05:36, 2.50s/it] +2025-02-06 06:49:45 - ERROR - stderr - +2025-02-06 06:49:45 - ERROR - stderr - +2025-02-06 06:49:45 - INFO - stdout - {'loss': 0.357, 'grad_norm': 1.586591362953186, 'learning_rate': 2.567686795991253e-07, 'epoch': 2.79} +2025-02-06 06:49:45 - ERROR - stderr - 93%|█████████▎| 20861/22434 [20:42:04<1:05:36, 2.50s/it] +2025-02-06 06:49:47 - ERROR - stderr - 93%|█████████▎| 20862/22434 [20:42:07<1:06:17, 2.53s/it] +2025-02-06 06:49:47 - ERROR - stderr - +2025-02-06 06:49:47 - ERROR - stderr - +2025-02-06 06:49:47 - INFO - stdout - {'loss': 0.3267, 'grad_norm': 1.515321969985962, 'learning_rate': 2.5644371546228895e-07, 'epoch': 2.79} +2025-02-06 06:49:47 - ERROR - stderr - 93%|█████████▎| 20862/22434 [20:42:07<1:06:17, 2.53s/it] +2025-02-06 06:49:50 - ERROR - stderr - 93%|█████████▎| 20863/22434 [20:42:09<1:05:50, 2.51s/it] +2025-02-06 06:49:50 - ERROR - stderr - +2025-02-06 06:49:50 - ERROR - stderr - +2025-02-06 06:49:50 - INFO - stdout - {'loss': 0.3949, 'grad_norm': 1.6835474967956543, 'learning_rate': 2.561189544205589e-07, 'epoch': 2.79} +2025-02-06 06:49:50 - ERROR - stderr - 93%|█████████▎| 20863/22434 [20:42:09<1:05:50, 2.51s/it] +2025-02-06 06:49:52 - ERROR - stderr - 93%|█████████▎| 20864/22434 
[20:42:12<1:05:46, 2.51s/it] +2025-02-06 06:49:52 - ERROR - stderr - +2025-02-06 06:49:52 - ERROR - stderr - +2025-02-06 06:49:52 - INFO - stdout - {'loss': 0.3482, 'grad_norm': 1.528456449508667, 'learning_rate': 2.5579439648070745e-07, 'epoch': 2.79} +2025-02-06 06:49:52 - ERROR - stderr - 93%|█████████▎| 20864/22434 [20:42:12<1:05:46, 2.51s/it] +2025-02-06 06:49:55 - ERROR - stderr - 93%|█████████▎| 20865/22434 [20:42:14<1:05:56, 2.52s/it] +2025-02-06 06:49:55 - ERROR - stderr - +2025-02-06 06:49:55 - ERROR - stderr - +2025-02-06 06:49:55 - INFO - stdout - {'loss': 0.3426, 'grad_norm': 1.5126547813415527, 'learning_rate': 2.5547004164949707e-07, 'epoch': 2.79} +2025-02-06 06:49:55 - ERROR - stderr - 93%|█████████▎| 20865/22434 [20:42:15<1:05:56, 2.52s/it] +2025-02-06 06:49:57 - ERROR - stderr - 93%|█████████▎| 20866/22434 [20:42:17<1:05:13, 2.50s/it] +2025-02-06 06:49:57 - ERROR - stderr - +2025-02-06 06:49:57 - ERROR - stderr - +2025-02-06 06:49:57 - INFO - stdout - {'loss': 0.3528, 'grad_norm': 1.5286407470703125, 'learning_rate': 2.5514588993368894e-07, 'epoch': 2.79} +2025-02-06 06:49:57 - ERROR - stderr - 93%|█████████▎| 20866/22434 [20:42:17<1:05:13, 2.50s/it] +2025-02-06 06:50:00 - ERROR - stderr - 93%|█████████▎| 20867/22434 [20:42:19<1:04:44, 2.48s/it] +2025-02-06 06:50:00 - ERROR - stderr - +2025-02-06 06:50:00 - ERROR - stderr - +2025-02-06 06:50:00 - INFO - stdout - {'loss': 0.3475, 'grad_norm': 1.6199709177017212, 'learning_rate': 2.548219413400399e-07, 'epoch': 2.79} +2025-02-06 06:50:00 - ERROR - stderr - 93%|█████████▎| 20867/22434 [20:42:19<1:04:44, 2.48s/it] +2025-02-06 06:50:02 - ERROR - stderr - 93%|█████████▎| 20868/22434 [20:42:22<1:06:22, 2.54s/it] +2025-02-06 06:50:02 - ERROR - stderr - +2025-02-06 06:50:02 - ERROR - stderr - +2025-02-06 06:50:02 - INFO - stdout - {'loss': 0.3809, 'grad_norm': 1.6597638130187988, 'learning_rate': 2.5449819587530233e-07, 'epoch': 2.79} +2025-02-06 06:50:02 - ERROR - stderr - 93%|█████████▎| 20868/22434 
[20:42:22<1:06:22, 2.54s/it] +2025-02-06 06:50:05 - ERROR - stderr - 93%|█████████▎| 20869/22434 [20:42:25<1:06:33, 2.55s/it] +2025-02-06 06:50:05 - ERROR - stderr - +2025-02-06 06:50:05 - ERROR - stderr - +2025-02-06 06:50:05 - INFO - stdout - {'loss': 0.3385, 'grad_norm': 1.6368234157562256, 'learning_rate': 2.541746535462242e-07, 'epoch': 2.79} +2025-02-06 06:50:05 - ERROR - stderr - 93%|█████████▎| 20869/22434 [20:42:25<1:06:33, 2.55s/it] +2025-02-06 06:50:07 - ERROR - stderr - 93%|█████████▎| 20870/22434 [20:42:27<1:05:51, 2.53s/it] +2025-02-06 06:50:07 - ERROR - stderr - +2025-02-06 06:50:07 - ERROR - stderr - +2025-02-06 06:50:07 - INFO - stdout - {'loss': 0.4118, 'grad_norm': 1.6762608289718628, 'learning_rate': 2.5385131435955e-07, 'epoch': 2.79} +2025-02-06 06:50:07 - ERROR - stderr - 93%|█████████▎| 20870/22434 [20:42:27<1:05:51, 2.53s/it] +2025-02-06 06:50:10 - ERROR - stderr - 93%|█████████▎| 20871/22434 [20:42:30<1:07:38, 2.60s/it] +2025-02-06 06:50:10 - ERROR - stderr - +2025-02-06 06:50:10 - ERROR - stderr - +2025-02-06 06:50:10 - INFO - stdout - {'loss': 0.3413, 'grad_norm': 1.4200471639633179, 'learning_rate': 2.5352817832201893e-07, 'epoch': 2.79} +2025-02-06 06:50:10 - ERROR - stderr - 93%|█████████▎| 20871/22434 [20:42:30<1:07:38, 2.60s/it] +2025-02-06 06:50:13 - ERROR - stderr - 93%|█████████▎| 20872/22434 [20:42:32<1:06:36, 2.56s/it] +2025-02-06 06:50:13 - ERROR - stderr - +2025-02-06 06:50:13 - ERROR - stderr - +2025-02-06 06:50:13 - INFO - stdout - {'loss': 0.3524, 'grad_norm': 1.5693608522415161, 'learning_rate': 2.5320524544036664e-07, 'epoch': 2.79} +2025-02-06 06:50:13 - ERROR - stderr - 93%|█████████▎| 20872/22434 [20:42:32<1:06:36, 2.56s/it] +2025-02-06 06:50:15 - ERROR - stderr - 93%|█████████▎| 20873/22434 [20:42:35<1:06:42, 2.56s/it] +2025-02-06 06:50:15 - ERROR - stderr - +2025-02-06 06:50:15 - ERROR - stderr - +2025-02-06 06:50:15 - INFO - stdout - {'loss': 0.3119, 'grad_norm': 1.529669165611267, 'learning_rate': 
2.528825157213255e-07, 'epoch': 2.79} +2025-02-06 06:50:15 - ERROR - stderr - 93%|█████████▎| 20873/22434 [20:42:35<1:06:42, 2.56s/it] +2025-02-06 06:50:18 - ERROR - stderr - 93%|█████████▎| 20874/22434 [20:42:37<1:05:44, 2.53s/it] +2025-02-06 06:50:18 - ERROR - stderr - +2025-02-06 06:50:18 - ERROR - stderr - +2025-02-06 06:50:18 - INFO - stdout - {'loss': 0.3518, 'grad_norm': 1.523508906364441, 'learning_rate': 2.5255998917161903e-07, 'epoch': 2.79} +2025-02-06 06:50:18 - ERROR - stderr - 93%|█████████▎| 20874/22434 [20:42:37<1:05:44, 2.53s/it] +2025-02-06 06:50:20 - ERROR - stderr - 93%|█████████▎| 20875/22434 [20:42:40<1:07:53, 2.61s/it] +2025-02-06 06:50:20 - ERROR - stderr - +2025-02-06 06:50:20 - ERROR - stderr - +2025-02-06 06:50:20 - INFO - stdout - {'loss': 0.3667, 'grad_norm': 1.5929558277130127, 'learning_rate': 2.5223766579797416e-07, 'epoch': 2.79} +2025-02-06 06:50:20 - ERROR - stderr - 93%|█████████▎| 20875/22434 [20:42:40<1:07:53, 2.61s/it] +2025-02-06 06:50:23 - ERROR - stderr - 93%|█████████▎| 20876/22434 [20:42:43<1:08:40, 2.65s/it] +2025-02-06 06:50:23 - ERROR - stderr - +2025-02-06 06:50:23 - ERROR - stderr - +2025-02-06 06:50:23 - INFO - stdout - {'loss': 0.4047, 'grad_norm': 1.5914088487625122, 'learning_rate': 2.519155456071076e-07, 'epoch': 2.79} +2025-02-06 06:50:23 - ERROR - stderr - 93%|█████████▎| 20876/22434 [20:42:43<1:08:40, 2.65s/it] +2025-02-06 06:50:26 - ERROR - stderr - 93%|█████████▎| 20877/22434 [20:42:45<1:07:12, 2.59s/it] +2025-02-06 06:50:26 - ERROR - stderr - +2025-02-06 06:50:26 - ERROR - stderr - +2025-02-06 06:50:26 - INFO - stdout - {'loss': 0.4279, 'grad_norm': 1.8419655561447144, 'learning_rate': 2.5159362860573187e-07, 'epoch': 2.79} +2025-02-06 06:50:26 - ERROR - stderr - 93%|█████████▎| 20877/22434 [20:42:45<1:07:12, 2.59s/it] +2025-02-06 06:50:28 - ERROR - stderr - 93%|█████████▎| 20878/22434 [20:42:48<1:06:37, 2.57s/it] +2025-02-06 06:50:28 - ERROR - stderr - +2025-02-06 06:50:28 - ERROR - stderr - +2025-02-06 
06:50:28 - INFO - stdout - {'loss': 0.3306, 'grad_norm': 1.6265692710876465, 'learning_rate': 2.5127191480056044e-07, 'epoch': 2.79} +2025-02-06 06:50:28 - ERROR - stderr - 93%|█████████▎| 20878/22434 [20:42:48<1:06:37, 2.57s/it] +2025-02-06 06:50:31 - ERROR - stderr - 93%|█████████▎| 20879/22434 [20:42:50<1:05:47, 2.54s/it] +2025-02-06 06:50:31 - ERROR - stderr - +2025-02-06 06:50:31 - ERROR - stderr - +2025-02-06 06:50:31 - INFO - stdout - {'loss': 0.3263, 'grad_norm': 1.4231890439987183, 'learning_rate': 2.5095040419829575e-07, 'epoch': 2.79} +2025-02-06 06:50:31 - ERROR - stderr - 93%|█████████▎| 20879/22434 [20:42:50<1:05:47, 2.54s/it] +2025-02-06 06:50:33 - ERROR - stderr - 93%|█████████▎| 20880/22434 [20:42:53<1:05:10, 2.52s/it] +2025-02-06 06:50:33 - ERROR - stderr - +2025-02-06 06:50:33 - ERROR - stderr - +2025-02-06 06:50:33 - INFO - stdout - {'loss': 0.424, 'grad_norm': 1.741549015045166, 'learning_rate': 2.506290968056424e-07, 'epoch': 2.79} +2025-02-06 06:50:33 - ERROR - stderr - 93%|█████████▎| 20880/22434 [20:42:53<1:05:10, 2.52s/it] +2025-02-06 06:50:36 - ERROR - stderr - 93%|█████████▎| 20881/22434 [20:42:55<1:05:14, 2.52s/it] +2025-02-06 06:50:36 - ERROR - stderr - +2025-02-06 06:50:36 - ERROR - stderr - +2025-02-06 06:50:36 - INFO - stdout - {'loss': 0.2997, 'grad_norm': 1.4248981475830078, 'learning_rate': 2.503079926292962e-07, 'epoch': 2.79} +2025-02-06 06:50:36 - ERROR - stderr - 93%|█████████▎| 20881/22434 [20:42:55<1:05:14, 2.52s/it] +2025-02-06 06:50:38 - ERROR - stderr - 93%|█████████▎| 20882/22434 [20:42:58<1:04:28, 2.49s/it] +2025-02-06 06:50:38 - ERROR - stderr - +2025-02-06 06:50:38 - ERROR - stderr - +2025-02-06 06:50:38 - INFO - stdout - {'loss': 0.3078, 'grad_norm': 1.2932381629943848, 'learning_rate': 2.4998709167594946e-07, 'epoch': 2.79} +2025-02-06 06:50:38 - ERROR - stderr - 93%|█████████▎| 20882/22434 [20:42:58<1:04:28, 2.49s/it] +2025-02-06 06:50:41 - ERROR - stderr - 93%|█████████▎| 20883/22434 [20:43:00<1:04:50, 2.51s/it] 
+2025-02-06 06:50:41 - ERROR - stderr - +2025-02-06 06:50:41 - ERROR - stderr - +2025-02-06 06:50:41 - INFO - stdout - {'loss': 0.3695, 'grad_norm': 1.496315836906433, 'learning_rate': 2.4966639395229366e-07, 'epoch': 2.79} +2025-02-06 06:50:41 - ERROR - stderr - 93%|█████████▎| 20883/22434 [20:43:00<1:04:50, 2.51s/it] +2025-02-06 06:50:43 - ERROR - stderr - 93%|█████████▎| 20884/22434 [20:43:03<1:04:24, 2.49s/it] +2025-02-06 06:50:43 - ERROR - stderr - +2025-02-06 06:50:43 - ERROR - stderr - +2025-02-06 06:50:43 - INFO - stdout - {'loss': 0.3556, 'grad_norm': 1.669024109840393, 'learning_rate': 2.493458994650111e-07, 'epoch': 2.79} +2025-02-06 06:50:43 - ERROR - stderr - 93%|█████████▎| 20884/22434 [20:43:03<1:04:24, 2.49s/it] +2025-02-06 06:50:45 - ERROR - stderr - 93%|█████████▎| 20885/22434 [20:43:05<1:04:35, 2.50s/it] +2025-02-06 06:50:46 - ERROR - stderr - +2025-02-06 06:50:46 - ERROR - stderr - +2025-02-06 06:50:46 - INFO - stdout - {'loss': 0.3769, 'grad_norm': 1.6058648824691772, 'learning_rate': 2.4902560822078316e-07, 'epoch': 2.79} +2025-02-06 06:50:46 - ERROR - stderr - 93%|█████████▎| 20885/22434 [20:43:05<1:04:35, 2.50s/it] +2025-02-06 06:50:48 - ERROR - stderr - 93%|█████████▎| 20886/22434 [20:43:08<1:04:26, 2.50s/it] +2025-02-06 06:50:48 - ERROR - stderr - +2025-02-06 06:50:48 - ERROR - stderr - +2025-02-06 06:50:48 - INFO - stdout - {'loss': 0.3597, 'grad_norm': 1.559538722038269, 'learning_rate': 2.487055202262856e-07, 'epoch': 2.79} +2025-02-06 06:50:48 - ERROR - stderr - 93%|█████████▎| 20886/22434 [20:43:08<1:04:26, 2.50s/it] +2025-02-06 06:50:51 - ERROR - stderr - 93%|█████████▎| 20887/22434 [20:43:11<1:07:33, 2.62s/it] +2025-02-06 06:50:51 - ERROR - stderr - +2025-02-06 06:50:51 - ERROR - stderr - +2025-02-06 06:50:51 - INFO - stdout - {'loss': 0.3972, 'grad_norm': 1.6637864112854004, 'learning_rate': 2.483856354881897e-07, 'epoch': 2.79} +2025-02-06 06:50:51 - ERROR - stderr - 93%|█████████▎| 20887/22434 [20:43:11<1:07:33, 2.62s/it] 
+2025-02-06 06:50:53 - ERROR - stderr - 93%|█████████▎| 20888/22434 [20:43:13<1:07:01, 2.60s/it] +2025-02-06 06:50:53 - ERROR - stderr - +2025-02-06 06:50:53 - ERROR - stderr - +2025-02-06 06:50:53 - INFO - stdout - {'loss': 0.3288, 'grad_norm': 1.4640616178512573, 'learning_rate': 2.480659540131647e-07, 'epoch': 2.79} +2025-02-06 06:50:53 - ERROR - stderr - 93%|█████████▎| 20888/22434 [20:43:13<1:07:01, 2.60s/it] +2025-02-06 06:50:56 - ERROR - stderr - 93%|█████████▎| 20889/22434 [20:43:16<1:06:05, 2.57s/it] +2025-02-06 06:50:56 - ERROR - stderr - +2025-02-06 06:50:56 - ERROR - stderr - +2025-02-06 06:50:56 - INFO - stdout - {'loss': 0.316, 'grad_norm': 1.3532356023788452, 'learning_rate': 2.477464758078729e-07, 'epoch': 2.79} +2025-02-06 06:50:56 - ERROR - stderr - 93%|█████████▎| 20889/22434 [20:43:16<1:06:05, 2.57s/it] +2025-02-06 06:50:58 - ERROR - stderr - 93%|█████████▎| 20890/22434 [20:43:18<1:05:24, 2.54s/it] +2025-02-06 06:50:58 - ERROR - stderr - +2025-02-06 06:50:58 - ERROR - stderr - +2025-02-06 06:50:58 - INFO - stdout - {'loss': 0.3394, 'grad_norm': 1.4552329778671265, 'learning_rate': 2.4742720087897466e-07, 'epoch': 2.79} +2025-02-06 06:50:58 - ERROR - stderr - 93%|█████████▎| 20890/22434 [20:43:18<1:05:24, 2.54s/it] +2025-02-06 06:51:01 - ERROR - stderr - 93%|█████████▎| 20891/22434 [20:43:21<1:05:31, 2.55s/it] +2025-02-06 06:51:01 - ERROR - stderr - +2025-02-06 06:51:01 - ERROR - stderr - +2025-02-06 06:51:01 - INFO - stdout - {'loss': 0.352, 'grad_norm': 1.4973418712615967, 'learning_rate': 2.4710812923312346e-07, 'epoch': 2.79} +2025-02-06 06:51:01 - ERROR - stderr - 93%|█████████▎| 20891/22434 [20:43:21<1:05:31, 2.55s/it] +2025-02-06 06:51:04 - ERROR - stderr - 93%|█████████▎| 20892/22434 [20:43:23<1:05:23, 2.54s/it] +2025-02-06 06:51:04 - ERROR - stderr - +2025-02-06 06:51:04 - ERROR - stderr - +2025-02-06 06:51:04 - INFO - stdout - {'loss': 0.3677, 'grad_norm': 1.6040809154510498, 'learning_rate': 2.4678926087697177e-07, 'epoch': 2.79} 
+2025-02-06 06:51:04 - ERROR - stderr - 93%|█████████▎| 20892/22434 [20:43:23<1:05:23, 2.54s/it] +2025-02-06 06:51:06 - ERROR - stderr - 93%|█████████▎| 20893/22434 [20:43:26<1:04:56, 2.53s/it] +2025-02-06 06:51:06 - ERROR - stderr - +2025-02-06 06:51:06 - ERROR - stderr - +2025-02-06 06:51:06 - INFO - stdout - {'loss': 0.3644, 'grad_norm': 1.434979796409607, 'learning_rate': 2.464705958171632e-07, 'epoch': 2.79} +2025-02-06 06:51:06 - ERROR - stderr - 93%|█████████▎| 20893/22434 [20:43:26<1:04:56, 2.53s/it] +2025-02-06 06:51:08 - ERROR - stderr - 93%|█████████▎| 20894/22434 [20:43:28<1:04:22, 2.51s/it] +2025-02-06 06:51:09 - ERROR - stderr - +2025-02-06 06:51:09 - ERROR - stderr - +2025-02-06 06:51:09 - INFO - stdout - {'loss': 0.3422, 'grad_norm': 1.6095519065856934, 'learning_rate': 2.4615213406034345e-07, 'epoch': 2.79} +2025-02-06 06:51:09 - ERROR - stderr - 93%|█████████▎| 20894/22434 [20:43:28<1:04:22, 2.51s/it] +2025-02-06 06:51:11 - ERROR - stderr - 93%|█████████▎| 20895/22434 [20:43:31<1:04:18, 2.51s/it] +2025-02-06 06:51:11 - ERROR - stderr - +2025-02-06 06:51:11 - ERROR - stderr - +2025-02-06 06:51:11 - INFO - stdout - {'loss': 0.3715, 'grad_norm': 1.6396526098251343, 'learning_rate': 2.458338756131484e-07, 'epoch': 2.79} +2025-02-06 06:51:11 - ERROR - stderr - 93%|█████████▎| 20895/22434 [20:43:31<1:04:18, 2.51s/it] +2025-02-06 06:51:13 - ERROR - stderr - 93%|█████████▎| 20896/22434 [20:43:33<1:04:22, 2.51s/it] +2025-02-06 06:51:14 - ERROR - stderr - +2025-02-06 06:51:14 - ERROR - stderr - +2025-02-06 06:51:14 - INFO - stdout - {'loss': 0.3621, 'grad_norm': 1.4860919713974, 'learning_rate': 2.455158204822128e-07, 'epoch': 2.79} +2025-02-06 06:51:14 - ERROR - stderr - 93%|█████████▎| 20896/22434 [20:43:33<1:04:22, 2.51s/it] +2025-02-06 06:51:16 - ERROR - stderr - 93%|█████████▎| 20897/22434 [20:43:36<1:03:38, 2.48s/it] +2025-02-06 06:51:16 - ERROR - stderr - +2025-02-06 06:51:16 - ERROR - stderr - +2025-02-06 06:51:16 - INFO - stdout - {'loss': 0.3539, 
'grad_norm': 1.5590410232543945, 'learning_rate': 2.451979686741668e-07, 'epoch': 2.79} +2025-02-06 06:51:16 - ERROR - stderr - 93%|█████████▎| 20897/22434 [20:43:36<1:03:38, 2.48s/it] +2025-02-06 06:51:18 - ERROR - stderr - 93%|█████████▎| 20898/22434 [20:43:38<1:04:08, 2.51s/it] +2025-02-06 06:51:19 - ERROR - stderr - +2025-02-06 06:51:19 - ERROR - stderr - +2025-02-06 06:51:19 - INFO - stdout - {'loss': 0.3197, 'grad_norm': 1.4203370809555054, 'learning_rate': 2.44880320195634e-07, 'epoch': 2.79} +2025-02-06 06:51:19 - ERROR - stderr - 93%|█████████▎| 20898/22434 [20:43:38<1:04:08, 2.51s/it] +2025-02-06 06:51:21 - ERROR - stderr - 93%|█████████▎| 20899/22434 [20:43:41<1:05:57, 2.58s/it] +2025-02-06 06:51:21 - ERROR - stderr - +2025-02-06 06:51:21 - ERROR - stderr - +2025-02-06 06:51:21 - INFO - stdout - {'loss': 0.3422, 'grad_norm': 1.452337622642517, 'learning_rate': 2.4456287505323693e-07, 'epoch': 2.79} +2025-02-06 06:51:21 - ERROR - stderr - 93%|█████████▎| 20899/22434 [20:43:41<1:05:57, 2.58s/it] +2025-02-06 06:51:24 - ERROR - stderr - 93%|█████████▎| 20900/22434 [20:43:44<1:06:18, 2.59s/it] +2025-02-06 06:51:24 - ERROR - stderr - +2025-02-06 06:51:24 - ERROR - stderr - +2025-02-06 06:51:24 - INFO - stdout - {'loss': 0.3839, 'grad_norm': 1.6493194103240967, 'learning_rate': 2.442456332535903e-07, 'epoch': 2.79} +2025-02-06 06:51:24 - ERROR - stderr - 93%|█████████▎| 20900/22434 [20:43:44<1:06:18, 2.59s/it] +2025-02-06 06:51:26 - ERROR - stderr - 93%|█████████▎| 20901/22434 [20:43:46<1:05:43, 2.57s/it] +2025-02-06 06:51:26 - ERROR - stderr - +2025-02-06 06:51:26 - ERROR - stderr - +2025-02-06 06:51:26 - INFO - stdout - {'loss': 0.3728, 'grad_norm': 1.625227928161621, 'learning_rate': 2.4392859480330876e-07, 'epoch': 2.79} +2025-02-06 06:51:26 - ERROR - stderr - 93%|█████████▎| 20901/22434 [20:43:46<1:05:43, 2.57s/it] +2025-02-06 06:51:29 - ERROR - stderr - 93%|█████████▎| 20902/22434 [20:43:49<1:04:38, 2.53s/it] +2025-02-06 06:51:29 - ERROR - stderr - 
+2025-02-06 06:51:29 - ERROR - stderr - +2025-02-06 06:51:29 - INFO - stdout - {'loss': 0.3677, 'grad_norm': 1.6872650384902954, 'learning_rate': 2.4361175970900154e-07, 'epoch': 2.8} +2025-02-06 06:51:29 - ERROR - stderr - 93%|█████████▎| 20902/22434 [20:43:49<1:04:38, 2.53s/it] +2025-02-06 06:51:31 - ERROR - stderr - 93%|█████████▎| 20903/22434 [20:43:51<1:03:44, 2.50s/it] +2025-02-06 06:51:31 - ERROR - stderr - +2025-02-06 06:51:31 - ERROR - stderr - +2025-02-06 06:51:31 - INFO - stdout - {'loss': 0.3823, 'grad_norm': 1.5454391241073608, 'learning_rate': 2.4329512797726884e-07, 'epoch': 2.8} +2025-02-06 06:51:31 - ERROR - stderr - 93%|█████████▎| 20903/22434 [20:43:51<1:03:44, 2.50s/it] +2025-02-06 06:51:34 - ERROR - stderr - 93%|█████████▎| 20904/22434 [20:43:54<1:04:47, 2.54s/it] +2025-02-06 06:51:34 - ERROR - stderr - +2025-02-06 06:51:34 - ERROR - stderr - +2025-02-06 06:51:34 - INFO - stdout - {'loss': 0.4122, 'grad_norm': 1.8218274116516113, 'learning_rate': 2.4297869961471544e-07, 'epoch': 2.8} +2025-02-06 06:51:34 - ERROR - stderr - 93%|█████████▎| 20904/22434 [20:43:54<1:04:47, 2.54s/it] +2025-02-06 06:51:36 - ERROR - stderr - 93%|█████████▎| 20905/22434 [20:43:56<1:04:39, 2.54s/it] +2025-02-06 06:51:36 - ERROR - stderr - +2025-02-06 06:51:36 - ERROR - stderr - +2025-02-06 06:51:36 - INFO - stdout - {'loss': 0.3433, 'grad_norm': 1.775802493095398, 'learning_rate': 2.426624746279327e-07, 'epoch': 2.8} +2025-02-06 06:51:36 - ERROR - stderr - 93%|█████████▎| 20905/22434 [20:43:56<1:04:39, 2.54s/it] +2025-02-06 06:51:39 - ERROR - stderr - 93%|█████████▎| 20906/22434 [20:43:59<1:04:09, 2.52s/it] +2025-02-06 06:51:39 - ERROR - stderr - +2025-02-06 06:51:39 - ERROR - stderr - +2025-02-06 06:51:39 - INFO - stdout - {'loss': 0.3137, 'grad_norm': 1.3720479011535645, 'learning_rate': 2.423464530235153e-07, 'epoch': 2.8} +2025-02-06 06:51:39 - ERROR - stderr - 93%|█████████▎| 20906/22434 [20:43:59<1:04:09, 2.52s/it] +2025-02-06 06:51:41 - ERROR - stderr - 
93%|█████████▎| 20907/22434 [20:44:01<1:03:54, 2.51s/it] +2025-02-06 06:51:41 - ERROR - stderr - +2025-02-06 06:51:41 - ERROR - stderr - +2025-02-06 06:51:41 - INFO - stdout - {'loss': 0.3349, 'grad_norm': 1.4110440015792847, 'learning_rate': 2.420306348080481e-07, 'epoch': 2.8} +2025-02-06 06:51:41 - ERROR - stderr - 93%|█████████▎| 20907/22434 [20:44:01<1:03:54, 2.51s/it] +2025-02-06 06:51:44 - ERROR - stderr - 93%|█████████▎| 20908/22434 [20:44:04<1:03:20, 2.49s/it] +2025-02-06 06:51:44 - ERROR - stderr - +2025-02-06 06:51:44 - ERROR - stderr - +2025-02-06 06:51:44 - INFO - stdout - {'loss': 0.3626, 'grad_norm': 1.6522252559661865, 'learning_rate': 2.4171501998811466e-07, 'epoch': 2.8} +2025-02-06 06:51:44 - ERROR - stderr - 93%|█████████▎| 20908/22434 [20:44:04<1:03:20, 2.49s/it] +2025-02-06 06:51:46 - ERROR - stderr - 93%|█████████▎| 20909/22434 [20:44:06<1:03:17, 2.49s/it] +2025-02-06 06:51:46 - ERROR - stderr - +2025-02-06 06:51:46 - ERROR - stderr - +2025-02-06 06:51:46 - INFO - stdout - {'loss': 0.3707, 'grad_norm': 1.6671603918075562, 'learning_rate': 2.413996085702952e-07, 'epoch': 2.8} +2025-02-06 06:51:46 - ERROR - stderr - 93%|█████████▎| 20909/22434 [20:44:06<1:03:17, 2.49s/it] +2025-02-06 06:51:49 - ERROR - stderr - 93%|█████████▎| 20910/22434 [20:44:09<1:03:24, 2.50s/it] +2025-02-06 06:51:49 - ERROR - stderr - +2025-02-06 06:51:49 - ERROR - stderr - +2025-02-06 06:51:49 - INFO - stdout - {'loss': 0.3181, 'grad_norm': 1.410933256149292, 'learning_rate': 2.4108440056116236e-07, 'epoch': 2.8} +2025-02-06 06:51:49 - ERROR - stderr - 93%|█████████▎| 20910/22434 [20:44:09<1:03:24, 2.50s/it] +2025-02-06 06:51:51 - ERROR - stderr - 93%|█████████▎| 20911/22434 [20:44:11<1:03:34, 2.50s/it] +2025-02-06 06:51:51 - ERROR - stderr - +2025-02-06 06:51:51 - ERROR - stderr - +2025-02-06 06:51:51 - INFO - stdout - {'loss': 0.3082, 'grad_norm': 1.4107152223587036, 'learning_rate': 2.407693959672874e-07, 'epoch': 2.8} +2025-02-06 06:51:51 - ERROR - stderr - 
93%|█████████▎| 20911/22434 [20:44:11<1:03:34, 2.50s/it] +2025-02-06 06:51:54 - ERROR - stderr - 93%|█████████▎| 20912/22434 [20:44:14<1:03:22, 2.50s/it] +2025-02-06 06:51:54 - ERROR - stderr - +2025-02-06 06:51:54 - ERROR - stderr - +2025-02-06 06:51:54 - INFO - stdout - {'loss': 0.3607, 'grad_norm': 1.4872812032699585, 'learning_rate': 2.4045459479523524e-07, 'epoch': 2.8} +2025-02-06 06:51:54 - ERROR - stderr - 93%|█████████▎| 20912/22434 [20:44:14<1:03:22, 2.50s/it] +2025-02-06 06:51:56 - ERROR - stderr - 93%|█████████▎| 20913/22434 [20:44:16<1:03:36, 2.51s/it] +2025-02-06 06:51:56 - ERROR - stderr - +2025-02-06 06:51:56 - ERROR - stderr - +2025-02-06 06:51:56 - INFO - stdout - {'loss': 0.3544, 'grad_norm': 1.584105134010315, 'learning_rate': 2.4013999705156834e-07, 'epoch': 2.8} +2025-02-06 06:51:56 - ERROR - stderr - 93%|█████████▎| 20913/22434 [20:44:16<1:03:36, 2.51s/it] +2025-02-06 06:51:59 - ERROR - stderr - 93%|█████████▎| 20914/22434 [20:44:19<1:03:07, 2.49s/it] +2025-02-06 06:51:59 - ERROR - stderr - +2025-02-06 06:51:59 - ERROR - stderr - +2025-02-06 06:51:59 - INFO - stdout - {'loss': 0.3679, 'grad_norm': 1.567333459854126, 'learning_rate': 2.398256027428436e-07, 'epoch': 2.8} +2025-02-06 06:51:59 - ERROR - stderr - 93%|█████████▎| 20914/22434 [20:44:19<1:03:07, 2.49s/it] +2025-02-06 06:52:01 - ERROR - stderr - 93%|█████████▎| 20915/22434 [20:44:21<1:02:56, 2.49s/it] +2025-02-06 06:52:01 - ERROR - stderr - +2025-02-06 06:52:01 - ERROR - stderr - +2025-02-06 06:52:01 - INFO - stdout - {'loss': 0.3336, 'grad_norm': 1.5118257999420166, 'learning_rate': 2.395114118756148e-07, 'epoch': 2.8} +2025-02-06 06:52:01 - ERROR - stderr - 93%|█████████▎| 20915/22434 [20:44:21<1:02:56, 2.49s/it] +2025-02-06 06:52:04 - ERROR - stderr - 93%|█████████▎| 20916/22434 [20:44:24<1:04:46, 2.56s/it] +2025-02-06 06:52:04 - ERROR - stderr - +2025-02-06 06:52:04 - ERROR - stderr - +2025-02-06 06:52:04 - INFO - stdout - {'loss': 0.3848, 'grad_norm': 1.5153311491012573, 
'learning_rate': 2.39197424456431e-07, 'epoch': 2.8} +2025-02-06 06:52:04 - ERROR - stderr - 93%|█████████▎| 20916/22434 [20:44:24<1:04:46, 2.56s/it] +2025-02-06 06:52:07 - ERROR - stderr - 93%|█████████▎| 20917/22434 [20:44:26<1:04:48, 2.56s/it] +2025-02-06 06:52:07 - ERROR - stderr - +2025-02-06 06:52:07 - ERROR - stderr - +2025-02-06 06:52:07 - INFO - stdout - {'loss': 0.3529, 'grad_norm': 1.5067511796951294, 'learning_rate': 2.388836404918371e-07, 'epoch': 2.8} +2025-02-06 06:52:07 - ERROR - stderr - 93%|█████████▎| 20917/22434 [20:44:26<1:04:48, 2.56s/it] +2025-02-06 06:52:09 - ERROR - stderr - 93%|█████████▎| 20918/22434 [20:44:29<1:05:09, 2.58s/it] +2025-02-06 06:52:09 - ERROR - stderr - +2025-02-06 06:52:09 - ERROR - stderr - +2025-02-06 06:52:09 - INFO - stdout - {'loss': 0.3446, 'grad_norm': 1.625604510307312, 'learning_rate': 2.385700599883745e-07, 'epoch': 2.8} +2025-02-06 06:52:09 - ERROR - stderr - 93%|█████████▎| 20918/22434 [20:44:29<1:05:09, 2.58s/it] +2025-02-06 06:52:12 - ERROR - stderr - 93%|█████████▎| 20919/22434 [20:44:32<1:05:35, 2.60s/it] +2025-02-06 06:52:12 - ERROR - stderr - +2025-02-06 06:52:12 - ERROR - stderr - +2025-02-06 06:52:12 - INFO - stdout - {'loss': 0.366, 'grad_norm': 1.7500056028366089, 'learning_rate': 2.3825668295257563e-07, 'epoch': 2.8} +2025-02-06 06:52:12 - ERROR - stderr - 93%|█████████▎| 20919/22434 [20:44:32<1:05:35, 2.60s/it] +2025-02-06 06:52:14 - ERROR - stderr - 93%|█████████▎| 20920/22434 [20:44:34<1:05:05, 2.58s/it] +2025-02-06 06:52:14 - ERROR - stderr - +2025-02-06 06:52:14 - ERROR - stderr - +2025-02-06 06:52:14 - INFO - stdout - {'loss': 0.3783, 'grad_norm': 1.5812227725982666, 'learning_rate': 2.3794350939097653e-07, 'epoch': 2.8} +2025-02-06 06:52:14 - ERROR - stderr - 93%|█████████▎| 20920/22434 [20:44:34<1:05:05, 2.58s/it] +2025-02-06 06:52:17 - ERROR - stderr - 93%|█████████▎| 20921/22434 [20:44:37<1:04:29, 2.56s/it] +2025-02-06 06:52:17 - ERROR - stderr - +2025-02-06 06:52:17 - ERROR - stderr - 
+2025-02-06 06:52:17 - INFO - stdout - {'loss': 0.3921, 'grad_norm': 1.5940697193145752, 'learning_rate': 2.3763053931010415e-07, 'epoch': 2.8} +2025-02-06 06:52:17 - ERROR - stderr - 93%|█████████▎| 20921/22434 [20:44:37<1:04:29, 2.56s/it] +2025-02-06 06:52:19 - ERROR - stderr - 93%|█████████▎| 20922/22434 [20:44:39<1:04:17, 2.55s/it] +2025-02-06 06:52:19 - ERROR - stderr - +2025-02-06 06:52:19 - ERROR - stderr - +2025-02-06 06:52:19 - INFO - stdout - {'loss': 0.4133, 'grad_norm': 1.4274922609329224, 'learning_rate': 2.3731777271647995e-07, 'epoch': 2.8} +2025-02-06 06:52:19 - ERROR - stderr - 93%|█████████▎| 20922/22434 [20:44:39<1:04:17, 2.55s/it] +2025-02-06 06:52:22 - ERROR - stderr - 93%|█████████▎| 20923/22434 [20:44:42<1:03:47, 2.53s/it] +2025-02-06 06:52:22 - ERROR - stderr - +2025-02-06 06:52:22 - ERROR - stderr - +2025-02-06 06:52:22 - INFO - stdout - {'loss': 0.3702, 'grad_norm': 1.691535234451294, 'learning_rate': 2.3700520961662753e-07, 'epoch': 2.8} +2025-02-06 06:52:22 - ERROR - stderr - 93%|█████████▎| 20923/22434 [20:44:42<1:03:47, 2.53s/it] +2025-02-06 06:52:24 - ERROR - stderr - 93%|█████████▎| 20924/22434 [20:44:44<1:03:32, 2.52s/it] +2025-02-06 06:52:24 - ERROR - stderr - +2025-02-06 06:52:24 - ERROR - stderr - +2025-02-06 06:52:24 - INFO - stdout - {'loss': 0.3559, 'grad_norm': 1.6848517656326294, 'learning_rate': 2.3669285001705734e-07, 'epoch': 2.8} +2025-02-06 06:52:24 - ERROR - stderr - 93%|█████████▎| 20924/22434 [20:44:44<1:03:32, 2.52s/it] +2025-02-06 06:52:27 - ERROR - stderr - 93%|█████████▎| 20925/22434 [20:44:47<1:03:06, 2.51s/it] +2025-02-06 06:52:27 - ERROR - stderr - +2025-02-06 06:52:27 - ERROR - stderr - +2025-02-06 06:52:27 - INFO - stdout - {'loss': 0.3594, 'grad_norm': 1.495120644569397, 'learning_rate': 2.36380693924283e-07, 'epoch': 2.8} +2025-02-06 06:52:27 - ERROR - stderr - 93%|█████████▎| 20925/22434 [20:44:47<1:03:06, 2.51s/it] +2025-02-06 06:52:29 - ERROR - stderr - 93%|█████████▎| 20926/22434 [20:44:49<1:03:52, 
2.54s/it] +2025-02-06 06:52:30 - ERROR - stderr - +2025-02-06 06:52:30 - ERROR - stderr - +2025-02-06 06:52:30 - INFO - stdout - {'loss': 0.364, 'grad_norm': 1.551878571510315, 'learning_rate': 2.360687413448104e-07, 'epoch': 2.8} +2025-02-06 06:52:30 - ERROR - stderr - 93%|█████████▎| 20926/22434 [20:44:49<1:03:52, 2.54s/it] +2025-02-06 06:52:32 - ERROR - stderr - 93%|█████████▎| 20927/22434 [20:44:52<1:03:51, 2.54s/it] +2025-02-06 06:52:32 - ERROR - stderr - +2025-02-06 06:52:32 - ERROR - stderr - +2025-02-06 06:52:32 - INFO - stdout - {'loss': 0.3864, 'grad_norm': 1.6506924629211426, 'learning_rate': 2.3575699228514105e-07, 'epoch': 2.8} +2025-02-06 06:52:32 - ERROR - stderr - 93%|█████████▎| 20927/22434 [20:44:52<1:03:51, 2.54s/it] +2025-02-06 06:52:35 - ERROR - stderr - 93%|█████████▎| 20928/22434 [20:44:54<1:03:48, 2.54s/it] +2025-02-06 06:52:35 - ERROR - stderr - +2025-02-06 06:52:35 - ERROR - stderr - +2025-02-06 06:52:35 - INFO - stdout - {'loss': 0.3682, 'grad_norm': 1.594507098197937, 'learning_rate': 2.3544544675177528e-07, 'epoch': 2.8} +2025-02-06 06:52:35 - ERROR - stderr - 93%|█████████▎| 20928/22434 [20:44:54<1:03:48, 2.54s/it] +2025-02-06 06:52:37 - ERROR - stderr - 93%|█████████▎| 20929/22434 [20:44:57<1:03:23, 2.53s/it] +2025-02-06 06:52:37 - ERROR - stderr - +2025-02-06 06:52:37 - ERROR - stderr - +2025-02-06 06:52:37 - INFO - stdout - {'loss': 0.3536, 'grad_norm': 1.5190945863723755, 'learning_rate': 2.3513410475120456e-07, 'epoch': 2.8} +2025-02-06 06:52:37 - ERROR - stderr - 93%|█████████▎| 20929/22434 [20:44:57<1:03:23, 2.53s/it] +2025-02-06 06:52:40 - ERROR - stderr - 93%|█████████▎| 20930/22434 [20:44:59<1:02:49, 2.51s/it] +2025-02-06 06:52:40 - ERROR - stderr - +2025-02-06 06:52:40 - ERROR - stderr - +2025-02-06 06:52:40 - INFO - stdout - {'loss': 0.3727, 'grad_norm': 1.5221331119537354, 'learning_rate': 2.348229662899193e-07, 'epoch': 2.8} +2025-02-06 06:52:40 - ERROR - stderr - 93%|█████████▎| 20930/22434 [20:44:59<1:02:49, 2.51s/it] 
+2025-02-06 06:52:42 - ERROR - stderr - 93%|█████████▎| 20931/22434 [20:45:02<1:02:48, 2.51s/it] +2025-02-06 06:52:42 - ERROR - stderr - +2025-02-06 06:52:42 - ERROR - stderr - +2025-02-06 06:52:42 - INFO - stdout - {'loss': 0.3326, 'grad_norm': 1.519376277923584, 'learning_rate': 2.3451203137440538e-07, 'epoch': 2.8} +2025-02-06 06:52:42 - ERROR - stderr - 93%|█████████▎| 20931/22434 [20:45:02<1:02:48, 2.51s/it] +2025-02-06 06:52:45 - ERROR - stderr - 93%|█████████▎| 20932/22434 [20:45:04<1:02:50, 2.51s/it] +2025-02-06 06:52:45 - ERROR - stderr - +2025-02-06 06:52:45 - ERROR - stderr - +2025-02-06 06:52:45 - INFO - stdout - {'loss': 0.3773, 'grad_norm': 1.7048168182373047, 'learning_rate': 2.3420130001114317e-07, 'epoch': 2.8} +2025-02-06 06:52:45 - ERROR - stderr - 93%|█████████▎| 20932/22434 [20:45:04<1:02:50, 2.51s/it] +2025-02-06 06:52:47 - ERROR - stderr - 93%|█████████▎| 20933/22434 [20:45:07<1:02:34, 2.50s/it] +2025-02-06 06:52:47 - ERROR - stderr - +2025-02-06 06:52:47 - ERROR - stderr - +2025-02-06 06:52:47 - INFO - stdout - {'loss': 0.3761, 'grad_norm': 1.7285701036453247, 'learning_rate': 2.338907722066097e-07, 'epoch': 2.8} +2025-02-06 06:52:47 - ERROR - stderr - 93%|█████████▎| 20933/22434 [20:45:07<1:02:34, 2.50s/it] +2025-02-06 06:52:50 - ERROR - stderr - 93%|█████████▎| 20934/22434 [20:45:09<1:02:20, 2.49s/it] +2025-02-06 06:52:50 - ERROR - stderr - +2025-02-06 06:52:50 - ERROR - stderr - +2025-02-06 06:52:50 - INFO - stdout - {'loss': 0.3626, 'grad_norm': 1.6800168752670288, 'learning_rate': 2.3358044796727874e-07, 'epoch': 2.8} +2025-02-06 06:52:50 - ERROR - stderr - 93%|█████████▎| 20934/22434 [20:45:09<1:02:20, 2.49s/it] +2025-02-06 06:52:52 - ERROR - stderr - 93%|█████████▎| 20935/22434 [20:45:12<1:02:42, 2.51s/it] +2025-02-06 06:52:52 - ERROR - stderr - +2025-02-06 06:52:52 - ERROR - stderr - +2025-02-06 06:52:52 - INFO - stdout - {'loss': 0.3678, 'grad_norm': 1.5824846029281616, 'learning_rate': 2.332703272996173e-07, 'epoch': 2.8} 
+2025-02-06 06:52:52 - ERROR - stderr - 93%|█████████▎| 20935/22434 [20:45:12<1:02:42, 2.51s/it] +2025-02-06 06:52:54 - ERROR - stderr - 93%|█████████▎| 20936/22434 [20:45:14<1:02:09, 2.49s/it] +2025-02-06 06:52:55 - ERROR - stderr - +2025-02-06 06:52:55 - ERROR - stderr - +2025-02-06 06:52:55 - INFO - stdout - {'loss': 0.3447, 'grad_norm': 1.6931216716766357, 'learning_rate': 2.329604102100913e-07, 'epoch': 2.8} +2025-02-06 06:52:55 - ERROR - stderr - 93%|█████████▎| 20936/22434 [20:45:14<1:02:09, 2.49s/it] +2025-02-06 06:52:57 - ERROR - stderr - 93%|█████████▎| 20937/22434 [20:45:17<1:01:53, 2.48s/it] +2025-02-06 06:52:57 - ERROR - stderr - +2025-02-06 06:52:57 - ERROR - stderr - +2025-02-06 06:52:57 - INFO - stdout - {'loss': 0.4196, 'grad_norm': 1.62696373462677, 'learning_rate': 2.3265069670515894e-07, 'epoch': 2.8} +2025-02-06 06:52:57 - ERROR - stderr - 93%|█████████▎| 20937/22434 [20:45:17<1:01:53, 2.48s/it] +2025-02-06 06:53:00 - ERROR - stderr - 93%|█████████▎| 20938/22434 [20:45:19<1:03:47, 2.56s/it] +2025-02-06 06:53:00 - ERROR - stderr - +2025-02-06 06:53:00 - ERROR - stderr - +2025-02-06 06:53:00 - INFO - stdout - {'loss': 0.3547, 'grad_norm': 1.4578145742416382, 'learning_rate': 2.3234118679127615e-07, 'epoch': 2.8} +2025-02-06 06:53:00 - ERROR - stderr - 93%|█████████▎| 20938/22434 [20:45:20<1:03:47, 2.56s/it] +2025-02-06 06:53:02 - ERROR - stderr - 93%|█████████▎| 20939/22434 [20:45:22<1:04:49, 2.60s/it] +2025-02-06 06:53:02 - ERROR - stderr - +2025-02-06 06:53:02 - ERROR - stderr - +2025-02-06 06:53:02 - INFO - stdout - {'loss': 0.4086, 'grad_norm': 1.4603453874588013, 'learning_rate': 2.3203188047489443e-07, 'epoch': 2.8} +2025-02-06 06:53:02 - ERROR - stderr - 93%|█████████▎| 20939/22434 [20:45:22<1:04:49, 2.60s/it] +2025-02-06 06:53:05 - ERROR - stderr - 93%|█████████▎| 20940/22434 [20:45:25<1:03:39, 2.56s/it] +2025-02-06 06:53:05 - ERROR - stderr - +2025-02-06 06:53:05 - ERROR - stderr - +2025-02-06 06:53:05 - INFO - stdout - {'loss': 0.3395, 
'grad_norm': 1.5470774173736572, 'learning_rate': 2.317227777624609e-07, 'epoch': 2.8} +2025-02-06 06:53:05 - ERROR - stderr - 93%|█████████▎| 20940/22434 [20:45:25<1:03:39, 2.56s/it] +2025-02-06 06:53:07 - ERROR - stderr - 93%|█████████▎| 20941/22434 [20:45:27<1:03:18, 2.54s/it] +2025-02-06 06:53:07 - ERROR - stderr - +2025-02-06 06:53:07 - ERROR - stderr - +2025-02-06 06:53:07 - INFO - stdout - {'loss': 0.3884, 'grad_norm': 1.6321628093719482, 'learning_rate': 2.314138786604203e-07, 'epoch': 2.8} +2025-02-06 06:53:07 - ERROR - stderr - 93%|█████████▎| 20941/22434 [20:45:27<1:03:18, 2.54s/it] +2025-02-06 06:53:10 - ERROR - stderr - 93%|█████████▎| 20942/22434 [20:45:30<1:03:18, 2.55s/it] +2025-02-06 06:53:10 - ERROR - stderr - +2025-02-06 06:53:10 - ERROR - stderr - +2025-02-06 06:53:10 - INFO - stdout - {'loss': 0.3824, 'grad_norm': 1.668300747871399, 'learning_rate': 2.311051831752098e-07, 'epoch': 2.8} +2025-02-06 06:53:10 - ERROR - stderr - 93%|█████████▎| 20942/22434 [20:45:30<1:03:18, 2.55s/it] +2025-02-06 06:53:12 - ERROR - stderr - 93%|█████████▎| 20943/22434 [20:45:32<1:02:54, 2.53s/it] +2025-02-06 06:53:12 - ERROR - stderr - +2025-02-06 06:53:12 - ERROR - stderr - +2025-02-06 06:53:12 - INFO - stdout - {'loss': 0.3393, 'grad_norm': 1.5133758783340454, 'learning_rate': 2.30796691313262e-07, 'epoch': 2.8} +2025-02-06 06:53:12 - ERROR - stderr - 93%|█████████▎| 20943/22434 [20:45:32<1:02:54, 2.53s/it] +2025-02-06 06:53:15 - ERROR - stderr - 93%|█████████▎| 20944/22434 [20:45:35<1:02:42, 2.53s/it] +2025-02-06 06:53:15 - ERROR - stderr - +2025-02-06 06:53:15 - ERROR - stderr - +2025-02-06 06:53:15 - INFO - stdout - {'loss': 0.3386, 'grad_norm': 1.3428596258163452, 'learning_rate': 2.304884030810117e-07, 'epoch': 2.8} +2025-02-06 06:53:15 - ERROR - stderr - 93%|█████████▎| 20944/22434 [20:45:35<1:02:42, 2.53s/it] +2025-02-06 06:53:17 - ERROR - stderr - 93%|█████████▎| 20945/22434 [20:45:37<1:02:14, 2.51s/it] +2025-02-06 06:53:17 - ERROR - stderr - +2025-02-06 
06:53:17 - ERROR - stderr - +2025-02-06 06:53:17 - INFO - stdout - {'loss': 0.2985, 'grad_norm': 1.3497636318206787, 'learning_rate': 2.3018031848488055e-07, 'epoch': 2.8} +2025-02-06 06:53:17 - ERROR - stderr - 93%|█████████▎| 20945/22434 [20:45:37<1:02:14, 2.51s/it] +2025-02-06 06:53:20 - ERROR - stderr - 93%|█████████▎| 20946/22434 [20:45:40<1:01:51, 2.49s/it] +2025-02-06 06:53:20 - ERROR - stderr - +2025-02-06 06:53:20 - ERROR - stderr - +2025-02-06 06:53:20 - INFO - stdout - {'loss': 0.3497, 'grad_norm': 1.4529153108596802, 'learning_rate': 2.2987243753129107e-07, 'epoch': 2.8} +2025-02-06 06:53:20 - ERROR - stderr - 93%|█████████▎| 20946/22434 [20:45:40<1:01:51, 2.49s/it] +2025-02-06 06:53:22 - ERROR - stderr - 93%|█████████▎| 20947/22434 [20:45:42<1:02:11, 2.51s/it] +2025-02-06 06:53:22 - ERROR - stderr - +2025-02-06 06:53:22 - ERROR - stderr - +2025-02-06 06:53:22 - INFO - stdout - {'loss': 0.3745, 'grad_norm': 1.6158350706100464, 'learning_rate': 2.2956476022666375e-07, 'epoch': 2.8} +2025-02-06 06:53:22 - ERROR - stderr - 93%|█████████▎| 20947/22434 [20:45:42<1:02:11, 2.51s/it] +2025-02-06 06:53:25 - ERROR - stderr - 93%|█████████▎| 20948/22434 [20:45:45<1:02:28, 2.52s/it] +2025-02-06 06:53:25 - ERROR - stderr - +2025-02-06 06:53:25 - ERROR - stderr - +2025-02-06 06:53:25 - INFO - stdout - {'loss': 0.364, 'grad_norm': 1.6476553678512573, 'learning_rate': 2.2925728657740786e-07, 'epoch': 2.8} +2025-02-06 06:53:25 - ERROR - stderr - 93%|█████████▎| 20948/22434 [20:45:45<1:02:28, 2.52s/it] +2025-02-06 06:53:27 - ERROR - stderr - 93%|█████████▎| 20949/22434 [20:45:47<1:01:49, 2.50s/it] +2025-02-06 06:53:27 - ERROR - stderr - +2025-02-06 06:53:27 - ERROR - stderr - +2025-02-06 06:53:27 - INFO - stdout - {'loss': 0.4103, 'grad_norm': 1.6119074821472168, 'learning_rate': 2.289500165899361e-07, 'epoch': 2.8} +2025-02-06 06:53:27 - ERROR - stderr - 93%|█████████▎| 20949/22434 [20:45:47<1:01:49, 2.50s/it] +2025-02-06 06:53:30 - ERROR - stderr - 93%|█████████▎| 
20950/22434 [20:45:50<1:01:14, 2.48s/it] +2025-02-06 06:53:30 - ERROR - stderr - +2025-02-06 06:53:30 - ERROR - stderr - +2025-02-06 06:53:30 - INFO - stdout - {'loss': 0.3817, 'grad_norm': 1.6339457035064697, 'learning_rate': 2.2864295027064997e-07, 'epoch': 2.8} +2025-02-06 06:53:30 - ERROR - stderr - 93%|█████████▎| 20950/22434 [20:45:50<1:01:14, 2.48s/it] +2025-02-06 06:53:32 - ERROR - stderr - 93%|█████████▎| 20951/22434 [20:45:52<1:01:10, 2.48s/it] +2025-02-06 06:53:32 - ERROR - stderr - +2025-02-06 06:53:32 - ERROR - stderr - +2025-02-06 06:53:32 - INFO - stdout - {'loss': 0.3713, 'grad_norm': 1.5531072616577148, 'learning_rate': 2.2833608762595217e-07, 'epoch': 2.8} +2025-02-06 06:53:32 - ERROR - stderr - 93%|█████████▎| 20951/22434 [20:45:52<1:01:10, 2.48s/it] +2025-02-06 06:53:35 - ERROR - stderr - 93%|█████████▎| 20952/22434 [20:45:55<1:01:35, 2.49s/it] +2025-02-06 06:53:35 - ERROR - stderr - +2025-02-06 06:53:35 - ERROR - stderr - +2025-02-06 06:53:35 - INFO - stdout - {'loss': 0.3954, 'grad_norm': 1.6813520193099976, 'learning_rate': 2.2802942866223754e-07, 'epoch': 2.8} +2025-02-06 06:53:35 - ERROR - stderr - 93%|█████████▎| 20952/22434 [20:45:55<1:01:35, 2.49s/it] +2025-02-06 06:53:37 - ERROR - stderr - 93%|█████████▎| 20953/22434 [20:45:57<1:02:30, 2.53s/it] +2025-02-06 06:53:37 - ERROR - stderr - +2025-02-06 06:53:37 - ERROR - stderr - +2025-02-06 06:53:37 - INFO - stdout - {'loss': 0.3632, 'grad_norm': 1.667297124862671, 'learning_rate': 2.2772297338589878e-07, 'epoch': 2.8} +2025-02-06 06:53:38 - ERROR - stderr - 93%|█████████▎| 20953/22434 [20:45:57<1:02:30, 2.53s/it] +2025-02-06 06:53:40 - ERROR - stderr - 93%|█████████▎| 20954/22434 [20:46:00<1:01:55, 2.51s/it] +2025-02-06 06:53:40 - ERROR - stderr - +2025-02-06 06:53:40 - ERROR - stderr - +2025-02-06 06:53:40 - INFO - stdout - {'loss': 0.3834, 'grad_norm': 1.5414049625396729, 'learning_rate': 2.2741672180332409e-07, 'epoch': 2.8} +2025-02-06 06:53:40 - ERROR - stderr - 93%|█████████▎| 
20954/22434 [20:46:00<1:01:55, 2.51s/it] +2025-02-06 06:53:43 - ERROR - stderr - 93%|█████████▎| 20955/22434 [20:46:02<1:02:58, 2.55s/it] +2025-02-06 06:53:43 - ERROR - stderr - +2025-02-06 06:53:43 - ERROR - stderr - +2025-02-06 06:53:43 - INFO - stdout - {'loss': 0.3341, 'grad_norm': 1.3934364318847656, 'learning_rate': 2.2711067392089613e-07, 'epoch': 2.8} +2025-02-06 06:53:43 - ERROR - stderr - 93%|█████████▎| 20955/22434 [20:46:02<1:02:58, 2.55s/it] +2025-02-06 06:53:45 - ERROR - stderr - 93%|█████████▎| 20956/22434 [20:46:05<1:02:17, 2.53s/it] +2025-02-06 06:53:45 - ERROR - stderr - +2025-02-06 06:53:45 - ERROR - stderr - +2025-02-06 06:53:45 - INFO - stdout - {'loss': 0.3874, 'grad_norm': 1.796728253364563, 'learning_rate': 2.268048297449943e-07, 'epoch': 2.8} +2025-02-06 06:53:45 - ERROR - stderr - 93%|█████████▎| 20956/22434 [20:46:05<1:02:17, 2.53s/it] +2025-02-06 06:53:48 - ERROR - stderr - 93%|█████████▎| 20957/22434 [20:46:07<1:01:55, 2.52s/it] +2025-02-06 06:53:48 - ERROR - stderr - +2025-02-06 06:53:48 - ERROR - stderr - +2025-02-06 06:53:48 - INFO - stdout - {'loss': 0.363, 'grad_norm': 1.4994337558746338, 'learning_rate': 2.2649918928199455e-07, 'epoch': 2.8} +2025-02-06 06:53:48 - ERROR - stderr - 93%|█████████▎| 20957/22434 [20:46:07<1:01:55, 2.52s/it] +2025-02-06 06:53:50 - ERROR - stderr - 93%|█████████▎| 20958/22434 [20:46:10<1:01:46, 2.51s/it] +2025-02-06 06:53:50 - ERROR - stderr - +2025-02-06 06:53:50 - ERROR - stderr - +2025-02-06 06:53:50 - INFO - stdout - {'loss': 0.3642, 'grad_norm': 1.6909615993499756, 'learning_rate': 2.2619375253826624e-07, 'epoch': 2.8} +2025-02-06 06:53:50 - ERROR - stderr - 93%|█████████▎| 20958/22434 [20:46:10<1:01:46, 2.51s/it] +2025-02-06 06:53:52 - ERROR - stderr - 93%|█████████▎| 20959/22434 [20:46:12<1:01:08, 2.49s/it] +2025-02-06 06:53:53 - ERROR - stderr - +2025-02-06 06:53:53 - ERROR - stderr - +2025-02-06 06:53:53 - INFO - stdout - {'loss': 0.3913, 'grad_norm': 1.7342839241027832, 'learning_rate': 
2.2588851952017653e-07, 'epoch': 2.8} +2025-02-06 06:53:53 - ERROR - stderr - 93%|█████████▎| 20959/22434 [20:46:12<1:01:08, 2.49s/it] +2025-02-06 06:53:55 - ERROR - stderr - 93%|█████████▎| 20960/22434 [20:46:15<1:01:06, 2.49s/it] +2025-02-06 06:53:55 - ERROR - stderr - +2025-02-06 06:53:55 - ERROR - stderr - +2025-02-06 06:53:55 - INFO - stdout - {'loss': 0.3467, 'grad_norm': 1.452297329902649, 'learning_rate': 2.255834902340881e-07, 'epoch': 2.8} +2025-02-06 06:53:55 - ERROR - stderr - 93%|█████████▎| 20960/22434 [20:46:15<1:01:06, 2.49s/it] +2025-02-06 06:53:57 - ERROR - stderr - 93%|█████████▎| 20961/22434 [20:46:17<1:01:10, 2.49s/it] +2025-02-06 06:53:57 - ERROR - stderr - +2025-02-06 06:53:57 - ERROR - stderr - +2025-02-06 06:53:57 - INFO - stdout - {'loss': 0.3319, 'grad_norm': 1.413185477256775, 'learning_rate': 2.252786646863603e-07, 'epoch': 2.8} +2025-02-06 06:53:57 - ERROR - stderr - 93%|█████████▎| 20961/22434 [20:46:17<1:01:10, 2.49s/it] +2025-02-06 06:54:00 - ERROR - stderr - 93%|█████████▎| 20962/22434 [20:46:20<1:00:34, 2.47s/it] +2025-02-06 06:54:00 - ERROR - stderr - +2025-02-06 06:54:00 - ERROR - stderr - +2025-02-06 06:54:00 - INFO - stdout - {'loss': 0.3726, 'grad_norm': 1.541680097579956, 'learning_rate': 2.2497404288334245e-07, 'epoch': 2.8} +2025-02-06 06:54:00 - ERROR - stderr - 93%|█████████▎| 20962/22434 [20:46:20<1:00:34, 2.47s/it] +2025-02-06 06:54:02 - ERROR - stderr - 93%|█████████▎| 20963/22434 [20:46:22<1:01:18, 2.50s/it] +2025-02-06 06:54:02 - ERROR - stderr - +2025-02-06 06:54:02 - ERROR - stderr - +2025-02-06 06:54:02 - INFO - stdout - {'loss': 0.3416, 'grad_norm': 1.5604345798492432, 'learning_rate': 2.2466962483138954e-07, 'epoch': 2.8} +2025-02-06 06:54:02 - ERROR - stderr - 93%|█████████▎| 20963/22434 [20:46:22<1:01:18, 2.50s/it] +2025-02-06 06:54:05 - ERROR - stderr - 93%|█████████▎| 20964/22434 [20:46:25<1:00:47, 2.48s/it] +2025-02-06 06:54:05 - ERROR - stderr - +2025-02-06 06:54:05 - ERROR - stderr - +2025-02-06 06:54:05 
- INFO - stdout - {'loss': 0.3816, 'grad_norm': 1.5624465942382812, 'learning_rate': 2.2436541053684203e-07, 'epoch': 2.8} +2025-02-06 06:54:05 - ERROR - stderr - 93%|█████████▎| 20964/22434 [20:46:25<1:00:47, 2.48s/it] +2025-02-06 06:54:07 - ERROR - stderr - 93%|█████████▎| 20965/22434 [20:46:27<1:00:58, 2.49s/it] +2025-02-06 06:54:07 - ERROR - stderr - +2025-02-06 06:54:07 - ERROR - stderr - +2025-02-06 06:54:07 - INFO - stdout - {'loss': 0.3848, 'grad_norm': 1.6201050281524658, 'learning_rate': 2.240614000060448e-07, 'epoch': 2.8} +2025-02-06 06:54:07 - ERROR - stderr - 93%|█████████▎| 20965/22434 [20:46:27<1:00:58, 2.49s/it] +2025-02-06 06:54:10 - ERROR - stderr - 93%|█████████▎| 20966/22434 [20:46:30<1:00:56, 2.49s/it] +2025-02-06 06:54:10 - ERROR - stderr - +2025-02-06 06:54:10 - ERROR - stderr - +2025-02-06 06:54:10 - INFO - stdout - {'loss': 0.3468, 'grad_norm': 1.5026683807373047, 'learning_rate': 2.2375759324533398e-07, 'epoch': 2.8} +2025-02-06 06:54:10 - ERROR - stderr - 93%|█████████▎| 20966/22434 [20:46:30<1:00:56, 2.49s/it] +2025-02-06 06:54:12 - ERROR - stderr - 93%|█████████▎| 20967/22434 [20:46:32<1:00:29, 2.47s/it] +2025-02-06 06:54:12 - ERROR - stderr - +2025-02-06 06:54:12 - ERROR - stderr - +2025-02-06 06:54:12 - INFO - stdout - {'loss': 0.3813, 'grad_norm': 1.677214503288269, 'learning_rate': 2.2345399026103888e-07, 'epoch': 2.8} +2025-02-06 06:54:12 - ERROR - stderr - 93%|█████████▎| 20967/22434 [20:46:32<1:00:29, 2.47s/it] +2025-02-06 06:54:15 - ERROR - stderr - 93%|█████████▎| 20968/22434 [20:46:35<1:01:44, 2.53s/it] +2025-02-06 06:54:15 - ERROR - stderr - +2025-02-06 06:54:15 - ERROR - stderr - +2025-02-06 06:54:15 - INFO - stdout - {'loss': 0.3822, 'grad_norm': 1.6230571269989014, 'learning_rate': 2.2315059105949222e-07, 'epoch': 2.8} +2025-02-06 06:54:15 - ERROR - stderr - 93%|█████████▎| 20968/22434 [20:46:35<1:01:44, 2.53s/it] +2025-02-06 06:54:17 - ERROR - stderr - 93%|█████████▎| 20969/22434 [20:46:37<1:01:25, 2.52s/it] +2025-02-06 
06:54:17 - ERROR - stderr - +2025-02-06 06:54:17 - ERROR - stderr - +2025-02-06 06:54:17 - INFO - stdout - {'loss': 0.3636, 'grad_norm': 1.5914098024368286, 'learning_rate': 2.2284739564701563e-07, 'epoch': 2.8} +2025-02-06 06:54:17 - ERROR - stderr - 93%|█████████▎| 20969/22434 [20:46:37<1:01:25, 2.52s/it] +2025-02-06 06:54:20 - ERROR - stderr - 93%|█████████▎| 20970/22434 [20:46:40<1:01:08, 2.51s/it] +2025-02-06 06:54:20 - ERROR - stderr - +2025-02-06 06:54:20 - ERROR - stderr - +2025-02-06 06:54:20 - INFO - stdout - {'loss': 0.3227, 'grad_norm': 1.461203694343567, 'learning_rate': 2.225444040299285e-07, 'epoch': 2.8} +2025-02-06 06:54:20 - ERROR - stderr - 93%|█████████▎| 20970/22434 [20:46:40<1:01:08, 2.51s/it] +2025-02-06 06:54:23 - ERROR - stderr - 93%|█████████▎| 20971/22434 [20:46:42<1:01:39, 2.53s/it] +2025-02-06 06:54:23 - ERROR - stderr - +2025-02-06 06:54:23 - ERROR - stderr - +2025-02-06 06:54:23 - INFO - stdout - {'loss': 0.3502, 'grad_norm': 1.4094074964523315, 'learning_rate': 2.22241616214548e-07, 'epoch': 2.8} +2025-02-06 06:54:23 - ERROR - stderr - 93%|█████████▎| 20971/22434 [20:46:42<1:01:39, 2.53s/it] +2025-02-06 06:54:25 - ERROR - stderr - 93%|█████████▎| 20972/22434 [20:46:45<1:01:04, 2.51s/it] +2025-02-06 06:54:25 - ERROR - stderr - +2025-02-06 06:54:25 - ERROR - stderr - +2025-02-06 06:54:25 - INFO - stdout - {'loss': 0.3779, 'grad_norm': 1.583274483680725, 'learning_rate': 2.219390322071835e-07, 'epoch': 2.8} +2025-02-06 06:54:25 - ERROR - stderr - 93%|█████████▎| 20972/22434 [20:46:45<1:01:04, 2.51s/it] +2025-02-06 06:54:27 - ERROR - stderr - 93%|█████████▎| 20973/22434 [20:46:47<1:00:49, 2.50s/it] +2025-02-06 06:54:27 - ERROR - stderr - +2025-02-06 06:54:27 - ERROR - stderr - +2025-02-06 06:54:27 - INFO - stdout - {'loss': 0.3446, 'grad_norm': 1.521883249282837, 'learning_rate': 2.2163665201414553e-07, 'epoch': 2.8} +2025-02-06 06:54:27 - ERROR - stderr - 93%|█████████▎| 20973/22434 [20:46:47<1:00:49, 2.50s/it] +2025-02-06 06:54:30 - 
ERROR - stderr - 93%|█████████▎| 20974/22434 [20:46:50<1:00:50, 2.50s/it] +2025-02-06 06:54:30 - ERROR - stderr - +2025-02-06 06:54:30 - ERROR - stderr - +2025-02-06 06:54:30 - INFO - stdout - {'loss': 0.3475, 'grad_norm': 1.4544239044189453, 'learning_rate': 2.2133447564173237e-07, 'epoch': 2.8} +2025-02-06 06:54:30 - ERROR - stderr - 93%|█████████▎| 20974/22434 [20:46:50<1:00:50, 2.50s/it] +2025-02-06 06:54:32 - ERROR - stderr - 93%|█████████▎| 20975/22434 [20:46:52<1:00:45, 2.50s/it] +2025-02-06 06:54:32 - ERROR - stderr - +2025-02-06 06:54:32 - ERROR - stderr - +2025-02-06 06:54:32 - INFO - stdout - {'loss': 0.3556, 'grad_norm': 1.590610146522522, 'learning_rate': 2.210325030962468e-07, 'epoch': 2.8} +2025-02-06 06:54:32 - ERROR - stderr - 93%|█████████▎| 20975/22434 [20:46:52<1:00:45, 2.50s/it] +2025-02-06 06:54:35 - ERROR - stderr - 94%|█████████▎| 20976/22434 [20:46:55<1:00:35, 2.49s/it] +2025-02-06 06:54:35 - ERROR - stderr - +2025-02-06 06:54:35 - ERROR - stderr - +2025-02-06 06:54:35 - INFO - stdout - {'loss': 0.3107, 'grad_norm': 1.535759687423706, 'learning_rate': 2.2073073438397929e-07, 'epoch': 2.81} +2025-02-06 06:54:35 - ERROR - stderr - 94%|█████████▎| 20976/22434 [20:46:55<1:00:35, 2.49s/it] +2025-02-06 06:54:37 - ERROR - stderr - 94%|█████████▎| 20977/22434 [20:46:57<1:00:17, 2.48s/it] +2025-02-06 06:54:37 - ERROR - stderr - +2025-02-06 06:54:37 - ERROR - stderr - +2025-02-06 06:54:37 - INFO - stdout - {'loss': 0.3787, 'grad_norm': 1.7200812101364136, 'learning_rate': 2.2042916951122372e-07, 'epoch': 2.81} +2025-02-06 06:54:37 - ERROR - stderr - 94%|█████████▎| 20977/22434 [20:46:57<1:00:17, 2.48s/it] +2025-02-06 06:54:40 - ERROR - stderr - 94%|█████████▎| 20978/22434 [20:47:00<1:00:18, 2.49s/it] +2025-02-06 06:54:40 - ERROR - stderr - +2025-02-06 06:54:40 - ERROR - stderr - +2025-02-06 06:54:40 - INFO - stdout - {'loss': 0.3825, 'grad_norm': 1.624894618988037, 'learning_rate': 2.2012780848426286e-07, 'epoch': 2.81} +2025-02-06 06:54:40 - ERROR - 
stderr - 94%|█████████▎| 20978/22434 [20:47:00<1:00:18, 2.49s/it] +2025-02-06 06:54:42 - ERROR - stderr - 94%|█████████▎| 20979/22434 [20:47:02<1:00:05, 2.48s/it] +2025-02-06 06:54:42 - ERROR - stderr - +2025-02-06 06:54:42 - ERROR - stderr - +2025-02-06 06:54:42 - INFO - stdout - {'loss': 0.3683, 'grad_norm': 1.5929768085479736, 'learning_rate': 2.1982665130938054e-07, 'epoch': 2.81} +2025-02-06 06:54:42 - ERROR - stderr - 94%|█████████▎| 20979/22434 [20:47:02<1:00:05, 2.48s/it] +2025-02-06 06:54:45 - ERROR - stderr - 94%|█████████▎| 20980/22434 [20:47:05<59:39, 2.46s/it] +2025-02-06 06:54:45 - ERROR - stderr - +2025-02-06 06:54:45 - ERROR - stderr - +2025-02-06 06:54:45 - INFO - stdout - {'loss': 0.3477, 'grad_norm': 1.6137999296188354, 'learning_rate': 2.1952569799285172e-07, 'epoch': 2.81} +2025-02-06 06:54:45 - ERROR - stderr - 94%|█████████▎| 20980/22434 [20:47:05<59:39, 2.46s/it] +2025-02-06 06:54:47 - ERROR - stderr - 94%|█████████▎| 20981/22434 [20:47:07<1:00:03, 2.48s/it] +2025-02-06 06:54:47 - ERROR - stderr - +2025-02-06 06:54:47 - ERROR - stderr - +2025-02-06 06:54:47 - INFO - stdout - {'loss': 0.335, 'grad_norm': 1.440737247467041, 'learning_rate': 2.1922494854095145e-07, 'epoch': 2.81} +2025-02-06 06:54:47 - ERROR - stderr - 94%|█████████▎| 20981/22434 [20:47:07<1:00:03, 2.48s/it] +2025-02-06 06:54:50 - ERROR - stderr - 94%|█████████▎| 20982/22434 [20:47:10<1:01:24, 2.54s/it] +2025-02-06 06:54:50 - ERROR - stderr - +2025-02-06 06:54:50 - ERROR - stderr - +2025-02-06 06:54:50 - INFO - stdout - {'loss': 0.3443, 'grad_norm': 1.5126547813415527, 'learning_rate': 2.189244029599491e-07, 'epoch': 2.81} +2025-02-06 06:54:50 - ERROR - stderr - 94%|█████████▎| 20982/22434 [20:47:10<1:01:24, 2.54s/it] +2025-02-06 06:54:52 - ERROR - stderr - 94%|█████████▎| 20983/22434 [20:47:12<1:00:54, 2.52s/it] +2025-02-06 06:54:52 - ERROR - stderr - +2025-02-06 06:54:52 - ERROR - stderr - +2025-02-06 06:54:52 - INFO - stdout - {'loss': 0.3701, 'grad_norm': 
1.6157044172286987, 'learning_rate': 2.1862406125610636e-07, 'epoch': 2.81} +2025-02-06 06:54:52 - ERROR - stderr - 94%|█████████▎| 20983/22434 [20:47:12<1:00:54, 2.52s/it] +2025-02-06 06:54:55 - ERROR - stderr - 94%|█████████▎| 20984/22434 [20:47:15<1:00:55, 2.52s/it] +2025-02-06 06:54:55 - ERROR - stderr - +2025-02-06 06:54:55 - ERROR - stderr - +2025-02-06 06:54:55 - INFO - stdout - {'loss': 0.3417, 'grad_norm': 1.2181618213653564, 'learning_rate': 2.1832392343568598e-07, 'epoch': 2.81} +2025-02-06 06:54:55 - ERROR - stderr - 94%|█████████▎| 20984/22434 [20:47:15<1:00:55, 2.52s/it] +2025-02-06 06:54:57 - ERROR - stderr - 94%|█████████▎| 20985/22434 [20:47:17<1:00:18, 2.50s/it] +2025-02-06 06:54:57 - ERROR - stderr - +2025-02-06 06:54:57 - ERROR - stderr - +2025-02-06 06:54:57 - INFO - stdout - {'loss': 0.3432, 'grad_norm': 1.5439636707305908, 'learning_rate': 2.180239895049441e-07, 'epoch': 2.81} +2025-02-06 06:54:57 - ERROR - stderr - 94%|█████████▎| 20985/22434 [20:47:17<1:00:18, 2.50s/it] +2025-02-06 06:55:00 - ERROR - stderr - 94%|█████████▎| 20986/22434 [20:47:20<1:00:02, 2.49s/it] +2025-02-06 06:55:00 - ERROR - stderr - +2025-02-06 06:55:00 - ERROR - stderr - +2025-02-06 06:55:00 - INFO - stdout - {'loss': 0.3427, 'grad_norm': 1.5068248510360718, 'learning_rate': 2.1772425947013008e-07, 'epoch': 2.81} +2025-02-06 06:55:00 - ERROR - stderr - 94%|█████████▎| 20986/22434 [20:47:20<1:00:02, 2.49s/it] +2025-02-06 06:55:02 - ERROR - stderr - 94%|█████████▎| 20987/22434 [20:47:22<1:00:28, 2.51s/it] +2025-02-06 06:55:02 - ERROR - stderr - +2025-02-06 06:55:02 - ERROR - stderr - +2025-02-06 06:55:02 - INFO - stdout - {'loss': 0.3279, 'grad_norm': 1.3487454652786255, 'learning_rate': 2.1742473333749569e-07, 'epoch': 2.81} +2025-02-06 06:55:02 - ERROR - stderr - 94%|█████████▎| 20987/22434 [20:47:22<1:00:28, 2.51s/it] +2025-02-06 06:55:05 - ERROR - stderr - 94%|█████████▎| 20988/22434 [20:47:25<1:00:13, 2.50s/it] +2025-02-06 06:55:05 - ERROR - stderr - +2025-02-06 
06:55:05 - ERROR - stderr - +2025-02-06 06:55:05 - INFO - stdout - {'loss': 0.3728, 'grad_norm': 1.57566237449646, 'learning_rate': 2.1712541111327924e-07, 'epoch': 2.81} +2025-02-06 06:55:05 - ERROR - stderr - 94%|█████████▎| 20988/22434 [20:47:25<1:00:13, 2.50s/it] +2025-02-06 06:55:07 - ERROR - stderr - 94%|█████████▎| 20989/22434 [20:47:27<1:00:24, 2.51s/it] +2025-02-06 06:55:07 - ERROR - stderr - +2025-02-06 06:55:07 - ERROR - stderr - +2025-02-06 06:55:07 - INFO - stdout - {'loss': 0.3174, 'grad_norm': 1.48201322555542, 'learning_rate': 2.168262928037246e-07, 'epoch': 2.81} +2025-02-06 06:55:07 - ERROR - stderr - 94%|█████████▎| 20989/22434 [20:47:27<1:00:24, 2.51s/it] +2025-02-06 06:55:10 - ERROR - stderr - 94%|█████████▎| 20990/22434 [20:47:30<1:00:24, 2.51s/it] +2025-02-06 06:55:10 - ERROR - stderr - +2025-02-06 06:55:10 - ERROR - stderr - +2025-02-06 06:55:10 - INFO - stdout - {'loss': 0.3344, 'grad_norm': 1.5476562976837158, 'learning_rate': 2.1652737841506344e-07, 'epoch': 2.81} +2025-02-06 06:55:10 - ERROR - stderr - 94%|█████████▎| 20990/22434 [20:47:30<1:00:24, 2.51s/it] +2025-02-06 06:55:13 - ERROR - stderr - 94%|█████████▎| 20991/22434 [20:47:32<1:02:21, 2.59s/it] +2025-02-06 06:55:13 - ERROR - stderr - +2025-02-06 06:55:13 - ERROR - stderr - +2025-02-06 06:55:13 - INFO - stdout - {'loss': 0.3913, 'grad_norm': 1.7266650199890137, 'learning_rate': 2.1622866795352638e-07, 'epoch': 2.81} +2025-02-06 06:55:13 - ERROR - stderr - 94%|█████████▎| 20991/22434 [20:47:33<1:02:21, 2.59s/it] +2025-02-06 06:55:15 - ERROR - stderr - 94%|█████████▎| 20992/22434 [20:47:35<1:03:31, 2.64s/it] +2025-02-06 06:55:16 - ERROR - stderr - +2025-02-06 06:55:16 - ERROR - stderr - +2025-02-06 06:55:16 - INFO - stdout - {'loss': 0.3552, 'grad_norm': 1.472413420677185, 'learning_rate': 2.1593016142534173e-07, 'epoch': 2.81} +2025-02-06 06:55:16 - ERROR - stderr - 94%|█████████▎| 20992/22434 [20:47:35<1:03:31, 2.64s/it] +2025-02-06 06:55:18 - ERROR - stderr - 94%|█████████▎| 
20993/22434 [20:47:38<1:02:22, 2.60s/it] +2025-02-06 06:55:18 - ERROR - stderr - +2025-02-06 06:55:18 - ERROR - stderr - +2025-02-06 06:55:18 - INFO - stdout - {'loss': 0.3367, 'grad_norm': 1.558553695678711, 'learning_rate': 2.156318588367301e-07, 'epoch': 2.81} +2025-02-06 06:55:18 - ERROR - stderr - 94%|█████████▎| 20993/22434 [20:47:38<1:02:22, 2.60s/it] +2025-02-06 06:55:21 - ERROR - stderr - 94%|█████████▎| 20994/22434 [20:47:40<1:02:06, 2.59s/it] +2025-02-06 06:55:21 - ERROR - stderr - +2025-02-06 06:55:21 - ERROR - stderr - +2025-02-06 06:55:21 - INFO - stdout - {'loss': 0.3727, 'grad_norm': 1.6842482089996338, 'learning_rate': 2.1533376019391095e-07, 'epoch': 2.81} +2025-02-06 06:55:21 - ERROR - stderr - 94%|█████████▎| 20994/22434 [20:47:40<1:02:06, 2.59s/it] +2025-02-06 06:55:23 - ERROR - stderr - 94%|█████████▎| 20995/22434 [20:47:43<1:01:09, 2.55s/it] +2025-02-06 06:55:23 - ERROR - stderr - +2025-02-06 06:55:23 - ERROR - stderr - +2025-02-06 06:55:23 - INFO - stdout - {'loss': 0.3545, 'grad_norm': 1.5345304012298584, 'learning_rate': 2.1503586550309486e-07, 'epoch': 2.81} +2025-02-06 06:55:23 - ERROR - stderr - 94%|█████████▎| 20995/22434 [20:47:43<1:01:09, 2.55s/it] +2025-02-06 06:55:26 - ERROR - stderr - 94%|█████████▎| 20996/22434 [20:47:45<1:01:08, 2.55s/it] +2025-02-06 06:55:26 - ERROR - stderr - +2025-02-06 06:55:26 - ERROR - stderr - +2025-02-06 06:55:26 - INFO - stdout - {'loss': 0.2888, 'grad_norm': 1.4740495681762695, 'learning_rate': 2.147381747704935e-07, 'epoch': 2.81} +2025-02-06 06:55:26 - ERROR - stderr - 94%|█████████▎| 20996/22434 [20:47:45<1:01:08, 2.55s/it] +2025-02-06 06:55:28 - ERROR - stderr - 94%|█████████▎| 20997/22434 [20:47:48<1:00:05, 2.51s/it] +2025-02-06 06:55:28 - ERROR - stderr - +2025-02-06 06:55:28 - ERROR - stderr - +2025-02-06 06:55:28 - INFO - stdout - {'loss': 0.3887, 'grad_norm': 1.6585506200790405, 'learning_rate': 2.14440688002312e-07, 'epoch': 2.81} +2025-02-06 06:55:28 - ERROR - stderr - 94%|█████████▎| 
20997/22434 [20:47:48<1:00:05, 2.51s/it] +2025-02-06 06:55:30 - ERROR - stderr - 94%|█████████▎| 20998/22434 [20:47:50<59:29, 2.49s/it] +2025-02-06 06:55:30 - ERROR - stderr - +2025-02-06 06:55:30 - ERROR - stderr - +2025-02-06 06:55:30 - INFO - stdout - {'loss': 0.3166, 'grad_norm': 1.4644114971160889, 'learning_rate': 2.1414340520475087e-07, 'epoch': 2.81} +2025-02-06 06:55:30 - ERROR - stderr - 94%|█████████▎| 20998/22434 [20:47:50<59:29, 2.49s/it] +2025-02-06 06:55:33 - ERROR - stderr - 94%|█████████▎| 20999/22434 [20:47:53<59:45, 2.50s/it] +2025-02-06 06:55:33 - ERROR - stderr - +2025-02-06 06:55:33 - ERROR - stderr - +2025-02-06 06:55:33 - INFO - stdout - {'loss': 0.3281, 'grad_norm': 1.4557394981384277, 'learning_rate': 2.1384632638400515e-07, 'epoch': 2.81} +2025-02-06 06:55:33 - ERROR - stderr - 94%|█████████▎| 20999/22434 [20:47:53<59:45, 2.50s/it] +2025-02-06 06:55:36 - ERROR - stderr - 94%|█████████▎| 21000/22434 [20:47:55<1:00:15, 2.52s/it] +2025-02-06 06:55:36 - ERROR - stderr - +2025-02-06 06:55:36 - ERROR - stderr - +2025-02-06 06:55:36 - INFO - stdout - {'loss': 0.3371, 'grad_norm': 1.6681737899780273, 'learning_rate': 2.1354945154626883e-07, 'epoch': 2.81} +2025-02-06 06:55:36 - ERROR - stderr - 94%|█████████▎| 21000/22434 [20:47:55<1:00:15, 2.52s/it] +2025-02-06 06:55:38 - ERROR - stderr - 94%|█████████▎| 21001/22434 [20:47:58<1:00:25, 2.53s/it] +2025-02-06 06:55:38 - ERROR - stderr - +2025-02-06 06:55:38 - ERROR - stderr - +2025-02-06 06:55:38 - INFO - stdout - {'loss': 0.3406, 'grad_norm': 1.4628387689590454, 'learning_rate': 2.1325278069773027e-07, 'epoch': 2.81} +2025-02-06 06:55:38 - ERROR - stderr - 94%|█████████▎| 21001/22434 [20:47:58<1:00:25, 2.53s/it] +2025-02-06 06:55:40 - ERROR - stderr - 94%|█████████▎| 21002/22434 [20:48:00<59:39, 2.50s/it] +2025-02-06 06:55:41 - ERROR - stderr - +2025-02-06 06:55:41 - ERROR - stderr - +2025-02-06 06:55:41 - INFO - stdout - {'loss': 0.3567, 'grad_norm': 1.5058053731918335, 'learning_rate': 
2.1295631384457228e-07, 'epoch': 2.81} +2025-02-06 06:55:41 - ERROR - stderr - 94%|█████████▎| 21002/22434 [20:48:00<59:39, 2.50s/it] +2025-02-06 06:55:43 - ERROR - stderr - 94%|█████████▎| 21003/22434 [20:48:03<59:10, 2.48s/it] +2025-02-06 06:55:43 - ERROR - stderr - +2025-02-06 06:55:43 - ERROR - stderr - +2025-02-06 06:55:43 - INFO - stdout - {'loss': 0.3474, 'grad_norm': 1.4408000707626343, 'learning_rate': 2.1266005099297436e-07, 'epoch': 2.81} +2025-02-06 06:55:43 - ERROR - stderr - 94%|█████████▎| 21003/22434 [20:48:03<59:10, 2.48s/it] +2025-02-06 06:55:46 - ERROR - stderr - 94%|█████████▎| 21004/22434 [20:48:05<1:00:16, 2.53s/it] +2025-02-06 06:55:46 - ERROR - stderr - +2025-02-06 06:55:46 - ERROR - stderr - +2025-02-06 06:55:46 - INFO - stdout - {'loss': 0.4396, 'grad_norm': 1.6201496124267578, 'learning_rate': 2.1236399214911274e-07, 'epoch': 2.81} +2025-02-06 06:55:46 - ERROR - stderr - 94%|█████████▎| 21004/22434 [20:48:05<1:00:16, 2.53s/it] +2025-02-06 06:55:48 - ERROR - stderr - 94%|█████████▎| 21005/22434 [20:48:08<1:00:01, 2.52s/it] +2025-02-06 06:55:48 - ERROR - stderr - +2025-02-06 06:55:48 - ERROR - stderr - +2025-02-06 06:55:48 - INFO - stdout - {'loss': 0.3174, 'grad_norm': 1.2999467849731445, 'learning_rate': 2.1206813731915798e-07, 'epoch': 2.81} +2025-02-06 06:55:48 - ERROR - stderr - 94%|█████████▎| 21005/22434 [20:48:08<1:00:01, 2.52s/it] +2025-02-06 06:55:51 - ERROR - stderr - 94%|█████████▎| 21006/22434 [20:48:10<59:58, 2.52s/it] +2025-02-06 06:55:51 - ERROR - stderr - +2025-02-06 06:55:51 - ERROR - stderr - +2025-02-06 06:55:51 - INFO - stdout - {'loss': 0.3968, 'grad_norm': 1.7491109371185303, 'learning_rate': 2.117724865092774e-07, 'epoch': 2.81} +2025-02-06 06:55:51 - ERROR - stderr - 94%|█████████▎| 21006/22434 [20:48:10<59:58, 2.52s/it] +2025-02-06 06:55:53 - ERROR - stderr - 94%|█████████▎| 21007/22434 [20:48:13<59:14, 2.49s/it] +2025-02-06 06:55:53 - ERROR - stderr - +2025-02-06 06:55:53 - ERROR - stderr - +2025-02-06 06:55:53 - 
INFO - stdout - {'loss': 0.4134, 'grad_norm': 1.5332244634628296, 'learning_rate': 2.1147703972563049e-07, 'epoch': 2.81} +2025-02-06 06:55:53 - ERROR - stderr - 94%|█████████▎| 21007/22434 [20:48:13<59:14, 2.49s/it] +2025-02-06 06:55:56 - ERROR - stderr - 94%|█████████▎| 21008/22434 [20:48:15<59:27, 2.50s/it] +2025-02-06 06:55:56 - ERROR - stderr - +2025-02-06 06:55:56 - ERROR - stderr - +2025-02-06 06:55:56 - INFO - stdout - {'loss': 0.3565, 'grad_norm': 1.6016746759414673, 'learning_rate': 2.1118179697438125e-07, 'epoch': 2.81} +2025-02-06 06:55:56 - ERROR - stderr - 94%|█████████▎| 21008/22434 [20:48:15<59:27, 2.50s/it] +2025-02-06 06:55:58 - ERROR - stderr - 94%|█████████▎| 21009/22434 [20:48:18<58:51, 2.48s/it] +2025-02-06 06:55:58 - ERROR - stderr - +2025-02-06 06:55:58 - ERROR - stderr - +2025-02-06 06:55:58 - INFO - stdout - {'loss': 0.3106, 'grad_norm': 1.379032850265503, 'learning_rate': 2.1088675826167804e-07, 'epoch': 2.81} +2025-02-06 06:55:58 - ERROR - stderr - 94%|█████████▎| 21009/22434 [20:48:18<58:51, 2.48s/it] +2025-02-06 06:56:01 - ERROR - stderr - 94%|█████████▎| 21010/22434 [20:48:20<1:00:02, 2.53s/it] +2025-02-06 06:56:01 - ERROR - stderr - +2025-02-06 06:56:01 - ERROR - stderr - +2025-02-06 06:56:01 - INFO - stdout - {'loss': 0.3525, 'grad_norm': 1.5215754508972168, 'learning_rate': 2.1059192359367485e-07, 'epoch': 2.81} +2025-02-06 06:56:01 - ERROR - stderr - 94%|█████████▎| 21010/22434 [20:48:20<1:00:02, 2.53s/it] +2025-02-06 06:56:03 - ERROR - stderr - 94%|█████████▎| 21011/22434 [20:48:23<59:42, 2.52s/it] +2025-02-06 06:56:03 - ERROR - stderr - +2025-02-06 06:56:03 - ERROR - stderr - +2025-02-06 06:56:03 - INFO - stdout - {'loss': 0.3547, 'grad_norm': 1.5460797548294067, 'learning_rate': 2.102972929765157e-07, 'epoch': 2.81} +2025-02-06 06:56:03 - ERROR - stderr - 94%|█████████▎| 21011/22434 [20:48:23<59:42, 2.52s/it] +2025-02-06 06:56:06 - ERROR - stderr - 94%|█████████▎| 21012/22434 [20:48:25<58:56, 2.49s/it] +2025-02-06 06:56:06 - 
ERROR - stderr - +2025-02-06 06:56:06 - ERROR - stderr - +2025-02-06 06:56:06 - INFO - stdout - {'loss': 0.353, 'grad_norm': 1.6412646770477295, 'learning_rate': 2.1000286641634003e-07, 'epoch': 2.81} +2025-02-06 06:56:06 - ERROR - stderr - 94%|█████████▎| 21012/22434 [20:48:25<58:56, 2.49s/it] +2025-02-06 06:56:08 - ERROR - stderr - 94%|█████████▎| 21013/22434 [20:48:28<58:19, 2.46s/it] +2025-02-06 06:56:08 - ERROR - stderr - +2025-02-06 06:56:08 - ERROR - stderr - +2025-02-06 06:56:08 - INFO - stdout - {'loss': 0.3588, 'grad_norm': 1.6331738233566284, 'learning_rate': 2.0970864391928858e-07, 'epoch': 2.81} +2025-02-06 06:56:08 - ERROR - stderr - 94%|█████████▎| 21013/22434 [20:48:28<58:19, 2.46s/it] +2025-02-06 06:56:11 - ERROR - stderr - 94%|█████████▎| 21014/22434 [20:48:30<59:13, 2.50s/it] +2025-02-06 06:56:11 - ERROR - stderr - +2025-02-06 06:56:11 - ERROR - stderr - +2025-02-06 06:56:11 - INFO - stdout - {'loss': 0.305, 'grad_norm': 1.3080657720565796, 'learning_rate': 2.0941462549149083e-07, 'epoch': 2.81} +2025-02-06 06:56:11 - ERROR - stderr - 94%|█████████▎| 21014/22434 [20:48:30<59:13, 2.50s/it] +2025-02-06 06:56:13 - ERROR - stderr - 94%|█████████▎| 21015/22434 [20:48:33<58:45, 2.48s/it] +2025-02-06 06:56:13 - ERROR - stderr - +2025-02-06 06:56:13 - ERROR - stderr - +2025-02-06 06:56:13 - INFO - stdout - {'loss': 0.2812, 'grad_norm': 1.4372737407684326, 'learning_rate': 2.0912081113907745e-07, 'epoch': 2.81} +2025-02-06 06:56:13 - ERROR - stderr - 94%|█████████▎| 21015/22434 [20:48:33<58:45, 2.48s/it] +2025-02-06 06:56:15 - ERROR - stderr - 94%|█████████▎| 21016/22434 [20:48:35<58:47, 2.49s/it] +2025-02-06 06:56:15 - ERROR - stderr - +2025-02-06 06:56:15 - ERROR - stderr - +2025-02-06 06:56:15 - INFO - stdout - {'loss': 0.4017, 'grad_norm': 1.7766419649124146, 'learning_rate': 2.0882720086817132e-07, 'epoch': 2.81} +2025-02-06 06:56:16 - ERROR - stderr - 94%|█████████▎| 21016/22434 [20:48:35<58:47, 2.49s/it] +2025-02-06 06:56:18 - ERROR - stderr - 
94%|█████████▎| 21017/22434 [20:48:38<58:46, 2.49s/it] +2025-02-06 06:56:18 - ERROR - stderr - +2025-02-06 06:56:18 - ERROR - stderr - +2025-02-06 06:56:18 - INFO - stdout - {'loss': 0.319, 'grad_norm': 1.4037578105926514, 'learning_rate': 2.085337946848931e-07, 'epoch': 2.81} +2025-02-06 06:56:18 - ERROR - stderr - 94%|█████████▎| 21017/22434 [20:48:38<58:46, 2.49s/it] +2025-02-06 06:56:20 - ERROR - stderr - 94%|█████████▎| 21018/22434 [20:48:40<58:32, 2.48s/it] +2025-02-06 06:56:20 - ERROR - stderr - +2025-02-06 06:56:20 - ERROR - stderr - +2025-02-06 06:56:20 - INFO - stdout - {'loss': 0.3922, 'grad_norm': 1.5752131938934326, 'learning_rate': 2.082405925953579e-07, 'epoch': 2.81} +2025-02-06 06:56:20 - ERROR - stderr - 94%|█████████▎| 21018/22434 [20:48:40<58:32, 2.48s/it] +2025-02-06 06:56:23 - ERROR - stderr - 94%|█████████▎| 21019/22434 [20:48:43<58:40, 2.49s/it] +2025-02-06 06:56:23 - ERROR - stderr - +2025-02-06 06:56:23 - ERROR - stderr - +2025-02-06 06:56:23 - INFO - stdout - {'loss': 0.3743, 'grad_norm': 1.5167311429977417, 'learning_rate': 2.079475946056786e-07, 'epoch': 2.81} +2025-02-06 06:56:23 - ERROR - stderr - 94%|█████████▎| 21019/22434 [20:48:43<58:40, 2.49s/it] +2025-02-06 06:56:25 - ERROR - stderr - 94%|█████████▎| 21020/22434 [20:48:45<58:26, 2.48s/it] +2025-02-06 06:56:25 - ERROR - stderr - +2025-02-06 06:56:25 - ERROR - stderr - +2025-02-06 06:56:25 - INFO - stdout - {'loss': 0.3547, 'grad_norm': 1.6328784227371216, 'learning_rate': 2.0765480072196142e-07, 'epoch': 2.81} +2025-02-06 06:56:25 - ERROR - stderr - 94%|█████████▎| 21020/22434 [20:48:45<58:26, 2.48s/it] +2025-02-06 06:56:28 - ERROR - stderr - 94%|█████████▎| 21021/22434 [20:48:48<1:01:34, 2.61s/it] +2025-02-06 06:56:28 - ERROR - stderr - +2025-02-06 06:56:28 - ERROR - stderr - +2025-02-06 06:56:28 - INFO - stdout - {'loss': 0.3687, 'grad_norm': 1.6029284000396729, 'learning_rate': 2.073622109503104e-07, 'epoch': 2.81} +2025-02-06 06:56:28 - ERROR - stderr - 94%|█████████▎| 
21021/22434 [20:48:48<1:01:34, 2.61s/it] +2025-02-06 06:56:31 - ERROR - stderr - 94%|█████████▎| 21022/22434 [20:48:51<1:00:33, 2.57s/it] +2025-02-06 06:56:31 - ERROR - stderr - +2025-02-06 06:56:31 - ERROR - stderr - +2025-02-06 06:56:31 - INFO - stdout - {'loss': 0.3423, 'grad_norm': 1.451704978942871, 'learning_rate': 2.0706982529682286e-07, 'epoch': 2.81} +2025-02-06 06:56:31 - ERROR - stderr - 94%|█████████▎| 21022/22434 [20:48:51<1:00:33, 2.57s/it] +2025-02-06 06:56:33 - ERROR - stderr - 94%|█████████▎| 21023/22434 [20:48:53<1:00:10, 2.56s/it] +2025-02-06 06:56:33 - ERROR - stderr - +2025-02-06 06:56:33 - ERROR - stderr - +2025-02-06 06:56:33 - INFO - stdout - {'loss': 0.3613, 'grad_norm': 1.5524015426635742, 'learning_rate': 2.067776437675939e-07, 'epoch': 2.81} +2025-02-06 06:56:33 - ERROR - stderr - 94%|█████████▎| 21023/22434 [20:48:53<1:00:10, 2.56s/it] +2025-02-06 06:56:36 - ERROR - stderr - 94%|█████████▎| 21024/22434 [20:48:56<59:27, 2.53s/it] +2025-02-06 06:56:36 - ERROR - stderr - +2025-02-06 06:56:36 - ERROR - stderr - +2025-02-06 06:56:36 - INFO - stdout - {'loss': 0.3845, 'grad_norm': 1.6109240055084229, 'learning_rate': 2.0648566636871426e-07, 'epoch': 2.81} +2025-02-06 06:56:36 - ERROR - stderr - 94%|█████████▎| 21024/22434 [20:48:56<59:27, 2.53s/it] +2025-02-06 06:56:38 - ERROR - stderr - 94%|█████████▎| 21025/22434 [20:48:58<59:28, 2.53s/it] +2025-02-06 06:56:38 - ERROR - stderr - +2025-02-06 06:56:38 - ERROR - stderr - +2025-02-06 06:56:38 - INFO - stdout - {'loss': 0.3554, 'grad_norm': 1.4419208765029907, 'learning_rate': 2.0619389310626903e-07, 'epoch': 2.81} +2025-02-06 06:56:38 - ERROR - stderr - 94%|█████████▎| 21025/22434 [20:48:58<59:28, 2.53s/it] +2025-02-06 06:56:41 - ERROR - stderr - 94%|█████████▎| 21026/22434 [20:49:01<59:30, 2.54s/it] +2025-02-06 06:56:41 - ERROR - stderr - +2025-02-06 06:56:41 - ERROR - stderr - +2025-02-06 06:56:41 - INFO - stdout - {'loss': 0.3212, 'grad_norm': 1.5239206552505493, 'learning_rate': 
2.0590232398634114e-07, 'epoch': 2.81} +2025-02-06 06:56:41 - ERROR - stderr - 94%|█████████▎| 21026/22434 [20:49:01<59:30, 2.54s/it] +2025-02-06 06:56:43 - ERROR - stderr - 94%|█████████▎| 21027/22434 [20:49:03<58:58, 2.51s/it] +2025-02-06 06:56:43 - ERROR - stderr - +2025-02-06 06:56:43 - ERROR - stderr - +2025-02-06 06:56:43 - INFO - stdout - {'loss': 0.3685, 'grad_norm': 1.6813175678253174, 'learning_rate': 2.0561095901500793e-07, 'epoch': 2.81} +2025-02-06 06:56:43 - ERROR - stderr - 94%|█████████▎| 21027/22434 [20:49:03<58:58, 2.51s/it] +2025-02-06 06:56:46 - ERROR - stderr - 94%|█████████▎| 21028/22434 [20:49:06<58:52, 2.51s/it] +2025-02-06 06:56:46 - ERROR - stderr - +2025-02-06 06:56:46 - ERROR - stderr - +2025-02-06 06:56:46 - INFO - stdout - {'loss': 0.3305, 'grad_norm': 1.3473381996154785, 'learning_rate': 2.0531979819834015e-07, 'epoch': 2.81} +2025-02-06 06:56:46 - ERROR - stderr - 94%|█████████▎| 21028/22434 [20:49:06<58:52, 2.51s/it] +2025-02-06 06:56:48 - ERROR - stderr - 94%|█████████▎| 21029/22434 [20:49:08<58:21, 2.49s/it] +2025-02-06 06:56:48 - ERROR - stderr - +2025-02-06 06:56:48 - ERROR - stderr - +2025-02-06 06:56:48 - INFO - stdout - {'loss': 0.3524, 'grad_norm': 1.5564929246902466, 'learning_rate': 2.0502884154240955e-07, 'epoch': 2.81} +2025-02-06 06:56:48 - ERROR - stderr - 94%|█████████▎| 21029/22434 [20:49:08<58:21, 2.49s/it] +2025-02-06 06:56:51 - ERROR - stderr - 94%|█████████▎| 21030/22434 [20:49:11<58:49, 2.51s/it] +2025-02-06 06:56:51 - ERROR - stderr - +2025-02-06 06:56:51 - ERROR - stderr - +2025-02-06 06:56:51 - INFO - stdout - {'loss': 0.3736, 'grad_norm': 1.5361626148223877, 'learning_rate': 2.047380890532813e-07, 'epoch': 2.81} +2025-02-06 06:56:51 - ERROR - stderr - 94%|█████████▎| 21030/22434 [20:49:11<58:49, 2.51s/it] +2025-02-06 06:56:53 - ERROR - stderr - 94%|█████████▎| 21031/22434 [20:49:13<59:02, 2.53s/it] +2025-02-06 06:56:53 - ERROR - stderr - +2025-02-06 06:56:53 - ERROR - stderr - +2025-02-06 06:56:53 - INFO - 
stdout - {'loss': 0.3689, 'grad_norm': 1.6218271255493164, 'learning_rate': 2.044475407370128e-07, 'epoch': 2.81} +2025-02-06 06:56:53 - ERROR - stderr - 94%|█████████▎| 21031/22434 [20:49:13<59:02, 2.53s/it] +2025-02-06 06:56:56 - ERROR - stderr - 94%|█████████▍| 21032/22434 [20:49:16<58:47, 2.52s/it] +2025-02-06 06:56:56 - ERROR - stderr - +2025-02-06 06:56:56 - ERROR - stderr - +2025-02-06 06:56:56 - INFO - stdout - {'loss': 0.3751, 'grad_norm': 1.5912046432495117, 'learning_rate': 2.041571965996636e-07, 'epoch': 2.81} +2025-02-06 06:56:56 - ERROR - stderr - 94%|█████████▍| 21032/22434 [20:49:16<58:47, 2.52s/it] +2025-02-06 06:56:58 - ERROR - stderr - 94%|█████████▍| 21033/22434 [20:49:18<58:37, 2.51s/it] +2025-02-06 06:56:58 - ERROR - stderr - +2025-02-06 06:56:58 - ERROR - stderr - +2025-02-06 06:56:58 - INFO - stdout - {'loss': 0.4031, 'grad_norm': 1.7652696371078491, 'learning_rate': 2.0386705664728222e-07, 'epoch': 2.81} +2025-02-06 06:56:58 - ERROR - stderr - 94%|█████████▍| 21033/22434 [20:49:18<58:37, 2.51s/it] +2025-02-06 06:57:01 - ERROR - stderr - 94%|█████████▍| 21034/22434 [20:49:21<58:10, 2.49s/it] +2025-02-06 06:57:01 - ERROR - stderr - +2025-02-06 06:57:01 - ERROR - stderr - +2025-02-06 06:57:01 - INFO - stdout - {'loss': 0.3158, 'grad_norm': 1.5469073057174683, 'learning_rate': 2.0357712088591942e-07, 'epoch': 2.81} +2025-02-06 06:57:01 - ERROR - stderr - 94%|█████████▍| 21034/22434 [20:49:21<58:10, 2.49s/it] +2025-02-06 06:57:03 - ERROR - stderr - 94%|█████████▍| 21035/22434 [20:49:23<57:41, 2.47s/it] +2025-02-06 06:57:03 - ERROR - stderr - +2025-02-06 06:57:03 - ERROR - stderr - +2025-02-06 06:57:03 - INFO - stdout - {'loss': 0.3494, 'grad_norm': 1.5513590574264526, 'learning_rate': 2.0328738932161695e-07, 'epoch': 2.81} +2025-02-06 06:57:03 - ERROR - stderr - 94%|█████████▍| 21035/22434 [20:49:23<57:41, 2.47s/it] +2025-02-06 06:57:06 - ERROR - stderr - 94%|█████████▍| 21036/22434 [20:49:25<57:38, 2.47s/it] +2025-02-06 06:57:06 - ERROR - 
stderr - +2025-02-06 06:57:06 - ERROR - stderr - +2025-02-06 06:57:06 - INFO - stdout - {'loss': 0.3909, 'grad_norm': 1.6858493089675903, 'learning_rate': 2.0299786196041448e-07, 'epoch': 2.81} +2025-02-06 06:57:06 - ERROR - stderr - 94%|█████████▍| 21036/22434 [20:49:26<57:38, 2.47s/it] +2025-02-06 06:57:08 - ERROR - stderr - 94%|█████████▍| 21037/22434 [20:49:28<57:49, 2.48s/it] +2025-02-06 06:57:08 - ERROR - stderr - +2025-02-06 06:57:08 - ERROR - stderr - +2025-02-06 06:57:08 - INFO - stdout - {'loss': 0.3722, 'grad_norm': 1.4892266988754272, 'learning_rate': 2.0270853880834608e-07, 'epoch': 2.81} +2025-02-06 06:57:08 - ERROR - stderr - 94%|█████████▍| 21037/22434 [20:49:28<57:49, 2.48s/it] +2025-02-06 06:57:11 - ERROR - stderr - 94%|█████████▍| 21038/22434 [20:49:30<57:43, 2.48s/it] +2025-02-06 06:57:11 - ERROR - stderr - +2025-02-06 06:57:11 - ERROR - stderr - +2025-02-06 06:57:11 - INFO - stdout - {'loss': 0.3595, 'grad_norm': 1.588689923286438, 'learning_rate': 2.0241941987144464e-07, 'epoch': 2.81} +2025-02-06 06:57:11 - ERROR - stderr - 94%|█████████▍| 21038/22434 [20:49:31<57:43, 2.48s/it] +2025-02-06 06:57:13 - ERROR - stderr - 94%|█████████▍| 21039/22434 [20:49:33<57:26, 2.47s/it] +2025-02-06 06:57:13 - ERROR - stderr - +2025-02-06 06:57:13 - ERROR - stderr - +2025-02-06 06:57:13 - INFO - stdout - {'loss': 0.3454, 'grad_norm': 1.515647530555725, 'learning_rate': 2.021305051557343e-07, 'epoch': 2.81} +2025-02-06 06:57:13 - ERROR - stderr - 94%|█████████▍| 21039/22434 [20:49:33<57:26, 2.47s/it] +2025-02-06 06:57:16 - ERROR - stderr - 94%|█████████▍| 21040/22434 [20:49:35<57:31, 2.48s/it] +2025-02-06 06:57:16 - ERROR - stderr - +2025-02-06 06:57:16 - ERROR - stderr - +2025-02-06 06:57:16 - INFO - stdout - {'loss': 0.3363, 'grad_norm': 1.5506705045700073, 'learning_rate': 2.0184179466723796e-07, 'epoch': 2.81} +2025-02-06 06:57:16 - ERROR - stderr - 94%|█████████▍| 21040/22434 [20:49:35<57:31, 2.48s/it] +2025-02-06 06:57:18 - ERROR - stderr - 
94%|█████████▍| 21041/22434 [20:49:38<57:34, 2.48s/it] +2025-02-06 06:57:18 - ERROR - stderr - +2025-02-06 06:57:18 - ERROR - stderr - +2025-02-06 06:57:18 - INFO - stdout - {'loss': 0.3592, 'grad_norm': 1.565531849861145, 'learning_rate': 2.0155328841197307e-07, 'epoch': 2.81} +2025-02-06 06:57:18 - ERROR - stderr - 94%|█████████▍| 21041/22434 [20:49:38<57:34, 2.48s/it] +2025-02-06 06:57:21 - ERROR - stderr - 94%|█████████▍| 21042/22434 [20:49:40<57:43, 2.49s/it] +2025-02-06 06:57:21 - ERROR - stderr - +2025-02-06 06:57:21 - ERROR - stderr - +2025-02-06 06:57:21 - INFO - stdout - {'loss': 0.3461, 'grad_norm': 1.4972879886627197, 'learning_rate': 2.0126498639595481e-07, 'epoch': 2.81} +2025-02-06 06:57:21 - ERROR - stderr - 94%|█████████▍| 21042/22434 [20:49:40<57:43, 2.49s/it] +2025-02-06 06:57:23 - ERROR - stderr - 94%|█████████▍| 21043/22434 [20:49:43<57:34, 2.48s/it] +2025-02-06 06:57:23 - ERROR - stderr - +2025-02-06 06:57:23 - ERROR - stderr - +2025-02-06 06:57:23 - INFO - stdout - {'loss': 0.3301, 'grad_norm': 1.4020835161209106, 'learning_rate': 2.009768886251906e-07, 'epoch': 2.81} +2025-02-06 06:57:23 - ERROR - stderr - 94%|█████████▍| 21043/22434 [20:49:43<57:34, 2.48s/it] +2025-02-06 06:57:26 - ERROR - stderr - 94%|█████████▍| 21044/22434 [20:49:45<57:10, 2.47s/it] +2025-02-06 06:57:26 - ERROR - stderr - +2025-02-06 06:57:26 - ERROR - stderr - +2025-02-06 06:57:26 - INFO - stdout - {'loss': 0.3294, 'grad_norm': 1.5145186185836792, 'learning_rate': 2.0068899510568783e-07, 'epoch': 2.81} +2025-02-06 06:57:26 - ERROR - stderr - 94%|█████████▍| 21044/22434 [20:49:45<57:10, 2.47s/it] +2025-02-06 06:57:28 - ERROR - stderr - 94%|█████████▍| 21045/22434 [20:49:48<57:00, 2.46s/it] +2025-02-06 06:57:28 - ERROR - stderr - +2025-02-06 06:57:28 - ERROR - stderr - +2025-02-06 06:57:28 - INFO - stdout - {'loss': 0.3261, 'grad_norm': 1.5370123386383057, 'learning_rate': 2.004013058434451e-07, 'epoch': 2.81} +2025-02-06 06:57:28 - ERROR - stderr - 94%|█████████▍| 
21045/22434 [20:49:48<57:00, 2.46s/it] +2025-02-06 06:57:31 - ERROR - stderr - 94%|█████████▍| 21046/22434 [20:49:50<57:29, 2.49s/it] +2025-02-06 06:57:31 - ERROR - stderr - +2025-02-06 06:57:31 - ERROR - stderr - +2025-02-06 06:57:31 - INFO - stdout - {'loss': 0.3377, 'grad_norm': 1.3789373636245728, 'learning_rate': 2.0011382084446085e-07, 'epoch': 2.81} +2025-02-06 06:57:31 - ERROR - stderr - 94%|█████████▍| 21046/22434 [20:49:50<57:29, 2.49s/it] +2025-02-06 06:57:33 - ERROR - stderr - 94%|█████████▍| 21047/22434 [20:49:53<57:17, 2.48s/it] +2025-02-06 06:57:33 - ERROR - stderr - +2025-02-06 06:57:33 - ERROR - stderr - +2025-02-06 06:57:33 - INFO - stdout - {'loss': 0.3609, 'grad_norm': 1.5318207740783691, 'learning_rate': 1.998265401147248e-07, 'epoch': 2.81} +2025-02-06 06:57:33 - ERROR - stderr - 94%|█████████▍| 21047/22434 [20:49:53<57:17, 2.48s/it] +2025-02-06 06:57:35 - ERROR - stderr - 94%|█████████▍| 21048/22434 [20:49:55<56:58, 2.47s/it] +2025-02-06 06:57:35 - ERROR - stderr - +2025-02-06 06:57:35 - ERROR - stderr - +2025-02-06 06:57:35 - INFO - stdout - {'loss': 0.3096, 'grad_norm': 1.4459102153778076, 'learning_rate': 1.995394636602277e-07, 'epoch': 2.81} +2025-02-06 06:57:35 - ERROR - stderr - 94%|█████████▍| 21048/22434 [20:49:55<56:58, 2.47s/it] +2025-02-06 06:57:38 - ERROR - stderr - 94%|█████████▍| 21049/22434 [20:49:58<57:57, 2.51s/it] +2025-02-06 06:57:38 - ERROR - stderr - +2025-02-06 06:57:38 - ERROR - stderr - +2025-02-06 06:57:38 - INFO - stdout - {'loss': 0.3306, 'grad_norm': 1.4076447486877441, 'learning_rate': 1.9925259148695253e-07, 'epoch': 2.81} +2025-02-06 06:57:38 - ERROR - stderr - 94%|█████████▍| 21049/22434 [20:49:58<57:57, 2.51s/it] +2025-02-06 06:57:41 - ERROR - stderr - 94%|█████████▍| 21050/22434 [20:50:00<57:31, 2.49s/it] +2025-02-06 06:57:41 - ERROR - stderr - +2025-02-06 06:57:41 - ERROR - stderr - +2025-02-06 06:57:41 - INFO - stdout - {'loss': 0.3243, 'grad_norm': 1.4815031290054321, 'learning_rate': 
1.9896592360087897e-07, 'epoch': 2.81} +2025-02-06 06:57:41 - ERROR - stderr - 94%|█████████▍| 21050/22434 [20:50:00<57:31, 2.49s/it] +2025-02-06 06:57:43 - ERROR - stderr - 94%|█████████▍| 21051/22434 [20:50:03<57:46, 2.51s/it] +2025-02-06 06:57:43 - ERROR - stderr - +2025-02-06 06:57:43 - ERROR - stderr - +2025-02-06 06:57:43 - INFO - stdout - {'loss': 0.3725, 'grad_norm': 1.670536994934082, 'learning_rate': 1.9867946000798223e-07, 'epoch': 2.82} +2025-02-06 06:57:43 - ERROR - stderr - 94%|█████████▍| 21051/22434 [20:50:03<57:46, 2.51s/it] +2025-02-06 06:57:46 - ERROR - stderr - 94%|█████████▍| 21052/22434 [20:50:05<57:30, 2.50s/it] +2025-02-06 06:57:46 - ERROR - stderr - +2025-02-06 06:57:46 - ERROR - stderr - +2025-02-06 06:57:46 - INFO - stdout - {'loss': 0.3386, 'grad_norm': 1.4800221920013428, 'learning_rate': 1.9839320071423195e-07, 'epoch': 2.82} +2025-02-06 06:57:46 - ERROR - stderr - 94%|█████████▍| 21052/22434 [20:50:05<57:30, 2.50s/it] +2025-02-06 06:57:48 - ERROR - stderr - 94%|█████████▍| 21053/22434 [20:50:08<57:22, 2.49s/it] +2025-02-06 06:57:48 - ERROR - stderr - +2025-02-06 06:57:48 - ERROR - stderr - +2025-02-06 06:57:48 - INFO - stdout - {'loss': 0.3574, 'grad_norm': 1.6522111892700195, 'learning_rate': 1.9810714572559898e-07, 'epoch': 2.82} +2025-02-06 06:57:48 - ERROR - stderr - 94%|█████████▍| 21053/22434 [20:50:08<57:22, 2.49s/it] +2025-02-06 06:57:50 - ERROR - stderr - 94%|█████████▍| 21054/22434 [20:50:10<57:01, 2.48s/it] +2025-02-06 06:57:50 - ERROR - stderr - +2025-02-06 06:57:50 - ERROR - stderr - +2025-02-06 06:57:50 - INFO - stdout - {'loss': 0.365, 'grad_norm': 1.529272198677063, 'learning_rate': 1.9782129504804182e-07, 'epoch': 2.82} +2025-02-06 06:57:50 - ERROR - stderr - 94%|█████████▍| 21054/22434 [20:50:10<57:01, 2.48s/it] +2025-02-06 06:57:53 - ERROR - stderr - 94%|█████████▍| 21055/22434 [20:50:13<57:36, 2.51s/it] +2025-02-06 06:57:53 - ERROR - stderr - +2025-02-06 06:57:53 - ERROR - stderr - +2025-02-06 06:57:53 - INFO - 
stdout - {'loss': 0.3923, 'grad_norm': 1.8183497190475464, 'learning_rate': 1.9753564868751906e-07, 'epoch': 2.82} +2025-02-06 06:57:53 - ERROR - stderr - 94%|█████████▍| 21055/22434 [20:50:13<57:36, 2.51s/it] +2025-02-06 06:57:56 - ERROR - stderr - 94%|█████████▍| 21056/22434 [20:50:15<57:41, 2.51s/it] +2025-02-06 06:57:56 - ERROR - stderr - +2025-02-06 06:57:56 - ERROR - stderr - +2025-02-06 06:57:56 - INFO - stdout - {'loss': 0.2975, 'grad_norm': 1.3781611919403076, 'learning_rate': 1.9725020664998707e-07, 'epoch': 2.82} +2025-02-06 06:57:56 - ERROR - stderr - 94%|█████████▍| 21056/22434 [20:50:15<57:41, 2.51s/it] +2025-02-06 06:57:58 - ERROR - stderr - 94%|█████████▍| 21057/22434 [20:50:18<57:08, 2.49s/it] +2025-02-06 06:57:58 - ERROR - stderr - +2025-02-06 06:57:58 - ERROR - stderr - +2025-02-06 06:57:58 - INFO - stdout - {'loss': 0.3784, 'grad_norm': 1.7327841520309448, 'learning_rate': 1.9696496894139216e-07, 'epoch': 2.82} +2025-02-06 06:57:58 - ERROR - stderr - 94%|█████████▍| 21057/22434 [20:50:18<57:08, 2.49s/it] +2025-02-06 06:58:01 - ERROR - stderr - 94%|█████████▍| 21058/22434 [20:50:20<58:03, 2.53s/it] +2025-02-06 06:58:01 - ERROR - stderr - +2025-02-06 06:58:01 - ERROR - stderr - +2025-02-06 06:58:01 - INFO - stdout - {'loss': 0.3886, 'grad_norm': 1.483992338180542, 'learning_rate': 1.9667993556768517e-07, 'epoch': 2.82} +2025-02-06 06:58:01 - ERROR - stderr - 94%|█████████▍| 21058/22434 [20:50:20<58:03, 2.53s/it] +2025-02-06 06:58:03 - ERROR - stderr - 94%|█████████▍| 21059/22434 [20:50:23<58:09, 2.54s/it] +2025-02-06 06:58:03 - ERROR - stderr - +2025-02-06 06:58:03 - ERROR - stderr - +2025-02-06 06:58:03 - INFO - stdout - {'loss': 0.3859, 'grad_norm': 1.6965386867523193, 'learning_rate': 1.9639510653480244e-07, 'epoch': 2.82} +2025-02-06 06:58:03 - ERROR - stderr - 94%|█████████▍| 21059/22434 [20:50:23<58:09, 2.54s/it] +2025-02-06 06:58:06 - ERROR - stderr - 94%|█████████▍| 21060/22434 [20:50:25<58:16, 2.54s/it] +2025-02-06 06:58:06 - ERROR - 
stderr - +2025-02-06 06:58:06 - ERROR - stderr - +2025-02-06 06:58:06 - INFO - stdout - {'loss': 0.3691, 'grad_norm': 1.4392951726913452, 'learning_rate': 1.9611048184868254e-07, 'epoch': 2.82} +2025-02-06 06:58:06 - ERROR - stderr - 94%|█████████▍| 21060/22434 [20:50:26<58:16, 2.54s/it] +2025-02-06 06:58:08 - ERROR - stderr - 94%|█████████▍| 21061/22434 [20:50:28<58:08, 2.54s/it] +2025-02-06 06:58:08 - ERROR - stderr - +2025-02-06 06:58:08 - ERROR - stderr - +2025-02-06 06:58:08 - INFO - stdout - {'loss': 0.3322, 'grad_norm': 1.3528062105178833, 'learning_rate': 1.958260615152585e-07, 'epoch': 2.82} +2025-02-06 06:58:08 - ERROR - stderr - 94%|█████████▍| 21061/22434 [20:50:28<58:08, 2.54s/it] +2025-02-06 06:58:11 - ERROR - stderr - 94%|█████████▍| 21062/22434 [20:50:31<57:47, 2.53s/it] +2025-02-06 06:58:11 - ERROR - stderr - +2025-02-06 06:58:11 - ERROR - stderr - +2025-02-06 06:58:11 - INFO - stdout - {'loss': 0.3719, 'grad_norm': 1.8010636568069458, 'learning_rate': 1.9554184554045897e-07, 'epoch': 2.82} +2025-02-06 06:58:11 - ERROR - stderr - 94%|█████████▍| 21062/22434 [20:50:31<57:47, 2.53s/it] +2025-02-06 06:58:13 - ERROR - stderr - 94%|█████████▍| 21063/22434 [20:50:33<57:09, 2.50s/it] +2025-02-06 06:58:13 - ERROR - stderr - +2025-02-06 06:58:13 - ERROR - stderr - +2025-02-06 06:58:13 - INFO - stdout - {'loss': 0.3218, 'grad_norm': 1.544804334640503, 'learning_rate': 1.9525783393020803e-07, 'epoch': 2.82} +2025-02-06 06:58:13 - ERROR - stderr - 94%|█████████▍| 21063/22434 [20:50:33<57:09, 2.50s/it] +2025-02-06 06:58:16 - ERROR - stderr - 94%|█████████▍| 21064/22434 [20:50:35<57:00, 2.50s/it] +2025-02-06 06:58:16 - ERROR - stderr - +2025-02-06 06:58:16 - ERROR - stderr - +2025-02-06 06:58:16 - INFO - stdout - {'loss': 0.3849, 'grad_norm': 1.5690834522247314, 'learning_rate': 1.949740266904243e-07, 'epoch': 2.82} +2025-02-06 06:58:16 - ERROR - stderr - 94%|█████████▍| 21064/22434 [20:50:35<57:00, 2.50s/it] +2025-02-06 06:58:18 - ERROR - stderr - 
94%|█████████▍| 21065/22434 [20:50:38<56:48, 2.49s/it] +2025-02-06 06:58:18 - ERROR - stderr - +2025-02-06 06:58:18 - ERROR - stderr - +2025-02-06 06:58:18 - INFO - stdout - {'loss': 0.3435, 'grad_norm': 1.5462055206298828, 'learning_rate': 1.946904238270253e-07, 'epoch': 2.82} +2025-02-06 06:58:18 - ERROR - stderr - 94%|█████████▍| 21065/22434 [20:50:38<56:48, 2.49s/it] +2025-02-06 06:58:21 - ERROR - stderr - 94%|█████████▍| 21066/22434 [20:50:40<56:52, 2.49s/it] +2025-02-06 06:58:21 - ERROR - stderr - +2025-02-06 06:58:21 - ERROR - stderr - +2025-02-06 06:58:21 - INFO - stdout - {'loss': 0.351, 'grad_norm': 1.4532150030136108, 'learning_rate': 1.944070253459218e-07, 'epoch': 2.82} +2025-02-06 06:58:21 - ERROR - stderr - 94%|█████████▍| 21066/22434 [20:50:40<56:52, 2.49s/it] +2025-02-06 06:58:23 - ERROR - stderr - 94%|█████████▍| 21067/22434 [20:50:43<56:57, 2.50s/it] +2025-02-06 06:58:23 - ERROR - stderr - +2025-02-06 06:58:23 - ERROR - stderr - +2025-02-06 06:58:23 - INFO - stdout - {'loss': 0.3426, 'grad_norm': 1.698218584060669, 'learning_rate': 1.9412383125302136e-07, 'epoch': 2.82} +2025-02-06 06:58:23 - ERROR - stderr - 94%|█████████▍| 21067/22434 [20:50:43<56:57, 2.50s/it] +2025-02-06 06:58:26 - ERROR - stderr - 94%|█████████▍| 21068/22434 [20:50:45<56:34, 2.48s/it] +2025-02-06 06:58:26 - ERROR - stderr - +2025-02-06 06:58:26 - ERROR - stderr - +2025-02-06 06:58:26 - INFO - stdout - {'loss': 0.3432, 'grad_norm': 1.3921817541122437, 'learning_rate': 1.938408415542259e-07, 'epoch': 2.82} +2025-02-06 06:58:26 - ERROR - stderr - 94%|█████████▍| 21068/22434 [20:50:45<56:34, 2.48s/it] +2025-02-06 06:58:29 - ERROR - stderr - 94%|█████████▍| 21069/22434 [20:50:48<59:16, 2.61s/it] +2025-02-06 06:58:29 - ERROR - stderr - +2025-02-06 06:58:29 - ERROR - stderr - +2025-02-06 06:58:29 - INFO - stdout - {'loss': 0.3504, 'grad_norm': 1.5417016744613647, 'learning_rate': 1.93558056255434e-07, 'epoch': 2.82} +2025-02-06 06:58:29 - ERROR - stderr - 94%|█████████▍| 
21069/22434 [20:50:48<59:16, 2.61s/it] +2025-02-06 06:58:31 - ERROR - stderr - 94%|█████████▍| 21070/22434 [20:50:51<58:26, 2.57s/it] +2025-02-06 06:58:31 - ERROR - stderr - +2025-02-06 06:58:31 - ERROR - stderr - +2025-02-06 06:58:31 - INFO - stdout - {'loss': 0.3775, 'grad_norm': 1.745919108390808, 'learning_rate': 1.932754753625421e-07, 'epoch': 2.82} +2025-02-06 06:58:31 - ERROR - stderr - 94%|█████████▍| 21070/22434 [20:50:51<58:26, 2.57s/it] +2025-02-06 06:58:33 - ERROR - stderr - 94%|█████████▍| 21071/22434 [20:50:53<57:57, 2.55s/it] +2025-02-06 06:58:34 - ERROR - stderr - +2025-02-06 06:58:34 - ERROR - stderr - +2025-02-06 06:58:34 - INFO - stdout - {'loss': 0.3794, 'grad_norm': 1.710904598236084, 'learning_rate': 1.929930988814377e-07, 'epoch': 2.82} +2025-02-06 06:58:34 - ERROR - stderr - 94%|█████████▍| 21071/22434 [20:50:53<57:57, 2.55s/it] +2025-02-06 06:58:36 - ERROR - stderr - 94%|█████████▍| 21072/22434 [20:50:56<57:09, 2.52s/it] +2025-02-06 06:58:36 - ERROR - stderr - +2025-02-06 06:58:36 - ERROR - stderr - +2025-02-06 06:58:36 - INFO - stdout - {'loss': 0.3825, 'grad_norm': 1.4693067073822021, 'learning_rate': 1.927109268180094e-07, 'epoch': 2.82} +2025-02-06 06:58:36 - ERROR - stderr - 94%|█████████▍| 21072/22434 [20:50:56<57:09, 2.52s/it] +2025-02-06 06:58:38 - ERROR - stderr - 94%|█████████▍| 21073/22434 [20:50:58<56:33, 2.49s/it] +2025-02-06 06:58:38 - ERROR - stderr - +2025-02-06 06:58:38 - ERROR - stderr - +2025-02-06 06:58:38 - INFO - stdout - {'loss': 0.3749, 'grad_norm': 1.6029951572418213, 'learning_rate': 1.9242895917813475e-07, 'epoch': 2.82} +2025-02-06 06:58:38 - ERROR - stderr - 94%|█████████▍| 21073/22434 [20:50:58<56:33, 2.49s/it] +2025-02-06 06:58:41 - ERROR - stderr - 94%|█████████▍| 21074/22434 [20:51:01<56:42, 2.50s/it] +2025-02-06 06:58:41 - ERROR - stderr - +2025-02-06 06:58:41 - ERROR - stderr - +2025-02-06 06:58:41 - INFO - stdout - {'loss': 0.3535, 'grad_norm': 1.5685030221939087, 'learning_rate': 1.921471959676957e-07, 
'epoch': 2.82} +2025-02-06 06:58:41 - ERROR - stderr - 94%|█████████▍| 21074/22434 [20:51:01<56:42, 2.50s/it] +2025-02-06 06:58:43 - ERROR - stderr - 94%|█████████▍| 21075/22434 [20:51:03<56:30, 2.50s/it] +2025-02-06 06:58:43 - ERROR - stderr - +2025-02-06 06:58:43 - ERROR - stderr - +2025-02-06 06:58:43 - INFO - stdout - {'loss': 0.4062, 'grad_norm': 1.5314923524856567, 'learning_rate': 1.9186563719256313e-07, 'epoch': 2.82} +2025-02-06 06:58:43 - ERROR - stderr - 94%|█████████▍| 21075/22434 [20:51:03<56:30, 2.50s/it] +2025-02-06 06:58:46 - ERROR - stderr - 94%|█████████▍| 21076/22434 [20:51:06<56:25, 2.49s/it] +2025-02-06 06:58:46 - ERROR - stderr - +2025-02-06 06:58:46 - ERROR - stderr - +2025-02-06 06:58:46 - INFO - stdout - {'loss': 0.3717, 'grad_norm': 1.592108130455017, 'learning_rate': 1.9158428285860452e-07, 'epoch': 2.82} +2025-02-06 06:58:46 - ERROR - stderr - 94%|█████████▍| 21076/22434 [20:51:06<56:25, 2.49s/it] +2025-02-06 06:58:48 - ERROR - stderr - 94%|█████████▍| 21077/22434 [20:51:08<56:13, 2.49s/it] +2025-02-06 06:58:48 - ERROR - stderr - +2025-02-06 06:58:48 - ERROR - stderr - +2025-02-06 06:58:48 - INFO - stdout - {'loss': 0.3739, 'grad_norm': 1.82607901096344, 'learning_rate': 1.9130313297168746e-07, 'epoch': 2.82} +2025-02-06 06:58:48 - ERROR - stderr - 94%|█████████▍| 21077/22434 [20:51:08<56:13, 2.49s/it] +2025-02-06 06:58:51 - ERROR - stderr - 94%|█████████▍| 21078/22434 [20:51:11<56:49, 2.51s/it] +2025-02-06 06:58:51 - ERROR - stderr - +2025-02-06 06:58:51 - ERROR - stderr - +2025-02-06 06:58:51 - INFO - stdout - {'loss': 0.3251, 'grad_norm': 1.4730095863342285, 'learning_rate': 1.9102218753766943e-07, 'epoch': 2.82} +2025-02-06 06:58:51 - ERROR - stderr - 94%|█████████▍| 21078/22434 [20:51:11<56:49, 2.51s/it] +2025-02-06 06:58:53 - ERROR - stderr - 94%|█████████▍| 21079/22434 [20:51:13<57:04, 2.53s/it] +2025-02-06 06:58:54 - ERROR - stderr - +2025-02-06 06:58:54 - ERROR - stderr - +2025-02-06 06:58:54 - INFO - stdout - {'loss': 0.3685, 
'grad_norm': 1.8074052333831787, 'learning_rate': 1.9074144656240913e-07, 'epoch': 2.82} +2025-02-06 06:58:54 - ERROR - stderr - 94%|█████████▍| 21079/22434 [20:51:13<57:04, 2.53s/it] +2025-02-06 06:58:56 - ERROR - stderr - 94%|█████████▍| 21080/22434 [20:51:16<56:43, 2.51s/it] +2025-02-06 06:58:56 - ERROR - stderr - +2025-02-06 06:58:56 - ERROR - stderr - +2025-02-06 06:58:56 - INFO - stdout - {'loss': 0.3948, 'grad_norm': 1.5792359113693237, 'learning_rate': 1.9046091005175627e-07, 'epoch': 2.82} +2025-02-06 06:58:56 - ERROR - stderr - 94%|█████████▍| 21080/22434 [20:51:16<56:43, 2.51s/it] +2025-02-06 06:58:58 - ERROR - stderr - 94%|█████████▍| 21081/22434 [20:51:18<56:04, 2.49s/it] +2025-02-06 06:58:58 - ERROR - stderr - +2025-02-06 06:58:58 - ERROR - stderr - +2025-02-06 06:58:58 - INFO - stdout - {'loss': 0.2905, 'grad_norm': 1.3176779747009277, 'learning_rate': 1.9018057801155843e-07, 'epoch': 2.82} +2025-02-06 06:58:58 - ERROR - stderr - 94%|█████████▍| 21081/22434 [20:51:18<56:04, 2.49s/it] +2025-02-06 06:59:01 - ERROR - stderr - 94%|█████████▍| 21082/22434 [20:51:21<56:16, 2.50s/it] +2025-02-06 06:59:01 - ERROR - stderr - +2025-02-06 06:59:01 - ERROR - stderr - +2025-02-06 06:59:01 - INFO - stdout - {'loss': 0.3373, 'grad_norm': 1.520768165588379, 'learning_rate': 1.8990045044766093e-07, 'epoch': 2.82} +2025-02-06 06:59:01 - ERROR - stderr - 94%|█████████▍| 21082/22434 [20:51:21<56:16, 2.50s/it] +2025-02-06 06:59:03 - ERROR - stderr - 94%|█████████▍| 21083/22434 [20:51:23<56:35, 2.51s/it] +2025-02-06 06:59:03 - ERROR - stderr - +2025-02-06 06:59:04 - ERROR - stderr - +2025-02-06 06:59:04 - INFO - stdout - {'loss': 0.3294, 'grad_norm': 1.5521377325057983, 'learning_rate': 1.8962052736590019e-07, 'epoch': 2.82} +2025-02-06 06:59:04 - ERROR - stderr - 94%|█████████▍| 21083/22434 [20:51:23<56:35, 2.51s/it] +2025-02-06 06:59:06 - ERROR - stderr - 94%|█████████▍| 21084/22434 [20:51:26<55:56, 2.49s/it] +2025-02-06 06:59:06 - ERROR - stderr - +2025-02-06 06:59:06 
- ERROR - stderr - +2025-02-06 06:59:06 - INFO - stdout - {'loss': 0.3629, 'grad_norm': 1.5931363105773926, 'learning_rate': 1.8934080877211158e-07, 'epoch': 2.82} +2025-02-06 06:59:06 - ERROR - stderr - 94%|█████████▍| 21084/22434 [20:51:26<55:56, 2.49s/it] +2025-02-06 06:59:08 - ERROR - stderr - 94%|█████████▍| 21085/22434 [20:51:28<56:06, 2.50s/it] +2025-02-06 06:59:08 - ERROR - stderr - +2025-02-06 06:59:08 - ERROR - stderr - +2025-02-06 06:59:08 - INFO - stdout - {'loss': 0.3749, 'grad_norm': 1.6153312921524048, 'learning_rate': 1.8906129467212708e-07, 'epoch': 2.82} +2025-02-06 06:59:08 - ERROR - stderr - 94%|█████████▍| 21085/22434 [20:51:28<56:06, 2.50s/it] +2025-02-06 06:59:11 - ERROR - stderr - 94%|█████████▍| 21086/22434 [20:51:31<56:01, 2.49s/it] +2025-02-06 06:59:11 - ERROR - stderr - +2025-02-06 06:59:11 - ERROR - stderr - +2025-02-06 06:59:11 - INFO - stdout - {'loss': 0.3519, 'grad_norm': 1.6326535940170288, 'learning_rate': 1.8878198507177093e-07, 'epoch': 2.82} +2025-02-06 06:59:11 - ERROR - stderr - 94%|█████████▍| 21086/22434 [20:51:31<56:01, 2.49s/it] +2025-02-06 06:59:13 - ERROR - stderr - 94%|█████████▍| 21087/22434 [20:51:33<56:17, 2.51s/it] +2025-02-06 06:59:13 - ERROR - stderr - +2025-02-06 06:59:13 - ERROR - stderr - +2025-02-06 06:59:13 - INFO - stdout - {'loss': 0.39, 'grad_norm': 1.5739383697509766, 'learning_rate': 1.8850287997686623e-07, 'epoch': 2.82} +2025-02-06 06:59:13 - ERROR - stderr - 94%|█████████▍| 21087/22434 [20:51:33<56:17, 2.51s/it] +2025-02-06 06:59:16 - ERROR - stderr - 94%|█████████▍| 21088/22434 [20:51:36<56:32, 2.52s/it] +2025-02-06 06:59:16 - ERROR - stderr - +2025-02-06 06:59:16 - ERROR - stderr - +2025-02-06 06:59:16 - INFO - stdout - {'loss': 0.3269, 'grad_norm': 1.3473291397094727, 'learning_rate': 1.8822397939323055e-07, 'epoch': 2.82} +2025-02-06 06:59:16 - ERROR - stderr - 94%|█████████▍| 21088/22434 [20:51:36<56:32, 2.52s/it] +2025-02-06 06:59:19 - ERROR - stderr - 94%|█████████▍| 21089/22434 
[20:51:38<56:48, 2.53s/it] +2025-02-06 06:59:19 - ERROR - stderr - +2025-02-06 06:59:19 - ERROR - stderr - +2025-02-06 06:59:19 - INFO - stdout - {'loss': 0.3121, 'grad_norm': 1.3279112577438354, 'learning_rate': 1.8794528332667816e-07, 'epoch': 2.82} +2025-02-06 06:59:19 - ERROR - stderr - 94%|█████████▍| 21089/22434 [20:51:38<56:48, 2.53s/it] +2025-02-06 06:59:21 - ERROR - stderr - 94%|█████████▍| 21090/22434 [20:51:41<56:57, 2.54s/it] +2025-02-06 06:59:21 - ERROR - stderr - +2025-02-06 06:59:21 - ERROR - stderr - +2025-02-06 06:59:21 - INFO - stdout - {'loss': 0.3811, 'grad_norm': 1.4681549072265625, 'learning_rate': 1.876667917830155e-07, 'epoch': 2.82} +2025-02-06 06:59:21 - ERROR - stderr - 94%|█████████▍| 21090/22434 [20:51:41<56:57, 2.54s/it] +2025-02-06 06:59:24 - ERROR - stderr - 94%|█████████▍| 21091/22434 [20:51:43<57:00, 2.55s/it] +2025-02-06 06:59:24 - ERROR - stderr - +2025-02-06 06:59:24 - ERROR - stderr - +2025-02-06 06:59:24 - INFO - stdout - {'loss': 0.3675, 'grad_norm': 1.5945870876312256, 'learning_rate': 1.8738850476805127e-07, 'epoch': 2.82} +2025-02-06 06:59:24 - ERROR - stderr - 94%|█████████▍| 21091/22434 [20:51:43<57:00, 2.55s/it] +2025-02-06 06:59:26 - ERROR - stderr - 94%|█████████▍| 21092/22434 [20:51:46<57:36, 2.58s/it] +2025-02-06 06:59:26 - ERROR - stderr - +2025-02-06 06:59:26 - ERROR - stderr - +2025-02-06 06:59:26 - INFO - stdout - {'loss': 0.3425, 'grad_norm': 1.6379867792129517, 'learning_rate': 1.871104222875819e-07, 'epoch': 2.82} +2025-02-06 06:59:26 - ERROR - stderr - 94%|█████████▍| 21092/22434 [20:51:46<57:36, 2.58s/it] +2025-02-06 06:59:29 - ERROR - stderr - 94%|█████████▍| 21093/22434 [20:51:49<56:37, 2.53s/it] +2025-02-06 06:59:29 - ERROR - stderr - +2025-02-06 06:59:29 - ERROR - stderr - +2025-02-06 06:59:29 - INFO - stdout - {'loss': 0.3076, 'grad_norm': 1.2924295663833618, 'learning_rate': 1.8683254434740617e-07, 'epoch': 2.82} +2025-02-06 06:59:29 - ERROR - stderr - 94%|█████████▍| 21093/22434 [20:51:49<56:37, 
2.53s/it] +2025-02-06 06:59:31 - ERROR - stderr - 94%|█████████▍| 21094/22434 [20:51:51<56:12, 2.52s/it] +2025-02-06 06:59:31 - ERROR - stderr - +2025-02-06 06:59:31 - ERROR - stderr - +2025-02-06 06:59:31 - INFO - stdout - {'loss': 0.3593, 'grad_norm': 1.4257365465164185, 'learning_rate': 1.8655487095331716e-07, 'epoch': 2.82} +2025-02-06 06:59:31 - ERROR - stderr - 94%|█████████▍| 21094/22434 [20:51:51<56:12, 2.52s/it] +2025-02-06 06:59:34 - ERROR - stderr - 94%|█████████▍| 21095/22434 [20:51:54<59:29, 2.67s/it] +2025-02-06 06:59:34 - ERROR - stderr - +2025-02-06 06:59:34 - ERROR - stderr - +2025-02-06 06:59:34 - INFO - stdout - {'loss': 0.3817, 'grad_norm': 1.57588529586792, 'learning_rate': 1.8627740211110023e-07, 'epoch': 2.82} +2025-02-06 06:59:34 - ERROR - stderr - 94%|█████████▍| 21095/22434 [20:51:54<59:29, 2.67s/it] +2025-02-06 06:59:37 - ERROR - stderr - 94%|█████████▍| 21096/22434 [20:51:57<1:00:09, 2.70s/it] +2025-02-06 06:59:37 - ERROR - stderr - +2025-02-06 06:59:37 - ERROR - stderr - +2025-02-06 06:59:37 - INFO - stdout - {'loss': 0.3423, 'grad_norm': 1.5338976383209229, 'learning_rate': 1.860001378265408e-07, 'epoch': 2.82} +2025-02-06 06:59:37 - ERROR - stderr - 94%|█████████▍| 21096/22434 [20:51:57<1:00:09, 2.70s/it] +2025-02-06 06:59:40 - ERROR - stderr - 94%|█████████▍| 21097/22434 [20:52:00<1:01:22, 2.75s/it] +2025-02-06 06:59:40 - ERROR - stderr - +2025-02-06 06:59:40 - ERROR - stderr - +2025-02-06 06:59:40 - INFO - stdout - {'loss': 0.3683, 'grad_norm': 1.4424211978912354, 'learning_rate': 1.8572307810541645e-07, 'epoch': 2.82} +2025-02-06 06:59:40 - ERROR - stderr - 94%|█████████▍| 21097/22434 [20:52:00<1:01:22, 2.75s/it] +2025-02-06 06:59:42 - ERROR - stderr - 94%|█████████▍| 21098/22434 [20:52:02<59:37, 2.68s/it] +2025-02-06 06:59:42 - ERROR - stderr - +2025-02-06 06:59:42 - ERROR - stderr - +2025-02-06 06:59:42 - INFO - stdout - {'loss': 0.3596, 'grad_norm': 1.6507391929626465, 'learning_rate': 1.854462229535059e-07, 'epoch': 2.82} 
+2025-02-06 06:59:42 - ERROR - stderr - 94%|█████████▍| 21098/22434 [20:52:02<59:37, 2.68s/it] +2025-02-06 06:59:45 - ERROR - stderr - 94%|█████████▍| 21099/22434 [20:52:05<58:36, 2.63s/it] +2025-02-06 06:59:45 - ERROR - stderr - +2025-02-06 06:59:45 - ERROR - stderr - +2025-02-06 06:59:45 - INFO - stdout - {'loss': 0.3621, 'grad_norm': 1.5054662227630615, 'learning_rate': 1.851695723765745e-07, 'epoch': 2.82} +2025-02-06 06:59:45 - ERROR - stderr - 94%|█████████▍| 21099/22434 [20:52:05<58:36, 2.63s/it] +2025-02-06 06:59:47 - ERROR - stderr - 94%|█████████▍| 21100/22434 [20:52:07<57:39, 2.59s/it] +2025-02-06 06:59:47 - ERROR - stderr - +2025-02-06 06:59:47 - ERROR - stderr - +2025-02-06 06:59:47 - INFO - stdout - {'loss': 0.3239, 'grad_norm': 1.5473171472549438, 'learning_rate': 1.8489312638039325e-07, 'epoch': 2.82} +2025-02-06 06:59:47 - ERROR - stderr - 94%|█████████▍| 21100/22434 [20:52:07<57:39, 2.59s/it] +2025-02-06 06:59:50 - ERROR - stderr - 94%|█████████▍| 21101/22434 [20:52:10<57:01, 2.57s/it] +2025-02-06 06:59:50 - ERROR - stderr - +2025-02-06 06:59:50 - ERROR - stderr - +2025-02-06 06:59:50 - INFO - stdout - {'loss': 0.3574, 'grad_norm': 1.5663384199142456, 'learning_rate': 1.8461688497072193e-07, 'epoch': 2.82} +2025-02-06 06:59:50 - ERROR - stderr - 94%|█████████▍| 21101/22434 [20:52:10<57:01, 2.57s/it] +2025-02-06 06:59:53 - ERROR - stderr - 94%|█████████▍| 21102/22434 [20:52:12<57:30, 2.59s/it] +2025-02-06 06:59:53 - ERROR - stderr - +2025-02-06 06:59:53 - ERROR - stderr - +2025-02-06 06:59:53 - INFO - stdout - {'loss': 0.3441, 'grad_norm': 1.3666025400161743, 'learning_rate': 1.843408481533182e-07, 'epoch': 2.82} +2025-02-06 06:59:53 - ERROR - stderr - 94%|█████████▍| 21102/22434 [20:52:12<57:30, 2.59s/it] +2025-02-06 06:59:55 - ERROR - stderr - 94%|█████████▍| 21103/22434 [20:52:15<56:42, 2.56s/it] +2025-02-06 06:59:55 - ERROR - stderr - +2025-02-06 06:59:55 - ERROR - stderr - +2025-02-06 06:59:55 - INFO - stdout - {'loss': 0.3321, 'grad_norm': 
1.4269355535507202, 'learning_rate': 1.8406501593393967e-07, 'epoch': 2.82} +2025-02-06 06:59:55 - ERROR - stderr - 94%|█████████▍| 21103/22434 [20:52:15<56:42, 2.56s/it] +2025-02-06 06:59:58 - ERROR - stderr - 94%|█████████▍| 21104/22434 [20:52:17<56:09, 2.53s/it] +2025-02-06 06:59:58 - ERROR - stderr - +2025-02-06 06:59:58 - ERROR - stderr - +2025-02-06 06:59:58 - INFO - stdout - {'loss': 0.3474, 'grad_norm': 1.6491373777389526, 'learning_rate': 1.8378938831833172e-07, 'epoch': 2.82} +2025-02-06 06:59:58 - ERROR - stderr - 94%|█████████▍| 21104/22434 [20:52:17<56:09, 2.53s/it] +2025-02-06 07:00:00 - ERROR - stderr - 94%|█████████▍| 21105/22434 [20:52:20<56:10, 2.54s/it] +2025-02-06 07:00:00 - ERROR - stderr - +2025-02-06 07:00:00 - ERROR - stderr - +2025-02-06 07:00:00 - INFO - stdout - {'loss': 0.379, 'grad_norm': 1.6401958465576172, 'learning_rate': 1.8351396531224087e-07, 'epoch': 2.82} +2025-02-06 07:00:00 - ERROR - stderr - 94%|█████████▍| 21105/22434 [20:52:20<56:10, 2.54s/it] +2025-02-06 07:00:03 - ERROR - stderr - 94%|█████████▍| 21106/22434 [20:52:22<55:56, 2.53s/it] +2025-02-06 07:00:03 - ERROR - stderr - +2025-02-06 07:00:03 - ERROR - stderr - +2025-02-06 07:00:03 - INFO - stdout - {'loss': 0.3934, 'grad_norm': 1.6312828063964844, 'learning_rate': 1.8323874692140807e-07, 'epoch': 2.82} +2025-02-06 07:00:03 - ERROR - stderr - 94%|█████████▍| 21106/22434 [20:52:22<55:56, 2.53s/it] +2025-02-06 07:00:05 - ERROR - stderr - 94%|█████████▍| 21107/22434 [20:52:25<56:28, 2.55s/it] +2025-02-06 07:00:05 - ERROR - stderr - +2025-02-06 07:00:05 - ERROR - stderr - +2025-02-06 07:00:05 - INFO - stdout - {'loss': 0.3225, 'grad_norm': 1.5851949453353882, 'learning_rate': 1.829637331515699e-07, 'epoch': 2.82} +2025-02-06 07:00:05 - ERROR - stderr - 94%|█████████▍| 21107/22434 [20:52:25<56:28, 2.55s/it] +2025-02-06 07:00:08 - ERROR - stderr - 94%|█████████▍| 21108/22434 [20:52:28<56:33, 2.56s/it] +2025-02-06 07:00:08 - ERROR - stderr - +2025-02-06 07:00:08 - ERROR - 
stderr - +2025-02-06 07:00:08 - INFO - stdout - {'loss': 0.354, 'grad_norm': 1.52326500415802, 'learning_rate': 1.8268892400845838e-07, 'epoch': 2.82} +2025-02-06 07:00:08 - ERROR - stderr - 94%|█████████▍| 21108/22434 [20:52:28<56:33, 2.56s/it] +2025-02-06 07:00:10 - ERROR - stderr - 94%|█████████▍| 21109/22434 [20:52:30<56:21, 2.55s/it] +2025-02-06 07:00:10 - ERROR - stderr - +2025-02-06 07:00:10 - ERROR - stderr - +2025-02-06 07:00:10 - INFO - stdout - {'loss': 0.3662, 'grad_norm': 1.6832385063171387, 'learning_rate': 1.824143194978023e-07, 'epoch': 2.82} +2025-02-06 07:00:10 - ERROR - stderr - 94%|█████████▍| 21109/22434 [20:52:30<56:21, 2.55s/it] +2025-02-06 07:00:13 - ERROR - stderr - 94%|█████████▍| 21110/22434 [20:52:33<56:00, 2.54s/it] +2025-02-06 07:00:13 - ERROR - stderr - +2025-02-06 07:00:13 - ERROR - stderr - +2025-02-06 07:00:13 - INFO - stdout - {'loss': 0.355, 'grad_norm': 1.3753222227096558, 'learning_rate': 1.8213991962532595e-07, 'epoch': 2.82} +2025-02-06 07:00:13 - ERROR - stderr - 94%|█████████▍| 21110/22434 [20:52:33<56:00, 2.54s/it] +2025-02-06 07:00:15 - ERROR - stderr - 94%|█████████▍| 21111/22434 [20:52:35<55:39, 2.52s/it] +2025-02-06 07:00:15 - ERROR - stderr - +2025-02-06 07:00:15 - ERROR - stderr - +2025-02-06 07:00:15 - INFO - stdout - {'loss': 0.3501, 'grad_norm': 1.6039220094680786, 'learning_rate': 1.818657243967481e-07, 'epoch': 2.82} +2025-02-06 07:00:15 - ERROR - stderr - 94%|█████████▍| 21111/22434 [20:52:35<55:39, 2.52s/it] +2025-02-06 07:00:18 - ERROR - stderr - 94%|█████████▍| 21112/22434 [20:52:38<55:32, 2.52s/it] +2025-02-06 07:00:18 - ERROR - stderr - +2025-02-06 07:00:18 - ERROR - stderr - +2025-02-06 07:00:18 - INFO - stdout - {'loss': 0.3899, 'grad_norm': 1.6353678703308105, 'learning_rate': 1.8159173381778417e-07, 'epoch': 2.82} +2025-02-06 07:00:18 - ERROR - stderr - 94%|█████████▍| 21112/22434 [20:52:38<55:32, 2.52s/it] +2025-02-06 07:00:20 - ERROR - stderr - 94%|█████████▍| 21113/22434 [20:52:40<55:02, 2.50s/it] 
+2025-02-06 07:00:20 - ERROR - stderr - +2025-02-06 07:00:20 - ERROR - stderr - +2025-02-06 07:00:20 - INFO - stdout - {'loss': 0.3905, 'grad_norm': 1.6089023351669312, 'learning_rate': 1.8131794789414513e-07, 'epoch': 2.82} +2025-02-06 07:00:20 - ERROR - stderr - 94%|█████████▍| 21113/22434 [20:52:40<55:02, 2.50s/it] +2025-02-06 07:00:23 - ERROR - stderr - 94%|█████████▍| 21114/22434 [20:52:43<55:07, 2.51s/it] +2025-02-06 07:00:23 - ERROR - stderr - +2025-02-06 07:00:23 - ERROR - stderr - +2025-02-06 07:00:23 - INFO - stdout - {'loss': 0.3368, 'grad_norm': 1.4124891757965088, 'learning_rate': 1.8104436663153757e-07, 'epoch': 2.82} +2025-02-06 07:00:23 - ERROR - stderr - 94%|█████████▍| 21114/22434 [20:52:43<55:07, 2.51s/it] +2025-02-06 07:00:25 - ERROR - stderr - 94%|█████████▍| 21115/22434 [20:52:45<55:56, 2.54s/it] +2025-02-06 07:00:25 - ERROR - stderr - +2025-02-06 07:00:25 - ERROR - stderr - +2025-02-06 07:00:25 - INFO - stdout - {'loss': 0.3803, 'grad_norm': 1.6612999439239502, 'learning_rate': 1.807709900356658e-07, 'epoch': 2.82} +2025-02-06 07:00:25 - ERROR - stderr - 94%|█████████▍| 21115/22434 [20:52:45<55:56, 2.54s/it] +2025-02-06 07:00:28 - ERROR - stderr - 94%|█████████▍| 21116/22434 [20:52:48<55:09, 2.51s/it] +2025-02-06 07:00:28 - ERROR - stderr - +2025-02-06 07:00:28 - ERROR - stderr - +2025-02-06 07:00:28 - INFO - stdout - {'loss': 0.3455, 'grad_norm': 1.6417137384414673, 'learning_rate': 1.8049781811222523e-07, 'epoch': 2.82} +2025-02-06 07:00:28 - ERROR - stderr - 94%|█████████▍| 21116/22434 [20:52:48<55:09, 2.51s/it] +2025-02-06 07:00:30 - ERROR - stderr - 94%|█████████▍| 21117/22434 [20:52:50<55:20, 2.52s/it] +2025-02-06 07:00:30 - ERROR - stderr - +2025-02-06 07:00:30 - ERROR - stderr - +2025-02-06 07:00:30 - INFO - stdout - {'loss': 0.3338, 'grad_norm': 1.4563477039337158, 'learning_rate': 1.8022485086691355e-07, 'epoch': 2.82} +2025-02-06 07:00:30 - ERROR - stderr - 94%|█████████▍| 21117/22434 [20:52:50<55:20, 2.52s/it] +2025-02-06 07:00:33 
- ERROR - stderr - 94%|█████████▍| 21118/22434 [20:52:53<57:08, 2.61s/it] +2025-02-06 07:00:33 - ERROR - stderr - +2025-02-06 07:00:33 - ERROR - stderr - +2025-02-06 07:00:33 - INFO - stdout - {'loss': 0.3516, 'grad_norm': 1.511047601699829, 'learning_rate': 1.7995208830541512e-07, 'epoch': 2.82} +2025-02-06 07:00:33 - ERROR - stderr - 94%|█████████▍| 21118/22434 [20:52:53<57:08, 2.61s/it] +2025-02-06 07:00:36 - ERROR - stderr - 94%|█████████▍| 21119/22434 [20:52:56<57:54, 2.64s/it] +2025-02-06 07:00:36 - ERROR - stderr - +2025-02-06 07:00:36 - ERROR - stderr - +2025-02-06 07:00:36 - INFO - stdout - {'loss': 0.3766, 'grad_norm': 1.490206003189087, 'learning_rate': 1.7967953043342202e-07, 'epoch': 2.82} +2025-02-06 07:00:36 - ERROR - stderr - 94%|█████████▍| 21119/22434 [20:52:56<57:54, 2.64s/it] +2025-02-06 07:00:38 - ERROR - stderr - 94%|█████████▍| 21120/22434 [20:52:58<57:19, 2.62s/it] +2025-02-06 07:00:39 - ERROR - stderr - +2025-02-06 07:00:39 - ERROR - stderr - +2025-02-06 07:00:39 - INFO - stdout - {'loss': 0.3686, 'grad_norm': 1.4757161140441895, 'learning_rate': 1.7940717725661082e-07, 'epoch': 2.82} +2025-02-06 07:00:39 - ERROR - stderr - 94%|█████████▍| 21120/22434 [20:52:58<57:19, 2.62s/it] +2025-02-06 07:00:41 - ERROR - stderr - 94%|█████████▍| 21121/22434 [20:53:01<56:23, 2.58s/it] +2025-02-06 07:00:41 - ERROR - stderr - +2025-02-06 07:00:41 - ERROR - stderr - +2025-02-06 07:00:41 - INFO - stdout - {'loss': 0.4394, 'grad_norm': 1.8071578741073608, 'learning_rate': 1.7913502878065814e-07, 'epoch': 2.82} +2025-02-06 07:00:41 - ERROR - stderr - 94%|█████████▍| 21121/22434 [20:53:01<56:23, 2.58s/it] +2025-02-06 07:00:44 - ERROR - stderr - 94%|█████████▍| 21122/22434 [20:53:03<56:35, 2.59s/it] +2025-02-06 07:00:44 - ERROR - stderr - +2025-02-06 07:00:44 - ERROR - stderr - +2025-02-06 07:00:44 - INFO - stdout - {'loss': 0.339, 'grad_norm': 1.5498602390289307, 'learning_rate': 1.788630850112405e-07, 'epoch': 2.82} +2025-02-06 07:00:44 - ERROR - stderr - 
94%|█████████▍| 21122/22434 [20:53:03<56:35, 2.59s/it] +2025-02-06 07:00:46 - ERROR - stderr - 94%|█████████▍| 21123/22434 [20:53:06<55:33, 2.54s/it] +2025-02-06 07:00:46 - ERROR - stderr - +2025-02-06 07:00:46 - ERROR - stderr - +2025-02-06 07:00:46 - INFO - stdout - {'loss': 0.3134, 'grad_norm': 1.786620855331421, 'learning_rate': 1.785913459540234e-07, 'epoch': 2.82} +2025-02-06 07:00:46 - ERROR - stderr - 94%|█████████▍| 21123/22434 [20:53:06<55:33, 2.54s/it] +2025-02-06 07:00:49 - ERROR - stderr - 94%|█████████▍| 21124/22434 [20:53:08<55:17, 2.53s/it] +2025-02-06 07:00:49 - ERROR - stderr - +2025-02-06 07:00:49 - ERROR - stderr - +2025-02-06 07:00:49 - INFO - stdout - {'loss': 0.3577, 'grad_norm': 1.5780508518218994, 'learning_rate': 1.7831981161467116e-07, 'epoch': 2.82} +2025-02-06 07:00:49 - ERROR - stderr - 94%|█████████▍| 21124/22434 [20:53:08<55:17, 2.53s/it] +2025-02-06 07:00:51 - ERROR - stderr - 94%|█████████▍| 21125/22434 [20:53:11<56:26, 2.59s/it] +2025-02-06 07:00:51 - ERROR - stderr - +2025-02-06 07:00:51 - ERROR - stderr - +2025-02-06 07:00:51 - INFO - stdout - {'loss': 0.318, 'grad_norm': 1.4263684749603271, 'learning_rate': 1.7804848199884373e-07, 'epoch': 2.82} +2025-02-06 07:00:51 - ERROR - stderr - 94%|█████████▍| 21125/22434 [20:53:11<56:26, 2.59s/it] +2025-02-06 07:00:54 - ERROR - stderr - 94%|█████████▍| 21126/22434 [20:53:14<56:49, 2.61s/it] +2025-02-06 07:00:54 - ERROR - stderr - +2025-02-06 07:00:54 - ERROR - stderr - +2025-02-06 07:00:54 - INFO - stdout - {'loss': 0.3226, 'grad_norm': 1.4885960817337036, 'learning_rate': 1.7777735711219768e-07, 'epoch': 2.83} +2025-02-06 07:00:54 - ERROR - stderr - 94%|█████████▍| 21126/22434 [20:53:14<56:49, 2.61s/it] +2025-02-06 07:00:56 - ERROR - stderr - 94%|█████████▍| 21127/22434 [20:53:16<56:16, 2.58s/it] +2025-02-06 07:00:56 - ERROR - stderr - +2025-02-06 07:00:56 - ERROR - stderr - +2025-02-06 07:00:56 - INFO - stdout - {'loss': 0.3621, 'grad_norm': 1.5297586917877197, 'learning_rate': 
1.7750643696038406e-07, 'epoch': 2.83} +2025-02-06 07:00:56 - ERROR - stderr - 94%|█████████▍| 21127/22434 [20:53:16<56:16, 2.58s/it] +2025-02-06 07:00:59 - ERROR - stderr - 94%|█████████▍| 21128/22434 [20:53:19<56:51, 2.61s/it] +2025-02-06 07:00:59 - ERROR - stderr - +2025-02-06 07:00:59 - ERROR - stderr - +2025-02-06 07:00:59 - INFO - stdout - {'loss': 0.326, 'grad_norm': 1.533209204673767, 'learning_rate': 1.7723572154904944e-07, 'epoch': 2.83} +2025-02-06 07:00:59 - ERROR - stderr - 94%|█████████▍| 21128/22434 [20:53:19<56:51, 2.61s/it] +2025-02-06 07:01:02 - ERROR - stderr - 94%|█████████▍| 21129/22434 [20:53:21<56:37, 2.60s/it] +2025-02-06 07:01:02 - ERROR - stderr - +2025-02-06 07:01:02 - ERROR - stderr - +2025-02-06 07:01:02 - INFO - stdout - {'loss': 0.3862, 'grad_norm': 1.5913262367248535, 'learning_rate': 1.76965210883836e-07, 'epoch': 2.83} +2025-02-06 07:01:02 - ERROR - stderr - 94%|█████████▍| 21129/22434 [20:53:21<56:37, 2.60s/it] +2025-02-06 07:01:04 - ERROR - stderr - 94%|█████████▍| 21130/22434 [20:53:24<56:55, 2.62s/it] +2025-02-06 07:01:04 - ERROR - stderr - +2025-02-06 07:01:04 - ERROR - stderr - +2025-02-06 07:01:04 - INFO - stdout - {'loss': 0.3341, 'grad_norm': 1.6883372068405151, 'learning_rate': 1.7669490497038366e-07, 'epoch': 2.83} +2025-02-06 07:01:04 - ERROR - stderr - 94%|█████████▍| 21130/22434 [20:53:24<56:55, 2.62s/it] +2025-02-06 07:01:07 - ERROR - stderr - 94%|█████████▍| 21131/22434 [20:53:27<56:20, 2.59s/it] +2025-02-06 07:01:07 - ERROR - stderr - +2025-02-06 07:01:07 - ERROR - stderr - +2025-02-06 07:01:07 - INFO - stdout - {'loss': 0.353, 'grad_norm': 1.7060953378677368, 'learning_rate': 1.764248038143268e-07, 'epoch': 2.83} +2025-02-06 07:01:07 - ERROR - stderr - 94%|█████████▍| 21131/22434 [20:53:27<56:20, 2.59s/it] +2025-02-06 07:01:09 - ERROR - stderr - 94%|█████████▍| 21132/22434 [20:53:29<55:44, 2.57s/it] +2025-02-06 07:01:09 - ERROR - stderr - +2025-02-06 07:01:09 - ERROR - stderr - +2025-02-06 07:01:09 - INFO - stdout 
- {'loss': 0.3572, 'grad_norm': 1.5996332168579102, 'learning_rate': 1.7615490742129427e-07, 'epoch': 2.83} +2025-02-06 07:01:09 - ERROR - stderr - 94%|█████████▍| 21132/22434 [20:53:29<55:44, 2.57s/it] +2025-02-06 07:01:12 - ERROR - stderr - 94%|█████████▍| 21133/22434 [20:53:32<55:37, 2.57s/it] +2025-02-06 07:01:12 - ERROR - stderr - +2025-02-06 07:01:12 - ERROR - stderr - +2025-02-06 07:01:12 - INFO - stdout - {'loss': 0.39, 'grad_norm': 1.8237059116363525, 'learning_rate': 1.7588521579691263e-07, 'epoch': 2.83} +2025-02-06 07:01:12 - ERROR - stderr - 94%|█████████▍| 21133/22434 [20:53:32<55:37, 2.57s/it] +2025-02-06 07:01:15 - ERROR - stderr - 94%|█████████▍| 21134/22434 [20:53:34<55:41, 2.57s/it] +2025-02-06 07:01:15 - ERROR - stderr - +2025-02-06 07:01:15 - ERROR - stderr - +2025-02-06 07:01:15 - INFO - stdout - {'loss': 0.3687, 'grad_norm': 1.5537697076797485, 'learning_rate': 1.756157289468019e-07, 'epoch': 2.83} +2025-02-06 07:01:15 - ERROR - stderr - 94%|█████████▍| 21134/22434 [20:53:34<55:41, 2.57s/it] +2025-02-06 07:01:17 - ERROR - stderr - 94%|█████████▍| 21135/22434 [20:53:37<55:16, 2.55s/it] +2025-02-06 07:01:17 - ERROR - stderr - +2025-02-06 07:01:17 - ERROR - stderr - +2025-02-06 07:01:17 - INFO - stdout - {'loss': 0.3325, 'grad_norm': 1.3952116966247559, 'learning_rate': 1.7534644687658197e-07, 'epoch': 2.83} +2025-02-06 07:01:17 - ERROR - stderr - 94%|█████████▍| 21135/22434 [20:53:37<55:16, 2.55s/it] +2025-02-06 07:01:20 - ERROR - stderr - 94%|█████████▍| 21136/22434 [20:53:39<55:02, 2.54s/it] +2025-02-06 07:01:20 - ERROR - stderr - +2025-02-06 07:01:20 - ERROR - stderr - +2025-02-06 07:01:20 - INFO - stdout - {'loss': 0.3825, 'grad_norm': 1.5767836570739746, 'learning_rate': 1.7507736959186394e-07, 'epoch': 2.83} +2025-02-06 07:01:20 - ERROR - stderr - 94%|█████████▍| 21136/22434 [20:53:39<55:02, 2.54s/it] +2025-02-06 07:01:22 - ERROR - stderr - 94%|█████████▍| 21137/22434 [20:53:42<54:32, 2.52s/it] +2025-02-06 07:01:22 - ERROR - stderr - 
+2025-02-06 07:01:22 - ERROR - stderr - +2025-02-06 07:01:22 - INFO - stdout - {'loss': 0.3437, 'grad_norm': 1.3958687782287598, 'learning_rate': 1.7480849709825555e-07, 'epoch': 2.83} +2025-02-06 07:01:22 - ERROR - stderr - 94%|█████████▍| 21137/22434 [20:53:42<54:32, 2.52s/it] +2025-02-06 07:01:25 - ERROR - stderr - 94%|█████████▍| 21138/22434 [20:53:45<55:51, 2.59s/it] +2025-02-06 07:01:25 - ERROR - stderr - +2025-02-06 07:01:25 - ERROR - stderr - +2025-02-06 07:01:25 - INFO - stdout - {'loss': 0.3778, 'grad_norm': 1.5096261501312256, 'learning_rate': 1.7453982940136337e-07, 'epoch': 2.83} +2025-02-06 07:01:25 - ERROR - stderr - 94%|█████████▍| 21138/22434 [20:53:45<55:51, 2.59s/it] +2025-02-06 07:01:27 - ERROR - stderr - 94%|█████████▍| 21139/22434 [20:53:47<54:52, 2.54s/it] +2025-02-06 07:01:27 - ERROR - stderr - +2025-02-06 07:01:27 - ERROR - stderr - +2025-02-06 07:01:27 - INFO - stdout - {'loss': 0.4306, 'grad_norm': 1.6935224533081055, 'learning_rate': 1.7427136650678634e-07, 'epoch': 2.83} +2025-02-06 07:01:27 - ERROR - stderr - 94%|█████████▍| 21139/22434 [20:53:47<54:52, 2.54s/it] +2025-02-06 07:01:30 - ERROR - stderr - 94%|█████████▍| 21140/22434 [20:53:49<54:36, 2.53s/it] +2025-02-06 07:01:30 - ERROR - stderr - +2025-02-06 07:01:30 - ERROR - stderr - +2025-02-06 07:01:30 - INFO - stdout - {'loss': 0.361, 'grad_norm': 1.3541380167007446, 'learning_rate': 1.740031084201188e-07, 'epoch': 2.83} +2025-02-06 07:01:30 - ERROR - stderr - 94%|█████████▍| 21140/22434 [20:53:50<54:36, 2.53s/it] +2025-02-06 07:01:32 - ERROR - stderr - 94%|█████████▍| 21141/22434 [20:53:52<54:17, 2.52s/it] +2025-02-06 07:01:32 - ERROR - stderr - +2025-02-06 07:01:32 - ERROR - stderr - +2025-02-06 07:01:32 - INFO - stdout - {'loss': 0.3643, 'grad_norm': 1.4912734031677246, 'learning_rate': 1.7373505514695633e-07, 'epoch': 2.83} +2025-02-06 07:01:32 - ERROR - stderr - 94%|█████████▍| 21141/22434 [20:53:52<54:17, 2.52s/it] +2025-02-06 07:01:35 - ERROR - stderr - 94%|█████████▍| 
21142/22434 [20:53:54<54:07, 2.51s/it] +2025-02-06 07:01:35 - ERROR - stderr - +2025-02-06 07:01:35 - ERROR - stderr - +2025-02-06 07:01:35 - INFO - stdout - {'loss': 0.3592, 'grad_norm': 1.6090102195739746, 'learning_rate': 1.734672066928822e-07, 'epoch': 2.83} +2025-02-06 07:01:35 - ERROR - stderr - 94%|█████████▍| 21142/22434 [20:53:55<54:07, 2.51s/it] +2025-02-06 07:01:38 - ERROR - stderr - 94%|█████████▍| 21143/22434 [20:53:57<57:03, 2.65s/it] +2025-02-06 07:01:38 - ERROR - stderr - +2025-02-06 07:01:38 - ERROR - stderr - +2025-02-06 07:01:38 - INFO - stdout - {'loss': 0.3366, 'grad_norm': 1.589031457901001, 'learning_rate': 1.7319956306348307e-07, 'epoch': 2.83} +2025-02-06 07:01:38 - ERROR - stderr - 94%|█████████▍| 21143/22434 [20:53:57<57:03, 2.65s/it] +2025-02-06 07:01:40 - ERROR - stderr - 94%|█████████▍| 21144/22434 [20:54:00<56:15, 2.62s/it] +2025-02-06 07:01:40 - ERROR - stderr - +2025-02-06 07:01:40 - ERROR - stderr - +2025-02-06 07:01:40 - INFO - stdout - {'loss': 0.3407, 'grad_norm': 1.587558388710022, 'learning_rate': 1.7293212426433447e-07, 'epoch': 2.83} +2025-02-06 07:01:40 - ERROR - stderr - 94%|█████████▍| 21144/22434 [20:54:00<56:15, 2.62s/it] +2025-02-06 07:01:43 - ERROR - stderr - 94%|█████████▍| 21145/22434 [20:54:02<55:24, 2.58s/it] +2025-02-06 07:01:43 - ERROR - stderr - +2025-02-06 07:01:43 - ERROR - stderr - +2025-02-06 07:01:43 - INFO - stdout - {'loss': 0.3589, 'grad_norm': 1.486051082611084, 'learning_rate': 1.7266489030101308e-07, 'epoch': 2.83} +2025-02-06 07:01:43 - ERROR - stderr - 94%|█████████▍| 21145/22434 [20:54:03<55:24, 2.58s/it] +2025-02-06 07:01:45 - ERROR - stderr - 94%|█████████▍| 21146/22434 [20:54:05<55:04, 2.57s/it] +2025-02-06 07:01:45 - ERROR - stderr - +2025-02-06 07:01:45 - ERROR - stderr - +2025-02-06 07:01:45 - INFO - stdout - {'loss': 0.2863, 'grad_norm': 1.3847711086273193, 'learning_rate': 1.7239786117908776e-07, 'epoch': 2.83} +2025-02-06 07:01:45 - ERROR - stderr - 94%|█████████▍| 21146/22434 
[20:54:05<55:04, 2.57s/it] +2025-02-06 07:01:48 - ERROR - stderr - 94%|█████████▍| 21147/22434 [20:54:07<54:15, 2.53s/it] +2025-02-06 07:01:48 - ERROR - stderr - +2025-02-06 07:01:48 - ERROR - stderr - +2025-02-06 07:01:48 - INFO - stdout - {'loss': 0.3225, 'grad_norm': 1.4588078260421753, 'learning_rate': 1.7213103690412402e-07, 'epoch': 2.83} +2025-02-06 07:01:48 - ERROR - stderr - 94%|█████████▍| 21147/22434 [20:54:07<54:15, 2.53s/it] +2025-02-06 07:01:50 - ERROR - stderr - 94%|█████████▍| 21148/22434 [20:54:10<53:51, 2.51s/it] +2025-02-06 07:01:50 - ERROR - stderr - +2025-02-06 07:01:50 - ERROR - stderr - +2025-02-06 07:01:50 - INFO - stdout - {'loss': 0.3262, 'grad_norm': 1.5200961828231812, 'learning_rate': 1.7186441748168637e-07, 'epoch': 2.83} +2025-02-06 07:01:50 - ERROR - stderr - 94%|█████████▍| 21148/22434 [20:54:10<53:51, 2.51s/it] +2025-02-06 07:01:53 - ERROR - stderr - 94%|█████████▍| 21149/22434 [20:54:13<54:22, 2.54s/it] +2025-02-06 07:01:53 - ERROR - stderr - +2025-02-06 07:01:53 - ERROR - stderr - +2025-02-06 07:01:53 - INFO - stdout - {'loss': 0.376, 'grad_norm': 1.5430275201797485, 'learning_rate': 1.715980029173292e-07, 'epoch': 2.83} +2025-02-06 07:01:53 - ERROR - stderr - 94%|█████████▍| 21149/22434 [20:54:13<54:22, 2.54s/it] +2025-02-06 07:01:55 - ERROR - stderr - 94%|█████████▍| 21150/22434 [20:54:15<54:57, 2.57s/it] +2025-02-06 07:01:55 - ERROR - stderr - +2025-02-06 07:01:55 - ERROR - stderr - +2025-02-06 07:01:55 - INFO - stdout - {'loss': 0.3721, 'grad_norm': 1.5299605131149292, 'learning_rate': 1.7133179321660698e-07, 'epoch': 2.83} +2025-02-06 07:01:55 - ERROR - stderr - 94%|█████████▍| 21150/22434 [20:54:15<54:57, 2.57s/it] +2025-02-06 07:01:58 - ERROR - stderr - 94%|█████████▍| 21151/22434 [20:54:18<54:53, 2.57s/it] +2025-02-06 07:01:58 - ERROR - stderr - +2025-02-06 07:01:58 - ERROR - stderr - +2025-02-06 07:01:58 - INFO - stdout - {'loss': 0.3143, 'grad_norm': 1.5517380237579346, 'learning_rate': 1.710657883850697e-07, 'epoch': 
2.83} +2025-02-06 07:01:58 - ERROR - stderr - 94%|█████████▍| 21151/22434 [20:54:18<54:53, 2.57s/it] +2025-02-06 07:02:00 - ERROR - stderr - 94%|█████████▍| 21152/22434 [20:54:20<54:28, 2.55s/it] +2025-02-06 07:02:01 - ERROR - stderr - +2025-02-06 07:02:01 - ERROR - stderr - +2025-02-06 07:02:01 - INFO - stdout - {'loss': 0.3408, 'grad_norm': 1.376177191734314, 'learning_rate': 1.7079998842825962e-07, 'epoch': 2.83} +2025-02-06 07:02:01 - ERROR - stderr - 94%|█████████▍| 21152/22434 [20:54:20<54:28, 2.55s/it] +2025-02-06 07:02:03 - ERROR - stderr - 94%|█████████▍| 21153/22434 [20:54:23<53:56, 2.53s/it] +2025-02-06 07:02:03 - ERROR - stderr - +2025-02-06 07:02:03 - ERROR - stderr - +2025-02-06 07:02:03 - INFO - stdout - {'loss': 0.3107, 'grad_norm': 1.5396382808685303, 'learning_rate': 1.7053439335171895e-07, 'epoch': 2.83} +2025-02-06 07:02:03 - ERROR - stderr - 94%|█████████▍| 21153/22434 [20:54:23<53:56, 2.53s/it] +2025-02-06 07:02:05 - ERROR - stderr - 94%|█████████▍| 21154/22434 [20:54:25<53:32, 2.51s/it] +2025-02-06 07:02:05 - ERROR - stderr - +2025-02-06 07:02:05 - ERROR - stderr - +2025-02-06 07:02:05 - INFO - stdout - {'loss': 0.4055, 'grad_norm': 1.5965592861175537, 'learning_rate': 1.7026900316098217e-07, 'epoch': 2.83} +2025-02-06 07:02:05 - ERROR - stderr - 94%|█████████▍| 21154/22434 [20:54:25<53:32, 2.51s/it] +2025-02-06 07:02:08 - ERROR - stderr - 94%|█████████▍| 21155/22434 [20:54:28<53:34, 2.51s/it] +2025-02-06 07:02:08 - ERROR - stderr - +2025-02-06 07:02:08 - ERROR - stderr - +2025-02-06 07:02:08 - INFO - stdout - {'loss': 0.4026, 'grad_norm': 1.5593491792678833, 'learning_rate': 1.7000381786158372e-07, 'epoch': 2.83} +2025-02-06 07:02:08 - ERROR - stderr - 94%|█████████▍| 21155/22434 [20:54:28<53:34, 2.51s/it] +2025-02-06 07:02:11 - ERROR - stderr - 94%|█████████▍| 21156/22434 [20:54:31<55:33, 2.61s/it] +2025-02-06 07:02:11 - ERROR - stderr - +2025-02-06 07:02:11 - ERROR - stderr - +2025-02-06 07:02:11 - INFO - stdout - {'loss': 0.3292, 
'grad_norm': 1.6342612504959106, 'learning_rate': 1.6973883745904696e-07, 'epoch': 2.83} +2025-02-06 07:02:11 - ERROR - stderr - 94%|█████████▍| 21156/22434 [20:54:31<55:33, 2.61s/it] +2025-02-06 07:02:13 - ERROR - stderr - 94%|█████████▍| 21157/22434 [20:54:33<54:59, 2.58s/it] +2025-02-06 07:02:13 - ERROR - stderr - +2025-02-06 07:02:13 - ERROR - stderr - +2025-02-06 07:02:13 - INFO - stdout - {'loss': 0.3605, 'grad_norm': 1.4602092504501343, 'learning_rate': 1.694740619588997e-07, 'epoch': 2.83} +2025-02-06 07:02:13 - ERROR - stderr - 94%|█████████▍| 21157/22434 [20:54:33<54:59, 2.58s/it] +2025-02-06 07:02:16 - ERROR - stderr - 94%|█████████▍| 21158/22434 [20:54:36<54:14, 2.55s/it] +2025-02-06 07:02:16 - ERROR - stderr - +2025-02-06 07:02:16 - ERROR - stderr - +2025-02-06 07:02:16 - INFO - stdout - {'loss': 0.3914, 'grad_norm': 1.6312198638916016, 'learning_rate': 1.6920949136665753e-07, 'epoch': 2.83} +2025-02-06 07:02:16 - ERROR - stderr - 94%|█████████▍| 21158/22434 [20:54:36<54:14, 2.55s/it] +2025-02-06 07:02:18 - ERROR - stderr - 94%|█████████▍| 21159/22434 [20:54:38<54:06, 2.55s/it] +2025-02-06 07:02:18 - ERROR - stderr - +2025-02-06 07:02:18 - ERROR - stderr - +2025-02-06 07:02:18 - INFO - stdout - {'loss': 0.3093, 'grad_norm': 1.3996559381484985, 'learning_rate': 1.6894512568783717e-07, 'epoch': 2.83} +2025-02-06 07:02:18 - ERROR - stderr - 94%|█████████▍| 21159/22434 [20:54:38<54:06, 2.55s/it] +2025-02-06 07:02:21 - ERROR - stderr - 94%|█████████▍| 21160/22434 [20:54:41<54:27, 2.56s/it] +2025-02-06 07:02:21 - ERROR - stderr - +2025-02-06 07:02:21 - ERROR - stderr - +2025-02-06 07:02:21 - INFO - stdout - {'loss': 0.3593, 'grad_norm': 1.5484685897827148, 'learning_rate': 1.686809649279486e-07, 'epoch': 2.83} +2025-02-06 07:02:21 - ERROR - stderr - 94%|█████████▍| 21160/22434 [20:54:41<54:27, 2.56s/it] +2025-02-06 07:02:23 - ERROR - stderr - 94%|█████████▍| 21161/22434 [20:54:43<54:09, 2.55s/it] +2025-02-06 07:02:23 - ERROR - stderr - +2025-02-06 07:02:23 - 
ERROR - stderr - +2025-02-06 07:02:23 - INFO - stdout - {'loss': 0.3737, 'grad_norm': 1.6423790454864502, 'learning_rate': 1.6841700909249637e-07, 'epoch': 2.83} +2025-02-06 07:02:23 - ERROR - stderr - 94%|█████████▍| 21161/22434 [20:54:43<54:09, 2.55s/it] +2025-02-06 07:02:26 - ERROR - stderr - 94%|█████████▍| 21162/22434 [20:54:46<54:04, 2.55s/it] +2025-02-06 07:02:26 - ERROR - stderr - +2025-02-06 07:02:26 - ERROR - stderr - +2025-02-06 07:02:26 - INFO - stdout - {'loss': 0.3686, 'grad_norm': 1.4821759462356567, 'learning_rate': 1.6815325818698493e-07, 'epoch': 2.83} +2025-02-06 07:02:26 - ERROR - stderr - 94%|█████████▍| 21162/22434 [20:54:46<54:04, 2.55s/it] +2025-02-06 07:02:28 - ERROR - stderr - 94%|█████████▍| 21163/22434 [20:54:48<53:41, 2.53s/it] +2025-02-06 07:02:29 - ERROR - stderr - +2025-02-06 07:02:29 - ERROR - stderr - +2025-02-06 07:02:29 - INFO - stdout - {'loss': 0.3932, 'grad_norm': 1.7089571952819824, 'learning_rate': 1.6788971221690986e-07, 'epoch': 2.83} +2025-02-06 07:02:29 - ERROR - stderr - 94%|█████████▍| 21163/22434 [20:54:48<53:41, 2.53s/it] +2025-02-06 07:02:31 - ERROR - stderr - 94%|█████████▍| 21164/22434 [20:54:51<53:27, 2.53s/it] +2025-02-06 07:02:31 - ERROR - stderr - +2025-02-06 07:02:31 - ERROR - stderr - +2025-02-06 07:02:31 - INFO - stdout - {'loss': 0.3965, 'grad_norm': 1.705830454826355, 'learning_rate': 1.6762637118776681e-07, 'epoch': 2.83} +2025-02-06 07:02:31 - ERROR - stderr - 94%|█████████▍| 21164/22434 [20:54:51<53:27, 2.53s/it] +2025-02-06 07:02:33 - ERROR - stderr - 94%|█████████▍| 21165/22434 [20:54:53<53:20, 2.52s/it] +2025-02-06 07:02:34 - ERROR - stderr - +2025-02-06 07:02:34 - ERROR - stderr - +2025-02-06 07:02:34 - INFO - stdout - {'loss': 0.3413, 'grad_norm': 1.579155445098877, 'learning_rate': 1.6736323510504248e-07, 'epoch': 2.83} +2025-02-06 07:02:34 - ERROR - stderr - 94%|█████████▍| 21165/22434 [20:54:53<53:20, 2.52s/it] +2025-02-06 07:02:36 - ERROR - stderr - 94%|█████████▍| 21166/22434 [20:54:56<53:10, 
2.52s/it] +2025-02-06 07:02:36 - ERROR - stderr - +2025-02-06 07:02:36 - ERROR - stderr - +2025-02-06 07:02:36 - INFO - stdout - {'loss': 0.3514, 'grad_norm': 1.4554558992385864, 'learning_rate': 1.671003039742225e-07, 'epoch': 2.83} +2025-02-06 07:02:36 - ERROR - stderr - 94%|█████████▍| 21166/22434 [20:54:56<53:10, 2.52s/it] +2025-02-06 07:02:39 - ERROR - stderr - 94%|█████████▍| 21167/22434 [20:54:58<53:14, 2.52s/it] +2025-02-06 07:02:39 - ERROR - stderr - +2025-02-06 07:02:39 - ERROR - stderr - +2025-02-06 07:02:39 - INFO - stdout - {'loss': 0.3506, 'grad_norm': 1.4563628435134888, 'learning_rate': 1.6683757780078913e-07, 'epoch': 2.83} +2025-02-06 07:02:39 - ERROR - stderr - 94%|█████████▍| 21167/22434 [20:54:58<53:14, 2.52s/it] +2025-02-06 07:02:41 - ERROR - stderr - 94%|█████████▍| 21168/22434 [20:55:01<52:56, 2.51s/it] +2025-02-06 07:02:41 - ERROR - stderr - +2025-02-06 07:02:41 - ERROR - stderr - +2025-02-06 07:02:41 - INFO - stdout - {'loss': 0.3662, 'grad_norm': 1.699845314025879, 'learning_rate': 1.6657505659021577e-07, 'epoch': 2.83} +2025-02-06 07:02:41 - ERROR - stderr - 94%|█████████▍| 21168/22434 [20:55:01<52:56, 2.51s/it] +2025-02-06 07:02:43 - ERROR - stderr - 94%|█████████▍| 21169/22434 [20:55:03<52:44, 2.50s/it] +2025-02-06 07:02:44 - ERROR - stderr - +2025-02-06 07:02:44 - ERROR - stderr - +2025-02-06 07:02:44 - INFO - stdout - {'loss': 0.3285, 'grad_norm': 1.5582879781723022, 'learning_rate': 1.6631274034797696e-07, 'epoch': 2.83} +2025-02-06 07:02:44 - ERROR - stderr - 94%|█████████▍| 21169/22434 [20:55:03<52:44, 2.50s/it] +2025-02-06 07:02:46 - ERROR - stderr - 94%|█████████▍| 21170/22434 [20:55:06<52:18, 2.48s/it] +2025-02-06 07:02:46 - ERROR - stderr - +2025-02-06 07:02:46 - ERROR - stderr - +2025-02-06 07:02:46 - INFO - stdout - {'loss': 0.3835, 'grad_norm': 1.5410879850387573, 'learning_rate': 1.6605062907953829e-07, 'epoch': 2.83} +2025-02-06 07:02:46 - ERROR - stderr - 94%|█████████▍| 21170/22434 [20:55:06<52:18, 2.48s/it] +2025-02-06 
07:02:48 - ERROR - stderr - 94%|█████████▍| 21171/22434 [20:55:08<52:39, 2.50s/it] +2025-02-06 07:02:49 - ERROR - stderr - +2025-02-06 07:02:49 - ERROR - stderr - +2025-02-06 07:02:49 - INFO - stdout - {'loss': 0.3622, 'grad_norm': 1.4152581691741943, 'learning_rate': 1.657887227903643e-07, 'epoch': 2.83} +2025-02-06 07:02:49 - ERROR - stderr - 94%|█████████▍| 21171/22434 [20:55:08<52:39, 2.50s/it] +2025-02-06 07:02:51 - ERROR - stderr - 94%|█████████▍| 21172/22434 [20:55:11<53:19, 2.54s/it] +2025-02-06 07:02:51 - ERROR - stderr - +2025-02-06 07:02:51 - ERROR - stderr - +2025-02-06 07:02:51 - INFO - stdout - {'loss': 0.3337, 'grad_norm': 1.479344129562378, 'learning_rate': 1.6552702148591392e-07, 'epoch': 2.83} +2025-02-06 07:02:51 - ERROR - stderr - 94%|█████████▍| 21172/22434 [20:55:11<53:19, 2.54s/it] +2025-02-06 07:02:54 - ERROR - stderr - 94%|█████████▍| 21173/22434 [20:55:14<55:49, 2.66s/it] +2025-02-06 07:02:54 - ERROR - stderr - +2025-02-06 07:02:54 - ERROR - stderr - +2025-02-06 07:02:54 - INFO - stdout - {'loss': 0.3498, 'grad_norm': 1.8182692527770996, 'learning_rate': 1.6526552517164174e-07, 'epoch': 2.83} +2025-02-06 07:02:54 - ERROR - stderr - 94%|█████████▍| 21173/22434 [20:55:14<55:49, 2.66s/it] +2025-02-06 07:02:57 - ERROR - stderr - 94%|█████████▍| 21174/22434 [20:55:16<55:38, 2.65s/it] +2025-02-06 07:02:57 - ERROR - stderr - +2025-02-06 07:02:57 - ERROR - stderr - +2025-02-06 07:02:57 - INFO - stdout - {'loss': 0.3687, 'grad_norm': 1.6860917806625366, 'learning_rate': 1.6500423385300001e-07, 'epoch': 2.83} +2025-02-06 07:02:57 - ERROR - stderr - 94%|█████████▍| 21174/22434 [20:55:16<55:38, 2.65s/it] +2025-02-06 07:02:59 - ERROR - stderr - 94%|█████████▍| 21175/22434 [20:55:19<54:53, 2.62s/it] +2025-02-06 07:02:59 - ERROR - stderr - +2025-02-06 07:02:59 - ERROR - stderr - +2025-02-06 07:02:59 - INFO - stdout - {'loss': 0.3777, 'grad_norm': 1.6268608570098877, 'learning_rate': 1.647431475354333e-07, 'epoch': 2.83} +2025-02-06 07:02:59 - ERROR - 
stderr - 94%|█████████▍| 21175/22434 [20:55:19<54:53, 2.62s/it] +2025-02-06 07:03:02 - ERROR - stderr - 94%|█████████▍| 21176/22434 [20:55:21<53:58, 2.57s/it] +2025-02-06 07:03:02 - ERROR - stderr - +2025-02-06 07:03:02 - ERROR - stderr - +2025-02-06 07:03:02 - INFO - stdout - {'loss': 0.3053, 'grad_norm': 1.4623850584030151, 'learning_rate': 1.6448226622438503e-07, 'epoch': 2.83} +2025-02-06 07:03:02 - ERROR - stderr - 94%|█████████▍| 21176/22434 [20:55:21<53:58, 2.57s/it] +2025-02-06 07:03:04 - ERROR - stderr - 94%|█████████▍| 21177/22434 [20:55:24<53:47, 2.57s/it] +2025-02-06 07:03:04 - ERROR - stderr - +2025-02-06 07:03:04 - ERROR - stderr - +2025-02-06 07:03:04 - INFO - stdout - {'loss': 0.4221, 'grad_norm': 1.672318935394287, 'learning_rate': 1.6422158992529082e-07, 'epoch': 2.83} +2025-02-06 07:03:04 - ERROR - stderr - 94%|█████████▍| 21177/22434 [20:55:24<53:47, 2.57s/it] +2025-02-06 07:03:07 - ERROR - stderr - 94%|█████████▍| 21178/22434 [20:55:27<54:57, 2.63s/it] +2025-02-06 07:03:07 - ERROR - stderr - +2025-02-06 07:03:07 - ERROR - stderr - +2025-02-06 07:03:07 - INFO - stdout - {'loss': 0.3447, 'grad_norm': 1.3909658193588257, 'learning_rate': 1.6396111864358744e-07, 'epoch': 2.83} +2025-02-06 07:03:07 - ERROR - stderr - 94%|█████████▍| 21178/22434 [20:55:27<54:57, 2.63s/it] +2025-02-06 07:03:10 - ERROR - stderr - 94%|█████████▍| 21179/22434 [20:55:29<54:26, 2.60s/it] +2025-02-06 07:03:10 - ERROR - stderr - +2025-02-06 07:03:10 - ERROR - stderr - +2025-02-06 07:03:10 - INFO - stdout - {'loss': 0.321, 'grad_norm': 1.2756800651550293, 'learning_rate': 1.6370085238470168e-07, 'epoch': 2.83} +2025-02-06 07:03:10 - ERROR - stderr - 94%|█████████▍| 21179/22434 [20:55:29<54:26, 2.60s/it] +2025-02-06 07:03:12 - ERROR - stderr - 94%|█████████▍| 21180/22434 [20:55:32<53:45, 2.57s/it] +2025-02-06 07:03:12 - ERROR - stderr - +2025-02-06 07:03:12 - ERROR - stderr - +2025-02-06 07:03:12 - INFO - stdout - {'loss': 0.375, 'grad_norm': 1.5968912839889526, 
'learning_rate': 1.634407911540592e-07, 'epoch': 2.83} +2025-02-06 07:03:12 - ERROR - stderr - 94%|█████████▍| 21180/22434 [20:55:32<53:45, 2.57s/it] +2025-02-06 07:03:15 - ERROR - stderr - 94%|█████████▍| 21181/22434 [20:55:34<53:03, 2.54s/it] +2025-02-06 07:03:15 - ERROR - stderr - +2025-02-06 07:03:15 - ERROR - stderr - +2025-02-06 07:03:15 - INFO - stdout - {'loss': 0.373, 'grad_norm': 1.7612203359603882, 'learning_rate': 1.631809349570823e-07, 'epoch': 2.83} +2025-02-06 07:03:15 - ERROR - stderr - 94%|█████████▍| 21181/22434 [20:55:34<53:03, 2.54s/it] +2025-02-06 07:03:17 - ERROR - stderr - 94%|█████████▍| 21182/22434 [20:55:37<52:34, 2.52s/it] +2025-02-06 07:03:17 - ERROR - stderr - +2025-02-06 07:03:17 - ERROR - stderr - +2025-02-06 07:03:17 - INFO - stdout - {'loss': 0.3915, 'grad_norm': 1.662456750869751, 'learning_rate': 1.6292128379918337e-07, 'epoch': 2.83} +2025-02-06 07:03:17 - ERROR - stderr - 94%|█████████▍| 21182/22434 [20:55:37<52:34, 2.52s/it] +2025-02-06 07:03:20 - ERROR - stderr - 94%|█████████▍| 21183/22434 [20:55:39<52:46, 2.53s/it] +2025-02-06 07:03:20 - ERROR - stderr - +2025-02-06 07:03:20 - ERROR - stderr - +2025-02-06 07:03:20 - INFO - stdout - {'loss': 0.3453, 'grad_norm': 1.5555535554885864, 'learning_rate': 1.6266183768578026e-07, 'epoch': 2.83} +2025-02-06 07:03:20 - ERROR - stderr - 94%|█████████▍| 21183/22434 [20:55:39<52:46, 2.53s/it] +2025-02-06 07:03:22 - ERROR - stderr - 94%|█████████▍| 21184/22434 [20:55:42<52:27, 2.52s/it] +2025-02-06 07:03:22 - ERROR - stderr - +2025-02-06 07:03:22 - ERROR - stderr - +2025-02-06 07:03:22 - INFO - stdout - {'loss': 0.3646, 'grad_norm': 1.5987141132354736, 'learning_rate': 1.6240259662227531e-07, 'epoch': 2.83} +2025-02-06 07:03:22 - ERROR - stderr - 94%|█████████▍| 21184/22434 [20:55:42<52:27, 2.52s/it] +2025-02-06 07:03:25 - ERROR - stderr - 94%|█████████▍| 21185/22434 [20:55:44<52:58, 2.55s/it] +2025-02-06 07:03:25 - ERROR - stderr - +2025-02-06 07:03:25 - ERROR - stderr - +2025-02-06 
07:03:25 - INFO - stdout - {'loss': 0.3571, 'grad_norm': 1.6038262844085693, 'learning_rate': 1.6214356061407532e-07, 'epoch': 2.83} +2025-02-06 07:03:25 - ERROR - stderr - 94%|█████████▍| 21185/22434 [20:55:44<52:58, 2.55s/it] +2025-02-06 07:03:27 - ERROR - stderr - 94%|█████████▍| 21186/22434 [20:55:47<52:54, 2.54s/it] +2025-02-06 07:03:27 - ERROR - stderr - +2025-02-06 07:03:27 - ERROR - stderr - +2025-02-06 07:03:27 - INFO - stdout - {'loss': 0.3473, 'grad_norm': 1.4723541736602783, 'learning_rate': 1.6188472966658043e-07, 'epoch': 2.83} +2025-02-06 07:03:27 - ERROR - stderr - 94%|█████████▍| 21186/22434 [20:55:47<52:54, 2.54s/it] +2025-02-06 07:03:30 - ERROR - stderr - 94%|█████████▍| 21187/22434 [20:55:49<52:46, 2.54s/it] +2025-02-06 07:03:30 - ERROR - stderr - +2025-02-06 07:03:30 - ERROR - stderr - +2025-02-06 07:03:30 - INFO - stdout - {'loss': 0.3632, 'grad_norm': 1.5981757640838623, 'learning_rate': 1.6162610378518183e-07, 'epoch': 2.83} +2025-02-06 07:03:30 - ERROR - stderr - 94%|█████████▍| 21187/22434 [20:55:50<52:46, 2.54s/it] +2025-02-06 07:03:32 - ERROR - stderr - 94%|█████████▍| 21188/22434 [20:55:52<51:58, 2.50s/it] +2025-02-06 07:03:32 - ERROR - stderr - +2025-02-06 07:03:32 - ERROR - stderr - +2025-02-06 07:03:32 - INFO - stdout - {'loss': 0.353, 'grad_norm': 1.7323797941207886, 'learning_rate': 1.6136768297527527e-07, 'epoch': 2.83} +2025-02-06 07:03:32 - ERROR - stderr - 94%|█████████▍| 21188/22434 [20:55:52<51:58, 2.50s/it] +2025-02-06 07:03:35 - ERROR - stderr - 94%|█████████▍| 21189/22434 [20:55:54<51:36, 2.49s/it] +2025-02-06 07:03:35 - ERROR - stderr - +2025-02-06 07:03:35 - ERROR - stderr - +2025-02-06 07:03:35 - INFO - stdout - {'loss': 0.3952, 'grad_norm': 1.5628026723861694, 'learning_rate': 1.6110946724224308e-07, 'epoch': 2.83} +2025-02-06 07:03:35 - ERROR - stderr - 94%|█████████▍| 21189/22434 [20:55:54<51:36, 2.49s/it] +2025-02-06 07:03:37 - ERROR - stderr - 94%|█████████▍| 21190/22434 [20:55:57<51:35, 2.49s/it] +2025-02-06 
07:03:37 - ERROR - stderr - +2025-02-06 07:03:37 - ERROR - stderr - +2025-02-06 07:03:37 - INFO - stdout - {'loss': 0.3685, 'grad_norm': 1.591599464416504, 'learning_rate': 1.6085145659146985e-07, 'epoch': 2.83} +2025-02-06 07:03:37 - ERROR - stderr - 94%|█████████▍| 21190/22434 [20:55:57<51:35, 2.49s/it] +2025-02-06 07:03:40 - ERROR - stderr - 94%|█████████▍| 21191/22434 [20:55:59<51:18, 2.48s/it] +2025-02-06 07:03:40 - ERROR - stderr - +2025-02-06 07:03:40 - ERROR - stderr - +2025-02-06 07:03:40 - INFO - stdout - {'loss': 0.353, 'grad_norm': 1.6770014762878418, 'learning_rate': 1.6059365102833346e-07, 'epoch': 2.83} +2025-02-06 07:03:40 - ERROR - stderr - 94%|█████████▍| 21191/22434 [20:55:59<51:18, 2.48s/it] +2025-02-06 07:03:42 - ERROR - stderr - 94%|█████████▍| 21192/22434 [20:56:02<51:43, 2.50s/it] +2025-02-06 07:03:42 - ERROR - stderr - +2025-02-06 07:03:42 - ERROR - stderr - +2025-02-06 07:03:42 - INFO - stdout - {'loss': 0.4218, 'grad_norm': 1.7900179624557495, 'learning_rate': 1.6033605055820634e-07, 'epoch': 2.83} +2025-02-06 07:03:42 - ERROR - stderr - 94%|█████████▍| 21192/22434 [20:56:02<51:43, 2.50s/it] +2025-02-06 07:03:45 - ERROR - stderr - 94%|█████████▍| 21193/22434 [20:56:04<51:55, 2.51s/it] +2025-02-06 07:03:45 - ERROR - stderr - +2025-02-06 07:03:45 - ERROR - stderr - +2025-02-06 07:03:45 - INFO - stdout - {'loss': 0.3215, 'grad_norm': 1.4443432092666626, 'learning_rate': 1.6007865518645859e-07, 'epoch': 2.83} +2025-02-06 07:03:45 - ERROR - stderr - 94%|█████████▍| 21193/22434 [20:56:04<51:55, 2.51s/it] +2025-02-06 07:03:47 - ERROR - stderr - 94%|█████████▍| 21194/22434 [20:56:07<53:11, 2.57s/it] +2025-02-06 07:03:47 - ERROR - stderr - +2025-02-06 07:03:47 - ERROR - stderr - +2025-02-06 07:03:47 - INFO - stdout - {'loss': 0.3523, 'grad_norm': 1.6381317377090454, 'learning_rate': 1.5982146491845596e-07, 'epoch': 2.83} +2025-02-06 07:03:47 - ERROR - stderr - 94%|█████████▍| 21194/22434 [20:56:07<53:11, 2.57s/it] +2025-02-06 07:03:50 - ERROR - 
stderr - 94%|█████████▍| 21195/22434 [20:56:10<53:05, 2.57s/it] +2025-02-06 07:03:50 - ERROR - stderr - +2025-02-06 07:03:50 - ERROR - stderr - +2025-02-06 07:03:50 - INFO - stdout - {'loss': 0.394, 'grad_norm': 1.5008882284164429, 'learning_rate': 1.5956447975955859e-07, 'epoch': 2.83} +2025-02-06 07:03:50 - ERROR - stderr - 94%|█████████▍| 21195/22434 [20:56:10<53:05, 2.57s/it] +2025-02-06 07:03:52 - ERROR - stderr - 94%|█████████▍| 21196/22434 [20:56:12<53:19, 2.58s/it] +2025-02-06 07:03:53 - ERROR - stderr - +2025-02-06 07:03:53 - ERROR - stderr - +2025-02-06 07:03:53 - INFO - stdout - {'loss': 0.3739, 'grad_norm': 1.5776972770690918, 'learning_rate': 1.5930769971512327e-07, 'epoch': 2.83} +2025-02-06 07:03:53 - ERROR - stderr - 94%|█████████▍| 21196/22434 [20:56:12<53:19, 2.58s/it] +2025-02-06 07:03:55 - ERROR - stderr - 94%|█████████▍| 21197/22434 [20:56:15<52:41, 2.56s/it] +2025-02-06 07:03:55 - ERROR - stderr - +2025-02-06 07:03:55 - ERROR - stderr - +2025-02-06 07:03:55 - INFO - stdout - {'loss': 0.3799, 'grad_norm': 1.5829923152923584, 'learning_rate': 1.5905112479050354e-07, 'epoch': 2.83} +2025-02-06 07:03:55 - ERROR - stderr - 94%|█████████▍| 21197/22434 [20:56:15<52:41, 2.56s/it] +2025-02-06 07:03:57 - ERROR - stderr - 94%|█████████▍| 21198/22434 [20:56:17<52:08, 2.53s/it] +2025-02-06 07:03:58 - ERROR - stderr - +2025-02-06 07:03:58 - ERROR - stderr - +2025-02-06 07:03:58 - INFO - stdout - {'loss': 0.3508, 'grad_norm': 1.5479191541671753, 'learning_rate': 1.5879475499104514e-07, 'epoch': 2.83} +2025-02-06 07:03:58 - ERROR - stderr - 94%|█████████▍| 21198/22434 [20:56:17<52:08, 2.53s/it] +2025-02-06 07:04:00 - ERROR - stderr - 94%|█████████▍| 21199/22434 [20:56:20<54:26, 2.64s/it] +2025-02-06 07:04:00 - ERROR - stderr - +2025-02-06 07:04:00 - ERROR - stderr - +2025-02-06 07:04:00 - INFO - stdout - {'loss': 0.3151, 'grad_norm': 1.4448344707489014, 'learning_rate': 1.5853859032209374e-07, 'epoch': 2.83} +2025-02-06 07:04:00 - ERROR - stderr - 
94%|█████████▍| 21199/22434 [20:56:20<54:26, 2.64s/it] +2025-02-06 07:04:03 - ERROR - stderr - 94%|█████████▍| 21200/22434 [20:56:23<53:28, 2.60s/it] +2025-02-06 07:04:03 - ERROR - stderr - +2025-02-06 07:04:03 - ERROR - stderr - +2025-02-06 07:04:03 - INFO - stdout - {'loss': 0.3605, 'grad_norm': 1.5671799182891846, 'learning_rate': 1.5828263078898842e-07, 'epoch': 2.83} +2025-02-06 07:04:03 - ERROR - stderr - 94%|█████████▍| 21200/22434 [20:56:23<53:28, 2.60s/it] +2025-02-06 07:04:05 - ERROR - stderr - 95%|█████████▍| 21201/22434 [20:56:25<53:35, 2.61s/it] +2025-02-06 07:04:06 - ERROR - stderr - +2025-02-06 07:04:06 - ERROR - stderr - +2025-02-06 07:04:06 - INFO - stdout - {'loss': 0.3508, 'grad_norm': 1.4489481449127197, 'learning_rate': 1.5802687639706272e-07, 'epoch': 2.84} +2025-02-06 07:04:06 - ERROR - stderr - 95%|█████████▍| 21201/22434 [20:56:25<53:35, 2.61s/it] +2025-02-06 07:04:08 - ERROR - stderr - 95%|█████████▍| 21202/22434 [20:56:28<53:05, 2.59s/it] +2025-02-06 07:04:08 - ERROR - stderr - +2025-02-06 07:04:08 - ERROR - stderr - +2025-02-06 07:04:08 - INFO - stdout - {'loss': 0.4093, 'grad_norm': 1.6658684015274048, 'learning_rate': 1.5777132715165012e-07, 'epoch': 2.84} +2025-02-06 07:04:08 - ERROR - stderr - 95%|█████████▍| 21202/22434 [20:56:28<53:05, 2.59s/it] +2025-02-06 07:04:11 - ERROR - stderr - 95%|█████████▍| 21203/22434 [20:56:30<52:34, 2.56s/it] +2025-02-06 07:04:11 - ERROR - stderr - +2025-02-06 07:04:11 - ERROR - stderr - +2025-02-06 07:04:11 - INFO - stdout - {'loss': 0.3227, 'grad_norm': 1.4433869123458862, 'learning_rate': 1.5751598305807526e-07, 'epoch': 2.84} +2025-02-06 07:04:11 - ERROR - stderr - 95%|█████████▍| 21203/22434 [20:56:30<52:34, 2.56s/it] +2025-02-06 07:04:13 - ERROR - stderr - 95%|█████████▍| 21204/22434 [20:56:33<52:08, 2.54s/it] +2025-02-06 07:04:13 - ERROR - stderr - +2025-02-06 07:04:13 - ERROR - stderr - +2025-02-06 07:04:13 - INFO - stdout - {'loss': 0.354, 'grad_norm': 1.4792289733886719, 'learning_rate': 
1.5726084412166277e-07, 'epoch': 2.84} +2025-02-06 07:04:13 - ERROR - stderr - 95%|█████████▍| 21204/22434 [20:56:33<52:08, 2.54s/it] +2025-02-06 07:04:15 - ERROR - stderr - 95%|█████████▍| 21205/22434 [20:56:35<51:26, 2.51s/it] +2025-02-06 07:04:16 - ERROR - stderr - +2025-02-06 07:04:16 - ERROR - stderr - +2025-02-06 07:04:16 - INFO - stdout - {'loss': 0.3639, 'grad_norm': 1.581131100654602, 'learning_rate': 1.5700591034772949e-07, 'epoch': 2.84} +2025-02-06 07:04:16 - ERROR - stderr - 95%|█████████▍| 21205/22434 [20:56:35<51:26, 2.51s/it] +2025-02-06 07:04:18 - ERROR - stderr - 95%|█████████▍| 21206/22434 [20:56:38<51:34, 2.52s/it] +2025-02-06 07:04:18 - ERROR - stderr - +2025-02-06 07:04:18 - ERROR - stderr - +2025-02-06 07:04:18 - INFO - stdout - {'loss': 0.3903, 'grad_norm': 1.6240030527114868, 'learning_rate': 1.5675118174158787e-07, 'epoch': 2.84} +2025-02-06 07:04:18 - ERROR - stderr - 95%|█████████▍| 21206/22434 [20:56:38<51:34, 2.52s/it] +2025-02-06 07:04:20 - ERROR - stderr - 95%|█████████▍| 21207/22434 [20:56:40<51:11, 2.50s/it] +2025-02-06 07:04:21 - ERROR - stderr - +2025-02-06 07:04:21 - ERROR - stderr - +2025-02-06 07:04:21 - INFO - stdout - {'loss': 0.3658, 'grad_norm': 1.514824390411377, 'learning_rate': 1.564966583085503e-07, 'epoch': 2.84} +2025-02-06 07:04:21 - ERROR - stderr - 95%|█████████▍| 21207/22434 [20:56:40<51:11, 2.50s/it] +2025-02-06 07:04:23 - ERROR - stderr - 95%|█████████▍| 21208/22434 [20:56:43<51:03, 2.50s/it] +2025-02-06 07:04:23 - ERROR - stderr - +2025-02-06 07:04:23 - ERROR - stderr - +2025-02-06 07:04:23 - INFO - stdout - {'loss': 0.294, 'grad_norm': 1.382228136062622, 'learning_rate': 1.5624234005392036e-07, 'epoch': 2.84} +2025-02-06 07:04:23 - ERROR - stderr - 95%|█████████▍| 21208/22434 [20:56:43<51:03, 2.50s/it] +2025-02-06 07:04:25 - ERROR - stderr - 95%|█████████▍| 21209/22434 [20:56:45<50:41, 2.48s/it] +2025-02-06 07:04:25 - ERROR - stderr - +2025-02-06 07:04:25 - ERROR - stderr - +2025-02-06 07:04:25 - INFO - 
stdout - {'loss': 0.36, 'grad_norm': 1.636189341545105, 'learning_rate': 1.5598822698299932e-07, 'epoch': 2.84} +2025-02-06 07:04:25 - ERROR - stderr - 95%|█████████▍| 21209/22434 [20:56:45<50:41, 2.48s/it] +2025-02-06 07:04:28 - ERROR - stderr - 95%|█████████▍| 21210/22434 [20:56:48<51:06, 2.51s/it] +2025-02-06 07:04:28 - ERROR - stderr - +2025-02-06 07:04:28 - ERROR - stderr - +2025-02-06 07:04:28 - INFO - stdout - {'loss': 0.3194, 'grad_norm': 1.4737849235534668, 'learning_rate': 1.5573431910108404e-07, 'epoch': 2.84} +2025-02-06 07:04:28 - ERROR - stderr - 95%|█████████▍| 21210/22434 [20:56:48<51:06, 2.51s/it] +2025-02-06 07:04:30 - ERROR - stderr - 95%|█████████▍| 21211/22434 [20:56:50<51:03, 2.50s/it] +2025-02-06 07:04:31 - ERROR - stderr - +2025-02-06 07:04:31 - ERROR - stderr - +2025-02-06 07:04:31 - INFO - stdout - {'loss': 0.352, 'grad_norm': 1.7016079425811768, 'learning_rate': 1.554806164134659e-07, 'epoch': 2.84} +2025-02-06 07:04:31 - ERROR - stderr - 95%|█████████▍| 21211/22434 [20:56:50<51:03, 2.50s/it] +2025-02-06 07:04:33 - ERROR - stderr - 95%|█████████▍| 21212/22434 [20:56:53<51:27, 2.53s/it] +2025-02-06 07:04:33 - ERROR - stderr - +2025-02-06 07:04:33 - ERROR - stderr - +2025-02-06 07:04:33 - INFO - stdout - {'loss': 0.3673, 'grad_norm': 1.5472626686096191, 'learning_rate': 1.552271189254362e-07, 'epoch': 2.84} +2025-02-06 07:04:33 - ERROR - stderr - 95%|█████████▍| 21212/22434 [20:56:53<51:27, 2.53s/it] +2025-02-06 07:04:36 - ERROR - stderr - 95%|█████████▍| 21213/22434 [20:56:56<53:48, 2.64s/it] +2025-02-06 07:04:36 - ERROR - stderr - +2025-02-06 07:04:36 - ERROR - stderr - +2025-02-06 07:04:36 - INFO - stdout - {'loss': 0.3232, 'grad_norm': 1.2918970584869385, 'learning_rate': 1.5497382664227512e-07, 'epoch': 2.84} +2025-02-06 07:04:36 - ERROR - stderr - 95%|█████████▍| 21213/22434 [20:56:56<53:48, 2.64s/it] +2025-02-06 07:04:38 - ERROR - stderr - 95%|█████████▍| 21214/22434 [20:56:58<52:43, 2.59s/it] +2025-02-06 07:04:38 - ERROR - stderr - 
+2025-02-06 07:04:38 - ERROR - stderr - +2025-02-06 07:04:38 - INFO - stdout - {'loss': 0.3806, 'grad_norm': 1.6269826889038086, 'learning_rate': 1.5472073956926404e-07, 'epoch': 2.84} +2025-02-06 07:04:38 - ERROR - stderr - 95%|█████████▍| 21214/22434 [20:56:58<52:43, 2.59s/it] +2025-02-06 07:04:41 - ERROR - stderr - 95%|█████████▍| 21215/22434 [20:57:01<52:04, 2.56s/it] +2025-02-06 07:04:41 - ERROR - stderr - +2025-02-06 07:04:41 - ERROR - stderr - +2025-02-06 07:04:41 - INFO - stdout - {'loss': 0.3295, 'grad_norm': 1.4544490575790405, 'learning_rate': 1.544678577116787e-07, 'epoch': 2.84} +2025-02-06 07:04:41 - ERROR - stderr - 95%|█████████▍| 21215/22434 [20:57:01<52:04, 2.56s/it] +2025-02-06 07:04:43 - ERROR - stderr - 95%|█████████▍| 21216/22434 [20:57:03<51:58, 2.56s/it] +2025-02-06 07:04:44 - ERROR - stderr - +2025-02-06 07:04:44 - ERROR - stderr - +2025-02-06 07:04:44 - INFO - stdout - {'loss': 0.3626, 'grad_norm': 1.5082616806030273, 'learning_rate': 1.5421518107478939e-07, 'epoch': 2.84} +2025-02-06 07:04:44 - ERROR - stderr - 95%|█████████▍| 21216/22434 [20:57:03<51:58, 2.56s/it] +2025-02-06 07:04:46 - ERROR - stderr - 95%|█████████▍| 21217/22434 [20:57:06<51:18, 2.53s/it] +2025-02-06 07:04:46 - ERROR - stderr - +2025-02-06 07:04:46 - ERROR - stderr - +2025-02-06 07:04:46 - INFO - stdout - {'loss': 0.3532, 'grad_norm': 1.574182152748108, 'learning_rate': 1.5396270966386407e-07, 'epoch': 2.84} +2025-02-06 07:04:46 - ERROR - stderr - 95%|█████████▍| 21217/22434 [20:57:06<51:18, 2.53s/it] +2025-02-06 07:04:49 - ERROR - stderr - 95%|█████████▍| 21218/22434 [20:57:09<53:05, 2.62s/it] +2025-02-06 07:04:49 - ERROR - stderr - +2025-02-06 07:04:49 - ERROR - stderr - +2025-02-06 07:04:49 - INFO - stdout - {'loss': 0.3764, 'grad_norm': 1.6156799793243408, 'learning_rate': 1.537104434841641e-07, 'epoch': 2.84} +2025-02-06 07:04:49 - ERROR - stderr - 95%|█████████▍| 21218/22434 [20:57:09<53:05, 2.62s/it] +2025-02-06 07:04:52 - ERROR - stderr - 95%|█████████▍| 
21219/22434 [20:57:11<54:45, 2.70s/it] +2025-02-06 07:04:52 - ERROR - stderr - +2025-02-06 07:04:52 - ERROR - stderr - +2025-02-06 07:04:52 - INFO - stdout - {'loss': 0.3752, 'grad_norm': 1.6959151029586792, 'learning_rate': 1.5345838254094746e-07, 'epoch': 2.84} +2025-02-06 07:04:52 - ERROR - stderr - 95%|█████████▍| 21219/22434 [20:57:11<54:45, 2.70s/it] +2025-02-06 07:04:54 - ERROR - stderr - 95%|█████████▍| 21220/22434 [20:57:14<53:05, 2.62s/it] +2025-02-06 07:04:54 - ERROR - stderr - +2025-02-06 07:04:54 - ERROR - stderr - +2025-02-06 07:04:54 - INFO - stdout - {'loss': 0.3881, 'grad_norm': 1.5061683654785156, 'learning_rate': 1.532065268394689e-07, 'epoch': 2.84} +2025-02-06 07:04:54 - ERROR - stderr - 95%|█████████▍| 21220/22434 [20:57:14<53:05, 2.62s/it] +2025-02-06 07:04:57 - ERROR - stderr - 95%|█████████▍| 21221/22434 [20:57:16<52:43, 2.61s/it] +2025-02-06 07:04:57 - ERROR - stderr - +2025-02-06 07:04:57 - ERROR - stderr - +2025-02-06 07:04:57 - INFO - stdout - {'loss': 0.3796, 'grad_norm': 1.6991662979125977, 'learning_rate': 1.5295487638497863e-07, 'epoch': 2.84} +2025-02-06 07:04:57 - ERROR - stderr - 95%|█████████▍| 21221/22434 [20:57:16<52:43, 2.61s/it] +2025-02-06 07:04:59 - ERROR - stderr - 95%|█████████▍| 21222/22434 [20:57:19<51:53, 2.57s/it] +2025-02-06 07:04:59 - ERROR - stderr - +2025-02-06 07:04:59 - ERROR - stderr - +2025-02-06 07:04:59 - INFO - stdout - {'loss': 0.3538, 'grad_norm': 1.6324154138565063, 'learning_rate': 1.5270343118272024e-07, 'epoch': 2.84} +2025-02-06 07:04:59 - ERROR - stderr - 95%|█████████▍| 21222/22434 [20:57:19<51:53, 2.57s/it] +2025-02-06 07:05:02 - ERROR - stderr - 95%|█████████▍| 21223/22434 [20:57:21<51:23, 2.55s/it] +2025-02-06 07:05:02 - ERROR - stderr - +2025-02-06 07:05:02 - ERROR - stderr - +2025-02-06 07:05:02 - INFO - stdout - {'loss': 0.3426, 'grad_norm': 1.6182020902633667, 'learning_rate': 1.5245219123793619e-07, 'epoch': 2.84} +2025-02-06 07:05:02 - ERROR - stderr - 95%|█████████▍| 21223/22434 
[20:57:21<51:23, 2.55s/it] +2025-02-06 07:05:04 - ERROR - stderr - 95%|█████████▍| 21224/22434 [20:57:24<50:44, 2.52s/it] +2025-02-06 07:05:04 - ERROR - stderr - +2025-02-06 07:05:04 - ERROR - stderr - +2025-02-06 07:05:04 - INFO - stdout - {'loss': 0.3624, 'grad_norm': 1.6600232124328613, 'learning_rate': 1.5220115655586454e-07, 'epoch': 2.84} +2025-02-06 07:05:04 - ERROR - stderr - 95%|█████████▍| 21224/22434 [20:57:24<50:44, 2.52s/it] +2025-02-06 07:05:07 - ERROR - stderr - 95%|█████████▍| 21225/22434 [20:57:26<50:19, 2.50s/it] +2025-02-06 07:05:07 - ERROR - stderr - +2025-02-06 07:05:07 - ERROR - stderr - +2025-02-06 07:05:07 - INFO - stdout - {'loss': 0.3974, 'grad_norm': 1.6321144104003906, 'learning_rate': 1.5195032714173442e-07, 'epoch': 2.84} +2025-02-06 07:05:07 - ERROR - stderr - 95%|█████████▍| 21225/22434 [20:57:26<50:19, 2.50s/it] +2025-02-06 07:05:09 - ERROR - stderr - 95%|█████████▍| 21226/22434 [20:57:29<50:06, 2.49s/it] +2025-02-06 07:05:09 - ERROR - stderr - +2025-02-06 07:05:09 - ERROR - stderr - +2025-02-06 07:05:09 - INFO - stdout - {'loss': 0.3504, 'grad_norm': 1.5164142847061157, 'learning_rate': 1.516997030007783e-07, 'epoch': 2.84} +2025-02-06 07:05:09 - ERROR - stderr - 95%|█████████▍| 21226/22434 [20:57:29<50:06, 2.49s/it] +2025-02-06 07:05:12 - ERROR - stderr - 95%|█████████▍| 21227/22434 [20:57:31<50:42, 2.52s/it] +2025-02-06 07:05:12 - ERROR - stderr - +2025-02-06 07:05:12 - ERROR - stderr - +2025-02-06 07:05:12 - INFO - stdout - {'loss': 0.3449, 'grad_norm': 1.4518516063690186, 'learning_rate': 1.5144928413821647e-07, 'epoch': 2.84} +2025-02-06 07:05:12 - ERROR - stderr - 95%|█████████▍| 21227/22434 [20:57:31<50:42, 2.52s/it] +2025-02-06 07:05:14 - ERROR - stderr - 95%|█████████▍| 21228/22434 [20:57:34<50:33, 2.51s/it] +2025-02-06 07:05:14 - ERROR - stderr - +2025-02-06 07:05:14 - ERROR - stderr - +2025-02-06 07:05:14 - INFO - stdout - {'loss': 0.4094, 'grad_norm': 1.642876148223877, 'learning_rate': 1.5119907055927142e-07, 'epoch': 
2.84} +2025-02-06 07:05:14 - ERROR - stderr - 95%|█████████▍| 21228/22434 [20:57:34<50:33, 2.51s/it] +2025-02-06 07:05:17 - ERROR - stderr - 95%|█████████▍| 21229/22434 [20:57:36<50:46, 2.53s/it] +2025-02-06 07:05:17 - ERROR - stderr - +2025-02-06 07:05:17 - ERROR - stderr - +2025-02-06 07:05:17 - INFO - stdout - {'loss': 0.3863, 'grad_norm': 1.6323058605194092, 'learning_rate': 1.5094906226915673e-07, 'epoch': 2.84} +2025-02-06 07:05:17 - ERROR - stderr - 95%|█████████▍| 21229/22434 [20:57:36<50:46, 2.53s/it] +2025-02-06 07:05:19 - ERROR - stderr - 95%|█████████▍| 21230/22434 [20:57:39<50:24, 2.51s/it] +2025-02-06 07:05:19 - ERROR - stderr - +2025-02-06 07:05:19 - ERROR - stderr - +2025-02-06 07:05:19 - INFO - stdout - {'loss': 0.2975, 'grad_norm': 1.1945912837982178, 'learning_rate': 1.506992592730827e-07, 'epoch': 2.84} +2025-02-06 07:05:19 - ERROR - stderr - 95%|█████████▍| 21230/22434 [20:57:39<50:24, 2.51s/it] +2025-02-06 07:05:22 - ERROR - stderr - 95%|█████████▍| 21231/22434 [20:57:41<50:18, 2.51s/it] +2025-02-06 07:05:22 - ERROR - stderr - +2025-02-06 07:05:22 - ERROR - stderr - +2025-02-06 07:05:22 - INFO - stdout - {'loss': 0.3751, 'grad_norm': 1.7042748928070068, 'learning_rate': 1.5044966157626072e-07, 'epoch': 2.84} +2025-02-06 07:05:22 - ERROR - stderr - 95%|█████████▍| 21231/22434 [20:57:41<50:18, 2.51s/it] +2025-02-06 07:05:24 - ERROR - stderr - 95%|█████████▍| 21232/22434 [20:57:44<50:10, 2.50s/it] +2025-02-06 07:05:24 - ERROR - stderr - +2025-02-06 07:05:24 - ERROR - stderr - +2025-02-06 07:05:24 - INFO - stdout - {'loss': 0.3506, 'grad_norm': 1.5199317932128906, 'learning_rate': 1.5020026918388885e-07, 'epoch': 2.84} +2025-02-06 07:05:24 - ERROR - stderr - 95%|█████████▍| 21232/22434 [20:57:44<50:10, 2.50s/it] +2025-02-06 07:05:27 - ERROR - stderr - 95%|█████████▍| 21233/22434 [20:57:46<50:18, 2.51s/it] +2025-02-06 07:05:27 - ERROR - stderr - +2025-02-06 07:05:27 - ERROR - stderr - +2025-02-06 07:05:27 - INFO - stdout - {'loss': 0.3802, 
'grad_norm': 1.6378141641616821, 'learning_rate': 1.499510821011685e-07, 'epoch': 2.84} +2025-02-06 07:05:27 - ERROR - stderr - 95%|█████████▍| 21233/22434 [20:57:46<50:18, 2.51s/it] +2025-02-06 07:05:29 - ERROR - stderr - 95%|█████████▍| 21234/22434 [20:57:49<49:59, 2.50s/it] +2025-02-06 07:05:29 - ERROR - stderr - +2025-02-06 07:05:29 - ERROR - stderr - +2025-02-06 07:05:29 - INFO - stdout - {'loss': 0.3854, 'grad_norm': 1.7438452243804932, 'learning_rate': 1.4970210033329102e-07, 'epoch': 2.84} +2025-02-06 07:05:29 - ERROR - stderr - 95%|█████████▍| 21234/22434 [20:57:49<49:59, 2.50s/it] +2025-02-06 07:05:32 - ERROR - stderr - 95%|█████████▍| 21235/22434 [20:57:51<49:48, 2.49s/it] +2025-02-06 07:05:32 - ERROR - stderr - +2025-02-06 07:05:32 - ERROR - stderr - +2025-02-06 07:05:32 - INFO - stdout - {'loss': 0.3304, 'grad_norm': 1.6094701290130615, 'learning_rate': 1.4945332388544787e-07, 'epoch': 2.84} +2025-02-06 07:05:32 - ERROR - stderr - 95%|█████████▍| 21235/22434 [20:57:51<49:48, 2.49s/it] +2025-02-06 07:05:34 - ERROR - stderr - 95%|█████████▍| 21236/22434 [20:57:54<50:01, 2.51s/it] +2025-02-06 07:05:34 - ERROR - stderr - +2025-02-06 07:05:34 - ERROR - stderr - +2025-02-06 07:05:34 - INFO - stdout - {'loss': 0.3199, 'grad_norm': 1.5313116312026978, 'learning_rate': 1.4920475276282487e-07, 'epoch': 2.84} +2025-02-06 07:05:34 - ERROR - stderr - 95%|█████████▍| 21236/22434 [20:57:54<50:01, 2.51s/it] +2025-02-06 07:05:37 - ERROR - stderr - 95%|█████████▍| 21237/22434 [20:57:56<49:50, 2.50s/it] +2025-02-06 07:05:37 - ERROR - stderr - +2025-02-06 07:05:37 - ERROR - stderr - +2025-02-06 07:05:37 - INFO - stdout - {'loss': 0.3757, 'grad_norm': 1.6055421829223633, 'learning_rate': 1.4895638697060232e-07, 'epoch': 2.84} +2025-02-06 07:05:37 - ERROR - stderr - 95%|█████████▍| 21237/22434 [20:57:56<49:50, 2.50s/it] +2025-02-06 07:05:39 - ERROR - stderr - 95%|█████████▍| 21238/22434 [20:57:59<50:06, 2.51s/it] +2025-02-06 07:05:39 - ERROR - stderr - +2025-02-06 07:05:39 
- ERROR - stderr - +2025-02-06 07:05:39 - INFO - stdout - {'loss': 0.363, 'grad_norm': 1.565784215927124, 'learning_rate': 1.487082265139572e-07, 'epoch': 2.84} +2025-02-06 07:05:39 - ERROR - stderr - 95%|█████████▍| 21238/22434 [20:57:59<50:06, 2.51s/it] +2025-02-06 07:05:42 - ERROR - stderr - 95%|█████████▍| 21239/22434 [20:58:01<49:57, 2.51s/it] +2025-02-06 07:05:42 - ERROR - stderr - +2025-02-06 07:05:42 - ERROR - stderr - +2025-02-06 07:05:42 - INFO - stdout - {'loss': 0.3653, 'grad_norm': 1.4942741394042969, 'learning_rate': 1.4846027139806207e-07, 'epoch': 2.84} +2025-02-06 07:05:42 - ERROR - stderr - 95%|█████████▍| 21239/22434 [20:58:02<49:57, 2.51s/it] +2025-02-06 07:05:44 - ERROR - stderr - 95%|█████████▍| 21240/22434 [20:58:04<49:56, 2.51s/it] +2025-02-06 07:05:44 - ERROR - stderr - +2025-02-06 07:05:44 - ERROR - stderr - +2025-02-06 07:05:44 - INFO - stdout - {'loss': 0.313, 'grad_norm': 1.422958493232727, 'learning_rate': 1.482125216280872e-07, 'epoch': 2.84} +2025-02-06 07:05:44 - ERROR - stderr - 95%|█████████▍| 21240/22434 [20:58:04<49:56, 2.51s/it] +2025-02-06 07:05:47 - ERROR - stderr - 95%|█████████▍| 21241/22434 [20:58:06<49:30, 2.49s/it] +2025-02-06 07:05:47 - ERROR - stderr - +2025-02-06 07:05:47 - ERROR - stderr - +2025-02-06 07:05:47 - INFO - stdout - {'loss': 0.3312, 'grad_norm': 1.4228938817977905, 'learning_rate': 1.479649772091929e-07, 'epoch': 2.84} +2025-02-06 07:05:47 - ERROR - stderr - 95%|█████████▍| 21241/22434 [20:58:06<49:30, 2.49s/it] +2025-02-06 07:05:49 - ERROR - stderr - 95%|█████████▍| 21242/22434 [20:58:09<49:30, 2.49s/it] +2025-02-06 07:05:49 - ERROR - stderr - +2025-02-06 07:05:49 - ERROR - stderr - +2025-02-06 07:05:49 - INFO - stdout - {'loss': 0.3501, 'grad_norm': 1.6390165090560913, 'learning_rate': 1.4771763814654282e-07, 'epoch': 2.84} +2025-02-06 07:05:49 - ERROR - stderr - 95%|█████████▍| 21242/22434 [20:58:09<49:30, 2.49s/it] +2025-02-06 07:05:52 - ERROR - stderr - 95%|█████████▍| 21243/22434 [20:58:12<50:24, 
2.54s/it] +2025-02-06 07:05:52 - ERROR - stderr - +2025-02-06 07:05:52 - ERROR - stderr - +2025-02-06 07:05:52 - INFO - stdout - {'loss': 0.343, 'grad_norm': 1.4138860702514648, 'learning_rate': 1.4747050444529066e-07, 'epoch': 2.84} +2025-02-06 07:05:52 - ERROR - stderr - 95%|█████████▍| 21243/22434 [20:58:12<50:24, 2.54s/it] +2025-02-06 07:05:54 - ERROR - stderr - 95%|█████████▍| 21244/22434 [20:58:14<49:55, 2.52s/it] +2025-02-06 07:05:54 - ERROR - stderr - +2025-02-06 07:05:54 - ERROR - stderr - +2025-02-06 07:05:54 - INFO - stdout - {'loss': 0.3966, 'grad_norm': 1.6071960926055908, 'learning_rate': 1.472235761105878e-07, 'epoch': 2.84} +2025-02-06 07:05:54 - ERROR - stderr - 95%|█████████▍| 21244/22434 [20:58:14<49:55, 2.52s/it] +2025-02-06 07:05:57 - ERROR - stderr - 95%|█████████▍| 21245/22434 [20:58:17<49:55, 2.52s/it] +2025-02-06 07:05:57 - ERROR - stderr - +2025-02-06 07:05:57 - ERROR - stderr - +2025-02-06 07:05:57 - INFO - stdout - {'loss': 0.3108, 'grad_norm': 1.354293942451477, 'learning_rate': 1.4697685314758236e-07, 'epoch': 2.84} +2025-02-06 07:05:57 - ERROR - stderr - 95%|█████████▍| 21245/22434 [20:58:17<49:55, 2.52s/it] +2025-02-06 07:05:59 - ERROR - stderr - 95%|█████████▍| 21246/22434 [20:58:19<50:17, 2.54s/it] +2025-02-06 07:05:59 - ERROR - stderr - +2025-02-06 07:05:59 - ERROR - stderr - +2025-02-06 07:05:59 - INFO - stdout - {'loss': 0.3317, 'grad_norm': 1.6347801685333252, 'learning_rate': 1.467303355614147e-07, 'epoch': 2.84} +2025-02-06 07:05:59 - ERROR - stderr - 95%|█████████▍| 21246/22434 [20:58:19<50:17, 2.54s/it] +2025-02-06 07:06:02 - ERROR - stderr - 95%|█████████▍| 21247/22434 [20:58:22<50:17, 2.54s/it] +2025-02-06 07:06:02 - ERROR - stderr - +2025-02-06 07:06:02 - ERROR - stderr - +2025-02-06 07:06:02 - INFO - stdout - {'loss': 0.3308, 'grad_norm': 1.4767248630523682, 'learning_rate': 1.4648402335722511e-07, 'epoch': 2.84} +2025-02-06 07:06:02 - ERROR - stderr - 95%|█████████▍| 21247/22434 [20:58:22<50:17, 2.54s/it] +2025-02-06 
07:06:05 - ERROR - stderr - 95%|█████████▍| 21248/22434 [20:58:24<50:53, 2.57s/it] +2025-02-06 07:06:05 - ERROR - stderr - +2025-02-06 07:06:05 - ERROR - stderr - +2025-02-06 07:06:05 - INFO - stdout - {'loss': 0.3838, 'grad_norm': 1.5630346536636353, 'learning_rate': 1.462379165401473e-07, 'epoch': 2.84} +2025-02-06 07:06:05 - ERROR - stderr - 95%|█████████▍| 21248/22434 [20:58:24<50:53, 2.57s/it] +2025-02-06 07:06:07 - ERROR - stderr - 95%|█████████▍| 21249/22434 [20:58:27<50:25, 2.55s/it] +2025-02-06 07:06:07 - ERROR - stderr - +2025-02-06 07:06:07 - ERROR - stderr - +2025-02-06 07:06:07 - INFO - stdout - {'loss': 0.3559, 'grad_norm': 1.4109286069869995, 'learning_rate': 1.4599201511531046e-07, 'epoch': 2.84} +2025-02-06 07:06:07 - ERROR - stderr - 95%|█████████▍| 21249/22434 [20:58:27<50:25, 2.55s/it] +2025-02-06 07:06:10 - ERROR - stderr - 95%|█████████▍| 21250/22434 [20:58:29<50:01, 2.54s/it] +2025-02-06 07:06:10 - ERROR - stderr - +2025-02-06 07:06:10 - ERROR - stderr - +2025-02-06 07:06:10 - INFO - stdout - {'loss': 0.3245, 'grad_norm': 1.5704572200775146, 'learning_rate': 1.4574631908784275e-07, 'epoch': 2.84} +2025-02-06 07:06:10 - ERROR - stderr - 95%|█████████▍| 21250/22434 [20:58:29<50:01, 2.54s/it] +2025-02-06 07:06:12 - ERROR - stderr - 95%|█████████▍| 21251/22434 [20:58:32<52:14, 2.65s/it] +2025-02-06 07:06:13 - ERROR - stderr - +2025-02-06 07:06:13 - ERROR - stderr - +2025-02-06 07:06:13 - INFO - stdout - {'loss': 0.4059, 'grad_norm': 1.6264375448226929, 'learning_rate': 1.4550082846286117e-07, 'epoch': 2.84} +2025-02-06 07:06:13 - ERROR - stderr - 95%|█████████▍| 21251/22434 [20:58:32<52:14, 2.65s/it] +2025-02-06 07:06:15 - ERROR - stderr - 95%|█████████▍| 21252/22434 [20:58:35<51:00, 2.59s/it] +2025-02-06 07:06:15 - ERROR - stderr - +2025-02-06 07:06:15 - ERROR - stderr - +2025-02-06 07:06:15 - INFO - stdout - {'loss': 0.3506, 'grad_norm': 1.4583697319030762, 'learning_rate': 1.452555432454872e-07, 'epoch': 2.84} +2025-02-06 07:06:15 - ERROR - 
stderr - 95%|█████████▍| 21252/22434 [20:58:35<51:00, 2.59s/it] +2025-02-06 07:06:18 - ERROR - stderr - 95%|█████████▍| 21253/22434 [20:58:37<51:12, 2.60s/it] +2025-02-06 07:06:18 - ERROR - stderr - +2025-02-06 07:06:18 - ERROR - stderr - +2025-02-06 07:06:18 - INFO - stdout - {'loss': 0.3508, 'grad_norm': 1.504470705986023, 'learning_rate': 1.4501046344083002e-07, 'epoch': 2.84} +2025-02-06 07:06:18 - ERROR - stderr - 95%|█████████▍| 21253/22434 [20:58:37<51:12, 2.60s/it] +2025-02-06 07:06:20 - ERROR - stderr - 95%|█████████▍| 21254/22434 [20:58:40<50:39, 2.58s/it] +2025-02-06 07:06:20 - ERROR - stderr - +2025-02-06 07:06:20 - ERROR - stderr - +2025-02-06 07:06:20 - INFO - stdout - {'loss': 0.3646, 'grad_norm': 1.5982348918914795, 'learning_rate': 1.4476558905400008e-07, 'epoch': 2.84} +2025-02-06 07:06:20 - ERROR - stderr - 95%|█████████▍| 21254/22434 [20:58:40<50:39, 2.58s/it] +2025-02-06 07:06:23 - ERROR - stderr - 95%|█████████▍| 21255/22434 [20:58:42<50:13, 2.56s/it] +2025-02-06 07:06:23 - ERROR - stderr - +2025-02-06 07:06:23 - ERROR - stderr - +2025-02-06 07:06:23 - INFO - stdout - {'loss': 0.3591, 'grad_norm': 1.4951746463775635, 'learning_rate': 1.44520920090101e-07, 'epoch': 2.84} +2025-02-06 07:06:23 - ERROR - stderr - 95%|█████████▍| 21255/22434 [20:58:42<50:13, 2.56s/it] +2025-02-06 07:06:25 - ERROR - stderr - 95%|█████████▍| 21256/22434 [20:58:45<49:27, 2.52s/it] +2025-02-06 07:06:25 - ERROR - stderr - +2025-02-06 07:06:25 - ERROR - stderr - +2025-02-06 07:06:25 - INFO - stdout - {'loss': 0.3999, 'grad_norm': 1.6530685424804688, 'learning_rate': 1.4427645655423205e-07, 'epoch': 2.84} +2025-02-06 07:06:25 - ERROR - stderr - 95%|█████████▍| 21256/22434 [20:58:45<49:27, 2.52s/it] +2025-02-06 07:06:28 - ERROR - stderr - 95%|█████████▍| 21257/22434 [20:58:47<49:18, 2.51s/it] +2025-02-06 07:06:28 - ERROR - stderr - +2025-02-06 07:06:28 - ERROR - stderr - +2025-02-06 07:06:28 - INFO - stdout - {'loss': 0.3696, 'grad_norm': 1.6215635538101196, 
'learning_rate': 1.440321984514903e-07, 'epoch': 2.84} +2025-02-06 07:06:28 - ERROR - stderr - 95%|█████████▍| 21257/22434 [20:58:47<49:18, 2.51s/it] +2025-02-06 07:06:30 - ERROR - stderr - 95%|█████████▍| 21258/22434 [20:58:50<48:47, 2.49s/it] +2025-02-06 07:06:30 - ERROR - stderr - +2025-02-06 07:06:30 - ERROR - stderr - +2025-02-06 07:06:30 - INFO - stdout - {'loss': 0.3334, 'grad_norm': 1.5934858322143555, 'learning_rate': 1.437881457869661e-07, 'epoch': 2.84} +2025-02-06 07:06:30 - ERROR - stderr - 95%|█████████▍| 21258/22434 [20:58:50<48:47, 2.49s/it] +2025-02-06 07:06:32 - ERROR - stderr - 95%|█████████▍| 21259/22434 [20:58:52<48:53, 2.50s/it] +2025-02-06 07:06:33 - ERROR - stderr - +2025-02-06 07:06:33 - ERROR - stderr - +2025-02-06 07:06:33 - INFO - stdout - {'loss': 0.3875, 'grad_norm': 1.7375500202178955, 'learning_rate': 1.435442985657465e-07, 'epoch': 2.84} +2025-02-06 07:06:33 - ERROR - stderr - 95%|█████████▍| 21259/22434 [20:58:52<48:53, 2.50s/it] +2025-02-06 07:06:35 - ERROR - stderr - 95%|█████████▍| 21260/22434 [20:58:55<48:22, 2.47s/it] +2025-02-06 07:06:35 - ERROR - stderr - +2025-02-06 07:06:35 - ERROR - stderr - +2025-02-06 07:06:35 - INFO - stdout - {'loss': 0.3204, 'grad_norm': 1.6236870288848877, 'learning_rate': 1.4330065679291404e-07, 'epoch': 2.84} +2025-02-06 07:06:35 - ERROR - stderr - 95%|█████████▍| 21260/22434 [20:58:55<48:22, 2.47s/it] +2025-02-06 07:06:38 - ERROR - stderr - 95%|█████████▍| 21261/22434 [20:58:57<49:43, 2.54s/it] +2025-02-06 07:06:38 - ERROR - stderr - +2025-02-06 07:06:38 - ERROR - stderr - +2025-02-06 07:06:38 - INFO - stdout - {'loss': 0.3276, 'grad_norm': 1.3964658975601196, 'learning_rate': 1.4305722047354808e-07, 'epoch': 2.84} +2025-02-06 07:06:38 - ERROR - stderr - 95%|█████████▍| 21261/22434 [20:58:57<49:43, 2.54s/it] +2025-02-06 07:06:40 - ERROR - stderr - 95%|█████████▍| 21262/22434 [20:59:00<49:40, 2.54s/it] +2025-02-06 07:06:40 - ERROR - stderr - +2025-02-06 07:06:40 - ERROR - stderr - +2025-02-06 
07:06:40 - INFO - stdout - {'loss': 0.3354, 'grad_norm': 1.5651562213897705, 'learning_rate': 1.428139896127223e-07, 'epoch': 2.84} +2025-02-06 07:06:40 - ERROR - stderr - 95%|█████████▍| 21262/22434 [20:59:00<49:40, 2.54s/it] +2025-02-06 07:06:43 - ERROR - stderr - 95%|█████████▍| 21263/22434 [20:59:02<49:27, 2.53s/it] +2025-02-06 07:06:43 - ERROR - stderr - +2025-02-06 07:06:43 - ERROR - stderr - +2025-02-06 07:06:43 - INFO - stdout - {'loss': 0.4184, 'grad_norm': 1.8112328052520752, 'learning_rate': 1.4257096421550598e-07, 'epoch': 2.84} +2025-02-06 07:06:43 - ERROR - stderr - 95%|█████████▍| 21263/22434 [20:59:02<49:27, 2.53s/it] +2025-02-06 07:06:45 - ERROR - stderr - 95%|█████████▍| 21264/22434 [20:59:05<48:52, 2.51s/it] +2025-02-06 07:06:45 - ERROR - stderr - +2025-02-06 07:06:45 - ERROR - stderr - +2025-02-06 07:06:45 - INFO - stdout - {'loss': 0.3637, 'grad_norm': 1.5994350910186768, 'learning_rate': 1.4232814428696507e-07, 'epoch': 2.84} +2025-02-06 07:06:45 - ERROR - stderr - 95%|█████████▍| 21264/22434 [20:59:05<48:52, 2.51s/it] +2025-02-06 07:06:48 - ERROR - stderr - 95%|█████████▍| 21265/22434 [20:59:07<49:02, 2.52s/it] +2025-02-06 07:06:48 - ERROR - stderr - +2025-02-06 07:06:48 - ERROR - stderr - +2025-02-06 07:06:48 - INFO - stdout - {'loss': 0.3486, 'grad_norm': 1.4212368726730347, 'learning_rate': 1.4208552983216218e-07, 'epoch': 2.84} +2025-02-06 07:06:48 - ERROR - stderr - 95%|█████████▍| 21265/22434 [20:59:07<49:02, 2.52s/it] +2025-02-06 07:06:50 - ERROR - stderr - 95%|█████████▍| 21266/22434 [20:59:10<49:02, 2.52s/it] +2025-02-06 07:06:50 - ERROR - stderr - +2025-02-06 07:06:50 - ERROR - stderr - +2025-02-06 07:06:50 - INFO - stdout - {'loss': 0.361, 'grad_norm': 1.6336641311645508, 'learning_rate': 1.4184312085615437e-07, 'epoch': 2.84} +2025-02-06 07:06:50 - ERROR - stderr - 95%|█████████▍| 21266/22434 [20:59:10<49:02, 2.52s/it] +2025-02-06 07:06:53 - ERROR - stderr - 95%|█████████▍| 21267/22434 [20:59:12<49:13, 2.53s/it] +2025-02-06 
07:06:53 - ERROR - stderr - +2025-02-06 07:06:53 - ERROR - stderr - +2025-02-06 07:06:53 - INFO - stdout - {'loss': 0.3503, 'grad_norm': 1.3432832956314087, 'learning_rate': 1.4160091736399096e-07, 'epoch': 2.84} +2025-02-06 07:06:53 - ERROR - stderr - 95%|█████████▍| 21267/22434 [20:59:13<49:13, 2.53s/it] +2025-02-06 07:06:55 - ERROR - stderr - 95%|█████████▍| 21268/22434 [20:59:15<48:51, 2.51s/it] +2025-02-06 07:06:55 - ERROR - stderr - +2025-02-06 07:06:55 - ERROR - stderr - +2025-02-06 07:06:55 - INFO - stdout - {'loss': 0.4143, 'grad_norm': 1.6145756244659424, 'learning_rate': 1.4135891936072456e-07, 'epoch': 2.84} +2025-02-06 07:06:55 - ERROR - stderr - 95%|█████████▍| 21268/22434 [20:59:15<48:51, 2.51s/it] +2025-02-06 07:06:58 - ERROR - stderr - 95%|█████████▍| 21269/22434 [20:59:18<51:06, 2.63s/it] +2025-02-06 07:06:58 - ERROR - stderr - +2025-02-06 07:06:58 - ERROR - stderr - +2025-02-06 07:06:58 - INFO - stdout - {'loss': 0.3614, 'grad_norm': 1.443620204925537, 'learning_rate': 1.4111712685139777e-07, 'epoch': 2.84} +2025-02-06 07:06:58 - ERROR - stderr - 95%|█████████▍| 21269/22434 [20:59:18<51:06, 2.63s/it] +2025-02-06 07:07:01 - ERROR - stderr - 95%|█████████▍| 21270/22434 [20:59:20<50:24, 2.60s/it] +2025-02-06 07:07:01 - ERROR - stderr - +2025-02-06 07:07:01 - ERROR - stderr - +2025-02-06 07:07:01 - INFO - stdout - {'loss': 0.3494, 'grad_norm': 1.57658052444458, 'learning_rate': 1.4087553984104995e-07, 'epoch': 2.84} +2025-02-06 07:07:01 - ERROR - stderr - 95%|█████████▍| 21270/22434 [20:59:20<50:24, 2.60s/it] +2025-02-06 07:07:03 - ERROR - stderr - 95%|█████████▍| 21271/22434 [20:59:23<50:27, 2.60s/it] +2025-02-06 07:07:03 - ERROR - stderr - +2025-02-06 07:07:03 - ERROR - stderr - +2025-02-06 07:07:03 - INFO - stdout - {'loss': 0.3669, 'grad_norm': 1.3784065246582031, 'learning_rate': 1.4063415833471815e-07, 'epoch': 2.84} +2025-02-06 07:07:03 - ERROR - stderr - 95%|█████████▍| 21271/22434 [20:59:23<50:27, 2.60s/it] +2025-02-06 07:07:06 - ERROR - 
stderr - 95%|█████████▍| 21272/22434 [20:59:25<49:41, 2.57s/it] +2025-02-06 07:07:06 - ERROR - stderr - +2025-02-06 07:07:06 - ERROR - stderr - +2025-02-06 07:07:06 - INFO - stdout - {'loss': 0.3672, 'grad_norm': 1.827199101448059, 'learning_rate': 1.4039298233743171e-07, 'epoch': 2.84} +2025-02-06 07:07:06 - ERROR - stderr - 95%|█████████▍| 21272/22434 [20:59:26<49:41, 2.57s/it] +2025-02-06 07:07:08 - ERROR - stderr - 95%|█████████▍| 21273/22434 [20:59:28<49:34, 2.56s/it] +2025-02-06 07:07:08 - ERROR - stderr - +2025-02-06 07:07:08 - ERROR - stderr - +2025-02-06 07:07:08 - INFO - stdout - {'loss': 0.3347, 'grad_norm': 1.3486016988754272, 'learning_rate': 1.401520118542199e-07, 'epoch': 2.84} +2025-02-06 07:07:08 - ERROR - stderr - 95%|█████████▍| 21273/22434 [20:59:28<49:34, 2.56s/it] +2025-02-06 07:07:11 - ERROR - stderr - 95%|█████████▍| 21274/22434 [20:59:31<49:30, 2.56s/it] +2025-02-06 07:07:11 - ERROR - stderr - +2025-02-06 07:07:11 - ERROR - stderr - +2025-02-06 07:07:11 - INFO - stdout - {'loss': 0.3606, 'grad_norm': 1.5634711980819702, 'learning_rate': 1.3991124689010426e-07, 'epoch': 2.84} +2025-02-06 07:07:11 - ERROR - stderr - 95%|█████████▍| 21274/22434 [20:59:31<49:30, 2.56s/it] +2025-02-06 07:07:13 - ERROR - stderr - 95%|█████████▍| 21275/22434 [20:59:33<48:48, 2.53s/it] +2025-02-06 07:07:13 - ERROR - stderr - +2025-02-06 07:07:13 - ERROR - stderr - +2025-02-06 07:07:13 - INFO - stdout - {'loss': 0.3962, 'grad_norm': 1.6382722854614258, 'learning_rate': 1.3967068745010305e-07, 'epoch': 2.85} +2025-02-06 07:07:13 - ERROR - stderr - 95%|█████████▍| 21275/22434 [20:59:33<48:48, 2.53s/it] +2025-02-06 07:07:16 - ERROR - stderr - 95%|█████████▍| 21276/22434 [20:59:36<50:43, 2.63s/it] +2025-02-06 07:07:16 - ERROR - stderr - +2025-02-06 07:07:16 - ERROR - stderr - +2025-02-06 07:07:16 - INFO - stdout - {'loss': 0.3406, 'grad_norm': 1.4224644899368286, 'learning_rate': 1.394303335392322e-07, 'epoch': 2.85} +2025-02-06 07:07:16 - ERROR - stderr - 
95%|█████████▍| 21276/22434 [20:59:36<50:43, 2.63s/it] +2025-02-06 07:07:19 - ERROR - stderr - 95%|█████████▍| 21277/22434 [20:59:38<49:48, 2.58s/it] +2025-02-06 07:07:19 - ERROR - stderr - +2025-02-06 07:07:19 - ERROR - stderr - +2025-02-06 07:07:19 - INFO - stdout - {'loss': 0.3643, 'grad_norm': 1.690901756286621, 'learning_rate': 1.3919018516249994e-07, 'epoch': 2.85} +2025-02-06 07:07:19 - ERROR - stderr - 95%|█████████▍| 21277/22434 [20:59:38<49:48, 2.58s/it] +2025-02-06 07:07:21 - ERROR - stderr - 95%|█████████▍| 21278/22434 [20:59:41<49:11, 2.55s/it] +2025-02-06 07:07:21 - ERROR - stderr - +2025-02-06 07:07:21 - ERROR - stderr - +2025-02-06 07:07:21 - INFO - stdout - {'loss': 0.3682, 'grad_norm': 1.618504524230957, 'learning_rate': 1.3895024232491338e-07, 'epoch': 2.85} +2025-02-06 07:07:21 - ERROR - stderr - 95%|█████████▍| 21278/22434 [20:59:41<49:11, 2.55s/it] +2025-02-06 07:07:24 - ERROR - stderr - 95%|█████████▍| 21279/22434 [20:59:43<48:38, 2.53s/it] +2025-02-06 07:07:24 - ERROR - stderr - +2025-02-06 07:07:24 - ERROR - stderr - +2025-02-06 07:07:24 - INFO - stdout - {'loss': 0.365, 'grad_norm': 1.5880351066589355, 'learning_rate': 1.387105050314719e-07, 'epoch': 2.85} +2025-02-06 07:07:24 - ERROR - stderr - 95%|█████████▍| 21279/22434 [20:59:43<48:38, 2.53s/it] +2025-02-06 07:07:26 - ERROR - stderr - 95%|█████████▍| 21280/22434 [20:59:46<48:10, 2.50s/it] +2025-02-06 07:07:26 - ERROR - stderr - +2025-02-06 07:07:26 - ERROR - stderr - +2025-02-06 07:07:26 - INFO - stdout - {'loss': 0.3402, 'grad_norm': 1.5475866794586182, 'learning_rate': 1.3847097328717363e-07, 'epoch': 2.85} +2025-02-06 07:07:26 - ERROR - stderr - 95%|█████████▍| 21280/22434 [20:59:46<48:10, 2.50s/it] +2025-02-06 07:07:29 - ERROR - stderr - 95%|█████████▍| 21281/22434 [20:59:49<49:26, 2.57s/it] +2025-02-06 07:07:29 - ERROR - stderr - +2025-02-06 07:07:29 - ERROR - stderr - +2025-02-06 07:07:29 - INFO - stdout - {'loss': 0.3695, 'grad_norm': 1.650546908378601, 'learning_rate': 
1.3823164709701133e-07, 'epoch': 2.85} +2025-02-06 07:07:29 - ERROR - stderr - 95%|█████████▍| 21281/22434 [20:59:49<49:26, 2.57s/it] +2025-02-06 07:07:31 - ERROR - stderr - 95%|█████████▍| 21282/22434 [20:59:51<49:27, 2.58s/it] +2025-02-06 07:07:31 - ERROR - stderr - +2025-02-06 07:07:31 - ERROR - stderr - +2025-02-06 07:07:31 - INFO - stdout - {'loss': 0.3196, 'grad_norm': 1.5455268621444702, 'learning_rate': 1.3799252646597428e-07, 'epoch': 2.85} +2025-02-06 07:07:31 - ERROR - stderr - 95%|█████████▍| 21282/22434 [20:59:51<49:27, 2.58s/it] +2025-02-06 07:07:34 - ERROR - stderr - 95%|█████████▍| 21283/22434 [20:59:54<50:18, 2.62s/it] +2025-02-06 07:07:34 - ERROR - stderr - +2025-02-06 07:07:34 - ERROR - stderr - +2025-02-06 07:07:34 - INFO - stdout - {'loss': 0.4036, 'grad_norm': 1.831058382987976, 'learning_rate': 1.377536113990463e-07, 'epoch': 2.85} +2025-02-06 07:07:34 - ERROR - stderr - 95%|█████████▍| 21283/22434 [20:59:54<50:18, 2.62s/it] +2025-02-06 07:07:37 - ERROR - stderr - 95%|█████████▍| 21284/22434 [20:59:56<49:48, 2.60s/it] +2025-02-06 07:07:37 - ERROR - stderr - +2025-02-06 07:07:37 - ERROR - stderr - +2025-02-06 07:07:37 - INFO - stdout - {'loss': 0.396, 'grad_norm': 1.7668429613113403, 'learning_rate': 1.3751490190120675e-07, 'epoch': 2.85} +2025-02-06 07:07:37 - ERROR - stderr - 95%|█████████▍| 21284/22434 [20:59:56<49:48, 2.60s/it] +2025-02-06 07:07:39 - ERROR - stderr - 95%|█████████▍| 21285/22434 [20:59:59<49:18, 2.58s/it] +2025-02-06 07:07:39 - ERROR - stderr - +2025-02-06 07:07:39 - ERROR - stderr - +2025-02-06 07:07:39 - INFO - stdout - {'loss': 0.3861, 'grad_norm': 1.4617892503738403, 'learning_rate': 1.3727639797743163e-07, 'epoch': 2.85} +2025-02-06 07:07:39 - ERROR - stderr - 95%|█████████▍| 21285/22434 [20:59:59<49:18, 2.58s/it] +2025-02-06 07:07:42 - ERROR - stderr - 95%|█████████▍| 21286/22434 [21:00:01<48:55, 2.56s/it] +2025-02-06 07:07:42 - ERROR - stderr - +2025-02-06 07:07:42 - ERROR - stderr - +2025-02-06 07:07:42 - INFO - 
stdout - {'loss': 0.3642, 'grad_norm': 1.5171540975570679, 'learning_rate': 1.3703809963269256e-07, 'epoch': 2.85} +2025-02-06 07:07:42 - ERROR - stderr - 95%|█████████▍| 21286/22434 [21:00:01<48:55, 2.56s/it] +2025-02-06 07:07:44 - ERROR - stderr - 95%|█████████▍| 21287/22434 [21:00:04<48:40, 2.55s/it] +2025-02-06 07:07:44 - ERROR - stderr - +2025-02-06 07:07:44 - ERROR - stderr - +2025-02-06 07:07:44 - INFO - stdout - {'loss': 0.3521, 'grad_norm': 1.647857666015625, 'learning_rate': 1.368000068719566e-07, 'epoch': 2.85} +2025-02-06 07:07:44 - ERROR - stderr - 95%|█████████▍| 21287/22434 [21:00:04<48:40, 2.55s/it] +2025-02-06 07:07:47 - ERROR - stderr - 95%|█████████▍| 21288/22434 [21:00:06<48:35, 2.54s/it] +2025-02-06 07:07:47 - ERROR - stderr - +2025-02-06 07:07:47 - ERROR - stderr - +2025-02-06 07:07:47 - INFO - stdout - {'loss': 0.3781, 'grad_norm': 1.5477628707885742, 'learning_rate': 1.365621197001854e-07, 'epoch': 2.85} +2025-02-06 07:07:47 - ERROR - stderr - 95%|█████████▍| 21288/22434 [21:00:07<48:35, 2.54s/it] +2025-02-06 07:07:49 - ERROR - stderr - 95%|█████████▍| 21289/22434 [21:00:09<48:20, 2.53s/it] +2025-02-06 07:07:49 - ERROR - stderr - +2025-02-06 07:07:49 - ERROR - stderr - +2025-02-06 07:07:49 - INFO - stdout - {'loss': 0.3646, 'grad_norm': 1.6630971431732178, 'learning_rate': 1.3632443812233943e-07, 'epoch': 2.85} +2025-02-06 07:07:49 - ERROR - stderr - 95%|█████████▍| 21289/22434 [21:00:09<48:20, 2.53s/it] +2025-02-06 07:07:52 - ERROR - stderr - 95%|█████████▍| 21290/22434 [21:00:11<47:53, 2.51s/it] +2025-02-06 07:07:52 - ERROR - stderr - +2025-02-06 07:07:52 - ERROR - stderr - +2025-02-06 07:07:52 - INFO - stdout - {'loss': 0.3615, 'grad_norm': 1.5150630474090576, 'learning_rate': 1.3608696214337246e-07, 'epoch': 2.85} +2025-02-06 07:07:52 - ERROR - stderr - 95%|█████████▍| 21290/22434 [21:00:11<47:53, 2.51s/it] +2025-02-06 07:07:54 - ERROR - stderr - 95%|█████████▍| 21291/22434 [21:00:14<47:54, 2.51s/it] +2025-02-06 07:07:54 - ERROR - stderr 
- +2025-02-06 07:07:54 - ERROR - stderr - +2025-02-06 07:07:54 - INFO - stdout - {'loss': 0.322, 'grad_norm': 1.4692448377609253, 'learning_rate': 1.3584969176823282e-07, 'epoch': 2.85} +2025-02-06 07:07:54 - ERROR - stderr - 95%|█████████▍| 21291/22434 [21:00:14<47:54, 2.51s/it] +2025-02-06 07:07:57 - ERROR - stderr - 95%|█████████▍| 21292/22434 [21:00:16<47:46, 2.51s/it] +2025-02-06 07:07:57 - ERROR - stderr - +2025-02-06 07:07:57 - ERROR - stderr - +2025-02-06 07:07:57 - INFO - stdout - {'loss': 0.3536, 'grad_norm': 1.6206945180892944, 'learning_rate': 1.3561262700186872e-07, 'epoch': 2.85} +2025-02-06 07:07:57 - ERROR - stderr - 95%|█████████▍| 21292/22434 [21:00:17<47:46, 2.51s/it] +2025-02-06 07:07:59 - ERROR - stderr - 95%|█████████▍| 21293/22434 [21:00:19<47:57, 2.52s/it] +2025-02-06 07:07:59 - ERROR - stderr - +2025-02-06 07:07:59 - ERROR - stderr - +2025-02-06 07:07:59 - INFO - stdout - {'loss': 0.3262, 'grad_norm': 1.3087486028671265, 'learning_rate': 1.3537576784921957e-07, 'epoch': 2.85} +2025-02-06 07:07:59 - ERROR - stderr - 95%|█████████▍| 21293/22434 [21:00:19<47:57, 2.52s/it] +2025-02-06 07:08:02 - ERROR - stderr - 95%|█████████▍| 21294/22434 [21:00:22<47:58, 2.53s/it] +2025-02-06 07:08:02 - ERROR - stderr - +2025-02-06 07:08:02 - ERROR - stderr - +2025-02-06 07:08:02 - INFO - stdout - {'loss': 0.3513, 'grad_norm': 1.5789207220077515, 'learning_rate': 1.3513911431522254e-07, 'epoch': 2.85} +2025-02-06 07:08:02 - ERROR - stderr - 95%|█████████▍| 21294/22434 [21:00:22<47:58, 2.53s/it] +2025-02-06 07:08:04 - ERROR - stderr - 95%|█████████▍| 21295/22434 [21:00:24<47:49, 2.52s/it] +2025-02-06 07:08:04 - ERROR - stderr - +2025-02-06 07:08:04 - ERROR - stderr - +2025-02-06 07:08:04 - INFO - stdout - {'loss': 0.3339, 'grad_norm': 1.472839117050171, 'learning_rate': 1.3490266640481254e-07, 'epoch': 2.85} +2025-02-06 07:08:04 - ERROR - stderr - 95%|█████████▍| 21295/22434 [21:00:24<47:49, 2.52s/it] +2025-02-06 07:08:07 - ERROR - stderr - 95%|█████████▍| 
21296/22434 [21:00:27<48:28, 2.56s/it] +2025-02-06 07:08:07 - ERROR - stderr - +2025-02-06 07:08:07 - ERROR - stderr - +2025-02-06 07:08:07 - INFO - stdout - {'loss': 0.3354, 'grad_norm': 1.4450534582138062, 'learning_rate': 1.3466642412291454e-07, 'epoch': 2.85} +2025-02-06 07:08:07 - ERROR - stderr - 95%|█████████▍| 21296/22434 [21:00:27<48:28, 2.56s/it] +2025-02-06 07:08:09 - ERROR - stderr - 95%|█████████▍| 21297/22434 [21:00:29<48:26, 2.56s/it] +2025-02-06 07:08:10 - ERROR - stderr - +2025-02-06 07:08:10 - ERROR - stderr - +2025-02-06 07:08:10 - INFO - stdout - {'loss': 0.4141, 'grad_norm': 1.7096911668777466, 'learning_rate': 1.344303874744568e-07, 'epoch': 2.85} +2025-02-06 07:08:10 - ERROR - stderr - 95%|█████████▍| 21297/22434 [21:00:29<48:26, 2.56s/it] +2025-02-06 07:08:12 - ERROR - stderr - 95%|█████████▍| 21298/22434 [21:00:32<48:23, 2.56s/it] +2025-02-06 07:08:12 - ERROR - stderr - +2025-02-06 07:08:12 - ERROR - stderr - +2025-02-06 07:08:12 - INFO - stdout - {'loss': 0.3945, 'grad_norm': 1.9521342515945435, 'learning_rate': 1.3419455646435653e-07, 'epoch': 2.85} +2025-02-06 07:08:12 - ERROR - stderr - 95%|█████████▍| 21298/22434 [21:00:32<48:23, 2.56s/it] +2025-02-06 07:08:14 - ERROR - stderr - 95%|█████████▍| 21299/22434 [21:00:34<47:42, 2.52s/it] +2025-02-06 07:08:15 - ERROR - stderr - +2025-02-06 07:08:15 - ERROR - stderr - +2025-02-06 07:08:15 - INFO - stdout - {'loss': 0.3582, 'grad_norm': 1.5692262649536133, 'learning_rate': 1.3395893109752979e-07, 'epoch': 2.85} +2025-02-06 07:08:15 - ERROR - stderr - 95%|█████████▍| 21299/22434 [21:00:34<47:42, 2.52s/it] +2025-02-06 07:08:17 - ERROR - stderr - 95%|█████████▍| 21300/22434 [21:00:37<49:28, 2.62s/it] +2025-02-06 07:08:17 - ERROR - stderr - +2025-02-06 07:08:17 - ERROR - stderr - +2025-02-06 07:08:17 - INFO - stdout - {'loss': 0.4003, 'grad_norm': 1.8141324520111084, 'learning_rate': 1.3372351137888929e-07, 'epoch': 2.85} +2025-02-06 07:08:17 - ERROR - stderr - 95%|█████████▍| 21300/22434 
[21:00:37<49:28, 2.62s/it] +2025-02-06 07:08:20 - ERROR - stderr - 95%|█████████▍| 21301/22434 [21:00:40<49:00, 2.60s/it] +2025-02-06 07:08:20 - ERROR - stderr - +2025-02-06 07:08:20 - ERROR - stderr - +2025-02-06 07:08:20 - INFO - stdout - {'loss': 0.3348, 'grad_norm': 1.3479763269424438, 'learning_rate': 1.3348829731334002e-07, 'epoch': 2.85} +2025-02-06 07:08:20 - ERROR - stderr - 95%|█████████▍| 21301/22434 [21:00:40<49:00, 2.60s/it] +2025-02-06 07:08:23 - ERROR - stderr - 95%|█████████▍| 21302/22434 [21:00:43<50:56, 2.70s/it] +2025-02-06 07:08:23 - ERROR - stderr - +2025-02-06 07:08:23 - ERROR - stderr - +2025-02-06 07:08:23 - INFO - stdout - {'loss': 0.3577, 'grad_norm': 1.5192432403564453, 'learning_rate': 1.3325328890578693e-07, 'epoch': 2.85} +2025-02-06 07:08:23 - ERROR - stderr - 95%|█████████▍| 21302/22434 [21:00:43<50:56, 2.70s/it] +2025-02-06 07:08:25 - ERROR - stderr - 95%|█████████▍| 21303/22434 [21:00:45<49:23, 2.62s/it] +2025-02-06 07:08:25 - ERROR - stderr - +2025-02-06 07:08:25 - ERROR - stderr - +2025-02-06 07:08:25 - INFO - stdout - {'loss': 0.3837, 'grad_norm': 1.7903512716293335, 'learning_rate': 1.3301848616112724e-07, 'epoch': 2.85} +2025-02-06 07:08:25 - ERROR - stderr - 95%|█████████▍| 21303/22434 [21:00:45<49:23, 2.62s/it] +2025-02-06 07:08:28 - ERROR - stderr - 95%|█████████▍| 21304/22434 [21:00:48<50:20, 2.67s/it] +2025-02-06 07:08:28 - ERROR - stderr - +2025-02-06 07:08:28 - ERROR - stderr - +2025-02-06 07:08:28 - INFO - stdout - {'loss': 0.3547, 'grad_norm': 1.4066026210784912, 'learning_rate': 1.3278388908425477e-07, 'epoch': 2.85} +2025-02-06 07:08:28 - ERROR - stderr - 95%|█████████▍| 21304/22434 [21:00:48<50:20, 2.67s/it] +2025-02-06 07:08:30 - ERROR - stderr - 95%|█████████▍| 21305/22434 [21:00:50<48:59, 2.60s/it] +2025-02-06 07:08:31 - ERROR - stderr - +2025-02-06 07:08:31 - ERROR - stderr - +2025-02-06 07:08:31 - INFO - stdout - {'loss': 0.3366, 'grad_norm': 1.4800617694854736, 'learning_rate': 1.325494976800612e-07, 'epoch': 
2.85} +2025-02-06 07:08:31 - ERROR - stderr - 95%|█████████▍| 21305/22434 [21:00:50<48:59, 2.60s/it] +2025-02-06 07:08:33 - ERROR - stderr - 95%|█████████▍| 21306/22434 [21:00:53<48:27, 2.58s/it] +2025-02-06 07:08:33 - ERROR - stderr - +2025-02-06 07:08:33 - ERROR - stderr - +2025-02-06 07:08:33 - INFO - stdout - {'loss': 0.3598, 'grad_norm': 1.5031013488769531, 'learning_rate': 1.323153119534315e-07, 'epoch': 2.85} +2025-02-06 07:08:33 - ERROR - stderr - 95%|█████████▍| 21306/22434 [21:00:53<48:27, 2.58s/it] +2025-02-06 07:08:36 - ERROR - stderr - 95%|█████████▍| 21307/22434 [21:00:55<48:21, 2.57s/it] +2025-02-06 07:08:36 - ERROR - stderr - +2025-02-06 07:08:36 - ERROR - stderr - +2025-02-06 07:08:36 - INFO - stdout - {'loss': 0.319, 'grad_norm': 1.3461406230926514, 'learning_rate': 1.320813319092462e-07, 'epoch': 2.85} +2025-02-06 07:08:36 - ERROR - stderr - 95%|█████████▍| 21307/22434 [21:00:55<48:21, 2.57s/it] +2025-02-06 07:08:38 - ERROR - stderr - 95%|█████████▍| 21308/22434 [21:00:58<47:48, 2.55s/it] +2025-02-06 07:08:38 - ERROR - stderr - +2025-02-06 07:08:38 - ERROR - stderr - +2025-02-06 07:08:38 - INFO - stdout - {'loss': 0.2989, 'grad_norm': 1.5111366510391235, 'learning_rate': 1.3184755755238254e-07, 'epoch': 2.85} +2025-02-06 07:08:38 - ERROR - stderr - 95%|█████████▍| 21308/22434 [21:00:58<47:48, 2.55s/it] +2025-02-06 07:08:41 - ERROR - stderr - 95%|█████████▍| 21309/22434 [21:01:00<48:11, 2.57s/it] +2025-02-06 07:08:41 - ERROR - stderr - +2025-02-06 07:08:41 - ERROR - stderr - +2025-02-06 07:08:41 - INFO - stdout - {'loss': 0.3851, 'grad_norm': 1.3939272165298462, 'learning_rate': 1.3161398888771436e-07, 'epoch': 2.85} +2025-02-06 07:08:41 - ERROR - stderr - 95%|█████████▍| 21309/22434 [21:01:00<48:11, 2.57s/it] +2025-02-06 07:08:43 - ERROR - stderr - 95%|█████████▍| 21310/22434 [21:01:03<47:22, 2.53s/it] +2025-02-06 07:08:43 - ERROR - stderr - +2025-02-06 07:08:43 - ERROR - stderr - +2025-02-06 07:08:43 - INFO - stdout - {'loss': 0.3349, 
'grad_norm': 1.383384108543396, 'learning_rate': 1.313806259201089e-07, 'epoch': 2.85} +2025-02-06 07:08:43 - ERROR - stderr - 95%|█████████▍| 21310/22434 [21:01:03<47:22, 2.53s/it] +2025-02-06 07:08:46 - ERROR - stderr - 95%|█████████▍| 21311/22434 [21:01:05<47:11, 2.52s/it] +2025-02-06 07:08:46 - ERROR - stderr - +2025-02-06 07:08:46 - ERROR - stderr - +2025-02-06 07:08:46 - INFO - stdout - {'loss': 0.3488, 'grad_norm': 1.4775140285491943, 'learning_rate': 1.3114746865443227e-07, 'epoch': 2.85} +2025-02-06 07:08:46 - ERROR - stderr - 95%|█████████▍| 21311/22434 [21:01:05<47:11, 2.52s/it] +2025-02-06 07:08:48 - ERROR - stderr - 95%|█████████▍| 21312/22434 [21:01:08<46:40, 2.50s/it] +2025-02-06 07:08:48 - ERROR - stderr - +2025-02-06 07:08:48 - ERROR - stderr - +2025-02-06 07:08:48 - INFO - stdout - {'loss': 0.3582, 'grad_norm': 1.7025502920150757, 'learning_rate': 1.3091451709554172e-07, 'epoch': 2.85} +2025-02-06 07:08:48 - ERROR - stderr - 95%|█████████▍| 21312/22434 [21:01:08<46:40, 2.50s/it] +2025-02-06 07:08:51 - ERROR - stderr - 95%|█████████▌| 21313/22434 [21:01:10<46:29, 2.49s/it] +2025-02-06 07:08:51 - ERROR - stderr - +2025-02-06 07:08:51 - ERROR - stderr - +2025-02-06 07:08:51 - INFO - stdout - {'loss': 0.3208, 'grad_norm': 1.4109947681427002, 'learning_rate': 1.306817712482955e-07, 'epoch': 2.85} +2025-02-06 07:08:51 - ERROR - stderr - 95%|█████████▌| 21313/22434 [21:01:10<46:29, 2.49s/it] +2025-02-06 07:08:53 - ERROR - stderr - 95%|█████████▌| 21314/22434 [21:01:13<46:23, 2.49s/it] +2025-02-06 07:08:53 - ERROR - stderr - +2025-02-06 07:08:53 - ERROR - stderr - +2025-02-06 07:08:53 - INFO - stdout - {'loss': 0.3905, 'grad_norm': 1.5134276151657104, 'learning_rate': 1.3044923111754427e-07, 'epoch': 2.85} +2025-02-06 07:08:53 - ERROR - stderr - 95%|█████████▌| 21314/22434 [21:01:13<46:23, 2.49s/it] +2025-02-06 07:08:55 - ERROR - stderr - 95%|█████████▌| 21315/22434 [21:01:15<46:15, 2.48s/it] +2025-02-06 07:08:55 - ERROR - stderr - +2025-02-06 07:08:55 - 
ERROR - stderr - +2025-02-06 07:08:55 - INFO - stdout - {'loss': 0.3512, 'grad_norm': 1.352474331855774, 'learning_rate': 1.30216896708133e-07, 'epoch': 2.85} +2025-02-06 07:08:55 - ERROR - stderr - 95%|█████████▌| 21315/22434 [21:01:15<46:15, 2.48s/it] +2025-02-06 07:08:58 - ERROR - stderr - 95%|█████████▌| 21316/22434 [21:01:18<46:07, 2.48s/it] +2025-02-06 07:08:58 - ERROR - stderr - +2025-02-06 07:08:58 - ERROR - stderr - +2025-02-06 07:08:58 - INFO - stdout - {'loss': 0.3508, 'grad_norm': 1.483819603919983, 'learning_rate': 1.2998476802490779e-07, 'epoch': 2.85} +2025-02-06 07:08:58 - ERROR - stderr - 95%|█████████▌| 21316/22434 [21:01:18<46:07, 2.48s/it] +2025-02-06 07:09:00 - ERROR - stderr - 95%|█████████▌| 21317/22434 [21:01:20<45:57, 2.47s/it] +2025-02-06 07:09:00 - ERROR - stderr - +2025-02-06 07:09:00 - ERROR - stderr - +2025-02-06 07:09:00 - INFO - stdout - {'loss': 0.3246, 'grad_norm': 1.434454083442688, 'learning_rate': 1.297528450727048e-07, 'epoch': 2.85} +2025-02-06 07:09:00 - ERROR - stderr - 95%|█████████▌| 21317/22434 [21:01:20<45:57, 2.47s/it] +2025-02-06 07:09:03 - ERROR - stderr - 95%|█████████▌| 21318/22434 [21:01:23<46:00, 2.47s/it] +2025-02-06 07:09:03 - ERROR - stderr - +2025-02-06 07:09:03 - ERROR - stderr - +2025-02-06 07:09:03 - INFO - stdout - {'loss': 0.3678, 'grad_norm': 1.6253951787948608, 'learning_rate': 1.2952112785635796e-07, 'epoch': 2.85} +2025-02-06 07:09:03 - ERROR - stderr - 95%|█████████▌| 21318/22434 [21:01:23<46:00, 2.47s/it] +2025-02-06 07:09:05 - ERROR - stderr - 95%|█████████▌| 21319/22434 [21:01:25<46:14, 2.49s/it] +2025-02-06 07:09:05 - ERROR - stderr - +2025-02-06 07:09:05 - ERROR - stderr - +2025-02-06 07:09:05 - INFO - stdout - {'loss': 0.3511, 'grad_norm': 1.4970555305480957, 'learning_rate': 1.2928961638069893e-07, 'epoch': 2.85} +2025-02-06 07:09:05 - ERROR - stderr - 95%|█████████▌| 21319/22434 [21:01:25<46:14, 2.49s/it] +2025-02-06 07:09:08 - ERROR - stderr - 95%|█████████▌| 21320/22434 [21:01:28<46:14, 
2.49s/it] +2025-02-06 07:09:08 - ERROR - stderr - +2025-02-06 07:09:08 - ERROR - stderr - +2025-02-06 07:09:08 - INFO - stdout - {'loss': 0.3951, 'grad_norm': 1.6115256547927856, 'learning_rate': 1.2905831065055275e-07, 'epoch': 2.85} +2025-02-06 07:09:08 - ERROR - stderr - 95%|█████████▌| 21320/22434 [21:01:28<46:14, 2.49s/it] +2025-02-06 07:09:10 - ERROR - stderr - 95%|█████████▌| 21321/22434 [21:01:30<46:41, 2.52s/it] +2025-02-06 07:09:10 - ERROR - stderr - +2025-02-06 07:09:11 - ERROR - stderr - +2025-02-06 07:09:11 - INFO - stdout - {'loss': 0.3101, 'grad_norm': 1.4637290239334106, 'learning_rate': 1.288272106707411e-07, 'epoch': 2.85} +2025-02-06 07:09:11 - ERROR - stderr - 95%|█████████▌| 21321/22434 [21:01:30<46:41, 2.52s/it] +2025-02-06 07:09:13 - ERROR - stderr - 95%|█████████▌| 21322/22434 [21:01:33<47:06, 2.54s/it] +2025-02-06 07:09:13 - ERROR - stderr - +2025-02-06 07:09:13 - ERROR - stderr - +2025-02-06 07:09:13 - INFO - stdout - {'loss': 0.3884, 'grad_norm': 1.5984907150268555, 'learning_rate': 1.2859631644608016e-07, 'epoch': 2.85} +2025-02-06 07:09:13 - ERROR - stderr - 95%|█████████▌| 21322/22434 [21:01:33<47:06, 2.54s/it] +2025-02-06 07:09:15 - ERROR - stderr - 95%|█████████▌| 21323/22434 [21:01:35<46:23, 2.51s/it] +2025-02-06 07:09:16 - ERROR - stderr - +2025-02-06 07:09:16 - ERROR - stderr - +2025-02-06 07:09:16 - INFO - stdout - {'loss': 0.3725, 'grad_norm': 1.8046034574508667, 'learning_rate': 1.2836562798138275e-07, 'epoch': 2.85} +2025-02-06 07:09:16 - ERROR - stderr - 95%|█████████▌| 21323/22434 [21:01:35<46:23, 2.51s/it] +2025-02-06 07:09:18 - ERROR - stderr - 95%|█████████▌| 21324/22434 [21:01:38<46:35, 2.52s/it] +2025-02-06 07:09:18 - ERROR - stderr - +2025-02-06 07:09:18 - ERROR - stderr - +2025-02-06 07:09:18 - INFO - stdout - {'loss': 0.4227, 'grad_norm': 1.8774304389953613, 'learning_rate': 1.2813514528145833e-07, 'epoch': 2.85} +2025-02-06 07:09:18 - ERROR - stderr - 95%|█████████▌| 21324/22434 [21:01:38<46:35, 2.52s/it] 
+2025-02-06 07:09:21 - ERROR - stderr - 95%|█████████▌| 21325/22434 [21:01:40<46:24, 2.51s/it] +2025-02-06 07:09:21 - ERROR - stderr - +2025-02-06 07:09:21 - ERROR - stderr - +2025-02-06 07:09:21 - INFO - stdout - {'loss': 0.3586, 'grad_norm': 1.5253602266311646, 'learning_rate': 1.2790486835110972e-07, 'epoch': 2.85} +2025-02-06 07:09:21 - ERROR - stderr - 95%|█████████▌| 21325/22434 [21:01:40<46:24, 2.51s/it] +2025-02-06 07:09:23 - ERROR - stderr - 95%|█████████▌| 21326/22434 [21:01:43<46:26, 2.51s/it] +2025-02-06 07:09:23 - ERROR - stderr - +2025-02-06 07:09:23 - ERROR - stderr - +2025-02-06 07:09:23 - INFO - stdout - {'loss': 0.3626, 'grad_norm': 1.6466706991195679, 'learning_rate': 1.2767479719513864e-07, 'epoch': 2.85} +2025-02-06 07:09:23 - ERROR - stderr - 95%|█████████▌| 21326/22434 [21:01:43<46:26, 2.51s/it] +2025-02-06 07:09:26 - ERROR - stderr - 95%|█████████▌| 21327/22434 [21:01:45<46:32, 2.52s/it] +2025-02-06 07:09:26 - ERROR - stderr - +2025-02-06 07:09:26 - ERROR - stderr - +2025-02-06 07:09:26 - INFO - stdout - {'loss': 0.3406, 'grad_norm': 1.4994215965270996, 'learning_rate': 1.2744493181833793e-07, 'epoch': 2.85} +2025-02-06 07:09:26 - ERROR - stderr - 95%|█████████▌| 21327/22434 [21:01:45<46:32, 2.52s/it] +2025-02-06 07:09:28 - ERROR - stderr - 95%|█████████▌| 21328/22434 [21:01:48<46:36, 2.53s/it] +2025-02-06 07:09:28 - ERROR - stderr - +2025-02-06 07:09:28 - ERROR - stderr - +2025-02-06 07:09:28 - INFO - stdout - {'loss': 0.3713, 'grad_norm': 1.596832036972046, 'learning_rate': 1.2721527222550267e-07, 'epoch': 2.85} +2025-02-06 07:09:28 - ERROR - stderr - 95%|█████████▌| 21328/22434 [21:01:48<46:36, 2.53s/it] +2025-02-06 07:09:31 - ERROR - stderr - 95%|█████████▌| 21329/22434 [21:01:50<46:38, 2.53s/it] +2025-02-06 07:09:31 - ERROR - stderr - +2025-02-06 07:09:31 - ERROR - stderr - +2025-02-06 07:09:31 - INFO - stdout - {'loss': 0.3822, 'grad_norm': 1.5940698385238647, 'learning_rate': 1.2698581842141567e-07, 'epoch': 2.85} +2025-02-06 07:09:31 
- ERROR - stderr - 95%|█████████▌| 21329/22434 [21:01:50<46:38, 2.53s/it] +2025-02-06 07:09:33 - ERROR - stderr - 95%|█████████▌| 21330/22434 [21:01:53<46:49, 2.55s/it] +2025-02-06 07:09:33 - ERROR - stderr - +2025-02-06 07:09:33 - ERROR - stderr - +2025-02-06 07:09:33 - INFO - stdout - {'loss': 0.342, 'grad_norm': 1.6018999814987183, 'learning_rate': 1.267565704108642e-07, 'epoch': 2.85} +2025-02-06 07:09:33 - ERROR - stderr - 95%|█████████▌| 21330/22434 [21:01:53<46:49, 2.55s/it] +2025-02-06 07:09:36 - ERROR - stderr - 95%|█████████▌| 21331/22434 [21:01:56<47:24, 2.58s/it] +2025-02-06 07:09:36 - ERROR - stderr - +2025-02-06 07:09:36 - ERROR - stderr - +2025-02-06 07:09:36 - INFO - stdout - {'loss': 0.3715, 'grad_norm': 1.4740620851516724, 'learning_rate': 1.2652752819862225e-07, 'epoch': 2.85} +2025-02-06 07:09:36 - ERROR - stderr - 95%|█████████▌| 21331/22434 [21:01:56<47:24, 2.58s/it] +2025-02-06 07:09:38 - ERROR - stderr - 95%|█████████▌| 21332/22434 [21:01:58<47:03, 2.56s/it] +2025-02-06 07:09:38 - ERROR - stderr - +2025-02-06 07:09:38 - ERROR - stderr - +2025-02-06 07:09:38 - INFO - stdout - {'loss': 0.3468, 'grad_norm': 1.371578335762024, 'learning_rate': 1.2629869178946708e-07, 'epoch': 2.85} +2025-02-06 07:09:38 - ERROR - stderr - 95%|█████████▌| 21332/22434 [21:01:58<47:03, 2.56s/it] +2025-02-06 07:09:41 - ERROR - stderr - 95%|█████████▌| 21333/22434 [21:02:01<46:28, 2.53s/it] +2025-02-06 07:09:41 - ERROR - stderr - +2025-02-06 07:09:41 - ERROR - stderr - +2025-02-06 07:09:41 - INFO - stdout - {'loss': 0.3388, 'grad_norm': 1.3511689901351929, 'learning_rate': 1.2607006118816712e-07, 'epoch': 2.85} +2025-02-06 07:09:41 - ERROR - stderr - 95%|█████████▌| 21333/22434 [21:02:01<46:28, 2.53s/it] +2025-02-06 07:09:43 - ERROR - stderr - 95%|█████████▌| 21334/22434 [21:02:03<45:57, 2.51s/it] +2025-02-06 07:09:43 - ERROR - stderr - +2025-02-06 07:09:43 - ERROR - stderr - +2025-02-06 07:09:43 - INFO - stdout - {'loss': 0.4007, 'grad_norm': 1.7066760063171387, 
'learning_rate': 1.2584163639948853e-07, 'epoch': 2.85} +2025-02-06 07:09:43 - ERROR - stderr - 95%|█████████▌| 21334/22434 [21:02:03<45:57, 2.51s/it] +2025-02-06 07:09:46 - ERROR - stderr - 95%|█████████▌| 21335/22434 [21:02:06<45:48, 2.50s/it] +2025-02-06 07:09:46 - ERROR - stderr - +2025-02-06 07:09:46 - ERROR - stderr - +2025-02-06 07:09:46 - INFO - stdout - {'loss': 0.3679, 'grad_norm': 1.5390831232070923, 'learning_rate': 1.2561341742819422e-07, 'epoch': 2.85} +2025-02-06 07:09:46 - ERROR - stderr - 95%|█████████▌| 21335/22434 [21:02:06<45:48, 2.50s/it] +2025-02-06 07:09:48 - ERROR - stderr - 95%|█████████▌| 21336/22434 [21:02:08<45:47, 2.50s/it] +2025-02-06 07:09:48 - ERROR - stderr - +2025-02-06 07:09:48 - ERROR - stderr - +2025-02-06 07:09:48 - INFO - stdout - {'loss': 0.3767, 'grad_norm': 1.6075454950332642, 'learning_rate': 1.25385404279037e-07, 'epoch': 2.85} +2025-02-06 07:09:48 - ERROR - stderr - 95%|█████████▌| 21336/22434 [21:02:08<45:47, 2.50s/it] +2025-02-06 07:09:51 - ERROR - stderr - 95%|█████████▌| 21337/22434 [21:02:11<46:01, 2.52s/it] +2025-02-06 07:09:51 - ERROR - stderr - +2025-02-06 07:09:51 - ERROR - stderr - +2025-02-06 07:09:51 - INFO - stdout - {'loss': 0.3919, 'grad_norm': 1.6106818914413452, 'learning_rate': 1.2515759695677309e-07, 'epoch': 2.85} +2025-02-06 07:09:51 - ERROR - stderr - 95%|█████████▌| 21337/22434 [21:02:11<46:01, 2.52s/it] +2025-02-06 07:09:53 - ERROR - stderr - 95%|█████████▌| 21338/22434 [21:02:13<46:03, 2.52s/it] +2025-02-06 07:09:53 - ERROR - stderr - +2025-02-06 07:09:53 - ERROR - stderr - +2025-02-06 07:09:53 - INFO - stdout - {'loss': 0.3902, 'grad_norm': 1.5912941694259644, 'learning_rate': 1.2492999546614982e-07, 'epoch': 2.85} +2025-02-06 07:09:53 - ERROR - stderr - 95%|█████████▌| 21338/22434 [21:02:13<46:03, 2.52s/it] +2025-02-06 07:09:56 - ERROR - stderr - 95%|█████████▌| 21339/22434 [21:02:16<46:14, 2.53s/it] +2025-02-06 07:09:56 - ERROR - stderr - +2025-02-06 07:09:56 - ERROR - stderr - +2025-02-06 
07:09:56 - INFO - stdout - {'loss': 0.3683, 'grad_norm': 1.486480951309204, 'learning_rate': 1.2470259981191113e-07, 'epoch': 2.85} +2025-02-06 07:09:56 - ERROR - stderr - 95%|█████████▌| 21339/22434 [21:02:16<46:14, 2.53s/it] +2025-02-06 07:09:58 - ERROR - stderr - 95%|█████████▌| 21340/22434 [21:02:18<45:50, 2.51s/it] +2025-02-06 07:09:58 - ERROR - stderr - +2025-02-06 07:09:58 - ERROR - stderr - +2025-02-06 07:09:58 - INFO - stdout - {'loss': 0.3516, 'grad_norm': 1.486699104309082, 'learning_rate': 1.244754099987977e-07, 'epoch': 2.85} +2025-02-06 07:09:58 - ERROR - stderr - 95%|█████████▌| 21340/22434 [21:02:18<45:50, 2.51s/it] +2025-02-06 07:10:01 - ERROR - stderr - 95%|█████████▌| 21341/22434 [21:02:21<46:54, 2.58s/it] +2025-02-06 07:10:01 - ERROR - stderr - +2025-02-06 07:10:01 - ERROR - stderr - +2025-02-06 07:10:01 - INFO - stdout - {'loss': 0.3886, 'grad_norm': 1.6901589632034302, 'learning_rate': 1.2424842603154353e-07, 'epoch': 2.85} +2025-02-06 07:10:01 - ERROR - stderr - 95%|█████████▌| 21341/22434 [21:02:21<46:54, 2.58s/it] +2025-02-06 07:10:04 - ERROR - stderr - 95%|█████████▌| 21342/22434 [21:02:23<46:25, 2.55s/it] +2025-02-06 07:10:04 - ERROR - stderr - +2025-02-06 07:10:04 - ERROR - stderr - +2025-02-06 07:10:04 - INFO - stdout - {'loss': 0.3409, 'grad_norm': 1.611914873123169, 'learning_rate': 1.2402164791488146e-07, 'epoch': 2.85} +2025-02-06 07:10:04 - ERROR - stderr - 95%|█████████▌| 21342/22434 [21:02:23<46:25, 2.55s/it] +2025-02-06 07:10:06 - ERROR - stderr - 95%|█████████▌| 21343/22434 [21:02:26<45:59, 2.53s/it] +2025-02-06 07:10:06 - ERROR - stderr - +2025-02-06 07:10:06 - ERROR - stderr - +2025-02-06 07:10:06 - INFO - stdout - {'loss': 0.4039, 'grad_norm': 1.7068796157836914, 'learning_rate': 1.2379507565353776e-07, 'epoch': 2.85} +2025-02-06 07:10:06 - ERROR - stderr - 95%|█████████▌| 21343/22434 [21:02:26<45:59, 2.53s/it] +2025-02-06 07:10:09 - ERROR - stderr - 95%|█████████▌| 21344/22434 [21:02:28<45:48, 2.52s/it] +2025-02-06 07:10:09 
- ERROR - stderr - +2025-02-06 07:10:09 - ERROR - stderr - +2025-02-06 07:10:09 - INFO - stdout - {'loss': 0.4436, 'grad_norm': 1.8993489742279053, 'learning_rate': 1.2356870925223528e-07, 'epoch': 2.85} +2025-02-06 07:10:09 - ERROR - stderr - 95%|█████████▌| 21344/22434 [21:02:28<45:48, 2.52s/it] +2025-02-06 07:10:11 - ERROR - stderr - 95%|█████████▌| 21345/22434 [21:02:31<45:32, 2.51s/it] +2025-02-06 07:10:11 - ERROR - stderr - +2025-02-06 07:10:11 - ERROR - stderr - +2025-02-06 07:10:11 - INFO - stdout - {'loss': 0.3186, 'grad_norm': 1.545749545097351, 'learning_rate': 1.2334254871569252e-07, 'epoch': 2.85} +2025-02-06 07:10:11 - ERROR - stderr - 95%|█████████▌| 21345/22434 [21:02:31<45:32, 2.51s/it] +2025-02-06 07:10:14 - ERROR - stderr - 95%|█████████▌| 21346/22434 [21:02:33<45:42, 2.52s/it] +2025-02-06 07:10:14 - ERROR - stderr - +2025-02-06 07:10:14 - ERROR - stderr - +2025-02-06 07:10:14 - INFO - stdout - {'loss': 0.3637, 'grad_norm': 1.6281611919403076, 'learning_rate': 1.231165940486234e-07, 'epoch': 2.85} +2025-02-06 07:10:14 - ERROR - stderr - 95%|█████████▌| 21346/22434 [21:02:33<45:42, 2.52s/it] +2025-02-06 07:10:16 - ERROR - stderr - 95%|█████████▌| 21347/22434 [21:02:36<45:12, 2.50s/it] +2025-02-06 07:10:16 - ERROR - stderr - +2025-02-06 07:10:16 - ERROR - stderr - +2025-02-06 07:10:16 - INFO - stdout - {'loss': 0.3402, 'grad_norm': 1.7126564979553223, 'learning_rate': 1.2289084525573646e-07, 'epoch': 2.85} +2025-02-06 07:10:16 - ERROR - stderr - 95%|█████████▌| 21347/22434 [21:02:36<45:12, 2.50s/it] +2025-02-06 07:10:19 - ERROR - stderr - 95%|█████████▌| 21348/22434 [21:02:38<45:06, 2.49s/it] +2025-02-06 07:10:19 - ERROR - stderr - +2025-02-06 07:10:19 - ERROR - stderr - +2025-02-06 07:10:19 - INFO - stdout - {'loss': 0.3675, 'grad_norm': 1.7417253255844116, 'learning_rate': 1.2266530234174013e-07, 'epoch': 2.85} +2025-02-06 07:10:19 - ERROR - stderr - 95%|█████████▌| 21348/22434 [21:02:38<45:06, 2.49s/it] +2025-02-06 07:10:21 - ERROR - stderr - 
95%|█████████▌| 21349/22434 [21:02:41<44:57, 2.49s/it] +2025-02-06 07:10:21 - ERROR - stderr - +2025-02-06 07:10:21 - ERROR - stderr - +2025-02-06 07:10:21 - INFO - stdout - {'loss': 0.389, 'grad_norm': 1.5606211423873901, 'learning_rate': 1.2243996531133284e-07, 'epoch': 2.85} +2025-02-06 07:10:21 - ERROR - stderr - 95%|█████████▌| 21349/22434 [21:02:41<44:57, 2.49s/it] +2025-02-06 07:10:24 - ERROR - stderr - 95%|█████████▌| 21350/22434 [21:02:43<44:44, 2.48s/it] +2025-02-06 07:10:24 - ERROR - stderr - +2025-02-06 07:10:24 - ERROR - stderr - +2025-02-06 07:10:24 - INFO - stdout - {'loss': 0.3868, 'grad_norm': 1.767960786819458, 'learning_rate': 1.222148341692131e-07, 'epoch': 2.86} +2025-02-06 07:10:24 - ERROR - stderr - 95%|█████████▌| 21350/22434 [21:02:43<44:44, 2.48s/it] +2025-02-06 07:10:26 - ERROR - stderr - 95%|█████████▌| 21351/22434 [21:02:46<44:57, 2.49s/it] +2025-02-06 07:10:26 - ERROR - stderr - +2025-02-06 07:10:26 - ERROR - stderr - +2025-02-06 07:10:26 - INFO - stdout - {'loss': 0.3548, 'grad_norm': 1.6719179153442383, 'learning_rate': 1.219899089200738e-07, 'epoch': 2.86} +2025-02-06 07:10:26 - ERROR - stderr - 95%|█████████▌| 21351/22434 [21:02:46<44:57, 2.49s/it] +2025-02-06 07:10:29 - ERROR - stderr - 95%|█████████▌| 21352/22434 [21:02:48<44:56, 2.49s/it] +2025-02-06 07:10:29 - ERROR - stderr - +2025-02-06 07:10:29 - ERROR - stderr - +2025-02-06 07:10:29 - INFO - stdout - {'loss': 0.355, 'grad_norm': 1.600157380104065, 'learning_rate': 1.217651895686023e-07, 'epoch': 2.86} +2025-02-06 07:10:29 - ERROR - stderr - 95%|█████████▌| 21352/22434 [21:02:48<44:56, 2.49s/it] +2025-02-06 07:10:31 - ERROR - stderr - 95%|█████████▌| 21353/22434 [21:02:51<44:52, 2.49s/it] +2025-02-06 07:10:31 - ERROR - stderr - +2025-02-06 07:10:31 - ERROR - stderr - +2025-02-06 07:10:31 - INFO - stdout - {'loss': 0.3239, 'grad_norm': 1.5621857643127441, 'learning_rate': 1.215406761194826e-07, 'epoch': 2.86} +2025-02-06 07:10:31 - ERROR - stderr - 95%|█████████▌|
21353/22434 [21:02:51<44:52, 2.49s/it] +2025-02-06 07:10:34 - ERROR - stderr - 95%|█████████▌| 21354/22434 [21:02:53<45:20, 2.52s/it] +2025-02-06 07:10:34 - ERROR - stderr - +2025-02-06 07:10:34 - ERROR - stderr - +2025-02-06 07:10:34 - INFO - stdout - {'loss': 0.323, 'grad_norm': 1.4373791217803955, 'learning_rate': 1.2131636857739548e-07, 'epoch': 2.86} +2025-02-06 07:10:34 - ERROR - stderr - 95%|█████████▌| 21354/22434 [21:02:53<45:20, 2.52s/it] +2025-02-06 07:10:36 - ERROR - stderr - 95%|█████████▌| 21355/22434 [21:02:56<45:09, 2.51s/it] +2025-02-06 07:10:36 - ERROR - stderr - +2025-02-06 07:10:36 - ERROR - stderr - +2025-02-06 07:10:36 - INFO - stdout - {'loss': 0.3439, 'grad_norm': 1.8352621793746948, 'learning_rate': 1.210922669470149e-07, 'epoch': 2.86} +2025-02-06 07:10:36 - ERROR - stderr - 95%|█████████▌| 21355/22434 [21:02:56<45:09, 2.51s/it] +2025-02-06 07:10:39 - ERROR - stderr - 95%|█████████▌| 21356/22434 [21:02:58<45:05, 2.51s/it] +2025-02-06 07:10:39 - ERROR - stderr - +2025-02-06 07:10:39 - ERROR - stderr - +2025-02-06 07:10:39 - INFO - stdout - {'loss': 0.36, 'grad_norm': 1.5156341791152954, 'learning_rate': 1.2086837123301388e-07, 'epoch': 2.86} +2025-02-06 07:10:39 - ERROR - stderr - 95%|█████████▌| 21356/22434 [21:02:58<45:05, 2.51s/it] +2025-02-06 07:10:41 - ERROR - stderr - 95%|█████████▌| 21357/22434 [21:03:01<44:29, 2.48s/it] +2025-02-06 07:10:41 - ERROR - stderr - +2025-02-06 07:10:41 - ERROR - stderr - +2025-02-06 07:10:41 - INFO - stdout - {'loss': 0.3544, 'grad_norm': 1.8039201498031616, 'learning_rate': 1.2064468144005637e-07, 'epoch': 2.86} +2025-02-06 07:10:41 - ERROR - stderr - 95%|█████████▌| 21357/22434 [21:03:01<44:29, 2.48s/it] +2025-02-06 07:10:44 - ERROR - stderr - 95%|█████████▌| 21358/22434 [21:03:03<44:34, 2.49s/it] +2025-02-06 07:10:44 - ERROR - stderr - +2025-02-06 07:10:44 - ERROR - stderr - +2025-02-06 07:10:44 - INFO - stdout - {'loss': 0.3725, 'grad_norm': 1.5010358095169067, 'learning_rate': 1.2042119757280867e-07, 
'epoch': 2.86} +2025-02-06 07:10:44 - ERROR - stderr - 95%|█████████▌| 21358/22434 [21:03:03<44:34, 2.49s/it] +2025-02-06 07:10:46 - ERROR - stderr - 95%|█████████▌| 21359/22434 [21:03:06<44:43, 2.50s/it] +2025-02-06 07:10:46 - ERROR - stderr - +2025-02-06 07:10:46 - ERROR - stderr - +2025-02-06 07:10:46 - INFO - stdout - {'loss': 0.3567, 'grad_norm': 1.594415307044983, 'learning_rate': 1.201979196359282e-07, 'epoch': 2.86} +2025-02-06 07:10:46 - ERROR - stderr - 95%|█████████▌| 21359/22434 [21:03:06<44:43, 2.50s/it] +2025-02-06 07:10:49 - ERROR - stderr - 95%|█████████▌| 21360/22434 [21:03:08<45:05, 2.52s/it] +2025-02-06 07:10:49 - ERROR - stderr - +2025-02-06 07:10:49 - ERROR - stderr - +2025-02-06 07:10:49 - INFO - stdout - {'loss': 0.3078, 'grad_norm': 1.3552346229553223, 'learning_rate': 1.1997484763406564e-07, 'epoch': 2.86} +2025-02-06 07:10:49 - ERROR - stderr - 95%|█████████▌| 21360/22434 [21:03:08<45:05, 2.52s/it] +2025-02-06 07:10:51 - ERROR - stderr - 95%|█████████▌| 21361/22434 [21:03:11<44:57, 2.51s/it] +2025-02-06 07:10:51 - ERROR - stderr - +2025-02-06 07:10:51 - ERROR - stderr - +2025-02-06 07:10:51 - INFO - stdout - {'loss': 0.4023, 'grad_norm': 1.504364013671875, 'learning_rate': 1.1975198157187507e-07, 'epoch': 2.86} +2025-02-06 07:10:51 - ERROR - stderr - 95%|█████████▌| 21361/22434 [21:03:11<44:57, 2.51s/it] +2025-02-06 07:10:54 - ERROR - stderr - 95%|█████████▌| 21362/22434 [21:03:13<45:09, 2.53s/it] +2025-02-06 07:10:54 - ERROR - stderr - +2025-02-06 07:10:54 - ERROR - stderr - +2025-02-06 07:10:54 - INFO - stdout - {'loss': 0.3743, 'grad_norm': 1.5708248615264893, 'learning_rate': 1.1952932145399943e-07, 'epoch': 2.86} +2025-02-06 07:10:54 - ERROR - stderr - 95%|█████████▌| 21362/22434 [21:03:13<45:09, 2.53s/it] +2025-02-06 07:10:56 - ERROR - stderr - 95%|█████████▌| 21363/22434 [21:03:16<45:00, 2.52s/it] +2025-02-06 07:10:56 - ERROR - stderr - +2025-02-06 07:10:56 - ERROR - stderr - +2025-02-06 07:10:56 - INFO - stdout - {'loss': 0.346, 
'grad_norm': 1.5693808794021606, 'learning_rate': 1.1930686728508055e-07, 'epoch': 2.86} +2025-02-06 07:10:56 - ERROR - stderr - 95%|█████████▌| 21363/22434 [21:03:16<45:00, 2.52s/it] +2025-02-06 07:10:59 - ERROR - stderr - 95%|█████████▌| 21364/22434 [21:03:18<44:59, 2.52s/it] +2025-02-06 07:10:59 - ERROR - stderr - +2025-02-06 07:10:59 - ERROR - stderr - +2025-02-06 07:10:59 - INFO - stdout - {'loss': 0.3427, 'grad_norm': 1.5300365686416626, 'learning_rate': 1.1908461906975588e-07, 'epoch': 2.86} +2025-02-06 07:10:59 - ERROR - stderr - 95%|█████████▌| 21364/22434 [21:03:19<44:59, 2.52s/it] +2025-02-06 07:11:01 - ERROR - stderr - 95%|█████████▌| 21365/22434 [21:03:21<45:13, 2.54s/it] +2025-02-06 07:11:01 - ERROR - stderr - +2025-02-06 07:11:01 - ERROR - stderr - +2025-02-06 07:11:01 - INFO - stdout - {'loss': 0.3315, 'grad_norm': 1.5368432998657227, 'learning_rate': 1.1886257681265722e-07, 'epoch': 2.86} +2025-02-06 07:11:01 - ERROR - stderr - 95%|█████████▌| 21365/22434 [21:03:21<45:13, 2.54s/it] +2025-02-06 07:11:04 - ERROR - stderr - 95%|█████████▌| 21366/22434 [21:03:24<44:51, 2.52s/it] +2025-02-06 07:11:04 - ERROR - stderr - +2025-02-06 07:11:04 - ERROR - stderr - +2025-02-06 07:11:04 - INFO - stdout - {'loss': 0.3715, 'grad_norm': 1.4282829761505127, 'learning_rate': 1.1864074051841202e-07, 'epoch': 2.86} +2025-02-06 07:11:04 - ERROR - stderr - 95%|█████████▌| 21366/22434 [21:03:24<44:51, 2.52s/it] +2025-02-06 07:11:06 - ERROR - stderr - 95%|█████████▌| 21367/22434 [21:03:26<44:58, 2.53s/it] +2025-02-06 07:11:06 - ERROR - stderr - +2025-02-06 07:11:06 - ERROR - stderr - +2025-02-06 07:11:06 - INFO - stdout - {'loss': 0.3318, 'grad_norm': 1.4215545654296875, 'learning_rate': 1.1841911019164542e-07, 'epoch': 2.86} +2025-02-06 07:11:06 - ERROR - stderr - 95%|█████████▌| 21367/22434 [21:03:26<44:58, 2.53s/it] +2025-02-06 07:11:09 - ERROR - stderr - 95%|█████████▌| 21368/22434 [21:03:29<44:51, 2.53s/it] +2025-02-06 07:11:09 - ERROR - stderr - +2025-02-06 07:11:09 
- ERROR - stderr - +2025-02-06 07:11:09 - INFO - stdout - {'loss': 0.3821, 'grad_norm': 1.6703910827636719, 'learning_rate': 1.1819768583697711e-07, 'epoch': 2.86} +2025-02-06 07:11:09 - ERROR - stderr - 95%|█████████▌| 21368/22434 [21:03:29<44:51, 2.53s/it] +2025-02-06 07:11:11 - ERROR - stderr - 95%|█████████▌| 21369/22434 [21:03:31<44:27, 2.50s/it] +2025-02-06 07:11:11 - ERROR - stderr - +2025-02-06 07:11:11 - ERROR - stderr - +2025-02-06 07:11:11 - INFO - stdout - {'loss': 0.3071, 'grad_norm': 1.3917287588119507, 'learning_rate': 1.1797646745902225e-07, 'epoch': 2.86} +2025-02-06 07:11:11 - ERROR - stderr - 95%|█████████▌| 21369/22434 [21:03:31<44:27, 2.50s/it] +2025-02-06 07:11:14 - ERROR - stderr - 95%|█████████▌| 21370/22434 [21:03:34<44:20, 2.50s/it] +2025-02-06 07:11:14 - ERROR - stderr - +2025-02-06 07:11:14 - ERROR - stderr - +2025-02-06 07:11:14 - INFO - stdout - {'loss': 0.3424, 'grad_norm': 1.5248653888702393, 'learning_rate': 1.1775545506239161e-07, 'epoch': 2.86} +2025-02-06 07:11:14 - ERROR - stderr - 95%|█████████▌| 21370/22434 [21:03:34<44:20, 2.50s/it] +2025-02-06 07:11:16 - ERROR - stderr - 95%|█████████▌| 21371/22434 [21:03:36<44:03, 2.49s/it] +2025-02-06 07:11:16 - ERROR - stderr - +2025-02-06 07:11:16 - ERROR - stderr - +2025-02-06 07:11:16 - INFO - stdout - {'loss': 0.3458, 'grad_norm': 1.5397793054580688, 'learning_rate': 1.1753464865169261e-07, 'epoch': 2.86} +2025-02-06 07:11:16 - ERROR - stderr - 95%|█████████▌| 21371/22434 [21:03:36<44:03, 2.49s/it] +2025-02-06 07:11:20 - ERROR - stderr - 95%|█████████▌| 21372/22434 [21:03:39<49:09, 2.78s/it] +2025-02-06 07:11:20 - ERROR - stderr - +2025-02-06 07:11:20 - ERROR - stderr - +2025-02-06 07:11:20 - INFO - stdout - {'loss': 0.3887, 'grad_norm': 1.5369126796722412, 'learning_rate': 1.1731404823152603e-07, 'epoch': 2.86} +2025-02-06 07:11:20 - ERROR - stderr - 95%|█████████▌| 21372/22434 [21:03:39<49:09, 2.78s/it] +2025-02-06 07:11:22 - ERROR - stderr - 95%|█████████▌| 21373/22434 
[21:03:42<47:42, 2.70s/it] +2025-02-06 07:11:22 - ERROR - stderr - +2025-02-06 07:11:22 - ERROR - stderr - +2025-02-06 07:11:22 - INFO - stdout - {'loss': 0.3518, 'grad_norm': 1.5674604177474976, 'learning_rate': 1.1709365380649263e-07, 'epoch': 2.86} +2025-02-06 07:11:22 - ERROR - stderr - 95%|█████████▌| 21373/22434 [21:03:42<47:42, 2.70s/it] +2025-02-06 07:11:25 - ERROR - stderr - 95%|█████████▌| 21374/22434 [21:03:44<46:16, 2.62s/it] +2025-02-06 07:11:25 - ERROR - stderr - +2025-02-06 07:11:25 - ERROR - stderr - +2025-02-06 07:11:25 - INFO - stdout - {'loss': 0.3229, 'grad_norm': 1.619637131690979, 'learning_rate': 1.1687346538118538e-07, 'epoch': 2.86} +2025-02-06 07:11:25 - ERROR - stderr - 95%|█████████▌| 21374/22434 [21:03:44<46:16, 2.62s/it] +2025-02-06 07:11:27 - ERROR - stderr - 95%|█████████▌| 21375/22434 [21:03:47<45:20, 2.57s/it] +2025-02-06 07:11:27 - ERROR - stderr - +2025-02-06 07:11:27 - ERROR - stderr - +2025-02-06 07:11:27 - INFO - stdout - {'loss': 0.3912, 'grad_norm': 1.7939313650131226, 'learning_rate': 1.1665348296019396e-07, 'epoch': 2.86} +2025-02-06 07:11:27 - ERROR - stderr - 95%|█████████▌| 21375/22434 [21:03:47<45:20, 2.57s/it] +2025-02-06 07:11:30 - ERROR - stderr - 95%|█████████▌| 21376/22434 [21:03:49<44:48, 2.54s/it] +2025-02-06 07:11:30 - ERROR - stderr - +2025-02-06 07:11:30 - ERROR - stderr - +2025-02-06 07:11:30 - INFO - stdout - {'loss': 0.3477, 'grad_norm': 1.4504554271697998, 'learning_rate': 1.1643370654810138e-07, 'epoch': 2.86} +2025-02-06 07:11:30 - ERROR - stderr - 95%|█████████▌| 21376/22434 [21:03:49<44:48, 2.54s/it] +2025-02-06 07:11:32 - ERROR - stderr - 95%|█████████▌| 21377/22434 [21:03:52<45:01, 2.56s/it] +2025-02-06 07:11:32 - ERROR - stderr - +2025-02-06 07:11:32 - ERROR - stderr - +2025-02-06 07:11:32 - INFO - stdout - {'loss': 0.3242, 'grad_norm': 1.4417351484298706, 'learning_rate': 1.1621413614949173e-07, 'epoch': 2.86} +2025-02-06 07:11:32 - ERROR - stderr - 95%|█████████▌| 21377/22434 [21:03:52<45:01, 
2.56s/it] +2025-02-06 07:11:35 - ERROR - stderr - 95%|█████████▌| 21378/22434 [21:03:54<44:28, 2.53s/it] +2025-02-06 07:11:35 - ERROR - stderr - +2025-02-06 07:11:35 - ERROR - stderr - +2025-02-06 07:11:35 - INFO - stdout - {'loss': 0.3988, 'grad_norm': 1.7811124324798584, 'learning_rate': 1.1599477176894136e-07, 'epoch': 2.86} +2025-02-06 07:11:35 - ERROR - stderr - 95%|█████████▌| 21378/22434 [21:03:54<44:28, 2.53s/it] +2025-02-06 07:11:37 - ERROR - stderr - 95%|█████████▌| 21379/22434 [21:03:57<44:08, 2.51s/it] +2025-02-06 07:11:37 - ERROR - stderr - +2025-02-06 07:11:37 - ERROR - stderr - +2025-02-06 07:11:37 - INFO - stdout - {'loss': 0.3849, 'grad_norm': 1.7639611959457397, 'learning_rate': 1.1577561341102106e-07, 'epoch': 2.86} +2025-02-06 07:11:37 - ERROR - stderr - 95%|█████████▌| 21379/22434 [21:03:57<44:08, 2.51s/it] +2025-02-06 07:11:40 - ERROR - stderr - 95%|█████████▌| 21380/22434 [21:03:59<43:53, 2.50s/it] +2025-02-06 07:11:40 - ERROR - stderr - +2025-02-06 07:11:40 - ERROR - stderr - +2025-02-06 07:11:40 - INFO - stdout - {'loss': 0.3746, 'grad_norm': 1.5757520198822021, 'learning_rate': 1.155566610803005e-07, 'epoch': 2.86} +2025-02-06 07:11:40 - ERROR - stderr - 95%|█████████▌| 21380/22434 [21:03:59<43:53, 2.50s/it] +2025-02-06 07:11:42 - ERROR - stderr - 95%|█████████▌| 21381/22434 [21:04:02<43:31, 2.48s/it] +2025-02-06 07:11:42 - ERROR - stderr - +2025-02-06 07:11:42 - ERROR - stderr - +2025-02-06 07:11:42 - INFO - stdout - {'loss': 0.3198, 'grad_norm': 1.453657865524292, 'learning_rate': 1.1533791478134271e-07, 'epoch': 2.86} +2025-02-06 07:11:42 - ERROR - stderr - 95%|█████████▌| 21381/22434 [21:04:02<43:31, 2.48s/it] +2025-02-06 07:11:45 - ERROR - stderr - 95%|█████████▌| 21382/22434 [21:04:04<43:55, 2.51s/it] +2025-02-06 07:11:45 - ERROR - stderr - +2025-02-06 07:11:45 - ERROR - stderr - +2025-02-06 07:11:45 - INFO - stdout - {'loss': 0.3508, 'grad_norm': 1.4833120107650757, 'learning_rate': 1.1511937451870737e-07, 'epoch': 2.86} +2025-02-06 
07:11:45 - ERROR - stderr - 95%|█████████▌| 21382/22434 [21:04:04<43:55, 2.51s/it] +2025-02-06 07:11:47 - ERROR - stderr - 95%|█████████▌| 21383/22434 [21:04:07<44:09, 2.52s/it] +2025-02-06 07:11:47 - ERROR - stderr - +2025-02-06 07:11:47 - ERROR - stderr - +2025-02-06 07:11:47 - INFO - stdout - {'loss': 0.4032, 'grad_norm': 1.756886601448059, 'learning_rate': 1.149010402969497e-07, 'epoch': 2.86} +2025-02-06 07:11:47 - ERROR - stderr - 95%|█████████▌| 21383/22434 [21:04:07<44:09, 2.52s/it] +2025-02-06 07:11:50 - ERROR - stderr - 95%|█████████▌| 21384/22434 [21:04:10<47:16, 2.70s/it] +2025-02-06 07:11:50 - ERROR - stderr - +2025-02-06 07:11:50 - ERROR - stderr - +2025-02-06 07:11:50 - INFO - stdout - {'loss': 0.4046, 'grad_norm': 1.6146433353424072, 'learning_rate': 1.1468291212062165e-07, 'epoch': 2.86} +2025-02-06 07:11:50 - ERROR - stderr - 95%|█████████▌| 21384/22434 [21:04:10<47:16, 2.70s/it] +2025-02-06 07:11:53 - ERROR - stderr - 95%|█████████▌| 21385/22434 [21:04:12<45:49, 2.62s/it] +2025-02-06 07:11:53 - ERROR - stderr - +2025-02-06 07:11:53 - ERROR - stderr - +2025-02-06 07:11:53 - INFO - stdout - {'loss': 0.3771, 'grad_norm': 1.6746702194213867, 'learning_rate': 1.1446498999426848e-07, 'epoch': 2.86} +2025-02-06 07:11:53 - ERROR - stderr - 95%|█████████▌| 21385/22434 [21:04:12<45:49, 2.62s/it] +2025-02-06 07:11:55 - ERROR - stderr - 95%|█████████▌| 21386/22434 [21:04:15<45:45, 2.62s/it] +2025-02-06 07:11:55 - ERROR - stderr - +2025-02-06 07:11:55 - ERROR - stderr - +2025-02-06 07:11:55 - INFO - stdout - {'loss': 0.3654, 'grad_norm': 1.5913323163986206, 'learning_rate': 1.1424727392243317e-07, 'epoch': 2.86} +2025-02-06 07:11:55 - ERROR - stderr - 95%|█████████▌| 21386/22434 [21:04:15<45:45, 2.62s/it] +2025-02-06 07:11:58 - ERROR - stderr - 95%|█████████▌| 21387/22434 [21:04:18<44:58, 2.58s/it] +2025-02-06 07:11:58 - ERROR - stderr - +2025-02-06 07:11:58 - ERROR - stderr - +2025-02-06 07:11:58 - INFO - stdout - {'loss': 0.3716, 'grad_norm': 
1.7871694564819336, 'learning_rate': 1.1402976390965326e-07, 'epoch': 2.86} +2025-02-06 07:11:58 - ERROR - stderr - 95%|█████████▌| 21387/22434 [21:04:18<44:58, 2.58s/it] +2025-02-06 07:12:00 - ERROR - stderr - 95%|█████████▌| 21388/22434 [21:04:20<45:20, 2.60s/it] +2025-02-06 07:12:00 - ERROR - stderr - +2025-02-06 07:12:00 - ERROR - stderr - +2025-02-06 07:12:00 - INFO - stdout - {'loss': 0.35, 'grad_norm': 1.4441653490066528, 'learning_rate': 1.1381245996046397e-07, 'epoch': 2.86} +2025-02-06 07:12:00 - ERROR - stderr - 95%|█████████▌| 21388/22434 [21:04:20<45:20, 2.60s/it] +2025-02-06 07:12:03 - ERROR - stderr - 95%|█████████▌| 21389/22434 [21:04:23<45:03, 2.59s/it] +2025-02-06 07:12:03 - ERROR - stderr - +2025-02-06 07:12:03 - ERROR - stderr - +2025-02-06 07:12:03 - INFO - stdout - {'loss': 0.344, 'grad_norm': 1.5628869533538818, 'learning_rate': 1.1359536207939393e-07, 'epoch': 2.86} +2025-02-06 07:12:03 - ERROR - stderr - 95%|█████████▌| 21389/22434 [21:04:23<45:03, 2.59s/it] +2025-02-06 07:12:06 - ERROR - stderr - 95%|█████████▌| 21390/22434 [21:04:25<44:54, 2.58s/it] +2025-02-06 07:12:06 - ERROR - stderr - +2025-02-06 07:12:06 - ERROR - stderr - +2025-02-06 07:12:06 - INFO - stdout - {'loss': 0.3076, 'grad_norm': 1.4453990459442139, 'learning_rate': 1.1337847027096726e-07, 'epoch': 2.86} +2025-02-06 07:12:06 - ERROR - stderr - 95%|█████████▌| 21390/22434 [21:04:25<44:54, 2.58s/it] +2025-02-06 07:12:08 - ERROR - stderr - 95%|█████████▌| 21391/22434 [21:04:28<44:18, 2.55s/it] +2025-02-06 07:12:08 - ERROR - stderr - +2025-02-06 07:12:08 - ERROR - stderr - +2025-02-06 07:12:08 - INFO - stdout - {'loss': 0.4375, 'grad_norm': 1.7705130577087402, 'learning_rate': 1.1316178453970706e-07, 'epoch': 2.86} +2025-02-06 07:12:08 - ERROR - stderr - 95%|█████████▌| 21391/22434 [21:04:28<44:18, 2.55s/it] +2025-02-06 07:12:10 - ERROR - stderr - 95%|█████████▌| 21392/22434 [21:04:30<43:37, 2.51s/it] +2025-02-06 07:12:10 - ERROR - stderr - +2025-02-06 07:12:10 - ERROR - 
stderr - +2025-02-06 07:12:10 - INFO - stdout - {'loss': 0.3883, 'grad_norm': 1.588617205619812, 'learning_rate': 1.1294530489012856e-07, 'epoch': 2.86} +2025-02-06 07:12:10 - ERROR - stderr - 95%|█████████▌| 21392/22434 [21:04:30<43:37, 2.51s/it] +2025-02-06 07:12:13 - ERROR - stderr - 95%|█████████▌| 21393/22434 [21:04:33<43:21, 2.50s/it] +2025-02-06 07:12:13 - ERROR - stderr - +2025-02-06 07:12:13 - ERROR - stderr - +2025-02-06 07:12:13 - INFO - stdout - {'loss': 0.3753, 'grad_norm': 1.4051568508148193, 'learning_rate': 1.1272903132674374e-07, 'epoch': 2.86} +2025-02-06 07:12:13 - ERROR - stderr - 95%|█████████▌| 21393/22434 [21:04:33<43:21, 2.50s/it] +2025-02-06 07:12:15 - ERROR - stderr - 95%|█████████▌| 21394/22434 [21:04:35<43:25, 2.51s/it] +2025-02-06 07:12:15 - ERROR - stderr - +2025-02-06 07:12:15 - ERROR - stderr - +2025-02-06 07:12:15 - INFO - stdout - {'loss': 0.4085, 'grad_norm': 1.7912297248840332, 'learning_rate': 1.125129638540623e-07, 'epoch': 2.86} +2025-02-06 07:12:15 - ERROR - stderr - 95%|█████████▌| 21394/22434 [21:04:35<43:25, 2.51s/it] +2025-02-06 07:12:18 - ERROR - stderr - 95%|█████████▌| 21395/22434 [21:04:38<42:55, 2.48s/it] +2025-02-06 07:12:18 - ERROR - stderr - +2025-02-06 07:12:18 - ERROR - stderr - +2025-02-06 07:12:18 - INFO - stdout - {'loss': 0.3414, 'grad_norm': 1.5637527704238892, 'learning_rate': 1.122971024765851e-07, 'epoch': 2.86} +2025-02-06 07:12:18 - ERROR - stderr - 95%|█████████▌| 21395/22434 [21:04:38<42:55, 2.48s/it] +2025-02-06 07:12:20 - ERROR - stderr - 95%|█████████▌| 21396/22434 [21:04:40<42:56, 2.48s/it] +2025-02-06 07:12:20 - ERROR - stderr - +2025-02-06 07:12:20 - ERROR - stderr - +2025-02-06 07:12:20 - INFO - stdout - {'loss': 0.3159, 'grad_norm': 1.2206496000289917, 'learning_rate': 1.1208144719881408e-07, 'epoch': 2.86} +2025-02-06 07:12:20 - ERROR - stderr - 95%|█████████▌| 21396/22434 [21:04:40<42:56, 2.48s/it] +2025-02-06 07:12:23 - ERROR - stderr - 95%|█████████▌| 21397/22434 [21:04:43<42:37, 
2.47s/it] +2025-02-06 07:12:23 - ERROR - stderr - +2025-02-06 07:12:23 - ERROR - stderr - +2025-02-06 07:12:23 - INFO - stdout - {'loss': 0.3359, 'grad_norm': 1.5811996459960938, 'learning_rate': 1.1186599802524344e-07, 'epoch': 2.86} +2025-02-06 07:12:23 - ERROR - stderr - 95%|█████████▌| 21397/22434 [21:04:43<42:37, 2.47s/it] +2025-02-06 07:12:25 - ERROR - stderr - 95%|█████████▌| 21398/22434 [21:04:45<42:27, 2.46s/it] +2025-02-06 07:12:25 - ERROR - stderr - +2025-02-06 07:12:25 - ERROR - stderr - +2025-02-06 07:12:25 - INFO - stdout - {'loss': 0.3957, 'grad_norm': 1.6839238405227661, 'learning_rate': 1.1165075496036515e-07, 'epoch': 2.86} +2025-02-06 07:12:25 - ERROR - stderr - 95%|█████████▌| 21398/22434 [21:04:45<42:27, 2.46s/it] +2025-02-06 07:12:28 - ERROR - stderr - 95%|█████████▌| 21399/22434 [21:04:47<42:19, 2.45s/it] +2025-02-06 07:12:28 - ERROR - stderr - +2025-02-06 07:12:28 - ERROR - stderr - +2025-02-06 07:12:28 - INFO - stdout - {'loss': 0.3526, 'grad_norm': 1.4935643672943115, 'learning_rate': 1.1143571800866449e-07, 'epoch': 2.86} +2025-02-06 07:12:28 - ERROR - stderr - 95%|█████████▌| 21399/22434 [21:04:47<42:19, 2.45s/it] +2025-02-06 07:12:30 - ERROR - stderr - 95%|█████████▌| 21400/22434 [21:04:50<42:33, 2.47s/it] +2025-02-06 07:12:30 - ERROR - stderr - +2025-02-06 07:12:30 - ERROR - stderr - +2025-02-06 07:12:30 - INFO - stdout - {'loss': 0.344, 'grad_norm': 1.5058201551437378, 'learning_rate': 1.1122088717462231e-07, 'epoch': 2.86} +2025-02-06 07:12:30 - ERROR - stderr - 95%|█████████▌| 21400/22434 [21:04:50<42:33, 2.47s/it] +2025-02-06 07:12:33 - ERROR - stderr - 95%|█████████▌| 21401/22434 [21:04:52<42:19, 2.46s/it] +2025-02-06 07:12:33 - ERROR - stderr - +2025-02-06 07:12:33 - ERROR - stderr - +2025-02-06 07:12:33 - INFO - stdout - {'loss': 0.3034, 'grad_norm': 1.3692536354064941, 'learning_rate': 1.1100626246272062e-07, 'epoch': 2.86} +2025-02-06 07:12:33 - ERROR - stderr - 95%|█████████▌| 21401/22434 [21:04:52<42:19, 2.46s/it] 
+2025-02-06 07:12:35 - ERROR - stderr - 95%|█████████▌| 21402/22434 [21:04:55<42:34, 2.47s/it] +2025-02-06 07:12:35 - ERROR - stderr - +2025-02-06 07:12:35 - ERROR - stderr - +2025-02-06 07:12:35 - INFO - stdout - {'loss': 0.288, 'grad_norm': 1.365714192390442, 'learning_rate': 1.1079184387742914e-07, 'epoch': 2.86} +2025-02-06 07:12:35 - ERROR - stderr - 95%|█████████▌| 21402/22434 [21:04:55<42:34, 2.47s/it] +2025-02-06 07:12:38 - ERROR - stderr - 95%|█████████▌| 21403/22434 [21:04:57<42:42, 2.49s/it] +2025-02-06 07:12:38 - ERROR - stderr - +2025-02-06 07:12:38 - ERROR - stderr - +2025-02-06 07:12:38 - INFO - stdout - {'loss': 0.3242, 'grad_norm': 1.5063707828521729, 'learning_rate': 1.1057763142321875e-07, 'epoch': 2.86} +2025-02-06 07:12:38 - ERROR - stderr - 95%|█████████▌| 21403/22434 [21:04:57<42:42, 2.49s/it] +2025-02-06 07:12:41 - ERROR - stderr - 95%|█████████▌| 21404/22434 [21:05:00<45:10, 2.63s/it] +2025-02-06 07:12:41 - ERROR - stderr - +2025-02-06 07:12:41 - ERROR - stderr - +2025-02-06 07:12:41 - INFO - stdout - {'loss': 0.3591, 'grad_norm': 1.7072794437408447, 'learning_rate': 1.1036362510455478e-07, 'epoch': 2.86} +2025-02-06 07:12:41 - ERROR - stderr - 95%|█████████▌| 21404/22434 [21:05:00<45:10, 2.63s/it] +2025-02-06 07:12:43 - ERROR - stderr - 95%|█████████▌| 21405/22434 [21:05:03<44:19, 2.58s/it] +2025-02-06 07:12:43 - ERROR - stderr - +2025-02-06 07:12:43 - ERROR - stderr - +2025-02-06 07:12:43 - INFO - stdout - {'loss': 0.3041, 'grad_norm': 1.4276528358459473, 'learning_rate': 1.1014982492589698e-07, 'epoch': 2.86} +2025-02-06 07:12:43 - ERROR - stderr - 95%|█████████▌| 21405/22434 [21:05:03<44:19, 2.58s/it] +2025-02-06 07:12:46 - ERROR - stderr - 95%|█████████▌| 21406/22434 [21:05:05<43:52, 2.56s/it] +2025-02-06 07:12:46 - ERROR - stderr - +2025-02-06 07:12:46 - ERROR - stderr - +2025-02-06 07:12:46 - INFO - stdout - {'loss': 0.3059, 'grad_norm': 1.2962820529937744, 'learning_rate': 1.0993623089170402e-07, 'epoch': 2.86} +2025-02-06 07:12:46 
- ERROR - stderr - 95%|█████████▌| 21406/22434 [21:05:05<43:52, 2.56s/it] +2025-02-06 07:12:48 - ERROR - stderr - 95%|█████████▌| 21407/22434 [21:05:08<43:43, 2.55s/it] +2025-02-06 07:12:48 - ERROR - stderr - +2025-02-06 07:12:48 - ERROR - stderr - +2025-02-06 07:12:48 - INFO - stdout - {'loss': 0.3674, 'grad_norm': 1.660308837890625, 'learning_rate': 1.0972284300642567e-07, 'epoch': 2.86} +2025-02-06 07:12:48 - ERROR - stderr - 95%|█████████▌| 21407/22434 [21:05:08<43:43, 2.55s/it] +2025-02-06 07:12:51 - ERROR - stderr - 95%|█████████▌| 21408/22434 [21:05:10<43:16, 2.53s/it] +2025-02-06 07:12:51 - ERROR - stderr - +2025-02-06 07:12:51 - ERROR - stderr - +2025-02-06 07:12:51 - INFO - stdout - {'loss': 0.367, 'grad_norm': 1.4883298873901367, 'learning_rate': 1.0950966127451057e-07, 'epoch': 2.86} +2025-02-06 07:12:51 - ERROR - stderr - 95%|█████████▌| 21408/22434 [21:05:10<43:16, 2.53s/it] +2025-02-06 07:12:53 - ERROR - stderr - 95%|█████████▌| 21409/22434 [21:05:13<43:09, 2.53s/it] +2025-02-06 07:12:53 - ERROR - stderr - +2025-02-06 07:12:53 - ERROR - stderr - +2025-02-06 07:12:53 - INFO - stdout - {'loss': 0.3446, 'grad_norm': 1.4082783460617065, 'learning_rate': 1.0929668570040187e-07, 'epoch': 2.86} +2025-02-06 07:12:53 - ERROR - stderr - 95%|█████████▌| 21409/22434 [21:05:13<43:09, 2.53s/it] +2025-02-06 07:12:56 - ERROR - stderr - 95%|█████████▌| 21410/22434 [21:05:15<43:07, 2.53s/it] +2025-02-06 07:12:56 - ERROR - stderr - +2025-02-06 07:12:56 - ERROR - stderr - +2025-02-06 07:12:56 - INFO - stdout - {'loss': 0.3974, 'grad_norm': 1.4843063354492188, 'learning_rate': 1.0908391628854042e-07, 'epoch': 2.86} +2025-02-06 07:12:56 - ERROR - stderr - 95%|█████████▌| 21410/22434 [21:05:15<43:07, 2.53s/it] +2025-02-06 07:12:58 - ERROR - stderr - 95%|█████████▌| 21411/22434 [21:05:18<42:35, 2.50s/it] +2025-02-06 07:12:58 - ERROR - stderr - +2025-02-06 07:12:58 - ERROR - stderr - +2025-02-06 07:12:58 - INFO - stdout - {'loss': 0.379, 'grad_norm': 1.6850398778915405, 
'learning_rate': 1.0887135304335938e-07, 'epoch': 2.86} +2025-02-06 07:12:58 - ERROR - stderr - 95%|█████████▌| 21411/22434 [21:05:18<42:35, 2.50s/it] +2025-02-06 07:13:01 - ERROR - stderr - 95%|█████████▌| 21412/22434 [21:05:20<42:28, 2.49s/it] +2025-02-06 07:13:01 - ERROR - stderr - +2025-02-06 07:13:01 - ERROR - stderr - +2025-02-06 07:13:01 - INFO - stdout - {'loss': 0.335, 'grad_norm': 1.496293306350708, 'learning_rate': 1.0865899596929075e-07, 'epoch': 2.86} +2025-02-06 07:13:01 - ERROR - stderr - 95%|█████████▌| 21412/22434 [21:05:20<42:28, 2.49s/it] +2025-02-06 07:13:03 - ERROR - stderr - 95%|█████████▌| 21413/22434 [21:05:23<42:33, 2.50s/it] +2025-02-06 07:13:03 - ERROR - stderr - +2025-02-06 07:13:03 - ERROR - stderr - +2025-02-06 07:13:03 - INFO - stdout - {'loss': 0.3635, 'grad_norm': 1.688007116317749, 'learning_rate': 1.0844684507076097e-07, 'epoch': 2.86} +2025-02-06 07:13:03 - ERROR - stderr - 95%|█████████▌| 21413/22434 [21:05:23<42:33, 2.50s/it] +2025-02-06 07:13:06 - ERROR - stderr - 95%|█████████▌| 21414/22434 [21:05:25<42:56, 2.53s/it] +2025-02-06 07:13:06 - ERROR - stderr - +2025-02-06 07:13:06 - ERROR - stderr - +2025-02-06 07:13:06 - INFO - stdout - {'loss': 0.3314, 'grad_norm': 1.4264684915542603, 'learning_rate': 1.0823490035218986e-07, 'epoch': 2.86} +2025-02-06 07:13:06 - ERROR - stderr - 95%|█████████▌| 21414/22434 [21:05:25<42:56, 2.53s/it] +2025-02-06 07:13:08 - ERROR - stderr - 95%|█████████▌| 21415/22434 [21:05:28<42:52, 2.52s/it] +2025-02-06 07:13:08 - ERROR - stderr - +2025-02-06 07:13:08 - ERROR - stderr - +2025-02-06 07:13:08 - INFO - stdout - {'loss': 0.3593, 'grad_norm': 1.7247264385223389, 'learning_rate': 1.0802316181799833e-07, 'epoch': 2.86} +2025-02-06 07:13:08 - ERROR - stderr - 95%|█████████▌| 21415/22434 [21:05:28<42:52, 2.52s/it] +2025-02-06 07:13:11 - ERROR - stderr - 95%|█████████▌| 21416/22434 [21:05:30<42:44, 2.52s/it] +2025-02-06 07:13:11 - ERROR - stderr - +2025-02-06 07:13:11 - ERROR - stderr - +2025-02-06 
07:13:11 - INFO - stdout - {'loss': 0.351, 'grad_norm': 1.5128061771392822, 'learning_rate': 1.0781162947259727e-07, 'epoch': 2.86} +2025-02-06 07:13:11 - ERROR - stderr - 95%|█████████▌| 21416/22434 [21:05:30<42:44, 2.52s/it] +2025-02-06 07:13:14 - ERROR - stderr - 95%|█████████▌| 21417/22434 [21:05:34<46:22, 2.74s/it] +2025-02-06 07:13:14 - ERROR - stderr - +2025-02-06 07:13:14 - ERROR - stderr - +2025-02-06 07:13:14 - INFO - stdout - {'loss': 0.4195, 'grad_norm': 1.9091788530349731, 'learning_rate': 1.0760030332039761e-07, 'epoch': 2.86} +2025-02-06 07:13:14 - ERROR - stderr - 95%|█████████▌| 21417/22434 [21:05:34<46:22, 2.74s/it] +2025-02-06 07:13:16 - ERROR - stderr - 95%|█████████▌| 21418/22434 [21:05:36<45:26, 2.68s/it] +2025-02-06 07:13:17 - ERROR - stderr - +2025-02-06 07:13:17 - ERROR - stderr - +2025-02-06 07:13:17 - INFO - stdout - {'loss': 0.408, 'grad_norm': 1.7148890495300293, 'learning_rate': 1.0738918336580362e-07, 'epoch': 2.86} +2025-02-06 07:13:17 - ERROR - stderr - 95%|█████████▌| 21418/22434 [21:05:36<45:26, 2.68s/it] +2025-02-06 07:13:19 - ERROR - stderr - 95%|█████████▌| 21419/22434 [21:05:39<44:36, 2.64s/it] +2025-02-06 07:13:19 - ERROR - stderr - +2025-02-06 07:13:19 - ERROR - stderr - +2025-02-06 07:13:19 - INFO - stdout - {'loss': 0.3467, 'grad_norm': 1.3402626514434814, 'learning_rate': 1.071782696132162e-07, 'epoch': 2.86} +2025-02-06 07:13:19 - ERROR - stderr - 95%|█████████▌| 21419/22434 [21:05:39<44:36, 2.64s/it] +2025-02-06 07:13:21 - ERROR - stderr - 95%|█████████▌| 21420/22434 [21:05:41<43:54, 2.60s/it] +2025-02-06 07:13:22 - ERROR - stderr - +2025-02-06 07:13:22 - ERROR - stderr - +2025-02-06 07:13:22 - INFO - stdout - {'loss': 0.3801, 'grad_norm': 1.4568476676940918, 'learning_rate': 1.0696756206703185e-07, 'epoch': 2.86} +2025-02-06 07:13:22 - ERROR - stderr - 95%|█████████▌| 21420/22434 [21:05:41<43:54, 2.60s/it] +2025-02-06 07:13:24 - ERROR - stderr - 95%|█████████▌| 21421/22434 [21:05:44<43:46, 2.59s/it] +2025-02-06 
07:13:24 - ERROR - stderr - +2025-02-06 07:13:24 - ERROR - stderr - +2025-02-06 07:13:24 - INFO - stdout - {'loss': 0.3986, 'grad_norm': 1.5037922859191895, 'learning_rate': 1.0675706073164038e-07, 'epoch': 2.86} +2025-02-06 07:13:24 - ERROR - stderr - 95%|█████████▌| 21421/22434 [21:05:44<43:46, 2.59s/it] +2025-02-06 07:13:27 - ERROR - stderr - 95%|█████████▌| 21422/22434 [21:05:46<43:00, 2.55s/it] +2025-02-06 07:13:27 - ERROR - stderr - +2025-02-06 07:13:27 - ERROR - stderr - +2025-02-06 07:13:27 - INFO - stdout - {'loss': 0.3518, 'grad_norm': 1.3684141635894775, 'learning_rate': 1.0654676561143273e-07, 'epoch': 2.86} +2025-02-06 07:13:27 - ERROR - stderr - 95%|█████████▌| 21422/22434 [21:05:46<43:00, 2.55s/it] +2025-02-06 07:13:29 - ERROR - stderr - 95%|█████████▌| 21423/22434 [21:05:49<42:20, 2.51s/it] +2025-02-06 07:13:29 - ERROR - stderr - +2025-02-06 07:13:29 - ERROR - stderr - +2025-02-06 07:13:29 - INFO - stdout - {'loss': 0.3818, 'grad_norm': 1.4641708135604858, 'learning_rate': 1.0633667671078984e-07, 'epoch': 2.86} +2025-02-06 07:13:29 - ERROR - stderr - 95%|█████████▌| 21423/22434 [21:05:49<42:20, 2.51s/it] +2025-02-06 07:13:32 - ERROR - stderr - 95%|█████████▌| 21424/22434 [21:05:51<42:51, 2.55s/it] +2025-02-06 07:13:32 - ERROR - stderr - +2025-02-06 07:13:32 - ERROR - stderr - +2025-02-06 07:13:32 - INFO - stdout - {'loss': 0.3364, 'grad_norm': 1.6568219661712646, 'learning_rate': 1.0612679403409154e-07, 'epoch': 2.86} +2025-02-06 07:13:32 - ERROR - stderr - 95%|█████████▌| 21424/22434 [21:05:51<42:51, 2.55s/it] +2025-02-06 07:13:34 - ERROR - stderr - 96%|█████████▌| 21425/22434 [21:05:54<42:19, 2.52s/it] +2025-02-06 07:13:34 - ERROR - stderr - +2025-02-06 07:13:34 - ERROR - stderr - +2025-02-06 07:13:34 - INFO - stdout - {'loss': 0.3692, 'grad_norm': 1.486709713935852, 'learning_rate': 1.0591711758571322e-07, 'epoch': 2.87} +2025-02-06 07:13:34 - ERROR - stderr - 96%|█████████▌| 21425/22434 [21:05:54<42:19, 2.52s/it] +2025-02-06 07:13:37 - ERROR - 
stderr - 96%|█████████▌| 21426/22434 [21:05:56<42:15, 2.52s/it] +2025-02-06 07:13:37 - ERROR - stderr - +2025-02-06 07:13:37 - ERROR - stderr - +2025-02-06 07:13:37 - INFO - stdout - {'loss': 0.3508, 'grad_norm': 1.5510412454605103, 'learning_rate': 1.057076473700247e-07, 'epoch': 2.87} +2025-02-06 07:13:37 - ERROR - stderr - 96%|█████████▌| 21426/22434 [21:05:56<42:15, 2.52s/it] +2025-02-06 07:13:39 - ERROR - stderr - 96%|█████████▌| 21427/22434 [21:05:59<42:23, 2.53s/it] +2025-02-06 07:13:39 - ERROR - stderr - +2025-02-06 07:13:39 - ERROR - stderr - +2025-02-06 07:13:39 - INFO - stdout - {'loss': 0.3635, 'grad_norm': 1.59087336063385, 'learning_rate': 1.0549838339139362e-07, 'epoch': 2.87} +2025-02-06 07:13:39 - ERROR - stderr - 96%|█████████▌| 21427/22434 [21:05:59<42:23, 2.53s/it] +2025-02-06 07:13:42 - ERROR - stderr - 96%|█████████▌| 21428/22434 [21:06:01<42:14, 2.52s/it] +2025-02-06 07:13:42 - ERROR - stderr - +2025-02-06 07:13:42 - ERROR - stderr - +2025-02-06 07:13:42 - INFO - stdout - {'loss': 0.3502, 'grad_norm': 1.5892013311386108, 'learning_rate': 1.0528932565417982e-07, 'epoch': 2.87} +2025-02-06 07:13:42 - ERROR - stderr - 96%|█████████▌| 21428/22434 [21:06:01<42:14, 2.52s/it] +2025-02-06 07:13:44 - ERROR - stderr - 96%|█████████▌| 21429/22434 [21:06:04<41:55, 2.50s/it] +2025-02-06 07:13:44 - ERROR - stderr - +2025-02-06 07:13:44 - ERROR - stderr - +2025-02-06 07:13:44 - INFO - stdout - {'loss': 0.3465, 'grad_norm': 1.523118019104004, 'learning_rate': 1.0508047416274203e-07, 'epoch': 2.87} +2025-02-06 07:13:44 - ERROR - stderr - 96%|█████████▌| 21429/22434 [21:06:04<41:55, 2.50s/it] +2025-02-06 07:13:46 - ERROR - stderr - 96%|█████████▌| 21430/22434 [21:06:06<41:36, 2.49s/it] +2025-02-06 07:13:47 - ERROR - stderr - +2025-02-06 07:13:47 - ERROR - stderr - +2025-02-06 07:13:47 - INFO - stdout - {'loss': 0.3957, 'grad_norm': 1.6732336282730103, 'learning_rate': 1.0487182892143232e-07, 'epoch': 2.87} +2025-02-06 07:13:47 - ERROR - stderr - 
96%|█████████▌| 21430/22434 [21:06:06<41:36, 2.49s/it] +2025-02-06 07:13:49 - ERROR - stderr - 96%|█████████▌| 21431/22434 [21:06:09<41:27, 2.48s/it] +2025-02-06 07:13:49 - ERROR - stderr - +2025-02-06 07:13:49 - ERROR - stderr - +2025-02-06 07:13:49 - INFO - stdout - {'loss': 0.3397, 'grad_norm': 1.7156574726104736, 'learning_rate': 1.0466338993460167e-07, 'epoch': 2.87} +2025-02-06 07:13:49 - ERROR - stderr - 96%|█████████▌| 21431/22434 [21:06:09<41:27, 2.48s/it] +2025-02-06 07:13:51 - ERROR - stderr - 96%|█████████▌| 21432/22434 [21:06:11<41:37, 2.49s/it] +2025-02-06 07:13:52 - ERROR - stderr - +2025-02-06 07:13:52 - ERROR - stderr - +2025-02-06 07:13:52 - INFO - stdout - {'loss': 0.3599, 'grad_norm': 1.6031233072280884, 'learning_rate': 1.0445515720659438e-07, 'epoch': 2.87} +2025-02-06 07:13:52 - ERROR - stderr - 96%|█████████▌| 21432/22434 [21:06:11<41:37, 2.49s/it] +2025-02-06 07:13:54 - ERROR - stderr - 96%|█████████▌| 21433/22434 [21:06:14<41:33, 2.49s/it] +2025-02-06 07:13:54 - ERROR - stderr - +2025-02-06 07:13:54 - ERROR - stderr - +2025-02-06 07:13:54 - INFO - stdout - {'loss': 0.3315, 'grad_norm': 1.4509334564208984, 'learning_rate': 1.0424713074174919e-07, 'epoch': 2.87} +2025-02-06 07:13:54 - ERROR - stderr - 96%|█████████▌| 21433/22434 [21:06:14<41:33, 2.49s/it] +2025-02-06 07:13:56 - ERROR - stderr - 96%|█████████▌| 21434/22434 [21:06:16<41:25, 2.49s/it] +2025-02-06 07:13:56 - ERROR - stderr - +2025-02-06 07:13:56 - ERROR - stderr - +2025-02-06 07:13:56 - INFO - stdout - {'loss': 0.3926, 'grad_norm': 1.6894257068634033, 'learning_rate': 1.0403931054440375e-07, 'epoch': 2.87} +2025-02-06 07:13:56 - ERROR - stderr - 96%|█████████▌| 21434/22434 [21:06:16<41:25, 2.49s/it] +2025-02-06 07:13:59 - ERROR - stderr - 96%|█████████▌| 21435/22434 [21:06:19<41:03, 2.47s/it] +2025-02-06 07:13:59 - ERROR - stderr - +2025-02-06 07:13:59 - ERROR - stderr - +2025-02-06 07:13:59 - INFO - stdout - {'loss': 0.3476, 'grad_norm': 1.3921858072280884, 'learning_rate': 
1.0383169661888904e-07, 'epoch': 2.87} +2025-02-06 07:13:59 - ERROR - stderr - 96%|█████████▌| 21435/22434 [21:06:19<41:03, 2.47s/it] +2025-02-06 07:14:02 - ERROR - stderr - 96%|█████████▌| 21436/22434 [21:06:21<42:40, 2.57s/it] +2025-02-06 07:14:02 - ERROR - stderr - +2025-02-06 07:14:02 - ERROR - stderr - +2025-02-06 07:14:02 - INFO - stdout - {'loss': 0.3612, 'grad_norm': 1.5089781284332275, 'learning_rate': 1.036242889695338e-07, 'epoch': 2.87} +2025-02-06 07:14:02 - ERROR - stderr - 96%|█████████▌| 21436/22434 [21:06:21<42:40, 2.57s/it] +2025-02-06 07:14:04 - ERROR - stderr - 96%|█████████▌| 21437/22434 [21:06:24<42:15, 2.54s/it] +2025-02-06 07:14:04 - ERROR - stderr - +2025-02-06 07:14:04 - ERROR - stderr - +2025-02-06 07:14:04 - INFO - stdout - {'loss': 0.3765, 'grad_norm': 1.5736150741577148, 'learning_rate': 1.0341708760066016e-07, 'epoch': 2.87} +2025-02-06 07:14:04 - ERROR - stderr - 96%|█████████▌| 21437/22434 [21:06:24<42:15, 2.54s/it] +2025-02-06 07:14:07 - ERROR - stderr - 96%|█████████▌| 21438/22434 [21:06:27<42:42, 2.57s/it] +2025-02-06 07:14:07 - ERROR - stderr - +2025-02-06 07:14:07 - ERROR - stderr - +2025-02-06 07:14:07 - INFO - stdout - {'loss': 0.3567, 'grad_norm': 1.4795104265213013, 'learning_rate': 1.0321009251658686e-07, 'epoch': 2.87} +2025-02-06 07:14:07 - ERROR - stderr - 96%|█████████▌| 21438/22434 [21:06:27<42:42, 2.57s/it] +2025-02-06 07:14:09 - ERROR - stderr - 96%|█████████▌| 21439/22434 [21:06:29<42:24, 2.56s/it] +2025-02-06 07:14:09 - ERROR - stderr - +2025-02-06 07:14:09 - ERROR - stderr - +2025-02-06 07:14:09 - INFO - stdout - {'loss': 0.4047, 'grad_norm': 1.5555113554000854, 'learning_rate': 1.0300330372163047e-07, 'epoch': 2.87} +2025-02-06 07:14:09 - ERROR - stderr - 96%|█████████▌| 21439/22434 [21:06:29<42:24, 2.56s/it] +2025-02-06 07:14:12 - ERROR - stderr - 96%|█████████▌| 21440/22434 [21:06:32<41:48, 2.52s/it] +2025-02-06 07:14:12 - ERROR - stderr - +2025-02-06 07:14:12 - ERROR - stderr - +2025-02-06 07:14:12 - INFO - 
stdout - {'loss': 0.3463, 'grad_norm': 1.5946283340454102, 'learning_rate': 1.0279672122009865e-07, 'epoch': 2.87} +2025-02-06 07:14:12 - ERROR - stderr - 96%|█████████▌| 21440/22434 [21:06:32<41:48, 2.52s/it] +2025-02-06 07:14:14 - ERROR - stderr - 96%|█████████▌| 21441/22434 [21:06:34<41:50, 2.53s/it] +2025-02-06 07:14:14 - ERROR - stderr - +2025-02-06 07:14:14 - ERROR - stderr - +2025-02-06 07:14:14 - INFO - stdout - {'loss': 0.3491, 'grad_norm': 1.6732900142669678, 'learning_rate': 1.0259034501629795e-07, 'epoch': 2.87} +2025-02-06 07:14:14 - ERROR - stderr - 96%|█████████▌| 21441/22434 [21:06:34<41:50, 2.53s/it] +2025-02-06 07:14:17 - ERROR - stderr - 96%|█████████▌| 21442/22434 [21:06:37<41:47, 2.53s/it] +2025-02-06 07:14:17 - ERROR - stderr - +2025-02-06 07:14:17 - ERROR - stderr - +2025-02-06 07:14:17 - INFO - stdout - {'loss': 0.3245, 'grad_norm': 1.4459824562072754, 'learning_rate': 1.0238417511453158e-07, 'epoch': 2.87} +2025-02-06 07:14:17 - ERROR - stderr - 96%|█████████▌| 21442/22434 [21:06:37<41:47, 2.53s/it] +2025-02-06 07:14:19 - ERROR - stderr - 96%|█████████▌| 21443/22434 [21:06:39<42:05, 2.55s/it] +2025-02-06 07:14:19 - ERROR - stderr - +2025-02-06 07:14:19 - ERROR - stderr - +2025-02-06 07:14:19 - INFO - stdout - {'loss': 0.3636, 'grad_norm': 1.4974004030227661, 'learning_rate': 1.0217821151909612e-07, 'epoch': 2.87} +2025-02-06 07:14:19 - ERROR - stderr - 96%|█████████▌| 21443/22434 [21:06:39<42:05, 2.55s/it] +2025-02-06 07:14:22 - ERROR - stderr - 96%|█████████▌| 21444/22434 [21:06:42<41:50, 2.54s/it] +2025-02-06 07:14:22 - ERROR - stderr - +2025-02-06 07:14:22 - ERROR - stderr - +2025-02-06 07:14:22 - INFO - stdout - {'loss': 0.3597, 'grad_norm': 1.5511211156845093, 'learning_rate': 1.0197245423428481e-07, 'epoch': 2.87} +2025-02-06 07:14:22 - ERROR - stderr - 96%|█████████▌| 21444/22434 [21:06:42<41:50, 2.54s/it] +2025-02-06 07:14:24 - ERROR - stderr - 96%|█████████▌| 21445/22434 [21:06:44<41:46, 2.53s/it] +2025-02-06 07:14:25 - ERROR - 
stderr - +2025-02-06 07:14:25 - ERROR - stderr - +2025-02-06 07:14:25 - INFO - stdout - {'loss': 0.3651, 'grad_norm': 1.5882346630096436, 'learning_rate': 1.0176690326438531e-07, 'epoch': 2.87} +2025-02-06 07:14:25 - ERROR - stderr - 96%|█████████▌| 21445/22434 [21:06:44<41:46, 2.53s/it] +2025-02-06 07:14:27 - ERROR - stderr - 96%|█████████▌| 21446/22434 [21:06:47<41:37, 2.53s/it] +2025-02-06 07:14:27 - ERROR - stderr - +2025-02-06 07:14:27 - ERROR - stderr - +2025-02-06 07:14:27 - INFO - stdout - {'loss': 0.3511, 'grad_norm': 1.6177600622177124, 'learning_rate': 1.0156155861368533e-07, 'epoch': 2.87} +2025-02-06 07:14:27 - ERROR - stderr - 96%|█████████▌| 21446/22434 [21:06:47<41:37, 2.53s/it] +2025-02-06 07:14:29 - ERROR - stderr - 96%|█████████▌| 21447/22434 [21:06:49<41:09, 2.50s/it] +2025-02-06 07:14:29 - ERROR - stderr - +2025-02-06 07:14:29 - ERROR - stderr - +2025-02-06 07:14:29 - INFO - stdout - {'loss': 0.4726, 'grad_norm': 1.7877057790756226, 'learning_rate': 1.0135642028646142e-07, 'epoch': 2.87} +2025-02-06 07:14:29 - ERROR - stderr - 96%|█████████▌| 21447/22434 [21:06:49<41:09, 2.50s/it] +2025-02-06 07:14:32 - ERROR - stderr - 96%|█████████▌| 21448/22434 [21:06:52<40:57, 2.49s/it] +2025-02-06 07:14:32 - ERROR - stderr - +2025-02-06 07:14:32 - ERROR - stderr - +2025-02-06 07:14:32 - INFO - stdout - {'loss': 0.3181, 'grad_norm': 1.6083234548568726, 'learning_rate': 1.0115148828699017e-07, 'epoch': 2.87} +2025-02-06 07:14:32 - ERROR - stderr - 96%|█████████▌| 21448/22434 [21:06:52<40:57, 2.49s/it] +2025-02-06 07:14:34 - ERROR - stderr - 96%|█████████▌| 21449/22434 [21:06:54<40:55, 2.49s/it] +2025-02-06 07:14:34 - ERROR - stderr - +2025-02-06 07:14:34 - ERROR - stderr - +2025-02-06 07:14:34 - INFO - stdout - {'loss': 0.3441, 'grad_norm': 1.4748469591140747, 'learning_rate': 1.0094676261954484e-07, 'epoch': 2.87} +2025-02-06 07:14:34 - ERROR - stderr - 96%|█████████▌| 21449/22434 [21:06:54<40:55, 2.49s/it] +2025-02-06 07:14:37 - ERROR - stderr - 
96%|█████████▌| 21450/22434 [21:06:57<41:03, 2.50s/it] +2025-02-06 07:14:37 - ERROR - stderr - +2025-02-06 07:14:37 - ERROR - stderr - +2025-02-06 07:14:37 - INFO - stdout - {'loss': 0.3146, 'grad_norm': 1.688672423362732, 'learning_rate': 1.0074224328839088e-07, 'epoch': 2.87} +2025-02-06 07:14:37 - ERROR - stderr - 96%|█████████▌| 21450/22434 [21:06:57<41:03, 2.50s/it] +2025-02-06 07:14:39 - ERROR - stderr - 96%|█████████▌| 21451/22434 [21:06:59<41:02, 2.50s/it] +2025-02-06 07:14:39 - ERROR - stderr - +2025-02-06 07:14:39 - ERROR - stderr - +2025-02-06 07:14:39 - INFO - stdout - {'loss': 0.3857, 'grad_norm': 1.5790619850158691, 'learning_rate': 1.0053793029779379e-07, 'epoch': 2.87} +2025-02-06 07:14:39 - ERROR - stderr - 96%|█████████▌| 21451/22434 [21:06:59<41:02, 2.50s/it] +2025-02-06 07:14:42 - ERROR - stderr - 96%|█████████▌| 21452/22434 [21:07:02<41:00, 2.51s/it] +2025-02-06 07:14:42 - ERROR - stderr - +2025-02-06 07:14:42 - ERROR - stderr - +2025-02-06 07:14:42 - INFO - stdout - {'loss': 0.3467, 'grad_norm': 1.4694894552230835, 'learning_rate': 1.0033382365201016e-07, 'epoch': 2.87} +2025-02-06 07:14:42 - ERROR - stderr - 96%|█████████▌| 21452/22434 [21:07:02<41:00, 2.51s/it] +2025-02-06 07:14:44 - ERROR - stderr - 96%|█████████▌| 21453/22434 [21:07:04<40:47, 2.50s/it] +2025-02-06 07:14:44 - ERROR - stderr - +2025-02-06 07:14:44 - ERROR - stderr - +2025-02-06 07:14:44 - INFO - stdout - {'loss': 0.3458, 'grad_norm': 1.664807915687561, 'learning_rate': 1.0012992335529548e-07, 'epoch': 2.87} +2025-02-06 07:14:44 - ERROR - stderr - 96%|█████████▌| 21453/22434 [21:07:04<40:47, 2.50s/it] +2025-02-06 07:14:47 - ERROR - stderr - 96%|█████████▌| 21454/22434 [21:07:07<40:23, 2.47s/it] +2025-02-06 07:14:47 - ERROR - stderr - +2025-02-06 07:14:47 - ERROR - stderr - +2025-02-06 07:14:47 - INFO - stdout - {'loss': 0.3454, 'grad_norm': 1.5460667610168457, 'learning_rate': 9.992622941189856e-08, 'epoch': 2.87} +2025-02-06 07:14:47 - ERROR - stderr - 96%|█████████▌| 
21454/22434 [21:07:07<40:23, 2.47s/it] +2025-02-06 07:14:49 - ERROR - stderr - 96%|█████████▌| 21455/22434 [21:07:09<40:03, 2.46s/it] +2025-02-06 07:14:49 - ERROR - stderr - +2025-02-06 07:14:49 - ERROR - stderr - +2025-02-06 07:14:49 - INFO - stdout - {'loss': 0.369, 'grad_norm': 1.5894346237182617, 'learning_rate': 9.972274182606712e-08, 'epoch': 2.87} +2025-02-06 07:14:49 - ERROR - stderr - 96%|█████████▌| 21455/22434 [21:07:09<40:03, 2.46s/it] +2025-02-06 07:14:52 - ERROR - stderr - 96%|█████████▌| 21456/22434 [21:07:11<40:01, 2.46s/it] +2025-02-06 07:14:52 - ERROR - stderr - +2025-02-06 07:14:52 - ERROR - stderr - +2025-02-06 07:14:52 - INFO - stdout - {'loss': 0.3951, 'grad_norm': 1.630205750465393, 'learning_rate': 9.95194606020411e-08, 'epoch': 2.87} +2025-02-06 07:14:52 - ERROR - stderr - 96%|█████████▌| 21456/22434 [21:07:12<40:01, 2.46s/it] +2025-02-06 07:14:54 - ERROR - stderr - 96%|█████████▌| 21457/22434 [21:07:14<41:21, 2.54s/it] +2025-02-06 07:14:54 - ERROR - stderr - +2025-02-06 07:14:54 - ERROR - stderr - +2025-02-06 07:14:54 - INFO - stdout - {'loss': 0.3579, 'grad_norm': 1.7381374835968018, 'learning_rate': 9.931638574405711e-08, 'epoch': 2.87} +2025-02-06 07:14:54 - ERROR - stderr - 96%|█████████▌| 21457/22434 [21:07:14<41:21, 2.54s/it] +2025-02-06 07:14:57 - ERROR - stderr - 96%|█████████▌| 21458/22434 [21:07:17<42:04, 2.59s/it] +2025-02-06 07:14:57 - ERROR - stderr - +2025-02-06 07:14:57 - ERROR - stderr - +2025-02-06 07:14:57 - INFO - stdout - {'loss': 0.3877, 'grad_norm': 1.5989984273910522, 'learning_rate': 9.911351725635066e-08, 'epoch': 2.87} +2025-02-06 07:14:57 - ERROR - stderr - 96%|█████████▌| 21458/22434 [21:07:17<42:04, 2.59s/it] +2025-02-06 07:15:00 - ERROR - stderr - 96%|█████████▌| 21459/22434 [21:07:19<41:27, 2.55s/it] +2025-02-06 07:15:00 - ERROR - stderr - +2025-02-06 07:15:00 - ERROR - stderr - +2025-02-06 07:15:00 - INFO - stdout - {'loss': 0.351, 'grad_norm': 1.4860011339187622, 'learning_rate': 9.891085514314835e-08, 
'epoch': 2.87} +2025-02-06 07:15:00 - ERROR - stderr - 96%|█████████▌| 21459/22434 [21:07:19<41:27, 2.55s/it] +2025-02-06 07:15:02 - ERROR - stderr - 96%|█████████▌| 21460/22434 [21:07:22<41:30, 2.56s/it] +2025-02-06 07:15:02 - ERROR - stderr - +2025-02-06 07:15:02 - ERROR - stderr - +2025-02-06 07:15:02 - INFO - stdout - {'loss': 0.3734, 'grad_norm': 1.5850794315338135, 'learning_rate': 9.870839940867461e-08, 'epoch': 2.87} +2025-02-06 07:15:02 - ERROR - stderr - 96%|█████████▌| 21460/22434 [21:07:22<41:30, 2.56s/it] +2025-02-06 07:15:05 - ERROR - stderr - 96%|█████████▌| 21461/22434 [21:07:24<41:09, 2.54s/it] +2025-02-06 07:15:05 - ERROR - stderr - +2025-02-06 07:15:05 - ERROR - stderr - +2025-02-06 07:15:05 - INFO - stdout - {'loss': 0.3646, 'grad_norm': 1.543359637260437, 'learning_rate': 9.850615005714936e-08, 'epoch': 2.87} +2025-02-06 07:15:05 - ERROR - stderr - 96%|█████████▌| 21461/22434 [21:07:24<41:09, 2.54s/it] +2025-02-06 07:15:07 - ERROR - stderr - 96%|█████████▌| 21462/22434 [21:07:27<41:02, 2.53s/it] +2025-02-06 07:15:07 - ERROR - stderr - +2025-02-06 07:15:07 - ERROR - stderr - +2025-02-06 07:15:07 - INFO - stdout - {'loss': 0.3749, 'grad_norm': 1.6070114374160767, 'learning_rate': 9.830410709278925e-08, 'epoch': 2.87} +2025-02-06 07:15:07 - ERROR - stderr - 96%|█████████▌| 21462/22434 [21:07:27<41:02, 2.53s/it] +2025-02-06 07:15:10 - ERROR - stderr - 96%|█████████▌| 21463/22434 [21:07:29<40:49, 2.52s/it] +2025-02-06 07:15:10 - ERROR - stderr - +2025-02-06 07:15:10 - ERROR - stderr - +2025-02-06 07:15:10 - INFO - stdout - {'loss': 0.3349, 'grad_norm': 1.5142873525619507, 'learning_rate': 9.810227051980648e-08, 'epoch': 2.87} +2025-02-06 07:15:10 - ERROR - stderr - 96%|█████████▌| 21463/22434 [21:07:29<40:49, 2.52s/it] +2025-02-06 07:15:12 - ERROR - stderr - 96%|█████████▌| 21464/22434 [21:07:32<40:34, 2.51s/it] +2025-02-06 07:15:12 - ERROR - stderr - +2025-02-06 07:15:12 - ERROR - stderr - +2025-02-06 07:15:12 - INFO - stdout - {'loss': 0.4021, 
'grad_norm': 1.8093655109405518, 'learning_rate': 9.790064034240432e-08, 'epoch': 2.87} +2025-02-06 07:15:12 - ERROR - stderr - 96%|█████████▌| 21464/22434 [21:07:32<40:34, 2.51s/it] +2025-02-06 07:15:15 - ERROR - stderr - 96%|█████████▌| 21465/22434 [21:07:34<40:42, 2.52s/it] +2025-02-06 07:15:15 - ERROR - stderr - +2025-02-06 07:15:15 - ERROR - stderr - +2025-02-06 07:15:15 - INFO - stdout - {'loss': 0.3651, 'grad_norm': 1.633550763130188, 'learning_rate': 9.769921656479053e-08, 'epoch': 2.87} +2025-02-06 07:15:15 - ERROR - stderr - 96%|█████████▌| 21465/22434 [21:07:35<40:42, 2.52s/it] +2025-02-06 07:15:17 - ERROR - stderr - 96%|█████████▌| 21466/22434 [21:07:37<40:15, 2.49s/it] +2025-02-06 07:15:17 - ERROR - stderr - +2025-02-06 07:15:17 - ERROR - stderr - +2025-02-06 07:15:17 - INFO - stdout - {'loss': 0.3429, 'grad_norm': 1.5115312337875366, 'learning_rate': 9.749799919115844e-08, 'epoch': 2.87} +2025-02-06 07:15:17 - ERROR - stderr - 96%|█████████▌| 21466/22434 [21:07:37<40:15, 2.49s/it] +2025-02-06 07:15:20 - ERROR - stderr - 96%|█████████▌| 21467/22434 [21:07:39<40:25, 2.51s/it] +2025-02-06 07:15:20 - ERROR - stderr - +2025-02-06 07:15:20 - ERROR - stderr - +2025-02-06 07:15:20 - INFO - stdout - {'loss': 0.3665, 'grad_norm': 1.6019270420074463, 'learning_rate': 9.729698822570688e-08, 'epoch': 2.87} +2025-02-06 07:15:20 - ERROR - stderr - 96%|█████████▌| 21467/22434 [21:07:39<40:25, 2.51s/it] +2025-02-06 07:15:22 - ERROR - stderr - 96%|█████████▌| 21468/22434 [21:07:42<41:30, 2.58s/it] +2025-02-06 07:15:22 - ERROR - stderr - +2025-02-06 07:15:22 - ERROR - stderr - +2025-02-06 07:15:22 - INFO - stdout - {'loss': 0.3403, 'grad_norm': 1.5307166576385498, 'learning_rate': 9.709618367262364e-08, 'epoch': 2.87} +2025-02-06 07:15:22 - ERROR - stderr - 96%|█████████▌| 21468/22434 [21:07:42<41:30, 2.58s/it] +2025-02-06 07:15:25 - ERROR - stderr - 96%|█████████▌| 21469/22434 [21:07:45<40:55, 2.54s/it] +2025-02-06 07:15:25 - ERROR - stderr - +2025-02-06 07:15:25 - 
ERROR - stderr - +2025-02-06 07:15:25 - INFO - stdout - {'loss': 0.3618, 'grad_norm': 1.6280714273452759, 'learning_rate': 9.689558553609313e-08, 'epoch': 2.87} +2025-02-06 07:15:25 - ERROR - stderr - 96%|█████████▌| 21469/22434 [21:07:45<40:55, 2.54s/it] +2025-02-06 07:15:27 - ERROR - stderr - 96%|█████████▌| 21470/22434 [21:07:47<40:53, 2.55s/it] +2025-02-06 07:15:27 - ERROR - stderr - +2025-02-06 07:15:27 - ERROR - stderr - +2025-02-06 07:15:27 - INFO - stdout - {'loss': 0.3226, 'grad_norm': 1.5097849369049072, 'learning_rate': 9.669519382029869e-08, 'epoch': 2.87} +2025-02-06 07:15:27 - ERROR - stderr - 96%|█████████▌| 21470/22434 [21:07:47<40:53, 2.55s/it] +2025-02-06 07:15:30 - ERROR - stderr - 96%|█████████▌| 21471/22434 [21:07:50<40:30, 2.52s/it] +2025-02-06 07:15:30 - ERROR - stderr - +2025-02-06 07:15:30 - ERROR - stderr - +2025-02-06 07:15:30 - INFO - stdout - {'loss': 0.3552, 'grad_norm': 1.4506953954696655, 'learning_rate': 9.649500852941696e-08, 'epoch': 2.87} +2025-02-06 07:15:30 - ERROR - stderr - 96%|█████████▌| 21471/22434 [21:07:50<40:30, 2.52s/it] +2025-02-06 07:15:32 - ERROR - stderr - 96%|█████████▌| 21472/22434 [21:07:52<40:02, 2.50s/it] +2025-02-06 07:15:32 - ERROR - stderr - +2025-02-06 07:15:32 - ERROR - stderr - +2025-02-06 07:15:32 - INFO - stdout - {'loss': 0.3784, 'grad_norm': 1.637101650238037, 'learning_rate': 9.629502966761905e-08, 'epoch': 2.87} +2025-02-06 07:15:32 - ERROR - stderr - 96%|█████████▌| 21472/22434 [21:07:52<40:02, 2.50s/it] +2025-02-06 07:15:35 - ERROR - stderr - 96%|█████████▌| 21473/22434 [21:07:55<39:45, 2.48s/it] +2025-02-06 07:15:35 - ERROR - stderr - +2025-02-06 07:15:35 - ERROR - stderr - +2025-02-06 07:15:35 - INFO - stdout - {'loss': 0.3691, 'grad_norm': 1.5781289339065552, 'learning_rate': 9.609525723907498e-08, 'epoch': 2.87} +2025-02-06 07:15:35 - ERROR - stderr - 96%|█████████▌| 21473/22434 [21:07:55<39:45, 2.48s/it] +2025-02-06 07:15:37 - ERROR - stderr - 96%|█████████▌| 21474/22434 [21:07:57<39:42, 
2.48s/it] +2025-02-06 07:15:37 - ERROR - stderr - +2025-02-06 07:15:37 - ERROR - stderr - +2025-02-06 07:15:37 - INFO - stdout - {'loss': 0.3521, 'grad_norm': 1.4545581340789795, 'learning_rate': 9.589569124794918e-08, 'epoch': 2.87} +2025-02-06 07:15:37 - ERROR - stderr - 96%|█████████▌| 21474/22434 [21:07:57<39:42, 2.48s/it] +2025-02-06 07:15:40 - ERROR - stderr - 96%|█████████▌| 21475/22434 [21:08:00<39:57, 2.50s/it] +2025-02-06 07:15:40 - ERROR - stderr - +2025-02-06 07:15:40 - ERROR - stderr - +2025-02-06 07:15:40 - INFO - stdout - {'loss': 0.3374, 'grad_norm': 1.584717869758606, 'learning_rate': 9.569633169839943e-08, 'epoch': 2.87} +2025-02-06 07:15:40 - ERROR - stderr - 96%|█████████▌| 21475/22434 [21:08:00<39:57, 2.50s/it] +2025-02-06 07:15:42 - ERROR - stderr - 96%|█████████▌| 21476/22434 [21:08:02<40:40, 2.55s/it] +2025-02-06 07:15:43 - ERROR - stderr - +2025-02-06 07:15:43 - ERROR - stderr - +2025-02-06 07:15:43 - INFO - stdout - {'loss': 0.3745, 'grad_norm': 1.5601285696029663, 'learning_rate': 9.549717859458241e-08, 'epoch': 2.87} +2025-02-06 07:15:43 - ERROR - stderr - 96%|█████████▌| 21476/22434 [21:08:02<40:40, 2.55s/it] +2025-02-06 07:15:45 - ERROR - stderr - 96%|█████████▌| 21477/22434 [21:08:05<40:48, 2.56s/it] +2025-02-06 07:15:45 - ERROR - stderr - +2025-02-06 07:15:45 - ERROR - stderr - +2025-02-06 07:15:45 - INFO - stdout - {'loss': 0.357, 'grad_norm': 1.5052555799484253, 'learning_rate': 9.529823194064924e-08, 'epoch': 2.87} +2025-02-06 07:15:45 - ERROR - stderr - 96%|█████████▌| 21477/22434 [21:08:05<40:48, 2.56s/it] +2025-02-06 07:15:48 - ERROR - stderr - 96%|█████████▌| 21478/22434 [21:08:07<40:35, 2.55s/it] +2025-02-06 07:15:48 - ERROR - stderr - +2025-02-06 07:15:48 - ERROR - stderr - +2025-02-06 07:15:48 - INFO - stdout - {'loss': 0.3319, 'grad_norm': 1.695675015449524, 'learning_rate': 9.509949174074662e-08, 'epoch': 2.87} +2025-02-06 07:15:48 - ERROR - stderr - 96%|█████████▌| 21478/22434 [21:08:07<40:35, 2.55s/it] +2025-02-06 
07:15:50 - ERROR - stderr - 96%|█████████▌| 21479/22434 [21:08:10<40:15, 2.53s/it] +2025-02-06 07:15:50 - ERROR - stderr - +2025-02-06 07:15:50 - ERROR - stderr - +2025-02-06 07:15:50 - INFO - stdout - {'loss': 0.3532, 'grad_norm': 1.4590795040130615, 'learning_rate': 9.490095799901677e-08, 'epoch': 2.87} +2025-02-06 07:15:50 - ERROR - stderr - 96%|█████████▌| 21479/22434 [21:08:10<40:15, 2.53s/it] +2025-02-06 07:15:53 - ERROR - stderr - 96%|█████████▌| 21480/22434 [21:08:12<40:08, 2.52s/it] +2025-02-06 07:15:53 - ERROR - stderr - +2025-02-06 07:15:53 - ERROR - stderr - +2025-02-06 07:15:53 - INFO - stdout - {'loss': 0.3471, 'grad_norm': 1.5261414051055908, 'learning_rate': 9.470263071959862e-08, 'epoch': 2.87} +2025-02-06 07:15:53 - ERROR - stderr - 96%|█████████▌| 21480/22434 [21:08:12<40:08, 2.52s/it] +2025-02-06 07:15:55 - ERROR - stderr - 96%|█████████▌| 21481/22434 [21:08:15<39:42, 2.50s/it] +2025-02-06 07:15:55 - ERROR - stderr - +2025-02-06 07:15:55 - ERROR - stderr - +2025-02-06 07:15:55 - INFO - stdout - {'loss': 0.3797, 'grad_norm': 1.6008661985397339, 'learning_rate': 9.450450990662552e-08, 'epoch': 2.87} +2025-02-06 07:15:55 - ERROR - stderr - 96%|█████████▌| 21481/22434 [21:08:15<39:42, 2.50s/it] +2025-02-06 07:15:57 - ERROR - stderr - 96%|█████████▌| 21482/22434 [21:08:17<39:25, 2.49s/it] +2025-02-06 07:15:58 - ERROR - stderr - +2025-02-06 07:15:58 - ERROR - stderr - +2025-02-06 07:15:58 - INFO - stdout - {'loss': 0.406, 'grad_norm': 1.5007354021072388, 'learning_rate': 9.43065955642275e-08, 'epoch': 2.87} +2025-02-06 07:15:58 - ERROR - stderr - 96%|█████████▌| 21482/22434 [21:08:17<39:25, 2.49s/it] +2025-02-06 07:16:00 - ERROR - stderr - 96%|█████████▌| 21483/22434 [21:08:20<39:20, 2.48s/it] +2025-02-06 07:16:00 - ERROR - stderr - +2025-02-06 07:16:00 - ERROR - stderr - +2025-02-06 07:16:00 - INFO - stdout - {'loss': 0.3425, 'grad_norm': 1.563239574432373, 'learning_rate': 9.410888769653015e-08, 'epoch': 2.87} +2025-02-06 07:16:00 - ERROR - stderr - 
96%|█████████▌| 21483/22434 [21:08:20<39:20, 2.48s/it] +2025-02-06 07:16:02 - ERROR - stderr - 96%|█████████▌| 21484/22434 [21:08:22<39:06, 2.47s/it] +2025-02-06 07:16:02 - ERROR - stderr - +2025-02-06 07:16:02 - ERROR - stderr - +2025-02-06 07:16:02 - INFO - stdout - {'loss': 0.3321, 'grad_norm': 1.430267333984375, 'learning_rate': 9.391138630765462e-08, 'epoch': 2.87} +2025-02-06 07:16:02 - ERROR - stderr - 96%|█████████▌| 21484/22434 [21:08:22<39:06, 2.47s/it] +2025-02-06 07:16:05 - ERROR - stderr - 96%|█████████▌| 21485/22434 [21:08:25<39:29, 2.50s/it] +2025-02-06 07:16:05 - ERROR - stderr - +2025-02-06 07:16:05 - ERROR - stderr - +2025-02-06 07:16:05 - INFO - stdout - {'loss': 0.3718, 'grad_norm': 1.542784571647644, 'learning_rate': 9.37140914017154e-08, 'epoch': 2.87} +2025-02-06 07:16:05 - ERROR - stderr - 96%|█████████▌| 21485/22434 [21:08:25<39:29, 2.50s/it] +2025-02-06 07:16:07 - ERROR - stderr - 96%|█████████▌| 21486/22434 [21:08:27<39:36, 2.51s/it] +2025-02-06 07:16:08 - ERROR - stderr - +2025-02-06 07:16:08 - ERROR - stderr - +2025-02-06 07:16:08 - INFO - stdout - {'loss': 0.3558, 'grad_norm': 1.5014311075210571, 'learning_rate': 9.351700298282806e-08, 'epoch': 2.87} +2025-02-06 07:16:08 - ERROR - stderr - 96%|█████████▌| 21486/22434 [21:08:27<39:36, 2.51s/it] +2025-02-06 07:16:10 - ERROR - stderr - 96%|█████████▌| 21487/22434 [21:08:30<41:13, 2.61s/it] +2025-02-06 07:16:10 - ERROR - stderr - +2025-02-06 07:16:10 - ERROR - stderr - +2025-02-06 07:16:10 - INFO - stdout - {'loss': 0.424, 'grad_norm': 1.7070379257202148, 'learning_rate': 9.332012105509935e-08, 'epoch': 2.87} +2025-02-06 07:16:10 - ERROR - stderr - 96%|█████████▌| 21487/22434 [21:08:30<41:13, 2.61s/it] +2025-02-06 07:16:13 - ERROR - stderr - 96%|█████████▌| 21488/22434 [21:08:33<42:10, 2.68s/it] +2025-02-06 07:16:13 - ERROR - stderr - +2025-02-06 07:16:13 - ERROR - stderr - +2025-02-06 07:16:13 - INFO - stdout - {'loss': 0.3821, 'grad_norm': 1.6422314643859863, 'learning_rate': 
9.312344562263153e-08, 'epoch': 2.87} +2025-02-06 07:16:13 - ERROR - stderr - 96%|█████████▌| 21488/22434 [21:08:33<42:10, 2.68s/it] +2025-02-06 07:16:16 - ERROR - stderr - 96%|█████████▌| 21489/22434 [21:08:35<41:35, 2.64s/it] +2025-02-06 07:16:16 - ERROR - stderr - +2025-02-06 07:16:16 - ERROR - stderr - +2025-02-06 07:16:16 - INFO - stdout - {'loss': 0.3734, 'grad_norm': 1.6174280643463135, 'learning_rate': 9.292697668952799e-08, 'epoch': 2.87} +2025-02-06 07:16:16 - ERROR - stderr - 96%|█████████▌| 21489/22434 [21:08:36<41:35, 2.64s/it] +2025-02-06 07:16:18 - ERROR - stderr - 96%|█████████▌| 21490/22434 [21:08:38<41:15, 2.62s/it] +2025-02-06 07:16:18 - ERROR - stderr - +2025-02-06 07:16:18 - ERROR - stderr - +2025-02-06 07:16:18 - INFO - stdout - {'loss': 0.3552, 'grad_norm': 1.6071025133132935, 'learning_rate': 9.273071425987878e-08, 'epoch': 2.87} +2025-02-06 07:16:18 - ERROR - stderr - 96%|█████████▌| 21490/22434 [21:08:38<41:15, 2.62s/it] +2025-02-06 07:16:21 - ERROR - stderr - 96%|█████████▌| 21491/22434 [21:08:41<40:32, 2.58s/it] +2025-02-06 07:16:21 - ERROR - stderr - +2025-02-06 07:16:21 - ERROR - stderr - +2025-02-06 07:16:21 - INFO - stdout - {'loss': 0.3377, 'grad_norm': 1.5185357332229614, 'learning_rate': 9.253465833778064e-08, 'epoch': 2.87} +2025-02-06 07:16:21 - ERROR - stderr - 96%|█████████▌| 21491/22434 [21:08:41<40:32, 2.58s/it] +2025-02-06 07:16:23 - ERROR - stderr - 96%|█████████▌| 21492/22434 [21:08:43<39:58, 2.55s/it] +2025-02-06 07:16:23 - ERROR - stderr - +2025-02-06 07:16:23 - ERROR - stderr - +2025-02-06 07:16:23 - INFO - stdout - {'loss': 0.3268, 'grad_norm': 1.7695194482803345, 'learning_rate': 9.233880892731473e-08, 'epoch': 2.87} +2025-02-06 07:16:23 - ERROR - stderr - 96%|█████████▌| 21492/22434 [21:08:43<39:58, 2.55s/it] +2025-02-06 07:16:26 - ERROR - stderr - 96%|█████████▌| 21493/22434 [21:08:46<40:01, 2.55s/it] +2025-02-06 07:16:26 - ERROR - stderr - +2025-02-06 07:16:26 - ERROR - stderr - +2025-02-06 07:16:26 - INFO - 
stdout - {'loss': 0.342, 'grad_norm': 1.4268543720245361, 'learning_rate': 9.214316603256668e-08, 'epoch': 2.87} +2025-02-06 07:16:26 - ERROR - stderr - 96%|█████████▌| 21493/22434 [21:08:46<40:01, 2.55s/it] +2025-02-06 07:16:29 - ERROR - stderr - 96%|█████████▌| 21494/22434 [21:08:48<41:15, 2.63s/it] +2025-02-06 07:16:29 - ERROR - stderr - +2025-02-06 07:16:29 - ERROR - stderr - +2025-02-06 07:16:29 - INFO - stdout - {'loss': 0.3757, 'grad_norm': 1.6798388957977295, 'learning_rate': 9.194772965761434e-08, 'epoch': 2.87} +2025-02-06 07:16:29 - ERROR - stderr - 96%|█████████▌| 21494/22434 [21:08:48<41:15, 2.63s/it] +2025-02-06 07:16:31 - ERROR - stderr - 96%|█████████▌| 21495/22434 [21:08:51<40:36, 2.59s/it] +2025-02-06 07:16:31 - ERROR - stderr - +2025-02-06 07:16:31 - ERROR - stderr - +2025-02-06 07:16:31 - INFO - stdout - {'loss': 0.4133, 'grad_norm': 1.8127113580703735, 'learning_rate': 9.17524998065289e-08, 'epoch': 2.87} +2025-02-06 07:16:31 - ERROR - stderr - 96%|█████████▌| 21495/22434 [21:08:51<40:36, 2.59s/it] +2025-02-06 07:16:34 - ERROR - stderr - 96%|█████████▌| 21496/22434 [21:08:53<40:11, 2.57s/it] +2025-02-06 07:16:34 - ERROR - stderr - +2025-02-06 07:16:34 - ERROR - stderr - +2025-02-06 07:16:34 - INFO - stdout - {'loss': 0.4121, 'grad_norm': 1.6814757585525513, 'learning_rate': 9.155747648338264e-08, 'epoch': 2.87} +2025-02-06 07:16:34 - ERROR - stderr - 96%|█████████▌| 21496/22434 [21:08:53<40:11, 2.57s/it] +2025-02-06 07:16:36 - ERROR - stderr - 96%|█████████▌| 21497/22434 [21:08:56<39:34, 2.53s/it] +2025-02-06 07:16:36 - ERROR - stderr - +2025-02-06 07:16:36 - ERROR - stderr - +2025-02-06 07:16:36 - INFO - stdout - {'loss': 0.3619, 'grad_norm': 1.6773827075958252, 'learning_rate': 9.1362659692239e-08, 'epoch': 2.87} +2025-02-06 07:16:36 - ERROR - stderr - 96%|█████████▌| 21497/22434 [21:08:56<39:34, 2.53s/it] +2025-02-06 07:16:39 - ERROR - stderr - 96%|█████████▌| 21498/22434 [21:08:59<40:59, 2.63s/it] +2025-02-06 07:16:39 - ERROR - stderr - 
+2025-02-06 07:16:39 - ERROR - stderr - +2025-02-06 07:16:39 - INFO - stdout - {'loss': 0.3738, 'grad_norm': 1.5675195455551147, 'learning_rate': 9.116804943715918e-08, 'epoch': 2.87} +2025-02-06 07:16:39 - ERROR - stderr - 96%|█████████▌| 21498/22434 [21:08:59<40:59, 2.63s/it] +2025-02-06 07:16:41 - ERROR - stderr - 96%|█████████▌| 21499/22434 [21:09:01<40:25, 2.59s/it] +2025-02-06 07:16:42 - ERROR - stderr - +2025-02-06 07:16:42 - ERROR - stderr - +2025-02-06 07:16:42 - INFO - stdout - {'loss': 0.3193, 'grad_norm': 1.3587759733200073, 'learning_rate': 9.09736457221999e-08, 'epoch': 2.87} +2025-02-06 07:16:42 - ERROR - stderr - 96%|█████████▌| 21499/22434 [21:09:01<40:25, 2.59s/it] +2025-02-06 07:16:44 - ERROR - stderr - 96%|█████████▌| 21500/22434 [21:09:04<40:03, 2.57s/it] +2025-02-06 07:16:44 - ERROR - stderr - +2025-02-06 07:16:44 - ERROR - stderr - +2025-02-06 07:16:44 - INFO - stdout - {'loss': 0.3799, 'grad_norm': 1.540226697921753, 'learning_rate': 9.07794485514124e-08, 'epoch': 2.88} +2025-02-06 07:16:44 - ERROR - stderr - 96%|█████████▌| 21500/22434 [21:09:04<40:03, 2.57s/it] +2025-02-06 07:16:46 - ERROR - stderr - 96%|█████████▌| 21501/22434 [21:09:06<39:43, 2.55s/it] +2025-02-06 07:16:47 - ERROR - stderr - +2025-02-06 07:16:47 - ERROR - stderr - +2025-02-06 07:16:47 - INFO - stdout - {'loss': 0.3745, 'grad_norm': 1.5474010705947876, 'learning_rate': 9.058545792884565e-08, 'epoch': 2.88} +2025-02-06 07:16:47 - ERROR - stderr - 96%|█████████▌| 21501/22434 [21:09:06<39:43, 2.55s/it] +2025-02-06 07:16:49 - ERROR - stderr - 96%|█████████▌| 21502/22434 [21:09:09<39:24, 2.54s/it] +2025-02-06 07:16:49 - ERROR - stderr - +2025-02-06 07:16:49 - ERROR - stderr - +2025-02-06 07:16:49 - INFO - stdout - {'loss': 0.3515, 'grad_norm': 1.6101235151290894, 'learning_rate': 9.039167385854308e-08, 'epoch': 2.88} +2025-02-06 07:16:49 - ERROR - stderr - 96%|█████████▌| 21502/22434 [21:09:09<39:24, 2.54s/it] +2025-02-06 07:16:51 - ERROR - stderr - 96%|█████████▌| 21503/22434 
[21:09:11<39:03, 2.52s/it] +2025-02-06 07:16:52 - ERROR - stderr - +2025-02-06 07:16:52 - ERROR - stderr - +2025-02-06 07:16:52 - INFO - stdout - {'loss': 0.3748, 'grad_norm': 1.7283473014831543, 'learning_rate': 9.019809634454369e-08, 'epoch': 2.88} +2025-02-06 07:16:52 - ERROR - stderr - 96%|█████████▌| 21503/22434 [21:09:11<39:03, 2.52s/it] +2025-02-06 07:16:54 - ERROR - stderr - 96%|█████████▌| 21504/22434 [21:09:14<38:57, 2.51s/it] +2025-02-06 07:16:54 - ERROR - stderr - +2025-02-06 07:16:54 - ERROR - stderr - +2025-02-06 07:16:54 - INFO - stdout - {'loss': 0.3847, 'grad_norm': 1.7435520887374878, 'learning_rate': 9.000472539088201e-08, 'epoch': 2.88} +2025-02-06 07:16:54 - ERROR - stderr - 96%|█████████▌| 21504/22434 [21:09:14<38:57, 2.51s/it] +2025-02-06 07:16:56 - ERROR - stderr - 96%|█████████▌| 21505/22434 [21:09:16<38:49, 2.51s/it] +2025-02-06 07:16:57 - ERROR - stderr - +2025-02-06 07:16:57 - ERROR - stderr - +2025-02-06 07:16:57 - INFO - stdout - {'loss': 0.3543, 'grad_norm': 1.4107561111450195, 'learning_rate': 8.981156100158928e-08, 'epoch': 2.88} +2025-02-06 07:16:57 - ERROR - stderr - 96%|█████████▌| 21505/22434 [21:09:16<38:49, 2.51s/it] +2025-02-06 07:16:59 - ERROR - stderr - 96%|█████████▌| 21506/22434 [21:09:19<38:23, 2.48s/it] +2025-02-06 07:16:59 - ERROR - stderr - +2025-02-06 07:16:59 - ERROR - stderr - +2025-02-06 07:16:59 - INFO - stdout - {'loss': 0.3138, 'grad_norm': 1.5231406688690186, 'learning_rate': 8.961860318069115e-08, 'epoch': 2.88} +2025-02-06 07:16:59 - ERROR - stderr - 96%|█████████▌| 21506/22434 [21:09:19<38:23, 2.48s/it] +2025-02-06 07:17:01 - ERROR - stderr - 96%|█████████▌| 21507/22434 [21:09:21<38:24, 2.49s/it] +2025-02-06 07:17:01 - ERROR - stderr - +2025-02-06 07:17:01 - ERROR - stderr - +2025-02-06 07:17:01 - INFO - stdout - {'loss': 0.3726, 'grad_norm': 1.5713320970535278, 'learning_rate': 8.942585193220998e-08, 'epoch': 2.88} +2025-02-06 07:17:01 - ERROR - stderr - 96%|█████████▌| 21507/22434 [21:09:21<38:24, 
2.49s/it] +2025-02-06 07:17:04 - ERROR - stderr - 96%|█████████▌| 21508/22434 [21:09:24<38:16, 2.48s/it] +2025-02-06 07:17:04 - ERROR - stderr - +2025-02-06 07:17:04 - ERROR - stderr - +2025-02-06 07:17:04 - INFO - stdout - {'loss': 0.3714, 'grad_norm': 1.6168357133865356, 'learning_rate': 8.923330726016366e-08, 'epoch': 2.88} +2025-02-06 07:17:04 - ERROR - stderr - 96%|█████████▌| 21508/22434 [21:09:24<38:16, 2.48s/it] +2025-02-06 07:17:06 - ERROR - stderr - 96%|█████████▌| 21509/22434 [21:09:26<37:55, 2.46s/it] +2025-02-06 07:17:06 - ERROR - stderr - +2025-02-06 07:17:06 - ERROR - stderr - +2025-02-06 07:17:06 - INFO - stdout - {'loss': 0.3761, 'grad_norm': 1.6079397201538086, 'learning_rate': 8.904096916856452e-08, 'epoch': 2.88} +2025-02-06 07:17:06 - ERROR - stderr - 96%|█████████▌| 21509/22434 [21:09:26<37:55, 2.46s/it] +2025-02-06 07:17:09 - ERROR - stderr - 96%|█████████▌| 21510/22434 [21:09:29<39:01, 2.53s/it] +2025-02-06 07:17:09 - ERROR - stderr - +2025-02-06 07:17:09 - ERROR - stderr - +2025-02-06 07:17:09 - INFO - stdout - {'loss': 0.3508, 'grad_norm': 1.6095151901245117, 'learning_rate': 8.884883766142494e-08, 'epoch': 2.88} +2025-02-06 07:17:09 - ERROR - stderr - 96%|█████████▌| 21510/22434 [21:09:29<39:01, 2.53s/it] +2025-02-06 07:17:12 - ERROR - stderr - 96%|█████████▌| 21511/22434 [21:09:31<39:23, 2.56s/it] +2025-02-06 07:17:12 - ERROR - stderr - +2025-02-06 07:17:12 - ERROR - stderr - +2025-02-06 07:17:12 - INFO - stdout - {'loss': 0.3628, 'grad_norm': 1.5530354976654053, 'learning_rate': 8.865691274274502e-08, 'epoch': 2.88} +2025-02-06 07:17:12 - ERROR - stderr - 96%|█████████▌| 21511/22434 [21:09:31<39:23, 2.56s/it] +2025-02-06 07:17:14 - ERROR - stderr - 96%|█████████▌| 21512/22434 [21:09:34<38:43, 2.52s/it] +2025-02-06 07:17:14 - ERROR - stderr - +2025-02-06 07:17:14 - ERROR - stderr - +2025-02-06 07:17:14 - INFO - stdout - {'loss': 0.4241, 'grad_norm': 1.8362239599227905, 'learning_rate': 8.846519441652935e-08, 'epoch': 2.88} +2025-02-06 
07:17:14 - ERROR - stderr - 96%|█████████▌| 21512/22434 [21:09:34<38:43, 2.52s/it] +2025-02-06 07:17:16 - ERROR - stderr - 96%|█████████▌| 21513/22434 [21:09:36<38:11, 2.49s/it] +2025-02-06 07:17:16 - ERROR - stderr - +2025-02-06 07:17:16 - ERROR - stderr - +2025-02-06 07:17:16 - INFO - stdout - {'loss': 0.359, 'grad_norm': 1.554849624633789, 'learning_rate': 8.827368268677139e-08, 'epoch': 2.88} +2025-02-06 07:17:16 - ERROR - stderr - 96%|█████████▌| 21513/22434 [21:09:36<38:11, 2.49s/it] +2025-02-06 07:17:19 - ERROR - stderr - 96%|█████████▌| 21514/22434 [21:09:39<37:58, 2.48s/it] +2025-02-06 07:17:19 - ERROR - stderr - +2025-02-06 07:17:19 - ERROR - stderr - +2025-02-06 07:17:19 - INFO - stdout - {'loss': 0.3229, 'grad_norm': 1.431810975074768, 'learning_rate': 8.808237755746352e-08, 'epoch': 2.88} +2025-02-06 07:17:19 - ERROR - stderr - 96%|█████████▌| 21514/22434 [21:09:39<37:58, 2.48s/it] +2025-02-06 07:17:21 - ERROR - stderr - 96%|█████████▌| 21515/22434 [21:09:41<38:00, 2.48s/it] +2025-02-06 07:17:21 - ERROR - stderr - +2025-02-06 07:17:21 - ERROR - stderr - +2025-02-06 07:17:21 - INFO - stdout - {'loss': 0.3654, 'grad_norm': 1.369589924812317, 'learning_rate': 8.789127903259586e-08, 'epoch': 2.88} +2025-02-06 07:17:21 - ERROR - stderr - 96%|█████████▌| 21515/22434 [21:09:41<38:00, 2.48s/it] +2025-02-06 07:17:24 - ERROR - stderr - 96%|█████████▌| 21516/22434 [21:09:44<38:11, 2.50s/it] +2025-02-06 07:17:24 - ERROR - stderr - +2025-02-06 07:17:24 - ERROR - stderr - +2025-02-06 07:17:24 - INFO - stdout - {'loss': 0.3351, 'grad_norm': 1.506874442100525, 'learning_rate': 8.770038711614747e-08, 'epoch': 2.88} +2025-02-06 07:17:24 - ERROR - stderr - 96%|█████████▌| 21516/22434 [21:09:44<38:11, 2.50s/it] +2025-02-06 07:17:26 - ERROR - stderr - 96%|█████████▌| 21517/22434 [21:09:46<38:27, 2.52s/it] +2025-02-06 07:17:27 - ERROR - stderr - +2025-02-06 07:17:27 - ERROR - stderr - +2025-02-06 07:17:27 - INFO - stdout - {'loss': 0.3278, 'grad_norm': 1.4686498641967773, 
'learning_rate': 8.750970181210072e-08, 'epoch': 2.88} +2025-02-06 07:17:27 - ERROR - stderr - 96%|█████████▌| 21517/22434 [21:09:46<38:27, 2.52s/it] +2025-02-06 07:17:29 - ERROR - stderr - 96%|█████████▌| 21518/22434 [21:09:49<38:44, 2.54s/it] +2025-02-06 07:17:29 - ERROR - stderr - +2025-02-06 07:17:29 - ERROR - stderr - +2025-02-06 07:17:29 - INFO - stdout - {'loss': 0.3807, 'grad_norm': 1.7377760410308838, 'learning_rate': 8.731922312442909e-08, 'epoch': 2.88} +2025-02-06 07:17:29 - ERROR - stderr - 96%|█████████▌| 21518/22434 [21:09:49<38:44, 2.54s/it] +2025-02-06 07:17:31 - ERROR - stderr - 96%|█████████▌| 21519/22434 [21:09:51<38:19, 2.51s/it] +2025-02-06 07:17:32 - ERROR - stderr - +2025-02-06 07:17:32 - ERROR - stderr - +2025-02-06 07:17:32 - INFO - stdout - {'loss': 0.3332, 'grad_norm': 1.4389381408691406, 'learning_rate': 8.712895105710162e-08, 'epoch': 2.88} +2025-02-06 07:17:32 - ERROR - stderr - 96%|█████████▌| 21519/22434 [21:09:51<38:19, 2.51s/it] +2025-02-06 07:17:34 - ERROR - stderr - 96%|█████████▌| 21520/22434 [21:09:54<37:53, 2.49s/it] +2025-02-06 07:17:34 - ERROR - stderr - +2025-02-06 07:17:34 - ERROR - stderr - +2025-02-06 07:17:34 - INFO - stdout - {'loss': 0.3578, 'grad_norm': 1.604592204093933, 'learning_rate': 8.693888561408625e-08, 'epoch': 2.88} +2025-02-06 07:17:34 - ERROR - stderr - 96%|█████████▌| 21520/22434 [21:09:54<37:53, 2.49s/it] +2025-02-06 07:17:36 - ERROR - stderr - 96%|█████████▌| 21521/22434 [21:09:56<37:33, 2.47s/it] +2025-02-06 07:17:36 - ERROR - stderr - +2025-02-06 07:17:36 - ERROR - stderr - +2025-02-06 07:17:36 - INFO - stdout - {'loss': 0.3401, 'grad_norm': 1.5750700235366821, 'learning_rate': 8.674902679934427e-08, 'epoch': 2.88} +2025-02-06 07:17:36 - ERROR - stderr - 96%|█████████▌| 21521/22434 [21:09:56<37:33, 2.47s/it] +2025-02-06 07:17:39 - ERROR - stderr - 96%|█████████▌| 21522/22434 [21:09:59<37:48, 2.49s/it] +2025-02-06 07:17:39 - ERROR - stderr - +2025-02-06 07:17:39 - ERROR - stderr - +2025-02-06 
07:17:39 - INFO - stdout - {'loss': 0.3222, 'grad_norm': 1.4906057119369507, 'learning_rate': 8.655937461683362e-08, 'epoch': 2.88} +2025-02-06 07:17:39 - ERROR - stderr - 96%|█████████▌| 21522/22434 [21:09:59<37:48, 2.49s/it] +2025-02-06 07:17:41 - ERROR - stderr - 96%|█████████▌| 21523/22434 [21:10:01<37:36, 2.48s/it] +2025-02-06 07:17:41 - ERROR - stderr - +2025-02-06 07:17:41 - ERROR - stderr - +2025-02-06 07:17:41 - INFO - stdout - {'loss': 0.3651, 'grad_norm': 1.517045021057129, 'learning_rate': 8.636992907050556e-08, 'epoch': 2.88} +2025-02-06 07:17:41 - ERROR - stderr - 96%|█████████▌| 21523/22434 [21:10:01<37:36, 2.48s/it] +2025-02-06 07:17:44 - ERROR - stderr - 96%|█████████▌| 21524/22434 [21:10:04<37:47, 2.49s/it] +2025-02-06 07:17:44 - ERROR - stderr - +2025-02-06 07:17:44 - ERROR - stderr - +2025-02-06 07:17:44 - INFO - stdout - {'loss': 0.3113, 'grad_norm': 1.5600850582122803, 'learning_rate': 8.618069016431029e-08, 'epoch': 2.88} +2025-02-06 07:17:44 - ERROR - stderr - 96%|█████████▌| 21524/22434 [21:10:04<37:47, 2.49s/it] +2025-02-06 07:17:46 - ERROR - stderr - 96%|█████████▌| 21525/22434 [21:10:06<37:45, 2.49s/it] +2025-02-06 07:17:46 - ERROR - stderr - +2025-02-06 07:17:46 - ERROR - stderr - +2025-02-06 07:17:46 - INFO - stdout - {'loss': 0.353, 'grad_norm': 1.3813369274139404, 'learning_rate': 8.599165790219133e-08, 'epoch': 2.88} +2025-02-06 07:17:46 - ERROR - stderr - 96%|█████████▌| 21525/22434 [21:10:06<37:45, 2.49s/it] +2025-02-06 07:17:49 - ERROR - stderr - 96%|█████████▌| 21526/22434 [21:10:09<39:24, 2.60s/it] +2025-02-06 07:17:49 - ERROR - stderr - +2025-02-06 07:17:49 - ERROR - stderr - +2025-02-06 07:17:49 - INFO - stdout - {'loss': 0.3633, 'grad_norm': 1.6630665063858032, 'learning_rate': 8.580283228809105e-08, 'epoch': 2.88} +2025-02-06 07:17:49 - ERROR - stderr - 96%|█████████▌| 21526/22434 [21:10:09<39:24, 2.60s/it] +2025-02-06 07:17:52 - ERROR - stderr - 96%|█████████▌| 21527/22434 [21:10:12<39:48, 2.63s/it] +2025-02-06 07:17:52 - 
ERROR - stderr - +2025-02-06 07:17:52 - ERROR - stderr - +2025-02-06 07:17:52 - INFO - stdout - {'loss': 0.3902, 'grad_norm': 1.6878085136413574, 'learning_rate': 8.5614213325943e-08, 'epoch': 2.88} +2025-02-06 07:17:52 - ERROR - stderr - 96%|█████████▌| 21527/22434 [21:10:12<39:48, 2.63s/it] +2025-02-06 07:17:54 - ERROR - stderr - 96%|█████████▌| 21528/22434 [21:10:14<39:22, 2.61s/it] +2025-02-06 07:17:55 - ERROR - stderr - +2025-02-06 07:17:55 - ERROR - stderr - +2025-02-06 07:17:55 - INFO - stdout - {'loss': 0.3736, 'grad_norm': 1.4341968297958374, 'learning_rate': 8.542580101967957e-08, 'epoch': 2.88} +2025-02-06 07:17:55 - ERROR - stderr - 96%|█████████▌| 21528/22434 [21:10:14<39:22, 2.61s/it] +2025-02-06 07:17:57 - ERROR - stderr - 96%|█████████▌| 21529/22434 [21:10:17<39:19, 2.61s/it] +2025-02-06 07:17:57 - ERROR - stderr - +2025-02-06 07:17:57 - ERROR - stderr - +2025-02-06 07:17:57 - INFO - stdout - {'loss': 0.3413, 'grad_norm': 1.3677970170974731, 'learning_rate': 8.523759537322873e-08, 'epoch': 2.88} +2025-02-06 07:17:57 - ERROR - stderr - 96%|█████████▌| 21529/22434 [21:10:17<39:19, 2.61s/it] +2025-02-06 07:18:00 - ERROR - stderr - 96%|█████████▌| 21530/22434 [21:10:19<38:52, 2.58s/it] +2025-02-06 07:18:00 - ERROR - stderr - +2025-02-06 07:18:00 - ERROR - stderr - +2025-02-06 07:18:00 - INFO - stdout - {'loss': 0.3793, 'grad_norm': 1.5509475469589233, 'learning_rate': 8.50495963905118e-08, 'epoch': 2.88} +2025-02-06 07:18:00 - ERROR - stderr - 96%|█████████▌| 21530/22434 [21:10:19<38:52, 2.58s/it] +2025-02-06 07:18:02 - ERROR - stderr - 96%|█████████▌| 21531/22434 [21:10:22<38:30, 2.56s/it] +2025-02-06 07:18:02 - ERROR - stderr - +2025-02-06 07:18:02 - ERROR - stderr - +2025-02-06 07:18:02 - INFO - stdout - {'loss': 0.3523, 'grad_norm': 1.5249499082565308, 'learning_rate': 8.486180407544897e-08, 'epoch': 2.88} +2025-02-06 07:18:02 - ERROR - stderr - 96%|█████████▌| 21531/22434 [21:10:22<38:30, 2.56s/it] +2025-02-06 07:18:05 - ERROR - stderr - 
96%|█████████▌| 21532/22434 [21:10:24<38:32, 2.56s/it] +2025-02-06 07:18:05 - ERROR - stderr - +2025-02-06 07:18:05 - ERROR - stderr - +2025-02-06 07:18:05 - INFO - stdout - {'loss': 0.4166, 'grad_norm': 1.6444282531738281, 'learning_rate': 8.467421843195488e-08, 'epoch': 2.88} +2025-02-06 07:18:05 - ERROR - stderr - 96%|█████████▌| 21532/22434 [21:10:24<38:32, 2.56s/it] +2025-02-06 07:18:07 - ERROR - stderr - 96%|█████████▌| 21533/22434 [21:10:27<38:44, 2.58s/it] +2025-02-06 07:18:07 - ERROR - stderr - +2025-02-06 07:18:07 - ERROR - stderr - +2025-02-06 07:18:07 - INFO - stdout - {'loss': 0.3282, 'grad_norm': 1.4829626083374023, 'learning_rate': 8.448683946393643e-08, 'epoch': 2.88} +2025-02-06 07:18:07 - ERROR - stderr - 96%|█████████▌| 21533/22434 [21:10:27<38:44, 2.58s/it] +2025-02-06 07:18:10 - ERROR - stderr - 96%|█████████▌| 21534/22434 [21:10:30<38:19, 2.56s/it] +2025-02-06 07:18:10 - ERROR - stderr - +2025-02-06 07:18:10 - ERROR - stderr - +2025-02-06 07:18:10 - INFO - stdout - {'loss': 0.3736, 'grad_norm': 1.6491841077804565, 'learning_rate': 8.42996671753038e-08, 'epoch': 2.88} +2025-02-06 07:18:10 - ERROR - stderr - 96%|█████████▌| 21534/22434 [21:10:30<38:19, 2.56s/it] +2025-02-06 07:18:12 - ERROR - stderr - 96%|█████████▌| 21535/22434 [21:10:32<37:52, 2.53s/it] +2025-02-06 07:18:12 - ERROR - stderr - +2025-02-06 07:18:12 - ERROR - stderr - +2025-02-06 07:18:12 - INFO - stdout - {'loss': 0.3683, 'grad_norm': 1.7757115364074707, 'learning_rate': 8.41127015699561e-08, 'epoch': 2.88} +2025-02-06 07:18:12 - ERROR - stderr - 96%|█████████▌| 21535/22434 [21:10:32<37:52, 2.53s/it] +2025-02-06 07:18:15 - ERROR - stderr - 96%|█████████▌| 21536/22434 [21:10:35<37:45, 2.52s/it] +2025-02-06 07:18:15 - ERROR - stderr - +2025-02-06 07:18:15 - ERROR - stderr - +2025-02-06 07:18:15 - INFO - stdout - {'loss': 0.3633, 'grad_norm': 1.374893307685852, 'learning_rate': 8.392594265179022e-08, 'epoch': 2.88} +2025-02-06 07:18:15 - ERROR - stderr - 96%|█████████▌| 21536/22434 
[21:10:35<37:45, 2.52s/it] +2025-02-06 07:18:17 - ERROR - stderr - 96%|█████████▌| 21537/22434 [21:10:37<37:57, 2.54s/it] +2025-02-06 07:18:17 - ERROR - stderr - +2025-02-06 07:18:17 - ERROR - stderr - +2025-02-06 07:18:17 - INFO - stdout - {'loss': 0.3385, 'grad_norm': 1.5248851776123047, 'learning_rate': 8.373939042469969e-08, 'epoch': 2.88} +2025-02-06 07:18:17 - ERROR - stderr - 96%|█████████▌| 21537/22434 [21:10:37<37:57, 2.54s/it] +2025-02-06 07:18:20 - ERROR - stderr - 96%|█████████▌| 21538/22434 [21:10:40<37:55, 2.54s/it] +2025-02-06 07:18:20 - ERROR - stderr - +2025-02-06 07:18:20 - ERROR - stderr - +2025-02-06 07:18:20 - INFO - stdout - {'loss': 0.3111, 'grad_norm': 1.4480652809143066, 'learning_rate': 8.355304489257254e-08, 'epoch': 2.88} +2025-02-06 07:18:20 - ERROR - stderr - 96%|█████████▌| 21538/22434 [21:10:40<37:55, 2.54s/it] +2025-02-06 07:18:22 - ERROR - stderr - 96%|█████████▌| 21539/22434 [21:10:42<37:29, 2.51s/it] +2025-02-06 07:18:22 - ERROR - stderr - +2025-02-06 07:18:22 - ERROR - stderr - +2025-02-06 07:18:22 - INFO - stdout - {'loss': 0.3367, 'grad_norm': 1.4269475936889648, 'learning_rate': 8.336690605929343e-08, 'epoch': 2.88} +2025-02-06 07:18:22 - ERROR - stderr - 96%|█████████▌| 21539/22434 [21:10:42<37:29, 2.51s/it] +2025-02-06 07:18:25 - ERROR - stderr - 96%|█████████▌| 21540/22434 [21:10:45<37:17, 2.50s/it] +2025-02-06 07:18:25 - ERROR - stderr - +2025-02-06 07:18:25 - ERROR - stderr - +2025-02-06 07:18:25 - INFO - stdout - {'loss': 0.4103, 'grad_norm': 1.7795159816741943, 'learning_rate': 8.318097392874147e-08, 'epoch': 2.88} +2025-02-06 07:18:25 - ERROR - stderr - 96%|█████████▌| 21540/22434 [21:10:45<37:17, 2.50s/it] +2025-02-06 07:18:27 - ERROR - stderr - 96%|█████████▌| 21541/22434 [21:10:47<37:31, 2.52s/it] +2025-02-06 07:18:27 - ERROR - stderr - +2025-02-06 07:18:27 - ERROR - stderr - +2025-02-06 07:18:27 - INFO - stdout - {'loss': 0.3972, 'grad_norm': 1.5217500925064087, 'learning_rate': 8.299524850479357e-08, 'epoch': 
2.88} +2025-02-06 07:18:27 - ERROR - stderr - 96%|█████████▌| 21541/22434 [21:10:47<37:31, 2.52s/it] +2025-02-06 07:18:30 - ERROR - stderr - 96%|█████████▌| 21542/22434 [21:10:50<37:34, 2.53s/it] +2025-02-06 07:18:30 - ERROR - stderr - +2025-02-06 07:18:30 - ERROR - stderr - +2025-02-06 07:18:30 - INFO - stdout - {'loss': 0.3483, 'grad_norm': 1.6410175561904907, 'learning_rate': 8.280972979131885e-08, 'epoch': 2.88} +2025-02-06 07:18:30 - ERROR - stderr - 96%|█████████▌| 21542/22434 [21:10:50<37:34, 2.53s/it] +2025-02-06 07:18:33 - ERROR - stderr - 96%|█████████▌| 21543/22434 [21:10:53<38:49, 2.61s/it] +2025-02-06 07:18:33 - ERROR - stderr - +2025-02-06 07:18:33 - ERROR - stderr - +2025-02-06 07:18:33 - INFO - stdout - {'loss': 0.3916, 'grad_norm': 1.495924711227417, 'learning_rate': 8.262441779218644e-08, 'epoch': 2.88} +2025-02-06 07:18:33 - ERROR - stderr - 96%|█████████▌| 21543/22434 [21:10:53<38:49, 2.61s/it] +2025-02-06 07:18:35 - ERROR - stderr - 96%|█████████▌| 21544/22434 [21:10:55<38:07, 2.57s/it] +2025-02-06 07:18:35 - ERROR - stderr - +2025-02-06 07:18:35 - ERROR - stderr - +2025-02-06 07:18:35 - INFO - stdout - {'loss': 0.393, 'grad_norm': 1.6971803903579712, 'learning_rate': 8.24393125112577e-08, 'epoch': 2.88} +2025-02-06 07:18:35 - ERROR - stderr - 96%|█████████▌| 21544/22434 [21:10:55<38:07, 2.57s/it] +2025-02-06 07:18:38 - ERROR - stderr - 96%|█████████▌| 21545/22434 [21:10:58<38:13, 2.58s/it] +2025-02-06 07:18:38 - ERROR - stderr - +2025-02-06 07:18:38 - ERROR - stderr - +2025-02-06 07:18:38 - INFO - stdout - {'loss': 0.4159, 'grad_norm': 1.6271405220031738, 'learning_rate': 8.225441395239176e-08, 'epoch': 2.88} +2025-02-06 07:18:38 - ERROR - stderr - 96%|█████████▌| 21545/22434 [21:10:58<38:13, 2.58s/it] +2025-02-06 07:18:40 - ERROR - stderr - 96%|█████████▌| 21546/22434 [21:11:00<37:46, 2.55s/it] +2025-02-06 07:18:40 - ERROR - stderr - +2025-02-06 07:18:40 - ERROR - stderr - +2025-02-06 07:18:40 - INFO - stdout - {'loss': 0.3572, 'grad_norm': 
1.6666885614395142, 'learning_rate': 8.20697221194422e-08, 'epoch': 2.88} +2025-02-06 07:18:40 - ERROR - stderr - 96%|█████████▌| 21546/22434 [21:11:00<37:46, 2.55s/it] +2025-02-06 07:18:43 - ERROR - stderr - 96%|█████████▌| 21547/22434 [21:11:02<37:05, 2.51s/it] +2025-02-06 07:18:43 - ERROR - stderr - +2025-02-06 07:18:43 - ERROR - stderr - +2025-02-06 07:18:43 - INFO - stdout - {'loss': 0.4284, 'grad_norm': 1.7279062271118164, 'learning_rate': 8.188523701625928e-08, 'epoch': 2.88} +2025-02-06 07:18:43 - ERROR - stderr - 96%|█████████▌| 21547/22434 [21:11:03<37:05, 2.51s/it] +2025-02-06 07:18:45 - ERROR - stderr - 96%|█████████▌| 21548/22434 [21:11:05<37:04, 2.51s/it] +2025-02-06 07:18:45 - ERROR - stderr - +2025-02-06 07:18:45 - ERROR - stderr - +2025-02-06 07:18:45 - INFO - stdout - {'loss': 0.3495, 'grad_norm': 1.7403970956802368, 'learning_rate': 8.170095864668881e-08, 'epoch': 2.88} +2025-02-06 07:18:45 - ERROR - stderr - 96%|█████████▌| 21548/22434 [21:11:05<37:04, 2.51s/it] +2025-02-06 07:18:48 - ERROR - stderr - 96%|█████████▌| 21549/22434 [21:11:07<36:41, 2.49s/it] +2025-02-06 07:18:48 - ERROR - stderr - +2025-02-06 07:18:48 - ERROR - stderr - +2025-02-06 07:18:48 - INFO - stdout - {'loss': 0.4002, 'grad_norm': 1.5310173034667969, 'learning_rate': 8.151688701456884e-08, 'epoch': 2.88} +2025-02-06 07:18:48 - ERROR - stderr - 96%|█████████▌| 21549/22434 [21:11:07<36:41, 2.49s/it] +2025-02-06 07:18:50 - ERROR - stderr - 96%|█████████▌| 21550/22434 [21:11:10<37:02, 2.51s/it] +2025-02-06 07:18:50 - ERROR - stderr - +2025-02-06 07:18:50 - ERROR - stderr - +2025-02-06 07:18:50 - INFO - stdout - {'loss': 0.3239, 'grad_norm': 1.5115386247634888, 'learning_rate': 8.133302212373961e-08, 'epoch': 2.88} +2025-02-06 07:18:50 - ERROR - stderr - 96%|█████████▌| 21550/22434 [21:11:10<37:02, 2.51s/it] +2025-02-06 07:18:53 - ERROR - stderr - 96%|█████████▌| 21551/22434 [21:11:13<37:03, 2.52s/it] +2025-02-06 07:18:53 - ERROR - stderr - +2025-02-06 07:18:53 - ERROR - stderr - 
+2025-02-06 07:18:53 - INFO - stdout - {'loss': 0.3453, 'grad_norm': 1.5396368503570557, 'learning_rate': 8.114936397803252e-08, 'epoch': 2.88} +2025-02-06 07:18:53 - ERROR - stderr - 96%|█████████▌| 21551/22434 [21:11:13<37:03, 2.52s/it] +2025-02-06 07:18:55 - ERROR - stderr - 96%|█████████▌| 21552/22434 [21:11:15<36:55, 2.51s/it] +2025-02-06 07:18:55 - ERROR - stderr - +2025-02-06 07:18:55 - ERROR - stderr - +2025-02-06 07:18:55 - INFO - stdout - {'loss': 0.3661, 'grad_norm': 1.5014476776123047, 'learning_rate': 8.09659125812745e-08, 'epoch': 2.88} +2025-02-06 07:18:55 - ERROR - stderr - 96%|█████████▌| 21552/22434 [21:11:15<36:55, 2.51s/it] +2025-02-06 07:18:58 - ERROR - stderr - 96%|█████████▌| 21553/22434 [21:11:18<36:45, 2.50s/it] +2025-02-06 07:18:58 - ERROR - stderr - +2025-02-06 07:18:58 - ERROR - stderr - +2025-02-06 07:18:58 - INFO - stdout - {'loss': 0.3123, 'grad_norm': 1.3916600942611694, 'learning_rate': 8.07826679372925e-08, 'epoch': 2.88} +2025-02-06 07:18:58 - ERROR - stderr - 96%|█████████▌| 21553/22434 [21:11:18<36:45, 2.50s/it] +2025-02-06 07:19:00 - ERROR - stderr - 96%|█████████▌| 21554/22434 [21:11:20<36:55, 2.52s/it] +2025-02-06 07:19:00 - ERROR - stderr - +2025-02-06 07:19:00 - ERROR - stderr - +2025-02-06 07:19:00 - INFO - stdout - {'loss': 0.3154, 'grad_norm': 1.5446007251739502, 'learning_rate': 8.059963004990234e-08, 'epoch': 2.88} +2025-02-06 07:19:00 - ERROR - stderr - 96%|█████████▌| 21554/22434 [21:11:20<36:55, 2.52s/it] +2025-02-06 07:19:03 - ERROR - stderr - 96%|█████████▌| 21555/22434 [21:11:23<36:48, 2.51s/it] +2025-02-06 07:19:03 - ERROR - stderr - +2025-02-06 07:19:03 - ERROR - stderr - +2025-02-06 07:19:03 - INFO - stdout - {'loss': 0.3409, 'grad_norm': 1.567206621170044, 'learning_rate': 8.041679892292209e-08, 'epoch': 2.88} +2025-02-06 07:19:03 - ERROR - stderr - 96%|█████████▌| 21555/22434 [21:11:23<36:48, 2.51s/it] +2025-02-06 07:19:05 - ERROR - stderr - 96%|█████████▌| 21556/22434 [21:11:25<36:51, 2.52s/it] +2025-02-06 
07:19:05 - ERROR - stderr - +2025-02-06 07:19:05 - ERROR - stderr - +2025-02-06 07:19:05 - INFO - stdout - {'loss': 0.3876, 'grad_norm': 1.7728365659713745, 'learning_rate': 8.023417456016202e-08, 'epoch': 2.88} +2025-02-06 07:19:05 - ERROR - stderr - 96%|█████████▌| 21556/22434 [21:11:25<36:51, 2.52s/it] +2025-02-06 07:19:08 - ERROR - stderr - 96%|█████████▌| 21557/22434 [21:11:28<36:27, 2.49s/it] +2025-02-06 07:19:08 - ERROR - stderr - +2025-02-06 07:19:08 - ERROR - stderr - +2025-02-06 07:19:08 - INFO - stdout - {'loss': 0.3533, 'grad_norm': 1.587586522102356, 'learning_rate': 8.005175696542688e-08, 'epoch': 2.88} +2025-02-06 07:19:08 - ERROR - stderr - 96%|█████████▌| 21557/22434 [21:11:28<36:27, 2.49s/it] +2025-02-06 07:19:10 - ERROR - stderr - 96%|█████████▌| 21558/22434 [21:11:30<36:40, 2.51s/it] +2025-02-06 07:19:10 - ERROR - stderr - +2025-02-06 07:19:10 - ERROR - stderr - +2025-02-06 07:19:10 - INFO - stdout - {'loss': 0.3392, 'grad_norm': 1.4738870859146118, 'learning_rate': 7.98695461425214e-08, 'epoch': 2.88} +2025-02-06 07:19:10 - ERROR - stderr - 96%|█████████▌| 21558/22434 [21:11:30<36:40, 2.51s/it] +2025-02-06 07:19:13 - ERROR - stderr - 96%|█████████▌| 21559/22434 [21:11:33<36:32, 2.51s/it] +2025-02-06 07:19:13 - ERROR - stderr - +2025-02-06 07:19:13 - ERROR - stderr - +2025-02-06 07:19:13 - INFO - stdout - {'loss': 0.2953, 'grad_norm': 1.4089224338531494, 'learning_rate': 7.968754209524254e-08, 'epoch': 2.88} +2025-02-06 07:19:13 - ERROR - stderr - 96%|█████████▌| 21559/22434 [21:11:33<36:32, 2.51s/it] +2025-02-06 07:19:15 - ERROR - stderr - 96%|█████████▌| 21560/22434 [21:11:35<36:37, 2.51s/it] +2025-02-06 07:19:15 - ERROR - stderr - +2025-02-06 07:19:15 - ERROR - stderr - +2025-02-06 07:19:15 - INFO - stdout - {'loss': 0.3526, 'grad_norm': 1.5764145851135254, 'learning_rate': 7.950574482738505e-08, 'epoch': 2.88} +2025-02-06 07:19:15 - ERROR - stderr - 96%|█████████▌| 21560/22434 [21:11:35<36:37, 2.51s/it] +2025-02-06 07:19:18 - ERROR - stderr 
- 96%|█████████▌| 21561/22434 [21:11:38<36:43, 2.52s/it] +2025-02-06 07:19:18 - ERROR - stderr - +2025-02-06 07:19:18 - ERROR - stderr - +2025-02-06 07:19:18 - INFO - stdout - {'loss': 0.2913, 'grad_norm': 1.380562424659729, 'learning_rate': 7.932415434273589e-08, 'epoch': 2.88} +2025-02-06 07:19:18 - ERROR - stderr - 96%|█████████▌| 21561/22434 [21:11:38<36:43, 2.52s/it] +2025-02-06 07:19:20 - ERROR - stderr - 96%|█████████▌| 21562/22434 [21:11:40<36:27, 2.51s/it] +2025-02-06 07:19:20 - ERROR - stderr - +2025-02-06 07:19:20 - ERROR - stderr - +2025-02-06 07:19:20 - INFO - stdout - {'loss': 0.4326, 'grad_norm': 1.7031933069229126, 'learning_rate': 7.914277064508314e-08, 'epoch': 2.88} +2025-02-06 07:19:20 - ERROR - stderr - 96%|█████████▌| 21562/22434 [21:11:40<36:27, 2.51s/it] +2025-02-06 07:19:23 - ERROR - stderr - 96%|█████████▌| 21563/22434 [21:11:43<36:04, 2.49s/it] +2025-02-06 07:19:23 - ERROR - stderr - +2025-02-06 07:19:23 - ERROR - stderr - +2025-02-06 07:19:23 - INFO - stdout - {'loss': 0.3935, 'grad_norm': 1.8635889291763306, 'learning_rate': 7.896159373820489e-08, 'epoch': 2.88} +2025-02-06 07:19:23 - ERROR - stderr - 96%|█████████▌| 21563/22434 [21:11:43<36:04, 2.49s/it] +2025-02-06 07:19:25 - ERROR - stderr - 96%|█████████▌| 21564/22434 [21:11:45<36:12, 2.50s/it] +2025-02-06 07:19:25 - ERROR - stderr - +2025-02-06 07:19:25 - ERROR - stderr - +2025-02-06 07:19:25 - INFO - stdout - {'loss': 0.3727, 'grad_norm': 1.550293207168579, 'learning_rate': 7.878062362587924e-08, 'epoch': 2.88} +2025-02-06 07:19:25 - ERROR - stderr - 96%|█████████▌| 21564/22434 [21:11:45<36:12, 2.50s/it] +2025-02-06 07:19:28 - ERROR - stderr - 96%|█████████▌| 21565/22434 [21:11:48<35:52, 2.48s/it] +2025-02-06 07:19:28 - ERROR - stderr - +2025-02-06 07:19:28 - ERROR - stderr - +2025-02-06 07:19:28 - INFO - stdout - {'loss': 0.3757, 'grad_norm': 1.4565457105636597, 'learning_rate': 7.859986031187761e-08, 'epoch': 2.88} +2025-02-06 07:19:28 - ERROR - stderr - 96%|█████████▌| 
21565/22434 [21:11:48<35:52, 2.48s/it] +2025-02-06 07:19:30 - ERROR - stderr - 96%|█████████▌| 21566/22434 [21:11:50<35:51, 2.48s/it] +2025-02-06 07:19:30 - ERROR - stderr - +2025-02-06 07:19:30 - ERROR - stderr - +2025-02-06 07:19:30 - INFO - stdout - {'loss': 0.3808, 'grad_norm': 1.6653430461883545, 'learning_rate': 7.84193037999692e-08, 'epoch': 2.88} +2025-02-06 07:19:30 - ERROR - stderr - 96%|█████████▌| 21566/22434 [21:11:50<35:51, 2.48s/it] +2025-02-06 07:19:33 - ERROR - stderr - 96%|█████████▌| 21567/22434 [21:11:52<35:39, 2.47s/it] +2025-02-06 07:19:33 - ERROR - stderr - +2025-02-06 07:19:33 - ERROR - stderr - +2025-02-06 07:19:33 - INFO - stdout - {'loss': 0.3998, 'grad_norm': 1.6085915565490723, 'learning_rate': 7.823895409391546e-08, 'epoch': 2.88} +2025-02-06 07:19:33 - ERROR - stderr - 96%|█████████▌| 21567/22434 [21:11:52<35:39, 2.47s/it] +2025-02-06 07:19:35 - ERROR - stderr - 96%|█████████▌| 21568/22434 [21:11:55<37:06, 2.57s/it] +2025-02-06 07:19:36 - ERROR - stderr - +2025-02-06 07:19:36 - ERROR - stderr - +2025-02-06 07:19:36 - INFO - stdout - {'loss': 0.3324, 'grad_norm': 1.5220946073532104, 'learning_rate': 7.805881119747672e-08, 'epoch': 2.88} +2025-02-06 07:19:36 - ERROR - stderr - 96%|█████████▌| 21568/22434 [21:11:55<37:06, 2.57s/it] +2025-02-06 07:19:38 - ERROR - stderr - 96%|█████████▌| 21569/22434 [21:11:58<36:34, 2.54s/it] +2025-02-06 07:19:38 - ERROR - stderr - +2025-02-06 07:19:38 - ERROR - stderr - +2025-02-06 07:19:38 - INFO - stdout - {'loss': 0.3261, 'grad_norm': 1.4650648832321167, 'learning_rate': 7.787887511440883e-08, 'epoch': 2.88} +2025-02-06 07:19:38 - ERROR - stderr - 96%|█████████▌| 21569/22434 [21:11:58<36:34, 2.54s/it] +2025-02-06 07:19:40 - ERROR - stderr - 96%|█████████▌| 21570/22434 [21:12:00<36:22, 2.53s/it] +2025-02-06 07:19:40 - ERROR - stderr - +2025-02-06 07:19:40 - ERROR - stderr - +2025-02-06 07:19:40 - INFO - stdout - {'loss': 0.3758, 'grad_norm': 1.344678521156311, 'learning_rate': 7.769914584845994e-08, 
'epoch': 2.88} +2025-02-06 07:19:40 - ERROR - stderr - 96%|█████████▌| 21570/22434 [21:12:00<36:22, 2.53s/it] +2025-02-06 07:19:43 - ERROR - stderr - 96%|█████████▌| 21571/22434 [21:12:03<35:54, 2.50s/it] +2025-02-06 07:19:43 - ERROR - stderr - +2025-02-06 07:19:43 - ERROR - stderr - +2025-02-06 07:19:43 - INFO - stdout - {'loss': 0.3776, 'grad_norm': 1.5516005754470825, 'learning_rate': 7.751962340337815e-08, 'epoch': 2.88} +2025-02-06 07:19:43 - ERROR - stderr - 96%|█████████▌| 21571/22434 [21:12:03<35:54, 2.50s/it] +2025-02-06 07:19:45 - ERROR - stderr - 96%|█████████▌| 21572/22434 [21:12:05<35:54, 2.50s/it] +2025-02-06 07:19:45 - ERROR - stderr - +2025-02-06 07:19:45 - ERROR - stderr - +2025-02-06 07:19:45 - INFO - stdout - {'loss': 0.38, 'grad_norm': 1.5888960361480713, 'learning_rate': 7.734030778290602e-08, 'epoch': 2.88} +2025-02-06 07:19:45 - ERROR - stderr - 96%|█████████▌| 21572/22434 [21:12:05<35:54, 2.50s/it] +2025-02-06 07:19:48 - ERROR - stderr - 96%|█████████▌| 21573/22434 [21:12:08<35:29, 2.47s/it] +2025-02-06 07:19:48 - ERROR - stderr - +2025-02-06 07:19:48 - ERROR - stderr - +2025-02-06 07:19:48 - INFO - stdout - {'loss': 0.3259, 'grad_norm': 1.5200307369232178, 'learning_rate': 7.716119899077834e-08, 'epoch': 2.88} +2025-02-06 07:19:48 - ERROR - stderr - 96%|█████████▌| 21573/22434 [21:12:08<35:29, 2.47s/it] +2025-02-06 07:19:50 - ERROR - stderr - 96%|█████████▌| 21574/22434 [21:12:10<35:28, 2.47s/it] +2025-02-06 07:19:50 - ERROR - stderr - +2025-02-06 07:19:50 - ERROR - stderr - +2025-02-06 07:19:50 - INFO - stdout - {'loss': 0.354, 'grad_norm': 1.4606342315673828, 'learning_rate': 7.698229703073213e-08, 'epoch': 2.88} +2025-02-06 07:19:50 - ERROR - stderr - 96%|█████████▌| 21574/22434 [21:12:10<35:28, 2.47s/it] +2025-02-06 07:19:53 - ERROR - stderr - 96%|█████████▌| 21575/22434 [21:12:13<35:32, 2.48s/it] +2025-02-06 07:19:53 - ERROR - stderr - +2025-02-06 07:19:53 - ERROR - stderr - +2025-02-06 07:19:53 - INFO - stdout - {'loss': 0.347, 
'grad_norm': 1.3554993867874146, 'learning_rate': 7.680360190649327e-08, 'epoch': 2.89} +2025-02-06 07:19:53 - ERROR - stderr - 96%|█████████▌| 21575/22434 [21:12:13<35:32, 2.48s/it] +2025-02-06 07:19:55 - ERROR - stderr - 96%|█████████▌| 21576/22434 [21:12:15<35:27, 2.48s/it] +2025-02-06 07:19:55 - ERROR - stderr - +2025-02-06 07:19:55 - ERROR - stderr - +2025-02-06 07:19:55 - INFO - stdout - {'loss': 0.3517, 'grad_norm': 1.7382012605667114, 'learning_rate': 7.662511362178993e-08, 'epoch': 2.89} +2025-02-06 07:19:55 - ERROR - stderr - 96%|█████████▌| 21576/22434 [21:12:15<35:27, 2.48s/it] +2025-02-06 07:19:58 - ERROR - stderr - 96%|█████████▌| 21577/22434 [21:12:17<35:23, 2.48s/it] +2025-02-06 07:19:58 - ERROR - stderr - +2025-02-06 07:19:58 - ERROR - stderr - +2025-02-06 07:19:58 - INFO - stdout - {'loss': 0.318, 'grad_norm': 1.4775586128234863, 'learning_rate': 7.644683218033911e-08, 'epoch': 2.89} +2025-02-06 07:19:58 - ERROR - stderr - 96%|█████████▌| 21577/22434 [21:12:18<35:23, 2.48s/it] +2025-02-06 07:20:00 - ERROR - stderr - 96%|█████████▌| 21578/22434 [21:12:20<35:34, 2.49s/it] +2025-02-06 07:20:00 - ERROR - stderr - +2025-02-06 07:20:00 - ERROR - stderr - +2025-02-06 07:20:00 - INFO - stdout - {'loss': 0.4231, 'grad_norm': 1.8527549505233765, 'learning_rate': 7.626875758585673e-08, 'epoch': 2.89} +2025-02-06 07:20:00 - ERROR - stderr - 96%|█████████▌| 21578/22434 [21:12:20<35:34, 2.49s/it] +2025-02-06 07:20:03 - ERROR - stderr - 96%|█████████▌| 21579/22434 [21:12:22<35:24, 2.49s/it] +2025-02-06 07:20:03 - ERROR - stderr - +2025-02-06 07:20:03 - ERROR - stderr - +2025-02-06 07:20:03 - INFO - stdout - {'loss': 0.3608, 'grad_norm': 1.694792628288269, 'learning_rate': 7.60908898420587e-08, 'epoch': 2.89} +2025-02-06 07:20:03 - ERROR - stderr - 96%|█████████▌| 21579/22434 [21:12:23<35:24, 2.49s/it] +2025-02-06 07:20:05 - ERROR - stderr - 96%|█████████▌| 21580/22434 [21:12:25<35:21, 2.48s/it] +2025-02-06 07:20:05 - ERROR - stderr - +2025-02-06 07:20:05 - ERROR 
- stderr - +2025-02-06 07:20:05 - INFO - stdout - {'loss': 0.341, 'grad_norm': 1.5652803182601929, 'learning_rate': 7.591322895264874e-08, 'epoch': 2.89} +2025-02-06 07:20:05 - ERROR - stderr - 96%|█████████▌| 21580/22434 [21:12:25<35:21, 2.48s/it] +2025-02-06 07:20:08 - ERROR - stderr - 96%|█████████▌| 21581/22434 [21:12:27<35:23, 2.49s/it] +2025-02-06 07:20:08 - ERROR - stderr - +2025-02-06 07:20:08 - ERROR - stderr - +2025-02-06 07:20:08 - INFO - stdout - {'loss': 0.3392, 'grad_norm': 1.4466875791549683, 'learning_rate': 7.573577492133055e-08, 'epoch': 2.89} +2025-02-06 07:20:08 - ERROR - stderr - 96%|█████████▌| 21581/22434 [21:12:28<35:23, 2.49s/it] +2025-02-06 07:20:10 - ERROR - stderr - 96%|█████████▌| 21582/22434 [21:12:30<35:18, 2.49s/it] +2025-02-06 07:20:10 - ERROR - stderr - +2025-02-06 07:20:10 - ERROR - stderr - +2025-02-06 07:20:10 - INFO - stdout - {'loss': 0.3368, 'grad_norm': 1.4100452661514282, 'learning_rate': 7.55585277518045e-08, 'epoch': 2.89} +2025-02-06 07:20:10 - ERROR - stderr - 96%|█████████▌| 21582/22434 [21:12:30<35:18, 2.49s/it] +2025-02-06 07:20:13 - ERROR - stderr - 96%|█████████▌| 21583/22434 [21:12:33<37:10, 2.62s/it] +2025-02-06 07:20:13 - ERROR - stderr - +2025-02-06 07:20:13 - ERROR - stderr - +2025-02-06 07:20:13 - INFO - stdout - {'loss': 0.3529, 'grad_norm': 1.4746037721633911, 'learning_rate': 7.53814874477643e-08, 'epoch': 2.89} +2025-02-06 07:20:13 - ERROR - stderr - 96%|█████████▌| 21583/22434 [21:12:33<37:10, 2.62s/it] +2025-02-06 07:20:16 - ERROR - stderr - 96%|█████████▌| 21584/22434 [21:12:35<36:32, 2.58s/it] +2025-02-06 07:20:16 - ERROR - stderr - +2025-02-06 07:20:16 - ERROR - stderr - +2025-02-06 07:20:16 - INFO - stdout - {'loss': 0.396, 'grad_norm': 1.743467926979065, 'learning_rate': 7.520465401290033e-08, 'epoch': 2.89} +2025-02-06 07:20:16 - ERROR - stderr - 96%|█████████▌| 21584/22434 [21:12:35<36:32, 2.58s/it] +2025-02-06 07:20:18 - ERROR - stderr - 96%|█████████▌| 21585/22434 [21:12:38<35:53, 2.54s/it] 
+2025-02-06 07:20:18 - ERROR - stderr - +2025-02-06 07:20:18 - ERROR - stderr - +2025-02-06 07:20:18 - INFO - stdout - {'loss': 0.3497, 'grad_norm': 1.4487121105194092, 'learning_rate': 7.502802745089743e-08, 'epoch': 2.89} +2025-02-06 07:20:18 - ERROR - stderr - 96%|█████████▌| 21585/22434 [21:12:38<35:53, 2.54s/it] +2025-02-06 07:20:21 - ERROR - stderr - 96%|█████████▌| 21586/22434 [21:12:40<35:51, 2.54s/it] +2025-02-06 07:20:21 - ERROR - stderr - +2025-02-06 07:20:21 - ERROR - stderr - +2025-02-06 07:20:21 - INFO - stdout - {'loss': 0.3491, 'grad_norm': 1.563948631286621, 'learning_rate': 7.485160776543931e-08, 'epoch': 2.89} +2025-02-06 07:20:21 - ERROR - stderr - 96%|█████████▌| 21586/22434 [21:12:40<35:51, 2.54s/it] +2025-02-06 07:20:23 - ERROR - stderr - 96%|█████████▌| 21587/22434 [21:12:43<35:29, 2.51s/it] +2025-02-06 07:20:23 - ERROR - stderr - +2025-02-06 07:20:23 - ERROR - stderr - +2025-02-06 07:20:23 - INFO - stdout - {'loss': 0.3926, 'grad_norm': 1.705986499786377, 'learning_rate': 7.467539496020082e-08, 'epoch': 2.89} +2025-02-06 07:20:23 - ERROR - stderr - 96%|█████████▌| 21587/22434 [21:12:43<35:29, 2.51s/it] +2025-02-06 07:20:26 - ERROR - stderr - 96%|█████████▌| 21588/22434 [21:12:45<35:28, 2.52s/it] +2025-02-06 07:20:26 - ERROR - stderr - +2025-02-06 07:20:26 - ERROR - stderr - +2025-02-06 07:20:26 - INFO - stdout - {'loss': 0.3408, 'grad_norm': 1.6362204551696777, 'learning_rate': 7.44993890388579e-08, 'epoch': 2.89} +2025-02-06 07:20:26 - ERROR - stderr - 96%|█████████▌| 21588/22434 [21:12:45<35:28, 2.52s/it] +2025-02-06 07:20:28 - ERROR - stderr - 96%|█████████▌| 21589/22434 [21:12:48<35:04, 2.49s/it] +2025-02-06 07:20:28 - ERROR - stderr - +2025-02-06 07:20:28 - ERROR - stderr - +2025-02-06 07:20:28 - INFO - stdout - {'loss': 0.3608, 'grad_norm': 1.4489270448684692, 'learning_rate': 7.43235900050765e-08, 'epoch': 2.89} +2025-02-06 07:20:28 - ERROR - stderr - 96%|█████████▌| 21589/22434 [21:12:48<35:04, 2.49s/it] +2025-02-06 07:20:30 - ERROR 
- stderr - 96%|█████████▌| 21590/22434 [21:12:50<34:54, 2.48s/it] +2025-02-06 07:20:30 - ERROR - stderr - +2025-02-06 07:20:30 - ERROR - stderr - +2025-02-06 07:20:30 - INFO - stdout - {'loss': 0.3157, 'grad_norm': 1.46921968460083, 'learning_rate': 7.414799786252147e-08, 'epoch': 2.89} +2025-02-06 07:20:30 - ERROR - stderr - 96%|█████████▌| 21590/22434 [21:12:50<34:54, 2.48s/it] +2025-02-06 07:20:33 - ERROR - stderr - 96%|█████████▌| 21591/22434 [21:12:53<35:13, 2.51s/it] +2025-02-06 07:20:33 - ERROR - stderr - +2025-02-06 07:20:33 - ERROR - stderr - +2025-02-06 07:20:33 - INFO - stdout - {'loss': 0.3196, 'grad_norm': 1.479576587677002, 'learning_rate': 7.397261261485434e-08, 'epoch': 2.89} +2025-02-06 07:20:33 - ERROR - stderr - 96%|█████████▌| 21591/22434 [21:12:53<35:13, 2.51s/it] +2025-02-06 07:20:36 - ERROR - stderr - 96%|█████████▌| 21592/22434 [21:12:56<36:21, 2.59s/it] +2025-02-06 07:20:36 - ERROR - stderr - +2025-02-06 07:20:36 - ERROR - stderr - +2025-02-06 07:20:36 - INFO - stdout - {'loss': 0.3914, 'grad_norm': 1.6558668613433838, 'learning_rate': 7.379743426572883e-08, 'epoch': 2.89} +2025-02-06 07:20:36 - ERROR - stderr - 96%|█████████▌| 21592/22434 [21:12:56<36:21, 2.59s/it] +2025-02-06 07:20:39 - ERROR - stderr - 96%|█████████▋| 21593/22434 [21:12:58<36:51, 2.63s/it] +2025-02-06 07:20:39 - ERROR - stderr - +2025-02-06 07:20:39 - ERROR - stderr - +2025-02-06 07:20:39 - INFO - stdout - {'loss': 0.3526, 'grad_norm': 1.4549545049667358, 'learning_rate': 7.36224628187987e-08, 'epoch': 2.89} +2025-02-06 07:20:39 - ERROR - stderr - 96%|█████████▋| 21593/22434 [21:12:58<36:51, 2.63s/it] +2025-02-06 07:20:41 - ERROR - stderr - 96%|█████████▋| 21594/22434 [21:13:01<36:19, 2.59s/it] +2025-02-06 07:20:41 - ERROR - stderr - +2025-02-06 07:20:41 - ERROR - stderr - +2025-02-06 07:20:41 - INFO - stdout - {'loss': 0.3554, 'grad_norm': 1.4747297763824463, 'learning_rate': 7.344769827770882e-08, 'epoch': 2.89} +2025-02-06 07:20:41 - ERROR - stderr - 96%|█████████▋| 
21594/22434 [21:13:01<36:19, 2.59s/it] +2025-02-06 07:20:43 - ERROR - stderr - 96%|█████████▋| 21595/22434 [21:13:03<35:29, 2.54s/it] +2025-02-06 07:20:43 - ERROR - stderr - +2025-02-06 07:20:43 - ERROR - stderr - +2025-02-06 07:20:43 - INFO - stdout - {'loss': 0.3969, 'grad_norm': 1.7839456796646118, 'learning_rate': 7.327314064610403e-08, 'epoch': 2.89} +2025-02-06 07:20:43 - ERROR - stderr - 96%|█████████▋| 21595/22434 [21:13:03<35:29, 2.54s/it] +2025-02-06 07:20:46 - ERROR - stderr - 96%|█████████▋| 21596/22434 [21:13:06<35:24, 2.53s/it] +2025-02-06 07:20:46 - ERROR - stderr - +2025-02-06 07:20:46 - ERROR - stderr - +2025-02-06 07:20:46 - INFO - stdout - {'loss': 0.3358, 'grad_norm': 1.547399640083313, 'learning_rate': 7.309878992762142e-08, 'epoch': 2.89} +2025-02-06 07:20:46 - ERROR - stderr - 96%|█████████▋| 21596/22434 [21:13:06<35:24, 2.53s/it] +2025-02-06 07:20:48 - ERROR - stderr - 96%|█████████▋| 21597/22434 [21:13:08<34:56, 2.51s/it] +2025-02-06 07:20:48 - ERROR - stderr - +2025-02-06 07:20:48 - ERROR - stderr - +2025-02-06 07:20:48 - INFO - stdout - {'loss': 0.412, 'grad_norm': 1.7318370342254639, 'learning_rate': 7.292464612589478e-08, 'epoch': 2.89} +2025-02-06 07:20:48 - ERROR - stderr - 96%|█████████▋| 21597/22434 [21:13:08<34:56, 2.51s/it] +2025-02-06 07:20:51 - ERROR - stderr - 96%|█████████▋| 21598/22434 [21:13:11<34:55, 2.51s/it] +2025-02-06 07:20:51 - ERROR - stderr - +2025-02-06 07:20:51 - ERROR - stderr - +2025-02-06 07:20:51 - INFO - stdout - {'loss': 0.3635, 'grad_norm': 1.4341596364974976, 'learning_rate': 7.275070924455563e-08, 'epoch': 2.89} +2025-02-06 07:20:51 - ERROR - stderr - 96%|█████████▋| 21598/22434 [21:13:11<34:55, 2.51s/it] +2025-02-06 07:20:53 - ERROR - stderr - 96%|█████████▋| 21599/22434 [21:13:13<34:59, 2.51s/it] +2025-02-06 07:20:53 - ERROR - stderr - +2025-02-06 07:20:53 - ERROR - stderr - +2025-02-06 07:20:53 - INFO - stdout - {'loss': 0.3732, 'grad_norm': 1.4663244485855103, 'learning_rate': 7.257697928722774e-08, 
'epoch': 2.89} +2025-02-06 07:20:53 - ERROR - stderr - 96%|█████████▋| 21599/22434 [21:13:13<34:59, 2.51s/it] +2025-02-06 07:20:56 - ERROR - stderr - 96%|█████████▋| 21600/22434 [21:13:16<34:44, 2.50s/it] +2025-02-06 07:20:56 - ERROR - stderr - +2025-02-06 07:20:56 - ERROR - stderr - +2025-02-06 07:20:56 - INFO - stdout - {'loss': 0.3328, 'grad_norm': 1.7320536375045776, 'learning_rate': 7.240345625753486e-08, 'epoch': 2.89} +2025-02-06 07:20:56 - ERROR - stderr - 96%|█████████▋| 21600/22434 [21:13:16<34:44, 2.50s/it] +2025-02-06 07:20:58 - ERROR - stderr - 96%|█████████▋| 21601/22434 [21:13:18<34:44, 2.50s/it] +2025-02-06 07:20:58 - ERROR - stderr - +2025-02-06 07:20:58 - ERROR - stderr - +2025-02-06 07:20:58 - INFO - stdout - {'loss': 0.3112, 'grad_norm': 1.380563497543335, 'learning_rate': 7.22301401590908e-08, 'epoch': 2.89} +2025-02-06 07:20:58 - ERROR - stderr - 96%|█████████▋| 21601/22434 [21:13:18<34:44, 2.50s/it] +2025-02-06 07:21:01 - ERROR - stderr - 96%|█████████▋| 21602/22434 [21:13:21<34:43, 2.50s/it] +2025-02-06 07:21:01 - ERROR - stderr - +2025-02-06 07:21:01 - ERROR - stderr - +2025-02-06 07:21:01 - INFO - stdout - {'loss': 0.3676, 'grad_norm': 1.6667157411575317, 'learning_rate': 7.205703099551042e-08, 'epoch': 2.89} +2025-02-06 07:21:01 - ERROR - stderr - 96%|█████████▋| 21602/22434 [21:13:21<34:43, 2.50s/it] +2025-02-06 07:21:03 - ERROR - stderr - 96%|█████████▋| 21603/22434 [21:13:23<34:48, 2.51s/it] +2025-02-06 07:21:03 - ERROR - stderr - +2025-02-06 07:21:03 - ERROR - stderr - +2025-02-06 07:21:03 - INFO - stdout - {'loss': 0.3919, 'grad_norm': 1.649115800857544, 'learning_rate': 7.188412877040086e-08, 'epoch': 2.89} +2025-02-06 07:21:03 - ERROR - stderr - 96%|█████████▋| 21603/22434 [21:13:23<34:48, 2.51s/it] +2025-02-06 07:21:06 - ERROR - stderr - 96%|█████████▋| 21604/22434 [21:13:26<34:36, 2.50s/it] +2025-02-06 07:21:06 - ERROR - stderr - +2025-02-06 07:21:06 - ERROR - stderr - +2025-02-06 07:21:06 - INFO - stdout - {'loss': 0.3465, 
'grad_norm': 1.5852653980255127, 'learning_rate': 7.171143348736475e-08, 'epoch': 2.89} +2025-02-06 07:21:06 - ERROR - stderr - 96%|█████████▋| 21604/22434 [21:13:26<34:36, 2.50s/it] +2025-02-06 07:21:08 - ERROR - stderr - 96%|█████████▋| 21605/22434 [21:13:28<34:15, 2.48s/it] +2025-02-06 07:21:08 - ERROR - stderr - +2025-02-06 07:21:08 - ERROR - stderr - +2025-02-06 07:21:08 - INFO - stdout - {'loss': 0.3816, 'grad_norm': 1.5737032890319824, 'learning_rate': 7.153894515000592e-08, 'epoch': 2.89} +2025-02-06 07:21:08 - ERROR - stderr - 96%|█████████▋| 21605/22434 [21:13:28<34:15, 2.48s/it] +2025-02-06 07:21:11 - ERROR - stderr - 96%|█████████▋| 21606/22434 [21:13:31<34:04, 2.47s/it] +2025-02-06 07:21:11 - ERROR - stderr - +2025-02-06 07:21:11 - ERROR - stderr - +2025-02-06 07:21:11 - INFO - stdout - {'loss': 0.3756, 'grad_norm': 1.5107169151306152, 'learning_rate': 7.136666376191703e-08, 'epoch': 2.89} +2025-02-06 07:21:11 - ERROR - stderr - 96%|█████████▋| 21606/22434 [21:13:31<34:04, 2.47s/it] +2025-02-06 07:21:13 - ERROR - stderr - 96%|█████████▋| 21607/22434 [21:13:33<33:48, 2.45s/it] +2025-02-06 07:21:13 - ERROR - stderr - +2025-02-06 07:21:13 - ERROR - stderr - +2025-02-06 07:21:13 - INFO - stdout - {'loss': 0.4003, 'grad_norm': 1.8240649700164795, 'learning_rate': 7.119458932668855e-08, 'epoch': 2.89} +2025-02-06 07:21:13 - ERROR - stderr - 96%|█████████▋| 21607/22434 [21:13:33<33:48, 2.45s/it] +2025-02-06 07:21:16 - ERROR - stderr - 96%|█████████▋| 21608/22434 [21:13:35<33:52, 2.46s/it] +2025-02-06 07:21:16 - ERROR - stderr - +2025-02-06 07:21:16 - ERROR - stderr - +2025-02-06 07:21:16 - INFO - stdout - {'loss': 0.3798, 'grad_norm': 1.5535826683044434, 'learning_rate': 7.10227218479076e-08, 'epoch': 2.89} +2025-02-06 07:21:16 - ERROR - stderr - 96%|█████████▋| 21608/22434 [21:13:36<33:52, 2.46s/it] +2025-02-06 07:21:18 - ERROR - stderr - 96%|█████████▋| 21609/22434 [21:13:38<34:12, 2.49s/it] +2025-02-06 07:21:18 - ERROR - stderr - +2025-02-06 07:21:18 - 
ERROR - stderr - +2025-02-06 07:21:18 - INFO - stdout - {'loss': 0.3403, 'grad_norm': 1.5502957105636597, 'learning_rate': 7.085106132915798e-08, 'epoch': 2.89} +2025-02-06 07:21:18 - ERROR - stderr - 96%|█████████▋| 21609/22434 [21:13:38<34:12, 2.49s/it] +2025-02-06 07:21:21 - ERROR - stderr - 96%|█████████▋| 21610/22434 [21:13:41<35:27, 2.58s/it] +2025-02-06 07:21:21 - ERROR - stderr - +2025-02-06 07:21:21 - ERROR - stderr - +2025-02-06 07:21:21 - INFO - stdout - {'loss': 0.363, 'grad_norm': 1.4516522884368896, 'learning_rate': 7.067960777401684e-08, 'epoch': 2.89} +2025-02-06 07:21:21 - ERROR - stderr - 96%|█████████▋| 21610/22434 [21:13:41<35:27, 2.58s/it] +2025-02-06 07:21:24 - ERROR - stderr - 96%|█████████▋| 21611/22434 [21:13:43<35:14, 2.57s/it] +2025-02-06 07:21:24 - ERROR - stderr - +2025-02-06 07:21:24 - ERROR - stderr - +2025-02-06 07:21:24 - INFO - stdout - {'loss': 0.2935, 'grad_norm': 1.4155759811401367, 'learning_rate': 7.050836118605686e-08, 'epoch': 2.89} +2025-02-06 07:21:24 - ERROR - stderr - 96%|█████████▋| 21611/22434 [21:13:43<35:14, 2.57s/it] +2025-02-06 07:21:26 - ERROR - stderr - 96%|█████████▋| 21612/22434 [21:13:46<34:51, 2.54s/it] +2025-02-06 07:21:26 - ERROR - stderr - +2025-02-06 07:21:26 - ERROR - stderr - +2025-02-06 07:21:26 - INFO - stdout - {'loss': 0.3722, 'grad_norm': 1.6853954792022705, 'learning_rate': 7.033732156884965e-08, 'epoch': 2.89} +2025-02-06 07:21:26 - ERROR - stderr - 96%|█████████▋| 21612/22434 [21:13:46<34:51, 2.54s/it] +2025-02-06 07:21:29 - ERROR - stderr - 96%|█████████▋| 21613/22434 [21:13:48<34:38, 2.53s/it] +2025-02-06 07:21:29 - ERROR - stderr - +2025-02-06 07:21:29 - ERROR - stderr - +2025-02-06 07:21:29 - INFO - stdout - {'loss': 0.3713, 'grad_norm': 1.5842230319976807, 'learning_rate': 7.0166488925959e-08, 'epoch': 2.89} +2025-02-06 07:21:29 - ERROR - stderr - 96%|█████████▋| 21613/22434 [21:13:48<34:38, 2.53s/it] +2025-02-06 07:21:31 - ERROR - stderr - 96%|█████████▋| 21614/22434 [21:13:51<34:22, 
2.52s/it] +2025-02-06 07:21:31 - ERROR - stderr - +2025-02-06 07:21:31 - ERROR - stderr - +2025-02-06 07:21:31 - INFO - stdout - {'loss': 0.3492, 'grad_norm': 1.579323649406433, 'learning_rate': 6.999586326094654e-08, 'epoch': 2.89} +2025-02-06 07:21:31 - ERROR - stderr - 96%|█████████▋| 21614/22434 [21:13:51<34:22, 2.52s/it] +2025-02-06 07:21:34 - ERROR - stderr - 96%|█████████▋| 21615/22434 [21:13:53<34:25, 2.52s/it] +2025-02-06 07:21:34 - ERROR - stderr - +2025-02-06 07:21:34 - ERROR - stderr - +2025-02-06 07:21:34 - INFO - stdout - {'loss': 0.3619, 'grad_norm': 1.4228390455245972, 'learning_rate': 6.982544457736717e-08, 'epoch': 2.89} +2025-02-06 07:21:34 - ERROR - stderr - 96%|█████████▋| 21615/22434 [21:13:53<34:25, 2.52s/it] +2025-02-06 07:21:36 - ERROR - stderr - 96%|█████████▋| 21616/22434 [21:13:56<34:11, 2.51s/it] +2025-02-06 07:21:36 - ERROR - stderr - +2025-02-06 07:21:36 - ERROR - stderr - +2025-02-06 07:21:36 - INFO - stdout - {'loss': 0.3465, 'grad_norm': 1.6732838153839111, 'learning_rate': 6.965523287877473e-08, 'epoch': 2.89} +2025-02-06 07:21:36 - ERROR - stderr - 96%|█████████▋| 21616/22434 [21:13:56<34:11, 2.51s/it] +2025-02-06 07:21:39 - ERROR - stderr - 96%|█████████▋| 21617/22434 [21:13:58<34:20, 2.52s/it] +2025-02-06 07:21:39 - ERROR - stderr - +2025-02-06 07:21:39 - ERROR - stderr - +2025-02-06 07:21:39 - INFO - stdout - {'loss': 0.3797, 'grad_norm': 1.5317559242248535, 'learning_rate': 6.94852281687175e-08, 'epoch': 2.89} +2025-02-06 07:21:39 - ERROR - stderr - 96%|█████████▋| 21617/22434 [21:13:58<34:20, 2.52s/it] +2025-02-06 07:21:41 - ERROR - stderr - 96%|█████████▋| 21618/22434 [21:14:01<34:06, 2.51s/it] +2025-02-06 07:21:41 - ERROR - stderr - +2025-02-06 07:21:41 - ERROR - stderr - +2025-02-06 07:21:41 - INFO - stdout - {'loss': 0.3631, 'grad_norm': 1.5683553218841553, 'learning_rate': 6.931543045073708e-08, 'epoch': 2.89} +2025-02-06 07:21:41 - ERROR - stderr - 96%|█████████▋| 21618/22434 [21:14:01<34:06, 2.51s/it] +2025-02-06
07:21:44 - ERROR - stderr - 96%|█████████▋| 21619/22434 [21:14:03<34:06, 2.51s/it] +2025-02-06 07:21:44 - ERROR - stderr - +2025-02-06 07:21:44 - ERROR - stderr - +2025-02-06 07:21:44 - INFO - stdout - {'loss': 0.3351, 'grad_norm': 1.3311394453048706, 'learning_rate': 6.914583972837508e-08, 'epoch': 2.89} +2025-02-06 07:21:44 - ERROR - stderr - 96%|█████████▋| 21619/22434 [21:14:03<34:06, 2.51s/it] +2025-02-06 07:21:46 - ERROR - stderr - 96%|█████████▋| 21620/22434 [21:14:06<34:06, 2.51s/it] +2025-02-06 07:21:46 - ERROR - stderr - +2025-02-06 07:21:46 - ERROR - stderr - +2025-02-06 07:21:46 - INFO - stdout - {'loss': 0.3557, 'grad_norm': 1.5298680067062378, 'learning_rate': 6.897645600516311e-08, 'epoch': 2.89} +2025-02-06 07:21:46 - ERROR - stderr - 96%|█████████▋| 21620/22434 [21:14:06<34:06, 2.51s/it] +2025-02-06 07:21:49 - ERROR - stderr - 96%|█████████▋| 21621/22434 [21:14:09<35:42, 2.64s/it] +2025-02-06 07:21:49 - ERROR - stderr - +2025-02-06 07:21:49 - ERROR - stderr - +2025-02-06 07:21:49 - INFO - stdout - {'loss': 0.3739, 'grad_norm': 1.7033016681671143, 'learning_rate': 6.880727928463615e-08, 'epoch': 2.89} +2025-02-06 07:21:49 - ERROR - stderr - 96%|█████████▋| 21621/22434 [21:14:09<35:42, 2.64s/it] +2025-02-06 07:21:51 - ERROR - stderr - 96%|█████████▋| 21622/22434 [21:14:11<34:53, 2.58s/it] +2025-02-06 07:21:52 - ERROR - stderr - +2025-02-06 07:21:52 - ERROR - stderr - +2025-02-06 07:21:52 - INFO - stdout - {'loss': 0.3935, 'grad_norm': 1.5434212684631348, 'learning_rate': 6.863830957031803e-08, 'epoch': 2.89} +2025-02-06 07:21:52 - ERROR - stderr - 96%|█████████▋| 21622/22434 [21:14:11<34:53, 2.58s/it] +2025-02-06 07:21:54 - ERROR - stderr - 96%|█████████▋| 21623/22434 [21:14:14<34:17, 2.54s/it] +2025-02-06 07:21:54 - ERROR - stderr - +2025-02-06 07:21:54 - ERROR - stderr - +2025-02-06 07:21:54 - INFO - stdout - {'loss': 0.3284, 'grad_norm': 1.5689938068389893, 'learning_rate': 6.846954686572927e-08, 'epoch': 2.89} +2025-02-06 07:21:54 - ERROR - 
stderr - 96%|█████████▋| 21623/22434 [21:14:14<34:17, 2.54s/it] +2025-02-06 07:21:56 - ERROR - stderr - 96%|█████████▋| 21624/22434 [21:14:16<33:46, 2.50s/it] +2025-02-06 07:21:56 - ERROR - stderr - +2025-02-06 07:21:56 - ERROR - stderr - +2025-02-06 07:21:56 - INFO - stdout - {'loss': 0.3352, 'grad_norm': 1.5319631099700928, 'learning_rate': 6.830099117439149e-08, 'epoch': 2.89} +2025-02-06 07:21:56 - ERROR - stderr - 96%|█████████▋| 21624/22434 [21:14:16<33:46, 2.50s/it] +2025-02-06 07:21:59 - ERROR - stderr - 96%|█████████▋| 21625/22434 [21:14:19<33:36, 2.49s/it] +2025-02-06 07:21:59 - ERROR - stderr - +2025-02-06 07:21:59 - ERROR - stderr - +2025-02-06 07:21:59 - INFO - stdout - {'loss': 0.387, 'grad_norm': 1.6633013486862183, 'learning_rate': 6.813264249981522e-08, 'epoch': 2.89} +2025-02-06 07:21:59 - ERROR - stderr - 96%|█████████▋| 21625/22434 [21:14:19<33:36, 2.49s/it] +2025-02-06 07:22:01 - ERROR - stderr - 96%|█████████▋| 21626/22434 [21:14:21<33:40, 2.50s/it] +2025-02-06 07:22:01 - ERROR - stderr - +2025-02-06 07:22:01 - ERROR - stderr - +2025-02-06 07:22:01 - INFO - stdout - {'loss': 0.3664, 'grad_norm': 1.5380678176879883, 'learning_rate': 6.796450084550988e-08, 'epoch': 2.89} +2025-02-06 07:22:01 - ERROR - stderr - 96%|█████████▋| 21626/22434 [21:14:21<33:40, 2.50s/it] +2025-02-06 07:22:04 - ERROR - stderr - 96%|█████████▋| 21627/22434 [21:14:24<33:34, 2.50s/it] +2025-02-06 07:22:04 - ERROR - stderr - +2025-02-06 07:22:04 - ERROR - stderr - +2025-02-06 07:22:04 - INFO - stdout - {'loss': 0.3389, 'grad_norm': 1.6302703619003296, 'learning_rate': 6.779656621498154e-08, 'epoch': 2.89} +2025-02-06 07:22:04 - ERROR - stderr - 96%|█████████▋| 21627/22434 [21:14:24<33:34, 2.50s/it] +2025-02-06 07:22:06 - ERROR - stderr - 96%|█████████▋| 21628/22434 [21:14:26<33:16, 2.48s/it] +2025-02-06 07:22:06 - ERROR - stderr - +2025-02-06 07:22:06 - ERROR - stderr - +2025-02-06 07:22:06 - INFO - stdout - {'loss': 0.3668, 'grad_norm': 1.5005167722702026, 'learning_rate': 
6.762883861172853e-08, 'epoch': 2.89} +2025-02-06 07:22:06 - ERROR - stderr - 96%|█████████▋| 21628/22434 [21:14:26<33:16, 2.48s/it] +2025-02-06 07:22:09 - ERROR - stderr - 96%|█████████▋| 21629/22434 [21:14:29<33:13, 2.48s/it] +2025-02-06 07:22:09 - ERROR - stderr - +2025-02-06 07:22:09 - ERROR - stderr - +2025-02-06 07:22:09 - INFO - stdout - {'loss': 0.359, 'grad_norm': 1.6770083904266357, 'learning_rate': 6.746131803924915e-08, 'epoch': 2.89} +2025-02-06 07:22:09 - ERROR - stderr - 96%|█████████▋| 21629/22434 [21:14:29<33:13, 2.48s/it] +2025-02-06 07:22:11 - ERROR - stderr - 96%|█████████▋| 21630/22434 [21:14:31<33:00, 2.46s/it] +2025-02-06 07:22:11 - ERROR - stderr - +2025-02-06 07:22:11 - ERROR - stderr - +2025-02-06 07:22:11 - INFO - stdout - {'loss': 0.3617, 'grad_norm': 1.4912923574447632, 'learning_rate': 6.729400450103285e-08, 'epoch': 2.89} +2025-02-06 07:22:11 - ERROR - stderr - 96%|█████████▋| 21630/22434 [21:14:31<33:00, 2.46s/it] +2025-02-06 07:22:14 - ERROR - stderr - 96%|█████████▋| 21631/22434 [21:14:33<33:03, 2.47s/it] +2025-02-06 07:22:14 - ERROR - stderr - +2025-02-06 07:22:14 - ERROR - stderr - +2025-02-06 07:22:14 - INFO - stdout - {'loss': 0.3822, 'grad_norm': 1.7437427043914795, 'learning_rate': 6.712689800057015e-08, 'epoch': 2.89} +2025-02-06 07:22:14 - ERROR - stderr - 96%|█████████▋| 21631/22434 [21:14:33<33:03, 2.47s/it] +2025-02-06 07:22:16 - ERROR - stderr - 96%|█████████▋| 21632/22434 [21:14:36<33:11, 2.48s/it] +2025-02-06 07:22:16 - ERROR - stderr - +2025-02-06 07:22:16 - ERROR - stderr - +2025-02-06 07:22:16 - INFO - stdout - {'loss': 0.3111, 'grad_norm': 1.3525652885437012, 'learning_rate': 6.695999854134161e-08, 'epoch': 2.89} +2025-02-06 07:22:16 - ERROR - stderr - 96%|█████████▋| 21632/22434 [21:14:36<33:11, 2.48s/it] +2025-02-06 07:22:19 - ERROR - stderr - 96%|█████████▋| 21633/22434 [21:14:38<33:06, 2.48s/it] +2025-02-06 07:22:19 - ERROR - stderr - +2025-02-06 07:22:19 - ERROR - stderr - +2025-02-06 07:22:19 - INFO - stdout 
- {'loss': 0.3803, 'grad_norm': 1.6839276552200317, 'learning_rate': 6.679330612682666e-08, 'epoch': 2.89} +2025-02-06 07:22:19 - ERROR - stderr - 96%|█████████▋| 21633/22434 [21:14:38<33:06, 2.48s/it] +2025-02-06 07:22:21 - ERROR - stderr - 96%|█████████▋| 21634/22434 [21:14:41<33:05, 2.48s/it] +2025-02-06 07:22:21 - ERROR - stderr - +2025-02-06 07:22:21 - ERROR - stderr - +2025-02-06 07:22:21 - INFO - stdout - {'loss': 0.3494, 'grad_norm': 1.6583653688430786, 'learning_rate': 6.662682076050031e-08, 'epoch': 2.89} +2025-02-06 07:22:21 - ERROR - stderr - 96%|█████████▋| 21634/22434 [21:14:41<33:05, 2.48s/it] +2025-02-06 07:22:24 - ERROR - stderr - 96%|█████████▋| 21635/22434 [21:14:43<32:57, 2.48s/it] +2025-02-06 07:22:24 - ERROR - stderr - +2025-02-06 07:22:24 - ERROR - stderr - +2025-02-06 07:22:24 - INFO - stdout - {'loss': 0.3549, 'grad_norm': 1.8219316005706787, 'learning_rate': 6.646054244583311e-08, 'epoch': 2.89} +2025-02-06 07:22:24 - ERROR - stderr - 96%|█████████▋| 21635/22434 [21:14:43<32:57, 2.48s/it] +2025-02-06 07:22:26 - ERROR - stderr - 96%|█████████▋| 21636/22434 [21:14:46<33:01, 2.48s/it] +2025-02-06 07:22:26 - ERROR - stderr - +2025-02-06 07:22:26 - ERROR - stderr - +2025-02-06 07:22:26 - INFO - stdout - {'loss': 0.3544, 'grad_norm': 1.554471492767334, 'learning_rate': 6.629447118629006e-08, 'epoch': 2.89} +2025-02-06 07:22:26 - ERROR - stderr - 96%|█████████▋| 21636/22434 [21:14:46<33:01, 2.48s/it] +2025-02-06 07:22:29 - ERROR - stderr - 96%|█████████▋| 21637/22434 [21:14:48<33:11, 2.50s/it] +2025-02-06 07:22:29 - ERROR - stderr - +2025-02-06 07:22:29 - ERROR - stderr - +2025-02-06 07:22:29 - INFO - stdout - {'loss': 0.3173, 'grad_norm': 1.5477598905563354, 'learning_rate': 6.612860698533397e-08, 'epoch': 2.89} +2025-02-06 07:22:29 - ERROR - stderr - 96%|█████████▋| 21637/22434 [21:14:48<33:11, 2.50s/it] +2025-02-06 07:22:31 - ERROR - stderr - 96%|█████████▋| 21638/22434 [21:14:51<33:21, 2.51s/it] +2025-02-06 07:22:31 - ERROR - stderr - 
+2025-02-06 07:22:31 - ERROR - stderr - +2025-02-06 07:22:31 - INFO - stdout - {'loss': 0.3532, 'grad_norm': 1.5822207927703857, 'learning_rate': 6.596294984642093e-08, 'epoch': 2.89} +2025-02-06 07:22:31 - ERROR - stderr - 96%|█████████▋| 21638/22434 [21:14:51<33:21, 2.51s/it] +2025-02-06 07:22:34 - ERROR - stderr - 96%|█████████▋| 21639/22434 [21:14:53<32:53, 2.48s/it] +2025-02-06 07:22:34 - ERROR - stderr - +2025-02-06 07:22:34 - ERROR - stderr - +2025-02-06 07:22:34 - INFO - stdout - {'loss': 0.3674, 'grad_norm': 1.6077431440353394, 'learning_rate': 6.579749977300488e-08, 'epoch': 2.89} +2025-02-06 07:22:34 - ERROR - stderr - 96%|█████████▋| 21639/22434 [21:14:53<32:53, 2.48s/it] +2025-02-06 07:22:36 - ERROR - stderr - 96%|█████████▋| 21640/22434 [21:14:56<32:45, 2.48s/it] +2025-02-06 07:22:36 - ERROR - stderr - +2025-02-06 07:22:36 - ERROR - stderr - +2025-02-06 07:22:36 - INFO - stdout - {'loss': 0.3132, 'grad_norm': 1.3613229990005493, 'learning_rate': 6.563225676853302e-08, 'epoch': 2.89} +2025-02-06 07:22:36 - ERROR - stderr - 96%|█████████▋| 21640/22434 [21:14:56<32:45, 2.48s/it] +2025-02-06 07:22:39 - ERROR - stderr - 96%|█████████▋| 21641/22434 [21:14:58<32:41, 2.47s/it] +2025-02-06 07:22:39 - ERROR - stderr - +2025-02-06 07:22:39 - ERROR - stderr - +2025-02-06 07:22:39 - INFO - stdout - {'loss': 0.3784, 'grad_norm': 1.56528639793396, 'learning_rate': 6.546722083645151e-08, 'epoch': 2.89} +2025-02-06 07:22:39 - ERROR - stderr - 96%|█████████▋| 21641/22434 [21:14:58<32:41, 2.47s/it] +2025-02-06 07:22:41 - ERROR - stderr - 96%|█████████▋| 21642/22434 [21:15:01<32:44, 2.48s/it] +2025-02-06 07:22:41 - ERROR - stderr - +2025-02-06 07:22:41 - ERROR - stderr - +2025-02-06 07:22:41 - INFO - stdout - {'loss': 0.3497, 'grad_norm': 1.5418583154678345, 'learning_rate': 6.530239198019872e-08, 'epoch': 2.89} +2025-02-06 07:22:41 - ERROR - stderr - 96%|█████████▋| 21642/22434 [21:15:01<32:44, 2.48s/it] +2025-02-06 07:22:44 - ERROR - stderr - 96%|█████████▋| 
21643/22434 [21:15:03<32:51, 2.49s/it] +2025-02-06 07:22:44 - ERROR - stderr - +2025-02-06 07:22:44 - ERROR - stderr - +2025-02-06 07:22:44 - INFO - stdout - {'loss': 0.3464, 'grad_norm': 1.5107418298721313, 'learning_rate': 6.513777020321188e-08, 'epoch': 2.89} +2025-02-06 07:22:44 - ERROR - stderr - 96%|█████████▋| 21643/22434 [21:15:03<32:51, 2.49s/it] +2025-02-06 07:22:46 - ERROR - stderr - 96%|█████████▋| 21644/22434 [21:15:06<33:15, 2.53s/it] +2025-02-06 07:22:46 - ERROR - stderr - +2025-02-06 07:22:46 - ERROR - stderr - +2025-02-06 07:22:46 - INFO - stdout - {'loss': 0.3538, 'grad_norm': 1.664481282234192, 'learning_rate': 6.497335550892048e-08, 'epoch': 2.89} +2025-02-06 07:22:46 - ERROR - stderr - 96%|█████████▋| 21644/22434 [21:15:06<33:15, 2.53s/it] +2025-02-06 07:22:49 - ERROR - stderr - 96%|█████████▋| 21645/22434 [21:15:08<33:01, 2.51s/it] +2025-02-06 07:22:49 - ERROR - stderr - +2025-02-06 07:22:49 - ERROR - stderr - +2025-02-06 07:22:49 - INFO - stdout - {'loss': 0.3757, 'grad_norm': 1.6998982429504395, 'learning_rate': 6.480914790075399e-08, 'epoch': 2.89} +2025-02-06 07:22:49 - ERROR - stderr - 96%|█████████▋| 21645/22434 [21:15:08<33:01, 2.51s/it] +2025-02-06 07:22:51 - ERROR - stderr - 96%|█████████▋| 21646/22434 [21:15:11<32:48, 2.50s/it] +2025-02-06 07:22:51 - ERROR - stderr - +2025-02-06 07:22:51 - ERROR - stderr - +2025-02-06 07:22:51 - INFO - stdout - {'loss': 0.3694, 'grad_norm': 1.543299674987793, 'learning_rate': 6.464514738213301e-08, 'epoch': 2.89} +2025-02-06 07:22:51 - ERROR - stderr - 96%|█████████▋| 21646/22434 [21:15:11<32:48, 2.50s/it] +2025-02-06 07:22:54 - ERROR - stderr - 96%|█████████▋| 21647/22434 [21:15:13<32:28, 2.48s/it] +2025-02-06 07:22:54 - ERROR - stderr - +2025-02-06 07:22:54 - ERROR - stderr - +2025-02-06 07:22:54 - INFO - stdout - {'loss': 0.3017, 'grad_norm': 1.497301697731018, 'learning_rate': 6.448135395647703e-08, 'epoch': 2.89} +2025-02-06 07:22:54 - ERROR - stderr - 96%|█████████▋| 21647/22434 
[21:15:13<32:28, 2.48s/it] +2025-02-06 07:22:56 - ERROR - stderr - 96%|█████████▋| 21648/22434 [21:15:16<32:30, 2.48s/it] +2025-02-06 07:22:56 - ERROR - stderr - +2025-02-06 07:22:56 - ERROR - stderr - +2025-02-06 07:22:56 - INFO - stdout - {'loss': 0.3939, 'grad_norm': 1.543073296546936, 'learning_rate': 6.43177676272e-08, 'epoch': 2.89} +2025-02-06 07:22:56 - ERROR - stderr - 96%|█████████▋| 21648/22434 [21:15:16<32:30, 2.48s/it] +2025-02-06 07:22:58 - ERROR - stderr - 97%|█████████▋| 21649/22434 [21:15:18<32:16, 2.47s/it] +2025-02-06 07:22:58 - ERROR - stderr - +2025-02-06 07:22:58 - ERROR - stderr - +2025-02-06 07:22:58 - INFO - stdout - {'loss': 0.3197, 'grad_norm': 1.47458815574646, 'learning_rate': 6.415438839771137e-08, 'epoch': 2.9} +2025-02-06 07:22:58 - ERROR - stderr - 97%|█████████▋| 21649/22434 [21:15:18<32:16, 2.47s/it] +2025-02-06 07:23:01 - ERROR - stderr - 97%|█████████▋| 21650/22434 [21:15:21<32:20, 2.48s/it] +2025-02-06 07:23:01 - ERROR - stderr - +2025-02-06 07:23:01 - ERROR - stderr - +2025-02-06 07:23:01 - INFO - stdout - {'loss': 0.3114, 'grad_norm': 1.4781126976013184, 'learning_rate': 6.399121627141736e-08, 'epoch': 2.9} +2025-02-06 07:23:01 - ERROR - stderr - 97%|█████████▋| 21650/22434 [21:15:21<32:20, 2.48s/it] +2025-02-06 07:23:03 - ERROR - stderr - 97%|█████████▋| 21651/22434 [21:15:23<32:23, 2.48s/it] +2025-02-06 07:23:03 - ERROR - stderr - +2025-02-06 07:23:03 - ERROR - stderr - +2025-02-06 07:23:03 - INFO - stdout - {'loss': 0.367, 'grad_norm': 1.7995100021362305, 'learning_rate': 6.382825125171854e-08, 'epoch': 2.9} +2025-02-06 07:23:03 - ERROR - stderr - 97%|█████████▋| 21651/22434 [21:15:23<32:23, 2.48s/it] +2025-02-06 07:23:06 - ERROR - stderr - 97%|█████████▋| 21652/22434 [21:15:26<33:31, 2.57s/it] +2025-02-06 07:23:06 - ERROR - stderr - +2025-02-06 07:23:06 - ERROR - stderr - +2025-02-06 07:23:06 - INFO - stdout - {'loss': 0.3206, 'grad_norm': 1.4060759544372559, 'learning_rate': 6.366549334201222e-08, 'epoch': 2.9} 
+2025-02-06 07:23:06 - ERROR - stderr - 97%|█████████▋| 21652/22434 [21:15:26<33:31, 2.57s/it] +2025-02-06 07:23:09 - ERROR - stderr - 97%|█████████▋| 21653/22434 [21:15:28<33:14, 2.55s/it] +2025-02-06 07:23:09 - ERROR - stderr - +2025-02-06 07:23:09 - ERROR - stderr - +2025-02-06 07:23:09 - INFO - stdout - {'loss': 0.3656, 'grad_norm': 1.5838254690170288, 'learning_rate': 6.350294254569012e-08, 'epoch': 2.9} +2025-02-06 07:23:09 - ERROR - stderr - 97%|█████████▋| 21653/22434 [21:15:29<33:14, 2.55s/it] +2025-02-06 07:23:11 - ERROR - stderr - 97%|█████████▋| 21654/22434 [21:15:31<33:17, 2.56s/it] +2025-02-06 07:23:11 - ERROR - stderr - +2025-02-06 07:23:11 - ERROR - stderr - +2025-02-06 07:23:11 - INFO - stdout - {'loss': 0.3735, 'grad_norm': 1.448468804359436, 'learning_rate': 6.334059886614063e-08, 'epoch': 2.9} +2025-02-06 07:23:11 - ERROR - stderr - 97%|█████████▋| 21654/22434 [21:15:31<33:17, 2.56s/it] +2025-02-06 07:23:14 - ERROR - stderr - 97%|█████████▋| 21655/22434 [21:15:34<33:13, 2.56s/it] +2025-02-06 07:23:14 - ERROR - stderr - +2025-02-06 07:23:14 - ERROR - stderr - +2025-02-06 07:23:14 - INFO - stdout - {'loss': 0.3843, 'grad_norm': 1.6509809494018555, 'learning_rate': 6.317846230674885e-08, 'epoch': 2.9} +2025-02-06 07:23:14 - ERROR - stderr - 97%|█████████▋| 21655/22434 [21:15:34<33:13, 2.56s/it] +2025-02-06 07:23:16 - ERROR - stderr - 97%|█████████▋| 21656/22434 [21:15:36<33:03, 2.55s/it] +2025-02-06 07:23:16 - ERROR - stderr - +2025-02-06 07:23:16 - ERROR - stderr - +2025-02-06 07:23:16 - INFO - stdout - {'loss': 0.3635, 'grad_norm': 1.6306124925613403, 'learning_rate': 6.301653287089315e-08, 'epoch': 2.9} +2025-02-06 07:23:16 - ERROR - stderr - 97%|█████████▋| 21656/22434 [21:15:36<33:03, 2.55s/it] +2025-02-06 07:23:19 - ERROR - stderr - 97%|█████████▋| 21657/22434 [21:15:39<32:29, 2.51s/it] +2025-02-06 07:23:19 - ERROR - stderr - +2025-02-06 07:23:19 - ERROR - stderr - +2025-02-06 07:23:19 - INFO - stdout - {'loss': 0.373, 'grad_norm': 
1.6412004232406616, 'learning_rate': 6.285481056194976e-08, 'epoch': 2.9} +2025-02-06 07:23:19 - ERROR - stderr - 97%|█████████▋| 21657/22434 [21:15:39<32:29, 2.51s/it] +2025-02-06 07:23:21 - ERROR - stderr - 97%|█████████▋| 21658/22434 [21:15:41<32:19, 2.50s/it] +2025-02-06 07:23:21 - ERROR - stderr - +2025-02-06 07:23:21 - ERROR - stderr - +2025-02-06 07:23:21 - INFO - stdout - {'loss': 0.3722, 'grad_norm': 1.57150137424469, 'learning_rate': 6.269329538328817e-08, 'epoch': 2.9} +2025-02-06 07:23:21 - ERROR - stderr - 97%|█████████▋| 21658/22434 [21:15:41<32:19, 2.50s/it] +2025-02-06 07:23:24 - ERROR - stderr - 97%|█████████▋| 21659/22434 [21:15:44<32:30, 2.52s/it] +2025-02-06 07:23:24 - ERROR - stderr - +2025-02-06 07:23:24 - ERROR - stderr - +2025-02-06 07:23:24 - INFO - stdout - {'loss': 0.3654, 'grad_norm': 1.6219292879104614, 'learning_rate': 6.253198733827681e-08, 'epoch': 2.9} +2025-02-06 07:23:24 - ERROR - stderr - 97%|█████████▋| 21659/22434 [21:15:44<32:30, 2.52s/it] +2025-02-06 07:23:26 - ERROR - stderr - 97%|█████████▋| 21660/22434 [21:15:46<32:30, 2.52s/it] +2025-02-06 07:23:26 - ERROR - stderr - +2025-02-06 07:23:26 - ERROR - stderr - +2025-02-06 07:23:26 - INFO - stdout - {'loss': 0.2873, 'grad_norm': 1.3908092975616455, 'learning_rate': 6.237088643027633e-08, 'epoch': 2.9} +2025-02-06 07:23:26 - ERROR - stderr - 97%|█████████▋| 21660/22434 [21:15:46<32:30, 2.52s/it] +2025-02-06 07:23:29 - ERROR - stderr - 97%|█████████▋| 21661/22434 [21:15:49<32:19, 2.51s/it] +2025-02-06 07:23:29 - ERROR - stderr - +2025-02-06 07:23:29 - ERROR - stderr - +2025-02-06 07:23:29 - INFO - stdout - {'loss': 0.3334, 'grad_norm': 1.5875223875045776, 'learning_rate': 6.220999266264516e-08, 'epoch': 2.9} +2025-02-06 07:23:29 - ERROR - stderr - 97%|█████████▋| 21661/22434 [21:15:49<32:19, 2.51s/it] +2025-02-06 07:23:31 - ERROR - stderr - 97%|█████████▋| 21662/22434 [21:15:51<32:22, 2.52s/it] +2025-02-06 07:23:31 - ERROR - stderr - +2025-02-06 07:23:31 - ERROR - stderr - 
+2025-02-06 07:23:31 - INFO - stdout - {'loss': 0.4316, 'grad_norm': 1.7054367065429688, 'learning_rate': 6.204930603873838e-08, 'epoch': 2.9} +2025-02-06 07:23:31 - ERROR - stderr - 97%|█████████▋| 21662/22434 [21:15:51<32:22, 2.52s/it] +2025-02-06 07:23:34 - ERROR - stderr - 97%|█████████▋| 21663/22434 [21:15:54<32:12, 2.51s/it] +2025-02-06 07:23:34 - ERROR - stderr - +2025-02-06 07:23:34 - ERROR - stderr - +2025-02-06 07:23:34 - INFO - stdout - {'loss': 0.3808, 'grad_norm': 1.5833126306533813, 'learning_rate': 6.188882656190331e-08, 'epoch': 2.9} +2025-02-06 07:23:34 - ERROR - stderr - 97%|█████████▋| 21663/22434 [21:15:54<32:12, 2.51s/it] +2025-02-06 07:23:36 - ERROR - stderr - 97%|█████████▋| 21664/22434 [21:15:56<32:01, 2.50s/it] +2025-02-06 07:23:36 - ERROR - stderr - +2025-02-06 07:23:36 - ERROR - stderr - +2025-02-06 07:23:36 - INFO - stdout - {'loss': 0.3646, 'grad_norm': 1.3678572177886963, 'learning_rate': 6.172855423548618e-08, 'epoch': 2.9} +2025-02-06 07:23:36 - ERROR - stderr - 97%|█████████▋| 21664/22434 [21:15:56<32:01, 2.50s/it] +2025-02-06 07:23:39 - ERROR - stderr - 97%|█████████▋| 21665/22434 [21:15:59<33:09, 2.59s/it] +2025-02-06 07:23:39 - ERROR - stderr - +2025-02-06 07:23:39 - ERROR - stderr - +2025-02-06 07:23:39 - INFO - stdout - {'loss': 0.4377, 'grad_norm': 1.7285796403884888, 'learning_rate': 6.156848906282764e-08, 'epoch': 2.9} +2025-02-06 07:23:39 - ERROR - stderr - 97%|█████████▋| 21665/22434 [21:15:59<33:09, 2.59s/it] +2025-02-06 07:23:42 - ERROR - stderr - 97%|█████████▋| 21666/22434 [21:16:01<32:31, 2.54s/it] +2025-02-06 07:23:42 - ERROR - stderr - +2025-02-06 07:23:42 - ERROR - stderr - +2025-02-06 07:23:42 - INFO - stdout - {'loss': 0.3548, 'grad_norm': 1.6165828704833984, 'learning_rate': 6.140863104726391e-08, 'epoch': 2.9} +2025-02-06 07:23:42 - ERROR - stderr - 97%|█████████▋| 21666/22434 [21:16:01<32:31, 2.54s/it] +2025-02-06 07:23:44 - ERROR - stderr - 97%|█████████▋| 21667/22434 [21:16:04<32:12, 2.52s/it] +2025-02-06 
07:23:44 - ERROR - stderr - +2025-02-06 07:23:44 - ERROR - stderr - +2025-02-06 07:23:44 - INFO - stdout - {'loss': 0.3501, 'grad_norm': 1.6657164096832275, 'learning_rate': 6.124898019212677e-08, 'epoch': 2.9} +2025-02-06 07:23:44 - ERROR - stderr - 97%|█████████▋| 21667/22434 [21:16:04<32:12, 2.52s/it] +2025-02-06 07:23:46 - ERROR - stderr - 97%|█████████▋| 21668/22434 [21:16:06<31:46, 2.49s/it] +2025-02-06 07:23:46 - ERROR - stderr - +2025-02-06 07:23:46 - ERROR - stderr - +2025-02-06 07:23:46 - INFO - stdout - {'loss': 0.3568, 'grad_norm': 1.6752368211746216, 'learning_rate': 6.108953650074467e-08, 'epoch': 2.9} +2025-02-06 07:23:47 - ERROR - stderr - 97%|█████████▋| 21668/22434 [21:16:06<31:46, 2.49s/it] +2025-02-06 07:23:49 - ERROR - stderr - 97%|█████████▋| 21669/22434 [21:16:09<32:51, 2.58s/it] +2025-02-06 07:23:49 - ERROR - stderr - +2025-02-06 07:23:49 - ERROR - stderr - +2025-02-06 07:23:49 - INFO - stdout - {'loss': 0.3671, 'grad_norm': 1.5614625215530396, 'learning_rate': 6.09302999764394e-08, 'epoch': 2.9} +2025-02-06 07:23:49 - ERROR - stderr - 97%|█████████▋| 21669/22434 [21:16:09<32:51, 2.58s/it] +2025-02-06 07:23:52 - ERROR - stderr - 97%|█████████▋| 21670/22434 [21:16:12<32:43, 2.57s/it] +2025-02-06 07:23:52 - ERROR - stderr - +2025-02-06 07:23:52 - ERROR - stderr - +2025-02-06 07:23:52 - INFO - stdout - {'loss': 0.3537, 'grad_norm': 1.6195648908615112, 'learning_rate': 6.077127062253274e-08, 'epoch': 2.9} +2025-02-06 07:23:52 - ERROR - stderr - 97%|█████████▋| 21670/22434 [21:16:12<32:43, 2.57s/it] +2025-02-06 07:23:54 - ERROR - stderr - 97%|█████████▋| 21671/22434 [21:16:14<32:16, 2.54s/it] +2025-02-06 07:23:54 - ERROR - stderr - +2025-02-06 07:23:54 - ERROR - stderr - +2025-02-06 07:23:54 - INFO - stdout - {'loss': 0.3348, 'grad_norm': 1.2677838802337646, 'learning_rate': 6.06124484423376e-08, 'epoch': 2.9} +2025-02-06 07:23:54 - ERROR - stderr - 97%|█████████▋| 21671/22434 [21:16:14<32:16, 2.54s/it] +2025-02-06 07:23:57 - ERROR - stderr - 
97%|█████████▋| 21672/22434 [21:16:17<32:04, 2.53s/it] +2025-02-06 07:23:57 - ERROR - stderr - +2025-02-06 07:23:57 - ERROR - stderr - +2025-02-06 07:23:57 - INFO - stdout - {'loss': 0.3715, 'grad_norm': 1.5156317949295044, 'learning_rate': 6.045383343916466e-08, 'epoch': 2.9} +2025-02-06 07:23:57 - ERROR - stderr - 97%|█████████▋| 21672/22434 [21:16:17<32:04, 2.53s/it] +2025-02-06 07:23:59 - ERROR - stderr - 97%|█████████▋| 21673/22434 [21:16:19<31:53, 2.51s/it] +2025-02-06 07:23:59 - ERROR - stderr - +2025-02-06 07:23:59 - ERROR - stderr - +2025-02-06 07:23:59 - INFO - stdout - {'loss': 0.3281, 'grad_norm': 1.4525957107543945, 'learning_rate': 6.02954256163213e-08, 'epoch': 2.9} +2025-02-06 07:23:59 - ERROR - stderr - 97%|█████████▋| 21673/22434 [21:16:19<31:53, 2.51s/it] +2025-02-06 07:24:02 - ERROR - stderr - 97%|█████████▋| 21674/22434 [21:16:22<32:01, 2.53s/it] +2025-02-06 07:24:02 - ERROR - stderr - +2025-02-06 07:24:02 - ERROR - stderr - +2025-02-06 07:24:02 - INFO - stdout - {'loss': 0.3517, 'grad_norm': 1.6193652153015137, 'learning_rate': 6.013722497710817e-08, 'epoch': 2.9} +2025-02-06 07:24:02 - ERROR - stderr - 97%|█████████▋| 21674/22434 [21:16:22<32:01, 2.53s/it] +2025-02-06 07:24:04 - ERROR - stderr - 97%|█████████▋| 21675/22434 [21:16:24<31:47, 2.51s/it] +2025-02-06 07:24:04 - ERROR - stderr - +2025-02-06 07:24:04 - ERROR - stderr - +2025-02-06 07:24:04 - INFO - stdout - {'loss': 0.3736, 'grad_norm': 1.4548860788345337, 'learning_rate': 5.997923152482377e-08, 'epoch': 2.9} +2025-02-06 07:24:04 - ERROR - stderr - 97%|█████████▋| 21675/22434 [21:16:24<31:47, 2.51s/it] +2025-02-06 07:24:07 - ERROR - stderr - 97%|█████████▋| 21676/22434 [21:16:27<31:56, 2.53s/it] +2025-02-06 07:24:07 - ERROR - stderr - +2025-02-06 07:24:07 - ERROR - stderr - +2025-02-06 07:24:07 - INFO - stdout - {'loss': 0.3401, 'grad_norm': 1.5066725015640259, 'learning_rate': 5.982144526275991e-08, 'epoch': 2.9} +2025-02-06 07:24:07 - ERROR - stderr - 97%|█████████▋| 21676/22434 
[21:16:27<31:56, 2.53s/it] +2025-02-06 07:24:09 - ERROR - stderr - 97%|█████████▋| 21677/22434 [21:16:29<31:50, 2.52s/it] +2025-02-06 07:24:09 - ERROR - stderr - +2025-02-06 07:24:09 - ERROR - stderr - +2025-02-06 07:24:09 - INFO - stdout - {'loss': 0.3482, 'grad_norm': 1.5357369184494019, 'learning_rate': 5.966386619420617e-08, 'epoch': 2.9} +2025-02-06 07:24:09 - ERROR - stderr - 97%|█████████▋| 21677/22434 [21:16:29<31:50, 2.52s/it] +2025-02-06 07:24:12 - ERROR - stderr - 97%|█████████▋| 21678/22434 [21:16:32<31:52, 2.53s/it] +2025-02-06 07:24:12 - ERROR - stderr - +2025-02-06 07:24:12 - ERROR - stderr - +2025-02-06 07:24:12 - INFO - stdout - {'loss': 0.352, 'grad_norm': 1.4644618034362793, 'learning_rate': 5.9506494322447704e-08, 'epoch': 2.9} +2025-02-06 07:24:12 - ERROR - stderr - 97%|█████████▋| 21678/22434 [21:16:32<31:52, 2.53s/it] +2025-02-06 07:24:12 - INFO - stdout - WARNING: tokenization mismatch: 1 vs. 62. (ignored) +2025-02-06 07:24:14 - ERROR - stderr - 97%|█████████▋| 21679/22434 [21:16:34<31:44, 2.52s/it] +2025-02-06 07:24:14 - ERROR - stderr - +2025-02-06 07:24:14 - ERROR - stderr - +2025-02-06 07:24:14 - INFO - stdout - {'loss': 0.3548, 'grad_norm': 1.4940434694290161, 'learning_rate': 5.934932965076412e-08, 'epoch': 2.9} +2025-02-06 07:24:14 - ERROR - stderr - 97%|█████████▋| 21679/22434 [21:16:34<31:44, 2.52s/it] +2025-02-06 07:24:17 - ERROR - stderr - 97%|█████████▋| 21680/22434 [21:16:37<31:37, 2.52s/it] +2025-02-06 07:24:17 - ERROR - stderr - +2025-02-06 07:24:17 - ERROR - stderr - +2025-02-06 07:24:17 - INFO - stdout - {'loss': 0.3066, 'grad_norm': 1.6580296754837036, 'learning_rate': 5.919237218243168e-08, 'epoch': 2.9} +2025-02-06 07:24:17 - ERROR - stderr - 97%|█████████▋| 21680/22434 [21:16:37<31:37, 2.52s/it] +2025-02-06 07:24:19 - ERROR - stderr - 97%|█████████▋| 21681/22434 [21:16:39<31:25, 2.50s/it] +2025-02-06 07:24:19 - ERROR - stderr - +2025-02-06 07:24:19 - ERROR - stderr - +2025-02-06 07:24:19 - INFO - stdout - {'loss': 
0.3557, 'grad_norm': 1.5230480432510376, 'learning_rate': 5.903562192072221e-08, 'epoch': 2.9} +2025-02-06 07:24:19 - ERROR - stderr - 97%|█████████▋| 21681/22434 [21:16:39<31:25, 2.50s/it] +2025-02-06 07:24:22 - ERROR - stderr - 97%|█████████▋| 21682/22434 [21:16:42<31:07, 2.48s/it] +2025-02-06 07:24:22 - ERROR - stderr - +2025-02-06 07:24:22 - ERROR - stderr - +2025-02-06 07:24:22 - INFO - stdout - {'loss': 0.3255, 'grad_norm': 1.3444669246673584, 'learning_rate': 5.887907886890199e-08, 'epoch': 2.9} +2025-02-06 07:24:22 - ERROR - stderr - 97%|█████████▋| 21682/22434 [21:16:42<31:07, 2.48s/it] +2025-02-06 07:24:24 - ERROR - stderr - 97%|█████████▋| 21683/22434 [21:16:44<31:03, 2.48s/it] +2025-02-06 07:24:24 - ERROR - stderr - +2025-02-06 07:24:24 - ERROR - stderr - +2025-02-06 07:24:24 - INFO - stdout - {'loss': 0.3189, 'grad_norm': 1.5545517206192017, 'learning_rate': 5.8722743030236174e-08, 'epoch': 2.9} +2025-02-06 07:24:24 - ERROR - stderr - 97%|█████████▋| 21683/22434 [21:16:44<31:03, 2.48s/it] +2025-02-06 07:24:27 - ERROR - stderr - 97%|█████████▋| 21684/22434 [21:16:47<31:04, 2.49s/it] +2025-02-06 07:24:27 - ERROR - stderr - +2025-02-06 07:24:27 - ERROR - stderr - +2025-02-06 07:24:27 - INFO - stdout - {'loss': 0.3773, 'grad_norm': 1.583840012550354, 'learning_rate': 5.856661440797995e-08, 'epoch': 2.9} +2025-02-06 07:24:27 - ERROR - stderr - 97%|█████████▋| 21684/22434 [21:16:47<31:04, 2.49s/it] +2025-02-06 07:24:29 - ERROR - stderr - 97%|█████████▋| 21685/22434 [21:16:49<30:52, 2.47s/it] +2025-02-06 07:24:29 - ERROR - stderr - +2025-02-06 07:24:29 - ERROR - stderr - +2025-02-06 07:24:29 - INFO - stdout - {'loss': 0.3729, 'grad_norm': 1.5190311670303345, 'learning_rate': 5.841069300539182e-08, 'epoch': 2.9} +2025-02-06 07:24:29 - ERROR - stderr - 97%|█████████▋| 21685/22434 [21:16:49<30:52, 2.47s/it] +2025-02-06 07:24:32 - ERROR - stderr - 97%|█████████▋| 21686/22434 [21:16:52<31:09, 2.50s/it] +2025-02-06 07:24:32 - ERROR - stderr - +2025-02-06 07:24:32 - 
ERROR - stderr - +2025-02-06 07:24:32 - INFO - stdout - {'loss': 0.3117, 'grad_norm': 1.6228069067001343, 'learning_rate': 5.8254978825718065e-08, 'epoch': 2.9} +2025-02-06 07:24:32 - ERROR - stderr - 97%|█████████▋| 21686/22434 [21:16:52<31:09, 2.50s/it] +2025-02-06 07:24:34 - ERROR - stderr - 97%|█████████▋| 21687/22434 [21:16:54<30:59, 2.49s/it] +2025-02-06 07:24:34 - ERROR - stderr - +2025-02-06 07:24:34 - ERROR - stderr - +2025-02-06 07:24:34 - INFO - stdout - {'loss': 0.3752, 'grad_norm': 1.6367323398590088, 'learning_rate': 5.80994718722061e-08, 'epoch': 2.9} +2025-02-06 07:24:34 - ERROR - stderr - 97%|█████████▋| 21687/22434 [21:16:54<30:59, 2.49s/it] +2025-02-06 07:24:37 - ERROR - stderr - 97%|█████████▋| 21688/22434 [21:16:57<32:16, 2.60s/it] +2025-02-06 07:24:37 - ERROR - stderr - +2025-02-06 07:24:37 - ERROR - stderr - +2025-02-06 07:24:37 - INFO - stdout - {'loss': 0.3723, 'grad_norm': 1.7038311958312988, 'learning_rate': 5.794417214809889e-08, 'epoch': 2.9} +2025-02-06 07:24:37 - ERROR - stderr - 97%|█████████▋| 21688/22434 [21:16:57<32:16, 2.60s/it] +2025-02-06 07:24:40 - ERROR - stderr - 97%|█████████▋| 21689/22434 [21:16:59<31:52, 2.57s/it] +2025-02-06 07:24:40 - ERROR - stderr - +2025-02-06 07:24:40 - ERROR - stderr - +2025-02-06 07:24:40 - INFO - stdout - {'loss': 0.3082, 'grad_norm': 1.493650197982788, 'learning_rate': 5.77890796566305e-08, 'epoch': 2.9} +2025-02-06 07:24:40 - ERROR - stderr - 97%|█████████▋| 21689/22434 [21:16:59<31:52, 2.57s/it] +2025-02-06 07:24:42 - ERROR - stderr - 97%|█████████▋| 21690/22434 [21:17:02<31:41, 2.56s/it] +2025-02-06 07:24:42 - ERROR - stderr - +2025-02-06 07:24:42 - ERROR - stderr - +2025-02-06 07:24:42 - INFO - stdout - {'loss': 0.3558, 'grad_norm': 1.9125847816467285, 'learning_rate': 5.763419440103613e-08, 'epoch': 2.9} +2025-02-06 07:24:42 - ERROR - stderr - 97%|█████████▋| 21690/22434 [21:17:02<31:41, 2.56s/it] +2025-02-06 07:24:45 - ERROR - stderr - 97%|█████████▋| 21691/22434 [21:17:04<31:05, 2.51s/it] 
+2025-02-06 07:24:45 - ERROR - stderr - +2025-02-06 07:24:45 - ERROR - stderr - +2025-02-06 07:24:45 - INFO - stdout - {'loss': 0.348, 'grad_norm': 1.5828602313995361, 'learning_rate': 5.747951638454208e-08, 'epoch': 2.9} +2025-02-06 07:24:45 - ERROR - stderr - 97%|█████████▋| 21691/22434 [21:17:04<31:05, 2.51s/it] +2025-02-06 07:24:47 - ERROR - stderr - 97%|█████████▋| 21692/22434 [21:17:07<31:42, 2.56s/it] +2025-02-06 07:24:47 - ERROR - stderr - +2025-02-06 07:24:47 - ERROR - stderr - +2025-02-06 07:24:47 - INFO - stdout - {'loss': 0.3671, 'grad_norm': 1.4001460075378418, 'learning_rate': 5.7325045610374665e-08, 'epoch': 2.9} +2025-02-06 07:24:47 - ERROR - stderr - 97%|█████████▋| 21692/22434 [21:17:07<31:42, 2.56s/it] +2025-02-06 07:24:50 - ERROR - stderr - 97%|█████████▋| 21693/22434 [21:17:09<31:23, 2.54s/it] +2025-02-06 07:24:50 - ERROR - stderr - +2025-02-06 07:24:50 - ERROR - stderr - +2025-02-06 07:24:50 - INFO - stdout - {'loss': 0.3728, 'grad_norm': 1.6218438148498535, 'learning_rate': 5.7170782081751305e-08, 'epoch': 2.9} +2025-02-06 07:24:50 - ERROR - stderr - 97%|█████████▋| 21693/22434 [21:17:10<31:23, 2.54s/it] +2025-02-06 07:24:52 - ERROR - stderr - 97%|█████████▋| 21694/22434 [21:17:12<31:09, 2.53s/it] +2025-02-06 07:24:52 - ERROR - stderr - +2025-02-06 07:24:52 - ERROR - stderr - +2025-02-06 07:24:52 - INFO - stdout - {'loss': 0.3628, 'grad_norm': 1.4938585758209229, 'learning_rate': 5.701672580188944e-08, 'epoch': 2.9} +2025-02-06 07:24:52 - ERROR - stderr - 97%|█████████▋| 21694/22434 [21:17:12<31:09, 2.53s/it] +2025-02-06 07:24:55 - ERROR - stderr - 97%|█████████▋| 21695/22434 [21:17:15<31:10, 2.53s/it] +2025-02-06 07:24:55 - ERROR - stderr - +2025-02-06 07:24:55 - ERROR - stderr - +2025-02-06 07:24:55 - INFO - stdout - {'loss': 0.3984, 'grad_norm': 1.7457612752914429, 'learning_rate': 5.686287677399982e-08, 'epoch': 2.9} +2025-02-06 07:24:55 - ERROR - stderr - 97%|█████████▋| 21695/22434 [21:17:15<31:10, 2.53s/it] +2025-02-06 07:24:57 - ERROR 
- stderr - 97%|█████████▋| 21696/22434 [21:17:17<31:13, 2.54s/it] +2025-02-06 07:24:57 - ERROR - stderr - +2025-02-06 07:24:57 - ERROR - stderr - +2025-02-06 07:24:57 - INFO - stdout - {'loss': 0.3446, 'grad_norm': 1.6180546283721924, 'learning_rate': 5.670923500128766e-08, 'epoch': 2.9} +2025-02-06 07:24:57 - ERROR - stderr - 97%|█████████▋| 21696/22434 [21:17:17<31:13, 2.54s/it] +2025-02-06 07:25:00 - ERROR - stderr - 97%|█████████▋| 21697/22434 [21:17:20<31:22, 2.55s/it] +2025-02-06 07:25:00 - ERROR - stderr - +2025-02-06 07:25:00 - ERROR - stderr - +2025-02-06 07:25:00 - INFO - stdout - {'loss': 0.3945, 'grad_norm': 1.530826210975647, 'learning_rate': 5.655580048695819e-08, 'epoch': 2.9} +2025-02-06 07:25:00 - ERROR - stderr - 97%|█████████▋| 21697/22434 [21:17:20<31:22, 2.55s/it] +2025-02-06 07:25:02 - ERROR - stderr - 97%|█████████▋| 21698/22434 [21:17:22<31:26, 2.56s/it] +2025-02-06 07:25:03 - ERROR - stderr - +2025-02-06 07:25:03 - ERROR - stderr - +2025-02-06 07:25:03 - INFO - stdout - {'loss': 0.365, 'grad_norm': 1.53468656539917, 'learning_rate': 5.6402573234207725e-08, 'epoch': 2.9} +2025-02-06 07:25:03 - ERROR - stderr - 97%|█████████▋| 21698/22434 [21:17:22<31:26, 2.56s/it] +2025-02-06 07:25:05 - ERROR - stderr - 97%|█████████▋| 21699/22434 [21:17:25<31:03, 2.54s/it] +2025-02-06 07:25:05 - ERROR - stderr - +2025-02-06 07:25:05 - ERROR - stderr - +2025-02-06 07:25:05 - INFO - stdout - {'loss': 0.3277, 'grad_norm': 1.5353302955627441, 'learning_rate': 5.6249553246230384e-08, 'epoch': 2.9} +2025-02-06 07:25:05 - ERROR - stderr - 97%|█████████▋| 21699/22434 [21:17:25<31:03, 2.54s/it] +2025-02-06 07:25:07 - ERROR - stderr - 97%|█████████▋| 21700/22434 [21:17:27<30:53, 2.52s/it] +2025-02-06 07:25:07 - ERROR - stderr - +2025-02-06 07:25:07 - ERROR - stderr - +2025-02-06 07:25:07 - INFO - stdout - {'loss': 0.3571, 'grad_norm': 1.6673094034194946, 'learning_rate': 5.609674052621694e-08, 'epoch': 2.9} +2025-02-06 07:25:07 - ERROR - stderr - 97%|█████████▋| 
21700/22434 [21:17:27<30:53, 2.52s/it] +2025-02-06 07:25:10 - ERROR - stderr - 97%|█████████▋| 21701/22434 [21:17:30<31:20, 2.57s/it] +2025-02-06 07:25:10 - ERROR - stderr - +2025-02-06 07:25:10 - ERROR - stderr - +2025-02-06 07:25:10 - INFO - stdout - {'loss': 0.3997, 'grad_norm': 1.7075071334838867, 'learning_rate': 5.5944135077350415e-08, 'epoch': 2.9} +2025-02-06 07:25:10 - ERROR - stderr - 97%|█████████▋| 21701/22434 [21:17:30<31:20, 2.57s/it] +2025-02-06 07:25:13 - ERROR - stderr - 97%|█████████▋| 21702/22434 [21:17:32<31:08, 2.55s/it] +2025-02-06 07:25:13 - ERROR - stderr - +2025-02-06 07:25:13 - ERROR - stderr - +2025-02-06 07:25:13 - INFO - stdout - {'loss': 0.3844, 'grad_norm': 1.6711736917495728, 'learning_rate': 5.579173690281381e-08, 'epoch': 2.9} +2025-02-06 07:25:13 - ERROR - stderr - 97%|█████████▋| 21702/22434 [21:17:32<31:08, 2.55s/it] +2025-02-06 07:25:15 - ERROR - stderr - 97%|█████████▋| 21703/22434 [21:17:35<31:31, 2.59s/it] +2025-02-06 07:25:15 - ERROR - stderr - +2025-02-06 07:25:15 - ERROR - stderr - +2025-02-06 07:25:15 - INFO - stdout - {'loss': 0.3543, 'grad_norm': 1.5767652988433838, 'learning_rate': 5.5639546005782365e-08, 'epoch': 2.9} +2025-02-06 07:25:15 - ERROR - stderr - 97%|█████████▋| 21703/22434 [21:17:35<31:31, 2.59s/it] +2025-02-06 07:25:18 - ERROR - stderr - 97%|█████████▋| 21704/22434 [21:17:38<31:17, 2.57s/it] +2025-02-06 07:25:18 - ERROR - stderr - +2025-02-06 07:25:18 - ERROR - stderr - +2025-02-06 07:25:18 - INFO - stdout - {'loss': 0.3866, 'grad_norm': 1.5851879119873047, 'learning_rate': 5.5487562389429095e-08, 'epoch': 2.9} +2025-02-06 07:25:18 - ERROR - stderr - 97%|█████████▋| 21704/22434 [21:17:38<31:17, 2.57s/it] +2025-02-06 07:25:20 - ERROR - stderr - 97%|█████████▋| 21705/22434 [21:17:40<31:19, 2.58s/it] +2025-02-06 07:25:20 - ERROR - stderr - +2025-02-06 07:25:20 - ERROR - stderr - +2025-02-06 07:25:20 - INFO - stdout - {'loss': 0.4059, 'grad_norm': 1.7123242616653442, 'learning_rate': 5.533578605692147e-08, 
'epoch': 2.9} +2025-02-06 07:25:20 - ERROR - stderr - 97%|█████████▋| 21705/22434 [21:17:40<31:19, 2.58s/it] +2025-02-06 07:25:23 - ERROR - stderr - 97%|█████████▋| 21706/22434 [21:17:43<30:46, 2.54s/it] +2025-02-06 07:25:23 - ERROR - stderr - +2025-02-06 07:25:23 - ERROR - stderr - +2025-02-06 07:25:23 - INFO - stdout - {'loss': 0.3711, 'grad_norm': 1.6452350616455078, 'learning_rate': 5.518421701142362e-08, 'epoch': 2.9} +2025-02-06 07:25:23 - ERROR - stderr - 97%|█████████▋| 21706/22434 [21:17:43<30:46, 2.54s/it] +2025-02-06 07:25:25 - ERROR - stderr - 97%|█████████▋| 21707/22434 [21:17:45<30:37, 2.53s/it] +2025-02-06 07:25:25 - ERROR - stderr - +2025-02-06 07:25:25 - ERROR - stderr - +2025-02-06 07:25:25 - INFO - stdout - {'loss': 0.3309, 'grad_norm': 1.5667481422424316, 'learning_rate': 5.5032855256095254e-08, 'epoch': 2.9} +2025-02-06 07:25:25 - ERROR - stderr - 97%|█████████▋| 21707/22434 [21:17:45<30:37, 2.53s/it] +2025-02-06 07:25:28 - ERROR - stderr - 97%|█████████▋| 21708/22434 [21:17:48<30:29, 2.52s/it] +2025-02-06 07:25:28 - ERROR - stderr - +2025-02-06 07:25:28 - ERROR - stderr - +2025-02-06 07:25:28 - INFO - stdout - {'loss': 0.3992, 'grad_norm': 1.721821904182434, 'learning_rate': 5.488170079408939e-08, 'epoch': 2.9} +2025-02-06 07:25:28 - ERROR - stderr - 97%|█████████▋| 21708/22434 [21:17:48<30:29, 2.52s/it] +2025-02-06 07:25:30 - ERROR - stderr - 97%|█████████▋| 21709/22434 [21:17:50<30:25, 2.52s/it] +2025-02-06 07:25:30 - ERROR - stderr - +2025-02-06 07:25:30 - ERROR - stderr - +2025-02-06 07:25:30 - INFO - stdout - {'loss': 0.3349, 'grad_norm': 1.512250304222107, 'learning_rate': 5.473075362855906e-08, 'epoch': 2.9} +2025-02-06 07:25:30 - ERROR - stderr - 97%|█████████▋| 21709/22434 [21:17:50<30:25, 2.52s/it] +2025-02-06 07:25:33 - ERROR - stderr - 97%|█████████▋| 21710/22434 [21:17:53<30:45, 2.55s/it] +2025-02-06 07:25:33 - ERROR - stderr - +2025-02-06 07:25:33 - ERROR - stderr - +2025-02-06 07:25:33 - INFO - stdout - {'loss': 0.3468, 
'grad_norm': 1.4494457244873047, 'learning_rate': 5.4580013762649544e-08, 'epoch': 2.9} +2025-02-06 07:25:33 - ERROR - stderr - 97%|█████████▋| 21710/22434 [21:17:53<30:45, 2.55s/it] +2025-02-06 07:25:36 - ERROR - stderr - 97%|█████████▋| 21711/22434 [21:17:55<30:54, 2.56s/it] +2025-02-06 07:25:36 - ERROR - stderr - +2025-02-06 07:25:36 - ERROR - stderr - +2025-02-06 07:25:36 - INFO - stdout - {'loss': 0.3778, 'grad_norm': 1.7065660953521729, 'learning_rate': 5.442948119950276e-08, 'epoch': 2.9} +2025-02-06 07:25:36 - ERROR - stderr - 97%|█████████▋| 21711/22434 [21:17:55<30:54, 2.56s/it] +2025-02-06 07:25:38 - ERROR - stderr - 97%|█████████▋| 21712/22434 [21:17:58<30:24, 2.53s/it] +2025-02-06 07:25:38 - ERROR - stderr - +2025-02-06 07:25:38 - ERROR - stderr - +2025-02-06 07:25:38 - INFO - stdout - {'loss': 0.3201, 'grad_norm': 1.4483524560928345, 'learning_rate': 5.427915594225619e-08, 'epoch': 2.9} +2025-02-06 07:25:38 - ERROR - stderr - 97%|█████████▋| 21712/22434 [21:17:58<30:24, 2.53s/it] +2025-02-06 07:25:40 - ERROR - stderr - 97%|█████████▋| 21713/22434 [21:18:00<29:59, 2.50s/it] +2025-02-06 07:25:41 - ERROR - stderr - +2025-02-06 07:25:41 - ERROR - stderr - +2025-02-06 07:25:41 - INFO - stdout - {'loss': 0.3366, 'grad_norm': 1.622246503829956, 'learning_rate': 5.412903799404401e-08, 'epoch': 2.9} +2025-02-06 07:25:41 - ERROR - stderr - 97%|█████████▋| 21713/22434 [21:18:00<29:59, 2.50s/it] +2025-02-06 07:25:43 - ERROR - stderr - 97%|█████████▋| 21714/22434 [21:18:03<29:50, 2.49s/it] +2025-02-06 07:25:43 - ERROR - stderr - +2025-02-06 07:25:43 - ERROR - stderr - +2025-02-06 07:25:43 - INFO - stdout - {'loss': 0.3744, 'grad_norm': 1.6377480030059814, 'learning_rate': 5.397912735799371e-08, 'epoch': 2.9} +2025-02-06 07:25:43 - ERROR - stderr - 97%|█████████▋| 21714/22434 [21:18:03<29:50, 2.49s/it] +2025-02-06 07:25:45 - ERROR - stderr - 97%|█████████▋| 21715/22434 [21:18:05<29:50, 2.49s/it] +2025-02-06 07:25:45 - ERROR - stderr - +2025-02-06 07:25:45 - ERROR - 
stderr - +2025-02-06 07:25:45 - INFO - stdout - {'loss': 0.3923, 'grad_norm': 1.588748812675476, 'learning_rate': 5.382942403723279e-08, 'epoch': 2.9} +2025-02-06 07:25:45 - ERROR - stderr - 97%|█████████▋| 21715/22434 [21:18:05<29:50, 2.49s/it] +2025-02-06 07:25:48 - ERROR - stderr - 97%|█████████▋| 21716/22434 [21:18:08<29:44, 2.49s/it] +2025-02-06 07:25:48 - ERROR - stderr - +2025-02-06 07:25:48 - ERROR - stderr - +2025-02-06 07:25:48 - INFO - stdout - {'loss': 0.3117, 'grad_norm': 1.4903801679611206, 'learning_rate': 5.367992803487876e-08, 'epoch': 2.9} +2025-02-06 07:25:48 - ERROR - stderr - 97%|█████████▋| 21716/22434 [21:18:08<29:44, 2.49s/it] +2025-02-06 07:25:50 - ERROR - stderr - 97%|█████████▋| 21717/22434 [21:18:10<29:38, 2.48s/it] +2025-02-06 07:25:50 - ERROR - stderr - +2025-02-06 07:25:50 - ERROR - stderr - +2025-02-06 07:25:50 - INFO - stdout - {'loss': 0.34, 'grad_norm': 1.6881256103515625, 'learning_rate': 5.353063935405023e-08, 'epoch': 2.9} +2025-02-06 07:25:50 - ERROR - stderr - 97%|█████████▋| 21717/22434 [21:18:10<29:38, 2.48s/it] +2025-02-06 07:25:53 - ERROR - stderr - 97%|█████████▋| 21718/22434 [21:18:13<29:31, 2.47s/it] +2025-02-06 07:25:53 - ERROR - stderr - +2025-02-06 07:25:53 - ERROR - stderr - +2025-02-06 07:25:53 - INFO - stdout - {'loss': 0.3879, 'grad_norm': 1.5486427545547485, 'learning_rate': 5.338155799785694e-08, 'epoch': 2.9} +2025-02-06 07:25:53 - ERROR - stderr - 97%|█████████▋| 21718/22434 [21:18:13<29:31, 2.47s/it] +2025-02-06 07:25:55 - ERROR - stderr - 97%|█████████▋| 21719/22434 [21:18:15<29:40, 2.49s/it] +2025-02-06 07:25:55 - ERROR - stderr - +2025-02-06 07:25:55 - ERROR - stderr - +2025-02-06 07:25:55 - INFO - stdout - {'loss': 0.3849, 'grad_norm': 1.7677642107009888, 'learning_rate': 5.323268396940751e-08, 'epoch': 2.9} +2025-02-06 07:25:55 - ERROR - stderr - 97%|█████████▋| 21719/22434 [21:18:15<29:40, 2.49s/it] +2025-02-06 07:25:58 - ERROR - stderr - 97%|█████████▋| 21720/22434 [21:18:18<29:37, 2.49s/it] 
+2025-02-06 07:25:58 - ERROR - stderr - +2025-02-06 07:25:58 - ERROR - stderr - +2025-02-06 07:25:58 - INFO - stdout - {'loss': 0.43, 'grad_norm': 1.7201359272003174, 'learning_rate': 5.308401727180501e-08, 'epoch': 2.9} +2025-02-06 07:25:58 - ERROR - stderr - 97%|█████████▋| 21720/22434 [21:18:18<29:37, 2.49s/it] +2025-02-06 07:26:00 - ERROR - stderr - 97%|█████████▋| 21721/22434 [21:18:20<29:42, 2.50s/it] +2025-02-06 07:26:00 - ERROR - stderr - +2025-02-06 07:26:00 - ERROR - stderr - +2025-02-06 07:26:00 - INFO - stdout - {'loss': 0.3499, 'grad_norm': 1.5433242321014404, 'learning_rate': 5.2935557908146976e-08, 'epoch': 2.9} +2025-02-06 07:26:00 - ERROR - stderr - 97%|█████████▋| 21721/22434 [21:18:20<29:42, 2.50s/it] +2025-02-06 07:26:03 - ERROR - stderr - 97%|█████████▋| 21722/22434 [21:18:23<29:23, 2.48s/it] +2025-02-06 07:26:03 - ERROR - stderr - +2025-02-06 07:26:03 - ERROR - stderr - +2025-02-06 07:26:03 - INFO - stdout - {'loss': 0.3809, 'grad_norm': 1.516464352607727, 'learning_rate': 5.27873058815298e-08, 'epoch': 2.9} +2025-02-06 07:26:03 - ERROR - stderr - 97%|█████████▋| 21722/22434 [21:18:23<29:23, 2.48s/it] +2025-02-06 07:26:05 - ERROR - stderr - 97%|█████████▋| 21723/22434 [21:18:25<29:48, 2.52s/it] +2025-02-06 07:26:05 - ERROR - stderr - +2025-02-06 07:26:05 - ERROR - stderr - +2025-02-06 07:26:05 - INFO - stdout - {'loss': 0.3759, 'grad_norm': 1.6005887985229492, 'learning_rate': 5.263926119504326e-08, 'epoch': 2.9} +2025-02-06 07:26:05 - ERROR - stderr - 97%|█████████▋| 21723/22434 [21:18:25<29:48, 2.52s/it] +2025-02-06 07:26:08 - ERROR - stderr - 97%|█████████▋| 21724/22434 [21:18:28<29:36, 2.50s/it] +2025-02-06 07:26:08 - ERROR - stderr - +2025-02-06 07:26:08 - ERROR - stderr - +2025-02-06 07:26:08 - INFO - stdout - {'loss': 0.3705, 'grad_norm': 1.5587105751037598, 'learning_rate': 5.249142385177153e-08, 'epoch': 2.91} +2025-02-06 07:26:08 - ERROR - stderr - 97%|█████████▋| 21724/22434 [21:18:28<29:36, 2.50s/it] +2025-02-06 07:26:10 - ERROR - 
stderr - 97%|█████████▋| 21725/22434 [21:18:30<29:50, 2.53s/it] +2025-02-06 07:26:11 - ERROR - stderr - +2025-02-06 07:26:11 - ERROR - stderr - +2025-02-06 07:26:11 - INFO - stdout - {'loss': 0.3469, 'grad_norm': 1.633468747138977, 'learning_rate': 5.234379385479771e-08, 'epoch': 2.91} +2025-02-06 07:26:11 - ERROR - stderr - 97%|█████████▋| 21725/22434 [21:18:30<29:50, 2.53s/it] +2025-02-06 07:26:13 - ERROR - stderr - 97%|█████████▋| 21726/22434 [21:18:33<29:47, 2.52s/it] +2025-02-06 07:26:13 - ERROR - stderr - +2025-02-06 07:26:13 - ERROR - stderr - +2025-02-06 07:26:13 - INFO - stdout - {'loss': 0.3581, 'grad_norm': 1.5826988220214844, 'learning_rate': 5.2196371207199336e-08, 'epoch': 2.91} +2025-02-06 07:26:13 - ERROR - stderr - 97%|█████████▋| 21726/22434 [21:18:33<29:47, 2.52s/it] +2025-02-06 07:26:15 - ERROR - stderr - 97%|█████████▋| 21727/22434 [21:18:35<29:36, 2.51s/it] +2025-02-06 07:26:16 - ERROR - stderr - +2025-02-06 07:26:16 - ERROR - stderr - +2025-02-06 07:26:16 - INFO - stdout - {'loss': 0.3446, 'grad_norm': 1.5138053894042969, 'learning_rate': 5.20491559120484e-08, 'epoch': 2.91} +2025-02-06 07:26:16 - ERROR - stderr - 97%|█████████▋| 21727/22434 [21:18:35<29:36, 2.51s/it] +2025-02-06 07:26:18 - ERROR - stderr - 97%|█████████▋| 21728/22434 [21:18:38<29:43, 2.53s/it] +2025-02-06 07:26:18 - ERROR - stderr - +2025-02-06 07:26:18 - ERROR - stderr - +2025-02-06 07:26:18 - INFO - stdout - {'loss': 0.3225, 'grad_norm': 1.489418387413025, 'learning_rate': 5.190214797241355e-08, 'epoch': 2.91} +2025-02-06 07:26:18 - ERROR - stderr - 97%|█████████▋| 21728/22434 [21:18:38<29:43, 2.53s/it] +2025-02-06 07:26:21 - ERROR - stderr - 97%|█████████▋| 21729/22434 [21:18:40<30:09, 2.57s/it] +2025-02-06 07:26:21 - ERROR - stderr - +2025-02-06 07:26:21 - ERROR - stderr - +2025-02-06 07:26:21 - INFO - stdout - {'loss': 0.3445, 'grad_norm': 1.4915026426315308, 'learning_rate': 5.17553473913579e-08, 'epoch': 2.91} +2025-02-06 07:26:21 - ERROR - stderr - 97%|█████████▋| 
21729/22434 [21:18:41<30:09, 2.57s/it] +2025-02-06 07:26:23 - ERROR - stderr - 97%|█████████▋| 21730/22434 [21:18:43<29:56, 2.55s/it] +2025-02-06 07:26:23 - ERROR - stderr - +2025-02-06 07:26:23 - ERROR - stderr - +2025-02-06 07:26:23 - INFO - stdout - {'loss': 0.3531, 'grad_norm': 1.6610913276672363, 'learning_rate': 5.1608754171944555e-08, 'epoch': 2.91} +2025-02-06 07:26:23 - ERROR - stderr - 97%|█████████▋| 21730/22434 [21:18:43<29:56, 2.55s/it] +2025-02-06 07:26:26 - ERROR - stderr - 97%|█████████▋| 21731/22434 [21:18:45<29:43, 2.54s/it] +2025-02-06 07:26:26 - ERROR - stderr - +2025-02-06 07:26:26 - ERROR - stderr - +2025-02-06 07:26:26 - INFO - stdout - {'loss': 0.3406, 'grad_norm': 1.5077239274978638, 'learning_rate': 5.1462368317226616e-08, 'epoch': 2.91} +2025-02-06 07:26:26 - ERROR - stderr - 97%|█████████▋| 21731/22434 [21:18:46<29:43, 2.54s/it] +2025-02-06 07:26:28 - ERROR - stderr - 97%|█████████▋| 21732/22434 [21:18:48<29:37, 2.53s/it] +2025-02-06 07:26:28 - ERROR - stderr - +2025-02-06 07:26:28 - ERROR - stderr - +2025-02-06 07:26:28 - INFO - stdout - {'loss': 0.3553, 'grad_norm': 1.4788247346878052, 'learning_rate': 5.131618983025499e-08, 'epoch': 2.91} +2025-02-06 07:26:28 - ERROR - stderr - 97%|█████████▋| 21732/22434 [21:18:48<29:37, 2.53s/it] +2025-02-06 07:26:31 - ERROR - stderr - 97%|█████████▋| 21733/22434 [21:18:51<29:36, 2.53s/it] +2025-02-06 07:26:31 - ERROR - stderr - +2025-02-06 07:26:31 - ERROR - stderr - +2025-02-06 07:26:31 - INFO - stdout - {'loss': 0.3356, 'grad_norm': 1.492851734161377, 'learning_rate': 5.1170218714078346e-08, 'epoch': 2.91} +2025-02-06 07:26:31 - ERROR - stderr - 97%|█████████▋| 21733/22434 [21:18:51<29:36, 2.53s/it] +2025-02-06 07:26:34 - ERROR - stderr - 97%|█████████▋| 21734/22434 [21:18:53<31:01, 2.66s/it] +2025-02-06 07:26:34 - ERROR - stderr - +2025-02-06 07:26:34 - ERROR - stderr - +2025-02-06 07:26:34 - INFO - stdout - {'loss': 0.3466, 'grad_norm': 1.5526649951934814, 'learning_rate': 
5.102445497173758e-08, 'epoch': 2.91} +2025-02-06 07:26:34 - ERROR - stderr - 97%|█████████▋| 21734/22434 [21:18:54<31:01, 2.66s/it] +2025-02-06 07:26:36 - ERROR - stderr - 97%|█████████▋| 21735/22434 [21:18:56<30:26, 2.61s/it] +2025-02-06 07:26:36 - ERROR - stderr - +2025-02-06 07:26:36 - ERROR - stderr - +2025-02-06 07:26:36 - INFO - stdout - {'loss': 0.3513, 'grad_norm': 1.5814646482467651, 'learning_rate': 5.0878898606272483e-08, 'epoch': 2.91} +2025-02-06 07:26:36 - ERROR - stderr - 97%|█████████▋| 21735/22434 [21:18:56<30:26, 2.61s/it] +2025-02-06 07:26:39 - ERROR - stderr - 97%|█████████▋| 21736/22434 [21:18:59<30:02, 2.58s/it] +2025-02-06 07:26:39 - ERROR - stderr - +2025-02-06 07:26:39 - ERROR - stderr - +2025-02-06 07:26:39 - INFO - stdout - {'loss': 0.3512, 'grad_norm': 1.4282602071762085, 'learning_rate': 5.0733549620717306e-08, 'epoch': 2.91} +2025-02-06 07:26:39 - ERROR - stderr - 97%|█████████▋| 21736/22434 [21:18:59<30:02, 2.58s/it] +2025-02-06 07:26:41 - ERROR - stderr - 97%|█████████▋| 21737/22434 [21:19:01<29:54, 2.58s/it] +2025-02-06 07:26:41 - ERROR - stderr - +2025-02-06 07:26:41 - ERROR - stderr - +2025-02-06 07:26:41 - INFO - stdout - {'loss': 0.3655, 'grad_norm': 1.544682264328003, 'learning_rate': 5.058840801809961e-08, 'epoch': 2.91} +2025-02-06 07:26:41 - ERROR - stderr - 97%|█████████▋| 21737/22434 [21:19:01<29:54, 2.58s/it] +2025-02-06 07:26:44 - ERROR - stderr - 97%|█████████▋| 21738/22434 [21:19:03<29:23, 2.53s/it] +2025-02-06 07:26:44 - ERROR - stderr - +2025-02-06 07:26:44 - ERROR - stderr - +2025-02-06 07:26:44 - INFO - stdout - {'loss': 0.3771, 'grad_norm': 1.6191850900650024, 'learning_rate': 5.044347380144698e-08, 'epoch': 2.91} +2025-02-06 07:26:44 - ERROR - stderr - 97%|█████████▋| 21738/22434 [21:19:04<29:23, 2.53s/it] +2025-02-06 07:26:46 - ERROR - stderr - 97%|█████████▋| 21739/22434 [21:19:06<29:12, 2.52s/it] +2025-02-06 07:26:46 - ERROR - stderr - +2025-02-06 07:26:46 - ERROR - stderr - +2025-02-06 07:26:46 - INFO - 
stdout - {'loss': 0.3209, 'grad_norm': 1.308089017868042, 'learning_rate': 5.0298746973778124e-08, 'epoch': 2.91} +2025-02-06 07:26:46 - ERROR - stderr - 97%|█████████▋| 21739/22434 [21:19:06<29:12, 2.52s/it] +2025-02-06 07:26:49 - ERROR - stderr - 97%|█████████▋| 21740/22434 [21:19:08<29:06, 2.52s/it] +2025-02-06 07:26:49 - ERROR - stderr - +2025-02-06 07:26:49 - ERROR - stderr - +2025-02-06 07:26:49 - INFO - stdout - {'loss': 0.364, 'grad_norm': 1.5420504808425903, 'learning_rate': 5.015422753811172e-08, 'epoch': 2.91} +2025-02-06 07:26:49 - ERROR - stderr - 97%|█████████▋| 21740/22434 [21:19:09<29:06, 2.52s/it] +2025-02-06 07:26:51 - ERROR - stderr - 97%|█████████▋| 21741/22434 [21:19:11<28:53, 2.50s/it] +2025-02-06 07:26:51 - ERROR - stderr - +2025-02-06 07:26:51 - ERROR - stderr - +2025-02-06 07:26:51 - INFO - stdout - {'loss': 0.3788, 'grad_norm': 1.5922136306762695, 'learning_rate': 5.0009915497459815e-08, 'epoch': 2.91} +2025-02-06 07:26:51 - ERROR - stderr - 97%|█████████▋| 21741/22434 [21:19:11<28:53, 2.50s/it] +2025-02-06 07:26:54 - ERROR - stderr - 97%|█████████▋| 21742/22434 [21:19:13<28:41, 2.49s/it] +2025-02-06 07:26:54 - ERROR - stderr - +2025-02-06 07:26:54 - ERROR - stderr - +2025-02-06 07:26:54 - INFO - stdout - {'loss': 0.3389, 'grad_norm': 1.5506490468978882, 'learning_rate': 4.986581085483111e-08, 'epoch': 2.91} +2025-02-06 07:26:54 - ERROR - stderr - 97%|█████████▋| 21742/22434 [21:19:13<28:41, 2.49s/it] +2025-02-06 07:26:56 - ERROR - stderr - 97%|█████████▋| 21743/22434 [21:19:16<28:49, 2.50s/it] +2025-02-06 07:26:56 - ERROR - stderr - +2025-02-06 07:26:56 - ERROR - stderr - +2025-02-06 07:26:56 - INFO - stdout - {'loss': 0.334, 'grad_norm': 1.3692717552185059, 'learning_rate': 4.972191361322654e-08, 'epoch': 2.91} +2025-02-06 07:26:56 - ERROR - stderr - 97%|█████████▋| 21743/22434 [21:19:16<28:49, 2.50s/it] +2025-02-06 07:26:59 - ERROR - stderr - 97%|█████████▋| 21744/22434 [21:19:19<29:00, 2.52s/it] +2025-02-06 07:26:59 - ERROR - stderr - 
+2025-02-06 07:26:59 - ERROR - stderr - +2025-02-06 07:26:59 - INFO - stdout - {'loss': 0.3751, 'grad_norm': 1.6222630739212036, 'learning_rate': 4.9578223775647026e-08, 'epoch': 2.91} +2025-02-06 07:26:59 - ERROR - stderr - 97%|█████████▋| 21744/22434 [21:19:19<29:00, 2.52s/it] +2025-02-06 07:27:01 - ERROR - stderr - 97%|█████████▋| 21745/22434 [21:19:21<28:51, 2.51s/it] +2025-02-06 07:27:01 - ERROR - stderr - +2025-02-06 07:27:01 - ERROR - stderr - +2025-02-06 07:27:01 - INFO - stdout - {'loss': 0.3986, 'grad_norm': 1.859039545059204, 'learning_rate': 4.943474134508908e-08, 'epoch': 2.91} +2025-02-06 07:27:01 - ERROR - stderr - 97%|█████████▋| 21745/22434 [21:19:21<28:51, 2.51s/it] +2025-02-06 07:27:04 - ERROR - stderr - 97%|█████████▋| 21746/22434 [21:19:24<28:44, 2.51s/it] +2025-02-06 07:27:04 - ERROR - stderr - +2025-02-06 07:27:04 - ERROR - stderr - +2025-02-06 07:27:04 - INFO - stdout - {'loss': 0.3276, 'grad_norm': 1.34371018409729, 'learning_rate': 4.929146632454251e-08, 'epoch': 2.91} +2025-02-06 07:27:04 - ERROR - stderr - 97%|█████████▋| 21746/22434 [21:19:24<28:44, 2.51s/it] +2025-02-06 07:27:06 - ERROR - stderr - 97%|█████████▋| 21747/22434 [21:19:26<28:48, 2.52s/it] +2025-02-06 07:27:06 - ERROR - stderr - +2025-02-06 07:27:06 - ERROR - stderr - +2025-02-06 07:27:06 - INFO - stdout - {'loss': 0.3308, 'grad_norm': 1.6226098537445068, 'learning_rate': 4.914839871699273e-08, 'epoch': 2.91} +2025-02-06 07:27:06 - ERROR - stderr - 97%|█████████▋| 21747/22434 [21:19:26<28:48, 2.52s/it] +2025-02-06 07:27:09 - ERROR - stderr - 97%|█████████▋| 21748/22434 [21:19:29<28:48, 2.52s/it] +2025-02-06 07:27:09 - ERROR - stderr - +2025-02-06 07:27:09 - ERROR - stderr - +2025-02-06 07:27:09 - INFO - stdout - {'loss': 0.3684, 'grad_norm': 1.4297494888305664, 'learning_rate': 4.900553852542289e-08, 'epoch': 2.91} +2025-02-06 07:27:09 - ERROR - stderr - 97%|█████████▋| 21748/22434 [21:19:29<28:48, 2.52s/it] +2025-02-06 07:27:11 - ERROR - stderr - 97%|█████████▋| 
21749/22434 [21:19:31<28:53, 2.53s/it] +2025-02-06 07:27:11 - ERROR - stderr - +2025-02-06 07:27:11 - ERROR - stderr - +2025-02-06 07:27:11 - INFO - stdout - {'loss': 0.3428, 'grad_norm': 1.660927653312683, 'learning_rate': 4.8862885752810615e-08, 'epoch': 2.91} +2025-02-06 07:27:11 - ERROR - stderr - 97%|█████████▋| 21749/22434 [21:19:31<28:53, 2.53s/it] +2025-02-06 07:27:14 - ERROR - stderr - 97%|█████████▋| 21750/22434 [21:19:34<28:53, 2.53s/it] +2025-02-06 07:27:14 - ERROR - stderr - +2025-02-06 07:27:14 - ERROR - stderr - +2025-02-06 07:27:14 - INFO - stdout - {'loss': 0.3693, 'grad_norm': 1.4111627340316772, 'learning_rate': 4.872044040212909e-08, 'epoch': 2.91} +2025-02-06 07:27:14 - ERROR - stderr - 97%|█████████▋| 21750/22434 [21:19:34<28:53, 2.53s/it] +2025-02-06 07:27:16 - ERROR - stderr - 97%|█████████▋| 21751/22434 [21:19:36<28:29, 2.50s/it] +2025-02-06 07:27:16 - ERROR - stderr - +2025-02-06 07:27:16 - ERROR - stderr - +2025-02-06 07:27:16 - INFO - stdout - {'loss': 0.3686, 'grad_norm': 1.4912859201431274, 'learning_rate': 4.857820247634815e-08, 'epoch': 2.91} +2025-02-06 07:27:16 - ERROR - stderr - 97%|█████████▋| 21751/22434 [21:19:36<28:29, 2.50s/it] +2025-02-06 07:27:19 - ERROR - stderr - 97%|█████████▋| 21752/22434 [21:19:39<28:11, 2.48s/it] +2025-02-06 07:27:19 - ERROR - stderr - +2025-02-06 07:27:19 - ERROR - stderr - +2025-02-06 07:27:19 - INFO - stdout - {'loss': 0.3446, 'grad_norm': 1.50459623336792, 'learning_rate': 4.843617197843209e-08, 'epoch': 2.91} +2025-02-06 07:27:19 - ERROR - stderr - 97%|█████████▋| 21752/22434 [21:19:39<28:11, 2.48s/it] +2025-02-06 07:27:21 - ERROR - stderr - 97%|█████████▋| 21753/22434 [21:19:41<28:19, 2.50s/it] +2025-02-06 07:27:21 - ERROR - stderr - +2025-02-06 07:27:21 - ERROR - stderr - +2025-02-06 07:27:21 - INFO - stdout - {'loss': 0.3806, 'grad_norm': 1.5881121158599854, 'learning_rate': 4.8294348911340774e-08, 'epoch': 2.91} +2025-02-06 07:27:21 - ERROR - stderr - 97%|█████████▋| 21753/22434
[21:19:41<28:19, 2.50s/it] +2025-02-06 07:27:24 - ERROR - stderr - 97%|█████████▋| 21754/22434 [21:19:44<28:17, 2.50s/it] +2025-02-06 07:27:24 - ERROR - stderr - +2025-02-06 07:27:24 - ERROR - stderr - +2025-02-06 07:27:24 - INFO - stdout - {'loss': 0.379, 'grad_norm': 1.5604761838912964, 'learning_rate': 4.815273327803183e-08, 'epoch': 2.91} +2025-02-06 07:27:24 - ERROR - stderr - 97%|█████████▋| 21754/22434 [21:19:44<28:17, 2.50s/it] +2025-02-06 07:27:26 - ERROR - stderr - 97%|█████████▋| 21755/22434 [21:19:46<28:11, 2.49s/it] +2025-02-06 07:27:26 - ERROR - stderr - +2025-02-06 07:27:26 - ERROR - stderr - +2025-02-06 07:27:26 - INFO - stdout - {'loss': 0.3187, 'grad_norm': 1.5102523565292358, 'learning_rate': 4.8011325081455115e-08, 'epoch': 2.91} +2025-02-06 07:27:26 - ERROR - stderr - 97%|█████████▋| 21755/22434 [21:19:46<28:11, 2.49s/it] +2025-02-06 07:27:29 - ERROR - stderr - 97%|█████████▋| 21756/22434 [21:19:49<28:19, 2.51s/it] +2025-02-06 07:27:29 - ERROR - stderr - +2025-02-06 07:27:29 - ERROR - stderr - +2025-02-06 07:27:29 - INFO - stdout - {'loss': 0.3798, 'grad_norm': 1.5953079462051392, 'learning_rate': 4.787012432456051e-08, 'epoch': 2.91} +2025-02-06 07:27:29 - ERROR - stderr - 97%|█████████▋| 21756/22434 [21:19:49<28:19, 2.51s/it] +2025-02-06 07:27:31 - ERROR - stderr - 97%|█████████▋| 21757/22434 [21:19:51<28:10, 2.50s/it] +2025-02-06 07:27:31 - ERROR - stderr - +2025-02-06 07:27:31 - ERROR - stderr - +2025-02-06 07:27:31 - INFO - stdout - {'loss': 0.3789, 'grad_norm': 1.6200634241104126, 'learning_rate': 4.772913101028898e-08, 'epoch': 2.91} +2025-02-06 07:27:31 - ERROR - stderr - 97%|█████████▋| 21757/22434 [21:19:51<28:10, 2.50s/it] +2025-02-06 07:27:34 - ERROR - stderr - 97%|█████████▋| 21758/22434 [21:19:54<28:20, 2.52s/it] +2025-02-06 07:27:34 - ERROR - stderr - +2025-02-06 07:27:34 - ERROR - stderr - +2025-02-06 07:27:34 - INFO - stdout - {'loss': 0.3196, 'grad_norm': 1.362520456314087, 'learning_rate': 4.7588345141580396e-08, 'epoch': 
2.91} +2025-02-06 07:27:34 - ERROR - stderr - 97%|█████████▋| 21758/22434 [21:19:54<28:20, 2.52s/it] +2025-02-06 07:27:36 - ERROR - stderr - 97%|█████████▋| 21759/22434 [21:19:56<28:19, 2.52s/it] +2025-02-06 07:27:36 - ERROR - stderr - +2025-02-06 07:27:36 - ERROR - stderr - +2025-02-06 07:27:36 - INFO - stdout - {'loss': 0.3748, 'grad_norm': 1.5362629890441895, 'learning_rate': 4.744776672137019e-08, 'epoch': 2.91} +2025-02-06 07:27:36 - ERROR - stderr - 97%|█████████▋| 21759/22434 [21:19:56<28:19, 2.52s/it] +2025-02-06 07:27:39 - ERROR - stderr - 97%|█████████▋| 21760/22434 [21:19:59<28:19, 2.52s/it] +2025-02-06 07:27:39 - ERROR - stderr - +2025-02-06 07:27:39 - ERROR - stderr - +2025-02-06 07:27:39 - INFO - stdout - {'loss': 0.3069, 'grad_norm': 1.36909019947052, 'learning_rate': 4.730739575258714e-08, 'epoch': 2.91} +2025-02-06 07:27:39 - ERROR - stderr - 97%|█████████▋| 21760/22434 [21:19:59<28:19, 2.52s/it] +2025-02-06 07:27:41 - ERROR - stderr - 97%|█████████▋| 21761/22434 [21:20:01<28:03, 2.50s/it] +2025-02-06 07:27:41 - ERROR - stderr - +2025-02-06 07:27:41 - ERROR - stderr - +2025-02-06 07:27:41 - INFO - stdout - {'loss': 0.3955, 'grad_norm': 1.681099772453308, 'learning_rate': 4.716723223815778e-08, 'epoch': 2.91} +2025-02-06 07:27:41 - ERROR - stderr - 97%|█████████▋| 21761/22434 [21:20:01<28:03, 2.50s/it] +2025-02-06 07:27:44 - ERROR - stderr - 97%|█████████▋| 21762/22434 [21:20:04<28:22, 2.53s/it] +2025-02-06 07:27:44 - ERROR - stderr - +2025-02-06 07:27:44 - ERROR - stderr - +2025-02-06 07:27:44 - INFO - stdout - {'loss': 0.3372, 'grad_norm': 1.5328775644302368, 'learning_rate': 4.702727618100422e-08, 'epoch': 2.91} +2025-02-06 07:27:44 - ERROR - stderr - 97%|█████████▋| 21762/22434 [21:20:04<28:22, 2.53s/it] +2025-02-06 07:27:47 - ERROR - stderr - 97%|█████████▋| 21763/22434 [21:20:06<28:27, 2.54s/it] +2025-02-06 07:27:47 - ERROR - stderr - +2025-02-06 07:27:47 - ERROR - stderr - +2025-02-06 07:27:47 - INFO - stdout - {'loss': 0.3618, 'grad_norm': 
1.5747220516204834, 'learning_rate': 4.688752758404302e-08, 'epoch': 2.91} +2025-02-06 07:27:47 - ERROR - stderr - 97%|█████████▋| 21763/22434 [21:20:06<28:27, 2.54s/it] +2025-02-06 07:27:49 - ERROR - stderr - 97%|█████████▋| 21764/22434 [21:20:09<28:36, 2.56s/it] +2025-02-06 07:27:49 - ERROR - stderr - +2025-02-06 07:27:49 - ERROR - stderr - +2025-02-06 07:27:49 - INFO - stdout - {'loss': 0.3589, 'grad_norm': 1.6061378717422485, 'learning_rate': 4.67479864501863e-08, 'epoch': 2.91} +2025-02-06 07:27:49 - ERROR - stderr - 97%|█████████▋| 21764/22434 [21:20:09<28:36, 2.56s/it] +2025-02-06 07:27:52 - ERROR - stderr - 97%|█████████▋| 21765/22434 [21:20:11<28:23, 2.55s/it] +2025-02-06 07:27:52 - ERROR - stderr - +2025-02-06 07:27:52 - ERROR - stderr - +2025-02-06 07:27:52 - INFO - stdout - {'loss': 0.3749, 'grad_norm': 1.631950855255127, 'learning_rate': 4.660865278234394e-08, 'epoch': 2.91} +2025-02-06 07:27:52 - ERROR - stderr - 97%|█████████▋| 21765/22434 [21:20:11<28:23, 2.55s/it] +2025-02-06 07:27:54 - ERROR - stderr - 97%|█████████▋| 21766/22434 [21:20:14<28:12, 2.53s/it] +2025-02-06 07:27:54 - ERROR - stderr - +2025-02-06 07:27:54 - ERROR - stderr - +2025-02-06 07:27:54 - INFO - stdout - {'loss': 0.3389, 'grad_norm': 1.5810906887054443, 'learning_rate': 4.64695265834203e-08, 'epoch': 2.91} +2025-02-06 07:27:54 - ERROR - stderr - 97%|█████████▋| 21766/22434 [21:20:14<28:12, 2.53s/it] +2025-02-06 07:27:57 - ERROR - stderr - 97%|█████████▋| 21767/22434 [21:20:16<28:02, 2.52s/it] +2025-02-06 07:27:57 - ERROR - stderr - +2025-02-06 07:27:57 - ERROR - stderr - +2025-02-06 07:27:57 - INFO - stdout - {'loss': 0.3644, 'grad_norm': 1.4219386577606201, 'learning_rate': 4.633060785631527e-08, 'epoch': 2.91} +2025-02-06 07:27:57 - ERROR - stderr - 97%|█████████▋| 21767/22434 [21:20:16<28:02, 2.52s/it] +2025-02-06 07:27:59 - ERROR - stderr - 97%|█████████▋| 21768/22434 [21:20:19<27:55, 2.52s/it] +2025-02-06 07:27:59 - ERROR - stderr - +2025-02-06 07:27:59 - ERROR - stderr - 
+2025-02-06 07:27:59 - INFO - stdout - {'loss': 0.409, 'grad_norm': 1.6579660177230835, 'learning_rate': 4.61918966039232e-08, 'epoch': 2.91} +2025-02-06 07:27:59 - ERROR - stderr - 97%|█████████▋| 21768/22434 [21:20:19<27:55, 2.52s/it] +2025-02-06 07:28:02 - ERROR - stderr - 97%|█████████▋| 21769/22434 [21:20:21<27:49, 2.51s/it] +2025-02-06 07:28:02 - ERROR - stderr - +2025-02-06 07:28:02 - ERROR - stderr - +2025-02-06 07:28:02 - INFO - stdout - {'loss': 0.3708, 'grad_norm': 1.4656703472137451, 'learning_rate': 4.6053392829136234e-08, 'epoch': 2.91} +2025-02-06 07:28:02 - ERROR - stderr - 97%|█████████▋| 21769/22434 [21:20:21<27:49, 2.51s/it] +2025-02-06 07:28:05 - ERROR - stderr - 97%|█████████▋| 21770/22434 [21:20:24<28:58, 2.62s/it] +2025-02-06 07:28:05 - ERROR - stderr - +2025-02-06 07:28:05 - ERROR - stderr - +2025-02-06 07:28:05 - INFO - stdout - {'loss': 0.3607, 'grad_norm': 1.7243539094924927, 'learning_rate': 4.591509653484205e-08, 'epoch': 2.91} +2025-02-06 07:28:05 - ERROR - stderr - 97%|█████████▋| 21770/22434 [21:20:24<28:58, 2.62s/it] +2025-02-06 07:28:07 - ERROR - stderr - 97%|█████████▋| 21771/22434 [21:20:27<28:12, 2.55s/it] +2025-02-06 07:28:07 - ERROR - stderr - +2025-02-06 07:28:07 - ERROR - stderr - +2025-02-06 07:28:07 - INFO - stdout - {'loss': 0.348, 'grad_norm': 1.5317944288253784, 'learning_rate': 4.5777007723922796e-08, 'epoch': 2.91} +2025-02-06 07:28:07 - ERROR - stderr - 97%|█████████▋| 21771/22434 [21:20:27<28:12, 2.55s/it] +2025-02-06 07:28:09 - ERROR - stderr - 97%|█████████▋| 21772/22434 [21:20:29<27:57, 2.53s/it] +2025-02-06 07:28:09 - ERROR - stderr - +2025-02-06 07:28:09 - ERROR - stderr - +2025-02-06 07:28:09 - INFO - stdout - {'loss': 0.3248, 'grad_norm': 1.5792025327682495, 'learning_rate': 4.563912639925616e-08, 'epoch': 2.91} +2025-02-06 07:28:09 - ERROR - stderr - 97%|█████████▋| 21772/22434 [21:20:29<27:57, 2.53s/it] +2025-02-06 07:28:12 - ERROR - stderr - 97%|█████████▋| 21773/22434 [21:20:32<27:49, 2.53s/it] 
+2025-02-06 07:28:12 - ERROR - stderr - +2025-02-06 07:28:12 - ERROR - stderr - +2025-02-06 07:28:12 - INFO - stdout - {'loss': 0.332, 'grad_norm': 1.4057174921035767, 'learning_rate': 4.550145256371652e-08, 'epoch': 2.91} +2025-02-06 07:28:12 - ERROR - stderr - 97%|█████████▋| 21773/22434 [21:20:32<27:49, 2.53s/it] +2025-02-06 07:28:14 - ERROR - stderr - 97%|█████████▋| 21774/22434 [21:20:34<27:56, 2.54s/it] +2025-02-06 07:28:15 - ERROR - stderr - +2025-02-06 07:28:15 - ERROR - stderr - +2025-02-06 07:28:15 - INFO - stdout - {'loss': 0.302, 'grad_norm': 1.4943373203277588, 'learning_rate': 4.53639862201738e-08, 'epoch': 2.91} +2025-02-06 07:28:15 - ERROR - stderr - 97%|█████████▋| 21774/22434 [21:20:34<27:56, 2.54s/it] +2025-02-06 07:28:17 - ERROR - stderr - 97%|█████████▋| 21775/22434 [21:20:37<28:08, 2.56s/it] +2025-02-06 07:28:17 - ERROR - stderr - +2025-02-06 07:28:17 - ERROR - stderr - +2025-02-06 07:28:17 - INFO - stdout - {'loss': 0.3906, 'grad_norm': 1.7238916158676147, 'learning_rate': 4.522672737149347e-08, 'epoch': 2.91} +2025-02-06 07:28:17 - ERROR - stderr - 97%|█████████▋| 21775/22434 [21:20:37<28:08, 2.56s/it] +2025-02-06 07:28:20 - ERROR - stderr - 97%|█████████▋| 21776/22434 [21:20:39<27:58, 2.55s/it] +2025-02-06 07:28:20 - ERROR - stderr - +2025-02-06 07:28:20 - ERROR - stderr - +2025-02-06 07:28:20 - INFO - stdout - {'loss': 0.3333, 'grad_norm': 1.4010136127471924, 'learning_rate': 4.508967602053549e-08, 'epoch': 2.91} +2025-02-06 07:28:20 - ERROR - stderr - 97%|█████████▋| 21776/22434 [21:20:39<27:58, 2.55s/it] +2025-02-06 07:28:22 - ERROR - stderr - 97%|█████████▋| 21777/22434 [21:20:42<27:38, 2.52s/it] +2025-02-06 07:28:22 - ERROR - stderr - +2025-02-06 07:28:22 - ERROR - stderr - +2025-02-06 07:28:22 - INFO - stdout - {'loss': 0.3444, 'grad_norm': 1.6481897830963135, 'learning_rate': 4.495283217015867e-08, 'epoch': 2.91} +2025-02-06 07:28:22 - ERROR - stderr - 97%|█████████▋| 21777/22434 [21:20:42<27:38, 2.52s/it] +2025-02-06 07:28:25 - 
ERROR - stderr - 97%|█████████▋| 21778/22434 [21:20:44<27:20, 2.50s/it] +2025-02-06 07:28:25 - ERROR - stderr - +2025-02-06 07:28:25 - ERROR - stderr - +2025-02-06 07:28:25 - INFO - stdout - {'loss': 0.366, 'grad_norm': 1.5905396938323975, 'learning_rate': 4.4816195823212946e-08, 'epoch': 2.91} +2025-02-06 07:28:25 - ERROR - stderr - 97%|█████████▋| 21778/22434 [21:20:44<27:20, 2.50s/it] +2025-02-06 07:28:27 - ERROR - stderr - 97%|█████████▋| 21779/22434 [21:20:47<27:31, 2.52s/it] +2025-02-06 07:28:27 - ERROR - stderr - +2025-02-06 07:28:27 - ERROR - stderr - +2025-02-06 07:28:27 - INFO - stdout - {'loss': 0.329, 'grad_norm': 1.5707181692123413, 'learning_rate': 4.467976698254828e-08, 'epoch': 2.91} +2025-02-06 07:28:27 - ERROR - stderr - 97%|█████████▋| 21779/22434 [21:20:47<27:31, 2.52s/it] +2025-02-06 07:28:30 - ERROR - stderr - 97%|█████████▋| 21780/22434 [21:20:49<27:35, 2.53s/it] +2025-02-06 07:28:30 - ERROR - stderr - +2025-02-06 07:28:30 - ERROR - stderr - +2025-02-06 07:28:30 - INFO - stdout - {'loss': 0.3267, 'grad_norm': 1.4278359413146973, 'learning_rate': 4.454354565100793e-08, 'epoch': 2.91} +2025-02-06 07:28:30 - ERROR - stderr - 97%|█████████▋| 21780/22434 [21:20:49<27:35, 2.53s/it] +2025-02-06 07:28:32 - ERROR - stderr - 97%|█████████▋| 21781/22434 [21:20:52<27:23, 2.52s/it] +2025-02-06 07:28:32 - ERROR - stderr - +2025-02-06 07:28:32 - ERROR - stderr - +2025-02-06 07:28:32 - INFO - stdout - {'loss': 0.3952, 'grad_norm': 1.6504592895507812, 'learning_rate': 4.440753183143076e-08, 'epoch': 2.91} +2025-02-06 07:28:32 - ERROR - stderr - 97%|█████████▋| 21781/22434 [21:20:52<27:23, 2.52s/it] +2025-02-06 07:28:35 - ERROR - stderr - 97%|█████████▋| 21782/22434 [21:20:54<27:21, 2.52s/it] +2025-02-06 07:28:35 - ERROR - stderr - +2025-02-06 07:28:35 - ERROR - stderr - +2025-02-06 07:28:35 - INFO - stdout - {'loss': 0.3775, 'grad_norm': 1.5234427452087402, 'learning_rate': 4.4271725526651155e-08, 'epoch': 2.91} +2025-02-06 07:28:35 - ERROR - stderr - 
97%|█████████▋| 21782/22434 [21:20:54<27:21, 2.52s/it] +2025-02-06 07:28:37 - ERROR - stderr - 97%|█████████▋| 21783/22434 [21:20:57<27:13, 2.51s/it] +2025-02-06 07:28:37 - ERROR - stderr - +2025-02-06 07:28:37 - ERROR - stderr - +2025-02-06 07:28:37 - INFO - stdout - {'loss': 0.3924, 'grad_norm': 1.7004739046096802, 'learning_rate': 4.4136126739502405e-08, 'epoch': 2.91} +2025-02-06 07:28:37 - ERROR - stderr - 97%|█████████▋| 21783/22434 [21:20:57<27:13, 2.51s/it] +2025-02-06 07:28:40 - ERROR - stderr - 97%|█████████▋| 21784/22434 [21:20:59<27:06, 2.50s/it] +2025-02-06 07:28:40 - ERROR - stderr - +2025-02-06 07:28:40 - ERROR - stderr - +2025-02-06 07:28:40 - INFO - stdout - {'loss': 0.3072, 'grad_norm': 1.44155752658844, 'learning_rate': 4.400073547280781e-08, 'epoch': 2.91} +2025-02-06 07:28:40 - ERROR - stderr - 97%|█████████▋| 21784/22434 [21:20:59<27:06, 2.50s/it] +2025-02-06 07:28:42 - ERROR - stderr - 97%|█████████▋| 21785/22434 [21:21:02<26:49, 2.48s/it] +2025-02-06 07:28:42 - ERROR - stderr - +2025-02-06 07:28:42 - ERROR - stderr - +2025-02-06 07:28:42 - INFO - stdout - {'loss': 0.3526, 'grad_norm': 1.5545722246170044, 'learning_rate': 4.3865551729391773e-08, 'epoch': 2.91} +2025-02-06 07:28:42 - ERROR - stderr - 97%|█████████▋| 21785/22434 [21:21:02<26:49, 2.48s/it] +2025-02-06 07:28:45 - ERROR - stderr - 97%|█████████▋| 21786/22434 [21:21:04<26:41, 2.47s/it] +2025-02-06 07:28:45 - ERROR - stderr - +2025-02-06 07:28:45 - ERROR - stderr - +2025-02-06 07:28:45 - INFO - stdout - {'loss': 0.3294, 'grad_norm': 1.5478792190551758, 'learning_rate': 4.373057551207205e-08, 'epoch': 2.91} +2025-02-06 07:28:45 - ERROR - stderr - 97%|█████████▋| 21786/22434 [21:21:04<26:41, 2.47s/it] +2025-02-06 07:28:47 - ERROR - stderr - 97%|█████████▋| 21787/22434 [21:21:07<27:23, 2.54s/it] +2025-02-06 07:28:47 - ERROR - stderr - +2025-02-06 07:28:47 - ERROR - stderr - +2025-02-06 07:28:47 - INFO - stdout - {'loss': 0.3243, 'grad_norm': 1.4151712656021118, 'learning_rate': 
4.3595806823660826e-08, 'epoch': 2.91} +2025-02-06 07:28:47 - ERROR - stderr - 97%|█████████▋| 21787/22434 [21:21:07<27:23, 2.54s/it] +2025-02-06 07:28:50 - ERROR - stderr - 97%|█████████▋| 21788/22434 [21:21:09<27:14, 2.53s/it] +2025-02-06 07:28:50 - ERROR - stderr - +2025-02-06 07:28:50 - ERROR - stderr - +2025-02-06 07:28:50 - INFO - stdout - {'loss': 0.3511, 'grad_norm': 1.553268313407898, 'learning_rate': 4.346124566696697e-08, 'epoch': 2.91} +2025-02-06 07:28:50 - ERROR - stderr - 97%|█████████▋| 21788/22434 [21:21:10<27:14, 2.53s/it] +2025-02-06 07:28:52 - ERROR - stderr - 97%|█████████▋| 21789/22434 [21:21:12<27:25, 2.55s/it] +2025-02-06 07:28:52 - ERROR - stderr - +2025-02-06 07:28:52 - ERROR - stderr - +2025-02-06 07:28:52 - INFO - stdout - {'loss': 0.3432, 'grad_norm': 1.396307110786438, 'learning_rate': 4.332689204479712e-08, 'epoch': 2.91} +2025-02-06 07:28:52 - ERROR - stderr - 97%|█████████▋| 21789/22434 [21:21:12<27:25, 2.55s/it] +2025-02-06 07:28:55 - ERROR - stderr - 97%|█████████▋| 21790/22434 [21:21:15<27:08, 2.53s/it] +2025-02-06 07:28:55 - ERROR - stderr - +2025-02-06 07:28:55 - ERROR - stderr - +2025-02-06 07:28:55 - INFO - stdout - {'loss': 0.3811, 'grad_norm': 1.6681559085845947, 'learning_rate': 4.319274595995016e-08, 'epoch': 2.91} +2025-02-06 07:28:55 - ERROR - stderr - 97%|█████████▋| 21790/22434 [21:21:15<27:08, 2.53s/it] +2025-02-06 07:28:57 - ERROR - stderr - 97%|█████████▋| 21791/22434 [21:21:17<26:57, 2.51s/it] +2025-02-06 07:28:57 - ERROR - stderr - +2025-02-06 07:28:57 - ERROR - stderr - +2025-02-06 07:28:57 - INFO - stdout - {'loss': 0.403, 'grad_norm': 1.734554409980774, 'learning_rate': 4.305880741522273e-08, 'epoch': 2.91} +2025-02-06 07:28:57 - ERROR - stderr - 97%|█████████▋| 21791/22434 [21:21:17<26:57, 2.51s/it] +2025-02-06 07:29:00 - ERROR - stderr - 97%|█████████▋| 21792/22434 [21:21:20<26:45, 2.50s/it] +2025-02-06 07:29:00 - ERROR - stderr - +2025-02-06 07:29:00 - ERROR - stderr - +2025-02-06 07:29:00 - INFO - stdout - 
{'loss': 0.384, 'grad_norm': 1.7042791843414307, 'learning_rate': 4.292507641340704e-08, 'epoch': 2.91} +2025-02-06 07:29:00 - ERROR - stderr - 97%|█████████▋| 21792/22434 [21:21:20<26:45, 2.50s/it] +2025-02-06 07:29:02 - ERROR - stderr - 97%|█████████▋| 21793/22434 [21:21:22<27:01, 2.53s/it] +2025-02-06 07:29:02 - ERROR - stderr - +2025-02-06 07:29:02 - ERROR - stderr - +2025-02-06 07:29:02 - INFO - stdout - {'loss': 0.2971, 'grad_norm': 1.4473533630371094, 'learning_rate': 4.279155295728976e-08, 'epoch': 2.91} +2025-02-06 07:29:02 - ERROR - stderr - 97%|█████████▋| 21793/22434 [21:21:22<27:01, 2.53s/it] +2025-02-06 07:29:05 - ERROR - stderr - 97%|█████████▋| 21794/22434 [21:21:25<26:40, 2.50s/it] +2025-02-06 07:29:05 - ERROR - stderr - +2025-02-06 07:29:05 - ERROR - stderr - +2025-02-06 07:29:05 - INFO - stdout - {'loss': 0.3676, 'grad_norm': 1.6682684421539307, 'learning_rate': 4.2658237049655325e-08, 'epoch': 2.91} +2025-02-06 07:29:05 - ERROR - stderr - 97%|█████████▋| 21794/22434 [21:21:25<26:40, 2.50s/it] +2025-02-06 07:29:08 - ERROR - stderr - 97%|█████████▋| 21795/22434 [21:21:27<27:22, 2.57s/it] +2025-02-06 07:29:08 - ERROR - stderr - +2025-02-06 07:29:08 - ERROR - stderr - +2025-02-06 07:29:08 - INFO - stdout - {'loss': 0.3555, 'grad_norm': 1.5631887912750244, 'learning_rate': 4.252512869328151e-08, 'epoch': 2.91} +2025-02-06 07:29:08 - ERROR - stderr - 97%|█████████▋| 21795/22434 [21:21:27<27:22, 2.57s/it] +2025-02-06 07:29:10 - ERROR - stderr - 97%|█████████▋| 21796/22434 [21:21:30<27:10, 2.55s/it] +2025-02-06 07:29:10 - ERROR - stderr - +2025-02-06 07:29:10 - ERROR - stderr - +2025-02-06 07:29:10 - INFO - stdout - {'loss': 0.3597, 'grad_norm': 1.663098931312561, 'learning_rate': 4.2392227890942774e-08, 'epoch': 2.91} +2025-02-06 07:29:10 - ERROR - stderr - 97%|█████████▋| 21796/22434 [21:21:30<27:10, 2.55s/it] +2025-02-06 07:29:12 - ERROR - stderr - 97%|█████████▋| 21797/22434 [21:21:32<26:44, 2.52s/it] +2025-02-06 07:29:13 - ERROR - stderr - 
+2025-02-06 07:29:13 - ERROR - stderr - +2025-02-06 07:29:13 - INFO - stdout - {'loss': 0.3852, 'grad_norm': 1.610314965248108, 'learning_rate': 4.225953464540911e-08, 'epoch': 2.91} +2025-02-06 07:29:13 - ERROR - stderr - 97%|█████████▋| 21797/22434 [21:21:32<26:44, 2.52s/it] +2025-02-06 07:29:15 - ERROR - stderr - 97%|█████████▋| 21798/22434 [21:21:35<26:33, 2.51s/it] +2025-02-06 07:29:15 - ERROR - stderr - +2025-02-06 07:29:15 - ERROR - stderr - +2025-02-06 07:29:15 - INFO - stdout - {'loss': 0.3612, 'grad_norm': 1.693556547164917, 'learning_rate': 4.212704895944719e-08, 'epoch': 2.91} +2025-02-06 07:29:15 - ERROR - stderr - 97%|█████████▋| 21798/22434 [21:21:35<26:33, 2.51s/it] +2025-02-06 07:29:17 - ERROR - stderr - 97%|█████████▋| 21799/22434 [21:21:37<26:09, 2.47s/it] +2025-02-06 07:29:17 - ERROR - stderr - +2025-02-06 07:29:17 - ERROR - stderr - +2025-02-06 07:29:17 - INFO - stdout - {'loss': 0.341, 'grad_norm': 1.7170764207839966, 'learning_rate': 4.199477083581926e-08, 'epoch': 2.92} +2025-02-06 07:29:17 - ERROR - stderr - 97%|█████████▋| 21799/22434 [21:21:37<26:09, 2.47s/it] +2025-02-06 07:29:20 - ERROR - stderr - 97%|█████████▋| 21800/22434 [21:21:40<26:21, 2.49s/it] +2025-02-06 07:29:20 - ERROR - stderr - +2025-02-06 07:29:20 - ERROR - stderr - +2025-02-06 07:29:20 - INFO - stdout - {'loss': 0.3857, 'grad_norm': 1.6769074201583862, 'learning_rate': 4.18627002772809e-08, 'epoch': 2.92} +2025-02-06 07:29:20 - ERROR - stderr - 97%|█████████▋| 21800/22434 [21:21:40<26:21, 2.49s/it] +2025-02-06 07:29:22 - ERROR - stderr - 97%|█████████▋| 21801/22434 [21:21:42<26:15, 2.49s/it] +2025-02-06 07:29:22 - ERROR - stderr - +2025-02-06 07:29:22 - ERROR - stderr - +2025-02-06 07:29:22 - INFO - stdout - {'loss': 0.362, 'grad_norm': 1.5538465976715088, 'learning_rate': 4.173083728658656e-08, 'epoch': 2.92} +2025-02-06 07:29:22 - ERROR - stderr - 97%|█████████▋| 21801/22434 [21:21:42<26:15, 2.49s/it] +2025-02-06 07:29:25 - ERROR - stderr - 97%|█████████▋| 21802/22434 
[21:21:45<26:01, 2.47s/it] +2025-02-06 07:29:25 - ERROR - stderr - +2025-02-06 07:29:25 - ERROR - stderr - +2025-02-06 07:29:25 - INFO - stdout - {'loss': 0.385, 'grad_norm': 1.665004849433899, 'learning_rate': 4.159918186648293e-08, 'epoch': 2.92} +2025-02-06 07:29:25 - ERROR - stderr - 97%|█████████▋| 21802/22434 [21:21:45<26:01, 2.47s/it] +2025-02-06 07:29:27 - ERROR - stderr - 97%|█████████▋| 21803/22434 [21:21:47<26:16, 2.50s/it] +2025-02-06 07:29:27 - ERROR - stderr - +2025-02-06 07:29:27 - ERROR - stderr - +2025-02-06 07:29:27 - INFO - stdout - {'loss': 0.3862, 'grad_norm': 1.7422281503677368, 'learning_rate': 4.146773401971449e-08, 'epoch': 2.92} +2025-02-06 07:29:27 - ERROR - stderr - 97%|█████████▋| 21803/22434 [21:21:47<26:16, 2.50s/it] +2025-02-06 07:29:30 - ERROR - stderr - 97%|█████████▋| 21804/22434 [21:21:50<26:33, 2.53s/it] +2025-02-06 07:29:30 - ERROR - stderr - +2025-02-06 07:29:30 - ERROR - stderr - +2025-02-06 07:29:30 - INFO - stdout - {'loss': 0.359, 'grad_norm': 1.59601891040802, 'learning_rate': 4.133649374902349e-08, 'epoch': 2.92} +2025-02-06 07:29:30 - ERROR - stderr - 97%|█████████▋| 21804/22434 [21:21:50<26:33, 2.53s/it] +2025-02-06 07:29:32 - ERROR - stderr - 97%|█████████▋| 21805/22434 [21:21:52<26:13, 2.50s/it] +2025-02-06 07:29:32 - ERROR - stderr - +2025-02-06 07:29:32 - ERROR - stderr - +2025-02-06 07:29:32 - INFO - stdout - {'loss': 0.3793, 'grad_norm': 1.6603186130523682, 'learning_rate': 4.120546105714329e-08, 'epoch': 2.92} +2025-02-06 07:29:32 - ERROR - stderr - 97%|█████████▋| 21805/22434 [21:21:52<26:13, 2.50s/it] +2025-02-06 07:29:35 - ERROR - stderr - 97%|█████████▋| 21806/22434 [21:21:55<26:16, 2.51s/it] +2025-02-06 07:29:35 - ERROR - stderr - +2025-02-06 07:29:35 - ERROR - stderr - +2025-02-06 07:29:35 - INFO - stdout - {'loss': 0.3538, 'grad_norm': 1.5019904375076294, 'learning_rate': 4.107463594680505e-08, 'epoch': 2.92} +2025-02-06 07:29:35 - ERROR - stderr - 97%|█████████▋| 21806/22434 [21:21:55<26:16, 2.51s/it] 
+2025-02-06 07:29:37 - ERROR - stderr - 97%|█████████▋| 21807/22434 [21:21:57<26:22, 2.52s/it] +2025-02-06 07:29:38 - ERROR - stderr - +2025-02-06 07:29:38 - ERROR - stderr - +2025-02-06 07:29:38 - INFO - stdout - {'loss': 0.3515, 'grad_norm': 1.3670494556427002, 'learning_rate': 4.094401842073659e-08, 'epoch': 2.92} +2025-02-06 07:29:38 - ERROR - stderr - 97%|█████████▋| 21807/22434 [21:21:57<26:22, 2.52s/it] +2025-02-06 07:29:40 - ERROR - stderr - 97%|█████████▋| 21808/22434 [21:22:00<26:12, 2.51s/it] +2025-02-06 07:29:40 - ERROR - stderr - +2025-02-06 07:29:40 - ERROR - stderr - +2025-02-06 07:29:40 - INFO - stdout - {'loss': 0.3786, 'grad_norm': 1.6355624198913574, 'learning_rate': 4.081360848166016e-08, 'epoch': 2.92} +2025-02-06 07:29:40 - ERROR - stderr - 97%|█████████▋| 21808/22434 [21:22:00<26:12, 2.51s/it] +2025-02-06 07:29:42 - ERROR - stderr - 97%|█████████▋| 21809/22434 [21:22:02<26:13, 2.52s/it] +2025-02-06 07:29:43 - ERROR - stderr - +2025-02-06 07:29:43 - ERROR - stderr - +2025-02-06 07:29:43 - INFO - stdout - {'loss': 0.424, 'grad_norm': 1.8192009925842285, 'learning_rate': 4.068340613229471e-08, 'epoch': 2.92} +2025-02-06 07:29:43 - ERROR - stderr - 97%|█████████▋| 21809/22434 [21:22:02<26:13, 2.52s/it] +2025-02-06 07:29:45 - ERROR - stderr - 97%|█████████▋| 21810/22434 [21:22:05<26:00, 2.50s/it] +2025-02-06 07:29:45 - ERROR - stderr - +2025-02-06 07:29:45 - ERROR - stderr - +2025-02-06 07:29:45 - INFO - stdout - {'loss': 0.3756, 'grad_norm': 1.741868495941162, 'learning_rate': 4.0553411375353626e-08, 'epoch': 2.92} +2025-02-06 07:29:45 - ERROR - stderr - 97%|█████████▋| 21810/22434 [21:22:05<26:00, 2.50s/it] +2025-02-06 07:29:47 - ERROR - stderr - 97%|█████████▋| 21811/22434 [21:22:07<25:52, 2.49s/it] +2025-02-06 07:29:47 - ERROR - stderr - +2025-02-06 07:29:47 - ERROR - stderr - +2025-02-06 07:29:47 - INFO - stdout - {'loss': 0.4193, 'grad_norm': 1.7142159938812256, 'learning_rate': 4.042362421354695e-08, 'epoch': 2.92} +2025-02-06 07:29:47 - 
ERROR - stderr - 97%|█████████▋| 21811/22434 [21:22:07<25:52, 2.49s/it] +2025-02-06 07:29:50 - ERROR - stderr - 97%|█████████▋| 21812/22434 [21:22:10<26:06, 2.52s/it] +2025-02-06 07:29:50 - ERROR - stderr - +2025-02-06 07:29:50 - ERROR - stderr - +2025-02-06 07:29:50 - INFO - stdout - {'loss': 0.3339, 'grad_norm': 1.5600560903549194, 'learning_rate': 4.029404464957809e-08, 'epoch': 2.92} +2025-02-06 07:29:50 - ERROR - stderr - 97%|█████████▋| 21812/22434 [21:22:10<26:06, 2.52s/it] +2025-02-06 07:29:53 - ERROR - stderr - 97%|█████████▋| 21813/22434 [21:22:12<26:33, 2.57s/it] +2025-02-06 07:29:53 - ERROR - stderr - +2025-02-06 07:29:53 - ERROR - stderr - +2025-02-06 07:29:53 - INFO - stdout - {'loss': 0.3322, 'grad_norm': 1.6838316917419434, 'learning_rate': 4.016467268615154e-08, 'epoch': 2.92} +2025-02-06 07:29:53 - ERROR - stderr - 97%|█████████▋| 21813/22434 [21:22:12<26:33, 2.57s/it] +2025-02-06 07:29:55 - ERROR - stderr - 97%|█████████▋| 21814/22434 [21:22:15<26:13, 2.54s/it] +2025-02-06 07:29:55 - ERROR - stderr - +2025-02-06 07:29:55 - ERROR - stderr - +2025-02-06 07:29:55 - INFO - stdout - {'loss': 0.3483, 'grad_norm': 1.5570639371871948, 'learning_rate': 4.003550832595959e-08, 'epoch': 2.92} +2025-02-06 07:29:55 - ERROR - stderr - 97%|█████████▋| 21814/22434 [21:22:15<26:13, 2.54s/it] +2025-02-06 07:29:58 - ERROR - stderr - 97%|█████████▋| 21815/22434 [21:22:17<26:03, 2.53s/it] +2025-02-06 07:29:58 - ERROR - stderr - +2025-02-06 07:29:58 - ERROR - stderr - +2025-02-06 07:29:58 - INFO - stdout - {'loss': 0.3228, 'grad_norm': 1.4684759378433228, 'learning_rate': 3.9906551571697874e-08, 'epoch': 2.92} +2025-02-06 07:29:58 - ERROR - stderr - 97%|█████████▋| 21815/22434 [21:22:17<26:03, 2.53s/it] +2025-02-06 07:30:00 - ERROR - stderr - 97%|█████████▋| 21816/22434 [21:22:20<26:04, 2.53s/it] +2025-02-06 07:30:00 - ERROR - stderr - +2025-02-06 07:30:00 - ERROR - stderr - +2025-02-06 07:30:00 - INFO - stdout - {'loss': 0.3606, 'grad_norm': 1.5414067506790161, 
'learning_rate': 3.977780242605422e-08, 'epoch': 2.92} +2025-02-06 07:30:00 - ERROR - stderr - 97%|█████████▋| 21816/22434 [21:22:20<26:04, 2.53s/it] +2025-02-06 07:30:03 - ERROR - stderr - 97%|█████████▋| 21817/22434 [21:22:22<25:58, 2.53s/it] +2025-02-06 07:30:03 - ERROR - stderr - +2025-02-06 07:30:03 - ERROR - stderr - +2025-02-06 07:30:03 - INFO - stdout - {'loss': 0.3692, 'grad_norm': 1.5478793382644653, 'learning_rate': 3.964926089170984e-08, 'epoch': 2.92} +2025-02-06 07:30:03 - ERROR - stderr - 97%|█████████▋| 21817/22434 [21:22:23<25:58, 2.53s/it] +2025-02-06 07:30:05 - ERROR - stderr - 97%|█████████▋| 21818/22434 [21:22:25<25:48, 2.51s/it] +2025-02-06 07:30:05 - ERROR - stderr - +2025-02-06 07:30:05 - ERROR - stderr - +2025-02-06 07:30:05 - INFO - stdout - {'loss': 0.3436, 'grad_norm': 1.5231963396072388, 'learning_rate': 3.952092697134591e-08, 'epoch': 2.92} +2025-02-06 07:30:05 - ERROR - stderr - 97%|█████████▋| 21818/22434 [21:22:25<25:48, 2.51s/it] +2025-02-06 07:30:08 - ERROR - stderr - 97%|█████████▋| 21819/22434 [21:22:28<25:54, 2.53s/it] +2025-02-06 07:30:08 - ERROR - stderr - +2025-02-06 07:30:08 - ERROR - stderr - +2025-02-06 07:30:08 - INFO - stdout - {'loss': 0.3482, 'grad_norm': 1.3407139778137207, 'learning_rate': 3.939280066763806e-08, 'epoch': 2.92} +2025-02-06 07:30:08 - ERROR - stderr - 97%|█████████▋| 21819/22434 [21:22:28<25:54, 2.53s/it] +2025-02-06 07:30:10 - ERROR - stderr - 97%|█████████▋| 21820/22434 [21:22:30<25:52, 2.53s/it] +2025-02-06 07:30:10 - ERROR - stderr - +2025-02-06 07:30:10 - ERROR - stderr - +2025-02-06 07:30:10 - INFO - stdout - {'loss': 0.374, 'grad_norm': 1.5974416732788086, 'learning_rate': 3.926488198325529e-08, 'epoch': 2.92} +2025-02-06 07:30:10 - ERROR - stderr - 97%|█████████▋| 21820/22434 [21:22:30<25:52, 2.53s/it] +2025-02-06 07:30:13 - ERROR - stderr - 97%|█████████▋| 21821/22434 [21:22:33<25:46, 2.52s/it] +2025-02-06 07:30:13 - ERROR - stderr - +2025-02-06 07:30:13 - ERROR - stderr - +2025-02-06 
07:30:13 - INFO - stdout - {'loss': 0.382, 'grad_norm': 1.594809651374817, 'learning_rate': 3.913717092086433e-08, 'epoch': 2.92} +2025-02-06 07:30:13 - ERROR - stderr - 97%|█████████▋| 21821/22434 [21:22:33<25:46, 2.52s/it] +2025-02-06 07:30:15 - ERROR - stderr - 97%|█████████▋| 21822/22434 [21:22:35<25:39, 2.51s/it] +2025-02-06 07:30:15 - ERROR - stderr - +2025-02-06 07:30:15 - ERROR - stderr - +2025-02-06 07:30:15 - INFO - stdout - {'loss': 0.3726, 'grad_norm': 1.5401958227157593, 'learning_rate': 3.900966748312862e-08, 'epoch': 2.92} +2025-02-06 07:30:15 - ERROR - stderr - 97%|█████████▋| 21822/22434 [21:22:35<25:39, 2.51s/it] +2025-02-06 07:30:18 - ERROR - stderr - 97%|█████████▋| 21823/22434 [21:22:38<25:35, 2.51s/it] +2025-02-06 07:30:18 - ERROR - stderr - +2025-02-06 07:30:18 - ERROR - stderr - +2025-02-06 07:30:18 - INFO - stdout - {'loss': 0.3748, 'grad_norm': 1.7426056861877441, 'learning_rate': 3.888237167270381e-08, 'epoch': 2.92} +2025-02-06 07:30:18 - ERROR - stderr - 97%|█████████▋| 21823/22434 [21:22:38<25:35, 2.51s/it] +2025-02-06 07:30:20 - ERROR - stderr - 97%|█████████▋| 21824/22434 [21:22:40<25:37, 2.52s/it] +2025-02-06 07:30:20 - ERROR - stderr - +2025-02-06 07:30:20 - ERROR - stderr - +2025-02-06 07:30:20 - INFO - stdout - {'loss': 0.3095, 'grad_norm': 1.3408571481704712, 'learning_rate': 3.875528349224444e-08, 'epoch': 2.92} +2025-02-06 07:30:20 - ERROR - stderr - 97%|█████████▋| 21824/22434 [21:22:40<25:37, 2.52s/it] +2025-02-06 07:30:23 - ERROR - stderr - 97%|█████████▋| 21825/22434 [21:22:43<25:22, 2.50s/it] +2025-02-06 07:30:23 - ERROR - stderr - +2025-02-06 07:30:23 - ERROR - stderr - +2025-02-06 07:30:23 - INFO - stdout - {'loss': 0.345, 'grad_norm': 1.6286340951919556, 'learning_rate': 3.862840294439951e-08, 'epoch': 2.92} +2025-02-06 07:30:23 - ERROR - stderr - 97%|█████████▋| 21825/22434 [21:22:43<25:22, 2.50s/it] +2025-02-06 07:30:25 - ERROR - stderr - 97%|█████████▋| 21826/22434 [21:22:45<25:39, 2.53s/it] +2025-02-06 07:30:25 - 
ERROR - stderr - +2025-02-06 07:30:25 - ERROR - stderr - +2025-02-06 07:30:25 - INFO - stdout - {'loss': 0.3346, 'grad_norm': 1.448320984840393, 'learning_rate': 3.850173003181357e-08, 'epoch': 2.92} +2025-02-06 07:30:25 - ERROR - stderr - 97%|█████████▋| 21826/22434 [21:22:45<25:39, 2.53s/it] +2025-02-06 07:30:28 - ERROR - stderr - 97%|█████████▋| 21827/22434 [21:22:48<25:25, 2.51s/it] +2025-02-06 07:30:28 - ERROR - stderr - +2025-02-06 07:30:28 - ERROR - stderr - +2025-02-06 07:30:28 - INFO - stdout - {'loss': 0.3378, 'grad_norm': 1.5646271705627441, 'learning_rate': 3.8375264757126716e-08, 'epoch': 2.92} +2025-02-06 07:30:28 - ERROR - stderr - 97%|█████████▋| 21827/22434 [21:22:48<25:25, 2.51s/it] +2025-02-06 07:30:30 - ERROR - stderr - 97%|█████████▋| 21828/22434 [21:22:50<25:28, 2.52s/it] +2025-02-06 07:30:30 - ERROR - stderr - +2025-02-06 07:30:30 - ERROR - stderr - +2025-02-06 07:30:30 - INFO - stdout - {'loss': 0.3745, 'grad_norm': 1.4516915082931519, 'learning_rate': 3.824900712297464e-08, 'epoch': 2.92} +2025-02-06 07:30:30 - ERROR - stderr - 97%|█████████▋| 21828/22434 [21:22:50<25:28, 2.52s/it] +2025-02-06 07:30:33 - ERROR - stderr - 97%|█████████▋| 21829/22434 [21:22:53<26:27, 2.62s/it] +2025-02-06 07:30:33 - ERROR - stderr - +2025-02-06 07:30:33 - ERROR - stderr - +2025-02-06 07:30:33 - INFO - stdout - {'loss': 0.2913, 'grad_norm': 1.3167798519134521, 'learning_rate': 3.812295713199077e-08, 'epoch': 2.92} +2025-02-06 07:30:33 - ERROR - stderr - 97%|█████████▋| 21829/22434 [21:22:53<26:27, 2.62s/it] +2025-02-06 07:30:36 - ERROR - stderr - 97%|█████████▋| 21830/22434 [21:22:56<26:13, 2.60s/it] +2025-02-06 07:30:36 - ERROR - stderr - +2025-02-06 07:30:36 - ERROR - stderr - +2025-02-06 07:30:36 - INFO - stdout - {'loss': 0.3845, 'grad_norm': 1.6384868621826172, 'learning_rate': 3.7997114786800794e-08, 'epoch': 2.92} +2025-02-06 07:30:36 - ERROR - stderr - 97%|█████████▋| 21830/22434 [21:22:56<26:13, 2.60s/it] +2025-02-06 07:30:38 - ERROR - stderr - 
97%|█████████▋| 21831/22434 [21:22:58<25:40, 2.55s/it] +2025-02-06 07:30:38 - ERROR - stderr - +2025-02-06 07:30:38 - ERROR - stderr - +2025-02-06 07:30:38 - INFO - stdout - {'loss': 0.3926, 'grad_norm': 2.3137524127960205, 'learning_rate': 3.787148009002817e-08, 'epoch': 2.92} +2025-02-06 07:30:38 - ERROR - stderr - 97%|█████████▋| 21831/22434 [21:22:58<25:40, 2.55s/it] +2025-02-06 07:30:41 - ERROR - stderr - 97%|█████████▋| 21832/22434 [21:23:01<25:47, 2.57s/it] +2025-02-06 07:30:41 - ERROR - stderr - +2025-02-06 07:30:41 - ERROR - stderr - +2025-02-06 07:30:41 - INFO - stdout - {'loss': 0.3783, 'grad_norm': 1.5455163717269897, 'learning_rate': 3.774605304429191e-08, 'epoch': 2.92} +2025-02-06 07:30:41 - ERROR - stderr - 97%|█████████▋| 21832/22434 [21:23:01<25:47, 2.57s/it] +2025-02-06 07:30:43 - ERROR - stderr - 97%|█████████▋| 21833/22434 [21:23:03<25:41, 2.56s/it] +2025-02-06 07:30:43 - ERROR - stderr - +2025-02-06 07:30:43 - ERROR - stderr - +2025-02-06 07:30:43 - INFO - stdout - {'loss': 0.3474, 'grad_norm': 1.4665167331695557, 'learning_rate': 3.762083365220659e-08, 'epoch': 2.92} +2025-02-06 07:30:43 - ERROR - stderr - 97%|█████████▋| 21833/22434 [21:23:03<25:41, 2.56s/it] +2025-02-06 07:30:46 - ERROR - stderr - 97%|█████████▋| 21834/22434 [21:23:06<25:22, 2.54s/it] +2025-02-06 07:30:46 - ERROR - stderr - +2025-02-06 07:30:46 - ERROR - stderr - +2025-02-06 07:30:46 - INFO - stdout - {'loss': 0.3758, 'grad_norm': 1.5014750957489014, 'learning_rate': 3.7495821916382347e-08, 'epoch': 2.92} +2025-02-06 07:30:46 - ERROR - stderr - 97%|█████████▋| 21834/22434 [21:23:06<25:22, 2.54s/it] +2025-02-06 07:30:48 - ERROR - stderr - 97%|█████████▋| 21835/22434 [21:23:08<25:14, 2.53s/it] +2025-02-06 07:30:48 - ERROR - stderr - +2025-02-06 07:30:48 - ERROR - stderr - +2025-02-06 07:30:48 - INFO - stdout - {'loss': 0.3422, 'grad_norm': 1.5485979318618774, 'learning_rate': 3.7371017839423765e-08, 'epoch': 2.92} +2025-02-06 07:30:48 - ERROR - stderr - 97%|█████████▋| 
21835/22434 [21:23:08<25:14, 2.53s/it] +2025-02-06 07:30:51 - ERROR - stderr - 97%|█████████▋| 21836/22434 [21:23:11<24:52, 2.50s/it] +2025-02-06 07:30:51 - ERROR - stderr - +2025-02-06 07:30:51 - ERROR - stderr - +2025-02-06 07:30:51 - INFO - stdout - {'loss': 0.364, 'grad_norm': 1.6716946363449097, 'learning_rate': 3.72464214239332e-08, 'epoch': 2.92} +2025-02-06 07:30:51 - ERROR - stderr - 97%|█████████▋| 21836/22434 [21:23:11<24:52, 2.50s/it] +2025-02-06 07:30:53 - ERROR - stderr - 97%|█████████▋| 21837/22434 [21:23:13<25:00, 2.51s/it] +2025-02-06 07:30:53 - ERROR - stderr - +2025-02-06 07:30:53 - ERROR - stderr - +2025-02-06 07:30:53 - INFO - stdout - {'loss': 0.3461, 'grad_norm': 1.6421786546707153, 'learning_rate': 3.712203267250858e-08, 'epoch': 2.92} +2025-02-06 07:30:53 - ERROR - stderr - 97%|█████████▋| 21837/22434 [21:23:13<25:00, 2.51s/it] +2025-02-06 07:30:56 - ERROR - stderr - 97%|█████████▋| 21838/22434 [21:23:16<24:45, 2.49s/it] +2025-02-06 07:30:56 - ERROR - stderr - +2025-02-06 07:30:56 - ERROR - stderr - +2025-02-06 07:30:56 - INFO - stdout - {'loss': 0.2774, 'grad_norm': 1.363187551498413, 'learning_rate': 3.699785158774116e-08, 'epoch': 2.92} +2025-02-06 07:30:56 - ERROR - stderr - 97%|█████████▋| 21838/22434 [21:23:16<24:45, 2.49s/it] +2025-02-06 07:31:00 - ERROR - stderr - 97%|█████████▋| 21839/22434 [21:23:20<31:05, 3.14s/it] +2025-02-06 07:31:01 - ERROR - stderr - +2025-02-06 07:31:01 - ERROR - stderr - +2025-02-06 07:31:01 - INFO - stdout - {'loss': 0.3649, 'grad_norm': 1.6017245054244995, 'learning_rate': 3.687387817221999e-08, 'epoch': 2.92} +2025-02-06 07:31:01 - ERROR - stderr - 97%|█████████▋| 21839/22434 [21:23:20<31:05, 3.14s/it] +2025-02-06 07:31:03 - ERROR - stderr - 97%|█████████▋| 21840/22434 [21:23:23<28:55, 2.92s/it] +2025-02-06 07:31:03 - ERROR - stderr - +2025-02-06 07:31:03 - ERROR - stderr - +2025-02-06 07:31:03 - INFO - stdout - {'loss': 0.3723, 'grad_norm': 1.374782681465149, 'learning_rate': 3.675011242852966e-08, 
'epoch': 2.92} +2025-02-06 07:31:03 - ERROR - stderr - 97%|█████████▋| 21840/22434 [21:23:23<28:55, 2.92s/it] +2025-02-06 07:31:06 - ERROR - stderr - 97%|█████████▋| 21841/22434 [21:23:25<28:31, 2.89s/it] +2025-02-06 07:31:06 - ERROR - stderr - +2025-02-06 07:31:06 - ERROR - stderr - +2025-02-06 07:31:06 - INFO - stdout - {'loss': 0.306, 'grad_norm': 1.4454938173294067, 'learning_rate': 3.662655435924811e-08, 'epoch': 2.92} +2025-02-06 07:31:06 - ERROR - stderr - 97%|█████████▋| 21841/22434 [21:23:26<28:31, 2.89s/it] +2025-02-06 07:31:08 - ERROR - stderr - 97%|█████████▋| 21842/22434 [21:23:28<27:24, 2.78s/it] +2025-02-06 07:31:08 - ERROR - stderr - +2025-02-06 07:31:08 - ERROR - stderr - +2025-02-06 07:31:08 - INFO - stdout - {'loss': 0.3664, 'grad_norm': 1.5344688892364502, 'learning_rate': 3.650320396695328e-08, 'epoch': 2.92} +2025-02-06 07:31:08 - ERROR - stderr - 97%|█████████▋| 21842/22434 [21:23:28<27:24, 2.78s/it] +2025-02-06 07:31:11 - ERROR - stderr - 97%|█████████▋| 21843/22434 [21:23:31<28:46, 2.92s/it] +2025-02-06 07:31:12 - ERROR - stderr - +2025-02-06 07:31:12 - ERROR - stderr - +2025-02-06 07:31:12 - INFO - stdout - {'loss': 0.3641, 'grad_norm': 1.6449830532073975, 'learning_rate': 3.638006125421423e-08, 'epoch': 2.92} +2025-02-06 07:31:12 - ERROR - stderr - 97%|█████████▋| 21843/22434 [21:23:31<28:46, 2.92s/it] +2025-02-06 07:31:14 - ERROR - stderr - 97%|█████████▋| 21844/22434 [21:23:34<29:01, 2.95s/it] +2025-02-06 07:31:15 - ERROR - stderr - +2025-02-06 07:31:15 - ERROR - stderr - +2025-02-06 07:31:15 - INFO - stdout - {'loss': 0.383, 'grad_norm': 1.5854240655899048, 'learning_rate': 3.62571262236e-08, 'epoch': 2.92} +2025-02-06 07:31:15 - ERROR - stderr - 97%|█████████▋| 21844/22434 [21:23:34<29:01, 2.95s/it] +2025-02-06 07:31:17 - ERROR - stderr - 97%|█████████▋| 21845/22434 [21:23:37<27:40, 2.82s/it] +2025-02-06 07:31:17 - ERROR - stderr - +2025-02-06 07:31:17 - ERROR - stderr - +2025-02-06 07:31:17 - INFO - stdout - {'loss': 0.3301, 
'grad_norm': 1.4984458684921265, 'learning_rate': 3.613439887767078e-08, 'epoch': 2.92} +2025-02-06 07:31:17 - ERROR - stderr - 97%|█████████▋| 21845/22434 [21:23:37<27:40, 2.82s/it] +2025-02-06 07:31:19 - ERROR - stderr - 97%|█████████▋| 21846/22434 [21:23:39<26:37, 2.72s/it] +2025-02-06 07:31:20 - ERROR - stderr - +2025-02-06 07:31:20 - ERROR - stderr - +2025-02-06 07:31:20 - INFO - stdout - {'loss': 0.3138, 'grad_norm': 1.4035987854003906, 'learning_rate': 3.6011879218985634e-08, 'epoch': 2.92} +2025-02-06 07:31:20 - ERROR - stderr - 97%|█████████▋| 21846/22434 [21:23:39<26:37, 2.72s/it] +2025-02-06 07:31:22 - ERROR - stderr - 97%|█████████▋| 21847/22434 [21:23:42<25:46, 2.63s/it] +2025-02-06 07:31:22 - ERROR - stderr - +2025-02-06 07:31:22 - ERROR - stderr - +2025-02-06 07:31:22 - INFO - stdout - {'loss': 0.3783, 'grad_norm': 1.6949747800827026, 'learning_rate': 3.588956725009807e-08, 'epoch': 2.92} +2025-02-06 07:31:22 - ERROR - stderr - 97%|█████████▋| 21847/22434 [21:23:42<25:46, 2.63s/it] +2025-02-06 07:31:25 - ERROR - stderr - 97%|█████████▋| 21848/22434 [21:23:44<26:01, 2.66s/it] +2025-02-06 07:31:25 - ERROR - stderr - +2025-02-06 07:31:25 - ERROR - stderr - +2025-02-06 07:31:25 - INFO - stdout - {'loss': 0.3741, 'grad_norm': 1.645996332168579, 'learning_rate': 3.576746297355826e-08, 'epoch': 2.92} +2025-02-06 07:31:25 - ERROR - stderr - 97%|█████████▋| 21848/22434 [21:23:44<26:01, 2.66s/it] +2025-02-06 07:31:27 - ERROR - stderr - 97%|█████████▋| 21849/22434 [21:23:47<25:18, 2.60s/it] +2025-02-06 07:31:27 - ERROR - stderr - +2025-02-06 07:31:27 - ERROR - stderr - +2025-02-06 07:31:27 - INFO - stdout - {'loss': 0.4032, 'grad_norm': 1.725846767425537, 'learning_rate': 3.564556639191197e-08, 'epoch': 2.92} +2025-02-06 07:31:27 - ERROR - stderr - 97%|█████████▋| 21849/22434 [21:23:47<25:18, 2.60s/it] +2025-02-06 07:31:30 - ERROR - stderr - 97%|█████████▋| 21850/22434 [21:23:49<25:03, 2.57s/it] +2025-02-06 07:31:30 - ERROR - stderr - +2025-02-06 07:31:30 - 
ERROR - stderr - +2025-02-06 07:31:30 - INFO - stdout - {'loss': 0.366, 'grad_norm': 1.4980498552322388, 'learning_rate': 3.552387750769715e-08, 'epoch': 2.92} +2025-02-06 07:31:30 - ERROR - stderr - 97%|█████████▋| 21850/22434 [21:23:49<25:03, 2.57s/it] +2025-02-06 07:31:33 - ERROR - stderr - 97%|█████████▋| 21851/22434 [21:23:53<26:57, 2.77s/it] +2025-02-06 07:31:33 - ERROR - stderr - +2025-02-06 07:31:33 - ERROR - stderr - +2025-02-06 07:31:33 - INFO - stdout - {'loss': 0.3695, 'grad_norm': 1.5417406558990479, 'learning_rate': 3.540239632345288e-08, 'epoch': 2.92} +2025-02-06 07:31:33 - ERROR - stderr - 97%|█████████▋| 21851/22434 [21:23:53<26:57, 2.77s/it] +2025-02-06 07:31:36 - ERROR - stderr - 97%|█████████▋| 21852/22434 [21:23:55<27:03, 2.79s/it] +2025-02-06 07:31:36 - ERROR - stderr - +2025-02-06 07:31:36 - ERROR - stderr - +2025-02-06 07:31:36 - INFO - stdout - {'loss': 0.3562, 'grad_norm': 1.58588445186615, 'learning_rate': 3.528112284171159e-08, 'epoch': 2.92} +2025-02-06 07:31:36 - ERROR - stderr - 97%|█████████▋| 21852/22434 [21:23:55<27:03, 2.79s/it] +2025-02-06 07:31:38 - ERROR - stderr - 97%|█████████▋| 21853/22434 [21:23:58<26:01, 2.69s/it] +2025-02-06 07:31:38 - ERROR - stderr - +2025-02-06 07:31:38 - ERROR - stderr - +2025-02-06 07:31:38 - INFO - stdout - {'loss': 0.3762, 'grad_norm': 1.6231944561004639, 'learning_rate': 3.516005706499903e-08, 'epoch': 2.92} +2025-02-06 07:31:38 - ERROR - stderr - 97%|█████████▋| 21853/22434 [21:23:58<26:01, 2.69s/it] +2025-02-06 07:31:41 - ERROR - stderr - 97%|█████████▋| 21854/22434 [21:24:00<25:28, 2.63s/it] +2025-02-06 07:31:41 - ERROR - stderr - +2025-02-06 07:31:41 - ERROR - stderr - +2025-02-06 07:31:41 - INFO - stdout - {'loss': 0.3206, 'grad_norm': 1.5480135679244995, 'learning_rate': 3.503919899583985e-08, 'epoch': 2.92} +2025-02-06 07:31:41 - ERROR - stderr - 97%|█████████▋| 21854/22434 [21:24:00<25:28, 2.63s/it] +2025-02-06 07:31:43 - ERROR - stderr - 97%|█████████▋| 21855/22434 [21:24:03<25:06, 
2.60s/it] +2025-02-06 07:31:43 - ERROR - stderr - +2025-02-06 07:31:43 - ERROR - stderr - +2025-02-06 07:31:43 - INFO - stdout - {'loss': 0.3976, 'grad_norm': 1.5012600421905518, 'learning_rate': 3.4918548636753145e-08, 'epoch': 2.92} +2025-02-06 07:31:43 - ERROR - stderr - 97%|█████████▋| 21855/22434 [21:24:03<25:06, 2.60s/it] +2025-02-06 07:31:46 - ERROR - stderr - 97%|█████████▋| 21856/22434 [21:24:05<24:39, 2.56s/it] +2025-02-06 07:31:46 - ERROR - stderr - +2025-02-06 07:31:46 - ERROR - stderr - +2025-02-06 07:31:46 - INFO - stdout - {'loss': 0.3807, 'grad_norm': 1.6514532566070557, 'learning_rate': 3.4798105990253575e-08, 'epoch': 2.92} +2025-02-06 07:31:46 - ERROR - stderr - 97%|█████████▋| 21856/22434 [21:24:05<24:39, 2.56s/it] +2025-02-06 07:31:48 - ERROR - stderr - 97%|█████████▋| 21857/22434 [21:24:08<24:28, 2.54s/it] +2025-02-06 07:31:48 - ERROR - stderr - +2025-02-06 07:31:48 - ERROR - stderr - +2025-02-06 07:31:48 - INFO - stdout - {'loss': 0.3726, 'grad_norm': 1.7092106342315674, 'learning_rate': 3.4677871058852454e-08, 'epoch': 2.92} +2025-02-06 07:31:48 - ERROR - stderr - 97%|█████████▋| 21857/22434 [21:24:08<24:28, 2.54s/it] +2025-02-06 07:31:51 - ERROR - stderr - 97%|█████████▋| 21858/22434 [21:24:10<24:26, 2.55s/it] +2025-02-06 07:31:51 - ERROR - stderr - +2025-02-06 07:31:51 - ERROR - stderr - +2025-02-06 07:31:51 - INFO - stdout - {'loss': 0.3301, 'grad_norm': 1.4381464719772339, 'learning_rate': 3.455784384505445e-08, 'epoch': 2.92} +2025-02-06 07:31:51 - ERROR - stderr - 97%|█████████▋| 21858/22434 [21:24:11<24:26, 2.55s/it] +2025-02-06 07:31:53 - ERROR - stderr - 97%|█████████▋| 21859/22434 [21:24:13<24:12, 2.53s/it] +2025-02-06 07:31:53 - ERROR - stderr - +2025-02-06 07:31:53 - ERROR - stderr - +2025-02-06 07:31:53 - INFO - stdout - {'loss': 0.3784, 'grad_norm': 1.5082228183746338, 'learning_rate': 3.443802435136312e-08, 'epoch': 2.92} +2025-02-06 07:31:53 - ERROR - stderr - 97%|█████████▋| 21859/22434 [21:24:13<24:12, 2.53s/it] +2025-02-06 
07:31:56 - ERROR - stderr - 97%|█████████▋| 21860/22434 [21:24:15<23:56, 2.50s/it] +2025-02-06 07:31:56 - ERROR - stderr - +2025-02-06 07:31:56 - ERROR - stderr - +2025-02-06 07:31:56 - INFO - stdout - {'loss': 0.3647, 'grad_norm': 1.5529791116714478, 'learning_rate': 3.431841258027535e-08, 'epoch': 2.92} +2025-02-06 07:31:56 - ERROR - stderr - 97%|█████████▋| 21860/22434 [21:24:15<23:56, 2.50s/it] +2025-02-06 07:31:58 - ERROR - stderr - 97%|█████████▋| 21861/22434 [21:24:18<23:45, 2.49s/it] +2025-02-06 07:31:58 - ERROR - stderr - +2025-02-06 07:31:58 - ERROR - stderr - +2025-02-06 07:31:58 - INFO - stdout - {'loss': 0.2964, 'grad_norm': 1.2835859060287476, 'learning_rate': 3.41990085342836e-08, 'epoch': 2.92} +2025-02-06 07:31:58 - ERROR - stderr - 97%|█████████▋| 21861/22434 [21:24:18<23:45, 2.49s/it] +2025-02-06 07:32:01 - ERROR - stderr - 97%|█████████▋| 21862/22434 [21:24:20<23:51, 2.50s/it] +2025-02-06 07:32:01 - ERROR - stderr - +2025-02-06 07:32:01 - ERROR - stderr - +2025-02-06 07:32:01 - INFO - stdout - {'loss': 0.3585, 'grad_norm': 1.6354966163635254, 'learning_rate': 3.407981221587586e-08, 'epoch': 2.92} +2025-02-06 07:32:01 - ERROR - stderr - 97%|█████████▋| 21862/22434 [21:24:20<23:51, 2.50s/it] +2025-02-06 07:32:03 - ERROR - stderr - 97%|█████████▋| 21863/22434 [21:24:23<23:55, 2.51s/it] +2025-02-06 07:32:03 - ERROR - stderr - +2025-02-06 07:32:03 - ERROR - stderr - +2025-02-06 07:32:03 - INFO - stdout - {'loss': 0.3763, 'grad_norm': 1.5972139835357666, 'learning_rate': 3.3960823627540163e-08, 'epoch': 2.92} +2025-02-06 07:32:03 - ERROR - stderr - 97%|█████████▋| 21863/22434 [21:24:23<23:55, 2.51s/it] +2025-02-06 07:32:06 - ERROR - stderr - 97%|█████████▋| 21864/22434 [21:24:25<23:45, 2.50s/it] +2025-02-06 07:32:06 - ERROR - stderr - +2025-02-06 07:32:06 - ERROR - stderr - +2025-02-06 07:32:06 - INFO - stdout - {'loss': 0.3704, 'grad_norm': 1.3483550548553467, 'learning_rate': 3.3842042771754515e-08, 'epoch': 2.92} +2025-02-06 07:32:06 - ERROR - 
stderr - 97%|█████████▋| 21864/22434 [21:24:25<23:45, 2.50s/it] +2025-02-06 07:32:08 - ERROR - stderr - 97%|█████████▋| 21865/22434 [21:24:28<24:37, 2.60s/it] +2025-02-06 07:32:08 - ERROR - stderr - +2025-02-06 07:32:08 - ERROR - stderr - +2025-02-06 07:32:08 - INFO - stdout - {'loss': 0.341, 'grad_norm': 1.5618432760238647, 'learning_rate': 3.37234696509936e-08, 'epoch': 2.92} +2025-02-06 07:32:08 - ERROR - stderr - 97%|█████████▋| 21865/22434 [21:24:28<24:37, 2.60s/it] +2025-02-06 07:32:11 - ERROR - stderr - 97%|█████████▋| 21866/22434 [21:24:31<24:22, 2.57s/it] +2025-02-06 07:32:11 - ERROR - stderr - +2025-02-06 07:32:11 - ERROR - stderr - +2025-02-06 07:32:11 - INFO - stdout - {'loss': 0.3178, 'grad_norm': 1.3945764303207397, 'learning_rate': 3.3605104267731003e-08, 'epoch': 2.92} +2025-02-06 07:32:11 - ERROR - stderr - 97%|█████████▋| 21866/22434 [21:24:31<24:22, 2.57s/it] +2025-02-06 07:32:13 - ERROR - stderr - 97%|█████████▋| 21867/22434 [21:24:33<24:00, 2.54s/it] +2025-02-06 07:32:13 - ERROR - stderr - +2025-02-06 07:32:13 - ERROR - stderr - +2025-02-06 07:32:13 - INFO - stdout - {'loss': 0.3249, 'grad_norm': 1.5600690841674805, 'learning_rate': 3.348694662443364e-08, 'epoch': 2.92} +2025-02-06 07:32:13 - ERROR - stderr - 97%|█████████▋| 21867/22434 [21:24:33<24:00, 2.54s/it] +2025-02-06 07:32:16 - ERROR - stderr - 97%|█████████▋| 21868/22434 [21:24:36<23:44, 2.52s/it] +2025-02-06 07:32:16 - ERROR - stderr - +2025-02-06 07:32:16 - ERROR - stderr - +2025-02-06 07:32:16 - INFO - stdout - {'loss': 0.3644, 'grad_norm': 1.5998305082321167, 'learning_rate': 3.336899672356397e-08, 'epoch': 2.92} +2025-02-06 07:32:16 - ERROR - stderr - 97%|█████████▋| 21868/22434 [21:24:36<23:44, 2.52s/it] +2025-02-06 07:32:18 - ERROR - stderr - 97%|█████████▋| 21869/22434 [21:24:38<23:38, 2.51s/it] +2025-02-06 07:32:18 - ERROR - stderr - +2025-02-06 07:32:18 - ERROR - stderr - +2025-02-06 07:32:18 - INFO - stdout - {'loss': 0.3598, 'grad_norm': 1.58379065990448, 'learning_rate': 
3.325125456758005e-08, 'epoch': 2.92} +2025-02-06 07:32:18 - ERROR - stderr - 97%|█████████▋| 21869/22434 [21:24:38<23:38, 2.51s/it] +2025-02-06 07:32:21 - ERROR - stderr - 97%|█████████▋| 21870/22434 [21:24:41<23:46, 2.53s/it] +2025-02-06 07:32:21 - ERROR - stderr - +2025-02-06 07:32:21 - ERROR - stderr - +2025-02-06 07:32:21 - INFO - stdout - {'loss': 0.3767, 'grad_norm': 1.5419119596481323, 'learning_rate': 3.313372015893657e-08, 'epoch': 2.92} +2025-02-06 07:32:21 - ERROR - stderr - 97%|█████████▋| 21870/22434 [21:24:41<23:46, 2.53s/it] +2025-02-06 07:32:23 - ERROR - stderr - 97%|█████████▋| 21871/22434 [21:24:43<23:32, 2.51s/it] +2025-02-06 07:32:23 - ERROR - stderr - +2025-02-06 07:32:23 - ERROR - stderr - +2025-02-06 07:32:23 - INFO - stdout - {'loss': 0.3582, 'grad_norm': 1.5822155475616455, 'learning_rate': 3.301639350008379e-08, 'epoch': 2.92} +2025-02-06 07:32:23 - ERROR - stderr - 97%|█████████▋| 21871/22434 [21:24:43<23:32, 2.51s/it] +2025-02-06 07:32:26 - ERROR - stderr - 97%|█████████▋| 21872/22434 [21:24:46<23:28, 2.51s/it] +2025-02-06 07:32:26 - ERROR - stderr - +2025-02-06 07:32:26 - ERROR - stderr - +2025-02-06 07:32:26 - INFO - stdout - {'loss': 0.349, 'grad_norm': 1.6291447877883911, 'learning_rate': 3.2899274593466425e-08, 'epoch': 2.92} +2025-02-06 07:32:26 - ERROR - stderr - 97%|█████████▋| 21872/22434 [21:24:46<23:28, 2.51s/it] +2025-02-06 07:32:29 - ERROR - stderr - 97%|█████████▋| 21873/22434 [21:24:48<23:41, 2.53s/it] +2025-02-06 07:32:29 - ERROR - stderr - +2025-02-06 07:32:29 - ERROR - stderr - +2025-02-06 07:32:29 - INFO - stdout - {'loss': 0.3839, 'grad_norm': 1.6246795654296875, 'learning_rate': 3.278236344152586e-08, 'epoch': 2.92} +2025-02-06 07:32:29 - ERROR - stderr - 97%|█████████▋| 21873/22434 [21:24:48<23:41, 2.53s/it] +2025-02-06 07:32:31 - ERROR - stderr - 98%|█████████▊| 21874/22434 [21:24:51<23:36, 2.53s/it] +2025-02-06 07:32:31 - ERROR - stderr - +2025-02-06 07:32:31 - ERROR - stderr - +2025-02-06 07:32:31 - INFO - 
stdout - {'loss': 0.3871, 'grad_norm': 1.7607389688491821, 'learning_rate': 3.266566004670013e-08, 'epoch': 2.93} +2025-02-06 07:32:31 - ERROR - stderr - 98%|█████████▊| 21874/22434 [21:24:51<23:36, 2.53s/it] +2025-02-06 07:32:34 - ERROR - stderr - 98%|█████████▊| 21875/22434 [21:24:53<23:36, 2.53s/it] +2025-02-06 07:32:34 - ERROR - stderr - +2025-02-06 07:32:34 - ERROR - stderr - +2025-02-06 07:32:34 - INFO - stdout - {'loss': 0.3796, 'grad_norm': 1.5084717273712158, 'learning_rate': 3.254916441142064e-08, 'epoch': 2.93} +2025-02-06 07:32:34 - ERROR - stderr - 98%|█████████▊| 21875/22434 [21:24:53<23:36, 2.53s/it] +2025-02-06 07:32:36 - ERROR - stderr - 98%|█████████▊| 21876/22434 [21:24:56<23:36, 2.54s/it] +2025-02-06 07:32:36 - ERROR - stderr - +2025-02-06 07:32:36 - ERROR - stderr - +2025-02-06 07:32:36 - INFO - stdout - {'loss': 0.3587, 'grad_norm': 1.4809011220932007, 'learning_rate': 3.2432876538116554e-08, 'epoch': 2.93} +2025-02-06 07:32:36 - ERROR - stderr - 98%|█████████▊| 21876/22434 [21:24:56<23:36, 2.54s/it] +2025-02-06 07:32:39 - ERROR - stderr - 98%|█████████▊| 21877/22434 [21:24:58<23:34, 2.54s/it] +2025-02-06 07:32:39 - ERROR - stderr - +2025-02-06 07:32:39 - ERROR - stderr - +2025-02-06 07:32:39 - INFO - stdout - {'loss': 0.3637, 'grad_norm': 1.6005514860153198, 'learning_rate': 3.2316796429210373e-08, 'epoch': 2.93} +2025-02-06 07:32:39 - ERROR - stderr - 98%|█████████▊| 21877/22434 [21:24:58<23:34, 2.54s/it] +2025-02-06 07:32:41 - ERROR - stderr - 98%|█████████▊| 21878/22434 [21:25:01<23:29, 2.54s/it] +2025-02-06 07:32:41 - ERROR - stderr - +2025-02-06 07:32:41 - ERROR - stderr - +2025-02-06 07:32:41 - INFO - stdout - {'loss': 0.3343, 'grad_norm': 1.321458339691162, 'learning_rate': 3.22009240871235e-08, 'epoch': 2.93} +2025-02-06 07:32:41 - ERROR - stderr - 98%|█████████▊| 21878/22434 [21:25:01<23:29, 2.54s/it] +2025-02-06 07:32:44 - ERROR - stderr - 98%|█████████▊| 21879/22434 [21:25:04<23:43, 2.56s/it] +2025-02-06 07:32:44 - ERROR - stderr - 
+2025-02-06 07:32:44 - ERROR - stderr - +2025-02-06 07:32:44 - INFO - stdout - {'loss': 0.3534, 'grad_norm': 1.619275689125061, 'learning_rate': 3.208525951426955e-08, 'epoch': 2.93} +2025-02-06 07:32:44 - ERROR - stderr - 98%|█████████▊| 21879/22434 [21:25:04<23:43, 2.56s/it] +2025-02-06 07:32:46 - ERROR - stderr - 98%|█████████▊| 21880/22434 [21:25:06<23:34, 2.55s/it] +2025-02-06 07:32:46 - ERROR - stderr - +2025-02-06 07:32:46 - ERROR - stderr - +2025-02-06 07:32:46 - INFO - stdout - {'loss': 0.347, 'grad_norm': 1.5018470287322998, 'learning_rate': 3.196980271305994e-08, 'epoch': 2.93} +2025-02-06 07:32:46 - ERROR - stderr - 98%|█████████▊| 21880/22434 [21:25:06<23:34, 2.55s/it] +2025-02-06 07:32:49 - ERROR - stderr - 98%|█████████▊| 21881/22434 [21:25:09<23:11, 2.52s/it] +2025-02-06 07:32:49 - ERROR - stderr - +2025-02-06 07:32:49 - ERROR - stderr - +2025-02-06 07:32:49 - INFO - stdout - {'loss': 0.3951, 'grad_norm': 1.6215152740478516, 'learning_rate': 3.185455368590162e-08, 'epoch': 2.93} +2025-02-06 07:32:49 - ERROR - stderr - 98%|█████████▊| 21881/22434 [21:25:09<23:11, 2.52s/it] +2025-02-06 07:32:51 - ERROR - stderr - 98%|█████████▊| 21882/22434 [21:25:11<23:12, 2.52s/it] +2025-02-06 07:32:51 - ERROR - stderr - +2025-02-06 07:32:51 - ERROR - stderr - +2025-02-06 07:32:51 - INFO - stdout - {'loss': 0.3158, 'grad_norm': 1.5148367881774902, 'learning_rate': 3.1739512435197126e-08, 'epoch': 2.93} +2025-02-06 07:32:51 - ERROR - stderr - 98%|█████████▊| 21882/22434 [21:25:11<23:12, 2.52s/it] +2025-02-06 07:32:54 - ERROR - stderr - 98%|█████████▊| 21883/22434 [21:25:14<23:10, 2.52s/it] +2025-02-06 07:32:54 - ERROR - stderr - +2025-02-06 07:32:54 - ERROR - stderr - +2025-02-06 07:32:54 - INFO - stdout - {'loss': 0.3681, 'grad_norm': 1.4534218311309814, 'learning_rate': 3.1624678963343426e-08, 'epoch': 2.93} +2025-02-06 07:32:54 - ERROR - stderr - 98%|█████████▊| 21883/22434 [21:25:14<23:10, 2.52s/it] +2025-02-06 07:32:56 - ERROR - stderr - 98%|█████████▊| 
21884/22434 [21:25:16<23:05, 2.52s/it] +2025-02-06 07:32:56 - ERROR - stderr - +2025-02-06 07:32:56 - ERROR - stderr - +2025-02-06 07:32:56 - INFO - stdout - {'loss': 0.3394, 'grad_norm': 1.533595085144043, 'learning_rate': 3.151005327273526e-08, 'epoch': 2.93} +2025-02-06 07:32:56 - ERROR - stderr - 98%|█████████▊| 21884/22434 [21:25:16<23:05, 2.52s/it] +2025-02-06 07:32:59 - ERROR - stderr - 98%|█████████▊| 21885/22434 [21:25:19<22:53, 2.50s/it] +2025-02-06 07:32:59 - ERROR - stderr - +2025-02-06 07:32:59 - ERROR - stderr - +2025-02-06 07:32:59 - INFO - stdout - {'loss': 0.3152, 'grad_norm': 1.3647336959838867, 'learning_rate': 3.1395635365760736e-08, 'epoch': 2.93} +2025-02-06 07:32:59 - ERROR - stderr - 98%|█████████▊| 21885/22434 [21:25:19<22:53, 2.50s/it] +2025-02-06 07:33:01 - ERROR - stderr - 98%|█████████▊| 21886/22434 [21:25:21<22:58, 2.51s/it] +2025-02-06 07:33:01 - ERROR - stderr - +2025-02-06 07:33:01 - ERROR - stderr - +2025-02-06 07:33:01 - INFO - stdout - {'loss': 0.3621, 'grad_norm': 1.5203237533569336, 'learning_rate': 3.12814252448046e-08, 'epoch': 2.93} +2025-02-06 07:33:01 - ERROR - stderr - 98%|█████████▊| 21886/22434 [21:25:21<22:58, 2.51s/it] +2025-02-06 07:33:04 - ERROR - stderr - 98%|█████████▊| 21887/22434 [21:25:24<22:42, 2.49s/it] +2025-02-06 07:33:04 - ERROR - stderr - +2025-02-06 07:33:04 - ERROR - stderr - +2025-02-06 07:33:04 - INFO - stdout - {'loss': 0.3404, 'grad_norm': 1.5276892185211182, 'learning_rate': 3.116742291224939e-08, 'epoch': 2.93} +2025-02-06 07:33:04 - ERROR - stderr - 98%|█████████▊| 21887/22434 [21:25:24<22:42, 2.49s/it] +2025-02-06 07:33:06 - ERROR - stderr - 98%|█████████▊| 21888/22434 [21:25:26<22:42, 2.50s/it] +2025-02-06 07:33:06 - ERROR - stderr - +2025-02-06 07:33:06 - ERROR - stderr - +2025-02-06 07:33:06 - INFO - stdout - {'loss': 0.3653, 'grad_norm': 1.599893569946289, 'learning_rate': 3.105362837046877e-08, 'epoch': 2.93} +2025-02-06 07:33:06 - ERROR - stderr - 98%|█████████▊| 21888/22434 
[21:25:26<22:42, 2.50s/it] +2025-02-06 07:33:09 - ERROR - stderr - 98%|█████████▊| 21889/22434 [21:25:29<22:43, 2.50s/it] +2025-02-06 07:33:09 - ERROR - stderr - +2025-02-06 07:33:09 - ERROR - stderr - +2025-02-06 07:33:09 - INFO - stdout - {'loss': 0.3724, 'grad_norm': 1.5389573574066162, 'learning_rate': 3.0940041621836395e-08, 'epoch': 2.93} +2025-02-06 07:33:09 - ERROR - stderr - 98%|█████████▊| 21889/22434 [21:25:29<22:43, 2.50s/it] +2025-02-06 07:33:11 - ERROR - stderr - 98%|█████████▊| 21890/22434 [21:25:31<22:43, 2.51s/it] +2025-02-06 07:33:11 - ERROR - stderr - +2025-02-06 07:33:11 - ERROR - stderr - +2025-02-06 07:33:11 - INFO - stdout - {'loss': 0.3414, 'grad_norm': 1.4590905904769897, 'learning_rate': 3.082666266872036e-08, 'epoch': 2.93} +2025-02-06 07:33:11 - ERROR - stderr - 98%|█████████▊| 21890/22434 [21:25:31<22:43, 2.51s/it] +2025-02-06 07:33:14 - ERROR - stderr - 98%|█████████▊| 21891/22434 [21:25:34<22:35, 2.50s/it] +2025-02-06 07:33:14 - ERROR - stderr - +2025-02-06 07:33:14 - ERROR - stderr - +2025-02-06 07:33:14 - INFO - stdout - {'loss': 0.3562, 'grad_norm': 1.4509474039077759, 'learning_rate': 3.071349151348213e-08, 'epoch': 2.93} +2025-02-06 07:33:14 - ERROR - stderr - 98%|█████████▊| 21891/22434 [21:25:34<22:35, 2.50s/it] +2025-02-06 07:33:16 - ERROR - stderr - 98%|█████████▊| 21892/22434 [21:25:36<22:33, 2.50s/it] +2025-02-06 07:33:16 - ERROR - stderr - +2025-02-06 07:33:16 - ERROR - stderr - +2025-02-06 07:33:16 - INFO - stdout - {'loss': 0.4217, 'grad_norm': 1.7496082782745361, 'learning_rate': 3.060052815848202e-08, 'epoch': 2.93} +2025-02-06 07:33:16 - ERROR - stderr - 98%|█████████▊| 21892/22434 [21:25:36<22:33, 2.50s/it] +2025-02-06 07:33:19 - ERROR - stderr - 98%|█████████▊| 21893/22434 [21:25:39<22:25, 2.49s/it] +2025-02-06 07:33:19 - ERROR - stderr - +2025-02-06 07:33:19 - ERROR - stderr - +2025-02-06 07:33:19 - INFO - stdout - {'loss': 0.3363, 'grad_norm': 1.4676233530044556, 'learning_rate': 3.0487772606074826e-08, 'epoch': 
2.93} +2025-02-06 07:33:19 - ERROR - stderr - 98%|█████████▊| 21893/22434 [21:25:39<22:25, 2.49s/it] +2025-02-06 07:33:21 - ERROR - stderr - 98%|█████████▊| 21894/22434 [21:25:41<22:50, 2.54s/it] +2025-02-06 07:33:21 - ERROR - stderr - +2025-02-06 07:33:21 - ERROR - stderr - +2025-02-06 07:33:21 - INFO - stdout - {'loss': 0.3644, 'grad_norm': 1.5512809753417969, 'learning_rate': 3.0375224858609774e-08, 'epoch': 2.93} +2025-02-06 07:33:21 - ERROR - stderr - 98%|█████████▊| 21894/22434 [21:25:41<22:50, 2.54s/it] +2025-02-06 07:33:24 - ERROR - stderr - 98%|█████████▊| 21895/22434 [21:25:44<22:36, 2.52s/it] +2025-02-06 07:33:24 - ERROR - stderr - +2025-02-06 07:33:24 - ERROR - stderr - +2025-02-06 07:33:24 - INFO - stdout - {'loss': 0.3627, 'grad_norm': 1.4243831634521484, 'learning_rate': 3.026288491843277e-08, 'epoch': 2.93} +2025-02-06 07:33:24 - ERROR - stderr - 98%|█████████▊| 21895/22434 [21:25:44<22:36, 2.52s/it] +2025-02-06 07:33:26 - ERROR - stderr - 98%|█████████▊| 21896/22434 [21:25:46<22:38, 2.52s/it] +2025-02-06 07:33:26 - ERROR - stderr - +2025-02-06 07:33:26 - ERROR - stderr - +2025-02-06 07:33:26 - INFO - stdout - {'loss': 0.3652, 'grad_norm': 1.7067409753799438, 'learning_rate': 3.0150752787886374e-08, 'epoch': 2.93} +2025-02-06 07:33:26 - ERROR - stderr - 98%|█████████▊| 21896/22434 [21:25:46<22:38, 2.52s/it] +2025-02-06 07:33:29 - ERROR - stderr - 98%|█████████▊| 21897/22434 [21:25:49<22:32, 2.52s/it] +2025-02-06 07:33:29 - ERROR - stderr - +2025-02-06 07:33:29 - ERROR - stderr - +2025-02-06 07:33:29 - INFO - stdout - {'loss': 0.3588, 'grad_norm': 1.3961254358291626, 'learning_rate': 3.0038828469306506e-08, 'epoch': 2.93} +2025-02-06 07:33:29 - ERROR - stderr - 98%|█████████▊| 21897/22434 [21:25:49<22:32, 2.52s/it] +2025-02-06 07:33:31 - ERROR - stderr - 98%|█████████▊| 21898/22434 [21:25:51<22:34, 2.53s/it] +2025-02-06 07:33:32 - ERROR - stderr - +2025-02-06 07:33:32 - ERROR - stderr - +2025-02-06 07:33:32 - INFO - stdout - {'loss': 0.3349, 
'grad_norm': 1.5491318702697754, 'learning_rate': 2.9927111965029063e-08, 'epoch': 2.93} +2025-02-06 07:33:32 - ERROR - stderr - 98%|█████████▊| 21898/22434 [21:25:51<22:34, 2.53s/it] +2025-02-06 07:33:34 - ERROR - stderr - 98%|█████████▊| 21899/22434 [21:25:54<22:25, 2.51s/it] +2025-02-06 07:33:34 - ERROR - stderr - +2025-02-06 07:33:34 - ERROR - stderr - +2025-02-06 07:33:34 - INFO - stdout - {'loss': 0.3646, 'grad_norm': 1.5140010118484497, 'learning_rate': 2.981560327737887e-08, 'epoch': 2.93} +2025-02-06 07:33:34 - ERROR - stderr - 98%|█████████▊| 21899/22434 [21:25:54<22:25, 2.51s/it] +2025-02-06 07:33:36 - ERROR - stderr - 98%|█████████▊| 21900/22434 [21:25:56<22:16, 2.50s/it] +2025-02-06 07:33:36 - ERROR - stderr - +2025-02-06 07:33:36 - ERROR - stderr - +2025-02-06 07:33:36 - INFO - stdout - {'loss': 0.3477, 'grad_norm': 1.536584734916687, 'learning_rate': 2.970430240868183e-08, 'epoch': 2.93} +2025-02-06 07:33:36 - ERROR - stderr - 98%|█████████▊| 21900/22434 [21:25:56<22:16, 2.50s/it] +2025-02-06 07:33:39 - ERROR - stderr - 98%|█████████▊| 21901/22434 [21:25:59<22:10, 2.50s/it] +2025-02-06 07:33:39 - ERROR - stderr - +2025-02-06 07:33:39 - ERROR - stderr - +2025-02-06 07:33:39 - INFO - stdout - {'loss': 0.3613, 'grad_norm': 1.4457896947860718, 'learning_rate': 2.9593209361259422e-08, 'epoch': 2.93} +2025-02-06 07:33:39 - ERROR - stderr - 98%|█████████▊| 21901/22434 [21:25:59<22:10, 2.50s/it] +2025-02-06 07:33:41 - ERROR - stderr - 98%|█████████▊| 21902/22434 [21:26:01<22:17, 2.51s/it] +2025-02-06 07:33:42 - ERROR - stderr - +2025-02-06 07:33:42 - ERROR - stderr - +2025-02-06 07:33:42 - INFO - stdout - {'loss': 0.3738, 'grad_norm': 1.4457533359527588, 'learning_rate': 2.9482324137425355e-08, 'epoch': 2.93} +2025-02-06 07:33:42 - ERROR - stderr - 98%|█████████▊| 21902/22434 [21:26:01<22:17, 2.51s/it] +2025-02-06 07:33:44 - ERROR - stderr - 98%|█████████▊| 21903/22434 [21:26:04<22:23, 2.53s/it] +2025-02-06 07:33:44 - ERROR - stderr - +2025-02-06 07:33:44 - 
ERROR - stderr - +2025-02-06 07:33:44 - INFO - stdout - {'loss': 0.3227, 'grad_norm': 1.4049588441848755, 'learning_rate': 2.937164673949111e-08, 'epoch': 2.93} +2025-02-06 07:33:44 - ERROR - stderr - 98%|█████████▊| 21903/22434 [21:26:04<22:23, 2.53s/it] +2025-02-06 07:33:47 - ERROR - stderr - 98%|█████████▊| 21904/22434 [21:26:06<22:22, 2.53s/it] +2025-02-06 07:33:47 - ERROR - stderr - +2025-02-06 07:33:47 - ERROR - stderr - +2025-02-06 07:33:47 - INFO - stdout - {'loss': 0.3783, 'grad_norm': 1.5382344722747803, 'learning_rate': 2.926117716976484e-08, 'epoch': 2.93} +2025-02-06 07:33:47 - ERROR - stderr - 98%|█████████▊| 21904/22434 [21:26:06<22:22, 2.53s/it] +2025-02-06 07:33:49 - ERROR - stderr - 98%|█████████▊| 21905/22434 [21:26:09<22:29, 2.55s/it] +2025-02-06 07:33:49 - ERROR - stderr - +2025-02-06 07:33:49 - ERROR - stderr - +2025-02-06 07:33:49 - INFO - stdout - {'loss': 0.3792, 'grad_norm': 1.505469560623169, 'learning_rate': 2.9150915430548045e-08, 'epoch': 2.93} +2025-02-06 07:33:49 - ERROR - stderr - 98%|█████████▊| 21905/22434 [21:26:09<22:29, 2.55s/it] +2025-02-06 07:33:52 - ERROR - stderr - 98%|█████████▊| 21906/22434 [21:26:11<22:19, 2.54s/it] +2025-02-06 07:33:52 - ERROR - stderr - +2025-02-06 07:33:52 - ERROR - stderr - +2025-02-06 07:33:52 - INFO - stdout - {'loss': 0.3831, 'grad_norm': 1.5301388502120972, 'learning_rate': 2.9040861524138876e-08, 'epoch': 2.93} +2025-02-06 07:33:52 - ERROR - stderr - 98%|█████████▊| 21906/22434 [21:26:11<22:19, 2.54s/it] +2025-02-06 07:33:54 - ERROR - stderr - 98%|█████████▊| 21907/22434 [21:26:14<22:09, 2.52s/it] +2025-02-06 07:33:54 - ERROR - stderr - +2025-02-06 07:33:54 - ERROR - stderr - +2025-02-06 07:33:54 - INFO - stdout - {'loss': 0.3438, 'grad_norm': 1.4482907056808472, 'learning_rate': 2.8931015452831057e-08, 'epoch': 2.93} +2025-02-06 07:33:54 - ERROR - stderr - 98%|█████████▊| 21907/22434 [21:26:14<22:09, 2.52s/it] +2025-02-06 07:33:57 - ERROR - stderr - 98%|█████████▊| 21908/22434 [21:26:16<22:10, 
2.53s/it] +2025-02-06 07:33:57 - ERROR - stderr - +2025-02-06 07:33:57 - ERROR - stderr - +2025-02-06 07:33:57 - INFO - stdout - {'loss': 0.3464, 'grad_norm': 1.555217981338501, 'learning_rate': 2.8821377218917202e-08, 'epoch': 2.93} +2025-02-06 07:33:57 - ERROR - stderr - 98%|█████████▊| 21908/22434 [21:26:17<22:10, 2.53s/it] +2025-02-06 07:33:59 - ERROR - stderr - 98%|█████████▊| 21909/22434 [21:26:19<22:09, 2.53s/it] +2025-02-06 07:33:59 - ERROR - stderr - +2025-02-06 07:33:59 - ERROR - stderr - +2025-02-06 07:33:59 - INFO - stdout - {'loss': 0.3631, 'grad_norm': 1.5804312229156494, 'learning_rate': 2.8711946824678817e-08, 'epoch': 2.93} +2025-02-06 07:33:59 - ERROR - stderr - 98%|█████████▊| 21909/22434 [21:26:19<22:09, 2.53s/it] +2025-02-06 07:34:02 - ERROR - stderr - 98%|█████████▊| 21910/22434 [21:26:22<22:20, 2.56s/it] +2025-02-06 07:34:02 - ERROR - stderr - +2025-02-06 07:34:02 - ERROR - stderr - +2025-02-06 07:34:02 - INFO - stdout - {'loss': 0.3482, 'grad_norm': 1.5982356071472168, 'learning_rate': 2.860272427239852e-08, 'epoch': 2.93} +2025-02-06 07:34:02 - ERROR - stderr - 98%|█████████▊| 21910/22434 [21:26:22<22:20, 2.56s/it] +2025-02-06 07:34:04 - ERROR - stderr - 98%|█████████▊| 21911/22434 [21:26:24<22:17, 2.56s/it] +2025-02-06 07:34:04 - ERROR - stderr - +2025-02-06 07:34:04 - ERROR - stderr - +2025-02-06 07:34:04 - INFO - stdout - {'loss': 0.3749, 'grad_norm': 1.5752573013305664, 'learning_rate': 2.8493709564353376e-08, 'epoch': 2.93} +2025-02-06 07:34:04 - ERROR - stderr - 98%|█████████▊| 21911/22434 [21:26:24<22:17, 2.56s/it] +2025-02-06 07:34:07 - ERROR - stderr - 98%|█████████▊| 21912/22434 [21:26:27<22:06, 2.54s/it] +2025-02-06 07:34:07 - ERROR - stderr - +2025-02-06 07:34:07 - ERROR - stderr - +2025-02-06 07:34:07 - INFO - stdout - {'loss': 0.3598, 'grad_norm': 1.5073572397232056, 'learning_rate': 2.838490270281491e-08, 'epoch': 2.93} +2025-02-06 07:34:07 - ERROR - stderr - 98%|█████████▊| 21912/22434 [21:26:27<22:06, 2.54s/it] +2025-02-06 
07:34:09 - ERROR - stderr - 98%|█████████▊| 21913/22434 [21:26:29<21:56, 2.53s/it] +2025-02-06 07:34:09 - ERROR - stderr - +2025-02-06 07:34:09 - ERROR - stderr - +2025-02-06 07:34:09 - INFO - stdout - {'loss': 0.4056, 'grad_norm': 1.6357098817825317, 'learning_rate': 2.827630369005019e-08, 'epoch': 2.93} +2025-02-06 07:34:09 - ERROR - stderr - 98%|█████████▊| 21913/22434 [21:26:29<21:56, 2.53s/it] +2025-02-06 07:34:12 - ERROR - stderr - 98%|█████████▊| 21914/22434 [21:26:32<21:59, 2.54s/it] +2025-02-06 07:34:12 - ERROR - stderr - +2025-02-06 07:34:12 - ERROR - stderr - +2025-02-06 07:34:12 - INFO - stdout - {'loss': 0.3473, 'grad_norm': 1.4607244729995728, 'learning_rate': 2.816791252832518e-08, 'epoch': 2.93} +2025-02-06 07:34:12 - ERROR - stderr - 98%|█████████▊| 21914/22434 [21:26:32<21:59, 2.54s/it] +2025-02-06 07:34:14 - ERROR - stderr - 98%|█████████▊| 21915/22434 [21:26:34<21:49, 2.52s/it] +2025-02-06 07:34:15 - ERROR - stderr - +2025-02-06 07:34:15 - ERROR - stderr - +2025-02-06 07:34:15 - INFO - stdout - {'loss': 0.3384, 'grad_norm': 1.5359269380569458, 'learning_rate': 2.805972921989808e-08, 'epoch': 2.93} +2025-02-06 07:34:15 - ERROR - stderr - 98%|█████████▊| 21915/22434 [21:26:34<21:49, 2.52s/it] +2025-02-06 07:34:17 - ERROR - stderr - 98%|█████████▊| 21916/22434 [21:26:37<21:43, 2.52s/it] +2025-02-06 07:34:17 - ERROR - stderr - +2025-02-06 07:34:17 - ERROR - stderr - +2025-02-06 07:34:17 - INFO - stdout - {'loss': 0.341, 'grad_norm': 1.4903286695480347, 'learning_rate': 2.795175376702375e-08, 'epoch': 2.93} +2025-02-06 07:34:17 - ERROR - stderr - 98%|█████████▊| 21916/22434 [21:26:37<21:43, 2.52s/it] +2025-02-06 07:34:19 - ERROR - stderr - 98%|█████████▊| 21917/22434 [21:26:39<21:39, 2.51s/it] +2025-02-06 07:34:20 - ERROR - stderr - +2025-02-06 07:34:20 - ERROR - stderr - +2025-02-06 07:34:20 - INFO - stdout - {'loss': 0.3854, 'grad_norm': 1.6003645658493042, 'learning_rate': 2.784398617195372e-08, 'epoch': 2.93} +2025-02-06 07:34:20 - ERROR - stderr 
- 98%|█████████▊| 21917/22434 [21:26:39<21:39, 2.51s/it] +2025-02-06 07:34:22 - ERROR - stderr - 98%|█████████▊| 21918/22434 [21:26:42<21:49, 2.54s/it] +2025-02-06 07:34:22 - ERROR - stderr - +2025-02-06 07:34:22 - ERROR - stderr - +2025-02-06 07:34:22 - INFO - stdout - {'loss': 0.356, 'grad_norm': 1.4195661544799805, 'learning_rate': 2.7736426436931753e-08, 'epoch': 2.93} +2025-02-06 07:34:22 - ERROR - stderr - 98%|█████████▊| 21918/22434 [21:26:42<21:49, 2.54s/it] +2025-02-06 07:34:25 - ERROR - stderr - 98%|█████████▊| 21919/22434 [21:26:44<21:40, 2.53s/it] +2025-02-06 07:34:25 - ERROR - stderr - +2025-02-06 07:34:25 - ERROR - stderr - +2025-02-06 07:34:25 - INFO - stdout - {'loss': 0.3767, 'grad_norm': 1.584608554840088, 'learning_rate': 2.762907456420272e-08, 'epoch': 2.93} +2025-02-06 07:34:25 - ERROR - stderr - 98%|█████████▊| 21919/22434 [21:26:44<21:40, 2.53s/it] +2025-02-06 07:34:27 - ERROR - stderr - 98%|█████████▊| 21920/22434 [21:26:47<21:26, 2.50s/it] +2025-02-06 07:34:27 - ERROR - stderr - +2025-02-06 07:34:27 - ERROR - stderr - +2025-02-06 07:34:27 - INFO - stdout - {'loss': 0.329, 'grad_norm': 1.5825413465499878, 'learning_rate': 2.7521930556002608e-08, 'epoch': 2.93} +2025-02-06 07:34:27 - ERROR - stderr - 98%|█████████▊| 21920/22434 [21:26:47<21:26, 2.50s/it] +2025-02-06 07:34:30 - ERROR - stderr - 98%|█████████▊| 21921/22434 [21:26:49<21:30, 2.52s/it] +2025-02-06 07:34:30 - ERROR - stderr - +2025-02-06 07:34:30 - ERROR - stderr - +2025-02-06 07:34:30 - INFO - stdout - {'loss': 0.3754, 'grad_norm': 1.6157947778701782, 'learning_rate': 2.7414994414565187e-08, 'epoch': 2.93} +2025-02-06 07:34:30 - ERROR - stderr - 98%|█████████▊| 21921/22434 [21:26:49<21:30, 2.52s/it] +2025-02-06 07:34:32 - ERROR - stderr - 98%|█████████▊| 21922/22434 [21:26:52<21:18, 2.50s/it] +2025-02-06 07:34:32 - ERROR - stderr - +2025-02-06 07:34:32 - ERROR - stderr - +2025-02-06 07:34:32 - INFO - stdout - {'loss': 0.3509, 'grad_norm': 1.5587482452392578, 'learning_rate': 
2.7308266142119788e-08, 'epoch': 2.93} +2025-02-06 07:34:32 - ERROR - stderr - 98%|█████████▊| 21922/22434 [21:26:52<21:18, 2.50s/it] +2025-02-06 07:34:35 - ERROR - stderr - 98%|█████████▊| 21923/22434 [21:26:54<21:24, 2.51s/it] +2025-02-06 07:34:35 - ERROR - stderr - +2025-02-06 07:34:35 - ERROR - stderr - +2025-02-06 07:34:35 - INFO - stdout - {'loss': 0.3376, 'grad_norm': 1.5920610427856445, 'learning_rate': 2.7201745740890186e-08, 'epoch': 2.93} +2025-02-06 07:34:35 - ERROR - stderr - 98%|█████████▊| 21923/22434 [21:26:54<21:24, 2.51s/it] +2025-02-06 07:34:37 - ERROR - stderr - 98%|█████████▊| 21924/22434 [21:26:57<21:34, 2.54s/it] +2025-02-06 07:34:37 - ERROR - stderr - +2025-02-06 07:34:37 - ERROR - stderr - +2025-02-06 07:34:37 - INFO - stdout - {'loss': 0.3683, 'grad_norm': 1.6547080278396606, 'learning_rate': 2.7095433213097933e-08, 'epoch': 2.93} +2025-02-06 07:34:37 - ERROR - stderr - 98%|█████████▊| 21924/22434 [21:26:57<21:34, 2.54s/it] +2025-02-06 07:34:40 - ERROR - stderr - 98%|█████████▊| 21925/22434 [21:26:59<21:22, 2.52s/it] +2025-02-06 07:34:40 - ERROR - stderr - +2025-02-06 07:34:40 - ERROR - stderr - +2025-02-06 07:34:40 - INFO - stdout - {'loss': 0.3675, 'grad_norm': 1.6580116748809814, 'learning_rate': 2.698932856095793e-08, 'epoch': 2.93} +2025-02-06 07:34:40 - ERROR - stderr - 98%|█████████▊| 21925/22434 [21:26:59<21:22, 2.52s/it] +2025-02-06 07:34:42 - ERROR - stderr - 98%|█████████▊| 21926/22434 [21:27:02<21:20, 2.52s/it] +2025-02-06 07:34:42 - ERROR - stderr - +2025-02-06 07:34:42 - ERROR - stderr - +2025-02-06 07:34:42 - INFO - stdout - {'loss': 0.3213, 'grad_norm': 1.5862411260604858, 'learning_rate': 2.6883431786682844e-08, 'epoch': 2.93} +2025-02-06 07:34:42 - ERROR - stderr - 98%|█████████▊| 21926/22434 [21:27:02<21:20, 2.52s/it] +2025-02-06 07:34:45 - ERROR - stderr - 98%|█████████▊| 21927/22434 [21:27:04<21:04, 2.49s/it] +2025-02-06 07:34:45 - ERROR - stderr - +2025-02-06 07:34:45 - ERROR - stderr - +2025-02-06 07:34:45 - INFO - 
stdout - {'loss': 0.3614, 'grad_norm': 1.412866473197937, 'learning_rate': 2.6777742892478697e-08, 'epoch': 2.93} +2025-02-06 07:34:45 - ERROR - stderr - 98%|█████████▊| 21927/22434 [21:27:04<21:04, 2.49s/it] +2025-02-06 07:34:47 - ERROR - stderr - 98%|█████████▊| 21928/22434 [21:27:07<21:04, 2.50s/it] +2025-02-06 07:34:47 - ERROR - stderr - +2025-02-06 07:34:47 - ERROR - stderr - +2025-02-06 07:34:47 - INFO - stdout - {'loss': 0.3482, 'grad_norm': 1.4328703880310059, 'learning_rate': 2.6672261880549276e-08, 'epoch': 2.93} +2025-02-06 07:34:47 - ERROR - stderr - 98%|█████████▊| 21928/22434 [21:27:07<21:04, 2.50s/it] +2025-02-06 07:34:50 - ERROR - stderr - 98%|█████████▊| 21929/22434 [21:27:09<21:12, 2.52s/it] +2025-02-06 07:34:50 - ERROR - stderr - +2025-02-06 07:34:50 - ERROR - stderr - +2025-02-06 07:34:50 - INFO - stdout - {'loss': 0.3482, 'grad_norm': 1.4566869735717773, 'learning_rate': 2.6566988753093938e-08, 'epoch': 2.93} +2025-02-06 07:34:50 - ERROR - stderr - 98%|█████████▊| 21929/22434 [21:27:10<21:12, 2.52s/it] +2025-02-06 07:34:52 - ERROR - stderr - 98%|█████████▊| 21930/22434 [21:27:12<21:09, 2.52s/it] +2025-02-06 07:34:52 - ERROR - stderr - +2025-02-06 07:34:52 - ERROR - stderr - +2025-02-06 07:34:52 - INFO - stdout - {'loss': 0.3353, 'grad_norm': 1.6085487604141235, 'learning_rate': 2.6461923512305367e-08, 'epoch': 2.93} +2025-02-06 07:34:52 - ERROR - stderr - 98%|█████████▊| 21930/22434 [21:27:12<21:09, 2.52s/it] +2025-02-06 07:34:55 - ERROR - stderr - 98%|█████████▊| 21931/22434 [21:27:14<20:58, 2.50s/it] +2025-02-06 07:34:55 - ERROR - stderr - +2025-02-06 07:34:55 - ERROR - stderr - +2025-02-06 07:34:55 - INFO - stdout - {'loss': 0.3876, 'grad_norm': 1.6303867101669312, 'learning_rate': 2.6357066160374035e-08, 'epoch': 2.93} +2025-02-06 07:34:55 - ERROR - stderr - 98%|█████████▊| 21931/22434 [21:27:14<20:58, 2.50s/it] +2025-02-06 07:34:57 - ERROR - stderr - 98%|█████████▊| 21932/22434 [21:27:17<20:57, 2.51s/it] +2025-02-06 07:34:57 - ERROR - 
stderr - +2025-02-06 07:34:57 - ERROR - stderr - +2025-02-06 07:34:57 - INFO - stdout - {'loss': 0.3424, 'grad_norm': 1.377264380455017, 'learning_rate': 2.625241669948597e-08, 'epoch': 2.93} +2025-02-06 07:34:57 - ERROR - stderr - 98%|█████████▊| 21932/22434 [21:27:17<20:57, 2.51s/it] +2025-02-06 07:35:00 - ERROR - stderr - 98%|█████████▊| 21933/22434 [21:27:19<20:57, 2.51s/it] +2025-02-06 07:35:00 - ERROR - stderr - +2025-02-06 07:35:00 - ERROR - stderr - +2025-02-06 07:35:00 - INFO - stdout - {'loss': 0.3574, 'grad_norm': 1.4616636037826538, 'learning_rate': 2.6147975131822767e-08, 'epoch': 2.93} +2025-02-06 07:35:00 - ERROR - stderr - 98%|█████████▊| 21933/22434 [21:27:20<20:57, 2.51s/it] +2025-02-06 07:35:02 - ERROR - stderr - 98%|█████████▊| 21934/22434 [21:27:22<21:12, 2.55s/it] +2025-02-06 07:35:02 - ERROR - stderr - +2025-02-06 07:35:02 - ERROR - stderr - +2025-02-06 07:35:02 - INFO - stdout - {'loss': 0.3516, 'grad_norm': 1.4236183166503906, 'learning_rate': 2.6043741459561565e-08, 'epoch': 2.93} +2025-02-06 07:35:02 - ERROR - stderr - 98%|█████████▊| 21934/22434 [21:27:22<21:12, 2.55s/it] +2025-02-06 07:35:05 - ERROR - stderr - 98%|█████████▊| 21935/22434 [21:27:25<21:01, 2.53s/it] +2025-02-06 07:35:05 - ERROR - stderr - +2025-02-06 07:35:05 - ERROR - stderr - +2025-02-06 07:35:05 - INFO - stdout - {'loss': 0.3709, 'grad_norm': 1.5016098022460938, 'learning_rate': 2.5939715684873967e-08, 'epoch': 2.93} +2025-02-06 07:35:05 - ERROR - stderr - 98%|█████████▊| 21935/22434 [21:27:25<21:01, 2.53s/it] +2025-02-06 07:35:07 - ERROR - stderr - 98%|█████████▊| 21936/22434 [21:27:27<20:55, 2.52s/it] +2025-02-06 07:35:07 - ERROR - stderr - +2025-02-06 07:35:07 - ERROR - stderr - +2025-02-06 07:35:07 - INFO - stdout - {'loss': 0.3465, 'grad_norm': 1.4501601457595825, 'learning_rate': 2.5835897809929345e-08, 'epoch': 2.93} +2025-02-06 07:35:07 - ERROR - stderr - 98%|█████████▊| 21936/22434 [21:27:27<20:55, 2.52s/it] +2025-02-06 07:35:10 - ERROR - stderr - 
98%|█████████▊| 21937/22434 [21:27:29<20:35, 2.49s/it] +2025-02-06 07:35:10 - ERROR - stderr - +2025-02-06 07:35:10 - ERROR - stderr - +2025-02-06 07:35:10 - INFO - stdout - {'loss': 0.4076, 'grad_norm': 1.6074968576431274, 'learning_rate': 2.5732287836890413e-08, 'epoch': 2.93} +2025-02-06 07:35:10 - ERROR - stderr - 98%|█████████▊| 21937/22434 [21:27:30<20:35, 2.49s/it] +2025-02-06 07:35:12 - ERROR - stderr - 98%|█████████▊| 21938/22434 [21:27:32<20:58, 2.54s/it] +2025-02-06 07:35:12 - ERROR - stderr - +2025-02-06 07:35:12 - ERROR - stderr - +2025-02-06 07:35:12 - INFO - stdout - {'loss': 0.3663, 'grad_norm': 1.5663450956344604, 'learning_rate': 2.5628885767918777e-08, 'epoch': 2.93} +2025-02-06 07:35:12 - ERROR - stderr - 98%|█████████▊| 21938/22434 [21:27:32<20:58, 2.54s/it] +2025-02-06 07:35:15 - ERROR - stderr - 98%|█████████▊| 21939/22434 [21:27:35<21:41, 2.63s/it] +2025-02-06 07:35:15 - ERROR - stderr - +2025-02-06 07:35:15 - ERROR - stderr - +2025-02-06 07:35:15 - INFO - stdout - {'loss': 0.3912, 'grad_norm': 1.5713156461715698, 'learning_rate': 2.5525691605167156e-08, 'epoch': 2.93} +2025-02-06 07:35:15 - ERROR - stderr - 98%|█████████▊| 21939/22434 [21:27:35<21:41, 2.63s/it] +2025-02-06 07:35:18 - ERROR - stderr - 98%|█████████▊| 21940/22434 [21:27:38<22:16, 2.71s/it] +2025-02-06 07:35:18 - ERROR - stderr - +2025-02-06 07:35:18 - ERROR - stderr - +2025-02-06 07:35:18 - INFO - stdout - {'loss': 0.2813, 'grad_norm': 1.2413362264633179, 'learning_rate': 2.542270535078828e-08, 'epoch': 2.93} +2025-02-06 07:35:18 - ERROR - stderr - 98%|█████████▊| 21940/22434 [21:27:38<22:16, 2.71s/it] +2025-02-06 07:35:21 - ERROR - stderr - 98%|█████████▊| 21941/22434 [21:27:40<21:40, 2.64s/it] +2025-02-06 07:35:21 - ERROR - stderr - +2025-02-06 07:35:21 - ERROR - stderr - +2025-02-06 07:35:21 - INFO - stdout - {'loss': 0.3956, 'grad_norm': 1.623618721961975, 'learning_rate': 2.5319927006929313e-08, 'epoch': 2.93} +2025-02-06 07:35:21 - ERROR - stderr - 98%|█████████▊| 
21941/22434 [21:27:40<21:40, 2.64s/it] +2025-02-06 07:35:23 - ERROR - stderr - 98%|█████████▊| 21942/22434 [21:27:43<21:31, 2.63s/it] +2025-02-06 07:35:23 - ERROR - stderr - +2025-02-06 07:35:23 - ERROR - stderr - +2025-02-06 07:35:23 - INFO - stdout - {'loss': 0.3408, 'grad_norm': 1.4769296646118164, 'learning_rate': 2.5217356575730767e-08, 'epoch': 2.93} +2025-02-06 07:35:23 - ERROR - stderr - 98%|█████████▊| 21942/22434 [21:27:43<21:31, 2.63s/it] +2025-02-06 07:35:26 - ERROR - stderr - 98%|█████████▊| 21943/22434 [21:27:45<21:10, 2.59s/it] +2025-02-06 07:35:26 - ERROR - stderr - +2025-02-06 07:35:26 - ERROR - stderr - +2025-02-06 07:35:26 - INFO - stdout - {'loss': 0.367, 'grad_norm': 1.7603259086608887, 'learning_rate': 2.5114994059333154e-08, 'epoch': 2.93} +2025-02-06 07:35:26 - ERROR - stderr - 98%|█████████▊| 21943/22434 [21:27:46<21:10, 2.59s/it] +2025-02-06 07:35:28 - ERROR - stderr - 98%|█████████▊| 21944/22434 [21:27:48<20:55, 2.56s/it] +2025-02-06 07:35:28 - ERROR - stderr - +2025-02-06 07:35:28 - ERROR - stderr - +2025-02-06 07:35:28 - INFO - stdout - {'loss': 0.3833, 'grad_norm': 1.5921375751495361, 'learning_rate': 2.5012839459866987e-08, 'epoch': 2.93} +2025-02-06 07:35:28 - ERROR - stderr - 98%|█████████▊| 21944/22434 [21:27:48<20:55, 2.56s/it] +2025-02-06 07:35:31 - ERROR - stderr - 98%|█████████▊| 21945/22434 [21:27:50<20:46, 2.55s/it] +2025-02-06 07:35:31 - ERROR - stderr - +2025-02-06 07:35:31 - ERROR - stderr - +2025-02-06 07:35:31 - INFO - stdout - {'loss': 0.3486, 'grad_norm': 1.488053321838379, 'learning_rate': 2.49108927794639e-08, 'epoch': 2.93} +2025-02-06 07:35:31 - ERROR - stderr - 98%|█████████▊| 21945/22434 [21:27:51<20:46, 2.55s/it] +2025-02-06 07:35:33 - ERROR - stderr - 98%|█████████▊| 21946/22434 [21:27:53<20:37, 2.54s/it] +2025-02-06 07:35:33 - ERROR - stderr - +2025-02-06 07:35:33 - ERROR - stderr - +2025-02-06 07:35:33 - INFO - stdout - {'loss': 0.3492, 'grad_norm': 1.5141801834106445, 'learning_rate': 2.480915402024775e-08, 
'epoch': 2.93} +2025-02-06 07:35:33 - ERROR - stderr - 98%|█████████▊| 21946/22434 [21:27:53<20:37, 2.54s/it] +2025-02-06 07:35:36 - ERROR - stderr - 98%|█████████▊| 21947/22434 [21:27:56<20:39, 2.54s/it] +2025-02-06 07:35:36 - ERROR - stderr - +2025-02-06 07:35:36 - ERROR - stderr - +2025-02-06 07:35:36 - INFO - stdout - {'loss': 0.3576, 'grad_norm': 1.4018319845199585, 'learning_rate': 2.4707623184339057e-08, 'epoch': 2.93} +2025-02-06 07:35:36 - ERROR - stderr - 98%|█████████▊| 21947/22434 [21:27:56<20:39, 2.54s/it] +2025-02-06 07:35:38 - ERROR - stderr - 98%|█████████▊| 21948/22434 [21:27:58<20:36, 2.54s/it] +2025-02-06 07:35:38 - ERROR - stderr - +2025-02-06 07:35:38 - ERROR - stderr - +2025-02-06 07:35:38 - INFO - stdout - {'loss': 0.3476, 'grad_norm': 1.4830055236816406, 'learning_rate': 2.4606300273856133e-08, 'epoch': 2.94} +2025-02-06 07:35:38 - ERROR - stderr - 98%|█████████▊| 21948/22434 [21:27:58<20:36, 2.54s/it] +2025-02-06 07:35:41 - ERROR - stderr - 98%|█████████▊| 21949/22434 [21:28:01<20:34, 2.55s/it] +2025-02-06 07:35:41 - ERROR - stderr - +2025-02-06 07:35:41 - ERROR - stderr - +2025-02-06 07:35:41 - INFO - stdout - {'loss': 0.3116, 'grad_norm': 1.474337100982666, 'learning_rate': 2.4505185290908396e-08, 'epoch': 2.94} +2025-02-06 07:35:41 - ERROR - stderr - 98%|█████████▊| 21949/22434 [21:28:01<20:34, 2.55s/it] +2025-02-06 07:35:43 - ERROR - stderr - 98%|█████████▊| 21950/22434 [21:28:03<20:17, 2.52s/it] +2025-02-06 07:35:43 - ERROR - stderr - +2025-02-06 07:35:43 - ERROR - stderr - +2025-02-06 07:35:43 - INFO - stdout - {'loss': 0.3885, 'grad_norm': 1.5377109050750732, 'learning_rate': 2.4404278237605272e-08, 'epoch': 2.94} +2025-02-06 07:35:43 - ERROR - stderr - 98%|█████████▊| 21950/22434 [21:28:03<20:17, 2.52s/it] +2025-02-06 07:35:46 - ERROR - stderr - 98%|█████████▊| 21951/22434 [21:28:06<20:37, 2.56s/it] +2025-02-06 07:35:46 - ERROR - stderr - +2025-02-06 07:35:46 - ERROR - stderr - +2025-02-06 07:35:46 - INFO - stdout - {'loss': 0.3726, 
'grad_norm': 1.4599922895431519, 'learning_rate': 2.4303579116048416e-08, 'epoch': 2.94} +2025-02-06 07:35:46 - ERROR - stderr - 98%|█████████▊| 21951/22434 [21:28:06<20:37, 2.56s/it] +2025-02-06 07:35:49 - ERROR - stderr - 98%|█████████▊| 21952/22434 [21:28:08<20:42, 2.58s/it] +2025-02-06 07:35:49 - ERROR - stderr - +2025-02-06 07:35:49 - ERROR - stderr - +2025-02-06 07:35:49 - INFO - stdout - {'loss': 0.3835, 'grad_norm': 1.669805884361267, 'learning_rate': 2.4203087928338366e-08, 'epoch': 2.94} +2025-02-06 07:35:49 - ERROR - stderr - 98%|█████████▊| 21952/22434 [21:28:08<20:42, 2.58s/it] +2025-02-06 07:35:51 - ERROR - stderr - 98%|█████████▊| 21953/22434 [21:28:11<20:35, 2.57s/it] +2025-02-06 07:35:51 - ERROR - stderr - +2025-02-06 07:35:51 - ERROR - stderr - +2025-02-06 07:35:51 - INFO - stdout - {'loss': 0.3867, 'grad_norm': 1.5633931159973145, 'learning_rate': 2.4102804676569004e-08, 'epoch': 2.94} +2025-02-06 07:35:51 - ERROR - stderr - 98%|█████████▊| 21953/22434 [21:28:11<20:35, 2.57s/it] +2025-02-06 07:35:54 - ERROR - stderr - 98%|█████████▊| 21954/22434 [21:28:13<20:24, 2.55s/it] +2025-02-06 07:35:54 - ERROR - stderr - +2025-02-06 07:35:54 - ERROR - stderr - +2025-02-06 07:35:54 - INFO - stdout - {'loss': 0.3499, 'grad_norm': 1.6189275979995728, 'learning_rate': 2.400272936283088e-08, 'epoch': 2.94} +2025-02-06 07:35:54 - ERROR - stderr - 98%|█████████▊| 21954/22434 [21:28:13<20:24, 2.55s/it] +2025-02-06 07:35:56 - ERROR - stderr - 98%|█████████▊| 21955/22434 [21:28:16<20:19, 2.55s/it] +2025-02-06 07:35:56 - ERROR - stderr - +2025-02-06 07:35:56 - ERROR - stderr - +2025-02-06 07:35:56 - INFO - stdout - {'loss': 0.3465, 'grad_norm': 1.5520025491714478, 'learning_rate': 2.3902861989208994e-08, 'epoch': 2.94} +2025-02-06 07:35:56 - ERROR - stderr - 98%|█████████▊| 21955/22434 [21:28:16<20:19, 2.55s/it] +2025-02-06 07:35:59 - ERROR - stderr - 98%|█████████▊| 21956/22434 [21:28:18<20:10, 2.53s/it] +2025-02-06 07:35:59 - ERROR - stderr - +2025-02-06 07:35:59 - 
ERROR - stderr - +2025-02-06 07:35:59 - INFO - stdout - {'loss': 0.3758, 'grad_norm': 1.5975831747055054, 'learning_rate': 2.380320255778723e-08, 'epoch': 2.94} +2025-02-06 07:35:59 - ERROR - stderr - 98%|█████████▊| 21956/22434 [21:28:19<20:10, 2.53s/it] +2025-02-06 07:36:01 - ERROR - stderr - 98%|█████████▊| 21957/22434 [21:28:21<20:22, 2.56s/it] +2025-02-06 07:36:01 - ERROR - stderr - +2025-02-06 07:36:01 - ERROR - stderr - +2025-02-06 07:36:01 - INFO - stdout - {'loss': 0.3705, 'grad_norm': 1.6494909524917603, 'learning_rate': 2.37037510706406e-08, 'epoch': 2.94} +2025-02-06 07:36:01 - ERROR - stderr - 98%|█████████▊| 21957/22434 [21:28:21<20:22, 2.56s/it] +2025-02-06 07:36:04 - ERROR - stderr - 98%|█████████▊| 21958/22434 [21:28:24<20:02, 2.53s/it] +2025-02-06 07:36:04 - ERROR - stderr - +2025-02-06 07:36:04 - ERROR - stderr - +2025-02-06 07:36:04 - INFO - stdout - {'loss': 0.3799, 'grad_norm': 1.594377040863037, 'learning_rate': 2.3604507529843e-08, 'epoch': 2.94} +2025-02-06 07:36:04 - ERROR - stderr - 98%|█████████▊| 21958/22434 [21:28:24<20:02, 2.53s/it] +2025-02-06 07:36:06 - ERROR - stderr - 98%|█████████▊| 21959/22434 [21:28:26<19:52, 2.51s/it] +2025-02-06 07:36:06 - ERROR - stderr - +2025-02-06 07:36:06 - ERROR - stderr - +2025-02-06 07:36:06 - INFO - stdout - {'loss': 0.2936, 'grad_norm': 1.3874338865280151, 'learning_rate': 2.3505471937463888e-08, 'epoch': 2.94} +2025-02-06 07:36:06 - ERROR - stderr - 98%|█████████▊| 21959/22434 [21:28:26<19:52, 2.51s/it] +2025-02-06 07:36:09 - ERROR - stderr - 98%|█████████▊| 21960/22434 [21:28:29<19:46, 2.50s/it] +2025-02-06 07:36:09 - ERROR - stderr - +2025-02-06 07:36:09 - ERROR - stderr - +2025-02-06 07:36:09 - INFO - stdout - {'loss': 0.3491, 'grad_norm': 1.533348560333252, 'learning_rate': 2.340664429556605e-08, 'epoch': 2.94} +2025-02-06 07:36:09 - ERROR - stderr - 98%|█████████▊| 21960/22434 [21:28:29<19:46, 2.50s/it] +2025-02-06 07:36:11 - ERROR - stderr - 98%|█████████▊| 21961/22434 [21:28:31<19:36, 
2.49s/it] +2025-02-06 07:36:11 - ERROR - stderr - +2025-02-06 07:36:11 - ERROR - stderr - +2025-02-06 07:36:11 - INFO - stdout - {'loss': 0.3753, 'grad_norm': 1.7203236818313599, 'learning_rate': 2.3308024606210066e-08, 'epoch': 2.94} +2025-02-06 07:36:11 - ERROR - stderr - 98%|█████████▊| 21961/22434 [21:28:31<19:36, 2.49s/it] +2025-02-06 07:36:14 - ERROR - stderr - 98%|█████████▊| 21962/22434 [21:28:33<19:36, 2.49s/it] +2025-02-06 07:36:14 - ERROR - stderr - +2025-02-06 07:36:14 - ERROR - stderr - +2025-02-06 07:36:14 - INFO - stdout - {'loss': 0.3965, 'grad_norm': 1.729691982269287, 'learning_rate': 2.320961287145207e-08, 'epoch': 2.94} +2025-02-06 07:36:14 - ERROR - stderr - 98%|█████████▊| 21962/22434 [21:28:34<19:36, 2.49s/it] +2025-02-06 07:36:16 - ERROR - stderr - 98%|█████████▊| 21963/22434 [21:28:36<19:45, 2.52s/it] +2025-02-06 07:36:16 - ERROR - stderr - +2025-02-06 07:36:16 - ERROR - stderr - +2025-02-06 07:36:16 - INFO - stdout - {'loss': 0.3899, 'grad_norm': 1.7557917833328247, 'learning_rate': 2.311140909334264e-08, 'epoch': 2.94} +2025-02-06 07:36:16 - ERROR - stderr - 98%|█████████▊| 21963/22434 [21:28:36<19:45, 2.52s/it] +2025-02-06 07:36:19 - ERROR - stderr - 98%|█████████▊| 21964/22434 [21:28:39<19:41, 2.51s/it] +2025-02-06 07:36:19 - ERROR - stderr - +2025-02-06 07:36:19 - ERROR - stderr - +2025-02-06 07:36:19 - INFO - stdout - {'loss': 0.4084, 'grad_norm': 1.9127912521362305, 'learning_rate': 2.301341327392903e-08, 'epoch': 2.94} +2025-02-06 07:36:19 - ERROR - stderr - 98%|█████████▊| 21964/22434 [21:28:39<19:41, 2.51s/it] +2025-02-06 07:36:21 - ERROR - stderr - 98%|█████████▊| 21965/22434 [21:28:41<19:30, 2.50s/it] +2025-02-06 07:36:21 - ERROR - stderr - +2025-02-06 07:36:21 - ERROR - stderr - +2025-02-06 07:36:21 - INFO - stdout - {'loss': 0.4268, 'grad_norm': 1.8002527952194214, 'learning_rate': 2.291562541525405e-08, 'epoch': 2.94} +2025-02-06 07:36:21 - ERROR - stderr - 98%|█████████▊| 21965/22434 [21:28:41<19:30, 2.50s/it] +2025-02-06 
07:36:24 - ERROR - stderr - 98%|█████████▊| 21966/22434 [21:28:44<20:26, 2.62s/it] +2025-02-06 07:36:24 - ERROR - stderr - +2025-02-06 07:36:24 - ERROR - stderr - +2025-02-06 07:36:24 - INFO - stdout - {'loss': 0.3225, 'grad_norm': 1.4639889001846313, 'learning_rate': 2.281804551935607e-08, 'epoch': 2.94} +2025-02-06 07:36:24 - ERROR - stderr - 98%|█████████▊| 21966/22434 [21:28:44<20:26, 2.62s/it] +2025-02-06 07:36:27 - ERROR - stderr - 98%|█████████▊| 21967/22434 [21:28:46<20:04, 2.58s/it] +2025-02-06 07:36:27 - ERROR - stderr - +2025-02-06 07:36:27 - ERROR - stderr - +2025-02-06 07:36:27 - INFO - stdout - {'loss': 0.3559, 'grad_norm': 1.4750975370407104, 'learning_rate': 2.2720673588269014e-08, 'epoch': 2.94} +2025-02-06 07:36:27 - ERROR - stderr - 98%|█████████▊| 21967/22434 [21:28:46<20:04, 2.58s/it] +2025-02-06 07:36:29 - ERROR - stderr - 98%|█████████▊| 21968/22434 [21:28:49<19:53, 2.56s/it] +2025-02-06 07:36:29 - ERROR - stderr - +2025-02-06 07:36:29 - ERROR - stderr - +2025-02-06 07:36:29 - INFO - stdout - {'loss': 0.3705, 'grad_norm': 1.6867755651474, 'learning_rate': 2.2623509624021266e-08, 'epoch': 2.94} +2025-02-06 07:36:29 - ERROR - stderr - 98%|█████████▊| 21968/22434 [21:28:49<19:53, 2.56s/it] +2025-02-06 07:36:32 - ERROR - stderr - 98%|█████████▊| 21969/22434 [21:28:51<19:38, 2.53s/it] +2025-02-06 07:36:32 - ERROR - stderr - +2025-02-06 07:36:32 - ERROR - stderr - +2025-02-06 07:36:32 - INFO - stdout - {'loss': 0.3853, 'grad_norm': 1.4584228992462158, 'learning_rate': 2.252655362864009e-08, 'epoch': 2.94} +2025-02-06 07:36:32 - ERROR - stderr - 98%|█████████▊| 21969/22434 [21:28:51<19:38, 2.53s/it] +2025-02-06 07:36:34 - ERROR - stderr - 98%|█████████▊| 21970/22434 [21:28:54<19:21, 2.50s/it] +2025-02-06 07:36:34 - ERROR - stderr - +2025-02-06 07:36:34 - ERROR - stderr - +2025-02-06 07:36:34 - INFO - stdout - {'loss': 0.3198, 'grad_norm': 1.5253322124481201, 'learning_rate': 2.2429805604144983e-08, 'epoch': 2.94} +2025-02-06 07:36:34 - ERROR - 
stderr - 98%|█████████▊| 21970/22434 [21:28:54<19:21, 2.50s/it] +2025-02-06 07:36:37 - ERROR - stderr - 98%|█████████▊| 21971/22434 [21:28:56<19:15, 2.50s/it] +2025-02-06 07:36:37 - ERROR - stderr - +2025-02-06 07:36:37 - ERROR - stderr - +2025-02-06 07:36:37 - INFO - stdout - {'loss': 0.365, 'grad_norm': 1.6178983449935913, 'learning_rate': 2.233326555255322e-08, 'epoch': 2.94} +2025-02-06 07:36:37 - ERROR - stderr - 98%|█████████▊| 21971/22434 [21:28:56<19:15, 2.50s/it] +2025-02-06 07:36:39 - ERROR - stderr - 98%|█████████▊| 21972/22434 [21:28:59<19:19, 2.51s/it] +2025-02-06 07:36:39 - ERROR - stderr - +2025-02-06 07:36:39 - ERROR - stderr - +2025-02-06 07:36:39 - INFO - stdout - {'loss': 0.3333, 'grad_norm': 1.4760771989822388, 'learning_rate': 2.223693347587652e-08, 'epoch': 2.94} +2025-02-06 07:36:39 - ERROR - stderr - 98%|█████████▊| 21972/22434 [21:28:59<19:19, 2.51s/it] +2025-02-06 07:36:42 - ERROR - stderr - 98%|█████████▊| 21973/22434 [21:29:01<19:16, 2.51s/it] +2025-02-06 07:36:42 - ERROR - stderr - +2025-02-06 07:36:42 - ERROR - stderr - +2025-02-06 07:36:42 - INFO - stdout - {'loss': 0.4297, 'grad_norm': 1.761681318283081, 'learning_rate': 2.2140809376124396e-08, 'epoch': 2.94} +2025-02-06 07:36:42 - ERROR - stderr - 98%|█████████▊| 21973/22434 [21:29:01<19:16, 2.51s/it] +2025-02-06 07:36:44 - ERROR - stderr - 98%|█████████▊| 21974/22434 [21:29:04<19:21, 2.52s/it] +2025-02-06 07:36:44 - ERROR - stderr - +2025-02-06 07:36:44 - ERROR - stderr - +2025-02-06 07:36:44 - INFO - stdout - {'loss': 0.3344, 'grad_norm': 1.506162405014038, 'learning_rate': 2.204489325529857e-08, 'epoch': 2.94} +2025-02-06 07:36:44 - ERROR - stderr - 98%|█████████▊| 21974/22434 [21:29:04<19:21, 2.52s/it] +2025-02-06 07:36:47 - ERROR - stderr - 98%|█████████▊| 21975/22434 [21:29:06<19:23, 2.54s/it] +2025-02-06 07:36:47 - ERROR - stderr - +2025-02-06 07:36:47 - ERROR - stderr - +2025-02-06 07:36:47 - INFO - stdout - {'loss': 0.366, 'grad_norm': 1.5161032676696777, 'learning_rate': 
2.1949185115398564e-08, 'epoch': 2.94} +2025-02-06 07:36:47 - ERROR - stderr - 98%|█████████▊| 21975/22434 [21:29:07<19:23, 2.54s/it] +2025-02-06 07:36:50 - ERROR - stderr - 98%|█████████▊| 21976/22434 [21:29:09<20:13, 2.65s/it] +2025-02-06 07:36:50 - ERROR - stderr - +2025-02-06 07:36:50 - ERROR - stderr - +2025-02-06 07:36:50 - INFO - stdout - {'loss': 0.3653, 'grad_norm': 1.6477665901184082, 'learning_rate': 2.1853684958420553e-08, 'epoch': 2.94} +2025-02-06 07:36:50 - ERROR - stderr - 98%|█████████▊| 21976/22434 [21:29:09<20:13, 2.65s/it] +2025-02-06 07:36:52 - ERROR - stderr - 98%|█████████▊| 21977/22434 [21:29:12<19:49, 2.60s/it] +2025-02-06 07:36:52 - ERROR - stderr - +2025-02-06 07:36:52 - ERROR - stderr - +2025-02-06 07:36:52 - INFO - stdout - {'loss': 0.4199, 'grad_norm': 1.7221617698669434, 'learning_rate': 2.1758392786354056e-08, 'epoch': 2.94} +2025-02-06 07:36:52 - ERROR - stderr - 98%|█████████▊| 21977/22434 [21:29:12<19:49, 2.60s/it] +2025-02-06 07:36:55 - ERROR - stderr - 98%|█████████▊| 21978/22434 [21:29:14<19:40, 2.59s/it] +2025-02-06 07:36:55 - ERROR - stderr - +2025-02-06 07:36:55 - ERROR - stderr - +2025-02-06 07:36:55 - INFO - stdout - {'loss': 0.3788, 'grad_norm': 1.5409016609191895, 'learning_rate': 2.166330860118637e-08, 'epoch': 2.94} +2025-02-06 07:36:55 - ERROR - stderr - 98%|█████████▊| 21978/22434 [21:29:14<19:40, 2.59s/it] +2025-02-06 07:36:57 - ERROR - stderr - 98%|█████████▊| 21979/22434 [21:29:17<19:23, 2.56s/it] +2025-02-06 07:36:57 - ERROR - stderr - +2025-02-06 07:36:57 - ERROR - stderr - +2025-02-06 07:36:57 - INFO - stdout - {'loss': 0.3806, 'grad_norm': 1.6811773777008057, 'learning_rate': 2.1568432404898144e-08, 'epoch': 2.94} +2025-02-06 07:36:57 - ERROR - stderr - 98%|█████████▊| 21979/22434 [21:29:17<19:23, 2.56s/it] +2025-02-06 07:37:00 - ERROR - stderr - 98%|█████████▊| 21980/22434 [21:29:19<19:12, 2.54s/it] +2025-02-06 07:37:00 - ERROR - stderr - +2025-02-06 07:37:00 - ERROR - stderr - +2025-02-06 07:37:00 - INFO - 
stdout - {'loss': 0.372, 'grad_norm': 1.5992987155914307, 'learning_rate': 2.1473764199467784e-08, 'epoch': 2.94} +2025-02-06 07:37:00 - ERROR - stderr - 98%|█████████▊| 21980/22434 [21:29:19<19:12, 2.54s/it] +2025-02-06 07:37:02 - ERROR - stderr - 98%|█████████▊| 21981/22434 [21:29:22<19:38, 2.60s/it] +2025-02-06 07:37:02 - ERROR - stderr - +2025-02-06 07:37:02 - ERROR - stderr - +2025-02-06 07:37:02 - INFO - stdout - {'loss': 0.3218, 'grad_norm': 1.4680454730987549, 'learning_rate': 2.137930398686816e-08, 'epoch': 2.94} +2025-02-06 07:37:02 - ERROR - stderr - 98%|█████████▊| 21981/22434 [21:29:22<19:38, 2.60s/it] +2025-02-06 07:37:05 - ERROR - stderr - 98%|█████████▊| 21982/22434 [21:29:25<19:13, 2.55s/it] +2025-02-06 07:37:05 - ERROR - stderr - +2025-02-06 07:37:05 - ERROR - stderr - +2025-02-06 07:37:05 - INFO - stdout - {'loss': 0.3565, 'grad_norm': 1.5107353925704956, 'learning_rate': 2.128505176906881e-08, 'epoch': 2.94} +2025-02-06 07:37:05 - ERROR - stderr - 98%|█████████▊| 21982/22434 [21:29:25<19:13, 2.55s/it] +2025-02-06 07:37:07 - ERROR - stderr - 98%|█████████▊| 21983/22434 [21:29:27<19:03, 2.54s/it] +2025-02-06 07:37:07 - ERROR - stderr - +2025-02-06 07:37:07 - ERROR - stderr - +2025-02-06 07:37:07 - INFO - stdout - {'loss': 0.3336, 'grad_norm': 1.51665198802948, 'learning_rate': 2.1191007548033715e-08, 'epoch': 2.94} +2025-02-06 07:37:07 - ERROR - stderr - 98%|█████████▊| 21983/22434 [21:29:27<19:03, 2.54s/it] +2025-02-06 07:37:10 - ERROR - stderr - 98%|█████████▊| 21984/22434 [21:29:30<18:52, 2.52s/it] +2025-02-06 07:37:10 - ERROR - stderr - +2025-02-06 07:37:10 - ERROR - stderr - +2025-02-06 07:37:10 - INFO - stdout - {'loss': 0.3916, 'grad_norm': 1.5794954299926758, 'learning_rate': 2.109717132572353e-08, 'epoch': 2.94} +2025-02-06 07:37:10 - ERROR - stderr - 98%|█████████▊| 21984/22434 [21:29:30<18:52, 2.52s/it] +2025-02-06 07:37:12 - ERROR - stderr - 98%|█████████▊| 21985/22434 [21:29:32<18:53, 2.52s/it] +2025-02-06 07:37:12 - ERROR - stderr - 
+2025-02-06 07:37:12 - ERROR - stderr - +2025-02-06 07:37:12 - INFO - stdout - {'loss': 0.3547, 'grad_norm': 1.8279074430465698, 'learning_rate': 2.1003543104093362e-08, 'epoch': 2.94} +2025-02-06 07:37:12 - ERROR - stderr - 98%|█████████▊| 21985/22434 [21:29:32<18:53, 2.52s/it] +2025-02-06 07:37:15 - ERROR - stderr - 98%|█████████▊| 21986/22434 [21:29:35<18:51, 2.52s/it] +2025-02-06 07:37:15 - ERROR - stderr - +2025-02-06 07:37:15 - ERROR - stderr - +2025-02-06 07:37:15 - INFO - stdout - {'loss': 0.343, 'grad_norm': 1.580827236175537, 'learning_rate': 2.0910122885097194e-08, 'epoch': 2.94} +2025-02-06 07:37:15 - ERROR - stderr - 98%|█████████▊| 21986/22434 [21:29:35<18:51, 2.52s/it] +2025-02-06 07:37:17 - ERROR - stderr - 98%|█████████▊| 21987/22434 [21:29:37<18:54, 2.54s/it] +2025-02-06 07:37:17 - ERROR - stderr - +2025-02-06 07:37:17 - ERROR - stderr - +2025-02-06 07:37:17 - INFO - stdout - {'loss': 0.3938, 'grad_norm': 1.5968875885009766, 'learning_rate': 2.0816910670679035e-08, 'epoch': 2.94} +2025-02-06 07:37:17 - ERROR - stderr - 98%|█████████▊| 21987/22434 [21:29:37<18:54, 2.54s/it] +2025-02-06 07:37:20 - ERROR - stderr - 98%|█████████▊| 21988/22434 [21:29:40<18:43, 2.52s/it] +2025-02-06 07:37:20 - ERROR - stderr - +2025-02-06 07:37:20 - ERROR - stderr - +2025-02-06 07:37:20 - INFO - stdout - {'loss': 0.3564, 'grad_norm': 1.5079947710037231, 'learning_rate': 2.0723906462783995e-08, 'epoch': 2.94} +2025-02-06 07:37:20 - ERROR - stderr - 98%|█████████▊| 21988/22434 [21:29:40<18:43, 2.52s/it] +2025-02-06 07:37:22 - ERROR - stderr - 98%|█████████▊| 21989/22434 [21:29:42<18:35, 2.51s/it] +2025-02-06 07:37:22 - ERROR - stderr - +2025-02-06 07:37:22 - ERROR - stderr - +2025-02-06 07:37:22 - INFO - stdout - {'loss': 0.3564, 'grad_norm': 1.4081978797912598, 'learning_rate': 2.063111026334941e-08, 'epoch': 2.94} +2025-02-06 07:37:22 - ERROR - stderr - 98%|█████████▊| 21989/22434 [21:29:42<18:35, 2.51s/it] +2025-02-06 07:37:25 - ERROR - stderr - 98%|█████████▊| 
21990/22434 [21:29:45<18:23, 2.48s/it] +2025-02-06 07:37:25 - ERROR - stderr - +2025-02-06 07:37:25 - ERROR - stderr - +2025-02-06 07:37:25 - INFO - stdout - {'loss': 0.4195, 'grad_norm': 1.8406225442886353, 'learning_rate': 2.0538522074310395e-08, 'epoch': 2.94} +2025-02-06 07:37:25 - ERROR - stderr - 98%|█████████▊| 21990/22434 [21:29:45<18:23, 2.48s/it] +2025-02-06 07:37:27 - ERROR - stderr - 98%|█████████▊| 21991/22434 [21:29:47<18:16, 2.47s/it] +2025-02-06 07:37:27 - ERROR - stderr - +2025-02-06 07:37:27 - ERROR - stderr - +2025-02-06 07:37:27 - INFO - stdout - {'loss': 0.3978, 'grad_norm': 1.7217954397201538, 'learning_rate': 2.0446141897596528e-08, 'epoch': 2.94} +2025-02-06 07:37:27 - ERROR - stderr - 98%|█████████▊| 21991/22434 [21:29:47<18:16, 2.47s/it] +2025-02-06 07:37:30 - ERROR - stderr - 98%|█████████▊| 21992/22434 [21:29:50<18:25, 2.50s/it] +2025-02-06 07:37:30 - ERROR - stderr - +2025-02-06 07:37:30 - ERROR - stderr - +2025-02-06 07:37:30 - INFO - stdout - {'loss': 0.2995, 'grad_norm': 1.5998014211654663, 'learning_rate': 2.0353969735134037e-08, 'epoch': 2.94} +2025-02-06 07:37:30 - ERROR - stderr - 98%|█████████▊| 21992/22434 [21:29:50<18:25, 2.50s/it] +2025-02-06 07:37:32 - ERROR - stderr - 98%|█████████▊| 21993/22434 [21:29:52<18:17, 2.49s/it] +2025-02-06 07:37:32 - ERROR - stderr - +2025-02-06 07:37:32 - ERROR - stderr - +2025-02-06 07:37:32 - INFO - stdout - {'loss': 0.3098, 'grad_norm': 1.3732471466064453, 'learning_rate': 2.0262005588842503e-08, 'epoch': 2.94} +2025-02-06 07:37:32 - ERROR - stderr - 98%|█████████▊| 21993/22434 [21:29:52<18:17, 2.49s/it] +2025-02-06 07:37:35 - ERROR - stderr - 98%|█████████▊| 21994/22434 [21:29:55<18:40, 2.55s/it] +2025-02-06 07:37:35 - ERROR - stderr - +2025-02-06 07:37:35 - ERROR - stderr - +2025-02-06 07:37:35 - INFO - stdout - {'loss': 0.3422, 'grad_norm': 1.4810923337936401, 'learning_rate': 2.01702494606415e-08, 'epoch': 2.94} +2025-02-06 07:37:35 - ERROR - stderr - 98%|█████████▊| 21994/22434 
[21:29:55<18:40, 2.55s/it] +2025-02-06 07:37:37 - ERROR - stderr - 98%|█████████▊| 21995/22434 [21:29:57<18:26, 2.52s/it] +2025-02-06 07:37:37 - ERROR - stderr - +2025-02-06 07:37:37 - ERROR - stderr - +2025-02-06 07:37:37 - INFO - stdout - {'loss': 0.3302, 'grad_norm': 1.4648445844650269, 'learning_rate': 2.007870135244061e-08, 'epoch': 2.94} +2025-02-06 07:37:37 - ERROR - stderr - 98%|█████████▊| 21995/22434 [21:29:57<18:26, 2.52s/it] +2025-02-06 07:37:40 - ERROR - stderr - 98%|█████████▊| 21996/22434 [21:30:00<18:16, 2.50s/it] +2025-02-06 07:37:40 - ERROR - stderr - +2025-02-06 07:37:40 - ERROR - stderr - +2025-02-06 07:37:40 - INFO - stdout - {'loss': 0.4216, 'grad_norm': 1.5751278400421143, 'learning_rate': 1.998736126614942e-08, 'epoch': 2.94} +2025-02-06 07:37:40 - ERROR - stderr - 98%|█████████▊| 21996/22434 [21:30:00<18:16, 2.50s/it] +2025-02-06 07:37:42 - ERROR - stderr - 98%|█████████▊| 21997/22434 [21:30:02<18:16, 2.51s/it] +2025-02-06 07:37:42 - ERROR - stderr - +2025-02-06 07:37:42 - ERROR - stderr - +2025-02-06 07:37:42 - INFO - stdout - {'loss': 0.3534, 'grad_norm': 1.7293936014175415, 'learning_rate': 1.9896229203671956e-08, 'epoch': 2.94} +2025-02-06 07:37:42 - ERROR - stderr - 98%|█████████▊| 21997/22434 [21:30:02<18:16, 2.51s/it] +2025-02-06 07:37:45 - ERROR - stderr - 98%|█████████▊| 21998/22434 [21:30:05<18:16, 2.52s/it] +2025-02-06 07:37:45 - ERROR - stderr - +2025-02-06 07:37:45 - ERROR - stderr - +2025-02-06 07:37:45 - INFO - stdout - {'loss': 0.3931, 'grad_norm': 1.7084097862243652, 'learning_rate': 1.9805305166908926e-08, 'epoch': 2.94} +2025-02-06 07:37:45 - ERROR - stderr - 98%|█████████▊| 21998/22434 [21:30:05<18:16, 2.52s/it] +2025-02-06 07:37:47 - ERROR - stderr - 98%|█████████▊| 21999/22434 [21:30:07<18:10, 2.51s/it] +2025-02-06 07:37:47 - ERROR - stderr - +2025-02-06 07:37:47 - ERROR - stderr - +2025-02-06 07:37:47 - INFO - stdout - {'loss': 0.391, 'grad_norm': 1.4307655096054077, 'learning_rate': 1.9714589157753262e-08, 'epoch': 
2.94} +2025-02-06 07:37:47 - ERROR - stderr - 98%|█████████▊| 21999/22434 [21:30:07<18:10, 2.51s/it] +2025-02-06 07:37:50 - ERROR - stderr - 98%|█████████▊| 22000/22434 [21:30:10<18:09, 2.51s/it] +2025-02-06 07:37:50 - ERROR - stderr - +2025-02-06 07:37:50 - ERROR - stderr - +2025-02-06 07:37:50 - INFO - stdout - {'loss': 0.3383, 'grad_norm': 1.5646039247512817, 'learning_rate': 1.9624081178096777e-08, 'epoch': 2.94} +2025-02-06 07:37:50 - ERROR - stderr - 98%|█████████▊| 22000/22434 [21:30:10<18:09, 2.51s/it] +2025-02-06 07:37:52 - ERROR - stderr - 98%|█████████▊| 22001/22434 [21:30:12<18:08, 2.51s/it] +2025-02-06 07:37:53 - ERROR - stderr - +2025-02-06 07:37:53 - ERROR - stderr - +2025-02-06 07:37:53 - INFO - stdout - {'loss': 0.3931, 'grad_norm': 1.6654305458068848, 'learning_rate': 1.9533781229825742e-08, 'epoch': 2.94} +2025-02-06 07:37:53 - ERROR - stderr - 98%|█████████▊| 22001/22434 [21:30:12<18:08, 2.51s/it] +2025-02-06 07:37:55 - ERROR - stderr - 98%|█████████▊| 22002/22434 [21:30:15<18:00, 2.50s/it] +2025-02-06 07:37:55 - ERROR - stderr - +2025-02-06 07:37:55 - ERROR - stderr - +2025-02-06 07:37:55 - INFO - stdout - {'loss': 0.3772, 'grad_norm': 1.5475987195968628, 'learning_rate': 1.94436893148231e-08, 'epoch': 2.94} +2025-02-06 07:37:55 - ERROR - stderr - 98%|█████████▊| 22002/22434 [21:30:15<18:00, 2.50s/it] +2025-02-06 07:37:57 - ERROR - stderr - 98%|█████████▊| 22003/22434 [21:30:17<18:02, 2.51s/it] +2025-02-06 07:37:58 - ERROR - stderr - +2025-02-06 07:37:58 - ERROR - stderr - +2025-02-06 07:37:58 - INFO - stdout - {'loss': 0.3468, 'grad_norm': 1.4131194353103638, 'learning_rate': 1.9353805434967343e-08, 'epoch': 2.94} +2025-02-06 07:37:58 - ERROR - stderr - 98%|█████████▊| 22003/22434 [21:30:17<18:02, 2.51s/it] +2025-02-06 07:38:00 - ERROR - stderr - 98%|█████████▊| 22004/22434 [21:30:20<18:23, 2.57s/it] +2025-02-06 07:38:00 - ERROR - stderr - +2025-02-06 07:38:00 - ERROR - stderr - +2025-02-06 07:38:00 - INFO - stdout - {'loss': 0.395, 
'grad_norm': 1.6756306886672974, 'learning_rate': 1.926412959213031e-08, 'epoch': 2.94} +2025-02-06 07:38:00 - ERROR - stderr - 98%|█████████▊| 22004/22434 [21:30:20<18:23, 2.57s/it] +2025-02-06 07:38:03 - ERROR - stderr - 98%|█████████▊| 22005/22434 [21:30:22<18:05, 2.53s/it] +2025-02-06 07:38:03 - ERROR - stderr - +2025-02-06 07:38:03 - ERROR - stderr - +2025-02-06 07:38:03 - INFO - stdout - {'loss': 0.3761, 'grad_norm': 1.6910536289215088, 'learning_rate': 1.9174661788181613e-08, 'epoch': 2.94} +2025-02-06 07:38:03 - ERROR - stderr - 98%|█████████▊| 22005/22434 [21:30:22<18:05, 2.53s/it] +2025-02-06 07:38:05 - ERROR - stderr - 98%|█████████▊| 22006/22434 [21:30:25<18:09, 2.55s/it] +2025-02-06 07:38:05 - ERROR - stderr - +2025-02-06 07:38:05 - ERROR - stderr - +2025-02-06 07:38:05 - INFO - stdout - {'loss': 0.3658, 'grad_norm': 1.5505757331848145, 'learning_rate': 1.9085402024987542e-08, 'epoch': 2.94} +2025-02-06 07:38:05 - ERROR - stderr - 98%|█████████▊| 22006/22434 [21:30:25<18:09, 2.55s/it] +2025-02-06 07:38:08 - ERROR - stderr - 98%|█████████▊| 22007/22434 [21:30:27<17:52, 2.51s/it] +2025-02-06 07:38:08 - ERROR - stderr - +2025-02-06 07:38:08 - ERROR - stderr - +2025-02-06 07:38:08 - INFO - stdout - {'loss': 0.3579, 'grad_norm': 1.7231221199035645, 'learning_rate': 1.8996350304406607e-08, 'epoch': 2.94} +2025-02-06 07:38:08 - ERROR - stderr - 98%|█████████▊| 22007/22434 [21:30:27<17:52, 2.51s/it] +2025-02-06 07:38:10 - ERROR - stderr - 98%|█████████▊| 22008/22434 [21:30:30<18:02, 2.54s/it] +2025-02-06 07:38:10 - ERROR - stderr - +2025-02-06 07:38:10 - ERROR - stderr - +2025-02-06 07:38:10 - INFO - stdout - {'loss': 0.3396, 'grad_norm': 1.4031049013137817, 'learning_rate': 1.8907506628296212e-08, 'epoch': 2.94} +2025-02-06 07:38:10 - ERROR - stderr - 98%|█████████▊| 22008/22434 [21:30:30<18:02, 2.54s/it] +2025-02-06 07:38:13 - ERROR - stderr - 98%|█████████▊| 22009/22434 [21:30:33<17:54, 2.53s/it] +2025-02-06 07:38:13 - ERROR - stderr - +2025-02-06 07:38:13 
- ERROR - stderr - +2025-02-06 07:38:13 - INFO - stdout - {'loss': 0.3563, 'grad_norm': 1.6705553531646729, 'learning_rate': 1.881887099850821e-08, 'epoch': 2.94} +2025-02-06 07:38:13 - ERROR - stderr - 98%|█████████▊| 22009/22434 [21:30:33<17:54, 2.53s/it] +2025-02-06 07:38:15 - ERROR - stderr - 98%|█████████▊| 22010/22434 [21:30:35<17:37, 2.49s/it] +2025-02-06 07:38:15 - ERROR - stderr - +2025-02-06 07:38:15 - ERROR - stderr - +2025-02-06 07:38:15 - INFO - stdout - {'loss': 0.3396, 'grad_norm': 1.3488579988479614, 'learning_rate': 1.873044341689001e-08, 'epoch': 2.94} +2025-02-06 07:38:15 - ERROR - stderr - 98%|█████████▊| 22010/22434 [21:30:35<17:37, 2.49s/it] +2025-02-06 07:38:18 - ERROR - stderr - 98%|█████████▊| 22011/22434 [21:30:37<17:40, 2.51s/it] +2025-02-06 07:38:18 - ERROR - stderr - +2025-02-06 07:38:18 - ERROR - stderr - +2025-02-06 07:38:18 - INFO - stdout - {'loss': 0.3063, 'grad_norm': 1.3270381689071655, 'learning_rate': 1.8642223885283474e-08, 'epoch': 2.94} +2025-02-06 07:38:18 - ERROR - stderr - 98%|█████████▊| 22011/22434 [21:30:38<17:40, 2.51s/it] +2025-02-06 07:38:18 - INFO - stdout - WARNING: tokenization mismatch: 1 vs. 55. 
(ignored) +2025-02-06 07:38:20 - ERROR - stderr - 98%|█████████▊| 22012/22434 [21:30:40<17:57, 2.55s/it] +2025-02-06 07:38:20 - ERROR - stderr - +2025-02-06 07:38:20 - ERROR - stderr - +2025-02-06 07:38:20 - INFO - stdout - {'loss': 0.3448, 'grad_norm': 1.6327368021011353, 'learning_rate': 1.8554212405530457e-08, 'epoch': 2.94} +2025-02-06 07:38:20 - ERROR - stderr - 98%|█████████▊| 22012/22434 [21:30:40<17:57, 2.55s/it] +2025-02-06 07:38:23 - ERROR - stderr - 98%|█████████▊| 22013/22434 [21:30:43<18:04, 2.58s/it] +2025-02-06 07:38:23 - ERROR - stderr - +2025-02-06 07:38:23 - ERROR - stderr - +2025-02-06 07:38:23 - INFO - stdout - {'loss': 0.3815, 'grad_norm': 1.7653745412826538, 'learning_rate': 1.8466408979461724e-08, 'epoch': 2.94} +2025-02-06 07:38:23 - ERROR - stderr - 98%|█████████▊| 22013/22434 [21:30:43<18:04, 2.58s/it] +2025-02-06 07:38:26 - ERROR - stderr - 98%|█████████▊| 22014/22434 [21:30:45<18:17, 2.61s/it] +2025-02-06 07:38:26 - ERROR - stderr - +2025-02-06 07:38:26 - ERROR - stderr - +2025-02-06 07:38:26 - INFO - stdout - {'loss': 0.3747, 'grad_norm': 1.6860569715499878, 'learning_rate': 1.837881360891136e-08, 'epoch': 2.94} +2025-02-06 07:38:26 - ERROR - stderr - 98%|█████████▊| 22014/22434 [21:30:46<18:17, 2.61s/it] +2025-02-06 07:38:28 - ERROR - stderr - 98%|█████████▊| 22015/22434 [21:30:48<18:15, 2.61s/it] +2025-02-06 07:38:28 - ERROR - stderr - +2025-02-06 07:38:28 - ERROR - stderr - +2025-02-06 07:38:28 - INFO - stdout - {'loss': 0.3469, 'grad_norm': 1.3813573122024536, 'learning_rate': 1.8291426295702353e-08, 'epoch': 2.94} +2025-02-06 07:38:28 - ERROR - stderr - 98%|█████████▊| 22015/22434 [21:30:48<18:15, 2.61s/it] +2025-02-06 07:38:31 - ERROR - stderr - 98%|█████████▊| 22016/22434 [21:30:51<18:08, 2.60s/it] +2025-02-06 07:38:31 - ERROR - stderr - +2025-02-06 07:38:31 - ERROR - stderr - +2025-02-06 07:38:31 - INFO - stdout - {'loss': 0.3488, 'grad_norm': 1.641084909439087, 'learning_rate': 1.8204247041656576e-08, 'epoch': 2.94} +2025-02-06 
07:38:31 - ERROR - stderr - 98%|█████████▊| 22016/22434 [21:30:51<18:08, 2.60s/it] +2025-02-06 07:38:33 - ERROR - stderr - 98%|█████████▊| 22017/22434 [21:30:53<17:55, 2.58s/it] +2025-02-06 07:38:33 - ERROR - stderr - +2025-02-06 07:38:33 - ERROR - stderr - +2025-02-06 07:38:33 - INFO - stdout - {'loss': 0.3248, 'grad_norm': 1.4626497030258179, 'learning_rate': 1.8117275848592574e-08, 'epoch': 2.94} +2025-02-06 07:38:33 - ERROR - stderr - 98%|█████████▊| 22017/22434 [21:30:53<17:55, 2.58s/it] +2025-02-06 07:38:36 - ERROR - stderr - 98%|█████████▊| 22018/22434 [21:30:56<17:43, 2.56s/it] +2025-02-06 07:38:36 - ERROR - stderr - +2025-02-06 07:38:36 - ERROR - stderr - +2025-02-06 07:38:36 - INFO - stdout - {'loss': 0.3949, 'grad_norm': 1.724860668182373, 'learning_rate': 1.8030512718322235e-08, 'epoch': 2.94} +2025-02-06 07:38:36 - ERROR - stderr - 98%|█████████▊| 22018/22434 [21:30:56<17:43, 2.56s/it] +2025-02-06 07:38:38 - ERROR - stderr - 98%|█████████▊| 22019/22434 [21:30:58<17:28, 2.53s/it] +2025-02-06 07:38:38 - ERROR - stderr - +2025-02-06 07:38:38 - ERROR - stderr - +2025-02-06 07:38:38 - INFO - stdout - {'loss': 0.3814, 'grad_norm': 1.7224942445755005, 'learning_rate': 1.7943957652653e-08, 'epoch': 2.94} +2025-02-06 07:38:38 - ERROR - stderr - 98%|█████████▊| 22019/22434 [21:30:58<17:28, 2.53s/it] +2025-02-06 07:38:41 - ERROR - stderr - 98%|█████████▊| 22020/22434 [21:31:01<17:22, 2.52s/it] +2025-02-06 07:38:41 - ERROR - stderr - +2025-02-06 07:38:41 - ERROR - stderr - +2025-02-06 07:38:41 - INFO - stdout - {'loss': 0.3621, 'grad_norm': 1.5102858543395996, 'learning_rate': 1.7857610653391198e-08, 'epoch': 2.94} +2025-02-06 07:38:41 - ERROR - stderr - 98%|█████████▊| 22020/22434 [21:31:01<17:22, 2.52s/it] +2025-02-06 07:38:43 - ERROR - stderr - 98%|█████████▊| 22021/22434 [21:31:03<17:18, 2.52s/it] +2025-02-06 07:38:43 - ERROR - stderr - +2025-02-06 07:38:43 - ERROR - stderr - +2025-02-06 07:38:43 - INFO - stdout - {'loss': 0.351, 'grad_norm': 
1.4710279703140259, 'learning_rate': 1.77714717223354e-08, 'epoch': 2.94} +2025-02-06 07:38:43 - ERROR - stderr - 98%|█████████▊| 22021/22434 [21:31:03<17:18, 2.52s/it] +2025-02-06 07:38:46 - ERROR - stderr - 98%|█████████▊| 22022/22434 [21:31:06<17:13, 2.51s/it] +2025-02-06 07:38:46 - ERROR - stderr - +2025-02-06 07:38:46 - ERROR - stderr - +2025-02-06 07:38:46 - INFO - stdout - {'loss': 0.3118, 'grad_norm': 1.602555513381958, 'learning_rate': 1.7685540861281937e-08, 'epoch': 2.94} +2025-02-06 07:38:46 - ERROR - stderr - 98%|█████████▊| 22022/22434 [21:31:06<17:13, 2.51s/it] +2025-02-06 07:38:48 - ERROR - stderr - 98%|█████████▊| 22023/22434 [21:31:08<17:10, 2.51s/it] +2025-02-06 07:38:48 - ERROR - stderr - +2025-02-06 07:38:48 - ERROR - stderr - +2025-02-06 07:38:48 - INFO - stdout - {'loss': 0.3841, 'grad_norm': 1.6551854610443115, 'learning_rate': 1.7599818072020492e-08, 'epoch': 2.95} +2025-02-06 07:38:48 - ERROR - stderr - 98%|█████████▊| 22023/22434 [21:31:08<17:10, 2.51s/it] +2025-02-06 07:38:51 - ERROR - stderr - 98%|█████████▊| 22024/22434 [21:31:11<17:09, 2.51s/it] +2025-02-06 07:38:51 - ERROR - stderr - +2025-02-06 07:38:51 - ERROR - stderr - +2025-02-06 07:38:51 - INFO - stdout - {'loss': 0.385, 'grad_norm': 1.7148544788360596, 'learning_rate': 1.7514303356339635e-08, 'epoch': 2.95} +2025-02-06 07:38:51 - ERROR - stderr - 98%|█████████▊| 22024/22434 [21:31:11<17:09, 2.51s/it] +2025-02-06 07:38:53 - ERROR - stderr - 98%|█████████▊| 22025/22434 [21:31:13<17:04, 2.50s/it] +2025-02-06 07:38:53 - ERROR - stderr - +2025-02-06 07:38:53 - ERROR - stderr - +2025-02-06 07:38:53 - INFO - stdout - {'loss': 0.3339, 'grad_norm': 1.586101770401001, 'learning_rate': 1.7428996716020163e-08, 'epoch': 2.95} +2025-02-06 07:38:53 - ERROR - stderr - 98%|█████████▊| 22025/22434 [21:31:13<17:04, 2.50s/it] +2025-02-06 07:38:56 - ERROR - stderr - 98%|█████████▊| 22026/22434 [21:31:16<16:56, 2.49s/it] +2025-02-06 07:38:56 - ERROR - stderr - +2025-02-06 07:38:56 - ERROR - stderr 
- +2025-02-06 07:38:56 - INFO - stdout - {'loss': 0.3063, 'grad_norm': 1.5500149726867676, 'learning_rate': 1.7343898152841765e-08, 'epoch': 2.95} +2025-02-06 07:38:56 - ERROR - stderr - 98%|█████████▊| 22026/22434 [21:31:16<16:56, 2.49s/it] +2025-02-06 07:38:58 - ERROR - stderr - 98%|█████████▊| 22027/22434 [21:31:18<16:51, 2.49s/it] +2025-02-06 07:38:58 - ERROR - stderr - +2025-02-06 07:38:58 - ERROR - stderr - +2025-02-06 07:38:58 - INFO - stdout - {'loss': 0.3946, 'grad_norm': 1.5457959175109863, 'learning_rate': 1.7259007668576355e-08, 'epoch': 2.95} +2025-02-06 07:38:58 - ERROR - stderr - 98%|█████████▊| 22027/22434 [21:31:18<16:51, 2.49s/it] +2025-02-06 07:39:01 - ERROR - stderr - 98%|█████████▊| 22028/22434 [21:31:21<16:47, 2.48s/it] +2025-02-06 07:39:01 - ERROR - stderr - +2025-02-06 07:39:01 - ERROR - stderr - +2025-02-06 07:39:01 - INFO - stdout - {'loss': 0.365, 'grad_norm': 1.6086974143981934, 'learning_rate': 1.717432526499474e-08, 'epoch': 2.95} +2025-02-06 07:39:01 - ERROR - stderr - 98%|█████████▊| 22028/22434 [21:31:21<16:47, 2.48s/it] +2025-02-06 07:39:03 - ERROR - stderr - 98%|█████████▊| 22029/22434 [21:31:23<16:47, 2.49s/it] +2025-02-06 07:39:03 - ERROR - stderr - +2025-02-06 07:39:03 - ERROR - stderr - +2025-02-06 07:39:03 - INFO - stdout - {'loss': 0.3924, 'grad_norm': 1.6505303382873535, 'learning_rate': 1.7089850943862175e-08, 'epoch': 2.95} +2025-02-06 07:39:03 - ERROR - stderr - 98%|█████████▊| 22029/22434 [21:31:23<16:47, 2.49s/it] +2025-02-06 07:39:06 - ERROR - stderr - 98%|█████████▊| 22030/22434 [21:31:26<16:55, 2.51s/it] +2025-02-06 07:39:06 - ERROR - stderr - +2025-02-06 07:39:06 - ERROR - stderr - +2025-02-06 07:39:06 - INFO - stdout - {'loss': 0.411, 'grad_norm': 1.7740213871002197, 'learning_rate': 1.700558470693836e-08, 'epoch': 2.95} +2025-02-06 07:39:06 - ERROR - stderr - 98%|█████████▊| 22030/22434 [21:31:26<16:55, 2.51s/it] +2025-02-06 07:39:08 - ERROR - stderr - 98%|█████████▊| 22031/22434 [21:31:28<17:06, 2.55s/it] 
+2025-02-06 07:39:09 - ERROR - stderr - +2025-02-06 07:39:09 - ERROR - stderr - +2025-02-06 07:39:09 - INFO - stdout - {'loss': 0.3486, 'grad_norm': 1.459455966949463, 'learning_rate': 1.6921526555981894e-08, 'epoch': 2.95} +2025-02-06 07:39:09 - ERROR - stderr - 98%|█████████▊| 22031/22434 [21:31:28<17:06, 2.55s/it] +2025-02-06 07:39:11 - ERROR - stderr - 98%|█████████▊| 22032/22434 [21:31:31<16:58, 2.53s/it] +2025-02-06 07:39:11 - ERROR - stderr - +2025-02-06 07:39:11 - ERROR - stderr - +2025-02-06 07:39:11 - INFO - stdout - {'loss': 0.3855, 'grad_norm': 1.6366335153579712, 'learning_rate': 1.6837676492742482e-08, 'epoch': 2.95} +2025-02-06 07:39:11 - ERROR - stderr - 98%|█████████▊| 22032/22434 [21:31:31<16:58, 2.53s/it] +2025-02-06 07:39:13 - ERROR - stderr - 98%|█████████▊| 22033/22434 [21:31:33<16:50, 2.52s/it] +2025-02-06 07:39:14 - ERROR - stderr - +2025-02-06 07:39:14 - ERROR - stderr - +2025-02-06 07:39:14 - INFO - stdout - {'loss': 0.3774, 'grad_norm': 1.9239898920059204, 'learning_rate': 1.6754034518968732e-08, 'epoch': 2.95} +2025-02-06 07:39:14 - ERROR - stderr - 98%|█████████▊| 22033/22434 [21:31:33<16:50, 2.52s/it] +2025-02-06 07:39:16 - ERROR - stderr - 98%|█████████▊| 22034/22434 [21:31:36<16:45, 2.51s/it] +2025-02-06 07:39:16 - ERROR - stderr - +2025-02-06 07:39:16 - ERROR - stderr - +2025-02-06 07:39:16 - INFO - stdout - {'loss': 0.3766, 'grad_norm': 1.683491587638855, 'learning_rate': 1.667060063640369e-08, 'epoch': 2.95} +2025-02-06 07:39:16 - ERROR - stderr - 98%|█████████▊| 22034/22434 [21:31:36<16:45, 2.51s/it] +2025-02-06 07:39:18 - ERROR - stderr - 98%|█████████▊| 22035/22434 [21:31:38<16:34, 2.49s/it] +2025-02-06 07:39:18 - ERROR - stderr - +2025-02-06 07:39:18 - ERROR - stderr - +2025-02-06 07:39:18 - INFO - stdout - {'loss': 0.3642, 'grad_norm': 1.6517918109893799, 'learning_rate': 1.6587374846788186e-08, 'epoch': 2.95} +2025-02-06 07:39:18 - ERROR - stderr - 98%|█████████▊| 22035/22434 [21:31:38<16:34, 2.49s/it] +2025-02-06 07:39:21 - 
ERROR - stderr - 98%|█████████▊| 22036/22434 [21:31:41<16:39, 2.51s/it] +2025-02-06 07:39:21 - ERROR - stderr - +2025-02-06 07:39:21 - ERROR - stderr - +2025-02-06 07:39:21 - INFO - stdout - {'loss': 0.3715, 'grad_norm': 1.5421899557113647, 'learning_rate': 1.6504357151855277e-08, 'epoch': 2.95} +2025-02-06 07:39:21 - ERROR - stderr - 98%|█████████▊| 22036/22434 [21:31:41<16:39, 2.51s/it] +2025-02-06 07:39:24 - ERROR - stderr - 98%|█████████▊| 22037/22434 [21:31:43<16:42, 2.53s/it] +2025-02-06 07:39:24 - ERROR - stderr - +2025-02-06 07:39:24 - ERROR - stderr - +2025-02-06 07:39:24 - INFO - stdout - {'loss': 0.3739, 'grad_norm': 1.7061314582824707, 'learning_rate': 1.6421547553335805e-08, 'epoch': 2.95} +2025-02-06 07:39:24 - ERROR - stderr - 98%|█████████▊| 22037/22434 [21:31:43<16:42, 2.53s/it] +2025-02-06 07:39:26 - ERROR - stderr - 98%|█████████▊| 22038/22434 [21:31:46<16:29, 2.50s/it] +2025-02-06 07:39:26 - ERROR - stderr - +2025-02-06 07:39:26 - ERROR - stderr - +2025-02-06 07:39:26 - INFO - stdout - {'loss': 0.3507, 'grad_norm': 1.6021326780319214, 'learning_rate': 1.6338946052956163e-08, 'epoch': 2.95} +2025-02-06 07:39:26 - ERROR - stderr - 98%|█████████▊| 22038/22434 [21:31:46<16:29, 2.50s/it] +2025-02-06 07:39:29 - ERROR - stderr - 98%|█████████▊| 22039/22434 [21:31:48<16:37, 2.52s/it] +2025-02-06 07:39:29 - ERROR - stderr - +2025-02-06 07:39:29 - ERROR - stderr - +2025-02-06 07:39:29 - INFO - stdout - {'loss': 0.3103, 'grad_norm': 1.533370018005371, 'learning_rate': 1.6256552652437197e-08, 'epoch': 2.95} +2025-02-06 07:39:29 - ERROR - stderr - 98%|█████████▊| 22039/22434 [21:31:48<16:37, 2.52s/it] +2025-02-06 07:39:31 - ERROR - stderr - 98%|█████████▊| 22040/22434 [21:31:51<16:31, 2.52s/it] +2025-02-06 07:39:31 - ERROR - stderr - +2025-02-06 07:39:31 - ERROR - stderr - +2025-02-06 07:39:31 - INFO - stdout - {'loss': 0.3807, 'grad_norm': 1.5597457885742188, 'learning_rate': 1.617436735349753e-08, 'epoch': 2.95} +2025-02-06 07:39:31 - ERROR - stderr - 
98%|█████████▊| 22040/22434 [21:31:51<16:31, 2.52s/it] +2025-02-06 07:39:34 - ERROR - stderr - 98%|█████████▊| 22041/22434 [21:31:53<16:44, 2.56s/it] +2025-02-06 07:39:34 - ERROR - stderr - +2025-02-06 07:39:34 - ERROR - stderr - +2025-02-06 07:39:34 - INFO - stdout - {'loss': 0.4047, 'grad_norm': 1.7895134687423706, 'learning_rate': 1.6092390157849137e-08, 'epoch': 2.95} +2025-02-06 07:39:34 - ERROR - stderr - 98%|█████████▊| 22041/22434 [21:31:54<16:44, 2.56s/it] +2025-02-06 07:39:36 - ERROR - stderr - 98%|█████████▊| 22042/22434 [21:31:56<16:34, 2.54s/it] +2025-02-06 07:39:36 - ERROR - stderr - +2025-02-06 07:39:36 - ERROR - stderr - +2025-02-06 07:39:36 - INFO - stdout - {'loss': 0.4017, 'grad_norm': 1.5382256507873535, 'learning_rate': 1.601062106720175e-08, 'epoch': 2.95} +2025-02-06 07:39:36 - ERROR - stderr - 98%|█████████▊| 22042/22434 [21:31:56<16:34, 2.54s/it] +2025-02-06 07:39:39 - ERROR - stderr - 98%|█████████▊| 22043/22434 [21:31:59<16:34, 2.54s/it] +2025-02-06 07:39:39 - ERROR - stderr - +2025-02-06 07:39:39 - ERROR - stderr - +2025-02-06 07:39:39 - INFO - stdout - {'loss': 0.3505, 'grad_norm': 1.496140718460083, 'learning_rate': 1.5929060083259563e-08, 'epoch': 2.95} +2025-02-06 07:39:39 - ERROR - stderr - 98%|█████████▊| 22043/22434 [21:31:59<16:34, 2.54s/it] +2025-02-06 07:39:41 - ERROR - stderr - 98%|█████████▊| 22044/22434 [21:32:01<16:42, 2.57s/it] +2025-02-06 07:39:41 - ERROR - stderr - +2025-02-06 07:39:41 - ERROR - stderr - +2025-02-06 07:39:41 - INFO - stdout - {'loss': 0.37, 'grad_norm': 1.5602507591247559, 'learning_rate': 1.584770720772233e-08, 'epoch': 2.95} +2025-02-06 07:39:41 - ERROR - stderr - 98%|█████████▊| 22044/22434 [21:32:01<16:42, 2.57s/it] +2025-02-06 07:39:44 - ERROR - stderr - 98%|█████████▊| 22045/22434 [21:32:04<16:34, 2.56s/it] +2025-02-06 07:39:44 - ERROR - stderr - +2025-02-06 07:39:44 - ERROR - stderr - +2025-02-06 07:39:44 - INFO - stdout - {'loss': 0.3544, 'grad_norm': 1.5773041248321533, 'learning_rate': 
1.576656244228536e-08, 'epoch': 2.95} +2025-02-06 07:39:44 - ERROR - stderr - 98%|█████████▊| 22045/22434 [21:32:04<16:34, 2.56s/it] +2025-02-06 07:39:46 - ERROR - stderr - 98%|█████████▊| 22046/22434 [21:32:06<16:34, 2.56s/it] +2025-02-06 07:39:47 - ERROR - stderr - +2025-02-06 07:39:47 - ERROR - stderr - +2025-02-06 07:39:47 - INFO - stdout - {'loss': 0.3362, 'grad_norm': 1.4929159879684448, 'learning_rate': 1.5685625788640635e-08, 'epoch': 2.95} +2025-02-06 07:39:47 - ERROR - stderr - 98%|█████████▊| 22046/22434 [21:32:06<16:34, 2.56s/it] +2025-02-06 07:39:49 - ERROR - stderr - 98%|█████████▊| 22047/22434 [21:32:09<16:33, 2.57s/it] +2025-02-06 07:39:49 - ERROR - stderr - +2025-02-06 07:39:49 - ERROR - stderr - +2025-02-06 07:39:49 - INFO - stdout - {'loss': 0.3329, 'grad_norm': 1.413486361503601, 'learning_rate': 1.5604897248475692e-08, 'epoch': 2.95} +2025-02-06 07:39:49 - ERROR - stderr - 98%|█████████▊| 22047/22434 [21:32:09<16:33, 2.57s/it] +2025-02-06 07:39:52 - ERROR - stderr - 98%|█████████▊| 22048/22434 [21:32:11<16:25, 2.55s/it] +2025-02-06 07:39:52 - ERROR - stderr - +2025-02-06 07:39:52 - ERROR - stderr - +2025-02-06 07:39:52 - INFO - stdout - {'loss': 0.3422, 'grad_norm': 1.293760061264038, 'learning_rate': 1.552437682347252e-08, 'epoch': 2.95} +2025-02-06 07:39:52 - ERROR - stderr - 98%|█████████▊| 22048/22434 [21:32:11<16:25, 2.55s/it] +2025-02-06 07:39:54 - ERROR - stderr - 98%|█████████▊| 22049/22434 [21:32:14<16:18, 2.54s/it] +2025-02-06 07:39:54 - ERROR - stderr - +2025-02-06 07:39:54 - ERROR - stderr - +2025-02-06 07:39:54 - INFO - stdout - {'loss': 0.3225, 'grad_norm': 1.4096174240112305, 'learning_rate': 1.5444064515308666e-08, 'epoch': 2.95} +2025-02-06 07:39:54 - ERROR - stderr - 98%|█████████▊| 22049/22434 [21:32:14<16:18, 2.54s/it] +2025-02-06 07:39:57 - ERROR - stderr - 98%|█████████▊| 22050/22434 [21:32:16<16:10, 2.53s/it] +2025-02-06 07:39:57 - ERROR - stderr - +2025-02-06 07:39:57 - ERROR - stderr - +2025-02-06 07:39:57 - INFO - 
stdout - {'loss': 0.4195, 'grad_norm': 1.614760160446167, 'learning_rate': 1.5363960325660565e-08, 'epoch': 2.95} +2025-02-06 07:39:57 - ERROR - stderr - 98%|█████████▊| 22050/22434 [21:32:16<16:10, 2.53s/it] +2025-02-06 07:39:59 - ERROR - stderr - 98%|█████████▊| 22051/22434 [21:32:19<16:32, 2.59s/it] +2025-02-06 07:39:59 - ERROR - stderr - +2025-02-06 07:39:59 - ERROR - stderr - +2025-02-06 07:39:59 - INFO - stdout - {'loss': 0.3477, 'grad_norm': 1.2995647192001343, 'learning_rate': 1.5284064256195773e-08, 'epoch': 2.95} +2025-02-06 07:39:59 - ERROR - stderr - 98%|█████████▊| 22051/22434 [21:32:19<16:32, 2.59s/it] +2025-02-06 07:40:02 - ERROR - stderr - 98%|█████████▊| 22052/22434 [21:32:22<16:12, 2.55s/it] +2025-02-06 07:40:02 - ERROR - stderr - +2025-02-06 07:40:02 - ERROR - stderr - +2025-02-06 07:40:02 - INFO - stdout - {'loss': 0.2896, 'grad_norm': 1.5287638902664185, 'learning_rate': 1.5204376308579627e-08, 'epoch': 2.95} +2025-02-06 07:40:02 - ERROR - stderr - 98%|█████████▊| 22052/22434 [21:32:22<16:12, 2.55s/it] +2025-02-06 07:40:04 - ERROR - stderr - 98%|█████████▊| 22053/22434 [21:32:24<16:04, 2.53s/it] +2025-02-06 07:40:04 - ERROR - stderr - +2025-02-06 07:40:04 - ERROR - stderr - +2025-02-06 07:40:04 - INFO - stdout - {'loss': 0.3356, 'grad_norm': 1.5452977418899536, 'learning_rate': 1.5124896484474127e-08, 'epoch': 2.95} +2025-02-06 07:40:04 - ERROR - stderr - 98%|█████████▊| 22053/22434 [21:32:24<16:04, 2.53s/it] +2025-02-06 07:40:07 - ERROR - stderr - 98%|█████████▊| 22054/22434 [21:32:27<16:04, 2.54s/it] +2025-02-06 07:40:07 - ERROR - stderr - +2025-02-06 07:40:07 - ERROR - stderr - +2025-02-06 07:40:07 - INFO - stdout - {'loss': 0.3587, 'grad_norm': 1.6063107252120972, 'learning_rate': 1.504562478553684e-08, 'epoch': 2.95} +2025-02-06 07:40:07 - ERROR - stderr - 98%|█████████▊| 22054/22434 [21:32:27<16:04, 2.54s/it] +2025-02-06 07:40:09 - ERROR - stderr - 98%|█████████▊| 22055/22434 [21:32:29<15:57, 2.53s/it] +2025-02-06 07:40:09 - ERROR - 
stderr - +2025-02-06 07:40:09 - ERROR - stderr - +2025-02-06 07:40:09 - INFO - stdout - {'loss': 0.3376, 'grad_norm': 1.4274415969848633, 'learning_rate': 1.496656121341755e-08, 'epoch': 2.95} +2025-02-06 07:40:09 - ERROR - stderr - 98%|█████████▊| 22055/22434 [21:32:29<15:57, 2.53s/it] +2025-02-06 07:40:12 - ERROR - stderr - 98%|█████████▊| 22056/22434 [21:32:32<15:52, 2.52s/it] +2025-02-06 07:40:12 - ERROR - stderr - +2025-02-06 07:40:12 - ERROR - stderr - +2025-02-06 07:40:12 - INFO - stdout - {'loss': 0.328, 'grad_norm': 1.3482648134231567, 'learning_rate': 1.4887705769766058e-08, 'epoch': 2.95} +2025-02-06 07:40:12 - ERROR - stderr - 98%|█████████▊| 22056/22434 [21:32:32<15:52, 2.52s/it] +2025-02-06 07:40:14 - ERROR - stderr - 98%|█████████▊| 22057/22434 [21:32:34<15:48, 2.52s/it] +2025-02-06 07:40:14 - ERROR - stderr - +2025-02-06 07:40:14 - ERROR - stderr - +2025-02-06 07:40:14 - INFO - stdout - {'loss': 0.3438, 'grad_norm': 1.5038604736328125, 'learning_rate': 1.4809058456226599e-08, 'epoch': 2.95} +2025-02-06 07:40:14 - ERROR - stderr - 98%|█████████▊| 22057/22434 [21:32:34<15:48, 2.52s/it] +2025-02-06 07:40:17 - ERROR - stderr - 98%|█████████▊| 22058/22434 [21:32:37<15:42, 2.51s/it] +2025-02-06 07:40:17 - ERROR - stderr - +2025-02-06 07:40:17 - ERROR - stderr - +2025-02-06 07:40:17 - INFO - stdout - {'loss': 0.3521, 'grad_norm': 1.632952332496643, 'learning_rate': 1.4730619274435643e-08, 'epoch': 2.95} +2025-02-06 07:40:17 - ERROR - stderr - 98%|█████████▊| 22058/22434 [21:32:37<15:42, 2.51s/it] +2025-02-06 07:40:19 - ERROR - stderr - 98%|█████████▊| 22059/22434 [21:32:39<15:34, 2.49s/it] +2025-02-06 07:40:19 - ERROR - stderr - +2025-02-06 07:40:19 - ERROR - stderr - +2025-02-06 07:40:19 - INFO - stdout - {'loss': 0.3357, 'grad_norm': 1.4397382736206055, 'learning_rate': 1.4652388226031878e-08, 'epoch': 2.95} +2025-02-06 07:40:19 - ERROR - stderr - 98%|█████████▊| 22059/22434 [21:32:39<15:34, 2.49s/it] +2025-02-06 07:40:22 - ERROR - stderr - 
98%|█████████▊| 22060/22434 [21:32:42<15:33, 2.50s/it] +2025-02-06 07:40:22 - ERROR - stderr - +2025-02-06 07:40:22 - ERROR - stderr - +2025-02-06 07:40:22 - INFO - stdout - {'loss': 0.3172, 'grad_norm': 1.337106466293335, 'learning_rate': 1.4574365312642891e-08, 'epoch': 2.95} +2025-02-06 07:40:22 - ERROR - stderr - 98%|█████████▊| 22060/22434 [21:32:42<15:33, 2.50s/it] +2025-02-06 07:40:24 - ERROR - stderr - 98%|█████████▊| 22061/22434 [21:32:44<15:36, 2.51s/it] +2025-02-06 07:40:24 - ERROR - stderr - +2025-02-06 07:40:24 - ERROR - stderr - +2025-02-06 07:40:24 - INFO - stdout - {'loss': 0.3118, 'grad_norm': 1.4436990022659302, 'learning_rate': 1.449655053589627e-08, 'epoch': 2.95} +2025-02-06 07:40:24 - ERROR - stderr - 98%|█████████▊| 22061/22434 [21:32:44<15:36, 2.51s/it] +2025-02-06 07:40:27 - ERROR - stderr - 98%|█████████▊| 22062/22434 [21:32:47<15:25, 2.49s/it] +2025-02-06 07:40:27 - ERROR - stderr - +2025-02-06 07:40:27 - ERROR - stderr - +2025-02-06 07:40:27 - INFO - stdout - {'loss': 0.3331, 'grad_norm': 1.5434415340423584, 'learning_rate': 1.441894389741516e-08, 'epoch': 2.95} +2025-02-06 07:40:27 - ERROR - stderr - 98%|█████████▊| 22062/22434 [21:32:47<15:25, 2.49s/it] +2025-02-06 07:40:29 - ERROR - stderr - 98%|█████████▊| 22063/22434 [21:32:49<15:26, 2.50s/it] +2025-02-06 07:40:29 - ERROR - stderr - +2025-02-06 07:40:29 - ERROR - stderr - +2025-02-06 07:40:29 - INFO - stdout - {'loss': 0.35, 'grad_norm': 1.6947423219680786, 'learning_rate': 1.4341545398814937e-08, 'epoch': 2.95} +2025-02-06 07:40:29 - ERROR - stderr - 98%|█████████▊| 22063/22434 [21:32:49<15:26, 2.50s/it] +2025-02-06 07:40:32 - ERROR - stderr - 98%|█████████▊| 22064/22434 [21:32:52<15:26, 2.50s/it] +2025-02-06 07:40:32 - ERROR - stderr - +2025-02-06 07:40:32 - ERROR - stderr - +2025-02-06 07:40:32 - INFO - stdout - {'loss': 0.349, 'grad_norm': 1.38620924949646, 'learning_rate': 1.4264355041709865e-08, 'epoch': 2.95} +2025-02-06 07:40:32 - ERROR - stderr - 98%|█████████▊| 22064/22434 
[21:32:52<15:26, 2.50s/it] +2025-02-06 07:40:34 - ERROR - stderr - 98%|█████████▊| 22065/22434 [21:32:54<15:19, 2.49s/it] +2025-02-06 07:40:34 - ERROR - stderr - +2025-02-06 07:40:34 - ERROR - stderr - +2025-02-06 07:40:34 - INFO - stdout - {'loss': 0.3428, 'grad_norm': 1.5586435794830322, 'learning_rate': 1.4187372827709766e-08, 'epoch': 2.95} +2025-02-06 07:40:34 - ERROR - stderr - 98%|█████████▊| 22065/22434 [21:32:54<15:19, 2.49s/it] +2025-02-06 07:40:37 - ERROR - stderr - 98%|█████████▊| 22066/22434 [21:32:56<15:14, 2.49s/it] +2025-02-06 07:40:37 - ERROR - stderr - +2025-02-06 07:40:37 - ERROR - stderr - +2025-02-06 07:40:37 - INFO - stdout - {'loss': 0.3541, 'grad_norm': 1.6843657493591309, 'learning_rate': 1.4110598758417804e-08, 'epoch': 2.95} +2025-02-06 07:40:37 - ERROR - stderr - 98%|█████████▊| 22066/22434 [21:32:57<15:14, 2.49s/it] +2025-02-06 07:40:39 - ERROR - stderr - 98%|█████████▊| 22067/22434 [21:32:59<15:17, 2.50s/it] +2025-02-06 07:40:39 - ERROR - stderr - +2025-02-06 07:40:39 - ERROR - stderr - +2025-02-06 07:40:39 - INFO - stdout - {'loss': 0.3846, 'grad_norm': 1.6519551277160645, 'learning_rate': 1.403403283543603e-08, 'epoch': 2.95} +2025-02-06 07:40:39 - ERROR - stderr - 98%|█████████▊| 22067/22434 [21:32:59<15:17, 2.50s/it] +2025-02-06 07:40:42 - ERROR - stderr - 98%|█████████▊| 22068/22434 [21:33:02<15:19, 2.51s/it] +2025-02-06 07:40:42 - ERROR - stderr - +2025-02-06 07:40:42 - ERROR - stderr - +2025-02-06 07:40:42 - INFO - stdout - {'loss': 0.2927, 'grad_norm': 1.3385058641433716, 'learning_rate': 1.3957675060357611e-08, 'epoch': 2.95} +2025-02-06 07:40:42 - ERROR - stderr - 98%|█████████▊| 22068/22434 [21:33:02<15:19, 2.51s/it] +2025-02-06 07:40:44 - ERROR - stderr - 98%|█████████▊| 22069/22434 [21:33:04<15:17, 2.52s/it] +2025-02-06 07:40:44 - ERROR - stderr - +2025-02-06 07:40:44 - ERROR - stderr - +2025-02-06 07:40:44 - INFO - stdout - {'loss': 0.3789, 'grad_norm': 1.5511231422424316, 'learning_rate': 1.3881525434776833e-08, 'epoch': 
2.95} +2025-02-06 07:40:44 - ERROR - stderr - 98%|█████████▊| 22069/22434 [21:33:04<15:17, 2.52s/it] +2025-02-06 07:40:47 - ERROR - stderr - 98%|█████████▊| 22070/22434 [21:33:07<15:21, 2.53s/it] +2025-02-06 07:40:47 - ERROR - stderr - +2025-02-06 07:40:47 - ERROR - stderr - +2025-02-06 07:40:47 - INFO - stdout - {'loss': 0.3717, 'grad_norm': 1.46129310131073, 'learning_rate': 1.38055839602802e-08, 'epoch': 2.95} +2025-02-06 07:40:47 - ERROR - stderr - 98%|█████████▊| 22070/22434 [21:33:07<15:21, 2.53s/it] +2025-02-06 07:40:49 - ERROR - stderr - 98%|█████████▊| 22071/22434 [21:33:09<15:23, 2.54s/it] +2025-02-06 07:40:50 - ERROR - stderr - +2025-02-06 07:40:50 - ERROR - stderr - +2025-02-06 07:40:50 - INFO - stdout - {'loss': 0.3217, 'grad_norm': 1.7316912412643433, 'learning_rate': 1.3729850638450892e-08, 'epoch': 2.95} +2025-02-06 07:40:50 - ERROR - stderr - 98%|█████████▊| 22071/22434 [21:33:09<15:23, 2.54s/it] +2025-02-06 07:40:52 - ERROR - stderr - 98%|█████████▊| 22072/22434 [21:33:12<15:43, 2.61s/it] +2025-02-06 07:40:52 - ERROR - stderr - +2025-02-06 07:40:52 - ERROR - stderr - +2025-02-06 07:40:52 - INFO - stdout - {'loss': 0.4326, 'grad_norm': 1.7960542440414429, 'learning_rate': 1.3654325470865426e-08, 'epoch': 2.95} +2025-02-06 07:40:52 - ERROR - stderr - 98%|█████████▊| 22072/22434 [21:33:12<15:43, 2.61s/it] +2025-02-06 07:40:55 - ERROR - stderr - 98%|█████████▊| 22073/22434 [21:33:14<15:24, 2.56s/it] +2025-02-06 07:40:55 - ERROR - stderr - +2025-02-06 07:40:55 - ERROR - stderr - +2025-02-06 07:40:55 - INFO - stdout - {'loss': 0.3237, 'grad_norm': 1.5307230949401855, 'learning_rate': 1.3579008459100317e-08, 'epoch': 2.95} +2025-02-06 07:40:55 - ERROR - stderr - 98%|█████████▊| 22073/22434 [21:33:14<15:24, 2.56s/it] +2025-02-06 07:40:57 - ERROR - stderr - 98%|█████████▊| 22074/22434 [21:33:17<15:22, 2.56s/it] +2025-02-06 07:40:57 - ERROR - stderr - +2025-02-06 07:40:57 - ERROR - stderr - +2025-02-06 07:40:57 - INFO - stdout - {'loss': 0.3519, 
'grad_norm': 1.6344870328903198, 'learning_rate': 1.3503899604725424e-08, 'epoch': 2.95} +2025-02-06 07:40:57 - ERROR - stderr - 98%|█████████▊| 22074/22434 [21:33:17<15:22, 2.56s/it] +2025-02-06 07:41:00 - ERROR - stderr - 98%|█████████▊| 22075/22434 [21:33:19<15:11, 2.54s/it] +2025-02-06 07:41:00 - ERROR - stderr - +2025-02-06 07:41:00 - ERROR - stderr - +2025-02-06 07:41:00 - INFO - stdout - {'loss': 0.3532, 'grad_norm': 1.4651750326156616, 'learning_rate': 1.3428998909305046e-08, 'epoch': 2.95} +2025-02-06 07:41:00 - ERROR - stderr - 98%|█████████▊| 22075/22434 [21:33:20<15:11, 2.54s/it] +2025-02-06 07:41:02 - ERROR - stderr - 98%|█████████▊| 22076/22434 [21:33:22<15:09, 2.54s/it] +2025-02-06 07:41:02 - ERROR - stderr - +2025-02-06 07:41:02 - ERROR - stderr - +2025-02-06 07:41:02 - INFO - stdout - {'loss': 0.3495, 'grad_norm': 1.6745234727859497, 'learning_rate': 1.3354306374401271e-08, 'epoch': 2.95} +2025-02-06 07:41:02 - ERROR - stderr - 98%|█████████▊| 22076/22434 [21:33:22<15:09, 2.54s/it] +2025-02-06 07:41:05 - ERROR - stderr - 98%|█████████▊| 22077/22434 [21:33:24<14:56, 2.51s/it] +2025-02-06 07:41:05 - ERROR - stderr - +2025-02-06 07:41:05 - ERROR - stderr - +2025-02-06 07:41:05 - INFO - stdout - {'loss': 0.3824, 'grad_norm': 1.6372802257537842, 'learning_rate': 1.327982200157063e-08, 'epoch': 2.95} +2025-02-06 07:41:05 - ERROR - stderr - 98%|█████████▊| 22077/22434 [21:33:25<14:56, 2.51s/it] +2025-02-06 07:41:07 - ERROR - stderr - 98%|█████████▊| 22078/22434 [21:33:27<14:59, 2.53s/it] +2025-02-06 07:41:07 - ERROR - stderr - +2025-02-06 07:41:07 - ERROR - stderr - +2025-02-06 07:41:07 - INFO - stdout - {'loss': 0.3762, 'grad_norm': 1.6900583505630493, 'learning_rate': 1.3205545792366326e-08, 'epoch': 2.95} +2025-02-06 07:41:07 - ERROR - stderr - 98%|█████████▊| 22078/22434 [21:33:27<14:59, 2.53s/it] +2025-02-06 07:41:10 - ERROR - stderr - 98%|█████████▊| 22079/22434 [21:33:30<14:50, 2.51s/it] +2025-02-06 07:41:10 - ERROR - stderr - +2025-02-06 07:41:10 
- ERROR - stderr - +2025-02-06 07:41:10 - INFO - stdout - {'loss': 0.3944, 'grad_norm': 1.8080791234970093, 'learning_rate': 1.3131477748336008e-08, 'epoch': 2.95} +2025-02-06 07:41:10 - ERROR - stderr - 98%|█████████▊| 22079/22434 [21:33:30<14:50, 2.51s/it] +2025-02-06 07:41:12 - ERROR - stderr - 98%|█████████▊| 22080/22434 [21:33:32<14:49, 2.51s/it] +2025-02-06 07:41:12 - ERROR - stderr - +2025-02-06 07:41:12 - ERROR - stderr - +2025-02-06 07:41:12 - INFO - stdout - {'loss': 0.3712, 'grad_norm': 1.7625123262405396, 'learning_rate': 1.3057617871022888e-08, 'epoch': 2.95} +2025-02-06 07:41:12 - ERROR - stderr - 98%|█████████▊| 22080/22434 [21:33:32<14:49, 2.51s/it] +2025-02-06 07:41:15 - ERROR - stderr - 98%|█████████▊| 22081/22434 [21:33:35<14:45, 2.51s/it] +2025-02-06 07:41:15 - ERROR - stderr - +2025-02-06 07:41:15 - ERROR - stderr - +2025-02-06 07:41:15 - INFO - stdout - {'loss': 0.3982, 'grad_norm': 1.6536566019058228, 'learning_rate': 1.2983966161967954e-08, 'epoch': 2.95} +2025-02-06 07:41:15 - ERROR - stderr - 98%|█████████▊| 22081/22434 [21:33:35<14:45, 2.51s/it] +2025-02-06 07:41:17 - ERROR - stderr - 98%|█████████▊| 22082/22434 [21:33:37<14:43, 2.51s/it] +2025-02-06 07:41:17 - ERROR - stderr - +2025-02-06 07:41:17 - ERROR - stderr - +2025-02-06 07:41:17 - INFO - stdout - {'loss': 0.384, 'grad_norm': 1.677746057510376, 'learning_rate': 1.2910522622705534e-08, 'epoch': 2.95} +2025-02-06 07:41:17 - ERROR - stderr - 98%|█████████▊| 22082/22434 [21:33:37<14:43, 2.51s/it] +2025-02-06 07:41:20 - ERROR - stderr - 98%|█████████▊| 22083/22434 [21:33:40<14:37, 2.50s/it] +2025-02-06 07:41:20 - ERROR - stderr - +2025-02-06 07:41:20 - ERROR - stderr - +2025-02-06 07:41:20 - INFO - stdout - {'loss': 0.33, 'grad_norm': 1.5278812646865845, 'learning_rate': 1.2837287254766629e-08, 'epoch': 2.95} +2025-02-06 07:41:20 - ERROR - stderr - 98%|█████████▊| 22083/22434 [21:33:40<14:37, 2.50s/it] +2025-02-06 07:41:22 - ERROR - stderr - 98%|█████████▊| 22084/22434 
[21:33:42<14:40, 2.52s/it] +2025-02-06 07:41:22 - ERROR - stderr - +2025-02-06 07:41:22 - ERROR - stderr - +2025-02-06 07:41:22 - INFO - stdout - {'loss': 0.3517, 'grad_norm': 1.6542762517929077, 'learning_rate': 1.2764260059677792e-08, 'epoch': 2.95} +2025-02-06 07:41:22 - ERROR - stderr - 98%|█████████▊| 22084/22434 [21:33:42<14:40, 2.52s/it] +2025-02-06 07:41:25 - ERROR - stderr - 98%|█████████▊| 22085/22434 [21:33:45<14:38, 2.52s/it] +2025-02-06 07:41:25 - ERROR - stderr - +2025-02-06 07:41:25 - ERROR - stderr - +2025-02-06 07:41:25 - INFO - stdout - {'loss': 0.3352, 'grad_norm': 1.583396077156067, 'learning_rate': 1.2691441038961139e-08, 'epoch': 2.95} +2025-02-06 07:41:25 - ERROR - stderr - 98%|█████████▊| 22085/22434 [21:33:45<14:38, 2.52s/it] +2025-02-06 07:41:27 - ERROR - stderr - 98%|█████████▊| 22086/22434 [21:33:47<14:49, 2.56s/it] +2025-02-06 07:41:28 - ERROR - stderr - +2025-02-06 07:41:28 - ERROR - stderr - +2025-02-06 07:41:28 - INFO - stdout - {'loss': 0.3269, 'grad_norm': 1.304334282875061, 'learning_rate': 1.2618830194135456e-08, 'epoch': 2.95} +2025-02-06 07:41:28 - ERROR - stderr - 98%|█████████▊| 22086/22434 [21:33:47<14:49, 2.56s/it] +2025-02-06 07:41:30 - ERROR - stderr - 98%|█████████▊| 22087/22434 [21:33:50<14:46, 2.56s/it] +2025-02-06 07:41:30 - ERROR - stderr - +2025-02-06 07:41:30 - ERROR - stderr - +2025-02-06 07:41:30 - INFO - stdout - {'loss': 0.3398, 'grad_norm': 1.615761160850525, 'learning_rate': 1.2546427526711757e-08, 'epoch': 2.95} +2025-02-06 07:41:30 - ERROR - stderr - 98%|█████████▊| 22087/22434 [21:33:50<14:46, 2.56s/it] +2025-02-06 07:41:33 - ERROR - stderr - 98%|█████████▊| 22088/22434 [21:33:52<14:43, 2.55s/it] +2025-02-06 07:41:33 - ERROR - stderr - +2025-02-06 07:41:33 - ERROR - stderr - +2025-02-06 07:41:33 - INFO - stdout - {'loss': 0.3351, 'grad_norm': 1.5632275342941284, 'learning_rate': 1.2474233038202167e-08, 'epoch': 2.95} +2025-02-06 07:41:33 - ERROR - stderr - 98%|█████████▊| 22088/22434 [21:33:52<14:43, 
2.55s/it] +2025-02-06 07:41:35 - ERROR - stderr - 98%|█████████▊| 22089/22434 [21:33:55<14:45, 2.57s/it] +2025-02-06 07:41:35 - ERROR - stderr - +2025-02-06 07:41:35 - ERROR - stderr - +2025-02-06 07:41:35 - INFO - stdout - {'loss': 0.3072, 'grad_norm': 1.5585885047912598, 'learning_rate': 1.2402246730109924e-08, 'epoch': 2.95} +2025-02-06 07:41:35 - ERROR - stderr - 98%|█████████▊| 22089/22434 [21:33:55<14:45, 2.57s/it] +2025-02-06 07:41:38 - ERROR - stderr - 98%|█████████▊| 22090/22434 [21:33:58<14:48, 2.58s/it] +2025-02-06 07:41:38 - ERROR - stderr - +2025-02-06 07:41:38 - ERROR - stderr - +2025-02-06 07:41:38 - INFO - stdout - {'loss': 0.3833, 'grad_norm': 1.5410196781158447, 'learning_rate': 1.2330468603934942e-08, 'epoch': 2.95} +2025-02-06 07:41:38 - ERROR - stderr - 98%|█████████▊| 22090/22434 [21:33:58<14:48, 2.58s/it] +2025-02-06 07:41:40 - ERROR - stderr - 98%|█████████▊| 22091/22434 [21:34:00<14:43, 2.58s/it] +2025-02-06 07:41:40 - ERROR - stderr - +2025-02-06 07:41:40 - ERROR - stderr - +2025-02-06 07:41:40 - INFO - stdout - {'loss': 0.367, 'grad_norm': 1.6962624788284302, 'learning_rate': 1.2258898661174911e-08, 'epoch': 2.95} +2025-02-06 07:41:40 - ERROR - stderr - 98%|█████████▊| 22091/22434 [21:34:00<14:43, 2.58s/it] +2025-02-06 07:41:43 - ERROR - stderr - 98%|█████████▊| 22092/22434 [21:34:03<14:38, 2.57s/it] +2025-02-06 07:41:43 - ERROR - stderr - +2025-02-06 07:41:43 - ERROR - stderr - +2025-02-06 07:41:43 - INFO - stdout - {'loss': 0.4261, 'grad_norm': 1.795861840248108, 'learning_rate': 1.2187536903320863e-08, 'epoch': 2.95} +2025-02-06 07:41:43 - ERROR - stderr - 98%|█████████▊| 22092/22434 [21:34:03<14:38, 2.57s/it] +2025-02-06 07:41:45 - ERROR - stderr - 98%|█████████▊| 22093/22434 [21:34:05<14:30, 2.55s/it] +2025-02-06 07:41:45 - ERROR - stderr - +2025-02-06 07:41:45 - ERROR - stderr - +2025-02-06 07:41:45 - INFO - stdout - {'loss': 0.3839, 'grad_norm': 1.4791733026504517, 'learning_rate': 1.2116383331860493e-08, 'epoch': 2.95} +2025-02-06 
07:41:45 - ERROR - stderr - 98%|█████████▊| 22093/22434 [21:34:05<14:30, 2.55s/it] +2025-02-06 07:41:48 - ERROR - stderr - 98%|█████████▊| 22094/22434 [21:34:08<14:22, 2.54s/it] +2025-02-06 07:41:48 - ERROR - stderr - +2025-02-06 07:41:48 - ERROR - stderr - +2025-02-06 07:41:48 - INFO - stdout - {'loss': 0.3715, 'grad_norm': 1.6617205142974854, 'learning_rate': 1.2045437948275952e-08, 'epoch': 2.95} +2025-02-06 07:41:48 - ERROR - stderr - 98%|█████████▊| 22094/22434 [21:34:08<14:22, 2.54s/it] +2025-02-06 07:41:51 - ERROR - stderr - 98%|█████████▊| 22095/22434 [21:34:10<14:24, 2.55s/it] +2025-02-06 07:41:51 - ERROR - stderr - +2025-02-06 07:41:51 - ERROR - stderr - +2025-02-06 07:41:51 - INFO - stdout - {'loss': 0.3199, 'grad_norm': 1.457046389579773, 'learning_rate': 1.1974700754047164e-08, 'epoch': 2.95} +2025-02-06 07:41:51 - ERROR - stderr - 98%|█████████▊| 22095/22434 [21:34:10<14:24, 2.55s/it] +2025-02-06 07:41:53 - ERROR - stderr - 98%|█████████▊| 22096/22434 [21:34:13<14:22, 2.55s/it] +2025-02-06 07:41:53 - ERROR - stderr - +2025-02-06 07:41:53 - ERROR - stderr - +2025-02-06 07:41:53 - INFO - stdout - {'loss': 0.3236, 'grad_norm': 1.3638380765914917, 'learning_rate': 1.1904171750648508e-08, 'epoch': 2.95} +2025-02-06 07:41:53 - ERROR - stderr - 98%|█████████▊| 22096/22434 [21:34:13<14:22, 2.55s/it] +2025-02-06 07:41:56 - ERROR - stderr - 98%|█████████▊| 22097/22434 [21:34:16<14:56, 2.66s/it] +2025-02-06 07:41:56 - ERROR - stderr - +2025-02-06 07:41:56 - ERROR - stderr - +2025-02-06 07:41:56 - INFO - stdout - {'loss': 0.3425, 'grad_norm': 1.3656083345413208, 'learning_rate': 1.1833850939549918e-08, 'epoch': 2.95} +2025-02-06 07:41:56 - ERROR - stderr - 98%|█████████▊| 22097/22434 [21:34:16<14:56, 2.66s/it] +2025-02-06 07:41:59 - ERROR - stderr - 99%|█████████▊| 22098/22434 [21:34:18<14:41, 2.62s/it] +2025-02-06 07:41:59 - ERROR - stderr - +2025-02-06 07:41:59 - ERROR - stderr - +2025-02-06 07:41:59 - INFO - stdout - {'loss': 0.3903, 'grad_norm': 
1.5414594411849976, 'learning_rate': 1.1763738322216888e-08, 'epoch': 2.96} +2025-02-06 07:41:59 - ERROR - stderr - 99%|█████████▊| 22098/22434 [21:34:18<14:41, 2.62s/it] +2025-02-06 07:42:01 - ERROR - stderr - 99%|█████████▊| 22099/22434 [21:34:21<14:25, 2.58s/it] +2025-02-06 07:42:01 - ERROR - stderr - +2025-02-06 07:42:01 - ERROR - stderr - +2025-02-06 07:42:01 - INFO - stdout - {'loss': 0.4093, 'grad_norm': 1.62140953540802, 'learning_rate': 1.1693833900110474e-08, 'epoch': 2.96} +2025-02-06 07:42:01 - ERROR - stderr - 99%|█████████▊| 22099/22434 [21:34:21<14:25, 2.58s/it] +2025-02-06 07:42:03 - ERROR - stderr - 99%|█████████▊| 22100/22434 [21:34:23<14:12, 2.55s/it] +2025-02-06 07:42:04 - ERROR - stderr - +2025-02-06 07:42:04 - ERROR - stderr - +2025-02-06 07:42:04 - INFO - stdout - {'loss': 0.3586, 'grad_norm': 1.5548747777938843, 'learning_rate': 1.1624137674689507e-08, 'epoch': 2.96} +2025-02-06 07:42:04 - ERROR - stderr - 99%|█████████▊| 22100/22434 [21:34:23<14:12, 2.55s/it] +2025-02-06 07:42:06 - ERROR - stderr - 99%|█████████▊| 22101/22434 [21:34:26<14:00, 2.53s/it] +2025-02-06 07:42:06 - ERROR - stderr - +2025-02-06 07:42:06 - ERROR - stderr - +2025-02-06 07:42:06 - INFO - stdout - {'loss': 0.3388, 'grad_norm': 1.5562431812286377, 'learning_rate': 1.1554649647403937e-08, 'epoch': 2.96} +2025-02-06 07:42:06 - ERROR - stderr - 99%|█████████▊| 22101/22434 [21:34:26<14:00, 2.53s/it] +2025-02-06 07:42:08 - ERROR - stderr - 99%|█████████▊| 22102/22434 [21:34:28<13:52, 2.51s/it] +2025-02-06 07:42:08 - ERROR - stderr - +2025-02-06 07:42:08 - ERROR - stderr - +2025-02-06 07:42:08 - INFO - stdout - {'loss': 0.3556, 'grad_norm': 1.5606025457382202, 'learning_rate': 1.1485369819705939e-08, 'epoch': 2.96} +2025-02-06 07:42:08 - ERROR - stderr - 99%|█████████▊| 22102/22434 [21:34:28<13:52, 2.51s/it] +2025-02-06 07:42:11 - ERROR - stderr - 99%|█████████▊| 22103/22434 [21:34:31<13:45, 2.49s/it] +2025-02-06 07:42:11 - ERROR - stderr - +2025-02-06 07:42:11 - ERROR - 
stderr - +2025-02-06 07:42:11 - INFO - stdout - {'loss': 0.3484, 'grad_norm': 1.5031514167785645, 'learning_rate': 1.1416298193035469e-08, 'epoch': 2.96} +2025-02-06 07:42:11 - ERROR - stderr - 99%|█████████▊| 22103/22434 [21:34:31<13:45, 2.49s/it] +2025-02-06 07:42:13 - ERROR - stderr - 99%|█████████▊| 22104/22434 [21:34:33<13:50, 2.52s/it] +2025-02-06 07:42:13 - ERROR - stderr - +2025-02-06 07:42:13 - ERROR - stderr - +2025-02-06 07:42:13 - INFO - stdout - {'loss': 0.2968, 'grad_norm': 1.3654319047927856, 'learning_rate': 1.1347434768834708e-08, 'epoch': 2.96} +2025-02-06 07:42:13 - ERROR - stderr - 99%|█████████▊| 22104/22434 [21:34:33<13:50, 2.52s/it] +2025-02-06 07:42:16 - ERROR - stderr - 99%|█████████▊| 22105/22434 [21:34:36<13:47, 2.52s/it] +2025-02-06 07:42:16 - ERROR - stderr - +2025-02-06 07:42:16 - ERROR - stderr - +2025-02-06 07:42:16 - INFO - stdout - {'loss': 0.3486, 'grad_norm': 1.4673477411270142, 'learning_rate': 1.1278779548539176e-08, 'epoch': 2.96} +2025-02-06 07:42:16 - ERROR - stderr - 99%|█████████▊| 22105/22434 [21:34:36<13:47, 2.52s/it] +2025-02-06 07:42:18 - ERROR - stderr - 99%|█████████▊| 22106/22434 [21:34:38<13:42, 2.51s/it] +2025-02-06 07:42:18 - ERROR - stderr - +2025-02-06 07:42:18 - ERROR - stderr - +2025-02-06 07:42:18 - INFO - stdout - {'loss': 0.3211, 'grad_norm': 1.5142109394073486, 'learning_rate': 1.1210332533578839e-08, 'epoch': 2.96} +2025-02-06 07:42:18 - ERROR - stderr - 99%|█████████▊| 22106/22434 [21:34:38<13:42, 2.51s/it] +2025-02-06 07:42:21 - ERROR - stderr - 99%|█████████▊| 22107/22434 [21:34:41<13:37, 2.50s/it] +2025-02-06 07:42:21 - ERROR - stderr - +2025-02-06 07:42:21 - ERROR - stderr - +2025-02-06 07:42:21 - INFO - stdout - {'loss': 0.3484, 'grad_norm': 1.4431949853897095, 'learning_rate': 1.1142093725381441e-08, 'epoch': 2.96} +2025-02-06 07:42:21 - ERROR - stderr - 99%|█████████▊| 22107/22434 [21:34:41<13:37, 2.50s/it] +2025-02-06 07:42:23 - ERROR - stderr - 99%|█████████▊| 22108/22434 [21:34:43<13:38, 
2.51s/it] +2025-02-06 07:42:24 - ERROR - stderr - +2025-02-06 07:42:24 - ERROR - stderr - +2025-02-06 07:42:24 - INFO - stdout - {'loss': 0.3435, 'grad_norm': 1.5324714183807373, 'learning_rate': 1.1074063125368073e-08, 'epoch': 2.96} +2025-02-06 07:42:24 - ERROR - stderr - 99%|█████████▊| 22108/22434 [21:34:43<13:38, 2.51s/it] +2025-02-06 07:42:26 - ERROR - stderr - 99%|█████████▊| 22109/22434 [21:34:46<13:36, 2.51s/it] +2025-02-06 07:42:26 - ERROR - stderr - +2025-02-06 07:42:26 - ERROR - stderr - +2025-02-06 07:42:26 - INFO - stdout - {'loss': 0.3849, 'grad_norm': 1.5592246055603027, 'learning_rate': 1.1006240734957596e-08, 'epoch': 2.96} +2025-02-06 07:42:26 - ERROR - stderr - 99%|█████████▊| 22109/22434 [21:34:46<13:36, 2.51s/it] +2025-02-06 07:42:29 - ERROR - stderr - 99%|█████████▊| 22110/22434 [21:34:48<13:37, 2.52s/it] +2025-02-06 07:42:29 - ERROR - stderr - +2025-02-06 07:42:29 - ERROR - stderr - +2025-02-06 07:42:29 - INFO - stdout - {'loss': 0.3498, 'grad_norm': 1.628833532333374, 'learning_rate': 1.0938626555564436e-08, 'epoch': 2.96} +2025-02-06 07:42:29 - ERROR - stderr - 99%|█████████▊| 22110/22434 [21:34:48<13:37, 2.52s/it] +2025-02-06 07:42:31 - ERROR - stderr - 99%|█████████▊| 22111/22434 [21:34:51<13:37, 2.53s/it] +2025-02-06 07:42:31 - ERROR - stderr - +2025-02-06 07:42:31 - ERROR - stderr - +2025-02-06 07:42:31 - INFO - stdout - {'loss': 0.3324, 'grad_norm': 1.435444712638855, 'learning_rate': 1.0871220588596353e-08, 'epoch': 2.96} +2025-02-06 07:42:31 - ERROR - stderr - 99%|█████████▊| 22111/22434 [21:34:51<13:37, 2.53s/it] +2025-02-06 07:42:34 - ERROR - stderr - 99%|█████████▊| 22112/22434 [21:34:54<13:51, 2.58s/it] +2025-02-06 07:42:34 - ERROR - stderr - +2025-02-06 07:42:34 - ERROR - stderr - +2025-02-06 07:42:34 - INFO - stdout - {'loss': 0.3725, 'grad_norm': 1.6318798065185547, 'learning_rate': 1.0804022835458895e-08, 'epoch': 2.96} +2025-02-06 07:42:34 - ERROR - stderr - 99%|█████████▊| 22112/22434 [21:34:54<13:51, 2.58s/it] +2025-02-06 
07:42:36 - ERROR - stderr - 99%|█████████▊| 22113/22434 [21:34:56<13:38, 2.55s/it] +2025-02-06 07:42:36 - ERROR - stderr - +2025-02-06 07:42:36 - ERROR - stderr - +2025-02-06 07:42:36 - INFO - stdout - {'loss': 0.3721, 'grad_norm': 1.7556695938110352, 'learning_rate': 1.0737033297553156e-08, 'epoch': 2.96} +2025-02-06 07:42:36 - ERROR - stderr - 99%|█████████▊| 22113/22434 [21:34:56<13:38, 2.55s/it] +2025-02-06 07:42:39 - ERROR - stderr - 99%|█████████▊| 22114/22434 [21:34:59<13:31, 2.54s/it] +2025-02-06 07:42:39 - ERROR - stderr - +2025-02-06 07:42:39 - ERROR - stderr - +2025-02-06 07:42:39 - INFO - stdout - {'loss': 0.3652, 'grad_norm': 1.6174085140228271, 'learning_rate': 1.0670251976275803e-08, 'epoch': 2.96} +2025-02-06 07:42:39 - ERROR - stderr - 99%|█████████▊| 22114/22434 [21:34:59<13:31, 2.54s/it] +2025-02-06 07:42:41 - ERROR - stderr - 99%|█████████▊| 22115/22434 [21:35:01<13:23, 2.52s/it] +2025-02-06 07:42:41 - ERROR - stderr - +2025-02-06 07:42:41 - ERROR - stderr - +2025-02-06 07:42:41 - INFO - stdout - {'loss': 0.3125, 'grad_norm': 1.4284385442733765, 'learning_rate': 1.0603678873017941e-08, 'epoch': 2.96} +2025-02-06 07:42:41 - ERROR - stderr - 99%|█████████▊| 22115/22434 [21:35:01<13:23, 2.52s/it] +2025-02-06 07:42:44 - ERROR - stderr - 99%|█████████▊| 22116/22434 [21:35:04<13:30, 2.55s/it] +2025-02-06 07:42:44 - ERROR - stderr - +2025-02-06 07:42:44 - ERROR - stderr - +2025-02-06 07:42:44 - INFO - stdout - {'loss': 0.3115, 'grad_norm': 1.5501993894577026, 'learning_rate': 1.0537313989167353e-08, 'epoch': 2.96} +2025-02-06 07:42:44 - ERROR - stderr - 99%|█████████▊| 22116/22434 [21:35:04<13:30, 2.55s/it] +2025-02-06 07:42:46 - ERROR - stderr - 99%|█████████▊| 22117/22434 [21:35:06<13:34, 2.57s/it] +2025-02-06 07:42:47 - ERROR - stderr - +2025-02-06 07:42:47 - ERROR - stderr - +2025-02-06 07:42:47 - INFO - stdout - {'loss': 0.3332, 'grad_norm': 1.473180890083313, 'learning_rate': 1.0471157326107372e-08, 'epoch': 2.96} +2025-02-06 07:42:47 - ERROR - 
stderr - 99%|█████████▊| 22117/22434 [21:35:06<13:34, 2.57s/it] +2025-02-06 07:42:49 - ERROR - stderr - 99%|█████████▊| 22118/22434 [21:35:09<13:28, 2.56s/it] +2025-02-06 07:42:49 - ERROR - stderr - +2025-02-06 07:42:49 - ERROR - stderr - +2025-02-06 07:42:49 - INFO - stdout - {'loss': 0.3866, 'grad_norm': 1.7096822261810303, 'learning_rate': 1.040520888521801e-08, 'epoch': 2.96} +2025-02-06 07:42:49 - ERROR - stderr - 99%|█████████▊| 22118/22434 [21:35:09<13:28, 2.56s/it] +2025-02-06 07:42:52 - ERROR - stderr - 99%|█████████▊| 22119/22434 [21:35:11<13:23, 2.55s/it] +2025-02-06 07:42:52 - ERROR - stderr - +2025-02-06 07:42:52 - ERROR - stderr - +2025-02-06 07:42:52 - INFO - stdout - {'loss': 0.3677, 'grad_norm': 1.6601336002349854, 'learning_rate': 1.0339468667872609e-08, 'epoch': 2.96} +2025-02-06 07:42:52 - ERROR - stderr - 99%|█████████▊| 22119/22434 [21:35:11<13:23, 2.55s/it] +2025-02-06 07:42:54 - ERROR - stderr - 99%|█████████▊| 22120/22434 [21:35:14<13:12, 2.52s/it] +2025-02-06 07:42:54 - ERROR - stderr - +2025-02-06 07:42:54 - ERROR - stderr - +2025-02-06 07:42:54 - INFO - stdout - {'loss': 0.2962, 'grad_norm': 1.614563226699829, 'learning_rate': 1.0273936675441187e-08, 'epoch': 2.96} +2025-02-06 07:42:54 - ERROR - stderr - 99%|█████████▊| 22120/22434 [21:35:14<13:12, 2.52s/it] +2025-02-06 07:42:57 - ERROR - stderr - 99%|█████████▊| 22121/22434 [21:35:16<13:08, 2.52s/it] +2025-02-06 07:42:57 - ERROR - stderr - +2025-02-06 07:42:57 - ERROR - stderr - +2025-02-06 07:42:57 - INFO - stdout - {'loss': 0.4137, 'grad_norm': 1.8810795545578003, 'learning_rate': 1.0208612909291537e-08, 'epoch': 2.96} +2025-02-06 07:42:57 - ERROR - stderr - 99%|█████████▊| 22121/22434 [21:35:16<13:08, 2.52s/it] +2025-02-06 07:42:59 - ERROR - stderr - 99%|█████████▊| 22122/22434 [21:35:19<13:02, 2.51s/it] +2025-02-06 07:42:59 - ERROR - stderr - +2025-02-06 07:42:59 - ERROR - stderr - +2025-02-06 07:42:59 - INFO - stdout - {'loss': 0.4056, 'grad_norm': 1.8962736129760742, 
'learning_rate': 1.0143497370783683e-08, 'epoch': 2.96} +2025-02-06 07:42:59 - ERROR - stderr - 99%|█████████▊| 22122/22434 [21:35:19<13:02, 2.51s/it] +2025-02-06 07:43:02 - ERROR - stderr - 99%|█████████▊| 22123/22434 [21:35:21<13:10, 2.54s/it] +2025-02-06 07:43:02 - ERROR - stderr - +2025-02-06 07:43:02 - ERROR - stderr - +2025-02-06 07:43:02 - INFO - stdout - {'loss': 0.3012, 'grad_norm': 1.3922019004821777, 'learning_rate': 1.0078590061275428e-08, 'epoch': 2.96} +2025-02-06 07:43:02 - ERROR - stderr - 99%|█████████▊| 22123/22434 [21:35:21<13:10, 2.54s/it] +2025-02-06 07:43:04 - ERROR - stderr - 99%|█████████▊| 22124/22434 [21:35:24<13:08, 2.54s/it] +2025-02-06 07:43:04 - ERROR - stderr - +2025-02-06 07:43:04 - ERROR - stderr - +2025-02-06 07:43:04 - INFO - stdout - {'loss': 0.3382, 'grad_norm': 1.5836387872695923, 'learning_rate': 1.0013890982120133e-08, 'epoch': 2.96} +2025-02-06 07:43:04 - ERROR - stderr - 99%|█████████▊| 22124/22434 [21:35:24<13:08, 2.54s/it] +2025-02-06 07:43:07 - ERROR - stderr - 99%|█████████▊| 22125/22434 [21:35:27<13:26, 2.61s/it] +2025-02-06 07:43:07 - ERROR - stderr - +2025-02-06 07:43:07 - ERROR - stderr - +2025-02-06 07:43:07 - INFO - stdout - {'loss': 0.3437, 'grad_norm': 1.6255797147750854, 'learning_rate': 9.94940013466561e-09, 'epoch': 2.96} +2025-02-06 07:43:07 - ERROR - stderr - 99%|█████████▊| 22125/22434 [21:35:27<13:26, 2.61s/it] +2025-02-06 07:43:09 - ERROR - stderr - 99%|█████████▊| 22126/22434 [21:35:29<13:06, 2.55s/it] +2025-02-06 07:43:09 - ERROR - stderr - +2025-02-06 07:43:09 - ERROR - stderr - +2025-02-06 07:43:09 - INFO - stdout - {'loss': 0.3488, 'grad_norm': 1.48419988155365, 'learning_rate': 9.885117520256338e-09, 'epoch': 2.96} +2025-02-06 07:43:09 - ERROR - stderr - 99%|█████████▊| 22126/22434 [21:35:29<13:06, 2.55s/it] +2025-02-06 07:43:12 - ERROR - stderr - 99%|█████████▊| 22127/22434 [21:35:32<12:58, 2.54s/it] +2025-02-06 07:43:12 - ERROR - stderr - +2025-02-06 07:43:12 - ERROR - stderr - +2025-02-06 
07:43:12 - INFO - stdout - {'loss': 0.3686, 'grad_norm': 1.801751732826233, 'learning_rate': 9.821043140232356e-09, 'epoch': 2.96} +2025-02-06 07:43:12 - ERROR - stderr - 99%|█████████▊| 22127/22434 [21:35:32<12:58, 2.54s/it] +2025-02-06 07:43:14 - ERROR - stderr - 99%|█████████▊| 22128/22434 [21:35:34<13:06, 2.57s/it] +2025-02-06 07:43:15 - ERROR - stderr - +2025-02-06 07:43:15 - ERROR - stderr - +2025-02-06 07:43:15 - INFO - stdout - {'loss': 0.365, 'grad_norm': 1.5909297466278076, 'learning_rate': 9.757176995928153e-09, 'epoch': 2.96} +2025-02-06 07:43:15 - ERROR - stderr - 99%|█████████▊| 22128/22434 [21:35:34<13:06, 2.57s/it] +2025-02-06 07:43:17 - ERROR - stderr - 99%|█████████▊| 22129/22434 [21:35:37<12:50, 2.53s/it] +2025-02-06 07:43:17 - ERROR - stderr - +2025-02-06 07:43:17 - ERROR - stderr - +2025-02-06 07:43:17 - INFO - stdout - {'loss': 0.3866, 'grad_norm': 1.6510050296783447, 'learning_rate': 9.693519088677106e-09, 'epoch': 2.96} +2025-02-06 07:43:17 - ERROR - stderr - 99%|█████████▊| 22129/22434 [21:35:37<12:50, 2.53s/it] +2025-02-06 07:43:19 - ERROR - stderr - 99%|█████████▊| 22130/22434 [21:35:39<12:44, 2.52s/it] +2025-02-06 07:43:19 - ERROR - stderr - +2025-02-06 07:43:19 - ERROR - stderr - +2025-02-06 07:43:19 - INFO - stdout - {'loss': 0.4232, 'grad_norm': 1.7215285301208496, 'learning_rate': 9.630069419804821e-09, 'epoch': 2.96} +2025-02-06 07:43:19 - ERROR - stderr - 99%|█████████▊| 22130/22434 [21:35:39<12:44, 2.52s/it] +2025-02-06 07:43:22 - ERROR - stderr - 99%|█████████▊| 22131/22434 [21:35:42<12:46, 2.53s/it] +2025-02-06 07:43:22 - ERROR - stderr - +2025-02-06 07:43:22 - ERROR - stderr - +2025-02-06 07:43:22 - INFO - stdout - {'loss': 0.35, 'grad_norm': 1.5931317806243896, 'learning_rate': 9.566827990633576e-09, 'epoch': 2.96} +2025-02-06 07:43:22 - ERROR - stderr - 99%|█████████▊| 22131/22434 [21:35:42<12:46, 2.53s/it] +2025-02-06 07:43:25 - ERROR - stderr - 99%|█████████▊| 22132/22434 [21:35:44<13:03, 2.60s/it] +2025-02-06 07:43:25 - 
ERROR - stderr - +2025-02-06 07:43:25 - ERROR - stderr - +2025-02-06 07:43:25 - INFO - stdout - {'loss': 0.3215, 'grad_norm': 1.6030317544937134, 'learning_rate': 9.503794802482314e-09, 'epoch': 2.96} +2025-02-06 07:43:25 - ERROR - stderr - 99%|█████████▊| 22132/22434 [21:35:45<13:03, 2.60s/it] +2025-02-06 07:43:28 - ERROR - stderr - 99%|█████████▊| 22133/22434 [21:35:47<13:37, 2.72s/it] +2025-02-06 07:43:28 - ERROR - stderr - +2025-02-06 07:43:28 - ERROR - stderr - +2025-02-06 07:43:28 - INFO - stdout - {'loss': 0.4006, 'grad_norm': 1.6255704164505005, 'learning_rate': 9.440969856664428e-09, 'epoch': 2.96} +2025-02-06 07:43:28 - ERROR - stderr - 99%|█████████▊| 22133/22434 [21:35:48<13:37, 2.72s/it] +2025-02-06 07:43:30 - ERROR - stderr - 99%|█████████▊| 22134/22434 [21:35:50<13:19, 2.66s/it] +2025-02-06 07:43:30 - ERROR - stderr - +2025-02-06 07:43:30 - ERROR - stderr - +2025-02-06 07:43:30 - INFO - stdout - {'loss': 0.4039, 'grad_norm': 1.7018259763717651, 'learning_rate': 9.378353154489983e-09, 'epoch': 2.96} +2025-02-06 07:43:30 - ERROR - stderr - 99%|█████████▊| 22134/22434 [21:35:50<13:19, 2.66s/it] +2025-02-06 07:43:33 - ERROR - stderr - 99%|█████████▊| 22135/22434 [21:35:53<13:04, 2.62s/it] +2025-02-06 07:43:33 - ERROR - stderr - +2025-02-06 07:43:33 - ERROR - stderr - +2025-02-06 07:43:33 - INFO - stdout - {'loss': 0.3974, 'grad_norm': 1.665073037147522, 'learning_rate': 9.31594469726349e-09, 'epoch': 2.96} +2025-02-06 07:43:33 - ERROR - stderr - 99%|█████████▊| 22135/22434 [21:35:53<13:04, 2.62s/it] +2025-02-06 07:43:35 - ERROR - stderr - 99%|█████████▊| 22136/22434 [21:35:55<13:01, 2.62s/it] +2025-02-06 07:43:35 - ERROR - stderr - +2025-02-06 07:43:35 - ERROR - stderr - +2025-02-06 07:43:35 - INFO - stdout - {'loss': 0.365, 'grad_norm': 1.6080793142318726, 'learning_rate': 9.253744486286132e-09, 'epoch': 2.96} +2025-02-06 07:43:35 - ERROR - stderr - 99%|█████████▊| 22136/22434 [21:35:55<13:01, 2.62s/it] +2025-02-06 07:43:38 - ERROR - stderr - 
99%|█████████▊| 22137/22434 [21:35:58<12:44, 2.57s/it] +2025-02-06 07:43:38 - ERROR - stderr - +2025-02-06 07:43:38 - ERROR - stderr - +2025-02-06 07:43:38 - INFO - stdout - {'loss': 0.382, 'grad_norm': 1.475387454032898, 'learning_rate': 9.191752522854647e-09, 'epoch': 2.96} +2025-02-06 07:43:38 - ERROR - stderr - 99%|█████████▊| 22137/22434 [21:35:58<12:44, 2.57s/it] +2025-02-06 07:43:40 - ERROR - stderr - 99%|█████████▊| 22138/22434 [21:36:00<12:41, 2.57s/it] +2025-02-06 07:43:40 - ERROR - stderr - +2025-02-06 07:43:40 - ERROR - stderr - +2025-02-06 07:43:40 - INFO - stdout - {'loss': 0.3795, 'grad_norm': 1.666695475578308, 'learning_rate': 9.129968808260225e-09, 'epoch': 2.96} +2025-02-06 07:43:40 - ERROR - stderr - 99%|█████████▊| 22138/22434 [21:36:00<12:41, 2.57s/it] +2025-02-06 07:43:43 - ERROR - stderr - 99%|█████████▊| 22139/22434 [21:36:03<12:30, 2.54s/it] +2025-02-06 07:43:43 - ERROR - stderr - +2025-02-06 07:43:43 - ERROR - stderr - +2025-02-06 07:43:43 - INFO - stdout - {'loss': 0.3547, 'grad_norm': 1.4904379844665527, 'learning_rate': 9.068393343791837e-09, 'epoch': 2.96} +2025-02-06 07:43:43 - ERROR - stderr - 99%|█████████▊| 22139/22434 [21:36:03<12:30, 2.54s/it] +2025-02-06 07:43:45 - ERROR - stderr - 99%|█████████▊| 22140/22434 [21:36:05<12:21, 2.52s/it] +2025-02-06 07:43:45 - ERROR - stderr - +2025-02-06 07:43:45 - ERROR - stderr - +2025-02-06 07:43:45 - INFO - stdout - {'loss': 0.3726, 'grad_norm': 1.4746145009994507, 'learning_rate': 9.007026130732899e-09, 'epoch': 2.96} +2025-02-06 07:43:45 - ERROR - stderr - 99%|█████████▊| 22140/22434 [21:36:05<12:21, 2.52s/it] +2025-02-06 07:43:48 - ERROR - stderr - 99%|█████████▊| 22141/22434 [21:36:08<12:14, 2.51s/it] +2025-02-06 07:43:48 - ERROR - stderr - +2025-02-06 07:43:48 - ERROR - stderr - +2025-02-06 07:43:48 - INFO - stdout - {'loss': 0.3349, 'grad_norm': 1.5225833654403687, 'learning_rate': 8.945867170361278e-09, 'epoch': 2.96} +2025-02-06 07:43:48 - ERROR - stderr - 99%|█████████▊| 22141/22434 
[21:36:08<12:14, 2.51s/it] +2025-02-06 07:43:50 - ERROR - stderr - 99%|█████████▊| 22142/22434 [21:36:10<12:09, 2.50s/it] +2025-02-06 07:43:50 - ERROR - stderr - +2025-02-06 07:43:50 - ERROR - stderr - +2025-02-06 07:43:50 - INFO - stdout - {'loss': 0.4105, 'grad_norm': 1.6785800457000732, 'learning_rate': 8.88491646395262e-09, 'epoch': 2.96} +2025-02-06 07:43:50 - ERROR - stderr - 99%|█████████▊| 22142/22434 [21:36:10<12:09, 2.50s/it] +2025-02-06 07:43:53 - ERROR - stderr - 99%|█████████▊| 22143/22434 [21:36:13<12:05, 2.49s/it] +2025-02-06 07:43:53 - ERROR - stderr - +2025-02-06 07:43:53 - ERROR - stderr - +2025-02-06 07:43:53 - INFO - stdout - {'loss': 0.3548, 'grad_norm': 1.7005842924118042, 'learning_rate': 8.82417401277813e-09, 'epoch': 2.96} +2025-02-06 07:43:53 - ERROR - stderr - 99%|█████████▊| 22143/22434 [21:36:13<12:05, 2.49s/it] +2025-02-06 07:43:53 - WARNING - transformers.tokenization_utils_base - Token indices sequence length is longer than the specified maximum sequence length for this model (2736 > 2048). Running this sequence through the model will result in indexing errors +2025-02-06 07:43:53 - WARNING - transformers.tokenization_utils_base - Token indices sequence length is longer than the specified maximum sequence length for this model (2736 > 2048). 
Running this sequence through the model will result in indexing errors +2025-02-06 07:43:55 - ERROR - stderr - 99%|█████████▊| 22144/22434 [21:36:15<12:07, 2.51s/it] +2025-02-06 07:43:55 - ERROR - stderr - +2025-02-06 07:43:55 - ERROR - stderr - +2025-02-06 07:43:55 - INFO - stdout - {'loss': 0.3651, 'grad_norm': 1.490071177482605, 'learning_rate': 8.763639818103464e-09, 'epoch': 2.96} +2025-02-06 07:43:55 - ERROR - stderr - 99%|█████████▊| 22144/22434 [21:36:15<12:07, 2.51s/it] +2025-02-06 07:44:01 - ERROR - stderr - 99%|█████████▊| 22145/22434 [21:36:21<16:47, 3.49s/it] +2025-02-06 07:44:01 - ERROR - stderr - +2025-02-06 07:44:01 - ERROR - stderr - +2025-02-06 07:44:01 - INFO - stdout - {'loss': 0.3388, 'grad_norm': 1.5956265926361084, 'learning_rate': 8.703313881188724e-09, 'epoch': 2.96} +2025-02-06 07:44:01 - ERROR - stderr - 99%|█████████▊| 22145/22434 [21:36:21<16:47, 3.49s/it] +2025-02-06 07:44:04 - ERROR - stderr - 99%|█████████▊| 22146/22434 [21:36:23<15:16, 3.18s/it] +2025-02-06 07:44:04 - ERROR - stderr - +2025-02-06 07:44:04 - ERROR - stderr - +2025-02-06 07:44:04 - INFO - stdout - {'loss': 0.3992, 'grad_norm': 1.63701331615448, 'learning_rate': 8.643196203294013e-09, 'epoch': 2.96} +2025-02-06 07:44:04 - ERROR - stderr - 99%|█████████▊| 22146/22434 [21:36:23<15:16, 3.18s/it] +2025-02-06 07:44:06 - ERROR - stderr - 99%|█████████▊| 22147/22434 [21:36:26<14:16, 2.98s/it] +2025-02-06 07:44:06 - ERROR - stderr - +2025-02-06 07:44:06 - ERROR - stderr - +2025-02-06 07:44:06 - INFO - stdout - {'loss': 0.3758, 'grad_norm': 1.64596426486969, 'learning_rate': 8.583286785670552e-09, 'epoch': 2.96} +2025-02-06 07:44:06 - ERROR - stderr - 99%|█████████▊| 22147/22434 [21:36:26<14:16, 2.98s/it] +2025-02-06 07:44:09 - ERROR - stderr - 99%|█████████▊| 22148/22434 [21:36:29<13:49, 2.90s/it] +2025-02-06 07:44:09 - ERROR - stderr - +2025-02-06 07:44:09 - ERROR - stderr - +2025-02-06 07:44:09 - INFO - stdout - {'loss': 0.3438, 'grad_norm': 1.496146559715271, 
'learning_rate': 8.523585629568454e-09, 'epoch': 2.96} +2025-02-06 07:44:09 - ERROR - stderr - 99%|█████████▊| 22148/22434 [21:36:29<13:49, 2.90s/it] +2025-02-06 07:44:12 - ERROR - stderr - 99%|█████████▊| 22149/22434 [21:36:31<13:30, 2.84s/it] +2025-02-06 07:44:12 - ERROR - stderr - +2025-02-06 07:44:12 - ERROR - stderr - +2025-02-06 07:44:12 - INFO - stdout - {'loss': 0.2825, 'grad_norm': 1.3700833320617676, 'learning_rate': 8.464092736231166e-09, 'epoch': 2.96} +2025-02-06 07:44:12 - ERROR - stderr - 99%|█████████▊| 22149/22434 [21:36:31<13:30, 2.84s/it] +2025-02-06 07:44:14 - ERROR - stderr - 99%|█████████▊| 22150/22434 [21:36:34<13:03, 2.76s/it] +2025-02-06 07:44:14 - ERROR - stderr - +2025-02-06 07:44:14 - ERROR - stderr - +2025-02-06 07:44:14 - INFO - stdout - {'loss': 0.388, 'grad_norm': 1.6656126976013184, 'learning_rate': 8.40480810689881e-09, 'epoch': 2.96} +2025-02-06 07:44:14 - ERROR - stderr - 99%|█████████▊| 22150/22434 [21:36:34<13:03, 2.76s/it] +2025-02-06 07:44:17 - ERROR - stderr - 99%|█████████▊| 22151/22434 [21:36:36<12:36, 2.67s/it] +2025-02-06 07:44:17 - ERROR - stderr - +2025-02-06 07:44:17 - ERROR - stderr - +2025-02-06 07:44:17 - INFO - stdout - {'loss': 0.3316, 'grad_norm': 1.540081262588501, 'learning_rate': 8.345731742807061e-09, 'epoch': 2.96} +2025-02-06 07:44:17 - ERROR - stderr - 99%|█████████▊| 22151/22434 [21:36:36<12:36, 2.67s/it] +2025-02-06 07:44:19 - ERROR - stderr - 99%|█████████▊| 22152/22434 [21:36:39<12:21, 2.63s/it] +2025-02-06 07:44:19 - ERROR - stderr - +2025-02-06 07:44:19 - ERROR - stderr - +2025-02-06 07:44:19 - INFO - stdout - {'loss': 0.3421, 'grad_norm': 1.5660656690597534, 'learning_rate': 8.28686364518827e-09, 'epoch': 2.96} +2025-02-06 07:44:19 - ERROR - stderr - 99%|█████████▊| 22152/22434 [21:36:39<12:21, 2.63s/it] +2025-02-06 07:44:22 - ERROR - stderr - 99%|█████████▊| 22153/22434 [21:36:42<12:24, 2.65s/it] +2025-02-06 07:44:22 - ERROR - stderr - +2025-02-06 07:44:22 - ERROR - stderr - +2025-02-06 07:44:22 - 
INFO - stdout - {'loss': 0.359, 'grad_norm': 1.6048998832702637, 'learning_rate': 8.228203815268121e-09, 'epoch': 2.96} +2025-02-06 07:44:22 - ERROR - stderr - 99%|█████████▊| 22153/22434 [21:36:42<12:24, 2.65s/it] +2025-02-06 07:44:24 - ERROR - stderr - 99%|█████████▉| 22154/22434 [21:36:44<12:10, 2.61s/it] +2025-02-06 07:44:24 - ERROR - stderr - +2025-02-06 07:44:24 - ERROR - stderr - +2025-02-06 07:44:24 - INFO - stdout - {'loss': 0.3481, 'grad_norm': 1.4242249727249146, 'learning_rate': 8.169752254270081e-09, 'epoch': 2.96} +2025-02-06 07:44:24 - ERROR - stderr - 99%|█████████▉| 22154/22434 [21:36:44<12:10, 2.61s/it] +2025-02-06 07:44:27 - ERROR - stderr - 99%|█████████▉| 22155/22434 [21:36:47<12:11, 2.62s/it] +2025-02-06 07:44:27 - ERROR - stderr - +2025-02-06 07:44:27 - ERROR - stderr - +2025-02-06 07:44:27 - INFO - stdout - {'loss': 0.3372, 'grad_norm': 1.5164504051208496, 'learning_rate': 8.111508963412062e-09, 'epoch': 2.96} +2025-02-06 07:44:27 - ERROR - stderr - 99%|█████████▉| 22155/22434 [21:36:47<12:11, 2.62s/it] +2025-02-06 07:44:29 - ERROR - stderr - 99%|█████████▉| 22156/22434 [21:36:49<11:57, 2.58s/it] +2025-02-06 07:44:30 - ERROR - stderr - +2025-02-06 07:44:30 - ERROR - stderr - +2025-02-06 07:44:30 - INFO - stdout - {'loss': 0.381, 'grad_norm': 1.6279053688049316, 'learning_rate': 8.053473943908651e-09, 'epoch': 2.96} +2025-02-06 07:44:30 - ERROR - stderr - 99%|█████████▉| 22156/22434 [21:36:49<11:57, 2.58s/it] +2025-02-06 07:44:32 - ERROR - stderr - 99%|█████████▉| 22157/22434 [21:36:52<11:43, 2.54s/it] +2025-02-06 07:44:32 - ERROR - stderr - +2025-02-06 07:44:32 - ERROR - stderr - +2025-02-06 07:44:32 - INFO - stdout - {'loss': 0.3717, 'grad_norm': 1.7106561660766602, 'learning_rate': 7.99564719696999e-09, 'epoch': 2.96} +2025-02-06 07:44:32 - ERROR - stderr - 99%|█████████▉| 22157/22434 [21:36:52<11:43, 2.54s/it] +2025-02-06 07:44:34 - ERROR - stderr - 99%|█████████▉| 22158/22434 [21:36:54<11:38, 2.53s/it] +2025-02-06 07:44:34 - ERROR - 
stderr - +2025-02-06 07:44:34 - ERROR - stderr - +2025-02-06 07:44:34 - INFO - stdout - {'loss': 0.3622, 'grad_norm': 1.5342135429382324, 'learning_rate': 7.938028723800672e-09, 'epoch': 2.96} +2025-02-06 07:44:34 - ERROR - stderr - 99%|█████████▉| 22158/22434 [21:36:54<11:38, 2.53s/it] +2025-02-06 07:44:37 - ERROR - stderr - 99%|█████████▉| 22159/22434 [21:36:57<11:33, 2.52s/it] +2025-02-06 07:44:37 - ERROR - stderr - +2025-02-06 07:44:37 - ERROR - stderr - +2025-02-06 07:44:37 - INFO - stdout - {'loss': 0.3913, 'grad_norm': 1.5336884260177612, 'learning_rate': 7.880618525600847e-09, 'epoch': 2.96} +2025-02-06 07:44:37 - ERROR - stderr - 99%|█████████▉| 22159/22434 [21:36:57<11:33, 2.52s/it] +2025-02-06 07:44:40 - ERROR - stderr - 99%|█████████▉| 22160/22434 [21:36:59<11:50, 2.59s/it] +2025-02-06 07:44:40 - ERROR - stderr - +2025-02-06 07:44:40 - ERROR - stderr - +2025-02-06 07:44:40 - INFO - stdout - {'loss': 0.3775, 'grad_norm': 1.5649466514587402, 'learning_rate': 7.823416603568446e-09, 'epoch': 2.96} +2025-02-06 07:44:40 - ERROR - stderr - 99%|█████████▉| 22160/22434 [21:36:59<11:50, 2.59s/it] +2025-02-06 07:44:42 - ERROR - stderr - 99%|█████████▉| 22161/22434 [21:37:02<11:41, 2.57s/it] +2025-02-06 07:44:42 - ERROR - stderr - +2025-02-06 07:44:42 - ERROR - stderr - +2025-02-06 07:44:42 - INFO - stdout - {'loss': 0.401, 'grad_norm': 1.5983009338378906, 'learning_rate': 7.766422958895848e-09, 'epoch': 2.96} +2025-02-06 07:44:42 - ERROR - stderr - 99%|█████████▉| 22161/22434 [21:37:02<11:41, 2.57s/it] +2025-02-06 07:44:45 - ERROR - stderr - 99%|█████████▉| 22162/22434 [21:37:05<11:41, 2.58s/it] +2025-02-06 07:44:45 - ERROR - stderr - +2025-02-06 07:44:45 - ERROR - stderr - +2025-02-06 07:44:45 - INFO - stdout - {'loss': 0.3492, 'grad_norm': 1.425809383392334, 'learning_rate': 7.70963759277099e-09, 'epoch': 2.96} +2025-02-06 07:44:45 - ERROR - stderr - 99%|█████████▉| 22162/22434 [21:37:05<11:41, 2.58s/it] +2025-02-06 07:44:48 - ERROR - stderr - 99%|█████████▉| 
22163/22434 [21:37:07<11:55, 2.64s/it] +2025-02-06 07:44:48 - ERROR - stderr - +2025-02-06 07:44:48 - ERROR - stderr - +2025-02-06 07:44:48 - INFO - stdout - {'loss': 0.3323, 'grad_norm': 1.3870645761489868, 'learning_rate': 7.653060506376264e-09, 'epoch': 2.96} +2025-02-06 07:44:48 - ERROR - stderr - 99%|█████████▉| 22163/22434 [21:37:07<11:55, 2.64s/it] +2025-02-06 07:44:50 - ERROR - stderr - 99%|█████████▉| 22164/22434 [21:37:10<11:36, 2.58s/it] +2025-02-06 07:44:50 - ERROR - stderr - +2025-02-06 07:44:50 - ERROR - stderr - +2025-02-06 07:44:50 - INFO - stdout - {'loss': 0.3955, 'grad_norm': 1.6561626195907593, 'learning_rate': 7.596691700891834e-09, 'epoch': 2.96} +2025-02-06 07:44:50 - ERROR - stderr - 99%|█████████▉| 22164/22434 [21:37:10<11:36, 2.58s/it] +2025-02-06 07:44:53 - ERROR - stderr - 99%|█████████▉| 22165/22434 [21:37:12<11:29, 2.56s/it] +2025-02-06 07:44:53 - ERROR - stderr - +2025-02-06 07:44:53 - ERROR - stderr - +2025-02-06 07:44:53 - INFO - stdout - {'loss': 0.3479, 'grad_norm': 1.6288796663284302, 'learning_rate': 7.540531177493427e-09, 'epoch': 2.96} +2025-02-06 07:44:53 - ERROR - stderr - 99%|█████████▉| 22165/22434 [21:37:12<11:29, 2.56s/it] +2025-02-06 07:44:55 - ERROR - stderr - 99%|█████████▉| 22166/22434 [21:37:15<11:20, 2.54s/it] +2025-02-06 07:44:55 - ERROR - stderr - +2025-02-06 07:44:55 - ERROR - stderr - +2025-02-06 07:44:55 - INFO - stdout - {'loss': 0.4097, 'grad_norm': 1.6949232816696167, 'learning_rate': 7.484578937350107e-09, 'epoch': 2.96} +2025-02-06 07:44:55 - ERROR - stderr - 99%|█████████▉| 22166/22434 [21:37:15<11:20, 2.54s/it] +2025-02-06 07:44:58 - ERROR - stderr - 99%|█████████▉| 22167/22434 [21:37:17<11:24, 2.56s/it] +2025-02-06 07:44:58 - ERROR - stderr - +2025-02-06 07:44:58 - ERROR - stderr - +2025-02-06 07:44:58 - INFO - stdout - {'loss': 0.385, 'grad_norm': 1.4974019527435303, 'learning_rate': 7.428834981629829e-09, 'epoch': 2.96} +2025-02-06 07:44:58 - ERROR - stderr - 99%|█████████▉| 22167/22434 
[21:37:17<11:24, 2.56s/it] +2025-02-06 07:45:00 - ERROR - stderr - 99%|█████████▉| 22168/22434 [21:37:20<11:10, 2.52s/it] +2025-02-06 07:45:00 - ERROR - stderr - +2025-02-06 07:45:00 - ERROR - stderr - +2025-02-06 07:45:00 - INFO - stdout - {'loss': 0.3498, 'grad_norm': 1.5792781114578247, 'learning_rate': 7.373299311492777e-09, 'epoch': 2.96} +2025-02-06 07:45:00 - ERROR - stderr - 99%|█████████▉| 22168/22434 [21:37:20<11:10, 2.52s/it] +2025-02-06 07:45:03 - ERROR - stderr - 99%|█████████▉| 22169/22434 [21:37:22<11:07, 2.52s/it] +2025-02-06 07:45:03 - ERROR - stderr - +2025-02-06 07:45:03 - ERROR - stderr - +2025-02-06 07:45:03 - INFO - stdout - {'loss': 0.3692, 'grad_norm': 1.703270673751831, 'learning_rate': 7.3179719280980225e-09, 'epoch': 2.96} +2025-02-06 07:45:03 - ERROR - stderr - 99%|█████████▉| 22169/22434 [21:37:22<11:07, 2.52s/it] +2025-02-06 07:45:05 - ERROR - stderr - 99%|█████████▉| 22170/22434 [21:37:25<11:01, 2.51s/it] +2025-02-06 07:45:05 - ERROR - stderr - +2025-02-06 07:45:05 - ERROR - stderr - +2025-02-06 07:45:05 - INFO - stdout - {'loss': 0.3399, 'grad_norm': 1.5327261686325073, 'learning_rate': 7.2628528325979774e-09, 'epoch': 2.96} +2025-02-06 07:45:05 - ERROR - stderr - 99%|█████████▉| 22170/22434 [21:37:25<11:01, 2.51s/it] +2025-02-06 07:45:08 - ERROR - stderr - 99%|█████████▉| 22171/22434 [21:37:27<11:05, 2.53s/it] +2025-02-06 07:45:08 - ERROR - stderr - +2025-02-06 07:45:08 - ERROR - stderr - +2025-02-06 07:45:08 - INFO - stdout - {'loss': 0.3041, 'grad_norm': 1.2844221591949463, 'learning_rate': 7.2079420261417235e-09, 'epoch': 2.96} +2025-02-06 07:45:08 - ERROR - stderr - 99%|█████████▉| 22171/22434 [21:37:27<11:05, 2.53s/it] +2025-02-06 07:45:10 - ERROR - stderr - 99%|█████████▉| 22172/22434 [21:37:30<11:02, 2.53s/it] +2025-02-06 07:45:10 - ERROR - stderr - +2025-02-06 07:45:10 - ERROR - stderr - +2025-02-06 07:45:10 - INFO - stdout - {'loss': 0.3498, 'grad_norm': 1.5606865882873535, 'learning_rate': 7.153239509873899e-09, 'epoch': 
2.96} +2025-02-06 07:45:10 - ERROR - stderr - 99%|█████████▉| 22172/22434 [21:37:30<11:02, 2.53s/it] +2025-02-06 07:45:13 - ERROR - stderr - 99%|█████████▉| 22173/22434 [21:37:32<10:55, 2.51s/it] +2025-02-06 07:45:13 - ERROR - stderr - +2025-02-06 07:45:13 - ERROR - stderr - +2025-02-06 07:45:13 - INFO - stdout - {'loss': 0.3724, 'grad_norm': 1.583666443824768, 'learning_rate': 7.0987452849347045e-09, 'epoch': 2.97} +2025-02-06 07:45:13 - ERROR - stderr - 99%|█████████▉| 22173/22434 [21:37:32<10:55, 2.51s/it] +2025-02-06 07:45:15 - ERROR - stderr - 99%|█████████▉| 22174/22434 [21:37:35<10:55, 2.52s/it] +2025-02-06 07:45:15 - ERROR - stderr - +2025-02-06 07:45:15 - ERROR - stderr - +2025-02-06 07:45:15 - INFO - stdout - {'loss': 0.3393, 'grad_norm': 1.5024676322937012, 'learning_rate': 7.044459352459898e-09, 'epoch': 2.97} +2025-02-06 07:45:15 - ERROR - stderr - 99%|█████████▉| 22174/22434 [21:37:35<10:55, 2.52s/it] +2025-02-06 07:45:18 - ERROR - stderr - 99%|█████████▉| 22175/22434 [21:37:37<10:43, 2.49s/it] +2025-02-06 07:45:18 - ERROR - stderr - +2025-02-06 07:45:18 - ERROR - stderr - +2025-02-06 07:45:18 - INFO - stdout - {'loss': 0.3849, 'grad_norm': 1.8029674291610718, 'learning_rate': 6.990381713580796e-09, 'epoch': 2.97} +2025-02-06 07:45:18 - ERROR - stderr - 99%|█████████▉| 22175/22434 [21:37:37<10:43, 2.49s/it] +2025-02-06 07:45:20 - ERROR - stderr - 99%|█████████▉| 22176/22434 [21:37:40<10:46, 2.50s/it] +2025-02-06 07:45:20 - ERROR - stderr - +2025-02-06 07:45:20 - ERROR - stderr - +2025-02-06 07:45:20 - INFO - stdout - {'loss': 0.3142, 'grad_norm': 1.4698599576950073, 'learning_rate': 6.936512369425386e-09, 'epoch': 2.97} +2025-02-06 07:45:20 - ERROR - stderr - 99%|█████████▉| 22176/22434 [21:37:40<10:46, 2.50s/it] +2025-02-06 07:45:23 - ERROR - stderr - 99%|█████████▉| 22177/22434 [21:37:42<10:41, 2.49s/it] +2025-02-06 07:45:23 - ERROR - stderr - +2025-02-06 07:45:23 - ERROR - stderr - +2025-02-06 07:45:23 - INFO - stdout - {'loss': 0.33, 'grad_norm': 
1.5150262117385864, 'learning_rate': 6.882851321116102e-09, 'epoch': 2.97} +2025-02-06 07:45:23 - ERROR - stderr - 99%|█████████▉| 22177/22434 [21:37:42<10:41, 2.49s/it] +2025-02-06 07:45:25 - ERROR - stderr - 99%|█████████▉| 22178/22434 [21:37:45<10:36, 2.49s/it] +2025-02-06 07:45:25 - ERROR - stderr - +2025-02-06 07:45:25 - ERROR - stderr - +2025-02-06 07:45:25 - INFO - stdout - {'loss': 0.34, 'grad_norm': 1.4610412120819092, 'learning_rate': 6.82939856977094e-09, 'epoch': 2.97} +2025-02-06 07:45:25 - ERROR - stderr - 99%|█████████▉| 22178/22434 [21:37:45<10:36, 2.49s/it] +2025-02-06 07:45:28 - ERROR - stderr - 99%|█████████▉| 22179/22434 [21:37:47<10:35, 2.49s/it] +2025-02-06 07:45:28 - ERROR - stderr - +2025-02-06 07:45:28 - ERROR - stderr - +2025-02-06 07:45:28 - INFO - stdout - {'loss': 0.4011, 'grad_norm': 1.7131565809249878, 'learning_rate': 6.776154116504563e-09, 'epoch': 2.97} +2025-02-06 07:45:28 - ERROR - stderr - 99%|█████████▉| 22179/22434 [21:37:47<10:35, 2.49s/it] +2025-02-06 07:45:30 - ERROR - stderr - 99%|█████████▉| 22180/22434 [21:37:50<10:42, 2.53s/it] +2025-02-06 07:45:30 - ERROR - stderr - +2025-02-06 07:45:30 - ERROR - stderr - +2025-02-06 07:45:30 - INFO - stdout - {'loss': 0.3033, 'grad_norm': 1.3283063173294067, 'learning_rate': 6.723117962427195e-09, 'epoch': 2.97} +2025-02-06 07:45:30 - ERROR - stderr - 99%|█████████▉| 22180/22434 [21:37:50<10:42, 2.53s/it] +2025-02-06 07:45:33 - ERROR - stderr - 99%|█████████▉| 22181/22434 [21:37:52<10:39, 2.53s/it] +2025-02-06 07:45:33 - ERROR - stderr - +2025-02-06 07:45:33 - ERROR - stderr - +2025-02-06 07:45:33 - INFO - stdout - {'loss': 0.3493, 'grad_norm': 1.4546793699264526, 'learning_rate': 6.6702901086435065e-09, 'epoch': 2.97} +2025-02-06 07:45:33 - ERROR - stderr - 99%|█████████▉| 22181/22434 [21:37:53<10:39, 2.53s/it] +2025-02-06 07:45:35 - ERROR - stderr - 99%|█████████▉| 22182/22434 [21:37:55<10:31, 2.51s/it] +2025-02-06 07:45:35 - ERROR - stderr - +2025-02-06 07:45:35 - ERROR - stderr - 
+2025-02-06 07:45:35 - INFO - stdout - {'loss': 0.3666, 'grad_norm': 1.6371980905532837, 'learning_rate': 6.6176705562559506e-09, 'epoch': 2.97} +2025-02-06 07:45:35 - ERROR - stderr - 99%|█████████▉| 22182/22434 [21:37:55<10:31, 2.51s/it] +2025-02-06 07:45:38 - ERROR - stderr - 99%|█████████▉| 22183/22434 [21:37:57<10:29, 2.51s/it] +2025-02-06 07:45:38 - ERROR - stderr - +2025-02-06 07:45:38 - ERROR - stderr - +2025-02-06 07:45:38 - INFO - stdout - {'loss': 0.3743, 'grad_norm': 1.728973627090454, 'learning_rate': 6.565259306359206e-09, 'epoch': 2.97} +2025-02-06 07:45:38 - ERROR - stderr - 99%|█████████▉| 22183/22434 [21:37:57<10:29, 2.51s/it] +2025-02-06 07:45:40 - ERROR - stderr - 99%|█████████▉| 22184/22434 [21:38:00<10:30, 2.52s/it] +2025-02-06 07:45:40 - ERROR - stderr - +2025-02-06 07:45:40 - ERROR - stderr - +2025-02-06 07:45:40 - INFO - stdout - {'loss': 0.344, 'grad_norm': 1.6249005794525146, 'learning_rate': 6.513056360047954e-09, 'epoch': 2.97} +2025-02-06 07:45:40 - ERROR - stderr - 99%|█████████▉| 22184/22434 [21:38:00<10:30, 2.52s/it] +2025-02-06 07:45:43 - ERROR - stderr - 99%|█████████▉| 22185/22434 [21:38:02<10:23, 2.51s/it] +2025-02-06 07:45:43 - ERROR - stderr - +2025-02-06 07:45:43 - ERROR - stderr - +2025-02-06 07:45:43 - INFO - stdout - {'loss': 0.3912, 'grad_norm': 1.4646488428115845, 'learning_rate': 6.4610617184091e-09, 'epoch': 2.97} +2025-02-06 07:45:43 - ERROR - stderr - 99%|█████████▉| 22185/22434 [21:38:03<10:23, 2.51s/it] +2025-02-06 07:45:45 - ERROR - stderr - 99%|█████████▉| 22186/22434 [21:38:05<10:23, 2.51s/it] +2025-02-06 07:45:45 - ERROR - stderr - +2025-02-06 07:45:45 - ERROR - stderr - +2025-02-06 07:45:45 - INFO - stdout - {'loss': 0.3745, 'grad_norm': 1.5914289951324463, 'learning_rate': 6.4092753825262254e-09, 'epoch': 2.97} +2025-02-06 07:45:45 - ERROR - stderr - 99%|█████████▉| 22186/22434 [21:38:05<10:23, 2.51s/it] +2025-02-06 07:45:48 - ERROR - stderr - 99%|█████████▉| 22187/22434 [21:38:08<10:20, 2.51s/it] +2025-02-06 
07:45:48 - ERROR - stderr - +2025-02-06 07:45:48 - ERROR - stderr - +2025-02-06 07:45:48 - INFO - stdout - {'loss': 0.3368, 'grad_norm': 1.461050033569336, 'learning_rate': 6.357697353479575e-09, 'epoch': 2.97} +2025-02-06 07:45:48 - ERROR - stderr - 99%|█████████▉| 22187/22434 [21:38:08<10:20, 2.51s/it] +2025-02-06 07:45:50 - ERROR - stderr - 99%|█████████▉| 22188/22434 [21:38:10<10:14, 2.50s/it] +2025-02-06 07:45:50 - ERROR - stderr - +2025-02-06 07:45:50 - ERROR - stderr - +2025-02-06 07:45:50 - INFO - stdout - {'loss': 0.3804, 'grad_norm': 1.6507654190063477, 'learning_rate': 6.306327632342734e-09, 'epoch': 2.97} +2025-02-06 07:45:50 - ERROR - stderr - 99%|█████████▉| 22188/22434 [21:38:10<10:14, 2.50s/it] +2025-02-06 07:45:53 - ERROR - stderr - 99%|█████████▉| 22189/22434 [21:38:12<10:11, 2.50s/it] +2025-02-06 07:45:53 - ERROR - stderr - +2025-02-06 07:45:53 - ERROR - stderr - +2025-02-06 07:45:53 - INFO - stdout - {'loss': 0.3406, 'grad_norm': 1.4742457866668701, 'learning_rate': 6.2551662201892905e-09, 'epoch': 2.97} +2025-02-06 07:45:53 - ERROR - stderr - 99%|█████████▉| 22189/22434 [21:38:13<10:11, 2.50s/it] +2025-02-06 07:45:55 - ERROR - stderr - 99%|█████████▉| 22190/22434 [21:38:15<10:10, 2.50s/it] +2025-02-06 07:45:55 - ERROR - stderr - +2025-02-06 07:45:55 - ERROR - stderr - +2025-02-06 07:45:55 - INFO - stdout - {'loss': 0.3605, 'grad_norm': 1.602725863456726, 'learning_rate': 6.2042131180828355e-09, 'epoch': 2.97} +2025-02-06 07:45:55 - ERROR - stderr - 99%|█████████▉| 22190/22434 [21:38:15<10:10, 2.50s/it] +2025-02-06 07:45:58 - ERROR - stderr - 99%|█████████▉| 22191/22434 [21:38:18<10:09, 2.51s/it] +2025-02-06 07:45:58 - ERROR - stderr - +2025-02-06 07:45:58 - ERROR - stderr - +2025-02-06 07:45:58 - INFO - stdout - {'loss': 0.3278, 'grad_norm': 1.4466229677200317, 'learning_rate': 6.153468327086964e-09, 'epoch': 2.97} +2025-02-06 07:45:58 - ERROR - stderr - 99%|█████████▉| 22191/22434 [21:38:18<10:09, 2.51s/it] +2025-02-06 07:46:00 - ERROR - 
stderr - 99%|█████████▉| 22192/22434 [21:38:20<10:06, 2.51s/it] +2025-02-06 07:46:00 - ERROR - stderr - +2025-02-06 07:46:00 - ERROR - stderr - +2025-02-06 07:46:00 - INFO - stdout - {'loss': 0.3594, 'grad_norm': 1.4026503562927246, 'learning_rate': 6.1029318482586085e-09, 'epoch': 2.97} +2025-02-06 07:46:00 - ERROR - stderr - 99%|█████████▉| 22192/22434 [21:38:20<10:06, 2.51s/it] +2025-02-06 07:46:03 - ERROR - stderr - 99%|█████████▉| 22193/22434 [21:38:23<10:04, 2.51s/it] +2025-02-06 07:46:03 - ERROR - stderr - +2025-02-06 07:46:03 - ERROR - stderr - +2025-02-06 07:46:03 - INFO - stdout - {'loss': 0.3139, 'grad_norm': 1.3978121280670166, 'learning_rate': 6.0526036826513705e-09, 'epoch': 2.97} +2025-02-06 07:46:03 - ERROR - stderr - 99%|█████████▉| 22193/22434 [21:38:23<10:04, 2.51s/it] +2025-02-06 07:46:05 - ERROR - stderr - 99%|█████████▉| 22194/22434 [21:38:25<09:57, 2.49s/it] +2025-02-06 07:46:05 - ERROR - stderr - +2025-02-06 07:46:05 - ERROR - stderr - +2025-02-06 07:46:05 - INFO - stdout - {'loss': 0.3635, 'grad_norm': 1.556868553161621, 'learning_rate': 6.0024838313144095e-09, 'epoch': 2.97} +2025-02-06 07:46:05 - ERROR - stderr - 99%|█████████▉| 22194/22434 [21:38:25<09:57, 2.49s/it] +2025-02-06 07:46:08 - ERROR - stderr - 99%|█████████▉| 22195/22434 [21:38:27<09:57, 2.50s/it] +2025-02-06 07:46:08 - ERROR - stderr - +2025-02-06 07:46:08 - ERROR - stderr - +2025-02-06 07:46:08 - INFO - stdout - {'loss': 0.3746, 'grad_norm': 1.5053924322128296, 'learning_rate': 5.952572295293557e-09, 'epoch': 2.97} +2025-02-06 07:46:08 - ERROR - stderr - 99%|█████████▉| 22195/22434 [21:38:28<09:57, 2.50s/it] +2025-02-06 07:46:10 - ERROR - stderr - 99%|█████████▉| 22196/22434 [21:38:30<09:59, 2.52s/it] +2025-02-06 07:46:10 - ERROR - stderr - +2025-02-06 07:46:10 - ERROR - stderr - +2025-02-06 07:46:10 - INFO - stdout - {'loss': 0.3242, 'grad_norm': 1.4812268018722534, 'learning_rate': 5.902869075626871e-09, 'epoch': 2.97} +2025-02-06 07:46:10 - ERROR - stderr - 
99%|█████████▉| 22196/22434 [21:38:30<09:59, 2.52s/it] +2025-02-06 07:46:13 - ERROR - stderr - 99%|█████████▉| 22197/22434 [21:38:33<09:57, 2.52s/it] +2025-02-06 07:46:13 - ERROR - stderr - +2025-02-06 07:46:13 - ERROR - stderr - +2025-02-06 07:46:13 - INFO - stdout - {'loss': 0.42, 'grad_norm': 1.815366268157959, 'learning_rate': 5.853374173352411e-09, 'epoch': 2.97} +2025-02-06 07:46:13 - ERROR - stderr - 99%|█████████▉| 22197/22434 [21:38:33<09:57, 2.52s/it] +2025-02-06 07:46:15 - ERROR - stderr - 99%|█████████▉| 22198/22434 [21:38:35<09:51, 2.51s/it] +2025-02-06 07:46:15 - ERROR - stderr - +2025-02-06 07:46:15 - ERROR - stderr - +2025-02-06 07:46:15 - INFO - stdout - {'loss': 0.343, 'grad_norm': 1.688860535621643, 'learning_rate': 5.8040875895004625e-09, 'epoch': 2.97} +2025-02-06 07:46:15 - ERROR - stderr - 99%|█████████▉| 22198/22434 [21:38:35<09:51, 2.51s/it] +2025-02-06 07:46:18 - ERROR - stderr - 99%|█████████▉| 22199/22434 [21:38:37<09:44, 2.49s/it] +2025-02-06 07:46:18 - ERROR - stderr - +2025-02-06 07:46:18 - ERROR - stderr - +2025-02-06 07:46:18 - INFO - stdout - {'loss': 0.3789, 'grad_norm': 1.5160353183746338, 'learning_rate': 5.755009325099092e-09, 'epoch': 2.97} +2025-02-06 07:46:18 - ERROR - stderr - 99%|█████████▉| 22199/22434 [21:38:38<09:44, 2.49s/it] +2025-02-06 07:46:20 - ERROR - stderr - 99%|█████████▉| 22200/22434 [21:38:40<09:50, 2.52s/it] +2025-02-06 07:46:20 - ERROR - stderr - +2025-02-06 07:46:20 - ERROR - stderr - +2025-02-06 07:46:20 - INFO - stdout - {'loss': 0.3545, 'grad_norm': 1.4813988208770752, 'learning_rate': 5.706139381170816e-09, 'epoch': 2.97} +2025-02-06 07:46:20 - ERROR - stderr - 99%|█████████▉| 22200/22434 [21:38:40<09:50, 2.52s/it] +2025-02-06 07:46:23 - ERROR - stderr - 99%|█████████▉| 22201/22434 [21:38:43<09:46, 2.52s/it] +2025-02-06 07:46:23 - ERROR - stderr - +2025-02-06 07:46:23 - ERROR - stderr - +2025-02-06 07:46:23 - INFO - stdout - {'loss': 0.3455, 'grad_norm': 1.5177451372146606, 'learning_rate': 
5.6574777587348195e-09, 'epoch': 2.97} +2025-02-06 07:46:23 - ERROR - stderr - 99%|█████████▉| 22201/22434 [21:38:43<09:46, 2.52s/it] +2025-02-06 07:46:25 - ERROR - stderr - 99%|█████████▉| 22202/22434 [21:38:45<09:48, 2.54s/it] +2025-02-06 07:46:25 - ERROR - stderr - +2025-02-06 07:46:25 - ERROR - stderr - +2025-02-06 07:46:25 - INFO - stdout - {'loss': 0.3811, 'grad_norm': 1.6913261413574219, 'learning_rate': 5.609024458804735e-09, 'epoch': 2.97} +2025-02-06 07:46:25 - ERROR - stderr - 99%|█████████▉| 22202/22434 [21:38:45<09:48, 2.54s/it] +2025-02-06 07:46:28 - ERROR - stderr - 99%|█████████▉| 22203/22434 [21:38:48<09:45, 2.53s/it] +2025-02-06 07:46:28 - ERROR - stderr - +2025-02-06 07:46:28 - ERROR - stderr - +2025-02-06 07:46:28 - INFO - stdout - {'loss': 0.4125, 'grad_norm': 1.6593009233474731, 'learning_rate': 5.560779482391976e-09, 'epoch': 2.97} +2025-02-06 07:46:28 - ERROR - stderr - 99%|█████████▉| 22203/22434 [21:38:48<09:45, 2.53s/it] +2025-02-06 07:46:30 - ERROR - stderr - 99%|█████████▉| 22204/22434 [21:38:50<09:40, 2.52s/it] +2025-02-06 07:46:30 - ERROR - stderr - +2025-02-06 07:46:30 - ERROR - stderr - +2025-02-06 07:46:30 - INFO - stdout - {'loss': 0.3843, 'grad_norm': 1.6107127666473389, 'learning_rate': 5.512742830500184e-09, 'epoch': 2.97} +2025-02-06 07:46:31 - ERROR - stderr - 99%|█████████▉| 22204/22434 [21:38:50<09:40, 2.52s/it] +2025-02-06 07:46:33 - ERROR - stderr - 99%|█████████▉| 22205/22434 [21:38:53<09:40, 2.54s/it] +2025-02-06 07:46:33 - ERROR - stderr - +2025-02-06 07:46:33 - ERROR - stderr - +2025-02-06 07:46:33 - INFO - stdout - {'loss': 0.4428, 'grad_norm': 1.9694842100143433, 'learning_rate': 5.464914504131891e-09, 'epoch': 2.97} +2025-02-06 07:46:33 - ERROR - stderr - 99%|█████████▉| 22205/22434 [21:38:53<09:40, 2.54s/it] +2025-02-06 07:46:35 - ERROR - stderr - 99%|█████████▉| 22206/22434 [21:38:55<09:30, 2.50s/it] +2025-02-06 07:46:35 - ERROR - stderr - +2025-02-06 07:46:35 - ERROR - stderr - +2025-02-06 07:46:35 - INFO - 
stdout - {'loss': 0.3646, 'grad_norm': 1.5029587745666504, 'learning_rate': 5.417294504284076e-09, 'epoch': 2.97} +2025-02-06 07:46:35 - ERROR - stderr - 99%|█████████▉| 22206/22434 [21:38:55<09:30, 2.50s/it] +2025-02-06 07:46:38 - ERROR - stderr - 99%|█████████▉| 22207/22434 [21:38:58<09:27, 2.50s/it] +2025-02-06 07:46:38 - ERROR - stderr - +2025-02-06 07:46:38 - ERROR - stderr - +2025-02-06 07:46:38 - INFO - stdout - {'loss': 0.3802, 'grad_norm': 1.6546305418014526, 'learning_rate': 5.36988283194817e-09, 'epoch': 2.97} +2025-02-06 07:46:38 - ERROR - stderr - 99%|█████████▉| 22207/22434 [21:38:58<09:27, 2.50s/it] +2025-02-06 07:46:40 - ERROR - stderr - 99%|█████████▉| 22208/22434 [21:39:00<09:28, 2.52s/it] +2025-02-06 07:46:41 - ERROR - stderr - +2025-02-06 07:46:41 - ERROR - stderr - +2025-02-06 07:46:41 - INFO - stdout - {'loss': 0.3406, 'grad_norm': 1.5915272235870361, 'learning_rate': 5.32267948811338e-09, 'epoch': 2.97} +2025-02-06 07:46:41 - ERROR - stderr - 99%|█████████▉| 22208/22434 [21:39:00<09:28, 2.52s/it] +2025-02-06 07:46:43 - ERROR - stderr - 99%|█████████▉| 22209/22434 [21:39:03<09:20, 2.49s/it] +2025-02-06 07:46:43 - ERROR - stderr - +2025-02-06 07:46:43 - ERROR - stderr - +2025-02-06 07:46:43 - INFO - stdout - {'loss': 0.3852, 'grad_norm': 1.7490209341049194, 'learning_rate': 5.275684473764475e-09, 'epoch': 2.97} +2025-02-06 07:46:43 - ERROR - stderr - 99%|█████████▉| 22209/22434 [21:39:03<09:20, 2.49s/it] +2025-02-06 07:46:45 - ERROR - stderr - 99%|█████████▉| 22210/22434 [21:39:05<09:18, 2.49s/it] +2025-02-06 07:46:45 - ERROR - stderr - +2025-02-06 07:46:45 - ERROR - stderr - +2025-02-06 07:46:45 - INFO - stdout - {'loss': 0.3024, 'grad_norm': 1.5525455474853516, 'learning_rate': 5.228897789878451e-09, 'epoch': 2.97} +2025-02-06 07:46:45 - ERROR - stderr - 99%|█████████▉| 22210/22434 [21:39:05<09:18, 2.49s/it] +2025-02-06 07:46:48 - ERROR - stderr - 99%|█████████▉| 22211/22434 [21:39:08<09:14, 2.49s/it] +2025-02-06 07:46:48 - ERROR - stderr - 
+2025-02-06 07:46:48 - ERROR - stderr - +2025-02-06 07:46:48 - INFO - stdout - {'loss': 0.3821, 'grad_norm': 1.4536575078964233, 'learning_rate': 5.182319437433414e-09, 'epoch': 2.97} +2025-02-06 07:46:48 - ERROR - stderr - 99%|█████████▉| 22211/22434 [21:39:08<09:14, 2.49s/it] +2025-02-06 07:46:51 - ERROR - stderr - 99%|█████████▉| 22212/22434 [21:39:10<09:34, 2.59s/it] +2025-02-06 07:46:51 - ERROR - stderr - +2025-02-06 07:46:51 - ERROR - stderr - +2025-02-06 07:46:51 - INFO - stdout - {'loss': 0.3207, 'grad_norm': 1.5042445659637451, 'learning_rate': 5.1359494173985895e-09, 'epoch': 2.97} +2025-02-06 07:46:51 - ERROR - stderr - 99%|█████████▉| 22212/22434 [21:39:11<09:34, 2.59s/it] +2025-02-06 07:46:53 - ERROR - stderr - 99%|█████████▉| 22213/22434 [21:39:13<09:26, 2.56s/it] +2025-02-06 07:46:53 - ERROR - stderr - +2025-02-06 07:46:53 - ERROR - stderr - +2025-02-06 07:46:53 - INFO - stdout - {'loss': 0.3861, 'grad_norm': 1.6227918863296509, 'learning_rate': 5.08978773074098e-09, 'epoch': 2.97} +2025-02-06 07:46:53 - ERROR - stderr - 99%|█████████▉| 22213/22434 [21:39:13<09:26, 2.56s/it] +2025-02-06 07:46:56 - ERROR - stderr - 99%|█████████▉| 22214/22434 [21:39:15<09:19, 2.54s/it] +2025-02-06 07:46:56 - ERROR - stderr - +2025-02-06 07:46:56 - ERROR - stderr - +2025-02-06 07:46:56 - INFO - stdout - {'loss': 0.3536, 'grad_norm': 1.5258833169937134, 'learning_rate': 5.043834378422041e-09, 'epoch': 2.97} +2025-02-06 07:46:56 - ERROR - stderr - 99%|█████████▉| 22214/22434 [21:39:16<09:19, 2.54s/it] +2025-02-06 07:46:58 - ERROR - stderr - 99%|█████████▉| 22215/22434 [21:39:18<09:17, 2.54s/it] +2025-02-06 07:46:58 - ERROR - stderr - +2025-02-06 07:46:58 - ERROR - stderr - +2025-02-06 07:46:58 - INFO - stdout - {'loss': 0.3758, 'grad_norm': 1.7199859619140625, 'learning_rate': 4.998089361401004e-09, 'epoch': 2.97} +2025-02-06 07:46:58 - ERROR - stderr - 99%|█████████▉| 22215/22434 [21:39:18<09:17, 2.54s/it] +2025-02-06 07:47:01 - ERROR - stderr - 99%|█████████▉| 
22216/22434 [21:39:21<09:13, 2.54s/it] +2025-02-06 07:47:01 - ERROR - stderr - +2025-02-06 07:47:01 - ERROR - stderr - +2025-02-06 07:47:01 - INFO - stdout - {'loss': 0.3921, 'grad_norm': 1.549997091293335, 'learning_rate': 4.95255268062933e-09, 'epoch': 2.97} +2025-02-06 07:47:01 - ERROR - stderr - 99%|█████████▉| 22216/22434 [21:39:21<09:13, 2.54s/it] +2025-02-06 07:47:03 - ERROR - stderr - 99%|█████████▉| 22217/22434 [21:39:23<09:10, 2.54s/it] +2025-02-06 07:47:03 - ERROR - stderr - +2025-02-06 07:47:03 - ERROR - stderr - +2025-02-06 07:47:03 - INFO - stdout - {'loss': 0.3958, 'grad_norm': 1.7030919790267944, 'learning_rate': 4.907224337058481e-09, 'epoch': 2.97} +2025-02-06 07:47:03 - ERROR - stderr - 99%|█████████▉| 22217/22434 [21:39:23<09:10, 2.54s/it] +2025-02-06 07:47:06 - ERROR - stderr - 99%|█████████▉| 22218/22434 [21:39:26<09:09, 2.54s/it] +2025-02-06 07:47:06 - ERROR - stderr - +2025-02-06 07:47:06 - ERROR - stderr - +2025-02-06 07:47:06 - INFO - stdout - {'loss': 0.3689, 'grad_norm': 1.543946385383606, 'learning_rate': 4.8621043316321444e-09, 'epoch': 2.97} +2025-02-06 07:47:06 - ERROR - stderr - 99%|█████████▉| 22218/22434 [21:39:26<09:09, 2.54s/it] +2025-02-06 07:47:08 - ERROR - stderr - 99%|█████████▉| 22219/22434 [21:39:28<09:00, 2.52s/it] +2025-02-06 07:47:08 - ERROR - stderr - +2025-02-06 07:47:08 - ERROR - stderr - +2025-02-06 07:47:08 - INFO - stdout - {'loss': 0.3522, 'grad_norm': 1.5781748294830322, 'learning_rate': 4.817192665291792e-09, 'epoch': 2.97} +2025-02-06 07:47:08 - ERROR - stderr - 99%|█████████▉| 22219/22434 [21:39:28<09:00, 2.52s/it] +2025-02-06 07:47:11 - ERROR - stderr - 99%|█████████▉| 22220/22434 [21:39:31<08:56, 2.51s/it] +2025-02-06 07:47:11 - ERROR - stderr - +2025-02-06 07:47:11 - ERROR - stderr - +2025-02-06 07:47:11 - INFO - stdout - {'loss': 0.3919, 'grad_norm': 1.6947940587997437, 'learning_rate': 4.77248933897112e-09, 'epoch': 2.97} +2025-02-06 07:47:11 - ERROR - stderr - 99%|█████████▉| 22220/22434 
[21:39:31<08:56, 2.51s/it] +2025-02-06 07:47:13 - ERROR - stderr - 99%|█████████▉| 22221/22434 [21:39:33<08:52, 2.50s/it] +2025-02-06 07:47:13 - ERROR - stderr - +2025-02-06 07:47:13 - ERROR - stderr - +2025-02-06 07:47:13 - INFO - stdout - {'loss': 0.3793, 'grad_norm': 1.6470860242843628, 'learning_rate': 4.727994353604937e-09, 'epoch': 2.97} +2025-02-06 07:47:13 - ERROR - stderr - 99%|█████████▉| 22221/22434 [21:39:33<08:52, 2.50s/it] +2025-02-06 07:47:16 - ERROR - stderr - 99%|█████████▉| 22222/22434 [21:39:36<08:49, 2.50s/it] +2025-02-06 07:47:16 - ERROR - stderr - +2025-02-06 07:47:16 - ERROR - stderr - +2025-02-06 07:47:16 - INFO - stdout - {'loss': 0.3427, 'grad_norm': 1.4695788621902466, 'learning_rate': 4.683707710118057e-09, 'epoch': 2.97} +2025-02-06 07:47:16 - ERROR - stderr - 99%|█████████▉| 22222/22434 [21:39:36<08:49, 2.50s/it] +2025-02-06 07:47:18 - ERROR - stderr - 99%|█████████▉| 22223/22434 [21:39:38<08:44, 2.49s/it] +2025-02-06 07:47:18 - ERROR - stderr - +2025-02-06 07:47:18 - ERROR - stderr - +2025-02-06 07:47:18 - INFO - stdout - {'loss': 0.3708, 'grad_norm': 1.5075627565383911, 'learning_rate': 4.6396294094352975e-09, 'epoch': 2.97} +2025-02-06 07:47:18 - ERROR - stderr - 99%|█████████▉| 22223/22434 [21:39:38<08:44, 2.49s/it] +2025-02-06 07:47:21 - ERROR - stderr - 99%|█████████▉| 22224/22434 [21:39:41<08:42, 2.49s/it] +2025-02-06 07:47:21 - ERROR - stderr - +2025-02-06 07:47:21 - ERROR - stderr - +2025-02-06 07:47:21 - INFO - stdout - {'loss': 0.2974, 'grad_norm': 1.574312686920166, 'learning_rate': 4.595759452474812e-09, 'epoch': 2.97} +2025-02-06 07:47:21 - ERROR - stderr - 99%|█████████▉| 22224/22434 [21:39:41<08:42, 2.49s/it] +2025-02-06 07:47:23 - ERROR - stderr - 99%|█████████▉| 22225/22434 [21:39:43<08:37, 2.48s/it] +2025-02-06 07:47:23 - ERROR - stderr - +2025-02-06 07:47:23 - ERROR - stderr - +2025-02-06 07:47:23 - INFO - stdout - {'loss': 0.3501, 'grad_norm': 1.4989244937896729, 'learning_rate': 4.552097840151426e-09, 'epoch': 
2.97} +2025-02-06 07:47:23 - ERROR - stderr - 99%|█████████▉| 22225/22434 [21:39:43<08:37, 2.48s/it] +2025-02-06 07:47:26 - ERROR - stderr - 99%|█████████▉| 22226/22434 [21:39:45<08:30, 2.46s/it] +2025-02-06 07:47:26 - ERROR - stderr - +2025-02-06 07:47:26 - ERROR - stderr - +2025-02-06 07:47:26 - INFO - stdout - {'loss': 0.3826, 'grad_norm': 1.5745875835418701, 'learning_rate': 4.50864457337441e-09, 'epoch': 2.97} +2025-02-06 07:47:26 - ERROR - stderr - 99%|█████████▉| 22226/22434 [21:39:45<08:30, 2.46s/it] +2025-02-06 07:47:28 - ERROR - stderr - 99%|█████████▉| 22227/22434 [21:39:48<08:32, 2.48s/it] +2025-02-06 07:47:28 - ERROR - stderr - +2025-02-06 07:47:28 - ERROR - stderr - +2025-02-06 07:47:28 - INFO - stdout - {'loss': 0.4013, 'grad_norm': 1.5760798454284668, 'learning_rate': 4.465399653050817e-09, 'epoch': 2.97} +2025-02-06 07:47:28 - ERROR - stderr - 99%|█████████▉| 22227/22434 [21:39:48<08:32, 2.48s/it] +2025-02-06 07:47:31 - ERROR - stderr - 99%|█████████▉| 22228/22434 [21:39:50<08:36, 2.51s/it] +2025-02-06 07:47:31 - ERROR - stderr - +2025-02-06 07:47:31 - ERROR - stderr - +2025-02-06 07:47:31 - INFO - stdout - {'loss': 0.3196, 'grad_norm': 1.4698699712753296, 'learning_rate': 4.422363080081038e-09, 'epoch': 2.97} +2025-02-06 07:47:31 - ERROR - stderr - 99%|█████████▉| 22228/22434 [21:39:51<08:36, 2.51s/it] +2025-02-06 07:47:33 - ERROR - stderr - 99%|█████████▉| 22229/22434 [21:39:53<08:31, 2.49s/it] +2025-02-06 07:47:33 - ERROR - stderr - +2025-02-06 07:47:33 - ERROR - stderr - +2025-02-06 07:47:33 - INFO - stdout - {'loss': 0.38, 'grad_norm': 1.6363751888275146, 'learning_rate': 4.379534855362133e-09, 'epoch': 2.97} +2025-02-06 07:47:33 - ERROR - stderr - 99%|█████████▉| 22229/22434 [21:39:53<08:31, 2.49s/it] +2025-02-06 07:47:36 - ERROR - stderr - 99%|█████████▉| 22230/22434 [21:39:55<08:28, 2.49s/it] +2025-02-06 07:47:36 - ERROR - stderr - +2025-02-06 07:47:36 - ERROR - stderr - +2025-02-06 07:47:36 - INFO - stdout - {'loss': 0.3318, 'grad_norm': 
1.5081733465194702, 'learning_rate': 4.336914979787832e-09, 'epoch': 2.97} +2025-02-06 07:47:36 - ERROR - stderr - 99%|█████████▉| 22230/22434 [21:39:55<08:28, 2.49s/it] +2025-02-06 07:47:38 - ERROR - stderr - 99%|█████████▉| 22231/22434 [21:39:58<08:27, 2.50s/it] +2025-02-06 07:47:38 - ERROR - stderr - +2025-02-06 07:47:38 - ERROR - stderr - +2025-02-06 07:47:38 - INFO - stdout - {'loss': 0.3416, 'grad_norm': 1.4902421236038208, 'learning_rate': 4.294503454244092e-09, 'epoch': 2.97} +2025-02-06 07:47:38 - ERROR - stderr - 99%|█████████▉| 22231/22434 [21:39:58<08:27, 2.50s/it] +2025-02-06 07:47:41 - ERROR - stderr - 99%|█████████▉| 22232/22434 [21:40:00<08:22, 2.49s/it] +2025-02-06 07:47:41 - ERROR - stderr - +2025-02-06 07:47:41 - ERROR - stderr - +2025-02-06 07:47:41 - INFO - stdout - {'loss': 0.3623, 'grad_norm': 1.7309287786483765, 'learning_rate': 4.252300279617982e-09, 'epoch': 2.97} +2025-02-06 07:47:41 - ERROR - stderr - 99%|█████████▉| 22232/22434 [21:40:00<08:22, 2.49s/it] +2025-02-06 07:47:43 - ERROR - stderr - 99%|█████████▉| 22233/22434 [21:40:03<08:21, 2.50s/it] +2025-02-06 07:47:43 - ERROR - stderr - +2025-02-06 07:47:43 - ERROR - stderr - +2025-02-06 07:47:43 - INFO - stdout - {'loss': 0.3572, 'grad_norm': 1.535492181777954, 'learning_rate': 4.2103054567876885e-09, 'epoch': 2.97} +2025-02-06 07:47:43 - ERROR - stderr - 99%|█████████▉| 22233/22434 [21:40:03<08:21, 2.50s/it] +2025-02-06 07:47:46 - ERROR - stderr - 99%|█████████▉| 22234/22434 [21:40:05<08:20, 2.50s/it] +2025-02-06 07:47:46 - ERROR - stderr - +2025-02-06 07:47:46 - ERROR - stderr - +2025-02-06 07:47:46 - INFO - stdout - {'loss': 0.4303, 'grad_norm': 1.7999907732009888, 'learning_rate': 4.1685189866280676e-09, 'epoch': 2.97} +2025-02-06 07:47:46 - ERROR - stderr - 99%|█████████▉| 22234/22434 [21:40:05<08:20, 2.50s/it] +2025-02-06 07:47:48 - ERROR - stderr - 99%|█████████▉| 22235/22434 [21:40:08<08:19, 2.51s/it] +2025-02-06 07:47:48 - ERROR - stderr - +2025-02-06 07:47:48 - ERROR - stderr 
- +2025-02-06 07:47:48 - INFO - stdout - {'loss': 0.3393, 'grad_norm': 1.6312158107757568, 'learning_rate': 4.126940870010643e-09, 'epoch': 2.97} +2025-02-06 07:47:48 - ERROR - stderr - 99%|█████████▉| 22235/22434 [21:40:08<08:19, 2.51s/it] +2025-02-06 07:47:51 - ERROR - stderr - 99%|█████████▉| 22236/22434 [21:40:11<08:43, 2.64s/it] +2025-02-06 07:47:51 - ERROR - stderr - +2025-02-06 07:47:51 - ERROR - stderr - +2025-02-06 07:47:51 - INFO - stdout - {'loss': 0.3861, 'grad_norm': 1.322710394859314, 'learning_rate': 4.085571107802499e-09, 'epoch': 2.97} +2025-02-06 07:47:51 - ERROR - stderr - 99%|█████████▉| 22236/22434 [21:40:11<08:43, 2.64s/it] +2025-02-06 07:47:54 - ERROR - stderr - 99%|█████████▉| 22237/22434 [21:40:13<08:30, 2.59s/it] +2025-02-06 07:47:54 - ERROR - stderr - +2025-02-06 07:47:54 - ERROR - stderr - +2025-02-06 07:47:54 - INFO - stdout - {'loss': 0.3776, 'grad_norm': 1.6536288261413574, 'learning_rate': 4.044409700866281e-09, 'epoch': 2.97} +2025-02-06 07:47:54 - ERROR - stderr - 99%|█████████▉| 22237/22434 [21:40:13<08:30, 2.59s/it] +2025-02-06 07:47:56 - ERROR - stderr - 99%|█████████▉| 22238/22434 [21:40:16<08:20, 2.55s/it] +2025-02-06 07:47:56 - ERROR - stderr - +2025-02-06 07:47:56 - ERROR - stderr - +2025-02-06 07:47:56 - INFO - stdout - {'loss': 0.3516, 'grad_norm': 1.6778944730758667, 'learning_rate': 4.003456650057968e-09, 'epoch': 2.97} +2025-02-06 07:47:56 - ERROR - stderr - 99%|█████████▉| 22238/22434 [21:40:16<08:20, 2.55s/it] +2025-02-06 07:47:59 - ERROR - stderr - 99%|█████████▉| 22239/22434 [21:40:18<08:15, 2.54s/it] +2025-02-06 07:47:59 - ERROR - stderr - +2025-02-06 07:47:59 - ERROR - stderr - +2025-02-06 07:47:59 - INFO - stdout - {'loss': 0.3306, 'grad_norm': 1.454187273979187, 'learning_rate': 3.962711956233545e-09, 'epoch': 2.97} +2025-02-06 07:47:59 - ERROR - stderr - 99%|█████████▉| 22239/22434 [21:40:18<08:15, 2.54s/it] +2025-02-06 07:48:01 - ERROR - stderr - 99%|█████████▉| 22240/22434 [21:40:21<08:14, 2.55s/it] 
+2025-02-06 07:48:01 - ERROR - stderr - +2025-02-06 07:48:01 - ERROR - stderr - +2025-02-06 07:48:01 - INFO - stdout - {'loss': 0.4412, 'grad_norm': 1.6880204677581787, 'learning_rate': 3.9221756202401096e-09, 'epoch': 2.97} +2025-02-06 07:48:01 - ERROR - stderr - 99%|█████████▉| 22240/22434 [21:40:21<08:14, 2.55s/it] +2025-02-06 07:48:04 - ERROR - stderr - 99%|█████████▉| 22241/22434 [21:40:23<08:05, 2.52s/it] +2025-02-06 07:48:04 - ERROR - stderr - +2025-02-06 07:48:04 - ERROR - stderr - +2025-02-06 07:48:04 - INFO - stdout - {'loss': 0.355, 'grad_norm': 1.5218244791030884, 'learning_rate': 3.8818476429247634e-09, 'epoch': 2.97} +2025-02-06 07:48:04 - ERROR - stderr - 99%|█████████▉| 22241/22434 [21:40:23<08:05, 2.52s/it] +2025-02-06 07:48:06 - ERROR - stderr - 99%|█████████▉| 22242/22434 [21:40:26<08:00, 2.50s/it] +2025-02-06 07:48:06 - ERROR - stderr - +2025-02-06 07:48:06 - ERROR - stderr - +2025-02-06 07:48:06 - INFO - stdout - {'loss': 0.3878, 'grad_norm': 1.5361510515213013, 'learning_rate': 3.8417280251257235e-09, 'epoch': 2.97} +2025-02-06 07:48:06 - ERROR - stderr - 99%|█████████▉| 22242/22434 [21:40:26<08:00, 2.50s/it] +2025-02-06 07:48:09 - ERROR - stderr - 99%|█████████▉| 22243/22434 [21:40:28<07:54, 2.49s/it] +2025-02-06 07:48:09 - ERROR - stderr - +2025-02-06 07:48:09 - ERROR - stderr - +2025-02-06 07:48:09 - INFO - stdout - {'loss': 0.3819, 'grad_norm': 1.7068085670471191, 'learning_rate': 3.80181676768121e-09, 'epoch': 2.97} +2025-02-06 07:48:09 - ERROR - stderr - 99%|█████████▉| 22243/22434 [21:40:28<07:54, 2.49s/it] +2025-02-06 07:48:11 - ERROR - stderr - 99%|█████████▉| 22244/22434 [21:40:31<07:56, 2.51s/it] +2025-02-06 07:48:11 - ERROR - stderr - +2025-02-06 07:48:11 - ERROR - stderr - +2025-02-06 07:48:11 - INFO - stdout - {'loss': 0.3415, 'grad_norm': 1.4000853300094604, 'learning_rate': 3.762113871422779e-09, 'epoch': 2.97} +2025-02-06 07:48:11 - ERROR - stderr - 99%|█████████▉| 22244/22434 [21:40:31<07:56, 2.51s/it] +2025-02-06 07:48:14 - 
ERROR - stderr - 99%|█████████▉| 22245/22434 [21:40:33<07:53, 2.51s/it] +2025-02-06 07:48:14 - ERROR - stderr - +2025-02-06 07:48:14 - ERROR - stderr - +2025-02-06 07:48:14 - INFO - stdout - {'loss': 0.3567, 'grad_norm': 1.7498782873153687, 'learning_rate': 3.7226193371775465e-09, 'epoch': 2.97} +2025-02-06 07:48:14 - ERROR - stderr - 99%|█████████▉| 22245/22434 [21:40:33<07:53, 2.51s/it] +2025-02-06 07:48:16 - ERROR - stderr - 99%|█████████▉| 22246/22434 [21:40:36<07:49, 2.50s/it] +2025-02-06 07:48:16 - ERROR - stderr - +2025-02-06 07:48:16 - ERROR - stderr - +2025-02-06 07:48:16 - INFO - stdout - {'loss': 0.4289, 'grad_norm': 1.6771210432052612, 'learning_rate': 3.6833331657692985e-09, 'epoch': 2.97} +2025-02-06 07:48:16 - ERROR - stderr - 99%|█████████▉| 22246/22434 [21:40:36<07:49, 2.50s/it] +2025-02-06 07:48:19 - ERROR - stderr - 99%|█████████▉| 22247/22434 [21:40:38<07:53, 2.53s/it] +2025-02-06 07:48:19 - ERROR - stderr - +2025-02-06 07:48:19 - ERROR - stderr - +2025-02-06 07:48:19 - INFO - stdout - {'loss': 0.3768, 'grad_norm': 1.6229379177093506, 'learning_rate': 3.6442553580162687e-09, 'epoch': 2.97} +2025-02-06 07:48:19 - ERROR - stderr - 99%|█████████▉| 22247/22434 [21:40:38<07:53, 2.53s/it] +2025-02-06 07:48:21 - ERROR - stderr - 99%|█████████▉| 22248/22434 [21:40:41<07:52, 2.54s/it] +2025-02-06 07:48:21 - ERROR - stderr - +2025-02-06 07:48:21 - ERROR - stderr - +2025-02-06 07:48:21 - INFO - stdout - {'loss': 0.324, 'grad_norm': 1.4154161214828491, 'learning_rate': 3.6053859147333614e-09, 'epoch': 2.98} +2025-02-06 07:48:21 - ERROR - stderr - 99%|█████████▉| 22248/22434 [21:40:41<07:52, 2.54s/it] +2025-02-06 07:48:24 - ERROR - stderr - 99%|█████████▉| 22249/22434 [21:40:44<07:53, 2.56s/it] +2025-02-06 07:48:24 - ERROR - stderr - +2025-02-06 07:48:24 - ERROR - stderr - +2025-02-06 07:48:24 - INFO - stdout - {'loss': 0.3239, 'grad_norm': 1.418697714805603, 'learning_rate': 3.5667248367310392e-09, 'epoch': 2.98} +2025-02-06 07:48:24 - ERROR - stderr - 
99%|█████████▉| 22249/22434 [21:40:44<07:53, 2.56s/it] +2025-02-06 07:48:27 - ERROR - stderr - 99%|█████████▉| 22250/22434 [21:40:46<08:01, 2.61s/it] +2025-02-06 07:48:27 - ERROR - stderr - +2025-02-06 07:48:27 - ERROR - stderr - +2025-02-06 07:48:27 - INFO - stdout - {'loss': 0.381, 'grad_norm': 1.7120435237884521, 'learning_rate': 3.5282721248142137e-09, 'epoch': 2.98} +2025-02-06 07:48:27 - ERROR - stderr - 99%|█████████▉| 22250/22434 [21:40:46<08:01, 2.61s/it] +2025-02-06 07:48:29 - ERROR - stderr - 99%|█████████▉| 22251/22434 [21:40:49<07:51, 2.58s/it] +2025-02-06 07:48:29 - ERROR - stderr - +2025-02-06 07:48:29 - ERROR - stderr - +2025-02-06 07:48:29 - INFO - stdout - {'loss': 0.3555, 'grad_norm': 1.5570100545883179, 'learning_rate': 3.4900277797844663e-09, 'epoch': 2.98} +2025-02-06 07:48:29 - ERROR - stderr - 99%|█████████▉| 22251/22434 [21:40:49<07:51, 2.58s/it] +2025-02-06 07:48:32 - ERROR - stderr - 99%|█████████▉| 22252/22434 [21:40:51<07:45, 2.56s/it] +2025-02-06 07:48:32 - ERROR - stderr - +2025-02-06 07:48:32 - ERROR - stderr - +2025-02-06 07:48:32 - INFO - stdout - {'loss': 0.3557, 'grad_norm': 1.5194956064224243, 'learning_rate': 3.4519918024400467e-09, 'epoch': 2.98} +2025-02-06 07:48:32 - ERROR - stderr - 99%|█████████▉| 22252/22434 [21:40:51<07:45, 2.56s/it] +2025-02-06 07:48:34 - ERROR - stderr - 99%|█████████▉| 22253/22434 [21:40:54<07:41, 2.55s/it] +2025-02-06 07:48:34 - ERROR - stderr - +2025-02-06 07:48:34 - ERROR - stderr - +2025-02-06 07:48:34 - INFO - stdout - {'loss': 0.2945, 'grad_norm': 1.3730318546295166, 'learning_rate': 3.4141641935736547e-09, 'epoch': 2.98} +2025-02-06 07:48:34 - ERROR - stderr - 99%|█████████▉| 22253/22434 [21:40:54<07:41, 2.55s/it] +2025-02-06 07:48:37 - ERROR - stderr - 99%|█████████▉| 22254/22434 [21:40:56<07:36, 2.54s/it] +2025-02-06 07:48:37 - ERROR - stderr - +2025-02-06 07:48:37 - ERROR - stderr - +2025-02-06 07:48:37 - INFO - stdout - {'loss': 0.33, 'grad_norm': 1.6216363906860352, 'learning_rate': 
3.376544953972438e-09, 'epoch': 2.98} +2025-02-06 07:48:37 - ERROR - stderr - 99%|█████████▉| 22254/22434 [21:40:56<07:36, 2.54s/it] +2025-02-06 07:48:39 - ERROR - stderr - 99%|█████████▉| 22255/22434 [21:40:59<07:32, 2.53s/it] +2025-02-06 07:48:39 - ERROR - stderr - +2025-02-06 07:48:39 - ERROR - stderr - +2025-02-06 07:48:39 - INFO - stdout - {'loss': 0.379, 'grad_norm': 1.5357571840286255, 'learning_rate': 3.3391340844224353e-09, 'epoch': 2.98} +2025-02-06 07:48:39 - ERROR - stderr - 99%|█████████▉| 22255/22434 [21:40:59<07:32, 2.53s/it] +2025-02-06 07:48:42 - ERROR - stderr - 99%|█████████▉| 22256/22434 [21:41:01<07:30, 2.53s/it] +2025-02-06 07:48:42 - ERROR - stderr - +2025-02-06 07:48:42 - ERROR - stderr - +2025-02-06 07:48:42 - INFO - stdout - {'loss': 0.3832, 'grad_norm': 1.5569233894348145, 'learning_rate': 3.301931585701912e-09, 'epoch': 2.98} +2025-02-06 07:48:42 - ERROR - stderr - 99%|█████████▉| 22256/22434 [21:41:01<07:30, 2.53s/it] +2025-02-06 07:48:44 - ERROR - stderr - 99%|█████████▉| 22257/22434 [21:41:04<07:29, 2.54s/it] +2025-02-06 07:48:44 - ERROR - stderr - +2025-02-06 07:48:44 - ERROR - stderr - +2025-02-06 07:48:44 - INFO - stdout - {'loss': 0.3586, 'grad_norm': 1.7576504945755005, 'learning_rate': 3.264937458585804e-09, 'epoch': 2.98} +2025-02-06 07:48:44 - ERROR - stderr - 99%|█████████▉| 22257/22434 [21:41:04<07:29, 2.54s/it] +2025-02-06 07:48:47 - ERROR - stderr - 99%|█████████▉| 22258/22434 [21:41:07<07:27, 2.54s/it] +2025-02-06 07:48:47 - ERROR - stderr - +2025-02-06 07:48:47 - ERROR - stderr - +2025-02-06 07:48:47 - INFO - stdout - {'loss': 0.3837, 'grad_norm': 1.5837814807891846, 'learning_rate': 3.228151703847937e-09, 'epoch': 2.98} +2025-02-06 07:48:47 - ERROR - stderr - 99%|█████████▉| 22258/22434 [21:41:07<07:27, 2.54s/it] +2025-02-06 07:48:50 - ERROR - stderr - 99%|█████████▉| 22259/22434 [21:41:09<07:34, 2.60s/it] +2025-02-06 07:48:50 - ERROR - stderr - +2025-02-06 07:48:50 - ERROR - stderr - +2025-02-06 07:48:50 - INFO - 
stdout - {'loss': 0.3508, 'grad_norm': 1.5091686248779297, 'learning_rate': 3.1915743222521446e-09, 'epoch': 2.98} +2025-02-06 07:48:50 - ERROR - stderr - 99%|█████████▉| 22259/22434 [21:41:09<07:34, 2.60s/it] +2025-02-06 07:48:52 - ERROR - stderr - 99%|█████████▉| 22260/22434 [21:41:12<07:27, 2.57s/it] +2025-02-06 07:48:52 - ERROR - stderr - +2025-02-06 07:48:52 - ERROR - stderr - +2025-02-06 07:48:52 - INFO - stdout - {'loss': 0.4265, 'grad_norm': 1.7607334852218628, 'learning_rate': 3.1552053145622596e-09, 'epoch': 2.98} +2025-02-06 07:48:52 - ERROR - stderr - 99%|█████████▉| 22260/22434 [21:41:12<07:27, 2.57s/it] +2025-02-06 07:48:55 - ERROR - stderr - 99%|█████████▉| 22261/22434 [21:41:14<07:21, 2.55s/it] +2025-02-06 07:48:55 - ERROR - stderr - +2025-02-06 07:48:55 - ERROR - stderr - +2025-02-06 07:48:55 - INFO - stdout - {'loss': 0.3447, 'grad_norm': 1.5745912790298462, 'learning_rate': 3.119044681536565e-09, 'epoch': 2.98} +2025-02-06 07:48:55 - ERROR - stderr - 99%|█████████▉| 22261/22434 [21:41:14<07:21, 2.55s/it] +2025-02-06 07:48:57 - ERROR - stderr - 99%|█████████▉| 22262/22434 [21:41:17<07:15, 2.53s/it] +2025-02-06 07:48:57 - ERROR - stderr - +2025-02-06 07:48:57 - ERROR - stderr - +2025-02-06 07:48:57 - INFO - stdout - {'loss': 0.3281, 'grad_norm': 1.4709348678588867, 'learning_rate': 3.083092423928902e-09, 'epoch': 2.98} +2025-02-06 07:48:57 - ERROR - stderr - 99%|█████████▉| 22262/22434 [21:41:17<07:15, 2.53s/it] +2025-02-06 07:49:00 - ERROR - stderr - 99%|█████████▉| 22263/22434 [21:41:19<07:11, 2.52s/it] +2025-02-06 07:49:00 - ERROR - stderr - +2025-02-06 07:49:00 - ERROR - stderr - +2025-02-06 07:49:00 - INFO - stdout - {'loss': 0.4085, 'grad_norm': 1.81768798828125, 'learning_rate': 3.0473485424875603e-09, 'epoch': 2.98} +2025-02-06 07:49:00 - ERROR - stderr - 99%|█████████▉| 22263/22434 [21:41:19<07:11, 2.52s/it] +2025-02-06 07:49:02 - ERROR - stderr - 99%|█████████▉| 22264/22434 [21:41:22<07:11, 2.54s/it] +2025-02-06 07:49:02 - ERROR - stderr 
- +2025-02-06 07:49:02 - ERROR - stderr - +2025-02-06 07:49:02 - INFO - stdout - {'loss': 0.327, 'grad_norm': 1.5832089185714722, 'learning_rate': 3.0118130379575005e-09, 'epoch': 2.98} +2025-02-06 07:49:02 - ERROR - stderr - 99%|█████████▉| 22264/22434 [21:41:22<07:11, 2.54s/it] +2025-02-06 07:49:05 - ERROR - stderr - 99%|█████████▉| 22265/22434 [21:41:24<07:05, 2.52s/it] +2025-02-06 07:49:05 - ERROR - stderr - +2025-02-06 07:49:05 - ERROR - stderr - +2025-02-06 07:49:05 - INFO - stdout - {'loss': 0.3331, 'grad_norm': 1.3883129358291626, 'learning_rate': 2.9764859110814614e-09, 'epoch': 2.98} +2025-02-06 07:49:05 - ERROR - stderr - 99%|█████████▉| 22265/22434 [21:41:24<07:05, 2.52s/it] +2025-02-06 07:49:07 - ERROR - stderr - 99%|█████████▉| 22266/22434 [21:41:27<06:59, 2.50s/it] +2025-02-06 07:49:07 - ERROR - stderr - +2025-02-06 07:49:07 - ERROR - stderr - +2025-02-06 07:49:07 - INFO - stdout - {'loss': 0.347, 'grad_norm': 1.6383506059646606, 'learning_rate': 2.9413671625933005e-09, 'epoch': 2.98} +2025-02-06 07:49:07 - ERROR - stderr - 99%|█████████▉| 22266/22434 [21:41:27<06:59, 2.50s/it] +2025-02-06 07:49:10 - ERROR - stderr - 99%|█████████▉| 22267/22434 [21:41:29<06:57, 2.50s/it] +2025-02-06 07:49:10 - ERROR - stderr - +2025-02-06 07:49:10 - ERROR - stderr - +2025-02-06 07:49:10 - INFO - stdout - {'loss': 0.3497, 'grad_norm': 1.5711519718170166, 'learning_rate': 2.906456793226875e-09, 'epoch': 2.98} +2025-02-06 07:49:10 - ERROR - stderr - 99%|█████████▉| 22267/22434 [21:41:29<06:57, 2.50s/it] +2025-02-06 07:49:12 - ERROR - stderr - 99%|█████████▉| 22268/22434 [21:41:32<06:53, 2.49s/it] +2025-02-06 07:49:12 - ERROR - stderr - +2025-02-06 07:49:12 - ERROR - stderr - +2025-02-06 07:49:12 - INFO - stdout - {'loss': 0.4144, 'grad_norm': 1.8194258213043213, 'learning_rate': 2.871754803709381e-09, 'epoch': 2.98} +2025-02-06 07:49:12 - ERROR - stderr - 99%|█████████▉| 22268/22434 [21:41:32<06:53, 2.49s/it] +2025-02-06 07:49:14 - ERROR - stderr - 99%|█████████▉| 
22269/22434 [21:41:34<06:50, 2.49s/it] +2025-02-06 07:49:15 - ERROR - stderr - +2025-02-06 07:49:15 - ERROR - stderr - +2025-02-06 07:49:15 - INFO - stdout - {'loss': 0.3422, 'grad_norm': 1.4886397123336792, 'learning_rate': 2.8372611947635742e-09, 'epoch': 2.98} +2025-02-06 07:49:15 - ERROR - stderr - 99%|█████████▉| 22269/22434 [21:41:34<06:50, 2.49s/it] +2025-02-06 07:49:17 - ERROR - stderr - 99%|█████████▉| 22270/22434 [21:41:37<06:50, 2.50s/it] +2025-02-06 07:49:17 - ERROR - stderr - +2025-02-06 07:49:17 - ERROR - stderr - +2025-02-06 07:49:17 - INFO - stdout - {'loss': 0.3606, 'grad_norm': 1.5778586864471436, 'learning_rate': 2.8029759671088787e-09, 'epoch': 2.98} +2025-02-06 07:49:17 - ERROR - stderr - 99%|█████████▉| 22270/22434 [21:41:37<06:50, 2.50s/it] +2025-02-06 07:49:19 - ERROR - stderr - 99%|█████████▉| 22271/22434 [21:41:39<06:46, 2.49s/it] +2025-02-06 07:49:20 - ERROR - stderr - +2025-02-06 07:49:20 - ERROR - stderr - +2025-02-06 07:49:20 - INFO - stdout - {'loss': 0.3395, 'grad_norm': 1.503562092781067, 'learning_rate': 2.7688991214591677e-09, 'epoch': 2.98} +2025-02-06 07:49:20 - ERROR - stderr - 99%|█████████▉| 22271/22434 [21:41:39<06:46, 2.49s/it] +2025-02-06 07:49:22 - ERROR - stderr - 99%|█████████▉| 22272/22434 [21:41:42<06:47, 2.51s/it] +2025-02-06 07:49:22 - ERROR - stderr - +2025-02-06 07:49:22 - ERROR - stderr - +2025-02-06 07:49:22 - INFO - stdout - {'loss': 0.3688, 'grad_norm': 1.6470410823822021, 'learning_rate': 2.7350306585260943e-09, 'epoch': 2.98} +2025-02-06 07:49:22 - ERROR - stderr - 99%|█████████▉| 22272/22434 [21:41:42<06:47, 2.51s/it] +2025-02-06 07:49:25 - ERROR - stderr - 99%|█████████▉| 22273/22434 [21:41:44<06:42, 2.50s/it] +2025-02-06 07:49:25 - ERROR - stderr - +2025-02-06 07:49:25 - ERROR - stderr - +2025-02-06 07:49:25 - INFO - stdout - {'loss': 0.3381, 'grad_norm': 1.3695263862609863, 'learning_rate': 2.7013705790146503e-09, 'epoch': 2.98} +2025-02-06 07:49:25 - ERROR - stderr - 99%|█████████▉| 22273/22434 
[21:41:44<06:42, 2.50s/it] +2025-02-06 07:49:27 - ERROR - stderr - 99%|█████████▉| 22274/22434 [21:41:47<06:38, 2.49s/it] +2025-02-06 07:49:27 - ERROR - stderr - +2025-02-06 07:49:27 - ERROR - stderr - +2025-02-06 07:49:27 - INFO - stdout - {'loss': 0.3636, 'grad_norm': 1.435930609703064, 'learning_rate': 2.667918883627607e-09, 'epoch': 2.98} +2025-02-06 07:49:27 - ERROR - stderr - 99%|█████████▉| 22274/22434 [21:41:47<06:38, 2.49s/it] +2025-02-06 07:49:29 - ERROR - stderr - 99%|█████████▉| 22275/22434 [21:41:49<06:35, 2.49s/it] +2025-02-06 07:49:30 - ERROR - stderr - +2025-02-06 07:49:30 - ERROR - stderr - +2025-02-06 07:49:30 - INFO - stdout - {'loss': 0.3973, 'grad_norm': 1.69691801071167, 'learning_rate': 2.634675573061074e-09, 'epoch': 2.98} +2025-02-06 07:49:30 - ERROR - stderr - 99%|█████████▉| 22275/22434 [21:41:49<06:35, 2.49s/it] +2025-02-06 07:49:32 - ERROR - stderr - 99%|█████████▉| 22276/22434 [21:41:52<06:38, 2.52s/it] +2025-02-06 07:49:32 - ERROR - stderr - +2025-02-06 07:49:32 - ERROR - stderr - +2025-02-06 07:49:32 - INFO - stdout - {'loss': 0.387, 'grad_norm': 1.6868587732315063, 'learning_rate': 2.6016406480078305e-09, 'epoch': 2.98} +2025-02-06 07:49:32 - ERROR - stderr - 99%|█████████▉| 22276/22434 [21:41:52<06:38, 2.52s/it] +2025-02-06 07:49:35 - ERROR - stderr - 99%|█████████▉| 22277/22434 [21:41:54<06:33, 2.51s/it] +2025-02-06 07:49:35 - ERROR - stderr - +2025-02-06 07:49:35 - ERROR - stderr - +2025-02-06 07:49:35 - INFO - stdout - {'loss': 0.3815, 'grad_norm': 1.686750888824463, 'learning_rate': 2.568814109157325e-09, 'epoch': 2.98} +2025-02-06 07:49:35 - ERROR - stderr - 99%|█████████▉| 22277/22434 [21:41:54<06:33, 2.51s/it] +2025-02-06 07:49:37 - ERROR - stderr - 99%|█████████▉| 22278/22434 [21:41:57<06:27, 2.49s/it] +2025-02-06 07:49:37 - ERROR - stderr - +2025-02-06 07:49:37 - ERROR - stderr - +2025-02-06 07:49:37 - INFO - stdout - {'loss': 0.391, 'grad_norm': 1.6707063913345337, 'learning_rate': 2.5361959571923445e-09, 'epoch': 2.98} 
+2025-02-06 07:49:37 - ERROR - stderr - 99%|█████████▉| 22278/22434 [21:41:57<06:27, 2.49s/it] +2025-02-06 07:49:40 - ERROR - stderr - 99%|█████████▉| 22279/22434 [21:41:59<06:28, 2.50s/it] +2025-02-06 07:49:40 - ERROR - stderr - +2025-02-06 07:49:40 - ERROR - stderr - +2025-02-06 07:49:40 - INFO - stdout - {'loss': 0.3619, 'grad_norm': 1.5386732816696167, 'learning_rate': 2.5037861927945663e-09, 'epoch': 2.98} +2025-02-06 07:49:40 - ERROR - stderr - 99%|█████████▉| 22279/22434 [21:41:59<06:28, 2.50s/it] +2025-02-06 07:49:42 - ERROR - stderr - 99%|█████████▉| 22280/22434 [21:42:02<06:23, 2.49s/it] +2025-02-06 07:49:42 - ERROR - stderr - +2025-02-06 07:49:42 - ERROR - stderr - +2025-02-06 07:49:42 - INFO - stdout - {'loss': 0.3524, 'grad_norm': 1.622697114944458, 'learning_rate': 2.4715848166390053e-09, 'epoch': 2.98} +2025-02-06 07:49:42 - ERROR - stderr - 99%|█████████▉| 22280/22434 [21:42:02<06:23, 2.49s/it] +2025-02-06 07:49:45 - ERROR - stderr - 99%|█████████▉| 22281/22434 [21:42:04<06:31, 2.56s/it] +2025-02-06 07:49:45 - ERROR - stderr - +2025-02-06 07:49:45 - ERROR - stderr - +2025-02-06 07:49:45 - INFO - stdout - {'loss': 0.3447, 'grad_norm': 1.5207462310791016, 'learning_rate': 2.4395918293973476e-09, 'epoch': 2.98} +2025-02-06 07:49:45 - ERROR - stderr - 99%|█████████▉| 22281/22434 [21:42:05<06:31, 2.56s/it] +2025-02-06 07:49:47 - ERROR - stderr - 99%|█████████▉| 22282/22434 [21:42:07<06:25, 2.54s/it] +2025-02-06 07:49:47 - ERROR - stderr - +2025-02-06 07:49:47 - ERROR - stderr - +2025-02-06 07:49:47 - INFO - stdout - {'loss': 0.3749, 'grad_norm': 1.4792819023132324, 'learning_rate': 2.4078072317346156e-09, 'epoch': 2.98} +2025-02-06 07:49:47 - ERROR - stderr - 99%|█████████▉| 22282/22434 [21:42:07<06:25, 2.54s/it] +2025-02-06 07:49:50 - ERROR - stderr - 99%|█████████▉| 22283/22434 [21:42:09<06:23, 2.54s/it] +2025-02-06 07:49:50 - ERROR - stderr - +2025-02-06 07:49:50 - ERROR - stderr - +2025-02-06 07:49:50 - INFO - stdout - {'loss': 0.3666, 'grad_norm': 
1.7396942377090454, 'learning_rate': 2.3762310243147236e-09, 'epoch': 2.98} +2025-02-06 07:49:50 - ERROR - stderr - 99%|█████████▉| 22283/22434 [21:42:10<06:23, 2.54s/it] +2025-02-06 07:49:52 - ERROR - stderr - 99%|█████████▉| 22284/22434 [21:42:12<06:18, 2.53s/it] +2025-02-06 07:49:52 - ERROR - stderr - +2025-02-06 07:49:52 - ERROR - stderr - +2025-02-06 07:49:52 - INFO - stdout - {'loss': 0.4122, 'grad_norm': 1.8536089658737183, 'learning_rate': 2.3448632077960332e-09, 'epoch': 2.98} +2025-02-06 07:49:52 - ERROR - stderr - 99%|█████████▉| 22284/22434 [21:42:12<06:18, 2.53s/it] +2025-02-06 07:49:55 - ERROR - stderr - 99%|█████████▉| 22285/22434 [21:42:14<06:10, 2.49s/it] +2025-02-06 07:49:55 - ERROR - stderr - +2025-02-06 07:49:55 - ERROR - stderr - +2025-02-06 07:49:55 - INFO - stdout - {'loss': 0.3986, 'grad_norm': 1.7033309936523438, 'learning_rate': 2.313703782831356e-09, 'epoch': 2.98} +2025-02-06 07:49:55 - ERROR - stderr - 99%|█████████▉| 22285/22434 [21:42:14<06:10, 2.49s/it] +2025-02-06 07:49:57 - ERROR - stderr - 99%|█████████▉| 22286/22434 [21:42:17<06:07, 2.48s/it] +2025-02-06 07:49:57 - ERROR - stderr - +2025-02-06 07:49:57 - ERROR - stderr - +2025-02-06 07:49:57 - INFO - stdout - {'loss': 0.3711, 'grad_norm': 1.5611618757247925, 'learning_rate': 2.282752750071282e-09, 'epoch': 2.98} +2025-02-06 07:49:57 - ERROR - stderr - 99%|█████████▉| 22286/22434 [21:42:17<06:07, 2.48s/it] +2025-02-06 07:50:00 - ERROR - stderr - 99%|█████████▉| 22287/22434 [21:42:19<06:03, 2.48s/it] +2025-02-06 07:50:00 - ERROR - stderr - +2025-02-06 07:50:00 - ERROR - stderr - +2025-02-06 07:50:00 - INFO - stdout - {'loss': 0.3409, 'grad_norm': 1.4842947721481323, 'learning_rate': 2.2520101101597412e-09, 'epoch': 2.98} +2025-02-06 07:50:00 - ERROR - stderr - 99%|█████████▉| 22287/22434 [21:42:19<06:03, 2.48s/it] +2025-02-06 07:50:02 - ERROR - stderr - 99%|█████████▉| 22288/22434 [21:42:22<06:00, 2.47s/it] +2025-02-06 07:50:02 - ERROR - stderr - +2025-02-06 07:50:02 - ERROR - 
stderr - +2025-02-06 07:50:02 - INFO - stdout - {'loss': 0.3733, 'grad_norm': 1.5789676904678345, 'learning_rate': 2.2214758637384426e-09, 'epoch': 2.98} +2025-02-06 07:50:02 - ERROR - stderr - 99%|█████████▉| 22288/22434 [21:42:22<06:00, 2.47s/it] +2025-02-06 07:50:05 - ERROR - stderr - 99%|█████████▉| 22289/22434 [21:42:24<06:01, 2.50s/it] +2025-02-06 07:50:05 - ERROR - stderr - +2025-02-06 07:50:05 - ERROR - stderr - +2025-02-06 07:50:05 - INFO - stdout - {'loss': 0.3389, 'grad_norm': 1.4404479265213013, 'learning_rate': 2.1911500114446536e-09, 'epoch': 2.98} +2025-02-06 07:50:05 - ERROR - stderr - 99%|█████████▉| 22289/22434 [21:42:24<06:01, 2.50s/it] +2025-02-06 07:50:07 - ERROR - stderr - 99%|█████████▉| 22290/22434 [21:42:27<05:57, 2.48s/it] +2025-02-06 07:50:07 - ERROR - stderr - +2025-02-06 07:50:07 - ERROR - stderr - +2025-02-06 07:50:07 - INFO - stdout - {'loss': 0.3718, 'grad_norm': 1.5626124143600464, 'learning_rate': 2.1610325539089817e-09, 'epoch': 2.98} +2025-02-06 07:50:07 - ERROR - stderr - 99%|█████████▉| 22290/22434 [21:42:27<05:57, 2.48s/it] +2025-02-06 07:50:10 - ERROR - stderr - 99%|█████████▉| 22291/22434 [21:42:29<05:55, 2.48s/it] +2025-02-06 07:50:10 - ERROR - stderr - +2025-02-06 07:50:10 - ERROR - stderr - +2025-02-06 07:50:10 - INFO - stdout - {'loss': 0.3191, 'grad_norm': 1.531546950340271, 'learning_rate': 2.1311234917587022e-09, 'epoch': 2.98} +2025-02-06 07:50:10 - ERROR - stderr - 99%|█████████▉| 22291/22434 [21:42:29<05:55, 2.48s/it] +2025-02-06 07:50:12 - ERROR - stderr - 99%|█████████▉| 22292/22434 [21:42:32<05:52, 2.49s/it] +2025-02-06 07:50:12 - ERROR - stderr - +2025-02-06 07:50:12 - ERROR - stderr - +2025-02-06 07:50:12 - INFO - stdout - {'loss': 0.3378, 'grad_norm': 1.5170469284057617, 'learning_rate': 2.1014228256188705e-09, 'epoch': 2.98} +2025-02-06 07:50:12 - ERROR - stderr - 99%|█████████▉| 22292/22434 [21:42:32<05:52, 2.49s/it] +2025-02-06 07:50:14 - ERROR - stderr - 99%|█████████▉| 22293/22434 [21:42:34<05:48, 
2.47s/it] +2025-02-06 07:50:14 - ERROR - stderr - +2025-02-06 07:50:14 - ERROR - stderr - +2025-02-06 07:50:14 - INFO - stdout - {'loss': 0.4078, 'grad_norm': 1.5909638404846191, 'learning_rate': 2.071930556107882e-09, 'epoch': 2.98} +2025-02-06 07:50:14 - ERROR - stderr - 99%|█████████▉| 22293/22434 [21:42:34<05:48, 2.47s/it] +2025-02-06 07:50:17 - ERROR - stderr - 99%|█████████▉| 22294/22434 [21:42:37<05:45, 2.47s/it] +2025-02-06 07:50:17 - ERROR - stderr - +2025-02-06 07:50:17 - ERROR - stderr - +2025-02-06 07:50:17 - INFO - stdout - {'loss': 0.3739, 'grad_norm': 1.564225673675537, 'learning_rate': 2.042646683840799e-09, 'epoch': 2.98} +2025-02-06 07:50:17 - ERROR - stderr - 99%|█████████▉| 22294/22434 [21:42:37<05:45, 2.47s/it] +2025-02-06 07:50:19 - ERROR - stderr - 99%|█████████▉| 22295/22434 [21:42:39<05:42, 2.46s/it] +2025-02-06 07:50:19 - ERROR - stderr - +2025-02-06 07:50:19 - ERROR - stderr - +2025-02-06 07:50:19 - INFO - stdout - {'loss': 0.4334, 'grad_norm': 1.7929240465164185, 'learning_rate': 2.0135712094282444e-09, 'epoch': 2.98} +2025-02-06 07:50:19 - ERROR - stderr - 99%|█████████▉| 22295/22434 [21:42:39<05:42, 2.46s/it] +2025-02-06 07:50:22 - ERROR - stderr - 99%|█████████▉| 22296/22434 [21:42:42<05:39, 2.46s/it] +2025-02-06 07:50:22 - ERROR - stderr - +2025-02-06 07:50:22 - ERROR - stderr - +2025-02-06 07:50:22 - INFO - stdout - {'loss': 0.3631, 'grad_norm': 1.5259467363357544, 'learning_rate': 1.9847041334752905e-09, 'epoch': 2.98} +2025-02-06 07:50:22 - ERROR - stderr - 99%|█████████▉| 22296/22434 [21:42:42<05:39, 2.46s/it] +2025-02-06 07:50:24 - ERROR - stderr - 99%|█████████▉| 22297/22434 [21:42:44<05:37, 2.46s/it] +2025-02-06 07:50:24 - ERROR - stderr - +2025-02-06 07:50:24 - ERROR - stderr - +2025-02-06 07:50:24 - INFO - stdout - {'loss': 0.3593, 'grad_norm': 1.691005825996399, 'learning_rate': 1.956045456583677e-09, 'epoch': 2.98} +2025-02-06 07:50:24 - ERROR - stderr - 99%|█████████▉| 22297/22434 [21:42:44<05:37, 2.46s/it] +2025-02-06 
07:50:27 - ERROR - stderr - 99%|█████████▉| 22298/22434 [21:42:47<05:35, 2.47s/it] +2025-02-06 07:50:27 - ERROR - stderr - +2025-02-06 07:50:27 - ERROR - stderr - +2025-02-06 07:50:27 - INFO - stdout - {'loss': 0.3287, 'grad_norm': 1.2505619525909424, 'learning_rate': 1.9275951793518154e-09, 'epoch': 2.98} +2025-02-06 07:50:27 - ERROR - stderr - 99%|█████████▉| 22298/22434 [21:42:47<05:35, 2.47s/it] +2025-02-06 07:50:29 - ERROR - stderr - 99%|█████████▉| 22299/22434 [21:42:49<05:41, 2.53s/it] +2025-02-06 07:50:29 - ERROR - stderr - +2025-02-06 07:50:29 - ERROR - stderr - +2025-02-06 07:50:29 - INFO - stdout - {'loss': 0.3444, 'grad_norm': 1.4262430667877197, 'learning_rate': 1.899353302371454e-09, 'epoch': 2.98} +2025-02-06 07:50:29 - ERROR - stderr - 99%|█████████▉| 22299/22434 [21:42:49<05:41, 2.53s/it] +2025-02-06 07:50:32 - ERROR - stderr - 99%|█████████▉| 22300/22434 [21:42:52<05:43, 2.56s/it] +2025-02-06 07:50:32 - ERROR - stderr - +2025-02-06 07:50:32 - ERROR - stderr - +2025-02-06 07:50:32 - INFO - stdout - {'loss': 0.355, 'grad_norm': 1.5075546503067017, 'learning_rate': 1.8713198262321207e-09, 'epoch': 2.98} +2025-02-06 07:50:32 - ERROR - stderr - 99%|█████████▉| 22300/22434 [21:42:52<05:43, 2.56s/it] +2025-02-06 07:50:35 - ERROR - stderr - 99%|█████████▉| 22301/22434 [21:42:54<05:39, 2.55s/it] +2025-02-06 07:50:35 - ERROR - stderr - +2025-02-06 07:50:35 - ERROR - stderr - +2025-02-06 07:50:35 - INFO - stdout - {'loss': 0.341, 'grad_norm': 1.512439489364624, 'learning_rate': 1.8434947515177936e-09, 'epoch': 2.98} +2025-02-06 07:50:35 - ERROR - stderr - 99%|█████████▉| 22301/22434 [21:42:54<05:39, 2.55s/it] +2025-02-06 07:50:37 - ERROR - stderr - 99%|█████████▉| 22302/22434 [21:42:57<05:34, 2.53s/it] +2025-02-06 07:50:37 - ERROR - stderr - +2025-02-06 07:50:37 - ERROR - stderr - +2025-02-06 07:50:37 - INFO - stdout - {'loss': 0.3634, 'grad_norm': 1.821179747581482, 'learning_rate': 1.815878078809119e-09, 'epoch': 2.98} +2025-02-06 07:50:37 - ERROR - stderr 
- 99%|█████████▉| 22302/22434 [21:42:57<05:34, 2.53s/it] +2025-02-06 07:50:40 - ERROR - stderr - 99%|█████████▉| 22303/22434 [21:42:59<05:31, 2.53s/it] +2025-02-06 07:50:40 - ERROR - stderr - +2025-02-06 07:50:40 - ERROR - stderr - +2025-02-06 07:50:40 - INFO - stdout - {'loss': 0.3633, 'grad_norm': 1.597002625465393, 'learning_rate': 1.7884698086811926e-09, 'epoch': 2.98} +2025-02-06 07:50:40 - ERROR - stderr - 99%|█████████▉| 22303/22434 [21:42:59<05:31, 2.53s/it] +2025-02-06 07:50:42 - ERROR - stderr - 99%|█████████▉| 22304/22434 [21:43:02<05:28, 2.53s/it] +2025-02-06 07:50:42 - ERROR - stderr - +2025-02-06 07:50:42 - ERROR - stderr - +2025-02-06 07:50:42 - INFO - stdout - {'loss': 0.3628, 'grad_norm': 1.823317289352417, 'learning_rate': 1.7612699417057788e-09, 'epoch': 2.98} +2025-02-06 07:50:42 - ERROR - stderr - 99%|█████████▉| 22304/22434 [21:43:02<05:28, 2.53s/it] +2025-02-06 07:50:45 - ERROR - stderr - 99%|█████████▉| 22305/22434 [21:43:04<05:27, 2.54s/it] +2025-02-06 07:50:45 - ERROR - stderr - +2025-02-06 07:50:45 - ERROR - stderr - +2025-02-06 07:50:45 - INFO - stdout - {'loss': 0.3855, 'grad_norm': 1.5619186162948608, 'learning_rate': 1.7342784784479817e-09, 'epoch': 2.98} +2025-02-06 07:50:45 - ERROR - stderr - 99%|█████████▉| 22305/22434 [21:43:04<05:27, 2.54s/it] +2025-02-06 07:50:47 - ERROR - stderr - 99%|█████████▉| 22306/22434 [21:43:07<05:26, 2.55s/it] +2025-02-06 07:50:47 - ERROR - stderr - +2025-02-06 07:50:47 - ERROR - stderr - +2025-02-06 07:50:47 - INFO - stdout - {'loss': 0.373, 'grad_norm': 1.6022979021072388, 'learning_rate': 1.7074954194729044e-09, 'epoch': 2.98} +2025-02-06 07:50:47 - ERROR - stderr - 99%|█████████▉| 22306/22434 [21:43:07<05:26, 2.55s/it] +2025-02-06 07:50:50 - ERROR - stderr - 99%|█████████▉| 22307/22434 [21:43:09<05:21, 2.53s/it] +2025-02-06 07:50:50 - ERROR - stderr - +2025-02-06 07:50:50 - ERROR - stderr - +2025-02-06 07:50:50 - INFO - stdout - {'loss': 0.356, 'grad_norm': 1.5845285654067993, 'learning_rate': 
1.680920765337879e-09, 'epoch': 2.98} +2025-02-06 07:50:50 - ERROR - stderr - 99%|█████████▉| 22307/22434 [21:43:10<05:21, 2.53s/it] +2025-02-06 07:50:52 - ERROR - stderr - 99%|█████████▉| 22308/22434 [21:43:12<05:15, 2.50s/it] +2025-02-06 07:50:52 - ERROR - stderr - +2025-02-06 07:50:52 - ERROR - stderr - +2025-02-06 07:50:52 - INFO - stdout - {'loss': 0.3554, 'grad_norm': 1.5219188928604126, 'learning_rate': 1.6545545165969067e-09, 'epoch': 2.98} +2025-02-06 07:50:52 - ERROR - stderr - 99%|█████████▉| 22308/22434 [21:43:12<05:15, 2.50s/it] +2025-02-06 07:50:55 - ERROR - stderr - 99%|█████████▉| 22309/22434 [21:43:14<05:09, 2.47s/it] +2025-02-06 07:50:55 - ERROR - stderr - +2025-02-06 07:50:55 - ERROR - stderr - +2025-02-06 07:50:55 - INFO - stdout - {'loss': 0.3284, 'grad_norm': 1.5052961111068726, 'learning_rate': 1.6283966737984381e-09, 'epoch': 2.98} +2025-02-06 07:50:55 - ERROR - stderr - 99%|█████████▉| 22309/22434 [21:43:14<05:09, 2.47s/it] +2025-02-06 07:50:57 - ERROR - stderr - 99%|█████████▉| 22310/22434 [21:43:17<05:07, 2.48s/it] +2025-02-06 07:50:57 - ERROR - stderr - +2025-02-06 07:50:57 - ERROR - stderr - +2025-02-06 07:50:57 - INFO - stdout - {'loss': 0.3361, 'grad_norm': 1.4924882650375366, 'learning_rate': 1.6024472374887023e-09, 'epoch': 2.98} +2025-02-06 07:50:57 - ERROR - stderr - 99%|█████████▉| 22310/22434 [21:43:17<05:07, 2.48s/it] +2025-02-06 07:51:00 - ERROR - stderr - 99%|█████████▉| 22311/22434 [21:43:19<05:03, 2.47s/it] +2025-02-06 07:51:00 - ERROR - stderr - +2025-02-06 07:51:00 - ERROR - stderr - +2025-02-06 07:51:00 - INFO - stdout - {'loss': 0.3621, 'grad_norm': 1.5894914865493774, 'learning_rate': 1.5767062082094887e-09, 'epoch': 2.98} +2025-02-06 07:51:00 - ERROR - stderr - 99%|█████████▉| 22311/22434 [21:43:19<05:03, 2.47s/it] +2025-02-06 07:51:02 - ERROR - stderr - 99%|█████████▉| 22312/22434 [21:43:22<05:01, 2.47s/it] +2025-02-06 07:51:02 - ERROR - stderr - +2025-02-06 07:51:02 - ERROR - stderr - +2025-02-06 07:51:02 - INFO - 
stdout - {'loss': 0.3819, 'grad_norm': 1.7610788345336914, 'learning_rate': 1.5511735864959244e-09, 'epoch': 2.98} +2025-02-06 07:51:02 - ERROR - stderr - 99%|█████████▉| 22312/22434 [21:43:22<05:01, 2.47s/it] +2025-02-06 07:51:05 - ERROR - stderr - 99%|█████████▉| 22313/22434 [21:43:24<05:03, 2.51s/it] +2025-02-06 07:51:05 - ERROR - stderr - +2025-02-06 07:51:05 - ERROR - stderr - +2025-02-06 07:51:05 - INFO - stdout - {'loss': 0.3564, 'grad_norm': 1.6446856260299683, 'learning_rate': 1.5258493728798063e-09, 'epoch': 2.98} +2025-02-06 07:51:05 - ERROR - stderr - 99%|█████████▉| 22313/22434 [21:43:24<05:03, 2.51s/it] +2025-02-06 07:51:07 - ERROR - stderr - 99%|█████████▉| 22314/22434 [21:43:27<05:00, 2.50s/it] +2025-02-06 07:51:07 - ERROR - stderr - +2025-02-06 07:51:07 - ERROR - stderr - +2025-02-06 07:51:07 - INFO - stdout - {'loss': 0.3635, 'grad_norm': 1.5324846506118774, 'learning_rate': 1.500733567890711e-09, 'epoch': 2.98} +2025-02-06 07:51:07 - ERROR - stderr - 99%|█████████▉| 22314/22434 [21:43:27<05:00, 2.50s/it] +2025-02-06 07:51:10 - ERROR - stderr - 99%|█████████▉| 22315/22434 [21:43:29<04:55, 2.49s/it] +2025-02-06 07:51:10 - ERROR - stderr - +2025-02-06 07:51:10 - ERROR - stderr - +2025-02-06 07:51:10 - INFO - stdout - {'loss': 0.3274, 'grad_norm': 1.3964418172836304, 'learning_rate': 1.4758261720515533e-09, 'epoch': 2.98} +2025-02-06 07:51:10 - ERROR - stderr - 99%|█████████▉| 22315/22434 [21:43:29<04:55, 2.49s/it] +2025-02-06 07:51:12 - ERROR - stderr - 99%|█████████▉| 22316/22434 [21:43:32<05:04, 2.58s/it] +2025-02-06 07:51:12 - ERROR - stderr - +2025-02-06 07:51:12 - ERROR - stderr - +2025-02-06 07:51:12 - INFO - stdout - {'loss': 0.3239, 'grad_norm': 1.382839322090149, 'learning_rate': 1.4511271858808075e-09, 'epoch': 2.98} +2025-02-06 07:51:12 - ERROR - stderr - 99%|█████████▉| 22316/22434 [21:43:32<05:04, 2.58s/it] +2025-02-06 07:51:15 - ERROR - stderr - 99%|█████████▉| 22317/22434 [21:43:35<05:00, 2.57s/it] +2025-02-06 07:51:15 - ERROR - 
stderr - +2025-02-06 07:51:15 - ERROR - stderr - +2025-02-06 07:51:15 - INFO - stdout - {'loss': 0.3297, 'grad_norm': 1.5403969287872314, 'learning_rate': 1.4266366098936169e-09, 'epoch': 2.98} +2025-02-06 07:51:15 - ERROR - stderr - 99%|█████████▉| 22317/22434 [21:43:35<05:00, 2.57s/it] +2025-02-06 07:51:17 - ERROR - stderr - 99%|█████████▉| 22318/22434 [21:43:37<04:56, 2.56s/it] +2025-02-06 07:51:17 - ERROR - stderr - +2025-02-06 07:51:17 - ERROR - stderr - +2025-02-06 07:51:17 - INFO - stdout - {'loss': 0.3629, 'grad_norm': 1.5398378372192383, 'learning_rate': 1.4023544446006842e-09, 'epoch': 2.98} +2025-02-06 07:51:17 - ERROR - stderr - 99%|█████████▉| 22318/22434 [21:43:37<04:56, 2.56s/it] +2025-02-06 07:51:20 - ERROR - stderr - 99%|█████████▉| 22319/22434 [21:43:40<04:51, 2.54s/it] +2025-02-06 07:51:20 - ERROR - stderr - +2025-02-06 07:51:20 - ERROR - stderr - +2025-02-06 07:51:20 - INFO - stdout - {'loss': 0.3481, 'grad_norm': 1.351636290550232, 'learning_rate': 1.3782806905082714e-09, 'epoch': 2.98} +2025-02-06 07:51:20 - ERROR - stderr - 99%|█████████▉| 22319/22434 [21:43:40<04:51, 2.54s/it] +2025-02-06 07:51:22 - ERROR - stderr - 99%|█████████▉| 22320/22434 [21:43:42<04:46, 2.52s/it] +2025-02-06 07:51:22 - ERROR - stderr - +2025-02-06 07:51:22 - ERROR - stderr - +2025-02-06 07:51:22 - INFO - stdout - {'loss': 0.3611, 'grad_norm': 1.5044087171554565, 'learning_rate': 1.3544153481181988e-09, 'epoch': 2.98} +2025-02-06 07:51:22 - ERROR - stderr - 99%|█████████▉| 22320/22434 [21:43:42<04:46, 2.52s/it] +2025-02-06 07:51:25 - ERROR - stderr - 99%|█████████▉| 22321/22434 [21:43:45<04:41, 2.49s/it] +2025-02-06 07:51:25 - ERROR - stderr - +2025-02-06 07:51:25 - ERROR - stderr - +2025-02-06 07:51:25 - INFO - stdout - {'loss': 0.4079, 'grad_norm': 1.585081696510315, 'learning_rate': 1.3307584179267364e-09, 'epoch': 2.98} +2025-02-06 07:51:25 - ERROR - stderr - 99%|█████████▉| 22321/22434 [21:43:45<04:41, 2.49s/it] +2025-02-06 07:51:27 - ERROR - stderr - 
100%|█████████▉| 22322/22434 [21:43:47<04:38, 2.49s/it] +2025-02-06 07:51:27 - ERROR - stderr - +2025-02-06 07:51:27 - ERROR - stderr - +2025-02-06 07:51:27 - INFO - stdout - {'loss': 0.3638, 'grad_norm': 1.4718503952026367, 'learning_rate': 1.3073099004290436e-09, 'epoch': 2.99} +2025-02-06 07:51:27 - ERROR - stderr - 100%|█████████▉| 22322/22434 [21:43:47<04:38, 2.49s/it] +2025-02-06 07:51:30 - ERROR - stderr - 100%|█████████▉| 22323/22434 [21:43:50<04:36, 2.49s/it] +2025-02-06 07:51:30 - ERROR - stderr - +2025-02-06 07:51:30 - ERROR - stderr - +2025-02-06 07:51:30 - INFO - stdout - {'loss': 0.3947, 'grad_norm': 1.5799646377563477, 'learning_rate': 1.284069796111398e-09, 'epoch': 2.99} +2025-02-06 07:51:30 - ERROR - stderr - 100%|█████████▉| 22323/22434 [21:43:50<04:36, 2.49s/it] +2025-02-06 07:51:32 - ERROR - stderr - 100%|█████████▉| 22324/22434 [21:43:52<04:34, 2.50s/it] +2025-02-06 07:51:32 - ERROR - stderr - +2025-02-06 07:51:32 - ERROR - stderr - +2025-02-06 07:51:32 - INFO - stdout - {'loss': 0.3383, 'grad_norm': 1.5137360095977783, 'learning_rate': 1.2610381054611875e-09, 'epoch': 2.99} +2025-02-06 07:51:32 - ERROR - stderr - 100%|█████████▉| 22324/22434 [21:43:52<04:34, 2.50s/it] +2025-02-06 07:51:35 - ERROR - stderr - 100%|█████████▉| 22325/22434 [21:43:55<04:30, 2.49s/it] +2025-02-06 07:51:35 - ERROR - stderr - +2025-02-06 07:51:35 - ERROR - stderr - +2025-02-06 07:51:35 - INFO - stdout - {'loss': 0.4001, 'grad_norm': 1.7324237823486328, 'learning_rate': 1.2382148289558082e-09, 'epoch': 2.99} +2025-02-06 07:51:35 - ERROR - stderr - 100%|█████████▉| 22325/22434 [21:43:55<04:30, 2.49s/it] +2025-02-06 07:51:37 - ERROR - stderr - 100%|█████████▉| 22326/22434 [21:43:57<04:27, 2.48s/it] +2025-02-06 07:51:37 - ERROR - stderr - +2025-02-06 07:51:37 - ERROR - stderr - +2025-02-06 07:51:37 - INFO - stdout - {'loss': 0.3735, 'grad_norm': 1.7351129055023193, 'learning_rate': 1.2155999670726559e-09, 'epoch': 2.99} +2025-02-06 07:51:37 - ERROR - stderr - 
100%|█████████▉| 22326/22434 [21:43:57<04:27, 2.48s/it] +2025-02-06 07:51:40 - ERROR - stderr - 100%|█████████▉| 22327/22434 [21:44:00<04:38, 2.61s/it] +2025-02-06 07:51:40 - ERROR - stderr - +2025-02-06 07:51:40 - ERROR - stderr - +2025-02-06 07:51:40 - INFO - stdout - {'loss': 0.401, 'grad_norm': 1.6593226194381714, 'learning_rate': 1.193193520281355e-09, 'epoch': 2.99} +2025-02-06 07:51:40 - ERROR - stderr - 100%|█████████▉| 22327/22434 [21:44:00<04:38, 2.61s/it] +2025-02-06 07:51:43 - ERROR - stderr - 100%|█████████▉| 22328/22434 [21:44:02<04:33, 2.58s/it] +2025-02-06 07:51:43 - ERROR - stderr - +2025-02-06 07:51:43 - ERROR - stderr - +2025-02-06 07:51:43 - INFO - stdout - {'loss': 0.3456, 'grad_norm': 1.5094729661941528, 'learning_rate': 1.1709954890515296e-09, 'epoch': 2.99} +2025-02-06 07:51:43 - ERROR - stderr - 100%|█████████▉| 22328/22434 [21:44:02<04:33, 2.58s/it] +2025-02-06 07:51:45 - ERROR - stderr - 100%|█████████▉| 22329/22434 [21:44:05<04:28, 2.56s/it] +2025-02-06 07:51:45 - ERROR - stderr - +2025-02-06 07:51:45 - ERROR - stderr - +2025-02-06 07:51:45 - INFO - stdout - {'loss': 0.3448, 'grad_norm': 1.3878175020217896, 'learning_rate': 1.1490058738439225e-09, 'epoch': 2.99} +2025-02-06 07:51:45 - ERROR - stderr - 100%|█████████▉| 22329/22434 [21:44:05<04:28, 2.56s/it] +2025-02-06 07:51:48 - ERROR - stderr - 100%|█████████▉| 22330/22434 [21:44:07<04:25, 2.55s/it] +2025-02-06 07:51:48 - ERROR - stderr - +2025-02-06 07:51:48 - ERROR - stderr - +2025-02-06 07:51:48 - INFO - stdout - {'loss': 0.3574, 'grad_norm': 1.4065676927566528, 'learning_rate': 1.1272246751170558e-09, 'epoch': 2.99} +2025-02-06 07:51:48 - ERROR - stderr - 100%|█████████▉| 22330/22434 [21:44:07<04:25, 2.55s/it] +2025-02-06 07:51:50 - ERROR - stderr - 100%|█████████▉| 22331/22434 [21:44:10<04:23, 2.56s/it] +2025-02-06 07:51:50 - ERROR - stderr - +2025-02-06 07:51:50 - ERROR - stderr - +2025-02-06 07:51:50 - INFO - stdout - {'loss': 0.3439, 'grad_norm': 1.6719173192977905, 
'learning_rate': 1.1056518933261207e-09, 'epoch': 2.99} +2025-02-06 07:51:50 - ERROR - stderr - 100%|█████████▉| 22331/22434 [21:44:10<04:23, 2.56s/it] +2025-02-06 07:51:53 - ERROR - stderr - 100%|█████████▉| 22332/22434 [21:44:12<04:17, 2.52s/it] +2025-02-06 07:51:53 - ERROR - stderr - +2025-02-06 07:51:53 - ERROR - stderr - +2025-02-06 07:51:53 - INFO - stdout - {'loss': 0.3816, 'grad_norm': 1.5907431840896606, 'learning_rate': 1.0842875289196475e-09, 'epoch': 2.99} +2025-02-06 07:51:53 - ERROR - stderr - 100%|█████████▉| 22332/22434 [21:44:13<04:17, 2.52s/it] +2025-02-06 07:51:55 - ERROR - stderr - 100%|█████████▉| 22333/22434 [21:44:15<04:14, 2.52s/it] +2025-02-06 07:51:55 - ERROR - stderr - +2025-02-06 07:51:55 - ERROR - stderr - +2025-02-06 07:51:55 - INFO - stdout - {'loss': 0.3383, 'grad_norm': 1.5053400993347168, 'learning_rate': 1.0631315823428357e-09, 'epoch': 2.99} +2025-02-06 07:51:55 - ERROR - stderr - 100%|█████████▉| 22333/22434 [21:44:15<04:14, 2.52s/it] +2025-02-06 07:51:58 - ERROR - stderr - 100%|█████████▉| 22334/22434 [21:44:17<04:10, 2.50s/it] +2025-02-06 07:51:58 - ERROR - stderr - +2025-02-06 07:51:58 - ERROR - stderr - +2025-02-06 07:51:58 - INFO - stdout - {'loss': 0.3701, 'grad_norm': 1.6027690172195435, 'learning_rate': 1.0421840540375538e-09, 'epoch': 2.99} +2025-02-06 07:51:58 - ERROR - stderr - 100%|█████████▉| 22334/22434 [21:44:17<04:10, 2.50s/it] +2025-02-06 07:52:00 - ERROR - stderr - 100%|█████████▉| 22335/22434 [21:44:20<04:07, 2.50s/it] +2025-02-06 07:52:00 - ERROR - stderr - +2025-02-06 07:52:00 - ERROR - stderr - +2025-02-06 07:52:00 - INFO - stdout - {'loss': 0.3592, 'grad_norm': 1.4910728931427002, 'learning_rate': 1.0214449444390096e-09, 'epoch': 2.99} +2025-02-06 07:52:00 - ERROR - stderr - 100%|█████████▉| 22335/22434 [21:44:20<04:07, 2.50s/it] +2025-02-06 07:52:03 - ERROR - stderr - 100%|█████████▉| 22336/22434 [21:44:22<04:03, 2.48s/it] +2025-02-06 07:52:03 - ERROR - stderr - +2025-02-06 07:52:03 - ERROR - stderr -
+2025-02-06 07:52:03 - INFO - stdout - {'loss': 0.3452, 'grad_norm': 1.4757072925567627, 'learning_rate': 1.0009142539813e-09, 'epoch': 2.99} +2025-02-06 07:52:03 - ERROR - stderr - 100%|█████████▉| 22336/22434 [21:44:22<04:03, 2.48s/it] +2025-02-06 07:52:05 - ERROR - stderr - 100%|█████████▉| 22337/22434 [21:44:25<04:01, 2.49s/it] +2025-02-06 07:52:05 - ERROR - stderr - +2025-02-06 07:52:05 - ERROR - stderr - +2025-02-06 07:52:05 - INFO - stdout - {'loss': 0.379, 'grad_norm': 1.527743935585022, 'learning_rate': 9.805919830918609e-10, 'epoch': 2.99} +2025-02-06 07:52:05 - ERROR - stderr - 100%|█████████▉| 22337/22434 [21:44:25<04:01, 2.49s/it] +2025-02-06 07:52:08 - ERROR - stderr - 100%|█████████▉| 22338/22434 [21:44:27<03:58, 2.48s/it] +2025-02-06 07:52:08 - ERROR - stderr - +2025-02-06 07:52:08 - ERROR - stderr - +2025-02-06 07:52:08 - INFO - stdout - {'loss': 0.3512, 'grad_norm': 1.845774531364441, 'learning_rate': 9.604781321936875e-10, 'epoch': 2.99} +2025-02-06 07:52:08 - ERROR - stderr - 100%|█████████▉| 22338/22434 [21:44:27<03:58, 2.48s/it] +2025-02-06 07:52:10 - ERROR - stderr - 100%|█████████▉| 22339/22434 [21:44:30<03:54, 2.46s/it] +2025-02-06 07:52:10 - ERROR - stderr - +2025-02-06 07:52:10 - ERROR - stderr - +2025-02-06 07:52:10 - INFO - stdout - {'loss': 0.3667, 'grad_norm': 1.6075879335403442, 'learning_rate': 9.405727017064436e-10, 'epoch': 2.99} +2025-02-06 07:52:10 - ERROR - stderr - 100%|█████████▉| 22339/22434 [21:44:30<03:54, 2.46s/it] +2025-02-06 07:52:12 - ERROR - stderr - 100%|█████████▉| 22340/22434 [21:44:32<03:51, 2.46s/it] +2025-02-06 07:52:12 - ERROR - stderr - +2025-02-06 07:52:12 - ERROR - stderr - +2025-02-06 07:52:12 - INFO - stdout - {'loss': 0.3653, 'grad_norm': 1.3417794704437256, 'learning_rate': 9.208756920442429e-10, 'epoch': 2.99} +2025-02-06 07:52:12 - ERROR - stderr - 100%|█████████▉| 22340/22434 [21:44:32<03:51, 2.46s/it] +2025-02-06 07:52:15 - ERROR - stderr - 100%|█████████▉| 22341/22434 [21:44:35<03:51, 2.49s/it] 
+2025-02-06 07:52:15 - ERROR - stderr - +2025-02-06 07:52:15 - ERROR - stderr - +2025-02-06 07:52:15 - INFO - stdout - {'loss': 0.3413, 'grad_norm': 1.4942771196365356, 'learning_rate': 9.013871036189781e-10, 'epoch': 2.99} +2025-02-06 07:52:15 - ERROR - stderr - 100%|█████████▉| 22341/22434 [21:44:35<03:51, 2.49s/it] +2025-02-06 07:52:18 - ERROR - stderr - 100%|█████████▉| 22342/22434 [21:44:37<03:50, 2.50s/it] +2025-02-06 07:52:18 - ERROR - stderr - +2025-02-06 07:52:18 - ERROR - stderr - +2025-02-06 07:52:18 - INFO - stdout - {'loss': 0.3598, 'grad_norm': 1.5697009563446045, 'learning_rate': 8.821069368358803e-10, 'epoch': 2.99} +2025-02-06 07:52:18 - ERROR - stderr - 100%|█████████▉| 22342/22434 [21:44:37<03:50, 2.50s/it] +2025-02-06 07:52:20 - ERROR - stderr - 100%|█████████▉| 22343/22434 [21:44:40<03:46, 2.49s/it] +2025-02-06 07:52:20 - ERROR - stderr - +2025-02-06 07:52:20 - ERROR - stderr - +2025-02-06 07:52:20 - INFO - stdout - {'loss': 0.3471, 'grad_norm': 1.5428520441055298, 'learning_rate': 8.630351920968505e-10, 'epoch': 2.99} +2025-02-06 07:52:20 - ERROR - stderr - 100%|█████████▉| 22343/22434 [21:44:40<03:46, 2.49s/it] +2025-02-06 07:52:23 - ERROR - stderr - 100%|█████████▉| 22344/22434 [21:44:43<03:52, 2.58s/it] +2025-02-06 07:52:23 - ERROR - stderr - +2025-02-06 07:52:23 - ERROR - stderr - +2025-02-06 07:52:23 - INFO - stdout - {'loss': 0.355, 'grad_norm': 1.6713427305221558, 'learning_rate': 8.441718698004587e-10, 'epoch': 2.99} +2025-02-06 07:52:23 - ERROR - stderr - 100%|█████████▉| 22344/22434 [21:44:43<03:52, 2.58s/it] +2025-02-06 07:52:25 - ERROR - stderr - 100%|█████████▉| 22345/22434 [21:44:45<03:47, 2.56s/it] +2025-02-06 07:52:25 - ERROR - stderr - +2025-02-06 07:52:25 - ERROR - stderr - +2025-02-06 07:52:25 - INFO - stdout - {'loss': 0.3538, 'grad_norm': 1.6043636798858643, 'learning_rate': 8.255169703386134e-10, 'epoch': 2.99} +2025-02-06 07:52:25 - ERROR - stderr - 100%|█████████▉| 22345/22434 [21:44:45<03:47, 2.56s/it] +2025-02-06 
07:52:28 - ERROR - stderr - 100%|█████████▉| 22346/22434 [21:44:48<03:43, 2.54s/it] +2025-02-06 07:52:28 - ERROR - stderr - +2025-02-06 07:52:28 - ERROR - stderr - +2025-02-06 07:52:28 - INFO - stdout - {'loss': 0.3436, 'grad_norm': 1.4679776430130005, 'learning_rate': 8.070704941010033e-10, 'epoch': 2.99} +2025-02-06 07:52:28 - ERROR - stderr - 100%|█████████▉| 22346/22434 [21:44:48<03:43, 2.54s/it] +2025-02-06 07:52:30 - ERROR - stderr - 100%|█████████▉| 22347/22434 [21:44:50<03:40, 2.54s/it] +2025-02-06 07:52:30 - ERROR - stderr - +2025-02-06 07:52:30 - ERROR - stderr - +2025-02-06 07:52:30 - INFO - stdout - {'loss': 0.3147, 'grad_norm': 1.4655911922454834, 'learning_rate': 7.888324414717652e-10, 'epoch': 2.99} +2025-02-06 07:52:30 - ERROR - stderr - 100%|█████████▉| 22347/22434 [21:44:50<03:40, 2.54s/it] +2025-02-06 07:52:33 - ERROR - stderr - 100%|█████████▉| 22348/22434 [21:44:53<03:36, 2.52s/it] +2025-02-06 07:52:33 - ERROR - stderr - +2025-02-06 07:52:33 - ERROR - stderr - +2025-02-06 07:52:33 - INFO - stdout - {'loss': 0.3647, 'grad_norm': 1.6129311323165894, 'learning_rate': 7.708028128305956e-10, 'epoch': 2.99} +2025-02-06 07:52:33 - ERROR - stderr - 100%|█████████▉| 22348/22434 [21:44:53<03:36, 2.52s/it] +2025-02-06 07:52:35 - ERROR - stderr - 100%|█████████▉| 22349/22434 [21:44:55<03:35, 2.54s/it] +2025-02-06 07:52:35 - ERROR - stderr - +2025-02-06 07:52:35 - ERROR - stderr - +2025-02-06 07:52:35 - INFO - stdout - {'loss': 0.4013, 'grad_norm': 1.6744179725646973, 'learning_rate': 7.529816085549701e-10, 'epoch': 2.99} +2025-02-06 07:52:35 - ERROR - stderr - 100%|█████████▉| 22349/22434 [21:44:55<03:35, 2.54s/it] +2025-02-06 07:52:38 - ERROR - stderr - 100%|█████████▉| 22350/22434 [21:44:58<03:31, 2.52s/it] +2025-02-06 07:52:38 - ERROR - stderr - +2025-02-06 07:52:38 - ERROR - stderr - +2025-02-06 07:52:38 - INFO - stdout - {'loss': 0.3758, 'grad_norm': 1.7384462356567383, 'learning_rate': 7.353688290145933e-10, 'epoch': 2.99} +2025-02-06 07:52:38 - 
ERROR - stderr - 100%|█████████▉| 22350/22434 [21:44:58<03:31, 2.52s/it] +2025-02-06 07:52:40 - ERROR - stderr - 100%|█████████▉| 22351/22434 [21:45:00<03:29, 2.52s/it] +2025-02-06 07:52:40 - ERROR - stderr - +2025-02-06 07:52:40 - ERROR - stderr - +2025-02-06 07:52:40 - INFO - stdout - {'loss': 0.3457, 'grad_norm': 1.5321229696273804, 'learning_rate': 7.179644745769488e-10, 'epoch': 2.99} +2025-02-06 07:52:40 - ERROR - stderr - 100%|█████████▉| 22351/22434 [21:45:00<03:29, 2.52s/it] +2025-02-06 07:52:43 - ERROR - stderr - 100%|█████████▉| 22352/22434 [21:45:03<03:25, 2.51s/it] +2025-02-06 07:52:43 - ERROR - stderr - +2025-02-06 07:52:43 - ERROR - stderr - +2025-02-06 07:52:43 - INFO - stdout - {'loss': 0.3218, 'grad_norm': 1.435744285583496, 'learning_rate': 7.007685456050795e-10, 'epoch': 2.99} +2025-02-06 07:52:43 - ERROR - stderr - 100%|█████████▉| 22352/22434 [21:45:03<03:25, 2.51s/it] +2025-02-06 07:52:45 - ERROR - stderr - 100%|█████████▉| 22353/22434 [21:45:05<03:21, 2.49s/it] +2025-02-06 07:52:45 - ERROR - stderr - +2025-02-06 07:52:45 - ERROR - stderr - +2025-02-06 07:52:45 - INFO - stdout - {'loss': 0.3472, 'grad_norm': 1.4888209104537964, 'learning_rate': 6.837810424575875e-10, 'epoch': 2.99} +2025-02-06 07:52:45 - ERROR - stderr - 100%|█████████▉| 22353/22434 [21:45:05<03:21, 2.49s/it] +2025-02-06 07:52:48 - ERROR - stderr - 100%|█████████▉| 22354/22434 [21:45:08<03:18, 2.48s/it] +2025-02-06 07:52:48 - ERROR - stderr - +2025-02-06 07:52:48 - ERROR - stderr - +2025-02-06 07:52:48 - INFO - stdout - {'loss': 0.4017, 'grad_norm': 1.5564380884170532, 'learning_rate': 6.670019654875237e-10, 'epoch': 2.99} +2025-02-06 07:52:48 - ERROR - stderr - 100%|█████████▉| 22354/22434 [21:45:08<03:18, 2.48s/it] +2025-02-06 07:52:50 - ERROR - stderr - 100%|█████████▉| 22355/22434 [21:45:10<03:15, 2.48s/it] +2025-02-06 07:52:50 - ERROR - stderr - +2025-02-06 07:52:50 - ERROR - stderr - +2025-02-06 07:52:50 - INFO - stdout - {'loss': 0.3641, 'grad_norm': 
1.5097473859786987, 'learning_rate': 6.504313150468289e-10, 'epoch': 2.99} +2025-02-06 07:52:50 - ERROR - stderr - 100%|█████████▉| 22355/22434 [21:45:10<03:15, 2.48s/it] +2025-02-06 07:52:53 - ERROR - stderr - 100%|█████████▉| 22356/22434 [21:45:12<03:11, 2.46s/it] +2025-02-06 07:52:53 - ERROR - stderr - +2025-02-06 07:52:53 - ERROR - stderr - +2025-02-06 07:52:53 - INFO - stdout - {'loss': 0.371, 'grad_norm': 1.585821509361267, 'learning_rate': 6.340690914785619e-10, 'epoch': 2.99} +2025-02-06 07:52:53 - ERROR - stderr - 100%|█████████▉| 22356/22434 [21:45:12<03:11, 2.46s/it] +2025-02-06 07:52:55 - ERROR - stderr - 100%|█████████▉| 22357/22434 [21:45:15<03:09, 2.46s/it] +2025-02-06 07:52:55 - ERROR - stderr - +2025-02-06 07:52:55 - ERROR - stderr - +2025-02-06 07:52:55 - INFO - stdout - {'loss': 0.3829, 'grad_norm': 1.840911626815796, 'learning_rate': 6.179152951257816e-10, 'epoch': 2.99} +2025-02-06 07:52:55 - ERROR - stderr - 100%|█████████▉| 22357/22434 [21:45:15<03:09, 2.46s/it] +2025-02-06 07:52:58 - ERROR - stderr - 100%|█████████▉| 22358/22434 [21:45:17<03:07, 2.46s/it] +2025-02-06 07:52:58 - ERROR - stderr - +2025-02-06 07:52:58 - ERROR - stderr - +2025-02-06 07:52:58 - INFO - stdout - {'loss': 0.4168, 'grad_norm': 1.6738406419754028, 'learning_rate': 6.019699263237755e-10, 'epoch': 2.99} +2025-02-06 07:52:58 - ERROR - stderr - 100%|█████████▉| 22358/22434 [21:45:17<03:07, 2.46s/it] +2025-02-06 07:53:00 - ERROR - stderr - 100%|█████████▉| 22359/22434 [21:45:20<03:12, 2.56s/it] +2025-02-06 07:53:00 - ERROR - stderr - +2025-02-06 07:53:00 - ERROR - stderr - +2025-02-06 07:53:00 - INFO - stdout - {'loss': 0.3365, 'grad_norm': 1.3490147590637207, 'learning_rate': 5.862329854045001e-10, 'epoch': 2.99} +2025-02-06 07:53:00 - ERROR - stderr - 100%|█████████▉| 22359/22434 [21:45:20<03:12, 2.56s/it] +2025-02-06 07:53:03 - ERROR - stderr - 100%|█████████▉| 22360/22434 [21:45:23<03:06, 2.52s/it] +2025-02-06 07:53:03 - ERROR - stderr - +2025-02-06 07:53:03 - ERROR - 
stderr - +2025-02-06 07:53:03 - INFO - stdout - {'loss': 0.3869, 'grad_norm': 1.7374845743179321, 'learning_rate': 5.707044726976918e-10, 'epoch': 2.99} +2025-02-06 07:53:03 - ERROR - stderr - 100%|█████████▉| 22360/22434 [21:45:23<03:06, 2.52s/it] +2025-02-06 07:53:05 - ERROR - stderr - 100%|█████████▉| 22361/22434 [21:45:25<03:02, 2.50s/it] +2025-02-06 07:53:05 - ERROR - stderr - +2025-02-06 07:53:05 - ERROR - stderr - +2025-02-06 07:53:05 - INFO - stdout - {'loss': 0.3658, 'grad_norm': 1.5178130865097046, 'learning_rate': 5.553843885253151e-10, 'epoch': 2.99} +2025-02-06 07:53:05 - ERROR - stderr - 100%|█████████▉| 22361/22434 [21:45:25<03:02, 2.50s/it] +2025-02-06 07:53:08 - ERROR - stderr - 100%|█████████▉| 22362/22434 [21:45:27<02:59, 2.49s/it] +2025-02-06 07:53:08 - ERROR - stderr - +2025-02-06 07:53:08 - ERROR - stderr - +2025-02-06 07:53:08 - INFO - stdout - {'loss': 0.3353, 'grad_norm': 1.445766806602478, 'learning_rate': 5.402727332082248e-10, 'epoch': 2.99} +2025-02-06 07:53:08 - ERROR - stderr - 100%|█████████▉| 22362/22434 [21:45:28<02:59, 2.49s/it] +2025-02-06 07:53:10 - ERROR - stderr - 100%|█████████▉| 22363/22434 [21:45:30<03:01, 2.55s/it] +2025-02-06 07:53:10 - ERROR - stderr - +2025-02-06 07:53:10 - ERROR - stderr - +2025-02-06 07:53:10 - INFO - stdout - {'loss': 0.3559, 'grad_norm': 1.5314022302627563, 'learning_rate': 5.253695070606135e-10, 'epoch': 2.99} +2025-02-06 07:53:10 - ERROR - stderr - 100%|█████████▉| 22363/22434 [21:45:30<03:01, 2.55s/it] +2025-02-06 07:53:13 - ERROR - stderr - 100%|█████████▉| 22364/22434 [21:45:33<02:57, 2.53s/it] +2025-02-06 07:53:13 - ERROR - stderr - +2025-02-06 07:53:13 - ERROR - stderr - +2025-02-06 07:53:13 - INFO - stdout - {'loss': 0.3809, 'grad_norm': 1.6302567720413208, 'learning_rate': 5.106747103933441e-10, 'epoch': 2.99} +2025-02-06 07:53:13 - ERROR - stderr - 100%|█████████▉| 22364/22434 [21:45:33<02:57, 2.53s/it] +2025-02-06 07:53:16 - ERROR - stderr - 100%|█████████▉| 22365/22434 [21:45:35<02:56, 
2.56s/it] +2025-02-06 07:53:16 - ERROR - stderr - +2025-02-06 07:53:16 - ERROR - stderr - +2025-02-06 07:53:16 - INFO - stdout - {'loss': 0.41, 'grad_norm': 1.994011402130127, 'learning_rate': 4.961883435128378e-10, 'epoch': 2.99} +2025-02-06 07:53:16 - ERROR - stderr - 100%|█████████▉| 22365/22434 [21:45:35<02:56, 2.56s/it] +2025-02-06 07:53:18 - ERROR - stderr - 100%|█████████▉| 22366/22434 [21:45:38<02:52, 2.54s/it] +2025-02-06 07:53:18 - ERROR - stderr - +2025-02-06 07:53:18 - ERROR - stderr - +2025-02-06 07:53:18 - INFO - stdout - {'loss': 0.3781, 'grad_norm': 1.5871838331222534, 'learning_rate': 4.819104067199653e-10, 'epoch': 2.99} +2025-02-06 07:53:18 - ERROR - stderr - 100%|█████████▉| 22366/22434 [21:45:38<02:52, 2.54s/it] +2025-02-06 07:53:21 - ERROR - stderr - 100%|█████████▉| 22367/22434 [21:45:40<02:50, 2.54s/it] +2025-02-06 07:53:21 - ERROR - stderr - +2025-02-06 07:53:21 - ERROR - stderr - +2025-02-06 07:53:21 - INFO - stdout - {'loss': 0.3095, 'grad_norm': 1.4956631660461426, 'learning_rate': 4.678409003133766e-10, 'epoch': 2.99} +2025-02-06 07:53:21 - ERROR - stderr - 100%|█████████▉| 22367/22434 [21:45:40<02:50, 2.54s/it] +2025-02-06 07:53:23 - ERROR - stderr - 100%|█████████▉| 22368/22434 [21:45:43<02:47, 2.54s/it] +2025-02-06 07:53:23 - ERROR - stderr - +2025-02-06 07:53:23 - ERROR - stderr - +2025-02-06 07:53:23 - INFO - stdout - {'loss': 0.3219, 'grad_norm': 1.3614957332611084, 'learning_rate': 4.539798245861704e-10, 'epoch': 2.99} +2025-02-06 07:53:23 - ERROR - stderr - 100%|█████████▉| 22368/22434 [21:45:43<02:47, 2.54s/it] +2025-02-06 07:53:26 - ERROR - stderr - 100%|█████████▉| 22369/22434 [21:45:45<02:43, 2.52s/it] +2025-02-06 07:53:26 - ERROR - stderr - +2025-02-06 07:53:26 - ERROR - stderr - +2025-02-06 07:53:26 - INFO - stdout - {'loss': 0.3516, 'grad_norm': 1.5551254749298096, 'learning_rate': 4.40327179828115e-10, 'epoch': 2.99} +2025-02-06 07:53:26 - ERROR - stderr - 100%|█████████▉| 22369/22434 [21:45:45<02:43, 2.52s/it] 
+2025-02-06 07:53:28 - ERROR - stderr - 100%|█████████▉| 22370/22434 [21:45:48<02:41, 2.52s/it] +2025-02-06 07:53:28 - ERROR - stderr - +2025-02-06 07:53:28 - ERROR - stderr - +2025-02-06 07:53:28 - INFO - stdout - {'loss': 0.3597, 'grad_norm': 1.6438894271850586, 'learning_rate': 4.2688296632120705e-10, 'epoch': 2.99} +2025-02-06 07:53:28 - ERROR - stderr - 100%|█████████▉| 22370/22434 [21:45:48<02:41, 2.52s/it] +2025-02-06 07:53:31 - ERROR - stderr - 100%|█████████▉| 22371/22434 [21:45:50<02:39, 2.53s/it] +2025-02-06 07:53:31 - ERROR - stderr - +2025-02-06 07:53:31 - ERROR - stderr - +2025-02-06 07:53:31 - INFO - stdout - {'loss': 0.375, 'grad_norm': 1.7057816982269287, 'learning_rate': 4.1364718434855343e-10, 'epoch': 2.99} +2025-02-06 07:53:31 - ERROR - stderr - 100%|█████████▉| 22371/22434 [21:45:50<02:39, 2.53s/it] +2025-02-06 07:53:33 - ERROR - stderr - 100%|█████████▉| 22372/22434 [21:45:53<02:36, 2.53s/it] +2025-02-06 07:53:33 - ERROR - stderr - +2025-02-06 07:53:33 - ERROR - stderr - +2025-02-06 07:53:33 - INFO - stdout - {'loss': 0.4108, 'grad_norm': 1.6758105754852295, 'learning_rate': 4.0061983418437923e-10, 'epoch': 2.99} +2025-02-06 07:53:33 - ERROR - stderr - 100%|█████████▉| 22372/22434 [21:45:53<02:36, 2.53s/it] +2025-02-06 07:53:36 - ERROR - stderr - 100%|█████████▉| 22373/22434 [21:45:55<02:34, 2.54s/it] +2025-02-06 07:53:36 - ERROR - stderr - +2025-02-06 07:53:36 - ERROR - stderr - +2025-02-06 07:53:36 - INFO - stdout - {'loss': 0.3387, 'grad_norm': 1.595291256904602, 'learning_rate': 3.8780091610179924e-10, 'epoch': 2.99} +2025-02-06 07:53:36 - ERROR - stderr - 100%|█████████▉| 22373/22434 [21:45:56<02:34, 2.54s/it] +2025-02-06 07:53:38 - ERROR - stderr - 100%|█████████▉| 22374/22434 [21:45:58<02:30, 2.51s/it] +2025-02-06 07:53:38 - ERROR - stderr - +2025-02-06 07:53:38 - ERROR - stderr - +2025-02-06 07:53:38 - INFO - stdout - {'loss': 0.3147, 'grad_norm': 1.4250311851501465, 'learning_rate': 3.751904303661569e-10, 'epoch': 2.99} +2025-02-06 
07:53:38 - ERROR - stderr - 100%|█████████▉| 22374/22434 [21:45:58<02:30, 2.51s/it] +2025-02-06 07:53:41 - ERROR - stderr - 100%|█████████▉| 22375/22434 [21:46:00<02:28, 2.51s/it] +2025-02-06 07:53:41 - ERROR - stderr - +2025-02-06 07:53:41 - ERROR - stderr - +2025-02-06 07:53:41 - INFO - stdout - {'loss': 0.3818, 'grad_norm': 1.673694133758545, 'learning_rate': 3.627883772405749e-10, 'epoch': 2.99} +2025-02-06 07:53:41 - ERROR - stderr - 100%|█████████▉| 22375/22434 [21:46:00<02:28, 2.51s/it] +2025-02-06 07:53:43 - ERROR - stderr - 100%|█████████▉| 22376/22434 [21:46:03<02:25, 2.51s/it] +2025-02-06 07:53:43 - ERROR - stderr - +2025-02-06 07:53:43 - ERROR - stderr - +2025-02-06 07:53:43 - INFO - stdout - {'loss': 0.3501, 'grad_norm': 1.5873024463653564, 'learning_rate': 3.505947569848456e-10, 'epoch': 2.99} +2025-02-06 07:53:43 - ERROR - stderr - 100%|█████████▉| 22376/22434 [21:46:03<02:25, 2.51s/it] +2025-02-06 07:53:46 - ERROR - stderr - 100%|█████████▉| 22377/22434 [21:46:05<02:22, 2.49s/it] +2025-02-06 07:53:46 - ERROR - stderr - +2025-02-06 07:53:46 - ERROR - stderr - +2025-02-06 07:53:46 - INFO - stdout - {'loss': 0.3258, 'grad_norm': 1.4187895059585571, 'learning_rate': 3.386095698509895e-10, 'epoch': 2.99} +2025-02-06 07:53:46 - ERROR - stderr - 100%|█████████▉| 22377/22434 [21:46:05<02:22, 2.49s/it] +2025-02-06 07:53:48 - ERROR - stderr - 100%|█████████▉| 22378/22434 [21:46:08<02:19, 2.50s/it] +2025-02-06 07:53:48 - ERROR - stderr - +2025-02-06 07:53:48 - ERROR - stderr - +2025-02-06 07:53:48 - INFO - stdout - {'loss': 0.3844, 'grad_norm': 1.6273142099380493, 'learning_rate': 3.2683281609213745e-10, 'epoch': 2.99} +2025-02-06 07:53:48 - ERROR - stderr - 100%|█████████▉| 22378/22434 [21:46:08<02:19, 2.50s/it] +2025-02-06 07:53:51 - ERROR - stderr - 100%|█████████▉| 22379/22434 [21:46:11<02:19, 2.53s/it] +2025-02-06 07:53:51 - ERROR - stderr - +2025-02-06 07:53:51 - ERROR - stderr - +2025-02-06 07:53:51 - INFO - stdout - {'loss': 0.3653, 'grad_norm': 
1.5777959823608398, 'learning_rate': 3.1526449595031815e-10, 'epoch': 2.99} +2025-02-06 07:53:51 - ERROR - stderr - 100%|█████████▉| 22379/22434 [21:46:11<02:19, 2.53s/it] +2025-02-06 07:53:53 - ERROR - stderr - 100%|█████████▉| 22380/22434 [21:46:13<02:15, 2.50s/it] +2025-02-06 07:53:53 - ERROR - stderr - +2025-02-06 07:53:53 - ERROR - stderr - +2025-02-06 07:53:53 - INFO - stdout - {'loss': 0.4081, 'grad_norm': 1.7691506147384644, 'learning_rate': 3.039046096686704e-10, 'epoch': 2.99} +2025-02-06 07:53:53 - ERROR - stderr - 100%|█████████▉| 22380/22434 [21:46:13<02:15, 2.50s/it] +2025-02-06 07:53:56 - ERROR - stderr - 100%|█████████▉| 22381/22434 [21:46:15<02:12, 2.49s/it] +2025-02-06 07:53:56 - ERROR - stderr - +2025-02-06 07:53:56 - ERROR - stderr - +2025-02-06 07:53:56 - INFO - stdout - {'loss': 0.361, 'grad_norm': 1.6440637111663818, 'learning_rate': 2.927531574836717e-10, 'epoch': 2.99} +2025-02-06 07:53:56 - ERROR - stderr - 100%|█████████▉| 22381/22434 [21:46:15<02:12, 2.49s/it] +2025-02-06 07:53:58 - ERROR - stderr - 100%|█████████▉| 22382/22434 [21:46:18<02:09, 2.50s/it] +2025-02-06 07:53:58 - ERROR - stderr - +2025-02-06 07:53:58 - ERROR - stderr - +2025-02-06 07:53:58 - INFO - stdout - {'loss': 0.3369, 'grad_norm': 1.5825228691101074, 'learning_rate': 2.818101396273587e-10, 'epoch': 2.99} +2025-02-06 07:53:58 - ERROR - stderr - 100%|█████████▉| 22382/22434 [21:46:18<02:09, 2.50s/it] +2025-02-06 07:54:01 - ERROR - stderr - 100%|█████████▉| 22383/22434 [21:46:20<02:06, 2.48s/it] +2025-02-06 07:54:01 - ERROR - stderr - +2025-02-06 07:54:01 - ERROR - stderr - +2025-02-06 07:54:01 - INFO - stdout - {'loss': 0.3365, 'grad_norm': 1.5457130670547485, 'learning_rate': 2.7107555632732705e-10, 'epoch': 2.99} +2025-02-06 07:54:01 - ERROR - stderr - 100%|█████████▉| 22383/22434 [21:46:20<02:06, 2.48s/it] +2025-02-06 07:54:03 - ERROR - stderr - 100%|█████████▉| 22384/22434 [21:46:23<02:03, 2.46s/it] +2025-02-06 07:54:03 - ERROR - stderr - +2025-02-06 07:54:03 - 
ERROR - stderr - +2025-02-06 07:54:03 - INFO - stdout - {'loss': 0.3651, 'grad_norm': 1.5401756763458252, 'learning_rate': 2.605494078089521e-10, 'epoch': 2.99} +2025-02-06 07:54:03 - ERROR - stderr - 100%|█████████▉| 22384/22434 [21:46:23<02:03, 2.46s/it] +2025-02-06 07:54:05 - ERROR - stderr - 100%|█████████▉| 22385/22434 [21:46:25<02:00, 2.45s/it] +2025-02-06 07:54:06 - ERROR - stderr - +2025-02-06 07:54:06 - ERROR - stderr - +2025-02-06 07:54:06 - INFO - stdout - {'loss': 0.3583, 'grad_norm': 1.5718353986740112, 'learning_rate': 2.5023169429094773e-10, 'epoch': 2.99} +2025-02-06 07:54:06 - ERROR - stderr - 100%|█████████▉| 22385/22434 [21:46:25<02:00, 2.45s/it] +2025-02-06 07:54:08 - ERROR - stderr - 100%|█████████▉| 22386/22434 [21:46:28<01:58, 2.46s/it] +2025-02-06 07:54:08 - ERROR - stderr - +2025-02-06 07:54:08 - ERROR - stderr - +2025-02-06 07:54:08 - INFO - stdout - {'loss': 0.3271, 'grad_norm': 1.4894545078277588, 'learning_rate': 2.40122415987587e-10, 'epoch': 2.99} +2025-02-06 07:54:08 - ERROR - stderr - 100%|█████████▉| 22386/22434 [21:46:28<01:58, 2.46s/it] +2025-02-06 07:54:10 - ERROR - stderr - 100%|█████████▉| 22387/22434 [21:46:30<01:55, 2.46s/it] +2025-02-06 07:54:10 - ERROR - stderr - +2025-02-06 07:54:10 - ERROR - stderr - +2025-02-06 07:54:10 - INFO - stdout - {'loss': 0.3286, 'grad_norm': 1.5813277959823608, 'learning_rate': 2.3022157310981231e-10, 'epoch': 2.99} +2025-02-06 07:54:10 - ERROR - stderr - 100%|█████████▉| 22387/22434 [21:46:30<01:55, 2.46s/it] +2025-02-06 07:54:13 - ERROR - stderr - 100%|█████████▉| 22388/22434 [21:46:33<01:53, 2.47s/it] +2025-02-06 07:54:13 - ERROR - stderr - +2025-02-06 07:54:13 - ERROR - stderr - +2025-02-06 07:54:13 - INFO - stdout - {'loss': 0.3503, 'grad_norm': 1.5094188451766968, 'learning_rate': 2.205291658641251e-10, 'epoch': 2.99} +2025-02-06 07:54:13 - ERROR - stderr - 100%|█████████▉| 22388/22434 [21:46:33<01:53, 2.47s/it] +2025-02-06 07:54:15 - ERROR - stderr - 100%|█████████▉| 22389/22434 
[21:46:35<01:51, 2.48s/it] +2025-02-06 07:54:15 - ERROR - stderr - +2025-02-06 07:54:15 - ERROR - stderr - +2025-02-06 07:54:15 - INFO - stdout - {'loss': 0.3733, 'grad_norm': 1.5635809898376465, 'learning_rate': 2.110451944536962e-10, 'epoch': 2.99} +2025-02-06 07:54:15 - ERROR - stderr - 100%|█████████▉| 22389/22434 [21:46:35<01:51, 2.48s/it] +2025-02-06 07:54:18 - ERROR - stderr - 100%|█████████▉| 22390/22434 [21:46:38<01:49, 2.50s/it] +2025-02-06 07:54:18 - ERROR - stderr - +2025-02-06 07:54:18 - ERROR - stderr - +2025-02-06 07:54:18 - INFO - stdout - {'loss': 0.3504, 'grad_norm': 1.5083894729614258, 'learning_rate': 2.0176965907503509e-10, 'epoch': 2.99} +2025-02-06 07:54:18 - ERROR - stderr - 100%|█████████▉| 22390/22434 [21:46:38<01:49, 2.50s/it] +2025-02-06 07:54:20 - ERROR - stderr - 100%|█████████▉| 22391/22434 [21:46:40<01:47, 2.49s/it] +2025-02-06 07:54:20 - ERROR - stderr - +2025-02-06 07:54:20 - ERROR - stderr - +2025-02-06 07:54:20 - INFO - stdout - {'loss': 0.3694, 'grad_norm': 1.5902777910232544, 'learning_rate': 1.927025599213206e-10, 'epoch': 2.99} +2025-02-06 07:54:20 - ERROR - stderr - 100%|█████████▉| 22391/22434 [21:46:40<01:47, 2.49s/it] +2025-02-06 07:54:23 - ERROR - stderr - 100%|█████████▉| 22392/22434 [21:46:43<01:44, 2.48s/it] +2025-02-06 07:54:23 - ERROR - stderr - +2025-02-06 07:54:23 - ERROR - stderr - +2025-02-06 07:54:23 - INFO - stdout - {'loss': 0.3835, 'grad_norm': 1.7590152025222778, 'learning_rate': 1.838438971824008e-10, 'epoch': 2.99} +2025-02-06 07:54:23 - ERROR - stderr - 100%|█████████▉| 22392/22434 [21:46:43<01:44, 2.48s/it] +2025-02-06 07:54:25 - ERROR - stderr - 100%|█████████▉| 22393/22434 [21:46:45<01:42, 2.49s/it] +2025-02-06 07:54:25 - ERROR - stderr - +2025-02-06 07:54:25 - ERROR - stderr - +2025-02-06 07:54:25 - INFO - stdout - {'loss': 0.3762, 'grad_norm': 1.711905598640442, 'learning_rate': 1.7519367104257279e-10, 'epoch': 2.99} +2025-02-06 07:54:25 - ERROR - stderr - 100%|█████████▉| 22393/22434 
[21:46:45<01:42, 2.49s/it] +2025-02-06 07:54:28 - ERROR - stderr - 100%|█████████▉| 22394/22434 [21:46:48<01:40, 2.51s/it] +2025-02-06 07:54:28 - ERROR - stderr - +2025-02-06 07:54:28 - ERROR - stderr - +2025-02-06 07:54:28 - INFO - stdout - {'loss': 0.3717, 'grad_norm': 1.6400275230407715, 'learning_rate': 1.6675188168169266e-10, 'epoch': 2.99} +2025-02-06 07:54:28 - ERROR - stderr - 100%|█████████▉| 22394/22434 [21:46:48<01:40, 2.51s/it] +2025-02-06 07:54:30 - ERROR - stderr - 100%|█████████▉| 22395/22434 [21:46:50<01:37, 2.49s/it] +2025-02-06 07:54:30 - ERROR - stderr - +2025-02-06 07:54:30 - ERROR - stderr - +2025-02-06 07:54:30 - INFO - stdout - {'loss': 0.3662, 'grad_norm': 1.4617305994033813, 'learning_rate': 1.5851852927628586e-10, 'epoch': 2.99} +2025-02-06 07:54:30 - ERROR - stderr - 100%|█████████▉| 22395/22434 [21:46:50<01:37, 2.49s/it] +2025-02-06 07:54:33 - ERROR - stderr - 100%|█████████▉| 22396/22434 [21:46:53<01:35, 2.51s/it] +2025-02-06 07:54:33 - ERROR - stderr - +2025-02-06 07:54:33 - ERROR - stderr - +2025-02-06 07:54:33 - INFO - stdout - {'loss': 0.4089, 'grad_norm': 1.565327525138855, 'learning_rate': 1.5049361399732675e-10, 'epoch': 2.99} +2025-02-06 07:54:33 - ERROR - stderr - 100%|█████████▉| 22396/22434 [21:46:53<01:35, 2.51s/it] +2025-02-06 07:54:35 - ERROR - stderr - 100%|█████████▉| 22397/22434 [21:46:55<01:32, 2.51s/it] +2025-02-06 07:54:35 - ERROR - stderr - +2025-02-06 07:54:35 - ERROR - stderr - +2025-02-06 07:54:35 - INFO - stdout - {'loss': 0.3669, 'grad_norm': 1.4674016237258911, 'learning_rate': 1.4267713601245904e-10, 'epoch': 3.0} +2025-02-06 07:54:35 - ERROR - stderr - 100%|█████████▉| 22397/22434 [21:46:55<01:32, 2.51s/it] +2025-02-06 07:54:38 - ERROR - stderr - 100%|█████████▉| 22398/22434 [21:46:58<01:29, 2.49s/it] +2025-02-06 07:54:38 - ERROR - stderr - +2025-02-06 07:54:38 - ERROR - stderr - +2025-02-06 07:54:38 - INFO - stdout - {'loss': 0.4123, 'grad_norm': 1.7457830905914307, 'learning_rate': 1.3506909548488545e-10, 
'epoch': 3.0} +2025-02-06 07:54:38 - ERROR - stderr - 100%|█████████▉| 22398/22434 [21:46:58<01:29, 2.49s/it] +2025-02-06 07:54:40 - ERROR - stderr - 100%|█████████▉| 22399/22434 [21:47:00<01:27, 2.49s/it] +2025-02-06 07:54:40 - ERROR - stderr - +2025-02-06 07:54:40 - ERROR - stderr - +2025-02-06 07:54:40 - INFO - stdout - {'loss': 0.3808, 'grad_norm': 1.6267791986465454, 'learning_rate': 1.2766949257336792e-10, 'epoch': 3.0} +2025-02-06 07:54:40 - ERROR - stderr - 100%|█████████▉| 22399/22434 [21:47:00<01:27, 2.49s/it] +2025-02-06 07:54:43 - ERROR - stderr - 100%|█████████▉| 22400/22434 [21:47:03<01:25, 2.50s/it] +2025-02-06 07:54:43 - ERROR - stderr - +2025-02-06 07:54:43 - ERROR - stderr - +2025-02-06 07:54:43 - INFO - stdout - {'loss': 0.3646, 'grad_norm': 1.5260343551635742, 'learning_rate': 1.204783274311172e-10, 'epoch': 3.0} +2025-02-06 07:54:43 - ERROR - stderr - 100%|█████████▉| 22400/22434 [21:47:03<01:25, 2.50s/it] +2025-02-06 07:54:46 - ERROR - stderr - 100%|█████████▉| 22401/22434 [21:47:05<01:23, 2.54s/it] +2025-02-06 07:54:46 - ERROR - stderr - +2025-02-06 07:54:46 - ERROR - stderr - +2025-02-06 07:54:46 - INFO - stdout - {'loss': 0.3145, 'grad_norm': 1.4252662658691406, 'learning_rate': 1.1349560020912364e-10, 'epoch': 3.0} +2025-02-06 07:54:46 - ERROR - stderr - 100%|█████████▉| 22401/22434 [21:47:05<01:23, 2.54s/it] +2025-02-06 07:54:48 - ERROR - stderr - 100%|█████████▉| 22402/22434 [21:47:08<01:20, 2.52s/it] +2025-02-06 07:54:48 - ERROR - stderr - +2025-02-06 07:54:48 - ERROR - stderr - +2025-02-06 07:54:48 - INFO - stdout - {'loss': 0.3209, 'grad_norm': 1.4194985628128052, 'learning_rate': 1.0672131105282646e-10, 'epoch': 3.0} +2025-02-06 07:54:48 - ERROR - stderr - 100%|█████████▉| 22402/22434 [21:47:08<01:20, 2.52s/it] +2025-02-06 07:54:51 - ERROR - stderr - 100%|█████████▉| 22403/22434 [21:47:10<01:18, 2.53s/it] +2025-02-06 07:54:51 - ERROR - stderr - +2025-02-06 07:54:51 - ERROR - stderr - +2025-02-06 07:54:51 - INFO - stdout - {'loss': 
0.3848, 'grad_norm': 1.7668046951293945, 'learning_rate': 1.0015546010211375e-10, 'epoch': 3.0} +2025-02-06 07:54:51 - ERROR - stderr - 100%|█████████▉| 22403/22434 [21:47:10<01:18, 2.53s/it] +2025-02-06 07:54:53 - ERROR - stderr - 100%|█████████▉| 22404/22434 [21:47:13<01:15, 2.53s/it] +2025-02-06 07:54:53 - ERROR - stderr - +2025-02-06 07:54:53 - ERROR - stderr - +2025-02-06 07:54:53 - INFO - stdout - {'loss': 0.3991, 'grad_norm': 1.7604402303695679, 'learning_rate': 9.379804749465316e-11, 'epoch': 3.0} +2025-02-06 07:54:53 - ERROR - stderr - 100%|█████████▉| 22404/22434 [21:47:13<01:15, 2.53s/it] +2025-02-06 07:54:56 - ERROR - stderr - 100%|█████████▉| 22405/22434 [21:47:15<01:12, 2.51s/it] +2025-02-06 07:54:56 - ERROR - stderr - +2025-02-06 07:54:56 - ERROR - stderr - +2025-02-06 07:54:56 - INFO - stdout - {'loss': 0.3435, 'grad_norm': 1.6049607992172241, 'learning_rate': 8.764907336367146e-11, 'epoch': 3.0} +2025-02-06 07:54:56 - ERROR - stderr - 100%|█████████▉| 22405/22434 [21:47:15<01:12, 2.51s/it] +2025-02-06 07:54:58 - ERROR - stderr - 100%|█████████▉| 22406/22434 [21:47:18<01:09, 2.50s/it] +2025-02-06 07:54:58 - ERROR - stderr - +2025-02-06 07:54:58 - ERROR - stderr - +2025-02-06 07:54:58 - INFO - stdout - {'loss': 0.3709, 'grad_norm': 1.3902431726455688, 'learning_rate': 8.170853783684429e-11, 'epoch': 3.0} +2025-02-06 07:54:58 - ERROR - stderr - 100%|█████████▉| 22406/22434 [21:47:18<01:09, 2.50s/it] +2025-02-06 07:55:00 - ERROR - stderr - 100%|█████████▉| 22407/22434 [21:47:20<01:06, 2.48s/it] +2025-02-06 07:55:01 - ERROR - stderr - +2025-02-06 07:55:01 - ERROR - stderr - +2025-02-06 07:55:01 - INFO - stdout - {'loss': 0.3549, 'grad_norm': 1.5619003772735596, 'learning_rate': 7.597644103851664e-11, 'epoch': 3.0} +2025-02-06 07:55:01 - ERROR - stderr - 100%|█████████▉| 22407/22434 [21:47:20<01:06, 2.48s/it] +2025-02-06 07:55:03 - ERROR - stderr - 100%|█████████▉| 22408/22434 [21:47:23<01:05, 2.50s/it] +2025-02-06 07:55:03 - ERROR - stderr - +2025-02-06 
07:55:03 - ERROR - stderr - +2025-02-06 07:55:03 - INFO - stdout - {'loss': 0.3255, 'grad_norm': 1.446862816810608, 'learning_rate': 7.045278308637215e-11, 'epoch': 3.0} +2025-02-06 07:55:03 - ERROR - stderr - 100%|█████████▉| 22408/22434 [21:47:23<01:05, 2.50s/it] +2025-02-06 07:55:06 - ERROR - stderr - 100%|█████████▉| 22409/22434 [21:47:25<01:02, 2.51s/it] +2025-02-06 07:55:06 - ERROR - stderr - +2025-02-06 07:55:06 - ERROR - stderr - +2025-02-06 07:55:06 - INFO - stdout - {'loss': 0.3454, 'grad_norm': 1.5430889129638672, 'learning_rate': 6.513756409698424e-11, 'epoch': 3.0} +2025-02-06 07:55:06 - ERROR - stderr - 100%|█████████▉| 22409/22434 [21:47:25<01:02, 2.51s/it] +2025-02-06 07:55:08 - ERROR - stderr - 100%|█████████▉| 22410/22434 [21:47:28<00:59, 2.50s/it] +2025-02-06 07:55:08 - ERROR - stderr - +2025-02-06 07:55:08 - ERROR - stderr - +2025-02-06 07:55:08 - INFO - stdout - {'loss': 0.3638, 'grad_norm': 1.6606395244598389, 'learning_rate': 6.003078418137521e-11, 'epoch': 3.0} +2025-02-06 07:55:08 - ERROR - stderr - 100%|█████████▉| 22410/22434 [21:47:28<00:59, 2.50s/it] +2025-02-06 07:55:10 - ERROR - stderr - 100%|█████████▉| 22411/22434 [21:47:30<00:57, 2.48s/it] +2025-02-06 07:55:11 - ERROR - stderr - +2025-02-06 07:55:11 - ERROR - stderr - +2025-02-06 07:55:11 - INFO - stdout - {'loss': 0.3338, 'grad_norm': 1.5591570138931274, 'learning_rate': 5.5132443445016225e-11, 'epoch': 3.0} +2025-02-06 07:55:11 - ERROR - stderr - 100%|█████████▉| 22411/22434 [21:47:30<00:57, 2.48s/it] +2025-02-06 07:55:13 - ERROR - stderr - 100%|█████████▉| 22412/22434 [21:47:33<00:54, 2.49s/it] +2025-02-06 07:55:13 - ERROR - stderr - +2025-02-06 07:55:13 - ERROR - stderr - +2025-02-06 07:55:13 - INFO - stdout - {'loss': 0.3295, 'grad_norm': 1.5504367351531982, 'learning_rate': 5.0442541991158056e-11, 'epoch': 3.0} +2025-02-06 07:55:13 - ERROR - stderr - 100%|█████████▉| 22412/22434 [21:47:33<00:54, 2.49s/it] +2025-02-06 07:55:16 - ERROR - stderr - 100%|█████████▉| 22413/22434 
[21:47:35<00:52, 2.51s/it] +2025-02-06 07:55:16 - ERROR - stderr - +2025-02-06 07:55:16 - ERROR - stderr - +2025-02-06 07:55:16 - INFO - stdout - {'loss': 0.3901, 'grad_norm': 1.680036187171936, 'learning_rate': 4.5961079916390095e-11, 'epoch': 3.0} +2025-02-06 07:55:16 - ERROR - stderr - 100%|█████████▉| 22413/22434 [21:47:35<00:52, 2.51s/it] +2025-02-06 07:55:18 - ERROR - stderr - 100%|█████████▉| 22414/22434 [21:47:38<00:50, 2.51s/it] +2025-02-06 07:55:18 - ERROR - stderr - +2025-02-06 07:55:18 - ERROR - stderr - +2025-02-06 07:55:18 - INFO - stdout - {'loss': 0.3245, 'grad_norm': 1.421204924583435, 'learning_rate': 4.16880573150813e-11, 'epoch': 3.0} +2025-02-06 07:55:18 - ERROR - stderr - 100%|█████████▉| 22414/22434 [21:47:38<00:50, 2.51s/it] +2025-02-06 07:55:21 - ERROR - stderr - 100%|█████████▉| 22415/22434 [21:47:40<00:47, 2.52s/it] +2025-02-06 07:55:21 - ERROR - stderr - +2025-02-06 07:55:21 - ERROR - stderr - +2025-02-06 07:55:21 - INFO - stdout - {'loss': 0.3419, 'grad_norm': 1.4074370861053467, 'learning_rate': 3.762347427604951e-11, 'epoch': 3.0} +2025-02-06 07:55:21 - ERROR - stderr - 100%|█████████▉| 22415/22434 [21:47:40<00:47, 2.52s/it] +2025-02-06 07:55:23 - ERROR - stderr - 100%|█████████▉| 22416/22434 [21:47:43<00:46, 2.59s/it] +2025-02-06 07:55:23 - ERROR - stderr - +2025-02-06 07:55:23 - ERROR - stderr - +2025-02-06 07:55:23 - INFO - stdout - {'loss': 0.4021, 'grad_norm': 1.824880599975586, 'learning_rate': 3.376733088256145e-11, 'epoch': 3.0} +2025-02-06 07:55:23 - ERROR - stderr - 100%|█████████▉| 22416/22434 [21:47:43<00:46, 2.59s/it] +2025-02-06 07:55:26 - ERROR - stderr - 100%|█████████▉| 22417/22434 [21:47:46<00:43, 2.56s/it] +2025-02-06 07:55:26 - ERROR - stderr - +2025-02-06 07:55:26 - ERROR - stderr - +2025-02-06 07:55:26 - INFO - stdout - {'loss': 0.3459, 'grad_norm': 1.5240685939788818, 'learning_rate': 3.0119627217883864e-11, 'epoch': 3.0} +2025-02-06 07:55:26 - ERROR - stderr - 100%|█████████▉| 22417/22434 [21:47:46<00:43, 
2.56s/it] +2025-02-06 07:55:29 - ERROR - stderr - 100%|█████████▉| 22418/22434 [21:47:48<00:41, 2.61s/it] +2025-02-06 07:55:29 - ERROR - stderr - +2025-02-06 07:55:29 - ERROR - stderr - +2025-02-06 07:55:29 - INFO - stdout - {'loss': 0.3824, 'grad_norm': 1.624670386314392, 'learning_rate': 2.668036335529145e-11, 'epoch': 3.0} +2025-02-06 07:55:29 - ERROR - stderr - 100%|█████████▉| 22418/22434 [21:47:48<00:41, 2.61s/it] +2025-02-06 07:55:31 - ERROR - stderr - 100%|█████████▉| 22419/22434 [21:47:51<00:38, 2.56s/it] +2025-02-06 07:55:31 - ERROR - stderr - +2025-02-06 07:55:31 - ERROR - stderr - +2025-02-06 07:55:31 - INFO - stdout - {'loss': 0.3168, 'grad_norm': 1.462699055671692, 'learning_rate': 2.3449539368058937e-11, 'epoch': 3.0} +2025-02-06 07:55:31 - ERROR - stderr - 100%|█████████▉| 22419/22434 [21:47:51<00:38, 2.56s/it] +2025-02-06 07:55:33 - ERROR - stderr - 100%|█████████▉| 22420/22434 [21:47:53<00:35, 2.54s/it] +2025-02-06 07:55:34 - ERROR - stderr - +2025-02-06 07:55:34 - ERROR - stderr - +2025-02-06 07:55:34 - INFO - stdout - {'loss': 0.3944, 'grad_norm': 1.6156619787216187, 'learning_rate': 2.042715532279971e-11, 'epoch': 3.0} +2025-02-06 07:55:34 - ERROR - stderr - 100%|█████████▉| 22420/22434 [21:47:53<00:35, 2.54s/it] +2025-02-06 07:55:36 - ERROR - stderr - 100%|█████████▉| 22421/22434 [21:47:56<00:32, 2.53s/it] +2025-02-06 07:55:36 - ERROR - stderr - +2025-02-06 07:55:36 - ERROR - stderr - +2025-02-06 07:55:36 - INFO - stdout - {'loss': 0.3902, 'grad_norm': 1.4902485609054565, 'learning_rate': 1.7613211282796472e-11, 'epoch': 3.0} +2025-02-06 07:55:36 - ERROR - stderr - 100%|█████████▉| 22421/22434 [21:47:56<00:32, 2.53s/it] +2025-02-06 07:55:39 - ERROR - stderr - 100%|█████████▉| 22422/22434 [21:47:58<00:30, 2.54s/it] +2025-02-06 07:55:39 - ERROR - stderr - +2025-02-06 07:55:39 - ERROR - stderr - +2025-02-06 07:55:39 - INFO - stdout - {'loss': 0.3614, 'grad_norm': 1.4786412715911865, 'learning_rate': 1.500770730689105e-11, 'epoch': 3.0} 
+2025-02-06 07:55:39 - ERROR - stderr - 100%|█████████▉| 22422/22434 [21:47:58<00:30, 2.54s/it] +2025-02-06 07:55:41 - ERROR - stderr - 100%|█████████▉| 22423/22434 [21:48:01<00:28, 2.55s/it] +2025-02-06 07:55:41 - ERROR - stderr - +2025-02-06 07:55:41 - ERROR - stderr - +2025-02-06 07:55:41 - INFO - stdout - {'loss': 0.4001, 'grad_norm': 1.7052795886993408, 'learning_rate': 1.2610643449484373e-11, 'epoch': 3.0} +2025-02-06 07:55:41 - ERROR - stderr - 100%|█████████▉| 22423/22434 [21:48:01<00:28, 2.55s/it] +2025-02-06 07:55:44 - ERROR - stderr - 100%|█████████▉| 22424/22434 [21:48:03<00:25, 2.54s/it] +2025-02-06 07:55:44 - ERROR - stderr - +2025-02-06 07:55:44 - ERROR - stderr - +2025-02-06 07:55:44 - INFO - stdout - {'loss': 0.3562, 'grad_norm': 1.5927343368530273, 'learning_rate': 1.0422019759426249e-11, 'epoch': 3.0} +2025-02-06 07:55:44 - ERROR - stderr - 100%|█████████▉| 22424/22434 [21:48:03<00:25, 2.54s/it] +2025-02-06 07:55:46 - ERROR - stderr - 100%|█████████▉| 22425/22434 [21:48:06<00:23, 2.57s/it] +2025-02-06 07:55:46 - ERROR - stderr - +2025-02-06 07:55:46 - ERROR - stderr - +2025-02-06 07:55:46 - INFO - stdout - {'loss': 0.3229, 'grad_norm': 1.552304983139038, 'learning_rate': 8.441836284456274e-12, 'epoch': 3.0} +2025-02-06 07:55:46 - ERROR - stderr - 100%|█████████▉| 22425/22434 [21:48:06<00:23, 2.57s/it] +2025-02-06 07:55:49 - ERROR - stderr - 100%|█████████▉| 22426/22434 [21:48:09<00:20, 2.55s/it] +2025-02-06 07:55:49 - ERROR - stderr - +2025-02-06 07:55:49 - ERROR - stderr - +2025-02-06 07:55:49 - INFO - stdout - {'loss': 0.3161, 'grad_norm': 1.882645845413208, 'learning_rate': 6.670093063432248e-12, 'epoch': 3.0} +2025-02-06 07:55:49 - ERROR - stderr - 100%|█████████▉| 22426/22434 [21:48:09<00:20, 2.55s/it] +2025-02-06 07:55:51 - ERROR - stderr - 100%|█████████▉| 22427/22434 [21:48:11<00:17, 2.53s/it] +2025-02-06 07:55:51 - ERROR - stderr - +2025-02-06 07:55:51 - ERROR - stderr - +2025-02-06 07:55:51 - INFO - stdout - {'loss': 0.3462, 
'grad_norm': 1.9288395643234253, 'learning_rate': 5.1067901341017574e-12, 'epoch': 3.0} +2025-02-06 07:55:51 - ERROR - stderr - 100%|█████████▉| 22427/22434 [21:48:11<00:17, 2.53s/it] +2025-02-06 07:55:54 - ERROR - stderr - 100%|█████████▉| 22428/22434 [21:48:14<00:15, 2.53s/it] +2025-02-06 07:55:54 - ERROR - stderr - +2025-02-06 07:55:54 - ERROR - stderr - +2025-02-06 07:55:54 - INFO - stdout - {'loss': 0.3934, 'grad_norm': 1.5325981378555298, 'learning_rate': 3.751927530881716e-12, 'epoch': 3.0} +2025-02-06 07:55:54 - ERROR - stderr - 100%|█████████▉| 22428/22434 [21:48:14<00:15, 2.53s/it] +2025-02-06 07:55:56 - ERROR - stderr - 100%|█████████▉| 22429/22434 [21:48:16<00:12, 2.52s/it] +2025-02-06 07:55:56 - ERROR - stderr - +2025-02-06 07:55:56 - ERROR - stderr - +2025-02-06 07:55:56 - INFO - stdout - {'loss': 0.3516, 'grad_norm': 1.5895451307296753, 'learning_rate': 2.6055052793072522e-12, 'epoch': 3.0} +2025-02-06 07:55:56 - ERROR - stderr - 100%|█████████▉| 22429/22434 [21:48:16<00:12, 2.52s/it] +2025-02-06 07:55:59 - ERROR - stderr - 100%|█████████▉| 22430/22434 [21:48:19<00:10, 2.51s/it] +2025-02-06 07:55:59 - ERROR - stderr - +2025-02-06 07:55:59 - ERROR - stderr - +2025-02-06 07:55:59 - INFO - stdout - {'loss': 0.4512, 'grad_norm': 1.758899211883545, 'learning_rate': 1.667523404913496e-12, 'epoch': 3.0} +2025-02-06 07:55:59 - ERROR - stderr - 100%|█████████▉| 22430/22434 [21:48:19<00:10, 2.51s/it] +2025-02-06 07:56:01 - ERROR - stderr - 100%|█████████▉| 22431/22434 [21:48:21<00:07, 2.51s/it] +2025-02-06 07:56:01 - ERROR - stderr - +2025-02-06 07:56:01 - ERROR - stderr - +2025-02-06 07:56:01 - INFO - stdout - {'loss': 0.3469, 'grad_norm': 1.6138715744018555, 'learning_rate': 9.379819265742385e-13, 'epoch': 3.0} +2025-02-06 07:56:01 - ERROR - stderr - 100%|█████████▉| 22431/22434 [21:48:21<00:07, 2.51s/it] +2025-02-06 07:56:04 - ERROR - stderr - 100%|█████████▉| 22432/22434 [21:48:24<00:04, 2.49s/it] +2025-02-06 07:56:04 - ERROR - stderr - +2025-02-06 
07:56:04 - ERROR - stderr - +2025-02-06 07:56:04 - INFO - stdout - {'loss': 0.3893, 'grad_norm': 1.7010003328323364, 'learning_rate': 4.168808598326024e-13, 'epoch': 3.0} +2025-02-06 07:56:04 - ERROR - stderr - 100%|█████████▉| 22432/22434 [21:48:24<00:04, 2.49s/it] +2025-02-06 07:56:06 - ERROR - stderr - 100%|█████████▉| 22433/22434 [21:48:26<00:02, 2.49s/it] +2025-02-06 07:56:06 - ERROR - stderr - +2025-02-06 07:56:06 - ERROR - stderr - +2025-02-06 07:56:06 - INFO - stdout - {'loss': 0.4047, 'grad_norm': 1.7058738470077515, 'learning_rate': 1.0422021579081786e-13, 'epoch': 3.0} +2025-02-06 07:56:06 - ERROR - stderr - 100%|█████████▉| 22433/22434 [21:48:26<00:02, 2.49s/it] +2025-02-06 07:56:09 - ERROR - stderr - 100%|██████████| 22434/22434 [21:48:28<00:00, 2.49s/it] +2025-02-06 07:56:09 - ERROR - stderr - +2025-02-06 07:56:09 - ERROR - stderr - +2025-02-06 07:56:09 - INFO - stdout - {'loss': 0.2504, 'grad_norm': 1.2399357557296753, 'learning_rate': 0.0, 'epoch': 3.0} +2025-02-06 07:56:09 - ERROR - stderr - 100%|██████████| 22434/22434 [21:48:29<00:00, 2.49s/it] +2025-02-06 07:56:52 - INFO - transformers.trainer - Saving model checkpoint to outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/checkpoint-22434 +2025-02-06 07:56:52 - INFO - transformers.trainer - Saving model checkpoint to outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/checkpoint-22434 +2025-02-06 07:56:52 - INFO - transformers.configuration_utils - Configuration saved in outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/checkpoint-22434/config.json +2025-02-06 07:56:52 - INFO - transformers.configuration_utils - Configuration saved in outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/checkpoint-22434/config.json +2025-02-06 07:56:52 - INFO - transformers.generation.configuration_utils - Configuration saved in 
outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/checkpoint-22434/generation_config.json +2025-02-06 07:56:52 - INFO - transformers.generation.configuration_utils - Configuration saved in outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/checkpoint-22434/generation_config.json +2025-02-06 07:57:56 - INFO - transformers.modeling_utils - The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 11 checkpoint shards. You can find where each parameters has been saved in the index located at outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/checkpoint-22434/model.safetensors.index.json. +2025-02-06 07:57:56 - INFO - transformers.modeling_utils - The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 11 checkpoint shards. You can find where each parameters has been saved in the index located at outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/checkpoint-22434/model.safetensors.index.json. 
+2025-02-06 07:57:56 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/checkpoint-22434/tokenizer_config.json +2025-02-06 07:57:56 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/checkpoint-22434/tokenizer_config.json +2025-02-06 07:57:56 - INFO - transformers.tokenization_utils_base - Special tokens file saved in outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/checkpoint-22434/special_tokens_map.json +2025-02-06 07:57:56 - INFO - transformers.tokenization_utils_base - Special tokens file saved in outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/checkpoint-22434/special_tokens_map.json +2025-02-06 07:57:56 - INFO - transformers.tokenization_utils_base - added tokens file saved in outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/checkpoint-22434/added_tokens.json +2025-02-06 07:57:56 - INFO - transformers.tokenization_utils_base - added tokens file saved in outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/checkpoint-22434/added_tokens.json +2025-02-06 07:58:40 - INFO - accelerate.utils.fsdp_utils - Saving model to outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/checkpoint-22434/pytorch_model_fsdp.bin +2025-02-06 07:59:24 - INFO - accelerate.utils.fsdp_utils - Model saved to outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/checkpoint-22434/pytorch_model_fsdp.bin +2025-02-06 07:59:25 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | 
+|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484371 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1901 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153854 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30774 K | 30774 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153854 K | +| from large pool | 125 | 207 | 123080 K | 
123080 K | +| from small pool | 53 | 91 | 30774 K | 30774 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:26 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 
MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153854 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30774 K | 30774 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153854 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30774 K | 30774 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:28 - WARNING - torch.distributed.fsdp._optim_utils - CUDA 
Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30774 K | 
30774 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30774 K | 30774 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:29 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 
TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30774 K | 30774 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30774 K | 30774 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize 
GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:30 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | 
+|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30774 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30774 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:31 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 
0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30774 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30774 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 
10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:32 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| 
Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:33 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | 
+|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 
| 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:35 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 
30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:36 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| 
+| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | 
+|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:37 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 
17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:38 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states 
|===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | 
+|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:39 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 
TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments 
| 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:40 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | 
+|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:42 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 
0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 
10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:43 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| 
Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:44 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | 
+|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 
| 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:45 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 
30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:46 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| 
+| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | 
+|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:47 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 
17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:49 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states 
|===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | 
+|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:50 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 
TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments 
| 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:51 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3603 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | 
+|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:53 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3604 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 
0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3604 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 
10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:55 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3604 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3604 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| 
Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:56 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | 
+|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3604 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3604 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 
| 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 07:59:58 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3604 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3604 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 
30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 08:00:00 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| 
+| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3604 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3604 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | 
+|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 08:00:02 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3604 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3604 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 
17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 08:00:04 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states 
|===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3604 TiB | 3603 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3604 TiB | 3603 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | 
+|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 08:00:05 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 
TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments 
| 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 08:00:07 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | 
+|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 08:00:09 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 
0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 
10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 08:00:11 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| 
Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 08:00:13 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | 
+|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153855 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 
| 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 08:00:14 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 
30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153856 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153856 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 08:00:16 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| 
+| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153856 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153856 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | 
+|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 08:00:18 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 
17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153856 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153856 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 08:00:20 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states 
|===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153856 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | 
+|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153856 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 08:00:21 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 
TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153856 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153856 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments 
| 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 08:00:23 - WARNING - torch.distributed.fsdp._optim_utils - CUDA Memory Summary before calling to _allgather_orig_param_states |===========================================================================| +| PyTorch CUDA memory summary, device ID 0 | +|---------------------------------------------------------------------------| +| CUDA OOMs: 0 | cudaMalloc retries: 0 | +|===========================================================================| +| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | +|---------------------------------------------------------------------------| +| Allocated memory | 9352 MiB | 16709 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 16707 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Active memory | 9352 MiB | 17993 MiB | 3604 TiB | 3604 TiB | +| from large pool | 9352 MiB | 17992 MiB | 3603 TiB | 3603 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| Requested memory | 9335 MiB | 17971 MiB | 3591 TiB | 3591 TiB | +| from large pool | 9335 MiB | 17969 MiB | 3590 TiB | 3590 TiB | +| from small pool | 0 MiB | 5 MiB | 0 TiB | 0 TiB | +|---------------------------------------------------------------------------| +| GPU reserved memory | 30382 MiB | 30382 MiB | 30382 MiB | 0 B | +| from large pool | 30374 MiB | 30374 MiB | 30374 MiB | 0 B | +| from small pool | 8 MiB | 8 MiB | 8 MiB | 0 B | +|---------------------------------------------------------------------------| +| Non-releasable memory | 484408 KiB | 3943 MiB | 874 TiB | 874 TiB | +| from large pool | 482470 KiB | 3938 MiB | 873 TiB | 873 TiB | +| from small pool | 1938 KiB | 5 MiB | 0 TiB | 0 TiB | 
+|---------------------------------------------------------------------------| +| Allocations | 173 | 257 | 153856 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 48 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| Active allocs | 178 | 263 | 153856 K | 153855 K | +| from large pool | 125 | 207 | 123080 K | 123080 K | +| from small pool | 53 | 91 | 30775 K | 30775 K | +|---------------------------------------------------------------------------| +| GPU reserved segments | 189 | 189 | 189 | 0 | +| from large pool | 185 | 185 | 185 | 0 | +| from small pool | 4 | 4 | 4 | 0 | +|---------------------------------------------------------------------------| +| Non-releasable allocs | 21 | 63 | 66766 K | 66766 K | +| from large pool | 16 | 52 | 56299 K | 56299 K | +| from small pool | 5 | 18 | 10467 K | 10467 K | +|---------------------------------------------------------------------------| +| Oversize allocations | 0 | 0 | 0 | 0 | +|---------------------------------------------------------------------------| +| Oversize GPU segments | 0 | 0 | 0 | 0 | +|===========================================================================| + +2025-02-06 08:00:24 - INFO - transformers.trainer - + +Training completed. Do not forget to share your model on huggingface.co/models =) + + +2025-02-06 08:00:24 - INFO - transformers.trainer - + +Training completed. Do not forget to share your model on huggingface.co/models =) + + +2025-02-06 08:00:24 - INFO - transformers.trainer - + +Training completed. Do not forget to share your model on huggingface.co/models =) + + +2025-02-06 08:00:24 - INFO - transformers.trainer - + +Training completed. Do not forget to share your model on huggingface.co/models =) + + +2025-02-06 08:00:25 - INFO - transformers.trainer - + +Training completed. 
Do not forget to share your model on huggingface.co/models =) + + +2025-02-06 08:00:25 - INFO - transformers.trainer - + +Training completed. Do not forget to share your model on huggingface.co/models =) + + +2025-02-06 08:00:25 - WARNING - torch.distributed.fsdp._debug_utils - FSDP _optim_state_dict() profiling: defaultdict(, {'preprocessing': 0.04272403847426176, 'preprocessing_with_comm': 0.01296241208910942, : 0.18280870839953423, : 2.006765777245164, : 57.48512378986925, : 59.75500159803778, 'state_converting': 59.75983114819974, : 59.819412912242115}) +2025-02-06 08:00:25 - INFO - accelerate.utils.fsdp_utils - Saving Optimizer state to outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/checkpoint-22434/optimizer.bin +2025-02-06 08:03:20 - INFO - accelerate.utils.fsdp_utils - Optimizer state saved in outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/checkpoint-22434/optimizer.bin +2025-02-06 08:03:24 - INFO - transformers.trainer - + +Training completed. Do not forget to share your model on huggingface.co/models =) + + +2025-02-06 08:03:24 - INFO - transformers.trainer - + +Training completed. 
Do not forget to share your model on huggingface.co/models =) + + +2025-02-06 08:03:24 - ERROR - stderr - +2025-02-06 08:03:24 - ERROR - stderr - +2025-02-06 08:03:24 - INFO - stdout - {'train_runtime': 78944.7497, 'train_samples_per_second': 18.185, 'train_steps_per_second': 0.284, 'train_loss': 0.687396430635533, 'epoch': 3.0} +2025-02-06 08:03:24 - ERROR - stderr - 100%|██████████| 22434/22434 [21:55:43<00:00, 2.49s/it] +2025-02-06 08:03:24 - ERROR - stderr - 100%|██████████| 22434/22434 [21:55:43<00:00, 3.52s/it] +2025-02-06 08:03:24 - ERROR - stderr - +2025-02-06 08:03:49 - INFO - transformers.trainer - Saving model checkpoint to outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script +2025-02-06 08:03:49 - INFO - transformers.trainer - Saving model checkpoint to outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script +2025-02-06 08:03:49 - INFO - transformers.configuration_utils - Configuration saved in outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/config.json +2025-02-06 08:03:49 - INFO - transformers.configuration_utils - Configuration saved in outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/config.json +2025-02-06 08:03:49 - INFO - transformers.generation.configuration_utils - Configuration saved in outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/generation_config.json +2025-02-06 08:03:49 - INFO - transformers.generation.configuration_utils - Configuration saved in outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/generation_config.json +2025-02-06 08:04:58 - INFO - transformers.modeling_utils - The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 11 checkpoint shards. 
You can find where each parameters has been saved in the index located at outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/model.safetensors.index.json. +2025-02-06 08:04:58 - INFO - transformers.modeling_utils - The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 11 checkpoint shards. You can find where each parameters has been saved in the index located at outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/model.safetensors.index.json. +2025-02-06 08:04:58 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/tokenizer_config.json +2025-02-06 08:04:58 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/tokenizer_config.json +2025-02-06 08:04:58 - INFO - transformers.tokenization_utils_base - Special tokens file saved in outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/special_tokens_map.json +2025-02-06 08:04:58 - INFO - transformers.tokenization_utils_base - Special tokens file saved in outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/special_tokens_map.json +2025-02-06 08:04:58 - INFO - transformers.tokenization_utils_base - added tokens file saved in outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/added_tokens.json +2025-02-06 08:04:58 - INFO - transformers.tokenization_utils_base - added tokens file saved in outputs/LLaNA_13B_train_stage2_recipe3_shapenerf_objanerf_AUGMENTED/slurm_script/added_tokens.json +2025-02-06 08:05:00 - WARNING - wandb - message_loop has been closed +2025-02-06 08:05:05 - WARNING - urllib3.connectionpool - Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 
'NewConnectionError(': Failed to establish a new connection: [Errno 101] Network is unreachable')': /api/4504800232407040/envelope/ +2025-02-06 08:05:05 - WARNING - urllib3.connectionpool - Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError(': Failed to establish a new connection: [Errno 101] Network is unreachable')': /api/4504800232407040/envelope/ +2025-02-06 08:05:05 - WARNING - urllib3.connectionpool - Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError(': Failed to establish a new connection: [Errno 101] Network is unreachable')': /api/4504800232407040/envelope/ +2025-02-06 08:05:05 - WARNING - urllib3.connectionpool - Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError(': Failed to establish a new connection: [Errno 101] Network is unreachable')': /api/4504800232407040/envelope/